Abstract

More than sixty interestingness measures have been proposed in the association rule mining literature since 1993. Given their importance, research on the normalization of probabilistic quality measures of association rules has already produced many tangible results that consolidate the various existing measures. This article proposes a simple way to perform this normalization. For the sake of a unified presentation, it also introduces a new concept, the normalizing function, as an effective tool for normalizing measures that already have their own normalization functions.

1. Introduction

1.1. Definitions and Notations

Throughout, we work within the framework of a binary data mining context $\mathcal{B} = (T, I, R)$ (see, for example, [13], which illustrates the importance of association rule mining based on the choice of a quality measure), where $I$ is a nonempty finite set of attributes or variables, $T$ is a finite set of entities or objects, $R \subseteq T \times I$ is a binary relation from $T$ to $I$, and $P$ is the discrete uniform probability on the probability space $(T, \mathcal{P}(T), P)$ [4, 5].

In the following sections, we use this notation for two itemsets $X, Y \subseteq I$: $X' = \{t \in T \mid (t, x) \in R \text{ for all } x \in X\}$ is the set of all transactions containing the pattern $X$, i.e., the dual of $X$ [4, 6, 7]; $n = |T|$ is the total sample size; $n_X = |X'|$ is the number of transactions satisfying the pattern $X$; $n_{XY} = |(X \cup Y)'|$ is the number of transactions satisfying both $X$ and $Y$; $\bar{Y}$ denotes the logical negation of $Y$; and $P(X) = n_X / n$ is the support of the pattern $X$.
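To make this notation concrete, the minimal Python sketch below computes, on a small hypothetical binary context, the set of transactions containing a pattern (its dual), the counts of transactions satisfying $X$ and both $X$ and $Y$, and the support as a relative frequency under the uniform probability. The transaction identifiers, item names, and helper names (`dual`, `support`, `confidence`) are illustrative assumptions, not part of the paper.

```python
# Hypothetical toy binary context: each transaction (object) is mapped to the
# set of attributes (items) it is related to by R.
CONTEXT = {
    "t1": {"a", "b", "c"},
    "t2": {"a", "b"},
    "t3": {"a", "c"},
    "t4": {"b", "c"},
    "t5": {"a", "b", "c"},
}


def dual(pattern, context):
    """Return the transactions containing every item of `pattern` (its dual)."""
    return {t for t, items in context.items() if set(pattern) <= items}


def support(pattern, context):
    """P(X) = n_X / n under the discrete uniform probability on transactions."""
    return len(dual(pattern, context)) / len(context)


def confidence(x, y, context):
    """P(Y | X) = n_XY / n_X, where n_XY counts transactions containing X and Y."""
    n_x = len(dual(x, context))
    n_xy = len(dual(set(x) | set(y), context))
    return n_xy / n_x


print(support({"a", "b"}, CONTEXT), confidence({"a"}, {"b"}, CONTEXT))
```

With this toy data, the rule a → b has $P(b \mid a) = 0.75$, which is below $P(b) = 0.8$.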

The rest of the paper is organized in three sections. Section 2 defines the normalizing function. Section 3 presents the normalizing functions obtained for a number of probabilistic quality measures. Finally, Section 4 sets out the conclusion and perspectives [5, 8–10].

2. Normalizing Function

2.1. Motivations

The theory and practice of normalizing probabilistic quality measures (see, for example, [4, 5, 11], Totohasina et al. [12], and [6, 7]) have become part of the toolkit for solving data mining problems, with the aim of unifying [3, 4, 6, 12–17] the various measures available in the literature. Note that [4] proves the existence of infinitely many quality measures through the concept of a so-called normalized quality measure satisfying five conditions, yet [16] has recently proposed another new interestingness measure. Opening the door to new concepts in data mining may prompt fresh reflection among researchers in this field; the concept we propose here is the normalizing function. What is meant by a normalizing function? The following section attempts to answer this question. This paper is a direct continuation of [18].

2.2. Proposed Approach

Definition 1. Let $M$ be a probabilistic measure of interest. We call the normalizing function of $M$ the piecewise continuous function $f_M$ that directly normalizes $M$, defined by
$$f_M(x) = \begin{cases} f_1(x) & \text{if } X \text{ favors } Y,\\ f_2(x) & \text{if } X \text{ disfavors } Y. \end{cases}$$
It takes the particular values $f_M(x_{imp}) = 1$, $f_M(x_{ind}) = 0$, and $f_M(x_{inc}) = -1$, where $x$ is a real value of $M$; $x_{imp}$, $x_{ind}$, and $x_{inc}$ are the values of $M$ at logical implication, independence, and incompatibility, respectively; $f_1$ is the normalizing function when $X$ favors $Y$, that is, when $P(Y \mid X) > P(Y)$; and $f_2$ is the normalizing function when $X$ disfavors $Y$, that is, when $P(Y \mid X) < P(Y)$. The explicit form of the normalizing function depends on whether $M$ is affine-normalizable, homographic-normalizable, semihomographic-normalizable to the right, or semihomographic-normalizable to the left; in each case it is expressed through the four normalization coefficients of $M$.
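In the simplest, affine case, each branch is the unique affine map fixed by two reference values: the favoring branch sends the independence value to 0 and the implication value to 1, and the disfavoring branch sends the incompatibility value to -1 and the independence value to 0. The Python sketch below assumes finite reference values with `x_inc < x_ind < x_imp`; the function name `affine_normalizing_function` is hypothetical, and the sketch is not claimed to reproduce the paper's own expressions.

```python
def affine_normalizing_function(x_inc, x_ind, x_imp):
    """Return a piecewise affine f with f(x_inc) = -1, f(x_ind) = 0, f(x_imp) = 1.

    Illustrative assumption: finite reference values with x_inc < x_ind < x_imp.
    """
    if not (x_inc < x_ind < x_imp):
        raise ValueError("expected x_inc < x_ind < x_imp")

    def f(x):
        if x >= x_ind:
            # X favors Y: maps [x_ind, x_imp] onto [0, 1].
            return (x - x_ind) / (x_imp - x_ind)
        # X disfavors Y: maps [x_inc, x_ind) onto [-1, 0).
        return (x - x_ind) / (x_ind - x_inc)

    return f


# Confidence P(Y | X) with P(Y) = 0.8: incompatibility 0, independence 0.8, implication 1.
f_conf = affine_normalizing_function(0.0, 0.8, 1.0)
print(f_conf(0.75))
```

For confidence with $P(Y) = 0.8$, a value of 0.75 is normalized to approximately -0.0625, reflecting that $X$ slightly disfavors $Y$.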

The definition of the normalizing function thus depends on the quality measure considered. It is an ordinary numerical function of a real variable $x$; the only particularity is that this variable is the value of the measure itself, $x = M(X \to Y)$.

The four normalization coefficients of a probabilistic quality measure must satisfy conditions that express them in terms of five real functions. Note in passing that some groups of measures share the same normalization coefficients. Once the respective values of the four normalization coefficients are known, the normalizing function also provides a way to interpret a normalized measure, and it describes the range of values taken by every normalized measure. Consequently, any normalizing function that fulfills the purpose of normalization must have the following properties.

Property 2 (necessary conditions). Let $x_{inc}$, $x_{ind}$, and $x_{imp}$ be the values of $M$ at incompatibility, independence, and logical implication, and let $f_1$ and $f_2$ be the branches of the normalizing function $f_M$ for the cases where $X$ favors and disfavors $Y$, respectively.
$f_1$ is a continuous, nonnegative, strictly increasing function on the interval $[x_{ind}, x_{imp}]$ that realizes a bijection onto $[0, 1]$; that is, $\lim_{x \to x_{imp}} f_1(x) = 1$ and $\lim_{x \to x_{ind}} f_1(x) = 0$.
$f_2$ is a continuous, nonpositive, strictly increasing function on the interval $[x_{inc}, x_{ind}]$ that realizes a bijection onto $[-1, 0]$; that is, $\lim_{x \to x_{ind}} f_2(x) = 0$ and $\lim_{x \to x_{inc}} f_2(x) = -1$.
As a recap, $f_M$ is continuous at the point $x_{ind}$, increasing on the interval $[x_{inc}, x_{imp}]$, and realizes a bijection onto $[-1, 1]$; that is, $\lim_{x \to x_{imp}} f_M(x) = 1$, $f_M(x_{ind}) = 0$, and $\lim_{x \to x_{inc}} f_M(x) = -1$.
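These necessary conditions can be checked numerically for a candidate normalizing function. The sketch below, with the hypothetical name `satisfies_property_2`, tests the three reference values, strict monotonicity, and the sign conditions on a finite grid. It assumes finite reference values, so it is a heuristic check rather than a proof, and it does not handle measures whose implication value is infinite.

```python
def satisfies_property_2(f, x_inc, x_ind, x_imp, samples=1000, tol=1e-9):
    """Heuristically check Property 2 on a grid over [x_inc, x_imp].

    Assumes finite reference values x_inc < x_ind < x_imp.
    """
    # Reference values: f(x_inc) = -1, f(x_ind) = 0, f(x_imp) = 1.
    if abs(f(x_inc) + 1) > tol or abs(f(x_ind)) > tol or abs(f(x_imp) - 1) > tol:
        return False
    grid = [x_inc + (x_imp - x_inc) * k / samples for k in range(samples + 1)]
    values = [f(x) for x in grid]
    # Strictly increasing over the whole interval.
    if any(b <= a for a, b in zip(values, values[1:])):
        return False
    # Nonpositive before independence, nonnegative after it.
    return all(v <= tol if x <= x_ind else v >= -tol for x, v in zip(grid, values))


# Lift with P(Y) = 0.5: incompatibility 0, independence 1, implication 1 / P(Y) = 2.
lift_nf = lambda x: (x - 1.0) / (2.0 - 1.0) if x >= 1.0 else (x - 1.0) / (1.0 - 0.0)
print(satisfies_property_2(lift_nf, 0.0, 1.0, 2.0))
```

The identity map fails the check on the same interval, since it sends the incompatibility value 0 to 0 instead of -1.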

3. Applications

We recall in Table 1 the respective definitions of the various measures, which lead to the results below.
(1) Cost multiplying: using the method developed in [18] for finding the normalization coefficients, we obtain its four normalization coefficients, and substituting them into expression (3) gives its normalizing function. It is easy to see that this function is piecewise continuous, and in particular continuous at the independence point. It therefore satisfies the necessary and sufficient conditions for a normalized, continuous measure. Its variation tables are presented in Figures 1 and 2. The graph and the variation tables of the normalizing function of Cost multiplying show that its favoring branch is strictly increasing and nonnegative and realizes a bijection onto $[0, 1]$, while its disfavoring branch is strictly increasing and nonpositive and realizes a bijection onto $[-1, 0]$. Hence the normalizing function is strictly increasing and realizes a bijection onto $[-1, 1]$.
In what follows, the four normalization coefficients of the other measures are found by the same principle as for Cost multiplying, since these measures are already expressed in Table 1.
(2) Example counter-example: its normalizing function follows from expression (4) with its own normalization coefficients.
(3) Informational gain: its normalizing function follows from expression (4) with its own normalization coefficients.
(4) Odds ratio: its normalizing function follows from expression (3) with its own normalization coefficients.
(5) Conviction: its normalizing function follows from expression (5) with its own normalization coefficients.
(6) Sebag: its normalizing function follows from expression (3) with its own normalization coefficients.
(7): its normalizing function follows from expression (2) with its own normalization coefficients.
(8) Support: its normalizing function follows from expression (2) with its own normalization coefficients.
(9) Confidence: its normalizing function follows from expression (2) with its own normalization coefficients.
(10) Recall: its normalizing function follows from expression (2) with its own normalization coefficients.
(11) Lift: its normalizing function follows from expression (2) with its own normalization coefficients.
(12) Leverage: its normalizing function follows from expression (2) with its own normalization coefficients.
(13) Centered confidence: its normalizing function follows from expression (2) with its own normalization coefficients.
(14) Featured confirmed confidence: its normalizing function follows from expression (2) with its own normalization coefficients.
(15) Certainty factor: its normalizing function follows from expression (2) with its own normalization coefficients.
(16) Gras's implication index: its normalizing function follows from expression (2) with its own normalization coefficients.
(17) Piatetsky-Shapiro: its normalizing function follows from expression (2) with its own normalization coefficients.
(18) Cosine: its normalizing function follows from expression (2) with its own normalization coefficients.
(19) Loevinger: its normalizing function follows from expression (2) with its own normalization coefficients.
(20) Cohen or Kappa: its normalizing function follows from expression (2) with its own normalization coefficients.
(21) Addiction: its normalizing function follows from expression (2) with its own normalization coefficients.
(22) Novelty: its normalizing function follows from expression (2) with its own normalization coefficients.
(23) Czekanowski-Dice or F-measure: its normalizing function follows from expression (2) with its own normalization coefficients.
(24) Relative risk: its normalizing function follows from expression (2) with its own normalization coefficients.
(25) Negative reliability: its normalizing function follows from expression (2) with its own normalization coefficients.
Examining the expressions of these normalizing functions shows that certain measures have identical normalizing functions; Theorem 3 states this result.
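Several measures in this list that use expression (2) can be checked numerically. The Python sketch below assumes that the reference values of a measure are obtained by keeping the margins $P(X)$ and $P(Y)$ fixed and substituting $P(XY) = 0$ (incompatibility), $P(XY) = P(X)P(Y)$ (independence), and $P(XY) = P(X)$ (logical implication), and then applying affine branches. Under this assumption, support, confidence, and lift all normalize to $(P(Y \mid X) - P(Y))/(1 - P(Y))$ when $X$ favors $Y$ and to $(P(Y \mid X) - P(Y))/P(Y)$ otherwise. The names `MEASURES`, `normalize`, and `reference_normalized` are hypothetical, and the sketch is not the paper's expression (2).

```python
# Each measure is written as a function of P(XY), P(X), P(Y).
MEASURES = {
    "support": lambda p_xy, p_x, p_y: p_xy,
    "confidence": lambda p_xy, p_x, p_y: p_xy / p_x,
    "lift": lambda p_xy, p_x, p_y: p_xy / (p_x * p_y),
}


def normalize(name, p_xy, p_x, p_y):
    """Affine normalization of a measure, assuming 0 < P(X) and 0 < P(Y) < 1."""
    m = MEASURES[name]
    # Reference values with the margins P(X), P(Y) held fixed.
    x_inc = m(0.0, p_x, p_y)        # incompatibility: P(XY) = 0
    x_ind = m(p_x * p_y, p_x, p_y)  # independence: P(XY) = P(X)P(Y)
    x_imp = m(p_x, p_x, p_y)        # logical implication: P(XY) = P(X)
    x = m(p_xy, p_x, p_y)
    if x >= x_ind:  # X favors Y
        return (x - x_ind) / (x_imp - x_ind)
    return (x - x_ind) / (x_ind - x_inc)  # X disfavors Y


def reference_normalized(p_xy, p_x, p_y):
    """(P(Y|X) - P(Y)) / (1 - P(Y)) if X favors Y, else (P(Y|X) - P(Y)) / P(Y)."""
    p_y_given_x = p_xy / p_x
    if p_y_given_x >= p_y:
        return (p_y_given_x - p_y) / (1 - p_y)
    return (p_y_given_x - p_y) / p_y


for name in MEASURES:
    print(name, normalize(name, 0.3, 0.4, 0.5), normalize(name, 0.1, 0.4, 0.5))
```

With $P(X) = 0.4$ and $P(Y) = 0.5$, the three measures give about 0.5 for $P(XY) = 0.3$ and about -0.5 for $P(XY) = 0.1$, the same as `reference_normalized`.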

Theorem 3. (i) All measures having the same baseline have the same normalizing function.
(ii) All affine-normalizable measures are homographic-normalizable, but the converse is false.

Proof. (i) Consider the four possible forms of $M$.
(a) $M$ is affine-normalizable: its normalizing function is given by the affine expression, with the corresponding normalization coefficients.
(b) $M$ is semihomographic-normalizable to the right: in this case, the right semihomographic expression is used, with the corresponding coefficients.
(c) $M$ is semihomographic-normalizable to the left: this time, the left semihomographic expression is used, with the corresponding coefficients.
(d) $M$ is homographic-normalizable: here, the homographic expression is used, with the corresponding coefficients.
(ii) (a) When the required conditions on the coefficients hold, the four terms of the normalizing function are well defined; therefore, the normalization coefficients can be calculated without difficulty.
(b) When the corresponding condition holds, a projective map onto the required interval can always be obtained for the three functions involved, and therefore the four normalization coefficients are always calculable; by contrast, when it fails, no projective map over the interval can be obtained for the remaining function, so these four coefficients cannot be calculated. This completes the proof.
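Part (ii) can be illustrated with a homographic map $x \mapsto (ax + b)/(cx + d)$: choosing $c = 0$ and $d = 1$ recovers the affine map $ax + b$, so every affine normalization is also homographic. The converse fails for a measure such as the odds ratio, whose implication value is $+\infty$: no affine map sends $+\infty$ to 1, whereas the homographic map $(x - 1)/(x + 1)$ (Yule's Q written in terms of the odds ratio) does so in the limit. The sketch also shows conviction, assuming here that "semihomographic" means homographic on one branch and affine on the other. The helper names are hypothetical, and these maps are not claimed to be the paper's expressions (3) to (5).

```python
import math


def homographic(a, b, c, d):
    """Return x -> (a*x + b) / (c*x + d); c = 0 and d = 1 give the affine map a*x + b."""
    def h(x):
        return (a * x + b) / (c * x + d)
    return h


# An affine map is the special case c = 0, d = 1 of a homographic map.
affine_as_homographic = homographic(2.0, -1.0, 0.0, 1.0)

# Odds ratio: incompatibility 0, independence 1, implication +inf.
# (x - 1) / (x + 1) sends 0 -> -1 and 1 -> 0, and tends to 1 as x -> +inf.
# Evaluate at large finite x; inf / inf would give nan in floating point.
odds_ratio_nf = homographic(1.0, -1.0, 1.0, 1.0)


def conviction_nf(x, p_y):
    """Sketch for conviction P(X)P(not Y) / P(X not Y), assuming 0 < P(Y) < 1.

    Favoring branch [1, +inf) -> [0, 1]: homographic (x - 1) / x.
    Disfavoring branch [1 - P(Y), 1) -> [-1, 0): affine (x - 1) / P(Y).
    """
    if x >= 1.0:
        return 1.0 if math.isinf(x) else (x - 1.0) / x
    return (x - 1.0) / p_y


print(affine_as_homographic(3.0), odds_ratio_nf(0.0), conviction_nf(2.0, 0.3))
```

At incompatibility, conviction equals $1 - P(Y)$, which the disfavoring branch sends to -1.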

Table 1 recalls the respective definitions of the various measures.

4. Conclusion and Perspectives

This study showed that normalizing probabilistic quality measures with a homographic homeomorphism is more powerful than normalization by the affine homeomorphism initiated by André Totohasina. Indeed, we showed that any affine-normalizable measure is homographic-normalizable, while the converse is false. In addition, this work has explained the normalization process by a homographic function, and its combination with an affine function, by surveying the main measures currently in the literature, with the aim of a presentation that is easier to understand. Database research has several branches; this research focuses on the normalization of quality measures. In the database field, the study of association rules has seen considerable development, together with the so-called interestingness measures, and probabilistic quality measures hold an important place in data mining. Consequently, probabilistic quality measures and their normalization must be considered as complementary. As research on the normalization of probabilistic quality measures shows, carrying out the normalization requires a relatively complex theory. Several ways of carrying out this normalization can be considered; in our opinion, the normalizing function is the simplest.

In future work, we intend to study the positive impact of using these normalizing functions in the construction of bases of association rules in binary data mining.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.