International Journal of Mathematics and Mathematical Sciences

Volume 2019, Article ID 7829805, 7 pages

https://doi.org/10.1155/2019/7829805

## An Extension of Totohasina’s Normalization Theory of Quality Measures of Association Rules

^{1}Lycée Mixte Antsiranana, Antsiranana, Madagascar

^{2}Department of Mathematics and Informatics Application, University of Antsiranana, Madagascar

^{3}Department of Mathematics and Informatics Application, University of Toamasina, Madagascar

Correspondence should be addressed to André Totohasina; andre.totohasina@gmail.com

Received 1 September 2018; Accepted 6 January 2019; Published 29 January 2019

Academic Editor: Theodore E. Simos

Copyright © 2019 Armand et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

In the context of binary data mining, Totohasina’s theory of normalization of quality measures of association rules, which aims at a unified view of probabilistic quality measures and is based primarily on affine homeomorphisms, presents some drawbacks. Indeed, it cannot normalize certain interestingness measures, which are described below. This paper presents an extension of that theory: a new normalization method based on proper homographic homeomorphisms, which appears more consistent.

#### 1. Introduction

In the context of implicative statistical analysis (ISA) [1], one cannot set aside the probabilistic quality measures that assess the degree of implicative link between the two patterns of an association rule. Given the very large number of measures in the literature on binary databases, researchers have been working in parallel [2–7] to find more general relationships that allow these various interestingness measures to be classified, partially or entirely. Hence the well-founded concept of “normalization of quality measures under five constraints”, which appeared in the context of data mining in 2003 [2]. This definition of a normalized measure is now established. It turns out that all measures normalizable by affine homeomorphism become comparable [2, 8]. Although the author of [2] has already treated this subject thoroughly, the problem of normalizing probabilistic quality measures is not fully resolved. Indeed, the tool he used, the affine homeomorphism, has a weakness: it cannot normalize a measure that tends to infinity in at least one of the more or less intuitive reference situations, namely* incompatibility, statistical independence, and logical implication* [1, 9–12]. Examples are the measures* Cost Multiplying, Sebag, Conviction, Odd-Ratio, Informal Gain, and Ratio of Example counter-example*. Therefore, as already stated by the author ([13], paragraph 3.2, page 65), “the problem remains open concerning the transformation allowing normalization of other measures in a way still to be specified”. We are interested in this issue: this article proposes and discusses a way to partly solve the problem. First, let us recall our proposed definition of a normalized quality measure, which evolves along three intuitive situations: incompatibility, stochastic independence, and logical implication.

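*Definition 1 (cf. [6, 13–15]). *Let $\mathbb{K} = (\mathcal{T}, \mathcal{I}, \mathcal{R})$ be a context of binary data mining, where $\mathcal{T}$ is the set of all transactions, $\mathcal{I}$ the set of attributes, called items or patterns, and $\mathcal{R}$ a binary relation from $\mathcal{T}$ to $\mathcal{I}$. Let $P$ be the uniform discrete probability on the probabilizable space $(\mathcal{T}, \mathcal{P}(\mathcal{T}))$ [2], $\mu$ a probabilistic quality measure, and $X \to Y$ an association rule, where* X* and* Y* are itemsets with $X \cap Y = \emptyset$. For an itemset $X$, let $X' = \{t \in \mathcal{T} : t\,\mathcal{R}\,i \text{ for all } i \in X\}$ denote the extension of $X$; $X'$ is hence an event of the discrete probability space $(\mathcal{T}, \mathcal{P}(\mathcal{T}), P)$. A quality measure $\mu$ of an association rule is said to be normalized if it satisfies the five conditions below:

(i) $\mu(X \to Y) = -1$ if $P(X' \cap Y') = 0$; i.e., $X'$ and $Y'$ are two incompatible events, which means that the two patterns $X$ and $Y$ are incompatible;

(ii) $-1 < \mu(X \to Y) < 0$ if $P(Y' \mid X') < P(Y')$; i.e., $X'$ and $Y'$ are two negatively dependent events, and as such the patterns $X$ and $Y$ are negatively dependent;

(iii) $\mu(X \to Y) = 0$ if $P(Y' \mid X') = P(Y')$; i.e., the two events $X'$ and $Y'$ are independent, which means that $X$ and $Y$ are independent itemsets;

(iv) $0 < \mu(X \to Y) < 1$ if $P(Y' \mid X') > P(Y')$; i.e., $X'$ and $Y'$ are positively dependent events, and therefore the two itemsets $X$ and $Y$ are positively dependent;

(v) $\mu(X \to Y) = 1$ if $P(Y' \mid X') = 1$, i.e., $X' \subseteq Y'$: the extension of $X$ is then completely included in that of $Y$, so every transaction that contains the pattern $X$ also contains the pattern $Y$ [6].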

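*Example 2. *According to the definition above, it is easy to show that the measure $M_{GK}$ defined by

$$M_{GK}(X \to Y) = \begin{cases} M_{GK}^{f}(X \to Y) & \text{if } X \text{ favors } Y,\\ M_{GK}^{d}(X \to Y) & \text{if } X \text{ disfavors } Y, \end{cases}$$

where $M_{GK}^{f}(X \to Y) = \dfrac{P(Y' \mid X') - P(Y')}{1 - P(Y')}$, if $X$ favors $Y$, and $M_{GK}^{d}(X \to Y) = \dfrac{P(Y' \mid X') - P(Y')}{P(Y')}$, if $X$ disfavors $Y$ ($X'$ and $Y'$ being the extensions of $X$ and $Y$), is normalized.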
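The five conditions of Definition 1 can be checked numerically on this measure. The following is a minimal sketch, assuming the usual form of the measure (ratio to $1 - P(Y')$ when $X$ favors $Y$, ratio to $P(Y')$ when $X$ disfavors $Y$); the function name `m_gk` and its argument names are hypothetical and chosen only for this illustration.

```python
def m_gk(p_x: float, p_y: float, p_xy: float) -> float:
    """Piecewise measure of the rule X -> Y (illustrative helper, not from the paper).

    p_x = P(X'), p_y = P(Y'), p_xy = P(X' ∩ Y'), assuming 0 < p_x and 0 < p_y < 1.
    """
    if not (0.0 < p_x <= 1.0 and 0.0 < p_y < 1.0):
        raise ValueError("need 0 < P(X') <= 1 and 0 < P(Y') < 1")
    if not (0.0 <= p_xy <= min(p_x, p_y)):
        raise ValueError("need 0 <= P(X' ∩ Y') <= min(P(X'), P(Y'))")
    p_y_given_x = p_xy / p_x
    if p_y_given_x >= p_y:
        # X favors Y (independence included): scale by the room left above P(Y')
        return (p_y_given_x - p_y) / (1.0 - p_y)
    # X disfavors Y: scale by P(Y') so that incompatibility gives -1
    return (p_y_given_x - p_y) / p_y


print(m_gk(0.4, 0.5, 0.0))   # incompatibility: prints -1.0
print(m_gk(0.5, 0.5, 0.25))  # independence: prints 0.0
print(m_gk(0.4, 0.5, 0.4))   # logical implication: prints 1.0
```

Intermediate cases fall strictly between these reference values: a rule with $P(Y' \mid X') > P(Y')$ yields a value in $(0, 1)$, and one with $P(Y' \mid X') < P(Y')$ a value in $(-1, 0)$.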

Note that this measure was discovered independently by three groups of authors on three continents: by S. Guillaume in France, in her thesis [15]; by A. Totohasina in Madagascar [2], during his research on normalization, who at that time named it ION (“Implication Oriented Normalised”); and by X. Wu, C. Zhang, and S. Zhang in the USA [5], under the name CPIR (“Conditional Probability Incrementation Ratio”). Its rich and interesting mathematical properties are studied in [6, 15]. In addition, it is historically interesting to notice that this measure was partially discovered as the* Certainty Factor (CF)* by Edward H. Shortliffe and Bruce G. Buchanan in the USA in 1975 [12]. Fernando Berzal et al. [16] established some statistical properties of* CF* and its relation to some common interestingness measures such as Confidence and Conviction.

The rest of the paper is organized as follows. Section 2 recalls some properties of homographic homeomorphisms, the main tool of our contribution. Section 3 proposes a new normalization process based on homographies, in order to solve the aforementioned problem. Section 4 presents the raw results obtained for each of the normalized measures. Section 5 concludes.

#### 2. About Homographic Function Processing Tool

##### 2.1. Definition and Reminders

In mathematics, a homographic function is a function that can be written as the quotient of two affine functions. It is bijective and its inverse function is again a homographic function. In the commutative field $\mathbb{R}$, a homographic function $f$ is a function from $\mathbb{R}$ into itself defined by $f(x) = \dfrac{ax + b}{cx + d}$, where $a$, $b$, $c$, and $d$ are real numbers such that $ad - bc \neq 0$. The quantity $ad - bc$ is required to be nonzero in order to exclude constant functions. Sometimes the condition “$c$ not zero” is added, since the case $c = 0$ corresponds to affine functions, but then the set of homographic functions loses its group structure under composition.

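We retain that a homographic function is a homeomorphism of the form $f(x) = \dfrac{ax + b}{cx + d}$, where $c \neq 0$ and $ad - bc \neq 0$. This function determines a bijection from $\mathbb{R} \setminus \{-d/c\}$ to $\mathbb{R} \setminus \{a/c\}$ whose inverse bijection is $f^{-1}(x) = \dfrac{dx - b}{-cx + a}$, which has the same determinant $ad - bc$. Note that $f^{-1}$ is a homography of the same type as $f$, and that the graphs of $f$ and $f^{-1}$ are hyperbolas.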

If we extend $f$ by setting $f(-d/c) = \infty$ and $f(\infty) = a/c$, we obtain a projective map of $\mathbb{R} \cup \{\infty\}$ onto itself.

*Derivative and Variation.* For a real homography $f(x) = \dfrac{ax + b}{cx + d}$, the derivative is $f'(x) = \dfrac{ad - bc}{(cx + d)^2}$, where $ad - bc$ is the determinant of the matrix $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$; it is therefore called the determinant of the homography and denoted $\det f$. The variations of a homographic function follow: if $\det f$ is positive, then $f$ is strictly increasing on each of its two intervals of definition; if $\det f$ is negative, then $f$ is strictly decreasing on each of them.
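The properties recalled in this subsection (inverse of the same type, shared determinant, sign of the derivative) can be verified on a small example. The sketch below assumes the standard form $x \mapsto (ax + b)/(cx + d)$; the class name `Homography` and its methods are hypothetical names introduced for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Homography:
    """Real homography x -> (a*x + b) / (c*x + d) with a*d - b*c != 0 (illustrative)."""
    a: float
    b: float
    c: float
    d: float

    def __post_init__(self):
        if self.det() == 0:
            raise ValueError("ad - bc must be nonzero (otherwise the map is constant)")

    def det(self) -> float:
        # Determinant of the matrix [[a, b], [c, d]]
        return self.a * self.d - self.b * self.c

    def __call__(self, x: float) -> float:
        # Undefined at the pole x = -d/c (ZeroDivisionError there)
        return (self.a * x + self.b) / (self.c * x + self.d)

    def derivative(self, x: float) -> float:
        # f'(x) = (ad - bc) / (cx + d)^2, so its sign is the sign of the determinant
        return self.det() / (self.c * x + self.d) ** 2

    def inverse(self) -> "Homography":
        # Inverse map x -> (d*x - b) / (-c*x + a); it has the same determinant
        return Homography(self.d, -self.b, -self.c, self.a)


f = Homography(2.0, 1.0, 1.0, 3.0)      # f(x) = (2x + 1) / (x + 3), det = 5
g = f.inverse()
print(f.det(), g.det())                  # both determinants agree
print(g(f(2.0)))                         # composing with the inverse recovers 2.0
```

With a negative determinant, for instance `Homography(1.0, 2.0, 1.0, 1.0)`, the derivative is negative wherever it is defined, in line with the variation rule above.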

##### 2.2. Canonical Form

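Let $f$ be a homography such that $f(x) = \dfrac{ax + b}{cx + d}$, with $ad - bc \neq 0$.

In case $c$ is not zero, the canonical form (also called reduced form) of a homographic function is $f(x) = \beta + \dfrac{\lambda}{x - \alpha}$, where $\alpha = -\dfrac{d}{c}$, $\beta = \dfrac{a}{c}$, and $\lambda = \dfrac{bc - ad}{c^2}$. By a change of reference frame taking the point $S$ of coordinates $(\alpha, \beta)$ as the new origin, the expression of the homographic function becomes $Y = \dfrac{\lambda}{X}$, which corresponds to the inverse function multiplied by the scalar $\lambda$.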
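The canonical form can be checked numerically against the quotient form. This is a small sketch assuming the standard decomposition $f(x) = \beta + \lambda/(x - \alpha)$ with $\alpha = -d/c$, $\beta = a/c$, and $\lambda = (bc - ad)/c^2$; the function name `canonical_form` is hypothetical.

```python
def canonical_form(a: float, b: float, c: float, d: float):
    """Return (alpha, beta, lam) such that (a*x + b)/(c*x + d) = beta + lam/(x - alpha).

    Illustrative helper: requires c != 0 (proper homography) and a*d - b*c != 0.
    """
    if c == 0:
        raise ValueError("c must be nonzero for the canonical form")
    if a * d - b * c == 0:
        raise ValueError("ad - bc must be nonzero")
    alpha = -d / c                       # pole of f (vertical asymptote)
    beta = a / c                         # limit of f at infinity (horizontal asymptote)
    lam = (b * c - a * d) / c ** 2       # scalar multiplying the inverse function
    return alpha, beta, lam


a, b, c, d = 2.0, 1.0, 1.0, 3.0
alpha, beta, lam = canonical_form(a, b, c, d)
for x in (0.0, 2.0, 7.0):
    quotient = (a * x + b) / (c * x + d)
    reduced = beta + lam / (x - alpha)
    print(x, quotient, reduced)          # the two expressions agree up to rounding
```

The point $(\alpha, \beta)$ is the center of the hyperbola, which is why moving the origin there leaves only the term $\lambda / X$.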

*Moral.* Any proper homographic function with nonzero determinant can thus be reduced, by a change of origin, to a function of the type $X \mapsto \dfrac{\lambda}{X}$, with $\lambda \neq 0$.

From now on, we are interested in proper homographies, that is, those with $c \neq 0$: such a homography sends infinity to a finite real number. Since our main objective here is to “make the infinity finite”, we consider the measures that take an infinite value in at least one of the three reference situations* logical implication, stochastic independence, and incompatibility*, and we apply to them the homographic function mentioned above. For any situation not leading to infinity, it remains relevant to use the theory in [13], which is based on an affine homeomorphism. Taking advantage of the fact that affine maps belong to the large family of homographies, where they appear as degenerate homographies sending infinity to infinity, we make use of the whole family of nonconstant homographies. The present theory thus combines the theory in [2] with the one we have just proposed, and is a natural extension of the approach of [14].
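As an illustration of how a proper homography can bring an infinite reference value back to a finite one, consider Conviction, assumed here in its usual form $P(X')P(\overline{Y'})/P(X' \cap \overline{Y'})$: it equals $1$ at independence and tends to infinity at logical implication. The sketch below applies the homography $v \mapsto (v - 1)/v$, a choice made only for this example and not the coefficients derived in Section 3; it sends $1$ to $0$ and infinity to $1$. In the attraction zone the result coincides with the ratio $(P(Y' \mid X') - P(Y'))/(1 - P(Y'))$. All function names are hypothetical.

```python
import math


def conviction(p_x: float, p_y: float, p_xy: float) -> float:
    """Conviction of X -> Y, assumed as P(X')P(not Y') / P(X' and not Y')."""
    p_x_not_y = p_x - p_xy               # P(X' ∩ complement of Y'): counter-examples
    if p_x_not_y == 0.0:
        return math.inf                  # logical implication: no counter-example
    return p_x * (1.0 - p_y) / p_x_not_y


def to_finite(v: float) -> float:
    """Illustrative proper homography v -> (v - 1)/v, extended by inf -> 1."""
    if math.isinf(v):
        return 1.0
    return (v - 1.0) / v


def favoring_ratio(p_x: float, p_y: float, p_xy: float) -> float:
    """(P(Y'|X') - P(Y')) / (1 - P(Y')), meaningful when X favors Y."""
    return (p_xy / p_x - p_y) / (1.0 - p_y)


# Attraction zone only: P(Y'|X') >= P(Y') in each case below
for p_x, p_y, p_xy in [(0.5, 0.5, 0.25), (0.4, 0.5, 0.3), (0.4, 0.5, 0.4)]:
    v = conviction(p_x, p_y, p_xy)
    print(v, to_finite(v), favoring_ratio(p_x, p_y, p_xy))
```

This particular homography does not handle the repulsion zone: at incompatibility Conviction equals $1 - P(Y')$, which it does not send to $-1$. This is consistent with treating the two zones separately.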

##### 2.3. Notations

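For convenience, let us denote by $\mu$ the value $\mu(X \to Y)$ of the probabilistic quality measure, by $\mu_{imp}$ the value of $\mu$ at logical implication, by $\mu_{ind}$ that of $\mu$ at independence, and by $\mu_{inc}$ the value of $\mu$ at incompatibility, each of which may be finite or infinite.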

#### 3. Normalization Process by Homography

Let be the homography normalization of quality measure , the semihomographic normalized of , at right semihomographic normalized of , and at left semihomographic normalized of .

As announced in [2], the main objective of normalizing a quality measure $\mu$ is to bring its values into $[-1, 1]$, under the three conditions that the normalized measure takes the value $-1$ at incompatibility, $0$ at independence, and $1$ at logical implication, so that two normalizable measures can be compared. Recall that if the three reference values of $\mu$ are finite and pairwise distinct, the research carried out in [14] already solves this kind of problem (the normalization of a probabilistic quality measure) by using the following expression of the normalized measure $\mu_n$ of $\mu$:

$$\mu_n(X \to Y) = \begin{cases} x_f\,\mu(X \to Y) + y_f & \text{if } X \text{ favors } Y,\\ x_d\,\mu(X \to Y) + y_d & \text{if } X \text{ disfavors } Y, \end{cases}$$

where the four coefficients $x_f$, $y_f$, $x_d$, and $y_d$, called normalization coefficients, are determined by passing to one-sided limits in the reference situations (incompatibility, independence, and logical implication), using the continuity of the measure in both zones: attraction (positive dependence) and repulsion (negative dependence) [2, 6, 17].
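When the three reference values are finite, the conditions above (value $1$ at logical implication, $0$ at independence, $-1$ at incompatibility) determine an affine normalization in each zone. The sketch below solves these conditions directly, writing the normalized measure as $x_f \mu + y_f$ in the attraction zone and $x_d \mu + y_d$ in the repulsion zone; the coefficient and function names are assumptions of this illustration. It is applied to Confidence $P(Y' \mid X')$, whose reference values are $1$, $P(Y')$, and $0$.

```python
import math


def affine_coefficients(mu_imp: float, mu_ind: float, mu_inc: float):
    """Solve x_f*mu_imp + y_f = 1, x_f*mu_ind + y_f = 0,
    x_d*mu_ind + y_d = 0 and x_d*mu_inc + y_d = -1 (illustrative helper)."""
    values = (mu_imp, mu_ind, mu_inc)
    if any(math.isinf(v) or math.isnan(v) for v in values):
        raise ValueError("affine normalization needs three finite reference values")
    if mu_imp == mu_ind or mu_ind == mu_inc:
        raise ValueError("reference values must be distinct within each zone")
    x_f = 1.0 / (mu_imp - mu_ind)
    y_f = -mu_ind / (mu_imp - mu_ind)
    x_d = 1.0 / (mu_ind - mu_inc)
    y_d = -mu_ind / (mu_ind - mu_inc)
    return x_f, y_f, x_d, y_d


def normalize_affine(mu: float, favors: bool, coeffs) -> float:
    """Apply the attraction-zone or repulsion-zone affine map to a raw value mu."""
    x_f, y_f, x_d, y_d = coeffs
    return x_f * mu + y_f if favors else x_d * mu + y_d


# Confidence P(Y'|X') with P(Y') = 0.2: reference values 1, 0.2 and 0
coeffs = affine_coefficients(mu_imp=1.0, mu_ind=0.2, mu_inc=0.0)
print(normalize_affine(1.0, True, coeffs))    # logical implication, close to 1
print(normalize_affine(0.2, True, coeffs))    # independence, close to 0
print(normalize_affine(0.0, False, coeffs))   # incompatibility, close to -1
```

A reference value equal to infinity makes these formulas meaningless, which is precisely the situation that the homographic normalization is meant to handle.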

If one or two of the three values , , and are infinite and in case we have two infinite values, it is necessary that is excluded, which leads us to use one of the three following expressions to find the four real coefficients, , , , and : These four coefficients are still determined by passing unilateral limits in situations of reference (incompatibility, independence, and logic implication) by taking into account the continuity evolution in both zones: attraction (positive dependence) and repulsion (negative dependence). In case where favors , can be infinite, , and ; then we obtain the following system of equations:As and , so you only use the theories in [2] for the left normalization. We can write the system of four nonlinear equations, with the following four unknowns:We are just here to take and we have four equations with four unknowns, with this particularity that the coefficient can be infinite. Hence we have the following proposition.

Proposition 3. *( i) If with , , and pairwise distinct, then the system of equations (4) admits four real solutions; (ii) if = and , then the system of equations (4) has four real solutions such that = 1, = -, , and = -; (iii) otherwise, the system of equations has no solution.*

*Proof. * The system of equations (4) is equivalent to the system of equations (5):It became a system of linear equations. The matrix writing system of equations (5) is given by the vector equation (6):Let us call = So we must have = ; so (*i*); for , just take ; for , if , the last two equations do not make sense.

The following system of equations presents the common features of equation (5) with the only difference that can be infinite, and . This gives the system of equations (7).Hence we obtain the following proposition.

Proposition 4. *( i) If , with , , and pairwise distinct, then the system of equations (7) admits four real solutions; (ii) if = and , then the system of equations (7) has four real solutions such that = -1, and , ; (iii) otherwise, the system of equations has no solution.*

*Proof. * The system of equations (7) is equivalent to the system of equations (8):It became a system of linear equations. The matrix writing the system of equations (8) is given by the vector equation (9):Let us call = So we must have = ; so (*i*); for , just take lim ; for , if , then the last two equations do not make sense.

The following system is similar to the previous equation (7), this time with and , including case where ; then in this case, m must be nonzero. Take, for example, m = 1.Then we obtain the following proposition.

Proposition 5. *(i) If , with , , , and pairwise distinct, then the system of equations (10) has four real solutions; (ii) if , , and , then the system of equations (10) has four real solutions such that , and , ; (iii) otherwise, the system of equations has no solution.*

*Proof. * The system of equations (10) is equivalent to the system of equations (11):It is reduced to a system of linear equations. The matrix writing of the system of equations (11) is given by the vector equation (12):Let us call = Thus we must have = ; so (*i*); for , just take ; for , if , then the last two equations do not make sense.

The current form of the studied system was the appearance of the system of equations (13):In (13) and can be infinite, is real, and .

Proposition 6. *(i) If , with , , and pairwise distinct, then the system of equations (13) has four real solutions; (ii) if , and if , then the system of equations (13) has four real solutions such that , and , ; (iii) otherwise, the system of equations has no solution.*

*Proof. *(*i*) The system of equations (13) is equivalent to the system of equations (14):It therefore comes to a system of linear equations. The matrix writing system of equations (14) is given by the vector equation (15): Let us call = Thus, we must always have ; for , just take and ; for , if , then all the equations do not make sense.

*Note 7. *Note that the matrices associated with the four propositions above all have the same determinant; this means that all the measures concerned share the same condition of normalizability.

#### 4. Application of These Four Propositions

We recall in Table 1 the respective definitions of the various measures that lead to the results below.