International Journal of Mathematics and Mathematical Sciences
Volume 2018, Article ID 4814716, 8 pages
https://doi.org/10.1155/2018/4814716
Research Article

Toward a Theory of Normalizing Function of Interestingness Measure of Binary Association Rules

1Lycee Mixte Antsiranana, Madagascar
2Department of Mathematics and Informatics, University of Antsiranana, Madagascar
3Department of Mathematics, Informatics and Applications, University of Toamasina, Madagascar

Correspondence should be addressed to André Totohasina; andre.totohasina@gmail.com

Received 2 September 2018; Accepted 29 October 2018; Published 14 November 2018

Academic Editor: Theodore E. Simos

Copyright © 2018 Armand Armand et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

More than sixty interestingness measures have been proposed in the association rule mining literature since 1993. Given their importance, research on the normalization of probabilistic quality measures of association rules has already produced many tangible results that help consolidate the various existing measures. This article recommends a simple way to perform this normalization. In the interest of a unified presentation, it also introduces a new concept, the normalizing function, as an effective tool for solving the problem of normalizing measures, which already have their own normalizing functions.

1. Introduction

1.1. Definitions and Notations

We always work in the framework of a binary data mining context $\mathbb{K} = (\mathcal{T}, \mathcal{I}, \mathcal{R})$ (see, for example, [1–3], which illustrate the importance of association rule mining based on the choice of a quality measure), where $\mathcal{I}$ is a nonempty finite set of attributes or variables, $\mathcal{T}$ a finite set of entities or objects, $\mathcal{R}$ a binary relation from $\mathcal{T}$ to $\mathcal{I}$, and $P$ the discrete uniform probability on the probability space $(\mathcal{T}, \mathcal{P}(\mathcal{T}))$ [4, 5].

In the next sections, we use the following notation for two itemsets $X$, $Y$: $X' = \{t \in \mathcal{T} \mid \forall i \in X,\ t\,\mathcal{R}\,i\}$, i.e., the set of all transactions containing the pattern $X$, which is the dual of the pattern $X$ [4, 6, 7]; $n = |\mathcal{T}|$ represents the total sample size; $n_X = |X'|$ represents the number of transactions satisfying the pattern $X$; $n_{XY} = |X' \cap Y'|$ represents the number of transactions satisfying both $X$ and $Y$; $n_{X\overline{Y}} = n_X - n_{XY}$, where $\overline{Y}$ is the logical negation of $Y$; $P(X) = n_X / n$ represents the support of the pattern $X$.
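The counting notation above can be made concrete with a short sketch. The toy context, the item names, and the helper names (`extent`, `count`, `count_not`, `support`) are illustrative assumptions and do not come from the article; the code only mirrors the verbal definitions (transactions containing a pattern, joint counts, counts with a negated pattern, and support under the uniform probability).

```python
from typing import Dict, FrozenSet, Iterable, Set

# Hypothetical toy context: each transaction id maps to the items it contains.
CONTEXT: Dict[str, FrozenSet[str]] = {
    "t1": frozenset({"a", "b", "c"}),
    "t2": frozenset({"a", "b"}),
    "t3": frozenset({"a", "c"}),
    "t4": frozenset({"b"}),
    "t5": frozenset({"c"}),
}


def extent(itemset: Iterable[str], context: Dict[str, FrozenSet[str]] = CONTEXT) -> Set[str]:
    """Return the set of transactions that contain every item of the pattern."""
    items = frozenset(itemset)
    return {t for t, row in context.items() if items <= row}


def count(itemset: Iterable[str], context: Dict[str, FrozenSet[str]] = CONTEXT) -> int:
    """Number of transactions satisfying the pattern."""
    return len(extent(itemset, context))


def count_not(x: Iterable[str], y: Iterable[str],
              context: Dict[str, FrozenSet[str]] = CONTEXT) -> int:
    """Number of transactions satisfying x but not y."""
    return len(extent(x, context) - extent(y, context))


def support(itemset: Iterable[str], context: Dict[str, FrozenSet[str]] = CONTEXT) -> float:
    """Support of the pattern under the uniform probability on transactions."""
    return count(itemset, context) / len(context)


if __name__ == "__main__":
    print(sorted(extent({"a"})))       # transactions containing item a
    print(count({"a", "b"}))           # joint count of a and b
    print(count_not({"a"}, {"b"}))     # a without b
    print(support({"a"}))              # support of a
```

In this sketch the count of a pattern always equals the joint count with a second pattern plus the count with that second pattern negated, which is the identity most measure definitions rely on.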

Hereafter, our work is divided into three sections. Section 2 gives the definition of the normalizing function. Section 3 presents the raw results on the normalizing functions of some probabilistic quality measures. Finally, Section 4 sets out the conclusion and perspectives [5, 8–10].

2. Normalizing Function

2.1. Motivations

The theory and practice of normalizing probabilistic quality measures (see, for example, [4, 5, 11], Totohasina et al. [12], [6, 7]) now belong to the set of tools for solving data mining problems. The aim is to group together [3, 4, 6, 12–17] the different measures available in the literature. Note that [4] proves the existence of infinitely many quality measures through the concept of the so-called normalized quality measure under five conditions, yet [16] recently proposed still another interestingness measure. Opening the door to the definition of new concepts in the context of data mining may prompt further reflection among researchers in this field. The concept proposed here is the normalizing function. What is meant by a normalizing function? The following section attempts to answer this question. Recall that this paper is the logical continuation of the paper [18].

2.2. Proposal Approach

Definition 1. We consider a probabilistic measure of interest $\mu$ of an association rule $X \to Y$. We call normalizing function of $\mu$ the piecewise continuous function that normalizes the measure $\mu$ directly. It takes the following particular values: 1 at $\mu_{imp}$, 0 at $\mu_{ind}$, and $-1$ at $\mu_{inc}$, where $x$ is the value of $\mu$; $\mu_{imp}$ is the value of $\mu$ at logical implication; $\mu_{ind}$ is the value of $\mu$ at independence; and $\mu_{inc}$ is the value of $\mu$ at incompatibility. The function has one branch if $X$ favors $Y$, that is to say, in the case where $P(Y \mid X) > P(Y)$, and another branch if $X$ disfavors $Y$, that is to say, in the case where $P(Y \mid X) < P(Y)$. Four forms are distinguished, according to whether the measure $\mu$ is affine normalizable, homographic normalizable, semihomographic normalizable to the right, or semihomographic normalizable to the left. Finally, $x_f$, $y_f$, $x_d$, and $y_d$ are the normalization coefficients of $\mu$, and $x$ is real.

The definition of the normalizing function thus depends on the quality measure. Note that, with respect to its variable, it is a function like any other, namely a numerical function of a real variable; the only particularity is that this variable is the value taken by the measure.

The four normalization coefficients of a probabilistic quality measure have to meet conditions expressed by means of five real functions. We note in passing that some groups of measures share the same normalization coefficients. The normalizing function is also one of the means of interpreting a normalized measure once the respective values of the four normalization coefficients have been provided. It also provides a common scale of values for all normalized measures. According to the above, any normalizing function that meets the objective of normalization must necessarily have the following properties.

Property 2 (necessary conditions). (i) In the favoring case, the normalizing function is continuous, positive, and strictly increasing on the interval between $\mu_{ind}$ and $\mu_{imp}$ (the values of the measure at independence and at logical implication) and realizes a bijection onto the interval between 0 and 1; that is to say, it must have limit 1 at the point $\mu_{imp}$ and limit 0 at the point $\mu_{ind}$.
(ii) In the disfavoring case, the normalizing function is continuous, negative, and strictly increasing on the interval between $\mu_{inc}$ and $\mu_{ind}$ (the values of the measure at incompatibility and at independence) and realizes a bijection onto the interval between $-1$ and 0; that is to say, it must have limit 0 at the point $\mu_{ind}$ and limit $-1$ at the point $\mu_{inc}$. As a recap, we have the following.
(iii) Overall, the normalizing function is continuous at the point $\mu_{ind}$, increasing on the interval between $\mu_{inc}$ and $\mu_{imp}$, and realizes a bijection onto the interval between $-1$ and 1; that is to say, it must have limit 1 at the point $\mu_{imp}$; limit 0 at the point $\mu_{ind}$, both from the left and from the right; and limit $-1$ at the point $\mu_{inc}$.
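The necessary conditions above can be checked numerically in the simplest case. The following is a minimal sketch, assuming an affine form on each side of independence and three finite reference values with the value at incompatibility below the value at independence below the value at logical implication. The function names and the coefficient names `x_f`, `y_f`, `x_d`, `y_d` are assumptions made for illustration; the coefficients are simply the solutions of the two linear systems "value 1 at implication and 0 at independence" and "value 0 at independence and -1 at incompatibility".

```python
import math
from typing import Tuple


def affine_coefficients(mu_imp: float, mu_ind: float, mu_inc: float) -> Tuple[float, float, float, float]:
    """Solve the boundary conditions for an affine map on each side.

    Favoring side:    x_f * mu_imp + y_f = 1 and x_f * mu_ind + y_f = 0.
    Disfavoring side: x_d * mu_ind + y_d = 0 and x_d * mu_inc + y_d = -1.
    """
    values = (mu_imp, mu_ind, mu_inc)
    if not all(math.isfinite(v) for v in values) or not (mu_inc < mu_ind < mu_imp):
        raise ValueError("expected finite values with mu_inc < mu_ind < mu_imp")
    x_f = 1.0 / (mu_imp - mu_ind)
    y_f = -mu_ind / (mu_imp - mu_ind)
    x_d = 1.0 / (mu_ind - mu_inc)
    y_d = -mu_ind / (mu_ind - mu_inc)
    return x_f, y_f, x_d, y_d


def affine_normalize(x: float, mu_imp: float, mu_ind: float, mu_inc: float) -> float:
    """Piecewise affine normalization of a measure value x."""
    x_f, y_f, x_d, y_d = affine_coefficients(mu_imp, mu_ind, mu_inc)
    if x >= mu_ind:               # favoring side
        return x_f * x + y_f
    return x_d * x + y_d          # disfavoring side


if __name__ == "__main__":
    # Hypothetical reference values, e.g. a measure equal to 1 at implication,
    # 0.4 at independence, and 0 at incompatibility.
    for x in (0.0, 0.2, 0.4, 0.7, 1.0):
        print(x, affine_normalize(x, 1.0, 0.4, 0.0))
```

With finite reference values the two affine pieces meet at independence, so the resulting function is continuous there and strictly increasing overall. The sketch deliberately rejects an infinite reference value, because no affine map can send an infinite value to 1.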

3. Applications

We recall in Table 1 the respective definitions of the various measures that lead to the results below.

(1) Cost multiplying: using the theory for finding the four normalization coefficients in [18], and replacing the reference values and the coefficients by their values in expression (3), we obtain its normalizing function. It is easy to see that this function is piecewise continuous, and continuous in particular at the independence point. Finally, it is a function that exhibits the necessary and sufficient conditions for normalized and continuous measures. Its variation table and its graph are presented in Figures 1 and 2, respectively. The graph and the variation table of the normalizing function of the Cost Multiplying measure reveal that its favoring branch is strictly increasing, positive, and bijective, and likewise that its disfavoring branch is strictly increasing, negative, and bijective. These results show that the normalizing function as a whole is strictly increasing and bijective.

Note that, in the following, the four normalization coefficients of the other measures are found by the same principle as for the Cost Multiplying measure, since these measures are already expressed in Table 1.

(2) Example counter-example: its normalizing function is obtained from expression (4).
(3) Informal gain: its normalizing function is obtained from expression (4).
(4) Odds ratio: its normalizing function is obtained from expression (3).
(5) Conviction: its normalizing function is obtained from expression (5).
(6) Sebag: its normalizing function is obtained from expression (3).
(7): its normalizing function is obtained from expression (2).
(8) Support: its normalizing function is obtained from expression (2).
(9) Confidence: its normalizing function is obtained from expression (2).
(10) Recall: its normalizing function is obtained from expression (2).
(11) Lift: its normalizing function is obtained from expression (2).
(12) Leverage: its normalizing function is obtained from expression (2).
(13) Centered confidence: its normalizing function is obtained from expression (2).
(14) Featured confirmed confidence: its normalizing function is obtained from expression (2).
(15) Certainty factor: its normalizing function is obtained from expression (2).
(16) Gras implication index: its normalizing function is obtained from expression (2).
(17) Piatetsky-Shapiro: its normalizing function is obtained from expression (2).
(18) Cosine: its normalizing function is obtained from expression (2).
(19) Loevinger: its normalizing function is obtained from expression (2).
(20) Cohen or Kappa: its normalizing function is obtained from expression (2).
(21) Addiction: its normalizing function is obtained from expression (2).
(22) Novelty: its normalizing function is obtained from expression (2).
(23) Czekanowski-Dice or F-measure: its normalizing function is obtained from expression (2).
(24) Relative risk: its normalizing function is obtained from expression (2).
(25) Negative reliability: its normalizing function is obtained from expression (2).

Examination of the expressions of these normalizing functions shows that certain measures have identical normalizing functions, which leads to the theorem below.

Table 1: Expressions of the probabilistic quality measures.
Figure 1: Variation of the normalizing function of the “Cost Multiplying” measure.
Figure 2: Geometric interpretation of the normalization function of the “Cost Multiplying” measure.
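The grouping effect of normalization on the measures listed in this section can be illustrated on two of them, confidence (9) and lift (11). The sketch below is an illustration under stated assumptions, not the authors' computation: it assumes the usual definitions (confidence as the conditional probability of the consequent given the premise, lift as that probability divided by the probability of the consequent), derives the three reference values from those definitions, and applies a piecewise affine normalization. The function names are hypothetical.

```python
def normalize(x: float, mu_imp: float, mu_ind: float, mu_inc: float) -> float:
    """Piecewise affine map sending mu_inc, mu_ind, mu_imp to -1, 0, 1."""
    if x >= mu_ind:
        return (x - mu_ind) / (mu_imp - mu_ind)
    return (x - mu_ind) / (mu_ind - mu_inc)


def normalized_confidence(p_x: float, p_y: float, p_xy: float) -> float:
    """Confidence is 1 at implication, p_y at independence, 0 at incompatibility."""
    confidence = p_xy / p_x
    return normalize(confidence, 1.0, p_y, 0.0)


def normalized_lift(p_x: float, p_y: float, p_xy: float) -> float:
    """Lift is 1 / p_y at implication, 1 at independence, 0 at incompatibility."""
    lift = p_xy / (p_x * p_y)
    return normalize(lift, 1.0 / p_y, 1.0, 0.0)


if __name__ == "__main__":
    # Hypothetical probabilities: one favoring rule and one disfavoring rule.
    for p_x, p_y, p_xy in ((0.4, 0.5, 0.3), (0.4, 0.5, 0.1)):
        print(normalized_confidence(p_x, p_y, p_xy), normalized_lift(p_x, p_y, p_xy))
```

Under these assumptions the two measures have different normalization coefficients but the same normalized value for every rule, which is one way to see why normalization helps regroup measures from the literature.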

Theorem 3. (i) All measures having the same baseline have the same normalizing function.
(ii) All affine-normalizable measures are homographic normalizable, but the converse is false.

Proof. (i) We consider the four possible forms of the normalizing function: (a) the measure is affine normalizable; (b) the measure is semihomographic normalizable to the right; (c) the measure is semihomographic normalizable to the left; (d) the measure is homographic normalizable. In each case, the corresponding expression of the normalizing function is used with the corresponding normalization coefficients.
(ii) (a) Under the conditions of affine normalizability, the four terms of the normalizing function are well defined; therefore, there is no problem in calculating the normalization coefficients.
(b) Under the conditions of homographic normalizability, a projective map can always be obtained for the homographic and the two semihomographic forms, and therefore the four normalization coefficients can always be calculated; by contrast, when those conditions exclude the affine case, a projective map can never be obtained over the interval for the affine form, and these four coefficients cannot be calculated. This proves the theorem.
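Part (ii) of the theorem can be illustrated with the odds ratio. The sketch below is a hedged example and not necessarily the expression (3) used by the authors: it assumes the usual contingency-table definition of the odds ratio, for which the value at logical implication is infinite, and it uses the homography that maps x to (x - 1) / (x + 1), known as Yule's Q, as one possible homographic normalization. The function names are hypothetical.

```python
import math


def odds_ratio(n: int, n_x: int, n_y: int, n_xy: int) -> float:
    """Odds ratio of a rule from the counts of a 2x2 contingency table."""
    a = n_xy                        # premise and consequent
    b = n_x - n_xy                  # premise without consequent
    c = n_y - n_xy                  # consequent without premise
    d = n - n_x - n_y + n_xy        # neither
    if b * c == 0:
        if a * d == 0:
            raise ValueError("odds ratio undefined for this table")
        return math.inf             # no counter-example: logical implication
    return (a * d) / (b * c)


def homographic_normalize(x: float) -> float:
    """Map an odds ratio from [0, +inf] to [-1, 1] with (x - 1) / (x + 1)."""
    if math.isinf(x):
        return 1.0                  # limit of the homography at +infinity
    return (x - 1.0) / (x + 1.0)


if __name__ == "__main__":
    # Hypothetical counts: incompatibility, independence, implication.
    for n_xy in (0, 20, 40):
        ratio = odds_ratio(100, 40, 50, n_xy)
        print(n_xy, ratio, homographic_normalize(ratio))
```

The homography sends 0 to -1, 1 to 0, and the infinite value to 1, and it is strictly increasing on the nonnegative reals, so it satisfies the limit conditions of Property 2 where no affine map could.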

Table 1 recalls the respective definitions of the various measures.

4. Conclusion and Perspectives

This study showed that normalizing probabilistic quality measures with a homographic homeomorphism is more powerful than the normalization by affine homeomorphism initiated by André Totohasina. Indeed, we showed that any affine-normalizable measure is homographic normalizable, while the converse is false. Besides, this work explained the process of normalization by a homographic function, and by its combination with an affine function, by going through the main measures in the literature, with the aim of a presentation that is easier to understand. The database field has several branches; the purpose of this research is the normalization of quality measures. In this field, the study of association rules is developing substantially, as is the number of so-called interestingness measures, and probabilistic quality measures hold an important place in data mining. Consequently, a probabilistic quality measure and its normalization must complement each other. As research on the normalization of probabilistic quality measures shows, carrying out the normalization requires a relatively complex theory. Several ways of carrying out this normalization process can be considered. In our opinion, the use of the normalizing function seems the simplest.

In future work, we intend to study the positive impact of taking these normalizing functions into account when building rule bases in binary data mining.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

References

  1. R. Agrawal, T. Imieliński, and A. Swami, “Mining association rules between sets of items in large databases,” in Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD '93), pp. 207–216, 1993.
  2. M. Kamber and R. Shinghal, “Proposed interestingness measure for characteristic rules,” Review, vol. 20, no. 1, p. 21, 1996.
  3. D. Grissa, Etude comportementale des mesures d'intérêt d'extraction de connaissance [Ph.D. thesis], Université Clermont-Ferrand II, 2013.
  4. A. Totohasina, “Towards a theory unifying implicative interestingness measures and critical values consideration in MGK,” Educ. Matem. Pesq., vol. 16, no. 3, pp. 881–900, Sao Paulo, 2014.
  5. G. Yongmei and B. Fuguang, “The Research on Measure Method of Association Rules Mining,” Journal of Database Theory and Application, vol. 8, no. 2, pp. 245–258, 2015.
  6. H. F. Rakotomalala, B. B. Ralahady, and A. Totohasina, “A Novel cohesitive implicative classification based on MGK and application on diagnostic on informatics literacy of students of higher education in Madagascar,” in Third International Congress on Information and Communication Technology, vol. 797 of Advances in Intelligent Systems and Computing, pp. 161–174, Springer Singapore, Singapore, 2019.
  7. B. Shekar and R. Natarajan, “A Transaction-Based Neighbourhood-Driven Approach to Quantifying Interestingness of Association Rules,” in Proceedings of the Fourth IEEE International Conference on Data Mining (ICDM'04), pp. 194–201, Brighton, UK.
  8. X. Wu, C. Zhang, and S. Zhang, “Efficient mining of both positive and negative association rules,” ACM Transactions on Information Systems, vol. 22, no. 3, pp. 381–405, 2004.
  9. L. Nguyen, B. Vo, and T.-P. Hong, “CARIM: An efficient algorithm for mining class-association rules with interestingness measures,” International Arab Journal of Information Technology, vol. 12, no. 6A, pp. 627–634, 2015.
  10. P. Tan, V. Kumar, and J. Srivastava, “Selecting the right interestingness measure for association patterns,” in Proceedings of the eighth ACM SIGKDD international conference, p. 32, Edmonton, Alberta, Canada, July 2002.
  11. N. Bhargava and M. Shukla, “Survey of Interestingness Measures for Association Rules Mining: Data Mining, Data Science for Business Perspective,” IRACST - International Journal of Computer Science and Information Technology & Security (IJCSITS), vol. 6, no. 2, pp. 2249–2249-9555, 2016.
  12. J. Diatta, H. Ralambondrainy, and A. Totohasina, “Towards a unifying probabilistic implicative normalized quality measure for association rules,” Studies in Computational Intelligence, vol. 43, pp. 237–250, 2007.
  13. A. Totohasina, Contribution à l’étude des mesures de qualité des règles d’associations: normalisation sous cinq contraintes et cas de MGK: propriétés, bases composites des règles et extension en vue d’applications en statistique et en sciences physiques [Ph.D. thesis], Université d’Antsiranana, Mathématiques Informatique, Madagascar, 2008.
  14. P. Bemarisika and A. Totohasina, “Optimized mining of potential positive and negative association rules,” Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 10440, pp. 424–432, 2017.
  15. A. Totohasina and H. Ralambondrainy, “ION: A pertinent new measure for mining information from many types of data,” in Proceedings of the 1st IEEE International Conference on Signal-Image Technology and Internet-Based Systems, SITIS 2005, pp. 202–207, Cameroon, December 2005.
  16. M. Hahsler and K. Hornik, New Probabilistic Interest Measures for Association Rules, Vienna University of Economics and Business Administration, Augasse 2–6, A-1090 Vienna, Austria, 2018.
  17. L. Lin, M.-L. Shyu, and S.-C. Chen, “Association rule mining with a correlation-based interestingness measure for video semantic concept detection,” International Journal of Information and Decision Sciences, vol. 4, no. 2-3, pp. 199–216, 2012.
  18. A. Armand, A. Totohasina, and D. R. Feno, An extension of Totohasina’s normalization theory of quality measures of association rules, 2018, In press.