Research Article | Open Access
Xiaoyan Liu, Feng Feng, Qian Wang, Ronald R. Yager, Hamido Fujita, José Carlos R. Alcantud, "Mining Temporal Association Rules with Temporal Soft Sets", Journal of Mathematics, vol. 2021, Article ID 7303720, 17 pages, 2021. https://doi.org/10.1155/2021/7303720
Mining Temporal Association Rules with Temporal Soft Sets
Traditional association rule extraction may run into difficulties because it ignores the temporal aspect of the collected data. In particular, it often happens that some item sets are frequent during specific time periods, although they are not frequent in the whole data set. In this study, we make an effort to enhance conventional rule mining by introducing temporal soft sets. We define temporal granulation mappings to induce granular structures for temporal transaction data. Using this notion, we define temporal soft sets and their $Q$-clip soft sets to establish a novel framework for mining temporal association rules. A number of useful characterizations and results are obtained, including a necessary and sufficient condition for fast identification of strong temporal association rules. By combining temporal soft sets with NegNodeset-based frequent item set mining techniques, we develop the negFIN-based soft temporal association rule mining (negFIN-STARM) method to extract strong temporal association rules. Numerical experiments are conducted on commonly used data sets to show the feasibility of our approach. Moreover, comparative analysis demonstrates that the newly proposed method achieves higher execution efficiency than three well-known approaches in the literature.
In modern society, vast amounts of data are produced and collected daily in all walks of life. With the increasing amount of data, there has been an urgent need for powerful models, methods, and apparatuses to facilitate data analysis. In response to this demand, data mining has emerged and become a fast-growing research field with various fascinating topics and practical applications. Data mining is a multidisciplinary field, which involves applied mathematics, computer science, information science, statistics, and other disciplines. In the process of knowledge discovery in databases (KDD), data mining is viewed as the most essential step, in which sophisticated methods are applied to extract knowledge or patterns from data. As shown in Figure 1, six fundamental tasks in data mining are association rule mining, clustering, classification, regression, summarization, and sequence analysis. In a more general perspective, some researchers treat data mining as a synonym for KDD. Data mining has proven to be useful in a myriad of areas including biological statistics, case-based reasoning, factor analysis of heart disease, pattern classification, and group role assignment.
Association rule mining, also known as association analysis or association rule learning, is of great importance in the realm of knowledge discovery and data mining. It was originally proposed with the aim of finding frequent patterns in transactional databases and potential association rules between different item sets. Most rule extraction algorithms belong to one of the following two categories. The first is the class of “candidate generation” methods, with the Apriori algorithm as its typical representative. The main drawback of these methods is that they require multiple database scans. The second category consists of “pattern growth” methods such as the FP-growth algorithm, which relies on a tree-based data structure (the FP-tree) to store basic information about frequent item sets. More specifically, it does not generate candidate item sets, and it does not require multiple scans of the database because it saves basic information about frequent item sets in a custom-built data structure. In addition, Zaki proposed a vertical mining algorithm with lower I/O costs, called equivalence class transformation (ECLAT). However, the performance of ECLAT can degrade on dense databases. By using the bitmap representation of sets, Aryabarzan et al. presented a crucial data structure named NegNodeset and developed the NegNodeset-based Frequent Itemset Mining (negFIN) algorithm. The prominent features of the negFIN algorithm are threefold. Firstly, it makes use of bitwise operators to extract the NegNodesets of item sets. Secondly, it significantly reduces the complexity of computing supports. Lastly, it generates frequent item sets by using a structure called the set-enumeration tree, and meanwhile, it efficiently prunes the search space with the promotion method. Djenouri et al. developed an efficient parallel genetic algorithm for extracting diversified association rules in big data sets.
To further improve pattern mining in big data, Luna et al. designed several sophisticated algorithms which rely on the MapReduce paradigm and its Hadoop implementation. Nevertheless, it should be noted that the abovementioned rule extraction methods may sometimes produce redundant or incoherent association rules. In view of this, Feldman et al. proposed the maximal association rule, a complementary apparatus for extracting interesting association rules that are frequently lost when using regular association rules. Amir et al. contributed additional developments regarding the exact conceptualization and efficient identification of maximal association rules. In addition to objective measures such as support, confidence, and correlation, some researchers have considered subjective measures such as risk, interest, and utility to discover useful item sets and association rules. In particular, a new research direction named utility pattern mining [15–17] has received considerable attention in recent years.
Temporal association rule mining (TARM) is one of the most fascinating topics in the field of association rule mining. It has been successfully applied to a wide range of domains such as cancer treatment, gene analysis, and web mining. Depending on whether the time variable is considered as an implied or integral component, Segura-Delgado et al. systematically classified the existing TARM approaches into two main categories. Agrawal and Srikant coined the term sequential pattern to facilitate the analysis of a transaction database. Inspired by this seminal idea, many scholars have conducted in-depth research on sequential rule mining. Zhai et al. designed a time constraint-based rule mining algorithm, called T-Apriori, to analyse the sequence of ecological events. Gan et al. presented a projection-based utility mining method which is useful for mining high-utility sequential patterns from sequence data. Hong et al. constructed a hierarchical granular framework to enhance TARM by considering different levels of time granules. Song et al. detected changes in customer behavior by mining temporal association rules from customer profiles and sales data at different time snapshots. Yun et al. designed an efficient algorithm to discover high-utility patterns from incremental databases by constructing a global data structure through a single scan.
Molodtsov’s soft set theory provides a formal framework for coping with uncertainty. Its basic principle relies on parameterization: uncertainly defined objects should be recognized from various facets, and every single feature yields an approximate description of the object. Maji et al. soon presented several operations of soft sets to complement Molodtsov’s original work. Ali et al. introduced several new operations to consolidate the basis of soft set theory. Babitha and Sunil extended the ideas of functions and relations by virtue of soft set theory. Feng and Li clarified the relations among several kinds of soft subsets and discovered that soft sets satisfy new algebraic properties. By combining soft sets and fuzzy sets, Maji et al. proposed a hybrid concept named fuzzy soft sets. Later on, several more complicated extensions of soft sets were developed and investigated [34–38]. Ali and Shabir developed some logic connectives in (fuzzy) soft set theory. A distance-based algorithm was also designed for fuzzy soft set parameter reduction. Several works pointed out that fuzzy sets, rough sets, and soft sets are closely connected models [41–43]. They model uncertainty from independent perspectives, namely, gradualness, granularity, and parameterization, respectively. Feng et al. initiated several hybrid structures combining rough sets, soft sets, and fuzzy sets. Taking a soft set as the underlying granulation structure, Feng et al. proposed soft rough sets. Soft sets and related extensions have been widely used in many distinct domains, such as decision-making [46–51], valuation of assets, clustering, medical diagnosis, parameter reduction, feature selection, data analysis, BCK/BCI-algebras [58–60], graph theory, and computational biology. The reader is referred to John’s latest monograph for more details regarding soft set theory and its applications.
With the assistance of soft set theory, Herawan and Deris made an innovative proposal for identifying association rules from transaction data sets. Their pioneering work opened up a new research direction, aiming at developing soft set-based approaches to rule extraction. Some concepts, including logical formulas over soft sets and the basic soft truth degree of formulas, were first introduced to study approximate reasoning based on soft sets. Feng et al. revisited Herawan and Deris’s initial idea and refined several important notions to promote (maximal) association rule mining by virtue of soft set theory. Two important observations motivate us to continue this line of exploration:
(1) Neglecting the temporal aspect of data in the abovementioned association rule extraction approaches [64, 66] may cause some limitations. For instance, some item sets are frequent within certain time periods even if they are not frequent in the whole data set over the entire time-span. It is nonetheless meaningful to discover such item sets, since a commodity may sell exceptionally well in a specific season but not during the rest of the year.
(2) The identification of temporal frequent item sets, an essential step in the TARM process, can be facilitated by integrating time as a new component into soft set theory. In fact, the BitMap Coding (BMC) tree must be built to generate the node sets corresponding to frequent 1-item sets in the NegNodeset-based frequent item set mining process. The bit values at the indices of the temporal frequent 1-item sets can be combined to form the bitmap code of a temporal frequent item set. This indicates that temporal soft sets and the $Q$-clip soft sets introduced in the current work provide a helpful apparatus for the construction of BMC trees.
To address these issues, the current study focuses on enhancing association rule extraction with the aid of temporal soft sets. The main contributions of this study are summarized as follows:
(1) We define some new concepts such as temporal granulation mappings, temporal soft sets, and $Q$-clip soft sets in order to establish a conceptual framework for extracting temporal association rules.
(2) We present a number of useful characterizations and results within the established framework, including a necessary and sufficient condition for fast identification of strong temporal association rules.
(3) We develop an effective approach, called negFIN-STARM, to extract strong temporal association rules by virtue of temporal soft sets and NegNodeset-based frequent item set mining.
The rest of this paper is arranged as follows. Section 2 provides the rudiments of TARM. Section 3 proposes several fundamental notions such as temporal soft sets and $Q$-clip soft sets. Section 4 focuses on soft temporal association rule mining and develops the negFIN-STARM approach. Section 5 is devoted to numerical experiments and a comparative analysis of four different methods for extracting temporal association rules. Section 6 concludes this research and points out future research directions.
2. Temporal Association Rules
This section focuses on temporal association rules. First, we recall some fundamental definitions from the literature.
Definition 1 (see ). An item endowed with a time-stamp is called a temporal item. A temporal item set means a nonempty set of temporal items.
Definition 2 (see ). Assume that is a set of transactions on a temporal item set and the positive integer is the selected support threshold. Then, we say that is a temporal frequent item set with respect to and , when .
Definition 3 (see ). A pair of disjoint temporal item sets is called a temporal association rule (TAR). Let and , respectively, represent the right and left temporal item sets. Then, we denote a TAR by , where the time-stamp of every temporal item in precedes that, of any temporal item in , is the interval of two different time-stamps.
Since temporal items in a transaction are associated with respective time-stamps, TARs can be generated by finding temporal frequent item sets in the temporal transaction set with interval . TARs defined in  are therefore useful for capturing temporal dependence among items within different time spans.
Nevertheless, it is also interesting to see that some item sets are frequent within certain periods of time, even if they are not frequent in the whole data set over the entire time-span. In order to better describe such cases, we revisit some basic concepts in TARM and refine them in what follows.
Suppose that $I = \{i_1, i_2, \ldots, i_n\}$ is an item domain. Any subset of $I$ is a transaction, and a transaction data set $D$ is the set formed by all transactions under inspection. Each transaction in $D$ has a unique transaction identifier (TID).
In classical association rule extraction, an item set is a subset of $I$. When it is formed by $k$ distinct items, we call it a $k$-item set. To simplify notation, the item set $\{i_1, i_2, \ldots, i_k\}$ is denoted by $i_1 i_2 \cdots i_k$. An item set $X$ appears in a transaction $d$ (alternatively, $d$ supports $X$) when $X \subseteq d$.
Now, let $T = \{t_1, t_2, \ldots, t_m\}$ be a collection of pairwise disjoint periods of time. If $d \in D$ is related to a unique period $t \in T$ (indicating that $d$ occurs during the period $t$), then $t$ is called the period marker of $d$. In fact, this defines a mapping $\rho$ from $D$ to $T$ such that $\rho(d) = t$. In what follows, $(D, \rho, T)$ is called a temporal transaction data set.
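Definition 4. Assume that $(D, \rho, T)$ is a temporal transaction data set with period marker mapping $\rho : D \to T$, $Q \subseteq T$, $d \in D$, and $X$ is an item set. Then, $d$ supports $X$ in $D$ during a period in $Q$ if $X \subseteq d$ and $\rho(d) \in Q$.
Definition 5. Assume that $(D, \rho, T)$ is a temporal transaction data set, $Q \subseteq T$, and $X$ is an item set. The set
$$D_Q(X) = \{d \in D : X \subseteq d \text{ and } \rho(d) \in Q\}$$
is the temporal realization of $X$ in $D$ during a period in $Q$.
The set $D_Q(X)$ consists of all the transactions in $D$ which contain all the items in $X$ and occur during a period in $Q$. The cardinality of this set is written as $\mathrm{tsup}_Q(X)$, called the temporal support of $X$ in $D$ during a period in $Q$. For simplicity, $D_{\{t\}}(X)$ and $\mathrm{tsup}_{\{t\}}(X)$ are written as $D_t(X)$ and $\mathrm{tsup}_t(X)$, respectively.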
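The temporal realization and temporal support described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `temporal_realization` and `temporal_support`, the dictionary-based encoding of transactions and period markers, and the toy data are hypothetical and are not taken from Table 1.

```python
def temporal_realization(transactions, period_of, itemset, periods):
    """Return the TIDs of transactions that contain every item of `itemset`
    and whose period marker lies in `periods`."""
    items = set(itemset)
    wanted = set(periods)
    return {
        tid
        for tid, content in transactions.items()
        if items <= set(content) and period_of[tid] in wanted
    }


def temporal_support(transactions, period_of, itemset, periods):
    """Temporal support = cardinality of the temporal realization."""
    return len(temporal_realization(transactions, period_of, itemset, periods))


# Hypothetical toy data: TID -> items, and TID -> period marker.
transactions = {
    "d1": {"a", "b"},
    "d2": {"a", "c"},
    "d3": {"a", "b", "c"},
    "d4": {"b"},
}
period_of = {"d1": "t1", "d2": "t1", "d3": "t2", "d4": "t2"}

in_t1 = temporal_realization(transactions, period_of, {"a", "b"}, {"t1"})
overall = temporal_support(transactions, period_of, {"a", "b"}, {"t1", "t2"})
```

Restricting the set of periods can only shrink the realization, so the temporal support over a subset of periods never exceeds the support over all periods.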
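Definition 6. Let $(D, \rho, T)$ be a temporal transaction data set and $Q \subseteq T$. Given two disjoint nonempty item sets $X$ and $Y$, an expression $X \Rightarrow_Q Y$ is called a temporal association rule (TAR).
We refer to $X$ and $Y$ as the antecedent and consequent of the rule $X \Rightarrow_Q Y$, respectively. When $Q$ is clear from the context, the rule is simply written as $X \Rightarrow Y$.
Definition 7. Let $(D, \rho, T)$ be a temporal transaction data set with period marker mapping $\rho : D \to T$, and let $X \Rightarrow Y$ be a TAR. Then, the temporal realization of $X \Rightarrow Y$ in $D$ is given by
$$D_Q(X \Rightarrow Y) = \{d \in D : X \cup Y \subseteq d \text{ and } \rho(d) \in Q\}.$$
The cardinality of the set $D_Q(X \Rightarrow Y)$, denoted by $\mathrm{tsup}_Q(X \Rightarrow Y)$, is called the temporal support of $X \Rightarrow Y$.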
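Definition 8. The temporal confidence of a TAR $X \Rightarrow Y$ is given by
$$\mathrm{tconf}_Q(X \Rightarrow Y) = \frac{\mathrm{tsup}_Q(X \Rightarrow Y)}{\mathrm{tsup}_Q(X)}.$$
In particular, $\mathrm{tconf}_Q(X \Rightarrow Y) = 0$ if $\mathrm{tsup}_Q(X) = 0$.
The temporal confidence serves as an essential measure in the evaluation of temporal association rules. It reflects the strength of the association between the antecedent and the consequent of a TAR during the periods concerned.
For simplicity, when $Q = \{t\}$, the quantities $D_Q(X \Rightarrow Y)$, $\mathrm{tsup}_Q(X \Rightarrow Y)$, and $\mathrm{tconf}_Q(X \Rightarrow Y)$ are written as $D_t(X \Rightarrow Y)$, $\mathrm{tsup}_t(X \Rightarrow Y)$, and $\mathrm{tconf}_t(X \Rightarrow Y)$, respectively. Let $\mathbb{N}$ stand for the set of all positive integers. To find significant and interesting TARs from a temporal transaction data set, the users or experts should specify the minimum temporal support (min-TS) $\alpha \in \mathbb{N}$ and the minimum temporal confidence (min-TC) $\beta$ for a given subset $Q$ of $T$. An item set $X$ is temporal frequent during a period in $Q$ if $\mathrm{tsup}_Q(X) \geq \alpha$. A TAR $X \Rightarrow Y$ is frequent during a period in $Q$ if $\mathrm{tsup}_Q(X \Rightarrow Y) \geq \alpha$. If $\mathrm{tconf}_Q(X \Rightarrow Y) \geq \beta$, then $X \Rightarrow Y$ is a confident TAR during a period in $Q$. A TAR is strong during a period in $Q$ if it is both frequent and confident.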
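The notions of frequent, confident, and strong TARs can be illustrated with a small Python sketch. It is a minimal example under stated assumptions: the names `tsup`, `tconf`, and `is_strong`, the seasonal toy data, and the thresholds are all hypothetical, and the confidence of a rule whose antecedent has zero temporal support is taken to be 0 as an assumed convention.

```python
from fractions import Fraction


def tsup(transactions, period_of, itemset, periods):
    """Temporal support: number of transactions in `periods` containing `itemset`."""
    items = set(itemset)
    return sum(
        1
        for tid, content in transactions.items()
        if items <= content and period_of[tid] in periods
    )


def tconf(transactions, period_of, antecedent, consequent, periods):
    """Temporal confidence as an exact fraction."""
    base = tsup(transactions, period_of, antecedent, periods)
    if base == 0:
        return Fraction(0)  # assumed convention for an unsupported antecedent
    joint = tsup(transactions, period_of, set(antecedent) | set(consequent), periods)
    return Fraction(joint, base)


def is_strong(transactions, period_of, antecedent, consequent, periods, min_ts, min_tc):
    """A rule is strong when it is both frequent and confident."""
    joint = tsup(transactions, period_of, set(antecedent) | set(consequent), periods)
    conf = tconf(transactions, period_of, antecedent, consequent, periods)
    return joint >= min_ts and conf >= min_tc


# Hypothetical seasonal data: scarves sell with coats only in winter.
transactions = {
    "d1": {"coat", "scarf"},
    "d2": {"coat", "scarf"},
    "d3": {"coat"},
    "d4": {"coat"},
    "d5": {"coat"},
}
period_of = {"d1": "winter", "d2": "winter", "d3": "summer", "d4": "summer", "d5": "summer"}

winter_only = is_strong(transactions, period_of, {"coat"}, {"scarf"},
                        {"winter"}, 2, Fraction(3, 5))
whole_year = is_strong(transactions, period_of, {"coat"}, {"scarf"},
                       {"winter", "summer"}, 2, Fraction(3, 5))
```

In this toy data the rule from coat to scarf is strong when attention is restricted to winter but fails the confidence threshold over the whole year, which mirrors the motivation given at the start of this section.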
The next example illustrates some concepts mentioned above.
Example 1. Consider a temporal transaction data set adapted from . Let us assume that be a sample temporal transaction data set, where consisting of all the transactions. Assume that every is related to a unique period , where . From Table 1, it can be seen that is divided into four parts by . For example, the item set appears in the transaction and during the period .
Now, let us consider the subset of . By Definition 5, we have and . In a similar fashion, and . In addition, the 2-item set appears in the transaction , and transaction occurs during the period . Thus by Definition 4, we can say that supports in during the period . Also, it is clear that and .
Next, we consider the TAR . By Definition 7, its temporal realization in isand the temporal support of this rule isBy Definition 8, the temporal confidence of this rule isFinally, assume that and . We conclude that is a strong TAR during the period .
3. Temporal Soft Sets
In this section, we define some new concepts such as temporal granulation mappings, temporal soft sets, and $Q$-clip soft sets, which play a fundamental role in this study. In the following, $U$ represents a universal set of objects and $E$ stands for the parameter space consisting of all parameters associated with objects in $U$. The power set of $U$ is written as $\mathcal{P}(U)$.
Definition 9. A soft set over $U$ is an ordered pair $(F, A)$, in which $A \subseteq E$ and $F : A \to \mathcal{P}(U)$ is called the approximate function of $(F, A)$.
Definition 10. Assume that $U$ and $A$ are nonempty finite sets of alternatives and attributes, respectively. The pair $(U, A)$ is called an information system (IS) when every attribute $a \in A$ can be identified with an information function $a : U \to V_a$, where $V_a$ is the value set of $a$.
When $(F, A)$ is a soft set over $U$, it naturally induces an IS $(U, A)$ in the following fashion. For every $a \in A$ and $u \in U$, the corresponding information function is given by
$$a(u) = \begin{cases} 1, & \text{if } u \in F(a), \\ 0, & \text{otherwise.} \end{cases}$$
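The passage from a soft set to its induced information system amounts to building a Boolean table. The following Python sketch illustrates this, assuming the usual 0/1 encoding of membership; the function name `induced_information_system` and the toy soft set are hypothetical.

```python
def induced_information_system(universe, soft_set):
    """Map each object to a row of 0/1 values, one per parameter:
    the value is 1 exactly when the object belongs to F(a)."""
    return {
        u: {a: int(u in members) for a, members in soft_set.items()}
        for u in universe
    }


# Hypothetical soft set: parameter -> set of objects approximated by it.
universe = ["u1", "u2", "u3"]
soft_set = {"a1": {"u1", "u2"}, "a2": {"u3"}}

table = induced_information_system(universe, soft_set)
```

Each row of `table` can be read as a transaction whose items are the parameters with value 1, which is what makes soft sets a natural representation for transaction data.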
Definition 11. Let $T$ be a set of pairwise disjoint periods of time. Then, a mapping $\tau : U \to T$ is called a temporal granulation mapping.
Definition 12. A temporal soft set (TSS) over $U$ is a quadruple $\mathfrak{S} = (F, A, \tau, T)$ such that
(1) $(F, A)$ is a soft set over $U$
(2) $\tau : U \to T$ is a temporal granulation mapping
The soft set $(F, A)$ is said to be the underlying soft set (USS) of the TSS $\mathfrak{S}$. We also refer to $\mathfrak{S}$ as a temporal granulation of $(F, A)$. The TSS $\mathfrak{S}$, as an abstract representation of data, can additionally capture temporal information, which cannot be expressed by its underlying soft set.
Definition 13. Assume that $\mathfrak{S} = (F, A, \tau, T)$ is a TSS over $U$ and $Q \subseteq T$. Then the $Q$-clip of $\mathfrak{S}$ is a soft set $\mathfrak{S}_Q = (F_Q, A)$ over
$$\tau^{-1}(Q) = \{u \in U : \tau(u) \in Q\},$$
where $F_Q(a) = F(a) \cap \tau^{-1}(Q)$ for all $a \in A$.
Note that the $\{t\}$-clip soft set is simply called the $t$-clip soft set. Next, we consider an example that illustrates the abovementioned notions.
Example 2. The Nobel Prizes are awarded annually to individuals and organizations in recognition of outstanding contributions in several categories: literature, chemistry, physics, physiology or medicine, and peace. In the following, we focus on three types of prizes, which are the Nobel Prizes in Physics (NPP), Physiology or Medicine (NPPM), and Chemistry (NPC).
We consideras a universal set that consists of all Nobel Prizes in scientific categories, namely, NPP, NPPM, and NPC awarded between 1901 and 1903. Detailed information regarding these prizes can be found in Table 2. Suppose that is a set of parameters, containing all the affiliation countries associated with the prizes in . More specifically, let stand for “Denmark,” “France,” “Germany,” “The Netherlands,” “Sweden,” and “United Kingdom,” respectively. Based on the information in Table 2, we can construct a soft set over , with its approximate function defined as , , , , , and .
In addition, a temporal granulation of can be derived from Table 2 in a natural way. In fact, let with for . Then, the temporal granulation mapping is given byThe intuitive meaning of is apparent. For instance, equation (10) says that the prizes , , and were bestowed in 1901. With this mapping, we can construct a TSS over , as shown in Table 3. As seen from the equations (10)–(12), the temporal granulation mapping induces a partition of as follows:Finally, by Definition 13, the -clip soft set of the TSS for are as follows:(1)The -clip of is a soft set over , where and for all with (2)The -clip of is a soft set over , where , , , and for all with (3)The -clip of is a soft set over , where , , , and for all with
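The construction in Example 2 can be mimicked in Python. Since Table 2 is not reproduced in this text, the membership data below is a reconstruction from the statements of Examples 2 and 3 (one object per prize, identified by year and category, with the affiliation country as parameter) and should be read as an assumption, as should the names `F`, `tau`, and `clip`.

```python
# Hypothetical prize identifiers of the form (year, category).
# Assumed reconstruction of the soft set: affiliation country -> prizes.
F = {
    "Denmark": {(1903, "NPPM")},
    "France": {(1903, "NPP")},
    "Germany": {(1901, "NPP"), (1901, "NPC"), (1901, "NPPM"), (1902, "NPC")},
    "Netherlands": {(1902, "NPP")},
    "Sweden": {(1903, "NPC")},
    "United Kingdom": {(1902, "NPPM")},
}

# Temporal granulation: every prize is mapped to the year it was awarded.
tau = {prize: prize[0] for prizes in F.values() for prize in prizes}


def clip(F, tau, Q):
    """Q-clip: keep only the objects whose period lies in Q."""
    return {a: {u for u in members if tau[u] in Q} for a, members in F.items()}


def nonempty_parameters(soft_set):
    """Parameters whose approximation is not empty."""
    return {a for a, members in soft_set.items() if members}


clip_1901 = clip(F, tau, {1901})
clip_1902 = clip(F, tau, {1902})
clip_1903 = clip(F, tau, {1903})
```

Under this reconstruction, only one parameter has a nonempty approximation in the 1901 clip, while the 1902 and 1903 clips each have three, which agrees with the shape of the three clips listed in the example.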
4. Soft Temporal Association Rule Mining
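This section aims to establish a formal framework for mining TARs by means of TSSs. Let $T$ be a set of pairwise disjoint periods of time and $Q \subseteq T$ throughout this section.
Definition 14. Assume that $(F, A)$ is a soft set over $U$ and $u \in U$. Then, we call
$$\mathrm{Coset}(u) = \{a \in A : u \in F(a)\}$$
the parameter coset of the alternative $u$ in $(F, A)$.
It can be seen that $\mathrm{Coset}(u)$ contains all the parameters that the alternative $u$ meets, according to the information contained in $(F, A)$.
Definition 15. Assume that $\mathfrak{S} = (F, A, \tau, T)$ is a TSS over $U$ with its USS $(F, A)$. For any $X \subseteq A$,
$$R_Q(X) = \{u \in U : X \subseteq \mathrm{Coset}(u) \text{ and } \tau(u) \in Q\} \tag{15}$$
is called the $Q$-realization of $X$ in the TSS $\mathfrak{S}$.
When $u \in R_Q(X)$, it is said that $X$ is $Q$-supported by the alternative $u$. The $Q$-support of $X$ in $\mathfrak{S}$ is the cardinality of $R_Q(X)$, represented by $\mathrm{sup}_Q(X)$. Note that $R_{\{t\}}(X)$ and $\mathrm{sup}_{\{t\}}(X)$ are respectively written as $R_t(X)$ and $\mathrm{sup}_t(X)$.
Definition 16. Assume that $\mathfrak{S} = (F, A, \tau, T)$ is a TSS over $U$ and $X, Y$ are two disjoint non-empty subsets of $A$. We call the expression $X \Rightarrow Y$ a temporal association rule (TAR) in the TSS $\mathfrak{S}$. The non-empty parameter sets $X$ and $Y$ are respectively called the antecedent and consequent of the TAR $X \Rightarrow Y$.
Definition 17. Suppose that $\mathfrak{S} = (F, A, \tau, T)$ is a TSS over $U$ and $X \Rightarrow Y$ is a TAR in $\mathfrak{S}$. We refer to
$$R_Q(X \Rightarrow Y) = \{u \in U : X \cup Y \subseteq \mathrm{Coset}(u) \text{ and } \tau(u) \in Q\}$$
as the $Q$-realization of $X \Rightarrow Y$ in the TSS $\mathfrak{S}$.
The $Q$-support of $X \Rightarrow Y$, written as $\mathrm{sup}_Q(X \Rightarrow Y)$, is the cardinality of $R_Q(X \Rightarrow Y)$. For convenience, $R_{\{t\}}(X \Rightarrow Y)$ and $\mathrm{sup}_{\{t\}}(X \Rightarrow Y)$ are simply written as $R_t(X \Rightarrow Y)$ and $\mathrm{sup}_t(X \Rightarrow Y)$, respectively.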
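Proposition 1. Assume that $\mathfrak{S} = (F, A, \tau, T)$ is a TSS over $U$ and fix a nonempty $X \subseteq A$. Then the $Q$-realization $R_Q(X)$ of $X$ in $\mathfrak{S}$ satisfies
$$R_Q(X) = \tau^{-1}(Q) \cap \bigcap_{a \in X} F(a).$$
Proof. We denote by $(F, A)$ the USS of the TSS $\mathfrak{S}$. Let $u \in R_Q(X)$. Equation (15) assures $X \subseteq \mathrm{Coset}(u)$ and $\tau(u) \in Q$. By Definition 14, $u \in F(a)$ when $a \in X$. Thus we have $u \in \tau^{-1}(Q) \cap \bigcap_{a \in X} F(a)$. This proves $R_Q(X) \subseteq \tau^{-1}(Q) \cap \bigcap_{a \in X} F(a)$. Now, suppose that $u \in \tau^{-1}(Q) \cap \bigcap_{a \in X} F(a)$. Then $\tau(u) \in Q$ and $u \in F(a)$ for any $a \in X$. From the definition of the parameter coset $\mathrm{Coset}(u)$, it follows that $a \in \mathrm{Coset}(u)$ for any $a \in X$, and so $X \subseteq \mathrm{Coset}(u)$. Hence $u \in R_Q(X)$, which also shows that $\tau^{-1}(Q) \cap \bigcap_{a \in X} F(a) \subseteq R_Q(X)$. Therefore we derive that the two sets coincide. This ends the proof.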
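The content of Proposition 1 can be checked mechanically on small data. The Python sketch below assumes the reading in which the realization of a parameter set equals the intersection of the approximations of its parameters, restricted to objects whose period lies in the chosen set of periods. The function names and the toy temporal soft set are hypothetical.

```python
from itertools import combinations


def coset(F, u):
    """Parameter coset: all parameters whose approximation contains u."""
    return {a for a, members in F.items() if u in members}


def realization_by_definition(F, tau, universe, X, Q):
    """Objects whose coset contains X and whose period lies in Q."""
    return {u for u in universe if set(X) <= coset(F, u) and tau[u] in Q}


def realization_by_intersection(F, tau, universe, X, Q):
    """Same set, computed by intersecting approximations (the assumed reading)."""
    result = {u for u in universe if tau[u] in Q}
    for a in X:
        result &= F[a]
    return result


def nonempty_subsets(values):
    values = sorted(values)
    for size in range(1, len(values) + 1):
        for combo in combinations(values, size):
            yield set(combo)


# Hypothetical toy temporal soft set.
universe = {"u1", "u2", "u3", "u4"}
F = {"a": {"u1", "u2", "u3"}, "b": {"u2", "u3", "u4"}, "c": {"u1"}}
tau = {"u1": "t1", "u2": "t1", "u3": "t2", "u4": "t2"}

agree = all(
    realization_by_definition(F, tau, universe, X, Q)
    == realization_by_intersection(F, tau, universe, X, Q)
    for X in nonempty_subsets(F)
    for Q in nonempty_subsets({"t1", "t2"})
)
```

The intersection form avoids computing a coset for every object, which is the practical reason the clip-based formulas are convenient for mining.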
By Proposition 1, the following results can be deduced.
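Corollary 1. Assume that $\mathfrak{S} = (F, A, \tau, T)$ is a TSS over $U$ and $\mathfrak{S}_Q = (F_Q, A)$ is its $Q$-clip soft set. Then we have
$$R_Q(X) = \bigcap_{a \in X} F_Q(a)$$
for all non-empty subsets $X$ of $A$, where $R_Q(X)$ denotes the $Q$-realization of $X$ in $\mathfrak{S}$.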
Corollary 2. Assume that is a TSS over and are subsets of . Then we have
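Proposition 2. Assume that $\mathfrak{S} = (F, A, \tau, T)$ is a TSS over $U$ and $X \Rightarrow Y$ is a TAR in $\mathfrak{S}$. Then, the $Q$-realizations satisfy
$$R_Q(X \Rightarrow Y) = R_Q(X) \cap R_Q(Y).$$
Remark 1. The above assertion reveals that the $Q$-realization of a TAR $X \Rightarrow Y$ in a TSS $\mathfrak{S}$ coincides with the intersection of the $Q$-realizations of the antecedent and consequent of $X \Rightarrow Y$.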
By Proposition 2, the following results can be deduced.
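Corollary 3. Assume that $\mathfrak{S} = (F, A, \tau, T)$ is a TSS over $U$ and $\mathfrak{S}_Q = (F_Q, A)$ is its $Q$-clip soft set. Then, the $Q$-realization of $X \Rightarrow Y$ is
$$R_Q(X \Rightarrow Y) = \bigcap_{a \in X \cup Y} F_Q(a),$$
where $X \Rightarrow Y$ is a TAR in $\mathfrak{S}$.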
Corollary 4. Assume that is a TSS over and is a TAR in . Then,
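Definition 18. Assume that $\mathfrak{S} = (F, A, \tau, T)$ is a TSS over $U$ and $X \Rightarrow Y$ is a TAR in $\mathfrak{S}$. The $Q$-confidence of $X \Rightarrow Y$ is given by
$$\mathrm{conf}_Q(X \Rightarrow Y) = \frac{\mathrm{sup}_Q(X \Rightarrow Y)}{\mathrm{sup}_Q(X)},$$
where $\mathrm{sup}_Q$ denotes the $Q$-support.
For convenience, $\mathrm{conf}_{\{t\}}(X \Rightarrow Y)$ is simply written as $\mathrm{conf}_t(X \Rightarrow Y)$.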
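Theorem 1. Assume that $\mathfrak{S} = (F, A, \tau, T)$ is a TSS over $U$ and $X \Rightarrow Y$ is a TAR in $\mathfrak{S}$. Then, $X \Rightarrow Y$ is strong during a period in $Q$ if and only if
$$\mathrm{sup}_Q(X \Rightarrow Y) \geq \max\{\alpha, \beta \cdot \mathrm{sup}_Q(X)\},$$
where $\alpha$ is the min-TS and $\beta$ is the min-TC.
Proof. Suppose that $X \Rightarrow Y$ is strong in $\mathfrak{S}$ during a period in $Q$. Then, we have $\mathrm{sup}_Q(X \Rightarrow Y) \geq \alpha$ and $\mathrm{conf}_Q(X \Rightarrow Y) = \mathrm{sup}_Q(X \Rightarrow Y) / \mathrm{sup}_Q(X) \geq \beta$. It follows that $\mathrm{sup}_Q(X \Rightarrow Y) \geq \beta \cdot \mathrm{sup}_Q(X)$. Thus, we have $\mathrm{sup}_Q(X \Rightarrow Y) \geq \max\{\alpha, \beta \cdot \mathrm{sup}_Q(X)\}$. Conversely, let $X \Rightarrow Y$ be a temporal association rule in $\mathfrak{S}$ such that $\mathrm{sup}_Q(X \Rightarrow Y) \geq \max\{\alpha, \beta \cdot \mathrm{sup}_Q(X)\}$. It follows that $\mathrm{sup}_Q(X \Rightarrow Y) \geq \alpha$ and $\mathrm{sup}_Q(X \Rightarrow Y) \geq \beta \cdot \mathrm{sup}_Q(X)$. Since $\alpha$ is a positive integer and $\mathrm{sup}_Q(X) \geq \mathrm{sup}_Q(X \Rightarrow Y) \geq \alpha$, we have $\mathrm{sup}_Q(X) > 0$. Hence, we deduce that $\mathrm{conf}_Q(X \Rightarrow Y) \geq \beta$. Thus, $X \Rightarrow Y$ is strong in $\mathfrak{S}$ during a period in $Q$, completing the proof.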
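The practical value of Theorem 1 is that a single comparison replaces separate support and confidence tests. The sketch below assumes the condition takes the form "rule support is at least the larger of the min-TS and the min-TC times the antecedent support", and checks on a grid of small values that this agrees with the two-part definition. The function names are hypothetical, and a zero antecedent support is assumed to give confidence 0.

```python
from fractions import Fraction


def strong_by_definition(sup_rule, sup_antecedent, min_ts, min_tc):
    """Frequent and confident, with confidence computed as a ratio."""
    if sup_antecedent == 0:
        conf = Fraction(0)  # assumed convention
    else:
        conf = Fraction(sup_rule, sup_antecedent)
    return sup_rule >= min_ts and conf >= min_tc


def strong_by_single_test(sup_rule, sup_antecedent, min_ts, min_tc):
    """Division-free test: one comparison against a maximum (assumed form)."""
    return sup_rule >= max(min_ts, min_tc * sup_antecedent)


def check_equivalence(max_support=12):
    """Compare both tests for all supports up to `max_support`."""
    for sup_antecedent in range(max_support + 1):
        # The rule support can never exceed the antecedent support.
        for sup_rule in range(sup_antecedent + 1):
            for min_ts in range(1, max_support + 1):
                for tenth in range(1, 11):
                    min_tc = Fraction(tenth, 10)
                    a = strong_by_definition(sup_rule, sup_antecedent, min_ts, min_tc)
                    b = strong_by_single_test(sup_rule, sup_antecedent, min_ts, min_tc)
                    if a != b:
                        return False
    return True


equivalent = check_equivalence()
```

Using exact fractions avoids floating-point rounding at the threshold, for example when the confidence equals the min-TC exactly.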
Using the aforementioned concepts and results, we can obtain the following result.
Proposition 3. Suppose that is a TSS over and is a TAR in with . Then, the following are equivalent:(1) is temporal frequent during a period in (2) is frequent during a period in (3) is confident during a period in (4) is strong during a period in
To illustrate the new notions above, we consider the following example, which is a continuation of Example 2.
Example 3. Assume that is a set of parameters, consisting of the three types of prizes under consideration, i.e., is NPC, is NPP, and is NPPM. Before using the proposed concepts regarding soft temporal association rule mining for mathematical modeling and analysis, we now first establish another TSS based on the information in Table 2. The TSS over is shown in Table 4, where the parameter set and the temporal granulation mapping is identical with what is defined in Example 2. In what follows, let us consider three different cases in which , , and , respectively. Suppose that the min-TS and . The min-TC for .
Let us first focus on the case when . Recall first that . By Definition 13, the -clip of the TSS is a soft set over , where , , , , , and for all with . By Proposition 1 and Corollary 1, we can easily getNext, we consider the TAR . By Proposition 2 and Corollary 3, its -realization in can be calculated as follows:In fact, as indicated by Corollary 3, the -realization of the TAR in the TSS is completely determined by the approximate function of the corresponding -clip . It is clear that the -support of this rule isBy Definition 18, the -confidence of this rule isHence, by definition, is strong during a period in . On the other hand, we can draw the same conclusion from Theorem 1 sinceNote also thatThus, we can also conclude that is strong during a period in by Proposition 3. This rule indicates that “From 1901 to 1902, all the Nobel Prizes in Chemistry were awarded to Germany.” Conversely, we can consider the TAR . Its -support isbut its -confidence isHence, this rule is frequent but not confident during a period in . It reveals that “From 1901 to 1902, 50% of the Nobel Prizes awarded to Germany pertain to the category of chemistry.”
Now, let us consider the second case when . Similarly, we can getHence, we conclude that is strong during the period . This rule says that “In 1902, Germany was only awarded the NPC, instead of the NPP or NPPM.”
Finally, we consider the third case when . Clearly, in this case. It follows that the -clip of the TSS is a soft set over , which coincide with the USS of the TSS . That is, for all . By Proposition 1 and Corollary 1, we haveNext, let us consider the TAR , which can also be seen as an association rule in conventional sense. By Proposition 2 and Corollary 3, its -realization in can be calculated as follows:Obviously, . By Definition 18, we also haveIt is clear thatThus, we deduce that is neither frequent nor confident during a period in . This rule reveals that “From 1901 to 1903, only one Nobel Prize in Physiology or Medicine was awarded to the United Kingdom.” In addition, it can be seen that the rule is neither frequent nor confident during a period in sinceThis rule says that “From 1901 to 1903, only a quarter of the Nobel Prizes awarded to Germany pertain to the category of physiology or medicine.”
Compared with the case of consisting of all time periods, we see that some rules such as and can only be identified as strong TARs when we restrict to the cases of or consisting of fewer time periods. This is mainly due to the fact that some item sets can be frequent during certain time periods rather than all of them. In a nutshell, we conclude that the TARM based on TSSs can help find some strong TARs which might be ignored in conventional rule extraction process.
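The figures quoted in Example 3 can be recomputed with a short Python sketch. Table 4 is not reproduced in this text, so the data below is a reconstruction from the statements of Examples 2 and 3 and should be treated as an assumption; the names `clip`, `support`, and `confidence` are hypothetical, and supports are computed from the clipped approximations in the spirit of Corollary 3.

```python
from fractions import Fraction

YEARS = (1901, 1902, 1903)
CATEGORIES = ("NPP", "NPC", "NPPM")

# Hypothetical prize identifiers (year, category); tau maps a prize to its year.
universe = {(year, cat) for year in YEARS for cat in CATEGORIES}
tau = {u: u[0] for u in universe}

# Assumed reconstruction: affiliation countries plus prize categories as parameters.
F = {
    "Denmark": {(1903, "NPPM")},
    "France": {(1903, "NPP")},
    "Germany": {(1901, "NPP"), (1901, "NPC"), (1901, "NPPM"), (1902, "NPC")},
    "Netherlands": {(1902, "NPP")},
    "Sweden": {(1903, "NPC")},
    "United Kingdom": {(1902, "NPPM")},
}
for cat in CATEGORIES:
    F[cat] = {u for u in universe if u[1] == cat}


def clip(F, tau, Q):
    """Restrict every approximation to objects whose period lies in Q."""
    return {a: {u for u in members if tau[u] in Q} for a, members in F.items()}


def support(F_Q, X):
    """Support of a nonempty parameter set via intersection of clipped sets."""
    return len(set.intersection(*(F_Q[a] for a in X)))


def confidence(F_Q, X, Y):
    base = support(F_Q, X)
    if base == 0:
        return Fraction(0)  # assumed convention
    return Fraction(support(F_Q, set(X) | set(Y)), base)


early = clip(F, tau, {1901, 1902})
whole = clip(F, tau, set(YEARS))

conf_npc_germany = confidence(early, {"NPC"}, {"Germany"})
conf_germany_npc = confidence(early, {"Germany"}, {"NPC"})
conf_germany_nppm = confidence(whole, {"Germany"}, {"NPPM"})
sup_nppm_uk = support(whole, {"NPPM", "United Kingdom"})
```

Under this reconstruction, the computed values agree with the verbal statements in the example: full confidence for chemistry prizes going to Germany in 1901 and 1902, 50% in the converse direction, a quarter for Germany and physiology or medicine over all three years, and a single such prize for the United Kingdom.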
Based on the results obtained in this section and the concepts such as TSSs and $Q$-clip soft sets proposed in Section 3, we present a novel TARM method by combining NegNodeset-based frequent item set mining with TSS-based rule mining. Our method is abbreviated as negFIN-STARM in the sequel. The pseudocode description of the negFIN-STARM method is given in Algorithm 1. This algorithm takes a temporal transaction data set, a set $Q$ of time periods, the min-TS $\alpha$, and the min-TC $\beta$ as the input. The output of Algorithm 1 is the class of all strong TARs during a period in $Q$. The main procedure of the negFIN-STARM method can be divided into three stages:
(1) In the first stage, we construct a TSS from the provided temporal transaction data set. Then, according to Definition 13, we determine the objects occurring during a period in $Q$ and construct the $Q$-clip soft set of the TSS. Next, we derive the IS from the $Q$-clip soft set.
(2) In the second stage, the NegNodeset-based frequent item set mining technique and temporal soft sets are combined to generate all temporal frequent item sets. More specifically, we first employ the information functions of the IS to construct the BMC tree. Then, the Nodesets of all frequent 1-item sets are generated by traversing the BMC tree. Furthermore, we identify the NegNodesets of all frequent $k$-item sets with $k \geq 2$. Eventually, the set-enumeration tree is built to generate the class of all temporal frequent item sets. These item sets function as potential antecedents and consequents for finding strong TARs. Here, we would like to emphasize a crucial issue. To apply NegNodeset-based frequent item set mining, the BMC tree must be built to generate the node set related to every frequent 1-item set. Each frequent item set is represented by a bitmap code, and every frequent 1-item set is mapped to one of its bits. In other words, the bit values at the indices of the temporal frequent 1-item sets can be combined to form the bitmap code of a temporal frequent item set. It is worth noting that the use of TSSs and $Q$-clip soft sets can facilitate the calculation of bitmap codes and the construction of BMC trees in this important stage.
(3) In the last stage, by Corollary 3, we can calculate the $Q$-realization of $X \Rightarrow Y$ using the $Q$-clip soft set for all temporal frequent item sets $X$ and $Y$ which are disjoint. Next, by Theorem 1, it is easy to check whether or not the TAR $X \Rightarrow Y$ is strong during a period in $Q$. If this is true, then we put $X \Rightarrow Y$ into the output class.
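The three stages above can be sketched end to end in Python. This is not the negFIN algorithm: the BMC tree, Nodesets, and NegNodesets are replaced here by a plain level-wise enumeration over integer bitmaps, purely to illustrate the data flow from clipping to rule checking. The function name `mine_strong_tars`, the toy data, and the form of the strength test (rule support at least the larger of min-TS and min-TC times antecedent support) are assumptions.

```python
from fractions import Fraction
from itertools import combinations


def mine_strong_tars(transactions, period_of, periods, min_ts, min_tc):
    """Return (antecedent, consequent) pairs that are strong during `periods`."""
    # Stage 1: clip the data to the chosen periods and encode each item as a
    # bitmap whose k-th bit marks membership in the k-th retained transaction.
    tids = [tid for tid in sorted(transactions) if period_of[tid] in periods]
    bitmap = {}
    for k, tid in enumerate(tids):
        for item in transactions[tid]:
            bitmap[item] = bitmap.get(item, 0) | (1 << k)
    full = (1 << len(tids)) - 1

    def support(itemset):
        bits = full
        for item in itemset:
            bits &= bitmap.get(item, 0)  # intersection of clipped approximations
        return bin(bits).count("1")

    # Stage 2: enumerate temporal frequent item sets level by level
    # (a brute-force stand-in for NegNodeset-based mining).
    items = sorted(i for i in bitmap if support((i,)) >= min_ts)
    frequent = []
    for size in range(1, len(items) + 1):
        level = [c for c in combinations(items, size) if support(c) >= min_ts]
        if not level:
            break  # no frequent set of this size, so none of a larger size
        frequent.extend(level)

    # Stage 3: test every ordered pair of disjoint frequent item sets.
    strong = []
    for x in frequent:
        for y in frequent:
            if set(x) & set(y):
                continue
            if support(x + y) >= max(min_ts, min_tc * support(x)):
                strong.append((x, y))
    return strong


# Hypothetical seasonal data.
transactions = {
    "d1": {"coat", "scarf"},
    "d2": {"coat", "scarf"},
    "d3": {"coat"},
    "d4": {"coat"},
    "d5": {"coat"},
}
period_of = {"d1": "winter", "d2": "winter", "d3": "summer", "d4": "summer", "d5": "summer"}

winter_rules = mine_strong_tars(transactions, period_of, {"winter"}, 2, Fraction(3, 5))
all_rules = mine_strong_tars(transactions, period_of, {"winter", "summer"}, 2, Fraction(3, 5))
```

In this toy run the rule from coat to scarf is reported only when mining is restricted to winter, while the rule from scarf to coat is reported in both settings.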