Mining Temporal Association Rules with Temporal Soft Sets

Liu, Xiaoyan; Feng, Feng; Wang, Qian; Yager, Ronald R.; Fujita, Hamido; Alcantud, José Carlos R.

doi:https://doi.org/10.1155/2021/7303720

Journal of Mathematics


Research Article | Open Access

Volume 2021 | Article ID 7303720 | https://doi.org/10.1155/2021/7303720


Academic Editor: Firdous A. Shah

Received 22 Aug 2021

Accepted 11 Nov 2021

Published 29 Nov 2021

Abstract

Traditional association rule extraction may run into difficulties because it ignores the temporal aspect of the collected data. In particular, it often happens that some item sets are frequent during specific time periods although they are not frequent in the whole data set. In this study, we make an effort to enhance conventional rule mining by introducing temporal soft sets. We define temporal granulation mappings to induce granular structures for temporal transaction data. Using this notion, we define temporal soft sets and their $Q$-clip soft sets to establish a novel framework for mining temporal association rules. A number of useful characterizations and results are obtained, including a necessary and sufficient condition for fast identification of strong temporal association rules. By combining temporal soft sets with NegNodeset-based frequent item set mining techniques, we develop the negFIN-based soft temporal association rule mining (negFIN-STARM) method to extract strong temporal association rules. Numerical experiments are conducted on commonly used data sets to show the feasibility of our approach. Moreover, comparative analysis demonstrates that the newly proposed method achieves higher execution efficiency than three well-known approaches in the literature.

1. Introduction

In modern society, vast amounts of data are produced and collected daily by all walks of life. With an increasing amount of data, there has been an urgent need for developing powerful models, methods and apparatuses to facilitate data analysis. In response to this demand, data mining has emerged and become a fast-growing research field with various fascinating topics and practical applications. Data mining is a multidisciplinary field, which involves applied mathematics, computer science, information science, statistics, and other disciplines. In the process of knowledge discovery in databases (KDD), data mining is viewed as the most essential step in which sophisticated methods are applied to extract knowledge or patterns from data. As shown in Figure 1, six fundamental tasks in data mining are association rule mining, clustering, classification, regression, summarization, and sequence analysis. In a more general perspective, some researchers treat data mining as a synonym for KDD. Data mining has proven to be useful in a myriad of areas including biological statistics [1], case-based reasoning [2], factor analysis of heart disease [3], pattern classification [4], and group role assignment [5].

Association rule mining, also known as association analysis or association rule learning, is of great importance in the realm of knowledge discovery and data mining. It was originally proposed in [6] with the aim of finding frequent patterns in transactional databases and potential association rules between different item sets. Most rule extraction algorithms belong to one of the following two categories. The first is the class of “candidate generation” methods, with the Apriori algorithm [7] as its typical representative. The main drawback of these methods is that they all require multiple database scans. The second category consists of “pattern growth” methods such as the FP-growth algorithm [8], which relies on a tree-based data structure (the FP-tree) to store basic information about frequent item sets. More specifically, it does not generate candidate item sets, and it avoids multiple scans of the database by saving basic information about frequent item sets in a custom-built data structure. In addition, Zaki [9] proposed a vertical mining algorithm with lower I/O costs, called equivalence class transformation (ECLAT). However, the performance of ECLAT can degrade on dense databases. By using the bitmap representation of sets, Aryabarzan et al. [10] presented a crucial data structure named NegNodeset and developed the NegNodeset-based Frequent Itemset Mining (negFIN) algorithm. The prominent features of the negFIN algorithm are threefold. Firstly, it makes use of bitwise operators to extract the NegNodesets of item sets. Secondly, it significantly reduces the complexity of computing supports. Lastly, it generates frequent item sets by using a structure called the set-enumeration tree, and it efficiently prunes the search space with the promotion method. Djenouri et al. [11] developed an efficient parallel genetic algorithm for extracting diversified association rules in big data sets. To further improve pattern mining in big data, Luna et al. [12] designed several sophisticated algorithms which rely on the MapReduce paradigm and its Hadoop implementation. Nevertheless, it should be noted that the abovementioned rule extraction methods may sometimes produce redundant or incoherent association rules. In view of this, Feldman et al. [13] proposed the maximal association rule, a complementary apparatus for extracting interesting association rules that are frequently lost when using regular association rules. Amir et al. [14] contributed additional developments regarding the exact conceptualization and efficient identification of maximal association rules. In addition to objective measures such as support, confidence, and correlation, some researchers have considered subjective measures such as risk, interest, and utility to discover useful item sets and association rules. In particular, a new research direction named utility pattern mining [15–17] has received considerable attention in recent years.

Temporal association rule mining (TARM) is one of the most fascinating topics in the field of association rule mining. It has been successfully applied to a wide range of domains such as cancer treatment [18], gene analysis [19], and web mining [20]. Depending on whether the time variable is considered as an implied or integral component, Segura-Delgado et al. [21] systematically classified the existing TARM approaches into two main categories. Agrawal and Srikant [22] coined the terminology of sequential pattern to facilitate the analysis of a transaction database. Inspired by this seminal idea, many scholars have conducted in-depth research with regard to sequential rule mining. Zhai et al. [23] designed a time constraint-based rule mining algorithm, called T-Apriori, to analyse the sequence of ecological events. Gan et al. [24] presented a projection-based utility mining method which is useful for mining high-utility sequential patterns from sequence data. Hong et al. [25] constructed a hierarchical granular framework to enhance TARM by considering different levels of time granules. Song et al. [26] detected changes of customer behavior by using temporal association rules mining from customer profiles and sales data at different time snapshots. Yun et al. [27] designed an efficient algorithm to discover high-utility patterns from incremental databases by constructing a global data structure through a single scan.

Molodtsov’s soft set theory [28] provides a formal framework for coping with uncertainty. Its basic principle relies on the perspective of parameterization, suggesting that one should view an imprecisely defined object from various facets, where each single feature yields an approximate description of this object. Maji et al. [29] soon presented several operations of soft sets to complement [28]. Ali et al. [30] introduced several new operations to consolidate the basis of soft set theory. Babitha and Sunil [31] extended the ideas of functions and relations by virtue of soft set theory. Feng and Li [32] clarified the relations among several kinds of soft subsets and discovered that soft sets satisfy new algebraic properties. By combining soft sets and fuzzy sets, Maji et al. [33] proposed a hybrid concept called fuzzy soft sets. Later on, several more complicated extensions of soft sets have been developed and investigated [34–38]. Ali and Shabir [39] developed some logic connectives in (fuzzy) soft set theory. In [40], a distance-based algorithm was designed for fuzzy soft set parameter reduction. Several works pointed out that rough sets, soft sets, and fuzzy sets are closely connected models [41–43]. They model uncertainty from independent perspectives, namely, granularity, parameterization, and gradualness, respectively. Feng et al. initiated several hybrid structures combining rough sets, soft sets, and fuzzy sets [44]. Taking a soft set as the underlying granulation structure, Feng et al. [45] proposed soft rough sets. Soft sets and related extensions have been widely used in many distinct domains, such as decision-making [46–51], valuation of assets [52], clustering [53], medical diagnosis [54], parameter reduction [55], feature selection [56], data analysis [57], BCK/BCI-algebras [58–60], graph theory [61], and computational biology [62]. The reader is referred to John’s latest monograph [63] for more details regarding soft set theory and its applications.

With the assistance of soft set theory, Herawan and Deris [64] made an innovative proposal for identifying association rules from transaction data sets. Their pioneering work opened up a new research direction aimed at developing soft set-based approaches to rule extraction. Some concepts, including logical formulas over soft sets and the basic soft truth degree of formulas, were first introduced in [65] to study approximate reasoning based on soft sets. Feng et al. [66] revisited Herawan and Deris’s initial idea and refined several important notions to promote (maximal) association rule mining by virtue of soft set theory. Two important observations motivate us to continue this line of exploration:
(1) Ignoring the temporal aspect of data in the abovementioned association rule extraction approaches [64, 66] may cause some limitations. For instance, some item sets are frequent within certain time periods even if they are not frequent in the whole data set over the entire time span. It is nevertheless meaningful to discover such item sets, since a commodity may sell exceptionally well in a specific season but not during the rest of the year.
(2) The identification of temporal frequent item sets, as an essential step in the TARM process, can be facilitated by integrating time as a new component into soft set theory. In fact, the BitMap Coding (BMC) tree [10] must be built to generate the node sets corresponding to frequent 1-item sets in the NegNodeset-based frequent item set mining process. The bit values at the indices of the temporal frequent 1-item sets can be combined to form the bitmap code of a temporal frequent item set. This indicates that the temporal soft sets and $Q$-clip soft sets introduced in the current work provide a helpful apparatus for the construction of BMC trees.

To address these issues, the current study focuses on enhancing association rule extraction with the aid of temporal soft sets. The main contributions of this study are summarized as follows:
(1) We define some new concepts such as temporal granulation mappings, temporal soft sets, and $Q$-clip soft sets in order to establish a conceptual framework for extracting temporal association rules.
(2) We present a number of useful characterizations and results within the established framework, including a necessary and sufficient condition for fast identification of strong temporal association rules.
(3) We develop an effective approach, called negFIN-STARM, to extract strong temporal association rules by virtue of temporal soft sets and NegNodeset-based frequent item set mining.

The rest of this paper is arranged as follows. Section 2 provides the rudiments of TARM. Section 3 proposes several fundamental notions such as temporal soft sets and $Q$-clip soft sets. Section 4 focuses on soft temporal association rule mining and develops the negFIN-STARM approach. Section 5 is devoted to numerical experiments and a comparative analysis of four different methods for extracting temporal association rules. Section 6 concludes this research and points out future research directions.

2. Temporal Association Rules

This section focuses on temporal association rules. First, we quote some fundamental definitions from [19].

Definition 1 (see [19]). An item endowed with a time-stamp is called a temporal item. A temporal item set means a nonempty set of temporal items.

Definition 2 (see [19]). Assume that $D$ is a set of transactions on a temporal item set $X$ and the positive integer $\sigma$ is the selected support threshold. Then, we say that $X$ is a temporal frequent item set with respect to $D$ and $\sigma$ when the number of transactions in $D$ containing $X$ is at least $\sigma$.

Definition 3 (see [19]). A pair of disjoint temporal item sets is called a temporal association rule (TAR). Let $Y$ and $X$, respectively, represent the right and left temporal item sets. Then, we denote a TAR by $X \xrightarrow{\Delta t} Y$, where the time-stamp of every temporal item in $X$ precedes that of any temporal item in $Y$, and $\Delta t$ is the interval between the two different time-stamps.

Since temporal items in a transaction are associated with respective time-stamps, TARs can be generated by finding temporal frequent item sets in the temporal transaction set with interval $\Delta t$. TARs defined in [19] are therefore useful for capturing temporal dependence among items within different time spans.

Nevertheless, it is also interesting to see that some item sets are indeed frequent within certain period of time, even if they are not frequent in the whole data set during the entire time-span. In order to better describe such cases, we revisit some basic concepts in TARM and refine them in what follows.

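Suppose that $I = \{i_1, i_2, \ldots, i_m\}$ is an item domain. Any subset of $I$ is a transaction, and a transaction data set $D$ consists of all transactions under inspection. Each transaction in $D$ has a unique transaction identifier (TID).

In classical association rule extraction, an item set is a subset of $I$. When it is formed by $k$ distinct items, we call it a $k$-item set. To simplify notation, the item set $\{i_1, i_2, \ldots, i_k\}$ is denoted by $i_1 i_2 \cdots i_k$. An item set $X$ appears in a transaction $d$ (alternatively, $d$ supports $X$) when $X \subseteq d$.

Now, let $T = \{t_1, t_2, \ldots, t_n\}$ be a collection of pairwise disjoint periods of time. If $d \in D$ is related to a unique period $t \in T$ (indicating that $d$ occurs during the period $t$), then $t$ is called the period marker of $d$. In fact, this defines a mapping $\tau$ from $D$ to $T$ such that $\tau(d) = t$. In what follows, $(D, \tau, T)$ is called a temporal transaction data set.

Definition 4. Assume that $(D, \tau, T)$ is a temporal transaction data set, $Q \subseteq T$, $d \in D$, and $X \subseteq I$ is an item set. Then, $d$ supports $X$ in $(D, \tau, T)$ during a period in $Q$ if $X \subseteq d$ and $\tau(d) \in Q$.

Definition 5. Assume that $(D, \tau, T)$ is a temporal transaction data set, $Q \subseteq T$, and $X \subseteq I$ is an item set. The set
$$\mathrm{TR}_Q(X) = \{d \in D : X \subseteq d \text{ and } \tau(d) \in Q\}$$
is the temporal realization of $X$ in $(D, \tau, T)$ during a period in $Q$.

The set $\mathrm{TR}_Q(X)$ consists of all the transactions in $D$ which contain all the items in $X$ and occur during a period in $Q$. The cardinality of this set is written as $\mathrm{TS}_Q(X)$, called the temporal support of $X$ in $(D, \tau, T)$ during a period in $Q$. For simplicity, $\mathrm{TR}_{\{t\}}(X)$ and $\mathrm{TS}_{\{t\}}(X)$ are written as $\mathrm{TR}_t(X)$ and $\mathrm{TS}_t(X)$, respectively.

Definition 6. Let $(D, \tau, T)$ be a temporal transaction data set and $Q \subseteq T$. Given two disjoint nonempty item sets $X, Y \subseteq I$, an expression $X \Rightarrow_Q Y$ is called a temporal association rule (TAR).

We refer to $Y$ and $X$ as the consequent and antecedent of the rule $X \Rightarrow_Q Y$. The rule $X \Rightarrow_{\{t\}} Y$ is simply written as $X \Rightarrow_t Y$.

Definition 7. Let $(D, \tau, T)$ be a temporal transaction data set, and $X \Rightarrow_Q Y$ be a TAR. Then, the temporal realization of $X \Rightarrow_Q Y$ in $(D, \tau, T)$ is given by
$$\mathrm{TR}(X \Rightarrow_Q Y) = \{d \in D : X \cup Y \subseteq d \text{ and } \tau(d) \in Q\}.$$

The cardinality of the set $\mathrm{TR}(X \Rightarrow_Q Y)$, denoted by $\mathrm{TS}(X \Rightarrow_Q Y)$, is called the temporal support of $X \Rightarrow_Q Y$.

Definition 8. The temporal confidence of a TAR $X \Rightarrow_Q Y$ is given by
$$\mathrm{TC}(X \Rightarrow_Q Y) = \frac{\mathrm{TS}(X \Rightarrow_Q Y)}{\mathrm{TS}_Q(X)}.$$
In particular, $\mathrm{TC}(X \Rightarrow_Q Y) = 0$ if $\mathrm{TS}_Q(X) = 0$.

The temporal confidence serves as an essential measure in the evaluation of temporal association rules. It reflects the strength of the association between the antecedent and the consequent of a TAR during the periods concerned.

For simplicity, $\mathrm{TR}(X \Rightarrow_{\{t\}} Y)$, $\mathrm{TS}(X \Rightarrow_{\{t\}} Y)$, and $\mathrm{TC}(X \Rightarrow_{\{t\}} Y)$ are written as $\mathrm{TR}(X \Rightarrow_t Y)$, $\mathrm{TS}(X \Rightarrow_t Y)$, and $\mathrm{TC}(X \Rightarrow_t Y)$, respectively. Let $\mathbb{N}$ stand for the set of all positive integers. To find significant and interesting TARs from a temporal transaction data set $(D, \tau, T)$, the users or experts should specify the minimum temporal support (min-TS) $\sigma \in \mathbb{N}$ and the minimum temporal confidence (min-TC) $\gamma \in [0, 1]$ for a given subset $Q$ of $T$. An item set $X$ is temporal frequent during a period in $Q$ if $\mathrm{TS}_Q(X) \geq \sigma$. A TAR $X \Rightarrow_Q Y$ is frequent during a period in $Q$ if $\mathrm{TS}(X \Rightarrow_Q Y) \geq \sigma$. If $\mathrm{TC}(X \Rightarrow_Q Y) \geq \gamma$, then $X \Rightarrow_Q Y$ is a confident TAR during a period in $Q$. A TAR is strong during a period in $Q$ if it is both frequent and confident.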
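The notions of temporal realization, temporal support, temporal confidence, and strong TARs can be sketched in a few lines of Python. This is only an illustrative sketch: the toy transactions, the function names, and the convention of returning zero confidence when the antecedent has no support are assumptions made here, not material taken from Table 1.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

# Hypothetical toy data: (TID, period marker, items). Not taken from Table 1.
Transaction = Tuple[str, str, FrozenSet[str]]

TRANSACTIONS: List[Transaction] = [
    ("d1", "t1", frozenset({"a", "b"})),
    ("d2", "t1", frozenset({"a", "b", "c"})),
    ("d3", "t2", frozenset({"a", "c"})),
    ("d4", "t2", frozenset({"b", "c"})),
    ("d5", "t3", frozenset({"a", "b"})),
]


def temporal_realization(data: List[Transaction],
                         itemset: Iterable[str],
                         periods: Set[str]) -> Set[str]:
    """TIDs of transactions that contain `itemset` and occur in `periods`."""
    items = frozenset(itemset)
    return {tid for tid, period, content in data
            if period in periods and items <= content}


def temporal_support(data: List[Transaction],
                     itemset: Iterable[str],
                     periods: Set[str]) -> int:
    """Cardinality of the temporal realization."""
    return len(temporal_realization(data, itemset, periods))


def temporal_confidence(data: List[Transaction],
                        antecedent: Iterable[str],
                        consequent: Iterable[str],
                        periods: Set[str]) -> float:
    """Support of the rule divided by the support of its antecedent."""
    antecedent = set(antecedent)
    denominator = temporal_support(data, antecedent, periods)
    if denominator == 0:
        return 0.0  # assumed convention for an unsupported antecedent
    joint = temporal_support(data, antecedent | set(consequent), periods)
    return joint / denominator


def is_strong(data: List[Transaction],
              antecedent: Iterable[str],
              consequent: Iterable[str],
              periods: Set[str],
              min_ts: int,
              min_tc: float) -> bool:
    """A rule is strong when it is both frequent and confident."""
    antecedent, consequent = set(antecedent), set(consequent)
    joint = temporal_support(data, antecedent | consequent, periods)
    confidence = temporal_confidence(data, antecedent, consequent, periods)
    return joint >= min_ts and confidence >= min_tc
```

With min-TS 2 and min-TC 0.9, the rule from `{"a"}` to `{"b"}` in this toy data is strong when the periods are restricted to `{"t1"}` but not when all three periods are considered, which mirrors the motivation for restricting attention to a subset of periods.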

The next example illustrates some concepts mentioned above.

Example 1. Consider a temporal transaction data set adapted from [25]. Let us assume that be a sample temporal transaction data set, where consisting of all the transactions. Assume that every is related to a unique period , where . From Table 1, it can be seen that is divided into four parts by . For example, the item set appears in the transaction and during the period .
Now, let us consider the subset of . By Definition 5, we have and . In a similar fashion, and . In addition, the 2-item set appears in the transaction , and transaction occurs during the period . Thus by Definition 4, we can say that supports in during the period . Also, it is clear that and .
Next, we consider the TAR . By Definition 7, its temporal realization in isand the temporal support of this rule isBy Definition 8, the temporal confidence of this rule isFinally, assume that and . We conclude that is a strong TAR during the period .

3. Temporal Soft Sets

In this section, we define some new concepts such as temporal granulation mappings, temporal soft sets, and $Q$-clip soft sets, which play a fundamental role in this study. In the following, $U$ represents a universal set of objects and $E$ stands for the parameter space consisting of all parameters associated with objects in $U$. The power set of $U$ is written as $\mathcal{P}(U)$.

Definition 9 (see [28]). A soft set over $U$ is an ordered pair $(F, A)$, in which $A \subseteq E$ and $F: A \to \mathcal{P}(U)$ is called the approximate function of $(F, A)$.

Definition 10 (see [67]). Assume that $U$ and $A$ are nonempty finite sets of alternatives and attributes, respectively. The pair $(U, A)$ is called an information system (IS) when every attribute $a \in A$ can be identified with an information function $a: U \to V_a$, where $V_a$ is the value set of $a$.

When $(F, A)$ is a soft set over $U$, it naturally induces an IS $(U, A)$ in the following fashion. Given every $a \in A$ and $u \in U$, associate the corresponding information function as follows:
$$a(u) = \begin{cases} 1, & \text{if } u \in F(a), \\ 0, & \text{otherwise.} \end{cases}$$
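As a small illustration of how a soft set induces an information system, the following Python sketch builds the 0/1 table in which an entry is 1 exactly when the object belongs to the approximate set of the parameter. The dictionary representation, the function name, and the toy data are assumptions introduced here for illustration.

```python
from typing import Dict, List, Set

# Hypothetical toy soft set: parameter -> set of objects it approximates.
UNIVERSE: List[str] = ["u1", "u2", "u3", "u4"]
SOFT_SET: Dict[str, Set[str]] = {
    "e1": {"u1", "u2"},
    "e2": {"u2", "u4"},
}


def induced_information_system(universe: List[str],
                               soft_set: Dict[str, Set[str]]
                               ) -> Dict[str, Dict[str, int]]:
    """Map each parameter to its 0/1 information function on the universe."""
    return {
        parameter: {u: int(u in approximation) for u in universe}
        for parameter, approximation in soft_set.items()
    }


if __name__ == "__main__":
    table = induced_information_system(UNIVERSE, SOFT_SET)
    for parameter, row in table.items():
        print(parameter, [row[u] for u in UNIVERSE])
```

Each row of the resulting table is a binary vector, which is the form that bitmap-based mining techniques can consume directly.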

Definition 11. Let $T$ be a set of pairwise disjoint periods of time. Then, a mapping $\tau: U \to T$ is called a temporal granulation mapping.

Definition 12. A temporal soft set (TSS) over $U$ is a quadruple $\mathfrak{S} = (F, A, \tau, T)$ such that
(1) $(F, A)$ is a soft set over $U$
(2) $\tau: U \to T$ is a temporal granulation mapping

The soft set $(F, A)$ is said to be the underlying soft set (USS) of the TSS $\mathfrak{S}$. We also refer to $(\tau, T)$ as a temporal granulation of $U$. The TSS $\mathfrak{S}$, as an abstract representation of data, can additionally capture temporal information, which cannot be expressed by its underlying soft set.

Definition 13. Assume that $\mathfrak{S} = (F, A, \tau, T)$ is a TSS over $U$ and $Q \subseteq T$. Then the $Q$-clip of $\mathfrak{S}$ is a soft set $\mathfrak{S}_Q = (F_Q, A)$ over
$$U_Q = \{u \in U : \tau(u) \in Q\},$$
where $F_Q(a) = F(a) \cap U_Q$ for all $a \in A$.

Note that the $\{t\}$-clip soft set is simply called the $t$-clip soft set. Next, we consider an example that illustrates the abovementioned notions.
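A temporal soft set and its clip can be sketched as follows, assuming that the clip keeps only the objects whose period lies in the chosen set of periods and restricts every approximate set accordingly. The representation (plain dictionaries), the function name `clip`, and the toy data are hypothetical choices made for this illustration.

```python
from typing import Dict, Set, Tuple

# Hypothetical toy temporal soft set.
SOFT_SET: Dict[str, Set[str]] = {
    "e1": {"u1", "u2"},
    "e2": {"u2", "u4"},
}
# Temporal granulation mapping: object -> period.
TAU: Dict[str, str] = {"u1": "t1", "u2": "t1", "u3": "t2", "u4": "t3"}


def clip(soft_set: Dict[str, Set[str]],
         tau: Dict[str, str],
         periods: Set[str]) -> Tuple[Set[str], Dict[str, Set[str]]]:
    """Return the restricted universe and the clipped approximate function."""
    kept = {u for u, period in tau.items() if period in periods}
    clipped = {parameter: approximation & kept
               for parameter, approximation in soft_set.items()}
    return kept, clipped


if __name__ == "__main__":
    universe_q, soft_set_q = clip(SOFT_SET, TAU, {"t1"})
    print(sorted(universe_q), soft_set_q)
```

Clipping with all periods returns the underlying soft set unchanged, while clipping with a single period isolates the objects of that period only.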

Example 2. The Nobel Prizes are awarded annually to individuals and organizations in recognition of outstanding contributions in several categories: literature, chemistry, physics, physiology or medicine, and peace. In the following, we focus on three types of prizes, which are the Nobel Prizes in Physics (NPP), Physiology or Medicine (NPPM), and Chemistry (NPC).
We consideras a universal set that consists of all Nobel Prizes in scientific categories, namely, NPP, NPPM, and NPC awarded between 1901 and 1903. Detailed information regarding these prizes can be found in Table 2. Suppose that is a set of parameters, containing all the affiliation countries associated with the prizes in . More specifically, let stand for “Denmark,” “France,” “Germany,” “The Netherlands,” “Sweden,” and “United Kingdom,” respectively. Based on the information in Table 2, we can construct a soft set over , with its approximate function defined as , , , , , and .
In addition, a temporal granulation of can be derived from Table 2 in a natural way. In fact, let with for . Then, the temporal granulation mapping is given byThe intuitive meaning of is apparent. For instance, equation (10) says that the prizes , , and were bestowed in 1901. With this mapping, we can construct a TSS over , as shown in Table 3. As seen from the equations (10)–(12), the temporal granulation mapping induces a partition of as follows:Finally, by Definition 13, the -clip soft set of the TSS for are as follows:(1)The -clip of is a soft set over , where and for all with (2)The -clip of is a soft set over , where , , , and for all with (3)The -clip of is a soft set over , where , , , and for all with

4. Soft Temporal Association Rule Mining

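This section aims to establish a formal framework for mining TARs by means of TSSs. Let $T$ be a set of pairwise disjoint periods of time and $Q \subseteq T$ throughout this section.

Definition 14 (see [66]). Assume that $(F, A)$ is a soft set over $U$ and $u \in U$. Then, we call
$$\mathrm{Coset}(u) = \{a \in A : u \in F(a)\}$$
the parameter coset of the alternative $u$ in $(F, A)$.

It can be seen that $\mathrm{Coset}(u)$ contains all the parameters that the alternative $u$ meets, according to the information contained in $(F, A)$.

Definition 15. Assume that $\mathfrak{S} = (F, A, \tau, T)$ is a TSS over $U$ with its USS $(F, A)$. For any $B \subseteq A$,
$$\mathcal{R}_Q(B) = \{u \in U : \tau(u) \in Q \text{ and } B \subseteq \mathrm{Coset}(u)\} \tag{15}$$
is called the $Q$-realization of $B$ in the TSS $\mathfrak{S}$.

When $u \in \mathcal{R}_Q(B)$, it is said that $B$ is $Q$-supported by the alternative $u$. The $Q$-support of $B$ in $\mathfrak{S}$ is the cardinality of $\mathcal{R}_Q(B)$, represented by $\mathrm{supp}_Q(B)$. Note that $\mathcal{R}_{\{t\}}(B)$ and $\mathrm{supp}_{\{t\}}(B)$ are respectively written as $\mathcal{R}_t(B)$ and $\mathrm{supp}_t(B)$.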
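The parameter coset and the realization of a parameter set restricted to chosen periods can be computed directly from their verbal descriptions. The sketch below assumes a dictionary-based temporal soft set; the data and function names are hypothetical.

```python
from typing import Dict, Iterable, Set

# Hypothetical toy temporal soft set.
SOFT_SET: Dict[str, Set[str]] = {
    "e1": {"u1", "u2"},
    "e2": {"u2", "u4"},
}
TAU: Dict[str, str] = {"u1": "t1", "u2": "t1", "u3": "t2", "u4": "t3"}


def parameter_coset(soft_set: Dict[str, Set[str]], u: str) -> Set[str]:
    """All parameters whose approximate set contains the object `u`."""
    return {a for a, approximation in soft_set.items() if u in approximation}


def realization(soft_set: Dict[str, Set[str]],
                tau: Dict[str, str],
                parameters: Iterable[str],
                periods: Set[str]) -> Set[str]:
    """Objects in the chosen periods that meet every given parameter."""
    wanted = set(parameters)
    return {u for u, period in tau.items()
            if period in periods and wanted <= parameter_coset(soft_set, u)}


def support(soft_set: Dict[str, Set[str]],
            tau: Dict[str, str],
            parameters: Iterable[str],
            periods: Set[str]) -> int:
    """Cardinality of the realization."""
    return len(realization(soft_set, tau, parameters, periods))
```

For instance, `realization(SOFT_SET, TAU, {"e2"}, {"t1"})` keeps only `u2`, because `u4` also meets `e2` but occurs during `t3`.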

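Definition 16. Assume that $\mathfrak{S} = (F, A, \tau, T)$ is a TSS over $U$ and $X, Y$ are two disjoint nonempty subsets of $A$. We call the expression $X \Rightarrow Y$ a temporal association rule (TAR) in the TSS $\mathfrak{S}$. The nonempty parameter sets $Y$ and $X$ are respectively called the consequent and antecedent of the TAR $X \Rightarrow Y$.

Definition 17. Suppose that $\mathfrak{S} = (F, A, \tau, T)$ is a TSS over $U$ and $X \Rightarrow Y$ is a TAR in $\mathfrak{S}$. We refer to
$$\mathcal{R}_Q(X \Rightarrow Y) = \{u \in U : \tau(u) \in Q \text{ and } X \cup Y \subseteq \mathrm{Coset}(u)\}$$
as the $Q$-realization of $X \Rightarrow Y$ in the TSS $\mathfrak{S}$.

The $Q$-support of $X \Rightarrow Y$, written as $\mathrm{supp}_Q(X \Rightarrow Y)$, is the cardinality of $\mathcal{R}_Q(X \Rightarrow Y)$. For convenience, $\mathcal{R}_{\{t\}}(X \Rightarrow Y)$ and $\mathrm{supp}_{\{t\}}(X \Rightarrow Y)$ are simply written as $\mathcal{R}_t(X \Rightarrow Y)$ and $\mathrm{supp}_t(X \Rightarrow Y)$, respectively.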

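Proposition 1. Assume that $\mathfrak{S} = (F, A, \tau, T)$ is a TSS over $U$ and fix a nonempty subset $B \subseteq A$. Then
$$\mathcal{R}_Q(B) = \tau^{-1}(Q) \cap \bigcap_{b \in B} F(b).$$

Proof. We denote by $(F, A)$ the USS of the TSS $\mathfrak{S}$. Let $u \in \mathcal{R}_Q(B)$. Equation (15) assures $\tau(u) \in Q$ and $B \subseteq \mathrm{Coset}(u)$. By Definition 14, $u \in F(b)$ when $b \in B$. Thus we have
$$u \in \tau^{-1}(Q) \cap \bigcap_{b \in B} F(b).$$
This proves
$$\mathcal{R}_Q(B) \subseteq \tau^{-1}(Q) \cap \bigcap_{b \in B} F(b).$$
Now, suppose that $u \in \tau^{-1}(Q) \cap \bigcap_{b \in B} F(b)$. Then $\tau(u) \in Q$ and $u \in F(b)$ for any $b \in B$. From the definition of the parameter coset $\mathrm{Coset}(u)$, it follows that $b \in \mathrm{Coset}(u)$ for every $b \in B$ and hence $B \subseteq \mathrm{Coset}(u)$. Hence $u \in \mathcal{R}_Q(B)$, which also shows that
$$\tau^{-1}(Q) \cap \bigcap_{b \in B} F(b) \subseteq \mathcal{R}_Q(B).$$
Therefore we derive that
$$\mathcal{R}_Q(B) = \tau^{-1}(Q) \cap \bigcap_{b \in B} F(b).$$
This ends the proof.

By Proposition 1, the following results can be deduced.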

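Corollary 1. Assume that $\mathfrak{S} = (F, A, \tau, T)$ is a TSS over $U$ and $\mathfrak{S}_Q = (F_Q, A)$ is its $Q$-clip soft set. Then we have
$$\mathcal{R}_Q(B) = \bigcap_{b \in B} F_Q(b)$$
for every nonempty subset $B$ of $A$.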

Corollary 2. Assume that is a TSS over and are subsets of . Then we have

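Proposition 2. Assume that $\mathfrak{S} = (F, A, \tau, T)$ is a TSS over $U$ and $X \Rightarrow Y$ is a TAR in $\mathfrak{S}$. Then,
$$\mathcal{R}_Q(X \Rightarrow Y) = \mathcal{R}_Q(X) \cap \mathcal{R}_Q(Y).$$

Proof. According to Definition 17,
$$\mathcal{R}_Q(X \Rightarrow Y) = \{u \in U : \tau(u) \in Q \text{ and } X \cup Y \subseteq \mathrm{Coset}(u)\} = \mathcal{R}_Q(X \cup Y).$$
By Proposition 1, we have
$$\mathcal{R}_Q(X \cup Y) = \tau^{-1}(Q) \cap \bigcap_{a \in X \cup Y} F(a) = \Big(\tau^{-1}(Q) \cap \bigcap_{a \in X} F(a)\Big) \cap \Big(\tau^{-1}(Q) \cap \bigcap_{a \in Y} F(a)\Big) = \mathcal{R}_Q(X) \cap \mathcal{R}_Q(Y).$$
This ends the proof.

Remark 1. The above assertion reveals that the $Q$-realization of a TAR in a TSS coincides with the intersection of the $Q$-realizations of the consequent and antecedent of that TAR.

By Proposition 2, the following results can be deduced.

Corollary 3. Assume that $\mathfrak{S} = (F, A, \tau, T)$ is a TSS over $U$ and $\mathfrak{S}_Q = (F_Q, A)$ is its $Q$-clip soft set. Then,
$$\mathcal{R}_Q(X \Rightarrow Y) = \bigcap_{a \in X \cup Y} F_Q(a),$$
where $X \Rightarrow Y$ is a TAR in $\mathfrak{S}$.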
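The statement in Remark 1 can be checked numerically on a small example. The sketch below computes the realization of a rule in three ways: from the parameter cosets of the objects, as the intersection of the realizations of antecedent and consequent, and as an intersection of clipped approximate sets. The toy data and all function names are assumptions for illustration, and the third computation reflects one reading of how the clip soft set determines the realization.

```python
from itertools import combinations
from typing import Dict, Iterable, Set

# Hypothetical toy temporal soft set.
SOFT_SET: Dict[str, Set[str]] = {
    "e1": {"u1", "u2", "u3"},
    "e2": {"u2", "u3", "u4"},
    "e3": {"u3", "u5"},
}
TAU: Dict[str, str] = {"u1": "t1", "u2": "t1", "u3": "t2",
                       "u4": "t2", "u5": "t3"}


def coset(u: str) -> Set[str]:
    return {a for a, approximation in SOFT_SET.items() if u in approximation}


def realization(parameters: Iterable[str], periods: Set[str]) -> Set[str]:
    wanted = set(parameters)
    return {u for u, t in TAU.items() if t in periods and wanted <= coset(u)}


def rule_realization(antecedent: Iterable[str], consequent: Iterable[str],
                     periods: Set[str]) -> Set[str]:
    """Objects in the periods that meet all parameters of the rule."""
    return realization(set(antecedent) | set(consequent), periods)


def clip_intersection(parameters: Iterable[str], periods: Set[str]) -> Set[str]:
    """Intersect the clipped approximate sets of the given parameters."""
    kept = {u for u, t in TAU.items() if t in periods}
    return set.intersection(*[SOFT_SET[a] & kept for a in parameters])


def check_all(periods: Set[str]) -> bool:
    """Compare the three computations for every disjoint nonempty pair."""
    names = sorted(SOFT_SET)
    for r in range(1, len(names)):
        for x in combinations(names, r):
            rest = [a for a in names if a not in x]
            for s in range(1, len(rest) + 1):
                for y in combinations(rest, s):
                    direct = rule_realization(x, y, periods)
                    if direct != realization(x, periods) & realization(y, periods):
                        return False
                    if direct != clip_intersection(set(x) | set(y), periods):
                        return False
    return True
```

On this toy data, `check_all` returns `True` for every choice of periods, and `rule_realization({"e1"}, {"e2"}, {"t1", "t2"})` yields the objects `u2` and `u3`.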

Corollary 4. Assume that is a TSS over and is a TAR in . Then,

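Definition 18. Assume that $\mathfrak{S} = (F, A, \tau, T)$ is a TSS over $U$ and $X \Rightarrow Y$ is a TAR in $\mathfrak{S}$. The $Q$-confidence of $X \Rightarrow Y$ is given by
$$\mathrm{conf}_Q(X \Rightarrow Y) = \frac{\mathrm{supp}_Q(X \Rightarrow Y)}{\mathrm{supp}_Q(X)}.$$

For convenience, $\mathrm{conf}_{\{t\}}(X \Rightarrow Y)$ is simply written as $\mathrm{conf}_t(X \Rightarrow Y)$.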

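Theorem 1. Assume that $\mathfrak{S} = (F, A, \tau, T)$ is a TSS over $U$ and $X \Rightarrow Y$ is a TAR in $\mathfrak{S}$. Then, $X \Rightarrow Y$ is strong during a period in $Q$ if and only if
$$\mathrm{supp}_Q(X \Rightarrow Y) \geq \max\{\sigma, \gamma \cdot \mathrm{supp}_Q(X)\},$$
where $\sigma$ is the min-TS and $\gamma$ is the min-TC.

Proof. Suppose that $X \Rightarrow Y$ is strong in $\mathfrak{S}$ during a period in $Q$. Then, we have
$$\mathrm{supp}_Q(X \Rightarrow Y) \geq \sigma \quad \text{and} \quad \mathrm{conf}_Q(X \Rightarrow Y) \geq \gamma.$$
It follows that
$$\mathrm{supp}_Q(X \Rightarrow Y) \geq \gamma \cdot \mathrm{supp}_Q(X).$$
Thus, we have
$$\mathrm{supp}_Q(X \Rightarrow Y) \geq \max\{\sigma, \gamma \cdot \mathrm{supp}_Q(X)\}.$$
Conversely, let $X \Rightarrow Y$ be a temporal association rule in $\mathfrak{S}$ such that
$$\mathrm{supp}_Q(X \Rightarrow Y) \geq \max\{\sigma, \gamma \cdot \mathrm{supp}_Q(X)\}.$$
It follows that
$$\mathrm{supp}_Q(X \Rightarrow Y) \geq \sigma \quad \text{and} \quad \mathrm{supp}_Q(X \Rightarrow Y) \geq \gamma \cdot \mathrm{supp}_Q(X).$$
Since $\mathrm{supp}_Q(X) \geq \mathrm{supp}_Q(X \Rightarrow Y) \geq \sigma > 0$, we deduce that
$$\mathrm{conf}_Q(X \Rightarrow Y) = \frac{\mathrm{supp}_Q(X \Rightarrow Y)}{\mathrm{supp}_Q(X)} \geq \gamma.$$
Thus, $X \Rightarrow Y$ is strong in $\mathfrak{S}$ during a period in $Q$, completing the proof.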
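One way to see why a single comparison can replace the two separate threshold tests is to compare both formulations exhaustively on small numbers. The sketch below assumes that the combined test reads "rule support is at least the maximum of min-TS and min-TC times the antecedent support"; this is our reading of the condition and not necessarily its exact published form. Exact fractions are used so that floating-point rounding cannot cause spurious disagreements, and all function names are hypothetical.

```python
from fractions import Fraction


def strong_by_definition(rule_supp: int, ante_supp: int,
                         min_ts: int, min_tc: Fraction) -> bool:
    """Frequent and confident, checked as two separate conditions."""
    if rule_supp < min_ts:
        return False
    # Assumed convention: zero confidence for an unsupported antecedent.
    confidence = Fraction(rule_supp, ante_supp) if ante_supp > 0 else Fraction(0)
    return confidence >= min_tc


def strong_by_single_test(rule_supp: int, ante_supp: int,
                          min_ts: int, min_tc: Fraction) -> bool:
    """One comparison against the larger of the two thresholds."""
    return rule_supp >= max(min_ts, min_tc * ante_supp)


def agree_everywhere(max_supp: int = 12) -> bool:
    """Compare both tests for all small supports and thresholds."""
    confidences = [Fraction(k, 20) for k in range(21)]
    for ante_supp in range(max_supp + 1):
        # The rule support can never exceed the antecedent support.
        for rule_supp in range(ante_supp + 1):
            for min_ts in range(1, max_supp + 1):
                for min_tc in confidences:
                    a = strong_by_definition(rule_supp, ante_supp, min_ts, min_tc)
                    b = strong_by_single_test(rule_supp, ante_supp, min_ts, min_tc)
                    if a != b:
                        return False
    return True
```

The single test avoids a division and needs only the two support counts, which is what makes it convenient inside a mining loop.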
Using the aforementioned concepts and results, we can obtain the following result.

Proposition 3. Suppose that is a TSS over and is a TAR in with . Then, the following are equivalent:(1) is temporal frequent during a period in (2) is frequent during a period in (3) is confident during a period in (4) is strong during a period in

To illustrate the new notions above, we consider the following example, which is a continuation of Example 2.

Example 3. Assume that is a set of parameters, consisting of the three types of prizes under consideration, i.e., is NPC, is NPP, and is NPPM. Before using the proposed concepts regarding soft temporal association rule mining for mathematical modeling and analysis, we now first establish another TSS based on the information in Table 2. The TSS over is shown in Table 4, where the parameter set and the temporal granulation mapping is identical with what is defined in Example 2. In what follows, let us consider three different cases in which , , and , respectively. Suppose that the min-TS and . The min-TC for .
Let us first focus on the case when . Recall first that . By Definition 13, the -clip of the TSS is a soft set over , where , , , , , and for all with . By Proposition 1 and Corollary 1, we can easily getNext, we consider the TAR . By Proposition 2 and Corollary 3, its -realization in can be calculated as follows:In fact, as indicated by Corollary 3, the -realization of the TAR in the TSS is completely determined by the approximate function of the corresponding -clip . It is clear that the -support of this rule isBy Definition 18, the -confidence of this rule isHence, by definition, is strong during a period in . On the other hand, we can draw the same conclusion from Theorem 1 sinceNote also thatThus, we can also conclude that is strong during a period in by Proposition 3. This rule indicates that “From 1901 to 1902, all the Nobel Prizes in Chemistry were awarded to Germany.” Conversely, we can consider the TAR . Its -support isbut its -confidence isHence, this rule is frequent but not confident during a period in . It reveals that “From 1901 to 1902, 50% of the Nobel Prizes awarded to Germany pertain to the category of chemistry.”
Now, let us consider the second case when . Similarly, we can getHence, we conclude that is strong during the period . This rule says that “In 1902, Germany was only awarded the NPC, instead of the NPP or NPPM.”
Finally, we consider the third case when . Clearly, in this case. It follows that the -clip of the TSS is a soft set over , which coincide with the USS of the TSS . That is, for all . By Proposition 1 and Corollary 1, we haveNext, let us consider the TAR , which can also be seen as an association rule in conventional sense. By Proposition 2 and Corollary 3, its -realization in can be calculated as follows:Obviously, . By Definition 18, we also haveIt is clear thatThus, we deduce that is neither frequent nor confident during a period in . This rule reveals that “From 1901 to 1903, only one Nobel Prize in Physiology or Medicine was awarded to the United Kingdom.” In addition, it can be seen that the rule is neither frequent nor confident during a period in sinceThis rule says that “From 1901 to 1903, only a quarter of the Nobel Prizes awarded to Germany pertain to the category of physiology or medicine.”
Compared with the case of consisting of all time periods, we see that some rules such as and can only be identified as strong TARs when we restrict to the cases of or consisting of fewer time periods. This is mainly due to the fact that some item sets can be frequent during certain time periods rather than all of them. In a nutshell, we conclude that the TARM based on TSSs can help find some strong TARs which might be ignored in conventional rule extraction process.
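The conclusions quoted in this example can be reproduced with a short Python sketch. Since Table 2 is not reproduced here, the prize records below are an assumption: they are reconstructed from the statements quoted in the example together with the public record of the 1901–1903 prizes, with one record per prize and its affiliation country. The object labels `u1` to `u9` and the function names are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

# Assumed records: label -> (year, category, affiliation country).
PRIZES: Dict[str, Tuple[int, str, str]] = {
    "u1": (1901, "NPP", "Germany"),
    "u2": (1901, "NPC", "Germany"),
    "u3": (1901, "NPPM", "Germany"),
    "u4": (1902, "NPP", "The Netherlands"),
    "u5": (1902, "NPC", "Germany"),
    "u6": (1902, "NPPM", "United Kingdom"),
    "u7": (1903, "NPP", "France"),
    "u8": (1903, "NPC", "Sweden"),
    "u9": (1903, "NPPM", "Denmark"),
}


def support(parameters: Iterable[str], years: Set[int]) -> int:
    """Number of prizes in `years` that meet every parameter."""
    wanted = set(parameters)
    return sum(1 for year, category, country in PRIZES.values()
               if year in years and wanted <= {category, country})


def confidence(antecedent: str, consequent: str, years: Set[int]) -> float:
    """Rule support divided by antecedent support (0.0 if unsupported)."""
    denominator = support({antecedent}, years)
    if denominator == 0:
        return 0.0
    return support({antecedent, consequent}, years) / denominator


if __name__ == "__main__":
    early = {1901, 1902}
    print(support({"NPC", "Germany"}, early), confidence("NPC", "Germany", early))
    print(confidence("Germany", "NPC", early))
    print(confidence("Germany", "NPPM", {1901, 1902, 1903}))
```

Under these assumed records, the rule from NPC to Germany has confidence 1 over 1901–1902, the reverse rule has confidence 0.5, and over 1901–1903 only a quarter of the prizes affiliated with Germany are in Physiology or Medicine, in agreement with the sentences quoted above.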
Based on the results obtained in this section and the concepts such as TSSs and $Q$-clip soft sets proposed in Section 3, we present a novel TARM method by combining NegNodeset-based frequent item set mining with TSS-based rule mining. Our method will be abbreviated as negFIN-STARM in the sequel. The pseudocode description of the negFIN-STARM method is given in Algorithm 1. This algorithm takes a temporal transaction data set $(D, \tau, T)$, a set $Q \subseteq T$, the min-TS $\sigma$, and the min-TC $\gamma$ as the input. The output of Algorithm 1 is the class $\mathcal{S}$, which contains all strong TARs during a period in $Q$. The main procedure of the negFIN-STARM method can be divided into three stages:
(1) In the first stage, we construct a TSS $\mathfrak{S}$ over $D$ from the provided temporal transaction data set $(D, \tau, T)$. Then, according to Definition 13, we determine $D_Q = \{d \in D : \tau(d) \in Q\}$ and construct the $Q$-clip soft set $\mathfrak{S}_Q$ of the TSS $\mathfrak{S}$. Next, we derive the IS from the $Q$-clip soft set of $\mathfrak{S}$.
(2) In the second stage, the NegNodeset-based frequent item set mining technique and temporal soft sets are combined to generate all temporal frequent item sets. More specifically, we first employ the information functions of the IS to construct the BMC tree. Then, the Nodesets of all frequent 1-item sets are generated by traversing the BMC tree. Furthermore, we identify the NegNodesets of all frequent $k$-item sets $(k \geq 2)$. Eventually, the set-enumeration tree is built to generate the class $\mathcal{F}$, which consists of all temporal frequent item sets. These item sets will function as potential consequents and antecedents for finding strong TARs. Here, we would like to emphasize a crucial issue. To apply NegNodeset-based frequent item set mining, the BMC tree must be built to generate the node set related to every frequent 1-item set. Each frequent item set is represented by a bitmap code, and every frequent 1-item set is mapped to one of its bits. In other words, the bit values at the indices of the temporal frequent 1-item sets can be combined to form the bitmap code of a temporal frequent item set. It is worth noting that the use of TSSs and $Q$-clip soft sets can facilitate the calculation of bitmap codes and the construction of BMC trees in this important stage.
(3) In the last stage, by Corollary 3, we can calculate the $Q$-realization of $X \Rightarrow Y$ using the $Q$-clip soft set $\mathfrak{S}_Q$ for all $X, Y \in \mathcal{F}$ which are disjoint. Next, by Theorem 1, it is easy to check whether or not the TAR $X \Rightarrow Y$ is strong during a period in $Q$. If this is true, then we put $X \Rightarrow Y$ into the class $\mathcal{S}$.
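The idea of mapping every frequent 1-item set to one bit and combining bits into the bitmap code of a larger item set can be sketched as follows. This is only a minimal illustration of bitmap codes using Python integers; it is not the BMC tree or the NegNodeset structure of negFIN, and the function names are hypothetical.

```python
from typing import Dict, Iterable


def build_index(frequent_items: Iterable[str]) -> Dict[str, int]:
    """Assign one bit position to each frequent 1-item set."""
    return {item: position
            for position, item in enumerate(sorted(frequent_items))}


def bitmap_code(itemset: Iterable[str], index: Dict[str, int]) -> int:
    """Combine the bits of the member items into a single integer code."""
    code = 0
    for item in itemset:
        code |= 1 << index[item]
    return code


def contains(transaction_code: int, itemset_code: int) -> bool:
    """Subset test via bitwise AND: every bit of the item set must be set."""
    return transaction_code & itemset_code == itemset_code


if __name__ == "__main__":
    index = build_index({"a", "b", "c"})
    transaction = bitmap_code({"a", "b", "c"}, index)
    print(bitmap_code({"a", "c"}, index), contains(transaction, bitmap_code({"a", "c"}, index)))
```

Because containment reduces to one AND and one comparison, support counting over clipped data becomes a loop of cheap integer operations.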

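	Input: a temporal transaction data set $(D, \tau, T)$, a set $Q \subseteq T$, the min-TS $\sigma$, and the min-TC $\gamma$.
	Output: the class $\mathcal{S}$ that contains all strong TARs during a period in $Q$.
(1)	Construct a TSS $\mathfrak{S}$ over $D$ from the temporal transaction data set $(D, \tau, T)$ with the item domain $I$
(2)	Calculate $D_Q = \{d \in D : \tau(d) \in Q\}$ and construct the $Q$-clip soft set $\mathfrak{S}_Q$ of $\mathfrak{S}$
(3)	Construct the IS induced by $\mathfrak{S}_Q$
(4)	Construct the BMC tree by the information functions of the IS
(5)	Traverse the BMC tree to get the Nodesets of all frequent 1-item sets
(6)	Identify the NegNodesets of all frequent $k$-item sets $(k \geq 2)$
(7)	Build the set-enumeration tree to generate the class $\mathcal{F}$, which consists of all temporal frequent item sets
(8)	for each $X \in \mathcal{F}$ do
(9)	for each $Y \in \mathcal{F}$ do
(10)	if $X \cap Y = \emptyset$ then
(11)	Calculate $\mathcal{R}_Q(X \Rightarrow Y)$;
(12)	end if
(13)	if $|\mathcal{R}_Q(X \Rightarrow Y)| \geq \max\{\sigma, \gamma \cdot |\mathcal{R}_Q(X)|\}$ then
(14)	Put $X \Rightarrow Y$ into $\mathcal{S}$;
(15)	end if
(16)	end for
(17)	end for
(18)	return $\mathcal{S}$;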
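The three stages of the procedure can be sketched end to end in Python. Note that this sketch deliberately replaces the NegNodeset-based mining of stage two with a brute-force, level-wise enumeration of item sets, so it illustrates the overall flow (clip, find temporal frequent item sets, test disjoint pairs) rather than negFIN itself. The toy data, the function name `mine_strong_tars`, and the requirement that min-TS be at least 1 are assumptions.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

Transaction = Tuple[str, str, FrozenSet[str]]  # (TID, period, items)
Rule = Tuple[FrozenSet[str], FrozenSet[str]]   # (antecedent, consequent)

# Hypothetical toy data.
DATA: List[Transaction] = [
    ("d1", "t1", frozenset({"a", "b"})),
    ("d2", "t1", frozenset({"a", "b", "c"})),
    ("d3", "t2", frozenset({"a", "c"})),
    ("d4", "t2", frozenset({"b", "c"})),
    ("d5", "t3", frozenset({"a", "b"})),
]


def mine_strong_tars(data: List[Transaction], periods: Set[str],
                     min_ts: int, min_tc: float) -> Set[Rule]:
    if min_ts < 1:
        raise ValueError("min_ts is assumed to be a positive integer")

    # Stage 1: keep only transactions in the chosen periods (the clip).
    clipped = [items for _, period, items in data if period in periods]

    def support(itemset: FrozenSet[str]) -> int:
        return sum(1 for items in clipped if itemset <= items)

    # Stage 2: brute-force stand-in for NegNodeset-based mining.
    domain = sorted(set().union(*clipped))
    frequent: Dict[FrozenSet[str], int] = {}
    for size in range(1, len(domain) + 1):
        found = False
        for combo in combinations(domain, size):
            candidate = frozenset(combo)
            count = support(candidate)
            if count >= min_ts:
                frequent[candidate] = count
                found = True
        if not found:
            break  # no frequent set of this size, so none of larger size

    # Stage 3: test every disjoint pair of temporal frequent item sets.
    strong: Set[Rule] = set()
    for x, x_count in frequent.items():
        for y in frequent:
            if x & y:
                continue
            joint = support(x | y)
            if joint >= min_ts and joint / x_count >= min_tc:
                strong.add((x, y))
    return strong
```

For example, `mine_strong_tars(DATA, {"t1"}, 2, 0.9)` returns the two rules between `{"a"}` and `{"b"}`. The brute-force enumeration is exponential in the number of items, which is exactly the cost that a dedicated frequent item set miner is meant to avoid.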

5. Numerical Experiments

In this section, we conduct numerical experiments on the commonly used chess and mushroom data sets to compare the performance of our newly presented negFIN-STARM approach with three well-known approaches in the literature, namely, the T-Apriori [23], T-FPGrowth [8], and T-ECLAT methods [9]. Hereinafter, the abbreviations such as T-Apriori, T-FPGrowth, and T-ECLAT stand for Temporal Apriori, Temporal FPGrowth, and Temporal Eclat, respectively.

5.1. Running Environment

The numerical experiment was conducted on a laptop computer equipped with a 2.00 GHz Intel Core i7 processor and 8 GB of RAM running the 64-bit Microsoft Windows 10 operating system. The algorithms are coded in Java 13.0.1 using IntelliJ IDEA 2019.2.2. The performance of the selected methods is evaluated by the runtime over the aforementioned data sets. For higher accuracy, the codes corresponding to the four methods were executed 5 times under the same conditions. The comparison is made in terms of the average values of the runtime.
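The measurement protocol described above (repeat each run several times under the same conditions and report the average runtime) can be sketched as a small timing helper. The experiments in this section were coded in Java; the Python helper below, including its name and the choice of `time.perf_counter`, is only a hypothetical illustration of the averaging step.

```python
import time
from typing import Callable


def average_runtime_ms(task: Callable[[], object], repeats: int = 5) -> float:
    """Run `task` `repeats` times and return the mean runtime in milliseconds."""
    if repeats <= 0:
        raise ValueError("repeats must be positive")
    total_seconds = 0.0
    for _ in range(repeats):
        start = time.perf_counter()
        task()
        total_seconds += time.perf_counter() - start
    return 1000.0 * total_seconds / repeats


if __name__ == "__main__":
    # Hypothetical workload standing in for one mining run.
    print(average_runtime_ms(lambda: sum(range(100_000))))
```

Averaging over repeated runs smooths out one-off effects such as caching or background load, although it does not remove systematic differences between machines.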

5.2. Description of Data Sets

Two commonly used data sets are employed for comparing our method with the existing methods mentioned above. These data sets are available from the open-source data mining library SPMF (http://www.philippe-fournier-viger.com/spmf/), founded by Philippe Fournier-Viger. The first is the chess data set, adapted from the UCI chess data set. The second is the mushroom data set, drawn from The Audubon Society Field Guide to North American Mushrooms. Table 5 gives a basic description of these data sets.
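As a rough illustration of how such data might be loaded, the sketch below parses the plain-text transaction layout commonly used by SPMF, assumed here to be one transaction per line with space-separated integer item IDs and optional lines starting with `#`, `%`, or `@` that carry comments or metadata. The source does not say how transactions were assigned to periods, so `attach_periods` is a purely hypothetical scheme (consecutive, roughly equal-sized blocks), and both function names are invented for this example.

```python
def parse_spmf_transactions(text):
    """Parse transactions from SPMF-style text (format assumed, see lead-in)."""
    transactions = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comment/metadata lines.
        if not line or line[0] in "#%@":
            continue
        transactions.append(frozenset(int(tok) for tok in line.split()))
    return transactions


def attach_periods(transactions, n_periods):
    """Hypothetical period assignment: split the sequence into consecutive blocks."""
    n = len(transactions)
    return [(i * n_periods // n, tr) for i, tr in enumerate(transactions)]


if __name__ == "__main__":
    sample = "1 3 5\n# a comment line\n2 3\n\n1 2 3 4\n"
    for period, items in attach_periods(parse_spmf_transactions(sample), 2):
        print(period, sorted(items))
```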

5.3. Results and Comparative Analysis

First, we conduct numerical experiments and a comparative analysis of the four methods using the chess data set. This data set contains 3196 transactions, each uniquely related to a period in . There are 75 different items in the item domain of this data set. We consider the cases when and . The min-TS and min-TC are simply denoted by and , respectively. The runtime comparison of the four methods on the chess data set under different thresholds is shown in Figure 2. More details regarding the average runtime (in milliseconds) of the four methods on the chess data set are listed in Tables 6 and 7.

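Figure 2: Runtime comparison of the four methods on the chess data set under different thresholds (panels (a) to (d)).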

As shown in Figure 2(a), the negFIN-STARM method is faster than the T-Apriori, T-ECLAT, and T-FPGrowth methods when , the min-TS , and the min-TC is 75%, 85%, and 95%, respectively. In addition, Figure 2(b) illustrates that the negFIN-STARM method runs faster than the other methods when and is designated as 400, 450, and 500, respectively.

From Figure 2(c), we see that the negFIN-STARM method is faster than the T-Apriori, T-ECLAT, and T-FPGrowth methods when , , and is designated as 75%, 85%, and 95%, respectively. In addition, Figure 2(d) illustrates that our new method performs better than those existing methods when and is set as 400, 450, and 500, respectively.

Furthermore, the number of TARs obtained from the chess data set under different thresholds is compared in Figure 3. In brief, the number of rules decreases as either threshold increases.

Figure 3: Number of TARs obtained from the chess data set under different thresholds (panels (a) to (d)).

More specifically, Figure 3(a) shows that if , the min-TS , and the min-TC is specified as 75%, 85%, and 95%, the number of TARs is 1 481 178, 1 290 849, and 348 100, respectively. In addition, as illustrated in Figure 3(b), when the min-TC , and is specified as 400, 450, and 500, the rule number is 26 632 597, 6 750 662, and 1 290 849, respectively.

Figure 3(c) shows that when , , and is specified as 75%, 85%, and 95%, the rule number is 312 092, 282 769, and 76 564, respectively. In addition, as shown in Figure 3(d), when and is specified as 400, 450, and 500, the rule number is 9 024 350, 1 926 626, and 282 769, respectively.
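The downward trend in these rule counts follows from the definitions: raising either threshold can only discard rules, never admit new ones. The toy sketch below makes this visible on a handful of transactions. It is an exhaustive counter written for this illustration, with the hypothetical name `count_rules`, and it assumes support is an absolute count and confidence is the usual support ratio within a single fixed period.

```python
from itertools import combinations


def count_rules(transactions, min_ts, min_tc):
    """Exhaustively count rules X => Y meeting both thresholds (toy sizes only)."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted(set().union(*transactions)) if transactions else []

    def supp(itemset):
        return sum(1 for t in transactions if itemset <= t)

    count = 0
    for k in range(2, len(items) + 1):
        for combo in combinations(items, k):
            s = supp(frozenset(combo))
            if s == 0 or s < min_ts:
                continue
            # Every nonempty proper subset is a candidate antecedent.
            for r in range(1, k):
                for ante in combinations(combo, r):
                    if s / supp(frozenset(ante)) >= min_tc:
                        count += 1
    return count


if __name__ == "__main__":
    toy = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
    for min_ts in (1, 2, 3, 4):
        print("min_ts =", min_ts, "rules =", count_rules(toy, min_ts, 0.5))
    for min_tc in (0.5, 0.7, 0.8, 1.0):
        print("min_tc =", min_tc, "rules =", count_rules(toy, 1, min_tc))
```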

Similarly, we also conduct numerical experiments and comparative analysis of four methods using the mushroom data set. This data set contains 8124 transactions, each uniquely related to a period in . There are 119 different items in the item domain of this data set. We consider the cases when and . The min-TS and min-TC are simply denoted by and , respectively. The runtime comparison with respect to the mushroom data set of four methods under different thresholds is shown in Figure 4. The quantity comparison of the obtained TARs based on the mushroom data set under different thresholds is illustrated in Figure 5. The average runtime (in milliseconds) of four methods on the mushroom data set is listed in Tables 8 and 9.

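Figure 4: Runtime comparison of the four methods on the mushroom data set under different thresholds (panels (a) to (d)).

Figure 5: Number of TARs obtained from the mushroom data set under different thresholds (panels (a) to (d)).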

The above numerical experiments demonstrate that our newly proposed method is a useful tool for mining TARs. The comparative analysis indicates that, on both data sets and under the tested thresholds, the negFIN-STARM method runs faster than three well-known existing methods, namely, the T-Apriori, T-FPGrowth, and T-ECLAT methods.

6. Conclusions

This paper is devoted to enhancing association rule mining by virtue of temporal soft sets. The notion of temporal granulation mappings was defined to induce the granular structure of a given temporal transaction data set. With the help of temporal granulation mappings, we introduced temporal soft sets and their -clip soft sets, which enable us to establish a conceptual framework for extracting TARs. In particular, we presented a number of useful characterizations and related results within this framework, including a necessary and sufficient condition for fast identification of strong TARs. An illustrative example regarding the Nobel Prizes was presented to show how these concepts and results can help facilitate TARM. We also developed a novel method, named negFIN-STARM, for extracting strong TARs by taking advantage of both temporal soft sets and NegNodeset-based frequent item set mining techniques. In addition, two commonly used data sets were employed to verify the feasibility of the negFIN-STARM method. Numerical results have shown that the negFIN-STARM method has better performance than existing approaches such as T-Apriori, T-ECLAT, and T-FPGrowth. It is also robust with respect to the selection of different min-TS and min-TC thresholds. In the future, it will be interesting to investigate the mining of maximal TARs using TSSs and to consider its potential applications to dynamic detection, fault diagnosis, and optimal control in industrial processes.

Data Availability

The data used to support the findings of this study are available from the SPMF library at http://www.philippe-fournier-viger.com/spmf/ founded by Dr. Philippe Fournier-Viger.

Conflicts of Interest

The authors declare that there are no conflicts of interest.

Acknowledgments

This work was partially supported by the National Natural Science Foundation of China (grant no. 11301415), the Shaanxi Provincial Key Research and Development Program (grant no. 2021SF-480), and the Natural Science Basic Research Plan in Shaanxi Province of China (grant no. 2018JM1054).

References

[1] Y. Li, Y. Zhao, G. Wang, Z. Wang, and M. Gao, “ELM-Based large-scale genetic association study via statistically significant pattern,” IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 49, no. 10, pp. 2175–2188, 2019.
[2] Y.-B. Kang, S. Krishnaswamy, and A. Zaslavsky, “A retrieval strategy for case-based reasoning using similarity and association knowledge,” IEEE Transactions on Cybernetics, vol. 44, no. 4, pp. 473–487, 2014.
[3] J. Nahar, T. Imam, K. S. Tickle, and Y.-P. P. Chen, “Association rule mining to detect factors which contribute to heart disease in males and females,” Expert Systems with Applications, vol. 40, no. 4, pp. 1086–1093, 2013.
[4] F. Xiao, “A distance measure for intuitionistic fuzzy sets and its application to pattern classification problems,” IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 51, no. 6, pp. 3980–3992, 2021.
[5] D. Liu, Y. Yuan, H. Zhu, S. Teng, and C. Huang, “Balance preferences with performance in group role assignment,” IEEE Transactions on Cybernetics, vol. 48, no. 6, pp. 1800–1813, 2018.
[6] R. Agrawal, T. Imielinski, and A. Swami, “Mining association rules between sets of items in large databases,” in Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 207–216, Washington, DC, USA, 1993.
[7] R. Agrawal and R. Srikant, “Fast algorithms for mining association rules,” The VLDB Conference, vol. 1215, pp. 487–499, 1994.
[8] J. Han, J. Pei, and Y. Yin, “Mining frequent patterns without candidate generation,” ACM SIGMOD Record, vol. 29, no. 2, pp. 1–12, 2000.
[9] M. J. Zaki, “Scalable algorithms for association mining,” IEEE Transactions on Knowledge and Data Engineering, vol. 12, no. 3, pp. 372–390, 2000.
[10] N. Aryabarzan, B. Minaei-Bidgoli, and M. Teshnehlab, “negFIN: an efficient algorithm for fast mining frequent itemsets,” Expert Systems with Applications, vol. 105, pp. 129–143, 2018.
[11] Y. Djenouri, A. Belhadi, P. Fournier-Viger, and H. Fujita, “Mining diversified association rules in big datasets: a cluster/GPU/genetic approach,” Information Sciences, vol. 459, pp. 117–134, 2018.
[12] J. M. Luna, F. Padillo, M. Pechenizkiy, and S. Ventura, “Apriori versions based on MapReduce for mining frequent patterns on big data,” IEEE Transactions on Cybernetics, vol. 48, no. 10, pp. 2851–2865, 2018.
[13] R. Feldman, Y. Aumann, A. Amir, A. Zilberstein, and W. Klosgen, “Maximal association rules: a new tool for mining for keywords co-occurrences in document collections,” in Proceedings of the Third International Conference on KDD, pp. 167–170, Newport Beach, CA, USA, August 1997.
[14] A. Amir, Y. Aumann, R. Feldman, and M. Fresko, “Maximal association rules: a tool for mining associations in text,” Journal of Intelligent Information Systems, vol. 25, no. 3, pp. 333–345, 2005.
[15] U. Yun, D. Kim, E. Yoon, and H. Fujita, “Damped window based high average utility pattern mining over data streams,” Knowledge-Based Systems, vol. 144, pp. 188–205, 2018.
[16] P. Fournier-Viger, Y. Zhang, J. Chun-Wei Lin, H. Fujita, and Y. S. Koh, “Mining local and peak high utility itemsets,” Information Sciences, vol. 481, pp. 344–367, 2019.
[17] W. Gan, J. C.-W. Lin, P. Fournier-Viger, H.-C. Chao, and P. S. Yu, “HUOPM: high-utility occupancy pattern mining,” IEEE Transactions on Cybernetics, vol. 50, no. 3, pp. 1195–1208, 2020.
[18] D. Nguyen, W. Luo, D. Phung, and S. Venkatesh, “LTARM: a novel temporal association rule mining method to understand toxicities in a routine cancer treatment,” Knowledge-Based Systems, vol. 161, pp. 313–328, 2018.
[19] H. Nam, K. Lee, and D. Lee, “Identification of temporal association rules from time-series microarray data sets,” BMC Bioinformatics, vol. 10, pp. 1471–2105, 2009.
[20] S. G. Matthews, M. A. Gongora, A. A. Hopgood, and S. Ahmadi, “Web usage mining with evolutionary extraction of temporal fuzzy association rules,” Knowledge-Based Systems, vol. 54, pp. 66–72, 2013.
[21] A. Segura-Delgado, M. J. Gacto, R. Alcalá, and J. Alcalá-Fdez, “Temporal association rule mining: an overview considering the time variable as an integral or implied component,” WIREs Data Mining and Knowledge Discovery, vol. 10, pp. 1367–1389, 2020.
[22] R. Agrawal and R. Srikant, “Mining sequential patterns,” in Proceedings of the Eleventh International Conference on Data Engineering, pp. 3–14, Taipei, Taiwan, May 1995.
[23] L. Zhai, X. M. Tang, L. Li, and W. L. Jiang, “Temporal association rule mining based on T-Apriori algorithm and its typical application,” in Proceedings of the International Symposium on Spatio-temporal Modeling, Spatial Reasoning, Spatial Analysis, Data Mining and Data Fusion, pp. 1–6, Beijing, China, August 2005.
[24] W. Gan, J. C.-W. Lin, J. Zhang, H.-C. Chao, H. Fujita, and P. S. Yu, “ProUM: projection-based utility mining on sequence data,” Information Sciences, vol. 513, pp. 222–240, 2020.
[25] T.-P. Hong, G.-C. Lan, J.-H. Su, P.-S. Wu, and S.-L. Wang, “Discovery of temporal association rules with hierarchical granular framework,” Applied Computing and Informatics, vol. 12, no. 2, pp. 134–141, 2016.
[26] H. S. Song, J. K. Kim, and S. H. Kim, “Mining the change of customer behavior in an internet shopping mall,” Expert Systems with Applications, vol. 21, no. 3, pp. 157–168, 2001.
[27] U. Yun, H. Ryang, G. Lee, and H. Fujita, “An efficient algorithm for mining high utility patterns from incremental databases with one database scan,” Knowledge-Based Systems, vol. 124, pp. 188–206, 2017.
[28] D. Molodtsov, “Soft set theory-First results,” Computers & Mathematics with Applications, vol. 37, no. 4-5, pp. 19–31, 1999.
[29] P. K. Maji, R. Biswas, and A. R. Roy, “Soft set theory,” Computers & Mathematics with Applications, vol. 45, no. 4-5, pp. 555–562, 2003.
[30] M. I. Ali, F. Feng, X. Liu, W. K. Min, and M. Shabir, “On some new operations in soft set theory,” Computers & Mathematics with Applications, vol. 57, no. 9, pp. 1547–1553, 2009.
[31] K. V. Babitha and J. J. Sunil, “Soft set relations and functions,” Computers & Mathematics with Applications, vol. 60, no. 7, pp. 1840–1849, 2010.
[32] F. Feng and Y. Li, “Soft subsets and soft product operations,” Information Sciences, vol. 232, pp. 44–57, 2013.
[33] P. K. Maji, R. Biswas, and A. R. Roy, “Fuzzy soft sets,” Journal of Fuzzy Mathematics, vol. 9, no. 3, pp. 589–602, 2001.
[34] P. K. Maji, R. Biswas, and A. R. Roy, “Intuitionistic fuzzy soft sets,” Journal of Fuzzy Mathematics, vol. 9, no. 3, pp. 677–692, 2001.
[35] K. Gong, Z. Xiao, and X. Zhang, “The bijective soft set with its operations,” Computers & Mathematics with Applications, vol. 60, no. 8, pp. 2270–2278, 2010.
[36] P. Majumdar and S. K. Samanta, “Generalised fuzzy soft sets,” Computers & Mathematics with Applications, vol. 59, no. 4, pp. 1425–1432, 2010.
[37] Y. Liu, R. M. Rodríguez, J. C. R. Alcantud, K. Qin, and L. Martínez, “Hesitant linguistic expression soft sets: application to group decision making,” Computers & Industrial Engineering, vol. 136, pp. 575–590, 2019.
[38] M. Akram, A. Adeel, and J. C. R. Alcantud, “Fuzzy N-soft sets: a novel model with applications,” Journal of Intelligent and Fuzzy Systems, vol. 35, no. 4, pp. 4757–4771, 2018.
[39] M. I. Ali and M. Shabir, “Logic connectives for soft sets and fuzzy soft sets,” IEEE Transactions on Fuzzy Systems, vol. 22, no. 6, pp. 1431–1442, 2014.
[40] X. Ma and H. Qin, “A distance-based parameter reduction algorithm of fuzzy soft sets,” IEEE Access, vol. 6, pp. 10530–10539, 2018.
[41] M. Irfan Ali, “A note on soft sets, rough soft sets and fuzzy soft sets,” Applied Soft Computing, vol. 11, no. 4, pp. 3329–3332, 2011.
[42] J. C. R. Alcantud, “Some formal relationships among soft sets, fuzzy sets, and their extensions,” International Journal of Approximate Reasoning, vol. 68, pp. 45–53, 2016.
[43] J. C. R. Alcantud, F. Feng, and R. R. Yager, “An N-soft set approach to rough sets,” IEEE Transactions on Fuzzy Systems, vol. 28, no. 11, pp. 2996–3007, 2020.
[44] F. Feng, C. Li, B. Davvaz, and M. I. Ali, “Soft sets combined with fuzzy sets and rough sets: a tentative approach,” Soft Computing, vol. 14, no. 9, pp. 899–911, 2010.
[45] F. Feng, X. Liu, V. Leoreanu-Fotea, and Y. B. Jun, “Soft sets and soft rough sets,” Information Sciences, vol. 181, no. 6, pp. 1125–1137, 2011.
[46] P. K. Maji, A. R. Roy, and R. Biswas, “An application of soft sets in a decision making problem,” Computers & Mathematics with Applications, vol. 44, no. 8-9, pp. 1077–1083, 2002.
[47] N. Çağman and S. Enginoğlu, “Soft set theory and uni-int decision making,” European Journal of Operational Research, vol. 207, pp. 848–855, 2010.
[48] F. Feng, Y. Li, and N. Çağman, “Generalized uni-int decision making schemes based on choice value soft sets,” European Journal of Operational Research, vol. 220, no. 1, pp. 162–170, 2012.
[49] J. Zhan, Q. Liu, and T. Herawan, “A novel soft rough set: soft rough hemirings and corresponding multicriteria group decision making,” Applied Soft Computing, vol. 54, pp. 393–402, 2017.
[50] Y. Liu, K. Qin, and L. Martínez, “Improving decision making approaches based on fuzzy soft sets and rough soft sets,” Applied Soft Computing, vol. 65, pp. 320–332, 2018.
[51] F. Feng, H. Fujita, M. I. Ali, R. R. Yager, and X. Liu, “Another view on generalized intuitionistic fuzzy soft sets and related multiattribute decision making methods,” IEEE Transactions on Fuzzy Systems, vol. 27, no. 3, pp. 474–488, 2019.
[52] J. C. R. Alcantud, S. Cruz-Rambaud, and M. J. M. Torrecillas, “Valuation fuzzy soft sets: a flexible fuzzy soft set based decision making procedure for the valuation of assets,” Symmetry, vol. 9, no. 253, 2017.
[53] H. Qin, X. Ma, J. M. Zain, and T. Herawan, “A novel soft set approach in selecting clustering attribute,” Knowledge-Based Systems, vol. 36, pp. 139–145, 2012.
[54] F. Xiao, “A hybrid fuzzy soft sets decision making method in medical diagnosis,” IEEE Access, vol. 6, pp. 25300–25312, 2018.
[55] X. Ma, H. Qin, N. Sulaiman, T. Herawan, and J. H. Abawajy, “The parameter reduction of the interval-valued fuzzy soft sets and its related algorithms,” IEEE Transactions on Fuzzy Systems, vol. 22, no. 1, pp. 57–71, 2014.
[56] K. Gong, Y. Wang, M. Xu, and Z. Xiao, “BSSReduce an incremental feature selection approach for large-scale and high-dimensional data,” IEEE Transactions on Fuzzy Systems, vol. 26, no. 6, pp. 3356–3367, 2018.
[57] F. Feng, M. Akram, B. Davvaz, and V. Leoreanu-Fotea, “Attribute analysis of information systems based on elementary soft implications,” Knowledge-Based Systems, vol. 70, pp. 281–292, 2014.
[58] Y. B. Jun and C. H. Park, “Applications of soft sets in ideal theory of BCK/BCI-algebras,” Information Sciences, vol. 178, pp. 2466–2475, 2008.
[59] X. Zhang, C. Park, and S. Wu, “Soft set theoretical approach to pseudo-BCI algebras,” Journal of Intelligent and Fuzzy Systems, vol. 34, no. 1, pp. 559–568, 2018.
[60] X. Xin, R. A. Borzooei, M. Bakhshi, and Y. B. Jun, “Intuitionistic fuzzy soft hyper BCK algebras,” Symmetry, vol. 11, no. 399, 2019.
[61] M. I. Ali, M. Shabir, and F. Feng, “Representation of graphs based on neighborhoods and soft sets,” International Journal of Machine Learning and Cybernetics, vol. 8, no. 5, pp. 1525–1535, 2017.
[62] B. Santos-Buitrago, A. Riesco, M. Knapp, J. C. R. Alcantud, G. Santos-García, and C. Talcott, “Soft set theory for decision making in computational biology under incomplete information,” IEEE Access, vol. 7, pp. 18183–18193, 2019.
[63] S. J. John, Soft Sets: Theory and Applications, Springer, Berlin, Germany, 2021.
[64] T. Herawan and M. M. Deris, “A soft set approach for association rules mining,” Knowledge-Based Systems, vol. 24, no. 1, pp. 186–195, 2011.
[65] F. Feng, Y. M. Li, C. X. Li, and B. H. Han, “Soft set based approximate reasoning: a quantitative logic approach,” in Proceedings of the International Conference QL&SC, vol. 82, pp. 245–255, Xiamen, China, January 2010.
[66] F. Feng, J. Cho, W. Pedrycz, H. Fujita, and T. Herawan, “Soft set based association rule mining,” Knowledge-Based Systems, vol. 111, pp. 268–282, 2016.
[67] Z. Pawlak and A. Skowron, “Rudiments of rough sets,” Information Sciences, vol. 177, no. 1, pp. 3–27, 2007.

Copyright

Copyright © 2021 Xiaoyan Liu et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
