Abstract

To realize efficient data processing in wireless networks, this paper designs an automatic classification algorithm for multisearch data association rules in a wireless network. The algorithm first mines the association rules of the multisearch data and then uses the mining results to complete automatic classification through three stages: discretization of the continuous attributes of the multisearch data, generation of fuzzy classification rules, and design of the association rule classifier. Experimental results show that the algorithm has a small classification error, good real-time performance, a high coverage rate, and high feasibility.

1. Introduction

With the rapid development of computer science and technology, especially database technology, and with the expanding scope and accelerating pace of human activity, people can obtain and store data more quickly, more easily, and more cheaply. The ability to generate and collect data has grown accordingly, and the volume of data and information has increased exponentially [1]. However, faced with such rapidly expanding volumes of data, it is difficult to identify the truly valuable information they contain, and people constantly feel the pressure of “information explosion” and “data surplus.” The proliferation of data hides a great deal of information that could play an important role in the progress of human society. Wang proposed a secure data aggregation algorithm for multidata queries, which addresses the problem of missing data by tracking the query data and securely tracking multiple data items [2]. If these massive data are not used effectively, they simply become “data garbage.” Therefore, how to find truly useful information in large volumes of wireless network multisearch data has become a focus of attention. Yogarajan and Revathi proposed an optimal mobile data collection method for wireless sensor networks. The method studies the classification and collection of mobile data, addresses the data classification problem by searching and classifying data globally, and provides technical support for data confidentiality [3].

In the face of “data explosion and lack of knowledge,” data mining technology emerged and has developed vigorously. Data mining is the process of extracting or “mining” knowledge from a large amount of data, that is, extracting hidden, previously unknown, but potentially useful information and knowledge from large volumes of incomplete, noisy, fuzzy, and random data. Data mining bridges the gap between data and knowledge. It can not only describe how historical data developed but also draw inferences about current data and predict future trends. Compared with traditional decision tree algorithms, association rules achieve higher classification and prediction accuracy, so associative classification has received wide attention in the field of data mining. At present, national science foundations in China, Canada, the United States, and other countries support research on association rule classification, and a growing number of scholars work in this field. As research has deepened, association rules have become one of the cornerstones of data mining and are widely used in decision analysis, business management, and other fields. For example, database vendors such as Oracle, IBM, and Microsoft have incorporated association analysis functionality into their products.

Relevant research in China started relatively late, beginning in the early 1980s. Research on text classification in China has mainly gone through three stages: first feasibility studies, then auxiliary classification, and finally automatic classification. At the beginning, there was little research on Chinese text classification, and the work mainly applied English classification techniques to Chinese. At the end of the 1990s, text classification itself gradually began to be studied. In such research, however, the characteristics of Chinese text must be considered comprehensively in order to form a Chinese classification system. Many institutions in China, such as Tsinghua University, Shanghai Jiao Tong University, Harbin Institute of Technology, Northeastern University, the Institute of Computing Technology of the Chinese Academy of Sciences, and Fudan University, have done a great deal of work in this area. Since 1951, many automatic classification systems have been developed in China. For example, Wu Jun et al. at Tsinghua University used weighted words as features of textual data and then used classification algorithms to construct classifiers. Using the SVM method, Zou Tao et al. from Nanjing University designed CTDCS, an automatic Chinese text classification system. Liu Zhengying et al. of Shanxi University developed an automatic financial classification system. Wang Yongcheng et al. from Shanghai Jiao Tong University studied a Chinese text classification system based on a neural network optimization algorithm. The role of data mining technology in various fields has become increasingly prominent. Classification is an important analysis method in data mining, and association rule mining is another important research direction in the same field.
As two highly active research fields in data mining, classification and association rule mining share certain similarities. Combining these two technologies, that is, mining association rules for the task of classification, opens a new direction in data classification, namely, associative classification. Literature [4] puts forward the concept of “mutual confidence,” which makes the results of research on one-dimensional association rules more meaningful and has attracted the attention of data mining researchers. On the basis of the theory of negative association rules, that work uses reverse thinking to improve the data mining algorithm based on negative association rules and gives examples, ideas, and experimental verification. The proposed algorithm is a useful supplement to the study of negative association rules. In literature [5], a new learning method called RMTFL is proposed. First, a matrix is used to explain the correlation between multiple tasks and different features, and the feature space of related tasks is calculated by Group Lasso. Isolated tasks can then be identified from the results. For cross-domain text classification, Jin Xiaoming et al. from Tsinghua University proposed a cross-domain active learning method. Zheng Mingling et al. from Southeast University described the correlation between class criteria with a Bayesian network, so that the problem of learning multiple class criteria was transformed into a series of single-criterion classification problems, which made the mining performance of the algorithm on multiple data sets exceed that of existing methods.
In literature [6], Wang Jianyong of Tsinghua University studied discriminative models for the uncertain data mining problem and proposed the HARMONY algorithm. The algorithm does not need a time-consuming feature selection step and finds the discriminative patterns directly from the database, which gives HARMONY a considerable improvement over the classical SVM classification algorithm on uncertain data.

The innovation of this paper lies in basing classification on association rules, which reflects not only the application characteristics of knowledge but also its intrinsic correlation characteristics. This is mainly embodied in two aspects: how multisearch data association rules are mined in a wireless network, and how the mined rules are analyzed and classified. Based on the above background, this paper designs an automatic classification algorithm for multisearch data association rules in wireless networks.

The research contributions of this paper include the following:
(1) An automatic classification algorithm is designed for multisearch data association rules in wireless networks.
(2) Within the algorithm, the association rules of the multisearch data are mined first, and the continuous attributes of the multisearch data are then discretized.
(3) Fuzzy classification rules are generated from the fuzzified data, and the corresponding association rule classifier is designed.

The rest of this article is organized as follows. The first section introduces the paper; the second section presents the classification algorithm; the third section gives the experimental analysis; and the fourth section concludes the paper.

2. Algorithm Design

2.1. Multisearch Data Association Rule Mining

Association rule mining technology is an important research content in the field of data mining. Its purpose is to find the correlation between various data items in the multisearch data set of a given wireless network, that is, to find out the frequently occurring items or attributes in the data set [7, 8]. Generally speaking, the mining process of association rules mainly includes two steps: first, generate all the frequent item sets in the wireless network multisearch data set, and second, generate association rules with the generated frequent item sets. Among them, generating frequent item sets is a key step that affects the quality of association rule mining results.
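The second of the two steps, deriving rules from frequent item sets, can be illustrated with a minimal Python sketch. The function name `generate_rules`, the dictionary layout, and the toy counts are assumptions made for illustration and do not come from the paper; confidence is taken in its usual sense, count(X ∪ Y) / count(X).

```python
from itertools import combinations


def generate_rules(frequent, min_conf):
    """Derive rules X -> Y from frequent item sets.

    `frequent` maps frozenset item sets to their support counts and is
    assumed to contain every non-empty subset of each frequent item set.
    """
    rules = []
    for itemset, count in frequent.items():
        if len(itemset) < 2:
            continue
        for size in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), size):
                x = frozenset(antecedent)
                confidence = count / frequent[x]
                if confidence >= min_conf:
                    rules.append((x, itemset - x, confidence))
    return rules


# Toy support counts (hypothetical values).
frequent = {
    frozenset({"a"}): 4,
    frozenset({"b"}): 3,
    frozenset({"a", "b"}): 3,
}
rules = generate_rules(frequent, min_conf=0.8)
```

With these toy counts only the rule b -> a passes the 0.8 threshold, since a -> b has confidence 3/4.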

Figure 1 shows a basic association rule mining process.

Mining is carried out in two steps:
(1) First, the wireless network multisearch database $D$ is scanned to store the identifier list (Tid-list) of each item set, and the candidate set $C_1$ is generated. Then, the item sets that do not meet the minimum support threshold min_sup are deleted from $C_1$ to generate the frequent 1-item set $L_1$.
(2) The loop executes until $L_{k-1}$ is empty. In each pass, item sets $l_1$ and $l_2$ in $L_{k-1}$ are first joined to generate a candidate $c$ in $C_k$, and the identifier list of $c$ is obtained as the intersection of the identifier lists of $l_1$ and $l_2$. The count of $c$ is read from its identifier list and compared with the minimum support threshold min_sup. Item sets whose count is greater than or equal to min_sup are kept, the rest are deleted, and the final output is the frequent item set $L$.

In the process of generating frequent item sets, two factors affect the performance of the Apriori algorithm [9, 10]. First, in each iteration the original wireless network multisearch database must be scanned to generate the frequent item sets, so the database is scanned too many times and performance declines. Second, during pruning, the frequent item sets must be scanned repeatedly to check for occurrences of the candidate item sets, which further reduces the efficiency of the algorithm.

In view of the above problems, this study improves the process of frequent item set generation and proposes an improved Apriori algorithm, called I-Apriori, as follows.

Input: database (D), minimum support (min_sup);
Output: frequent item set (L).
C_1 = find_candidate_1-itemsets (D);
int count = the number of TID in D;
for each item set c of C_1 {
 s.item-set = c;
 s.count = count of c in D;
 s.Tid-list = the set of all TID that include c;
 if s.count < min_sup count
   delete c in C_1;
}
L_1 = C_1;
for (k = 2; L_{k-1} ≠ ∅; k++) {
 for each item set l_1 in L_{k-1}
  for each item set l_2 in L_{k-1} {
   c = l_1 ⋈ l_2;
   c.Tid-list = l_1.Tid-list ∩ l_2.Tid-list;
   c.count = count of TID in c.Tid-list;
   if c.count >= min_sup count
    add c to L_k;
  }
 L = L ∪ L_k;
}

In the I-Apriori algorithm, each time a candidate set is generated, the item set and its support count are stored and, more importantly, an identifier list (Tid-list) is added. After the join operation between two item sets is completed, the identifier list and the support count of the resulting item set can be obtained directly from the intersection of the two identifier lists, so there is no need to rescan the wireless network multisearch database, which effectively improves the performance of the algorithm.
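The identifier-list idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the authors' implementation: the function name `i_apriori`, the dictionary-based database, and the toy transactions are hypothetical, and the join simply unions two frequent (k-1)-item sets and keeps results of size k.

```python
def i_apriori(transactions, min_sup_count):
    """Mine frequent item sets using Tid-list intersections.

    `transactions` maps a transaction id (TID) to a set of items.
    Returns a dict mapping each frequent item set (frozenset) to its count.
    """
    # Single database scan: build the Tid-list of every 1-item set.
    tid_lists = {}
    for tid, items in transactions.items():
        for item in items:
            tid_lists.setdefault(frozenset([item]), set()).add(tid)

    # L_1: keep the 1-item sets whose count reaches the threshold.
    level = {s: t for s, t in tid_lists.items() if len(t) >= min_sup_count}
    frequent = {}
    k = 2
    while level:
        frequent.update({s: len(t) for s, t in level.items()})
        next_level = {}
        itemsets = list(level)
        for i, l1 in enumerate(itemsets):
            for l2 in itemsets[i + 1:]:
                c = l1 | l2
                if len(c) != k or c in next_level:
                    continue
                # The count comes from the intersection, not from a rescan.
                tids = level[l1] & level[l2]
                if len(tids) >= min_sup_count:
                    next_level[c] = tids
        level = next_level
        k += 1
    return frequent


# Toy database (hypothetical): TID -> items.
transactions = {
    1: {"a", "b", "c"},
    2: {"a", "b"},
    3: {"a", "c"},
    4: {"b", "c"},
}
frequent = i_apriori(transactions, min_sup_count=2)
```

The database is read only once, when the 1-item Tid-lists are built; every later support count is the size of a set intersection.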

2.2. Automatic Classification of Multisearch Data Association Rules

As the name suggests, association rule classification uses association rules to distinguish or predict the class label of an instance [11, 12]. It reflects not only the application characteristics of knowledge, namely, classification or prediction, but also the intrinsic relevance of knowledge. General association rules can only be used to describe concepts, whereas classification association rules serve both concept description and classification. Compared with most other classification methods, associative classification offers better classification ability and better descriptive ability.

Associative classification techniques generally involve two steps. The first step is to find all classification association rules whose right-hand side is a class label. The second step is to select the higher-priority rules from these classification association rules to classify the test set. The priority of a rule is usually evaluated according to its confidence, support, rule length, or a general quality criterion [13–15].
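As a hedged illustration of the second step, the sketch below ranks rules by confidence, then support, then rule length, and classifies an instance with the first rule it satisfies. This ordering is one common convention, assumed here for illustration; the `Rule` class, the function names, and the numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    antecedent: frozenset  # items that must all be present in the instance
    label: str             # class label on the right-hand side
    support: float
    confidence: float


def rank_rules(rules):
    # Higher confidence first, then higher support, then shorter antecedent.
    return sorted(
        rules, key=lambda r: (-r.confidence, -r.support, len(r.antecedent))
    )


def classify_first_match(instance, rules, default=None):
    # Return the label of the highest-priority rule the instance satisfies.
    for rule in rank_rules(rules):
        if rule.antecedent <= instance:
            return rule.label
    return default


rules = [
    Rule(frozenset({"x"}), "A", support=0.40, confidence=0.70),
    Rule(frozenset({"x", "y"}), "B", support=0.20, confidence=0.90),
    Rule(frozenset({"z"}), "A", support=0.30, confidence=0.90),
]
predicted = classify_first_match({"x", "y"}, rules)
```

Using a single best rule in this way is exactly the approach that the fuzzy classifier in Section 2.2.3 sets out to improve on.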

In this study, we propose an improved fuzzy associative classification algorithm. The fuzzy $c$-means clustering algorithm is used to divide continuous attributes into fuzzy intervals so that high-quality fuzzy classification rules can be obtained. On this basis, a new pruning strategy is added to avoid generating useless rules, and a new importance measure is used to fuse multiple fuzzy classification rules in order to improve classification accuracy.

2.2.1. Discretization of Continuous Attributes of Multisearch Data

The associative classification algorithm can only deal with discrete data directly, so the continuous attributes of the multisearch data need to be preprocessed and fuzzified. The fuzzy $c$-means clustering algorithm can be used to divide the fuzzy intervals according to the distribution of the data. It is a partition-based clustering algorithm whose idea is to maximize the similarity between objects assigned to the same cluster while minimizing the similarity between different clusters [16]. The effect of text classification depends largely on the characteristics of the data set itself. In practical research and application, it is generally believed that no single method suits data with all kinds of characteristics. Therefore, to apply associative classification to text classification in practice, two problems must be addressed. The first is to use the characteristics of the document itself to improve the speed and performance of classification. This includes using the document content and its organization rules to select appropriate data sources and to describe the problem abstractly. The second is to consider the form and structure of the document itself, which includes analyzing the content of the document. Generally, a document contains information such as title, author, and body text; a scientific or technical document also contains keywords; a news manuscript carries a specific time; web documents contain hyperlinks, tags, and other information; and complaint records identify the unit that handles the complaint. In practical application, the preprocessing should therefore be chosen according to the form and structure of the document, and the algorithm should be improved specifically to raise the speed and accuracy of text classification.
In this paper, we give a reasonable abstract description of the research problem and then optimize the algorithm according to the actual situation, so as to better solve the text classification problem based on association rules [17–20].

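When the fuzzy $c$-means clustering algorithm is used for fuzzy partitioning of attributes, the number of fuzzy intervals must be set. It is often given manually and has a great impact on classification accuracy. In order to achieve a better fuzzy partition, the PBM validity index is adopted to select the optimal number of intervals. In its standard form, the index is

$$\mathrm{PBM}(c) = \left( \frac{1}{c} \cdot \frac{E_1}{E_c} \cdot D_c \right)^2, \tag{1}$$

where $c$ represents the number of fuzzy intervals, $E_c$ is the membership-weighted sum of distances between the data and the $c$ interval centers, $E_1$ is the same sum for a single interval, and $D_c$ is the maximum distance between two interval centers. The PBM index is calculated iteratively for each candidate value of $c$, and the partition with the maximum PBM index is selected as the final result of the fuzzy interval discretization.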
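The selection procedure can be sketched in plain Python for a single continuous attribute. The sketch assumes the standard PBM definition, PBM(c) = ((1/c) · (E_1/E_c) · D_c)^2, where E_c is the membership-weighted sum of distances to the c centers, E_1 the same sum for one cluster, and D_c the largest distance between two centers. The fuzzifier m = 2, the evenly spaced initial centers, the fixed iteration count, and the toy data are assumptions made for illustration, not settings from the paper.

```python
def memberships(values, centers, m):
    """u[i][j]: membership degree of values[j] in fuzzy interval i."""
    u = [[0.0] * len(values) for _ in centers]
    for j, x in enumerate(values):
        # Clamp distances so a point sitting on a center cannot divide by zero.
        inv = [max(abs(x - z), 1e-12) ** (-2.0 / (m - 1.0)) for z in centers]
        total = sum(inv)
        for i, w in enumerate(inv):
            u[i][j] = w / total
    return u


def fuzzy_c_means_1d(values, c, m=2.0, iterations=100):
    lo, hi = min(values), max(values)
    # Deterministic start: centers spread evenly over the data range.
    centers = [lo + (hi - lo) * (i + 0.5) / c for i in range(c)]
    for _ in range(iterations):
        u = memberships(values, centers, m)
        centers = [
            sum(u[i][j] ** m * x for j, x in enumerate(values))
            / sum(u[i][j] ** m for j in range(len(values)))
            for i in range(c)
        ]
    return centers, memberships(values, centers, m)


def pbm_index(values, centers, u):
    c = len(centers)
    mean = sum(values) / len(values)
    e_1 = sum(abs(x - mean) for x in values)
    e_c = sum(u[i][j] * abs(x - centers[i])
              for i in range(c) for j, x in enumerate(values))
    d_c = max(abs(a - b) for a in centers for b in centers)
    return ((1.0 / c) * (e_1 / e_c) * d_c) ** 2


def best_interval_count(values, candidates):
    scores = {}
    for c in candidates:
        centers, u = fuzzy_c_means_1d(values, c)
        scores[c] = pbm_index(values, centers, u)
    return max(scores, key=scores.get), scores


# Toy attribute values forming three well-separated groups (hypothetical).
values = [0.8, 1.0, 1.2, 4.8, 5.0, 5.2, 8.8, 9.0, 9.2]
best_c, scores = best_interval_count(values, range(2, 6))
```

On this toy attribute the index peaks at three intervals, which matches the three visible groups; the memberships returned for that partition are the fuzzy degrees used when rules are generated.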

2.2.2. Generate Fuzzy Classification Rules

After the fuzzy $c$-means clustering algorithm is used to fuzzify the continuous attributes of the wireless network multisearch data, each continuous attribute is associated with a group of fuzzy sets, each with a corresponding membership degree. From the definition of fuzzy classification rules, it can be seen that the rules used for fuzzy associative classification are only a small subset of the whole set of fuzzy association rules, and that their consequent contains only a single linguistic value, the category label. In order to improve the efficiency of the algorithm, the following two pruning strategies are added in this study.

(1) When candidate frequent 2-item sets are generated by the join step of the Apriori algorithm, the candidates that do not contain a classification label are deleted. At the same time, the classification label of each frequent 2-item set is moved to the head of the item set. This strategy ensures that each classification rule contains a unique classification label. When generating the candidate frequent 2-item sets, it first checks whether each candidate contains a classification label and removes the candidate if it does not, so that useless rules without a classification label are never produced. Each remaining frequent 2-item set is then reordered so that its category label is at the head of the rule. By the properties of the Apriori algorithm, every frequent item set generated afterwards contains a classification label, which avoids the generation of useless rules and effectively reduces the number of candidate item sets.

(2) When candidate frequent $k$-item sets are generated, any item set that has more than one linguistic value for the same attribute is deleted. This strategy ensures that each attribute in a candidate frequent $k$-item set corresponds to only one linguistic value. If an attribute in a classification rule corresponded to multiple linguistic values, the classification could easily become uncertain.

The goal of the above two strategies is to reduce the generation of useless candidate sets as early as possible, thereby directly generating useful classification rules. They can greatly reduce the number of candidate sets, thus reducing the database scan time and ensuring the generation of effective fuzzy classification rules.
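The two strategies amount to simple checks on each candidate item set, as the sketch below shows. Items are modelled as (attribute, linguistic value) pairs, and the class label is modelled as an item whose attribute is named `"class"`; this representation, the function names, and the sample attributes are assumptions made for illustration.

```python
CLASS_ATTRIBUTE = "class"  # hypothetical name for the class-label attribute


def has_class_label(itemset):
    # Strategy (1): a candidate must contain a class-label item.
    return any(attr == CLASS_ATTRIBUTE for attr, _ in itemset)


def has_unique_values(itemset):
    # Strategy (2): each attribute may appear with one linguistic value only.
    attributes = [attr for attr, _ in itemset]
    return len(attributes) == len(set(attributes))


def label_first(itemset):
    # Reordering step: move the class label to the head of the item set.
    return tuple(sorted(itemset, key=lambda item: item[0] != CLASS_ATTRIBUTE))


def prune(candidates):
    return [label_first(c) for c in candidates
            if has_class_label(c) and has_unique_values(c)]


candidates = [
    (("delay", "low"), ("class", "A")),
    (("delay", "low"), ("load", "high")),                  # no class label
    (("delay", "low"), ("delay", "high"), ("class", "B")),  # two values of delay
]
kept = prune(candidates)
```

Only the first candidate survives, with its class label moved to the head of the tuple.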

The process of mining fuzzy association classification rules is as follows:
(Step 1) Obtain the frequent 1-item set $L_1$.
(Step 2) Generate the candidate frequent 2-item set $C_2$ and prune it.
(Step 3) Scan the wireless network multisearch database, compute the support of the candidate frequent $k$-item sets $C_k$, delete the items that do not meet the minimum support, and obtain the frequent $k$-item sets $L_k$.
(Step 4) Generate the candidate frequent $(k+1)$-item set $C_{k+1}$ from the frequent $k$-item set $L_k$.
(Step 5) Prune and reorder the candidate frequent item sets.
(Step 6) Return to Step 3 until no new candidate $k$-item sets are generated.
(Step 7) Delete the frequent item sets whose trust level (confidence) does not meet the minimum trust level set by the user.
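The support and trust computations in Steps 3 and 7 operate on membership degrees rather than crisp counts. The paper does not spell out how they are computed, so the sketch below uses one common convention as an assumption: the degree to which a record matches an item set is the minimum membership over its items, support is the average of that degree over all records, and trust is the ratio of the support of the rule to the support of its antecedent. All names and sample values are hypothetical.

```python
def fuzzy_support(itemset, records):
    # Match degree of a record: minimum membership over the items
    # (min t-norm, an assumption); items absent from a record count as 0.
    total = sum(min(rec.get(item, 0.0) for item in itemset) for rec in records)
    return total / len(records)


def fuzzy_confidence(antecedent, label_item, records):
    base = fuzzy_support(antecedent, records)
    if base == 0.0:
        return 0.0
    return fuzzy_support(tuple(antecedent) + (label_item,), records) / base


# Each record maps (attribute, linguistic value) to a membership degree.
records = [
    {("delay", "low"): 0.8, ("delay", "high"): 0.2, ("class", "A"): 1.0},
    {("delay", "low"): 0.6, ("delay", "high"): 0.4, ("class", "A"): 1.0},
    {("delay", "low"): 0.1, ("delay", "high"): 0.9, ("class", "B"): 1.0},
]
antecedent = (("delay", "low"),)
support = fuzzy_support(antecedent, records)
confidence = fuzzy_confidence(antecedent, ("class", "A"), records)
```

Here the support of "delay is low" is (0.8 + 0.6 + 0.1) / 3, and the trust of the rule "delay is low, so class A" is 1.4 / 1.5.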

There are many algorithms for mining association rules, but the classic one is the Apriori algorithm, and most other algorithms are refinements or derivations of it. In essence, Apriori is a breadth-first algorithm. It scans the transaction database and, during each scan, finds the frequent transaction items, that is, the frequent item sets. Because the algorithm must find all frequent item sets, multiple scans are needed, and each scan considers only item sets of the same length, that is, item sets containing the same number of items. An item set with $k$ items is called a $k$-item set. As a concrete example, during the first scan the algorithm finds all item sets of length 1 in the database that meet the minimum support, which gives the frequent 1-item sets, denoted $L_1$. After this first step, the frequent 2-item sets $L_2$ are mined on the basis of $L_1$. The loop continues in this way to find the frequent 3-item sets and so on, up to the frequent $k$-item sets, and stops when no further frequent item sets can be found. One problem with this approach is that each time frequent item sets are generated, the entire transaction database needs to be scanned.

2.2.3. Wireless Network Multisearch Data Association Rule Classifier

After generating fuzzy classification rules, we need to use these rules to build a classification system. The key problem of classifier construction is how to determine the importance of different rules and finally set the label of the data to be classified.

The classification rule with the highest membership degree and the highest confidence can simply be regarded as the most important rule and used on its own for classification, because confidence is a probability estimate of the importance of a classification rule, and membership can be seen as the degree to which the data to be classified matches the rule. However, algorithms of this kind rely on a single criterion or a single rule to classify data, which may bias the result toward some categories and reduce classification accuracy.

Taking the rules in Table 1 as an example, for a given item of wireless network multisearch data to be classified, two of the rules have the highest membership degree, and the confidence of the first is higher than that of the second. On these grounds, the first rule alone could be used to assign the data to its class. However, although the membership degree of two other rules is lower, their confidence and support are higher than those of that rule, so it seems more reasonable to use one of them to classify the data into its class instead. In order to make more reasonable use of multiple rules in the classification decision, a new importance measure for fuzzy classification rules is introduced as Equation (2).

In Equation (2), $\mu$ represents the membership degree, that is, the matching degree of a rule for the wireless network multisearch data to be classified, $\mathrm{sup}$ is the support of the rule, and $\mathrm{conf}$ is its trust (confidence).

In this study, these three parameters are used together to measure the importance of fuzzy classification rules. On the one hand, the new measure integrates membership, support, and trust rather than relying on a single parameter, which better avoids bias toward some categories. On the other hand, with this measure the data can be classified by using the whole set of fuzzy association classification rules, which can effectively improve the classification accuracy of the classifier and help avoid overfitting.

In the example above, the rule with the highest importance under the new measure determines the class to which the data to be classified is assigned.

After Equation (2) is used to calculate the importance of each rule, the average weight of each category is calculated with Equation (3). Finally, the category with the largest average weight is selected as the classification result, which completes the classification process.
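The decision procedure can be sketched as follows. Since the exact forms of the importance measure and the average weight are not reproduced here, the sketch makes two loudly stated assumptions: the importance of a rule is taken to be the product of its membership degree, support, and trust, and the average weight of a category is taken to be the mean importance of the matched rules that predict it. The function names and the rule values are hypothetical and are not the entries of Table 1.

```python
from collections import defaultdict


def rule_importance(membership, support, confidence):
    # Assumed form: the product of the three quantities.
    return membership * support * confidence


def classify_by_average_weight(matched_rules):
    """`matched_rules` is a list of (label, membership, support, confidence)."""
    scores = defaultdict(list)
    for label, mu, sup, conf in matched_rules:
        scores[label].append(rule_importance(mu, sup, conf))
    # Assumed form: average importance of the rules predicting each class.
    averages = {label: sum(v) / len(v) for label, v in scores.items()}
    return max(averages, key=averages.get), averages


# Hypothetical rules matched by one data item.
matched_rules = [
    ("C1", 0.9, 0.10, 0.60),  # high membership, weak support and trust
    ("C1", 0.9, 0.08, 0.55),
    ("C2", 0.5, 0.30, 0.90),  # lower membership, strong support and trust
    ("C2", 0.4, 0.25, 0.85),
]
label, averages = classify_by_average_weight(matched_rules)
```

With these values the item is assigned to C2 even though the C1 rules have the highest membership, which is the behaviour the combined measure is intended to allow.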

This completes the design of the automatic classification algorithm for multisearch data association rules in wireless networks.

3. Experiment and Analysis

In order to verify the feasibility of the automatic classification algorithm for multisearch data association rules in wireless networks, a simulation experiment is designed.

3.1. Experimental Environment Design

The software and hardware information of the simulated experimental environment is shown in Table 2.

In order to avoid one-sided experimental results, the association rule mining classification algorithm based on the FP-growth algorithm in reference [4] and the bit table-based association rule mining and associative classification algorithm in reference [5] were used as comparison methods, and their performance was verified together with that of the algorithm in this paper.

3.2. Comparative Analysis of Classification Errors

The classification errors of the three algorithms are compared and analyzed, and the results are shown in Table 3.

As can be seen from Table 3, the classification error of the algorithm in reference [4] is lower than that of the algorithm in reference [5] but higher than that of the proposed algorithm. The maximum classification error of the algorithm in reference [5] is 4.5%, while that of the proposed algorithm is only 1.8%, which indicates that the proposed algorithm classifies more accurately.

3.3. Comparison of Time Spent in the Classification Process

The classification process times of the three algorithms are compared and analyzed, and the results are shown in Figure 2.

As can be seen from Figure 2, the classification process time of all three algorithms increases with the number of experimental iterations. For the algorithm in this paper, the upward trend is not obvious, and the classification time always remains relatively low. The other two methods are clearly less stable than the method in this paper. This indicates that the proposed algorithm classifies in a timely manner and lays a good foundation for subsequent multisearch data processing in wireless networks.

3.4. Coverage Ratio

Coverage can reflect whether the classification results are widely distributed in the multisearch data set of wireless network. The coverage rates of the three algorithms are compared and analyzed, and the results are shown in Figure 3.

According to Figure 3, as the value on the horizontal axis increases, the coverage rates of the three algorithms all show a downward trend. When this value is greater than 60, the curve of the algorithm in this paper gradually stabilizes, while the curves of the other two comparison algorithms do not. The average coverage rate of the proposed algorithm is higher than that of the other two methods.

4. Conclusion

This paper designs an automatic classification algorithm for multisearch data association rules in wireless networks. The algorithm starts from the mining of multisearch data association rules and then uses the mining results to complete automatic classification by discretizing the continuous attributes of the multisearch data, generating fuzzy classification rules, and designing the association rule classifier. The importance of each fuzzy classification rule is measured with a new measure that integrates membership, support, and trust. The experiments show that the algorithm has a low classification error, timely classification, and a high coverage rate, indicating that it can effectively realize the automatic classification of data association rules.

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.