Abstract

To address attribute redundancy in meteorological data from the Industrial Internet of Things (IIoT) and the low efficiency of existing attribute reduction algorithms, an attribute reduction algorithm based on a genetic algorithm for the coevolution of meteorological data is proposed. The evolutionary population is divided into two subpopulations: one uses elite individuals to assist the crossover operation and thereby increase the convergence speed of the algorithm, and the other balances population diversity during evolution by introducing a random population; the two subpopulations complete the evolutionary operations together. The proposed algorithm was compared with the TSDPSO-AR and ARAGA algorithms on attribute reduction for precipitation in meteorological data. The results showed that the proposed algorithm maintained the diversity of the population during evolution, improved the reduction performance, and simplified the information system.

1. Introduction

With the development of Internet of Things technology, large numbers of sensors and smart terminals are being deployed in traditional industries, which leads to tremendous growth in data volume. How to effectively manage the large amounts of data in the Industrial Internet of Things (IIoT) to improve industrial production efficiency has become an urgent problem. The number of meteorological elements being collected is increasing remarkably, which brings some challenges [1–4]. To address this issue, this paper designs configurable meteorological data acquisition to meet the needs of more application scenarios. A larger amount of data benefits the mining of potential meteorological patterns, but meteorological data are collected without a clear purpose, and a change in a meteorological phenomenon is related to only some of the collected meteorological elements, so the attribute redundancy in the collected data is large. These redundant attributes reduce both the efficiency and the accuracy of meteorological data mining. Therefore, it is very important to perform attribute reduction on the collected meteorological data. Rough set theory is a data mining method that effectively deals with fuzzy and uncertain information, and one of its core topics is deleting redundant attributes from the knowledge base while keeping the decision ability of the knowledge base unchanged [5]. Using attribute reduction to delete redundant attributes in meteorological data and improve mining efficiency therefore has important practical significance [6]. Many scholars at home and abroad have conducted in-depth research on this method and have made remarkable achievements. It has been proven that finding the minimum attribute reduction of an information system is an NP-hard problem.
Therefore, many scholars use heuristic algorithms to improve the reduction efficiency. You Z et al. discussed the attribute core and attribute reduction operation of multiple decision tables in a distributed environment and proposed an information entropy reduction algorithm based on the vertical distribution in the multidecision table [7]. This method reduces the communication cost of distributed reduction and improves the reduction efficiency through parallelism, conditional information entropy, and the transmission of class elements. Heuristic-based attribute reduction improves the reduction efficiency to some extent, but it still has shortcomings. To further improve the reduction efficiency, many scholars combine rough set attribute reduction with other optimization algorithms. Ding Weiping et al. proposed a coevolutionary reduction algorithm combined with the quantum frog group [8]. By using the optimal execution experience of the frog group and elite individuals to guide the model group quickly in the target direction, the convergence efficiency and global search ability of the attribute reduction are improved; however, this cooperative coevolution reduction algorithm is suited to high-dimensional data sets, and its reduction performance drops greatly on data with smaller dimensions. Chen J et al. proposed an efficient rough set clustering algorithm based on a genetic algorithm [9]. The global search ability of the genetic algorithm was used to improve the convergence speed of the algorithm. Zhang Rongguang et al. proposed a rough set attribute reduction algorithm based on particle swarm optimization [10]. By introducing an improved tabu search algorithm, the local search strategy of the particle swarm optimization was improved, and the diversity of the population was increased. Against this background, attribute reduction based on the genetic algorithm for the coevolution of meteorological data (AECMD) is proposed.
The algorithm divides the evolutionary population into two subpopulations. One subpopulation guides population evolution quickly through an elite-assisted crossover operation, and the other maintains population diversity by introducing random populations in the later stages of evolution. The random populations also undergo elite-assisted crossover, which prevents their low fitness values from degrading the evolutionary population. Through the coevolution of the two subpopulations, the reduction performance of the algorithm is improved.

2. Attribute Reduction in the Rough Set Theory

2.1. Information System

Formally, an information system can be described as follows: Let $S = (U, A, V, f)$ [11–18], where $U$ is a nonempty finite set of objects (i.e., the domain) and $A = C \cup D$ is an attribute set, where $C$ and $D$ represent the conditional attribute set and the decision attribute set, respectively. $V = V_C \cup V_D$, where $V_C$ represents the conditional attribute values and $V_D$ is a decision attribute value. $f: U \times A \rightarrow V$ is an information function, which gives a value to each object in the information system. In the information system, $R$ represents a cluster of equivalence relations on $U$. If $P \subseteq R$ and $P \neq \varnothing$, we define $IND(P) = \bigcap P$, which is an indiscernibility relation. Obviously, $IND(P)$ is an equivalence relation.

2.2. Attribute Reduction and Attribute Core

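Let the attribute subsets $P$ and $Q$ be equivalence relation clusters on the domain $U$, and let $C$ and $D$ be the conditional attribute set and the decision attribute set, respectively, with $P \subseteq C$ and $Q \subseteq D$. The $P$ positive region of $Q$ is recorded as [7, 19–29]:

$$POS_P(Q) = \bigcup_{X \in U/Q} \underline{P}(X)$$

where $\underline{P}(X)$ is the lower approximation of $X$ with respect to $P$. If $a \in P$ and $POS_{P - \{a\}}(Q) = POS_P(Q)$, then $a$ is said to be dispensable in $P$; otherwise, $a$ is necessary in $P$.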

In the attribute subset $P$, the set of all relations that are necessary in $P$ is called the core of $P$, denoted as $CORE_Q(P)$. In the information system, if an attribute subset $P' \subseteq P$ is independent relative to $Q$ and $POS_{P'}(Q) = POS_P(Q)$, then $P'$ is a relative reduction (a $Q$-reduction) of $P$, and the collection of all $Q$-reductions of $P$ is denoted as $RED_Q(P)$. Because $CORE_Q(P) = \bigcap RED_Q(P)$, the attribute core is the intersection of all reductions and cannot be deleted in the attribute reduction process of the information system. Thus, the core of the information system can be regarded as the core of the attribute reduction [30].

2.3. Attribute Dependence Degree

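Let the information system be $S = (U, A, V, f)$, where $R$ is an equivalence relation cluster on $U$ and $P, Q \subseteq R$; then the dependence degree of $Q$ on $P$ is defined as [31]

$$\gamma_P(Q) = \frac{|POS_P(Q)|}{|U|} \quad (3)$$

where $|\cdot|$ denotes the cardinality of a set and $POS_P(Q)$ denotes the positive region of $Q$ with respect to $P$ in the universe $U$.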
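
The positive region and the dependence degree can be illustrated with a short Python sketch. The function names and the toy decision table are hypothetical and serve only as an example; the computation follows the standard rough set definitions of Section 2.

```python
from collections import defaultdict

def partition(rows, attrs):
    """Group object indices into equivalence classes by their values on attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())

def positive_region(rows, cond_attrs, dec_attrs):
    """Objects whose condition class lies entirely inside one decision class."""
    pos = set()
    for block in partition(rows, cond_attrs):
        decisions = {tuple(rows[i][d] for d in dec_attrs) for i in block}
        if len(decisions) == 1:
            pos.update(block)
    return pos

def dependency(rows, cond_attrs, dec_attrs):
    """Dependence degree: size of the positive region divided by size of the universe."""
    return len(positive_region(rows, cond_attrs, dec_attrs)) / len(rows)

# Toy table (hypothetical values): columns 0-2 are conditions, column 3 is the decision.
TABLE = [
    (1, 0, 1, "rain"),
    (1, 0, 0, "rain"),
    (0, 1, 1, "dry"),
    (0, 1, 0, "dry"),
    (0, 0, 1, "rain"),
    (0, 0, 1, "dry"),
]

full = dependency(TABLE, [0, 1, 2], [3])
without_2 = dependency(TABLE, [0, 1], [3])
```

In this toy table, removing attribute 2 leaves the dependence degree unchanged, so attribute 2 is dispensable, whereas keeping only attribute 0 shrinks the positive region.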

3. Coevolutionary Adaptive Genetic Attribute Reduction

The population size during evolution is limited. After several iterations, the population is composed mainly of individuals with higher fitness values. At this point, the diversity of the population is low, which makes the selection and crossover operators lose their primary roles. During evolution, different operators have different effects on population diversity: the selection operator embodies the “survival of the fittest” but reduces the diversity of the population, the crossover operator preserves the diversity of the population, and the mutation operator increases it [32].

3.1. Adaptive Genetic Algorithm
3.1.1. Fitness Function

The fitness function measures an individual’s ability to adapt to its environment. It is the key step in ranking individuals from superior to inferior, and it is also what links the genetic algorithm to rough set attribute reduction. The purpose of attribute reduction is to remove as many redundant attributes as possible to obtain an optimal solution. Therefore, the fitness function should reward both strong classification ability and the deletion of as many redundant attributes as possible. For this reason, the attribute dependency of an individual and the number of conditional attributes it contains are introduced as parameters into the fitness function, formula (4). Its parameters are the number of conditional attributes; the number of genes whose value is “1” in the individual; the attribute dependency of the decision attribute on the conditional attributes whose gene value is “1” in the individual; and an adjustment parameter.
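
The sketch below shows one way the two requirements can be combined in Python. The weighted-sum form, the name `fitness`, and the default value of `lam` are assumptions made for illustration and are not necessarily formula (4); the attribute dependency `gamma` is passed in rather than computed.

```python
def fitness(chromosome, gamma, lam=2.0):
    """Assumed fitness: reward short attribute subsets and high dependency.

    chromosome -- list of 0/1 genes, one per conditional attribute
    gamma      -- attribute dependency of the selected subset, in [0, 1]
    lam        -- adjustment parameter weighting dependency against length
    """
    n = len(chromosome)
    ones = sum(chromosome)  # number of attributes kept by this individual
    return (n - ones) / n + lam * gamma

short_subset = fitness([1, 0, 0, 1], gamma=1.0)   # keeps 2 of 4 attributes
full_subset = fitness([1, 1, 1, 1], gamma=1.0)    # keeps all 4 attributes
lossy_subset = fitness([0, 0, 0, 1], gamma=0.5)   # shorter, but loses dependency
```

With this assumed form and `lam = 2.0`, a shorter subset with the same dependency scores higher than the full attribute set, while a subset that loses half of the dependency scores lower than both.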

3.1.2. Selection Operator

The selection operator is a key factor in the reduction of population diversity; it reflects the evolutionary direction of the survival of the fittest and determines the search performance of the algorithm [33]. The roulette strategy determines the selection probability according to the fitness of an individual. At the beginning of the iteration, individual differences are large and the diversity of the population is abundant, so this method represents the survival of the fittest well. However, as the iteration progresses, the differences among the individual fitness values of the population decrease, and the performance of the selection operator is greatly weakened. Therefore, the selection operator is improved in this paper, as shown in formula (5):

$$P(x_i) = \frac{f(x_i) - f_{\min}}{\sum_{j} \left( f(x_j) - f_{\min} \right)} \quad (5)$$

where $f_{\min}$ represents the minimum fitness value of the population, $f(x_i)$ represents the fitness value of the individual whose selection probability $P(x_i)$ is currently being calculated, and the sum runs over all individuals in the current population.

After the individual fitness values of the population are calculated, the population is sorted in descending order of fitness value. In the selection operation, the minimum fitness value in the current population is subtracted from each individual fitness value, and then the roulette selection operation is performed. After this subtraction, the relative differences between individuals increase, which enriches the population diversity and balances the selection pressure.
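
The effect of subtracting the minimum fitness can be seen in a minimal Python sketch. The function names are hypothetical, and the uniform fallback for a population whose individuals all have the same fitness is an assumption that the text does not specify.

```python
import random

def selection_probabilities(fitness_values):
    """Roulette probabilities computed on fitness minus the population minimum."""
    f_min = min(fitness_values)
    shifted = [f - f_min for f in fitness_values]
    total = sum(shifted)
    if total == 0:  # all individuals equal: assumed fallback to uniform selection
        return [1.0 / len(fitness_values)] * len(fitness_values)
    return [s / total for s in shifted]

def roulette_select(population, fitness_values, k, rng=random):
    """Draw k individuals (with replacement) using the shifted probabilities."""
    probs = selection_probabilities(fitness_values)
    return rng.choices(population, weights=probs, k=k)

late_stage = [10.0, 10.5, 11.0]  # nearly equal fitness values late in the run
plain = [f / sum(late_stage) for f in late_stage]
shifted = selection_probabilities(late_stage)
picked = roulette_select(["a", "b", "c"], late_stage, k=5, rng=random.Random(0))
```

For these values, plain roulette gives nearly uniform probabilities (about 0.32, 0.33, and 0.35), whereas the shifted version gives 0, 1/3, and 2/3. Under this reading, the individual with the minimum fitness is never selected.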

3.1.3. Crossover and Mutation Operators

The traditional genetic algorithm uses a fixed crossover probability and mutation probability, which may lead to slow convergence or premature convergence. Premature convergence hinders the evolution of better individuals: the population tends to become static, with limited diversity, and the crossover and mutation operators become ineffective. The standard adaptive genetic algorithm measures an individual’s superiority or inferiority by comparing the individual fitness with the average fitness value of the population. When the fitness value is greater than the average fitness value, the individual is considered to be a good individual and is given a small probability of crossover and mutation. Premature convergence is caused by the overproliferation of such individuals. To prevent individuals whose fitness is only slightly greater than the average from multiplying and making the population uniform, this paper proposes using the average fitness value of the individuals whose fitness is greater than the average fitness of the population as the measure of individual merit. This standard increases the probability of crossover and mutation for this part of the population during evolution and avoids overproliferation. At the same time, from the point of view of the entire iterative process, the population has a greater probability of crossover and mutation at the beginning of the iteration, when its diversity is higher. As the iteration proceeds, the population gradually starts to converge, and its crossover and mutation probabilities also gradually decrease.
Based on this, the crossover and mutation probabilities are improved. The improved formulas depend on the following quantities: the maximum fitness value; the average fitness value of the individuals whose fitness value is greater than the average fitness value; the generation number; the curvatures with which the crossover probability and mutation probability change with the generation number; the convergence limits of the crossover probability and mutation probability; and a control factor.
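
The stricter measure of individual merit described above can be sketched in Python as follows. Only the threshold is shown; the adaptive probability formulas themselves are not reconstructed here, and the function names are hypothetical.

```python
def above_average_mean(fitness_values):
    """Mean fitness of the individuals whose fitness exceeds the population mean."""
    avg = sum(fitness_values) / len(fitness_values)
    above = [f for f in fitness_values if f > avg]
    # If no individual exceeds the mean (all equal), fall back to the mean itself.
    return sum(above) / len(above) if above else avg

def is_good_individual(f, fitness_values):
    """Stricter test: fitness must exceed the above-average mean."""
    return f > above_average_mean(fitness_values)

values = [1.0, 2.0, 3.0, 6.0, 8.0]        # population mean is 4.0
threshold = above_average_mean(values)     # mean of 6.0 and 8.0
```

With the population mean as the standard, both 6.0 and 8.0 would count as good individuals; with the stricter threshold of 7.0, only 8.0 does, so the individual at 6.0 keeps a larger crossover and mutation probability.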

3.1.4. Population Diversity

Population diversity is a prerequisite for the evolution of genetic algorithms, and it directly affects the performance of the algorithm. For a binary-coded population with a given population size and total gene length, the population diversity measure is defined in terms of the numbers of 1s and 0s at each locus over all individuals in the population, which indicate how the gene values 0 and 1 are distributed in the population.
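
As an illustration, the sketch below computes one plausible locus-wise diversity measure in Python. The specific formula (one minus the normalized imbalance between 1s and 0s at each locus, averaged over loci) is an assumption chosen for the example and may differ from the measure used in this paper.

```python
def population_diversity(population):
    """Assumed measure in [0, 1]: 0 when every locus is uniform, 1 when balanced."""
    size = len(population)
    length = len(population[0])
    total = 0.0
    for j in range(length):
        ones = sum(ind[j] for ind in population)   # number of 1s at locus j
        zeros = size - ones                        # number of 0s at locus j
        total += 1.0 - abs(ones - zeros) / size
    return total / length

converged = population_diversity([[1, 0], [1, 0]])
balanced = population_diversity([[1, 0], [0, 1]])
mixed = population_diversity([[1, 1], [1, 0]])
```

A fully converged population scores 0, a population with equal numbers of 0s and 1s at every locus scores 1, and intermediate populations fall in between.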

3.1.5. Elite-Assisted Crossover Operation

Inspired by [34], this paper designs a new crossover operator assisted by elite individuals. Reference [34] uses the same elite individual in the crossover with every crossover individual. This kind of operation can quickly lead the population towards the elite individual, but it also greatly reduces population diversity. The crossover operation in this paper instead randomly selects an elite individual from the elite pool and completes the crossover operation with the crossover individual, which prevents a single elite individual from guiding the whole population and reducing its diversity.
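
A minimal Python sketch of crossover with a randomly chosen elite is given here. Single-point crossover, the choice of taking the head from the elite, and the function name are assumptions; the text specifies only that the elite partner is drawn at random from the elite pool.

```python
import random

def elite_assisted_crossover(individual, elite_pool, rng=random):
    """Cross an individual with a randomly chosen elite (single-point, assumed).

    The child takes its head from the elite and its tail from the individual.
    Chromosomes must have at least two genes.
    """
    elite = rng.choice(elite_pool)                  # a different elite may be drawn each call
    point = rng.randint(1, len(individual) - 1)     # cut point strictly inside the chromosome
    return elite[:point] + individual[point:]

pool = [[1, 1, 1, 1, 1, 1], [1, 1, 1, 0, 0, 0]]
child = elite_assisted_crossover([0, 0, 0, 0, 0, 0], pool, rng=random.Random(1))
```

Because the elite is drawn anew for every crossover, different offspring are pulled towards different elites, in contrast to crossing every individual with the same elite.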

3.2. Cooperative Evolution of the Adaptive Genetic Attribute Reduction Algorithm

The attribute dependence degree measures the importance of the conditional attributes in the information system to the decision attribute and so provides a standard measure for evaluating them. As a self-organized global optimization search algorithm, the genetic algorithm improves the convergence speed and optimization efficiency of the reduction. The fitness function connects attribute reduction with the genetic algorithm: the number of attributes and the attribute dependency are introduced into the function, which expresses the concept of attribute reduction in the rough set. The algorithm for the interval value attribute reduction based on the genetic algorithm is shown in Algorithm 1.

Input: information system $S = (U, C \cup D, V, f)$
Output: Attribute reduction red
1. Initialization:
1.1 Calculate the dependency $\gamma_C(D)$ of the decision attribute on the condition attributes according to formula (3).
1.2 Let $Core = \varnothing$. For any attribute $a \in C$, if $\gamma_{C - \{a\}}(D) \neq \gamma_C(D)$, then $Core = Core \cup \{a\}$; if $\gamma_{Core}(D) = \gamma_C(D)$, then $Core$ is the minimum reduction of the condition attribute set $C$; otherwise, step 1.3 is performed.
1.3 For any attribute $a \in C$, if $a \in Core$, then the corresponding chromosomal gene position is 1; otherwise, 0 or 1 is randomly selected as its chromosomal gene.
2. Start the iterative process:
2.1 According to formula (3), calculate the individual attribute dependency value, calculate the individual fitness value by formula (4), and then sort the population in descending order of fitness value.
2.2 Select the first $M$ different individuals to compose the elite library and let $t = 0$; from the population $P(t)$, select the individuals to form subpopulations $A$ and $B$ according to formula (5).
2.3 Subpopulation $A$ undergoes evolutionary operations:
(1) Perform elite-assisted crossover: randomly select elite individuals from the elite library and complete the crossover operation with the individuals in the subpopulation.
(2) Perform the mutation operation to obtain subpopulation $A'$.
2.4 Evolution of subpopulation $B$: if the number of iterations is higher than the preset threshold, generate random populations and perform elite-assisted crossover operations; otherwise, perform mutation operations. This yields subpopulation $B'$.
2.5 Combine the populations $A'$ and $B'$ to obtain the population $P'(t)$ and calculate the fitness values of the population $P'(t)$.
2.6 If $P'(t)$ has an individual whose fitness value is greater than that of an individual in $P(t)$, replace the individual with the smallest fitness value in $P(t)$ to obtain $P(t+1)$, sorted in descending order. Take the first $M$ different individuals in $P(t+1)$ to update the elite library.
2.7 Determine whether the termination condition is satisfied. If it is satisfied, the algorithm ends; otherwise, start over at 2.1.
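
The iterative part of Algorithm 1 can be sketched compactly in Python. This is a simplified illustration under several assumptions rather than the AECMD implementation: the crossover is single-point, the mutation rate is fixed instead of adaptive, steps 2.5 and 2.6 are replaced by keeping the best individuals of the merged population, the core-based initialization is omitted, and `toy_fitness` is a hypothetical stand-in for formula (4). All function and parameter names are hypothetical.

```python
import random

def shifted_roulette(pop, fits, k, rng):
    """Roulette selection on fitness minus the population minimum."""
    f_min = min(fits)
    weights = [f - f_min for f in fits]
    if sum(weights) == 0:            # all individuals equal: assumed uniform fallback
        weights = [1.0] * len(pop)
    return rng.choices(pop, weights=weights, k=k)

def elite_crossover(ind, elites, rng):
    """Single-point crossover with a randomly chosen elite (crossover type assumed)."""
    elite = rng.choice(elites)
    point = rng.randint(1, len(ind) - 1)
    return elite[:point] + ind[point:]

def mutate(ind, rate, rng):
    """Flip each gene independently with probability rate."""
    return [1 - g if rng.random() < rate else g for g in ind]

def top_distinct(pop, fitness, m):
    """First m distinct individuals in descending fitness order (the elite library)."""
    elites = []
    for ind in sorted(pop, key=fitness, reverse=True):
        if ind not in elites:
            elites.append(ind)
        if len(elites) == m:
            break
    return elites

def coevolve(fitness, length, size=20, m=4, generations=30,
             switch=10, rate=0.05, seed=0):
    rng = random.Random(seed)

    def new_individual():
        return [rng.randint(0, 1) for _ in range(length)]

    pop = [new_individual() for _ in range(size)]
    history = []
    for t in range(generations):
        fits = [fitness(ind) for ind in pop]                          # step 2.1
        elites = top_distinct(pop, fitness, m)                        # step 2.2
        sub_a = shifted_roulette(pop, fits, size // 2, rng)
        sub_b = shifted_roulette(pop, fits, size - size // 2, rng)
        new_a = [mutate(elite_crossover(i, elites, rng), rate, rng)   # step 2.3
                 for i in sub_a]
        if t > switch:                                                # step 2.4
            new_b = [elite_crossover(new_individual(), elites, rng) for _ in sub_b]
        else:
            new_b = [mutate(i, rate, rng) for i in sub_b]
        # Steps 2.5-2.6, simplified: merge parents and offspring, keep the best.
        pop = sorted(pop + new_a + new_b, key=fitness, reverse=True)[:size]
        history.append(fitness(pop[0]))
    return pop[0], history

def toy_fitness(ind):
    # Hypothetical stand-in for formula (4): attributes 0 and 1 are jointly
    # sufficient (dependency 1), and shorter subsets score higher.
    gamma = 1.0 if ind[0] == 1 and ind[1] == 1 else 0.0
    return (len(ind) - sum(ind)) / len(ind) + 2.0 * gamma

best, history = coevolve(toy_fitness, length=8)
```

Because the merged population always retains its best individuals, the best fitness recorded in `history` never decreases from one generation to the next. The parameter `switch` plays the role of the iteration threshold in step 2.4.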

4. Experiments and Results

4.1. Experimental Data

To reduce the influence of the region on precipitation, the experimental data (i.e., annual daily values) were selected from 5 meteorological stations, namely Huaian (58145), Yancheng (58151), Suqian (58131), Yangzhou (58245), and Lianyungang (58044), from 2005 to 2014, yielding 16,750 valid records. In addition to the site number, latitude, longitude, time, and other attributes, the data set also included 20 attributes, such as temperature, humidity, and pressure, of which precipitation was the decision attribute and the rest were conditional attributes. The corresponding classification of conditional attributes, experimental variables, and precipitation levels is shown in Tables 1, 2, and 3, respectively.

4.2. Results Analysis

Since attribute reduction is based on discrete data, the ACIM algorithm is used to discretize the meteorological data before the reduction operations are performed.

4.2.1. Performance Analysis of the Improved Genetic Algorithm

The AECMD algorithm improves the measure of individual merit in the evolution process and avoids an imbalance in population diversity by introducing a random population. To analyze the variation in population diversity during the evolution of the AECMD algorithm, the precipitation attributes are reduced by the AECMD algorithm and by the attribute reduction algorithm based on the adaptive genetic algorithm (ARAGA). To analyze the diversity directly, the termination condition is set to reaching the maximum number of iterations. Because the initial population is randomly generated, the initial populations of the two algorithms, and hence their evolutionary processes, are not guaranteed to be the same. The diversity during the iteration is shown in Figure 1.

From Figure 1, we can see that, in the initial stage of the iteration, the population diversity of both algorithms is abundant and then begins to decrease rapidly. As evolution proceeds, the ARAGA algorithm can improve the diversity of the population, but the diversity is unstable and is finally maintained only at a lower level, so the search performance of the algorithm needs to be improved. In comparison, the diversity of the AECMD algorithm is relatively stable and is maintained at a high level, which effectively avoids premature convergence and ensures the convergence performance of the algorithm. The elite individual plays an important role in the evolution of the population. To analyze the convergence performance of the AECMD algorithm, the changes in the optimal individual and the average fitness value during the iterative process of the AECMD and ARAGA algorithms are recorded. The two changes are shown in Figures 2 and 3, respectively.

Because the initial populations are different, the initial optimal individuals are not necessarily the same. From Figure 2, we can see that the AECMD algorithm starts to converge in the 22nd generation, whereas the ARAGA algorithm begins to converge only after 42 generations and then converges slowly. From the change curve of the average fitness value (Figure 3), it is found that the average fitness of the ARAGA algorithm is higher at the beginning but, as evolution proceeds, the advantages of the AECMD algorithm gradually become prominent. The evolution mechanism based on the elitist strategy accelerates the convergence of the AECMD algorithm.

4.2.2. Reduction Performance Analysis

To further analyze the effect of the AECMD algorithm on the performance of meteorological data reduction, the precipitation attributes of the data set are also reduced with the rough set attribute reduction algorithm based on tabu discrete particle swarm optimization (TSDPSO-AR) [10, 35–39]. To compare reduction performance, each reduced attribute subset is classified with a KNN (k-nearest neighbor, K = 3) classifier. The data were split into training data and test data at a 4:1 ratio. To avoid chance results, cross-validation is used to calculate the accuracy of the precipitation prediction. The average optimal subset and prediction results are shown in Tables 4 and 5, respectively.
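
The evaluation protocol can be illustrated with a small pure-Python sketch. It assumes Euclidean distance and reads the 4:1 split with repeated evaluation as 5-fold cross-validation; the function names and the toy data are hypothetical and are not the meteorological data set.

```python
from collections import Counter

def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k nearest training points (squared Euclidean distance)."""
    order = sorted(range(len(train_x)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], query)))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

def cross_val_accuracy(xs, ys, attrs, folds=5, k=3):
    """Mean accuracy over the folds, using only the attribute columns in attrs.

    With folds=5, each round trains on 4/5 of the data and tests on 1/5.
    """
    proj = [[x[a] for a in attrs] for x in xs]
    scores = []
    for f in range(folds):
        test_idx = [i for i in range(len(xs)) if i % folds == f]
        train_idx = [i for i in range(len(xs)) if i % folds != f]
        tx = [proj[i] for i in train_idx]
        ty = [ys[i] for i in train_idx]
        hits = sum(knn_predict(tx, ty, proj[i], k) == ys[i] for i in test_idx)
        scores.append(hits / len(test_idx))
    return sum(scores) / folds

# Toy data (hypothetical): attribute 0 determines the class, attribute 1 is redundant.
xs = [(0.0 if i % 2 == 0 else 10.0, float(i % 3)) for i in range(20)]
ys = ["dry" if i % 2 == 0 else "rain" for i in range(20)]

acc_full = cross_val_accuracy(xs, ys, attrs=[0, 1])
acc_reduced = cross_val_accuracy(xs, ys, attrs=[0])
```

On this toy data, the reduced subset containing only attribute 0 reaches the same accuracy as the full attribute set, which is the kind of comparison used to judge a reduction.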

Combining Tables 4 and 5, we can find that the classification ability of the AECMD algorithm is stronger than that of the other two algorithms. The AECMD algorithm improves its search ability by improving the genetic operators and improves its convergence speed with an elitist strategy. It can also be found that the accuracy of precipitation prediction is high for the no rain and light rain categories and decreases as the rainfall level increases. This is due to the uneven data distribution across rainfall levels: as the rainfall grade increases, the number of corresponding samples is greatly reduced. In that case, a single wrong prediction has a greater impact on the classification accuracy; therefore, the corresponding prediction accuracy is also low.

5. Conclusion

In this paper, the crossover operator and mutation operator of the adaptive genetic algorithm are improved, and the evolutionary population is divided into two subpopulations. One subpopulation improves the convergence speed by using the elite-assisted crossover operation. The other subpopulation maintains the population diversity in the evolutionary process by introducing a random population, and the two subpopulations coevolve to complete the iterative operation. Finally, the genetic algorithm with the elitist strategy and coevolution mechanism is combined with attribute reduction to complete the precipitation reduction operation and improve the reduction performance on the meteorological data.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (61402236, 61373064, 61472024, and U1433203), the CERNET Innovation Project (NGII20160318), and the “Six Talent Peaks Project” of Jiangsu Province (2015-DZXX-015).