Abstract

In order to improve the utilization rate of agricultural big data and address the security problems of multisource, heterogeneous agricultural big data, an improved ant colony optimization algorithm for agricultural big data (BigDataACO) is proposed. It fuses multisource agricultural big data at the feature layer and the decision-making layer, thereby solving the problem of multisource data fusion. A swarm intelligence algorithm solves complex problems by simulating how populations in nature cooperate through interactions between individuals. Such algorithms are potentially parallel and highly robust, and they do not depend on the specific problem. This paper first studies the definition, principle, and implementation of the agricultural big data fusion problem, then analyzes the shortcomings of existing big data fusion modeling algorithms, and finally studies the origin and core steps of the ant colony big data fusion algorithm. The improved BigDataACO algorithm is verified on measured data. Experimental results show that, compared with K-means, D-S evidence theory, and a Bayesian algorithm, the proposed algorithm greatly reduces the uncertainty of data fusion.

1. Introduction

Agricultural big data is a collection of data with wide-ranging sources, diverse types, complex structures, and potential value that is difficult to process and analyze with common methods; it also carries agriculture's own characteristics, such as regionality, seasonality, diversity, and periodicity [1, 2]. Agricultural big data retains the basic characteristics of big data, such as huge volume, variety, low value density, fast processing speed, high veracity, and high complexity, yet research on big data applications in agriculture is still relatively scarce [3, 4]. With the development of the Internet, cloud computing, and related technologies, the Internet of things is being used in more and more application fields, including many aspects of the smart city, such as intelligent transportation and telemedicine.

By building a decentralized system, blockchain can provide infrastructure support for the big data generated by the Internet of things and help solve the data security problems that are ubiquitous in the Internet of things, while the Internet of things provides many deployment scenarios for blockchain. This paper draws on the characteristics of blockchain, namely peer-to-peer operation, openness and transparency, secure communication, tamper resistance, and multiparty consensus, which will have an important impact on the Internet of things: multicentricity and weak centralization reduce the high operation and maintenance costs of data-centric architectures; information encryption and secure communication help protect privacy; and identity and rights management together with multiparty consensus help identify noncompliant behaviour. The chain structure helps build verifiable and traceable electronic evidence storage. The distributed architecture and the equal status of participants help break down the information islands in the Internet of things and promote the horizontal flow of information and multiparty cooperation. To continuously optimize the agricultural economy, achieve sustainable industrial development and regional industrial structure optimization, and further promote the construction of smart agriculture, it is necessary to grasp the development of agriculture comprehensively and in a timely manner, which relies on agricultural big data and related big data fusion processing technology. However, traditional big data modeling algorithms face enormous challenges in prediction accuracy. Data fusion builds a classification model from a training set (that is, by means of a classification algorithm) and searches for the fusion rule set that best represents the training data.
This is a process of gradual optimization [5, 6], so many researchers have applied swarm intelligence algorithms to data fusion learning models and achieved some results. A swarm intelligence algorithm solves complex problems by simulating how populations in nature cooperate through interactions between individuals. Such algorithms are potentially parallel and highly robust, and they do not depend on the specific problem. Constructing classification learning models based on swarm intelligence algorithms has become a research hotspot in data mining in recent years [7, 8].

In this paper, the ant colony algorithm and the clustering algorithm, as representatives of swarm intelligence algorithms, are introduced into data fusion mining and decision-making. The problem of constructing fusion models based on the traditional ant colony classification algorithm and clustering method is studied. The two algorithms are then improved from different angles, and a new ant colony data fusion modeling algorithm is proposed. Finally, a number of experiments verify the effectiveness of the improved algorithm in constructing data fusion learning models.

2. Big Data Fusion and Security Algorithm Based on Ant Colony Algorithm

2.1. Definition of Big Data Fusion

Data fusion is a process in which multiple data are processed to produce more effective and more user-friendly data. It uses computer technology to perform multilevel, multifaceted information detection, correlation estimation, and correlation analysis on multisource, heterogeneous data under certain criteria [9, 10], in order to obtain estimates of the target state and features that are more accurate, complete, and reliable than those obtained from a single data source.

Data fusion is also common in daily life. For example, when distinguishing a thing, people usually combine information from various senses to reach a judgment that is more effective and better suited to their needs [11]. When identifying a thing, a single sense is often not enough for an accurate judgment; the information obtained by various senses must be synthesized, and combining multiple sensory data makes the description of the thing more accurate. In traditional agricultural big data applications, it is sometimes unnecessary to obtain a large amount of raw data because only the final result is needed, and data fusion technology can achieve this purpose.
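To make the claim that combining sources gives a more accurate description concrete, the following minimal sketch fuses two readings of the same quantity by inverse-variance weighting. This is a textbook fusion rule chosen only for illustration, not the fusion method proposed in this paper; the function name `fuse_measurements` and the sensor values are hypothetical, and the readings are assumed to be independent and unbiased.

```python
from typing import Sequence, Tuple


def fuse_measurements(values: Sequence[float],
                      variances: Sequence[float]) -> Tuple[float, float]:
    """Inverse-variance weighted fusion of independent, unbiased readings.

    Each reading is weighted by 1 / variance, so more reliable sensors
    count more. Returns (fused_value, fused_variance).
    """
    if not values or len(values) != len(variances):
        raise ValueError("need one positive variance per value")
    weights = [1.0 / v for v in variances]
    total_weight = sum(weights)
    fused_value = sum(w * x for w, x in zip(weights, values)) / total_weight
    return fused_value, 1.0 / total_weight


# Hypothetical soil-moisture readings (%) from two sensors of unequal precision.
value, variance = fuse_measurements([20.0, 24.0], [1.0, 4.0])
print(round(value, 2), round(variance, 2))  # 20.8 0.8
```

The fused variance (0.8) is smaller than that of either sensor alone (1.0 and 4.0), which is the sense in which the fused description is more accurate in this toy case.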

2.2. Ant Colony Classification Algorithm

The ant colony algorithm is a bionic optimization algorithm. Because it is good at finding high-quality solutions, is potentially parallel, uses positive feedback, and is easy to combine with other algorithms, it has been applied to many complex combinatorial optimization problems and has shown great potential [12].

The ant colony algorithm, which simulates the foraging behaviour of ant colonies, was introduced as a new computational intelligence model. It rests on the following basic assumptions: ants communicate with each other through pheromones and the environment; each ant reacts only to its local environment and affects only its local environment; and an ant's response to the environment is determined by its internal model. Because ant behaviour is genetically determined, it is in effect an adaptive expression of the ants' genes; that is, ants are reactive adaptive agents [13]. At the individual level, each ant makes independent choices based on its environment. At the group level, the behaviour of a single ant is random, but the colony can form highly ordered group behaviour through self-organization. These assumptions show that the optimization mechanism of the basic ant colony algorithm consists of two basic stages: adaptation and cooperation. In the adaptation stage, each candidate solution continuously adjusts its structure according to the accumulated information: the more ants pass along a path, the more pheromone it accumulates and the more likely it is to be selected, while paths with less pheromone become less likely to be selected. In the cooperation stage, candidate solutions exchange information in the expectation of producing better solutions, similar to the learning mechanism of a learning automaton [14, 15]. The ant colony algorithm is essentially a multiagent system, and its self-organizing mechanism means that it does not need a detailed understanding of every aspect of the problem. Self-organization is essentially a dynamic process in which the order of the system increases without external influence, reflecting the dynamic evolution from disorder to order.
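The first ant algorithm, called the ant system, was proposed by Dorigo [16–18]. The ant system incorporates heuristic information into the design of the transition probability $p_{ij}^{k}(t)$ and adds a taboo table to give the algorithm a memory function. In the ant system, the probability that ant $k$ moves from node $i$ to node $j$ is defined as [19]:

$$p_{ij}^{k}(t)=\begin{cases}\dfrac{[\tau_{ij}(t)]^{\alpha}\,[\eta_{ij}]^{\beta}}{\sum_{s\in \mathrm{allowed}_k}[\tau_{is}(t)]^{\alpha}\,[\eta_{is}]^{\beta}}, & j\in \mathrm{allowed}_k,\\[4pt] 0, & \text{otherwise},\end{cases}\qquad (1)$$

where $\mathrm{allowed}_k$ is the set of nodes not yet in ant $k$'s taboo table, and $\alpha$ and $\beta$ weight the relative importance of the pheromone and the heuristic information.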

In which, $\tau_{ij}(t)$ is the amount of pheromone on edge $(i,j)$ and represents the a posteriori effect of moving from node $i$ to node $j$; $\eta_{ij}$ is heuristic information, calculated by a heuristic function, and represents the a priori effect of that move. The pheromone concentration is a memory of past good moves, indicating how past moves from node $i$ to node $j$ influence the current choice. Path selection in the ant system seeks a balance between $\tau_{ij}$ and $\eta_{ij}$, which lets it handle the trade-off between exploration and exploitation in the ant optimization process well.
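The transition rule described above can be sketched in Python as follows. This is a minimal illustration of the standard ant system rule, assuming pheromone and heuristic values are stored in nested lists and that $\eta_{ij}=1/d_{ij}$ as in the usual distance-based setting; the helper names `transition_probabilities` and `choose_next` and the default values of `alpha` and `beta` are hypothetical.

```python
import random
from typing import Dict, List, Sequence


def transition_probabilities(i: int, allowed: Sequence[int],
                             tau: List[List[float]], eta: List[List[float]],
                             alpha: float = 1.0, beta: float = 2.0) -> Dict[int, float]:
    """Ant system rule: p_ij proportional to tau_ij**alpha * eta_ij**beta for j in allowed."""
    scores = {j: (tau[i][j] ** alpha) * (eta[i][j] ** beta) for j in allowed}
    total = sum(scores.values())
    return {j: s / total for j, s in scores.items()}


def choose_next(probs: Dict[int, float], rng: random.Random) -> int:
    """Roulette-wheel selection: pick node j with probability probs[j]."""
    r = rng.random()
    cumulative = 0.0
    for j, p in probs.items():
        cumulative += p
        if r < cumulative:
            return j
    return list(probs)[-1]  # guard against floating-point round-off


# Hypothetical 3-node example: eta_ij = 1 / d_ij, uniform initial pheromone.
dist = [[0.0, 1.0, 2.0], [1.0, 0.0, 1.0], [2.0, 1.0, 0.0]]
eta = [[0.0 if a == b else 1.0 / dist[a][b] for b in range(3)] for a in range(3)]
tau = [[1.0] * 3 for _ in range(3)]
probs = transition_probabilities(0, [1, 2], tau, eta)
print(probs)  # {1: 0.8, 2: 0.2}
next_node = choose_next(probs, random.Random(42))
```

With equal pheromone, the closer node 1 is four times as likely as node 2 because $\beta=2$ squares the ratio of inverse distances.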

Following the biology of real ants, pheromone evaporation is introduced on every edge, which encourages ants to explore new paths and avoids premature convergence [20, 21]. In each iteration, the existing pheromone partially evaporates before new pheromone is deposited. For the pheromone on each edge, evaporation is performed using equation (2):

$$\tau_{ij}(t+1)=(1-\rho)\,\tau_{ij}(t)\qquad (2)$$

In which, $\rho\in[0,1]$ is a constant indicating the degree to which ants forget previous decisions, and $1-\rho$ controls the influence of the previous search history. A small $\rho$ means slow evaporation; a large $\rho$ means fast evaporation.

According to different pheromone update strategies, Dorigo proposed three basic ant colony algorithm models, called the Ant-Cycle, Ant-Quantity, and Ant-Density models; they differ in how the pheromone increment $\Delta\tau_{ij}^{k}$ deposited by ant $k$ is computed.

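In the Ant-Cycle model,

$$\Delta\tau_{ij}^{k}=\begin{cases}\dfrac{Q}{L_k}, & \text{if ant } k \text{ uses edge } (i,j) \text{ in this cycle},\\[4pt] 0, & \text{otherwise}.\end{cases}\qquad (3)$$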

In which, $Q$ represents the pheromone intensity, which affects the convergence speed of the algorithm to some extent, and $L_k$ represents the total length of the path taken by the $k$th ant in this cycle.

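In the Ant-Quantity model,

$$\Delta\tau_{ij}^{k}=\begin{cases}\dfrac{Q}{d_{ij}}, & \text{if ant } k \text{ moves from } i \text{ to } j \text{ between } t \text{ and } t+1,\\[4pt] 0, & \text{otherwise},\end{cases}\qquad (4)$$

where $d_{ij}$ is the distance between nodes $i$ and $j$.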

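In the Ant-Density model,

$$\Delta\tau_{ij}^{k}=\begin{cases}Q, & \text{if ant } k \text{ moves from } i \text{ to } j \text{ between } t \text{ and } t+1,\\[4pt] 0, & \text{otherwise}.\end{cases}\qquad (5)$$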

The main difference between them is that the Ant-Quantity and Ant-Density models use local information, that is, an ant updates the pheromone on an edge as soon as it completes one step, whereas the Ant-Cycle model uses global information, that is, the pheromone on all edges of an ant's path is updated only after the ant completes a full cycle. The Ant-Cycle model generally performs better.
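The evaporation step of equation (2) and the pheromone increments of the three models can be sketched as below. This is a minimal illustration assuming pheromone is kept in a dense matrix and edges are treated as directed; the function names and the example values of `rho` and `q` are hypothetical.

```python
from typing import Iterator, List, Sequence, Tuple

Matrix = List[List[float]]


def evaporate(tau: Matrix, rho: float) -> None:
    """Equation (2): tau_ij <- (1 - rho) * tau_ij for every edge, with 0 <= rho <= 1."""
    for row in tau:
        for j in range(len(row)):
            row[j] *= 1.0 - rho


def tour_edges(tour: Sequence[int]) -> Iterator[Tuple[int, int]]:
    """Consecutive (i, j) moves of a tour given as a node sequence."""
    return zip(tour[:-1], tour[1:])


def ant_cycle_update(tau: Matrix, tours: Sequence[Sequence[int]],
                     lengths: Sequence[float], q: float) -> None:
    """Ant-Cycle (global): after full tours, ant k adds Q / L_k to every edge it used."""
    for tour, length in zip(tours, lengths):
        for i, j in tour_edges(tour):
            tau[i][j] += q / length


def ant_quantity_step(tau: Matrix, i: int, j: int, dist: Matrix, q: float) -> None:
    """Ant-Quantity (local): right after the move i -> j, add Q / d_ij."""
    tau[i][j] += q / dist[i][j]


def ant_density_step(tau: Matrix, i: int, j: int, q: float) -> None:
    """Ant-Density (local): right after the move i -> j, add the constant Q."""
    tau[i][j] += q


dist = [[0.0, 1.0, 2.0], [1.0, 0.0, 1.0], [2.0, 1.0, 0.0]]
tau = [[1.0] * 3 for _ in range(3)]
evaporate(tau, rho=0.5)                              # every entry becomes 0.5
ant_cycle_update(tau, [[0, 1, 2, 0]], [4.0], q=1.0)  # +0.25 on (0,1), (1,2), (2,0)
ant_quantity_step(tau, 0, 2, dist, q=1.0)            # +0.5 on (0,2)
ant_density_step(tau, 1, 0, q=1.0)                   # +1.0 on (1,0)
print(tau[0][1], tau[0][2], tau[1][0])               # 0.75 1.0 1.5
```

In a symmetric problem such as the classical TSP, each deposit would normally be applied to both `tau[i][j]` and `tau[j][i]`.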

3. Improved Ant Colony Big Data Fusion Modeling Algorithm

The traditional ant colony fusion modeling algorithm uses a sequential covering strategy that mines one rule at a time. Because the training samples covered by each newly mined rule are removed, the search space keeps changing, and because the algorithm does not consider the interaction between discovered rules, the rules output earlier affect the rules output later.

3.1. Big Data Fusion Algorithm

Suppose that a multivariate data node floods all nodes of the network with its keywords. After a node receives the packet, it calculates the relevance of the data association [22, 23]. When the source node wants to send data, suppose that at time $t$ ant $k$ is at node $i$; it selects the next-hop node $j$ according to the probabilistic rule in formula (6):

$$p_{ij}^{k}(t)=\begin{cases}\dfrac{[\tau_{ij}(t)]^{\alpha}\,[\eta_{ij}(t)]^{\beta}}{\sum_{s\in \mathrm{allowed}_k}[\tau_{is}(t)]^{\alpha}\,[\eta_{is}(t)]^{\beta}}, & j\in \mathrm{allowed}_k,\\[4pt] 0, & \text{otherwise}.\end{cases}\qquad (6)$$

In which, $\mathrm{allowed}_k$ is the set of nodes that ant $k$ has not yet visited; $\tau_{ij}(t)$ is the pheromone, that is, the amount of information on the path between nodes $i$ and $j$; $\eta_{ij}(t)=1/d_{ij}$ is a heuristic factor equal to the reciprocal of the distance between the nodes, indicating the visibility of the path; $\alpha$ indicates the relative importance of the pheromone information; and $\beta$ indicates the relative importance of the heuristic information.

After all ants have finished traversing the nodes, the pheromone on each path is updated by the global rule

$$\tau_{ij}(t+n)=(1-\rho)\,\tau_{ij}(t)+\Delta\tau_{ij}\qquad (7)$$

where $n$ is the number of steps in one traversal.

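When a node receives a pheromone update packet, it updates its pheromone table according to Eq. (8) and Eq. (9):

$$\tau_{ij}\leftarrow(1-\rho)\,\tau_{ij}+\Delta\tau_{ij}\qquad (8)$$

$$\Delta\tau_{ij}=\sum_{k=1}^{m}\Delta\tau_{ij}^{k}\qquad (9)$$

where $m$ is the number of ants.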

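In which, $\rho$ is the evaporation coefficient, which determines how quickly the pheromone evaporates, and $\Delta\tau_{ij}$ is the increment of pheromone on the path between nodes $i$ and $j$; the contribution $\Delta\tau_{ij}^{k}$ of ant $k$ is determined by equation (10):

$$\Delta\tau_{ij}^{k}=\begin{cases}\dfrac{Q}{L_k}, & (i,j)\in \mathrm{tabu}_k,\\[4pt] 0, & \text{otherwise}.\end{cases}\qquad (10)$$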

In which, $Q$ is a constant that controls the total amount of pheromone released by an ant after completing a path search, $L_k$ is the total length of ant $k$'s path, and $\mathrm{tabu}_k$ is the path visited by ant $k$.
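Putting the selection rule of formula (6) and the global update of Eqs. (8) to (10) together gives one complete ACO loop. The sketch below is a minimal, hypothetical illustration on a TSP-style closed tour over a small symmetric distance matrix; it uses `random.choices` for the roulette-wheel step, and it is not the authors' routing implementation (packet flooding and per-node pheromone tables are not modeled). The function name `aco_tour` and all parameter defaults are assumptions.

```python
import math
import random
from typing import List, Tuple


def aco_tour(dist: List[List[float]], n_ants: int = 10, n_iter: int = 50,
             alpha: float = 1.0, beta: float = 2.0, rho: float = 0.5,
             q: float = 1.0, seed: int = 0) -> Tuple[List[int], float]:
    """Minimal ant-cycle ACO over a symmetric distance matrix.

    Each ant builds a closed tour with the rule of formula (6), using
    eta_ij = 1 / d_ij; after all ants finish, pheromone evaporates and
    every ant deposits Q / L_k on the edges it used (Eqs. (8) to (10)).
    """
    rng = random.Random(seed)
    n = len(dist)
    tau = [[1.0] * n for _ in range(n)]
    best_tour: List[int] = []
    best_len = math.inf
    for _ in range(n_iter):
        tours = []
        for _ in range(n_ants):
            start = rng.randrange(n)
            tour = [start]
            allowed = set(range(n)) - {start}
            while allowed:
                i = tour[-1]
                candidates = sorted(allowed)
                weights = [tau[i][j] ** alpha * (1.0 / dist[i][j]) ** beta
                           for j in candidates]
                j = rng.choices(candidates, weights=weights)[0]
                tour.append(j)
                allowed.remove(j)
            tour.append(start)  # return to the start to close the cycle
            length = sum(dist[a][b] for a, b in zip(tour, tour[1:]))
            tours.append((tour, length))
            if length < best_len:
                best_tour, best_len = tour, length
        # Global update: evaporate everywhere, then deposit Q / L_k per ant.
        for row in tau:
            for j in range(n):
                row[j] *= 1.0 - rho
        for tour, length in tours:
            for a, b in zip(tour, tour[1:]):
                tau[a][b] += q / length
                tau[b][a] += q / length  # symmetric problem: update both directions
    return best_tour, best_len


# Hypothetical example: 4 nodes at the corners of a unit square.
d = math.sqrt(2.0)
square = [[0.0, 1.0, d, 1.0], [1.0, 0.0, 1.0, d], [d, 1.0, 0.0, 1.0], [1.0, d, 1.0, 0.0]]
best_tour, best_len = aco_tour(square)
print(best_len)  # 4.0 (the perimeter)
```

Evaporating before depositing, as here, matches the order in Eq. (8); pheromone on edges that no ant uses then decays geometrically.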

This completes the improvement of the level gradient field. Next, the center point fusion algorithm is used to find the center point, and the fusion tree and data report are established, thus completing the whole algorithm.

3.2. Big Data Fusion Ant Colony Optimization Algorithm

The big data fusion ant colony optimization algorithm (BigDataACO) improves on the basic ant colony method. Ant-Miner, an ant colony classification algorithm, mines classification rules with a fixed structural form, and acquiring classification rules is one of its main functions. The general structure of a classification rule is

$$\text{IF } \langle\text{conditions}\rangle \text{ THEN } \langle\text{class}\rangle\qquad (11)$$

In which, <conditions> is called the rule antecedent; it consists of a series of terms over the predictive attributes joined by logical conjunction, in the form

$$\text{term}_1 \text{ AND } \text{term}_2 \text{ AND } \cdots\qquad (12)$$

Each term specifies a particular value of an attribute in the training set, and the same attribute can appear at most once in the antecedent. Each condition in the antecedent is a triple <attribute, operator, attribute value>: the attribute belongs to the attribute space of the data to be classified; the operator is a relational operator, usually “=”; and attribute values are generally treated as discrete. <class> is called the rule consequent and is one of the classes in the dataset.
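The rule structure just described, IF <conditions> THEN <class> with an antecedent made of <attribute, operator, value> triples, maps naturally onto a small data structure. The sketch below is a hypothetical illustration: only the “=” operator is supported, attribute values are assumed to be discrete strings, and the soil and season attributes in the example are invented.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Term:
    """One antecedent condition <attribute, operator, value>; only "=" is supported."""
    attribute: str
    value: str
    operator: str = "="

    def holds(self, example: Dict[str, str]) -> bool:
        if self.operator != "=":
            raise NotImplementedError("this sketch only supports '='")
        return example.get(self.attribute) == self.value


@dataclass(frozen=True)
class Rule:
    """IF term_1 AND term_2 AND ... THEN consequent."""
    antecedent: Tuple[Term, ...]
    consequent: str

    def __post_init__(self) -> None:
        attributes = [t.attribute for t in self.antecedent]
        if len(attributes) != len(set(attributes)):
            raise ValueError("an attribute may appear at most once in the antecedent")

    def covers(self, example: Dict[str, str]) -> bool:
        """True if every term in the antecedent holds for the example."""
        return all(t.holds(example) for t in self.antecedent)


# Hypothetical rule: IF soil_moisture = low AND season = dry THEN irrigate
rule = Rule((Term("soil_moisture", "low"), Term("season", "dry")), "irrigate")
print(rule.covers({"soil_moisture": "low", "season": "dry"}))   # True
print(rule.covers({"soil_moisture": "high", "season": "dry"}))  # False
```

The check in `__post_init__` enforces the constraint that an attribute appears at most once in the antecedent.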

The core operation of the ant colony classification search is rule generation: the current ant adds terms to the antecedent of its current partial rule one at a time.

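Assume that a rule term has the form $\text{term}_{ij}$: $A_i = V_{ij}$, where $A_i$ is the $i$th attribute and $V_{ij}$ is the $j$th value in the domain of $A_i$. The probability that $\text{term}_{ij}$ is added to the current partial rule is

$$P_{ij}=\frac{\tau_{ij}(t)\,\eta_{ij}}{\sum_{i=1}^{a}x_i\sum_{j=1}^{b_i}\tau_{ij}(t)\,\eta_{ij}}\qquad (13)$$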

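In which, $a$ is the number of attributes; $x_i$ is set to 1 if attribute $A_i$ has not yet been used by the current ant and to 0 otherwise; $b_i$ is the number of values in the domain of the $i$th attribute; and $\eta_{ij}$ is the problem-dependent heuristic function of $\text{term}_{ij}$, calculated as

$$\eta_{ij}=\frac{\log_2 k-H(W\mid A_i=V_{ij})}{\sum_{i=1}^{a}x_i\sum_{j=1}^{b_i}\bigl(\log_2 k-H(W\mid A_i=V_{ij})\bigr)}\qquad (14)$$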

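In which, $H(W\mid A_i=V_{ij})=-\sum_{w=1}^{k}P(w\mid A_i=V_{ij})\log_2 P(w\mid A_i=V_{ij})$ is the entropy of the class attribute $W$ given $A_i=V_{ij}$, and $k$ is the number of classes. The higher the value of $\eta_{ij}$, the greater the likelihood that $\text{term}_{ij}$ will be selected.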

$\tau_{ij}(t)$ is the pheromone on the current ant's path at position $(i,j)$, that is, the pheromone associated with $\text{term}_{ij}$ at time $t$. All terms have the same pheromone when the algorithm is initialized, and this initial value is inversely proportional to the total number of attribute values.
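The Ant-Miner term selection described above, an entropy-based heuristic combined with pheromone, can be sketched as follows. This is a minimal illustration of the standard Ant-Miner formulas, not of the improved attraction/exclusion rule proposed later; the function names, the toy data, the uniform fallback when no term is informative, and the convention of giving a term that covers no training example the maximum entropy $\log_2 k$ are assumptions.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Set, Tuple

Example = Dict[str, str]
TermKey = Tuple[str, str]  # (attribute A_i, value V_ij)


def conditional_entropy(data: Sequence[Example], labels: Sequence[str],
                        attribute: str, value: str, n_classes: int) -> float:
    """H(W | A_i = V_ij) over the training examples where attribute == value."""
    subset = [y for x, y in zip(data, labels) if x.get(attribute) == value]
    if not subset:
        return math.log2(n_classes)  # assumption: uncovered term gets maximum entropy
    n = len(subset)
    return -sum((c / n) * math.log2(c / n) for c in Counter(subset).values())


def heuristic(data: Sequence[Example], labels: Sequence[str],
              domains: Dict[str, List[str]], used: Set[str]) -> Dict[TermKey, float]:
    """Normalized heuristic eta_ij over terms of attributes not yet used (x_i = 1)."""
    k = len(set(labels))
    raw = {
        (attr, v): math.log2(k) - conditional_entropy(data, labels, attr, v, k)
        for attr, values in domains.items() if attr not in used
        for v in values
    }
    total = sum(raw.values())
    if total == 0:
        return {t: 1.0 / len(raw) for t in raw}  # no informative term: uniform fallback
    return {t: r / total for t, r in raw.items()}


def term_probabilities(tau: Dict[TermKey, float],
                       eta: Dict[TermKey, float]) -> Dict[TermKey, float]:
    """P_ij = tau_ij * eta_ij divided by the sum of tau * eta over candidate terms."""
    scores = {t: tau[t] * eta[t] for t in eta}
    total = sum(scores.values())
    return {t: s / total for t, s in scores.items()}


# Hypothetical toy training set with two predictive attributes.
data = [
    {"moisture": "low", "season": "dry"},
    {"moisture": "low", "season": "wet"},
    {"moisture": "high", "season": "dry"},
    {"moisture": "high", "season": "wet"},
]
labels = ["irrigate", "irrigate", "none", "none"]
domains = {"moisture": ["low", "high"], "season": ["dry", "wet"]}

eta = heuristic(data, labels, domains, used=set())
tau = {t: 1.0 / 4 for t in eta}  # uniform start: 1 / (total number of attribute values)
probs = term_probabilities(tau, eta)
print(probs[("moisture", "low")], probs[("season", "dry")])  # 0.5 0.0
```

In the toy data, `moisture` perfectly separates the classes while `season` carries no class information, so only `moisture` terms receive nonzero selection probability.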

In Ant-Miner, only one ant is used in each iteration of the while loop, and the pheromone update is performed after that ant has finished constructing its rule. During the search, traditional Ant-Miner tends to reselect the attribute terms already present in discovered rules. Although this strengthens exploitation, it easily leads to premature convergence, and its calculation of attribute selection probabilities is also complicated. This paper proposes a new rule-construction method based on pheromone attraction and exclusion and modifies the state transition probability formula accordingly, so that the pheromone an ant uses during rule search contains not only an attraction part but also an exclusion part. Through the exclusion part, ants tend to explore in the early stage of the rule search and to exploit in the later stage.

To construct rules with the ant colony, the classification modeling algorithm is first initialized: the parameter values it requires are set, and all training samples are placed in the training set. The ant optimization model of the artificial ant colony algorithm is then simulated to build paths over the attribute nodes. Each node obtains its initial pheromone value according to formula (15):

$$\tau_{ij}(t=0)=\frac{1}{\sum_{i=1}^{a}b_i}\qquad (15)$$

In which, $a$ is the total number of sample attributes in the training set, and $b_i$ is the number of values in the domain of attribute $A_i$.
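As a hypothetical worked example of the initialization in formula (15), consider a training set with $a=3$ attributes whose domains contain $b_1=2$, $b_2=3$, and $b_3=5$ values:

$$\tau_{ij}(t=0)=\frac{1}{\sum_{i=1}^{3}b_i}=\frac{1}{2+3+5}=\frac{1}{10}=0.1$$

so each of the ten possible terms starts with the same pheromone value 0.1, and adding attribute values to the domains lowers this common starting value.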

When attribute nodes are selected, choosing randomly every time would make the computational cost of mining rules very large. This paper therefore improves the probability transfer method. The probability that $\text{term}_{ij}$ is added to the current partial rule is as follows:

In which, $\eta_{ij}$ is the problem-dependent heuristic function of the term: the larger its value, the more relevant $\text{term}_{ij}$ is to the classification and the more likely it is to be selected; $\tau_{ij}(t)$ is the pheromone at position $(i,j)$ at time $t$. The flow of the improved ant colony data fusion algorithm is shown in Figure 1.

4. Experimental Analysis

4.1. Experimental Methods and Data Sets

This experiment analyzes and verifies the performance of the BigDataACO algorithm on multisource big data sets. To analyze its performance more intuitively, we compare BigDataACO with the K-means algorithm, Dempster-Shafer (D-S) evidence theory, and a Bayesian algorithm, and evaluate performance by clustering accuracy, purity, relevance, and time consumption.

In the experiment, we compare the performance of the proposed BigDataACO algorithm and the comparison algorithms on three data sets: Agricultural Features Data, Agricultural Multilanguage Data, and Agricultural Multimedia Data. The Agricultural Features Data contains image features of nine handwritten characters, with 200 images per character and 1800 images in total. Each image is represented by a 122-dimensional Fourier coefficient of the character shape, a 221-dimensional contour description, a 260-dimensional pixel average, and a 16-dimensional morphological feature. There are 12 kinds of visual features, that is, 12 data modalities; in the experiment, the first five modal feature sets were used for multimodal data clustering analysis. Table 1 gives a brief description of the data sets. The clustering distribution of the multimodal data before fusion is shown in Figure 2.

In the experiment, we compare the BigDataACO algorithm with K-means, D-S evidence theory, and a Bayesian algorithm. The ant algorithm is a population-based simulated evolutionary algorithm, and K-means is a common partition-based clustering method. In this paper, the two are combined: the randomness of the ant algorithm largely resolves the local optimum problem and overcomes the sensitivity of K-means to its initial parameters. This improves clustering quality and addresses the difficulty that partition-based methods such as K-means have in finding clusters of arbitrary shape. The Bayesian classification model is a simple and effective classification method with a sound theoretical foundation and high classification accuracy. Because naive Bayesian classification assumes that features are conditionally independent, an accurate and effective feature selection step is particularly important. D-S theory is a generalization of Bayesian reasoning: Bayesian reasoning relies mainly on conditional probability and requires prior probabilities, whereas D-S evidence theory does not, can express “uncertainty” well, and is widely used to handle uncertain data.
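To illustrate how D-S evidence theory represents uncertainty without priors, the sketch below implements Dempster's rule of combination for two mass functions. It is a minimal, hypothetical example, not the configuration used in the experiments: the frame of discernment {dry, wet} and the mass values are invented, and mass assigned to the whole frame stands for “don't know”.

```python
from itertools import product
from typing import Dict, FrozenSet

Mass = Dict[FrozenSet[str], float]


def dempster_combine(m1: Mass, m2: Mass) -> Mass:
    """Dempster's rule: m(A) = sum over B & C == A of m1(B) * m2(C), divided by (1 - K).

    K is the total mass assigned to conflicting (empty) intersections.
    """
    combined: Mass = {}
    conflict = 0.0
    for (b, mass_b), (c, mass_c) in product(m1.items(), m2.items()):
        intersection = b & c
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + mass_b * mass_c
        else:
            conflict += mass_b * mass_c
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict and cannot be combined")
    return {a: v / (1.0 - conflict) for a, v in combined.items()}


# Hypothetical evidence about whether a field is dry or wet.
DRY, WET, EITHER = frozenset({"dry"}), frozenset({"wet"}), frozenset({"dry", "wet"})
sensor_a = {DRY: 0.6, EITHER: 0.4}  # 0.4 left uncommitted ("don't know")
sensor_b = {WET: 0.3, EITHER: 0.7}
fused = dempster_combine(sensor_a, sensor_b)
for focal, mass in sorted(fused.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(focal), round(mass, 3))
# ['dry'] 0.512
# ['dry', 'wet'] 0.341
# ['wet'] 0.146
```

Here the conflict is $K=0.6\times0.3=0.18$, and the mass still left on {dry, wet} after combination is the explicit “uncertainty” that a single Bayesian probability cannot express.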

In order to analyze and compare the performance of each multimodal clustering algorithm more comprehensively, we use the accuracy, correlation, and time consumption to measure and analyze all clustering results.

4.2. Experimental Environment

All experiments were performed on the same personal computer (PC) with an Intel Core i7-7000U processor at 2.80 GHz and 32 GB of memory, using MATLAB 2012. The data in each experimental data set were randomly divided into five parts. The first part contains 30% of the whole data set, all of it labeled, and is used to initialize each clustering model. The remaining data are divided into three unlabeled blocks, which are added one at a time, in three increments, for incremental clustering fusion.
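The labeled/unlabeled protocol described above can be sketched as follows. This is a hypothetical helper, assuming a 30% labeled share and three unlabeled blocks of near-equal size as stated in the text; the function name `incremental_split`, the striding used to form the blocks, and the fixed seed are illustrative choices.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def incremental_split(items: Sequence[T], labeled_frac: float = 0.3,
                      n_blocks: int = 3, seed: int = 0) -> Tuple[List[T], List[List[T]]]:
    """Shuffle, keep a labeled initialization share, and cut the rest into unlabeled blocks."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_labeled = round(labeled_frac * len(shuffled))
    labeled, rest = shuffled[:n_labeled], shuffled[n_labeled:]
    # Striding gives blocks whose sizes differ by at most one.
    blocks = [rest[b::n_blocks] for b in range(n_blocks)]
    return labeled, blocks


# Hypothetical: 1800 sample indices, as in the Agricultural Features Data.
labeled, blocks = incremental_split(range(1800))
print(len(labeled), [len(b) for b in blocks])  # 540 [420, 420, 420]
```

Each block would then be fed to the clustering model in turn, so the model sees the data in three increments after its labeled initialization.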

4.3. Experimental Relation Algorithm Description

The first set of experiments verifies the clustering performance of the four algorithms on the three data sets: Agricultural Features Data, Agricultural Multilanguage Data, and Agricultural Multimedia Data. In each experiment, 1200 data instances were randomly selected to initialize the clustering model with labels, and the remaining unlabeled data were then divided equally into three parts in random order and added to the existing clustering results to complete incremental clustering. For the K-means and BigDataACO algorithms, the initial number of clusters was set to five different values, [5, 6, 12, 13, 16]; for D-S evidence theory and the Bayesian algorithm, the iteration number and shared feature dimension were set to [121, 260, 400, 6, 8, 12]. Each experiment was repeated 20 times with random initialization, and the average clustering results were recorded. The experimental results of the algorithms are compared in Figure 3.

4.4. Experimental Results and Analysis

As Figure 3 shows, as each data block is added, the accuracy and correlation of most algorithms decrease, while their execution time increases significantly.

The K-means algorithm may converge to a locally optimal clustering rather than the globally optimal one, whereas the ACO and Bayesian algorithms are better able to reach the globally optimal clustering. The original K-means algorithm calculates the distance between every observation and all cluster centers in each iteration, so it performs poorly when the number of observations is large. Bayesian classification has the highest accuracy. D-S theory is a generalization of Bayesian reasoning: Bayesian reasoning relies mainly on conditional probability and requires prior probabilities, whereas D-S evidence theory does not require them, can express “uncertainty” well, and is widely used to handle uncertain data. As an uncertain reasoning method, it is mainly suited to information fusion, expert systems, intelligence analysis, legal case analysis, and multiattribute decision analysis.

As a performance index, we use the SSE (sum of squared errors), the sum of the squared distances from each point to the center of its own cluster under the current clustering. The SSE is calculated as

$$\mathrm{SSE}=\sum_{i=1}^{K}\sum_{p\in C_i}\lVert p-m_i\rVert^{2}$$

where $p$ is the position of a point in cluster $C_i$, $m_i$ is the position of the center of $C_i$, and $K$ is the number of clusters.
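The SSE index translates directly into code. The sketch below is a minimal illustration, assuming points and centers are given as equal-length numeric tuples and that each point's cluster is given as an index into the list of centers; the function name `sse` and the toy data are hypothetical.

```python
from typing import Sequence


def sse(points: Sequence[Sequence[float]], assignment: Sequence[int],
        centers: Sequence[Sequence[float]]) -> float:
    """Sum over all points of the squared Euclidean distance to the assigned center."""
    total = 0.0
    for point, cluster in zip(points, assignment):
        center = centers[cluster]
        total += sum((p - c) ** 2 for p, c in zip(point, center))
    return total


# Hypothetical 2-D data with two clusters.
points = [(0.0, 0.0), (2.0, 0.0), (10.0, 10.0), (10.0, 12.0)]
assignment = [0, 0, 1, 1]
centers = [(1.0, 0.0), (10.0, 11.0)]
print(sse(points, assignment, centers))  # 4.0
```

Each of the four points lies at distance 1 from its center, so each contributes 1 to the total.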

During the iterations of a clustering algorithm, we evaluate the current clustering by computing the SSE for the current centers. Once the SSE no longer decreases appreciably after an iteration, the clustering process is essentially complete and further iterations are unnecessary. Compared with K-means, D-S evidence theory, and the Bayesian algorithm, BigDataACO yields the smallest SSE, while the other three yield larger values. Figure 4 also shows that BigDataACO has the best clustering performance and that its performance remains relatively stable as the data change dynamically. Even under their best parameter settings, D-S evidence theory and the Bayesian algorithm lack incremental data processing capability, so their time performance is similar to that of K-means.

5. Conclusions

In this paper, an improved ACO-based algorithm, BigDataACO, is proposed. Taking agricultural big data fusion as the research object, the paper studies how to construct classification models and make predictions for different data sets using an improved ant colony algorithm. In wireless sensor monitoring of agricultural big data, data fusion technology can be combined with multiple protocol layers of the sensor network. Real-time sensor monitoring data, which carry some uncertainty and ambiguity, and soil moisture retrieved from hyperspectral data are used as agricultural big data sets.

In the fusion process, the improved ant colony optimization algorithm and the Bayesian maximum entropy method are used to integrate the three data sets, Agricultural Features Data, Agricultural Multilanguage Data, and Agricultural Multimedia Data, at the regional scale. On this basis, the improved BigDataACO algorithm fuses the multisource information in the data sets, solves the problem of information fusion in agricultural management and decision-making, and eliminates possible redundancy and contradictions among multisource agricultural information, thereby improving the reliability of agricultural decision-making and the utilization of agricultural big data. Further study is needed to understand the nature of the big data fusion problem. In the era of big data, the analysis and mining of Agricultural Multilanguage Data is a research field that attracts much attention, and effectively learning from massive, low-quality, heterogeneous, high-dimensional, and fast-changing big data still poses a series of problems and challenges. Our study provides a corresponding fusion algorithm for Agricultural Multilanguage Data that addresses the incompleteness of multimodal data, real-time processing, and multisource data fusion. The purpose of this paper is to propose an organization method and a blockchain-enabled security method for big data, in order to realize the mining and analysis of Internet of things monitoring data.

Data Availability

The raw/processed data required to reproduce these findings cannot be shared at this time as the data also form part of an ongoing study.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this study.

Acknowledgments

This work is partially supported by the National Science Foundation of China (Grant No. 72061030), the Natural Science and Technology Project Plan in Yulin of China (Grant No. 2019-78-3, 2016CXY-12-09, 2019-78-2, 2019-78-1, 2019-76-2, 2019-106-6), and the Funding Project for Department of Yulin University (Grant No. 16GK24, TZRC1801). The authors are grateful for this help.