Prediction Research of Red Tide Based on Improved FCM
Red tides are caused by the combined effects of many marine elements. The complexity of the marine ecosystem makes it hard to find the relationship between marine elements and red tides. The fuzzy c-means (FCM) algorithm yields a clear classification of things while also expressing the fuzzy state among them. Therefore, a red tide prediction algorithm based on improved FCM is proposed. To overcome the overdependence of FCM on the initial cluster centers and the objective function, this paper obtains the initial cluster centers through the principles of regional minimum data density and minimum mean distance, and adds the feature-weighted cluster center to the objective function. Finally, the improved FCM algorithm is applied to red tide prediction, and the results show that it has good denoising ability and high accuracy in the prediction of red tides.
1. Introduction
The ocean, as the cradle of human beings, provides abundant biological and mineral resources. With the rapid growth of population and the economy in the 21st century, humans have turned to the sea to alleviate the shortage of resources, and pollution and destruction of the marine ecological environment have followed. Because a large amount of untreated waste water is discharged directly into the ocean and because of global climate change, harmful red tide species have increased dramatically. The causes of red tides are complex. Although the occurrence mechanism of red tide has not yet been determined, most scholars believe that red tide occurrence is closely related to water eutrophication [1, 2].
There have been some studies on red tide prediction. Using a numerical method, Gibson et al. established an NPZ ecological dynamic model and analyzed five kinds of seston feeding functions. Zhang et al. combined multiple factors, including meteorological and hydrological data, to forecast red tide. Wang et al. established a multivariate adaptive spline regression model to forecast red tide. However, the accuracy of these traditional red tide prediction methods is limited. With the rapid rise of artificial intelligence, the artificial neural network (ANN) has been applied to red tide prediction with considerable success [6, 7]. The artificial neural network also has defects, such as low learning efficiency and unstable network learning and memory.
In this paper, a fuzzy c-means clustering algorithm is proposed for red tide prediction. The algorithm is optimized in two respects: the selection of the initial cluster centers and the objective function. As the experiments show, red tide prediction based on the improved FCM algorithm gives better clustering results, faster convergence, and better robustness to noise and outliers.
2. Related Research
2.1. Research on the Prediction Method of Red Tide
Red tide is caused by the combined action of multiple factors, such as the sudden proliferation and accumulation of some plankton. It has the characteristics of common natural disasters, namely, inhomogeneity, diversity, difference, burst, reproducibility, disorder, randomness, and predictability in time and space. As red tide is one of the most serious marine disasters, its scale and the economic loss it causes are increasing year by year. Red tide also has a negative effect on the sustainable development of the marine economy in our country; therefore, research on red tide prediction is important for environmental protection and for reducing economic loss [10–12].
According to the research on the prediction of red tide, there are already many prediction methods, such as empirical prediction method, statistical prediction method, numerical model prediction method, and artificial neural network [13–15].
(1) Empirical Prediction Method. There is a certain regularity between the occurrence of red tide and the changes of environmental factors, such as meteorological conditions, oceanographic processes, and ecological factors. The empirical prediction method is based on this regularity.
(2) Statistical Prediction Method. The empirical prediction method depends only on the change of a single environmental factor, whereas red tide is in fact caused by many factors. The statistical prediction method analyzes these factors comprehensively and therefore has stronger predictive ability. Its main techniques include principal component analysis, discriminant analysis, and stepwise regression.
(3) Numerical Model Prediction Method. Because the occurrence mechanism of red tide is not understood in detail, the statistical prediction method is to a certain degree subjective and blind. The numerical model prediction method uses a variety of mathematical tools to analyze, solve, and simulate a model based on coupled physical, chemical, and biological factors.
(4) Artificial Neural Network. With the rapid development of computer technology, artificial intelligence, and biotechnology, the artificial neural network has been applied to the prediction of red tide. However, research on this method has been constrained to some degree because it converges slowly and easily falls into local minima.
2.2. Research on the Fuzzy c-Means Clustering Algorithm
Clustering is one of the most basic activities in human understanding of the world. Its purpose is to make things of the same class as similar as possible and things of different classes as different as possible. According to the values that the membership degree can take, clustering methods are divided into hard clustering and fuzzy clustering. In hard clustering, a membership degree of 0 means that the sample does not belong to the category and 1 means that it does. Fuzzy clustering combines fuzzy theory with cluster analysis. The fuzzy clustering algorithm proposed by Dunn and extended by Bezdek is the best known and most frequently used method. Other major algorithms include the transitive closure method based on fuzzy equivalence relations, methods based on similarity relations and fuzzy relations, and the maximum tree method based on fuzzy graph theory. However, these methods have high computational complexity and are not suitable for large datasets; therefore, research on them has gradually declined. Fuzzy clustering algorithms based on an objective function are widely studied. The within-groups sum of squared errors is used to construct the objective function. This approach is simple, effective, and supported by classical theory. The fuzzy c-means clustering algorithm is the most widely used fuzzy clustering algorithm based on an objective function.
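Let $X = \{x_1, x_2, \ldots, x_n\}$ be the sample dataset with $n$ samples, where each sample $x_j$ has $s$ properties, namely, $x_j = (x_{j1}, x_{j2}, \ldots, x_{js})$; then the matrix of the sample dataset can be expressed as follows:
$$X = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1s} \\ x_{21} & x_{22} & \cdots & x_{2s} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{ns} \end{pmatrix}. \tag{1}$$
The objective function of the FCM clustering algorithm is defined as [18–20]
$$J_m(U, V) = \sum_{i=1}^{c} \sum_{j=1}^{n} u_{ij}^{m} d_{ij}^{2}, \tag{2}$$
where $c$ is the number of clustering categories, $2 \le c < n$; $U = (u_{ij})_{c \times n}$ is the matrix of membership degrees, $i = 1, \ldots, c$, $j = 1, \ldots, n$; $V = (v_{ik})_{c \times s}$ is the matrix of cluster centers, $i = 1, \ldots, c$, $k = 1, \ldots, s$; $u_{ij}$ is the membership degree of the $j$th element in the $i$th clustering category; $v_i = (v_{i1}, \ldots, v_{is})$ is the cluster center of the $i$th clustering category, $i = 1, \ldots, c$; $m > 1$ is the weighting exponent, for which $m = 2$ is a common empirical value; $d_{ij} = \lVert x_j - v_i \rVert$ is the Euclidean distance between sample $x_j$ and cluster center $v_i$.
The basic idea of the FCM algorithm is to find the fuzzy matrix of membership degrees and the cluster centers that minimize the objective function.
The membership degree function satisfies the following equations:
$$\sum_{i=1}^{c} u_{ij} = 1, \quad 0 \le u_{ij} \le 1, \quad 0 < \sum_{j=1}^{n} u_{ij} < n. \tag{3}$$
According to the Lagrange multiplier optimization algorithm, the update formulas of the membership degree and the cluster center are expressed as
$$u_{ij} = \frac{1}{\sum_{k=1}^{c} \left( d_{ij} / d_{kj} \right)^{2/(m-1)}}, \tag{4}$$
$$v_i = \frac{\sum_{j=1}^{n} u_{ij}^{m} x_j}{\sum_{j=1}^{n} u_{ij}^{m}}. \tag{5}$$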
FCM algorithm is described as follows.
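Step 1. Give the number of clustering categories $c$. Set the maximum number of iterations, the weighting exponent $m$, and the stop parameter $\varepsilon$.
Step 2. Randomly select the initial cluster centers $v_i^{(0)}$, $i = 1, \ldots, c$, and set the iteration counter $t = 0$.
Step 3. According to formula (4), update the membership function.
Step 4. According to formula (5), update the cluster centers and denote the updated cluster centers by $V^{(t+1)}$.
Step 5. If $\lVert V^{(t+1)} - V^{(t)} \rVert \ge \varepsilon$, then set $t = t + 1$ and return to Step 3; otherwise, quit the loop and obtain the clustering results.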
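The five steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in this paper: the function name `fcm`, the optional `init` argument, the use of the largest center displacement as the stop criterion, and the toy data are all assumptions made for the example.

```python
import math
import random

def fcm(samples, c, m=2.0, eps=1e-5, max_iter=100, init=None, seed=0):
    """Plain fuzzy c-means; returns (membership matrix u[i][j], centers)."""
    n, s = len(samples), len(samples[0])
    # Step 2: random initial centers unless the caller supplies them.
    centers = [list(v) for v in (init or random.Random(seed).sample(samples, c))]
    u = [[0.0] * n for _ in range(c)]
    for _ in range(max_iter):
        # Step 3: membership update, formula (4).
        for j, x in enumerate(samples):
            d = [math.dist(x, v) for v in centers]
            if 0.0 in d:
                # Sample sits on a center: full membership there, none elsewhere.
                hit = d.index(0.0)
                for i in range(c):
                    u[i][j] = 1.0 if i == hit else 0.0
            else:
                for i in range(c):
                    u[i][j] = 1.0 / sum((d[i] / d[k]) ** (2.0 / (m - 1.0))
                                        for k in range(c))
        # Step 4: center update, formula (5).
        new_centers = []
        for i in range(c):
            w = [u[i][j] ** m for j in range(n)]
            new_centers.append([sum(w[j] * samples[j][k] for j in range(n)) / sum(w)
                                for k in range(s)])
        # Step 5: stop once no center has moved by eps or more (assumed criterion).
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < eps:
            break
    return u, centers

# Hypothetical toy data: two well-separated groups of three points each.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
u, centers = fcm(data, c=2, init=[(0, 0), (10, 10)])
```

With the initial centers fixed as in the last line, the run is deterministic; omitting `init` reproduces the random selection of Step 2, in which case the result may depend on the seed.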
Although the FCM clustering algorithm is a classical algorithm that is widely applied in a variety of key areas [21–23], it has shortcomings. On the one hand, the initial cluster centers are randomly selected, so the algorithm may converge to a local optimum rather than the global optimal solution; on the other hand, the objective function of the FCM algorithm considers only the distance between sample data and cluster centers and ignores the effect of the distance between cluster centers on the solution. To solve these problems, a variety of improved algorithms have been proposed [24–26]. In this paper, an improved FCM algorithm is proposed and applied to red tide prediction.
3. An Improved Fuzzy c-Means Clustering Algorithm
3.1. Selection of the Initial Cluster Centers
The initial cluster centers are selected so that the region within a certain threshold around each center contains as much data as possible. This not only ensures that the clustering algorithm finds the cluster centers in a number of feasible regions, but also effectively reduces the impact of noise and outliers on the objective function.
Let $X$ be the sample dataset, and let a regional minimum threshold and a regional minimum data density be given; according to the principle above, the selection steps of the initial cluster centers are as follows.
Step 1. Calculate the Euclidean distance between any two samples in the dataset to generate the distance matrix. According to the distance matrix, choose the nearest two samples.
Step 2. With the midpoint of the two samples as center and the regional threshold as radius, draw a circular region. If the data density in the region is equal to or greater than the minimum data density, this center is chosen as an initial cluster center; otherwise the region is removed.
Step 3. Choose the nearest two samples among the remaining samples outside the region. Repeat Step 2 until $c$ classes are found. If fewer than $c$ classes are selected, the criteria for the regional threshold and the minimum data density are relaxed.
Figure 1 shows the selection process of the initial cluster centers. According to the distance matrix, the two red samples are the nearest pair among all the samples. However, the data density in their region is small, so the red samples are removed from the sample data. The search for initial cluster centers then continues in the rest of the sample data until $c$ classes are found.
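The selection steps can be sketched as follows. This is only an illustration under stated assumptions: the data density of a region is taken to be the number of samples inside it, a removed region takes its samples out of the pool, and the names `select_initial_centers`, `radius`, and `min_count` are hypothetical. Relaxing the criteria when fewer than the requested number of centers is found is left to the caller.

```python
import math
from itertools import combinations

def select_initial_centers(samples, c, radius, min_count):
    """Return up to c centers taken from dense regions around closest pairs."""
    remaining = list(samples)
    centers = []
    while len(centers) < c and len(remaining) >= 2:
        # Steps 1 and 3: nearest pair among the samples not yet used.
        a, b = min(combinations(remaining, 2),
                   key=lambda pair: math.dist(pair[0], pair[1]))
        mid = tuple((p + q) / 2.0 for p, q in zip(a, b))
        # Step 2: samples inside the circular region around the midpoint.
        region = [x for x in remaining if math.dist(x, mid) <= radius]
        if len(region) >= min_count:
            centers.append(mid)
        # Accepted or not, the region (and the pair itself) leaves the pool.
        remaining = [x for x in remaining if x not in region and x not in (a, b)]
    return centers

# Hypothetical data: an isolated close pair plus two dense groups.
data = [(5.0, 20.0), (5.0, 20.5),
        (0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
        (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
centers = select_initial_centers(data, c=2, radius=1.5, min_count=3)
```

In this toy run the isolated pair is the closest pair overall, but its region holds only two samples and is discarded, which mirrors the situation described for the red samples in Figure 1.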
According to formula (4), the membership degree of the $j$th element in the $i$th clustering category is determined by the relative ratio of the distances $d_{ij}$ and $d_{kj}$, so the real distance is not reflected in the solution of the membership function.
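A small numerical check makes this point concrete. The sketch below assumes the standard FCM membership update with weighting exponent 2; the function name `fcm_membership` and the coordinates are hypothetical, and samples that coincide with a center are not handled.

```python
import math

def fcm_membership(x, centers, m=2.0):
    """Standard FCM membership of one sample in every cluster."""
    d = [math.dist(x, v) for v in centers]
    return [1.0 / sum((d[i] / d[k]) ** (2.0 / (m - 1.0)) for k in range(len(d)))
            for i in range(len(d))]

centers = [(0.0, 0.0), (4.0, 0.0)]
near = fcm_membership((2.0, 1.0), centers)    # close to both centers
far = fcm_membership((2.0, 100.0), centers)   # an outlier, far from both
```

Both samples are equidistant from the two centers, so both receive memberships of 0.5 and 0.5, even though one of them lies a hundred units away from the data.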
As shown in Figure 2, the red data point is located on the bisecting line of the two datasets below it. It is obviously difficult to determine the class of the red data point using the traditional FCM membership function. The principle of minimum mean distance is proposed to classify such boundary data: firstly, draw a circular region with the red boundary point as center and a given radius; secondly, for each of the two cluster centers, calculate the mean distance between the points in the circular region and that cluster center; finally, assign the red boundary point to the class for which the mean distance is smaller.
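The three steps of the minimum mean distance principle can be sketched as below. The function name `assign_boundary_point`, the radius, and the coordinates are hypothetical; the fallback to the point itself when the region is empty is an added assumption.

```python
import math

def assign_boundary_point(point, samples, centers, radius):
    """Index of the center with the smallest mean distance to the neighbours."""
    # Samples inside the circular region around the boundary point.
    neighbours = [x for x in samples if math.dist(x, point) <= radius]
    if not neighbours:
        neighbours = [point]  # assumption: fall back to the point itself
    # Mean distance from the neighbours to each cluster center.
    mean_dist = [sum(math.dist(x, v) for x in neighbours) / len(neighbours)
                 for v in centers]
    return mean_dist.index(min(mean_dist))

centers = [(0.0, 0.0), (4.0, 0.0)]
samples = [(1.5, 0.5), (1.2, -0.3), (2.0, 0.0), (2.6, 0.1)]
label = assign_boundary_point((2.0, 0.0), samples, centers, radius=1.0)
```

The point (2.0, 0.0) is equidistant from both centers, but most of its neighbours lie on the side of the first center, so it is assigned to class 0.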
3.2. Improvement of the Objective Function
As the most common analysis method of FCM, the fuzzy clustering algorithm based on a dissimilar objective function considers only the weighted Euclidean distance between sample data and cluster centers, without taking into account the distance between the cluster centers. Therefore, a weighting fuzzy c-means algorithm based on a dissimilar objective function (DWFCM) is proposed, in which the weighted distance between cluster centers is added to the dissimilar objective function to account for the effect of the distance between cluster centers on the clustering results.
The dissimilar objective function with the distance between cluster centers is defined by formula (6), in which $\lVert v_i - v_k \rVert$ is the Euclidean distance between cluster centers $v_i$ and $v_k$, $i, k = 1, \ldots, c$.
According to the Lagrange multiplier optimization algorithm under the constraint conditions of formula (3), the update formulas of cluster center and membership degree are expressed as
Formula (6) adds the effect of the cluster centers on the clustering results. This optimization obtains the minimum weighted Euclidean distance between sample data and cluster centers and the maximum Euclidean distance between cluster centers. However, it does not weight the effect of each distance between cluster centers on the clustering results. Therefore, the DWFCM algorithm adds a feature weight to the Euclidean distance between cluster centers.
In order to obtain the feature weight, the similarity coefficient between the cluster centers should be calculated. Several common methods are as follows.
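(1) The correlation coefficient method is defined as
$$r_{ik} = \frac{\sum_{l=1}^{s} (v_{il} - \bar{v}_i)(v_{kl} - \bar{v}_k)}{\sqrt{\sum_{l=1}^{s} (v_{il} - \bar{v}_i)^2} \, \sqrt{\sum_{l=1}^{s} (v_{kl} - \bar{v}_k)^2}},$$
where $i, k = 1, \ldots, c$, $\bar{v}_i = \frac{1}{s} \sum_{l=1}^{s} v_{il}$, and $\bar{v}_k = \frac{1}{s} \sum_{l=1}^{s} v_{kl}$. $v_{il}$, $v_{kl}$ are the $l$th attributes of the $i$th and $k$th cluster centers.
(2) The least arithmetic average method is defined as
$$r_{ik} = \frac{\sum_{l=1}^{s} \min(v_{il}, v_{kl})}{\frac{1}{2} \sum_{l=1}^{s} (v_{il} + v_{kl})}.$$
(3) The angle cosine method is defined as
$$r_{ik} = \frac{\sum_{l=1}^{s} v_{il} v_{kl}}{\sqrt{\sum_{l=1}^{s} v_{il}^2} \, \sqrt{\sum_{l=1}^{s} v_{kl}^2}}.$$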
The correlation coefficient method is used to calculate the similarity coefficient in the DWFCM algorithm.
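The three similarity coefficients can be compared with a short sketch. It assumes the usual textbook forms of the correlation coefficient, least arithmetic average, and angle cosine methods; the function names and the two example centers are hypothetical, and degenerate inputs (constant or all-zero vectors) are not handled.

```python
import math

def correlation_coefficient(v, w):
    """Pearson correlation between the attributes of two cluster centers."""
    mv, mw = sum(v) / len(v), sum(w) / len(w)
    num = sum((a - mv) * (b - mw) for a, b in zip(v, w))
    den = (math.sqrt(sum((a - mv) ** 2 for a in v))
           * math.sqrt(sum((b - mw) ** 2 for b in w)))
    return num / den

def least_arithmetic_average(v, w):
    """Sum of attribute-wise minima over the sum of attribute-wise means."""
    return sum(min(a, b) for a, b in zip(v, w)) / (0.5 * sum(a + b for a, b in zip(v, w)))

def angle_cosine(v, w):
    """Cosine of the angle between the two attribute vectors."""
    num = sum(a * b for a, b in zip(v, w))
    return num / (math.sqrt(sum(a * a for a in v)) * math.sqrt(sum(b * b for b in w)))

v_i, v_k = (1.0, 2.0, 3.0), (2.0, 4.0, 6.0)
r_corr = correlation_coefficient(v_i, v_k)
r_avg = least_arithmetic_average(v_i, v_k)
r_cos = angle_cosine(v_i, v_k)
```

For these two proportional centers the correlation coefficient and the angle cosine both equal 1 up to rounding, while the least arithmetic average gives 2/3, since it is also sensitive to the difference in magnitude.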
The feature weight is defined as
The objective function of DWFCM is defined as
Considering that the units of the sample data are not unified and the data are incomplete, the data are normalized before the samples are classified. The normalization functions are as follows:
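As an illustration of this preprocessing step, the sketch below rescales every attribute to the range from 0 to 1. Min-max scaling is an assumption chosen for the example and may differ from the normalization functions used in this paper; the function name and the sample values are hypothetical, and missing values are not handled.

```python
def min_max_normalize(samples):
    """Rescale each attribute (column) of the samples to [0, 1]."""
    columns = list(zip(*samples))
    lo = [min(col) for col in columns]
    hi = [max(col) for col in columns]
    # A constant column carries no information and is mapped to 0.0.
    return [tuple((x - l) / (h - l) if h > l else 0.0
                  for x, l, h in zip(row, lo, hi))
            for row in samples]

raw = [(10.0, 200.0), (20.0, 400.0), (30.0, 300.0)]
scaled = min_max_normalize(raw)
```

After scaling, attributes measured in different units contribute on a comparable scale to the Euclidean distances used in clustering.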
3.3. The Description of Improved FCM Algorithm
The steps of improved FCM algorithm are as follows.
Step 1. Set the regional threshold, regional density, regional radius, category number $c$, maximum number of iterations, weighting exponent $m$, and stop parameter $\varepsilon$.
Step 2. According to the regional threshold, regional density, and regional radius, select the initial cluster centers.
Step 3. Calculate the correlation coefficient and feature weight among cluster centers.
Step 5. If the stop criterion is not yet satisfied, increase the iteration counter and return to the update steps; otherwise, quit the loop and obtain the membership function.
Step 6. The samples are classified according to the following method:
$$u_{ij} = \max_{1 \le k \le c} u_{kj}.$$
In this case, the sample $x_j$ is classified into the $i$th class.
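The classification in Step 6 can be sketched as a maximum-membership rule, which is assumed here to be the intended criterion. The function name `classify` and the membership values are hypothetical; rows of `u` are clusters and columns are samples.

```python
def classify(u):
    """Hard label of each sample: the cluster with the largest membership."""
    c, n = len(u), len(u[0])
    return [max(range(c), key=lambda i: u[i][j]) for j in range(n)]

# Hypothetical membership matrix for two clusters and three samples.
u = [[0.9, 0.4, 0.2],
     [0.1, 0.6, 0.8]]
labels = classify(u)
```

Ties are resolved in favor of the lower cluster index in this sketch; boundary samples of that kind are the ones addressed by the minimum mean distance principle.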
4. Red Tide Prediction Analysis Based on Improved FCM Algorithm
Example 1. Changes in marine environment factors accompany a red tide from its onset to its extinction. The original sample set is composed of marine environment factors that have a great influence on red tide. FCM, the possibilistic fuzzy c-means algorithm (PCM), and DWFCM are used to classify the original sample set. The increase of nitrogen concentration is the key factor in the eutrophication of seawater, eutrophication is the primary condition for the occurrence of red tide, and the change of phytoplankton density is an important indicator of its occurrence; therefore, nitrogen concentration (μmol/L) and phytoplankton density ($10^4$ per cubic meter) are chosen as the research factors. The 21 original samples from the State Oceanic Administration are shown in Table 1, where the last sample is used for forecasting and the other samples are used for cluster analysis.
Before clustering, the normalized processing of original samples is completed. If the horizontal axis represents nitrogen concentration and the vertical axis represents the phytoplankton density, the simulation results of three different algorithms are shown in Figures 3, 4, and 5, and the comparison of clustering results is shown in Table 2.
In Figures 3, 4, and 5, the samples where red tide occurs are marked with inverted triangles, the samples where red tide will occur are marked with asterisks, and the samples where red tide does not occur are marked with rhombuses. The red dots represent the cluster centers of the samples. In Table 2, the error score is the number of misclassified samples, and the error rate is the error score as a percentage of the total number of samples. Figures 3, 4, and 5 and Table 2 show that DWFCM performs best among the three algorithms: its selection of cluster centers is the most reasonable, its number of iterations is the smallest, and its accuracy is the highest.
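The two quantities reported in Table 2 can be computed as in the following sketch. The function name and the label lists are hypothetical and do not reproduce the data of this paper; the cluster indices are assumed to have been matched to the true classes beforehand.

```python
def error_score_and_rate(predicted, actual):
    """Number of misclassified samples and its percentage of all samples."""
    score = sum(1 for p, a in zip(predicted, actual) if p != a)
    return score, 100.0 * score / len(actual)

# Hypothetical labels: 1 = red tide occurs, 0 = red tide does not occur.
predicted = [0, 0, 1, 1, 1]
actual = [0, 1, 1, 1, 1]
score, rate = error_score_and_rate(predicted, actual)
```

Here one of the five samples is misclassified, which gives an error score of 1 and an error rate of 20 percent.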
Example 2. In order to demonstrate the superiority of the proposed optimization model more fully, this example uses another original sample set that also has a great influence on red tide. Considering that water temperature is the key factor in the growth speed of plankton and that transparency can be used to evaluate the density of plankton, water temperature (°C) and transparency (m) are chosen as the research factors. The 32 original samples are shown in Table 3, where the last sample is used for forecasting and the other samples are used for cluster analysis. The specific clustering results and the analysis of clustering results are shown in Tables 3 and 4 and Figures 6, 7, and 8.
In the FCM algorithm, the membership degrees of a sample over all categories sum to 1, which makes the FCM algorithm sensitive to noise and outliers. Figure 6 shows that the FCM algorithm assigns some boundary samples to the wrong cluster. In the PCM algorithm, the value of the membership degree reflects the real Euclidean distance between sample data and cluster centers, which gives the PCM algorithm good robustness to noise and outliers. Figure 7 shows that the PCM algorithm assigns boundary samples correctly, but a large number of clusters still overlap. The DWFCM algorithm uses the principles of regional minimum data density and minimum mean distance to select fixed initial cluster centers, which effectively avoids the influence of noise and outliers; it also considers the influence of the weighted cluster center on the objective function, which makes the clustering results more accurate. After the last sample is added to the clustering samples, the DWFCM algorithm assigns it to the red tide cluster, and this sample was indeed recorded when a red tide occurred. The simulation results show that the DWFCM algorithm can effectively predict red tide disasters.
As Figures 6, 7, and 8 and Table 4 show, the FCM and PCM algorithms select the initial cluster centers randomly, so the number of iterations and the clustering results differ considerably from run to run. The DWFCM algorithm, however, selects fixed initial cluster centers before the iteration, so the number of iterations is significantly reduced and the clustering results are consistent.
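The robustness of PCM to outliers can be illustrated with the typicality function commonly used in possibilistic clustering, in which the value depends on the absolute distance to a single center. The sketch assumes that form; the function name and the bandwidth parameter `eta` (set to 1.0 here) are hypothetical, as are the coordinates.

```python
import math

def pcm_typicality(x, center, eta=1.0, m=2.0):
    """Possibilistic typicality of sample x with respect to one center."""
    d2 = math.dist(x, center) ** 2
    return 1.0 / (1.0 + (d2 / eta) ** (1.0 / (m - 1.0)))

centers = [(0.0, 0.0), (4.0, 0.0)]
near = [pcm_typicality((2.0, 1.0), v) for v in centers]    # close to both centers
far = [pcm_typicality((2.0, 100.0), v) for v in centers]   # an outlier
```

The outlier receives a typicality close to zero for both centers, so it has almost no influence on them, whereas a membership constrained to sum to 1 would still assign it 0.5 to each cluster.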
5. Conclusions
In view of the complexity of red tide disasters and the shortcomings of previous prediction algorithms, the DWFCM algorithm is proposed to predict red tide. The initial cluster centers are chosen by the principles of regional minimum data density and minimum mean distance, and the objective function is optimized by using the weighted cluster center. The simulation results show that the DWFCM algorithm has better denoising ability, improves the prediction model of red tide disasters, and gives more accurate predictions. However, the DWFCM algorithm introduces many parameters, and many experiments are needed to obtain accurate parameter values. Therefore, future work will focus on combining the DWFCM algorithm with other algorithms to overcome this defect.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
This work was supported by the Grand Science & Technology Program Shanghai China (no. 14DZ1100700).
W. Hong-Li and F. Jian-li, Ecological Dynamics and Prediction of Red Tide, Tianjin University Press, Tianjin, China, 2006.
S.-Z. Feng, F.-Q. Li, and S.-Q. Li, Introduction to Marine Science, Higher Education Press, Beijing, China, 1999.
J. F. Zhang, Y. Bai, J. Yu et al., “Forecast of red tide in the South China Sea by using the variation trend of hydrological and meteorological factors,” Marine Science Bulletin, vol. 8, no. 2, pp. 60–74, 2006.
H.-L. Wang, J.-P. Xiang, and G. Ge, “Multivariate adaptive spline regression prediction of total phytoplankton,” Marine Technology, vol. 25, no. 3, pp. 7–9, 2006.
R. Zhang, H. Yan, and L. P. Du, “Research on prediction of red tide based on fuzzy neural network,” Bulletin of Marine Science, vol. 8, no. 1, pp. 83–91, 2006.
W.-L. Huang and D.-W. Ding, Prediction Mechanism and Technology of Red Tide Disaster, Ocean Press, Beijing, China, 2004.
F. Jin-Qing, “Attention to the complexity of science and research,” Nature Magazine, vol. 24, no. 1, pp. 7–14, 2002.
F. Jian-Feng, Study on the Nonlinear Dynamics of the Planktonic Ecosystem and the Prediction of Red Tide, Engineering Mechanics of Tianjin University, 2005.
D.-M. Guan and X.-W. Zhan, “Red tide disasters in the coastal waters in China and its countermeasures,” Marine Environmental Science, no. 2, pp. 60–63, 2003.
C.-S. Wang, S.-M. Tang, and P.-Q. Song, “Economic loss assessment of red tide disaster in China,” Marine Environmental Science, no. 3, pp. 428–431, 2011.
X.-L. Wang, P.-Y. Sun, and Z.-H. Gao, “The status and progress of the prediction of red tide in China,” Progress in Marine Science, vol. 21, no. 1, pp. 93–98, 2003.
H. L. Wang, J. F. Feng, S. P. Li, and F. Shen, “Statistical analysis and prediction of the concentration of harmful algae in bohai bay,” Transactions of Tianjin University, vol. 11, no. 4, pp. 308–312, 2005.
X. Wei-Yi and Z. De-Di, “Numerical simulation of the process of the red tide in the real sea area,” Oceanologia, vol. 32, no. 6, pp. 598–604, 2001.
N. Yong, The application of data mining to marine environment online monitoring and HVB prediction system [M.S. thesis], Shandong University, Jinan, China, 2008.
H.-K. Xu, J.-H. Chuai, Z.-H. Zhang, and H.-W. Fan, “Data mining of traffic flow in road tunnel,” Journal of Chang'an University, vol. 25, no. 4, pp. 66–69, 2005.
J. Li, X.-B. Gao, and L.-C. Jiao, “New fuzzy clustering algorithm based on feature weighting,” Electronic Journal, vol. 34, no. 1, pp. 89–92, 2006.
J. F. Zhang, Y. Bai, J. Yu et al., “Forecast of red tide in the South China sea by using the variation trend of hydrological and meteorological factors,” Marine Science Bulletin, vol. 8, no. 2, pp. 60–74, 2006.
W. Jin-Pei and S. De-Shan, Modern Data Analysis, Machinery Industry Press, Beijing, China, 2006.