Abstract

The study of forest fire prediction is of great environmental and scientific significance. China’s Guangxi Autonomous Region has a high incidence of forest fires, yet there is little research on forest fires in this area. Applying artificial neural networks and support vector machines (SVM) to forest fire prediction in this area can provide data for forest fire prevention and control in Guangxi. In this paper, based on Guangxi’s 2010–2018 satellite monitoring hotspot data and on meteorological, terrain, vegetation, infrastructure, and socioeconomic data, the researchers determined the main forest fire driving factors in Guangxi. They used feature selection together with a backpropagation (BP) neural network and a radial basis SVM to build forest fire prediction models. Finally, the researchers used accuracy, precision, the area under the receiver operating characteristic curve (ROC-AUC), and other indicators to evaluate the predictive performance of the two models. The results showed that the prediction accuracies of the BP neural network and the SVM were 92.16% and 89.89%, respectively. As both results are over 85%, the requirements for prediction accuracy are met, and the models can be used for forest fire prediction in the Guangxi Autonomous Region. Specifically, the accuracy of the BP neural network was 0.93, which was higher than that of the SVM model (0.89); the recall of the SVM model was 0.84, which was lower than that of the BP neural network model (0.92); and the AUC value of the SVM model was 0.95, which was lower than that of the BP neural network model. The results confirm that the BP neural network model provides higher prediction accuracy than the support vector machine and is therefore more suitable for forest fire prediction in Guangxi, China. This research provides a theoretical basis and data support for forestry applications in the Guangxi Autonomous Region, China.

1. Introduction

Forest fires are one of the important disturbance factors in the global forest ecosystem [1], and they cause various degrees of negative impact on the ecological environment, resources, human health, the economy, and so on [2–8]. On March 30, 2019, there was a forest fire in Muli County, Liangshan Prefecture, Sichuan Province, China. This fire killed 31 people, and the total fire area was about 20 hectares [9]. Since July 2019, high temperatures and drought have caused forest fires in many places in Australia, killing 28 people, and over 10 million hectares of fire damage have been recorded [10]. The severe consequences of frequent forest fires mean that problems in forest fire management must be urgently resolved. Establishing a forest fire prediction model is of great significance for reducing economic losses and casualties. Therefore, both the determination of forest fire driving factors and the establishment of a high-precision forest fire prediction model are highly significant.

A large amount of research has been done on forest fire prediction models. The model most commonly used by scholars is the logistic regression model [11–13]. Liu et al. used meteorological factors to predict the number of forest fires through exponential equations [14]. Scholars have also used zero-inflated models [15–17] to develop fire prediction models. For example, Guo FT (2010) used ordinary least squares (OLS) regression, Poisson, and zero-inflated Poisson (ZIP) models to predict the number of forest fires in the Greater Xing’an Mountains, Heilongjiang Province, China. In the same work, a negative binomial (NB) regression model and a zero-inflated negative binomial (ZINB) model were used to simulate the relationship between forest fires and meteorological factors. Argañaraz et al. (2015) used weighted regression tree models (also known as boosted regression trees) to determine the occurrence of fires.

With the continuous development of artificial intelligence technology, the use of machine learning algorithms to build forest fire prediction models has drawn increasing attention from the scientific community [18–26]. An artificial neural network is a highly nonlinear dynamic system that can approximate and simulate any nonlinear function of nonlinear dynamic phenomena such as forest fires, with strong fault tolerance [27]. Compared with traditional multiple linear regression models or parametric regression models, artificial neural networks have stronger nonlinear mapping capabilities and also benefit from self-learning and adaptive capabilities. Many scholars have done relevant research on specific areas. For example, Alonso et al. [28] introduced an intelligent system for forest fire risk prediction based on a multilayer perceptron network. Sakr et al. [29] used two meteorological parameters, relative humidity and accumulated precipitation, to predict whether forest fires would occur in developing countries using both SVM and ANN. Hu [30] took the forest fire situation in Guilin and Guangzhou as examples and established a BP neural network prediction model with high accuracy. Wang et al. [31] used particle swarm optimization to optimize a BP neural network for fitting and prediction on the forest farms of Nanjing Forestry University. Support vector machines are widely used in business and academia as a method of data mining and knowledge discovery [29, 32–35]. In comparison with existing statistical methods, SVM can quickly and efficiently implement “transduction inference” from both training and prediction samples using a simple and robust algorithm. The support vector machine method is rarely used in forest fire prediction. Scholars engaged in related research mainly include Cortez and Morais [36] and Al_Janabi et al. [20]. Cortez and Morais used five different data mining (DM) algorithms to predict burned areas in the northeast of Portugal and showed the significance of SVM prediction in this field of research. Xu [19] used SVM and semidefinite programming to select the optimal kernel function of the support vector machine and establish an SVM model for forest fire prediction. Al_Janabi et al. [20] used five different soft computing (SC) techniques, including the SVM algorithm, to predict burned areas and determined that SVM algorithms can provide accurate predictions.

China’s Guangxi Autonomous Region is an area known for a high incidence of forest fires. In the past, He and Lu [37, 38] used empirical methods or established linear regression models [39] to study the relationship between forest fire occurrence patterns and driving factors in Guangxi. However, because the relationship between the occurrence of forest fires and the influencing factors is complex and nonlinear, no one has yet established a suitable and accurate prediction model for the area. Therefore, based on factors such as weather and topography, this study uses Matlab and other software to establish both a neural network model and a support vector machine model for forest fires in the Guangxi Zhuang Autonomous Region and calculates the prediction accuracy of each model. Through a comparative analysis of the model fitting results, this study judges the suitability of the two algorithms for forest fire prediction in the Guangxi forest area.

2. Materials and Methods

2.1. Study Area

Guangxi Zhuang Autonomous Region is located in South China, between 21° 54′–26° 24′N and 104° 28′–112° 04′E. The land area is 237,600 square kilometers, and the sea area is about 40,000 square kilometers. Guangxi belongs to the subtropical monsoon climate zone, with a warm climate, abundant rain, and sufficient light. In summer, sunshine hours are long, the temperature is high, and precipitation is high; in winter, sunshine hours are short, and the weather is dry and warm. The region has an annual average temperature of 17.5–23.5°C, an average annual rainfall of 841.2–3387.5 mm, and annual sunshine of 1213.0–2135.2 hours. Guangxi is generally a mountainous and hilly landform with a basin-like shape as a whole, and its terrain rises from southeast to northwest [23]. Guangxi is rich in forest resources and is one of the important forest areas in the south of China. Its forest land area is 15.2717 million hectares, and its forest volume reaches 602 million cubic meters, ranking seventh in the country. The dominant tree species in Guangxi are Taxodium, eucalyptus, and pine. Guangxi is a high-incidence area of forest fires. According to incomplete statistics, there were 19,379 forest fires in Guangxi from 2001 to 2018, an average of 1,077 forest fires per year, making it an area with severe forest fire hazards in China. Figure 1 shows an overview of the study area.

2.2. Data Resources

The data in this article comprise six parts: fire data, meteorological data, terrain data, vegetation data, infrastructure data, and socioeconomic data. The fire point data come from the Fire Prevention and Management Department of the Ministry of Emergency Management of China and include hotspot data (longitude, latitude, and image time) from satellite monitoring in the Guangxi Autonomous Region from 2010 to 2018. The meteorological data are derived from the China Meteorological Data Network as daily-value data sets covering a total of eight years, which include eight factors such as the pressure, temperature, relative humidity, and precipitation at each station. DEM data are obtained through the geospatial data cloud website. The vegetation factor is represented by the normalized difference vegetation index (NDVI), and the monthly vegetation index comes from the resource environment data cloud platform. The basic geographic data come from the 1:250,000 National Basic Geographic Database on the National Geographic Information Resource Catalog System website. These data include points, lines, and areas of railways, highways, water systems, and residential areas. Socioeconomic data include population density and GDP per capita; the kilometer-grid spatial distribution data for population and GDP were obtained from population and environmental data cloud platforms.

2.3. Data Processing
2.3.1. Variable Processing

The dependent variable in the study is a binary variable, so ArcGIS is used to create a certain percentage of random points (i.e., nonfire points), and fire points are assigned the value 1 and nonfire points the value 0 [40]. To ensure that the data are not excessively scattered, random points are selected at a 1:1 ratio according to experience [41], and in principle, double randomness in space and time should be followed [42]. In this study, slope, aspect, and special holidays were set as categorical variables, and the rest were continuous variables. The classifications of slope and aspect are shown in Tables 1 and 2, respectively. Because paper is traditionally burned during Chinese festivals to commemorate lost loved ones, Chinese New Year’s Eve, the first to fourth days of the first lunar month, the fifteenth day of the first lunar month, the Qingming Festival, and the Zhongyuan Festival (July 15th) are designated as special holidays and coded as 1; all other days are coded as 0. The random points in this study were created with ArcGIS software based on the 2015 national land use data. After removing random points that fell on waters, urban land, and the ocean, the number of fire points and random points was 13,178. After processing the data, a total of 26 independent variables and their states were obtained, as shown in Table 3.

The data in this article contain a total of 26,355 sample points from 2010 to 2018. First of all, we need to clean these data, that is, to remove abnormal samples in the original data set (including some samples with missing data and samples with observations that obviously exceed the normal range) and add target category labels. The 26 independent variables were then preprocessed.

2.3.2. Normalization

The forest fire driving factors have different dimensions (units) and different orders of magnitude. To eliminate the effect of dimension, avoid large differences in the magnitude of the input and output data, and balance the contributions of the various factors, we normalize the data and convert all values into numbers in [0, 1]. The function form is as follows:

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}, \tag{1}$$

where $x$ and $x'$ are the values before and after data normalization and $x_{\max}$ and $x_{\min}$ are the maximum and minimum values of the full sample data, respectively [43].
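As a minimal sketch of the min-max normalization described above, the following Python function rescales one variable into [0, 1]. The function name and the sample temperature values are illustrative assumptions and do not come from the study's data.

```python
def min_max_normalize(column):
    """Scale a list of numbers linearly into [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant column carries no information; map it to 0.0 by convention.
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


# Hypothetical daily mean temperatures (degrees C), for illustration only.
temperatures = [12.0, 18.5, 23.5, 30.0]
scaled = min_max_normalize(temperatures)
```

The minimum and maximum are taken over the full sample, as in the text; if they were instead taken from the training set only, test values could fall slightly outside [0, 1].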

2.4. Research Method
2.4.1. Relief Algorithm

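The Relief algorithm is an efficient feature weighting algorithm that simplifies feature sets and is often applied to binary classification problems. The main idea of the algorithm is to weight the features according to the correlation of each feature with the category; the weight is calculated from the ability of the feature to distinguish between samples of the two classes within a neighborhood. Features with weights below a preset threshold are then removed, which yields the optimal feature subset [44]. The algorithm proceeds as follows: a sample $R$ is selected at random from the training set; its $k$ nearest neighbors are found among the samples of the same class as $R$, and $k$ nearest neighbors are then found among the samples of each different class; finally, the weight of each feature $A$ is updated according to equation (2) [45]:

$$W(A) = W(A) - \sum_{j=1}^{k} \frac{\operatorname{diff}(A, R, H_j)}{mk} + \sum_{C \neq \operatorname{class}(R)} \left[ \frac{p(C)}{1 - p(\operatorname{class}(R))} \sum_{j=1}^{k} \frac{\operatorname{diff}(A, R, M_j(C))}{mk} \right], \tag{2}$$

where $m$ is the number of sampling iterations, $H_j$ is the $j$-th nearest neighbor of $R$ in its own class, $p(C)$ is the class ratio of class $C$, $p(\operatorname{class}(R))$ is the class ratio of the class of the randomly selected sample, $\operatorname{diff}(A, R_1, R_2)$ represents the distance between samples $R_1$ and $R_2$ on feature $A$, and $M_j(C)$ is the $j$-th nearest neighbor sample in class $C$, as shown in the following equation:

$$\operatorname{diff}(A, R_1, R_2) = \begin{cases} \dfrac{|R_1[A] - R_2[A]|}{\max(A) - \min(A)}, & A \text{ continuous}, \\ 0, & A \text{ discrete and } R_1[A] = R_2[A], \\ 1, & A \text{ discrete and } R_1[A] \neq R_2[A]. \end{cases} \tag{3}$$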
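The weighting idea can be sketched in a few lines of Python. This is an illustrative simplification under stated assumptions, not the implementation used in the study: it handles two classes only (so the class-ratio factor equals 1), assumes continuous features already scaled to [0, 1] (so the per-feature difference is just an absolute difference), and uses synthetic toy data in which feature 0 separates the classes and feature 1 is noise. The names `relief_weights` and `make_toy_data` are hypothetical.

```python
import random


def relief_weights(X, y, k=3, seed=0):
    """Relief-style weights for two classes; features pre-scaled to [0, 1]."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    m = n  # number of sampling iterations (an arbitrary choice here)
    weights = [0.0] * d

    def distance(a, b):
        # Manhattan distance; each term is the per-feature difference.
        return sum(abs(p - q) for p, q in zip(a, b))

    for _ in range(m):
        r = rng.randrange(n)  # randomly selected sample R
        others = sorted((j for j in range(n) if j != r),
                        key=lambda j: distance(X[r], X[j]))
        hits = [j for j in others if y[j] == y[r]][:k]    # same class
        misses = [j for j in others if y[j] != y[r]][:k]  # other class
        for a in range(d):
            hit_term = sum(abs(X[r][a] - X[j][a]) for j in hits) / (m * k)
            miss_term = sum(abs(X[r][a] - X[j][a]) for j in misses) / (m * k)
            # Reward features that differ across classes, penalize those
            # that differ within a class.
            weights[a] += miss_term - hit_term
    return weights


def make_toy_data(n_per_class=30, seed=1):
    """Feature 0 separates the classes; feature 1 is pure noise."""
    rng = random.Random(seed)
    X, y = [], []
    for label, (lo, hi) in enumerate([(0.0, 0.4), (0.6, 1.0)]):
        for _ in range(n_per_class):
            X.append([rng.uniform(lo, hi), rng.uniform(0.0, 1.0)])
            y.append(label)
    return X, y


X, y = make_toy_data()
weights = relief_weights(X, y)
threshold = 0.005  # the weight threshold reported in Section 3.1
selected = [a for a, w in enumerate(weights) if w >= threshold]
```

On this toy data the informative feature receives a clearly larger weight than the noise feature, which is the behavior the threshold-based selection relies on.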

2.4.2. Backpropagation Neural Network (BPNN)

The BP neural network is a kind of multilayer feedforward neural network. Its main characteristics are forward signal transmission and error backpropagation. In the forward pass, the input signal is processed layer by layer from the input layer through the hidden layer to the output layer. The neuron states of each layer affect only the neuron states of the next layer. If the output layer does not produce the desired output, the network switches to backpropagation and adjusts the network weights and thresholds according to the prediction error, so that the predicted output of the BP neural network continually approaches the expected output. The topology of the BP neural network is shown in Figure 2.

The iterative formula of the BP algorithm is expressed as

$$X_{k+1} = X_k - \alpha_k g_k,$$

where $X_k$ is a matrix of weights and thresholds, $g_k$ represents the gradient of the current performance function, and $\alpha_k$ represents the learning rate. Taking the calculation process of a three-layer BP network as an example, let the input nodes be $x_i$, the hidden nodes $y_j$, and the output nodes $z_l$. The network connection weight between input node $i$ and hidden node $j$ is $w_{ji}$, with hidden-node threshold $\theta_j$, and the network connection weight between hidden node $j$ and output node $l$ is $v_{lj}$, with output-node threshold $\gamma_l$. If the expected value of the output node is $t_l$, the calculation formulas of the model are as follows.

The output of the hidden node is

$$y_j = f\Big(\sum_i w_{ji} x_i - \theta_j\Big).$$

The output of the output node is

$$z_l = f\Big(\sum_j v_{lj} y_j - \gamma_l\Big).$$

The error of the output node is

$$E = \frac{1}{2} \sum_l (t_l - z_l)^2.$$

Then, the following steps are performed:
(1) The error function is differentiated with respect to the output nodes.
(2) The network error function is differentiated with respect to the nodes in the hidden layer.
(3) The thresholds are corrected.
(4) The derivative of the transfer function f(x) is obtained [46].
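The forward pass and the error backpropagation steps can be illustrated with a small, self-contained Python sketch. It is a simplified stand-in rather than the study's Matlab model: it assumes a sigmoid transfer function, plain per-sample gradient descent with a fixed learning rate (no momentum or adaptive rate), thresholds named `theta` (hidden) and `gamma` (output) that are subtracted from the weighted sums, and a synthetic two-input toy problem with one-hot targets. All names and data are hypothetical.

```python
import math
import random


def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))


def forward(x, W, theta, V, gamma):
    """Hidden outputs y_j and output values z_o of a three-layer network."""
    y = [sigmoid(sum(w * xi for w, xi in zip(W[j], x)) - theta[j])
         for j in range(len(W))]
    z = [sigmoid(sum(v * yj for v, yj in zip(V[o], y)) - gamma[o])
         for o in range(len(V))]
    return y, z


def train(samples, n_hidden=4, lr=0.5, epochs=200, seed=0):
    rng = random.Random(seed)
    n_in, n_out = len(samples[0][0]), len(samples[0][1])
    W = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    V = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
    theta, gamma = [0.0] * n_hidden, [0.0] * n_out
    history = []  # mean error E per epoch
    for _ in range(epochs):
        total = 0.0
        for x, t in samples:
            y, z = forward(x, W, theta, V, gamma)
            total += 0.5 * sum((to - zo) ** 2 for to, zo in zip(t, z))
            # Derivative of E with respect to each output node's net input.
            d_out = [(zo - to) * zo * (1.0 - zo) for to, zo in zip(t, z)]
            # Error propagated back to the hidden nodes (uses V before update).
            d_hid = [y[j] * (1.0 - y[j]) *
                     sum(d_out[o] * V[o][j] for o in range(n_out))
                     for j in range(n_hidden)]
            for o in range(n_out):
                for j in range(n_hidden):
                    V[o][j] -= lr * d_out[o] * y[j]
                gamma[o] += lr * d_out[o]  # threshold enters with a minus sign
            for j in range(n_hidden):
                for i in range(n_in):
                    W[j][i] -= lr * d_hid[j] * x[i]
                theta[j] += lr * d_hid[j]
        history.append(total / len(samples))
    return (W, theta, V, gamma), history


def make_samples(n=40, seed=1):
    """Toy data: class is positive when the two inputs sum to more than 1."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        x = [rng.random(), rng.random()]
        positive = x[0] + x[1] > 1.0
        samples.append((x, [1.0, 0.0] if positive else [0.0, 1.0]))
    return samples


params, history = train(make_samples())
```

The list `history` records the mean error per epoch, so comparing its first and last entries shows whether the weight and threshold corrections are reducing the error.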

2.4.3. Support Vector Machine (SVM)

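The support vector machine (SVM) is mainly used for pattern classification and nonlinear regression. It is a general learning algorithm based on the principle of structural risk minimization. Its basic idea is to establish a classification hyperplane as a decision surface so that the margin between positive and negative examples is maximized, which provides good generalization performance. The main reasons for using support vector machines are that they can deal with various nonlinear problems and that kernel functions can be used flexibly to map the data into a high-dimensional space. Taking a binary classification support vector machine as an example, a training set $T = \{(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)\}$ is given, where $y_i \in \{+1, -1\}$ and $x_i \in \mathbb{R}^n$ is a feature vector. The penalty parameter $C > 0$ and the kernel function $K(x, x')$ are selected to construct and solve the optimization problem [47]:

$$\min_{\alpha} \; \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j K(x_i, x_j) - \sum_{i=1}^{N} \alpha_i \quad \text{s.t.} \quad \sum_{i=1}^{N} \alpha_i y_i = 0, \quad 0 \le \alpha_i \le C, \; i = 1, \ldots, N.$$

The optimal solution $\alpha^* = (\alpha_1^*, \ldots, \alpha_N^*)^T$ is obtained. Then, a positive component $\alpha_j^*$ of $\alpha^*$ with $0 < \alpha_j^* < C$ is selected, and the threshold value is calculated from it:

$$b^* = y_j - \sum_{i=1}^{N} \alpha_i^* y_i K(x_i, x_j).$$

Finally, the decision function is constructed:

$$f(x) = \operatorname{sign}\Big(\sum_{i=1}^{N} \alpha_i^* y_i K(x_i, x) + b^*\Big).$$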
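To make the threshold and decision function concrete, the sketch below works through a deliberately tiny case in Python: one training point per class and a radial basis kernel written as exp(-g * ||u - v||^2), with g as the kernel parameter. With only two points the dual problem can be solved by hand, so no optimizer is needed. The data, the value of g, and all function names are illustrative assumptions; a real model such as the one in this study is fitted by a solver (here, LIBSVM).

```python
import math


def rbf_kernel(u, v, g):
    """K(u, v) = exp(-g * ||u - v||^2)."""
    return math.exp(-g * sum((a - b) ** 2 for a, b in zip(u, v)))


def decision(x, support_x, support_y, alpha, b, g):
    """sign(sum_i alpha_i * y_i * K(x_i, x) + b), returned as +1 or -1."""
    s = sum(a * yi * rbf_kernel(xi, x, g)
            for a, yi, xi in zip(alpha, support_y, support_x)) + b
    return 1 if s >= 0 else -1


# Toy training set: one point per class.
support_x = [[1.0], [-1.0]]
support_y = [1, -1]
g = 0.5

# With two points, the constraint sum(alpha_i * y_i) = 0 forces
# alpha_1 = alpha_2 = a, and the dual objective reduces to
# a^2 * (1 - K12) - 2a, which is minimized at a = 1 / (1 - K12).
k12 = rbf_kernel(support_x[0], support_x[1], g)
a = 1.0 / (1.0 - k12)
alpha = [a, a]  # valid as long as a is below the penalty parameter C

# Threshold from the first point: b = y_1 - sum_i alpha_i y_i K(x_i, x_1).
b = support_y[0] - sum(ai * yi * rbf_kernel(xi, support_x[0], g)
                       for ai, yi, xi in zip(alpha, support_y, support_x))

label_right = decision([0.8], support_x, support_y, alpha, b, g)
label_left = decision([-0.3], support_x, support_y, alpha, b, g)
```

By symmetry the threshold comes out as zero, and a query point is assigned to whichever training point it is closer to in kernel terms.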

2.4.4. Multicollinearity Test

Multicollinearity means that there is a high correlation between predictors in a linear regression model [48]. Collinearity among multiple variables is very common, and strong collinearity among the variables may reduce the accuracy of the prediction model. Therefore, before modeling, the variance inflation factor (VIF) was used to test the predictive variables of the model, and variables with significant collinearity were excluded. The remaining variables entered the next variable screening process. The expression of the variance inflation factor is

$$\mathrm{VIF}_i = \frac{1}{1 - R_i^2},$$

where $R_i^2$ is the coefficient of determination obtained by regressing the $i$-th predictor on the remaining predictors.

Generally, VIF = 10 is used as the standard. When VIF > 10, the collinearity between independent variables is significant, and the corresponding independent variables can be eliminated step by step to improve the accuracy of the model. In the experiment, the 26 normalized independent variables were selected for linear modeling. Based on the results of the multicollinearity test, the variables with VIF > 10 were eliminated: Avst, Mast, Mist, Mrh, Ate, Mate, and Mite. The VIF values of the remaining 19 variables are shown in Table 4.
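A VIF screen of this kind can be sketched in plain Python. The sketch relies on a standard identity: the VIF of each predictor equals the corresponding diagonal entry of the inverse of the predictors' correlation matrix. The three synthetic predictors and all function names are assumptions made for illustration; `x2` is constructed as a noisy copy of `x1` so that both should exceed the VIF > 10 criterion.

```python
import random


def correlation_matrix(columns):
    n = len(columns[0])
    means = [sum(c) / n for c in columns]
    centered = [[v - m for v in c] for c, m in zip(columns, means)]
    norms = [sum(v * v for v in c) ** 0.5 for c in centered]
    return [[sum(a * b for a, b in zip(ci, cj)) / (ni * nj)
             for cj, nj in zip(centered, norms)]
            for ci, ni in zip(centered, norms)]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (small matrices only)."""
    n = len(matrix)
    aug = [row[:] + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(columns):
    """VIF of each predictor: diagonal of the inverse correlation matrix."""
    inv = invert(correlation_matrix(columns))
    return [inv[i][i] for i in range(len(columns))]


rng = random.Random(0)
x1 = [rng.random() for _ in range(200)]
x2 = [v + rng.gauss(0.0, 0.03) for v in x1]  # nearly a copy of x1
x3 = [rng.random() for _ in range(200)]      # unrelated predictor
values = vif([x1, x2, x3])
flagged = [i for i, v in enumerate(values) if v > 10]
```

In practice, flagged variables are usually removed one at a time and the VIFs recomputed, since dropping one of a collinear pair lowers the VIF of the other.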

2.4.5. Quality Measures

Confusion Matrix. The confusion matrix, also called the error matrix, is a standard format for expressing accuracy evaluation. It is represented by a matrix of n rows and n columns that compares the predicted classes with the actual classes [49]; its form is shown in Table 5.

Furthermore, classification performance indicators such as accuracy, precision, recall, and f1 value are used to evaluate the model. In the formulas below, TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives in the confusion matrix.
(1) Accuracy: the ratio of the number of correctly classified samples (TP and TN) to the total number of samples. Generally speaking, the higher the accuracy, the better the classifier. The formula is as follows:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}.$$

(2) Precision: the proportion of correct predictions among the instances predicted to be positive:

$$\text{Precision} = \frac{TP}{TP + FP}.$$

(3) Recall (recall rate): the proportion of correct predictions among the instances whose actual label is positive:

$$\text{Recall} = \frac{TP}{TP + FN}.$$

(4) f1 value: a combined measure of precision and recall, defined as their harmonic mean:

$$f1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}.$$
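The four indicators follow directly from the confusion matrix counts, as the short Python sketch below shows. The counts used are hypothetical and are not taken from Table 5 or Table 6.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and f1 from confusion matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts, for illustration only.
metrics = classification_metrics(tp=90, fp=10, fn=20, tn=80)
```

Precision and recall are undefined when their denominators are zero (no predicted or no actual positives), so a production version would need to guard those cases.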

ROC Curve. The receiver operating characteristic (ROC) curve is a method for judging the prediction effect of a model, and the prediction accuracy of the model is determined by the value of the area under the curve (AUC). The area under the curve (AUC) ranges from 0.5 to 1; the larger the value, the higher the degree of fit of the model.
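One way to compute the AUC without drawing the curve is the rank interpretation: the AUC equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative sample, with ties counted as one half. The Python sketch below uses this pairwise definition on four made-up scores; it is quadratic in the number of samples and is meant only as an illustration.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = fire, 0 = nonfire) and predicted scores.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

Here three of the four positive-negative pairs are ordered correctly, giving an AUC of 0.75.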

3. Results

This research used Matlab 2019 for algorithm implementation. Matlab is commercial mathematical software produced by MathWorks in the United States. It is an advanced technical computing language and an interactive environment for algorithm development, data visualization, data analysis, and numerical calculation, and it mainly includes Matlab and Simulink [50]. To evaluate the characteristic factors and model performance based on quality metrics, this study divided the data set into two parts. The first part is a random extract comprising 70% of the preprocessed sample data, which acts as the training set. The remaining 30% of the data acts as the test set.
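A random 70/30 split of this kind can be sketched in Python as follows. The function name, the fixed seed, and the use of integer arithmetic for the test-set size are assumptions made for illustration; the study itself performed the split in Matlab.

```python
import random


def train_test_split(n_samples, test_percent=30, seed=0):
    """Shuffle sample indices and hold out test_percent of them for testing."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = n_samples * test_percent // 100  # integer arithmetic, no rounding surprises
    return indices[n_test:], indices[:n_test]


# 26,355 is the total number of sample points stated in Section 2.3.1.
train_idx, test_idx = train_test_split(26355)
```

With 26,355 samples this yields 18,449 training indices and 7,906 test indices, which agrees with the training-set size and the confusion matrix totals reported later in the paper.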

3.1. Feature Selection

Since the multicollinearity test can only remove strongly collinear features, it cannot by itself effectively reduce the computational cost of the model or its errors. Therefore, this paper uses the Relief algorithm to filter the remaining 19-dimensional data, removing irrelevant features and reducing the dimensionality to improve the running speed of the model and the classification performance of the classifier. The researchers first used the Relief algorithm to calculate the weight coefficient W of each feature with respect to the target category and then arranged the features in descending order of W. Subsequently, the weight threshold was set to 0.005, and features with weights below the threshold were eliminated to obtain an optimal feature subset. Figure 3 shows the 19 features in descending order of the weights generated by the Relief algorithm. In Figure 3, the weight coefficient of sunshine time is the largest, so it has the greatest correlation with the target category. Sunshine time therefore has the greatest impact on the occurrence of forest fires and is taken as the first feature of the final feature subset. In contrast, the weight coefficient of aspect is the smallest, so it has the least correlation with the target category, and this variable is eliminated. Finally, eight features (sunshine hours, average relative humidity, cumulative precipitation at 20-20 hours, average wind speed, maximum wind speed, closest distance from the fire point to the residential area, latitude, and special festivals) were used as inputs to the neural network model and the support vector machine model.

3.2. Applying Predictors
3.2.1. BPNN

In this study, the input layer was set to the eight indicators (Suh, Arh, Pre, Aws, Mws, Set, Lat, and Sfe) obtained after feature selection, and the output was 1 or 0 (indicating whether or not a forest fire occurs). The output and the categorical variable Sfe were processed using one-hot encoding. The number of input neurons was 8, and the number of output neurons was 2. For the number of nodes in the hidden layer, a trial and error method was used to select the number of neurons corresponding to the fewest training iterations and the smallest error, and the optimal number of hidden layer nodes was determined to be 10. Therefore, the topology of the BP neural network in this study was 8 : 10 : 2. The training function was gradient descent with momentum and an adaptive learning rate.

The BP neural network model was trained on the training set (70% of the data) and then used to predict the remaining test set. The comparison between the predicted values and the actual values is shown in Figure 4.

Figure 4 shows that the number of errors between the predicted values and the measured values is small, and the prediction result is good. The results obtained after training are inversely normalized, and the relative errors between the results and the standard results are calculated. The calculated relative error histogram is shown in Figure 5. This figure shows that the relative errors of the training samples and the test samples lie mainly between −0.97 and 0.97, and the errors are relatively small and concentrated. After model training, the accuracy on the test set reached 92.16%, and the mean square error (MSE) was 0.25. Therefore, the model performs well.

3.2.2. SVM

In this study, the SVM was built using the LIBSVM package in Matlab, and the model was constructed with a radial basis function (RBF) kernel for processing nonlinear data. The researchers obtained a model with a radial basis function and its parameters by training the SVM on the training set. The test set was then input for a fit test. Finally, the researchers used the model performance evaluation indices to judge the validity of the model.

The researchers used the grid search method with cross-validation to select the penalty parameter C and the kernel parameter g, where the number of cross-validation folds (k) was set to 10. Figure 6 shows the SVC parameter selection results as a contour map and as a 3D view. After calculation, the accuracy of the grid search method reached 93.42% and the accuracy of cross-validation reached 82.60%.
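The structure of a grid search with k-fold cross-validation can be sketched generically in Python. This is not the LIBSVM procedure used in the study: `score_fn` is a hypothetical callback that would train a model with parameters (C, g) on the training indices and return its accuracy on the validation indices, and `toy_score` is a placeholder, peaked at an arbitrary point of the grid, that exists only so the example runs without a solver.

```python
import itertools
import math


def k_fold_indices(n_samples, k):
    """Yield (train, validation) index lists for k interleaved folds."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i in range(k):
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, folds[i]


def grid_search(score_fn, n_samples, c_values, g_values, k=10):
    """Return (best mean score, C, g) over all parameter combinations."""
    best = None
    for c, g in itertools.product(c_values, g_values):
        scores = [score_fn(c, g, train, val)
                  for train, val in k_fold_indices(n_samples, k)]
        mean_score = sum(scores) / k
        if best is None or mean_score > best[0]:
            best = (mean_score, c, g)
    return best


def toy_score(c, g, train, val):
    # Placeholder for "train on `train`, evaluate on `val`"; it ignores the
    # data and is highest at the arbitrary point C = 2, g = 0.25.
    return -((math.log2(c) - 1) ** 2 + (math.log2(g) + 2) ** 2)


c_values = [2.0 ** e for e in range(-2, 4)]
g_values = [2.0 ** e for e in range(-3, 3)]
best = grid_search(toy_score, 100, c_values, g_values)
```

Searching powers of two, as in this sketch, is a common convention for C and g; the actual search range used in the study is not stated, so the grids here are assumptions.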

It can be seen from Figure 6 that the optimal values of C and g are 0.5 and 9, respectively. After the parameters are set to the optimal values, SVM modeling is performed and the predicted values are obtained. Figure 7 compares the actual and predicted values.

After optimization, the total number of support vectors is 5,166, of which 3,292 are support vectors at the boundary. The number of support vectors indicates whether the model is overlearning. Of the 18,449 samples in the training data, 5,166 (about 28%) are support vectors, which suggests that the model has not overlearned.

After model training, the accuracy of the training set is 94.92%, the accuracy of the test set is 89.89%, and the mean square error (MSE) is 0.35. Therefore, it can be concluded that the model is performing at near optimal levels.

3.3. Performance Analysis

The researchers built confusion matrices from the prediction results of the BP neural network and the support vector machine model. Figure 8 shows the confusion matrix visualization. According to the confusion matrices in Figure 8, the BPNN model correctly predicted 7,286 samples and incorrectly predicted 620, for a final accuracy of 92.16%. The SVM correctly predicted 7,107 samples and incorrectly predicted 799, for an accuracy of 89.89%, so the prediction accuracy of the BP neural network model is higher. This study used the accuracy, precision, recall, f1, and MSE (mean squared error) indicators to analyse and compare the generalization ability of the models in more detail, and used the ROC curve and the AUC value to evaluate their prediction accuracy.
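As a quick check, the two accuracy values follow directly from the reported counts of correct and incorrect predictions, with both models evaluated on the same 7,906 test samples:

```latex
\[
\text{Accuracy}_{\text{BPNN}} = \frac{7286}{7286 + 620} = \frac{7286}{7906} \approx 0.9216,
\qquad
\text{Accuracy}_{\text{SVM}} = \frac{7107}{7107 + 799} = \frac{7107}{7906} \approx 0.8989 .
\]
```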

Table 6 shows the specific values of the five indicators on the test sets, and Figure 9 compares the evaluation indicators of the BPNN and SVM models. Table 6 shows that the mean square error of the BP neural network is 0.1 lower than that of the SVM model. In addition, the accuracy of the BP neural network is 0.93, which is 0.03 higher than that of the SVM model. At the same time, the overall accuracy rate of the BPNN predictions is 0.02 higher than that of the SVM. Figure 9 shows that the BPNN indicators are slightly higher than those of the SVM. The AUC values of both models in Figure 10 are higher than 0.85, indicating that both models have high accuracy in predicting forest fire occurrence in the Guangxi Autonomous Region. In addition, it can be seen from the ROC curves that the BP neural network model predicts better than the SVM model.

4. Discussion

This paper analyzed the factors that affect the occurrence of fires through feature selection. Among the 26 selected forest fire driving factors, the five with the greatest impact are all meteorological factors. Therefore, it can be concluded that meteorological factors are the most important cause of fires in the Guangxi Autonomous Region, China. Meteorological factors have a direct relationship with natural or man-made forest fires. Among meteorological factors, the number of hours of sunshine is the most important. In the daytime, when sunshine is prolonged, the temperature is high and the relative humidity of the air is low, so forest fires are more likely. In the morning, evening, or night, the temperature is low and the relative humidity of the air is high, so forest fires are less likely. At the same time, precipitation is related to humidity and directly affects the occurrence of fires [21, 51]. Human factors also play an important role in the occurrence of forest fires. These factors include the distance from fire points to residential areas and special festivals. Frequent human activities, such as burning paper, wasteland reclamation, or arson, gradually increase the frequency of forest fires [52]. In the selection of forest fire driving factors, this study first applied a multicollinearity test and then performed feature selection. In the collinearity test, a linear regression model is used to eliminate explanatory variables that are highly correlated with one another. However, due to the limited sample size and the limited representativeness of the forest fire drivers, the results may miss some important factors related to forest fires [53]. Therefore, the explanatory variables should be analysed further before elimination in future work, so that factors that are important to the dependent variable are not excluded and can enter the subsequent modeling and classification.
In addition, for feature selection, this study uses the Relief algorithm to assign weights to features [54]; however, this algorithm cannot remove redundant features, so it is necessary to optimize the algorithm or adopt other algorithms in further research. The weight threshold is an empirical value chosen by the researchers, and a more rigorous selection method needs to be discussed and implemented in the future.

Regarding the application of these two models in forest fire prediction, Hong et al. [55] used support vector machine algorithms to predict forest fires in Dayu County, in the southwest of Jiangxi Province, China. Their results show an AUC value of 0.75 for the support vector machine, while the AUC value of the support vector machine model in this experiment is 0.95. This difference may arise because Hong et al. used 184 (fire and nonfire) data points for Jiangxi Province, whereas this study selected 26,355 samples. A larger training sample size provides more accurate and reliable results. Bisquert et al. [56] used artificial neural network models to predict forest fire risk in Galicia (northwestern Spain) and obtained an accuracy rate of 76%. In comparison, the accuracy of the artificial neural network in this study was 92.16%. The forest fire driving factors used by Bisquert et al. included fire history, the enhanced vegetation index (EVI), and ground temperatures for each unit, whereas over twenty factors were used in this study. Multidimensional variables make model training more accurate. Overall, the results of this study have reached a higher accuracy rate. One possible reason is that the researchers comprehensively considered many forest fire driving factors (26 in total), including hours of sunshine, rainfall, and the distance from fire points to residential sites. Second, for these factors, this study used a feature selection algorithm suited to binary classification to determine the weight of each factor, which allows the driving factors that have little correlation with forest fire occurrence to be screened out more rigorously.

Because the data are not linearly separable, the researchers used BP neural network and support vector machine models for classification and prediction. Support vector machines can map input vectors to high-dimensional spaces and construct optimal classification surfaces through kernel functions. In this study, a radial basis function was used, and the grid search method was implemented to traverse all parameter combinations in the search range to optimize the kernel parameters and thereby improve the prediction accuracy. The analysis results showed that the prediction accuracy of the RBF SVM exceeds 85%, which is high. The model has a strong learning ability and good generalization ability, which makes it suitable for predicting the occurrence of forest fires. However, the model has some shortcomings. For example, when the model complexity increases, the calculation speed drops significantly. This is an inherent limitation of support vector machines: solving the convex quadratic programming problem takes a large amount of time, making this method impractical for large-scale sample data sets [19, 57]. It is recommended that future researchers further study and optimize the algorithm for this purpose. In addition, this study used the radial basis kernel SVM for modeling based on previous experience, and the results are promising. However, in subsequent studies, multiple kernels such as polynomial kernel functions should be compared, and the support vector machine model with the best performance should be selected for prediction.

BP neural networks are a type of multilayer feedforward network with one-way signal propagation [58]; they have strong nonlinear mapping abilities and are self-learning and adaptive. There is no definitive method for selecting the number of nodes in the hidden layer, and previous experience generally sets the number of layers to the number of test classifications. In this experiment, the final number of hidden layer nodes was determined from an empirical value combined with trial and error. The optimal number of model iterations was 54, and the computation time for the data sample was 7 seconds. The model is therefore fast to compute and has high prediction accuracy, and together these properties make it well suited to predicting forest fires. However, the initial weights and thresholds of the BP neural network were set randomly in this study. To further improve the prediction accuracy, genetic algorithms or particle swarm optimization could be used in future research to optimize the initial weights and thresholds of the network. Overall, the prediction accuracy of both models is greater than 85%, so both are applicable to forest fire prediction. However, for predicting forest fire occurrence in the Guangxi Autonomous Region, the generalization ability of the BP neural network is slightly higher than that of the support vector machine; hence, the BP neural network model is more suitable for predicting the occurrence of forest fires in the Guangxi Autonomous Region, China.
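As a rough illustration of choosing the hidden layer size from an empirical value plus trial and error, the sketch below generates candidates from the commonly quoted rule of thumb h = sqrt(n + m) + a, with a from 1 to 10, and keeps the candidate with the lowest validation error. The study does not state which empirical value it used, so this rule is an assumption, and toy_validation_error is a hypothetical stand-in for training a BP network of each size and measuring its error.

```python
import math


def candidate_hidden_sizes(n_inputs, n_outputs, a_max=10):
    """Candidates from h = sqrt(n_inputs + n_outputs) + a, a = 1..a_max."""
    base = math.sqrt(n_inputs + n_outputs)
    return [round(base + a) for a in range(1, a_max + 1)]


def pick_hidden_size(candidates, validation_error):
    """Trial and error: evaluate each candidate, keep the lowest error."""
    errors = {h: validation_error(h) for h in candidates}
    best = min(errors, key=errors.get)
    return best, errors


def toy_validation_error(hidden_size):
    # Hypothetical error curve with a minimum at 9 nodes; a real run
    # would train a network of this size and return its validation error.
    return 0.08 + 0.001 * (hidden_size - 9) ** 2


# Eight input variables and one output (fire or nonfire).
candidates = candidate_hidden_sizes(n_inputs=8, n_outputs=1)
best_size, errors = pick_hidden_size(candidates, toy_validation_error)
```

With eight inputs and one output, the rule yields candidate sizes from 4 to 13, which keeps the trial and error search small.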

5. Conclusions

This study first conducted feature selection on the forest fire driving factors. Eight variables were retained: sunshine, average relative humidity, cumulative precipitation from 20:00 to 20:00 (20-20), average wind speed, maximum wind speed, closest distance of fire points to residential areas, latitude, and special festivals. These variables served as the model inputs, and whether a forest fire occurred was the output variable. A multilayer feedforward artificial neural network and a radial basis SVM model were then established to predict forest fires in the Guangxi Autonomous Region, China. On the test set, the backpropagation neural network achieved an accuracy of 92.16% with a mean square error of 0.25, and the SVM model achieved an accuracy of 89.89% with a mean square error of 0.35. Overall, the backpropagation neural network reached an accuracy of 92.16% and an AUC of 0.99, and the support vector machine reached an accuracy of 89.89% and an AUC of 0.95. Both models therefore have high prediction accuracy and are applicable to the Guangxi Autonomous Region, but the BP neural network model performs better than the support vector machine model. BP neural networks have strong self-learning and self-adaptive abilities as well as fast calculation speeds for large samples, which makes the BP neural network the better of the two prediction models in this study. These results can provide a reference for future modeling of Guangxi forest fires.
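For readers who wish to reproduce the evaluation indicators, the following minimal sketch computes accuracy, mean square error, and ROC-AUC for a binary fire/nonfire classifier. It assumes that the model outputs a probability per sample, that accuracy uses a 0.5 threshold, and that the mean square error is taken between probabilities and labels; the study does not specify these details, and the six toy samples are invented for illustration.

```python
def accuracy(y_true, y_prob, threshold=0.5):
    """Share of samples whose thresholded prediction matches the label."""
    predictions = [1 if p >= threshold else 0 for p in y_prob]
    correct = sum(1 for t, p in zip(y_true, predictions) if t == p)
    return correct / len(y_true)


def mean_squared_error(y_true, y_prob):
    """Mean of squared differences between labels and probabilities."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)


def roc_auc(y_true, y_prob):
    """AUC as the chance that a fire sample outranks a nonfire sample.

    Ties between a fire and a nonfire score count as half a win.
    """
    positives = [p for t, p in zip(y_true, y_prob) if t == 1]
    negatives = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy labels (1 = fire, 0 = nonfire) and predicted probabilities.
y_true = [1, 1, 1, 0, 0, 0]
y_prob = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]

acc = accuracy(y_true, y_prob)
mse = mean_squared_error(y_true, y_prob)
auc = roc_auc(y_true, y_prob)
```

The pairwise AUC computation is quadratic in the sample count; for data sets of the size used here, a rank-based formulation would be preferable.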

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The authors would like to acknowledge support from the Beijing Key Laboratory for Precision Forestry, Beijing Forestry University, as well as all the people who have contributed to this paper. This research was jointly supported by the medium long-term project of “Precision Forestry Key Technology and Equipment Research” (no. 2015ZCQ-LX-01) and the National Natural Science Foundation of China (no. U1710123).