Abstract

A forest fire occurrence prediction model is a useful tool for preventing and extinguishing forest fires, and identifying the drivers of forest fires is important for building a high-precision model. In this paper, we studied the relative influence of different types of factors on forest fire occurrence in forest areas of Jiangxi Province. Three models, i.e., multilayer perceptron (MLP), logistic regression (Logistic), and support vector machine (SVM), were used to predict the occurrence of forest fires. Through modeling and analysis of forest fire data from 2010 to 2016, we found that climatic and topographic factors are influential in the forest fire occurrence model for Jiangxi Province. Subsequently, we established the MLP occurrence model based on the significant factors retained after variable screening. When the three models were compared using ROC curves, MLP achieved an AUC of 0.984, which was higher than that of Logistic (0.933) and SVM (0.974). On the independent validation set for 2017–2018, an accuracy of 91.73% was also achieved. Therefore, the multilayer perceptron is well suited for the prediction of forest fires in Jiangxi Province. Based on the prediction results, a fire risk level map of Jiangxi Province was produced. Finally, we analyzed how the number of forest fires changed with climatic conditions, which can be helpful for forest fire prevention and suppression.

1. Introduction

Forest fire is one of the most dangerous natural disasters facing mankind today, causing great harm to forest resources, human life, property, and the ecological environment [1]. From 2003 to 2018, there were 111,446 forest fires in China, with a total burned area of 3,289,500 hm² nationwide and an average annual burned area of 205,600 hm² [2, 3]. In recent years, due to the influence of climate change and human activities, the forest fire danger period has started earlier and lasted longer, and the number of forest fires and the area affected by them have increased globally [4, 5]. Exploring the occurrence pattern and the spatial and temporal distribution of forest fires is the key to forest disaster prevention and mitigation. The identification of forest fire drivers and the establishment of high-precision forest fire prediction models are of great significance for forest fire prevention.

Predicting the probability of forest fires in different areas in combination with the factors that trigger them has become a popular research topic.

Several statistical methods have been widely used for forest fire modeling, such as binary logistic models [6–8], Poisson regression models [9], and geographically weighted logistic models [10, 11]. In recent years, there has been a clear trend toward applying machine learning algorithms to forest fire occurrence prediction, such as Bayesian methods [12], Random Forest [13, 14], SVM [15, 16], and Augmented Regression Trees [17].

In particular, with the development of computer technology and artificial intelligence, artificial neural networks (ANNs) have received extensive attention, and their fields of application continue to expand. Traditional ANN models with shallow networks, such as multilayer perceptrons (MLPs), have long been applied to the prediction of fires [18]. Compared to traditional multiple linear regression models or parametric regression models, multilayer perceptrons have stronger nonlinear mapping capabilities and also benefit from self-learning and adaptive capabilities. The MLP model is robust and fault-tolerant because information is stored in a distributed way across the neurons of the network, and its parallel processing makes the computation fast. MLP has also performed well in other environmental forecasting tasks, such as precipitation [19], groundwater [20, 21], and drought forecasting [21, 22].

To predict the probability of forest fires accurately, scholars have made many efforts to find the main drivers of forest fires. Previous studies have shown that forest fire is the result of the interaction of multiple factors [23–25]. Meteorological factors [26], topography [27, 28], human activities [11, 29], vegetation cover [30, 31], and infrastructure [32, 33] all influence the occurrence of forest fires. The influence of climate or fuel conditions and their combinations on forest fire is complex, and different types of forest fire generally differ in their drivers and potential mechanisms; e.g., cultural fire disasters are common in northern China and industrial fire disasters are common in subtropical counties [34]. The relationship between forest fire drivers and occurrence in northern China and subtropical forest ecosystems has also been found to be nonstationary [35]. However, combining multiple forest fire factors with MLP to predict forest fires in China has not yet been studied thoroughly. Previous prediction studies have also seldom used feature selection; here, we use the ability of the MLP hidden layers to extract deep features to select important forest fire drivers and thereby improve the accuracy of the prediction model.

Based on the above considerations, this paper explores the suitability of the MLP model for forest fire forecasting in the forest areas of Jiangxi Province. Using a combination of weather, terrain, human socioeconomic, vegetation, and infrastructure factors, we developed a multilayer perceptron model of forest fire occurrence in Jiangxi Province and compared it with the commonly used Logistic and SVM models. The maps produced with the developed method can help in planning fire suppression strategies, establishing key fire zones, and controlling the scale of forest fires.

2. Materials and Methods

2.1. Study Area

Jiangxi Province is located in southeastern China (24°29′–30°04′ N, 113°34′–118°28′ E) and has a total land area of 166,900 square kilometers. It is one of the four largest forest areas in China, with a forest cover of 61.16% and 607.96 million cubic meters of standing timber in 2018. Jiangxi Province has a subtropical monsoon climate. The average annual rainfall is 1,670 mm, the average annual temperature is 17°C, and the annual precipitation is 1,700–1,900 mm. More than 78% of the province is mountainous and hilly. Most of the province’s forests are natural secondary forests, with a large proportion of coniferous forests, and the top vegetation type of the zone is subtropical evergreen broad-leaved forest. Figure 1 shows an overview of the study area.

2.2. Data Resources
2.2.1. Fire Point Data

The fire point data used in this study were obtained from the 2010–2018 national satellite monitoring hotspot data provided by the Department of Fire Prevention and Control Management, Ministry of National Emergency Management, China. The data had been preliminarily processed so that all records are fire points. Each record includes nine elements: image time, longitude, latitude, land type, albedo 1, albedo 2, brightness temperature 3, brightness temperature 4, and brightness temperature 5. Records with a land type of forest land were selected as forest fire points.

In this study, each image element (pixel) represents a fire point. Nonfire points were generated as random points in ArcGIS at a ratio of 1 : 1.4 (fire points to nonfire points), and together the two sets constitute the sample points. To ensure that the random points fall on forest land, the extent of forest land was extracted from the 2015 national land use data (data source: Geographic State Monitoring Cloud Platform (https://www.dsac.cn/)), and random points were created within that extent. Because the random points need to be matched with daily meteorological data, a date attribute was also added to each random point after creation, so that the spatial and temporal extents of the random points are consistent with those of the fire points. Finally, the numbers of fire points and control points in the sample set are 14298 and 20141, respectively.

2.2.2. Climate Elements

In this study, daily meteorological data were extracted from 22 national meteorological stations in the study area; the data were provided by the China Meteorological Information Sharing Network (https://cdc.cma.gov.cn/). The meteorological dataset contains eleven climate factors (as shown in Table 1). The daily climate variables for each fire point and control point were taken from the weather station closest to that point.
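The nearest-station matching described above can be sketched as follows. This is a minimal illustration that assumes great-circle (haversine) distance is an acceptable measure of "closest"; the function names and the station coordinates are hypothetical and are not the actual station list used in the study.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))

def nearest_station(point, stations):
    """Return the name of the station closest to `point`, given as (lat, lon)."""
    lat, lon = point
    return min(stations, key=lambda name: haversine_km(lat, lon, *stations[name]))

# Hypothetical station coordinates (lat, lon); placeholders, not real station data.
stations = {
    "station_a": (28.6, 115.9),
    "station_b": (25.8, 114.9),
    "station_c": (29.7, 116.0),
}
sample_point = (26.1, 115.2)  # a fire or control point inside the study area
print(nearest_station(sample_point, stations))
```

Each sample point would then inherit the daily weather record of the returned station for its date attribute.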

2.2.3. Topographic Elements

The 30 m resolution digital elevation model (DEM) data were derived from the Geospatial Data Cloud (https://www.gscloud.cn/). Elevation, slope, and aspect values were extracted from the data. The slope directions were classified as flat, north (0–22.5° and 337.5–360°), northeast (22.5–67.5°), east (67.5–112.5°), southeast (112.5–157.5°), south (157.5–202.5°), southwest (202.5–247.5°), west (247.5–292.5°), and northwest (292.5–337.5°).
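The aspect classes listed above can be reproduced with a few lines of code. The sketch below assumes two conventions that the text does not state: flat cells are encoded with a negative aspect value (as in the output of the ArcGIS Aspect tool), and each class includes its lower bound and excludes its upper bound. The function name is illustrative.

```python
def aspect_class(aspect_deg):
    """Map an aspect angle in degrees (clockwise from north) to a direction class."""
    if aspect_deg < 0:
        # Assumed convention: negative values mark flat cells with no aspect.
        return "flat"
    names = ["north", "northeast", "east", "southeast",
             "south", "southwest", "west", "northwest"]
    # Shift by half a sector (22.5 degrees) so that north spans 337.5-360 and 0-22.5.
    index = int(((aspect_deg % 360) + 22.5) // 45) % 8
    return names[index]

for angle in (-1, 10, 100, 292.5, 350):
    print(angle, aspect_class(angle))
```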

2.2.4. Vegetation Elements

Seasonal vegetation indices were obtained from MOD13Q1. NDVI and EVI datasets were obtained through NASA Earthdata Search (https://search.earthdata.nasa.gov/) at a spatial resolution of 250 m.

2.2.5. Infrastructure Data

The basic geographic data were obtained from the 1 : 250,000 national basic geographic database on the National Geographic Information Resource Catalog System website. The distances from each fire site to the nearest railroad, the nearest road, and the nearest river were calculated in ArcGIS 10.2.

2.2.6. Socioeconomic Factors

Socioeconomic data include population density and GDP per capita at 1 km resolution, which were associated with fire points and control points using ArcGIS raster extraction tools.

For detailed information, see Table 1.

2.3. Data Processing
2.3.1. Standardization

The dimensions and units of the forest fire variables are inconsistent. To avoid an imbalance in the contribution of variables caused by large differences in their magnitudes, we normalized the data to the range between 0 and 1 using the following formula:

$$Y_i = \frac{X_i - X_{\min}}{X_{\max} - X_{\min}}$$

where $X_i$ and $Y_i$ denote the values before and after normalization, respectively, $X_{\max}$ denotes the maximum value of the variable, and $X_{\min}$ denotes its minimum value.
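As a small worked illustration of this min-max scaling to the range 0 to 1, the sketch below normalizes one variable at a time. The function name and the handling of a constant column (mapped to zeros) are assumptions made for the example, not details given in the paper.

```python
def min_max_normalize(values):
    """Scale a list of numbers linearly so the minimum maps to 0 and the maximum to 1."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # Assumed handling: a constant variable carries no information, so map it to 0.
        return [0.0 for _ in values]
    return [(v - lowest) / (highest - lowest) for v in values]

# Illustrative values only, e.g. three daily temperatures in degrees Celsius.
print(min_max_normalize([10, 20, 30]))
```

In practice the minimum and maximum would be computed on the training data and reused for the test and validation sets so that all sets share one scale.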

2.4. Methods
2.4.1. Multilayer Perceptron (MLP)

A multilayer perceptron (MLP) is a feedforward artificial neural network model that maps sets of input data onto a set of appropriate outputs. It consists of multiple layers of nodes in a directed graph, with each layer fully connected to the next one. MLP is trained with a supervised learning technique called backpropagation. MLP is a modification of the standard linear perceptron and can distinguish data that are not linearly separable. The topology of the perceptron is shown in Figure 2.
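To make the layer-by-layer mapping concrete, the following sketch runs a single forward pass through a fully connected network with ReLU hidden layers, using the layer widths reported in the Design section (23 inputs; 512, 256, 128, and 64 hidden neurons). It is not the authors' implementation: the random initialization, the single sigmoid output neuron, and all function names are assumptions chosen for illustration, and no training (backpropagation) is shown.

```python
import math
import random

def relu(vector):
    """Element-wise ReLU activation."""
    return [max(0.0, v) for v in vector]

def dense(vector, weights, biases):
    """Fully connected layer; weights[j] holds the incoming weights of neuron j."""
    return [sum(w * x for w, x in zip(row, vector)) + b
            for row, b in zip(weights, biases)]

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def build_mlp(sizes, seed=0):
    """Create randomly initialized (weights, biases) pairs for consecutive layers."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        scale = math.sqrt(2.0 / n_in)  # assumed initialization scale
        weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers

def forward(layers, features):
    """Propagate one sample through the network and return a value in (0, 1)."""
    activation = list(features)
    for weights, biases in layers[:-1]:
        activation = relu(dense(activation, weights, biases))
    weights, biases = layers[-1]
    return sigmoid(dense(activation, weights, biases)[0])

layers = build_mlp([23, 512, 256, 128, 64, 1])
probability = forward(layers, [0.5] * 23)  # one sample with 23 normalized factors
print(probability)
```

With untrained weights the output is meaningless as a fire probability; the point is only to show how each layer is fully connected to the next.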

2.4.2. Design

Before training, we randomly divided the 2010–2016 data set into two parts: a training set (70% of all samples) used for model training and a test set (30% of all samples) used for model validation. Forest fire data from 2017 to 2018 in Jiangxi Province were reserved for independent validation of the model. The 23 forest fire factors in the training set were input into the model. The MLP network used in this study has four hidden layers with 512, 256, 128, and 64 neurons, respectively; this structure was selected using the empirical test suggested in Tien Bui et al., 2016c. The feature vectors obtained from each dense layer were passed through the ReLU nonlinear activation function to the next layer. All models were trained using Adam-SGD with an initial learning rate of 10⁻³, and the learning rate was halved every 20 epochs. The trained MLP model was evaluated on the test set to obtain the original model accuracy.

To explore the significance of each factor in predicting forest fire occurrence, we isolated each of the 23 forest fire factors in turn, retrained the MLP without that factor, and compared the resulting accuracy with that of the original model (the case with no factor isolated). Factors whose isolation produced a large difference in accuracy were taken as the significant forest fire factors. We then removed the insignificant factors from the training set and used the remaining factors as the input layer to fit the final MLP model. Similarly, we used the filtered training set containing the significant factors to train the logistic regression model and the SVM model. The prediction accuracies of the logistic regression, SVM, and MLP models in the study area were compared to select the most suitable model for forest fire occurrence in Jiangxi Province as the optimal model.

The test set was input into the optimal model to obtain the test accuracy for 2010–2016. The optimal model was then used to predict the occurrence of forest fires in 2017 and 2018 to verify the model accuracy. Based on the prediction results of the optimal model, the probability maps of forest fires were drawn.
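Two pieces of this training setup lend themselves to a short sketch: the random 70/30 split and the learning-rate schedule that starts at 10⁻³ and is halved every 20 epochs. The function names, the fixed random seed, and the rounding of the split point are assumptions made for the example.

```python
import random

def split_samples(samples, train_fraction=0.7, seed=0):
    """Shuffle the samples and split them into training and test subsets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

def learning_rate(epoch, initial=1e-3, halve_every=20):
    """Step decay: the rate is halved once every `halve_every` epochs."""
    return initial * 0.5 ** (epoch // halve_every)

train, test = split_samples(range(10))
print(len(train), len(test))
for epoch in (0, 19, 20, 40):
    print(epoch, learning_rate(epoch))
```

Under this schedule, epochs 0 to 19 use 10⁻³, epochs 20 to 39 use 5 × 10⁻⁴, and so on.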

3. Results and Analysis

3.1. Identification of Significant Variables

All factors from the training set were input into the MLP model, and a test accuracy of 94.73% was obtained. To verify the significance of each factor on forest fire prediction, we used the isolated factor method to filter variables and construct a new training set to retrain and test the model to determine the importance of each factor on forest fire occurrence.
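The isolated factor method can be read as a leave-one-factor-out comparison against the full model. The sketch below shows that loop in a model-agnostic form. Here `train_and_score` is a hypothetical callable that trains a model on the given factors and returns its test accuracy, the toy scoring function and factor names are invented for the demonstration, and the threshold on the accuracy drop is an assumption because the paper does not state one.

```python
def isolate_factors(factors, train_and_score):
    """Retrain without each factor in turn and record the drop in accuracy."""
    baseline = train_and_score(factors)
    drops = {}
    for factor in factors:
        reduced = [f for f in factors if f != factor]
        drops[factor] = baseline - train_and_score(reduced)
    return baseline, drops

def significant_factors(drops, threshold=0.0):
    """Keep the factors whose removal lowers accuracy by more than `threshold`."""
    return [factor for factor, drop in drops.items() if drop > threshold]

# Toy stand-in for model training: each factor adds or subtracts a fixed amount.
def toy_score(factors):
    effect = {"temperature": 0.10, "slope": 0.05, "dist_road": -0.02}
    return 0.80 + sum(effect[f] for f in factors)

baseline, drops = isolate_factors(["temperature", "slope", "dist_road"], toy_score)
print(significant_factors(drops))
```

In the toy example, removing `dist_road` raises the score, so it is not retained, which mirrors how the infrastructure factors are treated in Section 3.1.5.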

3.1.1. Climatic Factors

Among the isolated meteorological factors, those whose isolation reduced accuracy relative to the original model were daily mean surface pressure, daily mean surface temperature, daily cumulative precipitation (20:00–20:00, mm), daily mean pressure, daily mean temperature, daily mean wind speed, daily minimum humidity, sunshine duration, daily minimum temperature, and daily minimum wind speed; thus, all of these factors were significant in forest fire occurrence (Tables 2 and 3).

3.1.2. Topographic Factors

The importance of the different topographic factors varied widely. We retrained and tested the model separately after isolating each topographic factor; the testing accuracy is shown in Table 4. Since isolating slope, slope direction, longitude, and latitude led to a decrease in model accuracy, we conclude that these factors have a driving effect on forest fire prediction.

3.1.3. Vegetation Factors

By isolating vegetation factors, we determined that both NDVI and EVI have a significant influence on forest fire occurrence. However, the vegetation factors were less significant than the meteorological and topographic factors (Table 5).

3.1.4. Human Factors

Among the isolated human factors, those whose isolation reduced accuracy relative to the original model were special holidays, GDP, and population density (POP); these were less influential than the meteorological, topographic, and vegetation factors (Table 6).

3.1.5. Infrastructure Factors

After isolating the infrastructure factors, we found that the accuracy increased compared to the original model. Therefore, the distances of fire sites from water, railroads, roads, and residential areas contribute little to predicting forest fire occurrence, and we removed these four factors from the model.

3.2. Comparison of MLP with Logistic and SVM Models

The receiver operating characteristic (ROC) curve is a way to assess the predictive effectiveness of a model, and the predictive accuracy of the model is summarized by the area under the curve (AUC). For a model that performs no worse than chance, the AUC ranges from 0.5 to 1; the higher the value, the better the fit of the model.
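For readers who want to see how an AUC value is obtained from predicted probabilities, the sketch below uses the pairwise interpretation of the AUC: the fraction of (fire, nonfire) pairs in which the fire sample receives the higher score, with ties counted as one half. This is a generic illustration on made-up labels and scores, not the evaluation code used in the study.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a positive sample outranks a negative one."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present to compute the AUC")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))

# Made-up example: 1 = fire point, 0 = control point.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))  # 0.75
```

A score of 0.5 corresponds to random ranking and 1.0 to a perfect separation of fire and control points.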

The binary logistic model is a commonly used model for predicting forest fire occurrence and is a generalized linear regression model. Logistic models are powerful in terms of predictability but are limited by the assumptions of normality and linear relationships.
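For reference, the standard form of the binary logistic model can be written as below. The symbols are introduced here for illustration and do not appear in the original text: $P$ is the probability of fire occurrence, $x_1, \dots, x_k$ are the predictor variables, and $\beta_0, \dots, \beta_k$ are the regression coefficients.

```latex
P(y = 1 \mid x_1, \dots, x_k)
  = \frac{1}{1 + \exp\!\left[-\left(\beta_0 + \sum_{j=1}^{k} \beta_j x_j\right)\right]}
```

The log-odds of fire occurrence are therefore a linear function of the predictors, which is the linearity restriction that the nonlinear models in this comparison do not share.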

The SVM model is a supervised algorithm for classification problems that seeks the separating hyperplane in the feature space that maximizes the margin between the two classes of samples. By introducing a kernel method, it can also be used to solve nonlinear problems [36]. The kernel method projects the data into a new space in which the projected data become linearly separable, which makes SVM well suited to building high-precision models from small samples [37] (Figure 3).

The test set from 2010 to 2016 and the data from 2017 to 2018 were input into the MLP, Logistic, and SVM models, and ROC curves were used to compare the prediction effectiveness of the three methods. As can be seen from Figure 3, the AUC value of the MLP model (0.984) was the closest to 1; the Logistic model, which is routinely used to predict the occurrence of forest fires, was the least effective at 0.933; and the AUC value of the SVM was 0.974, between those of MLP and Logistic. The upper curve encloses a larger area than the lower curves, indicating that the MLP built on the screened factors gives the best fit for predicting forest fire occurrence.

3.3. Multilayer Perceptron Predicts the Probability of Forest Fires

Based on the selected significant factors, the final forest fire occurrence prediction model was trained and the model was validated using a test set. The results are shown in Figure 4.

The MLP model was used to predict the probability of forest fire occurrence from 2010 to 2018. Figure 4 shows that the model predicted forest fire occurrence in the study area with an accuracy of 91% or more in every year, with the lowest accuracies in 2011, 2017, and 2018.

3.4. Forest Fire Risk Level Map

Based on the predicted results, we generated a fire occurrence probability distribution map indicating the relative probability of a forest fire at each location. We divided the fire probability into five classes: less than 0.2 as low risk, 0.2–0.4 as lower risk, 0.4–0.6 as medium risk, 0.6–0.8 as higher risk, and greater than 0.8 as high risk. Figure 5 shows that the high-risk areas are mainly located in central and northern Jiangxi, such as Jiujiang, Jingdezhen, northern Ganzhou, eastern Fuzhou, northern Ji’an, and eastern Shangrao.
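The five-class scheme above amounts to a threshold lookup, sketched below. The text does not say which class a probability exactly equal to a boundary (for example 0.2 or 0.8) belongs to; this sketch assumes that each boundary value falls into the higher class. The names `THRESHOLDS`, `LABELS`, and `risk_level` are illustrative.

```python
from bisect import bisect_right

THRESHOLDS = [0.2, 0.4, 0.6, 0.8]
LABELS = ["low", "lower", "medium", "higher", "high"]

def risk_level(probability):
    """Map a predicted fire probability in [0, 1] to one of five risk classes."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    # bisect_right places a value equal to a threshold into the higher class
    # (an assumed boundary convention).
    return LABELS[bisect_right(THRESHOLDS, probability)]

for p in (0.1, 0.2, 0.5, 0.7, 0.95):
    print(p, risk_level(p))
```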

3.5. Analysis of the Probability of Natural Fires under Weather Changes
3.5.1. Analysis of Annual Fire Frequency under Climate Change

We also analyzed the annual variation in the number of natural fires in relation to the 11 climatic factors from 2010 to 2018. As shown in Figure 6, we computed the annual mean values of the climatic factors (daily average surface temperature, daily average wind speed, daily average air temperature, daily average relative humidity, maximum daily surface temperature, daily maximum air temperature, daily maximum wind speed, daily minimum relative humidity, daily precipitation, and daily average water vapor pressure) and compared them with the annual trend of fire occurrence.

The number of fire points was high in 2011, 2013, and 2014. As seen on the multiyear average meteorological maps, the average surface temperature, daily maximum surface temperature, average daily temperature, and hours of sunshine in these years were generally high compared with other years, while the daily average relative humidity, the daily minimum relative humidity, and the accumulated precipitation (20:00–20:00) were relatively low. Daily mean wind speed, mean station pressure, and maximum wind speed did not vary much; on the annual scale, pressure and wind speed are therefore not the main influencing factors.

3.5.2. Monthly Fire Frequency Analysis under Climate Change

We also analyzed the monthly variation in the number of natural fires in relation to the 11 climatic factors from 2010 to 2018. As shown in Figure 7, we computed the average values of the climatic factors (daily average surface temperature, daily average wind speed, daily average air temperature, daily average relative humidity, maximum daily surface temperature, daily maximum air temperature, daily maximum wind speed, daily minimum relative humidity, and daily average water vapor pressure) and compared them with the monthly cumulative number of fires for the 12 months over 9 years.

The number of fire points is high in December, January, and March. On the monthly average meteorological map, compared with other months, the monthly average air pressure is generally higher in these months, and the daily average relative humidity, daily minimum relative humidity, and monthly cumulative precipitation are relatively low. The average surface temperature, monthly maximum surface temperature, monthly average temperature, monthly cumulative sunshine hours, daily average wind speed, average station pressure, and maximum wind speed did not differ much. On the monthly scale, humidity and precipitation were the main influencing factors. The average air pressure is positively related to the number of fire points, and humidity is inversely related to it.

4. Discussion

We combined multiple forest fire factors in an MLP-based forest fire occurrence prediction model and identified the significant drivers of forest fire occurrence in Jiangxi Province by isolating factors. Testing of the MLP model indicated that meteorological, topographic, vegetation, and human social factors all play a driving role in forest fire occurrence, whereas infrastructure factors are not important for predicting forest fires in Jiangxi Province. These results are consistent with previous studies [32, 38].

Meteorological and topographic factors are the most significant drivers of forest fires in Jiangxi Province, followed by human activity factors. This is consistent with previous studies [39]. Mean surface humidity and precipitation affect the water content of fuels, while high daily surface temperatures raise fuel temperatures and make ignition easier [6, 40]. The significance of latitude and longitude among the topographic factors may be because Jiangxi Province is a hilly area with highly variable topography and spatially uneven topographic conditions; slope and slope direction also have significant effects on forest fire occurrence. In this study, the vegetation factor had the least influence, and we speculate that the greater spatial variability of topographic and anthropogenic factors reduces the explanatory power of vegetation.

Jiangxi Province is rich in forestry resources, and the development of the forestry industry requires human productive activities in the forest; increases in population density and GDP therefore promote the occurrence of forest fires [41, 42].

Previous studies of forest fire occurrence prediction often used linear models. For example, some scholars used Ripley’s K(d) function to analyze the spatial pattern of fire occurrence in two different ecosystems, Daxinganling and Fujian, and used binary logistic models to construct forest fire occurrence models for these two study areas [43]. However, we found that the relationship between fire occurrence probability and forest fire drivers in Jiangxi Province is not linear, so a nonlinear model is more appropriate. Nonlinear mapping is one of the greatest advantages of the SVM model. When fitting the SVM model, a grid search traverses all combinations within the predefined parameter ranges, scores each combination, and selects the best-scoring parameters to optimize the kernel function. During training, the SVM seeks an optimal separating hyperplane that splits the high-dimensional data into two classes, and the large number of vectors involved makes the computation slow and inefficient [44]. Nevertheless, the SVM model generalizes well, and its classification performance and accuracy are better than those of the logistic model.
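The grid search described above can be sketched generically as follows. The paper does not name the SVM hyperparameters that were tuned or the scoring function that was used; the parameter names `C` and `gamma`, the candidate values, and the toy scoring function are all assumptions made for this example, with the toy function standing in for a validation score of a fitted SVM.

```python
import math
from itertools import product

def grid_search(param_grid, score_fn):
    """Score every parameter combination and return the best one with its score."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Toy stand-in for a validation score; by construction it peaks at C=10, gamma=0.01.
def toy_score(params):
    return -(math.log10(params["C"]) - 1) ** 2 - (math.log10(params["gamma"]) + 2) ** 2

grid = {"C": [0.1, 1, 10, 100], "gamma": [0.001, 0.01, 0.1]}
best_params, best_score = grid_search(grid, toy_score)
print(best_params)
```

Because every combination is evaluated, the cost grows with the product of the grid sizes, which adds to the training cost of the SVM noted in the text.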

Among the nonparametric models, MLP outperforms SVM models because deep learning models like MLP, compared to SVM, can learn deeper features after data input while minimizing information loss, and such features are generally more beneficial for classification tasks than features extracted by conventional machine learning [45, 46].

For forest fires in Jiangxi Province, the MLP has a strong prediction ability: the MLP model achieved a prediction accuracy of 93.95% for forest fires in 2010–2016 and 91.73% for 2017–2018. The independently verified accuracy was thus somewhat lower than the test accuracy. We found a sharp increase in the number of forest fire points in 2013 and 2014. The annual weather change map shows that the daily maximum surface temperature, average surface temperature, sunshine duration, average temperature, and daily maximum temperature in Jiangxi Province peaked in 2013 and 2014, and the daily precipitation in these two years was at its lowest level, resulting in low average and minimum humidity, which promoted the forest fires in 2013 and 2014.

The results of this study show that the MLP model predicted the occurrence of forest fires in Jiangxi Province better than the logistic and SVM models. In a previous study, the AUC value of the logistic model in Jiangxi Province was 0.68, which is much lower than that of the model in this study, probably because the number of fire points in this study was 14298, far more than the 1037 fire points of Chen and Feifei [47]. Ngoc Thach used MLP to predict tropical forest fire hazard in the Thuan Chau region, Vietnam, with a prediction accuracy of 83.8%. In contrast, the accuracy of the artificial neural network in this study was 93.46%. Nguyen et al. selected 10 driving factors, while 23 factors, including human activity factors, were selected in this paper; the higher-dimensional data brought more information and improved the accuracy obtained by model training [48].

The MLP model with fast iteration, high computational power, and high prediction accuracy has shown good prediction capability in forest fire studies [48]. However, as a “black box,” this method cannot calculate regression coefficients or confidence intervals. In addition, due to the limited number of sample points in this experiment, a larger sample size would provide more reliable model results.

5. Conclusions

In this study, an MLP model was developed using multiple influencing factors and forest fire sites in Jiangxi Province. After training the model with the driving factors isolated in turn, we concluded that the main driving factors of forest fires in Jiangxi Province were meteorological and topographic factors. In addition, a comparison of the MLP model with the commonly used binary logistic and SVM models using ROC curves showed that the AUC value of MLP (0.984) was higher than that of the logistic model (0.933) and the SVM model (0.974). In terms of accuracy, the prediction accuracy of the MLP model was 93.46%, that of the SVM model was 92.16%, and that of the Logistic model was 85.58%. The experimental results show that the MLP model has strong learning ability, fast calculation speed, and high prediction accuracy, and is therefore more suitable for forest fire occurrence prediction in Jiangxi Province. In the future, it is necessary to expand the study area and add more influencing factors to examine how changes in forest fire drivers affect forest fire occurrence.

Data Availability

The data used to support the study are included within the article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Authors’ Contributions

KG and ZKF conceived and designed the experiments, SW performed the experiments, KG analyzed the data, and KG and ZKF wrote the main manuscript. All authors have read and agreed to the published version of the manuscript.

Acknowledgments

The authors would like to acknowledge support from the Beijing Key Laboratory for Precision Forestry, Beijing Forestry University, as well as all the people who have contributed to this paper. This research was jointly supported by the medium long-term project of “Precision Forestry Key Technology and Equipment Research” (no. 2015ZCQ-LX-01) and the Key R&D Projects in Hainan Province (ZDYF2021SHFZ256) and Natural Science Foundation of Hainan University, grant number KYQD(ZR)21115.