Abstract

Forest fires caused by different environmental and human factors are responsible for the extensive destruction of natural and economic resources. Modern machine learning techniques have become popular for developing accurate susceptibility maps of various natural disasters to help reduce the occurrence of such calamities. The present study applies and tests multiple algorithms to map the areas susceptible to wildfire in the Mediterranean Region of Turkey. Specifically, the performance of the XGBoost, CatBoost, Gradient Boost (GBM), AdaBoost, and LightGBM methods for wildfire susceptibility mapping is examined. The results reveal the highest testing accuracy for the CatBoost algorithm (95.47%), followed by LightGBM (94.70%), XGBoost (88.8%), AdaBoost (86.0%), and GBM (84.48%). The resultant wildfire susceptibility maps provide useful inventories for forest engineers, planners, and local governments for future policies regarding disaster management in Turkey.

1. Introduction

Forest fires are critical natural disasters with severe ecological, economic, and social consequences worldwide [1, 2]. In recent years, a significant increase in the frequency of wildfire incidents and the extent of affected areas has been observed, indicating the severity of the problem. According to the European Forest Fire Information System (EFFIS), an area of 831.46 km² in Turkey was affected by wildfires in 2019, nearly double that of 2018, while the figure reached 998.57 km² in 2020 [3]. These statistics show that the area affected by forest fires in Turkey is increasing with time. Research reveals that anthropogenic factors [4, 5] and climate change [6–8] play a critical role in the frequency of wildfire occurrence and the increase in affected areas. Wildfires are not only responsible for the mass destruction of forests but also have several adverse effects on the natural environment, such as increased erosion risk [9–12], poor water quality [9], changes in land use [13], and elimination of wildlife [14]. Nevertheless, many Mediterranean plant species have evolved some form of adaptation mechanism for survival in a fire [15]. For instance, in the Mediterranean Region of Turkey, plants like Pinus brutia have become resistant to fire regimes [16].

The findings of previous literature disclose that multiple environmental and human factors may be responsible for triggering forest fires in the Mediterranean Region [17, 18]. Related studies have used data on various environmental parameters like elevation, aspect, slope, vegetation, temperature, humidity, and wind, along with human parameters like distance to roads, distance to settlement, and population, to identify the causes of wildfires [19–22]. A review of related literature shows that no single model is used consistently for fire risk analysis. Weights and variables in modelling indices may differ for different regions as wildfires have specific characteristics worldwide [23, 24]. According to the relevant literature, many different models have been used in forest fire risk analysis [25, 26]. In various studies, a Geographic Information System (GIS) program is frequently used to process large datasets and produce useful fire susceptibility maps [27–29]. In addition, the use of satellite images in a GIS-based program for forest fire risk analysis has been empirically shown to produce more robust results [30, 31].

Statistical methods such as bivariate and multivariate analysis [32, 33], multiple linear regression [34, 35], and logistic regression [36–39] have been widely used for forest fire modelling. In recent years, machine learning algorithms in forest fire risk analysis have also gained popularity [40–44]. Rodrigues and De La Riva [45] developed random forest (RF), boosting regression trees (BRT), and support vector machine (SVM) algorithms in their study of a region covering almost the entire Spanish peninsula. They concluded that RF achieved the highest performance. Nelson et al. [46] compared CART, BRT, and RF in a study conducted in British Columbia, Canada, and found that the best performing model was BRT, followed by CART and RF. Analysis of related studies reveals that selecting an appropriate model for forest fire risk mapping is challenging as each algorithm’s results vary from region to region [47].

Machine learning techniques have been applied and tested extensively in many empirical studies for developing susceptibility maps and accurate predictions of various natural calamities [48]. For example, Ma et al. [49] used an extreme gradient boosting (XGBoost) method for flash flood risk assessment in Yunnan Province of southwest China. The XGBoost method successfully identified the relationships between selected factors and flash flood events and outperformed the comparative LSSVM_RBF models. A study on landslide susceptibility mapping in Ulus district of Bartın Province in the Western Black Sea Region of Turkey [50] compared four new gradient boosting algorithms named gradient boosting machine (GBM), categorical boosting (CatBoost), extreme gradient boosting (XGBoost), and light gradient boosting machine (LightGBM). The accuracy results revealed the highest predictive capacity of the CatBoost model.

In contrast, the RF method was found to have the lowest predictive ability compared to the ensemble methods. Wu et al. [51] created a map of landslide susceptibility using an alternating decision tree (ADTree) in Longxian County (Shaanxi Province, China). Additionally, they used GIS-based ensemble techniques, including ADTree with bootstrap aggregating (Bagging) and ADTree with adaptive boosting (AdaBoost), alongside the single ADTree model. The outcome showed that the ADTree-AdaBoost model had the best results. For the Three Gorges Reservoir area in China, Chen et al. [52] created a landslide susceptibility map using three advanced machine learning methods: gradient boosting decision tree (GBDT), random forest (RF), and information value (InV) models. Among these compared models, the GBDT method showed the highest accuracy. Can et al. [53] used the extreme gradient boosting (XGBoost) method for landslide susceptibility mapping of the Atatürk Dam upper basin in Turkey. The performance of the XGBoost algorithm was found to be high in various metrics. In the Wanzhou section of the Three Gorges Reservoir area (China), a landslide susceptibility map was developed using the weighted gradient boosting decision tree (weighted GBDT) model [54]. The logistic regression (LR) model and the gradient boosting decision tree (GBDT) model were also used for comparison in the study. The results showed that the weighted GBDT model had the highest accuracy, followed by the GBDT and LR models. However, the weighted GBDT and GBDT models produced very similar results.

Dang et al. [55] used AdaBoost, XGBoost, RF, and multilayer perceptron (MLP) machine learning algorithms to identify commercial buildings at high fire risk in the Humberside area, UK. The results revealed that AdaBoost performed better than the other algorithms. In the Yunnan Province of China, Zhou et al. [56] applied the CatBoost algorithm to estimate the risk of forest fires. The analysis was made using five forest fire risk factors, and the model predictions were found to overlap with the actual fire points. Rosadi and Andriyani [57] compared the AdaBoost algorithm with decision tree and SVM methods to predict the occurrence of forest fires. The study explained that fuzzy c-means clustering and AdaBoost methods provided good results in predicting forest fires. Michael et al. [58] used two satellite-derived measurements (NDVIW and NDVIT) in three machine learning algorithms (LR, RF, and XGBoost) to improve fire risk mapping. The research, conducted in a region of Greece, determined that the XGBoost model produced the best results.

The review of recent studies reveals the application and performance of various algorithms to develop susceptibility maps of different natural disasters. However, most of these machine learning methods are used in landslide susceptibility mapping. A literature gap is observed, as very few studies have applied machine learning techniques to produce wildfire susceptibility maps. In addition, this research is aligned with the United Nations’ 2030 Agenda for Sustainable Development. The present study aims to apply and test various algorithms to map the areas susceptible to wildfire in the Mediterranean Region of Turkey. The formulation of susceptibility maps is essential to locate the areas prone to wildfire and take the necessary measures to avoid any mishaps in the future. The main contribution of this article is to evaluate the performance of XGBoost, CatBoost, Gradient Boost, AdaBoost, and LightGBM methods for wildfire susceptibility mapping. To the best of our knowledge, none of the earlier studies investigated the performance of these models for wildfire susceptibility mapping. Therefore, finding the best performing model for wildfire susceptibility maps will help improve policymaking in the future to reduce the risks of forest fires.

2. Material and Methods

2.1. Selected Study Area

The study area includes five provinces of the Mediterranean Region of Turkey that were declared “disaster areas” by the Disaster and Emergency Management Presidency (AFAD) on July 31, 2021, during the recent wildfire incident in Turkey. These provinces include Muğla, Antalya, Mersin, Adana, and Osmaniye, stretching along the Taurus Mountains and the Mediterranean coastal belt south of Turkey. The study area extends from 36°1′14.884″ N to 38°24′8.919″ N latitude and 27°12′18.892″ E to 36°42′19.828″ E longitude (Figure 1).

The selected area remains under the influence of the Azores High in summer, while, in winter, typical climatic characteristics of the confluence of northern polar air mass and southern tropical air mass dominate the region [59, 60]. The area generally has a typical Mediterranean climate with hot and dry summers and mild, rainy winters. The altitude rises from sea level at the coast to approximately 3500 meters. The annual mean temperature varies from 12°C to 20°C, the average annual precipitation is 400–1200 mm, and the relative humidity averages about 53–69 percent (http://www.mgm.gov.tr). In addition, prevailing dry and continuous north winds are considered highly responsible for forest fires, especially during fire periods. The area has the highest number of endemic plants in Turkey due to the limestone-covered lands and climatic conditions suitable for karstification [61–64].

The selected area extends over 66,014.26 km², of which approximately 54% is covered by forests (Table 1). The area is divided into different ecological regions based on the increase in altitude from the coastline. Some leading plant associations and forests in the study area are Quercus coccifera, Olea europaea, Arbutus andrachne, Laurus nobilis, Ceratonia siliqua, Pinus brutia, Pinus nigra, Abies cilicica subsp. cilicica, Abies cilicica subsp. isaurica, Cedrus libani, Juniperus excelsa, Juniperus foetidissima, Quercus libani, Quercus infectoria, and Quercus cerris. Pinus brutia is a dominant species in the region with high fire sensitivity [65, 66].

2.2. Historical Forest Fires

The preparation of a historical forest fire inventory based on different sources (satellite images, fieldwork, historical archives, etc.) is the first step in modelling the susceptibility of forest fires [41, 67–70]. In this research, 3256 samples of past forest fire events were identified. The historical forest fire dataset was generated using data from NASA’s Fire Information for Resource Management System (FIRMS) (https://earthdata.nasa.gov/firms) and NASA’s Earth Observing System Data and Information System (EOSDIS). The dataset of forest fire events was produced using near real-time (NRT) moderate resolution imaging spectroradiometer (MODIS) thermal anomalies/fire locations with 1 km spatial resolution from the Terra and Aqua platforms. The data belong to MODIS Collection 61 and are referenced to WGS84. The historical scope of the dataset covers the period from April 2021 to August 2021, as the area was declared “Disaster Areas Affecting General Life” by AFAD. According to the dataset, the majority of forest fire events occurred in Muğla in August and Antalya in July. In contrast, almost no forest fire events occurred in Mersin in April and Osmaniye in August, as shown in Figure 2 and listed in Table 2.

2.3. Forest Fire Conditioning Factors

The possibility of a wildfire occurrence depends on the environmental conditions of a forest area. Environmental conditions can be topographical, climatic, vegetation-related, and human-related. These classes are known as wildfire conditioning factors and are essential to generate a final susceptibility map [71]. Thirteen wildfire conditioning factors, including elevation, slope degree, slope aspect, Topographic Wetness Index (TWI), annual mean temperature, annual mean relative humidity, annual mean wind speed, land use, distance from water bodies, distance from residential areas, distance from roads, Normalized Difference Vegetation Index (NDVI), and land surface temperature (LST), were selected and produced in the GIS framework as a geospatial database (Figures 3–5).

The LST is a critical indicator for a wide range of research topics. It represents the interaction of the atmosphere with the surface as well as the energy flow between them. The LST has been calculated using thermal infrared data from polar orbit satellites and geostationary satellites. Various algorithms have been developed to overcome external influences and retrieve LST data with high accuracy. In this research, the LST data from MODIS (moderate resolution imaging spectroradiometer) was calculated based on the generalized split-window (GSW) algorithm of [72]:

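$$T_s = a_0 + \left(a_1 + a_2\frac{1-\varepsilon}{\varepsilon} + a_3\frac{\Delta\varepsilon}{\varepsilon^2}\right)\frac{T_i + T_j}{2} + \left(a_4 + a_5\frac{1-\varepsilon}{\varepsilon} + a_6\frac{\Delta\varepsilon}{\varepsilon^2}\right)\frac{T_i - T_j}{2} \tag{1}$$

In the formula, $T_s$ represents LST; $\varepsilon = (\varepsilon_i + \varepsilon_j)/2$ and $\Delta\varepsilon = \varepsilon_i - \varepsilon_j$, where $\varepsilon_i$ and $\varepsilon_j$ represent land surface emissivities in channels $i$ and $j$; $T_i$ and $T_j$ are top of atmosphere (TOA) brightness temperatures measured in channels $i$ and $j$ for MODIS data; and $a_k$ ($k = 0, \ldots, 6$) are coefficients obtained from simulated data.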
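As a rough illustration of how a generalized split-window retrieval is evaluated, the sketch below implements the commonly published form of the algorithm, in which LST is a linear combination of the mean and the difference of two brightness temperatures, with emissivity-dependent coefficients. The function name, argument names, and the coefficient tuple are assumptions made for this example; the placeholder coefficients are not real MODIS values and only serve as a sanity check.

```python
def gsw_land_surface_temperature(t_i, t_j, eps_i, eps_j, coeffs):
    """Generalized split-window LST estimate (kelvin).

    t_i, t_j: TOA brightness temperatures in the two thermal channels.
    eps_i, eps_j: land surface emissivities in the same channels.
    coeffs: seven coefficients (a0..a6); real values come from simulations
            and are NOT provided here.
    """
    a0, a1, a2, a3, a4, a5, a6 = coeffs
    eps = (eps_i + eps_j) / 2.0      # mean emissivity
    d_eps = eps_i - eps_j            # emissivity difference
    mean_term = a1 + a2 * (1.0 - eps) / eps + a3 * d_eps / eps ** 2
    diff_term = a4 + a5 * (1.0 - eps) / eps + a6 * d_eps / eps ** 2
    return a0 + mean_term * (t_i + t_j) / 2.0 + diff_term * (t_i - t_j) / 2.0


# Placeholder coefficients (hypothetical): only a1 = 1, so the result
# reduces to the mean brightness temperature.
placeholder_coeffs = (0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0)
lst = gsw_land_surface_temperature(300.0, 298.0, 0.98, 0.97, placeholder_coeffs)
```

With the placeholder coefficients the function simply returns the average of the two brightness temperatures, which makes the structure of the formula easy to check before real coefficients are plugged in.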

All the spatial variables are represented based on WGS 1984 Mercator coordinate system. Raw datasets were obtained from various data sources as shown in Table 3.

Initially, the digital elevation model (DEM) was acquired from ASTER (advanced spaceborne thermal emission and reflection radiometer) as the GDEM (global digital elevation model) with a 30 m spatial resolution. Topographical variables such as elevation, slope, aspect, and TWI (topographic wetness index) were generated from DEM. The following equation was used to generate the TWI:

$$\mathrm{TWI} = \ln\left(\frac{A_s}{\tan\beta}\right) \tag{2}$$

In the equation, $A_s$ indicates the specific catchment area (m²/m) and $\beta$ is the slope angle in degrees.
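The TWI is conventionally defined as the natural logarithm of the specific catchment area divided by the tangent of the slope angle. The following minimal sketch evaluates that definition for a single raster cell; the function name, the example values, and the small slope floor used to avoid division by zero on flat cells are assumptions for illustration and are not taken from the study.

```python
import math


def topographic_wetness_index(specific_catchment_area, slope_degrees,
                              min_slope_degrees=0.1):
    """TWI = ln(As / tan(beta)) for one raster cell.

    specific_catchment_area: upslope area per unit contour width (m^2/m).
    slope_degrees: local slope angle in degrees.
    min_slope_degrees: hypothetical floor that keeps tan(beta) > 0 on flat cells.
    """
    beta = math.radians(max(slope_degrees, min_slope_degrees))
    return math.log(specific_catchment_area / math.tan(beta))


# Hypothetical cells: a gentle valley floor collects more water than a steep ridge.
flat_valley = topographic_wetness_index(5000.0, 2.0)
steep_ridge = topographic_wetness_index(50.0, 35.0)
```

Cells with large contributing areas and gentle slopes receive high TWI values, which is why the index is used as a proxy for soil moisture in susceptibility studies.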

Long-term observations of climatic elements were obtained from the MGM (Directorate General of Meteorology) for each meteorological station. The selected climatic variables (annual mean temperature, annual mean relative humidity, and annual mean wind speed) were joined to station locations in the GIS environment. Afterwards, raster layers of these variables were produced using the IDW (inverse distance weighted) interpolation method. Vector data were acquired from the Turkish Ministry of Agriculture and Forestry to generate the land use, distance from water bodies, and distance from residential areas variables. These data were processed in GIS with tools such as rasterization for land use and Euclidean distance for the proximity to water bodies and settlements. The road network data were downloaded from OpenStreetMap (OSM) to produce distances from roads using the Euclidean distance tool in the GIS. All of these variables were represented at the same spatial resolution (30 m). In contrast, the spatial variables obtained from MODIS had different spatial resolutions: NDVI at 250 m and LST at 1 km.
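To make the IDW step concrete, the sketch below estimates a value at an unsampled location as a weighted average of station values, with weights proportional to the inverse of the distance raised to a power. The station coordinates, the temperature values, and the default power of 2 are hypothetical; the study does not report the settings that were used in the GIS.

```python
import math


def idw_interpolate(stations, x, y, power=2.0):
    """Inverse distance weighted estimate at (x, y).

    stations: list of (station_x, station_y, value) tuples.
    power: distance exponent; 2 is a common default, assumed here.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for sx, sy, value in stations:
        distance = math.hypot(x - sx, y - sy)
        if distance == 0.0:
            return value  # the query point coincides with a station
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical stations: (x in km, y in km, annual mean temperature in deg C).
stations = [(0.0, 0.0, 18.0), (10.0, 0.0, 14.0), (0.0, 10.0, 16.0)]
midpoint = idw_interpolate(stations, 5.0, 0.0)
```

Because the estimate is always a convex combination of the station values, IDW cannot produce values outside the observed range, which is one reason it ignores effects such as elevation between stations.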

2.4. Boosting Algorithms

(1) Gradient Boosting Machine (GBM). Gradient boosting machines (GBMs) are machine learning algorithms that have shown significant success in many research areas. They formulate boosting as gradient descent on a loss function, which gives the boosting methods a statistical foundation; the resulting family of algorithms was named gradient boosting machines [73, 74]. In GBMs, learning proceeds by sequentially fitting new models so that the estimate of the response variable becomes more accurate at each step. The main goal of the algorithm is to construct new base learners that are maximally correlated with the negative gradient of the loss function of the whole ensemble. GBMs have a significant record of success both in practical applications and in various machine learning and data mining challenges [75].
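The sequential fitting described above can be sketched in a few lines. The example below is a toy, assuming squared-error loss (for which the negative gradient is simply the residual), one input feature, and decision stumps as base learners; the data, the learning rate, and the number of rounds are illustrative and unrelated to the settings used in the study.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression stump that minimizes squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # keep both sides non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def gradient_boost(xs, ys, n_rounds=20, learning_rate=0.3):
    """Boost stumps on the negative gradient of the squared-error loss."""
    base = sum(ys) / len(ys)
    learners = []
    predictions = [base] * len(ys)
    for _ in range(n_rounds):
        # For 0.5 * (y - F)^2 the negative gradient with respect to F is y - F.
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        learners.append(stump)
        predictions = [p + learning_rate * stump(x)
                       for x, p in zip(xs, predictions)]
    return lambda x: base + learning_rate * sum(s(x) for s in learners)


# Toy data: the target switches from 0 to 1 between x = 4 and x = 5.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
model = gradient_boost(xs, ys)
```

Each round shrinks the remaining residual by a constant factor on this toy problem, which shows how the learning rate trades the number of rounds against the size of each correction.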

(2) Extreme Gradient Boosting (XGBoost). XGBoost is a gradient boosting algorithm that combines the predictions of weak learners to obtain a stronger learner. It has been used frequently by data scientists in recent research because of its good results [76]. In the XGBoost algorithm, CART acts as the base classifier. The trees are built sequentially: the input to each decision tree depends on the training and prediction results of the previous tree, and the final prediction is decided jointly by all the trees. It is a flexible algorithm that can solve both regression and classification problems [49].
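One detail that distinguishes XGBoost from a plain GBM is how it scores candidate splits: each tree node is summarized by the sums of first and second derivatives of the loss, and a split is kept only if the regularized gain is positive. The sketch below follows the gain expression given in the original XGBoost paper by Chen and Guestrin; the function name and the example numbers (log loss gradients at an initial probability of 0.5) are assumptions for illustration.

```python
def split_gain(grad_left, hess_left, grad_right, hess_right,
               reg_lambda=1.0, gamma=0.0):
    """Regularized gain of splitting a node into a left and a right child.

    grad_*, hess_*: sums of first and second derivatives of the loss
    over the samples that fall into each child.
    reg_lambda: L2 penalty on leaf weights; gamma: penalty per extra leaf.
    """
    def score(g, h):
        return g * g / (h + reg_lambda)

    parent = score(grad_left + grad_right, hess_left + hess_right)
    return 0.5 * (score(grad_left, hess_left)
                  + score(grad_right, hess_right)
                  - parent) - gamma


# With log loss and an initial probability of 0.5, each fire sample has
# gradient -0.5 and each non-fire sample +0.5; every hessian is 0.25.
pure_split = split_gain(-2.0, 1.0, 2.0, 1.0)    # 4 fire vs 4 non-fire samples
mixed_split = split_gain(0.0, 1.0, 0.0, 1.0)    # 2 + 2 samples on each side
```

A split that separates fire from non-fire samples earns a positive gain, while a split that leaves both children equally mixed earns none, so the tree does not grow in that direction.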

(3) Light Gradient Boosting Machine (LightGBM). LightGBM is a kind of gradient boosting decision tree. This algorithm is mainly used in classification, ranking, and regression. LightGBM uses a histogram-based algorithm to increase computational speed and reduce complexity. It supports algorithms such as GBM, GBDT, GBRT, and MART, and its accuracy and efficiency are high [77, 78]. In LightGBM, gradient-based one-side sampling (GOSS) is one of the methods used when calculating information gain: it retains the instances with large gradients, so that under-trained instances contribute more to the information gain [79]. With these features, LightGBM provides fast training, low memory usage, good accuracy, support for GPU learning, and the capacity to process large-scale data [80].
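The idea behind GOSS can be illustrated as follows: keep every instance with a large gradient magnitude, draw a random subset of the remaining instances, and up-weight that subset so the overall gradient statistics stay approximately unbiased. This is a simplified sketch of the sampling step only; the function name, the sampling rates, and the synthetic gradients are assumptions, and it is not LightGBM's actual implementation.

```python
import random


def goss_sample(gradients, top_rate=0.2, other_rate=0.1, seed=0):
    """Return {sample_index: weight} chosen by gradient-based one-side sampling.

    top_rate: fraction of samples with the largest |gradient| that is always kept.
    other_rate: fraction of all samples drawn at random from the remainder.
    """
    rng = random.Random(seed)
    n = len(gradients)
    order = sorted(range(n), key=lambda i: abs(gradients[i]), reverse=True)
    n_top = round(top_rate * n)
    n_other = round(other_rate * n)
    top, rest = order[:n_top], order[n_top:]
    sampled = rng.sample(rest, n_other)
    # Small-gradient samples are amplified to compensate for the subsampling.
    amplify = (1.0 - top_rate) / other_rate
    weights = {i: 1.0 for i in top}
    weights.update({i: amplify for i in sampled})
    return weights


# Synthetic gradients in [-1, 1) for 100 hypothetical training samples.
gradients = [((i * 37) % 100 - 50) / 50 for i in range(100)]
weights = goss_sample(gradients)
```

Only 30 of the 100 samples are passed on to split finding in this example, which is where the speed-up comes from.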

(4) Categorical Boosting (CatBoost). Categorical boosting (CatBoost) was developed by [81]. CatBoost is a GBDT implementation in machine learning. This algorithm has two important features: ordered target statistics and ordered boosting. CatBoost performs well on complex datasets; however, it may be less suitable for problems that are not very complicated [82]. Another major feature of CatBoost is that it captures high-order dependencies by using combinations of categorical features [83].
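Ordered target statistics replace a categorical value with a smoothed average of the target computed only from samples that appear earlier in a permutation, which avoids leaking a sample's own label into its encoding. The sketch below treats the given order as the permutation (CatBoost itself draws random permutations) and uses a hypothetical land use feature; the prior and its weight are illustrative assumptions.

```python
def ordered_target_statistics(categories, targets, prior=0.5, prior_weight=1.0):
    """Encode each categorical value using only the samples that precede it."""
    sums, counts = {}, {}
    encoded = []
    for category, target in zip(categories, targets):
        seen_sum = sums.get(category, 0.0)
        seen_count = counts.get(category, 0)
        # Smoothed mean of the targets of earlier samples in the same category.
        encoded.append((seen_sum + prior * prior_weight)
                       / (seen_count + prior_weight))
        # Only now does the current sample become visible to later ones.
        sums[category] = seen_sum + target
        counts[category] = seen_count + 1
    return encoded


# Hypothetical land use classes with binary fire labels (1 = fire, 0 = non-fire).
land_use = ["forest", "forest", "urban", "forest", "urban"]
fire = [1, 1, 0, 0, 0]
encoded = ordered_target_statistics(land_use, fire)
```

The first occurrence of each category falls back to the prior, and later occurrences drift toward the observed fire rate of that category.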

(5) Adaptive Boosting (AdaBoost). Freund and Schapire developed the AdaBoost algorithm in 1995; it adjusts its weights adaptively without requiring prior knowledge about the weak learner [84]. In 1997, Freund and Schapire extended the algorithm to multiclass problems with many categories [85]. Because of its adaptive nature, AdaBoost is among the most common boosting algorithms. In addition, AdaBoost is straightforward to use, practical in solving problems, and usually gives very effective results [86].
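The adaptive reweighting at the heart of AdaBoost can be shown on a small example. The sketch below is a minimal binary AdaBoost with one-feature threshold stumps and labels in {-1, +1}; the toy data and the number of rounds are assumptions chosen so that no single stump can classify all points but three boosted stumps can.

```python
import math


def stump_predict(x, threshold, polarity):
    """Predict `polarity` above the threshold and its opposite otherwise."""
    return polarity if x > threshold else -polarity


def best_stump(xs, ys, weights):
    """Return (weighted_error, threshold, polarity) of the best stump."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            error = sum(w for x, y, w in zip(xs, ys, weights)
                        if stump_predict(x, threshold, polarity) != y)
            if best is None or error < best[0]:
                best = (error, threshold, polarity)
    return best


def adaboost(xs, ys, n_rounds=3):
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(n_rounds):
        error, threshold, polarity = best_stump(xs, ys, weights)
        error = min(max(error, 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1 - error) / error)
        ensemble.append((alpha, threshold, polarity))
        # Misclassified samples gain weight, correctly classified ones lose it.
        weights = [w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x):
    score = sum(alpha * stump_predict(x, t, p) for alpha, t, p in ensemble)
    return 1 if score >= 0 else -1


# Toy one-dimensional data that no single threshold can separate.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
ensemble = adaboost(xs, ys, n_rounds=3)
```

Each stump on its own misclassifies at least three points here; the weighted vote of three stumps classifies all ten training points correctly.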

2.5. Accuracy Evaluation

To assess the performance of the LightGBM, GBM, XGBoost, AdaBoost, and CatBoost algorithms in forest fire susceptibility modelling, we first used the receiver operating characteristic (ROC) curve. The ROC curve is built from the counts of true positive (TP), true negative (TN), false positive (FP), and false negative (FN) samples produced by a machine learning model [41, 87]. After that, all the models were evaluated and compared using statistical measures such as overall accuracy (equation (3)), precision (equation (4)), recall (equation (5)), sensitivity (equation (6)), specificity (equation (7)), F1 measure (equation (8)), Kappa Index (equation (9)), and area under the curve (AUC).
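For orientation, the sketch below computes these measures from the four confusion matrix counts using their standard textbook definitions. This is an assumption: the paper's own equations (3) to (9) are not reproduced here, and since the study reports recall and sensitivity as separate values, its definitions may differ from the textbook ones, under which the two coincide. The counts in the example are hypothetical.

```python
def classification_metrics(tp, tn, fp, fn):
    """Standard binary classification measures from confusion matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)          # identical to sensitivity in this sketch
    specificity = tn / (tn + fp)
    f1 = 2 * precision * recall / (precision + recall)
    # Cohen's kappa: agreement beyond what is expected by chance.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / total ** 2
    kappa = (accuracy - expected) / (1 - expected)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "sensitivity": recall,
        "specificity": specificity,
        "f1": f1,
        "kappa": kappa,
    }


# Hypothetical counts for a validation set of 200 points.
metrics = classification_metrics(tp=90, tn=85, fp=15, fn=10)
```

Kappa is worth reporting alongside accuracy because it discounts the agreement that a classifier would reach by chance given the class proportions.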

3. Results and Discussion

3.1. Importance of Wildfire Conditioning Factors

The importance of the wildfire conditioning factors used in the present study is given in Figure 6. The factors had the same importance ranking in all the models. Wind speed was the most important factor in all the models, followed by humidity, temperature, LST, distance from water bodies, distance from residential areas, elevation, slope, land use, NDVI, distance from roads, TWI, and aspect, respectively. The AdaBoost model ignored the effects of some factors such as land use, NDVI, distance from roads, TWI, and aspect, while the GBM model disregarded the impact of the aspect.

3.2. Wildfire Susceptibility Models

In this study, we evaluated the predictive performance of five machine learning algorithms, namely LightGBM, GBM, XGBoost, AdaBoost, and CatBoost, in wildfire susceptibility mapping. The prediction performance of all models was compared with each other. Thirteen conditioning factors were prepared to analyze the wildfire susceptibility of the study area. Afterwards, the values of all the factors were extracted at a total of 6292 target points, including historical forest fire points (3256) and nonfire points (3036). The target points were labelled as binary: 0 for nonfire and 1 for historical fire samples. The models were trained with these sample points as the input dataset. The input dataset was divided into 70% for training and 30% for validation. After the analysis, maps of the results of all selected models were produced in ArcMap. The resultant wildfire susceptibility maps were classified into five classes (very low, low, medium, high, and very high) using the natural breaks classifier (Figure 7).
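The labelling and the 70/30 division described above can be sketched as follows. The sample counts come from the text; the random shuffling, the seed, and the function name are assumptions, since the study does not state how the split was drawn or whether it was stratified.

```python
import random


def split_samples(samples, train_fraction=0.7, seed=42):
    """Shuffle the samples and split them into training and validation sets."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


# 3256 historical fire points (label 1) and 3036 nonfire points (label 0).
n_fire, n_nonfire = 3256, 3036
labels = [1] * n_fire + [0] * n_nonfire
samples = list(enumerate(labels))  # (point_id, label) pairs
train, validation = split_samples(samples)
```

With 6292 points this yields 4404 training and 1888 validation samples; in practice each sample would also carry the thirteen conditioning factor values extracted at that point.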

The spatial distribution of forest fire susceptibility classes according to the models is presented in Table 4 and Figure 8. The high and very high susceptibility classes together cover 25%, 11%, 23%, 78%, and 10% of the total area with the XGBoost, CatBoost, GBM, AdaBoost, and LightGBM methods, respectively.

3.3. Evaluation and Comparison of the Wildfire Susceptibility Models

Statistical evaluation of the selected models is presented in Table 5. Accordingly, all the models have shown high and acceptable accuracy in both training and testing scores. The training scores are higher than the testing scores in all the models, while the testing scores remain high, which suggests that the models did not suffer from severe overfitting.

According to the statistical measures, the testing scores reveal that the CatBoost algorithm performs better than the other models, followed by the LightGBM, XGBoost, AdaBoost, and GBM algorithms. The overall accuracy scores demonstrate that the CatBoost model correctly classifies the samples with 95% accuracy. The CatBoost model also achieves the highest precision (0.951) and recall (0.954) among the models. According to the F1 scores, which combine precision and recall, the CatBoost model (0.952) reached the highest value, followed by LightGBM (0.936), XGBoost (0.874), AdaBoost (0.864), and GBM (0.827). In addition, according to the specificity scores, the CatBoost model classifies the TN samples most accurately, with a value of 0.954. The other models also perform well in classifying the TN samples (LightGBM 0.939, XGBoost 0.885, AdaBoost 0.873, and GBM 0.838). In classifying TP samples, the CatBoost and LightGBM models showed equal performance, with a sensitivity of 0.956, whereas XGBoost, GBM, and AdaBoost showed sensitivity scores of 0.892, 0.853, and 0.846, respectively. The Kappa Index of the CatBoost model is 0.909, indicating strong agreement between the predicted and observed classes. Moreover, the AUC values indicate that the CatBoost model has a testing score of 0.955, followed by the LightGBM, XGBoost, AdaBoost, and GBM algorithms with AUC values of 0.948, 0.888, 0.859, and 0.846, respectively (Figure 9).

The algorithms used in the present study have been applied in many predictive mapping studies of various natural processes because of their robust and discriminative prediction performance as an alternative to traditional statistical and machine learning methods. The results of the present study are in line with many previous studies that compare the efficiency of different algorithms in susceptibility mapping. Sahin [88] found that the CatBoost model is superior in predicting landslide susceptibility areas in the Bolu region of Turkey. Similarly, Saber et al. [89] found the CatBoost and LightGBM algorithms effective in flash flood susceptibility mapping. Zhou et al. [56] proposed a fire prediction model using the CatBoost algorithm for Yunnan Province in China. The model, which used inputs such as vegetation, meteorological, terrain, and human factors, reached an AUC value of 0.83. The results indicated that the CatBoost model effectively predicted the risk of forest fire occurrence.

The CatBoost algorithm has not been tested extensively in previous studies on wildfire susceptibility despite its higher performance and accuracy in producing susceptibility maps. Hakim et al. [90] employed only AdaBoost and LogitBoost algorithms, while Arabameri et al. [91] tested VIKOR and Cforest models to create land subsidence susceptibility maps. Rosadi and Andriyani [57] applied the AdaBoost algorithm to predict forest fire occurrence and compared it with classical classification methods such as SVM (support vector machine) and decision tree methods. Michael et al. [58] investigated the effects of long-term vegetation condition on wildfire risk mapping by applying RF, logistic regression, and XGBoost methods. Similarly, Dang et al. [55] used random forest (RF), multilayer perceptron (MLP), XGBoost, and AdaBoost to develop an annual fire prediction model for the Humberside area in the United Kingdom. The researchers reported that the AdaBoost algorithm outperformed the other models. However, they did not use the CatBoost model in their study, which has yielded more robust results in many previous studies, including the present research.

Hence, to the best of the authors’ knowledge, no previous research has compared wildfire susceptibility mapping using the ML methods such as XGBoost, CatBoost, GBM, AdaBoost, and LightGBM. The present study’s results help open further research by testing more innovative techniques in machine learning to enhance the accuracy of wildfire susceptibility maps.

4. Conclusion

Wildfires are one of the most dangerous natural hazards for forest areas and habitats. Due to global warming, the frequency of forest fires has increased in the last decades, especially in the Mediterranean climate zones. Predictive susceptibility mapping for wildfires is an effective tool for planners and managers to prevent and protect against the undesirable effects of wildfires. The reliability of the susceptibility maps depends on the input parameters and the methodology used. In recent years, ML-based predictive mapping studies have been increasing rapidly and gaining the trust of researchers. In the present study, we compared state-of-the-art ML algorithms such as XGBoost, CatBoost, GBM, AdaBoost, and LightGBM to produce wildfire susceptibility maps for the Mediterranean Region of Turkey. To the best of our knowledge, no study has compared these algorithms before in the wildfire susceptibility mapping literature. For the analysis, thirteen input parameters were used: elevation, slope degree, slope aspect, TWI, temperature, humidity, wind speed, land use, distance from water bodies, distance from residential areas, distance from roads, NDVI, and LST. According to the order of importance of the factors, wind speed is the most important factor and aspect is the least important in all the models. After producing the susceptibility maps, statistical accuracy assessment techniques such as overall accuracy, precision, recall, sensitivity, specificity, AUC, F1 score, and Kappa Index were applied. The results showed that the CatBoost algorithm had higher accuracy than the other models, followed by the LightGBM, XGBoost, AdaBoost, and GBM algorithms. However, all the models showed reasonably good AUC performance: 0.955, 0.948, 0.888, 0.859, and 0.846, respectively.

The present study has several limitations. First, the wind speed, humidity, and temperature factors were produced by spatial interpolation. The interpolation technique has disadvantages such as dependence on sample locations, generalization, and ignoring the geomorphological conditions. Therefore, some parameters may not fully reflect the actual climatic conditions of the selected area. Second, the present study does not depict human-induced factors of wildfire due to the absence of a spatial dimension in such data. Third, the present study only provides an ML-based evaluation to give some idea about wildfire-sensitive areas in the study area. Therefore, future wildfires should be followed and examined, considering the maps produced in this study. Another main limitation is the tension between the large extent of the study area and the level of detail required for the modelling. As a result, explaining the factors during model development becomes more complicated due to the missing details. In addition, the high computational capacity required made it challenging to model at all the available resolutions.

The present study is considered novel research in producing hotspots and wildfire susceptibility maps of Turkey’s Mediterranean Region. The maps are expected to provide valuable inventories for forest engineers, planners, and local governments for future policies regarding disaster management in Turkey. Besides, the present study also provides a comparative analysis of relatively new ML algorithms such as XGBoost, CatBoost, GBM, AdaBoost, and LightGBM for wildfire susceptibility mapping research. Suggestions for future research include comparing the methods used in this study with other statistical and ML-based methods and using different input parameters in the models. This research also recommends that future studies use cloud computing platforms such as Google Earth Engine, Google Colab, Amazon AWS, and Kaggle for wildfire susceptibility modelling.

Data Availability

All the relevant data are included in the article.

Disclosure

This work was performed as part of the duties of the authors as well as a collaboration between the International College for Engineering and Management (ICEM) and Karabuk University (KBU).

Conflicts of Interest

The authors declare that they have no conflicts of interest.