The survival of humanity depends on the survival of forests and the ecosystems they support, yet wildfires destroy millions of hectares of forest worldwide every year. Wildfires take place under specific conditions and in certain regions, which can be studied through appropriate techniques. A variety of statistical modeling methods have been assessed by researchers; however, ensemble modeling of wildfire susceptibility has not been undertaken. We hypothesize that ensemble modeling of wildfire susceptibility is better than a single modeling technique. This study models the occurrence of wildfire, an annual event in the Brisbane Catchment of Australia, using the index of entropy (IoE), evidential belief function (EBF), and logistic regression (LR) ensemble techniques. As a secondary goal, the spatial distribution of wildfire risk was evaluated from different aspects, such as urbanization and ecosystems. The highest accuracy (88.51%) was achieved using the ensemble EBF and LR model. The outcomes of this study may help planners to avoid susceptible and risky regions in their planning, model builders to replace traditional individual methods with ensemble algorithms, and geospatial users to enhance their knowledge of geographic information system (GIS) applications.

1. Introduction

Wildfires, alternatively termed forest fires, bushfires, woodland fires, and vegetation fires, are boosted by wind and high summer temperatures and can destroy entire forests faster than they can be brought under control [1], causing irreversible, incalculable environmental, economic, and social damage [2]. Wildfires cause direct forest degradation [3]; in the 2020 Australian wildfires [4], for example, a wide variety of forest flora [5] and forest species [6] were destroyed within a very short period of time. Loss of soil nutrients is a long-lasting effect of wildfires on a region [7, 8]. Ecosystems and biodiversity [9], including bird nesting sites and habitats [10, 11], are also highly vulnerable to wildfire. Other destructive impacts include damage to watersheds [12] and reduced water quality [13, 14]. Last but not least, the impacts on human settlements and health [15, 16] can be irreversible.

A fundamental requirement in natural hazard management is to accurately locate wildfire-endangered regions [17], that is, to find the areas that have the highest potential for future wildfire occurrence. Through proper natural hazard management, wildfire can be controlled and damage minimized [18]. Assessing the susceptibility of a locality to wildfire occurrence is based on the correlation between historical wildfire events and their causative factors, such as topographical, hydrological, and geological factors [19].

Numerous approaches and algorithms have been used for wildfire hazard mapping [20]. Recent studies have used remote sensing (e.g., aerial photos, LiDAR data, and signals) and thematic maps, either directly or indirectly, in conjunction with geographic information systems (GIS); these data can support assessments of wildfire risk from a variety of aspects such as fuel load [21, 22], burn severity [23], and intensity measurements [24, 25]. Probability and susceptibility are foundational in wildfire research [26], being essential for risk, vulnerability, response, and safety studies [27]. As a practical example, wildfire spatiotemporal distributions can be derived from the susceptibility over a period [28] in order to establish trends, which can be monitored and projected into the future. The existing methods used in wildfire probability mapping cover a variety of algorithms. For instance, the qualitative analytical hierarchy process (AHP) and Mamdani fuzzy logic (MFL) methods were used by Pourtaghi et al. [18]. Their outputs indicated that the qualitative analysis might not be accurate because it was a knowledge-based approach and differed from person to person. Linear and quadratic discriminant analysis, frequency ratio (FR), and weights-of-evidence (WOE) were used together with thirteen causative factors by Hong et al. [27], and the area under the curve (AUC) for the forest fire susceptibility mapping did not exceed 82.2%. Jaafari et al. [29] used five decision tree-based classifiers in wildfire mapping and reported a high level of performance. However, decision tree-based models are often computationally expensive and sensitive to training on big data [30]. FR, a simple and popular statistical algorithm, was also used in mapping wildfire hazard [31]; in that study, however, FR was less effective than Shannon’s entropy model.
Support vector machine (SVM), another popular algorithm, was used by Tien Bui et al. [32] to detect the most wildfire-susceptible areas in the Cat Ba National Park area (Vietnam), resulting in an AUC of 87.5%. Pourghasemi et al. [33] produced wildfire susceptibility maps based on evidential belief function (EBF) and binary logistic regression (BLR) models, and the validation showed that EBF outperformed BLR. For wildfire susceptibility mapping with sixteen conditioning factors, Gholamnia et al. [34] applied machine learning (ML) methods (e.g., artificial neural network (ANN), dmine regression (DR), data mining (DM) neural, least angle regression (LARS), multilayer perceptron (MLP), random forest (RF), radial basis function (RBF), self-organizing maps (SOM), SVM, and decision tree (DT)) and reported the highest (88%) and lowest (65%) accuracy for RF and logistic regression (LR), respectively. Kalantar et al. [35] mapped forest fire susceptibility using three ML algorithms, namely, multivariate adaptive regression splines (MARS), SVM, and boosted regression tree (BRT), with resampling techniques in the training phase. They reported that the resampling process enhanced the modeling and that BRT, with an AUC of 91%, outperformed the others. In this context, several ML methods, for example, DT, have an inherent computational complexity, requiring a number of preanalysis stages and significant processing time [36]. Although the aforementioned studies achieved satisfactory AUC values, none exceeded 91% (the majority were between 74% and 83%), which motivated us to investigate other algorithms in pursuit of higher accuracy for wildfire prediction.

Other studies, such as Brun et al. [37] and Podschwit et al. [38], have shown that ensemble and multimodel approaches can lead to much more accurate results. Zhou [39] stated that ensemble modeling offers a state-of-the-art learning approach, which has been a focus of modeling research since the 1990s and has been shown to produce results that are considerably more precise than those of a single method [40–43]. Jaafari et al. [44] examined and compared four hybrid (artificial intelligence) methods against a single model in mapping wildfire probability in the Hyrcanian ecoregion, Iran. Their findings showed an increase of up to 18% in modeling accuracy using hybrid models rather than a single model. Any individual method, whatever its advantages, has limitations. Ensemble modeling trains multiple algorithms and subsequently combines them for analysis [45]; with an appropriate selection of two methods, each can reduce or eliminate the other’s limitations [46]. Hence, in the present study, an ensemble model was proposed to improve the modeling performance and achieve higher accuracy.

EBF is capable of fast data processing without prior assumptions [47]. Applied to wildfire susceptibility mapping, a bivariate statistical analysis (BSA) approach is based on the comparison of a wildfire inventory map, as the dependent variable, with a single input influencing map (geology/wildfire, aspect/wildfire, altitude/wildfire, etc.) [48]. In execution, the spatial correlation between wildfire inventory locations and each class of each wildfire influencing factor is measured. For instance, the weights derived for the geology factor represent the impact of each geology type on wildfire occurrence in the region. In contrast, multivariate statistical analysis (MSA) assesses only the impact of the factors on wildfire occurrence, rather than the influence of each class. Using an ensemble modeling approach, the impact of both classes and separate factors can be assessed in a single integrated analysis. EBF and IoE are classified as BSA approaches and extract the impact of each class of every conditioning factor. LR, in turn, evaluates the impact of the factor itself on the wildfire event irrespective of class impacts. Thus, the ensemble analysis has the potential to produce more reliable and accurate outcomes than an individual algorithm. Although ensemble modeling has been used in the wildfire domain, there is a range of other techniques that have not yet been tested in ensemble analysis. The research literature reports applications of individual EBF, index of entropy (IoE), and LR modeling, yet, as far as we can ascertain, their comparability and application in ensemble modeling for refining the derived wildfire susceptibility maps remain untested.

For this purpose, an ensemble approach to wildfire modeling using the IoE, EBF, and LR algorithms was introduced and examined. The three algorithms were selected on the basis of their relatively quick execution and comprehensibility, as well as the fact that they do not require specific dedicated software [49]. To address the important factors in bushfire occurrence in the Brisbane Catchment, Australia, we evaluated and ranked the initial fourteen causative factors (i.e., altitude, slope, aspect, curvature, topographic wetness index (TWI), topographic position index (TPI), rainfall, geology, soil, land use land cover (LULC), distance from rivers, distance from roads, wind, and normalized difference vegetation index (NDVI)). In particular, the study area faces midsummer heatwaves that trigger fire conditions, and this study could highlight the sources of hazard for decision makers seeking to protect species threatened with extinction [50]. By producing more reliable susceptibility and risk maps, this study would assist wildfire management, forestry, and the development of strategies for local residents. We hypothesized that, when combined into an ensemble method, the algorithms can exceed the accuracy of their individual outputs.

2. Study Area

The study area (the Brisbane catchment, Australia) is located between 153°129.212E 27°1740.095S and 152°2231.144E 27°567.549S (Figure 1), and its LULC is mainly cropping, plantation forestry, and urban and rural areas. Its climate is warm with two seasons, a dry winter and a hot, humid summer, with average temperatures ranging from 9 to 12°C and from 21 to 29.8°C, respectively. The ecoregion of the study area is temperate broadleaf and mixed forest. Temperate forests experience a wide range of variability in temperature and precipitation. In regions where rainfall is broadly distributed throughout the year, deciduous trees mix with species of evergreens. Species such as Eucalyptus and Acacia typify the composition of the temperate broadleaf and mixed forests in Australia. The temperate forests stretching from southeast Queensland to South Australia enjoy a moderate climate and high rainfall that give rise to unique eucalyptus forests and open woodlands. This biome has served as a refuge for numerous plant and animal species when drier conditions prevailed over most of the continent, which has resulted in a remarkably diverse spectrum of organisms with high levels of regional and local endemism. Recently, record-breaking temperatures and extreme events such as drought caused devastating wildfires across Australia, destroying species across millions of acres and threatening human life (https://www.bloomberg.com/graphics/2020-australia-fires/); Australia ranks globally as the country most prone to wildfires [51]. Since the possibility of wildfire danger is very high in the dry season, we used geographic data on the extent of the wildfires that occurred from 2011 to 2019. Figure 1 shows the study area along with the inventory of the extent of wildfires. Brisbane, the capital of Queensland, is in the southeastern corner of the state and is one of the predominant wildfire regions in Australia.

3. Methodology

The mapping algorithms for wildfire susceptibility were applied both individually and as an ensemble in this study. The stepwise methodology flowchart in Figure 2 illustrates the stages of this research. To achieve the primary aim of the study (ensemble modeling), the first four steps were implemented. Subsequently, the outcomes of these steps were entered into the last stage to perform the secondary goal of wildfire risk mapping. The analysis started with a random selection of forest fire inventory points, as explained in Section 3.1.1. The training dataset, as the initial input, was used in both the IoE and EBF methods to evaluate its correlation with the influencing factors. For the second input to the BSA, a set of conditioning factors was used (Section 3.1.2). Section 3.2 describes the use of multicollinearity and Pearson’s correlation analysis to eliminate some of the factors from the dataset to avoid redundancy. In the third step, the BSA was undertaken using both the IoE and EBF methods, and their final susceptibility map was produced using MSA. The area under the curve (AUC) technique was used to evaluate the reliability of the outcomes using the testing dataset (30%) (Section 3.6). Subsequently, as illustrated by the dashed arrow in the flowchart, the derived BSA weights were used in an ensemble with the LR algorithm. The ensemble analysis was used to produce the final wildfire susceptibility map. Thereafter, the secondary goal of the study was initiated. The most susceptible wildfire class was overlaid on several vulnerability maps derived from different sources. The aim was to show that the risk map can vary with the application and aim of the analysis.

3.1. Data Used

An accurate dataset of wildfire-influencing factors, together with precise detection of the wildfire-ravaged locations, is critical for probabilistic wildfire susceptibility analysis. The precision of both datasets has a direct impact on the final outcomes [52]. The characteristics, sources, and descriptions of each dataset are described in the following subsections.

3.1.1. Wildfire Inventory Dataset

Susceptibility analysis can be undertaken through the assessment of similar past events and their causative factors. A range of sources, such as in situ mapping, historical records, reports, remote sensing, and aerial photos, can be used to prepare the inventory dataset [53]. In this research, the wildfire inventory dataset was compiled by the Australian Bureau of Agricultural and Resource Economics and Sciences (ABARES) for the National Forest Inventory (NFI). The raw data were delivered in vector format. Each polygon contained details of the location, date, and size of the burnt areas (Table 1). The inventory dataset covers the wildfire records from 2011 to 2019. According to Table 1, the year 2016 had the highest incidence of wildfire, covering 674,072 sqm. In this region, most of the wildfire occurrences were located in the far north and the northeast (Figure 1). Since the inventory data were in polygon format, a random point selection technique was applied, with a few modifications as described below.

Unlike most of the previous studies [54–56], inventory points were not selected in terms of the whole basin as this would have overlooked the size of the burnt areas. As a preparatory step, four areas of interest were defined (Figure 3) around the inventory regions.

In the next step, the total area of the wildfire polygons (the centroid of the fire) in each zone was measured. Finally, random wildfire points were selected in proportion to these areas, as listed in Table 2. We aimed to choose a total of 300 inventory points, so the number of points drawn from each zone was determined by its share of the total burnt area. For instance, in zone 1, the forest fire areas accounted for 12% of the whole wildfire area; twelve percent of 300 points is 36 points, which were randomly drawn from this zone. For each fire event (polygon), a buffer zone was generated to avoid the marginal fire region. Subsequently, 300 nonfire samples were randomly extracted from the remaining areas for modeling purposes.
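The area-proportional allocation described above can be illustrated with a short sketch. It is not the authors' implementation: only the 12% share of zone 1 comes from the text, the other zone shares are hypothetical placeholders (the actual values are in Table 2), and the largest-remainder rounding is an assumption made so that the counts always sum to 300.

```python
def allocate_points(area_shares, total_points=300):
    """Allocate sample points to zones in proportion to burnt-area share."""
    total_share = sum(area_shares.values())
    exact = {zone: total_points * share / total_share
             for zone, share in area_shares.items()}
    counts = {zone: int(value) for zone, value in exact.items()}
    # Largest-remainder rounding (an assumption) so counts sum to total_points.
    leftover = total_points - sum(counts.values())
    by_remainder = sorted(exact, key=lambda z: exact[z] - counts[z], reverse=True)
    for zone in by_remainder[:leftover]:
        counts[zone] += 1
    return counts


# Zone 1's 12% share is from the text; the other shares are hypothetical.
burnt_area_share = {"zone 1": 12.0, "zone 2": 40.0, "zone 3": 30.0, "zone 4": 18.0}
points_per_zone = allocate_points(burnt_area_share)
```

With these inputs, zone 1 receives 36 of the 300 points, matching the worked figure in the text.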

Our training and testing datasets were created using the space robustness technique, which divides the data into two categories without considering the dates of the events [42].

Once the 300 inventory points and 300 nonfire points had been compiled, the data were divided by random selection according to the standard proportion of 70% training and 30% testing [57–59].
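As a minimal sketch of such a random 70/30 partition, the snippet below shuffles a list of point identifiers and splits it. The integer identifiers and the fixed seed are hypothetical and serve only to make the example reproducible; the study itself performed this step in its own software environment.

```python
import random


def split_points(points, train_fraction=0.7, seed=42):
    """Randomly split points into training and testing subsets."""
    rng = random.Random(seed)      # fixed seed is an assumption, for repeatability
    shuffled = list(points)
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


fire_point_ids = list(range(300))  # hypothetical identifiers for 300 fire points
train_ids, test_ids = split_points(fire_point_ids)
```

For 300 wildfire points this yields 210 training and 90 testing points, the counts used later in the accuracy assessment.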

3.1.2. Influencing Factors

Pourtaghi et al. [18] offer a good review of wildfire influencing factors. Since there is no accepted framework for dataset creation, many studies rely solely on data availability, expert knowledge, and literature [33, 60]. In this study, the primary influencing factors were selected by the traditional literature-based approach. Prior to the main analysis, a statistical multicollinearity analysis was performed on the selected factors (Section 3.2). Our initial selected influencing factors dataset consisted of altitude, slope, aspect, curvature, TWI, TPI, rainfall, geology, soil, LULC, distance from rivers, distance from roads, wind, and NDVI. These factors are the most cited and relevant according to the literature [18, 61, 62]. The factors were drawn from a variety of sources, which are mentioned accordingly. A raw dataset was used to derive the primary influencing factors. This comprised (a) a digital elevation model (DEM) with 5-meter spatial resolution (produced from LiDAR data), which was used to compute altitude, slope, aspect, curvature, TWI, and TPI; (b) a soil map (1 : 250,000 scale) and a geology map (1 : 100,000 scale), obtained from the CSIRO and Australian government websites; (c) Landsat imagery, which was used to produce the NDVI map; (d) road and river networks; and (e) rainfall and wind information from the meteorological stations. Through appropriate methods and conversions, each factor was prepared and imported into the GIS environment.

Topography is one of the most influential factors in wildfire occurrence. Precipitation, temperature, sun exposure, and wind are all related to the topography of the locality [63]. Topography can affect wildfire in different ways. Slope, aspect, and altitude influence solar radiation levels [64] and impact on the fuel moisture content [65]. The wildfire spread direction is often determined by topographical factors and wind [66]. Topography, fuels, and climate are recognized as the three main elements in wildfire creation, spread, and intensity [67], with topography the most stable factor.

Altitude influences wildfire behavior by affecting the extent and timing of precipitation, seasonal drying of fuel, and wind [68]. Higher temperatures and lower rainfall in lowlands cause fuels to dry faster. Slope affects fuel preheating and, thus, the rate and direction of spread [69]. Steep slopes preheat and dry upslope fuels, causing faster combustion [70]. Therefore, during a wildfire event, slope defines the direction of spread [71]. Slope position and degree are both important factors in the extent of wildfire spread. Usually, the largest wildfires are initiated at the base of a slope, and fires tend to spread faster up a slope than down one [72]. The way in which fire moves also depends on the landform types of the region [73]. For instance, narrow canyons are one of the most dangerous landforms in the event of wildfire [74]. In such conditions, a greater degree of slope increases the destructive power of the wildfire, because this landform creates strong updrafts of air that preheat the upslope fuels and thus increase the likelihood of heat transfer. In some cases, if the valley is narrow enough, this might initiate an outbreak of fire on the opposite side.

Aspect defines the direction of the slope. The impact of aspect on fuel temperature and moisture is apparent [75]. Aspect controls the solar radiation received, which indirectly influences the vegetation types and cover [76]. In the southern hemisphere, north-facing slopes tend to have less vegetation and lighter fuel loads, particularly in lower-elevation forests [77]. North-facing slopes receive higher levels of solar radiation and are consequently warmer, so fuels tend to dry out sooner. On the other hand, south-facing slopes contain more vegetation and, therefore, greater fuel quantities. The drying process on these slopes is slower because of shading, and when wildfire does occur on them, it tends to be more severe. Curvature, which shows the morphology of topography, is another influential factor in wildfire occurrence [78]. Positive, negative, and zero curvature values specify that the surface is convex, concave, or flat, respectively [79]. One of the ways to evaluate the impact of topography on the hydrological characteristics of the region [80] is TWI. TWI shows the amount of flow accumulation at any point in a drainage basin and the downslope trend of the water under gravity, and it measures the slope and direction of hydrologic flow [81]. The TWI thematic map was generated using the System for Automated Geoscientific Analyses (SAGA) as

$$\mathrm{TWI} = \ln\left(\frac{A_s}{\tan\beta}\right), \tag{1}$$

where $A_s$ is the cumulative upslope area draining through a point, and $\beta$ is the slope angle at the point [46]. Table 3 provides both detailed and general descriptions of soil types in the study area. This information is useful for land management and planning.
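TWI is conventionally computed as the natural logarithm of the upslope contributing area divided by the tangent of the local slope angle. The sketch below evaluates that expression for a single cell; the function name, the use of degrees for the slope input, and the small minimum slope used to avoid division by zero on flat cells are all assumptions made for illustration rather than details taken from the study.

```python
import math


def topographic_wetness_index(upslope_area, slope_deg, min_slope_deg=0.1):
    """TWI = ln(A_s / tan(beta)) for one cell.

    upslope_area  : cumulative upslope area draining through the cell (A_s)
    slope_deg     : local slope angle in degrees (beta)
    min_slope_deg : hypothetical floor that keeps tan(beta) above zero
    """
    beta = math.radians(max(slope_deg, min_slope_deg))
    return math.log(upslope_area / math.tan(beta))


# Same contributing area, gentler slope -> higher wetness index.
twi_steep = topographic_wetness_index(500.0, 30.0)
twi_gentle = topographic_wetness_index(500.0, 2.0)
```

Cells with a large contributing area and a gentle slope receive high TWI values, which is why the index is read as a proxy for where water tends to accumulate.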

TPI defines or characterizes shapes such as canyons and ridges [82]. This factor reflects the difference in elevation between a focal cell and all cells in the neighborhood [83], which can make a simple and useful means to classify the landscape into morphological classes.

One of the characteristics of the study area that has a direct and indirect influence on wildfire incidents is weather condition [63]. Factors such as fire ignition potential, severity, heat transfer, and intensity are all associated with weather condition [84]. Fuel moisture and humidity are directly associated with the precipitation amount [85], while wind speed affects heat transfer and direction [86].

LULC is considered a significant influencing factor for wildfire, as well as for other natural hazards, such as flooding and landslide [87]. In this research, the LULC factor, consisting of 56 classes, was used to investigate the most influential land cover type on wildfire.

In terms of wildfire, vegetation can be grouped into ground fuels (e.g., roots) [88], surface fuels (e.g., grass) [89], ladder fuels (e.g., small-size trees) [90], and crown fuels (e.g., forest canopies) [91]. Wildfire combustion and behavior are highly affected by the size, moisture, and chemical content of the fuels [92]. Regarding chemical content, some vegetation, such as shrubs, contains volatile oils, which make it burn with higher intensity [93]. Shrubs also have small branches, which can create long flame lengths. Combustible biomass can be measured from a variety of sources, one of the main ones being NDVI. This factor reflects the density of vegetation in the area as measured by remote sensing [94]. NDVI was calculated from the near-infrared (NIR) and red channels of the Landsat imagery as

$$\mathrm{NDVI} = \frac{\mathrm{NIR} - \mathrm{Red}}{\mathrm{NIR} + \mathrm{Red}}. \tag{2}$$
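NDVI is the normalized difference of the near-infrared and red reflectances. The following sketch computes it for one pixel; the reflectance values are made up for illustration, and returning 0.0 when both bands are zero is an assumed convention for handling empty pixels, not something specified in the text.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index for one pixel."""
    denominator = nir + red
    if denominator == 0:
        return 0.0  # assumed convention for pixels with no signal in either band
    return (nir - red) / denominator


# Hypothetical surface reflectances: dense vegetation versus bare soil.
dense_vegetation = ndvi(nir=0.50, red=0.08)
bare_soil = ndvi(nir=0.30, red=0.25)
```

The index ranges from -1 to 1, and denser green vegetation gives values closer to 1.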

3.2. Multicollinearity Analysis

As was mentioned, not all causative factors were selected for the final modeling. Although there is no framework available to define the most influential factors, there are a number of statistical models that can assist researchers in their data selection [95]. These methods are able to statistically evaluate a group of factors and highlight the least significant and/or the factors that have duplicate impact. Through these assessments, redundant and less-effective factors can be eliminated from the dataset to decrease the computational time and complexity and increase the functionality. The correlations between factors were evaluated prior to the main analysis using Pearson’s correlation coefficients [96] and variance inflation factors (VIF) [97] to exclude multicollinearity [98], which causes errors in analysis [99].

The degree of a factor’s interrelatedness with the other influencing factors can be calculated using the VIF [100], which measures how much the variance of that factor’s estimated regression coefficient is inflated by collinearity. The square root of the VIF indicates how much larger the standard error of the coefficient is than it would be if the factor were uncorrelated with the others. A VIF of 5 or 10 and greater represents a multicollinearity problem in the dataset [95].
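For a given factor, the VIF equals 1 / (1 - R²), where R² is the coefficient of determination obtained by regressing that factor on all the others. The sketch below applies this identity and flags factors above a chosen threshold. The two VIF values in the example are those reported for NDVI and TWI in Section 4.1; the function names and the use of 5 as the cut-off are illustrative choices.

```python
def vif_from_r_squared(r_squared):
    """VIF of a factor given the R^2 of regressing it on the other factors."""
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("r_squared must lie in [0, 1)")
    return 1.0 / (1.0 - r_squared)


def flag_collinear(vif_by_factor, threshold=5.0):
    """Return the factors whose VIF exceeds the threshold, sorted by name."""
    return sorted(name for name, vif in vif_by_factor.items() if vif > threshold)


# R^2 = 0.8 corresponds to the VIF = 5 threshold, and R^2 = 0.9 to VIF = 10.
vif_at_080 = vif_from_r_squared(0.8)
flagged = flag_collinear({"NDVI": 9.22, "TWI": 6.32})
```

A factor that is uncorrelated with the rest has R² = 0 and therefore a VIF of exactly 1.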

Pearson’s correlation coefficient evaluates the correlation between two influencing factors, for example, aspect ($x$) and geology ($y$), in wildfire occurrence [101]. The correlation value is calculated as their covariance divided by the product of their standard deviations (Eq. (3)). A value greater than 0.7 indicates a high level of collinearity in the dataset [95].

$$r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}, \tag{3}$$

where $\bar{x}$ is the mean of $x$ and $\bar{y}$ is the mean of $y$.
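The covariance-over-standard-deviations definition can be written in a few lines. The sample values below are hypothetical and stand in for two factor layers sampled at the same locations; the 0.7 threshold is the one quoted in the text.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long samples."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


# Hypothetical samples of two factors at the same five locations.
factor_a = [1.0, 2.0, 3.0, 4.0, 5.0]
factor_b = [2.1, 3.9, 6.2, 8.0, 9.8]
r = pearson(factor_a, factor_b)
highly_collinear = abs(r) > 0.7
```

The 1/n factors of the covariance and the standard deviations cancel, so they are omitted from the computation.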

When either of these measures reaches its threshold value, the collinearity should be reduced by eliminating one or more factors from the analysis [97].

3.3. Bivariate Statistical Analysis (BSA)

The two selected BSA methods were IoE and EBF. The application of IoE is based on the methodology proposed by Vlcko et al. [102], in which the weight value for each factor is expressed as an entropy index. The approach to EBF is based on the Dempster-Shafer theory of evidence [103].

3.3.1. Index of Entropy Model (IoE)

Entropy is a measure of the disorder, instability, imbalance, and uncertainty of a system [104], and, according to Boltzmann’s principle, the entropy of a system describes its thermodynamic state in terms of its degree of disorder. Shannon’s entropy model for information theory is regarded as superseding Boltzmann’s principle [105]. Applying the Shannon model of information, a weighted index of wildfire hazard based on the environmental influencing factors can be extracted:

$$P_{ij} = \frac{b}{a},$$

$$(P_{ij}) = \frac{P_{ij}}{\sum_{i=1}^{S_j} P_{ij}},$$

$$H_j = -\sum_{i=1}^{S_j} (P_{ij}) \log_2 (P_{ij}), \quad j = 1, \ldots, n,$$

$$H_{j\max} = \log_2 S_j,$$

$$I_j = \frac{H_{j\max} - H_j}{H_{j\max}},$$

$$W_j = I_j P_{ij},$$

where $a$ and $b$ are the domain and wildfire percentages, respectively, $(P_{ij})$ denotes the density of the occurrence of wildfire for every class of every influencing factor (e.g., each type of geology), $S_j$ is the number of classes of factor $j$, $H_j$ and $H_{j\max}$ denote the entropy values, $I_j$ is the information coefficient, and $W_j$ represents the calculated weight value for the specific influencing factor, without consideration of the classes.
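To make the sequence of quantities concrete, the sketch below computes them for a single influencing factor from the domain and wildfire percentages of its classes. It follows the commonly used formulation of the index of entropy and is not taken from the authors' workflow. Two points are assumptions: the factor weight is formed by multiplying the information coefficient by the mean class ratio, and every class is assumed to cover a nonzero share of the domain. The input percentages are hypothetical.

```python
import math


def index_of_entropy(domain_pct, fire_pct):
    """Return (information coefficient, weight) for one influencing factor.

    domain_pct : share of the study area in each class (a), all nonzero
    fire_pct   : share of wildfire points in each class (b)
    """
    ratios = [b / a for a, b in zip(domain_pct, fire_pct)]    # P_ij = b / a
    total = sum(ratios)
    density = [p / total for p in ratios]                     # normalized (P_ij)
    entropy = -sum(d * math.log2(d) for d in density if d > 0)
    max_entropy = math.log2(len(ratios))                      # needs >= 2 classes
    information = (max_entropy - entropy) / max_entropy
    # Assumption: aggregate the class ratios by their mean to get one weight.
    weight = information * (total / len(ratios))
    return information, weight


# Hypothetical factor with four equal-area classes.
even_info, even_weight = index_of_entropy([25, 25, 25, 25], [25, 25, 25, 25])
skew_info, skew_weight = index_of_entropy([25, 25, 25, 25], [70, 20, 10, 0])
```

When wildfires are spread evenly over the classes the factor carries no information and its weight is zero; the more the fires concentrate in a few classes, the larger the information coefficient becomes.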

The final wildfire susceptibility map for the IoE model was generated by summing the weighted products of the secondary parametric maps, which gives the value of the wildfire susceptibility index.

3.3.2. Evidential Belief Function (EBF)

The Dempster–Shafer theory of evidence was introduced by Dempster [106], and the EBF method has been applied to other natural hazards such as flooding [107] and landslides [108]. Its relevance to natural hazard modeling is that it can accept uncertainty and can integrate information from multiple sources of evidence [109]. It is used for assessing the degree of probability of the truth of a hypothesis, as well as for evaluating how close the evidence comes to proving that truth [110]. Its functional parameters are the degrees of belief (Bel), disbelief (Dis), uncertainty (Unc), and plausibility (Pls) [111]. Bel and Pls denote the lower and upper limits of the probability of the proposition, respectively; Unc is the difference between plausibility and belief and describes ignorance [112]; and Dis is the belief that the proposition is false according to the evidence. In the situation where a class of an influencing factor does not contain any wildfire event, Bel is equal to zero, and Dis is reset to zero. Applied to wildfire occurrence, the EBF estimates the spatial correlations between wildfire occurrence and the classes of each conditioning factor. An overlay of the inventory map on each influencing factor layer displays the pixels that could contain wildfire or nonwildfire. A set of mutually exclusive and exhaustive factors was used in this study. For each class of each factor (e.g., the first class of altitude), two weights are calculated: one that supports the belief that the existence of wildfire exceeds its absence, and one that supports the belief that wildfire absence exceeds its presence. The EBF calculation requires several stages, which are not explained in this paper. A more detailed description can be found in Bui et al. [113].
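Because the calculation stages are not given here, the following sketch uses the data-driven EBF formulation that is common in the GIS literature, and it should be read as an assumption rather than as the procedure of this study. For each class it derives a weight for wildfire presence and a weight for wildfire absence from pixel counts, normalizes each across the classes of the factor, and then obtains Unc and Pls from Bel and Dis. The pixel counts are hypothetical, and the sketch assumes that no single class contains all the wildfire pixels or all the nonfire pixels.

```python
def evidential_belief(class_pixels, class_fire_pixels):
    """Bel, Dis, Unc, and Pls for every class of one conditioning factor."""
    total = sum(class_pixels)              # all pixels in the study area
    fires = sum(class_fire_pixels)         # all wildfire pixels
    w_presence, w_absence = [], []
    for n_class, n_fire in zip(class_pixels, class_fire_pixels):
        outside = total - n_class          # pixels outside this class
        no_fire = n_class - n_fire         # nonfire pixels inside this class
        w_presence.append((n_fire / n_class) / ((fires - n_fire) / outside))
        w_absence.append((no_fire / n_class)
                         / ((total - fires - no_fire) / outside))
    sum_presence, sum_absence = sum(w_presence), sum(w_absence)
    results = []
    for n_fire, wp, wa in zip(class_fire_pixels, w_presence, w_absence):
        bel = wp / sum_presence
        # Classes without any wildfire: Bel is zero and Dis is reset to zero.
        dis = 0.0 if n_fire == 0 else wa / sum_absence
        unc = 1.0 - bel - dis
        results.append({"Bel": bel, "Dis": dis, "Unc": unc, "Pls": bel + unc})
    return results


# Hypothetical factor: four classes of 250 pixels with 50, 30, 20, and 0 fires.
ebf = evidential_belief([250, 250, 250, 250], [50, 30, 20, 0])
```

In this example the class with the most wildfire pixels receives the highest belief, and the class without any wildfire ends up with full uncertainty.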

3.4. Multivariate Statistical Analysis (MSA)

As stated, a BSA method evaluates the impact of each class of each influencing factor on wildfire occurrence (e.g., the impact of different types of geology on wildfire). In terms of our research objective, the most accurate BSA method will be selected to perform the ensemble modeling with LR. LR is one of the most popular MSA methods for examining the multivariate regression relationship between a dependent factor (e.g., wildfire occurrence) and several independent influencing factors (e.g., altitude and slope) [114].

For our research purposes, LR is used to measure the wildfire probability in an area, based on a specific formula created by the influencing factors and a dependent factor. The method necessitates a dependent factor established by values of 0 and 1, indicating the nonexistence or existence of wildfire, respectively. The factor was created in ArcGIS using the inventory dataset. To create this dataset, the original influencing factors were reclassified using the BSA weights derived from the most accurate method (either IoE or EBF) in order to implement the ensemble modeling. Subsequently, the dependent and reclassified influencing factors were converted from raster to ASCII format as a requirement of SPSS. LR was executed in the SPSS V.19 software environment. The logistic coefficients were derived and used as inputs in the equation below to produce the final wildfire susceptibility map:

$$P = \frac{1}{1 + e^{-z}},$$

where $P$ is the wildfire probability in the range 0 to 1 on an S-shaped curve, and $z$ is the linear combination. It follows that LR involves fitting an equation of the following form to the data:

$$z = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_n x_n,$$

where $b_0$ is the constant intercept of the model, $b_i$ ($i = 1, 2, \ldots, n$) represents the weight coefficients of the LR model for each factor, and $x_i$ represents the influencing factors [47].
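Once the logistic coefficients are available, the probability for a cell follows from the linear combination of its factor values passed through the S-shaped logistic function. The sketch below shows that step only; the intercept, coefficients, and factor values are hypothetical and are not the values obtained from SPSS in this study.

```python
import math


def wildfire_probability(factor_values, coefficients, intercept):
    """Logistic probability from a linear combination of factor values."""
    z = intercept + sum(b * x for b, x in zip(coefficients, factor_values))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical intercept and coefficients for two reclassified factors.
intercept = -2.0
coefficients = [3.0, 1.5]
low_risk_cell = wildfire_probability([0.1, 0.2], coefficients, intercept)
high_risk_cell = wildfire_probability([0.9, 0.8], coefficients, intercept)
```

A linear combination of zero maps to a probability of 0.5, and larger positive values push the probability toward 1.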

3.5. Ensemble Modeling

For the purpose of the ensemble, the weights derived from the more accurate BSA method (either IoE or EBF) will be used to reclassify each wildfire influencing factor. Subsequently, the reclassified factors will be entered into LR as input variables in order to perform the MSA. The derived final wildfire susceptibility map will be based on this ensemble modeling. Through this integration, the weak points of BSA and MSA will be resolved, and the outcome will be an integration of the two analyses.
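The two-stage idea, reclassifying each factor by its BSA class weights and then feeding the reclassified values into LR, can be sketched on toy data. Everything specific here is hypothetical: the class weights, the class labels of the sample points, and the fire labels are invented, a single factor is used for brevity, and the coefficients are fitted by plain gradient descent in place of the SPSS procedure used in the study.

```python
import math


def reclassify(class_ids, class_weights):
    """Replace each class label with its BSA-derived weight."""
    return [class_weights[c] for c in class_ids]


def fit_logistic(rows, labels, learning_rate=0.5, epochs=2000):
    """Fit intercept and coefficients by gradient descent on the log-loss."""
    n_features = len(rows[0])
    intercept, coefs = 0.0, [0.0] * n_features
    for _ in range(epochs):
        grad0, grads = 0.0, [0.0] * n_features
        for row, label in zip(rows, labels):
            z = intercept + sum(b * x for b, x in zip(coefs, row))
            error = 1.0 / (1.0 + math.exp(-z)) - label
            grad0 += error
            for j in range(n_features):
                grads[j] += error * row[j]
        intercept -= learning_rate * grad0 / len(rows)
        coefs = [b - learning_rate * g / len(rows) for b, g in zip(coefs, grads)]
    return intercept, coefs


# Hypothetical BSA weights for the three classes of one factor.
bsa_weights = {1: 0.1, 2: 0.5, 3: 0.9}
sample_classes = [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]
sample_fire = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0]   # 1 = fire, 0 = nonfire

rows = [[w] for w in reclassify(sample_classes, bsa_weights)]
intercept, coefs = fit_logistic(rows, sample_fire)
```

With several factors, each row would hold one reclassified value per factor, so the fitted coefficients express the factor-level influence while the reclassified values carry the class-level influence.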

3.6. Accuracy Assessment

Model validation is a fundamental step in any natural hazards study [115]. The well-known area under the curve (AUC) technique has been used in many natural hazard susceptibility mapping studies [116–118], producing the prediction and success rates by means of a comprehensive quantitative method [119]. The validation was achieved by comparing the wildfire inventory data with the derived probability maps. The wildfire probability map was initially partitioned into classes of equal area, and these were then ranked from minimum to maximum value [95]. Prediction accuracy was evaluated quantitatively using the AUC by sorting all cells in the study area by their calculated values in descending order, thus ranking each prediction. The cell values were then divided into 100 classes with 1% accumulation intervals. In the subsequent step, the presence of wildfire in each interval was measured using the ArcGIS “Tabulate area” tool. The curves were created by plotting the cumulative percentage of areas susceptible to wildfire (from highest to lowest probability) on the x-axis and the cumulative percentage of wildfire events on the y-axis. The success and prediction curves give the percentage of wildfire occurrence for each probability category; the more wildfire events in categories of greater susceptibility, the steeper the curve [95]. A perfect classification occurs where $\mathrm{AUC} = 1$, whereas a classification no better than chance gives $\mathrm{AUC} = 0.5$. The 70% training and 30% testing points were used to generate the success and prediction rates, respectively, as mentioned in Section 3.1.1: 210 wildfire inventory points were used for training and the remaining 90 points for testing.
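The curve construction described above can be sketched as follows: classes are ordered from highest to lowest susceptibility, cumulative shares of area and of wildfire events are accumulated, and the area under the resulting curve is obtained with the trapezoidal rule. The four-class inputs are hypothetical and much coarser than the 100 classes used in the study, and the trapezoidal integration is an assumed implementation choice.

```python
def cumulative_curve_auc(class_scores, class_areas, class_fires):
    """Area under the cumulative-area versus cumulative-wildfire curve."""
    order = sorted(range(len(class_scores)),
                   key=lambda i: class_scores[i], reverse=True)
    total_area = sum(class_areas)
    total_fires = sum(class_fires)
    area_so_far = fires_so_far = 0
    x_prev = y_prev = 0.0
    auc = 0.0
    for i in order:
        area_so_far += class_areas[i]
        fires_so_far += class_fires[i]
        x = area_so_far / total_area       # cumulative share of area
        y = fires_so_far / total_fires     # cumulative share of wildfire events
        auc += (x - x_prev) * (y + y_prev) / 2.0   # trapezoid between points
        x_prev, y_prev = x, y
    return auc


# Hypothetical equal-area classes, ranked by susceptibility score.
scores = [4, 3, 2, 1]
areas = [25, 25, 25, 25]
auc_random = cumulative_curve_auc(scores, areas, [10, 10, 10, 10])
auc_skewed = cumulative_curve_auc(scores, areas, [40, 0, 0, 0])
```

When wildfires are spread evenly over the classes the curve follows the diagonal and the area is 0.5; concentrating the fires in the most susceptible class raises it, and with finer classes the value can approach 1.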

4. Results and Discussion

4.1. Correlation Analysis

A multicollinearity analysis of the wildfire influencing factors was performed. Tables 4 and 5 list the VIF and Pearson’s correlation coefficient values, respectively. As mentioned in the methodology section, a VIF can be computed for each predictor in a predictive model. A VIF value of 1 means that the wildfire influencing factor is not correlated with other factors. The greater the VIF value, the stronger the association of that factor with the other factors. A VIF above 5 indicates multicollinearity in the dataset. Table 4 shows that the highest VIF values are 9.22 for NDVI and 6.32 for TWI, both of which are above the threshold. In the case of Pearson’s correlation, values greater than 0.7 denote high collinearity. The diagonal elements (bold text) are the correlations between each variable and itself; therefore, their value is always equal to 1. In Table 5, the highest value of 0.9, derived between TWI and rainfall, represents considerable collinearity. The second highest collinearity of 0.8 was detected between LULC and NDVI. The outcomes of both the VIF and Pearson’s correlation analyses suggest that including TWI and NDVI in the analysis may introduce collinearity. These outcomes show that LULC and rainfall already provide adequate information, which would merely be duplicated if TWI and NDVI were included.
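Both checks can be reproduced with a short sketch. The sample values below are invented for illustration, and the VIF is computed through the standard identity that the VIF of each predictor equals the corresponding diagonal element of the inverse correlation matrix; the study itself may have used a different tool.

```python
import math


def pearson(a, b):
    """Pearson's correlation coefficient between two equal-length samples."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(matrix)
    aug = [row[:] + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        d = aug[c][c]
        aug[c] = [v / d for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def vif(columns):
    """VIF per predictor: diagonal of the inverse correlation matrix."""
    corr = [[pearson(a, b) for b in columns] for a in columns]
    inv = invert(corr)
    return [inv[i][i] for i in range(len(columns))]


# Hypothetical samples: twi nearly duplicates rain, slope does not.
rain = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
twi = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0, 6.8, 8.1]
slope = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]

vif_values = vif([rain, twi, slope])
flagged = [name for name, v in zip(["rain", "twi", "slope"], vif_values) if v > 5]
```

With these invented samples, the two nearly duplicated factors exceed the threshold of 5 while the third does not, which mirrors the reasoning used to drop redundant factors.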

The final selected wildfire influencing factors dataset includes altitude (Figure 4(a)), slope (Figure 4(b)), aspect (Figure 4(c)), curvature (Figure 4(d)), TPI (Figure 4(e)), rain (Figure 4(f)), geology (Figure 4(g)), soil (Figure 4(h)), LULC (Figure 4(i)), distance from river (Figure 4(j)), distance from road (Figure 4(k)), and wind speed (Figure 4(l)). As can be seen in Figure 4, all the scaled influencing factors were classified, as required by the BSA techniques. The quantile method was used for the classification [120]. The advantage of the quantile method is that features are grouped equally in each category (equal-sized subdivisions), with the least external influence. Table 6 presents the statistics of the scaled wildfire influencing factors, such as minimum, maximum, mean, and standard deviation. For instance, the highest location in the study area has an altitude of 217 m. All the factors were passed to IoE and EBF in order to extract the correlations between their classes and wildfire occurrence.
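As a rough illustration of quantile classification, the sketch below derives class breaks so that each class receives the same number of values. The altitude values are hypothetical, the function names are illustrative, and GIS packages may handle ties and break placement differently.

```python
import bisect


def quantile_breaks(values, n_classes):
    """Upper bound of each class so that classes hold equal counts.

    Assumes len(values) >= n_classes.
    """
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(k * n) // n_classes - 1] for k in range(1, n_classes + 1)]


def classify(value, breaks):
    """Return the 1-based class whose upper bound first reaches the value."""
    idx = bisect.bisect_left(breaks, value)
    return min(idx, len(breaks) - 1) + 1


# Hypothetical altitude samples in metres (not the study's data).
altitude = [217, 3, 46, 58, 12, 90, 75, 30, 120, 64,
            150, 22, 101, 40, 8, 180, 52, 70, 35, 135]

breaks = quantile_breaks(altitude, 5)
classes = [classify(v, breaks) for v in altitude]
counts = [classes.count(c) for c in range(1, 6)]
```

Because the breaks follow the data rather than fixed intervals, class widths can be very uneven even though class counts are equal.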

4.2. Weights Derived from Correlation Analyses

Both IoE and EBF were individually implemented, and the derived weights are listed in Table 7. IoE was computed by considering the frequency of the different classes of each influencing factor, which significantly reduces the unevenness among the factors and, therefore, provides a realistic and accurate metric of their influence on wildfire occurrence. The results of the bivariate IoE analysis are given as class-level values in Table 7.
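For orientation, the sketch below follows a formulation of the index of entropy that is commonly used in susceptibility mapping; it is an assumption that it matches the equations in the methodology section term for term. The input percentages and the function name are hypothetical.

```python
import math


def ioe_weights(fire_pct, area_pct):
    """Class densities and factor weight under a common IoE formulation.

    fire_pct: percentage of wildfire events falling in each class
    area_pct: percentage of the study area covered by each class
    """
    ratio = [b / a for b, a in zip(fire_pct, area_pct)]    # frequency ratio per class
    total = sum(ratio)
    density = [r / total for r in ratio]                   # normalized class density
    entropy = -sum(d * math.log2(d) for d in density if d > 0)
    max_entropy = math.log2(len(ratio))
    information = (max_entropy - entropy) / max_entropy    # information coefficient
    factor_weight = information * (total / len(ratio))
    return density, factor_weight


# Hypothetical factor whose classes burn in proportion to their area.
even_density, even_weight = ioe_weights([25, 25, 25, 25], [25, 25, 25, 25])

# Hypothetical factor where a small class contains most of the fires.
skew_density, skew_weight = ioe_weights([70, 20, 10], [20, 30, 50])
```

Under this formulation, a factor whose classes burn in proportion to their area receives a factor weight of zero, which is one way a factor-level weight of 0 can arise.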

In the case of EBF, the weights of the bivariate analysis were likewise derived for each class. The calculations of IoE and EBF are similar up to this point, so the class-level values derived from the two methods were very close. However, this was not the case for the remainder of the processing and the final probability map creation, because from this stage forward each method has its own specific equations. Using these equations, the correlation assessments were applied to the BSA weights derived from IoE and EBF, and two wildfire probability maps were produced. To avoid repeating the BSA values from IoE and EBF, only the IoE values are discussed below.

The outcomes based on BSA values (IoE and EBF) show that the last two slope ranges, 9.58–14.97 and 14.97–76.38 degrees, had the highest values of 0.296 and 0.164, respectively. Fire on steep slopes is known to move faster and cause more severe burning; therefore, these slope classes received higher weights. The highest derived value for rainfall (0.248) belongs to the lowest rainfall class, 536.00–671.32 mm. Wildfire susceptibility increases as rainfall decreases [121]: high rainfall and relative humidity contribute to fuel moisture, which in turn reduces the probability of ignition. With regard to altitude, the middle classes seem to have the highest influence on wildfire occurrence. The class of 46.76–57.95 m, with a BSA value of 0.338, was detected as the most influential category. The spatial correlation between wildfire incidence and altitude reveals that when the altitude increases, the probability of wildfire decreases. This result is supported by previous findings that low-elevation areas are more vulnerable to fire occurrence [122]. The relation between curvature and wildfire probability revealed that the convex class has the highest BSA value of 0.548. In the case of aspect, the BSA value is highest for northeast-facing slopes, with a value of 0.229. As noted in Section 3.1.2, north-facing slopes receive more solar radiation and are consequently warmer, so fuels tend to dry out sooner. This may be a logical explanation for the weight derived for this aspect class. Results for the TPI factor show that the ridge landform is highly susceptible to wildfire occurrence. Wildfires on ridges can burn in any direction and, driven by wind, can move up through saddles and canyons. Steep slopes shed water faster and retain less moisture, and ridge areas behave like steep slopes. 
Therefore, those areas are more prone to fire hazard if they are covered by vegetation. For distance to rivers, the highest wildfire probability is within distances of 1,416.25–2,832.51. In the case of distance to roads, the range of 240.99–481.99 m has the highest susceptibility. Both distance to rivers and distance to roads were recognized as positive factors in this study, as they act like barriers to fire spread. Features such as lakes, roads, and rivers act as barriers to wildfire spread [123], preventing the continuity of fire in the area. Ridges have a similar influence, acting like fuel breaks that interrupt the continuity of fire. Some landform types can influence prevailing wind patterns by funneling air, which affects wind speed and, thus, fire intensity [124]. The BSA weights for soil showed clearly that the class “Mp6” has the greatest effect on wildfire occurrence. The dominant soil types of this class are sand and clay, which usually have very low moisture content. The geology types “Granitoid” and “Miscellaneous unconsolidated sediments” had the highest BSA weights of 0.268 and 0.231, respectively. Regarding the LULC factor, cropping received the highest weight (0.252) among LULC types, followed by residential. The class of 4.09–4.19 in the wind factor had the highest value of 0.335. In summary, the highest weights were derived for the classes that strongly influence wildfire triggering, for the reasons stated above.

Table 7 also lists information regarding the MSA. In the case of IoE, in addition to the BSA weights derived for each class, the method also provides a weight for each influencing factor itself. The LR weights represent the MSA weights for each influencing factor derived from the ensemble modeling of LR; that is, the weights derived from LR are based on the classified influencing factors imported from the BSA. The findings based on IoE revealed that the most important wildfire influencing factors affecting the wildfire distribution were soil (1.251) and geology (0.623). The LR ensemble MSA outcomes showed that LULC and soil received the highest weights of 0.781 and 0.653, respectively. The effect of soil characteristics on forest fire was discussed in [83] and is consistent with our finding. Apart from the topographic and geologic factors (e.g., TPI and soil) influencing bushfire in the region, LULC clearly played a major role as a triggering factor. In detail, as revealed by the BSA weighting values and the susceptibility map, cropping and residential areas gained higher weights and were more susceptible to fire risk than other LULC classes. This highlights that human activity also contributed to the hazard. The influence of residential areas as an important factor is in agreement with Tien Bui et al. [32] and Kalantar et al. [35]. Therefore, a more detailed investigation by the management committee is desirable to identify some of the causes (e.g., cigarettes, grinding activities, and adjacency to power lines) of the spread of bushfire in residential areas. Another human activity detected in the area was cropping, which needs to be carefully explored with respect to the crops and the process of cultivation. 
Some parts of the cropping process might worsen the intensity and speed of a fire, such as intentional ignition for agricultural clearing, or wrapping banana bunches in plastic bags on the plant to reduce the ripening time, which can act as fuel and magnify the fire. Such findings could inform decisions on revising the cropping process in those susceptible areas.

The IoE and LR methods demonstrated both similarities and differences in their analysis of the influencing factors. TPI and soil were identified as considerable influencing factors in both models. Penman et al. [125] concluded that on ridges with higher TPI, the probability of lightning ignitions was higher, which is in agreement with our outcome. In IoE, geology is the second and curvature the third most influential factor. In contrast, in LR, slope is the most influential factor after LULC. Parente et al. [63] found slope to be the main influencing factor. The LR outcome also showed that rainfall has a strong negative correlation with wildfire occurrence, with a weight of -0.326, implying that as rainfall increases, the probability of wildfire occurrence decreases. The MSA weights derived from IoE and ensemble LR conflict for aspect: the derived values were 0 and 0.781 from IoE and LR, respectively. It is well known that aspect has a considerable influence on a region’s characteristics, such as exposure to sunshine, wind direction, precipitation, drying winds, and morphologic structure, all of which have been associated with fire occurrence. Here, IoE failed to indicate a strong association between aspect and wildfire occurrence. This will be dealt with in the AUC analysis.

In addition, to visualize the location of the burnt areas, a 3D map of the altitude and wildfire inventories was produced and is presented in Figure 5. It can be clearly seen that most of the burnt areas are located on the slopes and in the north and northeast of the region.

4.3. Susceptibility Maps and Validations

The individual IoE (Figure 6(a)) and EBF (Figure 6(b)) modeling and the ensemble LR (Figure 6(c)) modeling were implemented. Three wildfire probability index maps were generated. In order to create a susceptibility map, the probability index maps should be classified. In this study, the probability maps were classified using the quantile classification method and grouped into relative categories of very low, low, moderate, high, and very high susceptibilities of wildfire occurrence (Figure 6).

Table 8 shows that the IoE and EBF proportions of the very high wildfire susceptibility category are 10.25% and 13.52%, respectively, both of which are greater than that of LR (6.56%). This implies that the LR ensemble model produced less exaggerated outcomes than the two individual methods. The wildfire susceptibility map derived from the ensemble model classified 68.34% of the total region as very low susceptibility, whereas the two individual methods, IoE and EBF, classified 22.77% and 51.21%, respectively, as “very low.” For the LR ensemble model, the remaining zones of high, moderate, and low susceptibility covered 3.78%, 9.23%, and 12.09% of the region, respectively.

The AUC prediction accuracy of the two individual methods and the final ensemble model created by LR is displayed in Figure 7. The greatest accuracy (88.51%) was recorded by the ensemble method, while the lowest accuracy (75.32%) was recorded by the individual IoE method. The AUC results showed that the outcome of the ensemble analysis was more reliable than that of the individual methods. Consequently, the robustness of the EBF technique in exploiting different variables during processing was demonstrated. One advantage of the EBF model is that it calculates the degree of uncertainty in the prediction, along with belief and disbelief (Table 7), during the process of generating probability mass functions [79]. The degrees of uncertainty and plausibility of a pixel (classified as a fire event) can be measured quantitatively, which cannot be achieved through the other methods, and EBF is a fast algorithm without heavy calculation or iteration [110]. The closer the recorded value is to 0.5, the higher the uncertainty of the class. By looking at the uncertainty values in Table 7, one can judge how reliable the measurement is for every single feature and piece of evidence. In addition, the three algorithms used do not require the assumption of a normal distribution, which provides robust operation when modeling complex events [110]. The simplicity of EBF, IoE, and LR makes them a good choice for modeling big data compared with iterative ML methods. For instance, LR has fewer parameters to fine-tune, whereas the SVM model requires optimizing the kernel function, the penalty, and the gamma parameters.

Risk and vulnerability analyses can be performed for different characteristics. Brisbane City Plan 2014 is a large-scale future development plan of the Brisbane City Council that covers a variety of aspects. The city planners divided the region into different zones based on specific topics. The “very high” class of the final wildfire susceptibility map derived from ensemble modeling was extracted and overlaid on the Brisbane City Plan thematic layers. These layers illustrate the spatial distributions of a variety of important species, features, etc., in the city of Brisbane. Figure 8 illustrates the different risk maps. The single map at the top represents only the very high susceptibility zone. Subsequently, each pair of maps in a row represents a vulnerability map and its overlay with the very high susceptibility zone. The seven maps relate to (a) general zoning, (b) significant trees, (c) critically endangered species, (d) heritage area, (e) industrial areas, (f) freight route, and (g) koala habitat areas.

The following maps were prepared by overlaying the aforementioned maps with the “very high” class of the wildfire susceptibility map derived from the ensemble modeling (Figure 8). The first wildfire risk likelihood map is related to the general zoning map. As illustrated in the first row (Figure 8(a)), the very high susceptibility zone is distributed across general residential areas; consequently, it would be efficient and reasonable to organize fire prevention strategies and management plans around these fire-risk zones. The second row (Figure 8(b)) presents the significant trees map. The overlaid risk map shows that only a small portion of those trees is located in the very high wildfire susceptibility zone; therefore, those trees are not at very high risk. In the third-row maps (Figure 8(c)), only a few of the critically endangered species are located in the very high susceptibility zone. In the case of the heritage area (Figure 8(d)) in the fourth row, several areas fall within the susceptible zone. The overlap is larger in the industrial areas map (Figure 8(e)) in the fifth row, where many industrial regions are in the very high wildfire susceptibility zone.

Freight routes (Figure 8(f)), which are usually used for transportation and cargo services, are presented in row six; only the routes shown in red fall within the very high susceptibility areas. The slightest disruption to such routes can have a serious impact on transportation across the region. The last row illustrates the koala habitat areas (Figure 8(g)). The koala is a species of special importance in Australia, and, according to our analysis, the areas it inhabits are ranked as very high wildfire susceptibility zones.
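The overlay step behind these risk maps can be illustrated with a small raster sketch. The grids, layer names, and class coding (5 standing for “very high”) are hypothetical; the study performed the overlay on the Brisbane City Plan layers in a GIS.

```python
def overlay_share(susceptibility, theme, target_class=5):
    """Share of a thematic layer's cells that fall in the target class.

    susceptibility: grid of class codes (here 1 = very low ... 5 = very high)
    theme: grid of 0/1 flags marking where the thematic feature is present
    """
    hits = total = 0
    for s_row, t_row in zip(susceptibility, theme):
        for s, t in zip(s_row, t_row):
            if t:
                total += 1
                if s == target_class:
                    hits += 1
    return hits / total if total else 0.0


# Hypothetical 4 x 4 susceptibility grid.
susceptibility = [
    [1, 1, 2, 5],
    [1, 3, 5, 5],
    [2, 4, 5, 3],
    [1, 1, 2, 2],
]

# Hypothetical thematic layers on the same grid.
habitat = [
    [0, 0, 0, 1],
    [0, 0, 1, 1],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
trees = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 0, 0],
]

habitat_share = overlay_share(susceptibility, habitat)
trees_share = overlay_share(susceptibility, trees)
```

A share close to 1 means the thematic feature lies almost entirely within the very high susceptibility zone, while a share of 0 means there is no overlap.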

Our research indicates that a variety of elements are at risk when wildfires occur. In some cases, the risk map gives more information than the susceptibility map and may be of more use to government agencies, councils, insurance companies, and the inhabitants of risk areas. Wildfire management is a colossal undertaking, and many aspects of it are impractical. Therefore, maps like those illustrated in Figure 8 could assist organizations and individuals at risk. The best policy strategies include preventative and mitigating measures.

5. Conclusion

This research focused on increasing the accuracy of wildfire susceptibility mapping through ensemble modeling. Additionally, the importance of applying susceptibility mapping in risk analysis was evaluated and illustrated. The mapping process commenced with the structuring and processing of two datasets: the inventory and the influencing factors. The IoE, EBF, and LR statistical methods were used to perform BSA and MSA. The process calculated the statistical weights for each wildfire influencing factor (e.g., geology map) and for the classes of each influencing factor (i.e., geology types) to produce the final wildfire susceptibility map. IoE and EBF ranked each class of each influencing factor, and the influencing factors were then reclassified according to their BSA weights and entered into ensemble modeling using LR. The final assessment of the accuracy of the susceptibility maps showed that the ensemble method produced a more reliable outcome, with an 88.51% prediction capability. This AUC exceeded that reported in most previous research. We recommend precision in composing the raw datasets, which is crucial to realizing the full potential of the increased accuracy of an ensemble model. The selection of the most influential classes and factors has a considerable impact on the result.

Additional information extracted from the analysis and the derived weights confirmed that features such as rivers, rock outcroppings, road networks, and water bodies can act as fire barriers (fuel breaks). City planners can take advantage of these natural and manmade features in controlling and minimizing wildfires. Using the topographic map of a region, these features can be easily located. The influencing factors of slope, TPI, and soil have a significant impact on the creation and location of wildfire susceptible areas, while land management practices have a considerable impact on the magnitude and consequences of this phenomenon. As described, fire risk analysis is an essential practice for protecting the forest environment, where possible. The zoning maps of a variety of demographic, topographical, and environmental aspects were overlaid onto the “very high” wildfire susceptibility map, which produced alternative risk information that might improve human perception and understanding of the hazard. Managing fire is vital for the protection of human dwellings and environmental habitats. Wildfires are an annual occurrence in Queensland, and preparations for prevention and mitigation are invaluable. Hence, accurate identification of the conditioning and triggering factors in the region, along with a risk map, would enhance mitigation strategies and plans and could minimize loss and damage. One possible strategy is to replace wood with fire-resistant lumber or metals for building and construction in susceptible areas. In this study, irrigated cropping was classified as having low and very low proneness to bushfire. This finding could be investigated in future research on the potential of irrigation systems to maintain soil moisture and thereby limit the fuel source. 
Moreover, the vulnerable wildfire zones with dry soil deserve attention and need further inspection in terms of groundwater sources and their quality. Our planned future research into modeling wildfire will focus on comprehensive risk and vulnerability assessment using a time series perspective of the trend and extent of wildfire in the region.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no competing interests.

Authors’ Contributions

M.S.T, H.Ö, F.S, and F.S. conceived and designed the experiments. M.S.T, F.H, and F.H. performed the experiments. M.S.T, H.Ö, F.S, and F.S. analyzed the data. M.S.T, H.Ö, M.R.H, F.S, and F.S. contributed reagents/materials/analysis tools. M.S.T, M.R.H, F.S, B.K, V.S, and F.S. wrote the paper. B.K, N.U, and V.S. edited, restructured, and professionally optimized the manuscript.


Acknowledgments

Thanks are due to the Australian Research Council Centre of Excellence for Australian Biodiversity and Heritage. This research was done by the Australian Research Council Centre of Excellence for Australian Biodiversity and Heritage (CE170100015; http://EpicAustralia.org/).