#### Abstract

This paper explores wildfire modeling based on meteorological variables for Tanjung Puting National Park, located on the island of Borneo. A separable model is developed for predicting daily wildfire burn area using variables such as temperature, sea level pressure, humidity, precipitation, visibility, and wind speed. Each component in the model is estimated using kernel smoothing and maximum likelihood methods. The data are shown to be largely compatible with the separable model, suggesting that the relationship between wildfire burn area and any of these weather variables in particular does not appear to change significantly depending on the values of the other weather variables. The analysis appears to confirm the findings of previous studies on wildfire in Southern California which indicate that wildfire hazard may be suitably estimated using a simple multiplicative model where the impact of each weather covariate is estimated separately.

#### 1. Introduction

The island of Borneo has suffered severe deforestation and forest degradation over the past two decades, with fire acting as a significant factor [1]. Located on the southern coast of the island's Indonesian territory known as Kalimantan, Tanjung Puting National Park covers over 400 000 hectares and is susceptible to anthropogenic fires and wildfires year round. A map of the region is shown in Figure 1. The park contains a variety of habitats, including lowland rainforest, peat swamp forest, mangrove swamp, and abandoned agricultural areas, and is well known as the home of Camp Leakey, a world renowned center for the study and rehabilitation of orangutans [2].

Accurate estimation of wildfire hazard is important in helping National Park officials prepare supplies and staff for preventing, combatting, and controlling large wildfires. One way to obtain estimates of wildfire hazard would be to produce a statistical model that uses weather variables such as mean humidity, mean temperature, and precipitation in forecasting total daily burn area due to wildfires in the National Park. While a variety of different types of models may be used to predict wildfire incidence based on models for human or lightning-caused ignition and other possible factors, or to model the spread of existing fires possibly relying on physical characteristics of the fires and the landscape, the focus here is on the forecasting of wildfire activity solely using meteorological variables. Such statistical forecasts may be useful not only for planning and preventive purposes, but also for the sake of understanding the critical role that these weather variables can play in affecting wildfire incidence and behavior.

Recently, separable point process models of this sort have been used to estimate wildfire hazard in Southern California, as a function of weather variables [3]. The current paper explores the fit of such models to Tanjung Puting National Park. Using weather variables as covariates, components of a purely multiplicative model can readily be estimated individually if the assumption of separability is satisfied [4]. In such cases, one may use a nonparametric method such as kernel smoothing in order to suggest a parametric form for each component in the model. While Schoenberg et al. [3] found separable models to fit rather well to wildfire data in Southern California, a question posed was whether these types of models could fit adequately in other regions. Here, we explore the use of kernel smoothing and semiparametric approaches in estimating separable point process models for daily burn area in Tanjung Puting National Park. While a variety of different response variables are used in wildfire hazard models such as the US National Fire Danger Rating System [5], the focus of this paper is on forecasting total daily burn area, rather than fire frequency or the spread rates of existing fires. The purpose of fitting such a model is not only for the accurate estimation of wildfire hazard on a given day, but also in order to simulate realistic overall wildfire activity given meteorological conditions in the National Park. In addition, the simple empirical-based model proposed here may be seen as a baseline against which alternative, more complex physics-based models for forecasting wildfire incidence in Indonesia, such as that described in de Groot et al. [6], may be compared.

A description of the weather and fire data for Tanjung Puting National Park used in this paper can be found in Section 2. Kernel smoothing techniques as well as several bandwidth selection methods are explored in Section 3. The definition of separability is also reviewed in Section 3, and the different distributions explored in order to simulate fires for testing separability are described. Results of the methods chosen in Section 3 are then detailed and explained in Section 4. Conclusions are given in Section 5, and limitations and suggestions for further study are discussed in Section 6.

#### 2. Data

There are over 160 weather stations located among the islands of Indonesia. Situated in Pangkalan Bun, just outside the boundaries of Tanjung Puting National Park (elevation 25 meters), weather station 966450 (WRBI) records a variety of daily meteorological variables. We focus here on temperature, sea level pressure, humidity, precipitation, visibility, and wind speed, collected from January 2001 to January 2007. The data are presented on Tutiempo.net, which bases its data summaries on data exchanged under the World Meteorological Organization (WMO) World Weather Watch Program according to WMO Resolution 40 (Cg-XII).

The MODIS Rapid Response System utilizes a contextual fire detection algorithm that incorporates a combination of an absolute threshold test and a series of contextual tests that look for the characteristic signature of an active fire using two 4 μm wavelength bands and an 11 μm wavelength band [7]. The algorithm further uses cloud and water masking, as well as several false alarm rejection tests such as sun glint rejection to verify the existence of detected wildfires. On board the satellites Terra and Aqua, the MODIS sensor passes over Borneo four times a day, ensuring accurate and thorough coverage of fire activity on the island [8]. The MODIS sensor is a well-established system used to recognize fires at a spatial resolution of 1 km [9]. All fires detected within the region of Tanjung Puting National Park from January 2001 to January 2007 by the MODIS sensor on both the Terra and Aqua satellites, whose total area exceeded 9600 , were used for this analysis.

Data were missing for one or more weather variables on certain days over the time range considered here. We restrict our attention to the 1533 days where temperature, visibility, wind speed, sea level pressure, humidity, and precipitation were all recorded. On these days, there were 329 days on which fires were recorded, with 793 being the largest amount of area burned on any single day during this 6-year period.

#### 3. Methods

Spatial-temporal marked point process models are used to represent observations of rare events such as wildfires or earthquakes. For a thorough treatment of point processes and related constructs, see Daley and Vere-Jones [10]. A few important details are summarized here. A point process is a random collection of points in some metric space $\mathcal{X}$. In modeling the occurrence of wildfires, for example, one may identify with each event a point $(t, x, m)$, where $t$ represents the time of the event's origin, $x$ the corresponding location, and $m$ a real-valued measure of its size. The basic construct of a point process model is the *conditional intensity*, $\lambda(t, x, m)$, which one can interpret as the limiting expected rate at which points of mark $m$ amass around any location $(t, x)$ of space-time, conditional on the history of the process prior to time $t$.

In order to model the incidence of wildfires in Tanjung Puting National Park, one technique would be to create a model based on the point process models developed in other papers to describe wildfires in Los Angeles County. As suggested by Schoenberg [11], a model that is purely multiplicative, or *separable* in the terminology of Cressie [12], may be appropriate. Typically, in such models, each component of the model may be estimated individually. As mentioned in Section 1, the goal of our analysis is to use daily weather variables to model the expected total daily burn area, rather than fire frequency or the spread rates of existing fires. That is, we model the integral $\iiint m\,\lambda(t, x, m)\,dm\,dx\,dt$ of the conditional intensity $\lambda$, taken over all fire sizes $m$, all locations $x$, and all times $t$ within the day in question. In analogy with the model proposed in Schoenberg et al. [3], we consider models where on any given day $d$, the expected burn area is separable, that is,

$$E[B_d] = f_1(P_d)\, f_2(V_d)\, f_3(H_d)\, f_4(S_d)\, f_5(T_d)\, f_6(W_d), \tag{1}$$

where $B_d$ denotes total burn area on day $d$, and $P_d$, $V_d$, $H_d$, $S_d$, $T_d$, and $W_d$ represent precipitation, visibility, humidity, sea level pressure, temperature, and wind speed, respectively, for day $d$.

One may argue that the association between visibility and wildfire activity may be due to visibility being a proxy for wildfires that have already occurred; that is, low visibility is often largely the *result* of large wildfires, rather than the other way around. Hence one may wonder about the performance of a separable model with visibility excluded, that is, a model of the form

$$E[B_d] = f_1(P_d)\, f_3(H_d)\, f_4(S_d)\, f_5(T_d)\, f_6(W_d). \tag{2}$$

In addition, for comparison with the work of de Groot et al. [6] in forecasting wildfire activity, one may also assess a separable model similar to (1) but with both visibility and sea level pressure removed, that is,

$$E[B_d] = f_1(P_d)\, f_3(H_d)\, f_5(T_d)\, f_6(W_d). \tag{3}$$

Such a model would represent a far simpler alternative to the more complex physics-based models summarized in de Groot et al. [6], which also use only precipitation, humidity, temperature, and wind speed.
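
As a minimal sketch of how a purely multiplicative model of this kind is evaluated, the following Python snippet multiplies one component value per covariate and drops a component to obtain a reduced model. The component functions, their coefficients, the covariate names, and the sample day are all hypothetical placeholders chosen for illustration; they are not estimates from the Tanjung Puting data.

```python
import math

# Hypothetical component functions, one per covariate. The functional forms
# and coefficients are illustrative assumptions, NOT fitted values.
components = {
    "precipitation": lambda p: max(0.0, 1.0 - 0.02 * p),  # linear form
    "visibility": lambda v: math.exp(-0.3 * v),            # exponential forms
    "humidity": lambda h: math.exp(-0.05 * h),
    "pressure": lambda s: math.exp(0.1 * (s - 1010.0)),
    "temperature": lambda t: math.exp(0.2 * t),
    "wind": lambda w: math.exp(0.05 * w),
}

def expected_burn_area(weather, components, exclude=()):
    """Multiply the component values for one day, skipping excluded covariates."""
    area = 1.0
    for name, f in components.items():
        if name not in exclude:
            area *= f(weather[name])
    return area

# A made-up day of weather readings.
day = {"precipitation": 0.0, "visibility": 8.0, "humidity": 70.0,
       "pressure": 1012.0, "temperature": 27.0, "wind": 5.0}

full_model = expected_burn_area(day, components)
without_visibility = expected_burn_area(day, components, exclude=("visibility",))
```

Because the model is a product, removing a covariate amounts to removing one factor; in practice the remaining components would be re-estimated rather than reused.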

In estimating each of the individual component functions in (1), one approach is to use a nonparametric method such as kernel smoothing [13]. That is, if $X$ represents the corresponding weather variable in (1), then the component $f$ may be estimated using kernel regression via $\hat{f}(x) = \sum_i B_i\,K\{(x - X_i)/h\} \big/ \sum_i K\{(x - X_i)/h\}$, where $x$ is any real number, $X_i$ is the value of the weather variable on day $i$, and $B_i$ is the total burn area on day $i$. The function $K$ is called the *kernel density* and typically obeys the constraint $\int K(u)\,du = 1$. The parameter $h$ represents the *bandwidth*, which controls the degree of smoothing.
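
The kernel regression step can be sketched as follows, assuming the common Nadaraya-Watson form with a Gaussian kernel. The function names `gaussian_kernel` and `kernel_regression` and the toy data are hypothetical and serve only to illustrate the idea.

```python
import math

def gaussian_kernel(u):
    """Standard normal density, which integrates to one."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kernel_regression(x, xs, bs, h):
    """Kernel-weighted average of burn areas bs at covariate value x.

    xs: covariate value on each day; bs: burn area on each day;
    h: bandwidth (larger values give a smoother curve).
    """
    weights = [gaussian_kernel((x - xi) / h) for xi in xs]
    total = sum(weights)
    if total == 0.0:
        return float("nan")  # x is too far from every observation
    return sum(w * b for w, b in zip(weights, bs)) / total

# Toy data: burn area rises with the covariate.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
bs = [0.0, 1.0, 1.5, 4.0, 9.0]
curve = [kernel_regression(x, xs, bs, h=1.0) for x in (0.5, 2.0, 3.5)]
```

The estimate at each point is a weighted mean, so it always lies between the smallest and largest observed burn areas.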

There are several different methods for automatically choosing a bandwidth for kernel smoothing. Silverman's "rule of thumb" bandwidth selection technique is a common choice, where the bandwidth $h = 0.9 \min(s, \mathrm{IQR}/1.34)\, n^{-1/5}$, with $s$ the sample standard deviation, $\mathrm{IQR}$ the interquartile range, and $n$ the number of observations of the variable being smoothed [13]. The bandwidth chosen by Silverman's rule, however, often is too small when the covariate under consideration is not normally distributed [13, 14].
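
For illustration, Silverman's rule of thumb in its usual form, 0.9 times the smaller of the standard deviation and the interquartile range divided by 1.34, times the sample size to the power -1/5, can be computed with the Python standard library as below. The function name `silverman_bandwidth` is hypothetical, and the quartiles use the default method of `statistics.quantiles`, which is one of several conventions.

```python
import statistics

def silverman_bandwidth(xs):
    """Rule-of-thumb bandwidth from the spread and size of the sample xs."""
    n = len(xs)
    s = statistics.stdev(xs)                    # sample standard deviation
    q1, _, q3 = statistics.quantiles(xs, n=4)   # quartiles
    iqr = q3 - q1                               # interquartile range
    return 0.9 * min(s, iqr / 1.34) * n ** (-0.2)

# Toy covariate: the integers 1 to 100.
h = silverman_bandwidth(list(range(1, 101)))
```

Note that when more than half of the observations share a single value the interquartile range can be zero, in which case this formula returns a bandwidth of zero and another selector is needed.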

Another method commonly used in bandwidth selection is the likelihood cross validation (LCV) technique [13]. This approach temporarily removes each observation $i$ in the dataset and then calculates the estimate of the kernel smoothed function at that point, $X_i$, using an initial bandwidth $h$. This value, $\hat{f}_{-i}(X_i)$, is then used to calculate the distance from the observed total burn area on the day that was removed in computing the kernel estimate. The bandwidth that minimizes the sum of these distances over all days is then chosen as the optimal bandwidth. LCV bandwidth selection is not optimal, however, when used to estimate the relationship between a particular weather variable and observations of rare events such as fire incidence [14]. In particular, when the covariate has many repetitions of identical values, bandwidths estimated by LCV tend to be too small. This is the case for the observed weather variables studied, where over 58% of mean temperature observations, for example, are exactly the same on ten or more days.

In light of the shortcomings of likelihood cross validation, Schoenberg et al. [14] suggest a modified version of LCV bandwidth selection that will result in a smoother estimate. In modified likelihood cross validation, instead of only removing observation $i$ in the prediction at its covariate value $X_i$, all observations with the same value as $X_i$ are removed when predicting at $X_i$. Thus, rather than removing one observation at a time, the modified LCV approach removes one small portion of the *x*-axis at a time. As with LCV, the bandwidth that minimizes the resulting cross-validation criterion is then chosen as the optimal bandwidth.
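
The difference between the two cross-validation schemes can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: it assumes a Gaussian kernel, a Nadaraya-Watson estimate, squared distance as the criterion, and a search over a fixed list of candidate bandwidths. The names `held_out_estimate`, `cv_score`, and `select_bandwidth` are hypothetical.

```python
import math

def gaussian_kernel(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def held_out_estimate(i, xs, bs, h, drop_ties):
    """Kernel regression at xs[i] with day i held out.

    If drop_ties is True, every day sharing the covariate value xs[i] is
    held out as well (the modified scheme).
    """
    num = den = 0.0
    for j, (xj, bj) in enumerate(zip(xs, bs)):
        held_out = (xj == xs[i]) if drop_ties else (j == i)
        if held_out:
            continue
        w = gaussian_kernel((xs[i] - xj) / h)
        num += w * bj
        den += w
    return num / den if den > 0.0 else None

def cv_score(xs, bs, h, drop_ties=False):
    """Sum of squared distances between held-out estimates and observations."""
    score = 0.0
    for i in range(len(xs)):
        estimate = held_out_estimate(i, xs, bs, h, drop_ties)
        if estimate is not None:  # skip days with no usable neighbours
            score += (bs[i] - estimate) ** 2
    return score

def select_bandwidth(xs, bs, candidates, drop_ties=False):
    """Return the candidate bandwidth with the smallest score."""
    return min(candidates, key=lambda h: cv_score(xs, bs, h, drop_ties))

# Toy data with a repeated covariate value.
xs = [1.0, 1.0, 2.0, 3.0]
bs = [0.0, 4.0, 1.0, 2.0]
h_plain = select_bandwidth(xs, bs, [0.5, 1.0, 2.0])
h_modified = select_bandwidth(xs, bs, [0.5, 1.0, 2.0], drop_ties=True)
```

When no covariate values repeat, the two schemes give identical scores; they differ only in how tied observations are treated.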

Model (1) is purely multiplicative, and one may wish to test whether such a model, which is called *separable* in the terminology of Cressie [12], may be appropriate. Several statistics for testing separability in point process models were proposed in Schoenberg [11], and extended in Chang and Schoenberg [4] to the case of multi-dimensional point processes with covariates. The method described in Schoenberg [11] involves selecting a pair of covariates, and comparing a bivariate kernel smoothing of the response variable (which in this case is total daily burn area), smoothed with respect to both covariates, with the product of two univariate kernel estimates, smoothed with respect to each of the covariates individually. The former may be considered a *nonseparable* estimate of burn area, since it does not assume a multiplicative relationship between the two variables, whereas the product of the two univariate kernel estimates may be considered a *separable* estimate of wildfire burn area based on these two variables. The statistic suggested by Schoenberg [11] and Chang and Schoenberg [4] to be most powerful in detecting departures from separability is their Cramér-von Mises-type statistic, which is the integrated squared difference between these two kernel estimates. In order to produce *P*-values for these test statistics, simulations of separable models may be used, exactly as in Chang and Schoenberg [4]. In addition, one may assess the fit of the resulting separable model by computing its root mean squared error in predicting daily wildfire area burned, and comparing with a simple alternative such as a homogeneous Poisson model.
Note that since the distribution of wildfire sizes tends to be heavy tailed and well-approximated by the Pareto or tapered Pareto distributions [15, 16], the root mean squared fire size is typically much larger than the mean, and hence it is important to compare the root mean squared error of a model with that of a simpler model such as the homogeneous Poisson process, rather than with the mean wildfire size.
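
A rough sketch of the integrated squared difference between a nonseparable and a separable kernel estimate is given below. Several choices here are assumptions made for illustration and may differ from the cited statistics: a Gaussian product kernel, kernel regression in both one and two dimensions, division of the product of univariate estimates by the overall mean burn area to put both surfaces on the same scale, and a Riemann sum over an evenly spaced grid in place of the integral. All function names are hypothetical, and the simulation step needed to obtain *P*-values is not shown.

```python
import math

def kern(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def smooth_1d(x, xs, bs, h):
    """Univariate kernel regression of burn area on one covariate."""
    w = [kern((x - xi) / h) for xi in xs]
    return sum(wi * b for wi, b in zip(w, bs)) / sum(w)

def smooth_2d(x, y, xs, ys, bs, hx, hy):
    """Bivariate kernel regression using a product of Gaussian kernels."""
    w = [kern((x - xi) / hx) * kern((y - yi) / hy) for xi, yi in zip(xs, ys)]
    return sum(wi * b for wi, b in zip(w, bs)) / sum(w)

def separability_statistic(xs, ys, bs, hx, hy, grid_x, grid_y):
    """Riemann-sum approximation of the integrated squared difference.

    Assumes evenly spaced grids with at least two points each and a
    nonzero mean burn area.
    """
    mean_b = sum(bs) / len(bs)
    total = 0.0
    for gx in grid_x:
        fx = smooth_1d(gx, xs, bs, hx)
        for gy in grid_y:
            nonseparable = smooth_2d(gx, gy, xs, ys, bs, hx, hy)
            separable = fx * smooth_1d(gy, ys, bs, hy) / mean_b
            total += (nonseparable - separable) ** 2
    cell_area = (grid_x[1] - grid_x[0]) * (grid_y[1] - grid_y[0])
    return total * cell_area

# Toy data: two covariates and a burn area for each of six days.
xs = [0.0, 1.0, 2.0, 0.0, 1.0, 2.0]
ys = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
bs = [1.0, 2.0, 3.0, 2.0, 5.0, 9.0]
grid = [0.0, 0.5, 1.0, 1.5, 2.0]
stat = separability_statistic(xs, ys, bs, 0.7, 0.7, grid, [0.0, 0.5, 1.0])
```

Larger values of the statistic indicate a greater discrepancy between the two surfaces; its significance would be judged against values obtained from data simulated under a separable model.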

#### 4. Results

Wildfire activity in Tanjung Puting National Park appears to depend rather critically on weather variables such as precipitation, temperature, humidity, and atmospheric pressure. For instance, the solid curve in Figure 2 shows a smoothed estimate of the relationship between daily area burned and sea level pressure, obtained by kernel regression using a Gaussian kernel function and bandwidth selected by modified LCV. The fitted curves suggest that the average daily burned area increases with increasing atmospheric pressure, although the scatter about the curves shrouds this observation in uncertainty. (Note that in the right panel of Figure 2, the *y*-axis has been truncated to highlight the smoothed curve, and as a result not all points are shown in the figure.)

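Figure 2: Kernel regression estimate of daily area burned as a function of sea level pressure. (a) Left panel; (b) right panel, with the *y*-axis truncated to highlight the smoothed curve.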

Figure 3 shows the smoothed estimate of the relationship between daily burn area and visibility. As visibility increases, the mean area burned in wildfires decreases rapidly. Note that this is consistent with the hypothesis mentioned in the previous section, regarding low visibility being essentially a proxy for wildfire activity already in progress. This kernel regression plot of mean visibility and number of fires per day suggests an exponential form for the visibility component in model (1). Similar kernel regression plots of number of daily fires against each of the other weather variables suggest exponential forms for the humidity, sea level pressure, temperature, and wind speed components, whereas a linear model appears preferable for the precipitation component.

Figure 3: Kernel regression estimate of daily burn area as a function of visibility, panels (a) and (b).

The assumption of separability in model (1) should be tested to ensure that a separable model is in fact appropriate for the data. Figure 4 shows nonseparable and separable kernel estimates, respectively, of daily burn area as a function of temperature and mean sea level pressure. Both estimates show that when mean sea level pressure is high, expected area burned is also high, though the two estimates have obvious discrepancies, especially when both temperatures and atmospheric pressures are highest. Nevertheless, the left panel of Figure 5 shows that the difference between the nonseparable and separable estimates shown in Figure 4 is not statistically significant. The estimated *P*-value of the test statistic using 100 simulations is 0.22, suggesting that a separable model for mean temperature and mean sea level pressure may be reasonable for wildfire incidence in Tanjung Puting National Park.

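Figure 4: (a) Nonseparable and (b) separable kernel estimates of daily burn area as a function of temperature and mean sea level pressure.

Figure 5: Separability test results for (a) temperature and sea level pressure and (b) humidity and precipitation.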

Similar to Figure 4, Figure 6 shows the nonseparable and separable kernel estimates of burn area as a function of humidity and precipitation. The two estimates in Figure 6 appear to agree generally. Both the nonseparable and separable estimates in Figure 6 are high when humidity is between 58% and 68% and precipitation is low. The nonseparable estimate predicts a high amount of area burned when precipitation is below 25 millimeters, while the separable estimate suggests a high expected amount of area burned when precipitation is below 10 millimeters. The right panel of Figure 5 shows that the difference between the nonseparable and separable estimates shown in Figure 6 is not statistically significant. The estimated *P*-value of the test statistic using 100 simulations is 0.35, suggesting that a separable model for burn area as a function of mean humidity and precipitation may be reasonable for wildfire incidence in Tanjung Puting National Park. Similar tests of separability were conducted for all possible pairs of weather variables and their *P*-values are presented in Table 1.

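Figure 6: (a) Nonseparable and (b) separable kernel estimates of burn area as a function of humidity and precipitation.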

Table 1 shows that a separable, or purely multiplicative, form for model (1) may be reasonable in light of the fact that the difference between the nonseparable and separable kernel estimates of burn area is not statistically significant for any pair of covariates. The implication is that the relationship between wildfire burn area and one covariate such as temperature, for example, does not appear to change significantly depending on the values of the other covariates.

The extent to which the weather variables used in model (1) result in improved predictions of daily wildfire burn area may be indicated by the relative decrease in root mean squared (RMS) error in wildfire area when using these variables. Compared to the best-fitting homogeneous Poisson ("null") model with constant expected burn area over all days, the separable model (1) reduced the root mean squared error from 38.20 to 31.36. In the second column of Table 2, the RMS errors are reported when the entire 6-year dataset was used both in fitting and for model assessment. As a precaution against overfitting, the models were also fitted to the first 4 years of data and then assessed based on the final 2 years, and the resulting RMS errors are reported in the third column of Table 2. Although a considerable contribution of the association between these weather variables and daily burn area is due to visibility, as seen by the relative performance of model (1) compared to model (2), note that model (2) nevertheless does provide a very substantial improvement compared to the null model, indicating that the other weather variables such as temperature and precipitation have a cumulative effect that is stronger than that of visibility. Similarly, the association between sea level pressure and burn area is rather weak once the other variables (temperature, precipitation, humidity, and wind speed) have been taken into account, as seen by the similarity in the performance of models (2) and (3). Note that the difference between columns 2 and 3 is largely due to the fact that 2006 saw an unusually high level of burn activity in Tanjung Puting. Indeed, even the homogeneous Poisson model, which has only one fitted parameter, shows a very substantial increase in RMS error during the last two years of the dataset, and this is clearly not the result of overfitting.
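
The RMS error comparison against a constant-mean null model, including the split into a fitting period and an assessment period, can be sketched as follows. The burn areas and predictions below are made-up numbers for illustration only, and the helper names `rmse` and `null_model_rmse` are hypothetical.

```python
import math

def rmse(observed, predicted):
    """Root mean squared error between observed and predicted burn areas."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def null_model_rmse(train_areas, test_areas):
    """RMS error of a model predicting the training mean on every test day."""
    mean_train = sum(train_areas) / len(train_areas)
    return rmse(test_areas, [mean_train] * len(test_areas))

# Made-up daily burn areas: an earlier fitting period and a later test period.
train = [0.0, 0.0, 5.0, 0.0, 20.0, 0.0]
test = [0.0, 40.0, 0.0, 10.0]

# Made-up predictions from some weather-based model for the test days.
model_predictions = [2.0, 25.0, 1.0, 8.0]

baseline = null_model_rmse(train, test)
model_error = rmse(test, model_predictions)
relative_reduction = 1.0 - model_error / baseline
```

Reporting the error relative to the null model matters here because heavy-tailed burn areas make the raw RMS error large even for a reasonable model.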

#### 5. Conclusions

A purely separable model which predicts wildfire burn area as a function of temperature, sea level pressure, humidity, precipitation, visibility, and wind speed appears to offer a satisfactory fit to the data from Tanjung Puting National Park from January 2001 to January 2007. For the Tanjung Puting data, the relationship between each of the weather variables and burn area appears to be approximately exponential, with the exception of precipitation, whose relationship with wildfire area is closer to linear. Departures from separability are not statistically significant, as indicated by application of the tests of Schoenberg [11]. The results appear to support the findings of Schoenberg et al. [3] and Schoenberg et al. [14], which suggest estimating wildfire hazard using a simple multiplicative model where the impact of each weather covariate is estimated separately. The separability of the model implies that the relationship between wildfire burn area and any of these weather variables in particular does not change significantly depending on the values of the other weather variables.

#### 6. Discussion

Accurate wildfire prediction based solely on daily weather variables such as those considered in model (1) is inherently limited. Weather is only one of several factors relating to wildfire occurrence and spread in Tanjung Puting National Park. In addition to obvious human interactions with wildfire activity such as arson, fire prevention policies, and fire suppression activities, *slash-and-burn* techniques, the preferred method of land clearing in Indonesia where fire is used as a tool to clear land, can rapidly spread fire if conducted in a negligent fashion or during periods of drought [17]. Nevertheless, the use of weather variables for gaining a better knowledge of when Tanjung Puting National Park is most susceptible to wildfire activity would be very valuable to park management and officials. The weather variables are easily obtainable by park officials, and thus current or near-term forecast weather information could be used in a model such as that discussed in this paper to inform park officials when they should prepare supplies and staff for containing or fighting particularly large fires.

The separability of model (1) has not been shown to be significantly violated for the dataset considered here. Were we to suggest this model for use by officials at Tanjung Puting National Park, we would also have to note that model (1) is quite simplistic and that its fit could no doubt be improved by using more complicated functional forms for each of the terms, as well as by considering different interactions between the variables. Furthermore, a homogeneous Poisson model is not an ideal baseline with which to compare the mean squared prediction error, and in future research, actual forward prediction should be used to assess the validity of the model, using data obtained separately from that used in model fitting.

It should be noted that the relationships between burn area and the variables examined here are purely empirical, based solely on observations within this 6-year period at this particular location in Indonesia. It is likely that these relationships will change over time, and one might object that a 6-year time frame is not sufficiently long to account for longer-term climatic variations such as those associated with ENSO events. In addition, it would certainly be imprudent to infer that the observed relationships between wildfire activity and weather variables should necessarily apply in other locations, or to extrapolate beyond the scope of our data, to significantly higher or lower temperatures or pressures, and so forth. The exponential relationships, in particular, between burn area and sea level pressure, temperature, and wind speed, should certainly be expected to taper off after some point.

In addition to these shortcomings, many important variables are excluded from the model. Only six weather variables are used, while other important factors such as vegetation, land use, and other various human interaction variables are not included in the model. Nevertheless, model (1) could potentially be used as a baseline for assessing more complex wildfire forecasting schemes for Tanjung Puting National Park, such as those proposed by de Groot et al. [6].