Abstract

Precise mapping of tropical forest cover is critical to addressing the global climate crisis, for example through land-based carbon measurement and the identification of potential conservation areas. In the past decade, access to open public forestry datasets has increased rapidly. However, the availability of finer-resolution forest cover data is still very limited. As a developing country with extensive rainforests, Indonesia faces multifaceted threats, particularly deforestation. Precise forest cover data can therefore help Indonesia fulfill its nationally determined contribution on climate change. In this study, we mapped national forest cover for Indonesia using a new object-based image classification approach based on combined Planet-NICFI and Sentinel-2 optical imagery. Our results had relatively high accuracy compared with other studies, with the F score ranging from 0.67 to 0.99, and can capture fragmented forest at fine resolution (i.e., ∼5 m). In addition, we found that Planet-NICFI bands contributed more to predicting forest cover than Sentinel-2 imagery. Further analyses using these forest cover data should be performed to support national and global agendas, e.g., the FOLU net sink in 2030 and the Global Biodiversity Framework.

1. Introduction

For the last few decades, ecosystem services have been a central issue in international nature conservation and rural development [1], and they remain a concern as the exploitation of natural resources, human-induced land use change, and global greenhouse gas emissions continue at a high rate [2]. Forests are not only affected by human activity but also play an important role in mitigating threats such as landslides, floods, and loss of biodiversity [3]. Tropical rainforests, in particular, are known for their richness and contribution to the earth’s land-based ecosystems. Indonesia has relatively extensive forests, accounting for 39% of Southeast Asia’s forest extent [4]. Depending on altitude and regional climate, these range from lowland to mountainous forests. Each of these forest types contributes significantly to the ecosystem services that humans rely on, such as raw materials, reservoirs of biodiversity, soil protection, sources of timber, biomedicines, carbon sequestration, and climate and water regulation [5–7]. Indonesian tropical forests also play a critical role in the livelihood of local communities and the national economy [8].

However, deforestation has been one of the main issues in the climate and biodiversity crises. The negative environmental consequences of tropical deforestation are far-reaching and long-lasting [9]. Rapid deforestation has contributed to biodiversity loss through habitat degradation and fragmentation, particularly in Indonesia [10]. Margono et al. [11] suggest that Indonesia had one of the highest rates of primary forest loss in the tropics for the period 2002–2012, with annual primary forest cover loss peaking in 2012 at 0.84 Mha, more than the officially reported forest loss of Brazil (0.46 Mha). During the same period in other tropical rainforest countries, Mexico lost 0.28 Mha and Colombia lost 0.69 Mha of primary forest cover [8]. Indonesia, as a developing country, still struggles with infrastructure development, which puts the forest and all the ecosystem services it provides at risk [12]. Many policies drive investment in Indonesia to support economic growth in the form of infrastructure and land-based permits that directly threaten forest cover. In Kalimantan and Sumatra, foreign investment in infrastructure and extractive industries is five times greater than international funding for forest conservation schemes [13]. Barri et al. [14] found that 50% of total deforestation (5.72 million hectares) in 2013–2017 occurred in logging concessions, timber and oil palm plantations, and mining areas. Numerous other studies have also reported factors that contribute to deforestation in Indonesia, such as road development [15], agricultural expansion [16], wildfires [17], and illegal logging and encroachment [18].

Halting deforestation and retaining the intactness of forest ecosystems is a prevalent challenge in climate change mitigation [19], and progress can be evaluated through reliable assessment of carbon storage based on accurate mapping of forest types. Furthermore, spatially explicit mapping of forest cover is critical for carbon stock estimation [20], wildfire behavior simulation [21], and wildlife habitat modeling [22]. In this regard, precise and reliable forest cover maps will support monitoring and can also serve as input for forest management and policymaking related to sustainable forest management.

With rising satellite availability and image resolutions, remote sensing data archives are continuously growing, possibly enabling users to access and analyze enormous time-series datasets. Remote sensing has become popular as a valuable tool for monitoring land cover, and it also works well for forest cover identification. Many previous studies have shown that remote sensing data can predict forest and other land cover types with excellent accuracy [23–27]. In addition, combining two or more sensors can improve the model’s performance in depicting forest cover data [28–30].

Methods for identifying forest cover in Indonesia have developed rapidly from 1995 to recent years. According to [31], the map of Indonesia’s current land cover and land use was created by visual interpretation of medium-resolution imagery (i.e., Landsat). The accuracy of the forest cover classes is reported to be high (>90%), based on field verification and the operators’ local knowledge. However, visual interpretation is relatively time-consuming, and the use of numerous interpreters over space and time compromises the consistency of the output map product [11]. Margono et al. [24] identified forest cover using a pixel-based method. Machine learning (ML) algorithms (e.g., random forest, support vector machine, and regression trees) typically produce better results than conventional classifiers since they do not require assumptions regarding the distribution of the input data [32]. Machine learning is a subfield of artificial intelligence concerned with the development and investigation of systems that can learn from data. It comprises three approaches: supervised learning, semisupervised learning, and unsupervised learning. A machine learning system could, for instance, be trained on images to learn to differentiate between forest and nonforest images. After training, it can then be used to classify new images as forest or nonforest objects. The random forest algorithm is a classification method that uses multiple random subsets of data and features to produce multiple decision trees. A random forest (RF) classifier is a combination of tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest [32].

Currently, the need for forest cover data with very high spatial resolution is increasing to support monitoring, reporting, and decision-making [33]. Nevertheless, the forest cover datasets currently available offer only limited spatial precision, e.g., global forest change (∼30 meters; [34]), PALSAR forest (25 meters; [35]), and Indonesia’s primary forest cover (30 meters; [24]). Mapping the presence of forests in Indonesia yields a consistent forest distribution and area, which can then be used as a base map and as a reference for management and information in Indonesia. Because such maps use inputs that are specific to conditions in Indonesia, the resulting data are more specific than globally processed forest maps. Forest mapping is essential because it can support preservation programs, such as efforts to protect and preserve biodiversity. Where forests are fragmented, their area and distribution determine whether they can sustain biodiversity. Fragmented and isolated forest sections vary greatly in ecology and composition and may not support the same level of biodiversity or ecosystem function as forests of the same size within large forest systems [36]. Forest mapping in Indonesia also plays an essential role in forest management. The spatial and temporal variation in primary forest loss documents the continuing appropriation of natural forests within Indonesia, including the increasing loss of primary forests in wetlands and in land uses meant to limit or prohibit clearing, with implications for accurate greenhouse gas emissions estimation.

In this research, a random forest (RF) classifier is used to classify forest cover using an object-based image classification approach. The primary objective of this study is to demonstrate the simplicity of the random forest ensemble method and its efficacy in image classification. The ultimate objective is to achieve the highest possible classification accuracy by combining high-quality image data acquired by modern sensors (Sentinel-2 and Planet) with a mathematically robust classifier, the random forest. Results from this study highlight the importance of spatially and temporally explicit data in bringing transparency to an important land use dynamic. Here we present a refined forest cover dataset at the national level in Indonesia with a spatial resolution of ∼5 meters, based on spectral combinations from Sentinel-2A and Planet-NICFI imagery and produced with the random forest algorithm. We also evaluated our data using reference points to assess model performance and compared the forest cover data with other forest cover datasets. In addition, we explored forest cover dynamics in the 2017–2021 period to improve national forest monitoring in Indonesia.

2. Materials and Methods

2.1. Study Area

This study was conducted in Indonesia (Figure 1), a tropical country that harbors various forest ecosystems (e.g., dryland forest, swamp forest, and mangrove forest) with a total area of about 189.1 million ha [11]. Local communities have strong connections with the forest ecosystems [37]; therefore, obtaining more precise forest cover data is crucial in Indonesia.

2.2. Training Data

We used binary forest cover information (i.e., forest and nonforest) as the response variable. We visually interpreted forest cover using very high-resolution satellite images (e.g., Planet-NICFI and ESRI World Basemap) in 45 selected plots (1 × 1 degree) to capture the various forest ecosystems that occur across Indonesia (Figure 1). We also compiled training data from field surveys and secondary sources of the Ministry of Environment and Forestry (MoEF) in 2017 and 2021. Afterward, we collected 45,119 forest points and 38,886 nonforest points using the homogenous purposive random sampling described in [38].

2.3. Data Preprocessing

Sentinel-2 is a high-resolution, multispectral sensor developed by the European Space Agency (ESA) to support Copernicus Land Monitoring research (https://sentinels.copernicus.eu/web/sentinel/home). In collaboration with Planet, Norway’s International Climate and Forests Initiative (NICFI) provides very high-resolution imagery to support tropical forest monitoring, cope with global climate change, retain biodiversity, and facilitate sustainable development (https://www.planet.com/nicfi/). This study used combined optical sensors of the harmonized Sentinel-2 multispectral instrument (MSI) Level-2A (spatial resolution: ∼10 to 20 meters, except B1 with 60 meters resolution) and Planet-NICFI (spatial resolution: 4.77 meters) based on surface reflectance data to identify forest cover data in the study area.

Cloud cover is the most common obstacle in using optical remotely sensed data to retrieve land cover information, particularly in the tropical region [39]. We used the Sentinel-2 quality assurance cloud mask to remove cloud pixels from the spectral reflectance data. We then applied a median filter over a yearly time window for each time frame of analysis (i.e., 2017 and 2021) to obtain nearly cloud-free images [40].
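The masking and median compositing described above can be illustrated with a minimal sketch. It is not the Google Earth Engine workflow used in the study; the function name `median_composite`, the flat-list image layout, and the reflectance values are all hypothetical and chosen only to show the per-pixel logic.

```python
from statistics import median

def median_composite(observations, cloud_masks):
    """Per-pixel median of the cloud-free observations in a time window.

    observations: list of images, each a flat list of reflectance values
    cloud_masks:  list of same-length lists; True marks a cloudy pixel
    Returns one flat list; None where every observation was cloudy.
    """
    n_pixels = len(observations[0])
    composite = []
    for i in range(n_pixels):
        # Keep only the observations whose mask says the pixel is clear.
        clear = [img[i] for img, mask in zip(observations, cloud_masks)
                 if not mask[i]]
        composite.append(median(clear) if clear else None)
    return composite

# Three hypothetical acquisitions of a four-pixel scene within one year.
images = [
    [0.04, 0.05, 0.90, 0.92],
    [0.05, 0.85, 0.07, 0.88],
    [0.06, 0.07, 0.08, 0.91],
]
masks = [
    [False, False, True, True],
    [False, True, False, True],
    [False, False, False, True],
]
composite = median_composite(images, masks)
```

Bright cloud values (around 0.9 here) never enter the median because they are masked first. A pixel that is cloudy in every acquisition stays empty, which is why the composite is described as nearly, rather than fully, cloud free.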

2.4. Data Covariates

To predict forest cover, we used 20 variables retrieved from Sentinel-2 Level-2A and Planet-NICFI imagery as the model predictors. The covariates consist of reflectance bands and spectral indices from both sensors. Table 1 shows the details of the predictors used in this study.
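As an illustration of how spectral indices are derived from band reflectance, the sketch below computes three of the indices named in this paper (NDVI, EVI, and SAVI) using their commonly published formulations. The coefficients (EVI gain 2.5, 6, 7.5, and 1; SAVI soil factor 0.5) are the standard defaults and are an assumption here, since the exact definitions in Table 1 may differ. The reflectance values are hypothetical.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index."""
    return (nir - red) / (nir + red)

def evi(nir, red, blue, gain=2.5, c1=6.0, c2=7.5, canopy=1.0):
    """Enhanced vegetation index with the commonly used default coefficients."""
    return gain * (nir - red) / (nir + c1 * red - c2 * blue + canopy)

def savi(nir, red, soil_factor=0.5):
    """Soil-adjusted vegetation index; soil_factor = 0.5 is the usual default."""
    return (1 + soil_factor) * (nir - red) / (nir + red + soil_factor)

# Hypothetical surface reflectance for one forest and one nonforest pixel.
forest = {"blue": 0.02, "red": 0.03, "nir": 0.40}
nonforest = {"blue": 0.08, "red": 0.12, "nir": 0.25}

forest_ndvi = ndvi(forest["nir"], forest["red"])
nonforest_ndvi = ndvi(nonforest["nir"], nonforest["red"])
forest_evi = evi(forest["nir"], forest["red"], forest["blue"])
forest_savi = savi(forest["nir"], forest["red"])
```

Each index is computed per pixel from the composited bands, so the same functions apply to either sensor as long as the band roles (blue, red, near infrared) are matched correctly.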

2.5. Forest Cover Prediction

In this study, we applied a random forest algorithm with different settings for the number of trees (N), namely N = 50, N = 100, N = 500, and N = 1000, to produce forest and nonforest categories, following Condro et al. [40]. The number of variables per split was defined as the square root of the number of features.

All preprocessing and forest classification were performed using Google Earth Engine, a cloud-based geospatial analysis platform that delivers massive computing capabilities to address a variety of high-impact societal issues including deforestation, drought, disaster, disease, food security, water management, climate monitoring, and environmental protection [45]. This cloud computing platform is very useful for big-data analysis, particularly for planetary-scale remotely sensed data [45]. We used the platform to generate good-quality satellite imagery and to perform the machine learning classification that predicts forest cover maps across Indonesia. The preprocessing consisted of geometric, radiometric, and spectral corrections. We used Sentinel top-of-atmosphere (TOA) reflectance with 2-level data, which were already georegistered with a root mean square error (RMSE) of less than 10 m and represented the best-quality imagery available for the collected data. Cloud masking using the Quality Assessment band (BQA) and per-pixel filter statistics (i.e., the median) over the period of the Sentinel-2 TOA reflectance imagery were performed before further analysis within Google Earth Engine.

To identify the national commodity cover in Indonesia, i.e., oil palm, rubber, coffee, cacao, and rice paddies, we used machine learning classification through the random forest (RF) algorithm. A random forest classifier provides an ensemble model that effectively distinguishes spectrally similar agricultural land and forest cover by generating multiple trees from training data and its predictors [45–47]. Many studies have investigated the performance of the RF algorithm in identifying land cover from hyperspectral and multispectral imagery as well as digital elevation model data [48–50].
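The ingredients of the classifier described in this section (bootstrap samples, a random subset of predictors at each split sized by the square-root rule, and a majority vote) can be sketched in plain Python. This is a deliberately simplified illustration, not the Google Earth Engine implementation used in the study: each tree is limited to a single split (a decision stump), the data are synthetic, truncating the square root to an integer is an assumption, and all function names are hypothetical.

```python
import math
import random

def variables_per_split(n_features):
    # Square-root rule; truncating to an integer is an assumption of this sketch.
    return max(1, int(math.sqrt(n_features)))

def gini(labels):
    # Binary Gini impurity for labels coded 0 (nonforest) and 1 (forest).
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def majority(labels):
    return int(2 * sum(labels) >= len(labels))

def fit_stump(X, y, features):
    """Best single-threshold split among the candidate features."""
    best_score = None
    # Fallback when no feature can split the sample: predict the majority class.
    best_stump = (features[0], math.inf, majority(y), majority(y))
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [label for row, label in zip(X, y) if row[f] <= t]
            right = [label for row, label in zip(X, y) if row[f] > t]
            if not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best_score is None or score < best_score:
                best_score = score
                best_stump = (f, t, majority(left), majority(right))
    return best_stump

def fit_forest(X, y, n_trees, rng):
    mtry = variables_per_split(len(X[0]))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]       # bootstrap sample
        feats = rng.sample(range(len(X[0])), mtry)     # random predictor subset
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest

def predict(forest, row):
    votes = sum(left if row[f] <= t else right for f, t, left, right in forest)
    return int(2 * votes >= len(forest))               # majority vote

# Synthetic data: columns are red, near infrared, and two noise predictors.
rng = random.Random(42)
X, y = [], []
for _ in range(20):
    X.append([rng.uniform(0.02, 0.05), rng.uniform(0.30, 0.50), rng.random(), rng.random()])
    y.append(1)
    X.append([rng.uniform(0.08, 0.15), rng.uniform(0.10, 0.25), rng.random(), rng.random()])
    y.append(0)

forest = fit_forest(X, y, n_trees=25, rng=rng)
accuracy = sum(predict(forest, row) == label for row, label in zip(X, y)) / len(y)
```

With the 20 predictors used in this study, `variables_per_split(20)` gives 4 candidate predictors per split under the truncation assumed here. Because each stump has only one split, drawing the predictor subset once per tree is equivalent to drawing it at each split; a full random forest grows deeper trees and redraws the subset at every node.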

2.6. Model Evaluation

We used the confusion matrix approach to assess the model performance by comparing the forest cover model with the testing reference data [51]. This matrix was applied to calculate discrimination metrics, i.e., overall accuracy (OA) and F score [52]. In this study, the Kappa coefficient was not considered as a reliable metric due to the findings from previous studies that showed the flaws of using this metric [53]. We performed k-fold cross-validation (k = 5) to create data partitioning (i.e., training and testing) for model evaluation [54]. Finally, we performed the model evaluation within different regions (i.e., Sumatra, Kalimantan, the Lesser Sunda, Sulawesi, Maluku, and Papua) to capture the variance of the accuracy.
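The evaluation steps above can be sketched as follows. The paper does not define its F score explicitly, so the balanced F1 form (the harmonic mean of precision and recall) is assumed here; the labels are made up and the function names are hypothetical.

```python
import random

def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary forest/nonforest comparison."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def overall_accuracy(tp, fp, fn, tn):
    return (tp + tn) / (tp + fp + fn + tn)

def f_score(tp, fp, fn):
    # F1: harmonic mean of precision and recall (assumed form of the F score).
    return 2 * tp / (2 * tp + fp + fn)

def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and deal them into k disjoint test folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

# Hypothetical reference labels and predictions (1 = forest, 0 = nonforest).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
oa = overall_accuracy(tp, fp, fn, tn)
f1 = f_score(tp, fp, fn)
folds = k_fold_indices(len(y_true), k=5)
```

In each of the five rounds, one fold serves as the testing set and the remaining four as the training set, so every reference point is used for testing exactly once.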

In addition, we explored the variable importance of the fused spectral features for forest cover prediction using the mean decrease in Gini (MDG). The MDG measures each variable’s contribution to the homogeneity of the nodes [55, 56]. We also evaluated the spectral characteristics of forest cover areas across the two sensors (i.e., Sentinel-2 and Planet-NICFI).
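To make the idea behind the mean decrease in Gini concrete, the sketch below computes the impurity decrease of a single split and converts decreases into relative contributions. It covers one node only; in a random forest these decreases are accumulated over every split that uses a variable and averaged over the trees. The class labels, the two example splits, and the variable names are hypothetical.

```python
def gini_impurity(labels):
    """1 minus the sum of squared class proportions; 0 for a pure node."""
    n = len(labels)
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1 - sum((c / n) ** 2 for c in counts.values())

def gini_decrease(parent, left, right):
    """Impurity of the parent minus the size-weighted impurity of the children."""
    n = len(parent)
    return (gini_impurity(parent)
            - len(left) / n * gini_impurity(left)
            - len(right) / n * gini_impurity(right))

parent = ["F", "F", "F", "F", "N", "N", "N", "N"]

# A split that separates the classes perfectly.
clean = gini_decrease(parent, ["F", "F", "F", "F"], ["N", "N", "N", "N"])
# A split that only partly separates them.
partial = gini_decrease(parent, ["F", "F", "F", "N"], ["F", "N", "N", "N"])

# Relative contribution of two hypothetical variables, as shares of the total.
decreases = {"variable_a": clean, "variable_b": partial}
total = sum(decreases.values())
shares = {name: value / total for name, value in decreases.items()}
```

A variable that repeatedly produces purer child nodes accumulates a larger decrease and therefore a larger share of the total importance.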

3. Results

3.1. Model Evaluation

This study found that random forest performs relatively well in estimating forest cover across Indonesia, with the OA and F score ranging from 0.69 to 0.99 and 0.67 to 0.99, respectively. The random forest algorithm (N = 1000) outperformed the other parameterization, with the F score ranging from 0.89 to 0.99 and OA ranging from 0.92 to 0.99. The discrimination metrics obtained from the various parameterizations of the random forest algorithm are compared in Figure 2.

Our results showed that Planet-NICFI images contributed more (79.7%) to forest cover identification than Sentinel-2 images (20.3%). The predictor with the highest relative contribution to the model was the red band of Planet-NICFI (53.8%), followed by the green band of Planet-NICFI (19.2%). The three variables with the lowest contribution to the model were B5, B4, and B3 (i.e., red edge, red, and green) of Sentinel-2 (<0.002%) (Figure 3).

We also conducted an exploratory analysis of the spectral imagery to differentiate the forest and nonforest classes. Sentinel-2 covered a wider wavelength range in capturing object reflectance (∼450 nm to 2200 nm) than the Planet-NICFI dataset (∼450 nm to 900 nm). Forest reflectance was lower than nonforest reflectance in the blue to red channels for both sensors. In contrast, forest reflectance was higher than nonforest reflectance in the near infrared to shortwave infrared bands (Figure 4).

Our study indicated that forest cover had a relatively similar spectral distribution to the nonforest category because other vegetated areas were classified as nonforest. Hence, we also tested several spectral indices from both sensors to characterize forest cover within the study area. The results showed that Planet-NICFI NDVI (df = 2232; t = 29.46; p value <0.05), Sentinel-2 EVI (df = 2278; t = 27.22; p value <0.05), Sentinel-2 IBI (df = 2278; t = −58.56; p value <0.05), and Sentinel-2 SAVI (df = 2278; t = 28.70; p value <0.05) can significantly separate forest cover from nonforest cover.
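A two-sample comparison of an index between forest and nonforest points can be sketched as below. The paper does not state which t-test variant was used, so Welch's unequal-variance test is assumed here purely for illustration; the NDVI samples are invented and far smaller than the samples behind the reported degrees of freedom.

```python
import math
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Welch's t statistic and its approximate degrees of freedom."""
    # Squared standard error of each sample mean.
    se_a = variance(sample_a) / len(sample_a)
    se_b = variance(sample_b) / len(sample_b)
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se_a + se_b) ** 2 / (
        se_a ** 2 / (len(sample_a) - 1) + se_b ** 2 / (len(sample_b) - 1)
    )
    return t, df

# Hypothetical NDVI values sampled at forest and nonforest reference points.
forest_ndvi = [0.82, 0.85, 0.80, 0.88, 0.84]
nonforest_ndvi = [0.45, 0.60, 0.38, 0.52, 0.55]

t_stat, dof = welch_t(forest_ndvi, nonforest_ndvi)
```

A positive statistic indicates a higher index value in forest than in nonforest, and a negative one, as reported for IBI, indicates the opposite.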

3.2. Forest Extent and Change over Time

This study showed that Indonesian forests covered 96,419,384.40 ha (∼51% of the total land area) and 86,773,348.49 ha (∼46% of the total land area) in 2017 and 2021, respectively (Figure 5). Papua Province had the largest forest cover area in both 2017 (∼22.9 million ha) and 2021 (∼20.7 million ha) compared with the other provinces. Most provinces in eastern Indonesia still have relatively high forest cover. On the other hand, DKI Jakarta was the province with the least forest, which covered only 0.4% of its total area. The Java region had the lowest forest cover, at about 20% of its total area. The forest extents for each province in 2017 and 2021 are shown in Table 2.
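The arithmetic implied by the two national totals can be checked with a short calculation. The net change and the annual average below are simple differences derived here from the reported figures, not values reported by the authors, and the 189.1 million ha total area from Section 2.1 is assumed to be the denominator behind the stated percentages.

```python
forest_2017_ha = 96_419_384.40
forest_2021_ha = 86_773_348.49
land_area_ha = 189.1e6  # total area quoted in Section 2.1 (assumed denominator)

# Net difference between the two maps; it mixes losses and gains.
net_change_ha = forest_2017_ha - forest_2021_ha
relative_change = net_change_ha / forest_2017_ha
mean_annual_change_ha = net_change_ha / (2021 - 2017)

share_2017 = forest_2017_ha / land_area_ha
share_2021 = forest_2021_ha / land_area_ha

print(f"net change: {net_change_ha / 1e6:.2f} million ha")
print(f"relative to 2017: {relative_change:.1%}")
print(f"forest share: {share_2017:.1%} (2017), {share_2021:.1%} (2021)")
```

The shares reproduce the stated ∼51% and ∼46%, which supports reading 189.1 million ha as the reference land area.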

4. Discussion

The pixel quality of satellite imagery is crucial for land resource identification, particularly for forest cover prediction [57]. Cloud coverage is one of the most common obstacles in optical satellite imagery, such as Sentinel-2 and Planet-NICFI [58]. Although Sentinel-2 had a higher spectral resolution than Planet-NICFI, we found more noise due to cloud cover in it. Clouds can obscure important information about the objects beneath the covered area [59]. Therefore, our findings indicate that the model covariates used to identify forest cover had relatively good pixel quality.

This study found that the OA for our data ranged from 0.69 to 0.99. On the other hand, Margono et al. [24] conducted primary forest cover identification in Indonesia from 2000 to 2012, with OA ranging from 0.7 to 0.91. In addition, [29] integrated three different satellite sensors to produce a land cover dataset using Google Earth Engine with OA ranging from 0.67 to 0.82. Moreover, [60] identified forest cover and produced OA ranging from 0.73 to 0.77 for the pixel-based method and 0.8 to 0.84 for visual interpretation method.

Differences in spatial resolution among satellite images also cause the captured forest cover to vary. Using images with very high spatial resolution (≤5 m) makes the captured forest cover data, as well as fragmentation information, more precise. On the other hand, missed detection can lead to the false conclusion that a natural ecosystem is intact when in fact it has undergone a high level of disturbance, e.g., fragmentation [61]. Forest degradation and fragmentation can lead to loss of biodiversity through lost connectivity and to a reduction in water quality [62].

In Figure 6, we use the Bogor botanical garden, one of the ex-situ conservation sites in Bogor city, as a case study to compare our results with other datasets that have a lower spatial resolution. Figure 6 shows that the CCI (Figure 6 (B), 300 m spatial resolution) and PALSAR (Figure 6 (D), 25 m spatial resolution) data could not capture the fragmented forest in the area. Meanwhile, the global forest change (Figure 6 (C), 30 m spatial resolution) and the dynamic world (Figure 6 (E), 10 m spatial resolution) data capture the fragmented forest, but not completely. The comparison conducted by Boyle et al. [63] also showed that the Global Forest Change data correctly detected only 70.8% of forest fragments with an area >30 m, while very high-resolution images (IKONOS, with a spatial resolution of 6 m) detected 100% of forest fragments with an area of >6 m. In this study, we also found that Planet-NICFI contributed more than Sentinel-2 imagery, based on variable importance, in depicting forest cover in the tropical region. Previous studies also found that Planet-NICFI data provided better outputs than other optical imagery in predicting forest cover [29, 64].

Variable selection is one of the methods for addressing multicollinearity, and it has the benefit of being simple to perform and resulting in a sparse model [65]. However, findings from Chan [65] show that variable selection drops variables and reduces information gain, while the multicollinearity measures to optimize are subjective. This study used machine learning approaches to predict forest cover from combined optical satellite data in order to improve model performance. Feng et al. [66] found that deleting highly correlated variables had no effect on model performance, and hence on prediction accuracy, because machine learning can control model complexity by downplaying the significance of redundant variables.

Despite the advantages of the data (e.g., precise spatial resolution), our dataset has some caveats that limit its use in certain cases. Because of the large amount of input data, we aggregated the various forest types in the tropical region of Indonesia, so the dataset captures only general forest cover and cannot represent forest diversity. The dynamics of forest cover change can be seen more clearly with a higher temporal resolution, which is much better for systematic forest cover change analysis [67]. Unfortunately, the data we present have a relatively low temporal resolution, i.e., annual, which makes the seasonal dynamics of forest cover impossible to capture.

5. Conclusions

Our findings provide a forest cover dataset with detailed spatial resolution at the national level of Indonesia. The random forest algorithm performed excellently in capturing tropical forest cover from optical satellite imagery, with an overall agreement between 92% and 99%. Our data depict forest patches within small-scale areas with better precision and detail than other data. To achieve more sustainable forest management, stakeholders need a precise dataset on forest resources. This information can be useful for forest monitoring and planning, particularly for the national agenda related to forest and other land uses as net carbon sinks in Indonesia. Further work should incorporate precise forest cover data into carbon dynamics, conservation management, and spatial planning. Future studies should also explore other prediction techniques (e.g., deep learning and neuromorphic computing).

Data Availability

The forest cover data that support the findings of this study are available in Zenodo at https://zenodo.org/record/7115068#.YzKnb7TP2Uk.

Conflicts of Interest

The authors declare that there are no conflicts of interest.

Acknowledgments

The authors would like to express their gratitude to Prof. Lilik Budi Prasetyo, M.Sc, from the Faculty of Forestry, IPB University, and Drs. Kustiyo from the National Research and Innovation Agency, Indonesia, for their comprehensive review of this manuscript, and to Forest Watch Indonesia for providing infrastructure and financial support for this research. This research was funded by Forest Watch Indonesia through the State of the Forests of Indonesia program.