Abstract

Soil organic carbon constitutes an important indicator of soil fertility. The purpose of this study was to predict soil organic carbon content in the mountainous terrain of eastern Lesotho, southern Africa, which is an area of high endemic biodiversity as well as an area extensively used for small-scale agriculture. An integrated field and laboratory approach was undertaken, through measurements of reflectance spectra of soil using an Analytical Spectral Device (ASD) FieldSpec® 4 optical sensor. Soil spectra were collected on the land surface under field conditions and then on soil in the laboratory, in order to assess the accuracy of field spectroscopy-based models. The predictive performance of two different statistical models (random forest and partial least square regression) was compared. Results show that random forest regression can most accurately predict the soil organic carbon contents on an independent dataset using the field spectroscopy data. In contrast, the partial least square regression model overfits the calibration dataset. Important wavelengths to predict soil organic contents were localised around the visible range (400–700 nm). This study shows that soil organic carbon can be most accurately estimated using derivative field spectroscopy measurements and random forest regression.

1. Introduction

Soil organic carbon (SOC) is an important property related to soil biological, physical, and chemical characteristics and constitutes a major component of the global carbon cycle [1]. SOC is classified as the third most important global carbon sink (>2500 Pg) and contains almost double the carbon that is found in the atmosphere (750 Pg) and terrestrial living biomass (560 Pg) combined [26]. In agricultural landscapes, SOC depletion as a result of accelerated soil erosion can lead to reduced crop yields, lowered moisture retention capacity, and reduced nutrient status [710]. SOC also contributes to stabilization of the soil and formation of aggregates, which can promote resistance to erosion [11, 12]. In mountain landscapes, however, soil physical properties are highly variable spatially, where changes in slope steepness, depth of weathering products, slope processes, and microclimate give rise to variations in vegetation types and soil properties and high rates of soil erosion [13].

In the mountainous highlands of Lesotho, southern Africa, soil erosion is a significant problem due to a combination of geologic, climatic, ecological, and human factors [1416]. Weathering of the underlying Jurassic basalts has produced a low-strength mixture of silica and expansive clay minerals [17]. Plagioclase within the basalts has been affected by zeolitization and chloritization, and olivine in particular has been replaced by iron oxides, serpentine, and clays (mainly montmorillonite) [18]. These weathering products make the resulting soil susceptible to erosion by surface sheet flow, subsurface clay expansion, slaking and soil piping, and landslide/debris flow activity caused by subsurface waterlogging and failure (e.g., [19]). Although many studies have been concerned with calculating soil volume loss by erosion through gullies (locally known as dongas) and evaluating change in gully morphology over time (e.g., [20, 21], few studies have considered organic carbon export as part of the sediment yield [22, 23]. This is important, however, because SOC export depletes the remaining nutrient base of soils within the catchment and can lead to lower agricultural productivity as well as environmental degradation, and SOC can change downstream trophic status and may lead to eutrophication and other negative consequences within aquatic ecosystems.

The first step in quantifying potential SOC export from a catchment is to map spatial patterns of SOC storage in surface soils. This is because (1) SOC content is highest in the soil A horizon, nearest the land surface, and (2) overland or sheet flow leading to enhanced soil erosion first affects this A horizon, which is therefore preferentially lost by erosion. Accurate measurement of SOC at different spatial scales is challenging [24] because it has been most commonly based on grid sampling of soil in the field. However, these techniques are expensive, time-consuming, and not spatially continuous [25], and a close sampling interval is required to capture SOC spatial patterns effectively. In mountain environments, high-density sampling resolution is particularly difficult to achieve because of poor accessibility and high relief and also because of the high variability of soils, ecosystems, and SOC found in these environments. Therefore, there is a growing demand for effective methods in quantifying SOC in mountain environments.

Remote sensing techniques offer a cost-effective, reproducible, and rapid method of quantifying spatially distributed data on SOC [26]. This is possible through the correlation between soil reflectance and soil organic content. Previous investigations show that increasing values of SOC are inversely proportional to an overall decrease in reflectance in the visible (Vis, 400–700 nm), near-infrared (NIR, 700–1400 nm), and shortwave infrared (SWIR, 1400–2500 nm) regions of the electromagnetic spectrum [27]. McMorrow et al. [28] observed absorption from 677 nm to 1108 nm which is associated with SOC and iron oxides. Other absorption features are present within the range of NIR and SWIR, related to lignin and cellulose at 1120 nm and 2100 nm, respectively. Airborne and satellite platforms have largely contributed to the assessment of SOC by examining these different spectral signatures, but one of the drawbacks of these technologies is that they cannot discriminate between carbon from surface vegetation and soils, and they generate a mixed pixel signal. It is also difficult to estimate SOC using satellite and airborne remote sensing methods when SOC concentrations within the soil are small because it results in a very weak signal [29].

Laboratory and field spectroscopy methods in the visible and near-infrared bands (Vis-NIR, 400–2500 nm) are therefore advantageous in measuring and modelling SOC content [30]. These approaches are both rapid and nondestructive [31]. Under controlled laboratory conditions, SOC content can be estimated with high precision and accuracy [32], whereas field-based measurements may be affected by atmospheric conditions, soil moisture, texture, and shadow effects [33]. Some researchers have reported satisfactory results when using Vis-NIR spectra to predict SOC in the field [34, 35]; however, results vary according to soil type and moisture content and thus cannot be applied everywhere. Viscarra Rossel and Behrens [29] argued that soil absorption spectral signatures are likely to overlap and vary spatially and temporally. This is a motivation to develop SOC models for different regions or ecological/geomorphic contexts. There are few studies addressing SOC modelling using either laboratory or field measurements in southern Africa [35], especially in mountainous regions where SOC is vulnerable to land use change mainly by overgrazing, soil erosion, and climate change.

This study therefore sought to (i) estimate SOC content using spectroscopy measurements under field and laboratory conditions, (ii) compare the performance of two different modelling approaches (partial least squares regression (PLSR) and nonlinear random forest (RF)) in predicting SOC content, and (iii) evaluate the role of spectral derivatives in determining model outputs. The purpose of this analysis is to identify better modelling approaches for SOC using remotely sensed data, which will enable a quicker and more accurate evaluation of SOC, in particular in poorly known areas such as mountains.

2. Materials and Methods

2.1. Study Area

The study area is in eastern Lesotho, southern Africa (Figure 1). In this region, mountains of the Drakensberg-Maluti range rise to 3482 m a.s.l. at Thabana Ntlenyana, with river valleys incised into Jurassic basalts, and low-nutrient xerophytic grasslands present on flat-topped mountain summit [36, 37]. The mean annual rainfall is 775 mm with 85% falling in the summer season [38]. The mean winter monthly temperatures range from −6.3°C to 5.1°C, and the mean summer maximum temperatures range from 16.5°C at high altitudes to 29°C in the lowlands [39]. Lesotho has a highly degraded mountain landscape [40, 41].

Most of the population (86%) depends on subsistence agriculture [42], and increased population and climate change have meant that cattle and sheep grazing has expanded from river valleys to high elevation pastures, leading to overgrazing and a loss of ecological integrity in these areas [43, 44]. Consequently, agricultural productivity has been declining in Lesotho [45].

2.2. Soil Sampling and Spectral Measurements

Fieldwork was conducted along the Mokhotlong River in eastern Lesotho (Figure 1) in October 2015 (austral spring/summer). This river flows east to west and has incised through Jurassic basalts, giving rise to a highly meandering river pattern with bedrock spurs and steep valley sides. The river floodplain is very narrow with small strip agricultural fields located adjacent to the river channel and, where slopes sediments are available, terraced fields are present along lower valley slopes (Figure 2). Higher elevation areas at the tops of valley sides and on plateau summits (∼2600–2900 m a.s.l.) are not enclosed and are characterised by tussocky grasslands [37].

2.3. Spectral Measurements in the Field

Soil spectral reflectance was measured in the field using an Analytical Spectral Device (ASD) FieldSpec® 4 optical sensor (Analytical Spectral Devices, Inc., Boulder, CO, USA). This instrument measures wavelengths from 350 to 2500 nm and with 3–10 nm spectral resolution. Spectra were recorded for the region 350–1000 nm with a sampling interval of 1.4 nm, and 2 nm for the region 1000–2500 nm. ASD sampling points in the study area (n = 109) were generated randomly using Hawth’s Analysis Tool within ArcMap. All points were converted to latitude and longitude, and a handheld GPS was used to navigate to these locations in the field. Once the sampling point was located, a plot of 10 m by 10 m was drawn where the coordinates of the sampling point were considered as the centroid of the plot. Within each plot, 3 subplots of 2 m by 2 m dimensions were randomly selected in order to take into consideration any variability within the plot. Five spectral measurements from the nadir at about 1 m height and with 5° field of view above the soil surface were scanned in each subplot and averaged, giving a total of 15 spectral measurements for each plot. For every measurement, the white reference panel was used to calibrate atmosphere conditions and irradiance of the sun. A surface (top 5 cm) soil sample of ∼400 g was taken from each of these three subplots for subsequent laboratory analysis.

2.4. Spectral Measurements in the Laboratory

Spectral measurements on the 109 soil samples collected in the field were made in the laboratory. All measurements were done on samples that had been air-dried and lightly crushed in a pestle and mortar to a uniform fine sediment. Measurements were made on a black background plate. The soils were scanned using the ASD with the white reference panel for comparison and with exactly the same methodology as in the field.

2.5. SOC Analysis

In the laboratory, soil samples were first dried and lightly crushed. On this mixed sample, the loss on ignition (LOI) method was used to quantify SOC concentration [46]. A subsample (∼20 g) was combusted in a muffle furnace for 8 hours at 430oC. LOI was measured as the difference between the oven-dry soil mass and the soil mass after combustion, divided by the oven-dry soil mass [47]. LOI values were converted to SOC with a factor of 0.55 [48].

2.6. Spectra Preprocessing and Transformation

Before modelling, the noisy ends of the spectra were removed in order to correct for low-intensity radiation appearing at the spectra edge. The spectrum below 400 nm was removed. The water vapour absorption features ranging between 1350–1460, 1790–1960, and 2350–2500 nm which can affect the model were also removed [49, 50]. The same operation was done with laboratory spectral data for standardization. The laboratory and field spectral first derivative transformation and Savitzky–Golay smoothing were applied in order to enhance the spectral signal [51]. Savitzky–Golay smoothing was first applied for reducing the noise effect before derivative transformation because derivative spectra are sensitive to noise [52] (Figure 3). The same methodology was undertaken for both field and laboratory spectral measurements.

2.7. Statistical Analysis

All statistical analysis was implemented in R program v3.1.3 [53]. Before modelling, 70% of the data (training dataset) were randomly selected for training the models for both laboratory and field data and the rest (30%) as the testing dataset. The normality of SOC values within the datasets was checked using the Kolmogorov–Smirnov goodness-of-fit test in order to select the most suitable statistical test. We used the Kruskal–Wallis test [54] to compare the training, testing, and whole datasets because of the lack of normality in the testing dataset. Student’s t-test was then used to compare the averaged field spectra with the averaged laboratory spectral data, which were both normally distributed. In order to identify outliers, Hotelling’s T2 distribution for multivariate analysis was performed with the field spectral data [55].

2.7.1. Partial Least Squares Regression (PLSR)

The PLSR method was implemented in order to construct predictive models when the independent variables are many, noisy, and highly collinear such as hyperspectral reflectance data [56]. The method uses orthogonal factors or components, called latent variables, as new independent variables of the dependent variable. These latent variables are simply linear combinations of the original independent variables, but in contrast to principal component regression, they are extracted such that they explain as much of the covariance between the dependent and independent variables as possible. PLSR is discussed in detail by Martens and Næs [57]. The optimum number of factors through the leave-one-out cross-validation method was used in order to minimize overfitting [58]. The root mean square error of cross-validation (RMSECV) was used in order to evaluate the optimal number of components that minimizes the RMSECV.

PLSR uses many algorithms for feature selection. In this research, the variable importance in projection (VIP) was used in order to identify key wavelengths. The logic of VIP is to accumulate the importance of each variable being reflected by the weight from each component. VIP provides a list of ranked variables and a threshold of between 0.83 and 1.21 which is used to select key wavelengths [59]. In this study, wavelengths where peak maxima were above a VIP threshold of 1 were selected as key wavelengths [60].

2.7.2. Random Forest (RF) Regression

The RF regression is a machine learning algorithm based on a classification and regression tree [61, 62]. The model uses recursive partitioning to split the data (spectra) into different homogeneous groups, named regression trees (ntree). Each tree is individually grown to its optimum size based on a bootstrap sample from the training dataset (70%) without any pruning (a continuous selection of input variables at every node). In RF regression, a random subset of variables (mtry) is selected to determine the split at each node [61]. The model uses a deterministic algorithm to select the number of random samples and variables from the training dataset. In each tree, the data that are not included in the tree (the out-of-bag (OOB) data, 30% of the total dataset) are predicted and the OOB error is produced in terms of mean square errors through the difference between OOB data and data used to grow the regression trees [61, 63]. The OOB error provides an estimation of the important variables by calculating how much the OOB error is increased when a variable is changed, while all others remain unchanged [64]. This attribute enables the operator to select or train the RF model to focus on certain features. In this study, the RF algorithm was implemented for both field and laboratory spectral datasets. Because of the large number of variables, model optimization which is computationally intense was not implemented. The default setting for mtry (1/3 of the total number of wavelengths) and ntree (500) was used.

The recursive feature selection [65] was performed to determine the least number of wavelengths that predict SOC concentration with greatest accuracy. The recursive feature elimination algorithm is a wrapper feature selection method that uses all features (variables) as a starting point [66]. Models with low accuracy are removed from the current subset. The procedure ends when the given numbers of variables are dropped [66]. The combination of the recursive feature selection with the important variables ranked according to the percent decrease in mean squared error helped us to identify key wavelengths.

2.8. Model Validation

Many parameters can be used to assess the performance of models in spectroscopy. Spectra models are generally assessed in terms of their coefficient of determination (R2) and root mean square error (RMSE). Other parameters such as the Akaike information criterion (AIC) and the ratio prediction to deviation (RPD) can also be used. In this study, all of these parameters were used for completeness. The root mean square error of calibration (RMSEC) and validation (RMSEP) are calculated as follows:where ym are measured values from the laboratory measurement, yp are predicted values derived from spectral data using either PLSR or RF, are predicted values obtained using the validation set, and N is the number of samples. The model with a lowest coefficient RMSE has the best performance. The AIC is a compromise between model accuracy and model parsimony [67] and is calculated as follows:where n is the number of samples and p the number of features used in the prediction. The model with the smallest AIC has the best performance. The RPD was also used to compare the performance of models of different datasets, as well as their practicability. This metric is a way of normalizing the RMSEs of prediction in order to compare calibration models where the measured variables have different variances [68], and is calculated as follows:where STDEV(y) refers to the standard deviation of the reference data (calibration dataset) and RMSEP refers to the root mean square error of prediction.

The six categories of interpretation as suggested by Viscarra Rossel et al. [69] were adopted as follows: an RPD value of >2.5 indicates excellent models/predictions; 2.0 < RPD < 2.5 indicates very good quantitative models/predictions; 1.8 < RPD < 2.0 indicates good models/predictions, where quantitative predictions are possible; 1.4 < RPD < 1.8 indicates fair models/predictions which may be used for assessment and correlation; 1.0 < RPD < 1.4 indicates poor models/predictions, where only high and low values are distinguishable; and RPD < 1.0 indicated very poor models/predictions, and their uses are not distinguishable.

3. Results

3.1. SOC Sample Analysis

Of the 109 soil samples collected, 94 were analysed for SOC content, excluding outliers. Hotelling’s test detected 4 outliers with a horizontal cut off limit of 6.32845 and a vertical cut off limit of 3.98695. The statistical description of SOC of the calibration dataset, validation dataset, and the whole dataset is presented in Table 1. SOC values ranged from 1.93 g 100 g−1 to 10.6 g 100 g−1 with a mean value of 5.04 g 100 g−1 and a standard deviation of 2.11. The Kolmogorov–Smirnov test shows that all datasets were not normally distributed ( values of 0.0126, 0.0020, and 0.2100 for the whole dataset, calibration dataset, and validation dataset, respectively). All subsets have a skewed distribution. The Kruskal–Wallis test for skewed distributions indicates that there are no significant differences among the three datasets at 5% significant level ( value = 0.64). Thus, both calibration and validation datasets statistically represent the whole dataset.

3.2. Comparison of Field and Laboratory Spectra

The mean reflectance values of field and laboratory measurements were computed for all 94 samples. In general, the reflectance values of the SOC measured in the laboratory are higher than those measured in the field, confirmed by a one-tailed Student's t-test (significance at 5% level, value = 0.024). Pearson’s correlation test reveals that both spectra measurement are strongly correlated at 5% significant level (R = 0.99, value < 2.2e−16).

3.3. Key Wavelength Selection
3.3.1. Number of Variables Selected by VIP

VIP algorithms were computed with PLSR for both field and laboratory spectra. The algorithm selected 80, 720, 57, and 881 key wavelengths for the first derivative laboratory spectra, the raw laboratory spectra, the first derivative field spectra, and raw field spectra, respectively. The location of 57 selected wavelengths of the first derivative field spectra is presented in Figure 4.

3.3.2. Number of Variables Selected by the Recursive Feature Selection

The recursive feature selection was computed for both field and laboratory spectra. The smallest number of wavelengths for each model that would offer the best prediction using the random forest method was identified. The lowest number of key wavelengths (49) was obtained with the first derivative field spectra with a RMSE of 0.61 g 100 g−1; the location of these wavelengths is presented in Figure 4. The use of 105 wavelengths produces the lowest RMSE (0.62 g 100 g−1) for the first derivative laboratory spectral data. For the laboratory raw spectral data, the use of 105 variables produced the lowest RMSE (0.61 g 100 g−1). For the field raw spectral data, the use of 789 variables produced the lowest cross-validation error (1.059).

3.3.3. The Position of Key Wavelengths

Table 2 shows the position of key wavelengths selected by both the VIP algorithm and the recursive feature selection method for all combinations of models. For interpretation purposes, the functional group and vibration mode of wavelength are also presented. The recursive feature selection algorithm selects around half of the key wavelengths in all datasets between the range 400 and 700 nm. The VIP algorithm implemented under the first derivative spectral data (laboratory and field) also selects most of the key wavelengths in the same range. However, the VIP algorithm implemented under the raw spectral data (field and laboratory) did not select key wavelengths within the visible spectrum, and most wavelengths selected are around 2000–2200, 1400–1500, and 1350–1450 nm.

3.4. Model Development

In total, 16 models have been developed, 8 with the field spectral data and 8 with the laboratory spectral data. For each spectral data (field or laboratory), 4 models were developed with RF and 4 others for PLRS using deferent transformations: raw data, the first derivative (FD), key wavelengths (K), and the combination of first derivative and key (FD-K) wavelengths. RPD was used as the most important criterion in ranking those models.

3.4.1. Models from Laboratory Spectral Data

Results from the two different modelling approaches used to predict SOC (PLSR and RF) using the laboratory spectral data are presented in Table 3. The effects of wavelength selection and the spectral first-derivative on the model are also presented. The best prediction model is selected according to its performance with respect to fitting to the validation dataset.

The results show that PLSR overfits the calibration dataset with Rc2 between 0.99 and 0.89, but with relatively low accuracy in predicting a new dataset. PLSR models are very good models/prediction (2.0 < RPD < 2.5), whereas RF models are considered as excellent models/predictions (RPD > 2.5). The best predictive model for the laboratory data was obtained with L-FD-RF-K (RPD = 3.77, Rp2 = 0.87, and RMSEP = 0.64 g 100 g−1).

3.4.2. Models from Field Spectral Data

Table 4 shows the 8 different models developed using the field datasets. The results indicate that PLSR models are likely to overfit the calibration dataset. All RF models show excellent models/predictions (RPD > 2.5). The best prediction model for the field spectra data was achieved with F-FD-RF-K (RPD = 3.77, Rp2 = 0.88, and RMSEP = 0.64 g 100 g−1), followed by F-FD-RF (RPD = 3.03, Rp2 = 0.89, and RMSEP = 0.79 g 100 g−1). As for RF algorithms, first derivative and key wavelength selection improve the performance of F-PLSR. Compared to F-PLSR (RPD = 1.88, Rp2 = 0.80, and RMSEP = 1.05 g 100 g−1), both F-FD-PLSR (RPD = 2.26, Rp2 = 0.86, and RMSEP = 0.88 g 100 g−1) and F-PLRS-K (RPD = 2.27, Rp2 = 0.86, and RMSEP = 0.75 g 100 g−1) predict SOC content in an independent dataset with higher accuracy.

4. Discussion

This study highlights the complexity of relationships between field and laboratory measurements of the spectral signatures of soils and their relationships to SOC. The remote sensing approach adopted here is suitable for inaccessible and geomorphically variable mountain landscapes. The complexity of the study area is depicted by high variability of soil properties including SOC values (Table 1). This can be explained by the geomorphological context of the study area which is characterised by steep bedrock-controlled slopes with stepped profiles, underfit river valleys [70], and nutrient-poor soils with limited bioavailability of nitrogen [37]. These soils accumulate on flat basalt weathered surfaces and in association with river floodplains and terraces. Subsistence farmers use these latter locations for small-scale agriculture (Figure 2).

High soil variability leads to high R2, RPD, and RMSEP values [71, 72]. This can also explain why results from our RF models are more accurate than the results that Viscarra Rossel and Behrens [29] found using 72 wavebands under laboratory conditions and slightly better than what Nawar and Mouazen [73] have recently found under laboratory conditions. Their best predictive RF model had a R2 of 0.84, RMSEP of 0.14 g 100 g−1, and RPD = 2.55, with a relatively low range of SOC in the whole dataset. However, the results we have obtained with the PLSR model are within the range that Viscarra Rossel and Behrens [29] and Li et al. [74] found under similar laboratory conditions, and Stevens et al. [75] and Viscarra Rossel and Behrens [29] found under similar field conditions, using the same regression approaches.

Comparing RF to PLSR, our results found that RF outperforms PLSR in predicting SOC values in an independent dataset. The predictive performance of RF can be explained by the fact that machine learning algorithms are less sensitive to nonlinear and complex data [29]. The weakness of PLSR to accurately predict SOC values in a new dataset has also been reported in some studies (e.g., [75, 76]).

It is difficult to compare the performance of laboratory and field models because they both achieve very similar results with slightly different combinations of input variables. However, Li et al. [74] found that the PLSR model achieved better accuracy within their laboratory spectral data than in their field data. The results also revealed that the reflectance of laboratory measurements is visually higher than the field reflectance. This is because of the presence of soil moisture in the field [77] which increases the forward scattering of light and enhances absorption at all wavelengths [78]. However, Viscarra Rossel et al. [79] found no significant difference between field and laboratory measurements because they removed the spectral bands for water absorption. This may be a suitable method for accounting for variations in soil moisture in field samples.

The effect of spectral derivative and key wavelength selection on improving the raw model was evident in this study. This has also been reported by other researchers (e.g., [80, 81]). Peng et al. [81] assessed the impact of 8 different preprocessing methods and showed that including the first derivative achieved the best result. Li et al. [74] demonstrated the superiority of the first derivative with SG smoothing to improve PLSR. Vasques et al. [80] also used SG smoothing to improve SOC model results. Viscarra Rossel and Behrens [29] improved the RF method by using a discrete wavelet transform algorithm as a feature selection method. These previous studies show the importance of key wavelength selection. Our results show that wavelengths between 400 and 700 nm are most important to predict SOC content (Figure 4), which is also in accordance with some previous studies [29, 82].

According to Viscarra Rossel and Hicks [83], the visible portion of the soil spectrum is mostly related to Fe oxide, either goethite or haematite. Different soil chromophores including chlorophyll, tannis, humic substances, iron oxide, and clay minerals may also explain the high correlation between spectral data and SOC in the visible region [34]. Viscarra Rossel et al. [79] found correlations with wavelengths around 410, 570, and 660 nm in the visible part of the spectrum. Under laboratory conditions, Wang et al. [72] reported 440, 560, 625, 740, and 1336 nm as the principal spectral bands to predict SOC. Nocita et al. [84] suggested that the spectral region between 580 and 680 nm was sufficient to predict SOC. Our results show some other important wavelengths at around 2000–2200 and 1400–1500 nm. Molecular vibration and rotation of organic functional groups are the main factors leading to absorption in the NIR region. Wavelengths around 2000–2200 nm, for example, may be attributed to the effects of carbonyl C = O/CH stretch vibration [83] or clay minerals [85] (Table 2). Spectral peaks around 1455 nm can be attributed to the presence of water but with the phenomenon of spectral overlapping, as discussed by Viscarra Rossel and Behrens [29]. Nevertheless, Stuart [86] attributed the spectral portion between 1400 and 1500 nm to the first overtone N-H stretching and first overtone O-H stretching.

5. Conclusions and Wider Implications

This study shows that SOC values in degrading mountainous landscapes can be predicted using different modelling approaches based on field and laboratory spectral measurements. The best model was shown to be the RF because the PLSR model was more likely to overfit the calibration dataset. There were also some slight differences when these models were applied to the field and laboratory data: the PLSR model was slightly better with the laboratory compared to the field data, whereas there was no difference with the RF model. The best model results were obtained with transformed spectral data, with the key wavelengths to predict SOC values mostly localised around the visible range (400–700 nm). These results are significant because they show that different models can produce results of varying accuracy, even when based on the same datasets. This has implications for operator choice when it comes to dataset analysis for the purpose of accurately predicting SOC values.

The results of this study also have wider implications for the accurate prediction of SOC content in degraded mountain landscapes such as Lesotho. The spatial variability in SOC values (Figure 1) shows that a field-based approach alone is unlikely to be able to accurately capture this variability, even with high-resolution sampling. The application of different spectral analytical techniques means that better predictive models for SOC content can be established, using satellite and not just ground-based remote sensing data, as described in this study. This will allow for a better understanding of spatial patterns of SOC in inaccessible mountain landscapes. Furthermore, quantifying carbon storage within soils, and coupling this to land surface models of soil erosion (e.g. [13]), can mean that carbon export from mountains can be calculated. This is critical for carbon budgeting as well as evaluation of soil nutrient status.

Data Availability

The soil analysis and the spectral data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This study was supported by NRF grant 90526 (to JK).