Abstract

Rapid, onsite determination of soil status and quality parameters holds great potential for improving food security and for minimizing the waste caused by excessive application of soil amendments, thereby reducing environmental pollution. In this study, a pocket-sized shortwave NIR spectrometer (740–1070 nm) and multivariate statistics were used to classify soils from different land-use types in Ghana and to simultaneously predict nitrogen (N), phosphorus (P), potassium (K), calcium (Ca2+), magnesium (Mg2+), and pH. Different algorithms, namely linear discriminant analysis (LDA), support vector machine (SVM), and partial least squares algorithms (full-range partial least squares, FrPLS; interval partial least squares, IPLS; synergy interval partial least squares, Si-PLS), were tested to build suitable classification and quantification models. The models were assessed by the classification rate, the coefficient of determination (Rp2), and the root mean square error of prediction (RMSEP) in the prediction set. A total of 110 soil samples from the 0–15 cm, 15–30 cm, and 30–45 cm layers were collected from fields under different land-use cropping systems. SVM achieved a 98.61% classification rate for soils from the different cropping systems, while Si-PLS was superior in predicting N, P, K, Mg2+, Ca2+, and pH. For N, P, K, Mg2+, Ca2+, and pH, the Si-PLS models gave Rp2 values of 0.571, 0.779, 0.910, 0.778, 0.826, and 0.904 and RMSEP values of 0.033%, 0.738 mg·kg−1, 0.117 cmol·kg−1, 0.654 cmol·kg−1, 3.0219 cmol·kg−1, and 0.4760 pH unit, respectively. The results revealed that the portable NIR spectroscopic technique could be used to measure soil status and some quality parameters, although further studies are needed to confirm its applicability. This could lead to improved yields and savings on the cost of fertilizer application.

1. Introduction

Soil quality and soil fertility management play a significant role in agricultural productivity and environmental pollution control; therefore, rapid knowledge of soil quality status is vital. Soil status is normally assessed in the laboratory with the traditional technique known as wet chemistry. The wet chemistry approach has numerous drawbacks: it is expensive and time-consuming, it requires chemical reagents, and it is often restricted to a few samples, or samples are bulked from an area to provide representative composites, prompting the use of pedotransfer functions as a substitute [1, 2]. It also generates unwanted waste and destroys the original soil samples [3]. Above all, it is confined to the laboratory and cannot be used in the field, where rapid and accurate results are needed most to support precision agriculture. An alternative technique that enables in situ determination is therefore required to promote precision agriculture. The development of alternative measurement methods that are accurate, rapid, and inexpensive is of great value [4].

NIR spectroscopy is an advanced analytical technique that has gained ground in various fields, including agriculture. It offers several advantages over traditional analytical methods: it is a physical, nondestructive technique that gives rapid results without chemical reagents, and it is therefore environmentally friendly and inexpensive [2]. Previous studies have revealed the potential of NIR spectroscopy for soil analysis, notably for measuring heavy metals in soil [4]; soil physical, chemical, and biochemical properties [5]; and soil carbon and nitrogen [6, 7]; for discriminating three major soil types [8]; and for discriminating organic matter in soil from grass and forest [9]. These studies have shown that NIR spectroscopy could provide the needed alternative for soil analysis. However, all of them used large NIR instruments that are unsuitable for onsite use, and there have been few or no attempts to use a small NIR spectrometer for the simultaneous determination of soil health properties. Owing to advances in computing and electronics, portable or small NIR spectrometers coupled with chemometrics have been proposed and developed, and these could offer an added advantage over laboratory-based NIR spectroscopy. Nevertheless, to date, few or no studies in Ghana have used pocket-sized, user-friendly NIR spectroscopy for soil analysis, either to classify different land-use types or to predict soil health quality parameters.

This research, therefore, investigates the feasibility of applying pocket-sized NIR spectroscopy coupled with multivariate statistics and a variable-wise selection protocol for the simultaneous classification of soils and detection of soil health properties, to inform the stepwise precision application of soil amendments. The specific objectives are to identify soils under different land-use types and to determine N, P, K, pH, Ca2+, and Mg2+ simultaneously by employing synergy interval variable selection.

2. Materials and Methods

2.1. Sample Collection

A total of 110 soil samples were collected at different depths (0–15, 15–30, and 30–45 cm) from different land-use types, namely arable, native, pasture, and plantation, as described by others [10]. Rough stones and plant debris were physically removed before the soil samples were air-dried. Each sample was then ground uniformly, passed through a 2 mm sieve, and packed in a well-labelled polythene bag before analysis.

2.2. Sample Spectral Acquisition

The spectrum of each sample was obtained in reflectance mode using a pocket-sized spectrometer (SCIO™) over the 740–1070 nm range at 1 nm resolution. For scanning, a 60 g sample was poured into a glass container (Figure 1) and scanned four times, being rotated by 45° between scans. The whole process was carried out at 28–31°C and 65% relative humidity. The raw spectra of the 110 soil samples, stored on the cloud-based platform, were downloaded using a SCIO Lab research license and imported into MATLAB version 9.5.0 (MathWorks Inc., USA). The raw dataset was divided into two subsets: a calibration set (77 samples) for developing the models and a prediction set (33 samples) for evaluating their predictive ability. To avoid bias in assigning samples to each subset, the Kennard–Stone algorithm was used to partition the dataset.
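The Kennard–Stone partitioning described above can be sketched as follows. This is a minimal NumPy version, not the authors' MATLAB code, and it assumes Euclidean distances computed on the spectra as given (the text does not state the distance metric or whether spectra were pretreated before partitioning). The algorithm starts from the two most distant samples and then repeatedly adds the sample whose nearest already-selected neighbour is farthest away, so the calibration set spans the spectral space.

```python
import numpy as np

def kennard_stone(X, n_cal):
    """Split rows of X into (calibration, prediction) index arrays by Kennard-Stone.

    X: (n_samples, n_wavelengths) spectra; n_cal: size of the calibration set.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if not 2 <= n_cal <= n:
        raise ValueError("n_cal must be between 2 and the number of samples")
    # Pairwise Euclidean distances between all spectra
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    # Start with the two most distant samples
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(i), int(j)]
    remaining = [k for k in range(n) if k not in selected]
    while len(selected) < n_cal:
        # Distance from each remaining sample to its nearest selected sample
        d_min = dist[np.ix_(remaining, selected)].min(axis=1)
        # Add the remaining sample that is farthest from the current selection
        k = remaining[int(np.argmax(d_min))]
        selected.append(k)
        remaining.remove(k)
    return np.array(selected), np.array(remaining)

# Tiny check on one-dimensional "spectra"
cal, pred = kennard_stone([[0.0], [1.0], [10.0], [5.0]], n_cal=3)
print(cal.tolist(), pred.tolist())  # [0, 2, 3] [1]

# Same 77/33 split sizes as in the text, on placeholder random spectra
spectra = np.random.default_rng(0).random((110, 331))
cal_idx, pred_idx = kennard_stone(spectra, n_cal=77)
```

Because the selection is deterministic, the same spectra always give the same split, which is one reason the method is preferred over random partitioning.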

2.3. Reference Methods

Soil pH was measured in a 1:2.5 (w/v) soil:water suspension with a pH meter [11]. Total nitrogen (N) was determined by the micro-Kjeldahl digestion method [12]. Available phosphorus was determined by the Bray-1 acid method [13]. Ca, Mg, and K were determined after extraction with ammonium acetate at pH 7 [14]. All analyses were performed in triplicate, and the measured soil chemical properties were summarized by range (minimum to maximum), mean, and standard deviation (SD), as shown in Table 1.

2.4. Mathematical Signal Treatments

In this study, five mathematical spectral pretreatments (MC, mean centring; MSC, multiplicative scatter correction; SNV, standard normal variate; FD, first derivative; and SD, second derivative) were compared to obtain the best model. Pretreatment of the raw spectra is essential in NIR modelling and cannot be skipped, yet choosing among the many available techniques is a considerable task. Spectral pretreatment is known to reduce or eliminate optical scattering from different particles and to reduce noise, thereby improving the prediction accuracy and robustness of the developed model [15]. It also removes unwanted signals caused by light scattering, baseline shift, and slope variations arising from particle size [16]. MC calculates the average spectrum of the data set and subtracts it from each acquired spectrum [17]. SNV is normally used to remove scatter variation in the spectral data by eliminating multiplicative interferences and scatter [16, 18]. MSC is a preprocessing technique normally used to correct for scattered light and to remove differences in the slope of spectral peaks; for more information, refer to [19]. Finally, FD and SD pretreatments are used to separate overlapping peaks and eliminate baseline shifts, and they are improved by applying the Savitzky–Golay algorithm.
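The five pretreatments can be written compactly in NumPy/SciPy, as in the sketch below, which takes spectra as rows. It is illustrative only. The Savitzky–Golay window (11 points) and polynomial order (2) are assumed values because the text does not report them. The MSC reference is assumed to be the mean spectrum, which is the usual default.

```python
import numpy as np
from scipy.signal import savgol_filter

def mean_centre(X):
    """MC: subtract the average spectrum of the data set from every spectrum."""
    return X - X.mean(axis=0)

def snv(X):
    """SNV: centre and scale each spectrum (row) by its own mean and standard deviation."""
    return (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)

def msc(X, reference=None):
    """MSC: fit x_i ~ a_i + b_i * ref by least squares, then return (x_i - a_i) / b_i."""
    ref = X.mean(axis=0) if reference is None else reference
    corrected = np.empty_like(X, dtype=float)
    for i, x in enumerate(X):
        b, a = np.polyfit(ref, x, deg=1)  # polyfit returns [slope, intercept]
        corrected[i] = (x - a) / b
    return corrected

def sg_derivative(X, deriv, window=11, polyorder=2):
    """FD (deriv=1) or SD (deriv=2) via a Savitzky-Golay filter along the wavelength axis."""
    return savgol_filter(X, window_length=window, polyorder=polyorder, deriv=deriv, axis=1)

# Usage on placeholder spectra: 110 samples x 331 wavelengths (740-1070 nm, 1 nm step)
rng = np.random.default_rng(0)
X = 0.5 + 0.1 * rng.random((110, 331))
pretreated = {
    "MC": mean_centre(X),
    "SNV": snv(X),
    "MSC": msc(X),
    "FD": sg_derivative(X, deriv=1),
    "SD": sg_derivative(X, deriv=2),
}
```

MC works column-wise (across samples), whereas SNV and MSC work row-wise (within each spectrum). This is why MC alone does not remove sample-specific scatter.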

2.5. Quantification Models

The partial least squares (PLS) algorithm is a well-known linear multivariate method proposed by Herman Wold for modelling complicated data sets [20]. It is widely used to analyse spectral data with strongly collinear, noisy, and redundant variables. However, the original PLS works on the full spectrum and involves a large data matrix that often contains both useful and unwanted information. To overcome this bottleneck, some researchers have resorted to manually selecting spectral regions to estimate particular chemical components [21, 22]. This approach, however, is slow and cumbersome and requires prior expert knowledge of which spectral regions to select. To address these challenges, interval partial least squares (IPLS) and synergy interval partial least squares (Si-PLS) were proposed. IPLS splits the spectra into smaller equidistant intervals and develops a PLS model for each subinterval, while Si-PLS also splits the data set into a number of intervals but then calculates PLS models for all possible combinations of more than one interval (two, three, and four intervals). For IPLS the best single interval, and for Si-PLS the best combination of intervals, is the one giving the lowest root mean square error of calibration (RMSEC). Model results are normally evaluated with three main parameters, namely the root mean square error of cross-validation (RMSECV), the root mean square error of prediction (RMSEP), and the coefficient of determination (R2) [23, 24]. These parameters are calculated as follows:

$$\mathrm{RMSECV}\ \text{or}\ \mathrm{RMSEP} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2},$$

$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2},$$

where $n$ is the number of samples, $y_i$ is the reference measurement for sample $i$, $\hat{y}_i$ is the value estimated by the model for sample $i$, and $\bar{y}$ is the mean of the reference measurements for all samples in the data set.
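As a small worked illustration of the two formulas, the functions below compute the RMSE and R2 for four made-up reference and predicted values. These values are purely illustrative and do not come from this study. The same RMSE function gives RMSEC, RMSECV, or RMSEP depending on whether the estimates come from the calibration fit, from cross-validation, or from the independent prediction set.

```python
import numpy as np

def rmse(y_ref, y_est):
    """RMSEC, RMSECV, or RMSEP: identical formula; only the origin of y_est differs."""
    y_ref, y_est = np.asarray(y_ref, dtype=float), np.asarray(y_est, dtype=float)
    return float(np.sqrt(np.mean((y_ref - y_est) ** 2)))

def r_squared(y_ref, y_est):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    y_ref, y_est = np.asarray(y_ref, dtype=float), np.asarray(y_est, dtype=float)
    ss_res = np.sum((y_ref - y_est) ** 2)
    ss_tot = np.sum((y_ref - y_ref.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

y_ref = [1.0, 2.0, 3.0, 4.0]   # reference (wet chemistry) values, illustrative only
y_est = [1.1, 1.9, 3.2, 3.8]   # model estimates, illustrative only
print(f"{rmse(y_ref, y_est):.4f}")       # 0.1581
print(f"{r_squared(y_ref, y_est):.4f}")  # 0.9800
```

Here the squared residuals sum to 0.10 and the total sum of squares is 5.0. This gives RMSE = sqrt(0.10/4) ≈ 0.158 and R2 = 1 − 0.10/5.0 = 0.98.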

3. Results and Discussion

3.1. Spectral Data Presentation

The spectral profiles obtained contain information useful for modelling. Figure 2(a) presents the raw spectra of soil samples from the different land-use types, which show several absorption bands. However, the spectra appear similar, with no obvious differences visible to the naked eye, so little useful information can be extracted by inspection alone; this called for multivariate algorithms to build qualitative and quantitative models for predicting the parameters of interest. The wavelength range used (740–1070 nm) covers absorptions of functional groups such as C-H stretch, C-H deformation, S-H, N-H, CH2, and CH3, which could correspond to soil parameters such as N, P, K, and pH and to other distinct attributes (Table 1) that could help differentiate the various soil types, as seen in Figure 2(b). The wet chemistry results obtained in this study showed a wide range of chemical properties (Table 1), which could be attributed to the wide array of land-use types from which the samples were collected. These results agree with those of other authors [10]. Furthermore, the relationship between spectral absorption wavelengths and soil chemical composition (absorption by C-H, O-H, and N-H bonds) makes it possible to quantify specific soil health parameters of interest by appropriate selection of the wavelength region [25], and this could explain the clear separation observed in Figure 2(b). Also, the organic matter in the samples has distinct spectral fingerprints in the NIR region because of the relatively strong overtone and combination-band absorptions of several functional groups (CH: aliphatic; CO: carboxyl; NH: amine and amide) usually present in organic compounds [26].

3.2. Principal Component Analysis (PCA)

Principal component analysis (PCA) is an unsupervised pattern recognition tool for observing possible cluster trends in a reduced-dimensional space. It works by reducing the dimension of the data matrix and translating the useful information into interpretable variables known as principal components (PCs). Figure 3(a) shows the PCA results, which revealed four distinct soil groups. All the samples clustered well in the PC1–PC2 plane, where PC1 and PC2 explained 92.68% and 6.68% of the variance, respectively, giving a cumulative 99.37% of the variance for the 110 samples used in this study. This means that the first two PCs capture almost all of the spectral variation and carry the chemical compositional information in the NIR region needed for modelling. The soil samples therefore differ considerably in chemical properties according to their land-use type. Since PCA is not a classification tool, LDA and SVM were used to build classification models.
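A PCA score plot and its explained-variance figures, such as the 92.68% and 6.68% reported above, can be obtained with scikit-learn as sketched below. The spectra are hypothetical stand-ins. They were built so that four groups share one band shape but differ in overall level, so the printed percentages describe only this toy data and not the study's results.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical spectra for illustration: four groups (stand-ins for land-use types)
# share one band shape but differ in overall level, plus small random noise.
rng = np.random.default_rng(2)
w = np.linspace(0.0, 1.0, 331)                     # 331 points ~ 740-1070 nm at 1 nm
labels = np.arange(110) % 4
band = 1.0 + 0.3 * np.exp(-((w - 0.5) / 0.1) ** 2)
X = np.array([(1.0 + 0.2 * k) * band for k in labels]) + rng.normal(0.0, 0.01, (110, 331))

pca = PCA(n_components=2)
scores = pca.fit_transform(X)                      # PCA mean-centres X internally
explained = 100 * pca.explained_variance_ratio_    # % variance per PC
print(f"PC1 {explained[0]:.2f}%, PC2 {explained[1]:.2f}%, cumulative {explained.sum():.2f}%")

# Group means of the PC1 scores: one value per group, i.e. four clusters along PC1
group_means = [scores[labels == k, 0].mean() for k in range(4)]
```

Plotting `scores[:, 0]` against `scores[:, 1]` coloured by `labels` gives the kind of score plot shown in Figure 3(a). PCA never sees the labels; the colours are added afterwards, which is what makes it unsupervised.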

3.3. Classification Model

There are several classification algorithms, and selecting which ones to use is often a challenge. In this experiment, linear discriminant analysis (LDA) and the support vector machine (SVM) were compared, because every multivariate classification model has its own strengths and weaknesses. Table 2 shows that the LDA model reached its optimum classification rates of 98.65% and 97.22% in the calibration and prediction sets, respectively, after FD preprocessing was applied to the raw data. This finding supports the point made earlier that preprocessing improves modelling results, as it normally eliminates unwanted information, reduces noise, improves accuracy, and enhances the robustness of the developed classification model [15].
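An FD-then-LDA classifier and its classification rates can be sketched with scikit-learn as below. The sketch makes several assumptions that the text does not state. It inserts a PCA step (10 components) before LDA to compress the 331 collinear wavelengths, a common chemometric choice, so strictly this is PCA-LDA. It also assumes Savitzky–Golay settings of 11 points and order 2. The synthetic four-class spectra only demonstrate the workflow, and the resulting rates are not comparable with Table 2.

```python
import numpy as np
from scipy.signal import savgol_filter
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer

def first_derivative(X):
    # FD pretreatment: Savitzky-Golay first derivative along the wavelength axis
    # (window 11, quadratic polynomial: assumed values, not taken from the paper)
    return savgol_filter(X, window_length=11, polyorder=2, deriv=1, axis=1)

def fd_lda_rates(X_cal, y_cal, X_pred, y_pred, n_pcs=10):
    """Classification rates (%) of an FD -> PCA -> LDA pipeline."""
    model = make_pipeline(
        FunctionTransformer(first_derivative),
        PCA(n_components=n_pcs),          # assumed step: compress collinear wavelengths
        LinearDiscriminantAnalysis(),
    )
    model.fit(X_cal, y_cal)
    # score() returns the fraction of correctly classified samples
    return 100 * model.score(X_cal, y_cal), 100 * model.score(X_pred, y_pred)

# Synthetic spectra: four classes with different band shapes, a shared offset, and noise
rng = np.random.default_rng(0)
w = np.linspace(0.0, 1.0, 331)
labels = np.arange(110) % 4
X = np.array([np.sin((k + 1) * np.pi * w) for k in labels])
X += 0.3 + rng.normal(0.0, 0.02, X.shape)   # constant offset is removed by FD
is_cal = np.arange(110) < 77                # simple 77/33 split for the demo
cal_rate, pred_rate = fd_lda_rates(X[is_cal], labels[is_cal], X[~is_cal], labels[~is_cal])
print(f"calibration {cal_rate:.2f}%, prediction {pred_rate:.2f}%")
```

Wrapping the pretreatment inside the pipeline ensures that exactly the same transformation is applied to the calibration and prediction spectra.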

The SVM, on the other hand, obtained the best results, with classification rates of 99.32% and 98.61% in the calibration and prediction sets, respectively (Table 2). Among the preprocessing techniques used, MSC and FD improved the raw spectral data and hence the final classification rate. This may be because MSC corrects for scattered light and removes differences in the slope of spectral peaks, while FD enhances spectral separation. In this research, the MSC-SVM and FD-SVM models were superior in classifying land-use types. A cross-validation analysis was also performed, and Figure 4 shows its results for randomly selected spectra used to test the model. Of the samples shown in Figure 4, only one, from the pasture land-use type, was misclassified. The strong performance could be explained by the SVM constructing a hyperplane that separates the classes in a higher-dimensional feature space, because the SVM transforms data from a low-dimensional input space into a high-dimensional feature space [17].
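A minimal MSC-SVM pipeline is sketched below with scikit-learn. It is not the authors' implementation. The RBF kernel, `C=10`, `gamma="scale"`, the standard scaling step, and the synthetic spectra are all assumptions made for illustration. A custom MSC transformer learns its reference spectrum from the calibration set only, so no information from the prediction set leaks into the pretreatment.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

class MSC(BaseEstimator, TransformerMixin):
    """MSC whose reference (mean spectrum) is learned from the calibration set only."""

    def fit(self, X, y=None):
        self.reference_ = np.asarray(X, dtype=float).mean(axis=0)
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        out = np.empty_like(X)
        for i, x in enumerate(X):
            # Fit x ~ intercept + slope * reference, then undo the scatter effects
            slope, intercept = np.polyfit(self.reference_, x, deg=1)
            out[i] = (x - intercept) / slope
        return out

# Synthetic spectra: four classes (different peak positions) with random
# multiplicative (scale) and additive (offset) scatter, plus small noise
rng = np.random.default_rng(1)
w = np.linspace(0.0, 1.0, 331)
bases = [1 + 0.3 * np.exp(-((w - c) / 0.05) ** 2) for c in (0.2, 0.4, 0.6, 0.8)]
labels = np.arange(110) % 4
X = np.array([rng.uniform(0.8, 1.2) * bases[k] + rng.uniform(-0.1, 0.1) for k in labels])
X += rng.normal(0.0, 0.005, X.shape)

X_cal, X_pred, y_cal, y_pred = train_test_split(
    X, labels, test_size=33, stratify=labels, random_state=0)
svm = make_pipeline(MSC(), StandardScaler(), SVC(kernel="rbf", C=10.0, gamma="scale"))
svm.fit(X_cal, y_cal)
cal_rate = 100 * svm.score(X_cal, y_cal)      # classification rate, calibration set (%)
pred_rate = 100 * svm.score(X_pred, y_pred)   # classification rate, prediction set (%)
```

The RBF kernel is what supplies the implicit mapping to a higher-dimensional feature space mentioned in the text. A linear kernel would instead search for a separating hyperplane directly in the input space.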

It is important to explain why the classification was so accurate. Figure 5 shows the total contribution of the wavelengths responsible for the clear separation and classification of the land-use types. For the first component, the major peak was found around 900 nm, which corresponds to the third overtone of CH3 and CH2 [26] associated with organic materials, while for the second and third components, the major peaks were found around 800–830 nm, 850–875 nm, 925–950 nm, and 1000–1050 nm. These wavelengths correspond to RNH2, ArCH, CH3, CH2, and RONH2 [26], which are associated with chemical properties such as nitrogen, pH, and organic carbon, among others, in the soils used in this research.

3.4. Quantitative Models

The spectral prediction of nitrogen, phosphorus, potassium, pH, calcium, and magnesium was modelled using full-spectrum PLS and the wavelength selection techniques IPLS and Si-PLS. With the full PLS algorithm, first-derivative preprocessing performed better than the other pretreatments for all the soil quality parameters (Table 3). This could be due to the ability of first-derivative pretreatment to reveal the presence and locations of hidden absorption bands [27]. In Table 4, by contrast, the parameters measured did not show any well-defined pattern in preprocessing performance: mean centring (MC) was superior for nitrogen and calcium, while first derivative and SNV outperformed the others for phosphorus, pH, and potassium. In general, Si-PLS gave optimal performance for all the parameters studied (Table 5). Specifically, FD preprocessing also enhanced the results for most quality parameters (N, P, and K), MC enhanced the calcium results, and no preprocessing was needed for pH and magnesium.

Comparatively, as seen in Table 6, IPLS performed worst, followed by full PLS, while Si-PLS performed best for all the parameters studied (N, P, K, Mg2+, Ca2+, and pH). These differences can be explained by the properties of each PLS variant. Full PLS uses the whole spectral region of the soil samples, which contains some irrelevant spectral information that inevitably reduces model performance. IPLS overcomes this limitation by selecting the single most informative region to calibrate the PLS model; however, restricting the model to a single interval neglects other useful spectral information, which explains the sharp decline in IPLS performance. Si-PLS, on the other hand, combines more than one informative interval to model the parameter of interest, as in this study. Si-PLS was therefore superior to full PLS and IPLS because it overcame the shortcomings of both.

More specifically, Si-PLS performed best for nitrogen prediction (Table 6). The optimal spectral intervals selected were 770–784, 945–958, and 973–986 nm at 4 PLS components (Figure 6(a)). These regions correspond to absorption bands for soil nitrogen, as they are associated with RNH2 [26]; they are also associated with C-H and N-H third overtones. For phosphorus, the optimal intervals were 768–781, 894–907, 973–986, and 1058–1070 nm at 3 PLS components (Figure 6(b)); these lie in the third overtone region and correspond to ArOH, CH3, and ArCH. Phosphorus plays a vital role in capturing, storing, and converting the sun's energy into biomolecules such as adenosine triphosphate (ATP), which drives biochemical reactions (photosynthesis). For potassium, the optimal intervals were around 846–860, 876–890, 921–935, and 996–1010 nm with 7 PLS components in the second overtone region, corresponding to ArCH and CH absorptions (Figure 6(c)). Potassium supports the formation and transport of sugars and starch in the plant and is also vital for water regulation in plants. Soil pH is very important because it influences several soil factors affecting plant growth, such as soil structure, soil bacteria, and nutrient availability, and it has been described as the master soil variable [28]. In this study, the four intervals selected for pH were 810–823, 824–837, 922–935, and 1019–1031 nm at 7 PLS components (Figure 6(d)). These regions correspond to CH3, CH2, C-H, and O-H absorptions associated with acidity [26]. Rapid onsite determination of soil pH is particularly important because it gives an immediate indication of the soil condition and the expected direction of many soil processes, and it can also be applied to nutrient cycling for plant nutrition and to soil remediation [28].
For the optimal modelling of calcium and magnesium (Figures 6(e) and 6(f)), Si-PLS selected 756–770, 801–815, 936–950, and 981–995 nm at 8 PLS components and 768–781, 824–836, 967–979, and 1019–1030 nm at 12 PLS components, respectively. Ca2+ and Mg2+ are secondary macronutrients required by plants for growth. More specifically, Ca2+ is a component of plant cell walls that maintains their strength and improves fruit set and quality. It also benefits soil properties by improving soil structure and by enabling nitrogen-fixing bacteria on the roots of leguminous plants to capture atmospheric nitrogen into the soil. Mg2+, on the other hand, is an essential component of the chlorophyll molecule and is therefore essential for photosynthesis. Notably, Ca2+ and Mg2+ levels and their balance are two important factors affecting plant growth [29]. Furthermore, although some constituents, such as heavy metals, do not absorb NIR radiation, they can be predicted owing to their correlation with other spectrally active parameters [30, 31]. The findings of this study are similar to those of other researchers [31, 32]. The results in Table 6 indicate that the models could be used acceptably for screening and other “approximate” calibrations, and that models with R2 in the range 0.83–0.90 could be used with caution for most applications, including research [33].

4. Conclusion

To the authors' knowledge, this work is the first to show that pocket-sized NIR spectroscopy in the 740–1070 nm range could be used onsite to differentiate soils from different land-use types and to determine N, P, K, Mg2+, Ca2+, and pH simultaneously. The systematic comparison of PLS calibration models for predicting soil health parameters showed that efficient spectral interval selection (Si-PLS) was superior for measuring N, P, K, Mg2+, Ca2+, and pH in soils, with correlation coefficients ranging from 0.699 to 0.898 and RMSEP between 0.033 and 3.02 in the prediction set. Accordingly, the nitrogen model could be acceptable for very rough to rough screening, while the others could be acceptable for screening and other “approximate” calibrations and usable with caution for most applications, including research [33]. These findings indicate that portable NIR spectroscopy could be used, with caution, for the rapid and simultaneous prediction of soil status and quality parameters. However, more studies are needed to establish the robustness of these findings, given the technique's strong potential to reduce reliance on time-consuming wet chemistry. It could also help make precision fertilizer application a reality in resource-poor communities, especially in developing countries. As this is only a feasibility study of portable NIRS, further studies are required at different geographical locations and across a wider range of land-use types.

Data Availability

The data used to support the findings of this study are available upon request from the corresponding author.

Conflicts of Interest

The authors declare that there are no conflicts of interest.

Authors’ Contributions

Ernest Teye developed the concept, provided resources, designed the experiment, wrote the first draft, and provided supervision. Charles L. Y. Amuah and Kwadwo K. Kusi analysed the spectra data and wrote and reviewed the first draft. Ransford Opoku Darko performed the experiment and analysed the soil chemistry data. Michael Miyittah, Rebecca Owusu, and Emmanuel Afutu revised the drafted manuscript and provided supervision. Kofi Atiah supervised the soil analysis and reviewed the first draft. Thomas Abindaw performed the soil analysis and analysed the data.

Acknowledgments

The support provided by the Directorate of Research, Innovation and Consultancy (RSG/INT/CANS/2021/101) of the University of Cape Coast is gratefully acknowledged. The authors also thank Mr Steve Adu, Osei Agyemang, and Mr Francis Padi Lamptey for their support during sample collection and laboratory analysis.