Abstract

Although visible-near-infrared spectroscopy can rapidly and accurately determine soil nutrients without sample destruction, some problems remain unresolved, such as the mismatch between an established spectral model and samples of a different type, which limits the wide application of this technology. Here, we took riverside and mountain soils as examples to explore the calibration transfer between two different types of soils by the WMPDS-S/B algorithm (wavelet multiscale piecewise direct standardization combined with the Slope/Bias correction method) and by adding new samples. The predicted total nitrogen (TN) and total carbon (TC) concentrations improved significantly after calibration transfer. Compared with adding new samples, the WMPDS-S/B algorithm obtained more accurate results. The average relative errors dropped from 440.2% (without transformation) to approximately 6% for TN and from 342.0% to approximately 7% for TC. The maximum relative errors were reduced from 538.1% to less than 20% for TN and from 403.7% to less than 20% for TC. The RMSEP decreased from 2.42 to approximately 0.04 for TN and from 15.74 to approximately 0.4 for TC. The WMPDS-S/B algorithm had the advantages of requiring fewer known samples and obtaining better prediction results. In contrast to past studies, which resolved the calibration transfer between different spectrometers and measurement environments for the same samples, our study resolved the calibration transfer between different types of samples under the same spectrometer and measurement environment. The former could only be used for correction among instruments, while the latter fundamentally solved the problem of model sharing across different samples.

1. Introduction

Spectral analysis is fast, accurate, and nondestructive, and it can provide rich material information [1–3]; thus, it is widely used in many fields [4–9]. In agriculture, soil nutrients have been successfully predicted by spectral models established from known soils sampled at the same sites and under the same environment [10–14]. That is, a spectral model works only within a specific range. If the unknown samples differ from the model samples because of changes in sampling sites, seasons, or other factors, the spectral model will give inaccurate results. This limitation has hindered the progress of spectroscopy technology; therefore, a solution to the problem is highly needed.

Currently, studies concentrate on similar questions for the same samples measured with different instruments, under different measurement conditions, or in other varying circumstances (called calibration transfer) [15–19]. Many algorithms have successfully resolved these calibration transfer problems, such as direct standardization (DS) [20, 21], piecewise direct standardization (PDS) [22, 23], orthogonal signal correction (OSC) [24, 25], the wavelet transform (WT) algorithm [26, 27], and canonical correlation analysis (CCA) [28, 29]. Successful calibration transfer between the same type of samples could keep the accuracy of the same set of spectrometers; however, it could not resolve how to make a spectrometer share a stable spectral model for different types of samples.

Studies on calibration transfer between different types of samples are very scarce. The main solution was to add new samples to the original spectral model. Then, using the enriched sample set, a new spectral model was rebuilt and used for predicting the different samples. Because this method required rebuilding the spectral model, it was labor-intensive and time-consuming, and the prediction capability of the new spectral model was reduced. Therefore, it was urgent to explore other methods to predict unknown samples of different types quickly and accurately.

The piecewise direct standardization (PDS) algorithm was a calibration transfer method proposed by Wang and Kowalski in 1991 [30]. It required few samples, transferred the correction model faster, and improved the robustness of calibration effectively, so PDS was the most popular method for calibration transfer. However, a fixed transformation matrix was used for the whole spectral data and might not be suitable for every specific spectral region [31, 32]. The disadvantages of PDS could be avoided by wavelet analysis. Wavelet analysis could decompose a signal by layers and dimensions; therefore, it had good localization properties and could provide frequency information about the target signal. Its multiresolution analysis combined with PDS had been successfully used in calibration transfer for the same samples measured with different instruments [33–35]. We hypothesized that it could also be used for calibration transfer between different types of samples.

Concentrations of total nitrogen (TN) and total carbon (TC) in soils were better predicted than other soil nutrients by spectral models and were thus selected for the study of calibration transfer [36, 37]. The Slope/Bias algorithm (S/B) was often used to correct the result of the chemical values in calibration transfer, which rebuilt the linear relationships based on abundant data [38]. For instance, Cooper et al. provided a simple slope and bias correction to transfer calibration with different instruments [39]. We would also use S/B to reduce the differences between concentrations of soil nutrients.

Here, we explored the calibration transfer of TN and TC concentrations between riverside and mountain soil samples using wavelet multiscale piecewise direct standardization combined with the Slope/Bias correction method (WMPDS-S/B). Additionally, we compared the results with those by adding new samples. To reduce the influence of other factors on calibration transfer, we sampled spectral data from the same equipment and the same test environment.

2. Experimental

2.1. Samples and Data Acquisition

Two types of upper soils (0–20 cm) were collected from the riverside of Licun River and the foot of Fushan Mountain in Qingdao, China. Licun River soil (riverside soil) was silt loam rich in nutrients, while Fushan Mountain soil (mountain soil) was sandy loam poor in nutrients. We removed foreign bodies such as stones from the soil samples. After drying to a constant weight, all soil samples were sieved through a 0.45 mm nylon screen. Soil TN and TC concentrations were measured with a carbon and nitrogen analyzer (Perkin Elemental Analyzer, USA). The concentrations of TN and TC differed greatly between the two types of soils (Table 1).

Soil spectral data were determined by an Ocean Optics QE65000 spectrometer. The sampling interval was 1 nm, and the integration time was 1000 ms. The spectral range was 200–1100 nm, mainly the visible and near-infrared spectrum, including a small amount of the ultraviolet spectrum. The optical fiber probe was inserted into the hole of the probe bracket at a 45° angle so that it was held firmly in the bracket with its tip just exposed.

Soil (3–5 g) was gently flattened in a homemade sample box, whose size was the same as the probe bracket and whose cell overlapped the hole of the probe bracket. The schematic diagram of soil spectral data measurements is shown in Figure 1. Each soil sample was measured five times. To reduce the influence of the noise in the reflectance spectrum, the spectral data of 226–975 nm were retained. The reflectance spectra of the river and mountain soils are shown in Figure 2.

The principal component space was constructed by the scores of the spectral principal component (Figure 3), and the distribution of the principal components in the samples could be used to determine whether the unknown samples were fit to the original model. As shown in Figure 3, the river soil and mountain soil samples were divided into two regions in the principal component space. This outcome indicated that the spectrum of the two types of soil samples was obviously different. Therefore, the model established by the river soil was hardly suitable for predicting the mountain soil directly.

2.2. Data Treatment and Analysis
2.2.1. Pretreatment and Modeling

The reflectance spectral data (226–975 nm) of the river and mountain soils were pretreated using Savitzky–Golay smoothing and differentiation. The calibration set and test set of the two types of soil samples were divided into 3 : 1 proportions by the Kennard–Stone algorithm, with 45 soil samples as the calibration set and the rest of the samples as the test set. The spectral model of the calibration set was built by partial least squares (PLS) [40, 41] (Table 2). Results showed that a better spectral model was built by the river soil (Table 2); therefore, it was set as the First Sample and mountain soil was set as the Second Sample.
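The Kennard–Stone division mentioned above can be illustrated with a minimal Python sketch. It assumes the common formulation of the algorithm (start from the two most distant samples, then repeatedly add the sample whose nearest already-selected neighbor is farthest away) with Euclidean distances; the function name `kennard_stone` and the toy data are hypothetical and are not taken from the paper.

```python
import math


def kennard_stone(X, n_select):
    """Return indices of n_select rows of X chosen by the Kennard-Stone rule."""
    n = len(X)
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and len(X)")
    # Pairwise Euclidean distances between all samples.
    dist = [[math.dist(a, b) for b in X] for a in X]
    # Start with the two samples that are farthest apart.
    i0, j0 = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: dist[pair[0]][pair[1]],
    )
    selected = [i0, j0]
    remaining = [i for i in range(n) if i not in (i0, j0)]
    while len(selected) < n_select:
        # Add the candidate whose nearest selected sample is farthest away.
        best = max(remaining, key=lambda i: min(dist[i][s] for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: 8 "spectra" with 2 variables each (illustrative values only).
spectra = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0],
           [0.5, 0.5], [0.4, 0.6], [3.0, 3.0], [0.1, 0.1]]
calibration_idx = kennard_stone(spectra, 6)  # 3 : 1 split of 8 samples
test_idx = [i for i in range(len(spectra)) if i not in calibration_idx]
print(calibration_idx, test_idx)
```

Because the rule favors samples that span the data space, the calibration set tends to cover the extremes, while the test set is drawn from the interior.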

2.2.2. Calibration Transfer Algorithm

Here, we use two methods: wavelet multiscale piecewise direct standardization combined with the Slope/Bias correction method (WMPDS-S/B) and the method of adding new samples. WMPDS-S/B is the more complicated of the two; its specific steps are described below.

(1) Wavelet decomposition is used to obtain the wavelet coefficients at each scale.

First, the wavelet generating function and the number of wavelet decomposition layers $N$ are selected. Then, the spectrum of the First Sample $X_1$ and the spectrum of the Second Sample $X_2$ are decomposed by wavelet analysis, and the approximation coefficients $cA_1$ and $cA_2$ and the detail coefficients $cD_{1,k}$ and $cD_{2,k}$ ($k = 1, 2, \ldots, N$) of the First Sample and the Second Sample are obtained. The corresponding formula is as follows:

$$X_1 \rightarrow \{cA_1, cD_{1,N}, \ldots, cD_{1,1}\}, \qquad X_2 \rightarrow \{cA_2, cD_{2,N}, \ldots, cD_{2,1}\}.$$

(2) The transformation matrices of the wavelet coefficients are calculated using the PDS algorithm.

A part of the soil samples from the Second Sample are taken as standard samples for calculating the transformation matrices, denoted as the standard set $S$. The remainder of the Second Sample are unknown samples, denoted as the unknown set $U$. To calculate the transformation matrix of the approximation coefficients between the calibration set of the First Sample, $cA_1$, and the standard set of the Second Sample, $cA_{2S}$, the averages $\overline{cA}_1$ and $\overline{cA}_{2S}$ are first obtained:

$$\overline{cA}_1(j) = \frac{1}{n_1}\sum_{i=1}^{n_1} cA_1(i,j), \qquad \overline{cA}_{2S}(j) = \frac{1}{n_S}\sum_{i=1}^{n_S} cA_{2S}(i,j),$$

where $cA(i,j)$ is the value at the jth wavelength point of the ith sample and $n_1$ and $n_S$ are the numbers of samples in $cA_1$ and $cA_{2S}$.

A spectral band of the window width ($w$) is taken around the jth wavelength point. Let $Z_j$ denote this band of the standard set of the Second Sample, set $\overline{cA}_1(j) = Z_j b_j$, and then calculate the regression coefficients $b_j$. Loop over $j$ to find all the $b_j$. Put the $b_j$ on the main diagonal of the transformation matrix $F_A$, set the other elements to 0, and finally obtain the transformation matrix $F_A$. In the same way, we obtain the transformation matrices $F_{D,k}$ of the detail coefficients between the calibration set of the First Sample and the standard set of the Second Sample.

(3) The transfer spectra of the standard set and unknown set in the Second Sample are obtained by using the transformation matrices.

According to the transformation matrix $F_A$, the transfer approximation coefficients $cA_{2S}^{t}$ and $cA_{2U}^{t}$ are obtained by transforming the approximation coefficients of the standard set and unknown set of the Second Sample, $cA_{2S}$ and $cA_{2U}$:

$$cA_{2S}^{t} = cA_{2S} F_A, \qquad cA_{2U}^{t} = cA_{2U} F_A.$$

Then, the transfer detail coefficients of the standard set and unknown set in the Second Sample, $cD_{2S,k}^{t}$ and $cD_{2U,k}^{t}$, are calculated in turn. Finally, the transfer spectra $X_{2S}^{t}$ and $X_{2U}^{t}$ are obtained by reconstruction from the transferred wavelet coefficients of the Second Sample.

(4) The Slope/Bias correction method is used to calculate the final prediction value.

The least squares solution is obtained by fitting, with linear regression, the values predicted from the transfer spectrum $X_{2S}^{t}$ to the measured nutrient values $y_S$ of the standard set of the Second Sample, which provides the slope $k$ and intercept $b$ of the linear model. Substituting the values $\hat{y}_U$ predicted from the transfer spectrum $X_{2U}^{t}$ of the unknown set, we find the final prediction values $y_U$ of the unknown set by the following equation:

$$y_U = k\,\hat{y}_U + b.$$
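As an illustration of the Slope/Bias step, the following minimal Python sketch fits a straight line between model predictions and measured values on a standard set and then applies that line to predictions for an unknown set. It assumes ordinary least squares with a single predictor; the function names and all numbers are hypothetical and do not come from the paper's data.

```python
def fit_slope_bias(predicted, measured):
    """Least-squares line: measured ~ slope * predicted + bias."""
    n = len(predicted)
    if n != len(measured) or n < 2:
        raise ValueError("need two equally long sequences with >= 2 values")
    mean_p = sum(predicted) / n
    mean_m = sum(measured) / n
    sxx = sum((p - mean_p) ** 2 for p in predicted)
    if sxx == 0:
        raise ValueError("predicted values must not all be identical")
    sxy = sum((p - mean_p) * (m - mean_m) for p, m in zip(predicted, measured))
    slope = sxy / sxx
    bias = mean_m - slope * mean_p
    return slope, bias


def apply_slope_bias(predicted, slope, bias):
    """Correct raw model predictions with the fitted slope and bias."""
    return [slope * p + bias for p in predicted]


# Hypothetical standard set: raw predictions from the transferred spectra
# and the measured reference values (constructed as 0.2 * p + 0.1).
standard_pred = [2.0, 3.0, 4.0, 5.0]
standard_meas = [0.5, 0.7, 0.9, 1.1]
slope, bias = fit_slope_bias(standard_pred, standard_meas)

# Hypothetical unknown set: raw predictions to be corrected.
unknown_corrected = apply_slope_bias([2.5, 4.5], slope, bias)
print(slope, bias, unknown_corrected)
```

Only the standard set needs measured reference values; the unknown set is corrected with the two fitted numbers alone.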

In this paper, the wavelet generating function chooses “db3,” and the decomposition layer number N is 3.
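To make the three-layer db3 decomposition concrete, the sketch below implements a multilevel Daubechies-3 wavelet transform and its inverse in plain Python. Several points are assumptions rather than details from the paper: the boundary is handled by periodic extension, the signal length must be divisible by 2^N (the 750-point spectra of 226–975 nm would need padding or another extension mode), and the function names are illustrative. The filter taps are computed from the closed-form db3 expression.

```python
import math


def db3_lowpass():
    """Daubechies-3 scaling filter (6 taps) from its closed form."""
    r10 = math.sqrt(10.0)
    s = math.sqrt(5.0 + 2.0 * r10)
    scale = 16.0 * math.sqrt(2.0)
    return [
        (1 + r10 + s) / scale,
        (5 + r10 + 3 * s) / scale,
        (10 - 2 * r10 + 2 * s) / scale,
        (10 - 2 * r10 - 2 * s) / scale,
        (5 + r10 - 3 * s) / scale,
        (1 + r10 - s) / scale,
    ]


H = db3_lowpass()
# Quadrature mirror high-pass filter derived from the low-pass filter.
G = [(-1) ** m * H[len(H) - 1 - m] for m in range(len(H))]


def dwt_step(x):
    """One decomposition level with periodic extension (assumption)."""
    n = len(x)
    if n % 2:
        raise ValueError("signal length must be even")
    taps = range(len(H))
    approx = [sum(H[m] * x[(2 * k + m) % n] for m in taps) for k in range(n // 2)]
    detail = [sum(G[m] * x[(2 * k + m) % n] for m in taps) for k in range(n // 2)]
    return approx, detail


def idwt_step(approx, detail):
    """Inverse of dwt_step (transpose of the orthogonal analysis step)."""
    n = 2 * len(approx)
    x = [0.0] * n
    for k in range(len(approx)):
        for m in range(len(H)):
            x[(2 * k + m) % n] += H[m] * approx[k] + G[m] * detail[k]
    return x


def wavedec(x, levels):
    """Return [cA_N, cD_N, ..., cD_1] for an N-level decomposition."""
    if len(x) % (2 ** levels):
        raise ValueError("signal length must be divisible by 2**levels")
    approx, details = list(x), []
    for _ in range(levels):
        approx, d = dwt_step(approx)
        details.append(d)
    return [approx] + details[::-1]


def waverec(coeffs):
    """Rebuild the signal from [cA_N, cD_N, ..., cD_1]."""
    approx = coeffs[0]
    for d in coeffs[1:]:
        approx = idwt_step(approx, d)
    return approx


# Toy 64-point "spectrum" (illustrative values only).
signal = [math.sin(2 * math.pi * i / 64) + 0.1 * ((i * 7) % 5) for i in range(64)]
coeffs = wavedec(signal, 3)
rebuilt = waverec(coeffs)
max_error = max(abs(a - b) for a, b in zip(signal, rebuilt))
print([len(c) for c in coeffs], max_error)
```

In a WMPDS-style workflow, each coefficient vector in `coeffs` would be transformed separately before `waverec` rebuilds the transfer spectrum; here the coefficients are left unchanged, so the reconstruction simply recovers the input up to rounding error.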

2.2.3. Evaluation Standard

The evaluation standard used the average relative error (errora), the maximum relative error (errorm), and the root mean square error of prediction (RMSEP), which together offered a comprehensive analysis. The smaller the average relative error, the maximum relative error, and the RMSEP were, the better the calibration transfer was.
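The three evaluation indices can be computed as in the short Python sketch below. The paper does not print the formulas, so the usual definitions are assumed here: the relative error of a sample is |predicted − measured| / |measured| expressed in percent, errora is the mean and errorm the maximum of these values, and RMSEP is the square root of the mean squared prediction error. The function names and the numbers are hypothetical.

```python
import math


def relative_errors(predicted, measured):
    """Per-sample relative error in percent (assumed definition)."""
    return [abs(p - m) / abs(m) * 100.0 for p, m in zip(predicted, measured)]


def evaluate(predicted, measured):
    """Return errora, errorm (both in %) and RMSEP for one prediction set."""
    if len(predicted) != len(measured) or not measured:
        raise ValueError("need two equally long, non-empty sequences")
    rel = relative_errors(predicted, measured)
    mse = sum((p - m) ** 2 for p, m in zip(predicted, measured)) / len(measured)
    return {
        "errora": sum(rel) / len(rel),
        "errorm": max(rel),
        "RMSEP": math.sqrt(mse),
    }


# Illustrative values only, not data from the paper.
measured = [1.0, 2.0, 4.0]
predicted = [1.1, 1.8, 4.0]
result = evaluate(predicted, measured)
print(result)
```

Relative errors are undefined for a measured value of zero, so this sketch assumes strictly nonzero reference concentrations.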

3. Results and Discussion

3.1. Calibration Transfer of TN Concentrations between Two Different Types of Soils

The calibration transfer of TN concentrations between two different types of soils was realized by WMPDS-S/B. The PDS window width and the number of samples in the standard set of the Second Sample affected the results of calibration transfer (Figure 4). To study the influence of the window width, we fixed the number of standard samples at 15, selected by the Kennard–Stone algorithm, and varied the window width from 3 to 19 in steps of 2. To study the influence of the number of standard samples, we fixed the window width at 15 and varied the number of standard samples, selected by the Kennard–Stone algorithm, from 11 to 20 in steps of 1. When the window width was 15 and the number of standard samples was 14, the average relative error was the smallest (Figure 4). Therefore, we chose these parameters for the subsequent calibration transfer.
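The parameter selection described above varies one factor at a time, which can be sketched in Python as follows. The scoring function stands in for the average relative error obtained after a full calibration transfer; here it is a made-up bowl-shaped function, not the paper's results, and the names `one_factor_search` and `toy_score` are hypothetical.

```python
def one_factor_search(score, widths, counts, start_count):
    """Pick the window width first, then the number of standard samples."""
    # Step 1: vary the window width with the number of standards fixed.
    best_width = min(widths, key=lambda w: score(w, start_count))
    # Step 2: vary the number of standards with the chosen width fixed.
    best_count = min(counts, key=lambda n: score(best_width, n))
    return best_width, best_count


widths = list(range(3, 20, 2))  # 3, 5, ..., 19
counts = list(range(11, 21))    # 11, 12, ..., 20


def toy_score(width, n_standard):
    # Stand-in for the average relative error; NOT the paper's data.
    return (width - 15) ** 2 + (n_standard - 14) ** 2


best = one_factor_search(toy_score, widths, counts, start_count=15)
print(best)
```

A one-factor-at-a-time search evaluates 9 + 10 settings instead of the 90 of a full grid, at the cost of possibly missing an optimum that depends on both parameters jointly.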

Using the WMPDS-S/B algorithm, the average relative error (errora), the maximum relative error (errorm), and RMSEP decreased markedly (Table 3). The errora dropped from 440.2% to 5.77%, the errorm was reduced from 538.1% to 16.67%, and the RMSEP was reduced from 2.416 to approximately 0.041 (Table 3).

We used the optimum number of standard samples to divide the Second Sample into the standard set and the unknown set by the Kennard–Stone algorithm; however, the result of this single division might have been due to chance (Table 3). To rule this out, we also divided the standard set and unknown set by interval sampling by proportion (Table 4) and by random sampling (Table 5). The two methods of sample division led to similar results of calibration transfer (Tables 4 and 5).

Different proportions between the standard set and unknown set in the Second Sample by the Kennard–Stone algorithm could bring different results of calibration transfer (Table 4). The results showed that, as the proportion between the standard set and unknown set changed from 1 : 2 to 1 : 4, the errora increased, the errorm decreased, and the RMSEP increased (Table 4). The proportion of 1 : 2 seemed to be the best, perhaps because the standard set was of a suitable size and contained enough information.

Random sampling did not have a large influence on the results of calibration transfer (Table 5). The values of errora, errorm, and RMSEP derived from random sampling were similar to those derived from interval sampling by the Kennard–Stone algorithm (Tables 4 and 5).
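The two alternative ways of dividing the Second Sample can be sketched in Python as below. The details are assumptions: interval sampling by a proportion of 1 : k is taken to mean that every (k + 1)th sample, in whatever order the samples are listed, goes to the standard set, and random sampling is done with a fixed seed for repeatability. The function names are hypothetical.

```python
import random


def interval_split(n_samples, unknown_per_standard):
    """Split indices so that standard : unknown is 1 : unknown_per_standard."""
    step = unknown_per_standard + 1
    standard = list(range(0, n_samples, step))
    unknown = [i for i in range(n_samples) if i % step != 0]
    return standard, unknown


def random_split(n_samples, n_standard, seed=0):
    """Draw n_standard indices at random; the rest form the unknown set."""
    rng = random.Random(seed)
    standard = sorted(rng.sample(range(n_samples), n_standard))
    chosen = set(standard)
    unknown = [i for i in range(n_samples) if i not in chosen]
    return standard, unknown


# Illustrative sizes only.
std_12, unk_12 = interval_split(12, 2)    # proportion 1 : 2
std_rand, unk_rand = random_split(12, 4)  # same set sizes, random members
print(std_12, unk_12)
print(std_rand, unk_rand)
```

Comparing the transfer results across such divisions is one way to check that a good result does not depend on a particular choice of standard samples.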

3.2. Calibration Transfer of TC Concentrations between Two Different Types of Soils

Using the same method, the PDS window width and the number of the standard set in the Second Sample were also found to affect the results of calibration transfer (Figure 5). When the window width was 11 and the number of the standard set was 18, the calibration transfer of the TC concentrations model was the best (Figure 5). Therefore, we chose these parameters for the subsequent calibration transfer of TC concentrations.

Under the best condition, the standard set and unknown set were divided by the Kennard–Stone algorithm, by proportion, and by random sampling, respectively (Table 6). The accuracy of prediction was significantly improved after the WMPDS-S/B algorithm was used for the calibration transfer (Table 6). The errora decreased from 341.97% to approximately 6%, the errorm decreased from 403.70% to <20%, and the RMSEP decreased from 15.737 to <0.46 (Table 6).

Among all the prediction results, the best was that from the 1 : 3 proportion between the standard set and unknown set (Table 6), whose errora, errorm, and RMSEP were the smallest. The results of the 1 : 2 proportion were relatively poor, possibly because the standard set was too large and some unrelated information was included in the calibration transfer.

Some division approaches could lead to spectral overcorrection and reduce the prediction effect of the calibration transfer. But in this study, when the number of standard set was fixed to 18, the values of errora, errorm, and RMSEP were similar across different classification methods (Table 6). This shows that different classification methods produce similar prediction results, suggesting the division approaches in this study were reasonable.

3.3. Comparison between the WMPDS-S/B Algorithm and Adding Samples

Under the condition of selecting the same samples of the standard set by the Kennard–Stone algorithm, we then studied the effects of the two methods on the results of calibration transfer in TN and TC concentrations (Tables 7 and 8). The number of standard samples was 10, 15, 20, 25, or 30. The window width of the WMPDS-S/B algorithm was 15 for TN concentrations and 11 for TC concentrations. The results of the First Sample include the correlation coefficient of the calibration set (rc), the correlation coefficient of the test set (rp), and the RPD. The results of the unknown set of the Second Sample include the errora, errorm, and RMSEP.
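For reference, the model-quality indices reported for the First Sample can be computed as in the Python sketch below. The paper does not spell out the definitions, so two common ones are assumed: the coefficient is the Pearson correlation between predicted and measured values, and RPD is the standard deviation of the measured values divided by the root mean square error of prediction. The function names and the numbers are hypothetical.

```python
import math
import statistics


def pearson_r(a, b):
    """Pearson correlation coefficient between two equally long sequences."""
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    sab = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    saa = sum((x - mean_a) ** 2 for x in a)
    sbb = sum((y - mean_b) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


def rpd(predicted, measured):
    """Sample standard deviation of the reference values divided by the RMSE."""
    mse = sum((p - m) ** 2 for p, m in zip(predicted, measured)) / len(measured)
    return statistics.stdev(measured) / math.sqrt(mse)


# Illustrative values only, not data from the paper.
measured = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 1.9, 3.2, 3.8, 5.0]
r_value = pearson_r(predicted, measured)
rpd_value = rpd(predicted, measured)
print(r_value, rpd_value)
```

A higher RPD means that the prediction error is small relative to the natural spread of the reference values.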

The prediction performance (the values of rc, rp, and RPD) of the original model did not change when WMPDS-S/B was used, for both the TN and the TC concentration models of the First Sample (Tables 7 and 8). By contrast, as the number of standard samples increased, the prediction performance of the model rebuilt by adding new samples was unstable (Tables 7 and 8), with high rc but low rp and RPD. Thus, WMPDS-S/B preserved the stability of the original model.

Of the two calibration transfer methods, the WMPDS-S/B algorithm performed better and the method of adding new samples performed worse. With added new samples, the errora, errorm, and RMSEP of TN and TC concentrations were all high. Therefore, this method was not suitable for calibration transfer between two different types of soils. The WMPDS-S/B algorithm was based on the S/B algorithm and combined with the WMPDS algorithm to eliminate spectral differences. It could improve the accuracy of nutrient concentration prediction between two different types of soil.

With the method of adding new samples, the errora, errorm, and RMSEP decreased as the number of standard samples from the Second Sample increased (Tables 7 and 8), showing that additional samples improved the prediction. The result was the best when the number of standard samples reached 30, the largest number tested for both TN and TC concentrations. With the WMPDS-S/B algorithm, by contrast, the best prediction was obtained at an intermediate number of standard samples, 15 for TN and 20 for TC concentrations, and the number of standard samples had a relatively small impact on the results (Tables 7 and 8). WMPDS-S/B could therefore achieve better results with as few standard samples as possible. In practical applications, selecting many standard samples is impractical because it increases the workload and is time-consuming. The new algorithm thus had the advantages of requiring fewer standard samples and obtaining better prediction results.

3.4. Differences from Previous Studies

The calibration transfer studied in this paper, which was between two different types of samples, differed from that in previous studies. Previous studies had addressed the change of the same sample across different instruments and testing environments, thus ensuring the consistency of the instruments [13–18]. Fernandez et al. used four different calibration transfer techniques (Direct Standardization, Piecewise Direct Standardization, Orthogonal Signal Correction, and Generalized Least Squares Weighting) to offset the effect of temperature change on the prediction of gas sensors [13]. Yahaya et al. compared the calibration transfer effect of the reflectance spectrum for mango pH among different spectrometers (QE65000, Jaz, and ASD FieldSpec 3) through direct calibration transfer [18]. These studies were all based on the calibration transfer of the same samples under different test conditions or different instruments.

At present, many studies have used spectroscopy to predict the nutrients of different types of soil. For example, Kuang et al. used near-infrared spectroscopy to detect the content of soil organic carbon and total nitrogen in soil samples collected on five farms. The RPD ranged from 2.66 to 3.39 for organic carbon (OC) and from 2.85 to 3.45 for total nitrogen (TN) [42]. The soil nutrient concentration model was established from many soil samples at different locations, and this method still needed a large amount of known information about the nutrient chemical values of the different types of soil, so it was not soil calibration transfer in the true sense.

In this study, we solved calibration transfer between two different types of samples with the same instrument and test environment, using the nutrient concentration model of one type of soil to predict the nutrient concentrations of another type of soil. If the concentration model of riverside soil was used to predict the concentrations of mountain soil directly, there was a large error (Tables 3 and 6). After calibration transfer, the concentrations of mountain soil were predicted more accurately. The error declined significantly, and the prediction results were acceptable. The calibration transfer that we studied in this work could fundamentally solve the problem of model sharing between different types of samples and eliminate the technical bottleneck restricting the application of the spectral method. This solution would make it possible for one spectral instrument to rapidly share a spectral model to determine soil nutrients.

3.5. Future Studies on Calibration Transfer with Different Samples

Wavelet analysis has multiscale and multiresolution characteristics. It can process signals and analyze spectral information from coarse to fine. The spectral data can be decomposed into two subspaces by multiscale wavelet processing, that is, low frequency and high frequency. Calibration transfer is carried out on the low-frequency and the high-frequency information separately, so different transfer matrices can be obtained for different frequency domains. The transfer matrices between different regions and different varieties are thus further refined. However, there are some problems in the WMPDS-S/B algorithm.

(1) The PDS window width and the number of standard samples still need to be selected manually. In the future, we can select the optimal PDS window and the number of samples automatically through algorithms.

(2) The entire band of the spectrum is used in this study. To reduce redundant information and improve prediction performance, we will extract the diagnostic spectral wavelengths in the future.

(3) In this study, the calibration transfer of soil nutrients relies on labeled samples, so it still requires the chemical values of a portion of the unknown samples. In the future, we intend to achieve calibration transfer among different regions or different varieties with fewer labeled samples or without labeled samples.

(4) The WMPDS-S/B algorithm is applied to the calibration transfer between two different types of soil and gives good results. In the future, this algorithm can be applied to calibration transfer among other types of soil samples.

4. Conclusions

This paper took mountain and riverside soils as examples to explore the calibration transfer of TC and TN concentrations between two different types of soils using WMPDS-S/B. To reduce the influence of other factors on calibration transfer, we collected spectral data with the same equipment and in the same test environment. The riverside soil was set as the First Sample, and the mountain soil was regarded as the Second Sample. First, a calibration model was established by the First Sample. Then, concentrations of total carbon (TC) and total nitrogen (TN) for the Second Sample were predicted by the transferred calibration model. In the calibration transfer of TN concentrations between two different types of soils, when the PDS window width was 15 and the number of standard samples in the Second Sample was 14, the prediction accuracy of TN concentrations improved significantly compared to the accuracy without calibration transfer. The average relative error (errora) dropped from 440.20% to approximately 6%, the maximum relative error (errorm) was reduced from 538.1% to less than 20%, and RMSEP was reduced from 2.416 to approximately 0.04. In the calibration transfer of TC concentrations between two different types of soils, when the PDS window width was 11 and the number of standard samples was 18, the prediction accuracy of TC concentrations in the Second Sample showed significant improvement compared to the accuracy without calibration transfer. The errora dropped from 341.97% to approximately 7%, the errorm was reduced from 403.70% to less than 20%, and the RMSEP was reduced from 15.737 to approximately 0.4.

Under the condition of selecting the same samples of the standard set, we studied the effects of the two methods on the results of calibration transfer in TN and TC concentrations. Compared to the traditional method of adding new samples, the WMPDS-S/B algorithm obtained better prediction results. With the method of adding new samples, errora, errorm, and RMSEP decreased as the number of standard samples in the Second Sample increased. However, the number of standard samples had a relatively small impact on the results from the WMPDS-S/B algorithm. This method had the advantages of requiring fewer standard samples and obtaining better prediction results. It was feasible to realize the calibration transfer of soil nutrients between two different types of soils rapidly and accurately, providing new ideas and methods for calibration transfer among different types of samples.

Data Availability

All data in the paper are fully available without restriction at https://figshare.com/s/6c301c6479dc3f1a554c, or from the corresponding author upon request.

Conflicts of Interest

The authors report there are no conflicts of interest.

Acknowledgments

This work was supported by the National Key Research and Development Program of China (2017YFC1404802), Shandong Provincial Natural Science Foundation, China (ZR2018LD007), the National Key Research and Development Program of China (no. SQ2016YFSF070090), and the National Key Research and Development Program of China (no. SQ 2017YFC1403702).