The rapid and accurate detection of moisture content is of great significance to the quality evaluation and oil extraction process of walnut kernel. Near-infrared (NIR) spectroscopy is well suited to measuring the moisture content in walnut kernel. In this study, a regression model for moisture content in walnut kernel was developed based on NIR diffuse reflectance spectroscopy using chemometric methods. Several spectral pretreatment methods were applied to the original spectral data. The whole spectral band was divided into 5, 10, 15, and 20 subbands to screen specific wavelengths relevant to the walnut kernel moisture content. PLS (partial least squares regression), MLR (multiple linear regression), PCR (principal component regression), and SVR (support vector regression) were used to model the relationship between the spectral data and the measured moisture content. By comparison, the optimized modeling conditions were determined as follows: detection wavelength 1349–1490 nm, SNV-FD (standard normal variate transformation and first derivative) preprocessing method, and PLS algorithm. Under these conditions, the squared correlation coefficient (R2) and root mean square error of prediction (RMSEP) of the prediction model were 0.9865 and 0.0017, respectively. The results of this study provide a feasible method for the rapid detection of moisture content in walnut kernel. To improve the performance and applicability of the model, the sample set needs to be expanded continuously.

1. Introduction

The walnut is one of the most important specialty and woody oil crops and has both ecological and economic value. The walnut kernel is not only rich in nutrients (e.g., oil, proteins, carbohydrates, and minerals) but also contains minor components (e.g., phenols, tocopherols, and phytosterols) with high antioxidant capacity [1, 2]. China is a major consumer, producer, and exporter of walnuts, with a production of over 1.5 million tons in 2018 (FAO Statistics). Because of limitations in technologies such as shelling, drying, and peeling, harvested walnuts cannot always be processed promptly [3–5]. Consequently, the moisture content in walnut kernel becomes an important reference index for evaluating walnut kernel quality and for choosing storage conditions and processing methods [6–8]. On the one hand, proper moisture can increase the oil extraction rate. The moisture content of walnut kernel should be less than 5.0% according to LS/T 3121–2019 Walnut for Oil. On the other hand, the moisture content determines, to a certain extent, the economic benefits for growers and the walnut business. Walnut kernels with low moisture content are lighter and fetch a lower price, while high moisture content accelerates oil oxidation and mildew, resulting in the deterioration of walnut quality [9–11]. Therefore, it is crucial to accurately detect the moisture content in walnut kernel.

The classical detection method for the moisture content is the drying method. The moisture in the walnut kernel can be divided into free water and equilibrium water according to whether it can be removed by the drying method. The moisture that can be removed by the drying method is free water, and the moisture studied in this work is free water. Because the drying method is time-consuming, destructive, and complicated, it is important to find a rapid, nondestructive, and “online” alternative. Near-infrared diffuse reflectance spectroscopy is a powerful tool for evaluating quality parameters and has been widely used in food, agriculture, the chemical industry, biomedicine, and other fields [12–14]. With the recent development of spectroscopy technology, a growing number of NIR spectroscopy methods have been adopted as national standards. The application of NIR spectroscopy in crop quality analysis has become increasingly popular [15–17]. A number of studies have shown that near-infrared spectroscopy (NIRS) can be used to predict the moisture content of grains and oil seeds [18–20]. For example, NIRS has been successfully used for the rapid analysis of moisture content in wheat flour samples [21–24]. López et al. used near-infrared spectroscopy to detect the moisture content in potato products [25, 26]. NIR diffuse reflectance spectroscopy has also allowed rapid and online analysis of the moisture content in cocoa bean samples [27, 28]. Moreover, Lakshmanan suggested that NIRS could be applied at mill reception to monitor the moisture and oil content of copra, providing information on its quality and composition that is useful in the copra oil extraction process [29].
When chemometric methods were used together with NIR, a quantitative model of chestnut moisture content could be readily established, with a root mean square error of cross-validation (RMSECV) of 0.05 and a coefficient of determination (R2) of 0.9 [30]. Zhang and Dai used NIR spectroscopy to establish a quantitative model of moisture content in corn seed and concluded that the model for a single variety of corn seed performed better than that for multiple varieties [31, 32]. However, few studies have addressed the detection of moisture content in walnut kernel using NIR spectroscopy. The main purpose of this study is therefore to build a rapid quantitative method for moisture content measurement in walnut kernel using NIR spectroscopy, which is expected to provide technical support for the rapid detection and control of walnut kernel quality.

2. Experiment

2.1. Preparation of Walnut Kernel Samples

The walnut kernel samples used in the experiment were collected from Aksu, Xinjiang, China. The walnut variety was Wen 185. To make the walnut kernel samples more representative, the method of moisture absorption in a closed container was adopted. Following Lu’s experimental method [27], the shells were removed, and the walnut kernels were crushed by a high-speed universal pulverizer and passed through a 10-mesh sieve. The sieved samples were used in the experiment. A certain amount of walnut kernels was placed in a closed container with water at the bottom, and the container was then stored in a constant temperature incubator at 20°C so that the water was absorbed evenly. A total of 136 walnut kernel samples with different moisture contents were prepared in this way.

2.2. Chemical Analysis

According to the Chinese national standard GB/T 14489.1–2008, the moisture content in walnut kernel was measured by the drying method. The moisture content of each sample was measured immediately after the spectral data were collected. Each sample was measured twice, and the average value was taken as the reference value, as given in Table 1. The moisture content ranged from 1.20% to 9.92%, with an average value of 5.55% and a standard deviation of 0.27. The error margin of the drying method used for the samples was less than 8%. The 136 walnut kernel samples were randomly divided into a calibration set and a prediction set at a ratio of 3 : 1, with 102 samples in the calibration set and the remaining samples in the prediction set.
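The random 3 : 1 division described above can be sketched as follows. This is a minimal illustration only: the function name `split_calibration_prediction` and the fixed random seed are assumptions for reproducibility and are not taken from the study.

```python
import numpy as np

def split_calibration_prediction(n_samples, ratio=3, seed=0):
    """Randomly split sample indices into calibration and prediction sets at ratio : 1."""
    rng = np.random.default_rng(seed)          # seed is an assumption, for reproducibility only
    order = rng.permutation(n_samples)         # random ordering of the sample indices
    n_cal = n_samples * ratio // (ratio + 1)   # 136 * 3 // 4 = 102 calibration samples
    return np.sort(order[:n_cal]), np.sort(order[n_cal:])

cal_idx, pred_idx = split_calibration_prediction(136)
print(len(cal_idx), len(pred_idx))  # 102 34
```

A purely random split is only one option; splits that spread the moisture range evenly across both sets are also common in calibration work.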

2.3. NIR Spectra Acquisition

NIR spectra of walnut kernel samples were recorded using a FOSS NIRS DS2500 spectrometer (FOSS, Denmark) equipped with a tungsten halogen lamp. The detectors were silicon (780–1100 nm) and lead sulfide (1100–2500 nm). Each NIR spectrum was recorded in the wavelength range 780–2500 nm (interval 2 nm, 860 data points per sample). The operating conditions were as follows: operating temperature 35°C and 32 scans. The walnut kernel sample layer was 20 mm thick. All spectra were acquired in triplicate, and the mean value was used in subsequent calculations.
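Averaging the triplicate scans into one spectrum per sample can be sketched as below. The wavelength axis is an assumption built from the stated start (780 nm), interval (2 nm), and point count (860); the scans themselves are synthetic stand-ins, and `mean_spectrum` is a hypothetical helper name.

```python
import numpy as np

# Assumed wavelength axis: 860 points from 780 nm upward in 2 nm steps.
wavelengths = 780 + 2 * np.arange(860)

def mean_spectrum(replicates):
    """Average replicate scans (n_replicates x n_wavelengths) into one spectrum."""
    replicates = np.asarray(replicates, dtype=float)
    if replicates.ndim != 2:
        raise ValueError("expected a 2-D array of replicate scans")
    return replicates.mean(axis=0)

# Synthetic stand-in for three scans of one sample (not measured data).
rng = np.random.default_rng(0)
scans = 0.5 + 0.01 * rng.standard_normal((3, wavelengths.size))
spectrum = mean_spectrum(scans)
print(spectrum.shape)  # (860,)
```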

2.4. Statistical Analysis

The spectral data were preprocessed by standard normal variate transformation (SNV), multiplicative scatter correction (MSC), first derivative (FD), orthogonal signal correction (OSC), detrend, and normalization (normalize) to reduce the influences of background noise, baseline drift, and spectral scattering [33]. To model the moisture content in walnut kernel, multiple linear regression (MLR), principal component regression (PCR), partial least squares regression (PLS), and support vector regression (SVR) were applied to relate the spectral data to the measured values [34]. The performance of the multivariate models was evaluated by the coefficient of determination (R2), the root mean square error of prediction (RMSEP), the root mean square error of calibration (RMSEC), and the residual predictive deviation (RPD). All modeling computations were implemented using the Unscrambler X10.4 and Matlab v2007a.
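Three of the pretreatments named above (SNV, MSC, and FD) can be sketched in a few lines. This is an illustrative version under stated assumptions, not the Unscrambler implementation used in the study: the derivative is a plain finite difference (a smoothing derivative filter may have been used instead), SNV uses the sample standard deviation, and MSC takes the mean spectrum as its reference.

```python
import numpy as np

def snv(X):
    """Standard normal variate: centre and scale each spectrum (row) on its own."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, ddof=1, keepdims=True)

def msc(X, reference=None):
    """Multiplicative scatter correction against a reference (default: mean spectrum)."""
    X = np.asarray(X, dtype=float)
    ref = X.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    corrected = np.empty_like(X)
    for i, row in enumerate(X):
        slope, intercept = np.polyfit(ref, row, 1)   # row ~ intercept + slope * ref
        corrected[i] = (row - intercept) / slope
    return corrected

def first_derivative(X, step=2.0):
    """First derivative along the wavelength axis by finite differences (assumed 2 nm step)."""
    return np.gradient(np.asarray(X, dtype=float), step, axis=1)

def snv_fd(X, step=2.0):
    """Combined pretreatment: SNV followed by the first derivative."""
    return first_derivative(snv(X), step)

# Synthetic check: one base spectrum distorted by gain and offset (scatter-like effects).
base = np.sin(np.linspace(0, 3, 860)) + 2.0
X = np.vstack([1.0 * base + 0.1, 1.3 * base - 0.2, 0.8 * base + 0.4])
print(np.allclose(snv(X)[0], snv(X)[1]))   # True: SNV removes gain and offset
print(np.allclose(msc(X), X.mean(axis=0))) # True: MSC maps every row onto the reference
```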

3. Results and Discussion

3.1. Characteristics Analysis of Samples Spectral Data

The raw NIR spectra (780–2500 nm) of walnut kernel samples are plotted in Figure 1. It can be seen from Figure 1 that the basic variation trends of the spectral curves are similar, and no obvious abnormal points are found. All spectral curves show absorption at the wavelengths of 1210 nm, 1450 nm, 1725 nm, 1890 nm, 2000 nm, 2328 nm, and 2350 nm, and the differences among these near-infrared absorption peaks are mainly due to differences in the internal component contents of the samples. Among these, the spectra present dominant absorption peaks at 1450 nm and 1940 nm, which are assigned to the first overtone and the combination band of the O-H stretching vibration. These peaks are closely related to the moisture content in walnut kernel [35]. Although the near-infrared spectra of different walnut kernel samples are clearly different, spectral overlaps and background interference still exist. Modeling directly from the original spectra would therefore affect the robustness and accuracy of the model. Consequently, the modeling conditions need to be optimized, and the spectra need to be preprocessed.

3.2. Optimization of Modeling Conditions
3.2.1. Screening of Pretreatment Methods

The original spectral signal collected by the instrument contains not only information related to the chemical composition of the sample but also irrelevant interference such as baseline drift, sample physical properties, background, and noise. These interferences directly affect the accuracy of the final analysis result. To improve the prediction precision of the moisture content model, normalize, FD, SNV, detrend, MSC, and OSC were used to preprocess the original spectral data. The results are shown in Figure 2. The spectra preprocessed with different algorithms clearly differ from one another. Among them, the profiles of the spectra after normalize, MSC, and SNV are not significantly different from those of the original spectra. However, the dispersion of the spectra is significantly reduced, which highlights the useful information in some bands. In contrast, FD, detrend, and OSC cause significant changes in spectral morphology. The FD algorithm can eliminate the interference caused by the background and baseline drift, but it can also amplify noise. The detrend algorithm removes the influence of trends, while the OSC algorithm can filter out part of the noise in the original spectrum and retain the main information. Although the different pretreatment methods can all remove information irrelevant to the analyte, different measurement systems suffer from different interference factors. Therefore, it is necessary to choose an appropriate pretreatment method in the modeling phase. The PLS-based results are given in Table 2.

It can be seen from Table 2 that the pretreatment methods have a certain impact on the performance of the calibration model. Except for the OSC method, most methods improve the prediction precision of the built models. Among the models with a single pretreatment method, the model with the normalize algorithm has the best prediction precision. Its RMSEP is 0.0024, an improvement of 33.3% over the model built on the original spectra. This is because the normalize algorithm can effectively reduce the interference of invalid variable information and highlight the valid information related to moisture in the band. The OSC algorithm can only filter out the part of the noise that is orthogonal to the analyte, but the noise in the spectral data is not strictly orthogonal to the analyte. The residual noise may therefore cause overfitting and further affect the stability of the model. Additionally, both the MSC algorithm and the SNV algorithm can reduce baseline drift and spectral scattering, and they give similar improvements in prediction accuracy. The FD algorithm can also eliminate the baseline drift and background interference of the spectra, but it amplifies some of the noise, leading to the loss of part of the effective information and the deterioration of prediction accuracy. When two of the five algorithms were combined as a new pretreatment algorithm, SNV-FD was found to be the best pretreatment method. The R2 values in calibration and prediction are both greater than 0.98. The RPD value is 7.40, the highest among the methods compared. Moreover, the RMSEP is 0.0020, an improvement of 44.4% over the model built on the original spectra and of 16.7% over the model with the best single pretreatment algorithm. The optimized prediction model of walnut kernel moisture content was therefore established using the SNV algorithm followed by the first derivative algorithm.
Through analysis, it was also found that the combination of more than two pretreatment algorithms was not necessary for further performance improvement.
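The evaluation metrics and the percentage improvements quoted above can be checked with a short sketch. The R2 and RPD definitions below are common conventions and are assumptions here, since the exact formulas used by the software are not stated. The raw-spectra RMSEP of 0.0036 is inferred from the stated 33.3% and 44.4% improvements; it is not quoted from Table 2.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error (RMSEC, RMSECV, or RMSEP depending on the data used)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_residual / SS_total (assumed definition)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def rpd(y_true, y_pred):
    """Residual predictive deviation: SD of reference values over RMSE (assumed definition)."""
    return float(np.std(np.asarray(y_true, float), ddof=1) / rmse(y_true, y_pred))

def relative_improvement(baseline, improved):
    """Percentage reduction of an error measure relative to a baseline."""
    return 100.0 * (baseline - improved) / baseline

raw_rmsep = 0.0036  # inferred from the stated percentages, not taken from Table 2
print(round(relative_improvement(raw_rmsep, 0.0024), 1))  # 33.3 (normalize vs raw)
print(round(relative_improvement(raw_rmsep, 0.0020), 1))  # 44.4 (SNV-FD vs raw)
print(round(relative_improvement(0.0024, 0.0020), 1))     # 16.7 (SNV-FD vs normalize)
```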

3.2.2. Selection of Characteristic Band

The NIR spectrum of walnut kernel contains a large number of wavelengths irrelevant to the moisture content, which can interfere strongly with moisture content measurement. Selecting specific wavelengths relevant to the analyte can significantly reduce the number of input variables. This brings two main benefits. On the one hand, band selection can avoid instrument hardware interference, because interference signals such as noise lower the signal-to-noise ratio of some bands and degrade the quality of the collected spectrum. On the other hand, a smaller number of variables helps to extract useful information and eliminate uninformative variables [36]. Here, the whole spectral band is divided into 5, 10, 15, and 20 subbands, respectively. Coupled with the SNV-FD algorithm, the PLS models of moisture content are established as shown in Figure 3. It can be seen that the division into ten subbands gave the best prediction precision, with RMSECV ranging from 0.0023 to 0.0035. This is probably because a wider spectral region introduces more interference, while a narrower spectral region may lose analyte information and reduce the prediction accuracy of the model. Among these ten subbands, the model based on subband 5 (1349–1490 nm) gives the best prediction precision, mainly because of the strong O-H absorption peak of water (near 1450 nm). The values of R2 and RMSECV are 0.9845 and 0.0023, respectively, meaning that the prediction precision is improved by 32% compared with that of the model established with the whole band. Therefore, subband 5 is chosen as the modeling band for moisture content measurement.
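The subband screening idea can be sketched as follows, under clearly stated assumptions. The partition here splits the assumed 860-point axis into contiguous blocks of near-equal point count; the exact partition used in the study is not specified, so the band edges and subband numbers this sketch yields are not expected to coincide with those reported above. The scoring function `mean_band_error` is a deliberately simple stand-in (a straight-line fit on the band's mean signal), not the cross-validated PLS used in the study, and the data are synthetic.

```python
import numpy as np

def split_subbands(wavelengths, n_subbands):
    """Split the wavelength axis into contiguous subbands of near-equal point count."""
    return np.array_split(np.arange(len(wavelengths)), n_subbands)

def subband_containing(wavelengths, subbands, target_nm):
    """Return the 1-based number of the subband covering target_nm, or None."""
    for number, idx in enumerate(subbands, start=1):
        if wavelengths[idx[0]] <= target_nm <= wavelengths[idx[-1]]:
            return number
    return None

def mean_band_error(X_band, y):
    """Stand-in score (assumption): RMSE of a straight-line fit of y on the band mean."""
    m = X_band.mean(axis=1)
    slope, intercept = np.polyfit(m, y, 1)
    return float(np.sqrt(np.mean((y - (slope * m + intercept)) ** 2)))

def rank_subbands(X, y, subbands, score_fn):
    """Score every subband; return the best 1-based subband number and all scores."""
    scores = [score_fn(X[:, idx], y) for idx in subbands]
    return int(np.argmin(scores)) + 1, scores

# Synthetic data: only the 4th of 10 subbands carries the moisture signal.
wavelengths = 780 + 2 * np.arange(860)
bands = split_subbands(wavelengths, 10)
rng = np.random.default_rng(0)
y = rng.uniform(0.012, 0.0992, 40)
X = 0.001 * rng.standard_normal((40, wavelengths.size))
X[:, bands[3]] += y[:, None]
best, scores = rank_subbands(X, y, bands, mean_band_error)
print(best)
```

In practice `score_fn` would be replaced by a cross-validated error of the chosen regression model on each subband.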

3.2.3. Comparison of Modeling Methods

General modeling methods include linear calibration methods (PLS, MLR, PCR, and so on) and nonlinear calibration methods (SVR, artificial neural networks, and so on) in near-infrared spectroscopy analysis [34]. In this study, four representative methods including MLR, PLS, PCR, and SVR were applied to develop the appropriate regression model for walnut kernel moisture content measurement. The corresponding performances are shown in Figure 4.

It can be seen from Figure 4 that there are significant differences in the prediction performance of the four modeling methods. The prediction results of linear modeling (PLS, PCR, and MLR) are significantly better than those of nonlinear modeling (SVR). This is mainly because there is an inherently good linear relationship between the moisture content and the spectral information. The SVR algorithm, being a nonlinear modeling method, did not capture the information related to the moisture content as effectively in this case. Among the linear modeling methods, the PLS model has the best performance, with R2 equal to 0.9865 and RMSEP equal to 0.0017, showing that the prediction precision is improved by 60.5% compared with that of the SVR model. The reason may be that the PLS algorithm introduces the concentration information of the measured component into the decomposition of the spectral matrix, and the scores of the spectral matrix and the concentration matrix are exchanged before each principal component is calculated. Consequently, the principal components of the spectral matrix are associated with the concentration of the measured component. By contrast, PCR can eliminate useless noise information, but the influence of the measured component concentration is not considered during the decomposition of the spectral matrix [37]. The MLR method gives good results only when the input variables are independent of each other, and it handles multicollinearity between variables poorly. In practice, this independence requirement cannot be met in complex spectra. Therefore, the PLS method is chosen for moisture content model construction.
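How PLS brings the concentration information into the decomposition can be illustrated with a minimal single-response PLS (PLS1) written in the NIPALS style. This is a sketch under assumptions, not the Unscrambler or Matlab routine used in the study: the function names are hypothetical, the number of components is fixed by hand, and the data are synthetic collinear "spectra" generated from three latent factors.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Fit PLS1: each weight vector is the covariance direction between X and y."""
    X, y = np.asarray(X, float), np.asarray(y, float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk                  # y enters the decomposition here
        w /= np.linalg.norm(w)
        t = Xk @ w                     # scores of the current component
        tt = t @ t
        p = Xk.T @ t / tt              # spectral loadings
        c = (yk @ t) / tt              # regression of y on the scores
        Xk = Xk - np.outer(t, p)       # deflate spectra
        yk = yk - c * t                # deflate response
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)   # regression vector in the original space
    return coef, y_mean - x_mean @ coef

def pls1_predict(X, coef, intercept):
    """Predict the response from spectra with a fitted PLS1 model."""
    return np.asarray(X, float) @ coef + intercept

# Synthetic, strongly collinear data: 50 "wavelengths" driven by 3 latent factors.
rng = np.random.default_rng(0)
T = rng.standard_normal((60, 3))
L = rng.standard_normal((3, 50))
X = T @ L + 0.01 * rng.standard_normal((60, 50))
y = 0.05 + 0.02 * T[:, 0] + 0.0005 * rng.standard_normal(60)

coef, intercept = pls1_fit(X[:45], y[:45], n_components=3)
pred = pls1_predict(X[45:], coef, intercept)
rmsep = float(np.sqrt(np.mean((y[45:] - pred) ** 2)))
print(rmsep)
```

PCR would differ only in how the directions are chosen: it would take them from the spectra alone, without the `Xk.T @ yk` step.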

3.3. External Validation of the Quantitative NIRS Models

To validate the stability and accuracy of the model, the PLS, MLR, PCR, and SVR calibration models were used to predict the moisture content of 30 unknown samples, as shown in Figure 5. The measurements for Figure 5 were performed double-blind.

Figure 5 shows that the performance of the PLS model is excellent. Its prediction precision is improved by 56.4% compared with that of the SVR model. This is mainly because the PLS algorithm considers the concentration matrix and the spectral matrix simultaneously, which reduces the effect of multicollinearity between spectral variables. Moreover, a paired t-test was applied to judge whether there was a significant difference between the near-infrared spectroscopy method and the national standard method. The t-test gave a value of 0.704, indicating no significant difference between the two methods (criterion α = 0.05; null hypothesis of no difference between the true and predicted values). The results obtained in the present work are comparable with the accuracy of the online equations for the moisture content in coco-peat reported by Lu and coworkers (R2 = 0.99; RMSEP = 0.014) [27]. The current research also shows a considerably lower RMSEP value (0.0017) than that recorded by Salguero-Chaparro and coworkers (0.016) in the prediction of the moisture content in olive fruits [38]. In addition, the R2 and RMSEP values of the prediction model in this work are better than those observed by Hu and coworkers (R2 = 0.90; RMSEP = 0.05) for chestnut moisture content [30]. Therefore, it can be concluded that it is feasible to use near-infrared diffuse reflectance spectroscopy to measure the moisture content in walnut kernel. However, walnut kernels show different moisture contents owing to different varieties, origins, and cultivation management, as well as changes with storage conditions and time after harvest, which limit the applicability of this model. Thus, to improve the adaptability of this model, the sample set needs to be updated with new samples as they become available. This work and the corresponding dataset are in progress in our laboratory.
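The paired comparison between the NIR predictions and the drying-method reference values can be sketched as below. The helper `paired_t_statistic` is a hypothetical name, the data are synthetic, and the critical value 2.045 is the standard two-sided t value for α = 0.05 with 29 degrees of freedom (30 pairs). The sketch does not reproduce the 0.704 reported above, since the underlying values are not given here.

```python
import numpy as np

def paired_t_statistic(reference, predicted):
    """Paired t statistic for the mean difference (predicted - reference) and its df."""
    d = np.asarray(predicted, float) - np.asarray(reference, float)
    n = d.size
    t = d.mean() / (d.std(ddof=1) / np.sqrt(n))   # mean difference over its standard error
    return float(t), n - 1

# Synthetic stand-in for 30 validation samples (not the study's data).
rng = np.random.default_rng(0)
reference = rng.uniform(0.012, 0.0992, 30)
predicted = reference + 0.0017 * rng.standard_normal(30)

t_value, df = paired_t_statistic(reference, predicted)
t_critical = 2.045  # two-sided, alpha = 0.05, df = 29
print(df, abs(t_value) < t_critical)  # second value True means no significant difference
```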

4. Conclusions

In this work, a quantitative model for moisture content measurement in walnut kernel was developed based on NIR diffuse reflectance spectroscopy using chemometric methods. The experimental results showed that suitable spectral pretreatment algorithms and band selection can significantly improve the prediction precision of the moisture content model. The optimized prediction model for walnut kernel moisture content can be established under the following conditions: pretreatment method SNV-FD, band 1349–1490 nm, and PLS method. The values of R2 and RMSEP of this model were 0.9865 and 0.0017, respectively, showing that this model can accurately predict the moisture content in walnut kernel. The results also indicated that this method meets the requirements of routine food control and can be used to measure the moisture content of walnut kernel in practice. However, the moisture content of walnut kernel is affected by several factors such as variety, origin, cultivation management, and storage conditions. Therefore, more effort is needed to expand the sample set to improve the performance and versatility of the model.

Data Availability

The data used to support the findings of this study are deposited in the Dryad repository (https://datadryad.org/stash/share/Gv_u3jnm9QF0YB1xbZTC-krtuiHP-X2ouqfmvUPMUe0).

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Authors’ Contributions

Dan Peng designed the study and interpreted the results. YaLi Liu and JiaSheng Yang collected the samples and acquired the associated spectra. Dan Peng, JingNan Chen, and Yanlan Bi wrote the programs in Matlab and tested the data. Yanlan Bi helped draft the manuscript.


Acknowledgments

This work was supported by the Key Scientific and Technological Project of Henan Province (212102110341) and the National Natural Science Foundation of China (31601537).