Special Issue: Spectroscopy in Materials Chemistry
Analysis of the Oil Content of Rapeseed Using Artificial Neural Networks Based on Near Infrared Spectral Data
The oil content of rapeseed is a crucial property in practical applications. In this paper, instead of traditional analytical approaches, an artificial neural network (ANN) method was used to analyze the oil content of 29 rapeseed samples based on near infrared spectral data at different wavelengths. Results show that a multilayer feed-forward neural network with 8 nodes (MLFN-8) is the most suitable model, with an RMS error of 0.59. This study indicates that a nonlinear method is a quick and easy way to analyze the oil content of rapeseed from near infrared spectral data.
1. Introduction
Infrared absorption spectroscopy is a common approach for analyzing food composition [1–3]. For a given characteristic absorption frequency, Lambert’s law provides the following equation [4, 5]:

$$\log_{10}\frac{I_0}{I} = k\,l\,c \tag{1}$$

where $I_0$ represents the incident light intensity, $I$ represents the transmitted light intensity, $k$ represents the attenuation coefficient, $l$ represents the distance the light travels through the material, and $c$ represents the concentration.
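As a worked illustration of the relation above, the following minimal Python sketch assumes the decadic form of the law, log10(I0/I) = k·l·c, and solves it for the concentration. The function names and all numeric values are hypothetical and are not taken from the paper.

```python
import math


def absorbance(i0: float, i: float) -> float:
    """Decadic absorbance log10(I0 / I) from incident (i0) and transmitted (i) intensity."""
    if i0 <= 0 or i <= 0:
        raise ValueError("intensities must be positive")
    return math.log10(i0 / i)


def concentration(i0: float, i: float, k: float, path_length: float) -> float:
    """Solve log10(I0 / I) = k * l * c for c (assumed decadic form of the law)."""
    return absorbance(i0, i) / (k * path_length)


# Hypothetical numbers, for illustration only: 10% of the light is transmitted.
a = absorbance(100.0, 10.0)
c = concentration(100.0, 10.0, k=2.0, path_length=0.5)
print(a, c)
```

The sketch only applies when the sample transmits a measurable fraction of the light, which is the restriction to transparent samples that motivates the nonlinear approach used in this study.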
Equation (1) is widely used for determining food composition. However, because the infrared absorption spectrum spans many wavelengths and infrared light penetrates only a short distance into a sample, infrared absorption spectroscopy can only be used to analyze transparent liquids. It is therefore very difficult to analyze the oil content of rapeseed directly by infrared absorption spectroscopy. To solve this problem, this study instead uses a nonlinear approach to analyze near infrared spectral data to determine the oil content of rapeseed.
2. Artificial Neural Networks
2.1. Fundamentals of ANN Models
An artificial neural network (ANN) model is composed of an interconnected group of artificial neurons. In most circumstances, an ANN is an adaptive system that continuously adapts to new data and learns from accumulated experience, even when the data are noisy [6, 7]. In addition, the structure of the system can change based on external or internal information that flows through the network during the learning phase. An ANN can thereby extract essential information from data and model complex relationships between inputs and outputs [8–10].
As can be seen from Figure 1, the main structure of an artificial neural network (ANN) consists of an input layer, a hidden layer, and an output layer. The input layer introduces the input variables to the network. The output layer provides the predictions of the response variables, which are the outputs of the nodes in that layer. The optimal number of neurons in the hidden layer is usually determined iteratively, depending on the type and complexity of the process or experiment.
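The layered structure described above can be sketched as a single forward pass in plain Python. This is a minimal illustration, not the software used in the study: the tanh activation, the single hidden layer, the random (untrained) weights, and the sample readings are all assumptions. The sizes (five inputs, eight hidden nodes) mirror the MLFN-8 configuration discussed later, assuming that "8 nodes" refers to hidden-layer nodes.

```python
import math
import random


def mlfn_forward(x, w_hidden, b_hidden, w_out, b_out):
    """One forward pass: inputs -> tanh hidden layer -> single linear output."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(weights, x)) + bias)
        for weights, bias in zip(w_hidden, b_hidden)
    ]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


rng = random.Random(0)      # fixed seed so the sketch is reproducible
n_inputs, n_hidden = 5, 8   # five absorbances in, eight hidden nodes (assumed layout)

# Untrained, randomly initialised parameters (hypothetical values).
w_hidden = [[rng.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_hidden)]
b_hidden = [0.0] * n_hidden
w_out = [rng.uniform(-1, 1) for _ in range(n_hidden)]
b_out = 0.0

sample = [0.42, 0.55, 0.61, 0.48, 0.50]  # made-up absorbance readings
prediction = mlfn_forward(sample, w_hidden, b_hidden, w_out, b_out)
print(prediction)
```

Training would adjust `w_hidden`, `b_hidden`, `w_out`, and `b_out` so that the output matches the measured oil content; that step is omitted here.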
2.2. Model Development
Gu and Wang carried out a series of measurements with a precision instrument, from which near infrared spectral data for rapeseed were obtained by analyzing the absorbance at different wavelengths. The oil content is expressed as the percentage composition (%) of oil in rapeseed. Data for the 29 rapeseed samples are shown in Table 1.
To identify the most suitable and robust model for analyzing the oil content of rapeseed, 21 models were established: a linear prediction model, a general regression neural network (GRNN), and multilayer feed-forward neural networks (MLFN) [15, 16]. The number of nodes in the MLFN models was varied from 2 to 20 so that the most robust MLFN model could be found. The independent variables are the absorbances at wavelengths of 1.68 μm (reference wavelength), 1.73 μm (characteristic absorption wavelength of fat), 1.94 μm (characteristic absorption wavelength of water), 2.10 μm (characteristic absorption wavelength of starch), and 2.18 μm (characteristic absorption wavelength of protein), while the dependent variable is the percentage composition of oil in rapeseed. The training set consists of 24 samples, and the remaining 5 samples form the testing set. To ensure the reliability of the results, the training process was repeated, with a different composition of training and testing samples in each experiment. The results of the 21 models, obtained with the relevant software, are shown in Table 2.
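The experimental setup described above can be sketched in a few lines of Python. The sketch enumerates the 21 candidate models and draws repeated random 24/5 partitions of the 29 samples. The seed, the number of repeats, and the helper name `random_split` are hypothetical; the text does not state how many repeats were run or which software performed the split.

```python
import random

N_SAMPLES, N_TRAIN = 29, 24  # sample counts stated in the text

# Candidate models: linear, GRNN, and MLFN with 2 to 20 nodes.
model_names = ["Linear", "GRNN"] + [f"MLFN-{n}" for n in range(2, 21)]


def random_split(n_samples, n_train, rng):
    """Return (train_idx, test_idx) for one random partition of the sample indices."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    return sorted(indices[:n_train]), sorted(indices[n_train:])


rng = random.Random(42)  # hypothetical seed
n_repeats = 10           # hypothetical; the number of repeats is not given in the text
splits = [random_split(N_SAMPLES, N_TRAIN, rng) for _ in range(n_repeats)]

print(len(model_names))  # number of candidate models
for train_idx, test_idx in splits[:3]:
    print(len(train_idx), len(test_idx), test_idx)
```

Each candidate model would then be trained on `train_idx` and scored on `test_idx` for every split, with the scores averaged across repeats.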
The results in Table 2 show that the lowest testing RMS error, 0.59, is obtained by the MLFN model with 8 nodes (MLFN-8); it is lower than the errors of the linear prediction model and the GRNN model. In addition, 100% of the testing predictions fall within the permitted error. The MLFN-8 model is therefore an accurate and robust model.
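The two criteria used here, the RMS error and the share of predictions within a permitted error, can be computed as in the following minimal Python sketch. The data values and the tolerance are made up for illustration; the paper does not state the numerical tolerance, and the helper names are hypothetical.

```python
import math


def rms_error(actual, predicted):
    """Root-mean-square error between measured and predicted values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def fraction_within_tolerance(actual, predicted, tolerance):
    """Share of predictions whose absolute error does not exceed the tolerance."""
    hits = sum(1 for a, p in zip(actual, predicted) if abs(a - p) <= tolerance)
    return hits / len(actual)


# Made-up oil contents (%) for a five-sample testing set.
actual = [40.0, 42.0, 38.0, 41.0, 39.0]
predicted = [40.5, 41.5, 38.5, 41.5, 38.5]

rms = rms_error(actual, predicted)
accuracy = fraction_within_tolerance(actual, predicted, tolerance=1.0)  # hypothetical tolerance
print(rms, accuracy)
```

A model with a low RMS error can still miss the tolerance on individual samples, so reporting both values gives a fuller picture than either one alone.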
3. Results and Discussion
3.1. Training Results of MLFN-8
The training and testing results of the MLFN-8 model were extracted from the experiments. To present them more intuitively, six plots of the data are shown in Figures 2 to 7.
For the training process, Figure 2 compares the predicted values with the actual values. The close agreement between predicted and actual values implies that the training process is precise.
Figure 3 depicts the relationship between residual values and actual values during the training process, showing that the residual values are relatively concentrated.
Figure 4, in contrast, depicts the relationship between residual values and predicted values during the training process. The residual values show the same concentrated pattern as in Figure 3, which again indicates that the training process is precise.
Overall, Figures 2, 3, and 4 show that the training results are concentrated and consistent with a normal training process for the MLFN-8 model. Notably, the residual values are generally small and close to zero, which implies that the training process is correct and precise.
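The residual diagnostics behind plots of this kind can be sketched as follows. The sign convention (actual minus predicted) is an assumption, since the text does not define it, and the data values are made up. The sketch builds the point pairs for a residual-versus-actual and a residual-versus-predicted plot and summarizes how tightly the residuals cluster around zero.

```python
def residuals(actual, predicted):
    """Residuals under the assumed convention: actual minus predicted."""
    return [a - p for a, p in zip(actual, predicted)]


# Made-up oil contents (%), for illustration only.
actual = [40.0, 42.0, 38.0, 41.0]
predicted = [40.5, 41.5, 38.25, 40.75]

res = residuals(actual, predicted)

# Point pairs (x, y) for the two residual plots.
residual_vs_actual = list(zip(actual, res))
residual_vs_predicted = list(zip(predicted, res))

# Simple summaries of how concentrated the residuals are around zero.
mean_residual = sum(res) / len(res)
max_abs_residual = max(abs(r) for r in res)
print(res, mean_residual, max_abs_residual)
```

A mean residual near zero with a small maximum absolute residual corresponds to the "concentrated" pattern described for the training plots; a visible trend in either set of pairs would instead suggest systematic bias.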
3.2. Testing Results of MLFN-8
For the testing process, as shown in Figure 5, the relationship between predicted values and actual values is also close to linear, which means that the MLFN-8 model predicts precisely.
To check the robustness of the model on the testing set, we also plotted the residual values against the actual values and against the predicted values, as shown in Figures 6 and 7.
Figures 5, 6, and 7 depict the averaged testing results of the MLFN-8 model. All the values shown in the three figures are average values, from which we conclude that the model is accurate and robust.
According to the results presented above, the MLFN-8 model is a suitable and reasonable model for determining the oil content of rapeseed.
Several previous studies are related to this work [12, 17–20]. Gu and Wang analyzed the oil content of rapeseed by multiple linear regression based on near infrared spectral data, which was the chief inspiration for our work. In contrast, our approach offers higher robustness and precision, since it focuses on a well-fitted nonlinear function. Madsen established a quick method for determining the oil content of rapeseed with a commercial nuclear magnetic resonance spectrometer. Tkachuk used a near infrared reflectance technique to determine oil, protein, chlorophyll, and glucosinolate content in whole rapeseed kernels. In addition, Velasco and coworkers used near infrared reflectance spectroscopy to estimate the seed weight, oil content, and fatty acid composition of intact single seeds of rapeseed. Shafii and coworkers analyzed genotype-by-environment interaction effects on winter rapeseed yield and oil content. These studies analyze the oil content and other properties of rapeseed effectively and serve as valuable references. However, these analytical approaches still require complex manual operation, and the procedures are somewhat intricate. Our study shows that the oil content of rapeseed can be analyzed by artificial neural networks, a quick and easy method that a computer can carry out automatically.
In food science and analytical chemistry, the oil content of rapeseed indicates the yield of the related products in practical applications. For example, in one production step, the oil content of the rapeseed samples should be estimated and evaluated before mass production. Artificial neural networks can accomplish this step efficiently.
The oil content of rapeseed is a crucial property in practical applications of food science and chemistry. In this paper, instead of using traditional analytical methods, we used an artificial neural network (ANN) method to analyze the oil content of 29 rapeseed samples based on near infrared spectral data at different wavelengths. Results show that the multilayer feed-forward neural network with 8 nodes (MLFN-8) is the most suitable and reasonable mathematical model in our experiments. In future research, we aim to look for explicit nonlinear functions of near infrared spectral data for analyzing the oil content of rapeseed.
4. Conclusions
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This work was funded by the National Marine Public Welfare Research Project (nos. 201305002 and 201305043) and the Natural Science Foundation of Dalian (no. 2012003219).
References
K. Motobayashi, K. Minami, N. Nishi et al., “Hysteresis of potential-dependent changes in ion density and structure of an ionic liquid on a gold electrode: in situ observation by surface-enhanced infrared absorption spectroscopy,” The Journal of Physical Chemistry Letters, vol. 4, no. 18, pp. 3110–3114, 2013.
K. Fuwa and B. L. Vallee, “The physical basis of analytical atomic absorption spectrometry: the pertinence of the Beer-Lambert law,” Analytical Chemistry, vol. 35, no. 8, pp. 942–946, 1963.
N. Gupta, “Artificial neural network,” Network and Complex Systems, vol. 3, no. 1, pp. 24–28, 2013.
H. Li, X. F. Liu, S. J. Yang et al., “Prediction of polarizability and absolute permittivity values for hydrocarbon compounds using artificial neural networks,” International Journal of Electrochemical Science, vol. 9, no. 7, pp. 3725–3735, 2014.
Y. Wang, T. Yang, Y. Ma et al., “Mathematical modeling and stability analysis of macrophage activation in left ventricular remodeling post-myocardial infarction,” BMC Genomics, vol. 13, supplement 6, article S21, 2012.
T. Yang, Y. A. Chiao, Y. Wang et al., “Mathematical modeling of left ventricular dimensional changes in mice during aging,” BMC Systems Biology, vol. 6, supplement 3, article S10, 2012.
W. Z. Gu and Y. X. Wang, “Analysis of rapeseed oil by linear regression using near spectral data,” Journal of the Chinese Cereals and Oils Association, vol. 10, no. 2, pp. 57–64, 1995 (Chinese).
C. H. Chen, T. K. Yao, C. M. Kuo et al., “Evolutionary design of constructive multilayer feedforward neural network,” Journal of Vibration and Control, vol. 19, no. 16, pp. 2413–2420, 2013.
B. Shafii, K. A. Mahler, W. J. Price et al., “Genotype X environment interaction effects on winter rapeseed yield and oil content,” Crop Science, vol. 32, no. 4, pp. 922–927, 1992.