Abstract

Fourier-transform infrared (FTIR) spectroscopy offers rapid analysis with minimal sample preparation. FTIR in combination with multivariate approaches, particularly partial least squares regression (PLSR), has been widely used for adulterant analysis, but few studies have compared PLSR with other regression strategies. In this paper, we apply simple linear regression (SLR), multiple linear regression (MLR), and PLSR to the prediction of lard in palm olein oil. Pure palm olein oil was adulterated with lard at different concentrations and analysed by FTIR. The marker bands distinguishing lard from palm olein oil were determined using Fisher’s weights. The marker regions were then subjected to regression analysis, with the models verified on 100 training/test sets. Prediction performance was measured by the percentage root mean square error (%RMSE). The absorption bands at 3006 cm−1, 2852 cm−1, 1117 cm−1, 1236 cm−1, and 1159 cm−1 were identified as the marker bands. The bands at 3006 and 1117 cm−1 showed satisfactory predictive ability, with PLSR giving the best prediction at %RMSE of 16.03% and 13.26%, respectively.

1. Introduction

Adulteration of oils is a persistent issue in the market [1]. In 2013, a company in Taiwan was found to market cheaper oils as premium class oils. This was followed by an incident in which lard-based cooking oil was adulterated with gutter oil, affecting more than 1,300 food products [2, 3]. Consumer Voice [4] further reported that 47.09% of 1,015 edible oil samples tested from 14 states in India were not in compliance with the Food Safety and Standards Regulations.

Lard is considered one of the cheaper oils in the food industry. It blends readily with other oils and may be added to reduce the production cost. The presence of lard in cooking oil matters from two perspectives: religious restrictions and economic considerations. Religions such as Islam and Judaism forbid the consumption of swine and any of its derivatives [1, 5]; lard should therefore not be present in halal-labelled products. From the economic perspective, the credibility of Malaysia as a major producer and exporter of palm oil would be at risk should its products be found adulterated. A company in Malaysia was charged with allegedly intending to export palm oil adulterated with fatty acid to Sri Lanka [6].

Various methods have been developed to identify the adulteration of cooking oil, including gas chromatography-mass spectrometry (GC-MS), high-performance liquid chromatography-mass spectrometry (HPLC-MS), Fourier-transform infrared (FTIR) spectroscopy, and nuclear magnetic resonance (NMR). The advantages and disadvantages of these analytical methods for adulterant analysis are summarized in Table 1.

Most of these techniques are costly and time-consuming. FTIR, in contrast, is inexpensive and offers rapid analysis with minimal sample preparation. Integrated with statistical approaches, particularly partial least squares (PLS), it has demonstrated promising sensitivity for adulterant analysis [12, 13, 14]. FTIR coupled with PLS has been used to detect adulterants in edible oils including avocado oil, sunflower oil, and palm oil, with detection limits as low as 2–3% [15, 16, 17]. In some cases the detection level is much higher; for example, the quantification of hazelnut in virgin olive oil is reported at 25% or higher. Although PLS regression is commonly coupled with FTIR for the prediction of adulterants, other regression strategies have received limited study. Hence, in this paper, we apply simple linear regression (SLR), multiple linear regression (MLR), and partial least squares regression (PLSR) to the prediction of lard in palm olein oil using FTIR. The comparison provides fundamental knowledge on the performance of different regression models for adulterant analysis and contributes toward quality control.

2. Materials and Methods

2.1. Sample Preparation

Readily available palm olein cooking oil was purchased from the local market. Pure lard was extracted from adipose tissues of swine purchased from the local market. The adipose tissues were cut into small pieces and heated in an oven at 90°C for 2 hours. The liquid fat was ladled into a glass jar. It was left to cool to room temperature before storage. Prior to use, lard was preheated with a block heater (Stuart SBH200D) at 50°C for 1 hour, until the solidified lard turned into liquid.

Four concentration levels were analysed: pure palm olein oil, palm olein oil adulterated with lard at 20% and 50%, and pure lard. The samples were agitated with a vortex mixer (VELP Scientifica Model ZX4) for 1 minute to ensure homogeneity [1, 17, 18].

2.2. FTIR Spectra Measurement

The samples were scanned with a Fourier-transform infrared spectrophotometer (Thermo Scientific Nicolet iS10) equipped with a diamond crystal attenuated total reflectance (ATR) accessory. The spectra were acquired at a resolution of 4 cm−1 with 64 scans in the range of 4000–525 cm−1. Each spectrum was ratioed against a fresh background spectrum recorded from the bare ATR plate. Prior to collection of each background spectrum, the ATR plate was cleaned with pure ethanol. At each concentration level, a total of 20 replicates were scanned, yielding 80 spectra. The spectra were saved in CSV format for further analysis using Matlab R2013a.

2.3. Spectra Processing

The spectra were baseline corrected and subjected to peak detection according to the first derivative approach. The peaks detected were then matched across samples to produce a peak table with rows and columns representing samples and variables (in wavenumber), respectively. For brevity, the algorithm is not reproduced here; it is described in [19]. The resultant peak table was analysed to deduce the marker bands differentiating pure and adulterated samples.
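The first derivative approach to peak detection can be illustrated with a minimal sketch. It is not the algorithm of [19]; it only shows the general idea that a peak maximum sits where the first difference of the signal changes from positive to non-positive. The function name, the `min_height` threshold, and the toy absorbance values are assumptions made for illustration.

```python
def find_peaks_first_derivative(wavenumbers, absorbance, min_height=0.0):
    """Return (wavenumber, height) for each local maximum in a spectrum.

    A point is flagged as a peak when the finite-difference derivative is
    positive just before it and non-positive just after it, and the
    absorbance at that point is at least min_height (hypothetical threshold).
    """
    # First derivative approximated by differences between neighbouring points.
    deriv = [absorbance[i + 1] - absorbance[i] for i in range(len(absorbance) - 1)]
    peaks = []
    for i in range(1, len(deriv)):
        if deriv[i - 1] > 0 and deriv[i] <= 0 and absorbance[i] >= min_height:
            peaks.append((wavenumbers[i], absorbance[i]))
    return peaks


# Toy data (made-up values): two short windows joined for brevity,
# with maxima placed at 3006 and 2852 cm^-1.
wn = [3020, 3013, 3006, 2999, 2992, 2866, 2859, 2852, 2845, 2838]
ab = [0.01, 0.03, 0.06, 0.03, 0.01, 0.10, 0.30, 0.55, 0.28, 0.09]
detected = find_peaks_first_derivative(wn, ab, min_height=0.02)
print(detected)  # [(3006, 0.06), (2852, 0.55)]
```

Real spectra are noisy, so a practical routine would normally smooth the signal or require a minimum derivative magnitude before accepting a sign change.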

2.4. Variable Selection

Fisher weights, a multiclass variable selection method, were employed to determine the variable(s) with discriminatory ability. The weight, $w_m$, for each variable, m, according to class (c = 1…C) was calculated based on the following equation:

$$w_m = \frac{\sum_{c=1}^{C} n_c \left(\bar{x}_{mc} - \bar{x}_m\right)^2}{(C - 1)\, s_{m,\mathrm{pooled}}^2}$$

where $\bar{x}_{mc}$ and $\bar{x}_m$ are the mean of the variable in class c and the overall mean of the variable, respectively, $s_{m,\mathrm{pooled}}$ is the pooled standard deviation, and $n_c$ is the number of members in class c. A variable with a higher weight has greater discriminatory ability [20]. The variables with the highest weights are called the marker bands and are used for prediction of lard adulteration in palm olein oil using SLR, MLR, and PLSR.
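As a hedged illustration of the Fisher weight calculation, the sketch below assumes the common multiclass form: the between-class sum of squares, weighted by class size, divided by (C − 1) times the pooled within-class variance. The function name and the toy values are hypothetical and do not come from the study.

```python
def fisher_weight(values, labels):
    """Multiclass Fisher weight of one variable (assumed standard form)."""
    classes = sorted(set(labels))
    n_total = len(values)
    overall_mean = sum(values) / n_total
    between = 0.0  # sum over classes of n_c * (class mean - overall mean)^2
    within = 0.0   # sum of squared deviations from each class mean
    for c in classes:
        members = [v for v, lab in zip(values, labels) if lab == c]
        mean_c = sum(members) / len(members)
        between += len(members) * (mean_c - overall_mean) ** 2
        within += sum((v - mean_c) ** 2 for v in members)
    pooled_var = within / (n_total - len(classes))
    return between / ((len(classes) - 1) * pooled_var)


# Toy peak-table columns (made-up values) for three concentration classes.
labels = [0, 0, 0, 20, 20, 20, 50, 50, 50]
marker = [1.0, 1.1, 0.9, 2.0, 2.1, 1.9, 3.0, 3.1, 2.9]  # tracks the class
noise = [1.0, 2.0, 3.0, 1.1, 2.1, 2.9, 0.9, 1.9, 3.1]   # unrelated to class
w_marker = fisher_weight(marker, labels)
w_noise = fisher_weight(noise, labels)
print(w_marker, w_noise)
```

In this toy case the variable that follows the class structure receives a weight several orders of magnitude larger than the unrelated one, which is the behaviour used to rank candidate marker bands.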

2.5. Simple Linear Regression (SLR)

The peak area of a marker band was calculated as the sum of the signal from peak start to peak end. The vector of peak areas, X, is assumed to have a linear relationship with the corresponding lard concentration, C. The regression is expressed as $\hat{C} = X \cdot b$, where b is the coefficient and $\hat{C}$ is the predicted concentration.
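A minimal sketch of the SLR step follows. It assumes the regression has a single coefficient b and no intercept, which is one reading of the description above; the helper names and the toy absorbance windows are hypothetical.

```python
def peak_area(absorbance, start, end):
    """Sum of the signal from peak start to peak end (inclusive indices)."""
    return sum(absorbance[start:end + 1])


def fit_slr(areas, conc):
    """Least-squares coefficient b for conc ~ b * area (no intercept, assumed)."""
    return sum(a * c for a, c in zip(areas, conc)) / sum(a * a for a in areas)


def predict_slr(areas, b):
    """Predicted concentration for each peak area."""
    return [a * b for a in areas]


# Toy absorbance windows around one marker band (made-up values).
windows = [
    [0.2, 0.6, 0.2],  # sample at 20% lard
    [0.5, 1.5, 0.5],  # sample at 50% lard
    [1.0, 3.0, 1.0],  # sample at 100% lard
]
conc = [20.0, 50.0, 100.0]
areas = [peak_area(w, 0, 2) for w in windows]
b = fit_slr(areas, conc)
predicted = predict_slr([1.5], b)
print(b, predicted)
```

If an intercept were intended, the same fit would be carried out on mean-centred areas and concentrations instead.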

2.6. Multiple Linear Regression (MLR)

The calibration model was built using the spectral data, X (a matrix), with its corresponding lard concentrations, C, in which $C = X \cdot B$ and $B = (X' \cdot X)^{-1} \cdot X' \cdot C$. The regression equation can then be written as $\hat{C} = X \cdot B$, considering only the linear terms [21].
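The normal-equation solution B = (X′ · X)−1 · X′ · C can be sketched in plain Python as below. Rather than forming the inverse explicitly, the sketch solves (X′ · X) · B = X′ · C by Gaussian elimination, which gives the same B when X′ · X is invertible. All function names and the toy data are assumptions for illustration; a numerical library would normally be used in practice.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def solve(a, rhs):
    """Solve a * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        # Move the row with the largest pivot into place for stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(aug[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit_mlr(X, c):
    """Coefficients B from the normal equations (X'X) B = X'c."""
    Xt = transpose(X)
    XtX = [[sum(a * b for a, b in zip(r1, r2)) for r2 in Xt] for r1 in Xt]
    Xtc = [sum(a * b for a, b in zip(row, c)) for row in Xt]
    return solve(XtX, Xtc)


def predict_mlr(X, B):
    return [sum(x * b for x, b in zip(row, B)) for row in X]


# Toy data (made-up values): concentrations generated with B = [10, 5].
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
c = [10.0, 5.0, 15.0, 25.0]
B = fit_mlr(X, c)
print(B, predict_mlr([[3.0, 2.0]], B))
```

The solve step fails when X′ · X is singular, which happens as soon as there are more variables than samples or the variables are collinear.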

2.7. Partial Least Squares Regression (PLSR)

The PLS calibration model was developed using the spectral data, X, and its corresponding lard concentration, C, based on two PLS components. The PLS algorithm assumes a linear relationship between X and C. They are decomposed into the models X = T · P + E and C = T · q + f, where E and f are the noise, T is the scores matrix common to X and C, and P and q are the loadings. The algorithm first projects X onto the weight vector to obtain a scores vector, t. X is then projected onto the scores to obtain the loadings, p. After every PLS component, the X matrix is deflated by subtracting t · p from X. The PLS algorithm according to NIPALS (non-linear iterative partial least squares) is explained in detail in [21].
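The projection and deflation steps described above can be sketched for a single response (PLS1) as follows. This is a minimal illustration of the NIPALS-style procedure, not the routine used in the study; it assumes X and C have already been centred, and the function names and toy data are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def fit_pls1(X, c, n_components=2):
    """Return one (w, p, q) triple per PLS component for centred X and c."""
    X = [list(row) for row in X]  # copies, because both are deflated below
    c = list(c)
    n_vars = len(X[0])
    model = []
    for _ in range(n_components):
        # Weight vector: X'c scaled to unit length.
        w = [dot([row[j] for row in X], c) for j in range(n_vars)]
        norm = dot(w, w) ** 0.5
        if norm == 0:  # nothing left in c to explain
            break
        w = [v / norm for v in w]
        # Scores: projection of X onto the weight vector.
        t = [dot(row, w) for row in X]
        tt = dot(t, t)
        # Loadings: projection of X (p) and of c (q) onto the scores.
        p = [dot([row[j] for row in X], t) / tt for j in range(n_vars)]
        q = dot(c, t) / tt
        # Deflation: subtract t * p from X and q * t from c.
        X = [[x - ti * pj for x, pj in zip(row, p)] for row, ti in zip(X, t)]
        c = [ci - q * ti for ci, ti in zip(c, t)]
        model.append((w, p, q))
    return model


def predict_pls1(model, x_new):
    """Predict the centred response for one centred sample."""
    x = list(x_new)
    c_hat = 0.0
    for w, p, q in model:
        t = dot(x, w)
        c_hat += q * t
        x = [xi - t * pi for xi, pi in zip(x, p)]
    return c_hat


# Toy centred data (made-up values): c = 10 * x1 + 5 * x2 exactly.
X = [[-2.0, -1.0], [-1.0, 1.0], [0.0, -1.0], [1.0, 2.0], [2.0, -1.0]]
c = [-25.0, -5.0, -5.0, 20.0, 15.0]
model = fit_pls1(X, c, n_components=2)
print(predict_pls1(model, [1.0, 1.0]))
```

With two variables and two components the toy model reproduces the exact linear relationship; with real spectra the number of components is far smaller than the number of variables, which is where PLS differs from MLR.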

2.8. Model Evaluation

The models were built using the training samples and validated with the test samples. Two-thirds of the 80 spectra were used as the training samples, with an equal number from each class, whilst the remainder served as the test samples. The samples were split randomly for 100 iterations, and these 100 training/test sets were subjected to SLR, MLR, and PLSR according to the selected spectral regions for prediction of lard. For PLSR, the matrix of training samples was in addition standardized, and the corresponding concentration, C, was mean-centred; the test set was standardized using the mean and standard deviation of the training samples. The prediction performance was evaluated based on the percentage root-mean-square error (%RMSE) between the predicted and expected concentrations.

A lower %RMSE signifies better prediction. Typically, the training samples show better prediction than the test samples. However, if a model predicts exceptionally well for the training samples but not for the test samples, it implies that the model is overfitted. Figure 1 illustrates the flow chart of the training/test set splitting for regression analysis. The process was programmed as a routine, and all analyses were performed in Matlab R2013a.
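The splitting and scoring scheme can be sketched as below. The sketch assumes 13 training spectra per concentration level (52 of the 80 spectra), standardizes a variable with the training mean and standard deviation, and computes the RMSE directly in concentration units (% lard). The exact normalisation behind %RMSE is not restated here, so that last choice is an assumption, as are the function names and the random seed.

```python
import random


def stratified_split(labels, n_train_per_class, rng):
    """Return (train_idx, test_idx) with equal training counts per class."""
    train, test = [], []
    for c in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        rng.shuffle(idx)
        train += idx[:n_train_per_class]
        test += idx[n_train_per_class:]
    return train, test


def standardize(train_col, test_col):
    """Scale both sets with the mean and standard deviation of the training set."""
    n = len(train_col)
    mean = sum(train_col) / n
    sd = (sum((v - mean) ** 2 for v in train_col) / (n - 1)) ** 0.5
    scale = lambda col: [(v - mean) / sd for v in col]
    return scale(train_col), scale(test_col)


def rmse(expected, predicted):
    """Root-mean-square error in the units of the concentrations."""
    n = len(expected)
    return (sum((e - p) ** 2 for e, p in zip(expected, predicted)) / n) ** 0.5


# 20 replicates at each of four levels: 0, 20, 50 and 100% lard.
labels = [0] * 20 + [20] * 20 + [50] * 20 + [100] * 20
rng = random.Random(0)  # fixed seed so the sketch is repeatable
splits = [stratified_split(labels, 13, rng) for _ in range(100)]
train_idx, test_idx = splits[0]
print(len(splits), len(train_idx), len(test_idx))  # 100 52 28
```

Each of the 100 splits would then be passed to the three regression models, and the resulting training and test errors compared to check for overfitting.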

Analysis of Variance (ANOVA) with Tukey’s test was performed on the %RMSE attained for the different spectral regions over the 100 training/test splits to determine whether there is a significant difference at the 95% confidence level.

3. Results and Discussion

The spectral patterns of pure and adulterated oil are shown in Figure 2; they are broadly similar, with several major absorption peaks identified in the regions of 3000–2800 cm−1, 1700–1600 cm−1, and 1500–900 cm−1. These characteristic peaks are likewise reported by [1], with some discrepancies: the peak at 2954 cm−1 is shifted to 2922 cm−1, and that at 914 cm−1 is inconsistently detected.

Based on Fisher weights, five peaks at 3006 cm−1, 2852 cm−1, 1117 cm−1, 1236 cm−1, and 1159 cm−1 were identified as the variables with the most significant discriminatory ability, agreeing with [22]. These peaks were reported to reduce in intensity with increasing concentration of lard; nevertheless, this observation is not entirely borne out in the present study. The peak at 3006 cm−1 was seen to increase with lard concentration, contrary to the findings of [22]. For the other marker bands, an inverse relationship between the peak intensity and the concentration of lard is demonstrated, as reported. Figure 3 illustrates the spectral regions of the five variables with the most significant discriminating ability.

The peak at 3006 cm−1 is attributed to the stretching of the cis C=CH bond in unsaturated fatty acids, whereby the more abundant the bond is, the higher the peak intensity [23]. As stated on the label of the palm olein oil used in this study, the product contains 43% saturated, 43% monounsaturated, and 14% polyunsaturated fats. In comparison, lard contains 48% monounsaturated and 11% polyunsaturated fats, as reported by [5], so lard is expected to be richer in cis C=CH bonds. This offers an explanation for the positive correlation between the peak intensity and lard concentration. The peak at 2852 cm−1 is characteristic of C-H stretching, where the intensity is governed by the abundance of long-chain saturated fatty acids [24]. Typically, lard contains higher amounts of stearic acid (18:0); nevertheless, its total saturated fatty acid content (42%) is lower than that of palm olein oil (45.8%), supporting the reduced intensity at 2852 cm−1 as the lard concentration increases. The peak at 1117 cm−1, on the other hand, is attributed to out-of-plane CH bending; according to [13], a higher abundance of oleic acyl groups (18:1) in oil would lead to a reduction in the peak intensity. Lard typically contains 42% oleic acid whilst palm olein oil comprises 38% [25]; this is consistent with the inverse relationship between the peak intensity and lard concentration. The other peaks at 1236 cm−1 and 1159 cm−1 are linked to the stretching of the C-O group in esters. According to [1], the fingerprint region at 1500–1000 cm−1 is the most suitable for discrimination of pure oil from the admixture of lard.

Two-thirds of the 80 spectra were randomly assigned as the training samples (n = 52) to develop the calibration model whilst the remaining 28 samples were used to test the model. Note that, for the training set, each concentration level has an equal number of samples. A total of 100 training and test sets were used to ensure that the model is consistent and reliable for prediction. These 100 training/test sets were subjected to SLR, MLR, and PLSR according to the spectral regions of 3006 cm−1, 2852 cm−1, 1117 cm−1, 1236 cm−1, and 1159 cm−1.

Table 2 summarizes the %RMSE of prediction according to spectral regions and training/test sets using the various regression models. Evidently, the spectral regions with better predictive ability are those at 1130–1100 cm−1 and 3020–2990 cm−1, where the peak maxima are recorded at 1117 and 3006 cm−1, respectively. This is demonstrated by PLSR and MLR, with the former outperforming the latter, whilst SLR exhibits exceptionally poor prediction across all regions and presumably has no predictive ability. According to PLSR, the %RMSE increases progressively for the regions at 1159, 1236, and 2852 cm−1, indicative of diminishing predictive ability. An extensive review of infrared spectroscopic techniques for detecting adulteration of food lipids [26] corroborates the effective regions at 3020–2990 cm−1 and 1130–1100 cm−1 for prediction of lard [1, 13, 27–31].

Among the three regression models, PLSR demonstrates the most reliable and consistent prediction; this approach has been widely used for prediction of adulterants, exhibiting superior accuracy over other strategies such as principal component regression, ordinary least squares, and ridge regression [32, 33]. MLR is a linear approach that models the relationship between a dependent variable and more than one explanatory (independent) variable. This approach falls short when the number of independent variables exceeds the number of samples, as with spectral data, or when the variables are not independent. Besides, if the variables are characterized by profound noise, the prediction may be very susceptible to changes [34]. SLR, on the other hand, is very sensitive to outliers and tends to be overfitted. Figure 4 illustrates the predicted concentration versus the expected concentration of test samples based on the three different models (SLR, MLR, and PLSR) with specific reference to the spectral regions of 3006 and 1117 cm−1.

4. Conclusion

In this paper, we compared three different regression models (SLR, MLR, and PLSR) for prediction of lard in palm olein oil. The marker bands for differentiation of lard and palm olein oil were identified at 3006 cm−1, 2852 cm−1, 1117 cm−1, 1236 cm−1, and 1159 cm−1. The regions with promising predictive ability were confirmed at 3006 and 1117 cm−1 with PLSR demonstrating better accuracy.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this article.

Acknowledgments

The authors would like to thank University Malaysia Sarawak for funding this study under the budget of Faculty of Resource Science and Technology.