This paper proposes the joint use of Fourier Transform Infrared Attenuated Total Reflectance Spectroscopy (FTIR-ATR) and Partial Least Squares (PLS) regression for the simultaneous quantification of four adulterants (coffee husks, spent coffee grounds, barley, and corn) in roasted and ground coffee. Roasted coffee samples were intentionally blended with the adulterants, at adulteration levels ranging from 0.5 to 66% w/w. A robust methodology that included the identification of outliers was implemented. High correlation coefficients (0.99 for both calibration and validation) coupled with low degrees of error (0.69% for calibration; 2.00% for validation) confirmed that FTIR-ATR can be a valuable analytical tool for quantification of adulteration in roasted and ground coffee. This method is simple, fast, and reliable for the proposed purpose.

1. Introduction

New and challenging risks, such as adulteration, have emerged as food supply chains become increasingly global and complex, although fraud in the food sector has been an issue since ancient times. Food adulteration tends to be economically motivated and is achieved through addition, substitution, or removal of food ingredients. It is an issue that concerns not only consumers, but producers and distributors as well [1].

Coffee is one of the most valuable and most commonly consumed beverages in the world. Due to its high price, this commodity is a frequent target of adulteration. Impurities and adulterants are the most common concern. Any low-cost material of biological origin could be used as a potential adulterant in coffee [2]. Roasted and ground coffee presents physical characteristics (particle size, texture, and color) that are easily reproduced by roasting and grinding a variety of biological materials (cereals, seeds, parchments, etc.). As reported in previous works, coffee husks, sticks, spent coffee grounds, corn, barley, rice, and soybeans have been admixed with coffee worldwide for the sole purpose of adulteration [3, 4].

In order to develop analytical tools suitable to detect and identify adulteration in roasted and ground coffee, different techniques and procedures have been proposed, including UPLC [2], GC-MS [3], Direct Infusion Electrospray Ionization [5], HPAEC-PA [6], HPLC-DAD [7], UV-Vis, and Infrared Spectroscopy [4, 8–11]. Among these techniques, spectroscopic methods have gained attention in recent studies because they are fast, reliable, and simple to perform and usually do not require sample pretreatment, which makes them appropriate for routine laboratory analysis.

In previous studies, we have shown that Diffuse Reflectance Fourier Transform Infrared Spectroscopy (DRIFTS) is suitable for identification, discrimination, and quantification of adulterants in roasted and ground coffee [4, 9, 10]. However, application of this method requires that the sample be mixed with KBr prior to analysis, and the amount employed for analysis is quite small, which could affect representativity, considering that adulterated coffee samples are inherently heterogeneous. Such problems could be minimized by employing Attenuated Total Reflectance (ATR) instead. ATR does not require any sample pretreatment and also allows the employment of larger samples [12]. Therefore, in the present study, we evaluate whether or not Fourier Transform Infrared Attenuated Total Reflectance Spectroscopy (FTIR-ATR) is a more effective technique for quantification of adulteration in roasted coffee than DRIFTS. Aside from employing a new measurement technique, we have also further improved our previous studies by increasing the range and number of adulterated samples.

2. Material and Methods

2.1. Samples

Arabica coffee, barley, and corn samples were acquired from local markets. Coffee husks were provided by the Minas Gerais State Coffee Industry Union (Sindicato da Indústria de Café do Estado de Minas Gerais, Brazil). Spent coffee grounds were provided by a local soluble coffee manufacturer (Café Brasília, Minas Gerais, Brazil).

Coffee beans (50 g), coffee husks (30 g), barley (50 g), and corn (30 g) samples were roasted in a convection oven (Model 4201D Nova Ética, São Paulo, Brazil) at temperatures ranging from 200 to 260°C, for different time intervals. Roasting degrees (light, medium, and dark) were established by comparing luminosity (L*) values of the samples to measurements performed on commercially available coffee. A tristimulus colorimeter (Hunter Lab Colorflex 45/0 Spectrophotometer, Hunter Laboratories, VA, USA) with standard D65 illumination and normal colorimetric observer angle of 10° was used in the color measurements. The roasting degrees were defined as light, medium, and dark according to ranges of L* values. Spent coffee grounds (three lots of 2 kg each) were washed with distilled water to remove impurities. Three 200 g samples were randomly selected from each lot and dried at 100°C for 5 h in order to reach moisture content levels similar to that of ground roasted coffee (~5 g/100 g). Further details on color measurements and roasting conditions are available in our previous study [9]. Pure coffee and adulterants (coffee husks, spent coffee grounds, barley, and corn) were intentionally mixed, at adulteration levels ranging from 0.5 to 66 g/100 g, as described in Table 1.

2.2. FTIR Analysis

All measurements were performed in a dry controlled atmosphere (20 ± 0.5°C) employing a Shimadzu IRAffinity-1 FTIR Spectrophotometer (Shimadzu, Japan) with a deuterated L-alanine-doped triglycine sulfate (DLATGS) detector. A Pike sampling accessory (MIRacle), with a zinc selenide window, was employed for the ATR measurements. All spectra were recorded in the range of 4000–700 cm−1 with 4 cm−1 resolution and 20 scans and submitted to background subtraction (atmosphere spectra). Preliminary tests were performed to evaluate the effect of particle size D (0.39 mm < D < 0.5 mm; 0.25 mm < D < 0.39 mm; 0.15 mm < D < 0.25 mm; and D < 0.15 mm) on the quality of the spectra, and the best quality spectra (higher intensity and lower noise interference) were obtained for samples with D < 0.15 mm.

Because the 34 solid mixtures were manually prepared, five replicates of each sample were obtained in the FTIR-ATR using different parts of each sample, in order to ensure representativity. Therefore, a total of 170 spectra were obtained for adulterated samples.

2.3. Statistical Analysis

MATLAB software, version 7.13 (MathWorks, Natick, MA, USA), and PLS Toolbox version 6.5 (Eigenvector Technologies, Manson, WA, USA) were employed for data analysis. PLS was employed for quantification of adulterants mixed in roasted coffee samples using the ATR spectra as chemical descriptors, with adulteration levels ranging from 0.5% to 66% in mass (see Table 1). The models were built with 170 spectra. The data were divided into two sets, calibration and validation, employing the Kennard-Stone algorithm, which scans the data and selects the most representative samples for the calibration set. The resulting calibration and validation sets comprised 102 and 68 spectra, respectively.
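The Kennard-Stone selection described above can be illustrated with a minimal sketch. It assumes Euclidean distances between spectra and uses plain Python lists for portability; the function name `kennard_stone` and its arguments are hypothetical and do not reproduce the PLS Toolbox implementation used in this study.

```python
import math


def kennard_stone(X, n_select):
    """Split row indices of X into (selected, remaining) by the Kennard-Stone rule.

    X is a list of spectra (each a list of floats); n_select is the size of
    the calibration set. Distances are Euclidean (an assumption).
    """
    n = len(X)
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and the number of samples")
    # Pairwise Euclidean distances between all spectra.
    dist = [[math.dist(a, b) for b in X] for a in X]
    # Start with the two most distant samples.
    i0, j0 = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: dist[pair[0]][pair[1]],
    )
    selected = [i0, j0]
    remaining = [k for k in range(n) if k not in (i0, j0)]
    while len(selected) < n_select:
        # Add the candidate whose nearest already-selected sample is farthest away.
        best = max(remaining, key=lambda k: min(dist[k][s] for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected, remaining
```

With a hypothetical list `spectra` holding the 170 spectra, `kennard_stone(spectra, 102)` would return index lists of 102 calibration and 68 validation spectra, matching the set sizes reported above.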

The data were submitted to two sequential evaluations. The first focused on the efficiency of different data preprocessing methods. The second was related to the importance of the variables in the quantification process. In this step, different spectral ranges were evaluated in order to check whether the use of a specific region could improve the quality of the model.

The purpose of preprocessing is to linearize the response of variables and remove extraneous sources of variation (variance), which are not of interest in the analysis. Interfering variance appears in almost all real data because of systematic errors present in the experiment, requiring the model to work harder [13]. The data preprocessing methods tested were mean centering (1), Multiplicative Scatter Correction (MSC) followed by mean centering (2), MSC followed by first derivative, smoothing, and mean centering (3), Standard Normal Variate (SNV) followed by mean centering (4), SNV followed by first derivative, smoothing, and mean centering (5), absorbance normalization followed by mean centering (6), and first derivative followed by smoothing and mean centering (7).

Mean centering corresponds to subtraction of the mean absorbance at each wavenumber, calculated over the calibration spectra, from the corresponding data point of every spectrum. Multiplicative scatter correction (MSC), originally developed to compensate for the effects of light scattering in reflectance spectroscopy, has become a widely employed technique for removing general spectral drift features such as day-to-day intensity variations. Spectral derivatives are commonly used for baseline correction and also help to reveal small peaks that are difficult to detect in the original spectra. However, differentiation also decreases the signal-to-noise ratio, and thus a smoothing filter (Savitzky-Golay) was employed to provide noise reduction. SNV is applied to every spectrum individually; once the average and standard deviation of all the data points of the spectrum are calculated, the average is subtracted from every data point and the result is divided by the standard deviation. Absorbance normalization consisted of dividing (i) the difference between the absorbance value at each data point and the minimum absorbance value by (ii) the difference between the maximum and minimum absorbance values [13, 14].
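Most of the preprocessing steps described above can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the PLS Toolbox code used in the study: all function names are hypothetical, MSC uses the mean spectrum as reference, SNV uses the sample standard deviation, and mean centering is taken column-wise (per wavenumber, across spectra), which is the usual chemometric convention. The Savitzky-Golay derivative and smoothing step is not covered.

```python
import statistics


def snv(spectrum):
    """Standard Normal Variate: centre and scale one spectrum by its own statistics."""
    mean = statistics.fmean(spectrum)
    std = statistics.stdev(spectrum)  # sample standard deviation (n - 1), an assumption
    return [(a - mean) / std for a in spectrum]


def min_max_normalize(spectrum):
    """Absorbance normalization: (a - min) / (max - min) for one spectrum."""
    lo, hi = min(spectrum), max(spectrum)
    return [(a - lo) / (hi - lo) for a in spectrum]


def msc(spectra, reference=None):
    """Regress each spectrum on a reference and remove the fitted offset and slope."""
    if reference is None:
        # Assumption: the mean spectrum of the set serves as the reference.
        reference = [statistics.fmean(col) for col in zip(*spectra)]
    ref_mean = statistics.fmean(reference)
    sxx = sum((r - ref_mean) ** 2 for r in reference)
    corrected = []
    for s in spectra:
        s_mean = statistics.fmean(s)
        slope = sum((r - ref_mean) * (a - s_mean) for r, a in zip(reference, s)) / sxx
        intercept = s_mean - slope * ref_mean
        corrected.append([(a - intercept) / slope for a in s])
    return corrected


def mean_center(spectra, column_means=None):
    """Subtract the per-wavenumber mean; returns (centered spectra, means used)."""
    if column_means is None:
        column_means = [statistics.fmean(col) for col in zip(*spectra)]
    centered = [[a - m for a, m in zip(s, column_means)] for s in spectra]
    return centered, column_means
```

Returning the column means from `mean_center` allows the means computed on the calibration set to be reused when centering validation spectra, for example `mean_center(validation, column_means=means)`.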

The optimal number of latent variables (LV) for each model was estimated by a cross-validation method (venetian blinds), based on the smallest value of root mean square error of cross-validation (RMSECV). Model performance was measured by evaluation of the root mean square errors for both calibration (RMSEC) and validation (RMSEP) sets, calculated as follows:

$$\mathrm{RMSEC} = \sqrt{\frac{\sum_{i=1}^{n_c} (y_i - \hat{y}_i)^2}{n_c}}, \qquad \mathrm{RMSEP} = \sqrt{\frac{\sum_{i=1}^{n_p} (y_i - \hat{y}_i)^2}{n_p}},$$

where $y_i$ and $\hat{y}_i$ correspond to the real and predicted adulteration levels of sample $i$ and $n_c$ and $n_p$ are the total number of samples in the calibration and prediction (validation) sets, respectively. The models with better prediction ability should present lower values of RMSEC and RMSEP.
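The error measure and the venetian blinds scheme can be sketched as follows. This is a minimal illustration: the function names are hypothetical, the number of splits is a free parameter (its value is not stated here), and `fit_predict` stands for any user-supplied routine that fits a model with a given number of LV and returns predictions.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between reference and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def venetian_blinds(n_samples, n_splits):
    """Yield (train, test) index lists; split k holds out samples k, k + n_splits, ..."""
    for k in range(n_splits):
        test = list(range(k, n_samples, n_splits))
        held_out = set(test)
        train = [i for i in range(n_samples) if i not in held_out]
        yield train, test


def rmsecv(X, y, fit_predict, n_splits):
    """Cross-validated RMSE.

    fit_predict(X_train, y_train, X_test) -> list of predictions is a
    hypothetical callable supplied by the user.
    """
    y_cv = [None] * len(y)
    for train, test in venetian_blinds(len(y), n_splits):
        preds = fit_predict([X[i] for i in train], [y[i] for i in train],
                            [X[i] for i in test])
        for i, p in zip(test, preds):
            y_cv[i] = p
    return rmse(y, y_cv)
```

Applying `rmse` to the calibration or validation set gives RMSEC or RMSEP, respectively, and repeating `rmsecv` for models with increasing numbers of LV and keeping the smallest value corresponds to the selection rule described above.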

Model optimization was performed by detection and elimination of outliers. Outliers correspond to samples that are very different from the rest of the data set, and their detection is crucial when developing multivariate models. In this study, outlier detection in the calibration set was based on the methodology proposed by Valderrama et al. [15], which is appropriate for detection of samples with extreme leverages, for example, large residuals in the X block (data) or large residuals in the Y block (model response). If a sample presents leverage (a measure of the influence of each sample on the PLS model) larger than a limit value, it is considered an outlier. This limit can be evaluated as three times the ratio between the number of latent variables and the number of samples [15]. The outliers of the validation set were detected by the jackknife residue test, as described by De Souza and Junqueira [16].
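The leverage criterion can be illustrated with a short sketch. It follows the limit as stated above (three times the number of latent variables divided by the number of samples) and assumes that the columns of the PLS score matrix are mutually orthogonal, so that leverage reduces to a sum of squared, normalized scores. The function names are hypothetical, and the residual-based checks of [15] are not covered.

```python
def leverages(scores):
    """Leverage of each sample from a score matrix (rows = samples, columns = LV).

    Assumes mutually orthogonal score columns, so that
    h_i = sum_a t_ia**2 / sum_j t_ja**2.
    """
    column_ss = [sum(t * t for t in col) for col in zip(*scores)]
    return [sum(t * t / ss for t, ss in zip(row, column_ss)) for row in scores]


def leverage_limit(n_latent, n_samples):
    """Limit value: three times the ratio of latent variables to samples."""
    return 3 * n_latent / n_samples


def flag_high_leverage(scores):
    """Return indices of samples whose leverage exceeds the limit value."""
    n_samples, n_latent = len(scores), len(scores[0])
    limit = leverage_limit(n_latent, n_samples)
    return [i for i, h in enumerate(leverages(scores)) if h > limit]
```

For a model with 8 LV and 102 calibration spectra, `leverage_limit(8, 102)` evaluates to approximately 0.235 under this rule.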

3. Results and Discussion

Table 2 shows the results obtained for the PLS models based on the full-spectrum (4000–700 cm−1) approach and employing the different preprocessing techniques cited in Section 2.3. For the obtained models, the LV number ranged from 4 to 8, and the RMSEC and RMSEP values ranged from 1.44 to 3.80 and from 2.42 to 3.56, respectively. Among the tested pretreatments, SNV followed by mean centering provided the best model performance, with the lowest RMSEC and RMSEP values. This model was built with 8 LV that together explained 93.5% and 99.3% of the cumulative variance in X (spectra data) and in Y (adulterant concentration), respectively. The obtained RMSEC and RMSEP values were 1.44 and 2.42, respectively, and the correlation coefficients of calibration and validation were both 0.99 (Table 2). It is noteworthy that this model is more robust than the one obtained in our previous study [10] employing DRIFTS (LV = 10, RMSEC = 2.01, and RMSEP = 3.70).

The next step was to evaluate whether the selection of a specific spectral range could improve prediction accuracy, given that the full spectra could include some systematic variables that do not necessarily represent sample variance. For this reason, Figure 1 shows the plot of correlation coefficients, which indicates the main regions responsible for the quantification process. The spectral regions that contribute most to the prediction process are characterized by high absolute values of correlation coefficient. The plot in Figure 1 shows that the highest values of correlation coefficient are concentrated in the range of 1134–700 cm−1 and that extending the spectral range from 700 up to 1735 cm−1 would still provide significant values of correlation coefficients, so both regions were tested. Comparing the data of Figure 1 with the data of Figure 2, in which the mean spectra of pure coffee and of the adulterants are shown, it can be seen that the latter wavenumber range is characterized by vibrations of several types of bonds such as C–H, C–O, and C–N [17]. Chlorogenic acids, a class of phenolic compounds comprised of quinic acid esterified to a variety of trans-cinnamic acids, present strong absorption in the region of 1450–1000 cm−1. Bands in the range 1085–1050 cm−1 can be assigned to axial C–O deformation of quinic acid, bands in the range 1420–1330 cm−1 to O–H angular deformation, and absorption in the 1300–1000 cm−1 range to the C–O–C ester bond [18]. These chlorogenic acids are present in significantly greater amounts in coffee and its by-products than in barley and corn. Carbohydrates also exhibit several absorption bands in the range of 1500–700 cm−1 [19, 20], so it is expected that this class of compounds will contribute to many of the bands observed in the spectra.
Particularly, the skeletal mode vibrations of the glycosidic linkages in starch (present in corn and barley but not in the other samples) are usually observed in the 950–700 cm−1 wavenumber range, the so-called anomeric region of the spectrum [21]. Notice in Figure 2 that the sharp bands in the region of 950–700 cm−1 are coincident with the spectra of corn and barley but shifted in relation to the bands for the spectra of coffee, spent coffee, and coffee husks. These differences can be attributed to the different types of polysaccharides present in coffee and its adulterants. β-Glycosidic links are expected to appear in coffee and by-products in association with arabinogalactans, galactomannans, and cellulose, whereas α-glycosidic links should primarily appear in corn and barley due to the presence of starch. Other substances that naturally occur in coffee are reported to present absorbance bands in the range of 1700–1400 cm−1 [9]. Examples include caffeine (1700–1600 cm−1) and trigonelline (1650–1400 cm−1), as pointed out in the literature [22, 23].

An evaluation of the coefficients shown in Figure 1 indicates that, besides the previously discussed regions, the only other peaks with significant values of correlation coefficient are those at 2918 and 2850 cm−1. In the mean spectra shown in Figure 2, two significant absorption bands can be clearly seen between 2920 cm−1 and 2852 cm−1, which are more intense in coffee and spent coffee grounds spectra. Such bands can be partly assigned to unsaturated and saturated lipids present in coffee, corn, and barley oils, which do not undergo changes during roasting, and, more specifically, the band at ~2852 cm−1 can be attributed to stretching of C–H bonds of the methyl (–CH3) group in the caffeine molecule [9]. This latter band is less evident in the spectra for barley and corn in comparison to the others, since corn and barley do not contain caffeine. The intensities of such bands are clearly affected by the levels of both caffeine and lipids in coffee, primarily by caffeine in coffee husks (virtually devoid of oil), and by the lipids in roasted corn, roasted barley, and spent coffee grounds. The majority of the caffeine present in coffee is extracted during soluble coffee production whereas the lipid fraction is only partially extracted; thus spent coffee grounds may be considered to be devoid of caffeine while still containing a significant amount of lipids.

In view of the above, the tested ranges were 4000–700 cm−1 (full spectra), 1735–700 cm−1, and 1135–700 cm−1. New models were built using these selected regions, with SNV and mean centering as the preprocessing strategy. Table 3 shows the PLS results for this evaluation. As can be seen in Table 3, the selection of spectral ranges did not improve prediction; the model built with full spectra presented better prediction capacity than the other models.
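The variable selection step can be sketched as follows, assuming that the correlation plot corresponds to the Pearson correlation between the absorbance at each wavenumber and the adulteration level (an assumption; the exact quantity plotted in Figure 1 is not specified here). The function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_profile(spectra, levels):
    """Correlation between each wavenumber column and the adulteration levels."""
    return [pearson(column, levels) for column in zip(*spectra)]


def select_range(wavenumbers, spectra, low, high):
    """Keep only the variables with low <= wavenumber <= high (in cm-1)."""
    keep = [i for i, w in enumerate(wavenumbers) if low <= w <= high]
    return [wavenumbers[i] for i in keep], [[s[i] for i in keep] for s in spectra]
```

For instance, `select_range(wavenumbers, spectra, 700, 1735)` would return the subset of variables used for the 1735–700 cm−1 model, where `wavenumbers` and `spectra` are hypothetical inputs.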

As the best PLS model obtained was built with full spectra and its data were submitted to SNV and mean centering, the next step was to optimize it by using the procedure for detection of outliers. The outliers were detected at the 99% confidence level, and the results are summarized in Table 4. The optimization of the validation set was only performed after finishing the optimization of the calibration set. In addition, no more than three rounds of outlier detection (four models) should be performed, in order to avoid the “snowballing effect,” in which repeated rounds continue to identify outliers [15]. As can be seen from Table 4, three rounds of outlier detection were performed. In the final model (4th), twenty-three outliers were detected in the calibration set (corresponding to 22% of the samples) and fourteen in the validation set (corresponding to 20.5% of the samples). Most of the outliers identified were associated with samples that presented the highest levels of adulteration (over 40%). The optimized model obtained after outlier removal consisted of 79 and 54 samples in the calibration and validation sets, respectively. It was built with 8 LV that together explained 88.7 and 99.5% of the accumulated variance in X (spectra data) and in Y (adulterant concentration), respectively. The RMSEC and RMSEP values were 0.69 and 2.00, respectively; the obtained correlation coefficients of calibration and validation were both 0.99. The curve of experimental values versus predicted values of the optimized model is shown in Figure 3(a). As can be seen by examination of the plot, this model is capable of predicting adulteration levels with accuracy. Residuals are randomly distributed about the mean value, which is satisfactorily close to zero, as shown in Figure 3(b).

A comparison of the model obtained in the present study with the one based on DRIFTS [10] is shown in Table 5. The types of adulterants and roasting conditions were the same in both studies, as was the outlier removal procedure. The major difference, besides employing distinct measurement techniques (DR versus ATR), is that with FTIR-ATR we employed a larger number of samples at lower adulteration levels. A comparison of the models indicates that the one based on FTIR-ATR is more robust and presents better prediction abilities, with much lower RMSEC and RMSEP values. This, together with the fact that it employed a larger number of samples at low levels of adulteration, indicates that FTIR-ATR is more appropriate for detection of adulteration in roasted and ground coffee.

4. Conclusion

PLS models of ATR spectra were successfully developed. The optimized model was built with full spectra (4000–700 cm−1) that were submitted to SNV and mean centering as the data preprocessing strategy. It was capable of predicting adulteration levels ranging from 0.5% to 40%. For this final model, the correlation coefficients were 0.99 for both calibration and validation sets, and the errors observed during calibration and validation were quite low, 0.69% and 2.00%, respectively. It can be concluded that, because the use of the full spectrum provided more robust models, the detection of adulteration and discrimination of adulterated and nonadulterated coffee samples cannot be attributed to a single class of components but rather depends on a variety of compounds, such as lipids, chlorogenic acids, caffeine, and polysaccharides. PLS and FTIR-ATR proved to be promising techniques, suitable for quantification of multiple adulterants in roasted and ground coffee.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.


Acknowledgments

The authors acknowledge financial support from the following Brazilian government agencies: CAPES, CNPq, and FAPEMIG.