This work presents the application of near-infrared (NIR) spectroscopy, combined with exploratory analysis of spectral data by principal component analysis, to the discrimination of ground Amazon cocoa seeds. Cocoa samples from different geographic regions of the state of Pará, Brazil (Medicilândia, Tucumã, and Tomé-Açu), were evaluated. The samples collected from each region were divided into four groups distinguished by the treatment applied: fermented (1, with fat, and 2, fat-free) and unfermented (3, with moisture, and 4, dried). Each set of samples was analyzed separately to identify the influence of moisture, fermentation, and fat on the geographical differentiation of the three regions. The results show that it was not possible to differentiate the unfermented seed samples by geographic origin. However, fermentation was crucial for efficient discrimination, providing more defined clusters for each geographic region. The presence of fat in the seeds was decisive for obtaining the best model of geographic discrimination.

1. Introduction

Cocoa (Theobroma cacao L.) is one of the most important commercial crops in the world. It supplies inputs to various industries, such as food, pharmaceuticals, and cosmetics, and accounts for a capital turnover of millions of dollars per year, significantly influencing the world economy and income generation in several developing countries [1].

It is the primary raw material for the production of chocolate, which contains functional and bioactive groups, such as polyphenols, responsible for anticancer [2], vasodilator [3], antioxidant [4, 5], and anti-inflammatory [6] bioactivity.

The quality of cocoa seeds and various food products is correlated with several aspects, including genotype, crop conditions, and climate. Knowledge of geographical origin is often recognized and appreciated by the food industry and consumers and is an important factor that can add value to cocoa beans [7].

Unfermented cocoa seeds have an estimated chemical composition of 1% organic acids, 1% caffeine, 1–2% theobromine, 2–3% sucrose, 2–3% cellulose, 4–6% pentosans, 4–6% starch, 5–6% polyphenols, 10–15% proteins, 30–32% lipids, and 32–39% moisture [1, 8]. The profiles of these compounds differ even before fermentation, according to the region where the seeds were cultivated.

The fermentation and drying stages are crucial in cocoa processing, especially for the formation of the color, aroma, and flavor of chocolate. These steps lead to profound changes in the chemical structure of cocoa seeds, changing the profile of phenolics, sugars, peptides, and triglycerides, among others [9–12]. Fat is the main constituent of fermented cocoa seeds; its fatty acid profile changes during fermentation and directly affects the texture, viscosity, melting behavior, aroma, and flavor of the food produced from it [13].

According to the report from the Executive Committee of the Cocoa Plantation Plan (Ceplac), the state of Pará reinforced its position as the largest cocoa producer in Brazil in 2017 [14], and this production is dispersed in different localities, which produce fermented cocoa seeds with different sensory characteristics.

Geographical identification can help in the quality control and traceability management of these seeds. This identification and geographical authentication can be performed through specific laboratory analyses, which are mostly time-consuming and tedious and require chemical products that are sometimes harmful to the environment [15–17].

Infrared (IR) spectroscopy, particularly in the near-infrared (NIR) region, is a simple and fast method that generates no chemical waste and requires minimal sample handling. This method has been used effectively to determine the origin of various food and nonfood products [18].

Numerous food matrices have already been discriminated by their geographical origins using the NIR technique, among them, products such as honey [19], alcoholic beverages [20], mushrooms [21], green tea [22], butter [23], and wines [24].

The efficiency of the NIR technique has already been tested on cocoa from different regions of the world, and several components of the cocoa have been analyzed and correlated with the NIR method, such as color, volatile compounds, phenolic compounds, antioxidants, fermentation index, and fat [7, 17, 25–29].

In addition, multivariate classification methods (LDA, KNN, BPANN, and SVM) have been tested by Teye et al. [7] for the discrimination of cocoa originating from different regions. In that study, Teye et al. [7] used 194 samples and evaluated the discrimination of beans from seven growing regions in Ghana (Ashanti, Brong-Ahafo, Central, Eastern, Volta, northwest, and southwest). PCA was used to test the models, as it provides relevant information about the trend of the samples and the formation of clusters [17, 25–29].

The purpose of this article was to use the NIR technique to evaluate the influence of sample preparation on the geographical discrimination of cocoa fruits from the cities of Medicilândia, Tucumã, and Tomé-Açu, which are the main producing regions in the state of Pará, Brazil.

2. Material and Methods

2.1. Selecting Collection Regions

The regions for this work were selected on the basis of production volume and geographical distinction. Cities from the three main producing regions of the state of Pará (Medicilândia, Tucumã, and Tomé-Açu) that are geographically distant from each other were chosen. In the Transamazon region, southwest of Pará, the city of Medicilândia (geographic coordinates 03°26′45″ S and 52°53′20″ W) was chosen, while in southeastern and northeastern Pará, the cities of Tucumã (geographic coordinates 06°44′52″ S and 51°09′39″ W) and Tomé-Açu (geographic coordinates 02°28′41.3″ S and 48°16′50.7″ W) were chosen, respectively. Production quality was the second criterion; the region of Medicilândia stands out in the international market for its superior quality.

2.2. Sample Collection and Preparation

A total of 189 samples were used: 117 samples of raw cocoa fruits (RC) and 72 dried and ground fermented cocoa samples (DF). Samples were collected on three rural properties in the cities of Tomé-Açu (RC = 40, DF = 20), Tucumã (RC = 40, DF = 21), and Medicilândia (RC = 37, DF = 31), corresponding to three different geographic regions of the state of Pará, Brazil. Fifteen samples were used for external prediction of the classification model performance.
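As a quick consistency check on the sampling design, the per-city counts reported above can be tallied. The sketch below is illustrative only; the dictionary layout and variable names are assumptions made for the example and are not part of the original workflow.

```python
# Sample counts per city as reported in the text
# (RC = raw cocoa, DF = dried fermented cocoa).
counts = {
    "Tomé-Açu": {"RC": 40, "DF": 20},
    "Tucumã": {"RC": 40, "DF": 21},
    "Medicilândia": {"RC": 37, "DF": 31},
}

# Sum each sample type over the three cities.
total_rc = sum(city["RC"] for city in counts.values())
total_df = sum(city["DF"] for city in counts.values())
total = total_rc + total_df

print(total_rc, total_df, total)  # prints: 117 72 189
```

The totals agree with the 117 RC, 72 DF, and 189 overall samples stated in the text.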

Raw cocoa samples were collected following the traditional harvesting practice of each region, choosing random fruits at the edges and the center of the plantation.

After the harvest, the fruits were broken and stripped; the cotyledons were separated and milled individually in a multifunctional mill (model A11, IKA, Staufen, Germany). The milled material was then sieved in a 600 μm mesh and stored at −22°C until the analysis.

Fermented cocoa samples were obtained after the fermentation and drying processes, according to the traditional methods of each producer in the three regions mentioned above. These samples were dried to constant weight in an oven (model 80 series, Lucadema, São José do Rio Preto, SP, Brazil) at 80°C for moisture standardization. After the drying process, the dried fermented cocoa cotyledons were milled individually and then sieved in a 600 μm mesh and stored at −22°C until the analysis.

After the NIR analysis, the 117 samples of raw cocoa (RC) were dried to constant weight in an oven under air circulation at 80°C. After drying these samples, the spectra were again obtained.

2.3. Defatting of Cocoa Samples

Dried fermented cocoa samples (56 samples) were exhaustively degreased with petroleum ether (Synth, Diadema, SP, Brazil) at a temperature of 55 ± 1°C for 24 hours, according to method 963.15 of the Association of Official Analytical Chemists (AOAC) [30]. After fat removal, DF samples were kept for 24 hours at 80°C in an air circulation oven (model 80 series, Lucadema, Brazil) to remove the solvent. After solvent removal, dried fermented degreased cocoa samples (DFD) were stored at −22°C until analysis.

Through the procedures described above, it was possible to evaluate cocoa samples in four categories: raw cocoa (RC, 117), dried unfermented cocoa (DU, 117), dried fermented cocoa (DF, 72), and dried fermented degreased cocoa (DFD, 56) (Table 1).

2.4. Obtaining NIR Spectra

NIR spectra were obtained using an MPA FT-NIR spectrometer (Bruker Optics, Ettlingen, Germany). The spectral data were acquired in absorbance mode in the spectral range from 3500 to 12500 cm−1, with 16 cm−1 resolution and an average of 32 scans per spectrum. For the spectral readings, 3 ml vials holding about 1 g of the milled and sieved cotyledon were used. The spectral readings were performed at approximately 25°C.
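For readers who work in wavelength units, the acquisition range can be converted with the standard relation λ (nm) = 10^7 / ν (cm−1). The following is a minimal sketch of that conversion; the function name is hypothetical and chosen only for this example.

```python
def wavenumber_to_nm(wavenumber_cm1: float) -> float:
    """Convert a wavenumber in cm^-1 to a wavelength in nm (lambda = 1e7 / nu)."""
    return 1e7 / wavenumber_cm1


# Limits of the acquisition range stated in the text.
low_nm = wavenumber_to_nm(12500.0)   # short-wavelength limit
high_nm = wavenumber_to_nm(3500.0)   # long-wavelength limit

print(f"{low_nm:.0f} nm to {high_nm:.0f} nm")  # prints: 800 nm to 2857 nm
```

Under this conversion, the range of 3500 to 12500 cm−1 corresponds to roughly 800 to 2857 nm.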

2.5. Discrimination Model Development by Geographical Origin

The spectroscopy software used to construct the discrimination models was OPUS 6.5 (Bruker Optics, Ettlingen, Germany). The spectral data were preprocessed in OPUS 6.5 according to the type of sample. For the RC, DF, and DFD cocoa samples, the vector normalization pretreatment (SNV, standard normal variate) was applied, while for the DU samples, SNV was applied together with the first derivative.
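The pretreatments named above can be illustrated outside OPUS. The sketch below is not the OPUS implementation; it assumes the textbook definition of SNV (each spectrum is centered on its own mean and divided by its own standard deviation) and approximates the first derivative by finite differences with NumPy. Vector normalization as implemented in some software divides the centered spectrum by its Euclidean norm instead, which changes only a constant scale factor per spectrum. The function names and the synthetic spectra are assumptions made for illustration.

```python
import numpy as np


def snv(spectra):
    """Standard normal variate: center and scale each spectrum (row) individually."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std


def first_derivative(spectra, wavenumbers):
    """Finite-difference first derivative of each spectrum along the wavenumber axis."""
    spectra = np.asarray(spectra, dtype=float)
    wavenumbers = np.asarray(wavenumbers, dtype=float)
    return np.gradient(spectra, wavenumbers, axis=1)


# Synthetic demonstration (not measured data): the second spectrum is the first
# one with a multiplicative and an additive distortion, as scatter might cause.
wavenumbers = np.linspace(4000.0, 9000.0, 6)
base = np.array([0.2, 0.5, 0.9, 0.4, 0.3, 0.1])
spectra = np.vstack([base, 2.0 * base + 0.3])

corrected = snv(spectra)
derivative = first_derivative(corrected, wavenumbers)
print(np.allclose(corrected[0], corrected[1]))  # prints: True
```

In this toy case SNV maps both rows onto the same corrected spectrum, which is why it is commonly used to reduce scatter-related baseline and scaling differences before exploratory analysis.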

After the application of pretreatments, the data were processed to develop the discrimination model by the geographical origin of each set of samples. The exploratory analysis method, PCA, was applied to multivariate data to make a visual inspection of results more evident.
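To make the exploratory step concrete, the following minimal sketch computes PCA scores by singular value decomposition of the mean-centered data matrix. It is a generic illustration with synthetic data and is not the routine used by OPUS; the function name, the random data, and the group offsets are assumptions.

```python
import numpy as np


def pca_scores(X, n_components=2):
    """Return PCA scores, loadings, and explained variance ratios via SVD."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # mean-center each variable
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    explained = (S ** 2) / np.sum(S ** 2)        # fraction of variance per component
    return scores, Vt[:n_components], explained[:n_components]


# Synthetic "spectra": two groups of 5 samples with 20 variables each,
# separated by a constant offset (stand-ins for two regions).
rng = np.random.default_rng(0)
group_a = rng.normal(0.0, 0.1, size=(5, 20))
group_b = rng.normal(0.0, 0.1, size=(5, 20)) + 1.0
X = np.vstack([group_a, group_b])

scores, loadings, explained = pca_scores(X, n_components=2)
print(scores.shape)  # prints: (10, 2)
```

Plotting the first score column against the second gives the kind of score plot inspected for clusters; in this synthetic case the two groups fall on opposite sides of the first component.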

The spectral region that best differentiated the geographic regions was chosen by the software operator, based on information related to the chemical composition of the samples, such as water, fat, and protein content.
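Restricting the analysis to an operator-chosen window amounts to masking the wavenumber axis before modeling. The sketch below shows one way to do this with NumPy; the 4 cm−1 point spacing, the empty placeholder spectra, and the function name are assumptions for illustration, and the 6000 to 6800 cm−1 limits are used only as an example window.

```python
import numpy as np


def select_window(spectra, wavenumbers, low, high):
    """Keep only the spectral points with low <= wavenumber <= high."""
    spectra = np.asarray(spectra, dtype=float)
    wavenumbers = np.asarray(wavenumbers, dtype=float)
    mask = (wavenumbers >= low) & (wavenumbers <= high)
    return spectra[:, mask], wavenumbers[mask]


# Hypothetical wavenumber axis covering 3500 to 12500 cm^-1 in 4 cm^-1 steps.
wavenumbers = np.arange(3500, 12501, 4, dtype=float)
spectra = np.zeros((3, wavenumbers.size))  # placeholder spectra, no real data

window, window_axis = select_window(spectra, wavenumbers, 6000.0, 6800.0)
print(window.shape)  # prints: (3, 201)
```

The reduced matrix can then be passed to whatever pretreatment and exploratory analysis is applied to the full spectra.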

3. Results and Discussion

3.1. Spectral Analysis

The complete NIR spectra obtained from RC, DU, DF, and DFD samples are shown in Figure 1. The prominent peaks can be observed around 9000-8000, 8500-6000, 6000-5000, and 5000-4500 cm−1 within the NIR range. In the RC spectrum (Figure 1(a)), two intense absorption peaks are observed, at 5312 and 7202 cm−1, which are related to the water content of the samples and, according to [7], correspond to the region of the first overtone of the O-H stretch and O-H deformation. The water absorption bands should be removed to reduce interference with the bands corresponding to the CH and CO groups and combinations of amine groups [31]. Accordingly, Figures 1(b) and 1(c) show that, after the water was removed, the absorption peaks of other components in the samples became more evident.

The regions corresponding to 9000-7500, 6000-6030, 6030-4000, 4950-4500, and 4500-3850 cm−1 were used for all the calibration model development.

In none of the spectra presented in Figure 1 was it possible to discern differences between fruits from the different regions analyzed, even in the fermented samples. These differences were only noticed after mathematical treatments through the application of statistical analysis techniques for model development.

3.2. Discrimination Models for Raw Cocoa

For the RC cocoa samples, the spectral band (6200-4000 cm−1) could not discriminate groups of fruits from different regions, as observed in Figure 2(a).

The graph of the scores presented in Figure 2(b), referring to the spectral range between 4000 and 6200 cm−1, shows the lack of differentiation between the three regions’ scores. Bands referring to the phenol group, which can be observed in the regions corresponding to 3562-3322 cm−1, aromatic ring related bands, referring to the region of 2925-2854 cm−1 (attributed to the CH stretch of the aromatic ring) and 1645-1544 cm−1 (attributed to C in the aromatic ring) [7], were overlapped. This effect is directly related to the amount of water and fats in the RC cocoa samples.

Therefore, the high absorption of NIR radiation by the water present in the samples contributes negatively to the analysis, since the water bands overlap with the other bands of interest, making it challenging to construct an analytical and statistical model for the geographic discrimination of raw Amazonian cocoa.

3.3. Discrimination Models for Dried Unfermented Cocoa

After the removal of water by drying, the DU samples showed a trend toward the formation of groups related to geographic region. The best spectral range for DU discrimination was between 4300 and 9300 cm−1, as observed in Figure 2(c).

The bands located at 4300–4597, 4902–5199, 5805–6102, and 6406–9300 cm−1 in the DU spectrum are associated with the carbonyl group spectral regions, corresponding to the stretching combination (CH2 and CH), the first harmonic of CH in the aromatic ring, the combination of C-C and C-N, and the second harmonic of N-H [15]. These molecular vibrations are caused by functional groups corresponding to polyphenols, alkaloids, vicilin class globulins, proteins, amines, acids, polysaccharides, and other aromatic compounds [7]. These diverse functional groups act as a fingerprint for each sample from the different Amazon regions.

The differentiation between the samples from Tomé-Açu and Tucumã was evident, as observed in Figure 2(d). However, the scores of the Medicilândia samples overlapped with those of the samples from the other two regions. Several factors can explain the formation of characteristic groups of cocoa, such as chemical composition [32, 33], degree of fermentation [34], and genotype [35], which may be associated with the region of origin of the seeds [36].

According to [37], cocoa, like other food products, has its characteristics influenced by locality. This influence is directly related to its main components: alkaloids, polyphenols, proteins, amines, polysaccharides, acids, and diverse aromatic compounds, as previously mentioned [7].

3.4. Discrimination Models for Fermented Cocoa

The fermentation of cocoa beans is essential because of the diverse microbial processes that develop as a consequence of changes in temperature, pH, and oxygen availability. These processes promote significant biochemical changes in the type and concentration of flavor precursors in cocoa beans [37], making fermentation a crucial step in the formation of quality sensory attributes. Complex chemical and biochemical changes and interactions occur in the cocoa beans during fermentation, drying, and storage, contributing to the complexity and identity of Amazon cocoa [38].

Fermentation produces flavor and aroma precursors, such as free amino acids and peptides from the enzymatic degradation of proteins and reducing sugars from the enzymatic degradation of sucrose, as well as a significant increase in volatile compounds such as organic acids, esters, alcohols, and aldehydes, during and after the fermentation of cocoa beans. The concentration of these precursors depends on the stoichiometry of their parent compounds; it therefore reflects the nutritional quality of the cultivated soil, from which the nutrients necessary for the synthesis of those parent compounds were taken up, making these precursors good indicators of the production region [39].

Fermentation was a crucial factor in forming the characteristics of each region, because unlike the dried unfermented samples (DU), the dried fermented samples (DF), whose spectra are shown in Figure 2(e), allowed good discrimination between groups from different regions. The best spectral discrimination range observed was 6000 to 6800 cm−1.

These results indicate that fermentation strongly influences the biochemical characteristics of the samples, enough to allow a clear distinction between them, as shown in Figure 2(f).

Caligiani et al. [10] reported that the progressive fermentation of cocoa seeds causes the hydrolysis of peptides into amino acids, reducing the violet color of the cotyledons by the end of fermentation. The phenolic compounds are reduced upon drying; this reduction is mainly attributed to enzymatic action, followed by nonenzymatic reactions due to quinone polymerization, with a pH increase and high uptake of O2 during sun drying [9].

Sirbu et al. [12] estimated an increase in lipid content of 3 to 6% during fermentation, compared with the value before fermentation. In addition, the authors observed a change in the triacylglycerol profile caused by fermentation. Some polar triacylglycerols, such as derivatives of hydroxyl allyl fatty acid present in unfermented cocoa seeds, were not found in dried fermented cocoa seeds, thus allowing chemical differentiation between unfermented and fermented cocoa seeds.

According to the literature, we can see that fermentation causes changes in the profile and concentration of various components of cocoa. Such components or concentrations of components can be characteristic for each region, and because of this, it is possible to carry out geographic identification. The NIR technique associated with chemometrics can previously identify the similarity and differences in various types of samples and classify them, making it possible to visually identify groups that have similar or distinct characteristics.

In this work, 15 external samples (not used in the construction of the model) were used to test the efficiency of the constructed discrimination model (5 samples from Tomé-Açu, 5 from Tucumã, and 5 from Medicilândia). All of these samples (100%) were correctly classified according to their regions of origin.
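The text does not state which rule was used to assign the external samples to a region. Purely as an illustration of how such an external check can be scored, the sketch below assumes a nearest-centroid rule in the space of PCA scores and computes the fraction of correct assignments. The rule, the function names, and the synthetic score values are all assumptions and do not reproduce the OPUS procedure or the measured data.

```python
import numpy as np


def nearest_centroid_predict(train_scores, train_labels, new_scores):
    """Assign each new sample to the class whose mean training score is closest."""
    train_scores = np.asarray(train_scores, dtype=float)
    train_labels = np.asarray(train_labels)
    new_scores = np.asarray(new_scores, dtype=float)
    classes = np.unique(train_labels)
    centroids = np.array(
        [train_scores[train_labels == c].mean(axis=0) for c in classes]
    )
    # Euclidean distance from every new sample to every class centroid.
    distances = np.linalg.norm(
        new_scores[:, None, :] - centroids[None, :, :], axis=2
    )
    return classes[distances.argmin(axis=1)]


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))


# Synthetic, well-separated score clusters (illustrative values only).
train_scores = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [10.0, 0.0], [10.0, 0.1]]
train_labels = ["Medicilândia", "Medicilândia", "Tomé-Açu", "Tomé-Açu", "Tucumã", "Tucumã"]
external_scores = [[0.2, 0.1], [4.9, 5.2], [9.8, 0.0]]
external_labels = ["Medicilândia", "Tomé-Açu", "Tucumã"]

predicted = nearest_centroid_predict(train_scores, train_labels, external_scores)
print(accuracy(external_labels, predicted))  # prints: 1.0
```

In practice, the external spectra would first be pretreated and projected with the mean and loadings of the training model before any such rule is applied.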

3.5. Discrimination Models for Degreased Cocoa

To evaluate the importance of the lipid content of the samples in geographic differentiation, the DF cocoa samples were degreased and the spectral evaluation was carried out again, using the same spectral treatment applied to discriminate the nondegreased fermented samples. Scores of the degreased samples are shown in Figure 3.

As observed in Figure 3, even though the mathematical treatment used for the DF (dried fermented) samples was applied, no separation between the three groups was observed; however, a partial separation between two regions (Tomé-Açu and Tucumã) was observed, the same as that seen in the DU samples. This shows that fat removal made it difficult to classify the samples into three geographic regions using the DFD sample discrimination model, partially corroborating the results of [40], who showed a strong correlation between fatty acid composition and geographic differentiation. In our research, fermented samples were discriminated more easily with their original butter content.

The lipid and protein storage cells in cocoa seeds have complex cytology, composed of a compacted cytoplasm with multiple vacuoles, where proteins and lipids can be found, as well as other components such as starch granules, which are important for the definition of specific flavor and aroma characteristics. These biochemical and cytological characteristics of the cells vary in proportion and morphology from region to region, affected by several physical, chemical, and climatic factors [39].

Since there was an indication of partial separation of the DFD samples, a model with two separation steps was developed. In the first step, the spectral region of 5950 to 6835 cm−1 was used (Figure 4(a)), in which the regions of Tomé-Açu and Tucumã were separated (Figure 4(b)). In the second step, applied to the sample data for the two groups that did not show clear separation in the first step (Medicilândia and Tucumã), the spectral region of 7100 to 9300 cm−1 was used (Figure 4(d)). Two spectral regions were thus used to build the complete separation model of the DFD samples, as shown in Figure 5.

The DFD cocoa seed sample separation model was performed in two stages. The first stage used the spectral data of the cocoa from the three regions (Medicilândia, Tucumã, and Tomé-Açu) (Figure 4(b)), and the second stage used the spectral data of the cocoa from Medicilândia and Tucumã, whose separation was less evident in the first stage (Figure 4(c)). Thus, compared to the previous separation model (of DF seeds), the lipid fraction had considerable influence; however, it was not essential for geographic identification.
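The two-stage procedure described above can be read as a decision cascade: a first model applied to all samples, followed by a second model applied only to the samples that the first one leaves unresolved. The sketch below illustrates that control flow under one reading of the text, namely that the first stage isolates Tomé-Açu and the second stage splits Medicilândia from Tucumã. The stand-in classifiers, their thresholds, and the feature values are hypothetical; only the two spectral windows come from the text.

```python
from typing import Callable, Sequence

# Spectral windows (cm^-1) reported for the two stages of the DFD model.
STEP1_WINDOW = (5950.0, 6835.0)
STEP2_WINDOW = (7100.0, 9300.0)

Classifier = Callable[[Sequence[float]], str]


def two_step_classify(sample: Sequence[float], step1: Classifier, step2: Classifier) -> str:
    """Apply the first-stage model; defer to the second stage if it is unresolved."""
    first = step1(sample)
    if first == "Tomé-Açu":
        return first
    return step2(sample)


# Hypothetical stand-in models acting on two made-up features per sample:
# sample[0] summarizes the first window, sample[1] the second window.
def step1_model(sample: Sequence[float]) -> str:
    return "Tomé-Açu" if sample[0] > 0.0 else "unresolved"


def step2_model(sample: Sequence[float]) -> str:
    return "Medicilândia" if sample[1] > 0.0 else "Tucumã"


print(two_step_classify([1.2, -0.4], step1_model, step2_model))   # prints: Tomé-Açu
print(two_step_classify([-0.8, 0.5], step1_model, step2_model))   # prints: Medicilândia
print(two_step_classify([-0.8, -0.5], step1_model, step2_model))  # prints: Tucumã
```

Structuring the model this way lets each stage use the spectral window in which the remaining groups differ most, at the cost of propagating any first-stage error to the final assignment.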

According to Sirbu et al. [12], after fermentation, cocoa presents a different profile of triacylglycerol and starch granules, which influence the spectroscopic behavior. And after the fat removal, the other components that changed during fermentation remain there, making geographic differentiation possible [12].

The application of the NIR technique is quite simple; however, the path to acquiring spectral information and analytical data is time-consuming and costly. Once the database of spectral and analytical information is obtained, the mathematical model can be constructed through the use of computational algorithms that plan and/or optimize such experimental procedures. From this, this elaborated model can be used to analyze external samples only with spectral information. The spectral data will be correlated with the data previously obtained and the result is estimated. In this way, the analysis becomes simple, fast, and economical. It is enough that, after the construction of the prediction model, the spectra of the material in question are obtained and the model is applied.

4. Conclusions

The results showed that it was possible to evaluate the influence of sample preparation on the geographic discrimination of cocoa from different geographic regions, in the Pará state, using the NIR technique.

The importance of the fermentation process for geographic identification was observed, as it provided specific characteristics for each region evaluated. The fermented cocoa lipids were crucial in the formation of groups in the fermented samples, also indicating that fermentation is a crucial step in developing specific characteristics for each collection region. Thus, it is assumed that the biota of each region affects the fatty acids contained in cocoa butter differently, since similar results were not found in defatted samples.

The effect of the climate and production process, although not directly explicit, cannot be ruled out, considering that they can significantly influence the type of biota and the fermentation process, as well as other possible variables.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Ethical Approval

Ethics approval was not required for this research.

Conflicts of Interest

The authors declare that there are no conflicts of interest.


Acknowledgments

The authors would like to thank the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES, Finance Code 01) for the doctoral scholarship of the first author and the Programa de Pós-graduação em Ciência e Tecnologia de Alimentos of the Federal University of Pará (PPGCTA/UFPA) for the great support offered. This study was financed by the Instituto Tecnológico Vale (ITV) (Cacau P2) and supported by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior-CAPES (Process Number: 1537413).