The ability to predict maturity from total soluble solids (TSS) and to distinguish organic from inorganic pineapple fruits using near-infrared (NIR) spectral fingerprints would benefit farmers and consumers alike. In this study, a portable NIR spectrometer and chemometric techniques were combined to simultaneously distinguish organically produced pineapple fruits from conventionally produced ones (i.e., organic and inorganic) and to predict TSS. A total of 90 intact pineapple fruits were scanned with the NIR spectrometer, and a digital refractometer was used to measure TSS in the extracted juice. After several preprocessing techniques were tested, multivariate models were built using principal component analysis (PCA), K-nearest neighbor (KNN), and linear discriminant analysis (LDA) to identify the two classes (organic and conventional fruits), while partial least squares regression (PLSR) was used to determine the TSS of the fruits. Among the identification techniques, the model combining multiplicative scatter correction (MSC), PCA, and LDA (MSC-PCA-LDA) distinguished organic from conventionally produced fruits with a 100% identification rate. For TSS quantification, the MSC-PLSR model with 5 latent variables (factors) gave Rc = 0.851 and RMSEC = 0.950 °Brix in the calibration set and Rp = 0.854 and RMSEP = 0.842 °Brix in the prediction set. The results indicate that a portable NIR spectrometer coupled with appropriate chemometric tools could be used for rapid, nondestructive examination of pineapple quality and to detect fraud in which conventionally produced fruits are mislabeled as organic. This would be helpful to farmers, consumers, and quality control officers.

1. Introduction

Pineapple (Ananas comosus (L.) Merr.) is the most economically significant crop in the family Bromeliaceae, valued for its exceptional juiciness, vibrant tropical flavour, and considerable health benefits. Pineapple fruit is a good source of vitamin C, fiber, and minerals. It also contains sugars, bromelain (a protein-digesting enzyme), citric acid, malic acid, and vitamins A and B [1]. Quality evaluation and assurance of pineapple fruits before export, during processing, and in the fresh market are required to ensure quality and safety. These assessments are normally based on internal quality traits such as total soluble solids (TSS), firmness, and acidity, of which TSS (°Brix) has been established as the most important. For instance, TSS is among the most important internal quality attributes for determining fruit maturity and harvesting time and for assessing and grading the postharvest quality of fruits [2].

At present, the conventional technique for determining the internal quality parameters of fruits such as pineapple is destructive. It is usually cumbersome and wasteful, and it requires specialised equipment, elaborate procedures, and trained personnel, which results in high analysis costs and makes it impossible to analyze every fruit produced [3]. A representative sample is therefore often used to infer the quality of the whole lot, which can lead to misjudgement. Rapid nondestructive prediction of TSS in pineapple would thus be of great value in determining the best harvesting time, with important consequences for eating quality, and would help meet the ever-increasing consumer demand for consistently high-quality fruits.

Furthermore, the premium price of organically produced pineapples compared with conventional ones has led to mislabeling, a form of food fraud, to gain undue financial advantage. Studies have shown that consumer demand for organic food is growing and that organic agriculture is more profitable because of the higher prices farmers receive for their produce [4, 5]. Organic production is, in general, a system that excludes the use of synthetic fertilizers, pesticides, growth regulators, and other agrochemicals [6]. However, the techniques used to distinguish organically produced fruits such as pineapples from conventional ones are relatively expensive, destructive, and time consuming, and they often require elaborate sample preparation. Hence, simultaneous assessment of pineapple authenticity (organic or conventional) and quality (TSS) using a portable NIR spectrometer and chemometric techniques would be very beneficial, offering rapid examination of pineapple fruit quality.

Near-infrared spectroscopy provides an alternative for determining pineapple quality parameters. It offers rapid and nondestructive detection of food quality and safety and has found use in both qualitative and quantitative analysis in the food industry. NIR spectroscopy is simple and requires minimal or no sample preparation, and other authors have noted that it could be among the most commonly used techniques owing to its speed, noncontact measurement, and low operating cost [7–9]. For fruit quality measurements, NIR spectroscopy has been used to detect nitrate levels in intact pineapple [10], soluble solids content and acidity in kiwifruit [11], and internal quality indices of pear [12]. Furthermore, the miniaturization of NIR spectroscopy has led to commercial handheld or portable spectroscopic systems that offer additional speed, simplicity, and sensitivity, and their portability makes them ideal tools for in situ agrifood quality evaluation [13, 14]. Other researchers have used handheld or portable NIR spectrometers to determine fruit quality parameters such as TSS, titratable acidity (TA), and sugar content [7, 15, 16]. Cayuela and Weiland used two portable spectrometers to predict several quality parameters in intact oranges [17], while Sánchez et al. improved the performance of a portable NIR instrument for intact nectarines [18]. However, to the best of our knowledge, no study has investigated the feasibility of using a portable NIR spectrometer coupled with chemometric techniques to simultaneously discriminate organically produced pineapples from conventional ones and predict their total soluble solids (TSS) nondestructively.

Therefore, the main objective of this study was to evaluate the potential of a portable NIR spectrometer for rapid and nondestructive identification (organic versus conventional) and quantification of quality parameters such as TSS in intact pineapple fruits. The specific objectives were to determine the best multivariate technique for identifying organic and conventional fruits and to predict the TSS of intact pineapple fruits.

2. Materials and Methods

2.1. Pineapple Fruit Samples

In this study, 90 sugarloaf pineapple fruits at different maturity stages were obtained directly from pineapple farmers in the Central Region of Ghana and transported to the Teaching and Research Laboratory of the School of Agriculture, University of Cape Coast. The fruits comprised 30 organically produced and 60 conventionally produced pineapples. They were stored at 26°C (±1°C) for two days before measurements were taken.

2.2. Sample Spectra Acquisition

The spectrum of each pineapple was collected in reflectance mode using a handheld spectrometer (SCIO™) covering the 740–1070 nm range at 1 nm resolution. The lower part of each fruit was scanned three times, with the fruit rotated by 120° between scans. Scanning was done at an ambient temperature of 26 ± 1°C and a relative humidity of 60%. Figure 1 shows the scanning setup using the SCIO NIR spectrometer.

2.3. Reference Measurements (TSS/°Brix)

Total soluble solids (TSS) were determined using a digital refractometer (model PAL-1, range 0–35% Brix; Atago, Tokyo, Japan) following the methods described in [15, 16, 19]. For each pineapple fruit, the base was cut off and juiced, and about 1.0 mL of the juice was used for TSS measurement with the refractometer. Measurements were performed in triplicate and the results expressed as °Brix.

2.4. Data Partition

After preprocessing, the dataset from the 90 samples was divided into two subsets: a calibration set (68 samples) for developing the models and a prediction set (22 samples) for evaluating their predictive ability. To avoid bias, about 75% of the samples from each class (organic and inorganic) were assigned to the calibration set and the remainder to the prediction set, giving the approximately 3:1 calibration/prediction division shown in Table 1.
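The per-class 3:1 split described above can be sketched as follows. This is a minimal illustration only: the paper does not say how samples were chosen within each class, so random selection (fixed seed) and rounding each class's calibration count up are assumptions. With 30 organic and 60 conventional fruits this rule reproduces the reported 68/22 totals (23 + 45 calibration, 7 + 15 prediction).

```python
import math
import numpy as np

def stratified_split(labels, cal_fraction=0.75, seed=0):
    """Return (calibration_idx, prediction_idx), taking cal_fraction of each class.

    Assumptions (not stated in the paper): samples are drawn at random within each
    class, and each class's calibration count is rounded up.
    """
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    cal, pred = [], []
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        n_cal = math.ceil(cal_fraction * len(idx))
        cal.extend(idx[:n_cal])
        pred.extend(idx[n_cal:])
    return np.sort(cal), np.sort(pred)

# 30 organic and 60 conventional fruits, as in Section 2.1
labels = np.array(["organic"] * 30 + ["conventional"] * 60)
cal_idx, pred_idx = stratified_split(labels)
print(len(cal_idx), len(pred_idx))  # 68 22
```

Splitting within each class keeps the organic/conventional ratio roughly equal in both subsets, which is the bias the authors aimed to avoid.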

2.5. Software Device

Spectral recordings, stored in a cloud-based database together with their corresponding reference values at the time of scanning, were downloaded using a SCIO Lab research license and imported into MATLAB version 9.5.0 (MathWorks Inc., USA) running on Windows 10, where all preprocessing treatments and multivariate algorithms were performed.

2.6. Spectra Preprocessing Techniques

The average raw spectra of the pineapple samples are shown in Figure 2(a), and the pretreated spectra are shown in Figures 2(b) and 2(c). Spectral preprocessing is an integral part of modelling because it separates background information and noise from the useful properties of the scanned samples [3, 20]. In this research, two spectral preprocessing techniques, mean centering (MC) and multiplicative scatter correction (MSC), were applied because the models developed using the raw spectra did not give the desired results.

MC is a widely used preprocessing method that simply adjusts a dataset, by subtracting the mean spectrum from every spectrum, so that the centroid of the data is moved to the origin of the coordinate system [21].
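Mean centering amounts to subtracting the wavelength-wise mean spectrum from each spectrum. A minimal NumPy sketch is shown below; the small absorbance matrix is made up for illustration, and returning the mean so it can be reused on prediction-set spectra is a common practice assumed here, not something the paper specifies.

```python
import numpy as np

def mean_center(X):
    """Subtract the column (wavelength-wise) mean so the data centroid sits at the origin.

    X has shape (n_samples, n_wavelengths). The mean spectrum is returned as well so the
    same shift can be applied to new (e.g. prediction-set) spectra.
    """
    mean_spectrum = X.mean(axis=0)
    return X - mean_spectrum, mean_spectrum

# Hypothetical spectra: 3 samples x 3 wavelengths
X = np.array([[0.40, 0.52, 0.61],
              [0.44, 0.58, 0.65],
              [0.36, 0.46, 0.57]])
Xc, mu = mean_center(X)
print(np.allclose(Xc.mean(axis=0), 0.0))  # True
```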

MSC, on the other hand, is a useful preprocessing technique for correcting light scattering and baseline shifts and tilts; for more information, refer to [21, 22]. For a measured dataset containing a set of spectra $S_i$ (i = 1, 2, …, K) with data points $S_{ij}$ (j = 1, 2, …, N), the mean spectrum $S_m$ is obtained by averaging the spectra at each data point:

$$S_{m,j} = \frac{1}{K}\sum_{i=1}^{K} S_{ij}, \qquad j = 1, 2, \ldots, N$$

Each spectrum is then expressed in terms of the mean spectrum with an additive effect $x_i$ and a multiplicative effect $y_i$:

$$S_i = x_i + y_i S_m + e_i$$

where the residual error vector $e_i$ gives information about the random noise. The coefficients $x_i$ and $y_i$ are estimated by least-squares regression of $S_i$ on $S_m$, and the MSC-corrected spectra are calculated as

$$S_{i,\mathrm{MSC}} = \frac{S_i - x_i}{y_i}$$
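A minimal sketch of MSC follows, assuming the usual formulation: each spectrum $S_i$ is regressed on the mean spectrum $S_m$ as $S_i = x_i + y_i S_m + e_i$ and then corrected as $(S_i - x_i)/y_i$. The synthetic Gaussian band and the offsets/scalings below are invented for illustration; they only show that spectra differing purely by additive and multiplicative effects collapse onto the reference after correction.

```python
import numpy as np

def msc(X, reference=None):
    """Multiplicative scatter correction.

    Each row S_i of X is regressed on the reference spectrum S_m
    (default: mean spectrum) as S_i = x_i + y_i * S_m + e_i, and the
    corrected row is (S_i - x_i) / y_i. The reference is returned so
    prediction-set spectra can be corrected against the calibration mean.
    """
    X = np.asarray(X, dtype=float)
    if reference is None:
        reference = X.mean(axis=0)          # S_m
    corrected = np.empty_like(X)
    for i, spectrum in enumerate(X):
        # degree-1 least-squares fit returns [slope, intercept] = [y_i, x_i]
        slope, intercept = np.polyfit(reference, spectrum, deg=1)
        corrected[i] = (spectrum - intercept) / slope
    return corrected, reference

# Synthetic spectra over the SCIO range (740-1070 nm, 1 nm steps)
wavelengths = np.linspace(740, 1070, 331)
base = np.exp(-((wavelengths - 970) / 40) ** 2)   # made-up absorbance band
X = np.array([0.10 + 1.0 * base,
              0.30 + 1.4 * base,
              0.05 + 0.8 * base])
corrected, ref = msc(X)
print(np.allclose(corrected, ref))  # True
```

Passing the calibration reference explicitly when correcting prediction spectra keeps both subsets on the same scale; whether the authors did this is not stated.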

2.7. Principal Component Analysis (PCA)

PCA is an unsupervised data description and dimension reduction technique that is widely used to handle large spectral datasets [23]. It is normally the first step of data analysis, used to detect patterns in the data matrix and to visualize data trends in a low-dimensional space [24]. For more information, refer to [25].
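PCA can be computed from the singular value decomposition of the mean-centred spectral matrix; the fraction of variance captured by each PC is its squared singular value over the sum of all squared singular values, and summing the first three fractions gives a cumulative figure of the kind reported in Section 3.2. The sketch below uses random data as a stand-in for the 90 × 331 spectral matrix, so its printed variance is not the paper's result.

```python
import numpy as np

def pca(X, n_components=3):
    """PCA via SVD of the mean-centred data matrix.

    Returns scores (n_samples x n_components), loadings
    (n_wavelengths x n_components) and the fraction of total variance
    explained by each retained component.
    """
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]   # equals Xc @ loadings
    loadings = Vt[:n_components].T
    explained = s**2 / np.sum(s**2)
    return scores, loadings, explained[:n_components]

rng = np.random.default_rng(1)
X = rng.normal(size=(90, 331))        # stand-in for 90 spectra x 331 wavelengths
scores, loadings, explained = pca(X, n_components=3)
print(scores.shape, loadings.shape)   # (90, 3) (331, 3)
print(f"cumulative variance of PC1-PC3: {explained.sum():.2%}")
```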

2.8. Multivariate Models

Advances in computers and software are making chemometric techniques very powerful tools for processing NIR spectral data, as they overcome the difficulty of multicollinearity and provide statistical inferences from which meaningful conclusions can be drawn [26, 27]. Choosing the best method is the next challenge and can be cumbersome, since many methods exist. In this research, K-nearest neighbor (KNN) and linear discriminant analysis (LDA) were compared.

K-nearest neighbor is a nonparametric classification method based on a distance function that measures the difference or similarity between two instances [28, 29]. The parameter K influences the results of the classification model; hence, K is normally optimized by evaluating the model with several K values (usually small values such as 3 or 5). Note that KNN may not perform well when the number of samples differs greatly between classes [25], which makes it more suitable for modelling class groups of similar size.
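The KNN rule and the search over small K values can be sketched as below. This is an illustration under stated assumptions, not the authors' MATLAB code: Euclidean distance, majority voting, odd candidate K values (3, 5, 7, which avoid ties with two classes), and leave-one-out classification rate as the selection criterion are all choices made here. The two synthetic clusters merely stand in for PC scores of the 30 organic and 60 conventional fruits.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, X_test, k=3):
    """Assign each test row the majority label among its k nearest training rows."""
    preds = []
    for x in X_test:
        dist = np.linalg.norm(X_train - x, axis=1)   # Euclidean distance to every training row
        nearest = np.argsort(dist)[:k]
        preds.append(Counter(y_train[nearest]).most_common(1)[0][0])
    return np.array(preds)

def choose_k(X, y, candidates=(3, 5, 7)):
    """Return the K with the highest leave-one-out classification rate (first one wins ties)."""
    best_k, best_rate = None, -1.0
    for k in candidates:
        hits = 0
        for i in range(len(y)):
            keep = np.arange(len(y)) != i
            hits += knn_predict(X[keep], y[keep], X[i:i + 1], k=k)[0] == y[i]
        rate = hits / len(y)
        if rate > best_rate:
            best_k, best_rate = k, rate
    return best_k, best_rate

# Hypothetical 3-PC scores: two well-separated synthetic clusters (30 vs 60 samples)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (30, 3)), rng.normal(2.0, 0.3, (60, 3))])
y = np.array(["organic"] * 30 + ["conventional"] * 60)
k, rate = choose_k(X, y)
print(k, rate)
```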

Linear discriminant analysis is a linear, parametric, supervised pattern recognition technique that has proved useful for analyzing spectral data. It works by finding the linear combination of features that maximizes the ratio of between-class variance to within-class variance [30]. For more information, refer to [25]. Note that the performance of LDA depends on the number of principal components used as its inputs.
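For two classes, the Fisher direction that maximizes between-class over within-class variance is $w = S_w^{-1}(m_1 - m_0)$, where $S_w$ is the pooled within-class scatter matrix and $m_0, m_1$ are the class means. The sketch below applies it to PC scores, assuming equal priors and a midpoint decision threshold (the paper does not state its priors, and its classes are 30 vs 60). Working on a few PC scores rather than all 331 wavelengths matters here: with only 68 calibration samples, $S_w$ built from raw wavelengths would be singular.

```python
import numpy as np

class FisherLDA:
    """Two-class Fisher LDA (equal priors assumed; a sketch, not the authors' code)."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        if len(self.classes_) != 2:
            raise ValueError("this sketch handles exactly two classes")
        X0, X1 = X[y == self.classes_[0]], X[y == self.classes_[1]]
        m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
        # pooled within-class scatter matrix S_w
        Sw = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
        # Fisher direction: maximizes between-class / within-class variance ratio
        self.w_ = np.linalg.solve(Sw, m1 - m0)
        # decision threshold: midpoint of the projected class means
        self.threshold_ = 0.5 * (m0 + m1) @ self.w_
        return self

    def predict(self, X):
        return np.where(X @ self.w_ > self.threshold_, self.classes_[1], self.classes_[0])

# Hypothetical 3-PC scores for 30 organic and 60 conventional fruits
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (30, 3)), rng.normal(2.0, 0.3, (60, 3))])
y = np.array(["organic"] * 30 + ["conventional"] * 60)
lda = FisherLDA().fit(X, y)
rate = np.mean(lda.predict(X) == y)
print(f"classification rate: {rate:.0%}")
```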

2.9. Partial Least Squares (PLS)

Partial least squares (PLS) is a well-known linear multivariate method for processing spectral data, and it can analyze data with strongly collinear, noisy, and redundant variables. For more information, refer to [31, 32]. The performance of a PLS model is normally evaluated using three main parameters, among others: the root mean square error of cross-validation (RMSECV), the root mean square error of prediction (RMSEP), and the correlation coefficient (R) [33]. These parameters were calculated by the following equations:

$$\mathrm{RMSECV} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_{\backslash i} - y_i\right)^2}$$

$$\mathrm{RMSEP} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2}$$

$$R = \sqrt{1 - \frac{\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}}$$

where n is the number of samples, $y_i$ is the reference measurement for sample i, $\hat{y}_{\backslash i}$ is the estimate for sample i when the model is constructed with sample i removed, $\hat{y}_i$ is the model estimate for sample i, and $\bar{y}$ is the mean of the reference measurements for all samples.
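The paper does not state which PLS algorithm its MATLAB code used, so the sketch below assumes NIPALS PLS1, one standard single-response formulation, together with the evaluation metrics: RMSEP as the root of the mean squared prediction error, RMSECV as the same quantity computed from leave-one-out estimates, and R as $\sqrt{1 - SS_{res}/SS_{tot}}$. The synthetic X and y are placeholders for spectra and °Brix values.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """PLS1 regression via NIPALS; returns (x_mean, y_mean, regression coefficients)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr                  # weight: covariance of X columns with y
        w /= np.linalg.norm(w)
        t = Xr @ w                     # latent-variable scores
        tt = t @ t
        p = Xr.T @ t / tt              # X loadings
        q = yr @ t / tt                # y loading
        Xr = Xr - np.outer(t, p)       # deflate X and y
        yr = yr - q * t
        W.append(w); P.append(p); Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    coef = W @ np.linalg.solve(P.T @ W, Q)   # B = W (P'W)^-1 q
    return x_mean, y_mean, coef

def pls1_predict(model, X):
    x_mean, y_mean, coef = model
    return (X - x_mean) @ coef + y_mean

def rmse(y_true, y_pred):
    """RMSEP when applied to prediction-set estimates."""
    return float(np.sqrt(np.mean((np.asarray(y_pred) - np.asarray(y_true)) ** 2)))

def r_coefficient(y_true, y_pred):
    """R = sqrt(1 - SS_res / SS_tot), clipped at 0 for very poor fits."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    ss_res = np.sum((y_pred - y_true) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(np.sqrt(max(0.0, 1.0 - ss_res / ss_tot)))

def rmsecv_loo(X, y, n_components):
    """Leave-one-out RMSECV: each sample is predicted by a model built without it."""
    y_loo = np.empty(len(y))
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        model = pls1_fit(X[keep], y[keep], n_components)
        y_loo[i] = pls1_predict(model, X[i:i + 1])[0]
    return rmse(y, y_loo)

# Placeholder data: 40 samples x 6 variables with a noisy linear response
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 6))
b_true = np.array([1.0, -0.5, 2.0, 0.0, 0.3, 1.2])
y = X @ b_true + 12.0 + rng.normal(0, 0.1, 40)
model = pls1_fit(X, y, n_components=3)
y_fit = pls1_predict(model, X)
print(rmse(y, y_fit), r_coefficient(y, y_fit), rmsecv_loo(X, y, n_components=3))
```

In practice the number of latent variables (5 in this study) would be chosen where RMSECV stops decreasing appreciably.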

3. Results and Discussion

3.1. Spectra Presentation/Analysis

The spectral fingerprints were used to build the statistical models. The spectral profiles in Figures 2(a)–2(c) show that the major peaks lie around 960–1050 nm. This wavelength range corresponds to the second overtones of O-H and N-H, representing H2O, ROH, ArOH (OH bonded to an aromatic group), and NH2 functional groups [34]. These groups are associated with major pineapple constituents such as water, glucose, sucrose, and cellulose. TSS consists mainly of sugars, organic molecules containing C-H, O-H, C-O, and C-C bonds, which NIR spectroscopy can measure nondestructively [35, 36]. After the spectral dataset was preprocessed with MC and MSC, a clear separation between organic and inorganic pineapple fruits appeared in the MSC-pretreated spectral profile (Figure 2(c)). This suggests that organic and inorganic pineapple fruits could be differentiated within the 800–1070 nm range using MSC pretreatment. MSC is therefore a useful tool for correcting baseline shifts and light scattering in the spectral dataset, as noted by other authors [22].

3.2. Principal Component Analysis (PCA)

Principal component analysis was used to identify cluster trends in the spectral data. PCA of the raw and MC-preprocessed spectra did not show any clear clusters or separations (Figures 3(a) and 3(b)). However, the MSC-PCA technique gave a clear separation between clusters (Figure 3(c)). This further demonstrates the effectiveness of MSC for baseline and light-scattering correction, as proposed by Geladi and coworkers [22]. PCA identified the most important directions of variability in the multivariate data space (X matrix) and the primary phenomena in the spectral dataset [19]. The first three PCs (PC1, PC2, and PC3), which carry the spectral information and its corresponding chemical compositional information, together accounted for 99.36% of the total variance among the 90 pineapple samples used in this study. Pineapples differ considerably in their chemical properties according to the preharvest and postharvest practices that categorize them as organic or inorganic.

3.3. Classification Models

In this study, KNN and LDA were compared for classifying organic and inorganic pineapple fruits.

The results of the classification models are shown in Table 2. Both classification models performed well on the MSC-PCA dataset: KNN and LDA each achieved classification rates above 98% in both the calibration and prediction sets at the optimal number of principal components (PCs = 3). Thus, MSC-PCA preprocessing enhanced the performance of both KNN and LDA compared with the raw and MC datasets. LDA was slightly superior to KNN in the training set, indicating that it found a linear combination of features that served as a better linear classifier. The high accuracy of the models may result from distinct organoleptic and nutritional differences between organically and conventionally grown pineapple fruits. This is supported by the fact that organic production excludes the use of synthetic fertilizers, pesticides, growth regulators, and other chemicals [6], whose residues can affect fruit quality and safety. Other studies have also shown that organically produced pineapple fruits have comparatively higher vitamin C and moderate acidity [4], as well as the highest total soluble solids contents [37].

3.4. Quantification Model

A partial least squares model was used to determine TSS (°Brix) in both organic and inorganic pineapple fruits. Figure 4 shows that the measured values correlated linearly with the NIR-predicted values, although a few outliers affected the PLS model. As Table 3 shows, the MSC-PLS model performed best, with Rc = 0.851 and RMSEC = 0.950 °Brix in the calibration set and Rp = 0.854 and RMSEP = 0.842 °Brix in the prediction set. This indicates that appropriate preprocessing is an efficient way to improve the accuracy of a PLS model [33]. For a good model, R should be close to unity while RMSEC and RMSEP should be close to zero. The weaknesses of this MSC-PLS model could be attributed to the nature of classic PLS, which is built on the full spectral range and therefore includes both useful and irrelevant or redundant information (noise); noisy spectra normally reduce model performance. Hence, to improve this model for intact pineapple quality evaluation, other PLS variants should be investigated and compared with nonlinear algorithms. Nevertheless, the results compare favourably with those reported by other authors who used VIS-SWNIR spectroscopy to predict soluble solids content in pineapple fruits [15]. Overall, the NIR multivariate models predicted °Brix values with favourable statistical correlations.

3.5. Selection of Vital Wavelengths

When developing a PLS model, it is necessary to consider how much each wavelength contributes to the final outcome. Figure 5 shows the PLS loading weights of the best model, which explain how the PLS model components were constructed. The loading weights show how strongly each wavelength is taken into account by each model component. They are used to understand how much each x-variable (wavelength) contributes to the meaningful variation in the data, to interpret variable relationships, and to interpret the meaning of each model component [38]. The loading weights were normalised so that their lengths and directions could be compared meaningfully. Figure 5 shows peaks at several wavelengths (754, 760, 823, 850, 884, 901, 910, 950, and 960 nm), which are considered the most useful for the PLS model developed to determine total soluble solids (TSS, °Brix) in pineapples. These wavelengths are closely related to the chemical composition of pineapple fruits. Specifically, they fall in the third overtone region of the OH and CH stretching vibrations of sucrose solutions [39], sucrose being an important component of TSS. For example, the wavelengths at 910, 950, and 960 nm are related to the C-H and O-H groups attributed to TSS, while the 750–820 nm region reveals sucrose, glucose, and fructose [9, 39].
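The informative wavelengths above were read from the loading-weight plot. As a hedged illustration of the same idea, the sketch below normalises a loading-weight vector to unit length and lists the wavelengths of its largest local peaks in absolute value. The peak rule, the `top` parameter, and the synthetic weight vector (one positive band at 910 nm and one negative band at 960 nm) are assumptions made for illustration, not the authors' procedure or data.

```python
import numpy as np

def important_wavelengths(wavelengths, loading_weight, top=5):
    """Normalise a PLS loading-weight vector to unit length and return the
    wavelengths of its `top` largest local peaks in absolute value."""
    w = loading_weight / np.linalg.norm(loading_weight)
    a = np.abs(w)
    # interior points at least as large as both neighbours count as peaks
    is_peak = (a[1:-1] >= a[:-2]) & (a[1:-1] >= a[2:])
    peak_idx = np.flatnonzero(is_peak) + 1
    ranked = peak_idx[np.argsort(a[peak_idx])[::-1]][:top]
    return np.sort(wavelengths[ranked]), w

# Synthetic loading weight over 740-1070 nm (1 nm steps)
wavelengths = np.arange(740, 1071)
weights = (np.exp(-((wavelengths - 910) / 8.0) ** 2)
           - 0.6 * np.exp(-((wavelengths - 960) / 8.0) ** 2))
top, w = important_wavelengths(wavelengths, weights, top=2)
print(top)  # [910 960]
```

Taking the absolute value treats strongly negative weights as just as informative as positive ones, since the sign only reflects the direction of the contribution.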

4. Conclusion

This research has shown the potential of handheld NIR spectroscopy for rapid nondestructive measurement of pineapple quality. MSC gave the best PCA cluster trend, with clear separation in the first three PCs. Overall, the results showed that a handheld spectrometer coupled with the MSC-PCA+LDA model could identify organically and conventionally grown intact pineapple fruits with a 100% identification rate in both the training and prediction sets. In addition, the PLS regression model could predict TSS (°Brix) with RMSEC = 0.95 °Brix and RMSEP = 0.84 °Brix at 5 factors, with correlation coefficients of about 0.85 (Rc = 0.851, Rp = 0.854) in the calibration and prediction sets. These models could potentially be imported into mobile phone applications for wider practical use.

Data Availability

The data used to support the findings of the study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest.


Acknowledgments

The authors gratefully acknowledge the support provided by the University of Cape Coast, the Agilent Foundation, and Mars. The supply of fresh pineapple fruits by the ACOPPS and AMOPPA pineapple farmers in the Central Region is also gratefully acknowledged. The authors appreciate the proofreading assistance provided by Mrs. Winifred Akpene Teye.