Blueberry fruits of different cultivars differ in their quality indices. In this work, three types of quality factors, including 6 physical parameters, 12 chemical and nutritional components, and 3 antioxidant indices, were measured to compare and classify blueberry fruits from 12 different cultivars in China. Using the autoscaled data of the quality factors, unsupervised principal component analysis was performed for exploratory analysis of intercultivar differences and the influences of the quality factors. A supervised classification method, partial least squares discriminant analysis (PLSDA), was combined with the global particle swarm optimization (PSO) algorithm and two multiclass strategies, one-versus-rest (OVR) and one-versus-one (OVO), to select discriminative quality factors and develop classification models of the 12 cultivars. As a result, OVO-PLSDA with 8 quality factors achieved a classification accuracy of 0.915. This study provides new insights into the quality variations and key factors among different blueberry cultivars.

1. Introduction

Blueberry, a fruit native to North America, has been widely cultivated around the world [1]. It is a very popular fruit due to its pleasant flavor, high nutritional value, and health benefits [2, 3]. Fresh blueberry fruits are rich in various nutritional ingredients, such as anthocyanins, polyphenols, flavonoids, polysaccharides, vitamins, minerals, and dietary fibres [4]. Moreover, modern scientific experiments have revealed many of its functional activities, such as antioxidant, antimicrobial, antihypertensive, anti-inflammatory, and neuroactive properties, and its ability to prevent obesity, diabetes, cancer, and other chronic diseases [5–10]. Besides being consumed as fresh fruits, blueberries are also widely used to produce natural extracts, blueberry wine, beverages, jams, preserved fruits, and food ingredients or additives.

The cultivation of blueberry in China started in the mid-1980s and began to be popularized in the early 21st century. At present, the main cultivation area has amounted to about 3500 hm2 around Northeast and North China, Jiangsu, Zhejiang, Liaodong Peninsula, and Southwest and South China [11]. According to an incomplete survey, over 100 different cultivars of blueberries have been introduced and bred in China, among which about 10–15 cultivars are important in the domestic market [12].

The physical and chemical quality factors, nutritional ingredients, and functional activities of blueberries depend largely on the specific cultivars and cultivation conditions. Many sensory indices of fruits (such as hardness, brittleness, and chewiness) are closely related to texture and physical factors, which directly affect their storage and transportation characteristics. They have become important indices for testing fruit quality and the primary factors for evaluating the acceptability of freshly consumed fruits [13]. The levels of chemical indices and nutritional ingredients also play an important role in the flavor and nutritional value of blueberries and have been intensively studied and compared among different cultivars and producing areas [14]. Among the various functional activities, the antioxidant capacity and antioxidant substances, such as polyphenols (especially anthocyanins) and flavonoids, are also associated with many other functional activities and have been compared among blueberries of different cultivars, geographical origins, and postharvest processing methods [15].

Statistics and chemometrics have been widely used to reveal the contributions of multiple variables in complex chemical systems [16–21]. At present, studies on blueberry quality mainly focus on one or several indicators, and few studies have comprehensively evaluated the physical, chemical, and nutritional quality factors among different cultivars of blueberries [22–24]. The objective of this work was to study the quality variations among some major blueberry cultivars by a fusion analysis of physical and chemical factors, nutritional ingredients, and antioxidant abilities. In order to reveal the key factors among different blueberry cultivars, besides unsupervised principal component analysis (PCA), supervised partial least squares discriminant analysis (PLSDA) was also used to develop multiclass classification models using feature sets selected by the global particle swarm optimization (PSO) algorithm [25–28].

2. Materials and Methods

2.1. Blueberry Samples

Mature blueberry fruit samples (N = 366) of 12 different cultivars were provided by several local blueberry orchards in Huaining, Anhui province. The blueberry fruit was harvested within 3 days after attaining the maximum blue color. After harvesting, the blueberries were packed in fresh-keeping boxes, placed in 4°C incubators, and transported to the lab on the second day. Intact fruits with uniform size and color were selected, and their quality indexes were measured. Fresh fruits were washed and dried, packed and sealed, and frozen at −18°C to be used for determination of physicochemical indexes. The detailed information about sample size, cultivar, and sources is listed in Table 1.

2.2. Quality Analysis of Blueberry Fruit

In this work, a set of 21 quality factors, including 6 physical factors, 12 chemical and nutritional components, and 3 antioxidant indices, were determined for the collected blueberry fruit. The 21 quality factors are listed in Table 2.

2.3. Measurements of Physical Factors

For each sample, 10 blueberry fruits were randomly selected, and L∗ and hardness were measured at 3 sites along the equatorial line. The measurement of lightness value (L∗) was performed using a CR-400 Chroma Meter (Minolta, Osaka, Japan). The hardness value was determined using the method by Hu et al. with a TA.XTplus texture analyser (Stable Micro Systems, England) [29]. The average single fruit weight, shape index (the ratio of maximum height to width), and specific gravity were also measured on 10 randomly selected fruits. To measure the juice yield, fruits (20 g) were beaten and centrifuged at 6000 r/min for 15 min to obtain the upper juice.

2.4. Determination of Chemical Factors and Nutritional Ingredients

The total soluble solid (in °Brix) was analyzed using a PAL-1 Digital Hand-Held Pocket Refractometer (Atago, Japan). The pH value was measured on blueberry pulp using a PB-10 pH meter (Sartorius, Germany). The titratable acidity (TA) value was analyzed using an indicator method for acid-base titration [30]. The level of vitamin C (ascorbic acid) was determined by the 2,4-dinitrophenylhydrazine colorimetric method with a U-3900 ultraviolet-visible spectrophotometer (Hitachi, Tokyo, Japan) [31]. The contents of total phenols (TP) were analyzed using the Folin–Ciocalteu method [32]. The standard curve was made using pyrogallic acid (in ethanol), and TP content was computed as gallic acid equivalents per 100 g fresh weight (mg GAE/100 g FW). The analysis of total flavonoids (TF) was performed using the AlCl3 colorimetric method [33]. TF content was expressed as catechin equivalents per 100 g fresh weight (mg CE/100 g FW). The total soluble sugar was determined using the modified anthrone-sulfuric acid method [34]. The level of reducing sugar was analyzed using the direct titration method of copper tartrate solution [35]. The moisture was determined by the direct drying method [36]. The ash content was analyzed by burning and weighing [37]. Protein determination was performed using the classical Kjeldahl method [38]. The contents of anthocyanins were determined by the differential pH method [39].

2.5. Antioxidant Analysis

Three different antioxidant indices were measured to compare the antioxidant capacities of blueberries. The scavenging capacity of 2,2-diphenyl-1-picrylhydrazyl (DPPH) and hydroxyl radicals and the ferric reducing antioxidant power (FRAP) values were determined following the procedures described for blueberry samples [5, 40, 41].

2.6. Chemometrics Data Analysis

Considering the scale variations in different quality factors, each factor was autoscaled, namely, the values were made to have a zero mean and a standard deviation of 1. For exploratory analysis of the data, unsupervised principal component analysis (PCA) was performed to show the class distributions of blueberry [42]. The DUPLEX algorithm was performed to divide the data of each class into training and test objects, which were combined to generate the final training and test sets [43].
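The autoscaling step described above can be sketched as follows. This is a minimal illustration in plain Python, not the authors' MATLAB code; the function name `autoscale`, the example values, and the use of the sample standard deviation (n − 1 denominator) are assumptions, since the text does not state which denominator was used.

```python
from statistics import mean, stdev


def autoscale(columns):
    """Scale each column (a list of floats) to zero mean and unit standard deviation.

    Assumption: the sample standard deviation (n - 1) is used.
    """
    scaled = []
    for col in columns:
        m = mean(col)
        s = stdev(col)
        if s == 0:
            raise ValueError("a constant column cannot be autoscaled")
        scaled.append([(x - m) / s for x in col])
    return scaled


# Hypothetical values for two quality factors measured on very different scales.
hardness = [1.2, 1.5, 1.1, 1.8]
vitamin_c = [10.0, 14.0, 9.0, 19.0]
scaled = autoscale([hardness, vitamin_c])
```

After scaling, both factors contribute on a comparable footing to PCA and PLSDA, regardless of their original units.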

Partial least squares discriminant analysis (PLSDA) was used to develop two-class classification models [44–47]. To tackle the multiclass problems in this work, two chemometrics strategies, one-versus-rest (OVR) and one-versus-one (OVO), were performed and compared to develop a set of binary PLSDA classifiers [48, 49].
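To make the difference between the two multiclass strategies concrete, the sketch below enumerates the binary problems each strategy creates for 12 classes. The helper names are hypothetical, and combining the OVO submodels by majority voting is an assumption (a common choice); the text does not state how the binary outputs were combined.

```python
from collections import Counter
from itertools import combinations


def ovr_tasks(classes):
    """One binary problem per class: that class against all remaining classes."""
    return [(c, [k for k in classes if k != c]) for c in classes]


def ovo_tasks(classes):
    """One binary problem per unordered pair of classes."""
    return list(combinations(classes, 2))


def ovo_predict(pairwise_winners):
    """Majority vote over the winners reported by the pairwise classifiers.

    Ties are resolved in favour of the class that received a vote first.
    """
    votes = Counter(pairwise_winners)
    return votes.most_common(1)[0][0]


cultivars = list(range(12))          # 12 cultivars, labelled 0..11 for illustration
n_ovr = len(ovr_tasks(cultivars))    # 12 binary submodels
n_ovo = len(ovo_tasks(cultivars))    # 12 * 11 / 2 = 66 binary submodels
```

OVO needs more submodels, but each one separates only two cultivars, whereas every OVR submodel must separate one cultivar from the pooled remaining eleven.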

In order to probe and reveal the key quality factors reflecting the cultivar variations of blueberry, the global particle swarm optimization (PSO) algorithm was used to select the most discriminative feature sets [28]. PSO imitates the social behavior of bird flocking, where a population of particles or candidate solutions is improved iteratively to approach the best solution by combining random search and the best known solutions. PSO can be started with a population of random feasible solutions. In this work, 100 initial feasible solutions were randomly generated as strings of 0s and 1s, where 0 and 1 represent the absence and presence of a quality factor, respectively. Discriminative feature sets were selected to obtain the lowest overall classification error rate of Monte Carlo cross validation (OCERMCCV), defined as

$$\mathrm{OCER}_{\mathrm{MCCV}} = \frac{\sum_{i=1}^{B} M_i}{\sum_{i=1}^{B} N_i},$$

where $B$ is the number of random data splits in MCCV, and $M_i$ and $N_i$ are the numbers of misclassified objects and test objects for the $i$th data split, respectively [50].
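The objective function and the binary encoding of candidate feature sets can be sketched as below. This is an illustrative sketch only: it assumes the pooled form of OCERMCCV (total misclassified objects over total test objects across all splits), and the function names, the fixed subset size per mask, and the seed are hypothetical choices.

```python
import random


def ocer_mccv(misclassified, tested):
    """Pooled error rate over B splits: sum of M_i divided by sum of N_i."""
    if len(misclassified) != len(tested) or not tested:
        raise ValueError("need one (M_i, N_i) pair per data split")
    return sum(misclassified) / sum(tested)


def random_masks(n_particles, n_factors, subset_size, seed=0):
    """Random 0/1 strings; 1 marks a quality factor that is included."""
    rng = random.Random(seed)
    masks = []
    for _ in range(n_particles):
        chosen = set(rng.sample(range(n_factors), subset_size))
        masks.append([1 if j in chosen else 0 for j in range(n_factors)])
    return masks


# 100 initial candidate solutions over 21 quality factors, 8 factors each.
population = random_masks(n_particles=100, n_factors=21, subset_size=8)

# Two hypothetical splits: 2 of 40 and 3 of 60 validation objects misclassified.
error = ocer_mccv([2, 3], [40, 60])
```

In a full search, each mask would be scored by fitting the binary PLSDA submodels on the selected columns only and passing the resulting misclassification counts to `ocer_mccv`.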

2.7. Software

All the data processing and chemometric algorithms were performed on MATLAB 7.0.1 (MathWorks, Sherborn, MA, USA). The DUPLEX algorithm was performed using the codes included in the TOMCAT toolbox [51]. All the other data analysis algorithms, including PCA, OVO, OVR, PLSDA, and PSO, were performed using self-compiled MATLAB codes.

3. Results and Discussion

The ranges and standard deviations (SD) of the 21 quality factors of the 12 blueberry cultivars are summarized in Table 3, where the raw data of the quality factors have different scales. To illustrate the distribution of different classes, principal component analysis (PCA) was performed on the autoscaled data (Figure 1). The first two principal components (PCs) account for 87.59% of the total data variance. Projection of the 12 classes onto the first 2 PCs showed the variations among different cultivars of blueberries. The loadings of the first PC (Figure 1) indicated that the levels of 1 physical parameter (hardness) and 5 chemical and nutritional components (vitamin C, total phenols, total flavonoids, proteins, and anthocyanins) contribute significantly to the class separation achieved by PC1. For the second PC (Figure 1), 5 parameters had important contributions, including 2 physical parameters (average single fruit weight and shape index), 2 chemical and nutritional components (titratable acidity and anthocyanins), and an antioxidant index (scavenging capacity of DPPH radical). Notably, the level of anthocyanins was a key quality factor for discriminating different blueberries, as it plays an important role in both PC1 and PC2.

Although PCA could obtain some separation of different blueberries, supervised methods were needed to achieve more accurate classification models based on key quality parameters. Therefore, multiclass classification models were developed using OVR-PLSDA and OVO-PLSDA models with subsets of key quality parameters selected by the PSO algorithm. To obtain representative training and test data sets, the DUPLEX algorithm was performed on each of the 12 classes of blueberries to divide the measured data into training and test objects (Table 1). The training and test objects from each class were combined to generate the final training and test sets, including 236 and 130 objects, respectively.

For both OVR-PLSDA and OVO-PLSDA, the number of significant latent variables (LVs) of each binary PLSDA submodel was determined using MCCV to obtain the lowest OCERMCCV. With different sizes (3–15) of parameter subsets, PSO was performed to search for the optimal subsets by minimizing the OCERMCCV. In this study, the number of MCCV data splits was 100. For each data split, 80% of the training objects were used for model development, and 20% of the training objects were used for validation. For PSO, the algorithm was stopped when the value of the objective function (OCERMCCV) could not be reduced by 0.1% in the next cycle. The maximum total number of PSO cycles was set to 100, which to our knowledge was sufficient to solve the small-scale (21 variables to be selected) optimization problem in this work. To examine the optimization performance of PSO, a 100-cycle PSO was performed to search for the best subsets for the 8-variable OVO-PLSDA and 10-variable OVR-PLSDA, and the lowest OCERMCCV for each cycle is shown in Figure 2. Though there were slight fluctuations of OCERMCCV, PSO could significantly reduce OCERMCCV with sufficient cycles, and the search results were stable for both methods.
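The resampling and stopping settings described above can be sketched as follows. The function names and the seed are hypothetical, rounding 80% of 236 to 189 objects is an assumption, and the stopping rule is read here as an absolute reduction of 0.001 in OCERMCCV; the text could also be read as a relative 0.1% reduction.

```python
import random


def mccv_splits(n_objects, n_splits=100, build_fraction=0.8, seed=0):
    """Yield (build, validation) index lists for Monte Carlo cross validation."""
    rng = random.Random(seed)
    n_build = round(build_fraction * n_objects)
    for _ in range(n_splits):
        idx = list(range(n_objects))
        rng.shuffle(idx)  # a fresh random partition for every split
        yield idx[:n_build], idx[n_build:]


def should_stop(previous_error, current_error, min_reduction=0.001):
    """Stop when the objective did not drop by at least min_reduction (assumed absolute)."""
    return (previous_error - current_error) < min_reduction


# 236 training objects, 100 random 80/20 splits.
splits = list(mccv_splits(236))
```

Unlike k-fold cross validation, the validation subsets of different MCCV splits may overlap, so the error estimate is averaged over many independent random partitions.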

The classification results of OVR-PLSDA and OVO-PLSDA with selected subsets of quality factors are summarized in Table 4. In terms of OCERMCCV, the 3 best subsets for OVR-PLSDA included 10, 12, and 9 quality parameters, respectively, and the 3 best subsets for OVO-PLSDA had 8, 9, and 8 quality parameters, respectively. Generally, OVO-PLSDA obtained better training and prediction accuracy than OVR-PLSDA. This could be attributed to the large submodel complexity and uneven class sizes of OVR. Both OVR-PLSDA and OVO-PLSDA required at least 8 or 9 features to obtain the best classification accuracy for the 12 blueberry cultivars, indicating that the variations among different blueberries were multivariate.

To demonstrate the variations in key quality factors, the pooled coefficient of variation (CV) for each variable was computed and compared. The best classification accuracy of 0.915 was obtained by OVO-PLSDA using 8 selected quality parameters: hardness (CV, 0.2593), juice yield (CV, 0.1516), titratable acidity (CV, 0.2692), vitamin C (CV, 0.4289), total phenols (CV, 0.1666), total flavonoids (CV, 0.1998), anthocyanins (CV, 0.2773), and antioxidant capacity (DPPH) (CV, 0.3137). Averages of the 4 factors having the highest CV values over the 12 blueberry cultivars are shown in Figure 3. Correlation analysis showed that the correlation between titratable acidity and vitamin C was 0.64 and that between anthocyanins and DPPH was 0.56. The low correlations among the key quality factors also imply that the discrimination of different blueberries requires multivariate quality factors. Key quality factors could be identified from their high frequency of being included in the selected sets in Table 4, including hardness (4), total flavonoids (6), vitamin C (5), total phenols (5), anthocyanins (6), and DPPH (5), indicating that all 3 types of quality factors are useful to characterize and discriminate different blueberries. The results demonstrated that the quality of blueberry is multivariate in nature and that its analysis requires the aid of chemometrics.
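The two summary statistics used in this paragraph can be sketched as follows. The sketch computes a plain coefficient of variation (sample SD divided by the mean) and a Pearson correlation; how the authors pooled the CV across cultivars is not stated, so pooling is not reproduced here, and the function names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (unitless)."""
    return stdev(values) / mean(values)


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical values, for illustration only.
cv_example = coefficient_of_variation([2.0, 4.0, 6.0])  # SD 2, mean 4
r_example = pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])   # perfectly linear
```

Because the CV is unitless, it allows factors measured in different units, such as hardness and vitamin C content, to be compared directly.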

4. Conclusions

In this study, the quality factors of blueberry fruits from 12 different cultivars in China were compared by analysis of 6 physical parameters, 12 chemical and nutritional components, and 3 antioxidant indices. Unsupervised PCA and supervised PLSDA were used to reveal the variations among different blueberries and to identify the key quality factors aided by the PSO algorithm. The results indicated that high classification accuracy (0.915) of the 12 blueberries could be obtained by using 8 quality factors, and all of the 3 types of quality factors are useful to characterize and discriminate different blueberries. Key quality factors were identified from their high frequency of being selected by PSO, including hardness, total flavonoids, vitamin C, total phenols, anthocyanins, and DPPH.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Authors’ Contributions

Juan Song and Qiong Shi equally contributed to this work.


Acknowledgments

The authors are grateful for the financial support from the National Natural Science Foundation of China (Grant nos. 21665022, 31972164, 21776321, and 21706233), Key Projects of Technological Innovation of Hubei Province (2016ACA138), Guizhou Provincial Science and Technology Department (Nos. QKHJC[2017]1186, QKHZC[2019]2816, and QKHPTRC[2020]5009), the Talented Researcher Program from Guizhou Provincial Department of Education (QJHKYZ[2018]073 and QJHKYZ[2015]390), Tongren Science and Technology Bureau (No. TSKY2019-3), the Talented Youth Cultivation Program from “the Fundamental Research Funds for the Central Universities”, and South-Central University for Nationalities (No. CZP20007).