This work focuses on the simultaneous quantification of 14 compounds in the medicinal plant Achillea millefolium by near-infrared (NIR) spectroscopy. Regression models, namely single-compound models (SCMs) and a multicompound model (MCM), were built by partial least-squares regression (PLSR) and calibrated against gas chromatography-mass spectrometry (GC-MS) reference data. The average standard errors of prediction (SEP) for the SCMs and the MCM were 0.49 and 0.62, respectively, and most of the 14 compounds were significantly correlated: 43 correlations were significant at the 0.01 level (47.25% of the total), and 11 correlations were significant at the 0.05 level (12.09% of the total). The first three principal components (PCs) of the principal component analysis (PCA) explain >78% of the total variance. According to the component matrix and the communality table, octadecanoic acid has the largest influence on PC 1 (extraction squared = 46.72%), with a communality extraction of 0.932. The communalities of neophytadiene, Z,Z,Z-9,12,15-octadecatrienoic acid, and oleic acid were also large, with extractions of 0.955, 0.937, and 0.859, respectively. These results indicate that if one compound shows a linear relationship with the NIR absorbance signal (SCM), an MCM can also be created because of the close interrelations of these compounds. In this context, the present work highlights a suitable sample preparation technique for the NIR analysis of raw plant material that yields robust and precise calibrations. In summary, this NIR spectroscopic approach offers a precise, rapid, and cost-effective high-throughput technique for the simultaneous and noninvasive quantitative analysis of raw plant materials.

1. Introduction

Currently, there is a growing need for the analysis of medicinal plants because they contain a large number of beneficial medicinal compounds. A. millefolium, a medicinal plant widely distributed in Europe and Asia, is extensively used in folk medicine because of its multiple pharmacological activities, and its essential oils are important for the anti-inflammatory activities of plants [1–3]. When a single compound in a medicinal plant is determined quantitatively, information about the remaining compounds, which may be the key to the therapeutic effect, is often lost because only that one compound is extracted [4, 5]. Over the years, research has focused on fingerprint technologies, such as high-performance liquid chromatography (HPLC), gas chromatography (GC), high-performance thin-layer chromatography (HPTLC), capillary electrophoresis (CE), and nuclear magnetic resonance (NMR), which help to give an overall understanding of the chemically active ingredients [6–10]. However, fingerprint techniques are commonly complex and very specific for only one or a few compounds, which makes single-spectrum-based fingerprint techniques not only time consuming but also hard to standardize [11]. To determine multiple compounds simultaneously, other analytical techniques, often combined with multivariate techniques that generate a series of fingerprint spectra, have to be applied. The near-infrared (NIR) region extends from 4000 to 12800 cm−1 (2500–780 nm) and covers the overtone and combination transitions of the C-H, O-H, and N-H groups. Compared with mid-infrared spectra, NIR absorption bands are weaker and more difficult to identify because they arise from higher excitation levels of the bonds [12]. The molecular overtone and combination bands in the NIR are typically broad and overlapping, leading to complex spectra.
Since it is very difficult to assign characteristic features to specific chemical components, multivariate analysis (MVA) techniques, e.g., partial least-squares regression (PLSR), principal component regression (PCR), or multiple linear regression (MLR), are often employed to extract the desired chemical information and to bring out hidden data structures. An increasing number of studies focus on the chemical composition of medicinal plants and quantify the relevant compounds in the samples by NIRS [13, 14], for example, in Cortex phellodendri [15], Magnolia officinalis [16], Piper methysticum Forst. f. [17, 18], and American ginseng [19]. The noninvasive character of NIR spectroscopy for determining medicinally relevant compounds offers many advantages over other techniques, such as hardly any need for sample preparation and the possibility to perform outdoor analysis with commercially available hand-held instruments [20, 21]. Since NIRS is often combined with MVA and statistics, parameters such as the choice of the calibration set (training set) and validation set (test set) samples, the data pretreatments, and the statistical methods used for constructing the model play an important role in creating a suitable quantitative model [22–25]. Only a few studies focus on the interrelation of the chemically active compounds, which might be one of the most important factors in herbal medicine. Most of the time, only one regression model is used for each property, which can be called a single-compound model (SCM). This means that, to quantify many compounds in a medicinal plant, one has to build as many regression models as there are compounds of interest in the sample. To determine more than one compound simultaneously, a multiple compound model (MCM) has to be generated. Furthermore, an MCM can reflect the interrelationship of all the compounds in the samples with only one measurement. In summary, the main objective of this study was to quantify each of the 14 main compounds in A. millefolium by NIRS and to compare the single-compound with the multicompound data evaluation technique. Diverse sample preparation procedures are reviewed, and the different multivariate data evaluation approaches are discussed in detail.

2. Materials and Methods

2.1. Sample Preparation of Achillea millefolium

A. millefolium plants were collected around Innsbruck (Austria, Europe). Each sample consisted of 5 individual plants, giving 36 samples in total. Twelve samples were dried in an oven at 40°C, and the remaining 24 were dried at room temperature. After drying, the flower heads were cut off and ground to about 1 mm by using a roll cut machine (IKA/ULTRA TURRAX/Tube drive, Staufen, Germany). All the ground samples were stored in a desiccator prior to NIR analysis.

2.2. NIR Spectroscopy

NIR Fourier-transform spectrometers (FT-NIR; Büchi, Flawil, Switzerland) were used to measure the NIR spectra (4000 to 10000 cm−1) of the samples. Spectra were recorded in diffuse reflection mode by using an integrating sphere device (Büchi). Each of the 36 samples was measured three times, leading to 108 NIR spectra, which were analyzed with the chemometric software NirCal 4.21 (Büchi). These spectra were randomly divided into two parts, a learning set (67%, c-set) and a test set (33%, v-set). The reflection spectra were converted to log (1/R) absorbance spectra, followed by various data pretreatments to correct for offset effects due to an inhomogeneous particle size distribution. Partial least-squares regression (PLSR) and principal component regression (PCR) were used to build the models.
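The preprocessing chain described above can be illustrated with a minimal sketch in plain Python. It is not the NirCal implementation: the function names, the fixed random seed, and the choice of multiplicative scatter correction (MSC) against the mean spectrum as the example pretreatment are assumptions made for illustration only.

```python
import math
import random


def split_c_v(n_spectra, c_fraction=0.67, seed=0):
    """Randomly split spectrum indices into a c-set and a v-set (seed is hypothetical)."""
    indices = list(range(n_spectra))
    random.Random(seed).shuffle(indices)
    n_c = round(n_spectra * c_fraction)
    return indices[:n_c], indices[n_c:]


def reflectance_to_absorbance(reflectance):
    """Convert one reflectance spectrum R (values in (0, 1]) to log10(1/R)."""
    return [math.log10(1.0 / r) for r in reflectance]


def msc(spectra):
    """Multiplicative scatter correction against the mean spectrum.

    Each spectrum x is fitted as x = a + b * mean by least squares
    and then corrected to (x - a) / b.
    """
    n_points = len(spectra[0])
    mean = [sum(s[j] for s in spectra) / len(spectra) for j in range(n_points)]
    m_bar = sum(mean) / n_points
    s_mm = sum((m - m_bar) ** 2 for m in mean)
    corrected = []
    for x in spectra:
        x_bar = sum(x) / n_points
        b = sum((m - m_bar) * (v - x_bar) for m, v in zip(mean, x)) / s_mm
        a = x_bar - b * m_bar
        corrected.append([(v - a) / b for v in x])
    return corrected


c_set, v_set = split_c_v(108)  # 108 spectra, as in this study
```

In this sketch, two spectra that differ only by an additive offset and a multiplicative factor are mapped onto the same corrected spectrum, which is the scatter effect MSC is meant to remove.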

2.3. GC-MS Analysis

The dried flower heads were extracted 3 times with CH2Cl2 (1 : 10 w/v) and ultrasonicated for 10 min. After evaporation of the solvent, the supernatant was transferred to a volumetric flask; n-Heptanol was used as an internal standard. The extracted compounds were identified by GC-MS using an Agilent 6890 Network GC system MSD ChemoStation (Palo Alto, US). Column: MS quartz capillary column (0.25 mm I.D × 30 m × 0.25 μm). Carrier gas: helium, 2 mL/min; split ratio: 1 : 10; temperature program: 60°C to 180°C at 5°C/min and 180°C to 280°C at 2.5°C/min. Electron impact (EI) spectra were obtained at −70 eV. The search libraries were NIST02, Wiley7n, and Flavor2.

2.4. Quantitative Data Analysis
2.4.1. NIRS Model Evaluation

The optimum number of factors for building the models was obtained from the predicted residual error sum of squares (PRESS) function, given as

$$\mathrm{PRESS} = \sum_{n} (x_n - y_n)^2,$$

where xn denotes the predicted values and yn the reference values.
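As a minimal sketch, PRESS can be computed as the sum of squared differences between predicted and reference values, which is the usual definition and is assumed here. The rule of picking the factor count with the smallest PRESS, and the numbers in the example, are hypothetical and not taken from this study.

```python
def press(predicted, reference):
    """Predicted residual error sum of squares for one candidate model."""
    return sum((x - y) ** 2 for x, y in zip(predicted, reference))


def best_factor_count(press_by_factors):
    """Return the number of factors with the smallest PRESS (assumed selection rule)."""
    return min(press_by_factors, key=press_by_factors.get)


# Hypothetical PRESS values for models built with 4, 5, and 6 factors.
example_press = {4: 3.2, 5: 2.1, 6: 2.4}
chosen = best_factor_count(example_press)
```

In practice, a smaller number of factors is often preferred when the PRESS values are close, so the plain minimum is only the simplest possible rule.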

The optimum regression models were evaluated by the following calculated values:
(i) Bias of the c-set and the v-set, which shows the deviation between the predicted and the actual values; it is naturally zero after bias correction.
(ii) The c-set SEE and the v-set SEE (SEP), which show the precision of the regression models for the c-set and the v-set, respectively.
(iii) Consistency, which shows the robustness of the regression models; it should approach 100. Consistency = SEE/SEP × 100.
(iv) Regression coefficient (R2) of the c-set and the v-set, which shows the relation of the predicted values to the actual values; R2 should approach 1.
(v) Regression intercepts and slopes of the c-set and the v-set.
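The evaluation values listed above can be sketched as follows. The consistency formula is the one given in the text; the bias-corrected form of the standard error with n − 1 in the denominator is a common convention that is assumed here, and it may differ from what NirCal computes.

```python
import math


def bias(predicted, reference):
    """Mean deviation between predicted and reference values."""
    return sum(p - r for p, r in zip(predicted, reference)) / len(reference)


def standard_error(predicted, reference):
    """Bias-corrected standard error: SEE on the c-set, SEP on the v-set (assumed form)."""
    b = bias(predicted, reference)
    n = len(reference)
    squared = sum((p - r - b) ** 2 for p, r in zip(predicted, reference))
    return math.sqrt(squared / (n - 1))


def consistency(see, sep):
    """Consistency = SEE / SEP x 100; values near 100 indicate a robust model."""
    return see / sep * 100.0
```

A consistency well below 100 means that the model fits the c-set much more closely than it predicts the v-set, which is a typical sign of overfitting.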

2.4.2. Comparison of the PLS Models

A paired t-test was conducted to compare the SCM regression models with the MCM. ANOVA was used to compare the different sample preparations (air-and-oven-dried, oven-dried, and air-dried). The homogeneity of the variances was tested by the Levene statistic, and multiple comparisons were conducted by LSD when equal variances were assumed and by Tamhane in the case of unequal variances. Pearson bivariate correlation, principal component analysis (PCA), and hierarchical cluster analysis were conducted to find the inner relationships among the 14 compounds.
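The paired comparison can be illustrated with a short sketch that computes the paired t statistic and its degrees of freedom by hand. The input values are made-up placeholders, not results of this study; each pair would correspond to one compound evaluated once by its SCM and once by the MCM.

```python
import math
import statistics


def paired_t(values_a, values_b):
    """Return the paired t statistic and degrees of freedom for two matched samples."""
    diffs = [a - b for a, b in zip(values_a, values_b)]
    n = len(diffs)
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1


# Hypothetical per-compound values for two modelling approaches.
t_value, dof = paired_t([1.0, 2.0, 3.0, 4.0], [0.0, 2.0, 2.0, 2.0])
```

The p-value would then be read from the t distribution with the returned degrees of freedom, for example with a statistics package.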

3. Results and Discussion

Comparison of the single-compound models (SCMs) and the multiple compound model (MCM).

44 compounds were identified by GC-MS (data not shown here), of which the 14 that seem to be of particular interest in herbal medicine were used for creating the PLS models (Table 1).

The 108 log (1/R) NIR spectra (Figure 1) were used for creating the 17 PLSR models (Table 2, Figures 2(a) and 2(b)), including the 14 models for each single compound 1 to 14 and the three models for air-dried samples, oven-dried samples, and oven-and-air-dried samples, respectively.

As can be gathered from Table 2, varying wavelengths, data pretreatments, calibration methods, and numbers of factors were used to build the best and most specific regression model for each single compound. The wavelength ranges for the different compounds did not vary much; most of them cover a broad wavenumber range from 4596 to 9000 cm−1. MSC was used as a data pretreatment for 9 of the 17 models to suppress unwanted scatter effects due to the different particle sizes present in the samples. Since PLS is well suited for coupling with digital filtering [24], this indicates that the NIRS data show a collinear relationship to some extent. Coupling digital filtering led to a low consistency in all cases, which is needed for MSC as a data pretreatment. MSC turned out to reduce the number of calibration factors, that is, to simplify the regression model, and to increase the consistency (robustness) of the models [26]. The average number of factors after the MSC data pretreatment was 6, which is the lowest average number of factors for all data pretreatments applied. The PLS model evaluation showed an average SEE of 0.35 for the SCM and 0.56 for the MCM, whereas the average SEP was 0.49 and 0.62, respectively. The average consistencies of the regression lines for the SCM and the MCM are 69.01 and 92.34. The average R2 of the c-set for the SCM and the MCM is 0.93 and 0.83, while the average R2 of the v-set for the SCM and the MCM is 0.89 and 0.82, respectively (Table 3). Both the SCM and the MCM thus have a high R2 and a low SEE and SEP, which indicates the high prediction ability of the individual SCMs and of the simultaneous determination by means of the MCM. The paired t-test showed significant differences between the SCM and the MCM for each parameter except the bias, although both the SCM and the MCM are well suited for the prediction of unknown samples. The SEE and SEP of the 14 compounds calibrated by the SCM were lower than those obtained by the MCM, and the R2 of both the c-set and the v-set of the 14 compounds calculated by the SCM was higher than that by the MCM. In detail, from the SCM to the MCM, the SEE increased by 0.22 and the SEP by 0.12. The consistency increased by 23.33, while the R2 decreased by 0.10 and 0.07 for the c-set and the v-set, respectively. These results show that the MCM, representing 14 compounds, offers a higher robustness of the regression model on the one hand but reduces the precision and prediction ability on the other hand. In other words, one has to decide whether to perform a simultaneous, fast, and robust but not very precise analysis (MCM) or a very precise single-compound analysis by implementing a single-compound model.

3.1. Influence of the Sample Sets on the Regression Models

PLS models of the 14 compounds for different sample pretreatments were built and compared to the MCMs (Table 2). The Sair-oven models showed a higher SEE (average = 0.85), SEP (average = 0.87), consistency (average = 98.04), and intercept for the c-set and the v-set (average = 1.13 and 1.16), but a lower v-set bias (average = 0.02), R2 (v-set = 0.72, c-set = 0.73), and slopes (v-set = 0.54, c-set = 0.53) than both Soven and Sair (Tables 3 and 4). For the Soven and the Sair regression lines, there were no significant differences for most of the parameters except the consistency, which is higher in Sair (92.34) than in Soven (61.98), and the v-set R2 of Sair (0.83) and Soven (0.92). The ANOVA showed that most parameters of the v-set (except the v-set bias) did not differ significantly among the models. This means that many samples with great variation can increase the robustness of the regression models but reduce the R2 and the precision as a consequence. All the regression models were almost equal in prediction ability since there were no differences in the v-set evaluation. It has to be considered that all 16 regression models (14 SCMs, Sair, and Soven) were built with PLS, and only Sair-oven showed significantly better regression parameters with PCR, which may be related to the greater variation but lower collinearity in the calibration set of Sair-oven.

3.2. Interrelation of the 14 Main Compounds in Achillea millefolium

Our research brought up the following question: why can 14 compounds with different properties be quantified by only one NIR regression model?

It is implied that there have to be some internal relationships between the 14 compounds. In order to expose these, Pearson bivariate correlation, PCA, and hierarchical cluster analysis were conducted on the 14 compounds. Pearson bivariate correlation analysis showed that most of the 14 compounds were significantly correlated. 43 correlations were significant at the 0.01 level (47.25% of the total), 11 correlations were significant at the 0.05 level (12.09% of the total), and 37 correlations were not significant (40.66% of the total) as can be seen in Table 5.
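The reported percentages can be checked with a short calculation: 14 compounds give 14 × 13 / 2 = 91 distinct compound pairs, and the three counts sum to this total. The variable names below are illustrative only.

```python
from math import comb

n_compounds = 14
n_pairs = comb(n_compounds, 2)  # number of distinct compound pairs

# Counts of correlations per significance class, as reported in the text.
counts = {"p < 0.01": 43, "p < 0.05": 11, "not significant": 37}

# Share of each class in percent of all pairs, rounded to two decimals.
shares = {label: round(100 * count / n_pairs, 2) for label, count in counts.items()}
```

The resulting shares reproduce the 47.25%, 12.09%, and 40.66% stated above.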

The PCA showed that 3 PCs explain >78% of the total variance of the 14 compounds. According to the component matrix and communality tables (not shown here), octadecanoic acid (C12) is mainly related to principal component (PC) 1 (extraction squared is 46.72%), whose communality extraction is 0.932. The communalities of C4 (neophytadiene), C8 ((Z,Z,Z)-9,12,15-octadecatrienoic acid), and C9 (oleic acid) were also high, whose extractions were 0.955, 0.937, and 0.859, respectively (Table 6). These results corresponded to the cluster analysis in Figure 3, which showed that C1, C2, C3, C5, C6, C7, and C1 were close to C9 and C12, but far from C4 and C8. In other words, C4, C8, C9, and C12 represented the main variances of the 14 compounds.
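For readers unfamiliar with the terms, the sketch below shows how communalities and the variance share of one component are commonly derived from a component (loading) matrix of standardized variables. It assumes that "communality extraction" and "extraction squared" in the text follow this usual convention; the loadings are hypothetical and are not the values behind Table 6.

```python
def communality(loadings):
    """Sum of squared loadings of one variable over the retained components."""
    return sum(value ** 2 for value in loadings)


def explained_variance_percent(loading_matrix, component):
    """Percent of total variance carried by one component (standardized variables)."""
    n_vars = len(loading_matrix)
    return 100.0 * sum(row[component] ** 2 for row in loading_matrix) / n_vars


# Hypothetical loadings of three variables on three retained PCs.
example_loadings = [
    [0.9, 0.3, 0.2],
    [0.8, -0.5, 0.1],
    [0.1, 0.7, 0.6],
]
communalities = [communality(row) for row in example_loadings]
pc1_share = explained_variance_percent(example_loadings, 0)
```

A communality close to 1 means that the retained components reproduce almost all of the variance of that variable.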

3.3. Choosing the Right Sample Sets

Different sample sets lead to different NIR regression models. Each of them may have different characteristics even if they all work well [27]. A large variety and number of samples can help to build highly robust models. In contrast, a suitable sample pretreatment procedure leads to homogeneous sample sets that result in higher-precision calibration models. Careful preparation of the validation set before analysis leads to much more precise predictions and minimizes the need for spectral data pretreatment. The more homogeneous the c-set and the v-set samples, the better the model and the predictions will be. Careful development of the c-set and v-set samples is crucial for the near-infrared spectroscopic quantification of medicinal plants with varying compounds.

4. Conclusions

Either one regression model for one compound (SCM) or one regression model for multiple compounds (MCM) can be used to quantify the chemical compounds in A. millefolium. The former approach showed a higher R2 for the c-set and the v-set and a lower SEE and SEP than the latter approach, which, in contrast, showed a higher consistency. Although the R2 decreased and the SEE and the SEP increased, the MCM with many compounds seems to bring out some internal relations of the compounds present in the samples. The MCM showed an increased robustness, whereas the precision decreased. In our opinion, a combined use of SCMs and an MCM is well suited to quantitatively analyse A. millefolium as well as other medicinal plants. This method has some similarity to chemical fingerprint methods but is simpler in operation than the other methods [28]. Theoretically, the MVA-supported NIR technique could merge data arising from different chemical methods, such as GC-MS, HPLC/HPLC−MS, HPCE, and TLC, to create one large, complex model, which would resemble a multidimensional fingerprint. Generally, the construction of an MCM needs a large number of samples, and the more samples are used, the more robust the model becomes; once it is established, however, it can save much time and make the work more cost effective.

Data Availability

The data used to support this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Authors’ Contributions

Lan-Ping Guo and Jian Yang contributed equally to this work.


Acknowledgments

This research was supported by the National Key R & D Program of China (no. 2017YFC1700701), National Natural Science Foundation of China (nos. 81891014 and 81603241), and the Fundamental Research Funds for the Central Public Welfare Research Institutes (no. ZZXT201906).