Abstract

The multivariate calibration methods principal component regression (PCR) and partial least squares (PLS) were employed for the prediction of the total phenol content of four Prunella species. High performance liquid chromatography (HPLC) and spectrophotometric approaches were used to determine the total phenol content of the Prunella samples. Several preprocessing techniques such as smoothing, normalization, and column centering were employed to extract the chemically relevant information from the data after alignment with correlation optimized warping (COW). The importance of the preprocessing was investigated by calculating the root mean square error (RMSE) for the calibration set of the total phenol content of Prunella samples. The models developed on the preprocessed data were able to predict the total phenol content with a precision comparable to that of the reference Folin-Ciocalteu method. The PLS model seems preferable because of its predictive and descriptive abilities and the good interpretability of the contribution of compounds to the total phenol content. The multivariate calibration models successfully modelled the total phenol content of the Prunella samples from the HPLC profiles and indicated the peaks responsible for the total phenol content.

1. Introduction

Prunella, which belongs to the family Lamiaceae, appears to be a rich source of plant species containing high amounts of phenolic compounds and anthocyanins [1, 2]. Research on antioxidant compounds in the Lamiaceae family has focused on phenolic diterpenes, flavonoids, and phenolic acids [3–6]. Several studies have shown that Prunella species exhibit high antioxidant potential, which is closely connected with the total phenolic content [7, 8].

The measured antioxidant activity can depend on the extraction solvent, the hydrophilicity of the compounds, the sample, and the type of phenolic compounds, which means that different phenolic compounds react in different ways in antioxidant activity assays. Phenolic compounds are important for antioxidant behaviour and contribute significantly to the total antioxidant activity of medicinal and aromatic samples [9, 10]. The total phenolic content of plants is therefore an important parameter for their antioxidant properties. The Folin-Ciocalteu procedure of Singleton [11] has been used as a measure of total phenolics in natural products for many years. On the other hand, analytical techniques such as gas chromatography-mass spectrometry (GC-MS) [12, 13], high performance liquid chromatography (HPLC) [14], and liquid chromatography-mass spectrometry (LC-MS) [15] have been used to isolate, identify, and determine individual phenolic compounds. However, these instrumental methods are expensive and often not very suitable for routine determinations [16].

Multivariate chemometric methods such as principal component regression (PCR) and partial least squares (PLS) make it possible to extract analytical information from full spectra or chromatograms, so that a large number of signals can be used simultaneously. Moreover, these techniques allow a rapid analytical response with minimum sample preparation, reasonable accuracy and precision, and no preliminary separation step in complex matrices [17, 18]. Multivariate chemometric methods have been used in a wide range of studies on the identification and quality control of herbal medicines [19].

Preprocessing of chromatographic data is necessary for complex mixtures, and chemometrics offers many tools well suited to this task. In particular, a powerful application of multivariate calibration methods requires careful preprocessing of the chromatograms. Therefore, different preprocessing techniques have been applied to the data: baseline correction to enhance the signal-to-noise ratio [20], smoothing [21], normalization of signals to remove undesired effects due to unequal amounts of injected sample and background signals [22], and alignment of chromatograms. The most popular alignment technique is correlation optimized warping (COW) [23–26]. COW does not require peak detection and offers a satisfactory alignment of complex chromatograms. After successful preprocessing, the chromatograms can be analysed further with chemometric methods. Recently, the combination of chromatographic instruments with chemometric approaches for data pretreatment has allowed good and fast investigation of plants [27–30]. For example, the total antioxidant capacity of green tea extracts has been predicted successfully from their chromatograms by multivariate calibration methods.

In the present study, PCR and PLS multivariate calibration models were developed for the prediction of the total phenol content of Prunella samples. The total phenol content of Prunella extracts obtained using 12 solvents was determined by the Folin method. The same extracts were also analysed by HPLC. Multivariate calibration models relating the chromatographic profiles to the total phenolic content of the Prunella extracts were constructed. Correlation optimized warping, smoothing, normalization, and column centering were applied to the analytical signals. Root mean square errors were calculated for the calibration and validation sets as comparison criteria. The results demonstrated that the developed PCR and PLS multivariate calibration models can successfully predict the total phenolic content of Prunella samples. A further aim of this study was to identify, by HPLC-DAD and multivariate calibration methods, the compounds present in the plant samples that are potentially responsible for the total phenol content.

2. Materials and Methods

2.1. Plant Material

Prunella L. species (Prunella vulgaris L., Prunella laciniata (L.) L., Prunella grandiflora L., and Prunella orientalis Bornm.) were collected from different localities in Turkey during June-July 2009: Prunella vulgaris L. (Bursa-Keles), Prunella laciniata (L.) L. (Bursa: Çalı-İnegazi), Prunella grandiflora (L.) (Balıkesir: Edremit, Kazdağı), and Prunella orientalis Bornm. (Antalya: Kemer).

2.2. Reagents and Reference Standards

Rosmarinic acid was purchased from Sigma-Aldrich (St. Louis, USA). The Folin-Ciocalteu phenol reagent, gallic acid, quercetin, kaempferol, and rutin were purchased from Sigma and were used without further purification. Analytical-grade hydrochloric acid; HPLC-grade methanol, butanol, ethyl acetate, acetonitrile, hexane, and formic acid; and caffeic acid were purchased from Merck (Darmstadt, Germany).

2.3. Preparation of Extracts

The collected samples were dried at room temperature and stored at 4°C. Whole-plant Prunella samples (1 g) were separately blended with either water or an organic solvent (methanol, butanol, ethyl acetate, acetonitrile, or hexane) containing 80 mg of ascorbic acid as an antioxidant and 10 mL of 6 M HCl (final concentration 1.2 M HCl), and stirred magnetically at room temperature in the dark for 8 h. The samples were treated with nitrogen gas before extraction. The extraction was also performed without acid hydrolysis using the same solvents. The hydrolysed/unhydrolysed samples (total volume 50 mL) were separated from the solid matrix by filtration through sheets of qualitative filter paper. Butanol and hexane extracts were evaporated to dryness, and the residue was dissolved in methanol. The extracts were used for the determination of total phenol by the Folin method and for HPLC analysis.

2.4. HPLC Analysis

The HPLC analysis of Prunella extracts was carried out according to the procedure reported in the literature [31].

2.5. UV-Vis Spectroscopy Analysis
2.5.1. The Folin-Ciocalteu Method

The total phenolic content by the Folin-Ciocalteu reagent was determined according to the procedure reported in the literature [32]. Total phenols were expressed as mg of gallic acid equivalent (GAE) per g of dried weight.

2.6. Data Set

The data set, containing 96 chromatograms of 48 Prunella extracts, was obtained by HPLC and includes two repeated extractions. The total phenol content of the Prunella extracts was measured by the Folin method. The mean of the replicates was computed, and the resulting 48 chromatograms were warped to correct peak shifts. A representative chromatogram (one in which all peaks are present) was selected as the target chromatogram. After alignment, Whittaker smoothing, normalization, and column centering were applied, in that order, to the 48 chromatograms. All data preprocessing was executed with subroutines developed under Matlab 7.6 software from the Mathworks (Natick, MA, USA). A detailed description of the smoothing and correlation optimized warping methods can be found in the literature [21, 26]. After preprocessing, the data set was divided into a calibration set (36 samples) to build the model and a test set (12 samples) to validate it. The calibration set was selected by uniform sampling of the sorted Folin values. Finally, PCR and PLS regression methods were used to construct the multivariate regression models.
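The calibration/test division described above can be illustrated with a minimal Python sketch. The exact sampling scheme is not given in the text, so the rule used here (sort the samples by response and send every fourth one to the test set, keeping the lowest and highest values in the calibration set) is an assumption, as are the function name `split_by_sorted_response` and the placeholder responses.

```python
def split_by_sorted_response(y, test_every=4, offset=1):
    """Return (calibration_idx, test_idx) by uniform sampling of the sorted responses.

    Assumed scheme: samples are ranked by y and every `test_every`-th rank
    (starting at `offset`) goes to the test set; all others are calibration samples.
    """
    order = sorted(range(len(y)), key=lambda i: y[i])
    calibration, test = [], []
    for rank, idx in enumerate(order):
        if rank % test_every == offset:
            test.append(idx)
        else:
            calibration.append(idx)
    return sorted(calibration), sorted(test)


# Placeholder responses for 48 extracts (NOT the measured Folin values).
folin = [((i * 37) % 48) * 0.5 + 1.0 for i in range(48)]

calibration_idx, test_idx = split_by_sorted_response(folin)
print(len(calibration_idx), len(test_idx))  # 36 12
```

With an offset of 1, the samples with the smallest and largest responses stay in the calibration set, so the test samples are predicted by interpolation rather than extrapolation.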

2.7. Selection of the Optimum Number of Components

To select the optimum number of factors to be used in the PLS and PCR calibration models, a leave-one-out cross-validation procedure was used. In this procedure, each $i$th sample of the data set is left out once, and the model (PLS or PCR) is built from the remaining samples. Then the root mean square error of cross-validation (RMSECV) is computed for PLS and PCR models with different numbers of components [29]:

$$\mathrm{RMSECV} = \sqrt{\frac{\sum_{i=1}^{M} (y_i - \hat{y}_i)^2}{M}}, \tag{1}$$

where $y_i$ is the measured total phenol content of the $i$th sample, $\hat{y}_i$ is the total phenol content predicted from a calibration equation obtained for the data without the $i$th sample, and $M$ is the number of calibration samples. The optimum number of factors of the PLS and PCR models corresponds to the number of factors resulting in the lowest RMSECV.
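The cross-validation loop can be sketched in Python as follows. This is an illustrative outline rather than the Matlab routines used in the study: the function names are hypothetical, and `toy_fit_predict` is a deliberately simple stand-in for a PLS or PCR model (it regresses y on the sum of the first k columns), chosen only so that the example is self-contained.

```python
import math


def rmsecv_loo(X, y, fit_predict, k):
    """Leave-one-out RMSECV for a model of complexity k (e.g. k components)."""
    m = len(y)
    squared_error = 0.0
    for i in range(m):
        X_train = X[:i] + X[i + 1:]          # all samples except the i-th
        y_train = y[:i] + y[i + 1:]
        y_hat = fit_predict(X_train, y_train, X[i], k)
        squared_error += (y[i] - y_hat) ** 2
    return math.sqrt(squared_error / m)


def select_components(X, y, fit_predict, max_k):
    """Return the k with the lowest RMSECV and the RMSECV of every k tried."""
    scores = {k: rmsecv_loo(X, y, fit_predict, k) for k in range(1, max_k + 1)}
    best_k = min(scores, key=scores.get)
    return best_k, scores


def toy_fit_predict(X_train, y_train, x_new, k):
    """Hypothetical stand-in model: straight-line fit of y on the sum of the
    first k columns. A real application would fit PLS or PCR with k components."""
    s = [sum(row[:k]) for row in X_train]
    s_mean = sum(s) / len(s)
    y_mean = sum(y_train) / len(y_train)
    sxx = sum((v - s_mean) ** 2 for v in s)
    sxy = sum((v - s_mean) * (w - y_mean) for v, w in zip(s, y_train))
    slope = sxy / sxx
    return y_mean + slope * (sum(x_new[:k]) - s_mean)


# Toy data: y depends exactly on the first two columns; the third is noise.
X = [[1, 2, 5], [2, 1, -3], [3, 5, 2], [4, 3, -1], [5, 7, 4], [6, 4, 0]]
y = [2 * (row[0] + row[1]) + 1 for row in X]

best_k, scores = select_components(X, y, toy_fit_predict, max_k=3)
print(best_k, scores)
```

Passing the model as a callable keeps the cross-validation logic independent of the regression method, so the same loop can be reused for both PCR and PLS.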

The performance of the calibration model and its prediction ability are measured by the root mean square error (RMSE) obtained on the calibration set and the root mean square error of prediction (RMSEP) obtained on the test set, respectively:

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{M} (y_i - \hat{y}_i)^2}{M}}, \tag{2}$$

$$\mathrm{RMSEP} = \sqrt{\frac{\sum_{i=1}^{M_t} (y_{t,i} - \hat{y}_{t,i})^2}{M_t}}, \tag{3}$$

where $M$ and $M_t$ are the numbers of samples in the calibration and test sets, and $y_{t,i}$ and $\hat{y}_{t,i}$ denote the experimental value and the value predicted by the model for the $i$th sample of the test set.

3. Theory

3.1. Data Preprocessing

Chromatographic profiles can be organized in an 𝑚×𝑛 data matrix X, where the 𝑚 objects (Prunella extracts) constitute the rows and 𝑛 variables (measuring time points) the columns. The results of chemometric data treatment are influenced by the applied preprocessing. In this study, different methods were applied and compared, that is, COW, smoothing, normalization, and column centering for pretreating the data.

Chemometric treatment of the chromatograms requires that all signals are adjusted to the same length and that corresponding variables (such as peak apexes) are placed in the proper columns of the data matrix. Correlation optimized warping aligns chromatograms by piecewise linear stretching and compression of the time axis. At the beginning of the procedure, the profile to be aligned (𝑃) and the target profile (𝑇) are divided into a user-specified number of sections (𝑁). Each section of the profile 𝑃 has its length stretched or shortened by shifting the position of its section end point by a limited number of data points, defined by the slack parameter (𝑡). The slack allows the section end points to shift from −𝑡 to +𝑡 points. For each section of 𝑃, the stretched or shortened sections are interpolated to the length of the corresponding section of 𝑇, and the correlation coefficient between both sections is computed [28–30]. A more detailed description of the COW method can be found in the literature [26, 33]. Column centering is a generally applied preprocessing technique for removing level differences. By subtracting the column mean from each corresponding value, every centered variable has a mean of zero.
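The section-wise step of COW and the column centering described above can be illustrated with a short Python sketch. It is a simplification under stated assumptions: only one section with a fixed start point is warped, and the dynamic programming that full COW uses to combine all sections is omitted. The function names and the synthetic Gaussian peak are hypothetical.

```python
import math


def interpolate_to_length(segment, new_len):
    """Linearly resample `segment` onto `new_len` equally spaced points."""
    old_len = len(segment)
    out = []
    for j in range(new_len):
        pos = j * (old_len - 1) / (new_len - 1)
        lo = int(math.floor(pos))
        hi = min(lo + 1, old_len - 1)
        frac = pos - lo
        out.append(segment[lo] * (1 - frac) + segment[hi] * frac)
    return out


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def best_end_shift(profile, target, start, length, slack):
    """Try end-point shifts from -slack to +slack for ONE section and return
    (shift, correlation) of the best match with the target section."""
    target_section = target[start:start + length]
    best = (0, -2.0)
    for u in range(-slack, slack + 1):
        end = start + length + u
        if end > len(profile) or end - start < 2:
            continue
        warped = interpolate_to_length(profile[start:end], length)
        r = pearson(warped, target_section)
        if r > best[1]:
            best = (u, r)
    return best


def column_center(X):
    """Subtract the column mean from every element of each column."""
    m = len(X)
    means = [sum(col) / m for col in zip(*X)]
    return [[v - mu for v, mu in zip(row, means)] for row in X], means


def peak(x):
    """Synthetic Gaussian peak used as placeholder chromatographic signal."""
    return math.exp(-((x - 15.0) / 3.0) ** 2)


target = [peak(j) for j in range(30)]
profile = [peak(j * 29 / 32) for j in range(40)]   # same peak, stretched in time

shift, corr = best_end_shift(profile, target, start=0, length=30, slack=5)
centered, col_means = column_center([target, profile[:30]])
print(shift, round(corr, 4))
```

In the synthetic example the profile section is three points longer than the target section, so an end-point shift of +3 restores the alignment.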

Data smoothing techniques are used to eliminate noise and extract real trends and patterns. One such technique, Whittaker smoothing, was applied to the noisy chromatograms in this study [21].
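A minimal Python sketch of a Whittaker smoother is given below for illustration. It minimizes the sum of squared residuals plus a roughness penalty weighted by a smoothing parameter. The order of the difference penalty and the value of the smoothing parameter are not stated in the text, so first-order differences (which lead to a tridiagonal system) and the value `lam=10.0` are assumptions, as is the function name.

```python
def whittaker_smooth(y, lam):
    """Whittaker smoother with a first-order difference penalty (assumed order).

    Solves (I + lam * D'D) z = y, where D is the first-difference matrix,
    using the Thomas algorithm for tridiagonal systems.
    """
    n = len(y)
    if n < 2:
        return list(y)
    # D'D has diagonal [1, 2, ..., 2, 1] and off-diagonals -1.
    diag = [1.0 + lam * (1.0 if i in (0, n - 1) else 2.0) for i in range(n)]
    off = -lam
    c = [0.0] * n   # modified upper diagonal
    d = [0.0] * n   # modified right-hand side
    c[0] = off / diag[0]
    d[0] = y[0] / diag[0]
    for i in range(1, n):                       # forward elimination
        denom = diag[i] - off * c[i - 1]
        c[i] = off / denom
        d[i] = (y[i] - off * d[i - 1]) / denom
    z = [0.0] * n
    z[-1] = d[-1]
    for i in range(n - 2, -1, -1):              # back substitution
        z[i] = d[i] - c[i] * z[i + 1]
    return z


def roughness(signal):
    """Sum of squared first differences."""
    return sum((b - a) ** 2 for a, b in zip(signal, signal[1:]))


# Placeholder noisy signal (not a measured chromatogram).
noisy = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
smooth = whittaker_smooth(noisy, lam=10.0)
print(roughness(noisy), roughness(smooth))
```

A larger smoothing parameter gives a smoother result at the cost of fidelity to the data, and a value of zero returns the input unchanged.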

3.2. Multivariate Linear Regression
3.2.1. Principal Component Regression (PCR)

Principal components are primarily abstract mathematical entities; further details are described in the literature [34]. In multivariate calibration, the aim is to convert these to compound concentrations. PCR does this by regression: it uses the latent variables (principal component scores) created by principal component analysis as predictors in a multivariate linear regression model for the concentration.
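The two stages of PCR, extraction of principal components followed by regression on their scores, can be sketched in Python as follows. This is an illustrative outline and not the implementation used in the study: the components are obtained by power iteration with deflation, which is an assumption made to keep the example free of external libraries, and all function names and the toy data are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def leading_eigenvector(C, n_iter=500):
    """Power iteration for the dominant eigenvector of a symmetric matrix."""
    v = [1.0 / (j + 1) for j in range(len(C))]
    for _ in range(n_iter):
        w = [dot(row, v) for row in C]
        norm = math.sqrt(dot(w, w))
        if norm < 1e-12:
            return None, 0.0
        v = [x / norm for x in w]
    return v, dot(v, [dot(row, v) for row in C])


def pcr_fit(X, y, n_components):
    """Fit PCR: PCA loadings of centered X, then regression of y on the scores."""
    m, n = len(X), len(X[0])
    x_mean = [sum(col) / m for col in zip(*X)]
    y_mean = sum(y) / m
    Xc = [[v - mu for v, mu in zip(row, x_mean)] for row in X]
    yc = [v - y_mean for v in y]
    cols = list(zip(*Xc))
    C = [[dot(ci, cj) for cj in cols] for ci in cols]   # X'X of centered data
    loadings, coefs = [], []
    for _ in range(n_components):
        v, eigval = leading_eigenvector(C)
        if v is None or eigval < 1e-12:
            break                                       # no variance left
        t = [dot(row, v) for row in Xc]                 # scores of this component
        coefs.append(dot(t, yc) / dot(t, t))            # scores are orthogonal
        loadings.append(v)
        C = [[C[i][j] - eigval * v[i] * v[j] for j in range(n)] for i in range(n)]
    return {"x_mean": x_mean, "y_mean": y_mean, "loadings": loadings, "coefs": coefs}


def pcr_predict(model, x):
    xc = [v - mu for v, mu in zip(x, model["x_mean"])]
    return model["y_mean"] + sum(
        b * dot(xc, v) for b, v in zip(model["coefs"], model["loadings"])
    )


# Toy data with an exactly linear response: y = 3*x0 - 2*x1 + 1.
X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6]]
y = [3 * a - 2 * b + 1 for a, b in X]

model = pcr_fit(X, y, n_components=2)
prediction = pcr_predict(model, [6, 5])
print(prediction)
```

Because the scores of different principal components are orthogonal, each regression coefficient can be computed separately; with all components retained, PCR coincides with ordinary least squares on this toy data.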

3.2.2. Partial Least Squares (PLS)

Partial least squares is often presented as the major regression technique for multivariate data, expressing the relation between X and y. PLS uses the nonlinear iterative partial least squares (NIPALS) algorithm [22, 28–30].
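For a single response vector y, the NIPALS steps can be sketched in Python as below. This is a textbook-style PLS1 outline given for illustration only, not the Matlab code used in the study; the function names and the toy data are hypothetical. Each component is built from a weight vector proportional to the covariance of the X residuals with the y residuals, after which both are deflated.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def pls1_fit(X, y, n_components):
    """PLS1 by NIPALS: weights w, X loadings p, and y loadings q per component."""
    m = len(X)
    x_mean = [sum(col) / m for col in zip(*X)]
    y_mean = sum(y) / m
    E = [[v - mu for v, mu in zip(row, x_mean)] for row in X]   # X residuals
    f = [v - y_mean for v in y]                                 # y residuals
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = [dot(col, f) for col in zip(*E)]        # direction of max covariance
        norm = math.sqrt(dot(w, w))
        if norm < 1e-12:
            break                                   # nothing left to model
        w = [v / norm for v in w]
        t = [dot(row, w) for row in E]              # scores
        tt = dot(t, t)
        p = [dot(col, t) / tt for col in zip(*E)]   # X loadings
        q = dot(f, t) / tt                          # y loading
        E = [[v - ti * pj for v, pj in zip(row, p)] for row, ti in zip(E, t)]
        f = [v - q * ti for v, ti in zip(f, t)]
        W.append(w)
        P.append(p)
        Q.append(q)
    return {"x_mean": x_mean, "y_mean": y_mean, "W": W, "P": P, "Q": Q}


def pls1_predict(model, x):
    """Predict by passing a new sample through the same deflation steps."""
    e = [v - mu for v, mu in zip(x, model["x_mean"])]
    y_hat = model["y_mean"]
    for w, p, q in zip(model["W"], model["P"], model["Q"]):
        t = dot(e, w)
        y_hat += q * t
        e = [v - t * pj for v, pj in zip(e, p)]
    return y_hat


# Toy data with an exactly linear response: y = 3*x0 - 2*x1 + 1.
X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6]]
y = [3 * a - 2 * b + 1 for a, b in X]

model = pls1_fit(X, y, n_components=2)
prediction = pls1_predict(model, [6, 5])
print(prediction)
```

The loading vectors stored in `P` correspond to the kind of loadings that are inspected along the retention-time axis when the contribution of individual peaks to the response is interpreted.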

4. Results and Discussion

4.1. Total Phenol Contents of Prunella Species

Total phenol contents of the Prunella species were determined by the Folin method using pure water, methanol, ethyl acetate, acetonitrile, butanol, and hexane and their acidic solutions. In general, solvents with relatively lower polarity, except hexane and acidic hexane, were more efficient for extracting phenolic compounds from Prunella species (Table 1). On the other hand, pure solvents with higher polarity and no acid extracted significantly higher amounts of phenolic compounds than nonpolar solvents. The most suitable solvent for the extraction of phenolic compounds appeared to be acidic ethyl acetate for all species. Acidic solvent extraction is particularly suitable for flavonoid extraction because many phenolic compounds occur as glycosides or esters. Whereas the total phenol content determined by the Folin method ranged from 1.00 to 24.63 mg GAE per g dried sample for the pure solvent extracts, that of the acidic extracts ranged from 1.24 to 87.33 mg GAE per g dried sample. For each solvent, the total phenol contents determined by the Folin method are similar among the four Prunella species. Among the nonacidic extracts, the water extract of Prunella grandiflora has the highest total phenol content, whereas the hexane extracts of the Prunella species have the lowest. Among the acidic extracts, the acidic ethyl acetate extract of Prunella vulgaris has the highest total phenol content and the acidic hexane extract of Prunella vulgaris the lowest. The values obtained with the Folin method indicate that the order of solvent efficiency is acidic ethyl acetate > acidic acetonitrile > acidic methanol > acidic butanol > acidic water > water > butanol > methanol > acetonitrile > ethyl acetate > acidic hexane > hexane. These results are higher than those reported by Li et al. [35] and Cai et al. [36], who compared water and methanol for the extraction of phenolic compounds from Prunella vulgaris. The major phenolic compounds reported in Prunella vulgaris are flavonols (rutin), anthocyanins (cyanidin and delphinidin), phenolic acids (caffeic acid), and tannins. In addition, rosmarinic acid has been determined in high amounts in Prunella vulgaris [37]. No detailed investigation of the phenolic content of Prunella laciniata (L.) L., Prunella orientalis Bornm., and Prunella grandiflora L. is available in the literature.

4.2. PCR and PLS Calibration

Multivariate calibration models were built with a data matrix X consisting of the 36 chromatograms and a response vector y containing the total phenol contents. The data were divided into a calibration set and a test set because the data set is large enough. PCR and PLS calibration models were obtained from the calibration sets described in Section 2.6. The predictive performance of the two methods was assessed with the RMSE (Table 2). It can be seen that PLS has much better prediction ability than PCR. The different preprocessing methods, that is, smoothing, normalization, and column centering, also improved the prediction for the calibration set.

When retention times shift between chromatograms, the corresponding peaks must be aligned. Therefore, correlation optimized warping was performed to align the chromatograms. It was found that column centering gave the best results for the calibration set, with the lowest RMSEP. All other results were obtained with data preprocessed in this way. For PCR calibration, the RMSE of the column centered data is similar to the RMSE of data preprocessed with smoothing, normalization, and column centering. For PLS calibration, however, the RMSE of the column centered data is smaller than the RMSE of data preprocessed with smoothing, normalization, and column centering. Both column centering alone and smoothing followed by normalization and column centering were successfully applied to the warped HPLC chromatograms. The PLS model allows better prediction than the PCR model for column centered data (Figure 1). The results indicate that the total phenol contents predicted by the PLS model are close to the true total phenol contents for each plant extract, which supports the validity of the calibration model (Table 1).

PLS components were evaluated to verify whether the extracts with high total phenol content could be distinguished on the score plot. In the score plot (Figure 2) of the HPLC profiles obtained with the PLS model, the samples were clustered into three groups according to their total phenol content. The proximity of the total phenol contents of the hydrolysed/unhydrolysed samples on the score plot distinguishes these groups, which contain (a) acidic ethyl acetate extracts (samples 8-17-26-35); (b) acidic acetonitrile extracts (samples 7-16-25-34); and (c) methanol (samples 1-10-19-28), acetonitrile (samples 2-11-20-29), ethyl acetate (samples 3-12-21-30), hexane (samples 4-13-22-31), acidic water (samples 5-14-23-32), acidic butanol (samples 6-15-24-33), and acidic hexane (samples 9-18-27-36) extracts of the four Prunella species. These groups provided additional information about the Prunella species. Groups (a) and (b) suggest that the extracts with high total phenol contents fall in the same groups. The three groups also show that extracts of the Prunella species prepared with the same solvent have similar total phenol contents.

One of the major aims of this study is to identify the compounds in the plant extracts that are potentially responsible for the total phenol content of the samples. The PLS loadings were evaluated to investigate the contribution of the phenolic compounds to the total phenol content (Figure 3). Peaks with negative loadings correspond to compounds associated with high total phenol content. Positive loadings represent compounds that behave in the opposite way to the total phenol content. The peaks at 9.54, 24.52, and 25.40 min, which have negative PLS loadings, contribute to the total phenol content of the samples. The phenolic compounds that can be present in Prunella species, that is, rosmarinic acid, caffeic acid, rutin, quercetin, and kaempferol, are illustrated in Figure 4. All five compounds were observed in the Prunella samples in smaller or larger amounts. Rosmarinic acid (peak at 25.40 min) has a negative PLS loading (Figure 3). On the other hand, caffeic acid (peak at 19.80 min), rutin (peak at 22.60 min), quercetin (peak at 27.80 min), and kaempferol (peak at 29.18 min) have positive PLS loadings. The other peaks with positive PLS loadings (at 5.16, 5.40, 6.14, 6.73, 9.03, 9.53, 14.49, 25.16, 25.72, 27.2, 30.92, and 32.15 min) possibly correspond to compounds that reduce the total phenol content. The concentration of these compounds increases as the total phenol content of the plant extract decreases. The peaks with positive PLS loadings are largely present in the Prunella species with low total phenol content.

The individual contribution of phenolic compounds to the total phenol content was investigated in order to explain the positive and negative PLS loadings. The total phenol content of individual phenolic compounds and mixtures at 1.66 mM ranged between 0.267 μmol GAE and 0.331 μmol GAE by the Folin method (Table 3). The differences between the total phenol contents of the individual phenolic compounds added to the extract could be explained by the number and position of substituted hydroxyl or methoxyl groups, by glycosylation around the flavonoid skeleton, and by interactions between the phenolic compounds in the extract and the added standard phenolic compound. The Folin method is based on the chemical oxidation of reduced molecules by a mixture of two strong inorganic oxidants, phosphotungstic and phosphomolybdic acids. The results show that when the concentration of rosmarinic acid in the extract increases, the total phenol content of the extract also increases, in line with its negative PLS loading. In contrast, when the concentrations of rutin, quercetin, caffeic acid, and kaempferol in the extract decrease, the total phenol content of the extract also decreases, in line with their positive PLS loadings.

5. Conclusions

In summary, PCR and PLS calibration models were constructed to model the total phenol content of the Prunella samples from the HPLC profiles and to indicate the peaks responsible for the total phenol content. The effectiveness of the PCR and PLS models for predicting the total phenol content from the chromatograms of the plant extracts was described. PLS calibration has some advantages over PCR and gives slightly better predictions. By comparing the peaks in the chromatograms with the loadings of the calibration models, the peaks responsible for the total phenol content can be indicated after alignment by correlation optimized warping, normalization, smoothing, and column centering. In this study, PLS resulted in a better model for predicting the total phenol content of Prunella extracts from chromatograms, and the contribution of each compound to the total phenol content is easy to interpret. The peak with the major negative PLS loading, identified by HPLC as rosmarinic acid, is responsible for the total phenol content of the Prunella extracts. Three further peaks were indicated by their retention times as potentially interesting compounds with negative and positive PLS loadings.

Acknowledgment

The authors are thankful to Uludag University Research Foundation (Project no. 2009/38) for providing financial support for this study.