Abstract

Nondestructive, highly sensitive analysis of a low content of an active pharmaceutical ingredient (API) is difficult, especially in complex pharmaceutical formulations. In this paper, a rapid method requiring no sample preparation was developed that used a 1064 nm Raman spectrometer to detect entecavir monohydrate (ETV-H) in Baraclude tablets. Entecavir is an FDA-approved drug for the treatment of chronic hepatitis B and has become the first choice on the market. Comparison of excitation wavelengths showed that the signal-to-background ratio of the Raman spectrum at 1064 nm was 14 times that at the commonly used 785 nm. Partial least squares (PLS) regression was used to calibrate concentration models on calibration samples containing 0.1% to 1.0% (w/w) ETV-H. Different preprocessing methods were applied to eliminate background interference and extract more spectral information, and the calibration samples were used to choose the best-performing model. All the calibration samples, combined with the parameters of the best-performing model, were then used to successfully predict the ETV-H content in Baraclude tablets. Combining baseline correction and standard normal variate (SNV) pretreatment with PLS, the model gave an R² of 0.973, an RMSEC of 0.05%, and an RMSEP of 0.03% in the spectral region of 1350–1700 cm−1. The limit of detection of this model was 0.17%. These results show that 1064 nm Raman spectroscopy could be an alternative analytical procedure for quantifying low-content API in intact tablets.

1. Introduction

In 2005, Baraclude (entecavir) tablets were approved by the FDA for the treatment of chronic hepatitis B, which has infected more than 400 million people worldwide [1]. In FDA studies, patients treated with Baraclude showed significant improvement in the liver inflammation and liver scarring caused by hepatitis B virus (HBV). Entecavir (ETV) has therefore become the first choice for hepatitis B treatment on the market [2]. ETV is a carbocyclic 2′-deoxyguanosine analogue that can be phosphorylated to its triphosphate form, which inhibits HBV in active cells. The chemical name of ETV is 2-amino-1,9-dihydro-9-[(1S,3R,4S)-4-hydroxy-3-(hydroxymethyl)-2-methylenecyclopentyl]-6H-purin-6-one. ETV monohydrate (ETV-H) is the API of commercial ETV tablets.

In the pharmaceutical industry, quality control (QC) and quality assurance (QA) are necessary at all stages of production, from raw materials to packaged products. During production, the final drug product might contain an insufficient or excessive amount of API [2, 3]. QC of the API content in pharmaceutical tablets is therefore necessary to ensure the safety and efficacy of drug products [4–6]. The FDA describes low-content API formulations as those in which the API accounts for less than 1% of the formulation [7]. A possible problem for such formulations is low relative potency of the final product, because the API is vulnerable to loss and contamination during production [8]. QC is therefore especially important for low-content API formulations. Traditional QC methods such as high-performance liquid chromatography (HPLC), solid-state NMR spectroscopy, mass spectrometry (MS), and X-ray powder diffraction are time-consuming, destructive, and expensive, and they require lengthy sample preparation [9–13]. Motivated by the process analytical technology (PAT) initiative, QC techniques are being developed to provide real-time process control and a better understanding of the chemical and physical processes during production. Vibrational spectroscopic techniques such as near-infrared spectroscopy (NIRS) and Raman spectroscopy have emerged as valuable tools for pharmaceutical quality analysis because they are fast, require no sample preparation, and do not damage the sample [14–17]. Owing to its higher spectral resolution compared with NIR, Raman spectroscopy has become the more promising QC technique in the pharmaceutical industry [18–20]. Griffen et al. reported quantification of low-level polymorph content (0.62–1.32%) in tablets by transmission Raman spectroscopy [21]. Li et al. demonstrated low-content (<0.1%) quantification in powders using 785 nm Raman spectroscopy [22].
Assi et al. studied the application of a handheld Raman spectrometer to the quantification of ciprofloxacin in proprietary Ciproxin tablets and generic ciprofloxacin tablets [23]. Daniela et al. studied the potential of Raman spectroscopy as a PAT tool for in-line, real-time monitoring of the powder blending process, showed that Raman spectroscopy was effective in determining the API in tablets, and found that it performed better than traditional HPLC analysis [24]. Liljana et al. compared mid-infrared, near-infrared, and Raman spectroscopy for the quantitative analysis of low-content API in powder blends and found that Raman gave the most favorable statistical indicators in this comparison [25].

Because the Raman signal is proportional to the concentration of the analyte in the sample, univariate regression (peak height or peak area) is theoretically feasible [26, 27]. However, in the quantitative analysis of most tablets, isolated peaks suitable for quantification are difficult to find because of interference from the other tablet components, so more complex multivariate methods such as partial least-squares (PLS) regression and principal component analysis (PCA) are required. Multivariate methods can handle thousands of variables and thereby improve accuracy, because they use not only the intensity or area of selected bands but also the variation over the entire spectral range. Raman spectroscopy combined with multivariate algorithms therefore provides a promising basis for developing quantitative models, even for fairly complex mixtures or components present at low content. Farias et al. applied Raman spectroscopy combined with PLS to determine and quantify crystalline forms of the API in final products obtained from production lines [28]. Maxwell et al. reported chemometric models (PLS and principal component regression, PCR) of Raman spectra for determining polymorphic changes of theophylline in pharmaceutical products [29].
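As a minimal illustration of the univariate (peak-height) approach mentioned above, the sketch below fits a straight calibration line by least squares and inverts it to estimate a concentration. The function names and the synthetic numbers are hypothetical and are not data from this study.

```python
import numpy as np

def fit_univariate(concentrations, peak_heights):
    """Least-squares line: peak height = slope * concentration + intercept."""
    slope, intercept = np.polyfit(concentrations, peak_heights, deg=1)
    return slope, intercept

def invert_univariate(peak_height, slope, intercept):
    """Estimate concentration from a measured peak height by inverting the line."""
    return (peak_height - intercept) / slope

# Hypothetical, noise-free calibration data (% w/w versus arbitrary intensity units)
conc = np.array([0.1, 0.2, 0.4, 0.6, 0.8, 1.0])
heights = 50.0 * conc + 3.0
slope, intercept = fit_univariate(conc, heights)
print(round(slope, 3), round(intercept, 3))                 # 50.0 3.0
print(round(invert_univariate(28.0, slope, intercept), 3))  # 0.5
```

In a real tablet, overlapping excipient bands would bias such a single-peak line, which is why the study turns to multivariate methods.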

Before quantitative models of APIs in tablets are built, the data must be pretreated to remove interfering information from the Raman spectra. Baseline correction is commonly used in Raman spectroscopy to remove fluorescence interference and instrumental effects. Three other commonly used preprocessing methods are multiplicative scatter correction (MSC), standard normal variate (SNV), and Savitzky–Golay (SG) derivatives. MSC removes scattering artifacts, and SNV removes scattering variations between measurements [30, 31]. SG derivatives are used to suppress noise and background variations [32].
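The two scatter-correction transforms named above can be sketched in a few lines of NumPy. This is a minimal sketch of the textbook SNV and MSC definitions, not the authors' MATLAB code; the function names, the use of the sample standard deviation (ddof=1), and the mean spectrum as the MSC reference are assumptions.

```python
import numpy as np

def snv(spectra):
    """Standard normal variate: centre and scale each spectrum (row) on its own."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)  # ddof=1 is an assumption
    return (spectra - mean) / std

def msc(spectra, reference=None):
    """Multiplicative scatter correction against a reference spectrum.

    Each spectrum is regressed on the reference (row ~ a + b * ref) and
    corrected as (row - a) / b. The mean spectrum is used if no reference is given.
    """
    spectra = np.asarray(spectra, dtype=float)
    ref = spectra.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    corrected = np.empty_like(spectra)
    for i, row in enumerate(spectra):
        b, a = np.polyfit(ref, row, deg=1)  # slope b, offset a
        corrected[i] = (row - a) / b
    return corrected

# Two hypothetical spectra that differ only by a multiplicative scatter factor
X = np.array([[1.0, 2.0, 3.0, 4.0],
              [2.0, 4.0, 6.0, 8.0]])
print(np.allclose(snv(X)[0], snv(X)[1]))  # True
print(np.allclose(msc(X)[0], msc(X)[1]))  # True
```

SNV needs no reference and treats each spectrum independently, whereas MSC depends on the chosen reference, so MSC results can change when samples are added to or removed from the set.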

In the present study, Raman spectroscopy with a 1064 nm excitation wavelength was used for the first time to quantify low-content ETV-H in tablets. Calibration samples with different ETV-H concentrations were prepared and measured by Raman spectroscopy. Pretreatment methods (baseline correction, MSC, SNV, and Savitzky–Golay first and second derivatives) were applied to the data, which were then used to build a quantitative model for detecting the low-content ETV-H in marketed Baraclude tablets.

2. Materials and Methods

2.1. Materials

ETV-H (purity > 99.9%) was obtained from Zhejiang Ausun Pharmaceutical Co., Ltd. Lactose monohydrate (Lac-H), microcrystalline cellulose (MCC), crospovidone (PVPP), polyvinylpyrrolidone K30 (PVP K30), and magnesium stearate (MgSt) were purchased from Aladdin Ltd. (Shanghai, China). Baraclude® (entecavir, Bristol Myers Squibb Company) was purchased from a local pharmacy.

2.2. Preparation of Calibration and Test Set Samples

Because commercially available Baraclude tablets have an API content of only 0.25% (w/w), the calibration set was designed to cover ETV-H mass concentrations of 0.1–1.0% (w/w). The content of each component in the calibration set samples is shown in Table 1. All components were passed through a 400-mesh sieve (aperture 38 μm) for homogeneity and weighed according to the quantities listed in the table. To ensure thorough mixing, the total weight of each concentration batch was increased to 2000 mg, and the API and all excipients were mixed using an MX-S vortex oscillator. Each batch was divided into ten tablets, and each tablet (200 mg) was pressed using a YP-15 manual powder compactor (Josvok Technology Co., Ltd., Tianjin, China) with a 10 mm die set. The compaction pressure was 25 MPa, with a dwell time of 1 min. Three tablets of each concentration were measured by Raman spectroscopy.

The test set contained Baraclude tablets of two strengths (0.5 mg and 1.0 mg ETV-H per tablet, both with an ETV-H concentration of 0.25% w/w) and homemade tablets with an ETV-H mass concentration of 0.5% (w/w). Two tablets of each test sample were measured by Raman spectroscopy.

2.3. Instrumentation and Software

The Raman spectra were collected using a Rigaku Progeny handheld Raman spectrometer (Rigaku Co., Tokyo, Japan) with a 1064 nm high-power excitation laser. The laser provides a maximum power of 490 mW at the source, of which 142 mW reached the sample. The focused spot diameter was 25 μm. Raman spectra were recorded over the wave number range 200–2500 cm−1 at a resolution of 8–11 cm−1 using transmission volume phase gratings.

A LabRAM HR Evolution Raman spectrometer (Horiba Jobin Yvon Inc.) with 532 nm, 633 nm, and 785 nm excitation wavelengths was also used to obtain Raman spectra of ETV-H. The laser output power was set to its maximum of 10 mW, and the integration time was 10 s over the Raman shift range of 200–4000 cm−1.

Multivariate data analysis was carried out using MATLAB R2017b (MathWorks Inc., Natick, MA, USA).

2.4. Spectral Pretreatments and Chemometrics

Before quantitative analysis, a wave number range that is informative for predicting the analyte while excluding noise had to be selected to obtain better analysis and prediction methods. This study used the correlation coefficient method for this purpose, which calculates the correlation between the ETV-H concentration in the tablets and the Raman intensity at each wave number. The correlation coefficient is calculated by

$$r_j = \frac{\sum_{i=1}^{n}\left(x_{i,j}-\bar{x}_j\right)\left(y_i-\bar{y}\right)}{\sqrt{\sum_{i=1}^{n}\left(x_{i,j}-\bar{x}_j\right)^2}\,\sqrt{\sum_{i=1}^{n}\left(y_i-\bar{y}\right)^2}} \tag{1}$$

where $j$ denotes the $j$th wave number and $i$ the $i$th sample, $x_{i,j}$ is the Raman intensity of the $i$th sample at the $j$th wave number, $\bar{x}_j$ is the mean intensity at the $j$th wave number, $\bar{y}$ is the mean reference value of all samples, $y_i$ is the reference value of the $i$th sample, and $n$ is the number of samples.
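A minimal NumPy sketch of the correlation coefficient method described above, computing the Pearson coefficient $r_j$ for every wave number at once from mean-centred intensities and reference values. The function name and the toy data are hypothetical; the study itself used MATLAB, and its implementation may differ.

```python
import numpy as np

def correlation_spectrum(X, y):
    """Pearson r_j between column j of X (samples x wave numbers) and reference y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)   # x_{i,j} minus the mean at wave number j
    yc = y - y.mean()         # y_i minus the mean reference value
    numerator = Xc.T @ yc
    denominator = np.sqrt((Xc ** 2).sum(axis=0)) * np.sqrt((yc ** 2).sum())
    return numerator / denominator

# Hypothetical data: 4 samples, 3 wave numbers
y = np.array([1.0, 2.0, 3.0, 4.0])
X = np.column_stack([2.0 * y + 1.0, -y, [1.0, -1.0, -1.0, 1.0]])
r = correlation_spectrum(X, y)
print(np.round(r, 3))
```

Wave numbers whose |r_j| stays high across a contiguous region are candidates for the modeling range; the threshold used to delimit that region is a separate choice that the text does not specify.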

After the wave number range was selected, the Raman data were pretreated with the following preprocessing methods.

Baseline correction was performed by polynomial fitting. Savitzky–Golay first and second derivatives, standard normal variate (SNV), and multiplicative scatter correction (MSC) were then applied before quantitative analysis. The Savitzky–Golay first and second derivatives were computed with a window size of 10 points and a second-order polynomial. For validation, the calibration samples were randomly divided into two sets: a calibration set and a validation set.
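The polynomial baseline and Savitzky–Golay steps can be sketched as follows. The paper does not specify the polynomial degree or the fitting scheme, so the iterative "modified polynomial" variant used here (peaks are progressively clipped down to the current fit so that the polynomial settles under the Raman bands) is an assumption, as are the function names and defaults. The sketch also requires an odd window (for example 11 points), because this symmetric Savitzky–Golay formulation evaluates the fit at the centre of a 2m + 1 point window; the study reports a 10-point window.

```python
import numpy as np
from math import factorial

def polynomial_baseline(spectrum, degree=3, n_iter=100):
    """Iterative (modified) polynomial baseline removal; returns the corrected spectrum."""
    y = np.asarray(spectrum, dtype=float)
    x = np.arange(y.size, dtype=float)
    work = y.copy()
    fit = np.zeros_like(y)
    for _ in range(n_iter):
        fit = np.polyval(np.polyfit(x, work, degree), x)
        work = np.minimum(work, fit)  # clip peaks down to the current polynomial
    return y - fit

def savgol_derivative(spectrum, window=11, polyorder=2, deriv=1):
    """Savitzky-Golay derivative per data point (odd window, edge-padded ends)."""
    if window % 2 == 0:
        raise ValueError("this sketch expects an odd window length")
    half = window // 2
    z = np.arange(-half, half + 1, dtype=float)
    A = np.vander(z, polyorder + 1, increasing=True)  # columns 1, z, z**2, ...
    # Row `deriv` of the pseudo-inverse gives the least-squares polynomial
    # coefficient of z**deriv; multiplying by deriv! gives the derivative at z = 0.
    coeffs = np.linalg.pinv(A)[deriv] * factorial(deriv)
    padded = np.pad(np.asarray(spectrum, dtype=float), half, mode="edge")
    # Sliding window: out[k] = sum_j coeffs[j] * padded[k + j]
    return np.correlate(padded, coeffs, mode="valid")
```

Edge padding keeps the output the same length as the input, but the first and last `window // 2` points are only approximate; SciPy's `scipy.signal.savgol_filter` offers a maintained alternative with several edge modes.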

PLS regression was used for quantitative modeling. To avoid overfitting, the most suitable PLS model should have few PLS latent variables (factors) and low values of three parameters: the root-mean-square error of cross validation (RMSECV), the root-mean-square error of calibration (RMSEC), and the root-mean-square error of prediction (RMSEP). The RMSE is defined as

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i - y_i\right)^2} \tag{2}$$

where $x_i$ is the measured value, $y_i$ is the predicted value, and $n$ is the number of samples [33].
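The RMSE is a single formula; RMSEC, RMSECV, and RMSEP differ only in which samples supply the predicted values (calibration fit, cross-validation, or an independent prediction set). A minimal sketch, with a hypothetical function name and hypothetical values:

```python
import numpy as np

def rmse(measured, predicted):
    """Root-mean-square error between measured x_i and predicted y_i values."""
    measured = np.asarray(measured, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((measured - predicted) ** 2)))

# Hypothetical reference and predicted ETV-H contents (% w/w)
reference = [0.10, 0.20, 0.40]
predicted = [0.12, 0.18, 0.40]
print(round(rmse(reference, predicted), 4))  # 0.0163
```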

The limit of detection (LOD) of the calibration model was defined as [21]

$$\mathrm{LOD} = \frac{3.3\,\sigma}{S} \tag{3}$$

where σ is the standard deviation of the regression fit and S is the slope of the calibration curve.
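A minimal sketch of the LOD calculation, assuming the common ICH Q2 form LOD = 3.3σ/S. The constant 3.3 is an assumption here; it is consistent with the reported figures, because σ = 0.05% and a slope close to 1 give about 0.17%. The function names and the slope of 1.0 in the example are hypothetical.

```python
import numpy as np

def calibration_slope(reference, predicted):
    """Slope S of the least-squares line of predicted versus reference values."""
    slope, _intercept = np.polyfit(reference, predicted, deg=1)
    return float(slope)

def limit_of_detection(sigma, slope, k=3.3):
    """LOD = k * sigma / S, with k = 3.3 assumed (ICH Q2 convention)."""
    return k * sigma / slope

# sigma = 0.05 % (w/w), the reported RMSEC; the slope of 1.0 is hypothetical
print(round(limit_of_detection(0.05, 1.0), 3))  # 0.165
```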

3. Results and Discussion

3.1. The Determination and Selection of Different Raman Wavelengths

The ETV-H solid powder was measured at different Raman excitation wavelengths. As shown in Figure 1, the spectrum excited at 1064 nm had a higher signal-to-background ratio (S/B) and less fluorescence background interference than the spectra excited at 785 nm, 633 nm, and 532 nm. At 1487 cm−1 (the strongest peak in the spectrum), the S/B was 0.1 for 532 nm excitation, 1.0 for 633 nm, 1.9 for 785 nm, and 27.3 for 1064 nm. The S/B at 1064 nm was thus about 14 times that at the commonly used 785 nm wavelength. Therefore, 1064 nm excitation was used in this work to quantify ETV-H in tablets.
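The factor of about 14 follows directly from the S/B values measured at 1487 cm−1:

$$\frac{(S/B)_{1064\,\mathrm{nm}}}{(S/B)_{785\,\mathrm{nm}}} = \frac{27.3}{1.9} \approx 14.4$$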

3.2. Chemometric Models for Quantitation of Baraclude Tablets

The 21 calibration set samples, with different API mass concentrations (% w/w), were measured by Raman spectroscopy at an excitation wavelength of 1064 nm. The coating layer of the Baraclude tablets (Opadry®) was carefully scraped off with a special blade, and the uncoated Baraclude tablets (0.5 mg and 1.0 mg ETV-H per tablet, both 0.25% w/w ETV-H) were then measured to obtain the Raman spectra of the test set. The Raman spectra of all samples are shown in Figure 2. As the ETV-H concentration in the tablets increased, the intensity of the characteristic ETV-H peak at 1580 cm−1 changed only slightly, so a wave number range, rather than a single peak, had to be selected.

Raman spectra of ETV-H and all excipients are presented in Figure 3, and the assignments of their characteristic Raman peaks are listed in Table 2. The main features of the ETV-H Raman spectrum were the strong bands at 1487 cm−1 and 1580 cm−1, which were assigned to N-C stretching and H-N-H bending vibrations. The strong band at 614 cm−1 was assigned to C-C-C bending, and the strong band at 672 cm−1 to N-C-N-C and N-C-N-H torsion and N-N-N-C and O-C-N-C out-of-plane deformation vibrations. O-C-H bending and H-O-C-H torsion gave rise to a strong band at 1329 cm−1. N-C, C-C, and O-C stretching together with N-C-H and H-C-H bending gave rise to the Raman bands at 1065 cm−1, 1196 cm−1, and 1684 cm−1.

The Raman characteristic peaks of lactose monohydrate (Lac-H) were 355 cm−1, 376 cm−1, 398 cm−1, 473 cm−1, 852 cm−1, 879 cm−1, 915 cm−1, and 954 cm−1. The strong bands in the region of 852–915 cm−1 were ascribed to the stretching vibration of O-C and C-C and out-of-plane deformation vibration of O-C-C-C. The strong bands at 355 cm−1, 376 cm−1, 398 cm−1, and 473 cm−1 were assigned to the bending vibration of O-C-C. The 954–1138 cm−1 region could be assigned to the stretching vibration of O-C. The strong bands in the region of 1220–1472 cm−1 were ascribed to the bending vibration of H-C-O and H-C-H.

Raman characteristic peaks for microcrystalline cellulose (MCC) were 379 cm−1, 439 cm−1, 455 cm−1, 899 cm−1, 972 cm−1, 998 cm−1, 1036 cm−1, 1063 cm−1, 1094 cm−1, 1123 cm−1, 1154 cm−1, and 1380 cm−1. The band region of 439–899 cm−1 could be assigned to the out-of-plane deformation vibration of O-C-C-C. Also, the stretching vibration of O-C appeared as a series of bands in the region of 972–1154 cm−1. A strong band at 379 cm−1 was ascribed to the bending vibration of H-O-H, O-C-C. In addition, the other strong band at 1380 cm−1 was assigned to the bending vibration of H-C-O.

ETV-H therefore differed from all the excipients in the spectral region of 1350–1700 cm−1, which means that the API could be distinguished in this region. The correlation coefficient method was also used to select the characteristic wave number range, and the results confirmed that 1350–1700 cm−1 could be used as the feature range. Selecting this range reduced the number of wave numbers used in the quantitative model from 512 to 72.
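Restricting the model to a spectral window amounts to keeping only the columns whose wave number lies inside it. A minimal sketch, in which the function name and the tiny six-point axis are hypothetical (they are not the instrument's actual 512-point axis):

```python
import numpy as np

def select_range(wavenumbers, spectra, low=1350.0, high=1700.0):
    """Keep only the columns whose wave number lies within [low, high] (cm-1)."""
    wavenumbers = np.asarray(wavenumbers, dtype=float)
    mask = (wavenumbers >= low) & (wavenumbers <= high)
    return wavenumbers[mask], np.asarray(spectra, dtype=float)[:, mask]

# Hypothetical axis and spectra (2 samples x 6 wave numbers)
wn = np.array([1200.0, 1340.0, 1350.0, 1500.0, 1700.0, 1710.0])
spectra = np.arange(12, dtype=float).reshape(2, 6)
wn_sel, spectra_sel = select_range(wn, spectra)
print(wn_sel)             # [1350. 1500. 1700.]
print(spectra_sel.shape)  # (2, 3)
```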


The blue dotted frame represents the selected spectral range of 1350–1700 cm−1.

The PLS method was used to calibrate concentration models on the 21 calibration set samples containing 0.1%, 0.2%, 0.3%, 0.4%, 0.6%, 0.8%, and 1.0% (w/w) ETV-H. The optimum number of PLS latent variables (LVs) was chosen by comparing the root-mean-square error of cross validation (RMSECV, leave-one-out) so that most of the variation was captured [34]. As shown in Figure 4(a), three PLS components were selected on the basis of the minimum RMSECV obtained with baseline correction combined with SNV pretreatment in the 1350–1700 cm−1 range. Choosing the number of LVs in this way avoids the overfitting that occurs when too many latent variables are included.
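The latent-variable selection described above can be sketched with a NIPALS PLS1 implementation and leave-one-out RMSECV. This is a minimal NumPy sketch under stated assumptions: the study used MATLAB, whose PLS routine may differ in details, and the function names and the synthetic two-component data are hypothetical.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """PLS1 regression by NIPALS; returns (coefficients, x_mean, y_mean)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)      # weight vector
        t = Xr @ w                  # scores
        tt = t @ t
        p = Xr.T @ t / tt           # X loadings
        q = (yr @ t) / tt           # y loading
        Xr = Xr - np.outer(t, p)    # deflate X
        yr = yr - q * t             # deflate y
        W.append(w)
        P.append(p)
        Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    coef = W @ np.linalg.solve(P.T @ W, Q)  # b = W (P'W)^-1 q
    return coef, x_mean, y_mean

def pls1_predict(model, X):
    """Predict y for new spectra with a fitted PLS1 model."""
    coef, x_mean, y_mean = model
    return (np.asarray(X, dtype=float) - x_mean) @ coef + y_mean

def loo_rmsecv(X, y, max_components):
    """Leave-one-out RMSECV for 1..max_components latent variables."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    rmsecv = []
    for a in range(1, max_components + 1):
        residuals = []
        for i in range(n):
            keep = np.arange(n) != i
            model = pls1_fit(X[keep], y[keep], a)
            residuals.append(pls1_predict(model, X[i:i + 1])[0] - y[i])
        rmsecv.append(np.sqrt(np.mean(np.square(residuals))))
    return np.array(rmsecv)

# Hypothetical synthetic data: 21 samples x 72 wave numbers, one API and one excipient band
rng = np.random.default_rng(0)
axis = np.arange(72)
api = np.exp(-0.5 * ((axis - 20) / 4.0) ** 2)
excipient = np.exp(-0.5 * ((axis - 50) / 8.0) ** 2)
conc = np.repeat([0.1, 0.2, 0.3, 0.4, 0.6, 0.8, 1.0], 3)
filler = rng.uniform(0.5, 1.5, size=conc.size)
X = np.outer(conc, api) + np.outer(filler, excipient) + 0.002 * rng.standard_normal((21, 72))
scores = loo_rmsecv(X, conc, max_components=4)
best = int(np.argmin(scores)) + 1
print(np.round(scores, 4), best)
```

Taking the plain minimum of RMSECV, as here, is the simplest rule; a common refinement is to prefer the smallest number of LVs whose RMSECV is not meaningfully worse than the minimum.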

The loadings plot of this model (three PLS components, baseline correction and SNV pretreatment, 1350–1700 cm−1) is shown in Figure 4(b). It revealed that ETV-H was positively correlated with latent variable 1 (LV1) and latent variable 2 (LV2), as seen at the characteristic ETV-H peaks at 1487, 1540, and 1580 cm−1.

The positively correlated peaks in LV1 were at 288, 355, 398, 473, 850, 874, 1083, 1138, 1196, 1260, 1325, 1467, 1487, 1540, 1578, and 1686 cm−1. According to Table 2, the peaks at 288, 1196, 1325, 1487, 1540, 1578, and 1686 cm−1 were produced by ETV-H, whereas those at 355, 398, 473, 850, 874, 1083, 1138, 1260, and 1467 cm−1 originated from lactose monohydrate (Lac-H). The large positive coefficients of Lac-H arose because it had the highest content in the tablets and its content varied slightly with the API concentration.

The positively correlated peaks in LV2 were at 379, 435, 457, 612, 670, 965, 992, 1067, 1094, 1124, 1154, 1387, 1485, 1540, and 1576 cm−1. According to Table 2, the peaks at 379, 435, 457, 965, 992, 1067, 1094, 1124, 1154, and 1387 cm−1 were produced by microcrystalline cellulose (MCC), whereas those at 612, 670, 1485, 1540, and 1576 cm−1 originated from ETV-H. The large positive coefficients of MCC arose because it had the second-highest content in the tablets and its content changed slightly between samples of different API concentrations.

The cumulative contribution of the three latent variables reached 99%. The number of PLS components for the other models was chosen in the same way.

The 21 calibration set samples were then randomly divided into a calibration set (14 samples) and a validation set (7 samples). Combining the different preprocessing methods and spectral ranges gave 16 models (for example, the model with baseline correction as the pretreatment method over the full spectral range). As shown in Table 3, the number of PLS LVs for each model was determined by the method described above. The accuracy of the PLS calibration models was evaluated by the squared correlation coefficient (R²) and RMSEP, and the RMSECV, R², RMSEC, and RMSEP values of all the models were calculated using the corresponding number of LVs. The best-performing model, shown in italics in Table 3, combined baseline correction with SNV pretreatment in the 1350–1700 cm−1 range. As shown in Figure 5, this model had an R² of 0.970, an RMSEC of 0.05%, and an RMSEP of 0.05%, indicating that it was reliable and accurate.
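The random 14/7 split and the R² statistic can be sketched as follows. R² is computed here as the squared Pearson correlation between reference and predicted values, matching the "squared correlation coefficient" wording; the alternative 1 − SSres/SStot definition can give different values when predictions are biased. The function names and the seed are hypothetical, and the paper does not state how its random split was generated.

```python
import numpy as np

def random_split(n_samples, n_validation, seed=None):
    """Randomly assign sample indices to calibration and validation subsets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    return np.sort(order[n_validation:]), np.sort(order[:n_validation])

def squared_correlation(reference, predicted):
    """R^2 as the squared Pearson correlation between reference and predicted values."""
    r = np.corrcoef(np.asarray(reference, float), np.asarray(predicted, float))[0, 1]
    return float(r ** 2)

calibration_idx, validation_idx = random_split(21, 7, seed=42)  # seed is arbitrary
print(len(calibration_idx), len(validation_idx))  # 14 7
```

Because each concentration was measured on three tablets, a purely random split can place replicates of one concentration in both subsets; splitting by concentration level is a stricter alternative.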

All 21 calibration samples were then used to build a PLS model with the parameters of the best-performing model, and the ETV-H content in the Baraclude tablets and homemade tablets was successfully predicted. The best quantitative model, obtained in the 1350–1700 cm−1 region with baseline correction and SNV preprocessing, gave an R² of 0.973, an RMSEC of 0.05%, and an RMSEP of 0.03% (Figure 6). As Figure 6 shows, the API content fluctuates within the content uniformity margin of the tablets, and the surface of the prepared tablets was not uniform despite the long mixing process. Raman spectroscopy could not compensate for this uneven surface content, so the exact ETV-H content of each tablet could not be determined by Raman spectroscopy. Therefore, only approximate values of the true ETV-H concentration were used as references in the quantitation model for all samples. The errors in these approximate reference values, combined with the measurement errors, propagate into the model statistics R², RMSEC, and RMSEP.

These results show that, after baseline correction and SNV pretreatment, the raw spectra could be used to quantify ETV-H in Baraclude tablets. Models using SNV pretreatment alone also gave acceptable predictions, so SNV performed better than the other pretreatment methods for these Raman data. In equation (3), σ was taken as the RMSEC of the model and S as the slope of the calibration curve, which gave an LOD of 0.17% for the best-performing model. According to the US Pharmacopeia, the acceptable ETV-H content ranges from 90.0% to 105.0% of the labeled amount. As shown in Table 4, the predicted ETV-H contents of the commercial tablets were all within this range.
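The acceptance check against the labeled amount is simple arithmetic: the predicted content is expressed as a percentage of the label claim and compared with the 90.0–105.0% window quoted above. A minimal sketch; the function names and the predicted value of 0.245% are hypothetical, not results from Table 4.

```python
def percent_of_label(predicted_content, labeled_content):
    """Predicted API content expressed as a percentage of the labeled amount."""
    return 100.0 * predicted_content / labeled_content

def meets_label_limits(percent, low=90.0, high=105.0):
    """True if the content lies within the acceptance range quoted in the text."""
    return low <= percent <= high

# Hypothetical prediction for a tablet labeled at 0.25% (w/w) ETV-H
pct = percent_of_label(0.245, 0.25)
print(round(pct, 1), meets_label_limits(pct))  # 98.0 True
```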

4. Conclusions

Raman spectroscopy with a 1064 nm excitation wavelength was successfully employed as a nondestructive analytical tool, requiring no sample preparation, for determining low-content ETV-H in Baraclude tablets. PLS quantitative models combined with different preprocessing methods (baseline correction, SNV, MSC, and Savitzky–Golay first and second derivatives) were built to predict the ETV-H concentration in Baraclude tablets. The calibration samples were divided into two sets to choose the best-performing model, whose parameters were then applied to all the calibration samples to build a model that predicted the ETV-H content in the test set samples. This model, with baseline correction and SNV preprocessing of the raw data in the 1350–1700 cm−1 region, gave an R² of 0.973, an RMSEC of 0.05%, and an RMSEP of 0.03%. The LOD of the best-performing model was 0.17% (w/w). The predicted ETV-H contents in Baraclude tablets were all within the range defined in the US Pharmacopeia.

In addition to the backscattering mode, Raman spectroscopy can be performed in two other modes: transmission Raman and spatially offset Raman spectroscopy. Transmission Raman probes the entire tablet and therefore captures bulk information, including the API concentration and the spectra of all ingredients. Spatially offset Raman spectroscopy can obtain information from deep inside a sample through nontransparent packaging or surface layers. In comparison, backscattering Raman obtains information only from the sample surface, which may be insufficient when analyzing low-content API tablets. However, spatially offset and transmission Raman spectroscopy both require higher laser power than backscattering Raman, which might affect the measured sample. This study therefore used backscattering Raman spectroscopy to quantify the low-content API tablets. The results showed that 1064 nm Raman spectroscopy can predict the low-content API in marketed pharmaceutical tablets.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

Authors’ Contributions

Yanlei Kang and Yushan Zhou contributed equally to this work.

Acknowledgments

This work was supported by the National Key Research and Development Program of China (grant nos. 2016YFC0800900, 2016YFC0800905, 2016YFC0800905-Z03, and 2016YFC0800902); the Key Research and Development Program Projects of Zhejiang Province (grant no. 2018C03G2011156); and the Research Project of the State Key Laboratory of Industrial Control Technology, Zhejiang University, China (grant no. ICT1806).