Abstract

Given the extensive use of antibiotics at present, antibiotic identification and production quality monitoring are of high importance. However, conventional antibiotic identification methods have low sensitivity and long detection times. Here, we propose an identification method that combines terahertz (THz) spectroscopy with chemometrics. THz time-domain spectroscopy (THz-TDS) was performed on sixteen types of antibiotics, including β-lactams, cephalosporins, macrolides, and tetracyclines. The absorption spectra within the frequency range of 0.2–1.5 THz were calculated. For dimensionality reduction, principal component analysis (PCA) and t-distributed stochastic neighbor embedding (t-SNE) were each implemented. The data after dimensionality reduction were input into a support vector machine (SVM). The model parameters were optimized through grid search (GS), a genetic algorithm (GA), and particle swarm optimization (PSO), and the optimal identification results were obtained by comparing these methods. The experiments show that the THz absorption spectra differ among the sixteen types of antibiotics. After dimensionality reduction, the training time of the model decreased significantly. The t-SNE-PSO-SVM model achieved the highest average accuracy on the prediction set, 99.91%. Thus, our study confirms not only that the t-SNE-PSO-SVM model is a reliable method for antibiotic identification but also that the combination of THz-TDS and chemometric pattern recognition has great potential for drug detection.

1. Introduction

Antibiotics are a large class of antibacterial chemical substances that occur naturally or are semisynthetic or synthetic. There is a great variety of antibiotics, which are divided into seven major classes, namely, tetracyclines, macrolides, aminoglycosides, peptide antibiotics, lincosamides, streptogramins, and β-lactam antibiotics [1]. Nearly every bacterium has a specific antibiotic against it. Antibiotics are mainly used to treat various types of bacterial infections in humans and, in livestock, also to promote growth. However, the problem of antibiotic residues has become increasingly severe due to excessive antibiotic use. Therefore, antibiotic detection and identification are of high importance [2]. In the past few decades, numerous efforts have been made to develop analytical methods for the qualitative or quantitative determination of antibiotics. Conventional methods for antibiotic detection mainly include high-performance liquid chromatography (HPLC) [3] and gas chromatography-mass spectrometry (GC-MS) [4]. Although these chromatography-based techniques are sensitive and reliable, they are usually time-consuming. Capillary electrophoresis (CE) [5], immunochemistry [6], and enzyme-linked immunosorbent assay (ELISA) [7] can achieve high-accuracy detection of antibiotics. However, these procedures usually involve complex sample preprocessing, which must be done by well-trained professionals. The high costs of surface plasmon resonance (SPR) sensors [8] and Raman spectroscopy [9] have restricted their widespread application. Given the above, it is necessary to establish a sensitive, fast, and reliable method for antibiotic detection [10].

The THz band spans frequencies from 0.1 to 10 THz, between the microwave and infrared regions. THz waves are transient, safe (single photon energy of about 4.1 meV at 1 THz), and highly penetrating, and they have fingerprinting properties [11]. THz-TDS has already found extensive applications in biological tissue identification [12, 13], food and drug detection [14, 15], and explosive detection [16]. The vibration-rotation energy levels of macromolecules such as antibiotics are located within the THz band. Compared with other spectral detection methods, THz spectroscopy exhibits unique advantages: it can detect not only molecular rotation and lattice vibration but also the inner structure and organizational features of drugs. Limwikrant et al. [17] obtained the THz spectra of ofloxacin and its complex with oxalic acid. Zhang et al. [18] analyzed the molecular vibration modes of piracetam and 3-hydroxybenzoic acid. Zhang et al. [19] obtained the THz fingerprint spectra of metronidazole, tinidazole, and ornidazole. Xie et al. [20] showed through density functional theory (DFT) calculations that tetracycline has characteristic THz absorption features at certain frequencies. Many studies have demonstrated the feasibility of applying THz spectroscopy to antibiotic detection. Qin et al. [21–24] applied THz spectroscopy to the detection of tetracycline hydrochloride and achieved good results. Massaouti et al. [25], Wang et al. [26], and Long et al. [27] used a similar method for the quantitative detection of antibiotics in samples.

THz technology has found extensive applications in pesticide and antibiotic identification and in pesticide residue detection [28]. Most studies have focused on the quantitative detection of antibiotics, whereas the use of THz spectroscopy to identify antibiotics has rarely been reported. Yan et al. [29] applied three-layer BP neural networks to identify the absorption spectra of nine illicit drugs and six antibiotics, but the average identification rate was low. In this study, THz-TDS was applied to the detection of sixteen types of antibiotics, including penicillins, cephalosporins, macrolides, and tetracyclines. Then, the THz absorption spectra of these antibiotics were calculated. Dimensionality reduction was performed using principal component analysis (PCA) and t-distributed stochastic neighbor embedding (t-SNE). Next, pattern recognition was performed using the GS-SVM, GA-SVM, and PSO-SVM models, and the best identification model was found by comparison. Thus, a novel method for fast and reliable antibiotic identification was established.

2. Materials and Methods

2.1. Terahertz Spectroscopy System

A Z-3 Time-Domain Spectrometer (Zomega, USA) was used for the experiments. The system was placed within a closed hood during the measurements to reduce the influence of water vapor. The ambient temperature was controlled at 23°C, and the humidity was below 2%. The THz-TDS parameters were set as follows: wavelength of the femtosecond laser system 800 nm, repetition rate 80 MHz, pump power 100 mW, probe power 20 mW, scan range 50 ps, useful spectral range 0.2–2 THz, and dynamic range above 70 dB. The overall experimental procedure is shown in Figure 1.

2.2. Sample Preparation

Sixteen types of antibiotics, including β-lactams, tetracyclines, macrolides, and cephalosporins, were used. The name, class, and main ingredient of these antibiotics are shown in Table 1. First, the drug samples were ground in an agate mortar to avoid scattering of the THz waves caused by particle heterogeneity and to increase the signal-to-noise ratio. Then, a certain amount of the sample was weighed and placed in an automatic tablet press, with the pressure set to 2 tons and a holding time of 1 min. A digital caliper (precision 0.02 mm) was used to measure the thickness of the sample tablets. In this way, 40 samples were prepared for each of the 16 types of antibiotics and used for the THz absorption measurements. Finally, the 640 samples were randomly divided into a training set (480 samples) and a test set (160 samples).

2.3. Data Processing

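The THz time-domain signals of the samples were acquired. The reflection peaks were removed by empirical mode decomposition [30]. Denoising was done by Savitzky–Golay filtering, followed by a Fourier transform to convert the time-domain signals to the frequency domain. The optical parameters were extracted based on the model proposed by Dorney et al. [31] and Duvillaret et al. [32], and the absorption coefficient of the sample was calculated. The amplitude ratio and phase difference between the sample and reference signals are

$$\rho(\omega) = \frac{A_s(\omega)}{A_r(\omega)}, \qquad \varphi(\omega) = \varphi_s(\omega) - \varphi_r(\omega),$$

where $\rho(\omega)$ is the amplitude ratio; $A_s(\omega)$ is the signal amplitude of the sample; $A_r(\omega)$ is the signal amplitude of the reference; $\varphi(\omega)$ is the phase difference; $\varphi_s(\omega)$ is the phase of the sample; and $\varphi_r(\omega)$ is the phase of the reference.

The index of refraction and absorption coefficient are calculated using the formulae below:

$$n(\omega) = \frac{\varphi(\omega)\,c}{\omega d} + 1,$$

$$\alpha(\omega) = \frac{2}{d}\,\ln\!\left[\frac{4\,n(\omega)}{\rho(\omega)\,\bigl(n(\omega)+1\bigr)^{2}}\right],$$

where $n(\omega)$ is the index of refraction; $\alpha(\omega)$ is the absorption coefficient; $c$ is the speed of light in a vacuum; $\omega$ is the angular frequency; and $d$ is the sample thickness.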
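As a rough illustration of this step, the sketch below derives the amplitude ratio, phase difference, refractive index, and absorption coefficient from a pair of time-domain traces. It assumes the commonly used thin-sample expressions n(ω) = 1 + φ(ω)c/(ωd) and α(ω) = (2/d) ln{4n(ω)/[ρ(ω)(n(ω)+1)²]}. The function name `optical_constants`, the synthetic Gaussian pulses, and the 1 mm thickness are illustrative assumptions, not the authors' code or data. Reflection-peak removal and Savitzky–Golay filtering are omitted.

```python
import numpy as np

C = 2.998e8  # speed of light in vacuum (m/s)

def optical_constants(t, e_ref, e_sam, d, f_min=0.2e12, f_max=1.5e12):
    """Return frequency (Hz), refractive index, and absorption coefficient (cm^-1)."""
    freq = np.fft.rfftfreq(len(t), t[1] - t[0])
    h = np.fft.rfft(e_sam) / np.fft.rfft(e_ref)  # complex sample/reference ratio
    rho = np.abs(h)                              # amplitude ratio
    # NumPy's FFT uses exp(-i*omega*t), so a delayed pulse has negative phase;
    # the sign is flipped here so that the phase difference is positive.
    phi = -np.unwrap(np.angle(h))
    band = (freq >= f_min) & (freq <= f_max)     # keep the useful band only
    f, rho, phi = freq[band], rho[band], phi[band]
    omega = 2.0 * np.pi * f
    n = 1.0 + phi * C / (omega * d)
    alpha = (2.0 / d) * np.log(4.0 * n / (rho * (n + 1.0) ** 2))  # in m^-1
    return f, n, alpha / 100.0                   # convert m^-1 to cm^-1

def gaussian_pulse(t, t0, width=0.3e-12):
    """Synthetic stand-in for a THz pulse (assumption, not measured data)."""
    return np.exp(-((t - t0) ** 2) / (2.0 * width ** 2))

# Synthetic example: the sample pulse is delayed by 2 ps and halved in amplitude.
t = np.arange(0.0, 50e-12, 0.05e-12)             # 50 ps scan window
e_ref = gaussian_pulse(t, 10e-12)
e_sam = 0.5 * gaussian_pulse(t, 12e-12)
f, n, alpha = optical_constants(t, e_ref, e_sam, d=1e-3)  # d = 1 mm (assumed)
```

For this synthetic pair, the refractive index is flat across the band because the sample pulse is a pure delayed copy of the reference; real spectra would show frequency-dependent features.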

3. Results and Discussion

3.1. Spectral Analysis

THz-TDS was performed on the sixteen types of antibiotics listed in Table 1. The resulting THz time-domain spectra are shown in Figure 2(a); from these, the frequency-domain spectra and absorption spectra were calculated. The spectra in the range of 0.2–1.5 THz are shown in Figures 2(b) and 2(c). The sixteen types of antibiotics could barely be differentiated by their time-domain and frequency-domain spectra. Some of the antibiotics shared the same absorption peaks, so the antibiotics could not be differentiated by spectral features alone. To solve this problem, we introduced chemometric pattern recognition and established identification models.

3.2. Visualization of the Sample Classification

PCA reduces a large number of intercorrelated variables to a smaller set of uncorrelated composite variables (the principal components). PCA usually consists of the following steps [33]. First, calculate the covariance matrix of the mean-centered sample data, and then calculate the eigenvalues of the covariance matrix and the corresponding orthonormal eigenvectors. Sort the eigenvalues in descending order, and choose the k largest eigenvalues and their eigenvectors. Finally, project the data onto the new space spanned by these eigenvectors. PCA retains most of the variance of the original data and resolves the problems of information overlap and multicollinearity while reducing the dimensionality of the data.
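The PCA steps above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions: the function name `pca` and the synthetic 640 × 143 matrix are hypothetical stand-ins for the absorption-spectra matrix, and the eigenvalue share returned as `ratio` corresponds to the contribution rate of each principal component.

```python
import numpy as np

def pca(X, k):
    """Project X (samples x features) onto its k leading principal components."""
    Xc = X - X.mean(axis=0)                    # center each feature
    cov = np.cov(Xc, rowvar=False)             # covariance matrix of the features
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigh: symmetric matrix, ascending order
    order = np.argsort(eigvals)[::-1][:k]      # indices of the k largest eigenvalues
    components = eigvecs[:, order]             # orthonormal eigenvectors as columns
    scores = Xc @ components                   # data in the new k-dimensional space
    ratio = eigvals[order] / eigvals.sum()     # contribution rate of each component
    return scores, ratio

# Synthetic stand-in for the spectra: three latent factors plus weak noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(640, 3)) * np.array([5.0, 2.0, 1.0])
mixing = rng.normal(size=(3, 143))
X = latent @ mixing + 0.1 * rng.normal(size=(640, 143))

scores, ratio = pca(X, k=3)
```

`np.linalg.eigh` is used rather than `np.linalg.eig` because the covariance matrix is symmetric, which guarantees real eigenvalues and orthonormal eigenvectors.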

t-SNE introduces a Student t-distribution to alleviate the crowding problem suffered by the original SNE algorithm [34]. The core principle of t-SNE is to model the similarity of the data points by using a normalized Gaussian kernel in the high-dimensional space and a t-distribution in the low-dimensional space. As a result, similar points are assigned a higher probability of being selected as neighbors, and dissimilar points a lower probability.

This algorithm consists of the following steps [35]:

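First, represent the similarity between two data points $x_i$ and $x_j$ in the high-dimensional space by the conditional probability

$$p_{j|i} = \frac{\exp\left(-\|x_i - x_j\|^2 / 2\sigma_i^2\right)}{\sum_{k \neq i} \exp\left(-\|x_i - x_k\|^2 / 2\sigma_i^2\right)},$$

where $\sigma_i$ is the bandwidth of the Gaussian kernel centered on $x_i$. For $N$ data points, the symmetrized joint probability is $p_{ij} = (p_{j|i} + p_{i|j})/2N$.

Then, represent the joint probability distribution of the low-dimensional points $y_i$ and $y_j$ by a t-distribution with one degree of freedom:

$$q_{ij} = \frac{\left(1 + \|y_i - y_j\|^2\right)^{-1}}{\sum_{k \neq l} \left(1 + \|y_k - y_l\|^2\right)^{-1}}.$$

Finally, obtain the optimal low-dimensional points by gradient descent, minimizing the Kullback–Leibler (KL) divergence over all points:

$$C = \mathrm{KL}(P \,\|\, Q) = \sum_{i} \sum_{j} p_{ij} \log \frac{p_{ij}}{q_{ij}}.$$

Thus, the samples in the low-dimensional subspace are obtained, such that the probability distribution $Q$ of the data mapped into the low-dimensional space closely approximates the probability distribution $P$ in the high-dimensional space.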
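To make the three t-SNE quantities concrete, the sketch below computes the Gaussian affinities in the high-dimensional space, the Student-t affinities in the low-dimensional space, and the KL divergence between them. It is a simplified illustration, not a full t-SNE implementation: a single fixed Gaussian bandwidth `sigma` is assumed for all points (standard t-SNE tunes one bandwidth per point from a perplexity setting), the gradient-descent loop is omitted, and all function names and the two-cluster toy data are hypothetical.

```python
import numpy as np

def pairwise_sq_dists(Z):
    """Squared Euclidean distances between all rows of Z."""
    s = np.sum(Z ** 2, axis=1)
    return np.maximum(s[:, None] + s[None, :] - 2.0 * Z @ Z.T, 0.0)

def high_dim_affinities(X, sigma=1.0):
    """Symmetrized joint probabilities P from Gaussian conditional probabilities."""
    W = np.exp(-pairwise_sq_dists(X) / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)                    # a point is not its own neighbor
    P_cond = W / W.sum(axis=1, keepdims=True)   # row i holds p_{j|i}
    return (P_cond + P_cond.T) / (2.0 * X.shape[0])

def low_dim_affinities(Y):
    """Joint probabilities Q from a t-distribution with one degree of freedom."""
    W = 1.0 / (1.0 + pairwise_sq_dists(Y))
    np.fill_diagonal(W, 0.0)
    return W / W.sum()

def kl_divergence(P, Q):
    """KL(P || Q), summed over pairs with nonzero p_ij."""
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / Q[mask])))

# Toy data: two well-separated clusters in five dimensions.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (20, 5)), rng.normal(10.0, 1.0, (20, 5))])
P = high_dim_affinities(X)

Y_good = X[:, :2]                               # embedding that keeps the clusters
perm = np.arange(40).reshape(2, 20).T.ravel()   # interleave the two clusters
Y_bad = Y_good[perm]                            # embedding that mixes them up

Q_good = low_dim_affinities(Y_good)
kl_good = kl_divergence(P, Q_good)
kl_bad = kl_divergence(P, low_dim_affinities(Y_bad))
```

An embedding that preserves the neighborhood structure yields a lower KL divergence than one that scrambles it, which is the quantity the gradient descent in t-SNE drives down.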

For the selected frequency band, the absorption spectra had as many as 143 dimensions. To reduce the training time and increase the accuracy of the identification models, dimensionality reduction was performed using PCA and t-SNE, followed by pattern recognition using different models. The methods were then compared to find the optimal dimensionality reduction method for antibiotic identification. PCA was applied to the absorption spectra of the 640 samples. Figure 3(a) shows the 3D distribution of the principal components of the absorption spectra for the different antibiotics. Three principal components (PC1, PC2, and PC3) were retained, with contribution rates of 86.62%, 10.23%, and 1.13%, respectively. The sum of the contribution rates of the three principal components was 97.98%. Therefore, these three principal components could sufficiently represent the original absorption spectra. Figure 3(b) shows the 3D distribution of the different antibiotics visualized by t-SNE. The separation between classes in Figure 3(b) is clearly far greater than that in Figure 3(a). The samples of each class were well clustered together, with few overlaps between different classes.

3.3. Identification Analysis

After dimensionality reduction by either PCA or t-SNE, the new data matrix (640 samples × 3 dimensions) was used in place of the original spectral data matrix (640 samples × 143 dimensions). The 640 samples were randomly divided into a training set (480 samples) and a test set (160 samples). The SVM model was trained using the training set. SVM was combined in turn with GS, GA, and PSO to optimize the model parameters [36, 37]. Finally, the prediction accuracy of the model was evaluated using the test set. The optimal combination of dimensionality reduction method and parameter optimization method was determined by comparison. Thus, the optimal identification model for the THz spectra of the antibiotics was established.
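The random 480/160 division can be sketched as follows. The helper name `random_split`, the fixed seed, and the zero-filled placeholder matrix are assumptions for illustration; the class labels assume 40 samples for each of the 16 antibiotics, as described in the sample preparation.

```python
import numpy as np

def random_split(n_samples, n_train, seed=None):
    """Return disjoint, randomly ordered training and test index arrays."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    return idx[:n_train], idx[n_train:]

# Placeholder for the reduced data matrix (640 samples x 3 dimensions).
X_reduced = np.zeros((640, 3))
labels = np.repeat(np.arange(16), 40)          # 16 classes, 40 samples each

train_idx, test_idx = random_split(len(X_reduced), n_train=480, seed=0)
X_train, y_train = X_reduced[train_idx], labels[train_idx]
X_test, y_test = X_reduced[test_idx], labels[test_idx]
```

A purely random split does not guarantee equal class proportions in the two sets; a stratified split would be one way to enforce this, although the text describes only a random division.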

Here, the identification model was built based on an SVM, a supervised machine learning algorithm. An SVM finds the decision hyperplane that maximizes the margin, i.e., the distance from the hyperplane to the nearest samples of each class. In this way, good generalization is achieved for identification. The performance of an SVM mainly depends on the penalty factor c and the kernel parameter of the model, so these parameters must be chosen carefully to achieve the optimal identification result. To do this, the penalty factor c and the kernel parameter were first optimized by grid search (GS). GS was combined with the different dimensionality reduction methods to establish the No-GS-SVM, PCA-GS-SVM, and t-SNE-GS-SVM models. The optimal cross-validation accuracy (CVAccuracy) of each model was determined using 5-fold cross-validation, along with the accuracy of the model on the training set and test set. The results are shown in Table 2. Figure 4 shows the results of parameter optimization by GS-SVM. Figure 4(a) shows the 3D results of parameter selection by No-GS-SVM, with a CVAccuracy of 99.5833%. Figure 4(b) shows the 3D results of parameter selection by PCA-GS-SVM, with a CVAccuracy of 99.7917%. Figure 4(c) shows the 3D results of parameter selection by t-SNE-GS-SVM, with a CVAccuracy of 100%. Clearly, the recognition accuracy was the highest after dimensionality reduction with t-SNE.
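The grid search itself is a simple exhaustive loop, sketched below. Several things here are assumptions rather than details from the study: the kernel parameter is named `g`, both parameters are searched on a power-of-two grid from 2^-8 to 2^8, and `toy_cv_accuracy` is a hypothetical stand-in for the 5-fold cross-validation accuracy of an SVM trained on the training set. In practice that function would wrap an SVM library call.

```python
import math
from itertools import product

def grid_search(score_fn, c_exponents, g_exponents):
    """Evaluate score_fn on every (c, g) pair and return the best one."""
    best_c, best_g, best_score = None, None, -math.inf
    for ce, ge in product(c_exponents, g_exponents):
        c, g = 2.0 ** ce, 2.0 ** ge            # power-of-two grid (assumed)
        score = score_fn(c, g)
        if score > best_score:                 # keep the first best pair found
            best_c, best_g, best_score = c, g, score
    return best_c, best_g, best_score

def toy_cv_accuracy(c, g):
    """Hypothetical smooth accuracy surface peaking at c = 2^3, g = 2^-2."""
    return math.exp(-((math.log2(c) - 3) ** 2 + (math.log2(g) + 2) ** 2) / 20.0)

best_c, best_g, best_score = grid_search(
    toy_cv_accuracy, range(-8, 9), range(-8, 9)
)
```

The cost grows with the product of the two grid sizes (here 17 × 17 = 289 evaluations, each a full cross-validation in a real setting), which is one motivation for trying heuristic searches such as GA and PSO.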

A genetic algorithm (GA) and particle swarm optimization (PSO) were then introduced to find the optimal combination of the penalty factor c and the kernel parameter and thereby further improve the prediction accuracy. For the GA, the initial population size was set to 20, and the number of iterations was 50. The CVAccuracy, training set accuracy, and prediction set accuracy of No-GA-SVM, PCA-GA-SVM, and t-SNE-GA-SVM under 5-fold cross-validation are shown in Table 2. The fitness curves of the three models are presented in Figure 5. Figure 5(a) shows the fitness curve of No-GA-SVM, with a CVAccuracy of 99.7917%; Figure 5(b) shows the fitness curve of PCA-GA-SVM, with a CVAccuracy of 100%; Figure 5(c) shows the fitness curve of t-SNE-GA-SVM, also with a CVAccuracy of 100%. Unlike the GA, PSO has no crossover or mutation operations; it searches for the global optimum by following the current best solution. This simpler search may explain the higher accuracy obtained with PSO in our experiments. For PSO, the initial population size was also set to 20, and the number of iterations was 50. The CVAccuracy, training set accuracy, and prediction set accuracy of No-PSO-SVM, PCA-PSO-SVM, and t-SNE-PSO-SVM under 5-fold cross-validation are shown in Table 2. The fitness curves of the three models are shown in Figure 6. Figure 6(a) is the fitness curve of No-PSO-SVM, with a CVAccuracy of 100%; Figure 6(b) is the fitness curve of PCA-PSO-SVM, with a CVAccuracy of 100%; Figure 6(c) is the fitness curve of t-SNE-PSO-SVM, also with a CVAccuracy of 100%. The optimal recognition accuracy was reached with PSO.
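A basic PSO loop with the population size (20) and iteration count (50) stated above can be sketched as follows. Everything else is an assumption for illustration: the inertia weight and acceleration coefficients (0.7, 1.5, 1.5), the search over base-2 exponents of the two parameters within [-8, 8], and the function `toy_fitness`, a hypothetical stand-in for the cross-validation accuracy of an SVM.

```python
import numpy as np

def pso_maximize(score_fn, bounds, n_particles=20, n_iter=50,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Maximize score_fn over a box; returns best position, score, and history."""
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds, dtype=float).T
    pos = rng.uniform(lo, hi, size=(n_particles, len(lo)))
    vel = np.zeros_like(pos)
    pbest = pos.copy()                                   # personal best positions
    pbest_score = np.array([score_fn(p) for p in pos])
    g = int(np.argmax(pbest_score))
    gbest, gbest_score = pbest[g].copy(), pbest_score[g] # global best so far
    history = [float(gbest_score)]
    for _ in range(n_iter):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        # Velocity blends inertia, pull to personal best, and pull to global best.
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lo, hi)                 # stay inside the bounds
        scores = np.array([score_fn(p) for p in pos])
        improved = scores > pbest_score
        pbest[improved] = pos[improved]
        pbest_score[improved] = scores[improved]
        g = int(np.argmax(pbest_score))
        if pbest_score[g] > gbest_score:
            gbest, gbest_score = pbest[g].copy(), pbest_score[g]
        history.append(float(gbest_score))
    return gbest, float(gbest_score), history

def toy_fitness(p):
    """Hypothetical accuracy surface over (log2 c, log2 g), peak at (3, -2)."""
    return float(np.exp(-((p[0] - 3.0) ** 2 + (p[1] + 2.0) ** 2) / 20.0))

best_pos, best_score, history = pso_maximize(toy_fitness, [(-8, 8), (-8, 8)])
```

The `history` list plays the role of a fitness curve: the best score found so far never decreases from one iteration to the next.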

As shown in Table 2, under 5-fold cross-validation, the CVAccuracy of each SVM model combined with t-SNE was at least as high as that of the same model combined with PCA or without dimensionality reduction. When the same dimensionality reduction method was used, PSO-SVM exhibited the best identification performance compared with GA-SVM and GS-SVM.

PCA and t-SNE were each combined with GS-SVM, GA-SVM, and PSO-SVM. The samples were randomly divided into a training set and a test set. Each model was run 100 times to calculate the average training accuracy, average prediction accuracy, and average time consumption. The comparison results are shown in Table 3. For the same recognition model, both the average training accuracy and the average prediction accuracy were higher with dimensionality reduction than without it. For the THz spectra of antibiotics, t-SNE was consistently superior to PCA and to the use of no dimensionality reduction. Additionally, the training time of the model was significantly shortened after dimensionality reduction. A single training run was shorter after dimensionality reduction with PCA than with t-SNE. This comprehensive comparison indicated that, of the nine recognition models, t-SNE-PSO-SVM had the highest average prediction accuracy, 99.91%. Therefore, t-SNE-PSO-SVM was the most suitable model for the recognition of THz spectra of antibiotics and had the highest practical application value.

4. Conclusions

The present study was mainly concerned with antibiotic identification based on THz-TDS. Antibiotics come in many forms, and their direct differentiation may be impossible. We found that the THz time-domain spectra and absorption spectra displayed only minor differences between different antibiotics, which made direct differentiation difficult. Therefore, chemometric pattern recognition was introduced to build recognition models for antibiotics. PCA and t-SNE were each used for feature extraction and dimensionality reduction. These two methods were then combined with GS-SVM, GA-SVM, and PSO-SVM to build the identification models. The optimal model was chosen after parameter optimization and comparative analysis. The experiments showed that the training time of the identification model was significantly shortened after dimensionality reduction, and the recognition accuracy was higher with t-SNE than with PCA. The comprehensive comparison indicated that t-SNE-PSO-SVM had the highest average prediction accuracy among all models, 99.91%. Therefore, t-SNE-PSO-SVM was the most suitable of the models compared for antibiotic identification. Our study also confirmed that the combination of THz-TDS and chemometric pattern recognition has great potential for drug detection.

Data Availability

The data used to support the findings of this study have not been made available because the experimental data in the paper were all obtained from experiments of our own design and need to be kept confidential; we are still using them for further research.

Conflicts of Interest

The authors declare that they have no conflicts of interest regarding the publication of the paper.

Acknowledgments

This work was supported by the National Defense Basic Scientific Research Program of China (JCKY2018404C007, JSZL2017404A001, and JSZL2018204C002); Sichuan Science and Technology Program of China (2019YFG0114); and Graduate Student Innovation Fund of SWUST (19ycx0104).