Abstract

Hazardous psychoactive designer drugs are compounds in which part of the molecular structure of a stimulant or narcotic has been modified. A quantitative structure-retention relationship (QSRR) study based on a Levenberg-Marquardt artificial neural network (L-M ANN) was carried out to predict the capacity factor (k′) of hazardous psychoactive designer drugs that contain tryptamine, phenylethylamine, and piperazine skeletons. The genetic algorithm-partial least squares (GA-PLS) method was used as a variable selection tool, and the selected descriptors were used as input neurons in the neural network model. The squared correlation coefficient (R²) for the whole set is suggested as a good criterion for choosing the best predictive model from among comparable models. Modeling the structure-retention relationships with the nonlinear artificial neural network approach gave better results than the linear model, which demonstrates the advantages of L-M ANN. This is the first research on the QSRR of designer drugs using GA-PLS and L-M ANN.

1. Introduction

Designer drugs (sometimes also referred to as club drugs) are a particular class of synthetic drugs most often associated with underground youth dance parties called raves, where participants listen to techno music and experiment with psychoactive substances. These drugs are created by changing the molecular structure of existing drugs to produce something new with similar pharmacological effects, hence the name designer drug. They are plentiful, cheap, and dangerous. For example, the pharmaceutical drug fentanyl (which was originally created as an anesthetic) has been modified to be 80 to 1,000 times more potent than heroin. Prepared by underground, amateur chemists known as cookers, designer drugs can be injected, smoked, snorted, or ingested. These synthetic drugs can be easily obtained on the street or on the Internet.

Once changed, they become known by a variety of street names, for example, XTC, Ecstasy, Adam, Lover’s Speed, Special K, Fantasy, and Nature’s Quaalude. Most have a rapid onset of effect (1 to 4 minutes) and a short duration of action (generally 30–90 minutes, and no more than a few hours). They are sold as tablets or capsules and often produce feelings of stimulation and euphoria, a sense of well-being, and various sensory distortions. Higher doses can lead to paranoia, hallucinations, violent or otherwise irrational behavior, and fatal overdose. Some designer drugs are depressants, so they are used when an individual is coming down from a stimulant like Ecstasy.

In general, the physical symptoms common among users of psychoactive designer drugs include hypertension, increased heart rate, clenched teeth, blurred vision, uncontrolled tremors, anorexia, nausea and vomiting, impaired speech, seizures, permanent brain damage, and death.

Some common psychological side effects include confusion, irritability, severe anxiety, extreme emotional sensitivity, depression, amnesia, violent behavior, insomnia, and hallucinations [1–4].

As shown in Figure 1, these hazardous psychoactive designer drugs have three basic skeletons, types I, II, and III. The type I structure comprises phenylethylamine (PEA)-related compounds with structural similarities to both amphetamine and the psychedelic PEA mescaline. The type II structure comprises tryptamine (T)-related compounds with structural similarities to the hallucinogen psilocin. The type III structure comprises phenylpiperazine (PP)-related compounds with structural similarities to 1-(3-trifluoromethylphenyl)piperazine (TFMPP), which is known for its stimulus effects.

Most of the best-known research chemicals are structural analogues of tryptamines or phenethylamines, but many other, completely unrelated chemicals can also be considered part of the group. It is very difficult to determine the psychoactivity or other pharmaceutical properties of these compounds from structural examination alone. Many of the substances have common effects while being structurally different, and vice versa. Because some of these compounds have no official names and are also known by regional names, potentially hazardous mix-ups can occur for users (and are anecdotally known to have occurred).

In order to prevent damage resulting from drug abuse, it is necessary to analyze the active ingredients, publicize the risks of these compounds, and, if they are illegal, act quickly to regulate them. This requires a library for screening these compounds; however, only a few data are available for gas chromatography-mass spectrometry (GC/MS) [5–7] and liquid chromatography-mass spectrometry (LC/MS) [8, 9], and there are few data on hazardous psychoactive designer drugs. There are also few library systems for liquid chromatography with photodiode array spectrophotometry (LC/PDA) [10, 11]. In order to eliminate the impact of the dead volume of the chromatographic system, the capacity factor (k′) was calculated according to

$$k' = \frac{t_r - t_0}{t_0}, \tag{1}$$

where $t_r$ is the retention time of the designer drug and $t_0$ is the retention time of a nonretained compound (the column void time, or dead time). It is known that the capacity factor (k′) of a substance is related to the partition process, the adsorption process, or both.
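As a small worked illustration of Equation (1), the capacity factor can be computed directly from a retention time and a dead time. The function name and the numerical values below are hypothetical and chosen only for illustration; they are not taken from the data set.

```python
def capacity_factor(t_r: float, t_0: float) -> float:
    """Return k' = (t_r - t_0) / t_0 for retention time t_r and dead time t_0."""
    if t_0 <= 0:
        raise ValueError("dead time t_0 must be positive")
    if t_r < t_0:
        raise ValueError("retention time t_r cannot be shorter than the dead time")
    return (t_r - t_0) / t_0


# Hypothetical retention and dead times in minutes (illustration only).
k_example = capacity_factor(t_r=6.0, t_0=1.5)
print(k_example)  # 3.0
```

Because k′ is a ratio of times, it is dimensionless and does not depend on the time unit, provided both times use the same unit.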

Using chemometrics tools to predict the tissue distribution, membrane permeability, or biphasic-system partitioning of drugs and chemicals is of major importance in the physicochemical, environmental, and life sciences. Chemical distribution phenomena depend not only on molecular structure but also on the properties of the system in question [12]. Quantitative structure-retention relationship (QSRR) techniques based on different molecular descriptors have been successfully used to model the properties of organic chemicals [13]. A number of reports dealing with QSRR calculations for various compounds have been published [14–16]. QSRR models often apply partial least squares (PLS) methods combined with genetic algorithms (GAs) for feature selection [17–19]. Because of the complexity of the relationships between molecular properties and structures, nonlinear models are also used to model structure-property relationships. The Levenberg-Marquardt artificial neural network (L-M ANN) is a nonparametric, nonlinear modeling technique that has attracted increasing interest. In the present study, GA-PLS and L-M ANN were employed to generate QSRR models that correlate the structure of hazardous psychoactive designer drugs with the observed k′. This is the first research on the QSRR of designer drugs using GA-PLS and L-M ANN.

2. Computational

2.1. Data Set

The capacity factors (k′) of 104 hazardous psychoactive designer drugs, taken from the literature [20], are presented in Table 1. There are 51 PEA compounds with a type I structure, which have a PEA skeleton, such as 3,4,5-trimethoxyamphetamine (TMA), 2,5-dimethoxy-4-ethylthiophenethylamine (2C-T-2), 2,5-dimethoxy-4-propylthiophenethylamine (2C-T-7), and 4-bromo-2,5-dimethoxyphenethylamine (2C-B). There are 32 T compounds with a type II structure, which have a T skeleton, such as bis(methylethyl)[2-(5-methoxyindol-3-yl)ethyl]amine (5-MeO-DIPT) and 1-indol-3-ylprop-2-ylamine (AMT), and 21 PP compounds with a type III structure, which have a piperazine skeleton, such as 1-(3-chlorophenyl)piperazine (3CPP), 1-(4-methoxyphenyl)piperazine (4MPP), and TFMPP. These compounds were tested using LC/PDA and GC/electron ionization (EI)/MS, and a library was created from the analysis data obtained. The data registered in the library consisted of the ratio of the capacity factor (k′) of each drug to that of the internal standard (IS), the ultraviolet (UV) spectrum, and the MS data.

2.2. Computer Hardware and Software

All calculations were run on an HP laptop computer with an AMD Turion 64 X2 processor and the Windows XP operating system. The molecular structures were optimized with HyperChem 7.0, and the descriptors were calculated with Dragon Version 3.0 software. Cross-validation, GA-PLS, L-M ANN, and other calculations were performed in the MATLAB (Version 7, MathWorks, Inc.) environment.

2.3. Molecular Modeling and Theoretical Molecular Descriptors

The derivation of theoretical molecular descriptors proceeds from the chemical structure of the compounds. In order to calculate the theoretical descriptors, molecular structures were constructed with the aid of HyperChem version 7.0. The final geometries were obtained with the semiempirical AM1 method in the HyperChem program. The molecular structures were optimized using the Fletcher-Reeves algorithm until the root mean square gradient was 0.01 kcal mol−1. The resulting geometries were transferred into the Dragon program, developed by Todeschini et al. [21], to calculate 1497 descriptors.

2.4. Genetic Algorithm for Descriptor Selection

To select the most relevant descriptors with the GA, the evolution of the population was simulated [22–24]. Each individual of the population, defined by a chromosome of binary values, represented a subset of descriptors. The number of genes in each chromosome was equal to the number of descriptors. The population of the first generation was selected randomly. A gene was given the value of one if its corresponding descriptor was included in the subset; otherwise, it was given the value of zero. The number of genes with the value of one was kept relatively low in order to obtain a small subset of descriptors [25]; that is, the probability of generating zero for a gene was set higher. The operators used here were crossover and mutation. The application probability of these operators was varied linearly with generation renewal. For a typical run, the evolution was stopped when 90% of the generations had reached the same fitness. In this paper, the population size is 30 chromosomes, the probability of initial variable selection is 5/V (where V is the number of independent variables), crossover is multipoint with a probability of 0.5, mutation is multipoint with a probability of 0.01, and the number of evolution generations is 1000. For each set of data, 3000 runs were performed.
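The GA settings listed above can be illustrated with a minimal sketch. It is not the program used in this study: the fitness function is a hypothetical stand-in (in GA-PLS the fitness would be the cross-validated performance of a PLS model built on the selected descriptors), the selection scheme (truncation with elitism) is an assumption because the text does not specify one, the operator probabilities are held fixed instead of being varied linearly, and the run stops after a fixed number of generations.

```python
import random
from typing import Callable, List, Tuple

Chromosome = List[int]


def init_population(n_genes: int, pop_size: int, rng: random.Random,
                    n_expected: int = 5) -> List[Chromosome]:
    """Random binary chromosomes; each gene is 1 with probability n_expected / n_genes."""
    p_one = n_expected / n_genes
    return [[1 if rng.random() < p_one else 0 for _ in range(n_genes)]
            for _ in range(pop_size)]


def multipoint_crossover(a: Chromosome, b: Chromosome, rng: random.Random,
                         n_points: int = 2) -> Tuple[Chromosome, Chromosome]:
    """Exchange alternating segments of two parents at n_points random cut positions."""
    cuts = sorted(rng.sample(range(1, len(a)), n_points))
    child1, child2 = a[:], b[:]
    swap, prev = False, 0
    for cut in cuts + [len(a)]:
        if swap:
            child1[prev:cut], child2[prev:cut] = b[prev:cut], a[prev:cut]
        swap, prev = not swap, cut
    return child1, child2


def mutate(chrom: Chromosome, rng: random.Random, p_mut: float = 0.01) -> Chromosome:
    """Flip each gene independently with probability p_mut."""
    return [1 - g if rng.random() < p_mut else g for g in chrom]


def run_ga(fitness: Callable[[Chromosome], float], n_genes: int,
           n_generations: int = 1000, pop_size: int = 30,
           p_cross: float = 0.5, p_mut: float = 0.01, seed: int = 0) -> Chromosome:
    """Evolve descriptor subsets and return the fittest chromosome found."""
    rng = random.Random(seed)
    pop = init_population(n_genes, pop_size, rng)
    for _ in range(n_generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]   # assumption: truncation selection
        children = [ranked[0][:]]           # assumption: keep the best unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            if rng.random() < p_cross:
                a, b = multipoint_crossover(a, b, rng)
            children.extend([mutate(a, rng, p_mut), mutate(b, rng, p_mut)])
        pop = children[:pop_size]
    return max(pop, key=fitness)


# Hypothetical fitness: reward three "informative" descriptors, penalize the rest.
INFORMATIVE = {2, 7, 11}


def toy_fitness(chrom: Chromosome) -> float:
    hits = sum(1 for i, g in enumerate(chrom) if g and i in INFORMATIVE)
    extras = sum(1 for i, g in enumerate(chrom) if g and i not in INFORMATIVE)
    return hits - 0.5 * extras


best = run_ga(toy_fitness, n_genes=20, n_generations=30, seed=1)
print([i for i, g in enumerate(best) if g])  # indices of the selected descriptors
```

Keeping the best chromosome unchanged in each generation means the best fitness can never decrease from one generation to the next, which makes the sketch easy to check; a production GA-PLS run would replace `toy_fitness` with a cross-validated PLS score.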

2.5. Nonlinear Model
2.5.1. Artificial Neural Network

An artificial neural network (ANN) with a layered structure is a mathematical system that simulates a biological neural network, consisting of computing units named neurons and connections between neurons named synapses [26–28]. All feed-forward ANNs used in this paper are three-layer networks. Each neuron in any layer is fully connected with the neurons of the succeeding layer. Figure 2 shows an example of the architecture of such an ANN. The Levenberg-Marquardt backpropagation algorithm was used for ANN training, and linear functions were used as the transformation functions in the hidden and output layers.
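The fully connected three-layer architecture described above can be sketched as a plain forward pass. This is only an illustration of how inputs flow through the layers: the weights and inputs are hypothetical, the transfer function is left as a parameter (identity for a linear function, `math.tanh` for a sigmoid-shaped one), and the Levenberg-Marquardt training step itself is not shown.

```python
import math
from typing import Callable, List, Sequence


def layer_output(inputs: Sequence[float], weights: Sequence[Sequence[float]],
                 biases: Sequence[float],
                 activation: Callable[[float], float]) -> List[float]:
    """Each neuron applies the activation to its weighted input sum plus its bias."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def forward(x: Sequence[float],
            w_hidden: Sequence[Sequence[float]], b_hidden: Sequence[float],
            w_out: Sequence[Sequence[float]], b_out: Sequence[float],
            activation: Callable[[float], float] = math.tanh) -> List[float]:
    """Input layer -> hidden layer -> output layer, fully connected."""
    hidden = layer_output(x, w_hidden, b_hidden, activation)
    return layer_output(hidden, w_out, b_out, activation)


def identity(z: float) -> float:
    """Linear transfer function."""
    return z


# Hypothetical network: 2 inputs, 2 hidden neurons, 1 output.
y = forward(x=[1.0, 2.0],
            w_hidden=[[0.5, -0.5], [1.0, 1.0]], b_hidden=[0.0, 1.0],
            w_out=[[2.0, 0.5]], b_out=[0.0],
            activation=identity)
print(y)  # [1.0]
```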

3. Results and Discussion

3.1. Linear Model
3.1.1. Results of the GA-PLS Model

To reduce the original pool of descriptors to an appropriate size, objective descriptor reduction was performed using various criteria. Reducing the pool eliminates descriptors that either contribute no information or whose information content is redundant with that of other descriptors in the pool. The remaining descriptors were employed to generate the models with the GA-PLS program. The best model was selected on the basis of the highest leave-group-out cross-validated squared correlation coefficient (R²), the lowest root mean square error (RMSE), and the lowest relative error (RE) of prediction. These parameters are probably the most popular measures of how well a model fits the data. The best GA-PLS model contains 11 selected descriptors in a space of 5 latent variables. The R², mean RE, and RMSE for the training and test sets were (0.856, 0.806), (14.11, 20.37), and (0.28, 0.41), respectively. The predicted values of k′ are plotted against the experimental values for the training and test sets in Figure 3. In general, the number of components (latent variables) in a PLS analysis is smaller than the number of independent variables. The PLS model uses a larger number of descriptors, which allows it to extract more structural information from the descriptors and results in a lower prediction error.

3.2. Nonlinear Model
3.2.1. Results of the L-M ANN Model

The networks were generated using the eleven descriptors appearing in the GA-PLS model as their inputs and k′ as their output. For ANN generation, the data set was separated into three groups: calibration, prediction, and test sets. A three-layer network with a sigmoid transfer function was designed for each ANN. Before training the networks, the input and output values were normalized between −1 and 1. The network was then trained on the training set by the back-propagation strategy to optimize the weights and bias values. The proper number of nodes in the hidden layer was determined by training the network with different numbers of hidden nodes. The root mean square error (RMSE) measures how close the outputs are to the target values. To avoid overfitting, the training of the network for the prediction of k′ must stop when the RMSE of the prediction set begins to increase while the RMSE of the calibration set continues to decrease. Therefore, training of the network was stopped when overtraining began. All of the above-mentioned steps were carried out using basic backpropagation, conjugate gradient, and Levenberg-Marquardt weight update functions. The RMSE for the training and test sets was lowest when seven neurons were used in the hidden layer. Finally, the number of iterations was optimized with the optimum values for the variables. The R², mean RE, and RMSE for the calibration, prediction, and test sets were (0.923, 0.912, 0.880), (8.04, 9.61, 12.08), and (0.16, 0.18, 0.21), respectively. Inspection of the results reveals a higher R² and lower values of the other parameters for the calibration, prediction, and test sets compared with their counterparts for GA-PLS. Plots of predicted k′ versus experimental k′ values by L-M ANN for the training and test sets are shown in Figures 4(a) and 4(b).
There is close agreement between the experimental and predicted k′, and the data show very little scatter around a straight line with slope and intercept close to one and zero, respectively. This clearly shows the strength of L-M ANN as a nonlinear modeling method. The key strength of L-M ANN is its ability to allow flexible mapping of the selected features by manipulating their functional dependence implicitly. Taken together, these data clearly display a significant improvement of the QSRR model as a result of the nonlinear statistical treatment.
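Two of the training steps described above, normalizing values between −1 and 1 and stopping when the prediction-set RMSE starts to rise while the calibration-set RMSE is still falling, can be sketched as follows. The function names and the RMSE histories are hypothetical and serve only to illustrate the rule; they are not results from this study.

```python
from typing import List, Sequence


def scale_to_unit_range(values: Sequence[float]) -> List[float]:
    """Linearly map values so that the minimum becomes -1 and the maximum becomes 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot scale a constant variable")
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]


def early_stopping_epoch(calibration_rmse: Sequence[float],
                         prediction_rmse: Sequence[float]) -> int:
    """Return the last epoch before the prediction-set RMSE first increases
    while the calibration-set RMSE is still decreasing."""
    for epoch in range(1, len(prediction_rmse)):
        prediction_rises = prediction_rmse[epoch] > prediction_rmse[epoch - 1]
        calibration_falls = calibration_rmse[epoch] < calibration_rmse[epoch - 1]
        if prediction_rises and calibration_falls:
            return epoch - 1
    return len(prediction_rmse) - 1  # no overtraining detected


print(scale_to_unit_range([1.0, 2.0, 3.0]))  # [-1.0, 0.0, 1.0]

# Hypothetical RMSE histories, one value per training epoch.
cal_history = [0.50, 0.30, 0.20, 0.15, 0.12]
pred_history = [0.55, 0.35, 0.25, 0.27, 0.30]
print(early_stopping_epoch(cal_history, pred_history))  # 2
```

In this example the prediction-set RMSE reaches its minimum at epoch 2 (counting from zero), so the weights from that epoch would be kept.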

3.3. Model Validation and Statistical Parameters

Internal (leave-group-out cross-validation (LGO-CV)) and external (test set) validation methods were applied to assess the predictive power of the models. In the leave-group-out procedure, one compound was removed from the data set, and the model was trained with the remaining compounds and used to predict the discarded compound. The process was repeated for each compound in the data set. The predictive power of the models developed on the selected training set is estimated from the predicted values for the test set chemicals. The data set was divided into three subsets: a calibration set and a prediction set (together forming the training set) and a test set. The calibration set was used for model generation. The prediction set was used to guard against overfitting of the network, whereas the test set, whose molecules played no role in model building, was used to evaluate the predictive ability of the models for an external set.

In other words, the best model is found using the training set, and its predictive power is then checked with the test set as an external data set. In this work, of the 104 compounds, 63 are in the calibration set, 21 are in the prediction set, and 20 are in the test set.
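A split of 104 compounds into calibration, prediction, and test sets of 63, 21, and 20 members can be sketched as below. The text does not state how compounds were assigned to each set, so the random shuffle with a fixed seed is an assumption made only for illustration, and the function name is hypothetical.

```python
import random
from typing import List, Tuple


def split_indices(n_total: int = 104, n_calibration: int = 63,
                  n_prediction: int = 21, n_test: int = 20,
                  seed: int = 0) -> Tuple[List[int], List[int], List[int]]:
    """Randomly partition compound indices into calibration, prediction, and test sets."""
    if n_calibration + n_prediction + n_test != n_total:
        raise ValueError("subset sizes must add up to the total number of compounds")
    indices = list(range(n_total))
    random.Random(seed).shuffle(indices)  # assumption: random assignment
    calibration = indices[:n_calibration]
    prediction = indices[n_calibration:n_calibration + n_prediction]
    test = indices[n_calibration + n_prediction:]
    return calibration, prediction, test


cal_idx, pred_idx, test_idx = split_indices()
print(len(cal_idx), len(pred_idx), len(test_idx))  # 63 21 20
```

Each compound index appears in exactly one subset, so the test compounds play no role in model building.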

The result clearly displays a significant improvement of the QSRR model consequent to nonlinear statistical treatment and a substantial independence of model prediction from the structure of the test molecule. In the above analysis, the descriptive power of a given model has been measured by its ability to predict partition of unknown designer drugs.

For the constructed models, some general statistical parameters were selected to evaluate the predictive ability of the models for k′ values. In this case, the predicted k′ of each sample in the prediction step was compared with the experimental k′. The root mean square error (RMSE) measures the average difference between predicted and experimental values at the prediction step. RMSE can be interpreted as the average prediction error, expressed in the same units as the original response values. The RMSE was obtained by the following formula:

$$\mathrm{RMSE} = \left[\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2\right]^{1/2}. \tag{2}$$

The second statistical parameter was the relative error (RE), which shows the predictive ability for each component and is calculated as

$$\mathrm{RE}(\%) = 100 \times \frac{1}{n}\sum_{i=1}^{n}\frac{\left|\hat{y}_i - y_i\right|}{y_i}. \tag{3}$$

The predictive ability was evaluated by the square of the correlation coefficient (R²), which is based on the prediction error sum of squares and was calculated by the following equation:

$$R^2 = \frac{\sum_{i=1}^{n}\left(\hat{y}_i - \bar{y}\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}, \tag{4}$$

where $y_i$ is the experimental k′ of sample $i$, $\hat{y}_i$ is the predicted k′ of sample $i$, $\bar{y}$ is the mean of the experimental k′ in the prediction set, and $n$ is the total number of samples used in the test set.
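The three statistics in Equations (2)–(4) can be computed with a few lines of code. This sketch assumes the forms read from the definitions given here: squared deviations in RMSE and R², an absolute deviation divided by the experimental value in RE, and the mean taken over the experimental values. The function names and the sample values are hypothetical.

```python
import math
from typing import Sequence


def rmse(y_exp: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error between experimental and predicted values."""
    n = len(y_exp)
    return math.sqrt(sum((p - e) ** 2 for e, p in zip(y_exp, y_pred)) / n)


def relative_error_percent(y_exp: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute deviation relative to the experimental value, in percent."""
    n = len(y_exp)
    return 100.0 * sum(abs(p - e) / e for e, p in zip(y_exp, y_pred)) / n


def r_squared(y_exp: Sequence[float], y_pred: Sequence[float]) -> float:
    """Ratio of predicted to experimental sums of squares about the experimental mean."""
    mean_exp = sum(y_exp) / len(y_exp)
    explained = sum((p - mean_exp) ** 2 for p in y_pred)
    total = sum((e - mean_exp) ** 2 for e in y_exp)
    return explained / total


# Hypothetical experimental and predicted capacity factors (illustration only).
y_exp = [1.0, 2.0, 3.0]
y_pred = [1.5, 2.0, 2.5]
print(rmse(y_exp, y_pred))
print(relative_error_percent(y_exp, y_pred))
print(r_squared(y_exp, y_pred))  # 0.25
```

A perfect prediction gives RMSE = 0, RE = 0%, and R² = 1 under these definitions.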

The main aim of the present work was to assess the performance of GA-PLS and L-M ANN for modeling the capacity factor of hazardous psychoactive designer drugs. The modeling procedure, including descriptor generation, splitting of the data, variable selection, and validation, was the same for both models.

4. Conclusion

The QSRR model provides significant additional insight into the relationship between molecular structure and fundamental processes and phenomena in chemistry. In this study, GA-PLS and L-M ANN modeling were applied to predict the capacity factor of hazardous psychoactive designer drugs. Both methods proved useful, although a comparison revealed the slight superiority of L-M ANN over GA-PLS. High correlation coefficients for the training (calibration and prediction) and test sets, together with low prediction errors, confirmed the good predictive ability of both models for new compounds, especially L-M ANN. These results indicate good prospects for the application of L-M ANN in QSRR modeling.