Abstract

Malaria has been one of the most significant public health problems for centuries. QSAR models of the antimalarial activity and the blood-to-plasma concentration ratio of Chloroquine and a new series of 4-aminoquinoline derivatives were developed using a genetic algorithm combined with multiple linear regression (GA-MLR). We obtained two models, one for the Chloroquine-sensitive (3D7) and one for the Chloroquine-resistant (W2) strain of Plasmodium falciparum, both with a good fit. Drug distribution in blood, expressed as the blood-to-plasma concentration ratio (B/P), was related to molecular descriptors. Leave-many-out (LMO) cross-validation and Y-randomization confirmed the robustness of the models.

1. Introduction

With approximately 243 million cases and 863,000 attributed deaths reported globally in 2009 [1], malaria is one of the most severe infectious diseases, primarily affecting the world's most disadvantaged populations. Chloroquine (CQ), a low-cost drug, is widely used as an antimalarial agent. However, the emergence of CQ-resistant parasite strains has prompted the search for alternative strategies to combat the disease. To find compounds effective against resistant malaria, clones of P. falciparum are commonly used for in vitro testing of antimalarial activity, including Chloroquine-sensitive (NF54, NF54/64, 3D7, D6, F32, D10, HB3, FCC1-HN, Ghana), Chloroquine-resistant (FcB1, W2, FCM29, BHz26/86, Dd2, EN36, ENT30, FCR3, FCR-3/A2), and multidrug-resistant (K1, TM91C235) strains. Several QSAR models and reviews of antimalarial compounds have been reported recently [2–10].

The blood-to-plasma concentration ratio (B/P = C_B/C_P, where C_B and C_P are the drug concentrations in whole blood and in plasma) is a measure of drug distribution within the blood. Drugs reaching the bloodstream can bind to plasma proteins, to blood cells, or to both. If binding in plasma exceeds binding in blood cells, B/P is below 1; when binding in blood cells exceeds plasma binding, B/P is above 1. Red blood cells (RBCs) are the host cells of malaria parasites, and any effect of a drug on red cell membranes might be relevant to its in vivo effects [11]. B/P may be an important parameter for drug potency and is therefore worth investigating. It is also related to the volume of distribution and to the clearance of the drug. Even though B/P is relatively simple to determine, such data are absent from most pharmacokinetic studies [12, 13].
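To make the definition concrete, the short Python sketch below computes the blood-to-plasma ratio (written here as B/P = C_B/C_P) from a pair of concentrations, classifies it with the rule stated above, and applies the standard mass-balance relation CL_blood = CL_plasma / (B/P), which is one way B/P links plasma-based and blood-based clearance. The function names and the example numbers are hypothetical and are not data from this study.

```python
def blood_to_plasma_ratio(c_blood: float, c_plasma: float) -> float:
    """B/P = C_B / C_P; both concentrations must be in the same units."""
    if c_plasma <= 0:
        raise ValueError("plasma concentration must be positive")
    return c_blood / c_plasma


def interpret(bp: float) -> str:
    """Qualitative reading of B/P following the rule in the text."""
    if bp < 1:
        return "plasma binding dominates (B/P < 1)"
    if bp > 1:
        return "blood-cell binding dominates (B/P > 1)"
    return "even distribution (B/P = 1)"


def blood_clearance(plasma_clearance: float, bp: float) -> float:
    """Elimination rate = CL_blood * C_B = CL_plasma * C_P, so CL_blood = CL_plasma / (B/P)."""
    return plasma_clearance / bp


# Hypothetical example: 3.0 uM in whole blood, 1.5 uM in plasma
bp = blood_to_plasma_ratio(3.0, 1.5)
print(bp, interpret(bp))          # 2.0 blood-cell binding dominates (B/P > 1)
print(blood_clearance(12.0, bp))  # 6.0, in the units of the plasma clearance
```

The same scaling applies to the volume of distribution, so a drug concentrated in red cells (B/P > 1) shows a smaller blood-referenced clearance and volume than its plasma-referenced values.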

The objective of this study was, first, to develop QSAR models explaining the in vitro antimalarial activity of a new series of 4-aminoquinolines, structurally related to CQ, against two P. falciparum clones (3D7 and W2), using theoretical molecular descriptors. Second, we aimed to establish regression models that predict the blood-to-plasma concentration ratio (B/P) mainly from in silico molecular descriptors.

2. Materials and Methods

2.1. Software

A Pentium IV personal computer (3.2 GHz CPU) running the Windows XP operating system was used. Molecular modeling and geometry optimization were performed with HyperChem [14]. DRAGON software [15] was used to calculate theoretical molecular descriptors, and SPSS [16] was used for MLR analysis. Other statistical calculations were performed in the MATLAB [17] environment.

2.2. Ensemble Data and Molecular Descriptors

We used a series of 4-aminoquinoline antimalarial compounds with experimentally determined ADME properties, taken from Ray et al. [18]. That research group developed antimalarial compounds effective against drug-resistant strains of P. falciparum by varying the chemical substitutions around the heterocyclic ring and the basic amine side chain of the widely used antimalarial drug chloroquine [19, 20]. More recently, they screened a panel of these novel antimalarial compounds for improved leads based on their ADMET properties [18]. Each compound in the database was characterized by its growth inhibition of the 3D7 and W2 strains of P. falciparum and by its blood-to-plasma concentration ratio. Figure 1 depicts all the structures used in this study. The panel includes a small number of CQ analogues with altered substitutions on the quinoline ring, although most compounds in the panel carry different alkyl groups on the basic nitrogen of the aminoalkyl side chain. Two data sets of log IC50 (each compound at 1 and 10 μM concentration) were used for the QSAR studies. The activity data are given as IC50 (nM) values for growth inhibition of the drug-resistant (W2) and drug-sensitive (3D7) strains of P. falciparum by the chloroquine derivatives. The experimental antimalarial activities are shown in Table 1. The red blood cell (RBC)-to-plasma partition ratios (B/P) were measured for each compound at 1 and 10 μM and were normalized by log transformation (log B/P). These values are summarized in Table 2. The molecular structures of all the Chloroquine derivatives were built with HyperChem (version 7, Hypercube, Inc.). The 3D geometry of the molecules was optimized with the AM1 semiempirical method, using the Polak-Ribière algorithm with a root-mean-square gradient of 0.1 kcal/(Å·mol) as the convergence criterion. Using DRAGON [15], a total of 1481 1D, 2D, and 3D molecular descriptors were derived from the 3D structure of each compound.

To reduce redundancy in the descriptor data matrix, the correlations of the descriptors with each other and with the drug properties were examined, and highly collinear descriptors were identified. From each group of collinear descriptors, the one most highly correlated with activity was retained and the others were removed from the data matrix.

The list and meaning of the molecular descriptors are provided by the DRAGON package, and the calculation procedures are explained in detail, with related literature references, in the Handbook of Molecular Descriptors [21].

2.3. MLR Modeling Procedure

Multiple linear regression (MLR), which is easy to implement and yields readily interpretable equations, was the statistical method of choice for building the QSAR models. The forward-stepping variant of MLR was used: it starts by selecting the single variable that contributes most to the model, based on the highest F-statistic or the lowest p-value. At each subsequent step, predictor variables are added to the model from the previous step, and the search terminates when a statistically significant model has been obtained [22, 23]. QSAR Modeling [24] is free Java-based software made available courtesy of the Theoretical and Applied Chemometrics Laboratory research group. A genetic algorithm (GA) search was carried out to explore MLR models; the GA was the same as that used previously [25, 26].
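A minimal sketch of forward-stepping MLR, assuming a partial F-to-enter criterion: at each step the candidate descriptor with the largest partial F statistic is added, and the search stops when no candidate exceeds a hypothetical threshold `f_enter` (4.0 here) or `max_vars` descriptors have been chosen. It uses a small normal-equations solver in plain Python for illustration; it is not the SPSS or QSAR Modeling implementation, and the GA search used in the study is not reproduced.

```python
def ols_fit(X, y):
    """Least squares with intercept via the normal equations; returns [b0, b1, ...]."""
    A = [[1.0] + list(row) for row in X]
    k = len(A[0])
    # Augmented matrix [A^T A | A^T y]
    M = [[sum(a[i] * a[j] for a in A) for j in range(k)]
         + [sum(a[i] * yi for a, yi in zip(A, y))] for i in range(k)]
    for c in range(k):  # Gauss-Jordan elimination with partial pivoting
        p = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(k):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [mr - f * mc for mr, mc in zip(M[r], M[c])]
    return [M[i][k] / M[i][i] for i in range(k)]


def rss(cols, X, y):
    """Residual sum of squares of the MLR model built on descriptor columns `cols`."""
    sub = [[row[j] for j in cols] for row in X]
    b = ols_fit(sub, y)
    fitted = [b[0] + sum(bj * xj for bj, xj in zip(b[1:], row)) for row in sub]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


def forward_select(X, y, f_enter=4.0, max_vars=5):
    """Add descriptors one at a time by largest partial F; stop below f_enter."""
    n, m = len(y), len(X[0])
    selected, current = [], rss([], X, y)  # start from the intercept-only model
    while len(selected) < max_vars:
        dof = n - len(selected) - 2  # residual df after adding one more descriptor
        if dof <= 0:
            break
        best = None
        for j in range(m):
            if j in selected:
                continue
            new = rss(selected + [j], X, y)
            f = float("inf") if new == 0 else (current - new) / (new / dof)
            if best is None or f > best[0]:
                best = (f, j, new)
        if best is None or best[0] < f_enter:
            break  # no remaining descriptor improves the model significantly
        selected.append(best[1])
        current = best[2]
    return selected


# Hypothetical data: 8 compounds x 3 descriptors; y depends mainly on descriptor 0
X = [[1, 3, 2], [2, 1, 7], [3, 4, 1], [4, 1, 8], [5, 5, 2], [6, 9, 8], [7, 2, 1], [8, 6, 8]]
y = [2.1, 3.8, 6.15, 8.0, 9.9, 12.2, 13.95, 15.9]
print(forward_select(X, y))  # descriptor 0 is chosen first
```

The partial F statistic compares the drop in residual sum of squares from adding one descriptor with the residual variance of the enlarged model, which is the quantity stepwise procedures usually rank candidates by.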

3. Results and Discussions

3.1. The Selected Descriptors

Most of the descriptors selected in our GA-MLR modeling are composite descriptors, which fall into five groups: GETAWAY, 3D-MoRSE, RDF, WHIM, and 2D autocorrelation descriptors. GETAWAY (Geometry, Topology, and Atom Weights AssemblY) descriptors try to match the 3D molecular geometry, provided by the molecular influence matrix, and atom relatedness by topology with chemical information, using various atomic weighting schemes. 3D-MoRSE descriptors are representations of the 3D structure of a molecule that encode features such as molecular weight, van der Waals volume, electronegativity, and polarizability. Radial distribution function (RDF) descriptors are based on the distribution of interatomic distances in the molecule. WHIM descriptors are statistical indices calculated on the projections of the atoms along principal axes. 2D autocorrelation descriptors, in general, describe how a given property is distributed along the topological structure. Three types of spatial autocorrelation vectors were calculated, both unweighted and weighted: the Broto-Moreau, Moran, and Geary autocorrelations. Atomic masses (m), atomic van der Waals volumes (v), atomic Sanderson electronegativities (e), and atomic polarizabilities (p) were used as weighting properties [21]. Table 3 lists the names and meanings of the molecular descriptors used in this work.
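The three 2D autocorrelation descriptors can be sketched in Python from their textbook definitions [21]: topological distances come from a breadth-first search over the hydrogen-depleted molecular graph, and for a lag d the Broto-Moreau (ATS), Moran, and Geary indices are computed over atom pairs exactly d bonds apart. The function names and toy weights are hypothetical; DRAGON additionally scales atomic weights relative to carbon and may transform ATS values, which this sketch does not reproduce.

```python
from collections import deque


def topological_distances(n_atoms, bonds):
    """All-pairs shortest-path lengths (number of bonds) by breadth-first search."""
    adj = [[] for _ in range(n_atoms)]
    for a, b in bonds:
        adj[a].append(b)
        adj[b].append(a)
    dist = []
    for source in range(n_atoms):
        d = [None] * n_atoms  # None marks atoms that cannot be reached
        d[source] = 0
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if d[v] is None:
                    d[v] = d[u] + 1
                    queue.append(v)
        dist.append(d)
    return dist


def autocorrelations(w, dist, lag):
    """Broto-Moreau (ATS), Moran, and Geary autocorrelations at topological distance `lag`.

    w: atomic property weights (e.g. masses, volumes, electronegativities, polarizabilities).
    Returns None when no atom pair is `lag` bonds apart or all weights are equal.
    """
    n = len(w)
    mean = sum(w) / n
    dev_sq = sum((wi - mean) ** 2 for wi in w)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n) if dist[i][j] == lag]
    if not pairs or dev_sq == 0:
        return None
    # Broto-Moreau: sum of weight products over pairs at distance `lag`
    ats = sum(w[i] * w[j] for i, j in pairs)
    # Moran: mean cross-product of deviations, normalised by the population variance
    moran = (sum((w[i] - mean) * (w[j] - mean) for i, j in pairs) / len(pairs)) / (dev_sq / n)
    # Geary: half mean squared difference, normalised by the sample variance
    geary = (sum((w[i] - w[j]) ** 2 for i, j in pairs) / (2 * len(pairs))) / (dev_sq / (n - 1))
    return ats, moran, geary


# Hypothetical 3-atom chain 0-1-2 with illustrative weights 1, 2, 3
dist = topological_distances(3, [(0, 1), (1, 2)])
print(autocorrelations([1.0, 2.0, 3.0], dist, 1))  # (8.0, 0.0, 0.5)
print(autocorrelations([1.0, 2.0, 3.0], dist, 2))
```

Moran values near +1 indicate that similar property values sit d bonds apart, whereas Geary values below 1 indicate the same thing on an inverted scale; the ATS index is unnormalised and therefore grows with molecular size.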

Tables 4 and 5 show the values of the descriptors used in this study, and the corresponding correlation matrices are given in Tables 6, 7, 8, and 9. Inspection of these matrices shows that all off-diagonal correlation coefficients deviate noticeably from unity, so there is no significant correlation between the independent variables.

3.2. Validation of the Models

Goodness of fit was assessed with the coefficient of determination (R²), the adjusted coefficient of determination (R²_adj), the standard deviation (s), the root-mean-square error (RMSE), Fisher's statistic (F), and the number of variables. Most QSAR modeling methods implement leave-one-out (LOO) or leave-many-out (LMO) cross-validation, which are internal validation techniques [27]. LOO cross-validation removes one data point from the training set, builds the model on the remaining training data, and then predicts the removed point. LMO cross-validation leaves several observations out at a time, which reduces the number of times a model has to be recalculated. The outcome of cross-validation is the cross-validated R² (Q²_LOO or Q²_LMO), which is used as a criterion of both the robustness and the predictive ability of the model. In this paper, LOO and leave-5-out cross-validation were used as internal validation tools. The robustness of the models was further examined with the Y-randomization test [28]. For the Y-randomization test, performed ten times, R² ≤ 0.3 and Q² ≤ 0.05 for all results were considered acceptable; these limits follow the suggestions of Eriksson and coworkers [28]. The Y-randomization test verifies whether models with high values of R² and Q² are the result of chance correlation [29, 30].
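The internal-validation statistics listed above can be sketched in plain Python under common textbook definitions: R² = 1 - RSS/TSS, R²_adj = 1 - (1 - R²)(n - 1)/(n - p - 1), s = sqrt(RSS/(n - p - 1)), RMSE = sqrt(RSS/n), F = (R²/p)/((1 - R²)/(n - p - 1)), and Q² = 1 - PRESS/TSS, where PRESS sums the squared errors of compounds predicted by models refitted without them (one at a time for LOO, in random groups of five for leave-5-out). All helper names are hypothetical, and the exact conventions of the software used in the study may differ.

```python
import math
import random


def ols_fit(X, y):
    """Least squares with intercept via the normal equations; returns [b0, b1, ...]."""
    A = [[1.0] + list(row) for row in X]
    k = len(A[0])
    M = [[sum(a[i] * a[j] for a in A) for j in range(k)]
         + [sum(a[i] * yi for a, yi in zip(A, y))] for i in range(k)]
    for c in range(k):  # Gauss-Jordan elimination with partial pivoting
        p = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(k):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [mr - f * mc for mr, mc in zip(M[r], M[c])]
    return [M[i][k] / M[i][i] for i in range(k)]


def predict(b, X):
    return [b[0] + sum(bj * xj for bj, xj in zip(b[1:], row)) for row in X]


def fit_statistics(X, y):
    """Goodness-of-fit statistics for an MLR model with p descriptors and n compounds."""
    n, p = len(y), len(X[0])
    b = ols_fit(X, y)
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, predict(b, X)))
    mean = sum(y) / n
    tss = sum((yi - mean) ** 2 for yi in y)
    r2 = 1 - rss / tss
    df = n - p - 1  # residual degrees of freedom
    return {"R2": r2,
            "R2adj": 1 - (1 - r2) * (n - 1) / df,
            "s": math.sqrt(rss / df),
            "RMSE": math.sqrt(rss / n),
            "F": (r2 / p) / ((1 - r2) / df)}


def q2(X, y, folds):
    """Cross-validated Q2 = 1 - PRESS/TSS; each fold is left out, refitted, and predicted."""
    n = len(y)
    mean = sum(y) / n
    press = 0.0
    for fold in folds:
        out = set(fold)
        keep = [i for i in range(n) if i not in out]
        b = ols_fit([X[i] for i in keep], [y[i] for i in keep])
        preds = predict(b, [X[i] for i in fold])
        press += sum((y[i] - pi) ** 2 for i, pi in zip(fold, preds))
    return 1 - press / sum((yi - mean) ** 2 for yi in y)


def loo_folds(n):
    return [[i] for i in range(n)]


def lmo_folds(n, k, rng):
    """Random partition of the n compounds into groups of at most k."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i:i + k] for i in range(0, n, k)]


# Hypothetical one-descriptor data set (10 compounds)
X = [[float(i)] for i in range(1, 11)]
y = [3.1, 4.9, 7.2, 8.8, 11.1, 13.0, 14.8, 17.2, 19.0, 20.9]
stats = fit_statistics(X, y)
q2_loo = q2(X, y, loo_folds(len(y)))
rng = random.Random(0)
q2_l5o = [q2(X, y, lmo_folds(len(y), 5, rng)) for _ in range(10)]  # 10 leave-5-out repeats
print(stats, q2_loo, sum(q2_l5o) / len(q2_l5o))
```

For ordinary least squares, each left-out residual is at least as large as the corresponding fitted residual, so Q² computed this way never exceeds R²; a large gap between the two is a common warning sign of overfitting.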

For a more realistic validation of the predictive power of the models, external validation was also performed. For that purpose, six of the 21 Chloroquine derivatives (3, 6, 8, 15, 18, and 19) were selected at random to form the external test set, and the remaining 15 derivatives formed the training set used to calibrate the QSAR models.
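The split described above can be reproduced directly from the compound numbers. One common external-validation statistic, Q²_ext = 1 - Σ(y_obs - y_pred)² / Σ(y_obs - ȳ_train)² over the test compounds, can then be computed from the test-set predictions. Which external statistic the study reports is not specified here, so Q²_ext is an assumed choice and the function names are hypothetical.

```python
TEST_SET = {3, 6, 8, 15, 18, 19}  # compound numbers of the external test set (from the text)


def split_train_test(compound_ids, test_ids=TEST_SET):
    """Return (training, test) compound numbers, preserving the original order."""
    train = [c for c in compound_ids if c not in test_ids]
    test = [c for c in compound_ids if c in test_ids]
    return train, test


def q2_external(y_test, y_pred, train_mean):
    """Q2_ext = 1 - PRESS / sum((y_test - mean of training y)^2)."""
    press = sum((yo - yp) ** 2 for yo, yp in zip(y_test, y_pred))
    ss = sum((yo - train_mean) ** 2 for yo in y_test)
    return 1 - press / ss


def rmse(y_obs, y_pred):
    """Root-mean-square error of the test-set predictions."""
    return (sum((yo - yp) ** 2 for yo, yp in zip(y_obs, y_pred)) / len(y_obs)) ** 0.5


train, test = split_train_test(range(1, 22))
print(len(train), test)  # 15 [3, 6, 8, 15, 18, 19]
```

Using the training-set mean in the denominator means the statistic rewards only predictive skill beyond what the training data alone would suggest for an unseen compound.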

3.3. QSAR Models for 3D7 and W2 Strains

Using the best multiple linear regression equations, models for the antimalarial activity against both the Chloroquine-sensitive (3D7) and the Chloroquine-resistant (W2) strains of P. falciparum were constructed with up to five descriptors. The predicted log IC50 values and the residuals for the compounds are listed in Table 1, and the QSAR models generated for the two strains (3D7, W2) are shown in Table 10. These models explain the observed biological activity well, with high correlation coefficients and low root-mean-square errors (R² = 0.94, R²_adj = 0.92, and RMSE = 0.14 for the 3D7 strain; R² = 0.94, R²_adj = 0.91, and RMSE = 0.16 for the W2 strain). To validate the selected prediction functions, cross-validation and an external test were carried out. The models also have good predictive capacity (Q²_LOO = 0.86 for both strains). In general, the MLR models explained the data variance and were quite stable to the inclusion and exclusion of compounds, as measured by the LOO cross-validated coefficients (Q²_LOO > 0.5). The results of the LMO test are collected in Table 4. For a theoretically acceptable model, R² cannot be smaller than Q²_LOO or Q²_LMO, and the best model is obtained when the LMO results remain close to the LOO results. The Y-randomization results, shown in Figures 2 and 3, are in agreement with the suggested limits [28], which indicates that the variance explained by the models is not due to chance correlation. The training-set equations and their statistical parameters are summarized in Table 11. Plots of LOO cross-validated and test-set predictions versus experimental log IC50 values for the 3D7 and W2 MLR models are shown in Figure 4.

3.4. QSAR Model for Blood-to-Plasma Concentration Ratio

The best linear models, relating five descriptors to the log B/P values, are given in Table 12. The predicted values and the residuals for the compounds are listed in Table 2. As can be seen, the MLR models have good statistical quality with low prediction error. The models were validated by calculating cross-validated Q² values with the LOO method, which measure the predictive power of the regression equations. The Q²_LOO values of the best regression models for log B/P indicate robust models. The results of the LMO test are collected in Table 3. On average over all test steps, the LMO results remain close to the LOO results, which is further evidence that the model is not underdetermined. The model was further validated by Y-randomization, in which the Y vector was randomly shuffled several times. The Y-randomization results are in agreement with the suggested limits [28] and are shown in Figures 5 and 6. The predictive ability of the MLR models was also tested on the validation set (Table 13). The correlations between the predicted and experimental values of log B/P (from LOO cross-validation and the external test) are shown in Figure 7.
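A minimal sketch of the Y-randomization test, assuming the usual protocol: the response vector is shuffled, the model is refitted on the unchanged descriptors, and R² and Q²_LOO of each scrambled model are recorded. It further assumes that the 0.3 and 0.05 limits from Section 3.2 apply to R² and Q²_LOO of every scrambled model. The ten rounds, the seed, and all function names are hypothetical.

```python
import random


def ols_fit(X, y):
    """Least squares with intercept via the normal equations; returns [b0, b1, ...]."""
    A = [[1.0] + list(row) for row in X]
    k = len(A[0])
    M = [[sum(a[i] * a[j] for a in A) for j in range(k)]
         + [sum(a[i] * yi for a, yi in zip(A, y))] for i in range(k)]
    for c in range(k):  # Gauss-Jordan elimination with partial pivoting
        p = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(k):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [mr - f * mc for mr, mc in zip(M[r], M[c])]
    return [M[i][k] / M[i][i] for i in range(k)]


def predict(b, X):
    return [b[0] + sum(bj * xj for bj, xj in zip(b[1:], row)) for row in X]


def r2_and_q2_loo(X, y):
    """R2 of the full fit and LOO cross-validated Q2, both relative to the mean of y."""
    n = len(y)
    mean = sum(y) / n
    tss = sum((yi - mean) ** 2 for yi in y)
    b = ols_fit(X, y)
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, predict(b, X)))
    press = 0.0
    for i in range(n):  # refit without compound i, then predict it
        b_i = ols_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - predict(b_i, [X[i]])[0]) ** 2
    return 1 - rss / tss, 1 - press / tss


def y_randomization(X, y, rounds=10, seed=0, r2_max=0.3, q2_max=0.05):
    """Shuffle y, refit on the same descriptors, and score each scrambled model."""
    rng = random.Random(seed)
    results = []
    for _ in range(rounds):
        y_perm = list(y)
        rng.shuffle(y_perm)  # breaks any real structure-property link
        results.append(r2_and_q2_loo(X, y_perm))
    passed = all(r2 <= r2_max and q2 <= q2_max for r2, q2 in results)
    return results, passed


# Hypothetical one-descriptor data set (10 compounds)
X = [[float(i)] for i in range(1, 11)]
y = [3.1, 4.9, 7.2, 8.8, 11.1, 13.0, 14.8, 17.2, 19.0, 20.9]
print("original model:", r2_and_q2_loo(X, y))
results, passed = y_randomization(X, y)
print("scrambled models within limits:", passed)
```

If the original model's statistics are far above those of every scrambled model, the fit is unlikely to be a chance correlation; with very small data sets, an occasional scrambled R² above 0.3 can still occur by chance alone.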

4. Conclusions

A quantitative structure-activity relationship (QSAR) study was carried out on a series of 4-aminoquinoline antimalarial compounds potentially active against the 3D7 and W2 strains of P. falciparum. For each strain, statistically significant models were obtained with the GA-based MLR method. These equations may be used to predict the antimalarial activity of compounds structurally similar to those used in this study. GA-MLR models were also developed to predict the blood-to-plasma concentration ratio of the analogues from selected molecular descriptors, and their predictive ability was confirmed on the test and validation sets. The LOO and LMO cross-validation methods, the Y-randomization technique, and external validation indicated that the models are significant, robust, and have good internal and external predictivity. These models may be a useful tool in early drug discovery by providing a relevant pharmacokinetic parameter.

Acknowledgments

The authors thank the Young Researchers Club, Hamedan Branch of Islamic Azad University, for financial support. The authors also thank Professor E. B. de Melo and Dr. R. Ghavami for their valuable help with this work. Anonymous reviewers are gratefully acknowledged for helpful suggestions that improved the paper.