Journal of Chemistry

Research Article | Open Access

Volume 2013 | Article ID 154629

Shreekant Deshpande, Mohammad Goodarzi, Seturam B. Katti, Yenamandra S. Prabhakar, "Topological Features in Profiling the Antimalarial Activity Landscape of Anilinoquinolines: A Multipronged QSAR Study", Journal of Chemistry, vol. 2013, Article ID 154629, 14 pages, 2013.

Topological Features in Profiling the Antimalarial Activity Landscape of Anilinoquinolines: A Multipronged QSAR Study

Academic Editor: Marjana Novic
Received 27 Jun 2012
Revised 30 Aug 2012
Accepted 30 Aug 2012
Published 18 Nov 2012


The antimalarial activity of a series of 4-anilinoquinolines was modeled with topological and other functional descriptors using the feature selection approaches CP-MLR and GA. Five models were identified from each approach to explain the activity of the compounds. Together they shared eighteen descriptors, of which five (H-052, MATS4m, MATS7e, Mor30p, and R7m) were common to both approaches. In PLS analysis the eighteen descriptors led to a three-component model that explained 73.1% of the variance in the training set activity (test set $r^2 = 0.676$), and the common descriptors were among the most influential ones in modulating the activity. Among them, MATS7e indicated that nonlinear and branched molecular topology favors higher activity, and MATS4m likewise favored branching/nonlinearity in the molecule. H-052 suggested that R′CH2–CHX–CH2R fragments (X is halogen) in the scaffold enhance the activity. In BP-ANN these descriptors led to very good predictive models (training $r^2 > 0.81$; test $r^2 > 0.75$). The study offers direction for understanding the patterns of the antimalarial activity of anilinoquinolines and for exploring potential prototype compounds.

1. Introduction

Malaria is a vector-borne parasitic infection (vector: female mosquitoes of the Anopheles genus; parasite: protozoa of the genus Plasmodium) of the tropical regions with serious health and economic implications. The interventional measures of the last decade have brought some relief in the number of deaths due to malaria. However, these efforts have not reduced the incidence of drug resistance [1]. In fact, until the recognition of drug-resistant strains of Plasmodium falciparum, the treatment of malaria relied heavily on chloroquine as the first-line drug [2, 3]. In clinical practice chloroquine also suffers from several limitations and side effects, which include gastrointestinal, stomach, and neural disturbances and blurring of vision [4, 5]. Mechanistic investigations of the antimalarial activity of this (quinoline) class have indicated that chloroquine and its analogues follow a similar pathway in the expression of the activity [6, 7]. The drug resistance of the parasite is compound centric and not due to an altered mechanism of action [8–10]. This has renewed the research interest in alternative quinolines as potential antimalarial agents. Moreover, the large body of preclinical and clinical information and the low cost and ease of preparation of alternative/new drugs or drug-like molecules have encouraged researchers to venture into this chemical class [11–16].

In the quinoline class of compounds, amodiaquine (Figure 1) is a clinically practiced antimalarial agent [17]. Chloroquine-resistant Plasmodium parasites are not automatically cross-resistant to it [18]. However, amodiaquine is reported to cause agranulocytosis and hepatitis [19, 20]. These side effects are attributed to the 4-hydroxyanilino moiety of amodiaquine, which in biological systems undergoes enzymatic oxidation to the quinoneimine form and then undergoes nucleophilic addition with proteins [21, 22]. To overcome these undesirable side effects, different 7-chloro-4-(3′,5′-disubstituted anilino)quinolines were explored as alternative antimalarial agents [23–26]. These compounds structurally resemble amodiaquine but lack the 4-hydroxyl group on the anilino moiety to which the side effects are attributed.

In the medicinal chemistry paradigm, rational drug design approaches, which include quantitative structure-activity relationship (QSAR) and molecular modeling protocols, cull out the structural and functional information of chemical entities that is desirable for a biological response. This may come in handy to modulate/design the biological response of intended compounds. Here the structure-activity elucidation of the compounds is attempted by taking into account the correlation between the chemical structure space indices and their biological response landscape. The earlier QSAR study [27] on some 7-chloro-4-(3′,5′-disubstituted anilino)quinolines, involving 2D molecular features, indicated that the 3′- and 5′-substituents of the anilino moiety map different domains with substructure preferences in the activity space. It also indicated that electron rich centers in the aniline substituent groups favor better antimalarial activity. Against this background, the QSAR analysis of the antimalarial activity of an enlarged dataset of 4-anilinoquinolines has been undertaken to broaden the structural information relevant to the activity space. The results are presented hereunder.

2. Materials and Methods

2.1. Chemical Structure Database and Biological Activity

A dataset of 90 anilinoquinolines (Figure 1(b)) along with their antimalarial activity (IC50, the concentration of compound, in micromoles, required to reduce the FcB1R strain of P. falciparum by 50%) reported in the literature was considered for this study [23–26]. The substitution positions in these compounds are briefly summarized in Table 1. The antimalarial activity (IC50) of all these compounds was reported using the same experimental protocol [23–26]. The compounds exhibited good variation (~2.6 orders of magnitude) in their antimalarial activity. For the modeling study the activity has been transformed into the logarithm of the inverse of the inhibitory concentration and expressed as pIC50 (Table 2).
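The pIC50 transformation can be illustrated with a minimal sketch. It assumes that the concentration is converted to molar units before taking the logarithm, which is consistent with the pIC50 range of roughly 6 to 8.4 in Table 2 but is not stated explicitly in the text; the function name is hypothetical.

```python
from math import log10

def pic50_from_micromolar(ic50_um):
    """Return pIC50 = -log10(IC50 in mol/L) for an IC50 given in micromoles per litre.

    Assumption: the logarithm is taken on the molar scale, so 1 uM maps to pIC50 = 6.
    """
    if ic50_um <= 0:
        raise ValueError("IC50 must be positive")
    return 6.0 - log10(ic50_um)

# A compound with IC50 = 0.01 uM (10 nM) has pIC50 = 8
print(pic50_from_micromolar(0.01))
```

On this scale a spread of about 2.6 log units corresponds to a roughly 400-fold range in IC50.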

Comp no.

2–19–NHC(O)– –NRR′–CH2OHH
20–31–NHC(O)–CH2–N(CH2)5–CH2–OC(O)– –R′′H
68–73–NHC(O)–CH2–N(CH2)5–CH2–OC(O)– –R′′H
74–80–NHC(O)–CH2–N(CH2)5–CH2–OC(O)–NH– –R′′H

a The NRR′ and R′′ in R1/R2 groups represent the extended structural moieties (functionalized alkyl, aryl, and other functional units) attached to (in place of) them.

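Compound (a) and NRR′/R′′ (b) | pIC50 (obs.) | CP Eq. (c) | GA Eq. | PLS All | BP-ANN dEq. | BP-ANN dEq. | BP-ANN Com
--- | --- | --- | --- | --- | --- | --- | ---
AQ16(Tr) 1 NHCH2C6H4Cl (para) | 6.07 | 6.41 | 6.19 | 6.43 | 6.51 | 6.21 | 6.07
AQ36(T) NHCH2CH2-pyrrolidine | 7.99 | 7.95 | 8.12 | 8.02 | 7.94 | 8.04 | 8.05
AQ37(Tr) NHCH2CH2-piperidine | 8.18 | 7.96 | 8.01 | 7.99 | 7.74 | 8.00 | 8.04
AQ38(V) NHCH2CH2-morpholine | 8.10 | 7.55 | 7.79 | 7.77 | 7.81 | 7.95 | 7.78
AQ59(Tr) NHCH2C6H4Cl (para) | 8.34 | 8.36 | 8.57 | 8.46 | 8.31 | 8.35 | 8.19
AQ60(V) NHCH2C6H4OMe (para) | 8.36 | 7.94 | 8.
AQ61(T) NHCH2C6H4CF3 (para) | 8.26 | 7.84 | 8.10 | 7.97 | 8.25 | 7.91 | 7.79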

a Compounds: 1–61, [23]; 62–67, [24]; 68–74, [25]; 75–90, [26]. The training, validation, and test sets are specified by suffixing the compound numbers with (Tr), (V), and (T), respectively.
b See Table 1 for positioning of the NRR′/R′′ variations in the compounds.
c CP, GA, PLS, and BP-ANN represent CP-MLR, GA, PLS, and BP-ANN regressions; Eq. and Eq. , respectively, correspond to Eq. and Eq. of Table 3; “All” corresponds to the 18-descriptor PLS regression model (Table 5); dEq. , dEq. , and “Com” represent BP-ANN models from the descriptors of Eq. , Eq. (Table 3), and the common features of CP-MLR and GA (Table 4), respectively.

The structure database of the compounds under investigation was generated from the X-ray crystal structure of amodiaquine (Figure 1(a)) [28] to impart 3D characteristics to the chemical space of the agents. Accordingly, the 3D structures of the compounds (Figure 1(b)) were built in SYBYL [29] using the procedures implemented therein. In Dragon software [30] these conformations resulted in 490 and 686 descriptors, respectively, to profile the 0D to 2D and the 3D characteristics of the molecules. Prior to the QSAR study, all descriptors showing a correlation of less than 0.1 with the dependent variable (descriptor versus activity, $r < 0.1$) and descriptors showing an intercorrelation greater than or equal to 0.9 ($r \geq 0.9$) were excluded. This reduced the 0D to 2D and the 3D descriptors to 101 and 131, respectively, for correlating with the activity.
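The two-step descriptor prefiltering can be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually used: the text does not say which member of a highly intercorrelated pair was discarded, so the sketch keeps the one better correlated with activity, and it applies the thresholds to absolute correlations. The function names and the toy data are hypothetical.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation coefficient; returns 0.0 for a constant column."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)

def prefilter(descriptors, activity, min_r_activity=0.1, max_r_inter=0.9):
    """Drop descriptors with |r| < 0.1 versus activity, then drop one of each
    pair with |r| >= 0.9 (assumption: keep the one better correlated with activity)."""
    kept = {name: values for name, values in descriptors.items()
            if abs(pearson(values, activity)) >= min_r_activity}
    ranked = sorted(kept, key=lambda n: abs(pearson(kept[n], activity)), reverse=True)
    selected = []
    for name in ranked:
        if all(abs(pearson(kept[name], kept[s])) < max_r_inter for s in selected):
            selected.append(name)
    return selected

# Toy data: d2 duplicates d1 (r = 1), d3 is uncorrelated with activity
activity = [1, 2, 3, 4, 5]
descriptors = {
    "d1": [1, 2, 3, 4, 6],
    "d2": [2, 4, 6, 8, 12],
    "d3": [3, 1, 3, 1, 3],
    "d4": [5, 3, 4, 1, 2],
}
print(prefilter(descriptors, activity))
```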

For the QSAR study, the dataset was divided into two mutually exclusive groups, the training and test sets, using the fingerprints of the BIT-packed version of the Molecular ACCess System (FP-BIT-MACCS) of the compounds. The concept of molecular fingerprints was originally introduced by Molecular Design Limited, Inc. (MDL) as part of its informatics services to the life sciences and chemical industry [31]. In the molecular operating environment (MOE) software [32], cluster analysis of the MACCS fingerprints of the compounds was carried out at 85% similarity to segregate the compounds into training and test sets. The compounds were arbitrarily assigned to the training set (50 compounds) and the test set (40 compounds) in such a way that the members of each cluster were distributed over both sets. Furthermore, to facilitate the comparison of the significance of descriptors with one another in the derived models, all descriptor values were scaled between “0” and “1” (inclusive of both values) using the following transformation:

$$X'_{ij} = \frac{X_{ij} - X_{j,\min}}{X_{j,\max} - X_{j,\min}}, \qquad (1)$$

where $X_{ij}$, $X_{j,\min}$, $X_{j,\max}$, and $X'_{ij}$ are the training set feature $j$’s original, minimum, maximum, and transformed descriptor values, respectively.
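A minimal sketch of this min-max scaling is given below. It takes the minimum and maximum from the training set, as the text describes, and reuses them for other compounds; that a test compound may then fall slightly outside the 0 to 1 range is an observation about this sketch, not a statement from the study. The function names are hypothetical.

```python
def fit_min_max(train_column):
    """Return the (minimum, maximum) of one descriptor over the training set."""
    return min(train_column), max(train_column)

def scale(values, lo, hi):
    """Scale values with the training-set minimum and maximum."""
    if hi == lo:
        # A constant descriptor carries no information; map it to zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

train = [2.0, 4.0, 6.0]
lo, hi = fit_min_max(train)
print(scale(train, lo, hi))   # training values span 0 to 1
print(scale([8.0], lo, hi))   # a compound outside the training range exceeds 1
```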

The significance of obtained molecular features in explaining the antimalarial activity of the compounds has been investigated using the combinatorial protocol in multiple linear regression (CP-MLR) [33], genetic algorithm (GA) [34], partial least squares (PLS) [35, 36], and artificial neural networks (ANN) [37] methods. Only the training set compounds were used for deriving the models and the test set compounds were used for the external validation of the derived models. Purposefully a large test set was created to facilitate a follow-up study of derived models in back-propagation artificial neural networks (BP-ANN). The modeling procedures and the computations are briefly described below.

2.2. CP-MLR

Combinatorial protocol in multiple linear regression (CP-MLR) is a filter-based variable selection procedure [33]. Its procedural aspects are discussed in some recent publications [38, 39]. It operates through a combinatorial seeding strategy followed by predefined filters to assess the significance of the seeds, and finally employs MLR to develop the models from the significant seeds. A unique combination of descriptors (variables) is referred to as a seed. Here, filter-1 restricts the intercorrelation among the descriptors of a seed; filter-2 controls the seeds through the $t$-values of the coefficients of the individual descriptors in regression (default $t$-value ≥ 2.0); filter-3 compares the seeds in different equations in terms of the square root of the adjusted multiple correlation coefficient of the regression, $\bar{r}$; and filter-4 estimates the consistency of the equation in terms of the cross-validated $r^2$, or $Q^2$, with leave-one-out (LOO) cross-validation as the default option. In CP-MLR, for the selection of features from the datasets the initial threshold of filter-1 was set at 0.3 and subsequently relaxed to 0.79 to boost the formation of different seeds. The search was started with two-variable seeds and with an initial filter-3 value of 0.74. The information rich descriptors were collected by successively incrementing the number of variables per seed as well as the threshold of filter-3 to the optimum $\bar{r}$ value of the preceding generation.
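The combinatorial seeding stage can be sketched as follows. This is not the CP-MLR program itself: it covers only the first stage, under the assumption that filter-1 rejects a seed when any pair of its descriptors has an absolute intercorrelation above the threshold. The regression-based filters (2 to 4) are omitted, and all names and data are hypothetical.

```python
from itertools import combinations
from math import sqrt

def pearson(a, b):
    """Pearson correlation coefficient; returns 0.0 for a constant column."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)

def seeds_passing_filter1(descriptors, seed_size=2, max_inter_r=0.3):
    """Enumerate all descriptor combinations (seeds) of a given size and keep
    those whose pairwise |r| does not exceed the filter-1 threshold."""
    passed = []
    for seed in combinations(sorted(descriptors), seed_size):
        pairs = combinations(seed, 2)
        if all(abs(pearson(descriptors[a], descriptors[b])) <= max_inter_r
               for a, b in pairs):
            passed.append(seed)
    return passed

# Toy data: d1 and d2 are perfectly correlated, d3 is orthogonal to both
descriptors = {
    "d1": [1, 2, 3, 4],
    "d2": [2, 4, 6, 8],
    "d3": [1, -1, -1, 1],
}
print(seeds_passing_filter1(descriptors, seed_size=2, max_inter_r=0.3))
```

Relaxing `max_inter_r` from 0.3 towards 0.79, as described above, admits more seeds into the later regression filters.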

2.3. GA

The genetic algorithm variable subset selection (GA-VSS) routine as implemented in MOBY DIGS [40, 41] was used for the selection of GA features. It proceeded with an initial population of one hundred solutions (chromosomes), with a maximum of five variables allowed in a solution. The fitness of each chromosome was calculated based on leave-one-out (LOO) cross-validation ($Q^2_{LOO}$). The reproduction/mutation trade-off value was set to 0.5. Based on this value, the crossover and mutation values of the GA were fixed automatically during the computation. The optimum solutions were identified at the end of one hundred generations of the GA evolution process (selection, crossover, and mutation).

The models that emerged from the CP-MLR and GA approaches were further tested for chance correlation through one hundred simulation runs with repeated randomization of the biological response [42, 43]. The correlation coefficients of the simulated regressions were used to determine the average correlation coefficient of Y-randomization as well as the percent chance correlation of the model under scrutiny. The derived models were also externally validated by predicting the activities of the test set compounds, which were not part of the model generation exercise. The test set predictions were used for computing the test set $r$-square statistic ($r^2_{\text{test}}$) of the model in question. Normally, models with an $r^2_{\text{test}}$ value greater than 0.5 are treated as reliable. Finally, the descriptors identified in CP-MLR and GA were subjected to partial least squares (PLS) [35, 36] analysis to present single-window QSAR models comprising all identified descriptors.
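The randomization test can be sketched as below. For brevity the sketch uses a single-descriptor model, so that $r^2$ is simply the squared Pearson correlation, whereas the study used five-descriptor regressions. Counting the shuffled runs that match or beat the real model as the "percent chance correlation" is an assumption, and all names are hypothetical.

```python
import random
from math import sqrt

def r_squared(x, y):
    """Squared Pearson correlation between one descriptor and the response."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov * cov / (var_x * var_y)

def y_randomization(x, y, runs=100, seed=0):
    """Shuffle the response repeatedly and compare the resulting r2 values
    with the r2 of the unshuffled data."""
    rng = random.Random(seed)
    true_r2 = r_squared(x, y)
    shuffled = list(y)
    r2_values = []
    for _ in range(runs):
        rng.shuffle(shuffled)
        r2_values.append(r_squared(x, shuffled))
    mean_r2 = sum(r2_values) / runs
    # Assumed definition: share of randomized runs reaching the real r2
    chance_percent = 100.0 * sum(v >= true_r2 for v in r2_values) / runs
    return true_r2, mean_r2, max(r2_values), chance_percent

x = list(range(20))
y = [2 * v + 1 for v in x]
true_r2, mean_r2, max_r2, chance = y_randomization(x, y)
print(true_r2, mean_r2, max_r2, chance)
```

A model is considered free of chance correlation when the randomized runs give clearly lower values than the real data.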

2.4. Applicability Domain

The usefulness of a model may be judged by its ability to predict new compounds. In this context, the applicability domain defines the predictive space of a model. The training set data, when projected in the model’s multivariate parameter space, divide the plot into regions populated with data and empty ones. The populated regions define the applicability domain of the model and indicate the space that is suitable for predictions. Computationally, the applicability domain of the models is evaluated through the plot of standardized residuals versus leverage values ($h_i$) for each compound [44]. This is also known as the Williams plot and is useful for the detection of both the response outliers ($Y$-outliers) and the structurally influential chemicals ($X$-outliers) in the model. In this plot, the applicability domain is the squared area within $\pm x$ standard deviations (where $x$ may be given a value between 2 and 3) and a leverage threshold ($h^*$), which is typically fixed at $3(k+1)/n$ (where $n$ is the number of compounds in the training set and $k$ is the number of parameters in the model). If a compound’s leverage value ($h_i$) is smaller than $h^*$, the probability that its prediction holds may be as high as that of the training set compounds. Making use of these settings, the applicability domains of the models from CP-MLR, GA, and PLS have been scrutinized for their predictive capability.
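The Williams plot limits can be sketched as follows, assuming the common leverage threshold $3(k+1)/n$. This form reproduces the limiting leverage of 0.36 quoted in the results for 50 training compounds and 5 descriptors, and the default of 2.5 standard deviations is the residual limit stated there. The function names are hypothetical, and the leverages and standardized residuals are taken as already computed.

```python
def leverage_threshold(n_train, n_params):
    """Warning leverage h* = 3(k + 1)/n for n training compounds and k parameters."""
    return 3.0 * (n_params + 1) / n_train

def classify(std_residual, leverage, n_train, n_params, sd_limit=2.5):
    """Return (is_response_outlier, is_leverage_outlier) for one compound."""
    is_response_outlier = abs(std_residual) > sd_limit
    is_leverage_outlier = leverage > leverage_threshold(n_train, n_params)
    return is_response_outlier, is_leverage_outlier

print(leverage_threshold(50, 5))          # limiting leverage for n = 50, k = 5
print(classify(0.5, 0.366, 50, 5))        # structurally influential only
print(classify(-2.7, 0.10, 50, 5))        # response outlier only
```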

2.5. BP-ANN

In ANN modeling a training set was used for the model generation while a validation set was applied to stop the overfitting of the network. Additionally, a test set was used to verify the predictivity of the generated model. In the computation, the CP-MLR/GA training set (50 compounds) was used as such for training the network. However, the test set (40 compounds) of the CP-MLR/GA study was randomly divided into the ANN’s validation (20 compounds) and test (20 compounds) sets. Coinciding with the number of descriptors in the individual feature selection models, five descriptors were considered as the input for the ANN as well. Before training the networks, the input and output values were normalized by autoscaling all data. The initial weights were selected randomly between −0.3 and 0.3. The optimum number of nodes for the hidden layer was assessed using the standard evaluation procedure with different numbers of hidden layer nodes. The goal of training the network is to minimize the output errors by changing the weights between the layers [37]. Equation (2) gives the changes in the values of the weights in the network during the optimization of the output:

$$\Delta W_{ij,n} = F_n + \alpha\,\Delta W_{ij,n-1}. \qquad (2)$$

In this, $\Delta W_{ij}$ is the change in the weight factor for each network node, $\alpha$ is the momentum factor, and $F$ is a weight update function, which indicates how the weights are changed during the learning process. The weights of the hidden layer were optimized using a second-derivative optimization method, namely, the Levenberg-Marquardt algorithm [45, 46].
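The role of the momentum factor can be illustrated with a small sketch. It assumes the usual form in which the current weight change equals the value of the update function plus the momentum factor times the previous change; the function name and the numbers are hypothetical.

```python
def momentum_step(weight, update, prev_delta, alpha):
    """Apply one weight change: delta = update + alpha * previous delta."""
    delta = update + alpha * prev_delta
    return weight + delta, delta

# The previous change (+0.10) partly offsets the new update (-0.05) when alpha = 0.6
weight, delta = momentum_step(weight=0.20, update=-0.05, prev_delta=0.10, alpha=0.6)
print(weight, delta)
```

With a momentum factor of zero the weight would simply follow the update function; larger values smooth successive changes.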

2.6. Levenberg-Marquardt Algorithm

In this algorithm, the update function, $F_n$, is calculated using the following equations:

$$g = J^{T} e, \qquad F_n = -\left[J^{T} J + \mu I\right]^{-1} J^{T} e,$$

where $g$ is the gradient, $J$ is the Jacobian matrix that contains the first derivatives of the network errors with respect to the weights, and $e$ is a vector of network errors. The parameter $\mu$ is multiplied by some factor ($\lambda$) whenever a step would result in an increased $e$, and when a step reduces $e$, $\mu$ is divided by $\lambda$.
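The damping logic can be sketched on a toy problem. The code below is not the MATLAB network training used in the study: it fits the two-parameter model $y = a\,e^{bx}$ to noise-free data with the usual Levenberg-Marquardt step $-(J^{T}J + \mu I)^{-1}J^{T}e$, multiplying $\mu$ by a factor after a rejected step and dividing it after an accepted one. The model, starting values, and factor of 10 are illustrative assumptions.

```python
from math import exp

def residuals(params, xs, ys):
    a, b = params
    return [a * exp(b * x) - y for x, y in zip(xs, ys)]

def jacobian(params, xs):
    # Derivatives of each residual with respect to a and b
    a, b = params
    return [(exp(b * x), a * x * exp(b * x)) for x in xs]

def sse(errors):
    return sum(e * e for e in errors)

def lm_fit(xs, ys, params, mu=0.01, factor=10.0, iterations=50):
    errors = residuals(params, xs, ys)
    for _ in range(iterations):
        J = jacobian(params, xs)
        # Gradient g = J^T e
        g0 = sum(j[0] * e for j, e in zip(J, errors))
        g1 = sum(j[1] * e for j, e in zip(J, errors))
        # Damped normal matrix J^T J + mu * I (2 x 2, symmetric)
        a00 = sum(j[0] * j[0] for j in J) + mu
        a01 = sum(j[0] * j[1] for j in J)
        a11 = sum(j[1] * j[1] for j in J) + mu
        det = a00 * a11 - a01 * a01
        # Step = -(J^T J + mu I)^-1 g, using the closed-form 2 x 2 inverse
        step0 = -(a11 * g0 - a01 * g1) / det
        step1 = -(a00 * g1 - a01 * g0) / det
        trial = (params[0] + step0, params[1] + step1)
        trial_errors = residuals(trial, xs, ys)
        if sse(trial_errors) < sse(errors):
            params, errors = trial, trial_errors
            mu /= factor      # successful step: reduce damping
        else:
            mu *= factor      # failed step: increase damping and retry
    return params, sse(errors)

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * exp(0.5 * x) for x in xs]
start = (1.0, 0.3)
initial_sse = sse(residuals(start, xs, ys))
fitted, final_sse = lm_fit(xs, ys, start)
print(fitted, initial_sse, final_sse)
```

Because a step is accepted only when it lowers the sum of squared errors, the error never increases from one accepted step to the next.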

2.7. Statistical Parameters

In training the network, the overfitting of data was controlled by comparing the root-mean-square errors (RMSEs) of the training and validation sets. The RMSE measures the goodness of the output and is useful for comparison with the target values. The training of the network was stopped when the RMSE of the validation set began to increase while that of the training set continued to decrease. The goodness of fit of the activity of the test set compounds was used to further validate the developed models. The predictive ability of the constructed models was assessed using different statistical measures, namely, the training, validation, and test sets’ correlation coefficients ($r^2$) and the corresponding root-mean-square error of prediction (RMSEP), relative standard error of prediction (RSEP), and mean absolute error (MAE) values. More information on the statistical parameters can be found in an applied statistics handbook [47]. The statistical parameters used in the study are calculated using the following equations:

$$r^2 = \frac{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}, \qquad \text{RMSEP} = \sqrt{\frac{\sum_{i=1}^{n} (\hat{y}_i - y_i)^2}{n}},$$

$$\text{RSEP}\,(\%) = 100 \times \sqrt{\frac{\sum_{i=1}^{n} (\hat{y}_i - y_i)^2}{\sum_{i=1}^{n} y_i^2}}, \qquad \text{MAE}\,(\%) = \frac{100}{n} \sum_{i=1}^{n} \left|\hat{y}_i - y_i\right|,$$

where $y_i$ is the observed activity, $\bar{y}$ is the mean of the observed activity values, $\hat{y}_i$ is the predicted activity of the compound in the sample, and $n$ is the number of samples in the concerned set. The ANN computations were carried out using MATLAB 7.6 for Windows [48].
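The error statistics can be computed with a few lines of code. The sketch below uses commonly applied definitions, which are assumptions here: RMSEP as the root of the mean squared prediction error, RSEP as a percentage relative to the sum of squared observed values, MAE as a percentage-scaled mean absolute error, and $r^2$ as the ratio of explained to total sum of squares about the observed mean. Function names and data are hypothetical.

```python
from math import sqrt

def rmsep(obs, pred):
    return sqrt(sum((p - o) ** 2 for o, p in zip(obs, pred)) / len(obs))

def rsep_percent(obs, pred):
    return 100.0 * sqrt(sum((p - o) ** 2 for o, p in zip(obs, pred))
                        / sum(o ** 2 for o in obs))

def mae_percent(obs, pred):
    return 100.0 * sum(abs(p - o) for o, p in zip(obs, pred)) / len(obs)

def r2_explained(obs, pred):
    # Assumed convention: explained sum of squares over total sum of squares
    mean_obs = sum(obs) / len(obs)
    return (sum((p - mean_obs) ** 2 for p in pred)
            / sum((o - mean_obs) ** 2 for o in obs))

obs = [1.0, 2.0, 3.0, 4.0]
pred = [1.5, 2.0, 3.0, 3.5]
print(rmsep(obs, pred), rsep_percent(obs, pred), mae_percent(obs, pred),
      r2_explained(obs, pred))
```

Tracking `rmsep` on the training and validation sets after each training iteration gives the early-stopping criterion described above.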

3. Results and Discussion

The QSAR analysis of the antimalarial activity of anilinoquinolines has been carried out with the CP-MLR and GA approaches using the 0D to 3D features of the molecules from Dragon software. At the end of the analysis, five 5-parameter equations from each approach were identified as significant for modeling the activity of the compounds. The models identified from each approach are shown in Table 3. There are no common models between the CP-MLR and GA approaches. However, several descriptors are common to the models from both approaches. The models predicted the activities of the training and test set compounds within reasonable limits of their actual values. Statistically, they explained between 66% and 69% of the variance ($r^2 = 0.66$ to 0.69) in the activity of the training set compounds and also predicted more than 50% of the variance ($r^2_{\text{test}} > 0.5$) in the activity of the test set compounds (Table 3). For selected CP-MLR and GA models the training and test set predictions are shown in Table 2.

Eq. | Constant | Descriptor and scaled regression coefficient

a The number of compounds used in regression (training set) is 50 and the number of compounds in the test set is 40. The regression coefficients of the descriptors shown are from the scaled descriptor values (scaled between “0” and “1”, inclusive of both values). All regression coefficients are significant at higher than the 95% confidence level. In the randomization study (100 simulations each time) no model showed an average correlation coefficient greater than 0.107 or a maximum greater than 0.385. For all regression equations, $r$ is the multiple correlation coefficient, $Q^2_{LOO}$ and $Q^2_{L5O}$ are the cross-validated $r^2$ from the leave-one-out (LOO) and leave-group-of-five-out procedures, respectively, $s$ is the standard error of the estimate, $F$ is the F-ratio between the variances of calculated and observed activities, and $r^2_{\text{test}}$ is the test set $r^2$ value.

The equations from CP-MLR jointly shared eleven descriptors, and likewise the GA equations shared twelve descriptors (Table 4). Together, these models led to 18 descriptors (Table 4) as information rich features to model the antimalarial activity of the compounds. These descriptors belong to seven different classes, namely, functional groups (nNR2), atom-centered fragments (H-047, H-052), 2D autocorrelations (MATS4m, MATS8m, MATS5e, MATS7e), radial distribution function (RDF085p), 3D molecule representation of structures based on electron diffraction signals (Mor15m, Mor17p, Mor30p), Weighted Holistic Invariant Molecular descriptors (E1m, E2m), and GEometry, Topology, and Atom-Weights AssemblY (R6m, R7m, R7m+, R6e+, RTe+) descriptors. A brief physical meaning of these descriptors in terms of structural features is given in Table 4.

S. no. | Descriptor | Class (a) | FS (b) | Information content (c)
--- | --- | --- | --- | ---
1 | MATS4m | 2D-Auto | C, G | Moran autocorrelation, lag 4, weighted by atomic masses
2 | MATS8m | 2D-Auto | G | Moran autocorrelation, lag 8, weighted by atomic masses
3 | MATS5e | 2D-Auto | G | Moran autocorrelation, lag 5, weighted by atomic Sanderson electronegativities
4 | MATS7e | 2D-Auto | C, G | Moran autocorrelation, lag 7, weighted by atomic Sanderson electronegativities
5 | nNR2 | Funct | G | Number of tertiary amines (aliphatic)
6 | H-047 | ACF | G | H attached to C1(sp3), C0(sp2)
7 | H-052 | ACF | C, G | H attached to C0(sp3) with 1X attached to next C
8 | RDF085p | RDF | C | Radial distribution function, 8.5, weighted by atomic polarizabilities
9 | Mor15m | 3D-Morse | G | 3D-MoRSE, signal 15, weighted by atomic masses
10 | Mor17p | 3D-Morse | G | 3D-MoRSE, signal 17, weighted by atomic polarizabilities
11 | Mor30p | 3D-Morse | C, G | 3D-MoRSE, signal 30, weighted by atomic polarizabilities
12 | E1m | WHIM | C | 1st component accessibility directional WHIM index, weighted by atomic masses
13 | E2m | WHIM | C | 2nd component accessibility directional WHIM index, weighted by atomic masses
14 | R6m | GETAWAY | C | R autocorrelation of lag 6, weighted by atomic masses
15 | R7m | GETAWAY | C, G | R autocorrelation of lag 7, weighted by atomic masses
16 | R7m+ | GETAWAY | C | R maximal autocorrelation of lag 7, weighted by atomic masses
17 | R6e+ | GETAWAY | G | R maximal autocorrelation of lag 6, weighted by atomic Sanderson electronegativities
18 | RTe+ | GETAWAY | C | R maximal index, weighted by atomic Sanderson electronegativities

a Descriptor class: 2D-Auto: 2D autocorrelations; Funct: functional groups; ACF: atom-centered fragments; RDF: radial distribution function; 3D-Morse: 3D molecule representation of structures based on electron diffraction signals; WHIM: Weighted Holistic Invariant Molecular descriptors; GETAWAY: GEometry, Topology, and Atom-Weights AssemblY.
b Feature selection approach involved in descriptor identification, C for CP-MLR and G for GA.
c See [30] for more details.

Among the identified variables (Table 4), five descriptors (H-052, MATS4m, MATS7e, Mor30p, and R7m) are common to both the CP-MLR and GA approaches. Of these, MATS7e (Moran autocorrelation of lag 7 weighted by atomic Sanderson electronegativities) appeared in all models with a negative regression coefficient (Table 3). This indicates that a molecular topology leading to a reduced autocorrelation of lag 7 weighted by atomic electronegativities improves the activity, which in turn suggests that nonlinear and/or branched molecular topology leads to higher activity. The descriptors H-052 (with a positive regression coefficient) and Mor30p (with a negative regression coefficient) are part of all CP-MLR models and are also present in some GA models (Table 3). H-052 argues in favor of R′CH2–CHX–CH2R fragments (X is a halogen atom) in the scaffold for the activity. Mor30p is a 3D molecule representation of structure based on a specific electron diffraction signal weighted by atomic polarizability. It describes the mutual arrangement of atoms in the molecule leading to the 3D distribution of the chosen property, that is, polarizability. The negative regression coefficient of Mor30p recommends an arrangement of atoms in the molecule that leads to small descriptor values for high activity. The descriptors MATS4m (with a positive regression coefficient) and R7m (with a negative regression coefficient) appeared in selected models of both the CP-MLR and GA approaches (Table 3). The positive regression coefficient of MATS4m shows that small path lengths and branching in the molecule (lag 4 weighted by atomic mass) contribute to higher activity. R7m is also a kind of autocorrelation of lag 7 weighted by atomic mass, derived from the molecular leverage matrix. The negative regression coefficient of R7m argues that similar or almost similar atomic leverages (of lag 7) raise the activity (Table 3).

Apart from the foregoing features, RDF085p, E1m, E2m, RTe+, R6m, and R7m+ are exclusive to the models from CP-MLR, and MATS8m, MATS5e, H-047, nNR2, Mor15m, Mor17p, and R6e+ are exclusive to those from the GA approach.
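To make the MATS descriptors more concrete, a Moran autocorrelation of a given lag can be sketched as below. The normalization (average product of mean-centered atomic weights over all atom pairs at the chosen topological distance, divided by the variance of the weights) follows the standard definition of the Moran coefficient and is assumed, not verified, to match what Dragon computes. The four-atom chain and its weights are purely illustrative.

```python
def moran_autocorrelation(weights, distances, lag):
    """Moran coefficient for atom pairs separated by `lag` bonds.

    weights: one property value per atom (e.g. mass or electronegativity)
    distances: symmetric matrix of topological distances between atoms
    """
    n = len(weights)
    mean = sum(weights) / n
    dev = [w - mean for w in weights]
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)
             if distances[i][j] == lag]
    variance = sum(d * d for d in dev) / n
    if not pairs or variance == 0:
        return 0.0
    covariance = sum(dev[i] * dev[j] for i, j in pairs) / len(pairs)
    return covariance / variance

# Hypothetical linear chain of four atoms with alternating property values
weights = [1.0, 2.0, 1.0, 2.0]
distances = [[abs(i - j) for j in range(4)] for i in range(4)]
print(moran_autocorrelation(weights, distances, lag=1))  # unlike neighbours
print(moran_autocorrelation(weights, distances, lag=2))  # like atoms two bonds apart
```

Negative values arise when atoms at the chosen separation tend to carry dissimilar property values, positive values when they carry similar ones.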

The descriptor RDF085p (Table 3; (2)) measures the probability of finding molecular constituents in a spherical volume of radius 8.5 Å weighted by atomic polarizability. Its positive regression coefficient argues in favor of this for improvement in antimalarial activity. The descriptors E1m and E2m (Table 3; (3)–(5)) represent 3D molecular information of atomic densities along principal axes 1 and 2 weighted by atomic mass. The principal axes of a molecule are obtained from the eigenvalues and eigenvectors of the weighted covariance matrix of its centered Cartesian coordinates. The descriptors are derived from the projections of the atoms (of the molecule) along each individual principal axis and convey information related to molecular size, shape, symmetry, and atom distribution. In the regression equations (Table 3; (3)–(5)) E1m and E2m are associated with negative and positive coefficients, respectively. This argues in favor of an atomic arrangement that maximizes the 2nd principal axis of the molecule for high activity. Similar to R7m, the other GETAWAY class descriptors RTe+, R7m+, and R6m (Table 3; (1), (4), (5)) are associated with negative regression coefficients. While RTe+ accounts for the maximal molecular leverage autocorrelation index weighted by atomic Sanderson electronegativities, R7m+ accounts for the maximal molecular leverage autocorrelation of lag 7 weighted by atomic mass. R6m is the molecular leverage autocorrelation of lag 6 weighted by atomic mass. All these descriptors advocate similar or almost similar leverages for high activity.

Concerning the descriptors exclusive to the equations from GA, the functional group descriptor nNR2 (Table 3; (9)) accounts for the number of tertiary aliphatic amines in the molecule. Its positive regression coefficient speaks in favor of tertiary aliphatic amines for high activity. H-047 (Table 3; (7), (8)) appeared in the GA equations with a positive regression coefficient. In these analogues, it argues that unsubstituted methylenes lead to improved activity. The 2D autocorrelation descriptors MATS8m and MATS5e (Table 3; (10)) appeared with negative regression coefficients. Both descriptors favor a molecular topology leading to a reduced autocorrelation of lag 8 weighted by atomic mass and of lag 5 weighted by atomic electronegativities for improved activity. They further illustrate that nonlinear and/or branched molecular topology increases the activity. The descriptors Mor15m (Table 3; (9); positive regression coefficient) and Mor17p (Table 3; (6), (10); negative regression coefficient), similar to Mor30p, most probably show the influence of the specific distribution of atoms in the molecule on its activity. The GETAWAY class descriptor R6e+ appeared in (6) and (10) of Table 3 with a positive regression coefficient. It represents the maximal molecular leverage autocorrelation of lag 6 weighted by atomic Sanderson electronegativities. This suggests that increasing divergence in the leverage of lag 6 contributes to higher activity.

As a follow-up to the feature identification, PLS analysis was carried out on the eighteen descriptors of CP-MLR and GA and on the five descriptors common to both approaches to facilitate the development of single-window structure-activity models. For the PLS analysis, the descriptors were autoscaled (zero mean and unit s.d.) to give each of them equal weight in the study. In the cross-validation procedure of the PLS analysis [35, 36], three components were found to be the optimum to explain the activity of the compounds. The PLS model from the eighteen descriptors of CP-MLR cum GA explained 73.1% of the variance ($r^2 = 0.731$) in the antimalarial activity of the training set compounds and showed a test set $r^2$ value of 0.676. Figure 2 shows a plot of the fraction contribution of the normalized regression coefficients of these descriptors to the activity. Of the eighteen descriptors, the fraction contributions of the five descriptors common to both approaches are among the most significant ones in modulating the activity of the compounds. The PLS model from these five common descriptors of CP-MLR and GA explained 63.8% of the variance ($r^2 = 0.638$) in the antimalarial activity of the training set compounds and showed a test set $r^2$ value of 0.510. The MLR-like PLS coefficients of these two feature sets are shown in Table 5. All descriptors conveyed the same meaning as in the regression equations from CP-MLR and GA.
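The fraction contribution can be sketched as a signed normalization of the regression coefficients obtained from autoscaled data. The exact definition used in the study is not given in the text, so dividing each coefficient by the sum of the absolute coefficients is an assumption; it is at least consistent with Table 5, where the absolute fraction contributions of the five common descriptors add up to about one (0.143 + 0.169 + 0.297 + 0.212 + 0.180 = 1.001). The coefficient values below are hypothetical.

```python
def fraction_contributions(coefficients):
    """Divide each autoscaled regression coefficient by the sum of the
    absolute values of all coefficients, keeping its sign."""
    total = sum(abs(b) for b in coefficients.values())
    if total == 0:
        raise ValueError("all coefficients are zero")
    return {name: b / total for name, b in coefficients.items()}

# Hypothetical autoscaled coefficients for three descriptors
coefficients = {"A": 0.25, "B": -0.5, "C": 0.25}
fc = fraction_contributions(coefficients)
print(fc)
```

The sign of each fraction contribution then shows the direction of the descriptor's influence, and its magnitude shows the relative weight within the model.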

MLR-like coeff (f.c) (a)

S. no. | Descriptor | Combined CP-MLR and GA descriptors (b) | Common CP-MLR and GA descriptors (c)
--- | --- | --- | ---
1 | MATS4m | 12.491 (0.058) | 25.075 (0.143)
2 | MATS8m | −1.965 (−0.015) |
3 | MATS5e | −0.708 (−0.027) |
4 | MATS7e | −2.605 (−0.143) | −2.520 (−0.169)
5 | nNR2 | 0.079 (0.043) |
6 | H-047 | −0.001 (−0.003) |
7 | H-052 | 0.049 (0.109) | 0.110 (0.297)
8 | RDF085p | 0.005 (0.026) |
9 | Mor15m | 0.093 (0.018) |
10 | Mor17p | −0.518 (−0.112) |
11 | Mor30p | −1.395 (−0.151) | −1.601 (−0.212)
12 | E1m | −0.390 (−0.033) |
13 | E2m | 0.623 (0.054) |
14 | R6m | −0.819 (−0.025) |
15 | R7m | −2.268 (−0.067) | −4.961 (−0.180)
16 | R7m+ | −2.749 (−0.018) |
17 | R6e+ | 19.530 (0.059) |
18 | RTe+ | −1.080 (−0.037) |
 | (max) | 0.115 (0.374) | 0.089 (0.236)

a Coefficients of the MLR-like PLS equation in terms of descriptors for their original values; f.c is the fraction contribution of the regression coefficient, computed from the normalized regression coefficients obtained from the autoscaled (zero mean and unit s.d.) data.
b Combined descriptors of CP-MLR and GA.
c Descriptors common to CP-MLR and GA.

The predictive ability of the regression models derived from the CP-MLR, GA, and PLS approaches was assessed using applicability domain (AD) analysis. The AD plots for Eq. (1), Eq. (6), and the PLS model are shown in Figure 3. They are from the models involving all the compounds, that is, the training and test sets together. In the plots, the limits for $Y$-outliers (response outliers) were set to 2.5 standard deviation units. In the AD plot of Eq. (6) (Figure 3(b)), two test set compounds are marginally outside the allowed region. Of these two, one compound (AQ14) is a response outlier (observed residual value 1.061; limiting residual value ±0.993) and the other compound (AQ01) is a leverage outlier (observed leverage 0.366; limiting leverage value 0.36). Except for these minor deviations, the AD plots support the predictive power of the presented models. The models are also free from serious or influential outliers (Figure 3).

The models discussed so far could explain up to 73% of the variance in the activity. Some degree of nonlinearity in the relation between the activity and the structural features is among the main reasons for this. Often the biological activity landscape of chemical entities is far more nonlinear than that of their physicochemical (and other) properties. In modeling studies artificial neural networks (ANNs) have a special place in addressing such situations. In ANN, the use of descriptors from feature selection approaches is a desirable option, as they provide direction for the modification of chemical space to carry out activity modulation [49]. In view of this, the features of the selected models of CP-MLR and GA (Table 3; (1) and (6)) and the five common descriptors of CP-MLR and GA (MATS4m, MATS7e, H-052, Mor30p, and R7m) were used separately for the development of three BP-ANN models for the activity. The ANN architecture with network parameters and the predictive statistics of the resulting models are shown in Table 6. In the ANN models, these descriptors explained the antimalarial activity of the compounds well (training set $r^2 > 0.81$). They also gave satisfactory predictions for the test set compounds (test set $r^2 > 0.75$). The plots of observed versus ANN predicted activities are shown in Figure 4. In the ANN models the features of the CP-MLR, GA, and common sets carry the same meaning as discussed in the previous paragraphs. The results demonstrate that these descriptors have the ability to identify the patterns in the data and to predict the activity of potential analogues.
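The network topology reported in Table 6 (five inputs plus a bias, six hidden nodes, one output, sigmoid transfer function) can be sketched as a forward pass. This is only an illustration of the architecture with randomly initialized weights in the range −0.3 to 0.3; it performs no training. Applying the sigmoid at the output node and using a bias only at the input layer are assumptions read from the table, and all names are hypothetical.

```python
import random
from math import exp

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

def init_layer(n_inputs, n_nodes, rng, bias, limit=0.3):
    """One weight list per node, drawn uniformly from [-limit, limit]."""
    width = n_inputs + 1 if bias else n_inputs
    return [[rng.uniform(-limit, limit) for _ in range(width)]
            for _ in range(n_nodes)]

def forward(layer, inputs, bias):
    extended = list(inputs) + [1.0] if bias else list(inputs)
    return [sigmoid(sum(w * x for w, x in zip(node, extended)))
            for node in layer]

def predict(hidden, output, descriptors):
    hidden_out = forward(hidden, descriptors, bias=True)
    return forward(output, hidden_out, bias=False)[0]

rng = random.Random(0)
hidden = init_layer(5, 6, rng, bias=True)    # (5 + 1) x 6 weights
output = init_layer(6, 1, rng, bias=False)   # 6 x 1 weights
n_weights = sum(len(node) for node in hidden + output)
y = predict(hidden, output, [0.1, 0.5, 0.9, 0.3, 0.7])
print(n_weights, y)
```

With 50 training compounds, a network of this size has almost as many adjustable weights as data points, which is why the validation-based stopping rule matters.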

Table 6

BP-ANN architecture and parameters

Layer    Nodes          Training parameter       Value
Input    5 + 1 (bias)   Learning rate            0.57–0.66
Hidden   6              Momentum                 0.55–0.77
Output   1              Transfer function        Sigmoid
                        Optimization algorithm   Levenberg-Marquardt
                        Iterations               17–40

Model statistics

Feature set (a)   Squared correlation coefficient (b)   RMSEP   RSEP (%)   MAE (%)

a The ANN input features of the CP-MLR and GA sets correspond to Eqs. (1) and (6), respectively, given in Table 3. The common set comprises the features common to CP-MLR and GA.
b Squared correlation coefficient; RMSEP: root-mean-square error of prediction; RSEP: relative standard error of prediction; MAE: mean absolute error.
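The statistics named in the footnote can be computed from observed and predicted activities as sketched below. The formulas are the commonly used definitions and are an assumption here, since this section does not state them. In particular, RSEP is taken as the root of the summed squared errors relative to the summed squared observed values, and MAE is returned in the units of the activity because the percentage scaling used in the table is not specified. The function name and the sample data are hypothetical.

```python
import math


def prediction_stats(observed, predicted):
    """Return r2, RMSEP, RSEP (%), and MAE for paired activity values."""
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    sq_err = sum(e * e for e in errors)

    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)

    return {
        "r2": cov * cov / (var_o * var_p),       # squared correlation coefficient
        "rmsep": math.sqrt(sq_err / n),          # root-mean-square error of prediction
        "rsep": 100.0 * math.sqrt(sq_err / sum(o * o for o in observed)),
        "mae": sum(abs(e) for e in errors) / n,  # mean absolute error
    }


# Hypothetical observed and predicted activities, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
stats = prediction_stats(observed, predicted)
for name, value in stats.items():
    print(name, round(value, 4))
```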

4. Conclusions

The antimalarial activity of a series of anilinoquinolines was modeled with the feature selection approaches CP-MLR and GA. This led to the identification of eighteen descriptors that model the activity of the compounds. Five of them (H-052, MATS4m, MATS7e, Mor30p, and R7m) are common to both the CP-MLR and GA approaches. To develop a single-window structure-activity model, all eighteen features were analyzed in PLS. In the PLS analysis, the descriptors common to CP-MLR and GA were among the most influential in modulating the activity of the anilinoquinolines. In the regression as well as the PLS models, the negative coefficient of MATS7e indicates that nonlinear and/or branched molecular topology leads to higher activity. H-052 represents the hydrogen(s) attached to an sp3 carbon adjacent to a halogen-bearing carbon, and its regression coefficient favors such fragments in the substituents for higher activity. In BP-ANN, the descriptors from the selected equations of both feature selection approaches and the five most significant descriptors of the PLS analysis (MATS4m, MATS7e, H-052, Mor30p, and R7m) explained more than 81% of the variance in the antimalarial activity of the training set compounds and gave a test set value greater than 0.75. These results offer direction for understanding the patterns of the antimalarial activity of anilinoquinolines and may serve to predict the activity of potential prototype compounds. The values of the eighteen descriptors involved in the regression equations are provided as supplementary material to facilitate likely structural exploration (see Supplementary Materials).


This work is supported by CDRI Communication no. 8310.

Supplementary Materials

Supplementary Data: Molecular indices involved in derived models



References

  1. WHO report on communicable diseases (World Malaria Report 2011), June 2012,
  2. C. V. Plowe, “Antimalarial drug resistance in Africa: strategies for monitoring and deterrence,” Current Topics in Microbiology and Immunology, vol. 295, pp. 55–79, 2005.
  3. A. C. Uhlemann and S. Krishna, “Antimalarial multi-drug resistance in Asia: mechanisms and assessment,” Current Topics in Microbiology and Immunology, vol. 295, pp. 39–53, 2005.
  4. M. Foley and L. Tilley, “Quinoline antimalarials: mechanisms of action and resistance,” International Journal for Parasitology, vol. 27, no. 2, pp. 231–240, 1997.
  5. S. Leecharoen, S. Wangkaew, and W. Louthrenoo, “Ocular side effects of chloroquine in patients with rheumatoid arthritis, systemic lupus erythematosus and scleroderma,” Journal of The Medical Association of Thailand, vol. 90, no. 1, pp. 52–58, 2007.
  6. A. C. Chou and C. D. Fitch, “Control of heme polymerase by chloroquine and other quinoline derivatives,” Biochemical and Biophysical Research Communications, vol. 195, no. 1, pp. 422–427, 1993.
  7. A. F. G. Slater, “Chloroquine: mechanism of drug action and resistance in Plasmodium falciparum,” Pharmacology & Therapeutics, vol. 57, no. 2-3, pp. 203–235, 1993.
  8. C. Biot, G. Glorian, L. A. Maciejewski et al., “Synthesis and antimalarial activity in vitro and in vivo of a new ferrocene-chloroquine analogue,” Journal of Medicinal Chemistry, vol. 40, no. 23, pp. 3715–3718, 1997.
  9. D. De, F. M. Krogstad, L. D. Byers, and D. Krogstad, “Structure-activity relationships for antiplasmodial activity among 7-substituted 4-aminoquinolines,” Journal of Medicinal Chemistry, vol. 41, no. 2, pp. 4918–4926, 1998.
  10. P. A. Stocks, K. J. Raynes, P. G. Bray, B. K. Park, P. M. O'Neill, and S. A. Ward, “Novel short chain chloroquine analogues retain activity against chloroquine resistant K1 Plasmodium falciparum,” Journal of Medicinal Chemistry, vol. 45, no. 23, pp. 4975–4983, 2002.
  11. L. M. Werbel, P. D. Cook, E. F. Elslager, J. H. Hung, J. L. Johnson, S. J. Kesten et al., “Antimalarial drugs. 60. Synthesis, antimalarial activity, and quantitative structure-activity relationships of tebuquine and a series of related 5-[(7-chloro-4-quinolinyl)amino]-3-[(alkylamino)methyl][1,1-biphenyl]-2-ols and,” Journal of Medicinal Chemistry, vol. 29, no. 6, pp. 924–939, 1986.
  12. H. L. Koh, M. L. Go, T. L. Ngiam, and J. W. Mak, “Conformational and structural features determining in vitro antimalarial activity in some indolo[3,2-c]quinolines, anilinoquinolines and tetrahydroindolo[3,2-d]benzazepines,” European Journal of Medicinal Chemistry, vol. 29, pp. 107–113, 1994.
  13. P. M. O'Neill, D. J. Willock, S. R. Hawley et al., “Synthesis, antimalarial activity, and molecular modeling of tebuquine analogues,” Journal of Medicinal Chemistry, vol. 40, no. 4, pp. 437–448, 1997.
  14. S. R. Hawley, P. G. Bray, M. Mungthin, J. D. Atkinson, P. M. O'Neill, and S. A. Ward, “Relationship between antimalarial drug activity, accumulation, and inhibition of heme polymerization in Plasmodium falciparum in vitro,” Antimicrobial Agents and Chemotherapy, vol. 42, no. 3, pp. 682–686, 1998.
  15. P. M. O'Neill, P. G. Bray, S. R. Hawley, S. A. Ward, and B. K. Park, “4-Aminoquinolines—past, present, and future; a chemical perspective,” Pharmacology & Therapeutics, vol. 77, no. 1, pp. 29–58, 1998.
  16. T. J. Egan, “Structure-function relationships in chloroquine and related 4-aminoquinoline antimalarials,” Mini Reviews in Medicinal Chemistry, vol. 1, no. 1, pp. 113–123, 2001.
  17. S. R. Meshnick and A. P. Alker, “Amodiaquine and combination chemotherapy for malaria,” American Journal of Tropical Medicine and Hygiene, vol. 73, no. 5, pp. 821–823, 2005.
  18. K. H. Rieckmann, “Determination of the drug sensitivity of Plasmodium falciparum,” Journal of the American Medical Association, vol. 217, no. 5, pp. 573–578, 1971.
  19. C. S. R. Hatton, C. Bunch, T. E. A. Peto et al., “Frequency of severe neutropenia associated with amodiaquine prophylaxis against malaria,” The Lancet, vol. 327, no. 8478, pp. 411–414, 1986.
  20. K. A. Neftel, W. Woodtly, M. Schmid, P. G. Frick, and J. Fehr, “Amodiaquine induced agranulocytosis and liver damage,” British Medical Journal, vol. 292, no. 6522, pp. 721–723, 1986.
  21. J. L. Maggs, N. R. Kitteringham, and B. K. Park, “Drug-protein conjugates—XIV: mechanisms of formation of protein-arylating intermediates from amodiaquine, a myelotoxin and hepatotoxin in man,” Biochemical Pharmacology, vol. 37, no. 2, pp. 303–311, 1988.
  22. A. C. Harrison, N. R. Kitteringham, J. B. Clarke, and B. K. Park, “The mechanism of bioactivation and antigen formation of amodiaquine in the rat,” Biochemical Pharmacology, vol. 43, no. 7, pp. 1421–1430, 1992.
  23. S. Delarue, S. Girault, L. Maes et al., “Synthesis and in vitro and in vivo antimalarial activity of new 4-anilinoquinolines,” Journal of Medicinal Chemistry, vol. 44, no. 17, pp. 2827–2833, 2001.
  24. S. Delarue-Cochin, E. Paunescu, L. Maes et al., “Synthesis and antimalarial activity of new analogues of amodiaquine,” European Journal of Medicinal Chemistry, vol. 43, no. 2, pp. 252–260, 2008.
  25. E. Davioud-Charvet, S. Delarue, C. Biot et al., “A prodrug form of a Plasmodium falciparum glutathione reductase inhibitor conjugated with a 4-anilinoquinoline,” Journal of Medicinal Chemistry, vol. 44, no. 24, pp. 4268–4276, 2001.
  26. S. Delarue-Cochin, P. Grellier, L. Maes, E. Mouray, C. Sergheraert, and P. Melnyk, “Synthesis and antimalarial activity of carbamate and amide derivatives of 4-anilinoquinoline,” European Journal of Medicinal Chemistry, vol. 43, no. 10, pp. 2045–2055, 2008.
  27. M. K. Gupta and Y. S. Prabhakar, “Topological descriptors in modeling the antimalarial activity of 4-(3,5-disubstituted anilino)quinolines,” Journal of Chemical Information and Modeling, vol. 46, no. 1, pp. 93–102, 2006.
  28. A. Semeniuk, A. Niedospial, J. Kalinowska-Tluscik, W. Nitek, and B. J. Oleksyn, “Molecular geometry of antimalarial amodiaquine in different crystalline environments,” Journal of Molecular Structure, vol. 875, no. 1–3, pp. 32–41, 2008.
  29. SYBYL, version 7.3, Tripos associates, St. Louis, Mo, USA, 2006.
  30. R. Todeschini, V. Consonni, A. Mauri, and M. Pavan, DRAGON software version 5. 0-2005, Milano, Italy,
  31. M. Waldrop, “Unique writing course for student engineers,” Chemical & Engineering News, vol. 57, no. 3, pp. 28–29, 1979.
  32. MOE: The Molecular Operating Environment from Chemical Computing Group Inc., 1255 University Street, Suite 1600, Montreal, Quebec, Canada H3B 3X3.
  33. Y. S. Prabhakar, “A combinatorial approach to the variable selection in multiple linear regression: analysis of selwood et al. Data set—a case study,” QSAR & Combinatorial Science, vol. 22, no. 6, pp. 583–595, 2003.
  34. D. R. Rogers and A. J. Hopfinger, “Application of genetic function approximation to quantitative structure-activity relationships and quantitative structure-property relationships,” Journal of Chemical Information and Computer Sciences, vol. 34, no. 4, pp. 854–866, 1994.
  35. S. Wold, “Cross-validatory estimation of the number of components in factor and principal components models,” Technometrics, vol. 20, no. 4, pp. 397–405, 1978.
  36. L. Stahle and S. Wold, “Multivariate data analysis and experimental design in biomedical research,” in Progress in Medicinal Chemistry, G. P. Eillis and W. B. West, Eds., vol. 25, pp. 291–338, Elsevier Science, Amsterdam, The Netherlands, 1988.
  37. D. Graupe, in Principles of Artificial Neural Networks, pp. 59–111, World Scientific, Singapore, 2nd edition, 2007.
  38. S. Deshpande, V. R. Solomon, S. B. Katti, and Y. S. Prabhakar, “Topological descriptors in modelling antimalarial activity: N1-(7-chloro-4-quinolyl)-1,4-bis(3-aminopropyl)piperazine as prototype,” Journal of Enzyme Inhibition and Medicinal Chemistry, vol. 24, no. 1, pp. 94–104, 2009.
  39. S. Deshpande, R. Singh, M. Goodarzi, S. B. Katti, and Y. S. Prabhakar, “Consensus features of CP-MLR and GA in modeling HIV-1 RT inhibitory activity of 4-benzyl/benzoylpyridin-2-one analogues,” Journal of Enzyme Inhibition and Medicinal Chemistry, vol. 26, no. 5, pp. 696–705, 2011.
  40. R. Todeschini, V. Consonni, and M. Pavan, MOBYDIGS software, Version 1.2 for Windows, Talete Srl, Milan, Italy, 2002,
  41. M. Pavan, A. Mauri, and R. Todeschini, “Total ranking models by the genetic algorithm variable subset selection (GA–VSS) approach for environmental priority settings,” Analytical and Bioanalytical Chemistry, vol. 380, no. 3, pp. 430–444, 2004.
  42. S.-S. So and M. Karplus, “Three-dimensional quantitative structure-activity relationships from molecular similarity matrices and genetic neural networks. 1. Method and validations,” Journal of Medicinal Chemistry, vol. 40, no. 26, pp. 4347–4359, 1997.
  43. Y. S. Prabhakar, V. R. Solomon, R. K. Rawal, M. K. Gupta, and S. B. Katti, “CP-MLR/PLS directed structure-activity modeling of the HIV-1 RT inhibitory activity of 2,3-diaryl-1,3-thiazolidin-4-ones,” QSAR & Combinatorial Science, vol. 23, no. 4, pp. 234–244, 2004.
  44. P. Gramatica, “Principles of QSAR models validation: internal and external,” QSAR & Combinatorial Science, vol. 26, no. 5, pp. 694–701, 2007.
  45. D. W. Marquardt, “An algorithm for least-squares estimation of nonlinear parameters,” Journal of the Society for Industrial and Applied Mathematics, vol. 11, no. 2, pp. 431–441, 1963.
  46. M. T. Hagan and M. B. Menhaj, “Training feedforward networks with the Marquardt algorithm,” IEEE Transactions on Neural Networks, vol. 5, no. 6, pp. 989–993, 1994.
  47. L. Sachs, Applied Statistics: A Handbook of Techniques, Springer, Berlin, Germany, 1982.
  48. MATLAB, Version 7.6,
  49. M. Goodarzi, S. Deshpande, V. Murugesan, S. B. Katti, and Y. S. Prabhakar, “Is feature selection essential for ANN modeling?” QSAR & Combinatorial Science, vol. 28, no. 11-12, pp. 1487–1499, 2009.

Copyright © 2013 Shreekant Deshpande et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
