ISRN Analytical Chemistry
Volume 2013, Article ID 151464, 8 pages
Research Article

Linear and Nonlinear QSAR Study of N2 and O6 Substituted Guanine Derivatives as Cyclin-Dependent Kinase 2 Inhibitors

Faculty of Chemistry, Shahrood University of Technology, P.O. Box 316, Shahrood 3619995161, Iran

Received 11 April 2013; Accepted 23 May 2013

Academic Editors: J. N. Latosinska and C. Y. Panicker

Copyright © 2013 Nasser Goudarzi et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


The inhibitory activities (pIC50) of N2 and O6 substituted guanine derivatives as cyclin-dependent kinase 2 (CDK2) inhibitors were modeled using calculated molecular descriptors. A linear method (MLR) and a nonlinear method (ANN) were used to construct models for predicting the pIC50 values of these compounds. The QSAR models were validated by leave-one-out cross-validation, by predicting the pIC50 values of an external set of compounds, by calculating statistical parameters, and by a Y-randomization test. Both methods provided accurate predictions, although the ANN model was more accurate. The mean squared errors (MSEs) for the validation and test sets are 0.065 and 0.069 for MLR and 0.017 and 0.063 for ANN, respectively.

1. Introduction

The cyclin-dependent kinases (CDKs) are a class of enzymes that play a fundamental role in cell cycle regulation [1, 2]. As their name suggests, CDK activation depends in part on the binding of another class of proteins, the cyclins: for example, cyclins of the D family complex with CDK4 and CDK6 during G1 phase, cyclin E with CDK2 in late G1, cyclin A with CDK2 in S phase, and cyclin B with CDK1 (also known as cdc2) in late G2/M. Aberrant CDK control and the consequent loss of cell cycle checkpoint function have been directly linked to the molecular pathology of cancer [3]. Phosphorylation of a conserved threonine residue of the CDK subunit is required for complete activation; this task is performed by the CDK-activating kinase. These proteins properly regulate cell cycle progression and DNA synthesis only as an active complex (T160pCDK/cyclin) [4]. The activity of the CDK/cyclin complex can be reduced by at least two different mechanisms: phosphorylation of the CDK subunit at inhibitory sites, or binding of specialized natural inhibitors known as CDK inhibitors. In the first mechanism, the amino acid residue Y15 and, to a lesser extent, T14 (in CDK2) are phosphorylated by human Wee1Hu [5]. This inhibitory phosphorylation is independent of previous cyclin binding [6]. The second mechanism involves the binding of natural CDK inhibitors. Four major mammalian CDK inhibitors have been discovered: P21 (CIP1/WAF190) and P27 (KIP1) inactivate CDK2 and CDK4 cyclin complexes by binding to them. The two other inhibitors are specific for CDK4 and CDK6. They inhibit the formation of the active cyclin complexes by binding to the inactive CDK, and they can also bind to the active complex [2, 7].

However, it has been shown that natural CDK inhibitors are underexpressed in some carcinogenic cells, and medicinal chemists have put some of their effort into the search for new synthetic inhibitors to replace them [8–12]. Some of these have entered the clinical field; for instance, flavopiridol induces cell cycle arrest and tumor growth inhibition [13].

The search for more potent and selective CDK inhibitors is a daunting challenge because the ATP binding site is highly similar across the different CDK subtypes [14]. New strategies to overcome this problem are therefore urgently needed. The strategies that have emerged to find more potent and selective inhibitors normally use quantitative structure-activity relationships (QSARs) derived from different computational approaches [15–19].

Quantitative structure-activity/property relationships (QSAR/QSPR) correlate activities and properties with characteristics of molecular structure. In recent years, several QSAR and QSPR models based on both linear and nonlinear methods have been used to predict different activities and properties [20–30]. Reliable prediction of CDK2 inhibition has an important role in medicinal research. The ultimate role of the different formulations of QSAR theory is to suggest mathematical models for estimating relevant endpoints of interest, especially when these cannot be determined experimentally. These studies rely on the assumption that the structure of a compound determines its activity. The molecular structure is therefore translated into so-called molecular descriptors through mathematical formulae obtained from several theories, such as chemical graph theory, information theory, and quantum mechanics [31, 32]. In this work, we carried out a QSAR study to predict the inhibitory activity of a set of N2 and O6 substituted guanine derivatives using multiple linear regression (MLR) and an artificial neural network (ANN), which capture linear and nonlinear relationships, respectively.

2. Basic Theory

2.1. Multiple Linear Regression (MLR)

The general goal of multiple linear regression (MLR) is to model the relationship between several independent variables and a dependent variable by fitting a linear equation to observed data. The general multiple linear regression model is

$$y = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_k x_k,$$

where $k$ is the number of independent variables $x_1, \ldots, x_k$, $b_0, \ldots, b_k$ are the regression coefficients, and $y$ is the dependent variable [33].
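As a minimal sketch of such a fit, the regression coefficients can be estimated by ordinary least squares. The function names `fit_mlr` and `predict_mlr` and the toy data are illustrative assumptions; the original study performed the regression in SPSS.

```python
import numpy as np

def fit_mlr(X, y):
    """Fit y = b0 + b1*x1 + ... + bk*xk by ordinary least squares.

    X has shape (n_samples, k) and y has shape (n_samples,).
    Returns the coefficient vector [b0, b1, ..., bk].
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so that b0 is estimated as the intercept.
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_mlr(coef, X):
    """Evaluate the fitted linear model on new descriptor rows."""
    X = np.asarray(X, dtype=float)
    return coef[0] + X @ coef[1:]

# Toy data (hypothetical): generated exactly from y = 1 + 2*x1 - 3*x2.
X_toy = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
y_toy = 1.0 + 2.0 * X_toy[:, 0] - 3.0 * X_toy[:, 1]
coef_toy = fit_mlr(X_toy, y_toy)
```

Because the toy data are noise-free, the fit recovers the generating coefficients up to floating-point error.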

2.2. Artificial Neural Network (ANN)

ANNs consist of large numbers of computational units connected in a massively parallel structure. Neural networks do not need an explicit formulation of the problem being handled. The input layer serves to introduce scaled data to the network. The data from the input layer (input neurons) propagate through the network via interconnections, and a scalar weight is assigned to each connection. A remarkable aspect of neural networks is their learning step, in which the values of the weights and biases are optimized based on a set of measured numerical values (the training set). More details about neural networks are given in [34, 35].
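The propagation of data through weighted connections can be sketched as a forward pass through a network with one hidden layer. The layer sizes, the random weights, and the linear output neuron are assumptions chosen for illustration only.

```python
import numpy as np

def logsig(z):
    """Log-sigmoid transfer function, mapping any real input into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, b1, W2, b2):
    """Propagate one input vector through a single-hidden-layer network."""
    hidden = logsig(W1 @ x + b1)  # weighted sum plus bias, then transfer function
    return W2 @ hidden + b2       # linear output neuron

# Hypothetical sizes: 3 inputs, 2 hidden neurons, 1 output.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 3)), np.zeros(2)
W2, b2 = rng.normal(size=(1, 2)), np.zeros(1)
y_out = forward(np.array([0.2, -0.1, 0.5]), W1, b1, W2, b2)
```

Training then amounts to adjusting `W1`, `b1`, `W2`, and `b2` so that the outputs match the measured values of the training set.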

3. Materials and Methods

3.1. Data

The experimental data used in the present study to model pIC50 were taken from [36]. The data set comprises 56 compounds whose biological activities (pIC50 values) were determined for inhibition of CDK2. The pIC50 values span a broad range, from 4.11 to 7.22. In this work, the structure-activity models were generated by ANN and MLR. The names of the compounds, their experimental pIC50 values, the values calculated by the ANN and MLR methods, and the corresponding leave-one-out values are shown in Table 1. The data set was split into training, validation, and test sets to improve the network’s generalization. The training set of 34 compounds, with pIC50 values ranging from 4.11 to 7.22, was used to construct the model. The validation set of 11 compounds, with pIC50 values ranging from 4.19 to 6.62, was used to prevent overtraining (overfitting) of the ANN model. The test set of 11 compounds, with pIC50 values ranging from 4.19 to 6.96, was used as an external set to evaluate the predictive ability of the model.
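The sketch below illustrates two points from this subsection: pIC50 is the negative decimal logarithm of IC50 expressed in mol/L, and the 56 compounds are divided into sets of 34, 11, and 11. The text does not state how compounds were assigned to each set, so the seeded random shuffle and the function names here are assumptions.

```python
import math
import random

def pic50(ic50_molar):
    """Convert an IC50 given in mol/L to pIC50 = -log10(IC50)."""
    return -math.log10(ic50_molar)

def split_indices(n_total, n_train, n_val, seed=0):
    """Shuffle compound indices and cut them into train/validation/test."""
    indices = list(range(n_total))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]      # whatever remains is the test set
    return train, val, test

train_idx, val_idx, test_idx = split_indices(56, 34, 11)
```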

Table 1: The names of the compounds, their experimental pIC50 values, the values calculated by the ANN and MLR methods, and the values calculated using leave-one-out.
3.2. Descriptor Generation and Screening

The inhibitory activity of the compounds is related to some of their structural, electronic, and geometric properties. These properties can be encoded quantitatively as numerical values called molecular descriptors, which are used to search for the best QSAR model of the inhibitory activity. The 2D structures of the molecules were drawn using HyperChem 8 software [37]. The molecular structures were optimized using the Polak-Ribiere algorithm until the root-mean-square gradient reached 0.001 kcal mol−1. The resulting geometries were transferred into the Dragon program package, which produced 1481 descriptors [38]. These descriptors were then passed to SPSS 17 for statistical analysis [39]. In a first preselection step, descriptors with zero or other constant or near-constant values were removed because they carry too little structural information. To decrease the redundancy in the descriptor data matrix, the pairwise Pearson correlation coefficients of the descriptors were then examined, and collinear descriptors (those exceeding a correlation threshold) were removed. After this step, 238 descriptors remained. Stepwise regression then yielded 14 models. Considering several statistical parameters, including the standard error (SE), model number 14, which contains 10 descriptors, was chosen as the MLR model for predicting pIC50.

These descriptors are 3D-MoRSE signal 20/unweighted (Mor20u), Geary autocorrelation of lag 2/weighted by atomic Sanderson electronegativities (GATS2e), R autocorrelation of lag 5/weighted by atomic polarizabilities (R5p), Moran autocorrelation of lag 2/weighted by atomic Sanderson electronegativities (MATS2e), mean topological charge index of order 6 (JGI6), mean topological charge index of order 4 (JGI4), R autocorrelation of lag 1/weighted by atomic masses (R1m), leverage-weighted autocorrelation of lag 0/weighted by atomic masses (HATS0m), 3D-MoRSE signal 06/weighted by atomic van der Waals volumes (Mor06v), and 1st component accessibility directional WHIM index/weighted by atomic electrotopological states (E1s). The name, class, and meaning of these descriptors are shown in Table 2.
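The two preselection steps (dropping near-constant descriptors, then dropping collinear ones) can be sketched as follows. The variance tolerance `var_tol`, the correlation cutoff `r_max = 0.9`, and the greedy keep-first strategy are assumptions made for illustration; the cutoff actually used in the study is not recoverable from the text.

```python
import numpy as np

def screen_descriptors(X, names, var_tol=1e-8, r_max=0.9):
    """Drop near-constant columns, then one of each highly correlated pair.

    var_tol and r_max are illustrative assumptions, not the study's values.
    """
    X = np.asarray(X, dtype=float)
    # Step 1: remove constant or near-constant descriptors.
    keep = [j for j in range(X.shape[1]) if X[:, j].var() > var_tol]
    # Step 2: greedily keep a descriptor only if its absolute Pearson
    # correlation with every descriptor already kept is below r_max.
    selected = []
    for j in keep:
        if all(abs(np.corrcoef(X[:, j], X[:, i])[0, 1]) < r_max for i in selected):
            selected.append(j)
    return X[:, selected], [names[j] for j in selected]

# Hypothetical descriptor matrix: "b" duplicates "a" linearly, "c" is constant.
a = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
d = np.array([5.0, 1.0, 4.0, 2.0, 3.0])
X_demo = np.column_stack([a, 2.0 * a + 1.0, np.full(5, 7.0), d])
X_kept, names_kept = screen_descriptors(X_demo, ["a", "b", "c", "d"])
```

In this toy matrix only `a` and `d` survive: `c` fails the variance check and `b` is perfectly correlated with `a`.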

Table 2: Descriptors selected for construction of the model.

The correlation matrix for the 10 descriptors selected for the model is shown in Table 3. These results show that there is no significant correlation between the selected descriptors.

Table 3: Correlation matrix for the selected descriptors.

4. Results and Discussion

The prediction ability of QSAR/QSPR models is affected by two factors: one is the descriptors, which should carry enough information of molecular structure for the interpretation of the activity/property. The other is the modeling method employed [20]. The number of descriptors available for QSAR/QSPR studies is often so large that it is difficult to obtain a model including all of them. Therefore, identifying important descriptors certainly plays an important role in QSAR/QSPR.

Descriptors should capture as much of the variation in activity as possible, and collinearity among them must be kept to a minimum. As can be seen from the correlation matrix (Table 3), there is no significant correlation between the selected descriptors. In the present work, these descriptors were used to construct both the linear and the nonlinear models. The following linear model was obtained from the training set compounds and the 10 selected molecular descriptors:

pIC50 = 11.746 + 0.684 Mor20u − 2.228 GATS2e − 18.175 R5p − 5.114 MATS2e + 65.07 JGI6 − 40.405 JGI4 + 2.155 R1m − 4.434 HATS0m − 0.149 Mor06v − 0.715 E1s.
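For readers who want to apply the linear model, the equation can be written as a small function. The intercept and coefficients are copied from the equation above; the function name and the dictionary-based input format are assumptions for illustration.

```python
# Intercept and coefficients copied from the linear model above.
INTERCEPT = 11.746
COEFFICIENTS = {
    "Mor20u": 0.684, "GATS2e": -2.228, "R5p": -18.175, "MATS2e": -5.114,
    "JGI6": 65.07, "JGI4": -40.405, "R1m": 2.155, "HATS0m": -4.434,
    "Mor06v": -0.149, "E1s": -0.715,
}

def predict_pic50(descriptors):
    """Evaluate the MLR equation for one compound.

    descriptors maps each of the 10 descriptor names to its numeric value.
    """
    missing = set(COEFFICIENTS) - set(descriptors)
    if missing:
        raise KeyError(f"missing descriptors: {sorted(missing)}")
    return INTERCEPT + sum(c * descriptors[name] for name, c in COEFFICIENTS.items())
```

The descriptor values themselves must come from a descriptor package such as Dragon; none are reproduced here.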

This model was then used to predict the validation and test sets. Next, an artificial neural network (ANN) was used to build a nonlinear model for calculating the inhibitory activities (pIC50) of the compounds. A 3-layer feedforward network trained by backpropagation was used, with the mean squared error (MSE) as the performance function. The descriptors selected by MLR were used as the input layer of the network. To obtain a robust network, 5 parameters were optimized: (1) the number of descriptors (between 2 and 10), (2) the number of nodes in the hidden layer, (3) the transfer function (log sigmoid or tan sigmoid), (4) the training function (Bayesian regularization (trainbr) or Levenberg-Marquardt (trainlm)), and (5) the number of epochs. Table 4 shows the training settings of the optimized network. To avoid overfitting, training of the network for the prediction of pIC50 was stopped when the MSE of the validation set started to increase.
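The validation-based stopping rule can be sketched with a small network trained by plain gradient descent. This is not the Levenberg-Marquardt procedure of the study: gradient descent is swapped in to keep the example short, and the hidden-layer size, learning rate, `patience` window, and synthetic data are all assumptions.

```python
import numpy as np

def logsig(z):
    return 1.0 / (1.0 + np.exp(-z))

def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))

def train_early_stopping(X_tr, y_tr, X_val, y_val, n_hidden=3,
                         lr=0.05, max_epochs=2000, patience=20, seed=0):
    """Train a one-hidden-layer network; stop when validation MSE stalls."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=0.5, size=(X_tr.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.5, size=n_hidden)
    b2 = 0.0
    n = len(y_tr)
    best = (np.inf, None, 0)  # (validation MSE, parameters, epoch)
    for epoch in range(max_epochs):
        # Forward pass on the training set.
        H = logsig(X_tr @ W1 + b1)
        err = H @ W2 + b2 - y_tr
        # Backpropagate the gradient of the training MSE.
        dH = np.outer(err, W2) * H * (1.0 - H)
        gW2 = 2.0 * H.T @ err / n
        gb2 = 2.0 * err.mean()
        gW1 = 2.0 * X_tr.T @ dH / n
        gb1 = 2.0 * dH.mean(axis=0)
        W1 -= lr * gW1
        b1 -= lr * gb1
        W2 -= lr * gW2
        b2 -= lr * gb2
        # Monitor the validation MSE and remember the best parameters so far.
        val_mse = mse(y_val, logsig(X_val @ W1 + b1) @ W2 + b2)
        if val_mse < best[0]:
            best = (val_mse, (W1.copy(), b1.copy(), W2.copy(), b2), epoch)
        elif epoch - best[2] >= patience:
            break  # validation error has stopped improving
    return best

# Synthetic data (hypothetical): 34 training and 11 validation samples.
rng_demo = np.random.default_rng(1)
X_demo = rng_demo.normal(size=(45, 2))
y_demo = np.tanh(X_demo[:, 0]) + 0.5 * X_demo[:, 1]
best_mse, best_params, best_epoch = train_early_stopping(
    X_demo[:34], y_demo[:34], X_demo[34:], y_demo[34:])
```

The `patience` window is a common relaxation of the rule described in the text: rather than stopping at the first increase, training continues for a few epochs in case the validation error recovers, and the best parameters seen are the ones returned.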

Table 4: The training settings for the ANN model.

According to Table 4, the network with the Levenberg-Marquardt training function, the log-sigmoid transfer function, and 10 descriptors (the same descriptors as in the MLR model) has the lowest MSE (0.0171). To evaluate and compare the predictive ability of the linear and nonlinear models, we used the mean absolute error (MAE), mean squared error (MSE), predictive residual sum of squares (PRESS), standard error of prediction (SEP), determination coefficient ($R^2$), percentage relative error of prediction (REP (%)), and relative mean absolute error (RMAE). These statistical parameters for MLR and ANN are listed in Table 5.

Table 5: Comparison of the statistical parameters obtained from ANN and MLR models.

As can be seen from Table 5, all the error parameters’ values of ANN for both test and validation sets are smaller than those of MLR. This is believed to be due to the nonlinear capabilities of the ANN model.

In the definitions of these statistical parameters, $y_i$ denotes the experimental value, $\hat{y}_i$ the predicted value, $\bar{y}$ the mean value, and $n$ the number of compounds.
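The exact formulas used in the study are not recoverable from the text, so the sketch below adopts commonly used definitions as assumptions: MAE and MSE are means of absolute and squared residuals, PRESS is the sum of squared residuals, SEP is the square root of PRESS/n (some authors divide by n − 1), the determination coefficient is 1 − PRESS divided by the total sum of squares, and REP (%) is SEP relative to the mean experimental value. RMAE is omitted because its form cannot be inferred.

```python
import math

def regression_metrics(y_exp, y_pred):
    """Compute error statistics under the assumed definitions stated above."""
    n = len(y_exp)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    residuals = [e - p for e, p in zip(y_exp, y_pred)]
    y_mean = sum(y_exp) / n
    press = sum(r * r for r in residuals)              # sum of squared residuals
    ss_total = sum((e - y_mean) ** 2 for e in y_exp)   # must be non-zero
    mse = press / n
    return {
        "MAE": sum(abs(r) for r in residuals) / n,
        "MSE": mse,
        "PRESS": press,
        "SEP": math.sqrt(mse),
        "R2": 1.0 - press / ss_total,
        "REP%": 100.0 / y_mean * math.sqrt(mse),
    }
```

For example, `regression_metrics(y_exp, y_pred)` can be called once on the validation set and once on the test set to produce the two columns of a comparison such as Table 5.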

To rule out chance correlation and to check the predictive power of the network, a Y-randomization test was carried out. The results of several repetitions of this test are shown in Table 6. The low $R^2$ values show that the developed model is not the result of chance correlation.
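The idea of Y-randomization can be sketched as follows: the activity values are shuffled, the model is refitted, and the resulting determination coefficient is compared with that of the model built on the true values. For brevity this sketch refits a least-squares linear model and reports the fit on the shuffled data itself, whereas the study applied the test to the network and reported test-set values; the synthetic data and function names are assumptions.

```python
import numpy as np

def r_squared(X, y):
    """Determination coefficient of a least-squares linear fit of y on X."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    residual = y - A @ coef
    return 1.0 - residual @ residual / np.sum((y - y.mean()) ** 2)

def y_randomization(X, y, n_rounds=20, seed=0):
    """Refit the model on randomly permuted activities, n_rounds times."""
    rng = np.random.default_rng(seed)
    return [r_squared(X, rng.permutation(y)) for _ in range(n_rounds)]

# Synthetic data (hypothetical) with a genuine linear relationship.
rng_demo = np.random.default_rng(42)
X_demo = rng_demo.normal(size=(40, 2))
y_demo = 3.0 + 2.0 * X_demo[:, 0] - X_demo[:, 1] + 0.1 * rng_demo.normal(size=40)
r2_true = r_squared(X_demo, y_demo)
r2_random = y_randomization(X_demo, y_demo)
```

A model that reflects a real structure-activity relationship should give a much higher value on the true activities than on any of the shuffled ones.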

Table 6: $R^2$ values of the test set after several Y-randomization tests.

Figures 1 and 2 show plots of the predicted versus experimental values of the ANN and MLR models for the validation and test sets, respectively. The results show that the ANN model is superior to the MLR model in predicting the pIC50 of these compounds. The leave-one-out residuals of the ANN and MLR models are plotted against the experimental values in Figure 3. The symmetric distribution of the residuals on both sides of the zero line indicates that no systematic error exists in the MLR and ANN models.

Figure 1: Plot of the predicted values versus the experimental ones for the validation set.
Figure 2: Plot of the predicted values versus the experimental ones for the test set.
Figure 3: Plot of the ANN and MLR residuals of leave-one-out versus experimental values.
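Leave-one-out residuals of the kind plotted in Figure 3 can be obtained with a loop that refits the model once per compound. The sketch below does this for a least-squares linear model only; the function name and the synthetic data are assumptions.

```python
import numpy as np

def loo_predictions(X, y):
    """Predict each sample from a linear model fitted on all other samples."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(X)), X])  # intercept column plus descriptors
    preds = np.empty(len(y))
    for i in range(len(y)):
        mask = np.arange(len(y)) != i          # leave sample i out
        coef, *_ = np.linalg.lstsq(A[mask], y[mask], rcond=None)
        preds[i] = A[i] @ coef                 # predict the held-out sample
    return preds

# Synthetic noise-free data (hypothetical): residuals should be about zero.
rng_demo = np.random.default_rng(7)
X_demo = rng_demo.normal(size=(12, 2))
y_demo = 5.0 + 1.5 * X_demo[:, 0] - 0.5 * X_demo[:, 1]
loo_residuals = y_demo - loo_predictions(X_demo, y_demo)
```

Plotting `loo_residuals` against the experimental values and checking for scatter on both sides of zero mirrors the diagnostic described above.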

5. Conclusion

From the analysis of the results, we can conclude the following: (1) the proposed models adequately represent the structure-activity relationship of the compounds; (2) comparison of the MLR and ANN results shows that the ANN model performs better than MLR for both the validation and test sets, which indicates that a nonlinear model can describe the relationship between the structures of the compounds and their activities more accurately; and (3) the calculated statistical parameters of these models confirm the superiority of the ANN model over the MLR model.


References

  1. C. Norbury and P. Nurse, “Animal cell cycles and their control,” Annual Review of Biochemistry, vol. 61, pp. 441–470, 1992.
  2. D. O. Morgan, “Principles of CDK regulation,” Nature, vol. 374, no. 6518, pp. 131–134, 1995.
  3. M. Hall and G. Peters, “Genetic alterations of cyclins, cyclin-dependent kinases, and Cdk inhibitors in human cancer,” Advances in Cancer Research, vol. 68, pp. 67–108, 1996.
  4. T. M. Sielecki, J. F. Boylan, P. A. Benfield, and G. L. Trainor, “Cyclin-dependent kinase inhibitors: useful targets in cell cycle regulation,” Journal of Medicinal Chemistry, vol. 43, no. 1, pp. 1–18, 2000.
  5. N. Watanabe, M. Broome, and T. Hunter, “Regulation of the human WEE1Hu CDK tyrosine 15-kinase during the cell cycle,” EMBO Journal, vol. 14, no. 9, pp. 1878–1891, 1995.
  6. K. Coulonval, L. Bockstaele, S. Paternot, and P. P. Roger, “Phosphorylations of cyclin-dependent kinase 2 revisited using two-dimensional gel electrophoresis,” Journal of Biological Chemistry, vol. 278, no. 52, pp. 52052–52060, 2003.
  7. N. P. Pavletich, “Mechanisms of cyclin-dependent kinase regulation: structures of Cdks, their cyclin activators, and Cip and INK4 inhibitors,” Journal of Molecular Biology, vol. 287, pp. 821–828, 1999.
  8. M. D. Losiewicz, B. A. Carlson, G. Kaur, E. A. Sausville, and P. J. Worland, “Potent inhibition of Cdc2 kinase activity by the flavonoid L86-8275,” Biochemical and Biophysical Research Communications, vol. 201, no. 2, pp. 589–595, 1994.
  9. A. M. Senderowicz and E. A. Sausville, “Preclinical and clinical development of cyclin-dependent kinase modulators,” Journal of the National Cancer Institute, vol. 92, no. 5, pp. 376–387, 2000.
  10. I. R. Hardcastle, B. T. Golding, and R. J. Griffin, “Designing inhibitors of cyclin-dependent kinases,” Annual Review of Pharmacology and Toxicology, vol. 42, pp. 325–348, 2002.
  11. M. Knockaert, P. Greengard, and L. Meijer, “Pharmacological inhibitors of cyclin-dependent kinases,” Trends in Pharmacological Sciences, vol. 23, no. 9, pp. 417–425, 2002.
  12. P. L. Toogood, “Progress toward the development of agents to modulate the cell cycle,” Current Opinion in Chemical Biology, vol. 6, pp. 472–478, 2002.
  13. G. I. Shapiro, “Preclinical and clinical development of the cyclin-dependent kinase inhibitor flavopiridol,” Clinical Cancer Research, vol. 10, no. 12, pp. 4270s–4275s, 2004.
  14. M. Vieth, R. E. Higgs, D. H. Robertson, M. Shapiro, E. A. Gragg, and H. Hemmerle, “Kinomics—structural biology and chemogenomics of kinase inhibitors and targets,” Biochimica et Biophysica Acta, vol. 1697, no. 1-2, pp. 243–257, 2004.
  15. M. Fernandez, A. Tundidor-Camba, and J. Caballero, “Modeling of cyclin-dependent kinase inhibition by 1H-pyrazolo[3,4-d]pyrimidine derivatives using artificial neural network ensembles,” Journal of Chemical Information and Modeling, vol. 45, no. 6, pp. 1884–1895, 2005.
  16. M. P. Gonzalez, J. Caballero, A. M. Helguera, M. Garriga, G. Gonzalez, and M. Fernandez, “2D autocorrelation modelling of the inhibitory activity of cytokinin-derived cyclin-dependent kinase inhibitors,” Bulletin of Mathematical Biology, vol. 68, no. 4, pp. 735–751, 2006.
  17. H. Dureja and A. K. Madan, “Topochemical models for prediction of cyclin-dependent kinase 2 inhibitory activity of indole-2-ones,” Journal of Molecular Modeling, vol. 11, no. 6, pp. 525–531, 2005.
  18. J. Z. Li, H. X. Liu, X. J. Yao, M. C. Liu, Z. D. Hu, and B. T. Fan, “Structure-activity relationship study of oxindole-based inhibitors of cyclin-dependent kinases based on least-squares support vector machines,” Analytica Chimica Acta, vol. 581, pp. 333–342, 2007.
  19. S. Samanta, B. Debnath, A. Basu, S. Gayen, K. Srikanth, and T. Jha, “Exploring QSAR on 3-aminopyrazoles as antitumor agents for their inhibitory activity of CDK2/cyclin A,” European Journal of Medicinal Chemistry, vol. 41, no. 10, pp. 1190–1195, 2006.
  20. M. Goodarzi and M. P. Freitas, “Predicting boiling points of aliphatic alcohols through multivariate image analysis applied to quantitative structure-property relationships,” Journal of Physical Chemistry A, vol. 112, no. 44, pp. 11263–11265, 2008.
  21. M. Goodarzi and M. P. Freitas, “Augmented three-mode MIA-QSAR modeling for a series of anti-HIV-1 compounds,” QSAR and Combinatorial Science, vol. 27, no. 9, pp. 1092–1097, 2008.
  22. M. Goodarzi, T. Goodarzi, and N. Ghasemi, “Spectrophotometric simultaneous determination of manganese(II) and iron(II) in pharmaceutical by orthogonal signal correction-partial least squares,” Annali di Chimica, vol. 97, no. 5-6, pp. 303–312, 2007.
  23. N. Goudarzi, M. H. Fatemi, and A. Samadi-Maybodi, “Quantitative structure-properties relationship study of the 29Si-NMR chemical shifts of some silicate species,” Spectroscopy Letters, vol. 42, no. 4, pp. 186–193, 2009.
  24. N. Goudarzi and M. Goodarzi, “Prediction of the logarithmic of partition coefficients (log P) of some organic compounds by least square-support vector machine (LS-SVM),” Molecular Physics, vol. 106, pp. 2525–2535, 2008.
  25. N. Goudarzi and M. Goodarzi, “Prediction of the acidic dissociation constant (pKa) of some organic compounds using linear and nonlinear QSPR methods,” Molecular Physics, vol. 107, no. 14, pp. 1495–1503, 2009.
  26. N. Goudarzi and M. Goodarzi, “Prediction of the vapor pressure of some halogenated methyl-phenyl ether (anisole) compounds using linear and nonlinear QSPR methods,” Molecular Physics, vol. 107, no. 15, pp. 1615–1620, 2009.
  27. N. Goudarzi and M. Goodarzi, “QSPR models for prediction of half wave potentials of some chlorinated organic compounds using SR-PLS and GA-PLS methods,” Molecular Physics, vol. 107, pp. 1739–1744, 2009.
  28. Z. Elmi, K. Faez, M. Goodarzi, and N. Goudarzi, “Feature selection method based on fuzzy entropy for regression in QSAR studies,” Molecular Physics, vol. 107, no. 17, pp. 1787–1798, 2009.
  29. N. Goudarzi, M. Goodarzi, M. C. U. Araujo, and R. K. H. Galvao, “QSPR modeling of soil sorption coefficients (KOC) of pesticides using SPA-ANN and SPA-MLR,” Journal of Agricultural and Food Chemistry, vol. 57, pp. 7153–7158, 2009.
  30. N. Goudarzi, M. Goodarzi, and M. Arab Chamjangali, “Prediction of inhibition effect of some aliphatic and aromatic organic compounds using QSAR method,” Journal of Environmental Chemistry and Ecotoxicology, vol. 2, pp. 47–50, 2010.
  31. N. Trinajstic, Chemical Graph Theory, CRC Press, Boca Raton, Fla, USA, 1992.
  32. A. R. Katritzky, V. S. Lobanov, and M. Karelson, “QSPR: the correlation and quantitative prediction of chemical and physical properties from structure,” Chemical Society Reviews, vol. 24, no. 4, pp. 279–287, 1995.
  33. N. R. Draper and H. Smith, Applied Regression Analysis, Wiley Series in Probability and Statistics, New York, NY, USA, 1998.
  34. J. Zupan and J. Gasteiger, Neural Networks in Chemistry and Drug Design, Wiley-VCH, Weinheim, Germany, 1999.
  35. N. K. Bose and P. Liang, Neural Networks, Fundamentals, McGraw-Hill, New York, NY, USA, 1996.
  36. J. H. Alzate-Morales, J. Caballero, A. Vergara Jague, and F. D. Gonzalez, “Insights into the structural basis of N2 and O6 substituted guanine derivatives as cyclin-dependent kinase 2 (CDK2) inhibitors: prediction of the binding modes and potency of the inhibitors by docking and ONIOM calculations,” Journal of Chemical Information and Modeling, vol. 49, no. 4, pp. 886–899, 2009.
  37. HyperChem Release 7, HyperCube, Inc.,
  38. R. Todeschini, Milano Chemometrics and QSPR Group,
  39. SPSS for windows Statistical package for IBM PC, SPSS Inc.,