BioMed Research International
Volume 2013 (2013), Article ID 798743, 7 pages
http://dx.doi.org/10.1155/2013/798743
Research Article

Predicting the DPP-IV Inhibitory Activity Based on Their Physicochemical Properties

1School of Materials Science and Engineering, Shanghai University, 149 Yan-Chang Road, Shanghai 200072, China
2Department of Chemistry, College of Sciences, Shanghai University, 99 Shang-Da Road, Shanghai 200444, China
3Department of Neurosurgery, Changhai Hospital, Second Military Medical University, 168 Chang-Hai Road, Shanghai 200433, China

Received 29 March 2013; Revised 10 May 2013; Accepted 28 May 2013

Academic Editor: Yudong Cai

Copyright © 2013 Tianhong Gu et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

In this work, a secondary development program was built to obtain the physicochemical properties of DPP-IV inhibitors. Based on the computed molecular descriptors, a two-stage feature selection method called mRMR-BFS (minimum redundancy maximum relevance-backward feature selection) was adopted. Support vector regression (SVR) was then used to build a model that maps DPP-IV inhibitors to their corresponding inhibitory activity. The squared correlation coefficients for leave-one-out cross-validation (LOOCV) on the training set and for the external test set are 0.815 and 0.884, respectively. An online server that predicts the inhibitory activity pIC50 of DPP-IV inhibitors with the model described in this paper has also been established.

1. Introduction

The incretin hormones glucagon-like peptide-1 (GLP-1) and glucose-dependent insulinotropic polypeptide (GIP) are endogenous peptides that stimulate glucose-dependent insulin secretion [1]. One of the important roles of dipeptidyl peptidase IV (DPP-IV, also written DPP-4) [2] is the rapid inactivation of GLP-1 and GIP. Inhibition of DPP-4 therefore increases the levels of intact, circulating endogenous GLP-1 and GIP. Consequently, DPP-4 inhibitors (gliptins) have recently been regarded as a promising approach for the treatment of type 2 diabetes mellitus.

In recent years, multiple small-molecule DPP-4 inhibitors have been reported [3, 4], and the development of structurally diverse DPP-4 inhibitors is an active area of research [5–8]. Computational and mathematical approaches have been widely employed in quantitative structure-activity relationship (QSAR) analysis [9–13]. Paliwal et al. used statistical methods to carry out QSAR analyses on a dataset of 47 pyrrolidine analogs acting as DPP-IV inhibitors [14]. Murugesan et al. used comparative molecular field analysis (CoMFA) and comparative molecular similarity indices analysis (CoMSIA) to analyze the structural requirements of the DPP-IV active site [15]. Gao et al. developed a novel 3D-QSAR model to assist the rational design of novel, potent, and selective pyrrolopyrimidine DPP-4 inhibitors [16]. Several other computational and mathematical studies of small-molecule DPP-4 inhibitors have also been reported. In our previous study [17], we used a quantum chemistry method [18] to optimize a series of DPP-IV inhibitors and built a 2D-QSAR model that predicted the inhibitory activity of these small molecules with satisfactory results. However, calculating the molecular descriptors used in that 2D-QSAR model is time consuming.

In view of this, we aim here to devise an efficient method that predicts the activity of small molecules from the physicochemical properties of the compounds.

Following the general development trend [19, 20] and recent research progress [21–31], the following procedures should be considered to establish a powerful statistical predictor for a biological system: (i) construct or select a valid benchmark dataset to train and test the predictor; (ii) formulate the samples with effective mathematical representations that capture the information relevant to the prediction; (iii) introduce or develop a powerful algorithm to perform the prediction; (iv) use cross-validation tests to estimate the performance of the predictor; and (v) establish a user-friendly online server for the predictor that is accessible to the public. In this study, we describe how each of these steps was handled in predicting the DPP-IV inhibitory activity pIC50 from physicochemical properties computed by our program.

2. Materials and Methods

2.1. Data Preparation

The dataset used in the present work contains 48 pyrrolidine amide derivatives, a diverse series of DPP-IV inhibitors with known IC50 values collected from the literature [32, 33]. The detailed structures are documented in the Supplementary Material (available at http://dx.doi.org/10.1155/2013/798743). Figure 1 shows the common structure on which all of the compounds under investigation are based.

Figure 1: Molecular structure of cyanopyrrolidine amides as DPP-IV inhibitors.

How to represent the molecules is an important question when building a statistical model. In this study, the molecular descriptors of the 48 molecules were calculated with our secondary development software, which is built on the Calculator Plugins, a product of ChemAxon [34]. ChemAxon is a company that provides chemical software development platforms and desktop applications for the biotechnology and pharmaceutical industries [35].

2.2. Introduction of the Program

Because MarvinSketch and JChem for Excel are operated through graphical interfaces, calculating descriptors for many small molecules with them is inconvenient. ChemAxon exposes the Calculator Plugins through an application programming interface (API), which our group studied carefully and tested repeatedly; the calculated results were compared with those from Gaussian 09 [18], JChem for Excel [34], HyperChem 7.5 [20, 36], and Dragon [37]. By invoking the Calculator Plugins from Java, we developed a convenient, customized batch-calculation program (the secondary development software) for small-molecule descriptors.

The program provides a tree-style selection box through which the user visually chooses the molecular descriptors to be calculated (Figure 2); the command-line version does not provide this descriptor selection. The molecular structures were built with GaussView 5.0 [38, 39] and saved as MOL-format files. The command-line version of the program is typically run on a Linux server with a command similar to the following:

java -jar JChemCmd.jar Molecules Pathway Result.csv Method.xml
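The batch workflow behind the command-line version (read every MOL file in a folder, compute a fixed set of descriptors, and write one CSV row per molecule) can be sketched in Python as follows. This is a minimal sketch, not the authors' Java program: it does not call ChemAxon's Calculator Plugins, and the three toy descriptors (atom count, heavy-atom count, and the mass of the explicitly listed atoms) as well as the names `read_molfile_atoms`, `compute_descriptors`, and `batch_to_csv` are hypothetical stand-ins. It assumes V2000 MOL files whose element symbols all appear in the small weight table.

```python
import csv
from pathlib import Path

# Standard atomic weights for a few common elements (illustrative subset only).
ATOMIC_WEIGHTS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999,
                  "F": 18.998, "S": 32.06, "Cl": 35.45, "Br": 79.904}

# A tiny V2000 MOL file (water) used as a usage example.
EXAMPLE_MOL = """water
  example

  3  2  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    0.9572    0.0000    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
   -0.2400    0.9266    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0
  1  3  1  0
M  END
"""


def read_molfile_atoms(text):
    """Return the element symbols listed in the atom block of a V2000 MOL file."""
    lines = text.splitlines()
    n_atoms = int(lines[3][0:3])  # counts line: the first 3 columns hold the atom count
    atom_lines = lines[4:4 + n_atoms]
    # In a V2000 atom line the element symbol occupies columns 32-34.
    return [line[31:34].strip() for line in atom_lines]


def compute_descriptors(symbols):
    """Toy descriptors standing in for the Calculator Plugins output."""
    return {
        "atom_count": len(symbols),
        "heavy_atom_count": sum(1 for s in symbols if s != "H"),
        # Mass of explicitly listed atoms only; implicit hydrogens are ignored.
        "explicit_mass": round(sum(ATOMIC_WEIGHTS[s] for s in symbols), 3),
    }


def batch_to_csv(mol_dir, out_csv):
    """Compute descriptors for every *.mol file in mol_dir and write them to a CSV file."""
    rows = []
    for path in sorted(Path(mol_dir).glob("*.mol")):
        symbols = read_molfile_atoms(path.read_text())
        rows.append({"molecule": path.stem, **compute_descriptors(symbols)})
    fieldnames = ["molecule", "atom_count", "heavy_atom_count", "explicit_mass"]
    with open(out_csv, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

For example, `batch_to_csv("molecules", "Result.csv")` would write one row per MOL file found in a hypothetical `molecules` folder. Keeping parsing, descriptor calculation, and output in separate functions makes it easy to swap the toy descriptors for a real descriptor engine.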

Figure 2: The program interface for the computation of molecular descriptors.
2.3. Model Validation
2.3.1. Dataset

The full dataset was divided into a training set (36 compounds) and a test set (12 compounds). All samples were ranked by activity, and every fourth sample was extracted to form the test set.
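The activity-ranked split can be sketched as follows. This is a minimal illustration under two assumptions the text does not settle: samples are sorted in ascending order of pIC50, and the fourth sample of each consecutive block of four (ranks 4, 8, 12, ...) goes to the test set. `systematic_split` and the compound identifiers are hypothetical.

```python
def systematic_split(samples, activity, step=4, offset=3):
    """Rank samples by activity and put every `step`-th one into the test set.

    `offset` (0-based) selects which position inside each block of `step`
    ranked samples is sent to the test set; offset=3 means ranks 4, 8, 12, ...
    """
    ranked = sorted(samples, key=activity)  # ascending activity (assumption)
    train, test = [], []
    for rank, sample in enumerate(ranked):
        if rank % step == offset:
            test.append(sample)
        else:
            train.append(sample)
    return train, test


# Usage with 48 hypothetical (compound_id, pIC50) pairs.
compounds = [(f"cpd{i:02d}", 5.0 + 0.05 * i) for i in range(48)]
train_set, test_set = systematic_split(compounds, activity=lambda c: c[1])
print(len(train_set), len(test_set))  # 36 12
```

Sampling at fixed intervals along the ranked list makes the test set span the whole activity range, so the external test does not concentrate on only highly or weakly active compounds.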

2.3.2. Leave-One-Out Cross-Validation (LOOCV) and Predictive Validation

In this study, leave-one-out cross-validation (LOOCV) [40, 41] was used to assess the predictive quality of the model on the training set. In LOOCV, each sample is held out in turn and predicted by a model built from all of the remaining samples.
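The leave-one-out loop is independent of the learning algorithm, so a minimal sketch can accept any fit/predict pair. Below, a one-nearest-neighbour regressor is used purely as a simple stand-in for the SVR used in this work; `loocv_predictions`, `fit_1nn`, and `predict_1nn` are hypothetical helper names.

```python
import math


def loocv_predictions(X, y, fit, predict):
    """Hold out each sample once, train on the rest, and predict the held-out one."""
    predictions = []
    for i in range(len(X)):
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        model = fit(X_train, y_train)
        predictions.append(predict(model, X[i]))
    return predictions


def fit_1nn(X, y):
    """A 1-nearest-neighbour 'model' simply stores the training pairs."""
    return list(zip(X, y))


def predict_1nn(model, x):
    """Return the target of the training sample closest to x (Euclidean distance)."""
    _, target = min(model, key=lambda pair: math.dist(pair[0], x))
    return target


preds = loocv_predictions([[0.0], [1.0], [3.0], [7.0]], [0.0, 1.0, 3.0, 7.0],
                          fit_1nn, predict_1nn)
print(preds)  # [1.0, 0.0, 1.0, 3.0]
```

Each prediction comes from a model that never saw that sample, which is why cross-validated statistics are lower than the fitting statistics of a model trained on all samples.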

2.3.3. Fitting and Predictive Performances of Models

The fitting and predictive performances of the models were measured by the squared correlation coefficient ($R^2$) and the root mean square error (RMSE) for both the training set and the external test set, defined as follows:

$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$$

where $y_i$ and $\hat{y}_i$ are the actual and predicted pIC50 values of sample $i$, respectively, $\bar{y}$ is the average pIC50 value of all samples, and $n$ is the number of samples in the set being evaluated.
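Both statistics can be computed directly from paired actual and predicted pIC50 values. The sketch below assumes the squared correlation coefficient is computed as one minus the ratio of the residual sum of squares to the total sum of squares about the mean of the actual values (for test-set predictions this can differ from the squared Pearson correlation); the function names are illustrative.

```python
import math


def r_squared(y_true, y_pred):
    """1 - (residual sum of squares) / (total sum of squares about the mean)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((a - p) ** 2 for a, p in zip(y_true, y_pred))
    ss_tot = sum((a - mean) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean square error over the n paired values."""
    n = len(y_true)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(y_true, y_pred)) / n)


actual = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 4.0]
print(r_squared(actual, predicted))        # 0.5
print(round(rmse(actual, predicted), 4))   # 0.5774
```

Applied to the LOOCV predictions of the training set, the same two functions give the cross-validated statistics; applied to the test-set predictions, they give the external ones.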

2.4. Methods

Because some features are redundant, descriptors must be selected before a suitable model is built, and this selection plays an important role in the quality of the final model. In this work, the mRMR-BFS (minimum redundancy maximum relevance-backward feature selection) method [42, 43] was used to select the molecular descriptors, and the support vector regression (SVR) model was built on the selected features.

2.4.1. mRMR-BFS Algorithm

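The mRMR (minimum-redundancy maximum-relevance) algorithm was introduced by Ding and Peng [44] and is widely used for feature selection. It ranks features with a score function that rewards maximum relevance to the target and penalizes redundancy with the features already selected. The score function is defined as follows:

$$\max_{f_j \in \Omega_t} \left[ I(f_j, c) - \frac{1}{m} \sum_{f_i \in \Omega_s} I(f_j, f_i) \right], \quad j = 1, 2, \ldots, n$$

where $c$ is the target variable, $\Omega = \Omega_s \cup \Omega_t$, and $\Omega$, $\Omega_s$, and $\Omega_t$ are the whole feature set, the already-selected feature set, and the to-be-selected feature set, respectively; $m$ and $n$ are the numbers of features in $\Omega_s$ and $\Omega_t$. The mutual information $I$ is defined as follows:

$$I(x, y) = \iint p(x, y) \log \frac{p(x, y)}{p(x)\, p(y)} \, dx \, dy$$

where $p(x, y)$, $p(x)$, and $p(y)$ are the probability density functions.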

More details about mRMR algorithm can be found in [44, 45].
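A minimal sketch of the greedy mRMR ranking in its difference form (relevance minus mean redundancy) is shown below. It assumes every descriptor and the target have already been discretized into a few integer bins, because mutual information is estimated here from simple counts; the paper does not state how its continuous descriptors were discretized. `mutual_information` and `mrmr_rank` are hypothetical names, and this is not the reference implementation of [44].

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Mutual information (in nats) between two discrete sequences, from counts."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi


def mrmr_rank(features, target):
    """Greedy mRMR: each step picks the feature maximizing relevance - mean redundancy."""
    relevance = [mutual_information(f, target) for f in features]
    remaining = list(range(len(features)))
    selected = []

    def score(j):
        if not selected:  # first pick: pure relevance
            return relevance[j]
        redundancy = sum(mutual_information(features[j], features[i]) for i in selected)
        return relevance[j] - redundancy / len(selected)

    while remaining:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


target = [0, 1, 2, 3, 0, 1, 2, 3]
f0 = [0, 0, 1, 1, 0, 0, 1, 1]   # high bit of the target
f1 = list(f0)                   # exact duplicate of f0
f2 = [0, 1, 0, 1, 0, 1, 0, 1]   # low bit of the target
print(mrmr_rank([f0, f1, f2], target))  # [0, 2, 1]
```

All three toy features are equally relevant to the target, yet the duplicate `f1` is ranked last because it adds no information beyond `f0`; this is the redundancy penalty at work.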

To further improve the performance of the predictor and the feature selection, backward feature selection (BFS) based on the mRMR result was also used in this study. The 50 most important variables were obtained from the mRMR procedure; denote this mRMR-selected subset by $S$. The BFS-selected feature set is initialized with all features in $S$: $F_0 = S$.

Given the current BFS-selected feature set $F_k$, the next BFS-selected feature set $F_{k+1}$ is obtained by the following steps:
(1) For each feature $f \in F_k$, form the candidate feature set $F_k \setminus \{f\}$, build an SVR model on it, and evaluate the model by LOOCV.
(2) Select the feature $f^*$ whose removal yields the lowest LOOCV error.
(3) Remove $f^*$ from $F_k$ to form the next BFS-selected feature set, $F_{k+1} = F_k \setminus \{f^*\}$.
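The elimination loop can be sketched as follows. The callback `loocv_error` is hypothetical: in this work it would build an SVR model on the given subset and return its LOOCV error, but here a toy error function stands in for it. Returning the subset with the lowest error seen along the elimination path is an assumption about how the final optimal subset is chosen.

```python
def backward_feature_selection(features, loocv_error, min_size=1):
    """Greedy backward elimination driven by a LOOCV error callback (lower is better)."""
    current = list(features)
    history = [(list(current), loocv_error(current))]
    while len(current) > min_size:
        # Try removing each feature; keep the removal that gives the lowest error.
        trials = [(loocv_error([f for f in current if f != cand]), cand) for cand in current]
        best_error, removed = min(trials, key=lambda pair: pair[0])
        current.remove(removed)
        history.append((list(current), best_error))
    best_subset, best_err = min(history, key=lambda item: item[1])
    return best_subset, best_err, history


# Toy error: features 0 and 1 carry signal; every extra feature adds noise.
signal = {0, 1}


def toy_error(subset):
    s = set(subset)
    return len(s - signal) + 10 * len(signal - s)


best, err, path = backward_feature_selection([0, 1, 2, 3], toy_error)
print(best, err)  # [0, 1] 0
```

Because each round evaluates one model per remaining feature, starting BFS from the 50 top-ranked mRMR features rather than all 75 descriptors reduces the number of SVR models that must be trained.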

2.4.2. SVM (Support Vector Machine)

Vapnik and his co-workers developed the SVM algorithm, a supervised machine-learning method used for classification and regression analysis. Because it embodies the structural risk minimization principle, the SVM generally shows good overall performance, and it is well suited to problems involving small sample sets. In this work, the SVM was applied to regression (SVR). Details of the algorithm can be found in [46]. The algorithm was run with the software package Weka 3.6.7 [47, 48].

3. Results and Discussion

3.1. Selection of Features

First, the mRMR method was applied to rank all 75 features by their mRMR scores. Second, the backward feature selection (BFS) algorithm based on SVR was used to search for feature combinations. Because different machine learning methods can lead to different results, several robust methods, namely the nearest-neighbor algorithm (NNA), the support vector machine (SVM with an RBF kernel), and AdaBoost, were each used with leave-one-out cross-validation to find an optimal feature subset. Based on the LOOCV results, the SVM was adopted as the prediction engine in this study.

Table 1 lists the optimal subset obtained with the two-stage feature selection method mRMR-BFS described above. The six features in the optimal subset fall into four categories (following the categories of the Calculator Plugins [49]): elemental analysis, geometry, topology, and others. The geometry and topology descriptors are the most important in this work. Because these descriptors are related to molecular size, this suggests that the size of the cyanopyrrolidine amide derivatives plays a major role in their inhibitory activity.

Table 1: Symbols for molecular descriptors involved in the model.
3.2. Results of Computation

In this work, $R^2_{tr}$, $R^2_{cv}$, and $R^2_{te}$ denote the squared correlation coefficients for the training set, the cross-validation (LOOCV), and the external test set, respectively, and $\mathrm{RMSE}_{tr}$, $\mathrm{RMSE}_{cv}$, and $\mathrm{RMSE}_{te}$ denote the corresponding root mean square errors.

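The final model was built by SVR with the Gaussian (RBF) kernel function, with the parameters $C$, $\varepsilon$, and $\gamma$ set to 2.0, 0.05, and 1.0, respectively. The Gaussian (RBF) kernel function is given as follows:

$$K(\mathbf{x}_i, \mathbf{x}_j) = \exp\left(-\gamma \lVert \mathbf{x}_i - \mathbf{x}_j \rVert^2\right)$$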

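The model built from the original data with the above parameters has the following form:

$$f(\mathbf{x}) = \sum_{i=1}^{l} \alpha_i K(\mathbf{x}_i, \mathbf{x}) + b$$

where $\mathbf{x}_i$ ($i = 1, \ldots, l$) are the support vectors, $\alpha_i$ is the Lagrange coefficient of support vector $\mathbf{x}_i$, $K$ is the Gaussian kernel function, and $b$ is the bias term.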
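Evaluating a trained SVR model of this form takes only a few lines. This sketch assumes the kernel $K(\mathbf{x}, \mathbf{z}) = \exp(-\gamma \lVert \mathbf{x} - \mathbf{z} \rVert^2)$, which is the parameterization used by Weka's RBF kernel, and uses made-up support vectors, coefficients, and bias purely for illustration; the fitted values of the actual model are not reproduced here. In practice, inputs must be preprocessed exactly as they were during training.

```python
import math


def rbf_kernel(x, z, gamma):
    """Gaussian (RBF) kernel K(x, z) = exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def svr_predict(x, support_vectors, coefs, bias, gamma):
    """Evaluate f(x) = sum_i coef_i * K(sv_i, x) + b for a trained SVR."""
    return sum(c * rbf_kernel(sv, x, gamma) for sv, c in zip(support_vectors, coefs)) + bias


# One hypothetical support vector with coefficient 2.0 and bias 0.5.
print(svr_predict([0.0, 0.0], [[0.0, 0.0]], [2.0], 0.5, gamma=1.0))  # 2.5
```

Only the support vectors enter the sum, so prediction cost grows with the number of support vectors rather than with the size of the training set.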

The experimental versus predicted pIC50 values of the SVR model for the training and test sets are shown in Figure 3. The squared correlation coefficients for the training set, LOOCV, and the test set ($R^2_{tr}$, $R^2_{cv}$, and $R^2_{te}$) were 0.953, 0.815, and 0.884, respectively, and the corresponding root mean square errors ($\mathrm{RMSE}_{tr}$, $\mathrm{RMSE}_{cv}$, and $\mathrm{RMSE}_{te}$) were 0.123, 0.247, and 0.193. Figure 3 shows that both the fitted pIC50 values of the training set and the predicted pIC50 values of the external test set lie close to the regression line. Table 2 lists the experimental and calculated values for the training and test sets; together with Figure 3, it shows that the predicted values are in good agreement with the experimental ones. Figure 4 shows the dispersion plot of the residuals for the training and test sets. The residuals are randomly dispersed around the zero line, which indicates that the model is appropriate for the data.

Table 2: Experimental and predicted pIC50 for the training and test sets.
Figure 3: Predicted versus experimental pIC50 for the training set (circles for fitted values and triangles for LOOCV predictions) and the test set (stars).
Figure 4: Dispersion plot of the residuals for the training and test sets.
3.3. Analysis of the New Method

The secondary development program developed in this work was used to build a robust model with squared correlation coefficients of 0.953, 0.815, and 0.884 for the training set, LOOCV, and the test set, respectively. To validate the generalization and reliability of the descriptors obtained with our secondary development program, the same training and test sets were also optimized with the Gaussian program, and 1262 descriptors were computed with the HyperChem 7.5 program [20], the JChem for Excel package [34], and the Dragon program [37]. A robust and reliable model was also obtained from these descriptors. The statistical comparison is summarized in Table 3.

Table 3: Comparative statistical parameters obtained with the secondary development program and with the Gaussian program for the same compounds.

With the secondary development program, the whole process from structure optimization to descriptor computation takes less than 30 minutes per molecule, whereas the Gaussian-based workflow takes more than 36 hours. These results show that the secondary development program greatly increases computing speed while yielding models whose statistical parameters are as good as those obtained with the Gaussian method. The program therefore not only saves descriptor-computation time but also makes it practical to provide effective QSPR models online in the future.
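Because the reported timings are bounds (under 30 minutes versus over 36 hours per molecule), their ratio gives a lower bound on the per-molecule speed-up:

```latex
\frac{36\,\text{h} \times 60\,\text{min/h}}{30\,\text{min}}
  = \frac{2160\,\text{min}}{30\,\text{min}} = 72
```

so the secondary development program is more than 72 times faster per molecule than the Gaussian-based workflow.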

In a benchmark test, support vector regression (SVR) was compared with multiple linear regression (MLR) and a back-propagation artificial neural network (BP-ANN). The statistical comparison is shown in Table 4, which indicates that SVR has better generalization ability in this work.

Table 4: Comparison of different methods.
3.4. The Online Web Server

Because user-friendly and publicly accessible online servers represent the direction for developing more useful models and predictors, we established a web server for predicting the DPP-IV inhibitory activity pIC50 at http://chemdata.shu.edu.cn:8080/QSARPrediction/index.jsp.

The web server allows users to upload a MOL-format file of a molecule and returns the pIC50 predicted by our mRMR-BFS-SVR model. In the background, the server invokes the Calculator Plugins [49] of ChemAxon to compute the descriptors. The most outstanding characteristic of the server is that users need to do nothing except upload the file of the unknown small molecule; the predicted result is returned after a short wait. This is a notable advance over our previous work [17, 20, 36].

4. Conclusions

In this paper, a secondary development program was proposed as an efficient and fast means of calculating molecular descriptors. The mRMR-BFS method was adopted for feature selection, and SVR was used to build a model that maps DPP-IV inhibitors to their corresponding inhibitory activity. The squared correlation coefficients of the model for the training set, LOOCV, and the test set are 0.953, 0.815, and 0.884, respectively, which are as good as those obtained with the Gaussian method. A web server that quickly predicts the DPP-IV inhibitory activity pIC50 of unknown small molecules from their MOL-format files was established with our secondary development program at http://chemdata.shu.edu.cn:8080/QSARPrediction/index.jsp. In short, this work proposes a user-friendly and rapid approach whose accuracy is comparable to that of the Gaussian-based method.

Acknowledgments

This study was supported by the National Science Foundation of China (20973108, 20902056), the Shanghai Education Committee Project (11ZZ83), and the Leading Academic Discipline Project of Shanghai Municipal Education Commission, China (J50101). The authors also acknowledge ChemAxon for their excellent products.

References

  1. M. H. Kim and M. K. Lee, “The incretins and pancreatic beta-cells: use of glucagon-like peptide-1 and glucose-dependent insulinotropic polypeptide to cure type 2 diabetes mellitus,” Korean Diabetes Journal, vol. 34, no. 1, pp. 2–9, 2010.
  2. A. Sarashina, S. Sesoko, M. Nakashima et al., “Linagliptin, a dipeptidyl peptidase-4 inhibitor in development for the treatment of type 2 diabetes mellitus: a phase I, randomized, double-blind, placebo-controlled trial of single and multiple escalating doses in healthy adult male japanese subjects,” Clinical Therapeutics, vol. 32, no. 6, pp. 1188–1204, 2010.
  3. K. Augustyns, P. Van der Veken, and A. Haemers, “Inhibitors of proline-specific dipeptidyl peptidases: DPP IV inhibitors as a novel approach for the treatment of type 2 diabetes,” Expert Opinion on Therapeutic Patents, vol. 15, no. 10, pp. 1387–1407, 2005.
  4. A. E. Weber, “Dipeptidyl peptidase IV inhibitors for the treatment of diabetes,” Journal of Medicinal Chemistry, vol. 47, no. 17, pp. 4135–4141, 2004.
  5. S. D. Edmondson, A. Mastracchio, R. J. Mathvink et al., “(2S,3S)-3-amino-4-(3,3-difluoropyrrolidin-1-yl)-N,N-dimethyl-4-oxo-2-(4-[1,2,4]triazolo[1,5-a]-pyridin-6-ylphenyl)butanamide: a selective α-amino amide dipeptidyl peptidase IV inhibitor for the treatment of type 2 diabetes,” Journal of Medicinal Chemistry, vol. 49, no. 12, pp. 3614–3627, 2006.
  6. J. L. Duffy, B. A. Kirk, L. Wang et al., “4-Aminophenylalanine and 4-aminocyclohexylalanine derivatives as potent, selective, and orally bioavailable inhibitors of dipeptidyl peptidase IV,” Bioorganic and Medicinal Chemistry Letters, vol. 17, no. 10, pp. 2879–2885, 2007.
  7. J. Xu, L. Wei, R. J. Mathvink et al., “Discovery of potent, selective, and orally bioavailable oxadiazole-based dipeptidyl peptidase IV inhibitors,” Bioorganic and Medicinal Chemistry Letters, vol. 16, no. 20, pp. 5373–5377, 2006.
  8. J. Xu, L. Wei, R. Mathvink et al., “Discovery of potent, selective, and orally bioavailable pyridone-based dipeptidyl peptidase-4 inhibitors,” Bioorganic and Medicinal Chemistry Letters, vol. 16, no. 5, pp. 1346–1349, 2006.
  9. T. S. Garcia and K. M. Honório, “Two-dimensional quantitative structure-activity relationship studies on bioactive ligands of peroxisome proliferator-activated receptor δ,” Journal of the Brazilian Chemical Society, vol. 22, no. 1, pp. 65–72, 2011.
  10. G. C. García, I. Luque Ruiz, and M. Á. Gómez-Nieto, “Analysis and study of molecule data sets using snowflake diagrams of weighted maximum common subgraph trees,” Journal of Chemical Information and Modeling, vol. 51, no. 6, pp. 1216–1232, 2011.
  11. D. Jana, A. K. Halder, N. Adhikari, M. K. Maiti, C. Mondal, and T. Jha, “Chemometric modeling and pharmacophore mapping in coronary heart disease: 2-arylbenzoxazoles as cholesteryl ester transfer protein inhibitors,” MedChemComm, vol. 2, no. 9, pp. 840–852, 2011.
  12. V. Kovalishyn, V. Tanchuk, L. Charochkina, I. Semenuta, and V. Prokopenko, “Predictive QSAR modeling of phosphodiesterase 4 inhibitors,” Journal of Molecular Graphics and Modelling, vol. 32, pp. 32–38, 2012.
  13. B. Niu, Q. Su, X. C. Yuan, W. Lu, and J. Ding, “QSAR study on 5-lipoxygenase inhibitors based on support vector machine,” Medicinal Chemistry, vol. 8, no. 6, pp. 1108–1116, 2012.
  14. S. Paliwal, D. Seth, D. Yadav, R. Yadav, and S. Paliwal, “Development of a robust QSAR model to predict the affinity of pyrrolidine analogs for dipeptidyl peptidase IV (DPP-IV),” Journal of Enzyme Inhibition and Medicinal Chemistry, vol. 26, no. 1, pp. 129–140, 2011.
  15. V. Murugesan, N. Sethi, Y. S. Prabhakar, and S. B. Katti, “CoMFA and CoMSIA of diverse pyrrolidine analogues as dipeptidyl peptidase IV inhibitors: active site requirements,” Molecular Diversity, vol. 15, no. 2, pp. 457–466, 2011.
  16. Y. D. Gao, D. Feng, R. P. Sheridan et al., “Modeling assisted rational design of novel, potent, and selective pyrrolopyrimidine DPP-4 inhibitors,” Bioorganic and Medicinal Chemistry Letters, vol. 17, no. 14, pp. 3877–3879, 2007.
  17. X. Y. Yang, M. J. Li, Q. Su, M. Wu, T. Gu, and W. Lu, “QSAR studies on pyrrolidine amides derivatives as DPP-IV inhibitors for type 2 diabetes,” Medicinal Chemistry Research, 2013.
  18. S. Peng, Z. Jian-Wei, Z. Peng, and X. Lin, “QSPR modeling of bioconcentration factor of nonionic compounds using Gaussian processes and theoretical descriptors derived from electrostatic potentials on molecular surface,” Chemosphere, vol. 83, no. 8, pp. 1045–1052, 2011.
  19. T. Gu, W. Lu, X. Bao, and N. Chen, “Using support vector regression for the prediction of the band gap and melting point of binary and ternary compound semiconductors,” Solid State Sciences, vol. 8, no. 2, pp. 129–136, 2006.
  20. J. Zhu, W. Lu, L. Liu, T. Gu, and B. Niu, “Classification of Src kinase inhibitors based on support vector machine,” QSAR and Combinatorial Science, vol. 28, no. 6-7, pp. 719–727, 2009.
  21. V. Kovalishyn, J. Aires-de-Sousa, C. Ventura, R. Elvas Leitão, and F. Martins, “QSAR modeling of antitubercular activity of diverse organic compounds,” Chemometrics and Intelligent Laboratory Systems, vol. 107, no. 1, pp. 69–74, 2011.
  22. L. Xing, R. Goulet, and K. Johnson, “Statistical analysis and compound selection of combinatorial libraries for soluble epoxide hydrolase,” Journal of Chemical Information and Modeling, vol. 51, no. 7, pp. 1582–1592, 2011.
  23. S. Kar, O. Deeb, and K. Roy, “Development of classification and regression based QSAR models to predict rodent carcinogenic potency using oral slope factor,” Ecotoxicology and Environmental Safety, vol. 82, pp. 85–95, 2012.
  24. B. Niu, X. C. Yuan, P. Roeper et al., “HIV-1 protease cleavage site prediction based on two-stage feature selection method,” Protein and Peptide Letters, vol. 20, no. 3, pp. 290–298, 2013.
  25. B. Niu, Y. D. Cai, W. C. Lu, G. Z. Li, and K. C. Chou, “Predicting protein structural class with AdaBoost Learner,” Protein and Peptide Letters, vol. 13, no. 5, pp. 489–492, 2006.
  26. B. Niu, Y. H. Jin, K. Y. Feng et al., “Predicting membrane protein types with bagging learner,” Protein and Peptide Letters, vol. 15, no. 6, pp. 590–594, 2008.
  27. B. Niu, Y. H. Jin, K. Y. Feng, W. C. Lu, Y. D. Cai, and G. Z. Li, “Using AdaBoost for the prediction of subcellular location of prokaryotic and eukaryotic proteins,” Molecular Diversity, vol. 12, no. 1, pp. 41–45, 2008.
  28. B. Niu, Y. Jin, L. Lu et al., “Prediction of interaction between small molecule and enzyme using AdaBoost,” Molecular Diversity, vol. 13, no. 3, pp. 313–320, 2009.
  29. B. Niu, Y. Jin, W. Lu, and G. Li, “Predicting toxic action mechanisms of phenols using AdaBoost Learner,” Chemometrics and Intelligent Laboratory Systems, vol. 96, no. 1, pp. 43–48, 2009.
  30. B. Niu, L. Lu, L. Liu et al., “HIV-1 protease cleavage site prediction based on amino acid property,” Journal of Computational Chemistry, vol. 30, no. 1, pp. 33–39, 2009.
  31. Q. Su, W. C. Lu, B. Niu, X. Liu, and T. H. Gu, “Classification of the toxicity of some organic compounds to tadpoles (Rana Temporaria) through integrating multiple classifiers,” Molecular Informatics, vol. 30, no. 8, pp. 672–675, 2011.
  32. I. L. Lu, S. J. Lee, H. Tsu et al., “Glutamic acid analogues as potent dipeptidyl peptidase IV and 8 inhibitors,” Bioorganic and Medicinal Chemistry Letters, vol. 15, no. 13, pp. 3271–3275, 2005.
  33. T. Y. Tsai, T. Hsu, C. T. Chen et al., “Rational design and synthesis of potent and long-lasting glutamic acid-based dipeptidyl peptidase IV inhibitors,” Bioorganic and Medicinal Chemistry Letters, vol. 19, no. 7, pp. 1908–1912, 2009.
  34. L. Weber, “JChem base—chemAxon,” Chemistry World, vol. 5, no. 10, pp. 65–66, 2008.
  35. 2013, http://www.chemaxon.com/.
  36. S. S. Yang, W. C. Lu, T. H. Gu, L. M. Yan, and G. Z. Li, “QSPR study of n-octanol/water partition coefficient of some aromatic compounds using support vector regression,” QSAR and Combinatorial Science, vol. 28, no. 2, pp. 175–182, 2009.
  37. R. Todeschini, “Dragon 5.0: software for molecular descriptors,” in Milano Chemometrics and QSAR Research Group, University of Milano-Bicocca, Milan, Italy, 2004.
  38. V. Mukherjee, K. Singh, N. P. Singh, and R. A. Yadav, “Quantum chemical determination of molecular geometries and interpretation of FTIR and Raman spectra for 2,4,5- and 3,4,5-tri-fluoro-benzonitriles,” Spectrochimica Acta A, vol. 71, no. 4, pp. 1571–1580, 2008.
  39. Y. Chen, Z. Yi, S. J. Chen, J. S. Luo, Y. G. Yi, and Y. J. Tang, “Study of density functional theory for surface-enhanced raman spectra of p-aminothiophenol,” Spectroscopy and Spectral Analysis, vol. 31, no. 11, pp. 2952–2955, 2011.
  40. T. Zhang, “A leave-one-out cross validation bound for kernel methods with applications in learning,” Computational Learning Theory Proceedings, vol. 2111, pp. 427–443, 2001.
  41. J. Yuan, Y. M. Li, C. L. Liu, and X. F. Zha, “Leave-one-out cross-validation based model selection for manifold regularization,” in Advances in Neural Networks, vol. 6063 of Lecture Notes in Computer Science, pp. 457–464, 2010.
  42. M. Kompany-Zareh, “An improved QSPR study of the toxicity of aliphatic carboxylic acids using genetic algorithm,” Medicinal Chemistry Research, vol. 18, no. 2, pp. 143–157, 2009.
  43. M. Goodarzi, B. Dejaegher, and Y. Vander Heyden, “Feature selection methods in QSAR studies,” Journal of AOAC International, vol. 95, no. 3, pp. 636–651, 2012.
  44. C. Ding and H. Peng, “Minimum redundancy feature selection from microarray gene expression data,” in Proceedings of the IEEE Bioinformatics Conference, pp. 185–205, August 2003.
  45. Z. He, J. Zhang, X. H. Shi et al., “Predicting drug-target interaction networks based on functional groups and biological features,” PLoS ONE, vol. 5, no. 3, Article ID e9603, 2010.
  46. B. Üstün, W. J. Melssen, and L. M. C. Buydens, “Facilitating the application of Support Vector Regression by using a universal Pearson VII function based kernel,” Chemometrics and Intelligent Laboratory Systems, vol. 81, no. 1, pp. 29–40, 2006.
  47. E. Frank, M. Hall, L. Trigg, G. Holmes, and I. H. Witten, “Data mining in bioinformatics using Weka,” Bioinformatics, vol. 20, no. 15, pp. 2479–2481, 2004.
  48. L. Chen, L. Lu, K. Feng et al., “Multiple classifier integration for the prediction of protein structural classes,” Journal of Computational Chemistry, vol. 30, no. 14, pp. 2248–2254, 2009.
  49. 2013, http://www.chemaxon.com/products/calculator-plugins/.