Abstract

In the search for new, potent antileishmanial drugs, a series of 36 5-(5-nitroheteroaryl-2-yl)-1,3,4-thiadiazole derivatives was subjected to a quantitative structure-activity relationship (QSAR) analysis. Several statistical tools were used to study, interpret, and predict the activities and to design new compounds. Multiple linear regression (MLR), multiple nonlinear regression (MNLR), and artificial neural network (ANN) models were developed using 30 molecules with pIC50 values ranging from 3.155 to 5.046. The best MLR, MNLR, and ANN models give conventional correlation coefficients R of 0.750, 0.782, and 0.967, and leave-one-out cross-validation correlation coefficients of 0.722, 0.744, and 0.720, respectively. The predictive ability of these models was evaluated by external validation on a test set of 6 molecules, giving predicted correlation coefficients of 0.840, 0.850, and 0.802, respectively. The applicability domains of the more transparent MLR and MNLR models were investigated using Williams plots to detect outliers and compounds lying outside the domain. We expect this study to help lead optimization in the early discovery of new, similar compounds.

1. Introduction

Leishmaniasis is a parasitic disease caused by protozoan parasites of the genus Leishmania and is generally recognized as an important public health problem, affecting millions of people living mainly in large areas of tropical and subtropical regions. Currently, only a limited number of drugs are available for the treatment and control of this disease, all of which are associated with limiting factors such as high toxicity, variable efficacy, long dosing schedules, and/or parenteral administration [1, 2]. To date, no vaccine against any clinical form of leishmaniasis has been commercialized, and treatment relies solely on chemotherapy, which has been based on pentavalent antimonial drugs. Other drugs, such as pentamidine, miltefosine, and amphotericin B, have been used as alternatives against resistant parasites. However, resistant strains have emerged, and all of these drug therapies cause serious side effects. Furthermore, treatment is unaffordable in many afflicted countries. There is therefore a continuing need to design new, potent, safer, and cheaper drugs [3–5].

A promising strategy for discovering new therapeutic leads is to study classes of potentially bioactive compounds, or to find alternative uses for known active compounds. Nitrogen heterocycles in general, such as quinolines, pyrimidines, acridines, phenothiazines, indoles, and quinones, and thiadiazole derivatives in particular, as well as their reduced derivatives, have been evaluated in antileishmanial assays in recent years.

Since their discovery in the twentieth century, thiadiazole derivatives have demonstrated a broad spectrum of pharmacological properties [6]. They were initially used as antibacterial agents [7] and soon revealed interesting antiproliferative activity against both protozoa and tumor cells [8, 9]. Consequently, these derivatives have been used extensively in antiparasitic chemotherapy, and a large variety of new thiadiazole derivatives have been synthesized and evaluated for their in vitro antileishmanial activity [10, 11]. Today, traditional methods of drug discovery and development have gradually been replaced by modern approaches, in which computational techniques have become indispensable in the drug development pipeline because they reduce the amount of synthetic work and biological evaluation needed to achieve the required results.

In the present study, QSAR analyses based on principal component analysis (PCA), multiple linear regression (MLR), multiple nonlinear regression (MNLR), and artificial neural network (ANN) calculations were performed on a series of 36 (5-nitroheteroaryl-1,3,4-thiadiazole-2-yl) piperazinyl derivatives [12] to identify the key structural features required to design new potent lead candidates of this class. The results of this study may help in designing highly potent antileishmanial drugs.

2. Materials and Methods

2.1. Data Sources

The experimental antileishmanial activities, IC50 (μM), of the 36 thiadiazole derivatives were taken from a previous study [12]. The dataset was randomly split into a training set of thirty molecules, used to build the quantitative models, and a test set of the remaining six molecules, used to evaluate the performance of the proposed models. The molecular structures of the studied molecules and their antileishmanial activities, converted to pIC50 (−log IC50), are presented in Table 1.
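The unit conversion and the random split described above can be sketched in Python as follows. This is a minimal illustration, assuming that pIC50 is computed on the molar scale (pIC50 = −log10 IC50, with IC50 converted from μM to mol/L). The helper names and the seed are hypothetical, and the split they produce is not the one used in the study.

```python
import math
import random


def pic50_from_um(ic50_um: float) -> float:
    """Convert an IC50 in micromolar to pIC50 = -log10(IC50 in mol/L)."""
    return -math.log10(ic50_um * 1e-6)


def random_split(ids, n_test, seed=0):
    """Randomly draw n_test compound ids for the test set; the rest form the training set."""
    ids = list(ids)
    rng = random.Random(seed)            # fixed seed so the split is reproducible
    test = sorted(rng.sample(ids, n_test))
    train = [i for i in ids if i not in test]
    return train, test


print(round(pic50_from_um(1.0), 3))      # 1 uM corresponds to pIC50 = 6.0
train, test = random_split(range(1, 37), n_test=6, seed=42)
print(len(train), len(test))             # 30 training and 6 test compounds
```

Fixing the seed makes the partition reproducible, which matters when the same training and test sets must be reused across the MLR, MNLR, and ANN models.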

2.2. Molecular Descriptors Generation

The electronic descriptors were calculated with the Gaussian03W package [13]. The geometries of the 36 thiadiazole derivatives were optimized by the DFT method using the B3LYP functional and the 6-31G(d) basis set. Several structural features were then selected from the calculation results: the highest occupied molecular orbital energy (EHOMO), the lowest unoccupied molecular orbital energy (ELUMO), the dipole moment (μ), the energy gap (ΔE = ELUMO − EHOMO), and the total energy (ET).

The ChemSketch program [14] was used to calculate the other molecular descriptors: the molar volume MV (cm³), the molecular weight MW (g/mol), the molar refractivity MR (cm³), the parachor Pc (cm³), the density D (g/cm³), the refractive index n, and the octanol/water partition coefficient log P.

2.3. Methods of Data Analysis
2.3.1. Principal Component Analysis (PCA)

Principal component analysis is a data analysis method that transforms a set of correlated variables into a new set of variables, called principal components, which are fewer in number and mutually uncorrelated. Using these new variables, the dimensionality of the system is reduced with minimal loss of information [15]. The resulting matrix of coordinates allows the dispersion of the individuals (compounds) to be analyzed in the newly defined space [16–18]. In this study, PCA was used to examine the correlations and multicollinearity among the variables and to select the descriptors that correlate with the activity.
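The PCA step can be sketched with NumPy. The study used XLSTAT; this sketch assumes a correlation-matrix PCA, i.e., each descriptor is autoscaled (mean 0, standard deviation 1) before the decomposition, and it runs on a random placeholder matrix with the same 36 × 12 shape as the real descriptor table, not on the actual data.

```python
import numpy as np


def pca_explained_variance(X):
    """PCA on autoscaled descriptors; returns per-component variance (%) and compound scores."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)    # autoscale each descriptor column
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)     # singular values come sorted, largest first
    eigvals = s ** 2 / (Z.shape[0] - 1)                  # variance carried by each component
    explained = 100.0 * eigvals / eigvals.sum()          # percentage of total variance (F1, F2, ...)
    scores = Z @ Vt.T                                    # coordinates of the compounds on F1, F2, ...
    return explained, scores


rng = np.random.default_rng(0)
X_demo = rng.normal(size=(36, 12))   # placeholder 36 x 12 matrix, not the real descriptors
explained, scores = pca_explained_variance(X_demo)
print(explained[:3].round(2), explained[:3].sum().round(2))
```

Applied to the real descriptor table, the first three entries of `explained` and their sum would correspond to the F1, F2, and F3 percentages reported in Section 3.2.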

2.3.2. Multiple Linear and Nonlinear Regressions

Multiple linear regression is used to study the relationship between one dependent variable and several independent variables. It estimates the model coefficients by minimizing the sum of squared differences between the observed and predicted values. In this study, MLR was also used to select the descriptors used as inputs for the multiple nonlinear regression and the artificial neural network. The multiple linear and nonlinear regressions used to predict the antileishmanial activity were generated with XLSTAT version 2013. The models were assessed mainly by the correlation coefficient (R), the mean squared error (MSE), Fisher's F statistic, and the significance level (p-value) [19].
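A least-squares MLR fit and the statistics named above can be sketched as follows. This is not XLSTAT's implementation. As an assumption, MSE is taken here as the residual mean square SSE/(n − k − 1), one common convention, and the data are synthetic.

```python
import numpy as np


def fit_mlr(X, y):
    """Ordinary least squares fit of y = b0 + b1*x1 + ... + bk*xk, with R, MSE, and F."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])            # prepend the intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)    # minimizes the sum of squared residuals
    y_hat = A @ coef
    sse = np.sum((y - y_hat) ** 2)                  # residual sum of squares
    ssr = np.sum((y_hat - y.mean()) ** 2)           # regression sum of squares
    r = np.corrcoef(y, y_hat)[0, 1]                 # correlation between observed and fitted values
    mse = sse / (n - k - 1)                         # residual mean square (assumed convention)
    f = (ssr / k) / mse                             # Fisher's F for the overall regression
    return coef, r, mse, f


rng = np.random.default_rng(1)
X_demo = rng.normal(size=(30, 3))                   # 30 compounds, 3 descriptors (synthetic)
y_demo = 4.0 + X_demo @ np.array([0.5, -0.3, 0.2]) + rng.normal(scale=0.1, size=30)
coef, r, mse, f = fit_mlr(X_demo, y_demo)
print(coef.round(2), round(r, 3), round(mse, 4), round(f, 1))
```

If SciPy is available, the p-value of the overall F test can be obtained with `scipy.stats.f.sf(f, k, n - k - 1)`.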

2.3.3. Artificial Neural Network (ANN)

The ANN method was applied to improve the characterization of the studied compounds and to generate a predictive QSAR model between the molecular descriptors selected in the MLR equation and the observed activity values. The ANN analysis was performed with MATLAB (2014) using the Neural Network Toolbox (nntool). An artificial neural network is a nonlinear empirical method [20] used to predict an experimental effect, and its applications are growing in many disciplines. It is an attractive alternative to traditional statistical methods for data processing.
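The source does not state the network architecture or training algorithm. The following is therefore only a minimal NumPy sketch of the general technique, under assumed settings: one hidden layer of tanh units, a linear output, and plain full-batch gradient descent on the mean squared error. The number of hidden neurons, learning rate, and epoch count are hypothetical, and this is not the MATLAB training procedure used in the study.

```python
import numpy as np


def train_mlp(X, y, n_hidden=4, lr=0.05, epochs=2000, seed=0):
    """One-hidden-layer network (tanh hidden units, linear output) trained by gradient descent on MSE."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    n, d = X.shape
    W1 = rng.normal(scale=0.5, size=(d, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.5, size=(n_hidden, 1))
    b2 = np.zeros(1)
    losses = []
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)              # hidden activations, shape (n, n_hidden)
        out = H @ W2 + b2                     # linear output, shape (n, 1)
        err = out - y
        losses.append(float(np.mean(err ** 2)))
        # Backpropagation of the mean squared error
        g_out = 2.0 * err / n
        gW2, gb2 = H.T @ g_out, g_out.sum(axis=0)
        g_h = (g_out @ W2.T) * (1.0 - H ** 2) # derivative of tanh
        gW1, gb1 = X.T @ g_h, g_h.sum(axis=0)
        W1 -= lr * gW1
        b1 -= lr * gb1
        W2 -= lr * gW2
        b2 -= lr * gb2

    def predict(X_new):
        """Apply the trained network to new descriptor rows."""
        H_new = np.tanh(np.asarray(X_new, dtype=float) @ W1 + b1)
        return (H_new @ W2 + b2).ravel()

    return predict, losses


rng = np.random.default_rng(2)
X_demo = rng.normal(size=(30, 3))             # synthetic, standardized descriptors
y_demo = np.tanh(X_demo[:, 0]) - 0.5 * X_demo[:, 1] ** 2 + 0.3 * X_demo[:, 2]
y_demo = (y_demo - y_demo.mean()) / y_demo.std()
predict, losses = train_mlp(X_demo, y_demo)
print(round(losses[0], 3), round(losses[-1], 3))
```

With only 30 training compounds, a very small hidden layer keeps the number of weights low, which limits overfitting; this is one reason cross-validated and test-set statistics should be checked alongside the training R.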

2.3.4. Internal and External Validation

To determine the stability of the predictive model and to test the influence of each sample on the final model, two basic procedures, internal validation and external validation, were carried out in this study. Cross-validation is one of the most popular methods for internal validation. There are at least three cross-validation techniques: the holdout method, k-fold cross-validation, and leave-one-out cross-validation. In this study, the internal predictive capability of the model was evaluated by leave-one-out cross-validation (Rcv). A high Rcv indicates good robustness and high internal predictive power of a QSAR model. However, recent studies [21] indicate that there is no evident correlation between the value of Rcv and the actual predictive power of a QSAR model, suggesting that Rcv alone is inadequate as a reliable estimate of a model's predictive power for new chemicals. To test predictive power reliably, an external validation set, not used in developing the model, is required. Provided the original data set is large enough, it can easily be divided into two parts: a training set on which the model is developed and a validation set used to characterize its predictive power.
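Leave-one-out cross-validation for a linear model can be sketched as below. The source does not give the exact formula behind Rcv. As an assumption, it is computed here as the Pearson correlation between observed values and LOO predictions, and the related Q² = 1 − PRESS/SS_total is returned alongside it.

```python
import numpy as np


def loo_predictions(X, y):
    """Leave-one-out: refit the OLS model n times, each time predicting the held-out compound."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    A = np.column_stack([np.ones(n), X])        # intercept + descriptors
    preds = np.empty(n)
    for i in range(n):
        mask = np.arange(n) != i                # every compound except compound i
        coef, *_ = np.linalg.lstsq(A[mask], y[mask], rcond=None)
        preds[i] = A[i] @ coef                  # prediction for the held-out compound
    return preds


def loo_statistics(X, y):
    """Return (Rcv, Q2) under the assumed definitions described above."""
    y = np.asarray(y, dtype=float)
    preds = loo_predictions(X, y)
    press = np.sum((y - preds) ** 2)            # predictive residual sum of squares
    q2 = 1.0 - press / np.sum((y - y.mean()) ** 2)
    r_cv = np.corrcoef(y, preds)[0, 1]
    return r_cv, q2


rng = np.random.default_rng(3)
X_demo = rng.normal(size=(30, 3))
y_demo = 4.0 + X_demo @ np.array([0.5, -0.3, 0.2]) + rng.normal(scale=0.1, size=30)
r_cv, q2 = loo_statistics(X_demo, y_demo)
print(round(r_cv, 3), round(q2, 3))
```

For linear least squares, the same LOO residuals can be obtained without refitting, as e_i/(1 − h_ii), where h_ii are the diagonal elements of the hat matrix; the explicit loop is kept here because it also applies to nonlinear models.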

To further assess the predictive ability of the developed QSAR models, another group of metrics, the rm² metrics, which measure the proximity between observed and predicted activities, was introduced by Roy and Roy [22]. These metrics are calculated from the correlation between the observed and predicted response data. Two variants of this parameter, r̄m² (the average rm²) and Δrm², are calculated for both the training set (internal validation) and the test set (external validation). For an acceptable QSAR model, r̄m² should be > 0.5 and Δrm² should be < 0.2.
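The rm² metrics can be computed as in the following sketch. It assumes the usual formulation rm² = r²(1 − √|r² − r0²|), where r0² is the squared correlation for a regression line forced through the origin. The metric is evaluated once with observed values regressed on predicted ones and once with the axes interchanged; r̄m² is their mean and Δrm² their absolute difference. The pIC50 values in the usage lines are hypothetical.

```python
import numpy as np


def rm2_metrics(y_obs, y_pred):
    """Return (average r_m^2, delta r_m^2) for observed versus predicted activities."""
    y = np.asarray(y_obs, dtype=float)
    p = np.asarray(y_pred, dtype=float)
    r2 = np.corrcoef(y, p)[0, 1] ** 2

    def r0_squared(a, b):
        # r^2 of a versus b for the regression through the origin, a ~ k*b
        k = np.sum(a * b) / np.sum(b * b)
        return 1.0 - np.sum((a - k * b) ** 2) / np.sum((a - a.mean()) ** 2)

    rm2 = r2 * (1.0 - np.sqrt(abs(r2 - r0_squared(y, p))))        # observed regressed on predicted
    rm2_prime = r2 * (1.0 - np.sqrt(abs(r2 - r0_squared(p, y))))  # axes interchanged
    return (rm2 + rm2_prime) / 2.0, abs(rm2 - rm2_prime)


y_obs = np.array([3.2, 3.8, 4.1, 4.6, 5.0, 3.5])    # hypothetical observed pIC50 values
y_pred = np.array([3.4, 3.7, 4.3, 4.4, 4.8, 3.6])   # hypothetical predictions
avg, delta = rm2_metrics(y_obs, y_pred)
print(round(avg, 3), round(delta, 3), avg > 0.5 and delta < 0.2)
```

Because both axis orientations enter symmetrically, r̄m² and Δrm² do not depend on which orientation is labeled rm² and which r'm².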

3. Results and Discussion

3.1. Data Set for Analysis

The QSAR study was performed using the previously reported activity values of the 36 thiadiazole derivatives [12] to determine the quantitative relationship between the structures of the studied compounds and their biological activity. The values of the 12 chemical descriptors are shown in Table 2.

3.2. Principal Component Analysis

The 12 descriptors calculated for the 36 molecules were submitted to principal component analysis (PCA). The first three axes, F1, F2, and F3, account for 37.69%, 18.69%, and 17.81% of the total variance, respectively, and together capture 74.19% of the total information.

PCA was conducted to identify the correlations between the different descriptors; it is also helpful for understanding the distribution of the compounds [20]. The correlation matrix of the 12 descriptors is shown in Table 3.

The correlation coefficients in this matrix show how strongly or weakly the descriptors are interrelated. Collinearity (r > 0.5) was generally present between most of the variables; however, no descriptor was strongly correlated with the others.

3.3. Multiple Linear Regressions

The descriptors were then used to develop a linear mathematical model that quantitatively predicts the physicochemical effects of substituents on the antileishmanial activity of the 30 training-set thiadiazole derivatives, using backward selection in multiple regression analysis.

In backward (descending) MLR, descriptors are eliminated one at a time until a valid model is obtained; this procedure was used to determine the best regression model.
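Backward elimination can be sketched as follows. The source does not state XLSTAT's removal criterion. As an assumption, this sketch repeatedly drops the descriptor whose coefficient has the smallest absolute t statistic until every remaining |t| is at least 2, which roughly corresponds to p < 0.05 for about 25 residual degrees of freedom. The descriptor names are hypothetical.

```python
import numpy as np


def ols_t_values(A, y):
    """t statistics of OLS coefficients (A already contains the intercept column)."""
    n, p = A.shape
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    s2 = resid @ resid / (n - p)                        # residual variance
    se = np.sqrt(s2 * np.diag(np.linalg.inv(A.T @ A)))  # standard errors of the coefficients
    return coef / se


def backward_elimination(X, y, names, t_min=2.0):
    """Drop the descriptor with the smallest |t| until all remaining |t| >= t_min."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    keep = list(range(X.shape[1]))
    while keep:
        A = np.column_stack([np.ones(len(y)), X[:, keep]])
        t = np.abs(ols_t_values(A, y)[1:])              # skip the intercept
        worst = int(np.argmin(t))
        if t[worst] >= t_min:
            break                                       # every remaining descriptor is retained
        keep.pop(worst)                                 # remove the least significant descriptor
    return [names[j] for j in keep]


rng = np.random.default_rng(4)
X_demo = rng.normal(size=(30, 6))
y_demo = 4.0 + 1.0 * X_demo[:, 0] - 0.8 * X_demo[:, 1] + rng.normal(scale=0.1, size=30)
names = ["d1", "d2", "d3", "d4", "d5", "d6"]            # hypothetical descriptor names
print(backward_elimination(X_demo, y_demo, names))
```

Removing one descriptor at a time and refitting matters because the t statistics of the remaining descriptors change after each removal.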

Many attempts were made to relate the descriptors to the activity indicator pIC50, but the best relationship obtained by this method corresponds to a linear combination of three selected descriptors: the energy , the energy , and the octanol/water partition coefficient log P.

The resulting equation is given in (1), where N is the number of compounds, R is the correlation coefficient, Rcv is the cross-validation correlation coefficient, MSE is the mean squared error, F is Fisher's criterion, and P is the significance level. The positive correlation of these factors and log P with pIC50 in (1) shows that increasing their values increases pIC50, whereas the negative correlation of the remaining descriptor shows that increasing its value decreases pIC50. The correlation between the predicted and observed activities is illustrated in Figure 1.

The relatively high correlation coefficient (R = 0.750) and low mean squared error indicate that the model is reliable. A P value below 0.05 means that the equation is statistically significant at the 95% level. The model was cross-validated by its appreciable Rcv value (Rcv = 0.722), obtained with the leave-one-out (LOO) method. An Rcv greater than 0.5 is the basic criterion for qualifying a model as valid [21]. Additionally, the rm² metric values (r̄m² and Δrm²) indicate that the QSAR model is acceptable. The model was also used to predict the pIC50 values of the remaining 6 compounds in the test set; these predicted values are recorded in Table 5. The test-set correlation coefficient Rtest is 0.840 and the test mean squared error MSEtest is 0.119, which confirms the good predictive ability of the model. Collinearity among the descriptors in the model was assessed with the variance inflation factor (VIF), as shown in Table 4.

The VIF is defined as 1/(1 − R²), where R is the multiple correlation coefficient obtained by regressing one independent variable on all the other descriptors in the model. A VIF greater than 5 indicates an unstable model that should be rejected, whereas VIF values from 1 to 4 are acceptable. Table 4 shows that the VIF values of all three descriptors are smaller than 5.0. Thus, there is no significant collinearity among the selected descriptors, and the model is stable.
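The VIF definition above translates directly into code: each descriptor is regressed on the others, with an intercept, and 1/(1 − R²) is taken. A minimal NumPy sketch on synthetic data, in which one descriptor is deliberately made nearly collinear with another:

```python
import numpy as np


def vif(X):
    """VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing descriptor j on the other descriptors."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = []
    for j in range(k):
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])  # intercept + other descriptors
        coef, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        resid = X[:, j] - others @ coef
        r2 = 1.0 - resid @ resid / np.sum((X[:, j] - X[:, j].mean()) ** 2)
        out.append(1.0 / (1.0 - r2))
    return np.array(out)


rng = np.random.default_rng(5)
X_demo = rng.normal(size=(30, 3))
X_demo[:, 2] = X_demo[:, 0] + 0.1 * rng.normal(size=30)   # descriptor 3 nearly collinear with descriptor 1
print(vif(X_demo).round(1))
```

Equivalently, the VIFs are the diagonal elements of the inverse of the descriptors' correlation matrix, which gives a quick cross-check.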

The pIC50 values predicted by the MLR model (1) and the observed values are given in Table 5.

3.4. Multiple Nonlinear Regression

We also used a nonlinear regression model to improve the structure-activity relationship and to evaluate the effect of the substituents. The descriptors selected by multiple linear regression were applied to the 30 molecules of the training set, and the correlation coefficient (R) and the mean squared error (MSE) were used to select the best-performing regression. A pre-programmed XLSTAT function was used, in which a, b, c, d… represent the parameters and X1, X2, X3, X4… represent the variables.
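The XLSTAT function itself is not reproduced in the text. As an assumption, the sketch below uses a second-order polynomial without cross terms, Y = a + b·X1 + c·X2 + … + f·X1² + g·X2² + …, which is one plausible form for such a function. Because this form is linear in its parameters, it can still be fitted by ordinary least squares.

```python
import numpy as np


def fit_quadratic(X, y):
    """Least-squares fit of y = a + sum_j b_j*x_j + sum_j c_j*x_j**2 (no cross terms)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), X, X ** 2])   # [1, X1..Xk, X1^2..Xk^2]
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    y_hat = A @ coef
    r = np.corrcoef(y, y_hat)[0, 1]                      # correlation of observed and fitted values
    return coef, r


rng = np.random.default_rng(6)
X_demo = rng.normal(size=(30, 3))
y_demo = (1.0 + 2.0 * X_demo[:, 0] - 1.0 * X_demo[:, 1] + 0.5 * X_demo[:, 2]
          + 0.3 * X_demo[:, 0] ** 2 - 0.2 * X_demo[:, 2] ** 2)
coef, r = fit_quadratic(X_demo, y_demo)
print(coef.round(3), round(r, 3))
```

With three descriptors this form has seven parameters, so with 30 training compounds the extra terms should be justified by the cross-validated and test-set statistics rather than by the training R alone.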

The resulting equation is given in (4). The pIC50 values predicted by this model are close to the observed values, and Figure 2 shows a very regular distribution of the predicted activities around the observed values. The correlation coefficient of (4), R = 0.782, is satisfactory. The QSAR model expressed by (4) is cross-validated by its appreciable Rcv value (Rcv = 0.744), obtained using the leave-one-out (LOO) method. An Rcv greater than 0.5 is an important criterion for qualifying a QSAR model as valid [21]. In addition, the rm² metric values (r̄m² and Δrm²) indicate that the QSAR model is acceptable.

The robustness and predictive power of the model are further supported by the high test-set correlation coefficient (Rtest = 0.850). The observed and calculated pIC50 values are given in Table 5.

According to the internal and external metric values, the MNLR results are somewhat better than the MLR results. However, MLR is more transparent: it gives the most interpretable results and a clear explanation of the descriptors associated with the antileishmanial activity of the 36 thiadiazole derivatives.

3.5. Artificial Neural Networks (ANN)

To better characterize the compounds, an artificial neural network can be used to generate predictive QSAR models between the molecular descriptors selected by multiple linear regression and the observed activities. The ANN model was developed using the descriptors of the studied compounds. The correlation between the ANN-calculated and experimental activity values is highly significant, as illustrated in Figure 3.

The correlation coefficient obtained for this data set of thiadiazole derivatives is R = 0.967, the highest of the three approaches. Furthermore, the good results obtained on the test set show that the ANN model has high predictive power. The activities predicted by the artificial neural network and the observed values are given in Table 5. A comparison of the MLR, MNLR, and ANN models shows that the ANN model gives the best fit (R = 0.967, versus 0.750 for MLR and 0.782 for MNLR). The ANN was thus able to establish a satisfactory relationship between the selected descriptors and the activity of the studied compounds.

3.6. Domain of Applicability

To estimate the reliability of a QSAR model and its ability to predict new compounds, its domain of applicability must be defined [23]. Predictions for compounds that fall within this domain may be considered reliable. The applicability domains of the MLR and MNLR models were examined with the Williams plots in Figures 4 and 5, respectively, in which the standardized residuals are plotted against the leverage values (h).

The approach is based on calculating the leverage of each molecule whose activity is predicted by the QSAR model:

$$h_i = \mathbf{x}_i^{T} \left( \mathbf{X}^{T} \mathbf{X} \right)^{-1} \mathbf{x}_i$$

where x_i is the row vector of the descriptors of compound i, X is the descriptor matrix derived from the training-set values, and the superscript T denotes the transpose. The critical leverage is generally fixed at h* = 3(k + 1)/N, where N is the number of training molecules and k is the number of model descriptors. If the leverage h of a molecule is higher than the critical value, i.e., h > h*, the prediction for that compound is considered unreliable.
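The leverage calculation and the Williams-plot classification can be sketched as follows. As an assumption, an intercept column is added to the descriptor matrix, so that p = k + 1 parameters match the commonly used warning leverage h* = 3(k + 1)/N. Compounds with h > h* are flagged as outside the applicability domain, and compounds with |standardized residual| > 3 are flagged as outliers. The query compounds are hypothetical.

```python
import numpy as np


def leverages(X_train, X_query):
    """h_i = x_i^T (X^T X)^{-1} x_i, with an intercept column added to both matrices."""
    Xt = np.column_stack([np.ones(len(X_train)), np.asarray(X_train, dtype=float)])
    Xq = np.column_stack([np.ones(len(X_query)), np.asarray(X_query, dtype=float)])
    G = np.linalg.inv(Xt.T @ Xt)
    return np.einsum("ij,jk,ik->i", Xq, G, Xq)   # row-wise quadratic form x_i^T G x_i


def williams_flags(h, std_resid, k, n_train):
    """Return h*, the 'outside domain' flags (h > h*), and the outlier flags (|residual| > 3)."""
    h_star = 3 * (k + 1) / n_train
    return h_star, np.asarray(h) > h_star, np.abs(np.asarray(std_resid)) > 3


rng = np.random.default_rng(7)
X_train = rng.normal(size=(30, 3))                      # synthetic training descriptors
X_new = np.array([[0.1, -0.2, 0.3], [6.0, 6.0, 6.0]])   # hypothetical query compounds
h = leverages(X_train, X_new)
h_star, outside, _ = williams_flags(h, np.zeros(len(h)), k=3, n_train=30)
print(h.round(3), h_star, outside)
```

A useful sanity check is that the leverages of the training compounds themselves sum to k + 1, the trace of the hat matrix.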

Figure 4 shows that the leverage values (h) of all compounds in the training and test sets are less than the critical value (h*). In addition, the standardized residuals of all compounds in both sets lie within ±3 standard deviation units (±3σ). Therefore, the activities predicted by the MLR model are reliable.

The Williams plot for the MNLR model is shown in Figure 5. There are no outliers in the training set and no test-set compounds outside the applicability domain. Therefore, the activities predicted by the MNLR model are reliable.

3.7. Proposed Novel Compounds

Consequently, the MLR and MNLR models can be used to design new compounds whose antileishmanial activities differ from, and improve on, those of the studied compounds (Table 6). Based on the above results, we introduced suitable substituents and calculated the activities of the new compounds using models (1) and (4).

The leverages of the new compounds are below the critical value (h*), so their predicted activities can be regarded as reliable. These compounds are therefore reasonable candidates with different and improved activity compared with the studied compounds, and the results presented here can support the design and development of new, safer drugs.

4. Conclusion

Statistical analysis methods were used to develop quantitative structure-activity relationship models of the antileishmanial activity of (5-nitroheteroaryl-1,3,4-thiadiazole-2-yl) piperazinyl derivatives. The artificial neural network performed better than the multiple linear and nonlinear regressions. We established satisfactory relationships between several descriptors and the antileishmanial activity. The results show that the proposed models can predict the activity accurately and that the selected descriptors are pertinent to explaining it. The accuracy and predictability of the proposed models are illustrated by comparing the activities predicted by the different models (Table 5). The applicability domain of the proposed models was investigated using Williams plots to identify the subspace of chemical structures that the two regression models can predict reliably. The proposed methods could reduce the time and cost of synthesizing (5-nitroheteroaryl-1,3,4-thiadiazole-2-yl) piperazinyl derivatives and determining their antileishmanial activity. Furthermore, the descriptors are sufficiently rich in chemical, electronic, and topological information to encode structural features, and they could be used together with other descriptors in the development of QSAR models.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

We are grateful to the “Association Marocaine des Chimistes Théoriciens” (AMCT) for its valuable help with the software.