Abstract

The carbonic anhydrases (CAs) (or carbonate dehydratases) form a family of metalloenzymes that catalyze the rapid interconversion of carbon dioxide and water to bicarbonate and protons (or vice versa), a reversible reaction that occurs rather slowly in the absence of a catalyst. The β-CAs have been characterized in a large number of human pathogens, such as the fungi/yeasts Candida albicans, Candida glabrata, Cryptococcus neoformans, and Saccharomyces cerevisiae and the bacteria Helicobacter pylori, Mycobacterium tuberculosis, Haemophilus influenzae, Brucella suis, and Streptococcus pneumoniae. The β-CAs in microorganisms provide the physiological concentrations of carbon dioxide and bicarbonate (CO2/HCO3−) required for their growth. Inhibition of β-CAs from pathogenic microorganisms is therefore being explored as a novel pharmacological approach to treating the infections caused by these organisms. The present study aimed to establish, in quantitative terms, a relationship between the β-CA inhibitory activity of structurally related sulphonamide derivatives and their physicochemical descriptors. A statistically validated two-dimensional quantitative structure-activity relationship (2D QSAR) model was obtained through multiple linear regression (MLR) analysis using the VLife Molecular Design Suite (MDS). Five descriptors, showing either positive or negative correlation with the β-CA inhibitory activity, were included in the model. This validated 2D QSAR model may be used to design sulphonamide derivatives with better inhibitory properties.

1. Introduction

The CAs belong to a family of metalloenzymes that catalyze the rapid interconversion of carbon dioxide and water to bicarbonate and protons (or vice versa), a reversible reaction that occurs rather slowly in the absence of a catalyst. The active site of most carbonic anhydrases contains a zinc ion [1]. Five genetically distinct classes of CAs are known to date. The α-CAs are present in vertebrates, protozoa, algae, and some bacteria and also in the cytoplasm of green plants [2]. The β-CAs are predominantly found in bacteria, algae, the chloroplasts of both mono- and dicotyledons, and some fungi and archaea [3], whereas the γ-CAs are found in archaea and some bacteria [4]. Both the δ- and ζ-CAs are present only in marine diatoms [5].

These enzymes, which catalyze the interconversion between carbon dioxide and bicarbonate with release of a proton, are involved not only in pH homeostasis and regulation but also in biosynthetic reactions such as gluconeogenesis and ureagenesis in animals, CO2 fixation in plants and algae, and electrolyte secretion in a variety of tissues and organs. Many of the 16 mammalian CA isozymes are established drug targets for the design of diuretic, antiglaucoma, antiepileptic, antiobesity, and/or anticancer agents [6–10].

The first bacterial CA to be recognized as a β-CA was that encoded by the CynT gene of Escherichia coli [11]. More than a dozen eubacterial β-CAs are now known [12], including enzymes from common pathogens such as Helicobacter pylori, Mycobacterium tuberculosis, and Salmonella typhimurium. β-CAs are also known in the Archaea (Methanobacterium thermoautotrophicum) [13], in yeast (Saccharomyces cerevisiae) [14], in cyanobacteria (Synechocystis PCC6803) [15], in the carboxysomes of chemoautotrophic bacteria (Halothiobacillus neapolitanus) [16], and in green (Chlamydomonas reinhardtii) [17] and red algae (Porphyridium purpureum) [18].

The β-CAs in a microorganism provide the physiological concentrations of carbon dioxide and bicarbonate (CO2/HCO3−) required for its growth. Thus, inhibition of β-CAs from pathogenic microorganisms is emerging as a novel pharmacological approach to treating the infections caused by them [19].

Quantitative structure-activity relationship (QSAR) analysis is a computerised statistical method that tries to explain the observed variance in the biological effect of compounds as a function of the molecular changes caused by the nature of their substituents. The QSAR approach has become very useful and widespread for the prediction of biological activities, particularly in drug design. It is based on the assumption that variations in the properties of compounds can be correlated with changes in their molecular features [20].

Several agents belonging to different chemical classes, such as sulphonamides, sulfamates, aromatic and aliphatic carboxylates, boronic acids, and dithiocarbamates, have been reported to inhibit β-CAs in in vitro inhibition studies [21–31]. Amongst these, sulphonamide derivatives have shown especially good β-CA inhibitory activity and thus display potential for development as effective antimicrobial agents [29]. While there is a report that correlates the structures of some sulphonamide derivatives with their α-CA inhibitory activity [32], no attempts have so far been made to correlate the structures of reported β-CA inhibitors (β-CAIs) with their inhibitory activity. Hence, it was thought appropriate to perform a QSAR study to understand the correlation between the physicochemical parameters and the β-carbonic anhydrase inhibitory (β-CAI) activity of the sulphonamide derivatives reported in the literature. It is expected that such 2D QSAR studies will provide better tools for the rational design of promising β-CAIs.

2. Methodology

2.1. Materials and Methods

All molecular modeling studies were performed using VLife MDS [33]. The studies were carried out on a Dell PC with a Pentium IV processor and the Windows XP operating system. The structures of all compounds were sketched and cleaned in ChemDraw Ultra version 8.0 [34]. Energy minimization and geometry optimization were conducted using the Merck molecular force field (MMFF) method, with the root mean square gradient set to 0.01 kcal/mol·Å and the iteration limit set to 10,000.

The structures of the sulphonamide derivatives, along with their inhibitory activity against Candida albicans (Nce103), were taken from the literature [21, 22, 29], and the data are presented in Tables 1, 2, and 3. These reported sulphonamide derivatives had been assayed for inhibition of CO2 hydration activity using an Applied Photophysics instrument [35]. The β-CAI activity data were expressed as the negative logarithm of the inhibition value on a molar basis and were regarded as the dependent variable for quantitative analysis. A total of 65 molecules were divided into a training set and a test set to allow external validation of the model derived from the appropriate descriptors. Nearly 30% of the molecules in the total population were selected manually as test molecules. The selection of molecules for the training and test sets is a key feature of any QSAR model. Therefore, care was taken that the test set contained structures representative of the whole data set.
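
As an illustration of the activity transformation described above, the following minimal Python sketch converts an inhibition value to its negative logarithm on a molar basis and computes the share of molecules held out for testing. The nanomolar input unit, the function name to_p_scale, and the two example values are assumptions made for illustration; they are not data from Tables 1, 2, and 3.

```python
import math


def to_p_scale(value_nM: float) -> float:
    """Negative log10 of an inhibition value on a molar basis.

    Assumes the input is given in nanomolar (nM). Since 1 nM = 1e-9 M,
    -log10(value_nM * 1e-9) equals 9 - log10(value_nM).
    """
    if value_nM <= 0:
        raise ValueError("inhibition value must be positive")
    return 9.0 - math.log10(value_nM)


# Hypothetical inhibition values in nM, for illustration only.
example_nM = {"cpd_A": 1000.0, "cpd_B": 10.0}
p_activity = {name: to_p_scale(v) for name, v in example_nM.items()}

# Share of the 65 molecules held out as the model-1 test set (20 of 65).
test_fraction = 20 / 65

print(p_activity)
print(f"test fraction = {test_fraction:.3f}")
```

On this scale a more potent inhibitor (a smaller inhibition value) receives a larger activity value, which is why the transformed quantity is convenient as the dependent variable.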

2.2. Descriptor Calculation

A 2D QSAR study requires the calculation of molecular descriptors. A large number of theoretical 2D descriptors were computed: individual descriptors such as Mol. Wt., Volume, XlogP, and smr; physicochemical descriptors such as Estate Numbers, Estate contributions, Polar Surface Area, Element Count, Dipole moment, Hydrophobicity XlogpA, and Hydrophobicity SlogpA; and topological descriptors such as T_2_Cl_6, T_C_Cl_6, T_T_S_7, and T_T_Cl_7. A total of 736 descriptors were calculated by the QSAR Plus module within VLife MDS. Highly correlated descriptors were removed, and descriptors having the same value for all molecules were also ignored. The reduced set of descriptors was then subjected to Forward-Backward Stepwise Variable Selection to eliminate further nonsignificant descriptors, and finally only five significant descriptors were considered in our 2D QSAR analysis.
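
The two pruning steps described above (removal of invariant descriptors and of highly correlated descriptors) can be sketched as follows. This is a minimal illustration and not the procedure implemented in VLife MDS: the correlation cutoff of 0.9, the function names, and the descriptor values are all assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long, non-constant columns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def prune_descriptors(table, r_cutoff=0.9):
    """Return the names of descriptors that survive pruning.

    table maps a descriptor name to its list of values (one per molecule).
    A column is dropped if it is invariant, or if its absolute correlation
    with an already kept column exceeds r_cutoff (assumed cutoff).
    """
    kept = []
    for name, column in table.items():
        if len(set(column)) == 1:
            continue  # same value for every molecule: carries no information
        if any(abs(pearson(column, table[k])) > r_cutoff for k in kept):
            continue  # redundant with a descriptor that is already kept
        kept.append(name)
    return kept


# Hypothetical descriptor values for four molecules, for illustration only.
descriptor_table = {
    "MolWt": [100.0, 150.0, 200.0, 250.0],
    "Volume": [101.0, 149.0, 202.0, 251.0],  # nearly collinear with MolWt
    "ClCount": [0.0, 0.0, 0.0, 0.0],         # invariant
    "XlogP": [1.2, 0.3, 2.5, 0.9],
}
surviving = prune_descriptors(descriptor_table)
print(surviving)
```

In this sketch the order of the columns decides which member of a correlated pair is retained; a real workflow might instead keep the descriptor that correlates better with the activity.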

2.3. Data Sets
2.3.1. Model-1

The training set contained 45 molecules: a01, a02, a04, a05, a06, a07, a09, a10, a11, a14, a15, a17, a18, a19, a23, a25, a27, p01, p02, p05, p06, p07, p08, p09, p10, p11, p12, p14, u01, u02, u03, u05, u07, u08, u09, u10, u14, u15, u16, u17, u19, u20, u22, u23, and u25.

The test set contained 20 molecules: a08, a16, a20, a26, a41, p03, p04, p13, p15, u04, u06, u11, u12, u13, u18, u21, u24, u26, u27, and u28.

2.3.2. Model-2

The training set contained 43 molecules: a01, a04, a07, a09, a11, a15, a19, a20, a23, a27, p01-p15, u01, u02, u03, u04, u05, u07, u08, u09, u10, u12, u13, u14, u15, u18, u19, u20, u23, u24, u27, and u28.

The test set contained 19 molecules: a02, a05, a06, a08, a10, a14, a16, a17, a18, a25, a26, a41, u03, u06, u07, u11, u16, u17, and u22.

2.4. Model Development

The 2D QSAR model was generated by the MLR method using VLife MDS. The model expresses the relation between the biological activity (dependent variable) and the molecular descriptors (independent variables) as a linear equation. This method of regression estimates the values of the regression coefficients by least squares curve fitting. MLR is the traditional and standard approach for multivariate data analysis, that is, the analysis of multidimensional data matrices by statistical methods. Such data matrices can involve dependent and/or independent variables. To obtain reliable results, the parameters were set such that the number of independent variables (descriptors) in the regression equation was at most one-fifth of the number of compounds.
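
The least squares estimation of regression coefficients described above can be sketched in plain Python by solving the normal equations. This is an illustrative sketch under stated assumptions, not the VLife MDS implementation: the function names and the small synthetic data set (constructed so that y = 1 + 2·x1 − 3·x2 exactly) are hypothetical.

```python
def solve_linear(a, b):
    """Solve a·x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_mlr(X, y):
    """Return [intercept, b1, ..., bk] minimising the sum of squared residuals."""
    rows = [[1.0] + list(r) for r in X]  # prepend a column of ones
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear(xtx, xty)


def predict(coef, row):
    """Predicted activity for one molecule from its descriptor values."""
    return coef[0] + sum(c * v for c, v in zip(coef[1:], row))


# Synthetic data for illustration only: y = 1 + 2*x1 - 3*x2.
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
y = [1.0, 3.0, -2.0, 0.0, 2.0]
coef = fit_mlr(X, y)
print(coef)

# Descriptor-to-compound ratio: 43 training molecules allow at most 8 descriptors.
max_descriptors = 43 // 5
print(max_descriptors)
```

With 43 training molecules in model-2, the five selected descriptors stay within the limit of one descriptor per five compounds.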

2.5. Statistical Analysis

The statistical quality of the generated model was judged on parameters such as the squared correlation coefficient $r^2$; the cross-validated $q^2$, which is a relative measure of the quality of fit; Fisher's value ($F$-test), which represents the $F$-ratio between the variance of the calculated activity and the residual variance; and $\mathrm{pred\_}r^2$ [36].

Internal validation is the most direct way to evaluate the quality of a regression model, and the leave-one-out (LOO) cross-validation method is most commonly used for it. In the LOO method, one object (one biological activity value) is eliminated from the training set, so that the training data are divided into as many equally sized subsets as there are data points. A model is built from each subset, and the dependent variable of the data point that was not included in that subset is then predicted. The same procedure is repeated after elimination of another object until all objects have been eliminated once. Because all data points are sequentially treated as predicted in the LOO subsets, the mean of the predicted values is the same for $r^2$ and LOO $q^2$ (the cross-validated correlation coefficient). To calculate $q^2$, the following equation was used:

$$q^2 = 1 - \frac{\sum \left(y_{\mathrm{pred}} - y_{\mathrm{act}}\right)^2}{\sum \left(y_{\mathrm{act}} - y_{\mathrm{mean}}\right)^2} \tag{1}$$

where $y_{\mathrm{pred}}$, $y_{\mathrm{act}}$, and $y_{\mathrm{mean}}$ are the predicted, actual, and mean values of the activity, respectively, and $\sum (y_{\mathrm{pred}} - y_{\mathrm{act}})^2$ is the predictive residual error sum of squares (PRESS).

The definitive validity of a model is also examined by means of external validation, which evaluates how well the equation generalizes. To calculate $\mathrm{pred\_}r^2$, the following equation was used:

$$\mathrm{pred\_}r^2 = 1 - \frac{\sum \left(y_{\mathrm{pred(test)}} - y_{\mathrm{test}}\right)^2}{\sum \left(y_{\mathrm{test}} - y_{\mathrm{training}}\right)^2} \tag{2}$$

where $y_{\mathrm{pred(test)}}$ and $y_{\mathrm{test}}$ are the predicted and observed activity values, respectively, of the test set compounds, and $y_{\mathrm{training}}$ is the mean activity value of the training set.

The statistical significance of these models was further supported by the "fitness plot" obtained for each model. This is a plot of experimental versus predicted activity for the training and test set compounds, and it gives an idea of how well the model was trained and how well it predicts the activity of the external test set (Figure 2) [37–39].
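
The internal (LOO) and external validation statistics defined above can be sketched as follows. The sketch assumes the standard forms of the two statistics, one minus PRESS over the total sum of squares about the training mean. For brevity it refits a one-descriptor straight line rather than the five-descriptor MLR model; the function names and the numerical values are hypothetical.

```python
def fit_line(xs, ys):
    """Least squares straight line; returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return my - slope * mx, slope


def predict_line(model, x):
    intercept, slope = model
    return intercept + slope * x


def q2_loo(xs, ys, fit=fit_line, predict=predict_line):
    """Leave-one-out q^2: each point is predicted by a model built without it."""
    y_mean = sum(ys) / len(ys)  # mean activity of the full training set
    press = 0.0
    for i in range(len(xs)):
        model = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - predict(model, xs[i])) ** 2
    total = sum((y - y_mean) ** 2 for y in ys)
    return 1.0 - press / total


def pred_r2(y_test, y_pred_test, y_training_mean):
    """External pred_r^2, referenced to the mean activity of the training set."""
    num = sum((o - p) ** 2 for o, p in zip(y_test, y_pred_test))
    den = sum((o - y_training_mean) ** 2 for o in y_test)
    return 1.0 - num / den


# Hypothetical, perfectly linear training data: every LOO prediction is exact.
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [2.0, 4.0, 6.0, 8.0]
q2 = q2_loo(train_x, train_y)

# Hypothetical test set: predicting the training mean for each compound gives 0.
external = pred_r2([5.0, 7.0], [6.0, 6.0], 6.0)
print(q2, external)
```

A model that predicts no better than the training set mean gives a pred_r^2 of zero, which is why values well above zero are needed before a model is regarded as predictive.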

3. Results and Discussion

MLR analysis was performed using VLife MDS 3.5 on the sulphonamide derivatives taken from the literature [21, 22, 29]. A two-way stepping algorithm was used for the selection of variables. The MLR QSAR model-1 (3) was generated first, but it was not found to be satisfactory. Thus, to improve the quality of the model, outliers (u21, u25, and u26) were identified in the data set by calculating residual values (observed activity minus predicted activity) and were removed. Compounds that have unexpected biological activity and cannot be fitted by a QSAR model are known as outliers [40]. Using the model-2 data set (Section 2.3.2), the new model-2 (4) was generated.
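
Residual-based identification of outliers, as described above, can be sketched as follows. The source does not state the threshold that was applied, so the rule used here (an absolute residual larger than two standard deviations of the residuals) is an assumption, as are the function name and the activity values.

```python
import statistics


def flag_outliers(observed, predicted, k=2.0):
    """Names of compounds whose |observed - predicted| exceeds k residual SDs.

    observed and predicted map compound names to activity values.
    The factor k = 2.0 is an assumed threshold, not one taken from the study.
    """
    residuals = {name: observed[name] - predicted[name] for name in observed}
    sd = statistics.pstdev(list(residuals.values()))
    return sorted(name for name, r in residuals.items() if abs(r) > k * sd)


# Hypothetical observed and predicted activities, for illustration only.
observed = {"c1": 6.0, "c2": 6.5, "c3": 7.0, "c4": 5.5, "c5": 6.2, "c6": 8.0}
predicted = {"c1": 6.1, "c2": 6.4, "c3": 7.1, "c4": 5.6, "c5": 6.1, "c6": 6.0}
outliers = flag_outliers(observed, predicted)
print(outliers)
```

Compounds flagged in this way would then be inspected and, if justified, excluded before the model is rebuilt on the reduced data set.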

From model-2 (4) we can conclude that, with the two-way stepping algorithm, the most significant descriptors contributing to the model are chi3cluster, SsssCHcount, Polar surface area including P and S, k3alpha, and SssOcount. The descriptors used in the model are described in Table 4. The generated QSAR model shows a high squared correlation coefficient between these descriptors and the β-CAI activity against Candida albicans Nce103; the squared correlation coefficient indicates that the model explains 90% of the variance in biological activity. Cross-validation of the model was performed by the LOO method, and the resulting $q^2$ value qualifies it as a valid model.

The contribution graph for the 2D QSAR model (Figure 1) reveals that the descriptors chi3cluster and k3alpha contribute positively, by 44.0% and 8.0%, respectively. The three remaining descriptors, SsssCHcount, Polar surface area including P and S, and SssOcount, contribute inversely to biological activity, by 30.30%, 10.0%, and 2.0%, respectively.

Figure 2 shows the data fitness plot of model (4). The plot gives an idea of the linearity of fit between the observed and predicted activity.

The correlation matrix given in Table 5 supports the conclusion that no two descriptors used in the model are strongly correlated with each other. The activities predicted by (4) for the molecules used in generating and testing the 2D QSAR model are presented in Table 6.

4. Conclusion

A set of 65 sulphonamide derivatives was subjected to 2D QSAR analysis using MLR to understand the correlation between the physicochemical parameters and the β-CAI activity. A valid QSAR model for designing newer sulphonamide derivatives and predicting their β-CAI activity has been successfully generated using the MLR method. The generated 2D QSAR model (4) indicates that chi3cluster and k3alpha contribute positively to β-CAI activity, while SsssCHcount, Polar surface area including P and S, and SssOcount contribute negatively to β-CAI activity.

Acknowledgment

The authors would like to thank the Principal of PDEA's S. G. R. S. College of Pharmacy, Saswad, for providing the facilities to complete this study.