Abstract

This paper proposes a new method that combines the Kennard-Stone algorithm (Kenstone, KS), hierarchical clustering (HC), and ant colony optimization (ACO)-based extreme learning machine (ELM) (KS-HC/ACO-ELM) with the density functional theory (DFT) B3LYP/6-31G(d) method to improve the accuracy of DFT calculations for Y-NO homolysis bond dissociation energies (BDE). In this method, Kenstone divides the whole data set into two parts, the training set and the test set; HC and ACO perform cluster analysis on the molecular descriptors; correlation analysis selects, from each class, the molecular descriptor most correlated with the experimental values; and ELM serves as the nonlinear model establishing the relationship between the DFT calculations and the experimental homolysis BDE values. The results show that the standard deviation of the homolysis BDE in the molecular test set is reduced from 4.03 kcal mol−1, as calculated by the DFT B3LYP/6-31G(d) method, to 0.30, 0.28, 0.29, and 0.32 kcal mol−1 by the KS-ELM, KS-HC-ELM, and KS-ACO-ELM methods and by the artificial neural network (ANN) combined with KS-HC, respectively. Compared with DFT calculations using a larger basis set, this method predicts accurate values with much higher efficiency, and it may achieve similarly accurate results for larger molecules.

1. Introduction

In the past few decades, quantum chemistry has attracted remarkable attention and made significant progress along with the corresponding fields of physics, mathematics, and computer science [1–7]. Quantum chemical methods have captured the physical essence of molecules through numerous research efforts, to the point where various molecular properties (including geometry, dipole moment, polarizability, thermodynamics, and excited states) can be obtained without experiments. However, quantum chemical calculations for large molecules can be quite demanding, in particular high-accuracy calculations that require solving the Schrödinger equation rigorously. Therefore, accuracy often has to be sacrificed for practical reasons, for example, by introducing approximations when solving the wave function for large molecular systems. Even with these approximations, the size of the molecules that can be treated is still limited. For a large system with a regular structure, quantum chemical methods can only give a qualitative description. Complex systems are hard to calculate with existing quantum chemical methods; even if the calculation can be carried out, the accuracy might be very low and the error far beyond what is acceptable in scientific research [8, 9]. Therefore, alternative methods for large or complex systems need to be developed to meet the challenges of molecular calculations.

Artificial intelligence methods can resolve the accuracy problem of quantum chemical calculations for molecular bond properties by establishing the relationship between the calculated and experimental values instead of rigorously solving wave functions at very high cost. In this way, high-precision results can be obtained without spending time on advanced quantum chemical methods and ultra-large basis sets, because the combined approach exploits the advantages of both components: the simple physical parameters adopted in the artificial intelligence model reduce the systematic errors induced by deficiencies in the underlying theories and functionals. Thus, quantum chemical calculations with low-level methods and small basis sets are sufficient for providing valid input data to the artificial intelligence methods. Meanwhile, small basis sets make quantum chemical calculations applicable to larger molecules, further increasing the chance of designing novel lead molecules.

Although the artificial intelligence method is a relatively new multidisciplinary research field, it has received a lot of attention in the past decade and has thus developed rapidly. The combination strategy consists of first using the quantum chemical method to obtain the molecular properties and then taking the quantum chemical calculation results as inputs for statistical methods that establish the relationship between the experimental and calculated values. There are many choices of statistical methods, including linear methods such as linear regression [10–14] and nonlinear methods such as neural networks [15–37] and support vector machines [38–40]. The linear regression method is simple and intuitive, but problems in molecular systems are mainly nonlinear; for the same descriptor inputs, neural networks can better solve complicated nonlinear problems that might be too difficult to model with explicit mathematical equations [27]. However, traditional neural networks have the following major drawbacks. (1) Training is slow: the gradient descent method generally takes many iterations to converge, so the training process takes a long time. (2) Training easily falls into a local minimum, so the global minimum can be missed during the calculation. (3) Performance is sensitive to the learning rate, so a proper learning rate must be chosen to obtain a stable network. If the learning rate is too small, convergence is very slow and the training process takes a long time; on the other hand, if it is too large, the training process might be unstable (divergent). Therefore, it is important to explore a training algorithm whose learning speed is comparably high and which increases the possibility of acquiring the globally optimal solution, leading to better generalization in prediction. The extreme learning machine (ELM) method avoids these defects, thus exhibiting better performance than traditional neural networks [3, 4].

Nitric oxide (NO) plays an important physiological role in the human life cycle [41–50]. It is difficult to obtain accurate homolysis BDE values of NO carrier molecules due to the complexity of the experiments. Professor Cheng’s group has recently devoted considerable effort to measuring Y-NO bond homolysis BDE in solvents [51–61], which provides a great opportunity for designing NO carrier molecules in silico. Hence, we herein combine artificial intelligence methods with DFT to achieve high-accuracy homolysis BDE results for the better design of potentially novel NO carriers.

The content of this paper is arranged as follows. First, it describes the HC, ACO, and ELM methods. Second, it illustrates the technology roadmap of the KS-HC/ACO-ELM method. Third, it discusses the calculation results of the Kenstone, HC, ACO, and ELM methods (i.e., the appropriate division into the training set and the test set, the clustering analyses of the molecular descriptors with both known and unknown numbers of classes, and the establishment of the ELM model, respectively). Finally, the results and discussion are summarized.

2. Methods

2.1. Hierarchical Clustering (Known Number of Clusters)

The basic idea of the hierarchical clustering (HC) method is to classify samples in terms of distances. Firstly, each sample forms its own class, distances between samples and between classes are defined, and the two nearest classes are combined into a new class; the distances between this new class and the remaining classes are then recalculated, and the nearest classes are merged again. By reducing the number of classes by one at a time, this process is repeated until all of the samples fall into a single class. The chart showing the clustering process is called a clustering chart (dendrogram). One feature of the HC method is that once samples are classified into the same class at a certain level, they always belong to the same class in the subsequent divisions.

There are a variety of ways to define the distance between classes. Different definitions generate different HC analysis methods, which include the following: the shortest-distance method, the longest-distance method, the middle-distance method, the center-of-gravity method, the group-average method, the variable group-average method, the variable method, and the sum-of-squares (Ward) method. Suppose that classes $G_p$ and $G_q$ are merged into a new class $G_r$; the recurrence formula shared by these methods is

$$D_{kr}^{2} = \alpha_{p} D_{kp}^{2} + \alpha_{q} D_{kq}^{2} + \beta D_{pq}^{2} + \gamma \left| D_{kp}^{2} - D_{kq}^{2} \right|,$$

where $D_{kr}$ is the distance between classes $G_k$ and $G_r$. The values of the parameters $\alpha_p$, $\alpha_q$, $\beta$, and $\gamma$ vary according to the different methods and are adopted in the program; the default values of the parameters are used in the calculations.
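To make the recurrence concrete, the following minimal Python sketch (our own illustration, not code from the paper; the parameter values are the standard textbook ones) evaluates the Lance-Williams update for a few of the methods above:

```python
import numpy as np

# Standard Lance-Williams parameters (alpha_p, alpha_q, beta, gamma) for a
# few of the methods above; n_p and n_q are the sizes of the merged classes.
def lw_params(method, n_p, n_q):
    n_r = n_p + n_q
    if method == "shortest":           # single linkage
        return 0.5, 0.5, 0.0, -0.5
    if method == "longest":            # complete linkage
        return 0.5, 0.5, 0.0, 0.5
    if method == "group-average":
        return n_p / n_r, n_q / n_r, 0.0, 0.0
    if method == "center-of-gravity":  # centroid method
        return n_p / n_r, n_q / n_r, -(n_p * n_q) / n_r**2, 0.0
    raise ValueError(method)

def lw_update(D2_kp, D2_kq, D2_pq, method, n_p, n_q):
    """Squared distance from class G_k to the class G_r formed by merging
    G_p and G_q, via the recurrence above."""
    a_p, a_q, b, g = lw_params(method, n_p, n_q)
    return a_p * D2_kp + a_q * D2_kq + b * D2_pq + g * abs(D2_kp - D2_kq)

# Example: merging two singleton classes under the group-average method.
print(lw_update(4.0, 9.0, 1.0, "group-average", n_p=1, n_q=1))  # 6.5
```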

2.2. Ant Colony Optimization (Unknown Number of Clusters)

One of the difficulties in clustering problems is that the number of clusters may be unknown. In the ant colony optimization (ACO) clustering method, the samples are regarded as ants with different attributes, and the cluster centers are the food sources that the ants seek. Sample clustering is therefore modeled on the mechanism of ants searching for food sources.

Let $X = \{x_1, x_2, \ldots, x_N\}$ be the data set to be classified, where each sample $x_k$ is described by its feature vector. The clustering algorithm is as follows.

(1) Initially, allocate the $N$ samples into $N$ classes, so that one sample belongs to one class.

(2) Calculate the Euclidean distances between classes $c_i$ and $c_j$:

$$d_{ij} = \lVert z_i - z_j \rVert, \qquad z_i = \frac{1}{J_i} \sum_{x_k \in c_i} x_k,$$

where $d_{ij}$ represents the Euclidean distance between classes $c_i$ and $c_j$, $z_i$ is the cluster center vector, and $J_i$ is the number of samples in class $c_i$.

(3) Calculate the amount of pheromone on each path. Let $r$ be the radius of the cluster and let $\tau_{ij}(t)$ be the residual pheromone at time $t$ on the path from $c_i$ to $c_j$. The pheromone on the path $(i, j)$ is

$$\tau_{ij}(t+1) = \rho\,\tau_{ij}(t) + \Delta\tau_{ij}, \qquad \Delta\tau_{ij} = \begin{cases} Q, & d_{ij} \le r, \\ 0, & d_{ij} > r, \end{cases}$$

where $\rho$ and $Q$ are constants.

(4) Calculate the probability of merging $c_j$ into $c_i$:

$$p_{ij}(t) = \frac{\tau_{ij}^{\alpha}(t)\,\eta_{ij}^{\beta}(t)}{\sum_{s \in S_i} \tau_{is}^{\alpha}(t)\,\eta_{is}^{\beta}(t)},$$

where $\eta_{ij} = 1/d_{ij}$ is the visibility, $S_i$ represents the collection of class indices $s \in \{1, \ldots, M\}$ whose distance to class $c_i$ is less than or equal to $r$, $M$ is the total number of classes, and $\alpha$ and $\beta$ are the weight parameters.

(5) If $p_{ij}(t) \ge p_0$, then $c_j$ is subsumed into $c_i$ and the number of classes decreases by one; $p_0$ is a given probability threshold. Recalculate the merged cluster center.

(6) Determine whether any merging occurred. If there was no merging, stop the cycle; otherwise, go to step (2) and continue the iteration.
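The following Python sketch (our own simplified illustration, not the paper's implementation) implements this merge-based scheme; the pheromone is reduced to a binary deposit on paths within the radius r, and all parameter values are illustrative:

```python
import numpy as np

def aco_cluster(X, r=1.0, p0=0.5, alpha=1.0, beta=1.0):
    """Merge-based ACO clustering sketch: every sample starts as its own
    class; class j is merged into class i when its pheromone-weighted
    probability exceeds the threshold p0."""
    centers = [x.astype(float) for x in X]   # one cluster center per sample
    counts = [1] * len(X)                    # number of samples per class
    while True:
        merged = False
        n = len(centers)
        D = np.array([[np.linalg.norm(a - b) for b in centers] for a in centers])
        tau = np.where(D <= r, 1.0, 0.0)     # pheromone only on short paths
        np.fill_diagonal(tau, 0.0)
        for i in range(n):
            near = [j for j in range(n) if j != i and D[i, j] <= r]
            if not near:
                continue
            eta = 1.0 / (D[i, near] + 1e-12)              # visibility 1/d
            w = tau[i, near] ** alpha * eta ** beta
            p = w / w.sum()                               # merge probabilities
            j = near[int(np.argmax(p))]
            if p.max() >= p0:
                total = counts[i] + counts[j]             # merge j into i and
                centers[i] = (counts[i] * centers[i]      # recompute the center
                              + counts[j] * centers[j]) / total
                counts[i] = total
                del centers[j], counts[j]
                merged = True
                break                 # class list changed; recompute distances
        if not merged:
            return centers, counts    # no merge happened: stop the cycle
```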

2.3. Extreme Learning Machine

The extreme learning machine (ELM) was proposed by Huang et al. in 2006 [62]. The core of ELM is a single-hidden-layer feed-forward neural network (SLFN), a typical structure of which is shown in Figure 1. This network consists of an input layer, a hidden layer, and an output layer; adjacent layers are fully connected. There are $n$ neurons in the input layer corresponding to the $n$ input variables, $l$ neurons in the hidden layer, and $m$ neurons in the output layer corresponding to the $m$ output variables.

Briefly, suppose the connection weight matrix between the input layer and the hidden layer is

$$\mathbf{W} = \begin{bmatrix} w_{11} & w_{12} & \cdots & w_{1n} \\ w_{21} & w_{22} & \cdots & w_{2n} \\ \vdots & \vdots & & \vdots \\ w_{l1} & w_{l2} & \cdots & w_{ln} \end{bmatrix}_{l \times n},$$

where $w_{ji}$ represents the weight between neuron $i$ in the input layer and neuron $j$ in the hidden layer.

The connection weight matrix between the hidden layer and the output layer is

$$\boldsymbol{\beta} = \begin{bmatrix} \beta_{11} & \beta_{12} & \cdots & \beta_{1m} \\ \beta_{21} & \beta_{22} & \cdots & \beta_{2m} \\ \vdots & \vdots & & \vdots \\ \beta_{l1} & \beta_{l2} & \cdots & \beta_{lm} \end{bmatrix}_{l \times m},$$

where $\beta_{jk}$ represents the weight between neuron $j$ in the hidden layer and neuron $k$ in the output layer.

The threshold vector of the neurons in the hidden layer is

$$\mathbf{b} = [b_1, b_2, \ldots, b_l]^{T}.$$

The input matrix $\mathbf{X}$ and the output matrix $\mathbf{Y}$ of the training set with $Q$ samples are

$$\mathbf{X} = \begin{bmatrix} x_{11} & \cdots & x_{1Q} \\ \vdots & & \vdots \\ x_{n1} & \cdots & x_{nQ} \end{bmatrix}_{n \times Q}, \qquad \mathbf{Y} = \begin{bmatrix} y_{11} & \cdots & y_{1Q} \\ \vdots & & \vdots \\ y_{m1} & \cdots & y_{mQ} \end{bmatrix}_{m \times Q}.$$

Let the activation function of the neurons in the hidden layer be $g(x)$; the output $\mathbf{T}$ of the network shown in Figure 1 is

$$\mathbf{T} = [\mathbf{t}_1, \mathbf{t}_2, \ldots, \mathbf{t}_Q]_{m \times Q}, \qquad \mathbf{t}_j = \begin{bmatrix} t_{1j} \\ \vdots \\ t_{mj} \end{bmatrix} = \begin{bmatrix} \sum_{i=1}^{l} \beta_{i1}\, g(\mathbf{w}_i \mathbf{x}_j + b_i) \\ \vdots \\ \sum_{i=1}^{l} \beta_{im}\, g(\mathbf{w}_i \mathbf{x}_j + b_i) \end{bmatrix}, \quad j = 1, 2, \ldots, Q,$$

where $\mathbf{w}_i = [w_{i1}, w_{i2}, \ldots, w_{in}]$ is the $i$th row of $\mathbf{W}$ and $\mathbf{x}_j = [x_{1j}, x_{2j}, \ldots, x_{nj}]^{T}$ is the $j$th training sample.

The expression above can be written compactly as

$$\mathbf{H} \boldsymbol{\beta} = \mathbf{T}',$$

where $\mathbf{T}'$ is the transpose of the matrix $\mathbf{T}$ and $\mathbf{H}$ is the output matrix of the hidden layer of the neural network:

$$\mathbf{H} = \begin{bmatrix} g(\mathbf{w}_1 \mathbf{x}_1 + b_1) & \cdots & g(\mathbf{w}_l \mathbf{x}_1 + b_l) \\ \vdots & & \vdots \\ g(\mathbf{w}_1 \mathbf{x}_Q + b_1) & \cdots & g(\mathbf{w}_l \mathbf{x}_Q + b_l) \end{bmatrix}_{Q \times l}.$$

When the number of neurons in the hidden layer equals the number of samples in the training set ($l = Q$), the SLFN can approximate the training samples with zero error for any randomly chosen $\mathbf{W}$ and $\mathbf{b}$; that is,

$$\sum_{j=1}^{Q} \lVert \mathbf{t}_j - \mathbf{y}_j \rVert = 0,$$

where $\mathbf{y}_j = [y_{1j}, y_{2j}, \ldots, y_{mj}]^{T}$.

However, when the training set is relatively large, the number of hidden-layer neurons is usually taken smaller than $Q$ to reduce the computational complexity. The training error of the SLFN can then still approximate any $\varepsilon > 0$; that is,

$$\sum_{j=1}^{Q} \lVert \mathbf{t}_j - \mathbf{y}_j \rVert < \varepsilon.$$

Therefore, when the activation function $g(x)$ is infinitely differentiable, not all of the parameters of the SLFN need to be adjusted. The values of $\mathbf{W}$ and $\mathbf{b}$ can be chosen randomly before training and remain fixed for the rest of the procedure. The connection weights $\boldsymbol{\beta}$ between the hidden layer and the output layer can be obtained as the least-squares solution of $\mathbf{H}\boldsymbol{\beta} = \mathbf{T}'$: the estimated solution is $\hat{\boldsymbol{\beta}} = \mathbf{H}^{+} \mathbf{T}'$, where $\mathbf{H}^{+}$ is the Moore-Penrose generalized inverse of the output matrix $\mathbf{H}$ of the hidden layer.

From the above analysis, ELM can randomly generate $\mathbf{W}$ and $\mathbf{b}$ before training. It only needs to determine the number of hidden-layer neurons and the (infinitely differentiable) activation function of the hidden-layer neurons, and then $\boldsymbol{\beta}$ can be calculated. In general, the ELM learning algorithm follows these steps.

(1) Determine the number $l$ of neurons in the hidden layer, and randomly set the connection weights $\mathbf{W}$ between the input layer and the hidden layer and the biases $\mathbf{b}$ of the neurons in the hidden layer.

(2) Select an infinitely differentiable function as the activation function of the neurons in the hidden layer, and then calculate the hidden-layer output matrix $\mathbf{H}$.

(3) Calculate the output-layer weights $\hat{\boldsymbol{\beta}} = \mathbf{H}^{+} \mathbf{T}'$.
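The whole algorithm fits in a few lines of NumPy. The sketch below (our own illustration; the sigmoid activation and the hidden-layer size are arbitrary choices) follows the three steps above:

```python
import numpy as np

def elm_train(X, Y, l=30, seed=0):
    """ELM training per the steps above: random W and b, hidden output
    matrix H, then output weights via the Moore-Penrose pseudoinverse.
    X is n x Q (inputs), Y is m x Q (targets).  Note: this H is l x Q,
    i.e., the transpose of the H in the text."""
    rng = np.random.default_rng(seed)
    n, Q = X.shape
    W = rng.uniform(-1.0, 1.0, size=(l, n))   # step 1: random input weights
    b = rng.uniform(-1.0, 1.0, size=(l, 1))   # step 1: random hidden biases
    H = 1.0 / (1.0 + np.exp(-(W @ X + b)))    # step 2: sigmoid activation
    beta = np.linalg.pinv(H.T) @ Y.T          # step 3: solve H^T beta = Y^T
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(W @ X + b)))
    return (H.T @ beta).T                     # m x Q predictions
```

In this study's setting, X would hold the selected descriptors of the training molecules and Y the corresponding homolysis BDE values to be fitted.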

3. Technology Roadmap

The technology roadmap of the ELM method based on KS-HC/ACO for improving the accuracy of the molecular bond energies calculated by DFT is shown in Figure 2. It consists of three parts. Part one is the collection of the data, including the molecular bond data and the molecular descriptors. The second part describes the theoretical calculation. Ninety-two important organic NO carrier molecules are studied, 12 molecular descriptors are calculated by DFT B3LYP/6-31G(d), and the experimental homolysis BDE values are included [34]. The Kenstone method is used to divide the data into the training set and the test set. HC and ACO are taken as different classification methods for selecting the appropriate molecular descriptors to be modeled by the ELM method; the former requires a predefined number of classes, while the latter does not. The third part aims at testing and analyzing the models. When the calculation results are not satisfactory, the algorithm is optimized repeatedly until a reasonable accuracy is reached. Comparison tests are also designed at this stage.

4. Results and Discussion

4.1. Kenstone Calculation Results

The basic idea of the Kenstone method [63] is to divide the training and test sets according to the distances between data points in the data set. First, the two points farthest apart are put into the training set; the distances between each remaining point and the training points are then calculated, and for each point the minimum such distance is retained. The point with the largest of these minimum distances is moved into the training set, and this selection is repeated one point at a time until the training set reaches the desired size. While random assignment of the training and test sets can occasionally yield a better calculation result, it can guarantee neither the best result nor an outlier-free outcome. The Kenstone method avoids these problems, which means that the KS-HC/ACO-ELM model consistently yields good results and generalizability. In this study, 80 molecules selected with the Kenstone method constitute the training set, and the remaining 12 molecules constitute the test set. Molecules 13, 19, 21, 27, 31, 35, 36, 43, 55, 70, 78, and 81 are in the test set, and the rest are in the training set. It can be seen from the Kenstone calculation results that molecules 13, 19, 21, 27, 31, 35, 36, and 43 contain N–NO; molecule 55 contains O–NO; molecules 70 and 78 contain S–NO; and molecule 81 contains C–NO. This shows that the KS method achieves diversity and balance in the training and test sets.
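A minimal sketch of this selection procedure (our own implementation of the published algorithm; rows of X are molecules described by their descriptors):

```python
import numpy as np

def kennard_stone(X, n_train):
    """Kennard-Stone split as described above: seed with the two most
    distant samples, then repeatedly add the sample whose minimum
    distance to the current training set is largest."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    i, j = np.unravel_index(np.argmax(D), D.shape)
    selected = [int(i), int(j)]
    while len(selected) < n_train:
        remaining = [k for k in range(len(X)) if k not in selected]
        # each remaining sample's minimum distance to the selected set
        min_d = D[np.ix_(remaining, selected)].min(axis=1)
        selected.append(remaining[int(np.argmax(min_d))])
    return selected          # training indices; the rest form the test set

# e.g., train_idx = kennard_stone(descriptor_matrix, 80)
```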

4.2. Hierarchical Clustering Calculation Results

It is critical for HC to adopt an appropriate distance and clustering method. It is best if the method excludes dimensional effects, and the standardized Euclidean distance is usually a reasonable choice. As for the clustering method, the shortest-distance method tends to produce overly contracted classes, and the longest-distance method overly expanded ones; both are simple but represent extreme classification behaviors. Therefore, the more appropriate choices are the average-distance method, the weighted-average method, and the center-of-gravity method.

There is no universal standard for evaluating the quality of clustering results, and the application of a clustering method depends on the researchers' skills and experience with the objects being classified. If the clustering method is applied properly, the cophenetic distances of the clustering tree and the original distance matrix should be highly correlated. This correlation is measured by the cophenetic correlation coefficient, which can be used to evaluate the clustering results. The value of the cophenetic correlation coefficient lies between zero and one, and the more effective the clustering, the larger the value. However, a high value is only statistically significant; it does not guarantee that the result is physically meaningful, and sometimes the result has no practical interpretation.

The cophenetic correlation coefficients are shown in Table 1. The clustering methods based on the Euclidean distance include the longest-distance method, the shortest-distance method, the average-distance method, the weighted-average method, the center-of-gravity method, the middle-distance method, and the sum-of-squares method.

Based on the average-distance method with the Euclidean distance, the 12 molecular descriptors are clustered, with the given number of classes ranging from 2 to 11. This paper takes the 7-class case as an example. The clustering pedigree chart (dendrogram) of the average-distance method is shown in Figure 3; numbers 1 through 12 on the horizontal axis correspond to the twelve molecular descriptors. The clustering calculation results are (1, 7, 2, 7, 4, 6, 4, 5, 5, 5, 5, and 3), which indicate that descriptors 1, 3, 6, and 12 form their own classes (classes 1, 2, 6, and 3, respectively), descriptors 2 and 4 are clustered into class 7, descriptors 5 and 7 into class 4, and descriptors 8, 9, 10, and 11 into class 5. For other given numbers of clustering classes, the analysis and principle are the same as in the 7-class case.
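Both the cophenetic check and the 7-class cut are straightforward with SciPy; the sketch below (our own illustration) uses random data as a stand-in for the real 12 × 92 descriptor matrix:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster, cophenet
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
X = rng.normal(size=(12, 92))   # stand-in for the 12 descriptor vectors

d = pdist(X)                    # Euclidean distances between descriptors
for method in ("single", "complete", "average", "weighted",
               "centroid", "median", "ward"):   # SciPy names for Table 1
    c, _ = cophenet(linkage(d, method=method), d)
    print(f"{method:>8s}: cophenetic correlation = {c:.3f}")

# Cut the average-distance tree into the 7 classes discussed above.
labels = fcluster(linkage(d, method="average"), t=7, criterion="maxclust")
print(labels)                   # class index for each of the 12 descriptors
```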

4.3. Ant Colony Optimization Calculation Results

The HC method needs a predefined number of clusters, so every candidate classification must be examined. This is manageable when the sample (i.e., the number of molecular descriptors) is small, but when the sample is large, it is neither necessary nor practical to list every classification exhaustively. Therefore, in this paper, ACO, which does not require a predefined number of clusters, is also applied to the screening of molecular descriptors, so that the ELM modeling generalizes to either a small or a large number of descriptors.

According to the definition of the ACO, the optimization function is the minimum of the ratio of the within-class distance to the inter-class distance. This means the distance between classes should be large, while the distance between samples within a class should be small:

$$F = \min \frac{\displaystyle\sum_{i=1}^{M} \sum_{x_k \in c_i} \lVert x_k - z_i \rVert}{\displaystyle\sum_{i=1}^{M-1} \sum_{j=i+1}^{M} \lVert z_i - z_j \rVert},$$

where $x_k$ is the feature vector of sample $k$ and $z_i$ is the feature vector of the center of class $c_i$.
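For reference, this objective is simple to evaluate for a given partition; a minimal sketch (the function name is ours):

```python
import numpy as np

def within_between_ratio(X, labels):
    """Ratio of the total within-class distance to the total
    center-to-center inter-class distance; ACO clustering seeks the
    partition minimizing it.  labels is an integer NumPy array."""
    classes = sorted(set(labels.tolist()))
    centers = {c: X[labels == c].mean(axis=0) for c in classes}
    within = sum(np.linalg.norm(X[labels == c] - centers[c], axis=1).sum()
                 for c in classes)
    between = sum(np.linalg.norm(centers[a] - centers[b])
                  for i, a in enumerate(classes) for b in classes[i + 1:])
    return within / between
```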

The ACO clustering results for the twelve molecular descriptors are (1, 2, 2, 2, 3, 4, 5, 6, 6, 6, 6, and 6), where the numbers in the brackets are the class indices of the corresponding descriptors. From these results it can be observed that the twelve molecular descriptors are divided into six classes: descriptor 1 forms class 1; descriptors 2, 3, and 4 form class 2; descriptors 5, 6, and 7 form classes 3, 4, and 5, respectively; and descriptors 8, 9, 10, 11, and 12 form class 6.

4.4. ELM Calculation Results

The correlation coefficients between the 12 molecular descriptors and the experimental homolysis BDE values were calculated; they are 0.64, 0.46, 0.49, 0.02, 0.12, 0.18, 0.28, 0.51, 0.17, 0.43, 0.05, and 0.27. Within each cluster, the descriptor with the largest correlation coefficient is kept. For example, in the HC calculation, when descriptors 2 and 4 cluster into one class, descriptor 2 correlates more strongly with the homolysis BDE than descriptor 4 and is therefore selected as an input of ELM. Similarly, when descriptors 5 and 7 are classified into one class, descriptor 7 is selected as an input of ELM. In the cluster consisting of descriptors 8, 9, 10, and 11, descriptor 8 is selected. Descriptors 1, 3, 6, and 12 each form their own class. Therefore, descriptors 1, 2, 3, 6, 7, 8, and 12 are selected as the final inputs of ELM by the HC method. In the same way, descriptors 1, 3, 5, 6, 7, and 8 are selected as the final inputs of ELM after the ACO clustering. The errors of the test set for the different methods are shown in Figure 4(a).
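The selection rule itself is compact enough to state in code; the sketch below (names are ours) keeps, from each cluster, the descriptor with the largest absolute correlation with the experimental values:

```python
import numpy as np

def select_descriptors(X, y, labels):
    """X: descriptors x molecules; y: experimental homolysis BDE values;
    labels: cluster index of each descriptor (from HC or ACO, as a NumPy
    array).  From each cluster, keep the descriptor whose |correlation|
    with y is largest."""
    corr = np.array([abs(np.corrcoef(row, y)[0, 1]) for row in X])
    keep = [max(np.flatnonzero(labels == c), key=lambda k: corr[k])
            for c in sorted(set(labels.tolist()))]
    return sorted(int(k) for k in keep)  # indices of the ELM input descriptors
```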

To assess the KS-HC/ACO-ELM method, the results calculated by KS-ELM without descriptor screening (using all descriptors), KS-HC-ELM (the best results are obtained when the molecular descriptors are clustered into 7 classes), and KS-ACO-ELM are compared with the results calculated by B3LYP/6-31G(d). A three-layer artificial neural network (KS-HC-ANN) calculation was also performed for comparison; the ANN uses the same number of inputs and 6 hidden neurons (shown in Figure 4(b)). The corrected results show that the systematic errors of the calculated values are eliminated and that the homolysis BDE values are significantly improved by the KS-ELM, KS-HC-ELM, and KS-ACO-ELM methods: the range of the errors is narrowed, the STD of these three methods is much smaller than the error of DFT B3LYP/6-31G(d), and the errors follow a Gaussian distribution with most values close to 0. As is clearly seen in Figure 4(b), the results from KS-HC-ELM approach the experimental homolysis BDE values better than those from the ANN, and the calculation time is also much shorter, because the ANN learning algorithm requires many network training parameters to be set manually. The ANN also easily converges to a locally optimal solution, and its calculated results are unstable. The ELM method only requires setting the number of hidden-layer nodes; it needs no adjustment of the network input weights or hidden biases during the execution of the algorithm, and it generates a unique optimal solution, so it combines high learning speed with good generalization performance.

The STD corrected by the KS-ELM method is much smaller than the error of B3LYP/6-31G(d), and the STD calculated by KS-HC-ELM and KS-ACO-ELM is smaller still, which means that screening the molecular descriptors to eliminate redundant ones is necessary. The STD is reduced from 4.03 to 0.30, 0.28, 0.29, and 0.32 kcal mol−1 after correction by the KS-ELM, KS-HC-ELM, KS-ACO-ELM, and KS-HC-ANN methods, respectively. Clearly, the results corrected by the KS-HC-ELM method are the closest to the experimental results. If trivial features are introduced into ELM, the accuracy of the model might decrease, and if the parameters are selected inappropriately, overfitting could also appear. In this experiment, although the calculation results of the KS-HC-ELM method are superior to those of the KS-ACO-ELM method, it cannot be assumed that this holds under every circumstance, because each method has its own advantages. The clustering methods fall into two categories, one requiring a predefined number of classes and the other not. Because the ELM inputs affect its performance and the best combination of molecular descriptors is difficult to determine, we recommend that when the number of molecular descriptors to be clustered is small, both the HC method and the ACO method are suitable, and that when the number of molecular descriptors is large, the ACO method is more appropriate.

5. Conclusions

Nitric oxide (NO) is a very important signaling molecule in mammalian systems. When NO release in the body is impaired, NO carrier molecules may be administered as drugs to treat the disease. Measuring the homolysis BDE of NO molecular bonds is complicated for the various experimental methods, and it is rather difficult to achieve chemical accuracy. The DFT method has been very popular in computational chemistry for the past two decades because many computational studies have shown that DFT calculations can capture the physical essence of molecules. For small molecules the accuracy can reach a high level, but for large molecules the computational cost is too high and the accuracy is quite poor.

In this paper, the combination of the KS-HC/ACO-ELM and DFT calculation methods succeeds in improving the DFT calculation accuracy of NO bond homolysis BDE. The results show that for the three KS-HC/ACO-ELM methods, the STD of the homolysis BDE of the 92 organic molecules in the test set decreases from 4.03 to 0.30, 0.28, and 0.29 kcal mol−1. This proves that the KS-HC/ACO-ELM method based on B3LYP/6-31G(d) can remarkably improve homolysis BDE calculations, and its results may serve as high-accuracy reference values for experimental results. The adoption of Kenstone makes the model more generalizable: although the model is not uniquely determined and the descriptor space is complex, the reproducibility of the calculation is very important, and selecting the training and test sets with the Kenstone method makes the model more robust and reproducible and the calculation results more convincing. The selection of appropriate molecular descriptors using HC/ACO eliminates subjective bias in the choice of input descriptors. Compared with the ANN method, the nonlinear model created by ELM not only avoids sensitive parameters and local-minimum problems but also achieves more accurate calculation results with less calculation time and fewer computing resources.

The KS-HC/ACO-ELM method expands the feasibility and applicability of the B3LYP/6-31G(d) method, and the more experimental data and molecular descriptors there are, the higher the calculation accuracy the KS-HC/ACO-ELM method will achieve, making its advantages more obvious. It is important that such a high-precision method can be used to design novel NO-releasing drug molecules. We believe that the KS-HC/ACO-ELM method is significantly effective in improving the accuracy of DFT B3LYP/6-31G(d) calculations, even with a smaller basis set (e.g., STO-3G). Meanwhile, this method can be applied to correct other properties such as heterolytic bond dissociation energies, absorption energies, ionization energies, and heats of formation. Further studies are ongoing. This study provides an effective tool (the combination of the KS-HC/ACO-ELM and DFT methods) to improve and predict highly accurate homolysis BDE values for molecular NO carrier systems.

Acknowledgments

The authors gratefully acknowledge the financial support from the Science and Technology Development Planning of Jilin Province (20130522109JH, 20110364, and 20125002) and the Fundamental Research Funds for the Central Universities (11QNJJ008 and 11QNJJ027).