#### Abstract

The interest of scientists in nanostructures has increased in recent years, and proper methods for their assessment are needed. *In silico* methods have proved useful as replacements for experimental evaluation and are successfully used as efficient alternatives for the estimation and prediction of compounds’ properties or activities. In this paper, it is shown that a Quantitative Structure-Property Relationship method can also be applied to nanostructures. Based on a computational experiment, several models describing the total strain energy of C_{42} fullerene isomers were obtained, and their characteristics are presented. Furthermore, the best performing model obtained on C_{42} fullerene isomers was validated on C_{40} fullerene isomers.

*This paper is dedicated to Professor Mircea V. Diudea on the occasion of his 65th birthday.*

#### 1. Introduction

Since their discovery in 1985 [1], fullerenes have attracted interest in different fields of science, including the medical field (e.g., for potential use as antibiotics [2–4], as inhibitors of erythroid cells (fullerenol) [5], as drug delivery systems [6], or as inhibitors of inflammatory mediators [7]). Fullerene molecules are constructed from carbon atoms and take the shape of a sphere (also known as a buckyball), an ellipsoid, or a tube [8]. The first spherical fullerene, C_{60}, was discovered in 1985 [1]. Fullerenes differ in their properties and in the number of associated isomers (Table 1) [9]. The smallest fullerene (C_{28}) was stabilized by metal encapsulation (with Ti, Zr, and U) by Dunk et al. [10]. Chen et al. showed that the C_{32} fullerene has stronger aromaticity than C_{30} and C_{34} [11]. Fifteen distinct isomers with different energies were reported by Manna and Ghanty, who encapsulated U into various C_{36} cages [12]. Muhammad et al. showed that C_{20} is a closed-shell fullerene and that fullerenes C_{26} and C_{30} are pure open-shell compounds, whereas C_{36}, C_{40}, and C_{42} are intermediate open-shell compounds [13].

The C_{42} fullerenes are small, not necessarily spherical cages. The C_{42} cages have a high pentagon/hexagon ratio [14]. Fullerene C_{42}, along with C_{60}, showed the highest values of the main peak in Matrix-Assisted Laser Desorption Ionization Time-of-Flight (MALDI-TOF) mass spectrometric measurements [15].

Some activities of fullerenes have been modeled using quantitative structure-activity relationship (QSAR) approaches (such as anti-HIV protease inhibition activity [16], antiviral activity [17], and use as a drug delivery system [18]). However, C_{60} has received most of the attention, while other fullerenes have been neglected with regard to QSAR/QSPR (Quantitative Structure-Property Relationship) modeling. The aim of our research was to model the total strain energy of the C_{42} fullerene isomers using structural information.

#### 2. Materials and Methods

All C_{42} fullerene isomers were included in the analysis. Data related to continuum elasticity, expressed as total strain energy (TSE, in eV), and the structure files of the C_{42} fullerene isomers were taken from [19] (Table 2).

The analysis was conducted on the downloaded files of the C_{42} isomers without any modification of the available geometry. According to [19], the fullerene geometries were based on the structures in Yoshida’s Fullerene Library (UNIX files) and were reoptimized using a Dreiding-like force field [20]. The geometry obtained in this way is used here.

The steps applied in the analysis are depicted in Scheme 1.

In the first step of the analysis, the downloaded files were translated into another file format with Spartan software (https://www.wavefun.com/products/spartan.html). In the second step, the resulting files were converted to a further file format using Babel software (http://openbabel.org). In the third step, the partial charges were calculated using HyperChem software (http://www.hyper.com/) by applying PM3 (Parameterized Model number 3 [21]) single-point (energy) semiempirical calculations. In the fourth step, the structural features of the investigated nanoclass of compounds were extracted using the unsymmetrical Szeged set, an extension of the corresponding Szeged Matrix [22]. In the fifth step, the calculated values of the structural descriptors and the collected values of total strain energy were included in nano-QSPR modeling, and the models with the highest goodness-of-fit (defined as the highest correlation coefficients) were analyzed and validated in leave-one-out and leave-many-out analyses [23, 24].

Leave-one-out analysis indicates a valid model if the leave-one-out determination coefficient takes values higher than 0.5. Leave-many-out analysis was conducted for the models with the highest estimation abilities, expressed as the highest value of the correlation coefficient. The set was split into training and test sets using a simple random technique [25], with 2/3 of the compounds in the training set. The models obtained on the training sets were used to predict the TSE in the test sets. The leave-many-out analysis was run five times for the equations identified as having the highest estimation and internal prediction abilities, in order to assess their prediction abilities.
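The two internal validation schemes described above can be sketched as follows. This is a minimal illustration, not the software used in the study: the helper names (`fit_ols`, `loo_r2`, `leave_many_out`), the synthetic data, and the choice of 1 − SS_res/SS_tot as the determination coefficient are all assumptions made for the example.

```python
import random

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_ols(X, y):
    """Least-squares coefficients [intercept, b1, ...] via the normal equations."""
    Z = [[1.0] + list(row) for row in X]
    p = len(Z[0])
    ztz = [[sum(z[i] * z[j] for z in Z) for j in range(p)] for i in range(p)]
    zty = [sum(z[i] * yi for z, yi in zip(Z, y)) for i in range(p)]
    return solve_linear(ztz, zty)

def predict(coef, X):
    return [coef[0] + sum(c * v for c, v in zip(coef[1:], row)) for row in X]

def r_squared(y, y_hat):
    """Assumed definition: 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot

def loo_r2(X, y):
    """Refit n times, each time predicting the single compound left out."""
    preds = []
    for i in range(len(y)):
        coef = fit_ols(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        preds.append(predict(coef, [X[i]])[0])
    return r_squared(y, preds)

def leave_many_out(X, y, runs=5, train_fraction=2 / 3, seed=0):
    """Random training/test splits; returns (r2_training, r2_test) per run."""
    rng = random.Random(seed)
    n_train = round(len(y) * train_fraction)
    results = []
    for _ in range(runs):
        idx = list(range(len(y)))
        rng.shuffle(idx)
        tr, te = idx[:n_train], idx[n_train:]
        coef = fit_ols([X[i] for i in tr], [y[i] for i in tr])
        r2_tr = r_squared([y[i] for i in tr], predict(coef, [X[i] for i in tr]))
        r2_te = r_squared([y[i] for i in te], predict(coef, [X[i] for i in te]))
        results.append((r2_tr, r2_te))
    return results

# Synthetic stand-in data (NOT the fullerene descriptors or TSE values).
rng = random.Random(42)
X = [[rng.uniform(0, 1), rng.uniform(0, 1)] for _ in range(45)]
y = [3.0 + 2.0 * a - 1.5 * b + rng.gauss(0, 0.05) for a, b in X]

q2 = loo_r2(X, y)
lmo = leave_many_out(X, y)
print("leave-one-out r2:", round(q2, 4))
for run, (r2_tr, r2_te) in enumerate(lmo, start=1):
    print(f"run {run}: training r2 = {r2_tr:.4f}, test r2 = {r2_te:.4f}")
```

Comparing the training and test values run by run is what the stability discussion of the leave-many-out results relies on; a model whose test value collapses relative to its training value would be flagged as unstable.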

The prediction ability was assessed on an external dataset represented by the C_{40} isomers, considering the same property. The TSE values and the structures for external validation were taken from the same source as for the C_{42} isomers: http://nanotube.msu.edu/fullerene/fullerene.php?C=40 (accessed December 20, 2015). Several metrics were used to assess the prediction ability of the model [23, 24]: determination coefficient on the external set, predictive square correlation coefficient on the external set [26], external prediction ability, root mean square error of prediction (RMSEP), mean absolute error of prediction (MAEP), percentage predictive error (%PredErr), and concordance correlation coefficient (CCC [27]).
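Several of these external metrics can be computed directly from observed and predicted values, as in the sketch below. The formulas are the commonly used ones and are assumptions here, since the text does not spell them out: the determination coefficient is taken as the squared Pearson correlation, the predictive square correlation coefficient uses the mean of the external set, and the CCC follows Lin's definition. The function name `external_metrics` is hypothetical, and %PredErr is omitted because its definition is not given.

```python
import math

def external_metrics(y_obs, y_pred):
    """Return a dict of external validation metrics (assumed definitions)."""
    n = len(y_obs)
    mean_o = sum(y_obs) / n
    mean_p = sum(y_pred) / n
    residuals = [o - p for o, p in zip(y_obs, y_pred)]
    ss_res = sum(e * e for e in residuals)
    ss_obs = sum((o - mean_o) ** 2 for o in y_obs)
    ss_pred = sum((p - mean_p) ** 2 for p in y_pred)
    s_op = sum((o - mean_o) * (p - mean_p) for o, p in zip(y_obs, y_pred))
    return {
        # squared Pearson correlation between observed and predicted values
        "r2_ext": s_op ** 2 / (ss_obs * ss_pred),
        # predictive squared correlation relative to the external-set mean
        "q2_ext": 1.0 - ss_res / ss_obs,
        "rmsep": math.sqrt(ss_res / n),
        "maep": sum(abs(e) for e in residuals) / n,
        # Lin's concordance correlation coefficient (1/n variances)
        "ccc": 2.0 * s_op / (ss_obs + ss_pred + n * (mean_o - mean_p) ** 2),
    }

# Illustrative numbers only, not values from the study.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
metrics = external_metrics(observed, predicted)
perfect = external_metrics(observed, observed)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Unlike the squared correlation, the CCC penalizes a systematic offset between observed and predicted values, which is why the two can rank models differently.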

#### 3. Results and Discussion

Structural information on the investigated C_{42} isomers was obtained by calculating the pool of descriptors given by the Szeged Matrix Property Indices (SMPI) method [28]. The best performing models with regard to goodness-of-fit (highest correlation coefficient) with 1, 2, 3, and 4 SMPI descriptors were obtained and are given in (1)–(4), where the dependent variable is the total strain energy estimated by the model and IJUGE, IIUGF, IFEGE, IFETB, and IFUGB are SMPI descriptors. Two descriptors (IFETB and IFUGB) account for the atomic number as atomic property, two others account for electronegativity (IJUGE and IFEGE), and one accounts for the first ionization energy (IIUGF). The investigated property is related to the geometry of the compounds (fourth letter “G” in the name of the descriptors), with one exception, where it is related to topology (fourth letter “T” in the IFETB descriptor). The other letters reflect the linearization operator (first letter), matrix operation (second letter), and interaction descriptor (third letter).
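The naming scheme of the descriptors can be illustrated with a small decoder. This sketch only covers the letters that appear in the descriptors discussed here. The mapping of the fifth letter to an atomic property (B, E, F) is an assumption inferred from how the text groups the descriptors, and the names `SmpiName` and `decode` are hypothetical.

```python
from typing import NamedTuple

# Fourth letter: meanings stated in the text.
STRUCTURE_VIEW = {"G": "geometry", "T": "topology"}

# Fifth letter: ASSUMED mapping, inferred from the descriptor groups in the text.
ATOMIC_PROPERTY = {
    "B": "atomic number",
    "E": "electronegativity",
    "F": "first ionization energy",
}

class SmpiName(NamedTuple):
    linearization_operator: str   # first letter, kept as its raw code
    matrix_operation: str         # second letter, kept as its raw code
    interaction_descriptor: str   # third letter, kept as its raw code
    structure_view: str           # fourth letter, decoded
    atomic_property: str          # fifth letter, decoded (assumed mapping)

def decode(name: str) -> SmpiName:
    """Split a five-letter SMPI descriptor name into its components."""
    if len(name) != 5:
        raise ValueError(f"expected a five-letter descriptor name, got {name!r}")
    lin, mat, inter, view, prop = name.upper()
    return SmpiName(
        lin,
        mat,
        inter,
        STRUCTURE_VIEW.get(view, "unknown"),
        ATOMIC_PROPERTY.get(prop, "unknown"),
    )

for descriptor in ("IJUGE", "IIUGF", "IFEGE", "IFETB", "IFUGB"):
    print(descriptor, decode(descriptor))
```

The first three letters are left as raw codes because the text names their roles but not the meaning of individual letters.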

As expected, the determination coefficient increases as the number of descriptors in the models increases, while the standard error of the estimate decreases (Table 3).
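The two goodness-of-fit quantities compared here can be written out as a short sketch. It assumes the usual definitions, which the text does not state explicitly: the determination coefficient as 1 − SS_res/SS_tot and the standard error of the estimate as the square root of SS_res divided by n − k − 1, where k is the number of descriptors. The function name `fit_statistics` and the numbers are illustrative only.

```python
import math

def fit_statistics(y_obs, y_fit, n_descriptors):
    """Return (determination coefficient, standard error of the estimate)."""
    n = len(y_obs)
    dof = n - n_descriptors - 1  # residual degrees of freedom
    if dof <= 0:
        raise ValueError("not enough observations for this many descriptors")
    mean_obs = sum(y_obs) / n
    ss_res = sum((o - f) ** 2 for o, f in zip(y_obs, y_fit))
    ss_tot = sum((o - mean_obs) ** 2 for o in y_obs)
    return 1.0 - ss_res / ss_tot, math.sqrt(ss_res / dof)

# Illustrative values, not data from the study.
y_obs = [1.0, 2.0, 3.0, 4.0, 5.0]
y_fit = [1.1, 1.9, 3.0, 4.2, 4.8]
r2, s_est = fit_statistics(y_obs, y_fit, n_descriptors=1)
print(f"r2 = {r2:.4f}, standard error of the estimate = {s_est:.4f}")
```

Because the divisor shrinks with every added descriptor, the standard error only decreases when the reduction in residual sum of squares outweighs the lost degree of freedom, so it is a stricter indicator than the determination coefficient alone.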

The difference between the determination coefficient of the model and the determination coefficient obtained in leave-one-out analysis varied from 0.0027 to 0.0227, the smallest difference being obtained by (3) (Table 3). The smallest difference between standard errors (estimation model and leave-one-out model) is also obtained by model (3).

The results presented in Table 3 showed that the model with four descriptors has the smallest percentage of prediction error. Furthermore, the points in the scatter plot lie closest to the straight line for the model given by (4) (Figure 1). Figure 1 shows no marked difference between the models from (3) and (4), although the dispersion of the points around the line is smallest for the model given by (4).

**Figure 1, panels (a)–(d)**

The main characteristics of the models given by (3) and (4) obtained in leave-many-out analysis (training versus test analysis; 2/3 of the compounds in the training set, run five times) are presented in Table 4.

The results presented in Table 4 showed the stability of the models, with internal prediction power (defined as the determination coefficient in the test sets) close to the estimation power (determination coefficient in the training set) for both investigated models. For (3), the results obtained in the training sets closely follow the results on the whole sample, with the determination coefficient in the same range when two decimals are considered. The determination coefficient obtained in the test set was equal to 0.99 in all five runs of the leave-many-out analysis, slightly higher than the one obtained in the training sets (0.98). In three cases out of five, the determination coefficient in the training sets for (4) agreed to two decimals with the value given in Table 3. However, without exception, the determination coefficient in the test sets was smaller than that in the training sets for (4), with differences that varied from 0.0005 (id 7 in Table 4) to 0.0264 (id 6 in Table 4). These results showed that (3) performs slightly better in terms of determination coefficients in leave-many-out analysis.

The plots of the models obtained in the fourth run for (3) and fifth run for (4), as examples, are given in Figure 2.

**Figure 2, panels (a) and (b)**

The equations with the best estimation power and internal prediction abilities, namely, (3) and (4), were further applied to the C_{40} isomers to test their external prediction abilities. The prediction power of (4) proved to be better than that of (3) (see Figure 3 and Table 5).

**Figure 3, panels (a) and (b)**

Although the predictive square correlation coefficient on the external set is higher for (3) than for (4), all other calculated metrics indicate that the model given by (4) has better prediction abilities (highest determination coefficient on the external set, lowest mean absolute error of prediction, and lowest percentage of predictive error; see Table 5). Furthermore, the overall spread of the points in the scatter plot leads to the conclusion that (4) had better prediction abilities than (3). Nevertheless, the mean of the residuals proved to be significantly different from the expected value (zero). It could be concluded that the model given by (4) fits the data on which it was constructed better than all other models. Nevertheless, *are the structural features extracted by SMPI descriptors on C*_{42}* isomers able to predict the TSE on C*_{40}* isomers*?

The SMPI descriptors used by (3) and (4) were used to predict the TSE on the C_{40} isomers. One of the three descriptors from (3) proved to have a slope not significantly different from zero and was not included in further analysis. The models obtained on the C_{40} isomers are given in (5) and (6), where the dependent variable is the total strain energy estimated by the model and IJUGE, IIUGF, IFETB, and IFUGB are SMPI descriptors. Two descriptors (IFETB and IFUGB) account for the atomic number as atomic property, one descriptor accounts for electronegativity (IJUGE), and one accounts for the first ionization energy (IIUGF). The investigated property is related to the geometry of the compounds (fourth letter “G” in the name of the descriptors), with one exception that is related to the topology of the compounds (IFETB descriptor). The other letters reflect the linearization operator (first letter), matrix operation (second letter), and interaction descriptor (third letter). Note that both models have a mean of residuals not significantly different from zero.

The analysis of the metrics associated with (5) and (6) leads to the conclusion that the model given by (6) performs better than the model given by (5). The same conclusion is obtained by analyzing the plots of observed versus predicted TSE (Figure 4).
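Whether the mean of the residuals differs from zero can be checked with a one-sample test. The sketch below assumes a one-sample t statistic, since the text does not name the test that was applied. The function name `residual_mean_t` and the numbers are illustrative only.

```python
import math
import statistics

def residual_mean_t(y_obs, y_pred):
    """Return (mean residual, t statistic, degrees of freedom) for H0: mean = 0."""
    residuals = [o - p for o, p in zip(y_obs, y_pred)]
    n = len(residuals)
    mean_res = statistics.fmean(residuals)
    sd_res = statistics.stdev(residuals)          # sample standard deviation
    t_stat = mean_res / (sd_res / math.sqrt(n))   # standard error in the divisor
    return mean_res, t_stat, n - 1

# Illustrative values with a systematic underprediction, not study data.
observed = [10.0, 11.0, 12.0, 13.0]
predicted = [9.5, 10.4, 11.6, 12.5]
mean_res, t_stat, dof = residual_mean_t(observed, predicted)
print(f"mean residual = {mean_res:.3f}, t = {t_stat:.2f}, df = {dof}")
```

The t statistic is then compared with the critical value of Student's t distribution for the given degrees of freedom; a large absolute value, as in this example, indicates a systematic bias in the predictions.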

**Figure 4, panels (a) and (b)**

The results of our study showed that the identified nano-QSPR models fit the data on which they were identified (C_{42} isomers) but could also be used to select the structural descriptors with fair prediction abilities on an external dataset (C_{40} isomers). To sum up, equations relating electronegativities, ionization potential, and energy have been identified on the C_{42} isomers and proved to work on the C_{40} isomers as well. Note that electronegativities and ionization potential are atomic properties and, since the investigated set contains just C and H atoms, the identified relation between the three properties could also be assigned to the topology and geometry of the investigated compounds.

To the best of our knowledge, structure-property relationship approaches have not previously been applied to C_{42} or C_{40} fullerene isomers. The small-diameter fullerenes (C_{20}, C_{34}, C_{42}, and C_{60}) have mainly been investigated with regard to their properties (such as adsorption [29], distribution of CC distance [14], and Schlegel diagrams of molecular structures [30]). Therefore, this is the first report of a quantitative relationship between structure and property of the C_{42} fullerene. Undoubtedly, the advancement from theoretical to experimental studies is desired.

#### 4. Conclusions

The C_{42} fullerene isomers were successfully modeled, and the total strain energy was characterized as a function of information extracted from the structure of the compounds. Among the models with good fit in leave-one-out and leave-many-out analyses, the model with four descriptors also proved to have the best prediction power. The total strain energy proved to be a function of electronegativity and first ionization energy, in relation to the geometry of the compounds. The structural descriptors able to fairly explain the total strain energy of the C_{42} isomers also proved able to explain the same property for the C_{40} fullerene isomers.

#### Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.