Research Article

Development of Conformation Independent Computational Models for the Early Recognition of Breast Cancer Resistance Protein Substrates

Table 1

Features of the best individual model (Model 1) and of the other individual models (Models 2 to 4) that compose the two best two-model ensembles.

Descriptors included | F | p value | Sp training set* | Se training set* | Overall accuracy training set* | Sp test set* | Se test set* | Overall accuracy test set* | Leave-group-out CV¹ | Randomization²

Model 1:
MLOGP2 (squared Moriguchi octanol-water partition coefficient), nCrR2 (no. of ring quaternary C(sp3)), JGI7 (mean topological charge index of order 7), nCONHR (no. of secondary amides (aliphatic)), nHAcc (no. of acceptor atoms for H-bonds (N O F)), and GGI8 (topological charge index of order 8).
8.04 | <0.000000 | 79% | 68% | 74% | 63% | 74% | 66% | 70.4% (±11.9) | 64.4% (±3.4)

Model 2:
BEHm2 (highest eigenvalue no. 2 of Burden matrix/weighted by atomic masses), BELe2 (lowest eigenvalue no. 2 of Burden matrix/weighted by atomic Sanderson electronegativities), Hy (hydrophilic factor), LAI (Lipinski alert index), LP1 (Lovasz-Pelikan index), BEHp1 (highest eigenvalue no. 1 of Burden matrix/weighted by atomic polarizabilities), SEigp (eigenvalue sum from polarizability weighted distance matrix), and VRA2 (average Randic-type eigenvector-based index from adjacency matrix).
7.52 | <0.000000 | 75.3% | 74.7% | 75% | 76% | 66.7% | 73.5% | 67% (±15) | 61.5% (±3.6)

Model 3:
D/Dr11 (distance/detour ring index of order 11), nCONHR, nCO (no. of ketones (aliphatic)), X0Av (average valence connectivity index chi-0), nCaH (no. of unsubstituted aromatic C(sp2)), Xt (total structure connectivity index), PW4 (path/walk 4-Randic shape index), D/Dr12 (distance/detour ring index of order 12), T(O..O) (sum of topological distances between O..O), nNHRPh (no. of secondary amines (aromatic)), SPI (superpendentic index), and Rww (reciprocal hyperdetour index).
10.39 | <0.000000 | 83.5% | 83.5% | 83.5% | 73.2% | 74% | 73.5% | 81.2% (±11.3) | 62.4% (±5.1)

Model 4:
, JGI7, SRW10 (self-returning walk count of order 10), piPC02 (molecular multiple path count of order 02), and Hy.
6.56 | <0.000014 | 63.3% | 70.6% | 67% | 77.5% | 70.4% | 75.5% | 64.8% (±13.6) | 58.3% (±4.05)

*Considering zero as the cutoff value between substrates and non-substrates. This threshold may later be optimized through ROC curve analysis to provide a background-dependent optimal balance between Sp and Se.
¹Results are presented as the average result for the folds ± the standard deviation.
²Results are presented as the average performance of the randomized models ± the standard deviation.
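
As an illustration of the first footnote, the sketch below computes Se, Sp, and overall accuracy at a cutoff of zero and then scans alternative cutoffs along the ROC curve. It is a minimal sketch under stated assumptions, not the procedure used to build the table: the scores and labels are invented toy data, the sign convention (a score above the cutoff predicts a substrate) is assumed, and Youden's J (Se + Sp - 1) is only one possible criterion for balancing Sp and Se. The function names `se_sp_acc` and `best_cutoff` are hypothetical.

```python
from typing import List, Sequence, Tuple


def se_sp_acc(scores: Sequence[float], labels: Sequence[int],
              cutoff: float = 0.0) -> Tuple[float, float, float]:
    """Return (Se, Sp, overall accuracy) at the given cutoff.

    Assumed convention: label 1 = substrate, label 0 = non-substrate,
    and a score strictly above the cutoff predicts "substrate".
    """
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if y == 1 and s > cutoff)
    fn = sum(1 for s, y in pairs if y == 1 and s <= cutoff)
    tn = sum(1 for s, y in pairs if y == 0 and s <= cutoff)
    fp = sum(1 for s, y in pairs if y == 0 and s > cutoff)
    se = tp / (tp + fn)            # sensitivity: substrates correctly recognized
    sp = tn / (tn + fp)            # specificity: non-substrates correctly recognized
    acc = (tp + tn) / len(pairs)   # overall accuracy
    return se, sp, acc


def best_cutoff(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Return the cutoff that maximizes Youden's J = Se + Sp - 1.

    Candidate cutoffs are the midpoints between consecutive distinct scores,
    which correspond to the interior operating points of the empirical ROC curve.
    """
    distinct: List[float] = sorted(set(scores))
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]

    def youden_j(c: float) -> float:
        se, sp, _ = se_sp_acc(scores, labels, c)
        return se + sp - 1

    return max(candidates, key=youden_j)


# Invented toy data (NOT taken from the article): discriminant scores and true classes.
toy_scores = [-1.2, -0.4, 0.3, 0.9, -0.1, 1.5, 0.2, -0.8]
toy_labels = [0, 0, 0, 1, 1, 1, 1, 0]

at_zero = se_sp_acc(toy_scores, toy_labels)             # metrics at the default cutoff of zero
optimized = best_cutoff(toy_scores, toy_labels)         # cutoff chosen by Youden's J
at_optimized = se_sp_acc(toy_scores, toy_labels, optimized)
print(at_zero, optimized, at_optimized)
```

A criterion other than Youden's J, for example one that weights Se more heavily when missing a substrate is the costlier error, would only require changing `youden_j`; this is one way to read the "background-dependent" balance mentioned in the footnote.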