Research Article

A Novel Molecular Representation Learning for Molecular Property Prediction with a Multiple SMILES-Based Augmentation

Table 3

AUROC and AUPRC scores of various approaches on the HIV and BACE datasets. Some of the reported values are taken from the related references [2, 8, 13, 32].

Category                      Model             HIV AUROC  HIV AUPRC  BACE AUROC  BACE AUPRC

CheMixNet                     CNN_RNN           0.8204     0.8614     0.7429      0.7162

SMILES-based                  Smi2Vec-BiGRU     0.9117     0.8963     0.8440      0.7872

Conventional methods          XGBoost           0.7560     -          0.8500      -
                              Multitask         0.6980     -          0.8240      -

Graph-based methods           GC                0.7630     -          0.7830      -
                              Weave             0.7030     -          0.8060      -
                              Pretraining GNN   0.7990     0.7806     0.8450      0.7908

3D-based models               Drug3D-Net        0.9621     0.9617     0.7185      0.6397

Our method (multiple SMILES)  RNN (one layer)   0.9567     0.9525     0.7879      0.7577
                              RNN (two layers)  0.9613     0.9636     0.8083      0.7665
                              CNN_RNN           0.9767     0.9798     0.8512      0.7919

The best results are highlighted in bold; in every column they belong to the CNN_RNN variant of our method.
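For readers unfamiliar with the two metrics reported in Table 3, the following is a minimal sketch of how AUROC and AUPRC can be computed from binary labels and predicted scores. It is not the evaluation code used in the paper: the function names `auroc` and `average_precision` are hypothetical, the toy labels and scores are invented for illustration, and AUPRC is approximated here by step-wise average precision, which is one common estimator (the source does not state which estimator was used).

```python
def auroc(labels, scores):
    """AUROC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise estimate of the area under the precision-recall curve.

    Assumes no tied scores; precision is averaged over the ranks
    at which a positive sample is retrieved.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive sample")
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at this recall step
    return total / n_pos


# Toy data (invented): 1 = active compound, 0 = inactive compound.
toy_labels = [1, 0, 1, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

print(f"AUROC: {auroc(toy_labels, toy_scores):.4f}")
print(f"AUPRC (average precision): {average_precision(toy_labels, toy_scores):.4f}")
```

On imbalanced datasets, where active compounds are a small minority, AUPRC is generally more sensitive to the quality of positive predictions than AUROC, which is a common reason for reporting both.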