A Novel Molecular Representation Learning for Molecular Property Prediction with a Multiple SMILES-Based Augmentation

<div>One-hot vectorization of multiple SMILES strings for the estradiol molecule. Six randomly generated SMILES strings are vectorized using a randomly ordered character set for estradiol, which consists of 9 characters: “(”, “3”, “O”, “c”, “1”, “)”, “2”, “4”, “C”. The padding length is 37, which includes predefined extra padding. (a) c12cc(O)ccc1C1C(C3CCC(O)C3(C)CC1)CC2. (b) C1(O)C2(C)C(CC1)C1C(c3c(cc(O)cc3)CC1)CC2. (c) C12(C)C(CCC1O)C1C(c3ccc(O)cc3CC1)CC2. (d) Oc1ccc2c(c1)CCC1C3CCC(O)C3(C)CCC12. (e) OC1C2(C)CCC3c4ccc(O)cc4CCC3C2CC1. (f) C1C2c3ccc(O)cc3CCC2C2CCC(O)C2(C)C1.</div>
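The vectorization described in the caption can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the names `one_hot`, `encode_all`, and `extra_pad` are hypothetical, the character order is the one listed in the caption, and the padded length is assumed here to be the longest string plus `extra_pad` rather than the value stated in the caption.

```python
# Character set in the order listed in the figure caption.
CHARSET = ["(", "3", "O", "c", "1", ")", "2", "4", "C"]

# The six SMILES strings for estradiol, panels (a) to (f).
SMILES = [
    "c12cc(O)ccc1C1C(C3CCC(O)C3(C)CC1)CC2",
    "C1(O)C2(C)C(CC1)C1C(c3c(cc(O)cc3)CC1)CC2",
    "C12(C)C(CCC1O)C1C(c3ccc(O)cc3CC1)CC2",
    "Oc1ccc2c(c1)CCC1C3CCC(O)C3(C)CCC12",
    "OC1C2(C)CCC3c4ccc(O)cc4CCC3C2CC1",
    "C1C2c3ccc(O)cc3CCC2C2CCC(O)C2(C)C1",
]


def one_hot(smiles, charset, length):
    """Return a length x len(charset) 0/1 matrix; padding rows stay all zero."""
    if len(smiles) > length:
        raise ValueError("SMILES string is longer than the padded length")
    index = {ch: i for i, ch in enumerate(charset)}
    matrix = [[0] * len(charset) for _ in range(length)]
    for pos, ch in enumerate(smiles):
        matrix[pos][index[ch]] = 1  # one active column per character
    return matrix


def encode_all(smiles_list, charset, extra_pad=0):
    """Encode every string to a common length (assumed: longest + extra_pad)."""
    length = max(len(s) for s in smiles_list) + extra_pad
    return [one_hot(s, charset, length) for s in smiles_list]


if __name__ == "__main__":
    encoded = encode_all(SMILES, CHARSET, extra_pad=2)
    # Print panel (a) as rows of digits, one row per string position.
    for row in encoded[0]:
        print("".join(str(v) for v in row))
```

Each string yields a matrix of identical shape, so the six encodings of the same molecule can be stacked as separate training samples. Rows past the end of a string are all zero, which is how the padding shows up in the encoding.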

Computational Intelligence and Neuroscience

Figure 4: A Novel Molecular Representation Learning for Molecular Property Prediction with a Multiple SMILES-Based Augmentation