Rodent Carcinogenicity Dataset

Laboratory of Chemometrics, National Institute of Chemistry, Hajdrihova 19, 1001 Ljubljana, Slovenia

The rodent carcinogenicity dataset was compiled from the Carcinogenic Potency Database (CPDBAS) and was used to develop classification quantitative structure-activity relationship (QSAR) models for predicting carcinogenicity with the counter-propagation artificial neural network (CP ANN) algorithm. The models were developed for regulatory use within the EU-funded project CAESAR. The dataset contains three kinds of information: general chemical identifiers (ID, chemical name, and CASRN), molecular structure information (SDF files and SMILES), and carcinogenic (toxicological) properties, namely carcinogenic potency (TD50_Rat_mg; carcinogen/noncarcinogen) and structural alerts (SA) for carcinogenicity based on mechanistic data. The molecular structure information was used to calculate molecular descriptors (254 MDL and 784 Dragon descriptors), which were then used in predictive QSAR modeling. The dataset presented in the paper can be used in future research in oncology, ecology, or chemical risk assessment.
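As an illustration of the record layout described above, the following is a minimal Python sketch of how one entry might be represented and parsed from a tabular export. The column names (`ID`, `Name`, `CASRN`, `SMILES`, `Class`, `SA`), the CSV layout, and the sample rows are assumptions made for this example and are not taken from the published files; only `TD50_Rat_mg` and the carcinogen/noncarcinogen labels are named in the text.

```python
import csv
import io
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Compound:
    """One dataset entry; field names are hypothetical."""
    compound_id: str
    name: str
    casrn: str
    smiles: str
    td50_rat_mg: Optional[float]      # carcinogenic potency, if reported
    carcinogen: bool                  # True for "carcinogen"
    structural_alert: Optional[str]   # SA label, if any


def parse_compounds(text: str) -> List[Compound]:
    """Parse CSV text with the assumed header into Compound records."""
    compounds = []
    for row in csv.DictReader(io.StringIO(text)):
        td50 = row["TD50_Rat_mg"].strip()
        alert = row["SA"].strip()
        compounds.append(
            Compound(
                compound_id=row["ID"].strip(),
                name=row["Name"].strip(),
                casrn=row["CASRN"].strip(),
                smiles=row["SMILES"].strip(),
                # An empty potency cell is kept as None rather than guessed.
                td50_rat_mg=float(td50) if td50 else None,
                carcinogen=row["Class"].strip().lower() == "carcinogen",
                structural_alert=alert or None,
            )
        )
    return compounds


# Invented placeholder rows, NOT real values from the dataset.
SAMPLE = """ID,Name,CASRN,SMILES,TD50_Rat_mg,Class,SA
X1,placeholder A,000-00-0,C,12.5,carcinogen,SA_example
X2,placeholder B,000-00-1,CC,,noncarcinogen,
"""

compounds = parse_compounds(SAMPLE)
n_carcinogens = sum(c.carcinogen for c in compounds)
```

Keeping the potency and structural-alert fields optional avoids imputing values for entries where they are absent; the class label is read directly from its own column rather than derived from the potency value.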