Dataset Papers in Medicine
Volume 2013 (2013), Article ID 361615, 6 pages
http://dx.doi.org/10.1155/2013/361615
Dataset Paper

Rodent Carcinogenicity Dataset

Laboratory of Chemometrics, National Institute of Chemistry, Hajdrihova 19, 1001 Ljubljana, Slovenia

Received 1 June 2012; Accepted 27 June 2012

Academic Editors: E. Frei and K. van Golen

This dataset has been dedicated to the public domain using the CC0 waiver.

Dataset http://dx.doi.org/10.1155/2013/361615/dataset

Dataset

Dataset Item 1 (Table). A list of 805 chemicals from CPDBAS used for carcinogenicity modeling, with each chemical assigned to the training or test set. The chemicals were extracted from the original dataset of 1481 chemicals downloaded from the Distributed Structure-Searchable Toxicity (DSSTox) Public Database Network (http://www.epa.gov/ncct/dsstox/sdf_cpdbas.html). The column ID_v5 presents the codes of the chemicals used in the CAESAR project (ID of chemicals in database version 5); ID_CPDBAS_Original, the ID number taken from the DSSTox Public Database Network version 3b; Chemical Name, the chemical names taken from DSSTox and double-checked against PubChem Compound (NCBI) (http://www.ncbi.nlm.nih.gov/sites/entrez?db=pccompound); CASRN, the Chemical Abstracts Service registry number, taken from DSSTox and double-checked against PubChem Compound (NCBI). In the column Carcinogenic Potency Expressed as TD50, TD50 is the dose rate in milligrams per kilogram of body weight per day which, if administered chronically for the standard lifespan of the species, will halve the probability of remaining tumorless throughout that period. The TD50 value reported is the harmonic mean of the most potent TD50 values from each positive experiment in the species. All the values were derived from the Carcinogenic Potency Database (http://potency.berkeley.edu/cpdb.html). In the column Carcinogenic Potency Expressed as P or NP, “P” means positive or active (carcinogens) and “NP” means not positive or inactive (noncarcinogens). In the column Set, “Training” marks chemicals in the training set and “Test” marks chemicals in the test (prediction) set.

  • Column 1: ID_v5
  • Column 2: ID_CPDBAS_Original
  • Column 3: Chemical Name
  • Column 4: CASRN
  • Column 5: Carcinogenic Potency Expressed as TD50 (mg kg−1 d−1)
  • Column 6: Carcinogenic Potency Expressed as P or NP
  • Column 7: Set
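
The harmonic-mean aggregation and the training/test labeling described for Dataset Item 1 can be illustrated with a minimal Python sketch. The TD50 values and the CSV rows below are invented for illustration and are not taken from the CPDB or from the dataset; the CSV layout (comma-separated, headers equal to the column names above) and the helper name split_by_set are assumptions as well.

```python
import csv
import io
from statistics import harmonic_mean

# Hypothetical most-potent TD50 values (mg/kg/day), one per positive
# experiment in a single species. Invented numbers, not CPDB data.
most_potent_td50 = [10.0, 40.0]

# Harmonic mean: n / sum(1 / x_i). Compared with the arithmetic mean,
# it is pulled toward the smaller (more potent) values.
species_td50 = harmonic_mean(most_potent_td50)

# Invented rows mimicking three columns of Dataset Item 1.
SAMPLE = """ID_v5,Carcinogenic Potency Expressed as P or NP,Set
1,P,Training
2,NP,Test
3,NP,Training
"""


def split_by_set(rows):
    """Partition rows into (training, test) using the Set column."""
    training = [r for r in rows if r["Set"] == "Training"]
    test = [r for r in rows if r["Set"] == "Test"]
    return training, test


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
training, test = split_by_set(rows)
```

Here the harmonic mean of 10 and 40 is 2 / (1/10 + 1/40) = 16, lower than the arithmetic mean of 25.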

Dataset Item 2 (Table). A list of the same 805 chemicals from CPDBAS with additional chemical information for the studied compounds and detailed information about carcinogenic potency from the results of animal tests (rats and mice). Structural alerts (SAs) for carcinogenicity extracted from Toxtree are included in the last columns. Table 1 presents the structural diversity of the CAESAR dataset of 805 chemicals, described by the presence of specific structural alerts (SAs) extracted from the Toxtree program, with the number of chemicals in the carcinogenicity dataset. In Dataset Item 2 (Table), the column ID_v5 presents the ID number used in the model; ID_CPDBAS_Original, the ID number in CPDBAS; STRUCTURE_Formula, the empirical molecular formula; STRUCTURE_MolecularWeight, the molecular weight or molar mass (atomic mass units); TestSubstance_ChemicalName, the common or trade name of the chemical; and TestSubstance_CASRN, the Chemical Abstracts Service (CAS) Registry Number of the tested substance. In the column STRUCTURE_ChemicalName_IUPAC, IUPAC (International Union of Pure and Applied Chemistry) refers to the standardized nomenclature of organic chemistry. The column STRUCTURE_SMILES presents the Simplified Molecular Input Line Entry System (SMILES) molecular text code of the displayed STRUCTURE. In the columns TD50_Rat_mg, TD50_Rat_mmol, TD50_Mouse_mg, and TD50_Mouse_mmol, TD50 is a standardized quantitative measure of carcinogenic potency (analogous to an LD50) and is computed in the CPDB for each species/sex/tissue/tumor type for each experiment (see http://potency.berkeley.edu/td50harmonicmean.html). In the columns TargetSites_Rat_Male, TargetSites_Rat_Female, TargetSites_Rat_BothSexes, TargetSites_Mouse_Male, TargetSites_Mouse_Female, and TargetSites_Mouse_BothSexes, target sites (e.g., liver, lung) are reported for each sex-species group with a positive result in the CPDB (see http://potency.berkeley.edu/pathology.table.html). The column NTP_TechnicalReport presents the National Toxicology Program Technical Report number of the study; Website URL, the Internet address for chemical-specific data or content; Alert Type, the type of structural alert (SA) for carcinogenicity, where GA stands for genotoxic alert, nGA stands for non-genotoxic alert, and NA stands for no alert; Alert 1, the first structural alert (SA1) for carcinogenicity in the molecule; Alert 2, the second structural alert (SA2) for carcinogenicity in the molecule.

  • Column 1: ID_v5
  • Column 2: ID_CPDBAS_Original
  • Column 3: STRUCTURE_Formula
  • Column 4: STRUCTURE_MolecularWeight
  • Column 5: TestSubstance_ChemicalName
  • Column 6: TestSubstance_CASRN
  • Column 7: STRUCTURE_ChemicalName_IUPAC
  • Column 8: STRUCTURE_SMILES
  • Column 9: TD50_Rat_mg (mg kg−1 d−1)
  • Column 10: TD50_Rat_mmol (mmol kg−1 d−1)
  • Column 11: TargetSites_Rat_Male
  • Column 12: TargetSites_Rat_Female
  • Column 13: TargetSites_Rat_BothSexes
  • Column 14: TD50_Mouse_mg (mg kg−1 d−1)
  • Column 15: TD50_Mouse_mmol (mmol kg−1 d−1)
  • Column 16: TargetSites_Mouse_Male
  • Column 17: TargetSites_Mouse_Female
  • Column 18: TargetSites_Mouse_BothSexes
  • Column 19: NTP_TechnicalReport
  • Column 20: Website URL
  • Column 21: Alert Type
  • Column 22: Alert 1
  • Column 23: Alert 2
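
The relation between the mg-based and mmol-based TD50 columns can be sketched as a unit conversion, assuming the mmol values follow from dividing the mg values by the molecular weight (mg divided by g/mol gives mmol). The source does not state how the mmol columns were computed, so this is an assumption; the function name td50_mg_to_mmol and the example numbers are hypothetical.

```python
def td50_mg_to_mmol(td50_mg_per_kg_day, molecular_weight):
    """Convert a TD50 from mg/kg/day to mmol/kg/day.

    Assumes molecular_weight is in g/mol, so mg / (g/mol) = mmol.
    Returns None when no TD50 is available (e.g., no positive result).
    """
    if td50_mg_per_kg_day is None:
        return None
    if molecular_weight <= 0:
        raise ValueError("molecular weight must be positive")
    return td50_mg_per_kg_day / molecular_weight


# Invented example: 100 mg/kg/day for a compound of 200 g/mol.
example_mmol = td50_mg_to_mmol(100.0, 200.0)
```

Expressing potency in mmol rather than mg puts compounds of different molecular weight on a per-molecule basis.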

Dataset Item 3 (Chemical Structure Data). Collection of SDF files for the 805 chemicals listed in the carcinogenicity dataset. To create QSAR models, we calculated chemical descriptors from the chemical structures.
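
As a rough illustration of how SDF text can be read, the sketch below splits it into records at the "$$$$" delimiter and collects the "> <NAME>" data fields of each record. It uses only the Python standard library and does not interpret the connection table; a cheminformatics toolkit would be needed for that. The sample records, the field name ID_v5 inside the SDF, and the function name parse_sdf_records are assumptions for illustration, and the molfile blocks are empty placeholders rather than real structures.

```python
def parse_sdf_records(text):
    """Split SDF text into records and collect each record's data fields."""
    records, current = [], []
    for line in text.splitlines():
        if line.strip() == "$$$$":  # record delimiter
            records.append(current)
            current = []
        else:
            current.append(line)

    parsed = []
    for lines in records:
        fields = {}
        i = 0
        while i < len(lines):
            line = lines[i]
            if line.startswith(">") and "<" in line:
                start = line.index("<")
                name = line[start + 1:line.index(">", start)]
                values = []
                i += 1
                # Field values run until the next blank line.
                while i < len(lines) and lines[i].strip() != "":
                    values.append(lines[i])
                    i += 1
                fields[name] = "\n".join(values)
            i += 1
        parsed.append({"title": lines[0] if lines else "", "fields": fields})
    return parsed


# Two placeholder records (zero-atom molfile blocks, invented field values).
SAMPLE_SDF = "\n".join([
    "chemical_1", "  hypothetical", "",
    "  0  0  0  0  0  0  0  0  0  0999 V2000", "M  END",
    "> <ID_v5>", "1", "", "$$$$",
    "chemical_2", "  hypothetical", "",
    "  0  0  0  0  0  0  0  0  0  0999 V2000", "M  END",
    "> <ID_v5>", "2", "", "$$$$",
])

parsed = parse_sdf_records(SAMPLE_SDF)
```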

Dataset Item 4 (Table). Values of 254 MDL descriptors for 805 chemicals.

  • Column 1: ID_v5
  • Column 2: ID_CPDBAS_Original
  • Column 3: (1) SsCH3
  •   ⋮
  • Column 254: (252) totop
  • Column 255: (253) Wt
  • Column 256: (254) nclass
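
One way to prepare the descriptor table for QSAR modeling is to join it with the P/NP labels of Dataset Item 1 on ID_v5 and keep only the rows of one set. The sketch below assumes both tables have been exported as comma-separated text with header rows; the sample values, the reduced descriptor set (SsCH3 and Wt only), and the function name build_matrix are invented for illustration.

```python
import csv
import io

# Invented rows mimicking Dataset Item 1 (labels and set membership).
LABELS_CSV = """ID_v5,Carcinogenic Potency Expressed as P or NP,Set
1,P,Training
2,NP,Test
3,NP,Training
"""

# Invented rows mimicking Dataset Item 4, reduced to two descriptors.
DESCRIPTORS_CSV = """ID_v5,ID_CPDBAS_Original,SsCH3,Wt
1,101,1.5,120.0
2,102,0.0,98.1
3,103,2.25,150.3
"""

ID_COLUMNS = ("ID_v5", "ID_CPDBAS_Original")
LABEL_COLUMN = "Carcinogenic Potency Expressed as P or NP"


def build_matrix(labels_text, descriptors_text, subset):
    """Return (ids, X, y) for the chemicals whose Set equals `subset`."""
    labels = {r["ID_v5"]: r for r in csv.DictReader(io.StringIO(labels_text))}
    ids, X, y = [], [], []
    for row in csv.DictReader(io.StringIO(descriptors_text)):
        label = labels.get(row["ID_v5"])
        if label is None or label["Set"] != subset:
            continue
        ids.append(row["ID_v5"])
        # Every column that is not an identifier is treated as a descriptor.
        X.append([float(v) for k, v in row.items() if k not in ID_COLUMNS])
        # Encode P (carcinogen) as 1 and NP (noncarcinogen) as 0.
        y.append(1 if label[LABEL_COLUMN] == "P" else 0)
    return ids, X, y


train_ids, X_train, y_train = build_matrix(LABELS_CSV, DESCRIPTORS_CSV, "Training")
```

Joining on ID_v5 rather than on row order keeps the features and labels aligned even if the two tables are sorted differently.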

Dataset Item 5 (Table). A list of 254 MDL descriptors with their signs and definitions.

  • Column 1: ID
  • Column 2: MDL Number
  • Column 3: Descriptors’ Sign
  • Column 4: Definition
  • Column 5: Class

Dataset Item 6 (Table). Values of 784 Dragon descriptors for 805 chemicals.

  • Column 1: ID_v5
  • Column 2: DRA0001 MW
  • Column 3: DRA0002 AMW
  •   ⋮
  • Column 783: DRA0833 MLOGP2
  • Column 784: DRA0834 ALOGP
  • Column 785: DRA0835 ALOGP2

Dataset Item 7 (Table). A list of 784 Dragon descriptors with their signs and definitions.

  • Column 1: Internal Code
  • Column 2: Symbol
  • Column 3: Definition
  • Column 4: Class