Table 1: List of features and their descriptions in the initial dataset (the dataset is also available on the website of the Data Mining and Biomedical Informatics Lab at VCU, http://www.cioslab.vcu.edu/).

| Feature name | Type | Description and values | % missing |
|---|---|---|---|
| Encounter ID | Numeric | Unique identifier of an encounter | 0% |
| Patient number | Numeric | Unique identifier of a patient | 0% |
| Race | Nominal | Values: Caucasian, Asian, African American, Hispanic, and other | 2% |
| Gender | Nominal | Values: male, female, and unknown/invalid | 0% |
| Age | Nominal | Grouped in 10-year intervals: [0, 10), [10, 20), …, [90, 100) | 0% |
| Weight | Numeric | Weight in pounds | 97% |
| Admission type | Nominal | Integer identifier corresponding to 9 distinct values, for example, emergency, urgent, elective, newborn, and not available | 0% |
| Discharge disposition | Nominal | Integer identifier corresponding to 29 distinct values, for example, discharged to home, expired, and not available | 0% |
| Admission source | Nominal | Integer identifier corresponding to 21 distinct values, for example, physician referral, emergency room, and transfer from a hospital | 0% |
| Time in hospital | Numeric | Integer number of days between admission and discharge | 0% |
| Payer code | Nominal | Integer identifier corresponding to 23 distinct values, for example, Blue Cross/Blue Shield, Medicare, and self-pay | 52% |
| Medical specialty | Nominal | Integer identifier of the specialty of the admitting physician, corresponding to 84 distinct values, for example, cardiology, internal medicine, family/general practice, and surgeon | 53% |
| Number of lab procedures | Numeric | Number of lab tests performed during the encounter | 0% |
| Number of procedures | Numeric | Number of procedures (other than lab tests) performed during the encounter | 0% |
| Number of medications | Numeric | Number of distinct generic names administered during the encounter | 0% |
| Number of outpatient visits | Numeric | Number of outpatient visits of the patient in the year preceding the encounter | 0% |
| Number of emergency visits | Numeric | Number of emergency visits of the patient in the year preceding the encounter | 0% |
| Number of inpatient visits | Numeric | Number of inpatient visits of the patient in the year preceding the encounter | 0% |
| Diagnosis 1 | Nominal | The primary diagnosis (coded as the first three digits of ICD-9); 848 distinct values | 0% |
| Diagnosis 2 | Nominal | Secondary diagnosis (coded as the first three digits of ICD-9); 923 distinct values | 0% |
| Diagnosis 3 | Nominal | Additional secondary diagnosis (coded as the first three digits of ICD-9); 954 distinct values | 1% |
| Number of diagnoses | Numeric | Number of diagnoses entered into the system | 0% |
| Glucose serum test result | Nominal | Indicates the range of the result or whether the test was not taken. Values: “>200”, “>300”, “normal”, and “none” if not measured | 0% |
| A1c test result | Nominal | Indicates the range of the result or whether the test was not taken. Values: “>8” if the result was greater than 8%, “>7” if the result was greater than 7% but less than 8%, “normal” if the result was less than 7%, and “none” if not measured | 0% |
| Change of medications | Nominal | Indicates whether there was a change in diabetic medications (either dosage or generic name). Values: “change” and “no change” | 0% |
| Diabetes medications | Nominal | Indicates whether any diabetic medication was prescribed. Values: “yes” and “no” | 0% |
| 24 features for medications | Nominal | For the generic names metformin, repaglinide, nateglinide, chlorpropamide, glimepiride, acetohexamide, glipizide, glyburide, tolbutamide, pioglitazone, rosiglitazone, acarbose, miglitol, troglitazone, tolazamide, examide, sitagliptin, insulin, glyburide-metformin, glipizide-metformin, glimepiride-pioglitazone, metformin-rosiglitazone, and metformin-pioglitazone, the feature indicates whether the drug was prescribed or there was a change in the dosage. Values: “up” if the dosage was increased during the encounter, “down” if the dosage was decreased, “steady” if the dosage did not change, and “no” if the drug was not prescribed | 0% |
| Readmitted | Nominal | Days to inpatient readmission. Values: “<30” if the patient was readmitted in less than 30 days, “>30” if the patient was readmitted in more than 30 days, and “No” for no record of readmission | 0% |
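The Age feature is recorded as a half-open 10-year interval rather than as a number. When a numeric age is needed, one option is to parse the interval and take its midpoint. The Python sketch below is illustrative only: `age_bounds` and `age_midpoint` are hypothetical helper names, and it assumes labels are written as in the table (e.g. “[70, 80)”), while also accepting a hyphen separator such as “[70-80)” in case the raw file uses that form.

```python
import re

# Assumption: labels look like "[70, 80)" (as in Table 1) or "[70-80)".
_AGE_PATTERN = re.compile(r"^\[(\d+)\s*[-,]\s*(\d+)\)$")


def age_bounds(label: str) -> tuple[int, int]:
    """Parse a half-open 10-year age interval into (lower, upper) bounds."""
    match = _AGE_PATTERN.match(label.strip())
    if match is None:
        raise ValueError(f"not an age interval: {label!r}")
    low, high = int(match.group(1)), int(match.group(2))
    if high - low != 10:
        raise ValueError(f"expected a 10-year interval: {label!r}")
    return low, high


def age_midpoint(label: str) -> float:
    """Represent an age group by the midpoint of its interval."""
    low, high = age_bounds(label)
    return (low + high) / 2
```

For example, `age_midpoint("[70, 80)")` gives `75.0`. A midpoint keeps the ordering of the groups; treating each group as a separate category is an equally valid choice.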
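The three diagnosis features keep only the leading part of each ICD-9 code. A minimal sketch of that truncation follows, under the assumption that raw codes are ICD-9-CM strings with an optional decimal point (e.g. “250.83”). Keeping the part before the decimal yields the three-digit category for numeric codes; for supplementary codes the same rule keeps the “V” category (e.g. “V45”) or the “E” category (e.g. “E878”). The function name `icd9_category` is hypothetical.

```python
def icd9_category(code: str) -> str:
    """Reduce an ICD-9-CM code string to its category (the part before the decimal).

    Assumption: numeric categories are three digits, so short numeric codes
    such as "8" are zero-padded to "008".
    """
    category = code.strip().upper().split(".", 1)[0]
    if not category:
        raise ValueError(f"empty ICD-9 code: {code!r}")
    if category.isdigit():
        return category.zfill(3)
    return category
```

With this rule, `"250.83"` becomes `"250"` and `"V45.81"` becomes `"V45"`.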
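The A1c test result categories can be reproduced from a raw HbA1c measurement. The sketch below follows the thresholds in the table; `a1c_category` is a hypothetical name. The table does not say how a value of exactly 7% or 8% was binned, so the assignment of 7.0 to “normal” and 8.0 to “>7” is an assumption.

```python
from typing import Optional


def a1c_category(a1c_percent: Optional[float]) -> str:
    """Map a raw HbA1c value (in %) to the A1c test result categories.

    Assumption: a value of exactly 8.0 maps to ">7" and exactly 7.0 to "normal".
    """
    if a1c_percent is None:
        return "none"  # test was not taken
    if a1c_percent > 8:
        return ">8"
    if a1c_percent > 7:
        return ">7"  # greater than 7% but not greater than 8%
    return "normal"
```

For instance, 6.5 maps to “normal”, 7.4 to “>7”, and 9.1 to “>8”.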
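Each per-drug feature takes one of four values (“up”, “down”, “steady”, “no”). The sketch below summarizes one encounter's medication features into a count of prescribed drugs and a count of dosage changes. The function name and the output keys are hypothetical, and these summaries are an illustration rather than features defined in the table.

```python
from collections import Counter

ALLOWED_STATUSES = {"up", "down", "steady", "no"}


def summarize_medications(meds: dict[str, str]) -> dict[str, int]:
    """Summarize the per-drug features of one encounter.

    `meds` maps a generic name (e.g. "metformin") to "up", "down", "steady", or "no".
    """
    counts: Counter[str] = Counter()
    for drug, status in meds.items():
        if status not in ALLOWED_STATUSES:
            raise ValueError(f"{drug}: unexpected value {status!r}")
        counts[status] += 1
    # Any value other than "no" means the drug was prescribed.
    prescribed = counts["up"] + counts["down"] + counts["steady"]
    # "up" and "down" record a dosage change during the encounter.
    dose_changed = counts["up"] + counts["down"]
    return {"prescribed": prescribed, "dose_changed": dose_changed}
```

For example, `{"metformin": "steady", "insulin": "up", "glipizide": "no"}` yields two prescribed drugs and one dosage change.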
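The Readmitted feature has three values. A common way to turn it into a binary outcome is to treat “<30” as positive and both “>30” and “No” as negative. This grouping is an assumption for illustration, not something the table prescribes. The helper name `readmitted_within_30_days` is hypothetical, and it accepts “No” in any letter case in case the raw file capitalizes it differently.

```python
def readmitted_within_30_days(value: str) -> int:
    """Return 1 for a readmission in under 30 days, and 0 for ">30" or "No".

    Assumption: ">30" and "No" are grouped together as the negative class.
    """
    normalized = value.strip()
    if normalized == "<30":
        return 1
    if normalized == ">30" or normalized.lower() == "no":
        return 0
    raise ValueError(f"unexpected Readmitted value: {value!r}")
```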
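The “% missing” column is the share of encounters with no recorded value for a feature. The sketch below computes that percentage over a list of records. It assumes a missing entry is stored as `None`, an empty string, or “?”; that marker set is an assumption, as are the helper names and the toy rows, which are made up purely to show the arithmetic.

```python
from typing import Mapping, Optional

# Assumption: these strings mark a missing entry in the raw records.
MISSING_MARKERS = frozenset({"", "?"})


def is_missing(value: Optional[str]) -> bool:
    return value is None or value.strip() in MISSING_MARKERS


def percent_missing(rows: list[Mapping[str, Optional[str]]], feature: str) -> float:
    """Percentage of rows whose value for `feature` is absent or a missing marker."""
    if not rows:
        raise ValueError("no rows to summarize")
    n_missing = sum(1 for row in rows if is_missing(row.get(feature)))
    return 100.0 * n_missing / len(rows)


# Hypothetical toy records, not taken from the dataset.
rows = [
    {"weight": "?", "race": "Caucasian"},
    {"weight": "?", "race": "Asian"},
    {"weight": "180", "race": "?"},
    {"weight": "?", "race": "Hispanic"},
]
print(f"weight: {percent_missing(rows, 'weight'):.0f}% missing")  # weight: 75% missing
```

Rounding the result to a whole percent matches the precision reported in the table.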