Review Article

Deep Learning in Cancer Diagnosis and Prognosis Prediction: A Minireview on Challenges, Recent Trends, and Future Directions

Table 1

Summary of the datasets for cancer research.

Dataset/service/project | Link | Type(s) of cancer(s) | Description

TCGA database |  | Multiple | cancer types; total no. of cases is 11125
Rotterdam tumor bank |  | Breast cancer | 2982 primary breast cancer patients; 1546 are positive cases
SUPPORT database | [31] | Multiple | 9105 adults; overall 6-month mortality rate of 47%
METABRIC dataset |  | Breast cancer | 2509 primary breast tumor subjects; 548 matched normal control subjects
MITOS-ATYPIA-14 dataset |  | Breast cancer | Resolution of pixels at 20x and 40x magnification levels
TUPAC 2016 dataset | [33] | Breast cancer | 500 training and 321 testing breast cancer histology whole-slide images
INbreast dataset | [34] | Breast cancer | Total of 115 cases and 410 images
LIDC-IDRI database |  | Lung cancer | CT scans of 1018 subjects; three categories: (i)  mm, (ii)  mm, and (iii)  mm
LUNA16 dataset |  | Lung cancer | 888 CT scans; facilitates segmentation studies
BreakHis dataset |  | Breast cancer | 9109 microscopic images collected from 82 subjects at four magnification levels (40x, 100x, 200x, and 400x)
2015 Bioimaging Breast Histology Classification Challenge |  | Breast cancer | Four classes: normal, benign, in situ carcinoma, and invasive carcinoma; resolution of pixels
CAMELYON dataset | https://camelyon17.grand-challenge.org | Breast cancer | Facilitates patient-level analysis; 1399 unique whole-slide images; four categories: no metastases, macrometastases, micrometastases, and isolated tumor cells
PatchCamelyon dataset |  | Breast cancer | 327,680 color images with resolution of pixels; bigger than the CIFAR10 dataset and smaller than the ImageNet dataset
2018 ICIAR dataset |  | Breast cancer | 400 microscopy images, 100 per class; four classes: normal, benign, in situ carcinoma, and invasive carcinoma
MITOS12 dataset |  | Breast cancer | 50 biopsy slides; 40x magnification level; more than 300 mitoses
Leukemia microarray gene data |  | Bone marrow cancer | 60 bone marrow samples; acute lymphoblastic leukemia, acute myeloid leukemia, chronic lymphocytic leukemia, chronic myeloid leukemia, and healthy bone marrow
Gene Expression Omnibus repository |  |  | Comprehensive sets of microarray, next-generation sequencing, and other genomic data
BioGPS data portal |  | Multiple | Eight species including humans; supports different types of cancers
TCIA | https://www.cancerimagingarchive.net | Multiple | Supports a large number of modalities; also provides patient outcomes, treatment details, and genomics data
GDC | https://gdc.cancer.gov | Multiple | Provides genomic, clinical, and biospecimen data
TARGET |  | Multiple | cancers are supported; provides vast amounts of genomic data to estimate molecular alterations
1000 Genomes Project |  |  | A comprehensive resource on human genetic variation
Kvasir dataset |  | Gastrointestinal tract cancer | 4000 annotated images belonging to 8 classes
UCSB-BB dataset |  | Breast cancer research in human species | Contains images of human, monkey, and cat species at subcellular, cellular, and tissue levels
BRATS dataset |  | Brain tumor | MRI scans of 65 subjects each in the clinical and synthetic datasets, for the brain tumor segmentation task