Review Article

Deep Learning in Cancer Diagnosis and Prognosis Prediction: A Minireview on Challenges, Recent Trends, and Future Directions

Table 1

Summary of the datasets for cancer research.

| Dataset/service/project | Link | Type(s) of cancer(s) | Description |
| --- | --- | --- | --- |
| TCGA database | https://www.cancer.gov/aboutnci/organization/ccg/research/structural-genomics/tcga | Multiple | 33 cancer types; total no. of cases is 11125 |
| Rotterdam tumor bank | https://stat.ethz.ch/R-manual/R-devel/library/survival/html/rotterdam.html | Breast cancer | 2982 primary breast cancer patients; 1546 are positive cases |
| SUPPORT database | [31] | Multiple | 9105 adults; overall 6-month mortality rate of 47% |
| METABRIC dataset | https://www.cbioportal.org/study/summary?id=brca_metabric | Breast cancer | 2509 primary breast tumor subjects; 548 matched normal control subjects |
| MITOS-ATYPIA-14 dataset | https://mitos-atypia-14.grand-challenge.org/Home/ | Breast cancer | Resolution of pixels at 20x and 40x magnification levels |
| TUPAC 2016 dataset | [33] | Breast cancer | 500 training and 321 testing breast cancer histology whole-slide images |
| INbreast dataset | [34] | Breast cancer | Total of 115 cases and 410 images |
| LIDC-IDRI database | https://wiki.cancerimagingarchive.net/display/Public/LIDCIDRI#1966254194132fe653e4a7db00715f6f775c012 | Lung cancer | CT scans of 1018 subjects; three categories (i)  mm, (ii)  mm, and (iii)  mm |
| LUNA16 dataset | https://luna16.grandchallenge.org/Data/ | Lung cancer | 888 CT scans; facilitates segmentation studies |
| BreakHis dataset | https://web.inf.ufpr.br/vri/databases/breast-cancerhistopathological-database-breakhis/ | Breast cancer | 9109 microscopic images collected from 82 subjects at four magnification levels (40x, 100x, 200x, and 400x) |
| 2015 Bioimaging Breast Histology Classification Challenge | https://rdm.inesctec.pt/dataset/nis-2017-003 | Breast cancer | Four classes: normal, benign, in situ carcinoma, and invasive carcinoma; resolution of pixels |
| CAMELYON dataset | https://camelyon17.grand-challenge.org | Breast cancer | Facilitates patient-level analysis; 1399 unique whole-slide images; labels cover no metastases, macrometastases, micrometastases, and isolated tumor cells |
| PatchCamelyon dataset | https://www.tensorflow.org/datasets/catalog/patch_camelyon | Breast cancer | 327,680 color images with resolution of pixels; bigger than CIFAR10 and smaller than ImageNet dataset |
| 2018 ICIAR dataset | https://iciar2018-challenge.grand-challenge.org/Dataset/ | Breast cancer | 400 microscopy images, 100 per class, representing normal, benign, in situ carcinoma, and invasive carcinoma |
| MITOS12 dataset | http://ludo17.free.fr/mitos_2012/dataset.html | Breast cancer | 50 biopsy slides; 40x magnification level; more than 300 mitoses |
| Leukemia microarray gene data | https://www.bioconductor.org/packages/devel/data/experiment/manuals/leukemiasEset/man/leukemiasEset.pdf | Bone marrow cancer | 60 bone marrow samples covering acute lymphoblastic leukemia, acute myeloid leukemia, chronic lymphocytic leukemia, chronic myeloid leukemia, and healthy bone marrow |
| Gene Expression Omnibus repository | https://www.ncbi.nlm.nih.gov/geo/ | Multiple | Provides comprehensive sets of microarray, next-generation sequencing, and other genomic data |
| BioGPS data portal | http://biogps.org/#goto=welcome | Multiple | Supports eight species including humans; supports different types of cancers |
| TCIA | https://www.cancerimagingarchive.net | Multiple | Supports a large number of modalities; also provides patient outcomes, treatment details, and genomics data |
| GDC | https://gdc.cancer.gov | Multiple | Provides genomic, clinical, and biospecimen data |
| TARGET | https://ocg.cancer.gov/programs/target# | Multiple | Supports childhood cancers; provides vast amounts of genomic data to estimate molecular alterations |
| 1000 Genomes Project | https://www.internationalgenome.org/1000-genomes-summary | Multiple | Provides a comprehensive resource on human genetic variation |
| Kvasir dataset | https://dl.acm.org/do/10.1145/3193289/abs/ | Gastrointestinal tract cancer | 4000 annotated images belonging to 8 classes |
| UCSB-BB dataset | https://bioimage.ucsb.edu/research/bio-segmentation | Supports breast cancer research in human species | Contains images of human, monkey, and cat species at subcellular, cellular, and tissue levels |
| BRATS dataset | https://www.med.upenn.edu/cbica/brats2020/ | Brain tumor | MRI scans of 65 subjects each in clinical and synthetic datasets, for the brain tumor segmentation task |
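
When shortlisting datasets from Table 1 for a particular study, it can help to hold the rows in a small machine-readable structure. The following is a minimal sketch, not part of the original review: the `DatasetEntry` class and the helpers `by_cancer_type` and `positive_fraction` are hypothetical names introduced for illustration, and only four rows of the table are copied in. It filters entries by cancer type and computes the share of positive cases for the Rotterdam tumor bank from the counts given in the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetEntry:
    """One row of Table 1 (hypothetical container, for illustration only)."""
    name: str
    cancer_type: str
    description: str


# A few rows copied from Table 1; the remaining rows follow the same pattern.
CATALOG = [
    DatasetEntry("Rotterdam tumor bank", "Breast cancer",
                 "2982 primary breast cancer patients; 1546 are positive cases"),
    DatasetEntry("LUNA16 dataset", "Lung cancer",
                 "888 CT scans; facilitates segmentation studies"),
    DatasetEntry("INbreast dataset", "Breast cancer",
                 "Total of 115 cases and 410 images"),
    DatasetEntry("Kvasir dataset", "Gastrointestinal tract cancer",
                 "4000 annotated images belonging to 8 classes"),
]


def by_cancer_type(catalog, cancer_type):
    """Return the names of entries whose cancer type matches, ignoring case."""
    wanted = cancer_type.strip().lower()
    return [entry.name for entry in catalog
            if entry.cancer_type.lower() == wanted]


def positive_fraction(positives, total):
    """Return the share of positive cases; a rough class-balance indicator."""
    if total <= 0:
        raise ValueError("total must be positive")
    return positives / total


breast = by_cancer_type(CATALOG, "breast cancer")
# Counts taken from the Rotterdam tumor bank row of Table 1.
rotterdam_balance = positive_fraction(1546, 2982)

print(breast)
print(f"Rotterdam positive fraction: {rotterdam_balance:.3f}")
```

With the counts from the table, the Rotterdam cohort is close to balanced (a little over half of the cases are positive), which matters when choosing evaluation metrics for a prognosis model.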