Review Article

Deep Learning Methods for Malware and Intrusion Detection: A Systematic Literature Review

Table 13. Windows-based malware datasets.

Dataset [reference] | Description, size, type
API Call dataset [149] | 7107 malicious samples belonging to various families, such as viruses, backdoors, and trojans, were analyzed, categorized by family, and made available to researchers.
EMBER [158] | A labelled benchmark dataset for training machine learning models to statically detect malicious Windows portable executable (PE) files. It includes features extracted from 1.1M binary files.
SOREL-20M [159] | A large-scale dataset of nearly 20 million files. It provides pre-extracted features and metadata, high-quality labels derived from multiple sources, and information about vendor detections of each malware sample at the time of collection. Each malware sample also carries additional “tags” that serve as additional targets.