Research Article

BLATTA: Early Exploit Detection on Network Traffic with Recurrent Neural Networks

Table 1

Related works in exploit detection. Unlike previous works, Blatta does not have to read until the end of application layer messages to detect exploit traffic.

| Paper | Features | Detection method | Dataset(s) | Learning type | Protocol(s) | Early prediction |
| --- | --- | --- | --- | --- | --- | --- |
| PAYL [7] | Relative frequency count of each 1-gram | Statistical model and Mahalanobis distance | D, SG | U | HTTP, SMTP, SSH | No |
| RePIDS [8] | Mahalanobis distance map derived from the relative frequency count of each 1-gram, filtered by PCA | Statistical model and Mahalanobis distance | D, M | U | HTTP | No |
| McPAD [9] | n-grams | Multi one-class SVM classifier | D, M | U | HTTP | No |
| HMMPayl [10] | Byte sequences of the L7 payload | Ensemble of HMMs | D, M, DI | U | HTTP | No |
| Oza et al. [11] | Relative frequency count of each 1-gram | Statistical model | D, M, SG | U | HTTP | No |
| OCPAD [12] | High-order n-grams (n > 1) | Occurrence probability of n-grams in a packet | M, SG | U | HTTP | No |
| Bartos et al. [13] | Information from HTTP request headers and the lengths | SVM | SG | S | HTTP | No |
| Zhang et al. [14] | Packet header information and HTTP and DNS messages | Naïve Bayes, Bayesian network, SVM | D, SG | S | DNS, HTTP | No |
| Decanter [15] | HTTP messages | Clustering | SG | U | HTTP | No |
| Golait and Hubballi [16] | Byte sequence of the L7 payload | Probabilistic counting deterministic timed automata | SG | U | SIP | No |
| Duessel et al. [17] | Contextual n-grams of the L7 payload | One-class SVM | SG | U | HTTP, RPC | No |
| Min et al. [18] | Words of the L7 payload | CNN and random forest | I | S | HTTP | No |
| Jin et al. [19] | n-grams | Multi one-class SVM classifier | M | U | HTTP | No |
| Hao et al. [20] | Byte sequence of the L7 payload | Variant gated recurrent unit | I | S | HTTP | No |
| Schneider and Bottinger [21] | Byte sequence of the L7 payload | Stacked autoencoder | O | U | Modbus | No |
| Hamed et al. [22] | n-grams of base64-encoded payload | SVM | I | S | All protocols in the datasets | No |
| Pratomo et al. [23] | Byte frequency of application layer messages | Outlier detection with deep autoencoder | SW | U | HTTP, SMTP | No |
| Qin et al. [24] | Byte sequence of the L7 payload | Recurrent neural network | O | S | HTTP | No |
| Liu et al. [25] | Byte sequence of the L7 payload | Recurrent neural network with embedded vectors | D, O | S | HTTP | No |
| Zhao and Ahn [26] | Disassembled instructions of bytes in network traffic | Markov chain-based model and SVM | SG | S | Not mentioned | No |
| Shabtai et al. [27] | n-grams of a file and n-grams of opcodes in a file, followed by TF/IDF calculated over those n-grams | Various ML algorithms, e.g., random forest, decision tree, Naïve Bayes, and a few others | SG | S | File classification | No |
| SigFree [28] | Disassembled instructions of bytes in application layer payload | Analysis of instruction sequences to determine whether they are code | SG | Non-ML | HTTP | No |
| Proposed approach | High-order n-grams of application layer messages | Recurrent neural network for early prediction of exploit traffic | SW, SG | S | HTTP, FTP | Yes |

D = DARPA99; M = McPAD attacks dataset [9]; I = ISCX 2012; SG = self-generated; DI = DIEE; SW = UNSW-NB15; O = others; U = unsupervised; S = supervised; non-ML = non-machine learning approach.
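
Two feature types recur in Table 1: the relative frequency count of each 1-gram and high-order n-grams of the payload bytes. The following Python sketch illustrates how such features could be extracted from a single application layer message. It is a minimal illustration only and is not the implementation used by Blatta or by any of the cited works; the function names and the sample payload are hypothetical.

```python
from collections import Counter


def byte_relative_frequencies(payload: bytes) -> list[float]:
    """Relative frequency of each of the 256 possible byte values (1-grams)."""
    if not payload:
        # An empty message has no bytes to count.
        return [0.0] * 256
    counts = Counter(payload)  # iterating over bytes yields integers 0..255
    total = len(payload)
    return [counts.get(value, 0) / total for value in range(256)]


def byte_ngrams(payload: bytes, n: int) -> list[bytes]:
    """All overlapping byte n-grams of the payload, in order of appearance."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # A payload shorter than n yields an empty list.
    return [payload[i:i + n] for i in range(len(payload) - n + 1)]


# Hypothetical HTTP request line, used only as sample input.
payload = b"GET /index.html HTTP/1.1\r\n"
freqs = byte_relative_frequencies(payload)  # 256-dimensional feature vector
trigrams = byte_ngrams(payload, 3)          # high-order n-grams with n = 3
```

The 1-gram vector has a fixed length of 256 and discards byte order, whereas high-order n-grams preserve local ordering and can be produced incrementally as bytes arrive, without waiting for the end of the message.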