Security and Communication Networks

Research Article

Android Malware Characterization Using Metadata and Machine Learning Techniques

Table 1

Detailed summary of previous malware detection systems in Android.


Title	Target	Pros	Cons	Dataset	Performance

Drebin [2]	On-device malware detection using code and manifest permissions	On-device; Explainability	Obfuscation; Accuracy for imbalanced data	123,453	Accuracy: 0.94
Yerima et al. [3]	Bayesian analysis over permissions and other code features for zero-day analysis	Simple model; High accuracy	Small sample set; Code inspection	1,000	Accuracy: 0.93
Sahs et al. [4]	Use of SVM over permissions and control flow graphs for malware detection	Anomaly detection-based approach	Unbalanced results; Offline system	-	F1-score: 0.25
Peiravian et al. [5]	Tree and SVM classifiers over permissions and API calls	High accuracy; Robust modelling	Code analysis; Unknown processing times; API calls can be benign	3,000	F1-score: 0.95
AndroDialysis [6]	Analysis of application intents and permissions as features for machine learning	High accuracy; On-device	Code analysis; Unbalanced dataset	7,406	Accuracy: 0.955
Yerima et al. [7]	Ensemble learning over app code and API calls	High accuracy; Feature selection not needed	Code analysis; Large-feature model	6,863	Accuracy: 0.97
PUMA [8]	Permission analysis	Simple approach; Robust modelling	Tiny dataset; Narrow feature set	239	AUC: 0.92
Ham et al. [9]	Classifier using app’s runtime parameters	Predictive features; High F1-score	Code analysis; Unknown processing times	14,794	F1-score: 0.99
Karim et al. [10]	User profile single feature classification	Compact model; Low false positive rate	Complex feature extraction; User profiles can be fooled	4,117	False Positive Rate: 2.1
DroidChain [11]	Behaviour chain detection of malware	Zero-day support; Behaviour detection	Code analysis; Restricted to 4 malware models	1,260	F1-score: 0.8
Andromaly [12]	Application monitoring and anomaly detection	Distributed service; On-device	Monitoring impact on device; Lack of ground truth; May require device rooting	20	AUC: 0.8-0.99
RiskRanker [13]	Zero-day malware detection through automated code analysis	Large dataset; Multi-stage; Risk assessment	Code analysis; Limited support of behaviours; Signature dependent	118,318	Number of Detections: 718
DroidAPIMiner [14]	KNN classifiers over bytecode semantic information	Fast solution; Family-aware system	Code analysis; API calls can be benign; Static whitelisting	20,000	Accuracy: 0.99
Massvet [15]	Application code differential analysis to spot clones by comparing with market	Very large dataset; Scalable; Pairwise app comparison	Vulnerable to code changes; Code analysis	1.2M	Number of Detections: 127,429