Research Article

Android Malware Characterization Using Metadata and Machine Learning Techniques

Table 1

Detailed summary of previous malware detection systems in Android.

| Title | Target | Pros | Cons | Dataset | Performance |
|---|---|---|---|---|---|
| Drebin [2] | On-device malware detection using code and manifest permissions | On-device; Explainability | Obfuscation; Accuracy for imbalanced | 123,453 | Accuracy: 0.94 |
| Yerima et al. [3] | Bayesian analysis over permissions and other code features for zero-day analysis | Simple model; High accuracy | Small sample set; Code inspection | 1,000 | Accuracy: 0.93 |
| Sahs et al. [4] | Use of SVM over permissions and control flow graphs for malware detection | Anomaly detection-based approach | Unbalanced results; Offline system | - | F1-score: 0.25 |
| Peiravian et al. [5] | Tree and SVM classifiers over permissions and API calls | High accuracy; Robust modelling | Code analysis; Unknown processing times; API calls can be benign | 3,000 | F1-score: 0.95 |
| AndroDialysis [6] | Analysis of application intents and permissions as features for machine learning | High accuracy; On-device | Code analysis; Unbalanced dataset | 7,406 | Accuracy: 0.955 |
| Yerima et al. [7] | Ensemble learning over app code and API calls | High accuracy; Feature selection not needed | Code analysis; Large-feature model | 6,863 | Accuracy: 0.97 |
| PUMA [8] | Permission analysis | Simple approach; Robust modelling | Tiny dataset; Narrow feature set | 239 | AUC: 0.92 |
| Ham et al. [9] | Classifier using app's runtime parameters | Predictive features; High F1-score | Code analysis; Unknown processing times | 14,794 | F1-score: 0.99 |
| Karim et al. [10] | User profile single feature classification | Compact model; Low false positive rate | Complex feature extraction; User profiles can be fooled | 4,117 | False positive rate: 2.1 |
| DroidChain [11] | Behaviour chain detection of malware | Zero-day support; Behaviour detection | Code analysis; Restricted to 4 malware models | 1,260 | F1-score: 0.8 |
| Andromaly [12] | Application monitoring and anomaly detection | Distributed service; On-device | Monitoring impact on device; Lack of ground truth; May require device rooting | 20 | AUC: 0.8-0.99 |
| RiskRanker [13] | Zero-day malware detection through automated code analysis | Large dataset; Multi-stage; Risk assessment | Code analysis; Limited support of behaviours; Signature dependent | 118,318 | Number of detections: 718 |
| DroidAPIMiner [14] | KNN classifiers over bytecode semantic information | Fast solution; Family-aware system | Code analysis; API calls can be benign; Static whitelisting | 20,000 | Accuracy: 0.99 |
| Massvet [15] | Application code differential analysis to spot clones by comparing with market | Very large dataset; Scalable; Pairwise app comparison | Vulnerable to code changes; Code analysis | 1.2M | Number of detections: 127,429 |
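
The Performance column of Table 1 mixes accuracy, F1-score, AUC, and false positive rate, and several entries list imbalanced data as a weakness. As a reading aid, the following minimal Python sketch shows how accuracy, F1-score, and false positive rate are derived from the same confusion counts, and why a high accuracy can coexist with a low F1-score when malware is rare. The `ConfusionCounts` class, the function names, and the counts in the usage note are hypothetical illustrations; they are not taken from any of the cited systems.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Hypothetical confusion counts for a binary malware detector."""
    tp: int  # malware samples correctly flagged as malware
    fp: int  # benign samples wrongly flagged as malware
    tn: int  # benign samples correctly left unflagged
    fn: int  # malware samples that were missed


def accuracy(c: ConfusionCounts) -> float:
    """Fraction of all samples classified correctly."""
    total = c.tp + c.fp + c.tn + c.fn
    return (c.tp + c.tn) / total if total else 0.0


def f1_score(c: ConfusionCounts) -> float:
    """Harmonic mean of precision and recall, written as 2TP / (2TP + FP + FN)."""
    denom = 2 * c.tp + c.fp + c.fn
    return 2 * c.tp / denom if denom else 0.0


def false_positive_rate(c: ConfusionCounts) -> float:
    """Fraction of benign samples wrongly flagged as malware."""
    benign = c.fp + c.tn
    return c.fp / benign if benign else 0.0
```

For an assumed test set of 1,000 apps of which only 50 are malware, `ConfusionCounts(tp=10, fp=5, tn=945, fn=40)` gives an accuracy of 0.955 but an F1-score of about 0.31, because 40 of the 50 malware samples are missed. This is why accuracy figures reported on imbalanced datasets should be compared with caution against F1-score or AUC figures.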