Scientific Programming

Review Article

Predictive Analytics and Software Defect Severity: A Systematic Review and Future Directions

Table 1

Research questions guiding the SLR.


Research questions	Objectives

R-Q1: which data sampling state is most widespread?	To determine the dataset sampling state most often deployed so far
R-Q2: which public datasets are often deployed?	To identify the public datasets most frequently used in the literature
R-Q3: which machine learning approach is popular in the literature?	To identify the machine learning variant most used
R-Q4: does the choice of learner algorithm/ensemble impact the performance of defect severity prediction?	To identify the learner algorithm most widely recommended in the literature
R-Q5: does training strategy impact prediction performance?	To study the fold-validation options chosen in the literature
R-Q6: is parameter tuning commonly factored into predictive analytics?	To determine the extent to which results in the literature are improved by tuning options
R-Q7: which training tool is most widely adopted?	To identify the practical value of tools for ML
R-Q8: which feature selection algorithm is most often deployed?	To identify the most deployed dimensionality reduction technique in the literature
R-Q9: what is the course of action between “within-project” and “cross-project” adoption?	To understand the road map of software defect prediction (SDP) as implemented in previous studies
R-Q10: what are the prominent threats to the validity of proposed models?	To identify germane threats reported in the literature to inspire future studies
R-Q11: what is the future direction of software defect prediction studies with respect to the threats to validity reported?	To map the reported threats one-to-one to future work directions