Statistical and Machine Learning Methods for Software Fault Prediction Using CK Metric Suite: A Comparative Analysis

Suresh, Yeresime; Kumar, Lov; Rath, Santanu Ku.

doi:https://doi.org/10.1155/2014/251083

International Scholarly Research Notices

Research Article | Open Access

Volume 2014 | Article ID 251083 | https://doi.org/10.1155/2014/251083

Academic Editor: S. K. Shukla, Z. Shen, K. Framling

Received 31 Aug 2013

Accepted 16 Jan 2014

Published 04 Mar 2014

Abstract

Experimental validation of software metrics for fault prediction in object-oriented software, using statistical and machine learning methods, is necessary because such validation helps a software organization ensure the quality of its products. Object-oriented metrics play a crucial role in predicting faults. This paper examines the application of linear regression, logistic regression, and artificial neural network methods for software fault prediction using the Chidamber and Kemerer (CK) metrics. Here, fault is considered the dependent variable and the CK metric suite the independent variables. Statistical methods (linear regression and logistic regression) and machine learning methods (neural networks in their different forms) are applied to detect faults associated with classes. The comparison was carried out on a case study, the Apache integration framework (AIF) version 1.6. The analysis highlights the significance of the weighted method per class (WMC) metric for fault classification and shows that the hybrid approach of the radial basis function network obtained a better fault prediction rate than the other three neural network models.

1. Introduction

Present-day software development is mostly based on the object-oriented paradigm. The quality of object-oriented software can be best assessed by the use of software metrics. A number of metrics have been proposed by researchers and practitioners to evaluate the quality of software. These metrics help to verify quality attributes of software such as effort and fault proneness.

The usefulness of these metrics lies in their ability to predict the reliability of the developed software. In practice, software quality mainly refers to reliability, maintainability, and understandability. Reliability is generally measured by the number of faults found in the developed software. Predicting faults before the software is released is a challenging task for researchers. Hence, accurate fault prediction is one of the major goals, so that software can be released with the fewest possible faults.

This paper aims to assess the influence of CK metrics on fault prediction for an open-source software product. Statistical methods such as linear regression and logistic regression are used for classification of faulty classes. Machine learning algorithms such as artificial neural network (ANN), functional link artificial neural network (FLANN), and radial basis function network (RBFN) are applied for prediction of faults, and probabilistic neural network (PNN) is used for classification of faults. It is observed in the literature that metric suites have been validated for small data sets. In this approach, the results achieved for an input data set of 965 classes were validated by comparing them with the results obtained by Basili et al. [1] for statistical analysis.

The rest of the paper is organized as follows. Section 2 summarizes software metrics and their usage in fault prediction. Section 3 highlights research background. Section 4 describes the proposed work for fault prediction by applying various statistical and machine learning methods. Section 5 highlights the parameters used for evaluating the performance of each of the applied techniques. Section 6 presents the results and analysis of fault prediction. Section 7 concludes the paper with scope for future work.

2. Related Work

This section presents a review of the literature on the use of software metrics and their application in fault prediction. The most commonly used metric suites for indicating software quality are McCabe [2], Halstead [3], Li and Henry [4], CK metric [5], Abreu MOOD metric suite [6], Lorenz and Kidd [7], Martin’s metric suite [8], Tegarden et al. [9], Melo and Abreu [10], Briand et al. [11], Etzkorn et al. [12], and so forth. Of these, the CK metric suite has been used very often for predicting faults at the class level, by the authors listed in Table 1.

Basili et al. [1] experimentally analyzed the impact of the CK metric suite on fault prediction. Briand et al. [13] determined the relationship between faults and the metrics using univariate and multivariate logistic regression models. Tang et al. [14] investigated the dependency between the CK metric suite and object-oriented system faults. Emam et al. [15] conducted an empirical validation on a Java application and found that export coupling has great influence on faults. Khoshgoftaar et al. [16, 17] conducted an experimental analysis on a telecommunication model and found that the ANN model is more accurate than any discriminant model. In their approach, nine software metrics were used for modules developed in the procedural paradigm. Since then, the use of ANN approaches for prediction modeling has been on the rise.

3. Research Background

The following subsections highlight the data set being used for fault prediction. Data are normalized to obtain better accuracy, and then dependent and independent variables are chosen for fault prediction.

3.1. Empirical Data Collection

Metric suites are used and defined for different goals such as fault prediction, effort estimation, reusability, and maintenance. In this paper, the most commonly used metric suite, that is, the CK metric suite [5], is used for fault prediction.

The CK metric suite consists of six metrics, namely, weighted method per class (WMC), depth of inheritance tree (DIT), number of children (NOC), coupling between objects (CBO), response for class (RFC), and lack of cohesion (LCOM) [5]. Table 2 gives a short note on the six CK metrics and the threshold for each of the six metrics.

The metric values of the suite are extracted using the Chidamber and Kemerer Java Metrics (CKJM) tool. CKJM extracts object-oriented metrics by processing the bytecode of compiled Java classes. The tool is used to extract metric values for three versions of the Apache integration framework (AIF, an open-source framework) available in the Promise data repository [18]. The versions of AIF used from the repository are developed in Java. The CK metric values of AIF are used for fault prediction.

3.2. Data Normalization

ANN models accept normalized data which lie in the range of 0 to 1. In the literature it is observed that techniques such as Min-Max normalization, Z-Score normalization, and Decimal scaling are used for normalizing data. In this paper, the Min-Max normalization [19] technique is used to normalize the data.

Min-Max normalization performs a linear transformation on the original data. Each actual value $x$ of attribute $A$ is mapped to a normalized value $x'$ which lies in the range of 0 to 1. The Min-Max normalization is calculated by using the equation:
$$x' = \frac{x - \min_A}{\max_A - \min_A} \tag{1}$$
where $\min_A$ and $\max_A$ represent the minimum and maximum values of the attribute $A$, respectively.
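The linear rescaling described above can be sketched in a few lines. This is only an illustration: the function name `min_max_normalize`, the handling of a constant attribute, and the WMC values are assumptions made for the example, not taken from the paper.

```python
import numpy as np

def min_max_normalize(values):
    """Linearly rescale a 1-D sequence of attribute values to the range [0, 1]."""
    x = np.asarray(values, dtype=float)
    lo, hi = x.min(), x.max()
    if hi == lo:
        # Assumption: a constant attribute has no spread, so map it to all zeros.
        return np.zeros_like(x)
    return (x - lo) / (hi - lo)

# Hypothetical WMC values for five classes (illustrative only).
wmc = [3, 11, 7, 27, 3]
wmc_scaled = min_max_normalize(wmc)
```

The smallest value maps to 0 and the largest to 1; every other value keeps its relative position between them.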

3.3. Dependent and Independent Variables

The goal of this study is to explore the relationship between object-oriented metrics and fault proneness at the class level. In this paper, a fault in a class is considered as a dependent variable and each of the CK metrics is an independent variable. It is intended to develop a function between fault of a class and CK metrics (WMC, DIT, NOC, CBO, RFC, and LCOM). Fault is a function of WMC, DIT, NOC, CBO, RFC, and LCOM and can be represented as shown in the following equation:
$$\text{Faults} = f(\text{WMC}, \text{DIT}, \text{NOC}, \text{CBO}, \text{RFC}, \text{LCOM}) \tag{2}$$

4. Proposed Work for Fault Prediction

The following subsections highlight the various statistical and machine learning methods used for fault classification.

4.1. Statistical Methods

This section describes the application of statistical methods for fault prediction. Regression analysis methods such as linear regression and logistic regression analysis are applied. In regression analysis, the value of unknown variable is predicted based on the value of one or more known variables.

4.1.1. Linear Regression Analysis

Linear regression is a statistical technique and establishes a linear (i.e., straight-line) relationship between variables. This technique is used when faults are distributed over a wide range of classes.

Linear regression analysis is of two types:
(a) univariate linear regression, and
(b) multivariate linear regression.

Univariate linear regression is based on
$$Y = \beta_0 + \beta_1 X \tag{3}$$
where $Y$ represents the dependent variable (accuracy rate for this case) and $X$ represents the independent variable (a CK metric for this case).

In case of multivariate linear regression, the linear regression is based on
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p \tag{4}$$
where $X_i$ is the independent variable, $\beta_0$ is a constant, and $Y$ is the dependent variable. Table 8 shows the result of linear regression analysis for three versions of AIF.
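To make the two regression forms concrete, the following sketch fits a constant term plus one coefficient per metric by ordinary least squares. The helper name `fit_linear_regression` and the synthetic, noise-free data are assumptions for illustration; they are not the AIF data or the authors' implementation.

```python
import numpy as np

def fit_linear_regression(X, y):
    """Least-squares fit of y = b0 + b1*x1 + ... + bp*xp; returns [b0, b1, ..., bp]."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        # Univariate case: a single metric per class.
        X = X[:, None]
    # A leading column of ones carries the constant term b0.
    design = np.column_stack([np.ones(len(X)), X])
    coeffs, *_ = np.linalg.lstsq(design, np.asarray(y, dtype=float), rcond=None)
    return coeffs

# Synthetic, noise-free data (not AIF data): two normalized metrics per class.
rng = np.random.default_rng(0)
metrics = rng.uniform(0.0, 1.0, size=(50, 2))
faults = 0.5 + 0.2 * metrics[:, 0] - 0.1 * metrics[:, 1]

beta_multi = fit_linear_regression(metrics, faults)      # multivariate fit
beta_uni = fit_linear_regression(metrics[:, 0], faults)  # univariate fit on the first metric
```

Because the synthetic data are noise-free, the multivariate fit recovers the generating coefficients; with real metric data the fit would only approximate the relationship.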

4.1.2. Logistic Regression Analysis

Logistic regression analysis is used for predicting the outcome of a dependent variable based on one or more independent variables. Here the dependent variable can take only two values, so the classes are divided into two groups: one group containing zero bugs and the other having at least one bug.

Logistic regression analysis is of two types:
(a) univariate logistic regression, and
(b) multivariate logistic regression.

(a) Univariate Logistic Regression Analysis. Univariate logistic regression is carried out to find the impact of an individual metric on predicting the faults of a class. The univariate logistic regression is based on
$$\pi(x) = \frac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}} \tag{5}$$
where $X$ is an independent variable and $\beta_0$ and $\beta_1$ represent the constant and coefficient values, respectively. Logit function can be developed as follows:
$$\operatorname{logit}[\pi(x)] = \beta_0 + \beta_1 X \tag{6}$$
where $\pi$ represents the probability of a fault found in the class during validation phase.

The results of univariate logistic regression for AIF are tabulated in Table 9. The values of obtained coefficient are the estimated regression coefficients. The probability of faults being detected for a class is dependent on the coefficient value (positive or negative). Higher coefficient value means greater probability of a fault being detected. The significance of the coefficient value is determined by the $p$ value. The $p$ value was assessed based on the significance level ($\alpha$). The $R^2$ coefficient is the proportion of the total variation in the dependent variable explained by the regression model. A high value of $R^2$ indicates greater correlation between faults and the CK metrics.

(b) Multivariate Logistic Regression Analysis. Multivariate logistic regression is used to construct a prediction model for the fault proneness of classes. In this method, metrics are used in combination. The multivariate logistic regression model is based on the following equation:
$$\pi(x) = \frac{e^{\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p}}{1 + e^{\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p}} \tag{7}$$
where $X_i$ is the independent variable, $\pi$ represents the probability of a fault found in the class during validation phase, and $p$ represents the number of independent variables. The Logit function can be formed as follows:
$$\operatorname{logit}[\pi(x)] = \log\left(\frac{\pi(x)}{1 - \pi(x)}\right) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p \tag{8}$$
Equation (8) shows that logistic regression is really just a standard linear regression model, where the dichotomous outcome of the result is transformed by the logit transform. The value of $\pi(x)$ lies in the range $[0, 1]$. After the logit transform, the transformed value lies in the range $(-\infty, +\infty)$.
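The relation between the fault probability and its logit can be checked numerically with a small sketch. The function names and the coefficient values below are hypothetical; the 0.5 cut-off is the threshold the paper reports using later for classifying a class as faulty.

```python
import math

def fault_probability(metrics, beta0, betas):
    """Logistic model: probability of a fault given metric values and coefficients."""
    z = beta0 + sum(b * x for b, x in zip(betas, metrics))
    # 1 / (1 + e^-z) is algebraically the same as e^z / (1 + e^z).
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))

def classify(p, threshold=0.5):
    """Label a class 'faulty' when its predicted probability exceeds the threshold."""
    return "faulty" if p > threshold else "not faulty"

# Hypothetical coefficients and normalized metric values (illustrative only).
p_low = fault_probability([0.4, 0.1], beta0=-1.0, betas=[2.0, 0.5])
p_high = fault_probability([0.9, 0.9], beta0=-1.0, betas=[2.0, 0.5])
```

Applying `logit` to a predicted probability returns the linear combination of the metrics, which is why the model can be read as a linear regression on the log-odds scale.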

4.2. Machine Learning Methods

Besides the statistical approach, this paper also implements four other machine learning techniques. Machine learning techniques have been used in this paper to predict the accuracy rate in fault prediction using CK metric suite.

This section gives a brief description of the basic structure and working of machine learning methods applied for fault prediction.

4.2.1. Artificial Neural Network

Figure 1 shows the architecture of ANN, which contains three layers, namely, input layer, hidden layer, and output layer. Computational features involved in ANN architecture can be very well applied for fault prediction.

In this paper, a linear activation function has been used for the input layer; that is, the output of the input layer “$O_i$” is the input of the input layer “$I_i$,” which is represented as follows:
$$O_i = I_i \tag{9}$$
For the hidden layer and output layer, the sigmoidal (squashed-S) function is used. The output of the hidden layer $O_h$ for the input of the hidden layer $I_h$ is represented as follows:
$$O_h = \frac{1}{1 + e^{-I_h}} \tag{10}$$
The output of the output layer “$O_o$” for the input of the output layer “$I_o$” is represented as follows:
$$O_o = \frac{1}{1 + e^{-I_o}} \tag{11}$$

A neural network can be represented as follows:
$$Y' = f(W, X) \tag{12}$$
where $X$ is the input vector, $Y'$ is the output vector, and $W$ is the weight vector. The weight vector $W$ is updated in every iteration so as to reduce the mean square error (MSE) value. MSE is formulated as follows:
$$\text{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y'_i - y_i\right)^2 \tag{13}$$
where $y$ is the actual output, $y'$ is the expected output, and $n$ is the number of samples. In the literature, different methods are available to update the weight vector (“$W$”), such as the Gradient descent method, Newton’s method, Quasi-Newton method, Gauss-Newton method, Conjugate-gradient method, and Levenberg Marquardt method. In this paper, Gradient descent and Levenberg Marquardt methods are used for updating the weight vector $W$.

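(a) Gradient Descent Method. Gradient descent is one of the methods for updating the weight during learning phase [20]. Gradient descent method uses the first-order derivative of total error to find the minima in error space. Normally the gradient vector $G$ is defined as the first-order derivative of the error function. The error function is represented as follows:
$$E_k = \left(y'_k - y_k\right)^2 \tag{14}$$
and $G$ is given as:
$$G = \frac{\partial}{\partial W}\left(E_k\right) = \frac{\partial}{\partial W}\left(y'_k - y_k\right)^2 \tag{15}$$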

After computing the value of the gradient vector $G$ in each iteration, the weight vector $W$ is updated as follows:
$$W_{k+1} = W_k - \alpha G_k \tag{16}$$
where $W_{k+1}$ is the updated weight, $W_k$ is the current weight, $G_k$ is a gradient vector, and $\alpha$ is the learning parameter.
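A minimal sketch of this update rule, under the simplifying assumption of a single linear neuron with a squared-error loss (the paper's network uses sigmoidal hidden and output layers). The function name, learning parameter, and data are hypothetical.

```python
import numpy as np

def gradient_descent_step(w, x, y, alpha):
    """One update w <- w - alpha * G for a linear model with squared error (w.x - y)^2."""
    y_pred = w @ x
    grad = 2.0 * (y_pred - y) * x  # first-order derivative of the squared error w.r.t. w
    return w - alpha * grad

# Hypothetical single training pattern (illustrative only).
w = np.zeros(3)
x = np.array([1.0, 0.5, 0.2])
y = 1.0

errors = []
for _ in range(50):
    errors.append(float((w @ x - y) ** 2))
    w = gradient_descent_step(w, x, y, alpha=0.1)
```

The recorded `errors` shrink from one iteration to the next because the learning parameter is small enough for this pattern; a value that is too large would make the updates overshoot.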

(b) Levenberg Marquardt (LM) Method. LM method locates the minimum of a multivariate function in an iterative manner, where the function is expressed as the sum of squares of nonlinear real-valued functions [21, 22]. This method is used for updating the weights during learning phase. LM method is fast and stable in terms of its execution when compared with gradient descent method (LM method is a combination of steepest descent and Gauss-Newton methods). In LM method, the weight vector $W$ is updated as follows:
$$W_{k+1} = W_k - \left(J_k^T J_k + \mu I\right)^{-1} J_k^T e_k \tag{17}$$
where $W_{k+1}$ is the updated weight, $W_k$ is the current weight, $J$ is the Jacobian matrix, $e_k$ is the error vector, $I$ is the identity matrix, and $\mu$ is the combination coefficient; that is, when $\mu$ is very small the method acts as the Gauss-Newton method, and when $\mu$ is very large it acts as the Gradient descent method.

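Jacobian matrix is calculated as follows:
$$J = \begin{bmatrix} \dfrac{\partial e_{1,1}}{\partial w_1} & \dfrac{\partial e_{1,1}}{\partial w_2} & \cdots & \dfrac{\partial e_{1,1}}{\partial w_N} \\ \vdots & \vdots & & \vdots \\ \dfrac{\partial e_{P,M}}{\partial w_1} & \dfrac{\partial e_{P,M}}{\partial w_2} & \cdots & \dfrac{\partial e_{P,M}}{\partial w_N} \end{bmatrix} \tag{18}$$
where $N$ is the number of weights, $P$ is the number of input patterns, $M$ is the number of output patterns, and $e_{p,m}$ is the error of output $m$ for input pattern $p$.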
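The role of the combination coefficient can be illustrated with a sketch of a single LM step in its standard textbook form, w_new = w - (JᵀJ + μI)⁻¹ Jᵀe. The linear residual model, the function name `lm_step`, and the data are assumptions chosen so that the Jacobian is simply the data matrix; this is not the authors' training code.

```python
import numpy as np

def lm_step(w, jacobian, errors, mu):
    """One Levenberg-Marquardt update: w - (J^T J + mu*I)^-1 J^T e."""
    jtj = jacobian.T @ jacobian
    step = np.linalg.solve(jtj + mu * np.eye(len(w)), jacobian.T @ errors)
    return w - step

# Hypothetical linear problem: residual e = A w - b, so the Jacobian of e is A.
A = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
b = np.array([1.0, 3.0, 5.0, 7.0])
w0 = np.zeros(2)
e0 = A @ w0 - b

w_small_mu = lm_step(w0, A, e0, mu=1e-12)  # behaves like a Gauss-Newton step
w_large_mu = lm_step(w0, A, e0, mu=1e6)    # behaves like a small gradient descent step
```

With a tiny μ the step lands on the least-squares solution in one move, while with a very large μ the step is a short move along the negative gradient direction Jᵀe scaled by 1/μ.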

4.2.2. Functional Link Artificial Neural Network (FLANN)

FLANN, initially proposed by Pao [23], is a flat network having a single layer; that is, the hidden layers are omitted. Input variables generated by linear links of the neural network are linearly weighted. Functional links act on elements of input variables by generating a set of linearly independent functions. These links are evaluated as functions with the variables as the arguments. Figure 2 shows the single-layered architecture of FLANN. FLANN architecture offers less computational overhead and higher convergence speed when compared with other ANN techniques.

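Using FLANN, output is calculated as follows:
$$\hat{y} = \sum_{i=1}^{n} W_i X_i \tag{19}$$
where $\hat{y}$ is the predicted value, $W$ is the weight vector, and $X$ is the functional block, which is defined as follows:
$$X = \left[1, x_1, \sin(x_1), \cos(x_1), x_2, \sin(x_2), \cos(x_2), \ldots\right] \tag{20}$$
and the weight is updated as follows:
$$W_i(k+1) = W_i(k) + \alpha\, e_i(k)\, X_i(k) \tag{21}$$
having $\alpha$ as the learning rate and $e_i$ as the error value. “$e_i$” is formulated as follows:
$$e_i = y_i - \hat{y}_i \tag{22}$$
here $y$ and $\hat{y}$ represent the actual and the obtained (predicted) values, respectively.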
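A rough sketch of a single-layer FLANN follows. It assumes a trigonometric functional expansion of each input (constant term, the input itself, and its sine and cosine scaled by π), which is one common choice and not necessarily the expansion used by the authors; the function names, learning rate, and data are hypothetical.

```python
import numpy as np

def expand(x):
    """Assumed functional expansion of one pattern: [1, x_i, sin(pi*x_i), cos(pi*x_i), ...]."""
    x = np.asarray(x, dtype=float)
    parts = [np.ones(1)]
    for xi in x:
        parts.append(np.array([xi, np.sin(np.pi * xi), np.cos(np.pi * xi)]))
    return np.concatenate(parts)

def train_flann(X, y, alpha=0.1, epochs=200):
    """Train the weights with the error-driven update w <- w + alpha * e * phi."""
    w = np.zeros(expand(X[0]).size)
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            phi = expand(xi)
            error = yi - w @ phi         # e = actual - predicted
            w = w + alpha * error * phi  # no hidden layer: weights act directly on phi
    return w

def mse(w, X, y):
    """Mean square error of the FLANN predictions over all patterns."""
    preds = np.array([w @ expand(xi) for xi in X])
    return float(np.mean((np.asarray(y, dtype=float) - preds) ** 2))

# Hypothetical normalized inputs and targets (illustrative only).
X_demo = np.array([[0.1, 0.2], [0.4, 0.9], [0.8, 0.3], [0.6, 0.7]])
y_demo = np.array([0.0, 0.5, 0.5, 1.0])
w_flann = train_flann(X_demo, y_demo)
```

Since all nonlinearity sits in the fixed expansion, training reduces to adjusting one weight per expanded feature, which is the source of the low computational overhead mentioned above.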

4.2.3. Radial Basis Function Network (RBFN)

RBFN is a feed-forward neural network (FFNN), trained using supervised training algorithm. RBFN is generally configured by a single hidden layer, where the activation function is chosen from a class of functions called basis functions.

RBFN is one of the ANN techniques which contains three layers, namely, input, hidden, and output layer. Figure 3 shows the structure of a typical RBFN in its basic form involving three entirely different layers. RBFN contains $h$ hidden centers represented as $C_1, C_2, C_3, \ldots, C_h$.

The target output is computed as follows:
$$y' = \sum_{i=1}^{h} \phi_i W_i \tag{23}$$
where $W_i$ is the weight of the $i$th center, $\phi$ is the radial function, and $y'$ is the target output. Table 3 shows the various radial functions available in the literature.

In this paper, the Gaussian function is used as a radial function, and the distance vector $z$ is calculated as follows:
$$z = \lVert x_j - C_j \rVert \tag{24}$$
where $x_j$ is the input vector that lies in the receptive field for center $C_j$. In this paper, gradient descent learning and hybrid learning techniques are used for updating the weights and centers.
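The forward pass of such a network can be sketched as a weighted sum of Gaussian responses to the distance between the input and each hidden center. The Gaussian form exp(-z² / (2σ²)), the shared width `sigma`, and all numeric values are assumptions for illustration.

```python
import numpy as np

def gaussian_rbf(z, sigma):
    """Gaussian radial function of the distance z, with width sigma (assumed form)."""
    return np.exp(-(z ** 2) / (2.0 * sigma ** 2))

def rbfn_output(x, centers, weights, sigma):
    """Network output: sum over centers of phi_i * W_i, plus the phi values themselves."""
    z = np.linalg.norm(centers - np.asarray(x, dtype=float), axis=1)  # distance to each center
    phi = gaussian_rbf(z, sigma)
    return float(phi @ weights), phi

# Hypothetical centers and weights (illustrative only).
centers = np.array([[0.0, 0.0], [1.0, 1.0]])
weights = np.array([0.5, 2.0])
y_out, phi = rbfn_output([0.0, 0.0], centers, weights, sigma=1.0)
```

An input that coincides with a center yields a response of 1 for that center, and the response decays as the input moves away from it.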

The advantage of using RBFN lies in its training rate, which is faster when compared with propagation networks, and in its lower susceptibility to problems with nonstationary inputs.

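(a) Gradient Descent Learning Technique. Gradient descent learning is a technique used for updating the weight $W$ and center $C$. The center $C$ in gradient learning is updated as:
$$C_{ij}(k+1) = C_{ij}(k) - \eta_1 \frac{\partial}{\partial C_{ij}}\left(E_k\right) \tag{25}$$
and the weight $W$ is updated as:
$$W_i(k+1) = W_i(k) - \eta_2 \frac{\partial}{\partial W_i}\left(E_k\right) \tag{26}$$
where $\eta_1$ and $\eta_2$ are the learning coefficients for updating center and weight, respectively.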

(b) Hybrid Learning Technique. In the hybrid learning technique, the radial functions relocate their centers in a self-organized manner while the weights are updated using a learning algorithm. In this paper, the least mean square (LMS) algorithm is used for updating the weights, while the center is updated only when it satisfies the following conditions:
(a) the Euclidean distance between the input pattern and the nearest center is greater than the threshold value, and
(b) the MSE is greater than the desired accuracy.

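After satisfying the above conditions, the Euclidean distance is used to find the centers close to $x$ and then the centers are updated as follows:
$$C_i(k+1) = C_i(k) + \alpha\left(x - C_i(k)\right) \tag{27}$$
where $\alpha$ is the learning coefficient. After every update, the center moves closer to $x$.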
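The conditional relocation of the nearest center can be sketched as follows. The function name, the step size `alpha`, and the threshold values are hypothetical; the update moves the nearest center a fraction of the way toward the input pattern only when both stated conditions hold.

```python
import numpy as np

def maybe_move_nearest_center(x, centers, alpha, distance_threshold, mse, desired_mse):
    """Move the nearest center toward x if it is far enough away and the MSE is still too high."""
    x = np.asarray(x, dtype=float)
    distances = np.linalg.norm(centers - x, axis=1)
    nearest = int(np.argmin(distances))
    updated = centers.copy()
    if distances[nearest] > distance_threshold and mse > desired_mse:
        updated[nearest] = centers[nearest] + alpha * (x - centers[nearest])
    return updated, nearest

# Hypothetical centers and input pattern (illustrative only).
centers = np.array([[0.0, 0.0], [1.0, 1.0]])
pattern = np.array([0.4, 0.0])

moved, idx = maybe_move_nearest_center(pattern, centers, alpha=0.5,
                                       distance_threshold=0.1, mse=0.05, desired_mse=0.002)
unmoved, _ = maybe_move_nearest_center(pattern, centers, alpha=0.5,
                                       distance_threshold=0.1, mse=0.001, desired_mse=0.002)
```

With a step size between 0 and 1 the distance between the chosen center and the input shrinks by the factor (1 - alpha) on each accepted update.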

4.2.4. Probabilistic Neural Network (PNN)

PNN was introduced by Specht [24]. It is a feed-forward neural network, which has been basically derived from Bayesian network and statistical algorithm.

In PNN, the network is organized as a multilayered feed-forward network with four layers, namely, input, hidden, summation, and output layer. Figure 4 shows the basic architecture of PNN.

The input layer first computes the distance from input vector to the training input vectors. The second layer consists of a Gaussian function which is formed using the given set of data points as centers. The summation layers sum up the contribution of each class of input and produce a net output which is vector of probabilities. The fourth layer determines the fault prediction rate.

PNN technique is faster when compared to multilayer perceptron networks and also is more accurate. The major concern lies in finding an accurate smoothing parameter “$\sigma$” to obtain better classification. The following function is used in hidden layer:
$$\phi(z) = e^{-z^2 / (2\sigma^2)} \tag{28}$$
where $z = \lVert x - c \rVert$, $x$ is the input, $c$ is the center, and $z$ is the Euclidean distance between the center and the input vector.
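The four layers described above can be sketched compactly: distances to the training patterns, a Gaussian response per pattern, a per-class summation, and a final decision. The kernel form exp(-z² / (2σ²)), the averaging within each class, and the toy data are assumptions for illustration rather than the authors' exact formulation.

```python
import numpy as np

def pnn_classify(x, train_X, train_y, sigma):
    """Classify x with a probabilistic neural network using a Gaussian kernel of width sigma."""
    z = np.linalg.norm(train_X - np.asarray(x, dtype=float), axis=1)  # input layer: distances
    kernel = np.exp(-(z ** 2) / (2.0 * sigma ** 2))                   # hidden layer: Gaussian units
    classes = np.unique(train_y)
    scores = np.array([kernel[train_y == c].mean() for c in classes]) # summation layer
    probs = scores / scores.sum()                                     # vector of class probabilities
    return classes[int(np.argmax(probs))], probs                      # output layer: best class

# Toy training patterns: label 0 = not faulty, 1 = faulty (illustrative only).
train_X = np.array([[0.1, 0.1], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]])
train_y = np.array([0, 0, 1, 1])

label, probs = pnn_classify([0.15, 0.1], train_X, train_y, sigma=0.3)
```

The smoothing parameter controls how far each training pattern's influence reaches, which is why classification accuracy varies with its value.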

5. Performance Evaluation Parameters

The following subsections give the basic definitions of the performance parameters used in statistical and machine learning methods for fault prediction.

5.1. Statistical Analysis

The performance parameters for statistical analysis can be determined based on the confusion matrix [25] as shown in Table 4.

5.1.1. Precision

It is defined as the degree to which the repeated measurements under unchanged conditions show the same results:

5.1.2. Correctness

Correctness as defined by Briand et al. [13] is the ratio of the number of modules correctly classified as fault prone to the total number of modules classified as fault prone:

5.1.3. Completeness

According to Briand et al. [13], completeness is the ratio of number of faults in classes classified as fault prone to the total number of faults in the system:

5.1.4. Accuracy

Accuracy as defined by Yaun et al. [26] is the proportion of predicted fault prone modules that are inspected out of all modules:
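Three of the confusion-matrix parameters can be sketched under stated assumptions. The cell names (TP, FP, FN, TN) are the usual ones and are not taken from Table 4; completeness is computed here over classes rather than over individual faults; and accuracy uses the common (TP + TN) / total reading. The counts in the example are hypothetical.

```python
def correctness(tp, fp):
    """Share of classes predicted fault prone that really are fault prone."""
    return tp / (tp + fp)

def completeness(tp, fn):
    """Share of actually fault-prone classes that were predicted fault prone (class-level simplification)."""
    return tp / (tp + fn)

def accuracy(tp, fp, fn, tn):
    """Share of all classes whose prediction matches the actual label (assumed standard form)."""
    return (tp + tn) / (tp + fp + fn + tn)

# Hypothetical confusion-matrix counts (illustrative only).
tp, fp, fn, tn = 30, 10, 20, 140
summary = {
    "correctness": correctness(tp, fp),
    "completeness": completeness(tp, fn),
    "accuracy": accuracy(tp, fp, fn, tn),
}
```

Correctness penalizes false alarms, completeness penalizes missed fault-prone classes, and accuracy mixes both kinds of error into a single figure.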

5.1.5. $R^2$ Statistic

$R^2$, also known as the coefficient of multiple determination, is a measure of the power of correlation between the predicted and actual number of faults [25]. The higher the value of this statistic, the greater the accuracy of the predicted model:
$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2} \tag{33}$$
where $y_i$ is the actual number of faults, $\hat{y}_i$ is the predicted number of faults, and $\bar{y}$ is the average number of faults.

5.2. Machine Learning

Fault prediction accuracy for the four applied ANN techniques is determined by using performance evaluation parameters such as mean absolute error (MAE), mean absolute relative error (MARE), root mean square error (RMSE), and standard error of the mean (SEM).

5.2.1. Mean Absolute Error (MAE)

This performance parameter determines how close the predicted fault (accuracy) rate is to the actual one:
$$\text{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y'_i - y_i\right| \tag{34}$$

5.2.2. Mean Absolute Relative Error (MARE)

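Consider
$$\text{MARE} = \frac{1}{n}\sum_{i=1}^{n}\frac{\left|y'_i - y_i\right|}{y_i} \tag{35}$$

In (35), a small numerical value $\epsilon$ is added in the denominator in order to avoid numerical overflow (division by zero). The modified MARE is formulated as:
$$\text{MARE} = \frac{1}{n}\sum_{i=1}^{n}\frac{\left|y'_i - y_i\right|}{y_i + \epsilon} \tag{36}$$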

5.2.3. Root Mean Square Error (RMSE)

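This performance parameter determines the differences in the values of predicted and actual fault (accuracy) rate:
$$\text{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y'_i - y_i\right)^2} \tag{37}$$
In (35), (36), and (37), $y_i$ is the actual value and $y'_i$ is the expected value.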

5.2.4. Standard Error of the Mean (SEM)

It is the deviation of predicted value from the actual fault (accuracy) rate:
$$\text{SEM} = \frac{\text{SD}}{\sqrt{n}} \tag{38}$$
where SD is the sample standard deviation and “$n$” is the number of samples.
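The four error measures can be computed with a few NumPy helpers, sketched below. The function names are hypothetical, the small constant `eps` added to the MARE denominator must be supplied by the caller because its value is not given here, and the SEM helper simply divides the sample standard deviation of whatever sample it receives by the square root of the sample size.

```python
import numpy as np

def mae(actual, predicted):
    """Mean absolute error."""
    a, p = np.asarray(actual, dtype=float), np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(p - a)))

def mare(actual, predicted, eps):
    """Mean absolute relative error with eps added to the denominator to avoid division by zero."""
    a, p = np.asarray(actual, dtype=float), np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(p - a) / (a + eps)))

def rmse(actual, predicted):
    """Root mean square error."""
    a, p = np.asarray(actual, dtype=float), np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((p - a) ** 2)))

def sem(sample):
    """Standard error of the mean: sample standard deviation divided by sqrt(n)."""
    s = np.asarray(sample, dtype=float)
    return float(np.std(s, ddof=1) / np.sqrt(s.size))

# Hypothetical actual and predicted values (illustrative only).
actual = [1.0, 2.0, 4.0]
predicted = [1.0, 3.0, 2.0]
report = {
    "MAE": mae(actual, predicted),
    "MARE": mare(actual, predicted, eps=0.0),
    "RMSE": rmse(actual, predicted),
    "SEM": sem(predicted),
}
```

Passing a positive `eps` keeps MARE finite when an actual value is zero, which is common for classes without bugs.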

6. Results and Analysis

In this section, the relationship between value of metrics and the fault found in a class is determined. In this approach, the comparative study involves using six CK metrics as input nodes and the output is the achieved fault prediction rate. Fault prediction is performed for AIF version 1.6.

6.1. Fault Data

To perform statistical analysis, bugs were collected from Promise data repository [18]. Table 5 shows the distribution of bugs based on the number of occurrences (in terms of percentage of class containing number of bugs) for AIF version 1.6.

AIF version 1.6 contains 965 classes, of which 777 (80.5181%) contain zero bugs. Among all classes, 10.4663% contain exactly one bug, 3.3161% contain two bugs, 1.6580% contain three bugs, 1.4508% contain four bugs, 0.6218% contain five bugs, 0.2073% contain six bugs, 0.3109% each contain seven and eleven bugs, 0.5181% contain eight bugs, and 0.1036% each contain nine, thirteen, seventeen, eighteen, and twenty-eight bugs.

6.2. Metrics Data

CK metric values for WMC, DIT, NOC, CBO, RFC, and LCOM, respectively, for AIF version 1.6 are graphically represented in Figures 5, 6, 7, 8, 9, and 10.

6.3. Descriptive Statistics and Correlation Analysis

This subsection gives the comparative analysis of the fault data, descriptive statistics of classes, and the correlation among the six metrics with that of Basili et al. [1]. Basili et al. studied object-oriented systems written in the C++ language. They carried out an experiment in which they set up eight project groups, each consisting of three students. Each group had the same task of developing a small/medium-sized software system. Since all the necessary documentation (for instance, reports about faults and their fixes) was available, they could search for relationships between fault density and metrics. They used the same CK metric suite. Logistic regression was employed to analyze the relationship between metrics and the fault proneness of classes.

The obtained CK metric values of AIF version 1.6 are compared with the results of Basili et al. [1]. In comparison with Basili et al., the total number of classes considered is much greater; that is, 965 classes were considered (versus 180). Table 6 shows the comparative statistical analysis results obtained for Basili et al. and AIF version 1.6 for CK metrics, indicating Max, Min, Median, and Standard deviation.

The dependency between CK metrics is computed using Pearson’s correlations ($R^2$: coefficient of determination) and compared with Basili et al. [1] for AIF version 1.6. The coefficient of determination, $R^2$, is useful because it gives the proportion of the variance (fluctuation) of one variable that is predictable from the other variable. It is a measure that allows a researcher to determine how certain one can be in making predictions from a certain model/graph. Table 7 shows the Pearson’s correlations for the data set used by Basili et al. [1] and the correlation metrics of AIF version 1.6.

From Table 7, with respect to AIF version 1.6, it is observed that the correlation between WMC and RFC is 0.77, which is high; that is, these two metrics are strongly linearly dependent on each other. Similarly, the correlation between WMC and DIT is 0, which indicates that they are uncorrelated; that is, there is no linear dependency between these two metrics.

6.4. Fault Prediction Using Statistical Methods

6.4.1. Linear Regression Analysis

Table 8 shows results obtained for linear regression analysis, in which the fault is considered as the dependent variable and the CK metrics are the independent variables.

“$R$” represents the coefficient of correlation; “$p$” refers to the significance of the metric value. A sufficiently small $p$ value indicates that the metric is of very great significance in fault prediction.

6.4.2. Logistic Regression Analysis

The logistic regression method helps to indicate whether a class is faulty or not but does not convey anything about the possible number of faults in the class. Univariate and multivariate logistic regression techniques are applied to predict whether the class is faulty or not. Univariate regression analysis is used to examine the effect of each metric on the fault of the class, while multivariate regression analysis is used to examine the combined effectiveness of the metrics on the fault of the class. The results of three versions of AIF are compared considering these two statistical techniques. Figure 11 shows the typical “S” curve obtained (similar to the Sigmoid function) for AIF version 1.6 using multivariate logistic regression. Tables 9 and 10 contain the tabulated values for the results obtained by applying univariate and multivariate regression analysis, respectively.

From Table 9, it can be observed that all metrics of the CK suite are highly significant except for DIT. The $p$ value for the three versions (with respect to DIT) is 0.335, 0.108, and 0.3527, respectively. Higher values of “$p$” are an indication of less significance.

Univariate and multivariate logistic regression statistical methods were used for classifying a class as faulty or not faulty. Logistic regression was applied with a threshold value of 0.5; that is, $\pi > 0.5$ indicates that a class is classified as “faulty”; otherwise it is categorized as a “not faulty” class.

Tables 11 and 12 represent the confusion matrix for the number of classes with faults before and after applying regression analysis, respectively, for AIF version 1.6. From Table 11 it is clear that before applying the logistic regression, a total of 777 classes contained zero bugs and 188 classes contained at least one bug. After applying logistic regression (Table 12), the classes are classified with an accuracy of 81.13%.

The performance parameters of all three versions of AIF, obtained by applying univariate and multivariate logistic regression analysis, are shown in Table 13. Here precision, correctness, completeness, and accuracy [1, 13, 27, 28] are taken as the performance parameters. By using multivariate logistic regression, the accuracy of AIF version 1.2 is found to be 64.44%, the accuracy of AIF version 1.4 is 83.37%, and that of AIF version 1.6 is 81.13%.

From the results obtained by applying linear and logistic regression analysis, it is found that out of the six metrics WMC appears to have more impact in predicting faults.

6.5. Fault Prediction Using Neural Network

6.5.1. Artificial Neural Network

ANN is an interconnected group of nodes. In this paper, three layers of ANN are considered, in which six nodes act as input nodes, nine nodes represent the hidden nodes, and one node acts as output node.

ANN is a three-phase network; the phases are used for learning, validation, and testing purposes. In this article, 70% of the total input patterns are considered for the learning phase, 15% for validation, and the remaining 15% for testing. The regression analysis carried out classifies whether a class is faulty or not faulty. The prediction models of ANN and its forms, such as PNN, RBFN, and FLANN, not only classify the class as faulty or not faulty but also highlight the number of bugs found in the class; these bugs are fixed in the testing phase of the software development life cycle.

In this paper six CK metrics are taken as input, and output is the fault prediction accuracy rate required for developing the software. The network is trained using Gradient descent method and Levenberg Marquardt method.

(a) Gradient Descent Method. Gradient descent method is used for updating the weights using (15) and (16). Table 14 shows the performance metrics of AIF version 1.6. Figure 12 shows the plot of the variation of mean square error values with respect to the number of epochs (or iterations) for AIF version 1.6.

(b) Levenberg Marquardt Method. Levenberg Marquardt method [21, 22] is a technique for updating weights. In the Gradient descent method the learning rate is constant, but in the Levenberg Marquardt method the learning rate varies in every iteration. So this method needs fewer iterations to train the network. Table 15 shows the performance metrics for AIF version 1.6 using the Levenberg Marquardt method.

Figure 13 shows the plot of the variation of mean square error values with respect to the number of epochs for AIF version 1.6.

6.5.2. Functional Link Artificial Neural Network (FLANN)

FLANN architecture for software fault prediction is a single layer feed-forward neural network consisting of an input and output layer. FLANN does not incorporate any hidden layer and hence has less computational cost. In this paper, adaptive algorithm has been used for updating the weights as shown in (21). Figure 14 shows the variation of mean square values against number of epochs for AIF version 1.6. Table 16 shows the performance metrics of FLANN.

6.5.3. Radial Basis Function Network

In this paper, the Gaussian radial function is used as a radial function. Gradient descent learning and hybrid learning methods are used for updating the centers and weights.

Three layered RBFN has been considered, in which six CK metrics are taken as input nodes, nine hidden centers are taken as hidden nodes, and output is the fault prediction rate. Table 17 shows the performance metrics for AIF version 1.6.

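(a) Gradient Descent Learning Method. Equations (25) and (26) are used for updating center and weight during training phase. After simplifying (25) for the Gaussian radial function, with constant factors absorbed into the learning coefficients, the equation is represented as:
$$C_{ij}(k+1) = C_{ij}(k) - \eta_1 \left(y' - y\right) W_i \frac{\phi_i}{\sigma^2}\left(x_j - C_{ij}(k)\right) \tag{39}$$
and the modified Equation (26) is formulated as:
$$W_i(k+1) = W_i(k) - \eta_2 \left(y' - y\right)\phi_i \tag{40}$$
where $\sigma$ is the width of the center, $k$ is the current iteration number, $y'$ is the output computed by the network, and $y$ is the actual value. Table 18 shows the performance metrics for AIF version 1.6. Figure 15 indicates the variation of MSE with respect to the number of epochs.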

(b) Hybrid Learning Method. In the hybrid learning method, centers are updated using (27) while weights are updated using supervised learning methods. In this paper, the least mean square error (LMSE) algorithm is used for updating the weights. Table 19 shows the performance metrics for AIF version 1.6. Figure 16 shows the graph for variation of MSE versus number of epochs.

6.5.4. Probabilistic Neural Network (PNN)

As mentioned in Section 4.2.4, PNN is a multilayered feed-forward network with four layers, namely, input, hidden, summation, and output layer.

In PNN, 50% of faulty and nonfaulty classes are taken as input for hidden layers. The Gaussian function (28) is used as the hidden node function. The summation layers sum the contribution of each class of input patterns and produce a net output which is a vector of probabilities. The input pattern is assigned to the class having the maximum summation value. Figure 17 shows the variation of accuracy for different values of the smoothing parameter.

6.6. Comparison

Table 20 tabulates the performance parameter values, number of epochs, and accuracy rate obtained by applying the four neural network techniques. This table indicates which fault prediction model performs better. In this comparative analysis, mean square error (MSE) was taken as the criterion for computing the performance parameters (such as MARE, MSE, number of epochs, and accuracy rate) of the four neural network techniques. An MSE value of 0.002 was set as the threshold for evaluation. The best prediction model was then determined from the number of iterations and the accuracy rate obtained by each NN technique.
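For reference, the error measures named above can be computed as in the following sketch. The formulas are the usual definitions of MAE, MARE, MSE, and RMSE; the small constant eps in the MARE denominator is an assumption added here to avoid division by zero when an actual value is 0, and the sample values are hypothetical.

```python
import math

MSE_THRESHOLD = 0.002  # evaluation threshold stated in the text

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / len(actual)

def mare(actual, predicted, eps=0.05):
    """Mean absolute relative error; eps is an assumed guard for actual == 0.

    Assumes nonnegative (for example, normalized) actual values.
    """
    return sum(abs(p - a) / (a + eps) for a, p in zip(actual, predicted)) / len(actual)

def mse(actual, predicted):
    """Mean square error."""
    return sum((p - a) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(mse(actual, predicted))

# Hypothetical actual and predicted values for four classes.
actual = [0.0, 1.0, 1.0, 0.0]
predicted = [0.1, 0.9, 0.8, 0.0]
print(f"MAE={mae(actual, predicted):.4f} MARE={mare(actual, predicted):.4f} "
      f"RMSE={rmse(actual, predicted):.4f}")
print("meets MSE threshold:", mse(actual, predicted) <= MSE_THRESHOLD)
```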

From Table 20 it is evident that the gradient NN method obtained an accuracy rate of 94.04% in 162 epochs (iterations). The LM technique, which is an improved model of ANN, obtained a 90.4% accuracy rate. This is lower than the accuracy of gradient NN, but the LM method took only 13 epochs. The PNN method achieved a classification rate of 86.41%.

The three types of RBFN, namely, the basic, gradient, and hybrid methods, obtained prediction rates of 97.27%, 97.24%, and 98.47%, respectively. Considering the number of epochs, the RBFN hybrid method obtained the better prediction rate of 98.47% in only 14 epochs when compared with the gradient method (41 epochs) and the basic RBFN approach.

The FLANN architecture obtained a 96.37% accuracy rate at a lower computational cost. FLANN reached this accuracy rate in 66 epochs, as it has no hidden layer in its architecture.

The performance of PNN is shown in Figure 17. The highest prediction accuracy was obtained for a smoothing parameter value of 1.7. PNN obtained a classification rate of 86.41%.

RBFN using the hybrid learning model gives the lowest values of MAE, MARE, and RMSE and a high accuracy rate. Hence, from the results obtained using ANN techniques, it can be concluded that the RBFN hybrid approach obtained the best fault prediction rate in fewer epochs when compared with the other three ANN techniques.
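The selection described in this comparison can be restated as a short sketch using only the accuracy rates and epoch counts quoted in the text. The ranking rule (higher accuracy first, fewer epochs as a tie-breaker) is an assumption about how the two criteria combine; epoch counts that the text does not quote are left as None.

```python
# (technique, accuracy in percent, epochs or None when not quoted in the text)
results = [
    ("ANN (gradient descent)", 94.04, 162),
    ("ANN (Levenberg-Marquardt)", 90.4, 13),
    ("FLANN", 96.37, 66),
    ("RBFN (basic)", 97.27, None),
    ("RBFN (gradient)", 97.24, 41),
    ("RBFN (hybrid)", 98.47, 14),
    ("PNN", 86.41, None),
]

def rank(rows):
    """Sort by accuracy (descending), then by epochs (ascending, unknown last)."""
    return sorted(
        rows,
        key=lambda r: (-r[1], r[2] if r[2] is not None else float("inf")),
    )

ranked = rank(results)
for name, acc, epochs in ranked:
    shown = epochs if epochs is not None else "n/a"
    print(f"{name:28s} accuracy={acc:5.2f}%  epochs={shown}")
print("best:", ranked[0][0])
```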

7. Conclusion

The use of prediction models by system analysts to classify fault-prone classes as faulty or not faulty is the need of the day for researchers as well as practitioners, so more reliable approaches for prediction need to be modeled. In this paper, two approaches, namely, statistical methods and machine learning techniques, were applied for fault prediction. Applying statistical and machine learning methods to fault prediction requires an enormous amount of data, and analyzing this amount of data calls for a better prediction model.

This paper presents a comparative study of different prediction models for fault prediction in an open-source project. Fault prediction using statistical and machine learning methods was carried out for AIF by coding in the MATLAB environment. Statistical methods such as linear regression and logistic regression were applied. Machine learning techniques such as artificial neural network (gradient descent and Levenberg-Marquardt methods), functional link artificial neural network, radial basis function network (RBFN basic, RBFN gradient, and RBFN hybrid), and probabilistic neural network were also applied for fault prediction analysis.

It can be concluded from the statistical regression analysis that, out of the six CK metrics, WMC appears to be the most useful in predicting faults. Table 20 shows that the hybrid approach of RBFN obtained better fault prediction in fewer epochs (14 iterations) when compared with the other three neural network techniques.

In the future, this work should be replicated on other open-source projects such as Mozilla, using different AI techniques, to analyze which model achieves higher accuracy for fault prediction. Fault prediction accuracy should also be measured by combining multiple computational intelligence techniques.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Copyright

Copyright © 2014 Yeresime Suresh et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
