Mathematical Problems in Engineering
Volume 2015, Article ID 562716, 15 pages
http://dx.doi.org/10.1155/2015/562716
Research Article

Predicting Component Failures Using Latent Dirichlet Allocation

1Key Laboratory of Dependable Service Computing in Cyber Physical Society, Ministry of Education, Chongqing 400044, China
2School of Software Engineering, Chongqing University, Chongqing 401331, China

Received 10 January 2015; Revised 28 May 2015; Accepted 15 June 2015

Academic Editor: Mustapha Nourelfath

Copyright © 2015 Hailin Liu et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Latent Dirichlet Allocation (LDA) is a statistical topic model that has been widely used to extract semantic information from software source code. A failure is an observable error in program behavior. This work investigates whether semantic information and historically recorded failures can be used to predict component failures. We use LDA to extract topics from source code and propose a new metric, topic failure density (TFD), by mapping failures onto these topics. By exploring the basic information of topics in neighboring versions of a system, we obtain a similarity matrix; multiplying the TFD of the current version by this matrix yields the TFD of the next version. The prediction results achieve an average 77.8% agreement with the real failures when considering the top three and bottom three components ranked by number of failures. We use the Spearman coefficient to measure the statistical correlation between the actual and estimated failure rates. The validation results range from 0.5342 to 0.8337, which outperforms a comparable method. This suggests that our predictor, based on the similarity of topics, performs well at component failure prediction.

1. Introduction

Components are subsections of a product; each is a simple encapsulation of data and methods. Component-based software development has emerged as an important new set of methods and technologies [1, 2]. Effectively finding failure-prone components of a software system is significant for component-based software quality. The main goal of this paper is to provide a novel method for predicting component failures.

Recent prediction studies mainly build predictors from two sources. One is past failure data [3–5], and the other is the source code repository [6–8]. Nagappan et al. [9] found that if an entity failed often in the past, it is likely to do so in the future; they combined the two sources to build a component failure predictor. However, Gill and Grover [10] analyzed the characteristics of component-based software systems and concluded that some traditional metrics are inappropriate for analyzing component-based software. They stated that semantic complexity should be considered when characterizing a component.

Metrics based on semantic concerns [11–13] have provided initial evidence that topics in software systems are related to the defect-proneness of source code. These studies approximate concerns using statistical topic models, such as the LDA model [14]. Nguyen et al. [12] were among the first researchers to concentrate on the technical concerns/functionality of a system to predict failures. They suggested that topic-based metrics have a high correlation with the number of bugs and found that topic-based defect prediction has better predictive performance than other existing approaches. Chen et al. [15] used defect topics to explain the defect-proneness of source code and concluded that the prior defect-proneness of a topic can be used to explain the future behavior of topics and their associated entities. However, Chen's work mainly focused on single-file processing and did not draw a clear conclusion on whether topics can be used to describe the behavior of components.

In this paper, our research is motivated by the recent success of applying statistical topic models to predict the defect-proneness of source code entities. The prediction model in this work is built on failures and semantic concerns, as Figure 1 shows. A new metric (topic failure density) is defined by mapping failures back to topics. We study the performance of our new predictor on component failure prediction by analyzing three open source projects. In summary, the main contributions of this research are as follows.
(i) We utilize past failure data and semantic information in component failure prediction and propose a new metric based on semantic and failure data. As a result, it connects the semantic information with the failures of a component.
(ii) We explore the word-topic distribution and find the relationship between topics: the more similar the high-frequency words of two topics are, the more similar the topics are.
(iii) We investigate the capability of our proposed metric in component failure prediction and compare its prediction performance against the actual data from Bugzilla on three open source projects.

Figure 1: Process of predicting component failures using LDA. (a) Component failures and source code of components are extracted from a bug database and the source code repository. (b) Failure density is defined as the ratio of failures to the number of files; Com1, Com2, and Com3 indicate three components. We map failure density to topics by using the estimated topic distribution of components (three topics are indicated), obtaining the TFD. (c) A similarity matrix is calculated to depict the similarity of topics between the previous version and the next version. Finally, based on the TFD of the previous version and the similarity matrix, we predict the TFD of the next version and the component failures.

The remainder of this paper is organized as follows. Section 2 presents related work. Section 3 describes our research preparation, models, and techniques. Sections 4 and 5 present our experiments and validate our results. Finally, Section 6 concludes the paper.

2. Related Work

2.1. Software Defect Prediction

Several techniques have been studied for detecting and correcting defects in software. In general, software defect prediction falls into two subcategories: predicting the number of expected defects and predicting the defect-prone entities of a system [16].

El Emam et al. [6] constructed a prediction model based on object-oriented design metrics and used it to predict the classes that would contain failures in a future version; the model was then validated on a subsequent release of the same application. Thwin and Quah [17] applied neural networks to software quality estimation, building a model to predict the number of defects per class and the number of lines changed per class and using it to estimate software quality. Gyimóthy et al. [18] employed statistical and machine learning methods to assess object-oriented metrics and then built a classification model to predict the number of bugs in each class; they concluded that there was a strong linear association between the bugs in different versions. Malhotra [19] performed a systematic review of studies that used machine learning techniques for software fault prediction and concluded that machine learning techniques are able to predict software fault-proneness but that more studies are needed to obtain well-formed and generalizable results.

Other defect prediction researchers have focused on finding defect-prone parts of a system [16, 20, 21]. Ostrand et al. [7] made predictions based on the source code in the current release and the fault and modification history of each file from previous releases. The predictions were quite accurate when the model was applied to two large industrial systems, one with 17 releases over 4 years and the other with 9 releases over 4 years; however, such a long failure history may not exist for some projects. Turhan and Bener [22] proposed a prediction model combining multivariate approaches with a Bayesian method and used it to predict the number of failing modules; their major contribution was to incorporate multivariate approaches rather than a univariate one. K. O. Elish and M. O. Elish [23] investigated the capability of support vector machines (SVM) to find defect-prone modules, using the SVM to classify modules as defective or not defective. Krishnan et al. [24] investigated the relationship between classification-based prediction of failure-prone files and the product line. Jing et al. [25] introduced dictionary learning into software defect prediction, using a cost-sensitive discriminative dictionary learning (CDDL) approach to enhance classification for software defect prediction. Caglayan et al. [26] investigated the relationships between defects and test phases to build models that predict defect-prone modules. Yang et al. [27] introduced a learning-to-rank approach that constructs defect prediction models by directly optimizing ranking performance. Ullah [28] proposed a method for selecting the model that best predicts the residual defects of open source software applications or components. Nagappan et al. [9] found that complexity metrics correlate with component failures but that no single set of metrics acts as a universal defect predictor; they applied principal component analysis to the metrics, built a regression model, and used it to predict postrelease defects of components. Abaei et al. [29] proposed a software fault detection model using a semisupervised hybrid self-organizing map. Khoshgoftaar et al. [30] applied discriminant analysis to identify fault-prone modules in a sample from a very large telecommunications system. Ohlsson and Alberg [31] investigated the relationship between design metrics and the number of function test failure reports associated with software modules. Graves et al. [8] preferred process measures for predicting faults; several novel process measures derived from the change history, such as deltas, were used in their work, and they found process measures more appropriate for failure prediction than product metrics. Graves's work is the most similar to ours: both are based on process measures, but we additionally take the semantic concerns shared between versions into consideration. Neuhaus et al. [32] provided a tool for mining a vulnerability database, mapped the vulnerabilities to individual components, and built a predictor for exploring failure-prone components. Schröter et al. [33] used historical data to find which design decisions correlated with failures and used combinations of usage relationships between components to build a failure predictor.

2.2. LDA Model in Defect Research

LDA is an unsupervised machine learning technique that has been widely used to recognize latent topic information in documents or corpora. It is of great importance in latent semantic analysis, text sentiment analysis, and topic clustering. Software source code is a kind of text dataset; hence, researchers have applied LDA to diverse software activities such as software evolution [34, 35], defect prediction [12, 15], and defect localization [36].

Nguyen et al. [12] viewed a software system as a collection of software artifacts. They used a topic model to measure the concerns in the source code and used these as the input for a machine-learning-based defect prediction model, which they validated on an open source system (Eclipse JDT). The results showed that topic-based metrics have a high correlation with the number of bugs and that topic-based defect prediction has better predictive performance than existing state-of-the-art approaches. Liu et al. [11] proposed a new metric, called Maximal Weighted Entropy (MWE), for the cohesion of classes in object-oriented software systems. They compared the new metric with an extensive set of existing metrics and used them to construct models that predict software faults. Chen et al. [15] used a topic model to study the effect of conceptual concerns on code quality. They combined traditional metrics (such as LOC) with the word-topic distributions from the topic model to propose a new topic-based metric, used it to explain why some entities are more defect-prone, and found that defect topics are associated with defective entities in the source code. Lukins et al. [36] used an LDA-based technique for automatic bug localization and evaluated its effectiveness. They concluded that an effective static technique for automatic bug localization can be built around LDA and that there is no significant relationship between the accuracy of the LDA-based technique and the size of the subject software system or the stability of its source code base. Our work is inspired by the recent success of topic modeling in mining source code.

3. Research Methodology

The proposed method is divided into three steps: source code extraction and preprocessing, topic modeling, and prediction.

3.1. Data Extracting and Preprocessing

Modern software usually has a sound bug tracking system, such as Bugzilla or JIRA. The Eclipse project, for instance, maintains a Bugzilla database that records the status, version, and component of each bug. Our experimental data is obtained in three steps.
(1) Identify postrelease failures. From the bug tracking system, we get failures that were observed after a release.
(2) Collect source code from version management systems, such as SVN and Git.
(3) Prune the source code of a component. A component contains many kinds of files, but not all of them are necessary, such as project execution files and XML.

After the extraction step, we perform the following preprocessing steps: separating comments and identifiers, removing Java keywords and syntactic structure, stemming, and removing extremely high- or low-frequency words. Words occurring at a rate of more than 90% or at a rate of less than 5% are removed [37]. A sketch of this preprocessing is given below.
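As a rough illustration of the preprocessing just described, the following Python sketch tokenizes Java source text, drops Java keywords, stems the remaining words, and filters words by document frequency. The keyword list, the use of NLTK's Porter stemmer, and the reading of the 90%/5% thresholds as document frequencies are our assumptions, not details given in the paper.

```python
# Illustrative preprocessing sketch (not the authors' exact scripts).
import re
from collections import Counter
from nltk.stem import PorterStemmer

# Partial Java keyword list (assumed; extend as needed).
JAVA_KEYWORDS = {"public", "private", "protected", "class", "void", "int", "return",
                 "static", "final", "new", "import", "package", "if", "else", "for", "while"}

def tokenize(source_text):
    """Split identifiers/comments into lowercase tokens, drop Java keywords, and stem."""
    stemmer = PorterStemmer()
    words = re.findall(r"[A-Za-z]+", source_text)          # identifier and comment words
    words = [w.lower() for w in words if w.lower() not in JAVA_KEYWORDS]
    return [stemmer.stem(w) for w in words]

def filter_by_document_frequency(docs, high=0.90, low=0.05):
    """Remove words appearing in more than 90% or fewer than 5% of the documents."""
    n_docs = len(docs)
    df = Counter(w for d in docs for w in set(d))
    keep = {w for w, c in df.items() if low <= c / n_docs <= high}
    return [[w for w in d if w in keep] for d in docs]
```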

3.2. Topic Modeling

There are two steps in topic modeling. First, we connect the component size and the failures by defining the failure density. Second, we map failures to topics and build a failure topic metric.

3.2.1. Defining Failure Density

During the design phase of new software, designers make a detailed component list, and each component contains many files. In Bugzilla, bug reports are associated with component units. Knab et al. [38] stated that there is a relation between failures and component size. Researchers have used defect density, the number of defects per line of code, to assess the defects of a file [39]. In our study, we found that components with more files usually have more failures; the number of files differs between components, and so do the failures. Here, we define the failure density of a component $c$ as

$$\mathrm{FD}_c = \frac{F_c}{N_c}, \quad (1)$$

where $c$ denotes the component, $F_c$ is the total number of failures in component $c$, and $N_c$ is the number of files within component $c$. $\mathrm{FD}_c$ is used to depict how failure-prone a component is.
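As a tiny worked example of (1), a component's failure density is simply its failure count divided by its file count; the component names and counts below are hypothetical.

```python
# Minimal illustration of equation (1): FD_c = F_c / N_c.
def failure_density(failures_per_component, files_per_component):
    """Return FD_c for each component c."""
    return {c: failures_per_component[c] / files_per_component[c]
            for c in failures_per_component}

# Hypothetical counts: Com1 has 12 failures over 240 files, Com2 has 3 over 150.
fd = failure_density({"Com1": 12, "Com2": 3}, {"Com1": 240, "Com2": 150})
```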

3.2.2. Mapping Failures to Topics

LDA [14] is a generative probabilistic model of a corpus. It makes the simplifying assumption that all documents share the same $K$ topics. The $K$-dimensional vector $\alpha$ is the parameter of the topic prior distribution, while $\beta$, a $K \times V$ matrix, gives the word probabilities. The joint distribution of a topic mixture $\theta$, a set of $N$ topics $\mathbf{z}$, and a set of $N$ words $\mathbf{w}$ is

$$p(\theta, \mathbf{z}, \mathbf{w} \mid \alpha, \beta) = p(\theta \mid \alpha) \prod_{n=1}^{N} p(z_n \mid \theta)\, p(w_n \mid z_n, \beta), \quad (2)$$

where $p(z_n \mid \theta)$ is simply $\theta_i$ for the unique $i$ such that $z_n^i = 1$.

Integrating over $\theta$ and summing over $\mathbf{z}$, we obtain the marginal distribution of a document:

$$p(\mathbf{w} \mid \alpha, \beta) = \int p(\theta \mid \alpha) \left( \prod_{n=1}^{N} \sum_{z_n} p(z_n \mid \theta)\, p(w_n \mid z_n, \beta) \right) d\theta. \quad (3)$$

A corpus with $M$ documents is denoted $D = \{\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_M\}$, so

$$p(D \mid \alpha, \beta) = \prod_{d=1}^{M} \int p(\theta_d \mid \alpha) \left( \prod_{n=1}^{N_d} \sum_{z_{dn}} p(z_{dn} \mid \theta_d)\, p(w_{dn} \mid z_{dn}, \beta) \right) d\theta_d. \quad (4)$$

From (4), we obtain the parameters $\alpha$ and $\beta$ by training (maximizing the likelihood), so that we can compute the posterior distribution of the hidden variables given a document:

$$p(\theta, \mathbf{z} \mid \mathbf{w}, \alpha, \beta) = \frac{p(\theta, \mathbf{z}, \mathbf{w} \mid \alpha, \beta)}{p(\mathbf{w} \mid \alpha, \beta)}. \quad (5)$$

As defined in (1), the Failure Density (FD) is determined by the number of failures and the number of files within a component: it is the ratio of the number of failures in the component to its size and reflects the failure information of the component. Using this ratio as motivation, we define the failure density of a topic $z_k$ (TFD) by weighting each component's FD with the component's estimated topic proportion:

$$\mathrm{TFD}_k = \sum_{c} \theta_{c,k}\, \mathrm{FD}_c, \quad (6)$$

where $\theta_{c,k}$ is the proportion of topic $z_k$ in component $c$. Using (6), failures are mapped to topics; $\mathrm{TFD}_k$ describes the failure-proneness of topic $z_k$.
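The sketch below shows one way to obtain the component-topic distribution and the TFD using gensim. The paper does not name a specific LDA implementation, and the weighted-sum form of (6) is our reading of the text, so both are assumptions.

```python
# Sketch of the topic-modeling step with gensim (assumed implementation).
from gensim import corpora, models

def component_topic_distribution(component_docs, num_topics=20):
    """component_docs: {component_name: list of preprocessed tokens}."""
    names = list(component_docs)
    dictionary = corpora.Dictionary(component_docs[n] for n in names)
    corpus = [dictionary.doc2bow(component_docs[n]) for n in names]
    lda = models.LdaModel(corpus, id2word=dictionary, num_topics=num_topics)
    # theta[c][k] = estimated proportion of topic k in component c
    theta = {n: dict(lda.get_document_topics(bow, minimum_probability=0.0))
             for n, bow in zip(names, corpus)}
    return lda, theta

def topic_failure_density(theta, fd, num_topics):
    """TFD_k = sum_c theta[c][k] * FD_c  (our reading of equation (6))."""
    return [sum(theta[c].get(k, 0.0) * fd[c] for c in theta) for k in range(num_topics)]
```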

3.3. Prediction

Hindle et al. [40] compared two topics by taking the top 10 words with the highest probability: if 8 of the words are the same, the two topics are considered to be the same. In this paper, we increase the number of highest-probability words and define the similarity as the ratio of the number of words shared by two topics to the total number of high-probability words considered:

$$\mathrm{sim}(z_i, z_j) = \frac{|W_i \cap W_j|}{n}, \quad (7)$$

where $W_i$ and $W_j$ are the sets of the top $n$ high-probability words of topics $z_i$ and $z_j$. By (7), we calculate the similarity of the topics in two neighboring versions and obtain a similarity matrix. Then we define the TFD relation

$$\mathrm{TFD}_j^{v+1} = \sum_{i=1}^{K_v} \mathrm{sim}\bigl(z_i^{v}, z_j^{v+1}\bigr)\, \mathrm{TFD}_i^{v}, \quad (8)$$

where $v$ denotes the version, $K_v$ is the total number of topics in version $v$, and $\mathrm{sim}(z_i^{v}, z_j^{v+1})$ is the degree of similarity of topics $z_i^{v}$ and $z_j^{v+1}$ taken from the similarity matrix. After obtaining the TFD of version $v+1$ through (8), we recover $\mathrm{FD}_c$ in version $v+1$ using (6) and then obtain the number of failures in each component.
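The following sketch implements (7) and (8) under the assumption that gensim topic models are available for both versions and that both versions use the same number of topics; the top-word count n is left as a parameter, since the paper uses more than Hindle et al.'s 10 words.

```python
# Sketch of the similarity matrix (7) and the TFD propagation (8).
def top_words(lda, topic_id, n):
    """Top-n high-probability words of a topic."""
    return {word for word, _ in lda.show_topic(topic_id, topn=n)}

def similarity_matrix(lda_prev, lda_next, num_topics, n):
    """sim[i][j] = |shared top-n words of topic i (version v) and topic j (version v+1)| / n."""
    return [[len(top_words(lda_prev, i, n) & top_words(lda_next, j, n)) / n
             for j in range(num_topics)] for i in range(num_topics)]

def predict_tfd(tfd_prev, sim):
    """TFD_j(v+1) = sum_i sim[i][j] * TFD_i(v), following (8)."""
    num_next = len(sim[0])
    return [sum(sim[i][j] * tfd_prev[i] for i in range(len(tfd_prev)))
            for j in range(num_next)]
```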

4. Experiment

4.1. Dataset

The experimental data comes from three open source projects, that is, Platform (a subproject of Eclipse), Ant, and Mylyn. We select three versions of bug reports for each project and the source code of the corresponding versions (Platform3.2, Platform3.3, and Platform3.4; Ant1.6.0, Ant1.7.0, and Ant1.8.0; Mylyn3.5, Mylyn3.6, and Mylyn3.7). The basic information of the three projects is shown in Table 1.

Table 1: Three open-source projects.
4.2. Results and Analysis on Three Open Source Projects

For a given corpus, there is no unified standard for choosing the number of topics $K$, and different corpora call for very different numbers of topics [41–43]. Setting $K$ to an extremely small value causes topics to contain multiple concepts (imagine only a single topic, which would contain all of the concepts in the corpus), while setting $K$ to an extremely large value makes the topics too fine-grained to be meaningful and only reveals the idiosyncrasies of the data [34]. Considering the number and scale of the components within each project, together with our experience, we vary $K$ from 10 to 100. The experimental results are shown in Figure 2.

Figure 2: The prediction results with different numbers of topics.

We vary the number of topics and compare the similarity between the predicted data and the actual data. From Figure 2, the result with 20 topics is better than the others. This visualization allows us to quickly and compactly compare and contrast the trends exhibited by the various topic counts. We therefore set $K$ to 20 for the three projects; a sketch of this selection loop is given below.
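A hedged sketch of the selection loop described above: train models for each candidate K and keep the value whose predictions agree best with the actual data. Here train_models and evaluate_agreement are hypothetical callbacks standing in for the full pipeline and for the comparison shown in Figure 2.

```python
# Hypothetical K-selection loop (the evaluation callbacks are placeholders).
def select_num_topics(train_models, evaluate_agreement, candidates=range(10, 101, 10)):
    best_k, best_score = None, float("-inf")
    for k in candidates:
        models_for_k = train_models(k)             # e.g., LDA models for two neighboring versions
        score = evaluate_agreement(models_for_k)   # agreement of predicted vs. actual failures
        if score > best_score:
            best_k, best_score = k, score
    return best_k
```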

We run LDA on the three projects and obtain the topic distribution of the components. The full listing of the discovered topic distributions is given in Table 4. We compare the topic distributions of the components between neighboring versions of the three projects and observe that the topics in the current version relate to the topics in the previous version. For example, topic 8 in version 3.5 and topic 1 in version 3.6 have almost the same correlation with 11 components (Figure 3). We also find pairs of topics with large differences.

Figure 3: The relationship between topics and components.

Why do topic 8 in version 3.5 and topic 1 in version 3.6 have only a small variance in COM1 (Bugzilla), with relation values of 0.8988 and 0.8545, respectively (see Table 4)? And what makes a third topic differ from them within the components? We compare the high-probability word information of the three topics. Table 2 shows the high-probability words (top 10 words) of these three topics.

Table 2: High probability words.

In the experiments, we found that not every TFD of the previous version has an impact on the later version. When the similarity between topics is below a threshold, the influence on the TFD is very small or even negative.

Among the high-probability words of topic 8 in version 3.5 and topic 1 in version 3.6, only the ninth word, "message," differs. In addition, the third topic has nothing in common with them in terms of high-probability words. We conclude that this is the most direct reason why two topics have a highly similar (or greatly different) relation value with the components, and it is why we use the similarity of high-probability words to describe the similarity between two topics. At the same time, a similarity matrix is built to show the correspondence of topics in two neighboring versions. Table 3 shows a similarity matrix for the three projects. In our study, some topics had one or more similar topics in the next version, whereas others had none; this is consistent with topic evolution [34].

Table 3: Similarity matrix of topics.
Table 4

We use (6) to calculate the TFD for each topic on the three projects (Platform, Ant, and Mylyn). In order to better describe the relation between TFD and versions, we use a box plot. TFD of these three projects is shown in Figure 4.

Figure 4: Box plots of TFD of three projects.

From Table 1 and Figure 4, it is seen that the fewer the failures, the shorter the box. For example, the length of the box corresponding to the TFD of Ant1.8 is almost 0, and the number of failures in Ant1.8 is only 5 (Table 1); the number of failures in Ant1.7 is 104, and its box is much longer. According to (6), TFD is determined by the topic distribution matrix and the FD of the components; if the FD or the topic distribution matrix changes, the TFD also changes. However, in our research, the number of files and the topic mixture are constant within a version of a project. We conclude that the distribution of the TFDs within a project is related to the number of failures; in other words, the distribution of TFD reflects the failure distribution in each version of a project.

The work above shows that the similarity of high-probability words describes the connection between topics in two neighboring versions (see Table 3) and that the TFD is connected with the failures of components (see Figure 4). Next, we use the TFD and the relation between topics to predict the failures of components.

Figure 5 shows the prediction results of the failures in each component and the actual number of failures in each component of three projects.

Figure 5: Failures (prediction) versus actual failures.

From Figure 5, it is seen that the number of failures produced by our predictor (the prediction data) is related to the number of failures collected from Bugzilla (the actual data). When the actual data of a component is larger, our prediction data is usually larger as well, for example, for the SWT and Debug components in Figure 5(c).

What is the significance of our prediction? As in [9], we sort components by the number of failures. We find that the predicted ranking of many components is consistent with the actual ranking (Figure 6). Comparing the first three and last three components with the actual ranking, the average correct rate is 77.8%. In other words, our proposed prediction method quickly finds which components will have the most failures and which the fewest in the next version. This gives an idea of testing priorities and allows software organizations to better focus their testing activities and improve cost estimation; a sketch of the ranking comparison is given after Figure 6.

Figure 6: Comparing predicted and actual rankings.
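One plausible way to compute the top-3/bottom-3 ranking agreement reported above is sketched below; the paper does not spell out the exact agreement computation, so this is an assumption.

```python
# Agreement between predicted and actual rankings on the top-k and bottom-k components.
def rank_agreement(predicted, actual, k=3):
    """predicted, actual: {component: failure count}. Returns the fraction of components
    shared by the top-k and bottom-k of both rankings."""
    order = lambda d: [c for c, _ in sorted(d.items(), key=lambda x: x[1], reverse=True)]
    ends = lambda ranking: set(ranking[:k]) | set(ranking[-k:])
    return len(ends(order(predicted)) & ends(order(actual))) / (2 * k)
```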

5. Validity

5.1. Validation and Comparison

To evaluate the correlation between our prediction data and the actual data, we use the Spearman correlation coefficient [44], which measures the dependence of two variables on each other [45]. If there are no duplicate values in the data and the two variables have a perfectly monotonic relationship, the Spearman coefficient is $+1$ or $-1$: $+1$ represents a complete positive correlation, $-1$ a perfect negative correlation, and 0 no relationship between the two variables. The correlation is computed as

$$\rho = 1 - \frac{6 \sum_{i} d_i^2}{n(n^2 - 1)}, \quad (9)$$

where $d_i$ is the difference between the ranks of $x_i$ and $y_i$ and $n$ is the number of observations.

In this paper, $x$ is the actual data and $y$ is the prediction data. The better our predictor is, the stronger the correlation; a correlation of 1.0 means that the sensitivity of the predictor is high. The resulting Spearman correlation coefficients are shown in Figure 7.
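The Spearman correlation of (9) can be computed directly with SciPy, as in the sketch below; the component failure counts shown are placeholders, not data from the paper.

```python
# Spearman rank correlation between actual and predicted component failures.
from scipy.stats import spearmanr

actual    = [34, 12, 5, 27, 9, 3]   # actual post-release failures per component (placeholder)
predicted = [30, 15, 4, 22, 11, 2]  # failures estimated by the TFD-based predictor (placeholder)

rho, p_value = spearmanr(actual, predicted)
print(f"Spearman rho = {rho:.4f} (p = {p_value:.4f})")
```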

Figure 7: Performance comparison between our approach and Graves et al.’s approach.

From Figure 7, we find that the predicted failures are positively correlated with the actual values under our approach. For instance, in project Mylyn3.7, the higher the number of failures in a component, the larger the number of postrelease failures (correlation 0.6838). To conduct a comparison, we implemented the lightweight method provided by Graves et al. [8], which predicts failures from the number of past changes to the code and a weighted average of the dates of those changes. As they described, we collected change management data (deltas) and the average age of the components for the three projects from GitHub. A general linear model was used to build the prediction model; their most successful model expresses the log of the expected number of faults in terms of deltas and age, where deltas is the number of changes to the code in the past and age is a weighted average of the dates of the changes to the module, that is, the average age of the code.
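A sketch of such a baseline fit is shown below: a linear model for the log of the fault count with the log of deltas and the age as predictors, fit by ordinary least squares. The functional form, the fitting method, and the data are our assumptions for illustration and do not reproduce Graves et al.'s published equation or coefficients.

```python
# Illustrative baseline fit in the spirit of Graves et al.'s change-history model.
import numpy as np

deltas = np.array([120, 45, 300, 80, 10], dtype=float)      # past changes per component (placeholder)
age    = np.array([400, 900, 150, 600, 1200], dtype=float)  # weighted average age of changes, in days (placeholder)
faults = np.array([18, 4, 40, 9, 1], dtype=float)           # observed post-release faults (placeholder)

# Fit log(faults) ~ b0 + b1*log(deltas) + b2*age by least squares (assumed form).
X = np.column_stack([np.ones_like(deltas), np.log(deltas), age])
coef, *_ = np.linalg.lstsq(X, np.log(faults), rcond=None)
predicted_faults = np.exp(X @ coef)
```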

In the evaluation, we use Graves’s model to obtain component failures and get the Spearman correlation with the actual failure data (Figure 7). It is seen that our approach gets a higher correlation with actual failures.

5.2. Threats to Validity

Threats to validity of the study are as follows.

Data Extraction. In Bugzilla, each failure is assigned by a tester to a component, so failures in a component are easy to collect. However, it is difficult to extract the source code for each component: when we obtain source code from the version management systems, we must classify it into components ourselves, which may introduce mistakes. For example, a file that belonged to component 1 in one version may be moved into component 2 in the next version.

Parameter Values. Our evaluation of component similarity is based on topics. Since LDA is a probabilistic model, mining different versions of the source code may lead to different topics. Besides, our work involves choosing several parameters for the LDA computation; the most important is probably the number of topics. Also required for LDA are the number of sampling iterations and the prior distributions for the topic and document smoothing parameters, $\alpha$ and $\beta$. There is currently no theoretically guaranteed method for choosing optimal values for these parameters, even though the resulting topics are clearly affected by these choices.

6. Conclusion

This paper studies whether and how historical semantic and failure information can be used to facilitate component failure prediction. In our work, the LDA topic model is used to mine topics from software source code. We map source code failure information onto topics to obtain the TFD, and we find that the TFD is quite useful for describing the distribution of failures across components. After exploring the word-topic information and the high-frequency words, we find a regularity among topics: the similarity of topics is determined by the similarity of their high-frequency words. These two results motivated our prediction model, in which the TFD serves as the basic information and the similarity matrix as a bridge connecting the topics of neighboring versions. Our results show that the predictor achieves high precision in predicting component failures. To validate the prediction results further, the Spearman rank correlation is used; it ranges from 0.5342 to 0.8337, which outperforms the comparable method. This suggests that our prediction model is well suited to predicting component failures.

Appendix

See Table 4.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgment

The work described in this paper was partially supported by the National Natural Science Key Foundation (Grant no. 91118005), the National Natural Science Foundation of China (Grant no. 61173131), the Natural Science Foundation of Chongqing (Grant no. CSTS2010BB2061), and the Fundamental Research Funds for the Central Universities (Grant nos. CDJZR12098801 and CDJZR11095501).

References

  1. L. D. Balk and A. Kedia, "PPT: a COTS integration case study," in Proceedings of the International Conference on Software Engineering, pp. 42–49, June 2000.
  2. H. C. Cunningham, Y. Liu, P. Tadepalli, and M. Fu, "Component software: a new software engineering course," Journal of Computing Sciences in Colleges, vol. 18, no. 6, pp. 10–21, 2003.
  3. D. Gray, D. Bowes, N. Davey, Y. Sun, and B. Christianson, "Using the support vector machine as a classification method for software defect prediction with static code metrics," in Engineering Applications of Neural Networks, pp. 223–234, Springer, 2009.
  4. M. Fischer, M. Pinzger, and H. Gall, "Populating a release history database from version control and bug tracking systems," in Proceedings of the International Conference on Software Maintenance (ICSM '03), pp. 23–32, IEEE, September 2003.
  5. T. Zimmermann, A. Zeller, P. Weissgerber, and S. Diehl, "Mining version histories to guide software changes," IEEE Transactions on Software Engineering, vol. 31, no. 6, pp. 429–445, 2005.
  6. K. El Emam, W. Melo, and J. C. Machado, "The prediction of faulty classes using object-oriented design metrics," Journal of Systems and Software, vol. 56, no. 1, pp. 63–75, 2001.
  7. T. J. Ostrand, E. J. Weyuker, and R. M. Bell, "Predicting the location and number of faults in large software systems," IEEE Transactions on Software Engineering, vol. 31, no. 4, pp. 340–355, 2005.
  8. T. L. Graves, A. F. Karr, J. S. Marron, and H. Siy, "Predicting fault incidence using software change history," IEEE Transactions on Software Engineering, vol. 26, no. 7, pp. 653–661, 2000.
  9. N. Nagappan, T. Ball, and A. Zeller, "Mining metrics to predict component failures," in Proceedings of the 28th International Conference on Software Engineering (ICSE '06), pp. 452–461, May 2006.
  10. N. S. Gill and P. S. Grover, "Component-based measurement," ACM SIGSOFT Software Engineering Notes, vol. 28, no. 6, 2003.
  11. Y. Liu, D. Poshyvanyk, R. Ferenc, T. Gyimóthy, and N. Chrisochoides, "Modeling class cohesion as mixtures of latent topics," in Proceedings of the IEEE International Conference on Software Maintenance (ICSM '09), pp. 233–242, September 2009.
  12. T. T. Nguyen, T. N. Nguyen, and T. M. Phuong, "Topic-based defect prediction (NIER track)," in Proceedings of the 33rd International Conference on Software Engineering (ICSE '11), pp. 932–935, May 2011.
  13. G. Maskeri, S. Sarkar, and K. Heafield, "Mining business topics in source code using latent dirichlet allocation," in Proceedings of the 1st India Software Engineering Conference (ISEC '08), pp. 113–120, February 2008.
  14. D. M. Blei, A. Y. Ng, and M. I. Jordan, "Latent Dirichlet allocation," Journal of Machine Learning Research, vol. 3, no. 4-5, pp. 993–1022, 2003.
  15. T.-H. Chen, S. W. Thomas, M. Nagappan, and A. E. Hassan, "Explaining software defects using topic models," in Proceedings of the 9th IEEE Working Conference on Mining Software Repositories (MSR '12), pp. 189–198, June 2012.
  16. F. Elberzhager, A. Rosbach, J. Münch, and R. Eschbach, "Reducing test effort: a systematic mapping study on existing approaches," Information and Software Technology, vol. 54, no. 10, pp. 1092–1106, 2012.
  17. M. M. T. Thwin and T.-S. Quah, "Application of neural networks for software quality prediction using object-oriented metrics," Journal of Systems and Software, vol. 76, no. 2, pp. 147–156, 2005.
  18. T. Gyimóthy, R. Ferenc, and I. Siket, "Empirical validation of object-oriented metrics on open source software for fault prediction," IEEE Transactions on Software Engineering, vol. 31, no. 10, pp. 897–910, 2005.
  19. R. Malhotra, "A systematic review of machine learning techniques for software fault prediction," Applied Soft Computing, vol. 27, pp. 504–518, 2015.
  20. L. Guo, Y. Ma, B. Cukic, and H. Singh, "Robust prediction of fault-proneness by random forests," in Proceedings of the 15th International Symposium on Software Reliability Engineering (ISSRE '04), pp. 417–428, IEEE, November 2004.
  21. Y. Peng, G. Kou, G. Wang, W. Wu, and Y. Shi, "Ensemble of software defect predictors: an AHP-based evaluation method," International Journal of Information Technology and Decision Making, vol. 10, no. 1, pp. 187–206, 2011.
  22. B. Turhan and A. Bener, "A multivariate analysis of static code attributes for defect prediction," in Proceedings of the 7th International Conference on Quality Software (QSIC '07), pp. 231–237, October 2007.
  23. K. O. Elish and M. O. Elish, "Predicting defect-prone software modules using support vector machines," Journal of Systems and Software, vol. 81, no. 5, pp. 649–660, 2008.
  24. S. Krishnan, C. Strasburg, R. R. Lutz, K. Goseva-Popstojanova, and K. S. Dorman, "Predicting failure-proneness in an evolving software product line," Information and Software Technology, vol. 55, no. 8, pp. 1479–1495, 2013.
  25. X.-Y. Jing, S. Ying, Z.-W. Zhang, S.-S. Wu, and J. Liu, "Dictionary learning based software defect prediction," in Proceedings of the 36th International Conference on Software Engineering (ICSE '14), pp. 414–423, ACM, Hyderabad, India, May-June 2014.
  26. B. Caglayan, A. Tosun Misirli, A. B. Bener, and A. Miranskyy, "Predicting defective modules in different test phases," Software Quality Journal, vol. 23, no. 2, pp. 205–227, 2014.
  27. X. Yang, K. Tang, and X. Yao, "A learning-to-rank approach to software defect prediction," IEEE Transactions on Reliability, vol. 64, no. 1, pp. 234–246, 2015.
  28. N. Ullah, "A method for predicting open source software residual defects," Software Quality Journal, vol. 23, no. 1, pp. 55–76, 2015.
  29. G. Abaei, A. Selamat, and H. Fujita, "An empirical study based on semi-supervised hybrid self-organizing map for software fault prediction," Knowledge-Based Systems, vol. 74, pp. 28–39, 2015.
  30. T. M. Khoshgoftaar, E. B. Allen, K. S. Kalaichelvan, and N. Goel, "Early quality prediction: a case study in telecommunications," IEEE Software, vol. 13, no. 1, pp. 65–71, 1996.
  31. N. Ohlsson and H. Alberg, "Predicting fault-prone software modules in telephone switches," IEEE Transactions on Software Engineering, vol. 22, no. 12, pp. 886–894, 1996.
  32. S. Neuhaus, T. Zimmermann, C. Holler, and A. Zeller, "Predicting vulnerable software components," in Proceedings of the 14th ACM Conference on Computer and Communications Security (CCS '07), pp. 529–540, ACM, November 2007.
  33. A. Schröter, T. Zimmermann, and A. Zeller, "Predicting component failures at design time," in Proceedings of the 5th ACM-IEEE International Symposium on Empirical Software Engineering (ISCE '06), pp. 18–27, September 2006.
  34. S. W. Thomas, B. Adams, A. E. Hassan, and D. Blostein, "Studying software evolution using topic models," Science of Computer Programming, vol. 80, pp. 457–479, 2014.
  35. S. W. Thomas, B. Adams, A. E. Hassan, and D. Blostein, "Modeling the evolution of topics in source code histories," in Proceedings of the 8th Working Conference on Mining Software Repositories (MSR '11), pp. 173–182, May 2011.
  36. S. K. Lukins, N. A. Kraft, and L. H. Etzkorn, "Bug localization using latent Dirichlet allocation," Information and Software Technology, vol. 52, no. 9, pp. 972–990, 2010.
  37. R. Scheaffer, J. McClave, and E. R. Ziegel, "Probability and statistics for engineers," Technometrics, vol. 37, no. 2, p. 239, 1995.
  38. P. Knab, M. Pinzger, and A. Bernstein, "Predicting defect densities in source code files with decision tree learners," in Proceedings of the International Workshop on Mining Software Repositories (MSR '06), pp. 119–125, May 2006.
  39. F. Akiyama, "An example of software system debugging," in Proceedings of the IFIP Congress, pp. 353–359, Ljubljana, Slovenia, 1971.
  40. A. Hindle, M. W. Godfrey, and R. C. Holt, "What's hot and what's not: windowed developer topic analysis," in Proceedings of the IEEE International Conference on Software Maintenance (ICSM '09), pp. 339–348, IEEE, Edmonton, Canada, September 2009.
  41. H. M. Wallach, I. Murray, R. Salakhutdinov, and D. Mimno, "Evaluation methods for topic models," in Proceedings of the 26th Annual International Conference on Machine Learning (ICML '09), pp. 1105–1112, 2009.
  42. T. L. Griffiths and M. Steyvers, "Finding scientific topics," Proceedings of the National Academy of Sciences of the United States of America, vol. 101, supplement 1, pp. 5228–5235, 2004.
  43. S. Grant and J. R. Cordy, "Estimating the optimal number of latent concepts in source code analysis," in Proceedings of the 10th IEEE International Working Conference on Source Code Analysis and Manipulation (SCAM '10), pp. 65–74, September 2010.
  44. A. D. Lovie, "Who discovered Spearman's rank correlation?" British Journal of Mathematical and Statistical Psychology, vol. 48, no. 2, pp. 255–269, 1995.
  45. M. D'Ambros, M. Lanza, and R. Robbes, "An extensive comparison of bug prediction approaches," in Proceedings of the 7th IEEE Working Conference on Mining Software Repositories (MSR '10), pp. 31–41, IEEE, May 2010.