Abstract

Owing to the ever-expanding scale of software, solving the problem of bug triage efficiently and reasonably has become one of the most important issues in software project maintenance. However, bug triage faces two challenges: the low quality of bug reports and the engagement of developers. Most existing bug triage solutions are based on text information alone and do not consider developer engagement, which leads to a loss of bug triage accuracy. To overcome these two challenges, we propose a high-dimensional hybrid data reduction method that combines feature selection with instance selection to build a small-scale and high-quality dataset of bug reports by removing redundant or noninformative bug reports and words. In addition, we study the recent engagement of developers, which can effectively distinguish similar bug reports and provide a more suitable list of recommended developers. Finally, we experiment with four bug repositories: GCC, OpenOffice, Mozilla, and NetBeans. We experimentally verify that our method can effectively improve the efficiency of bug triage.

1. Introduction

A large amount of data is generated during software development and maintenance, and bug reports are generated continuously during this process. A bug report contains a basic description of the bug, error messages, the current status of the bug (whether it has been solved or assigned to a developer), etc. [1]. According to statistics, bug fixing increases the costs of software companies by 45% [2] and prolongs software development. Therefore, solving the problem of bug triage efficiently and accurately, that is, assigning a bug to the most suitable developer [39], is important for software projects. Bug tracking systems were developed to track all aspects of bug messages and to help software developers fix bugs in time; these systems play an important role in dispatching work among developers. Typical bug tracking systems are Mantis [10], Bugzilla [11], and JIRA [12].

The earliest method of bug triage was to manually assign a bug to a corresponding developer. A senior manager would read a bug report in the bug tracking system and select a suitable developer for it based on his/her own experience and knowledge of the developers' abilities. However, this method of assignment not only wastes time and human resources, but also has limited accuracy. First, in large-scale software development, the number of bugs dramatically increases, and the quality of bug reports may vary. From 2001 to 2010, 333,371 bugs were submitted to Eclipse by more than 34,917 developers. When developers submit many duplicated or invalid bugs, a lot of time is wasted [13–15]. Moreover, owing to the large number of developers, senior managers cannot remember the bug-fixing skills of each developer and the types of bugs that a given developer is good at. Thus, senior managers may assign bugs improperly, which reduces the accuracy of bug fixes. Recently, novel approaches based on software networks have been proposed to address software quality problems, e.g., from the perspective of bug networks [16] and social networks [17]. These methods rest on two essential premises: a network constructed from the software and extra information such as the social relationships of developers. Because of these requirements, they are difficult to adopt widely for improving software quality and bug triage.

To solve the above problems of manual bug triage and improve classification accuracy, Murphy et al. [5] proposed applying text categorization technology to bug triage; it automatically generates a list of recommended developers after training on a bug dataset with developer labels. In their research, a bug report is an instance, the words in the bug report are the corresponding attributes, and the developer is the label of an instance. They then apply a classification algorithm to predict the best developer sequence for bug fixes. Subsequently, some researchers used the vector space model to represent bug reports [2, 7, 18–24]. In addition, the topic model was used to distinguish different documents, where the words within a document have a specific relationship to each topic [25]. Researchers [26–31] not only used the topic model but also studied metadata (such as products, components, and operating system types in bug reports) to improve accuracy.

Two challenges in automatic bug triage remain to be solved. First, the original bug repositories are large in scale and contain low-quality bug reports, which can cause problems such as extensive computation and a decline in predictive performance. Second, most existing work ignores the influence of developer engagement: developers who have joined recently may be relatively more active, because other developers may change positions or leave.

In this study, we propose a high-dimensional hybrid data reduction method that combines feature selection with instance selection to build a small-scale and high-quality set of bug reports by removing redundant or noninformative bug reports and words. Our method combines feature selection (FS) with instance selection (IS) using the differential evolution (DE) method with updated rules of crossover and variation. In addition, we consider developer engagement in each project for more reasonable bug triage: if a developer has dealt with bugs with the same product information before, we consider him/her more engaged and are more likely to assign the bug report to him/her. We conduct experiments on four public bug repositories: GCC, OpenOffice, Mozilla, and NetBeans. The main contributions of this paper are summarized as follows:
(1) We present a high-dimensional hybrid data reduction method based on DE to obtain a small-scale and high-quality dataset for bug triage. Specifically, we aim to (a) simultaneously reduce the bug report dimension and the word dimension and (b) improve the performance of bug triage.
(2) We further consider the developer engagement level and reorder the optimal list of recommended developers by combining it with the product information related to bug reports and developers.
(3) We verify the effectiveness of our proposed method on four bug repositories (GCC, OpenOffice, Mozilla, and NetBeans). The experimental results show that our proposed method is superior to existing methods.

The rest of the paper is structured as follows. Section 2 presents closely related work on bug triage. Section 3 details the proposed DE method, which is improved in our paper. Section 4 describes our experiment with GCC, OpenOffice, Mozilla, and NetBeans and explains the results. Finally, Section 5 concludes our paper and describes future work.

2. Related Work

2.1. Methods of Bug Triage

The main purpose of bug triage is to find the right developer to fix a newly submitted bug [2, 7, 11, 19, 20, 32–34]. At present, machine learning is used to accomplish automatic bug dispatching: the developer is considered the category label of the bug, and the text information of the bug is regarded as the features. First, the algorithm learns from historical bug data; then, it predicts which developer should be assigned to a new bug report. Murphy and Cubranic [7] proposed a text classification approach to bug triage; based on the Naïve Bayes classification algorithm, a list of recommended developers was predicted for Eclipse. Anvik et al. [2] used a variety of machine learning algorithms (such as Naïve Bayes, SVM, and C4.8) to learn from historical data for bug triage. Tamrawi et al. [23] proposed "Bugzie," a method in which a fuzzy set is used to model a developer's expertise and determine whether a new bug report suits this developer. Naguib et al. [35] used a latent Dirichlet allocation language model to learn the similarity between new bug reports and developers from previously available data on bugs already fixed by developers. Yang et al. [27] divided bug reports into different topics. Zhang et al. [36] considered both developer relationships and topic models to recommend a bug report to the developer who would fix it best. Jeong et al. [21] used a Markov chain-based graph model to improve bug triage accuracy and tested it on Eclipse and Mozilla. Xia et al. [37] proposed a composite method named DevRec that analyzes bug reports and developers simultaneously.

Besides bug report features, other information has also been used to classify bugs. Linares-Vásquez et al. [38] analyzed the title comments of related documents and found relevant documents to recommend developers. Bhattacharya and Neamtiu [22] studied tossing graphs with various attributes, which include the developers who could not fix a bug and the ones who eventually could, to improve triage accuracy. Kevic et al. [39] assigned the current bug to a developer by finding a developer whose fixed bugs were similar to the current bug report. Wang and Lo [40] used a new approach named FixerCache, which assigns new bug reports to developers based on each developer's enthusiasm for different product components. In the study by Shokripour et al. [41], four information resources were considered to obtain a list of recommended developers. Xia et al. [37] proposed the TopicMiner-MTM model, which uses training data to model the topic distribution of bug reports; however, this approach has difficulties distinguishing similar developers.

2.2. Bug Triage and Software Networks

Besides bug reports, researchers have adopted other information to address bug-related problems. Because the quality of a software system is partially determined by its structure (topological structure), software systems can be modeled as complex networks in which software components are abstracted as nodes and their interactions as edges. Accordingly, researchers have proposed many approaches and measures based on general software networks. Zhang and Lee [42] proposed an automated developer recommendation approach for bug triage that builds a concept profile (CP) to extract bug concepts with topic terms from the documents produced by related bug reports; they identify and rank important developers for bug fixing by using a social network (SN). Because of the functional form of the incoming-link distribution of a software dependence network, software is fragile with respect to the failure of a random single component. Locating a faulty component is easy if the failure affects only its nearest neighbors, but hard if it propagates further. Challet and Lombardoni [43] addressed how software components are affected by failures, and the inverse problem of locating the faulty component, by modeling bug propagation and debugging in asymmetric software structures. Pan et al. [44] also analyzed the bug propagation process based on weighted software networks (WSNs); their model considers how a bug in one class propagates to others and is effective for locating the component that is the source of the bug. The aforementioned approaches take advantage of complex networks, especially the relationships among developers, users, software components, and bug reports. From the perspective of software networks, bug triage involves developers and bug reports; thus, improving the accuracy of bug triage can promote and enhance the effectiveness of software network-related methods. For example, bug triage can help us gain deeper insight into bug propagation on software networks. Correspondingly, novel measurements based on software networks may provide more useful information for bug triage. In other words, they are mutually beneficial.

In summary, we find that these traditional feature selection approaches do not have sufficient search range and depth and do not consider chronological order. Therefore, their denoising ability is limited, and their effect on data reduction is not obvious. This study addresses both aspects. First, we fully consider the chronological order of bug reports. Second, we improve the differential evolution method on this basis to expand the search range and depth, while still ensuring the convergence of the method, so that the accuracy of bug triage increases effectively. Unlike the above research, our paper focuses more on developer engagement. Besides the text provided in the bug report itself, the developers' bug-fixing experience is also used to obtain the best list of recommended developers. When the similarity of different bug reports is very high, our approach helps the classifier strictly separate different products to complete the bug triage task successfully and substantially improve the efficiency of bug triage.

3. The Proposed Algorithm

3.1. Overview

In this section, we propose a high-dimensional hybrid data reduction method for bug triage, as shown in Figure 1. The method includes three main phases: data preprocessing, data reduction, and developer engagement detection. In the data preprocessing phase, we preprocess each bug report using word segmentation, stop-word removal, stemming, and vector space representation to obtain word vectors. In the data reduction phase, we propose a high-dimensional hybrid data reduction method that combines feature selection with instance selection to build a small-scale and high-quality dataset of bug reports by removing redundant or noninformative bug reports and words. In the developer engagement detection phase, we consider the influence of the product information of the current test case, which incorporates the level of developer engagement into the final ranking of recommended developers. These three phases are described in the following sections.

3.2. Data Preprocessing

After extracting text and developer information from the bug report tracker, our method preprocesses the data to obtain word vectors for each bug report. The text preprocessing includes tokenization, stop-word removal, stemming, and keyword vector representation [11]. Specifically, tokenization converts the text of the original bug report into a set of words. Stop-word removal deletes insignificant words that appear frequently in bug reports (such as "the" or "in"). Stemming transforms words that may appear in different forms into their basic form (for example, "computerized" is reduced to "computer"). Keyword vector representation produces a word vector to model a bug report after the previous steps, and we delete words with a frequency of less than 10. Finally, our method uses a multidimensional vector space, where each word represents a dimension, to describe the processed bug report collection. Every bug report is a vector over the word dimensions, as shown in the following equation:

$B_i = (w_{i1}, w_{i2}, \ldots, w_{iN})$,

where $B_i$ is a bug report and $w_{ij}$ refers to word $j$ in the word dimension.
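To make the pipeline concrete, the following is a minimal Python sketch; the library choices (scikit-learn and NLTK) and the sample texts are ours, not the paper's, and min_df approximates the frequency-10 cutoff via document frequency.

import re
from nltk.stem import PorterStemmer
from sklearn.feature_extraction.text import CountVectorizer, ENGLISH_STOP_WORDS

stemmer = PorterStemmer()
token_re = re.compile(r"[a-z]+")

def analyze(text):
    # tokenization -> stop-word removal -> stemming
    return [stemmer.stem(tok) for tok in token_re.findall(text.lower())
            if tok not in ENGLISH_STOP_WORDS]

# Toy corpus; on real data use min_df=10 to drop words occurring in fewer
# than 10 reports (a document-frequency proxy for the paper's rule).
bug_reports = [
    "Crash in the computerized layout engine",
    "Layout engine crashes when the window is resized",
]
vectorizer = CountVectorizer(analyzer=analyze, min_df=1)
X = vectorizer.fit_transform(bug_reports)   # each row is one report's word vector
print(vectorizer.get_feature_names_out())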

3.3. Data Reduction

After applying the bug report preprocessing model, our method obtains word vectors for bug reports. Considering that some bug reports become outdated over time, the data contain noise and redundancy. To reduce the impact of chronological order, our method divides the bug reports into three parts according to the ratio of 7 : 1 : 2 from small ID to large ID (the first 70% is the training set, the middle 10% is the validation set, and the last 20% is the test set). In the search process, our method first uses the training set for training and the validation set for evaluation to find the optimal solutions with FS, IS, and FS + IS. Next, the combination of the training set and validation set is regarded as a new training set, and our method obtains a final Top-10 developer list for the test set. Subsequently, we explain the rules of the DE method and the data reduction methods of FS, IS, and simultaneous FS and IS (FS + IS).
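A sketch of this chronological split in Python (the id field name is our assumption):

def chronological_split(reports):
    # Sort by bug ID (smaller ID = earlier report), then split 7 : 1 : 2.
    reports = sorted(reports, key=lambda r: r["id"])
    n = len(reports)
    train = reports[: int(0.7 * n)]
    valid = reports[int(0.7 * n) : int(0.8 * n)]
    test = reports[int(0.8 * n) :]
    return train, valid, test

train, valid, test = chronological_split([{"id": i} for i in range(100)])
print(len(train), len(valid), len(test))   # 70 10 20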

3.3.1. Population Coding

To represent feature combinations and instance combinations clearly and intuitively, we adopt binary coding: our method uses a vector with $N$ feature dimensions to replace the feature sequence in the population, and it transforms a selected feature combination into a 0-1 binary string by the following equation:

$F = (f_1, f_2, \ldots, f_N), \quad f_i \in \{0, 1\}$,

where $F$ is the vector corresponding to a feature sequence, $f_i = 0$ indicates that feature $i$ is not selected, $f_i = 1$ indicates that feature $i$ is selected, and $N$ is the total number of features. Thus, a binary string represents a feature selection method.

Similarly, the sequence of instances in the dataset can be represented as a vector of length $M$, and a selected instance combination is also a 0-1 binary string, described by the following equation:

$S = (s_1, s_2, \ldots, s_M), \quad s_i \in \{0, 1\}$,

where $S$ is the vector corresponding to an instance sequence, $s_i = 0$ indicates that instance $i$ is not selected, $s_i = 1$ indicates that instance $i$ is selected, and $M$ is the total number of instances. A binary string represents an instance selection method.
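In code, both schemes are simply 0/1 masks. A small sketch of the encoding and of applying a combined scheme to a term-document matrix (the dimensions and data are illustrative):

import numpy as np

rng = np.random.default_rng(0)

M, N = 8, 6                           # M instances (bug reports), N features (words)
X = rng.integers(0, 5, size=(M, N))   # toy term-document matrix
y = rng.integers(0, 3, size=M)        # toy developer labels

fs_mask = rng.integers(0, 2, size=N, dtype=np.int8)   # feature selection scheme
is_mask = rng.integers(0, 2, size=M, dtype=np.int8)   # instance selection scheme

def apply_scheme(X, y, fs_mask, is_mask):
    # Keep only selected instances (rows) and selected features (columns).
    rows, cols = is_mask.astype(bool), fs_mask.astype(bool)
    return X[rows][:, cols], y[rows]

X_red, y_red = apply_scheme(X, y, fs_mask, is_mask)
print(X_red.shape)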

3.3.2. Population Initialization

The population initialization in DE_FS (initial feature selection scheme): we sort bug reports by their IDs, extract all features to obtain a feature set, and generate 10 initial selection schemes that keep the top 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, and 100% of the features, respectively.

The population initialization in DE_IS (initial instance selection scheme): we generate a random number in [0, 1] for each position in the binary string of each instance selection scheme. If the number is less than 0.5, the position is set to 0; otherwise, it is set to 1. In this way, our method obtains ten initial instance selection schemes.

The initial extraction scheme in DE_(FS + IS) (combined scheme of feature selection and instance selection): first, our method generates 10 initial schemes according to the initial population generation methods in DE_FS and DE_IS; second, the generated feature and instance binary strings are concatenated pairwise. Our method thus obtains 10 extraction schemes in the initial population.

For each individual in the initial population, after calculating its fitness value, our method records the maximum fitness and the corresponding binary code.
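A sketch of the three initializations under the encoding above (feature_importance stands in for the information-gain scores computed in Algorithm 2):

import numpy as np

rng = np.random.default_rng(0)

def init_population_fs(feature_importance):
    # Ten schemes keeping the top 10%, 20%, ..., 100% of features,
    # ranked by importance (e.g., information gain).
    order = np.argsort(feature_importance)[::-1]
    n = len(order)
    population = []
    for k in range(1, 11):
        mask = np.zeros(n, dtype=np.int8)
        mask[order[: max(1, n * k // 10)]] = 1
        population.append(mask)
    return population

def init_population_is(n_instances, size=10):
    # Each gene is set to 1 when a uniform draw in [0, 1) is >= 0.5.
    return [(rng.random(n_instances) >= 0.5).astype(np.int8)
            for _ in range(size)]

def init_population_fs_is(feature_importance, n_instances):
    # Concatenate the k-th feature scheme with the k-th instance scheme.
    return [np.concatenate([f, s]) for f, s in
            zip(init_population_fs(feature_importance),
                init_population_is(n_instances))]

pop = init_population_fs_is(rng.random(6), 8)
print(len(pop), len(pop[0]))   # 10 schemes, each of length 6 + 8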

3.3.3. Genetic Manipulation

(1) Variation (Differential Variation). First, our method generates a variation rate $r$ randomly in [0, 1] and defines the differential variation rate as $P_d$. $P_d$ is always dynamic in this method: according to the number of iterations, the method decides whether it is inclined to random variation or to differential variation, the latter taking place between an individual and the currently optimal individual. $P_d$ is given by equation (9) as a function of the differential coefficient of variation $\alpha$, the current iteration number index, and the total number of iterations $T$. If $r < P_d$, we use this extraction scheme as a parent to carry out differential variation; otherwise, we do not perform this variation. In differential variation, $n$ positions of variant genes ($n$ is the number of mutated genes) are randomly selected in the scheme, and the differential variation rule is defined by the following equation:

$y_j = \begin{cases} 1, & \text{if } x_j \neq b_j, \\ 1, & \text{if } x_j = b_j \text{ and } r_j \geq 0.5, \\ 0, & \text{otherwise}, \end{cases}$

where $x_j$ is the current variant gene position, $b_j$ is the corresponding gene position of the optimal individual, $y_j$ is the corresponding gene position of the new individual, and $r_j$ is a random number in [0, 1]. The other gene positions of the new individual are identical to those of the currently selected parent.

(2) Crossover. Similarly, a crossover probability $r$ is randomly generated at the start. The crossover variation rate is defined as $P_c$ (also dynamically changed) and is given by equation (10) as a function of the cross coefficient of variation $\beta$, the current iteration number index, and the total number of iterations $T$. If $r < P_c$, cross-variation is performed to create a child generation. When cross-variation works, two positive integers $p$ and $q$ are randomly generated ($1 \leq p < q \leq N$); then the code between $p$ and $q$ is cross-operated about the midpoint $m = \lfloor (p + q)/2 \rfloor$. The rest of the new individual is the same as the parent.

(3) Variation (Random Variation). For each extraction scheme in the population, our method generates a variation rate $r$ and defines the random variation rate as $P_r$ (also dynamically changed), given by equation (11) as a function of the random variation coefficient $\gamma$, the current iteration number index, and the total number of iterations $T$. When $r$ is less than $P_r$, the extraction scheme mutates; otherwise, it does not. During random variation, $n$ positions of variant genes are randomly selected in the binary string, and each selected gene is flipped according to the following rule:

$y_j = 1 - x_j$,

where $x_j$ is the current variant gene position and $y_j$ is the corresponding gene position of the new individual: if its value is 0, it is changed to 1, and vice versa.

(4) Selection. We calculate the fitness values of all individuals in the population. Our method retains not only the new individuals generated by the genetic operations but also the parent individuals; subsequently, a new population of twice the size is aggregated. Next, all individuals are sorted by fitness in descending order: the top half is selected, and the bottom half is eliminated. Meanwhile, the binary code with the optimal fitness is updated.
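Under the rules above, the four operators can be sketched as follows in Python; the midpoint crossover is our reading of "cross-operated about the midpoint" and should be treated as an assumption:

import numpy as np

rng = np.random.default_rng(0)

def differential_variation(parent, best, n_genes):
    # Where parent and best differ, the child gene becomes 1; where they
    # agree, draw r in [0, 1) and set the gene to 1 iff r >= 0.5.
    child = parent.copy()
    for j in rng.choice(len(parent), size=n_genes, replace=False):
        child[j] = 1 if parent[j] != best[j] else int(rng.random() >= 0.5)
    return child

def midpoint_crossover(parent):
    # Two cut points p < q; the segment [p, q) is mirrored about its
    # midpoint (one possible reading of the paper's rule).
    child = parent.copy()
    p, q = sorted(rng.choice(len(parent), size=2, replace=False))
    seg = child[p:q].copy()
    child[p:q] = seg[::-1]
    return child

def random_variation(parent, n_genes):
    # Flip n_genes randomly chosen bits.
    child = parent.copy()
    pos = rng.choice(len(parent), size=n_genes, replace=False)
    child[pos] = 1 - child[pos]
    return child

def select(merged, fitnesses, size):
    # Keep the fittest half of parents + children, best first.
    order = np.argsort(fitnesses)[::-1]
    return [merged[i] for i in order[:size]]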

3.3.4. Data Reduction Algorithm

According to our analysis, both FS and IS can extract useful attributes: FS reduces the data at the attribute level, and IS reduces them at the instance level. The effect of FS + IS might surpass that of FS or IS alone. Thus, we adopt three feature extraction methods: FS, IS, and FS + IS.

The parameters used in this section are defined in Table 1.

Algorithm 1 shows how we carry out feature selection (DE_FS) or instance selection (DE_IS) on the dataset. In the first line, we empty the initial extraction scheme and the best scheme. In lines 2–6, we judge whether feature selection or instance selection is required, calling Algorithm 2 for the former and Algorithm 3 for the latter. In lines 7-8, we calculate the fitness value of each extraction scheme in the population through the fitness function (the accuracy of multinomial Naïve Bayes (NBM)) and record the extraction scheme with the largest fitness value. For each of the three variations, a parent can choose only one method to mutate and generate a child. Lines 13–26 are the choice of differential variation: if a gene position differs from the corresponding position of the optimal individual, the same position of the new individual is set to 1; otherwise, a random number in [0, 1] is generated, and the position is set to 1 if this number is greater than or equal to 0.5 and to 0 otherwise; the other gene positions of the new individual are identical to the currently selected parent. Lines 28–36 present the choice of cross-variation, and lines 38–44 the choice of random variation. In lines 49–54, we unite the parents and the generated children, sort them by fitness in descending order, select the best Top 10 to enter the next generation, and update the optimal fitness and the corresponding binary code. In line 57, after executing T iterations, we decode the saved optimal extraction scheme into the corresponding dataset. In line 58, we return the reduced dataset.

Algorithm 1: Data reduction by DE-based feature selection (DE_FS) or instance selection (DE_IS). (The listing follows the steps described above: initialization via Algorithm 2 or Algorithm 3, fitness evaluation with NBM, T iterations of differential variation, cross-variation, or random variation, elitist selection of the Top 10, and decoding of the best scheme.)
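Because the symbols of the original listing did not survive, here is a hedged Python sketch of the search loop Algorithm 1 describes. It reuses the operator and initialization sketches above; the fitness function is the validation-set accuracy of multinomial Naïve Bayes, and the constant rates stand in for the dynamic Pd, Pc, and Pr of equations (9)–(11):

from sklearn.naive_bayes import MultinomialNB
import numpy as np

rng = np.random.default_rng(0)

def nbm_fitness(fs_mask, X_tr, y_tr, X_va, y_va):
    # Accuracy of NBM on the validation set after feature selection.
    cols = fs_mask.astype(bool)
    if cols.sum() == 0:
        return 0.0
    clf = MultinomialNB().fit(X_tr[:, cols], y_tr)
    return clf.score(X_va[:, cols], y_va)

def de_search(population, fitness, T=50, n_genes=3):
    best = max(population, key=fitness)
    for t in range(T):
        # Placeholder rates; in the paper Pd, Pc, and Pr change with the
        # iteration index via equations (9)-(11).
        pd, pc = 0.5, 0.75
        children = []
        for ind in population:
            r = rng.random()
            if r < pd:                      # differential variation
                children.append(differential_variation(ind, best, n_genes))
            elif r < pc:                    # cross-variation
                children.append(midpoint_crossover(ind))
            else:                           # random variation
                children.append(random_variation(ind, n_genes))
        merged = population + children
        fits = [fitness(m) for m in merged]
        population = select(merged, fits, size=len(population))
        best = population[0]   # select() returns the fittest first
    return best

# usage sketch:
# best = de_search(init_population_fs(ig_scores),
#                  lambda m: nbm_fitness(m, X_tr, y_tr, X_va, y_va))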
Input: Train (training set), N (number of features)
Output: Initialize_Population_FS
(1) Initialize_Population_FS ← ∅
(2) metrics ← IG(Train)
(3) for k = 1 to 10 do
(4)  individual ← a zero binary string of length N
(5)  for each feature among the top k × 10% ranked by metrics do
(6)   set the corresponding position of individual to 1
(7)  end for
(8)  add individual to Initialize_Population_FS
(9) end for
(10) return Initialize_Population_FS

Algorithm 2: Population initialization for feature selection (Initialize_Population_FS).
Input: M (number of instances)
Output: Initialize_Population_IS
(1) Initialize_Population_IS ← ∅
(2) for i = 1 to 10 do
(3)  Treat individual i as ins
(4)  for ins gene location j from 1 to M do
(5)   r ← Random(0, 1)
(6)   if r ≥ 0.5 then
(7)    Change the j locus of the ith individual to 1
(8)   else
(9)    Change the j locus of the ith individual to 0
(10)   end if
(11)  end for
(12) end for
(13) Add the generated individuals to Initialize_Population_IS
(14) return Initialize_Population_IS

Algorithm 3: Population initialization for instance selection (Initialize_Population_IS).

Algorithm 2 represents the population initialization process when we select features using the DE method. In the first line, we empty the initial extraction scheme. In line 2, IG (information gain) is used to score the features and is recorded as metrics. In lines 3–9, we sort the features of the training set from high to low according to their importance and select the top 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, and 100% to form ten schemes, which are added to Initialize_Population_FS. We set the extracted feature columns to 1 and the remaining feature columns to 0.

Algorithm 3 represents the population initialization process when we select instances using the DE method. In the first line, we empty the initial extraction scheme. In lines 2–11, each instance selection scheme is represented as a binary string. Then, we generate a random decimal between 0 and 1 for each gene position of each extraction scheme. If the number is greater than or equal to 0.5, the corresponding gene position of the individual is 1. Otherwise, it is set to 0. We add generated instance selection schemes to Initialize_Population_IS. In line 14, we return all generated extraction schemes.

DE_(FS + IS) indicates that we select features and instances simultaneously for the dataset using the DE method. In line 1, we empty the initial extraction scheme and the best scheme (best_individual). In lines 2–6, we combine the initialized feature selection scheme with the initialized instance selection scheme. In lines 7-8, we calculate the fitness value of each extraction scheme to save the best extraction scheme and its fitness value. Differential variation is described in lines 11–26, cross-variation in lines 28–37, and random variation in lines 39–47. In lines 51–56, parents and children are mixed and sorted by fitness value to select the best Top 10, which enter the next generation; the optimal fitness and binary code are updated. In line 60, after T iterations, we obtain Reduced_Train by decoding the best extraction scheme. In line 61, we return the reduced Reduced_Train.

3.4. Engagement Detection

After analyzing the actual situation, we find that some developers may change jobs or leave the company over time. Meanwhile, bug reports that describe the same problem for different products may be very similar. Thus, we reorder developers depending on their level of engagement and the product information. First, we collect the product information from the recent bug reports of each developer in the training set (the original training set plus the validation set); developer engagement is analyzed using these recent bug reports. Subsequently, we use a linked list to encapsulate the data so as to store more product information. Next, the NBM classifier is used to select the Top 30 developers. We record the product information of the current bug report and check whether each developer has dealt with the same product information. If not, we discard this developer, and the subsequent developers fill the vacated positions one by one. Finally, we select the optimal Top 10 in this way (Algorithm 4).

Pseudocode listing for DE_(FS + IS), the simultaneous feature and instance selection procedure described in Section 3.3.4 (initialization by combining Algorithms 2 and 3, fitness evaluation, T iterations of differential variation, cross-variation, or random variation, and elitist selection of the Top 10).
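A minimal sketch of the engagement-based reordering, assuming a mapping from each developer to the products seen in his/her recent reports (all names are ours):

def reorder_by_engagement(ranked_devs, bug_product, dev_recent_products, k=10):
    # ranked_devs: Top-30 developers from the NBM classifier, best first.
    # dev_recent_products: developer -> products in recent training +
    # validation reports. Developers who never handled this product are
    # dropped; later developers fill the vacated positions.
    kept = [d for d in ranked_devs
            if bug_product in dev_recent_products.get(d, set())]
    if len(kept) < k:
        # Backfill with discarded developers (our assumption; the paper
        # does not specify the behavior when fewer than k remain).
        kept += [d for d in ranked_devs if d not in kept]
    return kept[:k]

top10 = reorder_by_engagement(
    ["alice", "bob", "carol"], "gcc",
    {"alice": {"gcc"}, "carol": {"gcc", "openoffice"}}, k=2)
print(top10)   # ['alice', 'carol']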

4. Experiment

We conduct several experiments to verify our method. In Section 4.1, we describe the experimental design, including data preparation and experimental setup. In Section 4.2, we specify the evaluation metrics. In Section 4.3, we discuss experimental results, which answer our research questions.

4.1. Experimental Design
4.1.1. Data Preparation

To demonstrate the effectiveness of our approach, we carry out a series of experiments on four large-scale open-source bug repositories: GCC, OpenOffice, NetBeans, and Mozilla. We collected only the fixed bug reports that were denoted as "resolved" or "closed" before December 31, 2014, because of their stability and reliability. Our experimental datasets are available at https://github.com/gexinxinge/complexnetwork. Table 2 shows the dataset statistics, including the total numbers of products, reports, words, and developers, and the time range.

We use the processing method from previous research [23, 26]. The developer assigned to a bug report is regarded as the developer who fixed the bug. To obtain a list of recommended developers, we exclude bugs that were never fixed [2]. Moreover, data on developers who worked on a small number of bug reports would not help obtain accurate results; therefore, we exclude developers who worked on fewer than 10 bug reports.
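A sketch of this filtering under hypothetical record fields (status, developer):

from collections import Counter

def filter_reports(reports, min_fixed=10):
    # Keep only fixed ("resolved"/"closed") reports, then drop developers
    # with fewer than min_fixed reports.
    fixed = [r for r in reports if r["status"] in ("resolved", "closed")]
    counts = Counter(r["developer"] for r in fixed)
    return [r for r in fixed if counts[r["developer"]] >= min_fixed]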

4.1.2. Experimental Setup

In this paper, we segment the data chronologically: the bug reports are divided into three parts in chronological order, with seventy percent used as the training set, ten percent as the validation set, and the remaining twenty percent as the test set. We adjust the parameters accordingly (the experimental parameters of the three differential methods are consistent). It is worth pointing out that our approach is sensitive to these parameters: in equation (9), $\alpha$ influences the differential variation rate $P_d$; in equation (10), $\beta$ determines the crossover variation rate $P_c$; and in equation (11), $\gamma$ affects the random variation rate $P_r$. For example, if the coefficient values are too large, it is hard to remove unimportant features, whereas values that are too small may trap the search in a local optimum. Therefore, we adopt reasonable values reported by others' experience with differential evolution algorithms. In this way, our method not only has sufficient search scope but also converges in the end.

We use two types of benchmarks for comparison: supervised methods and unsupervised methods. The supervised benchmark methods are classifiers that are effective in text classification: Naïve Bayes (NB), multinomial Naïve Bayes (NBM), support vector machine (SVM), k-nearest neighbor (KNN), random tree (RT), and decision tree (J48).

4.2. Evaluation Metrics

In this paper, we use the accuracy of Top-k developers to evaluate the quality of bug triage, that is, the ratio of correctly predicted reports to all test data. The accuracies of Top-1 to Top-10 are examined in the experiments. The accuracy of Top-k is calculated by the following equation:

$\text{Accuracy@}k = \dfrac{N_{\text{correct}}}{N_{\text{total}}}$,

where Accuracy@k is the accuracy of Top-k, $N_{\text{correct}}$ refers to the number of bug reports whose fixer appears in the Top-k recommendation list, and $N_{\text{total}}$ is the total number of bug reports. A higher Accuracy@k indicates a better list of recommended developers.
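Equivalently, in code (recommendations holds each test bug's ranked developer list, and fixers holds the actual fixers; the names are ours):

def top_k_accuracy(recommendations, fixers, k):
    # Fraction of test bugs whose actual fixer appears in the Top-k list.
    hits = sum(1 for recs, dev in zip(recommendations, fixers)
               if dev in recs[:k])
    return hits / len(fixers)

print(top_k_accuracy([["a", "b"], ["c", "a"]], ["b", "a"], k=2))   # 1.0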

4.3. Experimental Results
4.3.1. RQ1: Which Supervised Learning Algorithms Are the Most Suitable?

In this experiment, six supervised learning algorithms (NB, NBM, SVM, KNN, RT, and J48) are applied to classify four open-source bug repositories (Mozilla, NetBeans, OpenOffice, and GCC). We use the classification accuracy as an evaluation metric to choose the best classification algorithm.

The results are shown in Tables 3–6. According to the results, the classification performance of NBM (presented in bold) is the best among the six methods. The highest classification accuracies of NBM on the four bug repositories are 28.89%, 57.59%, 48.39%, and 68.84%, respectively. NB ranks second; its highest classification accuracies are 23.69%, 54%, 37.94%, and 50.27%, respectively. Both methods achieve much higher classification accuracy than the other classifiers.

For OpenOffice, the accuracies of NBM from Top-1 to Top-10 are much higher than those of the other algorithms (NB, SVM, KNN, RT, and J48). Additionally, the Top-10 accuracy of NBM is 37.33% higher than that of SVM on OpenOffice. Similarly, Table 6 shows that NBM performs the best among the six approaches on GCC: the Top-1 accuracy of NBM is 22.76% higher than that of RT, and the Top-10 accuracy is 55.8% higher than that of RT.

Therefore, we conclude that NBM is the most effective algorithm among all methods. Thus, we use NBM as a reliable classification algorithm in subsequent experiments.

4.3.2. RQ2: Can Our Proposed FS and IS Methods Perform Better Than Traditional FS and IS Methods in terms of Data Reduction?

In this part, we conduct four experiments, one on each open-source bug repository. For each bug repository, we consider the data reduction of the original bug repository to a specific data size, using feature selection or instance selection methods as benchmarks. We choose three traditional feature selection methods (IG, CHI, and SU) and three traditional instance selection methods (CNN, ENN, and ICF) as benchmarks and calculate their Top-k accuracies. The accuracies of our proposed FS and IS heuristic search methods at the same data reduction size are also obtained. The columns with the best results are shown in bold in Tables 7–10.

Table 7 shows that our proposed FS method achieves a Top-10 accuracy of 26.01% and the IS method achieves 30.44%. Compared with the original data reduction and the traditional FS and IS methods, our proposed methods perform better on Mozilla; meanwhile, the accuracy of IS is 4.43% higher than that of FS. The result for OpenOffice (Table 9) is quite similar to that for Mozilla: our proposed FS method has the highest accuracy of 46.26% among feature selection methods, and our proposed IS method reaches 53.76%, the best among IS approaches, with IS 7.5% higher than FS. Table 10 shows similar experimental results for GCC: the highest accuracy among FS methods is 84.51% for our proposed FS method, and the highest among IS approaches is 83.69% for our proposed IS approach. The only difference from the former results is that FS is 0.82% higher than IS, which indicates that FS is more suitable for data reduction on GCC. However, the NetBeans results differ (Table 8): our FS method is still the best among FS methods with 54.14% accuracy, but a traditional IS approach, ENN, has the highest accuracy of 59.88% among IS methods. Our IS method appears to overfit: the accuracy is rather high on the validation set (61%) but drops on the test set. A possible reason is that developer turnover in NetBeans may have been frequent recently; moreover, NetBeans is not greatly affected by text information. In RQ5, we verify this explanation further.

We consider that, when reducing to the same data size, the heuristic search method is generally better than the traditional IS and FS search methods. The main reasons are as follows:
(1) The traditional FS and IS methods are based on the overall training set. However, the effect of a bug report is time-sensitive; outdated bug reports add redundancy and noise, which may harm classification accuracy. Traditional IS and FS methods cannot effectively eliminate noisy and redundant data.
(2) With the 7 : 1 : 2 search strategy, our proposed approach focuses more on extracting features related to recent bug reports, which helps remove noisy and redundant data.

4.3.3. RQ3: Are the Accuracies of Our Proposed FS + IS, Best FS, and Best IS Methods Improved Compared with the FS + IS Results of Xuan? If So, Which of Our Three Methods Is the Most Effective?

Based on the experimental results in RQ2, we reproduce the FS -> IS (IG + ICF) and IS -> FS (ICF + IG) experiments of Xuan and obtain the Top-k accuracies on the four bug repositories (Mozilla, NetBeans, OpenOffice, and GCC). In addition, we combine our proposed FS method with the IS method (FS + IS) to reduce the data of the four bug repositories. The comparative results are shown in Tables 11–14. Compared with FS -> IS and IS -> FS, our proposed heuristic search method is better when reducing to the same data size.

Our FS + IS method performs best for Mozilla in Table 11 and NetBeans in Table 12; the accuracies are 29.44% and 56.07%, respectively. The Top-10 accuracy of FS + IS is 6.78% higher than that of IS -> FS on Mozilla and 16.42% higher on NetBeans. We think one reason is that the heuristic search FS + IS focuses on extracting features related to recent bug reports; thus, the amount of information loss is relatively small, and the ability to remove noise and redundancy is stronger. For OpenOffice in Table 13, the best accuracy of 53.76% belongs to our IS method, which is 7.73% higher than IS -> FS and 45.57% higher than FS -> IS. In contrast, our FS approach, with 84.51% accuracy, is the most effective on GCC in Table 14; the Top-10 accuracy of FS is 9.23% higher than that of the IS -> FS method on GCC.

We conclude that our proposed approaches are overall more effective than those of Xuan. Across the four bug repositories, FS + IS performs well on Mozilla and NetBeans, whereas OpenOffice achieves its best result with IS and GCC with FS.

4.3.4. RQ4: What Is the Data Reduction Effect of Our Proposed FS, IS, and FS + IS Methods?

In this experiment, we use our proposed FS, IS, and FS + IS methods based on the DE method to obtain data reduction rates and study their effects on the four open bug repositories (Mozilla, NetBeans, OpenOffice, and GCC). The results are detailed in Tables 15–18 (the data reduction degree of FS or IS and the overall reduction degree refer to the proportion of the original features or instances that are removed). Here, "original information" represents the dataset divided into 70% training, 10% validation, and 20% test subsets. Our approach finds the primary features and instances by running the differential evolution algorithm on the training set; the training and validation sets are then merged to obtain the new training set "original information (7 + 1)."

According to our results, the reduction degrees of FS + IS are 96.05% on Mozilla, 94.43% on GCC, and 97.08% on OpenOffice, which substantially reduces the data dimensions. However, FS + IS does not work best on every bug repository. For Mozilla, FS has the highest reduction degree of 97.21%, which is 1.16% higher than that of FS + IS. However, the reduction degrees of FS are smaller than those of FS + IS on OpenOffice by 2.16% and on GCC by 16.19%. The IS method has a low degree of reduction on Mozilla, OpenOffice, and GCC; nevertheless, it reaches 50% on OpenOffice. Moreover, our three methods do not perform well on NetBeans. Comparing the three methods by degree of reduction, the ordering is FS + IS > FS > IS.

The degree of data reduction achieved by FS, IS, and FS + IS based on the DE method is quite large. Compared with traditional methods that remove invalid bug reports by name, the heuristic search based on the DE method, which automatically deletes invalid bug reports, is more effective at removing noise and redundancy. However, according to the previous experiment, FS + IS cannot further improve the accuracy of the NBM classifier compared with the separate FS and IS methods. We consider that the reason may be excessive reduction, which can result in serious information loss. Overall, we conclude that the data reduction of FS, IS, and FS + IS based on the DE method is very effective.

4.3.5. RQ5: How Can the Developer’s Activities Affect the Optimal List of Recommended Developers?

In this part, we add the developer engagement level to our experiment and calculate the accuracies on the four bug repositories (Mozilla, NetBeans, OpenOffice, and GCC) from Top-1 to Top-10. The results are shown in Tables 19–22 ("Without developer engagement" denotes not considering developer engagement, and "With developer engagement" denotes considering it).

According to the results, NBM's accuracy improves under all feature extraction schemes except on Mozilla. The highest classification accuracies on the four bug repositories change compared with the accuracies obtained without considering developer engagement; the improvement of FS + IS reaches 2.63% on Mozilla. For NetBeans, the overall accuracies of FS, IS, and FS + IS improve substantially: the Top-10 accuracies of FS, IS, and FS + IS rise by 5.64%, 2.47%, and 2.91%, respectively, which verifies our conjecture in RQ2. The results indicate that the accuracy on NetBeans improves substantially when developer engagement is considered, which suggests that developer turnover is frequent in NetBeans. Table 21 shows that the accuracies of the three proposed methods improve on OpenOffice, with FS showing a notable improvement of 9.33%. Similarly, with developer engagement considered, the accuracy on GCC rises slightly; FS + IS has the biggest improvement of 1.05% among the three proposed approaches.

Meanwhile, the study of developer engagement effectively alleviates the overfitting problem of feature extraction on NetBeans. Moreover, the accuracy of NBM is significantly higher than the original accuracy. Because the accuracy of the FS + IS extraction scheme is markedly improved, we conclude that introducing developer engagement successfully compensates for the information loss caused by FS + IS.

We find that the accuracy on the test set using the NBM classifier generally increases first, then decreases, and finally flattens as N grows. After careful analysis, we find that two kinds of information compete as N grows: one is the effective information related to the test set, and the other is noise. This competition causes the accuracy to increase first and then decrease with the growth of N. For the data reduction of Mozilla_total's DE_FS, the accuracy stays at a low level as N changes, because Mozilla_total's staff changes frequently over time, which causes noisy and redundant data to grow and fragment. However, we also find that the dataset generated by Mozilla_total's DE_IS scheme performs well after adding developer engagement, which indicates that the denoising ability of IS on Mozilla_total is better than that of the FS approach.

We conclude that the introduction of developer engagement can effectively improve the classification accuracy of NBM. Moreover, it significantly alleviates the overfitting phenomenon, which occurs when a model learns the details and noise in the training data to the extent that its performance on unseen data suffers: noise or random fluctuations in the training data are picked up and learned as concepts by the model, and these concepts do not apply to new data, harming NBM's ability to generalize. In our improved NBM classification, removing redundant features from the bug triage dataset helps prevent overfitting. In addition, developer engagement effectively compensates for the information loss caused by the FS + IS method and substantially increases the accuracy. Meanwhile, compared with the overall dataset, the optimal reference range is smaller and easier to implement when developer engagement is considered.

In Table 23, we analyze the peak of engagement, which explains the frequency of developer turnover in different bug repositories. A greater distance between the peak and the average means more frequent recent personnel movements. We can see that the developers of Mozilla and OpenOffice move more frequently than those of GCC and NetBeans.

5. Conclusion and Future Work

In this paper, we propose a new bug triage method for recommending suitable developers to fix newly reported bugs. To solve the problems of the small search range and the neglect of chronological order in traditional bug triage methods, we improve the existing heuristic search method and expand the search scope based on the chronological order of bug reports. We find that developer engagement has an impact on bug triage; therefore, in addition to the text information provided in the bug report, we consider the developer's product information to recommend the best developer for a new bug report. We use FS, IS, and FS + IS to verify our approach on four bug repositories: GCC, OpenOffice, Mozilla, and NetBeans. The results show that the method proposed in this paper is more effective than previous methods. In future work, we plan to verify our approach on more bug repositories and apply our method to additional software projects.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (nos. 61672122, 61602077, 61771087, 51879027, 51579024, 71831002, and 61902050), Program for Innovative Research Team in University of Ministry of Education of China (no. IRT 17R13), the Fundamental Research Funds for the Central Universities (nos. 3132019501 and 3132019502), and the Next-Generation Internet Innovation Project of CERNET (nos. NGII20181203, NGII20181205, and NGII201810128).