#### Abstract

Homogeneous cross-project defect prediction (HCPDP) aims to apply a binary classification model built on source projects to a target project with the same metrics. However, there is still room for improvement in the performance of existing HCPDP models. This study proposes a novel approach covering both one-to-one and many-to-one predictions. First, we apply the Jensen-Shannon divergence to select the most similar source project automatically. Second, relative density estimation is introduced to choose suitable instances of the selected source project. Third, one-to-one and many-to-one prediction models are trained on the selected instances. Finally, two benchmark datasets are used to evaluate the proposed approach. Compared with state-of-the-art methods, the experimental results demonstrate that the proposed approach improves the prediction performance in terms of the F1-score, AUC, and G-mean and exhibits strong adaptability across traditional classifiers.

#### 1. Introduction

Software defects are inevitable in software development; they affect user experience and bring economic losses to enterprises. Therefore, it is necessary to help software engineers accurately predict defects in projects. Recent years have seen an increasing trend in software defect prediction (Hall et al. [1]; Rahman et al. [2]; Yang et al. [3]; Ghotra et al. [4]; Wen et al. [5]; Tang et al. [6]; Jiarpakdee et al. [7]). Previous research has demonstrated that machine-learning approaches are suitable for software defect prediction (Menzies et al. [8]; Zimmermann and Nagappan [9]; Hassan [10]). A project manager can use a defect prediction model to determine whether a module has defects. However, it is difficult to build an effective prediction model if not enough data are available. An alternative solution is cross-project defect prediction (CPDP) (He et al. [11]; Ma et al. [12]; Canfora et al. [13]; Herbold et al. [14]; Hosseini et al. [15]), which constructs classifiers on existing projects with sufficient labeled data and predicts the defects of the target project. When the metrics (features) of the source and target projects are the same, CPDP is called homogeneous cross-project defect prediction (HCPDP).

Watanabe et al. [16] introduced the metric compensation method, which revises the metrics of the target project based on the metrics of the source project. Turhan et al. [17] calculated the Euclidean distance of each instance and selected the nearest instances for defect prediction, but this was only one-phase filtering. Herbold [18] calculated the feature vector of each metric and applied two strategies to select suitable source projects. Fukushima et al. [19] proposed an approach that considers the similarity between the source and target projects by calculating Spearman correlation values. Panichella et al. [20] applied ensemble learning to improve CPDP performance. Subsequently, two-phase filtering methods (He et al. [21]; He et al. [22]) were proposed to improve the work of Turhan et al. These works selected the source project and applied the filtering method to construct the prediction models. With the development of transfer learning, Nam et al. [23] applied transfer component analysis to search for a feature mapping space shared by the source and target projects and constructed the prediction models. Liu et al. [24] proposed a two-phase transfer learning method, which used a source project estimator to select similar projects by measuring the Euclidean distance.

Most prior studies focused on one-to-one defect prediction, and their findings proved that selecting a suitable source project is the primary solution for HCPDP. However, there is still room for improvement in the selection process, and many-to-one defect prediction is necessary in practical settings. Therefore, this paper extends the Jensen-Shannon (JS) divergence (Zheng et al. [25]) to many-to-one defect prediction and applies a relative density strategy to select the training instances. Unlike the feature selection used in prior studies, the JS divergence keeps all the features and directly calculates the similarity of the probability distributions of two projects. Moreover, prior work (Turhan et al. [17]; He et al. [22]; Nam et al. [23]) used distance or probability density to select training instances, but this is difficult when the metric dimension is high. This paper presents an alternative solution: although it is difficult to obtain the exact probability density of each instance, it is feasible to extract the proportional relation between the probability densities of any two instances. Therefore, the relative density is proposed to reflect these proportional relations when selecting instances. Finally, a novel weighting strategy for many-to-one defect prediction is designed. The experimental results show that the proposed approach improves HCPDP performance and adapts to different machine-learning algorithms.

The main contributions of this study are summarized as follows:
(1) An approach for HCPDP based on the JS divergence with relative density is designed.
(2) An ensemble weighting strategy based on the JS divergence is proposed to build a many-to-one prediction model.
(3) The proposed approach is not tied to a particular learning algorithm.

To evaluate the proposed approach, two benchmark datasets from NASA (Shepperd et al. [26]) and PROMISE (Jureczko and Madeyski [27]) were selected. Compared with previous studies, one-to-one prediction achieved average improvements in the F1-score of 13%∼119% on NASA and 5%∼33% on PROMISE, in the AUC of 8%∼31% on NASA and 2%∼30% on PROMISE, and in the G-mean of 10%∼68% on NASA and 3%∼56% on PROMISE; many-to-one prediction achieved average improvements in the F1-score of 10%∼193% on NASA and 15%∼63% on PROMISE, in the AUC of 1%∼25% on NASA and 3%∼14% on PROMISE, and in the G-mean of 2%∼75% on NASA and 10%∼34% on PROMISE. Moreover, five widely used methods, namely, Logistic Regression (LR), Naive Bayes (NB), Support Vector Machine (SVM), K-Nearest Neighbor (KNN), and Random Forest (RF), were selected to test the proposed approach.

The remainder of this paper is organized as follows: Section 2 presents the preliminaries of this study. The proposed approach is described in Section 3. Section 4 gives the experimental setup. The experimental results and analysis are shown in Section 5, and Section 6 introduces the threats to validity. Section 7 outlines the related work, and the conclusion and future work are provided in Section 8.

#### 2. Preliminary

This section introduces the preliminaries related to the proposed approach, including the Jensen–Shannon divergence and the relative density estimation strategy.

##### 2.1. Jensen-Shannon Divergence

Feature selection is often used in CPDP; filtering, wrapper, and embedded methods are the main approaches. Filtering is fast, but the selected features may not be useful for the model. Wrapper methods select features that improve the model, but the model must be trained many times to evaluate the features. Embedded methods combine the advantages of the other two, but it is difficult to define a criterion for judging whether a feature is valid. Therefore, this paper keeps all the features and uses the Jensen-Shannon (JS) divergence to measure the similarity of two probability distributions. The JS divergence measures the similarity of two distributions and is based on the Kullback–Leibler (KL) divergence (Lamberti and Majtey [28]). Unlike the KL divergence, the JS divergence is symmetric, which suits cross-project defect prediction: when A is a target project and B is a source project, JS(A||B) is calculated, and JS(B||A) need not be calculated again because the two values are the same. Hence, the JS divergence reduces the analysis cost of CPDP. Moreover, the range of the JS divergence (with the base-2 logarithm) is [0, 1], and its value is always finite, which discriminates the similarity of two projects more accurately.

Suppose there are two distributions, *P* and *Q*, where *P* is the true distribution and *Q* is the approximate distribution. The *KL* divergence is then defined as

$$D_{KL}(P\,\|\,Q)=\sum_{x}P(x)\log\frac{P(x)}{Q(x)}. \tag{1}$$

However, the *KL* divergence is asymmetric; in other words, $D_{KL}(P\,\|\,Q)\neq D_{KL}(Q\,\|\,P)$. The *JS* divergence is therefore introduced, represented by (2) as follows:

$$D_{JS}(P\,\|\,Q)=\frac{1}{2}D_{KL}(P\,\|\,M)+\frac{1}{2}D_{KL}(Q\,\|\,M),\qquad M=\frac{1}{2}(P+Q). \tag{2}$$

Monte Carlo sampling is then used to estimate the JS value by (3) as follows:

$$D_{JS}(P\,\|\,Q)\approx\frac{1}{2m}\sum_{i=1}^{m}\log\frac{P(x_i)}{M(x_i)}+\frac{1}{2m}\sum_{j=1}^{m}\log\frac{Q(y_j)}{M(y_j)},\qquad x_i\sim P,\; y_j\sim Q. \tag{3}$$

As described by Lamberti and Majtey [28], the smaller the JS divergence, the more similar the source project is to the target project. In summary, the JS divergence is used to avoid randomness in the selection of source projects.
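
The GMM-plus-Monte-Carlo estimate described above can be sketched as follows. This is a minimal illustration using scikit-learn's `GaussianMixture`; the component count, sample size, and function name are our own assumptions, not the authors' implementation:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def js_divergence(X_p, X_q, n_components=2, n_samples=10_000, seed=0):
    """Monte-Carlo estimate of the JS divergence between the metric
    distributions of two projects, each modelled by a Gaussian mixture."""
    gmm_p = GaussianMixture(n_components=n_components, random_state=seed).fit(X_p)
    gmm_q = GaussianMixture(n_components=n_components, random_state=seed).fit(X_q)

    xs, _ = gmm_p.sample(n_samples)   # samples from P, for KL(P || M)
    ys, _ = gmm_q.sample(n_samples)   # samples from Q, for KL(Q || M)

    def log_m(x):
        # log of the mixture M = (P + Q) / 2, computed stably in log space
        return np.logaddexp(gmm_p.score_samples(x),
                            gmm_q.score_samples(x)) - np.log(2)

    kl_pm = np.mean(gmm_p.score_samples(xs) - log_m(xs))
    kl_qm = np.mean(gmm_q.score_samples(ys) - log_m(ys))
    # divide by ln 2 to report the base-2 value, bounded by [0, 1]
    return 0.5 * (kl_pm + kl_qm) / np.log(2)
```

With the base-2 normalization, identical projects give a value near 0 and projects with disjoint metric distributions approach 1, matching the [0, 1] range stated above.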

##### 2.2. Relative Density Estimation Strategy

After a similar source project is selected, it is necessary to filter suitable instances as the training data. If all the instances are used, noise and outliers will be introduced. The probability density can be calculated to distinguish significant instances from outliers. However, it is difficult to obtain exact results when the metric dimension is high, and it is time-consuming as well.

This subsection presents an alternative to solving this problem. It is not easy to exactly obtain the probability density of each instance, but it is feasible to extract the proportional relation of the probability densities between any two instances. In this study, the relative density reflects this proportional relation. To calculate the relative density, a K-nearest-neighbors-based probability density estimation (KNN-PDE) (Fukunaga and Hostetler [29]; Mack and Rosenblatt [30]; Yu et al. [31]; Zheng et al. [32])-like strategy is adopted. KNN-PDE uses the distance to the *K*th nearest neighbor of each instance to measure its probability density. As the number of instances tends to infinity, the results obtained from KNN-PDE converge to the actual probability density distribution.

For each training instance $x_i$, its *K*th nearest neighbor is easy to find, and the distance between them is denoted $d_i^{K}$. If $d_i^{K}$ is larger, $x_i$ lies in a lower-density region. Noise and outliers tend to appear in regions of low density, so $d_i^{K}$ can be used to measure the significance of each instance. To assign a higher value to high-density instances and a lower value to noise and outliers, the reciprocal of $d_i^{K}$ is used; it is named the relative density of the instance:

$$\rho_i=\frac{1}{d_i^{K}}. \tag{4}$$

The relative density of an instance is thus inversely proportional to its *K*th nearest neighbor distance.

According to (4), the selection of *K* is essential. If it is too small, noise and outliers are challenging to identify, but if it is too large, some small disjuncts will become blurred. The appropriate value of *K* is discussed in Section 5. After the estimation process is completed, all the instances are sorted and are then ready for instance selection.
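
The KNN-PDE-like selection can be sketched as follows. This is an illustrative sketch, not the authors' code; the function name and the small constant guarding against zero distances are our own assumptions:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def select_by_relative_density(X, k, percent):
    """Keep the `percent` fraction of instances with the highest relative
    density, i.e. the reciprocal of the k-th nearest-neighbour distance."""
    # +1 neighbours because the query point itself is returned at distance 0
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    dists, _ = nn.kneighbors(X)
    kth = dists[:, -1]                     # distance to the k-th real neighbour
    density = 1.0 / (kth + 1e-12)          # relative density, eq. (4)
    n_keep = int(np.ceil(percent * len(X)))
    keep = np.argsort(density)[::-1][:n_keep]   # highest-density instances
    return X[keep], keep
```

Outliers sit far from their *k*th neighbour, receive a tiny relative density, and are sorted to the bottom, so they drop out of the retained fraction.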

#### 3. Proposed Approach

As shown in Figure 1, the proposed approach includes the training and prediction phases. First, the JS divergence values between source and target projects are calculated to select the most similar project. Then, the relative density information is estimated to select the high-density instances. Finally, the selected instances are used to train one-to-one classifiers or many-to-one classifiers. In the prediction phase, the trained classifiers are evaluated by the target project, and the prediction results are obtained. Since the proposed method does not divide the data, cross-validation is not required. The details are illustrated as follows.

##### 3.1. Training Phase

###### 3.1.1. JS Divergence for Project Selection

Figure 2 describes the project selection process based on the JS divergence. The Gaussian mixture models (GMM = {GMM_{S1}, GMM_{S2}, ···, GMM_{Sn}, GMM_{T}}) are generated first from the source and target projects. Then, the JS divergence values are obtained based on equations (2) and (3). The source project with the lowest JS divergence is the most similar to the target project.

###### 3.1.2. Relative Density Estimation for Instance Selection

Since the source project still contains noise and outliers, high-density instances need to be selected. As shown in Figure 3, the number of instances *N* is counted first. Second, the distance of each instance to its *K*th nearest neighbor is calculated to estimate the relative density. Finally, the instances are sorted and a given percentage is selected. The parameters *K* and percent are discussed in Section 5.

###### 3.1.3. One-to-One Classifier Training

Some HCPDP models are one-to-one predictions: they use one source project to construct the model and predict the target project. This process is straightforward; a learning algorithm can be trained directly on the selected instances.

###### 3.1.4. Many-to-One Classifiers Training

Besides one-to-one prediction, many-to-one prediction methods aim to enlarge the training data by putting all the source projects together. In this paper, a dynamic ensemble voting strategy based on the JS divergence is proposed to construct the classifiers. As shown in Figure 4, the subclassifiers are trained on the selected instances of the similar source projects. Then, the normalized reciprocal of the JS divergence is used as the dynamic voting weight of each subclassifier, that is, $w_i=\frac{1/JS_i}{\sum_{j}1/JS_j}$. With this strategy, the most similar source projects are guaranteed the largest weights, which strengthens their contribution.
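
The weighting and soft voting described above can be sketched as follows. This is an illustration under our own naming; `predict_proba` assumes scikit-learn-style sub-classifiers:

```python
import numpy as np

def js_voting_weights(js_values):
    """Dynamic voting weight of each sub-classifier: the share of the
    reciprocal of its JS divergence, w_i = (1/JS_i) / sum_j (1/JS_j)."""
    inv = 1.0 / np.asarray(js_values, dtype=float)
    return inv / inv.sum()

def ensemble_predict(classifiers, js_values, X_target, threshold=0.5):
    """Weighted soft vote: each sub-classifier's defect probability is
    scaled by its weight; summed scores above `threshold` are defective."""
    w = js_voting_weights(js_values)
    score = sum(wi * clf.predict_proba(X_target)[:, 1]
                for wi, clf in zip(w, classifiers))
    return (score > threshold).astype(int)
```

Because the weights are normalized to sum to 1, a sub-classifier trained on a very similar project (small JS value) dominates the vote, while dissimilar projects still contribute a small share.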

##### 3.2. Prediction Phase

For one-to-one prediction, we input the data of the target project into the trained classifier and obtain the predicted label. For many-to-one prediction, the target data is sent to each subclassifier, and each subclassifier's prediction is multiplied by its voting weight; the weighted results are then summed to obtain the final score. If the score is higher than a threshold of 0.5 (Nam et al. [23]; Zhou et al. [33]; Wan et al. [34]), the instance is predicted to be defective.

#### 4. Experiment Setup

##### 4.1. Environment and Datasets

The experimental environment is a computer equipped with an Intel Xeon E3-1231 and 16 GB RAM, running Windows 10 (64 bit). Two benchmark datasets, NASA (Shepperd et al. [26]) and PROMISE (Jureczko and Madeyski [27]; Zhou et al. [33]; Xu et al. [35]), are collected. Table 1 shows the total number of instances and defects, and the percentage of defective instances in these datasets.

##### 4.2. Evaluation Metrics

Three widely used metrics are applied in this study:

###### 4.2.1. F1-Score

Software defect prediction is recognized as a binary classification task. True Positive (*TP*) denotes the number of defective modules correctly predicted as defective; False Positive (*FP*) denotes the number of non-defective modules incorrectly predicted as defective; True Negative (*TN*) denotes the number of non-defective modules correctly predicted as non-defective; and False Negative (*FN*) denotes the number of defective modules incorrectly predicted as non-defective. Based on these four counts, precision assesses the correctness of a prediction model, $\text{precision}=\frac{TP}{TP+FP}$, and recall evaluates the proportion of defects correctly predicted, $\text{recall}=\frac{TP}{TP+FN}$. In general, there is a trade-off between the two metrics. For example, by sacrificing precision, the recall value (Nam et al. [23]) may be improved. These trade-offs make it difficult to compare the performance of prediction models using precision and recall alone (Kim et al. [36]; Xu et al. [35]; Wan et al. [34]). For this reason, this paper compares prediction results using F1-score values, that is, $F1=\frac{2\times\text{precision}\times\text{recall}}{\text{precision}+\text{recall}}$.

###### 4.2.2. AUC

AUC estimates the area under the receiver operating characteristic curve, which is obtained from a set of (false positive rate, recall) pairs.

###### 4.2.3. G-Mean

G-mean reflects the performance when the data is imbalanced, $G\text{-}mean=\sqrt{\text{recall}\times\frac{TN}{TN+FP}}$.
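
The F1-score and G-mean definitions above can be summarized in a short sketch (the function name is ours; AUC is omitted since it is usually computed by a library routine such as scikit-learn's `roc_auc_score`, and the sketch assumes at least one predicted and one actual defect so no denominator is zero):

```python
import numpy as np

def defect_metrics(y_true, y_pred):
    """F1-score and G-mean from the four confusion counts."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)          # true positive rate
    specificity = tn / (tn + fp)     # true negative rate
    f1 = 2 * precision * recall / (precision + recall)
    g_mean = np.sqrt(recall * specificity)
    return f1, g_mean
```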

#### 5. Results and Analysis

In this section, six one-to-one and seven many-to-one HCPDP methods are selected for comparison against the proposed approach. Since all the compared methods use logistic regression (LR) (Fan et al. [37]) as the underlying classifier, LR is used to construct our classifier as well. All the methods were implemented by following their papers.

##### 5.1. Six One-to-One Prediction Methods Are Described as Follows

(i) ManualDown (Zhou et al. [33]). A simple unsupervised method that performs better than most existing works; it has been recognized as a new baseline method for CPDP.
(ii) Logistic regression (LR) (Fan et al. [37]). Only an LR classifier is constructed to implement one-to-one defect prediction.
(iii) MT (Zhang et al. [46]). The authors proposed Multiple Transformations (MT) to improve prediction performance.
(iv) MT+ (Zhang et al. [46]). The authors improved MT to advance the prediction results.
(v) BDA (Xu et al. [35]). The authors proposed a balanced distribution adaptation (BDA) based transfer learning method to implement cross-project defect prediction.
(vi) CORAL (Sun et al. [38]). A domain adaptation method that selects the source project most similar to the target project by minimizing the distribution difference.

##### 5.2. Seven Many-to-One Prediction Approaches Are Described as Follows

(i) ManualDown (Zhou et al. [33]). A new baseline method based on module size.
(ii) Logistic regression (Fan et al. [37]). The LR classifier is constructed to implement many-to-one defect prediction.
(iii) TCA+ (Nam et al. [23]). An algorithm that handles data preprocessing during the many-to-one prediction process.
(iv) TPTL (Liu et al. [24]). A source project estimator selects similar source projects, and a two-phase prediction model is constructed.
(v) ISDA (Jing et al. [39]). The idea of subclass discriminant analysis (SDA) is introduced into cross-project defect prediction, and SDA is improved to advance prediction performance.
(vi) TDS (Herbold [18]). A method for training data selection that uses the Euclidean distance to measure the difference between the source and target projects.
(vii) DYCOM (Minku et al. [40]). A weighted sum of multiple pre-trained models from source projects and 10% of the data from the target project.

To check whether the advantage of the proposed approach over the other methods is statistically significant, Friedman's test (Friedman [41]), a suitable statistical test for comparing more than two methods, is applied. At the 5% significance level, Friedman's test rejects the null hypothesis of "equal" performance among all the compared methods. Nemenyi's test (Demšar [42]) is then used to analyze which methods differ. Moreover, we also apply Cohen's *d* (Cohen [43]) to calculate the effect size, which quantifies the difference among the methods. The calculation formula is shown in (5):

$$d=\frac{\bar{x}_1-\bar{x}_2}{\sqrt{\dfrac{(n_1-1)s_1^{2}+(n_2-1)s_2^{2}}{n_1+n_2-2}}}, \tag{5}$$

where $\bar{x}_i$, $s_i^{2}$, and $n_i$ are the mean, variance, and size of the two compared samples. Table 2 shows the effect grade corresponding to Cohen's *d*.
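
The pooled-standard-deviation form of Cohen's *d*, together with a mapping to effect grades, can be sketched as follows. The grade thresholds 0.2/0.5/0.8 are the conventional ones from Cohen's work and are our assumption for the grades in Table 2:

```python
import numpy as np

def cohens_d(a, b):
    """Cohen's d using the pooled standard deviation of two samples."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    na, nb = len(a), len(b)
    pooled = np.sqrt(((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1))
                     / (na + nb - 2))
    return (a.mean() - b.mean()) / pooled

def effect_grade(d):
    """Map |d| to the conventional grades: negligible (N), small (S),
    medium (M), large (L) -- assumed thresholds 0.2, 0.5, 0.8."""
    d = abs(d)
    return "N" if d < 0.2 else "S" if d < 0.5 else "M" if d < 0.8 else "L"
```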

##### 5.3. One-to-One Defect Prediction Results and Analysis

First, Tables 3 and 4 show the JS values between pairs of projects on NASA and PROMISE. The smaller the JS value, the more similar the two projects, so the most similar source project can be selected for each target project. In the tables, the smallest JS value is shown in boldface. For example, if CM1 is the target project, PC4 is selected as the source project.

After the source project is determined, instance selection is completed via relative density estimation. Tables 5–7 then present the experimental results (i.e., F1-score, AUC, and G-mean) of the proposed approach compared with the six baseline approaches on NASA and PROMISE. For each target project, the best results are highlighted in bold. The last row of each table presents the improvement of our approach over the baseline approaches.

Compared with the other six methods on the two datasets, the one-to-one prediction method improved the F1-score by 3.1%∼15.2% in Table 5, the AUC by 3.4%∼16.4% in Table 6, and the G-mean by 3.5%∼20.7% in Table 7. However, we ran MT+ on "poi3.0" many times, and its result remained 0; this project may not be suitable for evaluating MT+, and the authors of MT+ also did not consider it in their paper.

Next, Friedman's and Nemenyi's tests are used to analyze the performance. First, the corresponding *p*-values (all less than 0.05) under the Friedman test are given in Table 8, which reflect the differences among the seven methods from a global perspective. To further visualize the differences, CD diagrams are shown in Figure 5. The average rank of each method is marked along the axis, and approaches that are not significantly different under the Nemenyi test are connected with a thick line. By examining Figure 5 and Table 8, the proposed approach significantly improves the F1-score, AUC, and G-mean over all the other methods.

In Table 9, Cohen's *d* is used to calculate the effect size. The proposed one-to-one defect prediction obtained 14 "*L*" grades on the NASA datasets and 9 on the PROMISE datasets. It can be concluded that the proposed approach is significantly better than the other six one-to-one prediction methods.

Finally, the parameters *K* and percent in Figure 6 are analyzed. The candidate values of *K* are set relative to the number of instances *N*, using the multipliers 0.25∼4 shown in Figure 6. The percent value represents the proportion of sorted instances retained after relative density estimation, and its range is [0.1, 0.9]. Due to space limitations, this paper takes CM1 in NASA and ant1.7 in PROMISE as examples. The source projects selected by the JS values are PC4 and xalan2.4, respectively.

Figure 6 plots how the performance varies with *K* and percent. On the *X*-axis, 0.1∼0.9 denotes the proportion of the selected sorted instances. On the *Y*-axis, 0.25∼4 indicates the choice of *K*, and the *Z*-axis gives the values of the F1-score, AUC, and G-mean, respectively. As shown in Figures 6(a)–6(c), when *K* = 0.25 and percent = 0.2, the F1-score, AUC, and G-mean of PC4→CM1 were 0.3609, 0.6747, and 0.6668. In Figures 6(d)–6(f), when *K* = 4 and percent = 0.7, the F1-score, AUC, and G-mean of xalan2.4→ant1.7 were 0.5714, 0.7338, and 0.7289. Hence, the selection of *K* and percent depends on the chosen project.

##### 5.4. Many-to-One Defect Prediction Results and Analysis

Compared to one-to-one prediction, the proposed many-to-one prediction uses the same process for source project selection and instance selection. However, the number of source projects to use needs to be determined during many-to-one prediction, and the corresponding analysis is given later.

With the ensemble voting weighting strategy, Tables 10–12 show the F1-score, AUC, and G-mean of many-to-one prediction under the proposed approach compared with the seven many-to-one methods on NASA and PROMISE. The best results are highlighted in bold, and the last row presents the improvement of the proposed approach over the other approaches.

According to the results in Table 10, the proposed many-to-one prediction improved the F1-score by 6.7%∼22.6%. In Table 11, the AUC is improved by 2.3%∼7.7%, and in Table 12 the G-mean is improved by 5.8%∼22.8% on NASA and PROMISE. In these tables, some results are still 0; the relevant projects may not be suitable for evaluation, and the corresponding authors did not consider them in their papers.

Table 13 gives the *p*-values (all less than 0.05), which indicate that the methods have significant performance differences. Moreover, the CD diagrams in Figure 7 show that the proposed approach ranks first on all the evaluation metrics, except second in G-mean on NASA and AUC on PROMISE. As a whole, many-to-one prediction under the proposed approach significantly improved the F1-score, AUC, and G-mean over the other many-to-one methods.

Similarly, Cohen's *d* is used to evaluate the effect size. The analysis results are shown in Table 14: the proposed many-to-one defect prediction obtained 10 "*L*" grades on the NASA datasets and 16 on the PROMISE datasets. The proposed approach is significantly better than the other seven many-to-one prediction methods, and the difference cannot be ignored.

Finally, Figure 8 illustrates how the evaluation metrics vary with the number of source projects. When the number of NASA source projects is 3 and the number of PROMISE source projects is 7, the results are optimal. For example, if CM1 is the target project, the selected source projects are PC4, MC2, and MW1.

##### 5.5. Different Learning Methods Analysis

This section aims to show that the proposed approach is not tied to specific machine-learning methods, that is, the prediction performance can be improved regardless of which learning method is selected. We select five traditional learning methods often applied in CPDP: LR, SVM, KNN, RF, and NB.

Since AUC can reveal the classification performance, we take it as an example. Figure 9 shows the comparison results, and it can be seen that AUC is improved regardless of which type of learning method is used to construct the classifier.

To examine the differences among learning methods, Figure 10 visualizes the average comparison results of the proposed approach with the five learning methods on the two datasets. When the classifier changes, the proposed approach still works well, which shows its robust adaptability across learning methods. However, we cannot easily judge which learning method is better because this depends on the datasets.

##### 5.6. Discussion about the Time-Efficiency

To show that the running time is acceptable, Tables 15 and 16 give the average running time of our approach and the other approaches. The proposed approach needs less running time than TPTL and ISDA but more than the other methods.

It is necessary to explain why our approach needs more time. Our approach contains four components: JS calculation for source project selection, relative density estimation for instance selection, classifier building, and prediction. The running times of classifier building and prediction are negligible. As Table 17 shows, most of the time is spent on the JS calculation and the relative density estimation. Since relative density estimation operates only on the selected project, the JS calculation is the main reason our approach needs more time. However, this calculation does not need to be repeated every time: when a new project is added to the datasets, only the JS values between the new project and the existing projects must be computed, instead of recalculating all the JS values.
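
The incremental update can be sketched as follows. This is an illustrative cache keyed by unordered project pairs; `js_fn` and all names are hypothetical stand-ins:

```python
def update_js_cache(cache, js_fn, projects, new_name, new_data):
    """Extend a pairwise JS cache with one new project without
    recomputing existing entries. Symmetry of the JS divergence means
    one cache key per unordered pair (hence frozenset keys)."""
    for name, data in projects.items():
        key = frozenset((name, new_name))
        if key not in cache:
            cache[key] = js_fn(new_data, data)   # only new pairs computed
    projects[new_name] = new_data
    return cache
```

Adding one project to a pool of *n* projects costs *n* JS evaluations instead of the *n(n+1)/2* needed to rebuild the whole pairwise table.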

#### 6. Threats to Validity

Four potential threats to validity are described in the following:
(1) Accuracy of experiments. Most of the compared works do not provide the code of their methods. This study analyzes and implements their methods by following their papers.
(2) Bias of evaluation measures. In this work, the widely used measures of F1-score, AUC, and G-mean are used to report the prediction results. Other measures, such as recall, precision, skewed F-measure, and the Matthews correlation coefficient, are not considered.
(3) Bias of classifiers. Classification is a significant research topic, and many learning methods can be used to build classifiers. As investigated in previous studies, logistic regression can achieve better performance; therefore, this work also applies logistic regression to build the classifiers. Meanwhile, the conventional learning methods SVM, KNN, Random Forest, and Naïve Bayes are also tested to evaluate the performance.
(4) Bias of datasets. Several benchmark datasets are commonly used in cross-project defect prediction, such as NASA (Shepperd et al. [26]), PROMISE (Jureczko and Madeyski [27]), AEEEM (D'Ambros et al. [44]), Relink (Wu et al. [45]), and SOFTLAB (Turhan et al. [17]), and some datasets contain different versions of each project. To evaluate the prediction performance, this work chose the widely used NASA and PROMISE datasets.

#### 7. Related Work

Cross-project defect prediction has become one of the most important topics in software engineering (Rahman et al. [2]; Yang et al. [3]; Ghotra et al. [4]; Herbold et al. [14]; Wen et al. [5]). In practice, researchers have recognized it as a binary classification problem, that is, they train a machine-learning model on labeled defect data. Due to insufficient training data, researchers try to build a bridge from the source projects to the target project: they utilize the information in the source projects and transfer it to the target project. The related studies are summarized as follows.

For one-to-one defect prediction, Sun et al. [38] described an effective and efficient method for domain adaptation called CORAL, which minimized the distribution difference between source and target projects. Zhang et al. [46] applied different transformations to explore whether the cross-project defect prediction is affected and proposed Multiple Transformations (MT) and MT+ to improve prediction performance. Xu et al. [35] introduced a balanced distribution adaptation-based transfer learning method to implement defect prediction, which improved the existing methods’ performance. In this study, we compare those one-to-one models with our one-to-one approach.

For many-to-one defect prediction, Herbold [18] utilized distance-based strategies to select training data in many-to-one defect prediction; the results show the method can achieve good performance. Nam et al. [23] extended their work to TCA+ with a data preprocessing method and conducted it on one-to-one and many-to-one predictions. Minku et al. [40] proposed a transfer learning model, Dycom, based on the work of Nam et al. [23]; Dycom is a weighted sum of transferred models trained on various source projects. Jing et al. [39] proposed a method called subclass discriminant analysis, which can learn features from original metrics and make the distributions of source and target projects stable. Liu et al. [24] proposed a two-phase transfer learning model (TPTL) for CPDP. They introduced a source project estimator to choose similar source projects and built two prediction models; by combining the prediction results of the two models, they further improved the final performance. In this study, we compare the above models with our many-to-one approach.

Besides homogeneous defect prediction, heterogeneous defect prediction has recently made great progress. Gong et al. [47] utilized the idea of stratification embedded in the nearest neighbor to produce evolving training datasets with balanced data. Zou et al. [48] proposed a method named joint feature representation with double marginalized denoising autoencoders to learn global and local features, and they introduced local data gravitation between source and target domains to determine instance weights in the learning process (Zou et al. [49]). Jin et al. [50] used two support vector machines to implement domain adaptation to match the data distributions. Mehta and Patnaik [51] used various ensemble machine-learning techniques to improve classification performance. In the future, we will study heterogeneous defect prediction based on state-of-the-art models.

From the abovementioned literature, it can be concluded that there are two significant problems in CPDP research. The first problem is that the source and target project data usually exhibit significantly different distributions. Since the source and target projects might be implemented by different programming languages or companies, the same metrics in the source and target project data might have different distributions. The second problem is that there are no standard metrics between the source and target project data. It is difficult to predict the defects in the target project using conventional methods to build models on the source project data. Therefore, many efforts have been made to solve these two problems in recent years, and this study focuses on the solution to the first problem. In the future, standardized tools will be developed based on more experiments.

#### 8. Conclusion and Future Works

This paper studies the limitations of prior studies and proposes a novel approach. First, to avoid the randomness of the source project selection, Jensen-Shannon divergence is used to measure the similarity between source and target projects automatically. Subsequently, relative density information is introduced to filter noise and outliers in similar source projects. Posteriorly, the one-to-one defect prediction model is constructed based on the most similar source project directly, and the many-to-one defect prediction model is built by a proposed ensemble weight strategy based on Jensen-Shannon divergence. Finally, the models predict the potential defects in the target project.

Experimental results on the benchmark datasets of NASA and PROMISE indicate that our approach can improve the F1-score, AUC, and G-mean. Moreover, the results also prove that the proposed approach is adaptive regardless of the type of learning method.

In future work, we will select more defect datasets to evaluate the effectiveness of the proposed approach. Moreover, the proposed approach will be extended, for example by combining domain adaptation analysis, to implement heterogeneous defect prediction.

#### Data Availability

The data used to support the findings of this study have been deposited in https://github.com/Sevensweett/Defect-Prediction.

#### Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

#### Acknowledgments

The work was supported by the Scientific Research Foundation for the introduction of talent of Jiangsu University of Science and Technology, China; Natural Science Foundation of the Higher Education Institutions of Jiangsu Province, China (Grant No. 18JKB520011); and Primary Research and Development Plan (Social Development) of Zhenjiang City, China (Grant No. SH2019021).