Research Article

An Improved Method for Cross-Project Defect Prediction by Simplifying Training Data

Table 5

The best prediction results obtained by the CPDP approach based on TDSelector with Cosine similarity. NoD represents the baseline method; + denotes the growth rate of AUC value; the maximum AUC value of different normalization methods is underlined; each number shown in bold indicates that the corresponding AUC value rises by more than 10%.

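| Cosine similarity | | Ant | Xalan | Camel | Ivy | Jedit | Lucene | Poi | Synapse | Velocity | Xerces | Eclipse | Equinox | Lucene2 | Mylyn | Pde | Mean ± St.d |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Linear | α | 0.7 | 0.9 | 0.9 | 1.0 | 0.9 | 1.0 | 0.9 | 1.0 | 0.6 | 0.9 | 0.8 | 0.6 | 0.7 | 0.7 | 0.5 | 0.338 |
| | AUC | 0.813 | 0.676 | 0.603 | 0.793 | 0.700 | 0.611 | 0.758 | 0.741 | 0.512 | 0.742 | 0.783 | 0.760 | 0.739 | 0.705 | 0.729 | 0.711 ± 0.081 |
| | + (%) | 6.3% | 3.7% | 1.9% | - | 30.6% | - | 3.0% | - | 43.0% | 0.3% | 22.6% | 39.4% | 4.1% | 5.9% | 4.0% | 9.0% |
| Logistic | α | 0.7 | 0.5 | 0.7 | 1 | 0.7 | 0.6 | 0.6 | 0.6 | 0.5 | 0.5 | 0 | 0.4 | 0.7 | 0.5 | 0.5 | 0.351 |
| | AUC | 0.802 | 0.674 | 0.595 | 0.793 | 0.665 | 0.621 | 0.759 | 0.765 | 0.579 | 0.745 | 0.773 | 0.738 | 0.712 | 0.707 | 0.740 | 0.711 ± 0.070 |
| | + (%) | 4.8% | 3.4% | 0.5% | - | 24.1% | 1.6% | 3.1% | 3.2% | 61.7% | 0.7% | 21.0% | 35.5% | 0.3% | 6.2% | 5.6% | 9.0% |
| Square root | α | 0.7 | 0.7 | 0.6 | 0.6 | 0.7 | 0.6 | 0.7 | 0.9 | 0.5 | 1 | 0.4 | 0.6 | 0.6 | 0.6 | 0.6 | 0.249 |
| | AUC | 0.799 | 0.654 | 0.596 | 0.807 | 0.735 | 0.626 | 0.746 | 0.762 | 0.500 | 0.740 | 0.774 | 0.560 | 0.722 | 0.700 | 0.738 | 0.697 ± 0.091 |
| | + (%) | 4.4% | 0.3% | 0.7% | 1.8% | 37.1% | 2.5% | 1.4% | 2.8% | 39.7% | - | 21.0% | 2.8% | 1.7% | 5.3% | 5.3% | 6.9% |
| Logarithmic | α | 0.6 | 0.6 | 0.9 | 1.0 | 0.7 | 1.0 | 0.7 | 0.7 | 0.5 | 0.9 | 0.5 | 0.5 | 0.6 | 0.6 | 0.6 | 0.351 |
| | AUC | 0.798 | 0.662 | 0.594 | 0.793 | 0.731 | 0.611 | 0.748 | 0.744 | 0.500 | 0.758 | 0.774 | 0.700 | 0.755 | 0.702 | 0.741 | 0.707 ± 0.083 |
| | + (%) | 4.3% | 1.5% | 0.3% | - | 36.4% | - | 1.6% | 0.4% | 39.7% | 2.4% | 21.2% | 28.5% | 6.3% | 5.5% | 5.8% | 8.5% |
| Inverse cotangent | α | 0.7 | 1.0 | 1.0 | 1.0 | 0.7 | 1.0 | 0.7 | 1.0 | 0.6 | 0.7 | 0 | 0.7 | 0.7 | 0.7 | 0.7 | 0.213 |
| | AUC | 0.798 | 0.652 | 0.592 | 0.793 | 0.659 | 0.611 | 0.749 | 0.741 | 0.500 | 0.764 | 0.773 | 0.556 | 0.739 | 0.695 | 0.734 | 0.690 ± 0.092 |
| | + (%) | 4.3% | - | - | - | 22.9% | - | 1.8% | - | 39.7% | 3.2% | 21.0% | 2.1% | 4.1% | 4.4% | 4.8% | 5.9% |
| NoD | AUC | 0.765 | 0.652 | 0.592 | 0.793 | 0.536 | 0.611 | 0.736 | 0.741 | 0.358 | 0.740 | 0.639 | 0.543 | 0.709 | 0.665 | 0.701 | 0.652 ± 0.113 |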
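
The "+ (%)" rows can be checked with a short calculation. The sketch below assumes that the growth rate is the relative change of a method's AUC over the NoD baseline, i.e. (AUC - AUC_NoD) / AUC_NoD; the table does not state this formula explicitly, and the function name `growth_rate` is purely illustrative. Because the published percentages were presumably computed from unrounded AUC values, a few entries may differ in the last digit when recomputed from the rounded figures.

```python
def growth_rate(auc: float, baseline_auc: float) -> float:
    """Relative AUC change over the baseline, in percent (assumed definition)."""
    return (auc - baseline_auc) / baseline_auc * 100.0


# AUC values copied from the table: Linear normalization vs. the NoD baseline.
nod = {"Ant": 0.765, "Jedit": 0.536, "Velocity": 0.358}
linear = {"Ant": 0.813, "Jedit": 0.700, "Velocity": 0.512}

for project, baseline in nod.items():
    rate = growth_rate(linear[project], baseline)
    # The caption highlights improvements of more than 10%.
    flag = " (more than 10%)" if rate > 10.0 else ""
    print(f"{project}: {rate:+.1f}%{flag}")
```

Under this assumption, the three projects give 6.3%, 30.6%, and 43.0%, which agrees with the Linear "+ (%)" entries for Ant, Jedit, and Velocity.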