Research Article

Research on Cross-Company Defect Prediction Method to Improve Software Security

Algorithm 3

Metric matching and sample weight setting based CCDP algorithm.
 Input: source company project datasets S as training datasets;
 target company project datasets T as test datasets;
 Output: trained defect prediction model
(1) Clustering-based metric matching:
(a) Use the MaxMinNormalization method to normalize S and T, obtaining SNorm and TNorm;
(b) Extract the multigranularity metric feature vectors of S and T;
(c) Apply the K-means clustering algorithm to cluster the source and the target metric feature vectors into K clusters each;
(d) Extract the principal component of each cluster through PCA as that cluster's representative vector, for source and target alike;
(e) Perform one-to-one metric matching between the source and target representative vectors;
(f) Redistribute SNorm and TNorm according to the matching obtained in the steps above.
(2) Sample selection-based weight setting:
(a) Use NNFilter to select, for each target sample, the N source samples most similar to it by Euclidean distance as the candidate training data samples SCandidate;
(b) Count how often each source sample is selected in SCandidate;
(c) Use the sample frequency information in SCandidate as the basis for setting the sample weights.
(3) CCDP model construction and verification:
(a) Use the weighted source samples as the training dataset and apply common machine learning methods, including LR, NB, and KNN, to construct the prediction model;
(b) Perform experiments on multiple defect datasets and evaluate the performance of the proposed method.
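
The normalization in step (1)(a) and the weighting and training in stages (2) and (3) can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions: the metrics of the source and target data are taken to be already matched (the K-means and PCA matching of stage (1) is not reproduced), the raw selection count is used directly as the sample weight, and a weighted KNN vote stands in for the learner. All function names and the toy data are hypothetical.

```python
from collections import Counter
from math import sqrt


def min_max_normalize(rows):
    """Scale each metric (column) of a samples-by-metrics table to [0, 1]."""
    scaled_columns = []
    for column in zip(*rows):
        low, high = min(column), max(column)
        span = high - low
        # A constant metric has no spread; map it to 0.0 to avoid dividing by zero.
        scaled_columns.append([(v - low) / span if span else 0.0 for v in column])
    return [list(sample) for sample in zip(*scaled_columns)]


def euclidean(a, b):
    """Euclidean distance between two equally long metric vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nn_filter_frequencies(source, target, n_neighbors):
    """Count how often each source sample is among the n nearest to a target sample."""
    counts = Counter()
    for t in target:
        ranked = sorted(range(len(source)), key=lambda i: euclidean(source[i], t))
        counts.update(ranked[:n_neighbors])
    return counts


def weighted_knn_predict(train_x, train_y, weights, query, k=3):
    """Predict a label by a k-NN vote in which each neighbor votes with its weight."""
    ranked = sorted(range(len(train_x)), key=lambda i: euclidean(train_x[i], query))
    votes = {}
    for i in ranked[:k]:
        votes[train_y[i]] = votes.get(train_y[i], 0.0) + weights[i]
    return max(votes, key=votes.get)


# Toy data, made up for illustration: one metric per sample, label 1 = defective.
source_raw = [[0], [2], [4], [6], [8], [10]]
source_y = [0, 0, 0, 1, 1, 1]
target_raw = [[100], [137], [142], [145], [200]]

# Step (1)(a): normalize source and target separately.
source_norm = min_max_normalize(source_raw)
target_norm = min_max_normalize(target_raw)

# Stage (2): NNFilter selection and frequency counting.
freq = nn_filter_frequencies(source_norm, target_norm, n_neighbors=2)

# Assumption: the selection count itself is the weight; unselected samples are dropped.
selected = sorted(freq)
train_x = [source_norm[i] for i in selected]
train_y = [source_y[i] for i in selected]
train_w = [freq[i] for i in selected]

# Stage (3)(a): classify one target sample with the weighted training set.
prediction = weighted_knn_predict(train_x, train_y, train_w, target_norm[3], k=3)
print(dict(freq), prediction)
```

Source samples that lie close to many target samples receive larger counts and therefore more influence on the vote. How the frequencies are turned into weights, and how each of LR, NB, and KNN consumes them, may differ in the actual method.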