Security and Communication Networks

Research Article

Research on Cross-Company Defect Prediction Method to Improve Software Security

CCDP algorithm based on metric matching and sample weight setting.

Input: source company project datasets S as training datasets;
target company project datasets T as test datasets;
Output: trained defect prediction model
(1)	Clustering-based metric matching:
(a)	Use the MaxMinNormalization method to normalize S and T, obtaining SNorm and TNorm;
(b)	Extract the multigranularity metric feature vectors of S and T;
(c)	Apply the K-means clustering algorithm to cluster the metric feature vectors of S and of T into K clusters each;
(d)	Extract the principal component of each cluster through PCA as that cluster's representative vector, for S and T respectively;
(e)	Perform one-to-one metric matching between the representative vectors of S and those of T;
(f)	Redistribute SNorm and TNorm based on the above steps.
(2)	Sample selection-based weight setting:
(a)	Use NNFilter to select, for each target sample, the N source samples most similar to it by Euclidean distance as candidate training data samples SCandidate;
(b)	Count how often each source sample is selected in SCandidate;
(c)	Use the sample frequency information in SCandidate as the basis for setting sample weights.
(3)	CCDP model construction and verification:
(a)	Use the weighted source samples as the training dataset and apply common machine learning methods, including LR, NB, and KNN, to construct the prediction model;
(b)	Perform experiments on multiple defect datasets and evaluate the performance of the proposed method.
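
The normalization, sample selection, and weighting steps above can be illustrated with a minimal Python sketch. It covers only steps (1)(a), (2)(a)-(c), and a KNN variant of (3)(a); the clustering-based metric matching of steps (1)(b)-(f) is not reproduced, so the sketch assumes the source and target metrics are already aligned column by column. All function names, the toy data, and the way the weights enter the KNN vote (each neighbor votes with its selection frequency) are assumptions made for illustration, not the authors' implementation.

```python
from collections import Counter
from math import dist


def min_max_normalize(rows):
    """Scale every metric (column) to [0, 1]; constant columns map to 0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [max(c) - min(c) for c in cols]
    return [
        [(v - lo) / sp if sp else 0.0 for v, lo, sp in zip(row, lows, spans)]
        for row in rows
    ]


def nn_filter(source, target, n):
    """For each target sample, collect the indices of its n nearest source
    samples (Euclidean distance). Repeated indices are kept on purpose:
    the repeats are the frequency signal used for weighting."""
    candidates = []
    for t in target:
        ranked = sorted(range(len(source)), key=lambda i: dist(source[i], t))
        candidates.extend(ranked[:n])
    return candidates


def frequency_weights(candidates):
    """Weight of a source sample = number of times NNFilter selected it."""
    return dict(Counter(candidates))


def weighted_knn_predict(source, labels, weights, query, k):
    """Assumed weighting scheme: the k nearest *selected* source samples
    vote for their label, each vote counting as the sample's weight."""
    nearest = sorted(weights, key=lambda i: dist(source[i], query))[:k]
    votes = Counter()
    for i in nearest:
        votes[labels[i]] += weights[i]
    return votes.most_common(1)[0][0]


# Toy data (hypothetical): two metrics per module, label 1 = defective.
S = [[1, 10], [2, 20], [9, 90], [10, 100], [5, 55]]
S_labels = [0, 0, 1, 1, 0]
T = [[2, 4], [3, 6], [10, 20]]

SNorm, TNorm = min_max_normalize(S), min_max_normalize(T)
SCandidate = nn_filter(SNorm, TNorm, n=2)
sample_weights = frequency_weights(SCandidate)
predictions = [
    weighted_knn_predict(SNorm, S_labels, sample_weights, t, k=2) for t in TNorm
]
```

In this sketch, a source sample that NNFilter never selects receives no weight and therefore takes no part in training. For learners such as LR or NB, the same frequency weights would typically be passed as per-sample weights to the training routine instead of being used in a vote.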