Research Article

A High Accurate Multiple Classifier System for Entity Resolution Using Resampling and Ensemble Selection

Algorithm 1: Resampling algorithm.
Input:
(1) the dataset to be resampled:
% each instance is the field similarity vector of a record pair, and the class label indicates whether that record pair is a match or a non-match
(2) the ratio of resampling:
Initialization:
(3) , ,
Splitting the dataset into ambiguous and normal instances:
(4) get the instance number
(5) Split the dataset into duplicate data and distinct data
(6) Average the field similarities of each instance to get the record similarity vectors of the duplicate data and of the distinct data
(7) Calculate the expectation and variance of each of the two record similarity vectors
(8) Calculate lower bound LB of as
and the upper bound UB of as UB =
(9) for
(10) If % is the similarity of the
th instance
(11)    % ambiguous instances
(12) Else
(13)      % normal instances
Resampling:
(14) For
(15) Randomly select an instance from
(16)
(17) For
(18) Randomly select an instance from
(19)
(20) Shuffle the resampled dataset into random order
Output:
(21) the resampled dataset:
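
The listing above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names (record_similarity, class_statistics, resample), the label encoding (1 for match, 0 for non-match), and the use of population variance are hypothetical choices. The formulas for LB and UB are not legible in the listing, so the sketch takes both bounds as caller-supplied arguments rather than guessing them. It further assumes that an instance is ambiguous when its record similarity lies between LB and UB, that the resampled dataset has the same size as the input, and that the ratio is the fraction of it drawn from the ambiguous instances.

```python
import random
from statistics import mean, pvariance


def record_similarity(vector):
    """Step (6): average the field similarities of one record pair."""
    return mean(vector)


def class_statistics(dataset):
    """Steps (5)-(7): per-class mean and variance of record similarity.

    Assumes labels are 1 (match) and 0 (non-match) and that both
    classes are non-empty; population variance is an assumption.
    """
    stats = {}
    for name, wanted in (("duplicate", 1), ("distinct", 0)):
        sims = [record_similarity(v) for v, label in dataset if label == wanted]
        stats[name] = (mean(sims), pvariance(sims))
    return stats


def resample(dataset, ratio, lower_bound, upper_bound, seed=None):
    """Steps (9)-(20), with LB and UB supplied by the caller.

    dataset: list of (field_similarity_vector, label) pairs.
    ratio:   assumed fraction of the output drawn from ambiguous instances.
    """
    rng = random.Random(seed)
    n = len(dataset)

    # Steps (9)-(13): split into ambiguous and normal instances.
    ambiguous, normal = [], []
    for vector, label in dataset:
        if lower_bound <= record_similarity(vector) <= upper_bound:
            ambiguous.append((vector, label))
        else:
            normal.append((vector, label))

    # Steps (14)-(19): draw with replacement from each group.
    n_ambiguous = round(ratio * n) if ambiguous else 0
    n_normal = n - n_ambiguous if normal else 0
    resampled = [rng.choice(ambiguous) for _ in range(n_ambiguous)]
    resampled += [rng.choice(normal) for _ in range(n_normal)]

    # Step (20): shuffle the resampled dataset.
    rng.shuffle(resampled)
    return resampled
```

A caller would typically compute class_statistics first, derive the two bounds from those statistics according to step (8) of the original algorithm, and then pass them to resample together with the chosen ratio.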