Research Article
A High Accurate Multiple Classifier System for Entity Resolution Using Resampling and Ensemble Selection
Input:
(1) the dataset to be resampled:
    % each instance is the field similarity vector of a record pair, and the class label indicates whether the corresponding record pair is match or non-match
(2) the ratio of resampling:
Initialization:
(3) , ,
Splitting the dataset into ambiguous and normal:
(4) get the instance number
(5) Split dataset into duplicate data and distinct data
(6) Average each instance to get record similarity vector and
(7) Calculate expectation and variance of and as , and , respectively
(8) Calculate lower bound LB of as and the upper bound UB of as UB =
(9) for
(10)   If % is the similarity of the th instance
(11)     % ambiguous instances
(12)   Else
(13)     % normal instances
Resampling:
(14) For
(15)   Randomly select an instance from
(16)
(17) For
(18)   Randomly select an instance from
(19)
(20) Order in random order
Output:
(21) the resampled dataset:
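The listing above can be illustrated with a minimal Python sketch. Several details are not recoverable from the listing and are assumptions here, not the authors' definitions: the lower bound is taken as the mean minus one standard deviation of the duplicate-pair similarities, the upper bound as the mean plus one standard deviation of the distinct-pair similarities, an instance is treated as ambiguous when its similarity lies between the two bounds, the resampling ratio is read as the fraction of the output drawn from the ambiguous pool, and the output has the same size as the input. The names `split_ambiguous`, `resample`, `ratio`, and `seed` are hypothetical.

```python
import random
from statistics import mean, pstdev


def split_ambiguous(vectors, labels):
    """Split instance indices into (ambiguous, normal).

    vectors: list of field-similarity vectors, one per record pair.
    labels:  1 for a duplicate (match) pair, 0 for a distinct pair.
    Assumes both classes are present in the data.
    """
    # Step 6: average each vector to get one record similarity per pair.
    sims = [mean(v) for v in vectors]
    dup = [s for s, y in zip(sims, labels) if y == 1]
    dis = [s for s, y in zip(sims, labels) if y == 0]

    # Steps 7-8. ASSUMPTION: bounds sit one standard deviation from
    # the class means; the listing's own formulas are not recoverable.
    lb = mean(dup) - pstdev(dup)
    ub = mean(dis) + pstdev(dis)

    # Steps 9-13. ASSUMPTION: "ambiguous" means lb <= similarity <= ub.
    ambiguous, normal = [], []
    for i, s in enumerate(sims):
        (ambiguous if lb <= s <= ub else normal).append(i)
    return ambiguous, normal


def resample(vectors, labels, ratio, seed=None):
    """Draw a resampled dataset of (vector, label) pairs."""
    rng = random.Random(seed)
    n = len(vectors)
    ambiguous, normal = split_ambiguous(vectors, labels)

    # ASSUMPTION: ratio is the share of the output taken from the
    # ambiguous pool; an empty pool contributes nothing.
    n_amb = round(ratio * n) if ambiguous else 0
    n_norm = n - n_amb if normal else 0

    # Steps 14-19: sample with replacement from each pool.
    picked = [rng.choice(ambiguous) for _ in range(n_amb)]
    picked += [rng.choice(normal) for _ in range(n_norm)]

    # Step 20: put the selected instances in random order.
    rng.shuffle(picked)
    return [(vectors[i], labels[i]) for i in picked]
```

Sampling with replacement lets a small ambiguous pool be over-represented in the output, which appears to be the purpose of the resampling ratio; a different reading of the ratio would change only the two count lines.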