Research Article

Correcting Classifiers for Sample Selection Bias in Two-Phase Case-Control Studies

Algorithm 1: Stochastic inverse-probability oversampling.
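Input: Observed sample $S$ of size $n$, IP weights $w_1, \dots, w_n$
Output: Unbiased prediction for new unbiased data
(1) Perform IP oversampling, resulting in reconstructed sample $S^{\mathrm{IP}}$ of size $N$
(2) for $b = 1$ to $B$ do
      for $j = 1$ to $J$ do
        (a) Estimate parameters $\hat{\theta}_j$ of distribution $F_j$
        (b) Draw noise vector $\varepsilon_j$ from $F_j(\hat{\theta}_j)$ of length $N_j$
        (c) Rebuild original stratum as $S_j^{(b)} = S_j^{\mathrm{IP}} + \varepsilon_j$
      end
      (a) Combine strata to sample: $S^{(b)} = \bigcup_{j=1}^{J} S_j^{(b)}$
      (b) Fit classifier $f_b$ on $S^{(b)}$
    end
(3) Output the ensemble of learners $\{f_1, \dots, f_B\}$
(4) Aggregate predictions on new data set by averaging: $\hat{f}(x) = \frac{1}{B} \sum_{b=1}^{B} f_b(x)$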
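
The steps of Algorithm 1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes that IP oversampling replicates each observation a number of times equal to its rounded weight, and that the noise is independent Gaussian per feature with a standard deviation proportional to the within-stratum spread. The names `ip_oversample`, `stochastic_ip_oversampling`, `fit_centroid`, `n_rounds`, and `noise_scale` are hypothetical, and the nearest-centroid learner merely stands in for whichever classifier is fitted in step (2).

```python
import random
import statistics
from collections import defaultdict


def ip_oversample(X, y, strata, weights):
    """Step (1): replicate each observation round(weight) times (assumed scheme)."""
    rows = []
    for xi, yi, si, wi in zip(X, y, strata, weights):
        rows.extend([(list(xi), yi, si)] * max(1, round(wi)))
    return rows


def stochastic_ip_oversampling(X, y, strata, weights, fit,
                               n_rounds=10, noise_scale=0.1, seed=0):
    """Steps (2)-(4): perturb replicated strata, fit one learner per round, average."""
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for x, label, s in ip_oversample(X, y, strata, weights):
        by_stratum[s].append((x, label))

    learners = []
    for _ in range(n_rounds):
        X_new, y_new = [], []
        for members in by_stratum.values():
            n_feat = len(members[0][0])
            # (a) estimate the per-feature spread within the stratum
            #     (assumption: Gaussian noise; the source does not name the distribution)
            sds = [statistics.pstdev([x[k] for x, _ in members])
                   for k in range(n_feat)]
            for x, label in members:
                # (b) draw noise and (c) rebuild the stratum as replicate + noise
                X_new.append([v + rng.gauss(0.0, noise_scale * sd)
                              for v, sd in zip(x, sds)])
                y_new.append(label)
        # combine strata (X_new, y_new) and fit one classifier for this round
        learners.append(fit(X_new, y_new))

    def predict(X_test):
        # step (4): average the learners' predictions
        return [statistics.fmean([f(x) for f in learners]) for x in X_test]

    return predict


def fit_centroid(X, y):
    """Toy stand-in learner: returns 1.0 if a point is closer to the class-1 centroid."""
    def centroid(label):
        pts = [x for x, yi in zip(X, y) if yi == label]
        return [statistics.fmean(col) for col in zip(*pts)]

    c0, c1 = centroid(0), centroid(1)

    def dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    return lambda x: 1.0 if dist(x, c1) < dist(x, c0) else 0.0


# Illustrative data: controls are under-sampled, so they carry larger IP weights.
X = [[0.0, 0.1], [0.2, 0.0], [1.0, 1.1], [0.9, 1.0], [1.1, 0.9]]
y = [0, 0, 1, 1, 1]
strata = ["control", "control", "case", "case", "case"]
weights = [5.0, 5.0, 1.0, 1.0, 1.0]

predict = stochastic_ip_oversampling(X, y, strata, weights, fit_centroid, n_rounds=5)
scores = predict([[0.1, 0.0], [1.0, 1.0]])
print(scores)
```

Passing the learner as a callable keeps the sketch independent of any particular classifier. The noise is what distinguishes the stochastic variant from plain IP oversampling, which would hand identical replicates to the learner.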