Computational and Mathematical Methods in Medicine

Research Article

Correcting Classifiers for Sample Selection Bias in Two-Phase Case-Control Studies

Stochastic inverse-probability oversampling.

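Input: observed sample $D$ of size $n$, IP weights $w_1, \dots, w_n$
Output: unbiased prediction for new, unbiased data
(1) Perform IP oversampling, resulting in the reconstructed sample $D^{\mathrm{IP}}$ of size $N$, with strata $D_1^{\mathrm{IP}}, \dots, D_K^{\mathrm{IP}}$ of sizes $N_1, \dots, N_K$
(2) for $b = 1$ to $B$ do
    for $k = 1$ to $K$ do
        (a) Estimate the parameters $\hat{\theta}_k$ of the distribution of $D_k^{\mathrm{IP}}$
        (b) Draw a noise vector $\varepsilon_k^{(b)}$ from $F_{\hat{\theta}_k}$ of length $N_k$
        (c) Rebuild the original stratum as $D_k^{(b)} = D_k^{\mathrm{IP}} + \varepsilon_k^{(b)}$
    end
    (a) Combine the strata into one sample:
        $D^{(b)} = \bigcup_{k=1}^{K} D_k^{(b)}$
    (b) Fit a classifier $\hat{f}_b$ on $D^{(b)}$
end
(3) Output the ensemble of learners $\{\hat{f}_1, \dots, \hat{f}_B\}$
(4) Aggregate predictions on a new data set by averaging:
    $\hat{f}(x) = \frac{1}{B} \sum_{b=1}^{B} \hat{f}_b(x)$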
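The algorithm can be illustrated with a minimal NumPy sketch. It rests on several assumptions that are not taken from the paper:

* Non-integer IP weights are rounded to a replication count, with a minimum of 1.
* The "distribution estimate" of each stratum is its per-feature standard deviation.
* The noise is Gaussian, scaled by a hypothetical `noise_scale` parameter.
* Only the covariates are perturbed, never the labels.
* A hypothetical plain logistic regression (`fit_logistic`) stands in for an arbitrary classifier.

All function names are illustrative only.

```python
import numpy as np


def ip_oversample(X, y, strata, weights):
    """Step (1): replicate observation i about w_i times.

    Assumption: weights are rounded to the nearest integer, with a minimum of 1.
    """
    reps = np.maximum(np.rint(weights).astype(int), 1)
    idx = np.repeat(np.arange(len(y)), reps)
    return X[idx], y[idx], strata[idx]


def fit_logistic(X, y, lr=0.1, n_iter=500):
    """Hypothetical stand-in learner: logistic regression by gradient descent.

    Returns a function mapping new covariates to predicted probabilities.
    """
    Xb = np.hstack([np.ones((len(X), 1)), X])  # add intercept column
    beta = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ beta))
        beta -= lr * Xb.T @ (p - y) / len(y)  # mean log-loss gradient step
    return lambda Xn: 1.0 / (1.0 + np.exp(-np.hstack([np.ones((len(Xn), 1)), Xn]) @ beta))


def stochastic_ip_ensemble(X, y, strata, weights, B=10, noise_scale=0.1, seed=0):
    rng = np.random.default_rng(seed)
    X_ip, y_ip, s_ip = ip_oversample(X, y, strata, weights)  # step (1), done once
    models = []
    for _ in range(B):  # step (2)
        X_b = np.empty_like(X_ip, dtype=float)
        for k in np.unique(s_ip):
            mask = s_ip == k
            # (a) estimate the stratum's spread (assumption: per-feature SD)
            sd = X_ip[mask].std(axis=0)
            # (b) draw Gaussian noise, one row per replicated observation (N_k rows)
            noise = rng.normal(0.0, noise_scale * sd, size=X_ip[mask].shape)
            # (c) rebuild stratum k as replicated data plus noise
            X_b[mask] = X_ip[mask] + noise
        # (a) strata are combined implicitly: X_b now holds every rebuilt stratum
        models.append(fit_logistic(X_b, y_ip))  # (b) fit one learner per round
    return models  # step (3): the ensemble


def predict_ensemble(models, X_new):
    """Step (4): average the learners' predicted probabilities."""
    return np.mean([m(X_new) for m in models], axis=0)


# Usage on hypothetical data: controls were subsampled, so each carries IP weight 5.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(2, 1, (20, 2))])
y = np.r_[np.zeros(20), np.ones(20)]
strata = y.astype(int)  # assumption: strata defined by case/control status
w = np.r_[np.full(20, 5.0), np.full(20, 1.0)]
models = stochastic_ip_ensemble(X, y, strata, w, B=5)
p = predict_ensemble(models, np.array([[0.0, 0.0], [2.0, 2.0]]))
```

Oversampling runs once, outside the loop. Each round then redraws only the noise, so the ensemble's diversity comes from the perturbations rather than from resampling. The noise distribution and the learner can be swapped without changing the loop structure.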