Research Article

Correcting Classifiers for Sample Selection Bias in Two-Phase Case-Control Studies

Table 1

Properties and performance of correction approaches for logistic regression and random forest. The properties are as follows: (i) a correction is attempted at all; (ii) an attempt is made to unbias the covariance structure of the learning data; (iii) learning is based on a data set containing more observations than the original stratified data set (see (3)). Criteria are fulfilled (“✓”), not clearly fulfilled (“(✓)”), or not fulfilled (“”).

Correction approach | Properties according to Section 3.1.3 | Sufficient performance
(i) | (ii) | (iii) | Logistic regression | Random forest

No correction
IP oversampling
IP bagging
Costing (✓)
Modified SMOTE (✓) (✓)
Stochastic IP oversampling
Parametric IP bagging