Research Article

Metagenomics Biomarkers Selected for Prediction of Three Different Diseases in Chinese Population

Table 2

Evaluation of the algorithms based on F1 score.

Sampling method   KNN      LR       RF       SVM      GBDT     SGD      ADA
NearMiss          0.6628   0.7510   0.7888   0.7184   0.8282   0.7696   0.7956
SMOTEENN          0.8602   0.9142   0.8341   0.9138   0.8741   0.8360   0.8959

No. of markers    280      300      220      320      160      220      220

KNN, k-nearest neighbor; LR, logistic regression; RF, random forest; SVM, support vector machine; GBDT, gradient boosting decision tree; SGD, stochastic gradient descent; ADA, adaptive boosting. NearMiss-3 is an undersampling method that uses the k-nearest neighbor (KNN) rule to choose which majority-class samples to keep; SMOTEENN is an oversampling method (SMOTE combined with edited nearest neighbors) that removes training samples based on their three nearest neighbors. Bold font indicates the best result.
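Table 2 compares classifiers by F1 score, the harmonic mean of precision and recall. The table does not say whether a binary or an averaged (macro or weighted) F1 was used, so the minimal sketch below assumes binary F1 with the diseased class as the positive label (`positive=1` is an assumption). It computes the score in pure Python from two label lists, which is how each cell could be obtained from a classifier's test-set predictions.

```python
def f1_score(y_true, y_pred, positive=1):
    """Binary F1 = 2 * precision * recall / (precision + recall).

    Assumes a two-class problem where `positive` marks the class of
    interest (here, hypothetically, the diseased samples).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are 0 (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Usage: 2 true positives, 1 false positive, 1 false negative.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
print(round(f1_score(y_true, y_pred), 4))
```

With precision and recall both equal to 2/3, the F1 score is also 2/3; scores near 0.91, as for LR with SMOTEENN, mean both precision and recall are high on the positive class.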
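SMOTEENN pairs SMOTE oversampling with an edited-nearest-neighbors (ENN) cleaning step, which is presumably where the "three nearest neighbors" in the note come from. The cleaning step can be sketched in pure Python as Wilson's ENN rule: a sample is dropped when its label disagrees with the majority label of its k = 3 nearest neighbors. This is an illustration under stated assumptions (Euclidean distance, majority vote, every class cleaned, SMOTE step omitted), not the authors' pipeline; library implementations such as imbalanced-learn may use different defaults.

```python
import math
from collections import Counter


def edited_nearest_neighbours(X, y, k=3):
    """Wilson's ENN: drop each sample whose label disagrees with the
    majority label of its k nearest neighbours (Euclidean distance).

    Assumption: all classes are cleaned and ties in distance are broken
    by sample index; this is a sketch, not a library implementation.
    """
    keep = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        # Distances from sample i to every other sample, nearest first.
        dists = sorted(
            (math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i
        )
        neighbour_labels = [y[j] for _, j in dists[:k]]
        majority, _ = Counter(neighbour_labels).most_common(1)[0]
        if majority == yi:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


# Usage: (0.5, 0.5) is labelled 1 but sits inside the class-0 cluster.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (0.5, 0.5)]
y = [0, 0, 0, 0, 1, 1, 1, 1]
X_clean, y_clean = edited_nearest_neighbours(X, y)
print(len(X_clean), y_clean)
```

Only the mislabelled-looking point is removed, because its three nearest neighbors all belong to class 0; with two classes and k = 3 the majority vote can never tie.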