Research Article

A Semi-Automated Usability Evaluation Framework for Interactive Image Segmentation Systems

Table 3

Relative absolute prediction errors for AttrakDiff-2 and SUS test set samples. Predictions are computed by six separately trained Stochastic Gradient Boosting Regression Forests (GBRFs), one for each figure of merit. Each training process uses only the interaction log data. The values shown are medians over randomly initialized training processes.

Relative Error   ATT     HQ      HQ-I    HQ-S    PQ      SUS
Mean             11.5%   7.4%    10.5%   8.0%    15.7%   10.4%
Median           8.9%    6.3%    9.4%    6.2%    13.7%   8.8%
Std              8.0%    5.5%    6.7%    6.9%    12.0%   7.1%
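
As an illustration of how summary rows like those in Table 3 could be computed, the following minimal sketch derives per-sample relative absolute errors and their mean, median, and standard deviation. It rests on assumptions not stated in the caption: the error is taken to be |predicted - actual| / |actual|, the standard deviation is the population form, and the function names and score values are hypothetical, made up purely for demonstration. The article may normalize the error differently, for example by the range of the rating scale.

```python
from statistics import mean, median, pstdev


def relative_absolute_errors(predicted, actual):
    """Return per-sample |predicted - actual| / |actual|.

    Assumption: errors are normalized by the true score. A true score of
    zero is rejected because the ratio is undefined in that case.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if any(a == 0 for a in actual):
        raise ValueError("actual scores must be non-zero for a relative error")
    return [abs(p - a) / abs(a) for p, a in zip(predicted, actual)]


def summarize(errors):
    """Mean, median, and population standard deviation of the errors."""
    return {
        "mean": mean(errors),
        "median": median(errors),
        "std": pstdev(errors),  # assumption: population, not sample, std
    }


# Hypothetical questionnaire scores on a 0-100 scale (not data from the study).
actual = [80.0, 50.0, 40.0, 100.0]
predicted = [72.0, 55.0, 42.0, 80.0]

errors = relative_absolute_errors(predicted, actual)
summary = summarize(errors)

for name, value in summary.items():
    print(f"{name}: {value:.1%}")
```

In this sketch each figure of merit (ATT, HQ, HQ-I, HQ-S, PQ, SUS) would be summarized separately by calling `summarize` on the errors of its own regressor. Taking the median of these summaries over several randomly initialized training runs would then correspond to the aggregation described in the caption.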