Research Article
A Deep Multimodal Model for Predicting Affective Responses Evoked by Movies Based on Shot Segmentation
Table 4
Comparison of state-of-the-art results for experienced emotion prediction.
| Features | Arousal (loss1) MSE | Arousal (loss1) PCC | Valence (loss2) MSE | Valence (loss2) PCC |
| --- | --- | --- | --- | --- |
| All features | 0.0275 | 0.6187 | 0.0632 | 0.3443 |
| −Action features | 0.0291 | 0.6038 | 0.0673 | 0.3259 |
| −Face features | 0.0277 | 0.6136 | 0.0637 | 0.3667 |
| −Person features | 0.0280 | 0.6181 | 0.0653 | 0.3726 |
| −Place features | 0.0280 | 0.5981 | 0.0663 | 0.3315 |
| −VGGish features | 0.0290 | 0.5952 | 0.0669 | 0.3444 |
| −OpenSMILE features | 0.0295 | 0.6003 | 0.0666 | 0.3345 |
| All_visual_features | 0.0316 | 0.4931 | 0.0751 | 0.2694 |
| All_audio_features | 0.0297 | 0.6141 | 0.0726 | 0.3356 |
“−” indicates that the corresponding feature is excluded.