Research Article
Emotional Video to Audio Transformation Using Deep Recurrent Neural Networks and a Neuro-Fuzzy System
Table 2
Extended MOS and MAE results with 8 samples.
| Sample | MAE | Target MOS (Valence) | Target MOS (Arousal) | Obtained MOS (Valence) | Obtained MOS (Arousal) |
| --- | --- | --- | --- | --- | --- |
| 1 | 0.193 | 8.0 | 8.6 | 4.43 ± 1.99 | 5.14 ± 2.10 |
| 2 | 0.142 | 3.8 | 5.2 | 5.57 ± 1.84 | 5.57 ± 1.59 |
| 3 | 0.234 | 6.4 | 8.0 | 4.71 ± 1.58 | 5.00 ± 1.78 |
| 4 | 0.203 | 6.4 | 6.8 | 4.71 ± 1.39 | 3.71 ± 1.03 |
| 5 | 0.280 | 6.6 | 7.0 | 6.43 ± 1.29 | 6.57 ± 1.05 |
| 6 | 0.262 | 6.0 | 6.8 | 3.00 ± 0.53 | 4.29 ± 0.88 |
| 7 | 0.206 | 7.2 | 6.8 | 5.86 ± 1.46 | 4.71 ± 2.05 |
| 8 | 0.219 | 5.2 | 7.2 | 5.43 ± 1.18 | 5.86 ± 1.36 |