Research Article
Emotional Video to Audio Transformation Using Deep Recurrent Neural Networks and a Neuro-Fuzzy System
Table 6
Human evaluation of 10 samples generated by the current model from the DEAP dataset.
| Sample | Target MOS (valence) | Target MOS (arousal) | Obtained MOS (valence) | Obtained MOS (arousal) |
|---|---|---|---|---|
| 1 | 5.39 | 5.32 | 5.40 ± 2.80 | 5.20 ± 2.91 |
| 2 | 5.58 | 5.18 | 4.10 ± 2.47 | 4.80 ± 2.74 |
| 3 | 5.02 | 5.29 | 5.00 ± 2.21 | 5.70 ± 2.50 |
| 4 | 5.53 | 5.55 | 6.00 ± 2.26 | 5.60 ± 2.84 |
| 5 | 5.01 | 5.44 | 4.60 ± 2.37 | 4.90 ± 2.69 |
| 6 | 5.49 | 4.20 | 4.40 ± 2.63 | 4.90 ± 2.81 |
| 7 | 5.30 | 5.04 | 5.40 ± 2.99 | 5.10 ± 3.18 |
| 8 | 4.72 | 5.50 | 4.70 ± 2.50 | 5.50 ± 2.51 |
| 9 | 5.13 | 5.21 | 3.60 ± 2.32 | 4.30 ± 3.16 |
| 10 | 5.11 | 4.92 | 3.80 ± 2.74 | 4.50 ± 3.27 |
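As a sketch of how the Table 6 values might be summarised, the snippet below computes the mean absolute difference between target and obtained MOS for valence and arousal, and counts how many samples have their target within one spread of the obtained mean on both dimensions. It assumes the column reading Sample | Target MOS (valence, arousal) | Obtained MOS (valence, arousal) and that the "±" value is a standard deviation across raters. The summary metrics are an illustrative choice, not measures reported in the study.

```python
from statistics import mean
from typing import NamedTuple


class Row(NamedTuple):
    target_v: float    # target MOS, valence
    target_a: float    # target MOS, arousal
    obtained_v: float  # obtained MOS, valence (mean)
    sd_v: float        # assumed: "±" value is a standard deviation
    obtained_a: float  # obtained MOS, arousal (mean)
    sd_a: float        # assumed: "±" value is a standard deviation


# Values transcribed from Table 6, samples 1-10.
TABLE6 = [
    Row(5.39, 5.32, 5.40, 2.80, 5.20, 2.91),
    Row(5.58, 5.18, 4.10, 2.47, 4.80, 2.74),
    Row(5.02, 5.29, 5.00, 2.21, 5.70, 2.50),
    Row(5.53, 5.55, 6.00, 2.26, 5.60, 2.84),
    Row(5.01, 5.44, 4.60, 2.37, 4.90, 2.69),
    Row(5.49, 4.20, 4.40, 2.63, 4.90, 2.81),
    Row(5.30, 5.04, 5.40, 2.99, 5.10, 3.18),
    Row(4.72, 5.50, 4.70, 2.50, 5.50, 2.51),
    Row(5.13, 5.21, 3.60, 2.32, 4.30, 3.16),
    Row(5.11, 4.92, 3.80, 2.74, 4.50, 3.27),
]

# Mean absolute difference between target and obtained MOS, per dimension.
mae_valence = mean(abs(r.target_v - r.obtained_v) for r in TABLE6)
mae_arousal = mean(abs(r.target_a - r.obtained_a) for r in TABLE6)

# Samples whose target lies within one (assumed) standard deviation of the
# obtained mean on both valence and arousal.
within_one_sd = sum(
    abs(r.target_v - r.obtained_v) <= r.sd_v
    and abs(r.target_a - r.obtained_a) <= r.sd_a
    for r in TABLE6
)

print(f"MAE valence: {mae_valence:.3f}")  # MAE valence: 0.644
print(f"MAE arousal: {mae_arousal:.3f}")  # MAE arousal: 0.359
print(f"Within one SD on both: {within_one_sd}/{len(TABLE6)}")  # Within one SD on both: 10/10
```

Because the rater spreads (roughly 2.2 to 3.3 points) are large relative to the target-versus-obtained gaps, a per-sample deviation alone says little about significance; the one-spread count is shown only as a rough companion to the mean absolute error.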