Research Article

Emotional Video to Audio Transformation Using Deep Recurrent Neural Networks and a Neuro-Fuzzy System

Table 6

Human evaluation of 10 samples generated with the DEAP dataset using the current model.

Sample   Target MOS            Obtained MOS
         Valence   Arousal     Valence        Arousal

1        5.39      5.32        5.40 ± 2.80    5.20 ± 2.91
2        5.58      5.18        4.10 ± 2.47    4.80 ± 2.74
3        5.02      5.29        5.00 ± 2.21    5.70 ± 2.50
4        5.53      5.55        6.00 ± 2.26    5.60 ± 2.84
5        5.01      5.44        4.60 ± 2.37    4.90 ± 2.69
6        5.49      4.20        4.40 ± 2.63    4.90 ± 2.81
7        5.30      5.04        5.40 ± 2.99    5.10 ± 3.18
8        4.72      5.50        4.70 ± 2.50    5.50 ± 2.51
9        5.13      5.21        3.60 ± 2.32    4.30 ± 3.16
10       5.11      4.92        3.80 ± 2.74    4.50 ± 3.27