Research Article

Emotional Video to Audio Transformation Using Deep Recurrent Neural Networks and a Neuro-Fuzzy System

Figure 10

Spectrograms of the expected and estimated audio output: (a, b) for the Crystallize video excerpt, with MAE = 0.280 and MOS (valence, arousal) = (6.43 ± 1.29, 6.57 ± 1.05); (c, d) for the LOTR video excerpt, with MAE = 0.206 and MOS (valence, arousal) = (, ).
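The caption reports an MAE between the expected and estimated spectrograms and gives each MOS entry as mean ± standard deviation. The Python sketch below shows one way such figures could be computed. It rests on several assumptions the caption does not confirm. The two spectrograms have the same shape and are already normalized to a common range. The MAE is averaged over every time-frequency bin. The ± term is the sample standard deviation of the listener ratings. The names `spectrogram_mae` and `mos` are hypothetical and do not come from the article.

```python
from statistics import mean, stdev


def spectrogram_mae(expected, estimated):
    """Mean absolute error over all time-frequency bins of two spectrograms.

    Assumption (not from the article): both inputs are lists of frames,
    each frame a list of magnitudes already normalized to the same range.
    """
    if len(expected) != len(estimated):
        raise ValueError("spectrograms must have the same number of frames")
    total, count = 0.0, 0
    for frame_exp, frame_est in zip(expected, estimated):
        if len(frame_exp) != len(frame_est):
            raise ValueError("frames must have the same number of bins")
        # Accumulate |expected - estimated| bin by bin.
        for a, b in zip(frame_exp, frame_est):
            total += abs(a - b)
            count += 1
    if count == 0:
        raise ValueError("spectrograms must not be empty")
    return total / count


def mos(ratings):
    """Mean opinion score as (mean, sample standard deviation).

    Assumption: the '±' value in the caption is the sample standard deviation.
    """
    return mean(ratings), stdev(ratings)


if __name__ == "__main__":
    # Toy 2x2 spectrograms, for illustration only (not the article's data).
    expected = [[0.0, 0.5], [1.0, 0.25]]
    estimated = [[0.25, 0.5], [0.5, 0.25]]
    print(spectrogram_mae(expected, estimated))  # 0.1875
    print(mos([6, 7, 8]))  # (7, 1.0)
```

The sketch computes the MAE over raw bins rather than over per-frame averages. With equal frame lengths, the two approaches give the same result. If the article normalized or log-scaled the spectrograms differently, the absolute MAE values would change.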