Research Article
Emotional Video to Audio Transformation Using Deep Recurrent Neural Networks and a Neuro-Fuzzy System
Figure 10
Spectrograms of the expected and estimated audio output: (a, b) for the Crystallize video excerpt, with MAE = 0.280 and MOS (valence, arousal) = (6.43 ± 1.29, 6.57 ± 1.05); (c, d) for the LOTR video excerpt, with MAE = 0.206 and MOS (valence, arousal) = (, ).