Review Article

The Understanding of Deep Learning: A Comprehensive Review

Figure 3

From image to text. Captions generated by a recurrent neural network (RNN) that takes, as extra input, the representation extracted from a test image by a deep convolutional neural network (CNN); the RNN is trained to “translate” high-level image representations into captions. When the RNN is allowed to focus its attention on a different location in the input image (middle and bottom; lighter patches received more attention) as it generates each word (underlined), it exploits this ability to produce better “translations” of images into captions (https://casmls.github.io/general/2016/10/16/attention_model.html).
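The attention step the caption describes, where the decoder weighs image locations differently as it emits each word, can be sketched as additive soft attention over CNN feature vectors. The sketch below is a minimal NumPy illustration under assumed dimensions; the function name `soft_attention` and all weight matrices are hypothetical, not the figure's actual model.

```python
import numpy as np

def soft_attention(features, hidden, W_f, W_h, v):
    # features: (L, D) CNN feature vectors, one per image location
    # hidden:   (H,)   current RNN decoder state
    # Additive attention: score each location against the decoder state
    scores = np.tanh(features @ W_f + hidden @ W_h) @ v   # (L,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                              # softmax over locations
    context = weights @ features                          # (D,) attended image summary
    return context, weights

rng = np.random.default_rng(0)
L, D, H, A = 6, 4, 5, 3   # locations, feature dim, hidden dim, attention dim
features = rng.normal(size=(L, D))
hidden = rng.normal(size=H)
W_f = rng.normal(size=(D, A))
W_h = rng.normal(size=(H, A))
v = rng.normal(size=A)

context, weights = soft_attention(features, hidden, W_f, W_h, v)
print(weights)  # the "lighter patches" in the figure correspond to larger weights
```

At each decoding step the context vector is fed to the RNN alongside the previous word, so the weights shift from location to location as the sentence is generated.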