Features to Text: A Comprehensive Survey of Deep Learning on Semantic Segmentation and Image Captioning

Sample architecture of a multimodal image captioning network [111].

Complexity

Figure 6