Research Article

Features to Text: A Comprehensive Survey of Deep Learning on Semantic Segmentation and Image Captioning

Figure 6: Sample architecture of a multimodal image captioning network [111].