Research Article

Features to Text: A Comprehensive Survey of Deep Learning on Semantic Segmentation and Image Captioning

Table 2

Class pixel label distribution in the CamVid dataset.

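| Dataset | Method | B-1 | B-2 | B-3 | B-4 | M | C |
|---|---|---|---|---|---|---|---|
| MS COCO | LSTM-A-2 [179] | 0.734 | 0.567 | 0.430 | 0.326 | 0.254 | 1.00 |
| MS COCO | Att-Reg [180] | 0.740 | 0.560 | 0.420 | 0.310 | 0.260 | - |
| MS COCO | Attend-tell [156] | 0.707 | 0.492 | 0.344 | 0.243 | 0.239 | - |
| MS COCO | SGC [181] | 67.1 | 48.8 | 34.3 | 23.9 | 21.8 | 73.3 |
| MS COCO | phi-LSTM [182] | 66.6 | 48.9 | 35.5 | 25.8 | 23.1 | 82.1 |
| MS COCO | COMIC [183] | 70.6 | 53.4 | 39.5 | 29.2 | 23.7 | 88.1 |
| MS COCO | TBVA [184] | 69.5 | 52.1 | 38.6 | 28.7 | 24.1 | 91.9 |
| MS COCO | SCN [185] | 0.741 | 0.578 | 0.444 | 0.341 | 0.261 | 1.041 |
| MS COCO | CLGRU [186] | 0.720 | 0.550 | 0.410 | 0.300 | 0.240 | 0.960 |
| MS COCO | A-Penalty [187] | 72.1 | 55.1 | 41.5 | 31.4 | 24.7 | 95.6 |
| MS COCO | VD-SAN [188] | 73.4 | 56.6 | 42.8 | 32.2 | 25.4 | 99.9 |
| MS COCO | ATT-CNN [189] | 73.9 | 57.1 | 43.3 | 33 | 26 | 101.6 |
| MS COCO | RTAN [190] | 73.5 | 56.9 | 43.3 | 32.9 | 25.4 | 103.3 |
| MS COCO | Adaptive [191] | 0.742 | 0.580 | 0.439 | 0.332 | 0.266 | 1.085 |
| MS COCO | Full-SL [192] | 0.713 | 0.539 | 0.403 | 0.304 | 0.251 | 0.937 |
| Flickr30K | hLSTMat [193] | 73.8 | 55.1 | 40.3 | 29.4 | 23 | 66.6 |
| Flickr30K | SGC [181] | 61.5 | 42.1 | 28.6 | 19.3 | 18.2 | 39.9 |
| Flickr30K | RA + SF [194] | 0.649 | 0.462 | 0.324 | 0.224 | 0.194 | 0.472 |
| Flickr30K | gLSTM [195] | 0.646 | 0.446 | 0.305 | 0.206 | 0.179 | - |
| Flickr30K | Multi-Mod [196] | 0.600 | 0.380 | 0.254 | 0.171 | 0.169 | - |
| Flickr30K | TBVA [184] | 66.6 | 48.4 | 34.6 | 24.7 | 20.2 | 52.4 |
| Flickr30K | Attend-tell [156] | 0.669 | 0.439 | 0.296 | 0.199 | 0.185 | - |
| Flickr30K | ATT-FCN [158] | 0.647 | 0.460 | 0.324 | 0.230 | 0.189 | - |
| Flickr30K | VQA [197] | 0.730 | 0.550 | 0.400 | 0.280 | - | - |
| Flickr30K | Align-Mod [144] | 0.573 | 0.369 | 0.240 | 0.157 | - | - |
| Flickr30K | m-RNN [198] | 0.600 | 0.410 | 0.280 | 0.190 | - | - |
| Flickr30K | LRCN [112] | 0.587 | 0.391 | 0.251 | 0.165 | - | - |
| Flickr30K | NIC [141] | 0.670 | 0.450 | 0.300 | - | - | - |
| Flickr30K | RTAN [190] | 67.1 | 48.7 | 34.9 | 23.9 | 20.1 | 53.3 |
| Flickr30K | 3-gated [199] | 69.4 | 45.7 | 33.2 | 22.6 | 23 | - |
| Flickr30K | VD-SAN [188] | 65.2 | 47.1 | 33.6 | 23.9 | 19.9 | - |
| Flickr30K | ATT-CNN [189] | 66.1 | 47.2 | 33.4 | 23.2 | 19.4 | - |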
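
The scores in this table are reported on two scales: some rows give fractions (for example 0.742) and others give percentages (for example 73.5), so the rows cannot be compared directly as printed. The following is a minimal Python sketch of one way to bring them onto a common 0 to 100 scale before ranking. The names `to_percent`, `normalize_row`, and `rank_by`, the 1.5 cutoff, and the dictionary layout are assumptions made for illustration and do not come from the surveyed papers. The cutoff relies on the observation that every fraction-scale value in the table is below 1.5, while every percentage-scale value is well above it.

```python
from typing import Dict, List, Optional

METRICS = ["B-1", "B-2", "B-3", "B-4", "M", "C"]


def to_percent(score: Optional[float], cutoff: float = 1.5) -> Optional[float]:
    """Map a score to the 0-100 scale.

    Assumption: values at or below `cutoff` were reported as fractions
    and are multiplied by 100; larger values are already percentages.
    Missing scores (None) are passed through unchanged.
    """
    if score is None:
        return None
    return round(score * 100, 1) if score <= cutoff else score


def normalize_row(row: Dict[str, Optional[float]]) -> Dict[str, Optional[float]]:
    """Normalize every metric of one method; absent metrics become None."""
    return {m: to_percent(row.get(m)) for m in METRICS}


def rank_by(rows: Dict[str, Dict[str, Optional[float]]], metric: str) -> List[str]:
    """Return method names sorted by `metric`, best first, skipping unreported scores."""
    normalized = {name: normalize_row(r) for name, r in rows.items()}
    reported = {n: r[metric] for n, r in normalized.items() if r[metric] is not None}
    return sorted(reported, key=reported.get, reverse=True)


# Three MS COCO rows copied from the table above.
COCO_ROWS = {
    "Adaptive [191]": {"B-1": 0.742, "B-2": 0.580, "B-3": 0.439,
                       "B-4": 0.332, "M": 0.266, "C": 1.085},
    "RTAN [190]": {"B-1": 73.5, "B-2": 56.9, "B-3": 43.3,
                   "B-4": 32.9, "M": 25.4, "C": 103.3},
    "Att-Reg [180]": {"B-1": 0.740, "B-2": 0.560, "B-3": 0.420,
                      "B-4": 0.310, "M": 0.260, "C": None},
}

if __name__ == "__main__":
    for name in rank_by(COCO_ROWS, "C"):
        print(name, normalize_row(COCO_ROWS[name])["C"])
```

Methods with an unreported score are dropped from the ranking for that metric rather than treated as zero, which keeps a missing cell from being read as a poor result.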