Research Article

Features to Text: A Comprehensive Survey of Deep Learning on Semantic Segmentation and Image Captioning

Table 1

Class pixel label distribution in the CamVid dataset.

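Dataset     Method              mIoU (%)

CamVid      ApesNet [90]        48.0
CamVid      ENet [91]           51.3
CamVid      SegNet [60]         55.6
CamVid      LinkNet [92]        55.8
CamVid      FCN8 [59]           57.0
CamVid      AttentionM [93]     60.1
CamVid      DeepLab-LFOV [72]   61.6
CamVid      Dilation8 [66]      65.3
CamVid      BiseNet [94]        68.7
CamVid      PSPNet [60]         69.1
CamVid      DenseDecoder [67]   70.9
CamVid      AGNet [95]          75.2

PASCAL VOC  Wails [96]          55.9
PASCAL VOC  FCN8 [59]           62.2
PASCAL VOC  PSP-CRF [97]        65.4
PASCAL VOC  Zoom Out [98]       69.6
PASCAL VOC  DCU [99]            71.7
PASCAL VOC  DeepLab1 [72]       71.6
PASCAL VOC  DeConvNet [61]      72.5
PASCAL VOC  GCRF [100]          73.2
PASCAL VOC  DPN [101]           74.1
PASCAL VOC  Piecewise [102]     75.3

Cityscapes  FCN8 [59]           65.3
Cityscapes  DPN [101]           66.8
Cityscapes  Dilation10 [103]    67.1
Cityscapes  LRR [104]           69.7
Cityscapes  DeepLab2 [73]       70.4
Cityscapes  FRRN [105]          71.8
Cityscapes  RefineNet [106]     73.6
Cityscapes  GEM [107]           73.69
Cityscapes  PEARL [108]         75.4
Cityscapes  TuSimple [109]      77.6
Cityscapes  PSPNet [110]        78.4
Cityscapes  SPP-DCU [99]        78.9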
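
The mIoU figures reported above are mean intersection-over-union scores, expressed as percentages. As a reminder of what the metric measures, the following is a minimal sketch in plain Python, assuming flat lists of integer class labels. The function name `mean_iou` and the toy labels are illustrative and do not come from any of the cited works. Benchmark protocols usually accumulate intersections and unions over the whole test set and may ignore certain labels, which this sketch does not do.

```python
def mean_iou(y_true, y_pred, num_classes):
    """Mean intersection-over-union over the classes present in either input."""
    ious = []
    for c in range(num_classes):
        # Pixels where both ground truth and prediction are class c.
        intersection = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        # Pixels where either ground truth or prediction is class c.
        union = sum(1 for t, p in zip(y_true, y_pred) if t == c or p == c)
        if union:  # skip classes that appear in neither input
            ious.append(intersection / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: six "pixels" and three classes (hypothetical data).
labels = [0, 0, 1, 1, 2, 2]
preds = [0, 1, 1, 1, 2, 0]
score = mean_iou(labels, preds, 3)
print(f"mIoU = {100 * score:.1f}%")
```

Here the per-class IoU values are 1/3, 2/3, and 1/2, so their mean is 0.5, which would appear as 50.0 in a table like the one above.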