Research Article
Features to Text: A Comprehensive Survey of Deep Learning on Semantic Segmentation and Image Captioning
Table 1
Class pixel label distribution in the CamVid dataset.
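| Dataset | Method | mIoU |
| --- | --- | --- |
| CamVid | ApesNet [90] | 48.0 |
| CamVid | ENet [91] | 51.3 |
| CamVid | SegNet [60] | 55.6 |
| CamVid | LinkNet [92] | 55.8 |
| CamVid | FCN8 [59] | 57.0 |
| CamVid | AttentionM [93] | 60.1 |
| CamVid | DeepLab-LFOV [72] | 61.6 |
| CamVid | Dilation8 [66] | 65.3 |
| CamVid | BiseNet [94] | 68.7 |
| CamVid | PSPNet [60] | 69.1 |
| CamVid | DenseDecoder [67] | 70.9 |
| CamVid | AGNet [95] | 75.2 |
| PASCAL VOC | Wails [96] | 55.9 |
| PASCAL VOC | FCN8 [59] | 62.2 |
| PASCAL VOC | PSP-CRF [97] | 65.4 |
| PASCAL VOC | Zoom Out [98] | 69.6 |
| PASCAL VOC | DCU [99] | 71.7 |
| PASCAL VOC | DeepLab1 [72] | 71.6 |
| PASCAL VOC | DeConvNet [61] | 72.5 |
| PASCAL VOC | GCRF [100] | 73.2 |
| PASCAL VOC | DPN [101] | 74.1 |
| PASCAL VOC | Piecewise [102] | 75.3 |
| Cityscapes | FCN8 [59] | 65.3 |
| Cityscapes | DPN [101] | 66.8 |
| Cityscapes | Dilation10 [103] | 67.1 |
| Cityscapes | LRR [104] | 69.7 |
| Cityscapes | DeepLab2 [73] | 70.4 |
| Cityscapes | FRRN [105] | 71.8 |
| Cityscapes | RefineNet [106] | 73.6 |
| Cityscapes | GEM [107] | 73.69 |
| Cityscapes | PEARL [108] | 75.4 |
| Cityscapes | TuSimple [109] | 77.6 |
| Cityscapes | PSPNet [110] | 78.4 |
| Cityscapes | SPP-DCU [99] | 78.9 |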
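
The mIoU column reports mean intersection over union, conventionally the per-class ratio TP / (TP + FP + FN) averaged over classes and expressed as a percentage. The following is a minimal sketch of that computation in plain Python, not the evaluation code of any cited method. The helper names `confusion_matrix` and `mean_iou` and the toy labels are assumptions made for illustration, and the individual benchmarks may differ in details such as how void or ignored labels are handled.

```python
from typing import List, Sequence


def confusion_matrix(
    y_true: Sequence[int], y_pred: Sequence[int], num_classes: int
) -> List[List[int]]:
    """Count pixels; rows index the ground-truth class, columns the predicted class."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def mean_iou(cm: List[List[int]]) -> float:
    """Average per-class IoU = TP / (TP + FP + FN).

    Classes that appear in neither the ground truth nor the prediction
    are skipped (an assumption; benchmarks may treat them differently).
    """
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                         # class-c pixels labeled otherwise
        fp = sum(cm[r][c] for r in range(n)) - tp    # other pixels labeled as class c
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Hypothetical toy example: six flattened pixels, three classes.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, num_classes=3)
miou = mean_iou(cm)
print(f"mIoU = {100 * miou:.1f}")  # prints "mIoU = 50.0"
```

In this toy case the per-class IoU values are 1/3, 2/3, and 1/2, so their mean is 0.5, which would be reported as 50.0 on the scale used in the table.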