Research Article
Features to Text: A Comprehensive Survey of Deep Learning on Semantic Segmentation and Image Captioning
Table 2
Class pixel label distribution in the CamVid dataset.
| Dataset | Method | B-1 | B-2 | B-3 | B-4 | M | C |
|---|---|---|---|---|---|---|---|
| MS COCO | LSTM-A-2 [179] | 0.734 | 0.567 | 0.430 | 0.326 | 0.254 | 1.00 |
| MS COCO | Att-Reg [180] | 0.740 | 0.560 | 0.420 | 0.310 | 0.260 | — |
| MS COCO | Attend-tell [156] | 0.707 | 0.492 | 0.344 | 0.243 | 0.239 | — |
| MS COCO | SGC [181] | 67.1 | 48.8 | 34.3 | 23.9 | 21.8 | 73.3 |
| MS COCO | phi-LSTM [182] | 66.6 | 48.9 | 35.5 | 25.8 | 23.1 | 82.1 |
| MS COCO | COMIC [183] | 70.6 | 53.4 | 39.5 | 29.2 | 23.7 | 88.1 |
| MS COCO | TBVA [184] | 69.5 | 52.1 | 38.6 | 28.7 | 24.1 | 91.9 |
| MS COCO | SCN [185] | 0.741 | 0.578 | 0.444 | 0.341 | 0.261 | 1.041 |
| MS COCO | CLGRU [186] | 0.720 | 0.550 | 0.410 | 0.300 | 0.240 | 0.960 |
| MS COCO | A-Penalty [187] | 72.1 | 55.1 | 41.5 | 31.4 | 24.7 | 95.6 |
| MS COCO | VD-SAN [188] | 73.4 | 56.6 | 42.8 | 32.2 | 25.4 | 99.9 |
| MS COCO | ATT-CNN [189] | 73.9 | 57.1 | 43.3 | 33 | 26 | 101.6 |
| MS COCO | RTAN [190] | 73.5 | 56.9 | 43.3 | 32.9 | 25.4 | 103.3 |
| MS COCO | Adaptive [191] | 0.742 | 0.580 | 0.439 | 0.332 | 0.266 | 1.085 |
| MS COCO | Full-SL [192] | 0.713 | 0.539 | 0.403 | 0.304 | 0.251 | 0.937 |
| Flickr30K | hLSTMat [193] | 73.8 | 55.1 | 40.3 | 29.4 | 23 | 66.6 |
| Flickr30K | SGC [181] | 61.5 | 42.1 | 28.6 | 19.3 | 18.2 | 39.9 |
| Flickr30K | RA + SF [194] | 0.649 | 0.462 | 0.324 | 0.224 | 0.194 | 0.472 |
| Flickr30K | gLSTM [195] | 0.646 | 0.446 | 0.305 | 0.206 | 0.179 | — |
| Flickr30K | Multi-Mod [196] | 0.600 | 0.380 | 0.254 | 0.171 | 0.169 | — |
| Flickr30K | TBVA [184] | 66.6 | 48.4 | 34.6 | 24.7 | 20.2 | 52.4 |
| Flickr30K | Attend-tell [156] | 0.669 | 0.439 | 0.296 | 0.199 | 0.185 | — |
| Flickr30K | ATT-FCN [158] | 0.647 | 0.460 | 0.324 | 0.230 | 0.189 | — |
| Flickr30K | VQA [197] | 0.730 | 0.550 | 0.400 | 0.280 | — | — |
| Flickr30K | Align-Mod [144] | 0.573 | 0.369 | 0.240 | 0.157 | — | — |
| Flickr30K | m-RNN [198] | 0.600 | 0.410 | 0.280 | 0.190 | — | — |
| Flickr30K | LRCN [112] | 0.587 | 0.391 | 0.251 | 0.165 | — | — |
| Flickr30K | NIC [141] | 0.670 | 0.450 | 0.300 | — | — | — |
| Flickr30K | RTAN [190] | 67.1 | 48.7 | 34.9 | 23.9 | 20.1 | 53.3 |
| Flickr30K | 3-gated [199] | 69.4 | 45.7 | 33.2 | 22.6 | 23 | — |
| Flickr30K | VD-SAN [188] | 65.2 | 47.1 | 33.6 | 23.9 | 19.9 | — |
| Flickr30K | ATT-CNN [189] | 66.1 | 47.2 | 33.4 | 23.2 | 19.4 | — |