Research Article
Context-Fused Guidance for Image Captioning Using Sequence-Level Training
Table 3
Performance comparisons on MS COCO Karpathy test split under CIDEr-D score optimization.
| Method (sequence-level optimization) | BLEU1 | BLEU2 | BLEU3 | BLEU4 | METEOR | ROUGE-L | CIDEr | SPICE |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| NIC [16] | — | — | — | 31.9 | — | 54.3 | 106.3 | — |
| SCST [11] | — | — | — | 34.2 | 26.7 | 55.7 | 114.0 | — |
| Up-down [4] | 79.8 | — | — | 36.3 | 27.7 | 56.9 | 120.1 | 21.4 |
| RFNet [17] | 79.1 | 63.1 | 48.4 | 36.5 | 27.7 | 57.3 | 121.9 | 21.2 |
| HAN [20] | **80.9** | 64.6 | 49.8 | 37.6 | 27.8 | 58.1 | 121.7 | 21.5 |
| RAtt-soft [29] | 80.4 | 63.4 | 48.9 | 37.5 | **28.5** | **61.6** | 122.1 | **22.1** |
| CFG | 80.5 | **64.7** | **50.2** | **38.3** | 28.2 | 58.3 | **125.4** | 21.6 |
The best results (%) are highlighted in boldface. The symbol “—” indicates that the result was not reported.