Research Article
A Symmetric Fusion Learning Model for Detecting Visual Relations and Scene Parsing
Table 1
Comparison with state-of-the-art methods on the Visual Genome (VG) dataset. Recall@K (R@20/50/100) is reported for the SGGen, SGCls, and PredCls tasks.
| Method | SGGen R@20 | SGGen R@50 | SGGen R@100 | SGCls R@20 | SGCls R@50 | SGCls R@100 | PredCls R@20 | PredCls R@50 | PredCls R@100 |
|---|---|---|---|---|---|---|---|---|---|
| IMP [29] | 14.6 | 20.7 | 24.5 | 31.7 | 34.6 | 35.4 | 52.7 | 59.3 | 61.3 |
| Frequency [31] | 17.7 | 23.5 | 27.6 | 27.7 | 32.4 | 34.0 | 49.4 | 59.9 | 64.1 |
| Frequency + overlap [31] | 20.1 | 26.2 | 30.1 | 29.3 | 32.3 | 32.9 | 53.6 | 60.6 | 62.2 |
| MotifNet-LeftRight [31] | 21.4 | 27.2 | 30.3 | 32.9 | 35.8 | 36.5 | 58.5 | 65.2 | 67.1 |
| Graph R-CNN [39] | — | 11.4 | 13.7 | — | 29.6 | 31.6 | — | 54.2 | 59.1 |
| VCTREE-SL [40] | 21.7 | 27.7 | 31.1 | 35.0 | 37.9 | 38.6 | 59.8 | 66.2 | 67.9 |
| RelDN [37] | 21.1 | 28.3 | 32.7 | 36.1 | 36.8 | 36.8 | 66.9 | 68.4 | 68.4 |
| VCTREE + TranstextNet [41] | — | 28.1 | 31.7 | — | 38.3 | 39.3 | — | 66.9 | 68.7 |
| Ours | | | | | | | | | |