Research Article

A Symmetric Fusion Learning Model for Detecting Visual Relations and Scene Parsing

Table 2

Comparison with state-of-the-art methods on the VRD data set.

                      Relation detection              Phrase detection
                      k = 1           k = 70          k = 1           k = 70
Method                R@50    R@100   R@50    R@100   R@50    R@100   R@50    R@100

VRD [3]               17.03   16.17   24.90   20.04   14.70   13.86   21.51   17.35
KL distillation [23]  19.17   21.34   22.68   31.89   23.14   24.03   26.32   29.43
Zoom-net [25]         18.92   21.41   21.37   27.30   24.82   28.09   29.05   37.34
CAI + SCA-M [25]      19.54   22.39   22.34   28.52   25.21   28.89   29.64   38.39
RelDN [37]            18.92   22.96   21.52   26.38   26.37   31.42   28.24   35.44
AVR [42]              22.83   25.41   27.35   32.96   29.33   33.27   34.51   41.36
GPS-Net [43]          21.50   24.30   -       -       28.90   34.00   -       -
SABRA [44]            24.47   29.16   27.27   33.99   30.57   36.80   33.39   41.79
Ours