A Symmetric Fusion Learning Model for Detecting Visual Relations and Scene Parsing

<div>The framework of our proposed approach. We employ Faster R-CNN with a pretrained VGG-16 backbone to extract the global features of the input image, and we initialize word vectors with fastText to obtain semantic embeddings. Our model contains three modules: (1) a visual module whose SRO branches extract the visual features of object proposals; (2) a semantic module that introduces external information through pretrained fastText word vectors; and (3) a symmetric learning module that alleviates noise through reverse supervision.</div>
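The caption says only that visual and semantic features are combined and that a symmetric module uses "reverse supervision". The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. Fusion is assumed to be plain concatenation followed by one linear layer. The helper names `fuse_and_score` and `symmetric_cross_entropy` are hypothetical. The symmetric loss is a swapped-in, commonly used form, symmetric cross-entropy: the standard cross-entropy term plus a reverse cross-entropy term, with log(0) clamped to a constant (here `log_zero = -4.0`, an assumed value).

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over predicate logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_and_score(visual: Sequence[float],
                   semantic: Sequence[float],
                   weights: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical late fusion: concatenate a visual feature (e.g. pooled
    from the SRO branches) with a semantic embedding (e.g. fastText vectors),
    then apply one linear layer with one weight row per predicate class."""
    fused = list(visual) + list(semantic)
    for row in weights:
        if len(row) != len(fused):
            raise ValueError("weight row length must match fused feature length")
    return [sum(w * x for w, x in zip(row, fused)) for row in weights]


def symmetric_cross_entropy(logits: Sequence[float], label: int,
                            alpha: float = 1.0, beta: float = 1.0,
                            log_zero: float = -4.0) -> float:
    """Assumed symmetric loss: alpha * CE + beta * reverse CE.

    CE  = -log p[label]             (label distribution q is one-hot)
    RCE = -sum_k p[k] * log q[k]    where log q[label] = 0 and log(0) is
                                    clamped to log_zero for other classes.
    """
    probs = softmax(logits)
    ce = -math.log(max(probs[label], 1e-12))  # guard against log(0)
    rce = -sum(p * (0.0 if k == label else log_zero)
               for k, p in enumerate(probs))
    return alpha * ce + beta * rce


if __name__ == "__main__":
    # Toy example: 1-d visual feature, 1-d semantic feature, 2 predicate classes.
    logits = fuse_and_score([1.0], [2.0], [[1.0, 1.0], [0.0, 1.0]])
    print(logits)
    print(symmetric_cross_entropy(logits, label=0))
```

The reverse term is bounded: with a one-hot label it reduces to `-log_zero * (1 - p[label])`. A confidently wrong prediction is therefore penalized less sharply than under cross-entropy alone, which is the usual argument for why such a term tolerates noisy annotations.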

Scientific Programming

Figure 3: The framework of our proposed approach.