Research Article
Visual Experience-Based Question Answering with Complex Multimodal Environments
Table 2
Performance analysis of visual experience-based question answering depending on different VQAS configurations.
Accuracy (%) by question type and in total:

| Configurations | Existence | Counting | Attribute | Relationship | Include | AgentHas | Total |
|---|---|---|---|---|---|---|---|
| VQAS | 91.96 | 79.60 | 68.53 | 61.16 | 56.24 | 63.21 | 72.37 |
| VQAS with GT 2D objects | 99.74 | 91.18 | 78.91 | 73.94 | 72.93 | 78.30 | 83.37 |
| VQAS with GT scene graph | 99.74 | 100.0 | 99.91 | 93.23 | 99.70 | 100.0 | 98.62 |
| VQAS with GT query | 92.22 | 79.60 | 68.53 | 64.18 | 56.24 | 63.21 | 72.95 |