Research Article

Visual Experience-Based Question Answering with Complex Multimodal Environments

Table 2

Performance analysis of visual experience-based question answering depending on different VQAS configurations.

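Accuracy (%) per question type and in total.

| Configuration | Existence | Counting | Attribute | Relationship | Include | AgentHas | Total |
|---|---|---|---|---|---|---|---|
| VQAS | 91.96 | 79.60 | 68.53 | 61.16 | 56.24 | 63.21 | 72.37 |
| VQAS with GT 2D objects | 99.74 | 91.18 | 78.91 | 73.94 | 72.93 | 78.30 | 83.37 |
| VQAS with GT scene graph | 99.74 | 100.0 | 99.91 | 93.23 | 99.70 | 100.0 | 98.62 |
| VQAS with GT query | 92.22 | 79.60 | 68.53 | 64.18 | 56.24 | 63.21 | 72.95 |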
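
To make the comparison in Table 2 easier to read, the sketch below computes how many percentage points each ground-truth (GT) configuration gains over the plain VQAS baseline. The accuracy values are copied from Table 2. The names `COLUMNS`, `ACCURACY`, and `gains_over_baseline` are hypothetical helpers introduced only for this illustration and are not part of the article's system.

```python
# Accuracy (%) per configuration, copied from Table 2.
# Column order: Existence, Counting, Attribute, Relationship, Include, AgentHas, Total.
COLUMNS = ["Existence", "Counting", "Attribute", "Relationship",
           "Include", "AgentHas", "Total"]

ACCURACY = {
    "VQAS": [91.96, 79.60, 68.53, 61.16, 56.24, 63.21, 72.37],
    "VQAS with GT 2D objects": [99.74, 91.18, 78.91, 73.94, 72.93, 78.30, 83.37],
    "VQAS with GT scene graph": [99.74, 100.0, 99.91, 93.23, 99.70, 100.0, 98.62],
    "VQAS with GT query": [92.22, 79.60, 68.53, 64.18, 56.24, 63.21, 72.95],
}


def gains_over_baseline(accuracy, baseline="VQAS"):
    """Return the percentage-point gain of each configuration over the baseline.

    The result maps each non-baseline configuration name to a list of
    differences, in the same column order as the input rows.
    """
    base = accuracy[baseline]
    return {
        name: [round(value - ref, 2) for value, ref in zip(values, base)]
        for name, values in accuracy.items()
        if name != baseline
    }


if __name__ == "__main__":
    # Print one block per GT configuration, one line per column.
    for name, deltas in gains_over_baseline(ACCURACY).items():
        print(name)
        for column, delta in zip(COLUMNS, deltas):
            print(f"  {column:<12} {delta:+.2f}")
```

On the Total column, the differences come out to 11.00 points for GT 2D objects, 26.25 points for the GT scene graph, and 0.58 points for the GT query.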