Research Article

Visual Experience-Based Question Answering with Complex Multimodal Environments

Table 1

Specification of the VEQA dataset.

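| Category        |                             | Count  |
|-----------------|-----------------------------|--------|
| Action scenario | Action scenarios            | 200    |
|                 | Actions per action scenario | 77     |
| Question        | Existence                   | 1,168  |
|                 | Counting                    | 1,168  |
|                 | Attribute                   | 1,168  |
|                 | Relation                    | 1,005  |
|                 | Include                     | 676    |
|                 | AgentHas                    | 212    |
|                 | Total questions             | 5,397  |
|                 | Vocabulary size             | 90     |
| Scene graph     | Scene graphs                | 3,916  |
|                 | Objects                     | 13,109 |
|                 | Attributes                  | 26,218 |
|                 | Relationships               | 25,583 |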
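
The counts in Table 1 can be cross-checked with a short script. The following is a minimal sketch, not part of the original dataset tooling: the numbers are copied from the table, and the variable names are illustrative assumptions.

```python
# Counts copied from Table 1; all names here are illustrative, not from the VEQA release.
question_counts = {
    "Existence": 1168,
    "Counting": 1168,
    "Attribute": 1168,
    "Relation": 1005,
    "Include": 676,
    "AgentHas": 212,
}
reported_total = 5397   # "Total questions" row
objects = 13109         # "Objects" row
attributes = 26218      # "Attributes" row

# Sum the per-type question counts and compare with the reported total.
computed_total = sum(question_counts.values())
print("Computed total:", computed_total)
print("Matches reported total:", computed_total == reported_total)

# Ratio of attribute annotations to objects in the scene graphs.
attributes_per_object = attributes / objects
print("Attributes per object:", attributes_per_object)
```

The six question types sum to the reported total of 5,397, and the attribute count is exactly twice the object count. The latter is consistent with two attributes per object on average, although the table itself does not state how attributes are assigned.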