Research Article

Visual Experience-Based Question Answering with Complex Multimodal Environments

Table 5

Performance analysis of 3D localization depending on different input information.

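| Input information | Output | Mean absolute error (per output) | Mean absolute error (sum) | Accuracy (%) |
|---|---|---|---|---|
| Depth Image + 2D Bbox | Position | 0.344 | 0.522 | 17.89 |
| | Size | 0.178 | | |
| Depth Image + 2D Bbox + Agent Pose | Position | 0.109 | 0.249 | 62.52 |
| | Size | 0.140 | | |
| Depth Image + 2D Bbox + Agent Pose + Object Class | Position | **0.089** | **0.216** | **78.93** |
| | Size | **0.127** | | |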

Bold values represent the best results.
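
As a reading aid, the following minimal sketch checks one arithmetic relationship in Table 5: on each row, the second mean-absolute-error value equals the position error plus the size error. That reading is an observation from the numbers, not a definition stated in the table, and the names `rows`, `sums_match`, `checks`, and `best` are hypothetical, introduced only for this illustration.

```python
# Values transcribed from Table 5:
# (position MAE, size MAE, second MAE value, accuracy in %)
rows = {
    "Depth Image + 2D Bbox": (0.344, 0.178, 0.522, 17.89),
    "Depth Image + 2D Bbox + Agent Pose": (0.109, 0.140, 0.249, 62.52),
    "Depth Image + 2D Bbox + Agent Pose + Object Class": (0.089, 0.127, 0.216, 78.93),
}


def sums_match(position_mae, size_mae, combined_mae, tol=1e-9):
    """Return True if position + size error equals the reported second value.

    The sum is rounded to three decimals, the precision used in the table.
    """
    return abs(round(position_mae + size_mae, 3) - combined_mae) < tol


# Assumption being tested: the second MAE value is the sum of the two per-output errors.
checks = {name: sums_match(p, s, c) for name, (p, s, c, _) in rows.items()}

# Input configuration with the highest reported accuracy.
best = max(rows, key=lambda name: rows[name][3])

if __name__ == "__main__":
    for name, ok in checks.items():
        print(f"{name}: sum matches = {ok}")
    print("Highest accuracy:", best)
```

Under this reading, each added input (agent pose, then object class) lowers both per-output errors and raises accuracy, with the full input set giving the best values in every column.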