Research Article
Visual Experience-Based Question Answering with Complex Multimodal Environments
Table 5
Performance analysis of 3D localization depending on different input information.
| Input information | Output | Mean absolute error | Mean absolute error (position + size) | Accuracy (%) |
| --- | --- | --- | --- | --- |
| Depth Image + 2D Bbox | Position | 0.344 | 0.522 | 17.89 |
| | Size | 0.178 | | |
| Depth Image + 2D Bbox + Agent Pose | Position | 0.109 | 0.249 | 62.52 |
| | Size | 0.140 | | |
| Depth Image + 2D Bbox + Agent Pose + Object Class | Position | **0.089** | **0.216** | **78.93** |
| | Size | **0.127** | | |
Bold values represent the best results.
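
The second error value reported for each input configuration (0.522, 0.249, 0.216) appears to be the sum of the position and size errors. This reading is inferred from the numbers alone and is not stated in the table, so the short check below is only a sketch under that assumption. The helper name `combined_mae` is hypothetical; the values are copied from Table 5.

```python
# Values copied from Table 5: (position MAE, size MAE, second reported error value).
ROWS = {
    "Depth Image + 2D Bbox": (0.344, 0.178, 0.522),
    "Depth Image + 2D Bbox + Agent Pose": (0.109, 0.140, 0.249),
    "Depth Image + 2D Bbox + Agent Pose + Object Class": (0.089, 0.127, 0.216),
}


def combined_mae(position_mae: float, size_mae: float) -> float:
    """Sum the two per-output errors, rounded to the table's three decimals.

    Assumption: the second error value in each row is position MAE + size MAE.
    """
    return round(position_mae + size_mae, 3)


def check_rows(rows: dict) -> dict:
    """Return, per input configuration, whether the sum matches the reported value."""
    return {
        name: abs(combined_mae(pos, size) - reported) < 1e-9
        for name, (pos, size, reported) in rows.items()
    }


if __name__ == "__main__":
    for name, matches in check_rows(ROWS).items():
        print(f"{name}: sum matches reported value = {matches}")
```

If the assumption holds, every row should match, which would also mean the second error value carries no information beyond the two per-output errors.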