Research Article

Deep Metric Learning-Assisted 3D Audio-Visual Speaker Tracking via Two-Layer Particle Filter

Table 1

Mean absolute error (MAE) in 3D (m) and on the image plane (pixels) for seq08, seq11, and seq12 across cameras 1, 2, and 3.

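| Seq | Camera | 3D MAE (m): [21] | 3D MAE (m): [12] | 3D MAE (m): Ours | 2D MAE (pixels): [5] | 2D MAE (pixels): [12] | 2D MAE (pixels): Ours |
|---|---|---|---|---|---|---|---|
| 08 | 1 | 0.15 | 0.12 | 0.08 | 10.75 | 4.31 | 3.24 |
| 08 | 2 | 0.24 | 0.11 | 0.07 | 7.33 | 4.66 | 3.11 |
| 08 | 3 | 0.20 | 0.09 | 0.07 | 9.85 | 5.34 | 3.28 |
| 11 | 1 | 0.31 | 0.33 | 0.23 | 14.66 | 8.15 | 6.04 |
| 11 | 2 | 0.29 | 0.14 | 0.07 | 14.01 | 7.48 | 5.13 |
| 11 | 3 | 0.26 | 0.12 | 0.08 | 13.96 | 6.64 | 4.06 |
| 12 | 1 | 0.41 | 0.26 | 0.18 | 12.49 | 6.86 | 4.15 |
| 12 | 2 | 0.51 | 0.17 | 0.10 | 10.81 | 10.67 | 5.19 |
| 12 | 3 | 0.47 | 0.20 | 0.13 | 11.86 | 9.71 | 5.58 |
| Average | | 0.32 | 0.17 | 0.11 | 11.75 | 7.09 | 4.42 |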
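
As a sanity check on the Average row, the short sketch below recomputes each column mean from the nine per-camera entries of Table 1. It assumes that the Average row is the unweighted mean over the nine sequence/camera pairs, rounded to two decimals; the names `TABLE_1`, `COLUMNS`, and `column_means` are illustrative and do not come from the paper.

```python
# Per-camera MAE values transcribed from Table 1.
# Key: (sequence, camera). Value order follows COLUMNS below.
COLUMNS = ["3D [21]", "3D [12]", "3D Ours", "2D [5]", "2D [12]", "2D Ours"]

TABLE_1 = {
    ("08", 1): (0.15, 0.12, 0.08, 10.75, 4.31, 3.24),
    ("08", 2): (0.24, 0.11, 0.07, 7.33, 4.66, 3.11),
    ("08", 3): (0.20, 0.09, 0.07, 9.85, 5.34, 3.28),
    ("11", 1): (0.31, 0.33, 0.23, 14.66, 8.15, 6.04),
    ("11", 2): (0.29, 0.14, 0.07, 14.01, 7.48, 5.13),
    ("11", 3): (0.26, 0.12, 0.08, 13.96, 6.64, 4.06),
    ("12", 1): (0.41, 0.26, 0.18, 12.49, 6.86, 4.15),
    ("12", 2): (0.51, 0.17, 0.10, 10.81, 10.67, 5.19),
    ("12", 3): (0.47, 0.20, 0.13, 11.86, 9.71, 5.58),
}


def column_means(table):
    """Return the unweighted mean of each column, rounded to two decimals.

    Assumption: every sequence/camera pair contributes equally to the average.
    """
    n_rows = len(table)
    # zip(*values) transposes the rows into columns.
    return [round(sum(col) / n_rows, 2) for col in zip(*table.values())]


if __name__ == "__main__":
    for name, mean in zip(COLUMNS, column_means(TABLE_1)):
        print(f"{name}: {mean:.2f}")
```

Under this assumption the recomputed means agree with the reported Average row for all six columns.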