Research Article
Deep Metric Learning-Assisted 3D Audio-Visual Speaker Tracking via Two-Layer Particle Filter
Table 1
MAE in 3D (m) and on the image plane (pixels) on seq08, seq11, and seq12 over cameras 1, 2, and 3.
| Seq | Camera | 3D MAE (m): [21] | 3D MAE (m): [12] | 3D MAE (m): Ours | 2D MAE (pixels): [5] | 2D MAE (pixels): [12] | 2D MAE (pixels): Ours |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 08 | 1 | 0.15 | 0.12 | 0.08 | 10.75 | 4.31 | 3.24 |
| 08 | 2 | 0.24 | 0.11 | 0.07 | 7.33 | 4.66 | 3.11 |
| 08 | 3 | 0.20 | 0.09 | 0.07 | 9.85 | 5.34 | 3.28 |
| 11 | 1 | 0.31 | 0.33 | 0.23 | 14.66 | 8.15 | 6.04 |
| 11 | 2 | 0.29 | 0.14 | 0.07 | 14.01 | 7.48 | 5.13 |
| 11 | 3 | 0.26 | 0.12 | 0.08 | 13.96 | 6.64 | 4.06 |
| 12 | 1 | 0.41 | 0.26 | 0.18 | 12.49 | 6.86 | 4.15 |
| 12 | 2 | 0.51 | 0.17 | 0.10 | 10.81 | 10.67 | 5.19 |
| 12 | 3 | 0.47 | 0.20 | 0.13 | 11.86 | 9.71 | 5.58 |
| Average | | 0.32 | 0.17 | 0.11 | 11.75 | 7.09 | 4.42 |
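The Average row of Table 1 appears to be the unweighted mean of each column over the nine sequence/camera pairs. The following is a minimal Python sketch that checks this reading against the reported values; the helper name `column_means` is hypothetical, and the rounding to two decimals is an assumption chosen to match the table's precision.

```python
# Per-row values transcribed from Table 1, in row order
# (seq08 cam1-3, seq11 cam1-3, seq12 cam1-3).
# Columns: 3D MAE (m) for [21], [12], Ours; 2D MAE (pixels) for [5], [12], Ours.
rows = [
    (0.15, 0.12, 0.08, 10.75, 4.31, 3.24),
    (0.24, 0.11, 0.07, 7.33, 4.66, 3.11),
    (0.20, 0.09, 0.07, 9.85, 5.34, 3.28),
    (0.31, 0.33, 0.23, 14.66, 8.15, 6.04),
    (0.29, 0.14, 0.07, 14.01, 7.48, 5.13),
    (0.26, 0.12, 0.08, 13.96, 6.64, 4.06),
    (0.41, 0.26, 0.18, 12.49, 6.86, 4.15),
    (0.51, 0.17, 0.10, 10.81, 10.67, 5.19),
    (0.47, 0.20, 0.13, 11.86, 9.71, 5.58),
]


def column_means(table):
    """Unweighted mean of each column, rounded to two decimals (assumed precision)."""
    n = len(table)
    return [round(sum(col) / n, 2) for col in zip(*table)]


averages = column_means(rows)
print(averages)
```

Under this assumption the computed means agree with the Average row, which suggests each sequence/camera pair is weighted equally rather than by number of frames.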