Real-Time Audio-Visual Analysis for Multiperson Videoconferencing
Figure 13
Recall versus precision for face detection, face tracking, and speaker match (Dataset 2). Speaker match performance is lower on Dataset 2 because four participants are located within a 100° sector.
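For readers less familiar with the two axes, the following is a minimal sketch of how a single precision/recall point can be computed from detection counts. It assumes the standard definitions (precision = TP / (TP + FP), recall = TP / (TP + FN)); the function name and the example counts are hypothetical and are not taken from the paper's data.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) from raw counts.

    Assumes the standard definitions; this is not the authors' evaluation code.
    """
    predicted = true_positives + false_positives   # everything the system reported
    actual = true_positives + false_negatives      # everything in the ground truth
    # Guard against empty denominators by returning 0.0 in that case.
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


# Hypothetical counts for illustration only.
p, r = precision_recall(true_positives=80, false_positives=20, false_negatives=20)
```

Sweeping a detection threshold and recomputing the pair at each setting would yield one curve of the kind plotted in the figure.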