Research Article

Real-Time Audio-Visual Analysis for Multiperson Videoconferencing

Figure 11

Evaluations at different processing steps. Two scorings of the results are used: (a) temporally weighted scoring evaluates the algorithms in terms of the amount of time; (b) event-level scoring evaluates the algorithms in terms of the number of discrete events. The upper blocks show the baseline precision/recall values; the subsequent blocks show the precision/recall values achieved after each processing step at the system's operating point. The two lowest blocks (speaker match and multimodal voice activity detection) show the final precision/recall values. The results from the blocks marked with a tick are passed on to the TA2 system; the results from the other blocks are intermediate or included for comparison.
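To make the difference between the two scorings concrete, the sketch below computes both on a toy voice-activity example. It is a minimal sketch under stated assumptions, not the evaluation code behind Figure 11. Segments are hypothetical (start, end) pairs in seconds. The time-weighted variant divides the overlapping duration by the total detected duration (precision) or the total reference duration (recall). The event-level variant counts a detection or a reference event as matched if it overlaps any event on the other side by more than a hypothetical `min_overlap` threshold. The names `temporal_pr`, `event_pr` and `min_overlap` are illustrative only.

```python
from typing import List, Tuple

# Hypothetical segment representation: (start, end) in seconds, start < end.
Segment = Tuple[float, float]


def _overlap(a: Segment, b: Segment) -> float:
    """Length in seconds of the intersection of two segments (0 if disjoint)."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def _total(segs: List[Segment]) -> float:
    """Summed duration of a list of segments."""
    return sum(end - start for start, end in segs)


def temporal_pr(detected: List[Segment], reference: List[Segment]) -> Tuple[float, float]:
    """Time-weighted precision/recall (assumption: segments within each list do not overlap)."""
    inter = sum(_overlap(d, r) for d in detected for r in reference)
    precision = inter / _total(detected) if detected else 0.0
    recall = inter / _total(reference) if reference else 0.0
    return precision, recall


def event_pr(detected: List[Segment], reference: List[Segment],
             min_overlap: float = 0.0) -> Tuple[float, float]:
    """Event-level precision/recall: each segment counts once, regardless of its length."""
    hit_det = sum(1 for d in detected if any(_overlap(d, r) > min_overlap for r in reference))
    hit_ref = sum(1 for r in reference if any(_overlap(d, r) > min_overlap for d in detected))
    precision = hit_det / len(detected) if detected else 0.0
    recall = hit_ref / len(reference) if reference else 0.0
    return precision, recall


if __name__ == "__main__":
    reference = [(0.0, 10.0), (20.0, 21.0)]  # one long and one short speech event
    detected = [(0.0, 10.0), (30.0, 31.0)]   # long event found, short one missed, one false alarm
    print(temporal_pr(detected, reference))  # precision = recall = 10/11 (about 0.91)
    print(event_pr(detected, reference))     # precision = recall = 0.5
```

In this toy data the single long, correctly detected segment dominates the time-weighted score, whereas the event-level score weighs the missed short event and the spurious detection as heavily as the long hit. This is why the two scorings can rank the same algorithms differently.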