Research Article

Realistic Speech-Driven Talking Video Generation with Personalized Pose

Table 2

Mean Opinion Score (MOS) from 100 participants on four questions. Q1: completeness of the body. Q2: clarity of the face. Q3: correlation of the body movement with the audio. Q4: overall quality.

          Q1     Q2     Q3     Q4
Synth.    4.14   4.37   2.92   3.75
TTS       4.10   3.80   2.58   3.39
Real      4.31   4.42   4.33   4.40