Research Article

Realistic Speech-Driven Talking Video Generation with Personalized Pose

Table 1

Mean Opinion Score (MOS) from 100 participants on four questions. Q1: completeness of the body. Q2: clarity of the face. Q3: correlation of body movement with the audio. Q4: overall quality.

Method                        Q1      Q2      Q3      Q4
Learning gesture [31]         3.414   3.659   3.914   3.308
Neural-voice-puppetry [32]    3.202   3.840   3.180   3.542
EverybodyDance [33]           3.944   3.662   3.680   3.681
Personalized-bodyPose [29]    3.894   4.011   3.383   3.762
Our method                    3.901   4.083   3.526   3.778