Research Article

Realistic Speech-Driven Talking Video Generation with Personalized Pose

Table 3

Normalized mean error (NME, %) on facial landmarks (lower is better).

       Orig.    Only-GRU    TTS-mel    Text
0.5    4.925    5.673       5.871      5.693
1.0    4.921    5.640       5.885      5.690
1.5    4.853    5.644       5.877      5.614
2.0    4.907    5.647       5.829      5.607
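
For readers unfamiliar with the metric in Table 3, the following is a minimal sketch of how an NME percentage on facial landmarks is commonly computed: the mean point-to-point distance between predicted and ground-truth landmarks, divided by a normalizing distance. The function name `nme_percent`, the toy coordinates, and the choice of normalizing distance are assumptions made for illustration; this section does not state which normalization the authors used.

```python
import math


def nme_percent(pred, truth, norm_dist):
    """Mean landmark error divided by norm_dist, expressed in percent.

    pred, truth: sequences of (x, y) landmark coordinates in the same order.
    norm_dist:   normalizing distance (assumed here; e.g. an inter-ocular
                 distance or a face bounding-box size, depending on protocol).
    """
    if len(pred) != len(truth) or not pred:
        raise ValueError("pred and truth must be non-empty and of equal length")
    if norm_dist <= 0:
        raise ValueError("norm_dist must be positive")
    # Sum of Euclidean distances between corresponding landmarks.
    total = sum(math.dist(p, t) for p, t in zip(pred, truth))
    return 100.0 * total / (len(pred) * norm_dist)


# Toy data (hypothetical, not taken from the paper).
truth = [(0.0, 0.0), (10.0, 0.0), (5.0, 8.0)]
pred = [(3.0, 4.0), (10.0, 0.0), (5.0, 8.0)]
norm = 10.0  # assumed normalizing distance

nme = nme_percent(pred, truth, norm)
print(f"NME = {nme:.3f}%")
```

In practice the per-frame values would be averaged over all frames of the test videos to obtain a single figure per setting, comparable in form to the entries of the table.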