Research Article

Voice Quality Modelling for Expressive Speech Synthesis

Table 5

Confusion matrix (%) and F1 measures for the identification of expressive speech styles with the reference configurations (“Natural” and “ResHNM”) and the HNM transformation configurations (“HNMPro,” “HNMProJiSh,” and “HNMProVoQ”).

(%)           HAP     SEN     AGG     SAD  Others

Natural
HAP          94.1     0.0     5.9     0.0     0.0
SEN           0.0   100.0     0.0     0.0     0.0
AGG           0.0     0.0   100.0     0.0     0.0
SAD           0.0     9.4     0.0    90.6     0.0
F1           0.97    0.96    0.97    0.95

ResHNM
HAP          87.1     0.0    11.8     0.0     1.2
SEN           1.2    95.3     1.2     2.4     0.0
AGG           1.2     0.0    98.8     0.0     0.0
SAD           0.0    11.8     0.0    88.2     0.0
F1           0.92    0.92    0.93    0.93

HNMPro
HAP          30.6     3.5    17.6    28.2    20.0
SEN           1.2    31.8     9.4    40.0    17.6
AGG          35.3     1.2    21.2    23.5    18.8
SAD           2.4    18.8    10.6    41.2    27.1
F1           0.36    0.41    0.27    0.35

HNMProJiSh
HAP          30.6     3.5    16.5    27.1    22.4
SEN           3.5    30.6     4.7    36.5    24.7
AGG          32.9     2.4    18.8    22.4    23.5
SAD           4.7    17.6     1.2    58.8    17.6
F1           0.36    0.40    0.27    0.48

HNMProVoQ
HAP          34.1     3.5    25.9    14.1    22.4
SEN           0.0    40.0     5.9    34.1    20.0
AGG          23.5     2.4    31.8    16.5    25.9
SAD           3.5    20.0     1.2    64.7    10.6
F1           0.42    0.48    0.39    0.56
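
The last row of each block can be reproduced from the percentages above it as a per-style F1 score, the harmonic mean of precision and recall. The sketch below illustrates this for the "Natural" block. It rests on two assumptions that the table itself does not state: each row is the presented style and each column the identified style, and every style was presented the same number of times, so the row percentages can be combined directly. The names `NATURAL` and `f1_per_style` are illustrative only.

```python
STYLES = ["HAP", "SEN", "AGG", "SAD"]

# Row-normalised confusion matrix (%) for the "Natural" configuration.
# Assumed layout: rows = presented style, columns = HAP, SEN, AGG, SAD, Others.
NATURAL = [
    [94.1, 0.0, 5.9, 0.0, 0.0],
    [0.0, 100.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 100.0, 0.0, 0.0],
    [0.0, 9.4, 0.0, 90.6, 0.0],
]


def f1_per_style(matrix):
    """Return one F1 score per row, assuming equally sized classes."""
    scores = []
    for k in range(len(matrix)):
        hits = matrix[k][k]                        # correctly identified share
        presented = sum(matrix[k])                 # row total, "Others" included
        answered = sum(row[k] for row in matrix)   # column total for style k
        recall = hits / presented if presented else 0.0
        precision = hits / answered if answered else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return scores


natural_f1 = [round(score, 2) for score in f1_per_style(NATURAL)]
print(dict(zip(STYLES, natural_f1)))
# {'HAP': 0.97, 'SEN': 0.96, 'AGG': 0.97, 'SAD': 0.95}
```

Responses in the "Others" column count against recall but belong to no style's precision, which is why the transformed configurations, with many "Others" answers, score low even where the diagonal entry is the largest in its row.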