Automatic Evaluation of Voice Quality Using Text-Based Laryngograph Measurements and Prosodic Analysis
Table 5. Best feature sets for human-machine correlation and their weights in the regression formulae.
Feature, Context, w/o CFx, w/o CFx
DurNorm, WPW: 0.377, 0.499, 0.378
DurNorm, W: 0.513, 0.402
F0Min, W
F0Mean, W
F0Onset, W: 0.173
F0OffPos, W: 0.322, 0.120, 0.185, 0.236
EnNorm, WPW: 0.343
EnNorm, W: 0.155
MeanJitter, 15 W: 0.118, 0.186, 0.113, 0.249, 0.239, 0.366, 0.368, 0.320, 0.208
MeanShimmer, 15 W: 0.144, 0.138, 0.145, 0.114
StandDevShimmer, 15 W
#+Voiced, 15 W: 0.321, 0.347, 0.334, 0.324, 0.094, 0.122
RelNum+/−Voiced, 15 W: 0.218, 0.082
CFx: 0.210, 0.206
CQx: 0.643, 0.495, 0.506
r: 0.71, 0.66, 0.71, 0.67, 0.36, 0.53, 0.47, 0.45, 0.49
ρ: 0.57, 0.49, 0.58, 0.49, 0.27, 0.54, 0.46, 0.45, 0.55
Significance level: <0.001, <0.001, <0.001, <0.001, 0.003, <0.001, <0.001, <0.001, <0.001
Contexts: W: word, WPW: word-pause-word, 15 W: 15 words (“global” feature). The correlations of the respective set to the human reference are given by r (Pearson) and ρ (Spearman).
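
The two correlation measures named in the table note can be illustrated with a minimal Python sketch. It is not the authors' evaluation code: the function names `pearson`, `ranks`, and `spearman` are hypothetical, and the `human` and `machine` lists are toy values invented for illustration, not data from the study. Pearson's coefficient measures linear agreement between the two score lists, while Spearman's coefficient is taken here as Pearson's coefficient computed on the ranks, with tied values sharing their mean rank.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation: covariance over the product of the spreads."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def ranks(values):
    """1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))


# Toy values (assumption): one human rating and one machine score per speaker.
human = [1.0, 2.0, 2.5, 4.0, 5.0]
machine = [1.2, 1.9, 3.1, 3.0, 9.0]

print("Pearson:", round(pearson(human, machine), 3))
print("Spearman:", round(spearman(human, machine), 3))
```

Because Spearman's coefficient depends only on rank order, a single extreme machine score (such as the last toy value) changes it far less than it changes Pearson's coefficient, which is one reason to report both.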