Research Article

Automatic Evaluation of Voice Quality Using Text-Based Laryngograph Measurements and Prosodic Analysis

Table 2

Prosodic features and their intervals of computation. The 33 prosodic features are based upon duration (“Dur”), energy (“En”), and fundamental frequency (“F0”) measures. The context size denotes the interval of words on which the features are computed. W: computed on the current word. WPW: computed on the interval that contains the two words preceding the current word and the pause between them.

Features | Context size: WPW | W

Pause: before, Fill-before, after, Fill-after
En: RegCoeff, MseReg, Abs, Norm, Mean
En: Max, MaxPos
Dur: Abs, Norm
F0: RegCoeff, MseReg
F0: Mean, Max, MaxPos, Min, MinPos, Off, OffPos, On, OnPos

The features are abbreviated as follows.
Length of pauses “Pause”: length of silent pause before (before) and after (after), and filled pause before (Fill-before) and after (Fill-after) the respective word in context.
Energy features “En”: regression coefficient (RegCoeff) and mean square error (MseReg) of the energy curve with respect to the regression curve; mean (Mean) and maximum energy (Max) with its position on the time axis (MaxPos); absolute (Abs) and normalized (Norm) energy values.
Duration features “Dur”: absolute (Abs) and normalized (Norm) duration.
F0 features “F0”: regression coefficient (RegCoeff) and the mean square error (MseReg) of the F0 curve with respect to its regression curve; mean (Mean), maximum (Max), minimum (Min), voice onset (On), and offset (Off) values as well as the position of Max (MaxPos), Min (MinPos), On (OnPos), and Off (OffPos) on the time axis; all values are normalized.
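
The contour-based features listed above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that the regression curve is a least-squares straight line, that the contour is sampled at a fixed frame shift, that frames without a value (for example, unvoiced frames in the F0 contour) are marked as NaN, and that positions are measured in seconds from the start of the context interval. The function name `contour_features` and the parameter `frame_shift` are hypothetical, and the normalization mentioned in the table is omitted because its definition is not given here.

```python
import numpy as np


def contour_features(values, frame_shift=0.01):
    """Regression and extremum features for one contour segment.

    values: per-frame samples (F0 or energy) inside one context interval
            (W or WPW); NaN marks frames without a value.
    frame_shift: assumed spacing between frames in seconds (hypothetical).
    """
    values = np.asarray(values, dtype=float)
    times = np.arange(len(values)) * frame_shift
    valid = ~np.isnan(values)
    if valid.sum() < 2:
        return None  # a regression line needs at least two valid frames
    t, v = times[valid], values[valid]

    # Assumption: the "regression curve" is a least-squares straight line.
    slope, intercept = np.polyfit(t, v, 1)
    residuals = v - (slope * t + intercept)

    return {
        "RegCoeff": float(slope),                  # slope of the fitted line
        "MseReg": float(np.mean(residuals ** 2)),  # mean square error around the line
        "Mean": float(np.mean(v)),
        "Max": float(np.max(v)),
        "MaxPos": float(t[np.argmax(v)]),
        "Min": float(np.min(v)),
        "MinPos": float(t[np.argmin(v)]),
        "On": float(v[0]),       # first valid frame (voice onset for F0)
        "OnPos": float(t[0]),
        "Off": float(v[-1]),     # last valid frame (voice offset for F0)
        "OffPos": float(t[-1]),
    }


if __name__ == "__main__":
    nan = float("nan")
    # Toy F0 contour in Hz: unvoiced at both ends, rising in between.
    print(contour_features([nan, 100.0, 110.0, 120.0, 130.0, nan]))
```

The same function can be applied to an energy contour, in which case only RegCoeff, MseReg, Mean, Max, and MaxPos correspond to features named in the table.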