Computational and Mathematical Methods in Medicine
Volume 2015, Article ID 316325, 11 pages
Research Article

Automatic Evaluation of Voice Quality Using Text-Based Laryngograph Measurements and Prosodic Analysis

1Lehrstuhl für Mustererkennung, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Martensstraße 3, 91058 Erlangen, Germany
2Klinik für Hals-, Nasen-, Ohrenheilkunde, Universitätsklinikum Magdeburg, Leipziger Straße 44, 39120 Magdeburg, Germany
3Phoniatrische und Pädaudiologische Abteilung, Klinikum der Universität Erlangen-Nürnberg, Bohlenplatz 21, 91054 Erlangen, Germany
4Department of Computer Science and Engineering, University of West Bohemia in Pilsen, Univerzitní 8, 306 14 Plzeň, Czech Republic
5Klinik für Phoniatrie und Pädaudiologie, Medizinische Hochschule Hannover, Carl-Neuberg-Straße 1, 30625 Hannover, Germany

Received 23 February 2015; Accepted 13 May 2015

Academic Editor: Zoran Bursac

Copyright © 2015 Tino Haderlein et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Due to low intra- and interrater reliability, perceptual voice evaluation should be supported by objective, automatic methods. In this study, text-based, computer-aided prosodic analysis and Laryngograph measurements of connected speech were combined to model perceptual evaluation according to the German Roughness-Breathiness-Hoarseness (RBH) scheme. Fifty-eight connected speech samples (43 women and 15 men; years) containing the German version of the text “The North Wind and the Sun” were evaluated perceptually by 19 speech and voice therapy students on the RBH scale. To obtain human-machine correlations, Support Vector Regression was used to model the listeners’ ratings from two Laryngograph measures, the vocal fold cycle irregularities (CFx) and the closed phases of vocal fold vibration (CQx), together with 33 features from a prosodic analysis module. The best human-machine results for roughness were obtained with a combination of six prosodic features and CFx (, ). These correlations were approximately as high as the interrater agreement among the human raters (, ). CQx was one of the key features of the hoarseness model. For hoarseness and breathiness, the human-machine agreement was substantially lower. Nevertheless, the automatic analysis method can serve as the basis for meaningful objective support of perceptual analysis.
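The abstract compares human-machine correlations with the interrater agreement among the 19 listeners. The following Python sketch shows one common way to compute such figures, using only the standard library. It rests on several assumptions that the abstract does not confirm: that the two reported statistics are Pearson's r and Spearman's ρ, that the machine is compared against the per-sample mean of all raters, and that interrater agreement is each rater's correlation with the mean of the remaining raters. The machine scores (for example, Support Vector Regression outputs) are taken as given rather than trained here. All function names and the toy ratings are hypothetical.

```python
from statistics import mean


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / (sxx * syy) ** 0.5


def ranks(values):
    """1-based ranks; tied values get the average of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rho, computed as Pearson's r on tie-averaged ranks."""
    return pearson(ranks(x), ranks(y))


def rater_mean(ratings):
    """Per-sample mean score; ratings[r][s] is rater r's score for sample s."""
    return [mean(scores) for scores in zip(*ratings)]


def interrater_agreement(ratings, corr=pearson):
    """Mean correlation of each rater with the average of all other raters."""
    per_rater = []
    for r in range(len(ratings)):
        others = ratings[:r] + ratings[r + 1:]
        per_rater.append(corr(ratings[r], rater_mean(others)))
    return mean(per_rater)


if __name__ == "__main__":
    # Hypothetical toy data: 3 raters x 5 samples on the 0-3 RBH range.
    ratings = [
        [0, 1, 2, 3, 2],
        [1, 1, 2, 3, 3],
        [0, 2, 2, 2, 3],
    ]
    # Hypothetical machine scores, e.g. regression outputs per sample.
    machine = [0.4, 1.3, 1.9, 2.7, 2.6]
    reference = rater_mean(ratings)
    print("human-machine r   =", round(pearson(machine, reference), 3))
    print("human-machine rho =", round(spearman(machine, reference), 3))
    print("interrater r      =", round(interrater_agreement(ratings), 3))
    print("interrater rho    =", round(interrater_agreement(ratings, spearman), 3))
```

Ties are averaged in the rank computation because RBH ratings are small integers and many samples share the same score. Leaving the current rater out of the reference mean keeps each rater from being correlated partly with their own scores, which would otherwise inflate the interrater figure.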