ISRN Artificial Intelligence
Volume 2014 (2014), Article ID 737814, 13 pages
Research Article

BPN Based Likelihood Ratio Score Fusion for Audio-Visual Speaker Identification in Response to Noise

1Department of Computer Science & Engineering, Rajshahi University of Engineering & Technology, Rajshahi 6204, Bangladesh
2School of Engineering & Computer Science, Independent University, Dhaka 1229, Bangladesh

Received 25 September 2013; Accepted 3 November 2013; Published 8 January 2014

Academic Editors: J. Molina, M. Monti, M. Ture, and J. M. Usher

Copyright © 2014 Md. Rabiul Islam and Md. Abdus Sobhan. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


This paper presents an improved likelihood ratio score fusion technique, based on a back-propagation learning neural network (BPN), for audio-visual speaker identification in various noisy environments. Different signal preprocessing and noise removal techniques are used to process the speech utterance, and the LPC, LPCC, RCC, MFCC, ΔMFCC, and ΔΔMFCC methods are applied to extract features from the audio signal. An Active Shape Model is used to extract the appearance and shape based facial features. To enhance the performance of the proposed system, the appearance and shape based facial features are concatenated, and Principal Component Analysis is used to reduce the dimension of the facial feature vector. The audio and visual feature vectors are then fed separately to Hidden Markov Models to obtain the log-likelihood of each modality. The reliability of each modality is calculated using a reliability measurement method. Finally, the integrated likelihood ratios are fed to the back-propagation learning neural network algorithm to produce the final speaker identification result. To measure the performance of the proposed system, three databases, namely the NOIZEUS speech database, the ORL face database, and the VALID audio-visual multimodal database, are used for audio-only, visual-only, and audio-visual speaker identification, respectively. To compare the accuracy of the proposed system with existing techniques under various noisy environments, different types of artificial noise are added at various rates to the audio and visual signals, and performance is compared across different variations of audio and visual features.
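The reliability-weighted fusion step described above can be illustrated with a minimal sketch. It assumes that each modality's Hidden Markov Models have already produced one log-likelihood per enrolled speaker. The reliability measure (average gap between the best score and the others), the linear weighting, and the function names `reliability`, `fuse`, and `identify` are illustrative assumptions, not the formulation used in the paper. The final decision here is a plain arg-max, which stands in for the back-propagation network stage.

```python
from typing import List, Tuple


def reliability(log_likelihoods: List[float]) -> float:
    """Average gap between the best score and all scores of one modality.

    A larger gap suggests the modality separates speakers more clearly.
    This measure is an assumption for illustration, not the paper's.
    """
    n = len(log_likelihoods)
    if n < 2:
        return 0.0
    best = max(log_likelihoods)
    # The best score contributes a gap of zero, so divide by n - 1.
    return sum(best - s for s in log_likelihoods) / (n - 1)


def fuse(audio_ll: List[float], visual_ll: List[float]) -> Tuple[List[float], float]:
    """Combine per-speaker log-likelihoods using a reliability-based weight."""
    if len(audio_ll) != len(visual_ll):
        raise ValueError("both modalities must score the same speakers")
    r_audio = reliability(audio_ll)
    r_visual = reliability(visual_ll)
    total = r_audio + r_visual
    # Fall back to equal weighting when neither modality discriminates.
    w_audio = 0.5 if total == 0 else r_audio / total
    fused = [w_audio * a + (1.0 - w_audio) * v
             for a, v in zip(audio_ll, visual_ll)]
    return fused, w_audio


def identify(scores: List[float]) -> int:
    """Index of the speaker with the highest score (arg-max decision)."""
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    # Hypothetical scores: noisy audio is nearly flat, video is decisive.
    audio = [-10.0, -10.5, -10.2]
    visual = [-30.0, -12.0, -28.0]
    fused, w_audio = fuse(audio, visual)
    print("audio weight:", round(w_audio, 3))
    print("identified speaker index:", identify(fused))
```

In this hypothetical case the audio scores barely distinguish the speakers, so the audio weight is small and the fused decision follows the visual modality. A trained network, as in the proposed system, would replace the arg-max in `identify`.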