Applied Computational Intelligence and Soft Computing
Volume 2014 (2014), Article ID 831830, 7 pages
Feature Fusion Based Audio-Visual Speaker Identification Using Hidden Markov Model under Different Lighting Variations
¹Department of Computer Science & Engineering, Rajshahi University of Engineering & Technology, Rajshahi 6204, Bangladesh
²School of Engineering & Computer Science, Independent University, Dhaka 1229, Bangladesh
Received 31 August 2013; Revised 14 January 2014; Accepted 16 January 2014; Published 5 March 2014
Academic Editor: T. Warren Liao
Copyright © 2014 Md. Rabiul Islam and Md. Abdus Sobhan. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.