Applied Computational Intelligence and Soft Computing
Volume 2014 (2014), Article ID 831830, 7 pages
http://dx.doi.org/10.1155/2014/831830
Research Article

Feature Fusion Based Audio-Visual Speaker Identification Using Hidden Markov Model under Different Lighting Variations

Md. Rabiul Islam1 and Md. Abdus Sobhan2

1Department of Computer Science & Engineering, Rajshahi University of Engineering & Technology, Rajshahi 6204, Bangladesh
2School of Engineering & Computer Science, Independent University, Dhaka 1229, Bangladesh

Received 31 August 2013; Revised 14 January 2014; Accepted 16 January 2014; Published 5 March 2014

Academic Editor: T. Warren Liao

Copyright © 2014 Md. Rabiul Islam and Md. Abdus Sobhan. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
