ISRN Artificial Intelligence
Volume 2014 (2014), Article ID 737814, 13 pages
http://dx.doi.org/10.1155/2014/737814
Research Article

BPN Based Likelihood Ratio Score Fusion for Audio-Visual Speaker Identification in Response to Noise

1Department of Computer Science & Engineering, Rajshahi University of Engineering & Technology, Rajshahi 6204, Bangladesh
2School of Engineering & Computer Science, Independent University, Dhaka 1229, Bangladesh

Received 25 September 2013; Accepted 3 November 2013; Published 8 January 2014

Academic Editors: J. Molina, M. Monti, M. Ture, and J. M. Usher

Copyright © 2014 Md. Rabiul Islam and Md. Abdus Sobhan. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

References

  1. J. D. Woodward, “Biometrics: Privacy's foe or Privacy's friend?” Proceedings of the IEEE, vol. 85, no. 9, pp. 1480–1492, 1997.
  2. A. K. Jain, R. Bolle, and S. Pankanti, “Introduction to biometrics,” in Biometrics, Personal Identification in Networked Society, A. K. Jain, R. Bolle, and S. Pankanti, Eds., pp. 1–41, Kluwer Academic Publishers, Dordrecht, The Netherlands, 1999.
  3. D. Bhattacharyya, R. Rahul, A. A. Farkhod, and C. Minkyu, “Biometric authentication: a review,” International Journal of U- and E- Service, Science and Technology, vol. 2, no. 3, 2009.
  4. C. Sanderson and K. K. Paliwal, “Information fusion and person verification using speech and face information,” Research Paper IDIAP-RR 02-33, IDIAP, 2002.
  5. D. G. Stork and M. E. Hennecke, Speechreading by Humans and Machines, Springer, Berlin, Germany, 1996.
  6. R. Campbell, B. Dodd, and D. Burnham, Hearing by Eye II, Psychology Press, Hove, UK, 1998.
  7. S. Dupont and J. Luettin, “Audio-visual speech modeling for continuous speech recognition,” IEEE Transactions on Multimedia, vol. 2, no. 3, pp. 141–151, 2000.
  8. G. Potamianos, J. Luettin, and C. Neti, “Hierarchical discriminant features for audio-visual LVCSR,” in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 165–168, May 2001.
  9. G. Potamianos and C. Neti, “Automatic speechreading of impaired speech,” in Proceedings of the Conference on Audio-Visual Speech Processing, pp. 177–182, 2001.
  10. F. J. Huang and T. Chen, “Consideration of Lombard effect for speechreading,” in Proceedings of the IEEE 4th Workshop on Multimedia Signal Processing, pp. 613–618, October 2001.
  11. G. Potamianos, C. Neti, G. Gravier, A. Garg, and A. W. Senior, “Recent advances in the automatic recognition of audiovisual speech,” Proceedings of the IEEE, vol. 91, no. 9, pp. 1306–1325, 2003.
  12. D. A. Reynolds, “Experimental evaluation of features for robust speaker identification,” IEEE Transactions on Speech and Audio Processing, vol. 2, no. 4, pp. 639–643, 1994.
  13. S. Sharma, D. Ellis, S. Kajarekar, P. Jain, and H. Hermansky, “Feature extraction using non-linear transformation for robust speech recognition on the Aurora database,” in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '00), pp. 1117–1120, June 2000.
  14. D. Wu, A. C. Morris, and J. Koreman, “MLP internal representation as discriminant features for improved speaker recognition,” in Proceedings of the International Conference on Non-Linear Speech Processing (NOLISP '05), pp. 25–33, Barcelona, Spain, 2005.
  15. Y. Konig, L. Heck, M. Weintraub, and K. Sonmez, “Nonlinear discriminant feature extraction for robust text-independent speaker recognition,” in Proceedings of the RLA2C, ESCA Workshop on Speaker Recognition and Its Commercial and Forensic Applications, pp. 72–75, 1998.
  16. C. C. Chibelushi, F. Deravi, and J. S. D. Mason, “A review of speech-based bimodal recognition,” IEEE Transactions on Multimedia, vol. 4, no. 1, pp. 23–37, 2002.
  17. X. Zhang, C. C. Broun, R. M. Mersereau, and M. A. Clements, “Automatic speechreading with applications to human-computer interfaces,” EURASIP Journal on Applied Signal Processing, vol. 2002, no. 11, pp. 1228–1247, 2002.
  18. D. N. Zotkin, R. Duraiswami, and L. S. Davis, “Joint audio-visual tracking using particle filters,” EURASIP Journal on Applied Signal Processing, vol. 2002, no. 11, pp. 1154–1164, 2002.
  19. P. de Cuetos, C. Neti, and A. W. Senior, “Audio-visual intent-to-speak detection for human-computer interaction,” in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 2373–2376, June 2000.
  20. D. Sodoyer, J.-L. Schwartz, L. Girin, J. Klinkisch, and C. Jutten, “Separation of audio-visual speech sources: a new approach exploiting the audio-visual coherence of speech stimuli,” EURASIP Journal on Applied Signal Processing, vol. 2002, no. 11, pp. 1165–1173, 2002.
  21. E. Foucher, L. Girin, and G. Feng, “Audiovisual speech coder: using vector quantization to exploit the audio/video correlation,” in Proceedings of the Conference on Audio-Visual Speech Processing, pp. 67–71, Terrigal, Australia, 1998.
  22. J. Huang, Z. Liu, Y. Wang, Y. Chen, and E. Wong, “Integration of multimodal features for video scene classification based on HMM,” in Proceedings of the Workshop on Multimedia Signal Processing, pp. 53–58, Copenhagen, Denmark, 1999.
  23. M. M. Cohen and D. W. Massaro, “What can visual speech synthesis tell visual speech recognition?” in Proceedings of the Asilomar Conference on Signals, Systems, and Computers, Pacific Grove, Calif, USA, 1994.
  24. E. Cosatto and H. P. Graf, “Photo-realistic talking-heads from image samples,” IEEE Transactions on Multimedia, vol. 2, no. 3, pp. 152–163, 2000.
  25. G. Potamianos, C. Neti, and S. Deligne, “Joint audio-visual speech processing for recognition and enhancement,” in Proceedings of the Auditory-Visual Speech Processing Tutorial and Research Workshop (AVSP '03), pp. 95–104, St. Jorioz, France, 2003.
  26. A. Rogozan and P. Deléglise, “Adaptive fusion of acoustic and visual sources for automatic speech recognition,” Speech Communication, vol. 26, no. 1-2, pp. 149–161, 1998.
  27. A. Adjoudani and C. Benoit, “On the integration of auditory and visual parameters in an HMM-based ASR,” in Speechreading by Humans and Machines: Models, Systems, and Applications, D. G. Stork and M. E. Hennecke, Eds., pp. 461–472, Springer, Berlin, Germany, 1996.
  28. T. W. Lewis and D. M. W. Powers, “Sensor fusion weighting measures in audio-visual speech recognition,” in Proceedings of the Conference on Australasian Computer Science, pp. 305–314, Dunedin, New Zealand, 2004.
  29. G. Potamianos and C. Neti, “Stream confidence estimation for audio-visual speech recognition,” in Proceedings of the International Conference on Spoken Language Processing, pp. 746–749, Beijing, China, 2000.
  30. I. Matthews, J. A. Bangham, and S. Cox, “Audiovisual speech recognition using multiscale nonlinear image decomposition,” in Proceedings of the International Conference on Spoken Language Processing (ICSLP '96), pp. 38–41, October 1996.
  31. L. Lam and C. Y. Suen, “Application of majority voting to pattern recognition: an analysis of its behavior and performance,” IEEE Transactions on Systems, Man, and Cybernetics A, vol. 27, no. 5, pp. 553–568, 1997.
  32. L. Lam and C. Y. Suen, “Optimal combinations of pattern classifiers,” Pattern Recognition Letters, vol. 16, no. 9, pp. 945–954, 1995.
  33. L. Xu, A. Krzyzak, and C. Y. Suen, “Methods of combining multiple classifiers and their applications to handwriting recognition,” IEEE Transactions on Systems, Man and Cybernetics, vol. 22, no. 3, pp. 418–435, 1992.
  34. J. Daugman, “Biometric decision landscapes,” Tech. Rep. TR482, University of Cambridge Computer Laboratory, 2000.
  35. S. Doclo and M. Moonen, “On the output SNR of the speech-distortion weighted multichannel Wiener filter,” IEEE Signal Processing Letters, vol. 12, no. 12, pp. 809–811, 2005.
  36. R. Wang, “Wiener filtering,” PHYS 3301: Scientific Computing, project report, NOISE Group, May 2000.
  37. K. Kitayama, M. Goto, K. Itou, and T. Kobayashi, “Speech starter: noise-robust endpoint detection by using filled pauses,” in Proceedings of the 8th European Conference on Speech Communication and Technology (Eurospeech '03), pp. 1237–1240, Geneva, Switzerland, September 2003.
  38. Q. Li, J. Zheng, A. Tsai, and Q. Zhou, “Robust endpoint detection and energy normalization for real-time speech and speaker recognition,” IEEE Transactions on Speech and Audio Processing, vol. 10, no. 3, pp. 146–157, 2002.
  39. R. E. A. C. Paley and N. Wiener, Fourier Transforms in the Complex Domain, American Mathematical Society, Providence, RI, USA, 1934.
  40. J. W. Picone, “Signal modeling techniques in speech recognition,” Proceedings of the IEEE, vol. 81, no. 9, pp. 1215–1247, 1993.
  41. L. P. Cordella, P. Foggia, C. Sansone, and M. Vento, “A real-time text-independent speaker identification system,” in Proceedings of the 12th International Conference on Image Analysis and Processing, pp. 632–637, IEEE Computer Society Press, 2003.
  42. F. J. Harris, “On the use of windows for harmonic analysis with the discrete Fourier transform,” Proceedings of the IEEE, vol. 66, no. 1, pp. 51–83, 1978.
  43. J. Proakis and D. Manolakis, Digital Signal Processing: Principles, Algorithms and Applications, Macmillan, New York, NY, USA, 2nd edition, 1992.
  44. S. B. Davis and P. Mermelstein, “Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 28, no. 4, pp. 357–366, 1980.
  45. T. F. Li and S.-C. Chang, “Speech recognition of mandarin syllables using both linear predictive coding cepstra and Mel frequency cepstra,” in Proceedings of the 19th Conference on Computational Linguistics and Speech Processing, pp. 379–390, 2007.
  46. Y. Hu and P. C. Loizou, “Subjective comparison of speech enhancement algorithms,” in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '06), pp. I153–I156, Toulouse, France, May 2006.
  47. Y. Hu and P. C. Loizou, “Evaluation of objective measures for speech enhancement,” in Proceedings of the 9th International Conference on Spoken Language Processing (INTERSPEECH '06), pp. 1447–1450, September 2006.
  48. Y. Hu and P. C. Loizou, “Evaluation of objective quality measures for speech enhancement,” IEEE Transactions on Audio, Speech and Language Processing, vol. 16, no. 1, pp. 229–238, 2008.
  49. S. Milborrow, Locating facial features with active shape models [M.S. thesis], Faculty of Engineering, University of Cape Town, Cape Town, South Africa, 2007.
  50. R. Herpers, G. Verghese, K. Derpains, and R. McCready, “Detection and tracking of faces in real environments,” in Proceedings of the International IEEE Workshop on Recognition, Analysis and Tracking of Face and Gesture in Real-Time Systems, pp. 96–104, Corfu, Greece, 1999.
  51. E. Hjelmås and B. K. Low, “Face detection: a survey,” Computer Vision and Image Understanding, vol. 83, no. 3, pp. 236–274, 2001.
  52. R. C. Gonzalez and R. E. Woods, Digital Image Processing, Addison-Wesley, 2002.
  53. M. Turk and A. Pentland, “Eigenfaces for recognition,” Journal of Cognitive Neuroscience, vol. 3, no. 1, pp. 71–86, 1991.
  54. M. Turk and A. Pentland, Face Recognition Using Eigenfaces, Vision and Modeling Group, The Media Laboratory, Massachusetts Institute of Technology, 1991.
  55. F. S. Samaria and A. C. Harter, “Parameterisation of a stochastic model for human face identification,” in Proceedings of the 2nd IEEE Workshop on Applications of Computer Vision, pp. 138–142, Orlando, Fla, USA, December 1994.
  56. R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, A Wiley-Interscience Publication, John Wiley & Sons, 2nd edition, 2001.
  57. V. Sarma and D. Venugopal, “Studies on pattern recognition approach to voiced-unvoiced-silence classification,” in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '78), vol. 3, pp. 1–4, 1978.
  58. L. R. Rabiner, “A tutorial on hidden Markov models and selected applications in speech recognition,” Proceedings of the IEEE, vol. 77, no. 2, pp. 257–286, 1989.
  59. P. A. Devijver, “Baum's forward-backward algorithm revisited,” Pattern Recognition Letters, vol. 3, no. 6, pp. 369–373, 1985.
  60. A. Rogozan and P. Deléglise, “Adaptive fusion of acoustic and visual sources for automatic speech recognition,” Speech Communication, vol. 26, no. 1-2, pp. 149–161, 1998.
  61. J. S. Lee and C. H. Park, “Adaptive decision fusion for audio-visual speech recognition,” in Speech Recognition, Technologies and Applications, F. Mihelic and J. Zibert, Eds., p. 550, 2008.
  62. A. Adjoudani and C. Benoit, “On the integration of auditory and visual parameters in an HMM-based ASR,” in Speechreading by Humans and Machines: Models, Systems, and Applications, D. G. Stork and M. E. Hennecke, Eds., pp. 461–472, Springer, Berlin, Germany, 1996.
  63. J. A. Freeman and D. M. Skapura, Neural Networks, Algorithms, Applications and Programming Techniques, Addison-Wesley, 1991.
  64. W. C. Yau, D. K. Kumar, and S. P. Arjunan, “Visual recognition of speech consonants using facial movement features,” Integrated Computer-Aided Engineering, vol. 14, no. 1, pp. 49–61, 2007.
  65. M. R. Islam and M. A. Sobhan, “Improving the convergence of backpropagation learning neural networks based Bangla speaker identification system for various weight update frequencies, momentum term and error rate,” in Proceedings of the International Conference on Computer Processing of Bangla (ICCPB '06), pp. 27–34, Independent University, 2006.
  66. N. A. Fox, B. A. O'Mullane, and R. B. Reilly, “VALID: a new practical audio-visual database, and comparative results,” in Audio- and Video-Based Biometric Person Authentication, vol. 3546 of Lecture Notes in Computer Science, pp. 201–243, 2005.