Journal of Healthcare Engineering
Volume 2017, Article ID 3090343, 31 pages
https://doi.org/10.1155/2017/3090343
Review Article

A Review on Human Activity Recognition Using Vision-Based Method

1College of Information Science and Engineering, Ocean University of China, Qingdao, China
2Department of Computer Science and Technology, Tsinghua University, Beijing, China

Correspondence should be addressed to Zhen Li; lizhen0310@gmail.com

Received 22 February 2017; Accepted 11 June 2017; Published 20 July 2017

Academic Editor: Dong S. Park

Copyright © 2017 Shugang Zhang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Linked References

  1. R. Poppe, “A survey on vision-based human action recognition,” Image and Vision Computing, vol. 28, pp. 976–990, 2010.
  2. J. K. Aggarwal and M. S. Ryoo, “Human activity analysis: a review,” ACM Computing Surveys, vol. 43, p. 16, 2011.
  3. T. B. Moeslund, A. Hilton, and V. Krüger, “A survey of advances in vision-based human motion capture and analysis,” Computer Vision and Image Understanding, vol. 104, pp. 90–126, 2006.
  4. J. M. Chaquet, E. J. Carmona, and A. Fernández-Caballero, “A survey of video datasets for human action and activity recognition,” Computer Vision and Image Understanding, vol. 117, pp. 633–659, 2013.
  5. M. Blank, L. Gorelick, E. Shechtman, M. Irani, and R. Basri, “Actions as space-time shapes,” in Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1, pp. 1395–1402, Beijing, China, 2005.
  6. I. Laptev and T. Lindeberg, “Space-time interest points,” in Proceedings Ninth IEEE International Conference on Computer Vision, vol. 1, pp. 432–439, Nice, France, 2003.
  7. L. Rabiner and B. H. Juang, Fundamentals of Speech Recognition, Prentice Hall, Upper Saddle River, New Jersey, 1993.
  8. L. R. Rabiner and B. H. Juang, “An introduction to hidden Markov models,” IEEE ASSP Magazine, vol. 3, pp. 4–16, 1986.
  9. A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in Advances in Neural Information Processing Systems, pp. 1097–1105, Lake Tahoe, Nevada, 2012.
  10. A. Efros, A. C. Berg, G. Mori, and J. Malik, “Recognizing action at a distance,” in Proceedings Ninth IEEE International Conference on Computer Vision, pp. 726–733, Nice, France, 2003.
  11. C. R. Wren, A. Azarbayejani, T. Darrell, and A. P. Pentland, “Pfinder: real-time tracking of the human body,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, pp. 780–785, 1997.
  12. D. Koller, J. Weber, T. Huang et al., “Towards robust automatic traffic scene analysis in real-time,” in Proceedings of 12th International Conference on Pattern Recognition, vol. 1, pp. 126–131, Jerusalem, Israel, 1994.
  13. C. Stauffer and W. E. L. Grimson, “Adaptive background mixture models for real-time tracking,” in Proceedings. 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No PR00149), pp. 246–252, Fort Collins, CO, USA, 1999.
  14. A. Veeraraghavan, A. R. Chowdhury, and R. Chellappa, “Role of shape and kinematics in human movement analysis,” in Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004, pp. I–730, Washington, DC, USA, 2004.
  15. A. Veeraraghavan, R. Chellappa, and A. K. Roy-Chowdhury, “The function space of an activity,” in 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06), pp. 959–968, New York, NY, USA, 2006.
  16. A. Bobick and J. Davis, “An appearance-based representation of action,” in Proceedings of 13th International Conference on Pattern Recognition, pp. 307–312, Vienna, Austria, 1996.
  17. A. F. Bobick and J. W. Davis, “The recognition of human movement using temporal templates,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, pp. 257–267, 2001.
  18. N. Ikizler and P. Duygulu, “Human action recognition using distribution of oriented rectangular patches,” in Human Motion--Understanding Modelling Capture and Animation, pp. 271–284, Springer, Rio de Janeiro, Brazil, 2007.
  19. G. Xu and F. Huang, “Viewpoint insensitive action recognition using envelop shape,” in Computer Vision--Asian Conference on Computer Vision 2007, pp. 477–486, Springer, Tokyo, Japan, 2007.
  20. D. Weinland, R. Ronfard, and E. Boyer, “Free viewpoint action recognition using motion history volumes,” Computer Vision and Image Understanding, vol. 104, pp. 249–257, 2006.
  21. S. Cherla, K. Kulkarni, A. Kale, and V. Ramasubramanian, “Towards fast, view-invariant human action recognition,” in 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, pp. 1–8, Anchorage, AK, USA, 2008.
  22. B. D. Lucas and T. Kanade, “An iterative image registration technique with an application to stereo vision,” in IJCAI'81 Proceedings of the 7th international joint conference on Artificial intelligence - Volume 2, pp. 674–679, Vancouver, British Columbia, Canada, 1981.
  23. J. Shi and C. Tomasi, “Good features to track,” in 1994 Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 593–600, Seattle, WA, USA, 1994.
  24. X. Lu, Q. Liu, and S. Oe, “Recognizing non-rigid human actions using joints tracking in space-time,” in International Conference on Information Technology: Coding and Computing, 2004. Proceedings. ITCC 2004, pp. 620–624, Las Vegas, NV, USA, 2004.
  25. D. Tran and A. Sorokin, “Human activity recognition with metric learning,” in ECCV '08 Proceedings of the 10th European Conference on Computer Vision: Part I, pp. 548–561, Springer, Amsterdam, The Netherlands, 2008.
  26. C. Achard, X. Qu, A. Mokhber, and M. Milgram, “A novel approach for recognition of human actions with semi-global features,” Machine Vision and Applications, vol. 19, pp. 27–34, 2008.
  27. E. Shechtman and M. Irani, “Space-time behavior-based correlation-or-how to tell if two underlying motion fields are similar without computing them?” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, pp. 2045–2056, 2007.
  28. Y. Ke, R. Sukthankar, and M. Hebert, “Spatio-temporal shape and flow correlation for action recognition,” in 2007 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8, Minneapolis, MN, USA, 2007.
  29. S. Kumari and S. K. Mitra, “Human action recognition using DFT,” in 2011 Third National Conference on Computer Vision, Pattern Recognition, Image Processing and Graphics, pp. 239–242, Hubli, Karnataka, India, 2011.
  30. A. Klaser, M. Marszalek, and C. Schmid, “A spatio-temporal descriptor based on 3d-gradients,” in Conference: Proceedings of the British Machine Vision Conference 2008, pp. 271–275, Leeds, United Kingdom, 2008.
  31. X. Peng, L. Wang, X. Wang, and Y. Qiao, “Bag of visual words and fusion methods for action recognition: comprehensive study and good practice,” Computer Vision and Image Understanding, vol. 150, pp. 109–125, 2016, http://arxiv.org/abs/1405.4506.
  32. X. Peng, C. Zou, Y. Qiao, and Q. Peng, “Action recognition with stacked fisher vectors,” in Computer Vision--European Conference on Computer Vision 2014, pp. 581–595, Springer, Zurich, Switzerland, 2014.
  33. Z. Lan, M. Lin, X. Li, A. G. Hauptmann, and B. Raj, “Beyond gaussian pyramid: multi-skip feature stacking for action recognition,” in 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 204–212, Boston, MA, USA, 2015.
  34. C. Harris and M. Stephens, “A combined corner and edge detector,” in Alvey Vision Conference, p. 50, Manchester, UK, 1988.
  35. T. Kadir and M. Brady, “Saliency, scale and image description,” International Journal of Computer Vision, vol. 45, pp. 83–105, 2001.
  36. A. Oikonomopoulos, I. Patras, and M. Pantic, “Spatiotemporal salient points for visual recognition of human actions,” IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 36, no. 3, pp. 710–719, 2005.
  37. P. Dollár, V. Rabaud, G. Cottrell, and S. Belongie, “Behavior recognition via sparse spatio-temporal features,” in 2005 IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance, pp. 65–72, Beijing, China, 2005.
  38. Y. Ke, R. Sukthankar, and M. Hebert, “Efficient visual event detection using volumetric features,” in Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1, pp. 166–173, Beijing, China, 2005.
  39. G. Willems, T. Tuytelaars, and L. V. Gool, “An efficient dense and scale-invariant spatio-temporal interest point detector,” in Computer Vision—European Conference on Computer Vision 2008, pp. 650–663, Springer, Marseille, France, 2008.
  40. P. Viola and M. Jones, “Rapid object detection using a boosted cascade of simple features,” in Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001, pp. I–511, Kauai, HI, USA, 2001.
  41. I. Laptev, “On space-time interest points,” International Journal of Computer Vision, vol. 64, pp. 107–123, 2005.
  42. O. Oshin, A. Gilbert, J. Illingworth, and R. Bowden, “Spatio-temporal feature recognition using randomised ferns,” in The 1st International Workshop on Machine Learning for Vision-based Motion Analysis-MLVMA'08, pp. 1–12, Marseille, France, 2008.
  43. M. Ozuysal, P. Fua, and V. Lepetit, “Fast keypoint recognition in ten lines of code,” in 2007 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8, Minneapolis, MN, USA, 2007.
  44. D. G. Lowe, “Object recognition from local scale-invariant features,” in Proceedings of the Seventh IEEE International Conference on Computer Vision, pp. 1150–1157, Kerkyra, Greece, 1999.
  45. D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” International Journal of Computer Vision, vol. 60, pp. 91–110, 2004.
  46. P. Scovanner, S. Ali, and M. Shah, “A 3-dimensional sift descriptor and its application to action recognition,” in Proceedings of the 15th ACM international conference on Multimedia, pp. 357–360, Augsburg, Germany, 2007.
  47. H. Bay, A. Ess, T. Tuytelaars, and L. V. Gool, “Speeded-up robust features (SURF),” Computer Vision and Image Understanding, vol. 110, pp. 346–359, 2008.
  48. N. Dalal and B. Triggs, “Histograms of oriented gradients for human detection,” in 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), pp. 886–893, San Diego, CA, USA, 2005.
  49. W. L. Lu and J. J. Little, “Simultaneous tracking and action recognition using the pca-hog descriptor,” in The 3rd Canadian Conference on Computer and Robot Vision (CRV'06), p. 6, Quebec, Canada, 2006.
  50. I. Laptev, M. Marszalek, C. Schmid, and B. Rozenfeld, “Learning realistic human actions from movies,” in 2008 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8, Anchorage, AK, USA, 2008.
  51. H. Wang, A. Kläser, C. Schmid, and C. L. Liu, “Action recognition by dense trajectories,” in 2011 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3169–3176, Colorado Springs, CO, USA, 2011.
  52. G. Farnebäck, “Two-frame motion estimation based on polynomial expansion,” in Image Analysis, pp. 363–370, Springer, Halmstad, Sweden, 2003.
  53. H. Wang and C. Schmid, “Action recognition with improved trajectories,” in 2013 IEEE International Conference on Computer Vision, pp. 3551–3558, Sydney, Australia, 2013.
  54. M. A. Fischler and R. C. Bolles, “Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography,” Communications of the ACM, vol. 24, pp. 381–395, 1981.
  55. Y. Shi, Y. Tian, Y. Wang, and T. Huang, “Sequential deep trajectory descriptor for action recognition with three-stream CNN,” IEEE Transactions on Multimedia, vol. 19, no. 7, pp. 1510–1520, 2017.
  56. X. Wang, L. Wang, and Y. Qiao, “A comparative study of encoding, pooling and normalization methods for action recognition,” in Computer Vision--Asian Conference on Computer Vision 2012, pp. 572–585, Springer, Sydney, Australia, 2012.
  57. K. Chatfield, V. S. Lempitsky, A. Vedaldi, and A. Zisserman, “The devil is in the details: an evaluation of recent feature encoding methods,” in British Machine Vision Conference, p. 8, University of Dundee, 2011.
  58. X. Zhen and L. Shao, “Action recognition via spatio-temporal local features: a comprehensive study,” Image and Vision Computing, vol. 50, pp. 1–13, 2016.
  59. J. Sivic and A. Zisserman, “Video Google: a text retrieval approach to object matching in videos,” in Proceedings Ninth IEEE International Conference on Computer Vision, pp. 1470–1477, Nice, France, 2003.
  60. J. C. V. Gemert, J. M. Geusebroek, C. J. Veenman, and A. W. M. Smeulders, “Kernel codebooks for scene categorization,” in Computer Vision--European Conference on Computer Vision 2008, pp. 696–709, Springer, Marseille, France, 2008.
  61. J. C. V. Gemert, C. J. Veenman, A. W. M. Smeulders, and J. M. Geusebroek, “Visual word ambiguity,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, pp. 1271–1283, 2010.
  62. J. Yang, K. Yu, Y. Gong, and T. Huang, “Linear spatial pyramid matching using sparse coding for image classification,” in 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1794–1801, Miami, FL, USA, 2009.
  63. K. Yu, T. Zhang, and Y. Gong, “Nonlinear learning using local coordinate coding,” in Advances in Neural Information Processing Systems, pp. 2223–2231, Vancouver, British Columbia, Canada, 2009.
  64. J. Wang, J. Yang, K. Yu, F. Lv, T. Huang, and Y. Gong, “Locality-constrained linear coding for image classification,” in 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 3360–3367, San Francisco, CA, USA, 2010.
  65. F. Perronnin, J. Sánchez, and T. Mensink, “Improving the fisher kernel for large-scale image classification,” in Computer Vision--European Conference on Computer Vision 2010, pp. 143–156, Springer, Heraklion, Crete, Greece, 2010.
  66. A. Coates, A. Y. Ng, and H. Lee, “An analysis of single-layer networks in unsupervised feature learning,” in Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, pp. 215–223, Fort Lauderdale, USA, 2011.
  67. H. Jégou, M. Douze, C. Schmid, and P. Pérez, “Aggregating local descriptors into a compact image representation,” in 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 3304–3311, San Francisco, CA, USA, 2010.
  68. H. Jégou, F. Perronnin, M. Douze, J. Sanchez, P. Perez, and C. Schmid, “Aggregating local image descriptors into compact codes,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, pp. 1704–1716, 2012.
  69. X. Zhou, K. Yu, T. Zhang, and T. S. Huang, “Image classification using super-vector coding of local image descriptors,” in Computer Vision--European Conference on Computer Vision 2010, pp. 141–154, Springer, Heraklion, Crete, Greece, 2010.
  70. K. Yu and T. Zhang, “Improved local coordinate coding using local tangents,” in Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 1215–1222, Haifa, Israel, 2010.
  71. L. Liu, L. Wang, and X. Liu, “In defense of soft-assignment coding,” in 2011 International Conference on Computer Vision, pp. 2486–2493, Barcelona, Spain, 2011.
  72. Y. Huang, K. Huang, Y. Yu, and T. Tan, “Salient coding for image classification,” in Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference, pp. 1753–1760, Colorado Springs, CO, USA, 2011.
  73. Z. Wu, Y. Huang, L. Wang, and T. Tan, “Group encoding of local features in image classification,” in Proceedings of the 21st International Conference on Pattern Recognition (ICPR2012), pp. 1505–1508, Tsukuba, Japan, 2012.
  74. O. Oreifej and Z. Liu, “Hon4d: histogram of oriented 4d normals for activity recognition from depth sequences,” in 2013 IEEE Conference on Computer Vision and Pattern Recognition, pp. 716–723, Portland, OR, USA, 2013.
  75. J. Shotton, T. Sharp, A. Kipman et al., “Real-time human pose recognition in parts from single depth images,” Communications of the ACM, vol. 56, pp. 116–124, 2013.
  76. W. Li, Z. Zhang, and Z. Liu, “Action recognition based on a bag of 3d points,” in 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, pp. 9–14, San Francisco, CA, USA, 2010.
  77. Y. Zhao, Z. Liu, L. Yang, and H. Cheng, “Combing rgb and depth map features for human activity recognition,” in Proceedings of The 2012 Asia Pacific Signal and Information Processing Association Annual Summit and Conference, pp. 1–4, Hollywood, CA, USA, 2012.
  78. X. Yang, C. Zhang, and Y. Tian, “Recognizing actions using depth motion maps-based histograms of oriented gradients,” in Proceedings of the 20th ACM International Conference on Multimedia, pp. 1057–1060, Nara, Japan, 2012.
  79. X. Yang and Y. Tian, “Super normal vector for human activity recognition with depth cameras,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, pp. 1028–1039, 2017.
  80. A. Jalal, S. Kamal, and D. Kim, “A depth video-based human detection and activity recognition using multi-features and embedded hidden Markov models for health care monitoring systems,” International Journal of Interactive Multimedia and Artificial Intelligence, vol. 4, no. 4, p. 54, 2017.
  81. H. Chen, G. Wang, J. H. Xue, and L. He, “A novel hierarchical framework for human action recognition,” Pattern Recognition, vol. 55, pp. 148–159, 2016.
  82. J. Wang, Z. Liu, Y. Wu, and J. Yuan, “Learning actionlet ensemble for 3D human action recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 36, pp. 914–927, 2014.
  83. A. Shahroudy, T. T. Ng, Q. Yang, and G. Wang, “Multimodal multipart learning for action recognition in depth videos,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 10, pp. 2123–2129, 2016.
  84. L. Xia, C. C. Chen, and J. K. Aggarwal, “View invariant human action recognition using histograms of 3d joints,” in 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, pp. 20–27, Providence, RI, USA, 2012.
  85. X. Yang and Y. Tian, “Eigenjoints-based action recognition using naive-bayes-nearest-neighbor,” in 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, pp. 14–19, Providence, RI, USA, 2012.
  86. O. Boiman, E. Shechtman, and M. Irani, “In defense of nearest-neighbor based image classification,” in 2008 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8, Anchorage, AK, USA, 2008.
  87. W. Zhu, C. Lan, J. Xing et al., Co-Occurrence Feature Learning for Skeleton Based Action Recognition Using Regularized Deep LSTM Networks, arXiv preprint, AAAI, Phoenix, Arizona, USA, vol. 2, p. 8, 2016, http://arxiv.org/abs/1603.07772.
  88. A. Chaaraoui, J. Padilla-Lopez, and F. Flórez-Revuelta, “Fusion of skeletal and silhouette-based features for human action recognition with rgb-d devices,” in 2013 IEEE International Conference on Computer Vision Workshops, pp. 91–97, Sydney, Australia, 2013.
  89. A. Y. Ng and M. I. Jordan, “On discriminative vs. generative classifiers: a comparison of logistic regression and naive bayes,” Advances in Neural Information Processing Systems, vol. 14, p. 841, 2002.
  90. E. Shechtman and M. Irani, “Space-time behavior based correlation,” in 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), pp. 405–412, San Diego, CA, USA, 2005.
  91. M. D. Rodriguez, J. Ahmed, and M. Shah, “Action mach a spatio-temporal maximum average correlation height filter for action recognition,” in 2008 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8, Anchorage, AK, USA, 2008.
  92. T. Darrell and A. Pentland, “Space-time gestures,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 335–340, New York, NY, USA, 1993.
  93. A. Veeraraghavan, A. K. Roy-Chowdhury, and R. Chellappa, “Matching shape sequences in video with applications in human movement analysis,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, pp. 1896–1909, 2005.
  94. J. Yamato, J. Ohya, and K. Ishii, “Recognizing human action in time-sequential images using hidden markov model,” in Proceedings 1992 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 379–385, Champaign, IL, USA, 1992.
  95. P. Natarajan and R. Nevatia, “Online, real-time tracking and recognition of human actions,” in 2008 IEEE Workshop on Motion and Video Computing, pp. 1–8, Copper Mountain, CO, USA, 2008.
  96. S. Hongeng and R. Nevatia, “Large-scale event detection using semi-hidden markov models,” in Proceedings Ninth IEEE International Conference on Computer Vision, pp. 1455–1462, Nice, France, 2003.
  97. N. M. Oliver, B. Rosario, and A. P. Pentland, “A Bayesian computer vision system for modeling human interactions,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, pp. 831–843, 2000.
  98. T. V. Duong, H. H. Bui, D. Q. Phung, and S. Venkatesh, “Activity recognition and abnormality detection with the switching hidden semi-markov model,” in 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), pp. 838–845, San Diego, CA, USA, 2005.
  99. P. Ramesh and J. G. Wilpon, “Modeling state durations in hidden Markov models for automatic speech recognition,” in [Proceedings] ICASSP-92: 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 381–384, San Francisco, CA, USA, 1992.
  100. Y. Luo, T. D. Wu, and J. N. Hwang, “Object-based analysis and interpretation of human motion in sports video sequences by dynamic Bayesian networks,” Computer Vision and Image Understanding, vol. 92, pp. 196–216, 2003.
  101. H. I. Suk, B. K. Sin, and S. W. Lee, “Hand gesture recognition based on dynamic Bayesian network framework,” Pattern Recognition, vol. 43, pp. 3059–3072, 2010.
  102. M. Brand, N. Oliver, and A. Pentland, “Coupled hidden Markov models for complex action recognition,” in Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 994–999, San Juan, Puerto Rico, USA, 1997.
  103. S. Park and J. K. Aggarwal, “A hierarchical Bayesian network for event recognition of human actions and interactions,” Multimedia Systems, vol. 10, pp. 164–179, 2004.
  104. V. Vapnik, S. E. Golowich, and A. Smola, “Support vector method for function approximation, regression estimation, and signal processing,” in Advances in Neural Information Processing Systems, vol. 9, pp. 281–287, 1996.
  105. C. Schüldt, I. Laptev, and B. Caputo, “Recognizing human actions: a local SVM approach,” in Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004, pp. 32–36, Cambridge, UK, 2004.
  106. D. L. Vail, M. M. Veloso, and J. D. Lafferty, “Conditional random fields for activity recognition,” in Proceedings of the 6th International Joint Conference on Autonomous Agents and Multiagent Systems, p. 235, Honolulu, Hawaii, 2007.
  107. P. Natarajan and R. Nevatia, “View and scale invariant action recognition using multiview shape-flow models,” in 2008 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8, Anchorage, AK, USA, 2008.
  108. H. Ning, W. Xu, Y. Gong, and T. Huang, “Latent pose estimator for continuous action recognition,” in Computer Vision--European Conference on Computer Vision 2008, pp. 419–433, Springer, Marseille, France, 2008.
  109. L. Sigal and M. J. Black, Humaneva: Synchronized Video and Motion Capture Dataset for Evaluation of Articulated Human Motion, Brown University TR, p. 120, 2006.
  110. S. Min, B. Lee, and S. Yoon, “Deep learning in bioinformatics,” Briefings in Bioinformatics, vol. 17, 2016.
  111. Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, pp. 436–444, 2015.
  112. G. Luo, S. Dong, K. Wang, and H. Zhang, “Cardiac left ventricular volumes prediction method based on atlas location and deep learning,” in 2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pp. 1604–1610, Shenzhen, China, 2016.
  113. L. Mo, F. Li, Y. Zhu, and A. Huang, “Human physical activity recognition based on computer vision with deep learning model,” in 2016 IEEE International Instrumentation and Measurement Technology Conference Proceedings, pp. 1–6, Taipei, Taiwan, 2016.
  114. P. Wang, W. Li, Z. Gao, J. Zhang, C. Tang, and P. O. Ogunbona, “Action recognition from depth maps using deep convolutional neural networks,” IEEE Transactions on Human-Machine Systems, vol. 46, no. 4, pp. 498–509, 2016.
  115. K. Simonyan and A. Zisserman, “Two-stream convolutional networks for action recognition in videos,” in Advances in Neural Information Processing Systems, pp. 568–576, Montreal, Quebec, Canada, 2014.
  116. Y. Li, W. Li, V. Mahadevan, and N. Vasconcelos, “Vlad3: encoding dynamics of deep features for action recognition,” in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1951–1960, Las Vegas, NV, USA, 2016.
  117. L. Wang, Y. Qiao, and X. Tang, “Action recognition with trajectory-pooled deep-convolutional descriptors,” in 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4305–4314, Boston, MA, USA, 2015.
  118. S. Ji, W. Xu, M. Yang, and K. Yu, “3D convolutional neural networks for human action recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, pp. 221–231, 2013.
  119. S. J. Berlin and M. John, “Human interaction recognition through deep learning network,” in 2016 IEEE International Carnahan Conference on Security Technology (ICCST), pp. 1–4, Orlando, FL, USA, 2016.
  120. Z. Huang, C. Wan, T. Probst, and L. V. Gool, Deep Learning on Lie Groups for Skeleton-Based Action Recognition, arXiv preprint, Cornell University Library, Ithaca, NY, USA, 2016, http://arxiv.org/abs/1612.05877.
  121. R. Kiros, Y. Zhu, R. R. Salakhutdinov et al., “Skip-thought vectors,” in Advances in Neural Information Processing Systems, pp. 3294–3302, Montreal, Quebec, Canada, 2015.
  122. J. Li, M.-T. Luong, and D. Jurafsky, A Hierarchical Neural Autoencoder for Paragraphs and Documents, arXiv preprint, Cornell University Library, Ithaca, NY, USA, 2015, http://arxiv.org/abs/1506.01057.
  123. A. Grushin, D. D. Monner, J. A. Reggia, and A. Mishra, “Robust human action recognition via long short-term memory,” in The 2013 International Joint Conference on Neural Networks (IJCNN), pp. 1–8, Dallas, TX, USA, 2013.
  124. M. Baccouche, F. Mamalet, C. Wolf, C. Garcia, and A. Baskurt, “Action classification in soccer videos with long short-term memory recurrent neural networks,” in International Conference Artificial Neural Networks, pp. 154–159, Thessaloniki, Greece, 2010.
  125. V. Veeriah, N. Zhuang, and G. J. Qi, “Differential recurrent neural networks for action recognition,” in 2015 IEEE International Conference on Computer Vision (ICCV), pp. 4041–4049, Santiago, Chile, 2015.
  126. Y. Du, W. Wang, and L. Wang, “Hierarchical recurrent neural network for skeleton based action recognition,” in 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1110–1118, Boston, MA, USA, 2015.
  127. R. Arroyo, J. J. Yebes, L. M. Bergasa, I. G. Daza, and J. Almazán, “Expert video-surveillance system for real-time detection of suspicious behaviors in shopping malls,” Expert Systems with Applications, vol. 42, pp. 7991–8005, 2015.
  128. G. Welch and G. Bishop, An Introduction to the Kalman Filter, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA, 1995.
  129. N. J. Gordon, D. J. Salmond, and A. F. M. Smith, “Novel approach to nonlinear/non-Gaussian Bayesian state estimation,” in IEE Proceedings F - Radar and Signal Processing, pp. 107–113, London, UK, 1993.
  130. H. Zhou, M. Fei, A. Sadka, Y. Zhang, and X. Li, “Adaptive fusion of particle filtering and spatio-temporal motion energy for human tracking,” Pattern Recognition, vol. 47, pp. 3552–3567, 2014.
  131. A. A. Vijay and A. K. Johnson, “An integrated system for tracking and recognition using Kalman filter,” in 2014 International Conference on Control, Instrumentation, Communication and Computational Technologies (ICCICCT), pp. 1065–1069, Kanyakumari, India, 2014.
  132. P. Sarkar, Sequential Monte Carlo Methods in Practice, Taylor & Francis, Oxfordshire, UK, 2003.
  133. I. Ali and M. N. Dailey, “Multiple human tracking in high-density crowds,” Image and Vision Computing, vol. 30, pp. 966–977, 2012.
  134. P. Feng, W. Wang, S. Dlay, S. M. Naqvi, and J. Chambers, “Social force model-based MCMC-OCSVM particle PHD filter for multiple human tracking,” IEEE Transactions on Multimedia, vol. 19, pp. 725–739, 2017.
  135. P. Feng, W. Wang, S. M. Naqvi, and J. Chambers, “Adaptive retrodiction particle PHD filter for multiple human tracking,” IEEE Signal Processing Letters, vol. 23, pp. 1592–1596, 2016.
  136. P. Feng, W. Wang, S. M. Naqvi, S. Dlay, and J. A. Chambers, “Social force model aided robust particle PHD filter for multiple human tracking,” in 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4398–4402, Shanghai, China, 2016.
  137. P. Feng, M. Yu, S. M. Naqvi, W. Wang, and J. A. Chambers, “A robust student’s-t distribution PHD filter with OCSVM updating for multiple human tracking,” in 2015 23rd European Signal Processing Conference (EUSIPCO), pp. 2396–2400, Nice, France, 2015.
  138. P. Feng, Enhanced Particle PHD Filtering for Multiple Human Tracking, School of Electrical and Electronic Engineering, Newcastle University, Newcastle upon Tyne, UK, 2016.
  139. D. Comaniciu, V. Ramesh, and P. Meer, “Kernel-based object tracking,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, pp. 564–577, 2003.
  140. D. Comaniciu and P. Meer, “Mean shift: a robust approach toward feature space analysis,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, pp. 603–619, 2002.
  141. L. Hou, W. Wan, K. Han, R. Muhammad, and M. Yang, “Human detection and tracking over camera networks: a review,” in 2016 International Conference on Audio, Language and Image Processing (ICALIP), pp. 574–580, Shanghai, China, 2016.
  142. C. Liu, C. Hu, and J. K. Aggarwal, “Eigenshape kernel based mean shift for human tracking,” in 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops), pp. 1809–1816, Barcelona, Spain, 2011.
  143. A. Yilmaz, “Object tracking by asymmetric kernel mean shift with automatic scale and orientation selection,” in 2007 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–6, Minneapolis, MN, USA, 2007.
  144. D. Yuan-ming, W. Wei, L. Yi-ning, and Z. Guo-xuan, “Enhanced mean shift tracking algorithm based on evolutive asymmetric kernel,” in 2011 International Conference on Multimedia Technology, pp. 5394–5398, Hangzhou, China, 2011.
  145. C. Liu, Y. Wang, and S. Gao, “Adaptive shape kernel-based mean shift tracker in robot vision system,” Computational Intelligence and Neuroscience, vol. 2016, Article ID 6040232, 8 pages, 2016.
  146. K. H. Lee, J. N. Hwang, G. Okopal, and J. Pitton, “Ground-moving-platform-based human tracking using visual SLAM and constrained multiple kernels,” IEEE Transactions on Intelligent Transportation Systems, vol. 17, pp. 3602–3612, 2016.
  147. Z. Tang, J. N. Hwang, Y. S. Lin, and J. H. Chuang, “Multiple-kernel adaptive segmentation and tracking (MAST) for robust object tracking,” in 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1115–1119, Shanghai, China, 2016.
  148. C. T. Chu, J. N. Hwang, H. I. Pai, and K. M. Lan, “Tracking human under occlusion based on adaptive multiple kernels with projected gradients,” IEEE Transactions on Multimedia, vol. 15, pp. 1602–1615, 2013.
  149. L. Hou, W. Wan, K. H. Lee, J. N. Hwang, G. Okopal, and J. Pitton, “Robust human tracking based on DPM constrained multiple-kernel from a moving camera,” Journal of Signal Processing Systems, vol. 86, pp. 27–39, 2017.
  150. S. Mitra and T. Acharya, “Gesture recognition: a survey,” IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), vol. 37, pp. 311–324, 2007.
  151. S. Mulroy, J. Gronley, W. Weiss, C. Newsam, and J. Perry, “Use of cluster analysis for gait pattern classification of patients in the early and late recovery phases following stroke,” Gait & Posture, vol. 18, pp. 114–125, 2003.
  152. Z. Ren, J. Yuan, J. Meng, and Z. Zhang, “Robust part-based hand gesture recognition using kinect sensor,” IEEE Transactions on Multimedia, vol. 15, pp. 1110–1120, 2013.
  153. S. Fothergill, H. Mentis, P. Kohli, and S. Nowozin, “Instructing people for training gestural interactive systems,” in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 1737–1746, Austin, Texas, USA, 2012.
  154. P. J. Phillips, S. Sarkar, I. Robledo, P. Grother, and K. Bowyer, The Gait Identification Challenge Problem: Data Sets and Baseline Algorithm, IEEE, Quebec City, Quebec, Canada, 2002.
  155. R. T. Collins, R. Gross, and J. Shi, “Silhouette-based human identification from body shape and gait,” in Proceedings of Fifth IEEE International Conference on Automatic Face Gesture Recognition, pp. 366–371, Washington, DC, USA, 2002.
  156. K. Okuma, A. Taleghani, N. D. Freitas, J. J. Little, and D. G. Lowe, “A boosted particle filter: multitarget detection and tracking,” in European Conference Computer Vision, pp. 28–39, Prague, Czech Republic, 2004.
  157. V. D. Shet, V. S. N. Prasad, A. M. Elgammal, Y. Yacoob, and L. S. Davis, “Multi-cue exemplar-based nonparametric model for gesture recognition,” in Proceedings of the Seventh Indian Conference on Computer Vision, Graphics and Image Processing, pp. 656–662, Kolkata, India, 2004.
  158. M. Marszalek, I. Laptev, and C. Schmid, “Actions in context,” in 2009 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2009, pp. 2929–2936, Miami, FL, USA, 2009.
  159. L. Ballan, M. Bertini, A. D. Bimbo, and G. Serra, “Action categorization in soccer videos using string kernels,” in 2009 Seventh International Workshop on Content-Based Multimedia Indexing, pp. 13–18, Chania, Crete, Greece, 2009.
  160. J. Wang, Z. Liu, Y. Wu, and J. Yuan, “Mining actionlet ensemble for action recognition with depth cameras,” in 2012 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1290–1297, Providence, RI, USA, 2012.
  161. J. C. Niebles, C.-W. Chen, and L. Fei-Fei, “Modeling temporal structure of decomposable motion segments for activity classification,” in European Conference on Computer Vision, pp. 392–405, Heraklion, Crete, Greece, 2010.
  162. K. K. Reddy and M. Shah, “Recognizing 50 human action categories of web videos,” Machine Vision and Applications, vol. 24, pp. 971–981, 2013.
  163. B. Ni, G. Wang, and P. Moulin, “Rgbd-hudaact: a color-depth video database for human daily activity recognition,” in 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops), pp. 193–208, Springer, Barcelona, Spain, 2013.
  164. J. Sung, C. Ponce, B. Selman, and A. Saxena, “Unstructured human activity detection from rgbd images,” in 2012 IEEE International Conference on Robotics and Automation, pp. 842–849, Saint Paul, MN, USA, 2012.
  165. P. Wang, W. Li, Z. Gao, C. Tang, J. Zhang, and P. Ogunbona, “ConvNets-based action recognition from depth maps through virtual cameras and Pseudocoloring,” in Proceedings of the 23rd ACM International Conference on Multimedia, pp. 1119–1122, Brisbane, Australia, 2015.
  166. F. Ofli, R. Chaudhry, G. Kurillo, R. Vidal, and R. Bajcsy, “Berkeley MHAD: a comprehensive multimodal human action database,” in 2013 IEEE Workshop on Applications of Computer Vision (WACV), pp. 53–60, Tampa, FL, USA, 2013.
  167. M. Müller, T. Röder, M. Clausen, B. Eberhardt, B. Krüger, and A. Weber, Documentation Mocap Database hdm05, Universität Bonn, D-53117 Bonn, Germany, 2007.
  168. K. Yun, J. Honorio, D. Chattopadhyay, T. L. Berg, and D. Samaras, “Two-person interaction detection using body-pose features and multiple instance learning,” in 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, pp. 28–35, Providence, RI, USA, 2012.
  169. M. S. Ryoo and J. K. Aggarwal, “UT-interaction dataset, ICPR contest on semantic description of human activities (SDHA),” in IEEE International Conference on Pattern Recognition Workshops, p. 4, Istanbul, Turkey, 2010.
  170. V. Bloom, D. Makris, and V. Argyriou, “G3d: a gaming action dataset and real time action recognition evaluation framework,” in 2012 IEEE Computer Society Conference on Computer Vision and Pattern, pp. 7–12, Providence, RI, USA, 2012.
  171. A. Shahroudy, J. Liu, T. T. Ng, and G. Wang, “NTU RGB+D: a large scale dataset for 3D human activity analysis,” in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1010–1019, Las Vegas, NV, USA, 2016.
  172. K. Soomro, A. R. Zamir, and M. Shah, UCF101: A Dataset of 101 Human Actions Classes from Videos in the Wild, arXiv Prepr, Cornell University Library, Ithaca, NY, USA, 2012, http://arxiv.org/abs/1212.0402.
  173. J. Wang, Z. Liu, J. Chorowski, Z. Chen, and Y. Wu, “Robust 3d action recognition with random occupancy patterns,” in Computer Vision - ECCV 2012, pp. 872–885, Springer, Florence, Italy, 2012.
  174. A. Jalal, S. Kamal, and D. Kim, “Individual detection-tracking-recognition using depth activity images,” in 2015 12th International Conference on Ubiquitous Robots and Ambient Intelligence (URAI), pp. 450–455, Goyang, South Korea, 2015.