Special Issue: Enabling Technologies for Smart Mobile Services
Research Article | Open Access
Arm Motion Recognition and Exercise Coaching System for Remote Interaction
Arm motion recognition and its related applications have become a promising human-computer interaction modality owing to the rapid integration of digital sensors into modern mobile phones. We implement a mobile-phone-based arm motion recognition and exercise coaching system that helps people carrying mobile phones to exercise anywhere at any time, especially people who have very limited spare time and constantly travel between cities. We first design an improved k-means algorithm to cluster the collected 3-axis acceleration and gyroscope data of a person’s actions into basic motions. A learning method based on the Hidden Markov Model is then designed to classify and recognize continuous arm motions of both learners and coaches; it also measures the similarity between their actions. We implement the system on an MIUI 2S mobile phone and evaluate its performance and recognition accuracy.
Human arm motions play an important role, not only in manipulating objects but also in interacting with other people [1, 2], and are commonly used in daily communication. Arm motion, combined with gesture recognition, is therefore used extensively in many scenarios, such as computer games, machinery control, and full mouse replacement. The ability of smart devices to recognize arm motion can thus greatly improve quality of life and enable many interactive applications, such as remote exercise coaching.
Owing to the low cost and outstanding wireless sensing and communication capabilities of modern sensors and smartphones, using the wireless communication resources of smart mobile phones to recognize arm motion and support remote human interaction has become a growing trend in mobile pervasive computing. It provides a light and flexible platform for remote person-to-person interaction, especially for busy people with very limited spare time.
Arm motion recognition through sensor-based and vision-based techniques has been widely studied. The sensors used include accelerometers, gyroscopes, RFID transmitters, WiFi, and Bluetooth modules. The vision-based methods [6, 7] obtain arm motion images through cameras, extract features, and analyze the actions performed by persons. Hidden Markov Models (HMMs) and their variants [3, 8, 9] and Dynamic Bayesian Network algorithms are used to improve recognition accuracy. Several commercial tools and systems have also been implemented for arm motion and gesture recognition, such as Xbox, Kinect, Gesture Watch, RisQ, and LVQ; these often rely on expensive devices (e.g., hand-worn wristwatch, wristband, and data glove) and pay little attention to wide applicability.
In general, existing arm motion recognition methods suffer from the following limitations. First, sensor-based and vision-based methods require extra infrastructure, such as RFID transmitters and stereoscopic cameras, to be deployed in the surrounding environment. This adds hardware expense and makes the systems unsuitable for people who travel frequently between cities and need to exercise anywhere at any time with light portable devices. Second, current systems and methods mainly focus on recognizing a single arm motion and pay little attention to identifying continuous actions. Third, current methods assume that the interaction between persons takes place locally and neglect remote person-to-person interaction between persons in different places, which may be very helpful to people who travel frequently between cities.
To address the above limitations without any expensive extra devices, we have implemented a light mobile-phone-based arm motion recognition system for 3D environments, called AMRECS, which is flexible and can be used by people who constantly travel between cities. We illustrate the system in Figure 1. Suppose that Alice is a Yoga coach who has set up a training class in Hangzhou, and Bob is one of her learners. Bob is now on a business trip in Shanghai while Alice is teaching other students in Hangzhou. Bob happens to have spare time at the hotel and wants to learn Yoga from Alice so as to keep up with the teaching schedule. AMRECS can help Bob do this. In AMRECS, both Alice and Bob only need to run the AMRECS app on the mobile phones held in their hands.
During the teaching process, Alice’s actions are captured by the accelerometer and gyroscope sensors embedded in her smartphone and transferred to a remote backend server over WiFi. Upon receiving the data, the backend server runs a k-means method to divide the actions into clusters and an HMM-based method to recognize the arm motions and generate the motion curves of the coach, Alice. The coach’s actions are transferred through the Internet and displayed on Bob’s iPad. The student, Bob, then performs the same motions by following the coach’s motion trace shown on his iPad. His actions are also delivered to the backend server through the Internet, and the server measures the similarity between the student’s and the coach’s actions. In this way, Bob immediately knows whether his actions are correct and can learn better.
However, designing and implementing such a light mobile-phone-based arm motion recognition system raises several technical challenges. First, because of noise in the accelerometer and gyroscope sensors, the received arm motion traces may be disturbed and may deviate from the true actions. Second, we need to differentiate the start point from the end point of actions in three-dimensional (3D) environments without any standard reference coordinates. Third, effective methods are lacking for comparing the similarity of persons’ action traces in 3D environments.
To handle these technical challenges, we build a basic arm motion library and divide a person’s actions into a set of elemental actions by designing an improved k-means method. We also present an HMM-based algorithm for recognizing arm motions and measuring the action similarity between the coach and his/her students. Finally, we implement the system on mobile phones and evaluate its effectiveness and flexibility.
The rest of this paper is organized as follows. We discuss related work in Section 2. Section 3 presents our algorithms for noise removal, arm motion recognition, and similarity comparison. The system architecture is given in Section 4, and we conduct experiments to evaluate the system in Section 5. Finally, we conclude this paper in Section 6.
2. Related Work
Arm motion recognition, especially in the context of smart environments, has been an important research topic. According to how the input devices collect information, it can be roughly divided into two categories: vision-based recognition and sensor-based recognition. Vision-based recognition has been studied extensively for human interaction and usually adopts one or more video cameras to capture and recognize the arm motion trace; please refer to [10, 15–19] for more details. Sensor-based recognition uses different sensors (e.g., accelerometers, gyroscopes, and body-worn sensors [23, 24]) to perceive position and orientation data and translate the data into coordinates and angles. Given the focus of the present work, we concentrate on state-of-the-art sensor-based recognition methods.
To date, there are several motion-sensor-based applications and sensor-based gesture recognition systems, such as the Nintendo Wii Remote [25, 26], data gloves, body-worn sensor-based systems (e.g., wristwatches [27–32]), and RFID-based systems. More and more researchers instrument the environment with different kinds of sensors, such as body-worn accelerometers and RFID tags, to detect, collect, and recognize human arm activities.
Schlömer et al. used a Nintendo Wii Remote controller and a Hidden Markov Model to train and recognize user-chosen arm motions so as to help people interact with systems.
A hand data glove is an electronic device equipped with sensors that perceive the movements of the hand. Data-glove-based motion has been used in sign language processing and training. For example, Kumar et al. used a hand data glove to make paintings and air-write characters in a more real-time environment and with less complexity.
Garcia-Ceja et al. used acceleration data from a wristwatch to identify long-term, complex activities such as cooking, playing sports, and taking medication.
A swimming motion display system for athlete swimmers’ training has also been developed using a wristwatch-style acceleration and gyroscopic sensor device; it consists of a sensing unit and software. The sensing unit, which is attached to the swimmer’s wrist, measures and records the triaxial acceleration and angular velocity of the swimming stroke during training. The software reconstructs the swimming motion from the measured results transmitted by the sensing unit and displays the estimated fluid forces acting on the swimmer’s hand and forearm.
Kratz et al. presented an accurate, efficient method that improves both arm motion detection and classification, making motion input from arm-worn inertial sensors more practical.
Fortmann et al. presented LightWatch, a wearable light display integrated into a common analogue wristwatch without interfering with the functionality of the watch itself; it is intended to raise body awareness by enabling sensor-based measurement, adjustment, and display of a user’s personal exertion level.
A real-time monitoring and alerting system, “Mobilecare Monitor,” has also been reported; it incorporates a wireless wristwatch-based monitoring system for the health surveillance of older adults.
Daisuke et al. provided a motion artifact compensation method for a wristwatch-type photoplethysmography sensor that reduces the artifacts acquired by the sensor during daily healthcare monitoring and sports.
Lu et al. implemented an approach to achieve intensive manipulation of virtual objects using natural hand motions. Park et al. implemented the E-Gesture system for gesture recognition on a hand-worn sensor device and achieved high recognition accuracy in dynamic mobile situations.
Junker et al. presented a method for spotting sporadically occurring arm motions in a continuous data stream from body-worn inertial sensors.
RFID-based approaches have also been proposed for arm motion recognition. For example, Asadzadeh et al. proposed using multiple hypothesis tracking and subtag count information to track the motion patterns of passive RFID tags, which can be used to recognize hand motions and enable interaction with applications in an RFID-enabled environment.
Krigslund et al. proposed a novel method for estimating and tracking the tag orientation in 3D based solely on the physical characteristics of the tag reply, using multiple reader antennas distributed around the interrogation zone.
However, current methods may not be applicable to the scenario discussed in this paper, namely remote coaching with light mobile phones. For example, the sensor-based systems and methods mainly focus on local human-machine interaction and seldom consider remote person-to-person interaction. The RFID-based systems usually require the deployment of static data transceiver stations [33, 38, 39], and users need to carry RFID tags with them, which makes this kind of method unsuitable for businesspeople who travel between cities and carry only portable devices.
In this paper, we implement a light mobile-phone-based system for exercise coaching that does not need any extra static or expensive devices and lets users communicate through mobile phones and portable devices. We also present algorithms for similarity comparison between learners and coaches in noisy environments so as to help learners learn remotely and correct their actions.
3. System Framework
We first define basic hand motions and then illustrate how to perform data preprocessing and smoothing in noisy environments. Next, a k-means algorithm is proposed for clustering hand motions into basic motion groups. Finally, an HMM-based algorithm is proposed for recognizing arm motions and measuring the action similarity between the learner and his coach.
3.1. Basic Arm Motions and Data Smoothing
To quickly capture arm motions for recognition, we define eight basic motions in the motion library, which are listed in Table 1. Each arm motion can then be defined by a sequence of the eight basic motions. For example, a horizontal-to-up motion can be defined by three basic motions in sequence. If each discrete basic motion can be distinguished from the continuous action trace of the hand, we can recognize and deduce the arm motions.
As the data captured by the sensors embedded in mobile phones may contain signal noise or the gyroscope’s accumulated error, we need to perform data preprocessing and smoothing so as to filter the noise and preserve data quality. We use the Savitzky-Golay filter (SG-filter) for data preprocessing and smoothing, which can increase the signal-to-noise ratio without distorting the signal.
A continuous motion can be decomposed into several discrete basic motions according to the time sequence of data acquisition, so the acceleration values acquired in each direction are tied to that time sequence; that is, for each axis they are indexed by the acquisition sequence number. We therefore smooth the acquired data separately for each direction, which reduces the computational complexity.
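Considering $2m+1$ sampling points indexed by $i = -m, \ldots, m$, we denote a group of 3-axis acceleration values by $a_i = (x_i, y_i, z_i)$, where $x_i$, $y_i$, and $z_i$ are the acceleration values at the $i$th sampling point along the x-, y-, and z-axis, respectively. Supposing $A$ is the set of $a_i$ over all the sampling points, we can construct an $n$-order polynomial function $f = (f_x, f_y, f_z)$ to fit $A$, where $f_x$, $f_y$, and $f_z$ are the $n$-order polynomial functions for the x-, y-, and z-axis directions. Taking $f_x$ as an example, the fitting function of the x-axis direction at the $i$th sampling point can be given by
$$f_x(i) = \sum_{k=0}^{n} b_k i^k,$$
where $b_k$ ($k = 0, 1, \ldots, n$) is the fitting coefficient and $i$ is the sampling sequence number.
The error can then be measured by
$$E = \sum_{i=-m}^{m} \left( f_x(i) - x_i \right)^2 = \sum_{i=-m}^{m} \left( \sum_{k=0}^{n} b_k i^k - x_i \right)^2 .$$
To minimize $E$, its partial derivative with respect to each coefficient must vanish:
$$\frac{\partial E}{\partial b_r} = 2 \sum_{i=-m}^{m} \left( \sum_{k=0}^{n} b_k i^k - x_i \right) i^r = 0, \quad r = 0, 1, \ldots, n.$$
We can then obtain that
$$\sum_{k=0}^{n} b_k \sum_{i=-m}^{m} i^{k+r} = \sum_{i=-m}^{m} x_i \, i^r, \quad r = 0, 1, \ldots, n.$$
Given the values of $m$ and $n$, the fitted data can be easily obtained: solving this linear system yields the coefficients $b_k$, and $f_x$ follows as well. Similarly, we can obtain $f_y$, $f_z$, and hence $f$ in the same way.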
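The least-squares fit behind the SG-filter can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in AMRECS: the function names `solve_linear` and `sg_smooth`, the default half-window of 2 samples, and the polynomial order of 2 are all assumptions made for the example. For each window of $2m+1$ samples it builds the normal equations of an order-$n$ polynomial fit and keeps the fitted value at the window centre.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def sg_smooth(samples, half_window=2, order=2):
    """Smooth one axis with a least-squares polynomial fit per window.

    The fitted value at the window centre (index i = 0) equals the
    constant coefficient of the polynomial, so only that one is kept.
    """
    m, n = half_window, order
    idx = range(-m, m + 1)
    # Normal-equation matrix: entry (r, k) = sum_i i^(k+r); same for all windows.
    normal = [[float(sum(i ** (k + r) for i in idx)) for k in range(n + 1)]
              for r in range(n + 1)]
    out = list(samples)  # the first and last m samples are left unsmoothed
    for centre in range(m, len(samples) - m):
        window = samples[centre - m:centre + m + 1]
        rhs = [float(sum(x * i ** r for x, i in zip(window, idx)))
               for r in range(n + 1)]
        out[centre] = solve_linear(normal, rhs)[0]
    return out
```

Each axis (x, y, z) would be passed through `sg_smooth` separately. Data that already follow a polynomial of the chosen order pass through unchanged, which is one way to check the sketch.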
An example is shown in Figure 2 where we smoothed a group of continuous motions. In this example, the coach lets her arm fall naturally, straightens her arm in line with her body, lifts it up till the top of her head, and then comes back to the start point slowly following the same route. We sample and smooth the discrete data acquired by the accelerometers. The smoothed result is shown in Figure 2.
Figure 2: (a) 3-axis acceleration data smoothing; (b) gyroscope data distribution.
From Figure 2(a), we observe that the SG-filter achieves a good smoothing effect: after some noisy data are eliminated, the gathered discrete data mostly lie on or close to the smoothing curve.
Figure 2(b) shows the distribution of the gyroscope data. The two actions can be clearly distinguished if the start point and the end point are known in advance, and this information is used to obtain the motion directions and traces. Combined with the 3-axis acceleration information, both the observation state and the continuous motions can be identified.
3.2. Clustering Algorithm
We use the coordinates of the smoothed data as the input and design an improved k-means clustering algorithm to classify the 3-axis acceleration values of arbitrary motions into the eight basic types. The essence of k-means is stepwise refinement through iteration, which suits our arm motion recognition well. As a discussion of the idea of k-means is beyond the scope of this paper, we refer interested readers to the literature. The algorithm is shown in Algorithm 1.
In Algorithm 1, we take the vertical-downward 3-axis coordinate of the coach as the initial reference value and use the standard gravity acceleration G as the unit of the coordinate, where the sign “−” indicates that the trace of the motion is downward. We first identify the initial clusters according to the eight basic motions. Second, for each coordinate, we compute its distance to each barycenter and assign it to the closest cluster. Third, we update the barycenter of each cluster. If, for all clusters, the distance between the current barycenter and the new barycenter is less than or equal to the threshold, we output all the barycenters; otherwise, the second and third steps are repeated.
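The clustering steps described above can be sketched as follows. This is a hedged illustration, not Algorithm 1 itself: the function name `kmeans`, the Euclidean distance, the default threshold `eps`, and the iteration cap are assumptions for the example. The initial barycenters are supplied by the caller, mirroring the idea of seeding the clusters from the basic motions.

```python
import math


def kmeans(points, initial_centres, eps=1e-4, max_iter=100):
    """Cluster 3-axis coordinates around caller-supplied initial barycenters.

    Stops when no barycenter moves by more than `eps` (an assumed threshold).
    Returns the final barycenters and the cluster index of every point.
    """
    centres = [tuple(c) for c in initial_centres]
    for _ in range(max_iter):
        # Assign each coordinate to its closest barycenter.
        clusters = [[] for _ in centres]
        for p in points:
            nearest = min(range(len(centres)),
                          key=lambda j: math.dist(p, centres[j]))
            clusters[nearest].append(p)
        # Recompute each barycenter as the mean of its members.
        new_centres = []
        for centre, members in zip(centres, clusters):
            if members:
                new_centres.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centres.append(centre)  # keep an empty cluster in place
        shift = max(math.dist(c, n) for c, n in zip(centres, new_centres))
        centres = new_centres
        if shift <= eps:
            break
    labels = [min(range(len(centres)), key=lambda j: math.dist(p, centres[j]))
              for p in points]
    return centres, labels


# Toy data in units of G: two samples near "straight down", two near "horizontal".
points = [(0.0, 0.0, -1.0), (0.1, 0.0, -0.9), (1.0, 0.0, 0.0), (0.9, 0.1, 0.0)]
centres, labels = kmeans(points, [(0.0, 0.0, -1.0), (1.0, 0.0, 0.0)])
```

With eight basic motions, `initial_centres` would hold eight reference coordinates instead of the two used in this toy example.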
For example, we use this algorithm to process twenty groups of 3-axis acceleration and gyroscope data, with each group having a ten-element tuple (acceleration, gyroscope) representing the 8 basic motions (i.e., “acceleration” captures the 3-axis acceleration coordinate of one motion and “gyroscope” captures the motion direction information).
Table 2 shows the 3-axis base coordinate sequences of the eight basic motions obtained by the clustering algorithm. With the 3-axis acceleration coordinates and the gyroscope data, we identify a group of continuous motions and obtain the motion trace.
Figure 3 shows the clustering results of two successive motion sequences. We classify two groups of successive motions: the right-up motion (Figure 3(a)) and the left-up motion (Figure 3(b)). The right-up motion denotes that the coach raises her right arm from a vertical-down position up to her head, and the left-up motion means that she raises her left arm in the same way. To account for possible errors, we use the coordinate of one basic motion as the standard reference value, and each piece of received data is calibrated by a normalization method.
Figure 3: (a) right-up clustering; (b) left-up clustering.
3.3. Motion Recognition
We use the HMM method to perform arm motion recognition; its advantage in motion recognition is that it models human actions as a stochastic process. It defines a finite set of states, each of which is associated with a multidimensional probability distribution. We define the elements of the HMM as follows. The eight basic motions are regarded as hidden symbols, and the observation symbols are composed of the hidden states. We use $M$ to denote the number of observation states. The hidden symbol at time $t$ is denoted by $q_t$ with $q_t \in \{S_1, \ldots, S_N\}$ and $1 \le t \le T$ (where $T$ is the length of the output observation symbol sequence); $N$ is the number of hidden states. The state transition probability matrix is $A = \{a_{ij}\}$, where $a_{ij} = P(q_{t+1} = S_j \mid q_t = S_i)$ is the probability of a transition from state $S_i$ at time $t$ to state $S_j$ at time $t+1$ ($1 \le i, j \le N$), $q_t$ denotes the current hidden symbol, and $a_{ij}$ meets the conditions $a_{ij} \ge 0$ and $\sum_{j=1}^{N} a_{ij} = 1$.
Let $B = \{b_j(k)\}$ be the probability distribution matrix between hidden states and observation states, with $b_j(k)$ being the probability that the observation symbol is $v_k$ at time $t$ when the actual state is $S_j$. It holds that $b_j(k) = P(o_t = v_k \mid q_t = S_j)$ with $1 \le j \le N$, $1 \le k \le M$, and
$$b_j(k) \ge 0, \quad \sum_{k=1}^{M} b_j(k) = 1.$$
Let $\pi = \{\pi_i\}$ denote the set of initial state distributions, where $\pi_i = P(q_1 = S_i)$. We can now define the HMM as $\lambda = (A, B, \pi)$.
3.3.1. Satisfied Conditions
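During the process of identifying single or successive motions, we find that the recognition process meets the Markov property, since the action of the next state always depends on the current state. For example, given the current basic motion, the next motion can only be one of two particular basic motions. For two states $S_i$ and $S_j$ at moments $t$ and $t+1$, we can get
$$P(q_{t+1} = S_j \mid q_t = S_i, q_{t-1}, \ldots, q_1) = P(q_{t+1} = S_j \mid q_t = S_i).$$
Let $O = (o_1, o_2, \ldots, o_T)$ be the observation symbol sequence, with $o_t$ being the observation symbol at time $t$; we get that
$$P(o_t \mid q_t, q_{t-1}, \ldots, q_1, o_{t-1}, \ldots, o_1) = P(o_t \mid q_t).$$
We use the previous clustering results and the observation symbol information to obtain the hidden symbol sequence $Q = (q_1, q_2, \ldots, q_T)$. We compute the conditional probability $P(O \mid \lambda)$ by
$$P(O \mid \lambda) = \sum_{Q} P(O \mid Q, \lambda) \, P(Q \mid \lambda), \tag{9}$$
where the sum runs over the full permutation of all possible hidden symbols and $Q$ denotes one of the possible arrangement sequences of the basic motions in our system.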
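However, as computing (9) directly has high time complexity, we use an iterative recursion to reduce the complexity and define the forward output probability $\alpha_t(i) = P(o_1, \ldots, o_t, q_t = S_i \mid \lambda)$. Writing $b_j(o_t)$ for the probability of emitting the observed symbol $o_t$ in state $S_j$, it holds that $\alpha_1(i) = \pi_i \, b_i(o_1)$ and
$$\alpha_{t+1}(j) = \left[ \sum_{i=1}^{N} \alpha_t(i) \, a_{ij} \right] b_j(o_{t+1}). \tag{10}$$
Now, we can compute $\alpha_T(i)$ by (10) and obtain $P(O \mid \lambda) = \sum_{i=1}^{N} \alpha_T(i)$.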
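To make the complexity argument concrete, the sketch below compares a direct sum over every hidden path with the forward recursion. It assumes the standard HMM notation (initial distribution `pi`, transition matrix `A`, emission matrix `B`, observations given as integer indices); the two-state toy model and the function names are hypothetical and only serve to show that both computations agree.

```python
from itertools import product


def brute_force_probability(obs, pi, A, B):
    """Sum P(O, Q | lambda) over every hidden path Q: about N^T paths."""
    n = len(pi)
    total = 0.0
    for path in product(range(n), repeat=len(obs)):
        p = pi[path[0]] * B[path[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= A[path[t - 1]][path[t]] * B[path[t]][obs[t]]
        total += p
    return total


def forward_probability(obs, pi, A, B):
    """Forward recursion: about N^2 * T multiplications."""
    n = len(pi)
    # Initialisation: alpha_1(i) = pi_i * b_i(o_1)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    # Induction: alpha_{t+1}(j) = (sum_i alpha_t(i) * a_ij) * b_j(o_{t+1})
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    # Termination: P(O | lambda) = sum_i alpha_T(i)
    return sum(alpha)


# Hypothetical two-state, two-symbol model used only for illustration.
PI = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.5], [0.1, 0.9]]
```

For eight hidden states and a sequence of 14 observations, the direct sum would visit $8^{14}$ paths, while the recursion needs on the order of $8^2 \times 14$ multiplications.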
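Finally, we find the most probable hidden symbol sequence $Q^{*}$. Let $\delta_t(i)$ be the probability of the most probable path that ends at the symbol $S_i$ at time $t$. The maximum possible probability at time $t+1$ is $\delta_{t+1}(j)$, and it has
$$\delta_{t+1}(j) = \max_{1 \le i \le N} \left[ \delta_t(i) \, a_{ij} \right] b_j(o_{t+1}),$$
where $\delta_1(i) = \pi_i \, b_i(o_1)$ and $b_j(o_t)$ is the probability of emitting the observed symbol $o_t$ in state $S_j$.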
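The search for the most probable hidden sequence corresponds to the Viterbi recursion, sketched below under the same standard HMM notation. The function name `viterbi` and the two-state toy model are hypothetical; in the system described here the hidden states would be the eight basic motions.

```python
def viterbi(obs, pi, A, B):
    """Return the most probable hidden state path and its probability."""
    n = len(pi)
    # delta_1(i) = pi_i * b_i(o_1)
    delta = [pi[i] * B[i][obs[0]] for i in range(n)]
    backptr = []  # backptr[t][j] = best predecessor of state j at step t+1
    for o in obs[1:]:
        prev = delta
        best_prev = [max(range(n), key=lambda i: prev[i] * A[i][j])
                     for j in range(n)]
        # delta_{t+1}(j) = max_i(delta_t(i) * a_ij) * b_j(o_{t+1})
        delta = [prev[best_prev[j]] * A[best_prev[j]][j] * B[j][o]
                 for j in range(n)]
        backptr.append(best_prev)
    # Backtrack from the most probable final state.
    last = max(range(n), key=lambda i: delta[i])
    path = [last]
    for best_prev in reversed(backptr):
        path.append(best_prev[path[-1]])
    path.reverse()
    return path, delta[last]


# Hypothetical two-state, two-symbol model used only for illustration.
PI = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.5], [0.1, 0.9]]
best_path, best_prob = viterbi([0, 1, 1], PI, A, B)
```

Storing only the best predecessor per state keeps the memory cost at one small list per time step, which is sufficient to recover the whole path at the end.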
3.4. Similarity Comparison
The same motion made by different individuals may look very different because of differences in their height and arm length. For example, when a tall person raises his arm, the motion trace may be long, while a shorter person may produce a short trace. To cope with this, we propose a similarity comparison algorithm to support exercise coaching that removes the influence of differences in persons’ height and arm length.
Before measuring and comparing the motion similarity between the coach and her student, we should ensure that they are moving in the same direction at the same time, either “upward” or “downward”; this can be judged from the acquired gyroscope data combined with the start point and end point, which are known in advance. We then compute the curvature of their motion paths, $\kappa_t$, at a set of time points so as to measure their similarity discretely. $\kappa_t$ is computed by
$$\kappa_t = \frac{\left| y'' \right|}{\left( 1 + y'^2 \right)^{3/2}},$$
where $y = f(t)$ is the acceleration coordinate at time $t$, with $y'$ and $y''$ being the first- and second-order derivatives of $y$, respectively.
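A discrete version of the curvature computation can be sketched as follows. The sketch assumes the usual curvature of a curve $y(t)$, that is $|y''| / (1 + y'^2)^{3/2}$, and approximates the derivatives with central differences over uniformly spaced samples; the function name `curvature` and the sampling interval `dt` are assumptions for illustration.

```python
def curvature(values, dt=1.0):
    """Curvature |y''| / (1 + y'^2)^(3/2) at each interior sample.

    Derivatives are approximated by central differences, so the first and
    last samples have no curvature value and the result is two items shorter.
    """
    kappas = []
    for t in range(1, len(values) - 1):
        d1 = (values[t + 1] - values[t - 1]) / (2 * dt)
        d2 = (values[t + 1] - 2 * values[t] + values[t - 1]) / (dt * dt)
        kappas.append(abs(d2) / (1 + d1 * d1) ** 1.5)
    return kappas
```

A straight segment has zero curvature everywhere, while the vertex of the parabola $y = t^2$ has curvature 2, which gives two quick sanity checks for the sketch.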
To measure the degree of similarity, we first normalize the initial coordinates of both the coach and the student to the same reference position. We use $\kappa_t^{c}$ and $\kappa_t^{s}$ to denote the curvatures of the coach’s and the student’s curves at time $t$, respectively. We then use the square of the difference between them, $d_t = (\kappa_t^{c} - \kappa_t^{s})^2$, as the deviation at time $t$. After obtaining the maximum value $d_{\max}$ and the minimum value $d_{\min}$ from the set of $d_t$, we normalize each $d_t$ and compute the degree of similarity between the two curves, which measures how far the student’s action deviates from the coach’s. This algorithm is given by Algorithm 2.
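The comparison step can be illustrated with a short sketch. The squared curvature difference and the min-max normalization follow the description above; how the normalized values are aggregated into a single degree of similarity is not recoverable from the text, so the rule used here (one minus the mean normalized deviation) is purely an assumption, as is the function name `similarity_degree`.

```python
def similarity_degree(coach_kappa, student_kappa):
    """Compare two curvature sequences sampled at the same time points.

    Returns a value in [0, 1]; 1.0 means no measured deviation.
    """
    # Squared curvature difference at each time point.
    diffs = [(c - s) ** 2 for c, s in zip(coach_kappa, student_kappa)]
    d_min, d_max = min(diffs), max(diffs)
    if d_max == d_min:
        # Degenerate case: every deviation is equal, so min-max scaling is
        # undefined. Treated as fully similar here (a choice made for the sketch).
        return 1.0
    normalised = [(d - d_min) / (d_max - d_min) for d in diffs]
    # Assumed aggregation: one minus the mean normalised deviation.
    return 1.0 - sum(normalised) / len(normalised)
```

Because min-max scaling maps the largest deviation to 1 regardless of its absolute size, a practical system would probably also bound the raw deviations; that refinement is left out of this sketch.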
4. System Architecture
In this section, we present the system architecture and its components. As shown in Figure 4, AMRECS contains three parts: a smartphone for data acquisition and transmission, a server for arm motion recognition and similarity comparison, and a tablet computer for displaying action exercises. In this figure, “BLE” denotes the Bluetooth low energy 4.0 module.
We obtain the 3-axis acceleration coordinates and the orientation data from the 3-axis accelerometer and gyroscope sensors in the smartphone. Its built-in BLE 4.0 and WiFi modules are used for communication, including the connection to the remote backend server.
Most of the computation is shifted to the backend server because of its powerful processing capability. The main functions of the backend server are to receive data from remote smartphones, perform arm motion recognition and similarity comparison, and communicate with the remote tablet computer. The backend server also stores coaching videos in advance for guiding and correcting the students’ actions.
5. Performance Evaluation
In this section, we first present the experiment scenario and then conduct experiments on our HMM-based recognition method. We also evaluate the efficiency of the similarity comparison algorithm.
5.1. Experiments Scenario
We obtain the 3-axis acceleration coordinates and orientation data by using a 3-axis MEMS accelerometer and a 3-axis MEMS gyroscope embedded in an MIUI 2S smartphone running the Android platform. The smartphone is connected to the remote server through its WiFi module, while its BLE 4.0 wireless communication module is mainly used to communicate with the Pad and transfer the gathered data to it.
The arm motion recognition, motion trace generation, and comparison system on the backend server (a Lenovo M6900 workstation with 2 GB of memory and an Intel Core Duo processor) is implemented in Java.
We carried out the experiments in two distant rooms (called Rooms A and B) more than 100 miles apart. The backend server is deployed in Room A, and two Samsung pads are used as the display terminals, one in each room. Two volunteers participated in our experiment, one playing the role of coach in Room A and the other playing the role of learner in Room B. Both volunteers hold their mobile terminals and follow the same routes. We first obtain the coach’s eight discrete basic motions as the training samples. After training, the coach performs a set of continuous actions, and the corresponding data are sent to the backend server for processing.
The student watches and follows the coach’s actions in his room. The actions of the learner and the coach are compared, and the degree of their similarity is measured. The server immediately informs the learner whether his action is correct.
5.2. Data Acquisition
We combine the built-in accelerometer LIS3DH (Figure 5(a)) and gyroscope L3G4D200DH (Figure 5(b)) modules of the MIUI 2S smartphone to acquire 3-axis acceleration and gyroscope angle data. The embedded BLE 4.0 and WiFi communication modules are in charge of establishing connections with the display terminal and the backend server, respectively. To account for the inherent deviation of the accelerometer and gyroscope sensors, the Kalman-filter method is used for data correction. The Android-based MIUI 2S smartphone is built on the APQ8064 quad-core processor, has 16 GB of flash memory and 2 GB of RAM, and embeds the 3-axis accelerometer LIS3DH and the 3-axis gyroscope L3G4D200DH. The LIS3DH has dynamically user-selectable full scales and is capable of measuring accelerations with output data rates from 1 Hz to 5 kHz. The L3G4D200DH is a low-power 3-axis angular rate sensor whose full scale is specified in dps.
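The Kalman-filter correction mentioned above is not specified further, so the following is only a minimal sketch of how a scalar Kalman filter could smooth one sensor axis. It assumes a random-walk state model; the function name `kalman_1d` and the variance values `process_var` and `meas_var` are hypothetical tuning parameters, not values from the system.

```python
def kalman_1d(measurements, process_var=1e-4, meas_var=1e-2, x0=0.0, p0=1.0):
    """Scalar Kalman filter with a random-walk state model.

    x is the current estimate and p its variance; both are updated once
    per measurement. Returns the list of filtered estimates.
    """
    x, p = x0, p0
    estimates = []
    for z in measurements:
        p = p + process_var        # predict: uncertainty grows between samples
        gain = p / (p + meas_var)  # Kalman gain: trust in the new measurement
        x = x + gain * (z - x)     # update the estimate toward the measurement
        p = (1.0 - gain) * p       # update the estimate variance
        estimates.append(x)
    return estimates
```

Each accelerometer and gyroscope axis would be filtered independently in this simplified form; a fuller treatment would model the sensor bias as part of the state.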
The pseudocode for sensor initialization and data acquisition is shown in Pseudocode 1. The initialization includes setting the communication baud rate between the MIUI 2S and the two sensors (here, 38400 bps) and setting the full-scale ranges of the LIS3DH and the L3G4D200DH, the latter to 250 dps. Some essential parameters, such as the zero-offset correction values of the accelerometer and gyroscope, are also defined.
After the communication connection between the MIUI 2S and the two sensors is established, groups of data including 3-axis acceleration values and 3-axis angular rates are sampled and transferred to the MIUI 2S processor and then to the backend server through the WiFi module. The LIS3DH uses separate proof masses for each axis: acceleration along a particular axis induces displacement of the corresponding proof mass, and capacitive sensors detect the displacement differentially. When the MIUI 2S is placed on a flat surface, it measures 0 g on the x- and y-axes and 1 g on the z-axis. The accelerometer’s scale factor is then calibrated and is nominally independent of the supply voltage. When the L3G4D200DH is rotated about any of the sense axes, the three independent vibratory gyroscopes detect rotation about the x-, y-, and z-axes; the Coriolis effect causes a vibration that is detected by a capacitive pick-off. The resulting signal is amplified, demodulated, and filtered to produce a voltage that is proportional to the angular rate. This voltage is digitized using individual on-chip 16-bit Analog-to-Digital Converters (ADCs) to sample each axis.
5.3. Experiments on HMM-Based Recognition
We conduct experiments on arm motion recognition by sampling the 3-axis acceleration and gyroscope data every second while varying consecutive motions, as shown in Figure 6. The coordinate axis represents the basic motions, which are the hidden symbols, denoted by the numbers “1,” “2,” …, “8.” The red dotted arrows represent the action route. For example, one action route in the experiment is the path of motions “3,” “7,” “1,” “5,” “4,” and, after pausing for a while, motions “4,” “5,” and “4” again. The action finally returns to the initial position “3.” We repeat the same experiment 10 times. The labels “①” to “⑩” mark the state transitions from one motion to another, and “I” to “IV” are the observation symbols, which consist of basic motions.
First, we collect ten groups of sample data from the volunteer; each group is composed of a 14-motion transition sequence that includes the 3-axis acceleration and gyroscope data. The data are used for training and for finding the next most probable motion of the volunteer. We can then judge which observation symbol it belongs to and record it in the state sequence. In this way, we compute the state transition probability matrix A and the probability distribution matrix B.
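When the hidden motion and the observation symbol are both known for every training step, one common way to obtain the two matrices is simple frequency counting. The sketch below illustrates that idea under this assumption; it is not necessarily the training procedure used in the experiments, and the function name `estimate_hmm` and the toy sequences are hypothetical.

```python
def estimate_hmm(sequences, n_states, n_obs):
    """Estimate A and B by counting over labelled training sequences.

    `sequences` is a list of sequences of (hidden_state, observation) pairs,
    both given as zero-based integer indices.
    """
    trans = [[0] * n_states for _ in range(n_states)]
    emit = [[0] * n_obs for _ in range(n_states)]
    for seq in sequences:
        for state, obs in seq:
            emit[state][obs] += 1
        for (state, _), (next_state, _) in zip(seq, seq[1:]):
            trans[state][next_state] += 1

    def normalise(rows):
        # Turn each row of counts into probabilities; rows with no data
        # fall back to a uniform distribution.
        result = []
        for row in rows:
            total = sum(row)
            result.append([c / total if total else 1.0 / len(row) for c in row])
        return result

    return normalise(trans), normalise(emit)


# Two short toy sequences with two hidden states and two observation symbols.
toy = [[(0, 0), (1, 1), (0, 0)], [(0, 0), (0, 1)]]
A_hat, B_hat = estimate_hmm(toy, n_states=2, n_obs=2)
```

With ten groups of 14 transitions each, the same function would be called with `n_states=8` and one entry per recorded group.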
After training, we can estimate the observation symbol according to the state sequence and use A and B to find the hidden symbol sequence of maximum probability, which is the successive motion path that we want to recognize; it is shown in Table 3. From the experimental results in Table 3, we can see that only the second symbol, where the hidden symbol should be “7,” is misjudged among the fourteen estimations, so the recognition accuracy is 92.8%.
5.4. Similarity Comparison
Similarity comparison aims at finding the degree of action consistency between the coach and the learner. We construct three experiment scenarios to evaluate our AMRECS system. In the first scenario, the learner performs exactly the same actions as the coach. In the second scenario, the learner’s actions are mostly consistent with the coach’s. In the third scenario, the learner fails to imitate the coach’s actions correctly.
Figure 7: (a) sampling data by left hand; (b) comparison by left hand in the first scenario.
Figure 8: (a) sampling data by right hand; (b) comparison by right hand.
Figure 7(a) shows the learner following the coach’s yoga action with his left hand. Figure 7(b) shows the corresponding degree of similarity between the two persons. There are a few different curvature trends at the corresponding positions (labeled with black dotted lines), which shows where the learner can improve or correct his actions. The overall similarity degree between the two action curves performed by the student and his coach is then calculated. Figure 8(a) shows the person performing the actions with his right hand, and the corresponding similarity degree is shown in Figure 8(b).
Figure 9 shows the second experiment scenario, in which the coach performs a set of continuous actions while the learner performs the same action with his right arm outstretched horizontally at a 45° angle to his body. Figure 9(b) shows that the learner’s actions are not exactly consistent with the coach’s exercise. As a result, the similarity degree is only 65.23%.
Figure 9: (a) sampling data by right hand; (b) comparison by right hand.
Finally, we design two different groups of actions performed by the learner and the coach, respectively. This scenario is shown in Figure 10. Here, we want to test whether our method can detect motions that deviate greatly from the coach’s motions. We therefore let the student perform a group of motions that differ from the coach’s, and we observe that the student’s motions are indeed measured as far from the coach’s: the similarity degree decreases greatly.
Figure 10: (a) Sampling data by right hand; (b) by right hand.
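The third scenario asks whether motions that deviate greatly from the coach's can be found. One simple way to picture this, offered as a sketch under assumptions rather than as the AMRECS method, is to flag contiguous index ranges where the absolute difference between two time-aligned curves exceeds a threshold. The function name `deviating_segments`, the threshold of 0.5, and the sample values are hypothetical.

```python
from typing import List, Sequence, Tuple


def deviating_segments(
    coach: Sequence[float], learner: Sequence[float], threshold: float
) -> List[Tuple[int, int]]:
    """Return (start, end) index ranges, end inclusive, where
    |learner - coach| exceeds the threshold.

    Assumes the two curves are time-aligned and of equal length.
    """
    if len(coach) != len(learner):
        raise ValueError("curves must be time-aligned and of equal length")
    segments: List[Tuple[int, int]] = []
    start = None
    for i, (c, l) in enumerate(zip(coach, learner)):
        if abs(l - c) > threshold:
            if start is None:
                start = i  # a deviating run begins here
        elif start is not None:
            segments.append((start, i - 1))  # the run ended at the previous sample
            start = None
    if start is not None:
        segments.append((start, len(coach) - 1))  # run extends to the last sample
    return segments


# Illustrative (made-up) single-axis samples for coach and learner.
coach_curve = [0.0, 0.5, 1.0, 0.8, 0.3, 0.0, -0.4, -0.2]
learner_curve = [0.1, 0.4, 0.2, 0.1, 0.2, 0.1, 0.5, 0.6]
flagged = deviating_segments(coach_curve, learner_curve, threshold=0.5)
print(flagged)
```

The flagged ranges play the role of the positions marked with dotted lines in the figures: they point the learner to the parts of the exercise that need correction.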
6. Conclusions and Future Work
In this paper, we present a lightweight arm motion recognition and exercise coaching system based on smartphones. Our AMRECS system provides an effective solution for remote wireless interaction. We conduct three groups of experiments to evaluate the efficiency of our AMRECS system. The results show that our system can accurately recognize static and dynamic arm motions. The system provides similarity comparison and measurement so that users can obtain real-time feedback on their exercise actions. For future work, we may add other sophisticated applications, such as Wii and Kinect, and extend our system to other remotely coached sports, for example, aerobics, table tennis, and Chinese Tai-Ji-Quan.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This work is mainly supported by the National 973 Programs (Grant no. 2013CB329102), the National Natural Science Foundation of China (NSFC) (Grants nos. 61190113, 61401135, 61272188, 61572162, and 61402417), the Zhejiang Provincial Natural Science Foundation (Grants nos. LY13F020033, LY12F02005, LQ14F020013, and LY15F020037), the Open Foundation of State Key Laboratory of Networking and Switching Technology in Beijing University of Posts and Telecommunications (Grant no. SKLNST-2013-1-14), and the Open Foundation of State Key Laboratory for Novel Software Technology of Nanjing University (Grant no. KFKT2014B15).
Copyright © 2016 Hong Zeng et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.