Abstract

The aim of this paper is to choose the optimal motion sensor for recognition of selected human activities. In the described studies, different human motion measurement methods were used simultaneously: optoelectronic, video, electromyographic, accelerometric, and pressure sensors. Several analyses of activity recognition were performed: recognition correctness for all activities together, matrices of recognition errors of the individual activities for all volunteers for the individual sensors, and recognition correctness of all activities for each volunteer and each sensor. The experiments made it possible to find the range of interchangeability of the sensors and to choose the most appropriate sensor for recognition of a selected motion.

1. Introduction

Telemetric recording and automatic interpretation of motion activities play a significant role in home monitoring. Among the variety of applications, a few are most common: prevention and detection of falls, detection of abnormal or dangerous situations, rehabilitation monitoring, and activity assessment and quantification. An automatic system usually consists of sensors, specific signal or image processing methods, and a recognition module for the selected activity. Selection of sensors seems to be the most important issue and must take into account practical sensor properties: wearability, sensitivity to disturbances, presence of bystanders, etc. Out of the many proposed sensors, it is difficult to choose the best universal one because each sensor works best in a certain range of recognized activities. This fact motivated our study of the topic.

In [1], electromyographic (EMG) analysis of four lower limb muscles was performed during seven classes of exercises for preventing loss of balance or falling. Other researchers integrated EMG and an inertial measurement unit (IMU) to construct a balance evaluation system recording the body in dynamic and static postures [2]. In [3], seven hand movements were classified (by neural networks with backpropagation and the Gustafson–Kessel algorithm) on the basis of the EMG signals of four forearm muscles. An EMG- and augmented reality- (AR-) based rehabilitation system for the upper limbs was proposed in [4]. In [5], an EMG biofeedback device for forearm physiotherapy was constructed to discriminate 6 classes of movements.

Novak et al. [6] proposed a system for automatic detection of gait phases using acceleration and pressure sensors and a supervised learning algorithm. For detection of gait abnormalities, the authors of [7] built a prototype combining a pressure force-sensing resistor (FSR), a bend sensor, and an IMU; principal component analysis (PCA) was used for feature generation and a support vector machine (SVM) for multiclass classification. Shu et al. [8] presented a time-space measurement tool in the form of insoles with conductive fabric sensors placed around the midfoot and the heel. Wireless capacitive pressure sensors were introduced in [9]. Other studies [10] were related to equilibrium measurements with an instrumented insole carrying 3 pressure sensors per foot.

An accelerometric (ACC) system for monitoring daily motor activity (sitting, standing, lying, and periods of natural walking) was proposed in [11]; the ACC sensor was placed on the subject's sternum. Detection of gait parameters by means of a detector composed of gyrometric, accelerometric, and magnetic sensors was proposed in [12]. Rong et al. [13] presented the use of a 3D accelerometric sensor located at the waist to identify people based on their characteristic gait patterns; identification was performed with the discrete wavelet transform (DWT). Jafari et al. [14] proposed ACC-based detection of accidental falls. The selected signal features were used to distinguish four transitions (sitting-standing, standing-sitting, lying-standing, and standing-lying) with neural-network and k-nearest-neighbour (k-NN) classifiers. In [15], researchers developed ACC-based fall detection for smartphones. The proposed system enabled fall event detection, location tracking of the person, and notification of emergency situations.

Juang et al. [16] introduced a system for detection of four body postures (standing, bending forward, sitting, or lying) and sudden falls. For classification purposes, the silhouette was segmented from each image frame; the feature vector was composed of Fourier transform coefficients and the ratio of body silhouette length to width. A real-time system was implemented in [17]. It consisted of three main modules: silhouette segmentation, recognition, and posture identification. The authors introduced decision rules based on body parameters, which made it possible to detect four postures: standing, sitting, squatting, and bending. In [18], the authors used supervised and unsupervised learning for classification of the body position in image sequences. Other researchers [19] presented a posture detection method that took into account information about the body shape and the skin colour. Song and Chen [20] proposed vision-based activity recognition on the basis of information about pose, location, and elapsed time.

In the mentioned papers, the selection of particular sensors was not clearly justified, which raises a natural question about the optimal choice. The aim of our research was to use various sensors to simultaneously capture the signals of basic activities and to study the correlation of the information obtained from them. This approach enables choosing the proper sensor depending on the situation and the current need. The experiments aimed at determining how well simple measuring devices can approximate the information obtained from specialized medical equipment. Our measurements were performed by means of a three-dimensional motion capture system, a wireless EMG amplifier, and a wireless feet pressure system (as reference equipment), as well as an accelerometer and a video camera (as currently available consumer-grade sensors).

2. Materials and Methods

2.1. Plan of the Experiment

A total of 20 volunteers (8 women, 12 men, aged 22 to 61, average age 27) were examined. Each subject was instructed to perform about 30 (19 to 46) repetitions of 12 different activities:
(i) Squatting from a standing position (1a) and getting up from a squat (1b)
(ii) Sitting on a chair from a standing position (2a) and getting up from a chair (2b)
(iii) Reaching (3a) and returning from reaching the upper limb forward in the sagittal plane (standing) (3b)
(iv) Reaching (4a) and returning from reaching the upper limb upwards in the sagittal plane (standing) (4b)
(v) Bending from a standing position (5a) and straightening the trunk forward in the sagittal plane (5b)
(vi) Single step with the right (6a) and left lower limb (6b).

The measurements were performed simultaneously with the following:
(i) A, a motion capture system: Optotrak Certus (NDI) with NDI First Principles software
(ii) B, a wireless biopotential amplifier: ME6000 (Mega Electronics) with MegaWin software
(iii) C, a wireless feet pressure measurement system: ParoLogg with Parologg software
(iv) D, a digital video camera: Sony HDR-FX7E
(v) E, an ACC recorder (Revitus system) with dedicated software.

2.2. Characteristics of the Examined Signals

The three-dimensional motion trajectories of 30 infrared markers M1 to M30 located on the body were measured from the left side of the observed person (Figure 1). The acquisition was performed at a sampling frequency of 100 Hz, with an accuracy of 0.1 mm and a resolution of 0.01 mm.

Surface EMG signals were recorded (2 kHz) from 8 muscles of both lower limbs: (1) quadriceps (vastus lateralis), (2) biceps femoris, (3) tibialis anterior, and (4) gastrocnemius (medial head).

Feet pressure signals were captured at 100 Hz with 64 piezoresistive sensors (32 for each foot). The triaxial acceleration signal was recorded by sensors integrated in the Revitus device located on the sternum. The recorder enabled online measurement via Bluetooth (100 Hz).

Video signals (720 × 576 pixels, 25 frames per second) of the silhouette were recorded with a digital camera placed at the volunteer's left side.

2.3. Processing of the Measurement Data

To calculate feature vectors for classification, the processing of data recorded with sensors B to E was performed in MATLAB.

The three-dimensional motion trajectories were used to determine the precise start and end times of the activities. The exception was the gait (6a, 6b), which cannot be performed naturally within a distance as short as 4 m (the maximal width of the registration space of the motion capture system). Therefore, for the gait (6a, 6b), the start and end points were determined from visual analysis of the video frames.

Differences in performance time between the analyzed movements and between volunteers require normalization of the data length with a time window. In order to select its width optimally, a set of histograms of activity performance times was calculated:
(i) Histograms of minimal, maximal, and average (MIN, MAX, AVG) performance times for all people and all activities together, and the ALL histogram for all duration values together, over all volunteers and activities (Figure 2)
(ii) Histograms of all performance times for all volunteers, for each activity separately from 1a to 6b (Figure 3).

Based on the ALL histogram, the length of the time window was set to the shortest value covering activities of all types. Above this value, the other histograms (except for MAX) do not show significant activity.
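A minimal NumPy sketch of how the duration statistics behind these histograms can be assembled is given below. The container `durations`, the grouping into per-volunteer-per-activity series, and the bin count are our assumptions; the final window length was read off the histograms rather than computed automatically.

```python
import numpy as np

# durations[v][a]: list of performance times (in seconds) of activity a
# by volunteer v (hypothetical container; in the study these times come
# from the motion-capture start/end annotations).
def duration_histograms(durations, n_bins=30):
    per_series = [np.asarray(durations[v][a])
                  for v in durations for a in durations[v]]
    mins = [d.min() for d in per_series]    # input of the MIN histogram
    maxs = [d.max() for d in per_series]    # input of the MAX histogram
    avgs = [d.mean() for d in per_series]   # input of the AVG histogram
    alls = np.concatenate(per_series)       # input of the ALL histogram
    return {name: np.histogram(np.asarray(vals), bins=n_bins)
            for name, vals in (('MIN', mins), ('MAX', maxs),
                               ('AVG', avgs), ('ALL', alls))}
```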

The electromyographic signals were processed as follows (a code sketch follows this list):
(i) Calculating the absolute value of the signal
(ii) Averaging the signal in a moving time window (0.1 s)
(iii) Normalizing the amplitude separately for each volunteer: dividing the signal by the maximal value from all measurements of all activities for that volunteer
(iv) Creating the vector data (then used as a component of the classifier input vector) consisting of the prepared (as above) EMG signals of each muscle of the left (L) and right (R) lower limb: $[EMG_{L1}, \ldots, EMG_{L4}, EMG_{R1}, \ldots, EMG_{R4}]$
(v) Normalizing the amplitude to the (0, 1] interval
(vi) Resampling the signal to a frequency of 25 Hz.
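The following Python sketch illustrates steps (i) to (vi) under stated assumptions: `subject_max` carries the per-volunteer maximum from step (iii), resampling is applied per channel before stacking, and all function names are ours (the original processing was done in MATLAB).

```python
import numpy as np
from scipy.signal import resample_poly

def preprocess_channel(sig, subject_max, fs=2000, smooth_s=0.1, fs_out=25):
    """Steps (i)-(iii) and (vi) for one EMG channel: rectify, average in a
    moving window, divide by the volunteer's global maximum, resample."""
    w = max(int(round(smooth_s * fs)), 1)
    x = np.convolve(np.abs(sig), np.ones(w) / w, mode='same')
    return resample_poly(x / subject_max, fs_out, fs)

def emg_feature_vector(channels, subject_max):
    """Steps (iv)-(v): stack the 8 prepared muscle signals (L1..L4, R1..R4)
    and scale the joint vector into the (0, 1] interval."""
    vec = np.concatenate([preprocess_channel(c, subject_max) for c in channels])
    return vec / vec.max()
```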

The feet pressure signals were processed as follows (a region-averaging sketch follows this list):
(i) Averaging the signal values from the sensors in three selected areas: the heel (1), the center (2), and the front (3), for the left (L) and the right (R) foot
(ii) Averaging the signal in a moving time window of 0.3 s
(iii) Normalizing the amplitude for each volunteer separately
(iv) Creating the vector data: $[P_{L1}, P_{L2}, P_{L3}, P_{R1}, P_{R2}, P_{R3}]$
(v) Normalizing the amplitude to the (0, 1] interval
(vi) Resampling the signal to a frequency of 25 Hz.
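Only the first step differs from the EMG pipeline above; a short sketch is given below. The sensor-to-region index map is a hypothetical layout, not taken from the ParoLogg documentation.

```python
import numpy as np

# Hypothetical assignment of the 32 insole sensors of one foot to the
# heel (1), center (2), and front (3) regions.
REGIONS = (range(0, 10), range(10, 21), range(21, 32))

def foot_region_signals(foot):
    """foot: array (32, n_samples) sampled at 100 Hz. Returns the three
    region-averaged signals (step (i)); smoothing (0.3 s), normalization,
    stacking L/R, and resampling then follow the EMG sketch above."""
    return np.vstack([foot[list(idx)].mean(axis=0) for idx in REGIONS])
```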

The accelerometric signals were processed as follows [21] (an offset-removal sketch follows this list):
(i) Subtracting the offset value from the signal (the offset is the average of a 10 s long signal recorded while the person is in a stationary upright position), separately for each channel (x, y, z) and for each person
(ii) Averaging the signal in a moving time window of 0.2 s
(iii) Normalizing the amplitude for each volunteer separately
(iv) Creating the vector data consisting of the prepared acceleration signals in the x, y, z axes: $[ACC_x, ACC_y, ACC_z]$
(v) Normalizing the amplitude to the (0, 1] interval
(vi) Resampling the signal to a frequency of 25 Hz.
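The ACC-specific step is the offset removal; below is a sketch under our assumption (not stated in [21]) that the stationary 10 s segment opens the recording.

```python
import numpy as np

def remove_offset(acc, fs=100, still_s=10.0):
    """acc: array (3, n_samples) with the x, y, z channels. Subtracts the
    per-channel mean of the 10 s stationary upright segment (step (i));
    smoothing (0.2 s), normalization, stacking, and resampling then follow
    the same pattern as in the EMG sketch above."""
    offset = acc[:, :int(still_s * fs)].mean(axis=1, keepdims=True)
    return acc - offset
```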

The video signal was prepared as follows [22] (a flow-and-histogram sketch follows this list):
(i) Converting the colour image to grayscale
(ii) Calculating the vector motion field with 2 coordinates, i.e., the optical flow (OF), using the Horn–Schunck algorithm [23]
(iii) Median filtering of the motion field components (5 × 5 pixels)
(iv) Detecting the moving objects: binarization of the motion field magnitude with a threshold T constant for all people and all activities; the threshold was chosen experimentally in [24]
(v) Calculating the area of the moving silhouette Sn−1 on the (n − 1)-th frame (yellow area in Figure 4(b)) as the joint part of the areas OFn−1/n−2 (blue) and OFn/n−1 (turquoise), where OFn−1/n−2 is the motion field calculated on the basis of the (n − 1)-th and (n − 2)-th frames and OFn/n−1 is the motion field calculated on the basis of the n-th and (n − 1)-th frames
(vi) Filling the holes in the area Sn−1
(vii) Thickening the contour mask of the movable silhouette part Sn−1 inward to approximately four pixels (Figure 4(c))
(viii) Determining the histograms of the motion field directions: aggregation of the motion field vectors from the thickened contour into 8 directions; the directions correspond to the angle ranges [337.50° 22.50°], [22.50° 67.50°], …, [292.50° 337.50°]
(ix) Normalizing the histograms
(x) Creating the data vector consisting of the histogram bins B1, B2, …, B8, where each bin corresponds to one of the eight directions: $[B_1, B_2, B_3, B_4, B_5, B_6, B_7, B_8]$.
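A condensed sketch of steps (ii) and (viii) follows: a textbook Horn–Schunck iteration and the aggregation of flow directions into the 8 bins. Segmentation, hole filling, and contour thickening (steps (iv) to (vii)) are omitted, and the parameters alpha and n_iter are illustrative, not the values used in [22].

```python
import numpy as np
from scipy.ndimage import convolve

def horn_schunck(f1, f2, alpha=1.0, n_iter=100):
    """Optical flow (u, v) between two grayscale frames scaled to [0, 1]."""
    kx = np.array([[-1, 1], [-1, 1]]) / 4.0   # spatial derivative kernels
    ky = np.array([[-1, -1], [1, 1]]) / 4.0
    fx = convolve(f1, kx) + convolve(f2, kx)
    fy = convolve(f1, ky) + convolve(f2, ky)
    ft = convolve(f2 - f1, np.full((2, 2), 0.25))
    avg = np.array([[1, 2, 1], [2, 0, 2], [1, 2, 1]]) / 12.0
    u = np.zeros_like(f1)
    v = np.zeros_like(f1)
    for _ in range(n_iter):                   # Horn-Schunck update rule
        u_bar, v_bar = convolve(u, avg), convolve(v, avg)
        d = (fx * u_bar + fy * v_bar + ft) / (alpha ** 2 + fx ** 2 + fy ** 2)
        u, v = u_bar - fx * d, v_bar - fy * d
    return u, v

def direction_histogram(u, v, contour_mask):
    """Aggregate flow vectors from the thickened contour into 8 bins of
    45 degrees centered at 0, 45, ..., 315 degrees, then normalize."""
    ang = np.degrees(np.arctan2(v[contour_mask], u[contour_mask])) % 360.0
    bins = (((ang + 22.5) % 360.0) // 45.0).astype(int)   # bin indices 0..7
    hist = np.bincount(bins, minlength=8).astype(float)
    return hist / hist.sum() if hist.sum() else hist
```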

2.4. Identification of the Activities

To identify the selected activities, supervised classification was performed. The set of all measurement data from each sensor was divided into a learning set and a test set. The former contained 2400 randomly selected representatives of all 12 activities, while the latter contained the 4874 remaining cases.

For classification of the selected activities, the k-NN algorithm with the Manhattan metric was used. Before the classification step, the classifier was tuned using the LOO (leave-one-out) method. On the basis of these analyses, k = 1 was found to be the optimal value for all sensors and sets of sensors.
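A minimal scikit-learn sketch of this setup is given below; the split size follows the text, while the random seed, candidate k values, and function names are our assumptions.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import LeaveOneOut, cross_val_score

def pick_k(X_learn, y_learn, candidates=(1, 3, 5, 7)):
    """Leave-one-out selection of k (which confirmed k = 1 in our data)."""
    scores = {k: cross_val_score(
                    KNeighborsClassifier(n_neighbors=k, metric='manhattan'),
                    X_learn, y_learn, cv=LeaveOneOut()).mean()
              for k in candidates}
    return max(scores, key=scores.get)

def recognition_score(X, y, n_learn=2400, seed=0):
    """Random learning/test split and 1-NN with the Manhattan metric.
    X: feature vectors from one sensor; y: activity labels (1a..6b)."""
    idx = np.random.default_rng(seed).permutation(len(X))
    learn, test = idx[:n_learn], idx[n_learn:]
    knn = KNeighborsClassifier(n_neighbors=1, metric='manhattan')
    knn.fit(X[learn], y[learn])
    return knn.score(X[test], y[test])   # fraction of correct recognitions
```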

For each activity a and each sensor s, the correctness of recognition for all volunteers, Rs_a (1), and its calculation error, Us_a (2), were calculated. Us_a is a measure of the dispersion of the results coming from intersubject differences. Due to the different numbers of activity repetitions for each volunteer, we used the weighted standard deviation (2):

$$R_{s\_a} = \frac{P_{s\_a}}{W_{s\_a}} \times 100\%, \qquad (1)$$

where $P_{s\_a}$ is the sum of correctly identified repetitions of the activity a for all volunteers for the sensor s and $W_{s\_a}$ is the sum of all repetitions of the activity a performed by all volunteers for the sensor s;

$$U_{s\_a} = \sqrt{\frac{\sum_{i=1}^{n} w_i \left(x_i - R_{s\_a}\right)^2}{\frac{n-1}{n}\sum_{i=1}^{n} w_i}}, \qquad (2)$$

where n = 20 is the number of weights, equal to the number of volunteers; $w_i$ is the weight for the i-th volunteer, equal to the number of repetitions of the activity a performed by the i-th volunteer; and $x_i$ is the percentage of correct recognition for the specific activity calculated for the i-th volunteer.
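A NumPy sketch of (1) and (2) follows, assuming the standard reliability-weights form of the weighted standard deviation, which is consistent with the quantities n, w_i, and x_i named above. Note that the weighted mean of the per-volunteer rates x_i with the repetition counts as weights equals R.

```python
import numpy as np

def recognition_rate(correct, total):
    """Eq. (1): percentage of correctly identified repetitions."""
    return 100.0 * correct / total

def weighted_std(x, w):
    """Eq. (2): weighted standard deviation of the per-volunteer rates x
    around their weighted mean, with repetition counts w as weights."""
    x, w = np.asarray(x, float), np.asarray(w, float)
    n = len(w)
    mean = np.average(x, weights=w)            # equals R by construction
    var = np.sum(w * (x - mean) ** 2) / (((n - 1) / n) * np.sum(w))
    return np.sqrt(var)
```

The errors (4), (6), and (7) defined below reuse the same form with the weights u_i, p_j, and q_j, respectively.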

In order to represent an additional variable, Rs_a_ALL (3) (with its calculation error Us_ALL (4)) was employed. It describes the percentage of correct recognition for all activities and all volunteers for each sensor:

$$R_{s\_a\_ALL} = \frac{P_{s\_a\_ALL}}{W_{s\_a\_ALL}} \times 100\%, \qquad (3)$$

where $P_{s\_a\_ALL}$ is the sum of correctly identified repetitions of all activities (ALL) performed by all volunteers for the sensor s and $W_{s\_a\_ALL}$ is the sum of all performed repetitions of all activities (ALL) for all volunteers;

$$U_{s\_ALL} = \sqrt{\frac{\sum_{i=1}^{n} u_i \left(y_i - R_{s\_a\_ALL}\right)^2}{\frac{n-1}{n}\sum_{i=1}^{n} u_i}}, \qquad (4)$$

where $u_i$ is the weight for the i-th volunteer, equal to the total number of repetitions of all activities performed by the i-th volunteer, and $y_i$ is the percentage of correct recognition for all activities calculated for the i-th volunteer.

For each volunteer V and sensor s, the percentage of correct recognition for all activities, Rs_V (5), and its calculation error, Us_V (6), were determined. Us_V is a measure of the dispersion of the results arising from differences between activities:

$$R_{s\_V} = \frac{P_{s\_V}}{W_{s\_V}} \times 100\%, \qquad (5)$$

where $P_{s\_V}$ is the sum of correctly identified repetitions of all activities performed by the volunteer V with the sensor s and $W_{s\_V}$ is the sum of repetitions of all activities performed by the volunteer V;

$$U_{s\_V} = \sqrt{\frac{\sum_{j=1}^{m} p_j \left(z_j - R_{s\_V}\right)^2}{\frac{m-1}{m}\sum_{j=1}^{m} p_j}}, \qquad (6)$$

where m = 12 is the number of weights, equal to the number of activity types; $p_j$ is the weight for the j-th activity, equal to the number of its repetitions performed by the volunteer; and $z_j$ is the percentage of correct recognition for the j-th activity calculated for the specific subject.

In addition, the calculation error Us_V_ALL (7) was determined as an activity-related dispersion:

$$U_{s\_V\_ALL} = \sqrt{\frac{\sum_{j=1}^{m} q_j \left(r_j - R_{s\_a\_ALL}\right)^2}{\frac{m-1}{m}\sum_{j=1}^{m} q_j}}, \qquad (7)$$

where $q_j$ is the weight for the j-th activity, equal to the number of all its repetitions performed by all volunteers, and $r_j$ is the percentage of correct recognition for the j-th activity calculated for all volunteers.

3. Results

The correctness of recognition Rs_a (1) of activities 1a to 6b for all persons for sensors B to E is presented in Table 1.

Matrices of the recognition errors (in %) of the individual activities 1a to 6b for all volunteers for the individual sensors B to E are shown in Tables 2–5. The percentages of correct recognition Rs_a for the individual activities are therefore located on the diagonal of each matrix.

The correctness of recognition Rs_V of all activities for volunteers V1 to V20 and Rs_a_ALL for ALL volunteers for sensors B to E is presented in Table 6.

4. Discussion

The correctness of recognition Rs_a (1) is negatively correlated with the dispersion Us_a (2) (Table 1). Therefore, less reliable recognition of an activity over all volunteers does not imply worse recognition of that activity for each individual volunteer; rather, it reflects the individual way in which each volunteer performs the activity.

Some types of activities, such as free gait or the return from reaching in the vertical and horizontal planes, showed much less reliable recognition than others, regardless of the sensor type. The reliability of gait recognition is probably low due to the high diversity of walking rhythms. Reaching is difficult to recognize because it involves low whole-body dynamics.

It was found that, among the single sensors, the best classifier for different activities is sensor B, followed successively by sensors D, E, and C.

The correctness of recognition Rs_V (5) is negatively correlated with the dispersion Us_V (6) (Table 6). This means that less reliable recognition for a single volunteer (taking all activities into account) does not come from inferior recognition reliability of every single activity for that volunteer, but rather results from inconsistency among the recognitions of the individual activities.

Our research is focused on the recognition of only 12 types of daily life activities. The motivation for this choice is mainly based on the following aspects:
(i) Since the chosen activities are performed quite often and are easy to repeat, we limit as much as possible the errors coming from differences in how volunteers perform the activities, and thus the comparison of the sensors is more reliable
(ii) It can be presumed that any activity (even a more complex one) can be represented by means of simple (elementary) poses [26].

Although the choice of a proper sensor is a very complex issue, in our studies we simplify it to the comparison of motion items. Nevertheless, the final choice of sensors is closely related to the application. The following requirements should then be taken into consideration:
(i) Individual characteristics of the sensor signal
(ii) Size of the registration space
(iii) Sensor accuracy
(iv) Sensor portability and unobtrusiveness
(v) Cost of the sensor device and reliable software
(vi) Privacy of the supervised person.

The performance differences for each activity and each sensor stem from differences in:
(i) Speed, range, and manner of performing the particular motion
(ii) Anatomy and biomechanics of the volunteer's body (physical fitness, strength, endurance, flexibility, way of loading the body weight, etc.).

The above factors have an impact on all of the sensors (B to E).

5. Conclusions

The paper presents the results of recognition of 12 human motor activities based on individual interpretation of simultaneous recordings from various sensors. The main finding is that each sensor is more appropriate for certain activities, while other activities are recognized better by other sensors. Consequently, we specified both the areas where the sensors show distinctive properties and a common range of activities where the sensors show similar metrological properties and may be selected based on other criteria (e.g., cost and convenience).

Additionally, we found that some recognition results generalized for all volunteers, as well as those generalized for all activities, showed surprisingly low values. This suggests that the recognition performance depends on the particular volunteer (i.e., it is subject-specific) and also on the particular action. Accordingly, the hierarchy of expected recognition results for particular actions is not universal and, to produce optimal results, should be individually adjusted to the particular user's behavior.

The prospective ways of extending our studies in the future are as follows:
(i) Expanding the list of activities with more complex ones
(ii) Evaluating and adapting the proposed solutions in a home environment
(iii) Extending the video processing algorithm with detection of individual body parts.

Data Availability

Research data are not openly available because of the volunteers' privacy.

Conflicts of Interest

The author declares that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

The author would like to thank Prof. Adam Gacek and Mr. Paweł Kowalski from the Institute of Medical Technology and Equipment (ITAM) in Zabrze, Poland, for providing the prototype of the Revitus measurement device and software free of charge. The author also thanks Mrs. Beata Przybyłowska-Stanek, the director of Basen AGH in Kraków, Poland, for making the room used as the experiment area available free of charge. This research was supported by the AGH University of Science and Technology in the year 2019 from the subvention granted by the Polish Ministry of Science and Higher Education.