Abstract

Human activity recognition (HAR) systems are widely used in areas such as healthcare, security, and entertainment. Most activity recognition models are tested in the personal mode, where their performance is quite good. However, HAR in the impersonal mode remains a great challenge. In this paper, we propose a two-layer activity sparse grouping (TASG) model, in which the first layer roughly clusters the activities into 2–4 groups and the second layer identifies the specific activity type. A new feature selection metric inspired by the Fisher criterion is designed to measure the importance of the features. We evaluate the TASG model with SVM, KNN, Random Forest, and RNN classifiers on the HAPT, MobiAct, and HASC-PAC2016 datasets. The experimental results show that the performance of the standard classifiers improves when they are combined with the TASG method, and that the features selected by the proposed metric are more effective than those selected by other feature selection (FS) methods.

1. Introduction

In the last decade, adaptable sensor technologies have developed rapidly, driving rapid advances in mobile and ubiquitous computing. As sensors shrink in size, grow in capability, and fall in cost, these electronic components have become more widely used in daily life. Human activity recognition (HAR) has many applications, particularly in health monitoring, city planning, sports coaching, entertainment, fitness assessment, and smart homes [1]. For these reasons, human activity recognition based on wearable sensors has become a research hotspot.

At present, simple activity recognition has achieved high accuracy [2]. However, these experimental results were obtained on small-scale datasets that contain activity data from only a few people. Moreover, most research work [3–5] used the personal mode for training and testing classification models. In the personal mode, the training and testing datasets share users, whereas in the impersonal mode no user appears in both. In practical applications, the impersonal mode is more realistic [1, 6], because it is very inconvenient to train a model for every new user. For example, the subject's activity data may be unavailable, there may be many activity classes, or some activities may be undesirable for the subject to perform (e.g., falling downstairs). Dungkaew et al. [7] proposed an impersonal and lightweight model for identifying activities in nonstationary sensory streaming data, and the experimental results on the WISDM [8] dataset showed that the recognition accuracy was less than 80% for walking, jogging, and stairs. Other research works [9, 10] also showed that the accuracy of impersonal activity recognition models was still not satisfactory. Therefore, HAR in the impersonal mode remains a difficult problem.

In the study of human activity recognition, some activities are easy to classify, such as walking and standing, but some are easily confused, such as walking upstairs and walking downstairs [11]. Most current work [12–14] does not consider differences in features among activities. It uses the same common features for all activities, so some features are useless for distinguishing certain activities even though they are discriminative for others. Zhang and Sawchuk [15] proposed a multilevel activity classification model, which manually clustered the activities into several groups. This method has two disadvantages. (1) Manual clustering is too subjective to discover the relationships between different activities. (2) The higher levels of the classification model caused low recognition performance.

In this paper, we first extract common features from the datasets. Similar activities are then clustered into one group using ASG, and feature selection is performed within each group. Finally, we train a classifier for each group using the selected features. The main contributions of this study are as follows:

(1) We propose a two-layer activity sparse grouping (TASG) method. The similarity of different behaviors is measured by the sparse coefficients, and the behaviors are clustered according to these coefficients.

(2) We propose a new feature selection (FS) metric based on the Fisher criterion [16]. The selected features are more effective for classification because the metric maximizes the variance between classes and minimizes the variance within each class. Compared with other feature selection methods, our FS method achieves a higher score.

(3) We test on three activity recognition datasets using four basic classifiers. The TASG-augmented model improves the recognition performance of the original model by about 3% on average.

2. Related Work

Research on human activity recognition started as early as the 1990s. Activity recognition based on wearable sensors is gaining popularity since it has applications in assisted living, sports, security, etc. Many techniques have been recently employed to analyze and classify data gathered from sensors on wearable devices.

Almeida and Alves [17] applied activity recognition to gaming. They presented an activity recognition system, called ActiveRunner, to replace traditional touch-based interaction. Ejupi et al. [3] proposed a wavelet-based algorithm to detect and assess the quality of sit-to-stand movements with a wearable pendant device. The experimental results showed significant differences between older fallers and nonfallers, and the new method can accurately detect sit-to-stand movements in older people. Hossain et al. [4] analyzed different active learning strategies to scale activity recognition and proposed a dynamic k-means clustering method to overcome the barrier of collecting ground truth information. Walse et al. [5] presented experimental work with various classifiers on the WISDM dataset, in which Random Forest performed best.

Feature selection is also a research hotspot in activity recognition, and many researchers have studied the application of different feature selection methods in HAR. Feature selection methods can be divided into three categories [18] according to their evaluation criteria: filter methods, wrapper methods, and embedded methods. Zhang and Sawchuk [15] compared three feature selection methods (Relief-F, SFC, and SFS) in terms of computational cost and effectiveness. The number of features selected with these methods was gradually increased from 5 to 110. The experimental results showed that, across the three feature selection methods, the classification errors taper off once 50 features are included. In terms of computational cost, Relief-F has the lowest cost and SFS the highest. Relief-F was also compared with the fast correlation-based filter and correlation-based feature selection in [19], where it was reported as the best feature selection algorithm owing to its ability to deal with incomplete and noisy data. There are other feature selection methods, e.g., the oppositional-based binary kidney-inspired algorithm [20] and the trace ratio criterion [21]. He et al. [22] demonstrated the performance of the Laplacian score on the Iris and PIE face datasets.

Deep learning (DL) shows powerful recognition ability in other research areas, so some researchers have considered applying it to HAR. Ronao and Cho [23] proposed a deep convolutional neural network that automatically extracts robust features from raw data. Hammerla et al. [24] explored deep, convolutional, and recurrent approaches across three representative datasets that contain movement data captured with wearable sensors. However, owing to the lack of large labeled datasets, directly applying deep learning methods to human activity recognition is not feasible. Münzner et al. [25] discussed three key problems in the development of robust DL and proposed a novel pressure-specific normalization method. Compared with other DL methods, the results showed that CNNs based on a shared-filter approach have a smaller dependency on the available training data.

3. Method

3.1. Preliminary and Framework

The framework of the TASG method is shown in Figure 1, in which Figure 1(a) is the training process of TASG and Figure 1(b) is the hierarchical structure of TASG. First, we cluster activities into different groups using TASG. Specifically, we perform sparse decomposition of the training data. In this process, it is important to ensure that the coefficient vector is sparse enough, because this directly affects the subsequent grouping results. Then, the similarity between different activities is calculated based on the sparse coefficients, and activities with high similarity are clustered into the same group. Subsequently, we perform feature selection in each group using a new feature metric. After that, we train the first-layer classifier (group classifier), treating each group as a class, and train the second-layer classifiers (group-in classifiers) with the selected features in each activity group. Finally, we obtain a two-layer classification model: the first-layer classifier determines the group, and the second-layer classifier identifies the specific activity class.
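The two-layer prediction flow described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `NearestCentroid` stand-in classifier, the `TwoLayerClassifier` class, and the `activity_to_group` and `group_features` arguments are all hypothetical names, and any of the base classifiers used in the paper (SVM, KNN, Random Forest, RNN) could take the place of the stand-in.

```python
from collections import defaultdict


class NearestCentroid:
    """Tiny stand-in classifier: predicts the label of the closest class mean."""

    def fit(self, X, y):
        sums, counts = {}, defaultdict(int)
        for row, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(row))
            for j, v in enumerate(row):
                acc[j] += v
            counts[label] += 1
        self.centroids = {c: [v / counts[c] for v in s] for c, s in sums.items()}
        return self

    def predict_one(self, row):
        def dist2(c):
            return sum((a - b) ** 2 for a, b in zip(row, self.centroids[c]))
        return min(self.centroids, key=dist2)


class TwoLayerClassifier:
    """Layer 1 picks the activity group; layer 2 picks the activity inside it."""

    def __init__(self, activity_to_group, group_features):
        self.activity_to_group = activity_to_group  # e.g. {"walk": 0, "jog": 1}
        self.group_features = group_features        # selected feature columns per group

    def fit(self, X, y):
        groups = [self.activity_to_group[a] for a in y]
        # First layer: every group is treated as one class, using all features.
        self.group_clf = NearestCentroid().fit(X, groups)
        # Second layer: one classifier per group, on that group's selected features.
        self.in_group_clf = {}
        for g, cols in self.group_features.items():
            rows = [[x[j] for j in cols] for x, gg in zip(X, groups) if gg == g]
            labels = [a for a, gg in zip(y, groups) if gg == g]
            self.in_group_clf[g] = NearestCentroid().fit(rows, labels)
        return self

    def predict_one(self, x):
        g = self.group_clf.predict_one(x)
        cols = self.group_features[g]
        return self.in_group_clf[g].predict_one([x[j] for j in cols])
```

Keeping a separate feature subset per group is what lets the second layer ignore features that only help to tell the groups apart.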

The notations used in this paper are defined in Table 1.

3.2. Activity Sparse Grouping (ASG)

In human activity recognition research, some activities are easy to classify, such as walk and stand, but others are easily confused [11], such as skip and jump.

When all activities are sparsely represented over the same dictionary, the more similar two activities are, the closer the sets of atoms they use, which is reflected in their sparse coefficients. Therefore, we propose an activity sparse grouping method that clusters activities according to their sparse coefficients and then performs feature selection.

3.2.1. Activity Sparse Grouping Algorithm

The ASG algorithm is summarized in Algorithm 1.

Input: A: {A1, A2, …, An}, G = Ø
Output: G: {G1, G2, …, Gm}
(1) Sparse decomposition of training data and solve the sparse coefficients;
(2) Calculating the similarity matrix;
(3) for i = 1, 2, …, n
      if … then
        …
      end if
    end for
(4) Do:
      for each …
        if Aq ∈ Gk then
          add Ap into Gk and remove Ap from A
        end if
      end for each
    until A = Ø or A no longer changes
(5) if A ≠ Ø
      cluster remaining activities in A into a new group Gm and add Gm into G
    end if
(6) Return G;

In the algorithm, the similarity between different activity classes is calculated from the sparse coefficient vectors. The algorithm ensures that the coefficient α is sufficiently sparse by minimizing the number of nonzero components in the sparse coefficient.
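As a rough illustration of the grouping stage, the sketch below starts from a precomputed similarity matrix. It rests on stated assumptions rather than on the paper's exact rules: initial groups are seeded by pairs of mutual nearest neighbours (an assumed rule for step 3), an ungrouped activity joins the group of its most similar activity (step 4), and any leftovers form one new group (step 5). The function name `group_activities` and the similarity values in the usage example are hypothetical.

```python
def group_activities(names, sim):
    """Group activities from a symmetric similarity matrix `sim` (list of lists)."""
    n = len(names)

    def nearest(i):
        # Index of the activity most similar to activity i (excluding itself).
        return max((j for j in range(n) if j != i), key=lambda j: sim[i][j])

    nn = [nearest(i) for i in range(n)]
    groups, assigned = [], {}
    # Step 3 (assumed rule): mutual nearest neighbours seed an initial group.
    for i in range(n):
        j = nn[i]
        if i < j and nn[j] == i:
            assigned[i] = assigned[j] = len(groups)
            groups.append([i, j])
    # Step 4: attach an activity once its nearest neighbour belongs to a group.
    remaining = [i for i in range(n) if i not in assigned]
    changed = True
    while remaining and changed:
        changed = False
        for i in list(remaining):
            if nn[i] in assigned:
                g = assigned[nn[i]]
                groups[g].append(i)
                assigned[i] = g
                remaining.remove(i)
                changed = True
    # Step 5: whatever is left forms one new group.
    if remaining:
        groups.append(remaining)
    return [[names[i] for i in sorted(g)] for g in groups]


# Usage with made-up similarity values (not taken from the paper's data).
names = ["stay", "walk", "stUp", "stDown", "jog", "skip"]
sim = [
    [1.0, 0.3, 0.2, 0.2, 0.1, 0.1],
    [0.3, 1.0, 0.7, 0.8, 0.4, 0.3],
    [0.2, 0.7, 1.0, 0.6, 0.3, 0.2],
    [0.2, 0.8, 0.6, 1.0, 0.3, 0.2],
    [0.1, 0.4, 0.3, 0.3, 1.0, 0.9],
    [0.1, 0.3, 0.2, 0.2, 0.9, 1.0],
]
print(group_activities(names, sim))
```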

Solving for a sparse representation involves seeking the sparsest linear combination of atoms from a dictionary. In this paper, we use online dictionary learning [26] to solve this problem. It consists mainly of two alternating steps: sparse coding and dictionary update.

3.2.2. Sparse Coefficient Learning

The goal of sparse coding is to express a given data vector y as a weighted linear combination of a small number of basis atoms from the dictionary. Since sparse coding is an iterative process, the solution can be formulated as the following optimization problem:

$$\alpha_t = \arg\min_{\alpha} \frac{1}{2}\left\| y_t - D_{t-1}\alpha \right\|_2^2 + \lambda \left\| \alpha \right\|_1,$$

where $t$ is the iteration number and $\lambda$ is the regularization parameter. In this paper, we use LARS [27] to solve for $\alpha_t$.
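To make the sparse coding step concrete, the following sketch minimizes an l1-regularized least-squares objective, 0.5·||y − Dα||² + λ·||α||₁, for one data vector. Note the substitution: the paper uses LARS, whereas this example uses cyclic coordinate descent because it fits in a few lines of plain Python. The function names and the list-of-rows layout for the dictionary are assumptions made for illustration.

```python
def soft_threshold(v, lam):
    """Shrink v toward zero by lam (the proximal step for the l1 penalty)."""
    if v > lam:
        return v - lam
    if v < -lam:
        return v + lam
    return 0.0


def sparse_code(y, D, lam, sweeps=100):
    """Minimise 0.5*||y - D a||^2 + lam*||a||_1 by cyclic coordinate descent.

    y: list of m floats; D: list of m rows with k entries each (atoms are columns).
    """
    m, k = len(D), len(D[0])
    a = [0.0] * k
    residual = list(y)  # residual = y - D a, kept up to date
    col_sq = [sum(D[i][j] ** 2 for i in range(m)) for j in range(k)]
    for _ in range(sweeps):
        for j in range(k):
            if col_sq[j] == 0.0:
                continue
            # Correlation of atom j with the residual computed without atom j.
            rho = sum(D[i][j] * residual[i] for i in range(m)) + col_sq[j] * a[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            if new != a[j]:
                delta = new - a[j]
                for i in range(m):
                    residual[i] -= D[i][j] * delta
                a[j] = new
    return a
```

A larger `lam` drives more coefficients to exactly zero, which is the sparsity property the grouping step relies on.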

The second part of dictionary learning is the optimization of the dictionary based on the current sparse codes. In this paper, the dictionary is updated using block-coordinate descent with warm restarts [28]. The dictionary update can be turned into a convex optimization problem by restricting the dictionary to a convex admissible set.

We update each column dj of D while keeping the other columns fixed, under the constraint imposed by the admissible set. Each dj is updated using the method in [29].
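For orientation, the dictionary update in the commonly used online dictionary learning formulation can be written as below. This is a sketch of that standard formulation under assumed notation, not a transcription of this paper's equations: $\mathcal{C}$ (the admissible set), $A_t$ and $B_t$ (accumulated statistics), $a_j$ and $b_j$ (their j-th columns), and $u_j$ (an intermediate vector) are symbols introduced here for illustration.

```latex
\begin{aligned}
\mathcal{C} &= \{ D \in \mathbb{R}^{m \times k} : d_j^{\top} d_j \le 1,\ j = 1, \dots, k \}, \\
D_t &= \arg\min_{D \in \mathcal{C}} \frac{1}{t} \sum_{i=1}^{t}
       \left( \frac{1}{2} \left\| y_i - D\alpha_i \right\|_2^2 + \lambda \left\| \alpha_i \right\|_1 \right), \\
A_t &= \sum_{i=1}^{t} \alpha_i \alpha_i^{\top}, \qquad
B_t = \sum_{i=1}^{t} y_i \alpha_i^{\top}, \\
u_j &\leftarrow \frac{1}{A_t[j,j]} \left( b_j - D a_j \right) + d_j, \qquad
d_j \leftarrow \frac{u_j}{\max\left( \left\| u_j \right\|_2, 1 \right)} .
\end{aligned}
```

The last line is one block-coordinate step: the column moves along the negative gradient and is then projected back onto the unit ball, so the updated dictionary stays inside the admissible set.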

3.3. Feature Selection Metric

Feature selection helps in understanding the data, reduces the computational requirement, and improves classification performance. A new feature selection metric is designed in this paper. Good features decrease the difference within a class while increasing the difference between classes. For simplicity and ease of calculation, we propose a new metric, based on the Fisher criterion, to measure the importance of each feature. The metric is designed to minimize the variance within a class and maximize the variance between classes. It is computed from the following quantities: the variance of the k-th feature between classes; the variance of the k-th feature within a class; the i-th sample of the k-th feature; the i-th sample of the k-th feature in class c; the mean of the k-th feature; and the mean of the k-th feature in class c.

According to equation (4), the score of a feature with large differences between classes and small differences within a class is higher, and the higher the score, the more important the feature.
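As a point of reference, the classic Fisher score ranks each feature by the ratio of its between-class scatter to its within-class scatter. The sketch below implements that classic score, which is an assumption standing in for the paper's own metric (only described here as being based on the Fisher criterion); the function names `fisher_scores` and `top_features` are hypothetical.

```python
from collections import defaultdict


def fisher_scores(X, y):
    """Return one score per feature: between-class over within-class scatter."""
    n_features = len(X[0])
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    scores = []
    for k in range(n_features):
        column = [row[k] for row in X]
        mean_k = sum(column) / len(column)
        between = within = 0.0
        for rows in by_class.values():
            values = [row[k] for row in rows]
            mean_ck = sum(values) / len(values)
            between += len(values) * (mean_ck - mean_k) ** 2
            within += sum((v - mean_ck) ** 2 for v in values)
        if within > 0:
            scores.append(between / within)
        else:
            # No spread inside any class: perfect if class means differ, else useless.
            scores.append(float("inf") if between > 0 else 0.0)
    return scores


def top_features(X, y, n):
    """Indices of the n highest-scoring features, best first."""
    scores = fisher_scores(X, y)
    return sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)[:n]
```

In a two-layer setting, such a ranking would be computed separately on the samples of each activity group, so that each second-layer classifier gets its own feature subset.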

4. Experiments and Results

4.1. Datasets and Experimental Settings
4.1.1. Datasets

To verify the recognition effectiveness of our method, we chose three datasets of different scales for the experiments, of which HASC-PAC2016 is a large-scale dataset. Table 2 shows the statistics of the three datasets.

(1) HAPT Dataset [30]. The dataset was collected from a group of 30 volunteers aged 19–48 years, and the activities in the dataset are divided into two categories: basic activities ((1) walking, (2) walking upstairs, (3) walking downstairs, (4) sitting, (5) standing, and (6) laying) and postural transition activities ((7) stand-to-sit, (8) sit-to-stand, (9) sit-to-lie, (10) lie-to-sit, (11) stand-to-lie, and (12) lie-to-stand). Activity data are collected at a constant rate of 50 Hz using the embedded accelerometer and gyroscope in a smartphone.

(2) MobiAct Dataset [31]. The dataset was collected from a group of 57 volunteers (42 men and 15 women) aged 20–47 years. The dataset includes two parts: fall data and activities of daily living. In this paper, we use the daily activity data, composed of 12 basic activities: (1) standing, (2) walking, (3) jogging, (4) jumping, (5) stairs up, (6) stairs down, (7) stand-to-sit, (8) sit on chair, (9) sit-to-stand, (10) car step in, (11) car step out, and (12) lying. Activity data are also collected using the embedded accelerometer and gyroscope in a smartphone.

(3) HASC-PAC2016 Dataset [32]. The dataset was collected from a group of 510 volunteers (390 men and 120 women). HASC-PAC2016 targets basic human activity and includes 6 activities: (1) stay, (2) walk, (3) jog, (4) skip, (5) going upstairs, and (6) going downstairs. Activity data are collected using the embedded accelerometer and gyroscope in different smartphones.

4.1.2. Experimental Settings

In this paper, we use the impersonal test mode (the training set and test set share no users) and 5-fold cross-validation in the following experiments.

The HAPT dataset has 30 volunteers; we randomly divide them into 5 groups of 6 people each, with four groups used as the training set and one group used for testing. The MobiAct dataset has 57 different users, but not everyone has data for all activities; after filtering out those people, 18 people remain as experimental data. They are divided into 6 groups of 3 people each, with five groups used as the training set and one group used for testing. The HASC-PAC2016 dataset has 510 people, but not everyone has gyroscope data; after filtering out those people, 280 people remain as experimental data. Because the sampling frequency differs from person to person, we regularize the sampling frequency of all people to 50 Hz. Then, we randomly divide the people into 5 groups of 56 people each, with four groups used for training and one group used for testing.
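The impersonal split described above can be sketched in a few lines. This is an illustrative sketch under assumptions: `subject_ids` is a hypothetical list holding the subject identifier of each sample (or window), and the random seed and round-robin dealing of shuffled subjects into folds are choices made here, not details given in the paper.

```python
import random


def subject_folds(subject_ids, n_folds, seed=0):
    """Shuffle the unique subjects and deal them into n_folds disjoint groups."""
    subjects = sorted(set(subject_ids))
    random.Random(seed).shuffle(subjects)
    return [subjects[i::n_folds] for i in range(n_folds)]


def impersonal_splits(subject_ids, n_folds, seed=0):
    """Yield (train_idx, test_idx) so that no subject appears on both sides."""
    folds = subject_folds(subject_ids, n_folds, seed)
    for held_out in folds:
        held = set(held_out)
        test_idx = [i for i, s in enumerate(subject_ids) if s in held]
        train_idx = [i for i, s in enumerate(subject_ids) if s not in held]
        yield train_idx, test_idx
```

Splitting by subject rather than by sample is the whole point of the impersonal mode: a random split over windows would leak each user's data into both sets and reproduce the personal mode.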

In this paper, we use accuracy, precision, recall, and F1 score to evaluate the performance of classifiers. Accuracy is defined as follows:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN},$$

where $TP$ (true positives) is the number of positive examples classified correctly and $TN$ (true negatives) is the number of negative examples classified correctly. $FP$ (false positives) and $FN$ (false negatives) are the numbers of negative examples incorrectly classified as positive and of positive examples incorrectly classified as negative, respectively.

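The accuracy measure does not take class imbalance in the datasets into account. Thus, precision, recall, and F1 score are also considered to measure classifier performance:

$$\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}, \qquad F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}.$$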

We average the per-class precision and recall to obtain the final precision and recall of the classifier. The F1 score is then calculated from the final precision and recall.
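The averaging procedure can be illustrated as follows. This is a minimal sketch, assuming each class is scored one-vs-rest and that multi-class accuracy is taken as the fraction of correctly classified samples; the function name `macro_scores` is hypothetical.

```python
def macro_scores(y_true, y_pred):
    """Return (accuracy, macro precision, macro recall, F1 of the two macros)."""
    classes = sorted(set(y_true) | set(y_pred))
    precisions, recalls = [], []
    for c in classes:
        # One-vs-rest counts for class c.
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    precision = sum(precisions) / len(classes)
    recall = sum(recalls) / len(classes)
    # F1 is computed from the averaged precision and recall, not per class.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1
```

Because every class contributes equally to the averages, a rare activity that is always misclassified lowers precision and recall much more than it lowers accuracy.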

4.2. Feature Extraction

Because the original sensor data are continuous and long-lasting, we cannot use them directly as training or testing data. A common approach to this problem is to use sliding windows to perform feature extraction on the raw data. We set the window size to 4 seconds and the overlap to 50% in the experiments. The features we extracted are shown in Table 3; for more details, refer to [33]. The datasets include accelerometer and gyroscope data. Therefore, we obtain 162 (27 × 6) features for each window.
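The windowing step can be sketched as below. The 4-second window and 50% overlap come from the text; the 50 Hz default rate is an assumption that matches HAPT and the resampled HASC-PAC2016 data but is not stated for every dataset. The two features in `window_features` are only illustrative stand-ins for the 27 per-axis features of Table 3, and both function names are hypothetical.

```python
def sliding_windows(samples, rate_hz=50, window_s=4.0, overlap=0.5):
    """Split a sequence of samples into fixed-length, overlapping windows."""
    size = int(round(rate_hz * window_s))            # 200 samples at 50 Hz
    step = max(1, int(round(size * (1 - overlap))))  # 100 samples at 50 % overlap
    return [samples[start:start + size]
            for start in range(0, len(samples) - size + 1, step)]


def window_features(window):
    """Two illustrative per-axis features: mean and standard deviation."""
    n = len(window)
    mean = sum(window) / n
    std = (sum((v - mean) ** 2 for v in window) / n) ** 0.5
    return [mean, std]
```

Applying a per-axis feature function to each of the six signal axes (three accelerometer, three gyroscope) and concatenating the results yields one feature vector per window.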

4.3. The Results of Activity Grouping
4.3.1. HAPT Dataset

Using the TASG algorithm, we first obtain 4 initial groups: {walking, walking downstairs}, {sitting, standing}, {sit-to-lie, stand-to-lie}, and {lie-to-sit, lie-to-stand}. The remaining activities are walking upstairs, laying, stand-to-sit, and sit-to-stand. We add them to the initial groups using the nearest neighbor method. Finally, we obtain four groups on the HAPT dataset, as shown in Table 4. Figure 2 shows the 2D plot of the clustering results on the HAPT dataset.

4.3.2. MobiAct Dataset

We obtain three final groups on the MobiAct dataset according to ASG: {standing, sitting on chair, lying}, {walking, jogging, jumping, stairs up, stairs down}, and {stand-to-sit, sit-to-stand, car step in, car step out}, as shown in Table 5. Figure 3 shows the 2D plot of the clustering results.

4.3.3. HASC-PAC2016 Dataset

Figure 4 shows the 2D plot of the clustering results on the HASC-PAC2016 dataset. In Figure 4, the “stay” activity is added to {walk, stUp, stDown} because each group must contain at least two activities. Finally, we obtain two groups, {stay, walk, stUp, stDown} and {jog, skip}, as shown in Table 6.

4.4. Comparison of Different Classification Methods in Impersonal Mode

In the experiment, we perform feature selection with the new feature metric and build two-layer ASG models with SVM [18, 34], Random Forest [5, 11], KNN [12, 13], and RNN [24], respectively.

The recognition performance of the different methods on the three datasets is shown in Tables 7–9, respectively. Table 7 summarizes the performance results obtained on the HAPT dataset in the impersonal mode. The accuracies of TASG-SVM, TASG-RF, TASG-KNN, and TASG-RNN improve on those of the standard SVM, RF, KNN, and RNN, respectively, with TASG-KNN showing the largest increase. In terms of F1 score, TASG-SVM is almost the same as SVM, while the other three TASG models improve significantly on the original models. Table 8 shows the performance of the different classification methods on the MobiAct dataset in the impersonal mode. In terms of accuracy, the TASG model yields a 2.23% improvement overall. The F1 score is also better than that of the original models, except for RNN. Table 9 summarizes the performance results obtained on the HASC-PAC2016 dataset in the impersonal mode. The scale of this dataset is very large, and the TASG model improves accuracy by 3.56% and F1 score by 3.27% overall.

The classification accuracies of the different models in the different groups are shown in Figures 5–7. For the HAPT dataset, we select the TASG-SVM model. The accuracy of our method is higher than that of the other four classification methods on all four groups, especially in group 4. For the MobiAct dataset, we also select the TASG-SVM model. Although our method is not the best in group 3, its classification accuracy in group 2 is much better than that of the other four methods. For the HASC-PAC2016 dataset, we select the TASG-RF model. Our method is better than the other three classification methods, except for Random Forest in group 1. The experimental results show that the features selected by the TASG model produce better classification results than those of other FS methods.

4.5. Comparison of Different Feature Selection Methods

In this section, we compare the proposed feature selection method with three other feature selection methods, namely, Laplacian score [22], Relief-F [15], and MCFS [35]. The experimental results are shown in Figures 8–10. Our FS method based on the Fisher metric obtains better results than the other methods. In particular, when the number of selected features is relatively small, our method selects more valuable features and achieves higher scores than the other FS methods. Therefore, our method can reach the same level of accuracy as the other FS methods with fewer features.

5. Conclusion

In this paper, we propose a two-layer activity sparse grouping hierarchical model to augment the performance of general classification models. Similar activities are clustered according to their sparse coefficients. We find that the activities within {walk, stair up, stair down} and within {stand, sit, lay} are highly similar to one another. The experimental results show that our TASG hierarchical model improves performance by about 3% on average on HAR datasets in the impersonal mode. In addition, a new feature selection metric based on the Fisher criterion is proposed. The experimental results show that the new feature metric is more effective than other feature selection methods in HAR.

Data Availability

Three datasets in the experiment of this paper can be accessed freely. The web links are listed as follows. (1) The HAPT dataset used to support the findings of this study has been deposited in the repository of Human Activity Recognition Using Smartphones Data Set (http://archive.ics.uci.edu/ml/datasets/Smartphone-Based+Recognition+of+Human+Activities+and+Postural+Transitions). (2) The MobiAct dataset used to support the findings of this study has been deposited in the repository of MobiAct (https://bmi.teicrete.gr/en/the-mobifall-and-mobiact-datasets-2/). (3) The HASC-PAC2016 dataset used to support the findings of this study has been deposited in the repository of HASC-PAC2016 (http://hub.hasc.jp/).

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the Zhejiang Provincial Natural Science Foundation of China (No. LY17F020008) and partially by the Hangzhou Science and Technology Development Plan Project (20150533B15).