Abstract

Automatic human activity recognition (HAR) systems aim to capture the state of the user and the user’s environment by exploiting heterogeneous sensors attached to the subject’s body, which permit continuous monitoring of numerous physiological signals reflecting the state of human actions. Successful identification of human activities can be immensely useful in healthcare applications for Ambient Assisted Living (AAL), such as automatic and intelligent activity monitoring systems developed for elderly and disabled people. In this paper, we propose a method for activity recognition and subject identification based on random projections from a high-dimensional feature space to a low-dimensional projection space, where the classes are separated using the Jaccard distance between probability density functions of projected data. Two HAR domain tasks are considered: activity identification and subject identification. The experimental results using the proposed method with Human Activity Dataset (HAD) data are presented.

1. Introduction

The societies in the developed countries are rapidly aging. In 2006, almost 500 million people worldwide were 65 years of age or older. By 2030, that total number of aged people is projected to increase to 1 billion. The most rapid increase of aging population occurs in the developing countries, which will see a jump of 140% by 2030 [1]. Moreover, the world’s population is expected to reach 9.3 billion by 2050 [2], and people who are above 60 years old will make up 28% of the population. Dealing with this situation will require huge financial resources to support the ever-increasing living cost, where human life expectancy is expected to reach 81 years by 2100.

As older people may have disorders of body functions or suffer from age-related diseases, the need for smart health assistance systems increases each year. A common method of monitoring geriatric patients is physical observation, which is costly, requires a lot of staff, and is increasingly infeasible in view of massive population aging in the coming years. Many Ambient Assisted Living (AAL) applications such as care-providing robots, video surveillance systems, and assistive human-computer interaction technologies require human activity recognition. While the primary users of the AAL systems are of course the senior (elderly) people, the concept also applies to mentally and physically impaired people as well as people suffering from diabetes and obesity, who may need assistance at home, and people of any age interested in personal fitness monitoring. As a result, sensor-based real-time monitoring systems to support independent living at home have been a subject of many recent research studies in the human activity recognition (HAR) domain [3–10].

Activity recognition can be defined as the process of interpreting sensor data to classify a set of human activities [11]. HAR is a rapidly growing area of research that can provide valuable information on health, wellbeing, and fitness of monitored persons outside a hospital setting. Daily activity recognition using wearable technology plays a central role in the field of pervasive healthcare [12]. HAR has gained increased attention in the last decade due to the arrival of affordable and minimally invasive mobile sensing platforms such as smartphones. Smartphones are innovative platforms for HAR because of the availability of different wireless interfaces, unobtrusiveness, ease of use, high computing power and storage, and the availability of sensors, such as accelerometer, compass, and gyroscope, which meet the technical and practical hardware requirements for HAR tasks [13–15]. Moreover, technological development possibilities of other applications are still arising, including virtual reality systems. Therefore, these devices present a great possibility for the development of innovative technology dedicated to the AAL systems.

One of the key motivating factors for using mobile phone-based human activity recognition in the AAL systems is the relationship and correlation between the level of physical activity and the level of wellbeing of a person. Recording and analysing precise information on the person’s activities are beneficial for keeping track of the progress and status of the disease (or mental condition) and can potentially improve the treatment of the person’s conditions and diseases, as well as decrease the cost of care. Recognizing indoor and outdoor activities such as walking, running, or cycling can be useful to provide feedback to the caregiver about the patient’s behaviour. When following the daily habits and routines of users, one can easily identify deviations from routines, which can assist the doctors in diagnosing conditions that would not be observed during routine medical examination. Another key enabler of the HAR technology is the possibility of providing independent living for the elderly as well as for patients with dementia and other mental pathologies, who could be monitored to prevent undesirable consequences of abnormal activities. Furthermore, by using persuasive techniques and gamification, HAR systems can be designed to interact with users to change their behaviour and lifestyles towards more active and healthier ones [16].

Recently, various intelligent systems based on mobile technologies have been constructed. HAR using smartphones or other types of portable or wearable sensor platforms has been used for assessing movement quality after stroke [17], such as upper extremity motion [18], for assessing gait characteristics of human locomotion for rehabilitation and diagnosis of medical conditions [19], for postoperative mobilization [20], for detecting Parkinson’s disease, back pain, and hemiparesis [21], for cardiac rehabilitation [22], for physical therapy, for example, if a user is correctly doing the exercises recommended by a physician [23, 24], for detecting abnormal activities arising due to memory loss for dementia care [25, 26], for dealing with Alzheimer’s [27] and neurodegenerative diseases such as epilepsy [28], for assessment of physical activity for children and adolescents suffering from hyperlipidaemia, hypertension, cardiovascular disease, and type 2 diabetes [29], for detecting falls [30, 31], for addressing physical inactivity when dealing with obesity [32], for analysing sleeping patterns [33], for estimating energy expenditures of a person to assess his/her healthy daily lifestyle [34], and for recognizing the user’s intent in the domain of rehabilitation engineering such as smart walking support systems to assist motor-impaired persons and the elderly [35].

In this paper, we propose a new method for offline recognition of daily human activities based on feature dimensionality reduction using random projections [36] to low dimensionality feature space and using the Jaccard distance between kernel density probabilities as a decision function for classification of human activities.

The structure of the remaining parts of the paper is as follows. Section 2 presents the overview of related work in the smartphone-based HAR domain with a particular emphasis on the features extracted from the sensor data. Section 3 describes the proposed method. Section 4 evaluates and discusses the results. Finally, Section 5 presents the conclusions and discusses future work.

2. Related Work

All tasks of the HAR domain require correct identification of human activities from sensor data, which, in turn, requires that features derived from sensor data be properly categorized and described. Next, we present an overview of features used in the HAR domain.

2.1. Features

While numerous features can be extracted from physical activity signals, increasing the number of features does not necessarily increase classification accuracy since the features may be redundant or may not be class-specific:
(i) Time-domain features (such as mean, median, variance, standard deviation, minimum, maximum, and root mean square, applied to the amplitude and time dimensions of a signal) are typically used in many practical HAR systems because of being less computationally intensive; thus, they can be easily extracted in real time.
(ii) Frequency-domain features require higher computational cost to distinguish between different human activities. Thus, they may not be suitable for real-time AAL applications.
(iii) Physical features are derived from a fundamental understanding of how a certain human movement would produce a specific sensor signal. Physical features are usually extracted from multiple sensor axes, based on the physical parameters of human movements.
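To illustrate why time-domain features are cheap to compute, the following minimal Python sketch derives them for a single window of one sensor axis. The function name `time_domain_features` and the sample values are hypothetical; they are not taken from the feature set in Table 1.

```python
import math
import statistics


def time_domain_features(window):
    """Return common time-domain features for one window of sensor samples."""
    return {
        "mean": statistics.fmean(window),
        "median": statistics.median(window),
        "variance": statistics.pvariance(window),  # population variance
        "std": statistics.pstdev(window),
        "min": min(window),
        "max": max(window),
        "rms": math.sqrt(sum(x * x for x in window) / len(window)),
    }


# Hypothetical one-axis accelerometer window (values are made up).
features = time_domain_features([1.0, 2.0, 3.0, 4.0])
```

Every feature here needs a single pass (or a sort, for the median) over the window, which is consistent with the remark that such features can be extracted in real time.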

Based on the extensive analysis of the literature and features used by other authors (esp. by Capela et al. [17], Mathie et al. [37], and Zhang and Sawchuk [38]), we have extracted 99 features of data, which are detailed in Table 1.

2.2. Feature Selection

Feature selection is the process of selecting a subset of relevant features for use in construction of the classification model. Successful selection of features allows for simplification of models to make them easier to interpret, to decrease model training times, and to better understand difference between classes. Using feature selection allows removing redundant or irrelevant features without having an adverse effect on the classification accuracy. There are four basic steps in a typical feature selection method [58]: generation of candidate feature subset, an evaluation function for feature candidate subset, a generation stopping criterion, and a validation procedure.

Further, we analyse several feature selection methods used in the HAR domain.

ReliefF [59] is a commonly used filter method that ranks features by weighting them based on their relevance. Feature relevance is based on how well data instances are separated. For each data instance, the algorithm finds the nearest data point from the same class (hit) and nearest data points from different classes (misses).
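The hit/miss idea can be sketched in a few lines. The code below is a simplified two-class Relief variant (one nearest hit and one nearest miss per instance, features assumed to be scaled to [0, 1], at least two instances per class), not the full ReliefF algorithm of [59]; the function name and the toy data are illustrative assumptions.

```python
def relief_weights(X, y):
    """Simplified Relief: reward features that differ on the nearest miss
    and penalize features that differ on the nearest hit."""
    n, m = len(X), len(X[0])
    weights = [0.0] * m

    def dist(a, b):
        # Manhattan distance between two feature vectors
        return sum(abs(p - q) for p, q in zip(a, b))

    for i in range(n):
        hits = [j for j in range(n) if j != i and y[j] == y[i]]
        misses = [j for j in range(n) if y[j] != y[i]]
        hit = min(hits, key=lambda j: dist(X[i], X[j]))
        miss = min(misses, key=lambda j: dist(X[i], X[j]))
        for f in range(m):
            weights[f] += (abs(X[i][f] - X[miss][f]) - abs(X[i][f] - X[hit][f])) / n
    return weights


# Toy data: feature 0 separates the classes, feature 1 does not.
X = [[0.0, 0.2], [0.1, 0.9], [0.9, 0.1], [1.0, 0.8]]
y = [0, 0, 1, 1]
weights = relief_weights(X, y)
```

In this toy example the class-separating feature receives a positive weight and the uninformative one a negative weight, which is the ranking behaviour described above.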

Matlab’s Rankfeatures ranks features by a given class separability criterion. Class separability measures include the absolute value of the statistic of a two-sample t-test, Kullback-Leibler distance, minimum attainable classification error, area between the empirical Receiver Operating Characteristic (ROC) curve and the random classifier slope, and the absolute value of the statistic of a two-sample unpaired Wilcoxon test. Measures are based on distributional characteristics of classes (e.g., mean, variance) for a feature.

Principal component analysis (PCA) is the simplest method to reduce data dimensionality. This reduced dimensional data can be used directly as features for classification. Given a set of features, a PCA analysis will produce new data variables (PCA components) as linear combinations of the features with the highest variance in the subspace orthogonal to the preceding PCA component. As variability of the data can be captured by a relatively small number of PCs, PCA can achieve high level of dimensionality reduction. Several extensions of the PCA method are known such as kernel PCA, sparse PCA, and multilinear PCA.

Correlation-based Feature Selection (CFS) [60] is a filter algorithm that ranks subsets of features by a correlation-based heuristic evaluation function. A feature is considered to be a good one if it is relevant to the target concept but is not redundant to any of the other relevant features. Goodness of measure is expressed by a correlation between features, and CFS chooses the subset of features which has the highest measure. The chosen subset holds the property that features inside this subset have high correlation with the class and are unrelated to each other.
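The trade-off between relevance and redundancy can be made concrete with the merit score commonly associated with CFS. The sketch below assumes the usual form of that heuristic, with p features in the subset, mean feature-class correlation `r_cf`, and mean feature-feature correlation `r_ff`; the function and argument names are ours, not from [60].

```python
import math


def cfs_merit(p, r_cf, r_ff):
    """Merit of a feature subset: high class correlation raises the score,
    high correlation among the features themselves lowers it."""
    return (p * r_cf) / math.sqrt(p + p * (p - 1) * r_ff)


uncorrelated = cfs_merit(p=4, r_cf=0.5, r_ff=0.0)  # mutually unrelated features
redundant = cfs_merit(p=4, r_cf=0.5, r_ff=1.0)     # fully redundant features
```

With the same relevance, the subset of unrelated features scores higher than the redundant one, which is why CFS prefers subsets whose members are unrelated to each other.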

Table 2 summarizes the feature selection/dimensionality reduction methods in HAR.

A comprehensive review of feature selection algorithms in general as well as in the HAR domain can be found in [58, 61–63].

2.3. Summary

Related work in the HAR domain is summarized in Table 3. For each paper, the activities analysed, types of sensor data used, features extracted, classification method applied, and accuracy achieved (as given by the referenced papers) are given.

3. Method

3.1. General Scheme

The typical steps for activity recognition are preprocessing, segmentation, feature extraction, dimensionality reduction (feature selection), and classification [24]. The preprocessing step includes noise removal and representation of raw data. The feature extraction step is used to reduce large input sensor data to a smaller set of features (feature vector), which preserves information contained in the original data. The dimensionality reduction step can be applied to remove the irrelevant (or less relevant) features, to reduce the computational complexity, and to increase the performance of the activity recognition process. The classification step is used to map the feature set to a set of activities.

In this paper, we do not focus on data preprocessing and feature extraction but rather on dimensionality reduction and classification steps, since these two are crucial for further efficiency of AAL systems. The proposed method for human activity recognition is based on feature dimensionality reduction using random projections [36] and classification using kernel density function estimate as a decision function (Figure 1).

3.2. Description of the Method

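During random projection, the original $m$-dimensional data is projected to a $k$-dimensional ($k \ll m$) subspace using a random $m \times k$ matrix $R$. The projection of the data onto a lower $k$-dimensional subspace is $X^{RP} = XR$, where $X$ is the original set of $n$ $m$-dimensional observations, stored as the rows of an $n \times m$ matrix. In the derived projection, the distances between the points are approximately preserved, if points in a vector space are projected onto a randomly selected subspace of suitably high dimension (see the Johnson-Lindenstrauss lemma [64]). The random matrix $R$ is selected as proposed by Achlioptas [36], whose sparse construction draws each entry $r_{ij}$ independently as follows:

$$r_{ij} = \sqrt{3} \cdot \begin{cases} +1 & \text{with probability } 1/6, \\ 0 & \text{with probability } 2/3, \\ -1 & \text{with probability } 1/6. \end{cases}$$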
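As a minimal sketch of the projection step, the code below assumes the sparse distribution from Achlioptas [36], with entries $\sqrt{3} \cdot \{+1, 0, -1\}$ drawn with probabilities 1/6, 2/3, and 1/6, and a row-per-observation data layout. The function names, the fixed seed, and the synthetic Gaussian data are illustrative assumptions; the usual $1/\sqrt{k}$ scaling is omitted because it only rescales the projected coordinates.

```python
import math
import random


def achlioptas_matrix(m, k, rng):
    """Draw an m x k sparse random matrix with entries sqrt(3) * {+1, 0, -1}."""
    scale = math.sqrt(3.0)
    values = [scale, 0.0, -scale]
    # Weights 1:4:1 correspond to probabilities 1/6, 2/3, 1/6.
    return [rng.choices(values, weights=[1, 4, 1], k=k) for _ in range(m)]


def project(X, R):
    """Project the rows of X (n x m) with R (m x k); the result is n x k."""
    k = len(R[0])
    return [[sum(x[i] * R[i][j] for i in range(len(x))) for j in range(k)] for x in X]


rng = random.Random(0)
R = achlioptas_matrix(m=99, k=2, rng=rng)                         # 99 features -> 2D
X = [[rng.gauss(0.0, 1.0) for _ in range(99)] for _ in range(5)]  # synthetic data
X_rp = project(X, R)
```

Because about two-thirds of the entries are zero, most multiplications can be skipped in an optimized implementation, which is part of the appeal of this construction.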

Given the low dimensionality of the target space, we can treat the projections of the observations onto each dimension as samples of a random variable, for which the probability density function (PDF) can be estimated using the kernel density estimation (KDE) (or Parzen window) method [65].

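If $x_1, x_2, \ldots, x_n$ is a sample of a random variable, then the kernel density approximation of its probability density function is

$$\hat{f}_h(x) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right),$$

where $K$ is some kernel and $h$ is the bandwidth (smoothing parameter). $K$ is taken to be a standard Gaussian function with mean zero and variance 1 of the examined data features:

$$K(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}.$$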
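A direct transcription of the Gaussian kernel density estimate may help make the role of the bandwidth concrete. This is a minimal sketch; the sample values and the bandwidth of 0.5 are arbitrary choices for illustration, and no bandwidth selection rule is implied.

```python
import math


def gaussian_kernel(u):
    """Standard Gaussian kernel with mean 0 and variance 1."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde(sample, h):
    """Return a function that estimates the density of `sample` with bandwidth h."""
    n = len(sample)

    def density(x):
        return sum(gaussian_kernel((x - xi) / h) for xi in sample) / (n * h)

    return density


f_hat = kde([-1.0, 0.0, 1.0], h=0.5)

# Numerical check that the estimate integrates to approximately 1.
total = sum(f_hat(-6.0 + 0.01 * i) for i in range(1201)) * 0.01
```

A smaller bandwidth makes the estimate follow individual samples more closely, while a larger one smooths them into a single broad bump.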

For a two-dimensional case, the bivariate probability density function is calculated as a product of univariate probability functions as follows:

$$\hat{f}(x, y) = \hat{f}_x(x) \cdot \hat{f}_y(y).$$

Here, $x$ and $y$ are data in each dimension, respectively.

However, each random projection produces a different mapping of the original data points which reveals only a part of the data manifold in higher-dimensional space. In case of the binary classification problem, we are interested in a mapping that separates data points belonging to two different classes best.

As a criterion for estimating the mapping, we use the Jaccard distance metric between two probability density estimates of data points representing each class. The advantage of the Jaccard distance metric as compared to other metrics of distance such as Kullback-Leibler (KL) divergence and Hellinger distance is its adaptability to multidimensional spaces where compared points show relations to different subsets. Therefore, it is well adapted to the developed model of human activity features, where according to description in the previous section we have divided them into some sets of actions. Furthermore, the computational complexity of the Hellinger distance is very high, while KL divergence might be unbounded.

The Jaccard distance, which measures dissimilarity between sample sets, is obtained by subtracting the Jaccard coefficient from 1 or, equivalently, by dividing the difference of the sizes of the union and the intersection of two sets by the size of the union:

$$d_J(A, B) = 1 - J(A, B) = \frac{|A \cup B| - |A \cap B|}{|A \cup B|}.$$
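For density estimates, one way to apply this definition is to treat the area under the pointwise minimum of two curves as the intersection and the area under the pointwise maximum as the union. The sketch below makes that assumption for two curves sampled on the same grid; it is an interpretation for illustration, and the function name is ours.

```python
def jaccard_distance(p, q):
    """Jaccard distance between two non-negative curves sampled on one grid."""
    intersection = sum(min(a, b) for a, b in zip(p, q))
    union = sum(max(a, b) for a, b in zip(p, q))
    return 1.0 - intersection / union if union > 0 else 0.0


identical = jaccard_distance([0.2, 0.5, 0.3], [0.2, 0.5, 0.3])
disjoint = jaccard_distance([0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5])
partial = jaccard_distance([0.5, 0.5, 0.0], [0.0, 0.5, 0.5])
```

Under this reading, fully overlapping class densities give a distance of 0 and non-overlapping ones a distance of 1, so a projection that separates the classes well has a distance close to 1.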

In the proposed model, the best random projection with the smallest overlapping area is selected (see an example in Figure 2).

To explore the performance and correlation among features visually, a series of scatter plots in a 2D feature space is shown in Figure 3. The horizontal and vertical axes represent two different features. The points in different colours represent different human activities.

In case of multiple classes, the method works as a one-class classifier: recognizing instances of a positive class, while all instances of other classes are recognized as outliers of the positive class.

3.3. Algorithm

The pseudocode of the algorithms for finding the best projection and using it for classification in low-dimensional space is presented in Pseudocodes 1 and 2, respectively.

Pseudocode 1
ALGORITHM: FindBestProjection
INPUT:  data1, data2: data for class1 and class2 [n x m matrices]
        threshold: iterating parameter
OUTPUT: bestProjection [m x 2 matrix]
BEGIN
  Jaccard = MAXINT
  WHILE (Jaccard > threshold)
    generate random projection matrix projection [m x 2]
    project m-dimensional data1 & data2 into 2D pdata1 & pdata2
    FOREACH dimension of pdata1 & pdata2
      calculate kernel density distributions
      calculate Jaccard intersection of pdata1 & pdata2
    END FOREACH
    IF (Jaccard intersection < Jaccard)
      Jaccard = Jaccard intersection
      bestProjection = projection
    END IF
  END WHILE
  RETURN bestProjection
END

Pseudocode 2
ALGORITHM: BinaryClassify
INPUT:  sample [1 x m matrix], bestProjection [m x 2 matrix],
        density estimates fx1, fy1 (class +1) and
        fx2, fy2 (class -1) [1 x n vectors]
OUTPUT: classLabel
BEGIN
  pSample = sample * bestProjection
  (px, py) = coordinates of pSample
  IF (fx1(px) * fy1(py) > fx2(px) * fy2(py))
    classLabel = +1
  ELSE
    classLabel = -1
  END IF
  RETURN classLabel
END
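The two procedures can be sketched in Python as follows. This is an illustrative reading of the pseudocode under several assumptions that are ours: the per-dimension overlaps are combined by multiplication, the bandwidth is fixed at 0.5, an iteration cap guarantees termination, and the toy data differ only in the first feature. None of these choices is taken from the original implementation.

```python
import math
import random


def kde(sample, h):
    """Gaussian kernel density estimate of a 1-D sample with bandwidth h."""
    norm = len(sample) * h * math.sqrt(2.0 * math.pi)
    return lambda x: sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in sample) / norm


def overlap(f, g, lo, hi, steps=200):
    """Intersection over union of two densities evaluated on a regular grid."""
    xs = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    inter = sum(min(f(x), g(x)) for x in xs)
    union = sum(max(f(x), g(x)) for x in xs)
    return inter / union


def project(X, R):
    """Project rows of X (n x m) with R (m x 2)."""
    return [[sum(x[i] * R[i][j] for i in range(len(x))) for j in range(2)] for x in X]


def find_best_projection(data1, data2, threshold, h=0.5, max_iter=50, seed=0):
    rng = random.Random(seed)
    m = len(data1[0])
    values = [math.sqrt(3.0), 0.0, -math.sqrt(3.0)]
    best, best_overlap = None, float("inf")
    for _ in range(max_iter):  # iteration cap (assumption) so the loop always ends
        R = [rng.choices(values, weights=[1, 4, 1], k=2) for _ in range(m)]
        p1, p2 = project(data1, R), project(data2, R)
        score = 1.0
        for d in range(2):
            a, b = [p[d] for p in p1], [p[d] for p in p2]
            lo, hi = min(a + b) - 3 * h, max(a + b) + 3 * h
            score *= overlap(kde(a, h), kde(b, h), lo, hi)  # product rule (assumption)
        if score < best_overlap:
            best, best_overlap = R, score
        if best_overlap <= threshold:
            break
    return best, best_overlap


def binary_classify(sample, R, dens1, dens2):
    """Return +1 if the class +1 density product is larger, otherwise -1."""
    x, y = project([sample], R)[0]
    return +1 if dens1[0](x) * dens1[1](y) > dens2[0](x) * dens2[1](y) else -1


# Toy data: the two classes differ only in the first of three features.
data1 = [[0.0, 1.0, 2.0], [0.5, 2.0, 1.0], [1.0, 1.5, 1.5]]
data2 = [[10.0, 1.0, 2.0], [10.5, 2.0, 1.0], [11.0, 1.5, 1.5]]
R, score = find_best_projection(data1, data2, threshold=0.05)
dens1 = [kde([p[d] for p in project(data1, R)], 0.5) for d in range(2)]
dens2 = [kde([p[d] for p in project(data2, R)], 0.5) for d in range(2)]
label = binary_classify(data1[0], R, dens1, dens2)
```

The search is a plain random restart: each iteration is independent, so the loop can stop as soon as one projection separates the classes well enough.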

4. Experiments

4.1. Dataset

To evaluate the performance of the proposed approach for HAR from the smartphone data, we used the part of the dataset (USC Human Activity Dataset [38]) recorded using the MotionNode device (sampling rate: 100 Hz; 3-axis accelerometer range: ±6 g; 3-axis gyroscope range: ±500 dps). The dataset consists of records of 12 activities, with 5 trials each, collected from 14 subjects (7 male, 7 female; age: 21–49). During data acquisition, the MotionNode was attached to the front right hip of the subjects.

The recorded low-level activities are as follows: Walking Forward (WF), Walking Left (WL), Walking Right (WR), Walking Upstairs (WU), Walking Downstairs (WD), Running Forward (RF), Jumping Up (JU), Sitting (Si), Standing (St), Sleeping (Sl), Elevator Up (EU), and Elevator Down (ED). Each record consists of the following attributes: date, subject number, age, height, weight, activity name, activity number, trial number, sensor location, orientation, and readings. Sensor readings consist of 6 readings: acceleration along x-, y-, and z-axes and gyroscope along x-, y-, and z-axes. The trials were performed on different days at various indoor and outdoor locations.

4.2. Results

In Table 4, we describe the top three best features from Table 1 (see column Feature number) ranked by the Matlab Rankfeatures function using the entropy criterion.

The results of feature ranking presented in Table 4 can be summarized as follows:
(i) For Walking Forward, Walking Left, and Walking Right, the important features are moving variance of acceleration and gyroscope data, movement intensity of gyroscope data, moving variance of movement intensity of acceleration data, first eigenvalue of moving covariance between acceleration data, and polar angle of moving cumulative sum of gyroscope data.
(ii) For Walking Upstairs and Walking Downstairs, moving variance of gyroscope along -axis, movement intensity of gyroscope data, and moving variance of movement intensity are the most important.
(iii) For Running Forward, moving variance of 100 samples of acceleration along -axis, moving variance of 100 samples of gyroscope along -axis, and moving energy of acceleration are distinguishing features.
(iv) For Jumping Up, the most important features are moving variance of acceleration, moving variance of movement intensity, and moving energy of acceleration.
(v) For Sitting, movement intensity of gyroscope data and movement intensity of difference between acceleration and gyroscope data are the most important.
(vi) For Standing, moving variance of movement intensity of acceleration data, moving variance of acceleration along -axis, and first eigenvalue of moving covariance of difference between acceleration and gyroscope data are the most distinctive.
(vii) For Sleeping, the most prominent features are first eigenvalue of moving covariance between acceleration data and moving variance of movement intensity of acceleration data.
(viii) For Elevator Up and Elevator Down, the most commonly selected feature is moving variance of -axis of gyroscope data. Other prominent features are first eigenvalue of moving covariance of difference between acceleration and gyroscope data and moving energy of -axis of gyroscope data.

These results are consistent with what can be expected from the physical analysis of human motions in the analysed dataset.

The evaluation of HAR classification algorithms is usually made through the statistical analysis of the models using the available experimental data. The most common method is the confusion matrix which allows representing the algorithm performance by clearly identifying the types of errors (false positives and negatives) and correctly predicted samples over the test data.
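The metrics reported below can all be derived from the four counts of a one-versus-rest confusion matrix. A minimal sketch follows, assuming the F-score is the balanced F1 variant; the counts in the example are made up and the function does not guard against empty classes.

```python
def binary_metrics(tp, fp, fn, tn):
    """Derive common evaluation metrics from one-versus-rest confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)                 # also called sensitivity
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    f_score = 2 * precision * recall / (precision + recall)  # F1 (assumption)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f_score": f_score,
    }


# Hypothetical counts for a single activity class.
metrics = binary_metrics(tp=40, fp=10, fn=10, tn=40)
```

Grand mean values are then obtained by averaging each metric over all classes, which is why sensitivity and recall coincide in the reported results.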

The confusion matrix for within-subject activity recognition using Matlab’s Rankfeatures is detailed in Table 5. The classification was performed using 5-fold cross-validation, using 80% of data for training and 20% of data for testing. Grand mean accuracy is 0.9552; grand mean precision is 0.9670; grand mean sensitivity is 0.9482; grand mean specificity is 0.9569; grand mean recall is 0.9482; grand mean F-score is 0.9482. The baseline accuracy was calculated using only the top 2 features selected by Rankfeatures, but without using random projections. The results show that features derived using random projections are significantly better than features derived using a common feature selection algorithm.
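The 80%/20% split follows directly from using five folds: each fold serves once as the test set while the remaining four form the training set. A minimal sketch of such a split is shown below; the shuffling seed and the sample count of 100 are arbitrary, and the sketch is not stratified by class.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k nearly equal folds
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test


splits = list(k_fold_indices(100))
```

Every sample appears in exactly one test fold, so the averaged metrics use each observation for testing exactly once.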

To take a closer look at the classification result, Table 5 shows the confusion table for classification of activities. The overall averaged recognition accuracy across all activities is 95.52%, with 11 out of 12 activities having accuracy values higher than 90%. If we examine the recognition performance for each activity individually, Running Forward, Jumping Up, and Sleeping will have very high accuracy values. For Running Forward, the accuracy of 99.0% is achieved. Interestingly, the lowest accuracy was achieved for Elevator Up activity, only 84.0%, while it was most often misclassified with Sitting and Standing. Elevator Down is misclassified with Elevator Up (only 69.7% accuracy). This result makes sense since Sitting on a chair, Standing, and Standing in a moving elevator are static activities, and we expect difficulty in differentiating different static activities. Also, there is some misclassification when deciding on a specific direction of activity; for example, Walking Left is confused with Walking Forward (77.4% accuracy) and Walking Upstairs (87.4% accuracy). Walking Upstairs is also confused with Walking Right (79.8% accuracy) and Walking Downstairs (70.8% accuracy). This is due to the similarity of any walk-related activities.

For comparison, the confusion matrix for within-subject activity recognition obtained using the proposed method with ReliefF feature selection is detailed in Table 6. The classification was performed using 5-fold cross-validation, using 80% of data for training and 20% of data for testing. Grand mean accuracy is 0.932; grand mean precision is 0.944; grand mean sensitivity is 0.939; grand mean specificity is 0.933; grand mean recall is 0.939; grand mean F-score is 0.922.

The baseline accuracy was calculated using only the top 2 features selected using ReliefF, but without using random projections. Again, the results show that features derived using random projections are significantly better than features derived using the ReliefF method only.

Surprisingly, though the classification accuracy of the specific activities differed, the mean accuracy metric results are quite similar (but still worse, if grand mean values are considered). The features identified using ReliefF feature selection were better at separating Walking Forward from Walking Left and Standing from Elevator Up activities but proved worse for separating other activities such as Sitting from Standing.

For subject identification, the data from all physical actions is used to train the classifier. Here, we consider the one-versus-all subject identification problem. Therefore, the data of one subject is defined as the positive class, and the data of all other subjects is defined as the negative class. In this case, 5-fold cross-validation was also performed, using 80% of data for training and 20% of data for testing. The results of one-versus-all subject identification using all activities for training and testing are presented in Table 7. While the results are not very good, they still are better than random baselines: grand mean accuracy is 0.477; precision is 0.125; recall is 0.832; and F-score is 0.210.

If an activity of a subject has been established, separate classifiers for each activity can be used for subject identification. In this case, also 5-fold cross-validation was performed, using 80% of data for training and 20% of data for testing, and the results are presented in Table 8. The grand mean accuracy is 0.720, which is better than random baseline. However, if we consider only the top three walking-related activities (Walking Forward, Walking Left, or Walking Right), the mean accuracy is 0.944.

Finally, we can simplify the classification problem to binary classification (i.e., recognize one subject against another). This simplification can be motivated by the assumption that only a few people are living in an AAL home (far less than 14 subjects in the analysed dataset). Then, the data from a pair of subjects performing a specific activity is used for classification and training. Separate classifiers are built for each pair of subjects, the results are evaluated using 5-fold cross-validation, and the results are averaged. The results are presented in Table 9. Note that the grand mean accuracy has increased to 0.947, while, for the top three walking-related activities (Walking Forward, Walking Left, or Walking Right), the grand mean accuracy is 0.992.
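To give a sense of how many pairwise classifiers this setup requires, the short sketch below enumerates all unordered subject pairs. The subject identifiers 1 to 14 are an assumed numbering used only for illustration.

```python
from itertools import combinations

subjects = list(range(1, 15))            # 14 subjects (assumed numbering)
pairs = list(combinations(subjects, 2))  # one binary classifier per pair
```

With 14 subjects there are 91 pairs per activity, so the averaged accuracy summarizes a fairly large number of small binary problems.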

5. Evaluation and Discussion

Random projections have been used in the HAR domain for data dimensionality reduction in activity recognition from noisy videos [69], feature compression for head pose estimation [70], and feature selection for activity motif discovery [71]. The advantages of random projections are the simplicity of their implementation and their scalability, robustness to noise, and low computational complexity: constructing the random matrix $R$ and projecting the $n \times m$ data matrix into $k$ dimensions are of order $O(nmk)$.

The HAD dataset has been used in HAR research by other authors, too. Using the same HAD dataset, Zheng [66] has achieved 95.6% accuracy. He used the means and variances of the magnitude and angles produced by a triaxial acceleration vector as the activity features. The classifiers used to distinguish different activity classes were the Least Squares Support Vector Machine (LS-SVM) and the Naïve-Bayes (NB) algorithm. Sivakumar [67] achieved 84.3% overall accuracy using symbolic approximation of time series of accelerometer and gyroscope signals. Vaka [68] achieved 90.7% accuracy for within-person classification and 88.6% accuracy for interperson classification using Random Forest. The features used for the recognition were time domain features: mean, standard deviation, correlation between each pair of the x-, y-, and z-axes, and root mean square of a signal. Our results (95.52% accuracy), obtained using the proposed method, are very similar to the best results of Zheng for the activity recognition task.

The results obtained by different authors using the USC-HAD dataset are summarized in Table 10.

We think that it would be difficult to achieve even higher results due to some problems with the analysed dataset, which include a set of problems inherent to many Human Activity Datasets as follows:
(i) Accurate Labelling of All Activities. Existing activity recognition algorithms usually are based on supervised learning where the training data depends upon accurate labelling of all human activities. Collecting consistent and reliable data is a very difficult task since some activities may have been marked by users with wrong labels.
(ii) Transitionary/Overlapping Activities. Often people do several activities at the same time. The transition states (such as walking-standing, lying-standing) can be treated as additional states, and the recognition model can be trained with respect to these states to increase the accuracy.
(iii) Context Problem. It occurs when the sensors are placed at an inappropriate position relative to the activity being measured. For example, with accelerometer-based HAR, the location where the phone is carried, such as in the pocket or in the bag, impacts the classification performance.
(iv) Subject Sensitivity. It measures dependency of the trained classification model upon the specifics of the user.
(v) Weak Link between Basic Activities and More Complex Activities. For example, it is rather straightforward to detect whether the user is running, but inferring whether the user is running away from danger or jogging in a park is different.
(vi) Spurious Data. Most published studies handle the problem of the fuzzy borders by manual data cropping.

6. Conclusion

Monitoring and recognizing human activities are important for assessing changes in physical and behavioural profiles of the population over time, particularly for the elderly and impaired and patients with chronic diseases. Although a wide variety of sensors are being used in various devices for activity monitoring, the positioning of the sensors, the selection of relevant features for different activity groups, and providing context to sensor measurements still pose significant research challenges.

In this paper, we have reviewed the stages needed to implement a human activity recognition method for automatic classification of human physical activity from on-body sensors. A major contribution of the paper lies in pursuing the random projections based approach for feature dimensionality reduction. The results of extensive testing performed on the USC-HAD dataset (we have achieved overall accuracy of within-person classification of 95.52% and interperson identification accuracy of 94.75%) reveal the advantages of the proposed approach. Gait-related activities (Walking Forward, Walking Left, and Walking Right) allowed the best identification of subjects, opening the way for a multitude of applications in the area of gait-based identification and verification.

Future work will concern the validation of the proposed method using other datasets of human activity data as well as integration of the proposed method in the wearable sensor system we are currently developing for applications in indoor human monitoring.

Competing Interests

The authors declare that there are no competing interests regarding the publication of this paper.

Acknowledgments

The authors would like to acknowledge the contribution of the COST Action IC1303 AAPELE: Architectures, Algorithms and Platforms for Enhanced Living Environments.