Abstract

Nowadays, activity recognition is a central topic in numerous applications such as patient and sport activity monitoring, surveillance, and navigation. Focusing on the latter, and in particular on Pedestrian Dead Reckoning navigation systems, activity recognition is generally exploited to obtain landmarks on building maps that allow the navigation routines to be calibrated. The present work aims to contribute to the definition of a more effective movement recognition for Pedestrian Dead Reckoning applications. The signal acquired by a belt-mounted triaxial accelerometer is the input to the movement segmentation procedure, which exploits the Continuous Wavelet Transform to detect and segment cyclic movements such as walking. The segmented movements are then provided to a supervised learning classifier in order to distinguish between activities such as walking, walking downstairs, and walking upstairs. In particular, four supervised learning classification families are tested: decision tree, Support Vector Machine, k-nearest neighbour, and Ensemble Learner. Finally, the accuracy of the considered classification models is evaluated and the corresponding confusion matrices are presented.

1. Introduction

The ability to recognize daily activities or particular movements may be helpful in many contexts such as user mobility identification [1], monitoring of daily activities [2], and applications such as patient and sport activity monitoring [3, 4], surveillance [5], and navigation [6]. Activity recognition (AR) can be carried out by acquiring kinematic data from body-worn sensors. Microelectromechanical Systems (MEMS) technology [7] has made low-cost sensors available on the market, so that an increasing number of mobile devices are now equipped with MEMS sensors. Most of these devices include an accelerometer, a gyroscope, and an altimeter and can be employed for various activity recognition studies [8]. Focusing on navigation solutions, AR may play an important role in Pedestrian Dead Reckoning (PDR) applications, which can be used for several dynamic human jobs and actions [6].

PDR techniques represent one of the major challenges in navigation, since they would enable navigation systems to rely more on body-worn sensors and less on external infrastructures such as GPS [9].

The proposed technique aims to extend a PDR scheme focused on the walking movement only [10] to a generalized PDR algorithm able to distinguish several daily movements, as illustrated in Figure 1. In particular, we focus on the movement segmentation and movement classification blocks that are highlighted in Figure 1. In the present work, the acceleration signal provided by a body-worn sensor is acquired through consecutive time windows of 1-second duration with an overlapping factor of 50%; then, the signal is segmented into portions which contain individual, still unclassified movements. Finally, a supervised learning technique classifies the segmented cyclic movements such as walking, walking upstairs, and walking downstairs.

The movement segmentation procedure exploits the Continuous Wavelet Transform (CWT) [11], while the movement classification block extracts representative features [12–14] from the movement and applies different classifiers belonging to the following families: decision tree (DT), Support Vector Machine (SVM), k-nearest neighbour (k-NN), and Ensemble Learner (EL) [14, 15]. Finally, the computed classification models are compared by evaluating their predictive accuracy.

An AR dataset has been collected from the recordings of 10 subjects performing low-level activities while carrying a belt-mounted Samsung Galaxy S4 smartphone, as shown in Figure 2. Each subject was required to perform four trials for each considered activity: walking, walking downstairs, and walking upstairs. This work extends the method proposed in [16] with a more effective segmentation procedure and relies on a new dataset.

In Section 2 the related works and motivation are presented. In Section 3 the proposed approach for daily movement recognition is described. In Section 4 the movement segmentation process is described, with specific attention to the cyclic movement detection and segmentation block. In Section 5 the feature extraction mechanism of the classification system is described. In Section 6 we describe the classification algorithms and the cross validation procedure. In Section 7 the performance assessment of the implemented classifiers is presented; in particular, we evaluate the accuracy of the considered classification models and the corresponding confusion matrices.

2. Related Works and Motivation

When dealing with PDR systems, the step can be detected from either the acceleration signal or the angular rate signal [17], as illustrated in Figure 1. In particular, when step detection relies on the analysis of the acceleration signal, several techniques can be applied depending on the on-body sensor location [17, 18]. In the case of a foot-mounted sensor, step detection can easily be performed by identifying the stance and swing phases of the foot, with the stance phases corresponding to zero velocity periods (ZVPs) [18–20]. This approach cannot be applied to handheld or belt-mounted sensors [18, 21], since these configurations do not produce zero velocity periods; in these cases zero-crossing detection (ZDT) or peak detection (PDT) can be adopted for the step detection routine. However, both PDT and ZDT may detect multiple steps when only one step actually occurs [18], leading to detection errors which can affect the resulting PDR navigation solution.
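
The multiple-detection issue can be illustrated with a minimal Python sketch of a zero-crossing detector. The function name and the sample values are hypothetical and are not taken from the cited works; the sketch only assumes that a step is counted at each upward zero crossing of the gravity-free acceleration.

```python
def upward_zero_crossings(samples):
    """Return the indices n where the signal goes from negative to non-negative."""
    return [n for n in range(1, len(samples))
            if samples[n - 1] < 0.0 <= samples[n]]


# Two idealised steps (synthetic values, one upward crossing per step).
clean = [-1.0, -0.5, 0.5, 1.0, 0.5, -0.5, -1.0, -0.5, 0.5, 1.0]

# The same two steps with a small irregularity near the first crossing.
noisy = [-1.0, -0.5, 0.1, -0.1, 0.5, 1.0, 0.5, -0.5, -1.0, -0.5, 0.5, 1.0]

clean_steps = upward_zero_crossings(clean)   # two detections
noisy_steps = upward_zero_crossings(noisy)   # three detections for two steps
```

In this toy example a single small oscillation around zero is enough to produce one spurious detection, which is the kind of error the segmentation approach described in the following aims to avoid.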

The first part of the present work introduces a possible solution to the issue of step detection errors. Continuous Wavelet Transform (CWT) analysis is exploited to segment the acceleration signal by detecting the singularities which correspond to the movement boundaries and occur between consecutive cyclic movements such as walking. The fundamental advantage of this approach is that the movement detection process can discard the singularities caused by irregularities of the acceleration signal and take into account only the ones that actually mark the boundaries between consecutive movements, as illustrated in Figure 3. The segmentation process is described in Section 4. Wavelet analysis, in particular the Discrete Wavelet Transform (DWT), has been investigated in the literature for both gait analysis [22] and feature computation in movement classification techniques [23], while the use of the CWT for movement segmentation is a novel approach.

As mentioned in [12–14], activity recognition (AR) usually acquires the signal through time windows which are typically a few seconds long in order to reveal a particular activity in a certain time interval, for instance, by exploiting acceleration signal features [24]. This approach can be adopted to obtain landmarks on building maps in order to calibrate the PDR routines, particularly in locations such as stairs or elevators [10]. However, the signal in the considered time window most likely does not refer to an individual movement but rather to a group of movements, which may also involve different activities. Moreover, a wider time window reduces the real-time processing capability of the AR system. A signal segmentation procedure able to separate the portions of signal which refer to single cyclic movements would instead enable an AR system to classify individual movements. As mentioned in [18], activity recognition has also been investigated to recognize different step modes and device poses.

The second part of the present work aims to classify the cyclic movements, such as walking, walking upstairs, and walking downstairs, which have been previously segmented through the CWT-based segmentation routine. In particular, supervised learning techniques such as DT, SVM, k-NN, and EL are applied in the classification routines.

3. Proposed Approach

The present work considers an AR system which combines a CWT-based signal segmentation with supervised learning techniques (SLTs) to classify the segmented movements. The CWT operation segments the acceleration signal into portions which refer to individual movements. The SLTs then classify the segmented movements, such as walking, walking upstairs, and walking downstairs, by exploiting algorithms such as DT, SVM, k-NN, and EL. The proposed approach considers a body-worn acceleration sensor whose signal is acquired through a time window of 1-second duration with an overlapping factor of 50%. Since the signal segmentation routine aims to detect particular time events in the signal, the overlapping factor is crucial to avoid missed detections at the boundaries of the considered time window. Section 7 describes how the overlapping factor affects the performance of the movement recognition system, in particular the accuracy of the classification algorithms.
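
The windowing scheme can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `sliding_windows` and its signature are hypothetical, while the 100 Hz sample rate, the 1-second window, and the 50% overlapping factor are the values used in this work.

```python
def sliding_windows(samples, fs_hz=100, window_s=1.0, overlap=0.5):
    """Yield (start_index, window) pairs over a sequence of samples.

    The defaults follow the values given in the text (100 Hz, 1 s, 50%);
    the function itself is an illustrative assumption.
    """
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    size = int(round(window_s * fs_hz))                 # samples per window
    hop = max(1, int(round(size * (1.0 - overlap))))    # stride between windows
    for start in range(0, len(samples) - size + 1, hop):
        yield start, samples[start:start + size]


signal = list(range(300))  # 3 s of dummy samples at 100 Hz

starts_50 = [s for s, _ in sliding_windows(signal, overlap=0.5)]
starts_25 = [s for s, _ in sliding_windows(signal, overlap=0.25)]
starts_00 = [s for s, _ in sliding_windows(signal, overlap=0.0)]
```

With a 50% overlapping factor every sample that lies on the boundary of one window falls in the middle of another, so an event located at a window edge is still observed with context on both sides.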

This contribution aims to enhance the traditional Pedestrian Dead Reckoning (PDR) routine through a generalized movement classifier as illustrated in Figure 1.

The acceleration signal within the current time window is processed by the movement segmentation block, as shown in Figure 4: it aims to detect and segment cyclic movement activity by exploiting the CWT operation, as depicted in Figure 5. Then, the segmented individual movement is provided as input to the feature extraction routine shown in Figures 4 and 6, which extracts the features and passes them to the classification algorithm. The scheme in Figure 4 represents the employed classification system, which involves three primary blocks: feature extraction, classification and cross validation, and performance assessment.

In order to evaluate the proposed procedures on real movements, an activity dataset has been collected from 10 subjects in the 25–65 age range performing low-level human activities. For classification purposes we consider walking forward, walking upstairs, and walking downstairs, which are difficult to tell apart and can often be mistaken for one another. Each subject was asked to perform four trials for each activity on different days. Acceleration data are captured by a 3-axis acceleration sensor at a 100 Hz sample rate. The accelerometer is integrated in a Samsung Galaxy S4 smartphone which is belt-mounted as shown in Figure 2. The dataset has been collected inside a shopping mall. Figure 7 shows the indoor scenario and the footpath followed by the subjects who participated in the recordings.

4. Movement Segmentation

The movement segmentation block processes the acceleration signal within the current time window. First, low-pass filtering is performed in order to identify the DC component of the signal. The latter is then subtracted from the original signal to separate the acceleration component due to gravity (the DC component) from the one related to the linear acceleration [14], as shown in Figure 4.
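
A minimal Python sketch of this separation is given below. The specific low-pass filter is not detailed here, so the sketch assumes a first-order recursive (exponential) low-pass filter with a hypothetical smoothing factor `alpha`; the function name and the synthetic input are also assumptions.

```python
import math


def split_gravity(samples, alpha=0.98):
    """Split one acceleration axis into a DC (gravity) estimate and the linear part.

    Assumption: a first-order recursive low-pass filter estimates the DC
    component; alpha close to 1 gives a lower cut-off frequency.
    """
    gravity = []
    g = samples[0]                      # initialise the filter state
    for a in samples:
        g = alpha * g + (1.0 - alpha) * a
        gravity.append(g)
    linear = [a - g for a, g in zip(samples, gravity)]
    return gravity, linear


fs_hz = 100
# Synthetic axis: constant gravity plus a 2 Hz oscillation (illustrative only).
raw = [9.81 + 0.5 * math.sin(2 * math.pi * 2.0 * n / fs_hz) for n in range(200)]
gravity, linear = split_gravity(raw)
```

By construction the two outputs add up to the original signal, so no information is lost: the gravity estimate carries the slowly varying part and the linear acceleration carries the movement-related part.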

The orientation of the smartphone is fixed as shown in Figure 2 so that one axis of the accelerometer integrated into the smartphone matches the movement direction in the horizontal plane. By observing the signal on this axis in Figure 8(a), it is reasonable to search for the singularities between adjacent step movements in order to separate consecutive movements and to extract the waveform which refers to an individual step. The signal on this axis is evaluated because it shows the clearest singularities among all axes. Figure 3 shows how the segmentation process should work by selecting the singularities at the movement boundaries and discarding the undesired ones.

The Discrete Fourier Transform (DFT) of the signal provides information about its frequency content, but it does not provide time information about where the signal singularities occur within the considered time window [11]. For this reason, the DFT is not suitable for segmenting movements through the proposed approach.
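
This limitation can be checked numerically with a small Python sketch. The two test signals are synthetic and the helper `dft_magnitude` is a hypothetical name; the sketch only shows that the DFT magnitude of an isolated spike does not depend on where the spike occurs in the window.

```python
import cmath
import math


def dft_magnitude(x):
    """Magnitude of the DFT of x, computed directly from the definition."""
    n = len(x)
    return [abs(sum(v * cmath.exp(-2j * math.pi * k * i / n)
                    for i, v in enumerate(x)))
            for k in range(n)]


# Two windows containing the same spike at different time positions.
early = [0.0] * 32
early[5] = 1.0
late = [0.0] * 32
late[20] = 1.0

mag_early = dft_magnitude(early)
mag_late = dft_magnitude(late)
```

The two magnitude spectra coincide, since a time shift only changes the phase of the DFT coefficients; the position of the event therefore cannot be read from the magnitude spectrum.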

On the other hand, the Continuous Wavelet Transform (CWT) permits bringing out specific time events by decomposing the signal over dilated and translated wavelets [11]. A wavelet is a short waveform generally named $\psi$: $\psi \in L^{2}(\mathbb{R})$ and centered in the neighbourhood of $t = 0$ such that

$$\int_{-\infty}^{+\infty} \psi(t)\,dt = 0. \tag{1}$$

Suppose that $\psi$ is a real wavelet; the corresponding Real Wavelet Transform of the function $f$ is given as

$$Wf(u,s) = \int_{-\infty}^{+\infty} f(t)\,\frac{1}{\sqrt{s}}\,\psi^{*}\!\left(\frac{t-u}{s}\right)dt, \tag{2}$$

where $^{*}$ indicates the complex conjugate operator. This operator measures the variation of the function $f$ in a neighbourhood of $u$ whose size is proportional to $s$. In our case, the variable $u$ represents the time translation variable which enables the wavelet to scan the entire time window. On the other hand, the variable $s$ represents the scale, which permits dilating the wavelet in order to bring out particular events occurring at a certain time within the analyzed signal waveform. Expression (2) represents the inner product between the given function $f$, which in our case represents the acceleration signal, and a particular wavelet. The result of the inner product is directly proportional to the correlation between the function and the appropriately dilated wavelet. Since singularities between movements are searched for, as shown in Figure 3, high correlation values are expected when considering fine scales of the wavelet. Knowing that the local Lipschitz regularity of a function $f$ at a particular time $u_0$ depends on the decay at fine scales of $|Wf(u,s)|$ in the neighbourhood of $u_0$, singularities can be evaluated from the local maxima values of $|Wf(u,s)|$ [11].

We describe below how to detect the singularities which separate consecutive movements in the acceleration signal of the walking activity on the analysed axis, shown in Figure 8(a).

First, a suitable wavelet is chosen for our application: the Mexican hat wavelet guarantees that the maxima of the CWT modulus that are located at $(u_0, s_0)$ belong to a maxima line that propagates toward finer scales [11]. The mathematical expression of this wavelet is

$$\psi(t) = \frac{2}{\sqrt{3}\,\pi^{1/4}}\left(1 - t^{2}\right)e^{-t^{2}/2}, \tag{3}$$

and the respective waveform is illustrated in Figure 9. Afterwards, we consider the scale range given by values of $s$ ranging from 1 to 16 (this scale range has been heuristically selected: the good performance of the proposed algorithm supports this choice).

The upper graph of Figure 10 represents the linear acceleration signal on the analysed axis during a walking activity (Figure 8(a)); the central plot illustrates the CWT coefficients of this acceleration signal computed with a Mexican hat wavelet; the lower plot brings out the local maxima (light-blue stars) of the CWT and shows four maxima lines which start at coarse scales and converge to the target singularities at finer scales, thus permitting the left and right boundaries of the step movement to be identified. In particular, in the lower plot of Figure 10, the second and fourth maxima lines from the left correspond to the singularities visible in the upper plot of Figure 10. By focusing on the singularities which mark the left and right boundaries, as shown in Figure 3, the acceleration signal which refers to the individual step movement can be extracted as the signal included between this pair of singularities. Finally, the linear acceleration of the segmented movement is delivered to the movement classification block, as shown in Figure 4.
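
The procedure can be sketched in Python as follows. This is a simplified illustration and not the implementation used in this work: the CWT integral is approximated by a discrete sum, the Mexican hat wavelet is assumed in its usual unit-norm form, the function names are hypothetical, and the test signal is a synthetic spike rather than a recorded acceleration.

```python
import math


def mexican_hat(t):
    """Mexican hat wavelet (assumed standard unit-norm form)."""
    c = 2.0 / (math.sqrt(3.0) * math.pi ** 0.25)
    return c * (1.0 - t * t) * math.exp(-0.5 * t * t)


def cwt(signal, scales):
    """Approximate Wf(u, s) by a discrete sum; returns one row per scale."""
    n = len(signal)
    rows = []
    for s in scales:
        half = int(math.ceil(5 * s))    # the wavelet is negligible for |t| > 5 s
        kernel = [mexican_hat(k / s) / math.sqrt(s)
                  for k in range(-half, half + 1)]
        row = []
        for u in range(n):
            acc = 0.0
            for j, w in enumerate(kernel):
                t = u + j - half        # sample index covered by kernel tap j
                if 0 <= t < n:
                    acc += signal[t] * w
            row.append(acc)
        rows.append(row)
    return rows


def modulus_maxima(row):
    """Indices of the strict local maxima of |row|."""
    mag = [abs(v) for v in row]
    return [u for u in range(1, len(mag) - 1)
            if mag[u] > mag[u - 1] and mag[u] > mag[u + 1]]


# Synthetic window of 100 samples with one sharp event at sample 50.
signal = [0.0] * 100
signal[50] = 1.0

scales = range(1, 17)                   # scales 1..16, as in the text
coeffs = cwt(signal, scales)

# Position of the largest CWT modulus at every scale.
peaks = [max(range(len(row)), key=lambda u: abs(row[u])) for row in coeffs]
```

In this toy case the largest modulus maximum stays at the position of the event for every scale, which is the behaviour of a maxima line pointing to a singularity. On real acceleration data the maxima would additionally have to be chained across scales and filtered, a step that this sketch omits.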

5. Feature Extraction

Feature extraction is a crucial operation in classification problems and largely determines the classifier performance. If the extracted features provide poor class separation, the target classes will not be effectively distinguished. In the activity recognition context, feature extraction is carried out through many approaches [12–14]. In the present work, the feature extraction block depicted in Figure 6 receives the segmented acceleration signal, which refers to an individual cyclic movement and has to be classified by the classification routine. The signal undergoes the following operations: the Fast Fourier Transform (FFT) of the segmented acceleration signal is computed on each axis and the frequency which corresponds to the maximum value of the FFT magnitude is extracted, as shown in Figure 6. Afterwards, low-pass filtering is performed around this maximum through an equiripple FIR filter. Table 1 lists the filter parameters. The features considered in the feature extraction routine are the energy of both the current and the filtered linear acceleration signal [24], the mean of the current acceleration, and the variance of both the current and the filtered signal. We also consider the correlation coefficients between the current acceleration axes and the correlation coefficients between the filtered acceleration axes.

Finally, we consider the maximum value of the magnitude of the FFT operation on the current acceleration signal for each axis. The features are summarized and listed in the following.

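Feature Description

F1, F2, F3: energy of the current linear acceleration signal on each axis.

F4, F5, F6: energy of the LPF linear acceleration signal on each axis (LPF: Low-pass Filtered).

F7, F8, F9: mean of the current acceleration signal on each axis.

F10, F11, F12: variance of the current acceleration signal on each axis.

F13, F14, F15: variance of the LPF acceleration signal on each axis.

F16, F17, F18: correlation coefficients between pairs of axes of the current acceleration signal.

F19, F20, F21: correlation coefficients between pairs of axes of the LPF acceleration signal.

F22, F23, F24: maximum value of the FFT magnitude of the current acceleration signal on each axis.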
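
A minimal Python sketch of how such a 24-element feature vector could be assembled is given below. Several points are assumptions made for illustration: the function names are hypothetical, a moving average stands in for the equiripple FIR filter, the DFT is computed directly instead of through an FFT routine, the variance is the population variance, and the ordering of the features follows the order in which they are introduced in the text.

```python
import cmath
import math


def energy(x):
    return sum(v * v for v in x)


def mean(x):
    return sum(x) / len(x)


def variance(x):
    m = mean(x)
    return sum((v - m) ** 2 for v in x) / len(x)


def correlation(x, y):
    """Pearson correlation coefficient; 0.0 if one signal is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return cov / den if den else 0.0


def max_dft_magnitude(x):
    """Largest DFT magnitude over the non-negative frequency bins."""
    n = len(x)
    return max(abs(sum(v * cmath.exp(-2j * math.pi * k * i / n)
                       for i, v in enumerate(x)))
               for k in range(n // 2 + 1))


def moving_average(x, width=5):
    """Stand-in low-pass filter (assumption; not the equiripple FIR of the text)."""
    half = width // 2
    return [sum(x[max(0, i - half):i + half + 1]) / len(x[max(0, i - half):i + half + 1])
            for i in range(len(x))]


def feature_vector(axes):
    """axes: three equal-length lists for one segmented movement."""
    lpf = [moving_average(a) for a in axes]
    pairs = [(0, 1), (0, 2), (1, 2)]
    feats = [energy(a) for a in axes]                 # energy, current signal
    feats += [energy(a) for a in lpf]                 # energy, filtered signal
    feats += [mean(a) for a in axes]                  # mean, current signal
    feats += [variance(a) for a in axes]              # variance, current signal
    feats += [variance(a) for a in lpf]               # variance, filtered signal
    feats += [correlation(axes[i], axes[j]) for i, j in pairs]
    feats += [correlation(lpf[i], lpf[j]) for i, j in pairs]
    feats += [max_dft_magnitude(a) for a in axes]     # FFT magnitude maximum
    return feats


n = 64
axes = [  # synthetic three-axis movement, for illustration only
    [math.sin(2 * math.pi * 2 * i / n) for i in range(n)],
    [math.cos(2 * math.pi * 2 * i / n) for i in range(n)],
    [0.5 * math.sin(2 * math.pi * 4 * i / n) + 0.1 for i in range(n)],
]
features = feature_vector(axes)
```

Each segmented movement is thus reduced to a fixed-length vector regardless of its duration, which is what allows movements of different lengths to be fed to the same classifier.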

6. Classification Algorithms and Cross Validation

In Section 5 we explained how to extract features from the target movements in order to discriminate them. In this section we describe the classification algorithms, which build a mathematical model able to effectively map the features to the respective classes.

In the present work, four families of classification algorithms are considered: decision tree (DT), Support Vector Machine (SVM), k-nearest neighbour (k-NN), and Ensemble Learner (EL) [2, 14, 15]. For each family we selected three algorithms with a specific trade-off between computational load and classification accuracy, as shown in Figure 11.

As in regression, classification may suffer from overfitting, which reduces the predictive capability of the model. Overfitting may be caused by an overly complex predictive model, that is, by too many features relative to a low number of examples. Moreover, overfitting also depends on how well the model structure matches the shape of the data: the model must be able to generalize.

In order to avoid overfitting and to evaluate the predictive accuracy of the fitted models, we can use the cross validation approach [13]. In this work, a validation scheme has been chosen before training any model so that all the models are compared using the same validation method. In particular, the 10-fold cross validation scheme has been adopted, in which the data are partitioned into 10 folds. For each fold, we trained each model using all the data outside the fold and then tested its performance using the data inside the fold; finally, we calculated the average test error over all folds. This method gives a good estimate of the predictive accuracy of the final model, which is trained with all the data.
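
The 10-fold scheme can be sketched in Python as follows. The sketch is only illustrative: the function names are hypothetical, a plain k-NN classifier with Euclidean distance is used as the model, and the two-dimensional data are synthetic clusters that do not come from the dataset of this work.

```python
import random


def k_fold_indices(n_samples, n_folds=10, seed=0):
    """Shuffle the sample indices and deal them into n_folds disjoint folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[f::n_folds] for f in range(n_folds)]


def knn_predict(train_x, train_y, query, k=1):
    """Majority vote among the k training points closest to the query."""
    order = sorted(range(len(train_x)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], query)))
    votes = [train_y[i] for i in order[:k]]
    return max(sorted(set(votes)), key=votes.count)   # sorted: deterministic ties


def cross_val_error(xs, ys, n_folds=10, k=1):
    """Average misclassification rate over the folds."""
    errors = []
    for fold in k_fold_indices(len(xs), n_folds):
        held_out = set(fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        tx = [xs[i] for i in train]
        ty = [ys[i] for i in train]
        wrong = sum(knn_predict(tx, ty, xs[i], k) != ys[i] for i in fold)
        errors.append(wrong / len(fold))
    return sum(errors) / len(errors)


# Synthetic, well-separated clusters standing in for three movement classes.
rng = random.Random(1)
centres = {"walking": (0.0, 0.0), "upstairs": (5.0, 5.0), "downstairs": (10.0, 0.0)}
xs, ys = [], []
for label, (cx, cy) in centres.items():
    for _ in range(20):
        xs.append((cx + rng.uniform(-1, 1), cy + rng.uniform(-1, 1)))
        ys.append(label)

folds = k_fold_indices(len(xs))
error = cross_val_error(xs, ys, n_folds=10, k=1)
```

Every sample is used exactly once for testing and nine times for training, so the averaged error is computed on data that the corresponding model has never seen.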

7. Performance Assessment

In this section the results of the classification algorithms described in Section 6 are reported and evaluated. The performance assessment takes into account the same cross validation approach for all the adopted models in order to allow a coherent comparison. The supervised learning field provides various classification algorithms with different approaches, each of which presents different characteristics in terms of memory usage, fitting speed, prediction speed, and predictive accuracy [13].

The DT algorithm builds a classification tree from the training examples and separates the classes through successive splits down the tree [24]. It offers fast prediction, low memory usage, and medium predictive accuracy.

The SVM is a classification model which tries to find the hyperplane that maximizes the separation margin between the training data points of different classes. Depending on the kernel function, the SVM can effectively separate the training data [13, 18]: it has a very good predictive accuracy, but it offers low memory usage and fast prediction speed only when there are few support vectors.

The k-NN algorithm categorizes new data based on their distance from the neighbours in the training set. It requires high memory usage and provides a very good predictive accuracy and a fast prediction speed only when the feature space dimension is low.

Finally, the EL fuses the results from weak learners to build one high-performance classifier [13]. The three considered ELs are boosted trees, subspace k-NN, and subspace discriminant, all of them with a number of learners equal to 200.

The performance of the classification algorithms is evaluated through the misclassification rate. The accuracy results of the supervised learning classification routines described in Section 6 are shown in Figure 11. The latter illustrates four groups of bars, each corresponding to a particular supervised learning classification family: the DT algorithms involve three different values of the maximum number of splits: 5 (1st bar), 10 (2nd bar), and 100 (3rd bar); the SVM models refer to a linear kernel (4th bar), a quadratic kernel (5th bar), and a cubic kernel (6th bar); the three k-NN models refer to different numbers of neighbours: 100 (7th bar), 10 (8th bar), and 1 (9th bar); finally, the 10th, 11th, and 12th bars refer to three different EL models with 200 learners: boosted tree, subspace discriminant, and subspace k-NN, respectively. The SVM models achieve the best accuracy with respect to the other classifiers: in particular, the model with cubic kernel is the most accurate (6th bar in Figure 11). The k-NN models (7th, 8th, and 9th bars in Figure 11) also obtain very good accuracy results in classifying the testing movements.

For each of the considered classification algorithm families (DT, SVM, k-NN, and EL), the confusion matrix of the classifier with the best accuracy is provided in Tables 2, 3, 4, and 5. The confusion matrices are provided in order to determine the movements that are most likely to be mixed up. The results show that the downstairs movement has the highest false negative rate and is the one that produces the majority of the misclassification errors with respect to the other movements. This is because the singularities which mark the movement boundaries in the acceleration values recorded on the analysed axis during the walking downstairs activity are less clear than the singularities produced by the other movements.
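
The quantities used in this assessment can be computed as in the following Python sketch. The function names are hypothetical and the labels are invented purely to exercise the code: they are not results of this work and do not reproduce Tables 2 to 5.

```python
def confusion_matrix(true_labels, predicted_labels, classes):
    """Rows are true classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted_labels):
        matrix[index[t]][index[p]] += 1
    return matrix


def false_negative_rates(matrix):
    """Per-class fraction of samples assigned to any other class."""
    rates = []
    for i, row in enumerate(matrix):
        total = sum(row)
        rates.append((total - row[i]) / total if total else 0.0)
    return rates


def accuracy(matrix):
    """Fraction of correctly classified samples (1 - misclassification rate)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / sum(sum(row) for row in matrix)


classes = ["walking", "upstairs", "downstairs"]

# Invented labels, for illustration only.
true = ["walking"] * 4 + ["upstairs"] * 4 + ["downstairs"] * 4
pred = ["walking", "walking", "walking", "walking",
        "upstairs", "upstairs", "upstairs", "walking",
        "downstairs", "downstairs", "walking", "upstairs"]

matrix = confusion_matrix(true, pred, classes)
fnr = false_negative_rates(matrix)
acc = accuracy(matrix)
```

Reading the matrix row by row shows which class absorbs the errors of another one, which is the information needed to identify the movements that are most often mixed up.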

The classification system performance also depends on the effectiveness of the movement segmentation algorithm described in Section 4. The accuracy of the considered classification algorithms which rely on the proposed CWT-based segmentation technique has been compared with the accuracy resulting from a segmentation based on a zero-crossing detection approach [16, 18]. Figure 12 shows that the CWT-based segmentation leads to an improvement in terms of classification accuracy. In particular, Figure 12 illustrates three groups of bars, each corresponding to a particular supervised learning classification family: the 1st group refers to the DT, the 2nd group refers to the SVM, and the 3rd group refers to k-NN. Each group of bars includes three subgroups of two bars: the 1st subgroup refers to the coarse accuracy classifiers (C-), that is, a maximum number of splits equal to 5 for the DT, 100 neighbours for k-NN, and linear kernel for the SVM; the 2nd subgroup refers to the medium accuracy classifiers (M-), that is, a maximum number of splits equal to 10 for the DT, 10 neighbours for k-NN, and quadratic kernel for the SVM; the 3rd subgroup refers to the fine accuracy classifiers (F-), that is, a maximum number of splits equal to 100 for the DT, one neighbour for k-NN, and cubic kernel for the SVM. CWT refers to the classification algorithm which relies on the CWT-based segmentation, while ZDT refers to the classification algorithm which relies on the zero-crossing detection to segment the acceleration signal.

Moreover, the classification algorithms have been evaluated by considering different overlapping factors of the time window: 0%, 25%, and 50%. The value of 50% provides the best accuracy, as depicted in Figure 13. The latter shows four groups of bars, each referring to a particular medium accuracy classifier: the DT with a maximum number of splits equal to 10 (1st, 2nd, and 3rd bars), the SVM with quadratic kernel (4th, 5th, and 6th bars), k-NN with 10 neighbours (7th, 8th, and 9th bars), and the Ensemble Learner with subspace discriminant (10th, 11th, and 12th bars).

8. Conclusion

In this paper, a generalized movement classifier for Pedestrian Dead Reckoning applications has been proposed. In particular, movement segmentation and classification routines have been developed. The first one effectively segments cyclic movements by exploiting the Continuous Wavelet Transform (CWT). The second one applies supervised learning classification techniques in order to classify the segmented movements.

The segmentation procedure makes it possible to use a short analysis time window and to identify individual movements by detecting the signal singularities which characterize the movement boundaries, thus enabling the classification procedure to classify single movements and potentially extending Pedestrian Dead Reckoning to further activities.

The segmentation block receives acceleration data acquired at a 100 Hz sample rate by the sensor integrated in a waist-mounted Samsung Galaxy S4 smartphone. For the purpose of this work we focused on activities such as walking forward, walking downstairs, and walking upstairs, which are very similar and therefore difficult to distinguish. The classification system has been described by focusing on the primary feature extraction and classification blocks. Furthermore, four supervised learning classification families have been tested, namely, decision tree (DT), Support Vector Machine (SVM), k-nearest neighbour (k-NN), and Ensemble Learner (EL); for each of these families three classification algorithms have been selected based on the trade-off between computational load and classification accuracy. Finally, the accuracy of the considered classification models has been evaluated, showing that the SVM and k-NN achieve the best accuracy. The confusion matrices are provided to evaluate which movements are more likely to be mixed up. The results show that the walking downstairs movement has the highest false negative rate and is the one that produces most of the misclassification errors with respect to the other movements.

Competing Interests

The authors declare that they have no competing interests.