Abstract

Lower limb activity recognition utilizing body sensor data has attracted researchers due to its practical applications, such as neuromuscular disease detection and kinesiological investigations. The use of wearable sensors, including accelerometers, gyroscopes, and surface electromyography, has grown due to their low cost and broad applicability. Electromyography (EMG) sensors are preferable for automated control of a lower limb exoskeleton or prosthesis since they detect muscle activation before the actual movement, allowing faster movement detection. This study presents hybrid deep learning models for lower limb activity recognition. Noise is suppressed using the discrete wavelet transform, and the signal is then segmented using overlapping windowing. A convolutional neural network is used for temporal learning, whereas long short-term memory or a gated recurrent unit is used for sequence learning. Performance indices of the models, such as accuracy, sensitivity, specificity, and F-score, are then calculated. The findings indicate that the suggested hybrid models outperform the individual models.

1. Introduction

Lower limb activity recognition (LLAR) has grown in popularity due to its ability to monitor or identify daily lower limb human actions in a range of applications such as elderly health monitoring, surveillance and security systems, and human fall detection [1, 2]. The two methodologies used for acquiring human activity data are vision-based and wearable sensors [3]. Wearable sensors such as inertial measurement units, goniometers, and sEMG electrodes are placed on the subject’s body for data collection [4]. The vision-based approach has limited capability in terms of applicability, security, and complexity [5]. Wearable sensors have seen significant technical advancement in recent times, which has lowered their overall cost and made them more accessible. Popular sensors used in wearable activity recognition research include inertial measurement units, accelerometers, gyroscopes, electromyography, and barometers [6]. Among these, EMG sensors are superior because they can anticipate movement in advance within a very short amount of time [7, 8]. The EMG signal is a biological signal generated by neuromuscular activity; it is detected from the electrical currents in muscles during muscle contraction. Surface (non-invasive) EMG and intramuscular (invasive) EMG are the two approaches employed for recording the EMG signal [9]. Intramuscular electromyogram (iEMG) signals are captured by placing wire electrodes within the muscles, whereas surface electromyogram (sEMG) signals are captured by placing surface electrodes just above the muscle’s surface. The advantages of sEMG over iEMG are as follows [10]:
(1) There is no requirement of medical supervision for placing the electrodes, and there is no discomfort.
(2) Infection risk is mitigated.

Surface EMG signals are used in a wide range of healthcare applications, including the control of prostheses or exoskeletons, neuromuscular disease assessment, activity monitoring, and many more [11–13]. According to Kiguchi et al. [14], an sEMG signal-based neuro-fuzzy approach can be used to control an upper limb robotic exoskeleton, showing that sEMG signals can serve multiple applications. Krasin et al. [15] proposed a low-cost elbow joint powered exoskeleton; the major goal of this sEMG signal-based exoskeleton is to strengthen the biceps brachii. Sharmila et al. [16] presented a low-cost sEMG-controlled prosthetic arm for upper limb amputees. Sensors record the sEMG signals from the muscles during various activities so that prostheses can be controlled autonomously; the actuators can then be driven using artificial intelligence methods. Pancholi et al. [17] developed hardware for amputees to recognize arm gestures in real time. Vijayvargiya et al. [18] developed a low-cost sEMG data acquisition system for collecting sEMG signals. Cai et al. [19] identified upper limb motion patterns for controlling a rehabilitation robot using sEMG data and a support vector machine approach.

In the recent decade, much more emphasis has been placed on the classification and pattern recognition of upper limb sEMG signals than on lower limb signals. Classification of lower limb sEMG signals has proven more complicated than classification of upper limb signals because of the complexity induced by the inherent coupling of lower limb sEMG signals. Souit et al. [20] presented an sEMG-based control approach for a lower limb exoskeleton; the exoskeleton can be operated autonomously using artificial intelligence approaches by analyzing the sEMG signal produced by muscles during various activities. Khimraj et al. [21] investigated classification among six lower limb activities and evaluated the performance of various machine learning classifiers for the task. Silva et al. [22] carried out a research investigation on spinal cord damage based on EMG signals captured during upper limb movements. Vijayvargiya et al. [23] investigated the detection of knee abnormalities using unbalanced sEMG data for the walking activity; the authors demonstrated the impact of an imbalanced signal on model performance for the detection of a knee issue and evaluated several oversampling methods to improve machine learning model performance. Ertugrul et al. [24] proposed an adaptive local binary pattern (ALBP) approach for feature extraction and classification of healthy and abnormal knee participants with an accuracy of 85%. Machine learning models require handcrafted features extracted from the signals using statistical methods, and choosing the proper feature set manually is a tedious process. According to the available literature, deep learning models such as CNN, LSTM, GRU, and other techniques have been employed to solve this problem [25]. In deep learning techniques, the features are first retrieved by the algorithm itself, and then the classification procedure is carried out. According to earlier studies, these deep learning models have been used for various applications and have demonstrated very high performance [26–30].

This study aims to apply hybrid deep learning algorithms to identify lower limb activity. The authors present models that integrate the advantages of the convolutional neural network (CNN), long short-term memory (LSTM), and gated recurrent unit (GRU). The CNN architecture has multiple layers such as the input layer, output layer, dense layers, convolutional layers, rectified linear unit layers, and dropout layers. One significant issue with the CNN is its inability to analyze the characteristics of time-series data, such as previous or temporal information. Therefore, the LSTM or GRU can be employed for analysis, as they efficiently capture the temporal information present in the data. As a result, a combination of CNN with LSTM (CNN-LSTM) or CNN with GRU (CNN-GRU) is proposed as a better strategy for processing EMG data. In the suggested hybrid models, the convolutional neural network provides temporal learning, whereas the LSTM or GRU captures sequence-to-sequence learning. The major contributions of this study are as follows:
(1) Surface electromyography (sEMG) data acquired from leg muscles are used to examine lower limb movements in healthy and knee-deformity individuals using a hybrid deep learning framework.
(2) Wavelet denoising is applied as a preprocessing technique to eliminate noise from the sEMG signal.
(3) Hybrid deep learning models, CNN-LSTM and CNN-GRU, are proposed for the recognition of lower limb activities. Here, a convolutional neural network (CNN) is used for temporal learning, while long short-term memory (LSTM) or a gated recurrent unit (GRU) is used for sequence learning.
(4) The proposed hybrid CNN-GRU model achieves higher performance than the existing models.

2. Dataset

The authors used the publicly accessible sEMG signal dataset from the UCI machine learning repository by Sanchez et al. [24, 31]. The data comprise sEMG signals from the lower extremities of 22 individuals over the age of 18: 11 healthy and 11 with known knee injuries. The healthy participants had no history of knee injury or discomfort. Among the abnormal knee subjects, one had a sciatic nerve injury, six had anterior cruciate ligament (ACL) injuries, and the remaining four had meniscus injuries. The sEMG signals were acquired using a Biometrics Ltd. DataLog MWX8 and a goniometer while the participants performed one of three tasks: walking, sitting, and standing. sEMG data were recorded from the biceps femoris (BF), vastus medialis (VM), rectus femoris (RF), and semitendinosus (ST) muscles, with the goniometer affixed to the outer surface of the knee. The sEMG signal was recorded on the damaged limb of each person with a defective knee and on the left leg of the healthy individuals. The data were collected at a sampling rate of 1 kHz with a resolution of 14 bits. A band-pass filter with a passband of 20 Hz to 460 Hz had already been applied to the sEMG signals. The sEMG signals collected during each activity by the healthy and abnormal knee subjects are shown in Figure 1.

3. Proposed Methodology

This section explains the methods used for lower limb activity recognition (wavelet denoising, segmentation, and deep learning frameworks). Figure 2 depicts the proposed deep learning-based method for the identification of the lower limb activity based on the sEMG signals. First, noise is removed from the raw sEMG signal with the help of discrete wavelet transform, and then the signal is segmented using the overlapping windowing technique. After that, deep learning models CNN, CNN-LSTM, and CNN-GRU are applied to identify the lower limb activities in healthy and abnormal knee individuals. In these hybrid models, a convolutional neural network (CNN) is used for temporal learning, while long short-term memory (LSTM) or gated recurrent unit (GRU) is used for sequence learning.

3.1. Wavelet Denoising

Various types of noise are interlaced with the sEMG signals during recording. The most prominent noises are as follows [32]:
(i) Electronic devices and electromagnetic interference generate inherent and ambient noise, respectively.
(ii) The subject’s walking leads to electrode movement, which induces motion artifacts.
(iii) The firing rate of motor units can lead to inherent instability noise.

These noises can badly impact classifier performance, so the noise must be filtered from the sEMG signal. Several filtering approaches have been proposed in the literature. Traditional filters such as low-pass, high-pass, and band-pass can remove noise that falls outside the frequency band of the sEMG signal, which is 20 to 460 Hz. However, these methods fail to filter noise within the active spectrum of the sEMG signal. In recent times, several researchers have successfully employed empirical mode decomposition, independent component analysis, and wavelet decomposition for filtering noise from sEMG signals [33].

The use of wavelet decomposition [34, 35] has seen a rising trend in sEMG signal denoising for both the upper and lower limbs because it can effectively eliminate white Gaussian noise from the signal. In the wavelet method, a mother wavelet function is first selected. Frequency and temporal analyses are then performed by the low- and high-frequency versions of the wavelet, respectively. Wavelet denoising includes the following steps:
(i) The signal is decomposed using the discrete wavelet transform.
(ii) A threshold is chosen and applied to the coefficients.
(iii) The signal is reconstructed using the inverse wavelet transform.

Decomposition generates approximation and detail coefficients, and the level of decomposition decides the number of coefficients. Thresholding is then applied, forcing to zero all coefficients below a certain threshold, after which the signal is reconstructed. A family of mother wavelets is created by scaling ($s$) and translating ($\tau$) a single basic wavelet $\psi(t)$. The mathematical expression of the basic wavelet is presented in the following equation:

$$\psi_{s,\tau}(t) = \frac{1}{\sqrt{s}}\,\psi\!\left(\frac{t-\tau}{s}\right)$$

As per the literature, multiple thresholding approaches are employed, such as hard, soft, and universal thresholding. In this work, the authors employed the db4 mother wavelet from the Daubechies family with four decomposition levels and applied garrote thresholding to the second detail coefficients.
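The decompose–threshold–reconstruct pipeline above can be sketched in numpy. This is a minimal single-level Haar illustration with a universal threshold; the paper uses db4 with four levels and thresholds only the second detail coefficients, and the function names here are illustrative:

```python
import numpy as np

def haar_dwt(x):
    # One level of the Haar discrete wavelet transform
    x = x[: len(x) // 2 * 2]
    approx = (x[0::2] + x[1::2]) / np.sqrt(2)
    detail = (x[0::2] - x[1::2]) / np.sqrt(2)
    return approx, detail

def haar_idwt(approx, detail):
    # Inverse of one Haar level
    x = np.empty(2 * len(approx))
    x[0::2] = (approx + detail) / np.sqrt(2)
    x[1::2] = (approx - detail) / np.sqrt(2)
    return x

def garrote(c, t):
    # Non-negative garrote: zero small coefficients, shrink the rest
    safe = np.where(c == 0, 1.0, c)  # avoid division by zero
    return np.where(np.abs(c) > t, c - t**2 / safe, 0.0)

def denoise(signal):
    a, d = haar_dwt(signal)
    sigma = np.median(np.abs(d)) / 0.6745         # robust noise estimate
    t = sigma * np.sqrt(2 * np.log(len(signal)))  # universal threshold
    return haar_idwt(a, garrote(d, t))
```

In practice a library such as PyWavelets provides db4 decomposition and garrote thresholding directly.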

3.2. Segmentation

Since the sEMG signal is stochastic in nature, Vijayvargiya et al. suggest segmenting it into smaller portions. However, the segment length of the sEMG signal affects classifier accuracy. Commonly used windowing techniques such as adjacent and overlapping windowing are therefore employed for signal segmentation. Here, the authors used the overlapping windowing technique with a window size of 256 ms and 25% overlap [36].
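At the dataset's 1 kHz sampling rate, a 256 ms window spans 256 samples, and 25% overlap gives a hop of 192 samples between consecutive windows. A minimal sketch (function name illustrative):

```python
import numpy as np

def segment(signal, window=256, overlap=0.25):
    # Overlapping windowing: 256-sample windows with 25% overlap,
    # so consecutive windows start 192 samples apart
    step = int(window * (1 - overlap))
    n = (len(signal) - window) // step + 1
    return np.stack([signal[i * step : i * step + window] for i in range(n)])
```

For a 1000-sample channel this yields four windows starting at samples 0, 192, 384, and 576.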

3.3. Deep Learning Framework
3.3.1. Convolutional Neural Network (CNN)

Generally, an artificial neural network is fully connected: every neuron is connected via weights to every neuron in the next layer. This connectivity can lead to overfitting of the network. To address this problem, researchers have presented regularization techniques that penalize the magnitude of the weights in the loss function. The CNN achieves a form of regularization through its architecture, progressively exploiting the structure of the dataset by assembling complex patterns from simpler ones. It uses the principle of convolution instead of pointwise matrix multiplication. An additional advantage is that these networks have fewer parameters than fully connected neural networks, so their training is fast. A CNN includes a sequence of layers: input, hidden, and output layers. The hidden layers constitute three different types: convolutional, max-pooling, and fully connected layers. The convolutional layer convolves its input with a set of learnable filters. Generally, ReLU is employed as the activation function of the neurons.

CNNs are mainly employed on 2D datasets such as video or images and are therefore referred to as 2D CNNs. As per the literature, some authors have adapted the 2D CNN into a 1D CNN for time-series signals [37].

Mathematical expressions for the layers of a CNN are described in equations (2) to (9).

(1) Convolution Layer.
(i) Forward propagation:
$$a_{ij}^{k} = \sum_{m}\sum_{n} W_{mn}\, x_{(i+m)(j+n)} + b$$
(ii) Backpropagation to update the weights:
$$\frac{\partial E}{\partial W_{mn}} = \sum_{i}\sum_{j} \frac{\partial E}{\partial a_{ij}^{k}}\, x_{(i+m)(j+n)}$$
(iii) Backpropagation to the previous layer:
$$\frac{\partial E}{\partial x_{ij}} = \sum_{m}\sum_{n} \frac{\partial E}{\partial a_{(i-m)(j-n)}^{k}}\, W_{mn}$$
(2) Max-Pooling Layer.
(i) Forward propagation (over the pooling region $R_{ij}$):
$$a_{ij}^{k} = \max_{(p,q)\in R_{ij}} a_{pq}^{k-1}$$
(ii) Backpropagation (the gradient is routed only to the location that attained the maximum):
$$\frac{\partial E}{\partial a_{pq}^{k-1}} = \begin{cases} \frac{\partial E}{\partial a_{ij}^{k}} & \text{if } a_{pq}^{k-1} = a_{ij}^{k} \\ 0 & \text{otherwise} \end{cases}$$
(3) Fully Connected Layer.
(i) Forward propagation of the ReLU activation function:
$$a = \max(0, x)$$
(ii) Backpropagation of the ReLU activation function:
$$\frac{\partial E}{\partial x} = \begin{cases} \frac{\partial E}{\partial a} & \text{if } x > 0 \\ 0 & \text{otherwise} \end{cases}$$
where $x$ is the input, $a^{k}$ denotes the output after convolution layer $k$, $k$ denotes the layer index, $W$ denotes the kernel (filter), $m \times n$ denotes the filter size, $M \times N$ denotes the input size, $b$ denotes the bias, and $E$ denotes the cost function.
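The forward passes of these layers can be sketched for the 1D case in numpy (a minimal illustration, not the paper's implementation):

```python
import numpy as np

def conv1d_forward(x, w, b):
    # Valid-mode 1D convolution (cross-correlation, as in CNN layers):
    # a[i] = sum_m w[m] * x[i + m] + b
    n = len(x) - len(w) + 1
    return np.array([np.dot(w, x[i:i + len(w)]) for i in range(n)]) + b

def relu(a):
    # ReLU forward pass: a = max(0, x), applied elementwise
    return np.maximum(0.0, a)

def maxpool1d(a, size=2):
    # Non-overlapping max pooling over windows of `size`
    a = a[: len(a) // size * size].reshape(-1, size)
    return a.max(axis=1)
```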

3.3.2. Long Short-Term Memory (LSTM)

Conventional neural networks fail to achieve the desired accuracy on sequential time-series datasets because they have no memory element to store the previous state; they rely only on the present state. The recurrent neural network therefore adds a feedback mechanism to learn from the previous step's state, and this recurrent structure handles time-series data better. However, a problem often arises as the network becomes deeper: the gradient, propagated through repeated vector-Jacobian products, tends to shrink, making training slower and less effective. This phenomenon is known as the vanishing gradient problem, and because of it, recurrent neural networks perform badly on long-term dependencies. Long short-term memory (LSTM) [38] deals effectively with the vanishing gradient problem by tackling long-term dependencies.

The architecture of the LSTM consists of an LSTM cell. Each cell contains carefully designed vectorized pointwise multiplications between the new and previous states. In addition, mathematical functions such as the hyperbolic tangent and sigmoid functions are employed to control the flow of information, instead of the single layer used in simple recurrent cells. The LSTM cell consists of three operational gates with a memory element connected via feedback, and each gate has its own bias and weight vectors. The three gates are the forget gate, the input/update gate, and the output gate. First, the current input and previous activation state flow through a sigmoid function; this layer is known as the forget gate. The sigmoid squashes the information into the range of 0 to 1, quantifying how much information should be retained for the next step's prediction. Second, the processed information from the forget gate and the input vector flow through a hyperbolic tangent function; this layer is known as the input/update gate because it uses the previous state and present state to generate new information for the cell. Finally, a linear vector addition is performed, the result flows through a hyperbolic tangent, and the output gate scales the resulting values. Overall, the sigmoid functions control which information is required and what should be forgotten from the previous cell state, which significantly increases accuracy. The mathematical expressions are given in equations (10) to (15).

Forget gate:
$$f_t = \sigma(W_f X_t + U_f h_{t-1} + B_f)$$

Input gate:
$$i_t = \sigma(W_i X_t + U_i h_{t-1} + B_i)$$

Output gate:
$$o_t = \sigma(W_o X_t + U_o h_{t-1} + B_o)$$

Intermediate state:
$$\tilde{c}_t = \tanh(W_c X_t + U_c h_{t-1} + B_c)$$

Final state:
$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$$

New state:
$$h_t = o_t \odot \tanh(c_t)$$

where W and B are the layer weight and bias vectors, respectively, U denotes the recurrent weights, σ is the sigmoid function, ⊙ denotes elementwise multiplication, and the input vector is denoted by X.
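The gate operations above can be sketched as a single LSTM time step in numpy. Parameter names are illustrative, and a recurrent weight U for the previous hidden state is made explicit:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, p):
    # p maps gate names to input weights W*, recurrent weights U*, biases B*
    f = sigmoid(p["Wf"] @ x + p["Uf"] @ h_prev + p["Bf"])        # forget gate
    i = sigmoid(p["Wi"] @ x + p["Ui"] @ h_prev + p["Bi"])        # input gate
    o = sigmoid(p["Wo"] @ x + p["Uo"] @ h_prev + p["Bo"])        # output gate
    c_tilde = np.tanh(p["Wc"] @ x + p["Uc"] @ h_prev + p["Bc"])  # intermediate state
    c = f * c_prev + i * c_tilde                                 # final (cell) state
    h = o * np.tanh(c)                                           # new (hidden) state
    return h, c
```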

3.3.3. Gated Recurrent Unit (GRU)

The gated recurrent unit [39] is a variant of the LSTM. It has similar properties to the LSTM and also solves the vanishing gradient problem, improving the network's ability to learn long-term dependencies. It likewise uses the sigmoid and hyperbolic tangent functions. Unlike the LSTM, it does not have separate forget and input gates and has no separate memory element, so it has fewer parameters than the LSTM cell and is computationally more efficient. A GRU cell has an update gate and a reset gate, both using the sigmoid function, with a hyperbolic tangent producing the candidate activation. The overall information flow is otherwise similar to that of the LSTM cell. The mathematical expressions are given in equations (16) to (19).

Update gate:
$$z_t = \sigma(W_z X_t + U_z h_{t-1} + B_z)$$

Reset gate:
$$r_t = \sigma(W_r X_t + U_r h_{t-1} + B_r)$$

Internal activation:
$$\tilde{h}_t = \tanh\!\left(W_h X_t + U_h (r_t \odot h_{t-1}) + B_h\right)$$

Output activation:
$$h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t$$

where W and B represent the respective layer weight and bias vectors, U denotes the recurrent weights, σ is the sigmoid function, ⊙ denotes elementwise multiplication, and X represents the input vector.
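Similarly, a single GRU time step can be sketched in numpy (parameter names illustrative, with explicit recurrent weights U):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x, h_prev, p):
    z = sigmoid(p["Wz"] @ x + p["Uz"] @ h_prev + p["Bz"])  # update gate
    r = sigmoid(p["Wr"] @ x + p["Ur"] @ h_prev + p["Br"])  # reset gate
    # Internal (candidate) activation uses the reset-gated previous state
    h_tilde = np.tanh(p["Wh"] @ x + p["Uh"] @ (r * h_prev) + p["Bh"])
    # Output activation blends previous state and candidate via the update gate
    return (1.0 - z) * h_prev + z * h_tilde
```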

3.3.4. Proposed Hybrid Models

One of the most serious shortcomings of the CNN is its inability to analyze historical or temporal information contained within a time-series signal, as it lacks a memory element for storing the previous state; it depends entirely on the current input. A feedback mechanism enables the recurrent neural network to learn from the previous step's state, and this recurrent structure handles time-series data more effectively. However, it frequently encounters the vanishing gradient problem and performs badly on long-term dependencies. Long short-term memory and the gated recurrent unit, variants of the recurrent neural network, successfully address the vanishing gradient problem by handling long-term dependencies.

In this study, the authors propose hybrids of a convolutional neural network with long short-term memory and with a gated recurrent unit to recognize lower limb activities. The CNN is employed for capturing the temporal relationships present in the dataset, whereas the LSTM or GRU is employed for sequence-to-sequence learning. The dataset consists of four channels sampled at 1 kHz, and the signal is segmented into 256 ms windows, so each window contains 256 samples per channel. The four channels are concatenated serially, resulting in a 1 × 1024 array that serves as input to the 1D CNN. First, the input features are normalized to the range of 0 to 1. The normalized signal is then passed through two convolutional layers, each consisting of a convolutional and a non-linear (ReLU) layer. The processed data are passed through a max-pooling layer, followed by two LSTM or GRU layers, and finally through two fully connected layers. The feature dimension is reduced by placing the pooling layer after the convolutional layers. The parameters of the studied deep learning models, found using a trial-and-error approach, are shown in Table 1.
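The pipeline described above (two Conv1D+ReLU layers, max pooling, two recurrent layers, two dense layers) can be sketched in Keras. Filter counts, kernel sizes, and unit counts below are illustrative placeholders, not the tuned values reported in Table 1:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_cnn_gru(input_len=1024, n_classes=3):
    # 1 x 1024 input: four 256-sample sEMG channels concatenated serially
    model = models.Sequential([
        layers.Input(shape=(input_len, 1)),
        layers.Conv1D(32, 5, activation="relu"),   # temporal feature learning
        layers.Conv1D(64, 5, activation="relu"),
        layers.MaxPooling1D(2),                    # reduce feature dimension
        layers.GRU(64, return_sequences=True),     # sequence learning
        layers.GRU(32),
        layers.Dense(32, activation="relu"),
        layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

Swapping the GRU layers for `layers.LSTM` gives the CNN-LSTM variant; inputs should be min-max normalized to [0, 1] beforehand.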

4. Results and Discussion

This section presents our findings for several lower limb activity recognition scenarios. The suggested models were trained and evaluated on Google Colab with a TPU, a cloud-based system; the model code in a Colab notebook executes on a Google cloud server. Python's Keras modules were used to implement the various 1D CNN models and compute the outcomes. Training uses the Adam optimizer, which combines classic backpropagation and a stochastic gradient descent strategy, with a cross-entropy loss function. The hyperparameters of the optimizer have the following values: learning rate (0.001), epsilon (0.00000001), beta 1 (0.9), beta 2 (0.999), and locking (false).

In this investigation, a total of 22 subjects, comprising normal persons and persons with knee difficulty in equal numbers, were examined. The performance indicators were evaluated on (1) healthy adults and (2) individuals with a knee anomaly. The authors used the first 70% of the signal for every subject as the training dataset and the remaining 30% as the testing dataset; compared with arbitrary test-set selection, this reduces temporal dependencies between the training and testing data. Five performance metrics were considered: accuracy, precision, sensitivity, specificity, and F-score, as in the previous study of Vijayvargiya et al. [40].
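The five metrics can be computed per class from a multiclass confusion matrix in a one-vs-rest manner; a minimal numpy sketch:

```python
import numpy as np

def metrics_from_confusion(cm):
    # cm[i, j] counts samples of true class i predicted as class j
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp
    fn = cm.sum(axis=1) - tp
    tn = cm.sum() - tp - fp - fn
    accuracy = tp.sum() / cm.sum()
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)                  # recall
    specificity = tn / (tn + fp)
    f_score = 2 * precision * sensitivity / (precision + sensitivity)
    return accuracy, precision, sensitivity, specificity, f_score
```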

Table 2 shows the performance indices (in percent) of the studied deep learning models for the three activities under study (walking, sitting, and standing), obtained from sEMG data gathered from healthy people, whereas Table 3 shows the performance for knee abnormality subjects. The data in these tables allow the models to be compared for participants with and without knee abnormalities, confirming that the suggested hybrid models outperform the individual models. For the healthy subjects, the CNN-GRU model obtained an accuracy of 99.86%, while CNN, LSTM, GRU, and CNN-LSTM achieved 98.88%, 95.38%, 98.32%, and 99.02%, respectively, as indicated in Table 2. Similarly, the F-score of the CNN-GRU model was 99.79%, compared with 98.31%, 93.82%, 97.51%, and 98.50% for CNN, LSTM, GRU, and CNN-LSTM, respectively. For the abnormal knee subjects, the CNN-GRU model obtained an accuracy of 98.69%, while CNN, LSTM, GRU, and CNN-LSTM achieved 92.62%, 54.54%, 96.69%, and 97.62%, respectively, as indicated in Table 3. Similarly, the F-score of the CNN-GRU model was 98.61%, compared with 92.22%, 47.64%, 96.55%, and 97.42% for CNN, LSTM, GRU, and CNN-LSTM, respectively.

Table 4 presents the time taken to complete one epoch for the studied deep learning models. It indicates that when a CNN front end is used, less time is required per epoch than for the individual LSTM or GRU models. For healthy participants, the proposed hybrid CNN-GRU and CNN-LSTM models needed 4.26 s and 4.34 s of computing time per epoch, respectively, whereas the CNN, LSTM, and GRU models required 2.26 s, 94.06 s, and 117.60 s. For abnormal knee subjects, the proposed hybrid CNN-GRU and CNN-LSTM models needed 4.58 s and 6.15 s per epoch, respectively, whereas the CNN, LSTM, and GRU models required 5.19 s, 98.87 s, and 106.67 s. In terms of computational time, a significant difference was observed between the hybrid models (CNN-LSTM and CNN-GRU) and the individual LSTM and GRU models, and only a very small difference between the hybrid models and the CNN, while the accuracy and F-score values of the hybrid models remain higher than those of the individual models.

Figure 3 presents the confusion matrices of the CNN, LSTM, GRU, CNN-LSTM, and CNN-GRU models for healthy subjects, whereas Figure 4 depicts the confusion matrices obtained for abnormal knee subjects. The confusion matrix provides a tabular representation of the performance of a classification method, comprising the real and predicted labels from the model. As illustrated in Figure 3(a), 105, 282, and 320 samples of walking, sitting, and standing activities are accurately predicted, but five samples of walking activity are incorrectly predicted as sitting and three samples of sitting activity are incorrectly predicted as walking.

Figure 5 depicts the loss vs. epoch curves for the CNN, LSTM, GRU, CNN-LSTM, and CNN-GRU models for healthy subjects, whereas Figure 6 depicts the corresponding training curves for abnormal knee subjects. These plots demonstrate that the value of the loss function decreases as the number of epochs increases until it approaches a steady state, which indicates that the overfitting problem is avoided. The proposed hybrid models reach a steady state by about the tenth epoch, significantly earlier than the other examined models.

Numerous approaches for recognizing lower limb activity have been presented using similar datasets. Table 5 presents a comparison assessment of the proposed model’s performance vs. prior findings, allowing us to conclude that the proposed CNN-GRU model performed well for recognizing lower limb activity in healthy and abnormal knee individuals.

5. Conclusion and Future Scope

This research proposes hybrid deep learning models, CNN-LSTM and CNN-GRU, to analyze sEMG data and detect lower limb activity in individuals with and without knee abnormalities, where the CNN is used for temporal learning and the LSTM or GRU is used for sequence learning. First, the authors applied discrete wavelet denoising to the original sEMG signal and then used an overlapping windowing approach for data segmentation to mitigate the issue of a small dataset. After that, the CNN, LSTM, GRU, CNN-LSTM, and CNN-GRU deep learning models were implemented. The proposed hybrid CNN-GRU model achieves classification accuracies of 99.86% and 98.69% and per-epoch computation times of 4.26 s and 4.58 s for healthy and abnormal knee subjects, respectively. The results were compared with those obtained by the individual models, with the hybrid approach proving superior.

The sEMG dataset used to evaluate the suggested approach is limited to 22 individuals. Thus, the strategy could be validated in the future on a larger real-time dataset. The suggested methodology was evaluated using an offline dataset; hence, future research could focus on clinical validation using real-time data.

Data Availability

The sEMG data used to support the findings of this study are publicly available from the UCI Machine Learning Repository [24, 31].

Conflicts of Interest

The authors declare that they have no conflicts of interest.