Abstract

Frailty and senility are syndromes that affect elderly people. The ageing process involves a decay of cognitive and motor functions which often has an impact on the quality of life of elderly people. Some studies have linked this deterioration of cognitive and motor function to gait patterns. Thus, gait analysis can be a powerful tool to assess frailty and senility syndromes. In this paper, we propose a vision-based gait analysis approach performed on a smartphone with cloud computing assistance. Gait sequences recorded by a smartphone camera are processed by the smartphone itself to obtain spatiotemporal features. These features are uploaded to the cloud, where they are analysed and compared against a stored database to support a diagnosis. The feature extraction method presented works with both frontal and sagittal gait sequences, although the sagittal view provides better classification, reaching an accuracy of 95%.

1. Introduction

This work is part of a project called Gait-A whose main objective is the early detection of frailty and senility syndromes using gait analysis. Physical activity is one of the main components involved in the evaluation of frailty syndrome [1, 2]. Gait has been identified as a highly cognitive task in which attention, planning, memory, and other cognitive processes are involved [3, 4].

Through gait analysis, that is, the quantification of measurable gait information and its interpretation [5], frailty and dementia syndromes can be diagnosed. This process is currently carried out by specialists and is based on estimations made through visual inspection of gait.

In this work, we propose a computer vision approach that could aid specialists by providing them with objective measurements of gait, thus increasing the objectivity of the analyses performed.

We propose the use of smartphone cameras to record the subject's gait, together with computer vision algorithms able to analyse those sequences and extract spatiotemporal gait parameters. These parameters are then sent to the cloud, where a classifier determines whether abnormalities are present.

Many works dealing with gait analysis using computer vision can be found in the literature. However, most of them focus on gait biometrics for human identification, and few address gait analysis for the detection of abnormalities.

The main goal of this study is to provide an inexpensive and easy-to-deploy solution to obtain the spatiotemporal parameters of gait, which are then fed to classification algorithms that discriminate between normal and abnormal gait. It should be noted that obtaining spatiotemporal parameters for abnormal gait compounds the task, as the number of assumptions that can be made about gait patterns is drastically reduced: neither cyclic patterns nor the presence of all gait phases can be assumed. In this work, for study purposes, Parkinsonian gait, knee pain, and foot dragging, among other patterns that deviate from what we consider normal gait, are taken as abnormal gait.

A set of different gait features is analysed in [6] for person identification. The process starts by extracting the silhouette with a background subtraction technique and then obtaining its contour. From the contour, four time-series features are extracted: width/height ratio, bounding box width, silhouette area, and center of gravity (COG). These four features follow a cyclic pattern that matches the gait cycle and are used to identify a person through deterministic learning.

Xu et al. examined the suitability of the Kinect sensor to measure gait parameters while walking on a treadmill in frontal view [7]. They compared the heel strike (HS) and toe off (TO) they obtained with those obtained using a motion tracking system. HS showed less error than TO because it happens closer to the sensor.

Choudhury and Tjahjadi [8] proposed a method composed of three modules: silhouette extraction, subject classification using Procrustes shape analysis (PSA) and elliptic Fourier descriptors (EFD), and combination of both results. For silhouette extraction, they use background subtraction and morphological operations to remove noise. The PSA module analyses a group of shapes by matching the geometrical locations of a silhouette, and the stride length is computed from the width of the bounding box. Finally, EFD is used to characterize the contour of the subject at key points of each gait phase.

Leu et al. proposed a method to extract skeleton joints from sagittal and frontal views [9]. Their method uses the horizontal and vertical projections of the silhouette pixels to obtain the neck joint and then applies an anatomical model to obtain the hips, knees, and ankles. Yoo and Nixon [10] also extract skeleton joints using an anatomical model to segment the silhouette, but they obtain the mean points of each segment and then apply linear regression to obtain a line that represents the bones; during the double support phase, they apply motion tracking to estimate the location of the occluded points. Khan et al. [11] similarly obtain the skeleton by computing the mean points of each body segment. They obtain leg movement and posture inclination and compare them with a normal gait model to recognise Parkinsonian gait.

In addition, we find the following proposals for classifying gait patterns. The method of Wang [12] is based on optical flow: a histogram of silhouette flows is computed, to which an eigenspace transformation is applied. The data obtained are compared with a normal gait template to calculate the deviation. Bauckhage et al. [13] apply homeomorphisms between 2D lattices and binary shapes to obtain a vector space in which the silhouette is encoded. They perform several splittings of the silhouette bounding box to obtain different lattices, which are then classified using a support vector machine (SVM).

Most vision-based gait analysis proposals use the sagittal view because it provides more information to work with; according to Whittle [14], more gait abnormalities can be observed from a sagittal view than from a frontal view. Nevertheless, we also undertake frontal gait analysis, for the following reasons:

(i) Some abnormalities can only be observed from a frontal point of view. Whittle [14] mentions that circumduction gait, hip hiking, abnormal foot contact, and rotation, among others, are better observed from a frontal view.

(ii) In terms of the physical space needed for recording, sagittal gait sequences require much more than frontal ones, for which a small hall or corridor suffices.

A way to reduce the space needed for sagittal recording is to use a treadmill, but a treadmill can alter gait patterns, especially with frail people. Another workaround is a motorised camera that follows the subject, but this is expensive and complicates background subtraction because the camera itself is moving. Both workarounds complicate the acquisition of gait sequences, making them difficult to process on a smartphone.

Sagittal images show a clear view of feet displacement and provide enough information to locate the heel and toe of each foot. In frontal view, on the other hand, it is not easy to determine where the heel and toe of each foot are located. Therefore, a different approach is required for frontal sequences.

In sagittal view, the size of the subject's silhouette remains constant along the whole trajectory. In frontal view, however, the size of the silhouette increases along the trajectory, so a normalization might be required.

The paper is organized as follows. Section 2 describes the sagittal and frontal methods to obtain spatiotemporal parameters of gait, their implementation in a smartphone, and the classification of normal and abnormal gait in a cloud platform. Section 3 shows the results in which the spatiotemporal gait parameters are subjected to normal and abnormal gait classification. Finally, Section 4 provides the conclusion of this work.

2. Methods

In this paper, we present a platform for gait analysis using computer vision where a smartphone records and processes a gait sequence to obtain spatiotemporal parameters to be sent to the cloud for a classification between normal and abnormal gait. The layout of the platform is shown in Figure 1. In the following subsections, each module of the platform will be described.

2.1. Sagittal Approach

The sagittal approach takes gait sequences recorded from the side as input. The method comprises four phases: preprocessing, feet location, feature extraction, and skeleton extraction. Figure 2 shows the diagram of the sagittal approach. The classification phase is performed in the cloud.

2.1.1. Preprocessing

In this phase, background subtraction based on a mixture of Gaussians [15] is performed to obtain the silhouette of the subject. After that, a morphology operator is applied to remove noise. Finally, the bounding box of the remaining silhouette $S$ is extracted by computing the extremal positions

$$x_{\min} = \min_{(x,y) \in S} x, \quad x_{\max} = \max_{(x,y) \in S} x, \quad y_{\min} = \min_{(x,y) \in S} y, \quad y_{\max} = \max_{(x,y) \in S} y, \tag{1}$$

which are then made to correspond to the rectangle

$$\{(x_{\min}, y_{\min}), (x_{\max}, y_{\min}), (x_{\max}, y_{\max}), (x_{\min}, y_{\max})\}. \tag{2}$$
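As an illustration, the following minimal sketch reproduces this phase with OpenCV, assuming its MOG2 implementation stands in for the mixture-of-Gaussians model of [15]; the history, kernel size, and shadow settings are illustrative assumptions, not the paper's values.

```python
import cv2
import numpy as np

subtractor = cv2.createBackgroundSubtractorMOG2(history=500, detectShadows=False)
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))

def extract_silhouette(frame):
    """Return the binary silhouette mask and its bounding box (x, y, w, h)."""
    mask = subtractor.apply(frame)                         # foreground mask
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)  # remove speckle noise
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return mask, None
    # (1): extremal pixel coordinates; (2): the rectangle they define
    x_min, x_max, y_min, y_max = xs.min(), xs.max(), ys.min(), ys.max()
    return mask, (x_min, y_min, x_max - x_min + 1, y_max - y_min + 1)
```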

2.1.2. Feet Location

The silhouette obtained by background subtraction is enclosed in its bounding box and split into four regions, namely, head (13% of the bounding box height), torso (34%), upper legs (24%), and lower legs (29%), according to an anthropometric model [16], as shown in Figure 3. The lower leg region is then brought into focus. We search for the silhouette pixel with maximum x component to obtain the toe of the front foot (FF),

$$\mathrm{toe}_{FF} = \operatorname*{arg\,max}_{(x,y) \in L} x, \tag{3}$$

and for the pixel with minimum x to obtain the heel of the back foot (BF),

$$\mathrm{heel}_{BF} = \operatorname*{arg\,min}_{(x,y) \in L} x, \tag{4}$$

where $L$ denotes the set of silhouette pixels in the lower leg region. Then, the lower leg region is split vertically into halves to separate the feet. In the BF half, we search for the lower right pixel (assuming displacement from left to right) to obtain the BF toe; in the FF half, we search for the lower left pixel to obtain the FF heel. The final result is shown in Figure 3.
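A sketch of this step follows, under two assumptions: image coordinates with y growing downward, and gait direction from left to right.

```python
import numpy as np

def locate_feet_sagittal(mask, bbox):
    """Toe/heel of the front foot (FF) and back foot (BF) of a silhouette."""
    x, y, w, h = bbox
    lower_top = y + int(0.71 * h)          # head 13% + torso 34% + upper legs 24%
    ys, xs = np.nonzero(mask[lower_top:y + h, x:x + w])
    xs, ys = xs + x, ys + lower_top        # back to image coordinates
    ff_toe = (xs.max(), ys[xs.argmax()])   # (3): maximum-x pixel
    bf_heel = (xs.min(), ys[xs.argmin()])  # (4): minimum-x pixel
    mid = x + w // 2                       # vertical split separating the feet
    bf, ff = xs < mid, xs >= mid
    i = np.lexsort((xs[bf], ys[bf]))[-1]   # lower-right pixel of the BF half
    bf_toe = (xs[bf][i], ys[bf][i])
    j = np.lexsort((-xs[ff], ys[ff]))[-1]  # lower-left pixel of the FF half
    ff_heel = (xs[ff][j], ys[ff][j])
    return ff_toe, ff_heel, bf_toe, bf_heel
```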

2.1.3. Feature Extraction

For each frame of the sequence, the positions of the heel and toe of both feet were obtained in the previous phase. To these time series, we apply gradient analysis of the x component of the mean point between each foot's heel and toe. A heel strike (HS) is detected when the gradient of the FF mean point goes from greater than zero to zero (the foot stops moving),

$$\nabla m_{FF}(t-1) > 0 \;\wedge\; \nabla m_{FF}(t) = 0, \tag{5}$$

and a toe off (TO) when the gradient of the BF mean point goes from zero to greater than zero (the foot starts moving),

$$\nabla m_{BF}(t-1) = 0 \;\wedge\; \nabla m_{BF}(t) > 0. \tag{6}$$

Applying the gradient directly over the position time series produces many false positives due to noise. To filter the noise, we apply a threshold below which any gradient value is set to zero; this removes the small oscillations caused by errors in extracting the silhouette and locating toes and heels. A Gaussian smoothing is then applied, and isolated values greater than zero or equal to zero (single frames that disagree with both neighbours) are removed:

$$\nabla m(t) \leftarrow \begin{cases} 0 & \text{if } \nabla m(t) > 0 \text{ and } \nabla m(t-1) = \nabla m(t+1) = 0,\\ \nabla m(t-1) & \text{if } \nabla m(t) = 0 \text{ and } \nabla m(t-1) > 0 \text{ and } \nabla m(t+1) > 0,\\ \nabla m(t) & \text{otherwise}. \end{cases} \tag{7}$$
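A minimal sketch of this event detection follows; the smoothing/threshold order is slightly simplified with respect to the text, and the threshold and sigma values are assumptions. HS events come from applying it to the FF trajectory and TO events from the BF trajectory.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def stop_start_events(mean_x, threshold=1.0, sigma=1.0):
    """Frames where a foot stops moving (HS on FF) or starts moving (TO on BF)."""
    g = gaussian_filter1d(np.abs(np.gradient(np.asarray(mean_x, float))), sigma)
    moving = g > threshold                 # suppress small oscillations (noise)
    # (7): remove isolated values, i.e. single frames disagreeing with both neighbours
    for t in range(1, len(moving) - 1):
        if moving[t - 1] == moving[t + 1] != moving[t]:
            moving[t] = moving[t - 1]
    stops = [t for t in range(1, len(moving)) if moving[t - 1] and not moving[t]]   # (5)
    starts = [t for t in range(1, len(moving)) if moving[t] and not moving[t - 1]]  # (6)
    return stops, starts
```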

2.1.4. Skeleton Extraction

The skeleton extraction phase provides a fast way of obtaining an approximation of the locations of the head, neck, hip, knees, and feet. It uses the four regions of the silhouette described in the feet location phase. The head and torso regions are divided in half horizontally, and the COG of each half is computed. The COG of the upper region is moved to the top, and the COG of the lower region is moved to the bottom. Then, the head lower COG and the torso upper COG are averaged to obtain a common point which is the neck. The head location corresponds to the upper COG of the head region.

The upper leg region is also split horizontally in half, and both COGs are obtained. In addition, a vertical split is performed, and another two COGs are obtained. The upper COG is moved to the top and averaged with the lower torso COG to obtain the hip location; the lower COG is discarded. The right and left COGs are then moved to the bottom, these two points being the locations of the knees. The knees are adjusted to simulate bending. The adjustment consists in tracing three circles: one with center at the hip and radius equal to the thigh length (the height of the upper leg segment) and two others with center at each foot and radius equal to the tibia length (the height of the lower leg segment). Then, the hip circle is intersected with each foot circle, with three possible outcomes (a sketch of this intersection test is given below):

(i) No intersection: the knee point is the one given by the COG.

(ii) One intersection: the knee point is the intersection point.

(iii) Two intersections: the knee point is the intersection point more to the right (assuming gait direction from left to right).
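The following sketch shows the circle-circle intersection used for the knee adjustment; the function name and arguments are illustrative.

```python
import math

def adjust_knee(hip, foot, thigh_len, tibia_len, knee_cog):
    """Intersect the hip circle (radius = thigh length) with a foot circle
    (radius = tibia length) and pick the knee point per the three cases."""
    (x0, y0), (x1, y1) = hip, foot
    d = math.hypot(x1 - x0, y1 - y0)
    # Case (i): the circles do not intersect -> keep the COG estimate
    if d == 0 or d > thigh_len + tibia_len or d < abs(thigh_len - tibia_len):
        return knee_cog
    a = (thigh_len**2 - tibia_len**2 + d**2) / (2 * d)  # hip-to-chord distance
    mx, my = x0 + a * (x1 - x0) / d, y0 + a * (y1 - y0) / d
    h2 = thigh_len**2 - a**2
    if h2 <= 0:
        return (mx, my)                     # case (ii): tangent circles, one point
    h = math.sqrt(h2)
    p1 = (mx + h * (y1 - y0) / d, my - h * (x1 - x0) / d)
    p2 = (mx - h * (y1 - y0) / d, my + h * (x1 - x0) / d)
    return p1 if p1[0] > p2[0] else p2      # case (iii): rightmost intersection
```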

Finally, the location of each foot is the mean point of the heel and toe obtained in the feet location phase. Figure 4 shows the final result.

2.2. Frontal Approach

The frontal approach is very similar to the sagittal one proposed in the previous subsection. It has the same phases: preprocessing, feet location, feature extraction, and skeleton detection. The diagram of the frontal gait approach is shown in Figure 5.

2.2.1. Preprocessing

This phase is exactly the same as in the sagittal approach. The silhouette is obtained using mixture of Gaussians background subtraction, and morphology operators are then applied to remove noise.

2.2.2. Feet Location

In frontal view, both toes are always visible but heels are constantly occluded, so heels cannot be properly located. Therefore, we can only rely on toe information.

To obtain the toes, we divide the silhouette into four regions according to the anthropometric model established in [16] and focus only on the lower leg segment. We then calculate its bounding box and split it vertically in half to separate the feet. It is important to recalculate the bounding box of this region so that the vertical split separates both feet accurately; otherwise, any misalignment can cause problems. Note that splitting the bounding box to separate the feet will never be accurate with gait patterns that place one foot in front of the other; we assume that this particular gait pattern is not present in our dataset. We obtain the left and right toes by locating the pixel with minimum y component in the left and right half, respectively,

$$\mathrm{toe}_{L} = \operatorname*{arg\,min}_{(x,y) \in H_L} y, \qquad \mathrm{toe}_{R} = \operatorname*{arg\,min}_{(x,y) \in H_R} y, \tag{8}$$

where $H_L$ and $H_R$ denote the left and right halves of the lower leg region (Figure 6).
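A sketch of this step follows. Equation (8) takes the minimum y assuming an axis pointing upwards from the floor; with image coordinates (y growing downward), the toe becomes the bottommost pixel of each half.

```python
import numpy as np

def locate_toes_frontal(mask, bbox):
    """Left and right toe of a frontal silhouette, following (8)."""
    x, y, w, h = bbox
    ys, xs = np.nonzero(mask[y + int(0.71 * h):y + h, x:x + w])
    # Recompute the bounding box of the lower-leg region before splitting
    mid = (xs.min() + xs.max()) // 2
    left, right = xs <= mid, xs > mid
    toe_l = (xs[left][ys[left].argmax()], ys[left].max())
    toe_r = (xs[right][ys[right].argmax()], ys[right].max())
    return toe_l, toe_r   # coordinates relative to the lower-leg crop
```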

2.2.3. Feature Extraction

The previous phase provides the position of each toe in each frame, which is precisely the information we need to derive HS and TO. We propose an approach to obtain HS and TO in frontal gait based on the time series derived by subtracting the vertical components of both toes.

We use the subtraction of the y components of the toes to obtain a curve whose zero crosses indicate the feet-adjacent gait phase. The HS and TO of each foot are located between consecutive zero crosses. Since HS occurs before TO, HS falls in the first half of each interval and TO in the second half. Denoting by $zc_i$ the frame of the $i$-th zero cross, we estimate HS and TO as

$$HS_i = zc_{i-1} + \frac{zc_i - zc_{i-1}}{4}, \tag{9}$$

$$TO_i = zc_{i-1} + \frac{3\,(zc_i - zc_{i-1})}{4}. \tag{10}$$
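A minimal sketch of this estimation follows, under the assumption stated above that HS sits at the midpoint of the first half of each zero-cross interval and TO at the midpoint of the second half.

```python
import numpy as np

def hs_to_from_zero_crosses(dy):
    """HS/TO frames from the toe-height difference curve dy."""
    dy = np.asarray(dy, dtype=float)
    zc = [t for t in range(1, len(dy)) if dy[t - 1] * dy[t] < 0]  # zero crosses
    hs = [p + (c - p) // 4 for p, c in zip(zc, zc[1:])]           # (9)
    to = [p + 3 * (c - p) // 4 for p, c in zip(zc, zc[1:])]       # (10)
    return hs, to
```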

This approach poses problems with some abnormal gait patterns, as shown in [17], in which some events cannot be detected, for example, when one foot always stays behind the other or is dragged due to injury or pain. Figure 7 shows foot dragging, where in some cases the curve does not cross zero during the swing phase. To solve this problem, we devised a second method. Using the same curve as before (the difference of the y components of the feet), we apply Gaussian filters to remove noise (Figure 8 shows the curve of Figure 7 after filtering) and then obtain the local maxima and minima, which are located roughly at the center of each pair of zero crosses. In this case, the curve does not have to cross zero to produce a maximum or minimum, and the problem is solved.

HS is located before a maximum or minimum, and TO after it; we know that both events fall in that region. Adjusting the offsets empirically, we place HS at a fixed fraction of the distance between a maximum (or minimum) and the previous one (12), and TO at the same fraction of the distance to the next one (13).

Let $M = \{m_1, m_2, \ldots, m_k\}$ be the ordered set of maxima and minima in ascending chronological order, and let $f(m_i)$ denote the frame at which $m_i$ occurs (11). The HS associated with $m_i$ is obtained as

$$HS(m_i) = f(m_i) - \frac{f(m_i) - f(m_{i-1})}{4}, \tag{12}$$

and its TO as

$$TO(m_i) = f(m_i) + \frac{f(m_{i+1}) - f(m_i)}{4}. \tag{13}$$
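A sketch of this extrema-based method follows; the smoothing value sigma is an assumption.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d
from scipy.signal import argrelextrema

def hs_to_from_extrema(dy, sigma=3.0):
    """HS/TO frames from the local extrema of the smoothed difference curve."""
    s = gaussian_filter1d(np.asarray(dy, dtype=float), sigma)
    m = np.sort(np.concatenate([argrelextrema(s, np.greater)[0],
                                argrelextrema(s, np.less)[0]]))     # (11)
    hs = [m[i] - (m[i] - m[i - 1]) // 4 for i in range(1, len(m))]  # (12)
    to = [m[i] + (m[i + 1] - m[i]) // 4 for i in range(len(m) - 1)] # (13)
    return hs, to
```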

2.2.4. Skeleton Detection

The process is the same as the one described for the sagittal approach, except that in the frontal approach the knee adjustment is not necessary.

2.3. Smartphone Implementation

The sagittal and frontal approaches were implemented on Android using OpenCV native functions. We allowed two ways of processing a sequence:

(i) On a real-time video: the smartphone camera records the subject walking and processes the sequence at the same time.

(ii) On a previously recorded video: the smartphone records the subject walking, stores the video in memory, and then processes it.

To achieve real-time processing, we use the pyramidal multiresolution approach described in [18]. We achieve 10 fps on a smartphone with a quad-core 1.4 GHz processor and 1 GB of memory, and 25 fps on a tablet with a quad-core Tegra K1 processor at 2.2 GHz and 2 GB of memory. The size of the input image was reduced to 480 × 270 pixels. However, the results shown in Section 3 were obtained by processing the dataset at full resolution.

2.4. Cloud Platform

To develop the cloud platform, we used the Microsoft Azure Machine Learning platform. This is a cloud platform for designing and developing predictive models. Azure provides a REST Web Service to access the Machine Learning tools.

For our purposes, we developed a K-nearest neighbour (KNN) algorithm with dynamic time warping (DTW) as the distance function, accessed through the REST web service provided by Azure. To classify between normal and abnormal gait, we use the stride time series (the bounding box width for the sagittal approach and the subtraction of the y components of the feet for the frontal approach) and the leg-angle time series (provided by the skeleton extraction algorithm and computed as the angle formed by the hip and each foot).
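The classifier logic can be sketched as follows; the value of k is an assumption (the paper does not state it), and in the actual platform this logic is hosted in Azure and invoked through its REST interface.

```python
import numpy as np

def dtw(a, b):
    """Dynamic time warping distance between two 1-D time series."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def knn_predict(query, train_series, train_labels, k=3):
    """Label 0 (normal) or 1 (abnormal) by majority vote of the k
    DTW-nearest training time series."""
    order = sorted(range(len(train_series)), key=lambda i: dtw(query, train_series[i]))
    votes = [train_labels[i] for i in order[:k]]
    return max(set(votes), key=votes.count)
```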

3. Results and Discussion

We will now describe the experiments performed and the results obtained. The dataset recorded for the experiments is also described in this section.

3.1. Dataset

To test the proposed approaches, we recorded two datasets of subjects walking: one using the sagittal view and the other using the frontal view. Both datasets were recorded in a room with a nonhomogeneous background, including windows whose light made it difficult to extract the silhouette. This was intentional: we wanted to test our approaches under real conditions, in which the silhouette is often incomplete. Figure 9 shows the room in which the recordings were performed.

To record the frontal dataset, we placed a camera at one end of an 8 m corridor and asked the subject to walk towards it. We captured a total of 23 samples of normal gait and 20 samples of abnormal gait. To record the sagittal dataset, we used the same environment but placed the camera at a distance of 4 m, perpendicular to the gait direction, to obtain a side view. In this case, a total of 15 samples of normal gait and 15 of abnormal gait were recorded. Although the number of recorded samples is low (43 for frontal gait and 30 for sagittal gait), they contain a total of 320 HS and 319 TO events for frontal gait and 233 HS and 223 TO events for sagittal gait.

We asked the subjects to walk normally along the corridor and then to walk feigning some of the following abnormalities:

(i) Knee pain: the subject simulated pain in one knee.

(ii) Foot dragging: the subject dragged one foot.

(iii) Parkinsonian gait: the subject took small steps at variable speed.

(iv) Other: the subject depicted random patterns.

To guarantee the privacy of the subjects, we published only the silhouettes extracted during the silhouette extraction phase. These silhouettes are stored as an ordered set of images, together with a file giving the elapsed milliseconds for each image. For each recorded sample, we manually marked the frames in which an HS or TO event occurs, to be used as ground truth. We also include the pixel width information needed to calculate distances, as well as the sample class (normal = 0 or abnormal = 1). In addition, a file with the output of the feet location and feature extraction phases is included, containing the positions of the heel and toe of each foot, their gradients, and the HS and TO events detected. These results are the output of the HS and TO detection algorithm at full resolution (1920 × 1080) and therefore do not correspond to those produced by the smartphone at a quarter of that resolution.

Both datasets are accessible through the URL provided in [19].

3.2. Experiments

We performed experiments using our own datasets for sagittal and frontal gait. We used the manual marking of the HS and TO events of each gait sequence of the dataset as ground truth. The error margin of this manual marking was set to ±1 frame because that is the minimum value. We also assumed an error of ±1 frame in the algorithm output, so the global error margin was set to ±2 frames. Then, the difference in frames between the ground truth and the proposed algorithm was analysed; any difference less than or equal to the global error margin was considered acceptable. The root mean square error (RMSE) of the differences was computed as

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( g_i - a_i \right)^2},$$

where $n$ is the number of events (HS or TO in this case), $g_i$ the frame of event $i$ in the manual marking, and $a_i$ the frame of event $i$ in the algorithm output.
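A sketch of this evaluation protocol follows; the greedy closest-match pairing of detections to ground-truth events is our assumption, not spelled out above.

```python
import numpy as np

def evaluate_events(ground_truth, detections, margin=2):
    """Count correct/wrong/undetected events and compute the RMSE in frames."""
    correct, wrong, diffs = 0, 0, []
    remaining = list(detections)
    for g in ground_truth:
        if not remaining:
            break
        a = min(remaining, key=lambda d: abs(d - g))  # closest detection
        remaining.remove(a)
        diffs.append(a - g)
        if abs(a - g) <= margin:
            correct += 1                              # within the +/-2-frame margin
        else:
            wrong += 1
    undetected = len(ground_truth) - correct - wrong
    rmse = float(np.sqrt(np.mean(np.square(diffs)))) if diffs else float("nan")
    return correct, wrong, undetected, rmse
```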

3.3. Sagittal Approach

Table 1 shows the results of the HS and TO detection algorithm with the filtering method described in the previous section for the sagittal view. The table reports the number of correct detections (at most 2 frames of difference between the algorithm and the manual marking), undetected cases, wrong detections (more than 2 frames of difference), and the root mean square error over both correct and wrong cases. As observed, the RMSE of both HS and TO events is lower than the error margin of 2 frames. TO events are delimited more accurately than HS events, but HS events show fewer undetected cases. Therefore, HS is the event we use to obtain the spatiotemporal parameters for classification. Figure 10 shows the correct, wrong, and undetected cases graphically.

3.4. Frontal Approach

Table 1 also shows the results of the frontal approach. As shown there, the RMSE of both HS and TO in normal gait is smaller than the error margin of 2 frames, and only slightly larger for abnormal gait; results are therefore acceptable for both normal and abnormal gait. The error is mainly produced in the first steps, when the silhouette is smallest (the subject is farthest from the camera). Figure 11 shows the results of Table 1 graphically.

For normal gait, the results of our sagittal approach are similar: 1.44 frames for HS and 1.08 for TO, slightly more precise than those of the frontal approach (1.88 and 1.63) but close to them. For abnormal gait, however, the sagittal approach obtained 1.79 frames for HS and 1.59 for TO, clearly more precise than the frontal approach (2.42 and 2.17).

3.5. Classification

To perform a classification between normal and abnormal gait, we use KNN to compare the stride length and leg-angle time series of the different gait cycles, applying DTW to calculate the distance between two time series. We performed the classification test with two different methods (a sketch of the second one is given below):

(i) Testing each gait cycle separately: the time series corresponding to each gait cycle is treated separately, as if it belonged to a different subject.

(ii) Testing each recording sample and outputting its mode class: a prediction is made for each gait cycle, and then a prediction for the whole recording sample is computed as the mode class of its cycles.
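The per-sample voting can be sketched as follows, reusing the hypothetical knn_predict function from the cloud-platform sketch above.

```python
from collections import Counter

def classify_sample(cycles, train_series, train_labels):
    """Classify each gait cycle of a recording, then output the mode class."""
    preds = [knn_predict(c, train_series, train_labels) for c in cycles]
    return Counter(preds).most_common(1)[0][0]   # mode class of the sample
```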

To validate the proposed classification, we use 10-fold and leave-one-out cross-validation to measure the accuracy of each classifier.

Table 2 shows the results for the stride and leg-angle time series in the sagittal approach. We obtained an accuracy of 100% using the leg-angle time series when outputting the mode class for each recording sample. The least accurate results, however, are those obtained from the stride width.

The results of the classification experiments for the frontal approach are shown in Table 3. As shown there, testing each recording sample produces better results, as it tends to eliminate outliers.

We have focused on classifying between normal and abnormal gait to assess the suitability of the proposed algorithm for differentiating between the two. For this test, we considered knee pain and foot dragging as abnormal gait. The results obtained suggest that the classifier can differentiate between normal and abnormal gait. Therefore, future work will focus on classifying different types of abnormal gait.

4. Conclusion

The main contribution of this paper is an inexpensive and easy-to-deploy approach to obtain HS and TO events and some skeleton joints from both sagittal and frontal gait sequences. The frontal view poses problems when obtaining the heel positions, so we focus on the toes instead. Results show acceptable precision in detecting HS and TO with both the sagittal and the frontal methods; comparing the two, the results were similar, but the sagittal approach proved more accurate. The dataset recorded to test the proposed approaches is publicly available [19]; to preserve the privacy of the subjects, only the silhouettes were published.

We also provide a cloud platform-based web service to perform a classification between normal and abnormal gait for both sagittal and frontal views. Results show a classification rate greater than 80% in frontal view and more than 90% in sagittal view.

The ability to perform gait analysis using the frontal view reduces the physical space required for the tests. In addition, the frontal method does not rely on silhouette displacement (the sagittal approach does), so it is also suitable for treadmill gait sequences. The space could therefore be reduced even further in cases where the alteration of gait patterns caused by the treadmill does not significantly matter.

Future work will focus on improving the accuracy of HS and TO for abnormal gait and classifying different abnormal gait types.

Conflicts of Interest

The authors declare no conflicts of interest.

Acknowledgments

This research is part of the FRASE MINECO project (TIN2013-47152-C3-2-R) funded by the Ministry of Economy and Competitiveness of Spain.