The Scientific World Journal
Volume 2014 (2014), Article ID 136018, 8 pages
Research Article

Gait Signal Analysis with Similarity Measure

1Department of Electrical and Electronic Engineering, Xi’an Jiaotong-Liverpool University, Suzhou 215123, China
2Department of Information Security, Tongmyong University, Sinseonno, Nam-gu, Busan 608-711, Republic of Korea

Received 25 April 2014; Accepted 10 June 2014; Published 7 July 2014

Academic Editor: T. O. Ting

Copyright © 2014 Sanghyuk Lee and Seungsoo Shin. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Human gait decision was carried out with the help of similarity measure design. Gait signals were acquired through a hardware implementation comprising an all-in-one sensor, a control unit, and a notebook with a connector. Each gait signal was treated as high dimensional data; therefore, high dimensional data analysis was approached with a heuristic technique, namely, the similarity measure. Human behavior patterns such as walking, sitting, standing, and stepping up were obtained through experiment. From the analysis, we identified the relation between overlapped and nonoverlapped data, illustrated the similarity measure analysis, and compared the result with a conventional similarity measure. The similarity analysis of nonoverlapped data provided the clue to solving the similarity of high dimensional data, and the high dimensional analysis was designed with consideration of neighborhood information. The proposed similarity measure was applied to identify the behavior patterns of different persons and the different behaviors of the same person. The analysis can be extended to organize a health monitoring system, especially for elderly persons.

1. Introduction

Analysis of the human gait signal has been studied steadily by numerous researchers [1–3]. Research on the gait signal applies to healthcare development, security systems, and other related areas. The research methodology is based on how to classify the signal and how to develop a pattern recognition algorithm for comparable data sets. Such an algorithm processes different signals from the same person and same-action signals from multiple persons. Developing a methodology for gait discrimination is a challenge: the human gait signal has high dimensional characteristics; hence, analysis and the design of an explicit classifying formula are needed [3].

Generally, human gait signals consist of walking, sitting, standing, stepping up and down, and other usual behaviors. Such behaviors occur in everyday home life, so they are quite easy to identify when we watch them in a real situation. To develop a larger-scale monitoring and healthcare system that analyzes and identifies the behavior signal of each person, a decision and classification system for the gait signal is required. More specifically, deciding whether a person is carrying out normal activities can be applied to the design of a healthcare system. Therefore, the research output can be applied to identification, healthcare, and other related fields. Additionally, more detailed checking, even for healthy people such as athletes, can indicate whether a person is suffering from some problem by comparison with his/her previous behavior.

To discriminate between different patterns, a rational measure obtained from a statistical or heuristic approach is needed. From the statistical point of view, signal autocorrelation and cross-correlation are useful because they express as a numeric value how strongly signals are related to each other. They are also convenient to calculate with conventional software such as Matlab toolboxes. However, it is not easy to construct high dimensional correlation/covariance matrices for high dimensional data. The heuristic approach needs preliminary processing of the signal, such as data redefinition and measure design based on human thinking. Even when a measure based on a heuristic idea is realized, an ordinary measure for discrimination, such as distance, has to be considered. Fortunately, similarity measure design for vague data has become an increasingly active research topic; numerous researchers have focused on the similarity measure and the entropy design problem for fuzzy sets and intuitionistic fuzzy sets [4–8].

The similarity measure [9–12] provides useful knowledge for clustering and pattern recognition of data sets [1–3]. However, most conventional results did not cover high dimensional data. The human gait signal has high dimensional characteristics, so a similarity measure design for high dimensional data is needed to deal with it. Similarity measure research has a rather long history; the square error clustering algorithm has been used since the late 1960s [14], and it was later modified to create the cluster program [15]. Naturally, the similarity measure topic has spread to many areas such as statistics [16], machine learning [17, 18], pattern recognition [19], and image processing. Extended research on high dimensional data can be applied to security applications, including fingerprint and iris identification, to image processing enhancement, and, recently, even to big data applications.

The distance between vectors can be organized by norms such as the 1-norm, the Euclidean norm, and so forth, and the similarity measure is also designed explicitly with the distance norm. The similarity measure design problem for high dimensional data needs a more careful approach. Conventionally, the similarity measure has been designed from the distance measure between two data sets; that is, the distance measure was regarded as the information distance between two membership functions. In similarity measure design with a distance measure, the measure structure should be related to the same support of the universe of discourse [14, 15]. Additionally, overlapped and nonoverlapped data must both be considered because many cases of high dimensional data have a nonoverlapped structure. With the conventional similarity measure, nonoverlapped data analysis is not possible; hence, a similarity measure for nonoverlapped data has to be designed. To design it, neighbor data information was considered: data are affected by adjacent information, and the similarity measure on nonoverlapped data was designed accordingly. In this paper, artificial data are used to compare the proposed measure with the conventional similarity measure, and the calculation results are illustrated.

In the following section, preliminary results on the similarity measure for overlapped and nonoverlapped data are introduced. The proposed similarity measure is proved and applied to overlapped and nonoverlapped artificial data. In Section 3, the gait signal acquisition system, with its sensors and data acquisition, is described, and gait signals for different behaviors are illustrated. A high dimensional similarity measure is proposed in Section 4, where similarity measure design for high dimensional data is also discussed by way of norm structure. Similarity calculation results for different behaviors and individuals are also shown. Finally, conclusions follow in Section 5.

2. Similarity Measure Based Distance Property

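In order to design a similarity measure explicitly, a usual measure such as the Hamming distance is commonly used as the distance measure between sets $A$ and $B$ as follows:
$$d(A,B) = \frac{1}{n}\sum_{i=1}^{n}\left|\mu_A(x_i) - \mu_B(x_i)\right|, \tag{1}$$
where $X = \{x_1, x_2, \ldots, x_n\}$, $\mu_A(x_i)$ and $\mu_B(x_i)$ are the fuzzy membership functions of fuzzy sets $A$ and $B$ at $x_i$, and $|k|$ is the absolute value of $k$. The membership function of $A \in F(X)$ is represented by $A = \{(x, \mu_A(x)) \mid x \in X,\ 0 \le \mu_A(x) \le 1\}$, $X$ is the total set, and $F(X)$ is the class of all fuzzy sets of $X$. The similarity measure is defined with the help of the distance measure [14]. There are numerous similarity measures satisfying the following definition.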
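As a minimal sketch of the Hamming distance described above, the following function averages the absolute differences of two membership vectors. The normalization by the number of points, the function name, and the sample membership values are assumptions made for illustration.

```python
from typing import Sequence


def hamming_distance(mu_a: Sequence[float], mu_b: Sequence[float]) -> float:
    """Normalized Hamming distance between two fuzzy sets.

    mu_a[i] and mu_b[i] are the membership degrees of A and B at x_i,
    each assumed to lie in [0, 1]. Dividing by n is an assumption.
    """
    if len(mu_a) != len(mu_b) or len(mu_a) == 0:
        raise ValueError("membership vectors must be non-empty and equal length")
    return sum(abs(a - b) for a, b in zip(mu_a, mu_b)) / len(mu_a)


# Hypothetical membership values on a four-point universe of discourse.
A = [0.0, 0.5, 1.0, 0.5]
B = [0.5, 0.5, 0.5, 0.5]
print(hamming_distance(A, B))
```

Because the memberships lie in [0, 1], the normalized distance also stays in [0, 1], which is convenient when a similarity is later formed from it.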

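Definition 1 (see [14]). A real function $s: F(X) \times F(X) \to R^{+}$ is called a similarity measure if $s$ has the following properties:
(S1) $s(A,B) = s(B,A)$, $\forall A, B \in F(X)$;
(S2) $s(D, D^{c}) = 0$, $\forall D \in P(X)$;
(S3) $s(C,C) = \max_{A,B \in F(X)} s(A,B)$, $\forall C \in F(X)$;
(S4) $\forall A, B, C \in F(X)$, if $A \subset B \subset C$, then $s(A,B) \ge s(A,C)$ and $s(B,C) \ge s(A,C)$;
where $R^{+} = [0, \infty)$, $P(X)$ is the class of ordinary sets of $X$, and $D^{c}$ is the complement set of $D$. By this definition, numerous similarity measures can be derived. In the following theorem, a similarity measure based on the distance measure is illustrated.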

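Theorem 2. For any sets $A, B \in F(X)$, if $d$ satisfies the Hamming distance measure, then
$$s(A,B) = d\big((A \cap B), [0]_X\big) + d\big((A \cup B), [1]_X\big) \tag{2}$$
is the similarity measure between sets $A$ and $B$.

Proof. The commutativity of (S1) is clear from (2) itself; that is, $s(A,B) = s(B,A)$.
For (S2), $s(D, D^{c}) = d\big((D \cap D^{c}), [0]_X\big) + d\big((D \cup D^{c}), [1]_X\big) = 0$ is obtained because $(D \cap D^{c}) = [0]_X$ and $(D \cup D^{c}) = [1]_X$, where $[0]_X$ and $[1]_X$ denote zero and one over the whole universe of discourse of $X$. Hence, (S2) is satisfied.
(S3) is also easy to prove as follows: $s(C,C) = d\big((C \cap C), [0]_X\big) + d\big((C \cup C), [1]_X\big) = d(C, [0]_X) + d(C, [1]_X) = 1$. It is natural that $s(C,C)$ attains the maximal value. Finally, for $A \subset B \subset C$, $d\big((A \cup B), [1]_X\big) \ge d\big((A \cup C), [1]_X\big)$ guarantees $s(A,B) \ge s(A,C)$, and $d\big((B \cap C), [0]_X\big) \ge d\big((A \cap C), [0]_X\big)$ also guarantees $s(B,C) \ge s(A,C)$; therefore, the inequalities are obvious by the definition, and hence (S4) is also satisfied.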
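The properties (S1) to (S4) can be spot-checked numerically. The sketch below assumes the normalized Hamming distance and a similarity of the form s(A, B) = d(A ∩ B, [0]_X) + d(A ∪ B, [1]_X), with min and max taken as the fuzzy intersection and union. The function names and membership values are illustrative and are not taken from the paper.

```python
from typing import Sequence


def hamming(mu_a: Sequence[float], mu_b: Sequence[float]) -> float:
    """Normalized Hamming distance between two membership vectors."""
    return sum(abs(a - b) for a, b in zip(mu_a, mu_b)) / len(mu_a)


def similarity(mu_a: Sequence[float], mu_b: Sequence[float]) -> float:
    """Assumed form: s(A, B) = d(A n B, [0]_X) + d(A u B, [1]_X)."""
    n = len(mu_a)
    intersection = [min(a, b) for a, b in zip(mu_a, mu_b)]  # fuzzy AND
    union = [max(a, b) for a, b in zip(mu_a, mu_b)]         # fuzzy OR
    return hamming(intersection, [0.0] * n) + hamming(union, [1.0] * n)


# Illustrative nested fuzzy sets A < B < C and a crisp set D with its complement.
A = [0.0, 0.25, 0.5, 0.25]
B = [0.25, 0.5, 0.75, 0.5]
C = [0.5, 0.75, 1.0, 0.75]
D = [1.0, 0.0, 1.0, 0.0]
D_c = [1.0 - v for v in D]

print(similarity(A, B) == similarity(B, A))   # (S1) symmetry
print(similarity(D, D_c))                     # (S2) crisp set versus complement
print(similarity(C, C))                       # (S3) self-similarity
print(similarity(A, B) >= similarity(A, C))   # (S4) first inequality
print(similarity(B, C) >= similarity(A, C))   # (S4) second inequality
```

Under these assumptions the measure reduces pointwise to 1 - |a - b|, since min(a, b) + 1 - max(a, b) = 1 - |a - b|, so it equals one minus the normalized Hamming distance between the two sets.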

Besides Theorem 2, numerous similarity measures are possible. Another similarity measure is illustrated in Theorem 3, and its proof is also found in the previous result [15, 16].

Theorem 3. For any set , if satisfies Hamming distance measure, then are the similarity measure between sets and .

Proof. The proofs are straightforward and can be found in previous results [15, 16].

Besides the similarity measures of (7) to (9), other similarity measures are also illustrated in previous results [15–17]. With the similarity measures in Theorems 2 and 3, it is only possible to compute the similarity for overlapped data (Figure 1). The data distributions of diamonds and circles in Figure 2 are nonoverlapped, and a similarity measure appropriate for such data needs to be designed, because the similarity measures of (7) to (9) cannot provide an appropriate solution for a nonoverlapped distribution. Two data pairs that constitute different distributions are considered in Figure 2: twelve data points, six diamonds and six circles, are arranged in different combinations in Figures 2(a) and 2(b). The degree of similarity between circles and diamonds must differ between Figures 2(a) and 2(b) because the distributions differ. For example, (7) represents

Figure 1: Overlapped data distribution.
Figure 2: (a) Data distribution between circle and diamond. (b) Data distribution between circle and diamond.

From (7), provides distance between and . By the following definitions: Nonoverlapped data satisfies , and is defined as or . Hence, shows similarity measure, where is the total number of data sets and . From this property, and the same result is obtained for Figures 2(a) and 2(b). Hence, similarity measures (2) to (9) are not suitable for nonoverlapped data distributions. The two data sets in Figure 2(a) are less distinguishable from each other than those in Figure 2(b), which means that the similarity of Figure 2(a) should be higher than that of Figure 2(b). Similar results are also obtained by the calculation of similarity measures (8) and (9).
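The position independence of a distance-based measure on nonoverlapped data can be illustrated with a small experiment. The sketch below assumes a conventional measure of the form s(A, B) = d(A ∩ B, [0]_X) + d(A ∪ B, [1]_X) with the normalized Hamming distance. The two layouts and the membership heights are hypothetical and are not the coordinates of Figure 2.

```python
from typing import Iterable, List, Sequence


def hamming(mu_a: Sequence[float], mu_b: Sequence[float]) -> float:
    """Normalized Hamming distance between two membership vectors."""
    return sum(abs(a - b) for a, b in zip(mu_a, mu_b)) / len(mu_a)


def conventional_similarity(mu_a: Sequence[float], mu_b: Sequence[float]) -> float:
    """Assumed conventional form: d(A n B, [0]_X) + d(A u B, [1]_X)."""
    n = len(mu_a)
    inter = [min(a, b) for a, b in zip(mu_a, mu_b)]
    union = [max(a, b) for a, b in zip(mu_a, mu_b)]
    return hamming(inter, [0.0] * n) + hamming(union, [1.0] * n)


def place(positions: Iterable[int], heights: Sequence[float], n: int) -> List[float]:
    """Membership vector that is zero except at the given grid positions."""
    mu = [0.0] * n
    for p, h in zip(positions, heights):
        mu[p] = h
    return mu


n = 12
heights = [0.25, 0.5, 0.75, 1.0, 0.75, 0.5]  # hypothetical membership degrees

# Layout 1: the two classes alternate along the grid (never on the same point).
diamonds_1 = place(range(0, 12, 2), heights, n)
circles_1 = place(range(1, 12, 2), heights, n)

# Layout 2: the two classes occupy separate halves of the grid.
diamonds_2 = place(range(0, 6), heights, n)
circles_2 = place(range(6, 12), heights, n)

s_1 = conventional_similarity(diamonds_1, circles_1)
s_2 = conventional_similarity(diamonds_2, circles_2)
print(s_1, s_2)  # identical values although the layouts differ
```

Because the two classes never share a point, the intersection is empty and the union only collects the membership heights, so the arrangement of the points has no influence on the value.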

Hence, it is required to design similarity measure for nonoverlapping data distribution. Consider the following similarity measure for nonoverlapped data such as Figures 2(a) and 2(b).

Theorem 4. For singletons or discrete data , if satisfied Hamming distance measure, then is a similarity measure between singletons and . In (13), and satisfy and , respectively. Where is whole data distribution including and .

Proof. (S1) and (S2) are clear. (S3) is also clear from definition as follows: Finally, (S4) , if , then because is satisfied. Similarly, is also satisfied.

Similarity measure (13) is also designed with the distance measure such as Hamming distance. As noted before, conventional measures were not proper for nonoverlapping continuous data distributions; this property is verified by the similarity measure calculation of Figures 2(a) and 2(b).

Next, calculate the similarity measure between circle and diamond with (13).

For Figure 2(a), is satisfied.

For the calculation of Figure 2(b), The calculation results show that the proposed similarity measure can evaluate the degree of similarity for nonoverlapped distributions. Comparing the two panels of Figure 2, the diamond and circle distributions in Figure 2(a) are the more similar pair.
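To make the role of neighbor information concrete, the following sketch scores two nonoverlapped classes by how close each point lies to its nearest neighbor of the other class. This is a hypothetical stand-in chosen for illustration; it is not the formula of Theorem 4, and the one-dimensional positions are invented rather than read from Figure 2.

```python
from typing import List


def mean_nearest_distance(src: List[float], dst: List[float]) -> float:
    """Average distance from each point in src to its nearest point in dst."""
    return sum(min(abs(p - q) for q in dst) for p in src) / len(src)


def neighbor_similarity(pos_a: List[float], pos_b: List[float]) -> float:
    """Hypothetical neighbor-based similarity in [0, 1] for 1-D positions."""
    everything = pos_a + pos_b
    span = max(everything) - min(everything)
    if span == 0:
        return 1.0  # all points coincide
    # Symmetrize by averaging the two directed nearest-neighbor gaps.
    gap = 0.5 * (mean_nearest_distance(pos_a, pos_b)
                 + mean_nearest_distance(pos_b, pos_a))
    return 1.0 - gap / span


# Invented layouts: interleaved classes versus clearly separated classes.
mixed_diamonds = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]
mixed_circles = [1.0, 3.0, 5.0, 7.0, 9.0, 11.0]
split_diamonds = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
split_circles = [6.0, 7.0, 8.0, 9.0, 10.0, 11.0]

s_mixed = neighbor_similarity(mixed_diamonds, mixed_circles)
s_split = neighbor_similarity(split_diamonds, split_circles)
print(s_mixed > s_split)  # the interleaved layout scores as more similar
```

Unlike a measure built only from pointwise membership differences, a score of this kind changes when the points are rearranged, which is the behavior the nonoverlapped case requires.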

3. Human Behavior Signal Analysis and Experiments

Gait signals are collected with an experimental unit; the acquisition system (Figure 3) is composed of an all-in-one sensor, in which an accelerometer, a magnetic sensor, and a gyro sensor are integrated, a mobile station, and a connector. The signal acquisition experiment was carried out as shown in Figure 3.

Figure 3: Data acquisition experiment.

Gait patterns consist of walking, step up, and step down for 20 persons. For each behavior, signals are measured with the all-in-one sensor, which integrates three sensors (accelerometer, magnetic sensor, and gyro sensor); each sensor provides signals in three directions. Four all-in-one sensors are attached to the waist, the two legs, and the head. Examples of the obtained gait signals are illustrated in Figure 4. Among numerous cases, walking and stair up signals are shown for the acceleration sensor in Figures 4(a) and 4(b), the magnetic sensor in Figures 4(c) and 4(d), and the gyro sensor in Figures 4(e) and 4(f), respectively. The full signal is illustrated in Figure 4(f) for stair up with the gyro sensor; 12 directional signals can be seen, and they show almost the same pattern for similar gaits. Hence, direction signals are considered in each figure. Because the signal patterns are almost the same and numerous, we collect two directional signals. From the top, the first two signals are those at the head, and the next ones are the waist, left leg, and right leg signals, respectively. We also carried out preprocessing to synchronize the signals; the resulting gait signals are those illustrated in Figure 4.
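The channel structure described above (four attachment points, three sensor types, three directions) can be sketched as a keyed container. The key names, the axis labels, and the sample count are assumptions for illustration; the paper does not specify a storage format.

```python
from itertools import product
from typing import Dict, List, Tuple

# Hypothetical labels for the attachment points, sensor types, and directions.
LOCATIONS = ("head", "waist", "left_leg", "right_leg")
MODALITIES = ("accelerometer", "magnetometer", "gyroscope")
AXES = ("x", "y", "z")

Channel = Tuple[str, str, str]


def empty_recording(num_samples: int) -> Dict[Channel, List[float]]:
    """One zero-filled time series per (location, modality, axis) channel."""
    return {key: [0.0] * num_samples
            for key in product(LOCATIONS, MODALITIES, AXES)}


recording = empty_recording(num_samples=100)
gyro_channels = [key for key in recording if key[1] == "gyroscope"]
print(len(recording), len(gyro_channels))
```

With this layout, one sensor type contributes 4 x 3 = 12 channels, matching the 12 directional signals mentioned for the gyro sensor.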

Figure 4: Gait signal with all in one sensor.

The signals are obtained from the control unit and processed on a notebook computer. The peak value and the magnitude distance between gait signals were considered as signal characteristics. Next, by applying the similarity measure, we obtain the calculation for each action, such as walking, step up, and so on.

4. Numerical Decision Calculation

4.1. High Dimensional Analysis

Research on big data analysis has been emphasized in recent research outcomes [7, 8, 13, 19]. Big data examples are illustrated as follows.
(i) Biomedical data such as DNA sequences or electroencephalography (EEG) data. Such data have not only high dimension but also a large number of channels.
(ii) Recommendation systems and target marketing are important applications in the e-commerce area. Analysis of sets of customer/client information helps to predict purchasing actions based on customers’ interests. It also involves a huge amount of data and a high dimensional structure.
(iii) Industry applications such as the EV station scheduling problem need geometrical information, city size, population, traffic flow, and others. Hence, the number of stations and the station sizes constitute huge, high dimensional data.
Data might be expressed as high dimensional structure such as where and denote the number of data and dimension, respectively.

Direct data comparison is applicable to overlapped data with norm definition including Euclidean norm such as Information distributions show various configurations; hence, various types of distance measure need to be considered to complete a discriminative measure. Furthermore, the analysis of similarity and relation between different information should be carried out carefully when the data are high dimensional. Specifically, the dimension represents the number of independent characteristics or attributes.
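A direct comparison of two overlapped data sets under the Euclidean norm can be sketched as follows. The row-by-attribute layout, the function names, and the toy values are assumptions; the per-attribute variant is included only to show how each dimension can be inspected separately.

```python
import math
from typing import List, Sequence


def euclidean_distance(x: Sequence[Sequence[float]],
                       y: Sequence[Sequence[float]]) -> float:
    """Euclidean norm of the elementwise difference of two N x M data sets."""
    return math.sqrt(sum((a - b) ** 2
                         for row_x, row_y in zip(x, y)
                         for a, b in zip(row_x, row_y)))


def per_attribute_distance(x: Sequence[Sequence[float]],
                           y: Sequence[Sequence[float]]) -> List[float]:
    """Euclidean distance computed separately for each of the M attributes."""
    num_attributes = len(x[0])
    return [math.sqrt(sum((rx[j] - ry[j]) ** 2 for rx, ry in zip(x, y)))
            for j in range(num_attributes)]


# Toy data: N = 2 samples, M = 2 attributes.
X = [[0.0, 0.0], [3.0, 0.0]]
Y = [[0.0, 4.0], [0.0, 0.0]]
print(euclidean_distance(X, Y))
print(per_attribute_distance(X, Y))
```

The per-attribute view keeps the contribution of each dimension visible, whereas the single norm collapses all attributes into one number.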

4.2. Illustrative Example

Hence, analysis and comparison with each attribute provide explicit importance of each data. Similarity measure provides analysis between patterns, such as And comparison between different patterns of the same person is carried out between walking, step up, and step down. Gait signals related to each person's movement were gathered to analyze different patterns and different persons: walking, stair up, and stair down signals were gathered from the gaits of 20 persons. Hence, personal information can be represented by multidimensional information such as And among the 20 personal data, 3 different behaviors were also expressed. Considering the data, it is obvious that the data are overlapped. Hence, it is clear that the similarity measures of (2) and (7)–(9) provide similarity calculation for overlapped data.

Similarity measure between person to person is expressed as Also different action from the same person as follows:

Normalized similarity calculation results are illustrated in Table 1.

Table 1: Similarity measure comparison between patterns.

The results illustrate that stair up and stair down show higher similarity than the other pairs. However, even though this similarity is higher than the others, it is not close to one; it only reaches 0.55. Because the directions differ, stair up and stair down have an inherent limit on how close their similarity can come to the maximum.

Table 2 shows the average similarity between different individuals. The results show that stair up is the most similar across individuals, even though each person has a different gait. Naturally, the walking pattern is the least similar. In Table 3, the walking similarity between different individuals is illustrated; the results for the 20 persons are symmetric.

Table 2: Average similarity measure between individuals.
Table 3: Similarity measure comparison between individuals (walking).

Similar results for stair up and down are obtained.
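A pairwise similarity table between individuals can be sketched as below. The sketch assumes equal-length, synchronized signals, min-max normalization to [0, 1], and the simple choice s = 1 - d with the normalized Hamming distance; the toy signals are invented and do not reproduce the values of Tables 1 to 3.

```python
from typing import List, Sequence


def min_max_normalize(signal: Sequence[float]) -> List[float]:
    """Rescale a signal to [0, 1]; a constant signal maps to all zeros."""
    lo, hi = min(signal), max(signal)
    if hi == lo:
        return [0.0] * len(signal)
    return [(v - lo) / (hi - lo) for v in signal]


def similarity(sig_a: Sequence[float], sig_b: Sequence[float]) -> float:
    """Assumed measure: one minus the normalized Hamming distance."""
    a, b = min_max_normalize(sig_a), min_max_normalize(sig_b)
    return 1.0 - sum(abs(p - q) for p, q in zip(a, b)) / len(a)


def similarity_matrix(signals: Sequence[Sequence[float]]) -> List[List[float]]:
    """Symmetric table of pairwise similarities between all signals."""
    n = len(signals)
    return [[similarity(signals[i], signals[j]) for j in range(n)]
            for i in range(n)]


# Invented signals standing in for one channel from three persons.
persons = [
    [0.0, 1.0, 0.0, -1.0, 0.0, 1.0],
    [0.0, 2.0, 0.5, -2.0, 0.0, 1.5],
    [1.0, 0.0, -1.0, 0.0, 1.0, 0.0],
]
table = similarity_matrix(persons)
for row in table:
    print([round(v, 3) for v in row])
```

The table is symmetric with ones on the diagonal, so only the upper triangle needs to be reported for a group of persons.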

5. Conclusions

Gait signal identification was carried out through similarity measure design. Gait signals were obtained via a data acquisition system comprising a mobile station and all-in-one sensors attached to the head, waist, and two legs. To discriminate gait signals with respect to different behaviors and individuals, a similarity measure based on the distance measure was designed. Both overlapped and nonoverlapped data distributions were considered, and the similarity measure was applied to each. The conventional similarity measure was shown to be unable to calculate the similarity of nonoverlapped data. To overcome this drawback, the similarity measure was designed with neighbor data information: closeness between neighboring data provides a measure of similarity among data sets. The calculation was carried out on two different artificial data sets, and the proposed similarity measure proved useful for distinguishing nonoverlapped data distributions. It is meaningful that the similarity measure design can be extended to high dimensional data processing, because the gait signal was treated as high dimensional data. With the data acquisition system, gait signals of 20 persons were collected through experiments. Walking, stair up, and stair down signals were obtained, and the similarity measure was applied. By calculation, the similarity between stair up and stair down was higher than that of the other pairs. The similarity between individuals for each gait signal was also obtained.

Gait signal analysis can be used for developing behavior decision systems; it extends naturally to healthcare systems, especially for elderly people. Additionally, it can provide useful information to athletes when their actions differ from previous behavior.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.


This work was supported in part by a Grant from the Research Development Fund of XJTLU (RDF 11-02-03).


  1. D. H. Fisher, “Knowledge acquisition via incremental conceptual clustering,” Machine Learning, vol. 2, no. 2, pp. 139–172, 1987.
  2. A. K. Jain and R. C. Dubes, Algorithms for Clustering Data, Prentice Hall, 1988.
  3. F. Murtagh, “A survey of recent advances in hierarchical clustering algorithms,” Computer Journal, vol. 26, no. 4, pp. 354–359, 1983.
  4. R. S. Michalski and R. E. Stepp, “Learning from observation: conceptual clustering,” in Machine Learning: An Artificial Intelligence Approach, pp. 331–363, Springer, Berlin, Germany, 1983.
  5. H. P. Friedman and J. Rubin, “On some invariant criteria for grouping data,” Journal of the American Statistical Association, vol. 62, no. 320, pp. 1159–1178, 1967.
  6. K. Fukunaga, Introduction to Statistical Pattern Recognition, Academic Press, New York, NY, USA, 1990.
  7. “Advancing Discovery in Science and Engineering. Computing Community Consortium,” Spring 2011.
  8. Advancing Personalized Education, Computing Community Consortium, 2011.
  9. S. Lee and T. O. Ting, “The evaluation of data uncertainty and entropy analysis for multiple events,” in Advances in Swarm Intelligence, pp. 175–182, Springer, New York, NY, USA, 2012.
  10. S. Park, S. Lee, S. Lee, and T. O. Ting, “Design similarity measure and application to fault detection of lateral directional mode flight system,” in Advances in Swarm Intelligence, vol. 7332 of Lecture Notes in Computer Science, pp. 183–191, Springer, New York, NY, USA, 2012.
  11. S. Lee and T. O. Ting, “Uncertainty evaluation via fuzzy entropy for multiple facts,” International Journal of Electronic Commerce, vol. 4, no. 2, pp. 345–354, 2013.
  12. S. Lee, W. He, and T. O. Ting, “Study on similarity measure for overlapped and non-overlapped data,” in Proceedings of the International Conference on Information Science and Technology (ICIST '13), pp. 48–53, March 2013.
  13. “Smart Health and Wellbeing,” Computing Community Consortium, 2011.
  14. X. C. Liu, “Entropy, distance measure and similarity measure of fuzzy sets and their relations,” Fuzzy Sets and Systems, vol. 52, no. 3, pp. 305–318, 1992.
  15. S. Lee, W. Pedrycz, and G. Sohn, “Design of similarity and dissimilarity measures for fuzzy sets on the basis of distance measure,” International Journal of Fuzzy Systems, vol. 11, no. 2, pp. 67–72, 2009.
  16. S. Lee, K. H. Ryu, and G. Sohn, “Study on entropy and similarity measure for fuzzy set,” IEICE Transactions on Information and Systems, vol. 92, no. 9, pp. 1783–1786, 2009.
  17. S. H. Lee, S. J. Kim, and N. Y. Jang, “Design of fuzzy entropy for non convex membership function,” in Advanced Intelligent Computing Theories and Applications. With Aspects of Contemporary Intelligent Computing Techniques, vol. 15 of Communications in Computer and Information Science, pp. 55–60, 2008.
  18. Y. Cheng and G. Church, “Biclustering of expression data,” in Proceedings of the 8th International Conference on Intelligent System for Molecular Biology, 2000.
  19. D. M. West, Big Data for Education: Data Mining, Data Analytics, and Web Dashboards, Governance Studies at Brookings, Washington, DC, USA, 2012.