Bineet Kaur, Garima Joshi, "Lower Order Krawtchouk Moment-Based Feature-Set for Hand Gesture Recognition", Advances in Human-Computer Interaction, vol. 2016, Article ID 6727806, 10 pages, 2016. https://doi.org/10.1155/2016/6727806
Lower Order Krawtchouk Moment-Based Feature-Set for Hand Gesture Recognition
The capability of lower order Krawtchouk moment-based shape features is analyzed. The behaviour of 1D and 2D Krawtchouk polynomials at lower orders is observed by varying the Region of Interest (ROI). The paper measures the shape recognition capability of 2D Krawtchouk features at lower orders on Jochen-Triesch’s database and on a hand gesture database of 10 Indian Sign Language (ISL) alphabets. The original and reduced feature-sets are also compared. Experimental results demonstrate that the reduced feature-set gives accuracy comparable to that of the original feature-set for all the classifiers considered. Thus, the Krawtchouk moment-based features prove to be effective in terms of shape recognition capability at lower orders.
A gesture recognition system identifies gestures using shape features. Such systems are applied in video surveillance, 3D animation, sign language interpretation, and Human Computer Interaction (HCI). Sign language is a nonverbal way of communication among the deaf through hand gestures. In sign language, different signs are created by the combination of hand movements and facial expressions. Sign language has its own grammar and syntax, which differ from the grammar used in spoken or written language. Every country has its own sign language with its own phonetics. Some of the sign languages are American Sign Language (ASL), British Sign Language (BSL), Japanese Sign Language (JSL), and Korean Sign Language (KSL). Indian Sign Language (ISL) has a well-structured phonology, morphology, syntax, and grammar. ISL conveys information through movements of hands, arms, face, and head. It can produce an isolated sign (a single hand gesture) or a continuous sign (a series of hand movements). The aim is to create an application which understands ISL alphabets by interpreting hand gestures and presenting them in textual form on the computer screen. This will make interaction with deaf people easy without the need for an interpreter, thereby providing an HCI system for ISL. The block diagram of the ISL recognition system is illustrated in Figure 1.
For a gesture recognition system to be effective, features play an important role in extracting information about the image. Moments are widely used shape descriptors because they are capable of extracting local as well as global information. They have been used in various applications like pattern recognition [1], face recognition [2, 3], image watermarking [4], medical image analysis [5], and gesture recognition [6, 7]. Hu was the first to propose moment invariants [8]: seven moments that are invariant to scale, rotation, and translation. Since then, a lot of research has been done on their applicability in various domains. Subsequently, Teague introduced the continuous orthogonal moments, Zernike and Legendre [9]. Being orthogonal, they have minimum information redundancy, so image reconstruction is easier than with Hu moments, which are nonorthogonal. However, these moments require a transformation of the image coordinate space. Moreover, the numerical approximation of the continuous moment integrals results in discretization error, which reduces the recognition accuracy. Nevertheless, Khotanzad and Hong used Zernike moments for pattern recognition because of their rotational invariance property [10]. To rectify the problems encountered in continuous orthogonal moments, discrete orthogonal moments like Tchebichef and Krawtchouk were proposed.
Yap et al. analyzed Krawtchouk moments for the first time on 2D images [11]. One benefit of Krawtchouk moments is that they are defined in image coordinate space and hence do not have a discretization error. Another benefit is that they not only extract global information like Zernike moments but can also represent local information from any Region of Interest (ROI) in an image. Wang et al. compared Krawtchouk (till 4th order, with ROI at centre), Legendre, and Zernike moments for handwritten Chinese character recognition [12]. It was concluded that Krawtchouk moments outperformed the other moments in terms of recognition accuracy. Krawtchouk moments have also proved to be good feature descriptors in face recognition systems. Noraini analyzed the performance of Krawtchouk and Tchebichef moments on a database of facial expressions [2] and found that Krawtchouk moments gave better accuracy. In the context of hand gestures, Priyal and Bora evaluated the performance of Krawtchouk moments (till 80th order, with ROI at centre) on 10 single-handed gesture signs of digits “0” to “9,” taken from 23 users, with 423 samples for each gesture [7]. The gestures were taken at different angles, scales, and orientations. It was concluded that Krawtchouk moments were more user-invariant than Geometric and Zernike moments. The paper focuses on the following objectives:
(1) A Krawtchouk moment-based feature-set till 3rd order is extracted. To include local shape characteristics, the feature vector is enlarged by varying the ROI.
(2) The performance of the proposed features is analysed for different classifiers.
(3) The recognition accuracy of the original feature-set is compared with that of the reduced feature-set. An optimal feature-set and classifier combination that gives good recognition accuracy is proposed.
(4) For comparative analysis, the performance of the proposed feature vector is also analysed on the standard Jochen-Triesch’s database.
The paper is organized as follows: Section 2 introduces the classical Krawtchouk polynomials, Krawtchouk invariant moments for feature extraction, and the spatial behaviour of 1D and 2D Krawtchouk polynomials. Section 3 presents the proposed methodology, including the databases, the feature extraction technique, the feature selection algorithm, and the classifiers used in this work. Section 4 discusses the results and compares the original and reduced feature-sets. Section 5 gives the conclusion and scope of future work.
2. Krawtchouk Moment-Based Features
Krawtchouk moments are derived from Krawtchouk polynomials. These are discrete in nature and are associated with binomial functions [13]. This section gives an introduction to the classical Krawtchouk polynomials and weighted Krawtchouk polynomials. The spatial behaviour of 1D and 2D Krawtchouk polynomials with varying ROI is also presented.
2.1. Krawtchouk Polynomials
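The 1D weighted Krawtchouk polynomials of $n$th order are defined as

$$\bar{K}_n(x; p, N) = K_n(x; p, N)\sqrt{\frac{w(x; p, N)}{\rho(n; p, N)}}, \tag{1}$$

where $x, n = 0, 1, \ldots, N$. The parameter $p$ can be used to vary the position of the peak. In 1D it can be varied between 0 and 1. In the classical Krawtchouk polynomials, increasing the order results in numerical fluctuations. So, weighted Krawtchouk polynomials are used to achieve numerical stability. The Krawtchouk polynomials are defined as

$$K_n(x; p, N) = {}_2F_1\!\left(-n, -x; -N; \frac{1}{p}\right) = \sum_{k=0}^{n} \frac{(-n)_k\,(-x)_k}{(-N)_k}\,\frac{(1/p)^k}{k!}, \tag{2}$$

where $(a)_k$ is the Pochhammer symbol, which is given by

$$(a)_k = a(a+1)\cdots(a+k-1), \qquad (a)_0 = 1. \tag{3}$$

The weight functions used in (1) are given by

$$w(x; p, N) = \binom{N}{x} p^x (1-p)^{N-x}, \qquad \rho(n; p, N) = (-1)^n \left(\frac{1-p}{p}\right)^n \frac{n!}{(-N)_n}. \tag{4}$$

In weighted Krawtchouk polynomials, the number of zero crossings increases with the order. Figure 2 shows the 1D weighted Krawtchouk polynomial plots at 0th, 1st, 2nd, and 3rd order. In these plots, the ROI is varied by setting the parameter $p$ to 0.3, 0.5, and 0.7, respectively. The parameter $p$ deviates from 0.5 by $\Delta p$. For example, if $p = 0.5$, the ROI lies at the centre of the image. For $p = 0.5 + \Delta p$, the ROI is shifted towards the positive $x$-direction and, with $p = 0.5 - \Delta p$, it is shifted to the negative $x$-direction. It can be observed that the values of the weighted Krawtchouk polynomials lie within the range $[-1, 1]$.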
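The behaviour described above can be checked numerically. The following is a minimal Python sketch, assuming the standard definitions: the classical polynomial as a terminating hypergeometric series, a binomial weight, and the squared norm $\rho$. The function names and the choice $N = 29$ (a 30-sample signal) are illustrative assumptions, not taken from the paper.

```python
from math import comb, factorial, sqrt

def pochhammer(a, k):
    """Pochhammer symbol (a)_k = a (a + 1) ... (a + k - 1), with (a)_0 = 1."""
    result = 1.0
    for i in range(k):
        result *= a + i
    return result

def krawtchouk(n, x, p, N):
    """Classical K_n(x; p, N); the series stops at k = n because (-n)_k = 0 beyond it."""
    return sum(
        pochhammer(-n, k) * pochhammer(-x, k) * (1.0 / p) ** k
        / (pochhammer(-N, k) * factorial(k))
        for k in range(n + 1)
    )

def weighted_krawtchouk(n, x, p, N):
    """Weighted polynomial: K_n scaled by sqrt(w / rho)."""
    w = comb(N, x) * p**x * (1.0 - p) ** (N - x)          # binomial weight
    rho = (-1) ** n * ((1.0 - p) / p) ** n * factorial(n) / pochhammer(-N, n)
    return krawtchouk(n, x, p, N) * sqrt(w / rho)

def weighted_table(n_max, p, N):
    """Rows are orders 0..n_max, columns are positions x = 0..N."""
    return [[weighted_krawtchouk(n, x, p, N) for x in range(N + 1)]
            for n in range(n_max + 1)]

if __name__ == "__main__":
    for p in (0.3, 0.5, 0.7):
        zeroth = weighted_table(0, p, 29)[0]
        print(p, max(range(30), key=lambda x: zeroth[x]))  # peak position of order 0
```

Under these assumptions, the rows of the table are orthonormal, and the peak of the 0th-order polynomial moves to larger $x$ as $p$ increases, which is the ROI shift discussed in the text.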
2.2. Krawtchouk Moment Invariants
For an image intensity function $f(x, y)$, the 2D Krawtchouk moment of order $(n + m)$ is

$$Q_{nm} = \sum_{x=0}^{N-1}\sum_{y=0}^{M-1} \bar{K}_n(x; p_1, N-1)\,\bar{K}_m(y; p_2, M-1)\,f(x, y), \tag{5}$$

where the size of $f(x, y)$ is $N \times M$ and $\bar{K}_n$ is the weighted Krawtchouk polynomial of (1). In the 2D Krawtchouk polynomial plots, the parameters $p_1$ and $p_2$ shift the ROI horizontally and vertically, respectively. $p_1 > 0.5$ shifts the ROI horizontally in the positive $x$-direction, while $p_1 < 0.5$ shifts it horizontally in the negative $x$-direction. Similarly, for $p_2 < 0.5$, the ROI shifts vertically in the negative $y$-direction and, for $p_2 > 0.5$, it shifts in the positive $y$-direction. Figure 3 shows the top view of 2D Krawtchouk moment plots at lower orders. Here, the ROI is focused at the centre with $p_1 = p_2 = 0.5$. The ROIs obtained with different values of $p_1$ and $p_2$ are shown in Figure 4; they cover different regions of an image and are used as features in this paper. White shows positive peaks, whereas black shows negative peaks. This shows that the Krawtchouk moments are very good local descriptors.
For an image, various lower orders of Krawtchouk moments can be calculated as shown in Figure 3. Further, the ROIs are defined at different positions of the image, thus covering the entire image. The ROI is shifted horizontally and vertically by taking values of ($p_1$, $p_2$) as (0.5, 0.5), (0.3, 0.3), (0.4, 0.7), (0.6, 0.3), and (0.6, 0.7) as shown in Figure 4. At each ROI, 9 features up to 3rd order are calculated. Thus, a total of 45 Krawtchouk moment-based local features are extracted to study the recognition accuracy of ISL alphabets.
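The extraction of 45 local features from five ROIs can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it computes plain (non-invariant) 2D Krawtchouk moments, and it assumes that the nine features per ROI are the orders with $1 \le n + m \le 3$, since the exact list is not given at this point. The names `ROIS`, `ORDERS`, and `roi_features` are hypothetical.

```python
import numpy as np
from math import comb

def weighted_krawtchouk(n_max, p, N):
    """Matrix whose row n holds the weighted polynomial of order n at x = 0..N."""
    x = np.arange(N + 1, dtype=float)
    binom = np.array([comb(N, k) for k in range(N + 1)], dtype=float)
    w = binom * p**x * (1.0 - p) ** (N - x)      # binomial weight
    K = np.empty((n_max + 1, N + 1))
    for n in range(n_max + 1):
        term = np.ones(N + 1)                    # k = 0 term of the hypergeometric series
        total = term.copy()
        for k in range(n):                       # ratio of consecutive series terms
            term = term * (-n + k) * (-x + k) / ((-N + k) * (k + 1) * p)
            total += term
        rho = ((1.0 - p) / p) ** n / comb(N, n)  # closed form of the squared norm
        K[n] = total * np.sqrt(w / rho)
    return K

# ROI parameters (p1, p2) as listed in the text.
ROIS = [(0.5, 0.5), (0.3, 0.3), (0.4, 0.7), (0.6, 0.3), (0.6, 0.7)]
# ASSUMPTION: nine orders with 1 <= n + m <= 3; the paper's exact list may differ.
ORDERS = [(n, m) for n in range(4) for m in range(4) if 1 <= n + m <= 3]

def krawtchouk_moments(img, p1, p2, n_max=3):
    """Q[n, m] for an image stored as img[y, x] (rows are y, columns are x)."""
    img = np.asarray(img, dtype=float)
    rows, cols = img.shape
    Kx = weighted_krawtchouk(n_max, p1, cols - 1)
    Ky = weighted_krawtchouk(n_max, p2, rows - 1)
    return Kx @ img.T @ Ky.T

def roi_features(img):
    """Concatenate the selected moments over all ROIs (5 x 9 = 45 values)."""
    feats = []
    for p1, p2 in ROIS:
        Q = krawtchouk_moments(img, p1, p2)
        feats.extend(Q[n, m] for n, m in ORDERS)
    return np.array(feats)
```

For a 30 × 30 binary image, `roi_features(img)` returns a vector of length 45. Writing the double sum as two matrix products keeps the cost per ROI small, which is one practical reason for staying at low orders.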
The shape-based local features invariant to rotation, scale, and translation are calculated from Krawtchouk moments. For this, the Krawtchouk moments are written in terms of geometric moments, and the geometric moments are normalized with respect to the centroid, the zeroth-order moment, and the orientation angle $\theta$ of the shape. The angle $\theta$ is obtained from the second-order central moments $\mu_{pq}$ and lies in the range of $-45°$ to $45°$. To define it in the range of 0° to 360°, modifications are made as done by Teague [9]. The Krawtchouk moments thus obtained are invariant to rotation, scale, and translation.
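The normalization of geometric moments that underlies these invariants can be illustrated with a short sketch. It assumes the standard construction: coordinates are taken relative to the centroid, rotated by the principal-axis angle $\theta = \frac{1}{2}\tan^{-1}\!\left(2\mu_{11}/(\mu_{20}-\mu_{02})\right)$, and the result is divided by $M_{00}^{\gamma}$ with $\gamma = (n+m)/2 + 1$. The function name is hypothetical, and the step that recombines these values into Krawtchouk invariants is not shown.

```python
import numpy as np

def normalized_geometric_moment(img, n, m):
    """Geometric moment of order (n, m) after translation, rotation and scale normalization."""
    img = np.asarray(img, dtype=float)
    y, x = np.mgrid[: img.shape[0], : img.shape[1]]
    m00 = img.sum()
    dx = x - (x * img).sum() / m00        # coordinates relative to the centroid
    dy = y - (y * img).sum() / m00
    mu11 = (dx * dy * img).sum()          # second-order central moments
    mu20 = (dx**2 * img).sum()
    mu02 = (dy**2 * img).sum()
    den = mu20 - mu02
    # The arctangent keeps theta between -45 and 45 degrees.
    if den != 0:
        theta = 0.5 * np.arctan(2.0 * mu11 / den)
    else:
        theta = 0.25 * np.pi * np.sign(mu11)
    u = dx * np.cos(theta) + dy * np.sin(theta)
    v = dy * np.cos(theta) - dx * np.sin(theta)
    gamma = (n + m) / 2.0 + 1.0           # exponent that removes the dependence on scale
    return (u**n * v**m * img).sum() / m00**gamma
```

Because only centroid-relative coordinates enter the sum, shifting a shape inside the frame leaves every value unchanged; the order-(0, 0) value is always 1 and the first-order values vanish.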
3. Proposed Methodology
3.1. ISL Database
The database consists of 10 hand gestures of ISL alphabets “A,” “B,” “F,” “I,” “K,” “M,” “P,” “Q,” “U,” and “W.” Each gesture is collected from 72 subjects, giving a total of 720 gesture images from different signers as shown in Figure 5. The signs are selected such that they include similar shapes, as in “A,” “B,” “P,” “Q,” “U,” and “W,” high occlusion, as in “M” and “W,” and one gesture being a subgesture of another, as in “I” and “K.” Most of the signs use both hands, which adds to the complexity. The images have a resolution of 640 × 480 pixels and are taken on a uniform (black) background, with varying illumination, at a fixed distance from the camera. In the preprocessing stage, each image is converted from RGB to binary form. Edge detectors are applied, and each image is resized to a 30 × 30 binary image while preserving its edges and shape.
3.2. Jochen-Triesch’s Database
The database consists of 10 static hand postures collected from 24 subjects against uniform dark, uniform light, and complex backgrounds [14].
3.3. Feature Extraction
In this work, the moments till 3rd order are selected. This can be viewed in Figure 6, where the reconstruction results show that they cover the entire image, giving local shape features. Thus, the lower order Krawtchouk moments up to 3rd order are more suitable for classification [15]. Higher order moments, on the other hand, increase the computational complexity, since the number of Krawtchouk coefficients grows with the order.
The Krawtchouk features up to 3rd order are extracted by varying the values of $p_1$ and $p_2$. Lower orders have the ability to characterize shape details, thus providing local information from a specific ROI. The features calculated for various ROIs are shown in Figure 7. A 9-dimensional feature-set is extracted for each ROI. As shown in Figure 7, for 5 values of ($p_1$, $p_2$), a total of 45 Krawtchouk moment-based local features are calculated for each image.
3.4. Feature Selection
Feature selection is useful in removing irrelevant and redundant features. In this paper, the correlation-based feature selection (CFS) algorithm is used. CFS is regarded as one of the most stable feature selection algorithms; it selects feature subsets that are highly correlated with the class but uncorrelated with each other [16, 17]. CFS uses the Pearson correlation coefficient, and the merit of a subset is calculated as follows [18]:

$$M_S = \frac{k\,\bar{r}_{cf}}{\sqrt{k + k(k-1)\,\bar{r}_{ff}}},$$

where $M_S$ is the merit of the current subset of features, $k$ is the number of features, $\bar{r}_{cf}$ is the mean of the correlations between each feature and the class, and $\bar{r}_{ff}$ is the mean of the pairwise correlations between every two features. The numerator increases when the features can classify the data accurately, and the denominator increases when there is redundancy between the features. The feature subset with the highest $M_S$ value is chosen. For this, a greedy stepwise algorithm is used, in which one feature at a time is added to an initially empty subset. Each candidate subset is ranked on the basis of $M_S$ until the subset with the best merit value is selected [19].
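The merit computation and the greedy stepwise search can be sketched as below. This is a simplified illustration under stated assumptions: class labels are treated as numeric values when computing Pearson correlations, absolute correlations are used, and the search stops as soon as no candidate improves the merit. The function names are hypothetical, and toolkit implementations of CFS may differ in these details.

```python
import numpy as np

def cfs_merit(X, y, subset):
    """Merit k * r_cf / sqrt(k + k (k - 1) r_ff) of the feature indices in `subset`."""
    k = len(subset)
    r_cf = np.mean([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in subset])
    if k == 1:
        return r_cf                       # no feature pairs, so no redundancy term
    r_ff = np.mean([abs(np.corrcoef(X[:, a], X[:, b])[0, 1])
                    for i, a in enumerate(subset) for b in subset[i + 1:]])
    return k * r_cf / np.sqrt(k + k * (k - 1) * r_ff)

def greedy_cfs(X, y):
    """Forward selection: add the feature that raises the merit most, stop when none does."""
    selected, best = [], 0.0
    remaining = list(range(X.shape[1]))
    while remaining:
        merit, j = max((cfs_merit(X, y, selected + [j]), j) for j in remaining)
        if merit <= best:
            break
        selected.append(j)
        remaining.remove(j)
        best = merit
    return selected, best
```

With a feature matrix of shape (samples, 45), `greedy_cfs(X, y)` returns the chosen column indices and their merit. A feature that duplicates one already selected raises the mean feature-feature correlation and so lowers the merit, which is how redundant features are kept out.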
CFS has been used for large databases to reduce the problems of class imbalance, high dimensionality, and information redundancy [17, 20, 21]. For the database proposed in this paper, it reduces the feature-set from 45 to 22. The original and reduced feature-sets are compared in terms of recognition accuracy across the classifiers.
The performance of k-Nearest Neighbour (k-NN) using Manhattan distance (MD), k-NN using Euclidean distance (ED) (with the same value of k in both cases), Multilayer Perceptron (MLP), Support Vector Machine (SVM), and Extreme Learning Machines (ELM) is analyzed [22, 23]. The recognition accuracy of these classifiers is studied as the dimensionality of the feature-set varies. For SVM, the PUK, RBF, and polynomial (Poly) kernels are examined. For ELM, linear, Poly, and RBF kernels are taken into consideration. SVM with PUK has been reported to have stronger mapping power and better generalization than other kernel functions like polynomial, linear, and RBF [24]. It has outperformed other kernels and classifiers such as k-NN, MLP, and SVM with polynomial and RBF kernels in terms of recognition accuracy [17, 25]. ELM is used in various multiclass classification applications and gives similar or better generalization at faster learning speed than SVM, which has high computational complexity [23]. In each case, the results for the raw feature vector and the feature vector after normalization are also compared.
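The two k-NN variants differ only in the distance measure, as the following minimal sketch shows. The function name, the default `k=1`, and the toy data are assumptions for illustration; the value of k used in the experiments is not restated here.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, X_test, k=1, metric="manhattan"):
    """Label each test vector by majority vote among its k nearest training vectors."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    predictions = []
    for query in np.asarray(X_test, dtype=float):
        diff = X_train - query
        if metric == "manhattan":
            dist = np.abs(diff).sum(axis=1)          # sum of absolute differences
        else:
            dist = np.sqrt((diff**2).sum(axis=1))    # Euclidean distance
        nearest = np.argsort(dist, kind="stable")[:k]
        votes = Counter(y_train[nearest].tolist())
        predictions.append(votes.most_common(1)[0][0])
    return np.array(predictions)

if __name__ == "__main__":
    X_train = [[0, 0], [1, 1], [5, 5], [6, 6]]
    y_train = [0, 0, 1, 1]
    X_test = [[0.5, 0.2], [5.5, 5.9]]
    print(knn_predict(X_train, y_train, X_test, k=3, metric="manhattan"))
    print(knn_predict(X_train, y_train, X_test, k=3, metric="euclidean"))
```

Since the Manhattan distance sums absolute differences, a single feature with a large range can dominate it, which is one reason the raw and normalized feature vectors are compared.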
4. Results and Discussions
The accuracy is defined as the ratio of the number of correctly recognized images to the total number of test images:

$$\text{Accuracy} = \frac{\text{number of correctly recognized images}}{\text{total number of test images}} \times 100\%.$$

To verify the recognition ability of Krawtchouk moments, they are compared with Hu moments and Zernike moments. Table 1 shows the recognition accuracy for ISL alphabets for these three moments. For a small feature vector up to 3rd order, Krawtchouk moments show the best results for all the classifiers except ELM. On normalization, the best accuracy of 85.9% is observed with Krawtchouk moments for SVM PUK, SVM polynomial, and k-NN using Manhattan distance. For Hu and Zernike moments, the best recognition accuracies are observed with SVM PUK and the ELM polynomial kernel.
In Tables 2 and 3, a comparison of accuracies is carried out for increased feature vector of Krawtchouk moments for ISL database and the standard Jochen-Triesch’s dataset. With an increase in the feature-set at different ROIs, recognition accuracy improves gradually for different classifiers. A total of five ROIs are taken, from center, top-left, top-right, bottom-left, and bottom-right of an image. The aim is to find a combination of feature-set and a classifier that gives the best recognition accuracy.
For the maximum feature vector size of 45, ELM with polynomial kernel gives the best recognition accuracy of 88.1%, followed by SVM with PUK and ELM with linear kernel, both giving 87.9%. ELM with RBF gives 87.7% accuracy. Some classifiers, like k-NN using Euclidean and Manhattan distance, show good recognition accuracy even with a smaller feature vector size. For other classifiers like SVM and MLP, a larger feature-set results in better accuracy.
After normalizing the feature-set, an improvement in accuracy is observed. Normalization maps the feature values into a specific range; in this paper, all feature values are normalized to a fixed range. It helps in improving the recognition accuracy and accelerates the training step [29, 30]. For the normalized feature vector, SVM with PUK shows the best recognition accuracy of 90%. It is followed by ELM with polynomial kernel giving 89.3%. SVM with RBF and ELM with RBF kernel both show 89.2%.
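A common way to map features into a fixed range is min-max scaling, sketched below. The target range `[0, 1]` and the function names are assumptions for illustration, and the normalization actually used in the experiments may differ. The bounds are estimated on the training data only and then reused for the test data.

```python
import numpy as np

def fit_min_max(X_train):
    """Per-feature minimum and range, estimated on the training data only."""
    X_train = np.asarray(X_train, dtype=float)
    low = X_train.min(axis=0)
    span = X_train.max(axis=0) - low
    span[span == 0] = 1.0                 # constant features map to the lower bound
    return low, span

def apply_min_max(X, low, span, new_min=0.0, new_max=1.0):
    """Rescale features linearly so training values fall in [new_min, new_max]."""
    scaled = (np.asarray(X, dtype=float) - low) / span
    return new_min + scaled * (new_max - new_min)

if __name__ == "__main__":
    X_train = [[1.0, 10.0], [3.0, 10.0], [5.0, 10.0]]
    low, span = fit_min_max(X_train)
    print(apply_min_max(X_train, low, span))
    print(apply_min_max([[2.0, 10.0]], low, span, new_min=-1.0, new_max=1.0))
```

Scaling each feature separately stops moments with large magnitudes from dominating distance-based classifiers and kernel evaluations.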
On applying the CFS technique, the feature-set reduces to 22 features. Table 4 shows the recognition accuracies of the different classifiers for the selected feature-set. For some classifiers the accuracy reduces slightly, but for others it improves. The best performance is shown by ELM with polynomial kernel, giving 89.6%. SVM with RBF, ELM with RBF, and MLP show comparable results. On normalization of the reduced feature vector, SVM PUK and ELM polynomial both show the best accuracy of 90% at 22 features. This is equal to the maximum accuracy achieved by the normalized 45 features. ELM with linear kernel gives 89.9%, followed by 89.8% shown by both SVM RBF and ELM RBF. Overall, the feature-reduction algorithm proves to be effective, as the accuracies obtained with the smaller feature-set are either better or comparable.
In this paper, the spatial behavior of 1D and 2D Krawtchouk polynomials at lower orders has been studied. Reconstruction results confirm that the lower order moments give local characteristics of an image. Krawtchouk-based moment invariants are extracted till 3rd order to represent shape details. The ROI is varied by changing the parameters $p_1$ and $p_2$, and features are calculated from different regions in an image. It is concluded that lower order Krawtchouk moments prove to be effective shape descriptors. Their performance in terms of recognition accuracy is analyzed for different classifiers. It is observed that the ELM polynomial kernel gives the best recognition accuracy of 88.1% at 45 features, followed by SVM PUK and ELM linear kernel with 87.9%.
However, on normalizing features, 90% accuracy is observed in SVM PUK followed by 89.3% in ELM polynomial kernel.
Other classifiers also show encouraging results. The feature-set is then reduced from 45 to 22 by the CFS algorithm. With the reduced feature-set, the best performance is shown by ELM with polynomial kernel, whose accuracy rises to 89.6% from 88.1% at 45 features. With normalization of the selected features, the accuracy improves further: SVM PUK and ELM polynomial kernel both show the best accuracy of 90% at 22 features. ELM and SVM both perform best when the feature-set is increased and normalized. k-NN shows its best performance with a smaller feature vector size.
Overall, it can be concluded that the normalized feature-set of 22 features is capable of classifying the images with maximum accuracy of 90% in case of SVM PUK and ELM-Poly classifiers for ISL dataset. For Jochen-Triesch dataset, it is 84.9% for ELM with Poly kernel.
Table 5 shows the comparison of results of proposed feature-set for Jochen-Triesch’s dataset. It is concluded that Krawtchouk moment-based features prove to be effective in shape recognition capability even with a smaller feature vector size as compared to other methods used which involve a large feature vector size. The feature-set for this database can be increased by extracting Krawtchouk moment features at higher orders. The technique employed for feature extraction can be based on image partitioning, with Krawtchouk moment features calculated at different subimages.
The authors declare that there are no competing interests regarding the publication of this paper.
- Y. S. Abu-Mostafa and D. Psaltis, “Recognitive Aspects of Moment Invariants,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 6, no. 6, pp. 698–706, 1984.
- A. J. Noraini, “A comparative study of face recognition using discrete orthogonal moments,” in Proceedings of the 10th International Conference on Information Sciences Signal Processing and their Applications (ISSPA '10), pp. 197–200, Kuala Lumpur, Malaysia, May 2010.
- J. S. Rani and D. Devaraj, “Face recognition using Krawtchouk moment,” Sadhana, vol. 37, no. 4, pp. 441–460, 2012.
- E. D. Tsougenis, G. A. Papakostas, D. E. Koulouriotis, and V. D. Tourassis, “Performance evaluation of moment-based watermarking methods: a review,” Journal of Systems and Software, vol. 85, no. 8, pp. 1864–1884, 2012.
- X. B. Dai, H. Z. Shu, L. M. Luo, G. N. Han, and J. L. Coatrieux, “Reconstruction of tomographic images from limited range projections using discrete Radon transform and Tchebichef moments,” Pattern Recognition, vol. 43, no. 3, pp. 1152–1164, 2010.
- A. Sit and D. Kihara, “Comparison of image patches using local moment invariants,” IEEE Transactions on Image Processing, vol. 23, no. 5, pp. 2369–2379, 2014.
- S. P. Priyal and P. K. Bora, “A robust static hand gesture recognition system using geometry based normalizations and Krawtchouk moments,” Pattern Recognition, vol. 46, no. 8, pp. 2202–2219, 2013.
- M. K. Hu, “Visual pattern recognition by moment invariants,” IRE Transactions on Information Theory, vol. 8, no. 2, pp. 179–187, 1962.
- M. R. Teague, “Image analysis via the general theory of moments,” Journal of the Optical Society of America, vol. 70, no. 8, pp. 920–930, 1980.
- A. Khotanzad and Y. H. Hong, “Invariant image recognition by Zernike moments,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 12, no. 5, pp. 489–497, 1990.
- P.-T. Yap, R. Paramesran, and S.-H. Ong, “Image analysis by Krawtchouk moments,” IEEE Transactions on Image Processing, vol. 12, no. 11, pp. 1367–1377, 2003.
- X. Wang, B. Xie, and Y. Yang, “Combining krawtchouk moments and HMMs for offline handwritten chinese character recognition,” in Proceedings of the 3rd International IEEE Conference on Intelligent Systems, pp. 661–665, London, UK, September 2006.
- G. Y. Pryzva, “Kravchuk orthogonal polynomials,” Ukrainian Mathematical Journal, vol. 44, no. 7, pp. 792–800, 1992.
- J. Triesch and C. Von der Malsburg, “Classification of hand postures against complex backgrounds using elastic graph matching,” Image and Vision Computing, vol. 20, no. 13-14, pp. 937–943, 2002.
- S. Liu, Y. Liu, J. Liu, and Z. Wang, “A static hand gesture recognition algorithm based on krawtchouk moments,” in Pattern Recognition: 6th Chinese Conference, CCPR 2014, Changsha, China, November 17–19, 2014. Proceedings, Part II, vol. 484 of Communications in Computer and Information Science, pp. 321–330, Springer, Berlin, Germany, 2014.
- H. Wang, T. M. Khoshgoftaar, and A. Napolitano, “Stability of filter- and wrapper-based software metric selection techniques,” in Proceedings of the IEEE 15th International Conference on Information Reuse and Integration (IRI '14), pp. 309–314, Redwood City, Calif, USA, August 2014.
- S. Chapaneri, R. Lopes, and D. Jayaswal, “Evaluation of music features for PUK kernel based genre classification,” Procedia Computer Science, vol. 45, pp. 186–196, 2015.
- J. L. Rodgers and W. A. Nicewander, “Thirteen ways to look at the correlation coefficient,” The American Statistician, vol. 42, no. 1, pp. 59–66, 1988.
- D. J. Dittman, T. M. Khoshgoftaar, R. Wald, and A. Napolitano, “Simplifying the utilization of machine learning techniques for bioinformatics,” in Proceedings of the 12th International Conference on Machine Learning and Applications (ICMLA '13), pp. 396–403, IEEE, Miami, Fla, USA, December 2013.
- R. Wald, T. M. Khoshgoftaar, and A. Napolitano, “Using correlation-based feature selection for a diverse collection of bioinformatics datasets,” in Proceedings of the IEEE International Conference on Bioinformatics and Bioengineering (BIBE '14), pp. 156–162, IEEE, Boca Raton, Fla, USA, November 2014.
- X. Xu, A. Li, and M. Wang, “Prediction of human disease-associated phosphorylation sites with combined feature selection approach and support vector machine,” IET Systems Biology, vol. 9, no. 4, pp. 155–163, 2015.
- X. Chen and M. Koskela, “Online RGB-D gesture recognition with extreme learning machines,” in Proceedings of the 15th ACM on International Conference on Multimodal Interaction (ICMI '13), pp. 467–474, ACM, Sydney, Australia, December 2013.
- G.-B. Huang, H. Zhou, X. Ding, and R. Zhang, “Extreme learning machine for regression and multiclass classification,” IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 42, no. 2, pp. 513–529, 2012.
- B. Üstün, W. J. Melssen, and L. M. C. Buydens, “Facilitating the application of support vector regression by using a universal Pearson VII function based kernel,” Chemometrics and Intelligent Laboratory Systems, vol. 81, no. 1, pp. 29–40, 2006.
- G. Zhang and H. Ge, “Support vector machine with a Pearson VII function kernel for discriminating halophilic and non-halophilic proteins,” Computational Biology and Chemistry, vol. 46, pp. 16–22, 2013.
- A. Just, Y. Rodriguez, and S. Marcel, “Hand posture classification and recognition using the modified census transform,” in Proceedings of the 7th International Conference on Automatic Face and Gesture Recognition (FGR '06), pp. 351–356, IEEE, Southampton, UK, April 2006.
- D. Kelly, J. McDonald, and C. Markham, “A person independent system for recognition of hand postures used in sign language,” Pattern Recognition Letters, vol. 31, no. 11, pp. 1359–1368, 2010.
- D. Dahmani and S. Larabi, “User-independent system for sign language finger spelling recognition,” Journal of Visual Communication and Image Representation, vol. 25, no. 5, pp. 1240–1250, 2014.
- L. A. Shalabi, Z. Shaaban, and B. Kasasbeh, “Data mining: a preprocessing engine,” Journal of Computer Science, vol. 2, no. 9, pp. 735–739, 2006.
- I. Dinç, M. Sigdel, S. Dinç, M. S. Sigdel, M. L. Pusey, and R. S. Aygün, “Evaluation of normalization and PCA on the performance of classifiers for protein crystallization images,” in Proceedings of the IEEE Southeastcon, pp. 1–6, Lexington, KY, USA, March 2014.
Copyright © 2016 Bineet Kaur and Garima Joshi. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.