Abstract

With the rapid development of the information society, human body posture recognition has become an important technology for human-computer interaction. This paper combines Kinect's human skeleton tracking technology with auxiliary gymnastics training: by giving feedback during gymnastics and dance training, students' incorrect movements can be corrected in time and training efficiency improved, making human-machine interaction more natural and harmonious. Based on the wireless-network Kinect, this paper studies human posture recognition and tracking, proposes a joint-angle representation based on a fixed axis, and improves the posture recognition method based on joint angles so that human posture can be recognized accurately. For the case in which human joint points are occluded, an improved joint-point repair algorithm is presented; it uses the proportions of the human skeleton and the characteristics of human motion, together with geometric principles, to restore the occluded points. The feasibility of raw joint-point data, angle features, and distance features for expressing human behavior is analyzed through experiments, a standard gymnastics movement database is established, and new gymnastics movements can be added at any time. A gymnastics auxiliary training system is designed that analyzes and evaluates the trainer's exercises from the joint-point coordinates and the angles formed by the joints and provides the trainer with intuitive error-correction prompts. The posture recognition method studied in this paper can accurately report the difference between the trainer's movement and the standard movement; the trainer can then adjust the movement posture according to the prompts, improve their gymnastics level, and thus achieve the purpose of auxiliary training. Experiments show that the algorithm model achieves an accuracy rate of 95.7% for human posture recognition and is effective in line dance, aerobics, and gymnastics training.

1. Introduction

Human body gesture recognition research takes many forms and spans many fields, including computer vision, sensor technology, image processing, and pattern recognition. The analysis of human motion can be applied to various sports: the posture data of athletes during training and competition can be extracted to provide them with valuable reference information and guidance. Motion analysis can also be used in fitness activities, patient rehabilitation exercises, and other scenarios; by collecting human movement information, it can help fitness coaches and medical staff devise better exercise and rehabilitation plans. However, many basic problems in recognizing human posture through computer vision remain unsolved. Therefore, this article uses a Kinect-based human posture recognition algorithm to analyze the recognition of human posture and action. This approach combines the advantages of wearable devices and vision. Kinect can accurately locate the main joint points of the human body and provide three-dimensional information without requiring the user to wear any device. Because it obtains depth data based on the infrared principle, it is largely unaffected by natural light. At the same time, Kinect also captures color images, which not only opens up new vision-based gesture recognition methods but also remains compatible with existing techniques.

The universal Kinect-type controller allows the user to control any existing application by using body movements as input. The middleware works by converting detected actions into keyboard and/or mouse events and sending them to the target application. Paliyawan and Thawonmas introduced the structure and design of its core modules and used practical cases to illustrate how to configure the middleware for various applications. They demonstrated interface designs that decode all configuration details into human-interpretable language, which significantly enhances the user experience and removes the need for programming skills. The performance of the middleware was evaluated on fighting-game motion data, which Paliyawan and Thawonmas made public so that it can be used in other research, for example, to promote a healthy life through play or to study exercise systems. However, the Kinect controller they proposed has a limited application range for motion systems and cannot be used in more applications [1]. Hwang et al. tested the concurrent validity and test-retest reliability of the Kinect skeleton tracking algorithm for measuring the angles of the torso, shoulder, and elbow joints during wheelchair transfer tasks. Eight wheelchair users were recruited for the study. The Kinect and Vicon motion capture systems simultaneously recorded the joint positions while the subjects transferred from a wheelchair to a level bench. The shoulder, elbow, and trunk angles recorded by the Kinect system followed trajectories similar to those recorded by the Vicon system, and the correlation coefficients on both sides (forearm and back arm) were greater than 0.71. The root mean square errors (RMSE) of the shoulder, elbow, and trunk angles ranged from 5.18 to 22.46. The 95% limits of agreement (LOA) for the difference between the two systems exceeded the 5° level of clinical significance. For trunk, shoulder, and elbow angles, Kinect showed very good relative reliability in measuring trunk sagittal, frontal, and horizontal angles. However, the Kinect skeleton tracking algorithm they studied requires complicated calculations when measuring joint data, and the accuracy of those calculations is difficult to predict [2]. Zhang et al. proposed an integral imaging display system based on KinectFusion, using only a mobile low-cost depth camera as the pickup device. The multiframe depth data of the observed scene streamed from the Kinect sensor is fused into a single global surface model represented by a volumetric truncated signed distance function, so the inherent noise of single-frame depth data can be eliminated. The elemental image array for display is obtained by ray casting the volumetric truncated signed distance function. To match the pickup part and the display part, ray optics is used to derive the relationship between pickup voxels and display voxels. However, their method is only suitable for mobile low-cost depth cameras and cannot demonstrate broader effectiveness [3].

This paper studies the role of the human posture recognition method based on the wireless-network Kinect in line dance, aerobics, and gymnastics training. The paper is organized as follows: the first part presents the research background and significance of wireless Kinect human posture recognition and introduces the main research content. The second part studies current posture and motion recognition methods based on Kinect skeleton information, summarizes the principles of feature extraction, and, based on the analysis of joint-point data and the skeleton model, puts forward the distance and angle features that can describe the human body. The third part designs the experiments for the Kinect-based gymnastics auxiliary training system, including the experimental method and the data acquisition method. The fourth part analyzes the Kinect gymnastics auxiliary training experiment in combination with the human posture recognition method to verify whether the algorithm designed in this paper is effective. The fifth part summarizes the research content and results of this paper and points out the work that still needs to be improved in the future.

2. Human Pose Recognition Algorithm

2.1. Action Representation Based on Pose Sequence

To recognize an action, the first step is to express the action. A static posture is represented in this paper by one frame of feature data. An action can be regarded as a combination of a series of skeletal frames, each frame being equivalent to a static posture, so an action can be seen as a sequence of static postures [4, 5]. In traditional vision-based action recognition, several key-frame image sequences are usually selected to represent the action, and the action is decomposed into several poses, because processing many image frames is computationally expensive and cannot meet real-time requirements [6]. However, selecting key frames breaks the continuity of the action. Since Kinect can extract about 30 frames of skeleton data per second, this paper expresses an action through continuous skeleton frames; that is, features are extracted from the continuous skeleton data to form an action feature sequence, which describes the action more faithfully [7, 8]. Based on the above considerations, this article expresses a human movement as a continuous posture sequence over a period of time. Then, for an action $A$, there is the following equation:

$$A = \{P_1, P_2, \ldots, P_n\}$$

Among them, $A$ represents an action and $P_i$ represents the feature quantity of the human body posture corresponding to the $i$-th frame.

If the angle features of a group of dynamic-behavior key frames are

$$\Theta_i = (\theta_{i1}, \theta_{i2}, \ldots, \theta_{im}), \quad i = 1, 2, \ldots, n,$$

then the representation sequence for the dynamic behavior is as follows:

$$S = \{\Theta_1, \Theta_2, \ldots, \Theta_n\}$$
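As a concrete illustration of this representation, the following minimal Python sketch (with hypothetical joint names and a placeholder feature extractor, not the authors' implementation) stacks one feature vector per skeleton frame to form the posture sequence of an action:

```python
import numpy as np

def extract_pose_features(frame):
    """Hypothetical per-frame feature extractor.

    `frame` is assumed to be a dict mapping joint names to (x, y, z)
    coordinates; it returns a 1-D feature vector for that static posture.
    """
    # Placeholder: flatten a few joint coordinates into a vector.
    joints = ["ShoulderLeft", "ElbowLeft", "WristLeft"]
    return np.concatenate([np.asarray(frame[j], dtype=float) for j in joints])

def action_sequence(skeleton_frames):
    """Represent an action A as the sequence {P_1, ..., P_n} of per-frame features."""
    return np.stack([extract_pose_features(f) for f in skeleton_frames])
```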

2.2. Feature Extraction Based on Joint Point Information

Kinect's joint-point tracking function can collect the spatial positions of 25 joint points in a three-dimensional coordinate system during human movement [9, 10]. However, whether the selected behavior features can adequately represent human behavior is a key issue. If the 25 joint points obtained by Kinect are used directly as the feature vector for classification, there are several shortcomings. On the one hand, human body structures differ considerably, so the raw joint-point data are not general: the data of the same person at different distances, or of different people at the same distance, will differ and introduce errors. On the other hand, the 25 joint points, each with three coordinates, yield a feature vector of 75 dimensions; such a high dimension makes the calculation very complicated and introduces redundant information [11, 12]. Research on behavioral feature extraction from Kinect three-dimensional joint points is therefore generally driven by the recognition goal, with targeted features extracted from the perspective of geometric relations.

2.2.1. Angle Feature Based on Joint Point Information

Kinect establishes the coordinates of 25 joint points from its built-in human skeleton model and combines them with the depth information to obtain a three-dimensional view of the body [13, 14]. The skeleton joint-point model of the human bone structure is shown in Figure 1.

Assume that the spatial coordinate of the left elbow joint is $E = (x_1, y_1, z_1)$, that of the left wrist is $W = (x_2, y_2, z_2)$, and that of the left shoulder is $S = (x_3, y_3, z_3)$. The joint vector from the left elbow to the left wrist is constructed as $\vec{a} = W - E$ and the joint vector from the left elbow to the left shoulder as $\vec{b} = S - E$, where

$$\vec{a} = (x_2 - x_1,\; y_2 - y_1,\; z_2 - z_1), \qquad \vec{b} = (x_3 - x_1,\; y_3 - y_1,\; z_3 - z_1)$$

Then, the angle feature between the vectors $\vec{a}$ and $\vec{b}$ is calculated as follows:

$$\theta = \arccos\!\left(\frac{\vec{a} \cdot \vec{b}}{\lvert \vec{a} \rvert\, \lvert \vec{b} \rvert}\right)$$
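A minimal Python sketch of this angle-feature computation, assuming each joint is given as a 3-D coordinate (the example coordinates are illustrative and not taken from the paper's data):

```python
import numpy as np

def joint_angle(elbow, wrist, shoulder):
    """Angle at the elbow formed by the elbow->wrist and elbow->shoulder vectors."""
    a = np.asarray(wrist, dtype=float) - np.asarray(elbow, dtype=float)
    b = np.asarray(shoulder, dtype=float) - np.asarray(elbow, dtype=float)
    cos_theta = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    # Clip to guard against rounding errors slightly outside [-1, 1].
    return np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0)))

# Example: left elbow angle from three joint coordinates (metres).
print(joint_angle(elbow=(0.2, 1.1, 2.0), wrist=(0.4, 0.9, 2.0), shoulder=(0.2, 1.4, 2.0)))
```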

2.2.2. Distance Features Based on Joint Points

When it is necessary to determine the relative distance between certain joint points and the head or torso, angle information alone cannot provide enough behavioral detail. Existing work selects relative distance coefficients for the joint points of the upper limbs but ignores the hip joints and the lower limbs [15, 16]. On this basis, this paper redefines the distance features of human behavior to improve the processing of joint-point data. Analysis of the joint-point information shows that the right wrist, left wrist, right ankle, and left ankle contribute most to the expression of behavior, whereas the knee, elbow, and other joints contribute little to behavioral assessment. Accordingly, this paper selects four distance features: the distance from the right wrist to the hip center, from the left wrist to the hip center, from the right ankle to the hip center, and from the left ankle to the hip center. Since each person's skeleton has a different size, these distances differ between people performing the same action, so the four distance features are normalized by the relatively stable distance from the shoulder center to the hip center [17, 18]. The behavior-representation distance features are therefore as follows:

$$F_d = \left(\frac{d_1}{d_0},\; \frac{d_2}{d_0},\; \frac{d_3}{d_0},\; \frac{d_4}{d_0}\right)$$

where $d_1, d_2, d_3, d_4$ are the distances from the right wrist, left wrist, right ankle, and left ankle to the hip center, and $d_0$ is the distance from the shoulder center to the hip center.
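The normalization described above can be sketched as follows in Python (the joint keys are illustrative; the four limb-to-hip-center distances are divided by the shoulder-center-to-hip-center distance so that body size cancels out):

```python
import numpy as np

def dist(p, q):
    """Euclidean distance between two 3-D points."""
    return float(np.linalg.norm(np.asarray(p, dtype=float) - np.asarray(q, dtype=float)))

def distance_features(joints):
    """Four limb-to-hip-center distances, normalized by the torso length d0."""
    d0 = dist(joints["ShoulderCenter"], joints["HipCenter"])
    ends = ["WristRight", "WristLeft", "AnkleRight", "AnkleLeft"]
    return [dist(joints[j], joints["HipCenter"]) / d0 for j in ends]
```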

2.3. Posture Recognition Method Based on Markov Model

Assume that the system has states $S_1, S_2, \ldots, S_N$ and that the system moves from one state to another over time; let the state at time $t$ be $q_t$. In general, the probability that the system is in state $S_j$ at time $t$ is related to all previous states, and the relationship can be expressed as follows:

$$P(q_t = S_j \mid q_{t-1} = S_i,\; q_{t-2} = S_k,\; \ldots)$$

If the state of the system at time $t$ is related only to the state at time $t-1$, then the Markov model is as follows:

$$P(q_t = S_j \mid q_{t-1} = S_i,\; q_{t-2} = S_k,\; \ldots) = P(q_t = S_j \mid q_{t-1} = S_i)$$

If we consider only the random process that is independent of time $t$, the state transition probability becomes

$$a_{ij} = P(q_t = S_j \mid q_{t-1} = S_i), \quad 1 \le i, j \le N$$

Among them, the state transition probabilities must satisfy $a_{ij} \ge 0$ and $\sum_{j=1}^{N} a_{ij} = 1$.

A further measure describes the similarity between two given point sets $A$ and $B$. The distance between $A$ and $B$, called the Hausdorff distance, can be expressed as follows:

$$H(A, B) = \max\bigl(h(A, B),\, h(B, A)\bigr), \qquad h(A, B) = \max_{a \in A} \min_{b \in B} \lVert a - b \rVert$$

The distance is inversely related to the similarity of the two sets $A$ and $B$; that is, the smaller the distance, the closer the two sets $A$ and $B$ are.
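A minimal Python sketch of the Hausdorff distance between two sets of 3-D points, under the definition above:

```python
import numpy as np

def directed_hausdorff(A, B):
    """h(A, B) = max over a in A of (min over b in B of ||a - b||)."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    # Pairwise distances, shape (len(A), len(B)).
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
    return d.min(axis=1).max()

def hausdorff(A, B):
    """H(A, B) = max(h(A, B), h(B, A)); smaller values mean more similar sets."""
    return max(directed_hausdorff(A, B), directed_hausdorff(B, A))
```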

2.4. Posture Recognition Based on Angle Measurement of Joint Points

Suppose the spatial coordinates of two points are $P_1 = (x_1, y_1, z_1)$ and $P_2 = (x_2, y_2, z_2)$; then, the Euclidean distance between the two points is as follows:

$$d(P_1, P_2) = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2 + (z_1 - z_2)^2}$$

The angle of any joint of the human body can then be obtained from the coordinates of the three points around that joint.

The above method of calculating the included angle from joint coordinates is theoretically feasible, but in practical applications the joint points are unstable relative to one another, so the results have large errors and cannot be used directly for posture recognition. To intercept representative static posture data from the skeleton data stream, a possible solution, different from other methods, is to calculate the Euclidean distance between the joint-point coordinates of two consecutive frames and then perform online recognition in combination with the identification algorithm of this paper. In addition, the size of the sample space determines the storage and computation cost. Online identification has high real-time requirements, so the sample space should be kept as small as possible; it can be analyzed to remove highly similar samples and samples that contribute little to the classification.
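One possible way to intercept a representative static posture from the skeleton stream, as described above, is to threshold the total Euclidean displacement of corresponding joints between consecutive frames. The sketch below is a simplified illustration under that assumption; the frame layout and threshold value are hypothetical:

```python
import numpy as np

def frame_displacement(prev, curr):
    """Sum of Euclidean distances of corresponding joints between two frames.

    Each frame is assumed to be an array of shape (n_joints, 3).
    """
    return float(np.linalg.norm(np.asarray(curr) - np.asarray(prev), axis=1).sum())

def intercept_static_poses(frames, threshold=0.05):
    """Return indices of frames in which the body is nearly still."""
    keep = []
    for i in range(1, len(frames)):
        if frame_displacement(frames[i - 1], frames[i]) < threshold:
            keep.append(i)
    return keep
```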

3. Experimental Design of Gymnastics Training System Based on Kinect

3.1. Introduction to Kinect Structure

The appearance of the Kinect is shown in Figure 2: it is composed of an infrared transmitter, an RGB camera, an infrared camera, and a four-element linear microphone array.

The specific components of Kinect are shown in Figure 3. Kinect's four microphones collect audio data, while the infrared transmitter and infrared camera are used to obtain depth data. When the collimated infrared beam emitted by the transmitter hits a rough surface, the distortion of the spectrum produces highly random reflection spots (speckle); speckles at different distances form different patterns and are received by the infrared camera [19, 20]. The depth image obtained by the infrared camera is pixel-matched by the system processor with the color image obtained by the RGB camera to produce a color image with depth information [21, 22]. The core component is the central processing chip, which controls the other sensors; for example, it drives the infrared transmitter to project the structured light, receives the spots formed by the infrared light on objects through the CMOS image sensor, processes them into depth data, and transmits the data to the computer via USB [23, 24]. The Kinect somatosensory sensor is thus a device that combines several kinds of data. Table 1 lists the basic specifications and parameters of Kinect.

3.2. Research Status of Human Gesture Recognition Based on Kinect

Traditional human gesture recognition can be divided into wearable-device recognition and visual recognition according to how the raw behavior data are acquired, as shown in Table 2. Although wearable devices collect behavioral data more accurately, they are inconvenient for users in actual use, more difficult to operate, and relatively expensive. Vision-based data collection, on the other hand, is easily affected by external factors such as lighting and texture.

3.3. Depth Image Acquisition Technology

The key to Kinect's success is that it obtains depth images cheaply, a technology supported by PrimeSense; one of Kinect's biggest advantages is its ability to obtain depth information [25]. The depth information is collected jointly by the infrared transmitter and the infrared receiver, and light coding technology [26] is used to capture point information in the field of view effectively, so as to obtain a depth image of the environment in which the Kinect is located. Techniques for obtaining depth images with infrared include TOF (time of flight) and structured-light measurement. TOF technology uses the transmission delay of light pulses to obtain depth data [27, 28], while structured-light measurement is based on optical coding: an infrared transmitter projects an infrared pattern over Kinect's visible range, and the infrared camera senses the changes in that pattern [29, 30] to obtain depth data. In a traditional image, the three-dimensional scene is projected onto a two-dimensional image, whereas in a depth image (also called a range image) each pixel expresses the distance between the objects in the visible range and the camera [31, 32]. Currently, environmental depth detection relies mainly on the following technologies: binocular stereo vision, radar detection, TOF (time of flight), and structured-light-based depth detection [4, 33]. Among these, the structured-light method has become a research hotspot in recent years because of its low cost and high accuracy. The depth imaging system diagram is shown in Figure 4.

The imaging distance of Kinect in real space is directly reflected in the depth value of each pixel on the acquired depth image. Each pixel value of the depth image is represented by a 16-bit binary number.

As shown in Table 3, the lower 3 bits of the 16-bit value are used to distinguish different measured objects and are converted into an integer according to the number of measured objects in the visible range. When this integer is 0, no measured object has been found in the visible range; if the value is 1 or 2, "measured object 1" or "measured object 2" has been recognized. Within its field of view, Kinect can read these lower three bits to distinguish individuals, but this information only counts people and does not identify them. If the entire 16-bit value of a pixel is 0, the depth information at that position could not be obtained.
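A hedged sketch of how such a 16-bit pixel can be decoded, assuming the common Kinect v1 layout in which the lower 3 bits carry the measured-object (player) index and the remaining 13 bits carry the depth in millimetres:

```python
def decode_depth_pixel(pixel):
    """Split a 16-bit depth pixel into (player_index, depth_mm).

    Assumed layout: player_index == 0 means no tracked person at that pixel;
    depth_mm == 0 means the depth could not be measured there.
    """
    player_index = pixel & 0b111   # lower 3 bits
    depth_mm = pixel >> 3          # upper 13 bits
    return player_index, depth_mm

# Example: a pixel encoding "measured object 1" at 1.5 m.
print(decode_depth_pixel((1500 << 3) | 1))  # -> (1, 1500)
```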

3.4. Gymnastics Auxiliary Training System Based on Kinect

This paper develops a Kinect-based gymnastics auxiliary training system, which combines gymnastics training with Kinect human skeleton recognition and tracking technology to recognize postures. Standard gymnastics posture data are collected in advance by the Kinect and stored on the computer to establish a standard gymnastics posture database, which serves as a template: the collected trainer's posture data are compared and analyzed against the template postures, and the trainer's gymnastics posture is evaluated intuitively, opening a new stage in the digitalization of gymnastics training. Rhythmic gymnastics is scientific and standardized; it emphasizes the precision of movements and expresses the dancers' emotions and thoughts through highly standardized movements. Basic gymnastics training is a compulsory and major course in gymnastics training, characterized by standard, systematic, and scientific movements. Therefore, this article uses basic rhythmic gymnastics training movements as the experimental movements for gymnastics auxiliary training. The system framework is shown in Figure 5.

3.5. Data Collection

In order to correct the gymnastics movements of the trainer, there must be standard gymnastics movements to serve as a comparison template. In this part, the gymnastics movements of gymnastics coaches are captured, and the information of occluded points is restored to form standardized movement data. Professional gymnastics coaches were invited to demonstrate standard gymnastics movements, and Kinect was used to collect the movement data. Each group of data is labeled with the corresponding action name and saved as movement information to serve as a comparison template for trainers. A total of 50 groups of basic gymnastics training movements are entered into the database, and each training movement is divided into 4 decomposed movements, giving 200 movements in total, which meets the educational needs of beginners; at a later stage, the system can also enter new standard actions according to trainers' different needs.
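For illustration, the standard-action template database described above might be organized as follows (a sketch with hypothetical field names; the real system stores the coach's recorded skeleton sequences):

```python
from dataclasses import dataclass, field

@dataclass
class StandardAction:
    name: str                                            # e.g. "arm swing, decomposition 2"
    frames: list = field(default_factory=list)           # skeleton frames recorded from the coach
    angle_sequence: list = field(default_factory=list)   # per-frame joint-angle features

# 50 basic movements x 4 decomposed movements = 200 template entries.
template_db = {}

def register_action(action: StandardAction):
    """New standard actions can be entered into the database at any time."""
    template_db[action.name] = action
```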

4. Role of Human Body Posture Recognition Method Based on Wireless Network Kinect in Line Dance, Aerobics, and Gymnastics Training

4.1. Feature Extraction Experiment Evaluation and Analysis

Through the analysis of the joint-point information, the angle feature and distance feature for expressing human behavior have been determined, but whether these two features satisfy the feature invariance described above remains to be verified. Therefore, this section evaluates and analyzes feature invariance experimentally, verifying whether the distance between the measured subject and the Kinect affects the features. The actions of sitting and standing were selected, the Kinect was fixed at a height of 1.0 m, and 1000 frames of joint-point data of the same subject were collected at distances of 2.0 m, 2.5 m, and 3.0 m from the Kinect. The 1000 frames were converted into angle features and distance features and averaged, and the raw joint-point data were also included for comparison. The experimental results are shown in Figures 6 and 7.

The above experimental results show that when the raw joint points of the same subject are collected at different distances from the Kinect to represent the sitting and standing behaviors, large errors are caused by the differences in depth and distance, whereas the distance and angle features both fluctuate only within the allowable deviation and are basically unaffected by the distance factor.

The analysis shows that the angle feature and the distance feature used to represent human behavior have better translation and scale invariance than the raw joint-point data. However, the number of distance features is small and their expressive power is limited, so they fall short in representing human behavior for multiaction recognition. Therefore, this article chooses the angle feature to represent static behavior. Figure 8 compares the sequence curves of the left-knee-joint angle feature for two groups of jumping movements performed by the same jumper.

4.2. Gymnastics-Assisted Training Based on Kinect’s Human Posture Recognition Method

The movement data of the trainer are collected and compared with the movement information of the standard movement, and training guidance is given according to the comparison result, so as to improve the trainer's level quickly. The angle calculation method based on joint coordinates is theoretically feasible, but in practical applications the joint points are unstable relative to one another, so the results have large errors and cannot be used directly for posture recognition. Therefore, this article proposes an angle representation based on a fixed axis: the positive direction of the x-axis is used as the reference line, the line connecting the two joint points is the line to be measured, and the line to be measured takes the outward direction from the central axis of the human body as positive; taking the horizontal axis of the shoulders as the center, with outward as the positive direction, the angle between the line to be measured and the reference line is calculated counterclockwise and defined as the angle of the two joint points. In this way the line to be measured and the reference line remain relatively stable, and the accuracy of the angle measurement is ensured. This article uses joint position, speed, and angle to express the difference between the trainer's gymnastics movements and the standard movements. To judge whether there are differences between the two groups of movements, the mean, standard deviation, and P value are compared and analyzed. The results are shown in Table 4 (), so these three characteristics do differ in gymnastics training.
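A minimal sketch of this fixed-axis angle, under the assumption that the reference line is the positive x-axis of the camera coordinate system and that the angle is measured counterclockwise in the frontal (x-y) plane; the coordinates in the example are hypothetical:

```python
import math

def fixed_axis_angle(joint_a, joint_b):
    """Counterclockwise angle (degrees, 0-360) between the segment joint_a->joint_b
    and the positive x-axis, measured in the x-y plane."""
    dx = joint_b[0] - joint_a[0]
    dy = joint_b[1] - joint_a[1]
    return math.degrees(math.atan2(dy, dx)) % 360.0

# Example: shoulder at (0.0, 1.4), elbow at (0.2, 1.2) -> arm pointing down and outward.
print(round(fixed_axis_angle((0.0, 1.4), (0.2, 1.2)), 1))  # about 315.0
```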

In order to analyze the role of the human posture recognition method in gymnastics training, this article compares traditional gymnastics training with gymnastics training based on the Kinect human posture recognition method. The five students in the control group were trained with traditional gymnastics training methods, and the students in the experimental group were trained with the Kinect human posture recognition training method. Neither group of students had undergone any gymnastics or related training before. The basic information of the students is shown in Table 5.

Next, the students were scored on their exercises before gymnastics training, scored again two weeks and one month after the training began, and the scores before and after training were then compared. The detailed results are shown in Table 6.

It can be seen from Table 6 that after two weeks of training (), the difference before and after training is not obvious. The trainees generally had no previous gymnastics training experience and a weak foundation, so after two weeks of training there was little difference before and after training. After one month of gymnastics training (), the results show significant differences before and after training. The next step is to analyze the progress of the experimental group and the control group through another t-test. The improvement is calculated by subtracting the baseline score from the post-training score. The results are shown in Table 7.
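For illustration, this kind of independent-samples t-test between the two groups' improvement scores can be run as follows (the score arrays below are made-up placeholders, not the study's data):

```python
from scipy import stats

# Hypothetical improvement scores (post-training minus baseline) for the two groups.
experimental = [12.5, 10.8, 13.1, 11.6, 12.9]
control      = [ 6.2,  7.4,  5.9,  6.8,  7.1]

t_stat, p_value = stats.ttest_ind(experimental, control, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")  # p < 0.05 would indicate a significant difference
```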

It can be seen from Table 7 () that there is a significant difference between the two groups after gymnastics training. The progress of the experimental group was greater than that of the control group, indicating that the students made significant progress after using Kinect-assisted training. To assess how well each method guided the students and improved the effectiveness of the learning process, a questionnaire survey was conducted after gymnastics training in both the experimental group and the control group. The results are shown in Figure 9.

Figure 9 compares the survey results of the experimental group and the control group after gymnastics training. During the learning of the gymnastics movements, the standardization of movements and the training results of the control group were weaker than those of the experimental group, and the experimental group found this method more interesting. In the survey on improving gymnastics ability, the experimental group reported a greater improvement. Wrong movements can be corrected through feedback, and learning efficiency was improved by 42.6% compared with the control group. The research results show that training assisted by the Kinect-based human posture recognition method helps students perform gymnastics training better, can effectively improve the trainer's dance level, and achieves the purpose of assisted training.

5. Conclusions

The human body is a highly complex system, and the postures it can express are ever-changing, so it is impossible to recognize every posture. This article designs a gymnastics auxiliary training system that collects samples of gymnastics postures and recognizes them from the extracted feature data; the good recognition results provide ideas for posture recognition. The Kinect-based human posture recognition method can give training suggestions according to the comparison between the trainer's posture and the standard posture, and the trainer can reasonably adjust the movement according to the software prompts to achieve the purpose of auxiliary training. Experiments show that the accuracy of human posture recognition reaches 95.7% and that auxiliary training efficiency is improved by 42.6%, which can effectively improve the trainer's dance level. There are still shortcomings in this research: (1) this paper uses a single Kinect to evaluate the trainer, which can only collect data from a single viewpoint and cannot cover all viewpoints. A next step is to use two or more Kinects to collect the trainer's movement data simultaneously, realize their collaborative work through software, and use the multiple data sets to reconstruct a high-precision human motion model and improve the accuracy of gymnastics training. (2) A more comprehensive and detailed library of expert gymnastics movements should be established. This system only collects more than 30 groups of basic gymnastics training movement information from a professional gymnastics coach, which cannot meet the needs of large-scale gymnastics movement training; in the future, more professional coaches' gymnastics movements can be collected to enrich the database. At the same time, this article only takes basic rhythmic gymnastics training exercises as experimental exercises, but there are many types of gymnastics, and standard information for different types of gymnastics exercises can be collected in the future to provide more auxiliary training support for different gymnastics trainers.

Data Availability

No data were used to support this study.

Disclosure

The author has seen the manuscript and approved its submission to the journal. The author confirms that the content of the manuscript has not been published or submitted for publication elsewhere.

Conflicts of Interest

There are no potential competing interests in this paper.