Abstract

Thanks to the rapid development of deep learning, computer vision has become a fast-developing technology in the field of artificial intelligence, and its range of applications keeps expanding. Combining it with sports would be of great practical value. When a traditional exercise assistance system is introduced into sports training, the athlete's training information is obtained by monitoring the exercise process through sensors and other equipment, which helps the athlete analyze technical actions retrospectively. However, the traditional system must be equipped with multiple sensor devices, and it depends on the exercise information they provide being accurate. This paper proposes a motion assistance evaluation system based on deep learning algorithms for human posture recognition. The system is divided into three parts: a standard motion database, auxiliary instruction, and overall evaluation. The standard motion database can be customized by the system user. The auxiliary instruction component compares the user's actions with the standard actions and displays the differences intuitively to the trainers as data. The overall evaluation component can recognize and filter video files, giving trainers an intelligent training platform. Simulation experiments demonstrate the efficacy of the algorithm used in this paper.

1. Introduction

With the rapid growth of China's emerging sports [1–3] and health industry [4–6], more people are devoting their time to sports like golf and skiing. Beginners who do not learn in a systematic way are more likely to fail to improve their technical level because of nonstandard movements, which can also lead to sports injuries [7–9]. Learners therefore need to review and analyze their actions frequently in order to improve. Traditional training methods require professional sports coaches to conduct one-to-one teaching [10], which brings high labor costs and a lack of flexibility. How to conduct teaching and training in a simpler and more effective manner is therefore an urgent problem. Sports assist systems are frequently used by professional athletes in their daily training. Professional training analysts model the athletes based on sensor data and correct the details of the athletes' movements by analyzing the data collected from the limbs during training.

In recent years, with the emergence of big data and the substantial increase in parallel computing capabilities, deep learning [11–14] has made breakthroughs in the fields of computer vision [15–17] and natural language processing by relying on rich training data and powerful feature expression capabilities [18]; in particular, many practical breakthroughs have been made in target detection, machine translation [19], and motion recognition. Target detection is an important research topic of computer vision and the basis of many computer vision tasks [20, 21]. It has also been widely used in real life, for example in face recognition [22–24], unmanned driving, and target tracking. Traditional target detection algorithms rely on manually extracted features, which leads to problems such as incomplete feature information and poor recognition results. With the advent of convolutional neural networks, deep convolutional neural networks can automatically extract task-related feature information from massive data. Compared with traditional machine learning algorithms [25–27], which rely on manually set feature extraction rules [28], they have obvious advantages. Target detection algorithms based on deep learning have greatly improved in detection accuracy and speed, which not only broadens the scope of computer vision applications but also adds more application value. Another important research topic in computer vision is the semantic analysis of video, particularly human action recognition, which has a wide range of application scenarios, including human-computer interaction and gesture recognition. It is primarily concerned with video sequences, which can no longer be accurately described by the characteristics of static images alone. Therefore, a human motion recognition algorithm [29, 30] needs not only to extract the spatial features of each frame but also to analyze the temporal features between frames. The development of recurrent neural networks has made deep learning capable of processing sequence data, and such networks are widely used in natural language processing tasks such as machine translation [31, 32], text generation, and personalized recommendation [33].

This article discusses combining computer vision technology and deep learning technology with sports and applying them to sports training and evaluation, providing a data-based training platform for sports trainers.

The main contributions of this paper are as follows:
(1) This paper proposes a sports assisted evaluation system that uses deep learning algorithms to recognize human posture and provide an intelligent training platform for sports trainers.
(2) The concept of "close Angle" is proposed in this paper to solve the motion recognition error caused by inconsistencies in limb lengths and angles between the human body and the camera acquisition equipment. The cosine angle of known three-point coordinates can be used to calculate the joint angle of key parts of the human body during movement.
(3) To improve the model's performance, this paper uses the idea of finding the Euclidean distance between two vectors to calculate the similarity between actions.

The rest of the paper is organized as follows. Section 2 discusses the background to the proposed research. Section 3 presents the methodology and the details of the proposed work. Experiments and results are given in Section 4. The paper is concluded in Section 5.

2. Background

Sports can be divided into competitive, recreational, mass, and medical types. Competitive sports are those in which people aim to beat their opponents and obtain good results; to this end, individuals and groups undergo objective, reasonable, and comprehensive training and competition to develop and strengthen their physical and mental abilities as far as possible. Recreational sports are activities that people engage in during their free time or at designated venues for enjoyment. Such activities are nonprofessional, leisurely, and pleasurable; they usually include ball games, chess and cards, travel and outings, and some ethnic activities. Mass sports are the sports activities commonly held in everyday life, through which people improve their physical fitness, resist disease, develop candidates for elite sport, and spend their leisure time. The main participants are workers, farmers, and regional groups. Separate events are also held for people of different genders and for people with limited mobility. Medical sports use sports skills in the treatment of certain diseases and injuries, to assist diagnosis and to help improve body function.

The movement of the human body can be described by the movement of a few key parts, and most movements can be described by locating and tracking the joints. Generally speaking, traditional sports training has four major disadvantages: the restriction of venues and equipment, the need for specialized teaching of difficult movements, the difficulty of recording practice data, and the tedium of practice. An auxiliary training system therefore allows trainers to undertake sports training at any time and in any place and to capture training data in real time. In recent years, researchers have begun to look for ways to use machines to analyze athletes' movements. Cameras can capture the entire training process, and computer vision technology can analyze and process the captured image data so that the machine recognizes the human body, which aids human research on motion to some extent.

3. Methodology

3.1. Human Body Gesture Recognition

We use OpenPose [34], an open source library for human posture recognition, as the research tool in this article. OpenPose uses VGGNet-19 as a feature extractor during network model training. One of the main features of VGGNet is that it can process input images of different resolutions, so during system development, appropriate preprocessing of the image data input to the network can improve the fluency of the system. OpenPose is a multiperson gesture recognition library developed according to a bottom-up approach, which is more robust in the early stage. Because it performs equally well in single-person and multiperson gesture recognition, the developed system can easily be extended to multiplayer sports evaluation. The current mainstream human pose recognition libraries include OpenPose, Mask R-CNN, and Alpha-Pose. A comparison of the recognition time of the three shows that, as the number of people in the image increases, the running time of Mask R-CNN and Alpha-Pose grows linearly, whereas the running time of OpenPose remains essentially unchanged, so a system developed using OpenPose will be more stable.

The purpose of the human body gesture recognition module is to extract the positions of key points on the human body from an image or video quickly and easily. For the subsequent action evaluation to proceed smoothly, the extracted key-point positions must be accurate and conform to the actual rules of motion. After the action collection is complete, the human body gesture recognition module analyzes the image or video to extract the gesture data. Figure 1 depicts the OpenPose network structure.

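In the first stage, the network generates a set of part affinity fields (PAFs) $\mathbf{L}^{1} = \phi^{1}(\mathbf{F})$, where $\mathbf{F}$ is the feature map of the original image and $\phi^{1}$ represents the convolutional neural network structure used for prediction in stage 1. In each subsequent stage, the prediction results from the previous stage and the original feature map $\mathbf{F}$ are concatenated and used to generate refined predictions:

$$\mathbf{L}^{t} = \phi^{t}\left(\mathbf{F}, \mathbf{L}^{t-1}\right), \quad \forall\, 2 \le t \le T_P.$$

Here, $\phi^{t}$ refers to the convolutional neural network structure of prediction stage $t$, and $T_P$ refers to the total number of PAF prediction stages. After $T_P$ iterations, the process is repeated to predict the confidence maps, starting from the latest PAF prediction:

$$\mathbf{S}^{T_P} = \rho^{t}\left(\mathbf{F}, \mathbf{L}^{T_P}\right), \quad \forall\, t = T_P,$$

$$\mathbf{S}^{t} = \rho^{t}\left(\mathbf{F}, \mathbf{L}^{T_P}, \mathbf{S}^{t-1}\right), \quad \forall\, T_P < t \le T_P + T_C.$$

Here, $\rho^{t}$ refers to the convolutional neural network structure of prediction stage $t$, and $T_C$ refers to the total number of confidence map prediction stages. The component association strategy is shown in Figure 2.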

Figure 2(a) shows the two body parts (neck and marrow) and all possible joint pairs of three people. Figure 2(b) shows the result of using the midpoint strategy to connect. Both the correct connection and the wrong connection meet the constraints.

3.2. Action Evaluation

The prerequisite for recognizing human actions in images or videos is that there must be a set of action description rules. The action recognition effect is achieved by processing the posture information data extracted by the human posture recognition module according to the action description rules. The workflow of action description rules in the action evaluation process is shown in Figure 3.

3.2.1. Joint Angle

Every joint point of the human body has a coordinate position in the image, so the joint angle can be calculated from the coordinates of three points by finding the cosine of the angle they form. The joint angle is introduced to describe the action. The calculation does not require the lengths of the body's limbs to be known, so two different actions can be compared directly; this is verified experimentally later. The human skeleton model and bone labels are shown in Figure 4.

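Given the coordinates of three points, the joint angle is found as the angle between two vectors using the law of cosines, as shown in Figure 5.

The calculation equation for finding the joint angle is as follows. Let $A(x_1, y_1)$, $B(x_2, y_2)$, and $C(x_3, y_3)$ be three adjacent joint points, with $B$ the joint whose angle $\theta$ is sought. The side lengths of the triangle $ABC$ are

$$a = |BC| = \sqrt{(x_3 - x_2)^2 + (y_3 - y_2)^2}, \quad b = |AC| = \sqrt{(x_3 - x_1)^2 + (y_3 - y_1)^2}, \quad c = |AB| = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}.$$

Then, we can find the cosine of the joint angle $\theta$:

$$\cos\theta = \frac{a^2 + c^2 - b^2}{2ac}, \qquad \theta = \arccos\left(\frac{a^2 + c^2 - b^2}{2ac}\right).$$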
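As an illustration of the three-point calculation, the following minimal Python sketch computes the angle at a joint from pixel coordinates using the law of cosines. The function name `joint_angle` and the sample coordinates are illustrative assumptions and are not taken from the system described in this paper.

```python
import math

def joint_angle(a, b, c):
    """Return the angle in degrees at vertex b formed by points a and c.

    a, b, c are (x, y) image coordinates; b is the joint itself.
    Hypothetical helper for illustration only.
    """
    ab = math.dist(a, b)  # limb from the joint to the first neighbour
    cb = math.dist(c, b)  # limb from the joint to the second neighbour
    ac = math.dist(a, c)  # side opposite the joint
    if ab == 0 or cb == 0:
        raise ValueError("degenerate joint: coincident key points")
    # Law of cosines, solved for the angle at b.
    cos_theta = (ab ** 2 + cb ** 2 - ac ** 2) / (2 * ab * cb)
    # Clamp to [-1, 1] to guard against floating-point round-off.
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.degrees(math.acos(cos_theta))

# Made-up key points: shoulder, elbow, wrist forming a right angle at the elbow.
angle = joint_angle((0.0, 0.0), (1.0, 0.0), (1.0, 1.0))
# The same pose with limbs twice as long gives the same angle.
scaled = joint_angle((0.0, 0.0), (2.0, 0.0), (2.0, 2.0))
print(angle, scaled)
```

The second call shows why the angle is a convenient descriptor: scaling the limb lengths leaves the result unchanged, so limb length does not need to be known.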

3.2.2. Action Similarity

If two different athletes perform the same movement and we want to know whose movement is closer to the standard, we need a method for finding the "distance" between the two movements. Here, the concept of movement similarity is introduced. Action similarity is at the heart of the action description rules; it can be used to identify a single action, and, building on single action recognition, it can also be used to identify the key actions in continuous action sequences. Using the joint angle data of the two movements, we can find the distance between them, and this distance expresses the degree of similarity between the movements. The joint angle data of the two movements are expressed as multidimensional vectors, such as $X = (x_1, x_2, \ldots, x_n)$ and $Y = (y_1, y_2, \ldots, y_n)$, where $x_i$ represents the $i$th joint angle of the action to be measured and $y_i$ represents the $i$th joint angle of the template action. The Euclidean distance is used to compute the distance between the two vectors:

$$d(X, Y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}.$$

Then:

$$d(X, Y) = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2 + \cdots + (x_n - y_n)^2}.$$

The smaller the distance, the more similar the action.
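A minimal Python sketch of this distance is given below. The function name `action_distance` and all angle values are made up for illustration; eight angles per action are used only because the interface described later displays eight joint angles.

```python
import math

def action_distance(test_angles, template_angles):
    """Euclidean distance between two joint-angle vectors (degrees)."""
    if len(test_angles) != len(template_angles):
        raise ValueError("both actions must have the same number of joint angles")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(test_angles, template_angles)))

# Illustrative data only: one template action and two athletes.
template = [170.0, 95.0, 160.0, 88.0, 175.0, 120.0, 165.0, 110.0]
athlete_a = [168.0, 97.0, 158.0, 90.0, 174.0, 121.0, 166.0, 112.0]
athlete_b = [150.0, 110.0, 140.0, 70.0, 170.0, 130.0, 150.0, 100.0]

d_a = action_distance(athlete_a, template)
d_b = action_distance(athlete_b, template)
# The athlete with the smaller distance is closer to the template action.
closer = "athlete A" if d_a < d_b else "athlete B"
print(d_a, d_b, closer)
```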

3.2.3. Action Evaluation

We use the minimum Euclidean distance as the similarity measure between two actions. The system allows each joint angle of the trainer's movement to deviate from the corresponding joint angle of the standard movement by the same error range. Within this range, the system finds the movement in a continuous sequence that is most similar to the standard movement. Concretely, standard action data are prepared first, and the system then finds the $m$ key actions that are most similar to the standard actions in a motion video containing $n$ actions (where $m \le n$).
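The selection step can be sketched as follows, under the assumption that each video frame has already been reduced to a joint-angle vector. The names `within_tolerance` and `top_m_matches`, the tolerance value, and the data are hypothetical; this is one plausible reading of the rule above rather than the system's actual implementation.

```python
import math

def within_tolerance(angles, standard, tol):
    """True if every joint angle is within +/- tol degrees of the standard."""
    return all(abs(x - y) <= tol for x, y in zip(angles, standard))

def top_m_matches(frames, standard, m, tol):
    """Return up to m (frame_index, distance) pairs, closest first.

    Only frames whose joint angles all fall inside the shared error
    range are considered candidates.
    """
    candidates = []
    for i, angles in enumerate(frames):
        if within_tolerance(angles, standard, tol):
            candidates.append((i, math.dist(angles, standard)))
    candidates.sort(key=lambda pair: pair[1])
    return candidates[:m]

# Illustrative data: one standard action with three joint angles, four frames.
standard = [90.0, 180.0, 45.0]
frames = [
    [60.0, 170.0, 40.0],   # first angle is 30 degrees off, so it is rejected
    [88.0, 178.0, 47.0],
    [91.0, 181.0, 44.0],
    [95.0, 170.0, 50.0],
]
matches = top_m_matches(frames, standard, m=2, tol=10.0)
print(matches)
```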

First, select representative key actions in the movement as the standard actions, and prepare a picture containing each standard action. Use the human body gesture recognition technology to extract the joint points of the person in each picture, calculate the respective joint angles, order them by action (such as action 1 and action 2), and save them in the database. Then, use the shooting equipment to record a complete video of the athlete making this movement and import it into the system. The system extracts the joint angle data of all the movements in the entire video. Finally, the system compares the joint angle data of the test motions with the standard motion database, finds for each standard motion the test motion that is most similar to it (so that the selected test motions are equal in number to the standard motions), and displays the difference between the test motions and the standard motions to the user. The user can choose how the results are displayed, and the differences identify the source of the tester's problems.
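The comparison workflow just described can be sketched in Python as follows. It assumes the standard motion database is a simple mapping from action name to joint-angle vector and that the video has already been converted to per-frame joint-angle vectors; the names `best_match` and `evaluate` and the data are illustrative assumptions only.

```python
import math

def best_match(frames, standard):
    """Index of the frame whose joint-angle vector is closest to the standard."""
    return min(range(len(frames)), key=lambda i: math.dist(frames[i], standard))

def evaluate(frames, standard_db):
    """For each standard action, report the closest frame and per-joint differences."""
    report = {}
    for name, standard in standard_db.items():
        i = best_match(frames, standard)
        report[name] = {
            "frame": i,
            "distance": math.dist(frames[i], standard),
            # Signed difference (test minus standard) for every joint angle.
            "diffs": [round(x - y, 1) for x, y in zip(frames[i], standard)],
        }
    return report

# Illustrative database with two standard actions and two joint angles each.
standard_db = {"action 1": [90.0, 170.0], "action 2": [45.0, 120.0]}
frames = [[88.0, 172.0], [60.0, 150.0], [47.0, 118.0]]

report = evaluate(frames, standard_db)
for name, row in report.items():
    print(name, row)
```

The signed differences are what a trainer would inspect: a negative value means the joint is more closed than in the standard action, a positive value means it is more open.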

3.3. Sports Training Auxiliary Decision-Making Evaluation System

The exercise-assisted evaluation system we propose includes system login, account information modification, standard action database collection, assisted teaching, and overall exercise evaluation. The login window requires the user to enter the correct user name and password to enter the system. The account information module allows the user to change the account password promptly when the account behaves abnormally. The standard action database allows users to set key actions during exercise that can be used as the evaluation standard. The auxiliary teaching module allows users to practice exercises anytime and anywhere. The overall evaluation module filters the actions in the user's prerecorded exercise video and outputs all action data that are highly similar to the standard actions. The overall functional module diagram is shown in Figure 6.

4. Experiments and Results

4.1. Experimental Setup

All of the experiments in this article were run on a deep learning server with two NVIDIA GTX 1080 Ti graphics cards with 11 GB of memory. The system software platform was primarily developed in the Visual Studio 2015 environment. The open source computer vision library OpenCV, Microsoft's MFC interface library, the CUDA architecture, and cuDNN, the GPU acceleration library for deep neural networks, were all used in the development process. MFC is a set of basic class libraries created by Microsoft using the C++ programming language.

4.2. Dataset

This paper uses GolfDB as the experimental data set. GolfDB is a high-quality golf swing video data set created specifically for golf swing motion recognition. It contains 1,400 golf swing video samples, with a total of more than 390k frames of video data, collected manually from the YouTube video website. The samples draw mainly on 580 regular-speed and slow-motion golf swing videos. To ensure that the club is visible and to reduce motion blur, only 30 fps, 720p resolution video was sampled. The videos mainly capture the swing movements of 248 professional golfers from broadcast footage of the PGA, LPGA, and Champions Tours.

4.3. Analysis of Auxiliary Training Evaluation Results

The auxiliary teaching function is the concrete application of the evaluation method for a single movement. Users select the action they want to learn from the drop-down list box and click the detection button to start training. The computer enables its camera to capture the human body in real time and displays it in the picture control. At the same time, the body posture information and joint angle data of each frame are output in real time on the right side of the picture control. The eight read-only edit boxes on the far right display the current joint angles of the action in real time.

Taking the selected movement as the test target, the user makes the corresponding movement in front of the camera, and the system automatically identifies the joint angle data of the current posture and calculates its Euclidean distance from the selected standard movement. The role of the Euclidean distance in auxiliary teaching is to provide a stopping threshold: when the Euclidean distance between the joint angle vectors of the trainer's action and the standard action falls below a preset value, training stops and the results for the current action are output. The training results include a detailed comparison of the eight joint angles. After the exercise stops, the system displays the joint angle data of the last frame and of the standard movement in the list control in the lower right corner, so that trainers can observe the difference between their own movement and the standard movement and adjust accordingly.
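The stopping rule can be sketched as a simple loop over per-frame joint-angle vectors. In this hypothetical Python example the camera and pose estimation are replaced by a prepared list of angle vectors; the function name `train_until_match`, the threshold, and the data are assumptions chosen for illustration.

```python
import math

def train_until_match(angle_stream, standard, threshold):
    """Consume per-frame joint-angle vectors until one is close enough.

    Returns (frame_index, angles, distance) for the first frame whose
    Euclidean distance to the standard is below the threshold, or None
    if the stream ends first.
    """
    for i, angles in enumerate(angle_stream):
        distance = math.dist(angles, standard)
        if distance < threshold:
            return i, angles, distance
    return None

# Illustrative stream: the trainer gradually approaches the standard pose.
standard = [90.0, 180.0]
stream = [[60.0, 150.0], [80.0, 170.0], [89.0, 179.0], [90.0, 180.0]]

result = train_until_match(stream, standard, threshold=5.0)
if result is not None:
    frame, angles, distance = result
    # Per-joint differences that would be shown next to the standard values.
    diffs = [x - y for x, y in zip(angles, standard)]
    print(frame, distance, diffs)
```

Because the loop returns on the first qualifying frame, the reported angles are those of the last frame processed, which matches the behaviour described above.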

It can be seen from Table 1 that the Euclidean distance values of action 3 and action 6 are the smallest, which indicates that these actions comply with the standard. The Euclidean distances of action 1 and action 4 are larger, so those making sports training decisions should focus on intensive training and correction of action 1 and action 4.

In actual video recording, the speed at which the demonstrator moves does not influence the experimental results, because the system picks out all the motion frames that meet the conditions and lists the corresponding joint angle data. The user can select all the optimal actions, that is, the actions with the minimum distance value, as the basis of the whole set of actions. With the motion assistance evaluation system based on human posture recognition developed in this paper, the included angle at each joint can be obtained, and the distance between movements can be calculated as a Euclidean distance. The comparative analysis of the experiments shows that the system can be applied to auxiliary teaching and auxiliary evaluation in sports. Using the joint angle comparison data given by the system, athletes can find their own problems by checking the difference between their movements and the standard movements and make improvements accordingly. With more practice, they will get closer and closer to the standard movements.

5. Conclusion

In this article, we propose a motion assistance evaluation system that uses deep learning algorithms for human posture recognition. The system is mainly composed of three parts: standard motion database, auxiliary teaching, and overall evaluation. System users can customize the standard motion database to assist teaching. The auxiliary teaching part compares the user's actions with the standard actions and displays the result visually to the trainer in the form of data. The overall evaluation part can identify and filter video files to provide an intelligent training platform for trainers. In addition, simulation experiments demonstrate the effectiveness of the algorithm in this paper.

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.