Effectively identifying and correcting swimmers’ improper postures can significantly improve the quality of athletes’ daily swimming training. During swimming, the human body is prone to affine deformation, which produces low-brightness action feature regions and makes posture recognition and correction difficult. Without reliable detection and correction, coaches cannot identify and correct athletes’ improper posture in real time. Additionally, the human skeleton motion data from the Kinect depth camera contain a large amount of noise and few skeleton nodes, and their level of detail is low. To overcome these issues, this research proposes a network for enhancing Kinect skeleton motion data. The network is a stack of six bidirectional recurrent autoencoders. The stacking structure improves the smoothness and naturalness of the data, and the training phase includes hidden variable constraints to ensure that the skeleton motion data preserve a realistic bone shape when the level of detail is raised. The experiments demonstrate that the data optimized by the network are smoother and keep a more realistic bone structure, meeting the goal of obtaining high-precision motion capture data with low-precision Kinect equipment.

1. Introduction

Swimming starts mark the beginning of a swimming competition, and they are crucial to gaining an advantage and ultimately winning a race. In short-distance swimming competitions, the quality of the starting technique is a critical factor in determining the final result. The start time determines 25% of the final result over a distance of 25 m, around 20% over 50 m [1, 2], and approximately 10% over 100 m. The squat start is a crucial starting technique in short-distance competitions and is commonly used in the freestyle and butterfly events. Because of the ongoing rise in the world’s competitive swimming level and the increasingly fierce rivalry, the difference between high-level swimmers is frequently measured in millimeters. Any small modification or improvement in any technical link can therefore change the competition ranking. Although swimmers spend less effort on starting than on strokes and turns, a proper start is still the most important factor in achieving success in short-distance swimming competitions [3, 4].

At the World Swimming Championships, coaches and athletes from every country and every continent place a high value on the performance in each individual competition, and progress in competition performance is inextricably linked to the advancement of each athlete’s skills. Proper swimming posture helps athletes improve their swimming capabilities, whereas incorrect posture hinders that improvement and prevents optimal competition results. For this reason, proper assessment and adjustment of swimmers’ posture is critical for athletes who want to enhance their skills and attain good competition outcomes. Because water is a special medium, the coach’s spoken instruction is not effective for all athletes, and athletes have difficulty reproducing the coach’s movements accurately. Furthermore, because human demonstration is of limited use in swimming technique instruction, athletes find it difficult to correct their exact technical motions by listening alone. They cannot obtain feedback on their own movements and hence cannot reach the goal of enhancing their abilities [5–7].

As a result, it is critical to use multimedia technology to correct athletes’ incorrect posture as they compete. The recognition and correction of swimmers’ posture have emerged as a significant research topic among experts and scholars and have significant research value. Methods such as video recording, before-and-after comparison of action pictures, on-site shooting, and video replay have made it a popular topic of discussion. Research into methods for recognizing athletes’ incorrect posture has yielded some promising results. Some researchers use swimming movements to identify and correct the posture of swimmers. Other researchers employ a dynamic module matching algorithm to detect and identify incorrect athlete posture and to complete the recognition and correction of swimmers’ incorrect postures. However, when the swimming posture is affected by factors such as scale, noise, and distance, the image is susceptible to distortion [8, 9].

Game development, film and television production, sports training, and medical rehabilitation have all benefited from the availability of human skeletal motion data. Such data can be captured with high-precision motion capture technologies, such as the Vicon optical motion capture system and the Xsens sensor motion capture system. However, because this equipment is both expensive and cumbersome to wear, it is unlikely to become widely used. The Kinect depth camera can acquire human skeleton motion data in real time. It is inexpensive and simple to use; however, the motion data acquired contain just 25 skeletal nodes and a significant amount of noise [10].

As a result, the Kinect skeleton data must be optimized before they can be used meaningfully. Motion data optimization is classified into two categories: surface optimization, which refers to filling in missing values in motion data, and noise and outlier removal, which requires that the optimized data retain the spatiotemporal pattern and human kinematics information contained in the motion data. This paper aims to increase the quality of Kinect data by using deep neural networks. Deep neural networks can not only fill in gaps in the Kinect’s skeleton node information but also eliminate noise and outliers and raise the level of detail and accuracy of Kinect skeletal motion data, achieving the goal of acquiring high-precision motion capture data with low-precision equipment. The resulting data are used to accurately correct the swimmer’s starting posture, and the correction is then communicated to the coach.

2. Background

2.1. Related Work

To optimize the incomplete motion data captured by the Kinect, some researchers developed a real-time probabilistic framework based on the Gaussian process model that increases the accuracy of the recorded motion data. In the case of self-occlusion, the approach is also capable of generating high-quality motion. However, because of the high computational complexity of the Gaussian process, this strategy is only appropriate for small datasets. Other researchers have used Gaussian process-based local mixture approaches to speed up the solution; their methods allow the local model to be updated incrementally in real time and can therefore handle a larger number of data samples. Both of these strategies generate smooth motion by controlling the speed change between successive frames; however, neither can recover turning motion. Other researchers use a dimensionality reduction method based on principal component analysis to search a motion database for suitable motion data to fill in the missing values and then use a proportional-derivative controller to generate physically credible motion [4, 8, 11].

The accuracy of this motion recovery method is highly influenced by the content and size of the database used. Other researchers have presented a method that combines local pose estimation with global retrieval, employing a voting approach based on the Hausdorff distance to combine the two hypotheses into the final motion pose [12–14]. It takes into account the continuity of motion and can efficiently handle missing data caused by occlusion and noise; nevertheless, it recovers poorly for fast motion or motion that rotates more than 45° around the vertical axis. The non-deep-learning Kinect denoising methods discussed above can process some motion segments; however, their effectiveness on long continuous motion sequences that may contain arbitrary motion is unclear [15–18].

Methods for optimizing motion data based on deep learning have also developed steadily over the last few years. Some studies improve motion data using EBD and EBF neural networks. EBD fills in missing data by exploiting joint information and temporal correlation. EBF is needed because the motion data denoised by EBD still contain faults such as jitter and poor noise reduction, so EBD alone is insufficient. Additional smoothing is possible, but this method cannot increase the amount of information contained in the motion data [19–24]. Other studies map and invert Kinect motion data using CNN-based autoencoders, but the resulting motion data retain some jitter relative to the original data. Deep RNNs are used to process the three-dimensional coordinates and the velocities of joint points in Kinect data, and the outputs are denoted pDRNN and vDRNN, respectively. To ensure the naturalness of motion, some researchers employ the Kalman filter and the K-nearest neighbor algorithm to integrate and refine the results of pDRNN and vDRNN. This method requires additional postprocessing to achieve smooth and natural motion [25–29]. A BRA network has also been proposed to optimize Kinect skeleton motion data segments, but when the level of detail in the data needs to be increased, it is difficult for the BRA network to ensure the authenticity of the optimized data [30–32].

2.2. Motion Characteristics of the Start of the Grab Table

In swimming competitions, the goal of any starting posture and technique is for the athlete’s center of gravity to reach the maximum horizontal speed as quickly as possible, and the swimmer’s reaction and technical movements play a critical role in how quickly the start occurs. The horizontal component of the supporting reaction force on the body is the most important source of horizontal speed, and the characteristics of the preparatory posture and action determine its magnitude. As the body prepares to leave the platform, it should be fully stretched and its center of gravity should be pushed as far forward as possible. If the technical fundamentals are not learned, however, the athlete’s body will assume a sitting-back position when the starting signal is given, which increases the time required to leave the platform and reduces the effectiveness of the technique. The starting force is related to how long it takes to leave the platform. According to data from athletes’ daily training sessions, the average athlete’s time to leave the platform at the start is approximately 0.75 seconds. One of the most crucial factors for evaluating an athlete’s starting technique is the athlete’s response to the starting signal and the action that follows. The longer the athlete takes to leave the platform, the lower the force will be.

The start is the first movement in swimming, and several uncontrollable factors must be taken into account. For example, during daily training some swimmers have a short flight time but a long flight distance, while others have a long flight time but a short distance. Among the movement parameters following takeoff, the swimmer’s speed on leaving the platform is crucial; the grab table start gives a relatively good departure angle, and a good departure angle results in a better aerial phase. The athlete’s posture and time in the air can be plotted as graphs. The departure angle is one of the most critical parts of the grab table start technique. Once the athlete has left the platform, the whole movement path follows the law of motion for a parabola. Although the path of the athlete’s center of gravity cannot be changed while in the air, the athlete’s posture can be adjusted into the water-entry posture. When entering the water, it is necessary to completely reverse one’s posture, from up to down, and to adjust the angle of entry to the water to one’s position. The vertical axis remains unchanged.
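The parabolic flight described above can be illustrated with a short calculation. This is only a sketch under simplifying assumptions that are not taken from the paper: the center of gravity is treated as a point mass, air resistance is ignored, and the function name `flight` and the numeric values are hypothetical.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2


def flight(speed: float, angle_deg: float, height: float) -> tuple:
    """Return (flight_time_s, horizontal_distance_m) for a point mass.

    speed     -- speed of the center of gravity when leaving the platform (m/s)
    angle_deg -- departure angle relative to the horizontal (degrees, up is positive)
    height    -- height of the center of gravity above the water at departure (m)
    """
    if height < 0:
        raise ValueError("height must be non-negative")
    theta = math.radians(angle_deg)
    vx = speed * math.cos(theta)
    vy = speed * math.sin(theta)
    # Positive root of: height + vy * t - 0.5 * G * t**2 = 0
    t = (vy + math.sqrt(vy * vy + 2.0 * G * height)) / G
    return t, vx * t


# Hypothetical values, chosen only to show how the departure angle matters.
for angle in (-10.0, 0.0, 10.0):
    time_s, distance_m = flight(speed=4.5, angle_deg=angle, height=1.2)
    print(f"angle {angle:+.0f} deg: {time_s:.2f} s in the air, {distance_m:.2f} m")
```

Under these assumptions the departure speed and angle fix the whole path, which is why the posture can change in the air while the trajectory of the center of gravity cannot.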

The grab table start is the most commonly used starting technique in the training and competition of young swimmers. It is also the most effective starting technique. The swing-arm start and the grab table start are two of the most commonly used starting techniques today. The grab table start offers several advantages. First, in the preparation position on the starting platform, both hands grasp the platform so that the arms serve as support points, which makes the athlete’s body more stable and less likely to shift; this reduces fouls to some extent. Second, the center of gravity of the body can be shifted forward to the greatest extent possible during preparation, allowing the takeoff speed to be significantly increased with minimal effort. Third, the entry angle of young swimmers can be decreased to a significant degree during takeoff and entry into the water. Fourth, the initial speed can be increased to the greatest extent possible because the athlete’s time in the air is relatively short when using the grab table start, and the athlete leaves the platform and enters the water relatively quickly.

Relevant research shows that, because of their inexperience with the grab table start, young athletes frequently make a series of mistakes in their everyday training sessions. One example is failing to grasp the starting platform, which is unacceptable. Because the athlete grasps the starting platform with both hands, the arms can be pulled up quickly and the elbows bent when the starting signal is issued. This tilts the athlete’s body downward and maximizes the distance between the thigh and upper body; the center of gravity can then be moved forward, in front of the starting platform, which shortens the time to enter the water. Grasping the starting platform with both hands has another vital role, which is to hold the athlete’s body when the center of gravity moves beyond the support surface. The body can then remain relatively static for an extended length of time and efficiently manage the forward thrust it has stored during that time. At this point, athletes simply need to flatten their bodies, because the body will rush forward with the help of the stored forward force. There is another critical element that athletes must learn. The best takeoff angle is achieved when the line between the athlete’s center of gravity and the support points of both feet is at a 45-degree angle to the horizontal plane.
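The 45-degree criterion can be checked from two points in the sagittal plane. The sketch below is an illustration only: the function name, the coordinate convention (x forward, y up), and the sample coordinates are assumptions, and the paper does not state how the center of gravity is estimated from the skeleton.

```python
import math


def support_angle_deg(feet_xy: tuple, cog_xy: tuple) -> float:
    """Angle (degrees above the horizontal) of the line from the feet
    support point to the center of gravity, with x forward and y up."""
    dx = cog_xy[0] - feet_xy[0]
    dy = cog_xy[1] - feet_xy[1]
    return math.degrees(math.atan2(dy, abs(dx)))


# Hypothetical positions in meters.
angle = support_angle_deg(feet_xy=(0.0, 0.0), cog_xy=(0.55, 0.60))
print(f"support angle: {angle:.1f} deg, deviation from 45 deg: {angle - 45.0:+.1f}")
```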

In addition, when the athlete begins to take off, the body’s center of gravity has already moved beyond the starting platform. Athletes who use the grab table start only need to lower their heads, stretch their legs, pull with their arms, lean their bodies forward as far as they can, and then bend their knees and lower their bodies to begin. When the knee and ankle joints reach the most appropriate angle, the athlete extends the knees and ankles, swings the arms, and extends the hips all at the same time, so that the entire body produces a resultant force. However, if the athlete’s hands do not grab the starting platform, the body cannot lean forward fully, resulting in a flat entry into the water and a relatively slow entry speed, both of which harm the athlete’s overall performance. As a result, when training young swimmers, the coach should emphasize effective training in the grab table start. First and foremost, the coach should ensure that the athletes have a good understanding of the key points of the technique and their characteristics, so that young swimmers can master the advantages of the technique, apply the fundamentals of the movement flexibly, and thereby improve their performance [31].

3. Method

3.1. Motion Dataset

To accomplish the study’s objective, it is first necessary to construct a motion dataset that is collected simultaneously by two motion data collection devices, a Kinect and a motion capture device, and then to analyze that dataset. This project establishes a synchronized database using the Kinect and Noitom Perception Neuron motion capture sensors. Performers continuously practice grab table start movements in the acquisition environment, producing a series of long motion sequences. The data collected by Kinect are stored in joint coordinate format; each frame contains the three-dimensional coordinates of 25 joint points, giving a motion data frame with a dimension of 75 (25 × 3 = 75). The Noitom system also stores the motion data it collects in joint coordinate format. Each frame contains the three-dimensional coordinates of 59 joint points, yielding a total dimension of 177 per frame (59 × 3 = 177). Figures 1 and 2 show the skeleton structures generated from the motion data collected by the two devices [32].
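The frame dimensions follow directly from the joint counts. As a small illustration, the sketch below flattens a list of joints into one frame vector; the helper names and the x, y, z ordering within the vector are assumptions, not details given in the paper.

```python
from typing import List, Tuple

Joint = Tuple[float, float, float]

KINECT_JOINTS = 25  # joints per Kinect frame (from the text)
NOITOM_JOINTS = 59  # joints per Noitom frame (from the text)


def flatten_frame(joints: List[Joint]) -> List[float]:
    """Flatten [(x, y, z), ...] into [x0, y0, z0, x1, y1, z1, ...]."""
    return [coord for joint in joints for coord in joint]


def unflatten_frame(vector: List[float]) -> List[Joint]:
    """Inverse of flatten_frame."""
    if len(vector) % 3 != 0:
        raise ValueError("frame vector length must be a multiple of 3")
    return [(vector[i], vector[i + 1], vector[i + 2])
            for i in range(0, len(vector), 3)]


kinect_frame = [(0.0, 0.0, 0.0)] * KINECT_JOINTS
noitom_frame = [(0.0, 0.0, 0.0)] * NOITOM_JOINTS
print(len(flatten_frame(kinect_frame)))  # 75
print(len(flatten_frame(noitom_frame)))  # 177
```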

After the motion sequences have been acquired with the two types of equipment, the data must be preprocessed and synchronized. Spatial processing comes first: both kinds of data are converted into coordinates relative to node 0, the root node. The data are then synchronized in time. The sampling frequency of the Noitom motion capture is 60 frames per second, whereas the sampling frequency of the Kinect is unstable, at around 30 frames per second, so the sampling rate and time domain of the Noitom data are used as the benchmark. The Kinect data are upsampled to 60 frames per second using natural spline interpolation. Preprocessing yields a synchronized long-sequence motion dataset, of which 80% is used as the training set and 20% as the test set.
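The preprocessing steps above can be sketched in plain Python. This is a minimal illustration, not the authors’ implementation: the function names, the sample timestamps, and the chronological 80/20 split are assumptions, and the natural cubic spline is written out by hand so that the example has no dependencies.

```python
import bisect
from typing import List, Sequence, Tuple

Frame = List[float]  # flattened [x0, y0, z0, x1, y1, z1, ...]


def to_root_relative(frame: Sequence[float]) -> Frame:
    """Express every joint relative to joint 0 (the root node)."""
    root = frame[0:3]
    return [value - root[k % 3] for k, value in enumerate(frame)]


def natural_spline_moments(t: Sequence[float], y: Sequence[float]) -> List[float]:
    """Second derivatives of the natural cubic spline through (t, y)."""
    n = len(t) - 1
    moments = [0.0] * (n + 1)  # natural ends: moments[0] = moments[n] = 0
    c_prime = [0.0] * (n + 1)
    d_prime = [0.0] * (n + 1)
    for i in range(1, n):  # forward sweep of the tridiagonal solve
        h_prev, h_next = t[i] - t[i - 1], t[i + 1] - t[i]
        rhs = 6.0 * ((y[i + 1] - y[i]) / h_next - (y[i] - y[i - 1]) / h_prev)
        denom = 2.0 * (h_prev + h_next) - h_prev * c_prime[i - 1]
        c_prime[i] = h_next / denom
        d_prime[i] = (rhs - h_prev * d_prime[i - 1]) / denom
    for i in range(n - 1, 0, -1):  # back substitution
        moments[i] = d_prime[i] - c_prime[i] * moments[i + 1]
    return moments


def eval_spline(t, y, moments, x: float) -> float:
    """Evaluate the spline at x (end segments are used outside the range)."""
    i = min(max(bisect.bisect_right(t, x) - 1, 0), len(t) - 2)
    h = t[i + 1] - t[i]
    left, right = t[i + 1] - x, x - t[i]
    return ((moments[i] * left ** 3 + moments[i + 1] * right ** 3) / (6.0 * h)
            + (y[i] / h - moments[i] * h / 6.0) * left
            + (y[i + 1] / h - moments[i + 1] * h / 6.0) * right)


def resample(times: List[float], frames: List[Frame],
             target_times: List[float]) -> List[Frame]:
    """Interpolate every coordinate of the sequence at target_times."""
    if len(times) < 2 or len(times) != len(frames):
        raise ValueError("need at least two frames, one timestamp per frame")
    columns = []
    for d in range(len(frames[0])):
        y = [frame[d] for frame in frames]
        moments = natural_spline_moments(times, y)
        columns.append([eval_spline(times, y, moments, x) for x in target_times])
    return [list(row) for row in zip(*columns)]


def uniform_times(start: float, end: float, fps: float = 60.0) -> List[float]:
    """Timestamps at a fixed rate, using the 60 fps benchmark by default."""
    return [start + k / fps for k in range(int((end - start) * fps) + 1)]


def split_train_test(frames: list, train_fraction: float = 0.8) -> Tuple[list, list]:
    """Chronological split (an assumption; the paper gives only the ratio)."""
    cut = int(len(frames) * train_fraction)
    return frames[:cut], frames[cut:]


# Hypothetical jittery Kinect timestamps (seconds) and two-joint frames.
kinect_times = [0.000, 0.031, 0.068, 0.099, 0.134]
kinect_frames = [to_root_relative([1.0, 1.0, 2.0, 1.0 + t, 1.5, 2.0])
                 for t in kinect_times]
upsampled = resample(kinect_times, kinect_frames,
                     uniform_times(kinect_times[0], kinect_times[-1]))
train, test = split_train_test(upsampled)
print(len(kinect_frames), "->", len(upsampled), "frames;",
      len(train), "train,", len(test), "test")
```

Using the recorded Kinect timestamps as spline knots, rather than assuming an exact 30 fps grid, is one way to cope with the unstable Kinect frame rate mentioned in the text.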

The Kinect motion data segment is treated as the noisy data, and the Noitom motion capture data segment serves as the corresponding label.

The network structure of this paper is shown in Figure 3.

The input data of the network are the Kinect motion data segments, and the label data are the motion capture data segments. For this reason, this paper defines a mean square error between the network output and the label data.
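As an illustration of the mean square error between a network output segment and its label segment, consider the sketch below. The averaging over all frames and coordinates is an assumption about normalization; the function name is hypothetical.

```python
from typing import List, Sequence


def mse_loss(output: Sequence[Sequence[float]],
             target: Sequence[Sequence[float]]) -> float:
    """Mean squared error over all frames and coordinates of a segment."""
    if len(output) != len(target):
        raise ValueError("output and target need the same number of frames")
    total, count = 0.0, 0
    for out_frame, tgt_frame in zip(output, target):
        if len(out_frame) != len(tgt_frame):
            raise ValueError("frames need the same dimension")
        for o, t in zip(out_frame, tgt_frame):
            total += (o - t) ** 2
            count += 1
    return total / count if count else 0.0


# Two frames of a toy 3-dimensional segment.
print(mse_loss([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]],
               [[0.0, 0.0, 1.0], [1.0, 1.0, 1.0]]))
```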

Let the length of bone t in the ith frame of the stacked autoencoder output Y be the distance between the position coordinates of the bone’s two end nodes. A loss function on the bone length is then defined.
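A bone length term of this kind can be sketched as follows. The bone length is the Euclidean distance between the two end nodes, as in the text; comparing it with the bone length in the label data and averaging the squared differences is an assumption, and the `bones` list of joint index pairs is hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Bone = Tuple[int, int]  # joint indices of the bone's two end nodes


def bone_lengths(frame: Sequence[float], bones: Sequence[Bone]) -> List[float]:
    """Length of each bone in one flattened frame [x0, y0, z0, x1, ...]."""
    return [math.dist(frame[3 * a:3 * a + 3], frame[3 * b:3 * b + 3])
            for a, b in bones]


def bone_length_loss(output: Sequence[Sequence[float]],
                     target: Sequence[Sequence[float]],
                     bones: Sequence[Bone]) -> float:
    """Mean squared difference between output and label bone lengths."""
    total, count = 0.0, 0
    for out_frame, tgt_frame in zip(output, target):
        pairs = zip(bone_lengths(out_frame, bones), bone_lengths(tgt_frame, bones))
        for out_len, tgt_len in pairs:
            total += (out_len - tgt_len) ** 2
            count += 1
    return total / count if count else 0.0


# Toy skeleton with two joints and one bone between them.
bones = [(0, 1)]
output = [[0.0, 0.0, 0.0, 3.0, 4.0, 0.0]]  # bone length 5
target = [[0.0, 0.0, 0.0, 0.0, 0.0, 3.0]]  # bone length 3
print(bone_length_loss(output, target, bones))
```

A term like this penalizes a change in bone length even when the joint positions are individually close to the labels, which is how it helps keep a realistic bone shape.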

To ensure that the optimized motion sequence transitions smoothly between adjacent frames, this paper imposes a smoothness constraint on the output data of the stacked autoencoder.
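One common way to express such a smoothness constraint is to penalize second-order differences between neighboring frames. The sketch below uses that form as an assumption; the exact constraint used in the paper is not recoverable from the text, and the function name is hypothetical.

```python
from typing import Sequence


def smoothness_loss(frames: Sequence[Sequence[float]]) -> float:
    """Mean squared second-order difference across consecutive frames."""
    if len(frames) < 3:
        return 0.0
    total, count = 0.0, 0
    for prev, cur, nxt in zip(frames, frames[1:], frames[2:]):
        for p, c, n in zip(prev, cur, nxt):
            total += (p - 2.0 * c + n) ** 2
            count += 1
    return total / count


steady = [[0.0], [1.0], [2.0], [3.0]]  # constant velocity
jitter = [[0.0], [1.0], [0.0], [1.0]]  # frame-to-frame jitter
print(smoothness_loss(steady), smoothness_loss(jitter))
```

With this form, motion at constant velocity incurs no penalty, while jitter between adjacent frames is penalized heavily.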

The loss is

The perceptual loss is

The first stage uses only the training dataset to train the perceptual autoencoder and imposes the mean square error constraint, bone length constraint, and smoothness constraint on the output of the perceptual autoencoder. The loss is

On the basis of hidden variable constraints, the second stage trains the stacked autoencoder. A mean squared error constraint, a bone length constraint, and a smoothness constraint are applied to the output of each autoencoder within the stacked bidirectional recurrent autoencoder, and the hidden variable constraint is imposed as well; together these give the second-stage loss.
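The two training stages can be summarized by how the individual terms are combined. The sketch below assumes a plain weighted sum with hypothetical weights, and assumes the hidden variable term is added for every autoencoder in the stack; the paper’s actual weights and the exact form of the hidden variable term are not given in the text.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class LossWeights:
    """Hypothetical weights; the source does not state any values."""
    mse: float = 1.0
    bone: float = 1.0
    smooth: float = 1.0
    hidden: float = 1.0


def stage_one_loss(terms: Dict[str, float], w: LossWeights) -> float:
    """Stage 1: perceptual autoencoder with MSE, bone length, smoothness."""
    return w.mse * terms["mse"] + w.bone * terms["bone"] + w.smooth * terms["smooth"]


def stage_two_loss(per_autoencoder_terms: List[Dict[str, float]],
                   w: LossWeights) -> float:
    """Stage 2: the same three terms on the output of every autoencoder
    in the stack, plus the hidden variable term."""
    total = 0.0
    for terms in per_autoencoder_terms:
        total += stage_one_loss(terms, w) + w.hidden * terms["hidden"]
    return total


weights = LossWeights()
# Toy loss values for a stack of two autoencoders (the paper uses six).
stack_terms = [
    {"mse": 0.30, "bone": 0.05, "smooth": 0.02, "hidden": 0.10},
    {"mse": 0.20, "bone": 0.03, "smooth": 0.01, "hidden": 0.08},
]
print(stage_one_loss(stack_terms[0], weights))
print(stage_two_loss(stack_terms, weights))
```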

In the process of athlete pose recognition and correction, the athlete pose image is separated into foreground pixels and background pixels. The positioning of each joint point when the athlete swims is

Analyze the located joint points. Equation (10) is used to extract the feature points of each joint of the athlete’s limbs during swimming.

Median filtering is employed in image preprocessing because it effectively suppresses non-linear noise. Each pixel value is replaced by the median of the values in its neighborhood. This brings the pixel value close to the surrounding true values, so that isolated noise points are eliminated. The advantages of this technique are that it removes noise quickly and is simple to operate. It can effectively preprocess digital images under particular conditions [28–30].
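A median filter of this kind can be sketched in a few lines. The square window, the default 3 × 3 size, and the edge handling (repeating border pixels) are assumptions; the paper does not specify them.

```python
from statistics import median
from typing import List


def median_filter(image: List[List[float]], radius: int = 1) -> List[List[float]]:
    """Replace each pixel by the median of its (2*radius+1)^2 neighborhood.

    Border pixels are handled by clamping indices to the image edge.
    """
    rows, cols = len(image), len(image[0])
    filtered = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [
                image[min(max(r + dr, 0), rows - 1)][min(max(c + dc, 0), cols - 1)]
                for dr in range(-radius, radius + 1)
                for dc in range(-radius, radius + 1)
            ]
            filtered[r][c] = median(window)
    return filtered


# A flat patch with one isolated noise point in the middle.
noisy = [
    [10, 10, 10],
    [10, 255, 10],
    [10, 10, 10],
]
print(median_filter(noisy))
```

Because the isolated bright pixel is never the median of its neighborhood, it is removed without blurring the flat region around it.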

4. Results

With the continuous improvement of domestic information technology, DV video hardware is maturing, and the recognition and monitoring of moving-target image features are widely used in a number of industries, including swimming. The first step is to acquire images using the Kinect; the second step is to detect and recognize swimming target poses in a timely manner using the method given in Section 3. The athlete’s target action is estimated from captured depth images, which reduces the challenging prediction problem to a relatively straightforward classification problem based on depth image capture and tracking. The swimming posture of the human body is the tracking and detection target, and the Kinect is used to collect depth images in order to monitor moving targets more effectively. The position of the moving target can be calculated using the approach outlined above, and this position information serves as a critical foundation for the later retrieval and recognition of improper swimming postures.

The validity and stability of the algorithm are checked by measuring the precision and speed with which swimming posture images of moving targets are captured. In test experiment 1, the long-sequence data of different athletes at the start are recorded, as shown in Figures 4–6.

It can be seen that, for all three athletes, our method captures the athletes’ poses well.

To further study the differences between the method described in this research and KPC, a standard posture capture algorithm, we compare the two methods’ success rates and matching times, as shown in Figures 7 and 8. Our method outperforms the KPC method in both success rate (more than 92%) and matching time (much shorter than that of the KPC method).

5. Conclusion

Effectively identifying and correcting swimmers’ incorrect postures can considerably improve the quality of athletes’ daily swimming training. Without reliable detection and correction, coaches cannot notice and correct athletes’ poor posture in real time. Additionally, the human skeleton motion data from the Kinect depth camera are noisy, with few skeleton nodes and a low level of detail. To address these issues, this study presents a network for enhancing Kinect skeleton motion data. The network is a stack of six bidirectional recurrent autoencoders. The stacking structure enhances the smoothness and naturalness of the data, and the training phase incorporates hidden variable constraints to ensure that the skeleton motion data retain a realistic bone shape when the level of detail is increased. The experiments demonstrate that the data optimized by the network are smoother and maintain a more realistic bone structure, accomplishing the goal of high-precision motion capture using low-precision Kinect equipment.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.