Abstract

Sports science and technology are increasingly being explored and applied in competitive sports. The emergence and spread of video tracking and capture technology have made the officiating of many sports events fairer. Linear trajectory capture can replay a player’s actions in real time, the flight path of the shuttlecock can be analyzed in 3D, and the trajectory of the shuttlecock can be calculated more accurately. In this paper, a target trajectory tracking and prediction model is constructed based on a motion cognition algorithm, and the motion characteristics of the target are extracted from its limited historical trajectory to achieve more accurate trajectory tracking. The trajectory tracking model is then applied to a target tracking framework to obtain the desired tracking results. In addition, to exploit the interaction between scene information and the target, the trajectory tracking model is improved. A neural-network-based trajectory prediction model is constructed that learns pedestrian motion characteristics offline from the pedestrian trajectory data of the target tracking scene and uses its “memory” online to generate implicit deep motion features of the target from the target’s limited historical information. It also predicts the most likely future location of the target and calculates the motion similarity between targets. Finally, a simulation experiment platform is built to demonstrate the effectiveness of the proposed trajectory tracking model and target tracking algorithm. The results of this paper can help verify referees’ decisions on key points, which helps maintain the fairness of the game, and can help athletes optimize their training on a scientific basis and improve their performance.

1. Introduction

With the continuous improvement of modern badminton players’ physical fitness, athletic ability, and technical and tactical level, modern badminton has developed rapidly. Matches have become more and more intense: the shuttlecock travels faster and faster, and play at the net has become increasingly fierce, to the point that it is difficult for the human eye to follow. As a result, it is difficult to guarantee the accuracy of decisions during a match, and referees often make wrong calls or miss calls. In a close match, at a decisive point, a single misjudgment by the referee can waste the efforts of a team over the whole match [1].

With the emergence of eye movement recording technology, researchers began to explore the peripheral mechanism of elite athletes’ processing of sports information. They used eye trackers to study the differences of eye movement indicators in the decision-making process of athletes at different levels. They combined the peripheral observation indicators of eye movement characteristics with the internal decision-making characteristics in rapid response scenarios to understand the visual search mode of elite athletes’ cognitive processing [2]. Abernethy’s research results show that elite athletes use a “low search rate” approach to visual search to reduce the information processing load of stimulus perception and improve the speed and accuracy of motor decision-making [3].

The global nearest neighbor standard filter (GNNSF) [4] is one of the data association methods widely used in early multitarget tracking research. It considers all possible associations within a properly gated region and generates the most likely association hypothesis by solving a 2D binary assignment problem. Reference [5] proposes the joint probabilistic data association (JPDA) method. Unlike GNNSF, which uses only a single observation to update a track, JPDA considers all observations that pass the gating threshold. The method in [6] is a continuation of JPDA in which the state update of a track is given as a weighted combination of the feasible measurements; that is, the expectations over all association hypotheses are combined. Multiple hypothesis tracking (MHT) [7] is a deferred-decision method that first retains all possible data association hypotheses and then uses subsequently received observations to resolve the ambiguous hypotheses of the current frame. Whereas JPDA determines the most likely hypothesis to associate with each track at every time step, MHT propagates the current hypotheses and uses later data to make a better estimate. It also provides a formulation that handles the complete life cycle of a track, including birth, growth, and termination. The MHT algorithm is essentially an extension of the Kalman filter (KF) algorithm to multitarget tracking [8].
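The gated assignment step of a global nearest neighbor tracker can be illustrated with a small example. The following is a minimal sketch, not the formulation of [4]: it assumes Euclidean distance as the association cost, a hypothetical gate radius `gate` that is also charged as the cost of leaving a track unassigned, and brute-force enumeration of assignments. A practical tracker would likely use an efficient assignment solver such as the Hungarian algorithm. The function name `gnn_associate` is illustrative.

```python
import math
from itertools import permutations


def gnn_associate(tracks, detections, gate):
    """Return {track_index: detection_index} with minimum total cost.

    tracks, detections: lists of (x, y) positions.
    gate: maximum distance allowed for a pair (assumed value); it is also
          used as the cost of leaving a track without a detection.
    Brute force over all assignments, so only suitable for a few targets.
    """
    n_tracks = len(tracks)
    # Each track takes one detection index or None (no detection).
    slots = list(range(len(detections))) + [None] * n_tracks
    best_cost, best_assignment = math.inf, {}
    for candidate in set(permutations(slots, n_tracks)):
        cost, assignment, feasible = 0.0, {}, True
        for t, d in enumerate(candidate):
            if d is None:
                cost += gate  # penalty for an unassigned track
                continue
            dist = math.dist(tracks[t], detections[d])
            if dist > gate:  # pair lies outside the gated region
                feasible = False
                break
            cost += dist
            assignment[t] = d
        if feasible and cost < best_cost:
            best_cost, best_assignment = cost, assignment
    return best_assignment


tracks = [(0.0, 0.0), (10.0, 0.0)]
detections = [(9.0, 1.0), (1.0, 0.0)]
matches = gnn_associate(tracks, detections, gate=5.0)
```

In this example each track is paired with the detection closest to it, because the crossed pairing falls outside the gate and leaving tracks unassigned costs more.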

Linear trajectory capture determines the position of a specific target in a given video sequence and keeps that estimate correct over time. In the three-tier structure of computer vision, target tracking is an intermediate-level task and is the basis of high-level tasks such as action recognition [9]. According to the number of tracked objects, video target tracking can be divided into single object tracking (SOT) and multiple object tracking (MOT). Single target tracking follows only one target in the video. A single target tracking algorithm mainly models the target to distinguish it from the background, and it focuses on designing sophisticated appearance or motion models to cope with challenges such as scale change, rotation, and illumination change [10]. In addition to solving the problems encountered in single target tracking, the most important task of a multitarget tracking algorithm is to match multiple targets across multiple images, that is, to solve the data association problem [11].

This paper addresses the MOT problem of tracking the players and the shuttlecock in badminton matches. Given the input video sequence, multiple targets are first detected and located, the labels of all targets are maintained over time, and individual tracks are generated by associating the targets across consecutive frames. The network flow model is used to complete the data association task in multitarget tracking. Trajectory prediction error comparison experiments, multitarget tracking index comparison experiments, and multitarget tracking visualization comparison experiments are carried out on public data sets to verify the effectiveness of the trajectory prediction model and the multitarget tracking model.

The main innovations of this paper are as follows:
(1)A trajectory prediction model that considers image scene information is constructed
(2)A gradient optimization algorithm is selected, and the parameters of the model are updated by back propagation until the model converges
(3)A complete multitarget tracking framework is constructed based on the trajectory prediction model, and the improved trajectory prediction model is applied to the multitarget tracking framework

2. Motion Model

Unlike SOT, which focuses on building complex appearance models of objectives to distinguish objectives and backgrounds, most MOT methods not only take appearance as the core part but also take the movement characteristics of objectives and the interaction between objectives as an important part [12].

The motion model is used to capture the dynamic behavior of the target. It can estimate the possible position of the target in the future frame image, thus reducing the search space. In most cases, it is assumed that the target is moving smoothly in the real world and image (except for sudden movement). According to the motion of the target, the existing motion models can be simply divided into the linear motion model and nonlinear motion model. The commonly used linear motion models include linear fitting and Kalman filtering. Figures 1 and 2 show the comparison between a set of linear motion models and nonlinear motion models [13].

2.1. Linear Motion Model

The linear motion model assumes that the velocity of the objective remains constant. Based on this assumption, the velocity smoothness and position smoothness of the objective are mainly considered when constructing the linear motion model [14].

Velocity smoothness is modeled by constraining the change in the objective’s velocity across consecutive frames, as shown in Eq. (1).

In Eq. (1), is the velocity smoothness, is the objective coordinate of the frame picture with time , is the objective coordinate of the next time of , is the velocity, and is the trajectory radian [15].

Position smoothness is directly constrained to estimate the difference between position and detected position to model position smoothness. The motion similarity between and of the trajectory fragment was calculated, and the position estimation or trajectory prediction of the trajectory fragment was carried out from the beginning and end of and , respectively. The estimated time length is , and the velocity is calculated by the position difference of the two frames at the end of the trajectory fragment [16]. The probability to which trace fragments and can be correlated in terms of motion similarity is calculated by Eq. (2).
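The forward and backward position estimation between two trajectory fragments can be sketched in a few lines. This is a minimal sketch under stated assumptions, not a reproduction of Eq. (2): velocity is taken from the two frames at the end (or start) of each fragment, positions are extrapolated linearly over a gap of `gap` frames, and the prediction errors are turned into an affinity with a Gaussian kernel whose width `sigma` is a hypothetical parameter.

```python
import math


def end_velocity(track):
    """Velocity from the last two frames of a fragment."""
    (x0, y0), (x1, y1) = track[-2], track[-1]
    return (x1 - x0, y1 - y0)


def start_velocity(track):
    """Velocity from the first two frames of a fragment."""
    (x0, y0), (x1, y1) = track[0], track[1]
    return (x1 - x0, y1 - y0)


def motion_affinity(track_a, track_b, gap, sigma=5.0):
    """Affinity in (0, 1] between fragment A and a later fragment B.

    gap: number of frames between the last frame of A and the first of B.
    sigma: assumed tolerance (in pixels) of the Gaussian kernel.
    """
    vax, vay = end_velocity(track_a)
    vbx, vby = start_velocity(track_b)
    tail, head = track_a[-1], track_b[0]
    # Forward: extrapolate the tail of A to the first frame of B.
    forward = (tail[0] + vax * gap, tail[1] + vay * gap)
    # Backward: extrapolate the head of B back to the last frame of A.
    backward = (head[0] - vbx * gap, head[1] - vby * gap)
    err_f = math.dist(forward, head)
    err_b = math.dist(backward, tail)

    def kernel(err):
        return math.exp(-err * err / (2.0 * sigma * sigma))

    return kernel(err_f) * kernel(err_b)


track_a = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
same_target = [(5.0, 0.0), (6.0, 0.0)]
other_target = [(5.0, 4.0), (6.0, 4.0)]
aff_same = motion_affinity(track_a, same_target, gap=3)
aff_other = motion_affinity(track_a, other_target, gap=3)
```

A fragment that continues the straight-line motion of `track_a` receives the maximum affinity, while a fragment offset from the predicted position receives a lower one.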

2.1.1. Nonlinear Motion Model

In the existing MOT methods, the linear motion model is usually used to model the objective motion. However, in some cases, the linear motion model cannot explain the motion state of the objective very well. The nonlinear motion model is used to calculate the similarity between track segments more accurately. If the linear motion model is used to calculate the motion similarity between track segments and , the two cannot be associated together, but actually, the two track segments belong to the same objective [17]. In this case, if the objective is modeled based on the intermediate trajectory segment , and can be successfully associated [18]. However, there are two main problems in the nonlinear motion model. One is how to determine which nonlinear assumption the motion characteristics of a specific objective meet. Unreasonable assumptions may lead to greater errors in the prediction results. The second is to propose nonlinear motion hypothesis for each objective in each scene, which has poor generality and portability [19].

2.2. Interaction Model

(1)The interaction model describes the interaction between an objective and other objectives and the environment as it moves. For players in particular, each player considers not only his or her own movement pattern but also cooperation with other players, so it is not enough to consider the movement of each objective independently [20]
(2)During the game, there may be interaction between different players. For example, several players tend to move in parallel toward the same hitting track, which is known in the model as group attraction. At the same time, players subconsciously avoid collisions and conflicts with their teammates (exclusion), which is known in the model as the repulsive force between objectives. Eq. (3) shows the common attractive model [21]

In Eq. (3), represents the speed of the objective, and represents the distance between objective and objective . The closer and more consistent the two objectives are, the more attractive they will be, and the smaller the corresponding energy functions will be [22].

In Eq. (4), is the possible collision distance between objective and objective . The smaller the distance is, the greater the energy of this term will be, and the greater the influence on the motion decision of objective will be, so as to avoid the collision or conflict between them [23].

(3)Players try to avoid foul play such as hitting the ball over the net. As shown in Eq. (5), the repulsive force is used to describe the influence of the center net on players’ movement [24]

In Eq. (5), is the repulsive effect of the center net on players’ sports, is the rule of the center net, and is the boundary coordinate of the center net [25].

The next move is determined by environmental factors after the player has identified the receiving point. As shown in Eq. (6), the destination energy function is used to depict the influence of destination on players’ sports [26].

In Eq. (6), is the catch point of objective , and is the coordinate of objective .

Players’ decisions may also be influenced by specific circumstances in the environment. For example, multiple players may judge the same hitting point and move in the same direction at the same time, as shown in Figure 3.

As shown in Eq. (7), attractiveness is used to characterize the impact of hitting points on players’ movement.

In Eq. (7), is the impact of hitting point on players’ sports attraction, and is the distance from hitting point.
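The qualitative behavior of the attraction and repulsion terms can be illustrated with a small example. The functional forms below are assumptions chosen only to match the descriptions in the text (attraction energy decreases as two players get closer and move more consistently; repulsion energy increases as the distance shrinks) and are not the paper's Eqs. (3) and (4). The parameters `lam` and `sigma` are hypothetical.

```python
import math


def attraction_energy(pos_i, vel_i, pos_j, vel_j, lam=1.0):
    """Assumed form: smaller for players that are close and move alike."""
    distance = math.dist(pos_i, pos_j)
    velocity_gap = math.dist(vel_i, vel_j)
    return distance + lam * velocity_gap


def repulsion_energy(pos_i, pos_j, sigma=1.0):
    """Assumed form: larger when the two players are closer together."""
    d = math.dist(pos_i, pos_j)
    return math.exp(-d * d / (2.0 * sigma * sigma))


# Two teammates moving in parallel versus two players far apart.
near = attraction_energy((0.0, 0.0), (1.0, 0.0), (1.0, 0.0), (1.0, 0.0))
far = attraction_energy((0.0, 0.0), (1.0, 0.0), (6.0, 0.0), (-1.0, 0.0))
rep_near = repulsion_energy((0.0, 0.0), (0.5, 0.0))
rep_far = repulsion_energy((0.0, 0.0), (3.0, 0.0))
```

With these assumed forms, a pair of nearby players moving in parallel has lower attraction energy than a distant pair moving in opposite directions, and the repulsion energy grows as the players approach each other.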

3. Improvement of Trajectory Prediction Model

3.1. Trajectory Prediction Model Optimization Algorithm

In order to add badminton court information to the trajectory model to depict its influence on players’ sports decisions, this paper improves the model LSTMv and constructs a new loss function to train the model. The trajectory prediction model considering image scene information is called background LSTM, abbreviated as LSTMb, and the corresponding network parameter is [27]. Based on the obstacle boundary information, hitting point information, and landing point information in the scene, the structure of the trajectory prediction model is shown in Figure 4.

Different from LSTMv, the input information of the LSTMb model is not but , in which, in addition to the coordinate information of the objective, the boundary information at time , the hitting information, and the landing point information are also included. The four kinds of information are, respectively, mapped by the linear layer into dimension vectors and connected together, as shown in Eq. (8).

In Eq. (8), and are the position coordinates of the objective at time , and and are the network and boundary coordinates in the background at time . Similarly, , , and are the coordinates of hitting points and falling points in the background at time . These coordinates are, respectively, the information vectors of the scenario generated by different embedding layers as the input information of the network. Similar to LSTMv, is the embedding function activated with ReLU, and is the embedding parameter matrix. The hidden layer neuron state of the network at the final time is calculated by Eq. (9).
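The construction of the network input from four embedded groups can be sketched as follows. This is a minimal sketch, assuming each group is a 2D coordinate, each embedding is a linear layer followed by ReLU, and the embedding size `EMBED_DIM` is 4. The weights are random placeholders, the example coordinates are made up for illustration, and the recurrent update of Eq. (9) is not included.

```python
import random

EMBED_DIM = 4  # assumed embedding size per information group


def make_linear(in_dim, out_dim, rng):
    """Create placeholder weights and zero bias for a linear layer."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(in_dim)]
               for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weights, bias


def embed(vec, layer):
    """Linear layer followed by ReLU."""
    weights, bias = layer
    return [max(0.0, sum(w * x for w, x in zip(row, vec)) + b)
            for row, b in zip(weights, bias)]


def build_input(position, boundary, hit_point, landing_point, layers):
    """Embed the four groups separately and concatenate the results."""
    groups = (position, boundary, hit_point, landing_point)
    features = []
    for group, layer in zip(groups, layers):
        features.extend(embed(group, layer))
    return features


rng = random.Random(0)
layers = [make_linear(2, EMBED_DIM, rng) for _ in range(4)]
x_t = build_input((1.0, 2.0), (0.0, 6.7), (3.0, 1.5), (4.0, 0.5), layers)
```

The resulting vector has four times the length of a single embedding, which is consistent with the later remark that the input layer of LSTMb is four times as wide as that of LSTMv.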

According to Eq. (9), the trajectory information of the badminton and scene information of the image are embedded into the input of the model, and the hidden layer state of the network at the last moment is taken as the depth motion feature of the objective. According to this motion feature, the corresponding to the most likely location of the objective in the future is generated [28].

In the model training stage, the empirical risk function is built on the energy functions of Eqs. (4)–(7) and is given in Eq. (10):

In Eq. (10), are the weighted coefficients of the energy function , respectively. The specific label information in is added to the corresponding training sample from the manual line. The three energy functions are used to constrain the restriction of scene information on the movement of players.

In the experiment, the parameter in the perspective coefficient is set as and . During model training, network parameter is trained by minimizing the loss function in Eq. (10) [29].
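A loss of this kind, a prediction error plus weighted scene energy terms, can be sketched as follows. This is only an illustration under stated assumptions: the prediction error is taken as the mean squared displacement between predicted and labeled positions, the energy values are supplied by the caller, and the weights are hypothetical. It is not the exact form of Eq. (10).

```python
def empirical_risk(predicted, labeled, energies, weights):
    """Mean squared displacement plus a weighted sum of energy terms.

    predicted, labeled: lists of (x, y) positions of equal length.
    energies: scene energy values already evaluated for this sample.
    weights: one assumed weighting coefficient per energy term.
    """
    squared_errors = [(px - lx) ** 2 + (py - ly) ** 2
                      for (px, py), (lx, ly) in zip(predicted, labeled)]
    prediction_error = sum(squared_errors) / len(squared_errors)
    scene_penalty = sum(w * e for w, e in zip(weights, energies))
    return prediction_error + scene_penalty


risk = empirical_risk(
    predicted=[(0.0, 0.0), (1.0, 1.0)],
    labeled=[(0.0, 0.0), (1.0, 2.0)],
    energies=[1.0, 2.0, 3.0],
    weights=[0.1, 0.2, 0.3],
)
```

In this example the prediction error is 0.5 and the weighted energy sum is 1.4, so a prediction that violates the scene constraints is penalized even when its positional error is small.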

During model training, the training data should be obtained according to the training data generation method of the LSTMv model, and scene information labels and data should be added for each group of training data, including coordinates of scene information and experience parameters and . The hyperparameters of the model are shown in Table 1.

The hyperparameters of the LSTMb model are determined by grid search, and they differ from the hyperparameter settings of the LSTMv model mainly in the following respects:

Dimension of the input layer is as follows: as the input data of the LSTMb model is composed of four groups of embedding vectors, the dimension of the input layer is four times that of the LSTMv model, namely, [25].

Dimension of the hidden layer is as follows: compared with the LSTMv model, the LSTMb model needs stronger feature expression ability, so more neurons are needed in the hidden layer.

Number of iterations is as follows: more complex models require more iteration training to converge well.

Based on the structure and parameters of LSTMv, a targeted input structure, hidden layer structure, and loss function are designed to obtain a new trajectory prediction model, LSTMb. LSTMb learns more reasonable trajectory patterns for the scene from the scene information input into the network and excludes in advance trajectory predictions that should not occur in that scene.

3.2. Processing of Interaction between Objectives

In order to improve the authenticity of trajectory prediction results, the interaction between objectives is considered on the basis of the LSTMb model.

In a match, two or more players who attack or defend together form a group. Group detection is required before groups can be handled; that is, it must first be determined whether multiple players belong to the same group.

3.2.1. Grouping Test

Players in a group tend to go at the same speed and keep a steady distance from the rest of the group. The grouping detection task is to determine whether player and player belong to the same grouping. In the multiobjective tracking framework of this paper, according to a given pair of track fragments , the label is used to determine whether the two belong to the same group, where means that the two belong to the same group, and vice versa. This is a binary classification problem based on paired trajectory fragments.

An SVM classifier is trained from the training data by defining a feature function :

The feature function is composed of the following:
(1)The normalized histogram of the distance of 
(2)The normalized histogram of the absolute value of the velocity difference of 
(3)The normalized histogram of the absolute value of the directional difference of 

and are the position coordinates and velocity of overlap time of the trajectory fragment. These three histogram features are connected to form the input features of SVM classifier [26].
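The three histogram features can be computed as in the following sketch. It assumes that positions and velocities are given for the overlapping frames of the two fragments, that the velocity difference is measured as the norm of the difference vector, and that the bin count and value ranges (`bins`, `max_dist`, `max_speed`) are hypothetical choices. The resulting vector would then be passed to an SVM classifier, which is not shown here.

```python
import math


def normalized_histogram(values, bins, low, high):
    """Histogram with equal-width bins, normalized to sum to 1."""
    counts = [0] * bins
    width = (high - low) / bins
    for v in values:
        index = int((v - low) / width)
        index = min(max(index, 0), bins - 1)  # clamp out-of-range values
        counts[index] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * bins


def pair_features(pos_a, pos_b, vel_a, vel_b,
                  bins=4, max_dist=10.0, max_speed=5.0):
    """Concatenate distance, velocity-gap, and direction-gap histograms."""
    distances = [math.dist(p, q) for p, q in zip(pos_a, pos_b)]
    velocity_gaps = [math.dist(u, v) for u, v in zip(vel_a, vel_b)]
    direction_gaps = []
    for u, v in zip(vel_a, vel_b):
        gap = abs(math.atan2(u[1], u[0]) - math.atan2(v[1], v[0]))
        direction_gaps.append(min(gap, 2.0 * math.pi - gap))
    return (normalized_histogram(distances, bins, 0.0, max_dist)
            + normalized_histogram(velocity_gaps, bins, 0.0, max_speed)
            + normalized_histogram(direction_gaps, bins, 0.0, math.pi))


pos_a = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
pos_b = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
vel = [(1.0, 0.0)] * 3
features = pair_features(pos_a, pos_b, vel, vel)
```

For two players moving side by side at the same velocity, all the mass of each histogram falls into its first bin, which is the pattern a classifier would be expected to associate with players in the same group.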

Accordingly, the prediction grouping results of track fragment AA are given in the following form:

3.2.2. Grouping Processing

In the track fragment, players belong to the same group. Due to the attraction between objectives within the group, the single frame distance between them should remain stable on the field, while the independent vanilla LSTM model or background LSTM did not consider the attraction of the two, resulting in the increasing distance between the two in the prediction results, as shown in Figure 5.

Grouping processes are the predicted results of the detection response of the trace fragment.

A new loss function is constructed to correlate the LSTM model corresponding to , and the prediction results of the two were made more reliable through online fine-tuning. is shown in Eq. (13) [30].

In Eq. (13), represent, respectively, the prediction results of the independent LSTM model pair and the prediction results of the associated model during parameter iteration, is the penalty item describing objective attraction in the group, and is the penalty item coefficient. In the equation, the first two guarantees that the prediction results of the new model are not too different from the original prediction results, while the last guarantees that the objectives are attractive to some extent.

To depict the attraction of the objective within the group, the penalty items are as follows:

In Eq. (14), represent the position coordinates of frame in , is the average distance of , is the length of model prediction results, and is the weighted coefficient. In Eq. (14), the specific functions of each HH term are as follows: (1)The first two items and guarantee that the prediction results of by the associated LSTM model are not far from the prediction results of the original single LSTM model to an average degree [31]. This is because the label used in the training of the original model has certain reliability, and the revised prediction results should not be too far from the original label

The last item causes the position difference of each frame in the prediction result of by the associated LSTM model to satisfy , and is a constant greater than 0. At the same time, as the number of frames increases, the restriction of the current frame should be slightly reduced, which is represented by the weighted coefficient in the equation. According to the above description, the grouping processing is completed through the LSTM model of the two objectives in the association group, and the prediction results are obtained.
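The structure of such a grouping loss can be sketched as follows. This is a hedged illustration, not the paper's Eqs. (13) and (14): it assumes two fidelity terms (mean squared deviation of the refined predictions from the original ones) and an attraction penalty that is active only when the per-frame distance deviates from the average distance `mean_dist` by more than a tolerance `eps`. The per-frame decay `gamma`, the coefficient `lam`, and the hinge form are all assumptions.

```python
import math


def grouping_loss(orig_i, orig_j, new_i, new_j, mean_dist,
                  lam=1.0, eps=0.5, gamma=0.9):
    """Fidelity to the original predictions plus an attraction penalty.

    orig_*: predictions of the independent models, lists of (x, y).
    new_*:  predictions of the associated models during fine-tuning.
    mean_dist: average distance between the two players in the fragment.
    """
    n = len(new_i)
    fidelity_i = sum(math.dist(p, q) ** 2 for p, q in zip(new_i, orig_i)) / n
    fidelity_j = sum(math.dist(p, q) ** 2 for p, q in zip(new_j, orig_j)) / n
    penalty = 0.0
    for t, (p, q) in enumerate(zip(new_i, new_j)):
        excess = abs(math.dist(p, q) - mean_dist) - eps
        # Later frames are constrained slightly less (decaying weight).
        penalty += (gamma ** t) * max(0.0, excess)
    return fidelity_i + fidelity_j + lam * penalty


player_i = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
stable_j = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
drifting_j = [(0.0, 1.0), (1.0, 2.0), (2.0, 3.0)]
loss_stable = grouping_loss(player_i, stable_j, player_i, stable_j, 1.0)
loss_drift = grouping_loss(player_i, drifting_j, player_i, drifting_j, 1.0)
```

When the predicted distance between the two players stays at the average distance, the loss is zero. When the predictions drift apart, the penalty becomes positive, which is what drives the online fine-tuning toward a stable distance.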

4. Experimental Tests

4.1. Test Environment

In this paper, the 2D MOT 2015 data set is selected. It has 22 video sequences, of which 11 form the training set with labeling information and the other 11 form the test set. For each frame of all sequences, the platform gives the corresponding objective detection results; that is, for each frame of the training set and test set, the position and size of all objectives are obtained through a specific objective detector (the detection results may contain missed detections, false detections, and deviations of the detection box) [32]. In the training set, in addition to the detection results of each frame, the platform also provides the manually annotated real tracks with all detection results correctly numbered, which is called the ground truth (GT). In the test set, users need to number all detection results with their own tracking algorithms and connect the objectives with the same number in consecutive frames to form the trajectory of each objective. The test platform is shown in Figure 6.

4.2. Multiobjective Tracking Experimental Results and Analysis

Given the input video sequence, the multiobjective tracking algorithm proposed in this paper uses the appearance features of the objectives to construct the trajectory fragments and calculates the appearance similarity between the fragments [33]. The trajectory fragments are then predicted by the LSTMv model, and the motion similarity is calculated from the predicted results. Finally, the network flow model is used to complete the data association, and the tracking results of all objectives in the video sequence are obtained. To analyze the effectiveness of this method, the experiments include a trajectory prediction error comparison and a multiobjective tracking index comparison.

4.2.1. Comparison of the Effectiveness of Detection Algorithm

As for the detection results of a frame in three test sets, it can be seen that the common detection algorithm may suffer from false detection and missed detection, which is one of the important reasons affecting the effect of the multiobjective tracking algorithm, as shown in Figure 7.

Figure 7 shows the tracking results of a certain frame in a training set. Rectangles with different colors represent different objectives, and rectangles with the same objective in adjacent frames have the same color. This result is given by the detection and label of the training set, which is shown in Table 2.

4.2.2. Comparison of Trajectory Prediction Errors

In MOT systems, the most commonly used trajectory prediction model is the linear prediction model. It predicts the possible future position of an objective under the assumption of linear motion, compares the prediction with subsequent detections, calculates the similarity between objectives, and completes the association between them. The error of the linear model can be ignored when only the position in the adjacent frame is predicted, but when positions in nonadjacent frames or over multiple frames are predicted, the error of the linear model is likely to interfere with and confuse the data association. To analyze the LSTMv trajectory prediction model proposed in this paper on real MOT data, trajectory prediction experiments were carried out on the MOT validation set with both this model and the traditional linear model, and the average position error and the final position error of the two models were calculated and compared. The validation set and the test set are taken from the training set and the test set of the MOT Challenge platform, respectively, and the results are shown in Figure 8.

The prediction results of the LSTMv model are clearly better than those of the linear prediction model. Averaged over all sequences, the mean position error of the linear model in the 16 sequences is 8.48, and its mean terminal position error is 12.85, which is about 1.52 times the average position error. In other words, relative to the average position error, the terminal position error of the linear model increases by about 52%. By the same calculation, the terminal position error of the LSTMv model increases by 33%, which indicates that the prediction error of the LSTMv model grows more slowly when the position of the objective is predicted over multiple frames.
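The two error measures and the relative increase quoted above can be reproduced with a short calculation. The sketch below assumes that the average position error is the mean Euclidean distance over all predicted frames and that the terminal position error is the distance in the last predicted frame; the function names are illustrative. Only the figures 8.48 and 12.85 come from the text.

```python
import math


def displacement_errors(predicted, truth):
    """Return (average position error, terminal position error)."""
    errors = [math.dist(p, t) for p, t in zip(predicted, truth)]
    return sum(errors) / len(errors), errors[-1]


def relative_increase(average_error, terminal_error):
    """Growth of the terminal error relative to the average error."""
    return (terminal_error - average_error) / average_error


# Toy trajectory: the prediction drifts by one unit per frame.
avg_err, end_err = displacement_errors(
    [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)],
    [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)],
)

# Reported figures for the linear model.
linear_ratio = 12.85 / 8.48
linear_increase = relative_increase(8.48, 12.85)
```

With the reported figures, `linear_ratio` is about 1.52 and `linear_increase` is about 0.52, which matches the 52% increase stated for the linear model.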

Through trajectory prediction error comparison experiments, detection algorithm effectiveness analysis experiments, and multitarget tracking comparison experiments, the effectiveness of the proposed method is verified. Better motion similarity is obtained, and more accurate tracking results are achieved. It is proved that the improved LSTMI model has better tracking results.

5. Conclusion

This paper constructs a trajectory prediction model based on LSTM that learns the motion characteristics of badminton matches from a large amount of player and badminton track data in real match scenes and uses its “memory” to generate the hidden deep motion characteristics of the objective from the linear trajectory of the badminton in order to predict the badminton track. The main contributions are as follows:
(1)Based on the appearance model, motion model, and data association, a complete multiobjective tracking framework is constructed to realize the multiobjective tracking task in video sequences
(2)To analyze the effectiveness of the framework and its components in depth, the data sets and evaluation indexes commonly used in the field of multiobjective tracking are adopted, and the trajectory prediction model and multiobjective tracking method are tested on these data sets
(3)The effectiveness of this tracking method is verified by quantitative comparison with, and qualitative analysis against, other MOT methods on the MOT Challenge platform

In future work, it will be necessary to study badminton players’ body movements, take the flexibility of the players’ limbs and their sports injuries as reference indexes, and track the point of peak force in the players’ nonlinear motion trajectories, so as to model sports cognitive ability more accurately.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Authors’ Contributions

The authors of the manuscript “Analysis of Badminton Movement Cognition Algorithm Based on Track Linear Capture” declare the following contributions to the creation of the manuscript: Zhiwei Wang contributed to the conceptualization, resources, methodology, and review. Yuxiang Hu contributed to the original draft and writing—review and editing.

Acknowledgments

The study was supported by “Science and Technology Project of China Railway Corporation, China (Grant No. 1341324011).”