#### Abstract

A sports-assisted education method based on a support vector machine (SVM) is proposed to address a common problem: complex and variable sports actions easily cause ghosting in target detection and high-dimensional feature extraction, and both lower the accuracy of sports action recognition. The ViBe target detection algorithm is improved with the Wronskian function and the "4-linked" seed filling algorithm, which effectively solves the ghosting problem and yields clearer human sports targets. A genetic algorithm fuses the sports action features extracted by the eight-star model with those extracted by Zernike moments, which reduces redundant features while preserving the separability between classes. Sports actions are then classified by SVM classifiers constructed with a one-to-one strategy. The results show that the proposed method recognizes sports actions effectively, with an average recognition accuracy of more than 96%, so it can assist physical education and has practical application value.

#### 1. Related Work

With the popularity of artificial intelligence technology in various industries, machine vision is playing an increasingly important role in people’s lives. Sports action recognition, a hot research direction in machine vision, is widely used in sports analysis and sports-assisted education. It is built on target detection: sports action features are extracted and analyzed, and an automatic classifier performs the recognition. At present, commonly used target detection algorithms mainly include hybrid Gaussian background modeling, visual background extraction (ViBe), and the average background model [1]. For example, Farag Wael applied hybrid Gaussian background modeling to target detection for autonomous vehicles and achieved fast, real-time detection. Zhao Xiaolei et al. applied the ViBe algorithm to multiscale target detection in high-resolution remote sensing images [2]. Zhao Weidong et al. detected steel defects with the average background model [3]. Compared with hybrid Gaussian background modeling and the average background model, the ViBe algorithm offers good fault tolerance, high computing speed, and high detection accuracy [4]. Therefore, this paper uses the ViBe algorithm as the target detection algorithm for sports action recognition. For feature extraction, the main methods include the eight-star model and Zernike moments, among other approaches. Liu Jing et al. used an eight-star model to extract features of hyperspectral remote sensing images and improve their recognition [5]. Wang et al. used Zernike moments to extract MRI image features and classified them with an SVM for T-stage prediction of rectal cancer [6]. Because sports actions are complex, multifeature fusion helps improve the accuracy of subsequent action recognition. Therefore, this paper uses a genetic algorithm to fuse the features extracted by the eight-star model with those extracted by Zernike moments.
For image classification and recognition, the main approaches are the dynamic time warping algorithm and probability-based statistical recognition methods. The former is usually not used for complex sports action classification because it is susceptible to noise [7]. The probability-based statistical methods include hidden Markov models and SVM models. The SVM model is highly usable and generalizes well on high-dimensional pattern recognition and nonlinear problems, which makes it suitable for the sports action classification in this paper [8]. Thus, SVM is selected as the classifier, and sports actions are classified through target detection and multifeature fusion to achieve sports-assisted education.

#### 2. Basic Algorithms

##### 2.1. Target Detection Algorithm

###### 2.1.1. An Introduction to ViBe Algorithm

The ViBe algorithm is a pixel-level detection algorithm that occupies little hardware memory and achieves a high recognition rate. Its specific steps are as follows:

(1) Background modeling: let $v(x)$ be the value of pixel $x$ and let $M(x)$ be the set of $N$ background sample values of $x$. The background model is

$$M(x)=\{v_1, v_2, \ldots, v_N\}$$

(2) Foreground detection: the new pixel value $v(x)$ is compared with $M(x)$. If $v(x)$ is close to the sample values in $M(x)$, $x$ is a background point. Let $S_R(v(x))$ be the sphere with $v(x)$ as the center and $R$ as the radius. Figure 1 illustrates this in a two-dimensional Euclidean space with components $C_1$ and $C_2$. With $\#$ denoting the number of elements in the intersection of the sets and $\#_{\min}$ the decision threshold, $x$ is classified as background when

$$\#\bigl\{S_R(v(x)) \cap \{v_1, v_2, \ldots, v_N\}\bigr\} \ge \#_{\min}$$

If the number of samples of $M(x)$ inside $S_R(v(x))$ is less than $\#_{\min}$, $x$ is a foreground pixel.

(3) Background model updating: $P_G$ is a pixel in the eight-neighborhood of a randomly chosen point $x$ in the background model, as in Figure 2(a). When the input pixel $P_t$, as in Figure 2(b), is judged to be background, $P_G$ is updated. Spatial randomness means that pixels in the eight-neighborhood are randomly replaced by $v(x)$.
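The three ViBe steps can be sketched in Python for a grayscale frame stored as a list of lists. This is a minimal illustration, not the authors' implementation: the parameter values (N = 20, R = 20, #min = 2, update rate 1/16) are assumptions taken from the defaults reported in the original ViBe publication, and all function names are hypothetical.

```python
import random

# Assumed parameters (ViBe publication defaults); the text does not specify them.
N_SAMPLES = 20    # N, number of samples in M(x)
RADIUS = 20       # R, radius of the sphere S_R(v(x)) in gray levels
MIN_MATCHES = 2   # #min, decision threshold
SUBSAMPLE = 16    # a background pixel updates a model with probability 1/16


def neighbors8(y, x, h, w):
    """In-bounds eight-neighbourhood of pixel (y, x); falls back to the pixel itself."""
    nb = [(y + dy, x + dx)
          for dy in (-1, 0, 1) for dx in (-1, 0, 1)
          if (dy or dx) and 0 <= y + dy < h and 0 <= x + dx < w]
    return nb or [(y, x)]


def init_model(frame, rng):
    """Step (1): fill M(x) with values drawn from random 8-neighbours in the first frame."""
    h, w = len(frame), len(frame[0])
    model = [[None] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            nb = neighbors8(y, x, h, w)
            model[y][x] = [frame[ny][nx] for ny, nx in (rng.choice(nb) for _ in range(N_SAMPLES))]
    return model


def segment_and_update(frame, model, rng):
    """Steps (2) and (3): return a foreground mask (1 = foreground) and update M in place."""
    h, w = len(frame), len(frame[0])
    mask = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            v = frame[y][x]
            # #{S_R(v(x)) ∩ M(x)}: samples within distance R of the new value v(x)
            matches = sum(1 for s in model[y][x] if abs(s - v) < RADIUS)
            if matches < MIN_MATCHES:
                mask[y][x] = 1          # foreground pixel
                continue
            # Background: randomly replace one of this pixel's own samples...
            if rng.randrange(SUBSAMPLE) == 0:
                model[y][x][rng.randrange(N_SAMPLES)] = v
            # ...and propagate v(x) into the model of a random neighbour P_G
            if rng.randrange(SUBSAMPLE) == 0:
                ny, nx = rng.choice(neighbors8(y, x, h, w))
                model[ny][nx][rng.randrange(N_SAMPLES)] = v
    return mask
```

With a uniform background of gray level 100 used for initialization, a later frame containing a 3 × 3 patch at level 200 yields exactly those nine pixels as foreground.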


###### 2.1.2. ViBe Algorithm Improvement

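The ViBe algorithm has the advantage of building the background model from the first frame of the video sequence, but this also causes ghost regions: if a moving object is present in the first frame, its pixels enter the background model and are later detected as a spurious foreground region after the object moves away. To solve this problem, this paper improves the algorithm with the Wronskian function [9] and the "4-linked" (4-connected) seed filling algorithm [10], as shown in Figure 3. The improvement steps are as follows:

(1) After the collected data are preprocessed, the ViBe algorithm is used to detect the moving target.

(2) The pixel values are judged according to equation (3) to obtain the ghost region:

$$W(x,y)=\frac{1}{n}\sum_{i=1}^{n}\left(\frac{I_t(i)}{I_{t-1}(i)}\right)^{2}-\frac{1}{n}\sum_{i=1}^{n}\frac{I_t(i)}{I_{t-1}(i)} \tag{3}$$

where $I_t(i)$ and $I_{t-1}(i)$ are the gray values of pixel $i$ at moments $t$ and $t-1$, and the sums run over the $n$ pixels in a neighborhood of $(x, y)$.

(3) The ghost region is filled with the "4-linked" seed filling algorithm, and the moving target is obtained by median filtering.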
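A minimal sketch of ghost suppression, assuming the Wronskian measure W = mean((I_t/I_{t-1})²) − mean(I_t/I_{t-1}) evaluated on a 3 × 3 window, followed by a 4-connected seed fill and a 3 × 3 median filter. The decision rule is a hypothetical choice for illustration: a foreground region in which fewer than 5% of pixels satisfy |W| > 0.1 is treated as a ghost and filled with background. The thresholds, window size, and function names are not values from the paper.

```python
from collections import deque


def wronskian(cur, prev, y, x, half=1):
    """Wronskian measure over the (2*half+1)^2 window centred on pixel (y, x)."""
    h, w = len(cur), len(cur[0])
    ratios = [cur[j][i] / max(prev[j][i], 1)        # max(., 1) avoids division by zero
              for j in range(max(0, y - half), min(h, y + half + 1))
              for i in range(max(0, x - half), min(w, x + half + 1))]
    n = len(ratios)
    return sum(r * r for r in ratios) / n - sum(ratios) / n


def region4(mask, seed):
    """4-connected seed fill: collect the foreground region that contains seed."""
    h, w = len(mask), len(mask[0])
    seen, queue = {seed}, deque([seed])
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):  # 4-neighbours only
            if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] == 1 and (ny, nx) not in seen:
                seen.add((ny, nx))
                queue.append((ny, nx))
    return seen


def remove_ghosts(mask, cur, prev, thresh=0.1, min_changed=0.05):
    """Fill foreground regions that barely change between t-1 and t with background (0)."""
    h, w = len(mask), len(mask[0])
    out = [row[:] for row in mask]
    visited = set()
    for y in range(h):
        for x in range(w):
            if out[y][x] == 1 and (y, x) not in visited:
                region = region4(out, (y, x))
                visited |= region
                changed = sum(abs(wronskian(cur, prev, j, i)) > thresh for j, i in region)
                if changed / len(region) < min_changed:   # static region: treat as ghost
                    for j, i in region:
                        out[j][i] = 0
    return out


def median3(mask):
    """3x3 median filter; border pixels use the in-bounds part of the window."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            win = sorted(mask[j][i]
                         for j in range(max(0, y - 1), min(h, y + 2))
                         for i in range(max(0, x - 1), min(w, x + 2)))
            out[y][x] = win[len(win) // 2]
    return out
```

Deciding per connected region, rather than per pixel, keeps uniformly colored moving targets intact even when their interior pixels do not change between consecutive frames.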

##### 2.2. Feature Extraction Algorithm

###### 2.2.1. Eight-Star Model

The eight-star model extracts sports action features as follows. Assume the silhouette contains $N$ pixel points $(x_i, y_i)$; the centroid coordinates $(x_c, y_c)$ of the sports pose silhouette are

$$x_c=\frac{1}{N}\sum_{i=1}^{N}x_i,\qquad y_c=\frac{1}{N}\sum_{i=1}^{N}y_i$$

According to $(x_c, y_c)$, the moving target is divided into four parts (top, bottom, left, and right), and the Euclidean distance from the extreme point $(x_k, y_k)$ of the silhouette contour in each part to the center of mass is

$$d_k=\sqrt{(x_k-x_c)^{2}+(y_k-y_c)^{2}}$$

We connect the center of mass with the contour extreme point of each part and calculate the angle between each line and the horizontal:

$$\theta_k=\arctan\frac{y_k-y_c}{x_k-x_c}$$

By enclosing the sports posture silhouette in an ellipse, the eccentricity is used to measure the amplitude of sports movements:

$$e=\frac{\sqrt{a^{2}-b^{2}}}{a}$$

where $b$ is the semi-minor axis of the ellipse (taken from the silhouette height) and $a$ is the semi-major axis (taken from the silhouette width). Based on the above information, the sports action feature vector extracted by the eight-star model is

$$F_{\text{star}}=\left[d_1,\ldots,d_K,\ \theta_1,\ldots,\theta_K,\ e\right]$$

where $K$ is the number of contour extreme points.
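The eight-star feature steps can be sketched as follows for a silhouette given as a list of (x, y) pixel coordinates. Two details are assumptions, since the text does not fix them: each pixel is assigned to the top/bottom or left/right part by its dominant offset from the centroid, and the extreme point of a part is its pixel farthest from the centroid. The eccentricity takes the larger of the half-width and half-height as a, so that e stays real for tall silhouettes; the function name is hypothetical.

```python
import math


def eight_star_features(points):
    """points: (x, y) silhouette pixels in image axes (y grows downward).

    Returns [d_top, d_bottom, d_left, d_right, theta_top, ..., theta_right, e].
    """
    n = len(points)
    xc = sum(x for x, _ in points) / n                  # centroid x_c
    yc = sum(y for _, y in points) / n                  # centroid y_c
    parts = {"top": [], "bottom": [], "left": [], "right": []}
    for x, y in points:
        dx, dy = x - xc, y - yc
        if abs(dy) >= abs(dx):                          # assumed split rule
            parts["top" if dy < 0 else "bottom"].append((x, y))
        else:
            parts["left" if dx < 0 else "right"].append((x, y))
    dists, angles = [], []
    for name in ("top", "bottom", "left", "right"):
        pts = parts[name]
        if not pts:                                     # empty part: zero features
            dists.append(0.0)
            angles.append(0.0)
            continue
        ex, ey = max(pts, key=lambda p: math.hypot(p[0] - xc, p[1] - yc))  # extreme point
        dists.append(math.hypot(ex - xc, ey - yc))                        # d_k
        angles.append(math.degrees(math.atan2(yc - ey, ex - xc)))         # theta_k vs horizontal
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    width = max(xs) - min(xs) + 1
    height = max(ys) - min(ys) + 1
    a, b = max(width, height) / 2, min(width, height) / 2
    ecc = math.sqrt(1 - (b / a) ** 2)                   # e = sqrt(a^2 - b^2) / a
    return dists + angles + [ecc]
```

For a 3 × 9 rectangular silhouette, the top and bottom distances are √17, the left and right distances are 1, and e = √(8/9).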

###### 2.2.2. Zernike Moments

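Zernike moments are obtained by projecting the image onto a set of orthogonal complex-valued functions defined on the unit circle, which have the form

$$V_{nm}(\rho,\theta)=R_{nm}(\rho)\,e^{jm\theta}$$

where $\rho$ and $\theta$ are the distance of pixel $(x, y)$ in the unit circle from the origin and its angle with the $x$-axis, and $R_{nm}(\rho)$ is the radial polynomial

$$R_{nm}(\rho)=\sum_{s=0}^{(n-|m|)/2}\frac{(-1)^{s}(n-s)!}{s!\left(\frac{n+|m|}{2}-s\right)!\left(\frac{n-|m|}{2}-s\right)!}\,\rho^{\,n-2s}$$

with $|m|\le n$ and $n-|m|$ even.

From the properties of Zernike polynomials, the image $f(x,y)$ has a unique expansion

$$f(x,y)=\sum_{n}\sum_{m}Z_{nm}V_{nm}(\rho,\theta)$$

where $Z_{nm}$ is the Zernike moment, defined as

$$Z_{nm}=\frac{n+1}{\pi}\sum_{x}\sum_{y}f(x,y)\,V_{nm}^{*}(\rho,\theta),\qquad x^{2}+y^{2}\le 1$$

The Zernike-moment feature vector is formed from seven such moments $Z_{nm}$.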
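A direct, slow but transparent computation of a Zernike moment for a square image can make the definitions concrete. It follows the standard forms V_nm = R_nm(ρ)e^{jmθ} and Z_nm = (n+1)/π Σ f(x,y)V*_nm(ρ,θ) over pixels inside the unit circle. Mapping pixel centres onto [−1, 1] and multiplying by the pixel area (2/size)² so that the sum approximates the continuous integral are assumptions of this sketch; the paper does not state its discretization, and the function names are hypothetical.

```python
import cmath
import math


def radial_poly(n, m, rho):
    """Radial polynomial R_nm(rho); requires |m| <= n and n - |m| even."""
    m = abs(m)
    return sum((-1) ** s * math.factorial(n - s)
               / (math.factorial(s) * math.factorial((n + m) // 2 - s)
                  * math.factorial((n - m) // 2 - s))
               * rho ** (n - 2 * s)
               for s in range((n - m) // 2 + 1))


def zernike_moment(img, n, m):
    """Z_nm of a square image mapped onto the unit disk; pixels outside it are ignored."""
    if abs(m) > n or (n - abs(m)) % 2:
        raise ValueError("need |m| <= n and n - |m| even")
    size = len(img)
    total = 0j
    for r in range(size):
        for c in range(size):
            x = (2 * c + 1 - size) / size          # pixel centre mapped to [-1, 1]
            y = (size - 2 * r - 1) / size          # row index flipped so y points up
            rho = math.hypot(x, y)
            if rho > 1 or img[r][c] == 0:
                continue
            theta = math.atan2(y, x)
            v_conj = radial_poly(n, m, rho) * cmath.exp(-1j * m * theta)  # V*_nm
            total += img[r][c] * v_conj
    pixel_area = (2 / size) ** 2                   # assumed area element
    return (n + 1) / math.pi * total * pixel_area
```

Because rotating the image only shifts θ, the magnitude |Z_nm| is rotation invariant; a 90° rotation of the pixel grid leaves |Z_31| unchanged up to rounding, which is one reason moment magnitudes are popular as shape features.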

##### 2.3. Classification Recognition Algorithm

SVM is a classification algorithm that performs nonlinear classification by a kernel method. The core idea of the SVM algorithm is to use mathematical methods to construct the optimal classification surface in the original space or the projected high-dimensional space, so that the given binary categories can be distinguished [11]. The specific procedure is as follows:

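Suppose the input data is $x$, and $x$ is mapped to a high-dimensional space by a nonlinear mapping function $\varphi(x)$, as shown in Figure 4. The estimation function $f(x)=\omega^{\mathrm{T}}\varphi(x)+b$ then estimates linearly in that space by minimizing

$$R(\omega)=\frac{1}{2}\lVert\omega\rVert^{2}+C\sum_{i=1}^{l}L_{\varepsilon}\bigl(d_i,f(x_i)\bigr),\qquad L_{\varepsilon}(d,f)=\max\bigl(0,\lvert d-f\rvert-\varepsilon\bigr) \tag{15}$$

where $C$ is the penalty coefficient (the larger its value, the stronger the penalty), $L_{\varepsilon}$ is the $\varepsilon$-insensitive loss function, and $d_i$ is the true output for sample $x_i$. With the relaxation (slack) variables $\xi_i,\xi_i^{*}\ge 0$, minimizing (15) is equivalent to minimizing $\frac{1}{2}\lVert\omega\rVert^{2}+C\sum_{i=1}^{l}(\xi_i+\xi_i^{*})$ subject to $d_i-f(x_i)\le\varepsilon+\xi_i$ and $f(x_i)-d_i\le\varepsilon+\xi_i^{*}$. Replacing the dot product $\varphi(x_i)^{\mathrm{T}}\varphi(x_j)$ with a kernel function $K(x_i,x_j)$ and applying the Wolfe dual [12], the dual problem of (15) is

$$\max_{\alpha,\alpha^{*}}\ -\frac{1}{2}\sum_{i=1}^{l}\sum_{j=1}^{l}(\alpha_i-\alpha_i^{*})(\alpha_j-\alpha_j^{*})K(x_i,x_j)+\sum_{i=1}^{l}d_i(\alpha_i-\alpha_i^{*})-\varepsilon\sum_{i=1}^{l}(\alpha_i+\alpha_i^{*}) \tag{16}$$

The constraints of (16) are [13] $\sum_{i=1}^{l}(\alpha_i-\alpha_i^{*})=0$ and $0\le\alpha_i,\alpha_i^{*}\le C$, where $\alpha_i$ and $\alpha_i^{*}$ are Lagrange multipliers.

Solving with the Lagrange multiplier method, the estimated function $f(x)$ can be written as [13]

$$f(x)=\sum_{i=1}^{l}(\alpha_i-\alpha_i^{*})K(x_i,x)+b$$

where $K(x_i,x)$ is the kernel function of the SVM.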
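To make the ε-insensitive loss, the dual constraints, and the kernel expansion f(x) = Σ(α_i − α_i*)K(x_i, x) + b concrete, the sketch below evaluates them with a radial basis function kernel K(u, v) = exp(−γ‖u − v‖²), the kernel type used in Section 3.2. The multipliers, bias, and γ in the usage check are made-up numbers; in practice they come from solving the dual problem, for example with LIBSVM. Function names are hypothetical.

```python
import math


def eps_insensitive_loss(d, f, eps):
    """L_eps(d, f) = max(0, |d - f| - eps): errors inside the eps-tube cost nothing."""
    return max(0.0, abs(d - f) - eps)


def rbf_kernel(u, v, gamma):
    """Radial basis function kernel exp(-gamma * ||u - v||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def decision_function(x, support_vectors, alpha, alpha_star, b, gamma):
    """f(x) = sum_i (alpha_i - alpha_i^*) K(x_i, x) + b."""
    return sum((a - a_s) * rbf_kernel(sv, x, gamma)
               for sv, a, a_s in zip(support_vectors, alpha, alpha_star)) + b


def dual_feasible(alpha, alpha_star, C, tol=1e-9):
    """Check sum(alpha - alpha^*) = 0 and 0 <= alpha_i, alpha_i^* <= C."""
    balanced = abs(sum(a - a_s for a, a_s in zip(alpha, alpha_star))) < tol
    boxed = all(0 <= v <= C for v in list(alpha) + list(alpha_star))
    return balanced and boxed
```

With support vectors (0, 0) and (1, 1), α = (1, 0), α* = (0, 1), b = 0, and γ = 1, the function gives f((0, 0)) = 1 − e^{−2}; the same multipliers are feasible for C = 1 but not for C = 0.5.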


Although SVM still has some limitations when facing complex and variable sports actions, its strong usability and generalization ability make it a suitable classifier. Therefore, this paper selects SVM to build a classification model that identifies sports actions and thereby assists sports education.

#### 3. Multifeature Fusion-Based SVM Sports Action Recognition Classification Method

##### 3.1. Feature Fusion

Feature extraction is central to sports action recognition [14]. Taking into account the characteristics of sports actions, the comprehensiveness of feature extraction, and the description of local features, this paper uses the eight-star model and Zernike moments, both commonly used for multifeature extraction of sports postures, to extract multiple sports action features. To reduce feature redundancy and the dimensionality of the feature vector, the extracted features are fused with a genetic algorithm [15]:

(1) Each feature is encoded in binary, with "0" indicating that the feature is not selected and "1" indicating that it is selected.

(2) The initial population is generated and the fitness of every individual is calculated. Individuals are selected for inheritance according to a predetermined strategy, such as stochastic universal ("random traversal") sampling, and unselected individuals are eliminated.

(3) New individuals are generated with a crossover probability of 0.7 [16] and a mutation probability of 0.5 [17].

(4) Iteration continues until the termination condition is satisfied.
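The four fusion steps can be sketched as a binary-coded genetic algorithm for feature selection. This is illustrative only: single-point crossover, a "flip one random gene" mutation applied per individual with probability 0.5, stochastic universal sampling for selection, a fixed generation budget as the termination condition, and the toy fitness function are all assumptions, because the text does not specify the operators or the fitness actually used.

```python
import random


def sus_select(population, fitnesses, k, rng):
    """Stochastic universal sampling: k evenly spaced pointers on the fitness wheel."""
    step = sum(fitnesses) / k
    pointer = rng.uniform(0, step)
    chosen, i, cum = [], 0, fitnesses[0]
    for _ in range(k):
        while cum < pointer and i < len(fitnesses) - 1:
            i += 1
            cum += fitnesses[i]
        chosen.append(population[i][:])      # copy so later mutation does not alias
        pointer += step
    return chosen


def ga_select_features(n_features, fitness, pop_size=20, generations=30,
                       p_cross=0.7, p_mut=0.5, seed=0):
    """Binary-coded GA: gene 1 = feature selected, 0 = not selected.

    pop_size should be even and fitness must be positive (needed by the wheel).
    """
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    best = max(pop, key=fitness)[:]
    for _ in range(generations):                       # stop rule: fixed generation budget
        parents = sus_select(pop, [fitness(ind) for ind in pop], pop_size, rng)
        children = []
        for a, b in zip(parents[0::2], parents[1::2]):
            if rng.random() < p_cross:                 # single-point crossover
                cut = rng.randrange(1, n_features)
                a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
            children += [a, b]
        for child in children:
            if rng.random() < p_mut:                   # mutation: flip one random gene
                g = rng.randrange(n_features)
                child[g] = 1 - child[g]
        pop = children
        candidate = max(pop, key=fitness)
        if fitness(candidate) > fitness(best):         # keep the best individual seen
            best = candidate[:]
    return best


def toy_fitness(ind, useful=(0, 3, 5)):
    """Hypothetical fitness: reward 'useful' features, penalise feature count.

    Stays positive for up to 10 features.
    """
    return 1 + sum(ind[i] for i in useful) - 0.1 * sum(ind)
```

Calling `ga_select_features(10, toy_fitness)` returns a 10-bit selection mask; with a real system the fitness would instead score, for example, validation accuracy of the classifier on the selected features.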

##### 3.2. Recognition Model Construction

Using a one-to-one approach, a multiclass classifier is constructed from SVMs with a radial basis function kernel; the implementation process is shown in Figure 5. For $k$ classes, the one-to-one approach requires $k(k-1)/2$ binary SVMs, so four classes of samples require six SVMs. For a pair of classes a and b, class a is set as the positive samples and class b as the negative samples, and the classifier SVMab is trained. During voting, a test sample is input to every classifier to obtain the cumulative votes for the four categories, and the label with the largest cumulative vote is taken as the classification result. Taking SVMab as an example, if the output is class a, the vote count of a is incremented, sum(*i*) = sum(*i*) + 1 with *i* = a, and the label with the maximum vote count is finally selected.
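The one-to-one voting scheme can be illustrated independently of any SVM library. In this sketch each pairwise classifier is any function whose positive output means "class a"; `centroid_pair_trainer` is a hypothetical nearest-centroid stand-in for the RBF-kernel SVMs, used only so that the example runs.

```python
from itertools import combinations


def train_ovo(classes, train_pair):
    """One binary classifier per unordered class pair: k(k-1)/2 classifiers in total."""
    return {(a, b): train_pair(a, b) for a, b in combinations(classes, 2)}


def predict_ovo(x, classes, classifiers):
    """Majority vote: each pairwise classifier adds one vote to its winning class."""
    votes = {c: 0 for c in classes}
    for (a, b), clf in classifiers.items():
        winner = a if clf(x) > 0 else b     # positive output -> class a (positive samples)
        votes[winner] += 1                  # sum(i) = sum(i) + 1
    return max(classes, key=votes.get), votes   # ties go to the earlier class in `classes`


def centroid_pair_trainer(centroids):
    """Hypothetical stand-in for SVM training on 1-D data: nearest-centroid decision."""
    def train_pair(a, b):
        return lambda x: abs(x - centroids[b]) - abs(x - centroids[a])
    return train_pair
```

With four classes centred at 0, 10, 20, and 30, the sketch builds six pairwise classifiers, and a sample at 12 receives three votes for b, two for c, one for a, and none for d, so it is labeled b.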

The above classifier is applied to human sports action recognition to construct the feature-fusion multiclass classifier shown in Figure 6. The training samples are input to the model for training and the best model is saved; the test samples are then input to the saved model, and the category with the maximum cumulative vote is output as the classification result, i.e., the model's recognition result.

#### 4. Simulation Experiments

##### 4.1. Experimental Environment Construction

The experiments run on a 64-bit Windows 7 operating system with an Intel(R) Core(TM) i5-2450M CPU and 8 GB of RAM, using Microsoft Visual Studio 2010 with OpenCV 2.4.1 as the development environment and Visual C++ as the development language.

##### 4.2. Data Source and Preprocessing

The test and competition datasets are used to evaluate the target detection effectiveness of the proposed method [18]. The test dataset contains 111 video frames with a resolution of 320 × 240 at 25 fps, and the competition dataset contains 396 video frames with a resolution of 720 × 480 at 29 fps.

The KTH, Weizmann, and UCF-Sport datasets, which are commonly used for human motion pose recognition, are selected as experimental data [19, 20]. KTH contains 599 videos of 6 actions at a resolution of 160 × 120 and 25 fps; Weizmann contains 90 videos of 10 actions at 180 × 144 and 25 fps; UCF-Sport contains 150 videos of 10 actions at 720 × 480 and 10 fps.

Videos in the above datasets usually contain clips without human activity and frames without moving targets, which increase the model's computation and recognition time, so invalid video frames are removed in this experiment. Taking the KTH dataset as an example, we first calculate the area of the moving-target silhouette in each frame of a video and set 1/2 of the maximum area as the threshold; frames whose silhouette area is smaller than this threshold are classified as invalid and deleted.
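The invalid-frame rule can be expressed in a few lines, assuming each frame has already been reduced to a binary silhouette mask (1 = target pixel); the function names are hypothetical.

```python
def silhouette_area(mask):
    """Number of foreground pixels in a binary silhouette mask."""
    return sum(map(sum, mask))


def valid_frame_indices(masks):
    """Keep frames whose silhouette area is at least half of the video's maximum area."""
    areas = [silhouette_area(m) for m in masks]
    threshold = max(areas) / 2
    return [i for i, area in enumerate(areas) if area >= threshold]
```

For four frames with silhouette areas 0, 10, 4, and 6 pixels, the threshold is 5, so only frames 1 and 3 are kept.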

In addition, because the apparent size of the human body changes with its distance from the camera, the detected moving regions differ in size. To eliminate this discrepancy, each video frame is scale-normalized with bilinear interpolation [21] according to (24):

$$newR = M,\qquad newC = \operatorname{round}\!\left(\frac{n\,M}{m}\right) \tag{24}$$

where $m \times n$ is the size of the moving-target region, $M$ is the normalized height, and $newR \times newC$ is the size of the preprocessing result. The normalized result of the original video frame is shown in Figure 7.
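A sketch of scale normalization by bilinear interpolation, assuming the target region is resized to height M with its width scaled by the same factor (newC = round(n·M/m)). The corner-aligned sampling grid and the function names are assumptions of this sketch.

```python
def bilinear_resize(img, new_r, new_c):
    """Resize a 2-D grayscale list to new_r x new_c with corner-aligned bilinear sampling."""
    m, n = len(img), len(img[0])
    out = [[0.0] * new_c for _ in range(new_r)]
    for i in range(new_r):
        y = i * (m - 1) / (new_r - 1) if new_r > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, m - 1)
        fy = y - y0
        for j in range(new_c):
            x = j * (n - 1) / (new_c - 1) if new_c > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, n - 1)
            fx = x - x0
            top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx        # interpolate along x
            bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
            out[i][j] = top * (1 - fy) + bottom * fy               # then along y
    return out


def normalize_scale(region, target_height):
    """Scale an m x n target region to height M while keeping its aspect ratio."""
    m, n = len(region), len(region[0])
    new_c = max(1, round(n * target_height / m))
    return bilinear_resize(region, target_height, new_c)
```

Normalizing the 2 × 2 region [[0, 10], [20, 30]] to height 3 gives a 3 × 3 result whose corners keep the original values and whose centre is 15.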


##### 4.3. Evaluation Indexes

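In this experiment, precision, recall, false positive rate (FPR), and *F*-measure are selected as the evaluation metrics of the proposed method, calculated as follows [22, 23]:

$$\text{Precision}=\frac{TP}{TP+FP},\qquad \text{Recall}=\frac{TP}{TP+FN}$$

$$\text{FPR}=\frac{FP}{FP+TN},\qquad F\text{-measure}=\frac{2\times\text{Precision}\times\text{Recall}}{\text{Precision}+\text{Recall}}$$

where TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives, respectively.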
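The four metrics follow directly from the confusion-matrix counts. The sketch below uses the standard definitions; the function name is hypothetical, and a zero denominator raises `ZeroDivisionError` rather than being silently patched.

```python
def detection_metrics(tp, tn, fp, fn):
    """Precision, recall, FPR, and F-measure from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    fpr = fp / (fp + tn)
    f_measure = 2 * precision * recall / (precision + recall)   # harmonic mean
    return {"precision": precision, "recall": recall, "fpr": fpr, "f_measure": f_measure}
```

For invented counts TP = 90, TN = 880, FP = 10, and FN = 20, the sketch gives precision 0.9, recall 9/11, FPR 1/89, and an F-measure of 6/7 (about 0.857).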

##### 4.4. Experimental Results

###### 4.4.1. Target Detection Algorithm Verification

To verify the target detection results of the proposed method and its suppression of ghosting, experiments are conducted on the test and competition datasets, and the results are shown in Figures 8 and 9. The conventional ViBe algorithm detects ghosts in the background around the target, together with occluding reflective shadows. The ViBe algorithm improved with the Wronskian function eliminates both the ghosts and the reflective shadows well without affecting the detection of multiple moving targets. This shows that the proposed algorithm detects targets well.


To quantitatively analyze the effectiveness of the proposed method in target detection, the performance of the algorithm before and after the improvement is compared experimentally, and the results are shown in Table 1. Compared with the traditional ViBe algorithm and the Wronskian algorithm, the proposed algorithm performs better in terms of precision, recall, false positive rate, and F-measure, which indicates that the improvement in this paper is effective.

###### 4.4.2. Multifeature Fusion Results

To verify the effect of multifeature fusion, the 39 feature quantities of walking, running, and jumping extracted by the eight-star model and Zernike moments are normalized and fused, and the results are shown in Figure 10. After multifeature fusion, there is a clear gap between the feature data of walking, running, and jumping, which indicates that different categories of sports actions differ markedly after fusion and are easy to classify.

###### 4.4.3. Identification Results

To verify the effectiveness of the proposed method, experiments are conducted on the preprocessed Weizmann and UCF-Sport data. On the Weizmann dataset, three common movements are tested: walking and running, each performed by 10 volunteers, and jumping, performed by 9 volunteers. On the UCF-Sport dataset, three common movements are tested: golf, diving, and gymnastics. Figure 11 shows the test results of the proposed method on the Weizmann dataset. The proposed method effectively identifies the sports targets and correctly classifies walking and jumping, but some running samples are misclassified as walking. This is because some key frames of running and walking have similar posture silhouettes, so the extracted features are not sufficiently separable, which leads to classification errors; the separability of the extracted features therefore needs to be improved.

Figure 12 shows the test results of the proposed method on the UCF-Sport dataset. The proposed method recognizes videos with a simple background well and correctly recognizes golf, but some frames of diving and gymnastics, which have complex backgrounds, are misrecognized. The cause is that diving and gymnastics involve more changes of movement, which hinders feature extraction and leads to misrecognition of individual video frames.

To further verify the effectiveness of the proposed method, its recognition results are compared with those of commonly used sports recognition methods on the experimental datasets. Table 2 shows the recognition results of different methods on the KTH dataset for walking and running, Table 3 shows the results on the Weizmann dataset for walking, running, and jumping, and Table 4 shows the results on the UCF-Sport dataset. As the tables show, compared with the SIFT-feature and multifeature recognition methods, the proposed method improves recognition accuracy to varying degrees on each dataset, and its average recognition accuracy exceeds 95%.

###### 4.4.4. Example Validation

To verify the generalization ability of the proposed method, experiments are conducted on a self-built video dataset in addition to the standard datasets above. A 48-megapixel camera is used to capture three sports postures (walking, running, and striping) in both indoor and outdoor scenes, and the proposed method is tested on them. The average recognition rate of the proposed method is shown in Table 5. The results show that the proposed method detects the complete moving target, with no shadow interference and no trailing shadow joining the target to its surroundings, which benefits feature extraction; the overall recognition performance is good, with an average recognition accuracy of more than 96%. However, because walking and running differ only slightly in posture and movement transitions, some recognition errors remain. These errors do not noticeably affect the overall recognition performance, and the method can still recognize movements well enough to assist physical education.

#### 5. Conclusion

In summary, the proposed SVM-based sports-assisted education method improves the ViBe algorithm with the Wronskian function and the "4-linked" seed filling algorithm, which effectively solves the ghosting problem and yields clearer human sports targets. A genetic algorithm fuses the sports action features extracted by the eight-star model with those extracted by Zernike moments, which reduces redundant features while preserving the separability between classes. With SVM classifiers constructed by the one-to-one strategy, sports actions are classified with an overall recognition accuracy of more than 96%, so the method can be used for practical sports action classification and recognition. The innovation of this research lies in processing motion images systematically across every stage (target detection, feature extraction, and classification) and in improving the classification algorithm, so as to comprehensively improve classification accuracy.

However, owing to limited conditions, some problems in this paper need further study and improvement. When SVM is used for classification, its kernel parameters and penalty factor strongly affect the classification results and the generalizability of the model, whereas the influence of the kernel function is not considered in this paper. Therefore, future research should further improve the SVM model in these respects to enhance its generalization ability and classification performance.

#### Data Availability

The experimental data used to support the findings of this study are available from the corresponding author upon request.

#### Conflicts of Interest

The authors declared that they have no conflicts of interest regarding this work.