#### Abstract

To study behavior recognition in leisure sports with deep residual networks, this paper applies a behavior recognition algorithm and a corresponding robustness model, both based on deep residual neural network theory, to leisure sports samples, and uses the related models to predict and analyze leisure sports data. The results show that the curves of the Sig and Tanh functions can each be divided into a slowly increasing stage, a linearly increasing stage, and a stable stage. The *y* value of the ReLU curve changes linearly as *x* increases. The curve of the Leaky function can be divided into two stages: it coincides with the ReLU function in the first quadrant and remains linear in the third quadrant. The curves corresponding to the 56-layer and 20-layer networks vary over a relatively large range, and both decline gradually overall. The curve for 56 layers lies above the curve for 20 layers, indicating that the 20-layer network performs relatively well and has a lower training error. The robustness recognition rates of the various methods under different training samples show that *F*_{l} has the highest overall recognition rate, while *S*_{c} has relatively poor stability. The recognition rates of IDCC and DCC are relatively flat, indicating that these two methods have certain advantages in describing the robust recognition rate. The research results can provide theoretical support for the application of deep residual neural networks in other fields.

#### 1. Introduction

Deep residual neural networks have been widely applied in different fields. For handwriting recognition, a method based on a deep residual network has been established that achieves higher accuracy [1]. For the accurate recognition of cancer lesions, deep learning can support natural image segmentation, and the resulting model has reasonable scale invariance and can detect even small differences [2]. For short-term load operation, an adaptive short-term load prediction method based on a deep residual network has been proposed [3]; this method considers both the accuracy and the timeliness of the neural network. A method that uses a deep residual network to enhance image resolution has also been proposed. It used a behavior recognition algorithm and robustness analysis to process data and samples, and the results showed that the model could better reflect and describe images [4]. To better analyze and describe radar characteristics, an optimization model based on a deep residual network was proposed. This model can analyze and verify radar data and can also carry out predictive analysis of radar samples within a certain range; its accuracy has been verified by experiments [5]. Image quality evaluation indexes are very important for image processing. To address a series of problems in image research, an image evaluation model based on a deep residual network and a behavior algorithm was established. The model extracts relevant data for analysis, and its accuracy has been verified through experiments [6]. Deep fuzzy neural networks cannot explain nonlinear dynamic behavior well, so a new architecture based on deep residual neural network theory was established. To verify the superiority of the model, a large amount of data was used, and the results confirmed the accuracy of the model [7].

The above studies mainly applied deep residual networks to fields other than leisure sports. Therefore, to better study behavior recognition algorithms and related problems in leisure sports, this paper adopts deep residual neural network theory. Leisure sports are monitored and analyzed with a behavior recognition algorithm, and the related models are used to predict and analyze leisure sports sample data. The research results can provide support for the application of deep residual neural networks in other fields.

The scale of leisure sports is shown in Figure 1. The scale increases gradually over time [8–10]: it grew from 129 in 2011 to 731 in 2021, about 5.7 times the initial value. To quantify how the scale changes over time, the percentage increment of the leisure sports scale at different times was also plotted. The increment likewise shows a gradually increasing trend, and curve fitting shows that it follows a quadratic function. In summary, the scale of leisure sports increases over time, and the proportional increment also tends to increase. However, from 2019 to 2021 the increment decreased, indicating that some problems arose in the development of leisure sports that have restricted its growth to some extent.
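The growth figure quoted above can be checked with a few lines of arithmetic. The sketch below uses only the two values given in the text (129 and 731); the compound annual growth rate is an extra illustrative quantity that is not reported in the paper.

```python
# Values quoted in the text: scale of leisure sports in 2011 and 2021.
scale_2011 = 129
scale_2021 = 731
n_years = 2021 - 2011


def growth_factor(start, end):
    """Ratio of the final value to the initial value."""
    return end / start


def compound_annual_growth_rate(start, end, years):
    """Constant yearly rate that would turn `start` into `end` (illustrative only)."""
    return (end / start) ** (1.0 / years) - 1.0


factor = growth_factor(scale_2011, scale_2021)
cagr = compound_annual_growth_rate(scale_2011, scale_2021, n_years)
print(f"growth factor: {factor:.2f}")
print(f"average annual growth rate: {cagr:.1%}")
```

The ratio rounds to 5.7, which matches the figure given in the text when it is read as "about 5.7 times the 2011 value".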

#### 2. Basic Theory of Deep Residual Networks

##### 2.1. Convolutional Neural Network

Convolutional neural network (CNN) is the most representative deep learning algorithm, which has achieved great success in the field of computer vision [11, 12]. Compared with traditional fully connected neural networks, CNN has fewer training parameters and more flexible training methods and can deepen model layers through network optimization methods. When the network model trains image data, the original image to be processed or the preprocessed fuzzy image is first put into the input layer, and then the convolutional layer and pooling layer process the image of the input layer. The feature information is extracted to form the feature map, and then the activation layer performs nonlinear operation on the feature map to enlarge the image and reconstruct the final result. CNN uses convolution computation to propagate feedforward signals, and the corresponding fully connected neural network is shown in Figure 2.

Through the whole neural network diagram, we can see that neural network can be divided into three parts according to its different functions: input layer, hidden layer, and output layer. The specific data processing process is as follows: Firstly, the verification samples and data are imported into the input layer; the data is analyzed through the algorithm in the input layer; and then the analyzed data is imported into hidden layer 1. Further analysis and validation of the input data are carried out through the five units and modules in hidden layer 1, and the corresponding data is then exported. The data of the map is imported into hidden layer 2 again. In hidden layer 2, there are three variable elements that can verify the input data. After the above hidden layer analysis, the obtained data is output, so as to realize the description of the neural network.
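The data flow described above (input layer, a hidden layer with five units, a hidden layer with three units, output layer) can be sketched as a plain forward pass. This is a minimal illustration and not the network used in the paper: the input size (4), the output size (2), the random weights, and the sigmoid activation are all assumptions.

```python
import math
import random


def dense(inputs, weights, biases):
    """Fully connected layer: out[j] = sum_i inputs[i] * weights[i][j] + biases[j]."""
    return [
        sum(x * row[j] for x, row in zip(inputs, weights)) + biases[j]
        for j in range(len(biases))
    ]


def sigmoid(values):
    """Element-wise logistic function."""
    return [1.0 / (1.0 + math.exp(-v)) for v in values]


def init_layer(n_in, n_out, rng):
    """Random weights in [-1, 1] and zero biases (assumed initialization)."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_out)] for _ in range(n_in)]
    return weights, [0.0] * n_out


def forward(x, layers):
    """Pass the input through every layer in turn."""
    for weights, biases in layers:
        x = sigmoid(dense(x, weights, biases))
    return x


rng = random.Random(0)
sizes = [4, 5, 3, 2]  # input (assumed), hidden layer 1, hidden layer 2, output (assumed)
layers = [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
output = forward([0.5, -1.2, 0.3, 0.8], layers)
print(output)
```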

To better analyze the computation process of a convolutional neural network, a convolutional neural network structure diagram is drawn, as shown in Figure 3. As the flow chart shows, the network can be divided into five parts: convolution unit, upsampling, convolution, downsampling, and full connection. First, the sample data is imported into the convolution part to extract the relevant features. The feature data is then passed to the upsampling part for further analysis through a series of operations such as copying and pasting the data. Next, the pasted data is passed to the second convolution part for calculation. The newly obtained data is passed to the downsampling part for a new round of more accurate copy-and-paste operations. Finally, the resulting data enters the fully connected part, where the data sample is analyzed and modified before the result is exported.

Convolutional neural network mainly includes convolutional layer and corresponding activation function.

###### 2.1.1. Convolutional Layer

The convolutional layer is the foundation and the core component of a convolutional neural network. It combines the local feature information it extracts to form global feature information. In the corresponding operation, *u*_{ij} is the input image, *m* and *n* are the size of the input image, and *b* is the bias constant of the convolution kernel; the operation also depends on the size of the convolution kernel. CONV(*ij*) is the feature map output by the convolution operation.
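As a minimal sketch of such a layer, the function below slides a kernel over an image, sums the element-wise products, and adds a bias. It assumes a "valid" convolution without padding or stride, implemented as cross-correlation (the convention most CNN libraries follow); the names `conv2d`, `image`, and `kernel` are illustrative and do not come from the paper.

```python
def conv2d(image, kernel, bias=0.0):
    """Valid 2D convolution (cross-correlation form) of an m x n image with a small kernel."""
    m, n = len(image), len(image[0])
    k_h, k_w = len(kernel), len(kernel[0])
    feature_map = []
    for i in range(m - k_h + 1):
        row = []
        for j in range(n - k_w + 1):
            acc = bias
            # Sum the products of the kernel and the image patch at (i, j).
            for p in range(k_h):
                for q in range(k_w):
                    acc += image[i + p][j + q] * kernel[p][q]
            row.append(acc)
        feature_map.append(row)
    return feature_map


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, -1]]
feature_map = conv2d(image, kernel, bias=1.0)
print(feature_map)  # [[-3.0, -3.0], [-3.0, -3.0]]
```

Each output value depends only on a small patch of the input (local perception), and the same kernel is reused at every position (weight sharing).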

###### 2.1.2. Activation Function

A CNN adds an activation function layer to the network, and this nonlinear feature mapping allows the model to represent the data better. ReLU stands for rectified linear unit. It is one of the most commonly used activation functions and is characterized by low computational complexity and the absence of exponential operations. The ReLU function can be expressed as follows:

$$f(x) = \max(0, x)$$

However, the ReLU function has a defect. When the input falls in the negative range, the output is 0, so under certain conditions some neurons in the network stop updating their parameters, which reduces the expressive power of the model. To address this problem, the Leaky-ReLU function was proposed as a correction of ReLU that keeps the output nonzero for negative inputs. The Leaky-ReLU function can be expressed as follows:

$$f(x) = \begin{cases} x, & x > 0 \\ \alpha x, & x \le 0 \end{cases}$$

where $\alpha$ is a small positive constant.

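The corresponding equations of Sig and Tanh are as follows:

$$\mathrm{Sig}(x) = \frac{1}{1 + e^{-x}}, \qquad \mathrm{Tanh}(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$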

The curves of the above activation functions are shown in Figure 4. The *y* value of the Sig function increases gradually with *x*, and the curve can be divided into three parts. The first part is a stable stage, in which *y* stays approximately constant near zero as *x* increases. In the second part, *y* increases with *x*, and the slope of the curve is approximately constant, so the curve is clearly linear in this stage. In the third part, as *x* increases further, the curve flattens again and *y* is approximately constant, but at a larger value than in the first part. The Tanh curve follows basically the same trend as the Sig curve, except that its *y* value is negative in the stable stage of the first part. In the second stage, the Tanh curve also changes linearly, but with a larger slope than the Sig function. As *x* increases further, the curve reaches the third stage and coincides with the Sig curve.

On the ReLU curve, the *y* value changes linearly with increasing *x* in the first quadrant, and the slope of this line is constant. In Figure 4, this slope is basically consistent with that of the Sig function in its second stage. The curve of the Leaky function can be divided into two stages, which shows that it is a piecewise function. It coincides with the ReLU curve in the first quadrant. In the third quadrant it is still linear, but with a smaller slope than in the first quadrant.
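The four activation functions discussed in this subsection can be evaluated directly to reproduce the qualitative shapes described for Figure 4. This is a minimal sketch; the negative slope `alpha=0.01` for Leaky-ReLU is a commonly used default and an assumption here, since the paper does not state the value it used.

```python
import math


def sigmoid(x):
    """Sig(x) = 1 / (1 + e^(-x)); output lies in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def tanh(x):
    """Tanh(x); output lies in (-1, 1)."""
    return math.tanh(x)


def relu(x):
    """ReLU(x) = max(0, x); zero for negative inputs."""
    return max(0.0, x)


def leaky_relu(x, alpha=0.01):
    """Leaky-ReLU: identity for x > 0, small slope alpha (assumed value) otherwise."""
    return x if x > 0 else alpha * x


# Sample the curves on a small grid to see the stages described in the text.
for x in (-6.0, -2.0, 0.0, 2.0, 6.0):
    print(f"x={x:+.1f}  sig={sigmoid(x):.4f}  tanh={tanh(x):+.4f}  "
          f"relu={relu(x):.2f}  leaky={leaky_relu(x):+.2f}")
```

For large negative inputs Sig approaches 0 while Tanh approaches −1, and for large positive inputs both approach 1, which is the behavior described for the first and third stages.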

##### 2.2. Basis of Residual Neural Network

Deep residual networks perform well in image classification, localization, and semantic segmentation and can effectively alleviate the problems caused by vanishing gradients [13–15]. The CNN convolution layer is characterized by local perception and weight sharing. Local perception means that the convolution layer extracts only local features during operation and then combines all local features into global features. Weight sharing means that the weight parameters of a convolution kernel are shared across positions, so a kernel extracts the same feature in any region of the input. These two characteristics give a CNN faster training and better performance than a traditional fully connected neural network. To better analyze the computations of residual neural networks, the structure diagrams of different types of residual neural network are summarized, as shown in Figure 5.

In order to explore the structure differences of different residual neural networks, they can be divided into three structures according to the difference of their operation processes: ResNet structure, VDSR structure, and DRRN structure. It can be seen from the structure comparison diagram that the three structures are consistent in the initial data input and final data export, while there are great differences in the concrete computing part in the middle. The intermediate operation part of ResNet structure can include two identical data loop chains, which can make the data better for iteration and analysis and make the calculation results more targeted. In the VDSR structure, the data operation part is directly carried out without any cycle, which can ensure the fluency and authenticity of the data and make the data results more comparative. The running part in the middle of the DRRN structure belongs to the multi-iterative loop, and multiple iterative loops can make the analysis of experimental data more accurate thus making the derived structure more general.

The structural characteristics of several common residual neural networks are as follows [16–18]:

(1) VDSR: The VDSR model is a network structure that adds a residual module to a deep neural network. Compared with SRCNN and other traditional super-resolution network models, the VDSR model trains faster, and it can be trained on image data of different sizes.

(2) DRRN: It adopts a deeper network structure, which makes better use of the residual module. Each residual unit has two convolution layers, and the parameters at the corresponding positions between the convolution layers are shared. DRRN has faster computing speed and higher accuracy during training and can reconstruct higher-quality high-resolution images with fewer memory resources.

(3) ResNet: It is based on the VGG19 network and modifies it. Residual units are added through a shortcut (short-circuit) mechanism. The main change is that ResNet downsamples directly with strided convolution.

To better analyze the influence of the number of layers on the training error of ResNet, the results for ResNet with different numbers of layers are summarized, as shown in Figure 6.
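The shortcut mechanism shared by these structures can be illustrated with a toy residual unit that computes `relu(F(x) + x)`. This is a sketch under simplifying assumptions: the two layers are small fully connected layers rather than convolutions, and all names (`linear`, `residual_block`) are hypothetical.

```python
def relu(values):
    """Element-wise ReLU."""
    return [max(0.0, v) for v in values]


def linear(values, weights, biases):
    """out[j] = sum_i weights[j][i] * values[i] + biases[j]."""
    return [sum(w * v for w, v in zip(row, values)) + b
            for row, b in zip(weights, biases)]


def residual_block(x, w1, b1, w2, b2):
    """Two-layer residual unit: the input is added back to the learned residual F(x)."""
    hidden = relu(linear(x, w1, b1))
    residual = linear(hidden, w2, b2)
    return relu([r + xi for r, xi in zip(residual, x)])


# With all-zero weights the residual branch outputs 0, so the block
# reduces to relu(x): the shortcut carries the signal through unchanged.
zeros = [[0.0] * 3 for _ in range(3)]
no_bias = [0.0] * 3
passthrough = residual_block([1.0, -2.0, 3.0], zeros, no_bias, zeros, no_bias)
print(passthrough)  # [1.0, 0.0, 3.0]
```

Because the block only has to learn the difference from the identity, stacking many such units is easier to optimize than stacking plain layers, which is the usual motivation given for residual learning.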

The training error curves of the 56-layer and 20-layer networks are shown in Figure 6. Both curves vary over a relatively large range and decline gradually overall. Their evolution can be divided into two parts. In the first part, both curves decline gradually as the number of iterations increases, but the data still fluctuate, which indicates that the training error jumps to some extent. When the number of iterations reaches 3.5 × 10^{4}, the curves drop rapidly and enter the second stage. This rapid decline is caused mainly by a jump in the relevant data, which leads to a sharp fall in the training error. In the second stage, the curves continue to decline gradually and still fluctuate, but the changes are smaller than in the first stage. Overall, the curve of the 56-layer network lies above that of the 20-layer network, so the training error of the 56-layer network is higher. This also shows that the 20-layer network performs relatively well, with a lower training error.

To better analyze how the structure of a residual neural network influences the results, the results for different types of residual neural network structure are summarized, as shown in Figure 7. All structures show a gradual decline overall, which can be divided into three stages. In the first stage, the curves decline rapidly and then slowly. During this stage the data fluctuate to some extent as the number of iterations increases, while declining overall. As the number of iterations increases further, the curves drop sharply and enter the second stage. Here the data stabilize after a short rapid decline and then remain stable, indicating good overall stability in this stage. With a further increase in the number of iterations, the data enter the third stage, in which the curves decline slowly and their slope gradually approaches zero. This indicates that the data are stable after the first and second stages and can be exported. The curves of the different structures are basically similar in shape. The values of the ResNet curve are the largest. The values of the VDSR curve are the smallest in the first part but second only to ResNet in the second and third parts. The values of the DRRN curve are second only to ResNet in the first part but the lowest in the second and third parts. This indicates that the stability of the VDSR and DRRN structures is poor, while the stability of the ResNet structure is relatively good.

##### 2.3. Evaluation of Residual Networks

In order to better analyze the calculation results of the residual neural network, different analysis methods are adopted to analyze the deep residual network [19, 20]. The different evaluation indexes can be divided into subjective indexes and objective indexes.

###### 2.3.1. Subjective Evaluation

Mean Opinion Score (MOS) is a subjective evaluation standard for residual network quality, defined as follows:

$$\mathrm{MOS} = \frac{1}{N} \sum_{i=1}^{N} X_i$$

where *i* indexes the samples used to evaluate the residual network, *X*_{i} is the score of the *i*-th sample, and *N* is the number of samples. The MOS evaluation method is relatively fair and reasonable and is often used as an indicator for evaluating model algorithms. However, MOS does not rest on a rigorous mathematical model, and its results are subject to the subjective influence of the participants. Different evaluators hold different views when comparing the same model, so the evaluation results can differ widely, which introduces many uncertain factors. Considering the particularity of structural calculation, MOS was not used in this paper as the evaluation standard for residual network quality after reconstruction.

The index of subjective evaluation is a score, generally between 1 and 5: 5 means “very good evaluation,” 4 means “good evaluation,” 3 means “medium evaluation,” 2 means “poor evaluation,” and 1 means “very poor evaluation.”
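As a small worked example, MOS is simply the arithmetic mean of the individual ratings on the 1 to 5 scale. The ratings below are made up for illustration and are not data from the paper.

```python
def mean_opinion_score(scores):
    """Average of subjective ratings, each expected to lie in [1, 5]."""
    if not scores:
        raise ValueError("at least one score is required")
    for s in scores:
        if not 1 <= s <= 5:
            raise ValueError(f"score {s} is outside the 1-5 scale")
    return sum(scores) / len(scores)


ratings = [5, 4, 4, 3, 5]  # hypothetical ratings from five evaluators
mos = mean_opinion_score(ratings)
print(mos)  # 4.2
```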

###### 2.3.2. Objective Evaluation

Mean-Square Error (MSE) evaluates the quality of the reconstructed network by measuring the difference between the reconstructed residual network and the original neural network. It can be expressed as follows:

$$\mathrm{MSE} = \frac{1}{mn} \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} \left( X_{ij} - Y_{ij} \right)^{2}$$

where *X*_{ij} represents the reconstructed residual network and *Y*_{ij} represents the original neural network, with the indices *i* and *j* running up to *m* − 1 and *n* − 1, respectively.
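A minimal sketch of the MSE computation for two equally sized 2D arrays follows. It assumes the usual definition (mean of squared element-wise differences); the 2 × 2 values are made up for illustration.

```python
def mean_square_error(reconstructed, original):
    """Mean of squared element-wise differences between two m x n arrays."""
    m, n = len(original), len(original[0])
    total = 0.0
    for i in range(m):
        for j in range(n):
            total += (reconstructed[i][j] - original[i][j]) ** 2
    return total / (m * n)


reconstructed = [[52, 55],
                 [61, 59]]
original = [[50, 55],
            [60, 62]]
mse = mean_square_error(reconstructed, original)
print(mse)  # (4 + 0 + 1 + 9) / 4 = 3.5
```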

Since the neural network will cause data changes in the process of computation and compression, it is more rigorous to use the Peak Signal-to-Noise Ratio (PSNR) to evaluate the quality of the reconstructed residual neural network.

PSNR uses data to reflect the advantages and disadvantages of each neural network algorithm, and it is expressed as follows:

$$\mathrm{PSNR} = 10 \log_{10} \left( \frac{\mathrm{Max}^{2}}{\mathrm{MSE}} \right)$$

Max is the peak value of the model, which is generally 255. If it is 1, the model has been linearly normalized. Structural similarity (SSIM) is an index that measures the similarity of residual neural networks in terms of contrast, brightness, and structure. This evaluation method is more accurate and more widely applicable than PSNR. Structural similarity is defined as follows:

$$\mathrm{SSIM}(x, x_1) = \frac{(2 u_x u_{x1} + C_1)(2 \delta_{xx1} + C_2)}{(u_x^{2} + u_{x1}^{2} + C_1)(\delta_x + \delta_{x1} + C_2)}$$

where *u*_{x} represents the gray mean of the original neural network, *δ*_{x} represents the variance of the original neural network, *u*_{x1} represents the gray mean of the neural network after reconstruction, *δ*_{x1} represents the variance of the residual neural network after reconstruction, *δ*_{xx1} represents the covariance between *x* and *x*_{1}, and *C*_{1} and *C*_{2} are constants. Generally speaking, the closer SSIM is to 1, the higher the similarity and the better the computational quality of the model. Because SSIM evaluates quality from three different aspects, it better meets the requirements of perceptual evaluation and is therefore widely used in the field of super-resolution.
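The two objective indexes can be sketched as follows. PSNR is computed from an MSE value and the peak value Max. The SSIM function is a simplified single-window version computed over the whole signal; practical implementations usually average SSIM over local windows. The constants `k1=0.01` and `k2=0.03` are commonly used defaults and are an assumption here, as the paper does not give values for *C*_{1} and *C*_{2}.

```python
import math


def psnr(mse, max_value=255.0):
    """PSNR in dB: 10 * log10(Max^2 / MSE); infinite when the inputs are identical."""
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)


def ssim_global(x, y, max_value=255.0, k1=0.01, k2=0.03):
    """Single-window SSIM of two equal-length flat lists of gray values."""
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    c1 = (k1 * max_value) ** 2  # stabilizes the mean (brightness) term
    c2 = (k2 * max_value) ** 2  # stabilizes the variance (contrast/structure) term
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


original = [50, 55, 60, 62, 70, 90]
reconstructed = [52, 55, 61, 59, 73, 85]
print(f"PSNR for MSE = 3.5: {psnr(3.5):.2f} dB")
print(f"SSIM(original, original) = {ssim_global(original, original):.4f}")
print(f"SSIM(original, reconstructed) = {ssim_global(original, reconstructed):.4f}")
```

Identical inputs give an SSIM of 1, and any distortion lowers the value, which matches the statement that values closer to 1 indicate higher similarity.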

#### 3. Behavior Recognition Algorithm

In order to better analyze the behavior recognition algorithm, a behavior recognition flow chart is drawn, as shown in Figure 8. As can be seen from Figure 8, the behavior recognition algorithm mainly includes acquisition end, information processing, feature extraction and behavior recognition. The specific calculation process is as follows: Firstly, the target and sample to be identified are imported to the collection end of the behavior recognition algorithm, and the samples are analyzed by feature extraction and behavior extraction in the collection end. Then, the parameters after feature extraction are input into the information processing module, corresponding to the extraction of morphological features and behavioral features. In order to better analyze the extracted data, the external morphological and behavioral features of the sample are fused through the feature extraction data plate. In this way, the corresponding analysis can be carried out by comprehensively considering the behavior of samples. Finally, the data are compared and verified with the data in the database of the corresponding object’s morphological characteristics, so as to explain the relevant problems.

##### 3.1. Basic Theory of Model

The path of behavior recognition algorithm is divided into two-stream recognition network model and class activation model [21–23].

###### 3.1.1. Two-Stream Network Model

Image sequences and optical flow maps can be regarded as two different modalities of information. Each modality is sampled as a sequence of frames, where *T* is the total number of video frames.

In the model corresponding to the behavior recognition algorithm, the features of the image sequence modality and of the optical flow modality are average-pooled, and then channel concatenation and feature fusion are carried out to obtain the final classification and prediction result of the network for the operation behavior in the video. In this formulation, *f*_{1} and *f*_{2} are the feature extractors of the two modalities, *G* is the temporal aggregation function applied to the per-frame features of each modality, *h* is the multimodal fusion function applied to the two aggregated features, and *y* is the predicted output for the video.
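The two-stream prediction described above can be sketched as: extract per-frame features in each modality, average-pool them over time, concatenate the two pooled vectors, and classify. Everything concrete below is a hypothetical stand-in: the toy extractors `f1` and `f2`, the linear classifier, and the sample data are assumptions for illustration, not the networks used in the paper.

```python
import math


def average_pool(frame_features):
    """Temporal aggregation G: mean of the per-frame feature vectors."""
    n = len(frame_features)
    dim = len(frame_features[0])
    return [sum(f[d] for f in frame_features) / n for d in range(dim)]


def softmax(scores):
    """Numerically stable softmax."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def two_stream_predict(rgb_frames, flow_frames, f1, f2, weights, biases):
    """y = h(G(f1(rgb frames)), G(f2(flow frames))), with h = concatenate + linear + softmax."""
    rgb_vec = average_pool([f1(frame) for frame in rgb_frames])
    flow_vec = average_pool([f2(frame) for frame in flow_frames])
    fused = rgb_vec + flow_vec  # channel concatenation
    scores = [sum(w * v for w, v in zip(row, fused)) + b
              for row, b in zip(weights, biases)]
    return softmax(scores)


# Toy stand-ins for the per-modality feature extractors (assumed).
f1 = lambda frame: [sum(frame) / len(frame), max(frame)]
f2 = lambda frame: [min(frame), max(frame)]

rgb_frames = [[0.1, 0.5, 0.9], [0.2, 0.4, 0.6]]      # T = 2 frames
flow_frames = [[-0.3, 0.0, 0.3], [-0.1, 0.2, 0.5]]
weights = [[1.0, 0.0, 0.0, 0.0],
           [0.0, 1.0, 0.0, 1.0],
           [0.0, 0.0, 1.0, 0.0]]                     # 3 classes x 4 fused features
biases = [0.0, 0.0, 0.0]

probs = two_stream_predict(rgb_frames, flow_frames, f1, f2, weights, biases)
prediction = probs.index(max(probs))
print(probs, prediction)
```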

###### 3.1.2. Class Activation Model

Class activation model (CAM) is a saliency model that generates class-specific maps using the average-pooling layer in modern deep CNNs [24, 25]. The principle formula of CAM is derived as follows:

$$F_l = \sum_{i} f_l(i)$$

where *f*_{l}(*i*) is the activation value of unit *l* at spatial position *i* of the last convolutional layer of the network, and *F*_{l} is obtained by pooling unit *l* over all spatial positions.

Softmax input for category *c* is as follows:

$$S_c = \sum_{l} w_l^{c} F_l$$

where *w*_{l}^{c} is the weight of unit *l* for category *c*.

Softmax output for category *c* is as follows:

$$P_c = \frac{\exp(S_c)}{\sum_{c'} \exp(S_{c'})}$$

For a given category *c*, its CAM can be expressed as follows:

$$M_c(i) = \sum_{l} w_l^{c} f_l(i)$$
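The CAM quantities can be computed in a few lines, assuming the standard CAM formulation: *F*_{l} is the sum of *f*_{l}(*i*) over positions, *S*_{c} is the weighted sum of the *F*_{l}, *P*_{c} is the softmax of *S*_{c}, and *M*_{c}(*i*) is the same weighted sum taken per position. The feature maps and weights below are made-up toy values, and spatial positions are flattened into a single index.

```python
import math


def global_pool(feature_maps):
    """F_l = sum over positions i of f_l(i)."""
    return [sum(fm) for fm in feature_maps]


def class_scores(pooled, weights):
    """S_c = sum over units l of weights[c][l] * F_l."""
    return [sum(w * f for w, f in zip(row, pooled)) for row in weights]


def softmax(scores):
    """P_c = exp(S_c) / sum over c' of exp(S_c')."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def class_activation_map(feature_maps, weights, c):
    """M_c(i) = sum over units l of weights[c][l] * f_l(i)."""
    n_positions = len(feature_maps[0])
    return [sum(weights[c][l] * feature_maps[l][i] for l in range(len(feature_maps)))
            for i in range(n_positions)]


# Two units (l = 0, 1), four flattened spatial positions (toy values).
feature_maps = [[1.0, 0.0, 2.0, 0.0],
                [0.0, 3.0, 0.0, 1.0]]
weights = [[1.0, 0.0],   # class 0
           [0.0, 1.0]]   # class 1

pooled = global_pool(feature_maps)        # [3.0, 4.0]
scores = class_scores(pooled, weights)    # [3.0, 4.0]
probs = softmax(scores)
best = probs.index(max(probs))            # class 1
cam = class_activation_map(feature_maps, weights, best)
print(best, cam)  # 1 [0.0, 3.0, 0.0, 1.0]
```

Summing the map over all positions recovers the class score, which is why the map can be read as the per-position contribution to the classification.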

Using the CAM given by *M*_{c}, we can identify the image region that the CNN relies on when it classifies the image as class *c*. Therefore, if we use the category with the highest probability output by the Softmax function, the generated CAM provides a saliency map of the image. Since operation behavior in a first-person view focuses on the object being operated by the person, CAM can be used to make the network focus on the region of the operated object in the image, so as to localize the object spatially under weak supervision. In view of the differences among algorithms in the class activation model, the results of the relevant parameters under different algorithms are summarized, as shown in Figure 9.

To better analyze the differences and connections among the methods under different training samples, their recognition rates are plotted in Figure 9. As the number of iterations increases, the methods show different trends. The recognition rate of *F*_{l} is the highest, varying overall between 80 and 90. Its curve is relatively flat, indicating that the method is relatively stable. The *S*_{c} curve is second only to that of *F*_{l} and also changes relatively steadily, ranging overall from 70 to 89. The curves of *P*_{c} and *M*_{c} are relatively close to each other overall: apart from a minimum at 10 iterations, they remain around 80. From the above analysis, the *F*_{l} and *S*_{c} methods have relatively high recognition rates, while the *P*_{c} and *M*_{c} methods have relatively low recognition rates.

##### 3.2. Robust Analysis

Multiple samples are selected as robustness test data for the above algorithm [26–28]. Relevant studies show that the robustness recognition rate of class activation model is higher than that of other methods under different number of training samples [29–31].

Specifically, for each input frame, the CAM is first calculated using the category with the highest probability output by the CNN. A Softmax operation is then applied along the spatial dimension, which transforms the CAM into a probability map that serves as a spatial attention map. Finally, this map is multiplied by the output of the last convolutional layer of the CNN to obtain new image features with spatial attention. The corresponding formula is as follows:

$$f_{SA}(i) = f_i \odot \frac{\exp(M_c(i))}{\sum_{i'} \exp(M_c(i'))}$$

where *f*_{i} is the feature map output by the convolution layer at spatial position *i*, *M*_{c}(*i*) is the CAM obtained using the category *c* with the highest probability output by the CNN, *f*_{SA}(*i*) is the image feature obtained using spatial attention, and ⊙ is the element-wise product.
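The spatial attention step can be sketched as follows: a softmax over the spatial positions of the CAM gives attention weights that sum to 1, and each position's feature vector is scaled by its weight. This is a minimal illustration with flattened positions and toy values; the function names are hypothetical.

```python
import math


def spatial_softmax(cam):
    """Turn CAM values M_c(i) into a probability map over spatial positions."""
    top = max(cam)
    exps = [math.exp(v - top) for v in cam]
    total = sum(exps)
    return [e / total for e in exps]


def apply_spatial_attention(features, cam):
    """f_SA(i) = f_i * softmax(M_c)(i), where features[i] is the channel vector at position i."""
    attention = spatial_softmax(cam)
    return [[a * v for v in f] for f, a in zip(features, attention)]


# Four spatial positions with two channels each (toy values).
features = [[4.0, 8.0], [4.0, 8.0], [4.0, 8.0], [4.0, 8.0]]

# A flat CAM spreads attention evenly: every position is scaled by 1/4.
uniform = apply_spatial_attention(features, [0.0, 0.0, 0.0, 0.0])
print(uniform)  # [[1.0, 2.0], [1.0, 2.0], [1.0, 2.0], [1.0, 2.0]]

# A peaked CAM concentrates the features on the most salient position.
attention = spatial_softmax([0.0, 3.0, 0.0, 1.0])
focused = apply_spatial_attention(features, [0.0, 3.0, 0.0, 1.0])
print(attention.index(max(attention)))  # 1
```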

The above methods have certain errors in the recognition of training samples. To better study the robustness recognition rates of the various methods under different training samples, a robustness analysis is carried out, as shown in Figure 10. The overall recognition rate of *F*_{l} is the highest; it declines slowly at first and then changes gently. The recognition rate of the *S*_{c} method increases rapidly at first, then decreases slowly, and finally increases rapidly again, indicating that the stability of this method is relatively poor. The recognition rates of IDCC and DCC change gently overall and are basically the same, indicating that these two methods have certain advantages in describing the robust recognition rate. Therefore, under different training methods, the order of the robustness recognition rate is *F*_{l}, *P*_{c}, *M*_{c}, and *S*_{c}.

#### 4. Analysis of Leisure Sports Behavior Recognition Algorithm

##### 4.1. Introduction to Leisure Sports

Leisure sports are sports that people choose according to their own preferences and practice after work, in order to exercise and relax. Compared with traditional sports, leisure sports take fitness and entertainment as the main goals, so that all participants can take part in a relaxed state of mind and body. In addition, leisure sports do not necessarily aim at winning as traditional sports do. Instead, they emphasize that all participants gain physical and mental pleasure, improve their physical fitness, and enrich their spare time.

With the continuous development of the times, leisure sports have become an indispensable part of people’s daily life. As a new form of sports, leisure sports are characterized by freedom, culture, nonutility, enthusiasm, and initiative. To better analyze the influence of different factors on leisure sports, the proportions of the different influencing factors are summarized, as shown in Figure 11. Enthusiasm is the most important influencing factor, accounting for 33.33% of the total. Nonutility accounts for about 23.33%. Culture, as an important influencing factor, accounts for about 20%. Freedom, which people pursue in leisure sports in today’s environment, accounts for about 13.33%. Initiative has the lowest proportion, only 10%, indicating that it has the least influence on leisure sports.

##### 4.2. Analysis of Leisure Sports Behavior Algorithm

Building on the above, the behavior recognition algorithm, based on deep residual neural network theory, is applied to the leisure sports data to obtain the corresponding recognition results. To describe more accurately how the algorithm’s output varies across samples, the calculation results for different samples are summarized in Figure 12.

Applying this method to leisure sports under different samples yields the corresponding sample analysis graph. As the number of samples increases, the corresponding value fluctuates, remaining at about 40 on the whole. The curve first decreases slowly, then gradually increases, then slowly decreases, and finally slowly increases again. The values are relatively high when the sample number is about 52–56; specifically, the highest value (278) is reached at sample number 54.

To analyze the impact of different models on leisure sports, each model is applied to the relevant data, and the prediction curves are obtained through iterative calculation, as shown in Figure 13. The curves of the different models follow different trends, each of which can be divided into three stages. For the dual-flow model, the curve drops rapidly and by a large amount in the first stage, indicating poor stability in this stage. As the number of iterations increases, the curve enters the second stage, where it is relatively flat and varies little, indicating relatively high stability. In the third stage, the curve declines slowly. For the quasi-activation model, the curve drops rapidly in both the first and second stages, with a steeper decline in the second stage than in the first. In the third stage, the curve turns upward, indicating that the predicted value rises to some extent when the number of iterations is high. For the robustness model, the curve drops slowly in the first stage, remains approximately constant in the second stage, and drops rapidly in the third stage. On the whole, the three models have basically the same overall range of variation. The robustness model has good stability in the first and second stages, while the dual-flow model has good stability in the third stage.
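The three-stage reading of a prediction curve can be illustrated by computing the mean slope of each stage. The sketch below is only an illustration under stated assumptions: the stages are taken to be three equal-length segments of the iteration axis, the threshold `tol` is arbitrary, and the curve `example_curve` is a hypothetical placeholder shaped like the dual-flow description, not data from Figure 13.

```python
def stage_slopes(values, n_stages=3):
    """Split a curve into contiguous stages and return each stage's mean slope per iteration."""
    n = len(values)
    if n < n_stages + 1:
        raise ValueError("need at least n_stages + 1 points")
    # Stage boundaries as indices; integer division keeps them distinct.
    bounds = [i * (n - 1) // n_stages for i in range(n_stages + 1)]
    return [(values[b] - values[a]) / (b - a) for a, b in zip(bounds, bounds[1:])]


def describe(slope, tol=0.05):
    """Label a stage by the sign of its mean slope (tol is an assumed threshold)."""
    if slope > tol:
        return "rising"
    if slope < -tol:
        return "falling"
    return "flat"


# Hypothetical placeholder curve: fast drop, plateau, slow drop.
example_curve = [10.0, 8.0, 6.0, 4.0, 4.0, 4.0, 4.0, 3.5, 3.0, 2.5]

slopes = stage_slopes(example_curve)
labels = [describe(s) for s in slopes]
for i, (s, label) in enumerate(zip(slopes, labels), start=1):
    print(f"stage {i}: mean slope = {s:+.2f} ({label})")
```

A stage with a mean slope close to zero corresponds to what the text calls good stability, while a large negative or positive slope corresponds to a rapid drop or rise.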

#### 5. Conclusion

(1) As time goes by, the scale of leisure sports gradually increases, from 129 in 2011 to 731 in 2021, which is about 5.7 times the 2011 value (731/129 ≈ 5.67). The increment proportion also shows an increasing trend.

(2) Activation curves of residual neural networks with different structures are basically the same, and the curve values corresponding to the ResNet structure are the largest. The values corresponding to the VDSR curve are the smallest in the first part. The DRRN structure values are second only to ResNet in the first part, but lowest in the second and third parts. This indicates that the stability of the VDSR and DRRN structures is poor, while the stability of the ResNet structure is relatively good.

(3) The overall variation range of the *F*_{l} data recognition rate is between 80 and 90, and the curve is relatively flat, indicating that the training method is relatively stable. The *S*_{c} curve is second only to the *F*_{l} curve and also shows relatively stable changes. However, the data of the *P*_{c} and *M*_{c} identification methods are relatively close, indicating that the recognition rates of the *P*_{c} and *M*_{c} methods are relatively low.

(4) The calculated data of different models show different changing curve trends, which can be divided into three stages. On the whole, the three models have basically the same overall range of variation. The robustness model has good stability in the first and second stages, while the dual-flow model has good stability in the third stage.

#### Data Availability

The dataset can be accessed upon request.

#### Conflicts of Interest

The author declares that there are no conflicts of interest regarding the publication of this paper.