Abstract
In recent years, the growing demand for physical fitness has driven reform of the basketball sports industry. The latest science and technology continually enter the sports sector and challenge traditional sports equipment. However, conventional sports technology schemes are often unreasonable and sports training management models are obsolete, so scientific methods are rarely applied in basketball training and the overall training effect is poor. This paper proposes a basketball training algorithm based on big data and the Internet of Things. The proposed algorithm performs gesture recognition on continuous video by fusing the movement characteristics of key figures, their trajectory characteristics, and background information, and improves the recognition mechanism with a two-stream C3D method for classifying basketball players' actions. To address the lack of scientific exercise plans for basketball players, a training plan based on BMI and IoT-enabled big data is devised. The proposed scheme is implemented so that different basketball players can customize their own scientific sports training modules.
1. Introduction
Basketball originated in the United States more than 100 years ago and has developed into a global sport of large scale and extraordinary influence. At the same time, the flourishing basketball culture has become a basic driver of the cultural and sports industry [1]. Thanks to the boost of high-level events, basketball is promoted globally. As one basketball official remarked, "the innovation of basketball training concepts in the new era and the improvement of athletes' competitive ability have made basketball competition more intense." The development of basketball is therefore inseparable from scientific training: training is the key to improving the sport. Accordingly, in recent years most research hotspots have concentrated on basketball training [2]. Basketball training has become more scientific, and its subdivision areas have become more extensive [3].
Basketball training is long-term, continuous, and systematic work [4], and training sessions are an important part of it. It is generally believed that ideal training goals can be achieved only by optimizing resources and tailoring training to the personalities of the players and the characteristics of the team. Scientific research, training, selection, and management all contribute to performance, so overall coordination is required. During basketball training, coaches should prepare athletes adequately before competition: according to the characteristics of the sport, they should follow the principles of universal education and scientific methods and focus on the athletes' physical [5], technical [5], tactical [6], psychological [7], and intellectual [8] development as well as recovery. Basketball training thus mainly consists of physical training, technical training, tactical training, psychological training, and intelligence training; these components are mutually complementary and inseparable.
Without systematic and scientific physical training, basketball athletes cannot cope with high-intensity competition, suffer high injury rates, and have short sporting careers. A scientific and effective evaluation system for basketball players' physical training is therefore constructed to achieve the best training effect, and standard physical training practice has certain guiding significance [9].
With the continuous development of big data, the Internet of Things, and neural network technologies [10, 11], the modern basketball intelligent training industry has been vigorously supported and accelerated. However, the physical health of basketball players still needs urgent attention. It is therefore important to develop a complete basketball training system based on big data and the Internet of Things [12] that creates a personalized sports training platform matched to each player's physique. The system collects athletes' sports data through cloud monitoring, customizes training prescriptions through analysis, extracts athletes' movement characteristics, establishes a standard human movement description model, and guides athletes to train according to their physical capacity to improve physical function. The original intention of this algorithm is thus a personalized, customized sports training plan that achieves basic sports training while maintaining each athlete's individuality and retaining wide applicability. The proposed algorithm can meet the task requirements of basketball training and remedy some shortcomings of sports training equipment, making it convenient for competitive teams to manage basketball training in a refined and comprehensive manner.
The main contributions of this paper are as follows:
(1) A novel basketball training algorithm based on big data and the Internet of Things, guiding basketball players to train according to their physical strength, improving physical function, and realizing scientific, refined basketball training management.
(2) An AlexNet-based convolutional neural network extracts static features from video frames, and motion tubes are constructed to extract the video's dynamic features. A Cholesky transform fuses the static and dynamic features, and a GRU network then compensates with temporal features to complete behavior classification. The experimental results show that recognition accuracy is highest when the fusion ratio of static to dynamic features is 8 : 2.
(3) A 3D pyramid pooling layer replaces the last max pooling layer of the original C3D network so that it accepts input videos of non-fixed size, and a second C3D stream extracts optical flow features from the video to further enrich the feature information.
The rest of the paper is organized as follows. Related work is reviewed in Section 2, followed by the methodology in Section 3. Experimental results are given in Section 4, and the paper is concluded in Section 5.
2. Related Work
In recent years, the term "monitoring" has gradually entered the field of competitive sports [13, 14]. Applying it to athletes' physical training gives coaches, athletes, and researchers a deeper understanding of scientific, data-based training, providing timely feedback on training load and improving athletes' performance. With the rise of digital training monitoring research, coaches have gradually moved beyond the conceptual limits of high-repetition, heavy-weight exercise and begun to pay attention to monitoring during athletes' physical training. Physical training is the process by which coaches instruct athletes to carry out load training, continuously strengthening athletes' sports ability; its purpose is to promote athletes' performance through reasonable training intensity and load stimulation. Although a basketball training program relies on monitoring, accurate monitoring in turn depends on accurate recognition of basketball movements [15].
As the huge application value hidden in motion recognition technology is gradually discovered, all walks of life seek opportunities to integrate with it, and more and more researchers devote themselves to the field. The rapid development of computer vision has laid a solid foundation for action recognition research. From early work based on wearable devices to current work based on depth images, computer vision has directly promoted changes in motion recognition technology. The key to action recognition is extracting effective action features. Different features may be obtained when extracting features for the same action, leading to differences in how actions are described; the appropriate action feature must therefore be selected according to the scene.
Theodorakopoulos et al. [16] constructed a human skeleton diagram from the joint points of the human skeleton and computed the angles between joint points in the diagram to recognize human movements. Gowayyed et al. [17] proposed a 2D trajectory descriptor that uses an azimuth histogram to vote for the trajectory's displacement, describing the 3D trajectory of the human joint points through the histograms of its three 2D projections to recognize human movements. Local features and global features are the two kinds of underlying image features. A local feature captures the most prominent local changes, which are modeled to describe the human action. Laptev [18] detected local spatio-temporal structures with prominent changes in image values, used the Laplacian operator to estimate spatio-temporal scale normalization, and computed local position and scale descriptors. A global feature represents the whole region of interest, encoding whole-body motion information; global features include space-time features, silhouette features, motion energy maps, and so on. This review shows that capturing basketball training actions through the Internet of Things and recognizing them with computer vision is an effective approach.
3. Methodology
This paper captures basketball training videos via the Internet of Things. Character behavior recognition in video is related to the movement trajectory of the target person, and the background environment in the video should also be taken into account, because background and other information are combined as features. The trajectory characteristics of the character's movement are likewise very important for describing the character's behavior. At the same time, the trajectory features should be processed further to describe the movement more specifically. We therefore construct a motion recognition method based on the fusion of dynamic and static features of the video.
3.1. Static and Dynamic Feature Extraction of Basketball Training
This section introduces the extraction of the video's static and dynamic features from two aspects: CNN-based static feature extraction and trajectory-based motion-tube dynamic feature extraction, laying the foundation for the subsequent feature fusion.
3.1.1. Static Feature Extraction of the First Frame Based on CNN
Since a video frame presents the scene information of the person's movement in a static manner, features can be extracted from a single frame. From another perspective, video frames can be regarded as images. Considering the successful application of CNNs in image recognition, this section designs a neural network [19–22] to extract features from the first frame of each video segment. The specific network structure is shown in Figure 1.

The specific neural network structure consists of 5 convolutional layers, 2 pooling layers, 2 fully connected layers, and a final softmax layer. The ReLU function is used as the activation after each convolutional layer and fully connected layer. After a single video frame is input to the network, the softmax layer outputs a 1000-dimensional vector as the static feature.
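As a sketch of the network just described (the layer widths and the 227 × 227 input size are assumptions borrowed from the AlexNet family, not stated in the paper), the static feature extractor might look as follows in Keras:

```python
from tensorflow.keras import layers, models

def build_static_feature_net(input_shape=(227, 227, 3)):
    """AlexNet-style extractor: 5 conv layers, 2 pooling layers, 2 fully
    connected layers, and a 1000-way softmax producing the static feature."""
    return models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(96, 11, strides=4, activation="relu"),
        layers.MaxPooling2D(3, strides=2),
        layers.Conv2D(256, 5, padding="same", activation="relu"),
        layers.MaxPooling2D(3, strides=2),
        layers.Conv2D(384, 3, padding="same", activation="relu"),
        layers.Conv2D(384, 3, padding="same", activation="relu"),
        layers.Conv2D(256, 3, padding="same", activation="relu"),
        layers.Flatten(),
        layers.Dense(4096, activation="relu"),
        layers.Dense(4096, activation="relu"),
        layers.Dense(1000, activation="softmax"),  # 1000-dim static feature
    ])
```

The 1000-dimensional softmax output of this sketch serves as the per-frame static feature vector used in the fusion stage.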
3.1.2. Motion Frame Generation Based on Trajectory
In this section, trajectories calculated from the optical flow are used as a low-level description of the dynamic characteristics. To distinguish regions with different significant motions in a video frame, the DBSCAN-based trajectory clustering algorithm clusters the significant motion regions, forming multiple clusters per frame. To reduce the influence of noise and relatively insignificant motion regions, some clusters are removed: if the number of trajectory points in a cluster is less than 50% of the number of trajectory points in the largest cluster of the current frame, that cluster is removed. The remaining clusters form motion candidate frames. Each motion candidate frame obtained so far is a circular area centered on the cluster center with a fixed radius. To unify the motion frame scale in subsequent calculations, these circular regions must be transformed into rectangles: the Chebyshev distance from each point in the cluster to the cluster center is calculated, and the farthest 20% of points are discarded. The specific Chebyshev-based motion frame noise reduction is shown in Algorithm 1.
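A minimal sketch of the cluster filtering and Chebyshev pruning steps described above (the DBSCAN parameters `eps` and `min_samples` are illustrative assumptions, not values from the paper):

```python
import numpy as np
from sklearn.cluster import DBSCAN

def motion_candidate_clusters(points, eps=5.0, min_samples=4):
    """Cluster trajectory points with DBSCAN and drop insignificant clusters
    holding fewer than 50% of the largest cluster's points."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(points)
    clusters = [points[labels == k] for k in set(labels) if k != -1]
    if not clusters:
        return []
    largest = max(len(c) for c in clusters)
    return [c for c in clusters if len(c) >= 0.5 * largest]

def chebyshev_prune(cluster, keep_ratio=0.8):
    """Discard the 20% of points farthest (Chebyshev distance) from the centre."""
    centre = cluster.mean(axis=0)
    d = np.max(np.abs(cluster - centre), axis=1)  # Chebyshev distance
    keep = int(np.ceil(keep_ratio * len(cluster)))
    return cluster[np.argsort(d)[:keep]]
```

The surviving pruned clusters are then boxed into the rectangular motion frames used below.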
The clusters in each frame form motion frames through the Chebyshev-based motion frame denoising algorithm. Each motion frame is described by a vector (x, y, r, f), where x and y are the abscissa and ordinate of the upper left corner of the motion frame, r indicates the height of the motion frame, and f indicates which frame of the video sequence it belongs to.
3.1.3. Dynamic Feature Extraction Based on Sports Tube
Different significant moving areas in video frames are formed after clustering. However, the motion frames' sizes are not uniform, so the size and number of motion frames in each frame of the related frame sequence must be unified.
Assume the input video is V. First divide the video into n video segments in the time domain, so the video can be expressed as V = {v1, v2, ..., vn}. Because the different significant motion regions in each video frame form different motion frames after clustering, a video segment can be described by the motion frames in its frames, written as v_k = {b(k, i, j)}, where b(k, i, j) represents the jth motion frame in the ith frame of the kth video segment. It should be noted that the number of motion frames differs between frames. If the numbers are not equal, some motion frames may fail to constitute a motion tube, while motion frames that do not belong to a motion tube may still become part of one, thereby affecting the description of the motion. Therefore, to construct motion tubes across consecutive frames, the number of motion frames in each frame must be unified so that every motion frame can be connected between adjacent frames.
To unify the number of motion frames in each video segment, we first calculate the average number of motion frames per frame. Assume a video segment containing T consecutive frames has on average N motion frames per frame, N = (m1 + m2 + ... + mT) / T, where mi is the number of motion frames in the ith frame. If the number of motion frames in the current frame is greater than N, all motion frames beyond the first N are deleted. If the number of motion frames in the current frame is less than N, the missing motion-frame vectors are created by the linear regression method to ensure the same number of motion frames in every video frame. After the above steps, we obtain video segments with a uniform number of motion frames per frame.
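Under the assumption that each motion frame is stored as an (x, y, r) row, the truncate-or-regress completion described above can be sketched as follows (the per-column least-squares fit stands in for the linear-regression completion the text mentions):

```python
import numpy as np

def unify_box_count(frame_boxes, n_avg):
    """Truncate or pad each frame's (k, 3) motion-frame array to exactly n_avg
    rows. Padding fits a least-squares line per column over the existing boxes
    and extrapolates new rows."""
    out = []
    for boxes in frame_boxes:
        k = len(boxes)
        if k >= n_avg:
            out.append(boxes[:n_avg])  # drop surplus motion frames
        else:
            idx = np.arange(k)
            coeffs = [np.polyfit(idx, boxes[:, c], 1) for c in range(boxes.shape[1])]
            new_idx = np.arange(k, n_avg)
            extra = np.stack([np.polyval(p, new_idx) for p in coeffs], axis=1)
            out.append(np.vstack([boxes, extra]))  # regression-completed rows
    return out
```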
Then, the Euclidean distance matrix between the motion frames of two adjacent frames is calculated as D(i, i+1) = [d(p, q)], where D(i, i+1) contains the Euclidean distances of all motion frames in the ith and (i+1)th frames, and d(p, q) represents the Euclidean distance between the pth motion frame in the ith frame and the qth motion frame in the (i+1)th frame.
After calculating the Euclidean distances of all motion frames between adjacent frames, the motion frames of adjacent frames can be connected. The motion tube of the kth video segment is represented as a matrix: the first column gives the frame within the kth video segment, the second column gives the motion frame within that frame, and the following three columns give the abscissa, ordinate, and height of the motion frame that needs to be connected into the motion tube. At this point, the construction of the motion tube from the per-frame motion frames is complete. The finally generated dynamic feature vector is denoted M, with dimension 1000.
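A greedy nearest-neighbour sketch of the linking step (matching on box centres; the tie-breaking strategy is an assumption, since the text specifies only the Euclidean distance matrix):

```python
import numpy as np

def link_motion_tubes(frame_boxes):
    """frame_boxes: list over frames, each an (N, 3) array of (x, y, r) motion
    frames with N identical across frames. Links every box to its nearest
    unused box in the next frame, returning N tubes of per-frame box indices."""
    n = frame_boxes[0].shape[0]
    tubes = [[j] for j in range(n)]
    for t in range(len(frame_boxes) - 1):
        nxt = frame_boxes[t + 1]
        # Euclidean distance matrix between box centres of frames t and t+1
        d = np.linalg.norm(frame_boxes[t][:, None, :2] - nxt[None, :, :2], axis=2)
        used = []
        for tube in tubes:
            row = d[tube[-1]].copy()
            row[used] = np.inf  # each next-frame box joins at most one tube
            j = int(np.argmin(row))
            used.append(j)
            tube.append(j)
    return tubes
```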
3.2. Feature Fusion Based on Dynamic and Static Features
The previous two sections extracted static features and dynamic features, respectively. Both are important for the final representation, so this section fuses the two kinds of features at the theoretical level.
3.2.1. Feature Fusion Based on the Cholesky Transform
The Cholesky decomposition factors a positive definite Hermitian matrix into the product of a lower triangular matrix and its conjugate transpose. The integration of static and dynamic features can be considered in this way. Assume S and M represent the static and dynamic feature vectors, respectively, and let p denote the desired correlation between them. The 2 × 2 correlation matrix R = [[1, p], [p, 1]] admits the Cholesky factorization R = L L^T, where L = [[1, 0], [p, sqrt(1 - p^2)]].
Through the above matrix, applying L to the stacked features [S; M], we can get the fused feature
Y = p S + sqrt(1 - p^2) M.
At the same time, swapping the positions of S and M gives L [M; S].
In the same way, we can get the alternative fusion
Y' = p M + sqrt(1 - p^2) S.
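A numpy sketch of this fusion rule; taking the correlation coefficient p = 0.8 as the knob behind the 8 : 2 static-to-dynamic fusion ratio reported in the contributions is our reading, not an explicit statement of the paper:

```python
import numpy as np

def cholesky_fuse(S, M, p=0.8):
    """Fuse static (S) and dynamic (M) feature vectors through the Cholesky
    factor L of the 2x2 correlation matrix [[1, p], [p, 1]] = L @ L.T."""
    L = np.array([[1.0, 0.0],
                  [p, np.sqrt(1.0 - p ** 2)]])
    Y = L @ np.vstack([S, M])   # row 0 is S itself; row 1 is the fused feature
    return Y[1]                 # p * S + sqrt(1 - p^2) * M
```

Swapping the two arguments yields the alternative fusion p·M + sqrt(1 − p²)·S from the text.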
3.2.2. Feature Fusion Based on Gaussian Distribution
Since the static feature and the dynamic feature are each a set of vectors, the two can first be described by histograms and fitted with a Gaussian distribution function. The two-dimensional Gaussian distribution characterizing the static and dynamic features is
f(x, y) = (1 / (2π σ_M σ_S)) exp(-((x - μ_M)^2 / (2 σ_M^2) + (y - μ_S)^2 / (2 σ_S^2))),
where σ_M represents the standard deviation of the dynamic feature histogram, σ_S represents the standard deviation of the static feature histogram, μ_M represents the mean of the dynamic feature histogram, and μ_S represents the mean of the static feature histogram. Since the scales of the dynamic and static features in the Gaussian mixture model are inconsistent, a scale matrix should unify the two features' histograms before calculation, for example the diagonal normalization
Λ = [[1/σ_M, 0], [0, 1/σ_S]].
Introducing the scale matrix prevents the static or dynamic features from dominating the model at certain points while the influence of the other set of features is ignored.
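A sketch of the histogram-to-Gaussian step (the standard-deviation-based scale normalisation shown here is one plausible reading of the scale matrix, stated as an assumption):

```python
import numpy as np

def gaussian2d(x, y, mu_m, sigma_m, mu_s, sigma_s):
    """Axis-aligned 2-D Gaussian over dynamic (x) and static (y) feature values."""
    norm = 1.0 / (2.0 * np.pi * sigma_m * sigma_s)
    return norm * np.exp(-((x - mu_m) ** 2 / (2 * sigma_m ** 2)
                           + (y - mu_s) ** 2 / (2 * sigma_s ** 2)))

def fit_feature_gaussian(feat):
    """Fit a Gaussian to a feature vector's histogram: return (mean, std)."""
    return float(np.mean(feat)), float(np.std(feat))

def scale_normalise(static_feat, dynamic_feat):
    """Divide each feature by its own std so neither dominates the model."""
    _, sigma_s = fit_feature_gaussian(static_feat)
    _, sigma_m = fit_feature_gaussian(dynamic_feat)
    return static_feat / sigma_s, dynamic_feat / sigma_m
```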
3.3. Recognition of Basketball Training Actions Based on IoT and GRU
After the dynamic and static features are fused, a GRU processes the fused features to further capture their temporal information. The structure of the GRU-based basketball training action classification model is shown in Figure 2.

The left half of Figure 2 represents the fusion of static and dynamic features. The fused sequence is input to the GRU network [23–25], which captures the temporal characteristics of the fused features across the video sequence. The GRU and C3D units use 128 hidden features with a dropout of 0.8, and the final output passes through a softmax layer to complete the recognition of the basketball training video.
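A hedged Keras sketch of the GRU head (the sequence length and class count are illustrative assumptions; the text specifies only 128 units and a dropout of 0.8):

```python
from tensorflow.keras import layers, models

def build_gru_head(seq_len=16, feat_dim=1000, num_classes=10):
    """GRU over the fused feature sequence: 128 units, dropout 0.8, softmax out."""
    return models.Sequential([
        layers.Input(shape=(seq_len, feat_dim)),
        layers.GRU(128),           # captures temporal structure of fused features
        layers.Dropout(0.8),       # dropout of 0.8 as stated in the text
        layers.Dense(num_classes, activation="softmax"),
    ])
```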
4. Experiments and Results
In this section, we discuss the experimental environment, experimental setup, and experimental results.
4.1. Experimental Environment
Since the experiments in this paper require training deep neural networks, whose structures are complex and whose computation scale is large, a suitable environment is needed. The programming language used is Python 3.6, the deep learning framework is Keras 2.1.5, and the IDE for program deployment is PyCharm. All experiments are conducted in the same environment, on a desktop PC with an Intel Core i7-8700 processor and an NVIDIA GeForce GTX 1080 Ti GPU.
4.2. Experimental Setup
Our data set is a compilation of past basketball training videos used for the variable-scale video behavior feature extraction experiments, verifying the effectiveness of the variable-scale feature extraction algorithm based on a 3D convolutional neural network. This section compares feature extraction using a general 3D convolutional neural network with feature extraction using a pyramid pooling 3D convolutional neural network. For training the video behavior feature extraction model, two methods are also compared: fixed input scale training (training from scratch or using pretrained weights) and variable input scale training. For variable input scale training, the input video sizes are 16 × 112 × 112, 16 × 220 × 220, and 32 × 112 × 112, where the first number is the frame length (number of frames) and the remaining two numbers are the video frame size. In this experiment, the initial learning rate is set to 0.001; after 15,000 and 20,000 iterations, the learning rate is reduced to one-tenth of its previous value, and each batch contains 30 training samples.
4.3. Experimental Results
First, the experiment observes the impact of variable-scale training and 3D pyramid pooling on the video behavior feature extraction model. In this experiment, only the video's original RGB frames are used as input, and the video behavior recognition accuracy of each model on the data set is calculated. The experimental results are shown in Table 1. The results of variable-scale training are better than fixed-scale training by about 1%, and no matter which training method is used, the model with the 3D pyramid pooling layer outperforms the general 3D convolutional neural network. This may be because pretraining and variable-scale training better reduce overfitting, while 3D pyramid pooling better extracts and preserves temporal and spatial information.
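To illustrate why pyramid pooling removes the fixed-size constraint, here is a minimal numpy sketch of 3-D max pyramid pooling (the pyramid levels are an assumption; the output length depends only on the levels and channel count, not on the input volume size):

```python
import numpy as np

def pyramid_pool_3d(volume, levels=(1, 2, 4)):
    """3-D max pyramid pooling: pools a (T, H, W, C) feature volume into a
    fixed-length vector regardless of T, H, W (assuming each dim >= max level)."""
    T, H, W, C = volume.shape
    out = []
    for n in levels:
        # split each axis into n roughly equal bins and max-pool each cell
        ts = np.array_split(np.arange(T), n)
        hs = np.array_split(np.arange(H), n)
        ws = np.array_split(np.arange(W), n)
        for ti in ts:
            for hi in hs:
                for wi in ws:
                    cell = volume[np.ix_(ti, hi, wi, np.arange(C))]
                    out.append(cell.max(axis=(0, 1, 2)))
    return np.concatenate(out)
```

Two volumes of different spatial and temporal extent pool to vectors of identical length, which is what lets the C3D network accept videos of non-fixed size.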
Figure 3 shows the recognition results of the proposed algorithm on basketball training actions, demonstrating the effectiveness of the algorithm in this paper.

5. Conclusion
This paper proposes a basketball training algorithm based on big data and the Internet of Things. Going beyond the recognition of single actions, this paper studies continuous action recognition and designs a video feature fusion method that fuses the movement trajectories of key characters with background information. It then proposes a two-stream improved C3D video classification method for basketball player actions to complete action recognition and classification. In response to the current lack of guidance on scientific sports training plans for basketball players, which leads to poor overall results, a scientific sports training plan based on the BMI index is also proposed. By analyzing the large volumes of basketball data uploaded through the Internet of Things, personalized scientific sports training plans can be customized for different basketball athletes.
Data Availability
The data used to support the findings of this study are included within the article.
Conflicts of Interest
The authors declare that they have no conflicts of interest.