Abstract
With the improvement of the level of sports competition technology, pure sports game broadcasting can no longer meet the requirements of various users, who urgently need new technologies for quickly obtaining game information. The 5G Internet of Things is rising rapidly, and volleyball is gradually becoming part of people’s lives. To meet these needs, research on volleyball video moving target tracking and detection algorithms based on 5G Internet of Things communication and artificial intelligence has become very important. This article compares tracking and detection algorithms for moving targets in volleyball videos based on 5G Internet of Things communication and artificial intelligence. For moving target detection, several commonly used background detection methods are compared and analyzed, and a method based on adaptive background difference is proposed. The algorithm uses a Gaussian mixture model for background modeling and the OTSU threshold algorithm to determine the segmentation threshold T, and it proposes a background update method that combines expected sufficient statistics with an L-recent window to accelerate background updating, so that a more accurate moving foreground target is obtained. Experimental results show that the algorithm detects and tracks moving targets in volleyball videos well in a fixed scene or when the moving targets are not occluded, with a detection accuracy of more than 92%.
1. Introduction
1.1. Background
As an important type of video data in modern life, sports video has broad application prospects and a large audience, and it has attracted widespread attention from the academic community. The paper analyzes the communication mode and the current state of the industry and proposes that content should be flexible and innovative, that communication should develop toward mobile, multiscreen, and social modes, and that the industry should form an open and dynamic industry chain. Because the level of sports competition technology keeps improving, pure sports game broadcasting can no longer meet the needs of various users, who urgently need new technical means for rapidly acquiring game information. For example, audiences are no longer satisfied with passively waiting and watching, so accurately extracting a video summary during the live broadcast of a game is particularly important; coaches need to extract specific information about athletes accurately in order to formulate game plans, analyze game strategies, or evaluate players’ performance. These users are most interested in the detection and tracking of sports targets, the extraction of motion trajectories, and semantic analysis built on them. Therefore, the analysis of sports videos focuses on sports targets, namely balls and athletes.
1.2. Significance
Detecting and tracking sports targets in sports videos can provide considerable help to professionals. For example, athletes can extract relevant data from game videos for tactical research and use these data for three-dimensional reconstruction and for the realistic design, simulation, and analysis of technical movements to improve their training. Referees can analyze the game images captured by cameras, extract the targets in the region of interest, and use scientific methods to make more accurate calls, ensuring fairness in intense competitions. Research on tracking and detection algorithms for moving targets in volleyball video can therefore support volleyball training and competition.
1.3. Related Work
As noted above, pure sports game broadcasting can no longer meet the requirements of all kinds of users, who need new technologies for quickly obtaining game information. Yu et al. pointed out that tracking player actions in sports video sequences is a hot topic in computer vision. Because the state transition equation and the observation equation of a target tracking system are often nonlinear and non-Gaussian, the mean shift algorithm cannot track visual targets effectively. After analyzing the principle of the traditional mean shift algorithm and the reasons for its weaknesses, they proposed a tracking algorithm that combines a particle filter with mean shift to track fast-moving targets: the particle filter estimates the target position based on the previous frame, and the mean shift algorithm then updates this position. Experimental comparisons show that the fused method performs better when tracking fast-moving players in sports videos, although it is not yet mature for practical applications [1]. Chen and Hao systematically studied feature dimension reduction for wireless communication signals, taking RF fingerprint identification of power amplifiers as an example; they reduced the dimensionality of high-dimensional RF fingerprint features and removed irrelevant or redundant features from the feature space, but the cost of the IoT devices involved is relatively high [2]. Hassabis et al. pointed out that neuroscience and artificial intelligence (AI) have a long and intertwined history, but that exchange and collaboration between the two fields have recently become less common. They argue that a better understanding of biological brains can play a vital role in building intelligent machines, survey the historical interaction between AI and neuroscience, and highlight current advances in AI inspired by research on neural computation in humans and other animals, although the downsides of this approach also deserve attention [3].
1.4. Main Content
This article explains the research background and significance of the subject and uses neural network algorithms to track moving targets in volleyball videos. For moving target detection, several commonly used background detection methods are compared and analyzed, and an adaptive background difference method is proposed. The algorithm uses a Gaussian mixture model for background modeling and the OTSU threshold algorithm to determine the segmentation threshold T, and it proposes a background update method that combines expected sufficient statistics with an L-recent window to accelerate background updating and obtain a more accurate moving foreground target. By introducing the Internet of Things and artificial intelligence, moving targets in volleyball videos are tracked and detected over the Internet of Things. The experimental results show that the algorithm detects and tracks moving objects in volleyball videos well in a fixed scene or when the moving objects are not occluded, with a detection accuracy of more than 90%.
2. Volleyball Video Sports Target Tracking Method
2.1. Forward and Feedback Neural Networks
Neural networks can be divided into two categories according to the topology of the connections between neurons [4]: forward (feedforward) neural networks and feedback neural networks, as shown in Figures 1 and 2. (1) A forward network is organized into several “layers” that receive signals in sequence: each layer accepts only the signals output by the preceding layer, there are no connections between neurons within the same layer, and information is passed forward starting from the input layer, as shown in Figure 1.


As shown in Figure 2, the perceptron is a simple forward network that can be used to solve linearly separable problems. Given a set of inputs, the weights of the perceptron are learned [5] so that a linearly separable data set is correctly divided into two categories. Stacking single perceptrons forms a multilayer perceptron. The perceptron is a typical artificial neural network structure: it has a simple structure, a convergent learning algorithm for the problems it can solve, and a rigorous mathematical proof of convergence, and it played an important role in promoting neural network research. (2) A feedback network is somewhat more complicated than a forward network. Each node of the network [6, 7] represents a computing unit that receives external input as well as information fed back from other nodes, and each node can also produce output directly.
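The perceptron learning rule mentioned above can be sketched as follows. This is a minimal NumPy illustration of the classic Rosenblatt update, assuming labels in {-1, +1} and a separate bias term; the function names `train_perceptron` and `predict` are hypothetical and not taken from the paper.

```python
import numpy as np


def train_perceptron(X, y, lr=1.0, max_epochs=100):
    """Classic perceptron learning rule.

    X: (n_samples, n_features) inputs; y: labels in {-1, +1}.
    On linearly separable data the loop stops once every sample is
    classified correctly (perceptron convergence theorem).
    """
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(max_epochs):
        errors = 0
        for xi, yi in zip(X, y):
            # Misclassified (or on the boundary) when yi * (w.xi + b) <= 0.
            if yi * (xi @ w + b) <= 0:
                w += lr * yi * xi
                b += lr * yi
                errors += 1
        if errors == 0:
            break
    return w, b


def predict(X, w, b):
    """Return +1 for points on the positive side of the hyperplane, else -1."""
    return np.where(X @ w + b > 0, 1, -1)


# Usage: the logical AND of two binary inputs is linearly separable.
X_and = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y_and = np.array([-1, -1, -1, 1])
w, b = train_perceptron(X_and, y_and)
```

A single perceptron cannot learn XOR, which is not linearly separable; that limitation is what stacking perceptrons into a multilayer network removes.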
2.2. Tracking Algorithm Based on Convolutional Neural Network
The convolutional neural network is an emerging topic in machine learning research, a field that draws on probability theory, statistics, approximation theory, and the theory of algorithmic complexity, uses the computer as a tool to simulate human learning realistically and in real time, and organizes existing knowledge into structures to improve learning efficiency. Convolutional neural networks have strong feature representation capabilities and are applied to image classification and target recognition. Most existing CNN-based tracking algorithms train large deep networks offline on the ImageNet data set and, during tracking, feed a set of positive and negative samples into the network. Because these samples overlap, this causes repeated computation over similar regions and generally poor real-time performance; many researchers have therefore sacrificed accuracy to improve speed, even though accuracy is very important for tracking. In this article, we propose a tracking algorithm that uses a convolutional neural network as a feature extractor and applies the idea of transfer learning to extract features. ImageNet is a well-known data set in the computer vision field, and the data set used in the ILSVRC competition is a subset of ImageNet.
2.2.1. Softmax Classifier
Softmax is the most commonly used classifier in convolutional neural networks. As mentioned in the previous section, a convolutional neural network must be followed by a classifier to complete classification and recognition tasks [8]. The Softmax classifier used in this article offers high classification accuracy and simple training. Unlike the SVM classifier, which directly outputs a class decision, it maps the input pixel vector to class scores and then maps these scores to probabilities.
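The Softmax regression model generalizes the logistic regression model to multiclass problems. Specifically, suppose the training set contains s labeled samples $\{(x^{(1)}, y^{(1)}), \ldots, (x^{(s)}, y^{(s)})\}$ with input features $x^{(i)}$ and labels $y^{(i)} \in \{1, 2, \ldots, k\}$. The hypothesis function outputs a k-dimensional vector whose elements sum to 1 and represent the estimated probabilities of the k classes:
$$h_\theta(x^{(i)}) = \begin{bmatrix} p(y^{(i)}=1 \mid x^{(i)};\theta) \\ \vdots \\ p(y^{(i)}=k \mid x^{(i)};\theta) \end{bmatrix} = \frac{1}{\sum_{l=1}^{k} e^{\theta_l^{T} x^{(i)}}} \begin{bmatrix} e^{\theta_1^{T} x^{(i)}} \\ \vdots \\ e^{\theta_k^{T} x^{(i)}} \end{bmatrix}$$
In the formula, θ denotes the model parameters, and $\theta_j^{T} x^{(i)}$ is the score of class j. The parameters θ are trained to minimize the cross-entropy loss function:
$$J(\theta) = -\frac{1}{s}\sum_{i=1}^{s}\sum_{j=1}^{k} 1\{y^{(i)}=j\}\,\log\frac{e^{\theta_j^{T} x^{(i)}}}{\sum_{l=1}^{k} e^{\theta_l^{T} x^{(i)}}}$$
where $1\{\cdot\}$ is the indicator function.
For any given input x, the hypothesis function estimates the probability $p(y=j \mid x)$ of each category j. With k categories there are k estimates, and the Softmax classifier can be written as follows:
$$p(y=j \mid x;\theta) = \frac{e^{\theta_j^{T} x}}{\sum_{l=1}^{k} e^{\theta_l^{T} x}}, \qquad j = 1, \ldots, k$$
where $1/\sum_{l=1}^{k} e^{\theta_l^{T} x}$ normalizes the probabilities so that they sum to 1, and $\theta_1, \theta_2, \ldots, \theta_k$ are the parameter vectors of the model. Stacking them as rows, θ is expressed as follows:
$$\theta = \begin{bmatrix} \theta_1^{T} \\ \theta_2^{T} \\ \vdots \\ \theta_k^{T} \end{bmatrix}$$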
Once the score of each category has been obtained, the network can perform recognition and classification [9]. Tracking can be regarded as a binary classification task that distinguishes the target from the background, so this network can be used to complete the target tracking task.
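A minimal NumPy sketch of the Softmax hypothesis, the loss J(θ), and plain batch gradient descent follows. It assumes zero-based labels 0..k-1 (the formulas above use 1..k), stores θ as a (k, n) matrix whose rows are $\theta_j^T$, and uses a made-up two-class toy set standing in for target/background samples; the function names are hypothetical.

```python
import numpy as np


def softmax_probs(theta, X):
    """p(y = j | x; theta) for every row x of X.

    theta: (k, n) matrix whose row j is theta_j^T; X: (s, n). Returns (s, k).
    """
    scores = X @ theta.T                                 # theta_j^T x for each class j
    scores = scores - scores.max(axis=1, keepdims=True)  # stability shift; cancels in the ratio
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum(axis=1, keepdims=True)


def softmax_loss(theta, X, y):
    """Cross-entropy J(theta) averaged over the s samples; y holds labels 0..k-1."""
    p = softmax_probs(theta, X)
    return -np.mean(np.log(p[np.arange(X.shape[0]), y]))


def train_softmax(X, y, k, lr=0.5, epochs=500):
    """Batch gradient descent on J(theta), starting from theta = 0."""
    s, n = X.shape
    theta = np.zeros((k, n))
    onehot = np.eye(k)[y]                                # indicator 1{y_i = j}
    for _ in range(epochs):
        p = softmax_probs(theta, X)
        grad = -(onehot - p).T @ X / s                   # dJ/dtheta, shape (k, n)
        theta -= lr * grad
    return theta


# Usage: a toy two-class problem (first column is a constant bias feature).
X_toy = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]])
y_toy = np.array([0, 0, 1, 1])  # e.g. 0 = background, 1 = target
theta = train_softmax(X_toy, y_toy, k=2)
probs = softmax_probs(theta, X_toy)
```

Subtracting the row maximum before exponentiating leaves the probabilities unchanged, because the common factor cancels in the normalization, but it prevents overflow for large scores.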
2.2.2. Bounding Box Regression
Because the convolutional neural network extracts high-level abstract features and the highest-scoring sample among multiple positive samples must be selected to locate the target, localization is prone to inaccuracy. The bounding box regression widely used in the object detection field is therefore added to the algorithm model [10].
To make the prediction more accurate, a mapping must be found that moves the original box closer to the ground-truth box, yielding the predicted box; this mapping should produce a predicted box that is as accurate as possible.
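In practice, the mapping consists of two steps.
First, translate the box by $(\Delta x, \Delta y)$, where $\Delta x = P_w d_x(P)$ and $\Delta y = P_h d_y(P)$. Let the input bounding box be $P = (P_x, P_y, P_w, P_h)$, given by its center coordinates, width, and height, and let the predicted box be $\hat{G} = (\hat{G}_x, \hat{G}_y, \hat{G}_w, \hat{G}_h)$. Then
$$\hat{G}_x = P_w d_x(P) + P_x, \qquad \hat{G}_y = P_h d_y(P) + P_y$$
Then, scale the box by $(S_w, S_h)$, where $S_w = \exp(d_w(P))$ and $S_h = \exp(d_h(P))$:
$$\hat{G}_w = P_w \exp(d_w(P)), \qquad \hat{G}_h = P_h \exp(d_h(P))$$
From these four formulas, we need to learn the four functions $d_x(P)$, $d_y(P)$, $d_w(P)$, and $d_h(P)$ to obtain the predicted value $\hat{G}$. The following explains how they are learned.
The goal of the bounding box regression model is to make the predicted value as close as possible to the ground-truth value $G = (G_x, G_y, G_w, G_h)$; the corresponding regression targets are
$$t_x = \frac{G_x - P_x}{P_w}, \quad t_y = \frac{G_y - P_y}{P_h}, \quad t_w = \log\frac{G_w}{P_w}, \quad t_h = \log\frac{G_h}{P_h}$$
The objective function is expressed as follows:
$$d_*(P) = w_*^{T}\,\Phi(P), \qquad * \in \{x, y, w, h\}$$
where $\Phi(P)$ is the feature vector of the input box P extracted by the convolutional neural network and $w_*$ is the parameter vector to be learned.
The difference between the predicted value and the true value is $t_* - w_*^{T}\Phi(P)$, and it is reduced by minimizing the loss function [11]:
$$\text{Loss} = \sum_{i=1}^{N}\left(t_*^{i} - w_*^{T}\Phi(P^{i})\right)^2$$
The optimization goal, with a regularization term weighted by λ, is
$$w_* = \arg\min_{\hat{w}_*} \sum_{i=1}^{N}\left(t_*^{i} - \hat{w}_*^{T}\Phi(P^{i})\right)^2 + \lambda\,\lVert \hat{w}_* \rVert^2$$
Finally, the least squares method [12] is used to find $w_*$.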
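The box transforms and the regularized least-squares fit can be sketched in NumPy. Boxes are assumed to be stored as (center x, center y, width, height) rows; the feature matrix `Phi` stands in for the CNN features $\Phi(P^i)$ and is filled with random numbers purely for illustration, and the default `lam` is a hypothetical value. Solving the ridge problem in closed form is one way to carry out the least squares step.

```python
import numpy as np


def bbox_targets(P, G):
    """Targets (t_x, t_y, t_w, t_h) for boxes stored as rows (x_center, y_center, w, h)."""
    tx = (G[:, 0] - P[:, 0]) / P[:, 2]
    ty = (G[:, 1] - P[:, 1]) / P[:, 3]
    tw = np.log(G[:, 2] / P[:, 2])
    th = np.log(G[:, 3] / P[:, 3])
    return np.stack([tx, ty, tw, th], axis=1)


def apply_deltas(P, d):
    """Translate, then scale, each box P by d = (d_x, d_y, d_w, d_h) to get G_hat."""
    gx = P[:, 2] * d[:, 0] + P[:, 0]
    gy = P[:, 3] * d[:, 1] + P[:, 1]
    gw = P[:, 2] * np.exp(d[:, 2])
    gh = P[:, 3] * np.exp(d[:, 3])
    return np.stack([gx, gy, gw, gh], axis=1)


def fit_regressor(Phi, T, lam=1.0):
    """Closed-form ridge least squares: argmin_W ||T - Phi W||^2 + lam ||W||^2.

    Phi: (N, m) features Phi(P^i); T: (N, 4) targets.
    Column c of the result is w_* for * = x, y, w, h.
    """
    m = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(m), Phi.T @ T)


# Usage: the transforms invert each other, and the fit recovers known weights.
P = np.array([[50.0, 50.0, 20.0, 40.0], [10.0, 20.0, 8.0, 8.0]])
G = np.array([[55.0, 48.0, 25.0, 30.0], [12.0, 22.0, 10.0, 6.0]])
G_hat = apply_deltas(P, bbox_targets(P, G))

rng = np.random.default_rng(0)
Phi = rng.normal(size=(50, 5))      # placeholder features for 50 training boxes
W_true = rng.normal(size=(5, 4))    # one weight column per * in {x, y, w, h}
W_fit = fit_regressor(Phi, Phi @ W_true, lam=1e-8)
```

At inference time, the predicted offsets for a new box would be `Phi_new @ W`, which `apply_deltas` turns into the refined box.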
2.3. Moving Target Detection Algorithm
According to the characteristics of volleyball game footage, the background difference method is used to detect the target. First, a Gaussian mixture model is used for modeling to obtain the background model. Each frame is median filtered [13] to remove noise, the background is subtracted, and the adaptive OTSU threshold segmentation algorithm is used to detect moving targets. The background model is updated in real time according to changes in the scene, and finally the detection results are refined with mathematical morphology [14] to obtain accurate targets. The flowchart of this algorithm is shown in Figure 3.
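The median filtering step can be illustrated with a small NumPy sketch. It assumes a single-channel (grayscale) frame, a 3 × 3 window, and edge replication at the borders; these are illustrative choices, since the window size used in the paper is not stated, and `median_filter` is a hypothetical name.

```python
import numpy as np


def median_filter(img, size=3):
    """size x size median filter for a 2-D grayscale image (edge-replicated borders)."""
    r = size // 2
    padded = np.pad(img, r, mode="edge")
    h, w = img.shape
    # Stack every shifted copy of the image covered by the window, then take
    # the per-pixel median across the stack.
    windows = np.stack([padded[dy:dy + h, dx:dx + w]
                        for dy in range(size) for dx in range(size)], axis=0)
    return np.median(windows, axis=0).astype(img.dtype)


# Usage: an isolated "salt" pixel is removed while the flat background is kept.
frame = np.zeros((5, 5), dtype=np.uint8)
frame[2, 2] = 255
clean = median_filter(frame)
```

Unlike a mean filter, the median discards an isolated outlier entirely instead of smearing it over its neighbours, which is why it suits impulse noise before background subtraction.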

2.3.1. Building the Background Model with a GMM
In a dynamically changing scene, each pixel can be represented by l states [15]. The basic idea of the Gaussian mixture model is to define l states for each pixel, each represented by a Gaussian function; that is, the model decomposes the observed pixel values into several components, each described by a Gaussian probability density function. Some of these states represent the background model, and the rest represent the moving foreground model. Assuming that the value of a given pixel at time t is $X_t$, its probability density function can be expressed as
$$P(X_t) = \sum_{i=1}^{l} \omega_{i,t}\,\eta(X_t, \mu_{i,t}, \Sigma_{i,t}), \qquad \eta(X_t, \mu, \Sigma) = \frac{1}{(2\pi)^{n/2}\lvert\Sigma\rvert^{1/2}} \exp\!\left(-\tfrac{1}{2}(X_t-\mu)^{T}\Sigma^{-1}(X_t-\mu)\right)$$
where n is the dimension of $X_t$.
In the formula, $\omega_{i,t}$, $\Sigma_{i,t}$, and $\mu_{i,t}$ represent the weight, covariance matrix, and mean of the i-th Gaussian distribution at time t, respectively. In probability theory and statistics, covariance measures the joint variability of two variables. To simplify the model, the R, G, and B channels of each pixel can be regarded as mutually independent with the same variance; then
$$\Sigma_{i,t} = \sigma_{i,t}^{2} I$$
The Gaussian distributions are sorted in descending order of $\omega_{i,t}/\sigma_{i,t}$, and the first B distributions are taken as the background distribution using the threshold $T_B$, namely
$$B = \arg\min_{b}\left(\sum_{k=1}^{b} \omega_{k,t} > T_B\right)$$
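The mixture density and the background-component selection can be sketched for a single RGB pixel. The sketch assumes the simplified shared-variance covariance $\sigma_{i,t}^2 I$, weights that already sum to 1, and a hypothetical default $T_B = 0.7$; the function names and the sample component values are illustrative only.

```python
import numpy as np


def mixture_density(x, weights, means, sigmas):
    """P(X_t) = sum_i w_i * eta(X_t; mu_i, sigma_i^2 I) for one pixel.

    x: (n,) pixel value (n = 3 for RGB); weights: (l,); means: (l, n);
    sigmas: (l,) standard deviation shared by all channels of component i.
    """
    n = x.shape[0]
    sq_dist = np.sum((x - means) ** 2, axis=1)      # (X_t - mu_i)^T (X_t - mu_i)
    # (2 pi)^(n/2) |Sigma_i|^(1/2) equals (2 pi sigma_i^2)^(n/2) when Sigma_i = sigma_i^2 I.
    norm = (2.0 * np.pi * sigmas ** 2) ** (n / 2.0)
    eta = np.exp(-0.5 * sq_dist / sigmas ** 2) / norm
    return float(np.sum(weights * eta))


def background_components(weights, sigmas, T_B=0.7):
    """Indices of the first B components, in descending w/sigma order,
    whose cumulative weight first exceeds T_B."""
    order = np.argsort(-(weights / sigmas))
    cumulative = np.cumsum(weights[order])
    B = int(np.argmax(cumulative > T_B)) + 1         # smallest b with sum > T_B
    return order[:B]


# Usage: three components for one pixel.
w = np.array([0.6, 0.3, 0.1])
mu = np.array([[120.0, 110.0, 100.0], [40.0, 40.0, 40.0], [200.0, 10.0, 10.0]])
sd = np.array([5.0, 10.0, 30.0])
p_bg = mixture_density(np.array([121.0, 109.0, 101.0]), w, mu, sd)
bg = background_components(w, sd)
```

Ranking by $\omega/\sigma$ favours components that are both frequent and stable (narrow), which is what a static background looks like.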
(1) Model Matching. The current pixel value is compared with each of the existing distribution models in turn, using 2.5 times the standard deviation as the matching criterion: a model that satisfies $\lvert X_t - \mu_{i,t-1}\rvert \le 2.5\,\sigma_{i,t-1}$ is identified as a matching model [16], and the remaining models are identified as nonmatching models.
(2) Background Update Rule [17]. The traditional adaptive Gaussian mixture model handles slow changes in dynamic scenes and interference from various moving objects robustly. However, its learning rate in the initial stage is too slow, especially when the scene is complex. The update method is therefore divided into two parts: for the first L frames, the model uses the expected sufficient statistics update to speed up convergence, and for frames after the L-th frame, the L-recent window update is used. Optical flow, a related concept in motion detection, describes the apparent movement of the surface or edges of an observed object in the field of view caused by the movement of the observer.
The improved adaptive Gaussian mixture model has a relatively fast learning rate in the initial stage and can robustly handle various changes in dynamic scenes.
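A simplified per-pixel sketch of the matching and two-phase update rules follows, for a grayscale pixel. It assumes hard ownership (only the first matching component in descending $\omega/\sigma$ order is updated), a learning rate of 1/(n+1) during the first L frames and 1/L afterwards as a stand-in for the two-phase update described above, and hypothetical defaults for l, L, the initial variance and weight, and a variance floor. It illustrates the idea rather than reproducing the paper's exact update equations.

```python
import numpy as np


class PixelGMM:
    """Adaptive Gaussian mixture for one grayscale pixel (simplified sketch)."""

    def __init__(self, l=3, L=50, init_var=900.0, init_weight=0.05, min_var=4.0):
        self.w = np.zeros(l)             # component weights
        self.mu = np.zeros(l)            # component means
        self.var = np.full(l, init_var)  # component variances
        self.L = L
        self.init_var = init_var
        self.init_weight = init_weight
        self.min_var = min_var           # floor so sigma never collapses to zero
        self.n = 0                       # frames processed so far

    def update(self, x):
        """Process one pixel value; return the index of the component that explains it."""
        # Fast learning for the first L frames, then a fixed 1/L window.
        alpha = 1.0 / (self.n + 1) if self.n < self.L else 1.0 / self.L
        sigma = np.sqrt(self.var)
        order = np.argsort(-(self.w / sigma))
        # Model matching: |x - mu_i| <= 2.5 sigma_i, first match in w/sigma order.
        matches = [i for i in order if abs(x - self.mu[i]) <= 2.5 * sigma[i]]
        if matches:
            k = matches[0]
            owner = np.zeros_like(self.w)
            owner[k] = 1.0
            self.w = (1.0 - alpha) * self.w + alpha * owner
            rho = alpha / self.w[k]      # per-component rate, never exceeds 1
            self.mu[k] += rho * (x - self.mu[k])
            self.var[k] += rho * ((x - self.mu[k]) ** 2 - self.var[k])
            self.var[k] = max(self.var[k], self.min_var)
        else:
            # No match: replace the least probable component with one centred on x.
            k = order[-1]
            self.w = (1.0 - alpha) * self.w
            self.mu[k], self.var[k], self.w[k] = x, self.init_var, self.init_weight
        self.w /= self.w.sum()
        self.n += 1
        return int(k)


# Usage: a static background value followed by a sudden foreground value.
pixel = PixelGMM()
for _ in range(60):
    bg_index = pixel.update(100.0)
fg_index = pixel.update(200.0)
```

With 1/(n+1) the early estimates are simple running averages, so the model locks onto the background within a few frames; switching to 1/L afterwards keeps it adaptive to slow scene changes.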
(3) OTSU Threshold Selection. The selection of the threshold is one of the keys to the background difference method. Choosing a threshold that is too high or too low will affect the extraction effect of the moving target.
(4) Parameter (P-Tile) Method. Suppose the area of the original image is S and the area of the segmented object is $S_o$; then the object occupies the fraction $P = S_o / S$ of the image. Therefore, a value of P is specified first, and the threshold can then be roughly obtained from the image histogram [18].
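The P-tile rule can be sketched for an 8-bit grayscale image. The sketch assumes that the object is brighter than the background and that the fraction P is known in advance; `p_tile_threshold` is a hypothetical name.

```python
import numpy as np


def p_tile_threshold(gray, p):
    """Smallest gray level T such that at most a fraction p of pixels are brighter than T.

    gray: uint8 image; p = S_o / S, the expected object-to-image area ratio.
    Assumes a bright object on a darker background.
    """
    hist = np.bincount(gray.ravel(), minlength=256)
    # above[t] = number of pixels with value strictly greater than t.
    above = np.cumsum(hist[::-1])[::-1] - hist
    candidates = np.nonzero(above <= p * gray.size)[0]
    return int(candidates[0])


# Usage: 80 background pixels at level 50 and 20 object pixels at level 200.
img = np.array([50] * 80 + [200] * 20, dtype=np.uint8).reshape(10, 10)
T = p_tile_threshold(img, p=0.2)
```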
(5) Peak and Valley Method of the Gray Histogram. If the gray difference between the target and the background in an image is large, the histogram shows two peaks, as shown in Figure 4(a). The gray value at the bottom of the valley between the two peaks is selected as the threshold for binarizing the image, which separates the target from the background. This method is intuitive and makes threshold selection simple, but it is not suitable for images with complex backgrounds.

(a) Bimodal histogram

(b) Differential histogram
A histogram has two peaks when the observations come from two populations whose data are mixed together. Both the bimodal histogram and the differential histogram are used for image recognition. When the boundary between the target and the background changes sharply, for example at edges, the gray value of the image cannot be used directly; instead, the differential value [19] should be used to determine the threshold, as shown in Figure 4(b).
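A minimal sketch of the peak-and-valley rule follows. It assumes a bimodal 8-bit histogram, smooths the histogram with a short moving average (the window size is a hypothetical choice), takes the highest bin as the first peak, picks the second peak with the common distance-weighted heuristic $\max_k (k - p_1)^2 h(k)$, and returns the lowest bin between the two peaks; `valley_threshold` is a hypothetical name.

```python
import numpy as np


def valley_threshold(gray, smooth=5):
    """Peak-and-valley threshold for a bimodal uint8 image histogram."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    # Moving-average smoothing suppresses spurious local minima.
    h = np.convolve(hist, np.ones(smooth) / smooth, mode="same")
    p1 = int(np.argmax(h))                                # highest peak
    # Second peak: favour high bins that lie far from the first peak.
    p2 = int(np.argmax(h * (np.arange(256) - p1) ** 2))
    lo, hi = sorted((p1, p2))
    return lo + int(np.argmin(h[lo:hi + 1]))              # valley bottom between peaks


# Usage: two well-separated gray populations.
img = np.array([60] * 500 + [180] * 500, dtype=np.uint8).reshape(20, 50)
T = valley_threshold(img)
```

Any gray level inside a flat, empty valley separates the two populations equally well; this sketch simply returns the first minimum it meets.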
(6) Discriminant Analysis Method. The basic idea of the discriminant analysis method is to select an optimal separation value T and use it to divide the gray levels of the histogram into two groups, one with gray levels not greater than T and one with gray levels greater than T. The separation value that maximizes the ratio of the between-group variance of the gray-level means to the within-group variance is the best.
In this article, we use the OTSU algorithm [20] to select the threshold for the binary image. The OTSU algorithm, also known as the “Maximum Between-Class Variance Method,” is widely regarded as one of the best methods for automatic threshold selection. It analyzes the histogram of the input grayscale image, divides it into background and foreground, and takes the gray level that maximizes the variance between the two parts as the threshold. If part of the target is misclassified as background, or part of the background as target, the between-class variance decreases; maximizing the between-class variance therefore minimizes the probability of misclassification.
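The OTSU criterion can be computed from cumulative histogram moments. The sketch below, for an 8-bit grayscale image, evaluates the between-class variance $\sigma_B^2(T) = (\mu_T\,\omega_0(T) - \mu(T))^2 / (\omega_0(T)\,(1 - \omega_0(T)))$ for every candidate T, where $\omega_0$ is the probability of the class at or below T, $\mu(T)$ its cumulative first moment, and $\mu_T$ the global mean, and returns the maximizer. The function name and the synthetic test image are illustrative assumptions.

```python
import numpy as np


def otsu_threshold(gray):
    """Threshold T maximizing the between-class variance of a uint8 image.

    Class 0 holds gray levels <= T and class 1 holds levels > T.
    """
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    omega0 = np.cumsum(prob)                      # probability of class 0
    mu = np.cumsum(prob * np.arange(256))         # cumulative first moment
    mu_total = mu[-1]                             # global mean gray level
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b2 = (mu_total * omega0 - mu) ** 2 / (omega0 * (1.0 - omega0))
    # Candidates that leave one class empty give 0/0; treat them as non-optimal.
    sigma_b2 = np.nan_to_num(sigma_b2, nan=0.0, posinf=0.0, neginf=0.0)
    return int(np.argmax(sigma_b2))


# Usage: two noisy gray-level populations centred on 60 and 180.
rng = np.random.default_rng(0)
pixels = np.concatenate([rng.normal(60, 10, 5000), rng.normal(180, 10, 5000)])
img = np.clip(pixels, 0, 255).astype(np.uint8).reshape(100, 100)
T = otsu_threshold(img)
```

Because the histogram is computed once and every candidate is scored with cumulative sums, the cost is linear in the number of gray levels, which helps when the threshold has to be recomputed often.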
As shown in Table 1, the video processing frame rate reaches 20 fps throughout the experiment, which meets the real-time requirement. In the detection results, the moving targets extracted with the adaptive threshold are relatively complete, indicating that the algorithm provides a suitable segmentation threshold. To further improve real-time performance, the algorithm can be modified so that the threshold is not recomputed for every frame.
(7) Connected Area Analysis. After morphological processing, small interference regions have been eliminated and small gaps filled, but some large white regions and black gaps remain [21], and these gaps must be filled. First, the area of each connected black region is calculated; if it is smaller than a certain threshold, the region is changed to white. After this processing, the area of each connected white region can be calculated directly; when the area of a white region exceeds a certain threshold, a background change is considered to have occurred.
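The two connected-area passes can be sketched with a breadth-first, 4-connected labelling in plain NumPy. The connectivity, the area thresholds, and the function names are illustrative assumptions; the paper does not state which values it uses.

```python
from collections import deque

import numpy as np


def label_regions(mask):
    """4-connected labelling of a boolean mask by breadth-first search.

    Returns (labels, sizes): labels[y, x] is the region id (0 = outside the mask)
    and sizes[id] is that region's area in pixels.
    """
    h, w = mask.shape
    labels = np.zeros((h, w), dtype=int)
    sizes = [0]
    for sy in range(h):
        for sx in range(w):
            if mask[sy, sx] and labels[sy, sx] == 0:
                rid = len(sizes)
                labels[sy, sx] = rid
                queue, area = deque([(sy, sx)]), 0
                while queue:
                    y, x = queue.popleft()
                    area += 1
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and labels[ny, nx] == 0:
                            labels[ny, nx] = rid
                            queue.append((ny, nx))
                sizes.append(area)
    return labels, np.array(sizes)


def fill_small_holes(fg, max_hole_area):
    """Turn black regions smaller than max_hole_area into white (foreground)."""
    labels, sizes = label_regions(~fg)
    small = sizes < max_hole_area
    small[0] = False            # label 0 marks foreground pixels, not a hole
    return fg | small[labels]


def changed_regions(fg, min_area):
    """Ids of white regions whose area exceeds min_area (treated as real changes)."""
    labels, sizes = label_regions(fg)
    return [rid for rid in range(1, len(sizes)) if sizes[rid] > min_area]


# Usage: a 5 x 5 white blob with a one-pixel black hole in the middle.
fg = np.zeros((7, 7), dtype=bool)
fg[1:6, 1:6] = True
fg[3, 3] = False
filled = fill_small_holes(fg, max_hole_area=3)
changes = changed_regions(filled, min_area=10)
```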
3. Research Experiment of IoT Video Platform Based on Video Analysis
3.1. Moving Target Detection
In static images or video sequence images, the motion characteristics of the target can be combined to detect the target. In the moving target detection of video sequence images, methods such as optical flow method, interframe difference method, and background subtraction are used.
One of the most commonly used methods of moving target detection and segmentation is the frame difference method [22]. This paper proposes an effective moving target detection algorithm based on background subtraction, whose basic idea is to detect the moving target region from the difference between the current frame of the image sequence and the background model.
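The difference between the two detection strategies can be illustrated on a toy one-row "image" in which a uniform object moves one pixel to the right. The sketch assumes grayscale frames, a known background image, and an illustrative threshold; the function names are hypothetical. Frame differencing flags only the leading and trailing edges of the uniform object, whereas background subtraction recovers the whole object.

```python
import numpy as np


def frame_difference(prev_frame, curr_frame, T):
    """Inter-frame difference: True where a pixel changed by more than T."""
    return np.abs(curr_frame.astype(int) - prev_frame.astype(int)) > T


def background_subtraction(curr_frame, background, T):
    """Background difference: True where a pixel differs from the background model by more than T."""
    return np.abs(curr_frame.astype(int) - background.astype(int)) > T


# Usage: a uniform 4-pixel object moves one pixel to the right over a dark background.
background = np.zeros((1, 10), dtype=np.uint8)
frame1 = background.copy()
frame1[0, 2:6] = 200
frame2 = background.copy()
frame2[0, 3:7] = 200
fd_mask = frame_difference(frame1, frame2, T=30)
bs_mask = background_subtraction(frame2, background, T=30)
```

Casting to `int` before subtracting avoids the wrap-around that unsigned 8-bit arithmetic would otherwise produce.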
3.2. Moving Target Tracking
Researchers have studied the target tracking problem extensively and proposed many valuable algorithms. According to their matching methods, tracking algorithms can be divided into four categories, including region-based tracking and feature-based tracking, and methods that combine several tracking approaches have also been proposed.
The basic idea of region-based tracking is to obtain a target template manually or by image segmentation and then locate the target region by matching candidate regions against the template according to their similarity.
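Region-based matching can be sketched as an exhaustive search that scores every candidate window against the template. The sum of squared differences (SSD) is used here as the dissimilarity measure; this is an illustrative choice, since the text does not specify one, and normalized cross-correlation is a common alternative. The function name is hypothetical.

```python
import numpy as np


def match_template_ssd(frame, template):
    """Return the top-left (y, x) of the window in `frame` whose sum of
    squared differences to `template` is smallest (exhaustive search)."""
    fh, fw = frame.shape
    th, tw = template.shape
    f = frame.astype(float)
    t = template.astype(float)
    best, best_pos = np.inf, (0, 0)
    for y in range(fh - th + 1):
        for x in range(fw - tw + 1):
            ssd = np.sum((f[y:y + th, x:x + tw] - t) ** 2)
            if ssd < best:
                best, best_pos = ssd, (y, x)
    return best_pos


# Usage: cut a patch out of a random frame and find it again.
rng = np.random.default_rng(1)
scene = rng.integers(0, 256, size=(20, 20)).astype(np.uint8)
patch = scene[5:9, 11:15].copy()
pos = match_template_ssd(scene, patch)
```

In a tracker the search would normally be limited to a neighbourhood of the previous target position rather than the whole frame.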
Feature-based tracking is usually achieved by using prior feature information or by adding constraints. For example, feature points change little in time and space between adjacent frames of a video sequence, so this property is used as a constraint to establish correspondences between feature points.
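The small-motion constraint can be sketched as greedy nearest-neighbour matching with a maximum allowed displacement. The sketch assumes that feature points have already been detected in both frames and are given as (x, y) coordinates; the threshold and the function name are hypothetical.

```python
import numpy as np


def match_features(prev_pts, curr_pts, max_disp):
    """Greedy nearest-neighbour matching between two frames.

    prev_pts: (m, 2), curr_pts: (q, 2) arrays of (x, y) coordinates.
    A pair (i, j) is accepted only if the displacement is at most max_disp
    and point j has not already been matched.
    """
    matches = []
    used = set()
    for i, p in enumerate(prev_pts):
        dist = np.linalg.norm(curr_pts - p, axis=1)
        j = int(np.argmin(dist))
        if dist[j] <= max_disp and j not in used:
            matches.append((i, j))
            used.add(j)
    return matches


# Usage: two points move slightly, the third has no nearby counterpart.
prev_pts = np.array([[10.0, 10.0], [50.0, 50.0], [90.0, 20.0]])
curr_pts = np.array([[52.0, 49.0], [11.0, 12.0], [200.0, 200.0]])
pairs = match_features(prev_pts, curr_pts, max_disp=5.0)
```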
3.3. Concept of the Internet of Things
A popular definition is that the Internet of Things connects different devices to the Internet through sensing technology, item identification technology, global satellite positioning technology [23], and communication technology, so that they can share information with one another, with the ultimate aim of identifying, locating, and supervising items. Through information exchange, the Internet of Things enables interaction between things, between people, and between people and things. Its main functions are, first, to perceive items and collect information; second, to transmit and share information; and third, to supervise items effectively and process information intelligently.
Because the Internet of Things is an extension of the Internet, it is closely related to today’s various network forms. Like traditional networks, it is divided into multiple layers [24], each with different functions and performance requirements, as shown in Figure 5.

3.4. Platform Design
The IoT video platform based on video analysis is an intelligent monitoring platform that integrates video image perception, video analysis, data transmission, and other functions. The central server is responsible for storing and transmitting the data of all monitoring stations. Centralized processing on the central server suits the intelligent upgrade of the video network but places higher demands on video image quality. Distributed analysis based on front-end equipment offers strong processing capability together with centralized management: it can browse, store, analyze, and quickly retrieve images from hundreds of monitoring points in real time, with fast search, making it a good decision-making tool.
An IoT video platform based on video analysis is constructed as shown in Figure 6.

3.5. Technical Route
To address the shortcomings of current video surveillance platforms, and following the above functional requirements and design principles, the application of an IoT video platform based on video analysis is studied [25], as shown in Figure 7. The platform is designed and implemented in two stages. First, the front-end equipment uses intelligent video analysis to detect, track, and recognize the actions of targets in the collected video images; after video compression, the information is transmitted to the central server for recognition and fusion. Then, the central server analyzes and manages all monitoring points centrally. Combining current mature video surveillance technologies and solutions, a practical video surveillance platform tailored to specific industries and application environments has been designed and implemented.

4. Experimental Results and Analysis
4.1. Moving Target Detection and Quality Evaluation
The paper takes the detection results of players in volleyball videos as an example to verify the effectiveness of the algorithm. Using the algorithm in the paper to process the video image, the player detection effect is shown in Figure 8.

(a) 30th frame image

(b) Background image

(c) Detection image
Experiments show that the algorithm has a good effect on the detection of moving targets in volleyball videos, as shown in Figure 8.
The following criteria are used to analyze and evaluate the detection effect of the paper’s algorithm:
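Precision is as follows:
$$\text{Precision} = \frac{TP}{TP + FP}$$
Recall rate is as follows:
$$\text{Recall} = \frac{TP}{TP + FN}$$
where TP is the number of correctly detected targets, FP is the number of false detections, and FN is the number of missed targets.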
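Given counts of true positives, false positives, and false negatives, the two criteria are straightforward to compute. The counts in the usage line are made up for illustration and are not results from the paper; returning 0.0 for an empty denominator is an assumed convention.

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP), Recall = TP / (TP + FN); 0.0 when a denominator is zero."""
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    return precision, recall


# Usage with illustrative counts for one group of frames.
p, r = precision_recall(tp=45, fp=5, fn=15)
```

Precision penalizes false detections while recall penalizes missed targets, so crowded frames with merged players tend to lower recall in particular.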
For the player detection in video segment 1, every 100 consecutive frames form one statistical unit, and the frames are divided into 10 groups for evaluation. The evaluation results are shown in Figure 9, where the x-axis is the group and the y-axis is the percentage of the detection result. Table 2 summarizes the motion detection results in different video scenes, where segment 1 is the detection of players and segment 2 is the detection of the volleyball.

Figure 9 shows that the detection performance differs between groups. When players are close to each other, detection is strongly affected, as in the fourth group in Figure 9. Table 2 shows the detection results for players and the volleyball in different video sequences; the algorithm meets the real-time and effectiveness requirements. The relatively low recall and precision in segment 1 are caused by scene changes and by players coming too close to each other while trying to save the ball. How to further improve the accuracy of the algorithm in complex environments is the next research direction of this subject.
4.2. Algorithm Performance Evaluation
Experiments were conducted on multiple videos to verify the accuracy of the algorithm, and the results are shown in Table 3.
The experiments show that the algorithm detects and tracks moving objects in volleyball videos well in a fixed scene or when the moving object is not occluded, with a detection accuracy of more than 92%. When the scene is more complex, the detection performance is less satisfactory but still meets the system requirements. Improving the accuracy of target detection in complex scenes remains to be addressed.
5. Conclusions
The core of this article is the research of a volleyball video moving target tracking and detection algorithm. The article first introduces the background and significance of volleyball video moving target tracking and detection and then explains the idea and principle of the algorithm in detail, laying a foundation for the ideas behind the algorithm in this article. Through a comparative analysis of the advantages and disadvantages of various algorithms, an algorithm flow suited to the detection environment of this paper is adopted, and its effectiveness is verified through experiments. At present, the application of artificial intelligence theories and methods to volleyball video moving target tracking and detection is still at an early, exploratory stage, and this article is only a preliminary attempt to apply artificial intelligence to volleyball video moving target tracking. The superiority of the algorithm is verified by comparing its processing results with those of a traditional algorithm. In the experiment, 6 cameras were used to cover the volleyball court from all directions, and the trajectory of the volleyball in three-dimensional space was finally obtained. Experimental results show that the method has a short processing time and a high signal-to-noise ratio.
Data Availability
Data sharing is not applicable to this article as no new data were created or analyzed in this study.
Conflicts of Interest
The authors state that this article has no conflict of interest.
Acknowledgments
This work was supported by the National Social Science Foundation (Approval no. 12CTY030).