Abstract
Moving target detection is involved in many engineering projects, but it is difficult because the target's speed varies strongly over time and its path is uncertain. Goal recognition is the key technology of automatic basketball goal testing, and accurate, timely judgment of basketball goals has important practical value. Therefore, a basketball goal recognition method based on an improved lightweight deep learning network model (L-MobileNet) is proposed. First, the basket is detected by the Hough circle transform algorithm. Then, to further improve the detection speed of basketball goals, an improved lightweight network (L-MobileNet) is proposed on the basis of the lightweight network MobileNet. For the depthwise separable convolution, channel compression and group convolution are used to reduce the parameters and computational complexity of the module. Because group convolution hinders the information exchange between feature channels, an improved channel shuffling method, IShuffle, is introduced. Next, the RLDWS module is constructed by adding a residual structure to improve the generalization ability of the network. Finally, the more lightweight network L-MobileNet is constructed from RLDWS modules. The experimental results show that the proposed method can effectively realize the judgment of basketball goals, and the judgment accuracy is improved by 8.35%. At the same time, the number of parameters and the amount of computation are only 29.7% and 53.2% of those of the original MobileNet, and the method also has certain advantages over other lightweight networks.
1. Introduction
The NBA (National Basketball Association) and CBA (Chinese Basketball Association) are currently among the most popular ball game leagues. Both combine manual work with intelligent devices to realize timing and scoring. In the NBA, a camera on the backboard automatically takes photos whenever players are under the basket or attempting a layup. When the basketball falls through the basket, it touches the net, which triggers a sensor. The controller receives the goal information, and the referee updates the score and the time. In both the NBA and the CBA, scoring is therefore dominated by manpower and supplemented by equipment.
At present, the physical education professional examination includes special items and supplementary items. Both must be completed in a short time, and there are many examination items, so the task for invigilators is very heavy. However, the fairness and justice of the college entrance examination must be upheld by every examinee and invigilator. Many places still score basketball exams manually, and the excessive workload can lead to unfair results. Manual scoring is slowly being replaced by smart devices. Nowadays, infrared detection [1–5] is used in many exams to judge basketball goals. Although the two intelligent detection methods, infrared detection and microswitches, have effectively solved some problems of traditional manual work, they also have shortcomings, such as being easily damaged and costly to maintain. Finding a better alternative is therefore an urgent research task.
With the rapid development of image processing technology, the detection of moving objects in video has become more widely used [6–10]. At the same time, more and more requirements for video processing applications have been put forward, for example, the fixed-point shooting test in the college entrance examination for physical education. Moving object detection is a particularly important branch of vision technology. Because of its strong scientific research value, researchers have devoted a lot of energy to it and have achieved good results.
2. Target Detection Analysis Based on the Deep Learning Network
At present, the target detection methods based on deep learning are mainly divided into two categories [11–14]: the two-stage detection framework and the single-stage detection framework. The two-stage detection framework divides the detection task into two stages. First, candidate regions are generated, and then, the regions are regressed and classified by a deep network model. This class of detection algorithms has high detection accuracy, but the detection speed is slow. Girshick et al. [15] applied R-CNN to target detection and improved the mAP to 53.3% on the Pascal VOC 2012 dataset [16], which is far superior to the traditional target detection algorithms. However, R-CNN also has some problems, such as tedious training steps, slow detection speed, and the need to input fixed-size images. He et al. [17] proposed the SPPNet algorithm to solve the problem that a fixed-size image must be input for the CNN to extract features. The innovation of this algorithm is that a Spatial Pyramid Pooling (SPP) layer is added between the convolution layer and the fully connected layer.
The single-stage detection framework classifies and regresses the targets in the image at the same time, without generating candidate regions. This class of detection algorithms has a fast detection speed, but the detection accuracy is usually not as good as that of the two-stage detection algorithms. Redmon et al. [18] proposed the classic YOLO algorithm. In 2016, Liu et al. [19] proposed the SSD algorithm, which combines the fast speed of YOLO with the accurate positioning of the RPN [20] so as to detect targets at different scales. Compared with YOLO, the SSD algorithm can predict more candidate regions and detects better, but it is slower than YOLO. The lightweight network model MobileNet [21] proposed by Google targets devices with limited resources, such as mobile or embedded devices, and aims to maximize classification accuracy on them. The main innovation of this network lies in the Depth-Wise Convolution (DWC) module, which reduces the parameters and computation, and in the effective compromise between classification accuracy and speed achieved through two hyperparameters, the width multiplier and the resolution multiplier.
Therefore, in order to better solve the problems of easy damage, high cost, and misjudgment in traditional fixed-point shooting devices, this paper proposes a basketball goal recognition method based on image analysis and deep learning. First, the problems of the existing lightweight network model MobileNet are analyzed theoretically, improvement strategies are put forward to solve them, and the RLDWS module is constructed step by step. Then, the improved L-MobileNet model is constructed by using this module instead of the original depthwise separable convolution. Finally, a comparative experiment with other lightweight networks is carried out to further verify the rationality and effectiveness of the proposed network.
3. Basketball Goal Recognition Based on the Improved MobileNet Model
3.1. Detection of Basket
To detect the basket, the original color image is converted into a gray image, the gray image is subjected to median filtering and mathematical morphology processing, and then, the Hough circle transformation algorithm [22] is used to extract the basket. Finally, the extracted basket circle is overlaid on the original color image, and the position and size of the basket are marked. The essence of the Hough circle transformation is to transform the image from plane coordinates into parameter coordinates so that the transformed results are easier to identify and detect. The general equation of a circle is as follows:
$$(x - a)^2 + (y - b)^2 = r^2, \tag{1}$$
where $(a, b)$ is the center of the circle and $r$ is the radius of the circle.
When a circle on the X-Y plane in the image space is transformed into the a-b-r parameter space, each point (x, y) on the circle corresponds to a three-dimensional cone in the parameter space. The Hough transform principle is shown in Figure 1. The cones of all points on the same circle intersect at one point, and the parameters of the circle can be obtained from this point so as to determine the circle. The parameter image of the circle is shown in Figure 2.
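The voting idea behind the transformation can be illustrated with a minimal sketch. This is not the implementation used in the paper: the synthetic edge pixels, the function names `circle_edge_pixels` and `hough_circle`, the radius range, and the angular sampling are all assumptions made for illustration. Each edge pixel votes for every (a, b, r) cell on its cone, and the cell with the most votes is taken as the circle.

```python
import math
from collections import Counter


def circle_edge_pixels(a, b, r, steps=360):
    """Return the distinct integer pixels on a circle (a synthetic edge map)."""
    pixels = set()
    for k in range(steps):
        phi = 2 * math.pi * k / steps
        pixels.add((round(a + r * math.cos(phi)), round(b + r * math.sin(phi))))
    return sorted(pixels)


def hough_circle(edge_pixels, radii, angle_steps=180):
    """Vote in the (a, b, r) parameter space and return the strongest cell."""
    votes = Counter()
    for x, y in edge_pixels:
        cells = set()  # each pixel votes at most once per cell
        for r in radii:
            for k in range(angle_steps):
                theta = 2 * math.pi * k / angle_steps
                # Candidate centre lying at distance r from the pixel (x, y).
                cand_a = round(x - r * math.cos(theta))
                cand_b = round(y - r * math.sin(theta))
                cells.add((cand_a, cand_b, r))
        votes.update(cells)
    (a, b, r), count = votes.most_common(1)[0]
    return a, b, r, count


# Hypothetical test circle: centre (20, 15), radius 8.
edges = circle_edge_pixels(20, 15, 8)
a, b, r, count = hough_circle(edges, radii=range(5, 12))
print(f"estimated centre=({a}, {b}), radius={r}, votes={count}")
```

Because of pixel rounding, the estimate may differ from the true parameters by about one pixel. A practical system would more likely rely on an optimized library routine than on this brute-force accumulator.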


3.2. Problem Analysis of the MobileNet Model
MobileNet has the following three problems [23–25]:
(1) 1 × 1 convolution has a large amount of computation: by deriving the computation and parameters of the network structure by layer type, it is found that they are mainly concentrated in the 1 × 1 pointwise convolution operation, which accounts for about 95% of the computation and 75% of the parameters of the whole network, as shown in Table 1. In the depthwise separable convolution operation, the calculation amount of the depthwise convolution and the calculation amount of the pointwise convolution are shown in equations (2) and (3), respectively:
$$C_{\mathrm{dw}} = D_K \cdot D_K \cdot M \cdot D_F \cdot D_F, \tag{2}$$
$$C_{\mathrm{pw}} = M \cdot N \cdot D_F \cdot D_F, \tag{3}$$
where $D_K$ is the size of the convolution kernel, $D_F$ is the size of the input feature, $M$ is the dimension of the input feature, and $N$ is the dimension of the output feature. It can be seen that the calculation amount of pointwise convolution is positively correlated with $N$, and the value of $N$ gradually increases as the network deepens, so the proportion of the calculation amount taken by the pointwise convolution operation grows. This problem is addressed later mainly by improving the depthwise separable convolution structure.
(2) Low-dimensional data collapse caused by ReLU: when low-dimensional (n-dimensional) data are mapped to a high-dimensional (m-dimensional) space by a random matrix, passed through ReLU, and then mapped back to the original dimension by the generalized inverse matrix, some information is lost, and the smaller m is, the more information is lost. To solve this problem, the subsequent module design uses the Mish activation function [26, 27] instead of ReLU after feature mappings with few channels, where ReLU would otherwise destroy information.
(3) No feature reuse: MobileNet is a very simple straight, sequential structure.
In the training process of the network model, if the weight of a convolution node becomes 0, the output of the node will be 0 for any input. Because the gradient of ReLU at 0 is 0, the value of this node will not recover no matter how many iterations follow. A residual module is added later to alleviate this problem.
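As a worked illustration of problem (1), the following sketch evaluates the standard MobileNet cost expressions for the depthwise and pointwise parts of one depthwise separable convolution. The symbol names (`d_k`, `d_f`, `m`, `n`) follow the usual MobileNet notation and, like the layer sizes chosen here, are assumptions for illustration rather than values taken from Table 1.

```python
def depthwise_cost(d_k, d_f, m):
    """Multiply-adds of a d_k x d_k depthwise convolution over m channels."""
    return d_k * d_k * m * d_f * d_f


def pointwise_cost(d_f, m, n):
    """Multiply-adds of a 1 x 1 pointwise convolution from m to n channels."""
    return m * n * d_f * d_f


# Hypothetical deep layer: 3 x 3 kernel, 14 x 14 feature map, 512 -> 512 channels.
d_k, d_f, m, n = 3, 14, 512, 512
dw = depthwise_cost(d_k, d_f, m)
pw = pointwise_cost(d_f, m, n)
share = pw / (dw + pw)  # fraction of the module's cost spent in the 1 x 1 part
print(f"depthwise={dw}, pointwise={pw}, pointwise share={share:.4f}")
```

The pointwise share simplifies to n / (n + d_k²), which is why it grows as the number of output channels increases in deeper layers.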
3.3. Improved MobileNet Model
In response to problems 1 and 2, and in order to make the most of group convolution, we modified channel shuffle into an improved channel shuffle (IShuffle), as shown in Figure 3.

IShuffle still uniformly recombines the features of the different groups, but one group of recombined features is treated differently: it is obtained by fusing the recombined features of each group. Specifically, assume that the number of input features m is 9 and the number of groups is 3. The first six features are the same as those in channel shuffling. The remaining three features are obtained by intergroup feature fusion of the nine feature channels. The six shuffled feature channels are then concatenated with these three features in the channel dimension to obtain the final output features. As can be seen from Figure 3, after the uniform recombination of features, IShuffle fuses features between groups to improve the information exchange between groups.
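The 9-channel, 3-group example can be sketched as follows. The `channel_shuffle` function is the usual reshape-transpose-flatten shuffle. The `ishuffle_sketch` function is only one possible reading of the description above: the fusion operator is not specified in the text, so taking the mean of each recombined group is an assumption, and each channel is represented by a single number instead of a feature map.

```python
def channel_shuffle(channels, groups):
    """Standard channel shuffle: view as (groups, n), transpose, flatten."""
    n = len(channels) // groups
    return [channels[g * n + i] for i in range(n) for g in range(groups)]


def ishuffle_sketch(channels, groups):
    """Hypothetical IShuffle: keep all but the last recombined group and
    replace that group with one fused feature per recombined group.
    ASSUMPTION: fusion is the mean of the group's channels."""
    n = len(channels) // groups
    shuffled = channel_shuffle(channels, groups)
    recombined = [shuffled[k * n:(k + 1) * n] for k in range(groups)]
    fused = [sum(group) / len(group) for group in recombined]
    return shuffled[:-n] + fused


features = list(range(9))  # channels 0..8, originally grouped as [0-2], [3-5], [6-8]
print(channel_shuffle(features, 3))
print(ishuffle_sketch(features, 3))
```

With 9 channels and 3 groups, the output again has 9 channels: six shuffled channels followed by three fused ones. For other channel and group counts, this sketch does not necessarily preserve the channel count.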
In addition, because ReLU is prone to data collapse in low dimensions, we replace the ReLU nonlinearity with the Mish activation function after the 1 × 1 convolution that compresses the channel dimension. The expression and graph of Mish are shown in Figure 4.

Compared with ReLU, the Mish activation function is smoother. When the input is negative, a small gradient is still allowed to flow, which lets information penetrate further into the network and thus improves the accuracy and generalization ability of the network, while the function remains unbounded above.
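The difference for negative inputs can be checked numerically with a small sketch. It uses the standard definition Mish(x) = x · tanh(ln(1 + eˣ)). The helper names and the finite-difference gradient are illustrative assumptions, not part of the paper's implementation.

```python
import math


def relu(x):
    return max(0.0, x)


def softplus(x):
    # Numerically stable form of ln(1 + e^x).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x):
    return x * math.tanh(softplus(x))


def numeric_grad(f, x, eps=1e-6):
    """Central finite-difference estimate of df/dx."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)


for x in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(f"x={x:+.1f}  relu={relu(x):.4f}  mish={mish(x):+.4f}  "
          f"relu'={numeric_grad(relu, x):+.4f}  mish'={numeric_grad(mish, x):+.4f}")
```

For negative inputs, ReLU outputs zero with zero gradient, whereas Mish keeps a small nonzero output and gradient.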
Because group convolution reduces the information flow between channels, the modified IShuffle strategy is added after it. Finally, the outputs from the two parts are concatenated to obtain the final output features. As shown in Figure 5, the module is named the “LDWS module.”

The LDWS module is mainly composed of a compression layer and an expansion layer. The compression factor $\lambda$ represents the dimension reduction ratio of the compression layer, and its calculation is shown in the following equation:
$$\lambda = \frac{S}{M}, \tag{4}$$
where $M$ represents the number of input channels of the LDWS module and $S$ represents the number of convolution kernels of the compression layer.
The calculation amount for the LDWS module is shown in the following equation:where represents the convolution kernel number of the 1 × 1 part in the expansion layer and represents the convolution kernel number of the depth separable convolution part in the expansion layer.
The comparison between the calculation amount of the LDWS module and that of depth separable convolution is as follows:where represents the number of packet groups in point-by-point convolution. It can be seen that by reducing the compression factor and increasing the number of packets in packet convolution, the amount of parameters and computation can be greatly reduced. In the experiment, the value of is 0.125.
To address problem 3, the RLDWS module is designed by introducing a residual connection into the LDWS module, as shown in Figure 6, in which the blue and red parts are LDWS1 modules and the purple part is an improved residual structure. Before ResNet appeared, a deeper network was often built by simply stacking layers in order to improve the recognition accuracy of the neural network model. However, during back propagation through a deeper network, the gradient may vanish before it reaches the shallow layers, so their parameters cannot be updated, which leads to the saturation or even decline of the network performance. The residual connection structure was proposed in ResNet to solve this problem.
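The effect of a residual connection can be shown with a minimal sketch. It uses the plain identity shortcut of ResNet, y = F(x) + x, and does not reproduce the improved residual structure of Figure 6. The function names and the all-zero branch are assumptions chosen to mimic the dead-node case described in Section 3.2.

```python
def residual_block(x, branch):
    """Return branch(x) + x element-wise (identity shortcut)."""
    return [f + s for f, s in zip(branch(x), x)]


def dead_branch(x):
    # A branch whose weights have collapsed to zero outputs zeros for any input.
    return [0.0 for _ in x]


x = [0.5, -1.0, 2.0]
print(dead_branch(x))                  # the information in x is lost
print(residual_block(x, dead_branch))  # the shortcut still carries x forward
```

Even when the convolutional branch contributes nothing, the shortcut passes the input features on to the next layer, and gradients can flow back along the same path.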

The final structure of L-MobileNet is constructed by replacing some DWS modules in MobileNet with RLDWS and LDWS modules and increasing the number of RLDWS modules, as shown in Table 2, where s is the stride of the depthwise convolution, c is the number of output channels, and k is the number of final categories.
L-MobileNet is mainly composed of one standard convolution, five DWS modules, and nine RLDWS modules. As in MobileNet, a standard 3 × 3 convolution is the first step, followed by five DWS modules. The remaining eight DWS modules are replaced with the RLDWS module proposed in this paper, and an additional RLDWS module is added after the last one. Finally, the output of the network is obtained through 7 × 7 average pooling, a fully connected layer, and multiclass classification with softmax.
4. Experiment and Result Analysis
4.1. System Construction
The basketball goal recognition system consists of a hardware environment and a software environment. The hardware environment, which mainly includes cameras and PCs, provides the video image data and the running environment for the system. The software mainly processes the video image data. The model of the camera is M30A, and the frame rate is 60 frames per second at a resolution of 640 × 480. The image data format is YUV422, and the data transmission protocol is USB 2.0. The CPU of the PC is [email protected] GHz, the memory is 8 GB, and the graphics card is GTX 2060s. The operating system is Windows 10 (64-bit), and the software is MATLAB 2019b.
FLOPs are used to measure the complexity of the model, specifically the number of multiply-add operations. The smaller the value of this index, the less calculation the model requires and, in general, the faster it runs.
4.2. Selection of Grouping Quantity
FLOPs can be greatly reduced by using group convolution. However, a reduction in FLOPs does not necessarily mean that the speed becomes faster. Using too many groups increases the memory access cost, which also reduces the speed. Therefore, the number of groups must be chosen through experiments. Table 3 compares the effects of different numbers of groups on L-MobileNet, where $g$ represents the number of groups.
As the number of groups increases from 1 to 16, the number of parameters and the amount of calculation of the network decrease, while the judgment accuracy first increases and then decreases. Therefore, a tradeoff must be made between the judgment accuracy and the speed, and the number of groups is set to 4 in the subsequent experiments.
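The dependence of parameters and multiply-adds on the number of groups can be sketched for a single grouped 1 × 1 convolution. The layer sizes below are hypothetical and are not the values of Table 3. Biases are ignored, and the function names are illustrative.

```python
def grouped_pointwise_params(m, n, g):
    """Weights of a 1 x 1 convolution from m to n channels split into g groups."""
    if m % g or n % g:
        raise ValueError("m and n must both be divisible by g")
    # Each group maps m/g input channels to n/g output channels.
    return (m // g) * (n // g) * g


def grouped_pointwise_flops(d_f, m, n, g):
    """Multiply-adds of the same layer on a d_f x d_f feature map."""
    return grouped_pointwise_params(m, n, g) * d_f * d_f


# Hypothetical layer: 256 -> 256 channels on a 14 x 14 feature map.
for g in (1, 2, 4, 8, 16):
    print(g, grouped_pointwise_params(256, 256, g),
          grouped_pointwise_flops(14, 256, 256, g))
```

Both quantities fall in proportion to 1/g, which matches the monotonic decrease described above. The sketch says nothing about memory access cost or accuracy, which is why the number of groups still has to be selected experimentally.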
4.3. Verification of Residual Structure
Table 4 shows the influence of adding the residual structure to the LDWS module on the judgment accuracy and speed of the network. When the residual structure is added, the error of the network is reduced by 6.84%.
4.4. Analysis of the Experimental Process
The basket test results are shown in Figure 7. In addition, through the comprehensive analysis of three simulated test videos collected from the left, middle, and right shots, the basketball was detected from the video sequence frame images. The test sequence of shooting the video is shown in Figure 8.

After testing 30 groups of data (10 groups each from the left, middle, and right), the test results are as shown in Tables 5–7, respectively.
In this experiment, the performance of L-MobileNet is compared with the commonly used lightweight models MobileNet, MobileNet V2 [28], SqueezeNet [29], and ShuffleNet [30]. Table 8 shows the performance comparison of each network model.
The L-MobileNet model compares favorably with the other network models on most indices. Compared with MobileNet, the judgment accuracy of L-MobileNet is improved by 8.35%, and the number of parameters and the amount of calculation are only 29.7% and 53.2% of the original ones. SqueezeNet greatly reduces the number of parameters through parameter compression, but its amount of calculation is still very large. L-MobileNet is superior to SqueezeNet except that it has slightly more parameters. Compared with ShuffleNet, L-MobileNet has almost the same judgment accuracy and number of parameters, but its amount of calculation is almost half that of ShuffleNet.
The change curve of judgment accuracy of each network model during dataset training is shown in Figure 9.

All the network models tend to converge after 160 epochs, and the judgment accuracy ranked from low to high is MobileNet, MobileNet V2, L-MobileNet, SqueezeNet, and ShuffleNet. L-MobileNet has higher judgment accuracy than MobileNet and MobileNet V2. In addition, the training time of L-MobileNet is less than that of the other models because the model is lighter.
5. Conclusions
In this paper, an improved lightweight neural network model (L-MobileNet) based on MobileNet is proposed and applied to basketball goal recognition. Through construction experiments on the improved network model, a network with a good compromise between accuracy and speed is selected. Compared with MobileNet, the judgment accuracy of the improved network is increased by 8.35%, while the number of parameters and the amount of calculation are only 29.7% and 53.2% of the original ones. Finally, the improved network is compared with some commonly used lightweight network models, and it is concluded that the improved network performs well in both the accuracy and the speed of goal judgment. In future work, we will try to combine the RLDWS module with VGGNet so as to further reduce the complexity of the model. We will also try neural architecture search, which uses data-driven and intelligent methods to build a better network automatically.
Data Availability
The experimental data used to support the findings of this study are available from the corresponding author upon request.
Conflicts of Interest
The author declares no conflicts of interest regarding the present study.