Abstract
This paper studies machine-vision-based moving object detection for line dancing, with the aim of improving detection accuracy. An improved frame difference background modeling technique is combined with a target detection algorithm: the moving target is extracted, and morphological postprocessing is applied to make the detection more accurate. In the tracking stage, the target to be tracked is determined along the time axis, its position is located in each frame, and the most similar target is found in each frame of the video sequence. An association is established to determine a template or feature of the moving object. Under a chosen similarity measure, the mean-shift algorithm searches for the optimal candidate target in each image frame and performs the corresponding matching to track the moving object. Experimental analysis shows that the method detects the moving targets of line dancing in different areas of the image, is not affected by position or distance, and maintains accurate detection.
1. Introduction
Research on human action recognition has progressed through three levels: movement, action, and activity. Current work is at the action recognition stage: features are extracted from training data, a classification model is trained in a supervised or unsupervised manner, and features extracted from new data are fed to the model to obtain classification results. In terms of input data, human action recognition has developed from recognition in still images to recognition in video [1]. Video action recognition processes and analyzes the original image or image sequence data collected by sensors in order to learn and understand human actions and behavior. Generally, the human motion pattern is obtained through analysis based on motion detection and feature extraction, and a mapping between video content and action type description is established so that the computer can understand the video [2, 3].
In recent years, the main types of action videos used for video action recognition have been daily human actions such as walking, running, waving, and clapping; sports actions such as diving, skating, and riding; and household actions such as cutting and washing vegetables. Among the many studies on human action recognition, few universities and institutions have studied dance actions. Dance expresses emotion to the public through body movements; it is more complex in expression and covers a wide variety of movements, many with their own characteristics [4–6]. Research on dance is therefore still at the stage of dance action analysis. In most cases, the posture of the collected dance action is analyzed and then applied to the performance of animated characters using animation software. Because action recognition technology is applied in numerous fields, universities and enterprises continue to study human action recognition. Human action recognition can make public life richer and greatly facilitate people's production activities [7].
Intangible cultural heritage is disappearing because of the lack of inheritors, and action recognition technology can preserve the key information of the human actions associated with it. For example, dance actions have national characteristics in many areas of China. Key information about these dance actions can be obtained and preserved through action recognition technology, which reduces the risk that the dance tradition is interrupted [8]. For dance action recognition, Pan et al. used the zero crossings of angular velocity to segment video motion sequences [9]. Zheng et al. developed a more complex technique that uses the output of a linear dynamic system to represent low-level segments (primitives) [10]. Gao showed that motion segmentation is implicit in a state machine or motion graph representation [11]. However, low-level segmentation cannot represent a complete action sequence. Talaat proposed three methods to automatically segment long sequences into action subsequences [12]. The first two methods process the video sequentially: the algorithm traverses all frames from beginning to end and creates a segmentation whenever a breakpoint appears. The first algorithm places a cut when the intrinsic dimension of the local model suddenly increases. The second places a cut when the observed distribution of postures changes. The third is a batch segmentation process that uses clustering: a cut is placed where consecutive frames belong to different elements of a Gaussian mixture model, so that simple action clusters are formed. Moemen et al. proposed modeling the temporal structure of a set of key poses [13]. By comparison, the third method proposed by the authors in [14] is more flexible (allowing gaps between adjacent postures makes it more robust to speed changes), and the temporal structure is expected to be learned automatically rather than designed manually [14].
Machine vision is an emerging and rapidly developing discipline. It is a broad area involving digital signal processing, graphics and image processing, human-computer interaction, pattern recognition, mathematical statistics, computer science, and other fields [15]. Using machine vision to capture external information through image and video technology, instead of traditional radar, infrared, and other means, has become a clear trend [16]. Object tracking is an important research direction in machine vision and has great practical value. Moving object tracking mainly processes the continuous images of a video, which contain more information about the moving object. By analyzing how the spatial position of the scene changes between dynamic sequence images along the time axis, the moving target is extracted from the background, and its position is then tracked. By calculating target motion information such as centroid position, motion mode, and trajectory, the target behavior can be further analyzed and understood to accomplish higher-level tasks. Research on moving object detection and tracking is closely related to real life. With computers taking over work otherwise done by human eyes and brain, people no longer have to operate a computer all day, which saves workforce and material resources. The technology has high practical value in intelligent transportation, important facilities, densely populated public areas, banks with high security requirements, museum monitoring, medicine, the military, and other areas. With the continued development of science and computer technology, the costs of video image acquisition and storage have dropped significantly, and processing speed keeps improving, which provides the platform for the development of video tracking technology.
Video tracking technology is in line with the future trend toward automation, informatization, and intelligence. Tracking based on machine vision can take over the demanding work of processing large amounts of video information over long periods, saving workforce and material resources. Target tracking is also widely used in military science and technology and in fields related to people's livelihood [17]. As science and technology advance, research in this field will become more mature. Based on machine vision, this paper studies moving object detection for line dancing and evaluates the detection and tracking accuracy of the proposed method.
The rest of this paper is organized as follows. In Section 2, we discuss our proposed moving object detection of line dancing using machine vision. In Section 3, we comprehensively evaluate our proposed work. Finally, the paper is concluded and future research directions are provided in Section 4.
2. Moving Object Detection Method of Line Dancing Based on Machine Vision
The moving object detection of line dancing based on machine vision has two integral parts: the background subtraction method and adaptive tracking of the moving target. Section 2.1 discusses the background subtraction method for moving object detection in line dancing, and Section 2.2 discusses adaptive tracking of the moving target based on mean shift.
2.1. Background Subtraction Method for Moving Object Detection in Line Dancing
After reviewing the basic methods of moving object detection and analyzing the applicable scenes, advantages, and disadvantages of each, this paper adopts moving object detection based on background subtraction. Various background modeling methods were studied and their advantages and disadvantages compared. Considering the real-time performance and accuracy required of the system, the improved frame difference background modeling method is adopted.
2.1.1. Improved Frame Difference Background Modeling Method
In this paper, based on the traditional frame difference background modeling, a new initialization method is proposed and its feasibility is proved by experiments. The steps of improved frame difference background modeling are as follows:
Get Initial Image. This step involves the following methods to get the initial image.
Selective Average Method. The selective average method is derived from the ordinary average method. Two adjacent images are compared: a region whose pixel values change is a moving target region, and a region whose pixel values are stable is a background region. The traditional adjacent frame difference method, together with a threshold, separates the background region from the foreground region. Pixels of the moving object are set to 0, and pixels of the background remain unchanged.
The threshold is chosen experimentally. The change in background pixels is generally small but not zero, so the threshold is set according to the actual situation. To obtain a complete background image, $N$ frames are selected from the video, and the adjacent frame difference method compares each pair of consecutive images to produce a background sequence. The process of the selective average method is as follows:
(1) The background sequence $b_k$ is obtained by the frame difference method. The target region is set to 0, and the background region remains unchanged:
$$b_k(x,y)=\begin{cases}0, & |I_k(x,y)-I_{k-1}(x,y)|>T,\\ I_k(x,y), & \text{otherwise}.\end{cases}$$
(2) The number of times $n(x,y)$ that the pixel at $(x,y)$ in the background sequence images is not zero is obtained.
(3) The average of the sum of the background sequence images gives the initial background:
$$B_0(x,y)=\frac{1}{n(x,y)}\sum_{k} b_k(x,y).$$
(4) Initialize the iteration parameter $i$.
(5) Frame difference method:
$$D_i(x,y)=|I_i(x,y)-I_{i-1}(x,y)|.$$
The threshold $T$ is determined manually from practical experience. $I_i$ and $I_{i-1}$ are the $i$-th and $(i-1)$-th images, respectively, and $D_i$ is the difference image.
Background Update. After the initial background image is obtained by the selective average method, the background area and the moving target area of the current image are distinguished by the frame difference method. The background under the moving target area does not need to be updated, so there the background of the previous image is kept. The background region is updated as a weighted sum of the background of the previous frame and the image of the current frame:
$$B_i(x,y)=\begin{cases}B_{i-1}(x,y), & D_i(x,y)>T,\\ \alpha I_i(x,y)+(1-\alpha)B_{i-1}(x,y), & \text{otherwise},\end{cases}$$
where $\alpha$ is the update weight.
(1) Set $i=i+1$, return to step (3), and perform the iterative operation.
The selective average method first segments the moving object and then averages only the background area, which reduces the influence of the moving object on the background and makes the extracted background more accurate. A few prior video frames are needed.
When the initial background image is computed, an appropriate number of frames must be selected for modeling. If too few frames are selected, holes appear in the result of the average method [18]. If too many frames are selected, resources are wasted and the computational burden increases. Moreover, if the sequence spans too long a time, factors such as illumination and environmental changes affect the accuracy of the result. Generally, the number of frames is chosen according to the speed of the object and the size of the target, so that every area of the video image shows the background in at least some frames [19].
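The initialization and update described in this subsection can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the grayscale `(N, H, W)` frame layout, and the blending weight `alpha` are hypothetical, and a boolean "static" mask is used in place of counting nonzero pixels (the two agree except where a true background pixel has value 0).

```python
import numpy as np

def initial_background(frames, threshold):
    """Selective average: average each pixel only over frames where it is static."""
    frames = np.asarray(frames, dtype=np.float64)   # assumed shape (N, H, W), grayscale
    # Adjacent frame difference: a large change marks a moving-target pixel.
    diff = np.abs(frames[1:] - frames[:-1])
    static = diff <= threshold                      # True where the pixel is background
    # Target pixels contribute 0 to the sum and are not counted.
    total = np.where(static, frames[1:], 0.0).sum(axis=0)
    count = static.sum(axis=0)
    # Pixels never observed as background stay 0 (the "holes" caused by too few frames).
    return np.divide(total, count, out=np.zeros_like(total), where=count > 0)

def update_background(background, prev_frame, frame, threshold, alpha=0.05):
    """Keep the old background under the moving target; blend everywhere else."""
    frame = np.asarray(frame, dtype=np.float64)
    prev_frame = np.asarray(prev_frame, dtype=np.float64)
    moving = np.abs(frame - prev_frame) > threshold
    blended = alpha * frame + (1.0 - alpha) * background   # alpha is an assumed weight
    return np.where(moving, background, blended)
```

A plain per-pixel mean over the same frames would smear the dancer into the background, which is the ghosting effect the selective average is meant to avoid.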
2.1.2. Moving Target Recognition
Moving object recognition combines the background image obtained by the improved frame difference background modeling method with the background difference method to extract the moving object and thus detect it. The binary image of the moving target is then processed by morphological operations to remove small objects unrelated to the moving target, fill holes in the target, and reconnect parts of the target that were broken by threshold segmentation, so that the image is more complete. The main steps of moving target recognition are as follows:
(1) The improved frame difference background modeling method yields the background image $B$.
(2) Background difference: the difference between the current frame image $I_k$ and the background image $B$ is computed.
(3) Image binarization: an appropriate threshold $T$ is obtained experimentally; a pixel is set to 1 if the difference is greater than the threshold and to 0 otherwise.
(4) For the binary image, irrelevant pixels are removed by morphological operations to make the target more complete.
The flow chart of moving target recognition is shown in Figure 1.
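The recognition steps above can be sketched as follows. This is a minimal illustration, assuming grayscale images, a 3x3 square structuring element, and a closing-then-opening order; the paper does not state the structuring element or the order of the morphological operations, and all function names are hypothetical.

```python
import numpy as np

def _neighbourhood(mask):
    """The nine 3x3-neighbourhood shifts of a boolean mask (outside counts as False)."""
    padded = np.pad(mask, 1, mode="constant", constant_values=False)
    h, w = mask.shape
    return [padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)]

def erode(mask):
    """A pixel survives only if its whole 3x3 neighbourhood is set."""
    return np.logical_and.reduce(_neighbourhood(mask))

def dilate(mask):
    """A pixel is set if any pixel in its 3x3 neighbourhood is set."""
    return np.logical_or.reduce(_neighbourhood(mask))

def detect_moving_target(frame, background, threshold):
    """Background difference, binarization, then morphological clean-up."""
    diff = np.abs(np.asarray(frame, dtype=np.float64)
                  - np.asarray(background, dtype=np.float64))
    mask = diff > threshold          # binarization: True plays the role of "1"
    mask = erode(dilate(mask))       # closing: fill small holes, reconnect fragments
    mask = dilate(erode(mask))       # opening: remove small isolated specks
    return mask
```

Because the padding treats everything outside the image as background, targets touching the image border are eroded slightly more than interior ones; a production system would more likely rely on a library routine with configurable border handling.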

2.2. Adaptive Tracking of Moving Target Based on Mean Shift
After the moving object image is obtained, the mean-shift algorithm is used to track the moving object of line dancing. In recent years, the mean-shift algorithm has been widely used in target tracking. Some scholars use the Bhattacharyya coefficient as the similarity measure between the target model and a candidate target and use the mean-shift algorithm to search for the optimal candidate target, which gives good tracking results. Target tracking based on mean shift is a template-based tracking method [20].
2.2.1. Basic Principle of Mean Shift
In machine vision, data analysis is usually carried out in a multidimensional space. Estimating the nonparametric density of a multidimensional sampled data set requires a kernel function defined on that space, that is, a multivariate kernel function.
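In $d$-dimensional space, given points $x_i$, $i=1,\ldots,n$, and a positive definite $d\times d$ bandwidth matrix $H$, the multivariate kernel density estimate with kernel $K(x)$ is
$$\hat f(x)=\frac{1}{n}\sum_{i=1}^{n}K_H(x-x_i),\qquad K_H(x)=|H|^{-1/2}K\left(H^{-1/2}x\right). \tag{7}$$
Generally, the bandwidth matrix takes one of two forms: a diagonal matrix, that is, $H=\mathrm{diag}[h_1^2,\ldots,h_d^2]$, or a scaled identity matrix, that is, $H=h^2 I$. In the latter case, (7) can be written as follows:
$$\hat f(x)=\frac{1}{nh^d}\sum_{i=1}^{n}K\left(\frac{x-x_i}{h}\right).$$
If $K(x)$ is radially symmetric, it can be written as $K(x)=c_{k,d}\,k(\|x\|^2)$ with $k(x)$ defined on the interval $[0,\infty)$; then $k(x)$ is called the contour (profile) function of $K(x)$, and $c_{k,d}$ is the normalization constant.
After introducing the contour function, the kernel density estimate can be rewritten as follows:
$$\hat f_{h,K}(x)=\frac{c_{k,d}}{nh^d}\sum_{i=1}^{n}k\left(\left\|\frac{x-x_i}{h}\right\|^2\right).$$
This expression is commonly used in the mean-shift algorithm to calculate the probability density of feature values. To find the location of highest density in the data set, the gradient of the density is estimated. Let $g(x)=-k'(x)$ and compute the gradient of the density estimate. After simplification, the mean-shift vector is obtained:
$$m_{h,G}(x)=\frac{\sum_{i=1}^{n}x_i\,g\left(\left\|\frac{x-x_i}{h}\right\|^2\right)}{\sum_{i=1}^{n}g\left(\left\|\frac{x-x_i}{h}\right\|^2\right)}-x.$$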
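To make the mean-shift vector concrete, the sketch below repeatedly moves a point by the kernel-weighted mean of the samples minus the point itself, until the shift becomes negligible. It assumes a Gaussian profile $k(r)=e^{-r/2}$, for which $g(r)$ is proportional to $e^{-r/2}$; the kernel choice, function names, and tolerances are illustrative assumptions rather than the paper's settings.

```python
import numpy as np

def mean_shift_vector(x, points, h):
    """Mean-shift vector at x: weighted mean of the samples minus x (Gaussian g assumed)."""
    points = np.asarray(points, dtype=np.float64)
    r = np.sum(((x - points) / h) ** 2, axis=1)    # squared scaled distances
    g = np.exp(-0.5 * r)                           # g(r) up to a constant factor
    return (g[:, None] * points).sum(axis=0) / g.sum() - x

def find_mode(x0, points, h, tol=1e-6, max_iter=200):
    """Follow the mean-shift vector until it is shorter than tol."""
    x = np.asarray(x0, dtype=np.float64)
    for _ in range(max_iter):
        shift = mean_shift_vector(x, points, h)
        x = x + shift
        if np.linalg.norm(shift) < tol:
            break
    return x
```

The constant factor of $g$ cancels between numerator and denominator, which is why the kernel does not need to be normalized here.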
2.2.2. Practical Application of Mean Shift in Moving Target Tracking
In this section, we discuss the practical application of mean shift in moving target tracking. We mainly focus on the description of target model in this section.
Because the color histogram of the target image records the probability of each color appearing and is not affected by changes in the target's shape, mean shift uses the color histogram as the feature description of the target. Suppose the center of the target region is $x_0$ and the pixel positions of the target in the image are represented by $\{x_i\}$, $i=1,\ldots,n$. If the target model is a histogram of $m$ values, the normalized color distribution can be expressed as
$$\hat q_u=C\sum_{i=1}^{n}\delta[b(x_i)-u],\qquad u=1,\ldots,m,$$
where the function $b(x_i)$ maps the pixel at $x_i$ to its color index, $\delta$ is the delta function, and $C$ is the normalization coefficient.
Since the frame difference method above distinguishes the background area from the moving target area of the current image, pixels near the center of the target model are more reliable than other pixels. Different weights can therefore be given to pixels at different positions of the target: pixels near the center receive a large weight, and pixels far from the center receive a small weight. By introducing kernel estimation into the target density function, the probability density estimate of the target model can be expressed as
$$\hat q_u=C\sum_{i=1}^{n}k\left(\left\|\frac{x_i-x_0}{h}\right\|^2\right)\delta[b(x_i)-u],$$
where $k$ is the contour function of the kernel function, and the bandwidth $h$ in $k$ is used to eliminate the influence of different target sizes. Using the condition $\sum_{u=1}^{m}\hat q_u=1$, the following result is obtained:
$$C=\frac{1}{\sum_{i=1}^{n}k\left(\left\|\frac{x_i-x_0}{h}\right\|^2\right)}.$$
In subsequent frames, the region that may contain the moving target is called the candidate region, and its center coordinate is $y$, which is also the center coordinate of the kernel function. The pixels in the region are represented by $\{x_i\}$, $i=1,\ldots,n_h$. The description of the candidate region is called the target candidate model. The probability density of the feature value $u$ of the candidate model is as follows:
$$\hat p_u(y)=C_h\sum_{i=1}^{n_h}k\left(\left\|\frac{y-x_i}{h}\right\|^2\right)\delta[b(x_i)-u],$$
where $C_h$ is the normalization constant.
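The kernel-weighted histogram used for both the target model and the candidate model can be sketched as below. The sketch assumes the image has already been quantized into integer color indices (the role of the index function in the text), that the kernel profile is Epanechnikov, and that the window is circular with bandwidth `h` in pixels; these choices and the function names are illustrative assumptions.

```python
import numpy as np

def epanechnikov_profile(r):
    """Profile k(r) of the Epanechnikov kernel: 1 - r inside the unit ball, 0 outside."""
    return np.where(r <= 1.0, 1.0 - r, 0.0)

def weighted_histogram(bin_index, center, h, n_bins):
    """Normalized colour histogram in which pixels near `center` count more.

    bin_index : 2-D integer array of colour indices, one per pixel
    center    : (row, col) of the window centre
    h         : bandwidth in pixels
    """
    rows, cols = np.indices(bin_index.shape)
    r = ((rows - center[0]) ** 2 + (cols - center[1]) ** 2) / float(h) ** 2
    weights = epanechnikov_profile(r)
    hist = np.bincount(bin_index.ravel(), weights=weights.ravel(), minlength=n_bins)
    # Dividing by the total weight plays the role of the normalization constant.
    return hist / hist.sum()
```

Calling the function with the target center in the first frame gives the target model, and calling it with a candidate center in a later frame gives the candidate model, so the two differ only in where the window is placed.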
The similarity function describes the degree of similarity between the target model and the candidate target; in the ideal case, the probability distributions of the two models are exactly the same. In the mean-shift algorithm, the Bhattacharyya coefficient is selected as the similarity function. The Bhattacharyya coefficient is defined as follows:
$$\hat\rho(y)\equiv\rho[\hat p(y),\hat q]=\sum_{u=1}^{m}\sqrt{\hat p_u(y)\,\hat q_u}. \tag{15}$$
The value of $\hat\rho(y)$ is between 0 and 1. The larger the value of $\hat\rho(y)$ is, the more similar the two models are. In the current frame, the candidate model is computed for different candidate regions, and the candidate region with the largest $\hat\rho(y)$ is taken as the position of the target in the current frame.
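As a small illustration, the Bhattacharyya coefficient of two normalized histograms is the sum of the square roots of their bin-wise products, and picking the best candidate is an argmax over candidate regions. The helper names below are hypothetical, and exhaustive comparison of candidates is shown only to illustrate the similarity measure (the mean-shift search avoids it).

```python
import numpy as np

def bhattacharyya(p, q):
    """Bhattacharyya coefficient of two normalized histograms (1 means identical)."""
    p = np.asarray(p, dtype=np.float64)
    q = np.asarray(q, dtype=np.float64)
    return float(np.sum(np.sqrt(p * q)))

def best_candidate(q, candidates):
    """Index of the candidate histogram most similar to the target model q."""
    scores = [bhattacharyya(p, q) for p in candidates]
    return int(np.argmax(scores))
```

Identical histograms give a coefficient of 1, and histograms with no overlapping bins give 0.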
2.2.3. Implementation Process of Moving Object Detection and Tracking
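The most likely position of the target in the current frame is the candidate region that maximizes $\hat\rho(y)$. To maximize $\hat\rho(y)$, the search in the current frame starts from the target center of the previous frame, $y_0$, and finds the optimal matching position, whose center is $y$. First, the target candidate model $\hat p_u(y_0)$ is calculated, and the Taylor expansion of (15) is performed at $\hat p_u(y_0)$. The Bhattacharyya coefficient can be approximately expressed as
$$\rho[\hat p(y),\hat q]\approx\frac{1}{2}\sum_{u=1}^{m}\sqrt{\hat p_u(y_0)\,\hat q_u}+\frac{C_h}{2}\sum_{i=1}^{n_h}w_i\,k\left(\left\|\frac{y-x_i}{h}\right\|^2\right), \tag{16}$$
where
$$w_i=\sum_{u=1}^{m}\sqrt{\frac{\hat q_u}{\hat p_u(y_0)}}\;\delta[b(x_i)-u].$$
It can be seen from (16) that only the second term changes with $y$, so that
$$f(y)=\frac{C_h}{2}\sum_{i=1}^{n_h}w_i\,k\left(\left\|\frac{y-x_i}{h}\right\|^2\right). \tag{17}$$
Maximizing $\hat\rho(y)$ is therefore equivalent to maximizing $f(y)$.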
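The expansion step can be spelled out as follows. The notation is the standard one for kernel-based tracking and is assumed here where the extracted text lost its symbols: $\hat p_u(y)$ is the candidate model, $\hat q_u$ the target model, $y_0$ the previous target center, $b(x_i)$ the color index of pixel $x_i$, $k$ the kernel profile, and $C_h$ its normalization constant.

$$
\begin{aligned}
\sqrt{\hat p_u(y)\,\hat q_u}
&\approx \sqrt{\hat p_u(y_0)\,\hat q_u}
+\frac{1}{2}\sqrt{\frac{\hat q_u}{\hat p_u(y_0)}}\,\bigl(\hat p_u(y)-\hat p_u(y_0)\bigr),\\
\sum_{u=1}^{m}\sqrt{\hat p_u(y)\,\hat q_u}
&\approx \frac{1}{2}\sum_{u=1}^{m}\sqrt{\hat p_u(y_0)\,\hat q_u}
+\frac{1}{2}\sum_{u=1}^{m}\hat p_u(y)\sqrt{\frac{\hat q_u}{\hat p_u(y_0)}},\\
\frac{1}{2}\sum_{u=1}^{m}\hat p_u(y)\sqrt{\frac{\hat q_u}{\hat p_u(y_0)}}
&=\frac{C_h}{2}\sum_{i=1}^{n_h}
\underbrace{\sum_{u=1}^{m}\sqrt{\frac{\hat q_u}{\hat p_u(y_0)}}\,\delta[b(x_i)-u]}_{w_i}\;
k\left(\left\|\frac{y-x_i}{h}\right\|^2\right).
\end{aligned}
$$

The first line is the first-order Taylor expansion of the square root in $\hat p_u(y)$ around $\hat p_u(y_0)$, the second sums it over the $m$ bins, and the third substitutes the kernel-weighted definition of $\hat p_u(y)$ and swaps the order of summation. The first term of the second line does not depend on $y$, so only the weighted kernel sum needs to be maximized.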
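Analysis of (17) shows that this expression is similar to the kernel density estimate, but with an additional weight $w_i$, so $f(y)$ can be maximized by mean-shift iteration. The specific steps are as follows:
(1) Suppose the distribution of the target model is $\{\hat q_u\}$, $u=1,\ldots,m$, and the estimated position of the target in the previous frame is $y_0$.
(2) Initialize the target position of the current frame with $y_0$, calculate the distribution $\{\hat p_u(y_0)\}$, and estimate the Bhattacharyya coefficient: $\rho[\hat p(y_0),\hat q]=\sum_{u=1}^{m}\sqrt{\hat p_u(y_0)\,\hat q_u}$.
(3) Calculate the weights $w_i$ that enter (17).
(4) Calculate the new position of the target according to the mean-shift vector: $y_1=\dfrac{\sum_{i=1}^{n_h}x_i\,w_i\,g\left(\left\|\frac{y_0-x_i}{h}\right\|^2\right)}{\sum_{i=1}^{n_h}w_i\,g\left(\left\|\frac{y_0-x_i}{h}\right\|^2\right)}$.
(5) Update $\{\hat p_u(y_1)\}$ and estimate $\rho[\hat p(y_1),\hat q]$.
(6) When $\rho[\hat p(y_1),\hat q]<\rho[\hat p(y_0),\hat q]$, set $y_1\leftarrow\frac{1}{2}(y_0+y_1)$.
(7) If $\|y_1-y_0\|<\varepsilon$, end; otherwise set $y_0\leftarrow y_1$ and go to step (1).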
The tracking process thus maximizes the Bhattacharyya coefficient through a gradient ascent search along the mean-shift vector, so as to find the best matching position of the target in the next frame.
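The iteration steps listed above can be sketched end to end on an image of color indices. This is a simplified illustration under several assumptions that are not taken from the paper: an Epanechnikov profile (so that $g$ is constant inside the window and the new position is a weighted centroid), a fixed bandwidth `h` without scale adaptation, and hypothetical function names and default tolerances.

```python
import numpy as np

def profile(r):
    """Epanechnikov profile k(r); g(r) = -k'(r) is constant for r <= 1."""
    return np.where(r <= 1.0, 1.0 - r, 0.0)

def target_model(bins, center, h, n_bins):
    """Kernel-weighted, normalized histogram of colour indices around `center`."""
    rows, cols = np.indices(bins.shape)
    r = ((rows - center[0]) ** 2 + (cols - center[1]) ** 2) / float(h) ** 2
    hist = np.bincount(bins.ravel(), weights=profile(r).ravel(), minlength=n_bins)
    return hist / hist.sum()

def rho(p, q):
    """Bhattacharyya coefficient between two normalized histograms."""
    return float(np.sum(np.sqrt(p * q)))

def track(bins, q, y0, h, n_bins, eps=0.1, max_iter=30):
    """Mean-shift search for the window centre whose histogram best matches q."""
    rows, cols = np.indices(bins.shape)
    y0 = np.asarray(y0, dtype=np.float64)
    for _ in range(max_iter):
        p0 = target_model(bins, y0, h, n_bins)                      # step (2)
        ratio = np.sqrt(np.divide(q, p0, out=np.zeros_like(q), where=p0 > 0))
        inside = (rows - y0[0]) ** 2 + (cols - y0[1]) ** 2 <= h ** 2
        w = np.where(inside, ratio[bins], 0.0)                      # step (3)
        if w.sum() == 0.0:
            return y0                                               # nothing to follow
        y1 = np.array([(w * rows).sum(), (w * cols).sum()]) / w.sum()  # step (4)
        # Steps (5)-(6): halve the step while the similarity gets worse.
        while (rho(target_model(bins, y1, h, n_bins), q) < rho(p0, q)
               and np.linalg.norm(y1 - y0) > eps):
            y1 = 0.5 * (y0 + y1)
        if np.linalg.norm(y1 - y0) < eps:                           # step (7)
            return y1
        y0 = y1
    return y0
```

In use, `target_model` would be called once on the frame where the dancer is first detected, and `track` would then be called on each subsequent frame, starting from the position found in the previous one.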
3. Experimental Results and Discussion
A dance troupe in a certain city was taken as the research object, and 10 of its dance performers were invited to take part in the experiment. They were asked to perform dances, and cameras were used to collect images of their dance actions.
The background effect obtained by the proposed method is compared with that obtained by traditional unimproved frame difference background modeling. The results are shown in Figure 2.

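Figure 2: Background modeling results: (a) traditional frame difference background modeling; (b) improved frame difference background modeling.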
Figure 2(a) shows the background obtained by the traditional frame difference background modeling method, and Figure 2(b) shows the background obtained by the improved frame difference background modeling. The initial image of the traditional method contains target information, and the shadow of the object's motion is obvious in the modeled background, so the modeling is not accurate. The improved modeling method reduces the influence of motion information and gives a clean background.
A simple stage background is selected; the dancers are invited to make different dance actions and are required to be in the normal position, the edge of the screen, and the position close to the camera, respectively, so as to verify the effect of moving object detection and tracking of line dancing by the proposed method. The test results are shown in Figure 3.

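Figure 3: Detection and tracking results on a simple stage background: (a) normal position; (b) edge of the screen; (c) close to the camera.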
It can be seen from Figure 3 that when the line dancing target moves, the proposed method can predict the centroid position of the target and search and match the target in a window neighborhood of the prediction point in the next frame. It can be seen from Figure 3(a) that the method in this paper can accurately locate the object of line dancing. As can be seen from Figure 3(b), when the target object is at the edge of the field of view of the camera, a part of the target object can still be located. As can be seen from Figure 3(c), when the moving object is close to the camera and the size changes, the detection and tracking window will also change.
To test moving target detection for line dancing under a complex background, a brightly colored, complex stage background is selected. The webcam is fixed, and the proposed method is used to detect and track the line dancing movement of each dancer. The results are shown in Figure 4.

Figure 4: Detection and tracking results on a complex stage background, panels (a) to (c).
It can be seen from Figure 4 that the proposed method can detect the moving object of line dancing against a complex background and obtain accurate detection and tracking results.
The detection and tracking of moving objects in line dancing were also tested in the presence of obstacles. The experimental results are shown in Figure 5.

Figure 5: Detection and tracking results with obstacles, panels (a) to (c).
It can be seen from Figure 5 that when the detected object is partially occluded, even if only a few parts are not occluded, moving object detection of line dancing can still be realized. When the detected object is completely occluded, the detection and tracking frame will be reduced to a small point. Once the target to be detected is exposed, an edge will be detected immediately, which shows that the detection efficiency of the proposed method is high.
The recognition rates of the RGB recognition method, the flow recognition method, and the proposed method are compared on the upper body, the lower body, and the whole body. The recognition rates for the upper body and the lower body are close, with the lower body slightly higher. The recognition rate obtained by combining the human body regions is the highest, and the rate obtained by recognizing the human body region directly is lower than that of the combination. The reason is that the numbers of upper-body and lower-body movements are similar, with somewhat more lower-body movements. When the whole-body region is used and the movement involves only the upper-body or the lower-body region, the two regions interfere with each other; moreover, dance actions are often named separately for the lower body and the upper body, so training the regions separately improves recognition accuracy. The recognition results for the features extracted from the different human body regions are shown in Table 1.
4. Conclusion
This paper studies a moving object detection method for line dancing based on machine vision. Two techniques are used: improved frame difference background modeling for moving object detection, and the mean-shift method for moving object tracking. Experiments on real-life dance performances were used to verify the approach. The results indicate that the method can detect moving objects in any area of the camera's field of view and is not restricted by occlusion or by distance from the camera. The method therefore obtains good detection results for the moving objects of line dancing [19, 20].
Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
Acknowledgments
This study was supported by the Project of the Ideological and Political Research of University Course in Hunan Province: Research and Practice of the Ideological and Political Teaching of the Course “Line Dance” and the study result of the Project of the Hunan Educational Science Plan: Study on the Optimizing of Physical Education in University under the Impact of Epidemic Situation (XJK20CGD012).