This article studies video motion segmentation algorithms based on the optical flow equation. First, several mainstream segmentation algorithms are reviewed, and on this basis, a spectral clustering segmentation algorithm for analyzing athletes’ physical condition in training is proposed. Unlike algorithms that process only a single frame, this article analyzes multiple consecutive frames of the video and tracks the sampling points across those frames with the Lucas-Kanade optical flow method. Feature points are sampled densely so that they carry as much of the motion information in the video as possible; this motion information is then expressed through trajectory descriptions, and moving targets are finally segmented by clustering the motion trajectories. At the same time, the basic concepts of image segmentation and video motion target segmentation are described, and the classification criteria of different video motion segmentation algorithms and their respective advantages and disadvantages are analyzed. In the experiment, the initial template is determined by comparing the gray-scale variance of the image; the feature optical flow is used to estimate the search area of the template in the next frame, which reduces the matching time; template similarity is judged by the Hausdorff distance; and an adaptive weighted update is applied to templates with large deviations. The simulation results show that the algorithm achieves long-term stable tracking of moving targets in the mine and can also continuously track partially occluded moving targets.

1. Introduction

Under visible light, objects in the surrounding environment form images on the retina of the human eye. Photoreceptor cells convert these images into nerve impulses, which are transmitted through nerve fibers to the cerebral cortex for processing and understanding. Vision refers not only to the perception of light signals but also to the entire process of obtaining, transmitting, processing, storing, and understanding visual information [1]. After the emergence of signal processing theory and computers, people tried to use cameras to capture images of the environment, convert them into digital signals, and use computers to carry out the whole process of visual information processing. In this way, a new discipline was formed: computer vision. The research goal of computer vision is to give the computer the ability to recognize information about the surrounding environment from one or more images [2–4]. This ability enables the computer to perceive the geometric information of objects in the environment, including shape, position, posture, and movement, and to describe, store, recognize, and understand them [5]. Video motion segmentation is a basic research branch of computer vision and the foundation of applications such as intelligent surveillance, human-machine interaction, navigation guidance, and industrial robots. Its task is to segment the moving target area in an image sequence according to a given criterion. Existing segmentation methods mainly exploit differences in the edge, texture, and spatiotemporal characteristics of the moving object. However, in practical applications, complex scenes, camera movement, multiple targets, occlusion, and other factors make accurate and stable motion segmentation very difficult.

Optical flow field calculation has important practical significance in industrial and military applications, such as robot vision systems that complete various industrial or military tasks, space satellite tracking systems based on motion analysis, surface-to-air missile fire control systems, and automatic aircraft landing [6, 7, 9]. Because computing the optical flow field is an ill-posed problem, many algorithms have emerged to overcome it. For example, Du et al. [8] observed that the optical flow field produced by a single moving object should be continuous and smooth: neighboring points on the same object move at similar speeds, so the optical flow projected onto the image should also vary smoothly. They proposed imposing an additional constraint on the optical flow field, the global smoothness constraint, which turns the calculation of the optical flow field into a variational problem. Singh et al. [9] noted that the basic equation itself already constrains the optical flow field along the gradient direction of the gray field at each point; they proposed that the additional smoothness constraint should instead minimize the rate of change of the optical flow field in the direction perpendicular to the gradient. Based on this, a new iterative algorithm was derived. Dong et al. [10] regard optical flow field calculation as a kind of differential problem. The Snake model was proposed by Ge et al. [11] and was first applied to machine lip reading. Its basic idea is to regard the boundary of the moving target as a dynamic contour line and to introduce an energy function; minimizing the energy function is the process of finding the outline of the target object. The C-V model proposed by Allaoui et al. [12] is another common model. It mainly uses the global information in the image and also achieves a good segmentation effect. In addition, video segmentation algorithms can be divided into semiautomatic and fully automatic segmentation. Fully automatic segmentation does not require manual participation and is mostly used in places with high security levels, such as bank surveillance and military surveillance [13–15]. Semiautomatic segmentation algorithms are mostly used with multiple targets and complex moving backgrounds. For complex moving backgrounds, the target’s position, contour, and other information must be specified manually, which facilitates tracking of the target in subsequent frames and improves the segmentation effect. However, this approach is complicated to operate and has poor real-time performance [16–19].

To capture enough motion information, this article samples the video frames densely and filters the spatiotemporal feature points of these samples, removing feature points that lack distinctive structural information and are difficult to track; the remaining sampling points are then tracked with the optical flow method. To describe the motion of a moving object, the distance between two trajectories is defined within a specific time window. A video motion segmentation algorithm based on spectral clustering is then proposed. It exploits the observation that trajectories belonging to the same moving object in the video are similar and performs cluster analysis on the trajectories of the densely sampled points to segment the moving objects. The similarity matrix is constructed from the distances between trajectories. In addition, a classic clustering algorithm is applied to the similarity matrix, and structural information about the moving objects in the video is added to improve the clustering result. The feasibility of the algorithm is verified through experiments.

2. Real-Time Model of Athletes’ Physical Condition in Training Based on Video Monitoring Technology of Optical Flow Equation

2.1. Distribution of the Solution Set of the Optical Flow Equation

Median filtering, which is based on order (sorting) statistics theory, is a nonlinear signal processing method that suppresses noise. It is suitable for images that do not have many or pronounced details and edges. Figure 1 shows the spatial distribution of the solution set of the optical flow equation. For images with many prominent points, clear edge lines, and similar information, median filtering performs poorly, because the filtered image loses a lot of detail [20–24].

The Canny operator was developed on the basis of other operators. Its main idea is to introduce two thresholds to decide whether a pixel lies on a contour. With a low threshold alone, many edges in the image are detected, but many of them are not needed, so a high threshold is introduced to filter out insignificant edges.
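The two-threshold idea can be illustrated with a minimal sketch. It assumes a gradient magnitude array is already available; the function name `hysteresis_edges` and the threshold values are illustrative and are not taken from the experiments in this article. Pixels at or above the high threshold are treated as reliable edges, and pixels between the two thresholds are kept only when they connect to a reliable edge.

```python
import numpy as np

def hysteresis_edges(magnitude, low, high):
    """Two-threshold edge selection (illustrative, 8-connected)."""
    mag = np.asarray(magnitude, dtype=float)
    h, w = mag.shape
    candidate = mag >= low          # everything that might be an edge
    edges = mag >= high             # reliable (strong) edges
    while True:
        # Grow the current edge set by one pixel in all 8 directions.
        padded = np.pad(edges, 1, mode="constant")
        grown = np.zeros_like(edges)
        for dy in range(3):
            for dx in range(3):
                grown |= padded[dy:dy + h, dx:dx + w]
        grown &= candidate          # only weak candidates may be added
        if np.array_equal(grown, edges):
            return edges
        edges = grown

# Hypothetical gradient magnitudes: one strong pixel (9), a weak chain (5, 5)
# attached to it, and one isolated weak pixel in the lower-left corner.
mag = np.array([[0, 5, 5, 9, 0],
                [0, 0, 0, 0, 0],
                [5, 0, 0, 0, 0]], dtype=float)
edges = hysteresis_edges(mag, low=4, high=8)
```

In this toy input the weak chain next to the strong pixel survives, while the isolated weak pixel is discarded.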

The idea of the smoothing template is to remove sudden changes and filter out some noise by averaging a point with its 8 surrounding points. Although it takes neighboring points into account, it does not consider the position of each point: all 9 points are treated equally, so the smoothing effect is not ideal. The Gaussian template, in contrast, assumes that points closer to the center should have greater influence and introduces weighting coefficients accordingly, which makes the smoothing effect more satisfactory.
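As a rough illustration of the difference between the two templates, the following sketch applies a 3 × 3 mean template and a 3 × 3 Gaussian-like template to a single noise spike. The integer weights 1-2-1 / 2-4-2 / 1-2-1 are a commonly used approximation and are an assumption here, not values reported in this article; the helper name `smooth` is likewise illustrative.

```python
import numpy as np

# Mean template: all 9 points are weighted equally.
MEAN_3X3 = np.full((3, 3), 1.0 / 9.0)

# Gaussian-like template (assumed weights): the center counts most.
GAUSS_3X3 = np.array([[1, 2, 1],
                      [2, 4, 2],
                      [1, 2, 1]], dtype=float) / 16.0

def smooth(image, template):
    """Apply a 3x3 template; border pixels are replicated."""
    img = np.asarray(image, dtype=float)
    h, w = img.shape
    padded = np.pad(img, 1, mode="edge")
    out = np.zeros_like(img)
    for dy in range(3):
        for dx in range(3):
            out += template[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out

# A single bright pixel on a flat background.
noisy = np.zeros((5, 5))
noisy[2, 2] = 9.0
mean_out = smooth(noisy, MEAN_3X3)    # spike spread evenly over 9 pixels
gauss_out = smooth(noisy, GAUSS_3X3)  # spike spread with distance weighting
```

With the mean template every pixel in the 3 × 3 neighborhood receives the same share of the spike, whereas the Gaussian-like template keeps more of the value at the center and less at the corners.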

Image smoothing often blurs the boundaries and contours of the image, while sharpening can make the edges, contours, and details of the image clearer. The fundamental reason why the image becomes blurred after smoothing is that the image has been averaged or integrated. Therefore, performing inverse operations (such as differential operations) on the image can make it clear. The commonly used method is gradient sharpening.

The gradient-based method derives the optical flow constraint equation based on the basic assumption that the gray level of the image remains unchanged before and after the movement. Since the optical flow constraint equation cannot uniquely determine the optical flow, other constraints need to be introduced. According to the different constraints introduced, the gradient-based method can be divided into a global constraint method and a local constraint method. The global constraint method assumes that the optical flow satisfies certain constraints in the entire image range, while the local constraint method assumes that the optical flow satisfies certain constraints in a small area around a given point.
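A local constraint method can be sketched as follows. The sketch assumes the standard form of the optical flow constraint equation, I_x·u + I_y·v + I_t = 0, and assumes that the flow (u, v) is constant inside a small window, which is the Lucas-Kanade style of local constraint. The function name `flow_from_gradients` and the gradient values are hypothetical; in practice the gradients would be computed from two consecutive frames.

```python
import numpy as np

def flow_from_gradients(ix, iy, it):
    """Least-squares flow (u, v) for one window.

    ix, iy: spatial gray-level gradients of the pixels in the window
    it:     temporal gray-level differences of the same pixels
    Each pixel contributes one constraint ix*u + iy*v + it = 0.
    """
    a = np.stack([np.ravel(ix), np.ravel(iy)], axis=1).astype(float)
    b = -np.ravel(it).astype(float)
    flow, *_ = np.linalg.lstsq(a, b, rcond=None)
    return flow  # array([u, v])

# Hypothetical gradients for a 4-pixel window and a known flow.
ix = np.array([1.0, 0.0, 2.0, 1.0])
iy = np.array([0.0, 1.0, 1.0, 3.0])
true_flow = np.array([0.5, -0.25])
it = -(ix * true_flow[0] + iy * true_flow[1])  # consistent temporal term
estimated = flow_from_gradients(ix, iy, it)
```

A single pixel gives one equation for two unknowns, which is why the constraint equation alone cannot determine the flow; stacking the equations of several pixels with different gradient directions makes the system solvable.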

2.2. Analysis of Video Monitoring Technology

In video monitoring and image segmentation, the contour of the motion area must be determined first. According to curve evolution theory, the active contour eventually stops where the energy is smallest, that is, at the boundary of the moving area. The problem to be solved is how to represent the contour line and how to construct the force that minimizes the energy of the contour line on the target boundary. Different representations lead to different active contour models. The level set method is another commonly used active contour model.

The level set method was first used to model the evolution of flame contours and was later introduced into fluid mechanics and image processing. The contour line is mapped to a level set of a surface in three-dimensional space, usually the zero level set. The idea can be visualized with the contour lines of a mountain peak: if the peak is cut horizontally at a given height, the outline of the cut surface is the contour line of the peak at that height. In this way, the evolution of a two-dimensional plane curve is transformed into the evolution of a surface in three-dimensional space. Figure 2 shows the level set distribution of video surveillance profile mapping.

The image flow constraint equation is a straight-line equation in the velocity plane. If a continuous image sequence is considered and the target’s velocity is assumed to remain approximately constant across the frames, the motion constraint lines of the moving target must approximately intersect at one point in the velocity plane.

In moving target detection, the scene in which the moving target is located changes constantly. To extract the moving target accurately, the background model must be updated in real time, and to improve the accuracy of the detection results, the interval between model updates should generally be as small as possible. Background modeling methods commonly used in the background difference method include the statistical average background model, the Gaussian mixture background model, the W4 background model, and background modeling based on color information. The Gaussian mixture model has a relatively simple structure, is easy to implement, and gives a very reliable background model. It is also robust in complex scenes and is widely used.

Detection with the Canny operator generally proceeds in the following steps: first, a Gaussian filter is used to denoise the image; then, the gradient magnitude and direction of the pixels are computed with the template; finally, the edges of the image are detected with two thresholds. Image segmentation algorithms based on edge detection are mostly applied to images with obvious linear features, but they have clear shortcomings: for images with uneven lighting and complex edges, these operators produce blurred, discontinuous, and weak edges.

2.3. Decomposition and Clustering of Physical Condition Images

Video motion segmentation technology emerged on the basis of the image segmentation technology of athletes’ physical condition. Image segmentation uses only the spatial information of the image. For sports video, the relationship between preceding and following frames of the image sequence contains more useful information. Because sports video involves a large amount of data, it is convenient neither to store nor to transmit. In video, moving objects often carry most of the information that people are interested in. Therefore, in video processing, accurate segmentation of moving objects is a prerequisite for many other tasks. In computer vision, motion videos can be divided into four types according to the motion states of the lens and the object being photographed. The two most common are the case in which the lens is static and the object moves and the case in which both the lens and the object move. The latter is the most challenging in video processing. In addition, according to the number of lenses and moving objects, videos can be divided into multiview and multitarget situations.

The idea of the algorithm is as follows. First, the image is scanned from left to right and from top to bottom; the first black point found must be the upper-left boundary point. Starting from this boundary point, the initial search direction is defined as toward the lower right. If the point at the lower right is black, it is a boundary point. Otherwise, the search direction is rotated 45 degrees counterclockwise at a time until the first black point is found, and this black point becomes the new boundary point.

The search direction is then rotated 90 degrees counterclockwise relative to the current direction, and the next black point is sought in the same way until the search returns to the original boundary point.

The ultimate goal of filtering is to remove noise while damaging the image quality as little as possible. Here, a new filtering method based on median filtering is proposed. Viewed over the entire filtering process, it is a median filter based on taking the maximum or the minimum value; because the window varies during filtering, it is also called dynamic filtering. For an input image of a given size, processing starts from an initial window and continues until the entire image has been traversed, which yields the filtered image. Figure 3 shows the decomposition and clustering process of physical condition images.

In addition to contour tracking algorithms, edge detection can use the Laplacian and Sobel operators to quantify the rate of change of the gray level and to determine its direction from the neighborhood of each pixel.
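As an illustration of how a neighborhood operator quantifies the gray-level change and its direction, the sketch below applies the standard 3 × 3 Sobel templates with NumPy. The function name `sobel` and the test image are illustrative; only the interior pixels are evaluated, so the output is two pixels smaller in each dimension.

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)   # responds to horizontal change
SOBEL_Y = SOBEL_X.T                             # responds to vertical change

def sobel(image):
    """Return gradient magnitude and direction for interior pixels."""
    img = np.asarray(image, dtype=float)
    h, w = img.shape
    gx = np.zeros((h - 2, w - 2))
    gy = np.zeros((h - 2, w - 2))
    for dy in range(3):
        for dx in range(3):
            patch = img[dy:dy + h - 2, dx:dx + w - 2]
            gx += SOBEL_X[dy, dx] * patch
            gy += SOBEL_Y[dy, dx] * patch
    magnitude = np.hypot(gx, gy)     # rate of gray-level change
    direction = np.arctan2(gy, gx)   # angle of the gradient, in radians
    return magnitude, direction

# A vertical step edge: dark on the left, bright on the right.
step = np.array([[0, 0, 10, 10]] * 4, dtype=float)
magnitude, direction = sobel(step)
```

For this step edge the gradient points along the x-axis, so the direction is zero everywhere in the evaluated region.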

In practical applications, because the optical lens introduces distortion, a nonlinear model is used to reflect the projection process more faithfully. For the most complex global motion, that is, when both the lens and the object are moving, the lens motion must be compensated before segmentation. Motion estimation is therefore of great significance to video segmentation, and accurate estimation of the moving objects in the video is the basis for a good segmentation result. Estimating the actual motion in space from motion in the image plane is underdetermined, so some motion assumptions need to be made. There are generally three: temporal continuity, spatial continuity, and brightness invariance. In addition, some motion estimation methods, such as the optical flow method and the block matching method, impose further constraints suited to the actual motion.

2.4. Real-Time Analysis Model Factor Normalization

At present, there are many real-time analysis and detection methods for feature points in the field of computer vision, and the motion information in the video is mostly described by feature points, so the selection of feature points is a key step in the field of computer vision.

If the number of feature points is too small, it is not enough to provide the required motion information, resulting in the failure of motion segmentation, but too many feature points will affect the computational efficiency of the algorithm. By extracting the trajectory information of densely sampled points, a similarity matrix between trajectories is constructed, and K-means clustering is used to segment moving objects. The feature points are tracked by optical flow method. In order to avoid drifting during the tracking process, only 15 frames are tracked. In addition, the concept of distance between trajectories is defined, and the similarity matrix is constructed based on this. Figure 4 shows the normalization of real-time analysis model factors.

When high-dimensional data sets are classified with a spectral clustering algorithm, the Euclidean distance between sample points is generally used to construct the similarity matrix. However, in the experimental data of this article, some moving objects have complex motion patterns, and the sample point trajectories are intricate, so the Euclidean distance cannot meet actual needs. After analyzing the trajectory information of the sampling points, the following method is adopted: to construct the similarity matrix between the sample points, the distance between trajectories is defined first.

For scenes where the light intensity is constant or changes only slowly, the single-Gaussian background model can represent the background image effectively through automatic updating of the background model. However, in a complex scene, the pixel value at a given position in the image is affected not only by changes in lighting but also by various complex interference factors such as dynamic background elements.

The pixel values at one position may follow several Gaussian distributions over a period of time. In this case, it is difficult to predict the true background with a single Gaussian model, and the performance of the single-Gaussian background method drops sharply in complex scenes. In view of this, background modeling with a mixed Gaussian model, which is suitable for complex scenes, is proposed on the basis of single-Gaussian background modeling.

For the trajectory clustering, the leading eigenvalues and corresponding eigenvectors of the normalized Laplacian matrix are computed, and the K-means clustering algorithm is then used to cluster these eigenvectors, which segments the moving objects according to the similarity between the trajectories.
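The clustering step can be sketched as follows. This is a minimal sketch under several assumptions that are not taken from this article: the trajectory distance is the mean point-to-point gap over a common time window, the similarity is a Gaussian kernel with width `sigma`, the eigenvectors of the smallest eigenvalues of the symmetric normalized Laplacian are used, and K-means is initialized deterministically with a farthest-point rule. All function names are hypothetical.

```python
import numpy as np

def trajectory_distance(a, b):
    """Assumed distance: mean gap between two (T, 2) trajectories."""
    return float(np.mean(np.linalg.norm(a - b, axis=1)))

def spectral_clusters(trajectories, k, sigma=1.0, iters=20):
    n = len(trajectories)
    dist = np.array([[trajectory_distance(trajectories[i], trajectories[j])
                      for j in range(n)] for i in range(n)])
    w = np.exp(-dist ** 2 / (2.0 * sigma ** 2))      # similarity matrix
    d_inv_sqrt = 1.0 / np.sqrt(w.sum(axis=1))
    lap = np.eye(n) - d_inv_sqrt[:, None] * w * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(lap)                    # ascending eigenvalues
    emb = vecs[:, :k]                                # k smallest eigenvalues
    norms = np.maximum(np.linalg.norm(emb, axis=1, keepdims=True), 1e-12)
    emb = emb / norms                                # row-normalize
    # K-means on the embedded rows, farthest-point initialization.
    centers = [emb[0]]
    for _ in range(1, k):
        gaps = np.min([np.linalg.norm(emb - c, axis=1) for c in centers], axis=0)
        centers.append(emb[np.argmax(gaps)])
    centers = np.array(centers)
    for _ in range(iters):
        gaps = np.linalg.norm(emb[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(gaps, axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = emb[labels == c].mean(axis=0)
    return labels

# Two points moving right near the origin, two moving up far away.
t = np.arange(5.0)
right0 = np.stack([t, 0.0 * t], axis=1)
right1 = right0 + np.array([0.0, 0.1])
up0 = np.stack([50.0 + 0.0 * t, 50.0 + t], axis=1)
up1 = up0 + np.array([0.1, 0.0])
labels = spectral_clusters([right0, right1, up0, up1], k=2)
```

Trajectories that stay close to each other over the window end up with the same label, while the two groups receive different labels; the label values themselves are arbitrary.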

3. Application and Analysis of the Real-Time Model of Athletes’ Physical Fitness in Training Based on Optical Flow Equation-Based Video Monitoring Technology

3.1. Feature Extraction of Video Monitoring Data

This paper uses experiments to verify the feasibility of the above algorithm. The software environment of the experiment is VS2010 and OpenCV, and the hardware is an Intel(R) Core(TM) i3-3240 at 3.40 GHz with 4 GB of memory. The preprocessing mainly simplifies the original vector data into data suitable for processing. In actual measurement, most of the data generated are complex, huge in volume, and irregular. These data are called original vector data. Preprocessing generally applies data filtering and selects only the data that are of interest and suitable for subsequent processing.

Generally, shapes and textures are selected to represent the characteristics of the data. The quality of the data mapping determines the effect of the visualization. Drawing the flow field relies mostly on computer graphics theory: the mapped data are drawn into an image that is easier for the observer to understand. Figure 5 shows the error fitting distribution of the video monitoring data.

It can be seen that the illumination of the background wall changes little between the first and second frames, and the optical flow of moving objects can be detected effectively. Between the third and fourth frames, however, pedestrians pass by and change the illumination of the background wall, and the speed of pedestrian movement between frames increases, so the accuracy of the optical flow calculation decreases. When the background illumination of the image sequence changes strongly, the Lucas-Kanade algorithm loses accuracy, and the computed optical flow of the moving object is not continuous enough. Nevertheless, the Lucas-Kanade algorithm adapts better to the environment than the Horn-Schunck algorithm; that is, its “anti-noise” performance is better. In the block motion displacement matching algorithm, the current frame is divided into many blocks, and all pixels in each block are assumed to have the same motion displacement.

For each macro block in the current frame, the best-matching pixel block is searched for in the adjacent frame; the relative displacement between the block in the current frame and its matching block is the motion displacement vector of that block. Because the motion vectors of the background are smaller than those of the foreground moving target, the two can be separated in the BMV image by thresholding. Here, the Otsu adaptive threshold method is used to determine a segmentation threshold, and the BMV image is binarized according to it: pixels greater than the threshold are set to 0 and all others to 255, which gives a new binary image.
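The block matching step can be sketched with an exhaustive search. The sketch assumes the sum of absolute differences (SAD) as the matching criterion and a small square search range; the block size, search radius, and function name `match_block` are illustrative and are not values used in this article.

```python
import numpy as np

def match_block(curr, ref, top, left, size=4, radius=2):
    """Find the displacement (dy, dx) of one block of `curr` inside `ref`.

    Returns the best displacement and its SAD cost. Frames are converted
    to float so that differences of unsigned pixels cannot wrap around.
    """
    curr = np.asarray(curr, dtype=float)
    ref = np.asarray(ref, dtype=float)
    block = curr[top:top + size, left:left + size]
    best, best_cost = (0, 0), float("inf")
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > ref.shape[0] or x + size > ref.shape[1]:
                continue  # candidate block would leave the frame
            cost = float(np.abs(block - ref[y:y + size, x:x + size]).sum())
            if cost < best_cost:
                best, best_cost = (dy, dx), cost
    return best, best_cost

# Reference frame with a bright square; in the current frame the square
# has moved 1 pixel down and 2 pixels to the right.
ref = np.zeros((10, 10))
ref[2:6, 3:7] = 1.0
curr = np.roll(ref, shift=(1, 2), axis=(0, 1))
displacement, cost = match_block(curr, ref, top=3, left=5)
```

The returned displacement points from the block in the current frame to its match in the reference frame, so a target that moved down and to the right yields a negative displacement in both coordinates.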

3.2. Real-Time Model Simulation of Athletes’ Physical Fitness

The experiment shows that the principle of the method is simple and the algorithm is easy to implement. In addition, different threshold values can be set for different situations, or Otsu’s method (OTSU) can be used directly to determine the threshold. The result then directly reflects the size, shape, and position of the moving target, and the accuracy of the algorithm is relatively high. However, the background image is difficult to extract, and the algorithm is susceptible to external conditions such as light and weather. These conditions change the gray values of the original background image, so the background image must be updated in real time.
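A minimal sketch of Otsu threshold selection is given below. It assumes an 8-bit gray image and uses the standard between-class variance criterion; the function name `otsu_threshold` and the test image are illustrative. Pixels with values less than or equal to the returned threshold form one class and the remaining pixels the other.

```python
import numpy as np

def otsu_threshold(gray):
    """Return the threshold that maximizes the between-class variance."""
    values = np.asarray(gray, dtype=np.uint8).ravel()
    hist = np.bincount(values, minlength=256).astype(float)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                   # weight of the class <= t
    mu = np.cumsum(prob * np.arange(256))     # cumulative mean up to t
    mu_total = mu[-1]
    denom = omega * (1.0 - omega)
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_total * omega - mu) ** 2 / denom
    sigma_b[denom == 0] = 0.0                 # one class empty: no split
    return int(np.argmax(sigma_b))

# Hypothetical difference image with a dark and a bright population.
img = np.array([[10, 10, 200, 200],
                [10, 10, 200, 200]], dtype=np.uint8)
t = otsu_threshold(img)
mask = img > t   # True where the pixel belongs to the bright class
```

Because the threshold is derived from the histogram of each image, it adapts automatically when lighting changes shift the gray levels.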

When the scale of the search window is larger than the scale of the gray-level variation, multiple minima appear on the SSD surface, and the peak is “offset” by the weighted least squares method, which results in an incorrect velocity estimate and a very low confidence measure. The value of the confidence measure reflects the reliability of the velocity estimate. When the response in the search window is very small, the final velocity estimate is very sensitive to the value chosen for the regularization parameter. Figure 6 shows the confidence measurement curve of athletes’ physical condition.

Generally, the process of using image processing technology to eliminate jitter in video images can be divided into three major parts according to functions: motion estimation, motion filtering, and motion compensation. First, it is necessary to calculate the global motion vector between two adjacent frames of images, and then filter them, and finally perform motion compensation on the second frame of image to complete the elimination of image jitter. Edge detection is one of the most basic operations used to detect significant changes in a part of an image. In one dimension, the step edge is related to the local peak of the first derivative of the image. Gradient is usually used as a measure of function change.

An image can be regarded as an array of samples of a continuous image intensity function. Therefore, as in the one-dimensional case, a discrete approximation of the gradient can be used to detect significant changes in the image gray value. The feature-based method is not affected by overall changes in the image gray level. However, the quality of the extracted features often depends on the choice of operators and parameter settings and is easily affected by noise during extraction, so required features may be missed or spurious features added, which makes the matching relationship difficult to solve and causes mismatches. These problems can be addressed by improving the algorithm. Figure 7 compares the evaluation errors of athletes’ physical condition.

Although the jitter of the background has been eliminated in the compensated image, some motion remains in interference areas. Because the motion area of the target is connected, an image segmentation algorithm based on connected domain analysis is used to segment the target motion area and obtain the moving target image. Because the SUSAN algorithm uses low-level image processing to detect feature points with small core values, it has great advantages in noise resistance and computing speed and is suitable for high-noise locations such as coal mines.

The choice of scale for the algorithm is related to the size of the input image. When the size of the image sequence is constant, a suitable scale can be determined that lets the Retinex algorithm achieve a better enhancement effect. In practical applications, the scale parameter can be set according to need. For example, to obtain more detailed image information, the scale parameter only needs to be set to a small value. A Retinex algorithm with a single scale cannot guarantee good detail enhancement and good color fidelity at the same time.

3.3. Example Application and Analysis

During the actual shooting and acquisition of the video image sequence used in the experiment, the complexity of the imaging environment introduces random, disorderly noise into the sequence. Therefore, for the accuracy of the experiment, the video image sequence must be filtered before further image processing. In this paper, a Gaussian filter with a fixed window size is used to preprocess the video image sequence (other window sizes can also be selected). The Gaussian filter is very effective in suppressing normally distributed noise, and it is an effective low-pass filter in both the spatial domain and the frequency domain. In addition, the system function of the Gaussian filter is smooth, so ringing is avoided.

In this paper, different detection methods are used to process video sequences in different scenes. Whether the method is background subtraction or the interframe difference method in static scenes, or the optical flow method in dynamic scenes, all of them operate on the gray values of the image pixels, so the collected color image must be converted into a gray image before the experiment. Figure 8 shows the result of Gaussian filtering gray scale processing for video monitoring.
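The color-to-gray conversion can be sketched as a weighted sum of the three color channels. The weights 0.299, 0.587, and 0.114 are the commonly used luminance coefficients and are an assumption here, as is the RGB channel order; the function name `to_gray` is illustrative.

```python
import numpy as np

# Assumed luminance weights for the R, G, and B channels.
WEIGHTS = np.array([0.299, 0.587, 0.114])

def to_gray(rgb):
    """Convert an (H, W, 3) RGB image to an (H, W) 8-bit gray image."""
    rgb = np.asarray(rgb, dtype=float)
    gray = rgb @ WEIGHTS                      # weighted sum over channels
    return np.clip(np.rint(gray), 0, 255).astype(np.uint8)

# One row with a white, a pure red, and a black pixel.
pixels = np.array([[[255, 255, 255], [255, 0, 0], [0, 0, 0]]])
gray = to_gray(pixels)
```

Images loaded with a library that stores channels in BGR order would need the weights reversed before this function is applied.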

In order to estimate the optical flow of the corner points corresponding to the next frame of image, we also need to use the corner point matching algorithm to get the matching points in the next frame of image. The template matching algorithm in the corner matching algorithm has the advantages of easy template selection, simple calculation, easy implementation, and high accuracy and is widely used in image registration. Normalized cross-correlation (NCC) is the most outstanding representative of template matching algorithms.

When the corner points are matched, the correlation coefficient is first calculated between a corner point in image 1 and each corner point inside the rectangular window in image 2, and the corner point with the largest NCC value is taken as the matching point, which yields a matching point set. A pair of corner points is accepted as a match only if the same pair appears in both matching point sets. Figure 9 compares the video monitoring Gaussian filtering correlation coefficients.
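The selection of the best candidate by correlation can be sketched as follows. The sketch uses the zero-mean form of normalized cross-correlation, which is an assumption, and compares small patches around corner points; the function names `ncc` and `best_match` and the patch values are hypothetical.

```python
import numpy as np

def ncc(a, b):
    """Zero-mean normalized cross-correlation of two equal-size patches."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    if denom == 0:
        return 0.0  # a flat patch carries no structure to correlate
    return float((a * b).sum() / denom)

def best_match(template, candidates):
    """Return the index of the candidate with the largest NCC value."""
    scores = [ncc(template, c) for c in candidates]
    return int(np.argmax(scores)), scores

# Patch around a corner in image 1 and three candidate patches in image 2.
template = np.array([[1.0, 2.0], [3.0, 5.0]])
candidates = [
    -template,               # inverted contrast
    2.0 * template + 7.0,    # same structure, different gain and offset
    np.full((2, 2), 4.0),    # flat patch
]
index, scores = best_match(template, candidates)
```

Because the patches are normalized, a candidate that differs only in brightness gain and offset still receives a score close to 1, which is one reason NCC is popular for matching under changing illumination.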

For the experiment, a video image sequence collected by an indoor surveillance camera is selected, that is, a sequence with a static background. The frame immediately before the first frame in which the moving target appears is used as the background image. The video is then read by the software, each frame is collected from the video in real time and subtracted from the background image, and threshold processing is performed. Here, three frames are randomly selected from the video image sequence, each selected frame is subtracted from the background image, and threshold processing is performed.
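The subtraction and thresholding step can be sketched as follows. The threshold value 25, the function name `foreground_mask`, and the synthetic frames are illustrative and are not taken from the experiment; the sketch only assumes 8-bit gray frames of equal size.

```python
import numpy as np

def foreground_mask(frame, background, threshold=25):
    """Mark pixels whose absolute difference from the background is large."""
    # Widen to a signed type first: uint8 subtraction would wrap around.
    diff = np.abs(frame.astype(np.int16) - background.astype(np.int16))
    return diff > threshold

# Synthetic static background and a frame containing a bright 2x2 target.
background = np.full((4, 4), 100, dtype=np.uint8)
frame = background.copy()
frame[1:3, 1:3] = 180   # moving target
frame[0, 0] = 90        # small illumination change, below the threshold
mask = foreground_mask(frame, background)
```

Only the four target pixels are marked; the small brightness change in the corner stays below the threshold and is treated as background.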

The final area is the moving target area.

In the new three-step search (NTSS), the second step depends on where the minimum block distortion (MBD) point found in Step 1 lies. (1) If the MBD point obtained by Step 1 is on the rectangular box at the center, the rectangular box centered on this MBD point is searched to obtain the MBD point, and the search ends. (2) If the MBD point obtained by Step 1 is on the outer rectangular frame, the search mode is the same as in the three-step search (TSS). (3) If the MBD point obtained by Step 1 is exactly the center point, the center point is the best matching point, and the search ends. In case (1), the next search step is similar to the four-step search (FSS) and takes one of two forms, depending on whether the MBD point lies on a side or on a corner of the rectangular box; the two forms differ in the total number of points searched. Both forms need only two steps to find the best matching point. In case (2), the same search mode as TSS is adopted to find the best matching point.
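The coarse-to-fine idea shared by these search strategies can be sketched with the plain three-step search. This is a sketch of TSS only, not of NTSS or FSS; the initial step of 4 pixels, the SAD criterion, and the function names are assumptions for illustration. The search is written against a generic cost function so that its behavior can be checked on a simple bowl-shaped error surface.

```python
import numpy as np

def three_step_search(cost, step=4):
    """Coarse-to-fine search for the displacement with minimum cost.

    cost(dy, dx) returns the matching error of a displacement. At each
    stage the 9 points around the current best are tested, then the
    step is halved (4, 2, 1 by default).
    """
    best = (0, 0)
    while step >= 1:
        candidates = [(best[0] + sy * step, best[1] + sx * step)
                      for sy in (-1, 0, 1) for sx in (-1, 0, 1)]
        best = min(candidates, key=lambda d: cost(d[0], d[1]))
        step //= 2
    return best

def sad_cost(curr, ref, top, left, size):
    """Build a SAD cost function for one block of the current frame."""
    curr = np.asarray(curr, dtype=float)
    ref = np.asarray(ref, dtype=float)
    block = curr[top:top + size, left:left + size]

    def cost(dy, dx):
        y, x = top + dy, left + dx
        if y < 0 or x < 0 or y + size > ref.shape[0] or x + size > ref.shape[1]:
            return float("inf")  # candidate block leaves the frame
        return float(np.abs(block - ref[y:y + size, x:x + size]).sum())

    return cost

# Check on an idealized error surface with its minimum at (5, -3).
found = three_step_search(lambda dy, dx: (dy - 5) ** 2 + (dx + 3) ** 2)

# The same search can be driven by image data through sad_cost.
frames = np.arange(64).reshape(8, 8)
cost_fn = sad_cost(frames, frames, top=2, left=2, size=4)
```

Like the methods described above, this search assumes that the matching error decreases toward the true displacement; on error surfaces with several local minima it can stop at the wrong point, which is the motivation for center-biased variants such as NTSS.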

4. Conclusion

This article introduces three classic moving target detection algorithms: the background difference method, the frame difference method, and the optical flow method. The theory of the three algorithms is discussed, and the advantages and disadvantages of each are compared. The moving target detected by the background difference method is prone to “tailing,” and the method is sensitive to dynamic changes of the background; the frame difference method requires little computation, but the detected moving target has “holes”; the optical flow method can detect the moving target even when the camera is moving, but its antinoise performance is poor, and it cannot detect the complete contour of the moving target very accurately. Through experimental analysis of various edge operators, the improved Prewitt operator is chosen to detect the edges of the object and obtain more complete contour information. Considering the shortcomings of the traditional optical flow algorithm in real-time performance and noise resistance, the robust SUSAN corner detector is used to detect corners, and the improved layered optical flow method is then used to estimate the optical flow vectors, which improves the real-time performance of the algorithm. To deal with changes in underground illumination, this paper also proposes tracking the trajectories of the feature points with the layered optical flow and eliminating mismatched optical flow points with the forward-backward error method. Finally, the edge information of the object is fused with the optical flow motion information, and the results are processed with mathematical morphology methods. The optical flow method has a wide range of applications in moving target detection and tracking. By comparison with the other methods, this article explains the advantages of the optical flow method and points out its shortcomings. It then proposes an improvement and verifies the effectiveness of the improved algorithm through experiments.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.


Acknowledgments

This work was supported by the 2021 Huanghe Jiaotong University school-level scientific research and innovation team: Huanghe Jiaotong University ‘Duzhi’ Enjoying Sports and Health Science Innovation Research Team (No. 2021TDZZ05).