Abstract
To enhance the effect of motion detection, a Gaussian modeling algorithm is proposed to fix the holes and breaks caused by the conventional frame difference method. The proposed algorithm uses an improved three-frame difference method: a three-frame image sequence with one frame interval is selected for pairwise difference calculation, and the logical “OR” operation is used to achieve fast motion detection and reduce holes and breaks. The Gaussian algorithm establishes an adaptive learning model that makes the size and contour of the detected motion more accurate. The motion regions extracted by the improved three-frame difference method and the Gaussian model are logically combined to obtain the final foreground motion image. Moreover, a moving target detection method based on the U-Net deep learning network is proposed to reduce the dependency of deep learning on the size of the training dataset, allowing the algorithm to train models on small datasets. The method calculates the ratio of positive to negative samples in the dataset and uses the reciprocal of this ratio as the sample weight to deal with the imbalance of positive and negative samples. Finally, a threshold is set on the predicted results to obtain the moving object detection accuracy. Experimental results show that the algorithm can suppress the generation of holes and breaks and reduce noise, and it can quickly and accurately detect movement, meeting the design requirements.
1. Introduction
Motion detection refers to the process of extracting and identifying, as foreground motion, objects whose spatial position changes in an image sequence or video [1]. Because scenes change dynamically, it is difficult to detect and segment motion reliably and effectively, and the problem has long been a common research subject in computer vision. Operations such as tracking, processing, recognition, and behavior discrimination can be performed once moving objects are effectively segmented. These operations are widely applied in practice, for example, in video surveillance, motion recognition, and multimedia applications, among others [2].
At present, the frame difference method [3], the optical flow method [4], and the background subtraction method [5] are standard methods for motion detection, in addition to the feature matching method, the K-nearest neighbor method, and the ViBe algorithm [6]. In the frame difference method, two adjacent frames of the image sequence are taken, and a pixel-wise temporal difference followed by binarization is used to extract the motion area in the image. The advantages of this method are its simple principle and fast calculation speed; however, its noise resistance is poor, and holes and breaks are easily produced [7]. The ViBe algorithm stores a sample set for each pixel, in which the samples are the pixel's past values and the values of its neighboring pixels. The background model's sample set can be created from the first frame of the image, and each new pixel value is then compared with the sample set to decide whether it is a background point. Its calculation efficiency is high, but breaks and other problems are easily caused. In the optical flow method, the motion information of objects between adjacent frames is determined from the temporal changes of pixels in the image sequence and the correlation between the previous frame and the current frame. Although the entire motion area can be detected, the amount of calculation is large, the structure is complex, and the method is easily affected by external factors.
In this study, we propose an optimized Gaussian model and an improved three-frame difference method for motion detection. The main contributions are as follows.
(1) We present an optimized Gaussian model and an improved three-frame difference method to overcome the shortcomings of the algorithms used for this task to date.
(2) We combine the optimized Gaussian model with the improved three-frame difference method. First, the traditional three-frame difference method is improved to obtain the foreground motion. Then, the Gaussian model is optimized. The motion obtained by the improved three-frame difference method and the motion obtained by the Gaussian model are fused by logical operation and morphological processing to obtain the final foreground motion. This algorithm can effectively overcome holes and adapt to environmental changes with good robustness.
The rest of this study is organized as follows. In Section 2, we discuss the improved three-frame difference method followed by the optimized Gaussian modeling in Section 3. In Section 4, the improved U-Net model is discussed. In Section 5, the experimental results and analysis are provided. Finally, the study is concluded and future research directions are provided in Section 6.
2. Improved Three-Frame Difference Method
In the traditional three-frame difference method, three adjacent frames are taken, and difference operations are performed between the first two frames and between the last two frames. The two difference images are then combined with the “AND” operation to obtain the result [8]. Due to the continuity of the video sequence, the change between adjacent frames is minimal if there is no motion in the area; if there is motion, there are apparent changes between consecutive frames [9]. Although motion can be detected quickly by this method, problems such as holes and double shadows are likely to appear.
To solve the aforementioned problems, the traditional three-frame difference method is improved in this study. The difference operation is conducted on three frames of the video sequence separated by one-frame intervals, and the “OR” operation is used instead of the “AND” operation on the difference images. The specific steps are as follows (an illustrative code sketch follows this list):
(1) Three frames with one frame interval between them are collected, converted into grayscale images, and filtered to remove noise, yielding the three images denoted as f_{k−2}(x, y), f_k(x, y), and f_{k+2}(x, y).
(2) Frame k is taken as the current frame, and the difference operation is performed with frames k + 2 and k − 2, respectively, to obtain the absolute differences
D1(x, y) = |f_k(x, y) − f_{k+2}(x, y)|,
D2(x, y) = |f_k(x, y) − f_{k−2}(x, y)|.
(3) The differences are binarized with a threshold and combined by an OR operation:
D(x, y) = B1(x, y) ∨ B2(x, y), where B_i(x, y) = 1 if D_i(x, y) exceeds the threshold and 0 otherwise.
(4) Finally, appropriate morphological processing is performed on the result.
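As an illustration, the following Python/OpenCV sketch implements the steps above; the function name, the threshold value, and the filter/kernel sizes are illustrative assumptions rather than the exact settings used in the experiments.

```python
import cv2

def improved_three_frame_diff(f_prev, f_cur, f_next, thresh=25):
    """Improved three-frame difference on grayscale frames k-2, k, k+2.

    Returns a binary foreground mask. The threshold and kernel sizes are
    illustrative choices, not the values used in the paper's experiments.
    """
    # Step 1: denoise the grayscale frames
    f_prev = cv2.GaussianBlur(f_prev, (5, 5), 0)
    f_cur = cv2.GaussianBlur(f_cur, (5, 5), 0)
    f_next = cv2.GaussianBlur(f_next, (5, 5), 0)

    # Step 2: absolute differences D1 and D2
    d1 = cv2.absdiff(f_cur, f_next)   # |f_k - f_{k+2}|
    d2 = cv2.absdiff(f_cur, f_prev)   # |f_k - f_{k-2}|

    # Step 3: binarize and combine with OR (instead of AND)
    _, b1 = cv2.threshold(d1, thresh, 255, cv2.THRESH_BINARY)
    _, b2 = cv2.threshold(d2, thresh, 255, cv2.THRESH_BINARY)
    mask = cv2.bitwise_or(b1, b2)

    # Step 4: morphological processing to clean up the mask
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)
    return mask
```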
According to the above principles, code was written to perform motion detection on a video with both the traditional three-frame difference method and the improved three-frame difference method. The detection effect is shown in Figure 1.

[Figure 1: (a) and (b) show detection results from two different periods of the same video.]
In Figures 1(a) and 1(b), images from different periods of the same video are shown. In each figure, the left image is the original video frame, the middle image is the motion detected by the traditional three-frame difference method, and the right image is the motion detected by the improved three-frame difference method. The detection results show that the improved three-frame difference method alleviates the hole problem to some degree. However, the detected motion is deformed to a certain extent, so the method needs to be further improved or combined with other algorithms.
3. Optimized Gaussian Modeling
Over a short time, the value or feature of each pixel in the image is distributed within a specific range around a certain central point, and this distribution is regular. According to statistical laws, if the amount of data is large enough, the distribution is normal, that is, a Gaussian distribution. Based on this property, if a pixel's value is far from the center point, the pixel belongs to the foreground; otherwise, it belongs to the background [10]. In motion detection, each background pixel is modeled by a mixture of K Gaussian distributions (K is generally taken as 3–5). Gaussian modeling extracts clearer motion regions, and the background model can be updated adaptively. It can be combined with the improved three-frame difference method so that the two approaches complement each other's advantages and disadvantages and enhance motion detection [11].
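For quick experimentation with Gaussian-mixture background subtraction, OpenCV provides a built-in implementation. The short sketch below uses it only to illustrate the idea; the video file name and the history/variance-threshold parameters are illustrative assumptions, and this is not the model configuration developed in this study.

```python
import cv2

cap = cv2.VideoCapture("highway.avi")  # hypothetical input video
# Gaussian-mixture background subtractor; parameter values are illustrative
subtractor = cv2.createBackgroundSubtractorMOG2(history=200, varThreshold=16,
                                                detectShadows=False)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    fg_mask = subtractor.apply(frame)   # per-pixel foreground mask
    cv2.imshow("foreground", fg_mask)
    if cv2.waitKey(30) & 0xFF == 27:    # press Esc to quit
        break
cap.release()
cv2.destroyAllWindows()
```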
3.1. Background Model Building
The grayscale histogram of an image reflects the frequency of each gray level in the image. Suppose the motion area in the image is large compared with the background area and there is a certain gray-level difference between the motion area and the background; in that case, the grayscale histogram presents a bimodal peak-valley shape, where one peak corresponds to the motion and the other corresponds to the central gray level of the background. Complex images are generally multipeaked. By treating the multipeak characteristics of the histogram as the superposition of multiple Gaussian distributions, the image segmentation problem can be solved [12].
Each pixel of the background image is modeled by a Gaussian mixture model composed of K Gaussian distributions:
P(x_t) = Σ_{i=1}^{K} w_{i,t} · η(x_t, μ_{i,t}, Σ_{i,t}),
where K is the total number of distribution patterns, w_{i,t} is the estimated weight coefficient of the ith Gaussian distribution in the mixture at time t, η(x_t, μ_{i,t}, Σ_{i,t}) is the probability density function of the ith Gaussian distribution at time t, x_t is the pixel's value at time t, and μ_{i,t} and Σ_{i,t} are the mean vector and covariance matrix of the ith Gaussian distribution in the mixture at time t.
The distributions are arranged in descending order of w_{i,t}/σ_{i,t}; the larger this value, the more likely the distribution is to describe the background. The first B Gaussian distributions are then selected as the background model, with
B = arg min_b ( Σ_{i=1}^{b} w_{i,t} > T ),
where T represents the proportion of the data attributed to the background.
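A small sketch of this selection step is shown below, assuming per-pixel arrays of mode weights and standard deviations; the function name and the value T = 0.7 are illustrative assumptions.

```python
import numpy as np

def select_background_distributions(weights, sigmas, T=0.7):
    """Sort modes by w/sigma and pick the first B whose cumulative weight
    exceeds the background proportion T (T = 0.7 is illustrative)."""
    order = np.argsort(-(weights / sigmas))      # descending w/sigma
    cumulative = np.cumsum(weights[order])
    B = int(np.argmax(cumulative > T)) + 1       # smallest b with sum > T
    return order[:B]                             # indices of background modes

# Example: three modes with weights and standard deviations
w = np.array([0.5, 0.3, 0.2])
s = np.array([4.0, 6.0, 12.0])
print(select_background_distributions(w, s, T=0.7))  # prints [0 1]
```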
3.2. Foreground Motion Detection
After the background model is established, each pixel value x_t of a newly input frame is matched against the B Gaussian distributions of the background model. If the pixel value matches any of these distributions, the pixel is a background point; otherwise, it is foreground, thus completing the motion detection. Each new pixel value x_t is compared with the current K distributions, and a match is declared when
|x_t − μ_{i,t−1}| ≤ 2.5 σ_{i,t−1},
where the mean deviation coefficient is generally taken as 2.5.
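As a small illustration, this matching test can be written as follows (scalar gray values are assumed; the function name is illustrative):

```python
def matches(x, mu, sigma, k=2.5):
    """Return True if pixel value x matches a Gaussian mode, i.e., it lies
    within k standard deviations of the mode's mean (k = 2.5 as in the text)."""
    return abs(x - mu) <= k * sigma
```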
3.3. Background Update
The background model must be updated during the motion detection process so that detection can adapt to changes in the environment; α denotes the learning rate. If a pixel of a new frame does not match any Gaussian distribution, the distribution with the smallest weight is replaced by a new one centered on the current pixel value. For the distributions that are not matched, the weight is updated as
w_{k,t} = (1 − α) w_{k,t−1},
while their means and standard deviations remain unchanged.
If at least one Gaussian distribution in the background model matches the pixel, the parameters of the matched distribution are updated as
w_{k,t} = (1 − α) w_{k,t−1} + α,
μ_t = (1 − ρ) μ_{t−1} + ρ x_t,
σ_t² = (1 − ρ) σ_{t−1}² + ρ (x_t − μ_t)²,
where ρ = α · η(x_t | μ_k, σ_k) is the parameter learning rate.
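A simplified per-pixel sketch of this matching and update logic is given below, assuming scalar gray values; the learning rate, the re-initialization values for a replaced mode, and the simplified foreground decision are illustrative assumptions rather than the paper's exact settings.

```python
import numpy as np

def gaussian_pdf(x, mu, var):
    # 1-D Gaussian probability density
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

def update_pixel_model(x, w, mu, var, alpha=0.01, k=2.5):
    """One Stauffer-Grimson style update for a single gray-level pixel.

    w, mu, var are 1-D arrays over the K modes; alpha and the
    re-initialization values below are illustrative, not the paper's values.
    """
    w, mu, var = w.copy(), mu.copy(), var.copy()
    matched = np.abs(x - mu) <= k * np.sqrt(var)    # 2.5-sigma match test
    if matched.any():
        i = int(np.argmax(matched))                 # first matching mode
        rho = alpha * gaussian_pdf(x, mu[i], var[i])
        w = (1 - alpha) * w                         # unmatched weights decay
        w[i] += alpha                               # matched mode gains weight
        mu[i] = (1 - rho) * mu[i] + rho * x
        var[i] = (1 - rho) * var[i] + rho * (x - mu[i]) ** 2
        # Simplified decision; the full method checks the match against
        # the B background modes selected earlier.
        foreground = False
    else:
        j = int(np.argmin(w))                       # replace the weakest mode
        mu[j], var[j], w[j] = x, 15.0 ** 2, 0.05    # illustrative init values
        foreground = True
    w = w / w.sum()                                 # keep weights normalized
    return w, mu, var, foreground
```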
The effect of motion detection by Gaussian modeling is shown in Figure 2, where the left side is the original video image and the right side is the motion detection image. From the figure, it can be seen that the Gaussian model can better overcome the problem of holes. At the same time, the detected motion is not elongated, and its outline is clear [13].

3.4. Combining the Improved Three-Frame Difference Method
The frame difference approach is simple to implement and fast. However, due to its low noise resistance, holes appear easily. In the improved three-frame difference method, three nonadjacent frames are taken and the OR operation is used instead of the AND operation, which not only eliminates gaps but can also cause problems such as elongated motion regions and blurred outlines. Therefore, the Gaussian model is combined with it to establish a learned background model for motion detection: the outline of the motion detected by the Gaussian model is relatively clear, and its shape is not elongated.
In the detection process, the motion regions are first extracted by the improved three-frame difference method and by Gaussian modeling. Then, the logical AND operation is applied to the motion images obtained by the two algorithms. Next, connectivity detection and morphological operations are applied to the result, and finally the complete motion region is obtained. The flow chart of the algorithm is shown in Figure 3.
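A minimal sketch of this fusion step is given below, assuming the two binary masks computed earlier; the function name, kernel size, and minimum-area filter are illustrative assumptions.

```python
import cv2
import numpy as np

def fuse_masks(mask_frame_diff, mask_gmm, min_area=50):
    """Combine the improved three-frame difference mask and the Gaussian-model
    mask, then clean the result; min_area and kernel size are illustrative."""
    combined = cv2.bitwise_and(mask_frame_diff, mask_gmm)   # logical AND

    # Morphological closing to join nearby fragments
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    combined = cv2.morphologyEx(combined, cv2.MORPH_CLOSE, kernel)

    # Connectivity detection: drop tiny components that are likely noise
    n, labels, stats, _ = cv2.connectedComponentsWithStats(combined)
    cleaned = np.zeros_like(combined)
    for i in range(1, n):                                    # skip background
        if stats[i, cv2.CC_STAT_AREA] >= min_area:
            cleaned[labels == i] = 255
    return cleaned
```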

4. Improved U-Net Model
Top-down image detection algorithms based on deep learning [14–18] extract global image information and high-level semantic information by learning image features. In contrast, bottom-up image detection algorithms based on low-level visual features extract low-level color and texture features and can capture rich local and spatial information. Therefore, the semantic and global information of the top-down detection algorithm is fused with the spatial information of the bottom-up saliency detection algorithm. The local information helps the high-level network locate the salient target and improves the image detection effect.
Therefore, this study adds the saliency map extracted by a bottom-up motion detection algorithm in the decoder stage of U-Net and achieves accurate prediction of the saliency image by fusing high-level global information with low-level local information. Specifically, the algorithm first resizes the saliency map extracted by the bottom-up image detection algorithm to generate a prior map of the same size as the feature map of the corresponding U-Net decoder stage, and then fuses the prior map with the upsampled feature map of that stage. Furthermore, a channel attention module is added after each upsampling layer to emphasize the more important channel information. Finally, the saliency map output of each layer is upsampled and used to construct a weighted loss function that continuously optimizes the learning of the neural network [16, 19–22]. The specific structure of the model is shown in Figure 4.
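To make the decoder-side fusion concrete, the sketch below shows one possible Keras decoder block that concatenates the upsampled features, the encoder skip connection, and a resized prior saliency map, followed by a squeeze-and-excitation style channel attention module. The layer sizes, the resizing layer, and the attention design are illustrative assumptions, not the exact configuration of the model in Figure 4.

```python
from tensorflow.keras import layers

def channel_attention(x, reduction=8):
    """Squeeze-and-excitation style channel attention (illustrative design)."""
    channels = x.shape[-1]
    s = layers.GlobalAveragePooling2D()(x)
    s = layers.Dense(channels // reduction, activation="relu")(s)
    s = layers.Dense(channels, activation="sigmoid")(s)
    s = layers.Reshape((1, 1, channels))(s)
    return layers.Multiply()([x, s])

def decoder_block(x, skip, prior_map, filters):
    """One U-Net decoder stage fused with a resized bottom-up prior map."""
    x = layers.UpSampling2D(size=2)(x)
    # Resize the bottom-up prior saliency map to the current spatial size
    prior = layers.Resizing(skip.shape[1], skip.shape[2])(prior_map)
    x = layers.Concatenate()([x, skip, prior])
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    return channel_attention(x)
```

The weighted loss mentioned above can then weight each pixel's binary cross-entropy by the reciprocal of the positive-to-negative sample ratio, as described in the abstract.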

5. Experimental Results and Analysis
To verify the algorithm’s effectiveness, a highway surveillance video was selected for the experiment. The hardware environment of the experimental platform is an ordinary computer with an i3 processor (2.1 GHz main frequency) and 4 GB of memory. The software environment is Visual Studio 2019 and the OpenCV3 computer vision open-source library, and the programming language is C#. Gaussian modeling, the improved three-frame difference method, and the combination of the two were tested; the results are shown in Figure 5. The improved U-Net model was implemented in Python with the Keras deep learning framework.

[Figure 5: (a), (b), and (c) show detection results from different periods of the same video.]
In Figure 5, the first column shows the original video sequence, the second column shows the result of the improved three-frame difference method, the third column shows the result obtained by Gaussian modeling, and the last column shows the motion obtained by combining the two methods. Figures 5(a)–5(c) show the detection results of the same video at different periods.
In Figure 6, we compare the accuracy of our method against the conventional Gaussian model and the improved three-frame difference method for different time frames. The figure shows that the proposed method achieves much better accuracy than the existing schemes.

Figure 7 shows the comparison of average time complexity under different time frames. Our scheme outperforms the existing models and yields much better results for varying numbers of frames.

6. Conclusions
A motion detection algorithm based on Gaussian modeling and an improved three-frame difference method is proposed in this study. The traditional three-frame difference method uses adjacent frames, whereas the improved algorithm uses nonadjacent frames and applies the “OR” operation to the difference images. Gaussian modeling is used to adapt to environmental changes. After the two algorithms are tested, improved, and combined, the deformation, holes, noise, and other problems caused by the frame difference method are effectively alleviated, which raises the efficiency of motion detection. The anti-interference ability is also enhanced, so the method meets the requirements of motion detection and has practical value.
Data Availability
The data used to support the findings of this study are included within the article.
Conflicts of Interest
The authors declare that they have no conflicts of interest.