An Adaptive Smoothness Parameter Strategy for Variational Optical Flow Model
The smoothness parameter balances the weight of the data term and the smoothness term in the variational optical flow model and plays a very significant role in optical flow estimation, but existing methods fail to obtain the optimal smoothness parameters (OSP). To solve this problem, an adaptive smoothness parameter strategy is proposed. First, an algorithm combining simple linear iterative clustering (SLIC) with a local membership function (LMF) segments the entire image into several superpixel regions. Then, image quality parameters (IQP) are calculated for each superpixel region. Finally, a neural network model computes the smoothness parameter of each superpixel region from its image quality parameters. Experiments were conducted on three public datasets (Middlebury, MPI_Sintel, and KITTI) and our self-constructed outdoor dataset with the proposed method and existing classical methods; the results show that our OSP method achieves higher accuracy than other smoothness parameter selection methods on all four datasets. Combined with the dual fractional order variational optical flow model (DFOVOFM), the proposed model outperforms other models in scenes with inhomogeneous or abnormal illumination. The OSP method fills a gap in research on adaptive smoothness parameter selection, advancing the development of variational optical flow models.
1. Introduction
Motion detection [1, 2] is a research hotspot in image processing and is widely applied in motion segmentation, target tracking, and video surveillance. The optical flow method is one of the most commonly used motion detection techniques; it aims to estimate the spatial displacement of each pixel between adjacent images of a sequence.
The variational optical flow algorithm is one of the most popular optical flow methods and consists of three parts: a data term, a smoothness term, and a smoothness parameter. To improve the performance of variational optical flow algorithms, many improvements have been made to the data and smoothness terms to solve the large displacement problem, enhance robustness against noise and illumination changes [10–12], maintain the discontinuity between different motion regions, highlight the contours of motion regions [14–16], and deal with occlusion problems [17, 18]. However, little attention has been paid to the method of selecting the smoothness parameter. The smoothness parameter acts as a bridge between the data term and the smoothness term and directly affects the final result of optical flow estimation: a small smoothness parameter leads to overfitting, whereas a large value causes underfitting.
The optimal smoothness parameter is determined by image quality: a small value is appropriate for an image with a clear outline and high contrast, while a large value is appropriate for an image with a blurry outline and low contrast. In optical flow estimation, many methods select the smoothness parameter through human visual perception, which yields no exact value, so the parameter chosen in this way is inaccurate. Moreover, many existing methods use the same smoothness parameter throughout the whole image, although the image quality may differ from region to region, creating a mismatch between the smoothness parameter and the image content. Although some methods have been proposed to adjust the smoothness parameter according to image content, none of them solves these problems well.
Figure 1 shows an illumination-inhomogeneous image sequence containing regions of different image quality and the optical flow estimation results of the dual fractional order optical flow model. Figures 1(a) and 1(b) are the input image sequence; Figures 1(c) and 1(d) are the optical flow estimation results obtained with two different smoothness parameter values. In Figure 1(c), the contour of the flow field in the bright part is clearer than in the dark part, whereas in Figure 1(d) the contour in the dark part is clearer than in the bright part. Therefore, if we used the parameter of Figure 1(c) in the bright part of the image and that of Figure 1(d) in the dark part, the accuracy would be the highest among these three settings.
The purpose of this paper is to solve the mismatch between the smoothness parameter and the image content in each part of an image; to this end, an adaptive smoothness parameter strategy is designed that selects the optimal smoothness parameter for each region of an image.
For this paper, the main contributions are as follows:
(1) A new image segmentation algorithm combining SLIC with LMF is proposed to segment the whole image into several superpixel regions.
(2) Instead of using a single smoothness parameter for the whole image, a different smoothness parameter is assigned to each superpixel region.
(3) Image quality parameters are used to calculate the smoothness parameter for the first time.
The rest of the paper is organized as follows. Section 2 reviews work related to smoothness parameter selection methods. Section 3 presents the flowchart of the algorithm. Section 4 introduces the variational optical flow model used in this paper. Sections 5 and 6 describe the proposed image segmentation method and the image quality parameters, respectively. Section 7 explains how the image quality parameters determine the smoothness parameter for each superpixel region. The experimental results are discussed in Section 8, and the conclusion is presented in Section 9.
2. Related Work
The smoothness parameter adjustment strategy for the variational optical flow model was first proposed by Nagel in 1986, who introduced the oriented smoothness constraint, using the variation of gray values to adjust the smoothness parameter. However, this method fails in scenes with insufficient illumination or illumination changes. A later smoothness weight selection method uses a blurring operator to calculate a weighted distance, but its spatially varying character needs to be adjusted before it can be applied to the variational optical flow model. A method combining the maximum likelihood function with a cross-validation function has been applied to estimate the smoothness parameter, but it cannot be used in large displacement scenes. A data-driven method based on estimated risk has been used to select the smoothness parameter, but it is computationally expensive, especially with robust data terms. A Bayesian framework has been presented to estimate the variational model parameters by minimizing an objective function. An average data constancy error criterion based on optimal prediction theory was proposed to select the smoothness parameter dynamically, but it is limited to linear rigid motion. Lee et al. applied an adaptive convolution kernel prior to address oversegmentation and oversmoothing, but the adaptive strategy fails when low contrast regions exist. Finally, a weighted root mean square (WRMS) error criterion has been used to estimate the smoothness parameter: an initial value is chosen by visual perception, and search procedures are then applied around it to find the parameter with the least root mean square error.
However, this method must make many attempts to find the optimal smoothness parameter, and it still applies the same parameter over the whole image; when the illumination in an image is uneven, the method is not suitable.
In recent years, many other improvements to the variational optical flow model have been proposed: TV-L1-based methods [28, 29], deep learning-based methods [30–32], and filtering-based methods [33–35], but none of them focuses on the smoothness parameter.
3. The Flowchart of the Algorithm
The flowchart of the algorithm is shown in Figure 2. First, the whole image is segmented into several superpixel regions by the SLIC and LMF method. Then, the image quality parameters are calculated for each superpixel region. Next, a neural network model, trained beforehand on pairs of image quality parameters and optimal smoothness parameters, determines the optimal smoothness parameter of each region from its image quality parameters. Finally, the DFOVOFM with the optimal smoothness parameters is used to estimate the optical flow for the image sequence.
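The pipeline above can be sketched as a short driver loop. All function names here (`segment_slic_lmf`, `image_quality_params`, `predict_lambda`, `dfovofm_flow`) are hypothetical placeholders for the components described in Sections 5–7, not code from the paper:

```python
def estimate_flow(img1, img2, segment_slic_lmf, image_quality_params,
                  predict_lambda, dfovofm_flow):
    """Per-region adaptive smoothness pipeline (hypothetical sketch)."""
    regions = segment_slic_lmf(img1)            # superpixel regions (Section 5)
    lam = {}
    for rid, mask in regions.items():
        iqp = image_quality_params(img1, mask)  # five quality parameters (Section 6)
        lam[rid] = predict_lambda(iqp)          # trained network output (Section 7)
    # one DFOVOFM solve with a spatially varying smoothness map
    return dfovofm_flow(img1, img2, regions, lam)
```

The components are passed in as arguments only to keep the sketch self-contained; in practice each would be a module of the full system.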
4. The Dual Fractional Order Optical Flow Model
The dual fractional order optical flow model (DFOVOFM) is a fractional order version of the HS model in which the data term and smoothness term are reconstructed with fractional order derivatives. The advantage of DFOVOFM is that it integrates the variation characteristics around the target points, which not only retains edge and texture details but also suppresses the influence of subtle noise, improving the robustness of the model to environmental changes during optical flow estimation.
The DFOVOFM is defined as

$$E(\mathbf{w}) = \iint \big(D_x^{\alpha} I\, u + D_y^{\alpha} I\, v + D_t^{\alpha} I\big)^2 \,dx\,dy + \lambda \iint \big(|D_x^{\alpha} u|^2 + |D_y^{\alpha} u|^2 + |D_x^{\alpha} v|^2 + |D_y^{\alpha} v|^2\big) \,dx\,dy,$$

where $E(\mathbf{w})$ represents the energy function of the optical flow model, $\mathbf{w} = (u, v)^T$ is the optical flow vector, and $u$ and $v$ are the components of the optical flow vector on the $x$ axis and $y$ axis, respectively. $D_x^{\alpha} I$, $D_y^{\alpha} I$, and $D_t^{\alpha} I$ denote the fractional order derivatives of the brightness function $I$ on the $x$, $y$, and $t$ axes. $\alpha$ represents the order of the fractional derivative. $D_x^{\alpha} u$, $D_y^{\alpha} u$, $D_x^{\alpha} v$, and $D_y^{\alpha} v$ denote the $\alpha$ order derivatives of the optical flow components $u$, $v$ on the $x$ and $y$ axes, respectively. $N(x, y)$ denotes the neighborhood of point $(x, y)$ over which the fractional derivatives are computed, $\lambda$ is the smoothness parameter, and $\lambda > 0$.
5. The Superpixel Segmentation
The image quality parameters in a local region are considered the same if the RGB values of its pixels are the same or similar. Therefore, we use the RGB value and position of each pixel to divide the image into several subregions and then calculate the quality parameters of each subregion. The Simple Linear Iterative Clustering (SLIC) algorithm, operating in the five-dimensional $(r, g, b, x, y)$ space, is applied in this paper for image segmentation.
Firstly, the whole image is segmented into $k$ subregions of equal size, and the coordinates and RGB values of the center points are stored. The distance between two adjacent center points is $S = \sqrt{N/k}$, where $N$ represents the number of pixels in the image.
Secondly, the Euclidean distance of the RGB values and the position information between the center point and its adjacent points in the region is calculated. A weight should be added to the spatial distance because the sizes of the subregions may differ:

$$D = d_{rgb} + \frac{m}{S}\, d_{xy},$$

where $d_{rgb}$ and $d_{xy}$ are the Euclidean distances in RGB space and in the image plane, respectively, and $m$ is a compensation factor. After that, the distance values of the adjacent pixels in the region are ranked from small to large; the new superpixel region is composed of the closest pixels, and the center point of every new superpixel region is then updated.
We iterate the second procedure until the average distance between $C_k$ and $C_{k+1}$ is less than a threshold $\varepsilon$, where $C_k$ and $C_{k+1}$ represent the coordinates of the center points at the $k$th and $(k+1)$th iterations. The iteration terminates when the following holds:

$$\frac{1}{M}\sum_{i=1}^{M} \left\| C_{k+1}^{(i)} - C_k^{(i)} \right\| < \varepsilon,$$

where $M$ is the number of superpixel centers.
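The distance computation and stopping test above can be sketched as follows, assuming the standard SLIC combined distance with grid interval `S` and compensation factor `m`; the exact weighting used by the authors may differ:

```python
import numpy as np

def slic_distance(center, pixel, S, m=10.0):
    """Combined color/spatial distance: D = d_rgb + (m / S) * d_xy.
    center and pixel are (r, g, b, x, y) tuples; S is the grid interval
    sqrt(N / k), and m is the compensation factor weighting spatial
    proximity against color similarity."""
    c = np.asarray(center, float)
    p = np.asarray(pixel, float)
    d_rgb = np.linalg.norm(c[:3] - p[:3])   # Euclidean distance in RGB space
    d_xy = np.linalg.norm(c[3:] - p[3:])    # Euclidean distance in position
    return d_rgb + (m / S) * d_xy

def converged(old_centers, new_centers, threshold=1.0):
    """Stop when the mean spatial displacement of the centers between two
    consecutive iterations falls below the threshold."""
    old = np.asarray(old_centers, float)
    new = np.asarray(new_centers, float)
    shift = np.linalg.norm(new[:, 3:] - old[:, 3:], axis=1)
    return float(shift.mean()) < threshold
```

With `m = 10` and `S = 10` the color and spatial terms are weighted equally, which is the usual SLIC default trade-off.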
However, with the conventional SLIC, edge points of a superpixel region may be misclassified even if their RGB values are very similar to those of the center point. To correct these erroneously segmented pixels, the local membership function (LMF) is applied to the edge points. The membership value $\mu_j$ represents the degree to which a pixel belongs to superpixel region $j$, and we reclassify the pixel to the superpixel region with the maximum $\mu_j$. Here $j = 1, \ldots, n$, where $n$ is the number of superpixel regions around the pixel, $I(x, y)$ is the brightness of the pixel, and $\bar{I}_j$ is the mean brightness of the pixels in superpixel region $j$, on which the membership value is based.
Figure 3 shows the segmentation result of Figure 1(a). Figures 3(a) and 3(b) are the results of original SLIC and improved SLIC, respectively. It can be seen that some of the edge points of superpixel regions have been classified correctly after processing by the local membership function.
6. Image Quality Parameter
Image quality parameters quantify the human visual perception of an image, and their computation is one of the basic steps in the image processing field; many image processing tasks can be performed well only if the image quality is evaluated accurately. Recently, many image quality parameters have been proposed, such as entropy, brightness, colorfulness, sharpness, and contrast. Panetta et al. proposed the Color Root Mean Square Enhancement (CRME) measure for color images as an extended version of the grayscale Root Mean Enhancement (RME), as well as a Color Quality Enhancement (CQE) measure that assesses color image quality through a combination of colorfulness, sharpness, and contrast.
Considering the aforementioned image quality parameters and the characteristics of the variational optical flow model, we designed a more comprehensive set of image quality parameters consisting of CRME; the colorfulness, sharpness, and contrast measures from CQE; and the signal-to-noise ratio.
6.1. Color Root Mean Enhancement (CRME)
The Color Root Mean Enhancement (CRME) is designed not only to measure the contrast of each color plane but also to measure the difference between the color cube center and its surroundings.
To calculate CRME, the image is segmented into $k_1 \times k_2$ blocks; $I_c$ is the center of each block, and $\bar{I}$ is the mean intensity in the block. $\omega_c$ is the weight of color plane $c$; according to the NTSC (National Television System Committee) standard, $\omega_R = 0.299$, $\omega_G = 0.587$, and $\omega_B = 0.114$.
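As an illustration only, a simplified block-wise center-versus-mean measure in the spirit of CRME might look as follows. This is a stand-in, not the exact published CRME formula; the NTSC plane weights 0.299/0.587/0.114 are the standard luminance coefficients:

```python
import numpy as np

def center_surround_contrast(channel, block=8, eps=1e-6):
    """Block-wise center-vs-mean contrast for one color plane: for each
    block take |(I_center - I_mean) / (I_center + I_mean)| and average
    over all blocks. A simplified stand-in for CRME."""
    h, w = channel.shape
    scores = []
    for i in range(0, h - block + 1, block):
        for j in range(0, w - block + 1, block):
            b = channel[i:i + block, j:j + block].astype(float)
            center = b[block // 2, block // 2]
            mean = b.mean()
            scores.append(abs((center - mean) / (center + mean + eps)))
    return float(np.mean(scores))

def crme_like(rgb, weights=(0.299, 0.587, 0.114)):
    """Weight the per-plane scores by the NTSC luminance coefficients."""
    return sum(w * center_surround_contrast(rgb[..., c])
               for c, w in enumerate(weights))
```

A perfectly uniform region scores zero, while blocks whose centers deviate strongly from their surroundings raise the score, which is the behavior the measure is meant to capture.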
6.2. Contrast
Contrast measures the difference in brightness between the brightest white and the darkest black in an image: the larger the difference between the brightest and the darkest levels, the greater the contrast.
To calculate the contrast, the image is segmented into $k_1 \times k_2$ blocks, and $I_{\max}$ and $I_{\min}$ are the maximum and minimum intensity values in each block.
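The block-wise max/min description above suggests a Michelson-style contrast; a minimal sketch, assuming that form (the paper's exact block aggregation may differ):

```python
import numpy as np

def block_contrast(gray, block=8, eps=1e-6):
    """Block-wise Michelson contrast (Imax - Imin) / (Imax + Imin),
    averaged over all blocks of a grayscale region."""
    h, w = gray.shape
    scores = []
    for i in range(0, h - block + 1, block):
        for j in range(0, w - block + 1, block):
            b = gray[i:i + block, j:j + block].astype(float)
            scores.append((b.max() - b.min()) / (b.max() + b.min() + eps))
    return float(np.mean(scores))
```

A constant block yields 0, and a block spanning the full dynamic range yields a value close to 1.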
6.3. Sharpness
Sharpness is responsible for the presentation of details, edges, and textures in an image. For example, at high sharpness, not only are the wrinkles and spots of a face in a picture clearer, but the bulges and depressions of the facial muscles can also be discerned.
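The paper's exact sharpness formula is block-based like the other measures; as a hedged stand-in, a common gradient-energy (Tenengrad-style) proxy can illustrate the idea that sharp regions carry strong local gradients:

```python
import numpy as np

def tenengrad_sharpness(gray):
    """Gradient-energy sharpness proxy: mean squared gradient magnitude
    from simple finite differences. A common stand-in, not the paper's
    exact measure."""
    g = gray.astype(float)
    gx = np.diff(g, axis=1)[:-1, :]   # horizontal finite difference
    gy = np.diff(g, axis=0)[:, :-1]   # vertical finite difference
    return float(np.mean(gx ** 2 + gy ** 2))
```

Blurring an image attenuates its gradients, so this score decreases monotonically with blur strength, which is the property any sharpness measure needs.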
6.4. Colorfulness
Color is one of the most expressive elements of an image and directly affects our perception and aesthetic response. The colorfulness measure is computed from the opponent components $rg = R - G$ and $yb = (R + G)/2 - B$ as

$$M = \sqrt{\sigma_{rg}^2 + \sigma_{yb}^2} + 0.3\sqrt{\mu_{rg}^2 + \mu_{yb}^2},$$

where $\mu_{rg}$, $\mu_{yb}$, $\sigma_{rg}$, and $\sigma_{yb}$ are the means and standard deviations of $rg$ and $yb$ over the $P$ pixels of a superpixel region.
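CQE uses the standard Hasler-Süsstrunk colorfulness statistic, which can be computed as follows; this is a sketch of the standard measure, not necessarily the authors' exact implementation:

```python
import numpy as np

def colorfulness(rgb):
    """Hasler-Suesstrunk colorfulness: opponent components rg = R - G and
    yb = (R + G) / 2 - B, combined as
    M = sqrt(sigma_rg^2 + sigma_yb^2) + 0.3 * sqrt(mu_rg^2 + mu_yb^2)."""
    r, g, b = (rgb[..., c].astype(float) for c in range(3))
    rg = r - g
    yb = 0.5 * (r + g) - b
    sigma = np.hypot(rg.std(), yb.std())  # spread of the opponent components
    mu = np.hypot(rg.mean(), yb.mean())   # strength of the mean chroma
    return float(sigma + 0.3 * mu)
```

Any achromatic region (R = G = B everywhere) scores exactly zero, and saturated regions score high.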
6.5. Signal-to-Noise Ratio (SNR)
SNR is a physical quantity that represents the degree to which a local image region is affected by noise and distortion. SNR is defined as

$$SNR = \frac{\mu_g}{\sigma_g}, \qquad g = \sqrt{I_x^2 + I_y^2},$$

where $\mu_g$ indicates the mean of the gradient magnitude $g$, $\sigma_g$ is the corresponding standard deviation, and $I_x$ and $I_y$ are the gradients of the brightness function along the $x$ and $y$ axes, respectively. In general, a higher SNR indicates better image quality.
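A minimal sketch of this gradient-based SNR, assuming the ratio of the mean to the standard deviation of the gradient magnitude described in the text:

```python
import numpy as np

def gradient_snr(gray, eps=1e-6):
    """SNR of the gradient magnitude g = sqrt(Ix^2 + Iy^2):
    mean(g) / std(g). eps guards against division by zero on
    perfectly regular regions."""
    I = gray.astype(float)
    ix = np.gradient(I, axis=1)   # brightness gradient along x
    iy = np.gradient(I, axis=0)   # brightness gradient along y
    g = np.hypot(ix, iy)
    return float(g.mean() / (g.std() + eps))
```

A smooth intensity ramp has a nearly constant gradient magnitude and therefore a very large SNR, while noisy regions spread the gradient distribution and pull the ratio down.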
7. The Calculation of Smoothness Parameter
7.1. The Neural Network Model
The input layer of the model consists of five nodes representing the five image quality parameters, and the output layer has a single node (the corresponding smoothness parameter). After numerous attempts, we settled on five hidden layers with 10, 20, 20, 10, and 5 nodes, respectively. The sigmoid function was chosen as the activation function, and softmax was chosen as the loss function.
We initialize the weight parameters from a Gaussian distribution with mean 0 and variance 0.01, and the weights are then adjusted by gradient descent with backpropagation. The self-built dataset is used as the training set; batch normalization is applied during training, and random deactivation (dropout) is used to avoid overfitting. The momentum is set to 0.99, the weight penalty term to 0.01, and the learning rate to 0.02.
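The described topology can be sketched in NumPy as below. The linear output activation is an assumption (the paper does not state the output activation), and the training loop, batch normalization, and dropout are omitted:

```python
import numpy as np

def init_mlp(sizes=(5, 10, 20, 20, 10, 5, 1), seed=0):
    """Weights drawn from N(0, 0.01) as described (std = 0.1); sizes follow
    the paper's 5-input, five-hidden-layer (10-20-20-10-5), single-output
    topology. Returns a list of (W, b) pairs."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(0.0, 0.1, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(params, x):
    """Sigmoid activations on the hidden layers; linear output node for the
    smoothness parameter regression (assumed)."""
    a = np.asarray(x, float)
    for W, b in params[:-1]:
        a = sigmoid(a @ W + b)
    W, b = params[-1]
    return (a @ W + b).item()
```

Given a five-element vector of image quality parameters, `forward` returns a single scalar, the predicted smoothness parameter for that superpixel region.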
7.2. Self-Constructed Dataset
Twelve classes of data are used to construct the training set: the Middlebury database, the MPI_Sintel_final database, the MPI_Sintel_clean database, the KITTI database, and image sequences from eight outdoor videos captured under different conditions. It is well known that a larger number of training samples yields higher model accuracy. In our model, the accuracy increases smoothly with the number of samples until it reaches 98.5%, but it increases only slightly once the number of samples exceeds 300; we therefore use 300 samples per class. For each sample, i.e., image sequence, the image quality parameters are calculated and the corresponding optimal smoothness parameter is determined; these pairs are then used to train the neural network model. Table 1 shows some of the training data.
8. Experimental Results and Analysis
Three public databases, Middlebury, MPI_Sintel, and KITTI, and several outdoor videos are used in these experiments. Our proposed algorithm is compared with DFOVOFM, TV_L1, HAST, MDP_Flow, PH_Flow, and the WRMS-based smoothness parameter selection method. All simulations are performed in MATLAB 10.0 on Windows 7 with a 3.3 GHz Intel CPU and 16 GB RAM.
The intention of these experiments is to demonstrate the superiority of our OSP method. In Figure 4 and Table 2, we apply DFOVOFM with different fixed smoothness parameters and with our OSP method to show the relationship between image content and the optimal smoothness parameter; our OSP method has the highest accuracy. In Figure 5 and Table 3, we compare DFOVOFM + OSP with other classical optical flow models and show that our model excels in scenes with uneven, insufficient, or changing illumination.
Figure 4 shows the optical flow estimation results of DFOVOFM with different smoothness parameters. From top to bottom, the sequences are Army and Grove from Middlebury, Market_4 and Ambush_3 from MPI_Sintel, and two outdoor videos V1 and V2. The first three columns from left to right correspond to fixed smoothness parameters $\lambda$ = 3, 10, and 20, whereas the fourth, fifth, and sixth columns show the WRMS-based smoothness parameter selection method, our proposed method, and the ground truth. It can be seen from Figure 4 that the performance on each database changes with the value of $\lambda$: the Middlebury sequences perform better with the smallest value, the MPI_Sintel final sequences perform better at the intermediate value, and the outdoor videos perform better when $\lambda$ = 20, showing that images of different quality have different optimal smoothness parameters. Clearly, our proposed method achieves better results than the existing methods on all databases and shows clearer contours, such as the gun in the soldier's hand in Army, the ends of the branches in Grove, the monster's tail in Market_4, and the giant's head contour and waving stick in Ambush_3. For the outdoor videos V1 and V2, the result of our method has the most complete outline and the fewest error regions caused by illumination changes. Table 2 shows the error rates (AEE/AAE, Average Endpoint Error/Average Angular Error) of DFOVOFM with different smoothness parameters on the Middlebury database. The bold red entries indicate the minimal error rate among the fixed values of $\lambda$, while the bold black entries show the error rate of our method. Table 2 confirms that the value of $\lambda$ affects the accuracy of the optical flow estimation and that the error rate of our method is the lowest.
Figure 5 shows the optical flow estimation results of different algorithms on the MPI_Sintel database. The rows from top to bottom are the sequences Market_1, Cave_3, Shaman_1, Ambush_1, and Market_3, and the columns from left to right are HAST, MDP_Flow, PH_Flow, our DFOVOFM + OSP, and the ground truth. Compared with the other algorithms, the advantage of our method is its robustness in scenes with illumination changes and insufficient illumination: in Shaman_1 our result shows an intact contour and bright colors, whereas the gusset parts are missing in the other algorithms. In Market_1, the illumination change on the right side of the running girl causes large estimation errors for HAST, MDP_Flow, and PH_Flow, and our method is the most accurate. However, our algorithm has a drawback: it does not capture the movement of tiny details, and some texture details are missing, as can be seen at the tip of the cone in Cave_3 and the man's beard in Ambush_1.
Table 3 shows the average AEE/AAE of the different algorithms on the MPI_Sintel database, where s0–10, s10–40, and s40+ denote sequences whose largest displacement is less than 10 pixels, between 10 and 40 pixels, and more than 40 pixels, respectively. The accuracy of our method is higher than that of HAST but slightly lower than MDP_Flow and PH_Flow, because some MPI_Sintel sequences involve large displacements and occlusions, which our method does not specifically address.
Figure 6 shows the optical flow estimation results of different algorithms on the KITTI database. The rows from top to bottom are the sequences 00004, 00009, 00011, 00014, and 00015, which include illumination-inhomogeneous scenes, and the columns from left to right are TV_L1 + WRMS, DFOVOFM + WRMS, TV_L1 + OSP, and DFOVOFM + OSP. The results of both TV_L1 + OSP and DFOVOFM + OSP are better than those of TV_L1 + WRMS and DFOVOFM + WRMS in all five sequences. Table 4 shows the AEE of the four methods for these sequences; AVE denotes the average AEE over 50 image sequences. This confirms that our OSP method achieves higher accuracy than the WRMS method.
Figures 7 and 8 show the experimental results for the outdoor videos V3 (illumination changes and inhomogeneous regions) and V4 (illumination changes and insufficient illumination). The first two columns of these figures are the input image sequences, while the remaining columns from left to right show the results of HAST, MDP_Flow, PH_Flow, TV_L1 + WRMS, DFOVOFM + WRMS, TV_L1 + OSP, DFOVOFM + OSP, and the manual segmentation result. The HAST results in Figures 7(c) and 8(c) have relatively few error regions, but the object outline is not intact and several parts of the human body are missing. MDP_Flow yields a more intact human body in Figures 7(d) and 8(d), but its results contain more error regions. The DFOVOFM model achieves better results than the other models, and DFOVOFM + OSP produces even better results, with the most intact object outline and the fewest error regions. Comparing the WRMS methods in Figures 7(f), 7(g), 8(f), and 8(g) with our OSP methods in Figures 7(h), 7(i), 8(h), and 8(i) shows that our OSP method achieves better optical flow estimation.
9. Conclusion
An adaptive smoothness parameter method for the variational optical flow model is proposed in this work to solve the mismatch between the smoothness parameter and the image content. The main idea is to assign different smoothness parameters to subregions with different image quality parameters. The subregions are obtained by combining SLIC with LMF, while the smoothness parameters are calculated from the image quality parameters (CRME, contrast, sharpness, colorfulness, and signal-to-noise ratio) by a self-designed neural network model. Simulation results validate that, in comparison to the WRMS method, our method achieves better results, especially in scenes containing regions of differing image quality. Furthermore, combined with DFOVOFM and compared with other optical flow estimation methods, the proposed method shows better performance in scenes with abnormal illumination such as illumination changes, insufficient illumination, and inhomogeneous illumination.
Data Availability
The images used in this study are publicly available.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this study.
Acknowledgments
This research was supported by the National Natural Science Foundation of China (Grant nos. 71771080, 71790593, and 71988101).
References
S. Muthu, R. Tennakoon, T. Rathnayake, R. Hoseinnezhad, D. Suter, and A. Bab-Hadiashar, "Motion segmentation of RGB-D sequences: combining semantic and motion information using statistical inference," IEEE Transactions on Image Processing, vol. 29, no. 12, pp. 5557–5570, 2020.
Z. Chen, H. Jin, and Z. Lin, "Large displacement optical flow from nearest neighbour fields," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2443–2450, IEEE, Portland, OR, USA, 2013.
H. Li, J. Xu, and S. Hou, "Optical flow enhancement and effect research in action recognition," in Proceedings of the 2021 IEEE 13th International Conference on Computer Research and Development (ICCRD), pp. 27–31, Beijing, China, January 2021.
X. Xing, Y. Yongjie, and X. Huang, "Real-time object tracking based on optical flow," in Proceedings of the 2021 International Conference on Computer, Control and Robotics (ICCCR), pp. 315–318, Shanghai, China, January 2021.
V. Solo, "A sure-fired way to choose smoothing parameters in ill-conditioned inverse problems," in Proceedings of the 3rd IEEE International Conference on Image Processing, pp. 89–92, IEEE, Lausanne, Switzerland, September 1996.
L. Ng and V. Solo, "A data-driven method for choosing smoothing parameters in optical flow problems," in Proceedings of the IEEE International Conference on Image Processing, pp. 360–363, IEEE, Santa Barbara, CA, USA, 1997.
K. Krajsek and R. Mester, "Bayesian model selection for optical flow estimation," in Proceedings of the 29th DAGM Conference on Pattern Recognition, pp. 142–151, Springer, Berlin, Heidelberg, September 2007.
K. J. Lee, D. Kwon, and I. D. Yun, "Optical flow estimation with adaptive convolution kernel prior on discrete framework," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2504–2512, IEEE, San Francisco, CA, USA, June 2010.
P. Kumar, "A duality based approach for fractional order TV-model in optical flow estimation," in Proceedings of the 5th International Conference on Computing, Communication and Security (ICCCS), pp. 1–5, Patna, India, October 2020.
A. Geiger, P. Lenz, and R. Urtasun, "Are we ready for autonomous driving? The KITTI vision benchmark suite," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3354–3361, IEEE, Providence, RI, USA, 2012.
J. Yang and H. Li, "Dense, accurate optical flow estimation with piecewise parametric model," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1019–1027, Boston, MA, USA, 2015.