Abstract

Blur detection (BD) is an important and challenging task in digital imaging and computer vision applications. Accurate segmentation of homogeneous smooth and blurred regions, low-contrast focal regions, missing patches, and background clutter, without any prior information about the blur, are the fundamental challenges of BD. Previous work on BD has devoted much effort to designing local sharpness metric maps from the images. However, smooth/blurred regions that share the same patterns as sharp regions make such maps problematic. This paper presents a robust novel method to extract the local metric map for blurred and nonblurred regions based on multisequential deviated patterns (MSDPs). Unlike preceding approaches, MSDP extracts the local sharpness metric map from the images at multiple scales using different adaptive thresholds to overcome the problems of smooth/blurred regions and missing patches. By using the integral values of the image along with image masking and Otsu thresholding, highly accurate segmented regions of the images are acquired. We hypothesize that local sharpness map extraction using the direct integral information of the image is highly sensitive to the threshold selected for distinguishing between the regions, whereas MSDP feature extraction substantially overcomes this limitation by computing the threshold automatically over multiple scales of the images. Moreover, the proposed method extracts relatively accurate sharp regions from highly dense blurred and noisy images. Experiments are conducted on the two commonly used Shi and DUT datasets for blurred and sharp region classification. The results indicate the effectiveness of the proposed method in terms of sharp segmented regions. Qualitative and quantitative comparisons of the proposed method with ten comparative methods demonstrate its superiority. Moreover, the proposed method is also computationally more efficient than state-of-the-art methods.

1. Introduction

With the exponential growth of digital image capturing devices, e.g., DSLR cameras, cellphone cameras, and wearable cameras, we have witnessed a massive collection of digital photos captured and uploaded to social media on a daily basis. A good-quality photo must be sharp and must not contain any degradation such as noise or blurred regions. In many applications, we need to highlight the target object in an image. To make the object stand out, several techniques are used nowadays, e.g., using high-definition camera sensors or adding blurriness to the background objects. High-definition image sensors impose blurriness on the background of the images to make the foreground objects more prominent. Consequently, blurriness is used as an editing effect that is added purposely in modern-day image capturing.

Image blurriness can be categorized into motion blur and defocus blur. Motion blur can occur for two potential reasons: (a) capturing moving objects and (b) camera motion, either intentional or unintentional. Defocus blur, in contrast, usually arises from special effects used by photographers to separate the in-focus and out-of-focus regions of the image: a visual effect added using highly sophisticated techniques to make the target object sharp and the rest of the image blurred. An image contains useful information that can be used in various computer vision and image processing applications, e.g., background tracing, text retrieval, image retrieval, and person authentication. However, blur degrades the contrast and sharpness details of the image, which makes the retrieval of information challenging. Similarly, photos are used as key evidence in criminal investigations, where it can be very challenging to extract the embedded information about the target object(s) in the presence of high-density blurred regions. For this purpose, we first need to classify the image into blurred and nonblurred regions. The information on objects lying in nonblurred regions is more reliable than that in blurred regions, as the information can be distorted in blurred regions, e.g., by an increase in edge thickness. Thus, blur detection and sharpening are required for accurate extraction of information from images.

Defocus blur detection (DBD) is the problem of classifying the blur added intentionally by highly sophisticated modern-day digital cameras. This classification has received substantial attention due to its significant potential applications, e.g., object detection [1], image segmentation [2], and object augmentation [3]. Defocus blur affects the information and the sharpness details of the image, making object/region detection more challenging. DBD without any prior information about the blur densities, blur type, or sensor settings of the camera is a challenging task.

Existing blur detection approaches are divided into two categories: single-image detection and multi-image detection. For multi-image detection, knowledge of the blur densities, blur type, sensor information of the cameras, and other additional information is required [4]. In contrast, a single image can be split into sharp and blurred regions without any prior information about the blur or the device used to capture the image [5, 6]. Moreover, the existing approaches for DBD can be categorized as frequency-based [7–13], depth-based [14–17], or local sharpness metric map-based methods for segmentation of blurred and nonblurred regions [6, 18–21]. In [22], Zhu et al. used a local coherence map generated by evaluating the gradient fields of the local spectrum. However, the information from flat areas and color edges is not enough for accurate DBD. In [23], Chakrabarti et al. proposed a Point Spread Function (PSF) approach using local frequency analysis to obtain the segmented map for DBD. This method has the limitation of generating erroneously labeled regions of the image. Su et al. [24] presented a singular value decomposition (SVD) method based on single thresholding of image features to detect the blurred and nonblurred regions. Similarly, Xiao et al. extended the single-threshold SVD into a multiscale SVD in [8]; the fusion-based scheme is used to overcome the smooth/blurred region problems in the images. In [25], Golestaneh estimated the level of blurriness at each location using a method called High-frequency multiscale Fusion and Sort Transform (HiFST) based on gradient magnitudes.

Depth-based methods [9, 14–17] have also proved effective in defocus blur detection by using information about the blur densities and blurry edges. In [9], Liu et al. presented different local features, i.e., association congruence, saturation, gradient histogram, and power bands, to specify the type of blur in the images. In [26], a cross-ensemble network is used along with a smaller defocus detector for diversity enhancement. However, this approach is computationally expensive and unable to differentiate the nonblurred regions accurately in the presence of smooth regions. Furthermore, DBD measures such as local variance, higher-order statistics, and the variance of wavelet coefficients have also been used for DBD on images with a narrow depth of field (DOF) [27].

Most of the algorithms [6, 18, 19, 21] use the local sharpness metric approach for the detection of blurred and nonblurred regions. The local sharpness metric acts like a filtering method, such as an energy function, estimating the results based on the blur energy responses of images: low energy indicates a blurred region, whereas high energy represents a sharp region. In [12], Shi et al. introduced peculiar sharpness features, a gradient histogram, and kurtosis span for DBD of local image regions. This method is unreliable and struggles to detect blurred and sharp regions accurately in the presence of smooth homogeneous regions. Zhu et al. analyzed the blur in images using the PSF [22]; this method is unable to perform well in lightly blurred regions. In [21], Local Binary Patterns (LBP) are considered for defocus blur detection. The local metric map generated using LBP segments the blurred and nonblurred regions but performs poorly on noisy images, even in sharp regions. In our prior method [18], we proposed Local Directional Mean Patterns (LDMP) to overcome the limitation of the sharpness metric map in noisy situations. However, our prior method [18] is unable to detect simultaneously smooth blurred and low-contrast regions.

Recently, deep learning methods have been heavily employed in various computer vision and image processing applications, e.g., saliency detection [1], semantic segmentation [2], automatic shadow detection [28], airplane detection in remote sensing images [29], ship detection in real-time images [30], and vehicle detection [31]. Deep learning algorithms have proven effective for defocus blur detection and segmentation, however, at the expense of increased computational cost. In [32], Kim introduced a deep learning method based on a convolutional neural network (CNN) for the detection of sharp and blurred regions of the image; a multiscale reconstruction loss function was used for the segmentation of blurred regions. In [33], Park et al. addressed the DBD problem with patch-level detection based on a CNN. Unfortunately, patch-level DBD leads to suppression in low-contrast regions. Tang et al. [34] proposed a Deep Neural Network- (DNN-) based technique, Diffusion Network (DNet), that fuses the refined features extracted by the networks to obtain the segmented blurred and sharp regions. In [35], the authors introduced a global context-guided hierarchically residual feature refinement network, "HRFRNet," in which hierarchical features are used to enhance the final outcomes and a deep-guided fusion module is used for refinement.

Accurate DBD has attracted extensive research interest over the last few years. However, it remains a significant yet challenging computer vision problem. Although the aforementioned techniques can detect defocus-blurred and nonblurred regions, they fail in certain cases, e.g., in the presence of smooth blurry regions, missing sharp patterns, or low contrast. DNN methods perform well for DBD, but all of these methods are computationally more complex and require high computational resources, i.e., GPUs, memory, etc. We aim to develop a robust method for DBD that can effectively extract the sharp target regions of an image in the presence of noise, smooth/blurred regions, and low contrast. To address the aforementioned problems, we propose efficient and robust Multisequential Deviated Patterns (MSDPs) for accurate sharp region extraction from images at multiple scales. The extracted multiscale sharpness maps are further fused to obtain a refined map of the image. We use Otsu thresholding to segment the extracted sharp regions into a comparable binary representation. The proposed method is efficient because it uses the local integral values of the images directly, instead of the time-consuming matting approaches used by preceding methods to segment the binary images. The major contributions of the proposed work are as follows:
(i) We propose efficient and robust multisequential deviated patterns for accurate blur detection in high-density blurred and noisy images.
(ii) For feature computation, we extract the sharpness metric using adaptive thresholding at multiple image scales to overcome the smooth blur and missing sharp region problems of manual thresholding.
(iii) For image segmentation, we fuse the multiscale sharpness maps extracted by MSDP along with image masking.
(iv) Rigorous experiments were performed against several state-of-the-art methods on the latest DUT and Shi datasets to prove the effectiveness of the system.

The rest of the paper is organized as follows. Section 2 presents the details of the proposed method. Section 3 discusses the results of the different experiments conducted to evaluate the performance of our method. Section 4 presents a discussion of our findings. Finally, Section 5 concludes our work.

2. Methods

This paper presents a novel method, based on the integral values of the image, that detects the blurred and sharp regions in highly dense blurry and noisy images. The proposed method uses local windows of different sizes, chosen according to the input image scale, to extract the sharp regions in the image. First, the image is represented at three different scales, $S_1$, $S_2$, and $S_3$. Second, MSDP is used to extract a sharpness map from each scaled image using an adaptive threshold. Then, image masking is applied to the extracted sharpness maps, and fusion is performed to produce a single, more accurate sharpness map. Finally, Otsu thresholding is used to obtain accurate binary images for comparison with state-of-the-art methods. The flow of the proposed method is shown in Figure 1.

2.1. Preprocessing and Image Scaling

The extraction of features using the integral values of the image is challenging due to the influence of different factors, e.g., color, camera, and object detail, on those values. For accurate extraction of sharp regions, we take an RGB input image consisting of sharp and blurred regions. First, we convert the RGB color image into grayscale and apply two-dimensional median filtering to reduce the noise:

$$I_g(x, y) = \mathcal{G}\big(P_s(x, y), P_b(x, y)\big), \qquad I_m(x, y) = f_{med}\big(I_g(x, y)\big),$$

where $P_s$ and $P_b$ denote the sharp and blurred pixel values of the image, and $I_g$ represents the grayscale image. $I_m$ represents the 2D median-filtered image with reduced noise obtained after applying the 2D median filtering function $f_{med}(\cdot)$. Next, we represent the image at three different scales ($S_1$, $S_2$, and $S_3$) after employing $f_{med}(\cdot)$.
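As an illustration, the following minimal sketch shows one way to realize this preprocessing stage in Python with OpenCV and NumPy; the function name `preprocess` and the 3 × 3 median kernel are our assumptions, while the scale sizes (256, 128, and 64) follow the settings reported in Section 3.4.3.

```python
# Minimal preprocessing sketch (Section 2.1), assuming OpenCV and NumPy.
# The 3 x 3 median kernel is an assumption; scale sizes follow Section 3.4.3.
import cv2
import numpy as np

def preprocess(rgb_image: np.ndarray) -> list:
    """Grayscale conversion, 2D median filtering, and scaling to S1-S3."""
    gray = cv2.cvtColor(rgb_image, cv2.COLOR_BGR2GRAY)  # I_g (OpenCV uses BGR)
    denoised = cv2.medianBlur(gray, 3)                  # I_m = f_med(I_g)
    return [cv2.resize(denoised, (s, s)) for s in (256, 128, 64)]  # S1, S2, S3
```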

2.2. Multisequential Deviated Patterns (MSDPs)

Selecting an appropriate threshold for integral extraction is a complicated task. For example, a high threshold in local feature extraction leads to the exclusion of less sharp regions, whereas a low threshold causes the inclusion of useless additional details in the features, e.g., background objects and noise. The relationship between high and low thresholds in integral feature extraction is shown in Figure 2. For the MSDP maps, we compute the upper and lower patterns of the image. In the integral extraction of features, three-level thresholding has proven more effective [18] than two-level thresholding [21]; moreover, two-level thresholding is not effective in the presence of high-density noise in the images. Therefore, we employ three-level thresholding for the extraction of the sharp integral upper and lower features of the images. We extract the upper and lower features as follows:

$$f(g_n) = \begin{cases} 1, & g_n > T_{up}, \\ 0, & T_{low} \le g_n \le T_{up}, \\ -1, & g_n < T_{low}, \end{cases} \qquad n = 1, \ldots, P, \qquad (1)$$

$$\sigma_w = \sqrt{\frac{1}{P} \sum_{n=1}^{P} (g_n - \mu_w)^2}, \qquad (2)$$

$$T_{up} = g_c + \sigma_w, \qquad T_{low} = g_c - \sigma_w, \qquad (3)$$

where $T_{up}$ and $T_{low}$ represent the 3-level thresholds on the integral values of the pixels, $P$ and $\mu_w$ denote the number of neighboring integral values in the window and their mean, and $g_n$ and $g_c$ are the neighboring and center pixels of the image window, while $\sigma_w$ represents the adaptive threshold, which is calculated as the sequential deviation of the current window, as shown in equation (2). Instead of adopting the threshold selection approach of the existing methods, we compute an adaptive threshold for each region of the image by sliding the extraction window over the image. The adaptive threshold is computed automatically by adding the standard deviation to, and subtracting it from, the center pixel value of the window, as shown in equation (3). The adaptive thresholds based on $T_{up}$ and $T_{low}$ are responsible for the extraction of sharp pixels from the images; i.e., a high threshold value yields only highly sharp regions, and a low threshold value leads to the inclusion of noise and other unwanted content. In contrast, the preceding methods mostly used hard-coded threshold values in their algorithms, which makes them unable to effectively extract low-density sharp regions locally [18, 21]. Consequently, we apply three-level thresholding with an adaptive threshold to extract the three values 1, −1, and 0 from the image. For instance, as shown in Figure 3, if a 3 × 3 window is used over the image integral values, with a central pixel value ($g_c$) of 23 and a deviation of the neighboring pixels ($\sigma_w$) of 10.2, then the extraction range lies between 13.2 and 33.2, as given by equation (3). For three-level value extraction, a neighboring pixel in the window lying between 13.2 and 33.2 is converted into "0," whereas the value 1 is assigned to values greater than the upper threshold (33.2 in Figure 3), and −1 is assigned to values below the lower threshold (13.2 in Figure 3). After replacing the integral values of the image, we obtain a three-level pattern map comprising the values 1, 0, and −1. The overall extraction of the three-level values is shown in Figure 3. For instance, a window $W_Z$ over the integral values of the pixels, where $W_Z$ represents the 3 × 3 window holding 9 values, is given in equation (4):

$$W_Z = \begin{bmatrix} g_1 & g_2 & g_3 \\ g_4 & g_c & g_5 \\ g_6 & g_7 & g_8 \end{bmatrix}, \qquad Z = 3, \qquad (4)$$

where $Z$ denotes the dimension of the sliding window. To compute the sharp regions, the upper and lower features of the image have to be computed separately; however, the proposed three-level coding extracts the combined features of the image. Therefore, to reduce the noise in the images, the three-level extracted patterns are further converted into two levels, i.e., upper and lower image patterns.
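To make the three-level coding concrete, the following sketch computes the adaptive thresholds and the {−1, 0, 1} code for a single Z × Z window; the function name and the example patch values are hypothetical, and the deviation is taken over the neighboring pixels, as in the Figure 3 example.

```python
# Sketch of the three-level MSDP coding (equations (1)-(3)) for one window.
# Function name and example patch values are illustrative assumptions.
import numpy as np

def three_level_code(window: np.ndarray) -> np.ndarray:
    """Map a Z x Z window to {-1, 0, 1} using T = g_c +/- sigma_w."""
    r, c = window.shape[0] // 2, window.shape[1] // 2
    g_c = window[r, c]                           # center pixel
    mask = np.ones(window.shape, dtype=bool)
    mask[r, c] = False
    sigma_w = window[mask].std()                 # deviation of neighbors, eq. (2)
    t_low, t_up = g_c - sigma_w, g_c + sigma_w   # adaptive thresholds, eq. (3)
    code = np.zeros_like(window, dtype=np.int8)
    code[window > t_up] = 1                      # clearly above the center
    code[window < t_low] = -1                    # clearly below the center
    return code

# Hypothetical 3 x 3 patch with g_c = 23, mirroring the Figure 3 walkthrough.
patch = np.array([[35., 20., 14.], [30., 23., 5.], [40., 22., 12.]])
print(three_level_code(patch))
```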
For two-level extraction, we need to replace the negative values in the obtained three-level pattern map. To extract the upper patterns of the image, we convert all −1 values into 0; for the lower patterns, 1 and −1 are replaced with 0 and 1, respectively, as shown in Figure 3. Ultimately, the resultant upper and lower patterns of the image are converted into binary bit streams using equation (5) and then represented by their equivalent decimal values:

$$U_n = \begin{cases} 1, & f(g_n) = 1, \\ 0, & \text{otherwise}, \end{cases} \qquad L_n = \begin{cases} 1, & f(g_n) = -1, \\ 0, & \text{otherwise}, \end{cases} \qquad (5)$$

where $U_n$ denotes the upper features of the image and $L_n$ represents the lower features. The window $W_Z$ is used to obtain the decimal values of the upper and lower patterns using equation (6), where $Z$ denotes the size of the window, which varies with each scale of the image. Finally, we need to pick the sharp and blurred patterns smartly by retaining only the sharp pixels and neglecting the blurred ones. The proposed method uses the deviation of the pixels twice to observe the change in the patterns of blurred and sharp regions. We compute the standard deviation of the two-level patterns, consisting of the upper and lower patterns of the image obtained in the previous step. For this purpose, we first convert the binary two-level patterns into their equivalent decimal numbers using the window $W_Z$; all the neighboring pixel values in the window are converted into their equivalent decimal number as follows:

$$D_U = \sum_{n=1}^{P} U_n \, 2^{\,n-1}, \qquad D_L = \sum_{n=1}^{P} L_n \, 2^{\,n-1}, \qquad (6)$$

where $D_U$ and $D_L$ are the upper and lower patterns as equivalent decimal numbers obtained from the extracted two-level binary patterns of equation (5). After that, we compute the deviation of the neighboring pixels again from the decimal values of the upper and lower patterns given by equation (6). Values higher than the deviation are considered sharp region values and retained, while the rest are neglected as belonging to blurred regions. The extraction of the multisequential deviated patterns is shown in equations (7) and (8):

$$M_U(x, y) = \begin{cases} D_U(x, y), & D_U(x, y) > \sigma_U, \\ 0, & \text{otherwise}, \end{cases} \qquad (7)$$

$$M_L(x, y) = \begin{cases} D_L(x, y), & D_L(x, y) > \sigma_L, \\ 0, & \text{otherwise}, \end{cases} \qquad M_s = M_U + M_L, \qquad (8)$$

where $\sigma_U$ and $\sigma_L$ represent the deviation values of the upper and lower patterns of the image, respectively. Finally, the sharpness map $M_s$ of the sharp regions is extracted as shown in equation (8). Moreover, we extract three sharpness maps, one for each scale of the image ($S_1$, $S_2$, and $S_3$), as shown in Figure 4.
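The following sketch illustrates the remaining pattern computation steps: the upper/lower split of equation (5), the decimal encoding of equation (6), and the deviation-based selection of equations (7) and (8). The clockwise neighbor ordering and the map-level deviation are our assumptions, not the authors' exact conventions.

```python
# Sketch of equations (5)-(8): upper/lower split, decimal encoding, and
# deviation-based selection. Neighbor ordering is an assumed convention.
import numpy as np

def split_upper_lower(code: np.ndarray):
    """Equation (5): two binary patterns from the three-level code."""
    return (code == 1).astype(np.uint8), (code == -1).astype(np.uint8)

def to_decimal(bits: np.ndarray) -> int:
    """Equation (6): read the 8 neighbors of a 3 x 3 pattern as a decimal."""
    order = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    return sum(int(bits[r, c]) << i for i, (r, c) in enumerate(order))

def select_sharp(decimal_map: np.ndarray) -> np.ndarray:
    """Equations (7)-(8): keep values above the map's own deviation."""
    return np.where(decimal_map > decimal_map.std(), decimal_map, 0)
```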

2.3. Image Masking

The extracted features contain only the sharp regions of the images at the three scales $S_1$, $S_2$, and $S_3$. However, there are some regional variations among the images across the scales, caused by the adaptive thresholding and the different dimensions. The main source of this variation is missing regions among the extracted maps: empty integral values in the patterns usually occur due to noise or background objects, which ultimately lead to missing regions in the image. We apply a morphological operation to this image to fill the holes and gaps between the pixels. More specifically, we employ a binary hole-filling operation using equation (9):

$$X_k = (X_{k-1} \oplus B) \cap M_s^{c}, \qquad k = 1, 2, 3, \ldots, \qquad M_f = X_k \cup M_s, \qquad (9)$$

where $B$ is the structuring element, $X_0$ is a seed point inside a hole, and the iteration stops when $X_k = X_{k-1}$, yielding the filled map $M_f$. Similarly, this filling process is applied at every scale ($S_1$, $S_2$, and $S_3$) of the image. In addition, we create an image mask containing the sharp regions of the image before applying the Otsu thresholding for segmentation; for this purpose, we apply Otsu global thresholding to select the sharp regions from the obtained image.
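A minimal realization of this step is sketched below; SciPy's `binary_fill_holes`, which performs morphological hole filling of the kind described by equation (9), is one possible choice.

```python
# Sketch of the binary filling step (equation (9)) at one image scale.
import numpy as np
from scipy.ndimage import binary_fill_holes

def fill_sharpness_map(sharpness_map: np.ndarray) -> np.ndarray:
    """Binarize the map and fill holes/gaps inside the sharp regions."""
    binary = sharpness_map > 0
    return binary_fill_holes(binary).astype(np.uint8)  # M_f
```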

2.4. Multifusion and Otsu Thresholding

In the last phase, we fuse the scales ($S_1$, $S_2$, and $S_3$) into a single sharpness map and employ Otsu thresholding [36] for binarization of the images. The masked image obtained from the previous phase is converted into a segmented binary image of the sharp and blurred regions. Variance-based thresholding is used along with linear discriminant principles to segment the target object (foreground) from the heterogeneous and diverse background regions. The segmentation threshold in Otsu thresholding is based on the variation of the integral values. The extracted patterns in $M_f$ are divided into two classes, i.e., the sharp region and the background blurred region. The global threshold is computed according to the variance of the classes: regions with values higher than the threshold are selected, whereas regions below the threshold are ignored. Finally, a highly segmented binary image, with the sharp object as foreground and a black background, is obtained as follows:

$$\sigma_C^2(t) = \omega_1(t)\,\sigma_1^2(t) + \omega_2(t)\,\sigma_2^2(t), \qquad t^{*} = \arg\min_t \sigma_C^2(t), \qquad (10)$$

where $C_1$ and $C_2$ are the integral-value classes of the sharp and blurred regions separated by the threshold $t$, $\omega_1(t)$ and $\omega_2(t)$ are their probabilities, and $\sigma_1^2(t)$ and $\sigma_2^2(t)$ denote the variances of the classes. Overall, we extract the MSDP from images at multiple scales ($S_1$, $S_2$, and $S_3$) using windows of multiple sizes (i.e., 3 × 3, 5 × 5, and 7 × 7). The reason for extracting the same features at different scales is to overcome the missing region problem in the extracted sharp regions: a sharpness map extracted from a single image contains some missing areas inside the sharp regions, whereas extraction of the sharpness map over different scales overcomes this problem to some extent, as discussed in Section 4. The intensity of the blur varies in each region of the image; i.e., some regions are highly affected by the blur, whereas regions far away from the sharp objects are less affected. Therefore, the extraction of the integral values is highly sensitive to the threshold selected for classifying pixels as sharp or blurred. We employ an adaptive threshold calculated from the deviation of the patches along with the central pixel values of the regions. The combination of the local central pixel values of the regions and the overall deviation between the pixels for thresholding makes our method robust in extracting the sharp regions from blurry images. The overall computation process at multiple scales of the image is shown in Figure 4.
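The fusion and binarization stage can be sketched as follows; averaging the resized per-scale maps is our simplifying assumption for the fusion step, while OpenCV's built-in Otsu mode computes the threshold of equation (10).

```python
# Sketch of the fusion and Otsu binarization stage (Section 2.4).
# Resize-and-average fusion is a simplifying assumption.
import cv2
import numpy as np

def fuse_and_binarize(masked_maps: list) -> np.ndarray:
    """Fuse the S1-S3 sharpness maps and segment them with Otsu's threshold."""
    h, w = masked_maps[0].shape
    resized = [cv2.resize(m.astype(np.float32), (w, h)) for m in masked_maps]
    fused = np.mean(resized, axis=0)
    fused8 = cv2.normalize(fused, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    _, binary = cv2.threshold(fused8, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)  # eq. (10)
    return binary
```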

3. Experimental Results

This section provides a discussion on the results of different experiments performed to measure the performance of the proposed method. We have provided a detailed comparison of qualitative and quantitative results along with the analysis of the computational complexity of our method. The details of the datasets and evaluation metrics are also presented in this section.

3.1. Datasets

The performance of our method is evaluated on two standard and commonly used datasets (Shi and DUT) for blur detection. The Shi dataset [12] was the first public dataset collected and evaluated for blur detection, and it is used by almost all state-of-the-art descriptors to show the effectiveness of their methods. The Shi dataset [12] contains 1000 blurred images, of which 704 are partially blurred by defocus and the rest contain motion blur. Additionally, manually annotated ground truth images are provided along with the blurred images. Most of the images in the Shi dataset have a resolution of 640 × 427.

The DUT dataset [37] is the second commonly used publicly available dataset, consisting of 500 images with defocus blur. This dataset is also provided with manually annotated ground truth images. In comparison with the Shi dataset [12], the DUT dataset [37] is more challenging for blur detection and segmentation for various reasons; i.e., many of the images contain homogeneous smooth blurred regions, have cluttered backgrounds, or are low-contrast images.

3.2. Evaluation Metrics

Three standard and commonly used metrics, namely, precision, recall, and F1-score, are used to evaluate the performance of the proposed method. We selected these metrics because they are adopted by the comparative methods for performance evaluation. We calculate the precision and recall as

$$\text{Precision} = \frac{|R_d \cap R_g|}{|R_d|}, \qquad \text{Recall} = \frac{|R_d \cap R_g|}{|R_g|}, \qquad (11)$$

where $R_d$ represents the pixels within the detected sharp areas, and $R_g$ corresponds to the pixels in the manually annotated ground truth image. Similarly, the F1-score is computed as

$$F_1 = \frac{2 \times P_r \times R_e}{P_r + R_e}, \qquad (12)$$

where $P_r$ and $R_e$ denote the precision and recall of the proposed method.
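For reference, the sketch below computes these pixel-level metrics from a predicted binary mask and its ground truth; the small epsilon guarding against empty masks is our addition.

```python
# Sketch of equations (11)-(12): pixel-level precision, recall, and F1-score.
import numpy as np

def precision_recall_f1(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-8):
    """Compare a predicted binary sharp-region mask with its ground truth."""
    tp = np.logical_and(pred > 0, gt > 0).sum()   # correctly detected pixels
    precision = tp / ((pred > 0).sum() + eps)
    recall = tp / ((gt > 0).sum() + eps)
    f1 = 2 * precision * recall / (precision + recall + eps)
    return precision, recall, f1
```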

3.3. Performance Evaluation of the Proposed Method

The objective of this experiment is to evaluate the effectiveness of the proposed method for blur detection on two diverse datasets. For this purpose, we computed the results of our method on the images of the Shi [12] and DUT [37] datasets separately and report the results in Table 1. On the Shi dataset [12], the proposed method dominates in every comparison, i.e., quantitative analysis, qualitative analysis, and the PR curve. However, on the DUT dataset [37], the precision of the proposed system deteriorates slightly. The DUT dataset [37] is more challenging than the Shi dataset [12] due to its abundant homogeneous smooth and low-contrast cluttered regions; homogeneous regions are always difficult to locate in the local extraction of regions. Although the precision of the proposed system is slightly lower, the other quantitative results (recall, F1-score, and computational cost) prove the effectiveness of the method.

3.4. Performance Comparison of the Proposed and Existing Methods

The objective of this experiment is to measure the robustness of the proposed method for blur detection over state-of-the-art methods. For this purpose, we have provided both the qualitative and quantitative analyses of the proposed and comparative approaches.

3.4.1. Qualitative Comparative Analysis

This experiment is designed to show the qualitative analysis of the proposed and comparative methods on the Shi [12] and DUT [37] datasets. The visual quality of the processed images is presented in Figures 5 and 6 to show the effectiveness of the proposed method over state-of-the-art methods. From the images depicted in Figures 5 and 6, we can observe that the proposed method produces highly accurate results on the Shi [12] and DUT [37] datasets. Specifically, in Figure 6, we can see that the images of the DUT dataset [37] contain more homogeneous smooth regions and locally low-contrast cluttered regions, which are effectively classified by our proposed method.

3.4.2. Quantitative Comparative Analysis

This experiment is designed to evaluate the performance of our method in terms of quantitative analysis. For this purpose, we used three standard metrics, i.e., precision, recall, and F1-score, to measure the performance of our system against the comparative methods. The precision-recall (PR) curves are calculated from the results of the proposed method on the Shi [12] and DUT [37] datasets and presented in Figures 7 and 8, respectively. A separate PR curve comparison is shown for each of the Shi [12] and DUT [37] datasets. The PR curves demonstrate that our method consistently outperforms all the comparative methods. In particular, our method effectively addresses the problems of homogeneous smooth and locally low-contrast cluttered regions in the images of the more challenging DUT dataset [37]: the PR curve of the proposed method on the DUT [37] dataset dominates throughout, as shown in Figure 8.

Additionally, the F1-score is measured and compared for both the Shi [12] and DUT [37] datasets, as shown in Figures 9 and 10. Although DNet [34] achieves results almost comparable to those of the proposed method, our method is computationally much more efficient than DNet [34], as shown in Table 2. This comparative analysis illustrates the superiority of the proposed method for defocus blur detection over the comparative approaches.

3.4.3. Computational Cost Analysis

Although DBD methods must be effective in detecting blurred regions in images, producing such accurate results in minimal time is also crucial, especially in real-time applications. This experiment is designed to evaluate the efficiency of the proposed and comparative approaches for defocus blur detection. The results of this comparative analysis of the time complexity of both detection and segmentation are provided in Table 2. The proposed method not only dominates the state-of-the-art methods in terms of accurate blur detection but also executes exceptionally fast and is computationally far more efficient than the comparative approaches. From the results, we can clearly observe that the proposed method has the second-lowest computational cost, after LBP [21], among all comparative methods. More precisely, [21] performs best, taking 5 seconds to segment an image into blurred and sharp regions, whereas our method performs second best, taking 7 seconds. On the contrary, [22] performs worst, with the highest computational cost of 12 minutes. The main reason for the low time complexity of our method is the direct extraction of the local sharpness metric and the removal of the time-consuming matting procedures used by several competing methods. Moreover, the image masking and Otsu thresholding used to produce the segmented maps are also very fast to execute. In our method, the multiscale inference phase consumes the majority of the execution time.

For this comparative analysis, we used the publicly available codes of the comparative methods along with our own implementation. In the proposed method, the original image was scaled to 256 × 256, 128 × 128, and 64 × 64 dimensions for $S_1$, $S_2$, and $S_3$, respectively. Additionally, the window size $Z$ is selected as 3 × 3, 5 × 5, and 7 × 7 for the image scales $S_1$, $S_2$, and $S_3$, respectively. The final values of the window size $Z$ for each specific scale $S_n$ were selected after detailed observations and experiments. Moreover, automatic global thresholding is used in the image masking, and Otsu thresholding is used for binarization of the images. We implemented the proposed and comparative methods on an Intel(R) Core(TM) m3-7Y30 CPU @ 1.00 GHz (1.61 GHz boost) with 8 GB of memory.

4. Discussion

The present study analyzes the findings about the selection of an adaptive threshold and the impact of neighboring pixels on the local extraction of image regions. The experimental results demonstrate two facts. First, a global or hard-coded fixed threshold value is not reliable for all types of images. Second, the neighboring pixels used to differentiate the integral values have a major impact on the extracted patterns. This is an important finding for understanding local patterns and direct integral extraction from images. Some sample results from LBP [21] and LDMP [18] are shown in Figure 11 to support this finding. Figure 11 clearly demonstrates that the comparative methods LBP [21] and LDMP [18] are unable to perform well on many images. These two methods [18, 21] use a similar approach of extraction from the local integral values of the images but rely on either a fixed static threshold or one from a specified range. This hard-coded threshold scheme results in performance degradation, as shown in Figure 11, where [21] is unable to detect the regions in the image, whereas the proposed method, using an adaptive threshold computed from the deviation between the neighboring pixels, performs very well. Our experiments further reveal that the results of [18, 21] exhibit gaps between the extracted sharp regions, whereas the proposed method produces approximately filled regions. Extracting pixels using a small number of neighboring pixels can bias the overall selection of the region. We used varying neighborhood sizes (3 × 3, 5 × 5, and 7 × 7) to extract the regions based on the deviation between the neighbors, which provides significantly better results because the deviation is estimated over a larger region. Our results indicate that pixel selection based on a small number of adjacent pixels degrades the results, whereas using a larger number of adjacent pixels to drive the pixel selection leads to good results. Existing defocus blur detection methods based on using the integral values directly fail to handle motion blur well: the high variation in the integral values prevents these methods from performing better on motion blur. In the future, we aim to develop a unified method that can effectively detect both motion and defocus blur.

5. Conclusion

We have proposed an effective and efficient method for the defocus blur detection problem that requires no prior information about the blur, camera configuration, pixel densities, etc. The local sharpness metric map is extracted directly from the images at different scales along with different patch sizes. The local deviations between the neighboring pixels are used for the extraction of sharp regions. The automatic image masking and Otsu thresholding provide highly accurate and optimal segmentation results compared with state-of-the-art methods. Our experimental results demonstrate that the proposed method performs far better than many hard-coded threshold-based algorithms. Additionally, the proposed method has a significant speed advantage over several comparative segmentation algorithms, e.g., alpha matting, KNN matting, and GPU-implemented global matting.

Data Availability

The datasets used and analyzed in this paper are publicly available.

Conflicts of Interest

The authors declare that there are no conflicts of interest.

Acknowledgments

This work was supported by Education and Research Promotion Program of KOREATECH (2021).