Abstract

Image binarization and segmentation are among the most important operations in digital image processing and related fields. In spite of the enormous number of research studies in this field over the years, significant challenges still hamper the usability of many existing algorithms. These challenges include high computational cost, insufficient performance, lack of generalization and flexibility, inability to capture various image degradations, and many more. They complicate the choice of algorithm, and sometimes it is practically impossible to implement these algorithms in low-capacity hardware applications where computational power and memory utilization are of great concern. In this study, a simple yet effective and noniterative global and bilevel thresholding technique is proposed. It uses the concept of the image gradient vector to binarize the image or segment it into three clusters. In addition, a parametric preprocessing approach is proposed that can be used in image restoration applications. Evidence from experiments, both visual and based on standard evaluation metrics, shows that the proposed methods perform exceptionally well. The proposed global thresholding outperforms the formidable Otsu thresholding technique.

1. Introduction

Image binarization and segmentation are among the most common, relevant, and frequently used preprocessing operations in digital image processing and related fields. In a more complex task, segmentation can significantly reduce processing time, as it reduces the image to smaller regions of interest where subsequent analysis can be conducted more effectively on a localized region. Binarization involves determining a single threshold value based on pixel intensity to group the image into two clusters, often referred to as background and foreground [1]. While binarization can be seen as the simplest form of segmentation, in a wider context segmentation may involve creating multiple clusters of objects with similar attributes, and often requires finding more than one threshold (multilevel thresholding) [2]. The central aim in both approaches is the extraction and localization of similar information in the image. Though image binarization and segmentation are not new, it is very difficult to find a generalized algorithm for this task due to numerous challenges such as image degradation types, artifacts, uneven illumination, and noise inherent in the acquisition process. Each application of image binarization, such as optical character recognition (OCR) [3, 4], document binarization, image restoration, and many machine vision applications, may present a different set of challenges [5]. For instance, in a task where segmentation is utilized as a preprocessing stage, a segmentation approach with low computational cost may be desirable, and in some machine vision applications where the hardware has low processing and memory capacity, some available methods may not be applicable [2].

Several methods for image binarization are well known in the literature; one of the most formidable was proposed by Otsu [6] and uses the concept of the image histogram. The Otsu method and related ones are histogram dependent, while other methods utilizing the concept of entropy and computational intelligence techniques have been proposed in [7-14]. Despite their relative performance, Otsu and its related methods perform poorly when the image histogram cannot be segregated into two clusters [15].

One of the most challenging problems related to the influence of image thresholding on further analysis is document image binarization; newly developed algorithms are therefore typically validated using prepared document images containing various distortions. For this reason, the well-known document image binarization contest (DIBCO) datasets are typically used to verify the usefulness and validate the efficacy and performance of binarization methods. These databases are prepared for yearly document image binarization competitions; an example is the handwritten H-DIBCO dataset [16], containing only handwritten document images without machine-printed samples. All DIBCO datasets contain not only the distorted document images but also “ground-truth” binary images, so binarization results can be compared with them at the pixel level by analyzing the numbers of correctly and incorrectly classified pixels [17, 18].

2. Review of Image Thresholding Techniques

Over the years, numerous methods have been proposed and implemented for automatic image binarization and segmentation through identification of suitable threshold intensity values. Some approaches use a single threshold and others use multiple thresholds, determined from either the global intensity distribution or localized intensity distributions over smaller regions within the image. While some approaches consider salient attributes when computing thresholds, such as the histogram distribution, gradient information, and the information gained by separating pixels into clusters, others use computational intelligence-based optimization approaches inspired by nature [17].

One of the most popular global thresholding algorithms for image binarization was proposed by [6] in 1979. Otsu proposed an exhaustive method in which the intensity levels are divided into two clusters (background and foreground) for every possible threshold value in the image. For each candidate threshold, a measure of spread of the pixel intensities in each cluster is computed. The aim is to find the threshold value at which the sum of the foreground and background spreads is minimal. Kittler et al. [19] used a mixture of Gaussian distributions. Unlike the Otsu method, they modelled both the background and foreground clusters with Gaussian distributions and determined the automatic threshold from the mixture of these two models. Bernsen [20] and Sauvola and Pietikäinen [21] proposed adaptive local thresholding techniques in which an N×N window block slides over the entire image. At each window position, a threshold is determined from the local pixels within the window until the entire image is thresholded. These methods may not generate accurate results when the image is affected by degradations such as shading, blurring, low resolution, and uneven illumination [22]. Bouaziz et al. [12] proposed a multilevel image thresholding (MECOAT) using the cuckoo optimization algorithm (COA), a nature-inspired optimization algorithm modeled on the behavior of the cuckoo bird, to determine the thresholds that minimize the entropy when segmenting pixel intensity levels into clusters.
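Otsu's criterion described above can be sketched in a few lines of NumPy. This is a generic reference implementation of the textbook method, not the code evaluated later in this paper:

```python
import numpy as np

def otsu_threshold(img):
    """Classic Otsu: pick the threshold minimizing the weighted
    sum of within-class variances over all 256 candidate levels."""
    hist, _ = np.histogram(img, bins=256, range=(0, 256))
    p = hist / hist.sum()                      # normalized histogram
    levels = np.arange(256)
    best_t, best_var = 0, np.inf
    for t in range(1, 256):
        w0, w1 = p[:t].sum(), p[t:].sum()      # class weights
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (levels[:t] * p[:t]).sum() / w0  # class means
        mu1 = (levels[t:] * p[t:]).sum() / w1
        var0 = ((levels[:t] - mu0) ** 2 * p[:t]).sum() / w0
        var1 = ((levels[t:] - mu1) ** 2 * p[t:]).sum() / w1
        within = w0 * var0 + w1 * var1         # within-class spread
        if within < best_var:
            best_t, best_var = t, within
    return best_t
```

The exhaustive loop over all 256 candidate thresholds is what gives Otsu its comparatively high computational cost, a point revisited in the discussion section.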

Each approach has its benefits and downsides, such as lack of generalization, computational complexity that makes implementation in low-capacity hardware difficult, and inability to capture different degradation patterns in the image, which sometimes renders the approach completely ineffective in cases such as document binarization in H-DIBCO. For H-DIBCO, additional preprocessing steps were proposed by [15, 18], but the authors have to deal with various degradation types in the document before binarization. These steps are for the most part necessary to produce a meaningful binarization. Our approach attempts to address some of the gaps described:
(i)A noniterative approach with low computational complexity
(ii)A multipurpose algorithm that can be used for both global and bilevel thresholding without extra computational cost
(iii)A parametric preprocessing approach for document binarization, whose parameter can be varied to capture different degradations for improved binarization accuracy

3. Proposed Method

3.1. Bilevel Thresholding

Bilevel thresholding involves the estimation of two thresholds which split the input image into three clusters with similar attributes. The assumption here is that the bulk of the objects in an image possess intensities that are close together, while boundaries and edges within objects occur at higher-frequency transitions and hence are likely to belong to a separate cluster. One important statistical tool to estimate these properties is the intensity gradient between adjacent pixels [23]. We propose computing these gradients or deviations with respect to a fixed reference pixel value rather than using local or neighborhood pixel intensities. The arithmetic mean μ of the overall pixel intensity of the image is taken as the reference, and a gradient image G is then generated by computing the intensity difference between each pixel of the original image and the reference using the relations in equations (1)–(3). Though the reference value may be useful as a threshold in some simple binary segmentation tasks, it is grossly inadequate in more complex segmentation tasks because it does not take into consideration the higher-frequency gradients (e.g., boundaries and edges) of the objects in the image. To incorporate gradient information into the threshold determination, we compute the arithmetic mean d of the gradient image and offset the reference value by this amount. Figure 1 shows a cameraman image with the normalized histogram, the two estimated thresholds, and the mean d of the gradient image.

Since edge or boundary gradients, which mark object perimeters, may have different intensity levels in the gradient image depending on how close or far they are from the reference μ, two thresholds T1 and T2 can be established to segment the image into three clusters C1, C2, and C3. The threshold T1 is a negative offset from the reference point μ by distance d, and the second threshold T2 is the positive offset from μ by the same distance d. Hence, the thresholds and the clusters can be computed using equations (4) and (5), respectively. In Figure 2, pixels within a cluster are assigned logical ones, whereas those outside the cluster are assigned logical zeros.
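The bilevel scheme can be sketched directly in NumPy. The notation (μ, G, d, T1, T2, C1–C3) follows the description above; the use of the absolute deviation for the gradient image is an assumption, since equations (1)–(5) are not reproduced here:

```python
import numpy as np

def bilevel_thresholds(img):
    """Two thresholds as symmetric offsets around the global mean,
    where the offset is the mean absolute deviation from it."""
    mu = img.mean()                    # reference pixel value (mu)
    g = np.abs(img - mu)               # gradient (deviation) image G
    d = g.mean()                       # mean of the gradient image
    return mu - d, mu + d

def bilevel_clusters(img):
    """Split the image into three logical cluster masks C1, C2, C3."""
    t1, t2 = bilevel_thresholds(img)
    c1 = img < t1                      # low-intensity cluster
    c2 = (img >= t1) & (img < t2)      # mid-intensity cluster
    c3 = img >= t2                     # high-intensity cluster
    return c1, c2, c3
```

Each mask holds logical ones inside its cluster and zeros outside, matching the convention used in Figure 2.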

3.2. Global Thresholding

The proposed bilevel thresholding can be extended to implement image binarization where a single threshold is required. To realize this, we consider the probability density function of the pixel intensity distribution of the gray scale image. One of the two thresholds computed in bilevel thresholding can be used to separate the pixels into two clusters. We deploy the concept of the probability density function (pdf) to decide which offset from the reference point μ would yield the greater information gain when the image is segmented into two clusters. If T1 and T2 are rounded to the nearest integer, we can compute the cumulative sum of the pdf between the reference point μ and each of the two points T1 and T2. The two cumulative sums are compared to determine the single global threshold T required to binarize the image, using equation (7).

If we define nk as the frequency of pixels with intensity k in an image I(i, j) of size M × N, then the pdf Pk = nk/(M × N) can be deduced as in equation (6). An example of an image binarized using the proposed global thresholding is shown in Figure 3.

(1)Input: image I, flag bilevel
(2)Output: clusters C1, C2, C3 (bilevel) or binary image B (global)
(3)If I is RGB: I ← gray(I) //RGB to gray conversion
(4)μ ← mean(I) //arithmetic mean
(5)G ← |I − μ| //gradient vector
(6)d ← mean(G) //mean of gradient vector
(7)T1 ← μ − d //first threshold
(8)T2 ← μ + d //second threshold
(9)If bilevel is true:
(10) C1 ← I < T1
(11) C2 ← T1 ≤ I < T2
(12) C3 ← I ≥ T2
(13)Else //global thresholding
(14) compute pdf Pk of I //probability distribution function
(15) If Σ(Pk, k = T1..μ) ≥ Σ(Pk, k = μ..T2):
(16)  T ← T1
(17) Else:
(18)  T ← T2
(19) B ← I ≥ T //binarize with T
3.3. Extension to Document Binarization

Document binarization requires an important stage that removes undesirable artifacts before segmenting the document into two clusters. We propose a preprocessing technique applied prior to computing the global threshold proposed above. It consists of a number of stages to achieve the desired noise removal from the document, as shown in Figure 4. It starts with noise removal using median filtering, immediately followed by contrast adjustment. The contrast adjustment deploys a technique based on contrast limited adaptive histogram equalization (CLAHE) to decrease the effect of uneven contrast distribution in the image.

3.3.1. Max Intensity Thresholding

Max intensity thresholding is a process of rough separation of the image into foreground and background clusters based on the maximum pixel intensity value. We first compute the negative of the contrast-adjusted image and find the maximum intensity value Imax. The foreground mask and background mask can then be determined using equations (8) and (9), where k is an adjustable parameter between 0 and 1. The background mask is further processed using a morphological opening operation with a ball-like structural element. The morphologically opened background mask is added to the foreground mask to compensate for foreground pixels that might have been misclassified during max intensity thresholding. Median filtering is then applied to remove outliers and noise resulting from the compensation process. A copy of this compensated, filtered foreground image is created, morphologically opened, and then subtracted from the original copy in a process we refer to as morphological opening and compensation. The last operation adjusts the contrast of the final image, as shown in Figure 4.

(1)Input: gray scale image I, parameter k
(2)Output: preprocessed gray scale image P
(3)F ← medfilt(I) //median filtering
(4)F ← CLAHE(F) //adaptive histogram equalization
(5)N ← complement(F) //image complement
(6)Imax ← max(N) //maximum intensity
(7)Mf, Mb ← equations (8) and (9) applied to N //foreground and background masks
(8)Mb ← open(Mb, S1) //opening with structural element 1
(9)C ← Mf + Mb //first foreground compensation
(10)C ← medfilt(C) //median filtering
(11)C ← C − open(C, S2) //second compensation with structural element 2
(12)P ← adjust(C) //contrast adjustment
(13)P ← close(P) //closing operation
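The pipeline stages above can be sketched with SciPy primitives. The parameter name k, the filter sizes, and the structuring elements are illustrative assumptions, and the CLAHE stage is approximated here by a simple global contrast stretch to keep the example dependency-free; a real implementation would substitute a proper CLAHE routine:

```python
import numpy as np
from scipy import ndimage

def stretch(x):
    """Simple linear contrast stretch (stand-in for CLAHE in this sketch)."""
    lo, hi = x.min(), x.max()
    return (x - lo) / (hi - lo) if hi > lo else x

def preprocess(gray, k=0.5):
    """Sketch of the Figure 4 preprocessing pipeline on a float image in [0, 1]."""
    f = ndimage.median_filter(gray.astype(float), size=3)   # noise removal
    f = stretch(f)                                          # contrast adjustment
    neg = 1.0 - f                                           # image complement
    fg = neg >= k * neg.max()                               # max-intensity foreground mask
    bg = ~fg                                                # background mask
    bg_open = ndimage.binary_opening(bg, structure=np.ones((3, 3)))
    comp = (fg | bg_open).astype(float)                     # first foreground compensation
    comp = ndimage.median_filter(comp, size=3)              # remove outliers/noise
    opened = ndimage.grey_opening(comp, size=(5, 5))
    return stretch(comp - opened)                           # opening-and-compensation + stretch
```

The output would then be fed to the proposed global thresholding to produce the final binarized document.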

4. Experimental Results

Experimental results are presented in three steps for ease of comparison with the existing techniques: global thresholding, bilevel thresholding, and document binarization with proposed global thresholding and preprocessing.

4.1. Evaluations

To evaluate the performance of the proposed thresholding methods, we used the performance metrics described below. These metrics quantify the quality of the predicted image in comparison to the ground-truth image. For binarization, counts such as true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN) are used to compute these metrics. The foreground pixels (logical ones) are considered the positives, whereas the background pixels (logical zeros) are considered the negatives.

4.1.1. Precision

Precision (also known as positive predictive value) is the fraction of true positive pixels out of the total positive pixels contained in the predicted binary image, Precision = TP/(TP + FP). It provides a probabilistic measure of how reliably positive predictions are made.

4.1.2. Recall

The recall metric (also known as sensitivity) is the fraction of true positive pixels out of the total positive pixels in the ground-truth image, Recall = TP/(TP + FN).

4.1.3. F-Measure

F-measure is the harmonic mean of recall and precision, F = 2 × Precision × Recall/(Precision + Recall), as expressed in equations (10) and (11).

4.1.4. Root Mean Square Error

RMSE computes the standard deviation of the residual errors between the ground-truth image and the estimated or predicted image, RMSE = sqrt((1/(M × N)) Σ (GT(i, j) − P(i, j))²).

4.1.5. Peak Signal-Noise-Ratio (PSNR)

PSNR is a measure of the ratio of the maximum pixel intensity to the noise in the predicted image, expressed in logarithmic form as a function of the RMSE: PSNR = 20 log₁₀(MAX/RMSE), where MAX is the maximum possible pixel intensity.
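The five metrics above can be computed together in a short NumPy routine. This is a straightforward sketch of the standard definitions, taking MAX = 1 for binary images:

```python
import numpy as np

def binarization_metrics(pred, gt):
    """Pixel-level precision, recall, F-measure, RMSE, and PSNR for a
    predicted binary image against its ground truth (foreground = 1)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.sum(pred & gt)                    # foreground correctly predicted
    fp = np.sum(pred & ~gt)                   # background predicted as foreground
    fn = np.sum(~pred & gt)                   # foreground predicted as background
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    fmeasure = 2 * precision * recall / (precision + recall)
    diff = pred.astype(float) - gt.astype(float)
    rmse = np.sqrt(np.mean(diff ** 2))
    # MAX = 1 for binary images; a perfect prediction gives infinite PSNR
    psnr = 20 * np.log10(1.0 / rmse) if rmse > 0 else float("inf")
    return precision, recall, fmeasure, rmse, psnr
```

Such a routine lets the DIBCO-style ground-truth comparisons in the following subsections be reproduced at the pixel level.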

4.2. Global Thresholding

In this context, the thresholding is implemented without preprocessing, and the results are compared to the Otsu method as shown in Figure 5. Based on the computed metrics such as PSNR and RMSE given in Table 1, together with the visual evidence, the proposed method matches and sometimes outperforms the Otsu global thresholding method. Much less computation is required with the proposed method than with the Otsu method.

4.3. Bilevel Thresholding

In Figure 6, we present two original images and their clusters when segmented using the proposed bilevel threshold determination method. It is evident that information with similar attributes has been grouped into the same cluster.

4.4. Document Binarization

Figure 7 shows an example image from the database processed with the proposed method, with the output of the proposed preprocessing at each stage. In Table 2, comparison results of the proposed method and other state-of-the-art approaches are presented.

5. Discussion

The proposed global thresholding is remarkably simple yet effective, as confirmed by both visual and empirical experimental evidence. Its computational complexity is much lower than that of the Otsu method, an added advantage in real-time or low-capacity applications where speed and resource management are essential. For a comparative computational analysis, consider that determining a global threshold with our method requires computing only three quantities (the mean μ, the gradient image G, and the mean d of the gradient image). For an image of size M × N, the number of additions and subtractions needed is on the order of M × N. On the other hand, computing the same threshold with the Otsu method also requires three quantities (weight, mean, and variance), but for each of the L gray levels in the image. For an 8-bit image (L = 256), on the order of L² additions and multiplications are needed for the variance computations alone, substantially more than the proposed method. Similarly, the proposed method can be extended to perform bilevel thresholding (3 clusters) without requiring additional computation, which cannot be achieved by Otsu or other global thresholding techniques.

6. Conclusion

A noniterative approach for global and bilevel image thresholding was proposed and implemented with low computational complexity. The approach offers the benefit of using the same algorithm to perform both global and bilevel thresholding without extra computational cost. Similarly, a parametric preprocessing approach for document binarization was proposed; its parameter can be varied to capture different degradations in the image and improve document binarization accuracy. Both visual and experimental evidence, using standard evaluation metrics, demonstrates the efficacy of the proposed method. The global thresholding outperforms the formidable Otsu thresholding method.

Data Availability

Some of the sample images from DIBCO2017 used for evaluating the proposed method are publicly available at https://vc.ee.duth.gr/dibco2017/benchmark/.

Conflicts of Interest

The author declares that there are no conflicts of interest.