ISRN Machine Vision
Volume 2012 (2012), Article ID 253863, 16 pages
http://dx.doi.org/10.5402/2012/253863
Research Article

Practical Recognition System for Text Printed on Clear Reflected Material

Khader Mohammad1 and Sos Agaian2

1Ingram School of Engineering, Texas State University-San Marcos, 601 University Drive, San Marcos, TX 78666-4684, USA
2Electrical and Computer Engineering, University of Texas at San Antonio, One UTSA Circle, San Antonio, TX 78249, USA

Received 19 May 2012; Accepted 30 July 2012

Academic Editors: A. Gasteratos, D. Hernandez, and A. Prati

Copyright © 2012 Khader Mohammad and Sos Agaian. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Text embedded in an image contains useful information for applications in the medical, industrial, commercial, and research fields. While many systems have been designed to correctly identify text in images, no work addressing the recognition of degraded text on clear plastic has been found. This paper presents novel methods and an apparatus for extracting text from an image under the following practical assumptions: (a) poor background contrast; (b) white, curved, and/or differing fonts or character widths between sets of images; (c) dotted text printed on curved reflective material; and/or (d) touching characters. The methods were evaluated using a total of 100 unique test images containing a variety of texts captured from water bottles. These tests averaged a processing time of ~10 seconds (using MATLAB R2008A on an HP 8510W with 4 GB of RAM and a 2.3 GHz processor), and experimental results yielded an average recognition rate of 90 to 93% using customized systems generated by the proposed development.

1. Introduction

Recognition of degraded characters is a challenging problem in the field of image processing and optical character recognition (OCR). The accuracy and efficiency of OCR applications depend on the quality of the input image [1–3]. Security applications and data processing have dramatically increased interest in this area; therefore, the ability to replicate and distribute extracted data has become more important [4, 5].

In [6], Jung et al. presented a survey of text information extraction (TIE) from images, assuming no prior knowledge such as location, orientation, number of characters, font, or color. They also noted that (a) variations of text due to differences in size, style, orientation, and alignment, as well as low image contrast and complex backgrounds, make the problem of automatic text extraction extremely challenging; and (b) a variety of approaches to TIE from images and video have been proposed for specific applications, such as page segmentation, address block location, license plate location, and content-based image/video indexing. In spite of such extensive studies, it remains laborious to design a general-purpose OCR system [5] because there is an abundance of possible sources of variation when extracting text from a shaded or textured background, from low-contrast or complex images, or from images having variations in font size, style, color, and orientation [6]. These variations make the problem of automatic TIE extremely arduous.

Recently, many sophisticated and efficient text recognition algorithms have been proposed [7–11]. A survey of text information extraction in images and video is presented in [6, 12, 13].

Babu et al. in [14] proposed degraded character recognition from historical printed books using a coupling of vertical and horizontal hidden Markov models. Other studies focused on text localization methods which directly measured pixel-by-pixel differences between two images and quantified them as the average of the squares using a log scale [15]. Despite their relative simplicity, these algorithms have the drawback of being unable to assess similarity across different distortion types [16] and possess a greater computational complexity [17].

Prior research has also examined image invariants that handle scaled, rotated, skewed, blurred, and noise-distorted images [18, 19]. One recent study dealt with degraded character recognition [20], while another focused on single characters [21]. A further study considered shape-based invariants, which are most commonly used for character recognition [22]. Methods of text extraction from colored (TEFC) images are presented in [23].

The process of extracting text from binary images has been presented in [17, 24, 25]. The research by Likforman-Sulem and Sigelle illustrated how to convert a gray image to a binary image and how to locate the text area using connected component techniques [20]. Some enhanced methods used morphology [26], while other approaches used algorithms with performance measures. Other researchers employed skew correction and normalization operations for enhancement and parallel classification [27–29].

These approaches have been adopted for common applications such as locating an address on postal mail or a courtesy amount on checks, mainly because of their simplicity of implementation [30]. Current research focuses on developing OCR systems for limited-domain applications such as postal address reading [31], check sorting [32], tax reading [33], and office automation for text entry [6], among others. However, there are several practical security applications where one needs an optical recognition system for dotted lines/characters scanned under light reflection and the curvature of water bottle images (see Figure 1).

Figure 1: Images of white text on white bottles.

Since the quality of clear reflected material images is much lower than that of the printed or handwritten text images for which commonly used character segmentation and recognition algorithms were designed, directly applying those algorithms to these images rarely achieves high success rates. Figure 1 illustrates some bottle text images. Detecting text in such images remains a challenging problem for an automatic recognition system because of unexpected conditions such as closing loops, spurious branches, and shiny or raised text. Other conditions relate to the surface on which the text is printed, such as curvature, indentation, or matte finishes.

As a result, new OCR tools should naturally be developed to use the unique properties of text in clear reflected material images.

This paper presents application-oriented dotted-text localization and character segmentation algorithms for clear reflected material images. We have successfully applied the proposed methods in developing text localization and character segmentation algorithms for curved plastic bottle images, including the Ozarka and Dasani bottled water brands.

The rest of the paper is organized as follows. Section 2 explicates the proposed system and covers the image capturing device (camera), enhancement, text localization, binarization, and segmentation; Section 3 explains the classification and recognition flow, which includes the training set, feature vectors, and classification; Section 4 exhibits computer simulation results and a comparison to other systems; Section 5 gives some concluding remarks and perspectives.

2. Proposed System

This section summarizes the proposed system along with its components. The system overview is summarized by the block diagram shown in Figure 2. It includes image capturing, preprocessing, localization, filling, rotation, segmentation, and recognition.

Figure 2: System overview.

2.1. Image Capturing

The capturing device is selected to satisfy the needs of the recognition application. The device resolution, focus, aperture, shutter speed, and price are the main selection factors. Experimental testing shows that a single-lens reflex (SLR) camera has an extremely short shutter speed, high resolution, and excellent close-up capability. In addition to low cost, an advantage of this kind of camera is that a user can preview the image and send it remotely to a computer station for analysis, which reduces the amount of memory storage needed on the device.

2.2. Preprocessing

The goal of preprocessing is to reduce irrelevant information such as noise and to increase both the contrast and the sharpness of images. Different enhancement options are proposed to empower the system and to enable portability. Accuracy and speed of recognition stand as two of the highest concerns in any type of enhancement. The measure of enhancement is shown in (1). It compares the different options: the original image is divided into blocks of size k1 × k2 before each block is processed; the maximum and minimum intensity values in a given block enter the measure, and a small constant equal to 0.0001 avoids division by zero. This measure is used for choosing the best parameters among a set of images for the local enhancement algorithms. Enhancement proceeds in three stages: a local stage applied to the original image, followed by a global stage, and finally a background removal method.
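To make the block-based measure concrete, the following is a minimal sketch in Python/NumPy (the paper's implementation is in MATLAB). The EME-style log-ratio form, the default 8 × 8 block size, and the function name enhancement_measure are assumptions for illustration, since the exact expression of (1) is not reproduced in the text.

```python
import numpy as np

def enhancement_measure(gray, k1=8, k2=8, eps=1e-4):
    """Block-based contrast measure in the spirit of Eq. (1): the image is
    tiled into k1-by-k2 blocks and each block contributes a log-ratio of its
    maximum to its minimum intensity; eps (0.0001 in the paper) avoids
    division by zero.  An EME-style form is assumed here."""
    h, w = gray.shape
    scores = []
    for r in range(0, h - k1 + 1, k1):
        for c in range(0, w - k2 + 1, k2):
            block = gray[r:r + k1, c:c + k2].astype(float)
            scores.append(20.0 * np.log10((block.max() + eps) / (block.min() + eps)))
    return float(np.mean(scores)) if scores else 0.0
```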

2.3. Algorithm for Local Image Enhancement

Local image enhancement: the logarithmic model presented in [26, 34] can be used to improve both the contrast and the sharpness of the image, and it effectively enhances details in very dark or very bright areas. The proposed computation is given in (2); it involves the generated vectors, the constant parameters α and β [35], the normalized original image, and the normalized enhanced image, where the arithmetic mean over a local window of the original image is used and the parameters α and β control the contrast. From (2), for one setting of α the sharpness of the image increases as β increases; for the converse setting the resulting image is blurred.

This flexibility is not available with many conventional methods, including histogram equalization. The flexibility and quality of a logarithmic image-processing algorithm make it a very effective technique for text recognition applications. Figure 3 displays example test case results of the proposed algorithm for a fixed window size and fixed α and β values. The results demonstrate that the algorithm works better for gray images. A larger window size generates an image with sharper edges, which may be desirable for edge detection applications; on the other hand, some details of the original image are lost as the window size increases, and the algorithm becomes more computationally expensive and time consuming.

Figure 3: Enhancement test case images (1st row is original, 2nd row is enhanced image).
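Because the exact logarithmic-model formulas of (2) are not reproduced above, the sketch below only illustrates the window-mean-based behaviour described in the text (sharpening controlled by β relative to α). The function name local_enhance, the use of a uniform filter for the window mean, and the default parameter values are assumptions, not the paper's LIP implementation.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def local_enhance(gray, win=3, alpha=1.0, beta=2.0):
    """Illustrative local enhancement: each pixel is contrasted against the
    mean of its win-by-win window; increasing beta relative to alpha
    sharpens, decreasing it blurs.  This mimics the behaviour described for
    Eq. (2) but is not the paper's logarithmic (LIP) formula."""
    g = gray.astype(float)
    g /= max(g.max(), 1e-9)                 # normalize to [0, 1]
    mean = uniform_filter(g, size=win)      # window mean of the original image
    out = alpha * mean + beta * (g - mean)  # contrast about the local mean
    return np.clip(out, 0.0, 1.0)
```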
2.4. Algorithm for Global Image Enhancement

The proposed global enhancement method is based on converting a typical 24-bit RGB color image to a grayscale image using six conversion methods. Note that each pixel component (e.g., in RGB) ranges in value from 0 to 255 and is represented by an eight-bit binary number. As well, a digital color image may easily be converted from one color coordinate system to another. This method differs from what was presented by Adolf and Agaian in [23, 36] by factoring a color-intensity log function and applying logarithmic operations.

This paper employs the following conversion methods to map and generate gray images out of the red (R), green (G), and blue (B) components using different weight factors (one each for red, green, and blue) for each of the k = 1:6 conversions.

To avoid division by zero, the corresponding value of each color component of the received image is biased upward by one. A crucial aspect of the text extraction process is the proposed conversion of the received color image into a grayscale image in a manner that maximizes the contrast between any text in the image and the rest of the image. The proposed process uses different combinations of weight factors and logarithmic functions before the gray images are converted to binary images using a defined threshold (T), as shown below.

The conversion process applies to pixels in the following manner (a sketch of the first three conversions is given at the end of this subsection).
(i) The first gray image is based on the average of the red (R) and green (G) components, as in (4); the row and column indices address the same pixel location in all images.
(ii) The second gray image results from dividing the minimum of the three color component values by their average, as in (5).
(iii) The third image results from dividing the minimum of the three color component values by their maximum, as in (6).
(iv) The fourth set consists of 6 gray images, obtained from all possible combinations of the colors by dividing one component value by the others, as in (7), and applying the log function to the other set of colors.

In (7), one of the colors is passed through the log function and combined with the log function of another color. The fifth set of gray images (6 images) results from dividing the sum of a pair of the component values (R, G, and B) by the sum of all three values, as in (8). Another newly proposed set of images (6 in total) is generated from weighted combinations of the components, as defined in (9) for k = 1:6. For these sets of images, various weight factors are used, and the optimal factors are selected experimentally for each set of images: a nested loop searches for the best combination of the red, green, and blue weight factors, and the output image with the largest intensity difference between text and background is selected. Table 1 shows the set of weight factors used for the images.

Table 1: Weight factors.

Each generated grayscale image discussed above is converted to a new image as shown in (10), using an experimentally chosen optimal factor. This factor is needed so that nearby values (close pixel intensities) are spread out.
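A minimal sketch of the first three conversions, Eqs. (4)-(6), is given below (Python/NumPy rather than the paper's MATLAB). The +1 bias against division by zero follows the text; the selection helper best_candidate and its text mask are hypothetical, standing in for the paper's rule of keeping the image with the largest text/background intensity difference.

```python
import numpy as np

def gray_candidates(rgb):
    """First three gray-conversion candidates of Section 2.4: Eq. (4) is the
    mean of R and G, Eq. (5) is min/mean, Eq. (6) is min/max.  Each channel
    is biased by 1 to avoid division by zero, as stated in the text."""
    r, g, b = (rgb[..., i].astype(float) + 1.0 for i in range(3))
    stack = np.stack([r, g, b])
    g1 = (r + g) / 2.0                            # Eq. (4)
    g2 = stack.min(axis=0) / stack.mean(axis=0)   # Eq. (5)
    g3 = stack.min(axis=0) / stack.max(axis=0)    # Eq. (6)
    return [g1, g2, g3]

def best_candidate(cands, text_mask):
    """Hypothetical selection rule: keep the candidate with the largest mean
    intensity gap between (rough) text pixels and background pixels."""
    gaps = [abs(c[text_mask].mean() - c[~text_mask].mean()) for c in cands]
    return cands[int(np.argmax(gaps))]
```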

Figure 4 shows samples of the enhancement results. The last column in the figure shows the best result (in terms of image quality and intensity difference between text and background). In comparison, conventional enhancement methods (histogram, linearization, and local deviations) do not yield successful results even for fairly clean images, as shown in Figure 5.

Figure 4: Samples results based on color image algorithm.
Figure 5: Test cases for nonworking enhancement using conventional methods.

To summarize, the following algorithm is used as the default system enhancement.

Background Removal Enhancement Algorithm
This is the last enhancement step. The steps of the proposed algorithm are as follows.
Input: image (the output of global enhancement).
Step 1: Examine adjacent pixels. If a pixel is significantly darker than its adjacent pixels, it is converted to black; otherwise it is converted to white. Using this method, the new value is determined by local conditions rather than by a single global value. The new image is generated using a background image derived from the globally enhanced image.
Step 2: In this final step, the enhanced image is generated by combining the newly generated image with an average image (the sum of the two divided by 2).
Output: enhanced image.
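A hedged sketch of this background-removal step follows. The neighbourhood size, the darkness margin delta, and the use of a uniform filter as the local background estimate are illustrative assumptions, and the final combination mirrors Step 2 only loosely.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def background_removal(gray, win=15, delta=0.08):
    """Step 1: a pixel is kept as text (black) only if it is significantly
    darker than its local neighbourhood; everything else becomes white.
    Step 2: the result is combined with the input by averaging.  Window size
    and margin are illustrative, not the paper's values."""
    g = gray.astype(float)
    g /= max(g.max(), 1e-9)
    background = uniform_filter(g, size=win)   # local background estimate
    text_mask = g < background - delta         # significantly darker pixels
    cleaned = np.where(text_mask, 0.0, 1.0)    # black text on white
    return (cleaned + g) / 2.0                 # combine with the input image
```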

Figure 6 demonstrates an example of the proposed background removal enhancement. Background removal is best suited for this system and is used as the default enhancement.

Figure 6: Background removal enhancement (1st row original, 2nd row background, and 3rd row the enhanced images).

2.5. Finding the Region of Interest (Text Localization)

In this section, a text location algorithm with candidate locations is presented. The intensity of a plastic image is a main information source for text detection, but it is sensitive to lighting variations; the gradient of the intensity (edge), on the other hand, is less sensitive to lighting changes. An edge-based feature is used for the detection phase under the following assumptions.
(i) The text is designed with high contrast to its background in both color and intensity images.
(ii) The characters in the same context have almost the same size in most cases.
(iii) The characters in the same context have almost the same foreground and background patterns.

Most approaches for text identification refer to gray or binary document images; only recently have some techniques been proposed for text identification and extraction in color documents. Wang et al. [37] proposed a color text image binarization technique based on a color quantization procedure and binary texture analysis; however, this technique neglects the text region identification stage and does not include a page layout analysis technique. Thillou and Gosselin [38] proposed a color binarization technique for complex camera-based images based on wavelet denoising and color clustering with K-means; however, this technique does not include any text extraction stage and is applied only to document images containing already detected text.

The proposed approach uses edge detection and pixel intensity for localizing the edges of the text. After the edges are computed, the number of edges in the horizontal and vertical directions is calculated; if the result is higher than a certain threshold, the area is considered a text area. Each text area is then binarized using the luminance values. Noise around the digits greatly affects this method, so a high quality preprocessing step is crucial for the localization approach, and choosing the right threshold eradicates noise and improves the binary image quality. In this research, the gray image is used as the default color map for water bottles, and binarization is based on gray threshold values for a localized area.

Edge detection algorithms have been chosen for the water bottle system. Figure 7 shows example test cases for water bottles, where (a) is the original image, (b) is the text area, (c) is the enhanced image, and (d)–(i) show Gaussian, Sobel, Laplacian, LoG, average, and unsharp detection, respectively. The region-of-interest algorithm is listed below and shown in Figure 8; Figure 9 shows example simulation results of the text localization.
Input: the candidate image containing text.
Step 1: The original image is converted to gray. This conversion uses the standard formula for calculating the effective luminance of a pixel, as shown in (11).
Step 2: The gray image is enhanced, and the column sums of the inverted image are calculated.
Step 3: LoG (Laplacian of Gaussian) edge detection is applied vertically and horizontally.
Step 4: Connected regions are identified by scanning the image pixel by pixel, from top to bottom and left to right, to find regions of adjacent pixels that share the same set of intensity values; this procedure detects text regions in the horizontal and vertical directions.
Step 5: A local threshold is calculated to obtain a binary image. The original image is segmented into 6 subimages to obtain a more accurate threshold; the threshold (T) defined in (14) subtracts the optimal factor multiplied by the standard deviation of the gray image from the maximum intensity value of the gray image.
Output: the created region of interest.
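The sketch below mirrors this flow (LoG edges, edge-density scan, connected regions, and the local threshold of (14)) in Python/SciPy. The window size, edge and density thresholds, and the factor k are illustrative assumptions rather than the experimentally tuned values used in the paper.

```python
import numpy as np
from scipy.ndimage import gaussian_laplace, label

def localize_text(gray, win=32, edge_thresh=0.1, density=0.05, k=1.0):
    """Steps 3-5 of the localization algorithm: LoG edge map, windows whose
    edge density exceeds a threshold are marked as text, connected regions
    are labelled, and each region is binarized with the local threshold
    T = max(gray) - k * std(gray) of Eq. (14)."""
    g = gray.astype(float)
    g /= max(g.max(), 1e-9)
    edges = np.abs(gaussian_laplace(g, sigma=2)) > edge_thresh
    mask = np.zeros(g.shape, dtype=bool)
    h, w = g.shape
    for r in range(0, h, win):
        for c in range(0, w, win):
            if edges[r:r + win, c:c + win].mean() > density:
                mask[r:r + win, c:c + win] = True
    labels, n = label(mask)                      # connected candidate regions
    binary = np.zeros(g.shape, dtype=bool)
    for i in range(1, n + 1):
        region = labels == i
        t = g[region].max() - k * g[region].std()
        binary[region] = g[region] < t           # dark text falls below the threshold
    return binary
```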

Figure 7: Edge detection process test case for water bottles.
Figure 8: Text localization and binarization algorithm.
Figure 9: Simulation results using text localizations example.
2.6. Text Segmentation Approach

This section details the new algorithm for rotation, segmentation, and filling. Before applying the segmentation approach, the rotation and curved text handling approaches are discussed.

After the edge detection process is complete, the connected-region procedure is applied before the binary image is generated.

2.6.1. Image Rotation

The goal of this process is to use geometric tools and operations to automatically change the orientation of the images to one appropriate for classification. Dealing with degraded dotted text on clear plastic poses a challenge for the segmentation of dotted and angled text. To address this problem, the image is binarized and rotated (if necessary); then, text filling, localization, and segmentation procedures are applied. In the proposed system, the localized binary text image is used as input.

The text in the image can be at different angles. The pseudocode flow below determines the angle of rotation; Figure 10 summarizes the proposed rotation flow with an example.
Input: the candidate binarized image.
Step 1: Identify the location of the first nonbackground pixel (white, binary value "1") in each column of the binary image, starting from the bottom.
Step 2: Identify the location of the first nonbackground pixel (white, "1") from the top of each column of the previously generated image.
Step 3: Identify the four corners (top, left, bottom, and right) where the text touches.
Step 4: Determine the rotation slope (angle) from these four corners. Once the angle is known, the text is rotated back to the horizontal orientation, and the image is cleaned using the select-character procedure to remove all background outside the text.
Output: rotated image.
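A minimal sketch of the deskewing idea follows. The paper derives the slope from the four corner points where the text touches its bounding region; fitting a line through the top profile of each column, as done here, is a simplification, and the sign convention of the rotation may need flipping depending on the image origin.

```python
import numpy as np
from scipy.ndimage import rotate

def deskew(binary):
    """Estimate the text slope from the first white ('1') pixel of each
    column (Steps 1-2), convert it to an angle, and rotate the image back
    to the horizontal orientation.  Assumes at least two columns contain
    text."""
    cols = np.flatnonzero(binary.any(axis=0))                  # columns containing text
    tops = np.array([np.argmax(binary[:, c]) for c in cols])   # first '1' from the top
    slope = np.polyfit(cols, tops, 1)[0]                       # rise over run
    angle = np.degrees(np.arctan(slope))                       # sign may need flipping
    deskewed = rotate(binary.astype(float), angle, reshape=False, order=0)
    return deskewed > 0.5
```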

Figure 10: Rotation algorithm example.

The rotation methods have been tested on two sets of water bottle images in which the text is rotated at different angles. Tables 2 and 3 illustrate the results obtained when testing orientation (rotation) invariance for 50 Dasani images and 40 Ozarka images. Each set of images was captured from different bottles under different conditions; for example, set 1 contains bottles in clear daylight and set 2 contains bottles at night.

Table 2: Rotation results/success rate for Dasani.
Table 3: Rotation results/success rate for Ozarka.

2.7. Proposed New Curved Text Handling Approach

Text images can contain curved text due to the curvature of the bottle or the printing of the characters. Figure 11 shows the algorithm along with an example in which the original data in the text image is curved. The idea is to segment the image based on the black columns between characters and then crop only the text pixels, so that the boundary of the segmented character image touches the character on all four sides.

Figure 11: Curvature handling algorithm.
2.8. Segmentation

Extraction of degraded text in an image remains problematic for current algorithms. One such problem, as mentioned in [2], is the occurrence of touching characters, which invalidates the assumption that character repetition patterns in the input text match those of a language model. Figure 12(a) shows the new algorithm for segmentation from a binary (text) image. The segmentation steps are as follows.
Input: binary image after rotation.
Step 1: Extract the text image (text touching the four corners) using the curved text handling approach.
Step 2: Segment vertically: based on the number of lines, split the image vertically, where the image height is divided by the number of lines.
Step 3: Segment horizontally: for each line image produced in Step 2, segment the image based on the number of black columns between characters within the width of that line image.
Step 4: Every segmented character is checked. If it is wider than the normalized character width by 15% (a value predetermined experimentally), it is segmented again based on the normalized width, which means the characters are connected.
Output: characters.
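The following Python sketch follows the step list above: a vertical split into line strips, a horizontal cut at empty (black) columns, and a re-split of any piece wider than about 115% of the normalized character width. Using the median run width as the normalized width and performing a uniform re-split are assumptions for illustration.

```python
import numpy as np

def column_runs(occupied):
    """Return (start, end) pairs of consecutive occupied (text) columns."""
    runs, start = [], None
    for i, v in enumerate(occupied):
        if v and start is None:
            start = i
        elif not v and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(occupied)))
    return runs

def segment_characters(binary, n_lines, oversize=1.15):
    """Step 2: split into n_lines strips.  Step 3: cut each strip at empty
    columns.  Step 4: re-split any piece wider than ~115% of the normalized
    character width (the paper's 15% rule for connected characters)."""
    line_h = binary.shape[0] // n_lines
    chars = []
    for i in range(n_lines):
        line = binary[i * line_h:(i + 1) * line_h, :]
        runs = column_runs(line.any(axis=0))
        if not runs:
            continue
        norm_w = np.median([e - s for s, e in runs])
        for s, e in runs:
            if e - s > oversize * norm_w:                 # connected characters
                n = int(round((e - s) / norm_w))
                cuts = np.linspace(s, e, n + 1).astype(int)
                chars += [line[:, a:b] for a, b in zip(cuts[:-1], cuts[1:])]
            else:
                chars.append(line[:, s:e])
    return chars
```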

Figure 12: (a) Segmentation algorithm. (b) Segmentation of connected characters.

The proposed segmentation in a localized binary text image is based on the black column between characters, the character width, and the text box width extracted from the text image. Based on the width of the new text image, the width of each separated image is reviewed. If the width of a segmented character is greater than a normalized width, then the segmented character is known to be composed of several characters and needs further segmentation. A new value for the normalized character width is chosen based on average width of the segmented characters. Figure 12(b) shows segmentation of solid-connected characters from binary image.

2.9. Proposed Fill Dotted Characters

Dotted characters in the segmented images complicate the recognition process. Morphological operations such as dilation and imfill [39], as well as other interpolation methods using spline curves or spline fills [39, 40], were evaluated and tested to generate solid characters. The larger the gap, the more likely these fill methods are to yield an erratic result; they did not rectify the problem because the character dots are not spaced closely enough together. The proposed fill algorithm is applied after segmentation to solidify characters; it is shown in Figure 13 and is based on a shift-and-combine operation.

Figure 13: Character fill algorithm with an example.

The extracted character image goes through three shift operations: (1) shift up by m pixels, (2) shift down by l pixels, and (3) shift left by n pixels. The original image is then logically combined with the three shifted images. The values of m, l, and n are chosen for each specific system from experimental results. Figure 14 shows segmentation-with-filling results for Ozarka bottles; in this example, segmentation with filling is applied after the image has been split into line images, where the number of lines determines the split.
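A minimal sketch of the shift-and-combine fill follows. The default shift amounts are placeholders for the system-specific m, l, and n values mentioned above, and a logical OR is assumed as the combining operation.

```python
import numpy as np

def fill_dotted(char, m=1, l=1, n=1):
    """Shift the binary character up by m, down by l, and left by n pixels,
    then OR the copies with the original so that the dots merge into solid
    strokes (Figure 13).  Rows/columns wrapped around by np.roll are cleared."""
    up = np.roll(char, -m, axis=0)
    up[-m:, :] = False
    down = np.roll(char, l, axis=0)
    down[:l, :] = False
    left = np.roll(char, -n, axis=1)
    left[:, -n:] = False
    return char | up | down | left
```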

Figure 14: Complete test case for Ozarka bottle.

The segmentation methods have been tested on the two sets of images, totaling 90 images. Table 4 displays the results obtained from Ozarka images with a segmentation rate of 96.5%, while Table 5 displays the results obtained from Dasani images with a segmentation rate of 98.1%. The segmentation and recognition rates are evaluated using (18).

Table 4: Segmentation rates for Ozarka.
Table 5: Segmentation rates for Dasani.

Table 6 summarizes recognition using the proposed fill algorithm, the morphological fill algorithm, no fill algorithm, and spline fill.

Table 6: Recognition rate comparisons with fill/no fill and spline.

3. Proposed Classification and Recognition Flow

The proposed recognition process uses a structural method for identifying each character. After skeletonization of the extracted character, the feature vector is extracted and compared to the database. The objective of the feature extraction phase is the organized extraction of a set of important features that reduce redundancy in the word image while preserving the key information for recognition. The first feature set is based on global and geometric properties; the second feature set is based on the analysis of local properties.

Figure 15 details the recognition algorithm. An advantage of this structural method is its ability to describe the structure of a pattern explicitly. The recognition step extracts the image-invariant features and applies the character recognition algorithm using a set of binary support vector machine (SVM) classifiers, a widely used discriminative classification algorithm [51].

Figure 15: Classification and recognition.
3.1. Feature Vectors

After isolating the characters in an image, a set of properties for each of these characters is determined.

The new set of features for single characters is based on their geometric properties. The character image is segmented into blocks as shown in Figure 16; each block has its own feature value. The solid lines show the larger blocks and the dotted lines show the smaller blocks. The feature value is calculated in (19) from the vertical and horizontal group numbers, whose values are selected based on experimental results (both equal 40). As the process is driven by character properties, the total block feature vector (BFV) is based on 26 groups per character; each group is generated from a different area, as seen in (20). The sets of features (all generated groups) were compared, and those that varied by less than 30% were removed.
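Since Eqs. (19) and (20) are not reproduced in the extracted text, the sketch below only illustrates the block tiling of Figure 16, using per-block foreground density as a stand-in feature value; the 40 × 40 grouping follows the text, while the rest is an assumption.

```python
import numpy as np

def block_features(char, v_groups=40, h_groups=40):
    """Tile the (binary) character image into v_groups x h_groups blocks and
    emit one value per block (here the fraction of foreground pixels),
    standing in for the paper's exact feature value of Eq. (19)."""
    h, w = char.shape
    feats = np.zeros((v_groups, h_groups))
    for i in range(v_groups):
        for j in range(h_groups):
            blk = char[i * h // v_groups:(i + 1) * h // v_groups,
                       j * w // h_groups:(j + 1) * w // h_groups]
            feats[i, j] = blk.mean() if blk.size else 0.0
    return feats.ravel()
```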

Figure 16: New feature vector group creation example.

Groups: Gr(1), Gr(2), …, Gr(N).

The generated set of groups is then regrouped; Figure 17 shows the pseudocode used to regroup characters based on the extracted features until an appropriate slope within the same group is reached. This approach demonstrates that similar characters/shapes still have distinguishable feature vector values; examples are shown in Figure 18. The figure shows that for very close shapes segmented from different images, the feature sets exhibit a difference large enough to distinguish between similar shapes.

Figure 17: Pseudocode algorithm for feature vector re-grouping.
Figure 18: Feature vector example.

An example of the regrouping method of feature vector results is shown in Figure 19(a) and all groups are shown in Figure 19(b). The line shows the sum of the features used to distinguish characters from each other based on the slope.

Figure 19: Sorted data and character regroup for feature vectors.

Feature vectors were selected to minimize the processing time while maintaining a high accuracy rate. To maximize the recognition accuracy, three other existing feature-set methods were used along with the proposed approach: (1) the first is based on 4 levels of the Haar transform [41] with a total of 6 features, (2) the second is based on skeleton lines of the image [49] with 55 features, and (3) the third is based on area features with 79 features [42]. Using a total of 155 features per character, a target feature vector matrix is built for each system. Using a single generalized target matrix for all images can slow recognition and reduce accuracy.

3.2. Recognition

With many different types of feature vectors, recognition is a multiclass problem. Since an SVM supports only two-class recognition, a multiclass system is constructed by combining two-class SVMs, as mentioned in [43].

Let the training samples consist of training vectors together with their corresponding target values. For an input pattern, the decision function of the binary classifier is shown in (13): a sum over the learning patterns, each weighted by its target value, plus a bias term, where a kernel function maps the patterns into a high-dimensional feature space.

Two kernels are considered: the polynomial kernel, shown in (23), and the Gaussian radial basis function kernel, shown in (24).

In (23), if the degree is chosen as 1 the polynomial kernel is called linear, and if it is chosen as 2 it is called a quadratic kernel (this approach can be seen in [43]). In our study, for the Gaussian radial basis function kernel, the kernel width is estimated from the variance of the sample vectors.
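For reference, the standard textbook forms of the binary SVM decision function and of the two kernels discussed here are recalled below; they stand in for the paper's own (13), (23), and (24), which are not reproduced in the extracted text.

```latex
f(\mathbf{x}) = \operatorname{sign}\!\left(\sum_{i=1}^{N} \alpha_i\, y_i\, K(\mathbf{x}_i, \mathbf{x}) + b\right), \qquad
K_{\mathrm{poly}}(\mathbf{x}, \mathbf{z}) = \left(\mathbf{x}^{\top}\mathbf{z} + 1\right)^{d}, \qquad
K_{\mathrm{RBF}}(\mathbf{x}, \mathbf{z}) = \exp\!\left(-\frac{\lVert \mathbf{x}-\mathbf{z}\rVert^{2}}{2\sigma^{2}}\right)
```

Here N is the number of learning patterns, the α_i and y_i are the weights and target values of the patterns, b is the bias, d is the polynomial degree (d = 1 linear, d = 2 quadratic), and σ is the kernel width estimated from the sample variance.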

The SVM is chosen in this research for its processing speed and its flexibility in using various distance methods. The classifier was tested using a different set of data with varying rules (nearest, random, consensus) and varying distances (Euclidean, cityblock, cosine, correlation). For the concrete slab and Ozarka test case systems, the highest accuracy results are shown in Table 7, where the Euclidean distance with the nearest rule gives the best results (95% matching).

Table 7: Classify distance and rule test.
3.3. Proposed Training Procedure

The recognition algorithm relies on a set of learned characters and their properties. It compares the features in the segmented image file to the features in the learned set. Figure 20 shows the training database procedure. Each image can have up to 999 vectors in the training set. The input (trained) image for each character in the training set is either resized to, or assumed to be, 24 by 42 pixels to satisfy the procedure, regardless of whether or not the image is distorted. The training databases include vectors for images of all characters and numbers, along with distorted characters, for system training. Training images are normalized to a fixed size.
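A minimal sketch of the size normalization step follows; PIL and nearest-neighbour resampling are illustrative choices (the paper's MATLAB pipeline is not reproduced), while the 24 × 42 pixel target size comes from the text.

```python
import numpy as np
from PIL import Image

def normalize_training_char(char_img, size=(24, 42)):
    """Resize a binary character sample to the fixed 24 x 42 pixel grid used
    for the training database, then re-binarize."""
    img = Image.fromarray(char_img.astype(np.uint8) * 255)
    resized = img.resize(size, resample=Image.NEAREST)   # size = (width, height)
    return np.array(resized) > 127
```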

Figure 20: Training set procedure.

4. Computer Simulations and Comparison

Experiments were carried out on unconstrained Dasani and Ozarka plastic bottled water images as well as concrete slab images, captured at a resolution of 300 dpi. The water bottle text consists of 2 lines, each with 15 characters for Ozarka and with varying numbers of characters for Dasani. Methods were evaluated using a total of 100 test images containing a variety of texts captured from water bottles. For both Ozarka and Dasani water bottles, the text is composed of dotted-line characters, as shown in Figures 21 and 22. The images in Figure 23, taken from [6], were used for the slab test cases. The system was built using MATLAB R2008A installed on a Compaq 8510W workstation with a 2.3 GHz processor and 4 GB of RAM.

Figure 21: Sample of Ozarka text case study images.
Figure 22: Sample of Dasani text case study images.
Figure 23: Examples of poor-quality slab images [6].

Table 8 shows the summary results obtained from each set of pictures (systems) when tested for text localization and orientation. The segmentation results for all systems are shown in Table 9. The results in this table demonstrate impressive accuracy, especially when applying the fill algorithm. The results for slab images with varying line pictures are shown in Table 10.

Table 8: System rotation accuracy with different methods.
Table 9: System segmentation accuracy with different methods.
Table 10: Recognition accuracy for slab imaging test case.

The best accuracy with dotted characters was found with the Ozarka application. At a 93% accuracy rate, the elapsed time was 18.94 seconds. Dasani water bottles yielded an average accuracy of 90%, with an elapsed time of 29.5 seconds. Meanwhile, for the concrete slabs, the segmentation and recognition accuracy rate was an average of 98%.

The fill algorithm increased the accuracy by 10–20%. The greatest improvement after using the fill algorithm was for the Dasani bottles where the accuracy rate increased from 80% to 99%. The accuracy rate for Ozarka increased from 91% to 97%. Since the text in the slab images is solid lines, the fill algorithm did not affect the result.

No known research discusses white text printed on reflective material, so the work presented here is compared with relevant systems presented in other literature [11–13, 44–51]. The comparison with those systems (as reported by their authors), mostly from the last five years, is shown in Table 11 and demonstrates that the presented system produces better results, especially when compared with other related, same-language recognition systems such as [12, 44–48, 51].

Table 11: Comparison with other systems.

5. Conclusion

In this paper, we presented a complete system for automatic detection and recognition of dotted text in clear reflected material images. The paper also presents novel tools for extracting text from an image with the following qualities: (a) poor background contrast; (b) white, curved, and/or differing fonts or character widths between sets of images; (c) dotted text printed on curved reflective material; and (d) touching characters. Several improvements to current mechanisms were presented, including methods for (i) text detection and segmentation of touching and/or closely printed characters, (ii) filling of dotted text, and (iii) rotation and curvature handling.

We have successfully applied the proposed methods in developing text localization and character segmentation algorithms for Ozarka and Dasani curved plastic bottled water images. Methods were evaluated using a total of 100 test images containing a variety of texts captured from water bottles. The overall results yielded a recognition rate of 93% for images captured under the specified conditions. The filling algorithm improved text recognition by more than 20% compared with fill methods presented in the literature, and it shows almost 16% improvement over the spline technique and other morphological operations. The main strength of the proposed system lies in its training phase, which does not require any manual segmentation of the data to train the character models.

These tests averaged a processing time of ~10 seconds (using MATLAB R2008A on an HP 8510W with 4 GB of RAM and a 2.3 GHz processor), and experimental results yielded an average recognition rate of 90% to 93% using customized systems generated by the proposed development. We consider that our approach achieves good performance given that the data correspond to real bottle text images; however, it is hard to compare completely with other approaches since there is no similar investigation.

Disclosure

The authors do not have any financial relation with the commercial identities "Ozarka" and "Dasani" mentioned in this paper.

Acknowledgments

The authors would like to thank the associate editor and the anonymous reviewers for their invaluable comments and suggestions, which led to a great improvement of this paper. The authors also gratefully acknowledge Dr. Hani Saleh for his assistance and coding support.

References

1. F. Idris and S. Panchanathan, "Review of image and video indexing techniques," Journal of Visual Communication and Image Representation, vol. 8, no. 2, pp. 146–166, 1997.
2. H. Li, D. Doermann, and O. Kia, "Automatic text detection and tracking in digital video," IEEE Transactions on Image Processing, vol. 9, no. 1, pp. 147–156, 2000.
3. R. Cattoni, T. Coianiz, S. Messoldi, and C. M. Modena, "Geometric layout analysis techniques for document image understanding: a review," ITC-IRST Technical Report #9703-09, 1998.
4. B. A. Yanikoglu, "Pitch-based segmentation and recognition of dot matrix text," International Journal on Document Analysis and Recognition, vol. 3, no. 1, pp. 34–39, 2000.
5. H. Liu, M. Wu, G. F. Jin, and Y. Yan, "A post processing algorithm for the optical recognition of degraded characters," in Document Recognition and Retrieval VI, vol. 3651 of Proceedings of SPIE, pp. 41–48, The International Society for Optical Engineering, San Jose, Calif, USA, January 1999.
6. K. Jung, K. I. Kim, and A. K. Jain, "Text information extraction in images and video: a survey," Pattern Recognition, vol. 37, no. 5, pp. 977–997, 2004.
7. S. Choi, J. P. Yun, and S. W. Kim, "Text localization and character segmentation algorithms for automatic recognition of slab identification numbers," Optical Engineering, vol. 48, no. 3, Article ID 037206, 2009.
8. K. Wang and J. A. Kangas, "Character location in scene images from digital camera," Pattern Recognition, vol. 36, no. 10, pp. 2287–2299, 2003.
9. X. Chen, J. Yang, J. Zhang, and A. Waibel, "Automatic detection and recognition of signs from natural scenes," IEEE Transactions on Image Processing, vol. 13, no. 1, pp. 87–99, 2004.
10. Y. Liu, S. Goto, and T. Ikenaga, "A contour-based robust algorithm for text detection in color images," IEICE Transactions on Information and Systems, vol. E89-D, no. 3, pp. 1221–1230, 2006.
11. B. Zhu and M. Nakagawa, "Segmentation of on-line handwritten Japanese text of arbitrary line direction by a neural network for improving text recognition," in Proceedings of the 8th International Conference on Document Analysis and Recognition, vol. 1, pp. 157–161, September 2005.
12. X. Liu, H. Fu, and Y. Jia, "Gaussian mixture modeling and learning of neighboring characters for multilingual text extraction in images," Pattern Recognition, vol. 41, no. 2, pp. 484–493, 2008.
13. M. A. El-Shayeb, S. R. El-Beltagy, and A. Rafea, "Comparative analysis of different text segmentation algorithms on Arabic news stories," in Proceedings of the IEEE International Conference on Information Reuse and Integration (IRI '07), pp. 441–446, August 2007.
14. D. R. R. Babu, M. Ravishankar, M. Kumar, K. Wadera, and A. Raj, "Degraded character recognition based on gradient pattern," in The 2nd International Conference on Digital Image Processing, Proceedings of SPIE, February 2010.
15. R. C. Gonzalez and R. E. Woods, Digital Image Processing, Prentice Hall, Upper Saddle River, NJ, USA, 2nd edition, 2002.
16. E. A. Silva, K. Panetta, and S. S. Agaian, "Quantifying image similarity using measure of enhancement by entropy," in Mobile Multimedia/Image Processing for Military and Security Applications, vol. 6579 of Proceedings of SPIE, April 2007, Paper #6579-32.
17. E. Wharton, K. Panetta, and S. Agaian, "Human visual system based similarity metrics," in Proceedings of the IEEE International Conference on Systems, Man and Cybernetics (SMC '08), pp. 685–690, October 2008.
18. C. Fang and J. J. Hull, "A modified character-level deciphering algorithm for OCR in degraded documents," in IS&T Conference on Document Recognition II, vol. 2422 of Proceedings of SPIE, pp. 76–83, March 1999.
19. E. Y. Kim, K. Jung, K. Y. Jeong, and H. J. Kim, "Automatic text region extraction using cluster-based templates," in Proceedings of the International Conference on Advances in Pattern Recognition and Digital Techniques, pp. 418–421, 2000.
20. L. Likforman-Sulem and M. Sigelle, "Recognition of broken characters from historical printed books using Dynamic Bayesian Networks," in Proceedings of the 9th International Conference on Document Analysis and Recognition (ICDAR '00), 2000.
21. M. Yokobayashi and T. Wakahara, "Binarization and recognition of degraded characters using a maximum separability axis in color space and GAT correlation," in Proceedings of the 18th International Conference on Pattern Recognition (ICPR '06), pp. 885–888, August 2006.
22. D. J. Granrath, "The role of human visual models in image processing," Proceedings of the IEEE, vol. 69, no. 5, pp. 552–561, 1981.
23. A. Cusmariu, "Method of extracting text present in a color image," United States Patent no. 6519362 B1, 2009.
24. S. Liang, M. Ahmadi, and M. Shridhar, "Segmentation of handwritten interference marks using multiple directional stroke planes and reformalized morphological approach," IEEE Transactions on Image Processing, vol. 6, no. 8, pp. 1195–1202, 1997.
25. Y. K. Chen and J. F. Wang, "Segmentation of single- or multiple-touching handwritten numeral string using background and foreground analysis," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 11, pp. 1304–1317, 2000.
26. E. Wharton, S. Agaian, and K. Panetta, "Comparative study of logarithmic enhancement algorithms with performance measure," in Image Processing: Algorithms and Systems, Neural Networks, and Machine Learning, vol. 6064 of Proceedings of SPIE, January 2006, Paper #6064-12.
27. S. S. Agaian, K. Panetta, and A. M. Grigoryan, "Transform-based image enhancement algorithms with performance measure," IEEE Transactions on Image Processing, vol. 10, no. 3, pp. 367–382, 2001.
28. P. Xiang, Y. Xiuzi, and Z. Sanyuan, "A hybrid method for robust car plate character recognition," Engineering Applications of Artificial Intelligence, vol. 18, no. 8, pp. 963–972, 2005.
29. L. Xu, A. Krzyzak, and C. Y. Suen, "Methods of combining multiple classifiers and their applications to handwriting recognition," IEEE Transactions on Systems, Man and Cybernetics, vol. 22, no. 3, pp. 418–435, 1992.
30. D. Chen, J. Luettin, and K. Shearer, "A survey of text detection and recognition in images and videos," Institut Dalle Molle d'Intelligence Artificielle Perceptive (IDIAP) Research Report IDIAP-RR, 2008.
31. S. N. Srihari, Y. C. Shin, V. Ramanaprasad, and D. S. Lee, "A system to read names and addresses on tax forms," Proceedings of the IEEE, vol. 84, no. 7, pp. 1038–1049, 1996.
32. S. Gopisetty, R. Lorie, J. Mao, M. Mohiuddin, A. Sorin, and E. Yair, "Automated forms-processing software and services," IBM Journal of Research and Development, vol. 40, no. 2, pp. 211–229, 1996.
33. N. Gorski, V. Anisimov, E. Augustin, O. Baret, D. Price, and J. C. Simon, "A2iA Check Reader: a family of bank check recognition systems," in Proceedings of the 5th International Conference on Document Analysis and Recognition, 1999.
34. K. Mohammad, S. Agaian, and F. Hudson, "Implementation of digital electronic arithmetics and its application in image processing," Computers and Electrical Engineering, vol. 36, no. 3, pp. 424–434, 2010.
35. G. Deng, L. W. Cahill, and G. R. Tobin, "Study of logarithmic image processing model and its application to image enhancement," IEEE Transactions on Image Processing, vol. 4, no. 4, pp. 506–512, 1995.
36. S. S. Agaian, "Visual morphology," in Nonlinear Image Processing X, vol. 3646 of Proceedings of SPIE, pp. 139–150, January 1999.
37. B. Wang, X. F. Li, F. Liu, and F. Q. Hu, "Color text image binarization based on binary texture analysis," Pattern Recognition Letters, vol. 26, no. 10, pp. 1568–1576, 2005.
38. C. Thillou and B. Gosselin, "Color binarization for complex camera-based images," in Proceedings of the Electronic Imaging Conference of the International Society for Optical Imaging, pp. 301–308, January 2005.
39. M. Unser, "Splines: a perfect fit for medical imaging," in International Symposium on Medical Imaging: Image Processing (MI '02), Proceedings of SPIE, pp. 225–236, San Diego, Calif, USA, February 2002.
40. http://en.wikipedia.org/wiki/Spline_(mathematics).
41. http://cnx.org/content/m11089/latest.
42. M. Pechwitz and V. Maergner, "Baseline estimation for Arabic handwritten words," in Proceedings of the 8th International Workshop on Frontiers in Handwriting Recognition (IWFHR '02), August 2002.
43. N. Kilic, P. Gorgel, O. N. Ucan, and A. Kala, "Multifont Ottoman character recognition using Support Vector Machine," in Proceedings of the 3rd International Symposium on Communications, Control, and Signal Processing (ISCCSP '08), pp. 328–333, March 2008.
44. Y. J. Song, K. C. Kim, Y. W. Choi et al., "Text region extraction and text segmentation on camera-captured document style images," in Proceedings of the Eighth International Conference on Document Analysis and Recognition, Seoul, Korea, August 2005.
45. S. Sharma, Extraction of Text Regions in Natural Images, Rochester Institute of Technology, Rochester, NY, USA, 2007.
46. D. Chen, H. Bourlard, and J. P. Thiran, "Text identification in complex background using SVM," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, pp. II-621–II-626, December 2001.
47. Z. Saidani, Image and Video Text Recognition Using Convolutional Neural Networks [Ph.D. thesis], LAP Lambert Academic Publishing, Saarbrücken, Germany, 2008.
48. V. Ganapathy and L. W. L. Dennis, "Malaysian vehicle license plate localization and recognition system," Journal of Systemics, Cybernetics and Informatics, vol. 6, no. 1, 2008.
49. X. Li, W. Wang, Q. Huang, W. Gao, and L. Qing, "A hybrid text segmentation approach," in Proceedings of the IEEE International Conference on Multimedia and Expo (ICME '09), pp. 510–513, July 2009.
50. Q. Ye, W. Gao, and Q. Huang, "Automatic text segmentation from complex background," in Proceedings of the International Conference on Image Processing (ICIP '04), pp. 2905–2908, October 2004.
51. J. Gllavata, E. Qeli, and B. Freisleben, "Detecting text in videos using fuzzy clustering ensembles," in Proceedings of the 8th IEEE International Symposium on Multimedia (ISM '06), pp. 283–290, December 2006.