Abstract

The paper addresses the problem of detection and classification of rubber stamp instances in scanned documents. A variety of methods from the fields of image processing and pattern recognition, as well as some heuristics, are utilized. The presented method works on typical stamps of different colors and shapes. For color images, a color space transformation is applied in order to find potential color stamps. Monochrome stamps are detected through shape-specific algorithms. Following the feature extraction stage, identified candidates are subjected to classification using a set of shape descriptors. Selected elementary properties form an ensemble of features which is rotation, scale, and translation invariant; hence this approach is independent of document size and orientation. We perform a two-tier classification in order to discriminate between stamps and no-stamps and then classify stamps in terms of their shape. The experiments carried out on a considerable set of real documents gathered from the Internet showed the high potential of the proposed method.

1. Introduction

Nowadays, computer analysis of digitized documents is one of the key areas of digital image processing and pattern recognition. Computer-based processing of documents joins many fields of research, especially computer vision, image analysis, pattern recognition, and artificial intelligence. The detection and recognition of rubber stamps (seals) on digital images are still very important problems. As we face a significant change in technology, namely, a conversion from paper documents into digital ones, the need for solutions for automatic segmentation and extraction of important document elements is very high. The problem dates back to the 1980s, when the issue of stamp authenticity was first addressed [1]. However, despite considerable progress in this field, the problem still remains open [24].

The literature survey shows that algorithms of stamp detection and recognition employ different features of digital images. Most of the methods can be classified as ones using shape information [4, 5] or ones based on color features [3, 6]. There are only a few algorithms that join features from those domains (e.g., [7]).

While stamp detection can be realized by means of both shape and color features, the problem of stamp classification is purely shape-based. It has been proved many times that shape (silhouette, contour) is one of the most important low-level image features, since it is a very important attribute of human perception [8]. Humans tend to perceive complex scenes as being composed of elementary objects, which can be identified by their shapes. Shape is particularly useful when we want to identify specific objects in an image, for example, in medicine or biology (e.g., cell shapes in microscopic images) [9, 10], in surveillance (e.g., optical character recognition of license plates [11]), and in many other systems where objects that are easy to segment can be determined. While general shape recognition and classification problems have a significant place in the scientific literature, often with reported accuracies of over 90% (e.g., [12, 13]), the specific topic of stamp retrieval by means of low-dimensional feature sets is much less represented (with significantly lower accuracies).

A recent literature study shows that only a few methods oriented at automatic stamp detection and classification have been proposed. They work either on color stamps [3] or detect objects of a particular shape [4, 14]. Unfortunately, there is no general approach aimed at the detection and classification of the whole diversity of stamps. There is also a large group of algorithms oriented at logo retrieval, which is a similar problem [15, 16]. The newest ones include the application of modern shape descriptors, for example, SIFT/SURF/FAST [17–19] or ART [20]. It should be noted, however, that such approaches aimed at logo detection cannot be directly employed for stamp detection because of the different properties of both object classes. The geometrical features that are common for both object classes [15, 21, 22] are heavily influenced by the process of imprinting (noise, inconsistency, gaps, stains, etc.) in the stamp representation. Because of this fact and the diversity of stamps, even within a single class (in terms of shape), a straightforward application of such descriptors is not possible.

In this paper we present a new solution for detecting stamps of different shapes and colors. The experiments showed that they can be extracted properly even if they overlap with a signature or text. Because of the high variance in the shape domain, we focus on rather standardized stamps (with a well-defined shape) of any particular color. In contrast to [3], we do not rely on color properties only; hence the detection of black stamps is also possible. Moreover, the shapes being detected are not limited to ovals and squares, as in [4]. The experiments also showed good results when detecting stamps in documents containing other similar objects such as logos and texts.

1.1. Existing Approaches

As has been noted, algorithms of stamp detection and recognition can employ different features of digital images. These features are based, in most cases, on color or edge information. The algorithms proposed in [3, 6] use color representation. Candidate areas are extracted using the projection of the $C_b$ and $C_r$ matrices onto both axes. As an effect, a binary mask of bounding boxes (instead of the original areas) is generated. Unfortunately, this negatively influences further processing aimed at determining their shapes. Although the reported detection accuracy for color stamps is rather high (over 83%), such a solution cannot be employed in the case of grayscale images or black-colored stamps.

The method proposed in [23] allows detecting stamps even in highly degraded documents. It does not take the color information into consideration, since the main principle of the method is the use of so-called connected edges that create the contour of the stamp. In the first step, an input image is subjected to scaling and edge detection. Then, candidate regions are evaluated using a set of rules that are characteristic of pixels within stamps. Selected elements are finally verified using the Hough Transform. Unfortunately, the detection algorithm can deal with circular- and oval-shaped stamps only. The reported accuracy is higher than 35%, which is not sufficient for practical implementations.

Another method that employs edge information is presented in [5]. It uses a two-stage approach. The first step is based on letter detection: the letters, being often a part of a stamp, have specific spatial properties, which are exploited by the method. The second stage involves the use of the Generalized Hough Transform (GHT) in order to verify the result obtained in the first step. The GHT has a clear advantage over the basic Hough Transform, since it allows finding arbitrary shapes. The experiments performed by the authors showed that the accuracy is close to 70%.

The solution proposed in [15] addresses the problem of logo detection, which can be considered similar to the tasks of stamp detection and recognition. The proposed approach is divided into three stages: segmentation, in which a document is decomposed into smaller parts (text regions, images, logos, etc.); detection of logotypes based on the features extracted from the output segments; and matching logotypes by means of template matching. The main principles of the method are the projection of the binarized image, the analysis of connected components, and a classification stage. Unfortunately, this solution cannot be directly used for the detection of stamps. The features used by the authors of [15] do not allow achieving equally good classification accuracy in the case of stamps, which differ significantly from logotypes.

A whole group of methods is based on recently proposed features, such as the Scale-Invariant Feature Transform (SIFT) or Speeded Up Robust Features (SURF). In [17] a fast and efficient logo detector is proposed. The solution involves the use of feature vectors obtained by SURF, which detects a set of local features for each key point in the image. The final features are reduced using Principal Component Analysis. The reference database contains a large collection of samples. Despite quite high complexity, the accuracy of the proposed algorithm is equal to 67%.

The algorithm proposed in [20] is also a multitier approach. Firstly, it performs noise reduction (median filtering) on the input image. Then the image is binarized and subjected to horizontal dilation. Extracted candidate regions are described using a set of geometrical features and verified using a decision tree and a set of rules. After successful detection, logos are classified using a $k$-Nearest Neighbor classifier. The reported accuracy is equal to 92%; however, the method works only on logotypes that are not overlapped by text or any other objects.

It is worth noticing that in the method presented in this paper we join several elementary approaches in order to create a more flexible and more accurate algorithm.

1.2. Data Characteristics

The problem with stamp detection and classification comes mainly from the lack of templates or general models. An in-depth analysis shows that this is due to the absence of a standard and commonly used stamp representation. Stamps are complex objects, containing graphical and textual elements that can be located anywhere within documents. Their diversity comes from varying orientation, color, fonts, ornaments, and quality of imprinting. Even two imprints of the same physical stamp can look very different. The same applies to other objects that were described in [21]. Taking the above limitations into consideration, the object's shape is considered one of the most stable features in terms of classification. Exemplary stamps extracted from real documents and divided into shape-specific classes are provided in Figure 1. The last row of Figure 1 shows stamps that cannot be categorized in terms of shape. Such stamps are mostly unofficial and less common.

Color information is very important when it comes to document analysis, since original documents in most cases include color stamps. Black stamps are less frequently used for official stamping, and the lack of color in many cases suggests a copy (not an original document). It is also worth noticing that stamps on official documents do not cover a large area (often not more than 3%–5% of the total image area for an A4 page) [24]. The above observation can ease the stage of stamp detection.

2. Algorithm Overview

The input for the proposed approach is a scanned or photographed (paper) document. In the scanning process the imaging plane is parallel to the document plane; hence the geometrical features of all elements are preserved. In order to reduce artifacts, it is also recommended that the input image has a rather high spatial resolution and is stored in a file format that provides minimal loss of quality after decompression. The algorithm presented further is developed to work on both color and monochrome images. It is also worth noticing that the input images are not subjected to any brightness alteration (neither histogram stretching nor equalization). Moreover, we do not assume the number of stamps, nor their location and orientation. The output provides information about the number of stamps detected, their shapes, colors, and coordinates.

The algorithm (see Figure 2) consists of the following steps (a schematic sketch is given after this list).

(1) Load an image.
(2) Detect candidates (in monochrome images or in each color channel in case of a color image):
(i) detect lines;
(ii) detect circles;
(iii) detect other shapes (bwBlobs or colorBlobs).
(3) Integrate locations and dimensions.
(4) Verify stamps/no-stamps.
(5) Classify and report the results.

Firstly, in the case of color images, we perform a color conversion and work on the $C_b$ and $C_r$ planes in an independent manner [6]. Then, in order to detect areas of potential stamps, we look for elementary image structures, like lines, circles, and any other consistent areas. Next, we obtain candidate areas which are later verified using a number of features that characterize specific shapes and classify them into stamps and no-stamps (objects with similar shape, yet with different raster features). The further recognition (in terms of shape) is performed using shape descriptors and dedicated classification methods.
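To make the control flow concrete, the outline below composes these steps as a single function. This is a hypothetical sketch, not the authors' code: the detector, verifier, and classifier callables are placeholders for the modules described in the following sections, and the integration of overlapping regions is omitted.

```python
import cv2


def split_channels(bgr):
    """Return the Y, Cb, and Cr planes of a BGR image (see Section 2.1)."""
    ycc = cv2.cvtColor(bgr, cv2.COLOR_BGR2YCrCb)   # OpenCV orders Y, Cr, Cb
    return ycc[..., 0], ycc[..., 2], ycc[..., 1]


def detect_stamps(image_path, detectors, verify, classify):
    """Steps (1)-(5) of the algorithm; `detectors`, `verify`, and `classify`
    are callables supplied by the modules sketched in later sections."""
    image = cv2.imread(image_path, cv2.IMREAD_COLOR)    # (1) load an image
    candidates = []
    for plane in split_channels(image):                 # (2) detect candidates
        for detect in detectors:                        # lines, circles, blobs
            candidates.extend(detect(plane))
    regions = candidates                                # (3) integration of
                                                        # overlaps omitted here
    stamps = [r for r in regions if verify(r)]          # (4) stamp/no-stamp
    return [(r, classify(r)) for r in stamps]           # (5) classify, report
```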

Several experiments on color-based detection have also been described in our previous works [6, 25].

2.1. Image Preprocessing

It is known that the RGB color space is not optimal in terms of color image segmentation due to the correlation between channels [6]. Hence we convert the input image into the $YC_bC_r$ representation. Since the colors of stamps are mainly in the blue or red range (according to our observations, less than 4% of stamps are represented by other colors), this color space is especially usable because of its red/blue separation. What is more, it is the native color format for JPEG/JFIF files. In the general case, we assume that the input image of a document is stored in a file with possibly lossless compression, high spatial resolution, and full color range (24-bit RGB). The input image is converted into the $YC_bC_r$ color space (ITU-R BT.709 standard) in order to expose the above color properties:
$$\begin{bmatrix} Y \\ C_b \\ C_r \end{bmatrix} = \begin{bmatrix} 0.2126 & 0.7152 & 0.0722 \\ -0.1146 & -0.3854 & 0.5000 \\ 0.5000 & -0.4542 & -0.0458 \end{bmatrix} \begin{bmatrix} R \\ G \\ B \end{bmatrix} + \begin{bmatrix} 0 \\ 128 \\ 128 \end{bmatrix},$$
where $Y$, $C_b$, $C_r$ and $R$, $G$, $B$ are the appropriate color components.

According to the above observations, we examine each channel $Y$, $C_b$, and $C_r$ in an independent manner (see Figure 3).
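A minimal conversion routine consistent with the BT.709 matrix above might look as follows; the +128 chroma offset for 8-bit storage is an assumption of this sketch.

```python
import numpy as np

# BT.709 analysis matrix for full-range 8-bit data (coefficients as in the
# equation above).
M709 = np.array([[ 0.2126,  0.7152,  0.0722],
                 [-0.1146, -0.3854,  0.5000],
                 [ 0.5000, -0.4542, -0.0458]])


def rgb_to_ycbcr(rgb):
    """rgb: H x W x 3 uint8 array -> Y, Cb, Cr planes as float arrays."""
    ycc = rgb.astype(np.float64) @ M709.T
    ycc[..., 1:] += 128.0                 # centre the chroma planes
    return ycc[..., 0], ycc[..., 1], ycc[..., 2]
```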

As the next preprocessing stage, the above matrices are filtered using a simple averaging mask in order to reduce noise. The filter uses a mask whose size is equal to 3% of the input image's shorter edge. This ensures favorable results in terms of computation cost versus quality.

In the next step, each image is binarized and consistent areas are determined. Additionally, before filling holes, each area is subjected to a morphological operation, region growing. The binary image is then labeled and each candidate region is passed to the module responsible for stamp detection (verification). Our previous experiments [6] also showed that stamps are objects not smaller than 5% of the shorter edge of the input image and not larger than 40% of its longer edge; hence we look for such areas only.
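A sketch of this preprocessing chain is given below. Otsu thresholding and a connected-component filter stand in for the binarization and labeling described above; the 3%, 5%, and 40% constants come from the text, everything else is an assumption.

```python
import cv2
import numpy as np


def preprocess_plane(plane):
    """Average, binarize, and size-filter one channel (sketch of Section 2.1)."""
    h, w = plane.shape
    k = max(3, int(0.03 * min(h, w)) | 1)         # odd mask, ~3% of shorter edge
    smoothed = cv2.blur(plane.astype(np.uint8), (k, k))
    _, binary = cv2.threshold(smoothed, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    n, labels, stats, _ = cv2.connectedComponentsWithStats(binary)
    lo, hi = 0.05 * min(h, w), 0.40 * max(h, w)   # size limits from the text
    keep = [i for i in range(1, n)                # label 0 is the background
            if lo <= min(stats[i, cv2.CC_STAT_WIDTH],
                         stats[i, cv2.CC_STAT_HEIGHT])
            and max(stats[i, cv2.CC_STAT_WIDTH],
                    stats[i, cv2.CC_STAT_HEIGHT]) <= hi]
    return binary, labels, keep
```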

In the case presented below (Figure 3), several potential areas are detected and passed to the verification/classification stage. The presented binarized image is obtained as a superposition of the thresholded $C_b$ and $C_r$ channels; hence it contains areas of high blue and red intensity.

2.2. Stamp Detection

Different stamp shapes are detected by independent modules (see Figure 2). Firstly, the input image with dimensions of $W \times H$ pixels is binarized using adaptive thresholding [26], so that it contains a number of closed areas.

2.2.1. Circular Shapes

Detection of circles is based on the Circular Hough Transform (CHT) performed on the intensity image. At first, the image is binarized, then edge detection using Canny filtering is performed, and then morphological opening is employed in order to eliminate noise. On such an image, the classical Hough Transform is used. Using a voting strategy, it finds circles of different radii. The details of the algorithm can be found, for example, in [27]. In our experiments we used radii from the interval [15, 120]. Exemplary results of circular stamp detection are presented in Figure 4.
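A compact substitute for this module can be built on OpenCV's gradient-based Hough circle detector, which performs the Canny step internally. Only the radius interval [15, 120] comes from the text; the remaining parameter values are illustrative assumptions.

```python
import cv2
import numpy as np


def detect_circles(gray):
    """Find circular candidates in a grayscale plane (sketch of Section 2.2.1)."""
    denoised = cv2.medianBlur(gray, 5)               # mild noise suppression
    circles = cv2.HoughCircles(denoised, cv2.HOUGH_GRADIENT,
                               dp=1, minDist=40,
                               param1=100,           # Canny high threshold
                               param2=30,            # accumulator threshold
                               minRadius=15, maxRadius=120)
    return [] if circles is None else np.uint16(np.around(circles[0]))
```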

2.2.2. Shapes Consisting of Straight Lines

For the extraction of shapes containing straight lines (e.g., squares, rectangles, and triangles), we employ a line detector based on the Hough Transform. It works on an image containing pixel gradients. Firstly, the input image is averaged in order to remove noise. Then, we select pixels of high intensity (above a certain threshold), and for each one we perform the following sequence of operations. Each pixel is represented in the parametric Hough space as a sinusoid. After checking all selected pixels we get a bundle of intersecting curves, and the values at the intersections give information about the possible lines in the original image. Since the values of both slope and intercept in the original Hough parametrization are unbounded and the slope value for vertical lines is huge, the application of that original technique is complicated. Hence we use the approach presented in [28], which is an alternative to the original Hough parametrization.

Then we extend each detected line by 20 pixels at both of its ends in order to create a closed area, which is later filled. Afterwards we eliminate the introduced extensions. Exemplary results for stamps containing straight lines in their contours are presented in Figure 5.
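The sketch below approximates this module with OpenCV's probabilistic Hough line detector (using the rho-theta parametrization mentioned above). The 20-pixel extension follows the text; the detector thresholds are assumptions, and the removal of the extensions afterwards is omitted.

```python
import cv2
import numpy as np


def detect_line_shapes(gray, ext=20):
    """Detect line segments, extend them, and fill the closed contours."""
    edges = cv2.Canny(gray, 50, 150)
    segments = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180,
                               threshold=80, minLineLength=40, maxLineGap=5)
    mask = np.zeros_like(gray)
    if segments is None:
        return mask
    for x1, y1, x2, y2 in segments[:, 0]:
        d = np.hypot(x2 - x1, y2 - y1)
        ux, uy = (x2 - x1) / d, (y2 - y1) / d       # unit direction vector
        p1 = (int(x1 - ext * ux), int(y1 - ext * uy))
        p2 = (int(x2 + ext * ux), int(y2 + ext * uy))
        cv2.line(mask, p1, p2, 255, thickness=2)    # extended segment
    # fill the closed areas formed by the extended lines
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    cv2.drawContours(mask, contours, -1, 255, thickness=cv2.FILLED)
    return mask
```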

2.2.3. Nonregular Shapes

In order to extract any other shapes we introduce a heuristic approach presented below. It uses the results of our observations (based on the analysis of our document database) and very basic algorithms from the image processing area. For each consistent area (blob) found in the image, its bounding box is determined. Its area is given as $S = W \cdot H$, where $W$ and $H$ are its width and height, respectively. We assume that the background pixels are zeros; hence, additionally, we calculate the number of non-zero pixels in each area, which gives information about the complexity of each blob. Candidates that satisfy three conditions are passed to further processing (a sketch of this filter is given below): the bounding box area $S$ must fall into the range (400, 5000) pixels, while the remaining two conditions constrain the number of non-zero pixels with respect to $S$. The area constraint is not normalized by any relative value and is used just as a coarse filter, since the other two conditions could easily select objects smaller or larger than expected.
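Only the (400, 5000) bounding-box area range is given explicitly in the text, so in the sketch below the fill-ratio bounds are illustrative placeholders for the remaining two conditions.

```python
import cv2


def candidate_blobs(binary):
    """Bounding-box filter of Section 2.2.3 (the 0.1/0.9 bounds are assumed)."""
    n, labels, stats, _ = cv2.connectedComponentsWithStats(binary)
    kept = []
    for i in range(1, n):                    # label 0 is the background
        w = stats[i, cv2.CC_STAT_WIDTH]
        h = stats[i, cv2.CC_STAT_HEIGHT]
        area_bb = w * h                      # S = W * H
        pixels = stats[i, cv2.CC_STAT_AREA]  # non-zero pixels in the blob
        if 400 < area_bb < 5000 and 0.1 < pixels / area_bb < 0.9:
            kept.append(i)
    return kept
```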

An alternative way of detecting stamps, not taking their geometrical properties into consideration, was presented in [24]. However, that approach rejects longitudinal blobs and promotes objects of high contrast only.

2.3. Stamp/No-Stamp Verification

Extracted objects are verified using a set of 11 object characteristics captured in the spatial domain. We compute seven direct features: average pixel intensity $\mu$, intensity standard deviation $\sigma$, median intensity, pixel contrast, brightness-to-contrast ratio, intensity of edges, and brightness-to-edge-intensity ratio, as well as four features calculated from the Gray-Level Co-occurrence Matrix (GLCM) [29]: contrast, correlation, energy, and homogeneity.

Firstly, the image intensity matrix of size $W \times H$ containing a single blob is converted into a vector $v$ of length $n = W \cdot H$. The aforementioned features (aside from those which are self-explanatory) are calculated as follows. The average intensity and its standard deviation are given as
$$\mu = \frac{1}{n}\sum_{i=1}^{n} v_i, \qquad \sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(v_i - \mu\right)^2}.$$
The median value is calculated over the sorted vector $v$, so that $v_1 \le v_2 \le \dots \le v_n$; the result is equal either to the middle element or to the mean of the two middle elements, depending on the parity of $n$. The calculation of the remaining features depends on the edge detection operator; in our case the Sobel operator was applied [30]. The intensity of edges is then computed as the average magnitude of the resulting edge map.
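The seven direct features could be computed as follows; where the text leaves a definition open (pixel contrast and the two ratios), the formulas below are my assumptions rather than the authors' definitions.

```python
import numpy as np
from scipy import ndimage


def direct_features(blob):
    """Seven spatial features of Section 2.3 for one intensity blob."""
    v = blob.astype(np.float64).ravel()
    mu, sigma = v.mean(), v.std()
    med = np.median(v)                       # middle element(s) of sorted v
    contrast = v.max() - v.min()             # assumed contrast definition
    sx = ndimage.sobel(blob.astype(np.float64), axis=1)
    sy = ndimage.sobel(blob.astype(np.float64), axis=0)
    edges = np.hypot(sx, sy).mean()          # mean Sobel gradient magnitude
    eps = 1e-9                               # guard against division by zero
    return [mu, sigma, med, contrast, mu / (contrast + eps),
            edges, mu / (edges + eps)]
```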

The GLCM of size $G \times G$ is created by calculating how often a pixel of intensity $i$ has a pixel of intensity $j$ in its closest horizontal neighbourhood. This neighbourhood can be changed via an offset parameter $(\Delta x, \Delta y)$; for the closest horizontal neighbour, $\Delta x = 1$ and $\Delta y = 0$. The matrix size $G$ is determined by the number of gray levels; in our case $G = 8$, which means that all values in the image matrix are scaled to the interval [1, 8]. The GLCM is calculated according to the following formula [31]:
$$P(i,j) = \sum_{x=1}^{W}\sum_{y=1}^{H} \begin{cases} 1, & \text{if } I(x,y) = i \text{ and } I(x+\Delta x, y+\Delta y) = j, \\ 0, & \text{otherwise.} \end{cases}$$
The contrast of the GLCM is given as [31]
$$C = \sum_{i=1}^{G}\sum_{j=1}^{G} |i-j|^2\, P(i,j).$$
The energy is calculated as [31]
$$E = \sum_{i=1}^{G}\sum_{j=1}^{G} P(i,j)^2.$$
The homogeneity is given as [31]
$$H_m = \sum_{i=1}^{G}\sum_{j=1}^{G} \frac{P(i,j)}{1+|i-j|}.$$
The correlation of the GLCM is calculated as [31]
$$R = \sum_{i=1}^{G}\sum_{j=1}^{G} \frac{(i-\mu_i)(j-\mu_j)\,P(i,j)}{\sigma_i \sigma_j},$$
where $\mu_i$, $\mu_j$ and $\sigma_i$, $\sigma_j$ are the means and standard deviations calculated in the row and column directions, respectively.
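The four GLCM features can be obtained with scikit-image as a stand-in for the formulas above; the requantization to 8 levels and the horizontal offset follow the text.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops


def glcm_features(blob):
    """GLCM contrast, correlation, energy, and homogeneity of one blob."""
    levels = 8
    q = (blob.astype(np.float64) / max(float(blob.max()), 1.0)
         * (levels - 1)).astype(np.uint8)      # requantize to 8 gray levels
    glcm = graycomatrix(q, distances=[1], angles=[0],   # dx=1, dy=0 offset
                        levels=levels, symmetric=False, normed=True)
    return [float(graycoprops(glcm, p)[0, 0])
            for p in ("contrast", "correlation", "energy", "homogeneity")]
```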

3. Stamps Shape Features

In this paper we focus on the low-level image features employed in classification, as this seems to us the most challenging problem. Such low-level properties can be easily computed and stored in limited memory. These two criteria are crucial when it comes, for example, to hardware implementation. Moreover, the features presented below were selected since they have proven their abilities in other computer vision tasks.

In the case of stamps, shape is one of the most valuable properties. The shape of a stamp used for recognition can be considered a binary object stored in a matrix, represented either by a specific number of points including its interior, or by its boundary (outer contour). A compact representation of shape is often called a shape descriptor. For recognition or classification it is crucial that a descriptor uniquely characterizes the shape and stays invariant to as many transformations as possible (i.e., translation, scaling, rotation, presence of noise, and occlusion). These distortions are considered as differences between the object under recognition and the reference object belonging to the same class, stored in a database. In the practical recognition of stamps, one has to take into consideration distortions from three main categories. The first one includes spatial transformations of an object, mainly translation, rotation in the image plane, and change of scale. The second category includes distortions introduced by the imaging system, for example, a variable number of captured points, presence of noise, discontinuity of contour, and occlusion. The third category of problems comes from contour representation and contour evaluation. The elements of the second group are the most challenging and difficult to solve.

Shape descriptors can be classified in various ways. The first taxonomy is based on the earlier-mentioned difference between the object boundary and the whole shape. The second very popular classification (as described in [32]) divides descriptors into global approaches (shape represented as a whole) and structural methods (set of primitives). The third one discriminates between spatial and transform domains [13]. Since shape is one of the most important features in problems related to content-based image retrieval, there are many known shape representations and retrieval methods; however, most of these methods neither represent shape in a sufficiently precise manner nor are relatively easy to match. Among them, methods based on moments, polar coordinate representations, and histograms of distances from the centroid achieve both good representation and easy normalization.

As mentioned earlier, there are two main classes of shape descriptors which capture different features, namely, region-based and contour-based representations. In region-based techniques, all pixels within a shape are taken into consideration to obtain the final shape representation. Most region-based methods employ different variants of moment calculation. Contour-based shape representations exploit shape boundary information. Such methods can be classified into global shape descriptors, shape signatures, and spectral descriptors. Although global descriptors such as area, circularity, eccentricity, and axis orientation are simple to compute and robust, they can only discriminate shapes with large dissimilarities and are therefore usually suitable for filtering purposes. Most shape signatures, such as complex coordinates, curvature, and angular representations, are essentially local representations of shape features; they are sensitive to noise and not robust. In addition, shape representation using shape signatures requires intensive computation during similarity calculation, due to the hard normalization required for rotation invariance. As a result, these representations often need further processing. On the other hand, there are plenty of methods (e.g., MPEG-7 descriptors [33]) that are reported to be very efficient in shape retrieval. Unfortunately, their computational complexity is not justified for the task addressed in this paper.

3.1. Simple Scalar Features

In the shape analysis module we calculate several elementary measures that are later evaluated at the decision stage. These simple properties are concatenated into a single vector having 19 elements [34]. All of them use the following common measures ($l_{min}$: minor axis length, $l_{max}$: major axis length, $A$: object area defined by the number of pixels, and $P$: object perimeter), which are later employed to build more complex characteristics. These characteristics were chosen intentionally, since they give maximal discriminative power in terms of shape classification. All of them are normalized to the interval [0, 1]. It should also be noted that, in the case of a bounding box (associated with an analysed blob) that is not rotated, its width $W$ and height $H$ correspond to $l_{min}$ and $l_{max}$.

3.1.1. Roundness

It is computed as the average value of the three following measures $R_1$, $R_2$, and $R_3$:
$$R_1 = 1 - \frac{l_{max} - l_{min}}{l_{max}}, \qquad R_2 = \frac{A}{\pi \left( l_{max}/2 \right)^2}, \qquad R_3 = \frac{P}{\pi\, l_{max}},$$
where $R_1$ expresses the normalized difference between the longest and shortest diameter; $R_2$ incorporates information about the object's area measured in two different ways, as the number of pixels belonging to the area and as calculated according to the geometrical formula; $R_3$ is analogous to $R_2$, but the perimeter is used instead of the area.

3.1.2. Squareness

Squareness is a measure of the similarity between the analysed object and a perfect square, calculated from the common measures defined above.

3.1.3. Number of Vertices

The number of vertices is calculated as a function of the object's extreme points (in the geometrical sense); that is, if an object contains three extremes, it is considered a triangle. Our observations show, however, that binary objects extracted from real documents are often noisy or distorted; hence finding extremes is difficult, and the number of detected extremes can differ from the actual one. In order to count the extremes, the input object is binarized; then we select all potential extreme points (their coordinates) and create a new binary matrix filled with zeros, putting ones in the cells that match the found coordinates. Further, we cluster the points into groups based on their spatial location: we perform a dilation procedure in order to fill gaps and create groups of neighboring pixels. Then, based on those groups, the number of extremes is computed (a sketch of this procedure is given below).
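The sketch below follows the mark-dilate-cluster procedure described above; using convex-hull points as the candidate "extremes" and the 9-pixel kernel are assumptions of this illustration.

```python
import cv2
import numpy as np


def count_vertices(binary_blob):
    """Estimate the number of extreme points of a single binary object."""
    contours, _ = cv2.findContours(binary_blob, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    hull = cv2.convexHull(max(contours, key=cv2.contourArea))
    marks = np.zeros_like(binary_blob)
    for (x, y) in hull[:, 0]:
        marks[y, x] = 255                    # ones at extreme coordinates
    kernel = np.ones((9, 9), np.uint8)
    merged = cv2.dilate(marks, kernel)       # fill gaps between near points
    n, _ = cv2.connectedComponents(merged)   # cluster neighbouring pixels
    return n - 1                             # subtract the background label
```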

3.1.4. Aspect Ratio

It is a very elementary property calculated as the proportion of the object's width and height (in a manner similar to the one presented in Section 2.2):
$$A_R = \frac{\min(W, H)}{\max(W, H)}.$$

3.1.5. Extent

This property represents the ratio of the number of pixels belonging to the object to the total number of pixels inside the rectangle circumscribed around it:
$$E_x = \frac{A}{W \cdot H}.$$

3.1.6. Moment-Based Properties

Further measures include ellipticity, elliptic variance, circular variance, a triangularity coefficient (based on central moments), and the minimal bounding figure, a coefficient defining the ratio between the area of the smallest ideal shape wrapped around the object and the object's area. They involve the use of moment invariants (of second order), where each contour point $p_i$ is described by its coordinates $(x_i, y_i)$. The centroid is calculated as the mean of all point coordinates, $\bar{p} = \frac{1}{N}\sum_{i=1}^{N} p_i$, and the mean radius is equal to $\mu_r = \frac{1}{N}\sum_{i=1}^{N} \| p_i - \bar{p} \|$.

Ellipticity proposed in [35] is calculated using the following affine moment invariant:
$$I_1 = \frac{\mu_{20}\,\mu_{02} - \mu_{11}^2}{\mu_{00}^4},$$
where $\mu_{pq}$ are central moments. The value of $I_1$ for a unit-radius circle, $\frac{1}{16\pi^2}$, allows computing the score for a perfect figure. The ellipticity value ranges over [0, 1] and is computed as follows [35]:
$$E_I = \begin{cases} 16\pi^2 I_1, & \text{if } I_1 \le \frac{1}{16\pi^2}, \\[4pt] \frac{1}{16\pi^2 I_1}, & \text{otherwise.} \end{cases}$$

Circular variance, as a measure of the similarity between the analysed object and a circle, was described in [36]. It is calculated using the following formula:
$$c_{var} = \frac{1}{N \mu_r^2} \sum_{i=1}^{N} \left( \| p_i - \bar{p} \| - \mu_r \right)^2.$$

Elliptic variance, responsible for describing ellipses, employs the following formula [36]:
$$e_{var} = \frac{1}{N \mu_{rc}^2} \sum_{i=1}^{N} \left( \sqrt{(p_i - \bar{p})^T C^{-1} (p_i - \bar{p})} - \mu_{rc} \right)^2,$$
where $C$ is the covariance matrix of the contour points, while $\mu_{rc} = \frac{1}{N}\sum_{i=1}^{N} \sqrt{(p_i - \bar{p})^T C^{-1} (p_i - \bar{p})}$.

Triangularity is based on the moment invariant $I_1$ [35], which for an ideal right-angled triangle is equal to $\frac{1}{108}$; thus the triangularity of an analysed object is computed as follows (again, giving values from the interval [0, 1]):
$$T_I = \begin{cases} 108\, I_1, & \text{if } I_1 \le \frac{1}{108}, \\[4pt] \frac{1}{108\, I_1}, & \text{otherwise.} \end{cases}$$
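Both measures follow directly from the central moments; a minimal sketch using OpenCV's moment computation:

```python
import math

import cv2


def ellipticity_triangularity(binary_blob):
    """Ellipticity and triangularity scores from the invariant I1 above."""
    m = cv2.moments(binary_blob, binaryImage=True)
    i1 = (m["mu20"] * m["mu02"] - m["mu11"] ** 2) / m["m00"] ** 4
    ell = (16 * math.pi**2 * i1 if i1 <= 1 / (16 * math.pi**2)
           else 1 / (16 * math.pi**2 * i1))
    tri = 108 * i1 if i1 <= 1 / 108 else 1 / (108 * i1)
    return ell, tri
```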

The minimal bounding figure coefficient is calculated as the ratio of the object's actual area to the area of an ideal figure circumscribed around it. The method of calculation is similar to [35]: the algorithm checks the similarity between each perfect figure (one of five predefined classes) and the analysed object.

3.1.7. Other Properties

In the shape analysis we also employ a coefficient related to the object's area, understood as the number of pixels belonging to it. The following group of geometrical features is calculated on the basis of the object's area $A$, its shortest radius (span) $r_{min}$, and its longest radius (span) $r_{max}$. If the difference $r_{max} - r_{min}$ is small enough (in our case, lower than 10 pixels), we assume that the object has a circular shape. On the other hand, an object is considered to be a square if the value of $k_s$ from the following equation is close to one:
$$k_s = \frac{A}{2\, r_{max}^2}.$$
An ellipse is described by the coefficient
$$k_e = \frac{A}{\pi\, r_{max}\, r_{min}};$$
if it is close to one, the object is recognized as an ellipse. In the same manner, coefficients related to other shapes, like the rectangle, triangle, and rhombus (which, for classification purposes, is treated as a square), are calculated by comparing the measured area with the area of the corresponding ideal figure spanned by $r_{min}$ and $r_{max}$.

All of the above scalar features are joined into a single vector and further used at the classification stage. The feature values for ideal shapes (circle, ellipse, rectangle, square, and triangle) and the mean values for shapes from our experimental database are presented in Figure 6. In each plot, the horizontal axis relates to the feature number, while the vertical axis relates to the feature value. The ideal shapes were extracted from images created in painting software, while the mean shapes are averaged objects from our database.

3.2. Vector Features

For comparison we selected three popular vector shape descriptors: the shape signature, the Fourier descriptor, and the point distance histogram. All of them have been successfully applied in many shape recognition tasks; hence the comparison is justified.

3.2.1. Shape Signature

Shape signature (SSig) is one of the most popular representations belonging to the contour-based class of descriptors. There are several variants of SSig which employ different features [32]. Here, we use the so-called Centroid Distance Function (CDF). It is easy to calculate, and after some normalization it can be made invariant to scaling and rotation. However, it should be noted that, in addition to the high matching cost, shape signatures are sensitive to noise, and slight changes in the boundary can cause large errors in matching. Hence, shape signatures should be stored in a reduced form. In this paper we calculate SSig according to the following algorithm (a sketch is given below).

(1) Calculate the centroid of the object.
(2) Detect the outer contour of the object and store its coordinates in a polar system of coordinates $(\rho, \theta)$.
(3) Find the maximal distance $\rho_{max}$ and perform a circular shift so that the element related to this maximum occupies the first position in the vector.
(4) Discard the information about $\theta$; hence remember only the distances $\rho$.
(5) Normalize the vector to the maximal value $\rho_{max}$.
(6) Interpolate the vector containing $\rho$ to the final length of $N$ elements.

The shape signature feature vectors with 360 bins for ideal shapes (circle, ellipse, rectangle, square, and triangle) and mean vectors for shapes from our experimental database are presented in Figure 7. In each plot, the horizontal axis relates to the angle between the radius vector and the horizontal direction, while the vertical axis relates to the distance from the origin. The descriptors for ideal shapes were extracted from images created in painting software, while the mean shapes are averaged objects from our database.
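A direct transcription of steps (1)-(6), assuming the contour is given as an (N, 2) array of boundary points ordered along the contour:

```python
import numpy as np


def shape_signature(contour, n_bins=360):
    """Centroid Distance Function signature of one contour."""
    centroid = contour.mean(axis=0)                     # (1) centroid
    d = np.linalg.norm(contour - centroid, axis=1)      # (2) rho per point
    d = np.roll(d, -int(np.argmax(d)))                  # (3) max distance first
    d /= d.max()                                        # (5) normalize to rho_max
    x_old = np.linspace(0.0, 1.0, len(d))
    x_new = np.linspace(0.0, 1.0, n_bins)
    return np.interp(x_new, x_old, d)                   # (6) resample to N bins
```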

3.2.2. Fourier Descriptor

There is a whole family of descriptors called Fourier descriptors (FDs). Different shape signatures have been exploited to derive such descriptors, and it should be noticed that FDs derived from different signatures can have significantly different effects on the retrieval result [37]. In this paper we calculate the FD according to the following algorithm (a sketch is given below).

(1) Calculate the centroid of the object.
(2) Detect the outer contour of the object and store its coordinates as complex numbers, where the $x$ coordinate is the real part and the $y$ coordinate is the imaginary part.
(3) Perform the fast Fourier transform (FFT) on these values.
(4) Normalize the FFT spectrum to its maximal magnitude.
(5) Remember the first $N$ elements (related to low-frequency components).

The Fourier descriptor feature vectors with 16 bins for ideal shapes and mean vectors for our experimental database are presented in Figure 8. In each plot, the horizontal axis relates to the spectral component, while the vertical axis relates to its amplitude. The descriptors for ideal shapes were extracted from images created in painting software, while the mean objects were averaged over the whole database.
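A matching sketch of the FD computation; centring on the centroid provides the translation normalization of step (1).

```python
import numpy as np


def fourier_descriptor(contour, n_coeffs=16):
    """Fourier descriptor of one contour given as an (N, 2) point array."""
    z = contour[:, 0] + 1j * contour[:, 1]   # (2) x real part, y imaginary part
    z -= z.mean()                            # (1) centre on the centroid
    spectrum = np.abs(np.fft.fft(z))         # (3) FFT magnitudes
    spectrum /= spectrum.max()               # (4) normalize to max magnitude
    return spectrum[:n_coeffs]               # (5) keep low-frequency components
```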

3.2.3. Point Distance Histogram

The point distance histogram (PDH) is a histogram of contour point distances represented in the polar system of coordinates. The algorithm for calculating the PDH is as follows [21] (a sketch is given below).

(1) Calculate the centroid of the object.
(2) Detect the outer contour of the object and store its coordinates in a polar system of coordinates $(\rho, \theta)$.
(3) Discard the information about $\theta$; hence remember only the distances $\rho$.
(4) Calculate the histogram of distances with $N$ bins.
(5) Normalize the histogram to its maximal value.
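And a corresponding sketch of the PDH, under the same contour-array assumption as above:

```python
import numpy as np


def point_distance_histogram(contour, n_bins=360):
    """PDH of one contour: histogram of centroid distances."""
    centroid = contour.mean(axis=0)                  # (1) centroid
    d = np.linalg.norm(contour - centroid, axis=1)   # (2)-(3) keep rho only
    hist, _ = np.histogram(d, bins=n_bins)           # (4) N-bin histogram
    return hist / hist.max()                         # (5) normalize to max
```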

The PDH feature vectors with 360 bins for ideal shapes and mean vectors for our experimental database are presented in Figure 9. In each plot, the horizontal axis relates to the distance (or distance interval), while the vertical axis relates to the number of points within such a distance. As above, the descriptors for ideal shapes were calculated for images created in painting software, while the mean objects are averaged over our whole database.

4. Experiments

4.1. Experimental Setup

The experiments were aimed at evaluating the performance of both stages of processing, namely, the stamp detector and the stamp shape classifier. Since there are no benchmark databases oriented at this specific problem, the experiments were performed on our own benchmark database consisting of images collected from the Internet. It contains scanned or photographed documents of different origin and variable quality. The details of the database are as follows: documents with no stamps: 294 (41%), with a single stamp: 309 (43%), with multiple stamps: 116 (16%), and with logotypes: 367 (51%). Exemplary documents gathered in the database are presented in Figure 10.

4.2. Stamp/No-Stamp Verification

The experiments on stamp verification were performed on 2925 graphical objects extracted from the above-presented documents. There are 1589 stamps and 1336 no-stamps (logos, ornaments, and large letters) in the set. They were divided into learning and testing subsets according to 10-fold cross-validation.

In order to select the most suitable classifier, we performed several experiments involving state-of-the-art approaches, namely, the Naive Bayes Classifier (NBC), Linear Logistic Regression (LLR), Multilayer Perceptron (MLP), Support Vector Machines (SVM), Classification and Regression Tree (CART), Random Tree (RT), and $k$-Nearest Neighbors ($k$NN) with two settings of $k$. The rates of True Positives and False Positives for the Stamp class are given in Table 1. The analysis of the results showed that the accuracy of SVM is significantly lower than that of the other classifiers (probably due to high variance). On the other hand, the high True Positive rates for NBC and MLP are accompanied by equally high False Positive rates. Based on these results, at the stage of stamp/no-stamp verification we selected $k$NN with $k = 1$ (1NN) as the method with the lowest FP rate. During the experiments employing 1NN, the algorithm was able to recognize 2456 objects correctly, while the remaining 469 were misclassified. For such an experimental setup, the detailed results are presented in Table 2.

4.3. Shape Classification

We performed shape classification on a database created from stamps extracted from the aforementioned documents. The set consists of 2101 binary objects; exemplary objects are presented in Figure 11. For each object we calculated a scalar feature vector of 19 properties and the three vector descriptors shown above. In order to select the most discriminative method we tested several modern classifiers and, for comparison, a simple voting classifier. The latter uses scalar weights equal to one for all simple features, except a subset of the most discriminative ones (among them the minimal bounding triangle, minimal bounding circle, and minimal bounding square coefficients from the minimal bounding figure group), which have weights equal to 3. In Table 3 one can observe a comparison of different classifiers versus selected features. The column Simple contains the results for the simple shape features described in Section 3.1.

As can be observed, the performance of all classifiers in the case of simple scalar features is superior to the selected vector features. It is also visible that classifier ensembles based on nearest neighbours can give significantly higher accuracy than any other approach.

Table 4 shows the confusion matrix for shape classification in terms of particular types of stamps. The detailed results show that the highest confusion is observable for pairs of shapes, for example, circle versus ellipse and square versus rectangle. This comes from the fact that the boundaries between those pairs of classes may be fuzzy.

4.4. Overall Performance

In the final experiment we evaluated the overall performance of the proposed approach, including both stages: stamp detection and shape classification. It was performed on a set of 719 scanned or photographed documents. The results are presented in Table 5.

Tables 4 and 5 show that the proposed set of algorithms is able to detect and classify most stamps. A considerable number of missed stamps comes from the fact that the database contains many problematic cases, that is, stamps which are hardly visible, only partially imprinted, or overlapping objects such as pictures, postage stamps, or printed text (see examples in Figures 12 and 13).

A special case is stamps containing small tables or just text. It is almost impossible to distinguish them from actual tables or parts of text, which leads to further misclassification. The decrease in classification performance is also caused by the fact that the database contains documents completely without stamps, documents with objects similar to stamps, and actual stamps that do not meet the conditions specified in Section 2.2. As can be seen from the observations, a large part of the original documents has low contrast. Moreover, many wrongly detected areas (later rejected by the classifier) are simply artifacts coming from strong compression of the input image.

4.5. Comparison with State-of-the-Art

We compared our approach with some of the works presented in the literature. Since it was impossible to repeat all the experiments presented in those papers, we took the results provided by their authors. They are gathered in Table 6. Unfortunately, not all details of the compared algorithms could be obtained; such cases are marked with a “—” sign. The comparison is divided into three areas: stamp detection, stamp verification, and stamp shape classification. In our comparison we included the numbers of input documents, verified objects (stamps/no-stamps), and extracted stamps. Not all elements can be properly compared, since, as of now, no comprehensive method has been presented and the methods given in the literature focus on individual steps only.

A full comparison of stamp detection is hard to perform, since several other methods allow detecting color stamps only. The reported detection accuracy of our method is lower; however, our database is the largest and we detect stamps regardless of their color or overlapping. On the other hand, the actual accuracy of the method presented in [3] is in fact equal to 69%.

As can be noticed, no method found in the literature performs an appropriate verification step (discrimination between stamps and no-stamps). Our method employs a set of image features to distinguish stamps from objects like logotypes, small tables, and other graphical ornaments.

Finally, our shape recognition algorithm was tested on the largest database among all the methods and gave very competitive results.

5. Summary

In this paper we propose a novel approach for detecting and classifying stamp instances in scanned documents. To the best of our knowledge, this is the first comprehensive algorithm that deals with color and monochrome stamps of any shape. It incorporates several methods of image processing and pattern recognition, as well as some heuristics. The algorithm is multistage, consisting of object detection, stamp verification, and shape classification. During the experiments we selected a set of low-complexity features that are invariant to a stamp's color, orientation, scale, and position in the document; hence the approach is independent of document size and orientation. Such basic low-level features containing elementary shape properties turned out to be superior to several more complex descriptors. The results of the experiments also showed that at the stage of stamp/no-stamp verification the best results were given by a simple 1NN classifier, and at the stage of shape classification by classifier ensembles utilizing Random Trees. The experiments performed on a large number of real scanned and photographed documents show that the performance is sufficient for practical implementation. Future work may include investigations of additional stamp features, which could increase the overall robustness, especially at the stage of stamp/no-stamp verification. The above approach could also be employed for recognizing different parts of documents (e.g., texts, tables, signatures, and logotypes). The potential areas of application of this algorithm include office software for processing paper documents, content-based document retrieval systems, and various postal services. It could also be employed in cases when stamps should be automatically covered in order to preserve privacy.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.