Abstract

A fractal-based approach is presented for extracting affine invariant features. The central projection transformation is employed to reduce the dimensionality of the original input pattern, and the general contour (GC) of the pattern is derived. Affine invariant features cannot be extracted from the GC directly because of shearing. To address this problem, a group of curves (called shift curves) is constructed from the obtained GC. The fractal dimensions of these curves can readily be computed and constitute a new feature vector for the original pattern. The derived feature vector is then used for pattern recognition. Several experiments have been conducted to evaluate the performance of the proposed method. Experimental results show that the proposed method can be used for object classification.

1. Introduction

Images of an object taken from different viewpoints often suffer from perspective distortions. For this reason, features extracted from the image of an object should be tolerant to an appropriate class of geometric transformations (such as translation, rotation, scaling, and shearing). A perspective transformation between two views can be approximated by an affine transformation if the object is planar and far away from the image plane [1]. Therefore, the extraction of affine invariant features plays a very important role in object recognition and has found application in many fields, such as shape recognition and retrieval [2, 3], watermarking [4], identification of aircraft [5, 6], texture classification [7], image registration [8], and contour matching [9].

Many algorithms have been developed for affine invariant feature extraction [10–12]. Based on whether the features are extracted from the contour only or from the whole shape region, the approaches can be classified into two main categories: region-based methods and contour-based methods. Contour-based methods provide better data reduction [13], but they are inapplicable to objects with several separable components. Region-based methods can achieve high accuracy, but usually at the expense of high computational demands. For good overviews of the various techniques, refer to [13–16]. The central projection transformation (CPT) [17] can be used to combine contour-based and region-based methods. However, CPT cannot be used to extract affine invariant features directly. In this paper, we extract affine invariant features by integrating CPT and fractal analysis.

The essential advantage of a fractal descriptor is that it can greatly speed up computation [17]. The fractal concept, introduced by Mandelbrot [18], has been regarded as one of the most important scientific discoveries of the last century. It provides a powerful tool for exploring complexity and can be used to model many classes of time-series data as well as images. The fractal dimension (FD) is an important characteristic of fractals; it contains information about their geometrical structure. Many applications of fractal concepts rely on the ability to estimate the FD of objects. In the area of pattern recognition and image processing, the FD has been used for image compression, texture segmentation, feature extraction [19, 20], and so forth. The use of fractals to extract rotation invariant features has been investigated in [17]. There, CPT is employed to reduce the dimensionality of the original pattern, and a discrete wavelet transformation then transforms the derived pattern into a set of subpatterns. The FDs of these subpatterns are computed and used as the feature vector. A satisfying classification rate has been achieved in the recognition of rotated English letters, Chinese characters, and handwritten signatures. For more details, refer to [17].

However, the approach presented in [17] is difficult to use for extracting features that are invariant under a general affine transformation. A general affine transformation includes not only rotation, scaling, and translation but also shearing. That is to say, a circle may be transformed into an ellipse. Figure 1(a) is an image of a circle. Figure 1(b) is a scaled and rotated version of the circle in Figure 1(a). Figure 1(c) is an affine-transformed version of the circle in Figure 1(a). It can be calculated that the FD of the curve derived from the circle in Figure 1(a) by CPT is , the FD of the curve derived from the circle in Figure 1(b) by CPT is , while the FD of the curve derived from the ellipse in Figure 1(c) is . That is to say, the FD cannot be used to extract affine invariant features directly. To address this problem, a group of curves (called shift curves) is constructed in this paper from the closed curve derived by CPT. The FDs of these curves can readily be computed and constitute a new feature vector for the original pattern. Several experiments have been conducted to evaluate the performance of the proposed method. Experimental results show that the constructed affine invariant feature vector can be used for object classification.

The rest of the paper is organized as follows. In Section 2, some basic concepts about CPT are introduced. The method for the extraction of affine invariant features is provided in Section 3. The performance of the proposed method is evaluated experimentally in Section 4. Finally, some concluding remarks are provided in Section 5.

2. CPT and Its Properties

This section describes some characteristics of CPT. In CPT, any object can be converted to a closed curve by taking projections along lines that leave the centroid at different angles. Consequently, any object can be transformed into a single contour. In addition, the derived contour preserves the affine relation between objects.

2.1. The CPT Method

First, we translate the origin of the reference system to the centroid of the image. To perform CPT, the Cartesian coordinate system is transformed to the polar coordinate system. Hence, the shape can be represented by a function $f$ of $r$ and $\theta$, namely, $f(r, \theta)$, where $r \in [0, \infty)$ and $\theta \in [0, 2\pi]$. After the transformation of the system, the CPT is performed by computing the following integral: $$g(\theta) = \int_{0}^{\infty} f(r, \theta)\, dr, \tag{1}$$ where $\theta \in [0, 2\pi]$.

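Definition 1. For an angle $\theta \in [0, 2\pi]$, $g(\theta)$ is given in (1) and $(\theta, g(\theta))$ denotes the point of the plane with polar angle $\theta$ and radius $g(\theta)$. Letting $\theta$ go from 0 to $2\pi$, the points $(\theta, g(\theta))$ form a closed curve. We call this closed curve the general contour (GC) of the object.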

For an object , we denote the GC extracted from it by CPT as . The GC of an object has the following properties: single contour, affine invariant.

By (1), an angle $\theta$ corresponds to a single value $g(\theta)$. Consequently, a GC can be derived from any object by employing CPT. For instance, Figure 2(a) shows the image of the Chinese character "Yang", which consists of several components. Figure 2(b) shows the GC of Figure 2(a). The object has been concentrated into an integral pattern, and a single contour has been extracted.
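The construction of the GC can be illustrated with a short numerical sketch. It assumes that the CPT is the radial integral of the image along each ray from the intensity-weighted centroid, approximated here by nearest-pixel sampling; the function name `central_projection` and the parameters `n_angles` and `n_radii` are hypothetical choices made for this illustration, not part of the original method description.

```python
import numpy as np

def central_projection(image, n_angles=360, n_radii=200):
    """Approximate g(theta), the integral of the image along rays from the centroid.

    Returns the sampled angles, the values g(theta), and the GC as Cartesian
    points (g*cos(theta), g*sin(theta)) relative to the centroid.
    """
    img = np.asarray(image, dtype=float)
    rows, cols = np.indices(img.shape)
    total = img.sum()
    cy = (rows * img).sum() / total          # intensity-weighted centroid (row)
    cx = (cols * img).sum() / total          # intensity-weighted centroid (column)

    r_max = np.hypot(*img.shape)             # no ray can be longer than the diagonal
    radii = np.linspace(0.0, r_max, n_radii)
    dr = radii[1] - radii[0]
    thetas = np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False)

    g = np.zeros(n_angles)
    for k, th in enumerate(thetas):
        # Nearest-pixel samples along the ray at angle th (assumed discretization).
        x = np.rint(cx + radii * np.cos(th)).astype(int)
        y = np.rint(cy + radii * np.sin(th)).astype(int)
        inside = (x >= 0) & (x < img.shape[1]) & (y >= 0) & (y < img.shape[0])
        g[k] = img[y[inside], x[inside]].sum() * dr   # Riemann sum of f(r, theta) dr

    gc = np.column_stack((g * np.cos(thetas), g * np.sin(thetas)))
    return thetas, g, gc
```

For a filled disc the sketch returns an approximately constant `g`, so the GC is close to a circle; for an image with several separate components it still returns one value per angle and therefore a single closed curve.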

In real life, many objects consist of several separable components (such as the Chinese character "Yang" in Figure 2(a)). Contour-based methods are inapplicable to these objects. By CPT, a single closed curve can be derived, and contour-based methods can then be applied. Consequently, shape representation based on the GC of the object may provide better data reduction than some region-based methods.

2.2. Affine Invariance of GC

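An affine transformation of coordinates $\mathbf{x} = (x, y)^T$ is defined as $$\mathbf{x}' = A\mathbf{x} + \mathbf{b}, \tag{2}$$ where $\mathbf{b} = (b_1, b_2)^T$ and $A$ is a two-by-two nonsingular matrix with real entries.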

An affine transformation maps parallel lines onto parallel lines and intersecting lines onto intersecting lines. Based on this fact, it can be shown that the GC extracted from the affine-transformed object by CPT is also an affine-transformed version of the GC extracted from the original object.

Consider two objects $O$ and $O'$ that are related by an affine transformation $\mathcal{A}$: $$O' = \mathcal{A}(O).$$

Then $G$ and $G'$, the GCs of $O$ and $O'$, are related by the same affine transformation $\mathcal{A}$: $$G' = \mathcal{A}(G).$$

For example, Figure 3(a) shows an affine-transformed version of Figure 2(a), and Figure 3(b) shows the GC derived from Figure 3(a). Observing the two GCs in Figures 2(b) and 3(b), we can see that CPT not only represents the distribution information of the object but also preserves the affine transformation signature.

Therefore, to see whether an object $O'$ is an affine-transformed version of $O$, we just need to check whether $G'$, the GC of $O'$, is an affine-transformed version of $G$. We extract affine invariant features from the GC of the object using fractal analysis.
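The relation between the two GCs can be checked with a short derivation. It is only a sketch under stated assumptions: the centroid is placed at the origin so that the translation vanishes, the transformed image $f'$ satisfies $f'(A\mathbf{x}) = f(\mathbf{x})$, the CPT is the radial integral of the image along each ray, and the GC point at angle $\theta$ is $g(\theta)\,u_\theta$ with $u_\theta = (\cos\theta, \sin\theta)^T$. The symbols $f$, $g$, $u_\theta$, and $\lambda$ are notation chosen for this illustration.

```latex
\begin{align*}
A u_\theta &= \lambda\, u_{\theta'}, \qquad \lambda = \lVert A u_\theta \rVert > 0, \\
g'(\theta') &= \int_0^\infty f'(r' u_{\theta'})\, dr'
             = \lambda \int_0^\infty f'(r A u_\theta)\, dr
             \qquad (r' = \lambda r) \\
            &= \lambda \int_0^\infty f(r u_\theta)\, dr = \lambda\, g(\theta), \\
g'(\theta')\, u_{\theta'} &= g(\theta)\, \lambda\, u_{\theta'}
             = A \bigl( g(\theta)\, u_\theta \bigr).
\end{align*}
```

Under these assumptions, each point of the GC of the transformed object is the image under $A$ of a point of the GC of the original object, although the angle $\theta'$ generally differs from $\theta$. This is why a reparameterization of the GC is needed before corresponding points can be compared.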

3. Extraction of Affine Invariant Features Using Fractal

By CPT, a closed curve can be derived from any object. In order to extract affine invariant features from the derived GC, the GC should first be parameterized. Thereafter, shift curves are constructed from the parameterized GC. Finally, the divider dimensions of these curves are computed to form feature vectors.

3.1. Affine Invariant Parameterization

GC should be parameterized to establish one-to-one relation between points on GC of the original object and those on GC of its affine transformed version.

There are two parameters which are linear under an affine transformation: the affine arc length [21], and the enclosed area [22]. These two parameters can be made completely invariant by simply normalizing them with respect to either the total affine arc length or the enclosed area of the contour. In the discrete case, the derivatives can be calculated using finite difference equations. The curve normalization approach used in this paper is the same as the method given in [23]. In the experiments of this paper, GC is normalized and resampled such that .

Suppose that the GC of the object has been normalized and resampled. Furthermore, we suppose that the starting point on the GC of the original object and that on the GC of its affine-transformed version are identical. Then, a parametric point $\mathbf{x}(t)$ on the GC of the original object and the corresponding parametric point $\mathbf{x}'(t)$ on the GC of its affine-transformed version satisfy the following equation: $$\mathbf{x}'(t) = A\mathbf{x}(t) + \mathbf{b}.$$
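The enclosed-area parameterization can be sketched as follows. This is a minimal illustration, not the normalization procedure of [23]: it assumes a closed polygonal curve given relative to the centroid, assumes the curve is star-shaped with respect to the origin (which holds for a polar curve of the form described in Section 2), and resamples it at equal steps of swept area. The function name `resample_by_area` is hypothetical.

```python
import numpy as np

def resample_by_area(curve, n_samples):
    """Resample a closed curve at equal increments of area swept from the origin.

    `curve` is an (N, 2) array of vertices, given relative to the centroid and
    assumed star-shaped with respect to the origin.
    """
    p = np.asarray(curve, dtype=float)
    q = np.roll(p, -1, axis=0)                       # next vertex (closes the curve)
    # Area of the triangle (origin, p_i, p_{i+1}) for every edge.
    tri = 0.5 * np.abs(p[:, 0] * q[:, 1] - q[:, 0] * p[:, 1])
    cum = np.concatenate(([0.0], np.cumsum(tri)))    # swept area at each vertex
    targets = np.arange(n_samples) * cum[-1] / n_samples

    # Edge that contains each target area, then linear interpolation on it.
    idx = np.searchsorted(cum, targets, side="right") - 1
    idx = np.clip(idx, 0, len(p) - 1)
    frac = (targets - cum[idx]) / tri[idx]
    return p[idx] + frac[:, None] * (q[idx] - p[idx])
```

Because a linear map multiplies every triangle area by the same factor, the normalized swept area is unchanged, and the area swept along an edge grows linearly with the position on that edge. With identical starting points, the resampled points of the transformed curve are therefore the transformed resampled points of the original curve.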

3.2. Shift Curves

In this part, we derive invariant features from the normalized GC. Let $(x(t), y(t))$ and $(x'(t), y'(t))$ be the parametric equations of two GCs derived from objects that differ only by an affine transformation. For simplicity, in this subsection, we assume that the starting points on both GCs are identical. After normalizing and resampling, there is a one-to-one relation between $(x(t), y(t))$ and $(x'(t), y'(t))$. We use the object centroid as the origin, so the translation factor is eliminated. Equation (2) can then be written in matrix form as $$\begin{pmatrix} x'(t) \\ y'(t) \end{pmatrix} = A \begin{pmatrix} x(t) \\ y(t) \end{pmatrix}.$$

Letting be an arbitrary positive constant, then is a shift version of . We denote as the zero moment of the object: We define the following function: We call as shift curve of the object. Figure 4(a) shows a 10-shift curve of the Chinese character given in Figure 2(a). As a result of normalizing and resampling, , , and , , satisfy the following equation: It follows that In other words, given in (7) is affine invariant.

Note that, after an affine transformation, the starting point of the GC may be different. Figure 4(b) shows the 10-shift curve of the affine-transformed version of the Chinese character given in Figure 3(a). We observe that the shift curve of an affine-transformed version of the object (see Figure 4(b)) is a translated version of the shift curve of the original object (see Figure 4(a)).
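The two properties described above (invariance under the affine map and translation under a change of starting point) can be illustrated numerically. The defining equation of the shift curve is not reproduced in this text, so the sketch below uses an assumed form: the cross product of the GC point at parameter $t$ with the GC point at the shifted parameter, divided by an area term. It also uses the area enclosed by the GC as a stand-in for the zero moment of the object, since both scale by $|\det A|$. The names `shift_curve` and `polygon_area` are hypothetical, and the formula should not be read as the authors' definition.

```python
import numpy as np

def polygon_area(gc):
    """Unsigned area enclosed by a closed polygonal curve (shoelace formula)."""
    z = np.asarray(gc, dtype=float)
    zn = np.roll(z, -1, axis=0)
    return 0.5 * abs(np.sum(z[:, 0] * zn[:, 1] - zn[:, 0] * z[:, 1]))

def shift_curve(gc, beta, area):
    """Assumed beta-shift curve of a resampled GC.

    Each value is the cross product of the point at index t with the point at
    index t + beta (cyclic), normalized by `area`.
    """
    z = np.asarray(gc, dtype=float)
    zs = np.roll(z, -beta, axis=0)                   # point at parameter t + beta
    return (z[:, 0] * zs[:, 1] - zs[:, 0] * z[:, 1]) / area
```

Under a linear map with positive determinant, the cross product and the area are multiplied by the same factor, so the assumed shift curve is unchanged. Moving the starting point by a fixed number of samples cyclically shifts the sequence of values without altering them, which matches the translated appearance of the curves in Figure 4.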

3.3. Computing Divider Dimension of Shift Curves

The FD is a useful measure for quantifying the complexity of the feature details present in an image. In this subsection, we discuss the problem of computing the divider dimension of shift curves and then use the computed divider dimensions to construct a feature vector for the original two-dimensional pattern, which is used for pattern recognition.

Fractals are mathematical sets with a high degree of geometrical complexity, which can model many classes of time-series data as well as images. The FD is an important characteristic of fractals because it contains information about their geometric structure. When employing fractal analysis, researchers typically estimate the dimension from an image. Of the wide variety of FDs in use, the definition of Hausdorff is the oldest and probably the most important. The Hausdorff dimension has the advantage of being defined for any set and is mathematically convenient, as it is based on measures, which are relatively easy to manipulate. A major disadvantage is that in many cases it is hard to calculate or to estimate by computational methods.

In general, the dimension of a set can be found by the equation $$D = \frac{\log N}{\log (1/r)},$$ where $D$ is the dimension and $N$ is the number of parts comprising the set, each scaled down by a ratio $r$ from the whole [18].
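As a worked example of this relation, consider the Koch curve, a standard self-similar set that is not discussed elsewhere in this paper: it consists of $N = 4$ parts, each scaled down by the ratio $r = 1/3$ from the whole.

```latex
D = \frac{\log N}{\log (1/r)} = \frac{\log 4}{\log 3} \approx 1.2619
```

The value lies strictly between 1 and 2, reflecting a curve that is more complex than a smooth line but does not fill the plane.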

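In what follows, we use the notion of the divider dimension of a nonself-intersecting curve (see, e.g., [24, 25]). Suppose that $C$ is a nonself-intersecting curve and $\delta > 0$. Let $M_\delta(C)$ be the maximum number of points in an ordered sequence $x_0, x_1, \ldots, x_m$ on the curve $C$ such that $|x_k - x_{k-1}| = \delta$ for $k = 1, 2, \ldots, m$. The divider dimension of the curve $C$ is defined as follows: $$\dim_D C = \lim_{\delta \to 0} \frac{\log M_\delta(C)}{-\log \delta},$$ where $|x_k - x_{k-1}|$ represents the magnitude of the difference between the two vectors $x_k$ and $x_{k-1}$.

It should also be mentioned that $x_m$ is not necessarily the end point $x_T$ of the curve $C$, but $|x_T - x_m| < \delta$. Furthermore, $(M_\delta(C) - 1)\delta$ may be viewed as the length of the curve $C$ as measured using a pair of dividers that are set a distance $\delta$ apart.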

Since the divider dimension of a nonself-intersecting curve is an asymptotic value, we approximate it in our experiments by the following expression: $$\dim_D C \approx \frac{\log M_\delta(C)}{-\log \delta},$$ where $\delta$ is set small enough.
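The divider walk behind this approximation can be sketched for a curve given as an open polyline. The sketch assumes that each new point is the first point further along the curve at distance exactly $\delta$ from the previous one, found by intersecting a circle with the polyline segments. The function names `divider_count` and `divider_dimension` are hypothetical, and the code is an illustration rather than the implementation used in the experiments.

```python
import numpy as np

def divider_count(points, delta):
    """Number of points x_0, x_1, ... obtained by walking dividers of opening
    `delta` along an open polyline given as an (N, 2) array of vertices."""
    pts = np.asarray(points, dtype=float)
    current = pts[0]          # x_0 is the start of the curve
    start = pts[0]            # where the search resumes on the current segment
    count, i = 1, 0
    while i < len(pts) - 1:
        end = pts[i + 1]
        if np.linalg.norm(end - current) < delta:
            # The rest of this segment lies inside the divider circle.
            i += 1
            start = pts[i]
            continue
        # Solve |start + s * d - current| = delta for the exit point, 0 <= s <= 1.
        d, f = end - start, start - current
        a, b, c = d @ d, 2.0 * (f @ d), f @ f - delta ** 2
        s = (-b + np.sqrt(max(b * b - 4.0 * a * c, 0.0))) / (2.0 * a)
        current = start + min(s, 1.0) * d
        start = current
        count += 1
    return count

def divider_dimension(points, delta):
    """Single-scale approximation log(M_delta) / (-log(delta))."""
    return np.log(divider_count(points, delta)) / -np.log(delta)
```

For a straight segment of unit length the estimate is close to 1, as expected for a smooth curve. The single-scale ratio depends on the overall length of the curve when $\delta$ is not very small, so curves should be scaled consistently before their values are compared; fitting the slope of $\log M_\delta$ against $-\log \delta$ over several values of $\delta$ is a common alternative.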

The divider dimension of shift curve in Figure 4(a) is and that of shift curve in Figure 4(b) is . In the experiments of this paper, divider dimensions of shift curves are computed. The feature vector is constituted

4. Experiment

In this section, we evaluate the discriminative ability of the proposed method. In the first experiment, we examine the proposed method by using some airplane images. Object contours can be derived from these images. In the second experiment, we evaluate the discriminative ability of the proposed method by using some Chinese characters. These characters have several separable components, and contours are not available for these objects.

In the following experiments, the classification accuracy is defined as $$\text{accuracy} = \frac{N_c}{N_t} \times 100\%,$$ where $N_c$ denotes the number of correctly classified images and $N_t$ denotes the total number of images applied in the test. Affine transformations are generated by the following matrix [5]: where denote the scaling and rotation transformation, respectively, and denote the skewing transformation. For each object, the affine transformations are generated by setting the parameters in (15) as follows: , , , and . Therefore, each image is transformed 140 times.
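The accuracy measure can be computed as in the sketch below, which pairs it with a nearest-neighbor rule on feature vectors. Using a nearest-neighbor rule with Euclidean distance for the proposed features is an assumption made for this illustration; the function names `nearest_neighbor_predict` and `classification_accuracy` are hypothetical.

```python
import numpy as np

def nearest_neighbor_predict(train_features, train_labels, test_features):
    """Label each test vector with the label of its closest training vector
    (Euclidean distance; the choice of distance is an assumption)."""
    train = np.asarray(train_features, dtype=float)
    test = np.asarray(test_features, dtype=float)
    dist = np.linalg.norm(test[:, None, :] - train[None, :, :], axis=2)
    return np.asarray(train_labels)[np.argmin(dist, axis=1)]

def classification_accuracy(predicted, actual):
    """Percentage of test images whose predicted label equals the true label."""
    predicted = np.asarray(predicted)
    actual = np.asarray(actual)
    return 100.0 * np.mean(predicted == actual)
```

In such a setup, the training set would hold one feature vector per original object, and the test set would hold the feature vectors of all affine-transformed versions.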

4.1. Binary Image Classification

In this experiment, we examine the discrimination power of the proposed method using the 40 Chinese characters shown in Figure 5. These Chinese characters are in regular script font, and the images have size in the experiments. We observe that some characters in this database have the same structure, but the number of strokes or the shape of specific strokes may be slightly different. Some characters consist of several separable components. As mentioned above, each character image is transformed 140 times. That is to say, the test is repeated 5600 times. Experiments on the Chinese characters in Figure 5 and their affine transformations show that a classification accuracy of 98.14% can be achieved by the proposed method.

In many real-life recognition situations, images are corrupted by noise. The robustness of the proposed method is therefore tested on binary images in this part. We add salt and pepper noise with intensities varying from 0.005 to 0.03 to the transformed binary images. We compare the proposed method with two region-based methods, namely, AMIs and MSA. The comparative methods are described in [26, 27] and are implemented as discussed in those articles. Three AMIs and 29 MSA invariants are selected for recognition. The nearest neighbor classifier is applied for the AMIs and MSA methods.
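Salt and pepper corruption of this kind can be sketched as follows. The sketch assumes that the noise intensity is the expected fraction of pixels that are overwritten, with salt and pepper equally likely; the function name `add_salt_and_pepper` and its parameters `low`, `high`, and `rng` are hypothetical.

```python
import numpy as np

def add_salt_and_pepper(image, intensity, low=0, high=1, rng=None):
    """Overwrite a random fraction `intensity` of pixels with `low` or `high`.

    `intensity` is interpreted as the noise density (an assumption); each
    affected pixel becomes salt (`high`) or pepper (`low`) with equal probability.
    """
    rng = np.random.default_rng() if rng is None else rng
    img = np.asarray(image)
    noisy = img.copy()
    corrupt = rng.random(img.shape) < intensity   # pixels selected for corruption
    salt = rng.random(img.shape) < 0.5            # split between salt and pepper
    noisy[corrupt & salt] = high
    noisy[corrupt & ~salt] = low
    return noisy
```

For a binary image, only about half of the selected pixels actually change value, because a pepper pixel on a black background or a salt pixel on a white stroke leaves the image unchanged.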

Table 1 shows the classification accuracies of all methods at the corresponding noise levels. We can observe that the classification accuracy of AMIs decreases rapidly from the noise-free condition to small noise levels: it drops from 91.70% to less than 50% when the noise intensity is 0.010. MSA performs much better than AMIs, but the results are still not satisfying. Even at large noise levels, the proposed method maintains high accuracy.

4.2. Gray Image Classification

In this experiment, the well-known Columbia Coil-20 database [28], which contains the 20 different objects shown in Figure 6, is used. For each object, the affine transformations are generated by setting the parameters in (15) as described above. Therefore, each image is transformed 140 times. That is to say, the test is repeated 2800 times for every method. The classification accuracies of the proposed method, AMIs, and MSA in this setting are 96.00%, 100%, and 95.31%, respectively. The results indicate that AMIs perform best in this test and that the proposed method is comparable to MSA.

The effect of adding different kinds of noises is also studied. The noise is added to the affine-transformed images before recognition.

We first add salt and pepper noise with intensities varying from 0.005 to 0.03 to the transformed images. Table 2 shows the classification accuracies of all methods at the corresponding noise levels. We can observe that the classification accuracy of AMIs decreases rapidly from the noise-free condition to small noise levels: it drops from 100% to less than 50% when the noise intensity is 0.010. MSA performs much better than AMIs, but the results are still not satisfying. Even at large noise levels, the proposed method maintains high accuracy.

In addition, we add Gaussian noise with zero mean and variance varying from 0.005 to 0.03 to the transformed images. Table 3 shows the classification accuracies of all methods at the corresponding noise levels. The results indicate that AMIs and MSA are much more sensitive to Gaussian noise than to salt and pepper noise. The proposed method, however, achieves higher classification accuracy than AMIs and MSA at every noise level.

The experimental results indicate that the proposed method performs better in noisy situations. A likely reason is that CPT is robust to noise. It was shown in [29] that the Radon transform is quite robust to noise. It can be shown similarly that the GC derived from the object by CPT is robust to additive noise, because pixel values are summed to generate the GC.
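The effect of summation can be made concrete with a standard argument, which is a sketch and not the proof given in [29]. It assumes that $N$ pixels $f_i$ along a ray are each corrupted by independent zero-mean noise $n_i$ with variance $\sigma^2$; the symbols $f_i$, $n_i$, $N$, $\bar{f}$, and $\sigma$ are notation introduced for this illustration.

```latex
\begin{align*}
\sum_{i=1}^{N} (f_i + n_i) &= N \bar{f} + \sum_{i=1}^{N} n_i,
  \qquad \bar{f} = \frac{1}{N} \sum_{i=1}^{N} f_i, \\
\mathbb{E}\Bigl[\sum_{i=1}^{N} n_i\Bigr] &= 0,
  \qquad \operatorname{Var}\Bigl[\sum_{i=1}^{N} n_i\Bigr] = N \sigma^2, \\
\frac{\text{signal}}{\text{noise standard deviation}}
  &= \frac{N \bar{f}}{\sigma \sqrt{N}} = \sqrt{N}\, \frac{\bar{f}}{\sigma}.
\end{align*}
```

Under these assumptions, the summed signal grows in proportion to $N$ while the noise standard deviation grows only as $\sqrt{N}$, so the ratio improves by a factor of $\sqrt{N}$ compared with a single pixel.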

5. Conclusions

In this paper, affine invariant features are extracted using fractal analysis. A closed curve, called the GC, is derived from the original input pattern by employing CPT. Because of shearing, affine invariant features cannot be extracted from the GC directly. To address this problem, a group of curves (called shift curves) is constructed from the obtained GC. The fractal dimensions of these curves can readily be computed and constitute a new feature vector for the original pattern. The derived feature vector is used for object classification tasks. Several experiments have been conducted to evaluate the performance of the proposed method.

Although satisfying results have been achieved in object classification tasks, some remarks should be made. The performance of CPT depends strongly on the accurate calculation of the centroid. We are working towards a method that does not require the centroid. Furthermore, some characteristics of CPT should be studied further.

Acknowledgments

This work was supported in part by the National Science Foundation under Grant 60973157. Ming Li acknowledges support in part from the 973 plan under project Grant no. 2011CB302800 and from the National Natural Science Foundation of China under project Grant nos. 61272402, 61070214, and 60873264.