Abstract

Handwritten digit recognition plays a significant role in many user authentication applications in the modern world. Because handwritten digits vary in size, thickness, style, and orientation, a recognition system has to cope with all of these variations. Much work has been done on non-Indic scripts, particularly Roman, but research on Indic scripts remains limited. This paper presents a script-invariant handwritten digit recognition system for identifying digits written in five popular scripts of the Indian subcontinent, namely, Indo-Arabic, Bangla, Devanagari, Roman, and Telugu. A 130-element feature set, which combines six different types of moments, namely, geometric moments, moment invariants, affine moment invariants, Legendre moments, Zernike moments, and complex moments, is estimated for each digit sample. Finally, the technique is evaluated on the CMATER and MNIST databases using multiple classifiers, and, after statistical significance tests are performed, the Multilayer Perceptron (MLP) classifier is observed to outperform the others. Satisfactory recognition accuracies are attained for all five scripts.

1. Introduction

The automated reading of printed or handwritten documents by electronic devices is known as Optical Character Recognition (OCR), broadly defined as the process of recognizing printed or handwritten text in document images and converting it into electronic form. OCR systems can contribute tremendously to automation and can improve the interaction between humans and machines in many applications, including office automation, bank check verification, postal automation, and a large variety of business and data entry applications. Handwritten digit recognition is the task of recognizing and classifying handwritten digits from 0 to 9 without human intervention [1]. Although the recognition of handwritten numerals has been studied for more than three decades and many techniques with high accuracy rates have already been developed, research in this area continues with the aim of further improving recognition rates.

Handwritten digit recognition is a complex problem because writing styles vary from writer to writer. The problem becomes even more challenging because the style of the same writer also varies from one instance to another. For this reason, building a generic recognizer capable of recognizing handwritten digits written by diverse writers is not always feasible [2]. Extracting the most informative features, with high discriminatory ability, so as to improve classification accuracy at reduced complexity remains one of the most important problems in this task. The task is of great importance, and standard databases exist that allow different approaches to be compared and validated.

India is a multilingual country with 23 constitutionally recognized languages written in 12 major scripts [1]. Besides these, hundreds of other languages are used in India, each with a number of dialects. The officially recognized languages are Hindi, Bengali, Punjabi, Marathi, Gujarati, Oriya, Sindhi, Assamese, Nepali, Urdu, Sanskrit, Tamil, Telugu, Kannada, Malayalam, Kashmiri, Manipuri, Konkani, Maithili, Santhali, Bodo, English, and Dogri. The 12 major scripts used to write these languages are Devanagari, Bangla, Oriya, Gujarati, Gurumukhi, Tamil, Telugu, Kannada, Malayalam, Manipuri, Roman, and Urdu. In a multilingual country like India, it is common for a document such as a job application form or a railway ticket reservation form to contain text written in different languages/scripts in order to reach a larger cross section of people. The scripts may differ in the numerals or alphanumerals within a single document page. However, techniques developed for text identification generally do not incorporate digit recognition, because the features required for text identification may not be applicable to identifying digits.

The paper is organized as follows: Section 2 briefly reviews previous approaches to handwritten digit recognition, Section 3 introduces our script-independent handwritten digit recognition system, Section 4 describes the performance of the system on realistic databases of handwritten digits, and Section 5 concludes the paper.

Gorgevik and Cakmakov [3] developed a Support Vector Machine (SVM) based recognition system for handwritten Roman numerals. They extracted four types of features from each digit image: (1) projection histograms, (2) contour profiles, (3) ring-zones, and (4) Kirsch features, and reported 97.27% recognition accuracy on the National Institute of Standards and Technology (NIST) handwritten digits database [4]. In [5], Chen et al. proposed a max-min posterior pseudoprobabilities framework for Roman handwritten digit recognition. They extracted 256-dimensional directional features from the input image, transformed them into a set of 128 features using Principal Component Analysis (PCA), and reported a recognition accuracy of 98.76% on the NIST database [4]. Labusch et al. [6] described a sparse coding based feature extraction method with an SVM classifier and obtained a recognition accuracy of 99.41% on the MNIST (Modified NIST) handwritten digits database [7]. The work described in [8] combined three recognizers by majority vote; one of them is based on Kirsch gradients (four orientations), dimensionality reduction by PCA, and classification by SVM. They achieved an accuracy rate of 95.05% with 0.93% error on the 10,000 test samples of the MNIST database [7]. Mane and Ragha [9] performed handwritten digit recognition using an elastic image matching technique based on eigendeformations, which are estimated by PCA of the actual deformations automatically selected by the elastic matching. They achieved an overall accuracy of 94.91% on their own database, collected for the experiment from individuals of various professions. Cruz et al. [10] presented a handwritten digit recognition system that uses multiple feature extraction methods and a classifier ensemble. A total of six feature extraction algorithms, namely, Multizoning, Modified Edge Maps, Structural Characteristics, Projections, Concavities Measurements, and Gradient Directional, were evaluated, and a scheme using neural networks as the combiner achieved a recognition rate of 99.68% on the MNIST database, with a training set of 60,000 images and a test set of 10,000 images.

Dhandra et al. [11] investigated a script-independent automatic numeral recognition system for Kannada, Telugu, and Devanagari handwritten numerals. In their method, the 30 classes were reduced to 18 classes by extracting global and local structural features such as directional density estimation, water reservoirs, maximum profile distances, and fill-hole density. A probabilistic neural network (PNN) classifier was then used for recognition, yielding an accuracy of 97.20% on a total of 2550 numeral images written in Kannada, Telugu, and Devanagari scripts. In [12], Yang et al. proposed a supervised matrix factorization method used directly as a multiclass classifier and reported a recognition accuracy of 98.71% with the supervised learning approach on the MNIST database [7]. In [13], a mixture of multiclass logistic regression models was described, with a claimed recognition accuracy of 98% on the Indian digit database provided by CENPARMI [14]. Das et al. [15] described a technique for creating a pool of local regions and selecting from it an optimal set of local regions that carries the most discriminating information for handwritten Bangla digit recognition. A genetic algorithm (GA) was applied to these local regions to sample the best discriminating features, and the features extracted from the selected regions were classified with an SVM, achieving a recognition accuracy of 97%. In [16], a wavelet analysis based feature extraction technique was reported; with SVM and k-Nearest Neighbor (k-NN) classifiers, an overall recognition accuracy of 97.04% was obtained on the MNIST digit database [7]. A comparative study in [17] trained a neural network with the Backpropagation (BP) algorithm, using PCA for feature extraction. Digit recognition was finally carried out using 13 algorithms, including the neural network algorithm and the Fisher Discriminant Analysis (FDA) algorithm. The FDA algorithm proved less efficient, with an overall accuracy of 77.67%, whereas the BP algorithm with PCA-based feature extraction gave an accuracy of 91.2%.

In [18], a set of structural features (namely, number of holes, water reservoirs in four directions, maximum profile distances in four directions, and fill-hole density) and a k-NN classifier were employed for handwritten digit recognition, with a reported accuracy of 96.94% on 5000 samples of the MNIST digit database [7]. In [19], AlKhateeb and Alseid proposed an Arabic handwritten digit recognition system using a Dynamic Bayesian Network with features based on Discrete Cosine Transform (DCT) coefficients. The system was tested on the Indo-Arabic digits database (ADBase), which contains 70,000 Indo-Arabic digits [20], and achieved an average recognition accuracy of 85.26% on 10,000 samples. Ebrahimzadeh and Jampour [21] proposed an appearance feature-based approach using the Histogram of Oriented Gradients (HOG) for handwritten digit recognition; a linear SVM was then used to classify the digits of the MNIST dataset, with an overall accuracy of 97.25%. Gil et al. [22] presented a novel approach using SVM binary classifiers and unbalanced decision trees. Two classifiers were proposed, one taking the digit characteristics as input and the other the whole image as such; a recognition accuracy of 100% was reported on the MNIST database using the whole image as input. El Qacimy et al. [23] investigated the effectiveness of four DCT-based feature extraction approaches, namely, DCT upper left corner (ULC) coefficients, DCT zigzag coefficients, block based DCT ULC coefficients, and block based DCT zigzag coefficients. The coefficients of each DCT variant were used as input to an SVM classifier, and block based DCT zigzag feature extraction yielded the best recognition accuracy, 98.76%, on the MNIST database. AL-Mansoori [24] implemented an MLP classifier to recognize and predict handwritten digits; a dataset of 5000 samples was obtained from the MNIST database and an overall accuracy of 99.32% was achieved.

From the above literature, it is clear that most of the work has been done for the Roman script, whereas relatively few works [11, 15, 19] have been reported on recognizing digits written in Indic scripts. The main reason for this slow progress could be the complexity of the shapes of Indic scripts compared with the Roman script. Moreover, the discriminating power of the features exploited so far is not easily measurable, so further experimentation is needed to identify new feature descriptors for effective classification of complex handwritten digits of different scripts. The literature also reveals that the described methods suffer from long computation times, mainly due to feature extraction from large datasets. In addition, the above recognition systems fail to reach the desired accuracy in multiscript scenarios. Hence, a method that is independent of script and yields reasonable recognition accuracy would benefit a multilingual country like India. This has motivated us to introduce a script-invariant handwritten digit recognition system for identifying digits written in five popular scripts, namely, Indo-Arabic, Bangla, Devanagari, Roman, and Telugu. The key module of the proposed methodology is shown in Figure 1.

3. Feature Extraction Methodology

One of the basic problems in the design of any pattern recognition system is the selection of a set of appropriate features to be extracted from the object of interest. Research on the use of moments for object characterization in both invariant and noninvariant tasks has received considerable attention in recent years. Describing digit images with moments instead of other, more commonly used pattern recognition features (described in [21–23]) means that global properties of the digit image are used rather than local properties. Therefore, the present work adopts a moment based approach, which is described in the following subsections.

3.1. Moments

Moments are purely statistical measures of the pixel distribution around the center of gravity of the image and capture global shape information [25]. They describe numerical quantities at some distance from a reference point or axis. Moments are commonly used in statistics to characterize the distribution of random variables and, similarly, in mechanics to characterize bodies by their spatial distribution of mass.

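A complete characterization of moment functionals over a class of univariate functions was given by Hausdorff [26] in 1921.

Let $\{\mu_n\}$ be a real sequence of numbers and let us define
$$\Delta^k \mu_n = \sum_{i=0}^{k} (-1)^i \binom{k}{i} \mu_{n+i}, \qquad k, n = 0, 1, 2, \ldots$$
Note that $\Delta^k \mu_n$ can be viewed as a $k$th order (finite-difference) derivative of $\mu_n$.

By the Hausdorff theorem, a necessary and sufficient condition that there exists a monotonic function $F(x)$ satisfying the system
$$\mu_n = \int_0^1 x^n \, dF(x), \qquad n = 0, 1, 2, \ldots$$
is that the system of linear inequalities
$$\Delta^k \mu_n \ge 0, \qquad k, n = 0, 1, 2, \ldots$$
should be satisfied; that is, if the function is positive (as image intensities are in image processing), then the set of functionals $\{\mu_n\}$ completely characterizes it.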
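To make the finite-difference definition concrete, the following minimal Python sketch computes $\Delta^k \mu_n$ exactly with rational arithmetic and checks the Hausdorff inequalities on a finite truncation of a sequence. The two test sequences (the moments $\mu_n = 1/(n+1)$ of the uniform density on $[0, 1]$, and a copy with one value altered) are illustrative choices; a finite check can refute the condition but cannot establish the infinite one.

```python
from fractions import Fraction
from math import comb


def delta(mu, k, n):
    """Delta^k mu_n = sum_{i=0}^{k} (-1)^i * C(k, i) * mu_{n+i}."""
    return sum((-1) ** i * comb(k, i) * mu[n + i] for i in range(k + 1))


def satisfies_hausdorff_inequalities(mu, max_order):
    """Check Delta^k mu_n >= 0 for every k + n <= max_order (finite truncation only)."""
    return all(
        delta(mu, k, n) >= 0
        for k in range(max_order + 1)
        for n in range(max_order + 1 - k)
    )


# Moments of the uniform density on [0, 1]: mu_n = 1 / (n + 1).
uniform = [Fraction(1, n + 1) for n in range(12)]
print(satisfies_hausdorff_inequalities(uniform, 11))  # True

# Changing a single moment breaks the condition (Delta^2 mu_1 becomes -1/4).
perturbed = list(uniform)
perturbed[2] = Fraction(1, 2)
print(satisfies_hausdorff_inequalities(perturbed, 11))  # False
```

For the uniform density, $\Delta^k \mu_n = \int_0^1 x^n (1-x)^k\,dx$, which is always positive, so every inequality holds.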

A necessary and sufficient condition that there exists a function $F(x)$ of bounded variation satisfying the moment system above is that the sequence
$$\sum_{m=0}^{n} \binom{n}{m} \left|\Delta^{n-m}\mu_m\right|, \qquad n = 0, 1, 2, \ldots$$
should be bounded. The use of moments for image analysis is straightforward if we consider a binary or gray level image segment as a two-dimensional density distribution function. It can be assumed that an image can be represented by a real-valued measurable function $f(x, y)$. In this way, moments may be used to characterize an image segment and extract properties that have analogies in statistics and mechanics. In image processing and computer vision, an image moment is a particular weighted average (moment) of the image pixels' intensities, or a function of such moments, usually chosen to have some attractive property or interpretation. The first significant work considering moments for pattern recognition was performed by Hu [27]. He derived relative and absolute combinations of moment values that are invariant with respect to scale, position, and orientation, based on the theory of algebraic invariants, which deals with the properties of certain classes of algebraic expressions that remain invariant under general linear transformations. Size invariant moments are derived from algebraic invariants but can be shown to be the result of simple size normalization. Translation invariance is achieved by computing moments that have been translated by the negative distance to the centroid, and thus normalized so that the center of mass of the distribution is at the origin (central moments).

3.2. Geometric Moments

Geometric moments are defined as the projection of the image intensity function $f(x, y)$ onto the monomial $x^p y^q$ [25]. The $(p+q)$th order geometric moment $m_{pq}$ of a gray level image $f(x, y)$ is defined as
$$m_{pq} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x^p y^q f(x, y)\,dx\,dy,$$
where $p, q = 0, 1, 2, \ldots$. Note that the monomial product $x^p y^q$ is the basis function for this moment definition. A set of $n$ moments consists of all $m_{pq}$'s for $p + q \le n$; that is, the set contains $(n+1)(n+2)/2$ elements. If $f(x, y)$ is piecewise continuous and contains nonzero values only in a finite region of the $xy$-plane, then the moment sequence $\{m_{pq}\}$ is uniquely determined by $f(x, y)$ and, conversely, $f(x, y)$ is uniquely determined by $\{m_{pq}\}$. Since an image segment has finite area or, in the worst case, is piecewise continuous, moments of all orders exist, and a complete moment set can be computed and used to describe the information contained in the image uniquely. However, recovering all the information contained in the image requires an infinite number of moment values. Therefore, selecting a meaningful subset of moment values that contains sufficient information to characterize the image uniquely for a specific application becomes very important. For a digital image of size $M \times N$, the double integral above is replaced by a summation, which gives the simplified form
$$m_{pq} = \sum_{x=1}^{M}\sum_{y=1}^{N} x^p y^q f(x, y),$$
where $p$ and $q$ are nonnegative integers.

When $f(x, y)$ changes by translation, rotation, or scaling, the image may be positioned so that its center of mass (COM) coincides with the origin of the field of view, that is, $\bar{x} = 0$ and $\bar{y} = 0$; the moments computed for the object in this position are referred to as central moments [25] and are designated by $\mu_{pq}$. The simplified form of the central moment of order $(p+q)$ is
$$\mu_{pq} = \sum_{x=1}^{M}\sum_{y=1}^{N} (x - \bar{x})^p (y - \bar{y})^q f(x, y),$$
where $\bar{x} = m_{10}/m_{00}$ and $\bar{y} = m_{01}/m_{00}$.

The pixel point $(\bar{x}, \bar{y})$ is the COM of the image. The central moments $\mu_{pq}$ computed using the centroid of the image are equivalent to the moments $m_{pq}$ of an image whose center has been shifted to its centroid. Therefore, the central moments are invariant to image translations. Scale invariance can be obtained by normalization. The normalized central moments, denoted by $\eta_{pq}$, are defined as
$$\eta_{pq} = \frac{\mu_{pq}}{\mu_{00}^{\gamma}}, \qquad \gamma = \frac{p+q}{2} + 1,$$
for $p + q = 2, 3, \ldots$.
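The discrete geometric, central, and normalized central moments above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' MATLAB implementation; it assumes the image is a list of rows, takes $x$ as the column index and $y$ as the row index, and starts both at 0 instead of 1 (which leaves central moments unchanged). The usage shifts a small pattern inside a larger canvas to show translation invariance; for discrete images the scale invariance of $\eta_{pq}$ is only approximate, so it is not demonstrated.

```python
def raw_moment(img, p, q):
    """Geometric moment m_pq = sum_x sum_y x^p y^q f(x, y) (x = column, y = row)."""
    return sum(x ** p * y ** q * v
               for y, row in enumerate(img)
               for x, v in enumerate(row))


def centroid(img):
    """Centre of mass (x_bar, y_bar) = (m10 / m00, m01 / m00)."""
    m00 = raw_moment(img, 0, 0)
    return raw_moment(img, 1, 0) / m00, raw_moment(img, 0, 1) / m00


def central_moment(img, p, q):
    """mu_pq = sum (x - x_bar)^p (y - y_bar)^q f(x, y)."""
    xc, yc = centroid(img)
    return sum((x - xc) ** p * (y - yc) ** q * v
               for y, row in enumerate(img)
               for x, v in enumerate(row))


def normalized_central_moment(img, p, q):
    """eta_pq = mu_pq / mu_00^gamma with gamma = (p + q) / 2 + 1."""
    gamma = (p + q) / 2 + 1
    return central_moment(img, p, q) / central_moment(img, 0, 0) ** gamma


digit = [[0, 1, 1, 0],
         [0, 1, 0, 0],
         [1, 1, 1, 0],
         [0, 0, 0, 0]]

# Place the same pattern at an offset of (+1, +2) inside a 6 x 6 canvas.
shifted = [[0] * 6 for _ in range(6)]
for y, row in enumerate(digit):
    for x, v in enumerate(row):
        shifted[y + 2][x + 1] = v

orders = [(2, 0), (1, 1), (0, 2), (2, 1), (0, 3)]
print(all(abs(central_moment(digit, p, q) - central_moment(shifted, p, q)) < 1e-9
          for p, q in orders))  # True
```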

The second order moments, known as the moments of inertia, may be used to determine an important image feature called orientation [25]. Here, the feature values F1–F3 have been computed from the moments of inertia of the digit images. In general, the orientation of an image describes how the image lies in the field of view, that is, the directions of its principal axes. In terms of moments, the orientation of the principal axis, $\theta$, taken as feature value F4, is given by
$$\theta = \frac{1}{2}\tan^{-1}\left(\frac{2\mu_{11}}{\mu_{20} - \mu_{02}}\right),$$
where $\theta$ is the angle of the principal axis nearest to the $x$-axis and lies in the range $-\pi/4 \le \theta \le \pi/4$. The minimum and maximum distances ($R_{\min}$ and $R_{\max}$) between the centroid and the boundary of an image are also feature descriptors. The ratio $R_{\max}/R_{\min}$ is called elongation or eccentricity (F5), and it can also be expressed in terms of the central moments.
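The orientation formula above, and one common moment-based elongation measure, can be sketched as follows. Note the assumptions: the elongation used here is the ratio of the eigenvalues of the second-order central moment matrix, $\lambda_{\max}/\lambda_{\min}$, a widely used definition that is not necessarily the exact formula behind F5; the $y$ axis follows the row index (pointing down); and when $\mu_{20} = \mu_{02}$ the angle is taken as $\pm\pi/4$ according to the sign of $\mu_{11}$.

```python
import math


def second_order_central_moments(img):
    """Return (mu20, mu02, mu11); x = column index, y = row index."""
    pixels = [(x, y, v) for y, row in enumerate(img) for x, v in enumerate(row) if v]
    m00 = sum(v for _, _, v in pixels)
    xc = sum(x * v for x, _, v in pixels) / m00
    yc = sum(y * v for _, y, v in pixels) / m00
    mu20 = sum((x - xc) ** 2 * v for x, _, v in pixels)
    mu02 = sum((y - yc) ** 2 * v for _, y, v in pixels)
    mu11 = sum((x - xc) * (y - yc) * v for x, y, v in pixels)
    return mu20, mu02, mu11


def orientation(mu20, mu02, mu11):
    """theta = 0.5 * atan(2 mu11 / (mu20 - mu02)), restricted to [-pi/4, pi/4]."""
    if mu20 == mu02:
        return 0.0 if mu11 == 0 else math.copysign(math.pi / 4, mu11)
    return 0.5 * math.atan(2 * mu11 / (mu20 - mu02))


def elongation(mu20, mu02, mu11):
    """Assumed definition: eigenvalue ratio of [[mu20, mu11], [mu11, mu02]]."""
    root = math.sqrt((mu20 - mu02) ** 2 + 4 * mu11 ** 2)
    lam_max = (mu20 + mu02 + root) / 2
    lam_min = (mu20 + mu02 - root) / 2
    return math.inf if lam_min <= 0 else lam_max / lam_min


bar = [[1] * 6, [1] * 6]                 # a 2 x 6 horizontal stroke
mu = second_order_central_moments(bar)
print(orientation(*mu), elongation(*mu))  # theta = 0.0, elongation = 35/3
```

A one-pixel-wide stroke has $\lambda_{\min} = 0$, so the function returns infinity rather than dividing by zero.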

3.3. Moment Invariants

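Based on the theory of algebraic invariants, Hu [27] derived relative and absolute combinations of moments that are invariant with respect to scale, position, and orientation. The method of moment invariants is derived from algebraic invariants applied to the moment generating function under a rotation transformation. The set of absolute moment invariants consists of nonlinear combinations of central moment values that remain invariant under rotation. A set of seven moments that are invariant with respect to image scale, translation, and rotation can be derived from the normalized central moments up to order three:
$$
\begin{aligned}
\phi_1 &= \eta_{20} + \eta_{02},\\
\phi_2 &= (\eta_{20} - \eta_{02})^2 + 4\eta_{11}^2,\\
\phi_3 &= (\eta_{30} - 3\eta_{12})^2 + (3\eta_{21} - \eta_{03})^2,\\
\phi_4 &= (\eta_{30} + \eta_{12})^2 + (\eta_{21} + \eta_{03})^2,\\
\phi_5 &= (\eta_{30} - 3\eta_{12})(\eta_{30} + \eta_{12})\left[(\eta_{30} + \eta_{12})^2 - 3(\eta_{21} + \eta_{03})^2\right]\\
&\quad + (3\eta_{21} - \eta_{03})(\eta_{21} + \eta_{03})\left[3(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2\right],\\
\phi_6 &= (\eta_{20} - \eta_{02})\left[(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2\right] + 4\eta_{11}(\eta_{30} + \eta_{12})(\eta_{21} + \eta_{03}),\\
\phi_7 &= (3\eta_{21} - \eta_{03})(\eta_{30} + \eta_{12})\left[(\eta_{30} + \eta_{12})^2 - 3(\eta_{21} + \eta_{03})^2\right]\\
&\quad - (\eta_{30} - 3\eta_{12})(\eta_{21} + \eta_{03})\left[3(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2\right].
\end{aligned}
$$
This set of moments is invariant to translation, scale change, mirroring (within a minus sign for $\phi_7$), and rotation. The 2D moment invariants give seven features (F6–F12), which are used in the current work.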
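A sketch of Hu's seven invariants in pure Python, assuming a list-of-rows image with $x$ as the column index and $y$ as the row index. The helpers `rotate90` and `mirror` and the sample pattern are illustrative; they let one observe that all seven values survive a 90° rotation, while a left-right mirror leaves $\phi_1$ through $\phi_6$ unchanged and flips the sign of $\phi_7$.

```python
def hu_moments(img):
    """Hu's seven invariants phi_1..phi_7 from normalized central moments."""
    pixels = [(x, y, v) for y, row in enumerate(img) for x, v in enumerate(row) if v]
    m00 = sum(v for _, _, v in pixels)
    xc = sum(x * v for x, _, v in pixels) / m00
    yc = sum(y * v for _, y, v in pixels) / m00

    def eta(p, q):
        mu = sum((x - xc) ** p * (y - yc) ** q * v for x, y, v in pixels)
        return mu / m00 ** ((p + q) / 2 + 1)   # mu_00 equals m_00

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    n30, n03, n21, n12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    a, b = n30 + n12, n21 + n03
    return [
        n20 + n02,
        (n20 - n02) ** 2 + 4 * n11 ** 2,
        (n30 - 3 * n12) ** 2 + (3 * n21 - n03) ** 2,
        a ** 2 + b ** 2,
        (n30 - 3 * n12) * a * (a ** 2 - 3 * b ** 2)
        + (3 * n21 - n03) * b * (3 * a ** 2 - b ** 2),
        (n20 - n02) * (a ** 2 - b ** 2) + 4 * n11 * a * b,
        (3 * n21 - n03) * a * (a ** 2 - 3 * b ** 2)
        - (n30 - 3 * n12) * b * (3 * a ** 2 - b ** 2),
    ]


def rotate90(img):
    """Rotate a square image by 90 degrees: new[y][x] = old[N - 1 - x][y]."""
    n = len(img)
    return [[img[n - 1 - x][y] for x in range(n)] for y in range(n)]


def mirror(img):
    """Flip left-right: new[y][x] = old[y][N - 1 - x]."""
    return [list(reversed(row)) for row in img]


five = [[0, 1, 1, 1, 0],
        [0, 1, 0, 0, 0],
        [0, 1, 1, 1, 0],
        [0, 0, 0, 1, 0],
        [0, 1, 1, 1, 0]]
print(hu_moments(five))
print(hu_moments(rotate90(five)))
```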

3.4. Affine Moment Invariants

Affine moment invariants are derived to be invariant under general 2D affine transformations of shapes, which include translation, rotation, and scaling. The six affine moment invariants used for the present work are those defined in [28]; they give a total of 6 features (F13–F18) for each handwritten digit image.

3.5. The Legendre Moment

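The 2D Legendre moment [29] of order $(p+q)$ of an object with intensity function $f(x, y)$ is defined as
$$L_{pq} = \frac{(2p+1)(2q+1)}{4}\int_{-1}^{1}\int_{-1}^{1} P_p(x)\,P_q(y)\,f(x, y)\,dx\,dy, \tag{14}$$
where the kernel function $P_p(x)$ denotes the $p$th-order Legendre polynomial and is given by
$$P_p(x) = \sum_{\substack{k=0\\ p-k\ \text{even}}}^{p} c_{p,k}\,x^k, \tag{15}$$
where
$$c_{p,k} = (-1)^{(p-k)/2}\,\frac{(p+k)!}{2^p\left(\frac{p-k}{2}\right)!\left(\frac{p+k}{2}\right)!\,k!}. \tag{16}$$
Since the Legendre polynomials are orthogonal over the interval $[-1, 1]$ [20], a square image of $N \times N$ pixels with intensity function $f(i, j)$, with $1 \le i \le N$, $1 \le j \le N$, must be scaled to lie within the region $-1 < x < 1$, $-1 < y < 1$. The first 10 Legendre polynomials are plotted in Figures 2(a)-2(b). When an analog image is digitized to its discrete form, the 2D Legendre moments $L_{pq}$ defined by (14) are usually approximated by the formula
$$\hat{L}_{pq} = \frac{(2p+1)(2q+1)}{N^2}\sum_{i=1}^{N}\sum_{j=1}^{N} P_p(x_i)\,P_q(y_j)\,f(x_i, y_j), \tag{17}$$
where $x_i = (2i - N - 1)/N$ and $y_j = (2j - N - 1)/N$ are the pixel centers and, for a binary image, $f(x_i, y_j)$ is given as
$$f(x_i, y_j) = \begin{cases} 1, & \text{if pixel } (i, j) \text{ belongs to the digit},\\ 0, & \text{otherwise}. \end{cases} \tag{18}$$
As indicated by Liao and Pawlak [30], (17) is not a very accurate approximation of (14). For better accuracy, they proposed the following approximated form:
$$\tilde{L}_{pq} = \sum_{i=1}^{N}\sum_{j=1}^{N} h_{pq}(x_i, y_j)\,f(x_i, y_j), \tag{19}$$
where
$$h_{pq}(x_i, y_j) = \frac{(2p+1)(2q+1)}{4}\int_{x_i - \Delta x/2}^{x_i + \Delta x/2}\int_{y_j - \Delta y/2}^{y_j + \Delta y/2} P_p(x)\,P_q(y)\,dx\,dy, \tag{20}$$
with $\Delta x = x_i - x_{i-1} = 2/N$ and $\Delta y = y_j - y_{j-1} = 2/N$.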
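The explicit Legendre coefficients and the simple pixel-centre approximation of $L_{pq}$ can be sketched as follows. This assumes an $N \times N$ image given as a list of rows, with the column index mapped to $x_i$ and the row index to $y_j$; the ten orders printed ($p + q \le 3$) are only an illustration, since the exact orders behind F19–F28 are not listed in this section.

```python
from math import factorial


def legendre_coeff(p, k):
    """c_{p,k}: coefficient of x^k in P_p(x); zero when p - k is odd."""
    if (p - k) % 2:
        return 0.0
    return ((-1) ** ((p - k) // 2) * factorial(p + k)
            / (2 ** p * factorial((p - k) // 2) * factorial((p + k) // 2) * factorial(k)))


def legendre_poly(p, x):
    """P_p(x) evaluated from its explicit coefficients."""
    return sum(legendre_coeff(p, k) * x ** k for k in range(p + 1))


def legendre_moment(img, p, q):
    """Pixel-centre approximation of L_pq for an N x N image mapped onto [-1, 1]^2.

    Pixel (i, j), 1-based, is sampled at x_i = (2i - N - 1) / N (column)
    and y_j = (2j - N - 1) / N (row).
    """
    n = len(img)
    total = 0.0
    for j, row in enumerate(img, start=1):
        y_j = (2 * j - n - 1) / n
        for i, value in enumerate(row, start=1):
            x_i = (2 * i - n - 1) / n
            total += legendre_poly(p, x_i) * legendre_poly(q, y_j) * value
    return (2 * p + 1) * (2 * q + 1) / n ** 2 * total


pattern = [[0, 1, 1, 0],
           [1, 0, 0, 1],
           [1, 0, 0, 1],
           [0, 1, 1, 0]]
orders = [(p, s - p) for s in range(4) for p in range(s + 1)]   # ten orders, p + q <= 3
print([round(legendre_moment(pattern, p, q), 4) for p, q in orders])
```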

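To evaluate the double integral in (20), Liao and Pawlak proposed an alternative extended Simpson rule; these values are then used to calculate the 2D Legendre moments defined by (19). This method therefore requires a large number of computing operations. Substituting the polynomial form of $P_p(x)$ into the definition of $L_{pq}$ shows that $L_{pq}$ can also be expressed as a linear combination of the geometric moments $m_{ij}$, with $i \le p$, $j \le q$:
$$L_{pq} = \frac{(2p+1)(2q+1)}{4}\sum_{i=0}^{p}\sum_{j=0}^{q} c_{p,i}\,c_{q,j}\,m_{ij},$$
where $c_{p,i}$ is the coefficient of $x^i$ in $P_p(x)$ and $m_{ij}$ is the geometric moment of $f$ computed over $[-1, 1] \times [-1, 1]$.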
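The linear-combination form can be checked numerically. In the sketch below (an illustration under our own assumptions: pixel centres on a uniform grid over $[-1, 1]$ and each pixel weighted by its area $(2/N)^2$), $L_{pq}$ is computed once from discrete geometric moments and once by evaluating the Legendre polynomials at the pixel centres with Bonnet's recurrence; the two agree up to rounding, as the identity predicts.

```python
from math import factorial


def legendre_coeff(p, k):
    """c_{p,k}: coefficient of x^k in the Legendre polynomial P_p(x)."""
    if (p - k) % 2:
        return 0.0
    return ((-1) ** ((p - k) // 2) * factorial(p + k)
            / (2 ** p * factorial((p - k) // 2) * factorial((p + k) // 2) * factorial(k)))


def legendre_by_recurrence(p, x):
    """P_p(x) via Bonnet's recurrence (k + 1) P_{k+1} = (2k + 1) x P_k - k P_{k-1}."""
    if p == 0:
        return 1.0
    prev, cur = 1.0, x
    for k in range(1, p):
        prev, cur = cur, ((2 * k + 1) * x * cur - k * prev) / (k + 1)
    return cur


def pixel_centres(n):
    """Centres of n equal cells covering [-1, 1]."""
    return [(2 * i - n - 1) / n for i in range(1, n + 1)]


def geometric_moment(img, i, j):
    """Discrete m_ij over [-1, 1]^2, each pixel weighted by its area (2 / N)^2."""
    c = pixel_centres(len(img))
    area = (2 / len(img)) ** 2
    return area * sum(c[col] ** i * c[row] ** j * v
                      for row, line in enumerate(img)
                      for col, v in enumerate(line))


def legendre_from_geometric(img, p, q):
    """L_pq = (2p+1)(2q+1)/4 * sum_i sum_j c_{p,i} c_{q,j} m_ij."""
    return (2 * p + 1) * (2 * q + 1) / 4 * sum(
        legendre_coeff(p, i) * legendre_coeff(q, j) * geometric_moment(img, i, j)
        for i in range(p + 1) for j in range(q + 1))


def legendre_direct(img, p, q):
    """Pixel-centre approximation with P_p, P_q evaluated by the recurrence."""
    n = len(img)
    c = pixel_centres(n)
    total = sum(legendre_by_recurrence(p, c[col]) * legendre_by_recurrence(q, c[row]) * v
                for row, line in enumerate(img) for col, v in enumerate(line))
    return (2 * p + 1) * (2 * q + 1) / n ** 2 * total


pattern = [[0, 1, 1, 0, 0],
           [1, 0, 0, 1, 0],
           [0, 0, 1, 0, 0],
           [0, 1, 0, 0, 0],
           [1, 1, 1, 1, 1]]
for p, q in [(0, 0), (1, 2), (2, 1), (3, 0), (2, 2)]:
    print(p, q, legendre_from_geometric(pattern, p, q), legendre_direct(pattern, p, q))
```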

A set of 10 Legendre moments (F19–F28) can also be derived based on the set of invariant moments found in the previous subsection.

3.6. Zernike Moments

Zernike polynomials are an orthogonal series of basis functions normalized over the unit circle. The complexity of these polynomials increases with the polynomial order [31]. To calculate the Zernike moments, the image (or region of interest) is first mapped to the unit disc using polar coordinates, where the center of the image is the origin of the unit disc. Pixels falling outside the unit disc are not considered. The coordinates are then described by the length of the vector from the origin to the coordinate point and by its angle. The mapping from Cartesian to polar coordinates is
$$x = \rho\cos\theta, \qquad y = \rho\sin\theta,$$
where
$$\rho = \sqrt{x^2 + y^2}, \qquad \theta = \tan^{-1}\left(\frac{y}{x}\right).$$
An important attribute of the geometric representations of Zernike polynomials is that lower order polynomials approximate the global features of the shape/surface, while higher order polynomials capture local shape/surface features. Zernike moments are a class of orthogonal moments and have been shown to be effective for image representation.

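Zernike introduced a set of complex polynomials that forms a complete orthogonal set over the interior of the unit circle, that is, $x^2 + y^2 \le 1$. Let the set of these polynomials be denoted by $\{V_{nm}(x, y)\}$. The form of these polynomials is
$$V_{nm}(x, y) = V_{nm}(\rho, \theta) = R_{nm}(\rho)\exp(jm\theta),$$
where
$n$: positive integer or zero,
$m$: positive and negative integers subject to the constraints $n - |m|$ even and $|m| \le n$,
$\rho$: length of the vector from the origin to the pixel $(x, y)$,
$\theta$: angle between the vector $\rho$ and the $x$-axis in the counterclockwise direction.
As mentioned above, the complex Zernike moments of order $n$ with repetition $m$ for a continuous image function $f(x, y)$ are defined as
$$A_{nm} = \frac{n+1}{\pi}\iint_{x^2 + y^2 \le 1} f(x, y)\,V_{nm}^{*}(\rho, \theta)\,dx\,dy$$
in the image plane, where $^{*}$ indicates the complex conjugate. Note that, for the moments to be orthogonal, the image must be scaled within a unit circle centered at the origin, and
$$A_{nm} = \frac{n+1}{\pi}\int_0^{2\pi}\int_0^1 f(\rho, \theta)\,R_{nm}(\rho)\exp(-jm\theta)\,\rho\,d\rho\,d\theta$$
in polar coordinates. The Zernike moment of the image rotated by an angle $\alpha$, in the same coordinates, is given by
$$A_{nm}^{r} = \frac{n+1}{\pi}\int_0^{2\pi}\int_0^1 f(\rho, \theta - \alpha)\,R_{nm}(\rho)\exp(-jm\theta)\,\rho\,d\rho\,d\theta.$$
By the change of variable $\theta_1 = \theta - \alpha$,
$$A_{nm}^{r} = A_{nm}\exp(-jm\alpha). \tag{28}$$
Equation (28) shows that Zernike moments have simple rotational transformation properties; each Zernike moment merely acquires a phase shift on rotation. This simple property leads to the conclusion that the magnitudes of the Zernike moments of a rotated image function remain identical to those before rotation. Thus, the magnitude of the Zernike moment, $|A_{nm}|$, can be taken as a rotation invariant feature of the underlying image function. The real-valued radial polynomial is defined as
$$R_{nm}(\rho) = \sum_{s=0}^{(n-|m|)/2} (-1)^s\,\frac{(n-s)!}{s!\left(\frac{n+|m|}{2} - s\right)!\left(\frac{n-|m|}{2} - s\right)!}\,\rho^{n-2s},$$
where $n - |m|$ is even and $|m| \le n$.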
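The radial polynomial and a discrete version of $A_{nm}$ can be sketched as below, using the pixel-inclusion rule of the Zernike computation (a pixel is used only if its centre lies inside the unit disk). The mapping of pixel centres to $[-1, 1]^2$, the area weight $(2/N)^2$, the helper `rotate90`, and the sample pattern are illustrative assumptions. Rotating the image by 90° changes only the phase of each moment, so the magnitudes match.

```python
import cmath
import math
from math import factorial


def radial_poly(n, m, rho):
    """R_nm(rho) for n - |m| even and |m| <= n."""
    m = abs(m)
    return sum(
        (-1) ** s * factorial(n - s)
        / (factorial(s) * factorial((n + m) // 2 - s) * factorial((n - m) // 2 - s))
        * rho ** (n - 2 * s)
        for s in range((n - m) // 2 + 1)
    )


def zernike_moment(img, n, m):
    """Discrete A_nm of a square image; pixels centred outside the unit disk are skipped."""
    size = len(img)
    area = (2 / size) ** 2          # each pixel covers (2/N)^2 of [-1, 1]^2
    total = 0j
    for r, row in enumerate(img):
        y = (2 * r - size + 1) / size
        for c, value in enumerate(row):
            x = (2 * c - size + 1) / size
            rho = math.hypot(x, y)
            if value == 0 or rho > 1.0:
                continue
            theta = math.atan2(y, x)
            total += value * radial_poly(n, m, rho) * cmath.exp(-1j * m * theta)
    return (n + 1) / math.pi * total * area


def rotate90(img):
    """Rotate a square image by 90 degrees: new[y][x] = old[N - 1 - x][y]."""
    n = len(img)
    return [[img[n - 1 - x][y] for x in range(n)] for y in range(n)]


digit = [[0, 0, 1, 1, 1, 1, 0, 0],
         [0, 1, 1, 0, 0, 1, 1, 0],
         [0, 0, 0, 0, 0, 1, 1, 0],
         [0, 0, 0, 0, 1, 1, 0, 0],
         [0, 0, 0, 1, 1, 0, 0, 0],
         [0, 0, 1, 1, 0, 0, 0, 0],
         [0, 1, 1, 0, 0, 0, 0, 0],
         [0, 1, 1, 1, 1, 1, 1, 0]]   # a rough "2"
orders = [(2, 2), (3, 1), (4, 2), (5, 3), (6, 0)]
mags = [abs(zernike_moment(digit, n, m)) for n, m in orders]
mags_rot = [abs(zernike_moment(rotate90(digit), n, m)) for n, m in orders]
print(mags)
print(mags_rot)
```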

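Zernike moments may also be derived from conventional (geometric) moments as follows:
$$A_{nm} = \frac{n+1}{\pi}\sum_{\substack{k=|m|\\ n-k\ \text{even}}}^{n} B_{n|m|k}\sum_{i=0}^{t}\sum_{l=0}^{|m|} w^{l}\binom{t}{i}\binom{|m|}{l}\,m_{k-2i-l,\;2i+l},$$
where $t = (k - |m|)/2$, $w = -j$ for $m > 0$ and $w = +j$ for $m < 0$ (for $m = 0$ only the $l = 0$ term occurs), $m_{pq}$ denotes the geometric moments of the image mapped into the unit disk, and
$$B_{n|m|k} = (-1)^{(n-k)/2}\,\frac{\left(\frac{n+k}{2}\right)!}{\left(\frac{n-k}{2}\right)!\left(\frac{k+|m|}{2}\right)!\left(\frac{k-|m|}{2}\right)!}$$
are the coefficients of the radial polynomial, $R_{nm}(\rho) = \sum_k B_{n|m|k}\,\rho^k$. Zernike moments may be more easily derived from rotational moments, $D_{km}$, by
$$A_{nm} = \frac{n+1}{\pi}\sum_{\substack{k=|m|\\ n-k\ \text{even}}}^{n} B_{n|m|k}\,D_{km}, \qquad D_{km} = \int_0^{2\pi}\int_0^1 \rho^{k}\exp(-jm\theta)\,f(\rho, \theta)\,\rho\,d\rho\,d\theta.$$
When computing the Zernike moments, if the center of a pixel falls inside the border of the unit disk $x^2 + y^2 \le 1$, the pixel is used in the computation; otherwise, it is discarded. Therefore, the area covered by the moment computation is not exactly the area of the unit disk. The advantages of Zernike moments can be summarized as follows:
(1) The magnitude of a Zernike moment is rotation invariant.
(2) Zernike moments are robust to noise and shape variations to some extent.
(3) Since the basis is orthogonal, they carry minimal redundant information.
(4) An image can be better described by a small set of its Zernike moments than by other types of moments, such as geometric moments.
(5) A relatively small set of Zernike moments can characterize the global shape of a pattern: lower order moments represent the global shape, whereas higher order moments represent the details.
Therefore, we choose Zernike moments as our shape descriptor in the digit recognition process. Table 1 lists the rotation invariant Zernike moment features (F29–F64) and their corresponding numbers from order 0 to order 10 used for the present work.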
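To illustrate the rotational-moment route, the sketch below computes $A_{nm}$ both directly from the radial polynomial and as $\frac{n+1}{\pi}\sum_k B_{n|m|k} D_{km}$, with $D_{km}$ approximated as a plain sum over pixel centres inside the unit disk (the pixel-to-disk mapping, the area weight, and all function names are illustrative assumptions). The helper `zernike_orders` enumerates the $(n, m)$ pairs with $0 \le m \le n \le 10$ and $n - m$ even; there are 36 of them, matching the count of features F29–F64, although the moments actually used are the ones listed in Table 1.

```python
import cmath
import math
from math import factorial


def b_coeff(n, m, k):
    """B_{n|m|k}: coefficient of rho^k in R_nm(rho) (zero unless n - k even, |m| <= k <= n)."""
    m = abs(m)
    if (n - k) % 2 or not m <= k <= n:
        return 0.0
    return ((-1) ** ((n - k) // 2) * factorial((n + k) // 2)
            / (factorial((n - k) // 2) * factorial((k + m) // 2) * factorial((k - m) // 2)))


def radial_poly(n, m, rho):
    """R_nm(rho) from the s-sum definition, used as an independent check."""
    m = abs(m)
    return sum((-1) ** s * factorial(n - s)
               / (factorial(s) * factorial((n + m) // 2 - s) * factorial((n - m) // 2 - s))
               * rho ** (n - 2 * s) for s in range((n - m) // 2 + 1))


def disk_samples(img):
    """Yield (rho, theta, value, area) for non-zero pixels centred inside the unit disk."""
    size = len(img)
    area = (2 / size) ** 2
    for r, row in enumerate(img):
        for c, value in enumerate(row):
            x, y = (2 * c - size + 1) / size, (2 * r - size + 1) / size
            rho = math.hypot(x, y)
            if value and rho <= 1.0:
                yield rho, math.atan2(y, x), value, area


def rotational_moment(img, k, m):
    """Discrete D_km = sum rho^k exp(-j m theta) f(rho, theta) * pixel area."""
    return sum(rho ** k * cmath.exp(-1j * m * theta) * v * a
               for rho, theta, v, a in disk_samples(img))


def zernike_from_rotational(img, n, m):
    return (n + 1) / math.pi * sum(b_coeff(n, m, k) * rotational_moment(img, k, m)
                                   for k in range(abs(m), n + 1))


def zernike_direct(img, n, m):
    return (n + 1) / math.pi * sum(radial_poly(n, m, rho) * cmath.exp(-1j * m * theta) * v * a
                                   for rho, theta, v, a in disk_samples(img))


def zernike_orders(max_order):
    """(n, m) pairs with 0 <= m <= n <= max_order and n - m even."""
    return [(n, m) for n in range(max_order + 1) for m in range(n + 1) if (n - m) % 2 == 0]


sample = [[0, 1, 1, 1, 1, 0],
          [1, 1, 0, 0, 1, 1],
          [0, 0, 0, 1, 1, 0],
          [0, 0, 1, 1, 0, 0],
          [0, 1, 1, 0, 0, 0],
          [1, 1, 1, 1, 1, 1]]
print(len(zernike_orders(10)))  # 36
print(zernike_from_rotational(sample, 4, 2), zernike_direct(sample, 4, 2))
```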

The Zernike moment features defined above are only rotation invariant. To obtain scale and translation invariance, the digit image is first normalized using its regular (geometric) moments, and the rotation invariant Zernike features are then extracted from the scale- and translation-normalized image.

3.7. Complex Moments

The notion of complex moments was introduced in [32] as a simple and straightforward technique for deriving a set of invariant moments. The two-dimensional complex moments of order $(p, q)$ for the image function $f(x, y)$ are defined by
$$C_{pq} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x + jy)^p (x - jy)^q f(x, y)\,dx\,dy,$$
where $p$ and $q$ are nonnegative integers and $j = \sqrt{-1}$. Some advantages of complex moments are as follows:
(1) When central complex moments are taken as features, the effect of the image's lateral displacement is eliminated.
(2) A set of complex moment invariants can be derived that are invariant to rotation of the object.
(3) Since complex moments are an intermediate step between ordinary moments and moment invariants, they are relatively simple to compute and more powerful than other moment features in pattern classification problems.
The complex moments of order $(p+q)$ are linear combinations, with complex coefficients, of all the geometric moments $m_{rs}$ satisfying $r + s = p + q$. In polar coordinates, the complex moments of order $(p+q)$ can be written as
$$C_{pq} = \int_0^{2\pi}\int_0^{\infty} \rho^{p+q}\exp\left(j(p-q)\theta\right) f(\rho, \theta)\,\rho\,d\rho\,d\theta,$$
where $p + q$ and $p - q$ denote the order and repetition of the complex moments, respectively. If the complex moments of the original image and of the rotated image in the same polar coordinates are denoted by $C_{pq}$ and $C_{pq}^{r}$, the relationship [33] between them is
$$C_{pq}^{r} = C_{pq}\exp\left(j(p-q)\alpha\right),$$
where $\alpha$ is the angle by which the original image is rotated. The complex moment features therefore represent properties that are invariant to lateral displacement and rotation. Based on the definition of moment invariants, as the image is rotated, each complex moment goes through all possible phases of a complex number while its magnitude remains unchanged. If the exponential factor of the complex moment is canceled out, its absolute invariant value is obtained, which is invariant to the rotation of the image. The rotation invariant complex moment features (F65–F130) and their corresponding numbers from order 0 to order 10 used for the present work are listed in Table 2.
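A minimal sketch of central complex moments and their rotation-invariant magnitudes, assuming $z = x + jy$ with $x$ the column and $y$ the row index, moments taken about the centroid, and all pairs with $p + q \le 10$ kept. There are 66 such pairs, matching the count of features F65–F130, but the exact selection is the one in Table 2. For a real image $C_{qp} = \overline{C_{pq}}$, so $|C_{pq}| = |C_{qp}|$ and the magnitudes for $p \ne q$ come in equal pairs.

```python
def complex_moment(img, p, q):
    """Central complex moment C_pq = sum (z - z_c)^p conj(z - z_c)^q f, with z = x + jy."""
    pixels = [(complex(x, y), v) for y, row in enumerate(img) for x, v in enumerate(row) if v]
    m00 = sum(v for _, v in pixels)
    zc = sum(z * v for z, v in pixels) / m00
    return sum((z - zc) ** p * (z - zc).conjugate() ** q * v for z, v in pixels)


def complex_moment_features(img, max_order=10):
    """|C_pq| for all p + q <= max_order (66 values when max_order = 10)."""
    return [abs(complex_moment(img, p, q))
            for p in range(max_order + 1) for q in range(max_order + 1 - p)]


def rotate90(img):
    """Rotate a square image by 90 degrees: new[y][x] = old[N - 1 - x][y]."""
    n = len(img)
    return [[img[n - 1 - x][y] for x in range(n)] for y in range(n)]


sample = [[0, 1, 1, 1, 0],
          [1, 0, 0, 0, 1],
          [0, 0, 1, 1, 0],
          [0, 1, 0, 0, 0],
          [1, 1, 1, 1, 1]]
features = complex_moment_features(sample)
features_rot = complex_moment_features(rotate90(sample))
print(len(features))  # 66
```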

Finally, a feature vector of 130 moment based features is calculated from each handwritten numeral image of the five scripts. The overall moment based feature set used in the present work is summarized in Table 3.

4. Experimental Study and Analysis

In this section, we present detailed experimental results illustrating the suitability of the moment based approach for handwritten digit recognition. All experiments are implemented in MATLAB 2010 under Windows XP on an Intel Core2 Duo 2.4 GHz processor with 1 GB of RAM and are performed on gray-scale digit images. The accuracy, used as the assessment criterion for measuring the performance of the proposed system, is expressed as
$$\text{Accuracy}\ (\%) = \frac{\text{number of correctly classified digit samples}}{\text{total number of digit samples}} \times 100.$$
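The accuracy measure is a simple ratio; a minimal sketch (the function name is ours) that also guards against mismatched or empty label lists:

```python
def accuracy(y_true, y_pred):
    """Percentage of test samples whose predicted label equals the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label sequences of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return 100.0 * correct / len(y_true)


print(accuracy([3, 1, 4, 1, 5], [3, 1, 4, 7, 5]))  # 80.0
```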

4.1. Detailed Dataset Description

Handwritten numerals from five popular scripts, namely, Indo-Arabic, Bangla, Devanagari, Roman, and Telugu, are used in the experiments to investigate the effectiveness of moment based feature sets compared with conventional features. Indo-Arabic (Eastern Arabic) numerals are widely used in the Middle East and also in the Indian subcontinent. Devanagari and Bangla are ranked as the two most popular scripts (in terms of the number of native speakers) in the Indian subcontinent [34]. The Roman script, which originally evolved from the Greek alphabet, is used all over the world. Telugu, one of the oldest and most popular South Indian languages, is spoken by more than 74 million people [34] and ranks third by the number of native speakers in India.

The present approach is tested on the database named CMATERdb3, where CMATER stands for Center for Microprocessor Application for Training Education and Research, a research laboratory in the Computer Science and Engineering Department of Jadavpur University, India, where the current research took place; db stands for database, and the numeric value 3 denotes the handwritten digit recognition database in that repository. Testing is currently done on four versions of CMATERdb3, namely, CMATERdb3.1.1, CMATERdb3.2.1, CMATERdb3.3.1, and CMATERdb3.4.1, which are the handwritten digit databases for four major scripts, namely, Bangla, Devanagari, Indo-Arabic, and Telugu, respectively.

Each digit image is first preprocessed using the basic operations of skew correction and morphological filtering [25] and then binarized using a global threshold, computed adaptively for each image as the average of its minimum and maximum intensities. Noisy pixels remaining in the binarized digit images are removed using a Gaussian filter [25]. The well-known Canny edge detection algorithm [25] is then applied to obtain smooth edges of the binarized digit images. Finally, the bounding rectangular box of each digit image is normalized to 32 × 32 pixels. The database is freely available on the CMATER website (http://www.cmaterju.org/cmaterdb.htm) and at http://code.google.com/p/cmaterdb/.
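The thresholding and size-normalization steps can be sketched as follows. This is only a partial illustration under stated assumptions: skew correction, morphological filtering, Gaussian filtering, and Canny edge detection are omitted; ink is assumed darker than the background; and the bounding box is resized to 32 × 32 with nearest-neighbour sampling, which stretches it without preserving the aspect ratio (the interpolation method is not specified in the text).

```python
def binarize(gray, ink_is_dark=True):
    """Per-image global threshold t = (min + max) / 2; returns 1 for ink, 0 for background."""
    lo = min(min(row) for row in gray)
    hi = max(max(row) for row in gray)
    t = (lo + hi) / 2
    if ink_is_dark:
        return [[1 if v < t else 0 for v in row] for row in gray]
    return [[1 if v > t else 0 for v in row] for row in gray]


def bounding_box(binary):
    """(top, bottom, left, right) of the ink pixels, inclusive."""
    rows = [r for r, row in enumerate(binary) if any(row)]
    if not rows:
        raise ValueError("image contains no ink pixels")
    cols = [c for c in range(len(binary[0])) if any(row[c] for row in binary)]
    return rows[0], rows[-1], cols[0], cols[-1]


def normalize_box(binary, size=32):
    """Nearest-neighbour resampling of the bounding box to size x size pixels."""
    top, bottom, left, right = bounding_box(binary)
    h, w = bottom - top + 1, right - left + 1
    return [[binary[top + (r * h) // size][left + (c * w) // size] for c in range(size)]
            for r in range(size)]


gray = [[200] * 6 for _ in range(6)]
for r in range(1, 4):
    for c in range(2, 4):
        gray[r][c] = 10          # a dark 3 x 2 stroke on a light background
binary = binarize(gray)
digit32 = normalize_box(binary)
print(bounding_box(binary), len(digit32), len(digit32[0]))  # (1, 3, 2, 3) 32 32
```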

A dataset of 3000 digit samples is considered for each of the Devanagari, Indo-Arabic, and Telugu scripts; for each of these datasets, 2000 samples are used for training and the remaining samples for testing. For handwritten Bangla digits, a dataset of 6000 samples is formed by selecting 600 samples for each of the 10 digit classes, from which a training set of 4000 samples and a test set of 2000 samples are chosen with an equal number of samples from each class. For Roman numerals, a dataset of 6000 training samples is formed by random selection from the standard MNIST [7] training set of 60,000 samples; in the same way, 4000 digit samples are selected from the MNIST test set of 10,000 samples. These digit samples are enclosed in a minimum bounding square and normalized to 32 × 32 pixels. Typical handwritten digit samples from the abovementioned databases are shown in Figure 3.
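The Bangla split (600 samples per class, divided equally per class into 4000 training and 2000 test samples, that is, 400 and 200 per class) can be sketched as a stratified random split. The function name and the fixed seed are illustrative assumptions; the text does not state how the samples were drawn.

```python
import random
from collections import defaultdict


def stratified_split(labels, n_train, n_test, seed=0):
    """Pick n_train + n_test random sample indices per class; return (train, test)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    train, test = [], []
    for label in sorted(by_class):
        indices = list(by_class[label])
        if len(indices) < n_train + n_test:
            raise ValueError(f"class {label!r} has only {len(indices)} samples")
        rng.shuffle(indices)
        train.extend(indices[:n_train])
        test.extend(indices[n_train:n_train + n_test])
    return train, test


# Bangla setting: 10 classes x 600 samples -> 4000 training / 2000 test samples.
labels = [digit for digit in range(10) for _ in range(600)]
train_idx, test_idx = stratified_split(labels, n_train=400, n_test=200)
print(len(train_idx), len(test_idx))  # 4000 2000
```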

4.2. Recognition Process

To establish the effectiveness of the proposed approach, comprehensive experiments are conducted on the five aforementioned datasets. A total of 6,000 numerals (for the Devanagari, Indo-Arabic, and Telugu scripts) are used for training, whereas the remaining 3000 numerals (1000 from each script) are used for testing. For the Bangla and Roman scripts, a total of 8,000 numerals (4000 from each script) are used for training, whereas the remaining 4,000 numerals (2000 from each script) are used for testing. The designed feature set is individually applied to eight well-known classifiers, namely, Naïve Bayes, Bayes Net, MLP, SVM, Random Forest, Bagging, Multiclass Classifier, and Logistic. For the present work, these classifiers are designed with the following parameters:
Naïve Bayes: Naïve Bayes classifier; for details, refer to [35].
Bayes Net: Estimator = SimpleEstimator-A 0.5, search algorithm = K2.
MLP: Learning Rate = 0.3, Momentum = 0.2, Number of Epochs = 1000, minerror = 0.02.
SVM: Support Vector Machine using a radial basis function kernel; for details, refer to [36].
Random Forest: ensemble classifier consisting of many decision trees that outputs the class that is the mode of the classes output by the individual trees; for details, refer to [37].
Bagging: Bagging classifier; for details, refer to [38].
Multiclass Classifier: Method = “1 against all,” randomWidthFactor = 2.0, seed = 1.
Logistic: LogitBoost with simple regression functions as base learners; for details, refer to [39].
The design parameters of the classifiers are chosen as typical values used in the literature or from experience. The classifiers are not specifically tuned for the dataset at hand, even though they might achieve better performance with another parameter set, since the goal is to design an automated handwritten digit recognition system based on the chosen set of classifiers.

The digit recognition performances of the present technique using each of these classifiers and their corresponding success rates at the 95% confidence level are shown in Figures 4(a) and 4(b), respectively. As can be seen from Figure 4, the MLP classifier achieves the highest recognition accuracies: 99.3%, 99.5%, 98.92%, 99.77%, and 98.8% on Indo-Arabic, Bangla, Devanagari, Roman, and Telugu scripts, respectively. The performance analysis also involves two timing parameters, namely, Model Building Time (MBT) and Recognition Time (RT). MBT is the time required to train the system on the given training samples, whereas RT is the time required to recognize the given test samples. The MBT and RT of the abovementioned classifiers on all five databases are shown in Figures 5(a) and 5(b).
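MBT and RT can be measured by timing the training call and the recognition call separately. The sketch below does this in plain Python; `NearestCentroid` is a hypothetical stand-in for any of the eight classifiers, and `evaluate` and `wald_interval` are likewise illustrative helpers. How the success rates at the 95% confidence level are obtained is not detailed here, so the normal-approximation (Wald) interval below is only one common choice, assumed for illustration:

```python
import math
import time


class NearestCentroid:
    """Hypothetical stand-in classifier: predicts the class whose mean vector is closest."""

    def fit(self, X, y):
        sums, counts = {}, {}
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            for i, value in enumerate(x):
                acc[i] += value
            counts[label] = counts.get(label, 0) + 1
        self.centroids = {c: [s / counts[c] for s in acc] for c, acc in sums.items()}
        return self

    def predict(self, X):
        def dist2(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))
        return [min(self.centroids, key=lambda c: dist2(x, self.centroids[c])) for x in X]


def evaluate(clf, X_train, y_train, X_test, y_test):
    """Return (accuracy, MBT, RT), with both times in seconds."""
    start = time.perf_counter()
    clf.fit(X_train, y_train)  # model building
    mbt = time.perf_counter() - start

    start = time.perf_counter()
    predictions = clf.predict(X_test)  # recognition
    rt = time.perf_counter() - start

    accuracy = sum(p == t for p, t in zip(predictions, y_test)) / len(y_test)
    return accuracy, mbt, rt


def wald_interval(accuracy, n, z=1.96):
    """Normal-approximation 95% confidence interval for an accuracy measured on n test samples."""
    half_width = z * math.sqrt(accuracy * (1 - accuracy) / n)
    return max(0.0, accuracy - half_width), min(1.0, accuracy + half_width)
```

For example, with 1,000 Indo-Arabic test samples and an accuracy of 99.3%, `wald_interval(0.993, 1000)` gives roughly (0.988, 0.998).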

4.3. Statistical Significance Tests

Statistical significance testing is an essential way of validating the performance of multiple classifiers over multiple datasets. To do so, we have performed the safe and robust nonparametric Friedman test [40], together with the corresponding post hoc tests, on the Indo-Arabic script database. For the present experimental setup, the number of datasets ($N$) and the number of classifiers ($k$) are set to 12 and 8, respectively. These datasets are chosen randomly from the test set. The performances of the classifiers on the different datasets are shown in Table 4. On the basis of these performances, the classifiers are ranked for each dataset separately: the best performing algorithm gets rank 1, the second best gets rank 2, and so on (see Table 4). In case of ties, average ranks are assigned to the tied classifiers.
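The per-dataset ranking with averaged ties can be sketched as follows. `rank_row` is a hypothetical helper that takes one dataset's accuracies (one value per classifier) and returns the ranks, with rank 1 for the highest accuracy:

```python
def rank_row(scores):
    """Rank classifiers on one dataset: highest score gets rank 1; ties share the average rank."""
    order = sorted(range(len(scores)), key=lambda j: -scores[j])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Extend the block of tied scores that starts at sorted position i.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        average = ((i + 1) + (j + 1)) / 2  # mean of the 1-based positions i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = average
        i = j + 1
    return ranks
```

For instance, accuracies [90.0, 95.0, 95.0, 80.0] yield the ranks [3.0, 1.5, 1.5, 4.0]: the two tied classifiers share positions 1 and 2 and each receives 1.5.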

Let $r_i^j$ be the rank of the $j$th classifier on the $i$th dataset. Then, the mean rank of the $j$th classifier over all the $N$ datasets is computed as follows:

$$R_j = \frac{1}{N} \sum_{i=1}^{N} r_i^j$$

The null hypothesis states that all the classifiers are equivalent, so their mean ranks should be equal. To test it, the Friedman statistic [40] is computed as follows:

$$\chi_F^2 = \frac{12N}{k(k+1)} \left[ \sum_{j=1}^{k} R_j^2 - \frac{k(k+1)^2}{4} \right] \qquad (37)$$

Under the current experimentation, this statistic is distributed according to $\chi^2$ with $k-1$ (=7) degrees of freedom. Using (37), the value of $\chi_F^2$ is calculated as 30.46. From the table of critical values (see any standard statistical book), the critical value of $\chi^2$ with 7 degrees of freedom is 14.0671 for $\alpha = 0.05$ (where $\alpha$ is known as the level of significance). Since the computed $\chi_F^2$ is well above this critical value, the null hypothesis is rejected.
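The mean ranks and the Friedman statistic (37) can be computed directly from the rank table. A minimal sketch follows; `friedman_statistic` is a hypothetical helper, and the small rank matrix used to check it is invented for illustration (it is not the Table 4 data):

```python
def friedman_statistic(rank_matrix):
    """Mean ranks R_j and the Friedman chi-square for an N x k matrix of ranks.

    rank_matrix[i][j] is the rank r_i^j of classifier j on dataset i.
    chi2_F = 12N / (k(k+1)) * (sum_j R_j^2 - k(k+1)^2 / 4)
    """
    n = len(rank_matrix)
    k = len(rank_matrix[0])
    mean_ranks = [sum(row[j] for row in rank_matrix) / n for j in range(k)]
    chi2 = 12 * n / (k * (k + 1)) * (sum(r * r for r in mean_ranks) - k * (k + 1) ** 2 / 4)
    return mean_ranks, chi2


# Critical value quoted in the text for k - 1 = 7 degrees of freedom and alpha = 0.05.
CHI2_CRITICAL_7DF_005 = 14.0671
```

For example, three datasets on which three classifiers are always ranked 1, 2, 3 give mean ranks [1.0, 2.0, 3.0] and $\chi_F^2 = 6.0$, the maximum value $N(k-1)$ reached under perfect agreement. With eight classifiers, the null hypothesis is rejected at $\alpha = 0.05$ whenever the returned statistic exceeds `CHI2_CRITICAL_7DF_005`.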

Since the Friedman statistic is undesirably conservative, Iman and Davenport derived a better statistic [40] using the following formula:

$$F_F = \frac{(N-1)\chi_F^2}{N(k-1) - \chi_F^2} \qquad (38)$$

$F_F$ is distributed according to the $F$-distribution with $k-1$ (=7) and $(k-1)(N-1)$ (=77) degrees of freedom. Using (38), the value of $F_F$ is calculated as 8.0659. The critical value of $F(7, 77)$ for $\alpha = 0.05$ is 2.147 (see any standard statistical book); since the calculated $F_F$ is well above this critical value, the difference is significant. Thus, both the Friedman and the Iman–Davenport statistics reject the null hypothesis.
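The Iman–Davenport statistic (38) and its degrees of freedom are likewise a short computation. A minimal sketch; `iman_davenport` is a hypothetical helper, and the input values in the usage note are invented for illustration:

```python
def iman_davenport(chi2_f, n_datasets, k):
    """Return F_F = (N-1) chi2_F / (N(k-1) - chi2_F) and its two degrees of freedom."""
    denominator = n_datasets * (k - 1) - chi2_f
    if denominator <= 0:
        # chi2_F reaches its maximum N(k-1) only when every dataset ranks the classifiers identically.
        raise ValueError("F_F is undefined when chi2_F equals N(k - 1)")
    f_f = (n_datasets - 1) * chi2_f / denominator
    df_num = k - 1                        # numerator degrees of freedom
    df_den = (k - 1) * (n_datasets - 1)   # denominator degrees of freedom
    return f_f, df_num, df_den
```

For example, `iman_davenport(15.0, 10, 4)` returns (9.0, 3, 27). With $N = 12$ and $k = 8$ the degrees of freedom are (7, 77), as stated above; the resulting $F_F$ is then compared with the critical value of $F(7, 77)$.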

As the null hypothesis is rejected, a post hoc test known as the Nemenyi test [40] is carried out for pairwise comparisons of the best and worst performing classifiers. The performances of two classifiers are significantly different if the corresponding average ranks differ by at least the critical difference (CD), which is expressed as follows:

$$CD = q_\alpha \sqrt{\frac{k(k+1)}{6N}} \qquad (39)$$

For the Nemenyi test, the value of $q_\alpha$ ($\alpha = 0.05$) for eight classifiers is 3.031 (see Table (a) of [41]). So, using (39), the CD is calculated as $3.031 \times \sqrt{(8 \times 9)/(6 \times 12)}$, that is, 3.031. Since the difference between the mean ranks of the best and worst classifiers is much greater than the CD (see Table 3), we can conclude that there is a significant difference between the performances of the classifiers. For comparing all classifiers with a control classifier (here, MLP), we have applied the Bonferroni-Dunn test [40]. For this test, the CD is calculated using the same (39), but here the value of $q_\alpha$ for eight classifiers is 2.690 (see Table (b) of [41]). So, the CD for the Bonferroni-Dunn test is calculated as $2.690 \times \sqrt{(8 \times 9)/(6 \times 12)}$, that is, 2.690. As the difference between the mean ranks of MLP and every other classifier is greater than this CD (see Table 3), the chosen control classifier performs significantly better than the other classifiers on the Indo-Arabic database. A graphical representation of the abovementioned post hoc tests comparing the eight classifiers on Dataset #1 is shown in Figure 6. Similarly, it can be shown for the Bangla, Devanagari, Roman, and Telugu databases that the chosen classifier (MLP) performs significantly better than the other seven classifiers.
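Equation (39) reduces to simple arithmetic. A minimal sketch with hypothetical helper names that computes the CD from $q_\alpha$, $k$, and $N$ and checks whether two mean ranks differ significantly; the $q_\alpha$ values 3.031 and 2.690 are the ones quoted in the text:

```python
import math

NEMENYI_Q_005_K8 = 3.031          # Nemenyi q_alpha for k = 8 (value quoted in the text)
BONFERRONI_DUNN_Q_005_K8 = 2.690  # Bonferroni-Dunn q_alpha for k = 8 (value quoted in the text)


def critical_difference(q_alpha, k, n_datasets):
    """CD = q_alpha * sqrt(k(k+1) / (6N))."""
    return q_alpha * math.sqrt(k * (k + 1) / (6 * n_datasets))


def significantly_different(mean_rank_a, mean_rank_b, cd):
    """Two classifiers differ significantly if their mean ranks differ by at least CD."""
    return abs(mean_rank_a - mean_rank_b) >= cd


cd_nemenyi = critical_difference(NEMENYI_Q_005_K8, k=8, n_datasets=12)
cd_bonferroni = critical_difference(BONFERRONI_DUNN_Q_005_K8, k=8, n_datasets=12)
```

Because $k(k+1) = 72 = 6N$ for $k = 8$ and $N = 12$, the square-root factor is exactly 1, which is why each CD coincides with its $q_\alpha$.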

4.4. Comparison among Moment Based Features

To justify the feature set used in the present work, different combinations of the six types of moments, namely, geometric moment (F1–F5), moment invariant (F6–F12), affine moment invariant (F13–F18), Legendre moment (F19–F28), Zernike moment (F29–F64), and complex moment (F65–F130), are compared. This is done to measure the discriminating strength of the individual moment features and of their combinations, which carry complementary information. The combinations considered are as follows:
(a) Geometric moment + moment invariant + affine moment invariant (F1–F18).
(b) Legendre moment (F19–F28).
(c) Geometric moment + moment invariant + affine moment invariant + Legendre moment (F1–F28).
(d) Zernike moment (F29–F64).
(e) Geometric moment + moment invariant + affine moment invariant + Legendre moment + Zernike moment (F1–F64).
(f) Legendre moment + Zernike moment (F19–F64).
(g) Complex moment (F65–F130).
(h) Zernike moment + complex moment (F29–F130).
(i) Geometric moment + moment invariant + affine moment invariant + Legendre moment + Zernike moment + complex moment (F1–F130).

The recognition accuracies achieved by the MLP classifier with each of these combinations on the same test set are compared graphically in Figure 7. It can be observed from Figure 7 that the present combination of all six moment types (F1–F130) outperforms all the other combinations.
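The combinations above amount to selecting column ranges from the 130-element feature vector. A minimal sketch, assuming the feature order F1–F130 listed above; the dictionary keys and the helpers `columns_for` and `select_features` are hypothetical names:

```python
# 1-based, inclusive feature index ranges for each moment family (from the list above).
MOMENT_GROUPS = {
    "geometric": (1, 5),
    "moment_invariant": (6, 12),
    "affine_moment_invariant": (13, 18),
    "legendre": (19, 28),
    "zernike": (29, 64),
    "complex": (65, 130),
}

LOW_ORDER = ["geometric", "moment_invariant", "affine_moment_invariant"]

# Combinations (a)-(i) compared in Figure 7.
COMBINATIONS = {
    "a": LOW_ORDER,
    "b": ["legendre"],
    "c": LOW_ORDER + ["legendre"],
    "d": ["zernike"],
    "e": LOW_ORDER + ["legendre", "zernike"],
    "f": ["legendre", "zernike"],
    "g": ["complex"],
    "h": ["zernike", "complex"],
    "i": LOW_ORDER + ["legendre", "zernike", "complex"],
}


def columns_for(groups):
    """Return the 0-based column indices of the given moment groups, in feature order."""
    cols = []
    for name in groups:
        first, last = MOMENT_GROUPS[name]
        cols.extend(range(first - 1, last))  # F{first}..F{last} -> indices first-1..last-1
    return sorted(cols)


def select_features(feature_vector, groups):
    """Keep only the features of the given moment groups from one 130-element vector."""
    return [feature_vector[i] for i in columns_for(groups)]
```

Combination (i) selects all 130 columns, while (h), for instance, selects the 102 Zernike and complex moment features.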

4.5. Detail Evaluation of MLP Classifier

In the present work, a detailed error analysis is carried out with respect to different parameters, namely, Kappa statistic, mean absolute error (MAE), root mean square error (RMSE), True Positive rate (TPR), False Positive rate (FPR), precision, recall, F-measure, Matthews Correlation Coefficient (MCC), and Area under the ROC curve (AUC). Tables 5–9 provide these measurements for handwritten numerals written in Indo-Arabic, Bangla, Devanagari, Roman, and Telugu scripts, respectively.
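Most of these measures follow directly from a confusion matrix. A minimal sketch with hypothetical helper names, computing the one-vs-rest TPR, FPR, precision, recall, F-measure, and MCC for a single class, and Cohen's kappa for the whole matrix; MAE, RMSE, and AUC additionally need the classifier's class-probability outputs and are not covered here:

```python
import math


def per_class_metrics(cm, c):
    """One-vs-rest metrics for class c from a square confusion matrix cm[actual][predicted]."""
    total = sum(sum(row) for row in cm)
    tp = cm[c][c]
    fn = sum(cm[c]) - tp                            # class c predicted as something else
    fp = sum(cm[r][c] for r in range(len(cm))) - tp  # other classes predicted as c
    tn = total - tp - fn - fp
    tpr = tp / (tp + fn) if tp + fn else 0.0        # true positive rate = recall
    fpr = fp / (fp + tn) if fp + tn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f_measure = 2 * precision * tpr / (precision + tpr) if precision + tpr else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"TPR": tpr, "FPR": fpr, "precision": precision,
            "recall": tpr, "F-measure": f_measure, "MCC": mcc}


def cohen_kappa(cm):
    """Agreement between actual and predicted labels, corrected for chance agreement."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    observed = sum(cm[i][i] for i in range(n)) / total
    expected = sum(sum(cm[i]) * sum(row[i] for row in cm) for i in range(n)) / total ** 2
    return (observed - expected) / (1 - expected)
```

For the two-class matrix [[45, 5], [10, 40]], class 0 has TPR 0.9, FPR 0.2, and F-measure 90/105 ≈ 0.857, and the kappa of the whole matrix is 0.7.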

5. Conclusion

India is a multilingual and multiscript country comprising 12 different scripts. However, relatively little work has been done on handwritten numeral recognition for Indic scripts. The following issues are observed in existing handwritten digit recognition systems: (1) most of them have been evaluated on limited datasets; (2) training and testing times are not reported in most of the works; (3) most of the works have been done for Roman because of the availability of larger datasets such as MNIST; (4) recognition systems for Indic scripts mainly focus on a single script; and (5) some feature extraction methods are specific to a particular script or language rather than having a global scope. In this work, we have verified the effectiveness of a moment based approach to the handwritten digit recognition problem that includes geometric moment, moment invariant, affine moment invariant, Legendre moment, Zernike moment, and complex moment. The present scheme has been tested on five popular scripts, namely, Indo-Arabic, Bangla, Devanagari, Roman, and Telugu, using the CMATER and MNIST databases and multiple classifiers. The MLP classifier is found to produce the highest recognition accuracies of 99.3%, 99.5%, 98.92%, 99.77%, and 98.8% on Indo-Arabic, Bangla, Devanagari, Roman, and Telugu scripts, respectively. The results demonstrate that the moment based approach leads to higher accuracy than its counterparts. An important advantage of this feature extraction approach is that it is less computationally expensive than most of the published works, which need more computation time. These features are also very simple to implement compared to other methods. To improve the performance of the proposed system further, the sources of errors need to be investigated in more detail. Other potentially useful moment features, beyond the ones presented here, may also exist.

To further improve the performance, possible future works are as follows: (1) although the moment based features perform well overall, complementary features such as concavity analysis may help in discriminating confusing numerals; for example, Indo-Arabic numerals “2” and “3” can be better separated by considering their original size before normalization. (2) For classifier design, it is better to select model parameters (classifier structures) by cross validation rather than empirically, as was done in our experiments. (3) Combining multiple classifiers may further improve the recognition accuracy.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The authors are thankful to the Center for Microprocessor Application for Training Education and Research (CMATER) and Project on Storage Retrieval and Understanding of Video for Multimedia (SRUVM) of Computer Science and Engineering Department, Jadavpur University, for providing infrastructure facilities during the progress of the work. The current work, reported here, has been partially funded by University with Potential for Excellence (UPE), Phase-II, UGC, Government of India.