Abstract

This paper proposes a novel local texture description method that defines six human visual perceptual characteristics and selects a minimal subset of relevant and nonredundant features based on principal component analysis (PCA). We give the six texture characteristics originally defined by Tamura et al. novel local definitions and metrics, so that the measurements reflect the human perception of each characteristic more precisely. We then propose a PCA-based feature selection method that exploits the structure of the principal components of the feature set to find a subset of the original feature vector whose features reflect the most representative characteristics of the textures in a given image dataset. Experiments on different publicly available large datasets demonstrate that the proposed method provides superior classification performance to most state-of-the-art feature description methods in terms of accuracy and efficiency.

1. Introduction

Texture feature description plays a fundamental role in many computer vision applications, especially general texture classification, which is widely used in material surface inspection [1], medical imaging [2–4], object recognition [5–7], scene recognition [8], and image retrieval [9, 10]. Because of its significance, a large number of texture description approaches have been proposed during the past decades [11–14]. However, extracting highly representative and robust texture features to describe textural images remains a challenging problem.

Several classical texture descriptors have proved effective in reflecting texture properties. For example, the autoregression (AR) model based texture descriptor [15] and the Markov random field (MRF) model based descriptor [16] described the texture using the parameters of the models that the pixel intensities were assumed to follow. Edge-based descriptors calculated the gradient magnitudes and directions of the edge pixels in the image and then measured the first-order or second-order statistics of the edge distributions with distance-dependent functions as the texture pattern [17]. Spatial frequency based descriptors were built on the assumption that textural characteristics are directly related to the spatial frequencies of texture primitives [18]. The local binary pattern (LBP) [19] and its variants [20–24] exploited the direct comparison between a pixel’s intensity and its neighbours’ intensities within a local area. These methods, the so-called low-level texture descriptors, described the texture by intuitive features of the texture itself, including the occurrence and direction of edges or corners, the shape of the textures, the occurrence frequency of texture primitives, the variance of pixel intensities, and other primitive-level features. Since low-level descriptors typically paid little attention to the relationship between the texture characteristics they described and human visual perception, image descriptors for high-level features, which describe how human beings observe different texture regions, have attracted more attention in recent years. However, high-level texture descriptors still need to solve two problems: (1) how to bridge the high-level texture characteristics and the low-level statistical measurements; (2) how to reduce unnecessary texture characteristics to avoid redundancy.

For the first problem, several methods have defined the high-level texture characteristics differently. One of the most classic high-level texture descriptors is the gray-level cooccurrence matrix (GLCM) texture descriptor proposed by Haralick et al. [25], where 14 characteristics were defined based on the gray-level cooccurrence matrix. Among these 14 characteristics, energy, entropy, contrast, homogeneity, and correlation were the most commonly used ones to distinguish one texture from another. Another well-known high-level descriptor was proposed by Laws et al. in [26], which calculated uniformity, density, coarseness, roughness, regularity, linearity, directionality, frequency, and phase by assessing the average gray level, edges, spots, ripples, and waves in texture [27]. Since then, a number of approaches have been proposed to describe the high-level characteristics of texture. Tomita and Tsuji defined texture density, directionality, coarseness, randomness, linearity, and periodicity in [28]. Amadasun et al. [29] defined five basic characteristics of texture, namely, coarseness, contrast, busyness, complexity, and texture strength, in terms of spatial changes in intensity. Rao and Lohse [30] constructed a three-dimensional space for texture description, where the three orthogonal dimensions were repetitive vs. nonrepetitive; high-contrast and nondirectional vs. low-contrast and directional; and granular, coarse, and low-complexity vs. nongranular, fine, and high-complexity. Fujii et al. [31] presented a set of texture characteristics (contrast, coarseness, and regularity) corresponding to perceptual properties of visual texture based on the autocorrelation function (ACF), showing that human subjects seemed to be sensitive to the same parameters captured by that descriptor. Tamura et al. [32] presented six textural characteristics with high correlation to human visual perception, namely, coarseness, contrast, directionality, line-likeness, regularity, and roughness. They developed both computational and psychological measurements for these characteristics and compared the two. Tamura's six texture characteristics have so far proved the most accurate and efficient in describing texture according to human visual perception. However, one drawback of the existing descriptors is the incorrect calculation of the high-level texture characteristics from the low-level statistical measurements [33]. Moreover, most of the descriptors capture the texture characteristics globally for the given image, so they cannot precisely describe the variation of the texture characteristics in textural images containing multiple textures.

For the second problem, dimensionality reduction methods have been the usual solution: they reduce the number of random variables under consideration to obtain a set of “uncorrelated” principal variables [34]. Most dimensionality reduction techniques can be divided into feature extraction and feature selection [35]. Feature extraction approaches transform data in a high-dimensional space into a lower-dimensional space. The transformations can be linear (such as principal component analysis (PCA) [36], linear discriminant analysis (LDA) [37], and the maximum margin criterion (MMC) [38]) or nonlinear (such as kernel PCA, locally linear embedding (LLE) [39], and isomap [40]). However, transformed features usually lack obvious semantic meaning, making it difficult to understand the image and to analyse image components representing different high-level visual characteristics. Feature selection techniques [41–43] have proved more effective than feature extraction for dimensionality reduction of image features because they pick a subset of the original features rather than finding a mapping that uses all of them. Several feature selection methods have been proposed and applied in different situations. Among them, the optimality properties of PCA have attracted research on PCA-based variable selection methods [44–47]. However, these methods have the disadvantage of being either too computationally expensive or prone to choosing a subset of features with redundant information.

To overcome the above problems in the definition and measurement of high-level texture characteristics and in feature selection, this paper proposes a novel high-level texture description with a selection scheme for representative texture characteristics. Figure 1 illustrates the processing pipeline of the proposed texture description method. We redefine the six texture characteristics originally defined by Tamura et al. [32], namely, coarseness, contrast, directionality, line-likeness, regularity, and roughness, by measuring each of them with novel local metrics so that these measurements reflect the human perception of each characteristic more precisely. To achieve more effective description, we propose a PCA-based feature selection method that exploits the structure of the principal components of the feature set to find a subset of the original feature vector whose features reflect the most representative characteristics of the textures in the given image dataset. The experimental results demonstrate that the proposed method provides superior classification accuracy and efficiency compared with state-of-the-art methods on three different databases.

The rest of this paper is organized as follows. We present the details of the proposed method in Section 2. The experiments are implemented and the results are discussed in Section 3. Finally, we conclude our proposed work in Section 4.

2. Materials and Methods

The proposed texture descriptor consists of two parts: novel local definitions and measurements of the six texture characteristics named and defined by Tamura et al. [32], and selection of the most representative features, whose corresponding principal component analysis (PCA) coefficients exhibit the largest orthogonality to each other. The two parts are introduced in detail in this section.

2.1. Local Tamura’s Texture Description
2.1.1. Tamura’s Texture Description

In [32], Tamura et al. proposed six human visual perceptual texture features based on psychological experiments. The six texture features were named coarseness, contrast, directionality, line-likeness, regularity, and roughness; their definitions are reviewed as follows.

Coarseness. Coarseness was defined as a measurement of the size of the primitive elements (texels) composing the texture. The computational procedure was defined as shown in Table 1, where m and n are the effective width and height of the image, and Sbest(x, y) denotes the neighbourhood size that yields the largest difference of average intensities between pairs of non-overlapping neighbourhoods on opposite sides of the pixel (x, y).

Contrast. In Tamura’s work, they assumed the contrast difference between two texture patterns with different structures was influenced by the following two factors: dynamic range of gray levels and polarization of the distribution of black and white on the gray level histogram or ratio of black and white areas.

Given the two factors, the contrast was defined as in Table 1, where σ represented the standard deviation, α4 denoted the kurtosis of the intensity histogram as a measurement of polarization, and n was a positive number chosen from 8, 4, 2, 1, 1/2, 1/4, and 1/8. This calculation of contrast could reduce the values for distributions with biased peaks while almost preserving those for polarized distributions.
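A reconstruction of this contrast measure from [32], using the symbols above, is

F_{con} = \frac{\sigma}{(\alpha_4)^{n}}, \qquad \alpha_4 = \frac{\mu_4}{\sigma^{4}},

where μ4 is the fourth moment of the intensity distribution about its mean; Tamura et al. reported that n = 1/4 matched the psychological measurements best.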

Directionality. In Tamura's descriptor, directionality was measured using the histogram of directional angles of the oriented local edges, i.e., the frequency distribution of oriented local edges against their directional angles. Tamura's directionality Fdir was calculated as in Table 1, where HD was the direction histogram, np was the number of peaks, φp was the p-th peak position in the histogram HD, wp was the range of the p-th peak between valleys, φ was the quantized direction angle (cyclically in modulo 180°), and r was a normalizing factor related to the number of quantization levels of φ.
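A reconstruction of this directionality measure from [32], using the symbols above, is

F_{dir} = 1 - r \, n_p \sum_{p=1}^{n_p} \sum_{\phi \in w_p} (\phi - \phi_p)^2 H_D(\phi),

so that sharp, concentrated peaks in the direction histogram yield values close to 1.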

Line-likeness. Tamura defined line-likeness as a characteristic of texture composed of lines. For this purpose, when the direction of a given edge and the directions of its neighbouring edges were nearly equal, such a group of edge pixels was regarded as a line. Therefore, the measurement Flin in Table 1 was defined so that cooccurrences in the same direction were weighted by +1 and those in the perpendicular direction by -1. In the mathematical function, the direction cooccurrence matrix was used, whose element (i, j) was defined as the relative frequency with which neighbourhoods centred at two pixels separated by a distance d along the edge direction occur in the image, one with the quantized direction i and the other with the quantized direction j.

Regularity. Tamura assumed that if any feature of a texture varies over the whole image, the image is irregular. As shown in Table 1, they based the measurement of regularity Freg on the normalized sum of the variations of the four features defined above, extracted from subimages of the image, where r was the normalizing factor and each variation was the standard deviation of the corresponding feature over the subimages.

Roughness. According to the results of their psychological experiments on vision, Tamura emphasized the effects of coarseness and contrast on roughness and approximated a measurement of roughness as shown in Table 1.

From their own experiments [32] and subsequent research [48], Tamura's texture descriptor has proved successful in reflecting human visual perception of textures and in bridging the semantic gap between low-level features and high-level concepts. However, all six of Tamura's texture characteristics describe the textural image globally, which makes it difficult to discriminate different textures within one image. Therefore, in the next part, Tamura's descriptor is extended to represent texture features locally, while still following the definition and understanding of each texture characteristic.

2.1.2. Local Tamura’s Texture Description

Following the discussion in Section 2.1.1, in this part we focus on redefining the six Tamura texture features for local texture description. When extending descriptors from global to local, a common method is as follows: for each pixel, treat the neighbourhood centred at it as an image, apply the texture descriptor to this neighbourhood, and assign the resulting feature vector to the centre pixel. However, this simple “global to local” strategy is not applicable to the calculation of coarseness, because coarseness requires estimating the size of the image block that maximizes the intensity similarity within the block while minimizing the intensity similarity to its adjacent blocks of the same estimated size, which cannot be done within a small local neighbourhood. As a result, regularity and roughness cannot be calculated locally following the original definitions either, since they both rely on the measurement of coarseness. Therefore, we revisit the definition of each feature and calculate it from low-level measurements of the relations among pixels in the local neighbourhood centred at each pixel.

Coarseness. As shown in Figures 2(a) and 2(g), larger texels composing the texture should result in a higher coarseness value. In our work, coarseness is considered to measure the density of the texels in the neighbourhood: the more texels there are in the local window, the smaller the texels are and the lower the coarseness should be. Therefore, the computation procedure is defined by the following steps (a code sketch follows Step 4).

Step 1. Calculate the edges in the given image by applying the Laplacian of Gaussian (LoG) filter [49] to the image and finding the zero crossings, resulting in the edge map.

Step 2. Connect the edges by the morphological transformation that sets 0-valued pixels to 1 if they have two nonzero neighbours that are not connected [50], giving the connected edge map. After this edge connection, the edges surrounding the smooth regions (the “texels”) are mostly detected in the image.

Step 3. Calculate the “texels” image as the negative transformation of the connected edge image. The “texels” are then labelled in this image.

Step 4. The coarseness is calculated as the ratio of the number of different texels to the number of pixels in the neighbourhood, where diff(·) counts the number of different “texels” in the neighbourhood of the pixel and M and N are the width and height of the neighbourhood.
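A minimal Python sketch of Steps 1-4 is given below. It assumes standard SciPy/scikit-image routines; the function name local_coarseness, the use of binary closing as a stand-in for the “bridge” operation of [50], and the per-pixel window loop are illustrative assumptions rather than the authors' implementation.

import numpy as np
from scipy import ndimage
from skimage.measure import label
from skimage.morphology import binary_closing

def local_coarseness(image, radius=5, sigma=2.0):
    """Sketch of Steps 1-4: LoG edges -> connected edges -> texel map -> local texel density."""
    # Step 1: LoG filtering; zero crossings of the response approximate the edge map.
    log_resp = ndimage.gaussian_laplace(image.astype(float), sigma=sigma)
    zc = np.zeros_like(log_resp, dtype=bool)
    zc[:-1, :] |= np.signbit(log_resp[:-1, :]) != np.signbit(log_resp[1:, :])
    zc[:, :-1] |= np.signbit(log_resp[:, :-1]) != np.signbit(log_resp[:, 1:])

    # Step 2: connect broken edges (binary closing used as a stand-in for the
    # "bridge" morphological operation described in the paper).
    edges = binary_closing(zc)

    # Step 3: the "texel" image is the complement of the connected edge map;
    # each connected smooth region receives a unique label.
    texels = label(~edges, connectivity=1)

    # Step 4: coarseness at each pixel is the number of distinct texels in the
    # local window relative to the window area (fewer, larger texels -> lower ratio).
    coarseness = np.zeros(image.shape, dtype=float)
    h, w = image.shape
    for y in range(h):
        for x in range(w):
            win = texels[max(0, y - radius):y + radius + 1,
                         max(0, x - radius):x + radius + 1]
            coarseness[y, x] = len(np.unique(win[win > 0])) / win.size
    return coarseness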

Contrast. Based on the discussion in Section 2.1 and the observation of textures with different contrast, as shown in Figures 2(b) and 2(h), the following factors are considered to influence the contrast difference between different texture patterns: (1) the range of gray levels: textures in which the distance between the maximum and minimum gray levels is large are more likely to have higher contrast; (2) the concentration of the distribution of the gray-level histogram: textures whose histogram is distributed uniformly over a wide range exhibit higher contrast; (3) the polarization of the distribution of the gray-level histogram: a small difference in intensity is negligible if the intensities are biased towards 255, while the same small difference matters if the intensities are biased towards 0.

Therefore, the contrast of the image can be measured locally by combining two terms: one corresponds to the dynamic range of the gray levels between the maximum intensity Imax and the minimum intensity Imin, and the other combines the skewness and the kurtosis of the histogram of the pixel intensities in the neighbourhood to measure the concentration and the polarization of the intensity distribution.

Directionality. The concepts of directionality and line-likeness in Tamura's texture descriptor are obscured, because the directionality measures the edges with similar directions while the line-likeness measures the coincidence of the edge directions. This partly explains why Tamura's descriptor describes directionality incorrectly for some textures with random directions. Therefore, in this work, directionality is redefined as the mean orientation of all the edges in the local neighbourhood. Moreover, we assume that a horizontal texture has a high directionality value, as shown in Figure 2(c), while a vertical texture has a low directionality value, as shown in Figure 2(i). The directionality of a texture pixel is then calculated in its neighbourhood from the magnitudes and directions of the edge pixels, where M and N are the size of the neighbourhood and the magnitude and direction of each edge pixel are obtained from the pixel-wise horizontal and vertical derivatives computed by applying the Sobel operator [51] to the neighbourhood centred at the pixel.
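Below is a minimal Python sketch of this local directionality, assuming Sobel derivatives from SciPy; the function name local_directionality, the magnitude-weighted averaging, and the final normalization are illustrative assumptions, since the exact weighting of the paper's equation is not reproduced here.

import numpy as np
from scipy import ndimage

def local_directionality(image, radius=5):
    """Magnitude-weighted mean edge orientation in each pixel's neighbourhood (sketch)."""
    img = image.astype(float)
    gx = ndimage.sobel(img, axis=1)                  # horizontal derivative
    gy = ndimage.sobel(img, axis=0)                  # vertical derivative
    magnitude = np.hypot(gx, gy)
    direction = np.mod(np.arctan2(gy, gx), np.pi)    # orientation folded into [0, pi)

    out = np.zeros(img.shape, dtype=float)
    h, w = img.shape
    for y in range(h):
        for x in range(w):
            m = magnitude[max(0, y - radius):y + radius + 1,
                          max(0, x - radius):x + radius + 1]
            d = direction[max(0, y - radius):y + radius + 1,
                          max(0, x - radius):x + radius + 1]
            # Weighted mean orientation, normalized to [0, 1]; the high/low convention
            # for horizontal vs. vertical textures follows the paper only qualitatively.
            out[y, x] = np.sum(m * d) / (np.pi * (np.sum(m) + 1e-12))
    return out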

Line-likeness. The line-likeness measures how many of the edges in the neighbourhood have the same or similar directions. Figure 2(d) shows a texture with high line-likeness, where most of the edges have a similar direction, while Figure 2(j) shows a texture with low line-likeness because its edges point in various directions. The directions at the edge pixels are quantized into a fixed number of direction intervals. The line-likeness of the pixel is then calculated as 1 minus the variance of the quantized local edge orientations, where the mean orientation is taken over the local window and the radius of the window determines the pixels involved.

Regularity. The regularity of a texture measures the spatial repetitiveness of texels over a certain distance. For the regular texture shown in Figure 2(e), the texel edges repeat every few pixels and the distance of this repetition is relatively fixed, in contrast to an irregular texture such as the one shown in Figure 2(k). Therefore, this pattern can be measured through the autocorrelation function of the edge image: the regularity at (x, y) is derived from the autocorrelation function C of the local window of the image I centred at (x, y), where CORR(·) represents the autocorrelation operation. Since the maximum of the autocorrelation function C appears only at zero displacement of the texture, and a regular texture, in which there must exist another displacement at which the texels repeat, has additional local peak values in the autocorrelation function, the ratio of the maximum autocorrelation value to the sum of all local peak autocorrelation values is smaller for a regular texture than for an irregular one, leading to a larger regularity measurement.
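A minimal Python sketch of this idea for one local edge-map window follows. It assumes SciPy's correlate2d and argrelextrema, scans only the central row of the 2-D autocorrelation for simplicity, and uses 1 minus the peak ratio as one plausible instantiation of the measurement described above.

import numpy as np
from scipy.signal import correlate2d, argrelextrema

def local_regularity(edge_window):
    """Regularity of one local edge-map window from its autocorrelation peaks (sketch)."""
    w = edge_window.astype(float)
    w -= w.mean()
    acf = correlate2d(w, w, mode='full')        # 2-D autocorrelation of the window
    row = acf[acf.shape[0] // 2, :]             # central row: horizontal displacements only
    peaks = argrelextrema(row, np.greater)[0]   # local maxima over the displacements
    peak_vals = row[peaks]
    if peak_vals.size == 0 or peak_vals.sum() <= 0:
        return 0.0                              # no repetition detected
    # Regular textures have several repeat peaks, so the zero-lag maximum is a
    # small fraction of the peak sum and the regularity value is large.
    return 1.0 - row.max() / peak_vals.sum()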

Roughness. As shown in Figure 2(f), a rough texture has an uneven, non-smooth surface containing many edges, while a texture with a quite smooth surface, as shown in Figure 2(l), has a low roughness value. Therefore, the roughness can be approximated by the density of edges in the neighbourhood, computed from the edge map of the neighbourhood centred at the pixel with the given size. As in the calculation of coarseness, the edge map is computed with the Laplacian of Gaussian (LoG) filter.

As discussed above, the six human visual perceptual texture features that were originally proposed in Tamura’s work have been locally calculated with novel methods from low-level measurements. Therefore, the texture image can be described more precisely according to human visual perception and different types of textures can be better differentiated locally.

2.2. PCA-Orthogonality Key Texture Feature Selection

From the above discussion, the six newly defined texture features can describe textural images locally. Using all of these features, however, is sometimes redundant because the features are correlated with each other to some extent. In this part, a PCA-orthogonality key feature selection method is proposed to find the characteristics that are most representative for differentiating the textures in the image.

2.2.1. Principal Component Analysis (PCA) in Image Feature Reduction

Principal component analysis (PCA) is one of the most popular methods for dimensionality reduction. As described in [52], for a set of data points of dimension p, PCA aims to find a linear subspace of dimension lower than p so that the data points mostly lie in this subspace, which is likely to retain most of the variability of the original data. The linear subspace can be specified by orthogonal vectors that form a new coordinate system, called the “principal components.” The principal components are orthogonal, linear transformations of the original data points, so there can be no more than p of them. However, the expectation is that only q < p principal components are needed to approximate the space spanned by the original axes.

Mathematically, the transformation is defined [52] by a set of p-dimensional vectors of weights or loadings w(j) = (w1, ..., wp)(j) that map each row vector x(i) of X to a new vector of principal component scores t(i) = (t1, ..., tq)(i), given by t(i)j = x(i) · w(j), in such a way that the individual variables of t considered over the data set successively inherit the maximum possible variance from x, with each weight vector w constrained to be a unit vector. Therefore the full principal components transformation of x can be given as T = XW, where W is a p × p matrix whose columns are the eigenvectors of X^T X. The transformation T = XW maps a data vector from an original space of p variables to a new space of p variables that are decorrelated over the dataset. However, not all the principal components need to be kept. Keeping only the first q principal components, produced by using only the first q loading vectors, gives the truncated transformation T_q = XW_q, where the matrix T_q now has n rows but only q columns. In other words, PCA learns a linear transformation t = W^T x, where the columns of the p × q matrix W form an orthogonal basis for the q features (the components of representation t) that are decorrelated. By construction, among all the transformed data matrices with only q columns, this score matrix maximizes the variance in the original data that is preserved, while minimizing the total squared reconstruction error ||T W^T − T_q W_q^T||^2 or ||X − X_q||^2.

Following this transformation scheme, PCA has been widely used to reduce the dimensionality of the image features extracted by a certain image descriptor [44–47, 53, 54]. Given an image of size M × N and a local descriptor that extracts a feature vector for each pixel in the image, the data matrix X can be formed by stacking the per-pixel feature vectors as its rows, where n = M × N is the total number of pixels in the image and p is the dimensionality of the original feature space for each pixel. Then, according to (15), only the first q principal components that generate the largest variances are preserved, and the new feature matrix assigns each pixel in the image a new feature vector of reduced dimensionality.
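A minimal Python sketch of this per-pixel reduction follows; the function name truncated_pca_features and the eigen-decomposition route are illustrative assumptions, not the paper's implementation.

import numpy as np

def truncated_pca_features(feature_maps, q):
    """Sketch of (16)-(17): stack the per-pixel feature vectors of an M x N image
    into an n x p data matrix (n = M*N), then keep the first q principal components."""
    M, N, p = feature_maps.shape
    X = feature_maps.reshape(-1, p).astype(float)     # n x p data matrix
    X = X - X.mean(axis=0)                            # centre each feature
    # Eigen-decomposition of the covariance; columns of W are the loading vectors.
    cov = np.cov(X, rowvar=False)
    eigvals, W = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]                 # sort by decreasing variance
    W_q = W[:, order[:q]]                             # first q loading vectors
    T_q = X @ W_q                                     # n x q score matrix
    return T_q.reshape(M, N, q), W_q                  # per-pixel reduced features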

2.2.2. PCA-Orthogonality Key Feature Selection

Given the PCA transformation for the image feature vector described in (16) and (17), the “principal” features are linear combinations of the original features weighted by the coefficients matrix, i.e., the principal components are obtained from the original features through the coefficients matrix k.

Assuming that there are n observations of the image with respect to the p features, we have n sets of original features of dimension p. After the PCA transformation, we obtain the n observations of each of the components in the reduced feature space, with q ≤ p. Since, for every component, the observations of the original features are the same, the i-th column of the coefficients matrix controls the difference between the principal components Yi, while the absolute value of its r-th entry represents the weight of the r-th original feature in the i-th principal component.

Differently from the original PCA dimensionality reduction, which directly uses the transformed components as the features describing the image, the PCA-based feature selection aims to find a subset of the original data vector such that, with the corresponding coefficients, which are a subset of the original coefficients matrix k, the output components are still “principal” components. In other words, the orthogonality of the output components should remain close to that of the original principal components.

In our work, the orthogonality of the component matrix is defined by summing the cosine dissimilarities between every two components in it, where the cosine dissimilarity between two components is computed over the n observations of the image data and the number of terms is determined by the number of components. However, it is difficult to calculate the dissimilarity of two components directly, because n, the total number of image pixels, is always large and varies between images of different sizes. We instead use the weighting coefficients matrix k to measure the dissimilarity because, according to (19), the columns of k control the difference between two principal components. The orthogonality of the component matrix can then be calculated from the coefficients matrix k, where q is the number of columns in k corresponding to the retained components and p is the dimensionality of the input features. Since p is fixed by the descriptor used to represent the image, it is a constant for all images, which simplifies the calculation of the orthogonality. Therefore, the PCA-based feature selection can be described mathematically as finding the subset of the original features whose derived components have the orthogonality closest to that of the full original feature set. The PCA-orthogonality feature selection (PCA-ORTH) is implemented as Algorithm 1 (a code sketch is given after the listing): (1) apply PCA to the data vector consisting of the features of the images in the dataset; (2) find all the possible subsets of the coefficients matrix k, each formed by selecting the rows of k corresponding to the selected features; (3) calculate the dissimilarities between every two columns of each possible subset, and then compute the orthogonality of the component matrix represented by that subset; (4) compare the orthogonalities of the possible matrices and select the key features represented by the subset of the coefficients matrix that derives the output component matrix with the orthogonality closest to the original one.
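One plausible form of the cosine dissimilarity and the resulting orthogonality, consistent with the description above, is

\mathrm{dis}(k_i, k_j) = 1 - \frac{|k_i \cdot k_j|}{\lVert k_i \rVert \, \lVert k_j \rVert}, \qquad
\mathrm{ORTH}(k) = \frac{2}{q(q-1)} \sum_{i=1}^{q-1} \sum_{j=i+1}^{q} \mathrm{dis}(k_i, k_j),

where k_i denotes the i-th column of the coefficients matrix and q the number of retained components; this is a reconstruction, not the paper's exact equations.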

(1) Initialize L = 1, the number of selected features; the selected feature set, initially empty; and ortho = 0, the orthogonality of the principal components computed from the selected features. k is the principal component coefficients matrix with p rows corresponding to the original features.
(2) Perform p − 1 times:
(i) Calculate the set s of possible combinations obtained by choosing L features from the original p features, and the number of combinations in s.
(ii) Initialize i = 1, the order of the combination in the set s.
(iii) For each combination in s, perform:
(a) Calculate the orthogonality of the subset consisting of the i-th combination of rows of the principal component coefficients matrix k.
(b) If this orthogonality is larger than ortho, update ortho with it and update the selected feature set with the i-th combination of features; otherwise, skip.
(c) Update the order by i = i + 1.
(iv) Update the number of selected features by L = L + 1.
(3) Return the selected feature set.
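A compact Python sketch of Algorithm 1 follows. The function names, the use of 1 minus the absolute cosine similarity as the column dissimilarity, and the maximization of the mean pairwise dissimilarity are assumptions consistent with the description above, not the authors' code.

import numpy as np
from itertools import combinations

def cosine_dissimilarity(u, v):
    """1 minus the absolute cosine similarity between two coefficient columns (assumed form)."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return 1.0 if denom == 0 else 1.0 - abs(np.dot(u, v)) / denom

def orthogonality(k_sub):
    """Mean pairwise cosine dissimilarity between the columns of a coefficient subset."""
    q = k_sub.shape[1]
    pairs = list(combinations(range(q), 2))
    if not pairs:
        return 0.0
    return float(np.mean([cosine_dissimilarity(k_sub[:, i], k_sub[:, j]) for i, j in pairs]))

def pca_orth_select(k):
    """Sketch of Algorithm 1: score every subset of rows of the p x q coefficients
    matrix k and keep the subset whose induced components are most orthogonal
    (maximization is used here, following the algorithm listing)."""
    p = k.shape[0]
    best_ortho, best_subset = 0.0, tuple(range(p))
    for L in range(1, p):                       # subsets of size 1 .. p-1
        for rows in combinations(range(p), L):  # each combination of L original features
            o = orthogonality(k[list(rows), :])
            if o > best_ortho:
                best_ortho, best_subset = o, rows
    return list(best_subset), best_ortho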

3. Results and Discussion

To evaluate the performance of the proposed PCA-orthogonality key local Tamura's texture description method (LTM+PCA-ORTH), different texture description methods combined with different feature selection methods are compared on the task of classifying textures in the given testing texture databases.

3.1. Experimental Materials

3.1.1. Image Database

There are three image databases used in our experiments to evaluate different texture descriptors: (i) 32 texture images from the Brodatz texture database [55]; Figures 3(a), 3(b), and 3(c) show three example textures in the database; (ii) 25 texture images from the UIUCTex texture database [56]; Figures 3(d), 3(e), and 3(f) show three example textures in the database; (iii) 32 texture images from the CUReT texture database [57]; Figures 3(g), 3(h), and 3(i) show three example textures in the database.

For the Brodatz and UIUCTex texture databases, since there are not enough samples of each texture (1 sample per texture in Brodatz and 40 samples per texture in UIUCTex), the training and testing samples for each texture are generated from the given texture itself. As Figure 4 shows, the training and testing samples for a given texture are generated as follows: (1) each texture of size 512 × 512 is cut into 8 × 8 = 64 patches of size 64 × 64, as shown in Figures 4(a) and 4(b); we choose 64 × 64 as the size of each texture patch because it is a good trade-off: a size larger than 128 × 128 would yield too few patches of the original image to generate enough texture samples, while a size smaller than 32 × 32 would produce patches too fragmented to preserve enough of the original texture information; (2) the 64 patches are randomly divided into 8 groups of 8 patches each; in each group, the 8 patches are randomly combined (with repetition) into 8 rows and 8 columns and then merged into one image; this step is repeated 8 times in each group, resulting in 8 images with the same type of texture; Figures 4(c) and 4(d) show two example randomly generated texture images from two corresponding groups; (3) 8-fold cross validation is implemented with the generated 8 × 8 = 64 images: the 8 generated images in 1 group are used as the testing image set for one texture, and the 56 generated images in the other 7 groups are used as the training image set; the process is repeated 8 times, using each group of 8 images as the testing set while the others serve as the training set.
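A minimal Python sketch of steps (1)-(2) follows; the function name make_texture_samples and the specific random-number calls are illustrative assumptions about a procedure the paper describes only in prose.

import numpy as np

def make_texture_samples(texture, patch=64, grid=8, groups=8, rng=None):
    """Cut a 512 x 512 texture into 64 patches of 64 x 64, split them into 8 groups of 8,
    and in each group recombine the patches (with repetition) into 8 new 512 x 512 images."""
    rng = np.random.default_rng(rng)
    patches = [texture[r * patch:(r + 1) * patch, c * patch:(c + 1) * patch]
               for r in range(grid) for c in range(grid)]
    order = rng.permutation(len(patches))                 # random split into groups
    group_sets = np.array_split(order, groups)
    samples = []                                          # groups x 8 generated images
    for idx in group_sets:
        group_patches = [patches[i] for i in idx]
        group_images = []
        for _ in range(8):
            picks = rng.integers(0, len(group_patches), size=grid * grid)
            rows = [np.hstack([group_patches[p] for p in picks[r * grid:(r + 1) * grid]])
                    for r in range(grid)]
            group_images.append(np.vstack(rows))
        samples.append(group_images)
    return samples   # samples[g] holds the g-th group's 8 images (one fold for 8-fold CV)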

3.1.2. Comparator Texture Descriptors

The state-of-the-art texture description methods and feature selection methods are combined as comparators to the proposed method in finding the most representative local texture features. The following texture description methods are used: (1) the local binary pattern (LBP) descriptor [19], which described the texture by computing the LBP code from comparisons between each pixel's intensity and its neighbours' intensities; (2) the scale invariant feature transform (SIFT) descriptor [58], which combined a scale invariant region detector and a description based on the gradient distribution in the detected regions; (3) the Gabor transform based texture descriptor (Gabor) [59], which used the Gabor coefficients of the image as the image description; (4) Tamura's texture descriptor (TM) [32], which originally defined coarseness, contrast, directionality, line-likeness, regularity, and roughness based on computational and psychological measurements.

The following state-of-the-art feature selection methods are combined with all the texture descriptors as comparators to the proposed PCA-ORTH feature selection method: (1) the mean square prediction error minimization PCA method (PCA-MSPEM) [46], which found the subset of the feature set that minimized the trace of a measure matrix of the covariance matrix of the input feature vector; (2) the approximation of principal features using the absolute values of the coefficients of the principal components (PCA-APX) [52], which chose the variables corresponding to the highest coefficients of each of the first q principal components as the principal features; (3) the principal feature analysis (PFA) [47], which clustered all the rows of the principal component coefficient matrix with the K-Means algorithm and selected, as the principal feature, the feature corresponding to the row vector closest to the mean of each cluster.

3.1.3. Implementation of the Experiment

With the generated training and testing samples of each texture in each database and the given texture description methods, the experiment of classifying the different textures in each database is implemented in MATLAB 2017b on an Intel i7-7700 2.60 GHz PC with 8 GB RAM as follows. (1) The local Tamura's features, consisting of the 6 specific characteristics coarseness, contrast, directionality, line-likeness, regularity, and roughness, are calculated in the local neighbourhood; the selection of the neighbourhood size is discussed in the following sections. For each sample image, we first calculate the features for all pixels; the mean of each feature over all pixels is then taken as that feature's value for the image, so a 1 × 6 feature vector is obtained for each sample image. The other texture features are calculated similarly by the comparator texture description methods. (2) The most discriminating features are selected from the different texture features using the different feature selection methods, and then, for each sample image, a 1 × q (q ≤ 6) vector of the most discriminating features is generated for each feature description method. (3) The testing samples of the textures are classified by the multiclass SVM classifier using the different texture features; the classifiers are trained with the corresponding feature vectors of the training samples. (4) The classification accuracy of each classification is defined as Accuracy = (TP + TN)/(TP + TN + FP + FN), where the traditional true positive (TP), true negative (TN), false positive (FP), and false negative (FN) are described in [60]. (5) The confusion matrix of each classification is defined so that its element (i, j) is the number of samples whose target is the i-th class but which are classified into the j-th class, divided by the total number of samples belonging to the i-th class; the confusion rate is then calculated from the confusion matrix, normalized by the total number of texture classes in each database. (6) The classification accuracy, the confusion matrix, and the confusion rate of classifying the textures in each database with all the texture description methods are compared to evaluate how well the different descriptors extract the most representative features for the textures in a given database.
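A minimal Python sketch of these evaluation quantities follows; the names y_true, y_pred, and n_classes are hypothetical, and the confusion-rate expression (mean off-diagonal mass per class) is only one plausible reading, since the paper's exact formula is not reproduced above.

import numpy as np

def evaluate(y_true, y_pred, n_classes):
    """Overall accuracy, row-normalized confusion matrix, and an assumed confusion rate.
    Class labels are expected as integers in 0 .. n_classes-1."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    accuracy = np.mean(y_true == y_pred)                 # fraction of correctly classified samples
    cm = np.zeros((n_classes, n_classes), dtype=float)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    row_totals = cm.sum(axis=1, keepdims=True)
    cm = cm / np.maximum(row_totals, 1)                  # CM(i, j) = n_ij / N_i
    confusion_rate = np.mean(1.0 - np.diag(cm))          # assumed definition of the confusion rate
    return accuracy, cm, confusion_rate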

3.2. Experimental Results

In this section, we show and analyse the experimental results of classifying textures in different texture image databases separately.

3.2.1. Experiments on the Brodatz Texture Database

Figure 5 shows the relationship between the radius of the neighbourhood and the classification accuracy, as well as the running time of the whole processing pipeline of the proposed local Tamura's texture description (LTM) with PCA-orthogonality (PCA-ORTH) feature selection. A radius of 5 is a good trade-off: it achieves the highest classification accuracy (97.95%) while keeping the running time short (23.40 s).

Figure 6 shows the average accuracy of classifying the 32 textures in the Brodatz database by the multiclass SVM, using the texture features extracted by the different methods. On one hand, the proposed “full-feature” local Tamura's texture description (LTM), without any feature selection, provides higher classification accuracy (97.41%) than the other “full-feature” descriptors (97.02%, 96.63%, 96.04%, and 95.80%). On the other hand, the proposed PCA-orthogonality key feature selection method (PCA-ORTH) improves the performance of each texture description method more than the competing feature selection methods do. As a result, the proposed PCA-orthogonality local Tamura's texture descriptor yields the highest average classification accuracy on this database (97.95%).

Figure 7 provides the confusion matrices of the classification using the different texture description methods. The original Tamura's texture descriptor fails to identify the textures D1, D12, and D21. The LBP descriptor cannot distinguish the global arrangement of local structures between the textures D10 and D22. Even the “full-feature” local Tamura's texture descriptor confuses the textures D9, D14, and D17. The PCA-orthogonality local Tamura's texture descriptor correctly classifies most of the testing textures into their target classes, leading to the lowest confusion rate (0.97%).

Furthermore, Table 2 lists the running time of feature extraction, feature selection, and classification using the selected features, giving a general evaluation of the efficiency of each texture descriptor. LBP and Gabor require less time for feature extraction, but their classification time is much longer because of the high feature dimensionality, even with the feature selection step. SIFT requires long running times in both the feature extraction and the classification steps. In contrast, the two Tamura's texture descriptors need a medium running time for feature extraction but the shortest classification time, since the feature vectors have only 6 dimensions. For the feature selection part, both the PCA-MSPEM and PFA methods need a long time to select features. The approximation method takes the shortest running time in selecting key features, since it needs no computation other than checking the highest absolute values of the coefficients. The proposed PCA-ORTH feature selection method achieves a running time close to the approximation method because it only repeats the calculation of cosine distances between vectors in the possible subsets of the coefficient matrix k, whose size is not large for the chosen image descriptors. For example, only a small number of cosine distances need to be calculated in total for each of the two Tamura's descriptors. Although the size of the coefficient matrix can be larger for the other 3 descriptors, the running time of PCA-ORTH remains comparable to that of the approximation method, since the additional cosine-distance computations are inexpensive for the computer compared with seeking the variables corresponding to the highest coefficients of each of the first q principal components. Overall, the proposed method relates the low-level statistical measurements to the high-level visual characteristics locally, applies the cosine distance as a simple dissimilarity measurement between principal components, and finds the feature vectors that are most dissimilar to each other. Therefore, the most representative features are computed with less computational complexity than in the other methods and preserve the most representative characteristics of the original ones, leading to higher classification accuracy than the “full-feature” classification and much shorter running time in both feature selection and classification.

3.2.2. Experiments on the UIUCTex Texture Database

Figure 8 illustrates how the classification accuracy and the running time of the proposed LTM+PCA-ORTH method change with different neighbourhood sizes. Similar to the results on the Brodatz texture database, 5 is a good trade-off for the radius of the neighbourhood: it achieves the highest classification accuracy (98.75%) with a short running time (23.32 s).

Figure 9 shows the classification accuracy of applying the different texture descriptors combined with the different feature selection methods to the texture images in the UIUCTex database. Figure 10 shows the confusion matrices of the classification using the different texture description methods. Table 3 lists the running time of each step of the whole classification processing. Similar to the results on the Brodatz database, the proposed local Tamura's texture descriptor combined with the proposed PCA-ORTH feature selection method provides better classification performance than the comparators with respect to the average classification accuracy (98.75%), the confusion rate (1%), and the running time (23.32 s).

3.2.3. Experiments on the CUReT Texture Database

Figure 11 provides the classification accuracy of the proposed feature description and selection method for different neighbourhood radii, together with the corresponding running time of the whole procedure. Again, 5 is a good choice for the neighbourhood radius, leading to the highest classification accuracy (98.83%) with a short running time (23.40 s).

Figure 12 shows the results of describing and classifying the texture images in the CUReT database using the features extracted by the different description methods. Figure 13 shows the confusion matrices of the classification using the different texture description methods. Table 4 lists the running times of the classifications using the different texture descriptors and feature selection methods. The proposed description algorithm, i.e., the combination of the proposed local Tamura's descriptor and the proposed PCA-ORTH feature selection, again leads to higher classification accuracy (98.83%), a lower confusion rate (0.58%), and a shorter running time (23.40 s).

3.3. Discussion

We discuss the experimental results on the above three databases with respect to texture description and feature selection, since the proposed method consists of these two parts.

The classification of the textures in the different datasets by the “full-feature” versions of the different descriptors demonstrates their performance in reflecting the differences between textures. The LBP descriptor calculates the LBP code, which reflects the intensity distribution in the local neighbourhood, but it cannot distinguish textures with similar structure but different intensity contrast, because it does not measure the exact difference between each neighbouring pixel and the centre pixel. The SIFT descriptor needs to detect key points in the image first, so it cannot describe textures whose edges are quite smooth or whose patterns change gradually, where key points are difficult to find. The Gabor descriptor uses the Gabor wavelet coefficients as the texture features, so its performance depends on the Gabor transform of the given texture image; the Gabor-based description may fail when the scale or the size of the Gabor window is not selected suitably. The LBP, SIFT, and Gabor descriptors focus too much on local features and pay little attention to the relationship between the texture characteristics they describe and the human visual sense globally. Therefore, they cannot differentiate textures that differ only subtly to the human visual system but have similar edges or intensity distributions. Moreover, there is much redundant information in the LBP, SIFT, and Gabor-based descriptions, which increases the computational complexity, leading to longer running times than the Tamura (TM) and local Tamura (LTM) descriptors. The Tamura descriptor, as a high-level descriptor, pays more attention to the textural differences relevant to human visual perception, leading to higher classification accuracy. However, when the original Tamura definitions of the six characteristics are applied directly to a local neighbourhood, they fail to capture local contrast, rotation, illuminance, and scale changes, which makes it difficult to distinguish textures with similar appearance but different local details. The proposed local Tamura description defines the six Tamura characteristics directly from the local information in each neighbourhood. It correctly bridges the “semantic gap” between the low-level measurements and the high-level visual perceptual characteristics, leading to an accurate description of local textural regions with respect to human visual perception and accurate classification of both intraclass and interclass textures.

In the feature selection part, the PCA-MSPEM method keeps 4 or 5 of the 6 input features, so redundancy may still exist, and the computational complexity of finding the representative features is high because the mean square prediction error is calculated for all possible combinations of the selected features. The approximation based on the magnitudes of the coefficients of the principal components is a very intuitive and computationally feasible method, because it requires no computation other than checking the highest absolute values of the coefficients. However, the number of selected features equals the number of principal components, so some representative features are lost; as a result, the performance of classifying textures with these features varies considerably over the whole database. The PFA method performs well in selecting features that are widely spread in the lower-dimensional space and represent the original data well. However, it requires a prior parameter to be set, so it is not adaptive in selecting the features; moreover, the clustering step is computationally expensive when the number of coefficients is large. The proposed PCA-ORTH feature selection method applies the cosine distance as a simpler dissimilarity measurement and finds the feature vectors that are most dissimilar to each other. Therefore, the most representative features are computed with less computational complexity than in the other methods, and the method is able to select the most representative features of the original ones, leading to better classification performance and shorter running time in both feature selection and classification.

The experimental results on the above three databases prove that the proposed local Tamura's texture descriptor can correctly bridge the “semantic gap” between the low-level measurements and the high-level visual perceptual characteristics. Moreover, the proposed PCA-ORTH feature selection method can find the key features that reflect the essential differences between textures and avoid redundancy in the description, reducing the confusion of intraclass textures and the computational complexity of the subsequent classification.

4. Conclusions

In this paper, we proposed a local Tamura's texture description and a PCA-ORTH feature selection scheme. Texture images can be locally described by the high-level characteristics that fit human visual perception, including coarseness, contrast, directionality, line-likeness, regularity, and roughness. The PCA-ORTH feature selection scheme was then applied to a given database to find the most representative features that differentiate the textures with low redundancy. Experiments were conducted to compare the proposed description method with other methods in classifying the textures in different databases. The experimental results demonstrate that the local Tamura's texture features describe the different textures in a database accurately and that the proposed feature selection method effectively finds the features that are essential for differentiating the textures. As a result, the differences between textures can be measured more precisely with respect to these key texture characteristics, and different textures can be classified more accurately.

Data Availability

All of the data are available at the following links: Brodatz texture database: http://multibandtexture.recherche.usherbrooke.ca/original_brodatz.html; UIUCTex texture database: http://slazebni.cs.illinois.edu/research/uiuc_texture_dataset.zip; CUReT texture database: http://www.cs.columbia.edu/CAVE/software/curet/.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the National Natural Science Foundation of China under Grants 61701101, U1713216, 61803077, and 61603080, the National Key Robot Project Grant 2017YFB1300900, and the Fundamental Research Funds for the Central Universities (N172603001 and N172604004). The authors would like to thank Ying Wang and Wei Zhou (Northeastern University, China) for their help and discussions.