Research Article  Open Access
Characterization of Complex Image Spatial Structures Based on Symmetrical Weibull Distribution Model for Texture Pattern Classification
Abstract
Texture pattern classification has long been an essential issue in computer vision (CV). However, texture is a perceptual concept formed by human beings during scene observation or content understanding, and it cannot be defined or described precisely in CV. Visually, the appearance of the complex spatial structure (CSS) of a texture pattern (TP) generally depends on the random organization (or layout) of local homogeneous fragments (LHFs) in the imaged surface. Hence, it is essential to investigate the latent statistical distribution (LSD) behavior of LHFs for distinctive CSS feature characterization to achieve good classification performance. This work presents an image statistical modeling-based TP identification (ISM-TPI) method. It first gives a theoretical explanation of the Weibull distribution (WD) behavior of the LHFs of the imaged surface in the imaging process based on the sequential fragmentation theory (SFT), which consequently yields a symmetrical WD model (SWDM) to characterize the LSD of the TP's SS. Multidirectional and multiscale TP features are then characterized by the SWDM parameters based on oriented differential operators; in other words, texture images are convolved with multiscale and multidirectional Gaussian derivative filters (GDFs), including steerable isotropic GDFs (SIGDFs) and oriented anisotropic GDFs (OAGDFs), so that omnidirectional and multiscale SS details are exhibited with low computational complexity. Finally, SWDM-based TP feature parameters, shown to be directly related to the human visual perception system and to carry significant physical perceptual meaning, are extracted and used for TP classification with a partial least squares-discriminant analysis (PLS-DA) based classifier. The effectiveness of the proposed ISM-TPI method is verified by extensive experiments on three texture image databases.
The classification results demonstrate the superiority of the proposed method over several state-of-the-art TP classification methods.
1. Introduction
Texture is an important visual perception cue ubiquitously present in almost all natural images. TP classification (TPC) plays an essential role in a variety of image analysis and computer vision (CV) tasks, such as image segmentation [1], scene classification, object recognition, and image retrieval [2], among many others [3]. Developing a TPC or identification system mainly involves two steps: designing an effective algorithm for TP feature extraction [2] and training a classifier for TPC [4, 5].
A great many texture feature extraction methods can be found in the literature, e.g., the early comparative study [6], a more recent review of texture descriptors [7], and a specific survey of texture feature extraction methods for medical image analysis [8]. However, it is worth noting that texture is a subjective and abstract visual concept of human beings, encompassing multiple perceptual properties of the observed object surface [9], such as coarseness, roughness, regularity, directionality, and contrast, but it cannot be defined or described effectively in CV. Hence, TPC remains a challenging task due to the unclear definition of the concept of TP, the wide range of natural texture types, and the large intra-class variations in TPs [10]. To facilitate CV applications and digital image processing and analysis, researchers have proposed many approximation methods to describe these perceptions, each depending on its specific definition of TP.
There has been much interest in the use of local (invariant) descriptors for texture analysis in CV tasks, such as region segmentation, texture pattern classification, and even real applications, e.g., local descriptor-based face detection or recognition [12]. These local structure descriptors can be mainly divided into two types: sparse descriptors based on detectable interest points, e.g., the scale-invariant feature transform (SIFT) descriptor and its variants [13, 14], and dense descriptors computed over the whole image pixel by pixel, e.g., the local binary pattern (LBP) descriptor [15], the pairwise rotation-invariant co-occurrence local binary pattern (PRICoLBP) [16], and the Weber local descriptor (WLD) [12]. In addition, some researchers have combined the ideas of the sparse and dense descriptors in a crossed way. For example, Li et al. used SIFT in a dense sampling manner [17], and Heikkilä et al. extracted the LBP based on interest regions in a sparse manner [18]. All these methods attempt to obtain discriminant features from local image patches by choosing a limited subset of the texture features [3] in the original pixel space or in the filter response space. Even though there is ample empirical evidence that these methods can achieve good classification performance on some specific texture databases, a comparative study [19] concluded that "no single approach did perform best or very close to the best for all images, thus no single approach may be selected as the clear 'winner'" for TPC.
Visually, TPs, especially complex TPs fully occupied by micro-textons, are highly structured by the spatial configurations of LHFs. In other words, real-world TPs are distinct but share some randomly distributed elements [20]. The randomly configured or distributed LHFs, or the frequency of occurrence of LHFs, accounts for the complexity of TPs [21, 22], which gives rise to the visual variety of the SS of TPs. Hence, characterizing the distinctive organizations of the TP's SS is of great importance for TPC. This paper mainly concerns the characterization of the essential structural features of the image surface appearance for TPC.
Since the visual appearance of a region in an imaged TP is determined by the random layouts of LHFs, distinctive TP characteristics are inevitably related to statistical methods [23]. The early widespread methods are local or global statistics of the image pixel intensities or the image filter responses, such as first-order and second-order statistics and co-occurrence-matrix-based statistics, e.g., gray-level co-occurrence or difference histograms [24], the gray-level co-occurrence matrix (GLCM), the gray-level run-length matrix (GLRM) [25], and multivariate image analysis (MIA) [26], as well as their variants [27]. It has been demonstrated that statistical-measure-based methods are a promising tool in computer texture analysis. As stated by Varma and Zisserman in [28], the success of the canonical texture research areas, such as synthesis, classification, segmentation, and compression, was largely due to learning a statistical representation of the filter bank responses or the image pixel intensity values.
The statistics-based texture analysis methods directly compute some local or global statistics (e.g., low-order moments) as texture features in the filter bank response domain or in the original pixel intensity domain, but they make no prior assumption about the latent statistical distribution (LSD) of the SS of the TP. There is evidence that statistics-based texture features cannot provide distinctive descriptions for some complex and confusing TPs and may even yield identical measures for distinct TPs. For instance, two texture images sampled from different distribution models can share the same mean and variance with high probability [21], as seen in Figure 1, which displays artificial images with the same statistics (zero mean and unit variance) but distinctly different visual appearance. Hence, these statistical measures may result in low classification accuracy, since they cannot offer a meaningful description of the LSD of the SS characteristics of texture images.
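The point above can be checked numerically: two images drawn from different distributions can be standardized to identical first- and second-order statistics while remaining distinguishable by a higher-order statistic. A minimal Python sketch, with Gaussian versus Laplacian noise standing in for the two artificial textures of Figure 1:

```python
import numpy as np

rng = np.random.default_rng(0)

def standardize(img):
    """Force zero mean and unit variance."""
    return (img - img.mean()) / img.std()

# two 64x64 noise "textures" from different distributions
a = standardize(rng.normal(size=(64, 64)))    # Gaussian
b = standardize(rng.laplace(size=(64, 64)))   # Laplacian: sharper peak, heavier tails

# first- and second-order statistics are now identical by construction ...
print(abs(a.mean()) < 1e-9, abs(b.mean()) < 1e-9)          # True True
print(abs(a.std() - 1) < 1e-9, abs(b.std() - 1) < 1e-9)    # True True

# ... but a higher-order statistic (kurtosis) still tells them apart
def kurtosis(img):
    z = img.ravel()
    return (z ** 4).mean()

print(kurtosis(b) > kurtosis(a))  # True: the Laplacian sample is more kurtotic
```

This is why distribution-model-based features, rather than low-order moments alone, are pursued in the remainder of the paper.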
Some researchers have attempted to discover the relation between the visual appearance and the LSD of the texture image's SS through bio-visual perception experiments or computational texture modeling methods [23, 29]. Considerable efforts have been devoted to statistical distribution modeling (SDM) based methods for LSD characterization, to learn a fuller representation of the distribution models under random field constraints, especially integrated with the prevalent multi-channel and multiscale image analysis, such as the dyadic wavelet transform (DWT) and the Gabor wavelet transform (GWT) [30]. Many useful statistical models have been introduced to approximate the LSDs of TPs in the filter bank response domains [31, 32].
In particular, researchers have recommended sharp-peaked models with heavy, long tails or skewed shapes, e.g., the Gaussian scale mixture model (GSMM) [33], the mixture Gaussian model (MGM) [34], the Gamma distribution model (GDM) [30], and Student's t distribution model (STDM) [35], to characterize the marginal statistical distribution (MSD) of texture images in the multiscale image transform domain, owing to the sparse or skewed behavior of the image filter bank responses (IFBRs). In other words, the MSDs of the IFBRs are highly kurtotic in the center of the distribution but long-tailed on both sides, or exhibit seriously left-skewed or right-skewed distribution behavior.
MSD models can be extended to multivariate statistical distribution (MvSD) models, e.g., joint distributions representing the statistical dependence of IFBRs or image pixel intensity values at a specific distance and fixed orientation, which have been investigated by augmenting the aforementioned MSD model arguments. Besides their applications in TPC, MSD and MvSD models can also be used as priors to improve image processing and analysis technologies, such as image denoising [36], image retrieval [37], object recognition [38], and image segmentation [39].
SDM-based TP feature description methods treat the image pixels as samples from an empirical distribution model (EDM) with unknown model parameters, which can describe the relationship between a certain image pixel and its adjacent pixels through a probability model, e.g., the Markov random field (MRF) model and the simultaneous autoregression (SAR) model [40]. Although the SDM-based approach provides a meaningful route to LSD characterization, current methods offer little theoretical analysis of the exact statistical distribution of the SS of the TP; that is, the selection of MSD and MvSD models for TP feature description depends on the researchers' experience and is strongly tied to the training samples. Poor classification performance results if an unsuitable empirical model is introduced. In addition, the commonly used MSD and MvSD models do not make good use of human visual perception (HVP) characteristics and thus lack physical perceptual meaning. In other words, these methods are not based on the mechanism of the human visual system, so they can hardly represent the multiple textural properties salient to visual perception [9], and they need a large number of image samples to learn a reasonable statistical model. Hence, common SDM-based texture analysis methods are basically TP-specific [41] and perform poorly on the classification of TPs with large intra-class variability and great inter-class similarity.
To obtain distinctive SS features of TPs and achieve high classification accuracy, this paper investigates a novel image statistical modeling-based texture pattern identification (ISM-TPI) method for TPC. First, it makes a theoretical analysis of the LSD behavior of the SS of the TP, revealing through the sequential fragmentation theory (SFT) that the spatial structure of LHFs in the image surface obeys a typical Weibull distribution (WD). A symmetrical Weibull distribution (SWD) model is derived based on the SFT, and the SWD model parameters of the multidirectional and multiscale SS of the TP are extracted by local differential operators, including a kind of fast steerable isotropic Gaussian derivative filter and a kind of oriented anisotropic Gaussian derivative filter. TPs can then be categorized with high accuracy from the SWDM-based features using a partial least squares-discriminant analysis classifier. The effectiveness of the proposed ISM-TPI method is verified by extensive experiments on three different texture image databases, on which it achieves superior performance compared with several commonly used methods.
2. Statistical Modeling of CSSs in Texture Patterns
2.1. Theoretical Analysis of the WD Behavior of the LSD of Randomly Arranged Fragments
During the visual imaging or scene observation process, local homogeneous regions or objects in the imaging scene fragment the field of view (FOV) into separate parts, e.g., inside-object regions and outside-object regions [42]. If we enhance the imaging resolution, these inside-object and outside-object regions will be fragmented again by the presence of more LHFs or more complicated objects and finer details, and so on, until the resolution limit of the FOV is reached in every direction. When the FOV is sufficiently fragmented by LHFs, the imaging scene becomes stable; the addition of new details no longer significantly changes its appearance, and consequently a specific visual appearance of the TP emerges. Conversely, when the resolution of the visual sensor is reduced, adjacent local homogeneous structures recombine and constitute relatively coarser fragments. Geusebroek [42, 43] has demonstrated that this imaging process, which fragments the visual surface details into LHFs, can be well simulated by the sequential fragmentation (SF) process known from ore milling, which describes the distribution of ore particle size or mass in a ball grinding mill [44]. Hence, the distribution of the local fragments or particles in images can be predicted by the SF theory (SFT).
In terms of the SFT [45], the statistical distribution of LHFs produced by the fragmentation process obeys a power-law distribution, which can be expressed by the following formula:

f(x → x′) ∝ (x′/β)^γ,  (1)

where x → x′ represents a single fragmentation step from a coarse fragment structure to a fine fragment structure; the parameter β in the ore grinding process represents the average mass (or size, given the ore density) of the crushed ore particles, whereas in texture image analysis it denotes the average illumination intensity of the LHFs; and γ is a shape parameter, satisfying γ > −1.
With increasing resolution power of the visual sensor, more local fragment details are introduced into the imaging scene, which divides the image into smaller LHFs (regions or particles). Hence, the histogram distribution f(x) of the illumination contrast x is obtained by integrating over the power-law-distributed LHFs [43], given by

f(x) ∝ (x/β)^γ ∫_x^∞ f(x′) dx′.  (2)

Substituting (1) into (2) and letting F(x) = ∫_x^∞ f(x′) dx′, we can obtain f(x) by solving the following equation:

−F′(x) ∝ (x/β)^γ F(x).  (3)

By solving (3), it can be found that f(x) obeys the typical WD law; namely,

f(x) = C (x/β)^γ exp(−(1/(γ+1)) (x/β)^(γ+1)),  (4)

where C is a normalization constant to make sure that f(x) is a proper distribution function.
Since the resolution power of the visual sensor is limited, the fragmentation process inevitably terminates and a stable TP is exhibited; the statistical distribution of the SS of the LHFs then corresponds to the accumulated distribution of the fragments whose local contrast is larger than x, which is demonstrated to obey a symmetrical Weibull distribution model (SWDM) given by the following formula:

p(x) = C exp(−(|x|/β)^γ),  (5)

where the shape parameter has been redefined as γ ← γ + 1 relative to (4).
To obtain a generalized characterization of TPs, the SWDM can be modified slightly by introducing a location parameter μ, indicating the distribution location (center); namely, the probability density of the SS of TPs can be described by a three-parameter SWDM [25, 35, 46]:

p(x) = C exp(−(|x − μ|/β)^γ),  C = γ / (2β Γ(1/γ)),  (6)

where C is a normalization constant and Γ(·) is the Gamma function.
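As a sanity check, the three-parameter SWDM density can be evaluated directly. The sketch below assumes the standard form p(x) = γ/(2βΓ(1/γ)) · exp(−(|x−μ|/β)^γ), with μ, β, γ playing the location, scale, and shape roles described in the text, and verifies numerically that it integrates to one:

```python
import numpy as np
from scipy.special import gamma as Gamma

def swdm_pdf(x, mu, beta, gam):
    """Assumed three-parameter SWDM: p(x) = C * exp(-(|x-mu|/beta)**gam),
    with normalizing constant C = gam / (2*beta*Gamma(1/gam))."""
    c = gam / (2.0 * beta * Gamma(1.0 / gam))
    return c * np.exp(-(np.abs(x - mu) / beta) ** gam)

# numerical check that the density integrates to one
x = np.linspace(-50.0, 50.0, 200_001)
area = swdm_pdf(x, 0.0, 2.0, 1.5).sum() * (x[1] - x[0])
print(round(area, 4))  # 1.0
```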
2.2. Relations to the HVP System
It has been demonstrated that the SWDM can completely characterize the spatial layout of stochastic textures [43]. In the SWDM depicted in (6), μ is a location parameter representing the global reflectance of the texture image, which can be derived from the shape-from-shading method [47]; β is a scale parameter, directly related to the illumination contrast of the LHFs; and γ is the shape parameter, reflecting the size of the LHFs in the texture image.
The visual appearance of a TP varies with the statistical distribution of its LHFs. Psychophysical evidence indicates that two TPs are often difficult to discriminate when they produce similar distributions of responses from a bank of orientation- and spatial-frequency-selective linear filters [9]. From regular textures to purely random TPs, the distribution of the SS of the TP gradually changes from a power-law distribution to an exponential distribution and then to a typical Gaussian distribution. The SWDM is a generalized model that can effectively express LSDs of various forms by adjusting the model parameters [25, 35]. For instance, when γ = 1, the SWDM turns into an exponential (Laplacian-shaped) distribution centered at μ, and when γ = 2, it is essentially a Gaussian distribution. In addition, some studies have demonstrated that the shape parameter γ is directly related to the fractal dimension of a texture image, so the fractal dimension can be approximated from γ [46].
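The limiting cases mentioned above can be confirmed numerically. With the same assumed SWDM form as before, γ = 1 coincides with a Laplacian of scale β, and γ = 2 with a Gaussian of standard deviation β/√2:

```python
import numpy as np
from scipy.special import gamma as Gamma
from scipy.stats import laplace, norm

def swdm_pdf(x, mu, beta, gam):
    """Assumed SWDM form: p(x) = gam/(2*beta*Gamma(1/gam)) * exp(-(|x-mu|/beta)**gam)."""
    c = gam / (2.0 * beta * Gamma(1.0 / gam))
    return c * np.exp(-(np.abs(x - mu) / beta) ** gam)

x = np.linspace(-4.0, 4.0, 17)
beta = 1.3

# gam = 1: identical to a Laplacian with scale beta
print(np.allclose(swdm_pdf(x, 0.0, beta, 1.0), laplace.pdf(x, scale=beta)))  # True

# gam = 2: identical to a Gaussian with standard deviation beta / sqrt(2)
print(np.allclose(swdm_pdf(x, 0.0, beta, 2.0),
                  norm.pdf(x, scale=beta / np.sqrt(2))))                     # True
```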
Hence, the SWDM generates a complete orthogonal basis for stochastic texture description, and the SWDM parameters are directly related to HVP properties [43, 46]. The commonly used HVP properties for texture perception are coarseness, regularity, contrast, roughness, and directionality [9]. The SWDM parameters can be used to give a psychophysical explanation of these perceptual properties [43], as discussed below.
Coarseness is a fragment-size-related perceptual property. Generally, the larger the size of the basic fragments in a texture pattern, the coarser it feels to the HVP system. In the SWDM, coarseness is reflected by the shape parameter γ: a smaller γ means a coarser TP. However, coarseness is an observation-scale-related property [51]. As psychophysical evidence shows, if we magnify a texture image without enhancing the resolution of the visual sensor, the HVP system adjusts the observation perception and keeps scale invariance, with no extra details added to the FOV. The SWDM is demonstrated to fit this scale invariance well; in other words, the shape parameter γ remains unchanged under this circumstance. Alternatively, if the texture image is magnified with an increase in resolving power, the shape parameter γ becomes smaller, corresponding to the perception of a coarser TP.
Regularity is a layout-variation-related HVP characteristic. In general, a finer TP is felt to be more regular. As analyzed above, coarseness (or fineness) is reflected by the shape parameter γ; hence, regularity can also be expressed by γ. In the extreme case where only a few particles or even a single fragment exists in the FOV, the LSD of the SS of the TP no longer follows the SWDM, because the real distribution obeys either a power-law distribution or that of a purely regular fragment (one or two fragments fully occupying the entire FOV), and the exhibited statistical distribution is often multimodal. This phenomenon is reflected in the overfitting of the maximum likelihood estimate (MLE) of the shape parameter [43].
Contrast is an illumination-variation-related HVP characteristic [51]. It results from the illumination intensity and the surface height variations of the observed objects [47]. The global illumination variation is reflected by the location parameter μ, which indicates an inhomogeneously illuminated surface, whereas the surface height variations are reflected by the scale parameter β, the width of the SWDM according to the SFT. Hence, the perceptual contrast can be indicated by the two SWDM parameters μ and β [43].
Roughness is originally a tactile property of surface perception. Though it seems to be a three-dimensional perceptual property, human beings can perceive it from a two-dimensional image. Roughness is an HVP property that results from the height variations of a homogeneous fragment: the greater the height variations of the fragments in the FOV, the rougher the perception. As analyzed above, β is an indicator of the height variations of the TP, and the shape parameter γ reflects the coarseness of the fragments in the texture image. Hence, roughness can be reflected by the shape parameter γ and the scale parameter β under a given illumination condition, which is indicated by the location parameter μ [43].
Directionality is a global sense of the whole FOV, caused by the shapes of the basic texture elements as well as their placement rules. Though the SWDM parameters do not contain direct shape information, the anisotropy of grains can be described by the dominant direction of the shape parameter γ [43-45], while anisotropy in texture shadows (or contrast) is reflected by the dominant orientation of the scale parameter β. If the overall SS information of the TP is fully considered, the SWDM parameters can effectively describe the HVP characteristic of directionality together with the corresponding dominant orientation information.
2.3. Parameter Estimation of SWDM
The most commonly used model parameter estimation method is maximum likelihood estimation (MLE). Given a sample set X = {x_1, x_2, …, x_N} drawn from a SWDM, the corresponding logarithmic likelihood function (LLF) L(μ, β, γ) = Σ_i ln p(x_i; μ, β, γ) indicates how well the statistical model fits the training samples X. We obtain the model parameters by maximizing the LLF; namely,

(μ̂, β̂, γ̂) = argmax_{μ, β, γ} L(μ, β, γ).  (7)

After expanding and simplifying the LLF, we obtain

L(μ, β, γ) = N ln(γ / (2β Γ(1/γ))) − Σ_{i=1}^{N} (|x_i − μ|/β)^γ.  (8)
The SWDM parameters can be estimated by computing the partial derivatives of L with respect to β, γ, and μ and setting each derivative equal to zero. Hence, we obtain the following equation set:

∂L/∂β = −N/β + (γ/β^(γ+1)) Σ_{i=1}^{N} |x_i − μ|^γ = 0,  (9)

∂L/∂γ = N/γ + (N/γ²) ψ(1/γ) − Σ_{i=1}^{N} (|x_i − μ|/β)^γ ln(|x_i − μ|/β) = 0,  (10)

∂L/∂μ = (γ/β^γ) Σ_{i=1}^{N} sgn(x_i − μ) |x_i − μ|^(γ−1) = 0,  (11)

where ψ(·) is the digamma function, ψ(z) = d ln Γ(z)/dz.
However, we cannot obtain a closed-form solution of the SWDM parameters by solving the above equation set (9)-(11). In this work, we use a gradient-based optimization method, the Newton-Raphson iterative optimization scheme, to estimate the SWDM parameters.
First, we collect the three parameters of the SWDM into a column vector, Θ = (μ, β, γ)^T. Then we update the parameter estimate step by step based on the following updating rule:

Θ^(k+1) = Θ^(k) − H^(−1)(Θ^(k)) g(Θ^(k)),  (13)

where Θ^(k) denotes the parameter estimate at the k-th step, g(Θ) is the vector of first-order derivatives of E(Θ), E(Θ) = −L(Θ) represents the negative of the LLF, and H(Θ) is the Hessian matrix of E(Θ).
Given an initial parameter estimate Θ^(0), we repeat the update (13) until the estimates converge, which yields the best estimate of the SWDM parameters Θ̂ = (μ̂, β̂, γ̂)^T. The elements of the derivative vector g and the Hessian matrix H follow from (8) by direct differentiation and involve the trigamma function ψ₁(·), ψ₁(z) = dψ(z)/dz.
The convergence rate of the iterative estimation process is related to the initial model parameter Θ^(0). To speed up the estimation process, the initial model parameter is assigned an empirical value close to the extremum point.
Computing the inverse of the Hessian matrix is a time-consuming task. In fact, the location parameter μ is nearly the mean value of the sampling points, and we can find the relation between the shape parameter γ and the scale parameter β from the partial derivative equation (9); namely,

β = ((γ/N) Σ_{i=1}^{N} |x_i − μ|^γ)^(1/γ).  (17)
Hence, we subtract the mean value x̄ from each sampling point and obtain a corrected sample set Y = {y_i = x_i − x̄}, which still obeys the SWDM with the same scale parameter β and shape parameter γ but is symmetric around the y-axis. So, if we make a rough estimate of the location parameter μ with the sample mean and take the relation between the scale and shape parameters into account, namely, substituting (17) into (10), we achieve a univariate equation as follows:

1 + ψ(1/γ)/γ − (Σ_i |y_i|^γ ln|y_i|) / (Σ_i |y_i|^γ) + (1/γ) ln((γ/N) Σ_i |y_i|^γ) = 0,  (18)

where the y_i are the samples in Y. We can solve (18) by the Newton-Raphson scheme, and then, based on the relation formula (17), we obtain the scale parameter β conveniently.
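The practical scheme just described (center the samples, solve the univariate likelihood equation for the shape parameter, then recover the scale in closed form) can be sketched as follows. For robustness, a bracketing root solver is used here in place of Newton-Raphson; the equation solved is the standard generalized-Gaussian shape equation consistent with (17) and (18):

```python
import numpy as np
from scipy.optimize import brentq
from scipy.special import psi   # digamma

def fit_swdm(samples):
    """Practical SWDM fit: mu from the sample mean, gamma from the univariate
    likelihood equation (solved by bracketing rather than Newton-Raphson,
    purely for robustness), beta from the closed-form relation."""
    mu = samples.mean()
    a = np.abs(samples - mu)
    a = a[a > 0]                      # guard the logarithms below
    n = a.size

    def shape_eq(gam):
        s = a ** gam
        return (1.0 + psi(1.0 / gam) / gam
                - (s * np.log(a)).sum() / s.sum()
                + np.log(gam / n * s.sum()) / gam)

    gam = brentq(shape_eq, 0.3, 5.0)
    beta = (gam / n * (a ** gam).sum()) ** (1.0 / gam)
    return mu, beta, gam

# Gaussian data is a SWDM with gamma = 2 and beta = sqrt(2) * sigma
rng = np.random.default_rng(1)
mu, beta, gam = fit_swdm(rng.normal(loc=0.5, scale=1.0, size=200_000))
print(round(gam, 1), round(beta, 1), round(mu, 1))  # close to 2.0, 1.4, 0.5
```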
The goodness of fit of the SWD model is evaluated by the χ² statistic and the Kullback-Leibler divergence (KLD), given by

χ² = Σ_x (p_e(x) − p(x))² / p(x),  KLD = Σ_x p_e(x) ln(p_e(x)/p(x)),  (19)

where p(x) and p_e(x) are the expected SWDM probability and the empirical probability over x, respectively. The lower the χ² and KLD statistics, the more precise the fitted statistical model.
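Both statistics can be computed from a binned histogram. The sketch below normalizes the empirical and model probabilities over the same bins (a discretized approximation of the definitions above) and checks that the correctly specified model scores a lower KLD than a mismatched Gaussian-shaped one:

```python
import numpy as np
from scipy.special import gamma as Gamma

def swdm_pdf(x, mu, beta, gam):
    c = gam / (2.0 * beta * Gamma(1.0 / gam))
    return c * np.exp(-(np.abs(x - mu) / beta) ** gam)

def fit_quality(samples, mu, beta, gam, bins=64):
    """Binned chi-square and KL divergence between the empirical histogram
    and a candidate SWDM, both renormalized over the same bins."""
    hist, edges = np.histogram(samples, bins=bins)
    p_emp = hist / hist.sum()
    centers = 0.5 * (edges[:-1] + edges[1:])
    p_mod = swdm_pdf(centers, mu, beta, gam)
    p_mod = p_mod / p_mod.sum()
    keep = (p_emp > 0) & (p_mod > 0)
    chi2 = ((p_emp[keep] - p_mod[keep]) ** 2 / p_mod[keep]).sum()
    kld = (p_emp[keep] * np.log(p_emp[keep] / p_mod[keep])).sum()
    return chi2, kld

rng = np.random.default_rng(2)
x = rng.laplace(scale=1.5, size=100_000)     # true model: mu=0, beta=1.5, gam=1
good = fit_quality(x, 0.0, 1.5, 1.0)
bad = fit_quality(x, 0.0, 3.0, 2.0)          # Gaussian shape with matched variance
print(good[1] < bad[1])  # True: the correct model yields the lower KLD
```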
3. CSS Feature Characterization
3.1. Local DifferentialBased SS Characterization
A digital image is a collection of discrete sampling points from a continuous two-dimensional function. Hence, the local SS at any pixel point around a given origin in a visual image can be determined by the Taylor expansion formula [42]. The N-th order Taylor approximation can be computed as follows [52]:

I(x + dx, y + dy) ≈ Σ_{n=0}^{N} (1/n!) Σ_{k=0}^{n} C_n^k I_{x^k y^(n−k)}(x, y) dx^k dy^(n−k),  (20)

where C_n^k is the combinatorial number and I_{x^k y^(n−k)} is the n-th order image derivative of the function I(x, y), namely, I_{x^k y^(n−k)} = ∂^n I / (∂x^k ∂y^(n−k)), which carries the important SS information of I(x, y).
The above Taylor expansion formula indicates that the image pixel observation value can be approximated by a weighted summation of the differential structures, which represent the SS of a texture image under a certain observation scale. Since the observed image is smoothed by a Gaussian kernel, each differential structure is equivalent to a convolution between the image and a Gaussian derivative filter (GDF) [52]; namely,

I_{x^k y^(n−k)}(x, y; σ) = I(x, y) ∗ G_{x^k y^(n−k)}(x, y; σ),  (21)

where the symbol ∗ denotes the 2D convolution operation, G(x, y; σ) is the Gaussian kernel function (GKF) with scale σ, and G_{x^k y^(n−k)} represents the n-th order GDF with the k-th order partial derivative over the x direction and the (n−k)-th order partial derivative over the y direction, k = 0, 1, …, n.
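In practice, the convolution with a GDF need not be implemented by hand; for instance, SciPy's `gaussian_filter` realizes Gaussian derivative filtering directly through its `order` argument (one derivative order per axis):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(3)
img = rng.random((128, 128))
sigma = 2.0

# n-th order differential structure = image convolved with a Gaussian
# derivative filter; `order` gives the derivative order per axis (row, col)
Ix  = gaussian_filter(img, sigma, order=(0, 1))   # dI/dx (along columns)
Iy  = gaussian_filter(img, sigma, order=(1, 0))   # dI/dy (along rows)
Ixx = gaussian_filter(img, sigma, order=(0, 2))   # d2I/dx2

print(Ix.shape == img.shape)  # True

# a derivative of a constant image is numerically zero
flat = gaussian_filter(np.ones_like(img), sigma, order=(0, 1))
print(np.allclose(flat, 0.0))  # True
```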
3.2. Steerable Isotropic GDF (SIGDF)
The isotropic-GKF-based GDF, G_{x^k y^(n−k)}, can extract the spatial structure characteristics of a texture image in the corresponding x or y direction, depending on the derivative orders of the isotropic GKF. Since the GKF is circularly symmetric, we call the isotropic-GKF-based GDF the isotropic GDF (IGDF). Intuitively, many TPs exhibit obviously directional behavior. Hence, directional information needs to be incorporated into the IGDF so that the SS characteristics of a TP in any direction can be obtained, which benefits TPC.
We adopt a kind of steerable IGDF (SIGDF) [53] for SS characterization, which can obtain multidirectional SS features of a TP efficiently. To facilitate the description, we denote by G_n an n-th order IGDF with scale parameter σ, and by G_n^θ the result of rotating G_n by angle θ. According to the results of Freeman [53], an oriented GDF can be computed by the following formula:

G_n^θ = Σ_{j=1}^{M} k_j(θ) G_n^{θ_j}.  (23)
As can be seen from (23), the oriented IGDF can be constructed through a weighted summation of basis IGDFs at specific orientations θ_j. The computational schematic can be seen in Figure 2. The key issue in this expression is to determine the number M of basis IGDFs and the corresponding weighting value k_j(θ) for each basis IGDF.
We transform G_n from Cartesian coordinates into polar coordinates to obtain the number of IGDF bases M. Suppose the polar coordinate expression of G_n is g(r, φ), which can also be expressed as a Fourier series in the angular variable; namely,

g(r, φ) = Σ_{m=−n}^{n} a_m(r) e^{imφ},  (24)

where the a_m(r) denote the Fourier decomposition coefficients. Comparing (24) with (23) indicates that M is equal to the number of nonzero harmonic components in the Fourier series of g(r, φ); in other words, M equals the number of nonzero coefficients a_m(r). After M is determined, it is easy to obtain the weighting coefficients k_j(θ), which are in fact the solution of the following equation, obtained by applying the rotation to each harmonic component under the polar coordinates:

e^{imθ} = Σ_{j=1}^{M} k_j(θ) e^{imθ_j}  for each nonzero harmonic m.  (25)
Namely, k_j(θ) can be obtained by solving (25) after choosing the appropriate GDF bases. Then we can obtain a steerable IGDF in any direction. For instance, the steerable first-order and second-order IGDFs can be computed as follows:

G_1^θ = cos(θ) G_1^{0°} + sin(θ) G_1^{90°},  G_2^θ = Σ_{j=1}^{3} k_j(θ) G_2^{θ_j},  k_j(θ) = (1/3)[1 + 2 cos(2(θ − θ_j))],  (26)

with θ_j ∈ {0°, 60°, 120°}. Figure 3 displays the oriented filters of the first-order, second-order, and fourth-order IGDFs for θ ∈ [0°, 180°].
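The steerability of the first-order GDF can be verified numerically: the response at any angle θ is cos θ times the x-derivative response plus sin θ times the y-derivative response, so for a ripple image varying along a single direction, the steered response across that direction should vanish. A sketch (the angle and ripple frequency are arbitrary choices):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

# ripple image varying only along direction alpha (constant across it)
alpha = np.deg2rad(30.0)
yy, xx = np.mgrid[0:128, 0:128].astype(float)
img = np.sin(0.3 * (xx * np.cos(alpha) + yy * np.sin(alpha)))

sigma = 2.0
Ix = gaussian_filter(img, sigma, order=(0, 1))   # basis response: x-derivative
Iy = gaussian_filter(img, sigma, order=(1, 0))   # basis response: y-derivative

def steered(theta):
    """First-order steerable response at angle theta (Freeman-style steering)."""
    return np.cos(theta) * Ix + np.sin(theta) * Iy

core = (slice(16, -16), slice(16, -16))          # ignore boundary effects
along = steered(alpha)                            # direction of variation
across = steered(alpha + np.pi / 2)               # image is constant this way
print(np.abs(across[core]).max() < 0.01 * np.abs(along[core]).max())  # True
```

Only the two basis responses are ever computed; every other orientation comes from the weighted sum, which is the computational advantage of steerable filters.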
3.3. Oriented Anisotropic GDF (OAGDF)
Many TPs are visually characterized by obvious geometric anisotropy; their local linear structures, edge structures, and certain specific spatial layouts cannot be characterized effectively by the IGDF. Hence, anisotropic GDFs (AGDFs) should be adopted to characterize the various complicated anisotropic SSs of TPs.
The commonly used two-variable elongated Gaussian kernel function (eGKF) in two-dimensional Euclidean space is given by [54, 55]

g(x; σ, ρ) = (1/(2πσ²)) exp(−(x₁²/ρ² + ρ² x₂²)/(2σ²)),  (27)

where x = (x₁, x₂)^T is the 2D coordinate, σ is referred to as the scale parameter, and ρ is the anisotropic factor, satisfying ρ ≥ 1. Analogously to the SIGDF, directional information should be introduced; hence the oriented AGKFs are generated by rotating the eGKF, given by

g(x; σ, ρ, θ) = g(R_{−θ} x; σ, ρ),  (28)

R_θ = [cos θ, −sin θ; sin θ, cos θ],  (29)

where R_θ is the rotation matrix and θ is the orientation of the AGKF. Geusebroek [56] presented an alternative expression of the AGKF by setting different standard deviations in the two orthonormal directions of (28) and (29) (i.e., using σ_u and σ_v rather than a universal scale σ); we adopt the above definition of the AGKF with a single scale parameter, inspired by Shui's work [55], because an explicit noise suppression formula has been derived for it and it is much easier to compute.
The AGDF is the first-order partial derivative of the eGKF with respect to the first variable x₁, g_{x₁}(x; σ, ρ) = ∂g(x; σ, ρ)/∂x₁, and the oriented AGDF (OAGDF) is defined as the rotated expression of the AGDF, which reflects the SS variation of the TP along the direction θ:

g_{x₁}(x; σ, ρ, θ) = g_{x₁}(R_{−θ} x; σ, ρ).  (30)
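An OAGDF kernel can be sampled directly on a grid. The sketch below assumes the single-scale eGKF convention described above (standard deviation ρσ along the oriented axis and σ/ρ across it, ρ ≥ 1), with the derivative taken along the oriented axis; axis conventions vary between papers, so this is one reasonable reading:

```python
import numpy as np

def oriented_aniso_gauss_deriv(sigma=3.0, rho=2.0, theta=0.0, radius=15):
    """Sampled OAGDF kernel: anisotropic Gaussian (std rho*sigma along the
    oriented axis u, sigma/rho across it) differentiated along u, then
    rotated by theta. Conventions are an assumed reading of the eGKF form."""
    yy, xx = np.mgrid[-radius:radius + 1, -radius:radius + 1].astype(float)
    # rotate coordinates by -theta so the kernel itself appears rotated by +theta
    u = xx * np.cos(theta) + yy * np.sin(theta)    # elongated (oriented) axis
    v = -xx * np.sin(theta) + yy * np.cos(theta)   # short axis
    g = np.exp(-(u ** 2 / rho ** 2 + rho ** 2 * v ** 2) / (2 * sigma ** 2))
    g /= g.sum()                                   # normalize the smoothing part
    return -u / (rho ** 2 * sigma ** 2) * g        # first derivative along u

k = oriented_aniso_gauss_deriv(theta=np.deg2rad(30.0))
print(k.shape)                 # (31, 31)
print(abs(k.sum()) < 1e-12)    # True: a derivative kernel has zero DC response
```

Convolving an image with `k` (e.g., via `scipy.ndimage.convolve`) then gives the directional response along θ.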
It is worth noting that the OAGDF defined in (30) is not steerable, so its value at an arbitrary direction cannot be computed as a weighted summation of a fixed set of GKF bases at fixed orthonormal directions [54], in contrast to the generation of the SIGDF as denoted in Figure 3.
To facilitate application, it has been shown that the AGKF can be decomposed into two Gaussian line filters along nonorthogonal directions [56]. The main idea comes from the convolution theorem: the Fourier transform of a convolution is the pointwise product of the Fourier transforms. Hence, an eGKF can be separated into the convolution of two nonorthogonal Gaussian line filters according to the product decomposition of the eGKF in the Fourier domain. Specifically, the quadratic form in the exponent of the Fourier transform of the eGKF can be rewritten as the sum of two squared terms along two (generally nonorthogonal) directions, and the corresponding weighting coefficients are chosen so that the filter response remains unchanged. Since the resulting Fourier transform is then a product of two Gaussian functions, the inverse transform expresses the eGKF as the convolution of a 1D Gaussian filter applied along the x-axis and a second 1D Gaussian line filter applied along another orientation, with appropriately chosen standard deviations (STDs). It is worth noting that this second orientation is not orthogonal to the x-axis. Hence, the OAGDF can also be computed by cascading these two 1D convolutions, which greatly reduces the computational cost.
We display a set of AGKFs with their corresponding OAGDFs, which exhibit obviously anisotropic characteristics, in Figure 4. They can be used to extract imperceptible changes in texture images along the orientation θ. An OAGDF with a large scale and a large anisotropic factor can effectively extract the grayscale variation characteristics along the direction θ at a high signal-to-noise ratio (SNR); namely, the OAGDF can effectively characterize the SS of texture images, especially the edges along the direction θ, which involve the essential layout property of the SS of the TP. The directional local SS of the TP along the orientation θ is computed by the convolution operator.
4. Texture Pattern Classification
The proposed ISM-TPI method mainly includes a texture image feature extraction module, a classifier learning module, and the application of the trained classifier to TPC. In the texture feature extraction stage, we attempt to extract the essential SS characteristics of texture images based on the local-differential-based SS representation and SWDM-based feature extraction. The schematic of ISM-TPI can be found in Figure 5.
4.1. SWDM-Based TP Feature Extraction
SIGDFs and OAGDFs are first used to obtain the omnidirectional SS characteristics of the texture image. Then, the three-parameter SWDM of the filter responses is fitted to obtain the image's structure feature vector. Figure 6 displays a texture image with its SWDM-based image statistical modeling result. As can be seen from Figure 6, the SWDM fits the LSD of the GDF-based local SS characteristics well and is clearly superior to the Gaussian distribution approximation.
To obtain the omnidirectional SS characteristics, we discretize the omnidirectional interval into orientations uniformly. The SWDM parameters of the orientations are then collected to construct the texture image feature. Figure 7 displays the polar plot of the omnidirectional SWDM parameter features of three texture images convolved with two kinds of OGDFs: one is a second-order IGDF and the other is the OAGDF. The SWDM parameter-based features differ markedly between texture patterns, which makes automatic texture pattern classification possible.
Besides the directional information, the observation scale is another important factor affecting the SS characterization. Hence, spatial structure characteristics observed at different scales are also analyzed in this paper by changing the size of , where collects the SWDM parameters of the SS convolved with the SIGDFs in , and collects the SWDM parameters of the texture image's SS characteristics convolved with the OAGDFs in ; here is the total number of GKF scales, is the number of anisotropic factors for MOOAGDF generation, and , , are the three parameters of the estimated SWDM of the texture image filtered with a SIGDF whose Gaussian kernel scale is and whose orientation is ; , , are the corresponding three SWDM parameters of the ITP filtered with an OAGDF whose orientation is , anisotropic factor is , and eGKF scale parameter is .
Hence, the statistical modeling-based SS characteristics of the TP consist of two parts: one is the multiscale and omnidirectional SIGDFs (MOSIGDFs) based SWDM parameter feature, denoted as , and the other is the multiscale and omnidirectional OAGDFs (MOOAGDFs) based SWDM parameter feature, abbreviated as . To obtain the local SS characteristics of the TP, we can divide the image into multiple sub-blocks of fixed size, each of which can then be treated as an independent image. Given that sub-blocks undergo detailed texture feature analysis, the expanded TP feature can consequently be expressed by , where denotes the SWDM parameter feature of the local subimage .
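The SWDM fitting step can be sketched as follows. Since the paper's symmetric Weibull model is not reproduced here, this sketch substitutes SciPy's generalized normal distribution, a symmetric three-parameter exponential-power family (shape, location, scale), as a stand-in; the function name and the orientation count are illustrative assumptions:

```python
import numpy as np
from scipy.stats import gennorm

def swdm_like_features(response):
    """Fit a symmetric three-parameter model to one filter-response map.

    Stand-in sketch: SciPy's generalized normal (shape beta, loc mu, scale
    alpha) plays the role of the paper's symmetric Weibull model; the three
    fitted parameters form the feature triple for one (scale, orientation).
    """
    beta, mu, alpha = gennorm.fit(np.asarray(response).ravel())
    return beta, mu, alpha

# feature vector over a toy grid of 4 orientations for one scale
rng = np.random.default_rng(1)
responses = [rng.laplace(0.0, 1.0, size=4096) for _ in range(4)]
feature = np.concatenate([swdm_like_features(r) for r in responses])
# one (shape, location, scale) triple per orientation -> 12-D feature
```

Concatenating such triples over all scales, orientations, and sub-blocks mirrors the expanded TP feature construction described above.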
4.2. PLS-DA Based Classifier
TPC is a nonlinear function approximation task, with the multivariate image feature variables as the independent variables and the texture pattern labels as the dependent variables. Currently, the commonly used methods [57] for this task include minimum distance classification (MDC), the K-nearest neighbor method (KNN), soft independent modeling of class analogy classification (SIMCA), the artificial neural network (ANN), the support vector machine (SVM), the partial least squares-discriminant analysis (PLS-DA) method, and so on.
PLS-DA [58] is a discriminant analysis method based on PLS regression, which considers the class membership information provided by the auxiliary matrix when constructing the classification model. PLS-DA inherits the advantages of the PLS method in feature extraction and noise reduction, and it has been shown that PLS-DA can effectively reduce the dimension of the input variables so as to obtain good classification performance [59]. The introduction of the filter bank inevitably makes the SWDM-based texture feature redundant. Therefore, the PLS-DA based classifier is adopted to build the TP classification models in this work.
Suppose there are labeled texture image samples , where is the image SS feature vector of each texture image and is its corresponding category label; the effective mapping model from to can then be established by the PLS-DA method to achieve TPC. Based on the PLS-DA method, and can be expressed by the following model, where and are parameter matrices satisfying and ; and are the regression residual errors; and and are the latent variables in the PLS model, which are mainly used to build the correlation model between and . The linear regression model between and can be expressed as , where the regression factor can be expressed as .
As a result, in the classification task, the score vector of the test sample is first obtained according to the parameter matrix ; then the category label corresponding to the sample can be acquired based on the following expression
For the two-category classification problem, suppose is the class label; since the obtained category label for any test image is not necessarily limited to , the following rules can be chosen to distinguish the image classes, where is an appropriate threshold value whose suitable selection can minimize the classification error.
Concerning the multiclass TPC task, the decision-making tree method can be used to solve this problem: for classes of samples, 1 PLS-DA classifiers are constructed in series to realize accurate classification and recognition of the classes of texture image samples.
5. Experimental Verification
5.1. Texture Data Sets and Experiment Setup
Three commonly used texture data sets, the Brodatz texture database (Brodatz [60]), the Columbia-Utrecht Reflectance and Texture Database (CUReT [61]), and the texture database (TDB [62]), were used to validate the classification accuracy (CA) of the proposed ISMTPI and compare it with the state of the art. The detailed training and testing data sets are described as follows.
The small-scale Brodatz data set (24 classes) was chosen and processed as reported in [3]. The 24 homogeneous texture patterns, downloaded from the normalized Brodatz's texture (NBT) database [63], can be seen in Figure 8. Each was originally 640 × 640 pixels and was partitioned into 25 non-overlapping subimages of 128 × 128 pixels, of which 13 samples per TP were used for classifier training and the remaining 12 for testing.
The large-scale challenging Brodatz data set (112 classes) contains all 112 TPs in the NBT database. We refer to it as a challenging data set owing to the great visual similarity between different texture patterns. For instance, some different texture patterns in the data set visually belong to the same scene imaged at different distances, e.g., D6 and D14, D23 and D27, D25 and D26, and D79 and D78, while other texture patterns may incur visual ambiguities, e.g., D40 and D42, D103 and D104, etc. Some easily confused TPs can be seen in Figure 9. The training and testing samples were processed in the same way as for the data set.
The small-scale CUReT data set (20 classes) was chosen from the CUReT database, where the texture patterns were captured from real-world surfaces (with different capturing scales, surface height variations, and orientations), which is believed to span a wide range of geometric and photometric properties. We collected 90 images of size 128 × 128 (trimmed and rescaled) for each texture pattern, and half of the samples were used for classifier training. The texture patterns are displayed in Figure 10.
The full CUReT data set (61 classes) was chosen from the CUReT database, where all 61 real-world surface texture patterns were used for performance comparison. The samples were prepared in the same way as for data set .
The original TDB data set (25 classes) includes 25 texture classes, and each texture pattern contains 40 samples of 640 × 480 pixels. The samples of each texture pattern were collected at different orientations and scales; some texture patterns can be seen in Figure 11. This data set is also a challenging test set owing to the large variability in the imaged appearance of the same texture pattern captured under different illuminations, positions, etc.
Table 1 summarizes the relevant information of the data sets used in the experiments. Since the multichannel color content does not introduce discriminative information (it is very similar across all the texture images), we consider only the gray values of the texture images for feature analysis and TPC. Each experiment is repeated 30 times by randomly choosing the samples with replacement; the effectiveness of the method is then evaluated by the average classification accuracy over the repeated experiments, together with the standard deviation.
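The sub-image partitioning used throughout the experiments (a 640 × 640 Brodatz image into 25 non-overlapping 128 × 128 samples, and later a 640 × 480 TDB image into 300 blocks of 32 × 32) can be sketched with a NumPy reshape; the helper name is illustrative:

```python
import numpy as np

def split_into_blocks(image, block=128):
    """Partition a grayscale image into non-overlapping block x block
    subimages (edge strips that do not fill a full block are discarded)."""
    h, w = image.shape
    rows, cols = h // block, w // block
    trimmed = image[:rows * block, :cols * block]
    return (trimmed.reshape(rows, block, cols, block)
                   .swapaxes(1, 2)
                   .reshape(-1, block, block))

# a 640 x 640 Brodatz-style image yields 25 subimages of 128 x 128
blocks = split_into_blocks(np.zeros((640, 640)), block=128)
```

The same helper with `block=32` turns a 640 × 480 TDB sample into the 300 sub-blocks used for local feature extraction.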

5.2. Textural Feature Characterization Methods in the Comparison Study
The proposed ISMTPI method was compared with the state of the art. A summary of these comparative methods is displayed in Table 2, and the implementation details are described as follows.

GLCM [48]. The GLCM method characterizes a texture image by calculating how often pairs of pixels with specific values and in a specified spatial relationship occur in the image. In the comparative study, texture images were first quantized into 16, 32, 64, and 256 grayscale levels. Then 16 GLCMs with different spatial-relationship settings were obtained for each quantized image, denoted as , where denotes the pixel offset and denotes the orientation of the two co-occurring pixel intensities x and y. Subsequently, 14 Haralick statistical measures [48] were calculated from each , including energy, contrast ratio, correlation, entropy, and so on. Consequently, a 14 × 16-tuple vector for each texture image was obtained by concatenating the average value of each statistical measure over the four grayscale levels. To represent the texture structure more fully, the texture image can be divided into non-overlapping subimages of size no smaller than 32 × 32 pixels, and each subimage treated as an independent image to obtain the corresponding GLCM-based statistical measures. Finally, we obtain a -tuple feature vector for texture pattern analysis, where and represent the image width and height, respectively. For a fair comparison, the classifier was PLS-DA, and the classifier structure and training/test samples were the same as for the proposed ISMTPI.
GWT [49]. We applied the Gabor wavelet transform with 5 scales and 7 directions to each texture image, obtaining 35 Gabor filter-bank responses. The Gabor filter bank is designed so that the half-peak magnitude supports of the filter responses touch each other in the frequency spectrum, as described in [64]. The mean and variance of the energy spectrum of each Gabor filter response were computed to constitute the GWT texture feature. As with the GLCM feature, GWT features based on local sub-blocks were also obtained. Hence, the ultimate GWT-based texture feature is a -tuple feature vector.
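The GWT feature computation can be sketched with scikit-image's Gabor filter. The bank here is deliberately smaller (3 frequencies by 4 orientations instead of 5 scales by 7 directions), and the frequency values are illustrative assumptions:

```python
import numpy as np
from skimage.filters import gabor

def gwt_features(gray, frequencies=(0.1, 0.2, 0.3), n_orient=4):
    """GWT sketch: filter with a small Gabor bank and keep the mean and
    variance of each response magnitude as the per-filter feature pair."""
    feats = []
    for freq in frequencies:
        for k in range(n_orient):
            real, imag = gabor(gray, frequency=freq, theta=k * np.pi / n_orient)
            mag = np.hypot(real, imag)            # response magnitude
            feats.extend([mag.mean(), mag.var()])
    return np.array(feats)

rng = np.random.default_rng(4)
patch = rng.random((32, 32))
f = gwt_features(patch)   # 3 frequencies x 4 orientations x 2 stats = 24 values
```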
MRMRF [50]. It first applies a 3-layer Haar wavelet decomposition to construct the multiresolution image representation, and then establishes an MRF model in each decomposition subband. Each subband image is modeled with 4 nonzero MRF model parameters, so for the 9 detail subbands a total of 36 parameters is obtained. At the same time, 10 wavelet energy signatures (including the low-pass subimage energy signature) are collected. Hence, we obtain 36 + 10 = 46 parameters to generate the MRMRF feature vector. If we divide the image into non-overlapping subimages of 32 × 32 pixels and process each subimage as an independent image, we obtain a -tuple feature vector. For the classifier, we used the PLS-DA classifier as well as the NLC approach as in [50].
VZ algorithm [11]. The texture pattern is represented by the joint probability distribution of image filtering responses (the MR8 filter bank is used in the experiments). The distribution is represented by the frequency histogram of filter-response cluster centers (textons). For each texture class, 10 texton cluster centers are learned by the standard K-means algorithm as in [11]. Hence an image can be represented by a -bin histogram, where is the number of texture patterns.
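The texton learning and histogram model of the VZ algorithm can be sketched as follows; generic per-pixel response vectors stand in for the MR8 filter-bank responses, and k = 10 matches the per-class texton count mentioned above:

```python
import numpy as np
from sklearn.cluster import KMeans

def learn_textons(responses, k=10, seed=0):
    """VZ-style sketch: cluster per-pixel filter-response vectors into k
    textons with K-means (the MR8 responses are replaced by generic ones)."""
    return KMeans(n_clusters=k, n_init=10,
                  random_state=seed).fit(responses).cluster_centers_

def texton_histogram(responses, textons):
    """Label each pixel with its nearest texton and return the normalized
    frequency histogram used as the texture model."""
    d = ((responses[:, None, :] - textons[None, :, :]) ** 2).sum(-1)
    labels = d.argmin(1)
    hist = np.bincount(labels, minlength=len(textons)).astype(float)
    return hist / hist.sum()

rng = np.random.default_rng(5)
resp = rng.random((500, 8))          # 500 pixels x 8 filter responses
textons = learn_textons(resp, k=10)
h = texton_histogram(resp, textons)
```

Concatenating the per-class texton dictionaries before histogramming yields the -bin image representation described above.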
PATCH [28]. Raw pixel intensities of local 7 × 7 patches from the selected training images in a texture class are aggregated and clustered. The set of cluster centers from all the classes comprises the texton dictionary. Both training and testing are performed in the patch domain. Classification is still achieved by the nearest-neighbor classifier with the statistic.
PATCH-MRF [28]. A texture image is represented by a two-dimensional histogram: one dimension for the quantized bins of the patch center pixel and the other for the learned textons from the patch with the center pixel excluded. As reported in [28], 200 bins are chosen for the central pixel with textons, which achieves relatively better performance.
WLD [12]. A texture image is represented by a two-dimensional histogram: one dimension for the pixel's differential excitation and the other for the orientation. Hence, a histogram of size is obtained, where represents the number of cells in each orientation. Each column sub-histogram is then evenly divided into segments, and the two-dimensional histogram is regrouped and encoded into a one-dimensional histogram . In [12], the parameters are set as , , .
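A simplified reading of the WLD descriptor can be sketched as follows. The differential-excitation formula, binning ranges, and the (m, t, s) grouping here are hedged approximations of the construction in [12], not a faithful reimplementation:

```python
import numpy as np
from scipy.ndimage import convolve

def wld_histogram(gray, m=6, t=8, s=4):
    """WLD sketch: per-pixel differential excitation (arctan of the summed
    8-neighbour differences over the centre) and quantized gradient
    orientation, collected into a 2-D histogram and flattened.
    Parameter roles follow the text; details are simplified assumptions."""
    x = gray.astype(float) + 1e-6                      # avoid division by zero
    ksum = np.array([[1, 1, 1], [1, -8, 1], [1, 1, 1]], float)
    xi = np.arctan(convolve(x, ksum) / x)              # differential excitation
    gy = convolve(x, np.array([[0, -1, 0], [0, 0, 0], [0, 1, 0]], float))
    gx = convolve(x, np.array([[0, 0, 0], [-1, 0, 1], [0, 0, 0]], float))
    theta = np.mod(np.arctan2(gy, gx), 2 * np.pi)      # gradient orientation
    t_bin = np.minimum((theta / (2 * np.pi) * t).astype(int), t - 1)
    xi_bin = np.minimum(((xi + np.pi / 2) / np.pi * (m * s)).astype(int),
                        m * s - 1)
    hist, _, _ = np.histogram2d(t_bin.ravel(), xi_bin.ravel(),
                                bins=(t, m * s), range=((0, t), (0, m * s)))
    return (hist / hist.sum()).ravel()

rng = np.random.default_rng(6)
h = wld_histogram(rng.integers(0, 256, (64, 64)))
```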
PRICoLBP [16]. PRICoLBP is an extended local binary pattern feature. Following the processing reported in [16], we extract six co-occurrence patterns from three scales and two orientations, yielding a 2 × 3 × 590-dimensional feature vector. For the classifier, the SVM with RBF kernel and the one-versus-the-rest strategy for the multiclass classification task was used in [16]. In this work, the PLS-DA classifier is also applied for classification performance comparison.
5.3. Experimental Results
5.3.1. Validation Experiments
The SWDM-based TP feature extraction results are directly related to the SS characterization results, which depend on the configuration of the GDFs, and . For ease of description, we denote by the th omnidirectional SIGDF (OSIGDF) with five specific scales, which includes five SIGDF templates, , where and , and we denote by the omnidirectional OAGDF (OOAGDF) with fixed anisotropic coefficient under five specific scales . In practice, the multidirectional SS characteristics of a texture image are obtained by discretizing the range of into orientations uniformly; in the experiments, . Regarding the local SS expression, we divide the original image into non-overlapping subimages of size 32 × 32; namely, the samples in , , , and are partitioned into 16 subimages, while the samples in are divided into 300 subimages. Each subimage is treated as an independent image, and the expanded TP feature is extracted as described in formula (41).
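For concreteness, the bookkeeping of the feature dimensionality implied by such a configuration can be sketched as follows; every numeric value (scales, orientation count, anisotropic factors) is an assumption standing in for the symbols lost in the text above:

```python
import numpy as np

# Illustrative configuration grid (all values are assumptions):
# 5 Gaussian scales, K orientations over [0, pi), several anisotropic
# factors for the OAGDF part, 3 SWDM parameters per filter response.
scales = [1.0, 1.5, 2.0, 3.0, 4.0]
K = 12                                                # orientation samples
orientations = [k * np.pi / K for k in range(K)]
aniso_factors = [2, 3, 4]
n_orders = 3                                          # 1st/2nd/3rd-order SIGDFs

dim_sigdf = len(scales) * K * n_orders * 3            # MOSIGDF feature part
dim_oagdf = len(scales) * K * len(aniso_factors) * 3  # MOOAGDF feature part
total_dim = dim_sigdf + dim_oagdf                     # per-subimage dimension
```

Multiplying `total_dim` by the number of 32 × 32 sub-blocks (16 or 300, depending on the data set) gives the dimension of the expanded TP feature.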
We first consider the classification performance based on the OSIGDFs and OOAGDFs independently. Figure 12 displays the OSIGDF-based classification accuracies (CAs) of the proposed ISMTPI method on the five prepared data sets.
As can be seen from Figure 12, on the small-scale data sets, e.g., and , the method yields apparently higher classification accuracies, whereas on the challenging data sets, e.g., and , the classification performance declines owing to the great similarities among visually confusing TPs.
Regarding the influence of the differential order of the oriented GDFs on the CAs, using the oriented third-order GDF alone, , achieves relatively lower classification accuracies regardless of the data set, whereas the CAs based on and are relatively higher than those from the exclusive use of . However, and generally reach close CAs on all five prepared data sets except , where the CAs of are apparently higher than those of .
Combining OSIGDFs of more than one differential order achieves much higher CAs, especially the full combination of the three OSIGDFs, denoted as in Figure 12, which achieves CAs as high as 92% on average on data sets , , and , and CAs close to 90% on the challenging data sets and .
Some microstructural differences between TPs can be reflected by local anisotropic and directional features. Hence, effective exhibition and representation of the anisotropic SS characteristics of different texture patterns can, in theory, facilitate the analysis of visually similar TPs. We conducted experiments with only OAGDFs to obtain anisotropic characteristics for classification performance comparison. Figure 13 displays the average classification accuracies with standard deviations on the five prepared data sets using OOAGDFs with different anisotropic factors , .
Comparing the OSIGDF-based CAs in Figure 12 and the OOAGDF-based CAs in Figure 13, OOAGDFs achieve higher average classification accuracies than OSIGDFs (the average classification accuracy on the five data sets is almost as high as 94%). The main reason is that the SSs of many TPs exhibit obvious anisotropic behavior, which can be captured by OOAGDFs with different parameter settings.
However, OOAGDF-based TPC yields a higher variance of the CAs on these data sets, because many TPs exhibit obvious anisotropic organization of the local homogeneous fragments: when the anisotropic factor of the OAGDF is consistent with the anisotropy of the SS of the TP, good classification performance is achieved, and vice versa. Hence, it is necessary to combine OSIGDFs and OOAGDFs with multiple scales and multiple potential anisotropic factors to obtain sufficient detail of the SS of TPs and thereby achieve better classification performance on various TPs. Table 3 enumerates the CAs on the five data sets based on the combined OGDFs, including the first-order, second-order, and third-order IGDFs and AGDFs with five specific anisotropic factors, . In Table 3, the remarks are , , and ; the Gaussian scale parameter in both OSIGDFs and OOAGDFs takes 5 different values, ; and omnidirectionality is implemented by discretizing the range into directions uniformly.
The classification results in Table 3 indicate that the proposed ISMTPI method achieves quite high and stable classification accuracies on the five data sets. In particular, with the full combination of OSIGDFs and OOAGDFs (), it achieves classification accuracies over 98% on data sets , , and , and over 94% on the challenging data sets and . The classification performance is quite stable considering the low STDs of the CAs over the repeated experiments, which are no greater than 2% on almost all five data sets.
5.3.2. Comparative Experiments
In terms of the characteristics of the texture features they extract, the comparative methods are all statistics-related texture analysis methods and can be categorized into three classes: statistical measurement-based methods (GLCM, GWT); statistical distribution modeling-related methods (MRMRF); and histogram-based local descriptor methods (WLD and PRICoLBP). A summary of these comparative TPC methods, including the feature description, feature dimensions, and classifier selection, is displayed in Table 2. The classification accuracies of the comparative methods on these data sets are displayed in Table 4, where results marked with an asterisk are taken directly from the recent literature and marked with references.

As can be seen from Table 4, the proposed ISMTPI achieves classification performance comparable or superior to the state-of-the-art methods. In terms of the CA results on data sets , , and , the proposed ISMTPI achieves higher CAs than all the comparative methods. Regarding the CA results on data set , almost all the methods achieve high classification performance: the classic statistical measurement-based methods achieve about 92% CA, the VZ algorithm and PATCH achieve CAs as high as 97%, and WLD and PRICoLBP achieve CAs above 98%. On data set , the best CA is 98.92%, from the WLD method, which collects histogram information of the differential excitation and the gradient orientation, representing the salient local texture pattern or, in other words, the spatial structural characteristic, as in the proposed ISMTPI method. On data set , the best classification result is 98.42%, from the PRICoLBP+SVM-based TPC method. However, the average CAs of ISMTPI on these two data sets are only 0.1% and 0.38% lower than the best results from WLD and PRICoLBP+SVM, respectively. At the same time, the CAs of the proposed ISMTPI are higher than those of the other state-of-the-art methods, e.g., the VZ algorithm, PATCH, and PATCH-MRF, on these two data sets.
5.3.3. Discriminant Power of the SWDMBased Feature
The main reason for the high classification accuracies is the stronger discriminant power of the SWDM-based texture features, which are demonstrated to be directly related to the perceptual properties of the HVP system, as explained in Section 2.2. Hence, the proposed ISMTPI method can capture the essential spatial structural characteristics of texture images and adapt to various texture patterns. In addition, the introduced SIGDFs and OAGDFs are effective filter banks for expressing multiscale and omnidirectional spatial structural characteristics. Hence, SWDM-based features achieve comparable and even superior classification performance relative to the state-of-the-art TPC methods.
The SWDM parameters of two groups of typically confusing TPs are displayed in Figure 14. D23, D27, and D31 represent rubble-like texture patterns at different visual scales; the visual difference between these TPs can be expressed by the scale parameter and location parameter , which are plotted in Figure 14(a). Regarding the TPs D80, D82, and D84, whose surface organization of LHFs is so similar that it may even mislead human visual identification, the essential distinctive SS of these TPs can nevertheless be expressed clearly by the SWDM parameters over different observation directions, as seen in Figure 14(b).
Some commonly used methods, e.g., GLCM, GWT, and MRMRF, cannot achieve high CAs on the challenging data sets mainly because they may produce similar feature parameters for confusing TPs with great visual similarity. Figure 15 displays four kinds of texture feature extraction results for the confusing TPs D80, D82, and D84, obtained by the GLCM-based method, the GWT method, the WLD method, and the VZ algorithm. The GLCM features are computed as follows: the pixel gray values of the original images are quantized into 8 gray levels, and the GLCMs are computed for four orientations, 0°, 45°, 90°, and 135°, each with 8 distances, ; namely, we obtain 32 GLCMs for each ITP, and then the four GLCM-based statistics, contrast, correlation, energy, and homogeneity, are extracted and displayed in Figure 15(a).