A Weighted Block Dictionary Learning Algorithm for Classification
Discriminative dictionary learning plays a critical role in sparse representation based classification and has led to state-of-the-art classification results. Existing discriminative dictionary learning methods follow two approaches, shared dictionaries and class-specific dictionaries, which associate each dictionary atom with all classes or with a single class, respectively. A shared dictionary is compact but lacks discriminative information; a class-specific dictionary contains discriminative information but includes redundant atoms across the class dictionaries. To combine the advantages of both, we propose a new weighted block dictionary learning method. This method introduces a proto dictionary and class dictionaries. The proto dictionary is a base dictionary without label information. Each class dictionary is a class-specific dictionary obtained by weighting the proto dictionary. Each weight value indicates the contribution of a proto dictionary block to the construction of a class dictionary. These weights can be computed conveniently because they are designed to adapt to the sparse coefficients. Different class dictionaries have different weight vectors but share the same proto dictionary, which yields higher discriminative power and lower redundancy. Experimental results demonstrate that the proposed algorithm achieves better classification results than several existing dictionary learning algorithms.
1. Introduction
Recently, sparse representation based classification has been extensively studied with encouraging results. In these methods, choosing a proper dictionary is the first and most important step. In the literature, there are two ways to design a dictionary: prespecified and adaptive. In early work, predetermined dictionaries (e.g., overcomplete DCT or wavelet dictionaries) were often used for the sake of simplicity. Later, dictionaries learned from training data attracted more attention, because learned dictionaries usually lead to better representations and have achieved considerable success in applications such as classification.
In the last decades, many well-known dictionary learning approaches have been proposed. These approaches can be divided into two categories: unsupervised dictionary learning (UDL) approaches and supervised dictionary learning (SDL) approaches [3–7]. UDL learns a dictionary from unlabeled training samples, whereas SDL learns a dictionary from labeled training samples. The K-SVD algorithm is a popular UDL algorithm that learns a compact dictionary from a set of unlabeled samples by singular value decomposition. It has been widely applied to image processing tasks such as image compression [8, 9], image restoration [10, 11], image deblurring [12, 13], super-resolution [14, 15], and visual tracking [16, 17]. K-SVD focuses mainly on the representational power of sparse representation and ignores its discriminative power, which is critical for pattern classification. Representational power is the capability to sparsely reconstruct a sample from its sparse coefficients and the dictionary. Discriminative power is the capability of the sparse coefficients to be well separated when they belong to different categories, which allows the sparse coefficients to be used to classify the samples.
Depending on whether the training samples are labeled, current dictionary learning approaches can be divided into UDL and SDL approaches. Depending on whether the atoms are labeled, they can also be divided into two main types: shared dictionary learning approaches [1, 19–23] and class-specific dictionary learning approaches [3–7]. In shared dictionary learning approaches, no atom carries label information, and all atoms are shared by samples from all classes. Shared dictionary learning approaches can be either UDL or SDL approaches. For example, the K-SVD algorithm is a shared, unsupervised dictionary learning approach, whereas the D-KSVD algorithm, an extension of K-SVD, is a shared, supervised dictionary learning approach. D-KSVD learns a discriminative dictionary by incorporating the linear classification error into the objective function. In class-specific dictionary learning approaches, every atom is labeled and can be used only by samples of its own class. Class-specific dictionary learning approaches are therefore necessarily SDL approaches. With a class-specific dictionary, class-specific reconstruction errors can be used to classify samples. Moreover, discriminant criteria can be incorporated into the dictionary learning process. For example, Zhang et al. presented a low-rank constraint, Yang et al. added a Fisher discrimination criterion, and Ramirez et al. proposed a structured incoherence constraint. However, when the number of classes is large, the learned dictionary becomes very large, and its redundancy can become serious. Recently, some hybrid methods that combine a shared dictionary and class-specific dictionaries have been proposed [25–27]. In these methods, the shared and class-specific parts need to be predefined, and balancing the two parts is a nontrivial task that is usually handled empirically.
Although the above-mentioned dictionary learning methods have achieved good classification results, the labels of their dictionary atoms are predefined and fixed, which may not accurately reflect the true structure of the data. Yang et al. proposed a latent dictionary learning (LDL) method. LDL learns a latent matrix that relates dictionary atoms to class labels; this mechanism has achieved very high classification accuracy.
In this paper, we propose a new dictionary learning method, named the weighted block dictionary learning (WBDL) method. This method is a compromise between the shared dictionary and the class-specific dictionary. As shown in Figure 1(a), WBDL learns a proto dictionary $D$ that is shared by all class dictionaries. The proto dictionary contains $B$ blocks. Assuming the training samples come from $C$ classes, the model learns $C$ subdictionaries $D_1, \ldots, D_C$, where $D_c$ is the class-specific dictionary corresponding to class $c$. Each class dictionary is obtained by multiplying the proto dictionary by the corresponding weight vector. Each value in the weight vector indicates the contribution of one block to the construction of the class dictionary. A class dictionary is a class-specific dictionary that represents samples from the same class. The sparse coefficients are obtained by sparsely representing the samples over the class dictionary. As shown in Figure 1(b), a new test sample is represented over each class dictionary, which yields $C$ sparse coefficient vectors $x_1, \ldots, x_C$. These sparse coefficients can be used to classify the test sample. In the WBDL model, instead of assigning each block to a single class in advance, each block of the proto dictionary can belong to all classes. The shared dictionary [1, 19–23] can be regarded as a special case of our WBDL model in which the weight matrix is an all-one matrix. The class-specific dictionary [3–7] can also be regarded as a special case of our WBDL model in which each weight vector has a single nonzero element equal to 1. Compared with the shared dictionary and the class-specific dictionary, our proposed model is more flexible, and it increases discriminability and reduces redundancy simultaneously.
Our specific contributions are as follows.
Firstly, for higher discriminative power and lower redundancy, we design a proto dictionary and some class dictionaries, where each class dictionary is a weighted proto dictionary. Our goal is to learn a compact proto dictionary and some discriminative class dictionaries. The class dictionary can represent samples sparsely and discriminatively.
Secondly, the sparse coefficients obtained by sparse representation are used for classification. Two classification algorithms are proposed: a local classification algorithm and a global classification algorithm. When enough training samples are available for each class, test samples are coded locally over each class dictionary. Otherwise, test samples are coded globally over the total dictionary. The global classification algorithm is a simplification of the local classification algorithm.
Thirdly, the weight vector corresponding to each class dictionary is easy to learn because it adapts to the sparse coefficients of the samples from the same category. These weights can be computed from the sparse coefficients directly. Compared with traditional dictionary learning algorithms, the WBDL algorithm does not significantly increase computational complexity. Experiments on several databases show that the WBDL algorithm is competitive with algorithms such as [2, 3, 5, 20, 23, 29].
This paper is organized as follows. In Section 2, we illustrate the related work, including shared dictionary and class-specific dictionary. In Section 3, weighted block dictionary learning model is proposed and analyzed. Two WBDL classification approaches also are proposed in this section. In Section 4, optimization of WBDL model is described and its two classification algorithms are given. In Section 5, experiments are performed on face recognition and object classification datasets to compare our algorithm with several state-of-the-art methods. We end this paper with a conclusion in Section 6.
2. Related Work
In this section, we review two types of dictionaries, shared dictionary and class-specific dictionary.
2.1. Shared Dictionary
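K-SVD is a popular UDL algorithm that learns a shared dictionary. It optimizes the following objective function:
$$\min_{D, X} \|Y - DX\|_F^2 \quad \text{s.t.} \quad \forall i,\ \|x_i\|_0 \le T, \tag{1}$$
where $Y = [y_1, \ldots, y_N] \in \mathbb{R}^{n \times N}$ are the input signals, each of dimension $n$; $D = [d_1, \ldots, d_K] \in \mathbb{R}^{n \times K}$ ($K > n$, making $D$ overcomplete) is a dictionary with $K$ atoms; $X = [x_1, \ldots, x_N] \in \mathbb{R}^{K \times N}$ are the sparse codes of the input signals $Y$; and $T$ is a constant that bounds the number of nonzero elements in each $x_i$.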
The minimization of (1) is solved by a two-step iterative algorithm. First, the dictionary $D$ is fixed and the sparse coefficients $X$ are found. This is a sparse coding problem, which can be solved by OMP, among other methods. Second, the sparse coefficient matrix $X$ is fixed and the dictionary is updated one atom at a time while all other atoms in $D$ are kept fixed.
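The sparse coding step can be illustrated with a minimal orthogonal matching pursuit (OMP) sketch. It is not the authors' implementation; the function name `omp`, the sparsity budget `T`, and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def omp(D, y, T):
    """Greedy OMP: approximate y with at most T atoms (columns) of D."""
    x = np.zeros(D.shape[1])
    support = []
    residual = y.astype(float).copy()
    for _ in range(T):
        # Pick the atom most correlated with the current residual.
        k = int(np.argmax(np.abs(D.T @ residual)))
        if k in support:
            break
        support.append(k)
        # Re-fit all selected coefficients by least squares.
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
        if np.linalg.norm(residual) < 1e-10:
            break
    if support:
        x[support] = coef
    return x

# Synthetic example: a 2-sparse signal over a random unit-norm dictionary.
rng = np.random.default_rng(0)
D = rng.standard_normal((20, 50))
D /= np.linalg.norm(D, axis=0)
x_true = np.zeros(50)
x_true[[3, 17]] = [1.5, -2.0]
y = D @ x_true
x_hat = omp(D, y, T=5)
```

The least-squares re-fit after each selection is what distinguishes OMP from plain matching pursuit: the residual stays orthogonal to every atom already chosen.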
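For each atom $d_k$ and the corresponding $k$th row of the coefficient matrix, denoted by $x_T^k$, define the group of samples that use this atom as $\omega_k = \{ i \mid x_T^k(i) \neq 0 \}$. The error matrix is computed as $E_k = Y - \sum_{j \neq k} d_j x_T^j$; restricting $E_k$ to the columns indexed by $\omega_k$ gives $E_k^R$, and restricting $x_T^k$ to $\omega_k$ gives $x_R^k$. Then the following problem is solved:
$$\min_{d_k, x_R^k} \|E_k^R - d_k x_R^k\|_F^2. \tag{2}$$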
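A singular value decomposition (SVD) is performed, $E_k^R = U \Sigma V^T$. The updates are $d_k = u_1$ and $x_R^k = \Sigma(1,1)\, v_1^T$, where $u_1$ denotes the first column of $U$ and $v_1^T$ the first row of $V^T$.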
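The atom update above can be sketched as follows. This is an illustrative sketch under the standard K-SVD formulation; the function name `ksvd_atom_update` and the random test data are assumptions.

```python
import numpy as np

def ksvd_atom_update(D, X, Y, k):
    """Update atom k of D and row k of X in place via a rank-1 SVD fit."""
    omega = np.flatnonzero(X[k, :])            # samples that use atom k
    if omega.size == 0:
        return D, X                            # unused atom: leave unchanged
    # Error without atom k's contribution, restricted to those samples.
    E = Y[:, omega] - D @ X[:, omega] + np.outer(D[:, k], X[k, omega])
    U, s, Vt = np.linalg.svd(E, full_matrices=False)
    D[:, k] = U[:, 0]                          # new unit-norm atom
    X[k, omega] = s[0] * Vt[0, :]              # new coefficients on the support
    return D, X

rng = np.random.default_rng(0)
Y = rng.standard_normal((8, 30))
D = rng.standard_normal((8, 12))
D /= np.linalg.norm(D, axis=0)
X = rng.standard_normal((12, 30)) * (rng.random((12, 30)) < 0.2)
err_before = np.linalg.norm(Y - D @ X)
D, X = ksvd_atom_update(D, X, Y, k=0)
err_after = np.linalg.norm(Y - D @ X)
```

Because the leading singular pair gives the best rank-1 approximation of the restricted error matrix, the reconstruction error cannot increase, and the support of the coefficient row is preserved.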
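D-KSVD is a supervised extension of K-SVD; it is an SDL algorithm, but it also learns a shared dictionary. D-KSVD optimizes the following problem:
$$\min_{D, A, X} \|Y - DX\|_F^2 + \gamma \|H - AX\|_F^2 \quad \text{s.t.} \quad \forall i,\ \|x_i\|_0 \le T, \tag{3}$$
where $H$ is the label matrix of the input signals $Y$, $f(x) = Ax$ is a linear classifier, and $A$ is its weight matrix. The term $\|Y - DX\|_F^2$ denotes the reconstruction error. The new term $\|H - AX\|_F^2$ is the classification error of the linear classifier. The scalar $\gamma$ controls the balance between reconstruction and discrimination.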
Compared with K-SVD, D-KSVD adds a second term, the classification error. D-KSVD uses the class labels of the training samples, so its dictionary is more discriminative. However, D-KSVD does not take the class labels of the atoms into account.
2.2. Class-Specific Dictionary
A class-specific dictionary must be learned with an SDL algorithm. Suppose that there are $C$ classes of samples; a class-specific dictionary is denoted as $D = [D_1, \ldots, D_C]$, where each $D_c$ is the subdictionary corresponding to class $c$. All atoms of the dictionary are labeled, and the subdictionaries are learned or constructed class by class.
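The sparse representation based classification (SRC) method is a popular way to construct a class-specific dictionary. Suppose that there are $C$ classes of samples; $Y = [Y_1, \ldots, Y_C]$ is the set of training samples and $Y_c$ is the subset belonging to the $c$th class. SRC can be summarized in two stages. Given a query sample $y$, $y$ is first sparsely represented over the constructed dictionary $Y$ through $\hat{x} = \arg\min_x \|y - Yx\|_2^2 + \lambda \|x\|_1$. Then the residuals $e_c = \|y - Y_c \hat{x}_c\|_2$, where $\hat{x}_c$ is the coefficient subvector associated with class $c$, are used to classify $y$ as the class with the smallest $e_c$. SRC thus uses the representation residual to classify a test sample.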
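The two SRC stages can be sketched as below. As a loudly stated assumption, a small greedy OMP coder stands in for the sparse solver, so this is an illustration of residual-based classification rather than the original SRC implementation; all names (`sparse_code`, `src_classify`, `Y_train`) are hypothetical.

```python
import numpy as np

def sparse_code(A, y, T):
    """Stand-in sparse coder (greedy OMP with at most T atoms)."""
    support, r = [], y.copy()
    x = np.zeros(A.shape[1])
    for _ in range(T):
        k = int(np.argmax(np.abs(A.T @ r)))
        if k in support:
            break
        support.append(k)
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        r = y - A[:, support] @ coef
    if support:
        x[support] = coef
    return x

def src_classify(Y_train, labels, y, T=5):
    """Return the class whose training columns best reconstruct y."""
    x = sparse_code(Y_train, y, T)
    classes = np.unique(labels)
    residuals = []
    for c in classes:
        x_c = np.where(labels == c, x, 0.0)    # keep class-c coefficients only
        residuals.append(np.linalg.norm(y - Y_train @ x_c))
    return classes[int(np.argmin(residuals))]

# Two classes living in orthogonal coordinate subspaces.
rng = np.random.default_rng(0)
Y0 = np.vstack([rng.standard_normal((3, 4)), np.zeros((3, 4))])
Y1 = np.vstack([np.zeros((3, 4)), rng.standard_normal((3, 4))])
Y_train = np.hstack([Y0, Y1])
Y_train /= np.linalg.norm(Y_train, axis=0)
labels = np.array([0] * 4 + [1] * 4)
pred = src_classify(Y_train, labels, Y0 @ rng.standard_normal(4))
```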
SRC uses a constructed dictionary, in which the training subset of the $c$th class serves as the $c$th class dictionary. More generally, class-specific dictionaries can be learned class by class using the following objective function:
Equation (4) can be regarded as the basic model of class-specific dictionary learning. This model does not consider the relationship between subdictionaries. Yang et al. proposed the Fisher discrimination dictionary learning (FDDL) method, which learns a subdictionary for each class. The FDDL objective consists of four terms: the first is the data fidelity term, the second is the sparsity penalty, the third is the discrimination term, and the fourth makes the function smooth and convex. The FDDL model makes the class-specific dictionary more distinctive.
In this paper, we integrate shared dictionary and class-specific dictionary into a new dictionary learning model and propose WBDL model.
The block structure of the WBDL model is ensured by a mixed $\ell_1/\ell_2$ norm regularization. Most of the aforementioned methods simply adopt the $\ell_0$ norm or the $\ell_1$ norm for sparsity regularization. $\ell_1$ norm sparsity regularization is also known as the Lasso. Inspired by the success of structured sparsity (Group Lasso) in compressed sensing, several methods have been proposed for structured sparsity regularization. For example, Bengio et al. proposed group sparse coding (GSC), which places the training samples of each category in the same group and regularizes with a mixed norm. This encourages samples of the same group to be encoded with the same dictionary atoms. Elhamifar and Vidal proposed block sparse coding (BSC), which also uses a mixed norm for regularization, but applies it to a sparse coefficient vector rather than to a sparse coefficient matrix. The block sparsity regularization encourages a block structure in the learned dictionary. Chi et al. proposed block and group regularized sparse coding (BGSC), which combines group sparse coding and block sparse coding. As these methods show, the mixed norm is a suitable tool for learning the block structure of the proto dictionary.
Generally, a dictionary learning model has two unknown variables, the dictionary and the sparse coefficients; WBDL introduces a new variable, the weight vector. We find that when a dictionary block is more similar to the $c$th class samples than the other dictionary blocks, this block is more suitable for representing the $c$th class samples. Consequently, the sparse coefficients corresponding to this block are relatively larger than the others. Inspired by this observation, the weight vector of the $c$th class dictionary can be obtained from the sparse coefficients of the $c$th class samples. To avoid increasing the computational complexity, the weight vector is constructed directly from the sparse coefficients in the WBDL model.
3. Weighted Block Dictionary Learning
Shared dictionary learning algorithm ignores class labels of these dictionary atoms. Recently various class-specific dictionary learning approaches [3–7] have been proposed. These class-specific dictionary learning approaches are based on the assumption that the class label of each atom is invariable during the dictionary learning process. However, since dictionary atoms have been updated, the class label of these atoms should be reassigned in accordance with the updating of these dictionary atoms. The goal of our proposed weighted block dictionary learning model is to learn a labeled adaptive dictionary, which is composed of a proto dictionary and a weight matrix. Each column of the weight matrix is a weight vector that indicates the contribution of each proto dictionary block to construct a class dictionary. As a result, class dictionary is obtained by the product of a weight vector and the proto dictionary. In this section, firstly, we propose weighted block dictionary learning model. Secondly, we discuss the construction of weight matrix. Thirdly, we compare the difference of our WBDL model with BDL model. Finally, two classification approaches are proposed using WBDL model.
3.1. Weighted Block Dictionary Learning Model
Assume that $y \in \mathbb{R}^{n}$ is an $n$-dimensional signal with class label $c \in \{1, \ldots, C\}$. The training set with $N$ samples is denoted as $Y = [Y_1, \ldots, Y_C] \in \mathbb{R}^{n \times N}$, where $Y_c$ is the subset associated with class $c$. We design a proto dictionary $D = [D_{[1]}, \ldots, D_{[B]}] \in \mathbb{R}^{n \times K}$, where $D_{[b]}$ is the $b$th block of the proto dictionary, $B$ is the number of blocks, and $K$ is the total number of dictionary atoms. To describe the relationship between the proto dictionary and the class dictionaries, a weight matrix $W = [w_1, \ldots, w_C] \in \mathbb{R}^{B \times C}$ is introduced into our WBDL model, where $w_c$ is a vector indicating the contribution of each proto dictionary block to the construction of the $c$th class dictionary. For instance, $w_{bc}$ is the weight of the $b$th proto dictionary block in the $c$th class dictionary. Correspondingly, the $c$th class dictionary $D_c$, $c = 1, \ldots, C$, is denoted as $D_c = D\,\mathrm{diag}(w_c)$, where $\mathrm{diag}(w_c)$ is a diagonal matrix with the vector $w_c$ on its diagonal. To assign a weight to every atom, the diagonal matrix must have size $K \times K$, so the weight vector must be resized from $B \times 1$ to $K \times 1$ by repeating each entry $w_{bc}$ $K_b$ times, where $K_b$ is the number of atoms in the $b$th block. For example, when each block has 2 atoms, a weight vector $[w_{1c}, w_{2c}, \ldots]^T$ is resized to $[w_{1c}, w_{1c}, w_{2c}, w_{2c}, \ldots]^T$. Finally, a sparse representation that encodes the data over the corresponding class dictionary is obtained. Taking the $c$th class as an example, its data can be represented as $Y_c \approx D\,\mathrm{diag}(w_c) X_c$.
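The construction of a class dictionary from the proto dictionary can be sketched in a few lines. The function name `class_dictionary`, the argument `block_sizes`, and the numeric weights are hypothetical and serve only to illustrate the resizing of the weight vector to one weight per atom.

```python
import numpy as np

def class_dictionary(D, w, block_sizes):
    """Scale each block of the proto dictionary D by its class weight."""
    w_atoms = np.repeat(w, block_sizes)    # one weight per atom
    return D * w_atoms                     # same as D @ np.diag(w_atoms)

D = np.arange(16, dtype=float).reshape(2, 8)   # 4 blocks of 2 atoms each
w = np.array([0.1, 0.6, 0.2, 0.1])             # hypothetical weights of one class
D_c = class_dictionary(D, w, block_sizes=[2, 2, 2, 2])
```

Scaling columns by broadcasting avoids building the $K \times K$ diagonal matrix explicitly, which matters when the dictionary has many atoms.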
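The objective function of our proposed weighted block dictionary learning (WBDL) model can be described as follows:
$$\min_{D, W, X} \sum_{c=1}^{C} \Big\{ \|Y_c - D\,\mathrm{diag}(w_c) X_c\|_F^2 + \lambda \sum_{i=1}^{N_c} \sum_{b=1}^{B} \|x_i^{[b]}\|_2 \Big\} \quad \text{s.t.} \quad \|d_k\|_2 \le 1,\ k = 1, \ldots, K, \tag{6}$$
where $D$ is the proto dictionary, $D_c = D\,\mathrm{diag}(w_c)$ is the $c$th class dictionary, and $X_c$ are the sparse codes of $Y_c$. $x_i^{[b]}$ is the $b$th block of the sparse coefficient of the $i$th sample, and $N_c$ is the number of samples in class $c$. The first term denotes the reconstruction error of all samples of the $c$th category. The second term is the block sparse regularization of all samples of the $c$th category. $\lambda$ is a scalar controlling the trade-off between reconstruction and sparsity. To avoid a trivial solution for the sparse coefficients, each dictionary atom $d_k$ is constrained to satisfy $\|d_k\|_2 \le 1$.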
As shown in (6), the WBDL model is a nonconvex optimization problem in which three unknown variables, $D$, $X$, and $W$, need to be optimized. We propose a two-step iterative algorithm to solve this problem. In the first step, the weight matrix $W$ is fixed and the coefficient matrix $X$ and the dictionary $D$ are learned, which is a general dictionary learning problem. In the second step, the coefficient matrix $X$ and the dictionary $D$ are fixed and the weight matrix $W$ is constructed. The construction of the weight matrix is crucial for this new dictionary learning model.
3.2. Construction of Weight Matrix
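Without loss of generality, each weight value is required to be nonnegative, and the weights of each class are required to sum to 1; that is, $w_{bc} \ge 0$, $b = 1, \ldots, B$, $c = 1, \ldots, C$; $\sum_{b=1}^{B} w_{bc} = 1$. When the proto dictionary and the sparse coefficients are fixed, the key question is how to calculate the weight of every block. If we do not take the weight matrix into account, the block representation of the $c$th class samples can be written as follows:
$$Y_c \approx \sum_{b=1}^{B} D_{[b]} X_c^{[b]}, \tag{7}$$
where $Y_c$ is the $c$th class samples, $D_{[b]}$ is the $b$th block of the proto dictionary, and $X_c^{[b]}$ is the $b$th block of sparse coefficients, corresponding to the proto dictionary block $D_{[b]}$. Observing the sparse coefficients obtained by block sparse representation, we find the following. When $D_{[b]}$ is more similar to the $c$th class samples than the other blocks, $w_{bc}$ should be larger than $w_{jc}$, and the magnitude of $X_c^{[b]}$ is larger than that of the other coefficient blocks $X_c^{[j]}$, $j \neq b$. Inspired by this observation, the weight $w_{bc}$ can be calculated from the sparse coefficient block $X_c^{[b]}$. Consistent with the Frobenius norm of the reconstruction error, the Frobenius norm of $X_c^{[b]}$ is used to compute the weight. Thus, our objective function can be rewritten as follows:
$$\min_{D, X} \sum_{c=1}^{C} \Big\{ \|Y_c - D\,\mathrm{diag}(w_c) X_c\|_F^2 + \lambda \sum_{i=1}^{N_c} \sum_{b=1}^{B} \|x_i^{[b]}\|_2 \Big\}, \quad w_{bc} = \frac{\|X_c^{[b]}\|_F}{\sum_{j=1}^{B} \|X_c^{[j]}\|_F}. \tag{8}$$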
The weight values can thus be computed directly from the other variables, so the three variables to be optimized are reduced to two. The weight value in (8) is nonnegative and satisfies the former constraint, $w_{bc} \ge 0$, $\sum_{b=1}^{B} w_{bc} = 1$.
3.3. A Discussion about BDL Model and WBDL Model
Compared to block dictionary learning (BDL) model , our weighted block dictionary learning (WBDL) model introduces a weight vector into dictionary learning. The objective function of original block dictionary learning model can be described aswhere is the th block dictionary. The objective function of (9) can be rewritten:
Compared with the objective function of BDL in (10), the WBDL model in (8) removes the weight $w_{bc}$ from the block sparse regularization. When the weight of the $b$th block is larger than the weights of the other blocks, the $b$th block represents the $c$th class samples better than the other blocks, so the sparse coefficients corresponding to this block should be larger than the others. In the BDL model, a large $w_{bc}$ penalizes the coefficient block $x^{[b]}$ more heavily and forces the solution to be small. In our proposed WBDL model, we remove the weight $w_{bc}$ from the regularization, which yields a relative increase of the block sparse coefficients $x^{[b]}$. These increased sparse coefficients improve the discriminative power compared with the original BDL model.
For example, a proto dictionary has 4 dictionary blocks, and each block has 2 atoms. A weight vector corresponding to th class dictionary is . If a sparse coefficient of th class sample is , this sparse coefficient is taken as initialization of our model; then how will our new model modify this initial coefficient? Firstly, according to equation , new sparse coefficient of our WBDL model can be obtained, . Now we observe the new sparse coefficient ; obviously, this sparse coefficient cannot clearly denote representational power of each dictionary block to represent the th class samples. The 2nd block sparse coefficient is very large in the block sparse regularization term. This will be modified at the next iteration, and the 2nd block sparse coefficient will decrease. For example, the final sparse coefficient can be denoted as . In the end, weighted sparse coefficient is used to classify this sample. Obviously, this weighted sparse coefficient is more coherent with weight vector compared with initial sparse coefficient . Weighted sparse coefficient obtained by our model will be more coherent with weight vector, and this will improve classification accuracy in the following classification task. An experiment has been taken to test the two models on a subset of AR database. We selected 60 face images for testing. The sparse coefficients of testing images in BDL model are shown in Figure 2(a) and the weighted sparse coefficients of the same testing images in WBDL model are shown in Figure 2(b). As shown in Figures 2(a) and 2(b), compared to BDL model, weighted sparse coefficients in WBDL model are more compact and discriminant.
3.4. WBDL Classification Model
The weighted sparse coefficient obtained by WBDL model will be used to classify sample. A linear classifier can be learned while we learn a dictionary as D-KSVD algorithm does. Combining classification error of a linear classifier, WBDL classification model can be described as follows:where is label matrix of , in which is the label matrix of all the th class samples. Each column of is a vector to indicate class label of the corresponding sample. For example, , where the position of nonzero element indicates the class label. is the coefficient matrix of linear classifier; the th row of is coefficient vector of the th linear classifier. The first term denotes the reconstruction error of the th class samples, the second term is classification error of the linear classifier, and the third term is block sparse regularization. and are scalars controlling trade-off between reconstruction, discriminant, and sparsity.
Concatenating the first term and the second term, let and ; (11) can be rewritten as follows:
Upon the training of the labeled data, we learn a weight matrix and an extended dictionary , which is the concatenation of a proto dictionary and a linear classifier . However, since and are normalized jointly in the previous learning process, does not support sparse code of a new test sample. As proposed in , proto dictionary and corresponding classifier are normalized as follows:
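The separation and renormalization step can be sketched as follows. The sketch assumes the D-KSVD convention, in which the data and the labels are stacked as $[Y; \sqrt{\gamma} H]$, the extended dictionary is $[D; \sqrt{\gamma} A]$, and both parts are rescaled by the norms of the atoms of $D$; the names `split_and_normalize`, `D_ext`, `n`, and `gamma` are hypothetical.

```python
import numpy as np

def split_and_normalize(D_ext, n, gamma):
    """Split an extended dictionary into (D, A) and rescale both by the atom norms of D."""
    D = D_ext[:n, :]                       # dictionary part (signal dimension n)
    A = D_ext[n:, :] / np.sqrt(gamma)      # classifier part, undoing sqrt(gamma) (assumed stacking)
    norms = np.linalg.norm(D, axis=0)
    norms[norms == 0] = 1.0                # guard against empty atoms
    return D / norms, A / norms

rng = np.random.default_rng(0)
D_ext = rng.standard_normal((7, 5))        # n = 4 signal rows, 3 class rows, 5 atoms
D_n, A_n = split_and_normalize(D_ext, n=4, gamma=1.0)
```

Dividing the classifier columns by the same atom norms keeps the product of classifier and sparse code consistent with the rescaled dictionary.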
For normalized proto dictionary and weighted matrix , we can obtain a sparse code for a new test sample by two coding strategies: local coding and global coding.
When enough training samples are available for each class, a test sample is coded locally over each class dictionary in turn. Taking the local code for the $c$th class as an example, we have the following: Local sparse code:
The final classification of this test sample is based on the following classifier: Local classifier:
The label of test sample is determined by the class label of sparse code which has the smallest reconstruction error.
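The local classification rule can be sketched as below. As an assumption, the block sparse code is computed by proximal gradient descent with block soft-thresholding on $\|y - D_c x\|_2^2 + \lambda \sum_b \|x^{[b]}\|_2$, which stands in for the coding procedure of Section 4; the names `block_sparse_code`, `local_classify`, and the toy dictionary are hypothetical.

```python
import numpy as np

def block_sparse_code(Dc, y, blocks, lam, n_iter=200):
    """Proximal gradient for min_x ||y - Dc x||^2 + lam * sum_b ||x[b]||_2."""
    x = np.zeros(Dc.shape[1])
    L = 2.0 * np.linalg.norm(Dc, 2) ** 2 + 1e-12   # Lipschitz constant of the gradient
    for _ in range(n_iter):
        z = x - 2.0 * Dc.T @ (Dc @ x - y) / L      # gradient step
        for b in blocks:                            # block soft-thresholding
            nrm = np.linalg.norm(z[b])
            z[b] = 0.0 if nrm == 0 else max(0.0, 1.0 - lam / (L * nrm)) * z[b]
        x = z
    return x

def local_classify(D, W, block_sizes, y, lam=0.03):
    """Code y over every class dictionary and pick the smallest reconstruction error."""
    edges = np.cumsum([0] + list(block_sizes))
    blocks = [slice(edges[i], edges[i + 1]) for i in range(len(block_sizes))]
    errors = []
    for c in range(W.shape[1]):
        Dc = D * np.repeat(W[:, c], block_sizes)    # class dictionary D diag(w_c)
        x = block_sparse_code(Dc, y, blocks, lam)
        errors.append(np.linalg.norm(y - Dc @ x))
    return int(np.argmin(errors))

D = np.eye(4)                      # toy proto dictionary: 2 blocks of 2 atoms
W = np.eye(2)                      # class-specific special case of the weights
pred = local_classify(D, W, [2, 2], np.array([1.0, 0.5, 0.0, 0.0]))
```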
When training samples for each class are not enough, a test sample can be globally coded. We define a total weight vector, denoted by , which reflects the total relationship between each block of proto dictionary and all involved classes. A big value of shows that proto dictionary block is important to represent all classes. Global sparse code can be computed as follows: Global sparse code:
Utilizing the former learned linear classifier , the final classification of test sample can be obtained by the following classifier: Global classifier:where is a vector. The label of test sample is determined by the index of the largest element in .
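The global decision rule reduces to a matrix-vector product followed by an arg max. In this minimal sketch, `A` is assumed to be the learned linear classifier with one row per class and `x` an already computed global sparse code; both names and the numeric values are hypothetical.

```python
import numpy as np

def global_classify(A, x):
    """Return the index of the largest classifier response, and all responses."""
    scores = A @ x
    return int(np.argmax(scores)), scores

A = np.array([[1.0, 0.0, 0.2],
              [0.0, 2.0, 0.1]])            # 2 classes, 3 atoms
x = np.array([0.1, 0.9, 0.0])
label, scores = global_classify(A, x)
```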
4. Optimization of WBDL Model
In the objective function of our proposed WBDL model, there are two unknown variables, $D$ and $X$, and a variable $W$ that can be computed from $X$ directly. We adopt alternating optimization to solve this multivariable problem and design a two-step iterative algorithm. First, the weight matrix $W$ is fixed while the coefficient matrix $X$ and the proto dictionary $D$ are learned, which is a general dictionary learning problem. Second, the coefficient matrix $X$ and the proto dictionary $D$ are fixed and the weight matrix $W$ is updated. In this section, we describe the optimization of each step separately and then give the complete algorithm.
4.1. Dictionary Learning
When the weight matrix $W$ is fixed, the coefficient matrix $X$ and the proto dictionary $D$ are learned. First, we fix the proto dictionary $D$ and learn the coefficient matrix $X$; this is block sparse coding. Second, we fix the coefficient matrix $X$ and update the proto dictionary $D$; this is dictionary updating.
4.1.1. Block Sparse Coding
For example, we compute the block sparse coefficient of a th class sample . Firstly, we obtain the th class dictionary, , and then we formulate the minimization of (8) only for the th block of sparse coefficient ; this optimization is similar to that of BGSC method . By fixing , (8) can be written aswhere is the th block of class dictionary and is the term that does not depend on . Computing the gradient of (18) with respect to , we can obtain the following condition:
Assuming , denoting the first two terms by , substituting the semidefinite matrix with its Eigen-decomposition , and multiplying with on both sides, (19) can be formulated intoDenoting new variable , and , we haveSetting , and , we haveWe can compute using Newton’s method. Once is known, we can compute and . Finally, can be obtained, .
When the solution of is not positive, the above assumption does not hold. In this case, the optimality solution is . The proof can be found in .
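The zero-block case can be checked directly from the optimality condition. Assuming the single-block subproblem has the form $f(x) = \|r - A x\|_2^2 + \lambda \|x\|_2$, where $r$ is the residual without the block and $A$ is the corresponding dictionary block (the scaling is an assumption), $x = 0$ is optimal exactly when $2\|A^T r\|_2 \le \lambda$. The function names below are hypothetical.

```python
import numpy as np

def block_objective(A, r, x, lam):
    """Single-block objective ||r - A x||_2^2 + lam * ||x||_2 (assumed scaling)."""
    return float(np.sum((r - A @ x) ** 2) + lam * np.linalg.norm(x))

def block_is_zero(A, r, lam):
    """True if x = 0 minimizes the single-block objective."""
    # Subgradient condition at 0: the gradient -2 A^T r must lie in lam * unit ball.
    return bool(2.0 * np.linalg.norm(A.T @ r) <= lam)

A = np.eye(2)
weak = np.array([0.01, 0.0])       # residual weakly correlated with the block
strong = np.array([1.0, 0.0])      # residual strongly correlated with the block
zero_for_weak = block_is_zero(A, weak, lam=0.03)
zero_for_strong = block_is_zero(A, strong, lam=0.03)
```

This test is cheap, so it can be run before any iterative solve: blocks that pass it can be set to zero immediately.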
4.1.2. Dictionary Updating
Let denote weighted sparse coefficients; we fix and update proto dictionary ; the objective function subject to is as follows:
We can minimize the objective function by Lagrange dual method .
4.2. Learning of Weight Matrix
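We find that the sparse coefficients inherit the weight information of the class dictionary. Motivated by this observation and the details described in Section 3.2, the weight matrix can be constructed as follows:
$$w_{bc} = \frac{\|X_c^{[b]}\|_F}{\sum_{j=1}^{B} \|X_c^{[j]}\|_F}, \tag{24}$$
where $X_c^{[b]}$ is the $b$th block of the matrix $X_c$, and $X_c$ is the sparse code matrix of the $c$th class samples. The adopted norm is the Frobenius norm. $w_{bc}$ satisfies the following conditions:
$$w_{bc} \ge 0, \qquad \sum_{b=1}^{B} w_{bc} = 1. \tag{25}$$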
In the process of block sparse coding, sparse coefficient is computed one by one. When learning weight vector, each weight vector is computed from all the sparse coefficients of the same class. A weight vector is reflected by all the sparse coefficients of the same class. As a result, the computation of weight vector has not been integrated with block sparse coding. Weight vector is computed after all sparse coefficients have been obtained.
In our proposed WBDL model, rather than assigning each proto dictionary block to only one class, we assign weight values that indicate its relationship to all class dictionaries. The weight matrix therefore preserves more class-label information. The construction of the weight matrix adapts to the block sparse coefficients and does not significantly increase the computational complexity.
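The weight construction can be sketched as a normalization of block-wise Frobenius norms, so that the weights of a class are nonnegative and sum to 1. The function name `weight_vector`, the argument `block_sizes`, and the uniform fallback for an all-zero code matrix are assumptions made for illustration.

```python
import numpy as np

def weight_vector(Xc, block_sizes):
    """Weights of one class: normalized Frobenius norms of the coefficient blocks."""
    edges = np.cumsum([0] + list(block_sizes))
    norms = np.array([np.linalg.norm(Xc[edges[b]:edges[b + 1], :])
                      for b in range(len(block_sizes))])
    total = norms.sum()
    if total == 0:
        # Assumed fallback: uniform weights when all coefficients are zero.
        return np.full(len(block_sizes), 1.0 / len(block_sizes))
    return norms / total

# Sparse codes of one class: 2 blocks of 2 atoms, 2 samples.
Xc = np.array([[3.0, 0.0],
               [0.0, 4.0],     # block 1 has Frobenius norm 5
               [1.5, 0.0],
               [0.0, 2.0]])    # block 2 has Frobenius norm 2.5
w = weight_vector(Xc, [2, 2])
```

Because the weights depend only on norms that are already available after sparse coding, this step adds one pass over the coefficient matrix per class.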
WBDL algorithm and its two classification algorithms, local classification algorithm and global classification algorithm, are described as follows.
Algorithm 1 (weighted block dictionary learning algorithm).
Input. A training sample set $Y$ and the class label of each sample.
Output. Proto dictionary $D$ and weight matrix $W$.
Step 1. Initialize $W$ to an all-one matrix.
Step 2. Dictionary learning.
Repeat
Block sparse coding: compute the sparse coefficients by minimizing (18) while fixing the corresponding class dictionary.
Dictionary updating: update the proto dictionary by minimizing (23) while fixing the weighted sparse coefficients.
Until convergence.
Step 3. Construct the weight matrix by the definition in (24).
Step 4. Return to Step 2 until the values of the objective function in (8) are close enough or the maximum number of iterations is reached. Output $D$ and $W$.
Algorithm 2 (WBDL local classification algorithm).
Input. A training sample set and its class label , , a test sample .
Output. Proto dictionary , weight matrix , and the classification result of test sample .
Step 1. Learn proto dictionary and weight matrix using WBDL algorithm (Algorithm 1).
Step 2. For a test sample , compute local code , , using (14).
Step 3. Compute the label of using (15).
Algorithm 3 (WBDL global classification algorithm).
Input. A training sample set and its class label , , a test sample .
Output. Proto dictionary , weight matrix , classifier , and the classification result of test sample .
Step 1. Generate label matrix .
Step 2. Generate new data .
Step 3. Learn extended dictionary and using WBDL algorithm (Algorithm 1).
Step 4. Separate extended dictionary into and ; normalize and by (13).
Step 5. Compute total weight vector as .
Step 6. For a test sample , compute global code by (16).
Step 7. Compute the label of using (17).
5. Experimental Results
In this section, the WBDL algorithm is evaluated on three tasks: a simulation experiment, face recognition, and object recognition. In the simulation experiment, we compared Fisher values when the block structure and the weight matrix were introduced separately. For face recognition, we experimented on two face databases: AR and Extended Yale B. For object recognition, the Caltech101 database was adopted for evaluation. In all experiments, randomly selected samples from the same class were used to initialize the proto dictionary, and an all-one matrix was used to initialize the weight matrix. For the global classification algorithm, the scalar controlling the discriminant term was set to 1.
5.1. Simulation Experiment
Compared with a general dictionary learning algorithm, the WBDL model introduces a block structure and a weight matrix. In this section, we measure the discrimination of this model by the Fisher criterion, which is the ratio of the between-class variance to the within-class variance. The Fisher criterion is computed from the mean of the sparse codes of each class, the number of samples in each class, and the sparse code of each sample. A larger Fisher value means a better classification result. We used 52 images of 2 randomly chosen persons in the AR database for this simulation. For each person, we randomly selected 20 images for training and used the remaining 6 images for testing. We used the same parameters for all four methods: D-KSVD, WDL, BDL, and WBDL. WDL is the algorithm that introduces only the weight vector, and BDL is the algorithm that introduces only the block structure. WBDL introduces the weight vector and the block structure simultaneously. The obtained Fisher values are listed in Table 1. The results show that both the weight vector and the block structure improve the discriminant performance of dictionary learning, and that the WBDL algorithm, which combines both, is more discriminative. In particular, the WBDL local classification algorithm yields a higher Fisher value than the WBDL global classification algorithm. This is because the local classification algorithm takes full advantage of the weight vector, whereas the global classification algorithm does not; the global classification algorithm can be regarded as a simplification of the local classification algorithm.
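A Fisher value of this kind can be computed as sketched below. The exact normalization used in the experiments is not recoverable here, so the sketch assumes one common form: the between-class scatter $\sum_c N_c \|m_c - m\|_2^2$ divided by the within-class scatter $\sum_c \sum_i \|x_i^c - m_c\|_2^2$, with $m$ the overall mean. The function name `fisher_value` is hypothetical.

```python
import numpy as np

def fisher_value(X, labels):
    """Ratio of between-class to within-class scatter of sparse codes (columns of X)."""
    m = X.mean(axis=1, keepdims=True)              # overall mean code
    s_b = s_w = 0.0
    for c in np.unique(labels):
        Xc = X[:, labels == c]
        mc = Xc.mean(axis=1, keepdims=True)        # class mean code
        s_b += Xc.shape[1] * float(np.sum((mc - m) ** 2))
        s_w += float(np.sum((Xc - mc) ** 2))
    return s_b / s_w if s_w > 0 else float("inf")

# One-dimensional toy codes: class 0 around 1, class 1 around 11.
X = np.array([[0.0, 2.0, 10.0, 12.0]])
labels = np.array([0, 0, 1, 1])
value = fisher_value(X, labels)
```

In this toy case the between-class scatter is 100 and the within-class scatter is 4, so the ratio is 25; tighter classes or more distant class means both increase it.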
5.2. Face Recognition on AR
Face recognition has been a popular application of computer vision and pattern recognition in recent years. In this section, the WBDL algorithm is evaluated on a face recognition task using the AR face database. Figure 3 shows 10 face images of two different subjects. Images in the AR database exhibit many facial variations, including expression, illumination, and facial disguises (sunglasses and scarves). The AR database consists of over 4,000 color images of 126 persons, with 26 face images per person. A subset of 2,600 images from 50 male and 50 female subjects was used in this experiment. For each person, we randomly selected 20 images for training and used the remaining 6 images for testing. The final result is the average over six such random splits of training and testing images.
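The splitting protocol above can be sketched as follows. This is a minimal illustration, assuming integer subject labels and one fixed seed per split; the function name split_per_class and the seeds are hypothetical and not taken from the original experiments.

```python
import random


def split_per_class(labels, n_train, seed):
    """Return (train_idx, test_idx), drawing n_train samples per class at random."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        shuffled = indices[:]
        rng.shuffle(shuffled)
        train_idx.extend(shuffled[:n_train])   # first n_train go to training
        test_idx.extend(shuffled[n_train:])    # the rest go to testing
    return sorted(train_idx), sorted(test_idx)


# AR subset: 100 subjects with 26 images each, 20 for training and 6 for testing.
labels = [subject for subject in range(100) for _ in range(26)]

# Six random splits; accuracy would be computed on each and then averaged.
splits = [split_per_class(labels, n_train=20, seed=seed) for seed in range(6)]
train_idx, test_idx = splits[0]
print(len(train_idx), len(test_idx))
```

Fixing a seed per split keeps the six partitions reproducible, so that every compared method can be trained and tested on identical data.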
In all experiments, each AR face image was projected into a feature vector with Randomface. The learned proto dictionary had 500 atoms, with 5 atoms per block. The regularization parameter was set to 0.03. The WBDL algorithm is compared with several recently proposed algorithms, including SRC, K-SVD, D-KSVD, and LC-KSVD. Recognition results are summarized in Table 2. As shown in Table 2, both the WBDL global classification algorithm and the WBDL local classification algorithm outperform the competing methods. In these experiments the dictionary learning process is the same for both variants; the local classification algorithm yields higher accuracy than the global one. Generally speaking, when there are adequate training samples, the WBDL local classification algorithm is more competitive than the global one because it makes full use of the weight vector.
In addition, with the block structure and weight vector introduced, we recorded the decrease of the objective function value in (8) as the number of iterations varied. Figure 4 displays the objective function value for different numbers of iterations. As shown in Figure 4, after 10 iterations the objective function value decreases only very slowly, which indicates that the WBDL algorithm converges quickly.
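For readers unfamiliar with Randomface, the projection can be sketched as a random linear map: a matrix of zero-mean Gaussian entries whose rows are scaled to unit length, applied to the flattened image. The sizes below (a 1024-value image projected to 64 values) and the function names are hypothetical, since the feature dimension used in the experiments is not restated here.

```python
import math
import random


def randomface_matrix(out_dim, in_dim, seed=0):
    """Gaussian random projection matrix whose rows are scaled to unit length."""
    rng = random.Random(seed)
    rows = []
    for _ in range(out_dim):
        row = [rng.gauss(0.0, 1.0) for _ in range(in_dim)]
        norm = math.sqrt(sum(v * v for v in row))
        rows.append([v / norm for v in row])
    return rows


def project(matrix, image_vector):
    """Project a flattened image onto the rows of the random matrix."""
    return [sum(w * x for w, x in zip(row, image_vector)) for row in matrix]


# Hypothetical sizes: a 32x32 image flattened to 1024 values, projected to 64.
projection = randomface_matrix(out_dim=64, in_dim=1024, seed=0)
pixel_rng = random.Random(1)
image = [pixel_rng.random() for _ in range(1024)]  # stand-in for real pixel data
feature = project(projection, image)
print(len(feature))
```

The same projection matrix must be reused for every training and test image so that all features live in the same space.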
5.3. Face Recognition on Extended Yale B
In this section, we compare the WBDL algorithm with existing dictionary learning methods on the Extended Yale B face database. The Extended Yale B database consists of 2,432 cropped frontal face images of 38 individuals, with 64 face images per person captured under various lighting conditions. Figure 5 shows 12 face images of two different subjects. The key challenge of this database is the variation in illumination and expression. Since the dimension of the original face images is large, we reduced it using Randomface. To compare the proposed algorithm with other methods, in all experiments we randomly chose half of the images of each subject for training and used the rest for testing. For simplicity of analysis, we learned 38 dictionary blocks. Assuming that all proto dictionary blocks have the same number of atoms, we varied the number of atoms per block. The regularization parameter was set to 0.06, 0.07, 0.07, and 0.08 for the respective block sizes. Test samples were coded globally and locally, separately. Experiments were repeated 6 times with random splits of training and testing data; the average classification rate over all trials was taken as the final result.
The proposed WBDL algorithm is compared with several recently proposed algorithms, including SRC, K-SVD, D-KSVD, LC-KSVD, Pl2/l1, and SVGDL. Recognition results are presented in Figure 6. As the results show, the WBDL algorithm always outperforms the other methods, especially when the dictionary size is small. When the dictionary size is larger, for example, with a block size of 32, the classification accuracies of the learned dictionaries (K-SVD, D-KSVD, LC-KSVD, and SVGDL) do not exceed those of the constructed dictionaries (SRC, Pl2/l1), whereas the accuracies of our two WBDL classification algorithms exceed that of SRC by 3% and 3.6%, respectively.
5.4. Object Classification on Caltech101
The Caltech101 database contains 101 object classes and a “background” class, with high shape variability. The number of images per category varies from 31 to 800. Most images are of medium resolution. Figure 7 shows 15 images from the Caltech101 database, taken from 15 different categories.
First, we extracted SIFT descriptors from patches that were densely sampled on a grid with a step size of 6 pixels. Second, we extracted spatial pyramid features based on the SIFT features, using three grid sizes for the spatial subregions of the pyramid. Third, the features were pooled together to form pooled features. Max pooling and normalization were used, as they have been reported to be superior to other pooling and normalization methods. Fourth, we trained the codebook for the spatial pyramid features using standard k-means clustering; the spatial pyramid features were then reduced to 3000 dimensions by PCA. Finally, we trained the class dictionary and learned the classifier on the final spatial pyramid features using the WBDL algorithm.
Following the common experimental settings, we trained on 5, 10, 15, 20, 25, and 30 samples per category and tested on the rest, and the test samples were coded globally and locally, separately. We repeated the experiments 6 times with different random splits of training and testing images; the results averaged over the runs were reported as the final recognition rates. The sparsity-controlling parameter used in all experiments is 0.06. The results compared with the popular ScSPM, SRC, K-SVD, D-KSVD, LC-KSVD, FDDL, and SVGDL algorithms are listed in Table 3. As shown in Table 3, the WBDL algorithm achieves the highest classification accuracy when trained on 5, 10, 20, 25, and 30 samples per category, while the SVGDL algorithm achieves the highest accuracy when 15 samples per category are used to train the dictionary. Overall, WBDL is the most accurate method in five of the six settings.
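The pooling step of the feature pipeline described above can be illustrated with a minimal sketch. It assumes plain element-wise max pooling over the codes in one spatial region, followed by scaling to unit Euclidean length; the choice of Euclidean normalization and the toy values are assumptions, since the text does not state the norm here.

```python
import math


def max_pool(codes):
    """Element-wise maximum over the code vectors of one spatial region."""
    return [max(column) for column in zip(*codes)]


def l2_normalize(vector):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm > 0 else list(vector)


# Two hypothetical code vectors falling in the same spatial region.
region_codes = [[0.0, 3.0, 0.0], [4.0, 0.0, 0.0]]
pooled = max_pool(region_codes)
feature = l2_normalize(pooled)
print(pooled)
```

In a spatial pyramid, this pooling would be applied to every subregion at every grid level, and the resulting vectors concatenated before the PCA step.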
We also compared classification accuracy with SRC, K-SVD, D-KSVD, LC-KSVD, and SVGDL using different dictionary sizes K = 510, 1020, 1530, 2040, 2550, and 3060, with 30 randomly selected images per category as training data. As shown in Figure 8, the WBDL algorithm achieves the highest classification accuracy at every dictionary size, including in comparison with SVGDL. The results in Table 3 show that classification accuracy increases with the number of training samples, and the results in Figure 8 show that it increases with the dictionary size. The WBDL algorithm is more sensitive to the size of the dictionary block, whereas the SVGDL algorithm is more sensitive to the number of training samples. Compared with the other dictionary learning algorithms, both WBDL and SVGDL improve classification accuracy when the dictionary size is small, for example, 510.
In this paper, a weighted block dictionary learning algorithm is proposed as a compromise between the shared dictionary and the class-specific dictionary. The WBDL dictionary is the product of the proto dictionary and the corresponding weight vector: the proto dictionary is a shared dictionary, and the weighted proto dictionary is a class-specific dictionary. The WBDL method reduces redundancy, enhances discriminative ability, and helps to explore the intrinsic structure of the dictionary. The experimental results on three databases demonstrate that the WBDL algorithm achieves higher classification accuracies than the compared methods. In particular, the proposed algorithm is more discriminative than the other dictionary learning algorithms when the dictionary size is small. Because WBDL exploits the weight vector, the dictionary it learns is more discriminative and compact.
The author of this paper declares that there is no conflict of interests regarding the publication of this paper.
This research was supported by the National Natural Science Fund of China (Grant nos. 60632050, 9082004, and 61202318), the National 863 Project (Grant no. 2006AA04Z238), and the Basic Key Technology Project of the Ministry of Industry and Information Technology of China (Grant no. E0310/1112/JC01).
I. Ramirez, P. Sprechmann, and G. Sapiro, “Classification and clustering via dictionary learning with structured incoherence and shared features,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '10), pp. 3501–3508, June 2010.
J. Mairal, F. Bach, J. Ponce, G. Sapiro, and A. Zisserman, “Discriminative learned dictionaries for local image analysis,” in Proceedings of the 26th IEEE Conference on Computer Vision and Pattern Recognition (CVPR '08), pp. 1–8, Anchorage, Alaska, USA, June 2008.
A. M. Martinez and R. Benavente, “The AR face database,” CVC Technical Report 24, The Ohio State University, 1998.
S. Bengio, F. Pereira, Y. Singer, and D. Strelow, “Group sparse coding,” in Proceedings of the 23rd Annual Conference on Neural Information Processing Systems (NIPS '09), pp. 82–89, December 2009.
J. Mairal, J. Ponce, G. Sapiro, A. Zisserman, and F. R. Bach, “Supervised dictionary learning,” in Advances in Neural Information Processing Systems, pp. 1033–1040, 2009.
S. Kong and D. Wang, “A dictionary learning approach for classification: separating the particularity and the commonality,” in Computer Vision – ECCV 2012, pp. 186–199, Springer, 2012.
L. Shen, S. Wang, G. Sun, S. Jiang, and Q. Huang, “Multi-level discriminative dictionary learning towards hierarchical visual categorization,” in Proceedings of the 26th IEEE Conference on Computer Vision and Pattern Recognition (CVPR '13), pp. 383–390, Portland, Ore, USA, June 2013.
S. Cai, W. Zuo, L. Zhang, X. Feng, and P. Wang, “Support vector guided dictionary learning,” in Computer Vision – ECCV 2014, pp. 624–639, Springer, 2014.
Y. C. Pati, R. Rezaiifar, and P. Krishnaprasad, “Orthogonal matching pursuit: recursive function approximation with applications to wavelet decomposition,” in Proceedings of the Conference Record of The Twenty-Seventh Asilomar Conference on Signals, Systems and Computers, pp. 40–44, 1993.
H. Lee, A. Battle, R. Raina, and A. Y. Ng, “Efficient sparse coding algorithms,” in Proceedings of the 20th Annual Conference on Neural Information Processing Systems (NIPS '06), pp. 801–808, December 2006.
D. G. Lowe, “Object recognition from local scale-invariant features,” in Proceedings of the 7th IEEE International Conference on Computer Vision (ICCV '99), pp. 1150–1157, Kerkyra, Greece, September 1999.
J. Yang, K. Yu, Y. Gong, and T. Huang, “Linear spatial pyramid matching using sparse coding for image classification,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '09), pp. 1794–1801, 2009.