Low-Rank Affinity Based Local-Driven Multilabel Propagation

Li, Teng; Cheng, Bin; Wu, Xinyu; Wu, Jun

DOI: https://doi.org/10.1155/2013/323481

Mathematical Problems in Engineering


Special Issue

Stochastic Systems: Modeling, Optimization, and Applications


Research Article | Open Access

Volume 2013 | Article ID 323481 | https://doi.org/10.1155/2013/323481


Academic Editor: Shuping He

Received 21 Oct 2013

Accepted 29 Nov 2013

Published 21 Dec 2013

Abstract

This paper presents a novel low-rank affinity based local-driven algorithm to robustly propagate multiple labels from training images to test images. A graph is constructed over the segmented local image regions. The labels of vertices from the training data are derived from the context among different training images, and the derived vertex labels are propagated to the unlabeled vertices via the graph. The multitask low-rank affinity, which jointly seeks sparsity-consistent low-rank affinities from multiple feature matrices, is applied to compute the edge weights between graph vertices. The inference of the multitask low-rank affinity is formulated as a constrained nuclear norm and $\ell_{2,1}$-norm minimization problem, which is solved efficiently with the augmented Lagrange multiplier method. Based on the learned local patch labels, the multilabels of the test images can be predicted. Experiments on multilabel image annotation demonstrate encouraging results for the proposed framework.

1. Introduction

Graph-based label propagation is an important methodology in machine learning and has been widely adopted in classification tasks such as image annotation [1–3]. It can effectively leverage unlabeled data in addition to labeled data, and it thereby alleviates the lack of sufficient labeled data in many real applications. Conventional graph-based label propagation mainly focuses on the case of a single label per datum and models the semantic relations between images by global feature matching: each image is a vertex linked with other images in the graph. However, real-world online images are commonly associated with multiple labels, each corresponding to a local region, so global matching based methods are not well suited to multilabel propagation and image annotation. Recently, local matching based methods have been widely adopted and have outperformed global matching based methods in several classification tasks [4]. For graph-based multilabel propagation, [5] proposed constructing the graph from local feature matching, which is expected to model multilabel semantics more accurately than conventional global matching based methods.

In this work, we follow the local matching approach of [5] to explore graph-based multilabel propagation. A critical step in graph-based label propagation is graph construction. Conventional graph construction methods include the $k$-nearest-neighbor method and the $\varepsilon$-ball based method, in which the samples within the surrounding ball of each datum are connected. Various approaches are then applied to determine the edge weights, for example, binary weighting, the Gaussian kernel [6], and $\ell_2$-reconstruction [7]. In [5] the $\varepsilon$-ball based method is adopted to construct the local feature graph. However, recent studies [8–10] reveal that graphs constructed from sparse representation (SR) and low-rank representation (LRR) have several desirable properties, such as robustness to noise, sparsity, and datum-adaptive neighborhoods. The corresponding $\ell_1$-graph and low-rank graph have demonstrated their advantages in several real applications [10–12]. Compared with SR, LRR is better at capturing the global structure of data and at utilizing the overall contextual information.

Motivated by the recent successes of LRR, we propose a novel low-rank affinity based local-driven multilabel propagation approach for image annotation. Figure 1 illustrates the proposed scheme. Local features extracted from the images serve as the vertices of a graph, and the graph edges are computed according to an image matching criterion, here the low-rank affinity. Similar to [5], the labels of the training vertices are derived from the context of matching images with multiple labels, and these labels are propagated to the vertices in the test images via the graph. Each test image is then assigned multiple labels according to the labels of the local features it contains. To combine the matching information of multiple features, the affinity matrices for the different features are inferred jointly through a multitask low-rank affinity pursuit method, which is formulated as an optimization problem. The resulting affinity matrices not only preserve the low-rank property but are also forced to be sparsity consistent. The graph edge weights between local patches are then computed from these affinity matrices.

Several methods address the correlations among multiple labels, for example, by adding a regularizer to a semisupervised learning framework [13, 14]. These works show that multilabel correlations are important information to exploit. Sparsity-based methods have also been explored in other applications such as image tagging [15], and their promising results demonstrate the effectiveness of sparse modeling.

The proposed method has two properties that distinguish it from previous work. First, the multilabel propagation is based on local feature matching, which is semantically more accurate than conventional global matching based methods. Second, the graph is constructed from the low-rank affinity over all local features, which is expected to bring advantages such as robustness to noise and sparsity. In the experiments we demonstrate promising results on two multilabel image datasets, namely subsets of MSRC and Corel.

The rest of this paper is organized as follows. Section 2 presents the problem formulation, with the low-rank affinity based graph construction detailed in Section 2.1 and the label propagation in Section 2.2. Section 3 reports the experimental results, followed by conclusions in Section 4.

2. Problem Formulation

The local regions, that is, superpixels, are obtained from the images with the oversegmentation algorithm of [16]. Each image is partitioned into subregions, and multiple types of features are extracted from these subregions for later use. In the multilabel case, an image is associated with several classes, but each oversegmented local region is assumed to belong to exactly one of these labels. Compared with global matching based propagation, propagating the labels over a graph of local regions models the semantic relations more accurately and propagates the labels more effectively.

Let $X = [x_1, x_2, \ldots, x_N] \in \mathbb{R}^{d \times N}$ be the set of vertices in the graph, each column of which is a local feature vector $x_i \in \mathbb{R}^{d}$. The first $l$ points belong to the training set and have labels derived from the image-level labels. Their labels are denoted as $Y_l = [y_1, \ldots, y_l]$, and the task is converted to labeling the remaining vertices $x_{l+1}, \ldots, x_N$. The graph to be constructed is denoted by $G = (V, W)$. Its vertex set is $V = V_l \cup V_u$, where $V_l$ corresponds to $\{x_1, \ldots, x_l\}$ and $V_u$ corresponds to $\{x_{l+1}, \ldots, x_N\}$. The edges are weighted by the affinity matrix $W \in \mathbb{R}^{N \times N}$, with $W_{jk}$ indicating the similarity between $x_j$ and $x_k$. The predicted labels of $V_u$ are denoted as $F_u = [f_{l+1}, \ldots, f_N]$. Since multiple types of visual features are combined when matching local regions, the $i$th type of features is written as a matrix $X_i \in \mathbb{R}^{d_i \times N}$, $i = 1, \ldots, K$.

The labels of the training vertices are derived from the image-level labels using the matching image context. The multilabels of the test images are then obtained from the predicted vertex labels in a similar way to [5]. Specifically, the multilabel of each test image is obtained by summing and normalizing the predicted labels of all local vertices it contains. Apart from these steps, the proposed approach has two main steps: graph construction and label propagation. In the graph construction step, the low-rank affinity over the local features is derived and used to weight the graph edges.

2.1. Low-Rank Affinity Based Graph Construction

LRR based modeling is chosen for graph construction owing to its effectiveness and robustness. Consider a matrix $X = [x_1, \ldots, x_N]$ in which each $x_i$ represents the $i$th local feature. The affinities among the local features are computed by solving the following LRR problem:

$$\min_{Z, E} \; \|Z\|_* + \lambda \|E\|_{2,1} \quad \text{s.t.} \quad X = XZ + E, \qquad (1)$$

Here $\|\cdot\|_*$ denotes the nuclear norm, also known as the trace norm or the Ky Fan norm, which is the sum of the singular values. The term $\|\cdot\|_{2,1}$ is the $\ell_{2,1}$-norm [10, 17], that is, the sum of the $\ell_2$-norms of the columns, which is used to characterize noise. The parameter $\lambda > 0$ balances the two terms. The optimal solution $Z^*$ to problem (1) naturally forms an affinity matrix that represents the pairwise similarities among the local feature vectors. Namely, the affinity between two local features $x_j$ and $x_k$ can be computed as $|(Z^*)_{jk}| + |(Z^*)_{kj}|$, where $(\cdot)_{jk}$ denotes the $(j,k)$th element of a matrix. The local matching graph can therefore be formed accordingly.
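Problem (1) can be solved with the same inexact ALM strategy that Section 2.1.1 applies to the multitask case. The following is a minimal NumPy sketch of the single-feature case, assuming the standard splitting $Z = J$. The function names (`svt`, `col_shrink`, `lrr`, `lrr_affinity`) are hypothetical. All parameter values are illustrative assumptions rather than the authors' settings, including $\lambda$, $\rho$, the initial $\mu$, $\mu_{\max}$ and the tolerance. The affinity is symmetrized as $|(Z^*)_{jk}| + |(Z^*)_{kj}|$, following the usual LRR convention.

```python
import numpy as np


def svt(M, tau):
    """Singular value thresholding: argmin_J tau*||J||_* + 0.5*||J - M||_F^2."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt


def col_shrink(M, tau):
    """Column shrinkage: argmin_E tau*||E||_{2,1} + 0.5*||E - M||_F^2."""
    norms = np.linalg.norm(M, axis=0)
    return M * (np.maximum(norms - tau, 0.0) / np.maximum(norms, 1e-12))


def lrr(X, lam=1.0, rho=1.1, mu=1e-2, mu_max=1e6, tol=1e-6, max_iter=1000):
    """Inexact ALM for min ||Z||_* + lam*||E||_{2,1} s.t. X = XZ + E, using J = Z."""
    d, n = X.shape
    Z = np.zeros((n, n))
    E = np.zeros((d, n))
    Y1 = np.zeros((d, n))  # multiplier for X - XZ - E = 0
    Y2 = np.zeros((n, n))  # multiplier for Z - J = 0
    XtX = X.T @ X
    A = np.eye(n) + XtX    # system matrix of the Z-subproblem
    for _ in range(max_iter):
        J = svt(Z + Y2 / mu, 1.0 / mu)
        Z = np.linalg.solve(A, XtX - X.T @ E + J + (X.T @ Y1 - Y2) / mu)
        E = col_shrink(X - X @ Z + Y1 / mu, lam / mu)
        R1 = X - X @ Z - E
        R2 = Z - J
        Y1 += mu * R1
        Y2 += mu * R2
        mu = min(rho * mu, mu_max)
        if max(np.abs(R1).max(), np.abs(R2).max()) < tol:
            break
    return Z, E


def lrr_affinity(Z):
    """Symmetric, nonnegative affinity |Z_jk| + |Z_kj|."""
    return np.abs(Z) + np.abs(Z.T)
```

Consider noise-free features drawn from independent subspaces, with $\lambda$ large enough that $E$ stays at zero. In that case the recovered $Z$ is close to block diagonal, so the affinity mostly links features from the same subspace.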

The LRR formulation above addresses the single-feature case, in which each local region is represented by one type of visual feature. Different types of visual features can usually be combined to yield more accurate matching and better performance. An intuitive way to integrate multiple features is to combine the affinity matrices inferred individually by LRR, for example, by simply adding them together. However, inferring each affinity matrix individually does not exploit the cross-feature information, which is crucial for producing accurate and reliable results.

To fuse multiple features effectively, a multitask low-rank affinity pursuit method is used here. It jointly infers a collection of affinity matrices $Z_1, Z_2, \ldots, Z_K$, where each $Z_i \in \mathbb{R}^{N \times N}$ corresponds to the $i$th feature matrix $X_i$. The joint inference is formulated as an optimization problem with two considerations. First, the affinity matrices should be of low rank, so that they inherit the advantages of LRR in the single-feature case. Second, the affinity matrices are forced to be sparsity consistent, so that the cross-feature information is used effectively. Combining both aspects, the affinity matrices $Z_1, \ldots, Z_K$ for $K$ types of features are inferred by solving the following convex optimization problem:

$$\min_{Z_i, E_i} \; \sum_{i=1}^{K} \left( \|Z_i\|_* + \lambda \|E_i\|_{2,1} \right) + \delta \|Z\|_{2,1} \quad \text{s.t.} \quad X_i = X_i Z_i + E_i, \; i = 1, \ldots, K, \qquad (2)$$

Here $\delta > 0$ is a parameter, and the matrix $Z \in \mathbb{R}^{K \times N^2}$ is formed by concatenating $Z_1, Z_2, \ldots, Z_K$ as follows:

$$Z = \begin{bmatrix} (Z_1)_{11} & (Z_1)_{12} & \cdots & (Z_1)_{NN} \\ (Z_2)_{11} & (Z_2)_{12} & \cdots & (Z_2)_{NN} \\ \vdots & \vdots & \ddots & \vdots \\ (Z_K)_{11} & (Z_K)_{12} & \cdots & (Z_K)_{NN} \end{bmatrix}, \qquad (3)$$

Each column of $Z$ therefore collects the same entry $(j,k)$ of all $K$ affinity matrices. The first term of (2) is simply the sum of the LRR objectives for the individual features. Optimizing this term alone leads to a "trivial" solution, equivalent to applying LRR to each feature matrix separately. The second term of (2) is the $\ell_{2,1}$-norm regularization on $Z$, and it plays the key role in multitask low-rank affinity pursuit. It forces the affinities $(Z_1)_{jk}, (Z_2)_{jk}, \ldots, (Z_K)_{jk}$ to have consistent magnitudes, either all large or all small. That is, the fusion of multiple features is "seamlessly" performed by minimizing the $\ell_{2,1}$-norm of $Z$.
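To make the role of the concatenated matrix concrete, the NumPy sketch below builds $Z$ from a list of affinity matrices. It then evaluates the multitask objective, eliminating $E_i$ through the constraint $E_i = X_i - X_i Z_i$. The helper names are hypothetical. The row-major flattening is an arbitrary but fixed choice; any ordering works as long as every $Z_i$ is flattened the same way. The small example shows why the $\ell_{2,1}$ term rewards sparsity consistency. Two affinity matrices that share their nonzero position cost less than two whose nonzero entries sit at different positions.

```python
import numpy as np


def stack_affinities(Zs):
    """Form the K x N^2 matrix Z: row i is Z_i flattened, so column c
    gathers the same (j, k) entry from every Z_i."""
    return np.vstack([Zi.reshape(1, -1) for Zi in Zs])


def l21(M):
    """l_{2,1}-norm: sum of the Euclidean norms of the columns of M."""
    return float(np.linalg.norm(M, axis=0).sum())


def mlap_objective(Xs, Zs, lam, delta):
    """Multitask objective for feasible (Z_i, E_i) with E_i = X_i - X_i Z_i."""
    total = 0.0
    for X, Z in zip(Xs, Zs):
        E = X - X @ Z
        total += np.linalg.norm(Z, ord="nuc") + lam * l21(E)
    return total + delta * l21(stack_affinities(Zs))


A = np.array([[1.0, 0.0], [0.0, 0.0]])  # nonzero at (0, 0)
B = np.array([[0.0, 0.0], [0.0, 1.0]])  # nonzero at (1, 1)
consistent = l21(stack_affinities([A, A]))    # sqrt(2): one shared column
inconsistent = l21(stack_affinities([A, B]))  # 2.0: two separate columns
```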

2.1.1. Optimization Procedure

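Problem (2) is convex and can be solved in polynomial time. We first convert it into the following equivalent problem:

$$\min_{Z_i, E_i, J_i, S_i} \; \sum_{i=1}^{K} \left( \|J_i\|_* + \lambda \|E_i\|_{2,1} \right) + \delta \|S\|_{2,1} \quad \text{s.t.} \quad X_i = X_i Z_i + E_i, \; Z_i = J_i, \; Z_i = S_i, \; i = 1, \ldots, K, \qquad (4)$$

Here $S$ is formed from $S_1, \ldots, S_K$ in the same way as $Z$ is formed from $Z_1, \ldots, Z_K$. This problem can be solved with the augmented Lagrange multiplier (ALM) method [18], which minimizes the augmented Lagrange function

$$\mathcal{L} = \sum_{i=1}^{K} \left( \|J_i\|_* + \lambda \|E_i\|_{2,1} \right) + \delta \|S\|_{2,1} + \sum_{i=1}^{K} \left( \langle Y_i, X_i - X_i Z_i - E_i \rangle + \langle U_i, Z_i - J_i \rangle + \langle V_i, Z_i - S_i \rangle \right) + \frac{\mu}{2} \sum_{i=1}^{K} \left( \|X_i - X_i Z_i - E_i\|_F^2 + \|Z_i - J_i\|_F^2 + \|Z_i - S_i\|_F^2 \right), \qquad (5)$$

In (5), $\langle A, B \rangle = \operatorname{tr}(A^{T} B)$. The matrices $Y_i$, $U_i$, and $V_i$ ($i = 1, \ldots, K$) are Lagrange multipliers, and $\mu > 0$ is a penalty parameter. The inexact ALM method [18], also called the alternating direction method (ADM), is outlined in Algorithm 1. The subproblems of the algorithm are convex and all have closed-form solutions. Step 1 is solved via the singular value thresholding operator [19], while Steps 3 and 4 are solved via [10, Lemma 3.2].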
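The two closed-form operators mentioned above can each be written in a few lines of NumPy. This sketch assumes their standard forms. Singular value thresholding [19] soft-thresholds the singular values. The $\ell_{2,1}$ proximal operator, the column-shrinkage result cited as [10, Lemma 3.2], scales each column toward zero according to its norm. The function names are hypothetical.

```python
import numpy as np


def svt(M, tau):
    """Solve argmin_J tau*||J||_* + 0.5*||J - M||_F^2 by shrinking singular values."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt


def col_shrink(M, tau):
    """Solve argmin_E tau*||E||_{2,1} + 0.5*||E - M||_F^2 column by column.

    A column with norm <= tau becomes zero; otherwise it is scaled by
    (norm - tau) / norm, which keeps its direction."""
    norms = np.linalg.norm(M, axis=0)
    scale = np.maximum(norms - tau, 0.0) / np.maximum(norms, 1e-12)
    return M * scale


J = svt(np.diag([3.0, 1.0]), 2.0)              # singular values 3, 1 become 1, 0
E = col_shrink(np.array([[3.0], [4.0]]), 1.0)  # column norm 5 becomes 4
```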

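Algorithm 1: Solving problem (2) by inexact ALM.
Inputs: Data matrices $\{X_i\}_{i=1}^{K}$, parameters $\lambda$ and $\delta$.
while not converged do
(1) Fix the others and update $J_i$ ($i = 1, \ldots, K$) by
$J_i = \arg\min_{J} \frac{1}{\mu}\|J\|_* + \frac{1}{2}\|J - (Z_i + U_i/\mu)\|_F^2$.
(2) Fix the others and update $Z_i$ by
$Z_i = (2I + X_i^{T} X_i)^{-1}\left(X_i^{T}(X_i - E_i) + J_i + S_i + (X_i^{T} Y_i - U_i - V_i)/\mu\right)$.
(3) Fix the others and update $S$ by
$S = \arg\min_{S} \frac{\delta}{\mu}\|S\|_{2,1} + \frac{1}{2}\|S - M\|_F^2$,
where $M$ is a matrix formed as follows:
$M = \begin{bmatrix} (M_1)_{11} & (M_1)_{12} & \cdots & (M_1)_{NN} \\ \vdots & \vdots & \ddots & \vdots \\ (M_K)_{11} & (M_K)_{12} & \cdots & (M_K)_{NN} \end{bmatrix}$,
where $M_i = Z_i + V_i/\mu$.
(4) Fix the others and update $E_i$ by
$E_i = \arg\min_{E} \frac{\lambda}{\mu}\|E\|_{2,1} + \frac{1}{2}\|E - (X_i - X_i Z_i + Y_i/\mu)\|_F^2$.
(5) Update the multipliers
$Y_i = Y_i + \mu(X_i - X_i Z_i - E_i)$,
$U_i = U_i + \mu(Z_i - J_i)$,
$V_i = V_i + \mu(Z_i - S_i)$.
(6) Update the parameter $\mu$ by
$\mu = \min(\rho\mu, \mu_{\max})$ ($\rho$ is fixed in all experiments).
(7) Check the convergence conditions: $\|X_i - X_i Z_i - E_i\|_\infty < \varepsilon$, $\|Z_i - J_i\|_\infty < \varepsilon$, and $\|Z_i - S_i\|_\infty < \varepsilon$, $i = 1, \ldots, K$.
end while
Output: $Z_1^*, \ldots, Z_K^*$ (equivalently, $Z^*$).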
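A compact NumPy sketch of Algorithm 1 follows. The two proximal operators are redefined so that the block runs on its own. The sketch assumes that all feature matrices $X_i$ share the same $N$ columns, one per local region. It fixes illustrative values for $\lambda$, $\delta$, $\rho$, the initial $\mu$, $\mu_{\max}$ and the tolerance; these are placeholders, not the settings used in the experiments. The function name `mlap` is hypothetical.

```python
import numpy as np


def svt(M, tau):
    """argmin_J tau*||J||_* + 0.5*||J - M||_F^2 (singular value thresholding)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt


def col_shrink(M, tau):
    """argmin_E tau*||E||_{2,1} + 0.5*||E - M||_F^2 (column shrinkage)."""
    norms = np.linalg.norm(M, axis=0)
    return M * (np.maximum(norms - tau, 0.0) / np.maximum(norms, 1e-12))


def mlap(Xs, lam=1.0, delta=0.1, rho=1.1, mu=1e-2, mu_max=1e6, tol=1e-6, max_iter=1000):
    """Inexact ALM (Algorithm 1) for problem (2); returns lists Z_1..Z_K and E_1..E_K."""
    K, n = len(Xs), Xs[0].shape[1]
    I = np.eye(n)
    Z = [np.zeros((n, n)) for _ in range(K)]
    S = [np.zeros((n, n)) for _ in range(K)]
    U = [np.zeros((n, n)) for _ in range(K)]  # multipliers for Z_i = J_i
    V = [np.zeros((n, n)) for _ in range(K)]  # multipliers for Z_i = S_i
    E = [np.zeros(X.shape) for X in Xs]
    Y = [np.zeros(X.shape) for X in Xs]       # multipliers for X_i = X_i Z_i + E_i
    for _ in range(max_iter):
        # Step 1: nuclear-norm proximal step for each J_i.
        J = [svt(Z[i] + U[i] / mu, 1.0 / mu) for i in range(K)]
        # Step 2: closed-form least-squares update of each Z_i.
        for i, X in enumerate(Xs):
            rhs = X.T @ (X - E[i]) + J[i] + S[i] + (X.T @ Y[i] - U[i] - V[i]) / mu
            Z[i] = np.linalg.solve(2.0 * I + X.T @ X, rhs)
        # Step 3: joint l_{2,1} shrinkage of the K x N^2 matrix M (one column per entry).
        M = np.vstack([(Z[i] + V[i] / mu).reshape(1, -1) for i in range(K)])
        S_stacked = col_shrink(M, delta / mu)
        S = [S_stacked[i].reshape(n, n) for i in range(K)]
        # Step 4: column-wise l_{2,1} shrinkage for each noise matrix E_i.
        E = [col_shrink(X - X @ Z[i] + Y[i] / mu, lam / mu) for i, X in enumerate(Xs)]
        # Steps 5 and 7: update multipliers and track the largest constraint violation.
        worst = 0.0
        for i, X in enumerate(Xs):
            R1, R2, R3 = X - X @ Z[i] - E[i], Z[i] - J[i], Z[i] - S[i]
            Y[i] += mu * R1
            U[i] += mu * R2
            V[i] += mu * R3
            worst = max(worst, np.abs(R1).max(), np.abs(R2).max(), np.abs(R3).max())
        # Step 6: increase the penalty parameter.
        mu = min(rho * mu, mu_max)
        if worst < tol:
            break
    return Z, E
```

Steps 1 and 4 touch only one feature type at a time. The only place where the feature types interact is Step 3, which shrinks each column of the stacked matrix as a group.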

2.2. Label Propagation

Graph-based label propagation uses a weighted graph whose vertices correspond to labeled and unlabeled data points and whose edge weights reflect the similarities between data points. Here the optimal solution $Z^*$ to problem (2), derived in the previous section, is used to compute the graph edge weights. To obtain a unified affinity matrix $W$ for the constructed graph, we only need a simple step that quantifies the columns of $Z^*$:

$$W_{jk} = \left\| z^*_{jk} \right\|_2 = \sqrt{\sum_{i=1}^{K} \left( (Z_i^*)_{jk} \right)^2}, \qquad (6)$$

Here $z^*_{jk}$ denotes the column of $Z^*$ that collects the $(j,k)$th entries of $Z_1^*, \ldots, Z_K^*$. Note that $W_{jk}$ is exactly the $\ell_2$-norm of a column of $Z$, and these column norms are summed in the $\ell_{2,1}$-norm used in (2). Thus (6) is not a simple late fusion of the individual $Z_i^*$'s. With $W$ as the graph edge weights, label propagation algorithms can be applied directly to produce the labels of the graph vertices.
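The unified affinity can be computed in two steps: take the Euclidean norm of each column of the stacked matrix $Z^*$, then reshape the result back to $N \times N$. The NumPy sketch below does this and contrasts it with a plain sum of absolute affinities. GRF-style propagation expects a symmetric weight matrix, and the matrix obtained this way need not be symmetric. The `symmetrize` helper, which averages $W$ with its transpose, is therefore our assumption and not a step stated in the text. Both function names are hypothetical.

```python
import numpy as np


def unified_affinity(Zs):
    """W_jk = l2-norm over tasks of the (j, k) entries of Z_1..Z_K."""
    n = Zs[0].shape[0]
    stacked = np.vstack([Z.reshape(1, -1) for Z in Zs])  # K x N^2
    return np.linalg.norm(stacked, axis=0).reshape(n, n)


def symmetrize(W):
    """Average with the transpose (assumed step, not from the paper)."""
    return 0.5 * (W + W.T)


Z1 = np.array([[3.0, 0.0], [0.0, 0.0]])
Z2 = np.array([[4.0, 0.0], [0.0, 1.0]])
W = unified_affinity([Z1, Z2])  # [[5, 0], [0, 1]]
late = np.abs(Z1) + np.abs(Z2)  # [[7, 0], [0, 1]]: a plain late fusion differs
```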

Most existing graph-based label propagation algorithms rely on the common assumption that labels are smooth on the graph. They essentially estimate a labeling function $f$ over the graph that satisfies two conditions: it should be close to the given labels, and it should be smooth over the whole graph. These two conditions are usually expressed in a regularization form. Mathematically, graph-based methods seek an optimal $f^*$ by minimizing the energy function

$$f^* = \arg\min_{f} \; \mathcal{C}(f) + \mathcal{R}(f), \qquad (7)$$

Here $\mathcal{C}(f)$ is a loss function that penalizes deviations from the given labels, and $\mathcal{R}(f)$ is a regularizer that favors label smoothness.

We adopt a graph-based propagation algorithm similar to Gaussian random fields (GRF) [3]. In GRF the smoothness regularizer is the quadratic energy

$$\mathcal{R}(F) = \frac{1}{2} \sum_{j,k} W_{jk} \left\| f_j - f_k \right\|_2^2, \qquad (8)$$

where $W$ is the weight matrix defined above. The loss term acts as a hard constraint: the values at the training vertices must equal the given labels, that is, $f_i = y_i$ for $i = 1, \ldots, l$. The solution can be obtained efficiently using matrix methods or belief propagation [20, 21].
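Take a symmetric weight matrix $W$ with the labeled vertices ordered first. With hard label constraints, the GRF solution is the harmonic function $F_u = (D_{uu} - W_{uu})^{-1} W_{ul} Y_l$, where $D$ is the diagonal degree matrix and labels are stored one per row. The sketch below is a minimal dense NumPy version of this standard closed form [3]. It assumes that every connected component of unlabeled vertices touches at least one labeled vertex; otherwise the linear system is singular. The function name is hypothetical.

```python
import numpy as np


def grf_propagate(W, Y_l):
    """Harmonic GRF solution F_u = (D_uu - W_uu)^{-1} W_ul Y_l.

    W:   (N, N) symmetric nonnegative weights, labeled vertices first.
    Y_l: (l, c) label matrix of the l labeled vertices, one row per vertex.
    """
    l = Y_l.shape[0]
    L = np.diag(W.sum(axis=1)) - W  # graph Laplacian D - W
    return np.linalg.solve(L[l:, l:], W[l:, :l] @ Y_l)


# Vertex 2 is unlabeled and tied 3:1 to vertices 0 (class A) and 1 (class B).
W = np.array([[0.0, 0.0, 3.0],
              [0.0, 0.0, 1.0],
              [3.0, 1.0, 0.0]])
Y_l = np.array([[1.0, 0.0],
                [0.0, 1.0]])
F_u = grf_propagate(W, Y_l)  # [[0.75, 0.25]]
```

Each unlabeled value is the weighted average of its neighbors' values, which is why the unlabeled vertex inherits the two classes in a 3:1 ratio.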

Solving the label propagation problem yields the labels of the test vertices, which correspond to local regions in the test images. The multilabel of each test image is then obtained by summing and normalizing the predicted labels of all local vertices it contains.
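Summing and normalizing the region labels per image can be sketched as follows. The text does not specify the normalization. Dividing each image's summed scores by their total, so that they sum to one, is therefore our assumption. The function name is hypothetical, as is `region_image_ids`, which maps each test region to its image index.

```python
import numpy as np


def image_multilabel(F_u, region_image_ids, n_images):
    """Sum region label scores per image, then normalize each image's scores to sum to 1."""
    scores = np.zeros((n_images, F_u.shape[1]))
    np.add.at(scores, region_image_ids, F_u)  # accumulate rows that share an image id
    totals = scores.sum(axis=1, keepdims=True)
    return scores / np.where(totals > 0, totals, 1.0)  # leave empty images at zero


F_u = np.array([[0.75, 0.25],
                [0.50, 0.50],
                [0.00, 1.00]])
ids = np.array([0, 0, 1])  # regions 0 and 1 come from image 0, region 2 from image 1
P = image_multilabel(F_u, ids, n_images=2)  # [[0.625, 0.375], [0.0, 1.0]]
```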

3. Experiment

3.1. Experimental Setup

In this section, we evaluate the proposed low-rank affinity based local-driven multilabel propagation on two datasets used in [5] and compare it with the previous local-driven approach of [5]. The first dataset is the MSRC image database [22], which contains 591 images from 23 classes. Around 80% of the images are associated with more than one label, with around three labels per image on average. We select a subset of the database focusing on relatively well-labeled classes, yielding 355 images and 14 classes: “building,” “grass,” “tree,” “cow,” “sheep,” “sky,” “mountain,” “airplane,” “water,” “car,” “bicycle,” “bird,” “road,” and “boat.” The dataset is randomly split 50% : 50% into training and test subsets, ensuring that each class has enough samples in the training set. The second dataset is a subset of the labeled Corel database used in [23]. It contains 674 labeled images from 11 labels: “Sky,” “Waterscape,” “Mountain,” “Grass,” “Tree,” “Flower,” “Rock,” “Earth,” “Ground,” “Building_Material,” and “Animal_Skin.” This dataset is also split 50% : 50% into training and test subsets.

The images are segmented into local regions, that is, superpixels, with the algorithm of [16], and local features are extracted from these oversegmented patches. Two types of local features are extracted. The first is the popular scale-invariant feature transform (SIFT) descriptor [24], used for object categories such as “building” and “animal.” The second is the local texton statistics (LTS) descriptor [25], which encodes texture and color information for scene categories such as “sky” and “grass.” The two feature types are combined in the graph construction step using the multitask low-rank affinity solution presented above. For the LTS descriptor, the responses of several filter banks at each pixel are clustered into textons. The normalized texton histogram within a segmented local region is then computed as the descriptor. In the experiments, 200 textons are used, so the texton histogram descriptor is 200-dimensional. For comparison, the adopted feature types are the same as in the previous local-driven approach of [5].
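The histogram part of the LTS computation reduces to two steps: assign each pixel's filter-bank response to its nearest texton, then histogram the assignments within a region. The sketch assumes the textons are already available, for example as 200 cluster centers obtained by k-means over training responses. The filter banks themselves and the color part of LTS [25] are not reproduced, and the function name is hypothetical.

```python
import numpy as np


def texton_histogram(responses, textons):
    """Normalized histogram of nearest-texton assignments for one region.

    responses: (p, f) filter-bank responses of the p pixels in the region.
    textons:   (T, f) texton centers, e.g. T = 200.
    """
    # Squared Euclidean distance from every pixel response to every texton.
    d2 = ((responses[:, None, :] - textons[None, :, :]) ** 2).sum(axis=2)
    assign = d2.argmin(axis=1)
    hist = np.bincount(assign, minlength=textons.shape[0]).astype(float)
    return hist / max(hist.sum(), 1.0)


textons = np.array([[0.0, 0.0], [10.0, 10.0]])
pixels = np.array([[1.0, 1.0], [9.0, 9.0], [0.0, 1.0]])
h = texton_histogram(pixels, textons)  # [2/3, 1/3]
```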

3.2. Experimental Results

The proposed algorithm is compared with the previous local-driven multilabel approach of [5], and the area under the ROC curve (AUC) is used to measure multilabel image annotation performance. We also compare with variants that use a single type of feature, that is, SIFT or LTS, in graph construction.
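For reference, the AUC of one label equals the probability that a randomly chosen positive image scores higher than a randomly chosen negative one, with ties counted as one half. The sketch below computes it with this pairwise (Mann–Whitney) formulation and averages over labels. The text does not state how AUC is aggregated across labels. The per-label mean, and skipping labels that lack either positives or negatives, are therefore assumptions. The function names are hypothetical.

```python
import numpy as np


def auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (pos.size * neg.size))


def mean_label_auc(S, G):
    """Average AUC over label columns of score matrix S and 0/1 ground truth G."""
    n = G.shape[0]
    aucs = [auc(S[:, c], G[:, c]) for c in range(S.shape[1]) if 0 < G[:, c].sum() < n]
    return float(np.mean(aucs))
```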

Table 1 lists the image annotation performance of the different methods on the two datasets. “SIFT” and “LTS” denote the methods using only one type of feature, and “Proposed” denotes the full low-rank affinity based local-driven multilabel propagation method combining both feature types. The local-driven graph-based label propagation yields good performance on the multilabel image annotation tasks, which demonstrates the effectiveness of the scheme. Not surprisingly, combining the two feature types yields better performance than either single feature type. As explained above, the two feature types describe different kinds of visual categories. The proposed method might be further improved by integrating other types of visual features, for example, contour and spatial information.

The proposed method effectively combines multiple features in graph construction through the joint inference algorithm. It shows advantages over the previous related work [5], which uses conventional $\varepsilon$-ball based local feature graph construction. The adopted low-rank affinity is robust to noise and sparse, as previously demonstrated in other tasks. It also captures the global structure of the data and utilizes the overall contextual information. The experimental results validate the suitability of the low-rank affinity for measuring the semantic relations among local features.

4. Conclusion

In this paper, a new graph-based multilabel propagation technique for image annotation is proposed, in which the low-rank affinity over local features is pursued and used for graph construction. To combine multiple types of features, local feature matching is formulated as a multitask low-rank affinity pursuit problem and solved by convex optimization. The proposed scheme propagates labels based on local feature matching, which is semantically more accurate than conventional global matching based methods. Furthermore, the low-rank affinity based graph construction offers advantages over previously adopted graph construction methods. Multilabel image annotation experiments on two benchmark datasets validate the performance of the proposed method.

Acknowledgments

This work is supported by the National Natural Science Foundation (NSF) of China (no. 61300056) and the Ph.D. Programs Foundation of the Ministry of Education of China (no. 20133401120005).

References

[1] M. Belkin, I. Matveeva, and P. Niyogi, “Regularization and semi-supervised learning on large graphs,” in Proceedings of the 17th Annual Conference on Learning Theory (COLT '04), pp. 624–638, July 2004.
[2] D. Zhou, O. Bousquet, T. Lal, J. Weston, and B. Schölkopf, “Learning with local and global consistency,” in Advances in Neural Information Processing Systems, 2004.
[3] X. Zhu, Z. Ghahramani, and J. Lafferty, “Semi-supervised learning using Gaussian fields and harmonic functions,” in Proceedings of the 20th International Conference on Machine Learning (ICML '03), pp. 912–919, August 2003.
[4] J. Zhang, M. Marszałek, S. Lazebnik, and C. Schmid, “Local features and kernels for classification of texture and object categories: a comprehensive study,” International Journal of Computer Vision, vol. 73, no. 2, pp. 213–238, 2007.
[5] T. Li, S. Yan, T. Mei, and I. S. Kweon, “Local-driven semi-supervised learning with multi-label,” in Proceedings of the IEEE International Conference on Multimedia and Expo (ICME '09), pp. 1508–1511, July 2009.
[6] M. Belkin and P. Niyogi, “Laplacian eigenmaps for dimensionality reduction and data representation,” Neural Computation, vol. 15, no. 6, pp. 1373–1396, 2003.
[7] S. T. Roweis and L. K. Saul, “Nonlinear dimensionality reduction by locally linear embedding,” Science, vol. 290, no. 5500, pp. 2323–2326, 2000.
[8] B. Cheng, J. Yang, S. Yan, Y. Fu, and T. S. Huang, “Learning with ℓ¹-graph for image analysis,” IEEE Transactions on Image Processing, vol. 19, no. 4, pp. 858–866, 2010.
[9] S. He and F. Liu, “Robust L2–L∞ filtering of time-delay jump systems with respect to the finite-time interval,” Mathematical Problems in Engineering, vol. 2011, Article ID 839648, 17 pages, 2011.
[10] G. Liu, Z. Lin, and Y. Yu, “Robust subspace segmentation by low-rank representation,” in Proceedings of the 27th International Conference on Machine Learning (ICML '10), pp. 663–670, June 2010.
[11] B. Cheng, G. Liu, J. Wang, Z. Huang, and S. Yan, “Multi-task low-rank affinity pursuit for image segmentation,” in Proceedings of the IEEE International Conference on Computer Vision (ICCV '11), pp. 2439–2446, November 2011.
[12] S. He and F. Liu, “Robust finite-time estimation of Markovian jumping systems with uncertain transition probabilities,” Applied Mathematics and Computation, vol. 222, pp. 297–306, 2013.
[13] Z. J. Zha, X. S. Hua, T. Mei, J. Wang, G. J. Qi, and Z. Wang, “Joint multi-label multi-instance learning for image classification,” in Proceedings of the 26th IEEE Conference on Computer Vision and Pattern Recognition (CVPR '08), June 2008.
[14] Z. J. Zha, T. Mei, J. Wang, Z. Wang, and X. S. Hua, “Graph-based semi-supervised learning with multi-label,” in Proceedings of the IEEE International Conference on Multimedia and Expo (ICME '08), pp. 1321–1324, June 2008.
[15] J. Tang, Q. Chen, M. Wang, S. Yan, T. S. Chua, and R. Jain, “Towards optimizing human labeling for interactive image tagging,” ACM Transactions on Multimedia Computing, Communications, and Applications, vol. 9, no. 4, article 29, 2013.
[16] G. Mori, X. Ren, A. A. Efros, and J. Malik, “Recovering human body configurations: combining segmentation and recognition,” in Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '04), pp. 326–333, July 2004.
[17] G. Liu, Z. Lin, S. Yan, J. Sun, Y. Yu, and Y. Ma, “Robust recovery of subspace structures by low-rank representation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 1, pp. 171–184, 2013.
[18] Z. Lin, M. Chen, L. Wu, and Y. Ma, “The augmented Lagrange multiplier method for exact recovery of corrupted low-rank matrices,” Tech. Rep. UILU-ENG-09-2215, 2009.
[19] J. F. Cai, E. J. Candès, and Z. Shen, “A singular value thresholding algorithm for matrix completion,” SIAM Journal on Optimization, vol. 20, no. 4, pp. 1956–1982, 2010.
[20] B. Ni, S. Yan, A. Kassim, and L. F. Cheong, “Learning by propagability,” in Proceedings of the 8th IEEE International Conference on Data Mining (ICDM '08), pp. 492–501, December 2008.
[21] X. Zhu, “Semi-supervised learning literature survey,” Tech. Rep. 1530, Computer Sciences, University of Wisconsin-Madison, Madison, Wis, USA, 2005.
[22] J. Shotton, J. Winn, C. Rother, and A. Criminisi, “TextonBoost: joint appearance, shape and context modeling for multi-class object recognition and segmentation,” in Proceedings of the 9th European Conference on Computer Vision (ECCV '06), pp. 1–15, May 2006.
[23] J. Yuan, J. Li, and B. Zhang, “Exploiting spatial context constraints for automatic image region annotation,” in Proceedings of the 15th ACM International Conference on Multimedia (MM '07), pp. 595–604, September 2007.
[24] D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” International Journal of Computer Vision, vol. 60, no. 2, pp. 91–110, 2004.
[25] T. Li and I. S. Kweon, “A semantic region descriptor for local feature based image categorization,” in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '08), pp. 1333–1336, April 2008.

Copyright

Copyright © 2013 Teng Li et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
