Abstract

This paper presents a novel low-rank affinity based local-driven algorithm that robustly propagates multiple labels from training images to test images. A graph is constructed over the segmented local image regions. The labels for vertices from the training data are derived based on the context among different training images, and the derived vertex labels are propagated to the unlabeled vertices via the graph. The multitask low-rank affinity, which jointly seeks sparsity-consistent low-rank affinities from multiple feature matrices, is applied to compute the edge weights between graph vertices. The inference of the multitask low-rank affinity is formulated as a constrained nuclear norm and $\ell_{2,1}$-norm minimization problem, which is solved efficiently with the augmented Lagrange multiplier method. Based on the learned local patch labels, we predict the multilabels of the test images. Experiments on multilabel image annotation show encouraging results for the proposed framework.

1. Introduction

Graph-based label propagation is an important methodology in machine learning and has been widely adopted in classification tasks such as image annotation [1–3]. It effectively leverages unlabeled data in addition to labeled data for classification and therefore alleviates the lack of sufficient labeled data in many real applications. Conventional graph-based label propagation mainly focuses on the case of a single label per datum and models the semantic relation between images based on global feature matching: each image is treated as a vertex linked to others in the graph. However, real-world online images are typically associated with multiple labels, each of which corresponds to a local region. Global matching based methods are therefore not well suited to multilabel propagation and image annotation. Recently, local matching based methods have been widely adopted and have shown performance superior to that of global matching based methods in several classification tasks [4]. For graph-based multilabel propagation, [5] proposed constructing the graph based on local feature matching, which is expected to model the multilabel semantics more accurately than conventional global matching based methods.

In this work, we follow the local matching based approach of [5] to explore the graph-based multilabel propagation problem. A critical step in graph-based label propagation is graph construction. Conventional methods for graph construction include the $k$-nearest-neighbor method and the $\epsilon$-ball based method, where, for each datum, the samples within its surrounding $\epsilon$-ball are connected; various approaches, for example, binary weighting, the Gaussian kernel [6], and $\ell_2$-reconstruction [7], are then applied to determine the graph edge weights. In [5] the $\epsilon$-ball based method is adopted to construct the local feature graph. However, recent studies [8–10] reveal that sparse representation (SR) and low-rank representation (LRR) for graph construction lead to several desirable characteristics, such as robustness to noise, sparsity, and datum-adaptive neighborhoods. The corresponding $\ell_1$-graph and low-rank graph have demonstrated their superiority in several real applications [10–12]. Comparatively, LRR is better at capturing the global structure of the data and utilizing the overall contextual information.
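For contrast with the low-rank construction pursued in this paper, the conventional recipe mentioned above (nearest neighbors plus Gaussian kernel weights) can be sketched as follows. This is only a minimal illustration, assuming features are stored as the columns of a NumPy array; the function name `knn_gaussian_graph` and the parameters `k` and `sigma` are hypothetical choices and are not taken from [5] or [6].

```python
import numpy as np

def knn_gaussian_graph(X, k=3, sigma=1.0):
    """Conventional k-NN graph with Gaussian kernel edge weights.

    X is a (d, N) array whose columns are feature vectors (an assumed layout).
    Returns a symmetric (N, N) weight matrix with a zero diagonal.
    """
    N = X.shape[1]
    sq = np.sum(X ** 2, axis=0)
    # Pairwise squared Euclidean distances, clipped against round-off.
    D2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X.T @ X), 0.0)
    np.fill_diagonal(D2, np.inf)              # exclude self-loops
    nn = np.argsort(D2, axis=1)[:, :k]        # k nearest neighbors per vertex
    rows = np.repeat(np.arange(N), k)
    cols = nn.ravel()
    W = np.zeros((N, N))
    W[rows, cols] = np.exp(-D2[rows, cols] / (2.0 * sigma ** 2))
    return np.maximum(W, W.T)                 # symmetrize the neighborhood
```

Both `k` and `sigma` must be tuned by hand in such a construction, which is one reason datum-adaptive alternatives such as SR and LRR are attractive.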

Motivated by the recent successes of LRR, we propose a novel low-rank affinity based local-driven multilabel propagation approach for image annotation. Figure 1 illustrates the proposed scheme: local features in the images are extracted as the vertices of a graph, and the graph edges are computed according to image matching criteria, for example, the adopted low-rank affinity. Similar to [5], labels of the training vertices are derived from the context of matching images with multilabels and are propagated to vertices in the test images via the graph. Each test image is then assigned multiple labels according to the labels of the local features it contains. To combine the matching information of multiple features, the affinity matrices for the different features are inferred jointly through a multitask low-rank affinity pursuit method, which is formulated as an optimization problem. The resulting affinity matrices not only preserve the low-rank property but also are forced to be sparsity consistent. The graph edge weights between local patches are then computed from these derived affinity matrices.

Other methods address the correlations among multiple labels, for example, by adding a regularizer to a semisupervised learning framework [13, 14]. They showed that multilabel correlations can be important information to utilize. Sparsity-based methods have also been explored for other applications such as image tagging [15], where they demonstrate the effectiveness of sparse modeling with promising results.

The proposed method has two properties that distinguish it from previous work: (i) the multilabel propagation is based on local feature matching, which is semantically more accurate than conventional global matching based methods; (ii) the constructed graph is derived from the low-rank affinity over all local features, which is expected to bring advantages such as robustness and sparsity. In experiments we demonstrate promising results on two multilabel image datasets: subsets of MSRC and Corel.

The rest of this paper is organized as follows. Section 2 presents the problem formulation; the low-rank affinity based graph construction is detailed in Section 2.1; Section 3 reports the experimental results, followed by conclusions in Section 4.

2. Problem Formulation

The local regions, that is, superpixels, are obtained from images through the oversegmentation algorithm of [16]. Each given image is partitioned into subregions, from which multiple types of features are extracted for later use. In the multilabel case, an image is associated with several classes, but each oversegmented local region belongs to exactly one of the labels. Compared with global matching based propagation, propagating the labels via the graph of local regions models the semantic relations more accurately and propagates the labels more effectively.

Let $X = [x_1, x_2, \ldots, x_N]$ be the set of vertices in the graph, each column of which is a local feature vector $x_i$. The first $l$ points, which belong to the training set, have labels derived from the image-level labels, and their labels are denoted as $Y_l$; the task is thus converted to labeling the remaining vertices $x_{l+1}, \ldots, x_N$. The graph to be constructed is denoted by $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, where the vertex set $\mathcal{V} = \mathcal{V}_l \cup \mathcal{V}_u$ with $\mathcal{V}_l$ corresponding to $\{x_1, \ldots, x_l\}$ and $\mathcal{V}_u$ corresponding to $\{x_{l+1}, \ldots, x_N\}$. The edges $\mathcal{E}$ are weighted by the affinity matrix $W$ with $w_{ij}$ indicating the similarity measure between $x_i$ and $x_j$. The predicted labels of $\mathcal{V}_u$ to be obtained are denoted as $Y_u$. In the proposed work multiple types of visual features are combined in matching local regions, and the $k$th type of features is written as a matrix $X_k$ in this case.

The labels of the training vertices are derived from the image-level labels using the matching image context, and finally the multilabels of the test images are obtained from the predicted vertex labels in a way similar to [5]. Specifically, the multilabel of each test image is obtained by summing and normalizing all predicted labels of the local vertices it contains. Apart from these steps, the proposed approach has two main stages: graph construction and label propagation. In the graph construction stage the low-rank affinity over local features is derived and used for weighting the graph edges.

2.1. Low-Rank Affinity Based Graph Construction

LRR based modeling is chosen for graph construction owing to its effectiveness and robustness. For a matrix $X = [x_1, x_2, \ldots, x_N]$ with each $x_i$ representing the $i$th local feature, the affinities among local features are computed by solving the following LRR problem:

$$\min_{Z, E} \; \|Z\|_{*} + \lambda \|E\|_{2,1} \quad \text{s.t.} \quad X = XZ + E, \tag{1}$$

where $\|\cdot\|_{*}$ denotes the nuclear norm, also known as the trace norm or the Ky Fan norm, which is the sum of the singular values; $\|\cdot\|_{2,1}$ is the $\ell_{2,1}$-norm [10, 17] for characterizing the noise $E$; and the parameter $\lambda > 0$ balances the effects of the two parts. The optimal solution $Z^{*}$ to problem (1) naturally forms an affinity matrix that represents the pairwise similarities among local feature vectors. Namely, the affinity between two local features $x_i$ and $x_j$ can be computed by $|(Z^{*})_{ij}| + |(Z^{*})_{ji}|$, where $(\cdot)_{ij}$ denotes the $(i,j)$th element of a matrix. The local matching graph can then be formed accordingly.
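The ingredients of the LRR formulation can be made concrete with a few lines of NumPy. The sketch below only evaluates the two norms, the objective for a candidate coefficient matrix, and the symmetric affinity read off from it; it does not solve the optimization problem. All function names are illustrative assumptions, and the $\ell_{2,1}$-norm is taken as the sum of the $\ell_2$-norms of the columns, the usual convention in the LRR literature.

```python
import numpy as np

def nuclear_norm(Z):
    """Sum of the singular values of Z."""
    return np.linalg.svd(Z, compute_uv=False).sum()

def l21_norm(E):
    """Sum of the l2 norms of the columns of E."""
    return np.linalg.norm(E, axis=0).sum()

def lrr_objective(X, Z, lam):
    """LRR objective for a candidate Z, with the noise set to E = X - XZ."""
    E = X - X @ Z
    return nuclear_norm(Z) + lam * l21_norm(E)

def lrr_affinity(Z):
    """Symmetric affinity |Z_ij| + |Z_ji| built from a coefficient matrix."""
    A = np.abs(Z)
    return A + A.T
```

For instance, the trivial choice `Z = np.eye(N)` reconstructs the data exactly, so its objective reduces to the nuclear norm of the identity, which is `N`.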

The LRR solution presented above is for the single feature case, where each local region is represented by one type of visual feature. Different types of visual features can usually be combined to yield more accurate matching and better performance. For multiple feature integration, an intuitive approach is to directly combine the affinity matrices individually inferred by LRR, for example, by simply adding them together. However, inferring each affinity matrix individually does not make good use of the cross-feature information, which is crucial for producing accurate and reliable results.

To fuse multiple features effectively, a new solution of multitask low-rank affinity pursuit is used here, which aims at jointly inferring a collection of affinity matrices $Z_1, Z_2, \ldots, Z_K$, where each $N \times N$ matrix $Z_k$ corresponds to the $k$th feature matrix $X_k$ and $N$ is the number of local features. The inference of the multiple matrices can be formulated as an optimization problem that takes two considerations into account: first, the affinity matrices should be of low rank to inherit the advantages of LRR in the single feature case; second, to make effective use of the cross-feature information, the affinity matrices are forced to be sparsity consistent. Considering both aspects, the affinity matrices $Z_1, Z_2, \ldots, Z_K$ for the $K$ types of features are inferred by solving the following convex optimization problem:

$$\min_{\substack{Z_1, \ldots, Z_K \\ E_1, \ldots, E_K}} \; \sum_{k=1}^{K} \left( \|Z_k\|_{*} + \lambda \|E_k\|_{2,1} \right) + \alpha \|Z\|_{2,1} \quad \text{s.t.} \quad X_k = X_k Z_k + E_k, \; k = 1, \ldots, K, \tag{2}$$

where $\alpha > 0$ is a parameter and the $K \times N^2$ matrix $Z$ is formed by concatenating $Z_1, Z_2, \ldots, Z_K$ as follows:

$$Z = \begin{bmatrix} (Z_1)_{11} & (Z_1)_{12} & \cdots & (Z_1)_{NN} \\ (Z_2)_{11} & (Z_2)_{12} & \cdots & (Z_2)_{NN} \\ \vdots & \vdots & \ddots & \vdots \\ (Z_K)_{11} & (Z_K)_{12} & \cdots & (Z_K)_{NN} \end{bmatrix}. \tag{3}$$

The first term of formulation (2) is the straightforward summation of the LRR objectives for the multiple features; optimizing this term alone leads to a “trivial” solution that is equivalent to applying LRR to each feature matrix individually. The second term of formulation (2) is the $\ell_{2,1}$-norm regularization defined on $Z$, which plays a key role in the proposed multitask low-rank pursuit: it forces the affinities $(Z_1)_{ij}, (Z_2)_{ij}, \ldots, (Z_K)_{ij}$ to have consistent magnitudes, all either large or small. That is, the fusion of multiple features is “seamlessly” performed by minimizing the $\ell_{2,1}$-norm of $Z$.
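The coupling effect of the joint regularizer can be illustrated numerically. The sketch below stacks the per-feature affinity matrices into one matrix with one row per feature type and one column per vertex pair, then sums the column norms. The helper names `stack_affinities` and `joint_l21` are hypothetical, and row-major flattening is an assumed layout.

```python
import numpy as np

def stack_affinities(Zs):
    """Stack K (N, N) affinity matrices into a (K, N*N) matrix, one row each."""
    return np.stack([Z.ravel() for Z in Zs], axis=0)

def joint_l21(Zs):
    """Sum over vertex pairs (i, j) of the l2 norm across the K features."""
    return np.linalg.norm(stack_affinities(Zs), axis=0).sum()

# Two features that agree on which pair is related ...
agree = [np.array([[1.0, 0.0], [0.0, 0.0]]),
         np.array([[1.0, 0.0], [0.0, 0.0]])]
# ... versus two features that place the same mass on different pairs.
disagree = [np.array([[1.0, 0.0], [0.0, 0.0]]),
            np.array([[0.0, 1.0], [0.0, 0.0]])]

print(joint_l21(agree), joint_l21(disagree))
```

Although both configurations carry the same total magnitude, the consistent one incurs the smaller penalty (the square root of 2 versus 2), which is the sense in which the regularizer favors sparsity-consistent affinities.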

2.1.1. Optimization Procedure

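Problem (2) is convex and can be optimized in polynomial time. We first convert it into the following equivalent problem by introducing auxiliary variables $J_k$ and $S_k$:

$$\min_{\{J_k, S_k, Z_k, E_k\}} \; \sum_{k=1}^{K} \left( \|J_k\|_{*} + \lambda \|E_k\|_{2,1} \right) + \alpha \|Z\|_{2,1} \quad \text{s.t.} \quad X_k = X_k S_k + E_k, \; Z_k = J_k, \; Z_k = S_k, \; k = 1, \ldots, K. \tag{4}$$

This problem can be solved with the augmented Lagrange multiplier (ALM) method [18], which minimizes the following augmented Lagrange function:

$$\mathcal{L} = \alpha \|Z\|_{2,1} + \sum_{k=1}^{K} \Big( \|J_k\|_{*} + \lambda \|E_k\|_{2,1} + \langle \Lambda_k, X_k - X_k S_k - E_k \rangle + \langle \Phi_k, Z_k - J_k \rangle + \langle \Psi_k, Z_k - S_k \rangle + \frac{\mu}{2} \|X_k - X_k S_k - E_k\|_F^2 + \frac{\mu}{2} \|Z_k - J_k\|_F^2 + \frac{\mu}{2} \|Z_k - S_k\|_F^2 \Big), \tag{5}$$

where $\Lambda_1, \ldots, \Lambda_K$, $\Phi_1, \ldots, \Phi_K$, and $\Psi_1, \ldots, \Psi_K$ are Lagrange multipliers and $\mu > 0$ is a penalty parameter. The inexact ALM method [18], also called the alternating direction method (ADM), is outlined in Algorithm 1. Note that the subproblems of the algorithm are convex and all have closed-form solutions. Step 1 is solved via the singular value thresholding operator [19], while Steps 3 and 4 are solved via [10, Lemma 3.2].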
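For reference, the two closed-form operators invoked by these steps have standard forms. They are stated below with generic placeholder symbols ($A$, $Q$, and a threshold $\tau > 0$) that are introduced only for this illustration and are not the paper's notation.

```latex
% Singular value thresholding: proximal operator of the nuclear norm.
% A = U \Sigma V^T is a singular value decomposition of A.
\[
\arg\min_{J}\; \tau \|J\|_{*} + \tfrac{1}{2}\|J - A\|_F^2
  \;=\; U \,\max(\Sigma - \tau I,\, 0)\, V^{T}.
\]

% Column-wise shrinkage: proximal operator of the l_{2,1}-norm.
% q_i is the i-th column of Q; a zero column stays zero.
\[
\Big[\arg\min_{E}\; \tau \|E\|_{2,1} + \tfrac{1}{2}\|E - Q\|_F^2\Big]_{:,i}
  \;=\; \max\!\Big(1 - \frac{\tau}{\|q_i\|_2},\, 0\Big)\, q_i .
\]
```

The first operator shrinks singular values toward zero and so lowers the rank; the second shrinks whole columns, so a column whose norm falls below $\tau$ is set exactly to zero.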

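Algorithm 1: Multitask low-rank affinity pursuit by inexact ALM.
Inputs: Data matrices $\{X_k\}_{k=1}^{K}$, parameters $\lambda$ and $\alpha$.
while not converged do
 (1) Fix the others and update $J_k$, $k = 1, \ldots, K$, by
         $J_k = \arg\min_{J} \frac{1}{\mu}\|J\|_{*} + \frac{1}{2}\|J - (Z_k + \Phi_k/\mu)\|_F^2$.
 (2) Fix the others and update $S_k$, $k = 1, \ldots, K$, by
         $S_k = (I + X_k^T X_k)^{-1}\left(X_k^T (X_k - E_k) + Z_k + (X_k^T \Lambda_k + \Psi_k)/\mu\right)$.
 (3) Fix the others and update $Z$ by
       $Z = \arg\min_{Z} \frac{\alpha}{2\mu}\|Z\|_{2,1} + \frac{1}{2}\|Z - M\|_F^2$,
 where $M$ is a $K \times N^2$ matrix formed as follows:
          $M = \begin{bmatrix} (F_1)_{11} & (F_1)_{12} & \cdots & (F_1)_{NN} \\ \vdots & \vdots & \ddots & \vdots \\ (F_K)_{11} & (F_K)_{12} & \cdots & (F_K)_{NN} \end{bmatrix}$,
 where $F_k = \frac{1}{2}\left(J_k + S_k - (\Phi_k + \Psi_k)/\mu\right)$.
 (4) Fix the others and update $E_k$, $k = 1, \ldots, K$, by
          $E_k = \arg\min_{E} \frac{\lambda}{\mu}\|E\|_{2,1} + \frac{1}{2}\|E - (X_k - X_k S_k + \Lambda_k/\mu)\|_F^2$.
 (5) Update the multipliers
          $\Lambda_k = \Lambda_k + \mu (X_k - X_k S_k - E_k)$,
          $\Phi_k = \Phi_k + \mu (Z_k - J_k)$,
          $\Psi_k = \Psi_k + \mu (Z_k - S_k)$.
 (6) Update the parameter $\mu$ by
          $\mu = \min(\rho\mu, \mu_{\max})$ (the same $\rho$ is used in all experiments).
 (7) Check the convergence condition: $X_k - X_k S_k - E_k \to 0$ and $Z_k - J_k \to 0$, $Z_k - S_k \to 0$.
end while
Output: $Z$.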
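The iteration outlined in Algorithm 1 can be sketched in NumPy as follows. This is a hedged illustration rather than the authors' implementation: the variable roles (auxiliary copies `J` and `S` of each affinity matrix, noise `E`, and three multiplier sets `Lam`, `Phi`, `Psi`), the zero initialization, the default values of `mu`, `rho`, `mu_max`, and `tol`, and the max-absolute-residual stopping rule are all assumptions made to obtain a runnable example.

```python
import numpy as np

def svt(A, tau):
    """Singular value thresholding (proximal operator of the nuclear norm)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def col_shrink(Q, tau):
    """Column-wise shrinkage (proximal operator of the l2,1 norm)."""
    norms = np.linalg.norm(Q, axis=0)
    return Q * np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)

def mlap(Xs, lam=0.5, alpha=1.0, mu=1e-2, rho=1.1, mu_max=1e10,
         tol=1e-6, max_iter=500):
    """Inexact ALM sketch for multitask low-rank affinity pursuit.

    Xs is a list of K feature matrices, each (d_k, N) with samples as columns.
    Returns a list of K (N, N) affinity matrices.
    """
    Xs = [np.asarray(X, dtype=float) for X in Xs]
    K, N = len(Xs), Xs[0].shape[1]
    Z = [np.zeros((N, N)) for _ in range(K)]
    J = [np.zeros((N, N)) for _ in range(K)]
    S = [np.zeros((N, N)) for _ in range(K)]
    E = [np.zeros_like(X) for X in Xs]
    Lam = [np.zeros_like(X) for X in Xs]          # for X = X S + E
    Phi = [np.zeros((N, N)) for _ in range(K)]    # for Z = J
    Psi = [np.zeros((N, N)) for _ in range(K)]    # for Z = S
    inv = [np.linalg.inv(np.eye(N) + X.T @ X) for X in Xs]
    for _ in range(max_iter):
        for k, X in enumerate(Xs):
            # Steps 1 and 2: low-rank copy J and least-squares copy S.
            J[k] = svt(Z[k] + Phi[k] / mu, 1.0 / mu)
            S[k] = inv[k] @ (X.T @ (X - E[k]) + Z[k]
                             + (X.T @ Lam[k] + Psi[k]) / mu)
        # Step 3: joint shrinkage across features, one column per pair (i, j).
        M = np.stack([(0.5 * (J[k] + S[k] - (Phi[k] + Psi[k]) / mu)).ravel()
                      for k in range(K)])
        Zst = col_shrink(M, alpha / (2.0 * mu))
        Z = [Zst[k].reshape(N, N) for k in range(K)]
        res = 0.0
        for k, X in enumerate(Xs):
            # Step 4: noise update; Step 5: multiplier updates.
            E[k] = col_shrink(X - X @ S[k] + Lam[k] / mu, lam / mu)
            r1, r2, r3 = X - X @ S[k] - E[k], Z[k] - J[k], Z[k] - S[k]
            Lam[k] += mu * r1
            Phi[k] += mu * r2
            Psi[k] += mu * r3
            res = max(res, np.abs(r1).max(), np.abs(r2).max(), np.abs(r3).max())
        mu = min(rho * mu, mu_max)                # Step 6
        if res < tol:                             # Step 7
            break
    return Z
```

The matrix inverse in the second step depends only on the data, so it is computed once outside the loop. Each iteration is dominated by the `K` singular value decompositions of `N`-by-`N` matrices, which limits this direct sketch to moderate numbers of local regions.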

2.2. Label Propagation

Graph-based label propagation utilizes a weighted graph in which the vertices correspond to labeled and unlabeled data points and the edge weights reflect the similarities between data points. Here the optimal solution $Z^{*}$ to problem (2), which collects $Z_1^{*}, \ldots, Z_K^{*}$ from the previous section, is used for computing the graph edge weights. To obtain a unified affinity matrix $W$ for the constructed graph, we only need a simple step that quantifies the columns of the matrix $Z^{*}$:

$$w_{ij} = \frac{1}{2}\left( \sqrt{\sum_{k=1}^{K} (Z_k^{*})_{ij}^2} + \sqrt{\sum_{k=1}^{K} (Z_k^{*})_{ji}^2} \right). \tag{6}$$

Note that $\sqrt{\sum_{k=1}^{K} (Z_k^{*})_{ij}^2}$ is exactly the $\ell_2$-norm of the column of $Z^{*}$ that collects the $(i,j)$ entries of $Z_1^{*}, \ldots, Z_K^{*}$, that is, the quantity penalized by the $\ell_{2,1}$-norm in (2); thus (6) is not a simple late fusion of the $Z_k$'s. With such an affinity matrix as the graph edge weights, label propagation algorithms can be applied directly to produce the graph labeling results.
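The fusion step described above amounts to an entrywise root-sum-of-squares across features followed by symmetrization. A minimal sketch, with the hypothetical helper name `fuse_affinities` and the averaging of the two directions taken as an assumption:

```python
import numpy as np

def fuse_affinities(Zs):
    """Fuse K (N, N) affinity matrices into one symmetric weight matrix.

    Each entry first takes the l2 norm across the K features, then the
    (i, j) and (j, i) magnitudes are averaged.
    """
    mag = np.sqrt(sum(np.asarray(Z, dtype=float) ** 2 for Z in Zs))
    return 0.5 * (mag + mag.T)

# Two features that both relate vertex 0 to vertex 1, with strengths 3 and 4.
Z1 = np.array([[0.0, 3.0], [0.0, 0.0]])
Z2 = np.array([[0.0, 4.0], [0.0, 0.0]])
W = fuse_affinities([Z1, Z2])
```

In this toy case the cross-feature magnitude for the pair is 5, and after symmetrization both off-diagonal entries of `W` equal 2.5.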

Existing graph-based label propagation algorithms are all based on the common assumption that the labels are smooth on the graph. They essentially estimate a labeling function $f$ over the graph that satisfies two conditions: it should be close to the given labels, and it should be smooth on the whole graph. These two conditions are usually expressed in a regularization form. Mathematically, the graph-based methods aim to find an optimal $f^{*}$ by minimizing the following energy function:

$$f^{*} = \arg\min_{f} \left\{ \mathcal{Q}_{l}(f) + \mathcal{Q}_{s}(f) \right\}, \tag{7}$$

where $\mathcal{Q}_{l}(f)$ is a loss function that penalizes the deviation from the given labels and $\mathcal{Q}_{s}(f)$ is a regularizer that favors label smoothness.

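We adopt a graph-based propagation algorithm similar to Gaussian random fields (GRF) [3] in our approach. The energy $\mathcal{Q}(f)$ in GRF is defined as a quadratic function:

$$\mathcal{Q}(f) = \infty \sum_{i \in \mathcal{V}_l} \|f_i - y_i\|^2 + \frac{1}{2} \sum_{i,j} w_{ij} \|f_i - f_j\|^2, \tag{8}$$

with the weight matrix $W = [w_{ij}]$ defined before, where the first sum runs over the training vertices $\mathcal{V}_l$, $y_i$ is the given label vector of training vertex $i$, and $f_i$ is the value of the labeling function at vertex $i$. The factor $\infty$ in (8) represents the constraint that the values of the training vertices should equal the given labels. The solution can be obtained efficiently using matrix methods or belief propagation [20, 21].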
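When the labels of the training vertices are clamped, the minimizer of such a quadratic smoothness energy is the well-known harmonic solution, which can be computed with one linear solve. The sketch below assumes that the labeled vertices occupy the first rows of the weight matrix and that every unlabeled vertex is connected, directly or indirectly, to a labeled one, so the linear system is nonsingular. The name `grf_propagate` is hypothetical.

```python
import numpy as np

def grf_propagate(W, Y_l):
    """Harmonic label propagation with clamped training labels.

    W   : (N, N) symmetric nonnegative weights, labeled vertices first.
    Y_l : (l, C) label matrix of the first l vertices.
    Returns the (N - l, C) predicted label scores of the unlabeled vertices.
    """
    W = np.asarray(W, dtype=float)
    Y_l = np.asarray(Y_l, dtype=float)
    l = Y_l.shape[0]
    D = np.diag(W.sum(axis=1))
    L_uu = D[l:, l:] - W[l:, l:]          # graph Laplacian, unlabeled block
    return np.linalg.solve(L_uu, W[l:, :l] @ Y_l)

# Vertex 2 is unlabeled and tied to vertex 0 (weight 3) and vertex 1 (weight 1).
W = np.array([[0.0, 0.0, 3.0],
              [0.0, 0.0, 1.0],
              [3.0, 1.0, 0.0]])
Y_l = np.array([[1.0, 0.0],
                [0.0, 1.0]])
F_u = grf_propagate(W, Y_l)
```

Here the unlabeled vertex receives the weighted average of its neighbors' labels, so `F_u` is `[[0.75, 0.25]]`.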

Solving the label propagation problem yields labels of testing vertices, which correspond to local regions in test images. Obviously, the multilabel of each test image can be obtained by summing and normalizing all predicted labels of the local vertices it contains.
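The aggregation from region-level predictions to image-level multilabels can be sketched as below. The paper does not specify the normalization, so dividing by the sum of the scores is an assumption; the function name `image_labels` and the region-to-image index array are likewise hypothetical.

```python
import numpy as np

def image_labels(region_scores, region_to_image, n_images):
    """Sum region label scores per image and normalize each image to sum 1.

    region_scores   : (R, C) predicted label scores of the R test regions.
    region_to_image : length-R integer array giving each region's image index.
    """
    region_scores = np.asarray(region_scores, dtype=float)
    out = np.zeros((n_images, region_scores.shape[1]))
    np.add.at(out, np.asarray(region_to_image), region_scores)
    totals = out.sum(axis=1, keepdims=True)
    return out / np.maximum(totals, 1e-12)   # guard against empty images

scores = np.array([[0.75, 0.25],
                   [0.25, 0.75],
                   [1.00, 0.00]])
labels = image_labels(scores, [0, 0, 1], n_images=2)
```

In this toy case the first image contains two regions with opposite preferences and ends up with scores `[0.5, 0.5]`, while the second image keeps `[1.0, 0.0]`.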

3. Experiment

3.1. Experimental Setup

In this section, we evaluate the proposed low-rank affinity based local-driven multilabel propagation on two datasets which have been used in [5] and compare with the previous local-driven approach of [5]. The first dataset is the MSRC image database [22] which contains 591 images from 23 classes. Around 80% of the images are associated with more than one label and there are around three labels per image on average. We select a subset of the database, focusing on relatively well labeled classes, yielding 355 images and 14 different classes: “building,” “grass,” “tree,” “cow,” “sheep,” “sky,” “mountain,” “airplane,” “water,” “car,” “bicycle,” “bird,” “road,” and “boat.” The dataset is randomly split into two subsets of 50% : 50% for training and testing with the consideration that each class has enough samples in the training set. The second dataset is a subset of the labeled Corel database used in [23]. The dataset contains 674 labeled images from 11 labels: “Sky,” “Waterscape,” “Mountain,” “Grass,” “Tree,” “Flower,” “Rock,” “Earth,” “Ground,” “Building_Material,” and “Animal_Skin.” This dataset is also split into the training and test subsets of 50% : 50%.

The images are segmented into local regions, that is, superpixels, through the algorithm of [16], and local features are extracted on these oversegmented patches. Two types of local features are extracted here: the popular scale-invariant feature transform (SIFT) descriptor [24] for object categories such as “building” and “animal” and the local texton statistics (LTS) descriptor [25], which encodes texture and color information, for scene categories such as “sky” and “grass.” The two types of features are combined in the graph construction step using the multitask low-rank based solution presented previously. For the LTS descriptor, pixel features from the responses of several filter banks are clustered into textons, and the normalized texton histogram within a segmented local region is computed as the descriptor. In the experiments, 200 textons are used and the texton histogram descriptor therefore has 200 dimensions. The adopted feature types are the same as those of the previous local-driven approach of [5] for comparison.
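The last stage of the LTS descriptor, the normalized texton histogram per region, can be sketched as follows. The sketch assumes that each pixel has already been assigned a texton index (the filter banks and clustering are not shown) and that a superpixel map with region indices starting at zero is available; the name `texton_histograms` is hypothetical.

```python
import numpy as np

def texton_histograms(texton_ids, segments, n_textons=200):
    """Normalized texton histogram for every segmented region.

    texton_ids : integer array, texton index of each pixel.
    segments   : integer array of the same shape, region index of each pixel.
    Returns an (n_regions, n_textons) array whose nonempty rows sum to 1.
    """
    t = np.asarray(texton_ids).ravel()
    s = np.asarray(segments).ravel()
    H = np.zeros((int(s.max()) + 1, n_textons))
    np.add.at(H, (s, t), 1.0)                      # count textons per region
    return H / np.maximum(H.sum(axis=1, keepdims=True), 1.0)

# A 2x2 image with two regions (top row, bottom row) and three textons.
H = texton_histograms([[0, 1], [1, 1]], [[0, 0], [1, 1]], n_textons=3)
```

The top region contains textons 0 and 1 once each and the bottom region contains texton 1 twice, so the rows of `H` are `[0.5, 0.5, 0]` and `[0, 1, 0]`.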

3.2. Experimental Results

The proposed algorithm is compared with the previous local-driven multilabel approach of [5], and the area under ROC curve (AUC) value is adopted to measure the performance of multilabel image annotation. We also compare with the cases using a single type of feature, that is, SIFT or LTS, in graph construction.
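For a single label, the AUC equals the probability that a randomly chosen positive image is scored above a randomly chosen negative one, with ties counted as one half. The sketch below computes it directly from this pairwise definition; it is an illustrative assumption about how the metric could be evaluated, not the evaluation code used in the experiments, and the name `auc` is hypothetical.

```python
import numpy as np

def auc(scores, labels):
    """Area under the ROC curve for one label via pairwise comparisons."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    if len(pos) == 0 or len(neg) == 0:
        return float("nan")                  # undefined without both classes
    diff = pos[:, None] - neg[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / (len(pos) * len(neg)))

value = auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

In this example three of the four positive-negative pairs are ordered correctly, so `value` is 0.75. A multilabel score can then be obtained by averaging the per-label values, which is one common convention.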

Table 1 lists the image annotation performance of the different methods on the two datasets. “SIFT” and “LTS” denote the methods using only one type of feature, respectively, and “Proposed” denotes the full low-rank affinity based local-driven multilabel propagation method combining the two types of features. It can be seen that the local-driven graph-based label propagation method yields good performance on the multilabel image annotation tasks, which supports the effectiveness of the scheme. Not surprisingly, combining the two types of features yields better performance than using either single feature type, since the two feature types have different characteristics in describing various visual categories, as explained before. It is also worth noting that the proposed method may be further improved by integrating other types of visual features, for example, contour and spatial information.

The proposed method effectively combines multiple features in graph construction through the adopted joint inference algorithm. Compared with the previous related work of [5], which uses the conventional $\epsilon$-ball based local feature graph construction, the proposed method shows its superiority. The adopted low-rank affinity not only leads to characteristics such as robustness to noise and sparsity but also is good at capturing the global structure of the data and utilizing the overall contextual information, as has also been demonstrated previously in other tasks. The experimental results validate the suitability of the low-rank affinity for measuring the semantic relation among local features.

4. Conclusion

In this paper, a new technique of graph-based multilabel propagation for the image annotation task is proposed, where the low-rank affinity over local features is pursued and utilized for graph construction. To combine multiple types of features, local feature matching is formulated as a multitask low-rank affinity pursuit problem and solved by optimization. The proposed scheme propagates the labels based on local feature matching, which is semantically more accurate than conventional global matching based methods. Furthermore, the low-rank affinity based graph construction offers advantages over previously adopted graph construction methods. The multilabel image annotation experiments on two benchmark datasets validate the performance of the proposed method.

Acknowledgments

This work is supported by the National Natural Science Foundation (NSF) of China (no. 61300056) and the Ph.D. Programs Foundation of the Ministry of Education of China (no. 20133401120005).