Abstract

This paper presents a novel algorithm to numerically decompose mixed signals in a collaborative way, given supervision of the labels that each signal contains. The decomposition is formulated as an optimization problem incorporating a nonnegativity constraint. A nonnegative data factorization solution is presented to yield the decomposed results. It is shown that the optimization is efficient and decreases the objective function monotonically. Such a decomposition algorithm can be applied to multilabel training samples for pattern classification. The real-data experimental results show that the proposed algorithm can significantly improve multilabel image classification performance under weak supervision.

1. Introduction

Signal decomposition, which separates a mixed source signal into its constitutive pure components, is an important step in many practical engineering problems. For example, in image classification or speech recognition, a single training or test sample usually contains multiple additive signals from several classes. The decomposition of the mixed signals can yield pure feature representations for better class-model training and recognition. Figure 1 illustrates the decomposition for an image containing three object classes. From the given mixed signal at the top, we aim to derive a specific representation for each of its associated labels, reflecting a corresponding pure region in the image. The decomposed representations can be more accurate and informative for describing object categories.

Unsupervised signal decomposition, for example, blind source separation in signal processing, is a fundamental problem that has been well explored over the past decades [1, 2]. Various methods have been proposed for different tasks, including Independent Component Analysis (ICA) for speech signals [3] and Null Space Pursuit (NSP) for electricity consumption and global surface temperature [4]. In image processing, a straightforward way to decompose an image representation into label-pure representations is manual labeling, which is tedious and impractical in most cases. Most conventional approaches to automatic image decomposition have focused on clustering based on low-level visual cues, such as bottom-up image segmentation, in which pixels are locally grouped on the basis of their appearance [5], and top-down image parsing, in which primitives (e.g., rectangles, sketches, and edges) are correlated by a few grammar rules [6]. However, none of these methods takes supervision information into consideration. In practice, online images from the Internet are usually associated with textual labels on the webpage, which can be utilized to guide the decomposition.

In this paper, we focus on the signal decomposition problem with label supervision; that is, given a set of mixed signals, each with the class labels it contains, we decompose each signal into representations specific to its labels. To this end, we utilize the fundamental principle in pattern classification that signals of the same label should be close to each other while those of different labels should be relatively separated in feature space. Based on this assumption, a novel algorithm to numerically decompose mixed signals with labels in a collaborative way is proposed. The decomposition is formulated as an optimization problem incorporating a nonnegativity constraint for practical consideration, due to the nonnegative property of most real-life signals. Furthermore, we present a nonnegative data factorization solution, which is shown to be efficient and to decrease the objective function monotonically. Such a decomposition can be applied to multilabel training samples for pattern classification. The real-data experimental results show that the proposed algorithm can significantly improve multilabel image classification with weak supervision.

2. Notation

In the following, we will use these rules to facilitate presentation: for any matrix $A$, $a^j$ means the $j$th row vector of $A$, its corresponding lowercase version $a_i$ means the $i$th column vector of $A$, and $a_{ji}$ denotes the element of $A$ at the $j$th row and $i$th column. Throughout this paper we will also use the following notation:

$n$: number of total training samples;
$c$: number of labels (classes);
$d$: dimension of the signal representation;
$m$: number of valid tags in the training set;
$y_i$: label vector of the $i$th training sample;
$x_i$: mixed signal representation vector of the $i$th training sample;
$X$: mixed signal representations of all training samples ($X = [x_1, \dots, x_n] \in \mathbb{R}^{d \times n}$);
$h_{i,l}$: decomposed signal representation of the $l$th label in the $i$th training sample;
$h_j$: decomposed representation of the $j$th valid tag in the full training set;
$H$: decomposed signal representations from all training samples ($H = [h_1, \dots, h_m] \in \mathbb{R}^{d \times m}$);
$r_{ji}$: coefficient of normalized vector $h_j$ in reconstructing $x_i$;
$R$: reconstruction coefficient matrix ($R = [r_{ji}] \in \mathbb{R}^{m \times n}$);
$M$: membership matrix denoting the relation between tags and samples, defined in (5).

Here a label represents a class, and a tag is a specific label instance. For a specific sample $i$, its signal representation can be calculated as a $d$-dimensional normalized vector $x_i$ with $\|x_i\|_1 = 1$, where $\|\cdot\|_1$ represents the $\ell_1$-norm operator. The labels of this sample are denoted in a vector $y_i \in \{0, 1\}^c$ with only 0 or 1 elements, where $c$ denotes the number of labels. Let a $d$-dimensional vector $h_{i,l}$ denote the representation of the corresponding part for the $l$th label in the $i$th sample. For a set of $n$ training samples, we collect the representations for all nonzero label instances, that is, "tags," and reindex them as vectors $h_j$, $j = 1, \dots, m$, where $m$ is the number of valid tags in the set, obtained by $m = \sum_{i=1}^{n} \sum_{l=1}^{c} y_i(l)$. Note that $h_j$ are unknown and are the objective to be computed.
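As a concrete illustration, a minimal sketch of the tag enumeration and of the membership matrix $M$ (defined in (5) below) is given next in Python; the function and variable names are our own, not from the paper, and a 0/1 label matrix $Y = [y_1, \dots, y_n]$ of size $c \times n$ is assumed as input.

```python
import numpy as np

def build_tags(Y):
    """Enumerate the valid tags of a 0/1 label matrix Y (c labels x n samples).

    Returns the list of (sample, label) pairs, one per tag, and the
    m x n membership matrix M with M[j, i] = 1 iff tag j belongs to sample i.
    """
    c, n = Y.shape
    tags = [(i, l) for i in range(n) for l in range(c) if Y[l, i] == 1]
    M = np.zeros((len(tags), n))
    for j, (i, _) in enumerate(tags):
        M[j, i] = 1.0                  # each tag belongs to exactly one sample
    return tags, M
```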

3. Problem Formulation

In this section, the above decomposition problem is formulated as a nonnegative data factorization problem. The presented solution is efficient; furthermore, regarding its convergence properties, the monotonic decrease of the objective function can be proved.

With $n$ training samples, to formulate the decomposition problem, we use the following representations: a matrix $X \in \mathbb{R}^{d \times n}$ to enclose the input holistic signal representations in columns and a matrix $H \in \mathbb{R}^{d \times m}$ to enclose the decomposed representations for all the tags, which are also arranged in columns, where $m$ is the total number of tags in the training set. The decomposition problem formulation consists of the following two parts.

(1) The first part is minimizing the error of reconstruction from the decomposed representations to the additive signal representations; that is, for all $i$,

$$\min \Big\| x_i - \sum_{l:\, y_i(l) = 1} h_{i,l} \Big\|^2. \tag{1}$$

Considering that, in the above formulation, the decomposed representations are not normalized, we introduce the coefficients $r_{ji}$ over the reindexed tag vectors $h_j$ and yield

$$\min \sum_{i=1}^{n} \Big\| x_i - \sum_{j=1}^{m} r_{ji} h_j \Big\|^2, \tag{2}$$

where $r_{ji}$ represents the coefficient of $h_j$ in the reconstruction of $x_i$. Using matrix form and an $m \times n$ matrix $R$ with elements $r_{ji}$ to enclose the information of coefficients, the reconstruction error minimization can be formulated as

$$\min_{H,\,R} \|X - HR\|_F^2. \tag{3}$$

Note that a mixed holistic representation should be reconstructed only from the representations of its associated labels. Therefore, some elements of matrix $R$ must be $0$; that is,

$$r_{ji} = 0 \quad \text{if the } j\text{th tag does not belong to the } i\text{th sample}. \tag{4}$$

This information can be derived from the training labels. Here we relax the 0-elements constraint for matrix $R$ and introduce a membership matrix $M \in \{0, 1\}^{m \times n}$:

$$m_{ji} = \begin{cases} 1, & \text{if the } j\text{th tag belongs to the } i\text{th sample}, \\ 0, & \text{otherwise}. \end{cases} \tag{5}$$

Then the minimization problem becomes

$$\min_{H,\,R} \|X - H(R \odot M)\|_F^2, \tag{6}$$

where $\odot$ denotes the elementwise (Hadamard) product.

Note that matrix $M$ will be abandoned later in the proposed solution, where the 0-elements constraint for matrix $R$ is already taken into account.

(2) The second part is minimizing the intralabel distance between the decomposed signal representations of the same label. Here we use a matrix $W \in \mathbb{R}^{m \times m}$ to denote the intralabel information with

$$w_{jk} = \begin{cases} 1, & \text{if } h_j \text{ and } h_k \ (j \neq k) \text{ belong to the same label}, \\ 0, & \text{otherwise}. \end{cases} \tag{7}$$

To force the decomposed representations belonging to the same label to be similar to each other, we will minimize $\sum_{j,k} w_{jk} \|h_j - h_k\|^2$ (here, for convenience, $h_j$ are assumed to be normalized, which will be described later), and we can derive the matrix form

$$\sum_{j,k} w_{jk} \|h_j - h_k\|^2 = 2\,\operatorname{tr}\big(H L H^\top\big), \tag{8}$$

where $L = D - W$ and $D$ is the diagonal matrix with $d_{jj} = \sum_k w_{jk}$.

Representing the context among the decomposed local representations in a graph, as shown in Figure 2, matrix $L$ is therefore the Laplacian matrix of this graph, based on which the graph preserving energy [7] (similar to (8)) can be derived.
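For concreteness, a small sketch of the graph construction follows; it builds $W$, $D$, and $L$ from the (sample, label) tag list produced by the build_tags helper above (our own naming, not the paper's):

```python
import numpy as np

def build_laplacian(tags):
    """Build the intralabel affinity W of (7) and the Laplacian L = D - W
    of (8): two tags are connected iff they carry the same label."""
    m = len(tags)
    W = np.zeros((m, m))
    for j in range(m):
        for k in range(m):
            if j != k and tags[j][1] == tags[k][1]:
                W[j, k] = 1.0
    D = np.diag(W.sum(axis=1))
    return W, D - W
```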

Note that $h_j$ may not be normalized during the optimization. When measuring the distance between different $h_j$ in the graph-based representation, we introduce a matrix $Q$ for normalizing $H$. Since $X$ is known to be normalized in column, by minimizing the reconstruction error with (6), $H(R \odot M)$ can be considered to be approximately normalized in column. Thus $Q$ is designed as a diagonal matrix,

$$Q = \operatorname{diag}(q), \tag{9}$$

where $q$ is an $m$-dimensional vector; that is,

$$q = (R \odot M)\,\mathbf{1}_n, \quad \text{i.e., } q_j = \sum_{i=1}^{n} r_{ji} m_{ji}. \tag{10}$$

Multiplying the columns $h_j$ with the factors $q_j$, respectively, we can see that $HQ$ can be approximately columnwise normalized. If we take into account the normalization factor $Q$, the second part of the objective function, minimizing the intralabel distance, is then yielded as

$$\sum_{j,k} w_{jk} \big\| q_j h_j - q_k h_k \big\|^2 \tag{11}$$

$$= 2\,\operatorname{tr}\big(HQLQH^\top\big). \tag{12}$$

Based on the above two parts, the presented decomposition problem can be formulated as a regularized nonnegative data factorization problem:

$$\min_{H,\,R}\; \|X - H(R \odot M)\|_F^2 + \lambda\,\operatorname{tr}\big(HQLQH^\top\big), \quad \text{s.t. } H \ge 0,\; R \ge 0, \tag{13}$$

where $\lambda$ is a weight to be empirically set in the experiments. The constraints require $H$ and $R$ to be nonnegative to satisfy the reconstruction assumption, since in many real applications the proposed decomposition is physically meaningful only when the signal representations and the reconstruction coefficients are nonnegative. The decomposed label representations can be obtained from the columns of $H$ by solving (13). The solution will be given in the following. In the formulation, the given $X$ is normalized, but there is no constraint that $H$ or $R$ should be normalized; $H$ will be normalized in the following optimization process.
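To keep the formulation concrete, a minimal sketch for evaluating the objective of (13), with $q$ computed as in (10), is given below (helper names are ours):

```python
import numpy as np

def objective(X, H, R, M, W, lam):
    """Evaluate the objective of (13) for given factors H and R."""
    Rm = R * M                               # enforce the membership pattern (5)
    q = Rm.sum(axis=1)                       # q = (R o M) 1_n, cf. (10)
    L = np.diag(W.sum(axis=1)) - W           # Laplacian L = D - W, cf. (8)
    HQ = H * q                               # column j of H scaled by q_j
    recon = np.linalg.norm(X - H @ Rm, 'fro') ** 2
    graph = np.trace(HQ @ L @ HQ.T)
    return recon + lam * graph
```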

4. Optimizing Solution

The problem of (13) falls into the framework of nonnegative matrix factorization (NMF) [8] because of its nonnegativity constraints. NMF has shown its effectiveness in practical signal processing for image classification [9]. Here, similar to the optimization strategy adopted in the previous work by Lee and Seung [8], we optimize the objective function of (13) in an iterative way with multiplicative update rules, which guarantee nonnegativity. For initialization, to get an original guess of the target decomposed representations, we initialize $h_j^{(0)} = x_i \cdot \bar{x}_l$ for the tag $j$ corresponding to the $l$th label of the $i$th sample. Note that "$\cdot$" stands for the elementwise (dot) product in this paper, and $\bar{x}_l$ is the average representation of the $l$th class, computed over the training samples associated with label $l$.
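A sketch of this initialization, under the assumption (ours) that every label occurs in at least one training sample:

```python
import numpy as np

def init_H(X, Y, tags):
    """Initialize each tag column as h_j = x_i * mean_x_l, the elementwise
    product of the sample representation and the class-mean representation."""
    c = Y.shape[0]
    means = np.stack([X[:, Y[l] == 1].mean(axis=1) for l in range(c)], axis=1)
    H = np.stack([X[:, i] * means[:, l] for (i, l) in tags], axis=1)
    return H
```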

Most iterative procedures for solving high-order optimization problems transform the original intractable problem into a set of tractable subproblems and finally converge to a local optimum. The proposed iterative procedure also follows this philosophy and optimizes $H$ and $R$ alternately. The proposed iteration rules can be proved to monotonically decrease the objective function value; the theoretical proof is given in Section 5. In fact, although stationarity is a necessary condition for achieving a local minimum, it is difficult to guarantee that every limit point is a stationary point. Lin [10] has slightly modified the original iterative algorithm of NMF to achieve convergence safely. The modification, however, increases the computational complexity while achieving similar performance, as reported in [10]. Therefore, in this work, we do not adopt the modification, and experimental results show that the proposed solution usually converges to local minima. We give the iterative procedure as follows.

(1) Optimize $H$ for Given $R$. For a fixed $R$, the objective function in (13) with respect to $H$ can be written as

$$\mathcal{O}_H = \operatorname{tr}\big((X - HR)(X - HR)^\top\big) + \lambda\,\operatorname{tr}\big(HQLQH^\top\big). \tag{14}$$

Let $\psi_{ab}$ be the Lagrange multiplier for the constraint $h_{ab} \ge 0$; the Lagrange factor is $\Psi = [\psi_{ab}]$, and the Lagrange function is

$$\mathcal{L}_H = \mathcal{O}_H + \operatorname{tr}\big(\Psi H^\top\big). \tag{15}$$

The partial derivative of $\mathcal{L}_H$ with respect to $H$ is

$$\frac{\partial \mathcal{L}_H}{\partial H} = -2XR^\top + 2HRR^\top + 2\lambda HQLQ + \Psi, \tag{16}$$

where we have used the following deductions:

$$\frac{\partial\operatorname{tr}\big(XR^\top H^\top\big)}{\partial H} = XR^\top, \qquad \frac{\partial\operatorname{tr}\big(HRR^\top H^\top\big)}{\partial H} = 2HRR^\top, \qquad \frac{\partial\operatorname{tr}\big(HQLQH^\top\big)}{\partial H} = 2HQLQ. \tag{17}$$

Using the Karush-Kuhn-Tucker (KKT) condition [11] $\psi_{ab} h_{ab} = 0$, from (16) the following equations can be obtained for $h_{ab}$:

$$\big({-XR^\top} + HRR^\top + \lambda HQLQ\big)_{ab}\, h_{ab} = 0, \tag{18}$$

which, with $L = D - W$, is equivalent to

$$\big(XR^\top + \lambda HQWQ\big)_{ab}\, h_{ab} = \big(HRR^\top + \lambda HQDQ\big)_{ab}\, h_{ab} \tag{19}$$

and leads to the update rule

$$h_{ab} \leftarrow h_{ab}\, \frac{\big(XR^\top + \lambda HQWQ\big)_{ab}}{\big(HRR^\top + \lambda HQDQ\big)_{ab}}. \tag{20}$$

Remember that $r_{ji}^{(t)} = 0$ holds for every iteration if $r_{ji}^{(0)} = 0$, where $(t)$ means the $t$th iteration. Note that constraint (4) could thus be satisfied by initializing $r_{ji}^{(0)} = 0$ if $m_{ji} = 0$; therefore $M$ actually can be neglected without any influence. It is obvious that the updated $H$ is still nonnegative if the matrices $X$ and $R$ are nonnegative. After getting $H$, we normalize the column vectors of $H$ and consequently convey the norms to the coefficient matrix; namely,

$$h_j \leftarrow \frac{h_j}{\|h_j\|_1}, \qquad r^j \leftarrow \|h_j\|_1\, r^j, \qquad j = 1, \dots, m. \tag{22}$$

The above updating of $H$ and $R$ does not change the value of the objective function in (13).
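In code, one $H$ step of (20) followed by the rescaling (22) might look as follows (a sketch under our notation; eps is a small constant we add to avoid division by zero):

```python
import numpy as np

def update_H(X, H, R, W, lam, eps=1e-12):
    """Multiplicative update of H, cf. (20), then the rescaling (22)."""
    q = R.sum(axis=1)                   # Q = diag(R 1_n); M is absorbed in R's zero pattern
    D = np.diag(W.sum(axis=1))
    HQ = H * q                          # column j of H scaled by q_j
    num = X @ R.T + lam * (HQ @ W) * q  # X R^T + lam * H Q W Q
    den = H @ (R @ R.T) + lam * (HQ @ D) * q + eps
    H = H * (num / den)
    norms = H.sum(axis=0) + eps         # l1 norms of the nonnegative columns
    H = H / norms                       # (22): normalize the columns of H ...
    R = R * norms[:, None]              # ... and convey the norms to R
    return H, R
```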

(2) Optimize $R$ for Given Normalized $H$. Then, based on the normalized $H$ in (22), the objective function in (13) with respect to $R$ for given $H$ can be written as

$$\mathcal{O}_R = \operatorname{tr}\big((X - HR)^\top (X - HR)\big). \tag{23}$$

Let $\gamma_{ji}$ be the Lagrange multiplier for the constraint $r_{ji} \ge 0$; with $\Gamma = [\gamma_{ji}]$, the Lagrange function is

$$\mathcal{L}_R = \mathcal{O}_R + \operatorname{tr}\big(\Gamma R^\top\big). \tag{24}$$

The partial derivative of $\mathcal{L}_R$ with respect to $R$ is

$$\frac{\partial \mathcal{L}_R}{\partial R} = -2H^\top X + 2H^\top HR + \Gamma. \tag{25}$$

Using the KKT condition $\gamma_{ji} r_{ji} = 0$, from (25) we obtain the following equations for $r_{ji}$:

$$\big({-H^\top X} + H^\top HR\big)_{ji}\, r_{ji} = 0, \tag{26}$$

which leads to the following update rule:

$$r_{ji} \leftarrow r_{ji}\, \frac{\big(H^\top X\big)_{ji}}{\big(H^\top HR\big)_{ji}}. \tag{27}$$
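A corresponding sketch of one $R$ step, which automatically keeps the zero pattern required by (4):

```python
import numpy as np

def update_R(X, H, R, eps=1e-12):
    """Multiplicative update of R, cf. (27). Entries initialized to zero
    where m_ji = 0 remain zero, so constraint (4) holds implicitly."""
    return R * ((H.T @ X) / (H.T @ H @ R + eps))
```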

5. Convergence Proof of the Proposed Solution

Definition 1. Function $G(h, h')$ is an auxiliary function for function $F(h)$ if the conditions

$$G(h, h') \ge F(h), \qquad G(h, h) = F(h) \tag{28}$$

are satisfied.

Theorem 2. If $G$ is an auxiliary function of $F$, then $F$ is nonincreasing under the update

$$h^{(t+1)} = \arg\min_{h} G\big(h, h^{(t)}\big), \tag{29}$$

where $(t)$ means the $t$th iteration.

5.1. Convergence of the Update Rule for H

Firstly, let us go back to (14) and consider an arbitrary element $h_{ab}$ of matrix $H$. Using $F_{ab}$ to denote the part of $\mathcal{O}_H$ relevant to $h_{ab}$, we can get

$$F'_{ab} = \big({-2XR^\top} + 2HRR^\top + 2\lambda HQLQ\big)_{ab}, \tag{30}$$

$$F''_{ab} = 2\big(RR^\top\big)_{bb} + 2\lambda\big(QLQ\big)_{bb}. \tag{31}$$

The auxiliary function of $F_{ab}$ is designed as

$$G\big(h, h^{(t)}_{ab}\big) = F_{ab}\big(h^{(t)}_{ab}\big) + F'_{ab}\big(h^{(t)}_{ab}\big)\big(h - h^{(t)}_{ab}\big) + \frac{\big(HRR^\top + \lambda HQDQ\big)_{ab}}{h^{(t)}_{ab}}\big(h - h^{(t)}_{ab}\big)^2. \tag{32}$$

Lemma 3. Equation (32) is an auxiliary function for $F_{ab}$, the part of $\mathcal{O}_H$ only relevant to $h_{ab}$.

Proof. Since $G(h, h) = F_{ab}(h)$ is obvious, we need only show that $G(h, h^{(t)}_{ab}) \ge F_{ab}(h)$.
To this aim, we compare the Taylor series expansion of $F_{ab}(h)$,

$$F_{ab}(h) = F_{ab}\big(h^{(t)}_{ab}\big) + F'_{ab}\big(h^{(t)}_{ab}\big)\big(h - h^{(t)}_{ab}\big) + \tfrac{1}{2} F''_{ab}\,\big(h - h^{(t)}_{ab}\big)^2,$$

with (32) to find that $G(h, h^{(t)}_{ab}) \ge F_{ab}(h)$ is equivalent to

$$\frac{\big(HRR^\top + \lambda HQDQ\big)_{ab}}{h^{(t)}_{ab}} \ge \big(RR^\top\big)_{bb} + \lambda\big(QLQ\big)_{bb}, \tag{33}$$

where

$$\big(HRR^\top\big)_{ab} = \sum_{k} h^{(t)}_{ak}\big(RR^\top\big)_{kb} \ge h^{(t)}_{ab}\big(RR^\top\big)_{bb}, \qquad \lambda\big(HQDQ\big)_{ab} \ge \lambda\, h^{(t)}_{ab}\big(QDQ\big)_{bb} \ge \lambda\, h^{(t)}_{ab}\big(QLQ\big)_{bb}.$$

Thus, (33) holds and $G(h, h^{(t)}_{ab}) \ge F_{ab}(h)$.

Lemma 4. Equation (20) can be obtained by minimizing the auxiliary function $G(h, h^{(t)}_{ab})$ in (32).

Proof. By setting $\partial G(h, h^{(t)}_{ab})/\partial h = 0$, we have

$$h^{(t+1)}_{ab} = h^{(t)}_{ab} - h^{(t)}_{ab}\,\frac{F'_{ab}\big(h^{(t)}_{ab}\big)}{2\big(HRR^\top + \lambda HQDQ\big)_{ab}} = h^{(t)}_{ab}\,\frac{\big(XR^\top + \lambda HQWQ\big)_{ab}}{\big(HRR^\top + \lambda HQDQ\big)_{ab}}.$$

Then the update rule for $H$ in (20) is obtained.

5.2. Convergence of the Update Rule for R

Let us go back to (23) and consider an arbitrary element $r_{ji}$ of matrix $R$. Now $F_{ji}$ is used to denote the part of $\mathcal{O}_R$ relevant to $r_{ji}$. It is easy to check that

$$F'_{ji} = \big({-2H^\top X} + 2H^\top HR\big)_{ji}, \qquad F''_{ji} = 2\big(H^\top H\big)_{jj}. \tag{36}$$

The auxiliary function of $F_{ji}$ is designed as

$$G\big(r, r^{(t)}_{ji}\big) = F_{ji}\big(r^{(t)}_{ji}\big) + F'_{ji}\big(r^{(t)}_{ji}\big)\big(r - r^{(t)}_{ji}\big) + \frac{\big(H^\top HR\big)_{ji}}{r^{(t)}_{ji}}\big(r - r^{(t)}_{ji}\big)^2. \tag{37}$$

Lemma 5. Equation (37) is an auxiliary function for $F_{ji}$, the part of $\mathcal{O}_R$ only relevant to $r_{ji}$.

Proof. Since $G(r, r) = F_{ji}(r)$ is obvious, we need only show that $G(r, r^{(t)}_{ji}) \ge F_{ji}(r)$. To do this, we compare the Taylor series expansion of $F_{ji}(r)$ with (37), and $G(r, r^{(t)}_{ji}) \ge F_{ji}(r)$ is equivalent to

$$\frac{\big(H^\top HR\big)_{ji}}{r^{(t)}_{ji}} \ge \big(H^\top H\big)_{jj}, \tag{39}$$

where

$$\big(H^\top HR\big)_{ji} = \sum_{k}\big(H^\top H\big)_{jk}\, r^{(t)}_{ki} \ge \big(H^\top H\big)_{jj}\, r^{(t)}_{ji}.$$

Thus, (39) holds and $G(r, r^{(t)}_{ji}) \ge F_{ji}(r)$.

Lemma 6. Equation (27) can be obtained by minimizing the auxiliary function $G(r, r^{(t)}_{ji})$ in (37).

Proof. By setting $\partial G(r, r^{(t)}_{ji})/\partial r = 0$, we can get

$$r^{(t+1)}_{ji} = r^{(t)}_{ji} - r^{(t)}_{ji}\,\frac{F'_{ji}\big(r^{(t)}_{ji}\big)}{2\big(H^\top HR\big)_{ji}} = r^{(t)}_{ji}\,\frac{\big(H^\top X\big)_{ji}}{\big(H^\top HR\big)_{ji}}.$$

Then the update rule for $R$ in (27) is obtained.
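Putting the pieces together, the alternating procedure of Section 4 can be sketched as below, reusing the helper sketches given earlier; initializing $R$ with the membership pattern of $M$ is our assumption, and the returned history of objective values can be used to monitor the monotonic decrease established above:

```python
import numpy as np

def decompose(X, Y, lam=0.1, n_iter=200):
    """Alternate the H step (20)/(22) and the R step (27) on the objective (13)."""
    tags, M = build_tags(Y)
    W, _ = build_laplacian(tags)
    H = init_H(X, Y, tags)
    R = M.copy()                       # zeros where m_ji = 0, so (4) holds throughout
    history = []
    for _ in range(n_iter):
        H, R = update_H(X, H, R, W, lam)
        R = update_R(X, H, R)
        history.append(objective(X, H, R, M, W, lam))  # monitor the decrease
    return H, R, history
```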

6. Experiment

A typical application of the proposed decomposition in real life is as follows: for a set of images related to a scene, such as a campus, each associated with multiple labels, we can decompose the image feature representations into pure label representations to provide better label samples for modeling object categories, which therefore can facilitate object recognition, detection, and so on. This leads to a useful tool for the automatic labeling of online images or personal photo albums. There are two main reasons why the proposed decomposition is necessary here. First, with limited training samples, it is difficult to learn good label models using the original image representations directly, without decomposition. Since each sample is a mixture of several labels and the background for each label varies, it is better to decompose the image representation across its labels optimally. Second, instances of the same label related to a scene usually have small intrasubspace distances, which supports the assumption of the proposed decomposition and guarantees its effectiveness.

To demonstrate a practical application of the proposed decomposition in automatic image annotation, we manually crawled images of "KAIST campus" from the Flickr photo sharing website [12], excluded the irrelevant images, and obtained the experimental image set (we share the data at http://rcv.kaist.ac.kr/~tengli/Teng_resource.html). The images are labeled with nine concepts: "sky," "plant (grass/tree/flower)," "water," "ground," "man-made (building/sign/road)," "car/bus," "bicycle," "person," and "animal." One part of the images is used as training samples, on which the class models are trained and the proposed decomposition is applied, and the rest are used for testing. Labels are highly mixed in the images.

In the implementation, the images and regions are represented using the Bag-of-Words (BoW) feature [13], where the number of visual words, which determines the feature dimension $d$, is set in the experiments as a tradeoff between classification accuracy and computational cost. With the decomposed label representations from the training images, that is, the normalized matrix $H$, a typical classifier, the support vector machine (SVM) [14], is applied to learn label models and classify the testing samples. In the experiments on multilabel image classification, we adopt the area under the ROC curve (AUC) measurement [15].
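A hypothetical sketch of this evaluation pipeline follows (scikit-learn is our choice here; the paper does not specify the SVM kernel, and we assume one linear SVM per label, trained on the columns of $H$, with every label having both positive and negative tags):

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import roc_auc_score

def evaluate(H, tags, X_test, Y_test, c):
    """Train one SVM per label on the decomposed tag representations
    and report the mean test AUC over the c labels."""
    tag_labels = np.array([l for (_, l) in tags])
    aucs = []
    for l in range(c):
        y = (tag_labels == l).astype(int)          # columns of label l vs. the rest
        clf = SVC(kernel='linear', probability=True).fit(H.T, y)
        scores = clf.predict_proba(X_test.T)[:, 1]
        aucs.append(roc_auc_score(Y_test[l], scores))
    return float(np.mean(aucs))
```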

Table 1 gives the performance comparison between image classification adopting the proposed nonnegative signal decomposition and the classic BoW method [13]. To further examine the performance, in addition to the well-known SVM, we also apply the recently popular Sparse Representation Classification (SRC) [16] as the classifier. Training with the original BoW image representations, without applying decomposition, yields AUCs as low as around 0.5, whereas training label models with the decomposed results yields AUCs on the test samples of 0.77 and 0.76, in combination with SVM and SRC, respectively. Figure 3 gives the multilabel classification results for some sample images of the KAIST campus Flickr dataset. The results appear to be promising.

7. Conclusion

A new technique for nonnegative signal decomposition exploiting supervised additive label information has been developed in this paper, under the assumption that intralabel signals should be relatively closer to each other than interlabel signals. The nonnegativity constraint makes it applicable to many practical cases. We formulated the decomposition as an optimization problem and proposed an efficient iterative solution based on nonnegative data factorization. The convergence analysis showed that the solution guarantees a monotonic decrease of the objective function.