Abstract

A low-rank matrix is desired in many machine learning and computer vision problems. Most recent studies use the nuclear norm as a convex surrogate of the rank operator. However, the nuclear norm simply adds all singular values together, so the rank may not be well approximated in practical problems. In this paper, we propose using a log-determinant (LogDet) function as a smooth and closer, though nonconvex, approximation to the rank for obtaining a low-rank representation in subspace clustering. An augmented Lagrange multiplier strategy is applied to iteratively optimize the LogDet-based nonconvex objective function on potentially large-scale data. By making use of the angular information of the principal directions of the resulting low-rank representation, an affinity graph matrix is constructed for spectral clustering. Experimental results on motion segmentation and face clustering data demonstrate that the proposed method often outperforms state-of-the-art subspace clustering algorithms.

1. Introduction

Matrix rank minimization [1] is ubiquitous in machine learning, computer vision, control, signal processing, and system identification. For instance, low-rank representation based subspace clustering [2-4] and matrix completion [5, 6] methods have achieved great success recently. Subspace clustering [7] is one of the fundamental topics, with numerous applications, for example, image representation [8, 9], face clustering [3, 10], and motion segmentation [11, 12]. It is assumed that high-dimensional data are more likely to lie in a union of low-dimensional subspaces than in one individual subspace. For example, different subspaces are needed to describe the trajectories of different moving objects in a video sequence. Subspace clustering is an intrinsically difficult problem, since we need to simultaneously cluster all data points into multiple groups and find a low-dimensional subspace fitting each group of points.

Subspace clustering has been an active research topic over the past decades. Four main categories of methods have been proposed [10]: iterative, algebraic, statistical, and spectral clustering-based methods. The first three kinds of approaches are sensitive to initialization, noise, and outliers; in addition, they are difficult to optimize [10]. Spectral clustering-based methods have achieved promising performance; their key step is to learn a good affinity matrix of the data points. For instance, local subspace affinity (LSA) [13], locally linear manifold clustering (LLMC) [14], and spectral local best-fit flats (SLBF) [15] use local information around each point to construct the affinity matrix, while the spectral curvature clustering (SCC) [16] method preserves the global structure of the whole dataset when deriving the affinity matrix. Subsequently, $k$-means [17] or Normalized Cuts (NCuts) [18, 19] is applied to the affinity matrix to obtain the clustering results.
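To make the spectral clustering-based pipeline concrete, the following minimal sketch (ours, assuming scikit-learn is available) produces cluster labels from a precomputed affinity matrix W; scikit-learn's spectral clustering uses a normalized graph Laplacian, which plays the same role as NCuts [18] although the details differ.

```python
# Minimal sketch of the affinity-matrix-to-labels step, assuming scikit-learn.
# W is any symmetric nonnegative (n x n) affinity matrix; k is the number of clusters.
import numpy as np
from sklearn.cluster import SpectralClustering

def cluster_from_affinity(W, k):
    model = SpectralClustering(n_clusters=k, affinity="precomputed", random_state=0)
    return model.fit_predict(W)   # returns an array of n cluster labels
```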

Recently, spectral clustering-based methods such as sparse subspace clustering (SSC) [10] and low-rank representation (LRR) [3] have been proposed and achieve state-of-the-art results in subspace clustering. SSC represents each data point as a sparse linear combination of the other points and solves an $\ell_1$-norm regularized minimization problem to induce sparsity. SSC shows promising results if the subspaces are either independent or disjoint [20].

The basic idea of LRR is to learn a low-rank representation of the data that captures the global Euclidean structure of the whole dataset. In this scheme, each data point is represented as a linear combination of the examples in the data matrix itself, and a convex nuclear norm minimization is used as a surrogate of the rank function to obtain the desired low-rank representation. Although this optimization is well studied and has a global optimum, its performance may be far from optimal in real applications because the nuclear norm might not be a good approximation to the rank function. In the rank function all nonzero singular values contribute equally, whereas the nuclear norm treats them differently by simply adding them together. As a result, the nuclear norm may be dominated by a few very large singular values and deviate significantly from the true rank. Several papers have considered this drawback of the nuclear norm and designed methods to alleviate it by thresholding or removing some of the singular values; for instance, singular value thresholding [21] and the truncated nuclear norm [6] both considerably enhance the performance of matrix completion.

In this paper, we propose using a log-determinant (LogDet) function for rank approximation and study its minimization in subspace clustering. Different from nuclear norm-based approaches, which minimize the sum of all singular values, our approach aims to approximate the rank by making the contribution of a large singular value close to one and that of a small singular value close to zero. In this way, we obtain a closer and more robust approximation to the rank function than the nuclear norm. Since the LogDet function is nonconvex, we apply the method of augmented Lagrange multipliers (ALM) to solve the associated optimization for potentially large-scale applications, in which the subproblem for minimizing the LogDet function in each iteration has a closed-form solution. To demonstrate the effectiveness of our LogDet minimization method, we apply it to subspace clustering. By employing a rather simple formulation based on the LogDet function, we obtain a low-rank representation for subspace clustering. Subsequently, we exploit the angular information of the principal directions of this representation to further enhance the separation ability of the affinity matrix. In summary, the main contributions of this work include the following.
(i) A more accurate and robust rank approximation is used to obtain the low-rank representation, which is able to capture the global structure of the dataset.
(ii) An iterative optimization algorithm is designed for minimizing the objective function based on this rank approximation. Theoretical analysis shows that our algorithm converges to a stationary point. The proposed optimization method is then applied to subspace clustering.
(iii) Angular information of the principal directions of the low-rank representation is employed to further exploit the intrinsic local geometrical structure relevant to the membership of data points.
(iv) Extensive experiments demonstrate the effectiveness of the proposed LogDet minimization method for rank approximation. In particular, when used for subspace clustering, our simple formulation shows favorable performance compared to other state-of-the-art methods, although we do not explicitly account for outliers in our model, which demonstrates the robustness of our approach.
The remainder of the paper is organized as follows. Section 2 provides a brief review of LRR and SSC. In Section 3, we present the proposed approximation and design an efficient optimization scheme. We give the convergence analysis in Section 4. Experimental results are shown in Section 5. Finally, conclusions are drawn in Section 6.

2. Review of LRR and SSC

In this section, we give a brief review of SSC and LRR.

Let $X = [x_1, x_2, \ldots, x_n] \in \mathbb{R}^{d \times n}$ be a set of $d$-dimensional data points drawn from an unknown union of $k$ linear subspaces $\{S_i\}_{i=1}^{k}$. The task of subspace clustering is to segment the data points into the $k$ subspaces.

LRR tries to seek the lowest rank representation among many possible linear combinations of the bases in a given dictionary, which typically is the data matrix itself. The problem can be formulated as
$$\min_{Z} \ \operatorname{rank}(Z) \quad \text{s.t.} \quad X = XZ, \quad (1)$$
where $Z = [z_1, z_2, \ldots, z_n]$ is the coefficient matrix with each $z_i$ being the representation of $x_i$. The above problem is NP-hard due to the combinatorial nature of the rank function.

The tightest convex relaxation of the rank function [22] is the nuclear norm. For a matrix $A$, its nuclear norm is defined as $\|A\|_* = \sum_i \sigma_i(A)$, where $\sigma_i(A)$ means the $i$th singular value of $A$. Using this relaxation, LRR solves the following problem:
$$\min_{Z} \ \|Z\|_* \quad \text{s.t.} \quad X = XZ. \quad (2)$$
After obtaining $Z$, the affinity matrix is defined as
$$W = |Z| + |Z^{T}|. \quad (3)$$
Then the spectral clustering algorithm Normalized Cuts [18] is used to produce the final segmentation.

SSC aims to find a sparse representation of $X$ by solving the following convex optimization problem:
$$\min_{C, E, N} \ \|C\|_1 + \lambda_E \|E\|_1 + \frac{\lambda_N}{2}\|N\|_F^2 \quad \text{s.t.} \quad X = XC + E + N, \ \operatorname{diag}(C) = 0, \quad (4)$$
where $C$ is the sparse coefficient matrix, $E$ is a sparse matrix containing the gross errors, the constraint $\operatorname{diag}(C) = 0$ excludes trivial solutions, and $N$ is a matrix of fitting residuals. After obtaining $C$, the subsequent procedures are similar to LRR.

3. LogDet Rank Approximation and Its Minimization Algorithm

A function $f : \mathbb{R}^n \to \mathbb{R}$ is absolutely symmetric if $f(x)$ is invariant under arbitrary permutations and sign changes of the elements of $x$. Based on this function $f$, we have the following theorem [23].

Theorem 1. The function $F(Z) = f(\sigma(Z))$ is unitarily invariant if $f$ is absolutely symmetric, where $Z \in \mathbb{R}^{m \times n}$, whose singular value decomposition is $Z = U \operatorname{diag}(\sigma(Z)) V^{T}$, $\sigma(Z) = (\sigma_1, \ldots, \sigma_n)$ are the singular values of $Z$, and $f$ is differentiable. The gradient of $F(Z)$ at $Z$ is
$$\frac{\partial F(Z)}{\partial Z} = U \operatorname{diag}(\theta) V^{T}, \quad (5)$$
where $\theta = \partial f(\sigma)/\partial \sigma$ evaluated at $\sigma = \sigma(Z)$.

Equation (5) can be obtained directly from Theorem 3.1 of [23].

In this work, we utilize the unitarily invariant LogDet function to achieve a closer, though nonconvex, rank relaxation than the nuclear norm, and we apply the ALM method to the minimization associated with this LogDet rank approximation. To explain our method, we specifically consider using LogDet as a rank surrogate in subspace clustering. We first obtain a low-rank representation of the high-dimensional data based on the LogDet optimization. Then we construct an affinity graph matrix for spectral clustering by using the angular information of the principal directions of the low-rank representation.

3.1. LogDet Rank Minimization

We use $L(Z) = \log\det\big((I + Z^{T}Z)^{1/2}\big)$ as a surrogate of the rank function of $Z$, where $I$ is the identity matrix. It is obvious that $L(Z) = \frac{1}{2}\sum_i \log\big(1 + \sigma_i^2(Z)\big)$. Because it can be easily verified that, for any $\sigma \geq 0$, we always have $\frac{1}{2}\log(1 + \sigma^2) \leq \sigma$, the LogDet function never exceeds the nuclear norm; in particular, if there are large nonzero singular values, the LogDet function will be much smaller than the nuclear norm since $\frac{1}{2}\log(1 + \sigma^2) \ll \sigma$ for a large $\sigma$. It is noted that, for small nonzero singular values, their contribution to the LogDet function is also significantly reduced compared to the nuclear norm. Because small nonzero singular values are often regarded as coming from noise in the data, the LogDet function suppresses the noise effect more than the nuclear norm does.
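As a quick numerical illustration (ours, using the LogDet surrogate in the form written above, which is a reconstruction of the paper's definition), the sketch below compares the rank, the nuclear norm, and the LogDet value on a rank-3 matrix with one dominant singular value; the nuclear norm is dominated by that value, while the LogDet stays on the scale of the rank.

```python
import numpy as np

def logdet_surrogate(Z):
    # (1/2) * sum_i log(1 + sigma_i(Z)^2), the surrogate assumed above
    s = np.linalg.svd(Z, compute_uv=False)
    return 0.5 * np.sum(np.log1p(s ** 2))

rng = np.random.default_rng(0)
U = np.linalg.qr(rng.standard_normal((50, 3)))[0]   # 50 x 3, orthonormal columns
V = np.linalg.qr(rng.standard_normal((50, 3)))[0]
Z = U @ np.diag([100.0, 1.0, 1.0]) @ V.T            # rank 3, singular values 100, 1, 1

print(np.linalg.matrix_rank(Z))                      # 3
print(np.linalg.svd(Z, compute_uv=False).sum())      # nuclear norm, about 102
print(logdet_surrogate(Z))                           # about 5.3, much closer to the rank
```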

It is worthwhile to note that a similar function was proposed in [24] to approximate the rank, with iterative linearization used to find a local minimum. However, that formulation involves a very small constant $\delta$, which leads to a biased approximation for the small singular values.

This LogDet function is differentiable with respect to the singular values by Theorem 1, and even though it is nonconvex, its minimization is rather simple using our optimization method. To explain its minimization, we consider its specific application to subspace clustering. By employing the above LogDet function, we formulate subspace clustering as the following unconstrained nonconvex minimization problem:
$$\min_{Z} \ \log\det\big((I + Z^{T}Z)^{1/2}\big) + \frac{\lambda}{2}\|X - XZ\|_F^2, \quad (6)$$
where $\lambda > 0$ is a trade-off parameter. The first term of (6) minimizes the rank of $Z$, while the second is a relaxation of $X = XZ$, which is referred to as the self-expressiveness of $X$, with $Z$ representing the similarity between data points. Because the LogDet function is not convex in $Z$, we resort to the ALM technique to solve (6), rewriting it with an auxiliary variable $J$ as follows:
$$\min_{Z, J} \ \log\det\big((I + Z^{T}Z)^{1/2}\big) + \frac{\lambda}{2}\|X - XJ\|_F^2 \quad \text{s.t.} \quad Z = J. \quad (7)$$

We then minimize the following augmented Lagrangian function:
$$\mathcal{L}_{\mu}(Z, J, Y) = \log\det\big((I + Z^{T}Z)^{1/2}\big) + \frac{\lambda}{2}\|X - XJ\|_F^2 + \langle Y, Z - J\rangle + \frac{\mu}{2}\|Z - J\|_F^2, \quad (8)$$
where $\mu > 0$ is a penalty parameter and $Y$ is the Lagrangian dual variable. With a sufficiently large $\mu$, the objective function in (8) converges to the objective function in (6). Problem (8) can be solved by updating $J$, $Z$, and $Y$ alternately while fixing the other variables. Specifically, assume that at the $k$th iteration we have obtained $J^{k}$, $Z^{k}$, and $Y^{k}$; then, for the $(k+1)$th iteration, optimization problem (8) can be updated via the following four steps.

Step 1. Compute $J^{k+1}$. Fix $Z^{k}$ and $Y^{k}$ and then calculate $J^{k+1}$ by
$$J^{k+1} = \arg\min_{J} \ \frac{\lambda}{2}\|X - XJ\|_F^2 + \langle Y^{k}, Z^{k} - J\rangle + \frac{\mu^{k}}{2}\|Z^{k} - J\|_F^2, \quad (9)$$
which has a closed-form solution:
$$J^{k+1} = \big(\lambda X^{T}X + \mu^{k} I\big)^{-1}\big(\lambda X^{T}X + \mu^{k} Z^{k} + Y^{k}\big). \quad (10)$$

Step 2. Compute $Z^{k+1}$. Fix $J^{k+1}$ and $Y^{k}$ and minimize as follows:
$$Z^{k+1} = \arg\min_{Z} \ \log\det\big((I + Z^{T}Z)^{1/2}\big) + \frac{\mu^{k}}{2}\Big\|Z - \Big(J^{k+1} - \frac{Y^{k}}{\mu^{k}}\Big)\Big\|_F^2. \quad (11)$$
This can be converted to a scalar minimization problem thanks to the following theorem. As we note, it can also be viewed as a special case of the problem studied in a recent work [25].

Theorem 2. For a unitarily invariant function $F(Z) = f(\sigma(Z))$, assuming the SVD of $A \in \mathbb{R}^{m \times n}$ is $A = U \Sigma_A V^{T}$ with $\Sigma_A = \operatorname{diag}(\sigma_A)$, the optimal solution to the problem
$$\min_{Z} \ F(Z) + \frac{\mu}{2}\|Z - A\|_F^2 \quad (12)$$
is $Z^{*} = U \Sigma_Z^{*} V^{T}$, with $\Sigma_Z^{*} = \operatorname{diag}(\sigma^{*})$ obtained by solving the scalar minimization problems
$$\sigma_i^{*} = \arg\min_{\sigma_i \geq 0} \ f(\sigma_i) + \frac{\mu}{2}\big(\sigma_i - \sigma_{A,i}\big)^2, \quad i = 1, \ldots, \min(m, n). \quad (13)$$

Proof. Let $A = U \Sigma_A V^{T}$ be the SVD of $A$ and denote $D = U^{T} Z V$, which has exactly the same singular values as $Z$, that is, $\sigma(D) = \sigma(Z)$. We then have
$$F(Z) + \frac{\mu}{2}\|Z - A\|_F^2 = F(D) + \frac{\mu}{2}\big(\|D\|_F^2 - 2\langle D, \Sigma_A\rangle + \|\Sigma_A\|_F^2\big) \geq f(\sigma(D)) + \frac{\mu}{2}\big(\|\sigma(D)\|_2^2 - 2\langle \sigma(D), \sigma(A)\rangle + \|\sigma(A)\|_2^2\big) = f(\sigma(Z)) + \frac{\mu}{2}\|\sigma(Z) - \sigma(A)\|_2^2.$$
In the above, the first equality holds because the Frobenius norm is unitarily invariant and $F(Z) = F(D)$ since $F$ is unitarily invariant; the inequality is true by von Neumann's trace inequality, $\langle D, \Sigma_A\rangle \leq \langle \sigma(D), \sigma(A)\rangle$, and can also be obtained from the Hoffman-Wielandt inequality. Therefore, the rightmost expression, minimized over the singular values as in (13), is a lower bound of the objective in (12). Equality in the von Neumann step is attained when $D$ is diagonal with its singular values arranged in the same order as $\sigma(A)$, that is, $D = \Sigma_Z$. Because $\sigma(D) = \sigma(Z)$, the SVD of the minimizer is $Z^{*} = U \Sigma_Z^{*} V^{T}$, which is the minimizer of problem (12). Hence the proof is completed.

The first-order optimality condition is that the gradient of (13) with respect to each singular value should vanish. Thus, for subproblem (11), with $f(\sigma_i) = \frac{1}{2}\log(1 + \sigma_i^2)$ and $A = J^{k+1} - Y^{k}/\mu^{k}$, we have
$$\frac{\sigma_i}{1 + \sigma_i^2} + \mu^{k}\big(\sigma_i - \sigma_{A,i}\big) = 0, \quad \text{that is,} \quad \mu^{k}\sigma_i^3 - \mu^{k}\sigma_{A,i}\sigma_i^2 + \big(1 + \mu^{k}\big)\sigma_i - \mu^{k}\sigma_{A,i} = 0, \quad (23)$$
where the SVD of $A$ is $U \Sigma_A V^{T}$. The above equation is cubic in $\sigma_i$ and gives three roots. In addition, we need to enforce the nonnegativity of $\sigma_i$; it is easily seen that there exists at least one nonnegative real root, and the minimizer is unique if $\mu^{k}$ is sufficiently large. Finally, we obtain the update $Z^{k+1} = U \operatorname{diag}(\sigma^{*}) V^{T}$, where each $\sigma_i^{*}$ is the nonnegative root of (23) achieving the smallest objective value in (13).
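A small sketch of this scalar step (ours, again assuming $f(\sigma) = \frac{1}{2}\log(1+\sigma^2)$ as reconstructed above): for each singular value of $A$ it forms the cubic (23) with numpy.roots and keeps the nonnegative root, or zero, with the smallest objective value.

```python
import numpy as np

def logdet_prox_singular_values(sigma_A, mu):
    """Solve min_{s >= 0} 0.5*log(1+s^2) + (mu/2)*(s - a)^2 for each a in sigma_A."""
    def obj(s, a):
        return 0.5 * np.log1p(s ** 2) + 0.5 * mu * (s - a) ** 2

    out = np.zeros(len(sigma_A))
    for i, a in enumerate(sigma_A):
        # First-order condition: mu*s^3 - mu*a*s^2 + (1 + mu)*s - mu*a = 0.
        roots = np.roots([mu, -mu * a, 1.0 + mu, -mu * a])
        cands = [0.0] + [r.real for r in roots if abs(r.imag) < 1e-10 and r.real >= 0]
        out[i] = min(cands, key=lambda s: obj(s, a))
    return out
```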

Step 3. Compute $Y^{k+1}$. Fix $Z^{k+1}$ and $J^{k+1}$, and then update the multiplier as follows:
$$Y^{k+1} = Y^{k} + \mu^{k}\big(Z^{k+1} - J^{k+1}\big).$$

Step 4. Update $\mu$ as $\mu^{k+1} = \rho\mu^{k}$ with $\rho > 1$. The complete procedure is summarized in Algorithm 1.
Problem (6) is nonconvex, and it is difficult to give a rigorous mathematical argument for convergence to a (local) optimum. We will provide a theoretical proof that our algorithm converges to an accumulation point and that this accumulation point is a stationary point. Our empirical experiments confirm the convergence of the proposed method on the benchmark datasets. The experimental results are promising, despite the fact that the solution obtained by the proposed optimization method may only be a local optimum.

Input: data matrix $X$, parameters $\lambda$, $\rho > 1$, and $\mu^{0}$.
Initialize:  $Z^{0} = J^{0} = 0$, $Y^{0} = 0$.
Repeat
(1) Update $J^{k+1}$ as:
   $J^{k+1} = (\lambda X^{T}X + \mu^{k} I)^{-1}(\lambda X^{T}X + \mu^{k} Z^{k} + Y^{k})$.
(2) Solve $Z^{k+1}$ using (11) and (23).
(3) Update the Lagrange multiplier $Y$ and the penalty parameter $\mu$:
   $Y^{k+1} = Y^{k} + \mu^{k}(Z^{k+1} - J^{k+1})$,
   $\mu^{k+1} = \rho\mu^{k}$.
Until stopping criterion is satisfied.
Return  $Z^{*}$.
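For concreteness, here is a compact NumPy sketch of the loop in Algorithm 1 under the formulation reconstructed above (data-fit term on the auxiliary variable J, LogDet term on Z); it reuses logdet_prox_singular_values from the previous sketch, and the default parameter values and stopping rule are placeholders rather than the paper's settings.

```python
import numpy as np

def scld_alm(X, lam=1.0, mu=1e-2, rho=1.5, max_iter=100, tol=1e-4):
    n = X.shape[1]
    Z = np.zeros((n, n)); J = np.zeros((n, n)); Y = np.zeros((n, n))
    XtX = X.T @ X
    for _ in range(max_iter):
        # Step 1: closed-form update of J (quadratic subproblem (9)-(10)).
        J = np.linalg.solve(lam * XtX + mu * np.eye(n), lam * XtX + mu * Z + Y)
        # Step 2: LogDet proximal step on Z via the SVD of A = J - Y/mu, see (11) and (23).
        U, sA, Vt = np.linalg.svd(J - Y / mu, full_matrices=False)
        Z_new = (U * logdet_prox_singular_values(sA, mu)) @ Vt
        # Steps 3-4: dual ascent on Y and growth of the penalty parameter mu.
        Y = Y + mu * (Z_new - J)
        mu = rho * mu
        if np.linalg.norm(Z_new - Z, "fro") <= tol * max(1.0, np.linalg.norm(Z, "fro")):
            Z = Z_new
            break
        Z = Z_new
    return Z
```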

3.2. Affinity Graph Matrix Construction

Now we construct an affinity matrix for subspace clustering. The optimal $Z^{*}$ may not accurately describe the relationship between samples if the data are severely corrupted. Therefore, in general, it is not a good idea to construct the affinity matrix by directly using $Z^{*}$. In the spirit of [3, 12], we construct the affinity matrix in the following way.

Assuming the skinny SVD of $Z^{*}$ is $U^{*}\Sigma^{*}(V^{*})^{T}$, we define $\tilde{U} = U^{*}(\Sigma^{*})^{1/2}$ and $\tilde{V} = V^{*}(\Sigma^{*})^{1/2}$. Based on the weighted eigenvector matrix $\tilde{U}$ or $\tilde{V}$, we construct an affinity matrix $W$ as follows:
$$W_{ij} = \left(\frac{\tilde{u}_i^{T}\tilde{u}_j}{\|\tilde{u}_i\|_2\,\|\tilde{u}_j\|_2}\right)^{2\alpha} \quad \text{or} \quad W_{ij} = \left(\frac{\tilde{v}_i^{T}\tilde{v}_j}{\|\tilde{v}_i\|_2\,\|\tilde{v}_j\|_2}\right)^{2\alpha}, \quad (25)$$
where $\tilde{u}_i$ and $\tilde{u}_j$ ($\tilde{v}_i$ and $\tilde{v}_j$) represent the $i$th and $j$th rows of $\tilde{U}$ ($\tilde{V}$), respectively, and the parameter $\alpha$ tunes the sharpness of the affinity between two points, with $\alpha > 1$ helping to separate the clusters. When $\alpha$ increases, the between-cluster separability is increased, but the intracluster cohesiveness is degraded. Thus, a suitable $\alpha$ needs to balance within-cluster cohesiveness and between-cluster separability. In this paper, we set $\alpha$ to be 2. We then use the same postprocessing as LRR. (For LRR, we use (12) in [3] rather than (3) to construct the affinity matrix. We also confirmed with an author of [3] that the power 2 in (12) is a typo and should be 4.) As $\tilde{U}$ or $\tilde{V}$ spans the principal directions of $Z^{*}$, we employ the angular information, or powered correlation coefficients, of the examples, because their lengths may be affected significantly by noise or outliers in the data.
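The following sketch (ours, following the affinity formula (25) as reconstructed above, with alpha = 2) builds W from the skinny SVD of the learned representation; the row normalization makes the powered inner products exactly the powered cosines of the angles between the principal-direction coordinates of the points.

```python
import numpy as np

def affinity_from_representation(Z, alpha=2):
    U, s, _ = np.linalg.svd(Z, full_matrices=False)
    r = max(1, int(np.sum(s > 1e-8 * s[0])))          # effective rank for the skinny SVD
    M = U[:, :r] * np.sqrt(s[:r])                     # rows of U* (Sigma*)^(1/2), one per point
    M = M / np.maximum(np.linalg.norm(M, axis=1, keepdims=True), 1e-12)
    return np.abs(M @ M.T) ** (2 * alpha)             # entries: powered cosine similarities
```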

Using the resulting affinity matrix, we can apply a spectral clustering algorithm to perform the segmentation. In this paper, we simply perform NCuts [18] on $W$. The proposed subspace clustering procedure (referred to as SCLD in the experiments) is summarized in Algorithm 2.

Input: data matrix $X$, number of subspaces $k$, parameters $\lambda$, $\rho$, and $\mu^{0}$.
(1) Obtain $Z^{*}$ from Algorithm 1.
(2) Compute the skinny SVD $Z^{*} = U^{*}\Sigma^{*}(V^{*})^{T}$.
(3) Calculate $\tilde{U} = U^{*}(\Sigma^{*})^{1/2}$ or $\tilde{V} = V^{*}(\Sigma^{*})^{1/2}$.
(4) Construct the affinity graph matrix $W$ by (25).
(5) Apply NCuts to $W$ to obtain the segmentation.

4. Convergence Analysis

In this section, we give the convergence analysis for Algorithm 1. We will show that our optimization algorithm attains at least one stationary point of problem (7). We first rewrite the augmented Lagrangian (8) associated with problem (7) as
$$\mathcal{L}_{\mu}(Z, J, Y) = \log\det\big((I + Z^{T}Z)^{1/2}\big) + \frac{\lambda}{2}\|X - XJ\|_F^2 + \frac{\mu}{2}\Big\|Z - J + \frac{Y}{\mu}\Big\|_F^2 - \frac{\|Y\|_F^2}{2\mu}. \quad (26)$$

Lemma 3. The sequence $\{Y^{k}\}$ is bounded.

Proof. To minimize $\mathcal{L}_{\mu^{k}}$ with respect to $Z$ at the $k$th step, the optimal $Z^{k+1}$ needs to satisfy the first-order optimality condition
$$\nabla_Z \log\det\big((I + Z^{T}Z)^{1/2}\big)\Big|_{Z = Z^{k+1}} + Y^{k} + \mu^{k}\big(Z^{k+1} - J^{k+1}\big) = 0.$$
Note that the updating rule for $Y$ is
$$Y^{k+1} = Y^{k} + \mu^{k}\big(Z^{k+1} - J^{k+1}\big);$$
thus $Y^{k+1} = -\nabla_Z \log\det\big((I + Z^{T}Z)^{1/2}\big)\big|_{Z = Z^{k+1}}$. We know from (5) that
$$\nabla_Z \log\det\big((I + Z^{T}Z)^{1/2}\big) = U \operatorname{diag}\Big(\frac{\sigma_i}{1 + \sigma_i^2}\Big) V^{T},$$
and $\sigma_i/(1 + \sigma_i^2) \leq 1/2$, so this gradient is bounded. Then it is seen that $\|Y^{k+1}\|_F$ is bounded; that is, $\{Y^{k}\}$ is bounded.

Lemma 4. $\{Z^{k}\}$ and $\{J^{k}\}$ are bounded if $\{\mu^{k}\}$ is nondecreasing and $\sum_{k=1}^{\infty} \big(\mu^{k} + \mu^{k+1}\big)/\big(\mu^{k}\big)^{2} < \infty$.

Proof. Consider
$$\mathcal{L}_{\mu^{k}}\big(Z^{k+1}, J^{k+1}, Y^{k}\big) \leq \mathcal{L}_{\mu^{k}}\big(Z^{k}, J^{k}, Y^{k}\big) = \mathcal{L}_{\mu^{k-1}}\big(Z^{k}, J^{k}, Y^{k-1}\big) + \frac{\mu^{k} + \mu^{k-1}}{2\big(\mu^{k-1}\big)^{2}}\big\|Y^{k} - Y^{k-1}\big\|_F^{2}.$$
Thus,
$$\mathcal{L}_{\mu^{k}}\big(Z^{k+1}, J^{k+1}, Y^{k}\big) \leq \mathcal{L}_{\mu^{0}}\big(Z^{1}, J^{1}, Y^{0}\big) + \sum_{j=1}^{k} \frac{\mu^{j} + \mu^{j-1}}{2\big(\mu^{j-1}\big)^{2}}\big\|Y^{j} - Y^{j-1}\big\|_F^{2}.$$
Since $\{Y^{k}\}$ is bounded and the series in Lemma 4 converges, the second term in the above inequality is finite, so $\mathcal{L}_{\mu^{k}}(Z^{k+1}, J^{k+1}, Y^{k})$ is bounded. We can rewrite $\mathcal{L}_{\mu^{k}}(Z^{k+1}, J^{k+1}, Y^{k})$ as
$$\log\det\big((I + (Z^{k+1})^{T}Z^{k+1})^{1/2}\big) + \frac{\lambda}{2}\big\|X - XJ^{k+1}\big\|_F^{2} + \frac{\mu^{k}}{2}\Big\|Z^{k+1} - J^{k+1} + \frac{Y^{k}}{\mu^{k}}\Big\|_F^{2} = \mathcal{L}_{\mu^{k}}\big(Z^{k+1}, J^{k+1}, Y^{k}\big) + \frac{\|Y^{k}\|_F^{2}}{2\mu^{k}}.$$
Because $\mathcal{L}_{\mu^{k}}(Z^{k+1}, J^{k+1}, Y^{k})$ and $\|Y^{k}\|_F^{2}/(2\mu^{k})$ are bounded and each term on the left-hand side is nonnegative, each term is bounded. Boundedness of the LogDet term implies that all singular values of $Z^{k+1}$ are bounded and hence $Z^{k+1}$ is bounded. Since $J^{k+1} = Z^{k+1} - (Y^{k+1} - Y^{k})/\mu^{k}$, clearly $J^{k+1}$ is bounded as well. Therefore $\{Z^{k}\}$ and $\{J^{k}\}$ are bounded.

Theorem 5. The sequence $\{(Z^{k}, J^{k})\}$ has at least one accumulation point $(Z^{*}, J^{*})$, and $(Z^{*}, J^{*})$ is a stationary point of optimization problem (7) under the assumption that $\lim_{k\to\infty}\mu^{k}\big(Z^{k+1} - Z^{k}\big) = 0$.

Proof. $\{(Z^{k}, J^{k})\}$ is a bounded sequence; hence, by the Bolzano-Weierstrass theorem, there must be at least one accumulation point, which is denoted by $(Z^{*}, J^{*})$. Without loss of generality, we assume that $\{(Z^{k}, J^{k})\}$ itself converges to $(Z^{*}, J^{*})$. Next, we prove that this accumulation point is a stationary point of problem (7). As $k \to \infty$, we have $\mu^{k} \to \infty$. Because $Y^{k+1} - Y^{k} = \mu^{k}(Z^{k+1} - J^{k+1})$ and $\{Y^{k}\}$ is bounded, we get $\lim_{k\to\infty}(Z^{k+1} - J^{k+1}) = 0$; that is, $Z^{*} = J^{*}$. By the first-order optimality condition and the definition of $Y^{k+1}$, we have $Y^{k+1} = -\nabla_Z \log\det\big((I + Z^{T}Z)^{1/2}\big)\big|_{Z = Z^{k+1}}$. Letting $k \to \infty$, we get $Y^{*} = -\nabla_Z \log\det\big((I + Z^{T}Z)^{1/2}\big)\big|_{Z = Z^{*}}$. At the $k$th step, $J^{k+1}$ satisfies $\lambda X^{T}\big(XJ^{k+1} - X\big) - Y^{k} + \mu^{k}\big(J^{k+1} - Z^{k}\big) = 0$, which can be rewritten as $\lambda X^{T}\big(XJ^{k+1} - X\big) = Y^{k+1} + \mu^{k}\big(Z^{k} - Z^{k+1}\big)$. With the assumption that $\lim_{k\to\infty}\mu^{k}(Z^{k+1} - Z^{k}) = 0$ [26], we get $\lambda X^{T}\big(XJ^{*} - X\big) = Y^{*}$.
Combining these limits with $Z^{*} = J^{*}$ gives $\nabla_Z \log\det\big((I + Z^{T}Z)^{1/2}\big)\big|_{Z = Z^{*}} + \lambda X^{T}\big(XZ^{*} - X\big) = 0$. Now we can see that $(Z^{*}, J^{*})$ satisfies the KKT conditions of (7) and thus is a stationary point of (7).

5. Experiments and Analysis

In this section, we conduct experiments on the subspace clustering task with both synthetic and real data.

5.1. Experiments with Synthetic Data

We construct 5 independent subspaces $\{S_i\}_{i=1}^{5}$ whose bases $\{B_i\}$ are generated through $B_{i+1} = TB_i$, $1 \leq i \leq 4$, where $T$ is a random rotation matrix and $B_1$ is a random orthogonal matrix [2]. We sample 20 data vectors from each subspace by $X_i = B_i Q_i$, $1 \leq i \leq 5$, where $Q_i$ is an i.i.d. Gaussian matrix. Some data vectors are randomly chosen to be corrupted; for example, a chosen data vector is corrupted by adding zero-mean Gaussian noise. We then use SCLD to segment the data into 5 clusters. The subspace clustering error rate, defined as the number of misclassified points divided by the total number of points, is used to assess the performance. We report the clustering error rate (averaged over 30 trials) for different corruption levels in Figure 1. Without any corruption, SCLD clusters all data points correctly.
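A sketch of this synthetic protocol (ours; the ambient dimension, subspace dimension, corruption fraction, and noise level below are illustrative assumptions, not the paper's exact values) is:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, n_sub, per_sub = 100, 4, 5, 20                     # assumed sizes for illustration

T = np.linalg.qr(rng.standard_normal((d, d)))[0]         # random orthogonal (rotation) matrix
bases = [np.linalg.qr(rng.standard_normal((d, r)))[0]]   # random orthonormal basis B_1
for _ in range(n_sub - 1):
    bases.append(T @ bases[-1])                          # B_{i+1} = T B_i

X = np.hstack([B @ rng.standard_normal((r, per_sub)) for B in bases])
labels = np.repeat(np.arange(n_sub), per_sub)            # ground-truth segmentation

corrupt = rng.random(X.shape[1]) < 0.3                   # corrupt 30% of the columns (example)
X[:, corrupt] += 0.1 * rng.standard_normal((d, int(corrupt.sum())))
```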

5.2. Experiments with Real Data

In this section, we evaluate the effectiveness and robustness of SCLD on the benchmark datasets Extended Yale B (EYaleB) [27, 28] and Hopkins 155 [29]. We compare the proposed method SCLD with several state-of-the-art subspace clustering algorithms: LRR [3], SSC [10], LRSC [4, 30], and local subspace affinity (LSA) [13]. For these methods, we use the parameters given by the respective authors. For our method, we tune $\lambda$ to obtain the best performance. Generally, $\lambda$ should be relatively large if the data are only slightly corrupted. The parameters $\mu^{0}$ and $\rho$ have little influence on the clustering results, so we simply set $\mu^{0}$ large enough to ensure the uniqueness of the minimizer in (23) and choose $\rho$ empirically. Other parameters are shown in Table 1. The experiments are conducted on Windows 7 with 16 GB memory and an Intel Core i5-2300 CPU.

5.2.1. Face Clustering

Face clustering aims to cluster a set of face images from multiple individuals in the hope of revealing the identity of these individuals. The EYaleB database includes 2414 frontal face images of 38 individuals. For each individual, the images are taken under 64 lighting conditions and can be described by a low-dimensional subspace [31]. The images are resized to 48 × 42 pixels, and each vectorized image is regarded as a data point. Figure 2 shows some example images from the database.

(1) First Experiment Scenario. As done in [2], we test the algorithms on the first 10 classes of EYaleB, which consist of 640 frontal face images. More than half of the images are corrupted by shadows and noise. We use this heavily corrupted data to test the effectiveness of our method. As shown in Table 2, SCLD significantly enhances the performance; specifically, it improves the clustering accuracy substantially compared to the other algorithms. Since the only difference between our approach and LRR is the rank approximation, this improvement is attributable to LogDet.

(2) Second Experiment Scenario. For a fair comparison, we follow the experimental setup of [10]. We divide the 38 subjects into four groups: subjects 1 to 10, 11 to 20, 21 to 30, and 31 to 38. For the first three groups, we consider all choices of $n \in \{2, 3, 5, 8, 10\}$ subjects; for the last group, we consider all choices of $n \in \{2, 3, 5, 8\}$. We apply our subspace clustering algorithm to each set of $n$ subjects. For all experiments, the stopping criterion is triggered when the relative change of $Z$ between two successive iterations falls below a threshold or a maximum of 100 iterations is reached.

The results are presented in Table 3. For the other methods, we cite the results from Table 5 of [10]. SCLD consistently has low clustering error rates and is more stable than the other methods, whose error rates increase drastically as the number of subjects increases to 8 and 10. As shown in Figure 2, there are many sparse within-sample outliers in the face images, for example, shadows. Although LRR uses a regularization term to account for corruptions, this term does not appear to be well suited to EYaleB. LSA has inferior performance, possibly because it does not explicitly exploit the low-rank structure of the data.

(3) Third Experiment Scenario. In this section, we compare SCLD with the other algorithms when RPCA [32] is used as a preprocessing step. In practice, we do not know the clustering of the data beforehand, and hence we apply RPCA to the collection of all data points for each trial prior to clustering. As shown in Table 4, SCLD is still superior to the other methods even though they apply RPCA to deal with sparse outlying entries. Compared to Table 3, only the clustering error rates of LRSC decrease in some cases. We can conclude that applying RPCA to all data points simultaneously is not effective in improving clustering performance. This is because RPCA seeks a common low-rank subspace, which decreases the principal angles between subspaces and reduces the distances between data points of different subjects [10].

5.2.2. Motion Segmentation

Motion segmentation is the task of segmenting the trajectories associated with different moving objects into groups according to their motions in a video sequence. Because different motions can be treated as different subspaces, we use the Hopkins 155 dataset to validate SCLD. This dataset is only slightly corrupted, as shown in Figure 3. It consists of 155 sequences of two or three motions and 1 sequence of 5 motions; the latter is regarded as an outlier. Each sequence is regarded as a separate clustering problem.

The experimental results are reported in Table 5. We also cite the results in Table 1 of [10]. It can be seen that SCLD produces superior results compared to the other methods. For all 155 sequences, the error rate is as low as 1.79%. If we use all 156 sequences, the overall error rate of the proposed algorithm is 1.87%. We report the average computation time per sequence at the bottom of Table 5. The computational cost of LRSC is much lower than that of the other methods, while LRR, SSC, and SCLD are comparable.

To examine the influence of the parameter $\lambda$ in our algorithm, we show the clustering error rates of SCLD for different $\lambda$ over all 155 sequences in Figure 4. When $\lambda$ is between 1 and 200, the clustering error varies between 1.79% and 4.67%. This implies that SCLD performs well over a wide range of values of $\lambda$.

To test the dependence of SCLD on initialization, we try two other initializations. First, we use the solution from LRR as the initial guess for SCLD. Second, we initialize with random numbers. We find that we still obtain the same results. In fact, it is often recommended to use convex relaxation solutions as initialization for nonconvex formulations [33, 34].

6. Conclusion

In this paper we propose using a log-determinant (LogDet) function as a rank approximation to recover the low-rank representation of high-dimensional data. When applied to subspace clustering, the proposed algorithm, called SCLD, exploits both the global and local structures of the data through the LogDet rank approximation and the angle-based affinity matrix. Consequently, it captures more of the intrinsic information of the data that benefits subspace clustering. Our extensive experimental results show that it outperforms other low-rank representation algorithms based on the nuclear norm. Therefore LogDet appears to be an effective rank approximation function well suited to subspace clustering applications. Although our model is simple and does not explicitly model outliers, it is resilient to various corruptions. Our future research will consider modeling corruptions explicitly.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgment

This work is supported in part by US National Science Foundation Grant IIS-1218712.