Abstract

The subspace segmentation problem is fundamental in many applications. The goal is to cluster data drawn from an unknown union of subspaces. In this paper we state the problem and describe its connection to other areas of mathematics and engineering. We then review the mathematical and algorithmic methods created to solve this problem and some of its particular cases. We also describe the problem of motion tracking in videos and its connection to the subspace segmentation problem and compare the various techniques for solving it.

1. Introduction

The subspace clustering problem is fundamental in many engineering and mathematics applications [1–11]. It can be described as follows: let U = S_1 ∪ ⋯ ∪ S_n be the nonlinear set consisting of a union of subspaces S_i of a Hilbert or Banach space H. Let W = {w_1, …, w_m} be a set of data points drawn from U. The subspace segmentation (or clustering) problem is then to determine U (equivalently, to determine the subspaces S_i, i = 1, …, n) from the data W, that is, to (1) determine the number of subspaces n; (2) find an orthonormal basis for each subspace S_i, i = 1, …, n; (3) group the data points belonging to the same subspace into the same cluster.
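To make the setting concrete, the following Python/NumPy sketch (ours, purely illustrative; the function name and its parameters are not taken from the references) generates synthetic data drawn from a union of random subspaces of R^D. It can serve as test input for the methods discussed in the rest of the paper.

import numpy as np

def sample_union_of_subspaces(D=10, dims=(2, 3), points_per_subspace=50, noise=0.0, seed=0):
    """Draw points from a union of random subspaces of R^D (illustration only)."""
    rng = np.random.default_rng(seed)
    blocks, labels = [], []
    for i, d in enumerate(dims):
        basis = np.linalg.qr(rng.standard_normal((D, d)))[0]   # orthonormal basis of a random S_i
        coeffs = rng.standard_normal((d, points_per_subspace))
        X = basis @ coeffs + noise * rng.standard_normal((D, points_per_subspace))
        blocks.append(X)
        labels += [i] * points_per_subspace
    return np.hstack(blocks), np.array(labels)

W, true_labels = sample_union_of_subspaces()
print(W.shape)   # (10, 100): the columns of W are the data points w_j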

The data is often corrupted by noise; it may have outliers or some of the data vectors may have missing entries. Therefore, any technique for solving the subspace segmentation problem above must be robust and stable for the aforementioned nonideal cases.

Depending on the application, the space H can be finite or infinite dimensional. For example, the set of all two-dimensional images of a given face i, obtained under different illuminations and facial positions, can be modeled as a set of vectors belonging to a low dimensional subspace S_i living in a higher dimensional space [12–14]. For this case, a set of such images from n different faces is a union U = S_1 ∪ ⋯ ∪ S_n. Another application in which a union of subspaces provides a good model is the problem of motion tracking of rigid objects in videos. For this situation (further developed below), a 4-dimensional subspace is assigned to each moving object in a space R^{2F}, where F is the number of frames in the video. Examples where H is infinite dimensional arise in sampling theory and in learning theory [15–19]. For example, signals with finite rate of innovation are modeled by a union of subspaces that belongs to an infinite dimensional space such as L^2(R) [2, 3, 20, 21].

1.1. Known Number of Subspaces and Dimensions

In some subspace segmentation problems, the number of subspaces or the dimensions of the subspaces are known or can be estimated [1, 8, 22, 23]. In these cases, the subspace segmentation problem, in both the finite and the infinite dimensional settings, can be formulated as follows.

Let H be a Hilbert space, W = {w_1, …, w_m} a finite set of vectors in H, C a family of closed subspaces of H, and C^l the set of all sequences of elements of C of length l (i.e., C^l = {(S_1, …, S_l) : S_j ∈ C}). The formulation of the subspace segmentation problem as a minimization problem is as follows.

Problem 1 (optimization formulation of the subspace segmentation problem). Given a finite set W ⊂ H, a number p with 0 < p < ∞, and a fixed integer l ≥ 1, find the infimum of the expression
e(W, {S_1, …, S_l}) = Σ_{w ∈ W} min_{1 ≤ j ≤ l} d^p(w, S_j)   (1)
over (S_1, …, S_l) ∈ C^l.
Find a sequence of l subspaces (S_1^o, …, S_l^o) ∈ C^l (if it exists) such that
e(W, {S_1^o, …, S_l^o}) = inf { e(W, {S_1, …, S_l}) : (S_1, …, S_l) ∈ C^l }.

An example in finite dimensions is when H = R^D and C is the family of all subspaces of R^D of dimensions no greater than d. For this case, when l = 1 and p = 2, this is a well-known least squares problem that can be solved using the singular value decomposition technique [24]. An example in infinite dimensions is when H = L^2(R) and C is a family of closed, shift-invariant subspaces of L^2(R) that are generated by at most r generators [2]. A typical shift-invariant space with one generator is the space of bandlimited functions, generated by the integer shifts of the generator function sinc(x) = sin(πx)/(πx). Other important shift-invariant spaces are the spline spaces generated by the B-spline functions of a given degree [25, 26]. In these cases the subspaces in C are themselves infinite dimensional subspaces of L^2(R). Thus, even in the case where l = 1 and p = 2, this (least squares) problem is much more difficult than its finite dimensional counterpart. It should be noted that, when l > 1 and for any p, Problem 1 is neither linear nor convex [27, 28]. In the presence of outliers, it has been proven that the best value for p is p = 1 [27, 28], and a good choice for light-tailed noise is p = 2. There are more general versions of Problem 1; for example, the Hilbert space H can be replaced by a Banach space B; moreover, the family C can be replaced by a more general type of family [22].
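For the finite dimensional case l = 1 and p = 2 mentioned above, the minimizer is spanned by the top d left singular vectors of the data matrix, and the minimum value is the sum of the squared discarded singular values (Eckart-Young). A minimal NumPy sketch (ours; the function name is hypothetical):

import numpy as np

def nearest_subspace(W, d):
    """Best d-dimensional subspace to the columns of W in the least squares sense."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    basis = U[:, :d]                 # orthonormal basis of the optimal subspace
    residual = np.sum(s[d:] ** 2)    # minimum value: sum of squared distances of the columns to the subspace
    return basis, residual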

1.2. Applications and Connection to Other Areas

The subspace segmentation problem has connections to several active areas of research, including learning theory, compressed sampling, and signal processing in general [2, 3, 17, 21, 29–32]. Moreover, it is relevant to several computer vision applications including motion tracking of rigid objects in videos and facial recognition [1, 4, 14, 33–38].

1.2.1. Connection to Compressed Sampling

In compressed sampling, the goal is to find an unknown vector x ∈ R^D from a small set of linear measurements y_i = ⟨x, a_i⟩, i = 1, …, s, where the a_i are known sampling vectors. Clearly, this problem has a solution only if some extra information is known about x and if the sampling vectors a_i are well chosen. In compressed sampling, the assumption is that, in a suitable basis or frame, the unknown vector x is k-sparse or nearly k-sparse (compressible), with k ≪ D [30, 39–44]. This means that, in a suitable basis or frame, the vector x has at most k nonzero components or, under the compressibility assumption, that x has at most k large components. This sparsity (or compressibility) assumption implies that the vector x must belong (or must be close) to a union of subspaces of dimensions at most k. Thus, finding the sparse model for a class of signals can be obtained by solving the subspace segmentation problem in the special case where H = R^D and C is the class of subspaces of R^D of dimensions at most k [45, 46].
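The union-of-subspaces interpretation of sparsity can be made explicit with a small sketch (ours): a k-sparse vector in R^D lies in one of the C(D, k) coordinate subspaces of dimension k.

import numpy as np
from itertools import combinations

D, k = 6, 2
x = np.zeros(D)
x[[1, 4]] = [3.0, -1.5]                  # a 2-sparse vector in R^6

supports = list(combinations(range(D), k))                            # each support is a k-dimensional subspace
containing = [s for s in supports if set(np.nonzero(x)[0]) <= set(s)]
print(len(supports), containing)         # 15 subspaces in the union; x lies in the one with support (1, 4)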

1.2.2. Connection to Learning Theory and Data Mining

In many learning theory problems, a class of data may form a complex structure embedded in a high dimensional space [47–53]. In the neighborhood of each data point, the structure may be modeled by a local tangent space, or a union of tangent spaces, whose dimensions are much smaller than the dimension of the ambient space [16]. The global shape of the data model can then be obtained from the observed data points by solving Problem 1.

1.2.3. Connection to Signal Processing

In signal processing, signals are often modeled by an infinite dimensional shift-invariant subspace of L^2(R) [15, 54–61]. For example, the classical shift-invariant space is the space of bandlimited functions, also known as the Paley-Wiener space [62–67]. This is the space generated by the function sinc(x) = sin(πx)/(πx) and its integer shifts. Multiresolution and wavelet spaces are also shift-invariant spaces that are often used in signal processing applications. Choosing the model for a class of signals can be cast in terms of finding the solution of Problem 1 from observed data. Unlike the compressed sampling and learning theory settings discussed earlier, in this situation the class C consists of infinite dimensional subspaces of L^2(R), which are more difficult to deal with, even for a single shift-invariant subspace model (l = 1) [68]. The case in which a signal model is not a single subspace but a union of several such subspaces arises naturally, as in the case of signals with finite rate of innovation [69–73].
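As a small illustration of the shift-invariant model (a sketch under our own simplifying assumption of finitely many shifts), a signal in the Paley-Wiener space can be synthesized from integer shifts of the sinc generator; its samples at the integers return the coefficients, since sinc(k - n) equals 1 when k = n and 0 otherwise.

import numpy as np

def paley_wiener_signal(c, t):
    """Evaluate f(t) = sum_n c[n] * sinc(t - n) for a finite coefficient sequence c."""
    n = np.arange(len(c))
    return np.array([np.dot(c, np.sinc(tt - n)) for tt in np.atleast_1d(t)])

c = np.array([1.0, -2.0, 0.5, 3.0])
samples = paley_wiener_signal(c, np.arange(4))   # sample at the integers 0, 1, 2, 3
print(np.allclose(samples, c))                   # True: integer samples recover the coefficients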

1.2.4. Application to Motion Tracking in Video

The problem of tracking rigid moving objects in a video can be formulated as a subspace segmentation problem [33, 35, 74–77]. Let (x_{fp}, y_{fp}) be the Cartesian coordinates of a point p of a moving object in frame f of a video. By concatenating all the coordinates of p across the frames into a single vector, we obtain the so-called trajectory vector of p, whose length is 2F, where F is the number of frames in the video. It can be shown that, for rigid bodies, the trajectory vectors of the points of an object belong to a subspace of R^{2F} of dimension no greater than 4. Thus, if W is a set of trajectory vectors from a video containing n moving objects (the background being one such object), then the set W belongs to a union of n subspaces of dimensions at most 4. Solving the subspace segmentation problem in this situation consists in using the data W to find the subspaces and then grouping together the trajectory vectors that belong to the same object according to the subspace they belong to. It can also be shown that human facial motion and other nonrigid motions can be approximated by unions of linear subspaces [78].
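A minimal synthetic sketch (ours; an affine camera model is assumed) that builds the trajectory vectors of one rigid object and verifies that they span a subspace of dimension at most 4:

import numpy as np

rng = np.random.default_rng(0)
F, P = 30, 20                               # number of frames, number of tracked points on one object
X = rng.standard_normal((3, P))             # 3D coordinates of the object points

rows = []
for f in range(F):
    A = rng.standard_normal((2, 3))         # affine camera matrix for frame f
    t = rng.standard_normal((2, 1))         # translation for frame f
    rows.append(A @ X + t)                  # 2 x P image coordinates in frame f
W = np.vstack(rows)                         # 2F x P matrix whose columns are trajectory vectors

print(W.shape, np.linalg.matrix_rank(W))    # (60, 20) and a rank of at most 4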

1.2.5. Application to Face Recognition

It has been shown that the set of all two-dimensional images of a given face i, obtained under different illuminations and facial positions, can be modeled as a set of vectors belonging to a low dimensional subspace S_i living in a higher dimensional space [14]. A set of such images from n different faces is then a union U = S_1 ∪ ⋯ ∪ S_n, where each subspace S_i is associated with a given face.

1.3. Dimensionality Reduction

Since the data may live in a very high dimensional space R^D, while it consists of a union of subspaces whose dimensions d_i satisfy d_i ≪ D, the subspace clustering problem can often be solved in a space of smaller dimension D_e, the effective dimension. Specifically, if D_e < D, then the data can be projected onto a space of dimension D_e, where the projection is not necessarily an orthogonal projection but any "good" linear map that sends the data to another (low dimensional) space, for example, a random projection [79–82]. As a result of projecting the union U and the data W, we get a set U' and data W'. It is now possible to solve the subspace segmentation problem with the data W' instead of W and to use the segmentation obtained in the low dimensional space to solve the original problem. This dimensionality reduction technique can be very effective and is often used in conjunction with the subspace segmentation problem [12, 83].
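A sketch of the dimensionality reduction step using a Gaussian random projection (illustrative only; the target dimension D_e is a parameter we choose):

import numpy as np

def random_projection(W, D_e, seed=0):
    """Map the columns of the D x m data matrix W into R^{D_e} by a random linear map."""
    rng = np.random.default_rng(seed)
    R = rng.standard_normal((D_e, W.shape[0])) / np.sqrt(D_e)   # Johnson-Lindenstrauss style scaling
    return R @ W

W = np.random.default_rng(1).standard_normal((1000, 200))       # data in a very high dimensional space
W_proj = random_projection(W, D_e=12)                           # segment in R^12 instead of R^1000
print(W_proj.shape)                                             # (12, 200)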

2. Algebraic Methods for Finite Dimensional Noise Free Case

The general subspace segmentation problem described in Section 1 does not yet have a fully satisfactory solution method. In the ideal case where no noise is present, there are several algebraic methods that can solve this problem, as described below. However, in realistic situations, when noise, outliers, and corrupted data are present, there are no completely satisfactory algorithms for finding the solution, even in the finite dimensional case H = R^D. The difficulties are both theoretical and computational, as will be further described below.

In the ideal case, when H = R^D and the data is drawn from a finite union of subspaces U = S_1 ∪ ⋯ ∪ S_n, the general problem can be solved using algebraic methods. Obviously, there must be enough data points. In particular, it is necessary that for each subspace S_i there is a subset of data points of W that forms a basis for S_i. However, this is not sufficient. Consider, for example, the very simple case in which the data is drawn from a union of two distinct lines S_1 ∪ S_2 through the origin. If we are supplied with only two points, one from each line, we will not be able to decide whether the data is drawn from the single subspace spanned by the two points or from the union S_1 ∪ S_2. However, if we are supplied with enough points belonging to S_1 and enough points belonging to S_2, the structure becomes apparent.

2.1. Reduced Row Echelon Form Method

One of the recent algebraic methods for solving the noise free subspace segmentation problem under the independent subspace restriction is the reduced row echelon form (RREF) method [22]. This method is a generalization of the method of Gear, who observed that, for four-dimensional subspaces, the reduced row echelon form can be used to segment motions in videos [84]. It turns out that, in the noise free case, the reduced row echelon form method can completely solve the subspace segmentation problem in almost its most general form.

The RREF method is based on the familiar Gaussian elimination technique for solving linear systems of equations. However, for this method to work, certain assumptions on the data and the subspaces are needed. Specifically, there must be enough data to cover all the dimensions of the union of subspaces from which the data is drawn. Moreover, the subspaces must be independent. To make these assumptions precise, we make the following definitions.

Definition 1 (generic data). Let S be a linear subspace of R^D with dimension d. A set of data points drawn from S is said to be generic if (i) the number of data points is larger than d, and (ii) every d of these points form a basis for S.

Definition 2 (independent subspaces). Subspaces S_1, …, S_n are called independent if dim(S_1 + ⋯ + S_n) = dim(S_1) + ⋯ + dim(S_n).

Independent subspaces have the property that S_i ∩ S_j = {0} for i ≠ j. The converse, however, is false; for example, three distinct lines S_1, S_2, S_3 in R^2 satisfy S_i ∩ S_j = {0} for i ≠ j but can never be independent. More generally, if S_1, …, S_n are independent, then S_i ∩ S_j = {0} for i ≠ j, and, in fact, S_i ∩ (Σ_{j ≠ i} S_j) = {0} for every i.

If we knew the subspaces S_1, …, S_n, it would be easy to partition the data W into subsets W_1, …, W_n such that W_i ⊂ S_i. Conversely, if we knew a partition W_1, …, W_n of the data such that the points of each W_i come from the same subspace S_i, then we would set S_i = span(W_i), and our subspace segmentation problem would be solved.

However, all we are given is the data W, and we do not know the partition {W_i}. Thus, solving the subspace segmentation problem amounts to finding this partition of W. To do this, we construct a matrix W whose columns are the data vectors. The matrix W is a D × m matrix, where m may be large, while the rank r of W is often much smaller (noise free case). Using the three elementary row operations used in Gaussian elimination, we transform W to its reduced row echelon form rref(W). By setting to the value 1 all nonzero coefficients of rref(W), we obtain the so-called binary reduced row echelon form of W, denoted by Brref(W). The binary reduced row echelon form of W has a structure that allows us to easily find the partition and thereby solve the subspace segmentation problem, as Theorem 3 below suggests [22].

Theorem 3. Let S_1, …, S_n be a set of nontrivial, independent subspaces of R^D. Let W be a matrix whose columns are drawn from the union of these subspaces. Assume that the data drawn from each subspace is generic, and let B = Brref(W) be the binary reduced row echelon form of W. Then (1) the inner product of a pivot column and a nonpivot column of B is one if and only if the corresponding column vectors in W belong to the same subspace S_i for some i; (2) moreover, the 1-norm of a nonpivot column of B equals the dimension of the subspace to which the corresponding column of W belongs; (3) finally, the inner product of two columns of B is zero if and only if the corresponding columns of W belong to different subspaces or both columns are pivot columns.

This theorem suggests a very simple yet effective approach to clustering the data points (Algorithm 1) that solves the subspace segmentation problem. This is done by finding a partition of the data into clusters W_1, …, W_n such that span(W_i) = S_i. The clusters can be formed as follows: pick a nonpivot column e of Brref(W) and group together with it all columns b of Brref(W) such that ⟨e, b⟩ > 0. Repeat the process with a different, not yet clustered, nonpivot column until all columns are exhausted. This is detailed in Algorithm 1 below.

Require: data matrix W.
1: Find rref (W) of W.
2: Find Brref (W) of W by setting all non-zero entries of rref (W) to 1.
3: for all j from 1 to m do
4:  Pick the jth column b_j of Brref (W).
5:  if b_j is pivot then
6:   continue
7:  end if
8:  for all i from 1 to j - 1 do
9:  if b_i is non-pivot and ⟨b_i, b_j⟩ > 0 then
10:   Place b_i and b_j in the same cluster C_s.
11:   break
12:  end if
13: end for
14: end for
15: for all clusters C_s do
16:  Pick any b in C_s.
17:  Separate b into unit vectors e_{k_1}, …, e_{k_d}. {These vectors form a basis for a subspace
   with dimension d.}
18:  for all i from 1 to m do
19:   if b_i equals one of e_{k_1}, …, e_{k_d} then
20:    Place b_i in the same cluster C_s. {This is for handling pivot columns.}
21:   end if
22:  end for
23:  Place the corresponding columns in W into the same cluster W_s.
24: end for
25: Renumber the indices s of the clusters W_s starting from 1.
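The following NumPy sketch (ours; written only to illustrate Theorem 3 and Algorithm 1, with a numerical tolerance in place of exact arithmetic) computes the reduced row echelon form, binarizes it, and groups the columns of a noise free data matrix accordingly.

import numpy as np

def rref(A, tol=1e-10):
    """Reduced row echelon form of A; returns (R, list of pivot columns)."""
    R = A.astype(float).copy()
    pivots, row = [], 0
    for col in range(R.shape[1]):
        if row >= R.shape[0]:
            break
        p = row + int(np.argmax(np.abs(R[row:, col])))
        if abs(R[p, col]) < tol:
            continue
        R[[row, p]] = R[[p, row]]            # swap rows (partial pivoting)
        R[row] /= R[row, col]                # scale the pivot row
        for r in range(R.shape[0]):          # eliminate the pivot column elsewhere
            if r != row:
                R[r] -= R[r, col] * R[row]
        pivots.append(col)
        row += 1
    return R, pivots

def cluster_by_brref(W, tol=1e-10):
    """Noise free clustering of the columns of W via the binary reduced row echelon form."""
    R, pivots = rref(W, tol)
    B = (np.abs(R) > tol).astype(float)      # Brref(W)
    m = W.shape[1]
    is_pivot = np.zeros(m, dtype=bool)
    is_pivot[pivots] = True
    labels = -np.ones(m, dtype=int)
    next_label = 0
    for j in range(m):                       # group the nonpivot columns by nonzero inner products
        if is_pivot[j] or labels[j] >= 0:
            continue
        labels[j] = next_label
        for i in range(j + 1, m):
            if not is_pivot[i] and B[:, i] @ B[:, j] > 0:
                labels[i] = next_label
        next_label += 1
    for j in np.where(is_pivot)[0]:          # attach each pivot column to its cluster
        row = int(np.argmax(B[:, j]))        # pivot column j is the unit vector with a 1 in this row
        friends = [i for i in np.where(B[row, :] > 0)[0] if not is_pivot[i]]
        if friends:
            labels[j] = labels[friends[0]]
    return labels

Applied to generic, noise free data drawn from independent subspaces (for example, the output of sample_union_of_subspaces above with noise=0.0), cluster_by_brref(W) returns one label per column, and the labels agree with the true segmentation up to a renaming of the clusters.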

Note that we do not need to know the number of subspaces, nor do we need to know the dimensions of the subspaces, in order to solve the subspace segmentation problem in this case; the number of subspaces and their dimensions are outputs of the algorithm. The only assumptions are that there are enough data points, that they are well distributed (i.e., generic), and that the subspaces are independent.

For noisy data, the reduced row echelon form method does not work directly, and a thresholding step must be applied. However, the effect of the noise on the reduced row echelon form method depends on the noise level and on the relative positions of the subspaces. This dependence has been analyzed in [22].

2.2. The Generalized Principal Component Analysis (GPCA)

Another algebraic method for solving the subspace segmentation problem is the so-called generalized principal component analysis (GPCA) [12, 85]. Although the most general form of this method solves the subspace segmentation problem in its entire generality in finite dimensions, we will only describe the idea behind the GPCA method in the simplified case where the number n of subspaces is known and where the subspaces are hyperplanes in R^D, that is, their dimension is D - 1. For this case, each subspace S_i can be described by its normal vector b_i, and every data point w ∈ S_i satisfies the linear equation b_i^T w = 0. Thus, a data point w drawn from the union of the n subspaces must satisfy the polynomial equation p(w) = (b_1^T w)(b_2^T w) ⋯ (b_n^T w) = 0. The product p is in fact a homogeneous polynomial of degree n, p(w) = Σ_α c_α w^α, where the sum runs over multi-indices α of nonnegative integers with |α| = n. Thus, if w belongs to the union, it must satisfy the equation p(w) = 0. Hence, in order to solve the subspace segmentation problem for this case, we must (1) find the polynomial p by finding the values of its coefficients c_α. This is done by creating a system of linear equations in the unknowns c_α by setting p(w) = 0 for each data point w. If the set of data points is generic, then the solution of this system of equations determines the polynomial p; (2) once the polynomial p is determined, it must be factored into its product of linear factors. The normal vectors b_i can then be found by identification, and the subspaces in the union are thus determined.
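The two steps above can be illustrated with a short NumPy sketch (ours; noise free data, n = 2 random hyperplanes in R^3, and all names hypothetical). The coefficients of p are obtained as the null vector of the matrix of Veronese-embedded data points, and the normal of the hyperplane containing a data point is recovered, up to scale, as the gradient of p at that point.

import numpy as np
from itertools import combinations_with_replacement

def veronese(w, monomials):
    """Evaluate all degree-n monomials of w (the Veronese embedding)."""
    return np.array([np.prod(w[list(mon)]) for mon in monomials])

def veronese_grad(w, monomials, coeffs):
    """Gradient of p(w) = sum_alpha c_alpha w^alpha at the point w."""
    g = np.zeros(len(w))
    for c, mon in zip(coeffs, monomials):
        for k in range(len(w)):
            cnt = mon.count(k)
            if cnt:
                reduced = list(mon)
                reduced.remove(k)
                g[k] += c * cnt * np.prod(w[reduced])
    return g

rng = np.random.default_rng(0)
D, n, N = 3, 2, 100
b = rng.standard_normal((n, D))                        # unknown normals of the two hyperplanes
points = []
for i in range(n):
    basis = np.linalg.svd(b[i:i + 1])[2][1:].T         # orthonormal basis of the hyperplane b_i^T w = 0
    points.append(basis @ rng.standard_normal((D - 1, N)))
W = np.hstack(points)                                  # noise free data from the union of hyperplanes

monomials = list(combinations_with_replacement(range(D), n))
V = np.array([veronese(w, monomials) for w in W.T])    # step 1: fit the polynomial p
coeffs = np.linalg.svd(V)[2][-1]                       # null vector = coefficients of p

grads = np.array([veronese_grad(w, monomials, coeffs) for w in W.T])   # step 2: normals from gradients
grads /= np.linalg.norm(grads, axis=1, keepdims=True)
ref0 = grads[0]
ref1 = grads[np.argmin(np.abs(grads @ ref0))]          # a gradient coming from the other hyperplane
labels = (np.abs(grads @ ref1) > np.abs(grads @ ref0)).astype(int)
print(labels[:N].sum(), labels[N:].sum())              # ideally 0 and N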

A modification of the GPCA method described above works for the general subspace segmentation problem in which neither the dimensions of the subspaces nor their number is known [12, 85]. However, as in the case of the RREF method, this method cannot work directly when noise is present, and some modifications are needed in the presence of noise and outliers, as described in [12, 85].

3. Optimization Methods and Subspace Segmentation in the Presence of Noise

The algebraic methods discussed in the previous section do not work without modification when the data is corrupted by noise or outliers. Even with adjustments designed to handle noisy environments, the algebraic algorithms do not perform well when the noise is not small. Algorithms are rated according to their simplicity, computational speed, and performance in nonideal situations. Thus, algebraic methods or their modifications may be the algorithms of choice if the noise is small and computational speed is the main requirement. However, when the noise is relatively large and accuracy is important, other methods are needed. In this section we discuss methods that are robust to noise and to other inaccuracies in the data.

One approach to the subspace segmentation problem in the presence of noise is typified by Problem 1. Minimizing the functional described in Problem 1 amounts to finding the union of subspaces that is nearest to the data. However, some a priori knowledge of the number of subspaces and of their dimensions may be necessary. The cost function can be modified to incorporate a penalty that depends on the number of subspaces and their dimensions if these quantities are unknown. But before getting into algorithms for solving Problem 1, the existence of a minimizer is a theoretical question of interest. Thus, we start with some of the results pertaining to this issue.

3.1. Existence of a Minimizer to Problem 1

Given a family C of closed subspaces of H, a solution to Problem 1 may not exist, even in the linear case l = 1. For example, assume that C is the set of all lines through the origin except one fixed line L. For this case, a minimizer may exist for certain distributions of the data points but not for others. Existence of a solution here means that a minimizer exists for any distribution of any finite number of data points. We will describe the existence results when H is a Hilbert space. The case when H is not a Hilbert space is very difficult, and only partial results are known.

It turns out that the existence of a minimizing sequence of subspaces that solves Problem 1 for l > 1 is equivalent to the existence of a solution to the same problem for l = 1 [2].

Theorem 4. Problem 1 has a minimizing set of subspaces for every finite set of data W and every l ≥ 1 if and only if it has a minimizing subspace for every finite set of data W and l = 1.

Therefore, the following definition is useful.

Definition 5. A set C of closed subspaces of a separable Hilbert space H has the minimum subspace approximation property (MSAP) if for every finite subset W ⊂ H there exists an element S ∈ C that minimizes the expression e(W, S) = Σ_{w ∈ W} d^p(w, S).

Using this terminology, Problem 1 has a minimizing sequence of subspaces if and only if C satisfies the MSAP. If H = R^D and C is the family of subspaces of dimensions no greater than d, then C satisfies the MSAP. This fact is easy to prove directly and is in fact a consequence of the Eckart-Young theorem [24]. Another situation is when H = L^2(R) and C is the set of all shift-invariant spaces of length at most r (i.e., with at most r generators). For this last case, a result in [68] implies that C satisfies the MSAP.

In order to understand the general case, we identify each subspace S ∈ C with the orthogonal projector Π whose kernel is exactly S (i.e., Π = I - P_S, where P_S is the orthogonal projector onto S). Now we can think of C as a set of projection operators and endow it with the induced weak operator topology. This setting allows us to give necessary and sufficient conditions for a class C to have the MSAP for the case p = 2 in (1). Note that it is sufficient that C be closed in order for C to have the MSAP. However, this condition is too strong, as the following example shows: consider the set C which is the union of a plane P and a family of lines whose closure contains a line L lying in P but not belonging to C. Then C (identified with a set of projectors as described earlier) is not closed, since L is a limit of elements of C that does not belong to C. However, it is easy to show that this set satisfies the MSAP, since if the infimum in (1) were achieved by the missing line L, it is also achieved by the plane P, which contains L.

For finite dimensions, the weak operator and strong operator topologies coincide, and the characterization of the MSAP can be obtained in terms of the convex hull of the family consisting of C (viewed as projectors) with the positive semidefinite operators added to it. Recall that the convex hull co(A) of a set A is the smallest convex set containing A, that is, co(A) is the intersection of all convex sets containing A. For finite dimensions, the following theorem gives necessary and sufficient conditions for the MSAP and hence for the existence of a solution to Problem 1 when p = 2 in (1).

Theorem 6. Suppose has dimension . Then the following are equivalent(i) satisfies MSAP;(ii) is closed;(iii).

The necessary and sufficient conditions for the existence of solutions in infinite dimensions for p = 2 can be found in [20] but are much more complicated. However, no such results are known for the existence of solutions to Problem 1 when p ≠ 2.

3.2. Search Algorithms for Problem 1

Searching for a solution to Problem 1 is easier when l = 1, since the problem is then a linear (least squares) problem. Using an algorithm A for solving this simpler problem, the more difficult problem for l > 1 can be solved by applying A multiple times within an iterative algorithm, as follows.

Let P be the set of all partitions of the data W into l subsets, that is, (W_1, …, W_l) ∈ P if W_i ⊂ W, W_i ∩ W_j = ∅ when i ≠ j, and W_1 ∪ ⋯ ∪ W_l = W. (1) Let (W_1, …, W_l) be a partition of the data W. For each W_i, use Algorithm A to find the subspace S_i ∈ C that is nearest to W_i, in the sense that it minimizes e(W_i, S_i) = Σ_{w ∈ W_i} d^p(w, S_i). We obtain a sequence of subspaces S_1, …, S_l. (2) Construct a new partition by reassigning each data point to its nearest subspace among S_1, …, S_l and by grouping together those points that are assigned to the same subspace. (3) Iterate between the two steps as described in Algorithm 2.

1: Pick any partition (W_1, …, W_l) of W.
2: For each subset W_i in the partition find the subspace S_i = A(W_i) that minimizes the expression
 e(W_i, S_i) = Σ_{w ∈ W_i} d^p(w, S_i). Set e_old = ∞ and e_new = Σ_i e(W_i, S_i).
3: while e_new < e_old do
4:  for all i from 1 to l do
5:   Update W_i = {w ∈ W : d(w, S_i) ≤ d(w, S_j) for all j}
6:   Update S_i = A(W_i) and e(W_i, S_i) = Σ_{w ∈ W_i} d^p(w, S_i)
7:  end for
8:  Update e_old = e_new and e_new = Σ_i e(W_i, S_i)
9: end while
10: Output {S_1, …, S_l} = {A(W_1), …, A(W_l)} and the corresponding partition.
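A compact NumPy sketch of this alternating scheme (ours; it assumes H = R^D, C the subspaces of dimension at most d, and p = 2, so that the subspace update of Algorithm A is an SVD):

import numpy as np

def fit_subspace(X, d):
    """Algorithm A for l = 1, p = 2: best d-dimensional subspace to the columns of X."""
    return np.linalg.svd(X, full_matrices=False)[0][:, :d]

def k_subspaces(W, l, d, n_iter=50, seed=0):
    """Alternate between assigning points to their nearest subspace and refitting (Algorithm 2 sketch)."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(l, size=W.shape[1])            # random initial partition
    for _ in range(n_iter):
        bases = [fit_subspace(W[:, labels == i], d) if np.any(labels == i) else fit_subspace(W, d)
                 for i in range(l)]
        # squared distance of each point w to subspace with orthonormal basis B: ||w||^2 - ||B^T w||^2
        dist = np.stack([np.sum(W ** 2, axis=0) - np.sum((B.T @ W) ** 2, axis=0) for B in bases])
        new_labels = np.argmin(dist, axis=0)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels, bases

A good initial partition (for example, one obtained from an algebraic method) can be passed in place of the random initialization to reduce the risk of converging to a poor local minimum.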

It can be shown that this algorithm always converges. However, the convergence may be to a local minimum instead of the global one. For this reason, a good initial partition is important. This initial partition can be supplied, for example, by a suitably modified version of the algebraic methods described in the previous section.

There are many iterative algorithms for finding a solution to the subspace segmentation problem or some of its special cases (see, e.g., [86, 87]). Most of them iterate between partitioning the data and finding the union of subspaces that is consistent with the partition. The general algorithm described below solves the subspace segmentation problem by searching for the minimizer of Problem 1.

Note that the cost functions e_new and e_old in the while loop of Algorithm 2 are the ones defined by (1) in Problem 1; each term e(W_i, S_i) corresponds to the case l = 1 applied to the cluster W_i.

Step 6 in Algorithm 2 (the subspace update) is problem dependent. For example, in the situation where H = R^D and C is the set of subspaces of dimensions no greater than d, this step can be solved using the singular value decomposition (SVD). A similar algorithm works in a much more general context, as described in [2].

4. Motion Segmentation

The problem of motion segmentation has been described in Section 1.2.4. This problem is a special case of subspace segmentation in which H = R^{2F} and C is the family of subspaces of dimensions no bigger than 4. Many algorithms have been developed to solve this problem, such as methods based on sparsity [10, 88–90], algebraic methods [1, 12, 91], statistical methods [76, 92–95], and iterative methods [22, 86]. The most successful methods, however, are all based on spectral clustering or some related method [22, 34, 36, 96, 97]. The main idea is that a similarity matrix is used to describe the "connection" between the data points. Once this similarity matrix is obtained, a classical clustering technique (such as k-means) is applied to a projection of the similarity matrix onto a low dimensional space (here projection is used loosely and is not necessarily an orthogonal projection). These methods are often tested and compared to the state-of-the-art methods on the Hopkins 155 Dataset [8], which serves as a benchmark database to evaluate motion segmentation algorithms. It contains two-motion and three-motion sequences. Corner features that are extracted and tracked across the frames are provided along with the dataset. The ground truth segmentations are also provided for comparison. Figure 1 shows two samples from the dataset with the extracted features.

4.1. Nearness to Local Subspace Algorithm

Since most spectral clustering algorithms share the same overall structure, we describe the Nearness to Local Subspace (NLS) algorithm, which is the best performing of the spectral-clustering-type methods as applied to the Hopkins 155 Dataset. Other spectral clustering based algorithms are discussed in Section 4.2.

The NLS method works whenever the dimensions of the subspaces are equal and known. First, a local subspace is estimated for each data point (vector). Then, the distances between these local subspaces and the data points are computed, and a distance matrix is generated. This is followed by the construction of a binary similarity matrix, obtained by applying a data-driven threshold to the distance matrix. Finally, the segmentation problem is converted to a one-dimensional data clustering problem.

The algorithm for subspace segmentation is given in Algorithm 3. It assumes that the subspaces have a common, known dimension d (for motion segmentation, d = 4). The details of the various steps are as follows.

Require: The data matrix W whose columns are drawn from subspaces of dimension d.
Ensure: Clustering of the feature points.
1: Compute the SVD of W as in (8).
2: Estimate the rank of W (denoted by r) if it is not known, for example, using (9) or any other appropriate choice.
3: Compute the matrix V_r^T consisting of the first r rows of V^T.
4: Normalize the columns of V_r^T.
5: Replace the data matrix W with V_r^T.
6: Find the angle between the column vectors of V_r^T and represent it as a matrix.
 {i.e., θ_{ij} = arccos(⟨v_i, v_j⟩).}
7: Sort the angles and find the k closest neighbors of each column vector.
8: for all column vectors v_i of V_r^T do
9:  Find the local subspace for the set X_i consisting of v_i and its k neighbors (see (10)).
  {Theoretically, k is at least d - 1. We can use the least squares approximation for the subspace
   (see the section Local Subspace Estimation). Let A_i denote the matrix whose columns form
  an orthonormal basis for the local subspace associated with v_i.}
10: end for
11: for i = 1 to m do
12:   for j = 1 to m do
13:   define D_{ij} = ||v_j - A_i A_i^T v_j||
14:  end for
15: end for {Build the distance matrix}
16: Sort the entries of the matrix D from smallest to highest values into the vector h and set the threshold to
  the value of the κth entry of the sorted and normalized vector h, where κ is chosen by the data-driven
  criterion described in the section Construction of the Binary Similarity Matrix.
17: Construct a similarity matrix S by setting all entries of D less than the threshold to 1 and by setting all
 other entries to 0. {Build the binary similarity matrix}
18: Normalize the rows of S using the ℓ1-norm to obtain S̃.
19: Perform the SVD S̃ = UΣV^T.
20: Cluster the columns of Σ_n U_n^T using k-means. {Σ_n U_n^T is the projection of the rows of S̃ onto the span of the first n singular vectors.}

Dimensionality Reduction and Normalization. A dimensionality reduction step is typical in any such algorithm, including those using spectral clustering. Let W be a D × m data matrix (D = 2F for motion segmentation) whose columns are drawn from a union of n subspaces, where each subspace has dimension at most d. The data is possibly perturbed by noise and may have other imperfections. One way to reduce the dimensionality of the problem is to use the SVD. Specifically, compute the SVD of W:
W = U Σ V^T,   (8)
where U is a D × D matrix, V is an m × m matrix, and Σ is a D × m matrix whose only nonzero entries are the diagonal entries σ_1 ≥ σ_2 ≥ ⋯ ≥ 0.

If the rank of the data matrix is not known, one can use the modal selection algorithm [34] to estimate it as
r = argmin_j { σ_{j+1}^2 / (σ_1^2 + ⋯ + σ_j^2) + κ j },   (9)
where σ_j is the jth singular value and κ is a suitable constant. Another possible model selection algorithm can be found in [98]. W_r = U_r Σ_r V_r^T is the best rank-r approximation of W, where U_r refers to the matrix that has the first r columns of U as its columns and V_r^T refers to the first r rows of V^T. In the case of motion segmentation, if there are n independent motions across the frames captured by a moving camera, the rank of W is at most 4n.

To reduce the dimensionality of the data, replace the data matrix W with the matrix V_r^T that consists of the first r rows of V^T. This step is justified by the following proposition from [22].

Proposition 7. Let and be and matrices. Let . Assume .(i)If then .(ii)If is full rank and then .

It should also be noted that this step reduces additive noise as well, especially in the case of light-tailed noise, for example, Gaussian noise. The number of subspaces corresponds to the number of moving objects. Dimensionality reduction corresponds to Steps 1, 2, and 3 in Algorithm 3.
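A sketch of the dimensionality reduction step (Steps 1 to 5 of Algorithm 3) in NumPy; the rank selection below is our reading of the modal selection criterion (9), and kappa is a tuning constant:

import numpy as np

def reduce_and_normalize(W, kappa=1e-3, r=None):
    """Keep the first r rows of V^T and normalize the columns to the unit sphere."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    if r is None:                                         # estimate the rank if it is not known
        costs = [s[j] ** 2 / np.sum(s[:j] ** 2) + kappa * j for j in range(1, len(s))]
        r = int(np.argmin(costs)) + 1
    V_r = Vt[:r, :]
    return V_r / np.linalg.norm(V_r, axis=0, keepdims=True), r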

Another type of data reduction is normalization. Specifically, the columns of V_r^T are normalized to lie on the unit sphere. This is because, by projecting the subspaces onto the unit sphere, we effectively reduce the dimensionality of the data by one. Moreover, the normalization gives equal contribution of the data matrix columns to the description of the subspaces. Note that the normalization can be done using the ℓ2 norms of the columns of V_r^T. This normalization procedure corresponds to Steps 4 and 5 in Algorithm 3.

Local Subspace Estimation. The data points (i.e., the column vectors v_i of V_r^T) that are close to each other are likely to belong to the same subspace. For this reason, a local subspace is estimated for each data point using its closest neighbors. This can be done by generating an angle-based distance matrix and then sorting each column of the distance matrix to find the neighbors of each v_i, the ith column of V_r^T.

Once the distance matrix between the points is generated, one can find, for each point v_i, a set X_i consisting of v_i and its k closest neighbors. Then a d-dimensional subspace that is nearest (in the least squares sense) to the data X_i is generated. This is accomplished using the SVD
X_i = U^{(i)} Σ^{(i)} (V^{(i)})^T.   (10)
Let A_i denote the matrix of the first d columns of U^{(i)} associated with v_i. Then the column space of A_i is the d-dimensional subspace nearest to X_i. Local subspace estimation corresponds to Steps 6 to 10 in Algorithm 3.
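A sketch of the local subspace estimation step (ours; it assumes the columns of V_r^T have already been normalized and that k is at least d - 1):

import numpy as np

def local_subspaces(V, k, d):
    """For each column v_i of V, return an orthonormal basis A_i of the d-dimensional subspace
    nearest (in the least squares sense) to v_i and its k closest neighbors."""
    m = V.shape[1]
    angles = np.arccos(np.clip(V.T @ V, -1.0, 1.0))       # angle-based distances between the points
    bases = []
    for i in range(m):
        nbrs = np.argsort(angles[i])[:k + 1]              # v_i itself plus its k closest neighbors
        X = V[:, nbrs]
        bases.append(np.linalg.svd(X, full_matrices=False)[0][:, :d])
    return bases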

Construction of the Binary Similarity Matrix. So far, we have associated a local subspace to each point v_i. Ideally, the points, and only those points, that belong to the same subspace as v_i should have zero distance from that local subspace. This suggests computing the distance of each point to each local subspace and forming a distance matrix D.

The distance matrix is generated as D_{ij} = ||v_j - A_i A_i^T v_j||_q. A convenient choice is q = 2. Note that, as D_{ij} decreases, the probability of v_j lying in the same subspace as v_i increases. Moreover, for q = 2, D_{ij} is the Euclidean distance of v_j to the local subspace associated with v_i.

Since we are not in the ideal case, a point v_j that belongs to the same subspace as v_i may have a nonzero distance to the local subspace of v_i. However, this distance is likely to be small compared to the distance between v_j and that local subspace when v_i and v_j do not belong to the same subspace. This suggests computing a threshold that distinguishes between these two cases and transforms the distance matrix into a binary matrix in which a zero entry means the two points are likely to belong to the same subspace, whereas an entry of one means they are not likely to belong to the same subspace.

To do this, we convert the distance matrix D into a binary similarity matrix S. This is done by applying a data-driven threshold as follows. (1) Create a vector h that contains the sorted entries of D, from the smallest to the highest values. Scale h so that its smallest value is zero and its largest value is one. (2) Set the threshold to the value of the κth entry of the sorted and normalized vector h, where κ is chosen so that a cost involving the characteristic function of a discrete set is minimized; the precise data-driven criterion is the one described in [1]. If the number of points in each subspace were approximately equal, then we would expect m/n points in each subspace and hence approximately m^2/n small entries (zero entries ideally) in D. However, this may not be the case in general. For this reason, we compute a data-driven threshold that distinguishes the small entries from the large entries. (3) Create the similarity matrix S from D by setting all entries of D that are less than the threshold to 1 and the others to 0.
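A sketch of the distance matrix and of the binarization step (ours). The threshold below is a simple stand-in, the midpoint of the largest gap in the sorted distances, used purely for illustration; the actual NLS threshold is the data-driven choice of [1] described above.

import numpy as np

def binary_similarity(V, bases):
    """Distances of every point to every local subspace, followed by a thresholded 0/1 similarity."""
    m = V.shape[1]
    Dmat = np.zeros((m, m))
    for i, A in enumerate(bases):
        resid = V - A @ (A.T @ V)                         # components orthogonal to local subspace i
        Dmat[i, :] = np.linalg.norm(resid, axis=0)        # Euclidean distances D_ij = d(v_j, S_i)
    h = np.sort(Dmat.ravel())
    gap = int(np.argmax(np.diff(h)))                      # stand-in threshold: midpoint of the largest gap
    threshold = (h[gap] + h[gap + 1]) / 2.0
    S = (Dmat < threshold).astype(float)                  # 1 = likely same subspace, 0 = likely different
    return S, Dmat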

Segmentation. The last step is to use the similarity matrix S to segment the data. To do this, we first normalize the rows of S using the ℓ1-norm, that is, S̃ = Δ^{-1} S, where Δ is the diagonal matrix with Δ_{ii} = Σ_j S_{ij}. The matrix S̃ is related to the random walk Laplacian [66]. Although other normalizations are possible for S, because of the geometry of the ℓq ball, ℓ1-normalization brings outliers closer to the cluster clouds (distances of outliers decrease monotonically as q decreases to 1). Since the SVD (which will be used next) is associated with ℓ2 minimization, it is sensitive to outliers. Therefore, ℓ1 normalization works best when the SVD is used.

Observe that the initial data segmentation problem has now been converted to the segmentation of 1-dimensional subspaces generated by the rows of S̃. This is because, in the ideal case, from the construction of S̃, if v_i and v_j are in the same subspace, then the ith and jth rows of S̃ are equal. Since there are n subspaces, there will be n such 1-dimensional subspaces.

Now the problem is again a subspace segmentation problem, but this time the data matrix is S̃, with each row as a data point. Also, each subspace is 1-dimensional, and there are n subspaces. Therefore, we can apply the SVD again to obtain S̃ = U Σ V^T. Using Proposition 7, it can be shown that the rank-n truncation can replace S̃, and we cluster the columns of Σ_n U_n^T, which is the projection of the data points (the rows of S̃) onto the span of the first n singular vectors. Since the problem is now only the segmentation of subspaces of dimension 1, we can use any traditional clustering algorithm, such as k-means, to cluster the data points. The segmentation corresponds to Steps 18 to 20 in Algorithm 3.
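A sketch of this final step (ours; a plain Lloyd iteration stands in for any off-the-shelf k-means):

import numpy as np

def segment(S, n, n_iter=100, seed=0):
    """l1-normalize the rows of S, project onto the top-n singular vectors, and run k-means."""
    St = S / np.sum(np.abs(S), axis=1, keepdims=True)     # row normalization (random walk style)
    U, s, Vt = np.linalg.svd(St, full_matrices=False)
    Y = (U[:, :n] * s[:n]).T                              # n x m coordinates of the m data points
    rng = np.random.default_rng(seed)
    centers = Y[:, rng.choice(Y.shape[1], n, replace=False)].copy()
    for _ in range(n_iter):                               # plain Lloyd (k-means) iterations
        d2 = np.sum((Y[:, :, None] - centers[:, None, :]) ** 2, axis=0)
        labels = np.argmin(d2, axis=1)
        for j in range(n):
            if np.any(labels == j):
                centers[:, j] = Y[:, labels == j].mean(axis=1)
    return labels

Chaining the sketches above (reduce_and_normalize, local_subspaces, binary_similarity, segment) gives a rough, end-to-end NLS-style pipeline; it is only an illustration and not the reference implementation of [1].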

4.2. Other Spectral Clustering Methods

Other subspace clustering methods use essentially the same general algorithm as above; the main difference is the construction of the similarity matrix. For example, Yan and Pollefeys' method estimates a subspace for each point and then uses the chordal distance between the local subspaces to construct a similarity matrix. The algorithm of Elhamifar and Vidal [88, 89] uses sparsity to compute a similarity matrix based on sparse representations of the data; the sparse representations are found using the standard minimization techniques of compressed sampling. We have tested these algorithms using both types of minimization and found that both cases produce essentially the same results. Thus, it is our conclusion that it is the spectral clustering performed on the similarity matrix that is the main reason for the good performance of these and other related algorithms.

4.3. Comparison of Motion Segmentation Algorithms

Tables 1, 2, and 3 display some of the experimental results for the Hopkins 155 Dataset. Seven approaches are compared for the motion segmentation task: GPCA [12], RANSAC [99], local subspace affinity (LSA) [34], MSL [93], agglomerative lossy compression (ALC) [100], sparse subspace clustering (SSC) [88], and NLS. An evaluation of these algorithms is presented in [88], with a minor error in the tabulated results for the articulated three-motion analysis of SSC-N. SSC-B and SSC-N correspond to Bernoulli and normal random projections, respectively [88]. Table 1 displays the misclassification rates for the two-motion video sequences, Table 2 shows the misclassification rates for the three-motion sequences, and Table 3 presents the misclassification rates for all of the video sequences. It can be seen that the NLS algorithm outperforms all of the other algorithms.

Acknowledgment

This research is supported in part by NSF Grant DMS-110863.