Abstract

We investigate the conditions under which nonnegative matrix factorization (NMF) is unique and introduce several theorems that can determine whether the decomposition is in fact unique. The theorems are illustrated by several examples showing their use and their limitations. We show that corrupting a matrix with a unique NMF by additive noise leads to a noisy estimate of the noise-free unique solution. Finally, we use a stochastic view of NMF to analyze which characterizations of the underlying model result in an NMF with small estimation errors.

1. Introduction

Large quantities of positive data occur in research areas such as music analysis, text analysis, image analysis, and probability theory. Before deductive science is applied to large quantities of data, it is often appropriate to reduce the data by preprocessing, for example, by matrix rank reduction or by feature extraction. Principal component analysis is an example of such preprocessing. When the original data is nonnegative, it is often desirable to preserve this property in the preprocessing. For example, elements in a power spectrogram, probabilities, and pixel intensities should still be nonnegative after the processing to be meaningful. This has led to the construction of algorithms for matrix rank reduction and feature extraction that generate nonnegative output. Many of these algorithms are related to the nonnegative matrix factorization (NMF) algorithm proposed by Lee and Seung [1, 2]. NMF algorithms factorize a nonnegative matrix $V \in \mathbb{R}_+^{n \times m}$ or $R \in \mathbb{R}_+^{n \times m}$ into two nonnegative matrices $W \in \mathbb{R}_+^{n \times r}$ and $H \in \mathbb{R}_+^{r \times m}$:

$$V \approx R = WH. \qquad (1)$$

There are no closed-form solutions to the problem of finding $W$ and $H$ given $V$, but Lee and Seung [1, 2] proposed two computationally efficient algorithms for minimizing the difference between $V$ and $WH$ for two different error functions. Since then, numerous other algorithms have been proposed (see [3]).
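
As a concrete reference point, the sketch below implements the well-known Lee–Seung multiplicative updates for the squared Frobenius error $\|V - WH\|_F^2$ using NumPy. The function name `nmf_multiplicative`, the random initialization, and the small constant `eps`, which guards against division by zero, are our own choices and are not details taken from [1, 2].

```python
import numpy as np


def nmf_multiplicative(V, r, n_iter=500, seed=0, eps=1e-12):
    """Approximate a nonnegative (n, m) matrix V by W @ H with W (n, r) and H (r, m).

    Uses the Lee-Seung multiplicative updates for the squared Frobenius error.
    Nonnegative starting values stay nonnegative because every update
    multiplies by a ratio of nonnegative quantities.
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, r)) + eps
    H = rng.random((r, m)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


rng = np.random.default_rng(1)
V = rng.random((6, 3)) @ rng.random((3, 8))  # nonnegative and exactly rank 3
W, H = nmf_multiplicative(V, r=3)
print(np.linalg.norm(V - W @ H) / np.linalg.norm(V))  # relative fitting error
```

The objective is nonconvex, so different seeds can converge to different factorizations. For that reason, it is common to restart such an algorithm from several initial points and keep the best fit.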

An interesting question is whether the NMF of a particular matrix is unique. The importance of this question depends on the particular application of NMF. There are two different viewpoints when using a model like NMF. Either one believes that the model describes nature and that the variables $W$ and $H$ have a physical meaning, or one believes that the model can capture the part of interest even though there is no one-to-one mapping between the model parameters and the physical system. When using NMF, one can wonder whether $V$ is a disturbed version of some underlying $WH$ or whether the data is constructed by another model; in other words, whether a ground truth $W$ and $H$ exists. These questions are important when evaluating whether it is a problem that there is another NMF solution, $(W', H')$, to the same data, that is,

$$V \approx R = WH = W'H'.$$

If NMF is used even though the data is not assumed to be generated by (1), it may not be a problem that there are several other solutions. On the other hand, if one assumes that a ground truth exists, it may be a problem if the model is not detectable, that is, if it is not possible to find $W$ and $H$ from the data matrix $V$.

The first articles on the subject were two correspondences between Berman and Thomas. In [4], Berman asked for what amounts to a simple characterization of the class of nonnegative matrices for which an NMF exists. As we shall see, the answer by Thomas [5] can be transferred into an NMF uniqueness theorem.

The first article investigating the uniqueness of NMF is that of Donoho and Stodden [6]. They use convex duality to conclude that the NMF solution is unique in some situations. These are situations where the column vectors of $W$ “describe parts” and are therefore nonoverlapping and thereby orthogonal.

Simultaneously with the development of NMF, Plumbley [7] worked on nonnegative independent component analysis. One of the problems there is to estimate a rotation matrix $Q$ from observations of the form $Q\mathbf{s}$, where $\mathbf{s}$ is a nonnegative vector. In this setup, Plumbley investigates a property of a nonnegative independent and identically distributed (i.i.d.) vector $\mathbf{s}$ such that $Q$ can be estimated. He shows that if the elements in $\mathbf{s}$ are grounded and a sufficiently large set of observations is used, then $Q$ can be estimated. The uniqueness constraint in [7] is a statistical condition on $\mathbf{s}$.

The result in [7] is highly relevant to NMF uniqueness because, in most cases, new NMF solutions have the forms $WQ$ and $Q^{-1}H$, as described in Section 3. By using Plumbley's result twice, a restricted uniqueness theorem for NMF can be constructed.

In this paper, we investigate the circumstances under which the NMF of an observed nonnegative matrix is unique. We present novel necessary and sufficient conditions for uniqueness. Several examples illustrating these conditions and their interpretations are given. Additionally, we show that NMF is robust to additive noise. More specifically, we show that it is possible to obtain accurate estimates of $W$ and $H$ from noisy data when the generating NMF is unique. Lastly, we consider the generating NMF as a stochastic process and show that particular classes of such processes almost surely result in unique NMFs.

This paper is structured as follows. Section 2 introduces the notation, some definitions, and basic results. A precise definition and two characterizations of a unique NMF are given in Section 3. The minimum constraints on $W$ and $H$ for a unique NMF are investigated in Section 4. Conditions and examples of a unique NMF are given in Section 5. Section 6 shows that when noise is added to a data matrix with a unique NMF, it is possible to bound the error of the estimates of $W$ and $H$. A probabilistic view of uniqueness is considered in Section 7. The implications of the theorems are discussed in Section 8, and Section 9 concludes the paper.

2. Fundamentals

Here we introduce convex duality, which is the framework of the paper, but first we define the notation to be used. Nonnegative real numbers are denoted by $\mathbb{R}_+$, $\|\cdot\|_F$ denotes the Frobenius norm, and $\operatorname{span}(\cdot)$ is the space spanned by a set of vectors. Each type of variable has its own font, so that scalars, column vectors, row vectors, matrices, sets, and random variables can be told apart. Moreover, $x_i$ is the $i$th element of the vector $\mathbf{x}$. When a condition for a set is used to describe a matrix, it refers to the set of column vectors in the matrix. The NMF is symmetric in $W$ and $H$, so the theorems for one matrix may also be used for the other.

In this paper, we use a geometric interpretation of NMF similar to the one used in [5, 6]. For that, we need the following definitions.

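Definition 1. The positive span is given by $\operatorname{span}_+(\mathbf{b}_1, \ldots, \mathbf{b}_d) = \{\mathbf{v} = \sum_i a_i \mathbf{b}_i \mid \mathbf{a} \in \mathbb{R}_+^d\}$.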

In some literature, the positive span is called the conical hull.

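Definition 2. A set $\mathcal{A}$ is called a simplicial cone if there is a set $\mathcal{B}$ such that $\mathcal{A} = \operatorname{span}_+(\mathcal{B})$. The order of a simplicial cone $\mathcal{A}$ is the minimum number of elements in $\mathcal{B}$.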

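Definition 3. The dual to a set $\mathcal{A}$, denoted $\mathcal{A}^*$, is given by $\mathcal{A}^* = \{\mathbf{v} \in \mathbb{R}^n \mid \mathbf{v}^T\mathbf{a} \ge 0 \text{ for all } \mathbf{a} \in \mathcal{A}\}$.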
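
For an order-$r$ simplicial cone in $\mathbb{R}^r$ whose $r$ generators are linearly independent, membership in the positive span reduces to solving one linear system. The coefficients are unique, so a point lies in the cone exactly when all of them are nonnegative. The sketch below assumes this square, invertible case only. `in_simplicial_cone` is a hypothetical helper name.

```python
import numpy as np


def in_simplicial_cone(v, B, tol=1e-10):
    """Return True if v lies in span_+ of the columns of the invertible matrix B."""
    coeffs = np.linalg.solve(B, v)  # the unique a with B @ a = v
    return bool(np.all(coeffs >= -tol))


B = np.array([[1.0, 1.0],
              [0.0, 1.0]])  # generators (1, 0) and (1, 1)
print(in_simplicial_cone(np.array([2.0, 1.0]), B))  # True:  1*(1, 0) + 1*(1, 1)
print(in_simplicial_cone(np.array([0.0, 1.0]), B))  # False: needs -1*(1, 0) + 1*(1, 1)
```

The positive orthant is the special case `B = np.eye(r)`, where the test reduces to checking the signs of `v`.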

The following lemma is easy to prove and will be used subsequently. For a more general introduction to convex duality, see [8].

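Lemma 1. (a) If $\mathcal{A} = \operatorname{span}_+(\mathbf{b}_1, \ldots, \mathbf{b}_d)$, then $\mathbf{y} \in \mathcal{A}^*$ if and only if $\mathbf{y}^T\mathbf{b}_i \ge 0$ for all $i$.
(b) If $\mathcal{A} = \operatorname{span}_+(B^T)$ and $B \in \mathbb{R}^{d \times d}$ is invertible, then $\mathcal{A}^* = \operatorname{span}_+(B^{-1})$.
(c) If $\mathcal{A} \subseteq \mathcal{B}$, then $\mathcal{B}^* \subseteq \mathcal{A}^*$.
(d) If $\mathcal{A}$ and $\mathcal{B}$ are closed simplicial cones and $\mathcal{A}^* = \mathcal{B}^*$, then $\mathcal{A} = \mathcal{B}$.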
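
Parts (a) and (b) of Lemma 1 can be checked numerically. The sketch below tests dual membership through the generator inequalities of part (a). It then confirms, for one arbitrarily chosen invertible matrix $B$, that the columns of $B^{-1}$ lie in the dual of the cone spanned by the rows of $B$, as part (b) predicts. `in_dual` is a hypothetical helper name, and the matrix is an illustrative assumption.

```python
import numpy as np


def in_dual(y, G, tol=1e-10):
    """Lemma 1(a): y is in the dual of span_+(columns of G) iff y^T g >= 0 for every column g."""
    return bool(np.all(G.T @ y >= -tol))


B = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])  # invertible (det = 3)
B_inv = np.linalg.inv(B)

# The cone A is spanned by the rows of B, i.e. by the columns of B^T.
# Since B @ B_inv = I, each column of B_inv has nonnegative inner products with the rows of B.
for k in range(3):
    print(k, in_dual(B_inv[:, k], B.T))

# A nonnegative combination of the columns of B_inv stays in the dual ...
c = np.array([0.5, 2.0, 1.0])
print(in_dual(B_inv @ c, B.T))
# ... while flipping the sign of one column moves it out of the dual.
print(in_dual(-B_inv[:, 0], B.T))
```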

3. Dual Space and the NMF

In this section, our definition of a unique NMF and some general conditions for a unique NMF are given. As a starting point, let us assume that both $W$ and $H$ have full rank, that is, $r = \operatorname{rank}(R)$.

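Let $W_1$ and $H_1$ be any matrices that fulfil $R = W_1H_1$. Then $\operatorname{span}(W) = \operatorname{span}(R) = \operatorname{span}(W_1)$. The column vectors of $W$ and $W_1$ are therefore both bases for the same space, and as a result there exists a basis shift matrix $Q \in \mathbb{R}^{r \times r}$ such that $W_1 = WQ$. It follows that $H_1 = Q^{-1}H$. Therefore, all NMF solutions where $r = \operatorname{rank}(R)$ are of the form $R = WQQ^{-1}H$. In these situations, the ambiguity of the NMF is the $Q$ matrix. Note that if $r > \operatorname{rank}(R)$, the above arguments are not valid because $\operatorname{rank}(W_1)$ can differ from $\operatorname{rank}(W)$, and thereby $\operatorname{span}(W_1) \ne \operatorname{span}(W)$.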
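
The argument above can be replayed numerically. In the sketch below, $W$ and $H$ are random positive matrices, so $r = \operatorname{rank}(R)$ holds with probability one. A second factorization is produced with an arbitrary invertible $Q$, and $Q$ is then recovered from the two factorizations alone through the pseudoinverse of $W$. The sizes and the perturbation size are illustrative assumptions. Whether $WQ$ and $Q^{-1}H$ remain nonnegative is a separate question, and that question is what the uniqueness results address.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r, m = 6, 3, 8
W = rng.random((n, r))
H = rng.random((r, m))
R = W @ H
assert np.linalg.matrix_rank(R) == r

# Any invertible basis shift Q yields a second factorization with the same product.
Q = np.eye(r) + 0.05 * rng.standard_normal((r, r))
W1 = W @ Q
H1 = np.linalg.solve(Q, H)  # Q^{-1} H without forming the inverse explicitly
print(np.allclose(W1 @ H1, R))

# Conversely, because W has full column rank, Q is determined by W and W1.
Q_rec = np.linalg.pinv(W) @ W1
print(np.allclose(Q_rec, Q), np.allclose(np.linalg.solve(Q_rec, H), H1))

# Nonnegativity of the new pair is not guaranteed; it depends on Q.
print(bool((W1 >= 0).all() and (H1 >= 0).all()))
```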

Example 1. The following is an example of a matrix with two NMF solutions but no $Q$ matrix that connects them. We mention in passing that Thomas [5] uses this matrix to illustrate a related problem. This completes the example.

Lemma 2 (Minc [9, Lemma 1.1]). The inverse of a nonnegative matrix is nonnegative if and only if it is a scaled permutation.

Lemma 2 shows that all NMF solutions of the forms $WQ$ and $Q^{-1}H$, where $Q$ is a scaled permutation, are valid. It follows that NMF can only be unique up to permutation and scaling. This leads to the following definition of a unique NMF in this paper.
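
Lemma 2 gives a simple computational test. A nonnegative invertible $Q$ has a nonnegative inverse exactly when it has one positive entry in every row and every column. The helper below, `is_scaled_permutation` (a hypothetical name), implements that test with a numerical tolerance. The two examples compare its answer with the sign pattern of the actual inverse.

```python
import numpy as np


def is_scaled_permutation(Q, tol=1e-12):
    """True if Q is square, nonnegative, and has exactly one positive entry per row and column."""
    Q = np.asarray(Q, dtype=float)
    if Q.ndim != 2 or Q.shape[0] != Q.shape[1] or np.any(Q < -tol):
        return False
    support = Q > tol
    return bool(np.all(support.sum(axis=0) == 1) and np.all(support.sum(axis=1) == 1))


for Q in (np.array([[0.0, 2.0], [3.0, 0.0]]),   # a permutation times a positive scaling
          np.array([[1.0, 1.0], [0.0, 1.0]])):  # nonnegative, but not a scaled permutation
    Q_inv = np.linalg.inv(Q)
    # prints "True True" for the first matrix and "False False" for the second
    print(is_scaled_permutation(Q), bool(np.all(Q_inv >= 0)))
```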

Definition 4. A matrix $R$ has a unique NMF if the ambiguity $Q$ is a permutation and a scaling of the columns in $W$ and the rows in $H$.

The scaling and permutation ambiguity in the uniqueness definition is a well-known ambiguity that occurs in many blind source separation problems. With this definition of a unique NMF, it is possible to give the following two characterizations of a unique NMF.

Theorem 1. If $r = \operatorname{rank}(R)$, an NMF is unique if and only if the positive orthant is the only $r$-order simplicial cone $\mathcal{Q}$ such that $\operatorname{span}_+(W^T) \subseteq \mathcal{Q} \subseteq \operatorname{span}_+(H)^*$.

Proof. The proof follows from the analysis of the $Q$ matrix above in combination with Lemma 1(b). The theorem can also be proved by following the steps of the proof in [5].

Theorem 2 (see [6]). The NMF is unique if and only if there is only one $r$-order simplicial cone $\mathcal{Q}$ such that $\operatorname{span}_+(R) \subseteq \mathcal{Q} \subseteq \mathcal{P}$, where $\mathcal{P}$ is the positive orthant.

Proof. The proof follows directly from the definitions.

The first characterization is inspired by [5], and the second characterization is implicitly introduced in [6]. Note that the two characterizations analyze the problem from two different viewpoints. Theorem 1 takes a known pair $W$ and $H$ as its starting point and looks at the solution from the “inside,” that is, the $r$-dimensional space of row vectors in $W$ and column vectors in $H$. Theorem 2 looks at the problem from the “outside,” that is, the $n$-dimensional column space of $R$.
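
Theorem 1 suggests a numerical probe of nonuniqueness. The idea is to exhibit an invertible $Q$ that is not a scaled permutation yet keeps $WQ$ and $Q^{-1}H$ nonnegative. The sketch below checks one candidate $Q$. Finding such a $Q$ proves that the NMF is not unique, whereas failing for one candidate proves nothing. The helper name and the test matrices are our own illustrative choices.

```python
import numpy as np


def is_alternative_nmf(W, H, Q, tol=1e-12):
    """True if (W Q, Q^{-1} H) is again a nonnegative factorization of W H."""
    W1 = W @ Q
    H1 = np.linalg.solve(Q, H)
    return bool(np.all(W1 >= -tol) and np.all(H1 >= -tol))


Q = np.array([[1.0, 0.1],
              [0.0, 1.0]])  # invertible, but not a scaled permutation

# Strictly positive factors leave room to shear the cone, so another NMF exists.
W_pos = np.array([[2.0, 1.0], [1.0, 2.0]])
print(is_alternative_nmf(W_pos, W_pos.T, Q))       # True

# With W = H = I, both cones in Theorem 1 equal the positive orthant, and by
# Lemma 2 every Q that is not a scaled permutation fails.
print(is_alternative_nmf(np.eye(2), np.eye(2), Q))  # False
```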

4. Matrix Conditions

If the NMF of $R$ is unique, then both $W$ and $H$ have to be unique, respectively. That is, there is only one NMF of $W$ and one of $H$, namely, $W = WI$ and $H = IH$. In this section, a necessary condition for $W$ and $H$ is given, and a sufficient condition is shown.

The following definition will be shown to be a necessary condition both for the set of row vectors in $W$ and for the set of column vectors in $H$.

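Definition 5. A set $\mathcal{S}$ of vectors in $\mathbb{R}_+^r$ is called boundary close if for all $j \ne k$ and $\epsilon > 0$ there is an element $\mathbf{s} \in \mathcal{S}$ such that $\epsilon s_j > s_k$.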

In the case of closed sets, the boundary close condition is that for all $j \ne k$ there is an element $\mathbf{s} \in \mathcal{S}$ with $s_k = 0$ and $s_j > 0$. In this section, the sets will be finite (and therefore closed), but in Section 7 the general definition above is needed.
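
For a finite set of vectors, the closed-set form of the boundary close condition can be checked directly. Assuming that form, the requirement is that for every ordered pair of distinct coordinates $(j, k)$, some vector has a positive $j$th element and a zero $k$th element. The sketch below uses a numerical tolerance for "zero", and the function name is hypothetical. Each row of the input is one vector of the set, so the function can be applied directly to the rows of $W$.

```python
import numpy as np
from itertools import permutations


def is_boundary_close_finite(S, tol=1e-12):
    """Finite-set boundary-close test; each row of S is one vector of the set."""
    S = np.asarray(S, dtype=float)
    r = S.shape[1]
    for j, k in permutations(range(r), 2):  # all ordered pairs with j != k
        if not np.any((S[:, j] > tol) & (np.abs(S[:, k]) <= tol)):
            return False
    return True


print(is_boundary_close_finite(np.eye(3)))                    # True
print(is_boundary_close_finite(np.ones((4, 3))))              # False: no zeros at all
print(is_boundary_close_finite(np.ones((3, 3)) - np.eye(3)))  # True
```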

Theorem 3. The set of row vectors in $W$ has to be boundary close for the corresponding NMF to be unique.

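Proof. If the set of row vectors in $W$ is not boundary close, there exist indexes $j \ne k$ and an $\epsilon > 0$ such that, in every row vector of $W$, the $k$th element is at least $\epsilon$ times the $j$th element. Let $\mathcal{Q} = \operatorname{span}_+(\mathbf{q}_1, \ldots, \mathbf{q}_r)$, where $\mathbf{q}_j = \mathbf{e}_j + \epsilon\mathbf{e}_k$, $\mathbf{q}_i = \mathbf{e}_i$ for $i \ne j$, and $\mathbf{e}_i$ denotes the $i$th standard basis vector. This set fulfils the condition $\operatorname{span}_+(W^T) \subseteq \mathcal{Q} \subseteq \operatorname{span}_+(H)^*$ and differs from the positive orthant. By Theorem 1, we therefore conclude that the NMF cannot be unique.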
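
The construction in the proof can be turned into an explicit second factorization. Assume that every row of $W$ satisfies $w_k \ge \epsilon w_j$ for some $j \ne k$ and $\epsilon > 0$. Let $G = I + \epsilon\,\mathbf{e}_k\mathbf{e}_j^T$, so that column $j$ of $G$ is $\mathbf{e}_j + \epsilon\mathbf{e}_k$. Then the pair $(WG^{-T}, G^TH)$ has the same product as $(W, H)$. The new $W$ differs only in column $k$, which becomes $\mathbf{w}_k - \epsilon\mathbf{w}_j$ and is nonnegative by assumption. The new $H$ differs only in row $j$, which becomes $\mathbf{h}_j + \epsilon\mathbf{h}_k$. This is our own sketch of that step, with a hypothetical helper name and example matrices.

```python
import numpy as np


def shear_alternative(W, H, j, k, eps):
    """Second NMF (W G^{-T}, G^T H) with G = I + eps * e_k e_j^T (hypothetical helper).

    Assumes W[:, k] >= eps * W[:, j] in every row, so the new W stays nonnegative.
    """
    r = W.shape[1]
    G = np.eye(r)
    G[k, j] = eps                  # column j of G is e_j + eps * e_k
    W1 = W @ np.linalg.inv(G).T    # only column k changes: W[:, k] - eps * W[:, j]
    H1 = G.T @ H                   # only row j changes: H[j] + eps * H[k]
    return W1, H1


W = np.array([[1.0, 2.0],
              [2.0, 3.0],
              [1.0, 1.0]])  # no zeros; column 1 >= 0.5 * column 0 in every row
H = np.eye(2)
W1, H1 = shear_alternative(W, H, j=0, k=1, eps=0.5)
print(W1)                            # [[1, 1.5], [2, 2], [1, 0.5]]
print(H1)                            # [[1, 0.5], [0, 1]]
print(np.allclose(W1 @ H1, W @ H))   # True
```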

The following example shows that uniqueness is not determined only by the row vectors of $W$ with small elements.

Example 2. The following is an example where is not unique but is.
Let Here is boundary close but not unique since . The uniqueness of can be verified by plotting the matrix as shown in Figure 1 and observing that the conditions of Theorem 1 are fulfilled. This completes the example.

In three dimensions, as in Example 2, it is easy to investigate whether a boundary close is unique: if , then can only have two types of structure, either the trivial (desired) solution where or a solution where only the diagonal of is zero. In higher dimensions, the number of nontrivial solutions increases, and it becomes more complicated to investigate all possible nontrivial structures. For example, if is the matrix from Example 2, then the matrix is boundary close and can be decomposed in several ways. Instead of seeking necessary and sufficient conditions for a unique , we give a sufficient condition that is not much stronger than the necessary one. In this sufficient condition, we focus only on the row vectors of with a zero (or very small) element.

Definition 6. A set of vectors in is called strongly boundary close if it is boundary close, and there exists a and a numbering of the elements in the vectors such that for all and there are vectors from that fulfil the following:
(1) for all ; and
(2) , where is the “condition number” of the matrix defined as the ratio between the largest and smallest singular values [10, page 81], and is a projection matrix that picks the last element of a vector in .

Theorem 4. If is strongly boundary close, then is unique.

The proof is quite technical and is therefore given in the Appendix. The most important thing to notice is that the necessary condition in Theorem 3 and the sufficient condition in Theorem 4 are very similar. The first item in the strongly boundary close definition states that there have to be several vectors with small values. The second item ensures that the vectors with small values are linearly independent in the last elements.

5. Uniqueness of R

In this section, a condition for a unique $R$ is analyzed. First, Example 3 is used to investigate when a strongly boundary close pair $W$ and $H$ is unique. The section ends with a constraint on $W$ and $H$ that results in a unique NMF.

Example 3. This is an investigation of the uniqueness of $R$ when $W$ and $H$ are given as where . Both $W$ and $H$ are strongly boundary close, and the parameter can be calculated as The equation above shows that small will result in a close to one and an close to one results in a large . In Figure 2, the matrix is plotted for . The dashed line is the desired solution and is repeated in all figures. It is seen that the shaded area decreases when increases, and that the solid border increases when increases. For all -values, both the shaded area and the solid border intersect the dashed triangle. Therefore, it is not possible to get another solution by simply increasing/decreasing the desired solution. The figure shows that the NMF is unique for and not unique for , where the alternative solution is shown by a dotted line. That the NMF is not unique for can also be verified by selecting $Q$ to be the symmetric orthonormal matrix and seeing that both $WQ$ and $Q^{-1}H$ are nonnegative. If , then the matrix is given by This shows that needs no zeros for the NMF to be unique. This completes the example.

In the example above, $W$ equals $H^T$ and thereby fulfils the same constraints. In many applications, the meanings of $W$ and $H$ differ. In music analysis, for example, the column vectors of $W$ are spectra of notes and $H$ is a note activity matrix [11].

Next, we investigate how to construct an asymmetric uniqueness constraint.

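Definition 7. A set $\mathcal{S}$ of vectors in $\mathbb{R}_+^r$ is called sufficiently spread if for all $j$ and $\epsilon > 0$, there is an element $\mathbf{s} \in \mathcal{S}$ such that $\epsilon s_j > \sum_{i \ne j} s_i$.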

Note that in the definition of a sufficiently spread set, the $j$th element is larger than the sum. In the strongly boundary close definition, by contrast, the $j$th element is smaller than the sum.

Lemma 3. The dual space of a sufficiently spread set is the positive orthant.

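Proof. A sufficiently spread set is nonnegative, and the positive orthant is therefore part of the dual set for any sufficiently spread set. Let $\mathbf{y}$ be a vector with a negative $j$th element, $y_j < 0$, and select $\epsilon = -y_j / \max_i |y_i|$. In any sufficiently spread set, an $\mathbf{s}$ exists such that $\epsilon s_j > \sum_{i \ne j} s_i$, and therefore $\mathbf{y}^T\mathbf{s} = y_j s_j + \sum_{i \ne j} y_i s_i \le y_j s_j + \max_i |y_i| \sum_{i \ne j} s_i < y_j s_j + \max_i |y_i|\,\epsilon s_j = 0$. The vector $\mathbf{y}$ is therefore not in the dual to any sufficiently spread set.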

In the case of finite sets, the sufficiently spread condition is equivalent to requiring that a scaled version of every standard basis vector is part of the set. It is easy to verify that a sufficiently spread set is also strongly boundary close and that the parameter is one.
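
For finite sets, this gives a direct test: every coordinate direction must be represented by some vector with exactly one positive element. The sketch below (function name hypothetical) treats the rows of its input as the vectors, so the columns of $H$ are passed as `H.T`. The example matrix, whose columns are all 0/1 combinations of three components, is our own illustrative choice.

```python
import numpy as np
from itertools import product


def is_sufficiently_spread_finite(S, tol=1e-12):
    """True if, for every coordinate j, some row of S is a positive multiple of e_j."""
    S = np.asarray(S, dtype=float)
    positive = S > tol
    # Rows with a single positive entry and no negative entries are scaled basis vectors.
    scaled_basis = (positive.sum(axis=1) == 1) & np.all(S >= -tol, axis=1)
    covered = positive[scaled_basis].any(axis=0)
    return bool(covered.all())


H = np.array(list(product([0, 1], repeat=3)), dtype=float).T  # 3 x 8, all 0/1 columns
print(is_sufficiently_spread_finite(H.T))               # True: e_1, e_2, e_3 all occur
print(is_sufficiently_spread_finite(np.ones((5, 3))))   # False
```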

Theorem 5. If the columns of $H$ form a sufficiently spread set and the rows of $W$ form a strongly boundary close set, then the NMF of $R = WH$ is unique.

Proof. Lemma 3 states that the dual set of a sufficiently spread set is the positive orthant, and Theorem 4 states that $W$ is unique. By using (16) and Theorem 1, we conclude that the NMF of $R$ is unique.

Theorem 5 is a stronger version of the result of Donoho and Stodden [6, Theorem 1]. Theorem 1 in [6] also assumes that $H$ is sufficiently spread, but its condition on $W$ is stronger than the strongly boundary close assumption.

6. Perturbation Analysis

In the previous sections, we have analyzed situations with a unique solution. In this section, it is shown that in some situations the nonuniqueness can be seen as estimation noise on $W$ and $H$. The error function $J$ describes how close an estimated pair $(W', H')$ is to the true pair $(W, H)$. It measures the distance between the two pairs, minimized over all permutation matrices $P$ and all diagonal matrices $D$ applied to the estimate.
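
When the inner dimension is small, the minimization over permutations and scalings can be approximated simply. The sketch below is a simplified proxy rather than the exact error function. It removes the scaling ambiguity by normalizing every column of $W$ (moving the scale into $H$) and then searches all $r!$ permutations. The function names are hypothetical, and the columns of $W$ are assumed to be nonzero.

```python
import numpy as np
from itertools import permutations


def normalize(W, H):
    """Rescale so every column of W has unit norm; the product W @ H is unchanged."""
    d = np.linalg.norm(W, axis=0)
    return W / d, H * d[:, None]


def perm_scale_error(W, H, W_est, H_est):
    """Distance between (W, H) and (W_est, H_est) after normalization, minimized over permutations."""
    W, H = normalize(W, H)
    W_est, H_est = normalize(W_est, H_est)
    best = np.inf
    for p in permutations(range(W.shape[1])):
        p = list(p)
        err = np.linalg.norm(W - W_est[:, p]) + np.linalg.norm(H - H_est[p, :])
        best = min(best, err)
    return best


rng = np.random.default_rng(0)
W, H = rng.random((5, 3)), rng.random((3, 7))
perm, scale = [2, 0, 1], np.array([2.0, 3.0, 0.5])
W_est, H_est = W[:, perm] * scale, H[perm, :] / scale[:, None]
print(perm_scale_error(W, H, W_est, H_est))  # close to 0: same NMF up to permutation and scaling
```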

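Theorem 6. Let $R = WH$ be a unique NMF. Given some $\epsilon > 0$, there exists a $\delta > 0$ such that any nonnegative $V = R + N$, where

$$\|N\|_F < \delta, \qquad (17)$$

fulfils

$$J(W', H') < \epsilon, \qquad (18)$$

where

$$(W', H') = \arg\min_{W' \ge 0,\; H' \ge 0} \|V - W'H'\|_F. \qquad (19)$$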

The proof is given in the Appendix. The theorem states that if the observation $R$ is corrupted by additive noise, the result is a noisy estimate of $W$ and $H$. Moreover, Theorem 6 shows that small noise results in small estimation errors. In this section, the Frobenius norm is used in (17) and (19) to make Theorem 6 concrete. The theorem remains valid, with the same proof, if any continuous metric is used instead of the Frobenius norm in those equations.

Example 4. This example investigates the connection between the additive noise in $V$ and the estimation error of $W$ and $H$. The column vectors in $W$ are basis pictures of a man, a dog, and the sun, as shown in Figures 3(a), 3(b), and 3(c). In Figure 3(d), the sum of the three basis pictures is shown. The columns of the matrix $H$ are all combinations of the pictures, that is, all vectors in $\{0, 1\}^3$. Theorem 5 can be used to conclude that the NMF of $R = WH$ is unique because both $W$ and $H$ are sufficiently spread and thereby also strongly boundary close.


In the example, two different noise matrices, $N_1$ and $N_2$, are used. The matrix $N_1$ models noisy observations and has i.i.d. uniformly distributed random elements. The matrix $N_2$ has elements equal to minus one in the positions where $R$ has elements equal to two, and zero elsewhere. That is, $N_2$ is minus one in the positions where the dog and the man overlap. In this case, the error matrix $N_2$ simulates a model mismatch that occurs in the following two types of real-world data. If the data set is composed of pictures, the basis pictures will overlap, and a pixel in $V$ will show one basis picture rather than a mixture of the overlapping pictures. If the data is a set of amplitude spectra, the true model is an addition of complex values and not an addition of the amplitudes.
Figure 4 plots the estimation error of the factorization as a function of the norm of the error matrix, $\|N\|_F$. An estimate of the pair $(W, H)$ is calculated by using the iterative algorithm for Frobenius norm minimization by Lee and Seung [2]. The algorithm is run for 500 iterations and is started from 100 different positions. The decomposition that minimizes $\|V - WH\|_F$ is chosen, and $J$ is calculated numerically. Figure 4 shows that when the added error is small, it is possible to estimate the underlying parameters. When the norm of the added noise matrix increases, the two noise matrices, $N_1$ and $N_2$, behave differently. For $N_1$, the error of the estimate increases slowly with the norm of the added matrix. For $N_2$, the estimation error increases dramatically once the norm exceeds a certain value. In the simulation, we made the following observation, which can explain the difference in performance between the two types of noise. When $N_1$ is used, the basis pictures remain noisy versions of the man, the dog, and the sun. When $N_2$ is used and its norm exceeds that value, the basis pictures are the man excluding the overlap, the dog excluding the overlap, and the overlap of man and dog. Another way to describe the difference is that $N_2$ has rank one and the disturbance is in one dimension, whereas $N_1$ has full rank and the disturbance is in many dimensions. This completes the example.
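
The rank argument at the end of the example can be reproduced on a miniature, hypothetical version of the data: 1-D "pictures" of 12 pixels in which only the man and the dog overlap. The uniform noise $N_1$ has full rank. By contrast, $N_2$, which is minus one exactly where $R$ equals two, is minus the outer product of the overlap-pixel indicator and the indicator of the columns containing both the man and the dog, so its rank is one. The sizes and pixel positions are illustrative assumptions, not the pictures used in Figure 3.

```python
import numpy as np
from itertools import product

# Hypothetical 12-pixel stand-ins for the three basis pictures.
man = np.zeros(12)
man[0:5] = 1.0   # pixels 0-4
dog = np.zeros(12)
dog[3:8] = 1.0   # pixels 3-7, overlapping the man at pixels 3 and 4
sun = np.zeros(12)
sun[9:12] = 1.0  # pixels 9-11, overlapping nothing

W = np.stack([man, dog, sun], axis=1)                          # 12 x 3
H = np.array(list(product([0, 1], repeat=3)), dtype=float).T   # 3 x 8, all combinations
R = W @ H

rng = np.random.default_rng(0)
N1 = rng.uniform(size=R.shape)     # noisy-observation type of error
N2 = np.where(R == 2, -1.0, 0.0)   # model mismatch: an overlap pixel shows one picture, not two

print(np.linalg.matrix_rank(N1), np.linalg.matrix_rank(N2))
print(bool(np.all(R + N2 >= 0)))
```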
Corollary 1. Let $R = WH$ be a unique NMF and , where and . Given and there exists a such that if the largest absolute value of both and is smaller than , then where are any NMF of .

Proof. This follows directly from Theorem 6.

The corollary can be used in situations where there are small elements in and but no (or not enough) zero elements, as in the following example.

Example 5. Let , where is generated as in Example 3. Let all elements in both and be equal to . In Figure 5, is plotted when and . In this example, neither the shaded area nor the solid border intersects the desired solution. Therefore, it is possible to get other solutions by simply increasing/decreasing the desired solution. For , the corners of the solutions are close to the corners of the desired solution. When , the corners can be placed mostly on the solid border and still form a triangle that contains the shaded area. When , the corners can be anywhere on the solid border. This completes the example.

7. Probability and Uniqueness

In this section, the row vectors of $W$ and the column vectors of $H$ are seen as outcomes of two random variables. We investigate which characteristics of the sample space (the set of possible outcomes) of a random variable lead to a unique NMF.

Theorem 7. Let the row vectors of be generated by the random variable and let the column vectors of be generated by a random variable . If the sample space of is strongly boundary close and the sample space of is sufficiently spread, then for all and , there exist and such that where is any matrix such that and are nonnegative and the data size is such that and .

Proof. Scaling the data, , does not change the nonuniqueness of the solutions when measured by the $Q$ matrix. The proof is therefore done on the normalized versions of and . Let and be the normalized versions of and . There exist finite sets and of vectors in the closure of and that are strongly boundary close and sufficiently spread. By Theorem 5, it is known that is unique. By increasing the number of vectors sampled from and , for any , there will be two subsets of the vectors, and , that with a probability larger than any will fulfil It is possible to apply Corollary 1 to this subset. The fact that limiting is equivalent to limiting (21) when the vectors are normalized concludes the proof.

Example 6. Let all the elements in $H$ be i.i.d. exponential and therefore generated with a sufficiently spread sample space. Additionally, let each row in $W$ be i.i.d. exponential plus a random vector with the sample space and thereby strongly boundary close. In Figure 6, the above variables are shown for the following four matrix sizes . This completes the example.
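
The $H$ part of the example can be explored with a short simulation. For i.i.d. exponential columns, the quantity below measures how far a finite sample is from containing near-scaled versions of each standard basis vector. For each row $j$, it is the smallest ratio $\sum_{i \ne j} h_i / h_j$ over the columns, which should approach zero as more columns are drawn. The function name and the sample sizes are our own choices, and this sketch covers only $H$.

```python
import numpy as np


def spread_gap(H):
    """For each row j, the minimum over columns c of sum_{i != j} H[i, c] / H[j, c].

    Entries are assumed positive. Values near zero mean that column c is close to
    a scaled version of the j-th standard basis vector.
    """
    others = H.sum(axis=0, keepdims=True) - H
    return (others / H).min(axis=1)


rng = np.random.default_rng(0)
for m in (10, 100, 1000, 10000):
    H = rng.exponential(size=(3, m))
    print(m, spread_gap(H).max())  # worst row; tends to shrink as m grows
```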

8. Discussion

The approach in this paper is to investigate when nonnegativity leads to uniqueness in connection with NMF, $V \approx R = WH$. Nonnegativity is the only assumption of the theorems. The theorems therefore cannot be used as an argument that an NMF is nonunique if there is additional information about $W$ or $H$. An example with stronger uniqueness results is the sparse NMF algorithm of Hoyer [12], which is built on the assumption that the row vectors in $H$ have known ratios between the $L_1$ norm and the $L_2$ norm. Theis et al. [13] have investigated uniqueness in this situation and shown strong uniqueness results. Another example is data matrices with an added constant on each row. For this situation, the affine NMF algorithm [14] can make NMF unique even though the setup violates Theorem 3 in this paper.

As shown in Figure 4, the type of noise greatly influences the error curves. In some applications, noise is introduced because the additive model does not hold, for example, when $V$ consists of pictures or spectra. In such cases, it is possible to influence the noise by applying a nonlinear function to the elements of $V$. Such a nonlinear function is introduced in [15], and experiments show that it improves the results. A theoretical framework for finding good nonlinear functions would be interesting to investigate.

The sufficiently spread condition defined in Section 5 plays an important role in unique NMF because of Lemma 3. The sufficiently spread assumption appears indirectly in related areas, where it also leads to unique solutions. For example, in [7] the groundedness assumption leads to variables with a sufficiently spread sample space. If the matrix $H$ is sufficiently spread, then the columns in $W$ will occur (almost) alone as columns in $R$. Deville [16] uses this “occur alone” assumption, and thereby a sufficiently spread assumption, to make blind source separation possible.

9. Conclusion

We have investigated the uniqueness of NMF from three different viewpoints:

(i) uniqueness in noise-free situations;
(ii) the estimation error of the underlying model when noise is added to a matrix with a unique NMF; and
(iii) the random processes that lead to matrices where the underlying model can be estimated with small errors.

In doing so, we have shown that many novel and useful characterizations are possible, and these can serve as a theoretical underpinning for the numerous NMF algorithms. Several open issues remain in all three viewpoints that, if addressed, will give a better understanding of nonnegative matrix factorization.

Appendix

Proof of Theorem 4. The theorem states that is a unique NMF. To prove this, it is shown that the condition of Theorem 1 is fulfilled. The positive orthant is self-dual ($\mathcal{P} = \mathcal{P}^*$) and thereby , where is an $r$-order simplicial cone that contains . Let the set of row vectors in be denoted by . An $r$-order simplicial cone, like , is a closed set, and it therefore needs to contain the closure of , denoted by . The two items in Definition 6 of strongly boundary close can be reformulated for the closure, which contains the border:
(1) for all ,
(2) the vectors are linearly independent.
The rest of the proof follows by induction. If , then and is therefore unique. Let therefore . Then linearly independent vectors in have zero as the first element, and of the basis vectors therefore need to have zero in the first element. In other words, there is only one basis vector with a nonzero first element. Let us call this vector . For all there is a vector in which is nonnegative in the first element and zero in the th element, so all the elements in except the first have to be zero. The proof is completed by noting that if the first element is removed from the vectors in , the set is still strongly boundary close, and the problem therefore reduces to the $(r-1)$-dimensional problem.

Proof of Theorem 6. Let be the open set of all pairs that are close to and , Let be the set of all nonnegative pairs that are not in and where . The uniqueness of ensures that for all . The fact that the Frobenius norm is continuous, is a closed bounded set, and the statement above is positive ensures that since a continuous function attains its limits on a closed bounded set [17, Theorem 4.28]. The pairs that are not in and where can either be transformed by a diagonal matrix into a matrix pair from , , having the same product or it can be transformed into a pair where both and have large elements, that is, and thereby .
Select to be The error of the desired solution can be bounded by . Let be any matrix constructed by a nonnegative matrix pair not from . Because of the way is selected, . By the triangle inequality, we get All solutions that are not in therefore have a larger error than and will not be the minimizer of the error.

Acknowledgments

This research was supported by the Intelligent Sound project, Danish Technical Research Council Grant no. 26-02-0092. The work of M. G. Christensen is supported by the Parametric Audio Processing project, Danish Research Council for Technology and Production Sciences Grant no. 274-06-0521. Part of this work was previously presented at a conference [18].