Abstract

Nonnegative matrix factorization (NMF) is a popular method for the multivariate analysis of nonnegative data. It decomposes a data matrix into a product of two factor matrices whose entries are all restricted to be nonnegative. Orthogonal nonnegative matrix factorization (ONMF), a more recent variant, has demonstrated remarkable performance in clustering tasks such as gene expression classification. In this study, we introduce two convergent methods for solving ONMF. First, we design a convergent orthogonal algorithm based on the Lagrange multiplier method. Second, we propose an approach based on the alternating direction method. Finally, we demonstrate that the two proposed approaches tend to deliver higher-quality solutions and perform better in clustering tasks compared with a state-of-the-art ONMF algorithm.

1. Introduction

Nonnegative matrix factorization (NMF) has been investigated by many researchers, such as Paatero and Tapper [1]. However, it was only through the works of Lee and Seung published in 1999 and 2001 [2, 3] that this method gained popularity. Based on the argument that nonnegativity is crucial for human perception, they proposed a simple algorithm to find nonnegative representations of nonnegative data and images. The basic NMF decomposes a nonnegative data matrix into a pair of nonnegative matrices of lower rank:
$$V \approx WH,$$
where $V \in \mathbb{R}_{+}^{m \times n}$ denotes a nonnegative data matrix, and $W \in \mathbb{R}_{+}^{m \times r}$ and $H \in \mathbb{R}_{+}^{r \times n}$ denote the basis matrix and coefficient matrix, respectively. Here, $r$ denotes the number of factors, which is usually chosen so that $r \ll \min(m, n)$. Recently, NMF has proven to be useful for many applications in pattern recognition, multimedia, text mining, and clustering, as well as environment and DNA gene expression classifications [4–12].

In solving the NMF problem, the update rules given by Lee and Seung [3] take a multiplicative form and satisfy the nonnegative constraint implicitly and elegantly. The extension of the multiplicative updates provided by them [3] can be found, for instance, in the works of Sha et al. [10], whereby a multiplicative update rule is proposed to solve a nonnegative quadratic programming problem. However, the slow convergence of the multiplicative updating rule has been pointed out, and more efficient algorithms equipped with more powerful theoretical convergence properties have been introduced. These efficient algorithms are based on either the alternating nonnegative least squares (ANLS) framework or the hierarchical alternating least squares (HALS) method.

Recently, other forms of update rules have been studied to increase the speed of convergence. For instance, a quasi-Newton method for NMF was considered by Zdunek and Cichocki [13], and a projected gradient method was proposed by Hoyer and Lin [6, 14].

There has also been significant interest in orthogonal nonnegative matrix factorization (ONMF), which imposes orthogonality constraints on the factor matrix [15–18]. It has been proven that ONMF is equivalent to K-means clustering [16, 18]. Moreover, as a result of the nonnegativity and orthogonality constraints, the orthogonal factor matrix is naturally sparse. In such cases, the factorization is essentially unique, and we can ignore the permutation problem.

Ding et al. [16] and Pompili et al. [18] proposed two different kinds of methods to solve the ONMF problem. In the study by Ding et al. [16], the ONMF algorithms strictly enforce nonnegativity for each iterate while trying to achieve orthogonality at the limit. This can be done using a proper penalization term [16], a projection matrix formulation [19], or a suitable search direction [15]. In the study by Pompili et al. [18], the authors proposed a method working the opposite way: at each iteration, a projected gradient scheme ensures that the orthogonal factor iterates are orthogonal but not necessarily nonnegative. However, these are still not convergent algorithms for ONMF. In this study, we first briefly introduce NMF and ONMF algorithms. Next, based on the Lagrange multiplier, we design a convergent algorithm, following a strategy similar to the one presented by Lin [20]. Because NMF algorithms based on the alternating direction method (ADM) are more efficient than multiplicative update algorithms, we propose another convergent algorithm for solving the ONMF problem by combining the ADM approach with our convergent algorithm. Experiments on two grayscale images and the ALLAML gene-sample data demonstrate that Algorithms 2 and 3 tend to deliver higher-quality solutions and perform better in a clustering task compared with Algorithm 1.

Algorithm 1: The convergent AUR algorithm for problem (18).
Input: and .
Output: Factors , which solve problem (18).
(1) Select initial matrices and .
(2) For k = 1 : maxIt
(2.1)  Calculate the factor matrix using the following formula:
   
(2.2)  Calculate the factor matrix using the following formula:
   
  End For
Algorithm 2: The modified AUR algorithm for the ONMF problem.
Input: and .
Output: Factors , which solve the optimization problem (1.1).
(1) Select initial matrices and .
(2) For k = 1 : maxIt
(2.1)  Calculate the factor matrix using the following formula:
   
(2.2)  Calculate the factor matrix using the following formula:
   
(2.3)  Calculate the factor matrix using the following formula:
   
  End For
Algorithm 3: The auxiliary variable-based ADM algorithm for the ONMF problem.
Input: and .
Output: Factors , which solve the optimization problem (1.1).
(1) Select initial matrices and .
(2) For k = 1 : maxIt
  Update by formulas (62)–(67)
  If a stopping criterion is met, then
   exit and output
  End If
  End For

The remainder of this article is organized as follows: Section 2 gives a brief review of NMF and ONMF algorithms. In Section 3, an improved version of the original ONMF algorithm and the convergence analysis is provided and another new ONMF algorithm based on the alternating least squares (ALS) approach is provided. Section 4 is devoted to numerical experiments. Some concluding remarks are provided in Section 5.

2. A Brief Review of NMF and ONMF

In this section, we provide a brief review of NMF and ONMF algorithms.

2.1. Nonnegative Matrix Factorization

NMF aims to find a nonnegative matrix $W$ and another nonnegative matrix $H$ whose product approximates a given nonnegative matrix $V$:
$$\min_{W \ge 0, \, H \ge 0} \; f(W, H) = \frac{1}{2}\left\|V - WH\right\|_{F}^{2}. \quad (2)$$

Because the NMF problem is not convex in both $W$ and $H$, various algorithms have been proposed. The first method for solving problem (2) is the ALS algorithm utilized by Paatero and Tapper in 1994. It minimizes the least squares cost function with respect to either $W$ or $H$, one at a time, by fixing the other and disregarding nonnegativity; after each least squares step, the algorithm sets all negative entries to zero. The scheme can be written as follows:
$$W \leftarrow \left[VH^{\dagger}\right]_{+}, \qquad H \leftarrow \left[W^{\dagger}V\right]_{+},$$
where $\dagger$ denotes the Moore–Penrose pseudoinverse and $[\cdot]_{+}$ sets all negative entries to zero.
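As a concrete illustration, the ALS step can be sketched in a few lines of NumPy under the notation above; this is a minimal sketch for exposition, not the implementation used in this study.

```python
import numpy as np

def nmf_als(V, r, max_iter=200, seed=0):
    """Minimal ALS sketch for NMF: a least squares step, then
    projection of negative entries to zero (Paatero-Tapper style)."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, r))
    H = rng.random((r, n))
    for _ in range(max_iter):
        # Least squares for W with H fixed, then project onto W >= 0.
        W = np.maximum(V @ np.linalg.pinv(H), 0)
        # Least squares for H with W fixed, then project onto H >= 0.
        H = np.maximum(np.linalg.pinv(W) @ V, 0)
    return W, H
```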

Another popular method for NMF is the multiplicative updating method proposed by Lee and Seung. It can be expressed as follows:
$$W \leftarrow W \odot \left(VH^{T}\right) \oslash \left(WHH^{T}\right), \quad (5)$$
$$H \leftarrow H \odot \left(W^{T}V\right) \oslash \left(W^{T}WH\right), \quad (6)$$
where $\odot$ and $\oslash$ denote elementwise multiplication and division, respectively. When started from nonnegative initial estimates, the iterates remain nonnegative throughout the iterations. By all indications, this algorithm appears to be the most widely used solution method for NMF by far.
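A minimal NumPy sketch of the updates in equations (5) and (6) follows; the small constant added to the denominators is a numerical guard of our own, not part of the original rules.

```python
import numpy as np

def nmf_mu(V, r, max_iter=500, seed=0):
    """Minimal sketch of the Lee-Seung multiplicative updates
    for min 0.5 * ||V - W H||_F^2 with W, H >= 0."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, r)) + 0.1   # positive start keeps iterates positive
    H = rng.random((r, n)) + 0.1
    tiny = 1e-12                   # guards the elementwise division
    for _ in range(max_iter):
        H *= (W.T @ V) / (W.T @ W @ H + tiny)
        W *= (V @ H.T) / (W @ H @ H.T + tiny)
    return W, H
```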

The modified update rules given in equations (7) and (8) were first proposed by Gillis and Glineur [21]; they can be expressed as follows:
$$W \leftarrow \max\left(\varepsilon, \, W \odot \left(VH^{T}\right) \oslash \left(WHH^{T}\right)\right), \quad (7)$$
$$H \leftarrow \max\left(\varepsilon, \, H \odot \left(W^{T}V\right) \oslash \left(W^{T}WH\right)\right), \quad (8)$$
where $\varepsilon$ denotes a small positive number. They proved that if a sequence of solutions generated by (7) and (8) has a limit point, then this point is necessarily a stationary point of the following optimization problem:
$$\min_{W \ge \varepsilon, \, H \ge \varepsilon} \; \frac{1}{2}\left\|V - WH\right\|_{F}^{2}.$$
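The ε-modified rules (7) and (8) differ from (5) and (6) only by the elementwise max, as the following sketch shows (it assumes the factors are initialized with entries no smaller than eps):

```python
import numpy as np

def nmf_mu_eps(V, W, H, max_iter=500, eps=1e-8):
    """Sketch of the modified updates (7)-(8): each multiplicative
    step is followed by an elementwise max with a small eps, so the
    iterates stay in {W >= eps, H >= eps} and zero locking is avoided."""
    for _ in range(max_iter):
        H = np.maximum(eps, H * (W.T @ V) / (W.T @ W @ H))
        W = np.maximum(eps, W * (V @ H.T) / (W @ H @ H.T))
    return W, H
```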

2.2. Original ONMF Algorithms

In ONMF, an additional orthogonality constraint, $HH^{T} = I$, is imposed on the NMF. We briefly review the first ONMF algorithm proposed by Ding et al. [16] and reveal the problem behind ONMF.

The goal of ONMF is to find a nonnegative orthogonal matrix $H$ and a nonnegative matrix $W$ by minimizing the following objective function using a Lagrangian multiplier $\Lambda$. Because it is not easy to solve this problem for every value of $\Lambda$, Ding et al. [16] ignored the nonnegativity and relied on the constraint $HH^{T} = I$ to obtain a unique value of $\Lambda$:
$$\min_{W \ge 0, \, H \ge 0} \; L(W, H) = \frac{1}{2}\left\|V - WH\right\|_{F}^{2} + \frac{1}{2}\operatorname{tr}\left[\Lambda\left(HH^{T} - I\right)\right]. \quad (10)$$

The KKT conditions for the objective in equation (10) are as follows:
$$\nabla_{W}L = \left(WH - V\right)H^{T} \ge 0, \qquad \nabla_{H}L = W^{T}\left(WH - V\right) + \Lambda H \ge 0,$$
with
$$W \odot \nabla_{W}L = 0, \qquad H \odot \nabla_{H}L = 0.$$

Using the multiplicative updating method, the update rules of $W$ and $H$ are derived as follows:
$$W \leftarrow W \odot \left(VH^{T}\right) \oslash \left(WHH^{T}\right), \qquad H \leftarrow H \odot \left(W^{T}V\right) \oslash \left(W^{T}WH + \Lambda H\right).$$

The problem with this algorithm is that it is difficult to determine $\Lambda$. By summing over the column index, the authors found an exact formulation for the diagonal entries:
$$\Lambda_{kk} = \left[W^{T}VH^{T} - W^{T}WHH^{T}\right]_{kk}. \quad (14)$$

The off-diagonal entries are obtained by ignoring the nonnegativity constraint on $H$ and setting $\nabla_{H}L$ to a zero matrix:
$$W^{T}WH + \Lambda H - W^{T}V = 0, \quad (15)$$
$$\Lambda_{kl} = \left[W^{T}VH^{T} - W^{T}W\right]_{kl}, \quad k \ne l. \quad (16)$$

Equation (15) is derived from equation (10) by setting the gradient with respect to $H$ to zero, and equation (16) follows from equation (15) using the orthogonality constraint $HH^{T} = I$. By combining equations (14) and (16), $\Lambda$ can be defined as follows:
$$\Lambda = W^{T}VH^{T} - W^{T}W. \quad (17)$$

Note that the formulation with this specific value of $\Lambda$ does not strictly satisfy orthogonality. Therefore, such ONMF algorithms prioritize the approximation while relaxing the degree of orthogonality.
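For illustration, the Ding et al.-style multiplicative scheme can be sketched as follows in the row-orthogonal orientation used here; the square-root form of the H update is the commonly cited version of this rule [16], and the guard constant tiny is our own addition.

```python
import numpy as np

def onmf_ding(V, r, max_iter=500, seed=0):
    """Sketch of Ding-et-al-style ONMF updates for
    min ||V - W H||_F^2 with H >= 0 and H H^T ~ I (orthogonal rows of H)."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, r)) + 0.1
    H = rng.random((r, n)) + 0.1
    tiny = 1e-12
    for _ in range(max_iter):
        W *= (V @ H.T) / (W @ H @ H.T + tiny)
        # Orthogonality-aware update: the multiplier Lambda has been
        # eliminated, leaving H^T H inside the denominator.
        H *= np.sqrt((W.T @ V) / (W.T @ V @ H.T @ H + tiny))
    return W, H
```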

To improve the approximation, Mirzal [22] proposed the following cost function:
$$\min_{W \ge 0, \, H \ge 0} \; F(W, H) = \frac{1}{2}\left\|V - WH\right\|_{F}^{2} + \frac{\alpha}{2}\left\|HH^{T} - I\right\|_{F}^{2}, \quad (18)$$
where $\alpha > 0$ controls the degree of orthogonality.

Using the same strategy as in the aforementioned algorithm, the update forms are derived as follows:
$$W \leftarrow W \odot \left(VH^{T}\right) \oslash \left(WHH^{T}\right), \qquad H \leftarrow H \odot \left(W^{T}V + \alpha H\right) \oslash \left(W^{T}WH + \alpha HH^{T}H\right).$$

Next, the author proposed an equivalent algorithm with a robust convergence guarantee obtained by (1) transforming the algorithm into a corresponding additive update rule (AUR) algorithm and (2) replacing every zero entry that does not satisfy the KKT conditions with a small positive number to prevent zero locking. This strategy was employed to derive a convergent algorithm for the ONMF.

The AUR version of equation (18) can be expressed as follows:
$$H_{ij} \leftarrow H_{ij} - \frac{H_{ij}}{\left(W^{T}WH + \alpha HH^{T}H\right)_{ij}}\left(\nabla_{H}F\right)_{ij},$$
where $H_{ij}$ denotes the $(i, j)$th entry of matrix $H$, and similarly for $W$. Additionally, the author proposed an equally robust AUR algorithm (see Algorithm 1) for equation (18), where maxIt denotes the maximum number of iterations and $\nabla F$ represents the gradient of the objective function (18).

The author proved the convergence of this algorithm. However, it requires a high computational cost. The algorithm based on the alternated application of the rules in equations (5) and (6) is not guaranteed to converge to a first-order stationary point, but a slight modification proposed by Lin [20] achieves this property (generally speaking, the MU is recast as a variable metric steepest descent method, and the step length is modified accordingly). In another study [23], the authors demonstrated through numerical experiments that the update rules in equations (7) and (8) work better than the original update rules in equations (5) and (6) in some cases. This indicates that equations (7) and (8) are important not only from a theoretical point of view but also from a practical viewpoint. Therefore, in the next section, we propose convergent updates based on the alternated approach.

3. Alternating Direction Algorithm for ONMF

3.1. Classic ADM Approach

ADM is an algorithm that is intended to blend the decomposability of dual ascent with the superior convergence properties of the method of multipliers. The algorithm solves problems of the form
$$\min_{x, z} \; f(x) + g(z) \quad \text{subject to} \quad Ax + Bz = c, \quad (22)$$
with variables $x \in \mathbb{R}^{n}$ and $z \in \mathbb{R}^{m}$, where $A \in \mathbb{R}^{p \times n}$, $B \in \mathbb{R}^{p \times m}$, and $c \in \mathbb{R}^{p}$. We assume that $f$ and $g$ are convex. The optimal value of equation (22) will be denoted by $p^{*} = \inf\left\{f(x) + g(z) : Ax + Bz = c\right\}$.

As in the method of multipliers, we form the augmented Lagrangian as follows:
$$L_{\rho}(x, z, y) = f(x) + g(z) + y^{T}\left(Ax + Bz - c\right) + \frac{\rho}{2}\left\|Ax + Bz - c\right\|_{2}^{2}.$$

ADM consists of the following iterations:
$$x^{k+1} = \arg\min_{x} \, L_{\rho}\left(x, z^{k}, y^{k}\right), \quad (26)$$
$$z^{k+1} = \arg\min_{z} \, L_{\rho}\left(x^{k+1}, z, y^{k}\right), \quad (27)$$
$$y^{k+1} = y^{k} + \rho\left(Ax^{k+1} + Bz^{k+1} - c\right), \quad (28)$$
where $\rho > 0$ represents a step length. The algorithm is similar to dual ascent and the method of multipliers. It consists of an x-minimization step (26), a z-minimization step (27), and a dual variable update (28). As in the method of multipliers, the dual variable update uses a step size that is equal to the augmented Lagrangian parameter $\rho$. Conversely, the classic augmented Lagrangian multiplier method requires a joint minimization with respect to both $x$ and $z$, i.e., it replaces steps (26) and (27) with
$$\left(x^{k+1}, z^{k+1}\right) = \arg\min_{x, z} \, L_{\rho}\left(x, z, y^{k}\right),$$
which involves both $x$ and $z$ and could become much more expensive. A toy instance of iterations (26)–(28) is sketched below. In Section 3.2, we propose improved algorithms for ONMF and present the convergence analysis. We discuss the orthogonality constraint on the rows of $H$; similar results can be derived for $W$.
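To make the iterations (26)–(28) concrete, the following sketch applies ADM to a toy splitting of nonnegative least squares with data matrix M: $f(x) = \frac{1}{2}\|Mx - b\|^{2}$, $g$ the indicator of the nonnegative orthant, and the constraint $x - z = 0$ (identity coefficient matrices in the notation of (22)). The example problem is our own choice, for illustration only.

```python
import numpy as np

def admm_nnls(M, b, rho=1.0, max_iter=300):
    """ADM sketch for min 0.5*||M x - b||^2 s.t. x >= 0,
    split as f(x) + g(z) with the constraint x - z = 0."""
    n = M.shape[1]
    x, z, y = np.zeros(n), np.zeros(n), np.zeros(n)
    MtM, Mtb = M.T @ M, M.T @ b
    L = np.linalg.cholesky(MtM + rho * np.eye(n))   # factor once, reuse
    for _ in range(max_iter):
        # x-step: minimize L_rho over x (a ridge-type linear solve).
        rhs = Mtb + rho * z - y
        x = np.linalg.solve(L.T, np.linalg.solve(L, rhs))
        # z-step: projection onto the nonnegative orthant.
        z = np.maximum(x + y / rho, 0)
        # dual update with step length rho.
        y = y + rho * (x - z)
    return z
```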

3.2. Classic ADM Approach Extension to ONMF

First, we define the objective of the ONMF as follows:
$$\min_{W \ge 0, \, H \ge 0} \; \frac{1}{2}\left\|V - WH\right\|_{F}^{2}, \quad (4.1)$$
such that $HH^{T} = I$.

The AUR version of Problem (18) can be modified as follows:

Inspired by the approach proposed by Gillis and Glineur [23] and Yoo and Choi [24], we employ these strategies to devise a convergent algorithm for NMF with an orthogonality constraint, that is, Problem (4.1). The modified AUR version of Problem (4.1) can be written as follows:
where $\varepsilon$ denotes a small positive number, $\rho$ represents a step length, and the barred iterates $\bar{W}$ and $\bar{H}$ stand for the modifications that avoid zero locking by replacing with a small positive number every zero entry that violates the KKT conditions.

The detailed procedure for the ALS-based version of Problem (4.1) can be seen in Algorithm 2.

For Algorithm 2, we obtain the following basic result:

Lemma 1. For any natural number $k$, if $W^{k} \ge \varepsilon$ and $H^{k} \ge \varepsilon$, then $W^{k+1} \ge \varepsilon$ and $H^{k+1} \ge \varepsilon$.

Proof. The statement for $W^{k+1}$ can be proven similarly to that for $H^{k+1}$; therefore, we only prove the statement for $H^{k+1}$. The claim is obvious for the entries that are set to $\varepsilon$; hence, we only need to prove the results for the remaining entries.

Case 1. For ,
With the nonnegativity of , and ,
Because

Case 2. For , because and , equation (27) holds.
Similarly, the statement for $W^{k+1}$ can be proven.

3.2.1. Convergence Analysis

To analyze the convergence of Algorithm 2, we first prove its nonincreasing property under the update rules in equations (25) and (26), i.e.,
$$F\left(W^{k+1}, H^{k+1}\right) \le F\left(W^{k}, H^{k}\right).$$

Because the algorithm solves Problem (1.1) in an alternating manner, $W$ and $H$ can be analyzed separately. Because the update rule of $W$ is similar to that of $H$, it is sufficient to prove the nonincreasing property with respect to $H$.

By making use of the auxiliary function approach [3], the nonincreasing property of $F$ can be proved by showing that
$$F\left(h^{k+1}\right) \le G\left(h^{k+1}, h^{k}\right) \le G\left(h^{k}, h^{k}\right) = F\left(h^{k}\right),$$
where the auxiliary function $G$ is defined, in a manner similar to [22], by
$$G\left(h, h^{k}\right) = F\left(h^{k}\right) + \left(h - h^{k}\right)^{T}\nabla F\left(h^{k}\right) + \frac{1}{2}\left(h - h^{k}\right)^{T}D\left(h^{k}\right)\left(h - h^{k}\right), \quad (41)$$
where $h$ denotes a column of $H$. Meanwhile, we define $\bar{h}^{k}$ as the modified iterate whose zero entries violating the KKT conditions are replaced by a small positive number, and $D\left(h^{k}\right)$ denotes a diagonal matrix with its diagonal entries chosen such that $D\left(h^{k}\right) - \nabla^{2}F\left(h^{k}\right)$ is positive semidefinite for a sufficiently large $\psi$.

Here, $\mathcal{I}_{j}$ denotes the set of non-KKT indices in the $j$th column of $H$, i.e., the indices whose entries violate the KKT conditions.

Using the Taylor series, the expansion of $F$ can be expressed as follows:
$$F(h) = F\left(h^{k}\right) + \left(h - h^{k}\right)^{T}\nabla F\left(h^{k}\right) + \frac{1}{2}\left(h - h^{k}\right)^{T}\nabla^{2}F\left(h^{k}\right)\left(h - h^{k}\right) + R_{3} + R_{4},$$
where $R_{3}$ and $R_{4}$ represent the higher-order components of the Taylor series and $h$ denotes a column of $H$. Next, for the function $G$, which is an auxiliary function of $F$, we first have to prove the following Lemma 2. As shown later, $W$ and $H$ must be bounded for the argument to hold. The boundedness of $W$ and $H$ will be proven in Lemma 3.

Lemma 2. Given a sufficiently large $\psi$ and the boundedness of $W$ and $H$, the following hold:
(1) $G(h, h) = F(h)$.
(2) $G\left(h, h^{k}\right) \ge F(h)$.
(3) $G\left(h^{k}, h^{k}\right) \ge G\left(h^{k+1}, h^{k}\right)$. Moreover, the equality holds only if $h^{k}$ satisfies the KKT conditions.
(4) $F\left(h^{k}\right) \ge F\left(h^{k+1}\right)$, and the equality holds only if $h^{k}$ satisfies the KKT conditions.

Proof. (1) and (2) are obvious from equation (41).
For (3), because , we have
where and are diagonal matrices that sum up to , with
Let ; then,
With the boundedness of and and using a sufficiently large , the inequality can be guaranteed.
Additionally, it is clear that the equality holds only if satisfies the KKT conditions demonstrated in equation (48), regardless of . Because is a variable, the equality holds only if in equation (51), as a result of the update rule of and the boundedness of and .
(4) Using
and the positive semidefiniteness of the matrix , is a strictly convex function with respect to ; it has a unique minimum that satisfies
The aforementioned consideration results in the following:
If satisfies the KKT conditions, then , and therefore, .
Conversely, assuming that the equality holds but does not satisfy the KKT conditions, there is at least one index such that
Furthermore, if , then , which contradicts the equality assumption. Therefore, . As a result,
which contradicts the assumption.

Theorem 1. Given a sufficiently large number $\psi$ and supposing that $W$ and $H$ are bounded, the following hold for the auxiliary function $G$:
(1) $G$ is nonincreasing under the update rule in equation (4), and the equality holds only if the iterate satisfies the KKT conditions.
(2) $G$ is nonincreasing under the update rule in equation (3), and the equality holds only if the iterate satisfies the KKT conditions.
(3) $F$ is nonincreasing under the combined updates, and the equality holds only if the iterate is a stationary point.

Proof. (1) and (2) can be obtained directly from Lemma 2. (3) By combining (1) and (2), it is obvious that $F$ is nonincreasing, with equality only if $W$ and $H$ satisfy the KKT conditions, i.e., the iterate is a stationary point.

Theorem 2. The objective function (22) is nonincreasing under the multiplicative update rules in equations (25) and (26).

Proof. This theorem is a corollary of Theorem 1.

Lemma 3. The sequence generated by Algorithm 2 is bounded, and the objective function is lower bounded.

Proof. Let us verify Assumptions A1–A3 (Assumptions A1–A3 can be found in [23]). Assumption A1 holds because of the coercivity of the objective function. Assumptions A2 and A3 hold because all the coefficient matrices are identity matrices. Additionally, because the objective is Lipschitz differentiable, the sequence generated by Algorithm 2 is bounded, and the objective is lower bounded according to Theorem 2 in [23].
The convergence property of the modified update rules in equations (27) and (29) is stated as follows:

Theorem 3. Given a sufficiently large number $\psi$, every limit point of this algorithm is a stationary point of the following optimization problem:
$$\min_{W \ge \varepsilon, \, H \ge \varepsilon} \; \frac{1}{2}\left\|V - WH\right\|_{F}^{2} + \frac{\alpha}{2}\left\|HH^{T} - I\right\|_{F}^{2}.$$

Proof. Because the iterates lie in a compact set, the sequence generated by Algorithm 2 has at least one limit point. Gillis and Glineur [23] presented a similar analysis showing that every limit point of this algorithm is a stationary point of the optimization problem stated in Theorem 3.

3.3. Auxiliary Variable-Based ADM Extension to ONMF

To facilitate the efficient use of ADM, we first introduce two auxiliary variables, $P$ and $Q$, and consider the following equivalent model:
$$\min_{W, H, P, Q} \; \frac{1}{2}\left\|V - WH\right\|_{F}^{2} \quad \text{s.t. } W = P, \; H = Q, \quad (59)$$
where $P \ge 0$ and $QQ^{T} = I$. The augmented Lagrangian of equation (59) is as follows:
$$L_{\beta}\left(W, H, P, Q, \Gamma, \Pi\right) = \frac{1}{2}\left\|V - WH\right\|_{F}^{2} + \left\langle \Gamma, W - P \right\rangle + \left\langle \Pi, H - Q \right\rangle + \frac{\beta_{1}}{2}\left\|W - P\right\|_{F}^{2} + \frac{\beta_{2}}{2}\left\|H - Q\right\|_{F}^{2},$$
where $\Gamma$ and $\Pi$ are Lagrangian multipliers, $\beta_{1}, \beta_{2} > 0$ are penalty parameters, and $\langle A, B\rangle = \sum_{ij}A_{ij}B_{ij}$ for matrices $A$ and $B$ of the same size.

The alternating direction method for equation (59) is derived by successively minimizing $L_{\beta}$ with respect to $W$, $H$, $P$, and $Q$, one at a time, while fixing the others at their most recent values, and then updating the multipliers $\Gamma$ and $\Pi$. The introduction of the two auxiliary variables, $P$ and $Q$, makes it easy to carry out each of the alternating minimization steps. Specifically, these steps can be expressed in closed form as follows:
$$W^{k+1} = \left(VH^{kT} + \beta_{1}P^{k} - \Gamma^{k}\right)\left(H^{k}H^{kT} + \beta_{1}I\right)^{-1}, \quad (62)$$
$$H^{k+1} = \left(W^{(k+1)T}W^{k+1} + \beta_{2}I\right)^{-1}\left(W^{(k+1)T}V + \beta_{2}Q^{k} - \Pi^{k}\right), \quad (63)$$
$$P^{k+1} = \left[W^{k+1} + \Gamma^{k}/\beta_{1}\right]_{+}, \quad (64)$$
$$Q^{k+1} = \mathcal{P}_{\mathcal{Q}}\left(H^{k+1} + \Pi^{k}/\beta_{2}\right), \quad (65)$$
$$\Gamma^{k+1} = \Gamma^{k} + \beta_{1}\left(W^{k+1} - P^{k+1}\right), \quad (66)$$
$$\Pi^{k+1} = \Pi^{k} + \beta_{2}\left(H^{k+1} - Q^{k+1}\right), \quad (67)$$
where $[\cdot]_{+}$ denotes the projection onto the nonnegative orthant and $\mathcal{P}_{\mathcal{Q}}$ denotes the projection onto $\{Q : QQ^{T} = I\}$, computed from the polar factor of its argument.

The detailed procedure for the algorithm of problem (59) can be seen in Algorithm 3, and a compact sketch of the iteration is given below.
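The following NumPy sketch puts the six steps (62)–(67) together as reconstructed above; the variable names P, Q, Gam, and Pi and the polar-factor Q-step are our notational choices, and the sketch assumes r <= n.

```python
import numpy as np

def onmf_adm(V, r, beta1=1.0, beta2=1.0, max_iter=300, seed=0):
    """Sketch of the auxiliary-variable ADM for ONMF:
    min 0.5*||V - W H||_F^2 s.t. W = P, H = Q, P >= 0, Q Q^T = I."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W, H = rng.random((m, r)), rng.random((r, n))
    P, Q = W.copy(), H.copy()
    Gam, Pi = np.zeros((m, r)), np.zeros((r, n))   # multipliers
    I = np.eye(r)
    for _ in range(max_iter):
        # (62) ridge-type solve for W with everything else fixed.
        W = (V @ H.T + beta1 * P - Gam) @ np.linalg.inv(H @ H.T + beta1 * I)
        # (63) ridge-type solve for H.
        H = np.linalg.inv(W.T @ W + beta2 * I) @ (W.T @ V + beta2 * Q - Pi)
        # (64) projection onto the nonnegative orthant.
        P = np.maximum(W + Gam / beta1, 0)
        # (65) projection onto {Q : Q Q^T = I} via the polar factor
        # (requires r <= n and a full-rank argument).
        U, _, Vt = np.linalg.svd(H + Pi / beta2, full_matrices=False)
        Q = U @ Vt
        # (66)-(67) multiplier updates.
        Gam = Gam + beta1 * (W - P)
        Pi = Pi + beta2 * (H - Q)
    return W, H, P, Q
```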

4. Experiments and Results

In this section, we analyze and compare Algorithms 2 and 3 with Algorithms 1 and 4 (ONP-MF [18]). All experiments are performed using the Windows 7 operating system and MATLAB v8.1 (R2013a) running on a Lenovo laptop with an Intel Core(TM) i5-3470 CPU at 3.2 GHz and 4 GB of memory.

4.1. Test on Images

To visualize the results, we use examples of ONMF arising in image processing to analyze the quality of the solutions. Two grayscale images are tested using Algorithms 1–3 and Algorithm 4. The two grayscale images, a panda in Figure 1(a) and a cat in Figure 1(b), have resolutions of and , respectively.

We evaluate the degree of approximation and orthogonality using two indices:
$$\text{err} = \frac{\left\|V - WH\right\|_{F}}{\left\|V\right\|_{F}}, \qquad \text{orth} = \left\|HH^{T} - I\right\|_{F}.$$
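Under this reading of the two indices, they can be computed with a small helper of our own:

```python
import numpy as np

def onmf_indices(V, W, H):
    """Relative approximation error and deviation from row orthogonality."""
    err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
    orth = np.linalg.norm(H @ H.T - np.eye(H.shape[0]))
    return err, orth
```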

We set , , and , and use maxiter = 500 for the image of the cat and maxiter = 100 for the image of the panda; we then repeat the process 10 times and calculate the relative errors and orthogonality. The results are given in Tables 1–5, and the recovered images are shown in Figures 2 and 3.

Tables 1 and 2 show that Algorithm 2 performs slightly better than Algorithms 1 and 3 in terms of relative errors in most cases. Moreover, we observe that Algorithm 4 converges in the fewest iterations. From Tables 3 and 4, we can see that Algorithm 2 demonstrates the best performance in terms of orthogonality compared with Algorithms 1, 3, and 4. From Table 5, we observe that Algorithm 4 requires the lowest computational cost among the four algorithms. From Figures 2 and 3, we observe that Algorithm 2 achieves a better recovery quality than Algorithms 1 and 3, and the recovery quality of Algorithm 4 is the best.

4.2. Clustering Capability

We test the clustering capability of the ONMF algorithms on the ALLAML gene-sample data provided by [4] and compare their performance. To measure the clustering performance, we use purity and entropy as our performance metrics. We expect that these metrics will provide good insight into how our algorithms work.

Entropy measures how the classes of genes are distributed within each cluster [25]. It is defined as follows:
$$E = \frac{1}{n \log_{2} q}\sum_{j=1}^{k}\sum_{i=1}^{q} n_{j}^{i}\log_{2}\frac{n_{j}}{n_{j}^{i}},$$
where $n$ denotes the number of samples, $q$ denotes the number of classes, $n_{j}^{i}$ denotes the number of samples in the $j$th cluster that belong to the $i$th class, and $n_{j}$ denotes the size of the $j$th cluster.

Purity is the most commonly used metric. It measures the extent to which each cluster contains data points from primarily one class [25]. Therefore, the higher the value of purity, the better the clustering performance. Purity is defined as follows:
$$P = \frac{1}{n}\sum_{j=1}^{k}\max_{1 \le i \le q} \, n_{j}^{i}.$$
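A small self-contained helper (our own implementation of the two definitions above; it assumes more than one ground-truth class) could read:

```python
import numpy as np

def purity_entropy(labels_true, labels_pred):
    """Purity and entropy of a clustering against ground-truth classes.
    Purity rewards clusters dominated by one class; entropy accounts
    for the whole class distribution within each cluster."""
    labels_true = np.asarray(labels_true)
    labels_pred = np.asarray(labels_pred)
    n = labels_true.size
    classes = np.unique(labels_true)
    q = classes.size                      # assumes q > 1
    purity, entropy = 0.0, 0.0
    for c in np.unique(labels_pred):
        members = labels_true[labels_pred == c]          # cluster c
        counts = np.array([(members == t).sum() for t in classes])
        nj = counts.sum()
        purity += counts.max()
        p = counts[counts > 0] / nj
        entropy += nj * (-(p * np.log2(p)).sum()) / np.log2(q)
    return purity / n, entropy / n
```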

Algorithms 2 and 3 are compared with Algorithms 1 and 4, and the comparisons are shown in Tables 6 and 7. From these tables, we observe that Algorithms 2 and 3 achieve better purity and entropy results on the leukemia data sets than Algorithms 1 and 4. There are slight differences in the relative behavior of purity and entropy in our comparison; that is, higher purity values do not necessarily correspond to lower entropy values. This is because entropy takes into account the entire distribution of the genes in a particular cluster and not only the largest class, as in the computation of purity. In summary, the comparison shows that Algorithms 2 and 3 are viable and competitive in clustering tasks.

5. Conclusion

In this study, we develop two efficient modified algorithms based on the ADM framework for the ONMF problem. We prove that any sequence of solutions generated by the modified updates of Algorithms 2 and 3 has at least one convergent subsequence and that the limit of any convergent subsequence is a stationary point of the corresponding optimization problem. The numerical results show that Algorithms 2 and 3 are feasible and efficient, especially with respect to the orthogonality of the computed solutions.

The framework for the convergence analysis of the updates for Algorithm 3 presented in this article may be applied to the ADM algorithms for NMF [26].

Data Availability

No data were used to support this study.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was partially supported by the Xijing University School Fund Project (no. XJ200102), the NSF of Shaanxi Province (no. 20JK0963), and the NSF of China (grant no. 11171270).