Research Article  Open Access
Rafal Zdunek, Andrzej Cichocki, "Fast Nonnegative Matrix Factorization Algorithms Using Projected Gradient Approaches for Large-Scale Problems", Computational Intelligence and Neuroscience, vol. 2008, Article ID 939567, 13 pages, 2008. https://doi.org/10.1155/2008/939567
Fast Nonnegative Matrix Factorization Algorithms Using Projected Gradient Approaches for Large-Scale Problems
Abstract
Recently, a considerable growth of interest in projected gradient (PG) methods has been observed due to their high efficiency in solving large-scale convex minimization problems subject to linear constraints. Since the minimization problems underlying nonnegative matrix factorization (NMF) of large matrices well match this class of minimization problems, we investigate and test some recent PG methods in the context of their applicability to NMF. In particular, the paper focuses on the following modified methods: projected Landweber, Barzilai-Borwein gradient projection, projected sequential subspace optimization (PSESOP), interior-point Newton (IPN), and sequential coordinate-wise. The proposed and implemented NMF PG algorithms are compared with respect to their performance in terms of signal-to-interference ratio (SIR) and elapsed time, using a simple benchmark of mixed partially dependent nonnegative signals.
1. Introduction and Problem Statement
Nonnegative matrix factorization (NMF) finds such nonnegative factors (matrices) A ∈ R^(I×J) and X ∈ R^(J×T) that Y ≈ AX, given the observation matrix Y ∈ R^(I×T), the lower rank J, and possibly other statistical information on the observed data or the factors to be estimated.
This method has found a variety of real-world applications in areas such as blind separation of images and nonnegative signals [1–6], spectra recovering [7–10], pattern recognition and feature extraction [11–16], dimensionality reduction, segmentation and clustering [17–32], language modeling, text mining [25, 33], music transcription [34], and neurobiology (gene separation) [35, 36].
Depending on the application, the estimated factors may have different interpretations. For example, Lee and Seung [11] introduced NMF as a method for decomposing an image (face) into parts-based representations (parts reminiscent of features such as lips, eyes, nose, etc.). In blind source separation (BSS) [1, 37, 38], the matrix Y represents the observed mixed (superposed) signals or images, A is a mixing operator, and X is a matrix of true source signals or images. Each row of Y or X is a signal or 1D image representation, where I is the number of observed mixed signals and J is the number of hidden (source) components. The index t usually denotes a sample (discrete time instant), where T is the number of available samples. In BSS, we usually have T >> I ≥ J, and J is known or can be relatively easily estimated using SVD or PCA.
Our objective is to estimate the mixing matrix A and the sources X subject to nonnegativity constraints on all the entries, given Y and possibly some knowledge on the statistical distribution of noisy disturbances.
Obviously, NMF is not unique in the general case, and it is characterized by scale and permutation indeterminacies. These problems have been addressed recently by many researchers [39–42], and in this paper they will not be discussed. However, we have shown by extensive computer simulations that, in practice, with overwhelming probability we are able to achieve a unique nonnegative factorization (neglecting unavoidable scaling and permutation ambiguities) if the data are sufficiently sparse and a suitable NMF algorithm is applied [43]. This is consistent with very recent theoretical results [40].
The noise distribution is strongly application-dependent; however, in many BSS applications, Gaussian noise is expected. Here our considerations are restricted to this case; nevertheless, alternative NMF algorithms optimized for different noise distributions exist and can be found, for example, in [37, 44, 45].
NMF was proposed by Paatero and Tapper [46, 47], but Lee and Seung [11, 48] highly popularized this method by using simple multiplicative algorithms to perform NMF. An extensive study on the convergence of multiplicative algorithms for NMF can be found in [49]. In general, the multiplicative algorithms are known to be very slowly convergent for large-scale problems. For this reason, there is a need to search for more efficient and faster algorithms for NMF. Many approaches have been proposed in the literature to alleviate these problems. One of them is to apply projected gradient (PG) algorithms [50–53] or projected alternating least-squares (ALS) algorithms [33, 54, 55] instead of multiplicative ones. Lin [52] suggested applying the Armijo rule to estimate the learning parameters in projected gradient updates for NMF. The NMF PG algorithms proposed by us in [53] also address the issue of selecting a learning parameter that gives the steepest descent while keeping some distance to the boundary of the nonnegative orthant (the subspace of real nonnegative numbers). Another very robust technique exploits the information from the second-order Taylor expansion term of a cost function to speed up the convergence. This approach was proposed in [45, 56], where the mixing matrix A is updated with the projected Newton method, and the sources in X are computed with the projected least-squares method (the fixed-point algorithm).
In this paper, we extend the approach to NMF that we initiated in [53]. We have investigated, extended, and tested several recently proposed PG algorithms, such as the oblique projected Landweber [57], Barzilai-Borwein gradient projection [58], projected sequential subspace optimization [59, 60], interior-point Newton [61], and sequential coordinate-wise [62] methods. All the algorithms presented in this paper are quite efficient for solving large-scale minimization problems subject to nonnegativity and sparsity constraints.
The main objective of this paper is to develop, extend, and/or modify some of the most promising PG algorithms for the standard NMF problem, and to find optimal conditions or parameters for this class of NMF algorithms. The second objective is to compare the performance and complexity of these algorithms on NMF problems, and to identify the most efficient and promising ones. We would like to emphasize that most of the discussed algorithms have neither been implemented nor tested for NMF problems until now; they have rather been considered for solving a standard system of algebraic equations Ax = y (for nonnegative x only), where the matrix A and the vector y are known. In this paper, we consider a much more difficult and complicated problem in which we have two sets of parameters and additional constraints of nonnegativity and/or sparsity. So it was not clear until now whether such algorithms would work efficiently for NMF problems, and if so, which kind of projected algorithm is the most efficient. To the best of our knowledge, only the Lin-PG NMF algorithm is widely used and known for NMF problems. We have demonstrated experimentally that there are several novel PG algorithms which are much more efficient and consistent than the Lin-PG algorithm.
In Section 2, we briefly discuss the PG approach to NMF. Section 3 describes the tested algorithms. The experimental results are illustrated in Section 4. Finally, some conclusions are given in Section 5.
2. Projected Gradient Algorithms
In contrast to the multiplicative algorithms, the class of PG algorithms has additive updates. The algorithms discussed here approximately solve the nonnegative least-squares (NNLS) problems with the basic alternating minimization technique that is used in NMF:

min_{x_t ≥ 0} D_F(y_t || A x_t) = (1/2) ||y_t − A x_t||_2^2, t = 1, ..., T, (1)

or in the equivalent matrix form

min_{X ≥ 0} D_F(Y || AX) = (1/2) ||Y − AX||_F^2, (2)

where Y = [y_1, ..., y_T] ∈ R^(I×T), A ∈ R^(I×J), X = [x_1, ..., x_T] ∈ R^(J×T), and usually I ≥ J; the analogous problems for updating A, formulated for the transposed system, are referred to as (3) and (4). The matrix A is assumed to be a full-rank matrix, so there exists a unique solution X* ∈ R^(J×T) for any matrix Y, since the NNLS problem is strictly convex (with respect to one set of variables {X}).
The solution x_t* to (1) satisfies the Karush-Kuhn-Tucker (KKT) conditions:

x_t* ≥ 0, g(x_t*) ≥ 0, g(x_t*)^T x_t* = 0, (5)

or in the matrix notation

X* ≥ 0, G(X*) ≥ 0, tr{G(X*)^T X*} = 0, (6)

where g(x_t) and G(X) are the corresponding gradient vector and gradient matrix:

g(x_t) = ∇_{x_t} D_F(y_t || A x_t) = A^T (A x_t − y_t), (7)

G(X) = ∇_X D_F(Y || AX) = A^T (AX − Y). (8)
Similarly, the KKT conditions for the solutions to the analogous problems for updating A are as follows:

a* ≥ 0, g_A(a*) ≥ 0, g_A(a*)^T a* = 0,

or in the matrix notation

A* ≥ 0, G_A(A*) ≥ 0, tr{G_A(A*)^T A*} = 0,

where g_A(a) and G_A(A) are the gradient vector and gradient matrix of the objective function:

G_A(A) = ∇_A D_F(Y || AX) = (AX − Y) X^T.
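These optimality conditions are easy to verify numerically. The sketch below is a hypothetical NumPy helper (not part of the paper's toolbox) that measures the largest KKT violation for a candidate NNLS solution:

```python
import numpy as np

def kkt_residual(A, y, x):
    """Largest violation of the NNLS KKT conditions for
    min 0.5*||y - Ax||^2 s.t. x >= 0: primal feasibility x >= 0,
    dual feasibility g(x) = A^T(Ax - y) >= 0, and complementarity
    x_j * g_j(x) = 0. Returns 0 at an exact solution."""
    g = A.T @ (A @ x - y)
    return max(np.max(-x, initial=0.0),              # x >= 0
               np.max(-g, initial=0.0),              # g(x) >= 0
               np.max(np.abs(x * g), initial=0.0))   # x .* g(x) = 0
```

In practice this residual (or its norm) can double as a stopping criterion for the iterative solvers discussed below.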
There are many approaches to solving the problems (1) and (2), or equivalently (3) and (4). In this paper, we discuss selected projected gradient methods that can be generally expressed by the iterative updates

X^(k+1) = P_Ω[X^(k) + η_X^(k) P_X^(k)], A^(k+1) = P_Ω[A^(k) + η_A^(k) P_A^(k)], (12)

where P_Ω[ξ] is a projection of ξ onto the convex feasible set Ω, namely the nonnegative orthant (the subspace of nonnegative real numbers), P_X^(k) and P_A^(k) are descent directions for X and A, and η_X^(k) and η_A^(k) are the corresponding learning rates, respectively.
The projection P_Ω[ξ] can be performed in several ways. One of the simplest techniques is to replace all negative entries in ξ by zero values, or, in practical cases, by a small positive number ε to avoid numerical instabilities. Thus,

P_Ω[ξ] = max{ε, ξ} (elementwise).

However, this is not the only way to carry out the projection; it is typically more efficient to choose the learning rates η_X^(k) and η_A^(k) so as to preserve the nonnegativity of the solutions. The nonnegativity can also be maintained by solving least-squares problems subject to the constraints (6) and (10). Here we present the exemplary PG methods that work quite efficiently for NMF problems, and we implemented them in the Matlab toolboxes NMFLAB/NTFLAB for signal and image processing [43]. For simplicity, we focus our considerations on updating the matrix X, assuming that the updates for A can be obtained in a similar way. Note that the updates for A can be readily obtained by solving the transposed system X^T A^T = Y^T, having X fixed (updated in the previous step).
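The simple projection and the generic additive PG update can be sketched in a few lines of NumPy; the value of ε here is an assumed placeholder, not a value prescribed by the paper:

```python
import numpy as np

EPS = 1e-16  # assumed small positive constant; implementation-specific

def project(Z, eps=EPS):
    """P_Omega: replace entries below eps with eps, keeping the
    iterate inside the nonnegative orthant for numerical stability."""
    return np.maximum(Z, eps)

def pg_update(X, direction, eta, eps=EPS):
    """Generic additive PG update X <- P_Omega[X + eta * P_X],
    where `direction` is a descent direction and eta a learning rate."""
    return project(X + eta * direction, eps)
```

All of the algorithms in Section 3 are instances of this scheme; they differ only in how the direction and the learning rate are chosen.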
3. Algorithms
3.1. Oblique Projected Landweber Method
The Landweber method [63] performs gradient-descent minimization with the following iterative scheme:

X^(k+1) = X^(k) − η G(X^(k)),

where the gradient G is given by (8) and η is the learning rate. The updating formula ensures asymptotic convergence to the minimal-norm least-squares solution for the learning rate within the convergence radius defined by

0 < η < 2 / λ_max(A^T A),

where λ_max(A^T A) is the maximal eigenvalue of A^T A. Since A^T A is a nonnegative matrix, we have λ_max(A^T A) ≤ max_j [A^T A 1_J]_j, where 1_J = [1, ..., 1]^T ∈ R^J. Thus the modified Landweber iterations can be expressed by the formula

X^(k+1) = X^(k) − diag{η_j} G(X^(k)), with η_j = 1 / [A^T A 1_J]_j. (14)

In the oblique projected Landweber (OPL) method [57], which can be regarded as a particular case of the PG iterative formula (12), the solution obtained with (14) in each iterative step is projected onto the feasible set. Finally, we have Algorithm 1 for updating X.
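A minimal NumPy sketch of the OPL idea follows, with the learning rate taken inside the classical convergence radius via the maximal eigenvalue of A^T A; the iteration count and ε are assumptions:

```python
import numpy as np

def opl_update(A, Y, X, n_iter=10, eps=1e-16):
    """Oblique projected Landweber sketch: gradient descent on
    0.5*||Y - AX||_F^2 with a step inside the convergence radius,
    projecting onto the nonnegative orthant at every iteration."""
    AtA, AtY = A.T @ A, A.T @ Y
    eta = 1.0 / np.linalg.eigvalsh(AtA).max()   # eta < 2 / lambda_max
    for _ in range(n_iter):
        X = np.maximum(X - eta * (AtA @ X - AtY), eps)  # step + project
    return X
```

With a step length not exceeding the reciprocal of the largest eigenvalue, each projected step is guaranteed not to increase the cost.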

3.2. Projected Gradient
One of the fundamental PG algorithms for NMF was proposed by Lin in [52]. This algorithm, which we refer to as the Lin-PG algorithm, uses the Armijo rule along the projection arc to determine the step lengths η_A^(k) and η_X^(k) in the iterative updates (12). For the cost function being the squared Euclidean distance, the descent directions are the negative gradients, P_X^(k) = −G(X^(k)) and P_A^(k) = −G_A(A^(k)).
For the computation of X^(k+1), the step length η_X^(k) = β^(m_k) is chosen, where m_k is the first nonnegative integer m that satisfies

D_F(Y || A X^(k+1)) − D_F(Y || A X^(k)) ≤ σ ⟨G(X^(k)), X^(k+1) − X^(k)⟩.

The parameters σ and β decide about the convergence speed. In this algorithm, we set their values experimentally as defaults.
The Matlab implementation of the Lin-PG algorithm is given in [52].
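For illustration, one Armijo-backtracked update along the projection arc can be sketched as follows; the default values of σ, β, and the backtracking limit are assumptions, not the values used in the paper:

```python
import numpy as np

def armijo_pg_step(A, Y, X, beta=0.1, sigma=0.01, eta0=1.0, max_back=20):
    """One Lin-PG-style update: shrink eta = eta0 * beta^m along the
    projection arc until the Armijo condition
    f(P[X - eta*g]) - f(X) <= sigma * <g, P[X - eta*g] - X> holds."""
    f = lambda Z: 0.5 * np.linalg.norm(Y - A @ Z) ** 2
    g = A.T @ (A @ X - Y)                      # gradient of f at X
    eta = eta0
    for _ in range(max_back):
        Xn = np.maximum(X - eta * g, 0.0)      # point on the projection arc
        if f(Xn) - f(X) <= sigma * np.sum(g * (Xn - X)):
            return Xn
        eta *= beta                            # backtrack
    return Xn
```

Since the inner product in the Armijo test is nonpositive along the projection arc, any accepted step cannot increase the cost.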
3.3. Barzilai-Borwein Gradient Projection
The Barzilai-Borwein gradient projection method [58, 64] is motivated by the quasi-Newton approach; that is, the inverse of the Hessian is replaced with a scaled identity matrix α^(k) I, where the scalar α^(k) is selected so that this approximation has a behavior similar to the true inverse Hessian over the most recent iteration. Thus,

α^(k+1) = ⟨s^(k), s^(k)⟩ / ⟨s^(k), A^T A s^(k)⟩, with s^(k) = X^(k+1) − X^(k).

In comparison to, for example, Lin's method [52], this method does not ensure that the objective function decreases at every iteration, but its general convergence has been proven analytically [58]. The general scheme of the Barzilai-Borwein gradient projection algorithm for updating X is given in Algorithm 2.

Since D_F(Y || AX) is a quadratic function, the line-search parameter λ^(k) can be derived in the following closed-form formula:

λ^(k) = max{0, min{1, −⟨Δ^(k), G(X^(k))⟩ / ⟨Δ^(k), A^T A Δ^(k)⟩}}, with Δ^(k) = P_Ω[X^(k) − α^(k) G(X^(k))] − X^(k).
The Matlab implementation of the GPSR-BB algorithm, which solves the system of multiple measurement vectors subject to nonnegativity constraints, is given in Algorithm 4 (see also NMFLAB).
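A compact NumPy sketch of the same idea is given below; the iteration count and the safeguard on the Barzilai-Borwein denominator are assumptions:

```python
import numpy as np

def gpsr_bb(A, Y, X, n_iter=30):
    """GPSR-BB sketch for min 0.5*||Y - AX||_F^2 s.t. X >= 0.
    The inverse Hessian is approximated by alpha*I with the
    Barzilai-Borwein scalar; the quadratic cost admits a closed-form
    line-search parameter lam clipped to [0, 1]."""
    AtA, AtY = A.T @ A, A.T @ Y
    alpha = 1.0
    for _ in range(n_iter):
        G = AtA @ X - AtY                         # gradient
        D = np.maximum(X - alpha * G, 0.0) - X    # projected BB direction
        AD = AtA @ D
        denom = np.sum(D * AD)
        if denom <= 0:                            # D = 0: stationary point
            break
        lam = min(max(np.sum(-D * G) / denom, 0.0), 1.0)  # line search
        X = X + lam * D
        s, y = lam * D, lam * AD                  # secant pair (y = AtA s)
        alpha = np.sum(s * s) / max(np.sum(s * y), 1e-16)
    return X
```

Clipping λ to [0, 1] keeps the iterate inside the nonnegative orthant, since X + D is itself a projected (feasible) point.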
3.4. Projected Sequential Subspace Optimization
The projected sequential subspace optimization (PSESOP) method [59, 60] carries out a projected minimization of a smooth objective function over a subspace spanned by several directions, which include the current gradient and the gradients from previous iterations, as well as the Nemirovski directions. Nemirovski [65] showed that convex smooth unconstrained optimization is optimal if the optimization is performed over a subspace which includes the current gradient g(x^(k)), the direction d^(k) = x^(k) − x^(0), and a linear combination of the previous gradients with suitable coefficients. The directions should be orthogonal to the current gradient. Narkiss and Zibulevsky [59] also suggested including another direction, p^(k) = x^(k) − x^(k−1), which is motivated by a natural extension of the conjugate gradient (CG) method to a nonquadratic case. However, our practical observations showed that this direction does not have a strong impact on the NMF components, and thus we neglected it in our NMF-PSESOP algorithm. Finally, we have Algorithm 3 for updating x_t, which is a single column vector of X.


The memory parameter denotes the number of previous iterates that are taken into account to determine the current update.
The line-search vector α^(k), derived in closed form for the quadratic objective function, is

α^(k) = (D^(k)T A^T A D^(k) + ε I)^(−1) D^(k)T A^T (y_t − A x^(k)),

where the columns of D^(k) are the search directions. The regularization parameter ε can be set as a very small constant to avoid instabilities in inverting a rank-deficient matrix in case D^(k) has zero-value or dependent columns.
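One PSESOP-style subspace step for the quadratic cost can be sketched as follows; the caller supplies the list of stored directions (e.g., the negative current gradient and a few previous ones), and `eps_reg` plays the role of the small regularization constant mentioned above:

```python
import numpy as np

def psesop_step(A, y, x, dirs, eps_reg=1e-12):
    """One subspace step for min 0.5*||y - A x||^2 s.t. x >= 0: the step
    coefficients over the stored directions minimize the quadratic cost
    in closed form, with a small Tikhonov term guarding against a
    rank-deficient direction set; the result is projected onto the
    nonnegative orthant."""
    D = np.column_stack(dirs)            # search directions as columns
    R = A @ D
    r = y - A @ x                        # current residual
    alpha = np.linalg.solve(R.T @ R + eps_reg * np.eye(D.shape[1]), R.T @ r)
    return np.maximum(x + D @ alpha, 0.0)
```

With a single direction equal to the negative gradient, this reduces to an exact line search for the unconstrained quadratic, followed by the projection.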
3.5. Interior-Point Newton Algorithm
The interior-point Newton (IPN) algorithm [61] solves the NNLS problem (1) by searching for the solution x* satisfying the KKT conditions (5), which can be equivalently expressed by the nonlinear equations

D(x) g(x) = 0, x ≥ 0, (22)

where D(x) = diag{d_1(x), ..., d_J(x)}, with d_j(x) = x_j if g_j(x) ≥ 0 and d_j(x) = 1 otherwise. Applying the Newton method to (22), we have in the k-th iterative step

(D(x^(k)) A^T A + E(x^(k))) p^(k) = −D(x^(k)) g(x^(k)),

where E(x^(k)) = diag{e_1(x^(k)), ..., e_J(x^(k))} accounts for the derivative of D(x), for j = 1, ..., J.
If the solution x* is degenerate, that is, x_j* = 0 and g_j(x*) = 0 for some j, the Newton system matrix may be singular. To avoid such a case, the system of equations has been rescaled with appropriately chosen diagonal scaling matrices to the form (27). In [61], the system (27) is solved by the inexact Newton method, which leads to the following updates:

x^(k+1) = P_Ω[x^(k) + p^(k)],

where p^(k) is an inexact solution to (27) and P_Ω is a projection onto the feasible set Ω.
The transformation of the normal matrix A^T A by the scaling matrices in (27) means the system matrix is no longer symmetric and positive-definite. There are many methods for handling such systems of linear equations, like QMR [66], BiCG [67, 68], BiCGSTAB [69], or GMRES-like methods [70]; however, they are more complicated and computationally demanding than, for example, the basic CG algorithm [71]. To apply the CG algorithm, the system matrix in (27) must be converted to a positive-definite symmetric matrix, which can easily be done with normal equations. Methods like CGLS [72] or LSQR [73] are therefore suitable for such tasks. The transformed system then takes the normal-equations form, with a symmetric positive-semidefinite system matrix.
Since our cost function is quadratic, its minimization in a single step is performed by combining the projected Newton step with the constrained scaled Cauchy step, which is given in the form

p_C^(k) = −τ^(k) D(x^(k)) g(x^(k)), τ^(k) > 0.

Assuming p_C^(k) ≠ 0, τ^(k) is chosen as either the unconstrained minimizer of the quadratic function along the scaled steepest-descent direction or a scalar proportional to the distance to the boundary along this direction, whichever is smaller. The global convergence is then achieved if the actual reduction of the objective remains a sufficient fraction of the reduction predicted by the Cauchy step.
The usage of the constrained scaled Cauchy step leads to the following updates:

x^(k+1) = P_Ω[x^(k) + t p_C^(k) + (1 − t) p^(k)],

where p_C^(k) and p^(k) are given by (30) and (35), respectively, and t is the smaller square root (lying in (0, 1)) of the corresponding quadratic equation.
The Matlab code of the IPN algorithm, which solves the system Ax = y subject to nonnegativity constraints, is given in Algorithm 5. To solve the transformed system (32), we use the LSQR method implemented in Matlab 7.0.

3.6. Sequential Coordinate-Wise Algorithm
The NNLS problem (1) can be expressed in terms of the following quadratic problem (QP) [62]:

min_{x ≥ 0} (1/2) x^T H x + c^T x, (41)

where H = A^T A and c = −A^T y_t.
The sequential coordinate-wise algorithm (SCWA), proposed first by Franc et al. [62], solves the QP problem given by (41) by updating only a single variable x_j in one iterative step. It should be noted that the sequential updates can be easily done if the function is equivalently rewritten so as to separate the terms that depend on the selected variable x_j from the terms that do not, that is,

(1/2) x^T H x + c^T x = (1/2) h_jj x_j^2 + β_j x_j + γ_j,

where h_jj = [H]_jj, and β_j and γ_j gather the terms that are constant with respect to x_j.
Considering the optimization of this function with respect to the selected variable x_j, the following analytical solution can be derived:

x_j* = max{0, x_j − g_j / h_jj},

where g = Hx + c is the gradient of the QP objective function.
Updating only a single variable in one iterative step, we have x^(k+1) = x^(k) + (x_j* − x_j^(k)) e_j, where e_j is the j-th unit vector. Any optimal solution to the QP (41) satisfies the KKT conditions given by (5) and the stationarity condition of the following Lagrange function:

L(x, λ) = (1/2) x^T H x + c^T x − λ^T x,

where λ ≥ 0 is a vector of Lagrange multipliers (or dual variables) corresponding to the constraint x ≥ 0. Thus, λ = Hx + c. In the SCWA, the Lagrange multipliers are updated in each iteration according to the formula

λ^(k+1) = λ^(k) + (x_j^(k+1) − x_j^(k)) h_j,

where h_j is the j-th column of H, and x_j^(k+1) = max{0, x_j^(k) − λ_j^(k) / h_jj}.
Finally, the SCWA cycles over the coordinates j = 1, ..., J, applying the closed-form coordinate update and the corresponding multiplier update at every step.
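The whole SCWA loop fits in a few lines. The sketch below works directly on H = A^T A and c = -A^T y, sweeping cyclically over the coordinates; the number of sweeps is an assumption:

```python
import numpy as np

def scwa(H, c, x, n_sweeps=20):
    """Sequential coordinate-wise sketch for min 0.5*x'Hx + c'x, x >= 0
    (H = A'A, c = -A'y for the NNLS problem). Each step updates one
    variable in closed form and refreshes the gradient (the Lagrange
    multipliers lambda = Hx + c) using only column j of H."""
    g = H @ x + c                            # gradient at the start
    for _ in range(n_sweeps):
        for j in range(len(x)):
            xj_new = max(0.0, x[j] - g[j] / H[j, j])
            g += (xj_new - x[j]) * H[:, j]   # rank-one gradient refresh
            x[j] = xj_new
    return x
```

Refreshing the gradient with a single column of H keeps the per-coordinate cost at O(J), which is what makes the sequential updates cheap.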
4. Simulations
All the proposed algorithms were implemented in our NMFLAB and evaluated with numerical tests related to typical BSS problems. We used a synthetic benchmark of 4 partially dependent nonnegative signals (with a limited number of samples), which are illustrated in Figure 1(a). The signals are mixed by a random, uniformly distributed nonnegative matrix A with a moderate condition number. The mixed signals are shown in Figure 1(b).
Because the number of variables in X is much greater than in A, we test the projected gradient algorithms only for updating X. The variables in A are updated with the standard projected fixed-point alternating least-squares (FPALS) algorithm that is extensively analyzed in [55].
In general, the FPALS algorithm solves the least-squares problem min_A ||Y − AX||_F^2 with the Moore-Penrose pseudoinverse of the system matrix, that is, in our case, the matrix X. Since in NMF usually T >> J, we formulate the normal equations as A(XX^T) = YX^T, and the least-squares solution of minimal norm to the normal equations is A = YX^T (XX^T)^+ = YX^+, where X^+ is the Moore-Penrose pseudoinverse of X. The projected FPALS algorithm is obtained with a simple "half-rectified" projection, that is, A ← max{ε, YX^+} (elementwise).
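A one-line sketch of the projected FPALS update for A; the value of ε is an assumed placeholder:

```python
import numpy as np

def fpals_update_A(Y, X, eps=1e-16):
    """Projected FPALS sketch for the A-update: the minimal-norm
    least-squares solution A = Y X^+ (Moore-Penrose pseudoinverse),
    followed by the 'half-rectified' projection onto the nonnegative
    orthant."""
    return np.maximum(Y @ np.linalg.pinv(X), eps)
```

When X has full row rank, Y X^+ coincides with Y X^T (X X^T)^{-1}, the usual ALS solution of the normal equations.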
The alternating minimization is nonconvex, in spite of the cost function being convex with respect to one set of variables. Thus, most NMF algorithms may get stuck in local minima, and hence the initialization plays a key role. In the performed tests, we applied the multistart initialization described in [53], parameterized by the number of restarts, the number of initial alternating steps, and the number of final alternating steps. Each initial sample of A and X has been randomly generated from a uniform distribution. Each algorithm has been tested for two cases of inner iterations. The inner iterations mean the number of iterative steps that are performed to update only X (with fixed A, i.e., before going to the update of A, and vice versa). Additionally, the multilayer technique [53, 54] with 3 layers is applied.
The multilayer technique can be regarded as a multistep decomposition. In the first step, we perform the basic decomposition Y ≈ A^(1) X^(1) using any available NMF algorithm. In the second stage, the results obtained from the first stage are used to perform a similar decomposition, X^(1) ≈ A^(2) X^(2), using the same or different update rules, and so on. We continue our decomposition taking into account only the components achieved in the last step. The process can be repeated arbitrarily many times until some stopping criteria are satisfied. In each step, we usually obtain a gradual improvement of the performance. Thus, our model has the form Y ≈ A^(1) A^(2) ... A^(L) X^(L), with the basis matrix defined as A = A^(1) A^(2) ... A^(L). Physically, this means that we build up a system that has many layers or a cascade connection of mixing subsystems.
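The cascade structure can be sketched generically; `nmf_step` below stands for any single-layer NMF routine and is a hypothetical placeholder:

```python
import numpy as np

def multilayer_nmf(Y, J, nmf_step, L=3):
    """Multilayer sketch: feed the components from each layer into the
    next decomposition. `nmf_step(Y_l, J) -> (A_l, X_l)` is any
    single-layer NMF routine (assumed interface). The effective basis
    is the product A = A_1 A_2 ... A_L."""
    A_total = np.eye(Y.shape[0])
    X = Y
    for _ in range(L):
        A_l, X = nmf_step(X, J)     # decompose the previous layer's output
        A_total = A_total @ A_l     # accumulate the cascade basis
    return A_total, X
```

The returned pair satisfies Y ≈ A_total @ X whenever each layer's decomposition is accurate.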
There are many stopping criteria for terminating the alternating steps. We stop the iterations when the assumed maximum number of alternating steps is reached, or when the relative decrease of the cost function between two successive alternating steps falls below a given threshold. Note that the condition (20) can also be used as a stopping criterion, especially as the gradient is computed in each iteration of the PG algorithms.
The algorithms have been evaluated with the signal-to-interference ratio (SIR) measures, calculated separately for each source signal and each column in the mixing matrix. Since NMF suffers from scale and permutation indeterminacies, the estimated components are adequately permuted and rescaled. First, the source and estimated signals are normalized to a uniform variance, and then the estimated signals are permuted to keep the same order as the source signals. In NMFLAB [43], each estimated signal is compared to each source signal, which results in the performance (SIR) matrix that is used to construct the permutation matrix. Let x_j and x̂_j be the j-th source and its corresponding (reordered) estimated signal, respectively. Analogously, let a_j and â_j be the j-th column of the true and the corresponding estimated mixing matrix, respectively. Thus, the SIRs for the sources are given by

SIR_j^(X) = 10 log10( ||x_j||_2^2 / ||x_j − x̂_j||_2^2 ) [dB],

and similarly, for each column in A, we have

SIR_j^(A) = 10 log10( ||a_j||_2^2 / ||a_j − â_j||_2^2 ) [dB].
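The SIR measure, including the variance normalization described above, can be sketched as:

```python
import numpy as np

def sir_db(x_true, x_est):
    """SIR in dB after matching scale: both signals are normalized to
    unit variance before computing 10*log10(||x||^2 / ||x - x_hat||^2),
    so a rescaled but otherwise exact estimate scores very high."""
    x = x_true / x_true.std()
    xh = x_est / x_est.std()
    return 10.0 * np.log10(np.sum(x ** 2) / np.sum((x - xh) ** 2))
```

Higher values indicate a better match; an estimate corrupted by more interference scores a lower SIR.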
We test the algorithms with the Monte Carlo (MC) analysis, running each algorithm 100 times. Each run has been initialized with the multistart procedure. The algorithms have been evaluated with the mean-SIR values, calculated for each MC sample as the average of the individual SIR_j values. The mean-SIRs for the worst (with the lowest mean-SIR values) and best (with the highest mean-SIR values) samples are given in Table 1. For each case, Table 1 also indicates the number of inner iterations for updating X and the number of layers in the multilayer technique [53, 54]; the cases in which the multilayer technique was not used are marked accordingly. The elapsed time [in seconds] is measured in Matlab, and it gives a rough indication of the computational complexity of each algorithm.

For comparison, Table 1 also contains the results obtained for the standard multiplicative NMF algorithm (denoted as M-NMF) that minimizes the squared Euclidean distance. Additionally, the results of testing the PG algorithms which were proposed in [53] have also been included. The acronyms Lin-PG, IPG, and RMRNSD refer to the following algorithms: the projected gradient algorithm proposed by Lin [52], the interior-point gradient algorithm, and the regularized minimal residual norm steepest descent algorithm (the regularized version of the MRNSD algorithm that was proposed by Nagy and Strakos [74]). These NMF algorithms have been implemented and investigated in [53] in the context of their usefulness for BSS problems.
5. Conclusions
The performance of the proposed NMF algorithms can be inferred from the results given in Table 1. In particular, the results show how sensitive the algorithms are to initialization, or in other words, how easily they fall into local minima. Also, the complexity of the algorithms can be estimated from the information on the elapsed time that is measured in Matlab.
It is easy to notice that our NMF-PSESOP algorithm gives the best estimation (the sample which has the highest best-SIR value), and it gives only slightly lower mean-SIR values than the Lin-PG algorithm. Considering the elapsed time, the PL, GPSR-BB, SCWA, and IPG algorithms belong to the fastest, while the Lin-PG and IPN algorithms are the slowest.
The multilayer technique generally improves the performance and consistency of all the tested algorithms if the number of observations is close to the number of nonnegative components. The highest improvement can be observed for the NMF-PSESOP algorithm, especially when the number of inner iterations is greater than one.
In summary, the best and most promising NMF PG algorithms are the NMF-PSESOP, GPSR-BB, and IPG algorithms. However, the final selection of the algorithm depends on the size of the problem to be solved. Nevertheless, the projected gradient NMF algorithms seem to be much better (in the sense of speed and performance) in our tests than the multiplicative algorithms, provided that we can use the squared Euclidean cost function, which is optimal for data with Gaussian noise.
References
 A. Cichocki, R. Zdunek, and S. Amari, βNew algorithms for nonnegative matrix factorization in applications to blind source separation,β in Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '06), vol. 5, pp. 621β624, Toulouse, France, May 2006. View at: Publisher Site  Google Scholar
 J. Piper, V. P. Pauca, R. J. Plemmons, and M. Giffin, βObject characterization from spectral data using nonnegative matrix factorization and information theory,β in Proceedings of the AMOS Technical Conference, pp. 1β12, Maui, Hawaii, USA, September 2004. View at: Google Scholar
 M. N. Schmidt and M. Mørup, βNonnegative matrix factor 2D deconvolution for blind single channel source separation,β in Proceedings of the 6th International Conference on Independent Component Analysis and Blind Signal Separation (ICA '06), vol. 3889 of Lecture Notes in Computer Science, pp. 700β707, Charleston, SC, USA, March 2006. View at: Publisher Site  Google Scholar
 A. Ziehe, P. Laskov, K. Pawelzik, and K.R. Mueller, βNonnegative sparse coding for general blind source separation,β in Advances in Neural Information Processing Systems 16, Vancouver, Canada, 2003. View at: Google Scholar
 W. Wang, Y. Luo, J. A. Chambers, and S. Sanei, βNonnegative matrix factorization for note onset detection of audio signals,β in Proceedings of the 16th IEEE International Workshop on Machine Learning for Signal Processing (MLSP '06), pp. 447β452, Maynooth, Ireland, September 2006. View at: Publisher Site  Google Scholar
 W. Wang, βSquared euclidean distance based convolutive nonnegative matrix factorization with multiplicative learning rules for audio pattern separation,β in Proceedings of the 7th IEEE International Symposium on Signal Processing and Information Technology (ISSPIT '07), pp. 347β352, Cairo, Egypt, December 2007. View at: Publisher Site  Google Scholar
 H. Li, T. Adali, W. Wang, and D. Emge, βNonnegative matrix factorization with orthogonality constraints for chemical agent detection in Raman spectra,β in Proceedings of the 15th IEEE International Workshop on Machine Learning for Signal Processing (MLSP '05), pp. 253β258, Mystic, Conn, USA, September 2005. View at: Publisher Site  Google Scholar
 P. Sajda, S. Du, T. Brown, L. Parra, and R. Stoyanova, βRecovery of constituent spectra in 3D chemical shift imaging using nonnegative matrix factorization,β in Proceedings of the 4th International Symposium on Independent Component Analysis and Blind (ICA '03), pp. 71β76, Nara, Japan, April 2003. View at: Google Scholar
 P. Sajda, S. Du, T. R. Brown et al., βNonnegative matrix factorization for rapid recovery of constituent spectra in magnetic resonance chemical shift imaging of the brain,β IEEE Transactions on Medical Imaging, vol. 23, no. 12, pp. 1453β1465, 2004. View at: Publisher Site  Google Scholar
 P. Sajda, S. Du, and L. C. Parra, βRecovery of constituent spectra using nonnegative matrix factorization,β in Wavelets: Applications in Signal and Image Processing X, vol. 5207 of Proceedings of SPIE, pp. 321β331, San Diego, Calif, USA, August 2003. View at: Publisher Site  Google Scholar
 D. D. Lee and H. S. Seung, βLearning the parts of objects by nonnegative matrix factorization,β Nature, vol. 401, no. 6755, pp. 788β791, 1999. View at: Publisher Site  Google Scholar
 W. Liu and N. Zheng, βNonnegative matrix factorization based methods for object recognition,β Pattern Recognition Letters, vol. 25, no. 8, pp. 893β897, 2004. View at: Publisher Site  Google Scholar
 M. W. Spratling, βLearning image components for object recognition,β Journal of Machine Learning Research, vol. 7, pp. 793β815, 2006. View at: Google Scholar
 Y. Wang, Y. Jia, C. Hu, and M. Turk, βNonnegative matrix factorization framework for face recognition,β International Journal of Pattern Recognition and Artificial Intelligence, vol. 19, no. 4, pp. 495β511, 2005. View at: Publisher Site  Google Scholar
 P. Smaragdis, βNonnegative matrix factor deconvolution; extraction of multiple sound sources from monophonic inputs,β in Proceedings of the 5th International Conference on Independent Component Analysis and Blind Signal Separation (ICA '04), vol. 3195 of Lecture Notes in Computer Science, pp. 494β499, Granada, Spain, September 2004. View at: Publisher Site  Google Scholar
 P. Smaragdis, βConvolutive speech bases and their application to supervised speech separation,β IEEE Transactions on Audio, Speech and Language Processing, vol. 15, no. 1, pp. 1β12, 2007. View at: Publisher Site  Google Scholar
 J.H. Ahn, S. Kim, J.H. Oh, and S. Choi, βMultiple nonnegativematrix factorization of dynamic PET images,β in Proceedings of the 6th Asian Conference on Computer Vision (ACCV '04), pp. 1009β1013, Jeju Island, Korea, January 2004. View at: Google Scholar
 P. CarmonaSaez, R. D. PascualMarqui, F. Tirado, J. M. Carazo, and A. PascualMontano, βBiclustering of gene expression data by nonsmooth nonnegative matrix factorization,β BMC Bioinformatics, vol. 7, article 78, pp. 1β18, 2006. View at: Publisher Site  Google Scholar
 D. Guillamet, B. Schiele, and J. Vitrià, βAnalyzing nonnegative matrix factorization for image classification,β in Proceedings of the 16th International Conference on Pattern Recognition (ICPR '02), vol. 2, pp. 116β119, Quebec City, Canada, August 2002. View at: Publisher Site  Google Scholar
 D. Guillamet and J. Vitrià, βNonnegative matrix factorization for face recognition,β in Proceedings of the 5th Catalan Conference on Artificial Intelligence (CCIA '02), pp. 336β344, Castello de la Plana, Spain, October 2002. View at: Google Scholar
 D. Guillamet, J. Vitrià, and B. Schiele, βIntroducing a weighted nonnegative matrix factorization for image classification,β Pattern Recognition Letters, vol. 24, no. 14, pp. 2447β2454, 2003. View at: Publisher Site  Google Scholar
 O. G. Okun, βNonnegative matrix factorization and classifiers: experimental study,β in Proceedings of the 4th IASTED International Conference on Visualization, Imaging, and Image Processing (VIIP '04), pp. 550β555, Marbella, Spain, September 2004. View at: Google Scholar
 O. G. Okun and H. Priisalu, βFast nonnegative matrix factorization and its application for protein fold recognition,β EURASIP Journal on Applied Signal Processing, vol. 2006, Article ID 71817, 8 pages, 2006. View at: Publisher Site  Google Scholar
 A. PascualMontano, J. M. Carazo, K. Kochi, D. Lehmann, and R. D. PascualMarqui, βNonsmooth nonnegative matrix factorization (nsNMF),β IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 3, pp. 403β415, 2006. View at: Publisher Site  Google Scholar
 V. P. Pauca, F. Shahnaz, M. W. Berry, and R. J. Plemmons, βText mining using nonnegative matrix factorizations,β in Proceedings of the 4th SIAM International Conference on Data Mining (SDM '04), pp. 452β456, Lake Buena Vista, Fla, USA, April 2004. View at: Google Scholar
 F. Shahnaz, M. W. Berry, V. P. Pauca, and R. J. Plemmons, βDocument clustering using nonnegative matrix factorization,β Journal on Information Processing & Management, vol. 42, no. 2, pp. 373β386, 2006. View at: Publisher Site  Google Scholar
 T. Li and C. Ding, βThe relationships among various nonnegative matrix factorization methods for clustering,β in Proceedings of the 6th IEEE International Conference on Data Mining (ICDM '06), pp. 362β371, Hong Kong, December 2006. View at: Publisher Site  Google Scholar
 C. Ding, T. Li, W. Peng, and H. Park, βOrthogonal nonnegative matrix trifactorizations for clustering,β in Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '06), pp. 126β135, Philadelphia, Pa, USA, August 2006. View at: Google Scholar
 R. Zass and A. Shashua, βA unifying approach to hard and probabilistic clustering,β in Proceedings of the 10th IEEE International Conference on Computer Vision (ICCV '05), vol. 1, pp. 294β301, Beijing, China, October 2005. View at: Publisher Site  Google Scholar
 A. Banerjee, S. Merugu, I. S. Dhillon, and J. Ghosh, βClustering with Bregman divergences,β in Proceedings of the 4th SIAM International Conference on Data Mining (SDM '04), pp. 234β245, Lake Buena Vista, Fla, USA, April 2004. View at: Google Scholar
 H. Cho, I. S. Dhillon, Y. Guan, and S. Sra, βMinimum sumsquared residue coclustering of gene expression data,β in Proceedings of the 4th SIAM International Conference on Data Mining (SDM '04), pp. 114β125, Lake Buena Vista, Fla, USA, April 2004. View at: Google Scholar
 S. Wild, Seeding nonnegative matrix factorization with the spherical k-means clustering, M.S. thesis, University of Colorado, Boulder, Colo, USA, 2000.
 M. W. Berry, M. Browne, A. N. Langville, V. P. Pauca, and R. J. Plemmons, "Algorithms and applications for approximate nonnegative matrix factorization," Computational Statistics and Data Analysis, vol. 52, no. 1, pp. 155–173, 2007.
 Y.-C. Cho and S. Choi, "Nonnegative features of spectro-temporal sounds for classification," Pattern Recognition Letters, vol. 26, no. 9, pp. 1327–1336, 2005.
 J.-P. Brunet, P. Tamayo, T. R. Golub, and J. P. Mesirov, "Metagenes and molecular pattern discovery using matrix factorization," Proceedings of the National Academy of Sciences of the United States of America, vol. 101, no. 12, pp. 4164–4169, 2004.
 N. Rao and S. J. Shepherd, "Extracting characteristic patterns from genome-wide expression data by nonnegative matrix factorization," in Proceedings of the IEEE Computational Systems Bioinformatics Conference (CSB '04), pp. 570–571, Stanford, Calif, USA, August 2004.
 A. Cichocki, R. Zdunek, and S. Amari, "Csiszár's divergences for nonnegative matrix factorization: family of new algorithms," in Independent Component Analysis and Blind Signal Separation, vol. 3889 of Lecture Notes in Computer Science, pp. 32–39, Springer, New York, NY, USA, 2006.
 A. Cichocki, R. Zdunek, and S. Amari, "Nonnegative matrix and tensor factorization," IEEE Signal Processing Magazine, vol. 25, no. 1, pp. 142–145, 2008.
 D. Donoho and V. Stodden, "When does nonnegative matrix factorization give a correct decomposition into parts?," in Advances in Neural Information Processing Systems 16, Vancouver, Canada, 2003.
 A. M. Bruckstein, M. Elad, and M. Zibulevsky, "Sparse nonnegative solution of a linear system of equations is unique," in Proceedings of the 3rd International Symposium on Communications, Control and Signal Processing (ISCCSP '08), St. Julians, Malta, March 2008.
 F. J. Theis, K. Stadlthanner, and T. Tanaka, "First results on uniqueness of sparse nonnegative matrix factorization," in Proceedings of the 13th European Signal Processing Conference (EUSIPCO '05), Antalya, Turkey, September 2005.
 H. Laurberg, M. G. Christensen, M. D. Plumbley, L. K. Hansen, and S. H. Jensen, "Theorems on positive data: on the uniqueness of NMF," Computational Intelligence and Neuroscience, vol. 2008, Article ID 764206, 9 pages, 2008.
 A. Cichocki and R. Zdunek, "NMFLAB for signal and image processing," Laboratory for Advanced Brain Signal Processing, BSI, RIKEN, Saitama, Japan, 2006.
 A. Cichocki, S. Amari, R. Zdunek, R. Kompass, G. Hori, and Z. He, "Extended SMART algorithms for nonnegative matrix factorization," in Proceedings of the 8th International Conference on Artificial Intelligence and Soft Computing (ICAISC '06), vol. 4029 of Lecture Notes in Computer Science, pp. 548–562, Springer, Zakopane, Poland, June 2006.
 R. Zdunek and A. Cichocki, "Nonnegative matrix factorization with quasi-Newton optimization," in Proceedings of the 8th International Conference on Artificial Intelligence and Soft Computing (ICAISC '06), vol. 4029 of Lecture Notes in Computer Science, pp. 870–879, Zakopane, Poland, June 2006.
 P. Paatero, "Least-squares formulation of robust nonnegative factor analysis," Chemometrics and Intelligent Laboratory Systems, vol. 37, no. 1, pp. 23–35, 1997.
 P. Paatero and U. Tapper, "Positive matrix factorization: a nonnegative factor model with optimal utilization of error estimates of data values," Environmetrics, vol. 5, no. 2, pp. 111–126, 1994.
 D. D. Lee and H. S. Seung, "Algorithms for nonnegative matrix factorization," in Advances in Neural Information Processing Systems 13, pp. 556–562, Denver, Colo, USA, 2000.
 C.-J. Lin, "On the convergence of multiplicative update algorithms for nonnegative matrix factorization," IEEE Transactions on Neural Networks, vol. 18, no. 6, pp. 1589–1596, 2007.
 M. T. Chu, F. Diele, R. Plemmons, and S. Ragni, "Optimality, computation, and interpretation of nonnegative matrix factorizations," Departments of Mathematics and Computer Science, Wake Forest University, Winston-Salem, NC, USA, 2004.
 P. O. Hoyer, "Nonnegative matrix factorization with sparseness constraints," Journal of Machine Learning Research, vol. 5, pp. 1457–1469, 2004.
 C.-J. Lin, "Projected gradient methods for nonnegative matrix factorization," Neural Computation, vol. 19, no. 10, pp. 2756–2779, 2007.
 A. Cichocki and R. Zdunek, "Multilayer nonnegative matrix factorization using projected gradient approaches," International Journal of Neural Systems, vol. 17, no. 6, pp. 431–446, 2007.
 A. Cichocki and R. Zdunek, "Multilayer nonnegative matrix factorization," Electronics Letters, vol. 42, no. 16, pp. 947–948, 2006.
 A. Cichocki and R. Zdunek, "Regularized alternating least squares algorithms for nonnegative matrix/tensor factorizations," in Proceedings of the 4th International Symposium on Neural Networks (ISNN '07), vol. 4493 of Lecture Notes in Computer Science, pp. 793–802, Springer, Nanjing, China, June 2007.
 R. Zdunek and A. Cichocki, "Nonnegative matrix factorization with constrained second-order optimization," Signal Processing, vol. 87, no. 8, pp. 1904–1916, 2007.
 B. Johansson, T. Elfving, V. Kozlov, Y. Censor, P.-E. Forssén, and G. Granlund, "The application of an oblique-projected Landweber method to a model of supervised learning," Mathematical and Computer Modelling, vol. 43, no. 7-8, pp. 892–909, 2006.
 J. Barzilai and J. M. Borwein, "Two-point step size gradient methods," IMA Journal of Numerical Analysis, vol. 8, no. 1, pp. 141–148, 1988.
 G. Narkiss and M. Zibulevsky, "Sequential subspace optimization method for large-scale unconstrained problems," Department of Electrical Engineering, Technion, Israel Institute of Technology, Haifa, Israel, October 2005.
 M. Elad, B. Matalon, and M. Zibulevsky, "Coordinate and subspace optimization methods for linear least squares with non-quadratic regularization," Applied and Computational Harmonic Analysis, vol. 23, no. 3, pp. 346–367, 2007.
 S. Bellavia, M. Macconi, and B. Morini, "An interior point Newton-like method for nonnegative least-squares problems with degenerate solution," Numerical Linear Algebra with Applications, vol. 13, no. 10, pp. 825–846, 2006.
 V. Franc, V. Hlaváč, and M. Navara, "Sequential coordinate-wise algorithm for the nonnegative least squares problem," in Proceedings of the 11th International Conference on Computer Analysis of Images and Patterns (CAIP '05), A. Gagalowicz and W. Philips, Eds., vol. 3691 of Lecture Notes in Computer Science, pp. 407–414, Springer, Versailles, France, September 2005.
 M. Bertero and P. Boccacci, Introduction to Inverse Problems in Imaging, Institute of Physics, Bristol, UK, 1998.
 Y.-H. Dai and R. Fletcher, "Projected Barzilai-Borwein methods for large-scale box-constrained quadratic programming," Numerische Mathematik, vol. 100, no. 1, pp. 21–47, 2005.
 A. Nemirovski, "Orth-method for smooth convex optimization," Izvestiia Akademii Nauk SSSR. Tekhnicheskaia Kibernetika, vol. 2, 1982 (Russian).
 R. W. Freund and N. M. Nachtigal, "QMR: a quasi-minimal residual method for non-Hermitian linear systems," Numerische Mathematik, vol. 60, no. 1, pp. 315–339, 1991.
 R. Fletcher, "Conjugate gradient methods for indefinite systems," in Proceedings of the Dundee Biennial Conference on Numerical Analysis, pp. 73–89, Springer, Dundee, Scotland, July 1975.
 C. Lanczos, "Solution of systems of linear equations by minimized iterations," Journal of Research of the National Bureau of Standards, vol. 49, no. 1, pp. 33–53, 1952.
 H. A. van der Vorst, "Bi-CGSTAB: a fast and smoothly converging variant of Bi-CG for the solution of nonsymmetric linear systems," SIAM Journal on Scientific and Statistical Computing, vol. 13, no. 2, pp. 631–644, 1992.
 Y. Saad and M. H. Schultz, "GMRES: a generalized minimal residual algorithm for solving nonsymmetric linear systems," SIAM Journal on Scientific and Statistical Computing, vol. 7, no. 3, pp. 856–869, 1986.
 M. R. Hestenes and E. Stiefel, "Methods of conjugate gradients for solving linear systems," Journal of Research of the National Bureau of Standards, vol. 49, pp. 409–436, 1952.
 P. C. Hansen, Rank-Deficient and Discrete Ill-Posed Problems, SIAM, Philadelphia, Pa, USA, 1998.
 C. C. Paige and M. A. Saunders, "LSQR: an algorithm for sparse linear equations and sparse least squares," ACM Transactions on Mathematical Software, vol. 8, no. 1, pp. 43–71, 1982.
 J. G. Nagy and Z. Strakos, "Enforcing nonnegativity in image reconstruction algorithms," in Mathematical Modeling, Estimation, and Imaging, vol. 4121 of Proceedings of SPIE, pp. 182–190, San Diego, Calif, USA, July 2000.
Copyright
Copyright © 2008 Rafal Zdunek and Andrzej Cichocki. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.