Abstract
We present a novel method, called graph sparse nonnegative matrix factorization, for dimensionality reduction. The affinity graph and sparse constraint are further taken into consideration in nonnegative matrix factorization and it is shown that the proposed matrix factorization method can respect the intrinsic graph structure and provide the sparse representation. Different from some existing traditional methods, the inertial neural network was developed, which can be used to optimize our proposed matrix factorization problem. By adopting one parameter in the neural network, the global optimal solution can be searched. Finally, simulations on numerical examples and clustering in realworld data illustrate the effectiveness and performance of the proposed method.
1. Introduction
Dimensionality reduction plays a fundamental role in image processing, and many researchers have been seeking effective methods to solve this problem. For a given image database, there are many distinct features, whereas the available features are far less enough. Thus, it is of great significance to find useful features with lowdimensionality to represent the original feature space. For this purpose, matrix factorization techniques have attracted great attention in recent decades [1–3], and many different methods have been developed by using different criteria. The most familiar methods include Singular Value Decomposition (SVD) [4], Principal Component Analysis (PCA) [5], and Vector Quantization (VQ) [6]. The main idea of matrix factorization methods is finding several matrices whose product approximates to the original matrix. In dimensionality reduction, the dimension of the decomposed matrices is smaller than that of the original matrix. This gives rise to a lowdimensional compact representation of the original data points, which can facilitate clustering or classification.
Among these matrix factorization methods, one of the most used methods is nonnegative matrix factorization (NMF) [3], which requires the decomposed matrices to be nonnegative. The effect of the nonnegative constraint leads NMF to learn a partbased representation of highdimensional data, and it is applied to so many areas such as signal processing [7], data mining [8, 9], and computer vision [10]. In general, NMF is shown to be effective for unsupervised learning problems, but not applicable to supervised learning problems. To overcome this problem, some researchers [11–13] have presented semisupervised learning theory to achieve better performance in dimensionality reduction. In the light of locality preserving projection, a graph regularized nonnegative matrix factorization method (GNMF) has been proposed to impose the geometrical information on the data space. The geometrical structure is constructed by a nearest neighbor graph [11]. Based on the idea of label propagation, Liu et al. [13] imposed the label information constraint into nonnegative matrix factorization (CNMF). The idea of CNMF is that the neighboring data points with the same class are supposed to merge together in the lowdimensional representation space.
Motivated by previous researches in matrix factorization, in this paper, we propose a novel method, called graph sparse nonnegative matrix factorization (GSNMF), for dimensionality reduction, which can be used for semisupervised learning problems. In GSNMF, a sparse constraint is imposed on GNMF, and this leads matrix factorization to learn a sparse partbased representation of the original data space. The sparse constraint causes GSNMF to be a nonconvex nonsmooth problem, and traditional optimization algorithms can not be optimized directly. Recently, numerous neural networks have emerged as a powerful tool for optimization problems [14–27]. For some nonconvex problems, an inertial projection neural network (IPNN) [16] has been proposed to search different local optimal solutions by the inertial term. In [17], a shuffled frog leaping algorithm (SFLA) has been developed using the recurrent neural network. Based on the SFLA framework, the global optimal solution can be searched. Moreover, there are many optimization methods for nonconvex nonsmooth problems that use neural networks [22–27].
It is worth highlighting some advantages of our proposed method as follows:(i)Traditional algorithms for GNMF [11] and NMF [3] can easily trap into local optimum solution, and these algorithms are sensitive to initial values, while our proposed algorithm using inertial projection neural network can avoid these problems.(ii)Our proposed algorithm can be initialized by sparser matrices; however, GNMF and NMF may fail in this case.(iii)By adopting one parameter in the neural network, GSNMF has the better clustering effect than GNMF and NMF.
The rest of the paper is organized as follows. In Section 2, some related works to NMF are briefly reviewed; then we introduce GSNMF. Section 3 reviews the inertial projection neural network theory and provides a convergence proof to GSNMF. Section 4 presents numerical examples to demonstrate the validity of our proposed algorithm. Experiments on clustering are given in Section 5. Finally, we present some concluding marks and future work in Section 6.
2. Problem Formulation
To find the effective features of high dimensionality data, matrix factorization can be used to learn sets of features to represent data. Given a data matrix and an integer , matrix factorization is to find two matrices and such thatWhen , the matrix factorization method can be regarded as a dimensionality reduction method. In image dimensionality reduction, each column of is a basis vector to capture the original image data and each column of is the representation with respect to the new basis. The most used method to measure the approximation is the Frobenius norm in the following form:
Different matrix factorization methods imposed different constraints on (2) that can solve different practical problems. At present, the most used matrix factorization method is nonnegative matrix factorization (NMF) [3] with nonnegative constraints on and . The classic algorithm is summarized as follows:
Recently, Cai et al. [11] proposed a graph regularized nonnegative matrix factorization method (GNMF) and incorporated the geometrical information into the data space. The goal of GNMF is to find effective basis vectors to represent the intrinsic structure. The research has presented the natural assumption that if two data points and from are close in the intrinsic geometry of the data distribution, new representations of two points and are also close to each other. For each data point , we find its nearest neighbors and put edges between and its neighbors. Edges between each data points can be considered as the weight matrix . If nodes and are connected by an edge, then . can be described by
The lowdimensional representation of with respect to the new basis is . The Euclidean distance is used to measure the dissimilarity between and byWith the above analysis, the following term is used to measure the smoothness of the lowdimensional representation:where denotes the trace of a matrix, , and . Combining (6) and (2), the new objective function is defined by the Euclidean distance.The algorithm to solve (7) is presented as follows:
When or , GNMF is equivalent to nonnegative matrix factorization. In the representation of the image data, GNMF and NMF only consider the Euclidean structure of image space. However, recent researches have shown that human generated images may from a submanifold of the ambient Euclidean space [28, 29]. In general, the human generated images cannot uniformly fill up the highdimensional Euclidean space. Therefore, the matrix factorization should respect the intrinsic manifold structure and learn the sparse basis to represent the image data. In the light of sparse coding [30], we impose the sparse constraint on (7) and the optimization problem is transformed into another form:
Because the optimization problem (9) is nonconvex, a blockcoordinate update (BCD) [31] structure is proposed to optimize GSNMF. Given the initial and , BCD alternatively solvesuntil convergence. Since (11) and (10) have a similar form, we only consider how to solve (10); then (11) can be solved accordingly. The problem (10) can be transformed into the following vector form:whereIt is evident that norm is not differentiable. However, [32] has presented a method to solve it. Supposing , problem (12) can be rewritten as follows:where , , , , and and are, respectively, defined as follows:According to the BCD structure, (14) can be separated into two subproblems. Given the initial and , one alternatively solvesuntil convergence. Since (16) and (17) have a similar form, we only consider how to solve (16); then (17) can be solved accordingly. Equation (16) can be transformed into the following convex quadratic program (CQP):where
According to the above analysis, problem (11) can be also transformed into a convex quadratic program problem. For saving space, we do not provide the derivation process. In the following section, we will introduce IPNN to optimize (18).
3. Neural Network Model and Analysis
3.1. Inertial Projection Neural Network
To solve problem (18), we establish the following neural network using IPNN [16]:where . Now, we are ready to show the convergence and optimality of (20). For any and , we set , , and ; then we will present the following theorems.
Theorem 1. For any initial point with the initial condition , there exists unique continuous solution , where .
Proof. Note that is Lipschitz continuous. Let be the Lipschitz constant. Thus, for any and , we obtainwhere and are unit matrix and Lipschitz constant, respectively. Therefore, is Lipschitz continuous on . There exists unique solution with initial condition by the local existence theorem of solution to ordinary differential equations.
Theorem 2. Define , if the following two conditions hold.
For any and ,where is the angle between and .
. Then, the solution of model (20) converges to optimal solution set of (18).
Proof. Considering and the Lyapunov function , we obtainSince and Condition holds, (23) can be rewritten asThen, (24) can be transformed intowhich indicates that the function is monotone nonincreasing. Thus, for any ,Since , we obtain . FurtherBy multiplying the inequality (27) into , we obtain which implies thatIntegrating (28) from 0 to , it is obtained that . Therefore, the trajectory of model (20) is bounded.
Since is bounded and inequality (26) holds, we obtainThus, (29) can be rewritten asSince is bounded, (30) indicates that is also bounded. From (25) and (30), one obtains and .
Assuming , and since , it implies that there exists , such that for any . Therefore, one obtainsand it is easy to obtain thatAccording to the theory of Calculus, the value of exists. Therefore, the value of also exists. Since , we obtain . Since is bounded, we have .
It follows from (20) thatIf , thenwhich implies that . In the following, we will prove .
Defining , and substituting into (20), one obtainsSince Condition holds, we haveHence, we getIntegrating (35), it gives . Since , we obtain . Thus, the solution of system (20) converges to the optimal set . The proof is completed.
Remark 3. In Theorem 2, Condition should be satisfied. In the following, we discuss the existence of parameter . For any and , we haveTherefore, is also Lipschitz continuous andThus, we getwhere is the angle between and . It is easy to obtain
3.2. Algorithms
Based on the above analysis, we summarize Algorithms 1, 2, and 3 to optimize GSNMF. Firstly, the parameters , , , and can be derived by Algorithm 1. Secondly, Algorithm 2 applies IPNN to optimize the CQP problem. Thirdly, the optimization problem (9) is divided into two CQP problems which are solved alternatively by Algorithm 3. In the following, we analyse the time complexity of our proposed algorithms. The main cost of our proposed algorithms is spent on the calculation of the gradient in Algorithm 2. To optimize subproblem (10), the operation in Algorithm 2 is the matrix product , which takes . With the cost for calculating , the complexity of using Algorithm 2 to optimize the subproblem (10) is Similarly, the complexity of using Algorithm 2 to optimize the subproblem (11) is Hence, the overall cost to solve GSNMF is We summarize the time complexity of one iteration round of GSNMF, GNMF, and NMF in Table 1. At each iteration, there are two operations, same as NMF and GNMF.



4. Numerical Examples
In this section, we exhibit the global searching ability of GSNMF. By adjusting the inertial term in the neural network, different local optimal solutions can be searched. Let , , , , , , , , and . In order to ensure the validity of this experiment, we provide the initial , , and in Tables 2, 3, and 4, respectively. Table 5 shows the comparison between GSNMF and NMF. To investigate whether GSNMF can converge, the convergence curve is depicted in Figure 1 with .
5. Application in Image Clustering
5.1. Databases
To examine the clustering performance of GSNMF, we present the experiment in two databases including IRIS and COIL20. Their details are presented in the following (see also Table 6).
(1) IRIS. It includes 150 instances with 4 features. There are 3 classes including Versicolour, Setosa, and Virginca. Each class includes 50 instances.
(2) COIL20. This data set is an image library which contains 1440 instances with 16 × 16 gray scale features. There are 20 different classes and each class contains 72 instances.
5.2. Compared Methods
We present the clustering performance on two databases using GSNMF and GNMF. There are two metrics including accuracy and normalized mutual information [33] to evaluate the clustering performance. To reveal the effect of the sparse constraint, different cardinalities (the number of zero entries) of the initial are considered to evaluate the clustering performance. Because NMF is a nonconvex problem, different initial and may lead to different local optimal solutions. For the fair comparison, we try 10 different initial and and report the average results. We compare different methods in two cases. Firstly, and are randomly generated between 0 and 1. Secondly, and are randomly generated between −1 and 1.
5.3. Compared Results
Tables 7 and 8 present the clustering results on IRIS in two cases. The parameters for IRIS are , , , , , and . The clustering performance on COIL20 is shown in Tables 9 and 10 in two cases. The parameters for COIL20 are , , , , , and . These tables reveal some interesting points:(i)When the sparse constraint is imposed on GNMF, the clustering performance of GSNMF is better than NMF and GNMF.(ii)When the initial and have some negative entries, NMF and GNMF fail to cluster. However, GSNMF is not affected in this case.
5.4. Parameters Selection
Our GSNMF has one essential parameter: the inertia term . Figures 2 and 3 depict the average performance of GSNMF with different .
(a) Accuracy
(b) Normalized mutual information
(a) Accuracy
(b) Normalized mutual information
5.5. Convergence Study
According to the BCD and IPNN theory, the method for optimizing GSNMF is proved to be convergent. Here we investigate whether this method can converge to a stationary point. Figure 4 depicts the convergence curves of GSNMF on two data sets. For each figure, the axis is the iteration number and the axis denotes the objective value.
(a) IRIS
(b) COIL20
6. Conclusion and Future Work
We propose a dimensionality reduction method, which can be solved by the inertial projection neural network. According to the experiments, three advantages are presented. Firstly, different local solutions can be achieved with different inertial terms . Secondly, the clustering performance cannot be affected by the negative initial values. However, GNMF and NMF have poor performance in clustering with negative initial values. Thirdly, if the initial values are sparse, our proposed method performs better than GNMF and NMF in the clustering.
Several topics remain to be discussed in our future work:(i)There is a parameter which searches the global optimal solution of GSNMF. Thus, a suitable value of is critical to our algorithm. It remains unclear how to select theoretically.(ii) is a step length to decide the convergence rate in Algorithm 2. If it is assigned a small value, slow convergence makes a bad clustering performance. Thus, an adaptive step length should be considered.
Conflicts of Interest
The authors declare that they have no conflicts of interest.