Journal of Optimization

Volume 2016, Article ID 5975120, 8 pages

http://dx.doi.org/10.1155/2016/5975120

## Bidirectional Nonnegative Deep Model and Its Optimization in Learning

Chongqing Key Lab of Computational Intelligence, College of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China

Received 18 August 2016; Accepted 17 October 2016

Academic Editor: Tongliang Liu

Copyright © 2016 Xianhua Zeng et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Nonnegative matrix factorization (NMF) has been successfully applied in signal processing as a simple two-layer nonnegative neural network. Projective NMF (PNMF), which has fewer parameters, projects high-dimensional nonnegative data onto a lower-dimensional nonnegative subspace. Although PNMF overcomes the out-of-sample problem of NMF, it does not consider the nonlinear characteristics of data and remains a narrow signal decomposition method. In this paper, we combine PNMF with deep learning and nonlinear fitting to propose a bidirectional nonnegative deep learning (BNDL) model and its optimization learning algorithm, which can obtain nonlinear multilayer deep nonnegative feature representations. Experiments show that the proposed model not only solves the out-of-sample problem of NMF but also learns hierarchical nonnegative feature representations with better clustering performance than the classical NMF, PNMF, and Deep Semi-NMF algorithms.

#### 1. Introduction

In machine learning, pattern recognition, computer vision, and image processing, an important problem is to find effective representations of an input data matrix with nonnegative elements and very high dimension. In 1999, Lee and Seung proposed a classical feature representation method, named nonnegative matrix factorization (NMF) [1], which effectively addresses this problem. The basic idea of the NMF algorithm may be described as follows.

Given a nonnegative data matrix $X$, whose columns are the samples, NMF seeks a nonnegative basis matrix $W$ and a nonnegative coefficient matrix $H$ such that $X \approx WH$. Because every factor is nonnegative, only additive, not subtractive, linear combinations are allowed. To a degree, this captures the essence of intelligent data description. The objective function can be defined as

$$\min_{W \ge 0,\; H \ge 0} \; \| X - WH \|_F^2 .$$

Although the NMF is well suited to learning the parts of objects, it suffers from the out-of-sample problem [2, 3]; namely, obtaining the coefficients of a newly arriving example requires an indirect computation or a repeated factorization. To overcome this disadvantage, researchers have put forward improved methods based on the NMF algorithm. For example, Yuan et al. proposed Projective NMF (PNMF) [4] in 2009. The PNMF is a modified form of the traditional NMF, with strong sparseness and orthogonality [4, 5] under the projection assumption. It only needs to calculate one nonnegative matrix $W$, thereby reducing the amount of computation at each iteration; that is, the PNMF learns a nonnegative matrix $W$ to directly project the data matrix $X$ onto the lower-dimensional nonnegative subspace. If $W$ denotes the basis matrix, the PNMF treats $W^T X$ as the coefficient matrix and utilizes $W W^T X$ to reconstruct $X$. So its objective function is

$$\min_{W \ge 0} \; \| X - W W^T X \|_F^2 .$$

The PNMF has fewer parameters than the NMF, is widely used in linear dimension reduction, and solves the out-of-sample deficiency. Like the NMF, however, the PNMF is a linear dimensionality reduction method, whereas many data sets exhibit nonlinear characteristics [6]. Moreover, the NMF and the PNMF factorize the original data only once [7]. In many situations, the nonnegative data sampled from real applications are very complex and need to be factorized several times to obtain high-level deep features that are distinctive and have strong representation ability. Some studies have shown that deep learning is needed to learn high-level representations of complex data and to achieve better performance in image understanding and speech perception [6]. Deep learning has had a profound impact in both academia and industry since Hinton and Salakhutdinov published their well-known article [8] in Science in 2006. That article shows the following: (1) an artificial neural network with many hidden layers has an excellent ability to learn features that describe the data more essentially and facilitate visualization, clustering, and classification; (2) the difficulty of training a deep neural network can be overcome by “layer by layer initialization” (layer-wise pretraining). With the success of training deep architectures, several variants of deep learning have been introduced [6, 9]. These multilayer algorithms take hierarchical approaches to feature extraction and provide efficient solutions to complex problems, and they use error backpropagation and unsupervised learning to obtain an effective representation model. However, they do not consider the following concerns: (1) the weights should be nonnegative when many physical signals are nonnegative data; (2) a purely additive description uses few components, which makes the components of the nonnegative data clear.

To obtain a deep nonnegative feature representation, Trigeorgis et al. applied the concept of Semi-NMF [10] to propose a Deep Semi-NMF [9] that is able to learn hidden deep representations of the original data. In the Semi-NMF, the goal is to construct a low-dimensional nonnegative representation $H$ of the original data $X$, with the basis matrix $Z$ serving as the mapping between the original data and its lower-dimensional representation [10]. The Deep Semi-NMF model finds a representation of the data that has a similar interpretation at the top layer. The input data matrix $X$ is now further factorized into a product of multiple factors $Z_1, Z_2, \ldots, Z_m$ and a top-layer representation $H_m$, which is called deep seminonnegative matrix factorization. This means that it is able to decompose the data in different ways according to multiple different attributes:

$$X \approx Z_1 Z_2 \cdots Z_m H_m, \qquad H_m \ge 0 .$$

Although the Deep Semi-NMF uses a multilayer model to obtain more features, it only handles the seminonnegative setting and remains a linear transformation with weak representation capacity. Moreover, the Deep Semi-NMF model still has the out-of-sample problem.

Based on the above analysis, the PNMF computes only one projection matrix and cannot learn richer features, especially when the data lie on or near a nonlinear manifold or are hierarchically generated. Motivated by the ideas of the PNMF, the Deep Semi-NMF, and deep learning (especially the AutoEncoder [8, 11]), in this paper we propose a novel model, which we call bidirectional nonnegative deep learning (BNDL), for learning more helpful and meaningful deep nonnegative representations of original data with nonlinear characteristics and for overcoming the out-of-sample problem. In Section 2, we introduce our BNDL method and analyze its objective functions. We give the corresponding algorithms in Section 3. Experiments are presented in Section 4. In Section 5, we give brief concluding remarks.

#### 2. Bidirectional Nonnegative Deep Learning Model

##### 2.1. Motivation

The particular attraction of the NMF algorithm is its nonnegativity constraints, which make it useful for data representation in clustering. But the NMF is a simple linear coding algorithm using a single-layer network with nonnegative constraints, and it suffers from the out-of-sample deficiency: it cannot directly obtain the codes of newly arriving examples [12, 13].
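To make the linear, single-layer nature of the NMF concrete, the following is a minimal sketch of the standard Lee and Seung multiplicative updates for the squared Frobenius objective. It is an illustration only, not necessarily the exact variant used in this paper; the helper names (`matmul`, `transpose`, `frobenius_error`, `nmf`), the random initialization, and the small constant `eps` are assumptions introduced here.

```python
import random

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def frobenius_error(X, W, H):
    """Squared Frobenius norm of X - WH."""
    WH = matmul(W, H)
    return sum((x - y) ** 2 for rx, ry in zip(X, WH) for x, y in zip(rx, ry))

def nmf(X, rank, iters=200, seed=0, eps=1e-9):
    """Multiplicative updates for min ||X - WH||_F^2 with W >= 0 and H >= 0."""
    rng = random.Random(seed)
    m, n = len(X), len(X[0])
    # Strictly positive random start (an assumed initialization).
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(m)]
    H = [[rng.random() + 0.1 for _ in range(n)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T X) / (W^T W H), element-wise
        Wt = transpose(W)
        num, den = matmul(Wt, X), matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(rh, ra, rb)]
             for rh, ra, rb in zip(H, num, den)]
        # W <- W * (X H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num, den = matmul(X, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(rw, ra, rb)]
             for rw, ra, rb in zip(W, num, den)]
    return W, H

# Toy nonnegative data: 3 features (rows) by 3 samples (columns).
X = [[1.0, 2.0, 3.0],
     [2.0, 4.0, 6.0],
     [1.0, 1.0, 1.0]]
W, H = nmf(X, rank=2)
```

Note that the coefficients `H` exist only for the training columns of `X`. Coding an unseen sample would require running further updates with `W` held fixed, which is the out-of-sample deficiency described above.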

The PNMF algorithm, in contrast, uses the transpose of the learned basis matrix as the projection matrix, which yields nonnegative coefficients for newly arriving examples [4, 14]. Although it overcomes the out-of-sample problem of the NMF, the PNMF is still a linear coding algorithm and a simple single-layer decomposition.
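The out-of-sample behaviour of the PNMF can be sketched in a few lines: once a nonnegative basis has been learned, a new sample is coded by a single multiplication with the transposed basis. The matrix `W` below is a hypothetical hand-picked basis used purely for illustration (it is not learned, and the function names are assumptions); the learning step itself is not shown.

```python
def pnmf_encode(W, x):
    """Code of one sample: h = W^T x (nonnegative when W and x are)."""
    return [sum(W[i][k] * x[i] for i in range(len(x))) for k in range(len(W[0]))]

def pnmf_reconstruct(W, h):
    """Reconstruction of one sample: x_hat = W h = W W^T x."""
    return [sum(w * c for w, c in zip(row, h)) for row in W]

# Hypothetical basis with two orthonormal, nonnegative columns.
W = [[1.0, 0.0],
     [0.0, 1.0],
     [0.0, 0.0]]

x_new = [2.0, 3.0, 0.0]            # a sample not seen during training
h_new = pnmf_encode(W, x_new)      # direct projection, no refitting
x_hat = pnmf_reconstruct(W, h_new)
```

Because both steps are plain matrix products, the mapping is linear, which is the limitation the BNDL model is meant to address.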

On the other hand, existing deep network models rarely consider nonnegativity constraints; even the most closely related recent model, Deep Semi-NMF [9], imposes only a partial nonnegativity constraint and is still a linear model.

In this paper, we propose a nonnegative hierarchical data representation model, named bidirectional nonnegative deep learning (BNDL) model, which applies the concept of PNMF to train an initial multilayer nonlinear structure that is able to learn hidden complete deep representations of the original data.

Unlike other deep architectures, the BNDL first constructs a pretrained deep network by independently training nonnegative two-layer networks and stacking them to form the whole network; the learning process of each layer combines the PNMF with a designed nonlinear mapping. That is, each step performs a one-step decomposition: the basis matrix of the two-layer BNDL is regarded as the weight matrix of the deep network, and the output of this step, passed through a Sigmoid function, is used as the input of the next layer. Iterating this process upwards yields a deep network; going downwards, we can reconstruct the original sample data. Because BNDL learns only one layer in each step, a deep network can be built quickly. This hierarchical feature extraction strategy learns more meaningful and helpful features and higher-order nonnegative nonlinear characteristics than one-step learning. Finally, fine-tuning is applied to improve the reconstruction performance and the deep features of the network under the nonnegative weight constraints.
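The upward and downward passes through a stacked network of this kind can be sketched as follows. This is a rough illustration under stated assumptions, not the paper's exact formulation: it assumes each upward step applies a Sigmoid to a scaled projection by the transposed weight matrix, that each downward step applies the same Sigmoid to a scaled product with the weight matrix, and that the scaling enters as a multiplier `scale`. The weights are hypothetical hand-picked nonnegative values; the layer-wise learning and fine-tuning are not shown.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def project(W, h, scale):
    """One upward step (assumed form): sigmoid(scale * W^T h), element-wise."""
    return [sigmoid(scale * sum(W[i][k] * h[i] for i in range(len(h))))
            for k in range(len(W[0]))]

def reconstruct(W, h, scale):
    """One downward step (assumed form): sigmoid(scale * W h), element-wise."""
    return [sigmoid(scale * sum(w * c for w, c in zip(row, h))) for row in W]

def encode(weights, x, scale=0.1):
    """Pass a sample upward through every layer; return all layer outputs."""
    outputs = [x]
    for W in weights:
        outputs.append(project(W, outputs[-1], scale))
    return outputs

def decode(weights, top, scale=0.1):
    """Pass a top-layer code downward through the layers in reverse order."""
    h = top
    for W in reversed(weights):
        h = reconstruct(W, h, scale)
    return h

# Hypothetical nonnegative weights for a 4 -> 3 -> 2 network.
weights = [
    [[0.5, 0.1, 0.0], [0.2, 0.4, 0.1], [0.0, 0.3, 0.6], [0.1, 0.0, 0.7]],
    [[0.6, 0.1], [0.2, 0.5], [0.0, 0.8]],
]
layers = encode(weights, [1.0, 0.5, 0.0, 2.0])
x_hat = decode(weights, layers[-1])
```

With nonnegative inputs and weights, every Sigmoid argument is nonnegative, so each hidden value lies in the interval from 0.5 up to (but not including) 1. This keeps all layer outputs nonnegative, so each one is a valid input for the next nonnegative layer.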

##### 2.2. Bidirectional Nonnegative Deep Learning Model

Let denote the data sample set, among which denotes the feature descriptor of the th sample and is the number of total samples. Here we assume that the input data matrix is nonnegative. Let denote the dimension of the desired dimension-reduced feature space. The task of data factorization is to get a nonnegative basis matrix and its corresponding coefficient matrix . Here denotes the projection matrix that transforms an -dimensional feature vector into a -dimensional feature space. The matrices , , are the output matrices of the first layers, and is equal to the original matrix . The objective of the th factorization is as close as possible ; that is, . So the objective function for projective nonnegative multilayer factorization can be defined as

where is a positive factor that prevents the input to the Sigmoid function from becoming too large and (or or ) denotes that each element of it is nonnegative.

Because the top-to-bottom operator uses the same S-shaped nonlinear mapping function, the top-to-bottom reconstruction basis should also be constrained to reconstruct the input of the ()th layer. The objective cost is therefore extended to

where is the balance factor. In our experiments, if , ; else .

The two-layer structure used to construct BNDL is illustrated in Figure 1.