Mathematical Problems in Engineering

Volume 2016 (2016), Article ID 3934872, 11 pages

http://dx.doi.org/10.1155/2016/3934872

## A General Solution to Least Squares Problems with Box Constraints and Its Applications

Sino-Dutch Biomedical and Information Engineering School, Northeastern University, Shenyang 110004, China

Received 15 November 2015; Revised 4 March 2016; Accepted 18 April 2016

Academic Editor: Masoud Hajarian

Copyright © 2016 Yueyang Teng et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

The main contribution of this paper is presenting a flexible solution to the box-constrained least squares problems. This solution is applicable to many existing problems, such as nonnegative matrix factorization, support vector machine, signal deconvolution, and computed tomography reconstruction. The key concept of the proposed algorithm is to replace the minimization of the cost function at each iteration by the minimization of a surrogate, leading to a guaranteed decrease in the cost function. In addition to the monotonicity, the proposed algorithm also owns a few good features including the self-constraint in the feasible region and the absence of a predetermined step size. This paper theoretically proves the global convergence for a special case of below-bounded constraints. Using the proposed mechanism, some valuable algorithms can be derived. The simulation results demonstrate that the proposed algorithm provides performance that is comparable to that of other commonly used methods in numerical experiment and computed tomography reconstruction.

#### 1. Introduction

Solving the linear system is a classic inverse problem, where and are vectors and is a matrix. This problem is applicable in many fields, including nonnegative matrix factorization (NMF), support vector machine (SVM), signal deconvolution, and medical image reconstruction (e.g., computed tomography (CT)) [1–5]. In many cases, we need to impose a box constraint on the problem. A classical approach to choosing is to minimize the least squares (LS) error between and : where and are given constant vectors and denotes the Euclidean distance.

In fact, (1) can be easily converted into an equivalent form by a linear transformation , which is convenient to study. In the following, we only focus on such LS problems and still denote variables with for simple notation; subsequently, the readers may generalize all the results of this paper to the generic form by themselves:where is a constant vector.

If we denote , then it is the widely used case, that is, the nonnegativity constrained LS problem:

There are many methods for solving nonnegativity constrained problems. A simple method for solving such problems is by using Of course, we must truncate the negative pixel values to satisfy the nonnegativity constraint. However, we will encounter two difficulties: perhaps is not invertible or is too large. A viable example for the minimization of (3) is the gradient-based method. The steepest descent method is perhaps the simplest technique to implement, which takes the negative gradient as the descent direction: where the superscript () denotes the th iteration and is the step size.

Let be a constant, where , and denotes the maximum eigenvalue for a matrix; then, (5) becomes the projected Landweber (PL) method [6]:

In addition, Lantéri et al. [7, 8] provided a general multiplicative algorithm, which was also a gradient-based method. Let be a positive function with positive values for any ; then, is also a descent direction. Therefore, the algorithm in a modified gradient form can be written as Again, let and be two positive functions for any , which satisfy . Taking , Lantéri et al. obtained a multiplicative update

The conjugate gradient (CG) method is also a popular method, and it is often implemented as an iterative algorithm, applicable to sparse systems that are too large to be handled by a direct implementation.

The above approaches are not effective for processing the upper-bounded constraints, and, generally, a truncation is imposed to the update rule, which may lead to a divergent modification.

In this paper, we consider a specific application of the surrogate-based methods [9] to a specific type of LS problem with box constraints. We derive a multiplicative update rule to iteratively and monotonically (in the sense of decreasing the cost function) solve the problem, similar to the EM algorithm [10]. The nonnegativity constraints will be satisfied automatically, and the upper-bounded constraints are performed by an upper truncation. Meanwhile, the algorithm still has the monotonicity and does not require an adjustable step size. We provide a rigorous global convergence proof for the case of only nonnegativity constraints, which can be easily generalized to the below-constrained case. We demonstrate the power of this mechanism by deriving many existing and new algorithms. We also present some computer simulation results to show the desirable behavior of the proposed algorithm with respect to the convergence rate and the stability compared with existing methods.

#### 2. Methodology

A note about our notation: all vectors will be column vectors unless transposed to a row vector by a prime superscript . For a matrix or vector , means that any component of is equal to or greater than 0. For a matrix , and represent the th row and th column of , respectively.

As mentioned in many articles [1, 2, 9, 11], a surrogate as defined below is useful in algorithm derivations and convergence proofs.

*Definition 1 (surrogate). *Denote as a surrogate for at (fixed) if and .

Clearly, is decreasing under the update because

There are two obvious and important properties for a surrogate: additivity and transitivity. For the former, the sum of two surrogates is a surrogate of the sum of the two original functions. For the latter, the surrogate of a surrogate of a function is a surrogate of this function.

In the following, we proceed step by step: firstly, assume that and ; secondly, remove the restriction of ; and thirdly, remove the restriction of .

##### 2.1. and

Let , where and are nonnegative vectors; then, . Note that we decompose instead of for convenient derivation, which can be seen below. We construct a surrogate by the convexity of . Denote that satisfy , and . They can be the convex combination coefficients such that

It is easy to verify that . If considering Jensen’s inequality and the convex combination coefficients , then is proven by the following inequality:

Take the partial derivatives of ; then, we can solve the one-dimensional equations to obtain a multiplicative update rule:

##### 2.2. Any but

Let , , and , where and are two nonnegative matrices; then, we can construct the surrogate for as follows: where

It is easy to verify that . By the convexity of , it is clear that

Following the same process as in Section 2.1, we can construct surrogates for and . Let then, Note that and are relative to the constant terms, which do not work for the optimization problems; thus, we ignore them. We can now obtain a surrogate for at :

Take the partial derivatives of and :

Then, we solve the one-dimensional equations to obtain a multiplicative update rule:

##### 2.3. Any and

As shown, (21) will ensure the monotonic decrease of the cost function and satisfy the nonnegativity constraints. In the following, it will be modified by a truncation at the end of each iteration to satisfy the box constraints: We will show that the truncation still ensures the monotonic decrease of the cost function.

It is easy to see the separability of the variables of (19); that is, , and every separated function has a quadratic form:

We further simplify it into a general form: where , , and . It is clear that minimizes the quadratic function. If , we change nothing. If , then it is easy to obtain that because on the open interval ( such that strictly monotonously decreases on that open interval. Thus, the proof is established.

In fact, we know that ; therefore, we may provide a generic solution to (2), which is the main result of this paper.

*Algorithm 2. *Start from an initial point ; then, (1)update by (21);(2)truncate by (22).

It is easy to generalize the constraints from to by the transformation , which is left for readers to complete by themselves.

#### 3. Convergence

The convergence proof with box constraints is very difficult; thus, we only focus on the below-bounded constraints, which can be viewed as a special case of box constraints with . Additionally, we consider the equivalence between the general below-constrained form and the nonnegative constrained one by a linear transformation ; then, we will theoretically prove the global convergence of the latter case for simplification, which is easy to extend to that of the former. In theory, the Kuhn-Tucker (KT) point will be a global solution if the cost function is convex. It is easy to see the convexity of . By Theorem 2.19 in [12], the KT conditions of (3) () are as follows:

We entertain several important and reasonable assumptions.

*Assumption 3. *For the iteration sequence , we assume that (1)the algorithm starts from a positive image;(2) for all ;(3) is a strictly convex function.

The first assumption forces the iterations to be positive, but the limit may be zero. The second condition is reasonable because suggests for any . Thus, the equation is irrelevant to , and, then, in is removable. The third one is indeed restrictive; however, it is important for the following derivation.

We will prove the global convergence along the lines of [13–17]. First, we provide several useful lemmas.

Lemma 4. *The set of accumulation points of a bounded sequence with is connected and compact.*

*Proof. *This is Theorem 28.1 from Ostrowski [18]. The reader is kindly referred to this paper for the proof.

Lemma 5. *The iteration sequence is bounded.*

*Proof. *Because is a strictly convex function, is a positive definite symmetric matrix, and, thus, it can be factorized into by Cholesky’s method, where is an invertible matrix. We assume that is the sole minimum, and, then, we expand on using a Taylor series: Let ; then, for any constant , it is easy to know the boundedness of such that is bounded, which is equivalent to being bounded. Let ; then, because monotonically decreases. Therefore, is bounded.

Lemma 6. *The sequence .*

*Proof. *Take into account that Let . Because , Because monotonically decreases and is bounded from below, ; therefore, .

Lemma 7. *If a subsequence , then as well.*

*Proof. *By contradiction, if diverges, then it must have a convergent subsequence because of the boundness by Lemma 5. Let , and if we consider the two convergent subsequences and , then there must be a positive integer to make and when . By the triangle inequality, we can obtain the contradictive result to as

Lemma 8. *At each iteration, one knows that *

*Proof. *We consider that It is easy to verify the correctness of (31) if replacing by (32).

The global convergence will be proven from the three theorems below.

Theorem 9. *Let be any convergent subsequence; then, meets the first KT condition (25).*

*Proof. *When , it is easy to obtain that by (33). In addition, and such that (Lemma 7); thus,

Theorem 10. *The entire sequence converges.*

*Proof. *According to Lemmas 4, 5, and 6, the set of accumulation points of is connected and compact. If we can prove that the number of accumulation points is finite, then the desired result follows because a finite set can be connected only if it consists of a single point [16].

To prove the existence of a finite number of accumulation points, we consider any accumulation point . Given an integer set , where is the total number of components of , is a subset of . Let be the restrictions of to the set , which is a strictly convex function of the reduced variables. It follows that has a unique minimum (Theorem 9: ). This means that an accumulation point must correspond to a subset of . The number of subsets of is finite, and, thus, the number of accumulation points is also finite.

In Theorem 9, we prove that every accumulation point meets the first KT condition, by which the full sequence convergence is provided in Theorem 10. Naturally, the limit of satisfies the first KT condition. In the following, we will show that the second KT condition is satisfied.

Theorem 11. *The limit of satisfies the second KT condition (26).*

*Proof. *When , by contradiction, we assume that there is an that satisfies . Because , there exists an and a positive integer such that for ; then, Thus, we can obtain that , which is a contradiction to .

#### 4. Solutions to Some Existing and New Examples

##### 4.1. NMF

NMF has been widely used in pattern recognition, machine learning, and data mining. It decomposes the nonnegative matrix by the product of two other nonnegative matrices and . Lee and Seung [1] have proposed using the square of the Euclidean distance to measure the similarity between and :

Minimizing them under the constraints and , Lee and Seung [1] alternated between solving an optimization problem in the variables and then solving another one in the variables . As can be seen, and have the same roles.

To take , for instance, with and given, denote such that the surrogate function can be constructed: The update rule can be obtained by solving . Reversing the roles of and , we can similarly construct the surrogate for and acquire the update rule:

##### 4.2. Variation of Linearized SVM

SVM attempts to separate points belonging to two given sets in real Euclidean space by a surface. Several practical applications of SVMs use nonlinear kernels, such as the polynomial and radial basis function kernels. However, in applications such as text classification, linear SVMs are still used because it has been observed that many text classification problems are linearly separable [19]. Most literature on large-scale SVM training have targeted the linear SVM problem, citing this fact. A dual form of linear SVM is usually used because of the simple structure, which can be formulated as [20]where is the input data, is the class label, is the penalty parameter, and represents the Lagrangian multiplier that needs to be optimized. Hsieh et al. [21] handled the constraint by removing it, which corresponded to removing the “intercept” from the classifier. For a simplified expression, we denote , and, then, we obtain the following simple formula:

We can solve this optimization problem using the proposed method. Let and , where and are two nonnegative matrices; then, where

We, respectively, construct surrogates for and . Let then,

We now obtain a surrogate for at :

We solve the one-dimensional equation and truncate it to obtain an update:

##### 4.3. Nonnegative Image Deblurring

In many optical devices, the process of image blurring can be considered to be the result of convolution by a point spread function (PSF) [22, 23]. It is assumed that the degraded image is in the form , where is the true image and is the PSF matrix consisting of PSFs at every pixel. As noted in [24, 25], a simple way to approach the deconvolution problem is to find the least squares (LS) estimation between and . It is well known that the problem of restoring the original image from the noisy and degraded version is an ill-posed inverse problem: small perturbations in the data may result in an unacceptable result [26]. The Tikhonov regularization method [27, 28] is a popular method that generally leads to a unique solution, which is formulated as Note that we must approve negative values existing in the matrix .

In [17], we assume that and are a nonnegative matrix and vector and that there is no restriction on ; then, we, respectively, construct surrogate functions for and using the method proposed in this paper such that we obtain a multiplicative update rule as follows:If , it is De Pierro’s ISRT (Image Space Reconstruction Technique) [4]:

##### 4.4. CT Reconstruction with Regularization

In CT reconstruction, let be the system probability matrix, projections, and a predetermined matrix. Then, a classical approach is to choose an (X-ray attenuation image) such that the LS error [5] between and is minimized: where serves as a penalty parameter. Many matrices have been used, such as the first- or second-order derivative matrix or wavelet basis matrix. Note that both and are a nonnegative matrix and vector; however, we must accept negative values existing in the matrix .

The alternating direction method of multipliers (ADMM) [29, 30] has been developed for solving optimization problems. This method decomposes the original problem into three subproblems and then sequentially solves them at each iteration.

Introducing the additional variable , a fixed penalty parameter , and the Lagrangian multiplier , a unified framework can then be given to solve the -norm regularized LS reconstruction problem [29].

*Algorithm 12 (ADMM general framework). *() .

() .

() .

The -update is the most difficult problem that minimizes with nonnegativity constraints. We can, respectively, construct surrogate functions for and ; then minimize the sum of the surrogates to update ; however, there is a high overlap ratio with the above, so we leave them to the readers.

#### 5. Experimental Results

We compare the performance of the proposed surrogate method, with those of the PL method (6) and the CG method on simulated data and CT projection data. For the PL method, we use [31], where and denote the 1- and -norm of a matrix. The code for the CG method comes from [32], which is slightly modified to meet our criteria.

The experiments are performed on a HP Compaq PC with a 3.00 GHz Core i5 CPU and 4 GB memory. The algorithms are implemented in MATLAB 7.0. All of the algorithms are initiated by the same uniform image for a fair comparison.

The mean square error (MSE) is used to measure the similarity to the true solution, as given below:

The following criterion is applied to stop the iterative process: where is a difference tolerance.

##### 5.1. Numerical Simulation

Here, we randomly generate and , where the former are uniformly distributed on and the latter are on . Then we obtain by . We will minimize under the box constraints . In the experiment, we generate , and once, but we randomly generate start points (uniformly distributed on ) 50 times to obtain a reliable averaged result.

Figure 1 presents a comparison of MSE and versus iteration number. As shown, CG rapidly finishes iteration to a static solution, so that after that, which can not be shown with a log-scaled -axis. However, such a phenomenon does not mean a good result. Instead, it is the worst result among the three algorithms by observing the MSE curve. We can conclude that CG is not suitable to the box-constrained LS problem. PL and the proposed method always decrease the curves of MSE and . However, the proposed method shows a superior performance than the PL method.