Gradient Based Iterative Solutions for Sylvester Tensor Equations
This paper is concerned with the numerical solution of the Sylvester tensor equation, which includes the Sylvester matrix equation as a special case. By applying the hierarchical identification principle proposed by Ding and Chen (2005) and by using tensor arithmetic concepts, an iterative algorithm and its modification are established to solve the Sylvester tensor equation. Convergence analysis indicates that the iterative solutions always converge to the exact solution for an arbitrary initial value. Finally, some examples are provided to show that the proposed algorithms are effective.
1. Introduction
This paper is concerned with the solution of the following equation:
$$\mathcal{X} \times_1 A + \mathcal{X} \times_2 B + \mathcal{X} \times_3 C = \mathcal{D}, \tag{1}$$
where the matrices $A \in \mathbb{R}^{n_1 \times n_1}$, $B \in \mathbb{R}^{n_2 \times n_2}$, and $C \in \mathbb{R}^{n_3 \times n_3}$, and the tensors $\mathcal{X}, \mathcal{D} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$. The properties of the operators $\times_1$, $\times_2$, and $\times_3$ are described in detail in Section 2. When $\mathcal{X}$ is a 2-mode tensor, that is, a matrix $X$, (1) reduces to
$$AX + XB^{T} = D. \tag{2}$$
This is just the Sylvester matrix equation, which has received much attention, often motivated by applications in control theory, model reduction, signal processing, filtering, and system identification [1–7]. We therefore call (1) a Sylvester tensor equation in this paper.
The simplest direct approach for solving the matrix equation (2) is the Kronecker product method, which requires huge memory; for example, if $n_1 = n_2 = 100$, the dimension of the new coefficient matrix is 10000. An alternative direct approach is first to transform the coefficient matrices into special forms, such as the Schur canonical form, the Hessenberg-Schur form, and others [10–12], and then to solve another matrix equation that can be readily computed. Such methods have been widely adopted; for example, a 3D Schur decomposition was developed to solve the Sylvester tensor equation (1). However, these methods require transformations of the coefficient matrices, which incur considerable cost.
Iterative algorithms for solving matrix equations have been studied extensively; see, for example, [14–21]. In particular, Ding and Chen proposed several simple iterative methods for matrix equations in [22–24]. These methods, resembling the classical Jacobi and Gauss-Seidel iterations for linear systems, are easy to implement and cost little per step. Gradient based iterative algorithms and least squares based iterative algorithms have also been proposed in [25–27] for (coupled) matrix equations, which include the Sylvester and Lyapunov matrix equations as special cases. However, to the best of the authors' knowledge, there has so far been hardly any iterative method for the Sylvester tensor equation in the literature.
In this paper, we first extend the gradient based iterative algorithm for the Sylvester matrix equation (2) to the Sylvester tensor equation (1), building on the previous work of Ding and Chen. The basic idea of this algorithm is to regard the unknown tensor as the system parameter tensor by applying the hierarchical identification principle. We then present a more efficient modified iterative method. The convergence of both algorithms is proved, and numerical examples are given to show that the two iterations are effective.
The rest of this paper is organized as follows. In Section 2, we describe tensor notations and common operations used throughout the paper. An extension and a modification of the gradient based iterative algorithm for the Sylvester tensor equation (1) are presented in Section 3. The convergence of the algorithms is analyzed in Section 4. Section 5 contains some numerical experiments to illustrate the theoretical results. Finally, some conclusions are outlined in Section 6.
2. Tensor Notations and Basic Operations
In this section, we briefly review some concepts and notations that are used throughout the paper. Matrices are denoted by capital letters, for example, $A$, $B$, and $C$; tensors are denoted by Euler script letters, for example, $\mathcal{X}$, $\mathcal{Y}$, and $\mathcal{D}$. The identity matrix is denoted by $I$. The order of a tensor is the number of its dimensions, also known as ways or modes. As special cases, a vector is a 1-mode tensor and a matrix is a 2-mode tensor. The operator $\mathrm{vec}(\cdot)$ stacks the columns of a matrix to form a vector. $\|\cdot\|$ denotes the Frobenius norm unless otherwise stated.
2.1. Tensor-Matrix Multiplication
An important operation for a tensor is the tensor-matrix multiplication [28–30]. The 1-mode product of a tensor $\mathcal{X} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$ by a matrix $U \in \mathbb{R}^{m \times n_1}$, denoted by $\mathcal{X} \times_1 U$, is a tensor in $\mathbb{R}^{m \times n_2 \times n_3}$ whose entries are given by
$$(\mathcal{X} \times_1 U)_{i j_2 j_3} = \sum_{j_1=1}^{n_1} u_{i j_1} x_{j_1 j_2 j_3}.$$
Similarly, the 2-mode multiplication of $\mathcal{X}$ by a matrix $V \in \mathbb{R}^{m \times n_2}$ is given by
$$(\mathcal{X} \times_2 V)_{j_1 i j_3} = \sum_{j_2=1}^{n_2} v_{i j_2} x_{j_1 j_2 j_3}.$$
The 3-mode multiplication is analogous and is omitted here.
It follows immediately from the definition that $m$-mode and $n$-mode multiplication commute if $m \neq n$:
$$(\mathcal{X} \times_m U) \times_n V = (\mathcal{X} \times_n V) \times_m U.$$
If the modes are the same, then
$$(\mathcal{X} \times_n U) \times_n V = \mathcal{X} \times_n (VU).$$
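The two commutation rules above are easy to check numerically. The following NumPy sketch (Python rather than the MATLAB Tensor Toolbox used later in the paper; the helper name `mode_product` and all sizes are illustrative) verifies both identities:

```python
import numpy as np

def mode_product(X, A, n):
    """n-mode product X x_n A: contract mode n of X with the columns of A."""
    # tensordot contracts A's second axis with axis n of X; the new axis
    # produced by A appears first, so move it back to position n.
    return np.moveaxis(np.tensordot(A, X, axes=(1, n)), 0, n)

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 4, 5))
A = rng.standard_normal((6, 3))    # acts on mode 1 (Python axis 0)
B = rng.standard_normal((7, 4))    # acts on mode 2 (Python axis 1)
A2 = rng.standard_normal((2, 6))   # a second matrix on mode 1

# distinct modes commute: (X x_1 A) x_2 B == (X x_2 B) x_1 A
lhs = mode_product(mode_product(X, A, 0), B, 1)
rhs = mode_product(mode_product(X, B, 1), A, 0)
print(np.allclose(lhs, rhs))  # True

# equal modes compose: (X x_1 A) x_1 A2 == X x_1 (A2 A)
print(np.allclose(mode_product(mode_product(X, A, 0), A2, 0),
                  mode_product(X, A2 @ A, 0)))  # True
```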
2.2. Inner Product and Tensor Norm
The inner product of two tensors $\mathcal{X}, \mathcal{Y} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$ is defined by
$$\langle \mathcal{X}, \mathcal{Y} \rangle = \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} \sum_{k=1}^{n_3} x_{ijk} y_{ijk},$$
and the norm induced by this inner product is $\|\mathcal{X}\| = \sqrt{\langle \mathcal{X}, \mathcal{X} \rangle}$.
For two tensors $\mathcal{X}$ and $\mathcal{Y}$, the computation of $\langle \mathcal{X}, \mathcal{Y} \rangle$ can be simplified as follows:
$$\langle \mathcal{X}, \mathcal{Y} \rangle = \mathrm{vec}(\mathcal{X})^{T} \mathrm{vec}(\mathcal{Y}),$$
where $\mathrm{vec}(\cdot)$ here stacks all entries of a tensor into a single vector.
Moreover, $n$-mode multiplication commutes with respect to the inner product; that is,
$$\langle \mathcal{X} \times_n U, \mathcal{Y} \rangle = \langle \mathcal{X}, \mathcal{Y} \times_n U^{T} \rangle.$$
Finally, the tensor norm of $n$-mode multiplication is mutually consistent; that is,
$$\|\mathcal{X} \times_n U\| \le \|U\| \, \|\mathcal{X}\|.$$
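The inner product identities above can likewise be checked numerically. The sketch below (Python/NumPy; the helpers `mode_product` and `inner` are illustrative names, not from the paper) verifies the adjoint rule and the norm consistency:

```python
import numpy as np

def mode_product(X, A, n):
    return np.moveaxis(np.tensordot(A, X, axes=(1, n)), 0, n)

def inner(X, Y):
    # <X, Y> = sum over all entries of the elementwise product
    return float(np.sum(X * Y))

rng = np.random.default_rng(1)
X = rng.standard_normal((3, 4, 5))
Y = rng.standard_normal((6, 4, 5))
A = rng.standard_normal((6, 3))

# adjoint rule: <X x_1 A, Y> == <X, Y x_1 A^T>
ok_adjoint = np.isclose(inner(mode_product(X, A, 0), Y),
                        inner(X, mode_product(Y, A.T, 0)))
print(ok_adjoint)  # True

# norm consistency: ||X x_1 A|| <= ||A||_F ||X||
ok_norm = (np.linalg.norm(mode_product(X, A, 0))
           <= np.linalg.norm(A) * np.linalg.norm(X))
print(ok_norm)  # True
```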
3. Iterative Algorithms
In this section, we apply the tensor arithmetic concepts and the hierarchical identification principle to solve the Sylvester tensor equation (1).
Firstly, we consider the existence and uniqueness of the solution of (1). By using the Kronecker product, we obtain the linear system
$$\left(I_{n_3} \otimes I_{n_2} \otimes A + I_{n_3} \otimes B \otimes I_{n_1} + C \otimes I_{n_2} \otimes I_{n_1}\right) \mathrm{vec}(\mathcal{X}) = \mathrm{vec}(\mathcal{D}),$$
which is equivalent to (1). This vectorized representation immediately leads to the following result.
Lemma 1. Equation (1) has a unique solution if and only if $\alpha_i + \beta_j + \gamma_k \neq 0$ for any $i = 1, \ldots, n_1$, $j = 1, \ldots, n_2$, and $k = 1, \ldots, n_3$, where $\alpha_i$, $\beta_j$, and $\gamma_k$ represent the eigenvalues of $A$, $B$, and $C$, respectively.
In the case of the Sylvester matrix equation (2), this corresponds to the well-known condition $\alpha_i + \beta_j \neq 0$ for any $i$ and $j$.
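A small numerical check of the vectorized form and of Lemma 1's solvability condition can be sketched as follows (Python/NumPy; the sizes and the diagonal shifts are illustrative choices that keep every eigenvalue sum away from zero):

```python
import numpy as np

def mode_product(X, A, n):
    return np.moveaxis(np.tensordot(A, X, axes=(1, n)), 0, n)

rng = np.random.default_rng(2)
n1, n2, n3 = 2, 3, 4
# diagonal shifts keep every eigenvalue sum nonzero (Lemma 1)
A = rng.standard_normal((n1, n1)) + 3 * np.eye(n1)
B = rng.standard_normal((n2, n2)) + 3 * np.eye(n2)
C = rng.standard_normal((n3, n3)) + 3 * np.eye(n3)
D = rng.standard_normal((n1, n2, n3))

I1, I2, I3 = np.eye(n1), np.eye(n2), np.eye(n3)
# with column-major vec: vec(X x_1 A) = (I3 kron I2 kron A) vec(X), etc.
M = (np.kron(I3, np.kron(I2, A))
     + np.kron(I3, np.kron(B, I1))
     + np.kron(C, np.kron(I2, I1)))

# unique solvability: no eigenvalue sum alpha_i + beta_j + gamma_k is zero
sums = [a + b + c for a in np.linalg.eigvals(A)
                  for b in np.linalg.eigvals(B)
                  for c in np.linalg.eigvals(C)]
print(min(abs(s) for s in sums) > 0)  # True

# solve the vectorized system and check it satisfies the tensor equation
X = np.linalg.solve(M, D.flatten(order='F')).reshape((n1, n2, n3), order='F')
R = D - mode_product(X, A, 0) - mode_product(X, B, 1) - mode_product(X, C, 2)
print(np.linalg.norm(R) < 1e-10)  # True
```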
In order to derive the iterative algorithms for solving (1), we introduce the following residual tensors:
$$\mathcal{D}_1 = \mathcal{D} - \mathcal{X} \times_2 B - \mathcal{X} \times_3 C, \qquad \mathcal{D}_2 = \mathcal{D} - \mathcal{X} \times_1 A - \mathcal{X} \times_3 C, \qquad \mathcal{D}_3 = \mathcal{D} - \mathcal{X} \times_1 A - \mathcal{X} \times_2 B.$$
According to the hierarchical identification principle, (1) is decomposed into three subsystems:
$$\mathcal{X} \times_1 A = \mathcal{D}_1, \qquad \mathcal{X} \times_2 B = \mathcal{D}_2, \qquad \mathcal{X} \times_3 C = \mathcal{D}_3.$$
Consider the following least squares problems:
$$\min_{\mathcal{X}} J_1(\mathcal{X}) = \tfrac{1}{2}\|\mathcal{X} \times_1 A - \mathcal{D}_1\|^2, \qquad \min_{\mathcal{X}} J_2(\mathcal{X}) = \tfrac{1}{2}\|\mathcal{X} \times_2 B - \mathcal{D}_2\|^2, \qquad \min_{\mathcal{X}} J_3(\mathcal{X}) = \tfrac{1}{2}\|\mathcal{X} \times_3 C - \mathcal{D}_3\|^2.$$
Note that the gradients of $J_1$, $J_2$, and $J_3$ with respect to $\mathcal{X}$ are given by
$$\nabla J_1(\mathcal{X}) = (\mathcal{X} \times_1 A - \mathcal{D}_1) \times_1 A^{T}, \qquad \nabla J_2(\mathcal{X}) = (\mathcal{X} \times_2 B - \mathcal{D}_2) \times_2 B^{T}, \qquad \nabla J_3(\mathcal{X}) = (\mathcal{X} \times_3 C - \mathcal{D}_3) \times_3 C^{T}.$$
Minimizing $J_1$, $J_2$, and $J_3$ leads to the following recursive equations: with the full residual $\mathcal{R}(k) = \mathcal{D} - \mathcal{X}(k) \times_1 A - \mathcal{X}(k) \times_2 B - \mathcal{X}(k) \times_3 C$,
$$\mathcal{X}_1(k+1) = \mathcal{X}(k) + \mu\, \mathcal{R}(k) \times_1 A^{T}, \qquad \mathcal{X}_2(k+1) = \mathcal{X}(k) + \mu\, \mathcal{R}(k) \times_2 B^{T}, \qquad \mathcal{X}_3(k+1) = \mathcal{X}(k) + \mu\, \mathcal{R}(k) \times_3 C^{T},$$
$$\mathcal{X}(k+1) = \tfrac{1}{3}\left[\mathcal{X}_1(k+1) + \mathcal{X}_2(k+1) + \mathcal{X}_3(k+1)\right],$$
where $\mu > 0$ is the iterative step length, whose admissible range is given in the following section.
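A minimal sketch of the gradient iteration just described, assuming a zero initial guess and a deliberately conservative step length (Python/NumPy; names such as `gi_solve` and all sizes are illustrative, not from the paper):

```python
import numpy as np

def mode_product(X, A, n):
    return np.moveaxis(np.tensordot(A, X, axes=(1, n)), 0, n)

def residual(X, A, B, C, D):
    return (D - mode_product(X, A, 0) - mode_product(X, B, 1)
              - mode_product(X, C, 2))

def gi_solve(A, B, C, D, mu, iters=2000):
    """Gradient based iteration (GI): average of three mode-wise
    gradient corrections, all built from the same residual R(k)."""
    X = np.zeros_like(D)
    for _ in range(iters):
        R = residual(X, A, B, C, D)
        X1 = X + mu * mode_product(R, A.T, 0)
        X2 = X + mu * mode_product(R, B.T, 1)
        X3 = X + mu * mode_product(R, C.T, 2)
        X = (X1 + X2 + X3) / 3.0
    return X

rng = np.random.default_rng(3)
n1, n2, n3 = 4, 5, 6
# diagonally dominant triangular test matrices (illustrative)
A = np.triu(rng.random((n1, n1)), 1) + np.diag(5 + rng.random(n1))
B = np.triu(rng.random((n2, n2)), 1) + np.diag(5 + rng.random(n2))
C = np.triu(rng.random((n3, n3)), 1) + np.diag(5 + rng.random(n3))
D = rng.random((n1, n2, n3))

# conservative step length; the admissible range is analyzed in Section 4
mu = 1.8 / (np.linalg.norm(A, 2)**2 + np.linalg.norm(B, 2)**2
            + np.linalg.norm(C, 2)**2)
X = gi_solve(A, B, C, D, mu)
rel = np.linalg.norm(residual(X, A, B, C, D)) / np.linalg.norm(D)
print(rel)  # small relative residual for this well-conditioned example
```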
Remark 2. Note that when we compute $\mathcal{X}_2(k+1)$ in Algorithm 1, $\mathcal{X}_1(k+1)$ has already been computed. Similarly, when we compute $\mathcal{X}_3(k+1)$, both $\mathcal{X}_1(k+1)$ and $\mathcal{X}_2(k+1)$ have been computed. Hence, we can take full advantage of this information to update $\mathcal{X}(k)$ in a Gauss-Seidel fashion and obtain a more efficient algorithm, as in Algorithm 2.
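The Gauss-Seidel idea of Remark 2 can be sketched as follows (Python/NumPy; `mgi_solve` is an illustrative reimplementation with a deliberately conservative step length, not the paper's Algorithm 2 verbatim). Each mode-wise correction recomputes the residual from the freshest iterate instead of reusing the residual of $\mathcal{X}(k)$:

```python
import numpy as np

def mode_product(X, A, n):
    return np.moveaxis(np.tensordot(A, X, axes=(1, n)), 0, n)

def residual(X, A, B, C, D):
    return (D - mode_product(X, A, 0) - mode_product(X, B, 1)
              - mode_product(X, C, 2))

def mgi_solve(A, B, C, D, kappa, iters=600):
    """Modified gradient based iteration (MGI): the three mode-wise
    corrections are applied sequentially, each re-using the residual
    of the most recently updated iterate (Gauss-Seidel style)."""
    X = np.zeros_like(D)
    for _ in range(iters):
        X = X + kappa * mode_product(residual(X, A, B, C, D), A.T, 0)
        X = X + kappa * mode_product(residual(X, A, B, C, D), B.T, 1)
        X = X + kappa * mode_product(residual(X, A, B, C, D), C.T, 2)
    return X

rng = np.random.default_rng(5)
n1, n2, n3 = 4, 5, 6
A = np.triu(rng.random((n1, n1)), 1) + np.diag(6 + rng.random(n1))
B = np.triu(rng.random((n2, n2)), 1) + np.diag(6 + rng.random(n2))
C = np.triu(rng.random((n3, n3)), 1) + np.diag(6 + rng.random(n3))
D = rng.random((n1, n2, n3))

# conservative step length for this illustration only
kappa = 0.3 / (np.linalg.norm(A, 2)**2 + np.linalg.norm(B, 2)**2
               + np.linalg.norm(C, 2)**2)
X = mgi_solve(A, B, C, D, kappa)
rel = np.linalg.norm(residual(X, A, B, C, D)) / np.linalg.norm(D)
print(rel)
```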
4. Convergence Analysis
In this section, we discuss the convergence properties of the algorithms in the previous section. The following theorem provides a sufficient condition that guarantees the convergence of Algorithm 1.
Proof. Define the error tensor $\tilde{\mathcal{X}}(k) = \mathcal{X}(k) - \mathcal{X}^{*}$, where $\mathcal{X}^{*}$ denotes the exact solution of (1). Using (1) and Algorithm 1, the error satisfies the recursion (21), where $\mathcal{R}(k)$ denotes the residual. Taking norms on both sides of (21) and using (9)–(11), we obtain (22). Combining (19) and (22) shows that $\|\tilde{\mathcal{X}}(k)\|$ is nonincreasing and that the series of residual norms converges. Since the terms of a convergent series tend to zero, $\mathcal{R}(k) \to 0$ as $k \to \infty$. According to Lemma 1, this implies $\tilde{\mathcal{X}}(k) \to 0$ as $k \to \infty$. Thus, the theorem holds.
Based on Theorem 3, we have the following theorem.
Proof. The proof is similar to that of Theorem 3 and is hence omitted.
Now we consider the convergence rate of Algorithm 1. From (19) and (21), it is not difficult to derive a residual error recursion in which the residual is multiplied at every step by a fixed iteration matrix. The closer the eigenvalues of this iteration matrix are to zero, the faster the residual error converges to zero. Hence, the convergence rate of Algorithm 1 depends on the condition number of the associated system matrix $I_{n_3} \otimes I_{n_2} \otimes A + I_{n_3} \otimes B \otimes I_{n_1} + C \otimes I_{n_2} \otimes I_{n_1}$; that is, Algorithm 1 converges quickly when this condition number is small. The analysis of Algorithm 2 is similar and is hence omitted.
5. Numerical Experiments
This section gives some examples to illustrate the performance of the proposed algorithms. All the implementations are based on the codes from the MATLAB Tensor Toolbox developed by Bader and Kolda [31, 32].
Example 5. Consider (1) with given coefficient matrices $A$, $B$, and $C$ and a right-hand side $\mathcal{D}$ for which the exact solution $\mathcal{X}^{*}$ is known. Taking suitable step lengths, we apply Algorithms 1 and 2 to compute approximate solutions of (1). Since the exact solution is known in this example, we plot the errors $\|\mathcal{X}(k) - \mathcal{X}^{*}\| / \|\mathcal{X}^{*}\|$ to indicate the convergence of the sequence $\mathcal{X}(k)$. These values are plotted on a logarithmic scale; if the curve is approximately a decreasing straight line, the convergence of the sequence is at least linear.
From Figure 1, we can see that both algorithms converge linearly and that the MGI algorithm converges much faster than the GI algorithm.
Remark 6. The effect of changing the step lengths $\mu$ and $\kappa$ is illustrated in Figure 2. We observe that the larger the step length is, the faster the algorithm converges. However, if the step length is too large, the algorithm may diverge. How to choose the best step length remains an open problem.
Example 7. This example comes from the literature, except that one coefficient matrix and the tensor $\mathcal{D}$ are replaced by a random matrix and a random tensor, respectively. The MATLAB code that generates these matrices and computes the step lengths is as follows (the dimensions m, n, and l are set first; tenrand is from the Tensor Toolbox):
rand('state',0); alpha = 3;
A = triu(rand(m,m),1) + diag(alpha + diag(rand(m,m)));
B = triu(rand(n,n),1) + diag(alpha + diag(rand(n,n)));
C = triu(rand(l,l),1) + diag(alpha + diag(rand(l,l)));
D = tenrand([m n l]);
mu    = 2.2/(norm(A)^2 + norm(B)^2 + norm(C)^2);
kappa = 1.2*min([1/norm(A)^2, 1/norm(B)^2, 1/norm(C)^2]);
Since the exact solution in this example is unknown, we plot the relative residual errors $\|\mathcal{R}(k)\| / \|\mathcal{D}\|$.
From Figure 3, we can see that the convergence of the GI algorithm tends to slow down after some iterations, whereas the MGI algorithm converges linearly and much faster than the GI method.
Example 8. In this example, we consider the effect of the condition number of the coefficient matrices on the two algorithms. The stopping criterion is a prescribed tolerance on the relative residual $\|\mathcal{R}(k)\| / \|\mathcal{D}\|$. The MATLAB code that generates these matrices is as follows:
rand('state',0);
A = rand(m,m) + diag(ones(m,1))*alpha;
B = rand(n,n) + diag(ones(n,1))*alpha;
C = rand(l,l) + diag(ones(l,1))*alpha;
D = tenrand([m n l]);
For different values of alpha, the number of iterations and the corresponding condition number of the coefficient matrix are outlined in Table 1.
6. Conclusions
We have presented gradient based iterative algorithms for solving the Sylvester tensor equation, which includes the Sylvester matrix equation as a special case. They are based on the hierarchical identification principle and tensor arithmetic concepts. The convergence analysis indicates that the iterative solutions given by the proposed algorithms converge to the exact solution for any initial value, and this is confirmed by the numerical examples. In addition, the idea of this paper can easily be extended to iterative solutions of the higher-order ($\mathcal{X}$ is an $N$-mode tensor) Sylvester tensor equation.
Acknowledgments
This work is supported by the National Natural Science Foundation of China (Grants no. 11201092 and no. 11261012) and the Scientific Research Starting Foundation for Doctors of Guizhou Normal University, China.
References
J. Li, F. Ding, and G. Yang, "Maximum likelihood least squares identification method for input nonlinear finite impulse response moving average systems," Mathematical and Computer Modelling, vol. 55, no. 3-4, pp. 442–450, 2012.
G. H. Golub and C. F. Van Loan, Matrix Computations, Johns Hopkins University Press, Baltimore, Md, USA, 3rd edition, 1996.
J. A. Heinen, "Technique for solving the extended discrete Lyapunov matrix equation," IEEE Transactions on Automatic Control, vol. 17, no. 1, pp. 156–157, 1972.
B.-W. Li, S. Tian, Y.-S. Sun, and Z.-M. Hu, "Schur-decomposition for 3D matrix equations and its application in solving radiative discrete ordinates equations discretized by Chebyshev collocation spectral method," Journal of Computational Physics, vol. 229, no. 4, pp. 1198–1212, 2010.
T. G. Kolda, "Multilinear Operators for Higher-Order Decompositions," Tech. Rep. SAND2006-2081, Sandia National Laboratories, Livermore, Calif, USA, 2006.