Research Article | Open Access
Gradient Based Iterative Solutions for Sylvester Tensor Equations
This paper is concerned with the numerical solution of the Sylvester tensor equation, which includes the Sylvester matrix equation as a special case. By applying the hierarchical identification principle proposed by Ding and Chen (2005) and by using tensor arithmetic concepts, an iterative algorithm and its modification are established to solve the Sylvester tensor equation. Convergence analysis indicates that the iterative solutions always converge to the exact solution for arbitrary initial values. Finally, some examples are provided to show that the proposed algorithms are effective.
This paper is concerned with the solution of the following equation:
$$\mathcal{X}\times_1 A+\mathcal{X}\times_2 B+\mathcal{X}\times_3 C=\mathcal{D}, \quad (1)$$
where the matrices $A\in\mathbb{R}^{n_1\times n_1}$, $B\in\mathbb{R}^{n_2\times n_2}$, and $C\in\mathbb{R}^{n_3\times n_3}$, and the tensors $\mathcal{X},\mathcal{D}\in\mathbb{R}^{n_1\times n_2\times n_3}$. The properties of the mode-$n$ product operators $\times_1$, $\times_2$, and $\times_3$ are described in detail in Section 2. When $\mathcal{X}$ is a 2-mode tensor, that is, a matrix $X$, (1) reduces to
$$AX+XB^T=D. \quad (2)$$
This is just the Sylvester matrix equation, which has received much attention, often motivated by applications in control theory, model reduction, signal processing, filtering, and system identification [1–7]. So we call (1) a Sylvester tensor equation in this paper.
The simplest direct approach for solving the matrix equation (2) is the Kronecker product method [8], which requires huge memory; for example, if $n_1=n_2=100$, the dimension of the new coefficient matrix is $10000$. An alternative direct approach is to first transform the coefficient matrices into special forms, such as the Schur canonical form, the Hessenberg-Schur form [1], and others [10–12], and then to solve another matrix equation which may be readily computed. Such methods have been widely adopted. For example, a 3D Schur decomposition was developed to solve the Sylvester tensor equation (1) (for more details see [13]). However, these methods require transformations of the coefficient matrices, which incur considerable cost.
Iterative algorithms for solving matrix equations have been studied extensively; see, for example, [14–21]. In particular, Ding and Chen proposed several simple iterative methods for matrix equations in [22–24]. These methods, resembling the classical Jacobi and Gauss-Seidel iterations for linear systems, are easy to implement and cost little per step. Gradient based iterative algorithms and least squares based iterative algorithms have also been proposed in [25–27] for (coupled) matrix equations, which include the Sylvester matrix equations and Lyapunov matrix equations as special cases. However, to the best of the authors' knowledge, no iterative method for the Sylvester tensor equation has appeared in the literature so far.
In this paper, we first extend the gradient based iterative algorithm for the Sylvester matrix equation (2) to solve the Sylvester tensor equation (1), based on the previous work in [22]. The basic idea of such an algorithm is to regard the unknown tensor $\mathcal{X}$ as the system parameter tensor to be identified, by applying the hierarchical identification principle. Then, we present a more efficient modified iterative method. The convergence properties of the two algorithms are proved. Numerical examples are given to show that the two iterations are effective.
The rest of this paper is organized as follows. In Section 2, we describe tensor notations and common operations used throughout the paper. An extension and a modification of the gradient based iterative algorithm for the Sylvester tensor equation (1) are presented in Section 3. The convergence of the algorithms is analyzed in Section 4. Section 5 contains some numerical experiments to illustrate the theoretical results of this paper. Finally, some conclusions are outlined in Section 6.
2. Tensor Notations and Basic Operations
In this section, we briefly review some concepts and notations that are used throughout the paper. Matrices are denoted by capital letters, for example, $A$, $B$, and $C$. Tensors are denoted by Euler script letters, for example, $\mathcal{X}$ and $\mathcal{D}$. The identity matrix is denoted by $I$. The order of a tensor is the number of dimensions, also known as ways or modes. As special cases, a vector is a 1-mode tensor and a matrix is a 2-mode tensor. The operator $\mathrm{vec}(\cdot)$ stacks the columns of a matrix to form a vector. $\|\cdot\|$ denotes the Frobenius norm unless otherwise stated.
2.1. Tensor-Matrix Multiplication
An important operation for a tensor is the tensor-matrix multiplication [28–30]. The 1-mode product of a tensor $\mathcal{X}\in\mathbb{R}^{n_1\times n_2\times n_3}$ by a matrix $A\in\mathbb{R}^{m\times n_1}$, denoted by $\mathcal{X}\times_1 A$, is a tensor in $\mathbb{R}^{m\times n_2\times n_3}$ whose entries are given by
$$(\mathcal{X}\times_1 A)_{i j_2 j_3}=\sum_{j_1=1}^{n_1} a_{i j_1}\,x_{j_1 j_2 j_3}.$$
Similarly, the 2-mode multiplication of $\mathcal{X}$ by a matrix $B\in\mathbb{R}^{m\times n_2}$ is defined by
$$(\mathcal{X}\times_2 B)_{j_1 i j_3}=\sum_{j_2=1}^{n_2} b_{i j_2}\,x_{j_1 j_2 j_3}.$$
The 3-mode multiplication is analogous and is omitted here.
It follows immediately from the definition that $m$-mode and $n$-mode multiplications commute if $m\neq n$:
$$\mathcal{X}\times_m A\times_n B=\mathcal{X}\times_n B\times_m A.$$
If the modes are the same, then
$$\mathcal{X}\times_n A\times_n B=\mathcal{X}\times_n (BA).$$
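These two rules are easy to verify numerically. The following is a small illustrative sketch in Python/NumPy rather than the paper's MATLAB setting; `mode_mult` is our own helper implementing the mode-$n$ product via `tensordot`, not a routine from the paper.

```python
import numpy as np

def mode_mult(X, A, mode):
    # n-mode product: contract the columns of A with axis `mode` of X,
    # then move the newly created axis back into position `mode`.
    return np.moveaxis(np.tensordot(A, X, axes=(1, mode)), 0, mode)

rng = np.random.default_rng(0)
X = rng.standard_normal((2, 3, 4))
A = rng.standard_normal((5, 2))   # acts on mode 1 (first axis)
B = rng.standard_normal((6, 3))   # acts on mode 2 (second axis)

# distinct-mode products commute
commute_ok = np.allclose(mode_mult(mode_mult(X, A, 0), B, 1),
                         mode_mult(mode_mult(X, B, 1), A, 0))

# same-mode products collapse: (X x_n A) x_n B = X x_n (BA)
A2 = rng.standard_normal((2, 2))
B2 = rng.standard_normal((2, 2))
collapse_ok = np.allclose(mode_mult(mode_mult(X, A2, 0), B2, 0),
                          mode_mult(X, B2 @ A2, 0))
```

The helper also illustrates the size change of a mode product: multiplying the $(2,3,4)$ tensor by the $5\times 2$ matrix along mode 1 yields a $(5,3,4)$ tensor.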
2.2. Inner Product and Tensor Norm
The inner product of two tensors $\mathcal{X},\mathcal{Y}\in\mathbb{R}^{n_1\times n_2\times n_3}$ is defined by
$$\langle\mathcal{X},\mathcal{Y}\rangle=\sum_{j_1=1}^{n_1}\sum_{j_2=1}^{n_2}\sum_{j_3=1}^{n_3} x_{j_1 j_2 j_3}\,y_{j_1 j_2 j_3},$$
and the norm induced by this inner product is $\|\mathcal{X}\|=\sqrt{\langle\mathcal{X},\mathcal{X}\rangle}$.
For two tensors $\mathcal{X}$ and $\mathcal{Y}$ of the same size, the computation of the inner product can be simplified as follows:
$$\langle\mathcal{X},\mathcal{Y}\rangle=\mathrm{vec}(\mathcal{X})^T\,\mathrm{vec}(\mathcal{Y}),$$
where $\mathrm{vec}(\cdot)$ here stacks all entries of a tensor into a vector.
Moreover, $n$-mode multiplication commutes with respect to the inner product; that is,
$$\langle\mathcal{X}\times_n A,\mathcal{Y}\rangle=\langle\mathcal{X},\mathcal{Y}\times_n A^T\rangle.$$
Finally, the tensor norm of $n$-mode multiplication is mutually consistent; that is,
$$\|\mathcal{X}\times_n A\|\le\|A\|\,\|\mathcal{X}\|.$$
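The adjoint rule and the norm consistency above can be checked directly on random data. A minimal NumPy sketch (our own illustration, not part of the paper's implementation; `mode_mult` and `inner` are assumed helpers):

```python
import numpy as np

def mode_mult(X, A, mode):
    # n-mode tensor-matrix product
    return np.moveaxis(np.tensordot(A, X, axes=(1, mode)), 0, mode)

def inner(X, Y):
    # entrywise inner product <X, Y> = vec(X)^T vec(Y)
    return float(np.sum(X * Y))

rng = np.random.default_rng(1)
X = rng.standard_normal((3, 4, 5))
Y = rng.standard_normal((6, 4, 5))
A = rng.standard_normal((6, 3))

# adjoint rule: <X x_1 A, Y> = <X, Y x_1 A^T>
adjoint_ok = np.isclose(inner(mode_mult(X, A, 0), Y),
                        inner(X, mode_mult(Y, A.T, 0)))

# consistency of the induced (Frobenius) norm
norm_ok = (np.linalg.norm(mode_mult(X, A, 0))
           <= np.linalg.norm(A) * np.linalg.norm(X))
```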
3. Iterative Algorithms
In this section, we apply the tensor arithmetic concepts and the hierarchical identification principle to solve the Sylvester tensor equation (1).
Firstly, we consider the existence and uniqueness of the solution of (1). By using the Kronecker product, we have
$$\left(I_{n_3}\otimes I_{n_2}\otimes A+I_{n_3}\otimes B\otimes I_{n_1}+C\otimes I_{n_2}\otimes I_{n_1}\right)\mathrm{vec}(\mathcal{X})=\mathrm{vec}(\mathcal{D}),$$
which is equivalent to (1). This vectorized representation immediately leads to the following result.
Lemma 1. Equation (1) has a unique solution if and only if
$$\lambda_i(A)+\lambda_j(B)+\lambda_k(C)\neq 0$$
for any $i=1,\dots,n_1$, $j=1,\dots,n_2$, and $k=1,\dots,n_3$, where $\lambda_i(A)$, $\lambda_j(B)$, and $\lambda_k(C)$ represent the eigenvalues of $A$, $B$, and $C$, respectively.
In the case of the Sylvester matrix equation (2), this corresponds to the well-known condition $\lambda_i(A)+\lambda_j(B)\neq 0$ for any $i$ and $j$.
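For small problems, the vectorized representation and the eigenvalue criterion of Lemma 1 can be checked directly. The sketch below (Python/NumPy, our own illustration; the diagonally dominant test matrices are an assumption chosen so that the eigenvalue sums stay away from zero) builds the Kronecker system matrix, verifies solvability, and solves (1) directly:

```python
import numpy as np

def mode_mult(X, A, mode):
    return np.moveaxis(np.tensordot(A, X, axes=(1, mode)), 0, mode)

n1, n2, n3 = 3, 4, 5
rng = np.random.default_rng(2)
# diagonally dominant coefficients keep the eigenvalue sums away from zero
A = 3 * np.eye(n1) + 0.5 * rng.standard_normal((n1, n1))
B = 3 * np.eye(n2) + 0.5 * rng.standard_normal((n2, n2))
C = 3 * np.eye(n3) + 0.5 * rng.standard_normal((n3, n3))
D = rng.standard_normal((n1, n2, n3))

I1, I2, I3 = np.eye(n1), np.eye(n2), np.eye(n3)
# column-major vec (first index fastest) matches this Kronecker ordering
M = (np.kron(I3, np.kron(I2, A))
     + np.kron(I3, np.kron(B, I1))
     + np.kron(C, np.kron(I2, I1)))

# the three Kronecker terms commute, so the eigenvalues of M are exactly
# the sums lambda_i(A) + lambda_j(B) + lambda_k(C) of Lemma 1
sums = (np.linalg.eigvals(A)[:, None, None]
        + np.linalg.eigvals(B)[None, :, None]
        + np.linalg.eigvals(C)[None, None, :])
unique_solvable = bool(np.all(np.abs(sums) > 1e-12))

x = np.linalg.solve(M, D.flatten(order="F"))
X = x.reshape((n1, n2, n3), order="F")
residual = np.linalg.norm(D - mode_mult(X, A, 0)
                          - mode_mult(X, B, 1) - mode_mult(X, C, 2))
```

This is exactly the direct Kronecker approach criticized in the introduction: the system matrix has $n_1 n_2 n_3$ rows, which is why iterative methods are preferred for larger sizes.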
In order to derive the iterative algorithms for solving (1), we need to introduce the following residual tensors:
$$\mathcal{R}_1=\mathcal{D}-\mathcal{X}\times_2 B-\mathcal{X}\times_3 C,\quad
\mathcal{R}_2=\mathcal{D}-\mathcal{X}\times_1 A-\mathcal{X}\times_3 C,\quad
\mathcal{R}_3=\mathcal{D}-\mathcal{X}\times_1 A-\mathcal{X}\times_2 B.$$
According to the hierarchical identification principle [22], (1) is decomposed into three subsystems:
$$\mathcal{X}\times_1 A=\mathcal{R}_1,\quad \mathcal{X}\times_2 B=\mathcal{R}_2,\quad \mathcal{X}\times_3 C=\mathcal{R}_3.$$
Consider the following least squares problems:
$$\min_{\mathcal{X}} J_1(\mathcal{X})=\tfrac{1}{2}\|\mathcal{X}\times_1 A-\mathcal{R}_1\|^2,\quad
\min_{\mathcal{X}} J_2(\mathcal{X})=\tfrac{1}{2}\|\mathcal{X}\times_2 B-\mathcal{R}_2\|^2,\quad
\min_{\mathcal{X}} J_3(\mathcal{X})=\tfrac{1}{2}\|\mathcal{X}\times_3 C-\mathcal{R}_3\|^2.$$
Note that the derivatives of $J_1$, $J_2$, and $J_3$ with respect to $\mathcal{X}$ are derived as in [22] and are given by
$$\frac{\partial J_1}{\partial\mathcal{X}}=(\mathcal{X}\times_1 A-\mathcal{R}_1)\times_1 A^T,\quad
\frac{\partial J_2}{\partial\mathcal{X}}=(\mathcal{X}\times_2 B-\mathcal{R}_2)\times_2 B^T,\quad
\frac{\partial J_3}{\partial\mathcal{X}}=(\mathcal{X}\times_3 C-\mathcal{R}_3)\times_3 C^T.$$
Minimizing $J_1$, $J_2$, and $J_3$ leads to the following recursive equations:
$$\mathcal{X}_1(k)=\mathcal{X}(k-1)+\mu\,\mathcal{R}(k-1)\times_1 A^T,\quad
\mathcal{X}_2(k)=\mathcal{X}(k-1)+\mu\,\mathcal{R}(k-1)\times_2 B^T,\quad
\mathcal{X}_3(k)=\mathcal{X}(k-1)+\mu\,\mathcal{R}(k-1)\times_3 C^T,$$
$$\mathcal{X}(k)=\frac{1}{3}\left(\mathcal{X}_1(k)+\mathcal{X}_2(k)+\mathcal{X}_3(k)\right),$$
where $\mathcal{R}(k-1)=\mathcal{D}-\mathcal{X}(k-1)\times_1 A-\mathcal{X}(k-1)\times_2 B-\mathcal{X}(k-1)\times_3 C$ is the full residual and $\mu$ is the iterative step length, whose admissible range is given in the following section.
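The gradient based iteration can be sketched compactly: each sweep forms the full residual once and averages the three per-mode gradient corrections. The code below is our own Python/NumPy reading of the scheme (not the authors' MATLAB code); the averaging, the resulting $\mu/3$ factor, and the conservative choice of step length are assumptions of this sketch.

```python
import numpy as np

def mode_mult(X, A, mode):
    # n-mode tensor-matrix product
    return np.moveaxis(np.tensordot(A, X, axes=(1, mode)), 0, mode)

def gi_solve(A, B, C, D, mu, iters):
    # Gradient-based iteration (GI): form the full residual once per sweep
    # and average the three gradient corrections, one per mode.
    X = np.zeros_like(D)
    for _ in range(iters):
        R = (D - mode_mult(X, A, 0)
               - mode_mult(X, B, 1)
               - mode_mult(X, C, 2))
        X = X + (mu / 3.0) * (mode_mult(R, A.T, 0)
                              + mode_mult(R, B.T, 1)
                              + mode_mult(R, C.T, 2))
    return X

rng = np.random.default_rng(3)
n = 4
A = 4 * np.eye(n) + 0.5 * rng.standard_normal((n, n))
B = 4 * np.eye(n) + 0.5 * rng.standard_normal((n, n))
C = 4 * np.eye(n) + 0.5 * rng.standard_normal((n, n))
X_true = rng.standard_normal((n, n, n))
D = (mode_mult(X_true, A, 0) + mode_mult(X_true, B, 1)
     + mode_mult(X_true, C, 2))

# conservative step length using spectral norms; by Cauchy-Schwarz this
# keeps the vec-form gradient step inside its stability interval
mu = 1.0 / (np.linalg.norm(A, 2) ** 2 + np.linalg.norm(B, 2) ** 2
            + np.linalg.norm(C, 2) ** 2)
err = (np.linalg.norm(gi_solve(A, B, C, D, mu, 300) - X_true)
       / np.linalg.norm(X_true))
```

In vec form this is plain gradient descent on $\tfrac12\|\mathrm{vec}(\mathcal{D})-M\,\mathrm{vec}(\mathcal{X})\|^2$, which is why a sufficiently small step length guarantees convergence from any initial value.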
Remark 2. Note that when we compute $\mathcal{X}_2(k)$ in Algorithm 1, the sub-iterate $\mathcal{X}_1(k)$ has already been computed. Similarly, when we compute $\mathcal{X}_3(k)$, both $\mathcal{X}_1(k)$ and $\mathcal{X}_2(k)$ have been computed. Hence, we can fully take advantage of this information to update the iterate and obtain a more efficient algorithm, as in Algorithm 2.
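The modification described in the remark amounts to a Gauss-Seidel-like sweep: the residual is refreshed after each mode's correction, so later modes already see the updated iterate. A self-contained Python/NumPy sketch of this modified iteration (our own reading of the scheme, with an assumed conservative step length, not the authors' code):

```python
import numpy as np

def mode_mult(X, A, mode):
    return np.moveaxis(np.tensordot(A, X, axes=(1, mode)), 0, mode)

def mgi_solve(A, B, C, D, mu, iters):
    # Modified iteration (MGI): refresh the residual after each mode's
    # correction, so the corrections for later modes use the iterate that
    # earlier modes have already improved.
    X = np.zeros_like(D)
    for _ in range(iters):
        for F, mode in ((A, 0), (B, 1), (C, 2)):
            R = (D - mode_mult(X, A, 0)
                   - mode_mult(X, B, 1)
                   - mode_mult(X, C, 2))
            X = X + (mu / 3.0) * mode_mult(R, F.T, mode)
    return X

rng = np.random.default_rng(4)
n = 4
A = 4 * np.eye(n) + 0.5 * rng.standard_normal((n, n))
B = 4 * np.eye(n) + 0.5 * rng.standard_normal((n, n))
C = 4 * np.eye(n) + 0.5 * rng.standard_normal((n, n))
X_true = rng.standard_normal((n, n, n))
D = (mode_mult(X_true, A, 0) + mode_mult(X_true, B, 1)
     + mode_mult(X_true, C, 2))
mu = 1.0 / (np.linalg.norm(A, 2) ** 2 + np.linalg.norm(B, 2) ** 2
            + np.linalg.norm(C, 2) ** 2)
err = (np.linalg.norm(mgi_solve(A, B, C, D, mu, 200) - X_true)
       / np.linalg.norm(X_true))
```

The trade-off is the same as between Jacobi and Gauss-Seidel: each sweep now evaluates the residual three times, but the fresher residual typically buys more progress per sweep.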
4. Convergence Analysis
In this section, we discuss convergence properties of the algorithms in the previous section. The following theorem provides a sufficient condition to guarantee the convergence of Algorithm 1.

Theorem 3. Suppose that (1) has a unique solution $\mathcal{X}^*$ and that the step length $\mu$ satisfies the condition (19). Then the iterative solution $\mathcal{X}(k)$ given by Algorithm 1 converges to $\mathcal{X}^*$ for any initial value $\mathcal{X}(0)$.
Proof. Define the error tensor $\tilde{\mathcal{X}}(k)=\mathcal{X}(k)-\mathcal{X}^*$. Using (1) and Algorithm 1, the equalities (21) follow directly, where $\mathcal{R}(k)$ denotes the residual at step $k$. Taking norms on both sides of (21) and using (9)–(11), we obtain (22). Combining (19) and (22) gives (23), that is, (24). Since $\mu$ satisfies (19), we have (25). This implies that, as $k\to\infty$,
$$\left\|\tilde{\mathcal{X}}(k)\times_1 A+\tilde{\mathcal{X}}(k)\times_2 B+\tilde{\mathcal{X}}(k)\times_3 C\right\|\to 0.$$
According to Lemma 1, this forces $\tilde{\mathcal{X}}(k)\to 0$ as $k\to\infty$. Thus, the theorem holds.
Based on Theorem 3, we have the following theorem.

Theorem 4. Under the conditions of Theorem 3, the iterative solution given by Algorithm 2 also converges to the exact solution $\mathcal{X}^*$ for any initial value.
Proof. The proof is similar to that of Theorem 3 and is hence omitted.
Now we consider the convergence rate of Algorithm 1. From (19) and (21), it is not difficult to derive the following residual error equation:
$$\mathrm{vec}(\mathcal{R}(k))=T\,\mathrm{vec}(\mathcal{R}(k-1)),\qquad
T=I-\frac{\mu}{3}MM^T,$$
where $M=I_{n_3}\otimes I_{n_2}\otimes A+I_{n_3}\otimes B\otimes I_{n_1}+C\otimes I_{n_2}\otimes I_{n_1}$. We observe that the closer the eigenvalues of $\frac{\mu}{3}MM^T$ are to $1$, the closer the eigenvalues of $T$ are to zero, and hence the faster the residual error converges to zero. So the convergence rate of Algorithm 1 depends on the condition number of the associated system matrix $M$; that is, Algorithm 1 has a fast convergence rate for a small condition number of $M$. The analysis of Algorithm 2 is similar and is hence omitted here.
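The link between conditioning and the contraction factor can be made concrete on a deterministic toy problem. The sketch below (Python/NumPy, our own illustration; the error propagator $T = I - (\mu/3)M^TM$ and the step-length choice follow the vec-form reading of the iteration sketched earlier, which is an assumption of this example) compares a well-conditioned and a poorly conditioned diagonal case:

```python
import numpy as np

def build_M(A, B, C):
    # vec-form system matrix of the Sylvester tensor equation
    n1, n2, n3 = A.shape[0], B.shape[0], C.shape[0]
    I1, I2, I3 = np.eye(n1), np.eye(n2), np.eye(n3)
    return (np.kron(I3, np.kron(I2, A)) + np.kron(I3, np.kron(B, I1))
            + np.kron(C, np.kron(I2, I1)))

def gi_rate(diag_entries):
    # condition number of M and spectral radius of the GI error propagator
    # T = I - (mu/3) M^T M, with a conservative spectral-norm step length
    A = B = C = np.diag(np.asarray(diag_entries, dtype=float))
    M = build_M(A, B, C)
    mu = 1.0 / (3 * np.linalg.norm(A, 2) ** 2)
    T = np.eye(M.shape[0]) - (mu / 3.0) * M.T @ M
    return np.linalg.cond(M), np.max(np.abs(np.linalg.eigvalsh(T)))

cond_easy, rho_easy = gi_rate([3.0, 3.5, 4.0])   # clustered spectrum
cond_hard, rho_hard = gi_rate([1.0, 4.0, 10.0])  # spread spectrum
```

For the clustered spectrum the contraction factor stays well below one, while the spread spectrum pushes the spectral radius of $T$ close to one, matching the qualitative statement above.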
5. Numerical Experiments
This section gives some examples to illustrate the performance of the proposed algorithms. All the implementations are based on the codes from the MATLAB Tensor Toolbox developed by Bader and Kolda [31, 32].
Example 5. The coefficient matrices $A$, $B$, $C$ and the right-hand side $\mathcal{D}$ are chosen so that the exact solution $\mathcal{X}^*$ of (1) is known. With the chosen step lengths and initial value, we apply Algorithms 1 and 2 to compute approximate solutions of (1). Since the exact solution is known in this example, we plot the relative errors $\|\mathcal{X}(k)-\mathcal{X}^*\|/\|\mathcal{X}^*\|$ to indicate the convergence of the sequence $\{\mathcal{X}(k)\}$. These values are plotted on a logarithmic scale; if the curve is approximately a decreasing straight line, this indicates that the convergence of the sequence is at least linear.
From Figure 1, we can see that both algorithms converge linearly and that the MGI algorithm converges much faster than the GI algorithm.
Remark 6. The effect of changing the step lengths $\mu$ and $\kappa$ is illustrated in Figure 2. We observe that the larger the step length is, the faster the convergence of the algorithm. However, if the step length is too large, the algorithm may diverge. How to choose the best step length is still an open issue.
Example 7. This example comes from [13], except for $C$ and $\mathcal{D}$, which are set to be a random matrix and a random tensor, respectively. The MATLAB codes which generate these matrices and compute the step lengths are as follows (with problem sizes m, n, and l):

    rand('state',0); alpha = 3;
    A = triu(rand(m,m),1) + diag(alpha + diag(rand(m,m)));
    B = triu(rand(n,n),1) + diag(alpha + diag(rand(n,n)));
    C = triu(rand(l,l),1) + diag(alpha + diag(rand(l,l)));
    D = tenrand([m,n,l]);
    mu = 2.2/(norm(A)^2 + norm(B)^2 + norm(C)^2);                % step length for GI
    kappa = 1.2*min([1/norm(A)^2, 1/norm(B)^2, 1/norm(C)^2]);    % step length for MGI

Since the exact solution in this example is unknown, we plot the relative residual errors $\|\mathcal{R}(k)\|/\|\mathcal{D}\|$.
From Figure 3, we can see that the convergence of the GI algorithm tends to slow down after some iterative steps. The MGI algorithm converges linearly and much faster than the GI method.
Example 8. In this example, we consider the effect of the condition number of the coefficient matrices on the two algorithms. The stopping criterion is a fixed tolerance on the relative residual $\|\mathcal{R}(k)\|/\|\mathcal{D}\|$. The MATLAB codes which generate these matrices are as follows (with problem sizes m, n, and l):

    rand('state',0);
    A = rand(m,m) + diag(ones(m,1))*alpha;
    B = rand(n,n) + diag(ones(n,1))*alpha;
    C = rand(l,l) + diag(ones(l,1))*alpha;
    D = tenrand([m,n,l]);
For different values of $\alpha$, the number of iterations and the corresponding condition numbers of the coefficient matrices are outlined in Table 1.
We have presented gradient based iterative algorithms for solving the Sylvester tensor equation, which includes the Sylvester matrix equation as a special case. They are based on the hierarchical identification principle and tensor arithmetic concepts. The convergence analysis indicates that the iterative solutions given by the proposed algorithms converge to the exact solution for any initial values. This statement is also confirmed by the numerical examples. In addition, the idea of this paper can easily be extended to the study of iterative solutions of the higher order Sylvester tensor equation, in which $\mathcal{X}$ is an $N$-mode tensor.
This work is supported by the National Natural Science Foundation of China (Grant no. 11201092, no. 11261012) and the Scientific Research Starting Foundation for Doctors of Guizhou Normal University, China.
- G. H. Golub, S. Nash, and C. Van Loan, “A Hessenberg-Schur method for the problem AX+XB=C,” IEEE Transactions on Automatic Control, vol. 24, no. 6, pp. 909–913, 1979.
- U. Baur and P. Benner, “Cross-Gramian based model reduction for data-sparse systems,” Electronic Transactions on Numerical Analysis, vol. 31, pp. 256–270, 2008.
- D. Calvetti and L. Reichel, “Application of ADI iterative methods to the restoration of noisy images,” SIAM Journal on Matrix Analysis and Applications, vol. 17, no. 1, pp. 165–186, 1996.
- J. Li, F. Ding, and G. Yang, “Maximum likelihood least squares identification method for input nonlinear finite impulse response moving average systems,” Mathematical and Computer Modelling, vol. 55, no. 3-4, pp. 442–450, 2012.
- F. Ding, “Hierarchical multi-innovation stochastic gradient algorithm for Hammerstein nonlinear system modeling,” Applied Mathematical Modelling, vol. 37, no. 4, pp. 1694–1704, 2013.
- W. Wang, F. Ding, and J. Dai, “Maximum likelihood least squares identification for systems with autoregressive moving average noise,” Applied Mathematical Modelling, vol. 36, no. 5, pp. 1842–1853, 2012.
- F. Ding, Y. Liu, and B. Bao, “Gradient based and least squares based iterative estimation algorithms for multi-input multi-output systems,” Proceedings of the Institution of Mechanical Engineers I, vol. 226, no. 1, pp. 43–55, 2012.
- G. H. Golub and C. F. Van Loan, Matrix Computations, Johns Hopkins University Press, Baltimore, Md, USA, 3rd edition, 1996.
- J. A. Heinen, “Technique for solving the extended discrete Lyapunov matrix equation,” IEEE Transactions on Automatic Control, vol. 17, no. 1, pp. 156–157, 1972.
- D. C. Sorensen and Y. Zhou, “Direct methods for matrix Sylvester and Lyapunov equations,” Journal of Applied Mathematics, no. 6, pp. 277–303, 2003.
- H. S. Najafi and A. H. Refahi Sheikhani, “Refinement methods for state estimation via Sylvester-observer equation,” Advances in Numerical Analysis, vol. 2011, Article ID 184314, 15 pages, 2011.
- H. Li and R. Li, “A note on the inversion of Sylvester matrices in control systems,” Mathematical Problems in Engineering, vol. 2011, Article ID 609863, 10 pages, 2011.
- B.-W. Li, S. Tian, Y.-S. Sun, and Z.-M. Hu, “Schur-decomposition for 3D matrix equations and its application in solving radiative discrete ordinates equations discretized by Chebyshev collocation spectral method,” Journal of Computational Physics, vol. 229, no. 4, pp. 1198–1212, 2010.
- G.-S. Wang, Q. Lv, and G.-R. Duan, “On the parametric solution to the second-order Sylvester matrix equation MVF2+DVF+KV=BW,” Mathematical Problems in Engineering, vol. 2007, Article ID 21078, 16 pages, 2007.
- J. Zhou, R. Wang, and Q. Niu, “A preconditioned iteration method for solving Sylvester equations,” Journal of Applied Mathematics, vol. 2012, Article ID 401059, 12 pages, 2012.
- L. Xie, Y. Liu, and H. Yang, "Gradient based and least squares based iterative algorithms for matrix equations AXB + CX^TD = F," Applied Mathematics and Computation, vol. 217, no. 5, pp. 2191–2199, 2010.
- J. Ding, Y. Liu, and F. Ding, "Iterative solutions to matrix equations of the form A_iXB_i = F_i," Computers & Mathematics with Applications, vol. 59, no. 11, pp. 3500–3507, 2010.
- P. Benner, R.-C. Li, and N. Truhar, “On the ADI method for Sylvester equations,” Journal of Computational and Applied Mathematics, vol. 233, no. 4, pp. 1035–1045, 2009.
- L. Grasedyck and W. Hackbusch, “A multigrid method to solve large scale Sylvester equations,” SIAM Journal on Matrix Analysis and Applications, vol. 29, no. 3, pp. 870–894, 2007.
- D. Y. Hu and L. Reichel, “Krylov-subspace methods for the Sylvester equation,” Linear Algebra and its Applications, vol. 172, pp. 283–313, 1992.
- J.-R. Li and J. White, “Low-rank solution of Lyapunov equations,” SIAM Review, vol. 46, no. 4, pp. 693–713, 2004.
- F. Ding and T. Chen, “Gradient based iterative algorithms for solving a class of matrix equations,” IEEE Transactions on Automatic Control, vol. 50, no. 8, pp. 1216–1221, 2005.
- F. Ding and T. Chen, “Iterative least-squares solutions of coupled Sylvester matrix equations,” Systems & Control Letters, vol. 54, no. 2, pp. 95–107, 2005.
- F. Ding and T. Chen, “On iterative solutions of general coupled matrix equations,” SIAM Journal on Control and Optimization, vol. 44, no. 6, pp. 2269–2284, 2006.
- L. Xie, J. Ding, and F. Ding, “Gradient based iterative solutions for general linear matrix equations,” Computers & Mathematics with Applications, vol. 58, no. 7, pp. 1441–1448, 2009.
- Q. Niu, X. Wang, and L.-Z. Lu, "A relaxed gradient based algorithm for solving Sylvester equations," Asian Journal of Control, vol. 13, no. 3, pp. 461–464, 2011.
- F. Ding, P. X. Liu, and J. Ding, “Iterative solutions of the generalized Sylvester matrix equations by using the hierarchical identification principle,” Applied Mathematics and Computation, vol. 197, no. 1, pp. 41–50, 2008.
- L. Eldén, Matrix Methods in Data Mining and Pattern Recognition, Society for Industrial and Applied Mathematics (SIAM), Philadelphia, Pa, USA, 2007.
- T. G. Kolda and B. W. Bader, “Tensor decompositions and applications,” SIAM Review, vol. 51, no. 3, pp. 455–500, 2009.
- T. G. Kolda, “Multilinear Operators for Higher-Order Decompositions,” Tech. Rep. SAND2006-2081, Sandia National Laboratories, Livermore, Calif, USA, 2006.
- B. W. Bader and T. G. Kolda, “Efficient MATLAB computations with sparse and factored tensors,” SIAM Journal on Scientific Computing, vol. 30, no. 1, pp. 205–231, 2007/08.
- B. W. Bader and T. G. Kolda, “MATLAB Tensor Toolbox, Version 2.4,” 2010, http://csmr.ca.sandia.gov/~tgkolda/TensorToolbox/.
Copyright © 2013 Zhen Chen and Linzhang Lu. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.