This paper is concerned with the numerical solution of the Sylvester tensor equation, which includes the Sylvester matrix equation as special case. By applying hierarchical identification principle proposed by Ding and Chen, 2005, and by using tensor arithmetic concepts, an iterative algorithm and its modification are established to solve the Sylvester tensor equation. Convergence analysis indicates that the iterative solutions always converge to the exact solution for arbitrary initial value. Finally, some examples are provided to show that the proposed algorithms are effective.

1. Introduction

This paper is concerned with the solution of the following equation: where matrices , , and , and tensor . The properties of the operators , , and are described in detail in Section 2. When is a 2-mode tensor, that is, a matrix, (1) reduces to It is just a Sylvester matrix equation, which has received much attention, often motivated by applications in control theory, model reduction, signal processing, filtering, and system identification [17]. So we call (1) a Sylvester tensor equation in this paper.

The simplest direct approach for solving the matrix equation (2) is Kronecker product method [8] in which the huge memory is needed, for example, if , the dimension of the new coefficient matrix is 10000. Alternative direct method is to firstly transform coefficient matrices into special forms, such as Schur canonical form [9], Hessenberg-Schur form [1], and [1012], and then to solve another matrix equation which may be readily computed. Such kinds of methods have been widely adopted. For example, 3D Schur decomposition was developed to solve the Sylvester tensor equation (1) (for more details see [13]). However, these methods require the transformations of coefficient matrices, which need considerable cost.

Iterative algorithms for solving matrix equations have been studied extensively, for example, [1421]. For example, Ding and Chen proposed a few simple iterative methods for matrix equation in [2224]. The methods, resembling the classical Jacobi and Gaussian iterations for linear systems, are easy to implement and cost little per step. Gradient based iterative algorithms and least squares based iterative algorithms have also been proposed in [2527] for (coupled) matrix equations, which include the Sylvester matrix equations and Lyapunov matrix equations as a special form. However, to the best knowledge of the authors, there is hardly any iterative method for the Sylvester tensor equation in references up to now.

In this paper, we first extend the gradient based iterative algorithm for Sylvester matrix equation (2) to solve the Sylvester matrix equation (1) based on previous work in [22]. The basic idea of such an algorithm is to regard the unknown tensor as the system parameter tensor by applying hierarchical identification principle. Then, we present a more efficient modified iterative method. The convergence properties of the two algorithms are proved. Numerical examples are given to show that the two iterations are effective.

The rest of this paper is organized as follows. In Section 2, we describe tensor notations and common operations used throughout the paper. An extension and modification of the gradient based iterative algorithms to the Sylvester tensor equation (1) are presented in Section 3. The convergence of the algorithms are analyzed in Section 4. Section 5 contains some numerical experiments to illustrate the theoretical results in this paper. Finally, some conclusions are outlined in Section 6.

2. Tensor Notations and Basic Operations

In this section, we briefly review some concepts and notations that are used throughout the paper. Matrices are denoted by capital letter, for example, , , and . Tensors are denoted by Euler script letter, for example, , and . The identity matrix is denoted by . The order of a tensor is the number of dimensions, also known as ways or modes. As a special case, a vector is 1-mode tensor and a matrix is 2-mode tensor. The operator stacks the columns of a matrix to form a vector. denotes the Frobenius norm unless otherwise stated.

2.1. Tensor-Matrix Multiplication

An important operation for a tensor is the tensor-matrix multiplication [2830]. The 1-mode product of a tensor by a matrix , denoted by , is a tensor in which the entries are given by Similarly, 2-mode multiplication of a tensor by a matrix 3-mode multiplication is analogous and is omitted here.

It follows immediately from the definition that -mode and -mode multiplication commutes if : If the modes are the same, then

2.2. Inner Product and Tensor Norm

The inner product of two tensors is defined by and the norm induced by this inner product is

For two tensors and , the computation of can be simplified as follows:

Moreover, -mode multiplication commutes with respect to the inner product, that is,

Finally, the tensor norm of -mode multiplication is mutually consistent, that is,

3. Iterative Algorithms

In this section, we apply the tensor arithmetic concepts and the hierarchical identification principle to solve the Sylvester tensor equation (1).

Firstly, we consider the existence and uniqueness of the solution of (1). By using the Kronecker product, we have which is equivalent to (1), This vectorized representation immediately leads to the following result.

Lemma 1. Equation (1) has a unique solution if and only if   for any , , and , where , and represent the eigenvalues of , and , respectively.

In the case of the Sylvester matrix equation (2), this corresponds to the well-known condition , for any and .

In order to drive the iterative algorithms for solving (1), we need to introduce the following residual tensors: According to the hierarchical identification principle [22], (1) is decomposed into three subsystems: Consider the following least squares problem: Note that the derivatives of with respect to are derived in [30] and are given by Minimizing leads to the following recursive equations: where is iterative step length and is given in the following section.

Substituting (13) into (17) and replacing the unknown variable in (13) with their estimate at time , we have where    . Taking the average of , , and , we obtain Algorithm 1.

Input: , and .
Output: .
Initialize .
 Compute ,

Remark 2. Note that when we compute in Algorithm 1, the has been computed. Similarly, when we compute , the and have been computed. Hence, we can fully take advantage of the information of to update the and get a more efficient algorithm as in Algorithm 2.

Input: , and .
Output: .
Initialize .

4. Convergence Analysis

In this section, we will discuss convergence properties of the algorithms in the previous section. The following theorem provides a sufficient condition to guarantee convergence of the Algorithm 1.

Theorem 3. If the Sylvester tensor equation (1) has a unique solution and , then the iterative solution given by the Algorithm 1 converges to for arbitrary initial value .

Proof. Define the error tensors Using (1) and Algorithm 1, the following equalities are trivial: where . Taking norm on both sides of (21), and using (9)–(11), we have According to (19) and (22), we have that is, Since , we have This implies that as , , that is, According to Lemma 1, as . Thus, the theorem holds.

Based on Theorem 3, we have the following theorem.

Theorem 4. If the Sylvester tensor equation (1) has a unique solution and , then the iterative solution given by the Algorithm 2 converges to for any initial value .

Proof. The proof is similar to Theorem 3 and hence omitted.

Now we consider the convergence rate of the Algorithm 1. From (19) and (21), it is not difficult to get the following residual error equation: where . We observe that the closer the eigenvalues of are to , the closer the eigenvalues of tend to zero, and hence, the faster the residual error converges to zero. So the convergence rate of Algorithm 1 depends on the condition number of the associated system [22]; that is, Algorithm 1 has a fast convergence rate for a small condition number of . The analysis of Algorithm 2 is similar; hence, it is omitted here.

5. Numerical Experiments

This section gives some examples to illustrate the performance of the proposed algorithms. All the implementations are based on the codes from the MATLAB Tensor Toolbox developed by Bader and Kolda [31, 32].

Example 5. Let Then, the solution of is Taking , , , we apply Algorithms 1 and 2 to compute approximate solution in (1). Since the exact solution is known in this example, we plot the values to indicate the convergence of the sequence . These values are plotted on logarithmic scales, if the curve is approximate decreasing line, then it is indicated that the convergence of the sequence is at least linear.

From Figure 1, we can see that two algorithms are convergent linearly and MGI algorithm converges much faster than GI algorithm.

Remark 6. The effect of changing the step length and is illustrated in Figure 2. We observe that the larger the step length is, the faster the convergence of the algorithm. However, if the step length is too large, the algorithm may diverge. How to choose a best step length is still a issue to be studied [22].

Example 7. This example comes from [22], except for and which will be set to a random matrix (and tensor). The MATLAB codes which generate these matrices and compute the step lengths are as follow:   ; ; ; rand(state,0); alpha = 3;   = triu(rand ,1) + diag(alpha + diag(rand( )));   = triu(rand ,1) + diag(alpha + diag (rand( )));   = triu(rand(l,l),1) + diag(alpha + diag (rand ));   = tenrand([ ]);  mu = 2.2/(norm( ) 2 + norm( ) 2 + norm ( ) 2); Compute   kappa = 1.2 * min([1/norm( ) /norm ( ) /norm( ) 2]); Compute .Since the exact solution in this example is unknown, we plot the relative residual errors

From Figure 3, we can see that the convergence of GI algorithm tends to slow down after some iterative steps. The MGI algorithm converges linearly, and much faster than GI method.

Example 8. In this example, we consider the effect of the condition number of matrix on two algorithms. The stoping criterion is set to be . The MATLAB codes which generate these matrices are as follows:   ; ; ; rand(state,0);   = rand + diag(ones ) * alpha;   = rand + diag(ones ) * alpha;   = rand + diag(ones ) * alpha;   = tenrand([ ]).

For different , the number of iterations and the corresponding condition number ( ) of the matrix are outlined in Table 1.

The results of Table 1 affirm the analysis at the end of Section 4. In particular, we see that the convergence rate becomes faster as the condition number of decreases.

6. Conclusions

We have presented the gradient based iterative algorithms for solving Sylvester tensor equation, Sylvester matrix equation as a special case included. They are based on the hierarchical identification principle and tensor arithmetic concepts. The convergence analysis indicates that the iterative solutions given by the proposed algorithms converge to exact solution for any initial values. Such statement is also confirmed by the given numerical examples. In addition, the idea of this paper can be easily extended to study iterative solutions of high order ( is -mode tensor) Sylvester tensor equation.


This work is supported by the National Natural Science Foundation of China (Grant no. 11201092, no. 11261012) and the Scientific Research Starting Foundation for Doctors of Guizhou Normal University, China.