Abstract

We propose a randomized sampling Kaczmarz algorithm for the solution of very large systems of linear equations by introducing a maximal sampling probability control criterion, which aims to grasp the largest entry of the absolute sampling residual vector at each iteration. This new method differs from the greedy randomized Kaczmarz algorithm in that it need not compute the residual vector of the whole linear system to determine the working rows. Numerical experiments show that the proposed algorithm is most effective when the number of selected rows, i.e., the sample size, is equal to the logarithm of the total number of rows. Finally, we extend the randomized sampling Kaczmarz to signal reconstruction problems in compressed sensing. Signal experiments show that the new extended algorithm is more effective than the randomized sparse Kaczmarz method for online compressed sensing.

1. Introduction

In this paper, we mainly consider the iterative solution of large-scale consistent linear systems of the form
\[ Ax = b, \tag{1} \]
where \(A \in \mathbb{C}^{m \times n}\), \(b \in \mathbb{C}^{m}\), and \(x\) denotes the n-dimensional unknown vector. The Kaczmarz algorithm [1] is a typical row-action method [2–5] for solving such a linear system (1) and is known as the algebraic reconstruction technique in the field of computerized tomography [6, 7] and image reconstruction [5, 8–10]; see also [11–13] for additional references. More specifically, the (t + 1)th iterate \(x_{t+1}\) is generated by the following Kaczmarz update:
\[ x_{t+1} = x_t + \frac{b_{i_t} - A_{i_t} x_t}{\|A_{i_t}\|_2^2} (A_{i_t})^{*}, \tag{2} \]
where \(A_{i_t}\) denotes the \(i_t\)th row of the matrix A, \((\cdot)^{*}\) denotes the conjugate transpose of the corresponding vector or matrix, \(b_{i_t}\) denotes the \(i_t\)th entry of the vector b, and \(i_t = (t \bmod m) + 1\). Thus, the Kaczmarz algorithm is very suitable for the solution of very large linear systems since the main computation at each step is an inner product; however, the algorithm sometimes converges very slowly; see [14, 15] and the references therein. In order to improve its convergence, in 2009, Strohmer and Vershynin [16] proposed the randomized Kaczmarz (RK) algorithm with an expected exponential rate of convergence by using the rows of the coefficient matrix A in a random manner rather than in a given order, i.e., RK chooses the ith row with probability
\[ \Pr(\text{row} = i) = \frac{\|A_i\|_2^2}{\|A\|_F^2}. \tag{3} \]

Recently, Bai and Wu proposed a different but more effective probability criterion and, based on it, constructed the greedy randomized Kaczmarz (GRK) algorithm for solving the linear system (1) in [17]. This probability criterion makes the corresponding GRK algorithm converge significantly faster than the RK method in both theory and experiments. We refer to [15, 18, 19] for more details on the convergence theory and algorithmic generalizations of the GRK algorithm.

We observe that the most expensive component of the tth iterate of the GRK algorithm is computing the squared norm of the residual vector of the whole linear system (i.e., \(\|b - A x_t\|_2^2\)), and it is the main factor that affects the computing time. Consequently, in this paper, we first randomly select k row indices, say, \(V_t = \{i_1, i_2, \ldots, i_k\}\), of the coefficient matrix of the linear system (1), then compute the largest absolute relative residual over the sample, i.e., \(\max_{i \in V_t} |b_i - A_i x_t| / \|A_i\|_2\), and thereby construct a randomized sampling Kaczmarz (RSK) algorithm. Therefore, our method need not calculate the residuals of all rows as the GRK method does and can be expected to be more effective than GRK in terms of computing time. In theory, we prove linear convergence in expectation to the unique least-norm solution of the consistent linear system (1) for our RSK algorithm. In computations, we show that the RSK algorithm significantly outperforms the RK algorithm in terms of both iteration counts and computing times and costs much less CPU time than the GRK algorithm, although it requires slightly more iteration steps than the GRK algorithm. In addition, we extend the RSK algorithm to signal reconstruction problems in compressed sensing [20, 21], establish the corresponding randomized sampling sparse Kaczmarz (RaSSK) algorithm, and show that RaSSK performs much better than the randomized sparse Kaczmarz (RaSK) algorithm, which is a special case of the Bregman iterative algorithm [22], for online compressed sensing.

The remainder of the paper is organized as follows. In Section 2, we review the RK and GRK algorithms. In Section 3, we present the RSK algorithm and establish its convergence theory. Numerical experiments and comparisons are shown in Section 4. In Section 5, we extend the RSK algorithm to signal reconstruction in compressed sensing and report the corresponding signal experiments. Finally, we end the paper with a few remarks and conclusions in Section 6.

2. The RK and GRK Algorithms

In this section, we briefly introduce the RK and GRK algorithms for solving systems of linear equations. First, let us give some necessary explanations. In this paper, we use \(\mathbb{E}_t[\cdot]\) to denote the expected value conditional on the first t iterations, that is,
\[ \mathbb{E}_t[\cdot] = \mathbb{E}[\cdot \mid i_1, i_2, \ldots, i_t], \]
where \(i_l\) (l = 1, 2, …, t) is the index of the row chosen at the lth iterate. Then, by the law of total expectation, it is easy to get that \(\mathbb{E}[\mathbb{E}_t[\cdot]] = \mathbb{E}[\cdot]\) (see, e.g., the work of Bai and Wu [17]).

When the linear system (1) is consistent and its coefficient matrix has full column rank, Strohmer and Vershynin [16] proposed the randomized Kaczmarz algorithm, as given in Algorithm 1.

Require: A, b, l, and x0.
Ensure: xl.
(1)for t = 0, 1, …, l do
(2)Select \(i_t \in \{1, 2, \ldots, m\}\) with probability \(\Pr(\text{row} = i_t) = \|A_{i_t}\|_2^2 / \|A\|_F^2\)
(3)Set \(x_{t+1} = x_t + \frac{b_{i_t} - A_{i_t} x_t}{\|A_{i_t}\|_2^2} (A_{i_t})^{*}\)
(4)end for
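
For illustration, the following is a minimal NumPy sketch of Algorithm 1 for real-valued data; the function name rk, the fixed iteration budget, and the random-number-generator handling are implementation choices, not part of the original algorithm statement.

```python
import numpy as np

def rk(A, b, l, x0=None, rng=None):
    """Randomized Kaczmarz (Algorithm 1), real-valued sketch: pick row i with
    probability ||A_i||_2^2 / ||A||_F^2 and project onto the hyperplane A_i x = b_i."""
    rng = np.random.default_rng() if rng is None else rng
    m, n = A.shape
    x = np.zeros(n) if x0 is None else np.array(x0, dtype=float)
    row_norms_sq = (A ** 2).sum(axis=1)          # ||A_i||_2^2
    probs = row_norms_sq / row_norms_sq.sum()    # probability criterion (3)
    for _ in range(l):
        i = rng.choice(m, p=probs)
        x += (b[i] - A[i] @ x) / row_norms_sq[i] * A[i]   # Kaczmarz update (2)
    return x
```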

The RK method converges to the unique least-norm solution of the consistent linear system (1) when the coefficient matrix A has full column rank with m ≥ n [16]; later, in [23], Ma, Needell, and Ramdas established the same convergence rate for the RK algorithm in the case m < n. Compared with the Kaczmarz method, we can see from probability criterion (3) that the RK method selects the working row with probability proportional to the squared norm of that row. This improves the convergence of the Kaczmarz method discussed in [1].

In [17], Bai and Wu proposed a new probability criterion (i.e., Steps 2–5 of Algorithm 2) for the RK algorithm, and based on it, they constructed a greedy randomized Kaczmarz algorithm, as given in Algorithm 2, which converges faster than the RK algorithm.

Require: A, b, l, and x0.
Ensure: xl.
(1)for t = 0, 1, …, l do
(2)Compute \(\varepsilon_t = \frac{1}{2} \left( \frac{1}{\|b - A x_t\|_2^2} \max_{1 \le i \le m} \frac{|b_i - A_i x_t|^2}{\|A_i\|_2^2} + \frac{1}{\|A\|_F^2} \right)\)
(3)Determine the index set of positive integers \(U_t = \{\, i : |b_i - A_i x_t|^2 \ge \varepsilon_t \|b - A x_t\|_2^2 \|A_i\|_2^2 \,\}\)
(4)Compute the ith entry \(\tilde{r}_t^{(i)}\) of the vector \(\tilde{r}_t\) according to \(\tilde{r}_t^{(i)} = b_i - A_i x_t\) if \(i \in U_t\) and \(\tilde{r}_t^{(i)} = 0\) otherwise
(5)Select it ∈ Ut with probability \(\Pr(\text{row} = i_t) = |\tilde{r}_t^{(i_t)}|^2 / \|\tilde{r}_t\|_2^2\)
(6)Set \(x_{t+1} = x_t + \frac{b_{i_t} - A_{i_t} x_t}{\|A_{i_t}\|_2^2} (A_{i_t})^{*}\)
(7)end for
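
As a point of comparison, here is a NumPy sketch of Algorithm 2 following Steps 2–6 above (real-valued case); the function name and generator handling are implementation choices. Note that the full residual b − Ax_t is recomputed at every iteration, which is exactly the cost the RSK method of Section 3 avoids.

```python
import numpy as np

def grk(A, b, l, x0=None, rng=None):
    """Greedy randomized Kaczmarz (Algorithm 2), real-valued sketch."""
    rng = np.random.default_rng() if rng is None else rng
    m, n = A.shape
    x = np.zeros(n) if x0 is None else np.array(x0, dtype=float)
    row_norms_sq = (A ** 2).sum(axis=1)
    fro_sq = row_norms_sq.sum()                          # ||A||_F^2
    for _ in range(l):
        r = b - A @ x                                    # full residual (the costly step)
        r_norm_sq = r @ r
        if r_norm_sq == 0.0:
            break
        rel = r ** 2 / row_norms_sq                      # |b_i - A_i x|^2 / ||A_i||_2^2
        eps = 0.5 * (rel.max() / r_norm_sq + 1.0 / fro_sq)
        U = np.flatnonzero(r ** 2 >= eps * r_norm_sq * row_norms_sq)
        probs = r[U] ** 2 / (r[U] ** 2).sum()            # probability over the index set U_t
        i = rng.choice(U, p=probs)
        x += (b[i] - A[i] @ x) / row_norms_sq[i] * A[i]
    return x
```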

According to Algorithm 2, we can see that, at the tth iterate, the corresponding residual vector is \(r_t = b - A x_t\) and, if \(|b_i - A_i x_t|^2 / \|A_i\|_2^2 > |b_j - A_j x_t|^2 / \|A_j\|_2^2\), then the GRK algorithm selects the ith row with a larger probability than the jth row. Actually, only those entries of \(\tilde{r}_t\) whose squared relative residual \(|b_i - A_i x_t|^2 / \|A_i\|_2^2\) is not less than the threshold \(\varepsilon_t \|b - A x_t\|_2^2\) are nonzero. The convergence of the GRK method has been studied by Bai and Wu in [17]. In addition, from [17], we can find that if the GRK method is convergent, then it must converge to the least-norm solution \(x_\star = A^{\dagger} b\) of the linear system (1), where \(A^{\dagger}\) denotes the Moore–Penrose pseudoinverse of the matrix A.

3. A Randomized Sampling Kaczmarz Method

3.1. The Randomized Sampling Kaczmarz Method

In this part, a new algorithm will be given. From the previous sections, we note that each iteration of GRK is expensive because it must compute the full residual vector \(r_t\). If the dimension of the matrix is too large relative to the memory of the computer, the performance of the GRK method will be greatly reduced. In response to this situation, we propose a new method to solve the linear system (1). The purpose of our method is to make full use of the residual information, as the GRK method does, while avoiding computing the whole residual \(r_t\) at each iteration. Let \(A_i\) denote the ith row of the coefficient matrix in the linear system (1) with 1 ≤ i ≤ m. Our method is carried out as follows: first, according to the uniform distribution, randomly select k rows \(i_1, i_2, \ldots, i_k\) from the m rows of the coefficient matrix A and put them in a set \(V_t = \{i_1, i_2, \ldots, i_k\}\); second, calculate the relative residuals of all rows in the set \(V_t\); then, select the row with the largest absolute relative residual value (i.e., \(\max_{i_l \in V_t} |b_{i_l} - A_{i_l} x_t| / \|A_{i_l}\|_2\)) to implement the Kaczmarz update in (2). The RSK algorithm proposed in this paper is expressed in Algorithm 3.

Require: A, b, l, k and x0.
Ensure: xl.
(1)for t = 0, 1, …, l do
(2)Randomly generate by the uniform distribution k different elements from 1 to m, indicated by \(V_t = \{i_1, i_2, \ldots, i_k\}\)
(3)Compute the lth entry of the vector \(r_t\) according to \(r_t^{(l)} = b_{i_l} - A_{i_l} x_t\) for \(l = 1, 2, \ldots, k\)
(4)For l = 1, 2, …, k, select the row with the largest relative residual, i.e., \(\max_{i_l \in V_t} |b_{i_l} - A_{i_l} x_t| / \|A_{i_l}\|_2\), and denote the selected index by \(j\)
(5)Set \(x_{t+1} = x_t + \frac{b_j - A_j x_t}{\|A_j\|_2^2} (A_j)^{*}\)
(6)end for
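
The following NumPy sketch mirrors Algorithm 3 for real-valued data; the function name rsk and the dense-matrix assumption are illustrative choices. A small usage example with purely illustrative problem sizes follows the definition.

```python
import numpy as np

def rsk(A, b, l, k, x0=None, rng=None):
    """Randomized sampling Kaczmarz (Algorithm 3), real-valued sketch:
    sample k rows uniformly, then project onto the sampled row with the
    largest absolute relative residual |b_i - A_i x| / ||A_i||_2."""
    rng = np.random.default_rng() if rng is None else rng
    m, n = A.shape
    x = np.zeros(n) if x0 is None else np.array(x0, dtype=float)
    row_norms = np.linalg.norm(A, axis=1)                # ||A_i||_2
    for _ in range(l):
        V = rng.choice(m, size=k, replace=False)         # uniform sample of k distinct rows
        r = b[V] - A[V] @ x                              # residuals of the sampled rows only
        j = V[np.argmax(np.abs(r) / row_norms[V])]       # largest relative residual in V_t
        x += (b[j] - A[j] @ x) / row_norms[j] ** 2 * A[j]
    return x

# Usage sketch on a consistent thin system (sizes are illustrative):
rng = np.random.default_rng(0)
A = rng.standard_normal((5000, 100))
b = A @ rng.standard_normal(100)
x = rsk(A, b, l=20000, k=int(np.log2(A.shape[0])), rng=rng)
print(np.linalg.norm(A @ x - b))                         # small residual
```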

According to Algorithm 3, we can see that the selection of an appropriate number of rows k plays an important role in the RSK algorithm. More specifically, if the number of rows k is too small, it is difficult to ensure that an effective working row is selected at each iteration, and, if the number of rows k is too large, then the RSK method places a higher demand on the computer's memory, especially when the matrix dimension is very large. In the following numerical examples, we will discuss the selection of the parameter k in the RSK method.

In fact, according to Algorithms 1–3, the RSK method costs (2k + 2)n + 1 + k flopping operations (flops) at each iteration, while the GRK method and the RK method cost 7m + 2(n + 1) and 4n + 1 flops, respectively [17]. We list the flops of these three algorithms at each iteration in Table 1. Obviously, when the row number m is extremely large, the RSK method costs far fewer flops than the GRK method. When k = 1, the RSK method degenerates into the uniform-sampling RK method.
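
To make the comparison in Table 1 concrete, the snippet below evaluates the three per-iteration flop formulas for one illustrative problem size; the values of m, n, and k here are assumptions chosen only for demonstration.

```python
# Per-iteration flop counts from Table 1, evaluated for an illustrative size.
m, n = 50000, 500
k = 16                                    # roughly log2(m), the sample size recommended later
rsk_flops = (2 * k + 2) * n + 1 + k       # RSK: independent of m
grk_flops = 7 * m + 2 * (n + 1)           # GRK: grows linearly with m
rk_flops = 4 * n + 1                      # RK
print(rsk_flops, grk_flops, rk_flops)     # 17017 351002 2001
```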

3.2. Convergence Analysis of the RSK Algorithm

In this section, we pay attention to the convergence property of our algorithm. The following theorem guarantees the convergence of the RSK method.

Theorem 1. Let \(x_\star = A^{\dagger} b\) be the least-norm solution of the consistent linear system (1) and \(x_0 \in \mathcal{R}(A^{*})\), where \(\mathcal{R}(A^{*})\) denotes the column space of the matrix \(A^{*}\). Then, the iteration sequence \(\{x_t\}_{t=0}^{\infty}\), generated by the RSK algorithm, converges to the solution \(x_\star\) in expectation. Moreover, the solution error in expectation for the iteration sequence obeys
\[ \mathbb{E}\,\|x_t - x_\star\|_2^2 \le \left( 1 - \frac{\lambda_{\min}(A^{*} A)}{m \max_{1 \le i \le m} \|A_i\|_2^2} \right)^{t} \|x_0 - x_\star\|_2^2, \]
where \(\lambda_{\min}(A^{*} A)\) is the smallest positive eigenvalue of \(A^{*} A\).

Proof. The update rule of the RSK algorithm yields
\[ x_{t+1} - x_t = \frac{b_j - A_j x_t}{\|A_j\|_2^2} (A_j)^{*}, \]
which indicates that \(x_{t+1} - x_t\) is parallel to \((A_j)^{*}\). Using \(A_j x_\star = b_j\), we compute
\[ A_j (x_{t+1} - x_\star) = A_j x_t + (b_j - A_j x_t) - A_j x_\star = 0. \]
In other words, \(x_{t+1} - x_\star\) is orthogonal to \((A_j)^{*}\). Consequently, by the Pythagorean theorem, we obtain
\[ \|x_{t+1} - x_\star\|_2^2 = \|x_t - x_\star\|_2^2 - \|x_{t+1} - x_t\|_2^2. \]
Based on this equality, we have
\[ \|x_{t+1} - x_\star\|_2^2 = \|x_t - x_\star\|_2^2 - \frac{|b_j - A_j x_t|^2}{\|A_j\|_2^2}. \]
Note that the population mean can be expressed as the expectation of the sample mean; thus,
\[ \mathbb{E}_t \left[ \frac{1}{k} \sum_{l=1}^{k} \frac{|b_{i_l} - A_{i_l} x_t|^2}{\|A_{i_l}\|_2^2} \right] = \frac{1}{m} \sum_{i=1}^{m} \frac{|b_i - A_i x_t|^2}{\|A_i\|_2^2}. \]
Since the index j maximizes the relative residual over the sample \(V_t\), the selected term is no smaller than the sample mean, so we obtain the upper bound
\[ \mathbb{E}_t \|x_{t+1} - x_\star\|_2^2 \le \|x_t - x_\star\|_2^2 - \frac{1}{m} \sum_{i=1}^{m} \frac{|b_i - A_i x_t|^2}{\|A_i\|_2^2} \le \|x_t - x_\star\|_2^2 - \frac{\|A(x_t - x_\star)\|_2^2}{m \max_{1 \le i \le m} \|A_i\|_2^2}. \]
By the assumption that \(x_0 \in \mathcal{R}(A^{*})\), we have from the RSK algorithm that \(x_t \in \mathcal{R}(A^{*})\) for each t. Since \(x_\star = A^{\dagger} b \in \mathcal{R}(A^{*})\), we obtain \(x_t - x_\star \in \mathcal{R}(A^{*})\), which implies that
\[ \|A(x_t - x_\star)\|_2^2 \ge \lambda_{\min}(A^{*} A) \|x_t - x_\star\|_2^2. \]
Hence, it holds that
\[ \mathbb{E}_t \|x_{t+1} - x_\star\|_2^2 \le \left( 1 - \frac{\lambda_{\min}(A^{*} A)}{m \max_{1 \le i \le m} \|A_i\|_2^2} \right) \|x_t - x_\star\|_2^2. \]
In addition, by taking the full expectation on both sides and using the law of iterated expectation \(\mathbb{E}[\mathbb{E}_t[\cdot]] = \mathbb{E}[\cdot]\), we immediately get
\[ \mathbb{E} \|x_{t+1} - x_\star\|_2^2 \le \left( 1 - \frac{\lambda_{\min}(A^{*} A)}{m \max_{1 \le i \le m} \|A_i\|_2^2} \right) \mathbb{E} \|x_t - x_\star\|_2^2. \]
By induction on the iteration index t, we finally obtain the bound stated in the theorem.
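
As a sanity check of the convergence factor in Theorem 1 as reconstructed above, the snippet below evaluates \(1 - \lambda_{\min}(A^{T}A)/(m \max_i \|A_i\|_2^2)\) for a random Gaussian matrix; the matrix size and seed are illustrative assumptions.

```python
import numpy as np

# Illustrative evaluation of the per-step contraction factor in Theorem 1
# (as reconstructed above): 1 - lambda_min(A^T A) / (m * max_i ||A_i||_2^2).
rng = np.random.default_rng(0)
m, n = 2000, 100
A = rng.standard_normal((m, n))
lam_min = np.linalg.eigvalsh(A.T @ A)[0]      # smallest eigenvalue (positive, since A has full column rank)
max_row_sq = (A ** 2).sum(axis=1).max()
rho = 1.0 - lam_min / (m * max_row_sq)
print(rho)                                    # strictly less than 1, so the expected error contracts
```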

Remark 1. Actually, compared with the GRK algorithm in [17], the RSK algorithm does not improve on the number of iterations required to solve the linear system (1). However, when the row number m of the coefficient matrix A is much larger than k, the GRK algorithm must calculate the residuals of all m rows at each iteration, while the RSK algorithm only calculates the residuals of k rows. This explains how and why the RSK algorithm reaches the prescribed accuracy in less computing time than the RK and GRK algorithms.

3.2.1. Sensitivity Analysis

The essence of the RSK algorithm is to solve a least squares problem. Consider the least squares problem (16): given a matrix A ∈ R^{m×n} and an observation vector b ∈ R^m, the vector x ∈ R^n to be solved satisfies
\[ \min_{x \in \mathbb{R}^n} \|Ax - b\|_2^2. \tag{16} \]

Let A be exact and the vector b be the measured data; we then consider the sensitivity of the solution x of problem (16) with respect to each component \(b_j\) of the data. Let the sensitivity of x with respect to \(b_j\) be
\[ S_{ij} = \frac{\partial x_i}{\partial b_j}. \]

According to articles [24, 25], we have the following conclusions.

Theorem 2. In the least squares problem (16), the sensitivity of x with respect to \(b_j\) is
\[ S_{ij} = \frac{\partial x_i}{\partial b_j} = \sum_{k=1}^{n} \frac{u_{ik} v_{jk}}{\lambda_k}, \]
where \(u_{ik}\) and \(v_{jk}\) are the elements of the orthogonal matrices U and V, respectively, which satisfy the singular value decomposition \(A = V \Lambda U^{T}\) with \(\Lambda = \operatorname{diag}(\lambda_1, \lambda_2, \ldots)\) containing the singular values of the matrix A.

Proof. Without loss of generality, let rank(A) = n (m ≥ n). According to the singular value decomposition of the matrix, we have
\[ A = V \Lambda U^{T}, \]
where the orthogonal matrices \(V \in \mathbb{R}^{m \times m}\), \(U \in \mathbb{R}^{n \times n}\), and \(\Lambda \in \mathbb{R}^{m \times n}\).
Let \(\Lambda\) have the diagonal entries \(\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n > 0\); then we have
\[ \|Ax - b\|_2^2 = \|V \Lambda U^{T} x - b\|_2^2 = \|\Lambda U^{T} x - V^{T} b\|_2^2. \]
For any x and b, let
\[ y = U^{T} x, \qquad \beta = V^{T} b; \]
then we can get
\[ \|Ax - b\|_2^2 = \|\Lambda y - \beta\|_2^2. \]
Let \(\beta = (\beta_1, \beta_2, \ldots, \beta_m)^{T}\), where \(\beta_k\) is the kth entry of \(V^{T} b\). We have
\[ \|\Lambda y - \beta\|_2^2 = \sum_{k=1}^{n} (\lambda_k y_k - \beta_k)^2 + \sum_{k=n+1}^{m} \beta_k^2. \]
So the minimizer should satisfy
\[ y_k = \frac{\beta_k}{\lambda_k}, \quad k = 1, 2, \ldots, n. \]
Thus, we can get
\[ x = U y, \qquad x_i = \sum_{k=1}^{n} u_{ik} \frac{\beta_k}{\lambda_k}. \]
According to \(\beta = V^{T} b\), we have
\[ \beta_k = \sum_{j=1}^{m} v_{jk} b_j. \]
Thus, we can get
\[ x_i = \sum_{k=1}^{n} \frac{u_{ik}}{\lambda_k} \sum_{j=1}^{m} v_{jk} b_j. \]
So we have
\[ S_{ij} = \frac{\partial x_i}{\partial b_j} = \sum_{k=1}^{n} \frac{u_{ik} v_{jk}}{\lambda_k}. \]
In fact, \(S_{ij}\) reflects the rate of change of x with respect to the data \(b_j\), which depends on the singular values of the matrix A and the orthogonal matrices U and V of its decomposition. In particular, when m = n, the least squares problem reduces to solving the system of equations.
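
Since x depends linearly on b in problem (16), the sensitivity matrix S of Theorem 2 coincides with the Moore–Penrose pseudoinverse \(A^{\dagger}\) when rank(A) = n. The sketch below verifies this numerically; the variable names, sizes, and seed are illustrative.

```python
import numpy as np

# Numerical check of Theorem 2: S_ij = sum_k u_ik v_jk / lambda_k equals (A^+)_ij.
rng = np.random.default_rng(1)
m, n = 40, 15
A = rng.standard_normal((m, n))
# numpy returns A = V * diag(lam) * Ut, matching the paper's notation A = V Lambda U^T
V, lam, Ut = np.linalg.svd(A, full_matrices=True)
S = Ut.T @ np.diag(1.0 / lam) @ V[:, :n].T    # S_ij = sum_k u_ik v_jk / lambda_k
print(np.allclose(S, np.linalg.pinv(A)))      # True
```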

4. Numerical Experiments

4.1. Algorithm Comparison

In this section, we use the RK, GRK, and RSK algorithms to solve the linear system (1) with different matrices A. The numerical behaviors of these algorithms are tested and evaluated in terms of the computing time in seconds (denoted as “CPU”) and the number of iteration steps (denoted as “IT”), where CPU and IT are the medians of the CPU times and iteration steps over 50 repeated runs of the corresponding method. In addition, we also report the speed-up of RSK(k) (k represents the number of rows selected from the coefficient matrix A in the RSK algorithm) against GRK, which is defined as
\[ \text{speed-up} = \frac{\text{CPU of GRK}}{\text{CPU of RSK}(k)}. \]

All algorithms start from the initial vector \(x_0 = 0\) and terminate once the relative solution error (RSE), defined as
\[ \text{RSE} = \frac{\|x_t - x_\star\|_2^2}{\|x_0 - x_\star\|_2^2} \]
at the current iterate \(x_t\), satisfies RSE < \(10^{-6}\), or once the number of iteration steps exceeds 200,000; the latter case is labeled “−” in the numerical tables. For part of the tested matrices, we give their Euclidean condition number, denoted by cond(A), and their density, which is defined as
\[ \text{density} = \frac{\text{number of nonzeros of } A}{m \cdot n}. \]
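
For reference, the quantities used to report the experiments can be computed as in the short sketch below; the formulas follow the definitions given above as reconstructed, and the function names are illustrative.

```python
import numpy as np

def rse(x_t, x_star, x0):
    """Relative solution error used in the stopping rule RSE < 1e-6."""
    return np.linalg.norm(x_t - x_star) ** 2 / np.linalg.norm(x0 - x_star) ** 2

def density(A):
    """Fraction of nonzero entries of the matrix A."""
    return np.count_nonzero(A) / A.size

def speed_up(cpu_grk, cpu_rsk):
    """Speed-up of RSK(k) against GRK."""
    return cpu_grk / cpu_rsk
```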

All experiments are carried out using MATLAB (R2015b) on a personal laptop with 2.5 GHz (Intel(R) Core (TM) CPU i5-7300HQ), 8.00 GB memory, and Windows operating system (Windows 10).

Example 1. In this example, the test matrices A are randomly generated by the MATLAB function randn, which produces independent entries drawn from the standard normal distribution N(0, 1). In our implementations, the random vector \(x \in \mathbb{R}^n\) is also generated by the MATLAB function randn, and the right-hand side is taken to be \(b = Ax\); in addition, the solution vector \(x_\star\) is taken to be pinv(A)b.
In Tables 2 and 3, we report iteration counts and CPU times for RK, GRK, RSK (2), and RSK (7) when the linear system (1) is consistent. The results in Table 2 show that RSK (7) vastly outperforms both RK and GRK in terms of CPU times, with significant speed-ups, when the corresponding linear system is fat (i.e., m < n): the minimum speed-up is 1.22 and the maximum reaches 2.08. From Table 3, we see that the CPU times of RSK (10) are considerably less than those of RK and GRK when the corresponding linear system is thin (i.e., m > n), with the speed-up being at least 2.32 and at most 3.49. In addition, we find that the “fatter” the matrix is, the smaller the advantage of the RSK algorithm, and the “thinner” the matrix is, the larger its advantage. This is in line with our intuition: if the row number m is extremely large, the RSK algorithm saves more computation, since its per-iteration cost is independent of m while that of the GRK algorithm is not.

Example 2. In this example, we select full-rank sparse matrices from [26], which originate in different applications such as linear programming, combinatorial optimization, a DNA electrophoresis model, and the Pajek/world city network. They possess certain structures and properties, such as square (m = n) (e.g., cage5), thin (m > n) (e.g., WorldCities), or fat (m < n) (e.g., bibd_16_8 and refine). There are also rank-deficient sparse matrices from [26], which come from applications such as combinatorial problems and the Pajek/world soccer network, including square (e.g., football) and thin (e.g., relat6 and flower_5_1) matrices.
In Table 4, we list the numbers of iteration steps and the computing times for the RK, GRK, RSK (2), and RSK (5) algorithms. According to the test results in Table 4, we can see that the RSK algorithm performs better than the GRK algorithm in terms of CPU, regardless of whether the matrix is square, full rank, or sparse, and whether its condition number is finite. More specifically, for the RSK (2) algorithm, the speed-up is at least 1.59 and at most 2.76; for the RSK (5) algorithm, the speed-up is at least 1.84 and even attains 4.45.

4.2. The Choice of Parameter k

There is no doubt that the value of the parameter k makes a huge difference to the performance of the RSK algorithm. In this section, we give a tentative discussion of the relationship between k and the matrix row number m. We examine the relationship between the CPU time of the RSK algorithm (run until RSE < \(10^{-6}\)) and the size of k for different matrices A. The results are shown in Figure 1.

Through a large number of our numerical experiments, we find that k = [log2(m)] may be a good choice.

5. Application of RSK Algorithm in Compressed Sensing

5.1. The RaSSK Algorithm

Consider the linear system Ax = b, where \(A \in \mathbb{R}^{m \times n}\), x is a sparse n-dimensional vector, and b is an m-dimensional vector. By solving the following regularized basis pursuit problem
\[ \min_{x \in \mathbb{R}^n} \ \lambda \|x\|_1 + \frac{1}{2} \|x\|_2^2 \quad \text{subject to} \quad Ax = b, \]
we can find the least Euclidean norm solution that satisfies the sparsity requirement. In 2014, Lorenz et al. [22] proposed the RaSK algorithm for solving the regularized basis pursuit problem, as given in Algorithm 4.

Require: x0, A, b, and l.
Ensure: xl.
(1)for t = 0, 1, …, l do
(2)Generate \(i_t\) randomly with probability \(\Pr(\text{row} = i_t) = \|A_{i_t}\|_2^2 / \|A\|_F^2\)
(3)Set \(x_{t+1/2} = x_{t-1/2} + \frac{b_{i_t} - A_{i_t} x_t}{\|A_{i_t}\|_2^2} (A_{i_t})^{*}\) (with \(x_{-1/2} = x_0\))
(4)Set \(x_{t+1} = S_{\lambda}(x_{t+1/2})\)
(5)end for

The shrinkage function is \(S_{\lambda}(x) = \operatorname{sign}(x) \cdot \max(|x| - \lambda, 0)\), applied componentwise, where λ (> 0) is set according to different applications and controls the sparsity of the solution. Lorenz et al. proved that, for a consistent linear system Ax = b, the RaSK algorithm converges to the unique solution of the regularized basis pursuit problem in [22].
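
A one-line NumPy version of the componentwise soft-shrinkage operator \(S_{\lambda}\) used in Algorithms 4 and 5 (the function name is an illustrative choice):

```python
import numpy as np

def shrink(x, lam):
    """Soft shrinkage S_lambda(x) = sign(x) * max(|x| - lambda, 0), applied componentwise."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)
```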

Similar to the construction of the RaSK algorithm, we can give the randomized sampling sparse Kaczmarz (RaSSK) algorithm, which is listed in Algorithm 5.

Require: x0, A, b, l, k.
Ensure: xl.
(1)for t = 0, 1, …, l do
(2)Randomly generate by the uniform distribution k different elements from 1 to m, indicated by \(V_t = \{i_1, i_2, \ldots, i_k\}\)
(3)Compute the lth entry of the vector \(r_t\) according to \(r_t^{(l)} = b_{i_l} - A_{i_l} x_t\) for \(l = 1, 2, \ldots, k\)
(4)For l = 1, 2, …, k, select the row with the largest relative residual, i.e., \(\max_{i_l \in V_t} |b_{i_l} - A_{i_l} x_t| / \|A_{i_l}\|_2\), and denote the selected index by \(j\)
(5)Set \(x_{t+1/2} = x_{t-1/2} + \frac{b_j - A_j x_t}{\|A_j\|_2^2} (A_j)^{*}\) (with \(x_{-1/2} = x_0\))
(6)Set \(x_{t+1} = S_{\lambda}(x_{t+1/2})\)
(7)end for
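
The following NumPy sketch combines the sampling and row-selection rule of Algorithm 3 with the shrinkage step, mirroring Algorithm 5 as reconstructed above; keeping the half-step iterate in a separate variable z is an implementation detail, and the function name is illustrative.

```python
import numpy as np

def shrink(x, lam):
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def rassk(A, b, l, k, lam, rng=None):
    """Randomized sampling sparse Kaczmarz (Algorithm 5), real-valued sketch."""
    rng = np.random.default_rng() if rng is None else rng
    m, n = A.shape
    z = np.zeros(n)                                   # half-step iterate x_{t+1/2}
    x = shrink(z, lam)                                # x_0
    row_norms = np.linalg.norm(A, axis=1)
    for _ in range(l):
        V = rng.choice(m, size=k, replace=False)      # uniform sample of k rows
        r = b[V] - A[V] @ x                           # residuals of the sampled rows only
        j = V[np.argmax(np.abs(r) / row_norms[V])]    # largest relative residual
        z += (b[j] - A[j] @ x) / row_norms[j] ** 2 * A[j]
        x = shrink(z, lam)                            # sparsifying step
    return x
```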

The proof of the convergence of RaSSK is similar to that of RaSK. It can be seen as a special case of the Bregman projections for split feasibility problems (BPSFP) algorithm in [22]: if we take the feasibility problem to be “Find \(x \in \bigcap_{i=1}^{m} H_i\), where \(H_i = \{x : A_i x = b_i\}\)” (i.e., the intersection of the hyperplanes formed by the rows of the matrix A) and define \(f(x) = \lambda \|x\|_1 + \frac{1}{2} \|x\|_2^2\), then the convergence can be easily obtained from Theorem 2.8 in [22].

5.2. Signal Experiments

In this section, we will show the efficiency of the RaSSK method through several numerical examples and compare it with the RaSK algorithm. We implement the numerical experiments in MATLAB (R2015b) on a personal laptop with 2.5 GHz (Intel(R) Core(TM) CPU i5-7300HQ), 8.00 GB memory, and the Windows 10 operating system.

Example 3. In this example, the test signal is randomly generated with length 256, and its sparsity is limited to 10, that is, only 10 nonzero coefficients. The signal is reconstructed by RaSK and RaSSK, respectively. The recovered signal, the relative error, and the relative residual are plotted in Figure 2. The parameter λ in both the RaSSK and RaSK methods is set to 50. We can see that both algorithms reconstruct the original signal. It is worth mentioning that the RaSSK method is more efficient than the RaSK method.
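
A sketch of the test-signal setup in Example 3 (length 256, sparsity 10) follows; the number of measurements, the Gaussian measurement matrix, and the seed are assumptions for illustration, and the commented line shows how the RaSSK sketch above would be applied.

```python
import numpy as np

# Sparse test signal of Example 3: length 256 with 10 nonzero coefficients.
rng = np.random.default_rng(0)
n, s, m = 256, 10, 100          # m (number of measurements) is an illustrative assumption
x_true = np.zeros(n)
support = rng.choice(n, size=s, replace=False)
x_true[support] = rng.standard_normal(s)
A = rng.standard_normal((m, n)) # Gaussian measurement matrix (assumed)
b = A @ x_true
# x_rec = rassk(A, b, l=50000, k=int(np.log2(m)), lam=50.0)   # reconstruct with the sketch above
```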

Example 4. In this example, we use the well-known Lena image with 64 × 64 and 128 × 128 pixels to test our algorithm. The MATLAB function randn generates an m × n random matrix, whose entries are independently drawn from the standard normal distribution N(0, 1), as the observation matrix. For the sparse representation of the image, we use the MATLAB function dwt, which performs a wavelet transform. The error is defined by
\[ \text{error} = \frac{\|x_t - x_\star\|_2}{\|x_\star\|_2}, \]
where \(x_\star\) denotes the sparse representation of the original image, and the stopping criterion of the algorithm is error < 0.1 or reaching the maximum number of 50,000 iteration steps. The value of the parameter λ and the results are shown in Table 5. In this table, “CPU” denotes the computing time and “IT” denotes the number of iteration steps.
From Table 5, it is easy to see that the RaSSK algorithm performs significantly better than the RaSK algorithm: it greatly reduces both the number of iteration steps and the computing time. Meanwhile, from Figures 3 and 4, we can see that, to the human eye, there is almost no perceptible loss even though some information is discarded.

6. Conclusion

Variants of the RK algorithm are effective iteration methods for large-scale linear systems. In this paper, based on the randomized Kaczmarz algorithm and the greedy randomized Kaczmarz algorithm, we propose a new algorithm that makes use of the residual information while not needing to calculate all the residuals. This algorithm requires less computing time than RK [16] and GRK [17] in the experiments. Furthermore, after a large number of numerical experiments, we recommend taking the parameter k to be [log2(m)]. As an application, we apply the method to signal reconstruction in compressed sensing. The experiments show that our algorithm performs well.

Data Availability

The table and figure data used to support the findings of this study are included within the article. The software code used to support the findings of this study is available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was funded by the National Natural Science Foundation of China, under Grant no. 11371243, Key Disciplines of Shanghai Municipality, under Grant no. S30104, Fostering Master’s Degree Empowerment Point Project of Hefei University, under Grant no. 2018xs03, Key Natural Science Research Project of University of Anhui Province, Education Department of Anhui Province, under Grant nos. KJ2019A0846 and KJ2017A547, and Innovation Program of Shanghai Municipal Education Commission (13ZZ068).