Abstract
Convolutional sparse coding (CSC) models have become increasingly popular in the signal and image processing communities in recent years. Several studies have addressed the basis pursuit (BP) problem of the CSC model, including the recently proposed local block coordinate descent (LoBCoD) algorithm. This algorithm adopts slice-based local processing ideas and splits the global sparse vector into local vector needles that are computed locally in the original domain to obtain the encoding. However, a convergence theorem for the LoBCoD algorithm has not been given previously. This paper presents a convergence theorem for the LoBCoD algorithm, which proves that the algorithm converges to its global optimum at a rate of $O(1/k)$. A slice-based multilayer local block coordinate descent (MLLoBCoD) algorithm is then proposed, motivated by the multilayer basis pursuit (MLBP) problem and the LoBCoD algorithm. We prove that the MLLoBCoD algorithm is guaranteed to converge to the optimal solution at a rate of $O(1/k)$. Preliminary numerical experiments show that the proposed MLLoBCoD algorithm outperforms the LoBCoD algorithm on the BP problem and attains a lower loss function value.
1. Introduction
Sparse representation models have been widely used in various image processing [1, 2] and computer vision [3, 4] applications. A sparse representation model assumes that signals can be expressed as a linear combination of several columns, i.e., $X = D\Gamma$, where $D$ is a matrix that forms the dictionary and $\Gamma$ is a sparse vector. If $D$ is assumed to be fixed, finding the sparse vector $\Gamma$ is a basis pursuit (BP) problem. However, such BP algorithms encode each patch independently and ignore the relationship between neighboring patches, resulting in a high degree of redundancy in the encoding. The convolutional sparse coding (CSC) model [5] has been proposed and extended in the last ten years, and it imposes constraints on the dictionary by using a banded circulant matrix. This model assumes that the signal can be represented as the superposition of a few local filters, convolved with a sparse vector. Several works have presented algorithms for solving the CSC problem [6, 7]. Contemporary BP algorithms for CSC often rely on the alternating direction method of multipliers (ADMM) in the Fourier domain. It is known that algorithms encoded in the Fourier domain are often computationally infeasible. Additionally, algorithms based on the ADMM formulation need to introduce auxiliary variables, which increases the difficulty of optimization. A recent work by Papyan et al. [8] adopted slice-based local processing ideas and split the global sparse vector into local vector needles that are computed locally in the original domain rather than the Fourier domain to obtain the encoding. This approach still relies on the ADMM algorithm, and its convergence largely depends on the auxiliary variables that were introduced. The LoBCoD algorithm [9] is another algorithm that was proposed for the BP problem. Its advantages are that it does not operate in the Fourier domain and does not use the ADMM formulation.
More precisely, the LoBCoD algorithm optimizes the needles of the CSC model in the original domain and operates without any auxiliary variables. Compared with global or local ADMM-based methods, the LoBCoD algorithm achieves better performance in solving the BP problem. However, the literature [9] does not provide a convergence theorem for the LoBCoD algorithm. Thus, this paper presents a convergence theorem for the LoBCoD algorithm together with its proof.
A multilayer convolutional sparse coding (MLCSC) model has been proposed in the last three years by Sulam et al. [10], which is a deep extension of the CSC model. The core assumption of the MLCSC model is that a signal can be expressed by sparse representations at different layers in terms of nested convolutional filters. The traditional BP problem was recently extended to a multilayer setting, which was motivated by the MLCSC model [11]. Several methods have been proposed to solve the MLBP problem. The first method is a layered basis pursuit algorithm [12], which establishes a connection between convolutional neural networks and sparse modeling. However, the layered basis pursuit algorithm does not provide a signal that satisfies the assumption of the multilayer model, and the signal reconstruction error increases as the network deepens. Subsequently, the multilayer iterative soft thresholding algorithm (MLISTA) and its fast version (MLFISTA) [11] were proposed, which only require matrix multiplications and entrywise operations and converge to the global optimum. Unfortunately, both methods operate on patches only and do not utilize the slice-based local processing idea. Therefore, the slice-based MLLoBCoD algorithm is proposed for the MLBP problem. This algorithm employs the slice-based local processing idea and the block coordinate descent (BCD) method. Based on the convergence proof of the block coordinate descent algorithm [13], this paper provides a convergence theorem for the MLLoBCoD algorithm and proves that the MLLoBCoD algorithm converges to the global optimal value at a rate of $O(1/k)$.
The rest of this paper is organized as follows. We begin by reviewing the slice-based CSC model and the slice-based LoBCoD algorithm in Section 2. The convergence theorem and proof of the LoBCoD algorithm are given in Section 3. In Section 4, we propose a slice-based MLCSC model and a slice-based MLLoBCoD algorithm. The convergence theorem and proof of the slice-based MLLoBCoD algorithm are given in Section 5. In Section 6, the experimental results of the signal reconstruction and classification accuracy of the two networks inspired by the two algorithms are given. Finally, we conclude this work in Section 7.
2. Background
2.1. Slice-Based Convolutional Sparse Coding
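The CSC model assumes that a global signal $X \in \mathbb{R}^{N}$ can be decomposed as $X = D\Gamma$, where $D \in \mathbb{R}^{N \times Nm}$ is a banded convolutional dictionary that consists of all shifted versions of a local dictionary $D_L \in \mathbb{R}^{n \times m}$, $\{d_i\}_{i=1}^{m}$ are the local filters that are extracted from $D_L$, the global sparse vector $\Gamma \in \mathbb{R}^{Nm}$ contains the interlaced concatenation of all the sparse representations $\{z_i\}_{i=1}^{m}$, and $z_i$ is the sparse representation corresponding to local filter $d_i$. Using the above formula, the BP problem can be expressed as
$$\min_{\Gamma} \frac{1}{2}\|X - D\Gamma\|_2^2 + \lambda\|\Gamma\|_1. \tag{1}$$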
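The global sparse vector $\Gamma$ can be decomposed into $N$ nonoverlapping $m$-dimensional local sparse vectors, $\{\alpha_i\}_{i=1}^{N}$, which are called needles [8], i.e., $\Gamma = [\alpha_1^T, \ldots, \alpha_N^T]^T$. Thus, the global signal can be expressed as $X = \sum_{i=1}^{N} P_i^T D_L \alpha_i$, where $D_L$ is the local dictionary and $P_i^T$ is the operator that places $D_L\alpha_i$ in the $i$th position and pads the remaining entries with zeros. Therefore, the BP problem (1) can be expressed as a local problem:
$$\min_{\{\alpha_i\}_{i=1}^{N}} \frac{1}{2}\Big\|X - \sum_{i=1}^{N} P_i^T D_L \alpha_i\Big\|_2^2 + \lambda \sum_{i=1}^{N} \|\alpha_i\|_1. \tag{2}$$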
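Papyan et al. proposed the slice-based local processing idea and defined $s_i = D_L\alpha_i$ as the $i$th slice, where $D_L$ is the local dictionary and $\alpha_i$ is the $i$th needle. The global signal can be rewritten as $X = \sum_{i=1}^{N} P_i^T s_i$. Then, the slice-based BP problem (2) can be expressed as
$$\min_{\{s_i\},\{\alpha_i\}} \frac{1}{2}\Big\|X - \sum_{i=1}^{N} P_i^T s_i\Big\|_2^2 + \lambda \sum_{i=1}^{N} \|\alpha_i\|_1 \quad \text{s.t.} \quad s_i = D_L\alpha_i, \; i = 1, \ldots, N. \tag{3}$$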
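Papyan et al. tackled the BP problem (3) using the ADMM algorithm [8], which minimizes the following augmented Lagrangian problem:
$$\min_{\{s_i\},\{\alpha_i\},\{u_i\}} \frac{1}{2}\Big\|X - \sum_{i=1}^{N} P_i^T s_i\Big\|_2^2 + \sum_{i=1}^{N}\Big(\lambda\|\alpha_i\|_1 + \frac{\rho}{2}\|s_i - D_L\alpha_i + u_i\|_2^2\Big). \tag{4}$$
Here, $\{u_i\}_{i=1}^{N}$ denotes the dual variables of the ADMM formulation, and $\rho > 0$ is the penalty parameter.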
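The slice decomposition described above can be checked numerically. The following is a minimal NumPy sketch, assuming a 1-D signal with circular boundary handling; the helper names (`place_slice`, `synthesize_from_needles`, `synthesize_by_convolution`) and the problem sizes are hypothetical and chosen only for illustration. It builds the signal once as a sum of placed slices and once as a sum of circular convolutions of each filter with its feature map, and the two results should coincide.

```python
import numpy as np

def place_slice(s, i, N):
    """Apply P_i^T: put slice s at positions i..i+n-1 (circularly) of a zero vector."""
    out = np.zeros(N)
    out[(i + np.arange(len(s))) % N] = s
    return out

def synthesize_from_needles(D_L, needles):
    """X = sum_i P_i^T D_L alpha_i, with needles[i] playing the role of alpha_i."""
    N = needles.shape[0]
    X = np.zeros(N)
    for i in range(N):
        X += place_slice(D_L @ needles[i], i, N)
    return X

def synthesize_by_convolution(D_L, needles):
    """Same signal as a sum over filters of circular convolutions (computed via FFT)."""
    N = needles.shape[0]
    n, m = D_L.shape
    X = np.zeros(N)
    for j in range(m):
        d = np.zeros(N)
        d[:n] = D_L[:, j]              # zero-padded j-th local filter
        z = needles[:, j]              # feature map of the j-th filter
        X += np.real(np.fft.ifft(np.fft.fft(d) * np.fft.fft(z)))
    return X

rng = np.random.default_rng(0)
N, n, m = 32, 5, 3                     # signal length, filter length, number of filters
D_L = rng.standard_normal((n, m))
needles = rng.standard_normal((N, m)) * (rng.random((N, m)) < 0.1)  # sparse needles

X_slices = synthesize_from_needles(D_L, needles)
X_conv = synthesize_by_convolution(D_L, needles)
```

Row `i` of `needles` collects the coefficients of all filters at position `i`, which is the interlaced ordering used for the global sparse vector.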
2.2. Slice-Based Local Block Coordinate Descent Algorithm
The CSC model parameters are represented by the local sparse vectors $\{\alpha_i\}_{i=1}^{N}$ and the local dictionary $D_L$. Assuming $D_L$ is fixed, the slice-based local processing idea and the block coordinate descent method are adopted to update the needles. Let $F$ denote the objective function of equation (2). The BCD algorithm [14] is briefly described below.
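Initialization: choose any $x^{0} = (x_1^{0}, \ldots, x_N^{0})$.
Iteration: for $k = 0, 1, \ldots$, choose an index $i_k \in \{1, \ldots, N\}$ and compute
$$x_{i_k}^{k+1} \in \arg\min_{x_{i_k}} F(x_1^{k}, \ldots, x_{i_k}, \ldots, x_N^{k}), \tag{5}$$
where $F$ denotes the objective function and $x_j^{k+1} = x_j^{k}$ for $j \neq i_k$, until the convergence condition is met.
Output: the final iterate $x^{k+1}$.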
In this paper, each needle can be treated as a block of coordinates taken from the global vector which can be optimized separately with respect to each block in sequence.
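Consequently, the update rule for each needle can be written as
$$\alpha_i^{k+1} = \arg\min_{\alpha_i} \frac{1}{2}\Big\|\Big(X - \sum_{j \neq i} P_j^T D_L \alpha_j\Big) - P_i^T D_L \alpha_i\Big\|_2^2 + \lambda\|\alpha_i\|_1. \tag{6}$$
Equation (6) can be decomposed into a local problem:
$$\min_{\alpha_i} \frac{1}{2}\|P_i R_i - D_L\alpha_i\|_2^2 + \lambda\|\alpha_i\|_1, \tag{7}$$
where $R_i = X - \sum_{j \neq i} P_j^T D_L \alpha_j$ is a global variable, representing the residual image without the contribution of needle $\alpha_i$, and $P_i$ is the transpose of $P_i^T$, representing the operator that extracts the $i$th $n$-dimensional patch from $R_i$.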
The LoBCoD algorithm is proposed to minimize equation (7) [9]. The function can be defined, which is a convex smooth function. The LoBCoD algorithm can be considered to be a generalized gradient algorithm that applies an update in the form , where . The update rule for each needle can be expressed as
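To make the needle-wise update concrete, the following is a minimal NumPy sketch of a block coordinate scheme in the spirit of LoBCoD. It is not the reference implementation of [9]: it assumes a 1-D signal with circular boundaries, takes a single proximal gradient step (soft thresholding) per needle with step size $1/\|D_L\|_2^2$, and keeps a running global residual. All function names and parameter values are hypothetical.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def patch_indices(i, n, N):
    """Indices extracted by P_i (circular boundary, an assumption of this sketch)."""
    return (i + np.arange(n)) % N

def objective(X, D_L, needles, lam):
    """0.5 * ||X - sum_i P_i^T D_L alpha_i||^2 + lam * sum_i ||alpha_i||_1."""
    N, n = len(X), D_L.shape[0]
    recon = np.zeros(N)
    for i in range(N):
        recon[patch_indices(i, n, N)] += D_L @ needles[i]
    return 0.5 * np.sum((X - recon) ** 2) + lam * np.abs(needles).sum()

def block_prox_sweeps(X, D_L, lam, n_sweeps=20):
    """Cycle over needles; each needle takes one proximal gradient step."""
    N = len(X)
    n, m = D_L.shape
    eta = 1.0 / np.linalg.norm(D_L, 2) ** 2   # 1 / Lipschitz constant of the block gradient
    needles = np.zeros((N, m))
    residual = X.copy()                        # X - sum_j P_j^T D_L alpha_j
    history = [objective(X, D_L, needles, lam)]
    for _ in range(n_sweeps):
        for i in range(N):
            idx = patch_indices(i, n, N)
            old = needles[i].copy()
            # P_i residual equals P_i R_i - D_L alpha_i, so D_L^T residual[idx] is minus the block gradient.
            new = soft_threshold(old + eta * (D_L.T @ residual[idx]), eta * lam)
            residual[idx] -= D_L @ (new - old)  # keep the global residual in sync
            needles[i] = new
        history.append(objective(X, D_L, needles, lam))
    return needles, history

rng = np.random.default_rng(1)
N, n, m = 40, 6, 4
D_L = rng.standard_normal((n, m))
D_L /= np.linalg.norm(D_L, axis=0)             # unit-norm filters
X = rng.standard_normal(N)
needles, history = block_prox_sweeps(X, D_L, lam=0.1)
```

Because each block step uses a step size no larger than the inverse Lipschitz constant of the block gradient, the objective values stored in `history` should be nonincreasing from sweep to sweep.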
3. Convergence of Slice-Based Local Block Coordinate Descent Algorithm
This section states the convergence theorem for the LoBCoD algorithm and gives its proof.
Lemma 1. (fundamental proximal gradient inequality).
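Assume that $f$ is a convex smooth function, $g$ is a convex function, $F = f + g$, and $T_L(y) = \mathrm{prox}_{\frac{1}{L}g}\big(y - \frac{1}{L}\nabla f(y)\big)$ is the proximal gradient operator. For any $x$, $y$, and $L > 0$ satisfying
$$f(T_L(y)) \le f(y) + \langle \nabla f(y), T_L(y) - y \rangle + \frac{L}{2}\|T_L(y) - y\|_2^2, \tag{9}$$
it holds that
$$F(x) - F(T_L(y)) \ge \frac{L}{2}\|x - T_L(y)\|_2^2 - \frac{L}{2}\|x - y\|_2^2 + \ell_f(x, y), \tag{10}$$
where
$$\ell_f(x, y) = f(x) - f(y) - \langle \nabla f(y), x - y \rangle. \tag{11}$$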
Theorem 1. (convergence of LoBCoD).
Given a signal $X$ and a local dictionary $D_L$, the slice-based LoBCoD algorithm is guaranteed to converge to the optimal solution at a rate of $O(1/k)$.
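Proof. The optimization problem of CSC can be represented as a general minimization model as follows:
$$\min_{\Gamma} F(\Gamma) = f(\Gamma) + g(\Gamma), \tag{12}$$
where $f$ and $g$ are convex functions. The gradient of $f$ is Lipschitz continuous. Let $\Omega^{*}$ denote the nonempty optimal solution set of problem (12), and let $F^{*}$ denote the optimal objective function value. According to the proximal gradient method, the general update step of $\Gamma$ can be written in the following form:
$$\Gamma^{k+1} = \mathrm{prox}_{\frac{1}{L}g}\Big(\Gamma^{k} - \frac{1}{L}\nabla f(\Gamma^{k})\Big). \tag{13}$$
Here, $\mathrm{prox}$ is the proximal operator and $L$ is the Lipschitz constant of $\nabla f$. Therefore, the update step of each needle can be written in the following form:
$$\alpha_i^{k+1} = \mathrm{prox}_{\frac{1}{L}g}\Big(\alpha_i^{k} - \frac{1}{L}\nabla f(\alpha_i^{k})\Big). \tag{14}$$
Exploiting the fundamental proximal gradient inequality with $x = \alpha_i^{*}$ and $y = \alpha_i^{n}$, and dropping the remaining term, which is nonnegative by the convexity of $f$, we obtain
$$\frac{2}{L}\big(F(\alpha_i^{*}) - F(\alpha_i^{n+1})\big) \ge \|\alpha_i^{*} - \alpha_i^{n+1}\|_2^2 - \|\alpha_i^{*} - \alpha_i^{n}\|_2^2. \tag{15}$$
When all of the above inequalities are added together for $n = 0, 1, \ldots, k-1$, the following result is obtained:
$$\frac{2}{L}\sum_{n=0}^{k-1}\big(F(\alpha_i^{*}) - F(\alpha_i^{n+1})\big) \ge \|\alpha_i^{*} - \alpha_i^{k}\|_2^2 - \|\alpha_i^{*} - \alpha_i^{0}\|_2^2. \tag{16}$$
The following result is obtained using the scaling method, with $F^{*} = F(\alpha_i^{*})$:
$$\sum_{n=0}^{k-1}\big(F(\alpha_i^{n+1}) - F^{*}\big) \le \frac{L}{2}\|\alpha_i^{*} - \alpha_i^{0}\|_2^2 - \frac{L}{2}\|\alpha_i^{*} - \alpha_i^{k}\|_2^2 \le \frac{L}{2}\|\alpha_i^{*} - \alpha_i^{0}\|_2^2. \tag{17}$$
Since the sequence $F(\alpha_i^{n})$ is nonincreasing, we obtain
$$k\big(F(\alpha_i^{k}) - F^{*}\big) \le \sum_{n=0}^{k-1}\big(F(\alpha_i^{n+1}) - F^{*}\big) \le \frac{L}{2}\|\alpha_i^{*} - \alpha_i^{0}\|_2^2. \tag{18}$$
Finally, we obtain
$$F(\alpha_i^{k}) - F^{*} \le \frac{L\|\alpha_i^{*} - \alpha_i^{0}\|_2^2}{2k}. \tag{19}$$
Since the sparse vector $\Gamma$ can be decomposed into $N$ nonoverlapping $m$-dimensional sparse vectors $\{\alpha_i\}_{i=1}^{N}$ and each $\alpha_i$ is convergent, we obtain the convergence of $\Gamma$, and its convergence rate is consistent with that of $\alpha_i$. Thus, the LoBCoD algorithm converges to the global optimum at a rate of $O(1/k)$.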
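The sublinear bound that results from summing proximal gradient inequalities can be illustrated numerically. The sketch below is not LoBCoD itself: it runs the plain proximal gradient method (ISTA) on a small dense lasso problem, which is an assumption made to keep the example short, and compares the optimality gap after $k$ iterations with the classical bound $L\|x^{0} - x^{*}\|_2^2/(2k)$. The reference solution `x_ref` is obtained by simply running many more iterations; all names and sizes are hypothetical.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_value(A, b, x, lam):
    return 0.5 * np.sum((A @ x - b) ** 2) + lam * np.abs(x).sum()

def proximal_gradient(A, b, lam, n_iter):
    """ISTA from x0 = 0 with constant step 1/L; returns final x, F(x_1..x_n), and L."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the smooth part's gradient
    x = np.zeros(A.shape[1])
    values = []
    for _ in range(n_iter):
        x = soft_threshold(x - A.T @ (A @ x - b) / L, lam / L)
        values.append(lasso_value(A, b, x, lam))
    return x, np.array(values), L

rng = np.random.default_rng(0)
A = rng.standard_normal((40, 20))
b = rng.standard_normal(40)
lam = 0.5

# Long run used as a stand-in for the exact minimizer (assumption of this sketch).
x_ref, ref_values, L = proximal_gradient(A, b, lam, 5000)
F_ref = ref_values[-1]

_, values, _ = proximal_gradient(A, b, lam, 50)
k = np.arange(1, 51)
gaps = values - F_ref                      # F(x_k) - F*
bounds = L * np.sum(x_ref ** 2) / (2.0 * k)  # L * ||x0 - x*||^2 / (2k) with x0 = 0
```

In this setting the measured gaps stay below the $1/k$ envelope and typically decay much faster, since the bound is a worst-case guarantee.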
4. Slice-Based Multilayer Local Block Coordinate Descent Algorithm
4.1. Slice-Based Multilayer Convolutional Sparse Coding
The MLCSC model is a deep extension of the CSC model, since it assumes that the signal can be represented by sparse representations of nested convolutional filters on different layers. The MLCSC model assumes that for a convolutional dictionary $D_1$, $\Gamma_1$ is the corresponding sparse representation and a global signal $X$ can be expressed as $X = D_1\Gamma_1$. This model can be cascaded by imposing a similar assumption on $\Gamma_1$, i.e., $\Gamma_1 = D_2\Gamma_2$, for a convolutional dictionary $D_2$ and corresponding sparse representation $\Gamma_2$. The model can be cascaded to $J$ layers and the final global signal can be expressed as $X = D_1 D_2 \cdots D_J \Gamma_J$, where $\Gamma_{j-1} = D_j\Gamma_j$ for $j = 1, \ldots, J$ and $\Gamma_0 = X$. Applying the slice-based local processing idea to the MLCSC model, the slice-based MLCSC model is proposed. The definition of the slice-based MLCSC model will now be given. For a set of local convolutional dictionaries of appropriate dimensions and a global signal , the slice-based MLCSC model can be expressed as
where the norm is defined as the maximal number of nonzeros in vector [10], is the sparse representation of the th layer, is the operator that extracts the th $n$-dimensional patch from the th layer sparse representation , is the th layer local dictionary, is of the th layer, and is a hyperparameter. The proposed slice-based multilayer basis pursuit (MLBP) problem can be expressed as
The th layer of the BP problem can be expressed as
4.2. Slice-Based Multilayer Local Block Coordinate Descent Algorithm
This paper extends the LoBCoD algorithm to a multilayer algorithm to solve the MLBP problem of equation (21). MLLoBCoD uses the slice-based local processing idea to update the needles. However, rather than optimizing with respect to all needles at the same time, we treat each needle as a coordinate block and optimize with respect to each block separately in sequence. Therefore, the th layer of the BP problem can be expressed as
By defining as the residual image of the th layer without the contribution of the needles , the th layer of the MLBP problem of equation (23) can be rewritten as
The th layer of the MLBP problem of equation (24) becomes equivalent to solving the following minimization problem:
We define , which is a convex function. The gradient of is . The MLLoBCoD algorithm can also be considered to be a generalized gradient algorithm that applies an update of the form . The update rule of each needle can be expressed as
The process of the MLLoBCoD algorithm is shown as Algorithm 1. The input is the global signal , the convolution dictionary , and the initial needles . The initial parameter values are and . The outputs are the needles . For the iterative process, we take the th layer as an example, and select a direction needle . The steps of MLLoBCoD algorithm are as follows. (1) The residuals and needles of the previous iteration are used to calculate the residual . (2) The needles are updated. (3) The other needles are fixed to update . All needles of the th layer are updated. (4) The convolutional sparse code of the th layer is obtained by accumulating each of the updated needles in turn. (5) All needles of all layers are updated and the layer residual is updated. (6) This process is repeated k times until convergence occurs.

The natural question that arises is whether the slice-based MLLoBCoD algorithm is guaranteed to converge to the global minimum of the MLBP problem, and the answer is positive, as will be discussed in Section 5.
5. Convergence of Slice-Based MLLoBCoD
In this section, we state the convergence theorem for the slice-based MLLoBCoD algorithm and provide its proof.
Theorem 2. (convergence of MLLoBCoD).
Given a signal and the local dictionaries of all layers, the slice-based MLLoBCoD algorithm converges at a rate of $O(1/k)$.
Proof. The definition of nonexpansive operators [15] is employed, which are guaranteed to converge to their fixed point. Firstly, the definition of internal operators and external operators is given as follows:
An operator $T$ is nonexpansive if it is Lipschitz continuous with constant 1, i.e., if $\|T(x) - T(y)\|_2 \le \|x - y\|_2$ for all $x$ and $y$. In addition, if an operator is firmly nonexpansive, then it must be nonexpansive. A firmly nonexpansive operator needs to satisfy the following condition for all $x$ and $y$:
$$\|T(x) - T(y)\|_2^2 \le \langle T(x) - T(y), x - y \rangle.$$
The proximal operator is firmly nonexpansive. The internal operator will now be analyzed as follows:
where s is chosen such that . Thus, is firmly nonexpansive.
Similarly, it can be proved that is also a nonexpansive operator. Let be a positive constant that satisfies . We obtain
Therefore, the MLLoBCoD iteration is a nonexpansive operator and it converges to a fixed point.
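The two operator properties used in this argument can be checked numerically on random inputs. The sketch below is an illustration under assumptions, not part of the proof: it takes soft thresholding as the proximal operator of the $\ell_1$ norm and tests firm nonexpansiveness, then tests that a proximal gradient step on a small dense least-squares term, with step size $1/L$, is nonexpansive. The matrix `A`, the vector `b`, and all names are hypothetical.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

rng = np.random.default_rng(0)
A = rng.standard_normal((15, 10))
b = rng.standard_normal(15)
lam = 0.3
eta = 1.0 / np.linalg.norm(A, 2) ** 2      # step size 1/L

def prox_grad_step(x):
    """One proximal gradient step for 0.5*||Ax - b||^2 + lam*||x||_1."""
    return soft_threshold(x - eta * (A.T @ (A @ x - b)), eta * lam)

firmly_nonexpansive = True
nonexpansive = True
for _ in range(1000):
    x = 3.0 * rng.standard_normal(10)
    y = 3.0 * rng.standard_normal(10)
    # Firm nonexpansiveness of the prox: ||T(x)-T(y)||^2 <= <T(x)-T(y), x-y>.
    px, py = soft_threshold(x, lam), soft_threshold(y, lam)
    firmly_nonexpansive = firmly_nonexpansive and bool(
        np.sum((px - py) ** 2) <= np.dot(px - py, x - y) + 1e-12
    )
    # Nonexpansiveness of the composed step: ||T(x)-T(y)|| <= ||x-y||.
    tx, ty = prox_grad_step(x), prox_grad_step(y)
    nonexpansive = nonexpansive and bool(
        np.linalg.norm(tx - ty) <= np.linalg.norm(x - y) + 1e-12
    )
```

A random test of this kind cannot prove the inequalities, but a single violation would indicate an error in the step size or in the operator definition.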
Next, we analyze the fixed point, which is defined as for the MLLoBCoD algorithm. Using the second proximal theorem [16], we obtain
Assuming that there is , the above expression can be rewritten as
Using the second proximal theorem, we obtain and
Combining these equations together, we obtain
The fixed point satisfies the optimality condition of the convex function, so the algorithm will converge to a fixed point.
Next, the nonexpansive operator is used to analyze the convergence rate of this algorithm:
Summing all the inequalities above from , we can obtain
Therefore, we obtain
Finally, the results are obtained:
Each needle is convergent. Since the convolutional sparse code is an interleaved concatenation of all needles, it will also converge, and its convergence rate is consistent with that of the needles. Thus, the MLLoBCoD algorithm converges to the global optimum solution at a rate of $O(1/k)$.
6. Experiment and Discussion
6.1. Methods
In this section, the LoBCoD and MLLoBCoD networks used for image reconstruction are described. They are inspired by MLISTA [11, 17]. For LoBCoD, we constructed a one-layer CSC model, and the algorithm is iteratively unfolded into a one-layer convolutional recurrent neural network. For MLLoBCoD, we constructed a three-layer CSC model, and the algorithm is iteratively unfolded into a multilayer convolutional recurrent neural network. MLLoBCoDNET iterates 3 times to form a convolutional recurrent structure because the algorithm requires multiple iterations to obtain the optimal performance. In the forward pass of MLLoBCoD, the images are input to the MLLoBCoD network to obtain the coding, which is then classified by the classification layer. In the backward pass, the total loss function is minimized to update the convolutional dictionary by the backpropagation algorithm. The loss function is the negative log-likelihood, which represents the classification error between the ground truth labels and the labels predicted by the network.
6.2. Experimental Results
In this section, we perform experiments on the MNIST dataset using both the LoBCoD and MLLoBCoD models for image reconstruction. We constructed a one-layer CSC model and a three-layer CSC model. The one-layer CSC model contains 64 local filters of size 6 × 6. In the three-layer CSC model, the first convolutional layer includes 64 local filters of size 6 × 6, the second convolutional layer contains 128 filters of size 6 × 6, and the third convolutional layer contains 512 filters of size 4 × 4. Its parameters are the same as those of a traditional CNN, and thus the network parameters remain unchanged.
The MNIST dataset was selected because it is one of the most popular and most frequently used image datasets and is an entry-level benchmark in the deep learning community.
The MNIST experiments with the proposed networks were run with PyTorch on a Linux system, using a single computer with an NVIDIA GeForce GTX 1080 Ti GPU. The code used to support the findings of this paper was inspired by the MLISTA implementation [11] on GitHub and is available from the corresponding author upon request.
The loss function values of LoBCoD and MLLoBCoD versus the number of iterations on the MNIST dataset are shown in Figure 1. The loss function values of MLLoBCoD are lower than those of LoBCoD.
The results of the original test image and the reconstructed image using the LoBCoD method and the MLLoBCoD method after 100 iterations are shown in Figures 2 and 3, respectively. It can be clearly seen that the reconstruction quality of the MLLoBCoD is better than that of LoBCoD.
The loss function value, the training time, and the peak signal-to-noise ratio (PSNR) value of the two networks after 100 iterations are shown in Table 1. It can be observed that the loss function value of the MLLoBCoD network is smaller than that of the LoBCoD network, while its PSNR value is higher. The loss function value of the MLLoBCoD network is 3.03 × 10^{−6}. The PSNR value of the MLLoBCoD network is 20.15 dB. However, the MLLoBCoD network has a longer training time than the LoBCoD network.
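For reference, PSNR is commonly computed as $10\log_{10}(\mathrm{MAX}^2/\mathrm{MSE})$, where MAX is the largest possible pixel value and MSE is the mean squared error between the original and reconstructed images. The sketch below assumes images scaled to $[0, 1]$; the function name `psnr` and the `peak` parameter are hypothetical, and the paper does not state which peak value or normalization was used.

```python
import numpy as np

def psnr(reference, estimate, peak=1.0):
    """Peak signal-to-noise ratio in dB; `peak` is the maximum possible pixel value."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0.0:
        return float("inf")                # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

# Toy 28 x 28 example: a constant error of 0.1 gives MSE = 0.01, i.e. about 20 dB.
reference = np.zeros((28, 28))
estimate = np.full((28, 28), 0.1)
value = psnr(reference, estimate)
```

A higher PSNR indicates a reconstruction closer to the original image, so it moves in the opposite direction to a reconstruction loss.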
The MLLoBCoD network has more model parameters than the LoBCoD network. The model parameters of the two networks are shown in Table 2. The model parameters of MLISTA and the corresponding CNN are given in [11]. It can be observed that MLISTA, MLLoBCoD, and the corresponding CNN have similar numbers of parameters, all more than the LoBCoD network.
The classification accuracy results of the four models are shown in Table 3. It can be observed that MLLoBCoD has the highest accuracy rate, far superior to that of LoBCoD. These experimental results demonstrate the superiority of our proposed method.
7. Conclusion
In this paper, a convergence theorem for the LoBCoD algorithm was proposed, and it was proved that this method converges to the global optimum at a rate of $O(1/k)$. Inspired by MLCSC and the slice-based local processing idea, a slice-based MLCSC model was proposed. Motivated by MLISTA and the LoBCoD algorithm, an MLLoBCoD algorithm was proposed, and it was established that this method is guaranteed to converge to the global optimum of MLBP at a rate of $O(1/k)$. The experiment compares the loss function value, training time, and PSNR of the two networks. The experiment shows that the reconstruction quality of the MLLoBCoD network is better than that of the LoBCoD network. However, this paper only studies the pursuit problem of MLCSC. The dictionary learning problem of CSC needs further study. This paper is a preliminary study of the MLLoBCoD network, and further study is needed to obtain a more complete analysis, which will likely contribute to the further understanding of deep learning based on driven optimization algorithms.
Data Availability
The data used to support the findings of this study are available from the website http://yann.lecun.com/exdb/mnist/.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
The authors thank the National Natural Science Foundation of China (Grant 61473339) and Special Projects on Basic Research Cooperation of Beijing, Tianjin and Hebei (Grant nos. 19JCZDJC65600Z and F2019203583).