Research Article | Open Access
Jing Li, Hui Yu, Xiao Wei, Jinjia Wang, "Convergence of Slice-Based Block Coordinate Descent Algorithm for Convolutional Sparse Coding", Mathematical Problems in Engineering, vol. 2020, Article ID 4367515, 8 pages, 2020. https://doi.org/10.1155/2020/4367515
Convergence of Slice-Based Block Coordinate Descent Algorithm for Convolutional Sparse Coding
Abstract
Convolutional sparse coding (CSC) models have become increasingly popular in the signal and image processing communities in recent years. Several studies have addressed the basis pursuit (BP) problem of the CSC model, including the recently proposed local block coordinate descent (LoBCoD) algorithm. This algorithm adopts slice-based local processing ideas and splits the global sparse vector into local vectors, called needles, that are computed locally in the original domain to obtain the encoding. However, a convergence theorem for the LoBCoD algorithm has not been given previously. This paper presents a convergence theorem for the LoBCoD algorithm and proves that the algorithm converges to its global optimum at a rate of $O(1/k)$. A slice-based multilayer local block coordinate descent (ML-LoBCoD) algorithm is then proposed, motivated by the multilayer basis pursuit (ML-BP) problem and the LoBCoD algorithm. We prove that the ML-LoBCoD algorithm is guaranteed to converge to the optimal solution at a rate of $O(1/k)$. Preliminary numerical experiments show that the proposed ML-LoBCoD algorithm reconstructs images better than the LoBCoD algorithm for the BP problem and attains a lower loss function value.
1. Introduction
Sparse representation models have been widely used in various image processing [1, 2] and computer vision [3, 4] applications. A sparse representation model assumes that a signal can be expressed as a linear combination of a few columns of a dictionary, i.e., $X = D\Gamma$, where $D$ is the dictionary matrix and $\Gamma$ is a sparse vector. If $D$ is assumed to be fixed, finding the sparse vector $\Gamma$ is a basis pursuit (BP) problem. However, traditional BP algorithms encode each patch independently and ignore the relationship between neighboring patches, resulting in a high degree of redundancy in the encoding. The convolutional sparse coding (CSC) model [5] has been proposed and extended over the last ten years; it imposes a structural constraint on the dictionary by using a banded circulant matrix. This model assumes that the signal can be represented as the superposition of a few local filters, each convolved with a sparse vector. Several works have presented algorithms for solving the CSC problem [6, 7]. Contemporary BP algorithms for CSC often rely on the alternating direction method of multipliers (ADMM) in the Fourier domain. Algorithms that operate in the Fourier domain can be computationally demanding. Additionally, algorithms based on the ADMM formulation need to introduce auxiliary variables, which increases the difficulty of optimization. A recent work by Papyan et al. [8] adopted slice-based local processing ideas and split the global sparse vector into local vectors, called needles, that are computed locally in the original domain rather than the Fourier domain to obtain the encoding. However, this approach still relies on the ADMM algorithm, and its convergence depends heavily on the auxiliary variables that it introduces. The LoBCoD algorithm [9] is another algorithm that was proposed for the BP problem. Its advantages are that it does not operate in the Fourier domain and does not use the ADMM formulation.
More precisely, the LoBCoD algorithm optimizes the needles of the CSC model in the original domain and operates without any auxiliary variables. Compared with global or local ADMM-based methods, the LoBCoD algorithm achieves better performance on the BP problem. However, [9] does not provide a convergence theorem for the LoBCoD algorithm. Thus, this paper presents a convergence theorem for the LoBCoD algorithm together with its proof.
A multilayer convolutional sparse coding (ML-CSC) model, which is a deep extension of the CSC model, was recently proposed by Sulam et al. [10]. The core assumption of the ML-CSC model is that a signal can be expressed by sparse representations at different layers in terms of nested convolutional filters. Motivated by the ML-CSC model, the traditional BP problem was recently extended to a multilayer setting [11]. Several methods have been proposed to solve the ML-BP problem. The first method is the layered basis pursuit algorithm [12], which establishes a connection between convolutional neural networks and sparse modeling. However, the layered basis pursuit algorithm does not provide a signal that satisfies the assumption of the multilayer model, and the signal reconstruction error increases as the network deepens. Subsequently, the multilayer iterative soft thresholding algorithm (ML-ISTA) and its fast version (ML-FISTA) [11] were proposed, which only require matrix multiplications and entry-wise operations and converge to the global optimum. Unfortunately, both methods operate on patches only and do not utilize the slice-based local processing idea. Therefore, the slice-based ML-LoBCoD algorithm is proposed for the ML-BP problem. This algorithm employs the slice-based local processing idea and the block coordinate descent (BCD) method. Based on the convergence analysis of the block coordinate descent algorithm [13], this paper provides a convergence theorem for the ML-LoBCoD algorithm and proves that the ML-LoBCoD algorithm converges to the global optimal value at a rate of $O(1/k)$.
The rest of this paper is organized as follows. We begin by reviewing the slice-based CSC and slice-based LoBCoD algorithms in Section 2. The convergence theorem and proof of the LoBCoD algorithm are given in Section 3. In Section 4, we propose a slice-based ML-CSC model and a slice-based ML-LoBCoD algorithm. The convergence theorem and proof of the slice-based ML-LoBCoD algorithm are given in Section 5. In Section 6, the experimental results of the signal reconstruction and classification accuracy of the two networks inspired by the two algorithms are given. Finally, we conclude this work in Section 7.
2.1. Slice-Based Convolutional Sparse Coding
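The CSC model assumes that a global signal $X \in \mathbb{R}^{N}$ can be decomposed as $X = D\Gamma = \sum_{j=1}^{m} d_j * z_j$, where $D \in \mathbb{R}^{N \times Nm}$ is a banded convolutional dictionary that consists of all shifted versions of a local dictionary $D_L \in \mathbb{R}^{n \times m}$, $\{d_j\}_{j=1}^{m}$ are the local filters that are extracted from $D_L$ (its columns), the global sparse vector $\Gamma \in \mathbb{R}^{Nm}$ contains the interlaced concatenation of all the sparse representations $\{z_j\}_{j=1}^{m}$, and $z_j$ is the sparse representation corresponding to the local filter $d_j$. Using the above formula, the BP problem can be expressed as

$$\min_{\Gamma} \frac{1}{2}\|X - D\Gamma\|_2^2 + \lambda\|\Gamma\|_1. \tag{1}$$

The global sparse vector $\Gamma$ can be decomposed into $N$ nonoverlapping $m$-dimensional local sparse vectors, $\{\alpha_i\}_{i=1}^{N}$, which are called needles, i.e., $\Gamma = [\alpha_1^T, \alpha_2^T, \ldots, \alpha_N^T]^T$. Thus, the global signal can be expressed as $X = \sum_{i=1}^{N} P_i^T D_L \alpha_i$, where $P_i^T$ is the operator that places $D_L\alpha_i$ in the $i$th position and pads the remaining entries with zeros. Therefore, the BP problem (1) can be expressed as a local problem:

$$\min_{\{\alpha_i\}_{i=1}^{N}} \frac{1}{2}\Big\|X - \sum_{i=1}^{N} P_i^T D_L \alpha_i\Big\|_2^2 + \lambda\sum_{i=1}^{N}\|\alpha_i\|_1. \tag{2}$$

Papyan et al. [8] proposed the slice-based local processing idea and defined $s_i = D_L\alpha_i$ as the $i$th slice. The global signal can be rewritten as $X = \sum_{i=1}^{N} P_i^T s_i$. Then, the slice-based BP problem (2) can be expressed as

$$\min_{\{s_i\},\{\alpha_i\},\{u_i\}} \frac{1}{2}\Big\|X - \sum_{i=1}^{N} P_i^T s_i\Big\|_2^2 + \sum_{i=1}^{N}\Big(\lambda\|\alpha_i\|_1 + \frac{\rho}{2}\|s_i - D_L\alpha_i + u_i\|_2^2\Big). \tag{3}$$

Here, $u_i$ denotes the dual variables of the ADMM formulation and $\rho$ is the ADMM penalty parameter.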
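The slice decomposition can be illustrated with a minimal NumPy sketch for a one-dimensional signal. It assumes circular boundary handling, and the function names `extract_patch`, `place_patch`, and `synthesize` are hypothetical helpers introduced here for illustration; they are not taken from the authors' code. The sketch builds $X = \sum_i P_i^T D_L \alpha_i$ from the needles and compares it with the sum of circular convolutions of the local filters with their feature maps.

```python
import numpy as np

def extract_patch(x, i, n):
    """P_i: extract the n-sample patch that starts at position i (circular)."""
    return x[(i + np.arange(n)) % x.size]

def place_patch(s, i, N):
    """P_i^T: put slice s at position i of a length-N vector, zeros elsewhere."""
    out = np.zeros(N)
    out[(i + np.arange(s.size)) % N] = s  # indices are distinct when n <= N
    return out

def synthesize(D_L, needles):
    """X = sum_i P_i^T D_L alpha_i, with needles stored as rows of an (N, m) array."""
    N = needles.shape[0]
    X = np.zeros(N)
    for i in range(N):
        X += place_patch(D_L @ needles[i], i, N)
    return X

rng = np.random.default_rng(0)
N, n, m = 16, 4, 3                       # signal length, filter length, number of filters
D_L = rng.standard_normal((n, m))        # local dictionary, one filter per column
needles = rng.standard_normal((N, m)) * (rng.random((N, m)) < 0.2)
X = synthesize(D_L, needles)

# The same signal as a sum of circular convolutions d_j * z_j, computed with the FFT.
X_conv = sum(
    np.real(np.fft.ifft(np.fft.fft(D_L[:, j], N) * np.fft.fft(needles[:, j])))
    for j in range(m)
)
print(np.allclose(X, X_conv))
```

Column `j` of `needles` plays the role of the feature map of filter `j`, so the slice-based sum and the convolutional form describe the same signal under the circular-boundary assumption.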
2.2. Slice-Based Local Block Coordinate Descent Algorithm
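The CSC model parameters are represented by the local sparse vectors $\{\alpha_i\}_{i=1}^{N}$ and the local dictionary $D_L$. Assuming $D_L$ is fixed, a slice-based local processing idea and the block coordinate descent method are adopted to update the needles. Let $F(\alpha_1, \ldots, \alpha_N)$ denote the objective function of equation (2). The BCD algorithm [14] is briefly described below.
Initialization: choose any $\alpha^{0} = (\alpha_1^{0}, \ldots, \alpha_N^{0})$.
Iteration: choose an index $i \in \{1, \ldots, N\}$ and compute
$$\alpha_i^{k+1} = \arg\min_{\alpha_i} F(\alpha_1^{k}, \ldots, \alpha_{i-1}^{k}, \alpha_i, \alpha_{i+1}^{k}, \ldots, \alpha_N^{k}),$$
until the convergence condition is met.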
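The initialization and iteration steps of BCD can be sketched on a small generic problem. The example below is an illustration only: it applies cyclic BCD with scalar blocks to an $\ell_1$-regularized least-squares problem, where each block subproblem has a closed-form solution. The function names and the test problem are assumptions made for this sketch and do not come from the paper.

```python
import numpy as np

def soft_threshold(v, t):
    """Soft-thresholding: sign(v) * max(|v| - t, 0)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def objective(A, b, x, lam):
    """F(x) = 0.5 * ||b - A x||^2 + lam * ||x||_1."""
    return 0.5 * np.sum((b - A @ x) ** 2) + lam * np.sum(np.abs(x))

def bcd_lasso(A, b, lam, sweeps=50):
    """Cyclic block coordinate descent with one coordinate per block."""
    x = np.zeros(A.shape[1])                  # initialization: any starting point
    col_sq = np.sum(A ** 2, axis=0)           # squared column norms
    history = [objective(A, b, x, lam)]
    for _ in range(sweeps):
        for i in range(x.size):               # choose the index cyclically
            r_i = b - A @ x + A[:, i] * x[i]  # residual without block i
            # exact minimizer of the objective over block i, others fixed
            x[i] = soft_threshold(A[:, i] @ r_i, lam) / col_sq[i]
        history.append(objective(A, b, x, lam))
    return x, history

rng = np.random.default_rng(0)
A = rng.standard_normal((30, 10))
b = rng.standard_normal(30)
x, history = bcd_lasso(A, b, lam=0.5)
```

Because every block update minimizes the objective exactly over that block, the recorded objective values in `history` never increase from one sweep to the next.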
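In this paper, each needle $\alpha_i$ is treated as a block of coordinates taken from the global vector $\Gamma$, and the objective is optimized with respect to each block separately and in sequence.
Consequently, the update rule for each needle can be written as

$$\alpha_i^{k+1} = \arg\min_{\alpha_i} \frac{1}{2}\Big\|X - \sum_{j \neq i} P_j^T D_L \alpha_j - P_i^T D_L \alpha_i\Big\|_2^2 + \lambda\|\alpha_i\|_1. \tag{6}$$

Equation (6) can be reduced to a local problem:

$$\min_{\alpha_i} \frac{1}{2}\|P_i R_i - D_L\alpha_i\|_2^2 + \lambda\|\alpha_i\|_1, \tag{7}$$

where $R_i = X - \sum_{j \neq i} P_j^T D_L \alpha_j$ is a global variable, representing the residual image without the contribution of needle $\alpha_i$, and $P_i$ is the transpose of $P_i^T$, representing the operator that extracts the $i$th $n$-dimensional patch from $R_i$.
The LoBCoD algorithm was proposed to minimize equation (7) [9]. Define the function $f(\alpha_i) = \frac{1}{2}\|P_i R_i - D_L\alpha_i\|_2^2$, which is a convex smooth function. The LoBCoD algorithm can be considered a generalized (proximal) gradient algorithm that applies an update of the form $\alpha_i^{k+1} = \operatorname{prox}_{\frac{\lambda}{L}\|\cdot\|_1}\big(\alpha_i^{k} - \frac{1}{L}\nabla f(\alpha_i^{k})\big)$, where $\nabla f(\alpha_i) = -D_L^T(P_i R_i - D_L\alpha_i)$ and $L$ is the Lipschitz constant of $\nabla f$. Since the proximal operator of the $\ell_1$ norm is the soft-thresholding operator $\mathcal{S}$, the update rule for each needle can be expressed as

$$\alpha_i^{k+1} = \mathcal{S}_{\lambda/L}\Big(\alpha_i^{k} + \frac{1}{L} D_L^T\big(P_i R_i - D_L\alpha_i^{k}\big)\Big). \tag{8}$$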
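The needle update described above can be sketched for a one-dimensional signal as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes circular boundaries, a step size of $1/L$ with $L$ taken as the squared spectral norm of the local dictionary, and one soft-thresholded gradient step per needle per sweep. The helper names `reconstruct`, `objective`, and `lobcod_sweep` are hypothetical.

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def reconstruct(D_L, needles, N):
    """sum_i P_i^T D_L alpha_i with circular boundary handling."""
    n = D_L.shape[0]
    X_hat = np.zeros(N)
    for i in range(N):
        X_hat[(i + np.arange(n)) % N] += D_L @ needles[i]
    return X_hat

def objective(X, D_L, needles, lam):
    r = X - reconstruct(D_L, needles, X.size)
    return 0.5 * r @ r + lam * np.abs(needles).sum()

def lobcod_sweep(X, D_L, needles, lam):
    """One pass over all needles, each updated by a soft-thresholded gradient step."""
    N, n = X.size, D_L.shape[0]
    L = np.linalg.norm(D_L, 2) ** 2          # Lipschitz constant of the local gradient
    X_hat = reconstruct(D_L, needles, N)
    for i in range(N):
        idx = (i + np.arange(n)) % N
        s_old = D_L @ needles[i]             # current slice of needle i
        patch = X[idx] - X_hat[idx] + s_old  # P_i R_i: residual patch without needle i
        step = needles[i] + D_L.T @ (patch - s_old) / L
        needles[i] = soft_threshold(step, lam / L)
        X_hat[idx] += D_L @ needles[i] - s_old  # keep the reconstruction up to date
    return needles

rng = np.random.default_rng(0)
N, n, m, lam = 32, 5, 4, 0.1
D_L = rng.standard_normal((n, m))
D_L /= np.linalg.norm(D_L, axis=0)           # unit-norm filters
truth = rng.standard_normal((N, m)) * (rng.random((N, m)) < 0.1)
X = reconstruct(D_L, truth, N)

needles = np.zeros((N, m))
history = [objective(X, D_L, needles, lam)]
for _ in range(20):
    needles = lobcod_sweep(X, D_L, needles, lam)
    history.append(objective(X, D_L, needles, lam))
```

The reconstruction `X_hat` is updated after every needle, so each update sees the residual produced by the most recent values of all other needles, which is what distinguishes the sequential block update from a simultaneous update of all needles.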
3. Convergence of Slice-Based Local Block Coordinate Descent Algorithm
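The convergence theorem for the LoBCoD algorithm is now stated and proved.
Lemma 1. (fundamental proximal gradient inequality).
Assume that $f$ is a convex smooth function, $g$ is a convex function, $F = f + g$, and $T_L(y) = \operatorname{prox}_{\frac{1}{L}g}\big(y - \frac{1}{L}\nabla f(y)\big)$ is the proximal gradient operator. For any $x$, $y$, and $L > 0$ satisfying

$$f(T_L(y)) \le f(y) + \langle \nabla f(y), T_L(y) - y \rangle + \frac{L}{2}\|T_L(y) - y\|_2^2, \tag{9}$$

it holds that

$$F(x) - F(T_L(y)) \ge \frac{L}{2}\|x - T_L(y)\|_2^2 - \frac{L}{2}\|x - y\|_2^2 + \ell_f(x, y), \tag{10}$$

where

$$\ell_f(x, y) = f(x) - f(y) - \langle \nabla f(y), x - y \rangle. \tag{11}$$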
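The fundamental proximal gradient inequality can be checked numerically on a small example. The sketch below assumes a least-squares smooth term $f(x) = \frac{1}{2}\|Ax - b\|_2^2$, an $\ell_1$ term $g(x) = \lambda\|x\|_1$, and $L$ equal to the squared spectral norm of $A$, for which the descent condition of the lemma holds at every point. The problem data and function names are illustrative assumptions.

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

rng = np.random.default_rng(1)
A = rng.standard_normal((20, 8))
b = rng.standard_normal(20)
lam = 0.3
L = np.linalg.norm(A, 2) ** 2       # Lipschitz constant of grad f

def f(x):
    return 0.5 * np.sum((A @ x - b) ** 2)

def grad_f(x):
    return A.T @ (A @ x - b)

def F(x):
    return f(x) + lam * np.sum(np.abs(x))

def T_L(y):
    """Proximal gradient operator: prox of (lam/L)*||.||_1 after a gradient step."""
    return soft_threshold(y - grad_f(y) / L, lam / L)

def ell_f(x, y):
    """Linearization gap f(x) - f(y) - <grad f(y), x - y>, nonnegative for convex f."""
    return f(x) - f(y) - grad_f(y) @ (x - y)

def inequality_gap(x, y):
    """Left side minus right side of the inequality; expected to be nonnegative."""
    t = T_L(y)
    lhs = F(x) - F(t)
    rhs = 0.5 * L * np.sum((x - t) ** 2) - 0.5 * L * np.sum((x - y) ** 2) + ell_f(x, y)
    return lhs - rhs

gaps = [inequality_gap(rng.standard_normal(8), rng.standard_normal(8)) for _ in range(1000)]
print(min(gaps) >= -1e-8)
```

A small negative tolerance is used only to absorb floating-point rounding; the inequality itself is exact for this choice of $L$.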
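Theorem 1. (convergence of LoBCoD).
Given a signal $X$ and local dictionary $D_L$, the slice-based LoBCoD algorithm is guaranteed to converge to the optimal solution at a rate $O(1/k)$.
Proof. The optimization problem of CSC can be represented as a general minimization model as follows:

$$\min_{\alpha} F(\alpha) = f(\alpha) + g(\alpha), \tag{12}$$

where $f$ and $g$ are convex functions and the gradient of $f$ is Lipschitz continuous. Let $\alpha^{*}$ denote an element of the nonempty optimal set of problem (12), and let $F_{\mathrm{opt}} = F(\alpha^{*})$ denote the optimal objective function value. According to the proximal gradient method, the general update step can be written in the following form:

$$\alpha^{k+1} = \operatorname{prox}_{\frac{1}{L}g}\Big(\alpha^{k} - \frac{1}{L}\nabla f(\alpha^{k})\Big). \tag{13}$$

Let $T_L$ denote the proximal gradient operator of Lemma 1, where $L$ is the Lipschitz constant of $\nabla f$. With the other needles held fixed, the update step of each needle can be written in the following form:

$$\alpha_i^{k+1} = T_L(\alpha_i^{k}). \tag{14}$$

Exploiting the fundamental proximal gradient inequality with $x = \alpha_i^{*}$ and $y = \alpha_i^{n}$, and using $\ell_f(\alpha_i^{*}, \alpha_i^{n}) \ge 0$, which follows from the convexity of $f$, we obtain

$$\frac{2}{L}\big(F(\alpha_i^{*}) - F(\alpha_i^{n+1})\big) \ge \|\alpha_i^{*} - \alpha_i^{n+1}\|_2^2 - \|\alpha_i^{*} - \alpha_i^{n}\|_2^2. \tag{15}$$

When all of the above inequalities are added together for $n = 0, 1, \ldots, k-1$, the following result is obtained:

$$\frac{2}{L}\sum_{n=0}^{k-1}\big(F_{\mathrm{opt}} - F(\alpha_i^{n+1})\big) \ge \|\alpha_i^{*} - \alpha_i^{k}\|_2^2 - \|\alpha_i^{*} - \alpha_i^{0}\|_2^2. \tag{16}$$

Dropping the nonnegative term $\|\alpha_i^{*} - \alpha_i^{k}\|_2^2$ gives

$$\sum_{n=0}^{k-1}\big(F(\alpha_i^{n+1}) - F_{\mathrm{opt}}\big) \le \frac{L}{2}\|\alpha_i^{*} - \alpha_i^{0}\|_2^2 - \frac{L}{2}\|\alpha_i^{*} - \alpha_i^{k}\|_2^2 \le \frac{L}{2}\|\alpha_i^{*} - \alpha_i^{0}\|_2^2. \tag{17}$$

Since the proximal gradient step does not increase the objective, the sequence $F(\alpha_i^{n})$ is nonincreasing. Thus, we obtain

$$k\big(F(\alpha_i^{k}) - F_{\mathrm{opt}}\big) \le \sum_{n=0}^{k-1}\big(F(\alpha_i^{n+1}) - F_{\mathrm{opt}}\big) \le \frac{L}{2}\|\alpha_i^{*} - \alpha_i^{0}\|_2^2. \tag{18}$$

Finally, we obtain

$$F(\alpha_i^{k}) - F_{\mathrm{opt}} \le \frac{L\|\alpha_i^{*} - \alpha_i^{0}\|_2^2}{2k}. \tag{19}$$

Since the sparse vector $\Gamma$ can be decomposed into $N$ nonoverlapping $m$-dimensional sparse vectors $\alpha_i$ and each $\alpha_i$ is convergent, $\Gamma$ is also convergent, and its convergence rate is the same as that of the needles $\alpha_i$. Thus, the LoBCoD algorithm converges to the global optimum at a rate of $O(1/k)$.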
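The sublinear bound that the proof arrives at, namely an objective gap of at most $L\|x^{0} - x^{*}\|_2^2 / (2k)$ after $k$ proximal gradient steps, can be observed numerically. The sketch below is an illustration on a generic $\ell_1$-regularized least-squares problem rather than on the CSC model; it assumes that the iterate after many steps is an adequate proxy for the optimum $x^{*}$, and all names are hypothetical.

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def proximal_gradient(A, b, lam, iters):
    """Proximal gradient iterations for 0.5*||Ax - b||^2 + lam*||x||_1 with step 1/L."""
    L = np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    iterates = [x.copy()]
    for _ in range(iters):
        x = soft_threshold(x - A.T @ (A @ x - b) / L, lam / L)
        iterates.append(x.copy())
    return iterates, L

rng = np.random.default_rng(2)
A = rng.standard_normal((40, 10))
b = rng.standard_normal(40)
lam = 0.5

def F(x):
    return 0.5 * np.sum((A @ x - b) ** 2) + lam * np.sum(np.abs(x))

iterates, L = proximal_gradient(A, b, lam, iters=5000)
x_star = iterates[-1]                 # assumption: numerically converged proxy for x*
F_opt = F(x_star)
dist0_sq = np.sum((iterates[0] - x_star) ** 2)

# Check F(x^k) - F_opt <= L * ||x^0 - x*||^2 / (2k) for the first iterations.
bound_holds = all(
    F(iterates[k]) - F_opt <= L * dist0_sq / (2 * k) + 1e-9
    for k in range(1, 200)
)
print(bound_holds)
```

In practice the observed gap is usually far below the bound, since $O(1/k)$ is a worst-case guarantee.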
4. Slice-Based Multilayer Local Block Coordinate Descent Algorithm
4.1. Slice-Based Multilayer Convolutional Sparse Coding
The ML-CSC model is a deep extension of the CSC model: it assumes that the signal can be represented by sparse representations over nested convolutional filters at different layers. The ML-CSC model assumes that, for a global convolutional dictionary $D_1$, $\Gamma_1$ is the corresponding sparse representation and a global signal can be expressed as $X = D_1\Gamma_1$. This model can be cascaded by imposing a similar assumption on $\Gamma_1$, i.e., $\Gamma_1 = D_2\Gamma_2$, for a convolutional dictionary $D_2$ and corresponding sparse representation $\Gamma_2$. The model can be cascaded to $J$ layers, and the final global signal can be expressed as $X = D_1 D_2 \cdots D_J \Gamma_J = D^{(J)}\Gamma_J$, where $D^{(J)} = D_1 D_2 \cdots D_J$. Applying the slice-based local processing idea to the ML-CSC model, the slice-based ML-CSC model is proposed. The definition of the slice-based ML-CSC model will now be given. For a set of local convolutional dictionaries $\{D_{L,j}\}_{j=1}^{J}$ of appropriate dimensions and a global signal $X$, the slice-based ML-CSC model can be expressed as

$$\Gamma_{j-1} = \sum_{i} P_{j,i}^{T} D_{L,j}\,\alpha_{j,i}, \quad \|\Gamma_j\|_{0,\infty} \le \lambda_j, \quad j = 1, \ldots, J, \quad \Gamma_0 = X, \tag{20}$$

where the norm $\|\Gamma_j\|_{0,\infty}$ is defined as the maximal number of nonzeros in any local stripe of $\Gamma_j$, $\Gamma_j$ is the sparse representation of the $j$th layer, $P_{j,i}$ is the operator that extracts the $i$th $n$-dimensional patch at the $j$th layer (its transpose $P_{j,i}^{T}$ places a slice at the $i$th position and pads the remaining entries with zeros), $D_{L,j}$ is the $j$th layer local dictionary, $\alpha_{j,i}$ is the $i$th needle of the $j$th layer, and $\lambda_j$ is a hyperparameter. The proposed slice-based multilayer basis pursuit (ML-BP) problem can be expressed as
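The $j$th layer of the BP problem can be expressed as

$$\min_{\{\alpha_{j,i}\}_{i}} \frac{1}{2}\Big\|\Gamma_{j-1} - \sum_{i} P_{j,i}^{T} D_{L,j}\,\alpha_{j,i}\Big\|_2^2 + \lambda_j\sum_{i}\|\alpha_{j,i}\|_1. \tag{22}$$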
4.2. Slice-Based Multilayer Local Block Coordinate Descent Algorithm
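This paper extends the LoBCoD algorithm to a multilayer algorithm to solve the ML-BP problem of equation (21). ML-LoBCoD uses the slice-based local processing idea to update the needles. However, rather than optimizing with respect to all needles at the same time, we treat each needle as a coordinate block and optimize with respect to each block separately and in sequence. Therefore, the $j$th layer of the BP problem can be expressed as

$$\min_{\alpha_{j,i}} \frac{1}{2}\Big\|\Gamma_{j-1} - \sum_{l \neq i} P_{j,l}^{T} D_{L,j}\,\alpha_{j,l} - P_{j,i}^{T} D_{L,j}\,\alpha_{j,i}\Big\|_2^2 + \lambda_j\|\alpha_{j,i}\|_1. \tag{23}$$

By defining $R_{j,i} = \Gamma_{j-1} - \sum_{l \neq i} P_{j,l}^{T} D_{L,j}\,\alpha_{j,l}$ as the residual image of the $j$th layer without the contribution of the needle $\alpha_{j,i}$, the $j$th layer of the ML-BP problem of equation (23) can be rewritten as

$$\min_{\alpha_{j,i}} \frac{1}{2}\big\|R_{j,i} - P_{j,i}^{T} D_{L,j}\,\alpha_{j,i}\big\|_2^2 + \lambda_j\|\alpha_{j,i}\|_1. \tag{24}$$

The $j$th layer of the ML-BP problem of equation (24) is equivalent to solving the following minimization problem:

$$\min_{\alpha_{j,i}} \frac{1}{2}\big\|P_{j,i} R_{j,i} - D_{L,j}\,\alpha_{j,i}\big\|_2^2 + \lambda_j\|\alpha_{j,i}\|_1. \tag{25}$$

We define $f(\alpha_{j,i}) = \frac{1}{2}\|P_{j,i} R_{j,i} - D_{L,j}\,\alpha_{j,i}\|_2^2$, which is a convex function. The gradient of $f$ is $\nabla f(\alpha_{j,i}) = -D_{L,j}^{T}\big(P_{j,i} R_{j,i} - D_{L,j}\,\alpha_{j,i}\big)$. The ML-LoBCoD algorithm can also be considered a generalized (proximal) gradient algorithm that applies an update of the form $\alpha_{j,i}^{k+1} = \operatorname{prox}_{\frac{\lambda_j}{L}\|\cdot\|_1}\big(\alpha_{j,i}^{k} - \frac{1}{L}\nabla f(\alpha_{j,i}^{k})\big)$, where $L$ is the Lipschitz constant of $\nabla f$. The update rule of each needle can be expressed as

$$\alpha_{j,i}^{k+1} = \mathcal{S}_{\lambda_j/L}\Big(\alpha_{j,i}^{k} + \frac{1}{L} D_{L,j}^{T}\big(P_{j,i} R_{j,i} - D_{L,j}\,\alpha_{j,i}^{k}\big)\Big), \tag{26}$$

where $\mathcal{S}$ denotes the soft-thresholding operator.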
The process of the ML-LoBCoD algorithm is shown as Algorithm 1. The input is the global signal $X$, the local convolutional dictionaries $\{D_{L,j}\}_{j=1}^{J}$, and the initial needles $\{\alpha_{j,i}^{0}\}$. The initial parameter values are and . The outputs are the needles $\{\alpha_{j,i}\}$. For the iterative process, we take the $j$th layer as an example and select a needle $\alpha_{j,i}$ as the update direction. The steps of the ML-LoBCoD algorithm are as follows. (1) The residuals and needles of the previous iteration are used to calculate the residual $R_{j,i}$. (2) The needle $\alpha_{j,i}$ is updated. (3) With $\alpha_{j,i}$ fixed, the other needles are updated in the same way until all needles of the $j$th layer are updated. (4) The convolutional sparse code $\Gamma_j$ of the $j$th layer is obtained by accumulating each of the updated needles in turn. (5) The needles of all layers are updated in this manner, and the layer residual is updated. (6) This process is repeated for $k$ iterations until convergence occurs.
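The steps above can be sketched in simplified form for a one-dimensional signal. The code below is not Algorithm 1 of the paper: it is a layer-by-layer illustration that assumes circular boundaries, treats the code of layer $j-1$ as the multichannel input of layer $j$, and applies one soft-thresholded gradient step per needle in each pass. The names `layer_sweep` and `ml_lobcod`, the patch sizes, and the dictionary shapes are assumptions made for this sketch.

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def layer_sweep(signal, D_L, needles, lam, n):
    """One pass over the needles of a single layer.

    signal:  (N, c) input of the layer (the code of the previous layer, or X)
    D_L:     (n * c, m) local dictionary acting on flattened n-by-c patches
    needles: (N, m) needles of this layer, updated in place
    """
    N, c = signal.shape
    L = np.linalg.norm(D_L, 2) ** 2
    approx = np.zeros_like(signal)
    for i in range(N):                        # current synthesis of the layer input
        approx[(i + np.arange(n)) % N] += (D_L @ needles[i]).reshape(n, c)
    for i in range(N):
        idx = (i + np.arange(n)) % N
        s_old = D_L @ needles[i]
        # residual patch without the contribution of needle i
        patch = (signal[idx] - approx[idx]).ravel() + s_old
        step = needles[i] + D_L.T @ (patch - s_old) / L
        needles[i] = soft_threshold(step, lam / L)
        approx[idx] += (D_L @ needles[i] - s_old).reshape(n, c)
    return needles

def ml_lobcod(X, dictionaries, patch_sizes, lams, iterations):
    """Repeat layer-wise sweeps; the code of each layer feeds the next layer."""
    N = X.shape[0]
    codes = [np.zeros((N, D.shape[1])) for D in dictionaries]
    for _ in range(iterations):
        layer_input = X
        for j, (D, n, lam) in enumerate(zip(dictionaries, patch_sizes, lams)):
            codes[j] = layer_sweep(layer_input, D, codes[j], lam, n)
            layer_input = codes[j]
    return codes

rng = np.random.default_rng(0)
N = 24
X = rng.standard_normal((N, 1))               # single-channel signal
D1 = rng.standard_normal((4 * 1, 6))          # layer 1: patches of 4 samples, 6 filters
D2 = rng.standard_normal((3 * 6, 8))          # layer 2: patches of 3 needles, 8 filters
codes = ml_lobcod(X, [D1, D2], patch_sizes=[4, 3], lams=[0.1, 0.1], iterations=10)
```

In this simplified scheme the first layer depends only on `X`, so its objective decreases monotonically across iterations, while deeper layers track an input that changes from one iteration to the next.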
The natural question that arises is whether the slice-based ML-LoBCoD algorithm is guaranteed to converge to the global minimum of the ML-BP problem, and the answer is positive as will be discussed in Section 5.
5. Convergence of Slice-Based ML-LoBCoD
In this section, we state the convergence theorem for the slice-based ML-LoBCoD algorithm and prove it.
Theorem 2. (convergence of ML-LoBCoD).
Given a signal $X$ and local dictionaries $\{D_{L,j}\}_{j=1}^{J}$, the slice-based ML-LoBCoD algorithm converges with convergence rate $O(1/k)$.
Proof. The definition of nonexpansive operators [15] is employed, which are guaranteed to converge to their fixed point. Firstly, the definition of internal operators and external operators is given as follows:

An operator $T$ is nonexpansive if it is Lipschitz continuous with constant 1, i.e., if $\|T(x) - T(y)\|_2 \le \|x - y\|_2$ for all $x$ and $y$. In addition, if an operator is firmly nonexpansive, then it must be nonexpansive. A firmly nonexpansive operator needs to satisfy the following condition:

$$\|T(x) - T(y)\|_2^2 \le \langle T(x) - T(y), x - y \rangle \quad \text{for all } x, y.$$

The proximal operator is firmly nonexpansive. The internal operator will now be analyzed as follows:

where s is chosen such that . Thus, is firmly nonexpansive.
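The two operator properties used in the proof can be checked numerically for the soft-thresholding operator, which is the proximal operator of the $\ell_1$ norm. The sketch below samples random pairs of points and tests both the firm nonexpansiveness inequality and the weaker nonexpansiveness inequality; the threshold value and sample sizes are arbitrary choices made for illustration.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

rng = np.random.default_rng(3)
t = 0.7
violations = 0
for _ in range(1000):
    x = 3.0 * rng.standard_normal(5)
    y = 3.0 * rng.standard_normal(5)
    d = soft_threshold(x, t) - soft_threshold(y, t)
    # firmly nonexpansive: ||T(x) - T(y)||^2 <= <T(x) - T(y), x - y>
    firmly = d @ d <= d @ (x - y) + 1e-12
    # nonexpansive: ||T(x) - T(y)|| <= ||x - y||
    nonexpansive = np.linalg.norm(d) <= np.linalg.norm(x - y) + 1e-12
    violations += not (firmly and nonexpansive)
print(violations)
```

The second inequality follows from the first by the Cauchy-Schwarz inequality, which is why firm nonexpansiveness implies nonexpansiveness.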
Similarly, it can be proved that is also a nonexpansive operator. Let be a positive constant that satisfies . We obtain

Therefore, the ML-LoBCoD algorithm is a nonexpansive operator and it converges to a fixed point.
Next, we analyze the fixed point which is defined as for the ML-LoBCoD algorithm. Using the second proximal theorem [16], we obtain

Assuming that there is , the above expression can be rewritten as

Using the second proximal theorem, we obtain and

Combining these equations together, we obtain

The fixed point satisfies the optimality condition of the convex function, so the algorithm will converge to a fixed point.
Next, the nonexpansive operator is used to analyze the convergence rate of this algorithm:

Summing all the inequalities above from , we can obtain

Therefore, we obtain

Finally, the results are obtained:

The needles $\alpha_{j,i}$ are convergent. Since the convolutional sparse code $\Gamma_j$ is an interleaved concatenation of all $\alpha_{j,i}$, $\Gamma_j$ also converges, and its convergence rate is the same as that of the needles. Thus, the ML-LoBCoD algorithm converges to the global optimum at a rate $O(1/k)$.
6. Experiment and Discussion
In this section, the LoBCoD and ML-LoBCoD algorithms are applied to image reconstruction. The resulting networks are inspired by ML-ISTA [11, 17]. For LoBCoD, we constructed a one-layer CSC model and iteratively unfolded the algorithm into a single-layer convolutional recurrent neural network. For ML-LoBCoD, we constructed a three-layer CSC model and iteratively unfolded the algorithm into a multilayer convolutional recurrent neural network. ML-LoBCoD-NET iterates 3 times to form a convolutional recurrent structure because the algorithm requires multiple iterations to obtain the optimal performance. In the forward pass of ML-LoBCoD, the images are input to the ML-LoBCoD network to obtain the coding, which is then classified by the classification layer. In the backward pass, the total loss is minimized to update the convolutional dictionaries by the backpropagation algorithm. The loss function is the negative log-likelihood, which measures the classification error between the ground truth labels and the labels predicted by the network.
6.2. Experimental Results
In this section, we perform experiments on the MNIST dataset using both the LoBCoD and ML-LoBCoD models for image reconstruction. We constructed a one-layer CSC model and a three-layer CSC model. The one-layer CSC model contains 64 local filters of size 6 × 6. In the three-layer CSC model, the first convolutional layer includes 64 local filters of size 6 × 6, the second convolutional layer contains 128 filters of size 6 × 6, and the third convolutional layer contains 512 filters of size 4 × 4. Its parameters are the same as those of a traditional CNN, and thus the number of network parameters remains unchanged.
The MNIST dataset was selected because it is one of the most popular and most frequently used image datasets and serves as an entry-level benchmark in the deep learning community.
Experiments on MNIST using the proposed networks were run on the PyTorch platform under Linux on a single computer with an NVIDIA GeForce GTX 1080 Ti GPU. The code used to support the findings of this paper was inspired by the ML-ISTA implementation [11] on GitHub and is available from the corresponding author upon request.
The loss function values of LoBCoD and ML-LoBCoD as a function of the number of iterations on the MNIST dataset are shown in Figure 1. The loss function values of ML-LoBCoD are lower than those of LoBCoD.
The original test images and the images reconstructed by the LoBCoD method and the ML-LoBCoD method after 100 iterations are shown in Figures 2 and 3, respectively. It can be clearly seen that the reconstruction quality of ML-LoBCoD is better than that of LoBCoD.
The loss function value, the training time, and the peak signal-to-noise ratio (PSNR) value of the two networks after 100 iterations are shown in Table 1. It can be observed that the loss function value of the ML-LoBCoD network is smaller than that of the LoBCoD network, while its PSNR value is higher. The loss function value of the ML-LoBCoD network is $3.03 \times 10^{-6}$, and its PSNR value is 20.15 dB. However, the ML-LoBCoD network has a longer training time than the LoBCoD network.
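For reference, the PSNR reported above is conventionally computed from the mean squared error between the original and reconstructed images. The following is a minimal sketch of that standard definition; it assumes pixel intensities scaled to the range $[0, 1]$ (so the peak value is 1), which may differ from the scaling used in the experiments, and the function name `psnr` is a hypothetical helper.

```python
import numpy as np

def psnr(reference, estimate, peak=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(peak^2 / MSE)."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")            # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

# Example: a 28 x 28 image and a copy with a constant error of 0.1 per pixel.
original = np.zeros((28, 28))
reconstructed = np.full((28, 28), 0.1)
value_db = psnr(original, reconstructed)
```

A higher PSNR corresponds to a smaller mean squared error, so a better reconstruction yields a larger value.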
The ML-LoBCoD network has more model parameters than the LoBCoD network. The model parameters of the two networks are shown in Table 2. The model parameters of ML-ISTA and the corresponding CNN are given in [11]. It can be observed that ML-ISTA, ML-LoBCoD, and the corresponding CNN have similar numbers of parameters, which are larger than that of the LoBCoD network.
The classification accuracy results of the four models are shown in Table 3. It can be observed that ML-LoBCoD has the highest accuracy, far superior to that of LoBCoD. These experimental results demonstrate the advantage of the proposed method.
7. Conclusions
In this paper, a convergence theorem for the LoBCoD algorithm was presented, and it was proved that this method converges to the global optimum at a rate of $O(1/k)$. Inspired by ML-CSC and the slice-based local processing idea, a slice-based ML-CSC model was proposed. Motivated by ML-ISTA and the LoBCoD algorithm, an ML-LoBCoD algorithm was proposed, and it was established that this method is guaranteed to converge to the global optimum of ML-BP at a rate of $O(1/k)$. The experiments compared the loss function value, training time, and PSNR of the two networks and showed that the reconstruction quality of the ML-LoBCoD network is better than that of the LoBCoD network. However, this paper only studies the pursuit problem of ML-CSC; the dictionary learning problem of CSC needs further study. This paper is a preliminary study of the ML-LoBCoD network, and further work is needed to obtain a more complete analysis, which will likely contribute to the understanding of deep learning models derived from optimization algorithms.
Data Availability
The data used to support the findings of this study are available from the website http://yann.lecun.com/exdb/mnist/.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
The authors thank the National Natural Science Foundation of China (Grant 61473339) and Special Projects on Basic Research Cooperation of Beijing, Tianjin and Hebei (Grant nos. 19JCZDJC65600Z and F2019203583).
References
[1] F. I. Miertoiu and B. Dumitrescu, “Feasibility pump algorithm for sparse representation under laplacian noise,” Mathematical Problems in Engineering, vol. 2019, Article ID 5615243, 12 pages, 2019.
[2] L. H. Fernando, F. V. Luis, and M. M. I. Sarria, “Detecting image brush editing using the discarded coefficients and intentions,” International Journal of Interactive Multimedia and Artificial Intelligence, vol. 5, no. 5, pp. 15–21, 2019.
[3] J. Wright, Y. Ma, J. Mairal, G. Sapiro, T. S. Huang, and S. Yan, “Sparse representation for computer vision and pattern recognition,” Proceedings of the IEEE, vol. 98, no. 6, pp. 1031–1044, 2010.
[4] M. Ribeiro and A. Gomes, “Contour enhancement algorithm for improving visual perception of deutan and protan dichromats,” International Journal of Interactive Multimedia and Artificial Intelligence, vol. 5, no. 5, pp. 79–88, 2019.
[5] V. Papyan, J. Sulam, and M. Elad, “Working locally thinking globally: theoretical guarantees for convolutional sparse coding,” IEEE Transactions on Signal Processing, vol. 65, no. 21, pp. 5687–5701, 2017.
[6] H. Bristow, A. Eriksson, and S. Lucey, “Fast convolutional sparse coding,” in Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition, pp. 391–398, Portland, OR, USA, June 2013.
[7] B. Wohlberg, “Efficient convolutional sparse coding,” in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 7173–7177, IEEE, Florence, Italy, May 2014.
[8] V. Papyan, Y. Romano, J. Sulam, and M. Elad, “Convolutional dictionary learning via local processing,” in Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 5296–5304, IEEE, Venice, Italy, October 2017.
[9] E. Zisselman, J. Sulam, and M. Elad, “A local block coordinate descent algorithm for the convolutional sparse coding model,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, Long Beach, CA, USA, June 2019.
[10] J. Sulam, V. Papyan, Y. Romano, and M. Elad, “Multi-layer convolutional sparse modeling: pursuit and dictionary learning,” IEEE Transactions on Signal Processing, vol. 66, no. 15, 2018.
[11] J. Sulam, V. Papyan, Y. Romano, and M. Elad, “On multi-layer basis pursuit, efficient algorithms and convolutional neural networks,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 42, no. 8, pp. 1968–1980, 2020.
[12] V. Papyan, Y. Romano, and M. Elad, “Convolutional neural networks analyzed via convolutional sparse coding,” The Journal of Machine Learning Research, vol. 18, no. 1, pp. 2887–2938, 2017.
[13] A. Beck and L. Tetruashvili, “On the convergence of block coordinate descent type methods,” SIAM Journal on Optimization, vol. 23, no. 4, pp. 2037–2060, 2013.
[14] P. Tseng, “Convergence of a block coordinate descent method for nondifferentiable minimization,” Journal of Optimization Theory and Applications, vol. 109, no. 3, pp. 475–494, 2001.
[15] H. H. Bauschke, S. M. Moffat, and X. Wang, “Firmly nonexpansive mappings and maximally monotone operators: correspondence and duality,” Set-Valued and Variational Analysis, vol. 20, no. 1, pp. 131–153, 2012.
[16] A. Beck, The Proximal Operator, First-Order Methods in Optimization, MOS-SIAM Series on Optimization, Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, USA, 2017.
[17] J. Wang, J. Xia, Q. Yang, and Y. Zhang, “Research on semi-supervised sound event detection based on mean teacher models using ML-LoBCoD-NET,” IEEE Access, vol. 8, pp. 38032–38044, 2020.
Copyright © 2020 Jing Li et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.