DFAN: Dual Feature Aggregation Network for Lightweight Image Super-Resolution

Li, Shang; Zhang, Guixuan; Luo, Zhengxiong; Liu, Jie

doi:https://doi.org/10.1155/2022/8116846

Wireless Communications and Mobile Computing


Special Issue

Conference Issue: Intelligent Media Computing Technology and Applications for Mobile Internet


Research Article | Open Access

Volume 2022 | Article ID 8116846 | https://doi.org/10.1155/2022/8116846


Academic Editor: Ming Yan

Received 12 Oct 2021

Revised 12 Nov 2021

Accepted 06 Dec 2021

Published 24 Jan 2022

Abstract

With the power of deep learning, super-resolution (SR) methods have enjoyed a dramatic boost in performance. However, they usually have a large model size and high computational complexity, which hinders their deployment on devices with limited memory and computing power. Some lightweight SR methods tackle this problem by directly designing shallower architectures, but this adversely affects the representation capability of convolutional neural networks. To address this issue, we propose a dual feature aggregation strategy for image SR. It enhances feature utilization via feature reuse, which substantially improves the representation ability while introducing only marginal computational cost. Thus, with the dual feature aggregation strategy, a smaller model can achieve better cost-effectiveness. Specifically, the strategy consists of a Local Aggregation Module (LAM) and a Global Aggregation Module (GAM), which work together to adaptively fuse hierarchical features along the channel and spatial dimensions. In addition, we propose a compact basic building block that compresses the model size and extracts hierarchical features more efficiently. Extensive experiments suggest that the proposed network performs favorably against state-of-the-art SR methods in terms of visual quality, memory footprint, and computational complexity.

1. Introduction

Single image super-resolution (SISR) aims to reconstruct a visually natural high-resolution (HR) image from its low-resolution (LR) counterpart, which is an inherently ill-posed inverse problem. Owing to its essential role in video processing [1], surveillance systems [2], and object restoration [3], super-resolution (SR) remains an active research area.

Recently, deep learning-based image super-resolution methods [4–7] have shown prominent performance over conventional methods such as bicubic interpolation and Lanczos resampling. Since residual learning [8] was proposed to simplify the optimization of deep convolutional neural networks (CNNs), SR networks have tended to become ever deeper and larger. However, it is impractical to pursue performance gains without considering model size and computational complexity. For devices with limited memory and battery capacity, cost-effective methods are preferred, which motivates the design of lightweight SR models. To reduce the number of parameters, some approaches adopt recursive learning or parameter sharing [9, 10]. However, to compensate for the resulting performance drop, these methods have to increase the network width or depth, which leads to high computational complexity, as shown in Figure 1. Other methods directly design shallower network architectures, which reduce parameters and computation simultaneously. For example, [11, 12] are compact models with fewer than 40 layers. However, their representation ability is restricted by the shallow architecture.

To address these drawbacks, we propose the Dual Feature Aggregation Network (DFAN), which strikes a better trade-off between SR performance and computational cost, as illustrated in Figure 1. The key component of DFAN is the dual feature aggregation strategy. It aggregates local and global features in a coarse-to-fine manner and substantially improves feature utilization via feature reuse. Specifically, the dual feature aggregation strategy consists of two modules: the Local Aggregation Module (LAM) and the Global Aggregation Module (GAM). LAM uses an efficient connection method and a single convolutional layer to adaptively fuse hierarchical features along the channel dimension. GAM then further fuses the locally aggregated features along the spatial dimension in an iterative manner. This progressive aggregation strategy fully leverages all hierarchical features, enabling the lightweight model to achieve better SR performance. We also design an Efficient Convolutional Block (ECB) as the basic building block of DFAN; it comprises group convolutional layers with a channel shuffle operation. Although ECB is compact, DFAN still achieves competitive results with the help of the dual feature aggregation strategy.

In summary, our main contributions are as follows:
(i) We propose DFAN, which achieves better SR performance with limited computational cost and is therefore more practical for real applications.
(ii) We propose the dual feature aggregation strategy, which aggregates local and global features in a progressive manner. It makes full use of all hierarchical features through feature reuse, enhancing feature utilization while introducing only marginal computational cost. With this strategy, a lightweight SR model can achieve better cost-effectiveness.
(iii) We propose ECB as the basic building block, which extracts hierarchical features in a computationally economical way.
(iv) We show through extensive experiments that our model achieves competitive results against state-of-the-art methods with relatively fewer parameters and computations.

2. Related Work

2.1. Lightweight SISR

Since Dong et al. [4] first applied CNNs to SR with the Super-Resolution Convolutional Neural Network (SRCNN) and achieved significant improvement, deep learning-based SISR methods have been actively explored and have shown great advantages in representation capability. To obtain more powerful features for image reconstruction, these methods keep enlarging the model size or network depth. Many recent SR methods have hundreds of convolutional layers, such as the Residual Channel Attention Network (RCAN) [7], the Residual Dense Network (RDN) [13], and the Deep Alternating Network (DAN) [14]. However, these methods are too computationally expensive for real applications, so more and more lightweight SR methods have been proposed. The Deep Recursive Residual Network (DRRN) [9] and the Memory Network (MemNet) [10] introduce recursive learning or weight sharing to reduce parameters. However, they need to increase the computational complexity to compensate for the performance drop. Another idea is to build relatively shallow models, which cut down the model size and computation at the same time. The Cascading Residual Network (CARN) [11], the Information Distillation Network (IDN) [12], and the Information Multi-Distillation Network (IMDN) [15] are all lightweight networks with fewer than 40 layers. However, the shallow architecture can restrict their representation ability to some extent. In our method, we improve feature utilization through dual feature aggregation, which better balances SR performance and computational cost.

2.2. Group Convolution

There has been rising interest in designing small and efficient neural networks [16–19], since many deep and complicated networks are infeasible in practical applications. Group convolution is an important tool for designing efficient neural networks. Its use dates back to [20], where the model was distributed over two GPUs, which yielded gains in accuracy and convergence speed. Depthwise convolution, originally introduced in [21], is a special case of group convolution in which the number of groups equals the number of channels. Based on depthwise convolution, MobileNet [18] achieves state-of-the-art results among lightweight models on many visual tasks. Group convolution and depthwise convolution were later generalized in a novel form in [22], which also proposed the channel shuffle operation to overcome a side effect of stacked group convolutions: each output group depends only on the corresponding input group, which blocks information flow between channel groups. Recently, group convolution has been used in some lightweight image super-resolution methods. Ahn et al. [11] proposed an efficient residual block containing group convolutional layers, and Hui et al. [12] introduced group convolution into some specific layers. However, the reconstruction performance of these two models still leaves room for improvement. In DFAN, group convolution is used as the basic building unit without sacrificing reconstruction performance.
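The channel shuffle operation of [22] is a fixed permutation of channels: view the channels as a (groups, channels per group) grid, transpose it, and flatten it back. The NumPy sketch below illustrates this for a single feature map in (C, H, W) layout; the function name `channel_shuffle` and the layout are our own choices for illustration, not code from [22] or from DFAN.

```python
import numpy as np


def channel_shuffle(x: np.ndarray, groups: int) -> np.ndarray:
    """Interleave channels across groups (ShuffleNet-style permutation).

    x has shape (C, H, W) with C divisible by `groups`. After shuffling, each
    consecutive block of `groups` channels contains one channel from every
    original group, so a following group convolution mixes information
    across all groups.
    """
    c, h, w = x.shape
    if c % groups != 0:
        raise ValueError("channel count must be divisible by groups")
    return (
        x.reshape(groups, c // groups, h, w)  # (g, C/g, H, W) grid of channels
        .transpose(1, 0, 2, 3)                # swap the group and in-group axes
        .reshape(c, h, w)                     # flatten back to C channels
    )


# Six channels in two groups: [0 1 2 | 3 4 5] are reordered as [0 3 1 4 2 5].
feat = np.arange(6, dtype=np.float32).reshape(6, 1, 1)
print(channel_shuffle(feat, groups=2).ravel())
```

Because the operation only permutes channels, it adds no parameters and no multiply-adds, which matches the claim about channel shuffle in Section 3.2.1.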

2.3. Deep Feature Aggregation

As the feature representation capability of a single network layer is limited [23, 24], deep feature aggregation is typically used to fuse features from different layers, which improves representation capability in a computationally economical way. For instance, the Densely Connected Network (DenseNet) [25] and the Feature Pyramid Network (FPN) [26] are the dominant architectures for semantic feature aggregation and spatial feature aggregation, respectively [27]. DenseNet better propagates features and gradients through dense connections that connect each layer to every other layer in a feed-forward fashion. FPN equalizes resolution and standardizes semantics across the levels of a pyramidal feature hierarchy through top-down and lateral connections. Besides, the Residual Network (ResNet) [8] is also a typical feature aggregation method, which aggregates features via simple elementwise summation. Recently, Yu et al. [28] proposed iterative and hierarchical aggregation methods, which further improve the performance of these dominant architectures on many visual tasks. Inspired by this work, we introduce an iterative and adaptive global feature aggregation module into DFAN to obtain more comprehensive information and improve reconstruction performance.

3. Proposed Method

3.1. Network Architecture

As depicted in Figure 2(a), DFAN mainly consists of four parts: the shallow feature extraction layer, stacked local feature aggregation modules, the global feature aggregation module, and the upsampling module.


The shallow feature extraction layer contains only one convolutional layer, $H_{SF}(\cdot)$, which extracts shallow features $F_0 = H_{SF}(I_{LR})$ from the LR image $I_{LR}$. Then, $F_0$ is fed into the stacked LAMs for global residual learning. There are $M$ stacked LAMs, and the local aggregated feature from the $m$-th LAM can be formulated as

$$L_m = H_{LAM}^{m}(L_{m-1}) = H_{LAM}^{m}\left(H_{LAM}^{m-1}\left(\cdots H_{LAM}^{1}(F_0)\cdots\right)\right), \quad L_0 = F_0,$$

where $H_{LAM}^{m}(\cdot)$ refers to the operation of the $m$-th LAM and $L_m$ is the local aggregated feature from it. As shown in Figure 2, each LAM is composed of a series of ECBs; therefore, $H_{LAM}^{m}(\cdot)$ can be viewed as a composite function.

After that, GAM fully leverages the local aggregated features from all LAMs in an iterative way, which can be expressed as

$$G = H_{GAM}(L_1, L_2, \ldots, L_M),$$

where $G$ is the global aggregated feature and $H_{GAM}(\cdot)$ denotes the operation of GAM. Then, the global long skip connection adds $F_0$ to $G$, obtaining the final aggregated feature $F = G + F_0$. The global skip connection better propagates information and gradients, thus stabilizing the training of DFAN.

Finally, we use the upscaling module proposed in [29] to restore the final SR image $I_{SR}$ from $F$. This reconstruction stage involves a group convolution $H_{GC}(\cdot)$, a standard convolution $H_{C}(\cdot)$, and the upscaling module $H_{UP}(\cdot)$.
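The top-level data flow of Section 3.1 (shallow features, stacked LAMs, GAM, global skip connection, reconstruction) can be sketched as below. This is a minimal NumPy illustration under stated assumptions: every module is passed in as a caller-supplied stand-in, the names `dfan_forward` and `pixel_shuffle` are hypothetical, and the `reconstruct` callable lumps together the tail convolutions and the upscaling module without fixing their internal order. `pixel_shuffle` implements the sub-pixel rearrangement of [29], mapping a (C·r², H, W) tensor to (C, H·r, W·r) with the same index convention as PyTorch's `torch.nn.PixelShuffle`.

```python
from typing import Callable, List, Sequence

import numpy as np

Array = np.ndarray


def pixel_shuffle(x: Array, r: int) -> Array:
    """Rearrange (C*r*r, H, W) into (C, H*r, W*r), i.e. sub-pixel upsampling."""
    c_r2, h, w = x.shape
    if c_r2 % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    c = c_r2 // (r * r)
    # out[c, h*r + i, w*r + j] = x[c*r*r + i*r + j, h, w]
    return x.reshape(c, r, r, h, w).transpose(0, 3, 1, 4, 2).reshape(c, h * r, w * r)


def dfan_forward(
    lr: Array,
    shallow: Callable[[Array], Array],
    lams: Sequence[Callable[[Array], Array]],
    gam: Callable[[List[Array]], Array],
    reconstruct: Callable[[Array], Array],
) -> Array:
    """Top-level data flow of DFAN; every module is a caller-supplied stand-in."""
    f0 = shallow(lr)                # shallow features F_0
    local_feats: List[Array] = []
    x = f0
    for lam in lams:                # L_m = H_LAM^m(L_{m-1}), with L_0 = F_0
        x = lam(x)
        local_feats.append(x)
    g = gam(local_feats)            # G = H_GAM(L_1, ..., L_M)
    f = g + f0                      # global long skip connection: F = G + F_0
    return reconstruct(f)           # tail convolutions + upscaling module


# Toy run with trivial stand-ins: 12 feature channels become a 3-channel x2 image.
rng = np.random.default_rng(0)
lr_img = rng.standard_normal((12, 4, 4))
sr = dfan_forward(
    lr_img,
    shallow=lambda x: x,
    lams=[lambda x: 0.5 * x] * 3,
    gam=lambda feats: feats[-1],
    reconstruct=lambda f: pixel_shuffle(f, r=2),
)
print(sr.shape)
```

Passing the modules in as callables keeps the sketch independent of any deep learning framework; in a real model each stand-in would be a trained layer.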

3.2. Local Feature Aggregation

Since features from different layers carry information of different importance, adaptively aggregating all hierarchical features can effectively improve the representation ability. Following [28], the two key axes of feature fusion are semantic and spatial, which are closely related to the channel and spatial dimensions, respectively. Thus, we propose the dual feature aggregation strategy, in which features are first aggregated locally along the channel dimension and then aggregated globally along the spatial dimension. In this subsection, we first explain the local feature aggregation.

3.2.1. Efficient Convolutional Block

As depicted in Figure 2(c), ECB is the basic building block of LAM. ECB is a residual learning module consisting of two group convolutional layers with a channel shuffle operation [22] and a channel attention module [7]. Group convolution with channel shuffle extracts useful features in a computationally economical way. For a $k \times k$ group convolutional layer with $g$ groups, both the number of parameters and the computational cost are $1/g$ of those of a $k \times k$ standard convolutional layer with the same numbers of input and output channels. Moreover, the channel shuffle operation enhances information exchange among channels without extra parameters or computation. There are $N$ ECBs in each LAM, and LAM fuses the hierarchical features from these ECBs by exploring the interchannel relationships. The local aggregated feature from the $m$-th LAM can be obtained by

$$L_m = H_{f}\left([E_{m,1}, E_{m,2}, \ldots, E_{m,N}]\right),$$

where $E_{m,n}$ is the output of the $n$-th ECB in the $m$-th LAM, $[\cdot]$ represents the concatenation of these $N$ local features, and $H_{f}(\cdot)$ is the fusion convolution of the balanced connection.
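The $1/g$ claim can be checked with a small weight count. In a group convolution each output channel only sees $C_{in}/g$ input channels, so the weight tensor shrinks by a factor of $g$; depthwise convolution is the extreme case $g = C$. The helper `conv_params` below is our own illustration; the 3×3 kernel size is an assumption, while 64 channels and 8 groups follow the DFAN configuration given in Section 4.2.

```python
def conv_params(k: int, c_in: int, c_out: int, groups: int = 1, bias: bool = False) -> int:
    """Number of learnable values in a k x k (group) convolution.

    Each of the c_out filters has shape (c_in // groups, k, k), because it only
    sees the input channels of its own group.
    """
    if c_in % groups or c_out % groups:
        raise ValueError("channels must be divisible by groups")
    weights = k * k * (c_in // groups) * c_out
    return weights + (c_out if bias else 0)


standard = conv_params(3, 64, 64)                # standard 3x3 conv, 64 -> 64
grouped = conv_params(3, 64, 64, groups=8)       # group conv with g = 8
depthwise = conv_params(3, 64, 64, groups=64)    # depthwise conv, g = C
print(standard, grouped, depthwise, standard // grouped)
```

Since multiply-adds per output pixel equal the number of weights for a stride-1 convolution, the same factor of $g$ applies to the computational cost.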

3.2.2. Balanced Connection

The connection method in LAM is what we call the balanced connection. As shown in Figure 3, compared with two connection methods commonly used in SR, i.e., skip connection and dense connection, our balanced connection is more flexible than skip connection and more lightweight than dense connection. The analysis is as follows:
(1) Difference from skip connection. As shown in Figure 3(b), if each LAM used only skip connections, which compute the elementwise sum of the hierarchical feature maps, all hierarchical features would contribute equally to the final aggregated feature. This may be inflexible, since different features contain information of different importance. Our balanced connection solves this issue with a single convolutional kernel, which assigns specific learned weights to each pixel of the local features and thus adaptively aggregates them along the channel dimension.
(2) Difference from dense connection. As shown in Figure 3(c), dense connection concatenates the outputs of each ECB and all preceding ECBs and compresses them as the input to every subsequent ECB, which requires more convolutional kernels and harms the overall efficiency. In contrast, our balanced connection directly connects each ECB to the aggregation layer, which not only fully uses the local features but also greatly reduces the number of parameters and computation operations.
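The relation between the balanced connection and a plain skip connection can be illustrated in NumPy: fuse the concatenated ECB outputs with one convolution and note that an elementwise sum is the special case in which every block receives an identity weight. The sketch assumes a 1×1 fusion kernel (the kernel size is not stated in this section), and the names `balanced_fusion`, `feats`, and the random weights are hypothetical stand-ins for trained layers.

```python
import numpy as np


def balanced_fusion(ecb_outputs, weight):
    """Fuse N ECB outputs with one 1x1 convolution over their concatenation.

    ecb_outputs: list of N arrays of shape (C, H, W).
    weight: array of shape (C, N*C), i.e. a 1x1 kernel mapping N*C -> C channels.
    """
    stacked = np.concatenate(ecb_outputs, axis=0)        # (N*C, H, W)
    return np.einsum("oc,chw->ohw", weight, stacked)    # same weights at every pixel


rng = np.random.default_rng(0)
n_blocks, channels = 3, 4
feats = [rng.standard_normal((channels, 5, 5)) for _ in range(n_blocks)]

# A skip-connection-style sum is the special case of identity weights for every block.
identity_blocks = np.concatenate([np.eye(channels)] * n_blocks, axis=1)
assert np.allclose(balanced_fusion(feats, identity_blocks), sum(feats))

# A learned kernel can instead weight every channel of every block differently.
learned = rng.standard_normal((channels, n_blocks * channels))
fused = balanced_fusion(feats, learned)
print(fused.shape, learned.size)  # learned.size == N * C * C fusion weights
```

Under this assumption the balanced connection costs $N C^2$ weights per LAM, since only one fusion layer is needed regardless of how the blocks are chained.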

Figure 3: (a) Balanced connection. (b) Skip connection. (c) Dense connection.

3.3. Global Feature Aggregation

The spatial dimension is orthogonal to the channel dimension, so further fusing the local aggregated features along the spatial dimension can supply complementary information. Besides, since the local aggregated features contain abundant information, it is suitable to aggregate them in a coarse-to-fine fashion. Therefore, we design GAM, which further fuses the local aggregated features with a spatial attention mechanism in an iterative manner.

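In Figure 2(d), $G_i$ represents the global aggregated feature in the $i$-th iteration, and $L_i$ represents the output of the $i$-th LAM. The iterative fusion of GAM can be formulated as

$$G_i = H_{A}(G_{i-1}, L_i), \quad i = 2, \ldots, M,$$

where $G_1$ is initialized with $L_1$, the output of the first LAM, and $H_{A}(\cdot)$ represents one global aggregation step of GAM (the A-Unit).

The main parts of GAM are (1) spatial attention generation and (2) iterative feature aggregation. First, the spatial attention $S_i$ is generated by

$$S_i = \sigma\left(W_{dw} * \left(W_{r} * [G_{i-1}, L_i]\right)\right),$$

where $[\cdot]$ denotes concatenation, $W_{r}$ denotes a convolutional kernel that reduces the channel number of $[G_{i-1}, L_i]$ by half, and $W_{dw}$ denotes a depthwise convolutional kernel that extracts spatial information. Depthwise convolution applies a single filter to each input channel, which is more efficient than standard convolution in terms of memory and computation. $\sigma(\cdot)$ is the sigmoid function, which constrains the spatial attention to $(0, 1)$. The spatial attention $S_i$ has the same size as $G_{i-1}$ and $L_i$. Second, as shown in Figure 2(d), the feature fusion in the A-Unit can be formulated as

$$G_i = S_i \odot L_i + (\mathbf{1} - S_i) \odot G_{i-1},$$

where $\odot$ denotes the Hadamard product and $\mathbf{1}$ is the tensor with all elements equal to 1.

After the last iteration, we obtain the final global aggregated feature $G = G_M$. Unrolling the iteration, the overall global aggregation can be summarized as

$$G_M = \sum_{i=1}^{M} A_i \odot L_i, \quad A_1 = \prod_{j=2}^{M} (\mathbf{1} - S_j), \quad A_i = S_i \odot \prod_{j=i+1}^{M} (\mathbf{1} - S_j) \ \ (2 \le i \le M), \tag{8}$$

where all products are taken elementwise, an empty product equals $\mathbf{1}$, and $A_i$ denotes the final spatial attention for $L_i$. Because each $S_j$ depends on $G_{j-1}$ and $L_j$, every $A_i$ is determined by all the local aggregated features from the LAMs and is thus highly comprehensive. Additionally, Eq. (8) indicates that the global feature aggregation satisfies the convex combination: every entry of each $A_i$ lies in $[0, 1]$, and the weights sum to one, i.e., $\sum_{i=1}^{M} A_i = \mathbf{1}$.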
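The iterative aggregation and its convex-combination property can be verified numerically. The NumPy sketch below assumes that the A-Unit computes $S_i = \sigma(W_{dw} * (W_r * [G_{i-1}, L_i]))$ and $G_i = S_i \odot L_i + (\mathbf{1} - S_i) \odot G_{i-1}$, with $W_r$ taken to be a 1×1 kernel and $W_{dw}$ a 3×3 depthwise kernel; both kernel sizes, the random weights, and all function names are our own assumptions standing in for trained layers. It then unrolls the iteration into per-LAM weights $A_i$ and checks that they sum to one and reproduce $G_M$.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def depthwise3x3(x, kernels):
    """Depthwise 3x3 cross-correlation with zero padding; x: (C, H, W), kernels: (C, 3, 3)."""
    _, h, w = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros_like(x)
    for di in range(3):
        for dj in range(3):
            out += kernels[:, di, dj, None, None] * xp[:, di:di + h, dj:dj + w]
    return out


def a_unit(g_prev, l_cur, w_reduce, w_dw):
    """One GAM step under the assumed rule G_i = S_i * L_i + (1 - S_i) * G_{i-1}."""
    cat = np.concatenate([g_prev, l_cur], axis=0)         # [G_{i-1}, L_i]: 2C channels
    reduced = np.einsum("oc,chw->ohw", w_reduce, cat)     # assumed 1x1 conv, 2C -> C
    s = sigmoid(depthwise3x3(reduced, w_dw))              # spatial attention S_i in (0, 1)
    return s * l_cur + (1.0 - s) * g_prev, s


def gam(local_feats, params):
    """Iterate the A-Unit over L_2..L_M starting from G_1 = L_1; return G_M and S_2..S_M."""
    g = local_feats[0]
    attns = []
    for l_cur, (w_reduce, w_dw) in zip(local_feats[1:], params):
        g, s = a_unit(g, l_cur, w_reduce, w_dw)
        attns.append(s)
    return g, attns


def effective_weights(attns):
    """Unrolled weights [A_1, ..., A_M] computed from the attentions S_2..S_M."""
    weights = []
    tail = np.ones_like(attns[0])      # running product of (1 - S_j) for j > i
    for s in reversed(attns):          # i = M, M-1, ..., 2
        weights.append(s * tail)
        tail = tail * (1.0 - s)
    weights.append(tail)               # A_1 = product of (1 - S_j) for j >= 2
    return weights[::-1]


rng = np.random.default_rng(0)
channels, height, width, num_lams = 4, 6, 6, 6
feats = [rng.standard_normal((channels, height, width)) for _ in range(num_lams)]
params = [
    (0.1 * rng.standard_normal((channels, 2 * channels)), 0.1 * rng.standard_normal((channels, 3, 3)))
    for _ in range(num_lams - 1)
]
g_final, attns = gam(feats, params)
weights = effective_weights(attns)
print(np.allclose(sum(weights), 1.0), np.allclose(g_final, sum(a * l for a, l in zip(weights, feats))))
```

Both checks print `True` regardless of the random weights, because the telescoping product makes the unrolled weights sum to one by construction; this is the convex-combination property expressed as code.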

4. Experiments

4.1. Experimental Setup

4.1.1. Datasets and Metrics

We use the training set of DIV2K [30] to train all of our models. For testing, we use five standard benchmark datasets: Set5 [31], Set14 [32], BSD100 [33], Urban100 [34], and Manga109 [35]. The quality of the SR results is evaluated with the peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) [36] on the Y channel (i.e., luminance) of the transformed YCbCr space. We also report the number of parameters and multiply-adds (MultAdds) to evaluate the memory footprint and computational complexity, respectively.
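PSNR on the luminance channel can be computed as follows. This sketch assumes 8-bit RGB inputs and the ITU-R BT.601 conversion used by MATLAB's `rgb2ycbcr`, a convention common in SR benchmarks; whether and how many border pixels are cropped before evaluation is not stated here, so `border` is an optional, hypothetical parameter.

```python
import numpy as np


def rgb_to_y(img):
    """Y channel of an 8-bit RGB image (H, W, 3) in BT.601 studio range [16, 235]."""
    img = img.astype(np.float64)
    return 16.0 + (65.481 * img[..., 0] + 128.553 * img[..., 1] + 24.966 * img[..., 2]) / 255.0


def psnr_y(sr, hr, border=0):
    """PSNR in dB between the Y channels of two uint8 RGB images of equal size."""
    y_sr, y_hr = rgb_to_y(sr), rgb_to_y(hr)
    if border > 0:  # optional border crop (an assumption, not specified in the text)
        y_sr = y_sr[border:-border, border:-border]
        y_hr = y_hr[border:-border, border:-border]
    mse = np.mean((y_sr - y_hr) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(255.0 ** 2 / mse)
```

Evaluating on Y rather than RGB ignores chroma errors, so scores computed under different conventions are not directly comparable.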

4.1.2. Degradation Models

To fully demonstrate the effectiveness of DFAN, we use two degradation models to simulate LR images. The first is the bicubic degradation model, which simulates LR images on scale , , and . The second is the blur-down degradation model, which blurs HR images with a Gaussian kernel with standard deviation . The blurred image is then downsampled on scale .
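The blur-down degradation can be sketched as a Gaussian blur followed by downsampling. In this minimal NumPy version, the standard deviation and scale are parameters because their values are not given here; the 7×7 default kernel size, reflect padding, and plain decimation (keeping every `scale`-th pixel) are our simplifying assumptions, and the exact benchmark protocol may downsample differently.

```python
import numpy as np


def gaussian_kernel(size, sigma):
    """Normalized 2D isotropic Gaussian kernel with odd side length `size`."""
    ax = np.arange(size) - (size - 1) / 2.0
    g1d = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    k = np.outer(g1d, g1d)
    return k / k.sum()


def blur_down(hr, sigma, scale, size=7):
    """Blur a single-channel (H, W) image with a Gaussian, then keep every `scale`-th pixel."""
    k = gaussian_kernel(size, sigma)
    pad = size // 2
    padded = np.pad(hr.astype(np.float64), pad, mode="reflect")
    h, w = hr.shape
    blurred = np.zeros((h, w))
    for i in range(size):          # explicit 2D filtering by shifted, weighted sums
        for j in range(size):
            blurred += k[i, j] * padded[i:i + h, j:j + w]
    return blurred[::scale, ::scale]
```

For color images, the same function would be applied to each channel separately.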

4.1.3. Training Details

The size of LR patches is . During training, we randomly rotate input images by , , or and flip them horizontally or vertically. The batch size is . We use the loss as the loss function and Adam as the optimizer. The initial learning rate is and is halved every 200 epochs. We train our model for 1000 epochs.
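The augmentation and learning-rate schedule described above can be sketched as follows. The rotation angles are assumed to be multiples of 90°, the base learning rate is a parameter because its value is not given here, and the function names are hypothetical. The same random transform must be applied to an LR patch and its HR counterpart so that they stay aligned.

```python
import numpy as np


def augment(lr_patch, hr_patch, rng):
    """Apply one random rotation (assumed multiple of 90 degrees) and random flips to an LR/HR pair (H, W, C)."""
    k = int(rng.integers(4))                 # number of quarter turns: 0, 1, 2 or 3
    hflip, vflip = rng.integers(2, size=2)   # independent horizontal/vertical flips
    out = []
    for p in (lr_patch, hr_patch):
        p = np.rot90(p, k, axes=(0, 1))
        if hflip:
            p = p[:, ::-1]
        if vflip:
            p = p[::-1]
        out.append(np.ascontiguousarray(p))
    return out


def step_lr(epoch, base_lr, step=200, gamma=0.5):
    """Learning rate halved every `step` epochs."""
    return base_lr * gamma ** (epoch // step)
```

Over 1000 epochs this schedule halves the learning rate four times (at epochs 200, 400, 600, and 800), so training ends at 1/16 of the initial value.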

4.2. Study on Efficient Convolutional Block

Unlike most super-resolution networks, DFAN uses group convolutional kernels instead of standard convolutional kernels in ECBs to extract features. Since group convolution is a basic operation of ECB, we design DFAN_W and DFAN_D to validate the effectiveness of ECB. These two models have the same structure as DFAN, except that the group convolutional kernels in ECBs are replaced with standard convolutional kernels. All three models have a similar number of parameters and computation operations, i.e., approximately 900 K and 60 G, respectively.

We denote the number of ECBs in each LAM as $N$, the number of LAMs as $M$, and the number of channels of each intermediate feature as $C$. For DFAN, we set $N$, $M$, and $C$ to 10, 6, and 64, respectively, and the number of groups of each group convolutional kernel in ECBs is 8. For DFAN_W, we set $N$, $M$, and $C$ to 3, 2, and 64, respectively; for DFAN_D, we set them to 10, 6, and 27, respectively. In other words, DFAN_W has the same width as DFAN, while DFAN_D has the same depth as DFAN.
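A rough weight count shows why the two standard-convolution variants must be either shallower or narrower. The sketch counts only the weights of the convolutions inside all ECBs, assuming two 3×3 convolutions per ECB and no bias; channel attention, fusion layers, GAM, and the upsampler are deliberately excluded, so the totals do not reproduce the ~900 K model sizes. The function name `ecb_conv_weights` is hypothetical.

```python
def ecb_conv_weights(n_ecb, n_lam, channels, groups, k=3, convs_per_ecb=2):
    """Weights of the convolutions inside all ECBs only (no attention, fusion, GAM, or upsampler)."""
    per_conv = k * k * (channels // groups) * channels
    return n_ecb * n_lam * convs_per_ecb * per_conv


configs = {
    "DFAN": dict(n_ecb=10, n_lam=6, channels=64, groups=8),
    "DFAN_W": dict(n_ecb=3, n_lam=2, channels=64, groups=1),
    "DFAN_D": dict(n_ecb=10, n_lam=6, channels=27, groups=1),
}
for name, cfg in configs.items():
    print(f"{name:7s} {ecb_conv_weights(**cfg):>9,d}")
```

Replacing group convolution with standard convolution multiplies each layer's weights by $g = 8$; DFAN_W compensates by cutting the number of ECBs from 60 to 6, while DFAN_D keeps all 60 ECBs and narrows each layer from 64 to 27 channels, which reduces per-layer weights by $(64/27)^2 \approx 5.6$.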

As shown in Table 1, group convolution achieves a better trade-off between representation capability and computational cost. Compared with standard convolution, group convolution allows the model to be deeper or wider under a limited parameter and computation budget, which is beneficial for obtaining richer hierarchical information.

4.3. Study on Dual Feature Aggregation

In this section, we experimentally investigate the effectiveness of the dual feature aggregation strategy. LAM0_GAM0 is the baseline network obtained by removing both the balanced connections in LAMs and the GAM from DFAN. LAM1_GAM0 is built by removing only GAM from DFAN. LAM1_GAM1 has both LAM and GAM and is identical to DFAN. As shown in Table 2, when only LAM is added, PSNR is improved by approximately . When both LAM and GAM are added, the performance is improved by a large margin (PSNR: on Set14).

4.3.1. LAM Analysis

To intuitively show the effectiveness of LAM, we plot the training curves of LAM0_GAM0 and LAM1_GAM0 in Figure 4(a). Benefiting from the balanced connection in LAM, gradients can be better propagated. The margin between the two curves indicates that LAM not only helps the network converge faster but also helps it converge to a better point. Additionally, the weight distribution is visualized in Figure 4(b), which shows how much each ECB in an LAM contributes to the local aggregated feature generated by that LAM. Features from different ECBs contribute differently to the local aggregated features, which suggests that LAM adaptively aggregates hierarchical features to improve the final performance.


4.3.2. GAM Analysis

We further show experimentally that GAM also works well for other networks. We use a shallower RCAN [7] as the baseline network (denoted as sRCAN). To facilitate network training, we set the number of residual groups (RGs) to 3 and the number of residual channel attention blocks (RCABs) to 5 for sRCAN. Then, we apply GAM to sRCAN, which is denoted as sRCAN_GAM. As Table 3 shows, with only a small increase in parameters and computational complexity (parameters: +9 K, MultAdds: +0.9 G), GAM significantly improves the SR performance on all the benchmark datasets with scaling factor . Therefore, GAM could be used as a general lightweight tool to improve the performance of existing SR methods.

To better understand the adaptive and iterative aggregation strategy of GAM, we visualize the spatial attention heatmaps generated by GAM in Figure 5. Each 3D attention tensor is transformed into a 2D map by taking the absolute mean along the channel dimension and is then normalized to over the spatial dimension. We make two observations. (1) The spatial attention for different LAMs focuses on regions of different frequencies. For example, the spatial attention for LAM_1 (Figure 5(a)) focuses on low-frequency regions such as the background, whereas the spatial attention for LAM_6 (Figure 5(f)) focuses more on high-frequency regions with rich textures. This suggests that both high-frequency and low-frequency information are important for SR. (2) Even the attention maps that focus on high-frequency regions emphasize different parts: in LAM_6, more attention is given to regions of the main object, whereas in LAM_5, high-frequency regions in the background are emphasized. This indicates that GAM provides additional flexibility to deal with different types of information, which could enhance the representation capability.
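The heatmap construction described above (absolute mean over channels, then normalization over the spatial dimension) can be sketched as follows. The normalization target is assumed to be [0, 1] via min-max scaling, since the exact range is not given here; `attention_heatmap` is a hypothetical name.

```python
import numpy as np


def attention_heatmap(attn):
    """Collapse a (C, H, W) attention tensor into an (H, W) map min-max scaled to [0, 1]."""
    heat = np.abs(attn).mean(axis=0)       # absolute mean along the channel dimension
    lo, hi = heat.min(), heat.max()
    if hi == lo:                           # constant map: avoid division by zero
        return np.zeros_like(heat)
    return (heat - lo) / (hi - lo)
```

Min-max scaling makes each map use the full color range, so heatmaps for different LAMs show where attention is relatively strong rather than its absolute magnitude.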

4.4. Results with Bicubic Degradation Model

We compare DFAN with other state-of-the-art methods: SRCNN [4], Fast Super-Resolution Convolutional Neural Network (FSRCNN) [37], Very Deep Super-Resolution (VDSR) [5], Deeply-Recursive Convolutional Network (DRCN) [38], DRRN [9], MemNet [10], CARN [11], IDN [12], and IMDN [15].

4.4.1. Quantitative Results

We evaluate the average PSNR and SSIM on five benchmark datasets. We also calculate the number of parameters and multiply-adds of these models, assuming an HR image size of 720p (1280 × 720). In Table 4, the proposed DFAN performs favorably against these methods on all benchmark datasets for , , and SR. Note that the number of parameters of our method differs across scales because we apply the pixel shuffle operation [29] for upscaling, and the convolutional kernels in the upscaling module have different sizes for different scales. CARN [11] used to be a strong baseline for lightweight SR models, but DFAN outperforms it by a large margin (PSNR: , SSIM: +0.0024 on Set5) with fewer parameters and fewer MultAdds on scale . This indicates that our method achieves a better trade-off between computational cost and effectiveness, and suggests that feature aggregation is a promising direction for lightweight image SR.
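The two accounting rules mentioned above can be illustrated with simple arithmetic: MultAdds are measured on the LR input implied by a 1280 × 720 HR output, and a sub-pixel upsampler's parameter count grows with the square of the scale. This sketch assumes one multiply-add per weight per output pixel, integer division for the LR size, and a single 3×3 convolution producing 3·s² maps before pixel shuffle; that upsampler layout is a hypothetical example (a ×4 upsampler might instead use two ×2 stages), not the DFAN configuration.

```python
def lr_size(scale, hr_w=1280, hr_h=720):
    """LR input width and height implied by a 720p HR output."""
    return hr_w // scale, hr_h // scale


def conv_multadds(k, c_in, c_out, h, w, groups=1):
    """Multiply-adds of a stride-1, same-padded k x k convolution on an h x w map."""
    return k * k * (c_in // groups) * c_out * h * w


def subpixel_upsampler_params(channels, scale, out_channels=3, k=3):
    """Weights of one k x k conv producing out_channels * scale**2 maps before pixel shuffle."""
    return k * k * channels * out_channels * scale ** 2


for s in (2, 3, 4):
    w, h = lr_size(s)
    gmacs = conv_multadds(3, 64, 64, h, w, groups=8) / 1e9
    print(f"x{s}: LR {w}x{h}, one grouped 3x3 conv = {gmacs:.2f} G MultAdds, "
          f"upsampler conv = {subpixel_upsampler_params(64, s):,} weights")
```

Because the LR input shrinks with the scale while the upsampler widens, both MultAdds and parameters vary across scales even when the body of the network is unchanged.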

4.4.2. Visual Results

In Figure 6, we show visual comparisons on scale . Our method restores the letter “g” in “ppt3” more clearly, while most other methods suffer from artifacts or edge distortion. For “img030” in Urban100 and “img86000” in BSD100, most methods fail to reconstruct the contour of the window well, whereas our method reconstructs these edges better.

4.5. Results with Blur-Down Degradation Model

As mentioned in Section 4.1.2, we further apply our method to super-resolve images with blur-down degradation, which is commonly used in [7, 13]. We compare DFAN with SRCNN [4], FSRCNN [37], VDSR [5], CARN [11], IDN [12], and IMDN [15].

As shown in Table 5, compared with networks built by stacking several elaborately designed building blocks, such as CARN, IDN, and IMDN, our lightweight network with the dual feature aggregation strategy better leverages the hierarchical features. In addition, the visual comparison in Figure 7 also demonstrates the superiority of our method.

5. Conclusions

We propose DFAN, which strikes a better trade-off between SR performance and computational cost. The proposed dual feature aggregation strategy performs local and global feature aggregation adaptively. Through feature reuse, it simultaneously improves feature utilization and representation ability. Benefiting from the dual feature aggregation strategy, our network achieves competitive performance with fewer parameters and lower computational complexity, which makes it more practical for real applications.

Data Availability

The image datasets supporting this work are from previously reported studies and datasets, which have been cited. The processed data are available at the BasicSR repository (https://github.com/xinntao/BasicSR/blob/master/docs/DatasetPreparation.md#Image-Super-Resolution).

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the National Key R&D Program of China (2019YFB1406200) and is also a research achievement of the Key Laboratory of Digital Rights Services. It is based on our previous work, Lightweight Image Super-Resolution via Dual Feature Aggregation Network, presented in 2021 at the 2nd International Conference on Culture-oriented Science & Technology (ICCST).

References

X. Wang, K. C. Chan, K. Yu, C. Dong, and C. Change Loy, “Edvr: video restoration with enhanced deformable convolutional networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Long Beach California, USA, 2019.
View at: Google Scholar
J. Kim, G. Li, I. Yun, C. Jung, and J. Kim, “Edge and identity preserving network for face super-resolution,” Neurocomputing, vol. 446, pp. 11–22, 2021.
View at: Publisher Site | Google Scholar
J. Peng, K. Fu, Q. Wei, Y. Qin, and Q. He, “Improved multiview decomposition for single-image high-resolution 3D object reconstruction,” Wireless Communications and Mobile Computing, vol. 2020, 14 pages, 2020.
View at: Publisher Site | Google Scholar
C. Dong, C. C. Loy, K. He, and X. Tang, “Image super-resolution using deep convolutional networks,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 2, pp. 295–307, 2016.
View at: Google Scholar
J. Kim, J. Kwon Lee, and K. Mu Lee, “Accurate image super-resolution using very deep convolutional networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1646–1654, Las Vegas Nevada, USA, 2016.
View at: Google Scholar
B. Lim, S. Son, H. Kim, S. Nah, and K. Mu Lee, “Enhanced deep residual networks for single image super resolution,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 136–144, Honolulu Hawaii, USA, 2017.
View at: Google Scholar
Y. Zhang, K. Li, K. Li, L. Wang, B. Zhong, and Y. Fu, “Image super-resolution using very deep residual channel attention networks,” in Proceedings of the European Conference on Computer Vision, pp. 286–301, 2018.
View at: Google Scholar
K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778, Las Vegas Nevada, USA, 2016.
View at: Google Scholar
Y. Tai, J. Yang, and X. Liu, “Image super-resolution via deep recursive residual network,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3147–3155, Honolulu, Hawaii, 2017.
View at: Google Scholar
Y. Tai, J. Yang, X. Liu, and C. Xu, “Memnet: a persistent memory network for image restoration,” in Proceedings of the IEEE International Conference on Computer Vision, pp. 4539–4547, Venice, Italy, 2017.
View at: Google Scholar
N. Ahn, B. Kang, and K.-A. Sohn, “Fast, accurate, and lightweight super-resolution with cascading residual network,” in Proceedings of the European Conference on Computer Vision, pp. 252–268, Munich, Germany, 2018.
View at: Google Scholar
Z. Hui, X. Wang, and X. Gao, “Fast and accurate single image super-resolution via information distillation network,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 723–731, Salt Lake City Utah, 2018.
View at: Google Scholar
Y. Zhang, Y. Tian, Y. Kong, B. Zhong, and Y. Fu, “Residual dense network for image super-resolution,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2472–2481, Salt Lake City Utah, 2018.
View at: Google Scholar
Z. Luo, Y. Huang, S. Li, L. Wang, and T. Tan, “Unfolding the alternating optimization for blind super resolution,” Advances in Neural Information Processing Systems (NeurIPS), vol. 33, 2020.
View at: Google Scholar
Z. Hui, X. Gao, Y. Yang, and X. Wang, “Lightweight image super-resolution with information multidistillation network,” in Proceedings of the 27th ACM International Conference on Multimedia, pp. 2024–2032, Nice, France, 2019.
View at: Google Scholar
M. Wang, B. Liu, and H. Foroosh, “Factorized convolutional neural networks,” in Proceedings of the IEEE International Conference on Computer Vision, pp. 545–553, Venice, Italy, 2017.
View at: Google Scholar
F. N. Iandola, S. Han, M. W. Moskewicz, K. Ashraf, W. J. Dally, and K. Keutzer, “Squeezenet: Alexnet-level accuracy with 50x fewer parameters and<0.5 mb model size,” 2016, https://arxiv.org/abs/1602.07360.
View at: Google Scholar
A. G. Howard, M. Zhu, B. Chen et al., “Mobilenets: efficient convolutional neural networks for mobile vision applications,” 2017, https://arxiv.org/abs/1704.04861.
View at: Google Scholar
J. Wu, C. Leng, Y. Wang, Q. Hu, and J. Cheng, “Quantized convolutional neural networks for mobile devices,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4820–4828, Las Vegas Nevada, USA, 2016.
View at: Google Scholar
A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” Advances in Neural Information Processing Systems, vol. 25, pp. 1097–1105, 2012.
View at: Google Scholar
L. Sifre and S. Mallat, “Rigid-motion scattering for texture classification,” 2014, https://arxiv.org/abs/1403.1687.
View at: Google Scholar
X. Zhang, X. Zhou, M. Lin, and J. Sun, “Shufflenet: an extremely efficient convolutional neural network for mobile devices,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856, Salt Lake City Utah, USA, 2018.
View at: Google Scholar
M. D. Zeiler and R. Fergus, “Visualizing and understanding convolutional networks,” in European Conference on Computer Vision, pp. 818–833, Springer, Cham, 2014.
View at: Google Scholar
J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431–3440, Boston Massachusetts, USA, 2015.
View at: Google Scholar
G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, “Densely connected convolutional networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4700–4708, Honolulu Hawaii, USA, 2017.
View at: Google Scholar
T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie, “Feature pyramid networks for object detection,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2117–2125, 2017.
J. Liu, R. Jia, W. Li, F. Ma, and X. Wang, “Image dehazing method of transmission line for unmanned aerial vehicle inspection based on densely connection pyramid network,” Wireless Communications and Mobile Computing, vol. 2020, Article ID 8857271, 9 pages, 2020.
F. Yu, D. Wang, E. Shelhamer, and T. Darrell, “Deep layer aggregation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2403–2412, Salt Lake City, Utah, USA, 2018.
W. Shi, J. Caballero, F. Huszár et al., “Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1874–1883, Las Vegas, Nevada, USA, 2016.
E. Agustsson and R. Timofte, “NTIRE 2017 challenge on single image super-resolution: dataset and study,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 126–135, Honolulu, Hawaii, USA, 2017.
M. Bevilacqua, A. Roumy, C. Guillemot, and M.-L. A. Morel, “Low-complexity single-image super-resolution based on nonnegative neighbor embedding,” in British Machine Vision Conference, BMVA Press, 2012.
R. Zeyde, M. Elad, and M. Protter, “On single image scale-up using sparse-representations,” in International Conference on Curves and Surfaces, pp. 711–730, Springer, 2010.
D. Martin, C. Fowlkes, D. Tal, and J. Malik, “A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics,” in Proceedings of the Eighth IEEE International Conference on Computer Vision (ICCV), pp. 416–423, Vancouver, BC, Canada, 2001.
J.-B. Huang, A. Singh, and N. Ahuja, “Single image super-resolution from transformed self-exemplars,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5197–5206, Boston, Massachusetts, USA, 2015.
Y. Matsui, K. Ito, Y. Aramaki et al., “Sketch-based manga retrieval using Manga109 dataset,” Multimedia Tools and Applications, vol. 76, no. 20, pp. 21811–21838, 2017.
Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600–612, 2004.
C. Dong, C. C. Loy, and X. Tang, “Accelerating the super-resolution convolutional neural network,” in European Conference on Computer Vision, pp. 391–407, Springer, 2016.
J. Kim, J. Kwon Lee, and K. Mu Lee, “Deeply-recursive convolutional network for image super-resolution,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1637–1645, Las Vegas, Nevada, USA, 2016.
W.-S. Lai, J.-B. Huang, N. Ahuja, and M.-H. Yang, “Deep Laplacian pyramid networks for fast and accurate super-resolution,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 624–632, Honolulu, Hawaii, USA, 2017.

Copyright

Copyright © 2022 Shang Li et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.