Abstract

The range of illumination in real scenes is very large, and ordinary cameras can record only a small part of it, far below the range that the human eye perceives. High-dynamic range (HDR) imaging technology, which has emerged in recent years, can record a wider range of illumination than the human eye can perceive. However, the current mainstream HDR imaging technique captures multiple low-dynamic range (LDR) images of the same scene at different exposures and then merges them into one HDR image, which greatly increases the amount of captured data. The advent of the single-pixel camera (a compressive imaging system) has proved the feasibility of acquiring and restoring image data based on compressive sensing. Therefore, this paper proposes a method for reduced-dimensional capture of high-dynamic range images with compressive sensing, which includes algorithms for the front end (capturing) and the back end (processing). At the front end, a K-SVD dictionary is used to compressively sense the input multiple-exposure image sequence, thereby reducing the amount of data transmitted to the back end. At the back end, the Orthogonal Matching Pursuit (OMP) algorithm is used to reconstruct the multiple-exposure image sequence, and a proposed low-rank PatchMatch algorithm merges the reconstructed sequence into an HDR image. Simulation results show that, while reducing the complexity of the front-end equipment and the volume of communication between the front end and the back end, the overall system achieves a good balance between the amount of computation and the quality of the resulting HDR image.

1. Introduction

With the development of mobile Internet and Internet of Things (IoT) technology, devices with cameras are becoming more common, such as smartphones, network surveillance cameras, laptop computers, autonomous vehicles, and traffic monitoring cameras; indeed, a camera is now an essential feature of smartphones and laptops. However, common cameras on the market can only capture low-dynamic range (LDR) images, i.e., they can capture only a small part of the range of illuminance in a real scene. The dynamic range of a real scene perceptible to the human eye is as high as 10^8 : 1, but the dynamic range of the LDR images captured by these cameras is only 2^8 : 1 or 2^16 : 1, so LDR images cannot faithfully represent the real scene. To solve this problem, the high-dynamic range (HDR) imaging technique has been proposed; it can capture a wider range of illumination than the human eye can perceive. There are two ways to obtain HDR images: hardware and software. The hardware method directly captures HDR images by increasing the dynamic range of the sensor, but the achievable range is very limited and the sensors are expensive [1]. Therefore, the software method is currently the main approach, i.e., fusing multiple-exposure LDR images (hereinafter called an image sequence or sequence) to obtain HDR images. Fusion methods can be further divided into two categories: one restores the Camera Response Function (CRF) and then reconstructs the HDR radiance map [2]; the other directly fuses the multiple-exposure sequence at the pixel level. Both categories must consider all pixels of the multiple-exposure image sequence, increasing computational complexity and storage space. During transmission and storage, the images are further compressed and transformed to remove redundancy and extract the required information. This sample-then-compress approach results in sampling redundancy, excessive storage space, and increased transmission costs.

Compressive sensing (CS, also called compressed sensing) [3–5] can solve the above problem by compressing the signal while sampling it. CS breaks through the limitation of the traditional Shannon sampling theorem and can reconstruct incomplete signals with high probability at rates far below the Nyquist sampling rate. Rice University has developed a single-pixel camera based on the theory of compressive sensing [6]. By replacing the CCD or CMOS sensor with a digital micromirror device (DMD) and a single photon detector, it needs to sample the image only a number of times smaller than the number of pixels. Its appearance confirms the feasibility of applying compressive sensing to imaging systems. Therefore, this paper proposes a method for reduced-dimensional capture of high-dynamic range images with compressive sensing, which includes algorithms for the front end (capturing) and the back end (processing). At the front end, a K-SVD dictionary is used to compressively sense the input multiple-exposure image sequence, thereby reducing the amount of data transmitted to the back end. At the back end, the Orthogonal Matching Pursuit (OMP) algorithm is used to reconstruct the multiple-exposure image sequence, and a proposed low-rank PatchMatch algorithm merges the reconstructed sequence into an HDR image.

2. Materials and Methods

Figure 1 is the schematic of the proposed method. The whole system includes three parts: front end, communication, and back end. In practical applications, especially IoT applications, the front end is generally a low-power device with very limited computing resources, whose main role is to sense the real world. The back end is generally a cloud computing center or edge computing node, which has powerful computing resources but is far away from the field being sensed. Communication between the front end and the back end can use Ethernet, mobile communication networks (including 2G, 3G, 4G, 5G, and NB-IoT), wireless local area networks (WLAN), or low-power wide area networks (LPWAN, including LoRa and Sigfox). Among these, wired networks (such as Ethernet) are rarely used to connect front-end equipment directly because of high deployment costs and inflexibility. WLAN is applicable but limited by its short transmission distance. LPWAN offers very long communication distances but very small bandwidth, which makes it difficult to use for transmitting traditional image sequences. The most suitable technology for transmitting image sequences is the mobile communication network (especially 4G and 5G), but the larger the bandwidth used, the higher the price and the more energy the front end spends on communication. Therefore, it is necessary to reduce both the computational complexity of the front-end device and the amount of communication data between the front end and the back end. The method proposed in this paper uses compressive sensing to reduce the computational complexity and data volume of HDR-capturing front-end devices, thereby reducing the cost of the entire system.

2.1. Reduced-Dimensional Capture and Reconstruction of Multiple-Exposure Image Sequences

Since compressive-sensing cameras are not common now, we assume that the front end uses a common camera to capture a series of LDR images with different exposures. This assumption makes the system not only easier to implement, but also easier to compare with other methods. Every image in the image sequence is resampled using compressive sensing. Compressive sensing includes sparse representation of signals, design of measurement matrices, and design of signal reconstruction algorithms [7].

2.2. Sparse Representation of Image with Overcomplete Dictionaries

Signals in practical applications are generally not sparse, but when a suitable basis is used to represent them, they become sparse or compressible [8], i.e., the number of nonzero elements is small, which helps reduce the required sampling rate. Sparse representation has matured in multiple fields, such as compression, regularization of inverse problems, and feature extraction [9]. Sparseness is the premise of compressive sensing: the signal is sparse either in itself or after some transformation, for example, a Fourier transform or a discrete wavelet transform [7]; i.e., a nonsparse signal is represented by a linear combination of a few atoms in a fixed dictionary (such as a DCT, wavelet, Haar, or Gabor dictionary). A fixed dictionary has a simple structure and simple calculation, but it applies only to a limited range of signals, and its sparse representation cannot be guaranteed to be optimal, i.e., the sparseness of the representation cannot be guaranteed. To best suit a given set of signals, we can instead train an overcomplete dictionary on those signals. The K-SVD method [9] iterates between sparse coding and dictionary update to optimize the sparse representation of the signals over a given training set. K-SVD can be regarded as a generalized form of K-means clustering; the main difference is that each signal may use a different number of atoms.

Blocking each image in the multiple-exposure image sequence can further reduce storage space; the block size is usually 8 × 8, 16 × 16, and so on [10]. Figure 2 is the flowchart of the K-SVD algorithm. The input image sequence is {I1, I2, …, IM}, where M is the number of images in the sequence and Im, m = 1, 2, …, M, is an image in the sequence. Here we suppose that all images in the sequence have the same size r × c. All images are divided into blocks of size b × b, and the pixels in each block are rearranged into a column vector yi, i = 1, 2, …, N; in case r or c is not divisible by b, the image is padded with zeros. N = M · ⌈r/b⌉ · ⌈c/b⌉ is the number of vectors generated from the image sequence, and n = b^2 is the length of yi. Y = [y1, y2, …, yN] is the matrix whose columns are the vectors yi. The dictionary D(J) = [d1, d2, …, dK] is made up of atom vectors dk, where K is the total number of atoms in D(J) and the superscript (J) is the number of iterations. X is the sparse representation of Y under dictionary D and is made up of row vectors xT^k (the kth row of X), where the subscript T of xT^k indicates that it is a row vector and a superscript T denotes matrix transpose. Equation (1) is the objective function of K-SVD, where xi is the ith column of matrix X and T0 is the predetermined number of nonzero elements in xi:

min_{D, X} ||Y - DX||_F^2   subject to   ||xi||_0 <= T0, i = 1, 2, …, N.    (1)

The matrix Ek = Y - Σ_{j ≠ k} dj xT^j is the error over all input signals when the kth atom is removed. The details of restricting Ek to the signals that use atom k and of the SVD decomposition of the restricted error can be found in [9].
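To make the dictionary-update step concrete, the following is a minimal Python/NumPy sketch of a single K-SVD atom update (the paper's experiments were run in MATLAB; the function name and array shapes here are illustrative, not the authors' implementation):

```python
import numpy as np

def ksvd_atom_update(D, X, Y, k):
    """One K-SVD dictionary-update step for atom k: form the error matrix Ek
    with atom k removed, restrict it to the signals that actually use atom k,
    and replace (dk, row k of X) by the best rank-1 approximation (via SVD)."""
    omega = np.nonzero(X[k, :])[0]          # signals whose code uses atom k
    if omega.size == 0:
        return D, X                         # atom unused; nothing to update
    Xk = X.copy()
    Xk[k, :] = 0.0                          # remove atom k's contribution
    Ek = Y[:, omega] - D @ Xk[:, omega]     # restricted error matrix
    U, s, Vt = np.linalg.svd(Ek, full_matrices=False)
    D[:, k] = U[:, 0]                       # new atom: first left singular vector
    X[k, omega] = s[0] * Vt[0, :]           # new coefficients for those signals
    return D, X
```

Because the SVD gives the best rank-1 approximation of the restricted error, one such update can never increase the overall representation error ||Y - DX||_F.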

2.3. Measurement Matrix Design

After the signal is sparsely represented, a suitable measurement matrix is needed to compressively sense it. The design principle is that the sensing matrix should meet the Restricted Isometry Property (RIP) [11], [12] to ensure a one-to-one mapping from the original space to the sparse space. The compressive sensing of signal yi is shown in (2), where Φ is the m × n measurement matrix (m < n), zi is the compressed sample of signal yi, and A = ΦD is the sensing matrix:

zi = Φyi = ΦDxi = Axi.    (2)

When Φ is a Gaussian random matrix, the sensing matrix A = ΦD satisfies the RIP with high probability [13]. The advantage of a Gaussian measurement matrix is that it is incoherent with almost any sparse signal, so it requires very few measurements. Therefore, we use a Gaussian random matrix as the measurement matrix.
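The measurement step can be sketched in a few lines of Python/NumPy (the sampling rate R = m/n mirrors the rates used in the simulations later; the N(0, 1/m) entry scaling and the function name are our own choices, not specified by the paper):

```python
import numpy as np

def gaussian_measure(y, sampling_rate, rng=None):
    """Compress a length-n signal y into m = round(sampling_rate * n) samples
    z = Phi @ y using an i.i.d. Gaussian measurement matrix Phi."""
    rng = np.random.default_rng(0) if rng is None else rng
    n = y.shape[0]
    m = max(1, int(round(sampling_rate * n)))
    Phi = rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n))  # N(0, 1/m) entries
    z = Phi @ y
    return z, Phi
```

For a 64-sample block at sampling rate R = 0.5, this produces 32 compressed measurements, which is what gets transmitted to the back end instead of the full block.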

2.4. Reconstructing Image Sequence

The reconstruction method is the core step of compressive sensing: its quality determines the quality of the reconstructed image. Compressive sensing reconstruction methods fall into three main categories [14]. The first is greedy algorithms (such as orthogonal matching pursuit (OMP) [15], stagewise orthogonal matching pursuit (StOMP) [16], and regularized orthogonal matching pursuit (ROMP) [17]), which at each iteration solve for a locally optimal solution to approximate the signal. The second is convex optimization algorithms (such as basis pursuit (BP) [18], the interior point method [19], the gradient projection method [20], and the iterative threshold algorithm [21]); convex optimization can achieve better reconstruction from a small number of samples but has higher computational complexity. The third is combinatorial optimization algorithms, which use group testing to reconstruct the signal accurately; reconstruction is fast, but the scope of application is limited, an example being HHS Pursuit [22]. In this paper, we use the OMP algorithm to reconstruct the image sequence. Its performance is stable and its reconstruction accuracy is high, which ensures that the original signal is accurately recovered at a lower sampling rate.

Given the sensing matrix A = ΦD and the compressed sample zi of signal yi, the OMP algorithm estimates the sparse representation xi of yi. The signal yi can then be recovered by (3):

yi = Dxi.    (3)

The idea behind OMP is to pick columns in a greedy fashion, i.e., at each iteration t, the column of the sensing matrix A that is most strongly correlated with the current residual of z is chosen [15]. Figure 3 is the flowchart of the OMP algorithm. The input is the sensing matrix A and one of the compressed signals z = zi, i = 1, 2, …, N, in (2). After running OMP N times, we obtain the matrix X (the sparse representation of Y), Y can be calculated column by column using (3), and finally the image sequence is reconstructed from Y.
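The greedy pick-and-refit loop described above can be sketched as follows (a minimal Python/NumPy version for one compressed block; the paper's experiments use MATLAB, and the stopping tolerance here is our own choice):

```python
import numpy as np

def omp(A, z, T0, tol=1e-8):
    """Orthogonal Matching Pursuit: at each iteration, pick the column of the
    sensing matrix A most correlated with the residual, then refit the
    coefficients on the chosen support by least squares."""
    K = A.shape[1]
    residual = z.astype(float).copy()
    support = []
    coef = np.zeros(0)
    for _ in range(T0):
        j = int(np.argmax(np.abs(A.T @ residual)))   # most correlated column
        if j not in support:
            support.append(j)
        # Least-squares refit on the current support (the "orthogonal" step)
        coef, *_ = np.linalg.lstsq(A[:, support], z, rcond=None)
        residual = z - A[:, support] @ coef
        if np.linalg.norm(residual) < tol:
            break
    x = np.zeros(K)
    x[support] = coef
    return x
```

Running this once per compressed block zi yields the columns of X, after which Y = DX recovers the blocks and the images are reassembled.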

2.5. Low-Rank PatchMatch Algorithm

During the capture of a multiple-exposure image sequence, camera shake or unpredictable moving objects in the scene are inevitable, which causes artifacts or noise in the final fused HDR image. Currently, block matching fusion is the main method used to eliminate these artifacts and noise. The essence of block matching fusion is to find a mapping between two different images A and B (with image block sets {PA} and {PB}, respectively), i.e., to compute, via block correlations, the nearest-neighbor field (NNF) of B such that the error between similar image blocks in the two images is minimized. By finding, for each block in {PB}, the closest block in {PA}, the artifacts in the fused image are reduced.

If image block matching is performed by full search, the complexity is as high as O(mM^2), where m and M are the size of the image and the size of the block, respectively. To reduce the complexity, Barnes et al. [23], [24] proposed the fast PatchMatch algorithm with randomized nearest-neighbor search and reduced the complexity to O(mM log M). The main steps of the algorithm are initialization, propagation, and random search. Owing to its high efficiency and good performance, PatchMatch has had a profound impact on image stitching, image completion, and image reshuffling.
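The three steps named above (random initialization, propagation, random search) can be sketched for grayscale images as follows. This is a simplified single-scale version for illustration only; the full algorithm of [23] adds coarse-to-fine scales and other refinements, and all names here are our own:

```python
import numpy as np

def patchmatch_nnf(A, B, p=3, iters=2, seed=0):
    """Single-scale PatchMatch sketch: approximate nearest-neighbour field
    mapping every p x p patch of A to a patch of B."""
    rng = np.random.default_rng(seed)
    ha, wa = A.shape[0] - p + 1, A.shape[1] - p + 1   # valid patch origins in A
    hb, wb = B.shape[0] - p + 1, B.shape[1] - p + 1   # valid patch origins in B

    def dist(ay, ax, by, bx):
        d = A[ay:ay + p, ax:ax + p] - B[by:by + p, bx:bx + p]
        return float(np.sum(d * d))

    # Random initialization of the NNF and its patch costs
    nnf = np.stack([rng.integers(0, hb, (ha, wa)),
                    rng.integers(0, wb, (ha, wa))], axis=-1)
    cost = np.array([[dist(y, x, *nnf[y, x]) for x in range(wa)]
                     for y in range(ha)])

    for it in range(iters):
        step = 1 if it % 2 == 0 else -1          # alternate scan direction
        ys = range(ha) if step == 1 else range(ha - 1, -1, -1)
        xs = range(wa) if step == 1 else range(wa - 1, -1, -1)
        for y in ys:
            for x in xs:
                # Propagation: try the shifted match of the preceding neighbours
                for dy, dx in ((step, 0), (0, step)):
                    py, px = y - dy, x - dx
                    if 0 <= py < ha and 0 <= px < wa:
                        cy = int(np.clip(nnf[py, px, 0] + dy, 0, hb - 1))
                        cx = int(np.clip(nnf[py, px, 1] + dx, 0, wb - 1))
                        c = dist(y, x, cy, cx)
                        if c < cost[y, x]:
                            nnf[y, x], cost[y, x] = (cy, cx), c
                # Random search in exponentially shrinking windows
                r = max(hb, wb)
                while r >= 1:
                    cy = int(np.clip(nnf[y, x, 0] + rng.integers(-r, r + 1), 0, hb - 1))
                    cx = int(np.clip(nnf[y, x, 1] + rng.integers(-r, r + 1), 0, wb - 1))
                    c = dist(y, x, cy, cx)
                    if c < cost[y, x]:
                        nnf[y, x], cost[y, x] = (cy, cx), c
                    r //= 2
    return nnf, cost
```

Propagation exploits image coherence (good matches of neighbouring patches tend to be shifted copies of each other), which is what lets a few sweeps replace the exhaustive search.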

In practice, more than two multiple-exposure LDR images of the same scene are generally fused into one HDR image, so Sen et al. [25] proposed the multisource bidirectional similarity (MBDS) shown in (4), where S_1, …, S_N are the source images, T is the target image, and N is the number of source images. P and Q are patches in the source images and in T, respectively. The weight w(P) weighs a source patch when calculating completeness by how well-exposed it is: a well-exposed image block receives a large weight, and vice versa. d(·, ·) is a distance metric, usually the l2 norm, and |T| is the total number of image blocks of the target image. The formula comprises the completeness of the mapping from the sources to T and the coherence from T back to the sources. MBDS selects well-exposed blocks in the image sequence to fill the registered image, so it achieves better registration results:

MBDS(T; S_1, …, S_N) = Σ_{i=1}^{N} (1/|S_i|) Σ_{P ∈ S_i} w(P) min_{Q ∈ T} d(P, Q) + (1/|T|) Σ_{Q ∈ T} min_{1 ≤ i ≤ N} min_{P ∈ S_i} d(P, Q).    (4)

From the perspective of low-rank matrix recovery, combined with the idea of MBDS, this paper proposes an improved algorithm for removing artifacts from HDR images. The objective function is shown in (5). The input image sequence has N images L_i, i = 1, …, N, and L_ref is the reference image selected from the sequence. I_i, i = 1, …, N, is the result of aligning L_i to the reference image L_ref; that is, the content of I_i is aligned with the reference image, while the exposure parameters remain those of L_i. The function g_{i→j}(·) maps exposure parameter i to exposure parameter j, the function f(·) maps the grayscale domain of LDR to the radiance domain of HDR, and vec(·) turns a two-dimensional image into a column vector. Stacking the aligned, radiance-mapped sequence into the matrix V = [vec(f(I_1)), …, vec(f(I_N))], the objective is to recover a low-rank matrix A from V in the presence of sparse errors E:

min_{A, E} rank(A) + λ||E||_1   subject to   V = A + E.    (5)

Solving the MBDS problem yields the aligned images; more details can be found in [25]. In addition, the low-rank constraint ensures that the aligned images have a sufficiently low rank, i.e., that they remain linearly correlated in brightness. The solution divides the problem into two independent local optimization subproblems, namely, the MBDS problem and low-rank matrix recovery, and the MBDS subproblem is solved iteratively over multiple resolution scales. The low-rank matrix finally obtained is the target HDR image, with high dynamic range and brightness linear in the scene radiance. The process is shown in Figure 4.
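The paper does not spell out its solver for the low-rank matrix recovery subproblem; the standard proximal step for the rank (nuclear-norm) term is singular value thresholding, sketched below for illustration (a hedged NumPy sketch, not the authors' code):

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: shrink all singular values of M by tau
    and zero out those below tau. This is the proximal operator of the
    nuclear norm, the core step in low-rank matrix recovery iterations."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s = np.maximum(s - tau, 0.0)          # soft-threshold the spectrum
    return (U * s) @ Vt                    # same as U @ diag(s) @ Vt
```

Applied to the matrix whose columns are the aligned exposures, this step suppresses small singular values contributed by residual misalignment while keeping the dominant, linearly correlated brightness structure.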

3. Results and Discussion

This section will analyze the convergence of the low-rank PatchMatch algorithm, simulate the multiexposure image compressive sensing and reconstruction algorithm and the antiartifact fusion algorithm, and evaluate the algorithms in terms of subjective and objective criteria.

3.1. Convergence of the Low-Rank PatchMatch Algorithm

Randomly generate data matrices with rank r and size m × n, and add sparse noise with a noise ratio γ. The convergence is validated by two sets of experiments. In the first set, the matrix rank r is fixed to 1, and the convergence is observed under different noise ratios γ. In the second set, the noise ratio is fixed to γ = 0.2, and the convergence is observed under different ranks r. The results are shown in Figure 5. In both cases, the low-rank PatchMatch algorithm converges within 5 iterations.

3.2. Image Evaluation Criteria

In this paper, we will use the mean squared error (MSE), peak signal-to-noise ratio (PSNR), information entropy, average gradient, and running time to objectively evaluate the image quality and algorithm.

The definition of the mean squared error (MSE) is shown in (6), where m and n are the width and height of the images and f and f̂ are the two images being compared. The MSE measures how much the reconstructed image deviates from the reference: the higher the MSE, the lower the accuracy of the result:

MSE = (1/(mn)) Σ_{i=1}^{m} Σ_{j=1}^{n} [f(i, j) - f̂(i, j)]^2.    (6)

PSNR is a commonly used criterion for evaluating reconstructed image quality. Based on the MSE in equation (6), the definition of PSNR is given in equation (7), where MAX_I is the maximum possible pixel value of the image, for example, 255 for an 8-bit grey image. The larger the PSNR value, the lower the degree of distortion of the image:

PSNR = 10 log10(MAX_I^2 / MSE).    (7)

Information entropy represents the average information of an image, that is, the average information after removing redundant information. The definition is shown in equation (8), where p(l) is the probability that grey level l appears in the image and L is the maximum grey value of the image:

H = - Σ_{l=0}^{L} p(l) log2 p(l).    (8)

The average gradient characterizes the relative sharpness of the image and reflects the rate of change in the contrast of details. The larger the average gradient, the larger the changes of grey level and the richer the levels of the image. The definition is shown in equation (9), where M and N are the numbers of rows and columns of the image f, and ∂f/∂x and ∂f/∂y are the horizontal and vertical differences:

AG = (1/((M - 1)(N - 1))) Σ_{i=1}^{M-1} Σ_{j=1}^{N-1} sqrt(((∂f/∂x)^2 + (∂f/∂y)^2) / 2).    (9)
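The four criteria above are straightforward to compute; a Python/NumPy sketch follows (the paper's experiments were run in MATLAB, so these helper names are our own):

```python
import numpy as np

def mse(f, g):
    """Mean squared error between images f and g."""
    return float(np.mean((f.astype(float) - g.astype(float)) ** 2))

def psnr(f, g, peak=255.0):
    """Peak signal-to-noise ratio in dB; peak is the maximum pixel value."""
    e = mse(f, g)
    return float('inf') if e == 0 else 10.0 * np.log10(peak ** 2 / e)

def entropy(img, levels=256):
    """Information entropy (bits) of an integer-valued image."""
    hist = np.bincount(img.ravel(), minlength=levels).astype(float)
    p = hist / hist.sum()
    p = p[p > 0]                          # ignore empty grey levels
    return float(-np.sum(p * np.log2(p)))

def average_gradient(f):
    """Average gradient: mean of sqrt((dx^2 + dy^2) / 2) over interior pixels."""
    f = f.astype(float)
    gx = np.diff(f, axis=1)[:-1, :]       # horizontal differences, (M-1) x (N-1)
    gy = np.diff(f, axis=0)[:, :-1]       # vertical differences,   (M-1) x (N-1)
    return float(np.mean(np.sqrt((gx ** 2 + gy ** 2) / 2.0)))
```

For instance, a uniformly distributed 8-bit image attains the maximum entropy of 8 bits, and a constant image has an average gradient of zero.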

For the fusion of multiple-exposure images of complex scenes with moving objects, the evaluation of deghosting needs further study. At present, there is no mature objective criterion for evaluating the deghosting of HDR images. In this paper, the deghosting evaluation method proposed by Karaduzovic-Hadziabdic et al. [26] is used, and the test image set is of complex real scenes.

3.3. Simulation of Compressive Sensing and Reconstruction for Multiple-Exposure Images

The simulation platform for this experiment is MATLAB 2015b; the hardware has 32 GB of memory and an Intel Core i5-6600K processor (3.5 GHz). Airplane and Lena, with an image size of 512 × 512, were selected for simulation, and the results were compared with the BP, OMP, and StOMP algorithms at lower (R = 0.3), medium (R = 0.5), and higher (R = 0.7) sampling rates. The simulation results are shown in Table 1.

Among the three major categories of compressive sensing reconstruction algorithms, convex optimization performs best but has the highest complexity and the longest reconstruction time. As a representative of convex optimization, BP reconstructs better than the greedy and combinatorial optimization algorithms. At sampling rates of 0.3 and 0.5, our algorithm outperforms the BP algorithm. At a sampling rate of 0.7, although the PSNR of our algorithm is slightly lower than that of BP, it is higher than those of the other algorithms. In addition, the reconstruction time of our algorithm is shorter than that of the BP algorithm except at the low sampling rate (R = 0.3).

The simulation results are shown in Figure 6 when the sampling rate is 0.5. From a subjective point of view, for the letter area on the fuselage and wings of the airplane image, both the BP algorithm and our algorithm can recover the clear letters, but the letters recovered by the OMP algorithm and the StOMP algorithm are blurred. The images recovered by BP, OMP, and StOMP algorithms all have obvious noise. The particle noise of StOMP algorithm is the most obvious. Our algorithm can basically restore the image information correctly.

From a subjective point of view, the reconstruction differences for the Lena image are not as obvious as for the Airplane image, but it can be observed that the images recovered by BP, OMP, and StOMP all contain noise to different degrees, with the particle noise of StOMP the most obvious.

Among the above algorithms, the StOMP algorithm has the shortest reconstruction time, but the reconstruction effect is also the worst, and the particle noise is very obvious. Because of the existence of noise, the average gradient of the StOMP algorithm is higher than that of other algorithms. The average gradient of our algorithm is similar to the OMP algorithm, which is better than the BP algorithm. From the perspective of MSE, the MSE of BP algorithm is the smallest, and our algorithm is second. Information entropy is similar to the case of MSE.

3.4. Simulation of the Multiple-Exposure Image Fusion Algorithm

In this section, the multiple-exposure image sequences Arch, Sculpture Garden, and Puppet are compressively sensed, reconstructed, and then fused into HDR images. The results are compared with robust principal component analysis (RPCA) [27], the partial sum of singular values (PSSV) method [28], [29], the CRF-based method of [2], the MBDS algorithm (referred to as SEN) proposed by Sen et al. [25], the brightness and texture consistency deghosting method (referred to as HU) proposed by Hu et al. [30], and the low-rank-recovery-based deghosting fusion proposed by Oh et al. [31].

Figure 7 shows the Arch image sequence. There are moving people in the picture. Artifacts can occur with direct fusion. Reference [2] and RPCA cannot suppress the appearance of artifacts, and our algorithm and PSSV algorithm can both suppress artifacts well and have better subjective visual effects.

Figure 8 shows the results of the Puppet sequence. Our algorithm adds low-rank constraints to minimize the impact of misaligned regions and keep the resulting image as linear as possible. It can be seen from the result that our algorithm is better.

Figure 9 shows the results for the Sculpture Garden sequence. There are many pedestrians in the picture, which makes artifact removal difficult. Among the fusion results, [31] is the worst, there are obvious artifacts in [2], and the SEN method shows blurring. Both HU and our algorithm suppress artifacts well, but owing to the image blocking, a block effect exists in our fusion result.

Figure 10 shows local details of the fusion results in Figure 9. Because the result of [31] is the worst compared with the other algorithms, its detail is not enlarged. The method of [2] is less effective at removing artifacts, as the silhouette crossing is obvious. There is obvious blurring at the pedestrian edges in the SEN result. HU and our algorithm achieve better results, but our algorithm shows noise caused by the block effect.

4. Conclusions

Aiming at the problems of traditional cameras, namely, redundant sampling, large storage consumption, and inability to record the full radiance of a real scene owing to the limited dynamic range of the sensor, this article uses a K-SVD dictionary to compressively sense LDR images of different exposures of the same scene. The LDR images are then reconstructed and fused with the low-rank PatchMatch algorithm to obtain an HDR image. The simulation results show that the proposed method can effectively reduce the sampling rate and remove the artifacts and blurring caused by camera shake and the motion of objects in the scene. It provides a method for obtaining HDR images via compressive sensing.

However, owing to the introduction of block compressive sensing, the size of the image block becomes a factor that cannot be ignored. Simulation results show that when the image block is small, the block effect is more obvious and edge details are distorted; when the image block is enlarged, storage space and computational complexity increase. In addition, performing compressive sensing and dictionary learning before fusion increases computation time, trading time for reductions in complexity and sampling rate. Therefore, the next step is to perform pixel-level fusion in the compressive sensing domain of the HDR image, to further reduce the running time of the algorithm and improve the quality of the fused image.

Data Availability

No data were used to support this study.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was funded by the Project for Distinctive Innovation of Ordinary Universities of Guangdong Province (no. 2018KTSCX120), the Ph.D. Start-Up Fund of Natural Science Foundation of Guangdong Province (no. 2016A030310335), and Guangdong Science and Technology Plan Project (no. 76120-42020022).