Abstract

With the rapid development of Internet technology, online images are used in many aspects of people’s lives, and the security and authorization of images depend strongly on image quality. Several problems have also emerged, among which image quality assessment and image denoising are particularly prominent. This paper proposes a novel no-reference image quality assessment (NR-IQA) method based on a dual convolutional neural network structure. It combines saliency detection, motivated by the human visual system (HVS), with a weighting function that reflects how strongly distortion in each local region affects perceived quality. The model is trained on gray and color features in the HSV space and is applied to parameter selection for an image denoising algorithm. Experiments show that the proposed method accurately evaluates image quality during denoising, which supports iterative parameter optimization and improves the performance of the denoising algorithm. When the cascaded algorithm is applied to image security and authorization, we obtain both improved image quality and results consistent with subjective assessment.

1. Introduction

With the continuous growth of video surveillance networks, current large-scale video surveillance systems contain tens of thousands of cameras and millions of images. However, these images may be subject to interference and attacks [1, 2] during authorization, encryption [3], acquisition, coding, and transmission. As a result, video quality deteriorates, which hampers related work. In artificial intelligence, for example, image content analysis, text detection, and traffic sign recognition [4] require high image quality to ensure accurate and reliable detection and recognition.

Today’s security and authorization systems are very large, and the Internet produces millions of images and videos every day. Under such circumstances, it is unrealistic to employ people to subjectively evaluate the quality of every monitored image and video. How to evaluate this quality objectively to meet the needs of monitoring and authorization has therefore become a new research direction in the field of security and authorization.

Image quality assessment methods are divided into subjective and objective methods. Subjective evaluation [5–7] assesses image quality through human viewing and is also suitable for video quality evaluation. However, it is influenced both by subjective human factors and by objective factors of the viewing environment, so its results are unstable. Objective evaluation is based on a mathematical model and has the advantages of simple operation, fast computation, and the ability to be embedded in a system. Objective image quality evaluation methods are divided into three types: full-reference, reduced-reference, and no-reference. Based on the practical requirements of the application, this article adopts a no-reference approach to assessing image distortion.

A no-reference image quality assessment (NR-IQA) method can not only evaluate image processing algorithms but also be used to optimize image systems [8, 9], and NR-IQA research has already achieved remarkable results. This paper therefore proposes a parameter selection method for a denoising algorithm based on no-reference image quality evaluation [10]. It applies no-reference image quality evaluation to the image denoising algorithm [11–13] so that the optimal parameters can be selected adaptively. We embed the method in the ROF model [14] to search for the optimal parameters while reducing the number of iterations of the algorithm, and obtain the best denoising effect among the candidate parameters.

2.1. Image Preprocessing

Image preprocessing is an important part of an image system. It provides reliable data for subsequent image processing and thereby improves detection and recognition accuracy. Common methods such as image denoising and image enhancement eliminate interfering information and purposefully enhance useful information, respectively. However, preprocessing is rarely used in image quality evaluation, because most preprocessing operations enhance or suppress the very distortion information that is being measured. Existing no-reference image quality evaluation methods, such as those in [15, 16], use local contrast normalization as a preprocessing operation. As Ruderman described in [17], local normalization has a decorrelating effect on images. Given an image, the grayscale image is obtained, and each pixel is normalized by subtracting its local mean and dividing by its local standard deviation, as shown in Figure 1. Assuming that I(i, j) is the pixel value at position (i, j) in the image, local contrast normalization is defined as

$$\hat{I}(i,j) = \frac{I(i,j) - \mu(i,j)}{\sigma(i,j) + C},$$

where i ∈ {1, 2, …, M}, j ∈ {1, 2, …, N}, M and N are the height and width of the image, and C is a positive constant that prevents the denominator from being zero. The local mean and standard deviation are

$$\mu(i,j) = \sum_{k=-K}^{K}\sum_{l=-L}^{L} w_{k,l}\, I(i+k,\, j+l),$$

$$\sigma(i,j) = \sqrt{\sum_{k=-K}^{K}\sum_{l=-L}^{L} w_{k,l}\,\big(I(i+k,\, j+l) - \mu(i,j)\big)^2},$$

where w = {w_{k,l}} is a 2-dimensional circularly symmetric Gaussian weighting function and K and L determine the size of the normalization window. The literature [15] shows that the performance of the algorithm is relatively stable as the window size used to compute the local mean and variance changes. However, when the window becomes very large, performance begins to decrease because the computed mean and variance no longer capture local information.
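The normalization above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the window half-sizes K = L = 3, the Gaussian width `gauss_sigma = 7/6` and the constant C = 1 are assumed values, and image borders are handled by reflect padding, which the text does not specify.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def gaussian_window(K: int, L: int, gauss_sigma: float) -> np.ndarray:
    """Circularly symmetric 2-D Gaussian weights w[k, l], normalized to sum to 1."""
    rows = np.arange(-K, K + 1)[:, None]
    cols = np.arange(-L, L + 1)[None, :]
    w = np.exp(-(rows ** 2 + cols ** 2) / (2.0 * gauss_sigma ** 2))
    return w / w.sum()


def local_contrast_normalize(img, K=3, L=3, gauss_sigma=7 / 6, C=1.0):
    """Return (I - mu) / (sigma + C) for a 2-D grayscale image.

    K, L, gauss_sigma and C are illustrative defaults (assumptions), not values
    taken from the paper.
    """
    img = np.asarray(img, dtype=np.float64)
    w = gaussian_window(K, L, gauss_sigma)
    # Reflect-pad so every pixel has a full (2K+1) x (2L+1) neighbourhood.
    padded = np.pad(img, ((K, K), (L, L)), mode="reflect")
    windows = sliding_window_view(padded, w.shape)  # shape (M, N, 2K+1, 2L+1)
    mu = np.einsum("ijkl,kl->ij", windows, w)  # Gaussian-weighted local mean
    dev = windows - mu[:, :, None, None]
    sigma = np.sqrt(np.einsum("ijkl,kl->ij", dev ** 2, w))  # weighted local std
    return (img - mu) / (sigma + C)
```

A flat image maps to all zeros, since every pixel equals its local mean; textured regions keep their structure with local brightness and contrast removed.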

2.2. Visual Saliency

In the method proposed by Le Kang, the quality of an entire test image is taken as the average of all its local quality scores. Such methods ignore human visual perception, because both the content of a local image block and its position in the image affect the viewer’s perception. For example, people are less sensitive to distortion in flat areas of an image (such as blue sky) than in complex textured areas (such as edges). Each block therefore contributes differently to perceived image quality. In the following, we use the saliency of each image block as a weight that represents the degree of human visual attention it receives. Figure 2 shows how a saliency map is predicted.

3. Method

3.1. Input

Previous deep-learning-based NR-IQA methods, such as [16, 18], consider only grayscale information and ignore the distortion information contained in the color components of the image. Yet, as Figure 3 shows, distortion has the most significant effect on the hue component. We therefore use HSV as the color space and feed the hue component together with the grayscale image into the CNN model to extract features related to image distortion.
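A minimal sketch of how the two network inputs could be produced from an 8-bit RGB image. The text does not say which grayscale conversion is used; the ITU-R BT.601 luma weights (0.299, 0.587, 0.114) below are an assumption. Hue follows the standard HSV definition and is scaled to [0, 1), matching Python's `colorsys.rgb_to_hsv`; the function name `rgb_to_gray_and_hue` is hypothetical.

```python
import numpy as np


def rgb_to_gray_and_hue(rgb):
    """Split an 8-bit RGB image (H, W, 3) into grayscale and HSV hue, both in [0, 1]."""
    rgb = np.asarray(rgb, dtype=np.float64) / 255.0
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    # Assumed grayscale conversion: ITU-R BT.601 luma weights.
    gray = 0.299 * r + 0.587 * g + 0.114 * b

    cmax = rgb.max(axis=-1)
    delta = cmax - rgb.min(axis=-1)
    safe = np.where(delta > 0, delta, 1.0)  # avoid division by zero on gray pixels
    h_r = ((g - b) / safe) % 6.0            # red is the maximum channel
    h_g = (b - r) / safe + 2.0              # green is the maximum channel
    h_b = (r - g) / safe + 4.0              # blue is the maximum channel
    hue = np.where(cmax == r, h_r, np.where(cmax == g, h_g, h_b)) / 6.0
    hue = np.where(delta > 0, hue, 0.0)     # hue is undefined for gray; use 0 like colorsys
    return gray, hue
```

Pure red, green and blue map to hue 0, 1/3 and 2/3, and any gray pixel gets hue 0, so the hue branch sees only chromatic structure.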

3.2. CNN

In this section, we introduce the proposed CNN model in detail. Figure 4 shows the network structure. The model consists of two branches that share the same structure, including the number and size of the convolution kernels, and differ only in their inputs: one branch takes a grayscale image block and the other a hue image block, each of size 64 × 64. In each branch, the first layer is a convolutional layer with 96 kernels of size 5 × 5 and a stride of 2 pixels, which produces 96 feature maps of size 30 × 30. A max pooling layer follows and reduces each feature map to 15 × 15. After the pooling layer, an inception module produces 256 feature maps of size 15 × 15. The next layer is a convolutional layer with 128 kernels of size 3 × 3 and a padding of 1 pixel, which produces 128 feature maps of size 15 × 15. It is followed by a max pooling layer that downsamples the feature maps to 7 × 7. The sixth layer is a fully connected layer with 192 nodes. A concatenation layer then joins the two 192-dimensional feature vectors of the gray and hue branches into a single 384-dimensional vector. The last layer is a fully connected layer that outputs the quality score of the image block. Every layer except the last fully connected layer is followed by a ReLU (rectified linear unit) activation.
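The feature-map sizes quoted above follow from the usual output-size formula ⌊(n + 2p − k)/s⌋ + 1. The sketch below traces one branch. The 2 × 2, stride-2 pooling window is an assumption (it is the choice that reproduces 30 → 15 and 15 → 7), and the inception module is treated as size-preserving, as the text states.

```python
def conv_out(n: int, kernel: int, stride: int = 1, pad: int = 0) -> int:
    """Side length after a convolution or pooling layer (floor rounding)."""
    return (n + 2 * pad - kernel) // stride + 1


def branch_shapes(patch: int = 64):
    """(channels, height, width) after each feature layer of one branch."""
    s1 = conv_out(patch, 5, stride=2)      # conv: 96 kernels, 5x5, stride 2
    s2 = conv_out(s1, 2, stride=2)         # max pooling (2x2, stride 2 assumed)
    s3 = s2                                # inception module: size kept, 256 maps
    s4 = conv_out(s3, 3, stride=1, pad=1)  # conv: 128 kernels, 3x3, padding 1
    s5 = conv_out(s4, 2, stride=2)         # max pooling (2x2, stride 2 assumed)
    return [(96, s1, s1), (96, s2, s2), (256, s3, s3), (128, s4, s4), (128, s5, s5)]


shapes = branch_shapes()
# shapes == [(96, 30, 30), (96, 15, 15), (256, 15, 15), (128, 15, 15), (128, 7, 7)]
c, h, w = shapes[-1]
flat = c * h * w  # 6272 inputs to the 192-node fully connected layer
fused = 2 * 192   # 384: concatenated gray and hue branch vectors
```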

In addition, to prevent overfitting when training the CNN model, we apply dropout regularization with a ratio of 0.5 before the last fully connected layer. Because image quality assessment is a regression problem, the CNN must predict a continuous value. We therefore use the Euclidean distance loss as the learning objective:

$$L = \frac{1}{N}\sum_{n=1}^{N}\left\lVert \hat{y}_n - y_n \right\rVert_2^2,$$

where ŷ_n is the quality score of the n-th image block predicted by the CNN, y_n is the ground-truth quality score of that block, and N is the number of blocks.

3.3. Image Quality Assessment

The proposed CNN model yields a local image quality evaluation. Given an RGB image, we convert it into a grayscale image and into a hue image in the HSV color space, and sample both images separately. Each sampled gray image block is preprocessed with the local contrast normalization described in Section 2.1; Figure 5 shows a preprocessed gray image. The hue image undergoes the same sampling and preprocessing to obtain the corresponding hue image blocks. Finally, the prepared gray and hue image blocks are fed in pairs into the CNN model, whose convolution and pooling layers output the quality value of each image block, i.e., the local image quality evaluation.

For a test image G, we obtain the corresponding saliency map S with the saliency detection model. To obtain the weight of each image block fed to the CNN model, we sample the saliency map S with the same nonoverlapping strategy. For test image G and saliency map S, let N be the number of blocks; there is a one-to-one correspondence between the image blocks G_i and the saliency blocks S_i. Let S_i(x, y) be the pixel value at position (x, y) in saliency block S_i, and let s_i be the sum of the coefficients of block S_i. The weight of image block G_i is

$$w_i = \frac{s_i}{\sum_{j=1}^{N} s_j}, \qquad s_i = \sum_{x}\sum_{y} S_i(x,y).$$

Finally, with q_i denoting the CNN quality score of block G_i, we calculate the global quality score of test image G:

$$Q(G) = \sum_{i=1}^{N} w_i\, q_i.$$

The algorithm structure is shown in Figure 6.
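The saliency-weighted pooling can be sketched as follows. This assumes the CNN returns one score per block in the same row-major block order produced by `nonoverlapping_blocks`, and that pixels which do not fill a whole block are discarded; both function names are hypothetical. Falling back to the plain average for an all-zero saliency map is likewise an assumption.

```python
import numpy as np


def nonoverlapping_blocks(img, size=64):
    """Cut a 2-D array into non-overlapping size x size blocks in row-major order.

    Rows/columns that do not fill a whole block are discarded (an assumption).
    """
    img = np.asarray(img)
    rows, cols = img.shape[0] // size, img.shape[1] // size
    trimmed = img[: rows * size, : cols * size]
    return (trimmed.reshape(rows, size, cols, size)
                   .swapaxes(1, 2)
                   .reshape(rows * cols, size, size))


def saliency_weighted_score(block_scores, saliency, size=64):
    """Q(G) = sum_i w_i * q_i, with w_i = s_i / sum_j s_j and s_i the sum of block S_i."""
    s = nonoverlapping_blocks(np.asarray(saliency, dtype=np.float64), size).sum(axis=(1, 2))
    q = np.asarray(block_scores, dtype=np.float64)
    if s.sum() == 0:  # no salient pixels at all: fall back to the plain average
        return float(q.mean())
    w = s / s.sum()
    return float(np.dot(w, q))
```

With a uniform saliency map this reduces to the block average used by Le Kang's method; a map concentrated on one block returns that block's score.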

3.4. Image Denoising

Many algorithms [19–21] now achieve very good denoising results. For example, Rudin et al. [21] proposed the classic total variation (TV) denoising algorithm in 1992, namely, the ROF model. However, selecting the best parameter for an iterative denoising algorithm is computationally intensive, because the algorithm must be run for every candidate parameter and the resulting images must be assessed repeatedly. To solve this problem, we propose a parameter selection framework that finds the optimal parameter while reducing the number of iterations of the algorithm, thereby saving execution time.

3.4.1. No-Reference Image Quality Assessment

The image quality assessment method is applied to an image processing algorithm to achieve optimal parameter selection. Building on the CNN model for image quality assessment in Section 3.3, we propose a simpler CNN model for evaluating the quality of denoised images, as shown in Figure 7. First, we locally normalize a gray image and then sample nonoverlapping 64 × 64 image blocks from the normalized image. The network consists of ten layers: 64 × 64-32 × 30 × 30-32 × 15 × 15-96 × 15 × 15-96 × 7 × 7-128 × 7 × 7-128 × 3 × 3-800-800-1. We apply dropout regularization with a ratio of 0.5 after the second fully connected layer. Finally, we average the scores of all image blocks to obtain the quality score of the entire image.

3.4.2. Image Denoising Algorithm

The ROF model is one of the most effective methods for image denoising. It formulates denoising as the minimization of an energy function that smooths the image while preserving edges well. In general, the total variation of a noisy image is larger than that of a clean image, so noise can be removed by minimizing the total variation. The ROF model for image denoising is

$$\min_{u}\; E(u) = \frac{1}{2}\lVert u - f\rVert_2^2 + \lambda\,\mathrm{TV}(u),$$

where f is the noisy image and u is the denoised image. The first term is the fidelity term, which keeps as much of the information in the noisy image as possible. The second term is the TV regularization term, which allows the model to preserve edges effectively while denoising. λ is a regularization parameter: the larger λ is, the smoother and more blurred the denoised image becomes; the smaller λ is, the weaker the denoising effect. Choosing a suitable λ therefore has a large effect on the denoising result. Total variation has two variants, anisotropic and isotropic; in this section, we only consider the isotropic ROF denoising problem:

$$\min_{u}\; \frac{1}{2}\lVert u - f\rVert_2^2 + \lambda\sum_{i}\sqrt{(\nabla_x u)_i^2 + (\nabla_y u)_i^2}. \tag{9}$$

The split Bregman method [22] is used to solve (9). Introducing d_x = ∇_x u and d_y = ∇_y u, the image denoising problem becomes

$$\min_{u,\,d_x,\,d_y}\; \frac{1}{2}\lVert u - f\rVert_2^2 + \lambda\lVert (d_x, d_y)\rVert_2 \quad \text{s.t.} \quad d_x = \nabla_x u,\; d_y = \nabla_y u, \tag{10}$$

where $\lVert (d_x, d_y)\rVert_2 = \sum_i \sqrt{d_{x,i}^2 + d_{y,i}^2}$. Solving (10) with the Bregman iterative algorithm yields the isotropic ROF algorithm shown in Algorithm 1.

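Initialize: $u^0 = f$, $d_x^0 = d_y^0 = b_x^0 = b_y^0 = 0$, $k = 0$
1: while $\lVert u^{k} - u^{k-1}\rVert_2 > tol$ do
2: $u^{k+1} = G^{k}$
3: $d_x^{k+1} = \max\left(s^{k} - \lambda/\beta,\ 0\right)\dfrac{\nabla_x u^{k+1} + b_x^{k}}{s^{k}}$
4: $d_y^{k+1} = \max\left(s^{k} - \lambda/\beta,\ 0\right)\dfrac{\nabla_y u^{k+1} + b_y^{k}}{s^{k}}$
5: $b_x^{k+1} = b_x^{k} + \left(\nabla_x u^{k+1} - d_x^{k+1}\right)$
6: $b_y^{k+1} = b_y^{k} + \left(\nabla_y u^{k+1} - d_y^{k+1}\right)$
7: end while

Here f is the noisy image, λ is the regularization parameter of the ROF model, β > 0 is the penalty parameter of the split Bregman method, $s^{k} = \sqrt{\lvert\nabla_x u^{k+1} + b_x^{k}\rvert^2 + \lvert\nabla_y u^{k+1} + b_y^{k}\rvert^2}$, and step 2 solves $(I - \beta\Delta)\,u^{k+1} = f + \beta\nabla_x^{T}(d_x^{k} - b_x^{k}) + \beta\nabla_y^{T}(d_y^{k} - b_y^{k})$.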

G denotes solving the linear equations with Gauss-Seidel iteration. We focus on the influence of parameter selection on the performance of the total variation denoising algorithm.
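A runnable sketch of the split Bregman iteration for the isotropic ROF problem, written for this document rather than taken from the authors. Two substitutions are made and should be read as assumptions: the u-subproblem is solved exactly with the FFT under periodic boundary conditions instead of with a Gauss-Seidel sweep, and the loop stops on the relative change of u. The penalty `beta = 1.0` is an illustrative default, and the useful range of `lam` depends on the intensity scale (the test uses intensities in [0, 1]).

```python
import numpy as np


def grad_x(u):
    """Forward difference along columns with periodic boundaries."""
    return np.roll(u, -1, axis=1) - u


def grad_y(u):
    """Forward difference along rows with periodic boundaries."""
    return np.roll(u, -1, axis=0) - u


def grad_x_T(p):
    """Adjoint (transpose) of grad_x."""
    return np.roll(p, 1, axis=1) - p


def grad_y_T(p):
    """Adjoint (transpose) of grad_y."""
    return np.roll(p, 1, axis=0) - p


def split_bregman_rof(f, lam, beta=1.0, tol=1e-4, max_iter=200):
    """Isotropic ROF, min_u 0.5*||u - f||^2 + lam*TV(u), by split Bregman iteration.

    The u-step (I - beta*Laplacian) u = rhs is solved exactly with the FFT
    (periodic boundaries) instead of a Gauss-Seidel sweep.
    Returns the denoised image and the number of iterations performed.
    """
    f = np.asarray(f, dtype=np.float64)
    H, W = f.shape
    # DFT eigenvalues of grad_x^T grad_x + grad_y^T grad_y (= -Laplacian).
    lap = (2 - 2 * np.cos(2 * np.pi * np.fft.fftfreq(H)))[:, None] \
        + (2 - 2 * np.cos(2 * np.pi * np.fft.fftfreq(W)))[None, :]
    denom = 1.0 + beta * lap

    u = f.copy()
    dx = np.zeros_like(f)
    dy = np.zeros_like(f)
    bx = np.zeros_like(f)
    by = np.zeros_like(f)
    iters = 0
    for iters in range(1, max_iter + 1):
        rhs = f + beta * (grad_x_T(dx - bx) + grad_y_T(dy - by))
        u_new = np.real(np.fft.ifft2(np.fft.fft2(rhs) / denom))
        gx = grad_x(u_new) + bx
        gy = grad_y(u_new) + by
        s = np.sqrt(gx ** 2 + gy ** 2)
        # Isotropic shrinkage with threshold lam / beta (steps 3-4).
        shrink = np.maximum(s - lam / beta, 0.0) / np.maximum(s, 1e-12)
        dx, dy = shrink * gx, shrink * gy
        # Bregman updates b <- b + grad(u) - d (steps 5-6).
        bx, by = gx - dx, gy - dy
        change = np.linalg.norm(u_new - u) / max(np.linalg.norm(u), 1e-12)
        u = u_new
        if change < tol:
            break
    return u, iters
```

Because the FFT solve preserves the DC component exactly, the mean intensity of the output equals that of the input, which is a quick sanity check on the implementation.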

3.4.3. The Framework of Parameter Selection

To study how parameter selection based on image quality affects the denoising results, we denoise the Park image [23] with 25 different values of the regularization parameter, uniformly sampled from 1 to 49 (i.e., 1, 3, 5, …, 49). Figure 8(a) shows how the quality of the denoised image changes during the iterative process for different parameter values. Figure 8(b) shows the quality of the denoised images obtained with different parameters as evaluated by SSIM and by the CNN quality score Q. Both methods predict the same quality trend, and the best denoising effect is obtained when the parameter is 13.

However, image quality is assessed only after the iterative process has converged for each parameter value. When the parameter range is large, choosing the optimal parameter within it therefore requires a very large number of iterations.

We therefore use the change in the quality of the denoised images generated across iterations as the criterion for deciding whether the iterative algorithm has converged. Let Dm = Q(m+1) − Q(m), where Q(m) is the quality score of the denoised image produced by the m-th iteration. The convergence test is presented in Algorithm 2.

if    then
Converge
else
Continue iterating
end if

Here, the iteration length specifies how many recent quality changes Dm must be taken into account to ensure that the algorithm has converged, and Tq is the permissible change in image quality. The larger the iteration length and the smaller Tq, the stricter the convergence condition.
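The exact condition of Algorithm 2 did not survive in this text, so the following is only a hypothetical reading: convergence is declared when each of the last l quality changes Dm stays below Tq in magnitude. The function name and the defaults (Tq = 0.001, l = 5) are illustrative assumptions.

```python
def quality_converged(q_history, Tq=1e-3, l=5):
    """Hypothetical reading of Algorithm 2.

    q_history[m] is Q(m), the quality score after iteration m. Convergence is
    declared when every one of the last l changes D_m = Q(m+1) - Q(m)
    satisfies |D_m| < Tq.
    """
    if len(q_history) < l + 1:
        return False  # not enough iterations yet to look back l steps
    recent = q_history[-(l + 1):]
    diffs = [b - a for a, b in zip(recent, recent[1:])]
    return all(abs(d) < Tq for d in diffs)
```

Under this reading, raising l or lowering Tq can only delay the point at which the test returns True, which matches the statement that both make the convergence condition stricter.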

We improve the ROF algorithm so that it can determine, before convergence, whether the current parameter value is the optimal parameter, as shown in Figure 9.

Here, pre is the predicted quality value for parameter k, q is the quality of the denoised image from the j-th iteration, dev is the numerical derivative of the quality at the j-th iteration, and lpre is the predicted iteration length.

qbest is the best quality score of the denoised images obtained so far, t counts consecutive iterations whose quality is worse than qbest, and T is the threshold on that count.

At the end of each iteration of the ROF algorithm, we use the evaluation algorithm described above to compute the quality q of the denoised image. To obtain the optimal-quality image, q is compared with qbest, the quality of the best image obtained previously. The iterative process terminates if the denoised images from T consecutive iterations are all worse than qbest; we then consider that the current parameter has reached its best denoising effect, so further iterations with this parameter are unnecessary. However, the quality of the images produced by subsequent iterations may change abruptly, so the additional condition pre < qbest is required to prevent misjudgment. The remaining parameters are then evaluated iteratively in the same way.
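The early-termination rule can be sketched as a parameter sweep. The callback `quality_at(p, j)` is hypothetical: in practice it would run one ROF iteration for parameter p and score the result with the quality network. The prediction formula is not given in the text, so `pre = q + dev * l_pre` (a linear extrapolation) is an assumption, as are the defaults T = 5, l_pre = 5 and max_iter = 30.

```python
import math


def select_parameter(params, quality_at, T=5, l_pre=5, max_iter=30):
    """Sweep candidate parameters with the early-termination rule described above.

    quality_at(p, j) (hypothetical) returns q, the quality of the denoised image
    after iteration j for parameter p. pre = q + dev * l_pre is an assumed
    linear extrapolation of the quality l_pre iterations ahead.
    """
    q_best, best_param, total_iters = -math.inf, None, 0
    for p in params:
        t, q_prev = 0, None
        for j in range(1, max_iter + 1):
            q = quality_at(p, j)
            total_iters += 1
            dev = 0.0 if q_prev is None else q - q_prev  # numerical derivative
            pre = q + dev * l_pre                         # predicted future quality
            q_prev = q
            if q > q_best:
                q_best, best_param, t = q, p, 0           # new best image found
            else:
                t += 1                                    # one more iteration worse than q_best
                if t >= T and pre < q_best:
                    break                                 # parameter cannot win: stop early
    return best_param, q_best, total_iters


def toy_quality(p, j):
    # Toy curve (not real data): peaks at p = 13 and converges geometrically in j.
    return 1.0 - ((p - 13) / 50) ** 2 - 0.5 ** (j + 1)


best, q, n = select_parameter(range(1, 50, 2), toy_quality)  # best == 13
```

In the toy run, parameter 13 starts below the previous best for several iterations; it survives only because its steep rise makes pre exceed qbest, which is the misjudgment the extra condition is meant to prevent. Parameters above 13 are abandoned after a few iterations instead of running to the iteration limit.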

To further improve the performance of the parameter selection framework, we can use a simpler image evaluation network, such as the one described in Section 3.4.1, which shortens evaluation time at the cost of a minor loss of precision. The threshold T controls the stringency of the termination condition, and choosing a reasonable T balances speed and accuracy. In addition, the number of candidate parameters can be reduced by sampling the parameter range more coarsely, within acceptable limits.

4. Results and Analysis

4.1. Image Quality Assessment Experiment

We obtain the quality of an image by weighting the quality scores of its sampled image blocks, so the sampling strategy affects the overall performance of the algorithm. In this section, we study the effect of two sampling parameters: the size of the sampled image blocks and the number of blocks sampled per image.

(1) Block size: we assume that the sampled blocks do not overlap. To study the effect of block size on the performance of our algorithm, we use fixed sampling strides and different block sizes. The results are shown in Table 1. Performance increases slightly as the block size increases. However, when the block size becomes very large, performance begins to decrease because the local information of the image is lost. We therefore choose a block size of 64 × 64.

(2) Sampling stride: we study the effect of the number of image blocks sampled from each image. We use a fixed block size of 64 × 64 and various strides. Figure 10 shows how the sampling stride affects the performance of the algorithm. The number of image blocks strongly affects performance: in general, a larger stride yields lower performance because less image information is used to evaluate image quality.
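The link between stride and the number of blocks is simple arithmetic: along each axis, ⌊(H − size)/stride⌋ + 1 blocks fit. The helper names below are hypothetical, and the 512 × 768 image size in the usage note is an assumption for illustration.

```python
def patch_grid(height, width, size=64, stride=64):
    """Top-left corners of all size x size patches sampled with the given stride."""
    return [(y, x)
            for y in range(0, height - size + 1, stride)
            for x in range(0, width - size + 1, stride)]


def num_patches(height, width, size=64, stride=64):
    """Closed form for len(patch_grid(...)): floor((H - size) / stride) + 1 per axis."""
    return ((height - size) // stride + 1) * ((width - size) // stride + 1)
```

For a 512 × 768 image, a stride of 32 gives 15 × 23 = 345 blocks, a stride of 64 (nonoverlapping) gives 8 × 12 = 96, and a stride of 128 only 4 × 6 = 24, which illustrates how quickly the amount of information per image shrinks as the stride grows.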

To test the performance of the proposed algorithm, we conducted experiments on the LIVE database [24]. To ensure that the results do not depend on image content or on a particular train-test split, we repeated training and testing 100 times. In each experiment, we randomly selected the distorted images corresponding to 60% of the reference images in the LIVE database as the training set, 20% of the distorted images as the validation set, and the remaining images as the test set. All results reported in this section are averages over these 100 experiments.
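A content-independent split can be sketched by partitioning the reference images first and then assigning every distorted image to the subset of its reference. Splitting the validation and test sets by reference as well (60/20/20) is an assumption here, and `split_by_reference` is a hypothetical helper; a different seed per repetition gives the 100 random splits.

```python
import random


def split_by_reference(ref_ids, train=0.6, val=0.2, seed=0):
    """Content-independent split of a distorted-image database.

    ref_ids[i] is the reference (pristine) image that distorted image i was made from.
    Reference images are partitioned first, so no content appears in two subsets.
    Returns a dict mapping 'train' / 'val' / 'test' to lists of distorted-image indices.
    """
    refs = sorted(set(ref_ids))
    random.Random(seed).shuffle(refs)
    n_train = round(train * len(refs))
    n_val = round(val * len(refs))
    train_refs = set(refs[:n_train])
    val_refs = set(refs[n_train:n_train + n_val])
    split = {"train": [], "val": [], "test": []}
    for idx, ref in enumerate(ref_ids):
        if ref in train_refs:
            split["train"].append(idx)
        elif ref in val_refs:
            split["val"].append(idx)
        else:
            split["test"].append(idx)
    return split
```

With the 29 reference images of LIVE, this yields 17 training, 6 validation and 6 test references per repetition.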

Tables 2 and 3 compare our algorithm with representative image quality assessment methods, including no-reference and full-reference methods. The no-reference methods include DIIVINE [24], BLIINDS-II [25], BRISQUE, CORNIA [26], and CNN.

It can be seen that the method we propose has a high degree of consistency with human subjective evaluation. Compared to other no-reference image quality assessment methods, our method has the best performance on the three distortion types, JP2K, WN, and FF.

To test the generalization ability of the proposed algorithm, we conducted a cross-database experiment on the TID2008 database: we train our model on the entire LIVE database and then test it on TID2008. In Table 4, the LCC values show that our algorithm performs slightly better than the other algorithms and comparably to the CNN method in [16]. The SROCC value of our algorithm is very high and clearly better than those of the other algorithms. Overall, the proposed algorithm generalizes well across databases.
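For reference, the two criteria can be computed as below. LCC is the Pearson linear correlation and SROCC is the Pearson correlation of the rank vectors, with tied values sharing their average rank. Many IQA studies fit a nonlinear logistic mapping before computing LCC; the text does not say whether Table 4 does, and this sketch omits that step.

```python
import numpy as np


def average_ranks(a):
    """1-based ranks; tied values share the average of their ranks."""
    a = np.asarray(a, dtype=np.float64)
    ranks = np.empty(len(a))
    ranks[np.argsort(a, kind="mergesort")] = np.arange(1, len(a) + 1)
    for v in np.unique(a):
        tie = a == v
        ranks[tie] = ranks[tie].mean()
    return ranks


def lcc(pred, mos):
    """Pearson linear correlation coefficient (no nonlinear mapping applied)."""
    return float(np.corrcoef(pred, mos)[0, 1])


def srocc(pred, mos):
    """Spearman rank-order correlation: Pearson correlation of the average ranks."""
    return lcc(average_ranks(pred), average_ranks(mos))
```

SROCC only measures monotonic agreement, so a predictor that is a strictly increasing but nonlinear function of the subjective scores reaches SROCC = 1 while its LCC stays below 1.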

4.2. Algorithm Evaluation Experiment

First, we evaluate the performance of the proposed evaluation algorithm. We train the CNN model on the LIVE database and test it on the TID2008 database. The training data consist of the WN and blur distortion types from the LIVE database. All reported results are averaged over 100 train-test iterations; in each iteration, we randomly select 80% of the distorted images as the training set and the remaining 20% as the validation set. The results are shown in Table 5.

We apply the parameter selection framework to the ROF denoising model. First, we denoise the two images with different parameters and observe the results. Figure 11 presents the denoising results for the Park image at three different parameter values. For the Park image, the denoising effect at a parameter value of 13 is better than at the other values.

Next, we study the influence of the parameter selection framework on the iterative denoising algorithm. In our experiment, we set = 0.001, l = 5, = 5, and = 10. We evaluate the quality of the denoised image produced by each iteration to predict the quality of the denoised images in subsequent iterations. Figure 12 compares denoising of the Park image with and without the parameter selection framework. As Figure 12 shows, the denoising algorithm with the parameter selection framework requires significantly fewer iterations per parameter. Moreover, both experiments select the same optimal parameter value, which indicates that the proposed framework not only greatly reduces the number of iterations of the algorithm but also accurately selects the optimal parameter value.

To further confirm this conclusion, we collected the data generated by denoising experiments on multiple images. The results, shown in Table 6, are consistent with the above findings: the running time of the improved ROF algorithm is much lower than that of the original algorithm. This is because the computational cost of the algorithm comes mainly from the iterations, i.e., the cost is proportional to the number of iterations. We therefore improve the performance of the algorithm by reducing the number of iterations while still obtaining the optimal parameters.

5. Conclusions

We propose a cascaded algorithm for no-reference image quality assessment and image denoising based on CNN and visual saliency, for application in image security and authorization. The algorithm weights image blocks by their saliency to obtain a quality assessment more in line with the human visual system, and it achieves very good performance on the LIVE database. The evaluation method is embedded into an image denoising algorithm, such as the total variation denoising algorithm. We propose a parameter selection framework that judges the merits of each parameter from the change in quality of the denoised images generated during the iterative process. Experiments verify that parameter selection significantly reduces the number of iterations of the denoising process, while the best parameter is still found accurately among a set of candidates.

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.