Abstract

Visual quality measurement is a fundamental issue in numerous applications of image and video processing. In this paper, based on the assumption that the human visual system (HVS) is sensitive to image structures (edges) and local image luminance (light stimulation), we propose a new perceptual image quality assessment (PIQA) measure based on the total variation (TV) model, called TVPIQA, in the spatial domain. The proposed measure compares the TVs of a distorted image and its reference image to represent the loss of image structural information. Because the TV model describes edges well, the proposed TVPIQA measure captures image structure information effectively. In addition, the energy of enclosed regions in the difference image between the reference image and its distorted image is used to measure the missing luminance information, to which the human visual system is sensitive. Finally, we validate the TVPIQA measure on the Cornell-A57, IVC, TID2008, and CSIQ databases and show that it outperforms recent state-of-the-art image quality assessment measures.

1. Introduction

Visual quality evaluation has numerous uses in practice and also plays a central role in shaping many visual processing algorithms and systems, as well as their implementation, optimization, and testing. Because human beings are the end receivers of images, one straightforward way of evaluating image quality is subjective testing. The mean opinion score (MOS), a subjective quality measurement, has been used for many years. However, subjective testing is expensive and time consuming, which makes it impractical for most image processing applications. These drawbacks led to the development of perceptual image quality assessment (PIQA) metrics that can automatically evaluate perceptual image quality.

An objective measurement of perceptual quality plays a very important role in many image processing tasks, such as image compression and enhancement. It can be used to dynamically monitor and adjust image quality, optimize algorithms, and benchmark image processing systems [1]. In recent years, a great deal of effort has been made to develop objective image quality metrics that correlate with human visual behaviors in evaluating image quality [14].

Depending upon the availability of a “perfect quality” reference image, the image quality assessment (IQA) metrics are classified into full-reference (FR), reduced-reference (RR), and no-reference (NR) [4]. FR metrics are those that need access to an original reference image to produce a quality score that predicts the subjective judgment of a distorted image. NR metrics only require distorted images to predict quality scores. RR metrics are between FR metrics and NR metrics, which require only partial information about the reference image [3]. In this paper, we focus on FR image quality assessment.

Generally, FR metrics measure the distance between a distorted image and its original image in a perceptually meaningful way. FR metrics can be designed in two ways. One is modeling the human visual system (HVS), which has been regarded as the most appropriate way to measure and predict perceptual image quality. The underlying assumption is that the HVS is sensitive to differences in certain aspects of visual signals, such as brightness, contrast, and frequency content. Under this assumption, the strength of the difference between a distorted image and its reference image reflects the different perceived sensitivities of the HVS. The other way explores signal fidelity criteria that are not based on assumptions about an HVS model but are motivated instead by the need to capture the loss of signal structure that the HVS hypothetically extracts for cognitive understanding [5].

In this paper, we propose a new framework for PIQA based on the total variation (TV) model in the spatial domain. The proposed PIQA metric considers two human visual sensitivity factors: image structures and luminance changes in enclosed regions. For natural image signals, an evaluation metric needs to consider the characteristics of the image itself, such as image structure and content, to reflect the visual complexity of the image. From the HVS perspective, another important factor is the luminance change of smooth and enclosed regions, as the HVS is very sensitive to luminance change. Based on these ideas, we propose the TVPIQA metric in the spatial domain, in which we use TV to describe the image structure and the energy of enclosed regions in the difference image to measure luminance changes. TVPIQA is then represented by the weighted sum of these two factors.

The proposed metric makes two major contributions. First, we introduce the TV model to assess image structure. The TVs of a distorted image and its reference image are compared to measure the distance between their structural characteristics. Because TV describes edges well, the proposed TVPIQA metric captures image structure information effectively. Second, the luminance changes in enclosed regions are also considered in TVPIQA. The energy of enclosed regions in a difference image is used to measure the missing luminance information, to which the human visual system is sensitive. In addition, to bring the TVPIQA metric closer to human perception, the energy of isolated pixels in the difference image is removed based on the idea of just noticeable distortion (JND), and a fast approximation method for calculating the energy of the difference image is also proposed.

We demonstrate our TVPIQA metric by presenting performance results on four subjective databases (Cornell-A57 [6, 7], IVC [8, 9], TID2008 [10, 11], and CSIQ [12, 13]) and comparisons with seven widely used image metrics (PSNR, SSIM [1], IW-PSNR [4], IW-SSIM [4], MS-SSIM [14], VSNR [7], and VIF [5]). Experimental results demonstrate that the TVPIQA metric outperforms other state-of-the-art metrics. It is worth noting that the proposed metric is easy to compute in the spatial domain and does not need any additional information.

The rest of the paper is organized as follows. Section 2 presents related work. The PIQA metric based on the TV model is given in Section 3, together with the implementation details of the TVPIQA metric. The characteristics of the TVPIQA metric are analyzed in Section 4, and its performance is evaluated and discussed in Section 5. We conclude the paper in Section 6.

2. Related Work

According to the methodologies being considered, PIQA metrics can be divided into two categories: HVS feature based modeling and signal driven approaches [15]. In HVS feature based modeling, PIQA metrics are developed from systematic modeling of relevant psychophysical properties and physiological knowledge, including temporal/spatial/color decomposition, contrast sensitivity function (CSF), luminance adaptation, and various masking effects [15]. A number of HVS based methods have been proposed in the literature [16–19]. Some have also considered the JND model [20, 21]. HVS based methods extrapolate the vision models that have been proposed in the visual psychology literature to PIQA. However, HVS feature based methods involve expensive computation and face difficulties due to the gap between the knowledge available from vision research and the needs of engineering modeling [15].

Recently, much research effort has concentrated on signal driven PIQA metrics, which are designed from the viewpoint of signal extraction and analysis, such as statistical features, structural distortion, and so forth [1, 4, 22, 23]. Signal driven methods do not attempt to build a comprehensive HVS model for quality evaluation. These metrics look at how to represent image features to estimate overall quality, and they often consider psychophysical effects as well, usually based on image content and distortion analysis. However, although some signal fidelity metrics reflect changes in picture quality, they fail to predict HVS perception [15]. For example, not every image change is noticeable and leads to distortion. Therefore, signal driven methods need HVS features to help tackle these problems, so that they can better approximate perceptual quality evaluation.

Variational methods have been extremely successful in a wide variety of fields in image processing and computer vision during the last decades [24, 25]. The TV model was first introduced by Rudin, Osher, and Fatemi (ROF) in their pioneering work on edge preserving [26]. For an image $u$, its TV can be formulated by

$$\mathrm{TV}(u) = \int_{\Omega} |\nabla u| \, dx \, dy,$$

where $\Omega$ denotes the image domain and $\nabla$ means the gradient operator.

Many research results have shown that the proper norm for an image is the total variation norm, which is essentially the $L_1$ norm of the derivative and is more appropriate for estimating and describing images at discontinuities [24, 25]. The advantages of the TV norm led us to consider using it to measure image structure change, that is, the distance between a distorted image and its original image. The proposed TVPIQA metric focuses on two human visual sensitivity factors, that is, image structures and luminance changes in enclosed regions. A significant difference between the TVPIQA metric and other PIQA metrics is that the TV model is introduced to assess image structures in the spatial domain. Moreover, luminance changes in enclosed regions are also considered.

3. PIQA Metric Based on TV Model

Many signal driven PIQA metrics have been investigated based on the assumption that the loss of perceptual quality is related to the visibility of error signals. The mean squared error (MSE) and the peak signal-to-noise ratio (PSNR) have long been the most commonly used metrics to evaluate image quality. However, MSE performs poorly in assessing perceptual image quality [1]. From Figure 1, where the original “Lena” image is degraded with different distortions, we can see that MSE cannot reflect the perceptual quality of an image. The motivation of this paper is to design an appropriate measure for some HVS characteristics, especially image structures and luminance, and to develop a novel PIQA metric.

3.1. Framework of Proposed TVPIQA Metric

When a natural image is observed through HVS, the subjective quality measure is affected by many factors. Because human eyes are sensitive to changes of image edges, especially the edge location information, and changes of luminance contrast [27], there are two main factors worthy of attention. One is image edge (structure) information, and the other is luminance information. Therefore, we propose a new TVPIQA metric to measure these two factors and provide a good approximation to perceived image distortion.

Figure 2 shows the framework of the proposed TVPIQA metric, which is separated into two parts: the structure part and the local region luminance part. In the structure part, the normalized TV comparison between the distorted image and the reference image, denoted by $S$, is applied to represent the image structure changes. On the other hand, the energy of the enclosed regions in a difference image is used to measure the luminance changes; it is also normalized and is denoted by $E$. Then, the proposed TVPIQA metric is the mean of these two parts as follows:

$$\mathrm{TVPIQA} = \frac{S + E}{2}.$$

3.2. TV Based Structure Measure

Let and represent the degraded image and its reference image, respectively. Because TV norm is very appropriate for image description in discontinuities, the structure’s change is measured by the TV difference between a reference image and its degraded image where represents norm and denotes the total variation of the image , expressed in discrete form where represents the intensity value at pixel .
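
As an illustration only, the TV based structure difference can be sketched in Python as follows. The sketch assumes an isotropic forward-difference discretization with zero differences at the image border, and it takes the absolute difference of the two TV values as the distance. These choices, as well as the function names `total_variation` and `tv_difference`, are assumptions made for this sketch and may differ from the exact discrete form used by the proposed measure.

```python
import math


def total_variation(img):
    """Discrete TV of a grayscale image given as a list of equal-length rows.

    Assumption: isotropic TV with forward differences; the difference is
    taken as zero where the forward neighbor falls outside the image.
    """
    rows, cols = len(img), len(img[0])
    tv = 0.0
    for i in range(rows):
        for j in range(cols):
            # Forward difference along the row index (vertical direction).
            dv = img[i + 1][j] - img[i][j] if i + 1 < rows else 0.0
            # Forward difference along the column index (horizontal direction).
            dh = img[i][j + 1] - img[i][j] if j + 1 < cols else 0.0
            tv += math.hypot(dv, dh)
    return tv


def tv_difference(reference, distorted):
    """Unnormalized structure change: absolute difference of the two TVs."""
    return abs(total_variation(reference) - total_variation(distorted))
```

For a flat image the TV is zero, and for an image containing a single vertical step edge of height 1 the TV equals the number of rows crossed by the edge, which is why blurring or removing edges changes this quantity.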

Although the above measurement can work to assess the structure change, it is not a normalized measure and cannot be used as an evaluation to describe subjective feelings about image’s quality. Therefore, according to , the normalized perceptual distance for image structure is derived and defined by where is the image size and is a constant and set to 75 according to our experiments. It is obvious that .

3.3. Local Region Luminance Measure

A difference image represents the information loss in a distorted image and is defined by the difference between a reference image and its distorted image; that is, . Because HVS is also sensitive to luminance changes when observing an image, the energy of a difference image is used to measure the missing luminance information. Furthermore, based on the idea of JND modeling, that is, not every change in an image is noticeable [15], the isolated pixels’ energy in a difference image is filtered out. The energy measure of a difference image , which measures the luminance loss of a distorted image, is defined by where represents the enclosed regions in a difference image and is the intensity value at pixel in difference image. To compute the energy measure efficiently and avoid the judgment of the enclosed regions, we proposed an approximation of the difference image’s energy defined as where represents the whole regions in a difference image.
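
The idea of discarding isolated pixels before accumulating the energy of a difference image can be sketched as follows. This is a minimal sketch under stated assumptions, not the exact definition of the energy measure: energy is taken as the sum of squared differences, and a pixel is treated as isolated when it is nonzero but none of its eight neighbors is nonzero. The function names and the `skip_isolated` flag are hypothetical.

```python
def difference_image(reference, distorted):
    """Pixel-wise difference between a reference image and its distorted image."""
    return [[r - d for r, d in zip(ref_row, dis_row)]
            for ref_row, dis_row in zip(reference, distorted)]


def is_isolated(diff, i, j):
    """Assumed rule: a pixel is isolated if no 8-neighbor is nonzero."""
    rows, cols = len(diff), len(diff[0])
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            if di == 0 and dj == 0:
                continue
            ni, nj = i + di, j + dj
            if 0 <= ni < rows and 0 <= nj < cols and diff[ni][nj] != 0:
                return False
    return True


def energy(diff, skip_isolated=True):
    """Sum of squared differences (assumed energy), optionally ignoring
    isolated pixels; skip_isolated=False sums over the whole image."""
    total = 0.0
    for i, row in enumerate(diff):
        for j, value in enumerate(row):
            if value == 0:
                continue
            if skip_isolated and is_isolated(diff, i, j):
                continue
            total += value * value
    return total
```

Calling `energy(diff, skip_isolated=False)` avoids the neighborhood test entirely, which mirrors the motivation for using a whole-image approximation when the two quantities are highly correlated.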

To test the relationship between and , we applied these two functions to TID2008 image database [10, 11], which includes 1700 distorted images generated from 25 reference images with 17 distortion types at four distortion levels. Figure 3(a) shows the relationship between and . The value of correlation coefficient is 0.9988, indicating that is highly related to . In Figure 3(b), the horizontal axis indicates the image number and the vertical axis indicates the energy value. Figure 3(b) shows the change curves of and plotted by all distorted images in TID2008. From Figure 3(b), we can see that and almost have the same change trend. Therefore, we can use , instead of , to measure the energy of a difference image.

Considering that human perception is more sensitive to luminance contrast rather than to absolute luminance, we adjust according to the mean intensity of a difference image where represents the mean intensity of a difference image; that is, .

In order to obtain the normalized measure for luminance change, we need to find the maximum difference image’s energy according to the reference image. In the case of consistency of the overall image energy, we assume that the distorted image, in which intensity values in all pixels are the same and equal to the mean intensity of the reference image, corresponds to the maximum luminance change. Based on this assumption, the maximum difference image is denoted by . It describes the maximum loss of luminance information in an original reference image. Then, the normalized perceptual distance measure for image luminance change is defined by where , computed by (8), represents the energy measure of the maximum difference image . Obviously, . The higher the value of is, the less luminance information lose.
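
The maximum difference image described above follows directly from the stated assumption: the worst-case distorted image is constant and equal to the mean intensity of the reference image. A minimal sketch is given below; the function name `max_difference_image` is hypothetical, and the normalization step itself is not reproduced here.

```python
def max_difference_image(reference):
    """Difference between a reference image and a constant image whose value
    is the mean intensity of the reference (the assumed worst-case distortion)."""
    pixel_count = sum(len(row) for row in reference)
    mean_intensity = sum(sum(row) for row in reference) / pixel_count
    return [[value - mean_intensity for value in row] for row in reference]
```

By construction the entries of the returned image sum to zero, so its energy reflects only how far the reference departs from its own mean level.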

4. Analysis of the Proposed TVPIQA Measure

This section analyzes some properties of the proposed TVPIQA measure, such as symmetry, boundedness, and unique maximum. It also discusses how the TVPIQA measure behaves differently when evaluating different types of distorted images (contrast change, noise contamination, and blurring).

4.1. Properties of TVPIQA Measure

Symmetry. Because the structure measure function is derived by , is obvious. On the other hand, the energy function satisfies the symmetry; that is, , and the energy measure function also satisfies the symmetry. Therefore, the proposed TVPIQA measure, , is symmetric.

Boundedness and Unique Maximum. According to the definition of and in Section 3, measure and measure . So, TVPIQA metric is bounded; that is, . Only when a distorted image is the same as its reference image, the structure measure and energy measure ; that is, if and only if , .

4.2. TVPIQA Measure for Different Types of Distortions

Because only a limited number of factors can be considered when designing a PIQA metric, and because the understanding of the HVS is limited, no PIQA metric can measure all kinds of distortions on the same measurement scale. A PIQA metric may be lenient toward some distortions while being critical of others. In our extensive experiments, the proposed TVPIQA metric is more critical in evaluating contrast distortions than in evaluating other distortions. This behavior of TVPIQA reflects the contrast sensitivity characteristic of the HVS.

Figure 4 shows different levels of contrast distortions for an artificial image. In Figure 4, measures structure changes in the distorted images, and measures luminance changes. TVPIQA and SSIM measure the distances from distorted images to the reference image. From Figure 4, we can see that structure measure in the distorted images, , changes extremely slowly. However, the energy measure in the distorted images, , declines rapidly. The Fourier spectrums of the distorted images also show the same changes. Figure 5 shows another example, and the test image is “1600” images in CSIQ database. The same situation can be observed from Figures 5(b), 5(c), and 5(d) that changes slowly and drops dramatically. For the same distortion level with different distortion types, shown in Figures 5(d), 5(e), and 5(f), the contrast distortion has the largest drop between luminance measure and structure measure.

According to Figures 4 and 5, the TVPIQA measure shows different measurement scales for contrast distortions and for other distortions, such as noise contamination and blurring. However, this does not indicate poor performance of the TVPIQA measure, because, for HVS perception, changes of the same magnitude do not all yield the same perceptual effect [15]. Therefore, to evaluate the performance of the TVPIQA measure, the contrast distortion will be discussed separately in the next section.

5. Experimental Results

In this section, we validate the performance of the proposed TVPIQA measure and compare it with seven other IQA measures, that is, PSNR, SSIM [1], IW-PSNR [4], IW-SSIM [4], MS-SSIM [14], VSNR [7], and VIF [5]. PSNR is widely used in the image processing field and is a useful baseline. SSIM, MS-SSIM, visual signal-to-noise ratio (VSNR), and VIF are state-of-the-art measures that have demonstrated competitive performance. The information content weighted PSNR (IW-PSNR) and information content weighted SSIM (IW-SSIM) measures have been reported to have the best overall performance compared with previous measures. IW-PSNR and IW-SSIM are therefore also good benchmarks for evaluating the new TVPIQA measure.

The proposed TVPIQA measure and the seven other measures are evaluated on four publicly available subjective image databases that are widely recognized in the IQA research community, that is, Cornell-A57 [6, 7], IVC [8, 9], TID2008 [10, 11], and CSIQ [12, 13]. Two different types of subjective quality scores are used in these image databases: MOS and differential MOS (DMOS).

The Cornell-A57 database [6, 7] was created at Cornell University. It contains 54 distorted images with six types of distortions including quantization distortion, noise contamination, and blurring. The IVC database [8, 9] includes 185 distorted images generated from ten original images. There are four types of distortions: (a) JPEG compression, (b) JPEG2000 compression, (c) local adaptive resolution (LAR) coding, and (d) blurring. The Tampere Image Database 2008 (TID2008), introduced in Section 3.3, is intended for evaluating full-reference image visual quality assessment metrics. It has 17 types of distortions, such as noise distortion, blur distortion, and contrast change. The Categorical Subjective Image Quality (CSIQ) database [12, 13] was developed at Oklahoma State University. It consists of 30 original images and 866 distorted images using six different types of distortions at four to five different levels of distortion. The distortion types include JPEG compression, JPEG2000 compression, global contrast decrements, additive pink Gaussian noise, and Gaussian blurring.

The performance of any objective visual quality assessment metric is evaluated by measuring its correlation with human perception. To do so, the objective quality scores of IQA metrics are correlated with subjective scores using statistical measures. To compare the performance of different IQA measures, three evaluation metrics are used in our experiments: the correlation coefficient (CC), Spearman’s rank correlation coefficient (SRCC), and Kendall’s rank correlation coefficient (KRCC), defined as follows.

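The correlation coefficient evaluates the prediction accuracy and measures the linear dependence between the subjective and the objective scores. CC is defined as

$$\mathrm{CC} = \frac{\sum_{i=1}^{n} (s_i - \bar{s})(o_i - \bar{o})}{\sqrt{\sum_{i=1}^{n} (s_i - \bar{s})^2}\,\sqrt{\sum_{i=1}^{n} (o_i - \bar{o})^2}},$$

where $n$ is the number of images, $\bar{s}$ is the mean value of the subjective scores $s_i$, and $\bar{o}$ is the mean value of the objective scores $o_i$.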
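
For concreteness, the linear correlation between subjective and objective scores can be computed as in the following sketch, which assumes the standard Pearson form of the correlation coefficient. The function name `pearson_cc` is introduced here for illustration only.

```python
import math


def pearson_cc(subjective, objective):
    """Pearson linear correlation between two equally long score lists.

    Raises ZeroDivisionError if either list is constant, because the
    coefficient is undefined in that case.
    """
    n = len(subjective)
    if n != len(objective) or n < 2:
        raise ValueError("need two score lists of equal length >= 2")
    mean_s = sum(subjective) / n
    mean_o = sum(objective) / n
    covariance = sum((s - mean_s) * (o - mean_o)
                     for s, o in zip(subjective, objective))
    var_s = sum((s - mean_s) ** 2 for s in subjective)
    var_o = sum((o - mean_o) ** 2 for o in objective)
    return covariance / math.sqrt(var_s * var_o)
```

Because CC measures linear dependence, objective scores that are related to the subjective scores by a monotonic but nonlinear mapping yield a value below 1 even when the ranking is perfect.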

Spearman’s rank correlation coefficient measures the prediction monotonicity [28]. SRCC is given by

$$\mathrm{SRCC} = 1 - \frac{6 \sum_{i=1}^{n} (p_i - q_i)^2}{n(n^2 - 1)},$$

where $n$ is the number of images and $p_i$ and $q_i$ are the $i$th image’s ranks in the subjective and objective evaluations, respectively. SRCC is a nonparametric rank-based correlation metric, independent of any monotonic nonlinear mapping between subjective and objective scores [4].
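
A rank-based correlation of this kind can be sketched as follows, assuming the common closed form based on squared rank differences, which is exact only when there are no tied scores. The helper `ranks` and the function name `srcc` are hypothetical names used for this sketch.

```python
def ranks(scores):
    """Ranks 1..n in ascending score order (assumes all scores are distinct)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    result = [0] * len(scores)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def srcc(subjective, objective):
    """Spearman rank correlation via 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(subjective)
    if n != len(objective) or n < 2:
        raise ValueError("need two score lists of equal length >= 2")
    rank_s = ranks(subjective)
    rank_o = ranks(objective)
    squared_diff = sum((a - b) ** 2 for a, b in zip(rank_s, rank_o))
    return 1 - 6 * squared_diff / (n * (n * n - 1))
```

Since only ranks enter the computation, squaring the objective scores (a monotonic mapping for positive values) leaves the result unchanged.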

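Kendall’s rank correlation coefficient (KRCC) is another nonparametric rank correlation metric, computed by

$$\mathrm{KRCC} = \frac{n_c - n_d}{\frac{1}{2} n (n - 1)},$$

where $n$ is the number of images and $n_c$ and $n_d$ are the numbers of concordant and discordant pairs in the data set, respectively.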
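
Counting concordant and discordant pairs can be sketched directly, as below. The sketch assumes the simple form in which the difference of the two counts is divided by the total number of pairs, so tied pairs count as neither concordant nor discordant; the function name `krcc` is hypothetical.

```python
def krcc(subjective, objective):
    """Kendall rank correlation: (concordant - discordant) / (n * (n - 1) / 2)."""
    n = len(subjective)
    if n != len(objective) or n < 2:
        raise ValueError("need two score lists of equal length >= 2")
    concordant = 0
    discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            # A pair is concordant when both score lists order it the same way.
            product = ((subjective[i] - subjective[j])
                       * (objective[i] - objective[j]))
            if product > 0:
                concordant += 1
            elif product < 0:
                discordant += 1
    return (concordant - discordant) / (0.5 * n * (n - 1))
```

The double loop visits every pair once, so the cost grows quadratically with the number of images, which is acceptable for databases of the sizes used here.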

According to the above definitions, larger CC, SRCC, and KRCC values, close to 1, indicate that the objective and subjective scores correlate better, that is, that the IQA metric performs better. In our performance comparisons, the CC, SRCC, and KRCC values of the seven metrics are either computed or taken from published research [29–31]. Among these metrics, the results for PSNR, IW-PSNR, SSIM, MS-SSIM, and IW-SSIM are taken from Zhou Wang’s research [29], while those for VSNR and VIF are taken from Lin Zhang’s research [30, 31].

Table 1 shows our test results of eight IQA metrics using four databases. As analyzed in Section 4, the contrast distorted images in TID2008 and CSIQ databases are discussed separately. For each evaluation metric in each test, we highlight our proposed TVPIQA metric and the best of other seven metrics with boldface.

From the experimental results in Table 1, we make the following observations. The TVPIQA metric has similar performance to IW-SSIM on the Cornell-A57 and IVC databases. Without considering the contrast distortion, the proposed TVPIQA metric has the best performance on the TID2008 and CSIQ databases. Moreover, the TVPIQA metric has the highest SRCC value in evaluating the contrast distortion in the CSIQ database. However, many metrics do not work well on the contrast distortion in the TID2008 database. The main reason is that contrast-enhanced images in TID2008 have high subjective scores for HVS perception, while they are measured as distorted by some objective evaluation metrics.

To evaluate the overall performance of IQA metrics under comparison, Table 2 presents the average CC, SRCC, and KRCC results over four databases, where the average values are computed in two cases. In the first case, without considering the contrast distortion, the correlation scores are computed by the size weighted average method. Different weights are given to different databases, depending upon their sizes (measured as the numbers of images, i.e., 54 for Cornell-A57, 185 for IVC, 1600 for TID2008, and 750 for CSIQ databases), while in the second case, the contrast distortions in TID2008 and CSIQ are also considered, and the correlation scores are still averaged according to the numbers of images, 100 for TID2008 contrast distortion and 116 for CSIQ contrast distortion.
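
The size weighted averaging described above can be sketched as follows. The database sizes are those stated in the text for the first case; the function name `size_weighted_average` and the dictionary `DATABASE_SIZES` are hypothetical, and no correlation values from the tables are reproduced.

```python
# Numbers of images per database for the case without contrast distortion,
# as stated in the text.
DATABASE_SIZES = {
    "Cornell-A57": 54,
    "IVC": 185,
    "TID2008": 1600,
    "CSIQ": 750,
}


def size_weighted_average(scores, sizes):
    """Average of per-database correlation scores weighted by database size."""
    if len(scores) != len(sizes) or not scores:
        raise ValueError("scores and sizes must be non-empty and equally long")
    total_size = sum(sizes)
    return sum(score * size for score, size in zip(scores, sizes)) / total_size
```

With this weighting, TID2008 contributes 1600 of the 2589 images in the first case, so its correlation scores dominate the overall average.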

From Table 2, it can be observed that our proposed TVPIQA metric has better overall performance than the other IQA metrics. Although IW-SSIM and TVPIQA have nearly the same performance in the test results, it is worth mentioning that, from the viewpoint of computational complexity, the TVPIQA metric achieves this performance only by computing the image structure and energy information in the spatial domain, which does not need any preprocessing, such as image transform operations, extra image analysis operations, and so forth. In contrast, IW-SSIM needs more computation time to compute information content weights.

6. Conclusions

In this paper, we propose a new framework for a TV based perceptual image quality assessment measure in the spatial domain. The proposed TVPIQA measure focuses on two human visual sensitivity factors, image structures and luminance changes. A significant difference between the TVPIQA measure and other IQA measures is that the TV model is introduced to assess image structures. Meanwhile, the energy of enclosed regions in a difference image is used to measure the missing luminance information, to which the human visual system is also sensitive. Extensive experimental results on four publicly available independent image databases demonstrate that the proposed TVPIQA measure achieves the best overall performance when compared with seven other popular IQA measures.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work is partially supported by National Natural Science Foundation of China (Grant no. 61303127), Western Light Talent Culture Project of Chinese Academy of Sciences (Grant no. 13ZS0106), Project of Science and Technology Department of Sichuan Province (Grant nos. 2011JQ0041 and 11ZS2009), and Key Program of Education Department of Sichuan Province (Grant nos. 11ZA130 and 13ZA0169).