Abstract

Assessing the quality of stereoscopic images or videos is challenging due to complex 3D quality factors. In this paper, we explore how to extract effective features to enhance the prediction accuracy of perceptual quality assessment. Inspired by the structure representation of the human visual system and machine learning techniques, we propose a no-reference quality assessment scheme for stereoscopic images. More specifically, statistical features of the gradient magnitude and Laplacian of Gaussian responses are extracted to form binocular quality-predictive features. After feature extraction, the features of a distorted stereoscopic image and its human perceptual score are used to construct a statistical regression model with a machine learning technique. Experimental results on benchmark databases show that the proposed model generates image quality predictions well correlated with human visual perception and delivers highly competitive performance compared with typical and representative methods. The proposed scheme can further be applied to real-world applications in video broadcasting and the 3D multimedia industry.

1. Introduction

During the past few decades, there has been an exponential increase in stereoscopic images and videos in the 3D display market [1]. However, due to various 3D quality factors [2, 3], including binocular rivalry, visual comfort, and depth perception, the visual quality assessment of stereoscopic images is much more complex and relatively less researched than traditional 2D image quality evaluation. To address these challenges, we require a deeper understanding of binocular vision mechanisms and interactions for the quality prediction of distorted stereoscopic images.

There are mainly two groups of methods for 3D image quality assessment (IQA): subjective quality evaluation by human observers [4] and objective quality evaluation by devised metrics that simulate human perceptual judgements [5]. Since the human eyes are the final receiver of visual information, subjective evaluation directly reflects human visual perception and is an accurate and effective way to evaluate visual quality. However, subjective evaluation involves many participants over the course of the experiments, which is time-consuming and costly. It is therefore unrealistic in many scenarios, such as real-time evaluation [6]. As a result, objective methods that can effectively evaluate the human perceptual quality of stereoscopic images are in urgent demand.

Based on the volume of accessible information in the images, existing objective quality assessment metrics can be generally divided into three categories: full-reference (FR) [7, 8], reduced-reference (RR) [9], and no-reference/blind (NR) methods [10, 11]. When the reference contents are accessible, the FR method can offer more accurate quality assessment. Early approaches for FR 3D-IQA directly stemmed from 2D quality metrics [12]. Conventionally, a straightforward way is to apply the 2D-IQA metrics to both views of a 3D image independently and then integrate the two 2D quality scores into a final 3D quality score. Several 3D-IQA methods [13, 14] were proposed by introducing the associated disparity or depth map into the 3D image quality model. These research findings indicated that a satisfactory result can be obtained if the disparity images and reference images are combined appropriately. Afterwards, more sophisticated algorithms were developed based on the binocular vision properties. For example, Lin and Wu [15] revisited the physiological discoveries of binocular vision and incorporated the binocular integration into the existing 2D quality metrics for measuring the quality of stereoscopic images. Shao et al. [16] classified the stereoscopic images into noncorresponding, binocular fusion, and binocular suppression regions. Each region was evaluated individually according to its binocular perception property. In our previous work [17], we proposed a full-reference quality evaluator by considering the local and global qualities of 3D images. The experimental results showed its good performance in terms of stereoscopic image quality assessment.

Since pristine reference images are rarely available in practical applications [18, 19], NR algorithms are potentially much more feasible solutions. They can give a quality evaluation without any information extracted from the corresponding pristine image. Research on NR 3D-IQA is still preliminary, and only a limited number of blind 3D-IQA algorithms have been developed. Inspired by the human visual system, Chen et al. [20] proposed a no-reference binocular image quality assessment method for natural stereopairs. The proposed method extracted both 2D and 3D natural statistical features from a stereopair and utilized these statistical features and binocular rivalry for 3D image quality prediction. Ryu and Sohn [21] investigated the relationship between visual information and binocular quality perception and developed an NR quality evaluation algorithm for 3D images. The scores of perceptual blockiness and blurriness were combined into an overall quality index based on binocular perception models. Shen et al. [22] devised a no-reference quality scheme for stereoscopic images based on visual perceptual characteristics. Three types of features relating to image distortion, depth perception, and binocular disparity were used to map the human opinion scores. Other relevant works can be found in references [23–25].

Recently, machine learning techniques have achieved great success and been widely applied in various research fields [26–28]. One of the advantages of applying machine learning to quality evaluation is that it can directly take original image data as input and combine feature learning with quality regression in the training procedure [29, 30]. Kang et al. [31] applied a convolutional neural network (CNN) to image quality assessment. They devised a shallow network which extracts quality-predictive features from image patches. Several NR algorithms for 3D-IQA using deep learning have been developed. Oh et al. [32] reported a novel deep learning method for NR 3D-IQA based on local-to-global feature aggregation. Zhou et al. [33] proposed a dual-stream interactive network for stereoscopic image quality assessment. In our previous work [34], we developed a no-reference quality prediction scheme for 3D images based on binocular features and support vector regression (SVR). The scheme showed its effectiveness, but its prediction accuracy and time complexity need to be further improved. More efficient methods for stereoscopic image quality assessment should be explored to address these limitations.

It is challenging for NR algorithms to attain assessment accuracy as good as that of FR quality evaluation methods. Moreover, 3D image quality databases generally lack large-scale training images with subjective quality scores, which limits the performance of algorithms using deep neural networks. Other techniques are therefore worth exploring. We are motivated to tackle these limitations for 3D image quality assessment. In this paper, inspired by research findings on the human binocular visual system, we attempt to simulate the perceptual mechanism of binocular vision. We primarily work on extracting certain types of binocular features from a distorted stereoscopic image and constructing a statistical regression model to map these quality-aware features to human perceptual judgements. The main contributions of this work are as follows:
(1) Different from other related studies [33, 35], the novelty of our work lies in adopting effective binocular statistical features from the fusion and difference maps of a stereopair for stereoscopic image quality prediction.
(2) We demonstrate that an appropriate combination of binocular features and binocular energy can greatly promote the performance of 3D image quality evaluation.
(3) Compared with other typical and representative methods, the proposed scheme achieves higher consistency with human subjective assessment and has lower time complexity. The experimental results show that our scheme can accurately estimate the perceptual quality of distorted stereoscopic images and has promising generalization ability.

The remainder of this paper is organized as follows. Section 2 introduces some fundamental knowledge about binocular visual perception. Section 3 presents the proposed quality assessment scheme for stereoscopic images in detail. Section 4 gives the experimental results and performance analysis of the proposed scheme and the comparison with other related algorithms. Finally, Section 5 concludes the paper with possible ideas for future work.

2. Foundation for Binocular Visual Perception

Binocular vision is a complex visual process that requires the brain and both eyes to work together to produce clear vision. Figure 1 depicts a simplified framework of the two important visual neural pathways of the binocular visual system. The ventral stream starts from the primary visual cortex V1 and goes through V2 and V3 to the V4 area. The functions of the ventral stream relate to recognition and perception behaviors. The dorsal stream also begins from the V1 area and goes through V2 and V3 to the V5 area. Visual information-guided interactions occur in the dorsal stream [15]. For the detailed functions of each visual area, please refer to the binocular visual perception book [36].

The visual cortex plays an important role in binocular visual perception, and it has been demonstrated that the primary visual cortex (V1) is primarily responsible for early visual processing in the human visual system (HVS) [37]. In V1, simple and complex receptive fields are usually characterized to understand the behavior of visual perception. According to visual psychophysical studies, two visual phenomena usually occur in the process of binocular visual response: binocular rivalry and binocular fusion. When the two eyes view mismatched images at the same retinal location, one experiences binocular rivalry. As a result of competition between the eyes, binocular rivalry involves reciprocal inhibition between the monocular channels. When two slightly different retinal signals are perceived by the two eyes, one experiences binocular fusion. During fusion, the two retinal signals are integrated into a single perception, superimposing and combining similar contents from the two views. Therefore, binocular vision can generally be considered a combination of binocular rivalry and binocular fusion.

As significant primitives in V1, image structures are closely related to image visual quality, and the degradation of perceptual quality is reflected in changes of image structural information. Previous studies [29, 38] highlighted the significance of structural information for image quality assessment. The gradient magnitude (GM) and Laplacian of Gaussian (LOG) are basic elements commonly used to represent image semantic structures [39]. More importantly, during 3D visual stimulus processing, binocular fusion and disparity responses are primitively formed in the V1 cortical area. The visual signals from the binocular summation and subtraction channels are multiplexed, and each neuron in V1 receives a weighted sum of the visual stimuli from these two channels [40]. Motivated by these research results, in this paper, we extract GM and LOG features from a stereopair and its fusion and difference maps as binocular features. In the following section, we describe the proposed quality prediction model for distorted stereoscopic images in detail.

3. The Proposed No-Reference Quality Assessment Scheme

Figure 2 illustrates the architecture of the proposed scheme for stereoscopic image quality prediction. Given an original stereopair, we first generate the fusion map and difference map and extract the binocular statistical features from them as basic feature vectors. Then, we calculate the binocular energy responses from the local amplitude and local phase of the stereopair as quality-aware features. Finally, we employ an extreme learning machine method to map these features of distorted stereopair to its human perceptual quality score.

3.1. Binocular Feature Extraction

As can be seen from Figure 3, the fusion maps and difference maps of the left- and right-view images with different distortion types are discriminative, which can be utilized for extracting effective quality features. Specifically, the fusion map reflects the fusion ability of the left and right stereo-halves, while the difference map reveals the disparity information of a stereopair.

As discussed in Section 2, the gradient magnitude (GM) and Laplacian of Gaussian (LOG) features can be adopted to build the basic elements of image semantic structures, and they are hence closely related to the perceptual quality of natural images. The Gaussian derivative functions can model the receptive field responses of neurons along the visual pathway [41]. Therefore, we compute the GM and LOG maps using the first- and second-order derivatives of a circularly symmetric 2D Gaussian function defined as follows:

$$G(x, y \mid \sigma) = \frac{1}{2\pi\sigma^{2}} \exp\!\left(-\frac{x^{2}+y^{2}}{2\sigma^{2}}\right), \qquad (1)$$

where $x$ and $y$ represent the horizontal and vertical directions, respectively. The parameter $\sigma$ is the standard deviation. Then, we calculate the first-order partial derivative of $G(x, y \mid \sigma)$ with respect to $x$ or $y$ by

$$h_{d}(x, y \mid \sigma) = \frac{\partial G(x, y \mid \sigma)}{\partial d}, \quad d \in \{x, y\}, \qquad (2)$$

where $h_{d}$ is the Gaussian partial derivative filter applied along the horizontal or vertical direction. An image is denoted by $I_{v}$; thus, the GM map of the image can be obtained by

$$\mathrm{GM}_{v} = \sqrt{(I_{v} \otimes h_{x})^{2} + (I_{v} \otimes h_{y})^{2}}, \qquad (3)$$

where the symbol $\otimes$ represents the convolution operation and $v \in \{L, R\}$, where $L$ and $R$ refer to the left and right views of a stereopair, respectively. Similarly, the LOG filter, corresponding to the second-order Gaussian partial derivative, is defined as follows:

$$h_{\mathrm{LOG}}(x, y \mid \sigma) = \frac{\partial^{2} G(x, y \mid \sigma)}{\partial x^{2}} + \frac{\partial^{2} G(x, y \mid \sigma)}{\partial y^{2}}. \qquad (4)$$

Accordingly, we estimate the LOG map of the left and right views by

$$\mathrm{LOG}_{v} = I_{v} \otimes h_{\mathrm{LOG}}, \quad v \in \{L, R\}. \qquad (5)$$
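As a minimal sketch of the GM and LOG map computation described above, scipy's Gaussian-derivative and Laplacian-of-Gaussian filters can stand in for the analytic kernels; the value sigma = 0.5 here is an assumed setting for illustration, not necessarily the paper's.

```python
import numpy as np
from scipy import ndimage

def gm_log_maps(image, sigma=0.5):
    """Return the gradient-magnitude and LOG response maps of a 2D image."""
    # First-order Gaussian partial derivatives along y (axis 0) and x (axis 1).
    gy = ndimage.gaussian_filter(image, sigma, order=(1, 0))
    gx = ndimage.gaussian_filter(image, sigma, order=(0, 1))
    gm = np.hypot(gx, gy)                         # gradient magnitude map
    log = ndimage.gaussian_laplace(image, sigma)  # Laplacian-of-Gaussian map
    return gm, log

# Toy usage on a synthetic "view": a bright square on a dark background.
view = np.zeros((64, 64))
view[16:48, 16:48] = 1.0
gm, log = gm_log_maps(view)
print(gm.shape, log.shape)  # (64, 64) (64, 64)
```

In the full scheme these two maps would be computed for the left view, the right view, and the fusion and difference maps, then jointly normalized before feature pooling.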

Subsequently, a joint adaptive normalization procedure [39] is employed to normalize the GM and LOG coefficients for stable statistical image representations. Previous works [42, 43] have revealed that the overall quality of a distorted stereopair cannot be accurately calculated by directly averaging the qualities of the left- and right-view images, especially for an asymmetrically distorted 3D image. In practical application, to simulate the binocular rivalry (BR) phenomenon, the weighting factors for the quality-predictive feature vectors of a stereopair can be defined as follows:

$$W_{L} = \frac{E_{L}^{\lambda}}{E_{L}^{\lambda} + E_{R}^{\lambda}}, \qquad W_{R} = \frac{E_{R}^{\lambda}}{E_{L}^{\lambda} + E_{R}^{\lambda}}, \qquad (6)$$

where $W_{L}$ and $W_{R}$ denote the weights for the distorted left and right views, which can reflect the binocular contrast combination to a certain extent. $E_{L}$ and $E_{R}$ represent the local energy variances of the left and right images of a stereopair, respectively. The intensity adjusting parameter $\lambda$ is empirically set to 3 in the experiment. Therefore, the basic feature vectors of the gradient magnitude and Laplacian of Gaussian responses for a stereoscopic image can be calculated by

$$F_{\mathrm{GM}} = W_{L} \cdot f_{\mathrm{GM}}^{L} + W_{R} \cdot f_{\mathrm{GM}}^{R}, \qquad F_{\mathrm{LOG}} = W_{L} \cdot f_{\mathrm{LOG}}^{L} + W_{R} \cdot f_{\mathrm{LOG}}^{R}, \qquad (7)$$

where $f_{\mathrm{GM}}^{L/R}$ and $f_{\mathrm{LOG}}^{L/R}$ denote the gradient magnitude and Laplacian of Gaussian features for the left and right images, respectively. The features of the GM and LOG responses are utilized to represent the visual semantic structures of the first-order and second-order binocular combination.

Finally, combined with the GM and LOG features of the fusion and difference maps, the binocular feature vectors used for further data training can be expressed by

$$F_{B} = [F_{\mathrm{GM}}, F_{\mathrm{LOG}}, F_{f}, F_{d}], \qquad (8)$$

where $F_{f}$ and $F_{d}$ are the GM/LOG features of the fusion and difference maps, respectively.
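The rivalry-based weighting and feature combination described above can be sketched as follows; the function names are illustrative, and the scalar "local energy" inputs stand in for the per-view local energy variances.

```python
import numpy as np

def rivalry_weights(energy_left, energy_right, lam=3.0):
    """Binocular rivalry weights from the two views' local energies
    (lam = 3 follows the paper's empirical setting)."""
    el, er = energy_left ** lam, energy_right ** lam
    return el / (el + er), er / (el + er)

def combine_features(f_left, f_right, w_left, w_right):
    """Weighted binocular combination of per-view feature vectors."""
    return w_left * f_left + w_right * f_right

# Toy example: the left view has higher local energy, so it dominates.
wl, wr = rivalry_weights(2.0, 1.0)
f = combine_features(np.array([0.4, 0.8]), np.array([0.1, 0.3]), wl, wr)
print(round(wl, 3), round(wr, 3))  # 0.889 0.111
```

With lam = 3 the weighting is strongly nonlinear, so the less distorted (higher energy) view dominates the combined feature, mimicking the suppression seen in binocular rivalry.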

3.2. Binocular Energy Response

The features extracted above mainly indicate the visual sensitivity of binocular rivalry. Neurological research has reported that human binocular vision is a complicated process combining binocular rivalry, binocular fusion, and other factors [44]. Besides binocular rivalry, binocular fusion also contributes significantly to human visual perception. Previous research findings [45] indicated that binocular energy responses play a critical role in representing binocular visual perception, especially binocular fusion. In this paper, the binocular energy responses are obtained from the local amplitude and local phase of a stereopair.

In the proposed scheme, the left and right images of a stereopair are first processed using the log-Gabor filter. Here, we let $s = 1, \ldots, S$ denote the spatial scale index of the filter responses and $o = 1, \ldots, O$ denote the orientation index. A detailed description of this log-Gabor filter can be found in [46]. For a given scale and orientation, the local amplitude at location $(x, y)$ on scale $s$ and along orientation $o$ can be defined as

$$A_{s,o}(x, y) = \sqrt{\eta_{s,o}(x, y)^{2} + \xi_{s,o}(x, y)^{2}}, \qquad (9)$$

where $\eta_{s,o}$ and $\xi_{s,o}$ denote the even- and odd-symmetric log-Gabor filter responses, respectively.

With the sum of the local amplitudes over all the scales along the orientation $o_{m}$ [46], the local amplitude can be calculated by

$$LA(x, y) = \sum_{s=1}^{S} A_{s,o_{m}}(x, y), \qquad (10)$$

where $o_{m}$ is a parameter used to indicate the orientation with the maximum phase congruency value. Similar to the local amplitude, the local phase can be obtained from the angle along the orientation $o_{m}$ [46]:

$$LP(x, y) = \arctan\!\left(\frac{\sum_{s=1}^{S} \xi_{s,o_{m}}(x, y)}{\sum_{s=1}^{S} \eta_{s,o_{m}}(x, y)}\right). \qquad (11)$$
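A single-scale, single-orientation sketch of the log-Gabor decomposition used to obtain local amplitude and phase is shown below. The radial filter exp(-(log(f/f0))^2 / (2 log(k)^2)) with f0 = 0.1 and k = 0.55 uses assumed parameter values, not the paper's exact settings, and a one-sided frequency half-plane approximates an oriented (quadrature) filter so that the spatial response is complex.

```python
import numpy as np

def log_gabor_response(image, f0=0.1, k=0.55):
    """Return (local amplitude, local phase) of a crude oriented log-Gabor."""
    rows, cols = image.shape
    fy = np.fft.fftfreq(rows)[:, None]
    fx = np.fft.fftfreq(cols)[None, :]
    radius = np.hypot(fx, fy)
    radius[0, 0] = 1.0                    # avoid log(0) at the DC term
    gabor = np.exp(-(np.log(radius / f0) ** 2) / (2 * np.log(k) ** 2))
    # Keep one half-plane so the spatial response is complex (analytic-like),
    # roughly mimicking an oriented log-Gabor quadrature pair.
    gabor = np.where(fx >= 0, 2.0 * gabor, 0.0)
    gabor[0, 0] = 0.0                     # log-Gabor has no DC component
    response = np.fft.ifft2(np.fft.fft2(image) * gabor)
    return np.abs(response), np.angle(response)

amp, phase = log_gabor_response(np.random.default_rng(0).random((64, 64)))
print(amp.shape)  # (64, 64)
```

A full implementation would sum the amplitudes over several scales along the maximum-phase-congruency orientation, as in the equations above.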

Based on previous works on binocular vision energy [16, 34], the left-view response $R_{L}$ and right-view response $R_{R}$ of a stereopair can be defined as follows:

$$R_{L}(x, y) = LA_{L}(x, y)\, e^{\,i\, LP_{L}(x, y)}, \qquad R_{R}(x, y) = LA_{R}(x, y)\, e^{\,i\, LP_{R}(x, y)}. \qquad (12)$$

The right-view response $R_{R}$ can usually be taken as a shifted transformation of the left-view response $R_{L}$. The disparity $d$ is defined as the difference between the locations of associated points in the left- and right-view responses. By considering a simple binocular cell with left and right receptive fields, the binocular energy response for a stereoscopic image pair can be calculated by

$$E(x, y) = \left| R_{L}(x, y) + R_{R}(x - d, y) \right|^{2}. \qquad (13)$$
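The binocular energy computation described above — summing the two complex monocular responses and taking the squared modulus — can be sketched as follows; a fixed integer disparity shift `d` is assumed for illustration, whereas a real implementation would use an estimated disparity map.

```python
import numpy as np

def binocular_energy(amp_l, phase_l, amp_r, phase_r, d=0):
    """Binocular energy |R_L + shifted R_R|^2 from amplitude/phase maps."""
    r_left = amp_l * np.exp(1j * phase_l)     # complex left-view response
    r_right = amp_r * np.exp(1j * phase_r)    # complex right-view response
    r_right = np.roll(r_right, shift=d, axis=1)  # horizontal disparity shift
    return np.abs(r_left + r_right) ** 2

# In-phase responses of equal amplitude reinforce: |1 + 1|^2 = 4.
e = binocular_energy(np.ones((4, 4)), np.zeros((4, 4)),
                     np.ones((4, 4)), np.zeros((4, 4)))
print(e[0, 0])  # 4.0
```

Note how the energy rewards views whose local phases agree (binocular fusion) and shrinks when they are in anti-phase, which is what makes it a useful fusion-sensitive feature.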

Finally, the overall quality-predictive features are the concatenation of the binocular statistical features and the binocular energy features, which are fed into the following quality prediction stage for model learning. The weights among them are determined in the learning process.

3.3. Quality Prediction Model Learning

A number of training methods can be utilized to map the quality-predictive features of a stereopair to its corresponding subjective quality score, such as support vector regression (SVR) [47] and neural networks (NNs) [48]. SVR requires complex training algorithms and involves solving a quadratic programming problem. Neural networks suffer from local minima, many learning epochs, and slow convergence. A further issue is that neural networks and other training-based methods usually need large quantities of labeled training samples, while 3D image quality databases generally lack large-scale training images with subjective quality scores, which limits the performance of methods using deep neural networks. In recent years, the extreme learning machine (ELM) [49] has attracted considerable attention and has been demonstrated to be an effective and efficient technique in many applications, such as pattern recognition [50] and quality evaluation [51]. The ELM has the advantages of faster learning speed, higher learning accuracy, and improved generalization. The weights between the input and hidden layers can be selected randomly, independent of the training data, and layer-by-layer backpropagated tuning is not required [35]. Motivated by these unique properties, we employ the ELM for feature mapping and regression model learning in 3D image quality prediction.

For a given set of $N$ arbitrary training samples $\{(\mathbf{x}_{i}, t_{i})\}_{i=1}^{N}$, where $\mathbf{x}_{i}$ represents the quality-predictive features of the $i$th pair of original/distorted images and $t_{i}$ is the corresponding subjective quality score, our goal is to find a function which minimizes the deviation from the subjective quality score for all the training data. The function with $L$ hidden nodes can be mathematically modeled and expressed by

$$f_{L}(\mathbf{x}) = \sum_{i=1}^{L} \beta_{i} h_{i}(\mathbf{x}) = \mathbf{h}(\mathbf{x}) \boldsymbol{\beta}, \qquad (14)$$

where $\mathbf{h}(\mathbf{x})$ is the output vector of the hidden neurons and $\boldsymbol{\beta}$ denotes the output weighting vector between the output node and the hidden layer of $L$ nodes. The activation function can approximate the training samples by minimizing the training error and can be formulated as

$$h_{i}(\mathbf{x}) = g(\mathbf{w}_{i} \cdot \mathbf{x} + b_{i}), \qquad (15)$$

where $\mathbf{w}_{i}$ is the weighting vector which connects the input layer and the $i$th hidden node and $b_{i}$ denotes the corresponding threshold of the hidden node. In equation (14), $\boldsymbol{\beta}$ is the only parameter to be determined, which leads to fast learning for ELM [49]. For $N$ training samples, the mathematical model for ELM (equation (14)) can be described as follows:

$$\mathbf{H} \boldsymbol{\beta} = \mathbf{T}, \qquad (16)$$

where $\mathbf{T} = [t_{1}, \ldots, t_{N}]^{\mathrm{T}}$ represents the target vector and $\mathbf{H}$ is called the hidden layer output matrix, which can be defined as

$$\mathbf{H} = \begin{bmatrix} \mathbf{h}(\mathbf{x}_{1}) \\ \vdots \\ \mathbf{h}(\mathbf{x}_{N}) \end{bmatrix} = \begin{bmatrix} g(\mathbf{w}_{1} \cdot \mathbf{x}_{1} + b_{1}) & \cdots & g(\mathbf{w}_{L} \cdot \mathbf{x}_{1} + b_{L}) \\ \vdots & \ddots & \vdots \\ g(\mathbf{w}_{1} \cdot \mathbf{x}_{N} + b_{1}) & \cdots & g(\mathbf{w}_{L} \cdot \mathbf{x}_{N} + b_{L}) \end{bmatrix}. \qquad (17)$$

The minimal norm least-squares method is used in ELM to minimize the norm of the output weights. Then, the vector of the output weights can be predicted analytically and expressed by

$$\hat{\boldsymbol{\beta}} = \mathbf{H}^{\dagger} \mathbf{T}, \qquad (18)$$

where $\mathbf{H}^{\dagger}$ denotes the Moore–Penrose (MP) generalized pseudoinverse of the hidden layer output matrix $\mathbf{H}$. In practice, the orthogonal projection method [49] can be efficiently employed to calculate the Moore–Penrose inverse:

$$\mathbf{H}^{\dagger} = \mathbf{H}^{\mathrm{T}} \left( \mathbf{H} \mathbf{H}^{\mathrm{T}} \right)^{-1} \quad \text{or} \quad \mathbf{H}^{\dagger} = \left( \mathbf{H}^{\mathrm{T}} \mathbf{H} \right)^{-1} \mathbf{H}^{\mathrm{T}}. \qquad (19)$$

Based on ridge regression theory, a positive value $1/C$ is added to the diagonal of $\mathbf{H}\mathbf{H}^{\mathrm{T}}$ or $\mathbf{H}^{\mathrm{T}}\mathbf{H}$, which makes the solution more stable. Therefore, with this positive value $1/C$, we can obtain

$$\boldsymbol{\beta} = \mathbf{H}^{\mathrm{T}} \left( \frac{\mathbf{I}}{C} + \mathbf{H} \mathbf{H}^{\mathrm{T}} \right)^{-1} \mathbf{T}, \quad N \le L; \qquad \boldsymbol{\beta} = \left( \frac{\mathbf{I}}{C} + \mathbf{H}^{\mathrm{T}} \mathbf{H} \right)^{-1} \mathbf{H}^{\mathrm{T}} \mathbf{T}, \quad N \ge L, \qquad (20)$$

where $N$ represents the number of training samples and $L$ denotes the number of hidden nodes. In this paper, the number of hidden nodes $L$ is selected to be equal to the number of training samples $N$; as a result, the output weight vector is determined as $\boldsymbol{\beta} = \mathbf{H}^{\mathrm{T}} (\mathbf{I}/C + \mathbf{H}\mathbf{H}^{\mathrm{T}})^{-1} \mathbf{T}$ in the experiments. More details on the ELM can be found in [49].
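A minimal ELM regressor along the lines of the derivation above can be sketched as follows: random input weights, a sigmoid hidden layer, and the ridge-regularized solution beta = H^T (I/C + H H^T)^(-1) T. The class name, the regularization value C, and the toy target are illustrative assumptions.

```python
import numpy as np

class ELMRegressor:
    """Sketch of an extreme learning machine for scalar regression."""

    def __init__(self, n_hidden, C=1e6, seed=0):
        self.n_hidden, self.C = n_hidden, C
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Sigmoid hidden layer: g(w . x + b).
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, T):
        # Input weights and biases are random and never tuned.
        self.W = self.rng.standard_normal((X.shape[1], self.n_hidden))
        self.b = self.rng.standard_normal(self.n_hidden)
        H = self._hidden(X)
        n = H.shape[0]
        # Ridge-regularized pseudoinverse solution (N <= L case).
        self.beta = H.T @ np.linalg.solve(np.eye(n) / self.C + H @ H.T, T)
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta

# Toy regression: fit a smooth 1-D target with as many hidden nodes as samples.
X = np.linspace(-1, 1, 50)[:, None]
t = np.sin(3 * X[:, 0])
model = ELMRegressor(n_hidden=50).fit(X, t)
print("max training error:", float(np.max(np.abs(model.predict(X) - t))))
```

Because only the output weights are solved for, in closed form, training reduces to one linear solve, which is the source of the speed advantage cited above.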

4. Experimental Results and Analysis

In the experiments, we first describe the databases and criteria used for quality assessment. Then, we give the performance comparison with other related algorithms in terms of predicting the quality of distorted stereoscopic images. Moreover, we show the evaluation results on individual distortion type. In addition, we investigate the effect of each component in the proposed metric. Finally, we perform the cross-database evaluation and analyze the time complexity in our experiments.

4.1. Experimental Databases and Protocols

In order to verify and compare the performance of our proposed quality assessment metric, three public and subject-rated benchmark 3D image databases were used: LIVE 3D IQA database Phase I [52], LIVE 3D IQA database Phase II [20], and the MCL-3D database [53].
(1) LIVE 3D IQA database Phase I [52]: Phase I contains 20 reference stereopairs and 365 symmetrically distorted stereopairs covering five distortion types: JPEG compression, JPEG2000 (JP2K) compression, additive white noise (WN), Gaussian blur (GB), and a simulated fast-fading (FF) model. Each distorted stimulus has been evaluated by human observers and assigned a difference mean opinion score (DMOS) value; lower DMOS values represent higher visual quality.
(2) LIVE 3D IQA database Phase II [20]: Phase II has 120 symmetrically distorted stimuli and 240 asymmetrically distorted stimuli generated from 8 pristine stereopairs. Each of the five distortion types (JPEG, JP2K, WN, GB, and FF) is symmetrically and asymmetrically applied to the pristine stereopairs at various degradation levels. The corresponding DMOS values are also given for the distorted stereopairs.
(3) MCL-3D database [53]: This database consists of 684 stereoscopic image pairs. Nine image-plus-depth sources are selected, and a depth-image-based rendering technique is used to render the 3D images. Four levels of distortion are applied to either the depth map or the texture image prior to 3D image rendering. The distortion types are JPEG, JP2K, WN, Gaussian blur (GBLUR), downsampling blur (SBLUR), and transmission error (TERROR). Each distorted stimulus has been scored by human observers, and pairwise comparison is used to obtain reliable mean opinion score (MOS) values.

To benchmark the performance of quality assessment metrics, three general performance indicators were employed to provide quantitative evaluations: (1) Pearson's linear correlation coefficient (PLCC), which measures the linear dependence between the predicted quality scores and the ground-truth targets, (2) Spearman's rank-order correlation coefficient (SRCC), which serves as a measure of prediction monotonicity, and (3) Kendall's rank-order correlation coefficient (KRCC), which is a nonparametric rank-order-based correlation metric. Higher values of PLCC, SRCC, and KRCC represent better consistency with human perceptual quality ratings. For the nonlinear regression, a five-parameter logistic function [54] was applied to fit the predicted quality scores to the provided subjective quality scores.
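The three criteria are available directly in scipy.stats, as the sketch below shows; in practice PLCC is computed after the logistic fit described above, which is omitted here for brevity.

```python
import numpy as np
from scipy import stats

def quality_criteria(predicted, subjective):
    """Return (PLCC, SRCC, KRCC) between predicted and subjective scores."""
    plcc = stats.pearsonr(predicted, subjective)[0]
    srcc = stats.spearmanr(predicted, subjective)[0]
    krcc = stats.kendalltau(predicted, subjective)[0]
    return plcc, srcc, krcc

# Perfectly monotone predictions give SRCC = KRCC = 1 even when PLCC < 1.
pred = np.array([0.1, 0.4, 0.5, 0.9, 1.6])
dmos = np.array([10.0, 20.0, 25.0, 40.0, 55.0])
plcc, srcc, krcc = quality_criteria(pred, dmos)
print(round(srcc, 3), round(krcc, 3))  # 1.0 1.0
```

The toy data illustrates why both rank-based and linear criteria are reported: a metric can rank stimuli perfectly while still being nonlinearly related to the subjective scale.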

In the experiments, we randomly split each database into two nonoverlapping subsets: a training subset and a test subset. A training process was required to calibrate the quality prediction model. In each train-test procedure, a portion of the database content was selected for training and the remainder for testing. After learning the statistical regression model on the training set, the quality prediction performance was evaluated on the test set. Specifically, to avoid potential performance bias of the proposed scheme, the train-test iteration was repeated 1000 times, and the median values of PLCC, SRCC, and KRCC were chosen as the final validation results for performance evaluation. In the implementation, a unipolar sigmoidal function was used as the ELM nonlinear activation function.
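The repeated-split protocol can be sketched as follows. The 80/20 split fraction, the 100-iteration count, and the interpolation "model" are placeholders (the paper repeats the train-test iteration 1000 times with its ELM-based regressor); only the structure of the protocol is the point here.

```python
import numpy as np
from scipy import stats

def median_srcc(features, scores, train_fraction=0.8, iterations=100, seed=0):
    """Median SRCC over repeated random non-overlapping train/test splits."""
    rng = np.random.default_rng(seed)
    n = len(scores)
    results = []
    for _ in range(iterations):
        idx = rng.permutation(n)
        split = int(train_fraction * n)
        train, test = idx[:split], idx[split:]
        # Placeholder "model": 1-D interpolation over the training points.
        order = np.argsort(features[train])
        pred = np.interp(features[test], features[train][order],
                         scores[train][order])
        results.append(stats.spearmanr(pred, scores[test])[0])
    return float(np.median(results))

# Synthetic monotone data: the median SRCC should be close to 1.
features = np.linspace(0.0, 1.0, 200)
scores = features ** 2 + 0.001 * np.random.default_rng(1).random(200)
print(median_srcc(features, scores))
```

Reporting the median rather than the mean reduces the influence of occasional unlucky splits, which is why it is the common convention in IQA evaluation.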

4.2. Overall Performance Comparison

To comprehensively investigate the effectiveness and robustness of the proposed scheme, we have conducted several different experiments to compare our scheme with the typical and representative methods. These mainly include two 2D-IQA methods (PSNR and multiscale structural similarity (MS-SSIM) [55]), two FR 3D-IQA methods (Benoit et al.’s method [13] and Chen et al.’s method [8]), and three NR 3D-IQA methods (Zhou and Yu’s method [56], Fan et al.’s method [41], and Shen et al.’s method [22]). For the previous two 2D-IQA approaches, the predicted quality score of a stereoscopic image was obtained by averaging the left and right image qualities. For Benoit et al.’s approach [13], the disparity distortion was the global correlation between the original and distorted disparity maps. For Chen et al.’s approach [8], we adopted the cyclopean metric in terms of multiscale SSIM described in their paper.

Figure 4 provides the scatter plots of predicted quality scores against subjective DMOS values for the proposed scheme and the compared methods on the LIVE 3D IQA database Phase I. In these figures, the horizontal axis represents the predicted quality scores and the vertical axis denotes the subjective DMOS values of the perceived distortions. For performance comparison, a straight-line distribution of scatter points is better than other, arbitrary shapes. For the PSNR and MS-SSIM [55] approaches, the performance is generally worse than that of most other methods. This can be attributed to the fact that these methods treat the left- and right-view images independently and do not take binocular visual characteristics into account. For Benoit et al.'s approach [13], the quality evaluation accuracy is even lower than that of the 2D-IQA approaches under some distortions. One possible explanation is that applying a 2D image quality metric to disparity maps does not coincide with the human perception of disparity. Overall, the proposed scheme shows better consistency with human subjective judgements of stereoscopic 3D images on this database.

To further compare quality assessment accuracy on the three databases, we give the values of PLCC, SRCC, and KRCC between the provided and predicted quality scores for the proposed scheme and the compared methods. Table 1 presents the performance comparison results in terms of PLCC, SRCC, and KRCC on the three databases. In each case, the results of the best-performing metric are marked in bold. According to the experimental results in this table, Shen et al.'s method [22] performs best on the asymmetrically distorted stereoscopic images in the LIVE 3D IQA database Phase II, and our proposed scheme achieves higher consistency with human opinion scores on the other two databases. Moreover, the PLCC and SRCC values for our scheme are above 0.912 and 0.907, respectively, on all databases, which demonstrates that the proposed scheme can stably quantify and predict the perceptual distortions of 3D images. On the whole, the proposed scheme has competitive performance and shows statistical superiority over other typical and representative methods for 3D image quality prediction.

4.3. Distortion-Specific Performance Evaluation

In this section, we have investigated the distortion-specific performance of the proposed scheme and the compared methods for each individual distortion type on the hybrid distortion databases. The PLCC, SRCC, and KRCC comparison results are summarized in Tables 2–4, respectively. For reasons of space and brevity, M[x] is used in the tables to denote the compared method proposed in paper [x]. The top two quality assessment metrics for each index (PLCC, SRCC, or KRCC) are highlighted in bold. From these tables, we can find that our proposed metric achieves the highest hit-count for each index and is statistically superior to the compared methods. Some metrics have high assessment accuracy for specific distortion types: Chen et al.'s method [8] shows strong competitiveness on Gaussian blur, and Shen et al.'s method [22] has outstanding performance on JPEG compression. Our method remains comparable to the best-performing metrics for these kinds of distortion. The proposed scheme generally outperforms the vast majority of the compared methods by a certain margin in distortion-specific performance evaluation. From these experimental results, it is worth noting that the quality prediction of our scheme is largely independent of the distortion type.

4.4. Contribution of Each Component in the Proposed Scheme

In this section, to understand the respective contribution of each component to the overall quality score in the proposed metric, we have devised three different schemes for comparison, denoted scheme A, scheme B, and scheme C. For scheme A, the binocular features of the GM response and the binocular energy were used to measure visual quality. For scheme B, the binocular features of the LOG response and the binocular energy were adopted for quality prediction. For scheme C, the binocular energy was not included, and only the binocular features of the GM and LOG responses were considered for quality evaluation. The PLCC, SRCC, and KRCC results are reported in Table 5. As can be observed from this table, the binocular features of the LOG response have the most important impact on quality prediction across distortions. It can be inferred that the binocular features of the GM and LOG responses and the binocular energy are complementary, and adopting only one aspect of these features cannot achieve the best performance. In addition, scheme B achieves higher assessment accuracy than scheme A, which implies that the LOG features contain more useful visual information and contribute more to 3D quality prediction than the GM features. The experimental results also demonstrate that quality assessment performance can be promoted by an appropriate combination of binocular features and binocular energy.

4.5. Cross-Database Performance Evaluation

In the above experiments, the training and test subsets share the same distortions selected from each database. Since the proposed scheme is based on a learning framework, it is necessary to ascertain whether its performance is bound to the specific database on which it is trained. To verify the generalization ability and stability of our scheme, we have carried out cross-database experiments for performance evaluation. In these experiments, we examined whether satisfactory results could be obtained by applying the regression model trained on one database to the test set from another database. For brevity, the SRCC results of the cross-database performance evaluation are given in Table 6. It can be observed that the proposed metric has comparatively weak performance in comparison with the evaluation results in Table 3. This can mainly be attributed to the fact that the training and test subsets have different types of distortions. For instance, the LIVE Phase I database only has symmetrically distorted images, while the LIVE Phase II database contains both symmetric and asymmetric distortions. However, the values of the corresponding indicators are still relatively high, which shows that our framework can maintain satisfactory predictive capacity under different circumstances. Based on the above experimental results, it can be concluded that a larger training database with more comprehensive distortion types could probably further promote the prediction accuracy of our scheme.

In this section, we have also compared the cross-database performance of the proposed scheme with that of related methods. The SRCC results are provided in Table 7, where the top two metrics are marked in bold. According to these results, no matter which database is used for training, the cross-database performance of our scheme remains stable, and it offers quality prediction that is statistically more consistent with human perception than the compared methods across databases. These facts demonstrate the generalization ability and effectiveness of the proposed scheme for stereoscopic image quality assessment.

4.6. Time Complexity Analysis

Time complexity is an important indicator of the practicality of the proposed scheme, particularly for real-time applications such as monitoring and adjustment. We compared the computational complexity of our scheme with that of related methods. The experiment was performed in MATLAB R2014a on a Windows 10 PC with a 2.5 GHz Intel Core i7 processor and 8 GB RAM. The results are given in Table 8, which presents the running-time comparison on the LIVE Phase I database with 365 stereopairs. As can be seen from this table, the total processing time of the proposed scheme is 127 seconds, which means it takes less than 0.35 seconds to predict the quality of a distorted stereopair. Although it is not the fastest method, it achieves the best balance between accuracy and running time. The simulation results demonstrate that our scheme has lower computational complexity than most of the compared methods.
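The per-image figure follows directly from the reported totals, as the short arithmetic check below confirms:

```python
# Arithmetic check on the reported timing: 127 s total over 365 stereopairs.
total_seconds = 127.0
num_stereopairs = 365
per_pair = total_seconds / num_stereopairs
print(f"{per_pair:.3f} s per stereopair")  # about 0.348 s, i.e. under 0.35 s
assert per_pair < 0.35
```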

5. Conclusions

In this paper, we have presented a novel no-reference quality prediction method for stereoscopic images based on binocular statistical features and machine learning. The framework of the proposed scheme comprises a feature extraction stage and a feature mapping stage. The gradient magnitude and Laplacian of Gaussian responses of a stereopair and of its fusion and difference maps are utilized as quality-predictive features. With the extreme learning machine, a statistical regression model is established to map these binocular features of a stereopair to its perceptual quality score. The quality predictions of the proposed metric are highly correlated with subjective judgements for image pairs of various distortion types. Moreover, our method achieves excellent performance and a promising generalization ability. Owing to its practicality, the proposed scheme can be applied to video broadcasting and the 3D multimedia industry.
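The feature-mapping stage described above can be sketched with a minimal extreme learning machine: input weights and biases are drawn at random and fixed, and only the output weights are solved in closed form via the pseudo-inverse. The hidden-layer size, activation, and feature dimension below are illustrative assumptions, not the paper's settings.

```python
# Minimal ELM regression sketch for the feature-mapping stage.
# Hidden size, activation, and the 36-D feature vectors are assumptions.
import numpy as np

rng = np.random.default_rng(42)

def elm_train(X, y, hidden=50):
    W = rng.normal(size=(X.shape[1], hidden))  # random input weights (fixed)
    b = rng.normal(size=hidden)                # random hidden biases (fixed)
    H = np.tanh(X @ W + b)                     # hidden-layer activations
    beta = np.linalg.pinv(H) @ y               # closed-form output weights
    return W, b, beta

def elm_predict(model, X):
    W, b, beta = model
    return np.tanh(X @ W + b) @ beta

# Toy usage: map binocular feature vectors to quality scores
X = rng.normal(size=(120, 36))
y = X @ rng.normal(size=36)
model = elm_train(X, y)
pred = elm_predict(model, X)
print(f"train RMSE: {np.sqrt(np.mean((pred - y) ** 2)):.3f}")
```

Because the output weights are obtained in a single least-squares solve rather than by iterative back-propagation, training is fast, which is consistent with the low running time reported in Section 4.6.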

For future work, deeper structure representations of the human visual system and more efficient machine learning methods for visual quality prediction deserve further research. In addition, more effective quality features can be considered to better simulate human perception. Other 3D quality factors, such as depth perception and visual comfort, also merit further study.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this article.

Acknowledgments

This work was supported in part by the Key Research and Development Program of China under Grant no. 2018YFC0831000, the National Natural Science Foundation of China under Grant no. 62001267, the Natural Science Foundation of Shandong Province under Grant no. ZR2020QF013, the Shandong Provincial Key Research and Development Program (Major Scientific and Technological Innovation Project) under Grant no. 2019JZZY010119, and the Fundamental Research Funds of Shandong University under Grant no. 2020HW017.