Abstract

Polar harmonic transforms (PHTs) have been widely applied in pattern recognition and image analysis. However, the current computational framework of PHTs has two main drawbacks. First, conventional methods may lose significant color information during color image processing because they rely on RGB decomposition or graying. Second, PHTs suffer from geometric errors and numerical integration errors, which manifest as image reconstruction errors. This paper presents a novel computational framework for quaternion polar harmonic transforms (QPHTs), namely, accurate QPHTs (AQPHTs). First, to handle color images holistically, quaternion-based PHTs are introduced using the algebra of quaternions. Second, Gaussian numerical integration is adopted to reduce geometric and numerical errors. When compared with CNN (convolutional neural network)-based methods (i.e., VGG16) on the Oxford5K dataset, our AQPHT achieves better scaling-invariant representation. Moreover, when evaluated on standard image retrieval benchmarks, our AQPHT achieves results comparable to CNN-based methods using a feature vector of smaller dimension and outperforms the handcrafted methods by 9.6% w.r.t. mAP on the Holidays dataset.

1. Introduction

Rotation-invariant moments (RIMs) are extensively used in image representation and pattern recognition [1–3] because of their outstanding description capability and invariance properties. Moreover, RIMs not only provide handcrafted features for image representation but can also reconstruct the original image from these features, making them suitable for image watermarking [4–6]. There exist two kinds of RIMs: orthogonal and nonorthogonal. Since orthogonal RIMs (ORIMs) possess minimum information redundancy and hence better information compactness, they are effective in practice; several low-order moments and transforms can sufficiently extract the essential features of images. The most popular among these ORIMs [7, 8] are Zernike moments (ZMs) and pseudo-Zernike moments (PZMs). Yap et al. [9] recently introduced a few orthogonal rotation-invariant transforms (ORITs), collectively known as PHTs, including the polar sine transforms (PSTs), polar cosine transforms (PCTs), and polar complex exponential transforms (PCETs). ORIMs differ from ORITs in the radial parts of their kernel functions, which are polynomials in ORIMs and sinusoidal functions in ORITs. Compared with ORIMs, PHTs are more efficient in computation [10]. Besides, high-order PHTs are numerically stable, whereas high-order ORIMs are numerically unstable. Therefore, PHTs are preferred over ORIMs and have recently been utilized in numerous image processing applications.

Especially for small images, the current computational framework of PHTs suffers from geometric errors and numerical integration errors. These manifest as image reconstruction errors, which are visible near the center of the circular disk, and even low-order transforms become numerically unstable. Mapping a square image into a unit circular disk causes geometric errors, and approximating the integration with a zeroth-order summation leads to numerical integration errors. Accurate PHTs are essential in numerous image processing applications, primarily in template matching and optical character recognition problems involving small images. In addition to the problem of computation accuracy, another problem is that ORIMs and ORITs mainly focus on grayscale images. Yet, with the continuous improvement of computer performance, color images are drawing more and more attention from researchers in related fields, because color images provide much more abundant information than grayscale images. Most current research on color image moments depends on the intensity or a single channel of the color image, discarding the information and relationships between color components in a specific color space. Several traditional orthogonal and nonorthogonal moments have been extended to quaternion moments, e.g., quaternion Zernike moments (QZMs) [2], quaternion radial harmonic Fourier moments (QRHFMs) [11], quaternion polar harmonic Fourier moments (QPHFMs) [12], and quaternion polar harmonic transforms (QPHTs) [13, 14]. Although various quaternion image orthogonal moments have been proposed, most of them give unsatisfactory image reconstruction performance. In the present work, accurate quaternion orthogonal transforms, i.e., accurate QPHTs (AQPHTs), are proposed. In summary, the innovations of the proposed AQPHTs are the following: firstly, by using the algebra of quaternions, we extend the PHTs to color images; secondly, a computational framework that reduces geometric errors and numerical integration errors is developed to calculate PHTs accurately. Experiments are conducted to comparatively study the image reconstruction and image retrieval performance of AQPHT and other ORIMs, and the results show that AQPHT has the best image reconstruction performance and superb invariant image representation, both with and without noise.

The paper is organized as follows. Section 2 introduces the quaternion algebra and PHTs. Section 3 describes the computational framework of our AQPHTs. In Section 4, the effectiveness of the proposed AQPHTs is evaluated. Section 5 concludes this study.

2. Preliminaries

2.1. Quaternion Algebra

A quaternion can be regarded as a generalized complex number. A quaternion $q$ is written as

$$q = a + bi + cj + dk, \tag{1}$$

where $a$ denotes the real part and $bi + cj + dk$ the imaginary part; $a$, $b$, $c$, and $d$ are real numbers, and $i$, $j$, and $k$ are complex operators that satisfy the conditions

$$i^2 = j^2 = k^2 = ijk = -1, \quad ij = -ji = k, \quad jk = -kj = i, \quad ki = -ik = j. \tag{2}$$

The conjugate of $q$ is $q^{*} = a - bi - cj - dk$, and the magnitude equals $|q| = \sqrt{a^2 + b^2 + c^2 + d^2}$. A quaternion can be considered as the combination of a scalar part $S(q) = a$ and a vector part $V(q) = bi + cj + dk$.
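To make the algebra concrete, here is a minimal Python sketch of quaternion arithmetic (the names qmul, qconj, and qnorm are our own); the final assertion checks the multiplication rules of equation (2).

```python
# A minimal sketch of quaternion arithmetic; quaternions are stored as
# NumPy arrays [a, b, c, d] representing a + bi + cj + dk.
import numpy as np

def qmul(p, q):
    """Hamilton product of two quaternions."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return np.array([
        a1*a2 - b1*b2 - c1*c2 - d1*d2,   # real part
        a1*b2 + b1*a2 + c1*d2 - d1*c2,   # i component
        a1*c2 - b1*d2 + c1*a2 + d1*b2,   # j component
        a1*d2 + b1*c2 - c1*b2 + d1*a2,   # k component
    ])

def qconj(q):
    """Conjugate q* = a - bi - cj - dk."""
    return np.array([q[0], -q[1], -q[2], -q[3]])

def qnorm(q):
    """Magnitude |q| = sqrt(a^2 + b^2 + c^2 + d^2)."""
    return float(np.sqrt(np.sum(np.square(q))))

# Sanity check of equation (2): ij = k and ji = -k.
i, j, k = np.array([0., 1, 0, 0]), np.array([0., 0, 1, 0]), np.array([0., 0, 0, 1])
assert np.allclose(qmul(i, j), k) and np.allclose(qmul(j, i), -k)
```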

2.2. Polar Harmonic Transform

The two-dimensional polar harmonic transform of order $n$ with repetition $l$ of an image function $f(r, \theta)$ is defined by

$$M_{nl} = \Omega_n \int_0^{2\pi}\!\!\int_0^1 \left[ H_{nl}(r, \theta) \right]^{*} f(r, \theta)\, r\, dr\, d\theta, \tag{3}$$

where $\Omega_n$ denotes a normalization factor, and $[H_{nl}(r, \theta)]^{*}$ denotes the complex conjugate of the basis function $H_{nl}(r, \theta) = R_n(r) e^{il\theta}$. For PCET, $R_n(r)$ and $\Omega_n$ are given by

$$R_n(r) = e^{i 2\pi n r^2}, \quad \Omega_n = \frac{1}{\pi}, \tag{4}$$

where $n, l \in \mathbb{Z}$. While in PCT and PST,

$$\Omega_n = \begin{cases} 1/\pi, & n = 0, \\ 2/\pi, & n \neq 0, \end{cases} \tag{5}$$

with $n \geq 0$ for PCT and $n \geq 1$ for PST.

The basis functions of PST and PCT are defined, respectively, as

$$H^{S}_{nl}(r, \theta) = \sin(\pi n r^2)\, e^{il\theta}, \tag{6}$$

$$H^{C}_{nl}(r, \theta) = \cos(\pi n r^2)\, e^{il\theta}. \tag{7}$$

The total number of PCET coefficients for $|n| \leq K$ and $|l| \leq K$ is $(2K + 1)^2$, where $K$ denotes the maximum order. However, PST and PCT have smaller numbers of features, which are $K(2K + 1)$ and $(K + 1)(2K + 1)$, respectively.
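For illustration, the radial kernels of equations (4), (6), and (7) and the full basis function can be written in Python as follows (a sketch; the function names are ours):

```python
# Illustrative implementations of the PHT radial kernels and basis function.
import numpy as np

def pcet_radial(n, r):
    return np.exp(1j * 2 * np.pi * n * r ** 2)    # R_n(r) = e^{i2πnr²}

def pct_radial(n, r):
    return np.cos(np.pi * n * r ** 2)             # cos(πnr²), n ≥ 0

def pst_radial(n, r):
    return np.sin(np.pi * n * r ** 2)             # sin(πnr²), n ≥ 1

def basis(n, l, r, theta, radial=pcet_radial):
    """Full basis function H_{nl}(r, θ) = R_n(r) e^{ilθ}."""
    return radial(n, r) * np.exp(1j * l * theta)
```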

For an $N \times N$ image, there is no analytical solution to the integration in equation (3); therefore, its zeroth-order approximation is commonly used and given by

$$\hat{M}_{nl} = \Omega_n \sum_{i}\sum_{j} \left[ H_{nl}(r_{ij}, \theta_{ij}) \right]^{*} f(x_i, y_j)\, \Delta x\, \Delta y, \quad r_{ij} \leq 1, \tag{8}$$

where

$$x_i = \frac{2i + 1}{N} - 1, \quad y_j = \frac{2j + 1}{N} - 1, \quad r_{ij} = \sqrt{x_i^2 + y_j^2}, \quad \theta_{ij} = \tan^{-1}\!\left(\frac{y_j}{x_i}\right), \quad \Delta x = \Delta y = \frac{2}{N}. \tag{9}$$
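A minimal sketch of this zeroth-order computation for a single coefficient of an N × N grayscale image follows (pcet_zeroth_order is our own illustrative name):

```python
# Conventional zeroth-order PCET of equation (8); only grid centers inside
# the unit disk contribute.
import numpy as np

def pcet_zeroth_order(img, n, l):
    N = img.shape[0]
    coords = (2 * np.arange(N) + 1) / N - 1        # grid centers in [-1, 1]
    x, y = np.meshgrid(coords, coords, indexing="ij")
    r, theta = np.hypot(x, y), np.arctan2(y, x)
    inside = r <= 1.0                              # drop pixels outside the disk
    kernel = np.exp(-1j * 2 * np.pi * n * r**2) * np.exp(-1j * l * theta)
    return (1 / np.pi) * np.sum(kernel[inside] * img[inside]) * (2 / N) ** 2
```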

We perform the inverse transform to reconstruct the image function, expressed by

$$\hat{f}(r, \theta) = \sum_{n = n_{\min}}^{n_{\max}} \sum_{l = l_{\min}}^{l_{\max}} M_{nl}\, H_{nl}(r, \theta), \tag{10}$$

where $n_{\max}$, $n_{\min}$, $l_{\max}$, and $l_{\min}$ are the maximum and minimum orders and repetitions of PHTs, respectively, as specified in equation (3).
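Correspondingly, reconstruction from the PCET coefficients can be sketched as below, assuming the coefficients for all |n|, |l| ≤ K are stored in a dictionary keyed by (n, l):

```python
# Reconstruction via the inverse transform of equation (10); a sketch under
# the assumption that coeffs[(n, l)] holds the PCET coefficients.
import numpy as np

def pcet_reconstruct(coeffs, N, K):
    coords = (2 * np.arange(N) + 1) / N - 1
    x, y = np.meshgrid(coords, coords, indexing="ij")
    r, theta = np.hypot(x, y), np.arctan2(y, x)
    img = np.zeros((N, N), dtype=complex)
    for n in range(-K, K + 1):
        for l in range(-K, K + 1):
            img += coeffs[(n, l)] * np.exp(1j * 2 * np.pi * n * r**2) \
                                  * np.exp(1j * l * theta)
    img[r > 1.0] = 0                               # defined on the unit disk only
    return img.real
```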

3. Accurate QPHTs

3.1. QPHTs

In traditional processing methods, a color image is usually divided into three components, and each component is handled separately by a follow-up process without taking into account the association between components. In contrast, quaternion-based theory treats a color image as an integral vector, by which the relationships between the components are reflected. Let $f_R(x, y)$, $f_G(x, y)$, and $f_B(x, y)$ denote the red, green, and blue color components, respectively. A color pixel can then be encoded as a pure quaternion $f(x, y) = f_R(x, y)\,i + f_G(x, y)\,j + f_B(x, y)\,k$.

Therefore, PHTs can be defined in the quaternion field. The multidimensional nature of color images can be addressed by transforms in which each color pixel is treated as a whole. Due to the noncommutative multiplication of quaternions, each quaternion transform has two different forms. The left-side quaternion PCET is defined as

$$M^{L}_{nl} = \frac{1}{\pi} \int_0^{2\pi}\!\!\int_0^1 e^{-\mu 2\pi n r^2} e^{-\mu l \theta} f(r, \theta)\, r\, dr\, d\theta, \tag{11}$$

where $\mu$ denotes a pure unit quaternion and $f(r, \theta)$ denotes the quaternion representation of a color pixel. $\mu = (i + j + k)/\sqrt{3}$, i.e., the gray line in the RGB space, is chosen in this work. By reversing the order of the image and the transform kernel in equation (11), we obtain the right-side QPCET:

$$M^{R}_{nl} = \frac{1}{\pi} \int_0^{2\pi}\!\!\int_0^1 f(r, \theta)\, e^{-\mu 2\pi n r^2} e^{-\mu l \theta}\, r\, dr\, d\theta. \tag{12}$$
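Since $e^{-\mu\varphi} = \cos\varphi - \mu\sin\varphi$, the left-side QPCET of equation (11) can be evaluated directly with quaternion arithmetic. The following unoptimized zeroth-order sketch reuses qmul from the sketch in Section 2.1; qexp and qpcet_left are our own names.

```python
# Direct zeroth-order evaluation of the left-side QPCET, equation (11).
import numpy as np

MU = np.array([0.0, 1.0, 1.0, 1.0]) / np.sqrt(3)   # μ = (i + j + k)/√3

def qexp(phi):
    """e^{μφ} = cos(φ) + μ sin(φ) for the pure unit quaternion μ."""
    return np.array([np.cos(phi), 0, 0, 0]) + MU * np.sin(phi)

def qpcet_left(rgb, n, l):
    """rgb: N×N×3 color image; returns one quaternion coefficient [a, b, c, d]."""
    N = rgb.shape[0]
    coords = (2 * np.arange(N) + 1) / N - 1
    M = np.zeros(4)
    for ix, x in enumerate(coords):
        for iy, y in enumerate(coords):
            r, theta = np.hypot(x, y), np.arctan2(y, x)
            if r > 1.0:
                continue
            f = np.array([0.0, *rgb[ix, iy]])       # pure quaternion pixel
            phase = qexp(-(2 * np.pi * n * r**2 + l * theta))
            M += qmul(phase, f)                     # kernel on the left
    return M * (2 / N) ** 2 / np.pi
```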

The reconstruction of a color image from the left- and right-side QPCET coefficients can be written as equations (13) and (14), respectively:

$$\hat{f}(r, \theta) = \sum_{n}\sum_{l} e^{\mu 2\pi n r^2} e^{\mu l \theta}\, M^{L}_{nl}, \tag{13}$$

$$\hat{f}(r, \theta) = \sum_{n}\sum_{l} M^{R}_{nl}\, e^{\mu 2\pi n r^2} e^{\mu l \theta}. \tag{14}$$

Analogous to the QPCET, the left- and right-side QPCT can be defined as equations (15) and (16), respectively:

$$M^{CL}_{nl} = \Omega_n \int_0^{2\pi}\!\!\int_0^1 \cos(\pi n r^2)\, e^{-\mu l \theta} f(r, \theta)\, r\, dr\, d\theta, \tag{15}$$

$$M^{CR}_{nl} = \Omega_n \int_0^{2\pi}\!\!\int_0^1 f(r, \theta)\, \cos(\pi n r^2)\, e^{-\mu l \theta}\, r\, dr\, d\theta, \tag{16}$$

where all factors are defined as in equation (5). Replacing the radial component $\cos(\pi n r^2)$ in equations (15) and (16) with $\sin(\pi n r^2)$ yields the QPST. The reconstruction of color images can also be achieved by using QPCT and QPST coefficients.

We note that the QPHT in this paper refers to the left-side QPCET as defined in equation (11), unless otherwise specified.

3.2. Computation of Accurate QPHT

This subsection provides a computational framework for calculating QPHTs that reduces both geometric errors and numerical integration errors. In our method, only the part of each pixel that lies inside the unit disk is considered. For numerical integration, we rewrite equation (11) as

$$M^{L}_{nl} = \frac{1}{\pi} \sum_{i}\sum_{j} \int\!\!\int_{\Omega_{ij}} e^{-\mu 2\pi n r^2} e^{-\mu l \theta} f(x, y)\, dx\, dy, \tag{17}$$

where $\Omega_{ij} = \left[x_i - \frac{\Delta x}{2},\, x_i + \frac{\Delta x}{2}\right] \times \left[y_j - \frac{\Delta y}{2},\, y_j + \frac{\Delta y}{2}\right]$ denotes the grid of pixel $(i, j)$.

We assume the image function is constant within one grid. The calculation accuracy of QPHTs can then be increased by computing the numerical integration of the basis function over each grid, that is,

$$\hat{M}^{L}_{nl} = \frac{1}{\pi} \sum_{i}\sum_{j} h_{nl}(x_i, y_j)\, f(x_i, y_j), \tag{18}$$

where

$$h_{nl}(x_i, y_j) = \int\!\!\int_{\Omega_{ij}} e^{-\mu 2\pi n r^2} e^{-\mu l \theta}\, dx\, dy. \tag{19}$$

Since equation (18) is evaluated by using an $m$-point Gaussian numerical integration method, it can be simplified into

$$h_{nl}(x_i, y_j) \approx \frac{\Delta x\, \Delta y}{4} \sum_{s=1}^{m}\sum_{t=1}^{m} w_s w_t\, e^{-\mu 2\pi n r_{st}^2} e^{-\mu l \theta_{st}}, \tag{20}$$

where $(r_{st}, \theta_{st})$ are the polar coordinates of the sampling point $\left(x_i + \frac{\Delta x}{2} u_s,\; y_j + \frac{\Delta y}{2} u_t\right)$, subject to the constraint

$$\left(x_i + \frac{\Delta x}{2} u_s\right)^2 + \left(y_j + \frac{\Delta y}{2} u_t\right)^2 \leq 1. \tag{21}$$
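The per-grid integral of equations (19)–(21) can be sketched as follows; NumPy's leggauss supplies the Gauss–Legendre nodes u_s and weights w_s (cf. Table 1), and only sampling points inside the unit disk are kept. For brevity, the complex-valued form is shown; the quaternion form replaces i with μ.

```python
# Gauss–Legendre evaluation of h_{nl} over one pixel, equation (20).
import numpy as np

def h_nl_gauss(xc, yc, dx, n, l, m=5):
    """Approximate the kernel integral over the pixel centered at (xc, yc)
    with side dx, keeping only sampling points with r ≤ 1 (equation (21))."""
    u, w = np.polynomial.legendre.leggauss(m)      # nodes/weights on [-1, 1]
    h = 0.0 + 0.0j
    for s in range(m):
        for t in range(m):
            x = xc + 0.5 * dx * u[s]
            y = yc + 0.5 * dx * u[t]
            r, theta = np.hypot(x, y), np.arctan2(y, x)
            if r > 1.0:                            # constraint of equation (21)
                continue
            h += w[s] * w[t] * np.exp(-1j * (2 * np.pi * n * r**2 + l * theta))
    return h * (0.5 * dx) ** 2                     # Jacobian of the affine map
```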

The values of the nodes $u_s$ and weights $w_s$ for a given $m$ can be obtained by using standard procedures [15]. For quick reference, we provide these values in Table 1 for $m = 1$ through 8. The constraint in equation (21) is an improvement over the constraint $r_{ij} \leq 1$ used in the zeroth-order approximation.

In the zeroth-order approximation, if the center of a grid falls outside the unit circle, the grid is completely ignored in the computation. Under the new constraint, however, a grid is included in the calculation whenever any of its sampling points falls within the unit disk, even if its center falls outside. This new constraint yields an improved approximation of the unit disk. The numerical integration method thus reduces geometric errors and numerical integration errors simultaneously by ensuring that no sampling point falls outside the unit disk. We find that the performance of image reconstruction improves as the number of sampling points $m$ increases, but the gain starts to saturate at $m = 6$. Therefore, the accurate computation of QPHTs in our experiments uses $5 \times 5$ sampling points (i.e., $m = 5$), which we take as a tradeoff between accuracy and speed for color images.

3.3. Geometric Invariance of AQPHT

Here, we derive and analyze the rotation and scaling invariance properties of AQPHTs.

3.3.1. Rotation Invariance

Let $f^{\alpha}(r, \theta) = f(r, \theta - \alpha)$ denote the image $f(r, \theta)$ rotated by the angle $\alpha$. Accordingly, the left-side AQPHT of $f^{\alpha}(r, \theta)$ is

$$M^{L,\alpha}_{nl} = \frac{1}{\pi} \int_0^{2\pi}\!\!\int_0^1 e^{-\mu 2\pi n r^2} e^{-\mu l \theta} f(r, \theta - \alpha)\, r\, dr\, d\theta = e^{-\mu l \alpha} M^{L}_{nl}, \tag{22}$$

where $M^{L,\alpha}_{nl}$ and $M^{L}_{nl}$ are the AQPHTs of $f^{\alpha}(r, \theta)$ and $f(r, \theta)$, respectively.

According to equation (22), we know that a rotation of the color image by an angle $\alpha$ induces a phase shift $e^{-\mu l \alpha}$ of the AQPHT coefficients. Taking the modulus on both sides of equation (22), we have

$$\left| M^{L,\alpha}_{nl} \right| = \left| e^{-\mu l \alpha} M^{L}_{nl} \right| = \left| M^{L}_{nl} \right|. \tag{23}$$

Therefore, rotation invariance can be achieved by taking the modulus of the AQPHTs. In other words, the AQPHT modulus coefficients are invariant with respect to image rotation.
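A quick numeric check of this property, reusing qmul, qexp, and qnorm from the earlier sketches: left-multiplying a coefficient by the unit quaternion $e^{-\mu l \alpha}$ leaves its modulus unchanged.

```python
# Modulus invariance under the phase shift of equation (22).
import numpy as np

alpha, l = 0.7, 3                                  # arbitrary angle and repetition
M = np.array([0.2, -1.1, 0.4, 0.9])                # some quaternion coefficient
shifted = qmul(qexp(-l * alpha), M)                # phase-shifted coefficient
assert np.isclose(qnorm(shifted), qnorm(M))        # |M^{L,α}_{nl}| = |M^L_{nl}|
```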

3.3.2. Scaling Invariance

Theoretically, AQPHTs are not invariant to image scaling, but scaling invariance can be obtained by normalizing the image into the unit circle. If an image of $N \times N$ pixels is mapped onto the unit circle $x^2 + y^2 \leq 1$ so that the unit circle always covers the same content of the image, the AQPHTs become invariant to image scaling, since any scaled version of the image maps to the same function on the unit disk.

4. Experiments and Analysis

This section validates the effectiveness of the AQPHT invariants for color images. The experiments are performed using MATLAB version 8.6 on a computer with a 2.9 GHz processor and 8 GB RAM running the Microsoft Windows 10 Ultimate operating system.

4.1. Experiment on Scaling Invariance

Numerous successful classification models, e.g., VGG [16] and ResNet [17], have been developed on the basis of powerful deep learning frameworks. Their recognition accuracy is impressive and even exceeds that of human beings. However, this performance is achieved only for large images having rich object structure and high-quality appearance. The resolution of small images is low, which limits the learning of discriminative representations and thus leads to identification failure [18]. In this experiment, the impact of the down-scaling operation on image representations is evaluated on the Oxford5K dataset [19] for our AQPHT and the widespread deep model VGG16. For convenience, only the neural activations in the 36th layer of the VGG16 model are studied as an example of deep models in this work. The representation ability is measured by the mean Euclidean distance between the features extracted from an image at various resolutions. Figure 1 presents the mean Euclidean distance at various down-sampling scales for the VGG16 model and AQPHT separately. As Figure 1(a) shows, although the VGG16 model is carefully trained with extra useful training tricks on the Oxford5K dataset, the mean Euclidean distance grows larger as the image resolution decreases, indicating that the quality of the deep representation degrades with decreasing resolution. In Figure 1(b), the mean Euclidean distance of the AQPHT is almost constant, demonstrating the superb performance of the AQPHT invariants under scaling transforms.

4.2. Experiment on Image Reconstruction

The image representation capability of image moments can be measured by image reconstruction performance. This subsection compares the image reconstruction performance of AQPHT with that of QZM [2], QPHT [14], QPHFM [12], and QRHFM [11]. Following the standard settings [11, 12], we set the relevant parameters in our experiments. The mean square reconstruction error (MSRE) is used to measure the reconstruction performance [11, 12]; lower MSREs indicate better reconstruction. Suppose $f(x, y)$ and $\hat{f}(x, y)$ indicate the original image and the reconstructed image, respectively; the MSRE is defined as

$$\mathrm{MSRE} = \frac{\sum_{x}\sum_{y} \left| \hat{f}(x, y) - f(x, y) \right|^2}{\sum_{x}\sum_{y} \left| f(x, y) \right|^2}. \tag{24}$$
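A minimal sketch of equation (24) for color images stored as N × N × 3 arrays (msre is our own name):

```python
# Mean square reconstruction error of equation (24); lower is better.
import numpy as np

def msre(original, reconstructed):
    diff = reconstructed.astype(float) - original.astype(float)
    return float(np.sum(diff ** 2) / np.sum(original.astype(float) ** 2))
```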

In Figure 2, we plot the curves of average MSRE versus the data amount of the coefficients. The coefficients used in image reconstruction can be represented by real values, and the total number of required real values is defined as the data amount. For instance, four real values are needed to represent one AQPHT coefficient, while representing the coefficients of the three components of APHT-RGB requires six real values [9]. It can be seen from Figure 2 that our AQPHT achieves lower MSREs than APHT-RGB. By dealing with the various color channels in a holistic way, AQPHTs can capture the essence of color images in both the inter- and intrachannel directions, demonstrating better compactness.

Next, we use two additional metrics, i.e., PSNR and SSIM, to evaluate the reconstruction performance. The PSNR between the reconstructed image $\hat{f}$ and the original image $f$ of size $M \times N$ is calculated as

$$\mathrm{PSNR} = 10 \log_{10}\!\left( \frac{I_{\max}^2}{\mathrm{MSE}} \right), \quad \mathrm{MSE} = \frac{1}{MN} \sum_{x}\sum_{y} \left[ \hat{f}(x, y) - f(x, y) \right]^2, \tag{25}$$

where $I_{\max}$ is the maximum possible pixel intensity value. The SSIM is designed to better match human perception than PSNR; it is defined as

$$\mathrm{SSIM}(f, \hat{f}) = \frac{\left( 2\mu_f \mu_{\hat{f}} + c_1 \right)\left( 2\sigma_{f\hat{f}} + c_2 \right)}{\left( \mu_f^2 + \mu_{\hat{f}}^2 + c_1 \right)\left( \sigma_f^2 + \sigma_{\hat{f}}^2 + c_2 \right)}, \tag{26}$$

where $\mu$ and $\sigma^2$ are respectively the average and variance of the pixel values, $\sigma_{f\hat{f}}$ is the covariance of $f$ and $\hat{f}$, and $c_1$ and $c_2$ are constants set following previous methods [20, 21]. Higher values of PSNR and SSIM indicate better performance.
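Sketches of both metrics are given below. The single-window (global) SSIM is a simplification of the usual windowed computation, and the default constants follow the common SSIM convention; both simplifications are our assumptions here.

```python
# PSNR (equation (25)) and a simplified global SSIM (equation (26)).
import numpy as np

def psnr(original, reconstructed, i_max=255.0):
    mse = np.mean((original.astype(float) - reconstructed.astype(float)) ** 2)
    return 10 * np.log10(i_max ** 2 / mse)

def ssim_global(x, y, c1=(0.01 * 255) ** 2, c2=(0.03 * 255) ** 2):
    x, y = x.astype(float), y.astype(float)
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = np.mean((x - mx) * (y - my))             # covariance of x and y
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx**2 + my**2 + c1) * (vx + vy + c2))
```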

The 128 × 128 “Lena” image is reconstructed using our AQPHT and the compared methods with maximum orders ranging from 4 to 28. As shown in Figure 3, the images reconstructed by our AQPHT are far better than those of the other moments. When the number of moments exceeds a certain value, the reconstruction performance of QZM [2] and QPHT [14] even degrades, whereas the reconstruction performance of our proposed AQPHT keeps improving.

4.3. Experiment on Image Retrieval

Here, a series of extensive experiments is conducted to compare our method with other state-of-the-art ones, namely, traditional handcrafted methods [22–25] and CNN-based methods [26–30]. To evaluate the performance, we use the average precision (AP), computed as the area under the precision-recall curve of a query. We compute an AP score for each of the 5 queries of a landmark and average these to obtain a mean average precision (mAP) score. The average of these mAP scores is used as a single number to evaluate the overall performance. For a fair comparison, postprocessing methods, e.g., query expansion, are excluded in this work, and only the mAP of the representation with the relevant feature dimensions is reported. The retrieval accuracy (mAP) [31] on the UK-bench [32], Holidays [33], and Oxford5K [19] datasets is presented in Table 2, in which bold indicates the best results. From Table 2, the performance of our method is much better than that of all handcrafted methods but marginally worse than that of some CNN-based methods [28–30]. However, the length of a feature vector largely determines its retrieval efficiency. We note that our method uses feature vectors with significantly lower dimensions than the CNN-based methods. In other words, the retrieval efficiency of our method is higher under the same experimental conditions in spite of a slight decrease in accuracy, making it a more effective compromise for color image retrieval. Besides, the proposed AQPHT outperforms the QPHT [31] on the 3 datasets by a large margin, which demonstrates that the image representation ability of AQPHT is stronger than that of QPHT.
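For reference, AP as the area under the precision-recall curve and mAP as its mean over queries can be sketched as follows (the argument conventions are ours):

```python
# Average precision from a ranked relevance list, and its mean over queries.
import numpy as np

def average_precision(ranked_relevant):
    """ranked_relevant: boolean array, True where the ranked result is relevant."""
    rel = np.asarray(ranked_relevant, dtype=float)
    if rel.sum() == 0:
        return 0.0
    precision_at_k = np.cumsum(rel) / (np.arange(len(rel)) + 1)
    return float(np.sum(precision_at_k * rel) / rel.sum())

def mean_average_precision(all_ranked_relevant):
    return float(np.mean([average_precision(r) for r in all_ranked_relevant]))
```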

5. Conclusion

This paper presents a novel computational framework for quaternion-based polar harmonic transforms, namely, accurate quaternion polar harmonic transforms (AQPHTs). Firstly, to deal with color images holistically, AQPHTs are introduced based on the algebra of quaternions. Secondly, geometric errors and numerical integration errors are reduced by using Gaussian numerical integration. Comparative experiments are conducted to analyze the performance of AQPHTs against other ORIMs. The experimental results verify the superb performance of AQPHTs in image reconstruction and invariant image representation. In future work, AQPHTs will be tested in other color image processing domains, e.g., watermarking, segmentation, and retrieval. Besides, more accurate algorithms will be put forward, and the computational method of quaternion moments will be improved.

Data Availability

The data included in this paper are available without any restriction from the corresponding author.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China (NSFC) under Grant no. 61602226 and in part by the PhD Startup Foundation of Liaoning Technical University of China under Grant no. 18-1021.