#### Abstract

Quantum image processing (QIP) is a research branch of quantum information and quantum computing. It studies how to take advantage of quantum mechanics’ properties to represent images in a quantum computer and then, based on that image format, implement various image operations. Due to the quantum parallel computing derived from quantum state superposition and entanglement, QIP has natural advantages over classical image processing. But some related works misuse the notion of quantum superiority and mislead the research of QIP, which leads to a big controversy. In this paper, after describing this field’s research status, we list and analyze the doubts about QIP and argue “quantum image classification and recognition” would be the most significant opportunity to exhibit the real quantum superiority. We present the reasons for this judgment and dwell on the challenges for this opportunity in the era of NISQ (Noisy Intermediate-Scale Quantum).

#### 1. Introduction

Since Feynman proposed the concept of quantum computing in 1982 [1], a long line of results has shown that quantum computing can dramatically improve computational efficiency for certain problems. The theory of quantum computing is nearly mature; the challenge of realizing universal quantum computing comes mainly from technical issues, such as manipulating qubits at large scale [2]. In recent years, with successive breakthroughs in quantum technology, quantum computing has entered the NISQ (Noisy Intermediate-Scale Quantum) era, in which it is expected to be ready to display “quantum supremacy” in some practical application areas [3].

Among these applications, quantum image processing (here mainly referring to image classification and recognition) is most likely to become a killer app in the near future and, for commercial reasons, to be a favorite of big companies (Google, IBM, Intel, etc.). Some recent works have already made considerable progress. For example, this February, Lloyd et al. described a method for training the map from classical data to quantum states (maximizing the gap between mapped classes in Hilbert space), which is able to distinguish images of ants from images of bees [4]. Whether the method is superior to classical image recognition is unknown, but it does show the great potential of quantum image recognition.

Besides image classification and image recognition, quantum image processing (QIP) involves exploiting quantum properties to represent, manipulate, and compress images in a quantum computer and to address other image-related issues. These quantum properties make representing and manipulating images more efficient in the quantum context, but they also bring about difficulties that do not exist in the classical environment. For example, when images are represented as quantum states, one cannot copy them, since quantum states obey the no-cloning theorem. As another example, if one wants to obtain the result of an image manipulation by measurement, one must find a smart way to handle the collapse of the state.

Some published papers ignore these issues or handle them improperly, which has led to a big controversy. Some scholars even consider QIP a “quantum hoax.” Honestly, their doubts make sense. Not all classical image operations are worth implementing in the quantum realm. Still, we note that some research branches of QIP can maximize quantum superiority and weaken or even eliminate the difficulties derived from quantum properties. We argue that “quantum image classification and recognition” is such a candidate. We will give the reasons for this judgment after describing the research status of QIP and then discuss the opportunities and challenges in this direction.

#### 2. Research Status of Quantum Image Processing

The study of QIP started in 1997 [5]. In the early days, very few scholars paid attention to this direction, and the publications were also very few. In recent years, frequent relevant publications in major journals indicate QIP is heating up. The research topics have become rich and hierarchical, which denotes QIP has become an independent research branch of quantum information and quantum computing. In the following sections, we will discuss QIP from two aspects, quantum image format and quantum image operations.

##### 2.1. Quantum Image Format

Quantum image format is the core topic of QIP. Qubit Lattice [6, 7], Real Ket [8], and FRQI [9] are three major quantum image formats.

Qubit Lattice, proposed by Venegas-Andraca [6, 7], is the first quantum image format. He proposed that if the frequency value (color value) of the light wave can be mapped to the probability amplitude of a qubit, then the value of the pixel in the *i*^{th} row and the *j*^{th} column can be stored in the amplitude angle shown in equation (1), and the whole image can be represented as a qubit string (equation (2)).

This representation scheme’s essence is to map the image’s spatial information to the amplitude of a single qubit without using quantum properties of superposition and entanglement.

The Flexible Representation of Quantum Images (FRQI) proposed by Le et al. [9] is an upgraded version of Qubit Lattice that exploits quantum state superposition. The scheme still maps each pixel’s grayscale value to the amplitude of one qubit, while introducing auxiliary qubits to denote the spatial position of each pixel. The whole image is then prepared as one large quantum superposition state. Equation (4) depicts a 2^{n} × 2^{n} quantum image, where *i* can be regarded as an indicator of the pixel’s position (row and column converted to a one-dimensional index). Owing to the superposition of quantum states, the representation (storage) space decreases exponentially compared with the classical image.
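The FRQI construction can be made concrete with a small classical simulation. The sketch below assumes the commonly quoted FRQI form, in which pixel *i* contributes the color-qubit state cos θ_i |0⟩ + sin θ_i |1⟩ tensored with the position state |*i*⟩, and the whole state carries an overall factor 1/2^{n}. The function name `frqi_state` and the gray-to-angle mapping θ = (π/2)·g/255 are illustrative assumptions, not taken from Ref. [9].

```python
import numpy as np

def frqi_state(image):
    """Return an FRQI-style state vector for a square 2^n x 2^n grayscale image.

    Assumed mapping (hypothetical, for illustration only): a gray value g in
    [0, 255] is mapped to the angle theta = (pi / 2) * g / 255.
    """
    image = np.asarray(image, dtype=float)
    if image.ndim != 2 or image.shape[0] != image.shape[1]:
        raise ValueError("image must be a square 2-D array")
    side = image.shape[0]
    if side == 0 or (side & (side - 1)) != 0:
        raise ValueError("image side must be a power of two")
    theta = (np.pi / 2) * image.reshape(-1) / 255.0  # row-major pixel index i
    num_pixels = theta.size                          # equals 2^(2n)
    # The color qubit is the most significant qubit: |color> (x) |position>.
    # First half of the vector: color |0>; second half: color |1>.
    return np.concatenate([np.cos(theta), np.sin(theta)]) / np.sqrt(num_pixels)

image = np.array([[0, 85], [170, 255]])    # a 2 x 2 image, so n = 1
state = frqi_state(image)
num_qubits = int(np.log2(state.size))      # 2n + 1 qubits
```

The vector has 2 × 4^{n} entries, i.e., it lives on 2*n* + 1 qubits, and it is normalized because cos² θ_i + sin² θ_i = 1 for every pixel. Note that this simulation stores the full vector classically, so it illustrates the encoding only, not any storage saving.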

Owing to the merits of FRQI (small storage space, simplicity, and ease of understanding), many follow-up studies have extended the scheme. Instead of mapping the pixel’s gray value to the amplitude angle, Zhang et al. used a group of basis states (a qubit string) to represent the pixel’s value in a larger Hilbert space. This scheme facilitates some image operations and improves efficiency [10]. Wang et al. extended FRQI to polar coordinates and replaced pixels’ spatial positions with polar radii and polar angles [11, 12]. Ruan et al. stored the pixel’s gray value in a phase rather than in the amplitude angle used by FRQI [13], which in effect replaces the rotation in the ZOX plane of the Bloch sphere with a rotation in the XOY plane. The basic idea of all these extended schemes is to prepare the image as a quantum superposition state indexed by the pixels’ spatial positions, which is not fundamentally different from FRQI.

The last quantum image format, Real Ket, was proposed by Latorre [8]. He divided the image into 2 × 2 blocks and then mapped the four pixels’ grayscale values to the probability amplitudes of the components of a 2-qubit quantum state. Equation (5) describes this quantum state, where *i*_{1} = 1 can be understood as the index of the top-left pixel, *i*_{1} = 2 as the index of the top-right pixel, *i*_{1} = 3 as the index of the bottom-left pixel, and *i*_{1} = 4 as the index of the bottom-right pixel. The amplitude of each basis state stores the mapped value of the corresponding pixel, and the squared magnitudes of the four amplitudes sum to 1.

In Ref. [6], this normalized quantum state is called qudit. If one rewrites it in terms of qubits, a qudit is actually a superposition state of 2 qubits.

Expanding this 2 × 2 block once, one obtains a 4 × 4 block, which can be represented as the quantum state shown in equation (6), where *i*_{2} = 1 can be regarded as the index of the first (upper left) 2 × 2 small block, *i*_{2} = 2 as the index of the second (upper right) 2 × 2 small block, *i*_{2} = 3 as the index of the third (lower left) 2 × 2 small block, and *i*_{2} = 4 as the index of the fourth (lower right) 2 × 2 small block. In the process of construction, the quantum state must be normalized again, i.e., the squared magnitudes of all its amplitudes must sum to 1.

By repeating this expansion, a 2^{n} × 2^{n} image can be mapped to a quantum state of 2*n* qubits (equation (7)).

The basic idea of this representation is to use the basis states to represent the pixels’ spatial positions while using the probability amplitudes to represent the color information. Refs. [13, 14] exploit this idea to implement similar schemes.
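The block-wise indexing of Real Ket can be illustrated with a short recursive sketch. It assumes that the amplitude of each basis state is simply the pixel value divided by the overall norm and that quadrants are ordered top-left, top-right, bottom-left, bottom-right at every level, as described above. The function name `real_ket_state` is hypothetical.

```python
import numpy as np

def real_ket_state(image):
    """Amplitude-encode a 2^n x 2^n image using quadrant (block-wise) indexing.

    The outermost quadrant index is the most significant part of the basis
    label, and the index of a pixel inside its 2 x 2 block is the least
    significant part.
    """
    image = np.asarray(image, dtype=float)
    if image.ndim != 2 or image.shape[0] != image.shape[1]:
        raise ValueError("image must be a square 2-D array")
    side = image.shape[0]
    if side < 2 or (side & (side - 1)) != 0:
        raise ValueError("image side must be a power of two, at least 2")

    def quadrant_order(block):
        # Recursively list pixels: top-left, top-right, bottom-left, bottom-right.
        if block.shape[0] == 1:
            return [block[0, 0]]
        h = block.shape[0] // 2
        quadrants = [block[:h, :h], block[:h, h:], block[h:, :h], block[h:, h:]]
        return [value for q in quadrants for value in quadrant_order(q)]

    amplitudes = np.array(quadrant_order(image))
    norm = np.linalg.norm(amplitudes)
    if norm == 0:
        raise ValueError("an all-zero image cannot be normalized")
    return amplitudes / norm

state2 = real_ket_state([[1, 2], [3, 4]])                 # 2 qubits
state4 = real_ket_state(np.arange(16).reshape(4, 4))      # 4 qubits
```

For the 4 × 4 example, the first four amplitudes belong to the upper-left 2 × 2 block, the next four to the upper-right block, and so on, which mirrors the role of *i*_{2} and *i*_{1} in the text.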

Let us briefly analyze the storage efficiency of these formats. For a 2^{n} × 2^{n} gray image, if the gray value of each pixel is represented by 8 classical bits, then a total of 2^{n} × 2^{n} × 8 bits are required for the classical image. In Qubit Lattice, one qubit represents a single pixel’s grayscale value, so 2^{n} × 2^{n} qubits are needed for the whole image. In FRQI, a pixel’s gray value is still represented by one qubit, but since the whole image is prepared as a superposition state over the row and column coordinates, only 2*n* + 1 qubits are needed. In the last format, Real Ket, the pixels’ grayscale information is stored in the probability amplitudes of the components of a superposition state, so only 2*n* qubits are needed. Thus, Real Ket uses the least storage space.
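The counts above are simple arithmetic and can be tabulated for a concrete image size. The helper below is only a restatement of the formulas in this paragraph; the function name `storage_cost` and the dictionary keys are illustrative.

```python
def storage_cost(n, bits_per_pixel=8):
    """Storage needed for a 2^n x 2^n gray image in each representation."""
    side = 2 ** n
    return {
        "classical_bits": side * side * bits_per_pixel,  # 2^n * 2^n * 8 bits
        "qubit_lattice_qubits": side * side,             # one qubit per pixel
        "frqi_qubits": 2 * n + 1,                        # 2n position + 1 color
        "real_ket_qubits": 2 * n,                        # position only
    }

costs = storage_cost(8)  # a 256 x 256 image
```

For a 256 × 256 image (*n* = 8), this gives 524288 classical bits, 65536 qubits for Qubit Lattice, 17 qubits for FRQI, and 16 qubits for Real Ket.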

##### 2.2. Quantum Image Operation

Image operations involve a wide variety of types. In recent years, many works have discussed this topic in the quantum realm, and the operations realized are increasingly abundant. Here, we roughly classify these operations into geometric transformation, color transformation, and complicated operations such as compression and retrieval. We describe them, respectively, as follows.

###### 2.2.1. Geometric Transformation

Geometric transformation refers to changing the spatial position (coordinate) of pixels, such as image rotation, or changing the image shape, such as zooming in or out. The realized operations include rotation, top/bottom or left/right swapping, the interchange of any two coordinates of the image [15–17], erosion and dilation operations [18], image magnification [19, 21], and overall translation and cyclic translation of the image [22].

Take FRQI as an example. Its format can be simplified as the qubits representing the color information tensored with the qubits representing the pixels’ coordinate information (equation (8)). Here *k* is the position component, representing the spatial position of each of the *N* pixels, and *C*_{k} is the color component, representing the *k*^{th} pixel’s gray value.

In general, a geometric transformation is realized by an operator acting on the position component. According to the scope of the operator, we can classify it as a global operation or a local operation.

As the name implies, global operations act on all pixels of the image, such as rotating the whole image. Equation (9) gives a formal description of such operations, where *G* is the unitary transformation acting on all pixels.

Since each pixel’s gray values are stored in the corresponding component of the superposition state, performing a unitary operation *G* would modify all components simultaneously (quantum parallelism). Thus, global operation *G*’s efficiency is much higher than the counterpart operation to a classical image.
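A small classical simulation can illustrate how a single unitary on the position register rearranges every pixel at once. The sketch below assumes an FRQI-style state ordered as color ⊗ row ⊗ column and uses a left/right flip as the global operation *G*, implemented as a Pauli-X gate on every column qubit. The helper names and the gray-to-angle mapping are illustrative assumptions.

```python
import numpy as np

def frqi_state(image):
    # Assumed mapping: gray value g in [0, 255] -> theta = (pi / 2) * g / 255.
    theta = (np.pi / 2) * np.asarray(image, dtype=float).reshape(-1) / 255.0
    return np.concatenate([np.cos(theta), np.sin(theta)]) / np.sqrt(theta.size)

def horizontal_flip_operator(n):
    """G = I (color and row qubits) tensor X^(tensor n) (column qubits)."""
    x_gate = np.array([[0.0, 1.0], [1.0, 0.0]])
    flip_columns = np.array([[1.0]])
    for _ in range(n):
        flip_columns = np.kron(flip_columns, x_gate)  # column c -> 2^n - 1 - c
    identity_color_and_rows = np.eye(2 * 2 ** n)       # 1 color qubit + n row qubits
    return np.kron(identity_color_and_rows, flip_columns)

image = np.array([[0, 64, 128, 255],
                  [10, 20, 30, 40],
                  [200, 150, 100, 50],
                  [5, 15, 25, 35]])                    # 4 x 4 image, n = 2
G = horizontal_flip_operator(2)
flipped_state = G @ frqi_state(image)
```

Here `flipped_state` coincides with the FRQI-style state of the mirrored image `np.fliplr(image)`. On a quantum device the same *G* would be *n* single-qubit gates; the matrix form above costs exponential classical memory and is meant only to check the algebra.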

Local operations manipulate only a few pixels and leave the others unchanged. An example is swapping the coordinates of the *i*^{th} and *j*^{th} pixels, for which there should be a unitary operator *S* that maps |*i*⟩ to |*j*⟩, maps |*j*⟩ to |*i*⟩, and leaves every other position basis state unchanged (equation (10)).

The operator *S* can be constructed as in equation (11). Swapping two pixels requires only a single application of this operator.

Consider the counterpart operation in a classical image. Swapping two pixels is equivalent to switching *i* and *j* in an array. That means one needs to perform at least two writes, one at *i* and one at *j*. Obviously, local quantum operations are also faster than classical processes.
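The swap of two pixel positions can be checked numerically. The sketch below assumes that *S* is the permutation matrix exchanging the basis states |*i*⟩ and |*j*⟩ of the position register, and that it acts on an FRQI-style state ordered as color ⊗ position. The function name `swap_operator` and the example angles are illustrative.

```python
import numpy as np

def swap_operator(dim, i, j):
    """Permutation matrix with S|i> = |j>, S|j> = |i>, and S|k> = |k> otherwise."""
    s = np.eye(dim)
    s[[i, j]] = s[[j, i]]  # exchange rows i and j of the identity
    return s

# FRQI-style state of a 4-pixel image: |color> (x) |position>.
theta = np.array([0.1, 0.5, 0.9, 1.3])             # example color angles
state = np.concatenate([np.cos(theta), np.sin(theta)]) / 2.0

S = swap_operator(4, 1, 2)                          # swap pixels 1 and 2
full_operator = np.kron(np.eye(2), S)               # identity on the color qubit
swapped_state = full_operator @ state

# Reference: the state built directly from the swapped angle list.
theta_swapped = theta[[0, 2, 1, 3]]
expected = np.concatenate([np.cos(theta_swapped), np.sin(theta_swapped)]) / 2.0
```

The matrix is its own inverse and is orthogonal, hence unitary, and `swapped_state` matches `expected`. How many elementary gates are needed to realize *S* on hardware is a separate question that this sketch does not address.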

###### 2.2.2. Color Transformation

Color transformation refers to changing the image’s pixel values, such as halftone processing of images [23]. If such changes can be reversed by some means, then image watermarking [24–26], shuffling of color blocks [27, 28], image encryption/decryption [29–33], etc., can be regarded as belonging to this category.

In terms of its scope, color transformation can also be divided into global operations and local operations. By an analysis similar to the one above, one can see that a global color transformation achieves exponential acceleration compared to its classical counterpart and that a local operation is faster than its classical counterpart [10, 34]. Because the quantum Fourier transform is invertible and exponentially faster than the classical Fourier transform, it is widely used in image watermarking and image encryption/decryption [24, 29, 30]. Due to the “uncertainty” principle of quantum mechanics, quantum image watermarking, quantum image encryption/decryption, etc., can guarantee the security of hidden information.

###### 2.2.3. Complicated Image Operations

In classical image processing, effective methods for compression, retrieval, recognition, segmentation, registration, and other operations generally need some preprocessing of the original image data, such as transforming the image domain or extracting image features. In this paper, we call these “complicated” image operations.

Compression is the most discussed topic. In Ref. [9], quantum image compression is defined as reducing the number of quantum gates needed to prepare quantum images (quantum states). This definition differs from classical image compression; in essence, such problems amount to the optimization of quantum circuits. Refs. [10, 11, 34, 35] describe methods of performing such compression. Although these methods differ slightly, the basic idea is to reduce the number of quantum gates by simplifying Boolean expressions. A more natural definition of compression would be reducing the number of qubits representing a quantum image (corresponding to classical image compression), that is, reducing the dimension of the Hilbert space representing the quantum state (quantum image). From the perspective of information theory, it seeks to represent a quantum image with more concentrated energy, i.e., to concentrate the image’s primary information into a smaller dimension. Ref. [8] proposed a transformation method based on matrix product state theory. This mathematical form makes it possible to obtain the optimum lossless compression and to distinguish important information from redundant information in order to perform lossy compression.

Another frequently discussed topic is image retrieval. In quantum image processing, retrieval has two meanings: one is to retrieve classical information from a quantum state and the other is to retrieve the quantum information in the quantum state. Methods for the former have been described in Refs. [5, 6, 35–37]. They share the same idea: a large number of copies of the quantum image (quantum state) are prepared, the probability amplitude of each basis state is estimated by repeated measurements, and then the original image is recovered from the resulting probability distribution. The latter is content-based retrieval of quantum images. Schützhold presented a quantum algorithm for finding simple patterns (such as parallel lines) in black-and-white binary images [38]. The algorithm exploits the parallelism of the quantum Fourier transform and can achieve exponential speedup compared to classical algorithms. Venegas-Andraca used quantum entanglement to describe the relationships between the vertices of shapes such as triangles and squares and exploited Bell inequalities to provide a method for detecting the presence of these shapes in black-and-white binary images [39].

Besides, Le et al. discussed image segmentation. The algorithm uses an operator prepared from an orthogonal basis, together with the gray level information encoded into the basis states of the quantum state, to make an equality determination and uses Grover’s algorithm to accelerate this process [40]. Caraiman realized threshold-based image segmentation by calculating the histogram [41]. Zhang et al. discussed image registration by assigning an ordinal number to images with different rotation angles and then using Grover’s algorithm to retrieve the ordinal number [12]. This approach is similar to binding a keyword to each image and then retrieving the image by keyword, rather than content-based retrieval.

##### 2.3. Discussion

Applying the quantum properties of superposition and entanglement to map classical images and store them in qubits is the basic idea of preparing quantum images. Due to the parallel computing induced by the quantum superposition effect, the efficiency of a quantum image operation is much higher than that of the corresponding classical image operation. But once the cost of quantum image preparation and the cost of obtaining the manipulation result by measurement are taken into account, the claimed exponential acceleration of quantum image operations may not exist. Mastriani enumerated these doubts [42] (similar challenges arise in quantum machine learning [43]) and concluded that many published works related to QIP are a “quantum hoax.” His main viewpoints can be summarized as follows:

(1) Input: many published papers did not consider the cost of preparing quantum states (images) from classical data (images).

(2) Output: obtaining the result of an image operation requires an exponential number of measurements.

(3) Noise: quantum images are sensitive to noise, and simulation software such as MATLAB is not capable of verifying the correctness of quantum algorithms.

His criticism triggered a fierce debate. This March, the journal Quantum Information Processing published both Li et al.’s comments [44] on Mastriani’s original paper and Mastriani’s rebuttal to these comments [45]. The opinions from both sides partly make sense, but neither seems to be quite right.

For the first doubt, FRQI or other image formats are just a method to map classical data to quantum data without theoretical defects. These quantum images can be efficiently prepared. The interested reader can refer to Yao et al.’s work [14] for a detailed discussion on this issue.

For the second and third doubts, if image operations like geometric transformation and color transformation are one’s final goals, one has to measure each pixel to get the results. For a quantum state (quantum image), that procedure is called quantum state tomography, which requires an exponential number of measurements for a general state [46]. Besides, since a measurement result is a statistical estimate of the observed value, it is difficult to eliminate the measurement noise. Thus, in quantum image processing, this kind of research work would have little practical significance.
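The statistical nature of the readout can be illustrated with a toy simulation. The sketch below assumes an ideal, noiseless device and measurement in the computational basis only, so it recovers the magnitudes of the amplitudes but not their signs or phases, and it is much simpler than full state tomography. The function name `estimate_probabilities` and the fixed random seed are illustrative choices.

```python
import numpy as np

def estimate_probabilities(state, shots, seed=0):
    """Estimate |amplitude|^2 of every basis state from repeated measurements."""
    probabilities = np.abs(np.asarray(state)) ** 2
    probabilities = probabilities / probabilities.sum()  # guard against rounding
    rng = np.random.default_rng(seed)
    counts = rng.multinomial(shots, probabilities)        # one outcome per shot
    return counts / shots

# A 2 x 2 "image" stored in the amplitudes of a 2-qubit state.
state = np.array([1.0, 2.0, 3.0, 4.0]) / np.sqrt(30.0)
estimate = estimate_probabilities(state, shots=100_000)
max_error = np.max(np.abs(estimate - state ** 2))
```

Each estimated probability carries a statistical error on the order of 1/√shots, and a fresh copy of the state must be prepared for every shot. Since a 2^{n} × 2^{n} image has 4^{n} basis states to resolve, the number of shots needed for a faithful pixel-by-pixel readout grows at least as fast as the number of pixels.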

But if one only wants to exploit the quantum image’s overall characteristics or some statistical properties rather than read every pixel’s value, then quantum image operations such as geometric and color transformations are intermediate steps rather than the final goal, and these operations do make sense. For example, the HHL algorithm [47] for solving linear equations presents the solution hidden in a quantum superposition state. The correct way to use this solution is to exploit an overall property of the state rather than to measure it to obtain the probability distribution over its components. This way of using HHL forms the foundation of many quantum machine learning algorithms [48–50].

Whether the simulation software MATLAB can verify quantum image processing algorithms should be considered from two aspects. First, quantum image operations are quantum algorithms that can be described as a sequence of unitary transformations acting on a complex vector in a Hilbert space. MATLAB is certainly capable of simulating such unitary evolution. Second, to our knowledge, MATLAB has no module that can simulate quantum noise. Therefore, if one wants to verify the performance of quantum algorithms on real quantum computing devices, IBM’s Qiskit or other tools would be a better choice.

#### 3. Opportunities and Challenges

##### 3.1. Opportunities

Quantum algorithms can improve computational efficiency, but this may not hold for all application scenarios. The successful quantum algorithms all share a distinct characteristic: the intermediate computation is complicated and suitable for constructing entangled and superposed quantum states for parallel acceleration, while the result is simple and often a decisive answer. In that case, the entanglement/superposition degenerates at the end of the quantum algorithm, and the probability amplitude of a single basis state is 1 or close to 1. In quantum image processing, the task of image classification and recognition fits this characteristic. The intermediate steps of the algorithm involve feature extraction, classifier training, and various distance computations. The result only needs to answer “yes or no” to determine whether the image belongs to a specific category.

Moreover, a wide variety of machine learning protocols operate by performing matrix operations on vectors in a high-dimensional vector space. Quantum mechanics is all about matrix operations on vectors in high-dimensional vector spaces. Thus, performing machine learning tasks in the quantum realm would probably be beneficial if one can take advantage of the natural connection between these two disciplines. This path has proved fruitful: quantum machine learning has made many achievements in recent years. Quantum machine learning algorithms could therefore act as the building blocks for the classification and recognition of quantum images. Here, we introduce two commonly used techniques, the “swap test” and the “inversion test,” to show the advantages of these methods.

Figure 1(a) illustrates the principle of the “swap test.” Let |*a*⟩ and |*b*⟩ be two quantum states (quantum images). Using the circuit, one can estimate the similarity between |*a*⟩ and |*b*⟩ from the probability that the auxiliary qubit alone is measured in |0⟩, which equals 1/2 + |⟨*a*|*b*⟩|^{2}/2.

**Figure 1.** (a) Circuit of the “swap test.” (b) Circuit of the “inversion test.”

The overlap |⟨*a*|*b*⟩|^{2} is known as “fidelity” in quantum information and “cosine distance” in classical machine learning. One can see that if |*a*⟩ and |*b*⟩ are farthest away from each other (orthogonal), this probability is 1/2; if |*a*⟩ and |*b*⟩ are closest to each other, this probability is 1. One can also see that this probability estimation has nothing to do with the feature space’s dimension. The higher the dimension is, the more efficient this technique is compared with the classical algorithm. Lloyd points out that even considering the cost of preparing quantum states, this technique is much more efficient than distance calculations on classical computers [51]. Building on this technique, many more general distances, such as the Euclidean distance, can also be calculated. This technique is widely used in various quantum machine learning algorithms [49–53].
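The swap test can be verified with a direct state-vector simulation. The sketch below assumes the standard three-step circuit: a Hadamard gate on the auxiliary qubit, a SWAP of the two registers controlled by the auxiliary qubit, and a second Hadamard gate. The two input vectors are called `a` and `b` here, and the function name `swap_test_probability` is illustrative.

```python
import numpy as np

def swap_test_probability(a, b):
    """Probability of measuring the auxiliary qubit in |0> after a swap test."""
    a = np.asarray(a, dtype=complex)
    b = np.asarray(b, dtype=complex)
    if a.shape != b.shape or a.ndim != 1:
        raise ValueError("a and b must be 1-D vectors of equal length")
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    d = a.size

    hadamard = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2.0)
    # SWAP on the two d-dimensional registers: |x>|y> -> |y>|x>.
    swap = np.zeros((d * d, d * d))
    for x in range(d):
        for y in range(d):
            swap[y * d + x, x * d + y] = 1.0
    # Controlled-SWAP: identity when the auxiliary qubit is |0>, SWAP when |1>.
    zeros = np.zeros((d * d, d * d))
    controlled_swap = np.block([[np.eye(d * d), zeros], [zeros, swap]])
    h_on_auxiliary = np.kron(hadamard, np.eye(d * d))

    state = np.kron(np.array([1.0, 0.0]), np.kron(a, b))  # |0>|a>|b>
    state = h_on_auxiliary @ controlled_swap @ h_on_auxiliary @ state
    # The first d*d entries correspond to the auxiliary qubit being |0>.
    return float(np.sum(np.abs(state[: d * d]) ** 2))

p_orthogonal = swap_test_probability([1, 0], [0, 1])
p_identical = swap_test_probability([1, 0], [1, 0])
p_partial = swap_test_probability([1, 0], [1, 1])
```

Orthogonal inputs give a probability of 1/2 and identical inputs give 1, in line with the text; the third example has overlap 1/2 and gives 3/4. On hardware this probability is itself estimated from repeated runs, so the precision of the similarity estimate depends on the number of repetitions.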

The circuit in Figure 1(b), proposed by Havlíček et al. in 2019, is known as the “inversion test” and also calculates the distance between quantum states. Here *x* and *y* are features of the classical data (images), which are fed as arguments into a quantum circuit and into the inverse of that circuit, respectively. If *x* and *y* are equal (or similar), the probability of obtaining the initial state in a projective measurement at the end of the circuit is 1 or close to 1. If *x* and *y* differ greatly, this probability is smaller. This technique can also calculate distances in quantum space (Hilbert space), and its advantage over the “swap test” is that it halves the number of qubits used.
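The inversion test can likewise be sketched numerically. The example assumes a deliberately simple feature map, one RY rotation per feature on its own qubit, which is not the feature map used by Havlíček et al. and serves only to show the structure: apply the circuit for *x*, then the inverse circuit for *y*, and read off the probability of returning to the all-zero state. All function names are illustrative.

```python
import numpy as np

def ry(angle):
    """Single-qubit rotation about the Y axis."""
    c, s = np.cos(angle / 2.0), np.sin(angle / 2.0)
    return np.array([[c, -s], [s, c]])

def feature_map(features):
    """Assumed toy feature map: one RY(feature) rotation per qubit."""
    unitary = np.array([[1.0]])
    for angle in features:
        unitary = np.kron(unitary, ry(angle))
    return unitary

def inversion_test_probability(x, y):
    """Probability of measuring |0...0> after applying U(x), then U(y)^dagger."""
    u_x, u_y = feature_map(x), feature_map(y)
    zero = np.zeros(u_x.shape[0])
    zero[0] = 1.0                                  # the all-zero initial state
    final = u_y.conj().T @ u_x @ zero
    return float(np.abs(final[0]) ** 2)

p_equal = inversion_test_probability([0.3, 1.2], [0.3, 1.2])
p_different = inversion_test_probability([0.3, 1.2], [0.5, 0.2])
p_opposite = inversion_test_probability([0.0], [np.pi])
```

For this toy map the probability is the product of cos²((*x*_k − *y*_k)/2) over the features, so it equals 1 for equal inputs and decreases as the features move apart. Only one register of qubits is needed, compared with two registers plus an auxiliary qubit in the swap test.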

##### 3.2. Challenges

Based on the above discussion, one could conclude that quantum image classification and recognition is the most significant opportunity in quantum image processing. But we must note that demonstrating, in both theory and practice, that it can outperform classical image recognition methods still requires facing the following challenges.

###### 3.2.1. Preparation of Quantum Images

Only by mapping classical data to a quantum superposition state can the advantage of quantum computation be brought into play. The general method for preparing such a superposition state is to use QRAM (Quantum Random Access Memory) [54, 55]. Its basic idea is to use a “bucket brigade” structure to distribute *N* d-dimensional vectors over the *Nd* leaf nodes of a “tree.” Based on this structure, QRAM can prepare *N* d-dimensional vectors as a superposition state of log(*Nd*) qubits in O(log(*Nd*)) time. However, this structure requires O(*Nd*) physical resources, which is exponential in the number of qubits, so whether it can provide real computational advantages in an actual experimental environment is still a big question [56, 57].

###### 3.2.2. Feature Extraction

Feature extraction plays a vital role in classical image recognition. Effective features can eliminate the influence of image background, size, lighting conditions, camera angle, etc., and improve image recognition accuracy; however, little of the literature discusses how to extract features from the mainstream quantum image formats. The absence of discussion on this issue casts doubt on some impressive work. For example, Ref. [14] used the Hadamard transform to calculate the difference between adjacent pixels of a quantum image and then compared the result with a known pattern by “swap test” to detect the image edge. However, if the image pattern is deformed to some degree, for example by a size inconsistency that leads to a considerable difference between the known pattern and the quantum image under test, is it still valid to use “swap test” to detect the image edge? Moreover, that paper only reports edge detection experiments on binary images; whether the proposed approach remains effective for natural images is unknown.

###### 3.2.3. Nonlinear Operations

Effective procedures in classical image (pattern) recognition and machine learning tend to be nonlinear, such as the sigmoid function used in training perceptrons in deep learning. Quantum mechanics, by nature, is linear. A feasible scheme is to induce nonlinear operations through measurements. However, the quantum state collapses after measurement, so the system’s evolution loses its quantum characteristics and degenerates into a classical probabilistic perceptron. Another popular solution is to implement the nonlinear operation (such as feature extraction) by a classical process and then compute the kernel function in quantum space (Hilbert space). Lloyd called this approach “quantum embedding” [4, 52, 58]. The essence of this strategy is to circumvent nonlinear operations in quantum space; whether it is superior to classical machine learning algorithms needs further verification.

###### 3.2.4. Noise

Noise is one of the most important theoretical problems in quantum computation. Quantum noise may degrade quantum computation into classical probabilistic computation. Since the 1990s, people have been studying quantum error correction codes [59] and have further developed the concept of fault-tolerant quantum computing [60]. In recent years, eliminating errors caused by noise has been a fascinating research direction, in which topological quantum computing has attracted the most attention because it promises to solve the problem of quantum noise completely [61]. More interestingly, some functions that cannot be learned in a noisy classical environment can be learned in a noisy quantum environment, such as Disjunctive Normal Form (DNF) [62]. In recent years, this research has been extended to more general linear functions such as odd and even functions [63].

#### 4. Conclusions

Compared with classical image processing, quantum image processing is still far from sufficient in both depth and breadth. The “quantum advantage” claimed in some related published papers has also been doubted by many scholars; the core of these doubts is “how to obtain quantum image operating results efficiently and accurately.” We consider that studies which try to obtain the results of quantum image operations by recovering classical images via measurements have little practical significance.

On the contrary, the kind of research that only exploits quantum images’ statistical characteristics may be valuable. With the assistance of quantum machine learning, quantum image classification and recognition would have the most significant opportunity to be a “killer app” in the actual commercial field in the NISQ era.

We are sure that not every classical image manipulation needs to be implemented on a quantum computer. Whether image geometric transformation, color transformation, and other similar operations are worth implementing in the quantum realm depends on the application scenario. For example, Yao et al. calculated the difference between adjacent pixels in the quantum image via the Hadamard transform (which can be regarded as a kind of color transformation) and then used the result directly to obtain the image edge by matching a known pattern [14]. That is, these kinds of quantum image operations make sense only provided that they are intermediate steps rather than the final goal.

Whether quantum image processing (in some aspects or some specific applications) can achieve an advantage over classical image processing in realistic scenarios remains to be seen. It depends on solving the four challenges in Section 3.2, some of which are unique to quantum image processing, such as quantum image feature extraction, while others are common to quantum algorithms, such as handling noise. Solving any one of them will significantly boost research in this field.

In short, quantum image processing with both opportunities and challenges is worthy of further in-depth study.

#### Data Availability

The data used to support the findings of this study are included within the article.

#### Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

#### Acknowledgments

This research was funded by the Natural Science Foundation, China (grant no. 61802002), Natural Science Foundation of Anhui Province, China (grant no. 1708085MF162), and Foundation for Key Project of Anhui Provincial Department of Education (grant nos. KJ2019A0063 and KJ2020A0233).