Research Article  Open Access
Dimension Reduction Using Quantum Wavelet Transform on a HighPerformance Reconfigurable Computer
Abstract
The high resolution of multidimensional spacetime measurements and enormity of data readout counts in applications such as particle tracking in highenergy physics (HEP) is becoming nowadays a major challenge. In this work, we propose combining dimension reduction techniques with quantum information processing for application in domains that generate large volumes of data such as HEP. More specifically, we propose using quantum wavelet transform (QWT) to reduce the dimensionality of high spatial resolution data. The quantum wavelet transform takes advantage of the principles of quantum mechanics to achieve reductions in computation time while processing exponentially larger amount of information. We develop simpler and optimized emulation architectures than what has been previously reported, to perform quantum wavelet transform on highresolution data. We also implement the inverse quantum wavelet transform (IQWT) to accurately reconstruct the data without any losses. The algorithms are prototyped on an FPGAbased quantum emulator that supports doubleprecision floatingpoint computations. Experimental work has been performed using highresolution image data on a stateoftheart multinode highperformance reconfigurable computer. The experimental results show that the proposed concepts represent a feasible approach to reducing dimensionality of high spatial resolution data generated by applications such as particle tracking in highenergy physics.
1. Introduction
Highenergy physics deal with advanced instruments such as particle accelerators and detectors. These machines use electromagnetic fields to accelerate charged particles to high speeds and create collisions. By studying particle collisions and tracking collision trajectories, physicists can test the predictions of many theories of particle physics such as properties of the Higgs boson [1], discovering new particle families [2] as well as many highenergy physics problems [3]. There are a number of highenergy physics (HEP) research centers [4]. The largest particle accelerator is the Large Hadron Collider (LHC) in Geneva, Switzerland. Largescale generalpurpose particle detectors have been developed at the LHC. The ATLAS [5] and Compact Muon Solenoid (CMS) [6] are two examples which are used for studying the properties of the Higgs boson and investigating new physics. The ATLAS has an inner detector that has been used to observe the decay products of collisions. The pixel detector [7] is one of the main components of the inner detector, having over 80 million readout channels [8] (pixels), which contribute to half the total readout channels of the entire experiment. Reconstruction of highenergy particles from the pixel detector is considered a critical design and engineering challenge [9], due to its large readout count, high spatial resolution, and 3D spacetime measurements. There have been efforts to improve the tracking performance of the ATLAS Inner Detector [9, 10], which involved insertion of additional pixel detector layer (Insertable BLayer). Another approach that has been considered in the ATLAS FTK (Fast Track Trigger) upgrade [11] is using variable resolution patterns, where the data from the detector is compared to generated pattern banks of particle tracks and nonintersecting data is filtered. In highdimensional datasets, e.g., the pixel detector readout data, not all the measured data variables are relevant in understanding the underlying regions of interest (RoI). Generally, statistical predictive models are applied to multidimensional datasets for detection and pattern matching, which is a computationally expensive process. Thus, an effective method is needed to reduce the dimensionality [12] of the data in such highdimensional spatial sets, for faster detection and matching.
As a feasible solution to this problem, we here propose combining waveletbased dimension reduction techniques [13–15] with quantum information processing (QIP) [16] for applications in domains that generate highdimensional data volumes such as highenergy physics (HEP). More specifically, we propose using quantum wavelet transform (QWT) to reduce the dimensionality and high spatial resolution of data in HEP particle tracking. Waveletbased dimension reduction has been shown to be an effective technique in image preprocessing, reducing computation time, reducing interprocessor overhead, and improving classification accuracy [13–15]. Even so, the large volume of data from domains such as highenergy particle physics, present a challenge for a classical waveletbased method. The QWT has been demonstrated in previous works to be very useful in quantum image processing and quantum data compression [16–19]. Quantum information processing uses qubits as the basic units of information storage, compared to classical binary forms, and can exploit quantum mechanical properties such as entanglement and superposition [20]. Therefore, applying QIP techniques such as QWT for dimension reduction of HEP data will bring substantial improvements in storage and computation compared to classical signal processing techniques. To the best of our knowledge, this work is the first to investigate QWTbased dimension reduction for HEP applications. We develop simple and effective algorithms for QWT and inverseQWT (IQWT) that are best suited for dimension reduction and present corresponding emulation hardware architectures for QWT and IQWT.
The objectives and focus of our work are to demonstrate the feasibility of QWT for dimension reduction, through emulation, and to evaluate the performance of the emulation architectures. Our proposed algorithms are prototyped on an FPGAbased quantum emulator that has been developed based on our previous works [21, 22] and has been shown to emulate full quantum algorithms such as quantum Fourier transform (QFT) [23] and Grover’s search algorithm [24]. An FPGA platform was chosen because of its reconfigurability and flexibility in emulating multiple quantum algorithms. The emulator is based on the hardware system of DirectStream [25], which is a stateoftheart reconfigurable computing platform. This emulation platform can be conveniently used to verify and benchmark future implementations of the proposed system in HEP applications. In the next section, we discuss fundamental concepts of quantum computing, QWT, and the related work done on QWT. In Section 3, we elaborate our proposed methods and emulation architectures. In Section 4, the experimental results and analysis are presented. Section 5 is our conclusion and future directions of this work.
2. Background and Related Work
In this section, we discuss background concepts of quantum computing and the quantum wavelet transform. We also discuss current and related work on QWT and highenergy particle detection.
2.1. Qubits, Superposition, and Entanglement
The qubit is the smallest unit of quantum information that describes a twolevel quantum mechanical system. Physical implementations of the qubit can be electron/atomic/nuclear spin, where spin directions of the particle represent the two qubit levels. Other physical representations of the qubit can be photon polarization, superconducting Josephson junction, etc. [26]. The qubit is represented theoretically using the Bloch sphere [20], as shown in Figure 1. The basis states of the qubit, and are denoted by poles of the sphere. The property that distinguishes the qubit from the classical bit is superposition. The qubit can exist in a mixed or superposition state that is any other point on the surface of the sphere other than the poles. The overall state of the qubit can be defined using a linear superposition equation , where α and β are complex numbers determined from φ and θ as shown in Figure 1 and satisfying . Another distinguishing property of qubits is entanglement [20]. Two or more qubits can be entangled together, which means each entangled qubit becomes strongly correlated to the other along all possible combinations of the qubits. Outcome of measurement of one qubit is dependent on the other measurement, but individually they exhibit completely random behavior. In quantum computing, most algorithms assume that the qubits are fully entangled [21]. A system of n entangled qubits can be represented in vector space as complex basis state coefficients.
2.2. Quantum Wavelet Transform
The wavelet transform, similar to other transforms like Fourier transform, decomposes input signals into their components. The principal difference is that Fourier transform decomposes input signals into their sinusoidal orthogonal temporalonly bases, while wavelet transform uses a set of nonsinusoidal functions, usually called mother wavelets, that are both spatially and temporally localized [15]. This results in a very important feature unique to wavelet transform which is the preservation of spatial locality of data. In other words, wavelet transform gives information about both time and frequency of input data. Wavelet transform also has better computation speeds compared to other transforms [14]. Therefore, they are effective and widely used in many image processing applications [16]. The wavelet transform can be effectively implemented in the quantum information processing (QIP) domain as quantum wavelet transform (QWT) [16, 18, 19]. However, the related work on QWT is rare or preliminary. This is because quantum computing and QIP are fields that are gradually developing and have not yet reached full potential. Although many largescale quantum hardware is being developed [27], their useful applications are still yet to be decided. We discuss the classical wavelet transform first and then apply it in QIP domain, to establish a model for the QWT. The general wavelet transform can be expressed bywhere is called the mother wavelet function in complex conjugate form, and a, b are the time dilation and displacement factors, respectively. Wavelet transforms can be classified as discrete or continuous depending on the use of orthogonal or nonorthogonal wavelets, respectively. For the purposes of this paper, we will discuss the discrete wavelet transform (DWT). The DWT is a decomposition of input signals into a set of wavelet functions that are orthogonal to its translations and scale. The first and simplest DWT was introduced by mathematician Alfred Haar [15] and is thus named the Haar wavelet transform. The Haar mother wavelet function can be constructed using a unit step function, , as shown in (2). The discretized version of the Haar wavelet function is defined as (3), where , , and , is the sampling period, and K is the Haar window size in samples. Applying (3) in (1), the expression for the discrete Haar wavelet transform can be derived to be (4):where N is the number of data samples. When doing computation in the quantum domain, there are efficient methods of classicaltoquantum encoding [28–30]. Classical signal samples can be encoded as the coefficients of a quantum state, which is in superposition of its constituent basis states [28, 31]. The signal samples are transformed to a normalized sequence of amplitudes as shown in (5), where n is the number of qubits, is the number of basis states of the quantum system, and is the input quantum state. By applying the wavelet transform on the input quantum state, we can formulate the equivalent expression for the quantum Haar wavelet transform (QHT) as (6), where is the output quantum state:
There are many notable works on wavelets and applications of wavelet transforms [32–34]. We focused our survey on works of wavelet transform applied in the field of quantum information processing, i.e., quantum wavelet transforms (QWT). Early work on the QWT was reported in [16], where the authors present gatelevel circuits for the quantum Haar wavelet and Daubechies wavelet. They propose techniques for efficient quantum implementation of permutation matrices, which are required for factorization of the unitary operations of the wavelet transforms. In [35], the authors present quantum algorithms for Haar wavelet transforms and demonstrate applications in analyzing the multiscale structure of a dynamical system by logistic mapping. They show the derivation of the quantum wavelet transform by factoring the classical operators into direct sums, direct products, and dot products, which is the same approach in [16]. The work in [36] also demonstrates similar quantum circuits for QWT based on the wellknown pyramid and packet algorithms which are used in classical DWT. The work in [37] presents an analytical study of effects of imperfections in quantum computation of a QWTbased dynamical model. They propose a QWTbased algorithm for the Daubechies wavelets. The works in [38, 39] demonstrate applications of QWT in image watermarking. A more recent, novel watermarking method is proposed in [18], where they demonstrate improvement in invisibility and robustness of the watermarked image. The most recent work on QWT is presented in [19], where the authors provide quantum circuit derivations for the Haar and Daubechies wavelet transforms. The authors propose QHT circuits which contain k levels of permutations, where k is the kernel size.
The previous work on the QWT has mostly presented circuits and software simulations, and no hardware implementations were reported. In comparison, our focus is on efficient hardware implementation of the QWT, and we propose an optimized, low resourceintensive approach for emulation on classical hardware. To the best of our knowledge, our work is the first to (1) propose using QHT for reducing data dimensionality and (2) provide hardware emulation architectures for QHT. Our approach is simpler and optimized for emulation because it uses a single Haar kernel model and a pair of permutation models, where the permutation models are implemented as classical circuits. We propose classical circuits for permutation because (1) quantum permutation circuits implemented using multiple levels of swap operations [16, 19, 35, 36] have large quantum cost, and (2) classical permutation techniques such as index scheduling are space and time efficient for hardware implementation.
Moreover, among the previous work there have been no experimental demonstrations of QWTs on actual quantum hardware or on any quantum emulators. In our work, we present simplified architectures for implementing multilevel, multidimensional QHT operations on classical hardware and propose application of these methods in dimensionality reduction of particle tracking data in highenergy physics applications. Our proposed algorithms and architectures are easily generalizable, compared to previous works. In addition, our proposed architectures are more effective in utilizing minimal quantum and classical hardware resources which is more suited for dimension reduction. We experimentally evaluate the architectures on a highthroughput and highaccuracy FPGA quantum emulator. To the best of our knowledge, this work is the first to present experimental demonstrations of quantum wavelets used for dimension reduction in largescale applications, e.g., LHC.
2.3. HighEnergy Particle Detectors
The ATLAS Fast Tracker (FTK) is a hardware processor upgrade [9] for the Large Hadron Collider (LHC) which has been developed for faster reconstruction of tracks at 100 kHz. Details of the operating principle, hardware components, and performance can be found in [40]. The reconstruction is done by matching detector data with predefined track patterns that are stored in associative memory on ASICs. The data processing and pattern matching are done using FPGA hardware. The FTK receives data from the ATLAS pixel detector and stores them as clusters to reduce data size. The clusters are arranged into regions for parallel processing. In the processing units (PU), the tracks are stored with full resolution on input FPGAs, while other FPGA processors are responsible for converting the stored data into coarser resolution segments. This is followed by comparison of the coursegrained segments with prestored Monte Carlo track patterns. The coarse granularity of the tracks can cause problems in identification and pattern matching and lead to slower tracking performance of the FTK. In this work, we propose QHT techniques to reduce dimensionality of full resolution data such as FTK particle tracks. We also demonstrate an FPGAbased hardware prototype that can be easily integrated into the current FTK ATLAS architecture.
3. Methodology and Emulation Architectures
In this section, we elaborate our methodology that uses QHT to achieve dimension reduction. We also detail the corresponding emulation hardware architectures that were implemented [41].
3.1. Dimension Reduction
The classical wavelet transform has been shown to achieve dimension reduction efficiently and can be used in various applications that use hyperspectral data, for example, remote sensing, mineralogy, and surveillance. Depending on the type of data and the application in which these data are being used, both 1D wavelet transform (1DWT) and 2D wavelet transform (2DWT) techniques can be used for dimension reduction. For example, while the data in remote sensing hyperspectral imagery is in the form of large 3D data cubes, 1D wavelet transform (1DWT) was previously proposed [13, 14] for efficient dimensionality reduction of such data cubes. In the experimental work in [14], five levels of wavelet decomposition were used on images of size 217 × 512 pixels by 192 bands to achieve ×32 reductions in data volume. In current and future largescale applications, the volume of data can be overwhelming. For example, hyperspectral image cubes are typically hundreds of pixels in width and height [13], with 220–240 frequency bands [14]. The ATLAS pixel detector contains 1700 detection modules corresponding to pixels [8] and has bandwidth capacity of 48 Gb/s [11]. Hence, it is necessary to investigate and apply newer paradigms of information processing and storage for supporting future applications at full bandwidths. In quantum information processing, exponentially greater amount of information can be held in the state of quantum system compared to a classical binary system. Thus, we propose using quantum information processing techniques such as the QWT for the processing of high volumes of data in largescale applications. For example, a 64K × 64K image can be reduced to a smaller resolution of 32 × 32 using a 32qubit, 12level QWT decomposition. The pixels are encoded as N basis states of a quantum state, where and n is the number of qubits, i.e., 32.
Our proposed methodology for dimension reduction using quantum wavelet transforms is shown in Figure 2 [41]. In our proposed approach, each pixel of the input image is encoded as a basis coefficient of a quantum state. Input image data first undergoes a multidimensional quantum Haar transform, e.g., onedimensional QHT (1DQHT) or twodimensional QHT (2DQHT) operation. The operations can have multiple decomposition levels and separate the input image into a number of low frequency and high frequency replications, depending on the number of decomposition levels. The lowest frequency image replication retains the principal components of the input data without significant data loss. More importantly, the mirror images have reduced dimensionality and thus can be used for reducing preprocessing overhead or communication bandwidth congestion. Multidimensional inverse quantum Haar transform (1DIQHT or 2DIQHT) is then applied to reconstruct the original data. The 2D operations can be achieved by cascading 1D operations and multiple permutation sets.
The proposed kernelbased algorithms for multilevel 1DQHT and 2DQHT are elaborated in Algorithms 1 and 2, respectively. The algorithms perform multilevel decompositions of 1DQHT or 2DQHT operations based on a ddimensional Haar wavelet kernel. The kernel functionality can be represented by a set of operations applied to some input states/pixels and is preceded and followed by perfect shuffle permutation operations [16] on the input and output states/pixels. The permutation operations are performed by means of index calculations and scheduling. Algorithm 1 performs 1DQHT on a set of input pixels, X, to produce an output pixel set, Y. The input pixels first undergo input permutations, followed by 1D Haar kernel operations on 2 pixels every cycle, and output permutations. Algorithm 2 performs 2DQHT on a set of input pixels, X, to produce an output pixel set, Y. The input pixels first undergo input permutations, followed by 2D Haar kernel operations on 4 pixels every cycle, and output permutations.


To efficiently extract output state data, quantumtoclassical readout techniques [28] such as quantum Fourier transform (QFT) can be employed. However, this was not required to be implemented in this work as full emulation of quantum computation was performed on classical hardware and the output of the emulator is in classical representation. For emulation, we develop circuit models based on these algorithms and integrate them into reconfigurable hardware architectures for multilevel, multidimensional (1D and 2D) QHT and IQHT. These models and emulation architectures are elaborated in the next section.
3.2. Quantum Haar Transform Kernel
The Haar wavelet kernel can be generalized by quantum operations using n qubits and a ddimension kernel as shown in (7), where is the Kronecker product [42], H is the Hadamard transform [20], and I is an identity matrix. Here, a group of entangled gates is denoted by the gate symbol with the size of the equivalent operation matrix as subscript, for example, . The quantum Haar function can be implemented using d entangled H gates and entangled I gates as shown in (7). For example, the transformation matrix for 2DQHT with can be derived as shown in (9):wherewhere :where
3.3. Permutation Operations
Perfect shuffle permutation on a given vector is described as partitioning the vector in half and shuffling the top and bottom portions of the halves [16]. In our algorithms for QHT and IQHT, we apply similar input and output permutation operations before and after applying the QHT kernel, respectively. The QHT kernel is performed on a set of k points, where . An input permutation operation involves dividing the input vector of size N, into k groups, and selecting a state (pixel) from every group(s), to be applied to the kernel operation. For 1DQHT and 2DQHT operations, the input permutations, and , are shown in (11) and (12), respectively. An output permutation involves arranging the pixels from k groups into a single output state sequence. The output permutation for 1DQHT and 2DQHT operations, and , are shown in (13) and (14), respectively:
3.4. Emulation Architectures
While developing the emulation architectures for the proposed system, as an intermediate step, we design circuit models, illustrated in Figures 3 and 4 for 1D and 2DQHT/IQHT, respectively. These models are derived from the sequence of operations in Algorithms 1 and 2 and can contain quantum and/or classical circuits. The 1D and 2DQHT models in Figures 3(a) and 4(a), respectively, consist of input permutation models (), followed by Haar kernel models () and then output permutation models (). The 1D and 2DIQHT models in Figures 3(b) and 4(b), respectively, consist of inverse output permutation models , followed by Haar kernel models () and then inverse input permutation models . The inverse models are equivalent to the direct models, as the permutation operations are reversible. To achieve multilevel decompositions, multiple iterations of the Haar kernel models are applied. The QHT and IQHT operations for 1D and 2D are summarized as unitary transformations in (15) and (16), respectively. The emulation architectures of the 1DQHT/IQHT and 2DQHT/IQHT are shown in Figures 5 and 6, respectively. Since the hardware implementations of the 1D and 2D are similar, we focus our following discussions on the implementation of the 2DQHT emulation architectures:
(a)
(b)
(a)
(b)
(a)
(b)
(c)
(a)
(b)
(c)
As shown in Figures 4(a) and (16), the first step in the 2DQHT operation is the input permutation , which is described by (12). The permutations can be modeled as quantum circuits with multiple swap gates, but that would incur high resource utilization in the corresponding emulation architecture. For this reason, we use classical models that involve simple index scheduling, and the corresponding emulation architecture is shown in Figure 6(a). The input is a vector of quantum state coefficients which are written to a memory array in the index order 0 to . Four coefficient values are then read out each clock cycle, with the scheduler generating the read indices according to the input permutation, see Algorithm 2 and (12). The scheduler maintains a counter, row index , and a column index to calculate the output indices. Multiplications and divisions by powers of two are replaced by logical shifts for optimizing area and speed. The scheduler also requires a floor operation unit.
As shown in Figures 4(b) and (9), the 2DHaar transformation, , is modeled using a pair of Hadamard gates. The Hadamard pair operation reduces to kernel operations on a set of four coefficients as we described in Algorithm 2. The emulation architecture for the 2DHaar kernel is shown in Figure 6(b). The design takes in four input coefficients, applies the kernel operations which involve addition and division, and outputs four coefficients per clock cycle. Conventional operator sharing techniques and logical shifts are applied to optimize for speed and area.
The final step in the 2DQHT operation is the output permutation, , described by (14). The corresponding emulation architecture is shown in Figure 6(c) and works similar to the input permutation scheduler. The input vector of coefficients are written to a memory array, four values per clock cycle, with the scheduler generating the write indices according to the output permutation described in Algorithm 2. The permuted coefficients are then read out from memory 4 values per clock cycle.
The emulation hardware architectures, i.e., input/output schedulers and 1D/2D Haar kernels, were integrated into a reconfigurable quantum emulator design based on our previous works [21, 22], whose highlevel architecture is shown in Figure 7. The emulator stores input and output quantum states as vectors of the state coefficients and core kernel operations are extracted from the input quantum algorithm. The input state vector goes through the input permutations (input schedulers) before the kernel operation is applied iteratively across each state. To get the correct final quantum state that represents the transformed data, the output permutation (output schedulers) is applied. The architecture uses a fully pipelined dataflow architecture and supports single and doubleprecision floatingpoint arithmetic. For example, each quantum state coefficient is complex and is modeled in 32 bit floatingpoint precision for the real and imaginary components, respectively. The emulator also supports features such as fullyentangled input quantum state preparation from a set of input qubits and output quantum state measurement as a classical bit string. The emulator is generic and can efficiently run a given quantum algorithm that can be reduced to its corresponding unitary transformation.
4. Experimental Work
The experimental work was performed on DS8, a stateoftheart highperformance reconfigurable computing (HPRC) system developed by DirectStream [25]. On the DS8 platform, developers can build applications on hardware systems ranging from singlenode compute instances to multinode structures, see Figure 8. A single C2 compute node of the DS8 system is equipped with a highend IntelAltera Arria 10AX115N4F45E3SG FPGA, with onchip resources such as adaptive logic modules (ALMs), block RAMs (BRAMs), digital signal processors (DSPs), and onboard resources such as two 32 GB SDRAM memory banks and four 8 MB SRAM memory banks, as shown in Figure 8. A userfriendly programming environment, previously known as CarteC [43], is integrated into the DS hardware systems. A highlevel language (HLL) facilitates the development of complex, parallel, and reconfigurable codes in an efficient manner. The study in [44] showed that CarteC has a highly productive environment, short acquisition time, and short learning time as well as a short development time. The DS8 architecture provides a combination of high performance, high scalability, runtime reconfiguration, and ease of use.
(a)
(b)
(c)
The QHT and IQHT architectures were implemented using C++ on the DS8 programming environment. Input images with a resolution of up to , and 256 shades of grayscale pixels, were used to test the designs. MATLAB was used to convert the images into greyscale, generate the input vectors for DS8, and reconstruct images from the output vectors. Synthesis and hardware builds were performed using Quartus Prime Version 17.02 on the DS8 environment. Figure 9(a) shows one of the input images converted to greyscale, and Figure 9(b) is the output after a 1DQHT operation with 1 level of decomposition. Figure 9(c) is the output after a 1DQHT operation with 2 levels of decomposition, and Figure 9(d) shows the reconstructed images after a 1DIQHT operation was applied. Figures 10(a)–10(d) show the results from repeating the experiment using the 2DQHT and 2DIQHT architectures.
(a)
(b)
(c)
(d)
(a)
(b)
(c)
(d)
Resource utilizations from the hardware implementations are summarized in Tables 1 and 2 for 1D and 2D, respectively. The onchip resources (ALMs, BRAMs, DSPs) are used up in implementing the static components of the design such as counters, adders, and shift operators and hence are constant as the emulated circuit size (number of qubits) increases. The low onchip resource utilizations indicate that our proposed approach and emulation architecture designs are highly spaceefficient. The 1DQHT architecture consumes lower onchip resources than 2DQHT due to its less complex kernel operations. The low resource utilizations also indicate the flexibility of the QHT and IQHT designs for integrating with larger algorithms.
 
Total chip resources: N_{ALM} = 427,200; N_{BRAM} = 2,713; N_{DSP} = 1,518. Total onboard SDRAM memory: 2 parallel banks of 32 GB each. 
 
Total chip resources: N_{ALM} = 427,200; N_{BRAM} = 2,713; N_{DSP} = 1,518. Total onboard SDRAM memory: 2 parallel banks of 32 GB each. 
The SDRAM memory requirements for storage of the input and output images as quantum state vectors are also reported in Tables 1 and 2. For the highest resolution image of size , the pixels occupy 25% of the total onboard SDRAM memory (64 GB) available on a single DS node. The pixels of the input images are encoded as basis coefficients of a quantum state. For example, to store or 256 pixels, we need 256 complex coefficients each of which have a real and imaginary component occupying total bytes in 32 bit floatingpoint representation. Therefore, for storing both input and output images, bytes of memory was required. The obtained memory usages for larger QHT circuits are consistent with expected values.
The hardware designs on the FPGA were pipelined to ensure a constant and high operating frequency of 233 MHz. The obtained emulation times for high resolution images are also feasible. For a image, 20 qubits were sufficient for achieving dimension reduction using 1DQHT and 2DQHT. From our experimental results, we observe that the emulation time increases linearly with increase in the number of image pixels (states), as illustrated in Figure 11. This is because a large portion of the emulation time is dedicated to writing in and reading out the input/output state vectors of size N (number of pixels); hence, the emulation time complexity is . This indicates the benefit of using quantum encoding of data, i.e., encoding each image pixel as a basis state coefficient in the quantum state space. Finally, the emulation times for 1DQHT are higher than 2DQHT because of the higher number of iterations in the 1D algorithm, compared to iterations in the 2D algorithm, see Algorithms 1 and 2.
In general, on a classical emulation platform, the emulation execution time increases with both the spatial and temporal complexities of the quantum circuit. In other words, the emulation time of a quantum circuit on a classical platform is generally a function of both the circuit width (number of qubits) and depth (number of gate levels). Due to optimizations and encoding techniques we used, the emulation time of our proposed emulation architectures is a function of only the quantum circuit width (number of qubits), as shown by our experimental results. On stateoftheart superconducting NISQ devices [45, 46], the execution time is a function of only the depth (number of gate levels) of the circuit [47]. For our proposed 1DQHT and 2DQHT circuits, which are simple quantum circuits of depth 1, we estimate an execution time of 0.01 ms on a typical NISQ device processing a qubit array with sampling frequency of 100 kHz [47]. The estimated execution time is constant for a fixed circuit depth and variable number of qubits in the quantum processing unit (QPU) array; i.e., the time complexity is theoretically . In comparison, the time complexity of our emulation is .
Our emulation experiments and implementations help in validating the functionality and feasibility of the proposed QHTbased methodology in achieving dimension reduction of highresolution images. The emulation provides implications for the proposed system’s application in fast, efficient processing of particle tracking data in the largescale, highenergy physics domain. The emulation is memorybound by the resources on a single DS FPGA node. For largerscale emulation, the onboard memory has to be increased, or multinode, and/or multichassis architectures of the DS system can be utilized in conjunction with efficient scheduling techniques and highbandwidth networks [22].
We further quantitatively compare our obtained experimental results with the existing FPGAbased emulation work [48–53] as shown in Table 3. Among the related work on FPGA emulation of quantum circuits, our emulator has the capability of emulating the largest quantum circuits (QFT, QHT, and Grover’s search), with highest operating frequency (233 MHz) and high precision (32 bit floatingpoint). Current FPGA hardwareemulators have many discrepancies (missing resource utilization, operating frequency, and emulation time) in the reporting of their results which makes a comprehensive comparison difficult. In our comparison, we included only hardware emulators, as most parallelsoftwaresimulators are based on largescale supercomputers such as Summit [47] and Sunway [54], which are extremely costly, powerhungry, and resourcehungry and are not comparable with FPGAemulators. Also, they provide simulations of random quantum circuits and not full quantum algorithms.
 
Results obtained at a later time to publication. 
5. Conclusions
Quantum information processing and quantum computing will have significant implications in the future of computing technology. As current quantum technology continues to improve, there is a great need to investigate useful applications in quantum information theory. In this work, we presented a first effort, to the best of our knowledge, to efficiently reduce data dimensionality using quantum processing methods such as quantum wavelet transform. We propose to apply these techniques in physics applications that investigate highenergy particle detection and tracking, where dimension reduction helps to reduce communication bandwidth and speedup preprocessing computations. Our proposed architectures are simpler and optimized for hardware implementation than previously reported works. We demonstrated the minimal resource utilization, high performance/throughout, and high precision of the proposed architectures. We prototyped our designs on a quantum emulator and demonstrated the feasibility of proposed techniques by conducting experiments using highresolution test image data.
Due to limitations of the current state of quantum technology, e.g., cost, availability, and current scale (size) of quantum processors, it is beyond the scope of this work to actually implement the system and measure performance. Although not yet integrated with the ATLAS FTK project, the proposed approach and emulation hardware architectures are feasible for future implementations, with the maturing of current quantum technology. For future integration into the ATLAS FTK project, data conversion techniques such as quantumtoclassical and classicaltoquantum, which are heavilyresearched current topics, must be perfected first, and we plan to conduct investigations of these techniques in our future work. Our future plans also include application of the proposed methods using real HEP data and combining QHT with Grover’s search algorithm as a complete solution to HEP FTK problems. We will also investigate 3DQHT, Daubechies wavelet transforms, and their application for realtime data streaming.
Data Availability
The test data used to support the findings of this study are available from the corresponding author upon request and approval from Direcstream.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
Acknowledgments
We would like to thank Prof. Alice Bean and Prof. Christopher Rogan from the department of Physics and Astronomy at the University of Kansas for their valuable insights and help in this work.
References
 CERN Accelerating science, “The Higgs boson,” 2019, https://home.cern/science/physics/higgsboson. View at: Google Scholar
 CERN Accelerating science, “Unified forces,” 2019, https://home.cern/science/physics/unifiedforces. View at: Google Scholar
 V. L. Ginzburg, The Physics of a Lifetime: Reflections on the Problems and Personalities of 20th Century Physics, Springer Science & Business Media, Berlin, Germany, 2013.
 I. Kisel, Track Reconstruction and Pattern Recognition in HighEnergy Physics, 2019, https://www.physik.uniheidelberg.de/c/image/exp/f/highrr/Kisel_HD_12.04.2016.pdf.
 G. Aad and ATLAS Collaboration, “The ATLAS experiment at the CERN large Hadron collider,” Journal of Instrumentation, vol. 3, no. 8, 2008. View at: Publisher Site  Google Scholar
 C. O’Luanaigh, New Results Indicate that New Particle is a Higgs Boson, CERN, Geneva, Switzerland, 2019, https://home.cern/news/news/physics/newresultsindicatenewparticlehiggsboson.
 CERN Accelerating science, “The inner detector,” 2019, https://atlas.cern/discover/detector/innerdetector. View at: Google Scholar
 F. Hugging, “The ATLAS pixel detector,” in Proceedings of the IEEE Symposium Conference Record Nuclear Science, vol. 2, pp. 1077–1081, Rome, Italy, October 2004. View at: Google Scholar
 M. Backhaus, “The upgraded pixel detector of the ATLAS experiment for run 2 at the large Hadron collider,” Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment, vol. 831, pp. 65–70, 2016. View at: Publisher Site  Google Scholar
 H. Pernegger, The Pixel Detector of the ATLAS Experiment for LHC Run2, 2015.
 S. Amerio, A. Andreani, A. Andreazza et al., ATLAS FTK: Fast Track Trigger, 2013.
 I. K. Fodor, A Survey of Dimension Reduction Techniques, Center for Applied Scientific Computing, Lawrence Livermore National Laboratory, Livermore, CA, USA, 2002.
 S. Kaewpijit, J. Le Moigne, and T. ElGhazawi, “Automatic reduction of hyperspectral imagery using wavelet spectral analysis,” IEEE Transactions on Geoscience and Remote Sensing, vol. 41, no. 4, 2003. View at: Publisher Site  Google Scholar
 E. ElAraby, T. ElGhazawi, J. Le Moigne, and K. Gaj, “Wavelet spectral dimension reduction of hyperspectral imagery on a reconfigurable computer,” in Proceedings of the IEEE International Conference on FieldProgrammable Technology (FPT), pp. 399–402, Brisbane, Australia, December 2004. View at: Google Scholar
 J. Wickmann, “A wavelet approach to dimension reduction and classification of hyperspectral data,” Faculty of Mathematics and Natural Sciences, University of Oslo, Oslo, Norway, 2007, Masters Thesis. View at: Google Scholar
 A. Fijany and C. P. Williams, Quantum Wavelet Transforms: Fast Algorithms and Complete Circuits, 1998, http://arxiv.org/abs/9809004v1.
 J. Stajic, “The future of quantum information processing,” Science, vol. 339, no. 6124, p. 1163, 2013. View at: Publisher Site  Google Scholar
 S. Heidari, M. Naseri, R. Gheibi, M. Baghfalaki, M. R. Pourarian, and A. Farouk, “A new quantum watermarking based on quantum wavelet transforms,” Communications in Theoretical Physics, vol. 67, no. 6, p. 732, 2017. View at: Publisher Site  Google Scholar
 L. HaiSheng, P. Fan, H. Xia, S. Song, and X. He, “The multilevel and multidimensional quantum wavelet packet transforms,” Scientific Reports, vol. 8, no. 1, 2018. View at: Publisher Site  Google Scholar
 M. A. Nielsen and I. L. Chuang, Quantum Computation and Quantum Information, Cambridge University Press, Cambridge, UK, 2010.
 N. Mahmud and E. ElAraby, “A scalable highprecision and highthroughput architecture for emulation of quantum algorithms,” in Proceedings of the 31st IEEE International SystemonChip Conference (SOCC 2018), Washington, DC, USA, September 2018. View at: Google Scholar
 N. Mahmud and E. ElAraby, “Towards higher scalability of quantum hardware emulation using efficient resource scheduling,” in Proceedings of the 3rd IEEE International Conference on Rebooting Computing (ICRC 2018), Washington, DC, USA, November 2018. View at: Google Scholar
 P. W. Shor, “Algorithms for quantum computation: discrete logarithms and factoring,” in Proceedings of the 35th IEEE Annual Symposium on Foundations of Computer Science (SFCS ’94), pp. 124–134, Santa Fe, NM, USA, November 1994. View at: Google Scholar
 L. K. Grover, “A fast quantum mechanical algorithm for database search,” in Proceedings of the TwentyEighth Annual ACM Symposium on Theory of computing (STOC ’96), pp. 212–219, Philadelphia, PA, USA, May 1996. View at: Google Scholar
 DirectStream LLC, 2019, https://directstream.com.
 Quantum Computing Report, Qubit Technology, 2019, https://quantumcomputingreport.com/scorecards/qubittechnology/.
 L. Gomes, “Quantum computing: both here and not here,” IEEE Spectrum, April 2019, https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8322045. View at: Google Scholar
 C. P. Williams, Explorations in Quantum Computing, Springer Science & Business Media, Berlin, Germany, 2010.
 L. Grover and T. Rudolph, Creating Superpositions that Correspond to Efficiently Integrable Probability Distributions, 2002, http://arxiv.org/abs/0208112.
 P. Kaye and M. Mosca, Quantum Networks for Generating ArbitraryQuantum States, 2004, http://arxiv.org/abs/0407102.
 X. Yao, H. Wang, Z. Liao et al., “Quantum image processing and its application to edge detection: theory and experiment,” Physical Review X, vol. 7, no. 031041, 2017. View at: Publisher Site  Google Scholar
 S. G. Mallat, “A theory for multiresolution signal decomposition: the wavelet representation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 11, no. 7, pp. 674–693, 1989. View at: Publisher Site  Google Scholar
 B. Toufik and N. Mokhtar, “The wavelet transform for image processing applications,” in Advances in Wavelet Theory and Their Applications in Engineering, Physics and Technology, pp. 395–422, InTech, Rijeka, Croatia, 2012. View at: Google Scholar
 P. S. Addison, The Illustrated Wavelet Transform Handbook: Introductory Theory and Applications in Science, Engineering, Medicine and Finance, CRC Press, Boca Raton, FL, USA, 2017.
 D. Gosal and W. Lawton, Quantum Haar Wavelet Transforms and Their Applications, 2001.
 H. Ohnishi, H. Matsueda, and L. Zheng, “Quantum wavelet transform and matrix factorization,” in Proceedings of the IEEE International Quantum Electronics Conference, pp. 13271328, Antwerp, Belgium, 2005. View at: Google Scholar
 M. Terraneo and D. L. Shepelyansky, “Imperfection effects for multiple applications of the quantum wavelet transform,” Physical Review Letters, vol. 90, no. 25, 2003. View at: Publisher Site  Google Scholar
 X.H. Song, S. Wang, S. Liu, A. A. Abd ElLatif, and X.M. Niu, “A dynamic watermarking scheme for quantum images using quantum wavelet transform,” Quantum Information Processing, vol. 12, no. 12, pp. 3689–3706, 2013. View at: Publisher Site  Google Scholar
 Y.G. Yang, P. Xu, J. Tian, and H. Zhang, “Analysis and improvement of the dynamic watermarking scheme for quantum images using quantum wavelet transform,” Quantum Information Processing, vol. 13, no. 9, pp. 1931–1936, 2014. View at: Publisher Site  Google Scholar
 N. Ilic, “The ATLAS fast tracker and tracking at the highluminosity LHC,” Journal of Instrumentation, vol. 12, no. 2, 2017. View at: Publisher Site  Google Scholar
 N. Mahmud, E. ElAraby, and D. Caliga, “Scaling reconfigurable emulation of quantum algorithms at high precision and high throughput,” Quantum Engineering, vol. 1, no. 2, 2019. View at: Publisher Site  Google Scholar
 R. K. Brylinski and G. Chen, Mathematics of Quantum Computation, CRC Press, Boca Raton, FL, USA, 2002.
 E. ElAraby, T. ElGhazawi, J. Le Moigne, and R. Irish, “Reconfigurable processing for satellite onboard automatic cloud cover assessment,” Journal of RealTime Image Processing, vol. 4, no. 3, pp. 245–259, 2009. View at: Publisher Site  Google Scholar
 E. ElAraby, S. G. Merchant, and T. ElGhazawi, “Assessing productivity of highlevel design methodologies for highperformance reconfigurable computers,” in HighPerformance Computing Using FPGAs, W. Vanderbauwhede and K. Benkrid, Eds., pp. 719–745, Springer, New York, NY, USA, 2013. View at: Google Scholar
 A Preview of Bristlecone, Googles New Quantum Processor, Google AI Blog, March 2018.
 IBM Announces Advances to IBM Q Systems & Ecosystem, IBM Press Release, November 2017.
 B. Villalonga, D. Lyakh, S. Boixo et al., “Establishing the quantum supremacy frontier with a 281 Pflop/s simulation,” 2019, http://arxiv.org/abs/1905.00444v1. View at: Google Scholar
 M. Fujishima, “FPGAbased highspeed emulator of quantum computing,” in Proceedings of the IEEE International Conference on FieldProgrammable Technology (FPT 2003), Tokyo, Japan, December 2003. View at: Google Scholar
 A. U. Khalid, Z. Zilic, and K. Radecka, “FPGA emulation of quantum circuits,” in Proceedings of the IEEE International Conference on Computer Design: VLSI in Computers and Processors (ICCD 04), pp. 310–315, San Jose, CA, USA, October 2004. View at: Google Scholar
 M. Aminian, M. Saeedi, M. S. Zamani, and M. Sedighi, “FPGAbased circuit model emulation of quantum algorithms,” in Proceedings of the IEEE Computer Society Annual Symposium on VLSI (ISVLSI ’08), pp. 399–404, Montpellier, France, April 2008. View at: Google Scholar
 Y. H. Lee, M. KhalilHani, and M. N. Marsono, “An FPGAbased quantum computing emulation framework based on serialparallel architecture,” International Journal of Reconfigurable Computing, vol. 2016, Article ID 5718124, 18 pages, 2016. View at: Publisher Site  Google Scholar
 A. Silva and O. G. Zabaleta, “FPGA quantum computing emulator using high level design tools,” in Proceedings of the Eight Argentine Symposium and Conference on Embedded Systems (CASE’17), pp. 1–6, Buenos Aires, Argentina, August 2017. View at: Google Scholar
 J. Pilch and J. Długopolski, “An FPGAbased real quantum computer emulator,” Journal of Computational Electronics, pp. 1–14, 2018. View at: Google Scholar
 R. Li, B. Wu, M. Ying, X. Sun, and G. Yang, “Quantum supremacy circuit simulation on Sunway TaihuLight,” 2018, http://arxiv.org/abs/1804.04797. View at: Google Scholar
Copyright
Copyright © 2019 Naveed Mahmud and Esam ElAraby. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.