Research Article  Open Access
Y. H. Lee, M. KhalilHani, M. N. Marsono, "An FPGABased Quantum Computing Emulation Framework Based on SerialParallel Architecture", International Journal of Reconfigurable Computing, vol. 2016, Article ID 5718124, 18 pages, 2016. https://doi.org/10.1155/2016/5718124
An FPGABased Quantum Computing Emulation Framework Based on SerialParallel Architecture
Abstract
Hardware emulation of quantum systems can mimic more efficiently the parallel behaviour of quantum computations, thus allowing higher processing speedup than software simulations. In this paper, an efficient hardware emulation method that employs a serialparallel hardware architecture targeted for field programmable gate array (FPGA) is proposed. Quantum Fourier transform and Grover’s search are chosen as case studies in this work since they are the core of many useful quantum algorithms. Experimental work shows that, with the proposed emulation architecture, a linear reduction in resource utilization is attained against the pipeline implementations proposed in prior works. The proposed work contributes to the formulation of a proofofconcept baseline FPGA emulation framework with optimization on datapath designs that can be extended to emulate practical largescale quantum circuits.
1. Introduction
Quantum computing is based on the properties of quantum mechanics, namely, superposition and entanglement. Superposition allows a quantum state to be in more than one basis state simultaneously, whereas entanglement is the strong correlation between multiqubit (quantum bit) basis states in a quantum system. Superposition and entanglement facilitate massive parallelism which enables exponential speedups to be achieved in the wellknown integer factoring and discrete logarithms algorithms [1] and quadratic speedups in solving classically intractable bruteforce searching and optimization problems [2, 3].
Similar to classical computing, quantum algorithms are developed long before any largescale practical quantum computer is physically available. In 1994, Shor proposed the integer factoring and discrete logarithms algorithms [1] that brought the world’s attention to the enormous potential of quantum computing. An example of this is the RivestShamirAdleman (RSA) security scheme [4] which is widely applied in current public key cryptosystem. It is based on the assumption that integer factoring of large number is intractable in classical computing. Shor’s proposal, which, in contrast, factors integer in polynomial time, would make such security scheme no longer secure. In [5], Grover proposed a quantum search algorithm that is capable of identifying a specific element in an unordered elements database in attempts. This algorithm achieves a quadratic speedup over the corresponding classical method that requires queries on average, to retrieve the desired data. Although the solution is only polynomially faster than the classical approach, Grover’s quantum algorithm is an important one as it can be generalized to be applied in many intractable computer science problems. Recently, quantum equivalents for random walks [6], genetic algorithms [3], and NAND tree evaluation [7] have been developed.
Shor in [8] categorized quantum algorithms known to provide substantial speedup over the classical approach into three types: (a) algorithms that achieve notable speedup by applying quantum Fourier transform (QFT) in periodicity finding; examples of this type of algorithm include integer factoring and discrete logarithms algorithms [1], Simon’s periodicity algorithm [9], Hallgren’s algorithms for Pell’s equation [10], and the quantum algorithms for solving hidden subgroup problems [11, 12]; (b) Grover’s search algorithm and its extensions [2, 13] which in general offer square root speed improvements over their classical counterparts; and (c) algorithms for simulating or solving problems in quantum mechanics [14].
Physical realization of a quantum computer is proving to be extremely challenging [15]. With research into viable largescale quantum computers still ongoing, various technologies, namely, ion trap [16], nuclear magnetic resonance [17], and superconductor [18], were attempted. Nevertheless, only smallscale quantum computation implementations have been achieved [19, 20]. Instead of focusing on the realization of quantum gates, a different approach known as quantum annealing which solves optimization problems by finding the minimum point is used in the 128qubit DWave One, 512qubit DWave Two, and 1000qubit DWave 2X systems [21, 22]. However, based on the research report presented in [23], the expected quantum speedups were not found in the DWave systems.
In parallel to efforts to develop physical quantum computers, there is also much effort in the theoretical research of quantum algorithms. Until largescale practical quantum computers become prevalent, quantum algorithms are currently developed using the classical computing platform. However, due to their inherent sequential behaviour, classical computers that are based on Von Neumann architecture cannot simulate the inherent parallelism in quantum systems efficiently. On the other hand, the technology of field programmable gate array (FPGA) offers the potential of massive parallelism through hardware emulation. Consequently, significant improvement in speed performance over the equivalent software simulation can be achieved. However, FPGA is still a form of classical digital computing, and resource utilization on such a classical computing platform grows exponentially as the number of qubits increases. The problem is further compounded with the fact that accurate modelling of quantum circuit in FPGA technology is nonintuitive and therefore difficult, providing the research motivation for this paper.
This paper presents an efficient FPGA emulation framework for quantum computing. In the proposed emulation model, quantum computations are mapped to a serialparallel architecture that facilitates scalability by managing the exponential growth of resource requirement against number of qubits. Quantum Fourier transform and Grover’s search are chosen as case studies in this work since they are the core of many useful quantum algorithms, and in addition, they have been used as benchmarking models in prior works on FPGA emulation. Experimental results on the efficiencies of different FPGA emulation architectures and fixed point formats are presented, which will sufficiently demonstrate the feasibility of proposed framework.
The rest of this paper is organized as follows: Section 2 discusses prior works on FPGAbased quantum computing emulation, emphasizing issues of hardware architecture and modelling of quantum system on FPGA platform. In Section 3, the theoretical background on quantum computing and related quantum algorithms is provided. Section 4 presents the design of the proposed FPGA emulation models for QFT and Grover’s search algorithms. Experimental results and analysis are given in Section 5. Finally, concluding remarks are made in Section 6.
2. Related Work
Modelling of a quantum system on classical computing platform is a challenging task. Hence, it is even more difficult to map quantum algorithms for emulation on classical computing environment based on FPGA, which is highly resourceconstrained. Many attempts have been made in the last decade in FPGA emulation of quantum algorithms, and these works include [24–27]. However, details of the critical design processes such as mapping of the quantum algorithms into the FPGA emulation models and the verification of the implementations are not revealed in these prior works.
For softwarebased simulation using classical computer, various types of quantum simulators have been proposed. An open source C library, , for simulation of quantum computing is presented in [28] where pure quantum computer simulation as well as general quantum simulation is supported by the tool. In 2007, a variant of binary decision diagram named quantum information decision diagram (QuIDD) for compact state vector storage was introduced in [29] for efficient quantum circuit simulation. García and Markov [30] proposed a compact data structure based on stabilizer formalism called stabilizer frames.
Most of the previous FPGA emulation works are based on the quantum circuit model, which is essentially an interconnection of quantum gates. A different approach was taken by Goto and Fujishima [24] where a general purpose quantum processor was developed instead of applying the quantum circuit model. However, Fujishima’s quantum processor assumed that the amplitudes of a quantum state can be either all zeros or with evenly distributed probability. In its emulation of Shor’s integer factoring algorithm, details of the implementation are inadequate for its results to be verified as claimed. For instance, it is stated in [24] that a 64bit factorization was demonstrated using their emulator with only 40 Kbits of classical memory instead of 320 qubits as required with Shor’s algorithm in a quantum computer. This statement was not supported by design and implementation details on how factorization of such a large integer can be done with only 40 Kbits memory, where typically it would require at least bytes to represent a quantum state of such a scale on the classical platform.
In [25], FPGA emulation of 3qubit QFT and Grover’s search are proposed. In this work, which is based on the quantum circuit model, qubit expansion is performed prior to the application of multiqubit quantum gate transformations. This leads to an inaccurate modelling of a quantum algorithm, since, according to [31], the input quantum state to QFT circuit should first be placed in superposition of basis states, where signal samples are encoded as sequence of amplitudes. In the work by [26], hardware emulation of QFT restricts its input quantum state to the computational basis state, implying that superposition is not included in the modelling. RiveraMiranda et al. in [26] claims 16qubit QFT emulation is achieved. However, the emulator can only process up to 32 input signal samples in one evaluation, which is equivalent to a 5qubit QFT emulation if effects of superposition and entanglement are included.
From the above discussion it should be noted then that the critical quantum properties of superposition and entanglement were not considered in these previous works, resulting in inaccurate modelling of quantum algorithms. Without the superposition and entanglement effects, the power of quantum parallelism cannot fully be exploited. Previous works reported in [25–27] applied pipeline architecture in their FPGA emulation implementations so as to obtain high throughput and low critical path delay. However, a pipeline design imposes high resource utilization (due to the requirement of additional pipeline registers and associated logic), thus limiting FPGA emulation to be deployed in more practical quantum computing applications that typically require high qubit sizes. In these pipeline implementations proposed in prior works, resource growth was exponential to the increase in qubit sizes.
In this paper, the issues outline above is addressed. The efficiencies of different hardware architectural designs for FPGA emulation purposes are evaluated based on the chosen case studies of QFT and Grover’s search. We propose an accurate modelling of quantum system for FPGA emulation, targeting efficient resource utilization while maintaining significant speedup over the equivalent simulation approach. Since our proposed FPGA emulation framework applies the state vector approach, simulation models based on the library are selected in this work for benchmarking purposes.
3. Theoretical Background
In general, quantum algorithms obey the basic process flow structure. The computation process begins with a system set in a specific quantum state, which is then converted into superposition of multiple basis states. Unitary transformations are performed on the quantum state according to the required operations of the algorithm. Finally, measurement is carried out, resulting in the qubits collapsing into classical bits.
3.1. Quantum Bit (Qubit)
In classical computing, the smallest unit of information is the bit. A bit can be in either state or state , and the state of a bit can be represented in matrix form as
On the other hand, in quantum computing, the smallest unit of information is the quantum bit or a qubit. To distinguish the classical bit with the quantum qubit, Dirac ket notation is used. Using the ket notation, the quantum computational basis state is represented by and . A qubit can be in state , or in state , or in superposition of both basis states. The state of a qubit can be represented aswhere both and are complex numbers and . is the probability where the qubit is in state and is the probability where the qubit is in state upon measurement. An qubit quantum state vector contains complex numbers which represents the measurement probability of each basis state. However, on measurement, the superposition is destroyed and the qubits return to the classical state of bits depending on the probability derived from the complexvalued state vector.
3.2. Tensor/Kronecker Product
Tensor product or Kronecker product is the basic operation that is applied in the formation of a larger quantum system as well as multiqubit quantum transformations. A quantum state vector that can be written as the tensor of two vectors is separable, whereas a state vector that cannot be expressed as the tensor of two vectors is entangled [15]. The tensor operation on any arbitrary two 1qubit transformations is shown below:
3.3. Quantum Circuit Model
A quantum algorithm is a description of a sequence of quantum operations (or transformations) applied upon qubits to generate new quantum states. The model most widely used in describing the evolution of a quantum system is the quantum circuit model, first proposed in [32]. A quantum circuit is the interconnection of quantum gates with quantum wires, and gate operations are represented by unitary matrices.
All unitary matrices are invertible and the products of unitary matrices as well as the inverse of unitary matrix are unitary. An by matrix is unitary if , where is the adjoint (conjugate transpose) of . Since all quantum transformations are reversible, quantum gate operations can always be undone. Fundamental quantum gates include the Hadamard gate, phaseshift gate, and swap gate, and these gates are described as follows.
Hadamard gate is one of the most useful single qubit quantum gates. It operates by placing the computational basis state into superposition of basis states with equal probability. The Hadamard transform can be represented by the following unitary matrix:
The following example illustrates the application of Hadamard gates in mapping a 2qubit basis state to a superposition of basis states with equal probability:
Controlled phaseshift gate operates on 2 qubits, one of which is the control qubit and the other is the target qubit. If the control qubit is true, a phaseshift operation is performed on the target qubit; otherwise, there is no operation. The operation is represented by the following matrix:
Quantum gate is used for swapping two qubits. It switches the amplitudes of a quantum state vector. The operation of a 2qubit gate is represented by matrix in
3.4. Quantum Fourier Transform (QFT)
The Fourier transform is deployed in wide range of engineering and physics applications such as signal processing, image processing, and quantum mechanics. It is a reversible transformation that converts signals from time/spatial domain to frequency domain and vice versa. The Fourier transform is defined in (8) for continuous signals and in (9) for discrete signals:
The quantum Fourier transform (QFT) is a transformation on qubits and is the quantum equivalent of the discrete Fourier transform. It should be noted that a quantum computer performs QFT with exponentially less number of operations than the classical Fourier transform. However, QFT does not reduce the execution time of the algorithm when classical data is used. This is due to the characteristic of the quantum computer that does not allow parallel readout of all quantum state amplitudes. In addition, there is no known method that can effectively instantiate the desired input state amplitudes to be Fouriertransformed [33].
In order to harness the power of quantum computing on Fourier transform, QFT has to be deployed within other practical applications. QFT is pivotal in quantum computing since it is part of many quantum algorithms. These algorithms include integer factorization and discrete logarithms algorithms [1], Simon’s periodicity algorithm [9], and Hallgren’s algorithms [10]. They offer significant speedup over their classical counterparts. QFT has also found applications in many realworld problems such as image watermarking [34] and template matching [35].
To compute Fourier transform in quantum domain, discrete signal samples are encoded as the amplitude sequences of a quantum state vector which is in superposition of basis states [31]. An qubit QFT operation which transforms an arbitrary superposition of computational basis states is expressed in
As the requirement for a valid quantum state, must be normalized such that it fulfils (11). If the original signal inputs do not comply with this requirement, the amplitudes of the signal samples have to be divided by the normalization factor, . In most cases, the input states formed by the normalized signal samples are entangled:
From (10), it can be observed that the term in QFT equation is a rational number in the range of . As qubit representation is typically used in computations, the in base10 integer is redefined in base2 notation as individual bit such that the binary fraction form as expressed in (12) can be conveniently adopted:
With some algebraic manipulations, the QFT equation can be derived from (13) to form (14) [33]:
Since the term produces either if or otherwise, Hadamard computation on the first qubit results in . Computations of the consecutive bits in the binary fraction are obtained using controlled phaseshift gates according to (14). QFT circuit consists of three types of elementary gates which are Hadamard gate, , controlled phaseshift gate, , and gate. The circuit model of an qubit QFT is depicted in Figure 1.
The size of a QFT circuit grows exponentially as the number of input qubits increases. An qubit QFT involves unitary transformations and could process up to input samples in one evaluation (provided the input samples are encoded as the amplitude sequences of a superposition of computational basis states).
3.5. Grover’s Search Algorithm
In computer science area, a typical search problem is to identify the desired element from an unordered array. For many computing applications, it is critical that the search technique is efficient. In terms of a function, the search problem is given with assurance that there exists one binary string where if ; else, .
In classical computing, queries on average are required to search for a particular element in an unordered array with elements. In quantum computing, Grover’s search algorithm can complete the job in queries (for the rest of the text and figures in Section 3.5, the required Grover iterations are abbreviated as times). Although the speedup achieved is only quadratic, Grover’s algorithm and its extensions are extremely useful in enhancing current methods in solving database searching and optimization problems, which include 3satisfiability [36], global optimization [37], minimum point searching [38], and pattern matching [39]. The core operations of Grover’s algorithm are phase inversion and inversion about mean. Phase inversion inverts the phase of the stateofinterest, and its quantum circuit model is given in Figure 2.
In Figure 2, the top qubit, , is the target qubit, and the bottom qubit is called the ancilla qubit. The function of is to pick out the desired binary string. To apply phase inversion on target qubits, Hadamard gate operation is performed on the ancilla qubit, which is initialized as . This is to complement the effect of which takes to .
In terms of matrices, the phase inversion operation can be expressed as , and the corresponding quantum states are described as follows:
Inversion about mean boosts the phase separation between the elementofinterest and other elements in the unordered arrays (after phase inversion operation is applied to invert the phase of the target element). The mean of all elements is computed and inversions are made about the mean. The overall mean remains unchanged after the inversion process. This is because the distance between one element and the mean is the same before and after inversion. The only change is if the original sequence is above the mean, during the inversion it is flipped about the mean to the same distance below the mean and vice versa. In general, the inversion about mean operation can be expressed as where is the mean, is the value of an element in the array, and is the new value of that element after inversion.
In terms of matrices, the mean of a elements vector is obtained by the product of matrices and where all the elements in the by matrix are set to . Hence, inversion about mean in matrix form becomes
In order to achieve high confidence of getting the desired element, the amplitude amplification process (amplitude amplification in Grover’s search algorithm involves phase inversion and inversion about mean operations) has to be repeated for times. This is because the probability of success changes sinusoidally by the number of amplitude amplification iterations (as illustrated in Figure 3) and the highest probability of success first happened after the required iterations. Pseudocode for a generic Grover’s search algorithm is given in Algorithm 1.

There are two approaches to model Grover’s search quantum algorithm. The first approach, which is based on quantum circuit model, is discussed next. The second method is modelling using arithmetic functions, and this is presented in Section 4.2 since this approach is applied in this paper.
As shown in Figure 4, Grover’s search circuit given in [15] is constructed with assumption that black box modules and are available. Descriptions of the and modules have been given earlier.
On the other hand, the circuit model for qubit Grover’s algorithm presented in [33] is shown in Figure 5(a). In this figure, is the Hadamard gate, and is the Grover iteration circuit, which is illustrated in Figure 5(b).
(a) qubit Grover’s search
(b) Grover iteration circuit,
The function of Grover iteration circuit is equivalent to the phase inversion and inversion about mean. In order to achieve high probability of successful search, is concatenated for times. In Figure 5(b), the role of oracle module is to recognize the solution to a particular search problem in the phase inversion operation. By monitoring the oracle qubit, a solution to the search problem can be detected through the changes of the oracle qubit. The design of oracle module varies with different search applications, and an example of the oracle circuit for a simple 3bit search task is shown in Figure 6. In Figure 6, the symbol represents Pauli matrix. The open circle notation indicates conditioning on the qubit being set to zero, whereas the closed circle indicates conditioning on the qubit being set to one.
Deriving from the circuits in Figure 5, the corresponding 3qubit Grover’s search circuit is provided in Figure 7. This circuit model is made up of Hadamard, oracle, quantum NOT, and multiqubit controlledNOT gates.
4. Proposed FPGABased Hardware Emulation
This section presents our approach in modelling quantum Fourier transform and Grover’s search algorithms for FPGA emulation. The proposed techniques can be generalized to FPGA emulation of more complex quantum algorithms that apply QFT or Grover’s algorithm. This paper extends our earlier work presented in [40, 41]. In these previous works, the hardware architecture proposed was restricted to the serial design with resource sharing facilitated at the register level. In this paper, we enable resource sharing at the operator (or computational) level that allows for more efficient emulation of the quantum algorithms. Furthermore, additional case study on Grover’s search algorithm is included for generalization of the proposed framework. The choice of hardware architecture varies based on the need of different applications. Based on the selected case studies, the efficiencies of different hardware architectures for quantum computing emulation purposes are discussed and analysed in this work. The goal is to achieve scalability and also efficient resource utilization for emulating practical larger qubit size quantum systems.
4.1. Modelling QFT for FPGA Emulation
The derivation of quantum circuit model for qubit QFT was discussed earlier is Section 3.4. Here, we present the modelling of QFT for FPGA emulation based on a 3qubit example. According to (14), the mathematical expression for 3qubit QFT is derived as shown in
Deriving from the general qubit QFT circuit provided in Figure 1, the corresponding quantum circuit for 3qubit QFT is obtained as shown in Figure 8.
The circuit consists of Hadamard gates, , controlled phaseshift gates, and , and also the gate. Referring to the functional block diagram given in Figure 9, this quantum circuit model corresponds to a sequence of unitary transformations, , to , defined by
Note that, in Figure 9, the modelling of qubit quantum system with superposition and entanglement properties resulted in a circuit with signals. Since the input samples to QFT circuit are encoded as sequence of amplitudes in an entangled superposition of basis states (discussed in Section 3.4), modelling based on individual qubit with separate quantum gate operations is unable to reflect the effects of applying a quantum gate on entangled qubits correctly.
In order to model the effect of superposition and entanglement, derivation of each unitary transformation is made through the tensor product of individual quantum gate and identity matrix to form unitary matrix of equal dimension with the quantum state vector. Detailed derivations of the quantum unitary matrices for the 3qubit QFT have been presented in our previous paper [41].
Since these quantum unitary matrices are mostly sparse matrices, we extract minimal number of useful arithmetic operations (due to nonzero elements in the matrices), resulting in an optimal realization of the model that can be mapped to an efficient FPGA emulation architecture. Incidentally, a software program has been developed to automate this mapping, hence easily scaling up the circuit model to larger qubit sizes. From these arithmetic functions, the corresponding dataflow graph for the 3qubit QFT is derived as shown in Figure 10.
In Figure 10, each and signal is a fixed point complex number register for an element of the quantum state vector. The operation of module corresponds to and , where is the multiplication of the input complex number with imaginary number . This is applied in unitary transformations and . The operation of module in Figure 10 is described in (26). The function of is the multiplication of the input complex number with a constant complex number which is derived based on the controlled phaseshift gate. As expressed previously in (21), it is used in unitary transformation :
4.2. Modelling Grover’s Search for FPGA Emulation
As mentioned earlier, the second approach of modelling Grover’s quantum search is modelling using arithmetic functions (based on mathematical model). In this work, we have chosen this approach of modelling Grover’s algorithm for FPGA emulation. In contrast to the quantum circuit model approach that involves complex large dimensional matrix operations (i.e., matrix multiplication and tensor product), the chosen method can utilize the computational resources available on FPGA such as comparators, adders/subtracters, and multiplexers for efficient emulation of Grover’s search algorithm.
This technique is based on phase inversion and inversion about mean as described in Section 3.5. As shown in Figure 11, mathematically, Grover’s search for a database with elements mainly involves the processes of inverting the phase of target element, function , and performing inversion about mean on all elements, function , for times. Initialization of the quantum state vector with equal probability, function INIT, is carried out once in the beginning of the process.
For the case study of a 3bit search problem, we derive the dataflow graphs and obtain the result of the required arithmetic functions as shown in Figure 12. For experimental purposes, the oracle module is developed by comparing all elements in database with targeted element using comparators, modules. The arithmetic functions of the inversion about mean are derived through straightforward computations that involve summation, bit shift, and subtraction.
4.3. Architecture of Proposed FPGA Emulation Model
It is clear that if implemented on classical computing platforms, the resource utilization for a quantum system would grow exponentially. Hence, the choice of suitable architecture is critical for FPGAbased quantum computing emulation. Here, we discuss the efficiencies of different architectural choices in datapath: concurrent, pipeline, serial, and serialparallel. The block diagram of various architectures is constructed based on the example of 3qubit QFT.
4.3.1. Concurrent Processing
In concurrent processing of an algorithm, all computations are completed within a clock cycle. In the case of 3qubit QFT, computation blocks between the input and output registers, through the functional blocks, to are performed in one clock cycle (refer to Figure 13(a)). However, such an architecture consumes enormous resources, such that the number of registers required to emulate an nqubit QFT is . In addition, the critical path delay is very high which results in unrealistic low operating frequency.
(a) Concurrent processing
(b) Pipeline architecture
(c) Serial architecture
4.3.2. Pipeline Architecture
Most of the prior works on FPGAbased quantum circuit emulation [25–27] are developed based on pipeline architecture. The pipeline architecture has the advantages of high throughput and much shorter critical path delay. Figure 13(b) shows the proposed pipeline architecture of the 3qubit QFT. However, the main issue of this approach is that resource utilization grows drastically by the number of qubits, due to the circuit augmentation of pipeline registers. pipeline registers are required to emulate nqubit QFT. Consequently, hardware emulation scalability is highly constrained by the available resources in FPGA.
4.3.3. Serial Processing
Although serial design requires multiple iterations to perform a complex computation, it opens up the opportunity for resource sharing. Serial processing is suitable for applications where resource utilization is a critical design constraint. Figure 13(c) depicts the serial form of the 3qubit QFT circuit that consists of a control unit and a datapath unit. As resources can be reused between transformations, a serialbased nqubit QFT consumes registers, a register utilization that is much lower than in concurrent or pipeline architectures. However, pure serial approach forfeits the purpose of conducting FPGA emulation whose aim is to exploit the parallelism inherent in a quantum system, as it would still suffer from slow sequential behaviour as exhibited in simulation on classical computer.
4.3.4. Proposed SerialParallel Architecture
In this paper, we propose a hybrid serialparallel architecture for FPGA emulation of quantum algorithms. The proposed approach takes advantage of both serial and parallel design techniques. Applying the concepts of quantum parallelism and quantum dynamics modelled by sequential transformations on a quantum state vector, it is found that the proposed serialparallel architecture is suitable for efficient and accurate quantum computing emulation on FPGA platform.
Figure 14 shows the functional block diagram of the proposed serialparallel FPGA emulation architecture of the 3qubit QFT. The serialparallel design of the datapath unit involves a number of quantum computation units that can perform parallel computations for each stage of unitary transformation whereby the same computational resources can be reused for the following stage of transformations.
For data storage and synchronization purposes, registers are shared between unitary transformations. As compared to the pipeline design, our proposed serialparallel approach achieves linear reduction on the usage of registers to emulate the same quantum system. The arithmetic logic unit (ALU) in the datapath unit contains multiple custom processing elements and the allocation of resources in ALU varies based on the target application. The number of processing elements is basically determined based on the desired parallelism in the nqubit quantum system.
As illustrated in Figures 10 and 12, the dataflow graph of QFT and Grover’s search algorithm exhibit similar repetitive pattern between unitary transformations. This implies that the proposed serialparallel approach can be generalized for the two case studies to achieve balance in both resource utilization and speed performance. For the case of Grover’s search, the ALU would be the Grover iteration module shown in Figure 12, whereas the control unit is designed to keep track on the number of Grover iterations required by the target search problem.
5. Experimental Results
The proposed emulation designs are modelled in SystemVerilog HDL, synthesized using Altera Quartus II software, and implemented into target emulation platform which is based on Altera Stratix IV EP4SGX530KF43C4 FPGA. In Section 5.1, we discuss the verification of the hardware emulation designs for the QFT and Grover’s search case studies. In addition, the automated process for scaling up the design to larger qubit size is described. In Section 5.2, investigation is conducted to study the effects of the number of mantissa bits used in our fixed point representation format on resource utilization and precision error. In the section that follows, we analyse, for different emulation architectures, how the increase in qubit size impacts on resource growth and maximum operating frequency allowed in the designs. Finally, the runtime speeds in simulation and emulation for quantum algorithms with different qubit sizes are compared.
5.1. Design Verification
To show that the quantum algorithms have been modelled accurately, design verification of the proposed emulation hardware is performed. Golden references based on the software simulation models (in C) are developed and their outputs are compared with the emulation hardware under test (which are described in SystemVerilog HDL).
In our QFT case study, FFTW3 [42], a widely applied fast Fourier transform library in C, is used to perform Fourier transform computations on the signal samples that are used in this work. The outputs of the classical Fourier transform then serve as the golden reference model in verification of the proposed emulation model. Furthermore, since the discrete Fourier transform (DFT) is a linear transformation that can be defined in unitary matrix form, the functional correctness of our QFT hardware emulation model can be conveniently verified against the DFT matrix. The expression of an qubit DFT matrix is shown in where is the th root of unity; that is, . The choice of or is purely a matter of convention as both the term and the term to the power of are equal to .
On the other hand, the design of Grover’s search FPGA emulation model is verified against the mathematical model provided in literature. The simulation model of Grover’s search algorithm also serves as the golden reference model for verification purposes.
For the development of FPGA emulation models for practical quantum computing applications, it is important that the emulation hardware can be scaled up to larger qubit size architectures. In this work, the designs are scaled up with the aid of software program developed inhouse. HDL codes of the two case studies are autogenerated by the software program based on the proposed modelling techniques (as discussed in Section 4). The generated HDL code produces efficient hardware emulation model based on the proposed serialparallel architecture.
5.2. Fixed Point Representation
As defined in (2), a quantum state vector is represented by complex floating point numbers. To ensure effective resource utilization in our FPGA emulation hardware, floating point numbers are replaced by fixed point representations. In this work, a fixed point format with 1 sign bit, 1 integer bit, and mantissa bits (as shown in Figure 15) is used. Since the amplitudes of a quantum state, that is, the probabilities of collapsing into computational basis states after measurement, are constrained in the range of 0 to 1, only one bit is required to represent the integer part.
Due to the limitations of the classical digital computing platform, representation of qubit amplitudes with infinite precision is infeasible. In the context of quantum computer modelling, particularly for quantum systems with large qubit sizes, minimising precision loss is critical to preserve the consistency of the quantum state during the modelling process [43]. Here, we investigate how precision error is affected by the number of mantissa bits used in our fixed point representation format. The corresponding experimental results are given in Figure 16.
Precision error shown in Figure 16 is computed based on the following equations:where and are the floating point real and imaginary amplitudes of the reference output state generated from simulation, whereas and are the amplitudes extracted from the output state of proposed FPGA emulator (converted from original fixed point format to floating point for verification purposes).
Figure 16 shows that 16bit fixed point format (14 mantissa bits) incurs significant precision error for 5qubit emulations in both case studies. However, the error produced by 2qubit emulations is insignificant. This behaviour is because the amplification of the fixed point truncation errors grows with the size of quantum system. For FPGA emulation purposes, precision error due to fixed point representation can be reduced by increasing the number of mantissa bits with tradeoff on resource utilization. By expanding the number of mantissa bits up to 24bit (which results in 26, total number of bits), negligible precision error for 5qubit emulations is attained. It is important to note that the proposed FPGA emulator is parameterizable in terms of the number of mantissa bits for fixed point representation. This is crucial to ensure different fixed point formats can be applied to emulate quantum circuit of various sizes based on the demanded precision error tolerance and resource constraint.
The size of our fixed point number format also affects resource utilization. The corresponding experimental results for QFT case study are shown in Figure 17. Since the resources available on FPGA device are mostly in blocks or multiples of 8 bits, the choice of 26bit fixed point format is not suitable. Hence, we apply 22mantissabit size (i.e., total number of bits is 24) in our fixed point representation formats. Note that, for Grover’s search emulations, the experiment on resource utilization of DSP blocks is not relevant because FPGA emulation model developed here (as depicted in Figure 12) does not involve multiplication.
(a) Logic element usage
(b) Register usage
(c) DSP block usage
5.3. Efficiency of Proposed FPGA Emulation Architecture
In this subsection, we investigate how the increase in qubit size impacts resource growth and maximum allowable operating frequency in different emulation architectures (i.e., concurrent, pipeline, and serialparallel). In the case of QFT emulation model, we have two versions of serialparallel architectures. Type 1 serialparallel uses DSP blocks to perform multiplications, whereas type 2 replaces the multiplications (required in Hadamard gate operations, , , and , in the 3qubit QFT case) with shiftadd operations. Although conventional hardware design methods encourage the usage of shiftadd operation instead of multiplication to reduce resource utilization, the case is now different with FPGA devices containing efficient builtin DSP blocks. Figures 18 and 19 show the results of the experiments conducted on QFT and Grover’s search algorithms, respectively.
(a) Logic element usage
(b) Register usage
(c) DSP block usage
(d) Maximum operating frequency
(a) Logic element usage
(b) Register usage
(c) Maximum operating frequency
Based on the experimental results shown in Figure 18, when comparing to type 1, type 2 method consumes less DSP blocks but more logic elements due to the construction of adders needed in shiftadd operations. As the number of qubits increases, resource utilization of logic elements for type 1 emulation grows rapidly when available DSP blocks are used up and it ends up with similar resource utilization as the type 2 approach. Hence, we can conclude that, for largescale FPGA emulations, both methods would lead to similar resource utilization. Thus, the first approach is preferred due to the ease of design process where DSP blocks are utilized by default when implementing the design with the FPGA design automation tool.
In contrast to the concurrent and pipeline designs, the experimental results for QFT and Grover’s search show that the proposed serialparallel architecture achieves balance on both resource utilization and operating frequency. The proposed architecture has significantly reduced resource growth in the application of logic elements, dedicated logic registers, and DSP block, yet maintaining reasonable operating frequency. With the concurrent and pipeline architectures, 5qubit QFT emulation completely used up the resources available in the Altera Stratix IV FPGA device used in this work. However, the same device can support up to 7qubit QFT emulation with the proposed serialparallel architecture. It is important to note that the processing power of 7qubit QFT is far higher than the 5qubit implementation as an qubit QFT can process up to samples in one evaluation.
For scalability, software simulation would depend on resources available on the computer servers. The scalability of the FPGA emulation framework would depend on the resources available on target FPGA devices. As there have been rapid advances in FPGA technology in recent years, by designing an efficient architecture that is implemented in a highdensity FPGA device (such as the Altera Stratix 10 that contains up to 5.5 million logic elements), one can actually emulate large qubit size quantum circuit on a single FPGA chip. Furthermore, new approach to FPGA emulation may be made through the exploration of efficient data structures and modelling methods. Thus, the proposed work contributes to the formulation of a proofofconcept baseline FPGA emulation framework with optimization on datapath designs that can be extended to emulate practical largescale quantum circuits.
5.4. Benchmarking between Simulation and Emulation
Here, our emulation models are benchmarked against the equivalent software simulations. The simulation models used are based on an open source quantum library, [28]. Software simulation is performed on an Intel Core i74790 eightcore processor with 3.6 GHz clock rate running on a Linuxbased Ubuntu 14.04 kernel, whereas hardware emulation is based on the Altera Stratix IV EP4SGX530KF43C4 FPGA. Table 1 shows the runtime comparison between simulation and our emulation.

Figure 20 illustrates the runtime speedup (simulation/emulation) of Grover’s search case study. It is important to note that the proposed hardware emulation is implemented based on 24bit fixed point format (which is determined based on the experimental work presented in Section 5.2) whereas 32bit single precision float is used in the software simulation to describe the quantum state. For a larger qubit size quantum circuit, the number of mantissa bits for fixed point representation has to be increased accordingly to ensure similar precision as in software simulation can be achieved with tradeoffs on resource utilization and execution speed [44].
From Figure 20, it can be observed that the proposed hardware emulation provides significant speedup over software simulation using . It is important to note that the achieved speedup increases drastically as the number of qubits increases. This result further supports the notion that hardware emulation has significant potential in the modelling of a largescale quantum system on the classical computing platform based on FPGA.
As the number of required I/O pins for emulating QFT and Grover’s search algorithms with parallel readouts is too much to fit in the existing FPGA devices, boardlevel verification is infeasible. Although the usage of multiplexers reduces the number of output pins, resource consumption rises significantly with the increase in the number of qubits, and this affects the analysis of the overall experiment. Thus, the estimated runtime of the proposed FPGA emulation architecture is obtained based on the hardware clock cycle and the operating frequency that is acquired from the FPGA development tool. This is not a critical issue in future practical deployment as the selected case studies are meant to work as core modules within other quantum applications that might not require parallel readout.
6. Conclusion and Future Work
Efficient resource utilization is critical for FPGAbased implementations especially for emulating quantum computing applications as they typically exhibit exponential resource requirement with increasing number of qubits. In this paper, we proposed a baseline FPGA emulation framework with focus on the datapath design optimization based on the conventional state vector model as well as an effective methodology that facilitates accurate modelling of quantum algorithms for FPGA emulation. A serialparallel architecture with efficient resource utilization for FPGAbased emulation of quantum computing is presented. The proposed emulation architecture achieves linear reduction in resource utilization compared to pipeline implementations as found in previous works. This work has also demonstrated the advantage of FPGA emulation over software simulation where hardware emulation of 7qubit Grover’s search is about times faster than the software simulation performed on Intel Core i74790 eightcore processor running at 3.6 GHz clock rate.
However, experimental results obtained in this work show that it is difficult to realize a scalable and flexible emulation platform for large qubit size realworld quantum system using the approach that applies existing state vector quantum models. This concurs with [45] that states that the practical limit on the size of the quantum system that can be modelled on classical computing platform can hardly be overcome due to exponentially large memory requirements for storing the entire state vector. Hence, this suggests that a model with a more effective data structure to represent quantum systems is required. Recently, the work on stabilizer frames [30] has shown promise in providing a more suitable data structure for quantum models targeted for FPGA emulation. This is the subject of future work in our research in applying FPGA emulation in modelling of largescale quantum systems. In addition, the errorcorrection structure available with stabilizer frames will be considered for application in practical quantum computations. With a more efficient modelling technique, FPGA can represent a more efficient emulation strategy of quantum systems.
Competing Interests
The authors declare that they have no competing interests.
Acknowledgments
This work is supported by the Ministry of Higher Education (MOHE) and Universiti Teknologi Malaysia (UTM) under Fundamental Research Grant Scheme (FRGS) Vote number 4F422.
References
 P. W. Shor, “Algorithms for quantum computation: discrete logarithms and factoring,” in Proceedings of the 35th IEEE Annual Symposium on Foundations of Computer Science (SFCS '94), pp. 124–134, Santa Fe, NM, USA, November 1994. View at: Google Scholar
 L. K. Grover, “Quantum mechanics helps in searching for a needle in a haystack,” Physical Review Letters, vol. 79, no. 2, pp. 325–328, 1997. View at: Publisher Site  Google Scholar
 A. Malossini and T. Calarco, “Quantum genetic optimization,” IEEE Transactions on Evolutionary Computation, vol. 12, no. 2, pp. 231–241, 2008. View at: Publisher Site  Google Scholar
 R. L. Rivest, A. Shamir, and L. Adleman, “A method for obtaining digital signatures and publickey cryptosystems,” Communications of the ACM, vol. 21, no. 2, pp. 120–126, 1978. View at: Publisher Site  Google Scholar  MathSciNet
 L. K. Grover, “A fast quantum mechanical algorithm for database search,” in Proceedings of the 28th Annual ACM Symposium on Theory of Computing, pp. 212–219, ACM, 1996. View at: Publisher Site  Google Scholar  MathSciNet
 N. Shenvi, J. Kempe, and K. B. Whaley, “Quantum randomwalk search algorithm,” Physical Review A: Atomic, Molecular, and Optical Physics, vol. 67, no. 5, Article ID 052307, 2003. View at: Google Scholar
 E. Farhi, J. Goldstone, and S. Gutmann, “A quantum algorithm for the Hamiltonian NAND tree,” Theory of Computing, vol. 4, pp. 169–190, 2008. View at: Publisher Site  Google Scholar  MathSciNet
 P. W. Shor, “Why haven't more quantum algorithms been found?” Journal of the ACM, vol. 50, no. 1, pp. 87–90, 2003. View at: Publisher Site  Google Scholar  MathSciNet
 D. R. Simon, “On the power of quantum computation,” SIAM Journal on Computing, vol. 26, no. 5, pp. 1474–1483, 1997. View at: Publisher Site  Google Scholar  Zentralblatt MATH  MathSciNet
 S. Hallgren, “Polynomialtime quantum algorithms for Pell's equation and the principal ideal problem,” Journal of the ACM, vol. 54, article 4, 2007. View at: Publisher Site  Google Scholar  MathSciNet
 M. Mosca and A. Ekert, “The hidden subgroup problem and eigenvalue estimation on a quantum computer,” in Quantum Computing and Quantum Communications, pp. 174–188, Springer, Berlin, Germany, 1999. View at: Publisher Site  Google Scholar  MathSciNet
 D. Bacon, A. M. Childs, and W. Van Dam, “From optimal measurement to efficient quantum algorithms for the hidden subgroup problem over semidirect product groups,” in Proceedings of the 46th Annual IEEE Symposium on Foundations of Computer Science (FOCS '05), pp. 469–478, IEEE, October 2005. View at: Publisher Site  Google Scholar
 L. K. Grover and A. M. Sengupta, “From coupled pendulums to quantum search,” in Mathematics of Quantum Computation, pp. 119–134, Chapman and Hall/CRC, 2002. View at: Google Scholar
 R. P. Feynman, “Simulating physics with computers,” International Journal of Theoretical Physics, vol. 21, no. 67, pp. 467–488, 1982. View at: Publisher Site  Google Scholar  MathSciNet
 N. S. Yanofsky and M. A. Mannucci, Quantum Computing for Computer Scientists, vol. 20, Cambridge University Press, Cambridge, UK, 2008. View at: Publisher Site  MathSciNet
 C. Monroe, D. M. Meekhof, B. E. King, W. M. Itano, and D. J. Wineland, “Demonstration of a fundamental quantum logic gate,” Physical Review Letters, vol. 75, no. 25, pp. 4714–4717, 1995. View at: Publisher Site  Google Scholar  Zentralblatt MATH  MathSciNet
 N. A. Gershenfeld and I. L. Chuang, “Bulk spinresonance quantum computation,” Science, vol. 275, no. 5298, pp. 350–356, 1997. View at: Publisher Site  Google Scholar  MathSciNet
 J. E. Mooij, T. P. Orlando, L. Levitov, L. Tian, C. H. Van der Wal, and S. Lloyd, “Josephson persistentcurrent qubit,” Science, vol. 285, no. 5430, pp. 1036–1039, 1999. View at: Publisher Site  Google Scholar
 I. L. Chuang, N. Gershenfeld, and M. Kubinec, “Experimental implementation of fast quantum searching,” Physical Review Letters, vol. 80, no. 15, article 3408, 1998. View at: Publisher Site  Google Scholar
 R. Barends, J. Kelly, A. Megrant et al., “Superconducting quantum circuits at the surface code threshold for fault tolerance,” Nature, vol. 508, no. 7497, pp. 500–503, 2014. View at: Publisher Site  Google Scholar
 M. H. Amin, N. G. Dickson, and P. Smith, “Adiabatic quantum optimization with qudits,” Quantum Information Processing, vol. 12, no. 4, pp. 1819–1829, 2013. View at: Publisher Site  Google Scholar  Zentralblatt MATH  MathSciNet
 A. D. King, E. Hoskinson, T. Lanting, E. Andriyash, and M. H. Amin, “Degeneracy, degree, and heavy tails in quantum annealing,” http://arxiv.org/abs/1512.07325. View at: Google Scholar
 T. F. Rønnow, Z. Wang, J. Job et al., “Defining and detecting quantum speedup,” Science, vol. 345, no. 6195, pp. 420–424, 2014. View at: Publisher Site  Google Scholar
 Y. Goto and M. Fujishima, “Efficient quantum computing emulation system with unitary macrooperations,” Japanese Journal of Applied Physics, vol. 46, no. 4, pp. 2278–2282, 2007. View at: Publisher Site  Google Scholar
 A. U. Khalid, Z. Zilic, and K. Radecka, “FPGA emulation of quantum circuits,” in Proceedings of the IEEE International Conference on Computer Design: VLSI in Computers and Processors (ICCD '04), pp. 310–315, IEEE, October 2004. View at: Google Scholar
 J. F. RiveraMiranda, Á. J. CaicedoBeltrán, J. D. ValenciaPayán, J. M. EspinosaDuran, and J. VelascoMedina, “Hardware emulation of quantum Fourier transform,” in Proceedings of the IEEE 2nd Latin American Symposium on Circuits and Systems (LASCAS '11), pp. 1–4, IEEE, Bogata, Colombia, February 2011. View at: Publisher Site  Google Scholar
 M. Aminian, M. Saeedi, M. S. Zamani, and M. Sedighi, “FPGAbased circuit model emulation of quantum algorithms,” in Proceedings of the IEEE Computer Society Annual Symposium on VLSI (ISVLSI '08), pp. 399–404, IEEE, Montpellier, France, April 2008. View at: Google Scholar
 H. Weimer, M. Müller, I. Lesanovsky, P. Zoller, and H. P. Büchler, “A Rydberg quantum simulator,” Nature Physics, vol. 6, no. 5, pp. 382–388, 2010. View at: Publisher Site  Google Scholar
 G. F. Viamontes, Efficient quantum circuit simulation [Ph.D. thesis], The University of Michigan, Ann Arbor, Mich, USA, 2007.
 H. J. García and I. L. Markov, “Simulation of quantum circuits via stabilizer frames,” IEEE Transactions on Computers, vol. 64, no. 8, pp. 2323–2336, 2015. View at: Publisher Site  Google Scholar  MathSciNet
 C. P. Williams and S. H. Clearwater, Explorations in Quantum Computing, Springer, Berlin, Germany, 1998. View at: MathSciNet
 A. Barenco, D. Deutsch, A. Ekert, and R. Jozsa, “Conditional quantum dynamics and logic gates,” Physical Review Letters, vol. 74, no. 20, pp. 4083–4086, 1995. View at: Publisher Site  Google Scholar
 M. A. Nielsen and I. L. Chuang, Quantum Computation and Quantum Information, Cambridge University Press, 2000. View at: MathSciNet
 W.W. Zhang, F. Gao, B. Liu, Q.Y. Wen, and H. Chen, “A watermark strategy for quantum images based on quantum Fourier transform,” Quantum Information Processing, vol. 12, no. 2, pp. 793–803, 2013. View at: Publisher Site  Google Scholar  Zentralblatt MATH  MathSciNet
 D. Curtis and D. A. Meyer, “Towards quantum template matching,” in Proceedings of the SPIE 48th Annual Meeting in Quantum Communications and Quantum Imaging, pp. 134–141, International Society for Optics and Photonics, August 2003. View at: Google Scholar
 R. Schützhold and G. Schaller, “Adiabatic quantum algorithms as quantum phase transitions: first versus second order,” Physical Review A, vol. 74, no. 6, Article ID 060304, 2006. View at: Publisher Site  Google Scholar
 W. P. Baritompa, D. W. Bulger, and G. R. Wood, “Grover's quantum algorithm applied to global optimization,” SIAM Journal on Optimization, vol. 15, no. 4, pp. 1170–1184, 2005. View at: Publisher Site  Google Scholar  MathSciNet
 C. Durr and P. Hoyer, “A quantum algorithm for finding the minimum,” http://arxiv.org/abs/quantph/9607014. View at: Google Scholar
 A. Montanaro, “Quantum pattern matching fast on average,” Algorithmica, 2015. View at: Publisher Site  Google Scholar
 Y. H. Lee, M. KhalilHani, and M. N. Marsono, “FPGAbased quantum circuit emulation: a case study on Quantum Fourier transform,” in Proceedings of the 14th International Symposium on Integrated Circuits (ISIC '14), pp. 512–515, IEEE, Singapore, December 2014. View at: Publisher Site  Google Scholar
 M. KhalilHani, Y. H. Lee, and M. N. Marsono, “An accurate FPGAbased hardware emulation on quantum fourier transform,” in Proceedings of the Australasian Symposium on Parallel and Distributed Computing (AusPDC '15), vol. 1, Sydney, Australia, 2015, a1b3. View at: Google Scholar
 M. Frigo and S. G. Johnson, “The design and implementation of FFTW3,” Proceedings of the IEEE, vol. 93, no. 2, pp. 216–231, 2005. View at: Publisher Site  Google Scholar
 D. Wecker and K. M. Svore, “LIQUi∣>: a software design architecture and domainspecific language for quantum computing,” http://arxiv.org/abs/1402.4467. View at: Google Scholar
 S. Kilts, Advanced FPGA Design: Architecture, Implementation, and Optimization, John Wiley & Sons, New York, NY, USA, 2007.
 M. Smelyanskiy, N. P. D. Sawaya, and A. AspuruGuzik, “qHiPSTER: the quantum high performance software testing environment,” http://arxiv.org/abs/1601.07195. View at: Google Scholar
Copyright
Copyright © 2016 Y. H. Lee et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.