Abstract

Hardware emulation of quantum systems can mimic more efficiently the parallel behaviour of quantum computations, thus allowing higher processing speed-up than software simulations. In this paper, an efficient hardware emulation method that employs a serial-parallel hardware architecture targeted for field programmable gate array (FPGA) is proposed. Quantum Fourier transform and Grover’s search are chosen as case studies in this work since they are the core of many useful quantum algorithms. Experimental work shows that, with the proposed emulation architecture, a linear reduction in resource utilization is attained against the pipeline implementations proposed in prior works. The proposed work contributes to the formulation of a proof-of-concept baseline FPGA emulation framework with optimization on datapath designs that can be extended to emulate practical large-scale quantum circuits.

1. Introduction

Quantum computing is based on the properties of quantum mechanics, namely, superposition and entanglement. Superposition allows a quantum state to be in more than one basis state simultaneously, whereas entanglement is the strong correlation between multiqubit (quantum bit) basis states in a quantum system. Superposition and entanglement facilitate massive parallelism which enables exponential speed-ups to be achieved in the well-known integer factoring and discrete logarithms algorithms [1] and quadratic speed-ups in solving classically intractable brute-force searching and optimization problems [2, 3].

Similar to classical computing, quantum algorithms were developed long before any large-scale practical quantum computer became physically available. In 1994, Shor proposed the integer factoring and discrete logarithms algorithms [1] that brought the world's attention to the enormous potential of quantum computing. A prominent example of their impact concerns the Rivest-Shamir-Adleman (RSA) security scheme [4], which is widely applied in current public key cryptosystems. RSA is based on the assumption that integer factoring of large numbers is intractable in classical computing. Shor's algorithm, which, in contrast, factors integers in polynomial time, would make such a security scheme no longer secure. In [5], Grover proposed a quantum search algorithm that is capable of identifying a specific element in an unordered database of $N$ elements in $O(\sqrt{N})$ attempts. This algorithm achieves a quadratic speed-up over the corresponding classical method, which requires $N/2$ queries on average to retrieve the desired data. Although the solution is only polynomially faster than the classical approach, Grover's quantum algorithm is an important one as it can be generalized and applied to many intractable computer science problems. Recently, quantum equivalents for random walks [6], genetic algorithms [3], and NAND tree evaluation [7] have been developed.

Shor in [8] categorized quantum algorithms known to provide substantial speed-up over the classical approach into three types: (a) algorithms that achieve notable speed-up by applying quantum Fourier transform (QFT) in periodicity finding; examples of this type of algorithm include integer factoring and discrete logarithms algorithms [1], Simon’s periodicity algorithm [9], Hallgren’s algorithms for Pell’s equation [10], and the quantum algorithms for solving hidden subgroup problems [11, 12]; (b) Grover’s search algorithm and its extensions [2, 13] which in general offer square root speed improvements over their classical counterparts; and (c) algorithms for simulating or solving problems in quantum mechanics [14].

Physical realization of a quantum computer is proving to be extremely challenging [15]. With research into viable large-scale quantum computers still ongoing, various technologies, namely, ion trap [16], nuclear magnetic resonance [17], and superconductor [18], were attempted. Nevertheless, only small-scale quantum computation implementations have been achieved [19, 20]. Instead of focusing on the realization of quantum gates, a different approach known as quantum annealing which solves optimization problems by finding the minimum point is used in the 128-qubit D-Wave One, 512-qubit D-Wave Two, and 1000-qubit D-Wave 2X systems [21, 22]. However, based on the research report presented in [23], the expected quantum speed-ups were not found in the D-Wave systems.

In parallel to efforts to develop physical quantum computers, there is also much effort in the theoretical research of quantum algorithms. Until large-scale practical quantum computers become prevalent, quantum algorithms are currently developed using the classical computing platform. However, due to their inherent sequential behaviour, classical computers that are based on Von Neumann architecture cannot simulate the inherent parallelism in quantum systems efficiently. On the other hand, the technology of field programmable gate array (FPGA) offers the potential of massive parallelism through hardware emulation. Consequently, significant improvement in speed performance over the equivalent software simulation can be achieved. However, FPGA is still a form of classical digital computing, and resource utilization on such a classical computing platform grows exponentially as the number of qubits increases. The problem is further compounded with the fact that accurate modelling of quantum circuit in FPGA technology is nonintuitive and therefore difficult, providing the research motivation for this paper.

This paper presents an efficient FPGA emulation framework for quantum computing. In the proposed emulation model, quantum computations are mapped to a serial-parallel architecture that facilitates scalability by managing the exponential growth of resource requirement against number of qubits. Quantum Fourier transform and Grover’s search are chosen as case studies in this work since they are the core of many useful quantum algorithms, and in addition, they have been used as benchmarking models in prior works on FPGA emulation. Experimental results on the efficiencies of different FPGA emulation architectures and fixed point formats are presented, which will sufficiently demonstrate the feasibility of proposed framework.

The rest of this paper is organized as follows: Section 2 discusses prior works on FPGA-based quantum computing emulation, emphasizing issues of hardware architecture and modelling of quantum system on FPGA platform. In Section 3, the theoretical background on quantum computing and related quantum algorithms is provided. Section 4 presents the design of the proposed FPGA emulation models for QFT and Grover’s search algorithms. Experimental results and analysis are given in Section 5. Finally, concluding remarks are made in Section 6.

2. Related Work

Modelling of a quantum system on a classical computing platform is a challenging task. It is even more difficult to map quantum algorithms for emulation on a classical computing environment based on FPGA, which is highly resource-constrained. Many attempts have been made in the last decade at FPGA emulation of quantum algorithms, and these works include [24–27]. However, details of the critical design processes, such as the mapping of the quantum algorithms into the FPGA emulation models and the verification of the implementations, are not revealed in these prior works.

For software-based simulation using classical computers, various types of quantum simulators have been proposed. An open source C library for simulation of quantum computing is presented in [28], where pure quantum computer simulation as well as general quantum simulation is supported by the tool. In 2007, a variant of binary decision diagram named quantum information decision diagram (QuIDD) for compact state vector storage was introduced in [29] for efficient quantum circuit simulation. García and Markov [30] proposed a compact data structure based on stabilizer formalism called stabilizer frames.

Most of the previous FPGA emulation works are based on the quantum circuit model, which is essentially an interconnection of quantum gates. A different approach was taken by Goto and Fujishima [24], where a general purpose quantum processor was developed instead of applying the quantum circuit model. However, Fujishima's quantum processor assumed that the amplitudes of a quantum state can be either all zeros or evenly distributed in probability. In its emulation of Shor's integer factoring algorithm, the implementation details are inadequate for its results to be verified as claimed. For instance, it is stated in [24] that a 64-bit factorization was demonstrated using their emulator with only 40 Kbits of classical memory instead of the 320 qubits required with Shor's algorithm in a quantum computer. This statement was not supported by design and implementation details on how factorization of such a large integer can be done with only 40 Kbits of memory, given that a state vector of 320 qubits holds $2^{320}$ complex amplitudes when represented on the classical platform.

In [25], FPGA emulation of 3-qubit QFT and Grover's search is proposed. In this work, which is based on the quantum circuit model, qubit expansion is performed prior to the application of multiqubit quantum gate transformations. This leads to an inaccurate modelling of a quantum algorithm, since, according to [31], the input quantum state to the QFT circuit should first be placed in a superposition of basis states, where signal samples are encoded as a sequence of amplitudes. In the work by [26], hardware emulation of QFT restricts its input quantum state to the computational basis state, implying that superposition is not included in the modelling. Rivera-Miranda et al. in [26] claim that 16-qubit QFT emulation is achieved. However, the emulator can only process up to 32 input signal samples in one evaluation, which is equivalent to a 5-qubit QFT emulation if the effects of superposition and entanglement are included.

From the above discussion it should be noted that the critical quantum properties of superposition and entanglement were not considered in these previous works, resulting in inaccurate modelling of quantum algorithms. Without the superposition and entanglement effects, the power of quantum parallelism cannot be fully exploited. Previous works reported in [25–27] applied pipeline architecture in their FPGA emulation implementations so as to obtain high throughput and low critical path delay. However, a pipeline design imposes high resource utilization (due to the requirement of additional pipeline registers and associated logic), thus limiting the deployment of FPGA emulation in more practical quantum computing applications that typically require high qubit sizes. In these pipeline implementations proposed in prior works, resource growth was exponential in the qubit size.

In this paper, the issues outlined above are addressed. The efficiencies of different hardware architectural designs for FPGA emulation purposes are evaluated based on the chosen case studies of QFT and Grover's search. We propose an accurate modelling of quantum systems for FPGA emulation, targeting efficient resource utilization while maintaining significant speed-up over the equivalent simulation approach. Since our proposed FPGA emulation framework applies the state vector approach, simulation models based on the C library of [28] are selected in this work for benchmarking purposes.

3. Theoretical Background

In general, quantum algorithms follow a common process flow. The computation begins with a system set in a specific quantum state, which is then converted into a superposition of multiple basis states. Unitary transformations are performed on the quantum state according to the required operations of the algorithm. Finally, measurement is carried out, resulting in the qubits collapsing into classical bits.

3.1. Quantum Bit (Qubit)

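In classical computing, the smallest unit of information is the bit. A bit can be in either state 0 or state 1, and the state of a bit can be represented in matrix form as

$$0 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \qquad 1 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}. \tag{1}$$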

On the other hand, in quantum computing, the smallest unit of information is the quantum bit or qubit. To distinguish the classical bit from the qubit, Dirac ket notation is used. Using the ket notation, the quantum computational basis states are represented by $|0\rangle$ and $|1\rangle$. A qubit can be in state $|0\rangle$, or in state $|1\rangle$, or in a superposition of both basis states. The state of a qubit can be represented as

$$|\psi\rangle = \alpha|0\rangle + \beta|1\rangle, \tag{2}$$

where both $\alpha$ and $\beta$ are complex numbers and $|\alpha|^2 + |\beta|^2 = 1$. Upon measurement, $|\alpha|^2$ is the probability that the qubit is found in state $|0\rangle$ and $|\beta|^2$ is the probability that the qubit is found in state $|1\rangle$. An $n$-qubit quantum state vector contains $2^n$ complex numbers, which determine the measurement probability of each basis state. However, on measurement, the superposition is destroyed and the qubits return to the classical state of bits depending on the probability derived from the complex-valued state vector.
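The relation between amplitudes and measurement probabilities can be illustrated with a minimal Python sketch. The function names `normalize` and `measurement_probabilities` and the example amplitudes are assumptions made here for illustration; they do not come from the paper.

```python
import math

def normalize(amplitudes):
    """Scale complex amplitudes so that their squared magnitudes sum to 1."""
    norm = math.sqrt(sum(abs(a) ** 2 for a in amplitudes))
    if norm == 0:
        raise ValueError("the zero vector is not a valid quantum state")
    return [a / norm for a in amplitudes]

def measurement_probabilities(state):
    """Probability of observing each basis state upon measurement."""
    return [abs(a) ** 2 for a in state]

# Hypothetical 1-qubit example: unnormalized amplitudes 3 and 4i.
qubit = normalize([3, 4j])
probs = measurement_probabilities(qubit)   # probabilities of |0> and |1>
```

For an $n$-qubit state the same functions apply unchanged to a list of $2^n$ amplitudes, which is why the storage cost on a classical platform doubles with every added qubit.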

3.2. Tensor/Kronecker Product

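Tensor product or Kronecker product is the basic operation that is applied in the formation of a larger quantum system as well as multiqubit quantum transformations. A quantum state vector that can be written as the tensor product of two vectors is separable, whereas a state vector that cannot be expressed as the tensor product of two vectors is entangled [15]. The tensor operation on two arbitrary 1-qubit transformations is shown below:

$$A \otimes B = \begin{bmatrix} a_{00} & a_{01} \\ a_{10} & a_{11} \end{bmatrix} \otimes \begin{bmatrix} b_{00} & b_{01} \\ b_{10} & b_{11} \end{bmatrix} = \begin{bmatrix} a_{00}b_{00} & a_{00}b_{01} & a_{01}b_{00} & a_{01}b_{01} \\ a_{00}b_{10} & a_{00}b_{11} & a_{01}b_{10} & a_{01}b_{11} \\ a_{10}b_{00} & a_{10}b_{01} & a_{11}b_{00} & a_{11}b_{01} \\ a_{10}b_{10} & a_{10}b_{11} & a_{11}b_{10} & a_{11}b_{11} \end{bmatrix}. \tag{3}$$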
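As a small illustration of the tensor product and of the separable/entangled distinction, the following Python sketch may help. The helper names are hypothetical, and the separability test used here (a 2-qubit state is a product state exactly when $c_{00}c_{11} = c_{01}c_{10}$) is a standard result that applies only to the 2-qubit case.

```python
import math

def kron_vec(u, v):
    """Tensor product of two state vectors."""
    return [a * b for a in u for b in v]

def kron_mat(A, B):
    """Kronecker product of two matrices given as lists of rows."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

def is_separable_2q(state, tol=1e-12):
    """A 2-qubit state (c00, c01, c10, c11) is separable iff c00*c11 == c01*c10."""
    c00, c01, c10, c11 = state
    return abs(c00 * c11 - c01 * c10) < tol

s = 1 / math.sqrt(2)
product_state = kron_vec([s, s], [1, 0])   # (|0> + |1>)/sqrt(2) tensor |0>
bell_state = [s, 0, 0, s]                  # (|00> + |11>)/sqrt(2), entangled
```

Here `is_separable_2q(product_state)` holds while `is_separable_2q(bell_state)` does not, which matches the definition given above.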

3.3. Quantum Circuit Model

A quantum algorithm is a description of a sequence of quantum operations (or transformations) applied upon qubits to generate new quantum states. The model most widely used in describing the evolution of a quantum system is the quantum circuit model, first proposed in [32]. A quantum circuit is the interconnection of quantum gates with quantum wires, and gate operations are represented by unitary matrices.

All unitary matrices are invertible, and the products of unitary matrices as well as the inverse of a unitary matrix are unitary. An $n$-by-$n$ matrix $U$ is unitary if $U U^{\dagger} = U^{\dagger} U = I$, where $U^{\dagger}$ is the adjoint (conjugate transpose) of $U$. Since all quantum transformations are reversible, quantum gate operations can always be undone. Fundamental quantum gates include the Hadamard gate, phase-shift gate, and swap gate, and these gates are described as follows.

The Hadamard gate is one of the most useful single qubit quantum gates. It operates by placing a computational basis state into a superposition of basis states with equal probability. The Hadamard transform can be represented by the following unitary matrix:

$$H = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}. \tag{4}$$

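The following example illustrates the application of Hadamard gates in mapping the 2-qubit basis state $|00\rangle$ to a superposition of basis states with equal probability:

$$(H \otimes H)\,|00\rangle = \frac{1}{2} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix} = \frac{1}{2}\left(|00\rangle + |01\rangle + |10\rangle + |11\rangle\right). \tag{5}$$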

The controlled phase-shift gate operates on 2 qubits, one of which is the control qubit and the other is the target qubit. If the control qubit is $|1\rangle$, a phase-shift operation is performed on the target qubit; otherwise, there is no operation. The operation is represented by the following matrix:

$$R_k = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & e^{2\pi i/2^k} \end{bmatrix}. \tag{6}$$

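The quantum SWAP gate is used for swapping two qubits. It switches the amplitudes of a quantum state vector. The operation of a 2-qubit SWAP gate is represented by the matrix in (7):

$$\mathrm{SWAP} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}. \tag{7}$$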
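The gate definitions of this section can be checked numerically. The sketch below, written in plain Python with hypothetical helper names, builds the Hadamard, controlled phase-shift, and SWAP matrices in their standard textbook form, verifies that each is unitary, and applies $H \otimes H$ to $|00\rangle$.

```python
import cmath
import math

def matmul(A, B):
    """Matrix product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def dagger(A):
    """Adjoint (conjugate transpose) of a matrix."""
    return [[complex(A[i][j]).conjugate() for i in range(len(A))] for j in range(len(A[0]))]

def is_unitary(U, tol=1e-12):
    """Check U * U^dagger == I element by element."""
    P = matmul(U, dagger(U))
    n = len(U)
    return all(abs(P[i][j] - (1 if i == j else 0)) < tol for i in range(n) for j in range(n))

def kron_mat(A, B):
    """Kronecker product of two matrices."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

def apply(U, state):
    """Matrix-vector product U|state>."""
    return [sum(U[r][c] * state[c] for c in range(len(state))) for r in range(len(U))]

S = 1 / math.sqrt(2)
H = [[S, S], [S, -S]]
SWAP = [[1, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1]]

def controlled_phase(k):
    """Controlled phase shift by 2*pi/2^k on the |11> component."""
    return [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0],
            [0, 0, 0, cmath.exp(2j * cmath.pi / 2 ** k)]]

superposition = apply(kron_mat(H, H), [1, 0, 0, 0])   # equal amplitudes of 1/2
swapped = apply(SWAP, [0, 1, 0, 0])                   # |01> becomes |10>
```

A dense matrix-vector product like `apply` costs $2^{2n}$ multiplications for $n$ qubits, which is one reason a direct matrix formulation does not scale on a classical platform.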

3.4. Quantum Fourier Transform (QFT)

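The Fourier transform is deployed in a wide range of engineering and physics applications such as signal processing, image processing, and quantum mechanics. It is a reversible transformation that converts signals from the time/spatial domain to the frequency domain and vice versa. The Fourier transform is defined in (8) for continuous signals and in (9) for discrete signals:

$$X(\omega) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} x(t)\, e^{i\omega t}\, dt, \tag{8}$$

$$y_k = \frac{1}{\sqrt{N}} \sum_{j=0}^{N-1} x_j\, e^{2\pi i jk/N}, \qquad k = 0, 1, \ldots, N-1. \tag{9}$$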

The quantum Fourier transform (QFT) is a transformation on qubits and is the quantum equivalent of the discrete Fourier transform. It should be noted that a quantum computer performs QFT with exponentially fewer operations than the classical Fourier transform. However, QFT does not reduce the execution time of the algorithm when classical data is used. This is because a quantum computer does not allow parallel read-out of all quantum state amplitudes. In addition, there is no known method that can effectively instantiate the desired input state amplitudes to be Fourier-transformed [33].

In order to harness the power of quantum computing on Fourier transform, QFT has to be deployed within other practical applications. QFT is pivotal in quantum computing since it is part of many quantum algorithms. These algorithms include integer factorization and discrete logarithms algorithms [1], Simon’s periodicity algorithm [9], and Hallgren’s algorithms [10]. They offer significant speed-up over their classical counterparts. QFT has also found applications in many real-world problems such as image watermarking [34] and template matching [35].

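To compute the Fourier transform in the quantum domain, discrete signal samples are encoded as the amplitude sequence of a quantum state vector which is in a superposition of basis states [31]. An $n$-qubit QFT operation which transforms an arbitrary superposition of computational basis states is expressed in (10):

$$\sum_{j=0}^{2^n-1} x_j |j\rangle \longrightarrow \sum_{k=0}^{2^n-1} y_k |k\rangle, \qquad y_k = \frac{1}{\sqrt{2^n}} \sum_{j=0}^{2^n-1} x_j\, e^{2\pi i jk/2^n}. \tag{10}$$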

As required for a valid quantum state, the input amplitudes $x_j$ must be normalized such that they fulfil (11). If the original signal inputs do not comply with this requirement, the amplitudes of the signal samples have to be divided by the normalization factor, $\sqrt{\sum_j |x_j|^2}$. In most cases, the input states formed by the normalized signal samples are entangled:

$$\sum_{j=0}^{2^n-1} |x_j|^2 = 1. \tag{11}$$

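From (10), it can be observed that the term $j/2^n$ in the QFT equation is a rational number in the range $0 \le j/2^n < 1$. As qubit representation is typically used in computations, the base-10 integer $j$ is redefined in base-2 notation with individual bits $j_1 j_2 \ldots j_n$ such that the binary fraction form as expressed in (12) can be conveniently adopted:

$$j = j_1 2^{n-1} + j_2 2^{n-2} + \cdots + j_n 2^0, \qquad 0.j_l j_{l+1} \ldots j_m = \frac{j_l}{2} + \frac{j_{l+1}}{4} + \cdots + \frac{j_m}{2^{m-l+1}}. \tag{12}$$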

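With some algebraic manipulations, the QFT equation in (13) can be rewritten in the product form (14) [33]:

$$|j\rangle \longrightarrow \frac{1}{2^{n/2}} \sum_{k=0}^{2^n-1} e^{2\pi i jk/2^n}\, |k\rangle, \tag{13}$$

$$|j_1 j_2 \ldots j_n\rangle \longrightarrow \frac{\left(|0\rangle + e^{2\pi i\, 0.j_n}|1\rangle\right)\left(|0\rangle + e^{2\pi i\, 0.j_{n-1} j_n}|1\rangle\right) \cdots \left(|0\rangle + e^{2\pi i\, 0.j_1 j_2 \ldots j_n}|1\rangle\right)}{2^{n/2}}. \tag{14}$$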

Since the term $e^{2\pi i\, 0.j_1}$ produces either $-1$ if $j_1 = 1$ or $+1$ otherwise, the Hadamard computation on the first qubit results in $\frac{1}{\sqrt{2}}\left(|0\rangle + e^{2\pi i\, 0.j_1}|1\rangle\right)$. Computations of the consecutive bits in the binary fraction are obtained using controlled phase-shift gates according to (14). The QFT circuit consists of three types of elementary gates, which are the Hadamard gate, $H$, the controlled phase-shift gate, $R_k$, and the SWAP gate. The circuit model of an $n$-qubit QFT is depicted in Figure 1.

The size of a QFT circuit grows exponentially as the number of input qubits increases. An $n$-qubit QFT involves a sequence of unitary transformations and could process up to $2^n$ input samples in one evaluation (provided the input samples are encoded as the amplitude sequence of a superposition of computational basis states).
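The encoding of signal samples as amplitudes and the QFT itself can be sketched in a few lines of Python. This is a direct $O(N^2)$ evaluation of the transform with the positive exponent sign and $1/\sqrt{N}$ scaling usual for the QFT, not a model of the gate-level circuit; the sample values and function names are hypothetical.

```python
import cmath
import math

def normalize(samples):
    """Divide the samples by the normalization factor sqrt(sum |x_j|^2)."""
    norm = math.sqrt(sum(abs(x) ** 2 for x in samples))
    return [x / norm for x in samples]

def qft(state):
    """y_k = (1/sqrt(N)) * sum over idx of x_idx * exp(2*pi*i*idx*k/N)."""
    size = len(state)
    return [sum(x * cmath.exp(2j * cmath.pi * idx * k / size)
                for idx, x in enumerate(state)) / math.sqrt(size)
            for k in range(size)]

samples = [1, 2, 3, 4, 4, 3, 2, 1]     # 8 hypothetical samples -> 3 qubits
state = normalize(samples)
spectrum = qft(state)

# The transform is unitary, so the output is still a valid quantum state.
total_probability = sum(abs(y) ** 2 for y in spectrum)

# A computational basis state maps to an equal-magnitude superposition.
basis_out = qft([1, 0, 0, 0, 0, 0, 0, 0])
```

Eight samples fill the amplitudes of a 3-qubit state, which illustrates the point above that an $n$-qubit QFT handles $2^n$ samples per evaluation.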

3.5. Grover’s Search Algorithm

In computer science, a typical search problem is to identify the desired element in an unordered array. For many computing applications, it is critical that the search technique is efficient. In terms of a function, the search problem is given as a function $f : \{0,1\}^n \to \{0,1\}$ with the assurance that there exists exactly one binary string $x_0$ where $f(x) = 1$ if $x = x_0$; else, $f(x) = 0$.

In classical computing, $N/2$ queries on average are required to search for a particular element in an unordered array with $N$ elements. In quantum computing, Grover's search algorithm can complete the job in $O(\sqrt{N})$ queries (for the rest of the text and figures in Section 3.5, the required number of Grover iterations is abbreviated as $\sqrt{N}$ times). Although the speed-up achieved is only quadratic, Grover's algorithm and its extensions are extremely useful in enhancing current methods for solving database searching and optimization problems, which include 3-satisfiability [36], global optimization [37], minimum point searching [38], and pattern matching [39]. The core operations of Grover's algorithm are phase inversion and inversion about mean. Phase inversion inverts the phase of the state-of-interest, and its quantum circuit model is given in Figure 2.

In Figure 2, the top $n$ qubits, $|x\rangle$, are the target qubits, and the bottom qubit is called the ancilla qubit. The function of $U_f$ is to pick out the desired binary string. To apply phase inversion on the target qubits, a Hadamard gate operation is performed on the ancilla qubit, which is initialized as $|1\rangle$. This is to complement the effect of $U_f$, which takes $|x, y\rangle$ to $|x, y \oplus f(x)\rangle$.

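In terms of matrices, the phase inversion operation can be expressed as $U_f (I \otimes H)\,|x, 1\rangle$, and the corresponding quantum states are described as follows:

$$|\varphi_0\rangle = |x, 1\rangle, \qquad |\varphi_1\rangle = |x\rangle \left[\frac{|0\rangle - |1\rangle}{\sqrt{2}}\right], \qquad |\varphi_2\rangle = (-1)^{f(x)}\, |x\rangle \left[\frac{|0\rangle - |1\rangle}{\sqrt{2}}\right]. \tag{15}$$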

Inversion about mean boosts the phase separation between the element-of-interest and the other elements in the unordered array (after the phase inversion operation is applied to invert the phase of the target element). The mean of all elements is computed and inversions are made about the mean. The overall mean remains unchanged after the inversion process. This is because the distance between an element and the mean is the same before and after inversion. The only change is that an element originally above the mean is flipped to the same distance below the mean, and vice versa. In general, the inversion about mean operation can be expressed as

$$v' = -v + 2a, \tag{16}$$

where $a$ is the mean, $v$ is the value of an element in the array, and $v'$ is the new value of that element after inversion.

In terms of matrices, the mean of a vector $V$ of $2^n$ elements is obtained by the product of matrices $A$ and $V$, where all the elements in the $2^n$-by-$2^n$ matrix $A$ are set to $1/2^n$. Hence, inversion about mean in matrix form becomes

$$V' = -V + 2AV = (-I + 2A)V. \tag{17}$$
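The equivalence between the element-wise rule and the matrix form can be checked with a short Python sketch. The function names and the 4-element example vector are assumptions chosen for illustration.

```python
def inversion_about_mean(values):
    """Element-wise form: v' = -v + 2a, with a the mean of all elements."""
    mean = sum(values) / len(values)
    return [2 * mean - v for v in values]

def inversion_matrix(size):
    """Matrix form -I + 2A, where every element of A equals 1/size."""
    return [[2 / size - (1 if r == c else 0) for c in range(size)] for r in range(size)]

def matvec(M, v):
    """Matrix-vector product."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

# Hypothetical 2-qubit example: equal amplitudes of 1/2 with the third
# element already phase-inverted.
values = [0.5, 0.5, -0.5, 0.5]
elementwise = inversion_about_mean(values)
via_matrix = matvec(inversion_matrix(len(values)), values)
```

Both forms give the same vector, and the mean of the result equals the mean of the input. The element-wise form needs one sum and one subtraction per element, whereas the matrix form multiplies by a dense $2^n$-by-$2^n$ matrix.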

In order to achieve high confidence of getting the desired element, the amplitude amplification process (amplitude amplification in Grover's search algorithm involves the phase inversion and inversion about mean operations) has to be repeated for $\sqrt{N}$ times. This is because the probability of success changes sinusoidally with the number of amplitude amplification iterations (as illustrated in Figure 3), and the highest probability of success first occurs after the required number of iterations. Pseudocode for a generic Grover's search algorithm is given in Algorithm 1.

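Algorithm 1: Generic Grover's search algorithm.
Start with $|0\rangle^{\otimes n}$ as the target input qubits
Apply the $n$-qubit Hadamard gate on the target qubits, $H^{\otimes n}$
for $\sqrt{N}$ times do
  Apply the phase inversion operation, $U_f (I \otimes H)$
  Apply the inversion about mean operation on the target qubits, $-I + 2A$
end for
Measure the target qubits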
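The loop of Algorithm 1 can be simulated on a classical amplitude vector to observe how the success probability varies with the iteration count. The Python sketch below is a plain state-vector simulation under stated assumptions: the function name, the target index, and the iteration range are hypothetical, and the ancilla qubit is left out because only the sign change on the target amplitude matters here.

```python
import math

def grover_success_probabilities(n_qubits, target, max_iterations):
    """Success probability after 0, 1, ..., max_iterations Grover iterations."""
    size = 2 ** n_qubits
    amplitudes = [1 / math.sqrt(size)] * size          # Hadamard on |0...0>
    probabilities = [amplitudes[target] ** 2]
    for _ in range(max_iterations):
        amplitudes[target] = -amplitudes[target]       # phase inversion
        mean = sum(amplitudes) / size
        amplitudes = [2 * mean - a for a in amplitudes]  # inversion about mean
        probabilities.append(amplitudes[target] ** 2)
    return probabilities

probs = grover_success_probabilities(n_qubits=3, target=5, max_iterations=4)

# Index of the first local peak of the success probability.
first_peak = next(k for k in range(len(probs) - 1) if probs[k + 1] < probs[k])
```

For this 3-qubit example the probability rises over the first two iterations and then falls again, which matches the sinusoidal behaviour described above: iterating past the first peak lowers the chance of measuring the target.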

There are two approaches to model Grover’s search quantum algorithm. The first approach, which is based on quantum circuit model, is discussed next. The second method is modelling using arithmetic functions, and this is presented in Section 4.2 since this approach is applied in this paper.

As shown in Figure 4, Grover's search circuit given in [15] is constructed with the assumption that black box modules for phase inversion, $U_f$, and inversion about mean, $-I + 2A$, are available. Descriptions of both modules have been given earlier.

On the other hand, the circuit model for the $n$-qubit Grover's algorithm presented in [33] is shown in Figure 5(a). In this figure, $H^{\otimes n}$ is the Hadamard gate, and $G$ is the Grover iteration circuit, which is illustrated in Figure 5(b).

The function of the Grover iteration circuit is equivalent to the phase inversion and inversion about mean. In order to achieve a high probability of successful search, $G$ is concatenated for $O(\sqrt{N})$ times. In Figure 5(b), the role of the oracle module is to recognize the solution to a particular search problem in the phase inversion operation. By monitoring the oracle qubit, a solution to the search problem can be detected through the changes of the oracle qubit. The design of the oracle module varies with different search applications, and an example of the oracle circuit for a simple 3-bit search task is shown in Figure 6. In Figure 6, the symbol $X$ represents the Pauli-$X$ matrix. The open circle notation indicates conditioning on the qubit being set to zero, whereas the closed circle indicates conditioning on the qubit being set to one.

Deriving from the circuits in Figure 5, the corresponding 3-qubit Grover’s search circuit is provided in Figure 7. This circuit model is made up of Hadamard, oracle, quantum NOT, and multiqubit controlled-NOT gates.

4. Proposed FPGA-Based Hardware Emulation

This section presents our approach in modelling quantum Fourier transform and Grover’s search algorithms for FPGA emulation. The proposed techniques can be generalized to FPGA emulation of more complex quantum algorithms that apply QFT or Grover’s algorithm. This paper extends our earlier work presented in [40, 41]. In these previous works, the hardware architecture proposed was restricted to the serial design with resource sharing facilitated at the register level. In this paper, we enable resource sharing at the operator (or computational) level that allows for more efficient emulation of the quantum algorithms. Furthermore, additional case study on Grover’s search algorithm is included for generalization of the proposed framework. The choice of hardware architecture varies based on the need of different applications. Based on the selected case studies, the efficiencies of different hardware architectures for quantum computing emulation purposes are discussed and analysed in this work. The goal is to achieve scalability and also efficient resource utilization for emulating practical larger qubit size quantum systems.

4.1. Modelling QFT for FPGA Emulation

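The derivation of the quantum circuit model for the $n$-qubit QFT was discussed earlier in Section 3.4. Here, we present the modelling of QFT for FPGA emulation based on a 3-qubit example. According to (14), the mathematical expression for the 3-qubit QFT is derived as shown in (18):

$$|j_1 j_2 j_3\rangle \longrightarrow \frac{\left(|0\rangle + e^{2\pi i\, 0.j_3}|1\rangle\right)\left(|0\rangle + e^{2\pi i\, 0.j_2 j_3}|1\rangle\right)\left(|0\rangle + e^{2\pi i\, 0.j_1 j_2 j_3}|1\rangle\right)}{2^{3/2}}. \tag{18}$$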

Deriving from the general $n$-qubit QFT circuit provided in Figure 1, the corresponding quantum circuit for the 3-qubit QFT is obtained as shown in Figure 8.

The circuit consists of Hadamard gates, $H$, controlled phase-shift gates, $R_2$ and $R_3$, and also the SWAP gate. Referring to the functional block diagram given in Figure 9, this quantum circuit model corresponds to a sequence of unitary transformations, one for each gate in the circuit, applied in turn to the quantum state vector.

Note that, in Figure 9, the modelling of an $n$-qubit quantum system with superposition and entanglement properties results in a circuit with $2^n$ signals. Since the input samples to the QFT circuit are encoded as a sequence of amplitudes in an entangled superposition of basis states (discussed in Section 3.4), modelling based on individual qubits with separate quantum gate operations is unable to reflect the effects of applying a quantum gate on entangled qubits correctly.

In order to model the effect of superposition and entanglement, derivation of each unitary transformation is made through the tensor product of individual quantum gate and identity matrix to form unitary matrix of equal dimension with the quantum state vector. Detailed derivations of the quantum unitary matrices for the 3-qubit QFT have been presented in our previous paper [41].
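How gate-level transformations of full state-vector dimension compose into the QFT can be sketched in Python. The gate ordering below follows the standard textbook 3-qubit QFT circuit (three Hadamard gates, three controlled phase shifts, one swap). The paper's own derivations are in [41] and may be written differently, so the helper names and the stage list should be read as assumptions.

```python
import cmath
import math

N_QUBITS = 3
DIM = 2 ** N_QUBITS

def bit(index, qubit):
    """Value of `qubit` (1 = most significant) in basis-state `index`."""
    return (index >> (N_QUBITS - qubit)) & 1

def hadamard_on(qubit):
    """8x8 matrix of H acting on one qubit and identity on the others."""
    mask = 1 << (N_QUBITS - qubit)
    s = 1 / math.sqrt(2)
    U = [[0] * DIM for _ in range(DIM)]
    for col in range(DIM):
        U[col & ~mask][col] = s                        # target bit becomes 0
        U[col | mask][col] = -s if col & mask else s   # target bit becomes 1
    return U

def controlled_phase(k, control, target):
    """Diagonal matrix adding phase 2*pi/2^k when both qubits are 1."""
    phase = cmath.exp(2j * cmath.pi / 2 ** k)
    U = [[0] * DIM for _ in range(DIM)]
    for idx in range(DIM):
        U[idx][idx] = phase if bit(idx, control) and bit(idx, target) else 1
    return U

def swap(q_a, q_b):
    """Permutation matrix exchanging two qubits."""
    U = [[0] * DIM for _ in range(DIM)]
    for col in range(DIM):
        row = col
        if bit(col, q_a) != bit(col, q_b):
            row ^= (1 << (N_QUBITS - q_a)) | (1 << (N_QUBITS - q_b))
        U[row][col] = 1
    return U

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(DIM)) for j in range(DIM)]
            for i in range(DIM)]

# Stages in order of application (assumed textbook ordering).
stages = [hadamard_on(1), controlled_phase(2, 2, 1), controlled_phase(3, 3, 1),
          hadamard_on(2), controlled_phase(2, 3, 2), hadamard_on(3), swap(1, 3)]

circuit = stages[0]
for U in stages[1:]:
    circuit = matmul(U, circuit)      # later stages multiply from the left

omega = cmath.exp(2j * cmath.pi / DIM)
reference = [[omega ** (r * c) / math.sqrt(DIM) for c in range(DIM)] for r in range(DIM)]
max_error = max(abs(circuit[r][c] - reference[r][c])
                for r in range(DIM) for c in range(DIM))
```

The product of the seven stage matrices agrees with the 8-point QFT matrix up to floating-point rounding. Each stage matrix is built at the full state-vector dimension rather than per qubit, which is what allows it to act correctly on entangled input states.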

Since these quantum unitary matrices are mostly sparse matrices, we extract minimal number of useful arithmetic operations (due to nonzero elements in the matrices), resulting in an optimal realization of the model that can be mapped to an efficient FPGA emulation architecture. Incidentally, a software program has been developed to automate this mapping, hence easily scaling up the circuit model to larger qubit sizes. From these arithmetic functions, the corresponding data-flow graph for the 3-qubit QFT is derived as shown in Figure 10.
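The benefit of keeping only the nonzero matrix elements can be illustrated for a single Hadamard stage. In the Python sketch below, a dense 8-by-8 matrix-vector product is compared with an add/subtract formulation over pairs of state-vector elements. The function names and the example state are hypothetical, and this sketch is not the paper's automated mapping program.

```python
import math

def hadamard_stage(state, qubit, n_qubits):
    """Apply H to one qubit (1 = most significant): one add, one subtract per pair."""
    stride = 1 << (n_qubits - qubit)
    s = 1 / math.sqrt(2)
    out = list(state)
    for base in range(len(state)):
        if base & stride == 0:
            a, b = state[base], state[base | stride]
            out[base] = (a + b) * s
            out[base | stride] = (a - b) * s
    return out

def dense_hadamard(qubit, n_qubits):
    """The same transformation as a full 2^n-by-2^n matrix."""
    dim = 1 << n_qubits
    stride = 1 << (n_qubits - qubit)
    s = 1 / math.sqrt(2)
    return [[0 if (r ^ c) & ~stride else (-s if r & c & stride else s)
             for c in range(dim)] for r in range(dim)]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

state = [complex(v) for v in (0.5, 0.5, 0.5, 0.5, 0, 0, 0, 0)]
dense = dense_hadamard(1, 3)
nonzeros = sum(1 for row in dense for x in row if x != 0)

fast = hadamard_stage(state, 1, 3)
slow = matvec(dense, state)
max_diff = max(abs(a - b) for a, b in zip(fast, slow))
```

Only 16 of the 64 matrix elements are nonzero, so the stage reduces to four add/subtract pairs followed by a constant scaling. Operations of this kind are what a data-flow graph such as the one in Figure 10 would contain.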

In Figure 10, each signal is a fixed point complex number register for an element of the quantum state vector. The first type of module in Figure 10 combines two operations, one of which is the multiplication of the input complex number by the imaginary number $i$; this multiplication is applied in two of the unitary transformations. The operation of the second type of module in Figure 10 is described in (26). Its function is the multiplication of the input complex number by a constant complex number, which is derived from the controlled phase-shift gate. As expressed previously in (21), it is used in one of the unitary transformations.
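A hedged Python sketch of how such constant multiplications might look in fixed point is given below. It assumes that the constant complex number is $e^{i\pi/4}$, the phase of the $R_3$ gate in its standard form, and it uses a hypothetical 14-bit fraction width. Neither the function names nor the word length are taken from the paper.

```python
import math

FRAC_BITS = 14                      # hypothetical fraction width
ONE = 1 << FRAC_BITS
INV_SQRT2 = round(ONE / math.sqrt(2))

def to_fixed(x):
    """Convert a real number to a signed fixed-point integer."""
    return round(x * ONE)

def to_float(v):
    """Convert a fixed-point integer back to a real number."""
    return v / ONE

def mul_i(re, im):
    """(re + i*im) * i = -im + i*re: a swap and a negation, no multiplier."""
    return -im, re

def mul_w8(re, im):
    """(re + i*im) * e^{i*pi/4} = ((re - im) + i*(re + im)) / sqrt(2)."""
    return ((re - im) * INV_SQRT2) >> FRAC_BITS, ((re + im) * INV_SQRT2) >> FRAC_BITS

z_re, z_im = to_fixed(0.5), to_fixed(0.25)
turned = mul_i(z_re, z_im)
rot_re, rot_im = mul_w8(z_re, z_im)
```

Multiplication by $i$ needs no arithmetic beyond a sign change, while multiplication by $e^{i\pi/4}$ needs one addition, one subtraction, and two multiplications by the same real constant. The truncating right shift introduces a small rounding error that depends on the chosen fraction width.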

4.2. Modelling Grover’s Search for FPGA Emulation

As mentioned earlier, the second approach of modelling Grover’s quantum search is modelling using arithmetic functions (based on mathematical model). In this work, we have chosen this approach of modelling Grover’s algorithm for FPGA emulation. In contrast to the quantum circuit model approach that involves complex large dimensional matrix operations (i.e., matrix multiplication and tensor product), the chosen method can utilize the computational resources available on FPGA such as comparators, adders/subtracters, and multiplexers for efficient emulation of Grover’s search algorithm.

This technique is based on phase inversion and inversion about mean as described in Section 3.5. As shown in Figure 11, mathematically, Grover's search for a database with $N$ elements mainly involves the processes of inverting the phase of the target element (the phase inversion function) and performing inversion about mean on all elements (the inversion about mean function), repeated for $\sqrt{N}$ times. Initialization of the quantum state vector with equal probability, function INIT, is carried out once at the beginning of the process.

For the case study of a 3-bit search problem, we derive the data-flow graphs and obtain the required arithmetic functions as shown in Figure 12. For experimental purposes, the oracle module is developed by comparing all elements in the database with the targeted element using comparator modules. The arithmetic functions of the inversion about mean are derived through straightforward computations that involve summation, bit shift, and subtraction.
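The arithmetic view of Grover's search can be sketched with integer operations only. In the Python sketch below, the oracle is an index comparison and the inversion about mean uses a sum, an arithmetic right shift, and a subtraction. The 12-bit fraction width, the function name, and the target index are assumptions made here, not values from the paper.

```python
import math

FRAC_BITS = 12                      # hypothetical fixed-point fraction width
ONE = 1 << FRAC_BITS

def grover_fixed_point(n_qubits, target, iterations):
    """Integer-only Grover iterations on 2^n amplitude registers."""
    size = 1 << n_qubits
    init = round(ONE / math.sqrt(size))          # INIT: equal amplitudes
    regs = [init] * size
    for _ in range(iterations):
        # Oracle: a comparator per element flags the target and negates it.
        regs = [-v if idx == target else v for idx, v in enumerate(regs)]
        # Inversion about mean: 2 * mean = sum / 2^(n-1), done as a shift.
        twice_mean = sum(regs) >> (n_qubits - 1)
        regs = [twice_mean - v for v in regs]
    return regs

regs = grover_fixed_point(n_qubits=3, target=5, iterations=2)
found = max(range(len(regs)), key=lambda i: abs(regs[i]))
probability = (regs[found] / ONE) ** 2
```

Because the number of elements is a power of two, doubling the mean reduces to a single shift of the sum, so no divider or general multiplier is needed in this formulation.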

4.3. Architecture of Proposed FPGA Emulation Model

It is clear that if implemented on classical computing platforms, the resource utilization for a quantum system would grow exponentially. Hence, the choice of suitable architecture is critical for FPGA-based quantum computing emulation. Here, we discuss the efficiencies of different architectural choices in datapath: concurrent, pipeline, serial, and serial-parallel. The block diagram of various architectures is constructed based on the example of 3-qubit QFT.

4.3.1. Concurrent Processing

In concurrent processing of an algorithm, all computations are completed within a clock cycle. In the case of the 3-qubit QFT, the computation blocks between the input and output registers, through the functional blocks of all unitary transformations, are performed in one clock cycle (refer to Figure 13(a)). However, such an architecture consumes enormous resources: the registers must hold all $2^n$ elements of the state vector, so the number of registers required to emulate an n-qubit QFT grows exponentially with n. In addition, the critical path delay is very high, which results in an unrealistically low operating frequency.

4.3.2. Pipeline Architecture

Most of the prior works on FPGA-based quantum circuit emulation [25–27] are developed based on pipeline architecture. The pipeline architecture has the advantages of high throughput and much shorter critical path delay. Figure 13(b) shows the pipeline architecture of the 3-qubit QFT. However, the main issue of this approach is that resource utilization grows drastically with the number of qubits, due to the circuit augmentation of pipeline registers: a bank of pipeline registers is required after each transformation stage to emulate an n-qubit QFT. Consequently, hardware emulation scalability is highly constrained by the available resources in the FPGA.

4.3.3. Serial Processing

Although a serial design requires multiple iterations to perform a complex computation, it opens up the opportunity for resource sharing. Serial processing is suitable for applications where resource utilization is a critical design constraint. Figure 13(c) depicts the serial form of the 3-qubit QFT circuit, which consists of a control unit and a datapath unit. As resources can be reused between transformations, a serial-based n-qubit QFT requires a register count that is much lower than in the concurrent or pipeline architectures. However, a pure serial approach defeats the purpose of conducting FPGA emulation, whose aim is to exploit the parallelism inherent in a quantum system, as it would still suffer from the slow sequential behaviour exhibited in simulation on a classical computer.

4.3.4. Proposed Serial-Parallel Architecture

In this paper, we propose a hybrid serial-parallel architecture for FPGA emulation of quantum algorithms. The proposed approach takes advantage of both serial and parallel design techniques. Applying the concepts of quantum parallelism and quantum dynamics modelled by sequential transformations on a quantum state vector, it is found that the proposed serial-parallel architecture is suitable for efficient and accurate quantum computing emulation on FPGA platform.

Figure 14 shows the functional block diagram of the proposed serial-parallel FPGA emulation architecture of the 3-qubit QFT. The serial-parallel design of the datapath unit involves a number of quantum computation units that can perform parallel computations for each stage of unitary transformation whereby the same computational resources can be reused for the following stage of transformations.

For data storage and synchronization purposes, registers are shared between unitary transformations. Compared with the pipeline design, our proposed serial-parallel approach achieves a linear reduction in the number of registers needed to emulate the same quantum system. The arithmetic logic unit (ALU) in the datapath unit contains multiple custom processing elements, and the allocation of resources in the ALU varies with the target application. The number of processing elements is determined by the desired parallelism in the n-qubit quantum system.
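The idea of reusing one state register and a fixed pool of processing elements across successive transformations can be illustrated with a small software model. The following is only a sketch under stated assumptions, not the HDL design of this work: the names `apply_stage`, `qft_serial_parallel`, and `num_pe` are hypothetical, the gate sequence is the standard textbook QFT circuit, and the cycle count assumes that each processing element handles one amplitude pair per clock cycle and that the final qubit reordering costs one cycle.

```python
import cmath
import math

def apply_stage(state, gate, target, control, n, num_pe):
    """Apply one 2x2 gate stage in place on the shared state register.

    Returns the number of modelled clock cycles: the amplitude pairs are
    processed in batches of num_pe (one pair per processing element).
    """
    t_mask = 1 << (n - 1 - target)          # qubit 0 is the most significant bit
    pairs = [(i, i | t_mask) for i in range(len(state)) if not i & t_mask]
    cycles = 0
    for start in range(0, len(pairs), num_pe):
        # Pairs inside one batch would be computed in parallel in hardware.
        for i0, i1 in pairs[start:start + num_pe]:
            if control is not None and not (i0 >> (n - 1 - control)) & 1:
                continue                    # control bit is 0: pair is unchanged
            a0, a1 = state[i0], state[i1]
            state[i0] = gate[0][0] * a0 + gate[0][1] * a1
            state[i1] = gate[1][0] * a0 + gate[1][1] * a1
        cycles += 1
    return cycles

def qft_serial_parallel(state, num_pe):
    """Run the textbook QFT stage by stage, reusing the same register."""
    n = len(state).bit_length() - 1
    h = 1 / math.sqrt(2)
    hadamard = [[h, h], [h, -h]]
    cycles = 0
    for q in range(n):
        cycles += apply_stage(state, hadamard, q, None, n, num_pe)
        for k in range(2, n - q + 1):
            rot = [[1, 0], [0, cmath.exp(2j * math.pi / 2 ** k)]]
            cycles += apply_stage(state, rot, q, q + k - 1, n, num_pe)
    # Final qubit reordering (bit reversal), modelled as a single cycle.
    rev = [int(format(i, f"0{n}b")[::-1], 2) for i in range(len(state))]
    state[:] = [state[rev[i]] for i in range(len(state))]
    return cycles + 1

state = [0j] * 8
state[1] = 1 + 0j                            # basis state |001>
print(qft_serial_parallel(state, num_pe=4))  # modelled cycle count
```

In this model, lowering `num_pe` trades speed for fewer processing elements while the register count stays fixed, which is the balance the serial-parallel approach aims for.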

As illustrated in Figures 10 and 12, the data-flow graphs of the QFT and Grover’s search algorithm exhibit a similar repetitive pattern between unitary transformations. This implies that the proposed serial-parallel approach can be generalized to the two case studies to achieve a balance between resource utilization and speed. For the case of Grover’s search, the ALU would be the Grover iteration module shown in Figure 12, whereas the control unit is designed to keep track of the number of Grover iterations required by the target search problem.

5. Experimental Results

The proposed emulation designs are modelled in SystemVerilog HDL, synthesized using Altera Quartus II software, and implemented on the target emulation platform, which is based on the Altera Stratix IV EP4SGX530KF43C4 FPGA. In Section 5.1, we discuss the verification of the hardware emulation designs for the QFT and Grover’s search case studies. In addition, the automated process for scaling up the design to larger qubit sizes is described. In Section 5.2, we study the effects of the number of mantissa bits used in our fixed point representation format on resource utilization and precision error. In Section 5.3, we analyse, for different emulation architectures, how the increase in qubit size affects resource growth and the maximum operating frequency allowed in the designs. Finally, in Section 5.4, the runtimes in simulation and emulation for quantum algorithms with different qubit sizes are compared.

5.1. Design Verification

To show that the quantum algorithms have been modelled accurately, design verification of the proposed emulation hardware is performed. Golden references based on software simulation models (in C) are developed, and their outputs are compared with those of the emulation hardware under test (which is described in SystemVerilog HDL).

In our QFT case study, FFTW3 [42], a widely used fast Fourier transform library in C, is used to perform Fourier transform computations on the signal samples that are used in this work. The outputs of the classical Fourier transform then serve as the golden reference model in verification of the proposed emulation model. Furthermore, since the discrete Fourier transform (DFT) is a linear transformation that can be defined in unitary matrix form, the functional correctness of our QFT hardware emulation model can be conveniently verified against the DFT matrix. The n-qubit DFT matrix is $F_N = \frac{1}{\sqrt{N}}\left[\omega^{jk}\right]_{j,k=0}^{N-1}$ with $N = 2^n$, where $\omega$ is the $N$th root of unity; that is, $\omega = e^{2\pi i/N}$. The choice of $e^{2\pi i/N}$ or $e^{-2\pi i/N}$ is purely a matter of convention, as both the $e^{2\pi i/N}$ term and the $e^{-2\pi i/N}$ term to the power of $N$ are equal to 1.
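A check against the DFT matrix can be sketched in a few lines of software. The example below is an illustration only and is not the FFTW3-based reference used in this work: it builds the unitary DFT matrix directly, assumes the positive-exponent convention by default, and uses hypothetical helper names (`dft_matrix`, `mat_vec`, `max_abs_diff`, `is_unitary`). The output of an emulator under test would be compared with `mat_vec(dft_matrix(n), input_state)` within a chosen tolerance.

```python
import cmath
import math

def dft_matrix(n_qubits, sign=+1):
    """Unitary DFT matrix of size N = 2**n_qubits.

    sign=+1 uses omega = exp(+2*pi*i/N); sign=-1 uses the conjugate convention.
    """
    size = 2 ** n_qubits
    scale = 1 / math.sqrt(size)
    return [[scale * cmath.exp(sign * 2j * math.pi * ((j * k) % size) / size)
             for k in range(size)]
            for j in range(size)]

def mat_vec(matrix, vector):
    """Plain matrix-vector product, used here as the golden reference."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]

def max_abs_diff(a, b):
    """Largest element-wise magnitude difference between two state vectors."""
    return max(abs(x - y) for x, y in zip(a, b))

def is_unitary(matrix, tol=1e-9):
    """Check that the rows of the matrix are orthonormal."""
    size = len(matrix)
    for r in range(size):
        for c in range(size):
            dot = sum(matrix[r][k] * matrix[c][k].conjugate() for k in range(size))
            if abs(dot - (1 if r == c else 0)) > tol:
                return False
    return True

reference = mat_vec(dft_matrix(3), [1] + [0] * 7)   # QFT of basis state |000>
print(is_unitary(dft_matrix(3)), max_abs_diff(reference, [1 / math.sqrt(8)] * 8))
```

Note that many FFT libraries return an unnormalized transform with the negative-exponent convention, so a scale factor and possibly a conjugation are needed before comparing their output with a QFT result.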

On the other hand, the design of Grover’s search FPGA emulation model is verified against the mathematical model provided in literature. The simulation model of Grover’s search algorithm also serves as the golden reference model for verification purposes.
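A software reference model of Grover’s search can be quite compact. The sketch below is a minimal illustration, assuming a single marked element, the textbook oracle (phase flip) and diffusion (inversion about the mean) steps, and the usual iteration count of floor(pi/4 * sqrt(N)); the function name `grover_search` is hypothetical, and the code is not the simulation model used in this work.

```python
import math

def grover_search(n_qubits, marked):
    """Simulate Grover's search on real amplitudes for one marked index."""
    size = 2 ** n_qubits
    state = [1 / math.sqrt(size)] * size          # uniform superposition
    iterations = math.floor(math.pi / 4 * math.sqrt(size))
    for _ in range(iterations):
        state[marked] = -state[marked]            # oracle: flip the marked phase
        mean = sum(state) / size
        state = [2 * mean - a for a in state]     # diffusion: inversion about mean
    return state, iterations

state, iterations = grover_search(3, marked=5)
print(iterations, state[5] ** 2)                  # iteration count, success probability
```

Because the division by N = 2^n is a right shift and the factor of 2 is a left shift in fixed point arithmetic, an iteration of this form can be computed with additions and shifts only.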

For the development of FPGA emulation models for practical quantum computing applications, it is important that the emulation hardware can be scaled up to larger qubit sizes. In this work, the designs are scaled up with the aid of a software program developed in-house. The HDL code of the two case studies is autogenerated by this program based on the proposed modelling techniques (as discussed in Section 4). The generated HDL code produces an efficient hardware emulation model based on the proposed serial-parallel architecture.

5.2. Fixed Point Representation

As defined in (2), a quantum state vector is represented by complex floating point numbers. To ensure effective resource utilization in our FPGA emulation hardware, floating point numbers are replaced by fixed point representations. In this work, a fixed point format with 1 sign bit, 1 integer bit, and mantissa bits (as shown in Figure 15) is used. Since the real and imaginary parts of the amplitudes of a quantum state, whose squared magnitudes give the probabilities of collapsing into computational basis states after measurement, are bounded in magnitude by 1, only one bit is required to represent the integer part.
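The conversion between floating point and such a fixed point format can be sketched as follows. This is an illustrative model only: it assumes a two's complement encoding with truncation toward negative infinity and saturation at the range limits, which may differ from the encoding in Figure 15, and the names `to_fixed`, `from_fixed`, and `total_bits` are hypothetical.

```python
import math

def total_bits(mantissa_bits):
    """Word length: 1 sign bit + 1 integer bit + mantissa bits."""
    return mantissa_bits + 2

def to_fixed(x, mantissa_bits):
    """Convert a real value to a signed integer with mantissa_bits fractional bits.

    Assumes two's complement, truncation (floor), and saturation on overflow.
    """
    q = math.floor(x * (1 << mantissa_bits))
    lo = -(1 << (mantissa_bits + 1))
    hi = (1 << (mantissa_bits + 1)) - 1
    return max(lo, min(hi, q))

def from_fixed(q, mantissa_bits):
    """Convert the fixed point integer back to floating point for comparison."""
    return q / (1 << mantissa_bits)

amplitude = 1 / math.sqrt(2)
word = to_fixed(amplitude, 22)
print(total_bits(22), amplitude - from_fixed(word, 22))  # word length, truncation error
```

With truncation, the representation error of a single value is non-negative and smaller than 2^-m for m mantissa bits.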

Due to the limitations of the classical digital computing platform, representation of qubit amplitudes with infinite precision is infeasible. In the context of quantum computer modelling, particularly for quantum systems with large qubit sizes, minimising precision loss is critical to preserve the consistency of the quantum state during the modelling process [43]. Here, we investigate how precision error is affected by the number of mantissa bits used in our fixed point representation format. The corresponding experimental results are given in Figure 16.

The precision error shown in Figure 16 is computed based on the following equations: where and are the floating point real and imaginary amplitudes of the reference output state generated from simulation, whereas and are the amplitudes extracted from the output state of the proposed FPGA emulator (converted from the original fixed point format to floating point for verification purposes).

Figure 16 shows that the 16-bit fixed point format (14 mantissa bits) incurs significant precision error for 5-qubit emulations in both case studies, whereas the error produced by 2-qubit emulations is insignificant. This is because fixed point truncation errors are amplified as the size of the quantum system grows. For FPGA emulation purposes, precision error due to fixed point representation can be reduced by increasing the number of mantissa bits, at the cost of higher resource utilization. By increasing the number of mantissa bits to 24 (for a total of 26 bits), negligible precision error for 5-qubit emulations is attained. It is important to note that the proposed FPGA emulator is parameterizable in terms of the number of mantissa bits for fixed point representation. This is crucial to ensure that different fixed point formats can be applied to emulate quantum circuits of various sizes, depending on the required precision error tolerance and the resource constraints.
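The effect of the mantissa width on accuracy can be explored with a small software experiment. The sketch below is not the error measure behind Figure 16 and not the gate-level datapath of this work: it assumes a direct DFT matrix-vector product, rounds the constant coefficients, truncates the inputs and every product, uses an arbitrary real test state, and reports the maximum absolute deviation of the real and imaginary parts from a floating point reference. The function name `fixed_qft_error` is hypothetical.

```python
import cmath
import math

def fixed_qft_error(n_qubits, mantissa_bits):
    """Max absolute error of a truncating fixed point DFT versus floating point."""
    size = 2 ** n_qubits
    scale = 1 << mantissa_bits
    norm_factor = 1 / math.sqrt(size)
    # Assumed test input: real amplitudes proportional to 1, 2, ..., N, normalized.
    length = math.sqrt(sum((k + 1) ** 2 for k in range(size)))
    x = [(k + 1) / length for k in range(size)]
    xq = [math.floor(v * scale) for v in x]            # truncated inputs
    worst = 0.0
    for j in range(size):
        ref = 0j
        acc_re = 0
        acc_im = 0
        for k in range(size):
            w = norm_factor * cmath.exp(2j * math.pi * ((j * k) % size) / size)
            ref += w * x[k]                             # floating point reference
            # Integer products, truncated back to mantissa_bits fractional bits.
            acc_re += (round(w.real * scale) * xq[k]) >> mantissa_bits
            acc_im += (round(w.imag * scale) * xq[k]) >> mantissa_bits
        worst = max(worst,
                    abs(ref.real - acc_re / scale),
                    abs(ref.imag - acc_im / scale))
    return worst

for m in (14, 24):
    print(m, fixed_qft_error(5, m))
```

In this simplified model, the truncation errors of the individual products accumulate over the 2^n terms of each output amplitude, so a wider mantissa is needed to keep the same accuracy as the qubit count grows.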

The size of our fixed point number format also affects resource utilization. The corresponding experimental results for the QFT case study are shown in Figure 17. Since the resources available on the FPGA device are mostly organized in blocks or multiples of 8 bits, the 26-bit fixed point format is not suitable. Hence, we use 22 mantissa bits (i.e., 24 bits in total) in our fixed point representation format. Note that, for Grover’s search emulations, the experiment on resource utilization of DSP blocks is not relevant because the FPGA emulation model developed here (as depicted in Figure 12) does not involve multiplication.

5.3. Efficiency of Proposed FPGA Emulation Architecture

In this subsection, we investigate how the increase in qubit size impacts resource growth and maximum allowable operating frequency in different emulation architectures (i.e., concurrent, pipeline, and serial-parallel). In the case of QFT emulation model, we have two versions of serial-parallel architectures. Type 1 serial-parallel uses DSP blocks to perform multiplications, whereas type 2 replaces the multiplications (required in Hadamard gate operations, , , and , in the 3-qubit QFT case) with shift-add operations. Although conventional hardware design methods encourage the usage of shift-add operation instead of multiplication to reduce resource utilization, the case is now different with FPGA devices containing efficient built-in DSP blocks. Figures 18 and 19 show the results of the experiments conducted on QFT and Grover’s search algorithms, respectively.

Based on the experimental results shown in Figure 18, the type 2 method consumes fewer DSP blocks than type 1 but more logic elements, owing to the adders needed for the shift-add operations. As the number of qubits increases, the logic element utilization of the type 1 emulation grows rapidly once the available DSP blocks are used up, and it ends up with resource utilization similar to that of the type 2 approach. Hence, we can conclude that, for large-scale FPGA emulations, both methods lead to similar resource utilization. The first approach is therefore preferred for its simpler design process, since DSP blocks are utilized by default when the design is implemented with the FPGA design automation tool.

In contrast to the concurrent and pipeline designs, the experimental results for QFT and Grover’s search show that the proposed serial-parallel architecture achieves a balance between resource utilization and operating frequency. The proposed architecture significantly reduces resource growth in logic elements, dedicated logic registers, and DSP blocks, while maintaining a reasonable operating frequency. With the concurrent and pipeline architectures, the 5-qubit QFT emulation completely used up the resources available in the Altera Stratix IV FPGA device used in this work. However, the same device can support up to 7-qubit QFT emulation with the proposed serial-parallel architecture. It is important to note that the processing power of a 7-qubit QFT is far higher than that of the 5-qubit implementation, as an n-qubit QFT can process up to $2^n$ samples in one evaluation.

In terms of scalability, software simulation depends on the resources available on the computer servers, whereas the FPGA emulation framework depends on the resources available on the target FPGA devices. Given the rapid advances in FPGA technology in recent years, an efficient architecture implemented in a high-density FPGA device (such as the Altera Stratix 10, which contains up to 5.5 million logic elements) can emulate quantum circuits with a large number of qubits on a single FPGA chip. Furthermore, new approaches to FPGA emulation may emerge from the exploration of efficient data structures and modelling methods. Thus, the proposed work contributes to the formulation of a proof-of-concept baseline FPGA emulation framework with optimization on datapath designs that can be extended to emulate practical large-scale quantum circuits.

5.4. Benchmarking between Simulation and Emulation

Here, our emulation models are benchmarked against the equivalent software simulations. The simulation models used are based on an open source quantum library [28]. Software simulation is performed on an Intel Core i7-4790 eight-core processor with 3.6 GHz clock rate running on a Linux-based Ubuntu 14.04 kernel, whereas hardware emulation is based on the Altera Stratix IV EP4SGX530KF43C4 FPGA. Table 1 shows the runtime comparison between simulation and our emulation.

Figure 20 illustrates the runtime speed-up (simulation/emulation) of Grover’s search case study. It is important to note that the proposed hardware emulation is implemented based on 24-bit fixed point format (which is determined based on the experimental work presented in Section 5.2) whereas 32-bit single precision float is used in the software simulation to describe the quantum state. For a larger qubit size quantum circuit, the number of mantissa bits for fixed point representation has to be increased accordingly to ensure similar precision as in software simulation can be achieved with trade-offs on resource utilization and execution speed [44].

From Figure 20, it can be observed that the proposed hardware emulation provides significant speed-up over software simulation using the library of [28]. It is important to note that the achieved speed-up increases drastically as the number of qubits increases. This result further supports the notion that hardware emulation has significant potential in the modelling of a large-scale quantum system on the classical computing platform based on FPGA.

As the number of I/O pins required for emulating the QFT and Grover’s search algorithms with parallel read-outs is too large to fit in existing FPGA devices, board-level verification is infeasible. Although the use of multiplexers reduces the number of output pins, resource consumption rises significantly as the number of qubits increases, and this affects the analysis of the overall experiment. Thus, the estimated runtime of the proposed FPGA emulation architecture is obtained from the hardware clock cycle count and the operating frequency reported by the FPGA development tool. This is not a critical issue for future practical deployment, as the selected case studies are meant to work as core modules within other quantum applications that might not require parallel read-out.

6. Conclusion and Future Work

Efficient resource utilization is critical for FPGA-based implementations especially for emulating quantum computing applications as they typically exhibit exponential resource requirement with increasing number of qubits. In this paper, we proposed a baseline FPGA emulation framework with focus on the datapath design optimization based on the conventional state vector model as well as an effective methodology that facilitates accurate modelling of quantum algorithms for FPGA emulation. A serial-parallel architecture with efficient resource utilization for FPGA-based emulation of quantum computing is presented. The proposed emulation architecture achieves linear reduction in resource utilization compared to pipeline implementations as found in previous works. This work has also demonstrated the advantage of FPGA emulation over software simulation where hardware emulation of 7-qubit Grover’s search is about times faster than the software simulation performed on Intel Core i7-4790 eight-core processor running at 3.6 GHz clock rate.

However, the experimental results obtained in this work show that it is difficult to realize a scalable and flexible emulation platform for real-world quantum systems with a large number of qubits using existing state vector quantum models. This concurs with [45], which states that the practical limit on the size of the quantum system that can be modelled on a classical computing platform can hardly be overcome, owing to the exponentially large memory required to store the entire state vector. This suggests that a model with a more effective data structure for representing quantum systems is required. Recently, the work on stabilizer frames [30] has shown promise in providing a more suitable data structure for quantum models targeted for FPGA emulation. Applying this approach to FPGA emulation of large-scale quantum systems is the subject of our future work. In addition, the error-correction structure available with stabilizer frames will be considered for application in practical quantum computations. With a more efficient modelling technique, FPGA emulation can become a more efficient strategy for modelling quantum systems.

Competing Interests

The authors declare that they have no competing interests.

Acknowledgments

This work is supported by the Ministry of Higher Education (MOHE) and Universiti Teknologi Malaysia (UTM) under Fundamental Research Grant Scheme (FRGS) Vote number 4F422.