Abstract
In recent years, breakthroughs in quantum computing technology have driven rising interest in the noisy intermediate-scale quantum (NISQ) era. Alongside the growing number of qubits, the study of error correction and quantum algorithms has made great progress. However, building practicable quantum computers, a primary goal of quantum computation, still needs further study in the NISQ era, mainly on quantum computer organization, architecture, and circuit synthesis. This paper studies quantum circuit synthesis in a small-scale universal quantum computing device called a “quantum system-on-chip” (QSoC). We analyze quantum compilation for the hybrid architecture of a small-scale universal quantum computational device of a specific size (a quantum chip). Two kinds of on-chip circuit synthesis algorithms are proposed and discussed.
1. Introduction
In recent years, quantum computing science and the quantum information/communication industry have developed rapidly, and several milestones in quantum computing devices have been achieved. One remarkable advance is that the number of controllable quantum bits has increased from fewer than 20 to 20–50, or even 100. The manipulation of these qubits in a noisy environment is feasible. In 2018, Preskill introduced the term “noisy intermediate-scale quantum” (NISQ) [1] for the current quantum era.
Today, a feasible, reliable, scalable, and ultimately universal quantum computational device must meet several rigorous requirements if it is to be built at small scale and operated in a noisy environment.
Several researchers have focused on small-scale, integrated realizations of quasi-universal quantum computing, and the term “on-chip” has been used in some of this research. In 2012, on-chip quantum simulation with a superconducting circuit was first demonstrated [2]. In recent years, the on-chip method has been widely used for entangled pair generation [3, 4], quantum control [5], etc. However, there is no research on an integrated on-chip architecture for universal quantum computation; all the above-mentioned studies focus on one part of the quantum computing system, not the whole system. Our purpose in this paper is to give a tentative study of building a fully functional quantum computing system on a chip-sized device. We therefore call this architecture “quantum system-on-chip.”
We will mainly focus on the functions of a quantum system-on-chip that meet the requirements of NISQ quantum computation. In 2000, DiVincenzo proposed five conditions necessary for constructing a quantum computer [6]:
(1) A scalable physical system with well-characterized qubits
(2) The ability to initialize the state of the qubits to a simple fiducial state
(3) Long relevant decoherence times
(4) A “universal” set of quantum gates
(5) A qubit-specific measurement capability.
Although the above-mentioned conditions were proposed for experimental quantum computing devices of very small size (approximately 2–10 qubits), they remain necessary in the NISQ era, where additional criteria are needed to meet new theoretical and technical requirements. These new criteria are as follows:
(1) Several scalable qubit-based quantum resources (qubits in particular states, entanglement pairs, etc.)
(2) A hybrid “universal” computational architecture (at the “hardware” level)
(3) A set of low-level quantum procedures (at the “software” level), or “quantum primitives”
(4) A set of noise suppression and error correction mechanisms or algorithms (at both “hardware” and “software” levels).
The hybrid architecture refers to the famous principle of “quantum data + classic control,” proposed by Selinger [7]. With the new criteria and studies on a quantum computer system [8], we draw a block diagram showing the scientific (theoretical) and engineering (experimental) roadmap of a quantum computer in the NISQ era (see Figure 1).
The paper also studies a quantum circuit synthesis model that integrates most parts of the organization of a “quantum system-on-chip.” Theoretically, it meets DiVincenzo's conditions and the new criteria. We mainly focus on quantum circuit synthesis and the related architecture.
As a tentative study of a novel architecture and its synthesis, this work faces several challenges. First, the small size is very challenging, although the manipulatable qubits in QSoC.
Here is the outline of this paper: In Section 2, we study the hybrid architecture and organization of quantum computation in QSoC. A reduced quantum instruction set for universal quantum computation is analyzed, and the algorithmic realization of QSoC is discussed. In Section 3, we study the quantum compilation, an equivalence measurement of the quantum algorithm, and the corresponding quantum circuits. Some classic synthesis algorithms and a semiformal method of quantum synthesis are discussed. Quantum gate decomposition and its optimization algorithm for quantum circuit synthesis are studied in Section 4.
2. The Hybrid Architecture of the Quantum System-on-Chip
From a microscopic point of view, quantum computation performs a set of unitary transformations on a given initial set of quantum states; at a large scale, a quantum computational algorithm is a unitary matrix that acts on an initial unitary matrix of the same size. In other words, quantum algorithms are different functions whose parameters and return values are unitary matrices. The quantum evolutions in a quantum algorithm act intrinsically on the quantum variables. Although a quantum program can be compiled into a sequence of quantum gates to control qubits, from the point of view of reliability and scalability, sequenced gate instructions cost more physical and logical resources for error correction and may cause more cascading failures when the program is executed.
Here, we give the discussion based on a logical architecture rather than a physical architecture of the QSoC quantum device.
In quantum science, the design of the architecture and organization of a computing device is based on a specified computational model. The traditional architecture built on the quantum Turing machine (QTM) or quantum random-access machine (QRAM) model mainly follows an imperative paradigm. Logically, an imperative quantum computational device should have several separate units, each with its own functions. The most common imperative architecture of quantum computing devices is the quantum von Neumann architecture [9]. The quantum architecture contains a quantum memory unit, a control unit, an input/output unit, a quantum arithmetic logic unit (QALU), and a quantum bus system. In the quantum von Neumann architecture, there is no physical QALU or physical quantum control unit; all qubits are stored in a quantum memory unit (as in the organization proposed in this paper). However, at the level of software processing, the quantum von Neumann architecture and the proposed QSoC architecture differ considerably.
We now discuss the different paradigms of classic and quantum computation. In a classic computational algorithm, a series of functions update the variables in a real linear space. In a quantum computational algorithm, a single whole function updates all quantum variables in Hilbert space. This implies that the imperative paradigm is better suited to a classic computational model (architecture). For example, the von Neumann architecture of a classic computer follows an imperative paradigm: it separates all computational resources into several “units,” each with its own function, such as a processing unit, control unit, memory, and input and output mechanisms. Quantum computation, however, is closer to a declarative paradigm, meaning that a quantum computational algorithm always performs an integrated transformation on all intrinsic variables and obtains the final output (i.e., all final states of the variables inside an entire unitary matrix). A detailed discussion of the different paradigms is given in [10]. The design of a declarative computational model and declarative architecture is simpler than that of an imperative architecture. The proposed hybrid architecture of the quantum system-on-chip is based on the imperative paradigm: a quantum memory unit is the unique center of the architecture, storing all qubits and hosting quantum computations; quantum communications are performed in the quantum memory, as discussed further below.
From the universal viewpoint of a quantum computer, the essential organization of a hybrid quantum computer can be generalized as “quantum data with classic control.” This architectural model mainly contains a classic controller (or classic computing part), a quantum memory unit, a quantum computing unit, and a quantum input/output (I/O) device [10]. If a quantum computing device is designed for a specific use, e.g., quantum communication or quantum random number generation (QRNG), the organization and architecture can be simplified. The central parts of the hybrid quantum computing device are a quantum memory unit and a quantum computing unit.
In the QSoC architecture proposed in this paper, the central part of the entire system is the quantum memory unit rather than the quantum computing unit; we call this the “storage-centric” paradigm. Unlike in early quantum computers (the so-called “computing-centric” paradigm), the quantum memory is the place where quantum states (e.g., quantum variables) are stored and where quantum computation is conducted. We have designed several mechanisms, such as the decoherence-free subspace and hierarchical QECC, to suppress decoherence in the quantum computing process and in qubit states. In the NISQ era, these mechanisms can also be considered in the design of QSoC; see Figure 2 and details in [8, 10].
Now, we give a detailed description and a comparative study of the “storage-centric” and “computing-centric” paradigms. In the traditional quantum computation model, similar to the von Neumann architecture based on the classic Turing machine model, there is a quantum memory subsystem at the hardware level. This traditional quantum memory subsystem plays the complete role of quantum storage, holding both the original input quantum data and the quantum data generated during computation. It is a crucial component of the traditional universal quantum computer. Under this architecture, quantum memories communicate with the quantum arithmetic and logic unit (QALU) during quantum computing through a quantum state transfer (QST) process, which can be performed with a “quantum wire” (using entanglement) or with other techniques. A QST allows quantum information to be transferred from one physical system to another without disturbing its quantum state. These communication mechanisms allow quantum information to be stored in and retrieved from a quantum memory subsystem. However, the QST technique has several disadvantages and challenges:
(1) QST requires the preservation of delicate quantum states over a relatively long distance (due to the separation of the quantum memory subsystem and the QALU subsystem). However, quantum states are susceptible to decoherence, the loss of quantum coherence due to interactions with the environment, which can limit the distance over which quantum state transfer is possible.
(2) Error correction can also be challenging in QST because of the no-cloning theorem, so quantum errors may accumulate in this architecture.
(3) QST often requires entanglement, which is a limited and valuable resource in quantum computing. In addition, the creation and manipulation of entangled states can be resource-intensive and time-consuming.
(4) Finally, QST can be a complex process involving multiple physical systems and control mechanisms. This complexity can increase the likelihood of errors and decrease the overall efficiency of quantum information processing.
In comparison, the “storage-centric” paradigm moves all quantum computing processing into the quantum memory unit, which eliminates quantum state transfer between the quantum memory subsystem and the QALU. All that is needed is the qubit resource in the quantum memory unit, which effectively makes the QALU a coprocessor.
Under the memory architecture (shown in Figure 2) with the “storage-centric” paradigm, quantum memory units tend to have simpler designs than computing-centric systems, because they unify the logical functionality of computation and storage. The integrated design of quantum memory units, e.g., decoherence-free subspace (DFS) control and code teleportation modules, EEU, and SPU, requires less sophisticated control mechanisms to manipulate and measure quantum states in real time. The second advantage is that the storage-centric paradigm allows error-correcting codes and DFS to protect quantum information from decoherence and other errors during both storage and computation. In contrast, the computing-centric paradigm must use different kinds of error correction to maintain the accuracy of storage and of computation. In terms of scalability, the storage-centric paradigm makes it easier to scale up both the quantum storage capacity and the usable number of qubits in quantum computation.
The quantum computing instructions in QSoC perform a universal transformation on quantum data (variables) with a given initial or intermediate quantum state. A quantum gate set is a kind of interface between the hardware and the system software (quantum compiler). A hardware quantum unitary gate set is a set $\mathcal{G} = \{g_\alpha(\theta)\}$, where each $g_\alpha$ is a single-qubit or two-qubit quantum gate that may contain a continuous internal parameter $\theta$. The set $\mathcal{G}$ is thus a discrete set with a continuous variable. The size of the quantum gate set can be small (e.g., IBM’s IBMQX4 5-qubit experimental quantum computing device has a gate set that contains only 10 universal gates [11]). The quantum instruction set is designed based on the quantum gate set and the architecture of the quantum computing device. Like a classic instruction set, it can be designed as a reduced set (reduced instruction set computer, RISC) or a complex set (complex instruction set computer, CISC). A more detailed discussion of quantum instruction sets can be found in [12].
The QSoC from this paper provides a reduced instruction set, which contains five kinds of instructions: pure quantum instructions, quantum initialization, quantum evolution, scheduling, and measurement (finalization).
In a memory unit of QSoC, the quantum bits are organized in a lattice structure, as shown in Figure 2. Each qubit pool keeps its qubits in several rows and columns; they share an entropy exchange unit (EEU) and a state purification unit (SPU) [13]. The strength of quantum coherence between adjacent qubits can be measured and manipulated by a quantum program, which is realizable in NISQ [14, 15]. Such inter-qubit coherence links up all qubits in a pool and their characteristics in a “quantum wire.” The disadvantage of the qubit lattice topology is that it costs superpolynomial SWAP operations in the quantum circuit. However, the EEU can eliminate the extra entropy produced by SWAP operations by using specific protocols [16].
With the protection designed into the structure and organization of QSoC, all quantum algorithms can be programmed and compiled into quantum circuit language and then assembled into quantum gate-based instructions.
In quantum algorithms, frequently performed procedures can be reused. We call these reusable procedures quantum primitives. The set of quantum primitives mainly includes the quantum Fourier transform (QFT), quantum phase estimation, fast quantum modular exponentiation, and quantum algorithms for linear systems of equations (e.g., the HHL algorithm).
3. Quantum Compilation in the Quantum System-on-Chip
3.1. Quantum Compilation in the NISQ Era
Theoretically, quantum computation can provide speedup (polynomial or even exponential) for specific algorithms, such as factorization, approximate optimization, and simulation of quantum systems. Building NISQ computers for practical applications is challenging. The main goal of the system software is to transform formal quantum algorithms into operational sequences that meet the requirements of NISQ and can ultimately be executed on NISQ equipment. Technically, the limitations of NISQ include the following: (1) the number of qubits, (2) connections (quantum wires) between qubits, (3) quantum gates (specific to the hardware), and (4) the circuit depth, which is limited by noise. Quantum algorithms are hardware-independent, but quantum programs and applications are hardware-dependent. Quantum compilation bridges the gap between the requirements of algorithms and those of programs, which is very important for realizing the quantum system-on-chip.
In general, the execution of a quantum algorithm is assumed to run sequenced instructions on an idealized architecture with perfect and sufficient qubits, quantum gates, and an error-free environment. However, in actual NISQ quantum devices (e.g., IBM, Rigetti Computing [17], Google [18], and Intel [19]), the environment is imperfect. Quantum compilation is used to convert a higher-level algorithm into a lower-level sequence that can be executed on the NISQ device.
In addition, given the limited quantum gate resources, the compiler can shorten the sequence of gates in a circuit. For example, an $n$-qubit quantum Fourier transform (QFT) is a composition of Hadamard gates and controlled rotation gates, and the number of gates in the algorithm is in $O(n^2)$. An optimized quantum compiler exploits the structure of the QFT, so the compiled output is more accurate than one based only on the structure of the deep gate sequence. This optimization can also be applied to quantum oracle design, noise reduction, and error suppression.
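The quadratic gate count of the QFT can be checked with a short calculation. The sketch below assumes the textbook QFT circuit: one Hadamard per qubit, one controlled rotation per qubit pair, and an optional final layer of SWAPs for bit reversal. The function name `qft_gate_counts` is illustrative and not taken from the paper.

```python
def qft_gate_counts(n, include_swaps=True):
    """Gate counts of the textbook n-qubit QFT circuit (no approximation)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    counts = {
        "hadamard": n,                             # one Hadamard per qubit
        "controlled_rotation": n * (n - 1) // 2,   # one per pair of qubits
        "swap": n // 2 if include_swaps else 0,    # final bit-reversal layer
    }
    counts["total"] = sum(counts.values())
    return counts


if __name__ == "__main__":
    for n in (2, 4, 8, 16):
        print(n, qft_gate_counts(n))
```

The controlled-rotation term n(n-1)/2 dominates, which is where the quadratic growth comes from.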
3.2. Evaluation of Quantum Compilation
Several methods can evaluate the efficiency of a quantum compilation system. A norm-based method is discussed below.
In quantum computing theory, if the same quantum transform is simulated on different circuits, the compilation results can differ. A norm can evaluate the quality of quantum compilation and circuit synthesis, that is, the goal is to minimize $\|U - V\|$, where the unitary operator $U$ describes the quantum transform and the operator $V$ is the output of the compiler or synthesis algorithm. An error bound $\epsilon$ is set for the optimization, i.e., $\|U - V\| < \epsilon$. In QSoC, the evaluation method is the Hilbert-Schmidt norm [13], due to its low cost of evaluation:
$$\|U - V\|_{HS} = \sqrt{\mathrm{Tr}\left[(U - V)^\dagger (U - V)\right]} \quad (1)$$
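As a minimal sketch of the norm-based evaluation, the following NumPy code computes the Hilbert-Schmidt distance between a target unitary and a compiled one and compares it with an error bound. The function names and the tolerance argument `eps` are assumptions for illustration, not the paper's implementation.

```python
import numpy as np


def hilbert_schmidt_distance(u, v):
    """Return sqrt(Tr[(U - V)^dagger (U - V)]) for two square matrices."""
    diff = np.asarray(u, dtype=complex) - np.asarray(v, dtype=complex)
    # Tr(A^dagger A) is real and non-negative; np.real drops rounding noise.
    return float(np.sqrt(np.real(np.trace(diff.conj().T @ diff))))


def within_tolerance(u, v, eps):
    """True when the compiled operator v approximates u to within eps."""
    return hilbert_schmidt_distance(u, v) < eps


if __name__ == "__main__":
    identity = [[1, 0], [0, 1]]
    pauli_x = [[0, 1], [1, 0]]
    print(hilbert_schmidt_distance(identity, pauli_x))
    print(within_tolerance(pauli_x, pauli_x, 1e-9))
```

The evaluation needs only one matrix product and a trace, which is consistent with the low evaluation cost mentioned in the text.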
3.3. Quantum Circuit Synthesis by Heuristic Search
Traditional quantum circuit synthesis algorithms mainly include cosine-sine decomposition (CSD) [20] and UniversalQ [21].
This section introduces a new algorithm, quantum simulated annealing circuit synthesis (QSACS), which uses simulated annealing (SA) and machine learning to optimize quantum compilation and circuit synthesis. QSACS minimizes the objective function (1) using a search strategy.
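Formally, we consider a quantum gate set $\mathcal{G} = \{g_\alpha(\theta)\}$, where $\alpha$ is a discrete parameter and $\theta$ is a continuous parameter. The compilation algorithm outputs a quantum gate sequence $V(\alpha, \theta) = g_{\alpha_L}(\theta_L) \cdots g_{\alpha_1}(\theta_1)$. In QSoC, compiling a unitary transformation $U$ into a sequence of gates with sequence length $L$ can be described as the following optimization problem:
$$\min_{\alpha, \theta} C\big(U, V(\alpha, \theta)\big) \quad (2)$$
where $V(\alpha, \theta)$ is a trainable machine; in fact, it is a unitary transformation.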
In the architecture of QSoC, the parameters of topology and interconnection are defined. The discrete parameter $\alpha$ describes the topology of an entire circuit, which is a subspace of the QSoC’s circuit space; the continuous parameter $\theta$ describes the characteristics of each quantum gate, which is a combination of the quantum gates defined in the instruction set. The sequence length $L$ is held constant during optimization, while the other parameters may vary during the optimization training procedure. The function $C$ is a cost function that describes the “distance” between the trained and the target unitary transformations. For any unitary transformations $U$ and $V$, $C(U, V) \ge 0$, and $C(U, V) = 0$ if and only if $U = V$.
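The split between discrete and continuous parameters can be made concrete with a small sketch. It assumes a hypothetical gate set of single-qubit Pauli rotations plus a controlled-Z gate; the discrete parameter is a list of (gate name, qubits) pairs, the continuous parameter is a list of angles, and the cost is the squared Hilbert-Schmidt distance divided by 2d. None of these choices is confirmed by the paper; they only illustrate a cost that is non-negative and vanishes exactly when the two unitaries coincide.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
PAULI = {
    "rx": np.array([[0, 1], [1, 0]], dtype=complex),
    "ry": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "rz": np.array([[1, 0], [0, -1]], dtype=complex),
}


def rotation(name, theta):
    """exp(-i theta P / 2) = cos(theta/2) I - i sin(theta/2) P."""
    return np.cos(theta / 2) * I2 - 1j * np.sin(theta / 2) * PAULI[name]


def embed_single(gate, qubit, n):
    """Place a 2x2 gate on `qubit` (0 = most significant) of n qubits."""
    out = np.array([[1.0 + 0j]])
    for q in range(n):
        out = np.kron(out, gate if q == qubit else I2)
    return out


def cz(a, b, n):
    """Controlled-Z on qubits a and b: diagonal, -1 where both bits are 1."""
    diag = np.ones(2 ** n, dtype=complex)
    for basis in range(2 ** n):
        if (basis >> (n - 1 - a)) & 1 and (basis >> (n - 1 - b)) & 1:
            diag[basis] = -1
    return np.diag(diag)


def build_unitary(alpha, theta, n):
    """alpha: list of (name, qubits); theta: list of angles (unused for 'cz')."""
    v = np.eye(2 ** n, dtype=complex)
    for (name, qubits), angle in zip(alpha, theta):
        if name == "cz":
            gate = cz(qubits[0], qubits[1], n)
        else:
            gate = embed_single(rotation(name, angle), qubits[0], n)
        v = gate @ v  # later gates in the sequence act after earlier ones
    return v


def cost(u, v):
    """Squared Hilbert-Schmidt distance divided by 2d (an assumed normalisation)."""
    d = u.shape[0]
    return float(np.linalg.norm(u - v, "fro") ** 2 / (2 * d))


if __name__ == "__main__":
    alpha = [("rx", (0,)), ("cz", (0, 1)), ("rz", (1,))]
    theta = [0.3, 0.0, 1.1]
    v = build_unitary(alpha, theta, 2)
    print(cost(v, v), cost(v, np.eye(4)))
```

Changing an entry of `alpha` alters the circuit structure, while changing an entry of `theta` only moves the circuit within a fixed structure, which is the distinction the optimization relies on.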
With topology-aware optimization, the QSACS method can obtain the optimal circuit topological mapping for the application quantum algorithms. The proposed method has two advantages: the system can use the declarative paradigm of quantum computing, treating the computation as a whole unitary transformation; and the topology-aware method can reduce the computational cost of the quantum part in a QSoC system.
4. Quantum Circuit Synthesis for the Quantum System-on-Chip
In Section 3.3, we introduced a quantum circuit synthesis algorithm based on machine learning and simulated annealing (SA). The algorithm defines a cost function and optimizes the target circuit in circuit space by heuristic search. Its output is the circuit with the lowest cost found. Its strategy is as follows: if a structural change increases the cost of the synthesized circuit, the algorithm decides whether to accept the change with a specific probability. The acceptance probability decreases exponentially with the increase in distance (cost) caused by the change.
As discussed in Section 2, in the hybrid architecture the classic part of the QSoC maintains a classic version of all quantum transformations corresponding to the quantum algorithm to be executed. The classic part is a coprocessor that assists the quantum part with classical computations in real-time circuit synthesis.
For the quantum part of QSoC, the memory unit is in its initial state at the time of circuit synthesis, and all quantum variables are in their prepared states. Notice that, in QSoC, there is no quantum processing unit (QPU). The prepared quantum circuit topology is kept in classic storage (classic memory), similar to von Neumann’s stored-program computer. No gate-based sequence is generated by the circuit synthesis procedure, only a declarative circuit-based topological mapping from the classic part of QSoC to the memory unit in the quantum part of QSoC. A detailed discussion of quantum circuit synthesis by heuristic search is given in [22].
We consider a unitary transformation $U$ and a gate sequence $V$ acting on a $d$-dimensional space of $n$ qubits, where $d = 2^n$. We define the cost function from (1) and (2):where is real, when the term is close to 1, and the sequences $U$ and $V$ are equivalent in the global system when the loss function equals 0. That means the unitary transformation and the gate sequence are operationally equivalent in the given quantum algorithm. This synthesis is simple and efficient.
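As a hedged illustration (an assumption, not necessarily the authors' exact definition), one normalised cost that follows from the Hilbert-Schmidt norm can be derived as below, for $d \times d$ unitaries $U$ and $V$ with $d = 2^n$. The symbols $U$, $V$, $d$, and $C$ are chosen here for illustration.

```latex
\begin{aligned}
\|U - V\|_{HS}^{2}
  &= \mathrm{Tr}\big[(U - V)^{\dagger}(U - V)\big] \\
  &= \mathrm{Tr}(U^{\dagger}U) + \mathrm{Tr}(V^{\dagger}V)
     - \mathrm{Tr}(U^{\dagger}V) - \mathrm{Tr}(V^{\dagger}U) \\
  &= 2d - 2\,\mathrm{Re}\,\mathrm{Tr}(U^{\dagger}V), \\[4pt]
C(U, V)
  &:= \frac{\|U - V\|_{HS}^{2}}{2d}
   = 1 - \frac{1}{d}\,\mathrm{Re}\,\mathrm{Tr}(U^{\dagger}V).
\end{aligned}
```

With this form, $C(U, V) \ge 0$, and $C(U, V) = 0$ exactly when $\mathrm{Tr}(U^{\dagger}V) = d$, that is, when $U = V$. A variant that replaces $\mathrm{Re}\,\mathrm{Tr}(U^{\dagger}V)$ with $|\mathrm{Tr}(U^{\dagger}V)|$ vanishes whenever $V = e^{i\varphi}U$, so it treats circuits that differ only by a global phase as equivalent.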
A simplified block diagram of the proposed circuit synthesis is shown in Figure 3.
4.1. Method
In Section 3, a probabilistic methodology for optimizing the synthesis problem was proposed. A good method may use randomly constructed gate sequences to compute a large amount of data. Half of the data are used to train the algorithm, namely, to optimize the cost function; the other half is used as a test dataset to evaluate the performance of the trained model. The training data should cover all possible input branch spaces of a given algorithm. Because the algorithm (compiled gate sequence) is a linear mapping from the data qubit space (of dimension $2^n$) to a real number (of dimension one), the estimated amount of training data needed to minimize the cost function is , where $n$ is the number of data qubits. The proposed method uses a unitary matrix of size $2^n \times 2^n$, and the estimated number of constraints required to determine the algorithm parameters is . These data can be processed by a classic computer in the hybrid model of QSoC.
The structural update discussed in Section 3 refers to the dynamic modification of the discrete parameters, including the random replacement of some gates in the generated gate sequence with new quantum gates. In the update process, the type and/or position of a given quantum gate may be changed, after which the cost function is reoptimized over the continuous parameter $\theta$. Due to the restrictions of the structure, reducing the gate sequence is usually not easy. Therefore, a key point of structural optimization is that every change in the discrete parameters requires reoptimizing the continuous parameters; this is very important for optimization efficiency.
We chose the backtracking algorithm to optimize the continuous parameters: all single-qubit gates are updated in operation order, meaning that only one single-qubit gate is updated at a time while the other quantum gates are kept in quantum memory. After determining an optimal quantum gate (i.e., the gate that minimizes the cost function), the algorithm moves to the next quantum gate. The algorithm can also change the update order of the quantum gates randomly to avoid local minima as far as possible. We use gradient descent for the continuous parameter optimization of a single quantum gate: because a single-qubit gate can be described by three real parameters, gradient descent operates efficiently in a three-dimensional space. The algorithm repeats the optimization of the continuous parameters until the cost function converges [22].
After each iteration (including the update of both discrete and continuous parameters), the cost is computed. If a structural change lowers the cost, it is accepted; if the cost increases, the algorithm decides whether to accept the structural change with a specific probability. This process behaves as simulated annealing (SA), which can effectively mitigate the problem of locally optimal solutions. The algorithm aims to find the global optimum rather than settle in a local minimum. Such a synthesis algorithm has a greater chance of “hitting” the optimal solution for the circuit.
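The acceptance step described above can be sketched with the standard Metropolis rule from simulated annealing. The geometric cooling schedule, the parameter names, and the toy cost used in the demonstration are all assumptions for illustration; the paper does not state its schedule or probability formula here.

```python
import math
import random


def accept_change(old_cost, new_cost, temperature, rng=random):
    """Metropolis rule: keep improvements, accept worse moves with exp(-delta/T)."""
    delta = new_cost - old_cost
    if delta <= 0:
        return True
    if temperature <= 0:
        return False
    return rng.random() < math.exp(-delta / temperature)


def anneal(initial, cost_fn, propose, steps=1000, t_start=1.0, t_end=1e-3, seed=0):
    """Generic annealing loop; `propose(state, rng)` returns a modified state."""
    rng = random.Random(seed)
    current, current_cost = initial, cost_fn(initial)
    best, best_cost = current, current_cost
    for k in range(steps):
        # Geometric cooling from t_start to t_end (an assumed schedule).
        temperature = t_start * (t_end / t_start) ** (k / max(steps - 1, 1))
        candidate = propose(current, rng)
        candidate_cost = cost_fn(candidate)
        if accept_change(current_cost, candidate_cost, temperature, rng):
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
    return best, best_cost


if __name__ == "__main__":
    # Toy stand-in for a structural search: walk over integers towards 3.
    result = anneal(0, lambda x: (x - 3) ** 2,
                    lambda x, rng: x + rng.choice([-1, 1]), steps=500)
    print(result)
```

In a synthesis setting, `propose` would replace or move a gate in the sequence and `cost_fn` would reoptimize the continuous parameters before returning the cost.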
Another problem in this method is the length of the generated gate sequence. The simulation on NISQ shows that the upper bound on the length of the gate sequence is nearly eight times the number of usable physical qubits. If an overlong sequence is adopted, the cost of error correction grows disproportionately. Limiting the length of the gate sequence is therefore important.
After several iterations, the algorithm checks whether the current gate sequence can be reduced. It looks for a subsequence that can be replaced by an equivalent, smaller set of gates. If one exists, it modifies the current gate sequence accordingly, so that the gate sequence is shortened without increasing the cost. The subsequence compression follows the same process as the main algorithm and is likewise iterative over both discrete and continuous parameter optimization. The same method can be applied recursively since the size of the quantum gates is fixed. This compression may yield a gate sequence shorter than the fixed length $L$. Regularly compressing the gate sequence is particularly useful in the search for global optima, because the random update of discrete parameters can generate gate sequences that contain redundancy. Such gate sequences often reach local minima of the cost after continuous optimization, but if the redundant subsequences are removed, these local minima can be avoided.
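One simple instance of removing redundant subsequences is merging adjacent rotations about the same axis on the same qubit, since their angles add, and dropping rotations that reduce to the identity. The sketch below assumes a hypothetical representation of the sequence as (axis, qubit, angle) tuples and handles only strictly adjacent gates; it is not the compression procedure of the paper.

```python
import math


def compress(sequence, tol=1e-9):
    """Merge adjacent same-axis, same-qubit rotations and drop identities.

    sequence: list of (axis, qubit, angle) single-qubit rotations.
    """
    out = []
    for axis, qubit, angle in sequence:
        # Rotations about one axis on one qubit compose by adding angles.
        if out and out[-1][0] == axis and out[-1][1] == qubit:
            _, _, previous = out.pop()
            angle = previous + angle
        # exp(-i angle P / 2) has period 4*pi as a matrix.
        reduced = math.fmod(angle, 4 * math.pi)
        if min(abs(reduced), abs(abs(reduced) - 4 * math.pi)) < tol:
            continue  # exact identity: removing it cannot change the cost
        out.append((axis, qubit, reduced))
    return out


if __name__ == "__main__":
    seq = [("rz", 0, 0.3), ("rz", 0, 0.4), ("rx", 1, 1.0), ("rx", 1, -1.0)]
    print(compress(seq))
```

Because the output list is used as a stack, removing an identity exposes the gate before it, so a cancellation can trigger further merges in the same pass.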
The algorithm iterates the process until the cost function converges or the number of iterations reaches the limit; the final gate sequence circuit is then obtained. For a fixed gate sequence length $L$, this algorithm can usually obtain an approximately optimal compilation of the quantum algorithm $U$, which is sufficient to realize an equivalent circuit generation. If a more precise (or better) compilation is required, more elaborate compilation methods should be considered, such as hierarchical compilation.
4.2. Threshold of Synthesis
The threshold is set in the algorithm to speed up the compilation process. When the algorithm searches for a circuit, it is terminated and generates a circuit when the cost reaches the acceptability threshold . The threshold is determined by the following three criteria: (1) the new circuit can implement the original quantum algorithm with a given accuracy, (2) the generated unitary gate sequence should do the same task as the original unitary transformation, and (3) the algorithm can be executed within a reasonable time. To meet these criteria, we use the thresholds with different precision in the simulation and introduce a threshold of .
4.3. Post-compilation Optimization
In the synthesis procedure, the algorithm optimizes the structure of the gate sequence, and an approximate compilation of the quantum algorithm $U$ as a gate sequence of length $L$ is ultimately obtained. Next, the obtained sequence is optimized by the gradient descent-based machine learning method.
Suppose the gate sequence generated by a compiler is , the gradient of can be described as follows:
The gradient of one single gate can be described as follows:which refers to the gradient of the gate of th matrix element of the given gate sequence .
For example, we take the rotation gate according to (5), and the derivative with respect to its parameter can be obtained:
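As a worked illustration, assume the rotation gate is the standard single-qubit rotation about a Pauli axis $P \in \{X, Y, Z\}$ with angle $\theta$ (the paper's own convention may differ by a sign or a factor). Using $P^2 = I$:

```latex
\begin{aligned}
R_P(\theta) &= e^{-i\theta P/2}
             = \cos\frac{\theta}{2}\, I - i \sin\frac{\theta}{2}\, P, \\[4pt]
\frac{\partial R_P(\theta)}{\partial \theta}
            &= -\frac{1}{2}\sin\frac{\theta}{2}\, I - \frac{i}{2}\cos\frac{\theta}{2}\, P
             = -\frac{i}{2}\, P\, R_P(\theta)
             = \frac{1}{2}\, R_P(\theta + \pi).
\end{aligned}
```

For a gate sequence $V = g_L \cdots g_k(\theta_k) \cdots g_1$ in which only $g_k$ depends on $\theta_k$, the product rule gives $\partial V / \partial \theta_k = g_L \cdots \big(\partial g_k / \partial \theta_k\big) \cdots g_1$, so each partial derivative is itself a product of the same length with one factor replaced.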
Here, we only consider the parametrized single-qubit gates, and all are rotation gates. The gradient between them can be easily evaluated.
Both the time and space complexities of the gradient optimization are in polynomials: the primary compute component is to compute the gradient , the Kronecker product of a dimension unitary matrix takes steps, and the normal product of 2 unitary matrices takes steps. So, to compute the gradient and take optimization in a quantum circuit with depth in and with the iteration round , the total time complexity should be and the space complexity should be , due to the fact that the memory for computing can be released between every iteration round.
Algorithm 1 is designed to optimize the continuous parameter via gradient descent method [22].
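A minimal sketch of continuous-parameter optimization by gradient descent is given below for a single qubit. It is an illustration under stated assumptions rather than a reproduction of Algorithm 1: the gates are Pauli rotations with fixed axes, the cost is $1 - \mathrm{Re}\,\mathrm{Tr}(U^\dagger V)/d$, and the learning rate, iteration cap, and stopping tolerance are arbitrary choices.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
PAULI = {
    "x": np.array([[0, 1], [1, 0]], dtype=complex),
    "y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "z": np.array([[1, 0], [0, -1]], dtype=complex),
}


def rot(axis, theta):
    """exp(-i theta P / 2) for the Pauli matrix P named by `axis`."""
    return np.cos(theta / 2) * I2 - 1j * np.sin(theta / 2) * PAULI[axis]


def sequence_unitary(axes, thetas):
    """Product of rotations; the first list entry acts first."""
    v = I2.copy()
    for axis, theta in zip(axes, thetas):
        v = rot(axis, theta) @ v
    return v


def cost(u, v):
    """Assumed cost: 1 - Re Tr(U^dagger V) / d, zero exactly when V equals U."""
    return float(1.0 - np.real(np.trace(u.conj().T @ v)) / u.shape[0])


def gradient(u, axes, thetas):
    """Analytic gradient using d/dtheta rot = -(i/2) P rot."""
    grad = np.zeros(len(thetas))
    for k in range(len(thetas)):
        dv = I2.copy()
        for j, (axis, theta) in enumerate(zip(axes, thetas)):
            gate = rot(axis, theta)
            if j == k:
                gate = -0.5j * PAULI[axis] @ gate  # differentiate gate k only
            dv = gate @ dv
        grad[k] = -np.real(np.trace(u.conj().T @ dv)) / u.shape[0]
    return grad


def optimize(u, axes, thetas0, lr=0.5, max_iters=5000, eps=1e-12):
    """Plain gradient descent on the angles with a cost-threshold stop."""
    thetas = np.array(thetas0, dtype=float)
    for _ in range(max_iters):
        if cost(u, sequence_unitary(axes, thetas)) < eps:
            break
        thetas -= lr * gradient(u, axes, thetas)
    return thetas, cost(u, sequence_unitary(axes, thetas))


if __name__ == "__main__":
    axes = ["z", "y", "z"]
    target = sequence_unitary(axes, [0.4, 0.9, -0.3])
    print(optimize(target, axes, [0.1, 0.5, 0.0]))
```

The three-angle Z-Y-Z sequence reflects the statement that a single-qubit gate is described by three real parameters; for multi-qubit circuits the same gradient formula applies with each gate embedded in the full register.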

This algorithm can obtain an approximate compilation result of , where .
5. Conclusion
Quantum circuit synthesis (quantum compilation) is a fundamental technique in the NISQ era: it generates equivalent circuits for quantum processors from a given unitary transformation. Various limitations of the QSoC device (such as the limited circuit depth, limited topology, and limited number of quantum gates) restrict the realization of quantum algorithms. Circuit synthesis adapts quantum algorithms to the quantum system-on-chip under these limited conditions. The development of an efficient and high-fidelity quantum circuit synthesis algorithm has become a focus of QSoC research.
This paper focuses on the quantum circuit synthesis algorithm, which converts a higher-level abstract algorithm into a lower-level form to be executed on NISQ devices. With this algorithm, we improve general quantum algorithms for QSoC. The proposed method adopts a decomposition method based on heuristic search. Once a circuit synthesis problem is formalized, a high-fidelity output circuit can be obtained through iterative optimization of the target. During the iterations, we use gradient descent and simulated annealing to optimize the cost function.
Compared with the traditional quantum computational architecture with a quantum processing unit (QPU), the quantum memory-centered architecture proposed in this paper has two advantages. First, all quantum variables are independent of a physical quantum storage unit, making all quantum variables “logical”. Second, all procedures executed by quantum computation can be separated into different logical subprocedures, which can be executed in the quantum ALU part, the quantum memory unit, or even the classic part. However, this adds difficulty to preprocessing, because the “whole” logic processing of a set of variables is separated into a number of small processing steps.
We also study the topology-aware synthesis algorithm. It generates the corresponding gate sequence based on the quantum gate set and topological structure of the QSoC.
For further study, an open quantum system will be modeled in the noisy environment of the QSoC. The synthesis of circuits in an open environment (that is, without QECC or with weak error tolerance) is of interest. Furthermore, circuit synthesis-specific instruction sets, which can improve the performance and speed of the circuit synthesis procedure, are worth studying.
Data Availability
No data were used to support this study.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work was supported by the National Key R&D Program of China (grant no. 2019YFA0308700) and the Chinese National Natural Science Foundation of Innovation Team (grant no. 61321491).