Abstract

We develop a formal verification procedure to check that elastic pipelined processor designs correctly implement their instruction set architecture (ISA) specifications. The notion of correctness we use is based on refinement. Refinement proofs are based on refinement maps, which, in the context of this problem, are functions that map elastic processor states to states of the ISA specification model. Data flow in elastic architectures is complicated by the insertion of any number of buffers in any place in the design, making it hard to construct refinement maps for elastic systems in a systematic manner. We introduce token-aware completion functions, which incorporate a mechanism to track the flow of data in elastic pipelines, as a highly automated and systematic approach to constructing refinement maps. We demonstrate the efficiency of the overall verification procedure based on token-aware completion functions using six elastic pipelined processor models based on the DLX architecture.

1. Introduction

Continued technology scaling has given rise to a set of previously ignored design challenges, such as manufacturing and process variability and the increasing significance of wire delays. These challenges threaten the effectiveness of synchronous design paradigms at the system level, and several alternative design paradigms have been proposed to deal with them. One popular trend is latency-insensitive design, which allows for variability in data propagation delays [1]. Synchronous Elastic Networks (SENs) [2, 3] have been proposed as an effective approach to designing latency-insensitive systems.

Verification is one of the critical challenges that any design approach must overcome to succeed. We present a novel, highly automated formal verification solution for latency-insensitive pipelined microprocessors developed using the SEN approach (hereafter referred to as elastic processors). Note that correctness proofs have been provided for methods that synthesize elastic designs from synchronous designs [2], but such proofs are not a substitute for verification. Our verification approach shows that the elastic processor correctly implements all behaviors of its instruction set architecture (ISA) model, which serves as the high-level specification for the processor. The notion of correctness that we use is Well-Founded Equivalence Bisimulation (WEB) refinement, which is described in detail in [4]. To establish that the elastic processor (implementation) refines, that is, correctly implements, its ISA (specification), it is sufficient to prove that the two satisfy the following core WEB refinement correctness formula.

Definition 1.1 (Core WEB Refinement Correctness Formula). One has

$$\langle \forall w \in \mathit{IMPL} :: s = r(w) \land u = \mathit{Sstep}(s) \land v = \mathit{Istep}(w) \land u \neq r(v) \rightarrow s = r(v) \land \mathit{rank}(v) < \mathit{rank}(w) \rangle \tag{1}$$

In the formula above, $\mathit{IMPL}$ denotes the set of implementation states, $\mathit{Istep}$ is a step of the implementation machine, and $\mathit{Sstep}$ is a step of the specification machine. The refinement map $r$ is a function that maps implementation states to specification states. The refinement map can be thought of as an instrument for viewing the behaviors of the implementation machine at the specification level, which allows verification tools to compare the behaviors of the two systems easily. The function $\mathit{rank}$ is used for deadlock detection. Our focus in this work is to check safety, that is, to show that if the implementation makes progress, then the result of that progress is correct as specified by the high-level specification. Ignoring $\mathit{rank}$, formula (1) states that every implementation step either corresponds to a specification step ($u = r(v)$) or is a stuttering step that leaves the mapped specification state unchanged ($s = r(v)$). We plan to address deadlock detection in future work, and we therefore ignore $\mathit{rank}$ for now.
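For illustration only, the following Python sketch checks the safety part of formula (1), with the $\mathit{rank}$ conjunct dropped, by brute-force enumeration over a tiny hypothetical machine. The toy counter, its stall bit, and the names `istep`, `sstep`, `r`, and `check_safety` are assumptions made for this example; in our methodology the formula is instead checked with a decision procedure, and only over reachable states.

```python
from itertools import product

N = 4  # hypothetical toy specification: a counter modulo N


def sstep(s):
    """Specification step Sstep: increment the counter."""
    return (s + 1) % N


def istep(w):
    """Toy implementation step Istep on w = (x, p, busy).

    x is the committed counter, p in {0, 1} is an in-flight increment,
    and busy marks a stall cycle in which nothing visible happens.
    """
    x, p, busy = w
    if busy:
        return (x, p, False)           # stall: a stuttering step
    return ((x + p) % N, 1, True)      # retire the in-flight increment, fetch a new one


def buggy_istep(w):
    """Same as istep, but retires the in-flight increment twice."""
    x, p, busy = w
    if busy:
        return (x, p, False)
    return ((x + 2 * p) % N, 1, True)


def r(w):
    """Flushing-style refinement map: complete the in-flight increment."""
    x, p, _ = w
    return (x + p) % N


def check_safety(states, impl_step, spec_step, refine):
    """Return the states violating formula (1) with the rank conjunct dropped."""
    violations = []
    for w in states:
        s = refine(w)
        u = spec_step(s)
        v = impl_step(w)
        # u != r(v) -> s == r(v): each step matches Sstep or stutters.
        if u != refine(v) and s != refine(v):
            violations.append(w)
    return violations


IMPL = list(product(range(N), (0, 1), (False, True)))
print(check_safety(IMPL, istep, sstep, r))             # []
print(len(check_safety(IMPL, buggy_istep, sstep, r)))  # 4
```

The stall cycles are accepted only because of the stuttering disjunct; the buggy step is rejected in the four states that hold an in-flight increment and are not stalled.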

A refinement-based verification methodology involves the following steps: (a) construct models of the specification and the implementation, (b) compute the states of the implementation model that are reachable from reset (known as reachable states), (c) construct a refinement map, and (d) use the models and the refinement map to state the refinement-based correctness formula (excluding deadlock detection) for the implementation model, which can then be checked automatically for all reachable states using a decision procedure. Modeling and verification are performed using ACL2-SMT [5], a system developed by combining the ACL2 theorem prover (version 3.3) with the Yices decision procedure (version 1.0.10) [6].

The primary challenge in applying the refinement-based approach to elastic pipelines is as follows. A very attractive property of elastic systems is that they allow buffers (known as elastic buffers) to be inserted anywhere in the data path to deal with the propagation delays of long wires, without altering the functionality of the system. However, inserting these buffers can drastically change the data flow patterns of the system, making it hard to compute refinement maps for such systems. Our primary contribution is a procedure, which we call token-aware completion functions (described in Section 3), that computes refinement maps for elastic pipelined systems even after elastic buffers have been inserted anywhere in the data path. The procedure enables highly automated and efficient verification of elastic pipelined systems. We demonstrate the effectiveness of our verification method using six DLX-based elastic pipelined processor models, which are described in Section 2. Verification results are given in Section 4, and we conclude in Section 5. Due to limited space, we refer the reader to the literature for background on synchronous elastic networks [2, 3] and refinement [4].

Note that, to our knowledge, this is the first approach that aims to verify the correctness of elastic pipelined processors against their high-level nonpipelined ISA specifications. In previous work, we developed an equivalence checking approach to verify elastic pipelines against their synchronous parent pipelines [7].

2. Elastic Processor Models

The elastic processor models are based on the 5-stage DLX pipeline. The elastic processor models and their nonpipelined ISA-level specifications are described in the ACL2 programming language and are defined at the term level, because term-level abstractions make the verification problem tractable. We use the ACL2-SMT system for verification because it can reason at the term level. Note that bit-level versions of these models were used in [7]. The models were obtained by first elasticizing a synchronous 5-stage DLX processor using the Synchronous Elastic Flow (SELF) protocol [2]. The main idea is to replace all flip-flops with elastic buffers (EBs), each constructed from two elastic half buffers (EHBs), namely, a master EHB and a slave EHB. The clock network is replaced by a network of elastic controllers: each controller controls the elastic buffers in one pipeline stage and is synchronized with the controllers of adjacent pipeline stages. The controllers are synchronized with the clock and are connected in accordance with the connections between pipeline stages in the data path. Each controller has three possible states, empty, half, and full, which indicate that the corresponding elastic buffer holds 0, 1, and 2 valid data tokens, respectively.
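To make the three controller states concrete, here is a behavioural Python sketch of one EB as a two-slot buffer built from a master and a slave EHB. It abstracts away the SELF valid/stop handshake and all clocking, and the class and method names are hypothetical. Which EHB holds the older token is also an assumption: it is chosen to agree with the token-states in Example 3.3, where, reading the first number of each pair as the master EHB, a half-full EB carries its token in the slave EHB.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Optional


class ControllerState(Enum):
    """Number of valid tokens held by an elastic buffer (EB)."""
    EMPTY = 0
    HALF = 1
    FULL = 2


@dataclass
class ElasticBuffer:
    """Behavioural sketch of an EB made of a master and a slave EHB.

    None marks an EHB without a valid token (a bubble). Assumption: the
    slave EHB holds the older token and the master EHB the newer one.
    """
    master: Optional[Any] = None
    slave: Optional[Any] = None

    def state(self) -> ControllerState:
        valid = int(self.master is not None) + int(self.slave is not None)
        return ControllerState(valid)

    def can_accept(self) -> bool:
        return self.state() is not ControllerState.FULL

    def push(self, token: Any) -> None:
        """Accept a token from the upstream stage."""
        if not self.can_accept():
            raise RuntimeError("EB is full: upstream must stall")
        if self.slave is None:
            self.slave = token
        else:
            self.master = token

    def pop(self) -> Any:
        """Hand the oldest token to the downstream stage."""
        if self.slave is None:
            raise RuntimeError("EB is empty: downstream must wait")
        token, self.slave, self.master = self.slave, self.master, None
        return token


eb = ElasticBuffer()
eb.push(6)
eb.push(7)
print(eb.state(), eb.pop(), eb.state())  # full, then 6 leaves, then half
```

Pushing into a full buffer raises an error here; in hardware, the controller would instead stall the upstream stage.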

We call the processor model obtained by elasticizing the synchronous DLX $M_0$. The main advantage of the elastic processor is that it permits the insertion of additional elastic buffers anywhere in the data path to break long wires. We therefore inserted additional elastic buffers $l_1, \ldots, l_5$ at various places in the model: we inserted $l_1$ in model $M_0$ to obtain model $M_1$, then inserted $l_2$ in model $M_1$ to obtain $M_2$, and derived models $M_3$, $M_4$, and $M_5$ in a similar manner. Model $M_5$ is shown in Figure 1. The figure also shows the positions of the additional elastic buffers and how they are connected to the elastic buffers corresponding to the pipeline latches (namely $pc$, $fd$, $de$, $em$, and $mm$). The network of elastic controllers for the DLX processor with five additional elastic buffers in the data path is shown in Figure 2. These models are used to demonstrate the effectiveness of our verification approach.

3. Token-Aware Completion Functions

Flushing [8] is one standard approach used to compute refinement maps for pipelined processors. In this approach, partially executed instructions in the pipeline latches are forced to complete without allowing the machine to fetch any new instructions. Projecting out the programmer-visible components (which, for the models we consider, are the program counter, register file, instruction memory, and data memory) from the resulting state gives the corresponding ISA state.
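A minimal Python sketch of flushing follows. It assumes a hypothetical toy machine rather than the DLX models: each latch holds at most one instruction of the form rd := rs + imm, the oldest latch writes back every cycle, and `step`, `flush`, and the state layout are illustrative names only. Because the toy fetch advances the program counter as soon as an instruction enters the pipeline, the projected program counter already points past every in-flight instruction.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy ISA: an instruction (rd, rs, imm) means rd := rs + imm.
Instr = Tuple[str, str, int]
# Toy pipelined state: (pc, register file, instruction memory, latches).
# latches[0] is the youngest latch and latches[-1] the oldest; None is empty.
State = Tuple[int, Dict[str, int], List[Instr], List[Optional[Instr]]]


def execute(rf: Dict[str, int], instr: Instr) -> Dict[str, int]:
    rd, rs, imm = instr
    return {**rf, rd: rf[rs] + imm}


def step(state: State, fetch: bool = True) -> State:
    """One cycle: retire the oldest latch, shift the rest, optionally fetch."""
    pc, rf, imem, latches = state
    if latches[-1] is not None:
        rf = execute(rf, latches[-1])
    incoming = None
    if fetch and pc < len(imem):
        incoming, pc = imem[pc], pc + 1
    return pc, rf, imem, [incoming] + latches[:-1]


def flush(state: State) -> Tuple[int, Dict[str, int], List[Instr]]:
    """Flushing refinement map: step with fetch disabled until the pipeline
    drains, then project out the programmer-visible components."""
    for _ in range(len(state[3])):
        state = step(state, fetch=False)
    pc, rf, imem, _ = state
    return pc, rf, imem


imem = [("r1", "r0", 5), ("r2", "r1", 1)]
w = (0, {"r0": 0, "r1": 0, "r2": 0}, imem, [None, None])
w = step(step(w))      # both instructions are now in flight
print(flush(w)[:2])    # (2, {'r0': 0, 'r1': 5, 'r2': 6})
```

Flushing requires as many extra steps as there are latches, which is why completion functions, which compute the same result directly, are cheaper to reason about.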

Completion functions [9] were proposed as a computationally efficient approach to constructing flushing refinement maps. One completion function is defined for each pipeline latch in the machine; it computes the effect on the programmer-visible components of completing any partially executed instruction in that latch. The completion functions are composed to form the flushing refinement map, with older instructions in the pipeline completed before younger ones. For the DLX example, let $fdc$, $dec$, $emc$, and $mmc$ be the completion functions for the latches $fd$, $de$, $em$, and $mm$, respectively, and let $rf$, $im$, and $dm$ be the register file, instruction memory, and data memory of the processor model. The ISA state $s$ corresponding to a synchronous DLX processor state $w = \langle pc_w, fd_w, de_w, em_w, mm_w, rf_w, im_w, dm_w \rangle$ is

$$\langle pc_s, rf_s, im_s, dm_s \rangle = fdc(dec(emc(mmc(\langle pc_w, rf_w, im_w, dm_w \rangle, mm_w), em_w), de_w), fd_w).$$
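The composition order can be illustrated with a small Python sketch. It is a hypothetical two-latch toy, not the DLX completion functions: the programmer-visible state is (pc, register file), latch `mm` holds an older instruction whose result (rd, value) is already computed, and latch `fd` holds a younger, just-fetched instruction (rd, rs, imm) meaning rd := rs + imm. The function names mirror the text but their bodies are assumptions.

```python
from typing import Dict, Optional, Tuple

# Hypothetical toy setting: programmer-visible state is (pc, register file).
Visible = Tuple[int, Dict[str, int]]


def mmc(vis: Visible, mm: Optional[Tuple[str, int]]) -> Visible:
    """Complete an instruction in latch mm whose result (rd, value) is computed."""
    if mm is None:
        return vis
    pc, rf = vis
    rd, value = mm
    return pc, {**rf, rd: value}


def fdc(vis: Visible, fd: Optional[Tuple[str, str, int]]) -> Visible:
    """Complete a just-fetched instruction in latch fd: rd := rs + imm."""
    if fd is None:
        return vis
    pc, rf = vis
    rd, rs, imm = fd
    return pc, {**rf, rd: rf[rs] + imm}


def completion_map(vis: Visible, fd, mm) -> Visible:
    """Compose the completion functions, oldest latch (mm) innermost."""
    return fdc(mmc(vis, mm), fd)


vis = (2, {"r0": 0, "r1": 0, "r2": 0})
print(completion_map(vis, fd=("r2", "r1", 1), mm=("r1", 5)))
# (2, {'r0': 0, 'r1': 5, 'r2': 6})
```

Swapping the composition order would let the younger instruction read r1 before the older one writes it, giving r2 = 1; this is why older instructions are completed first.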

When we try to apply the completion functions approach to elastic pipelined processors, two issues arise. First, in some states of the elastic processor, instructions can be duplicated in the data path; that is, an instruction can reside in two pipeline latches. This situation can occur at a fork when the instruction in a buffer before the fork has proceeded along one path of the fork but the other path is blocked: the latch before the fork has to retain the instruction until both paths are clear. Applying the completion-functions-based map directly to such a state completes the same instruction twice, which leads to an erroneous refinement map. Second, elastic half buffers (EHBs) need not hold valid tokens. The contents of such EHBs should be ignored and must not be used to update the programmer-visible components.
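Both failure modes can be seen in a few lines of Python. The sketch below uses hypothetical EHB contents encoded as (token number, data), with token 0 marking an EHB that holds no valid token but still contains stale data; the data (rd, imm) means rd := rd + imm, so completing it twice changes the result. A direct composition that ignores token numbers completes instruction 3 twice and also applies the stale data.

```python
from typing import Dict, List, Tuple

# Hypothetical EHB contents: (token number, data). Token 0 marks an EHB
# without a valid token; its data field is stale and must not be used.
# Data (rd, imm) means rd := rd + imm, so completing it twice is visible.
EHB = Tuple[int, Tuple[str, int]]


def complete(rf: Dict[str, int], data: Tuple[str, int]) -> Dict[str, int]:
    rd, imm = data
    return {**rf, rd: rf[rd] + imm}


def naive_completion(rf: Dict[str, int], ehbs: List[EHB]) -> Dict[str, int]:
    """Complete every EHB, ignoring token numbers (the flawed direct approach)."""
    for _token, data in ehbs:
        rf = complete(rf, data)
    return rf


# Token 3 was duplicated at a fork; the third EHB is empty but holds stale data.
ehbs = [(3, ("r1", 4)), (3, ("r1", 4)), (0, ("r2", 99))]
print(naive_completion({"r1": 0, "r2": 0}, ehbs))
# {'r1': 8, 'r2': 99}; the intended result is {'r1': 4, 'r2': 0}
```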

We introduce token-aware completion functions as a method to compute flushing-based refinement maps for elastic pipelined processors. The idea is that EHBs that either hold duplicate instructions or are empty should not be completed. This is achieved by first computing the reachable states of the elastic controller network. We use the token-flow diagrams proposed in [7] to compute the reachable states of the system; the reachability analysis simulates how tokens flow in the elastic architecture using a form of symbolic simulation. The output of the token-flow diagrams is a set of token-states, one for each reachable state. In a token-state, each EHB is assigned a numbered token, which is essentially a natural number. A value of "0" indicates a bubble; that is, the EHB is empty. EHBs holding the same instruction are assigned the same token number. Thus, the token-state identifies both duplicate instructions and empty EHBs.
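The token-state $T_1$ listed in Example 3.3 shows how bubbles and duplicates are read off a token-state. The Python sketch below groups EHBs by token number, so it needs no pipeline ordering. The function `classify`, the dictionary encoding, and the `.m`/`.s` suffixes are our own illustrative choices, and we assume the first number of each pair denotes the master EHB and the second the slave EHB.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Token-state T1 of Example 3.3 as (master, slave) token pairs per EB
# (assumption: the first number of each pair is the master EHB).
T1 = {"pc": (0, 7), "fd": (0, 6), "de": (0, 5), "em": (0, 4), "mm": (0, 0),
      "l1": (0, 0), "l2": (0, 0), "l3": (0, 3), "l4": (0, 0), "l5": (0, 3)}


def classify(token_state: Dict[str, Tuple[int, int]]):
    """Return (bubbles, duplicates) for a token-state.

    bubbles: names of EHBs holding token 0.
    duplicates: token number -> EHBs sharing it, for tokens held more than once.
    """
    holders: Dict[int, List[str]] = defaultdict(list)
    bubbles: List[str] = []
    for eb, (master, slave) in token_state.items():
        for half, token in (("m", master), ("s", slave)):
            name = f"{eb}.{half}"
            if token == 0:
                bubbles.append(name)
            else:
                holders[token].append(name)
    duplicates = {t: names for t, names in holders.items() if len(names) > 1}
    return bubbles, duplicates


bubbles, duplicates = classify(T1)
print(len(bubbles), duplicates)   # 14 {3: ['l3.s', 'l5.s']}
```

Deciding which copy of a duplicated token to complete does require the pipeline order, which is what the token-array procedure adds.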

The token-aware completion functions approach works by first computing a two-dimensional binary array, which we call the token-array. Each row of the array corresponds to a reachable state of the elastic controller network, and each element in a row is a binary value. The number of elements in a row is $2n$, where $n$ is the number of pipeline latches in the elastic system (each latch is an EB consisting of two EHBs). If token-array$(i, j) = 1$, then the contents of EHB $H_j$ in reachable state $S_i$ should be completed. If token-array$(i, j) = 0$, then the contents of EHB $H_j$ in reachable state $S_i$ should be ignored when computing the refinement map. Given the set of token-states (the reachable states represented using numbered tokens) of the elastic controller network of an elastic system, Procedure 3.1 computes the token-array for the elastic system.

Procedure 3.1.
In: $S_R$, the set of token-states of the elastic controller network, and $PH$, the ordered set of pipeline half buffers. The number of token-states ($|S_R|$) is $r$. The number of pipeline half buffers ($|PH|$) is $2n$, where $n$ is the number of pipeline latches. The order of the pipeline half buffers is determined by their positions in the pipeline; that is, buffers closer to the end of the pipeline have a higher index.
Out: token-array for the elastic system.
(1) Initialize $i$ to $r$.
(2) Initialize $V_t$ (the set of visited tokens) to $\{0\}$. The token number "0" represents a bubble; initializing $V_t$ to $\{0\}$ causes the procedure to assign a "0" value to the empty EHBs in the token-array.
(3) Initialize $j$ to $2n$.
(4) Let $t = \mathit{token}(S_i, PH_j)$, where $\mathit{token}$ is a look-up function that gives the token number of EHB $PH_j$ in token-state $S_i$.
(5) Set token-array$(i, j) = \neg(t \in V_t)$.
(6) Assign $V_t = V_t \cup \{t\}$; that is, add the token number of EHB $PH_j$ to the visited token set.
(7) If $j - 1 \neq 0$, decrement $j$ and go to step 4.
(8) If $i - 1 \neq 0$, decrement $i$ and go to step 2.

Procedure 3.2 takes as input the token-array and computes the flushing refinement map for the elastic system using completion functions.
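A direct Python transcription of Procedure 3.1 is sketched below. Indices are 0-based, the look-up function $\mathit{token}$ is replaced by list indexing, and rows are processed in list order rather than from $i = r$ downward, which is equivalent because each row is computed independently. The example rows are hypothetical six-EHB token-states, not token-states of the $M_5$ model.

```python
from typing import List, Sequence


def token_array(token_states: Sequence[Sequence[int]]) -> List[List[int]]:
    """Sketch of Procedure 3.1.

    token_states[i][j] is the token number of pipeline half buffer PH_{j+1}
    in token-state S_{i+1}; a higher j is closer to the end of the pipeline.
    Returns a 0/1 matrix of the same shape.
    """
    array = []
    for row in token_states:                  # one row per reachable token-state
        visited = {0}                         # token 0 is a bubble: never completed
        flags = [0] * len(row)
        for j in reversed(range(len(row))):   # walk back from the pipeline end
            t = row[j]
            # Complete a token only at its position closest to the pipeline end.
            flags[j] = int(t not in visited)
            visited.add(t)
        array.append(flags)
    return array


rows = [[0, 7, 0, 6, 7, 5],   # token 7 duplicated at indices 1 and 4
        [0, 0, 3, 4, 0, 4]]   # token 4 duplicated at indices 3 and 5
print(token_array(rows))      # [[0, 0, 0, 1, 1, 1], [0, 0, 1, 0, 0, 1]]
```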

Procedure 3.2.
In: Elastic processor state $w = \langle P_1, \ldots, P_m, H_1, \ldots, H_{2n} \rangle$, where $P_1, \ldots, P_m$ are the programmer-visible components and $H_1, \ldots, H_{2n}$ are the half buffers in the pipeline latches of the elastic machine.
Out: ISA state $s$ obtained by applying the flushing refinement map to $w$.
(1) Let $S_{2n+1} = \langle P_1, \ldots, P_m \rangle$.
(2) Initialize $i$ to $2n$.
(3) Let $r = \text{reachable-state}(w)$ be the number of the reachable elastic controller network state of $w$, assuming that the reachable states are numbered.
(4) Compute

$$S_i = \begin{cases} \mathit{completion}_i(S_{i+1}, H_i), & \text{if token-array}(r, i) = 1, \\ S_{i+1}, & \text{otherwise}, \end{cases} \tag{2}$$

where $\mathit{completion}_i$ is the completion function for the pipeline latch containing $H_i$.
(5) If $i - 1 \neq 0$, decrement $i$ and go to step 3.
(6) Then $s = S_1$.
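Procedure 3.2 can likewise be sketched in Python with 0-based indices. Here `reachable_state` is assumed to be supplied by the reachability analysis and is called once, since its result does not change inside the loop. The completion function `add` and the six half-buffer contents are hypothetical stand-ins for the DLX completion functions and pipeline latches.

```python
from typing import Any, Callable, List, Sequence, Tuple

# A completion function maps (programmer-visible state, EHB contents) to a
# new programmer-visible state; one per half buffer H_1..H_2n.
Completion = Callable[[Any, Any], Any]


def flushing_map(
    w: Tuple[Any, Sequence[Any]],
    token_array: List[List[int]],
    reachable_state: Callable[[Any], int],
    completions: Sequence[Completion],
) -> Any:
    """Sketch of Procedure 3.2 with 0-based indices.

    w = (visible, halves): visible holds P_1..P_m, and halves[i] holds the
    contents of H_{i+1}, ordered as in Procedure 3.1.
    """
    visible, halves = w
    r = reachable_state(w)                    # token-array row for w
    s = visible                               # S_{2n+1}
    for i in reversed(range(len(halves))):    # from the end of the pipeline back
        if token_array[r][i] == 1:
            s = completions[i](s, halves[i])  # S_i = completion_i(S_{i+1}, H_i)
        # otherwise S_i = S_{i+1}: H_i is a bubble or a duplicate
    return s                                  # S_1


def add(rf, data):
    """Hypothetical completion function: data (rd, imm) means rd := rd + imm."""
    rd, imm = data
    return {**rf, rd: rf[rd] + imm}


halves = [("r2", 99), ("r1", 4), ("r2", 99), ("r1", 1), ("r1", 4), ("r2", 2)]
w = ({"r1": 0, "r2": 0}, halves)
print(flushing_map(w, [[0, 0, 0, 1, 1, 1]], lambda _w: 0, [add] * 6))
# {'r1': 5, 'r2': 2}
```

The stale contents of the first and third half buffers and the duplicate at index 1 are skipped because their token-array entries are 0.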

Example 3.3. The elastic controller network of the $M_5$ processor model has two reachable states, $S_1$ and $S_2$. The token-states $T_1$ and $T_2$ corresponding to $S_1$ and $S_2$, respectively, are given below as vectors of token numbers for the EHBs in $M_5$, in the order $\langle pc \mid fd \mid de \mid em \mid mm \mid l_1 \mid l_2 \mid l_3 \mid l_4 \mid l_5 \rangle$ [7]:

$$T_1 = \langle 0,7 \mid 0,6 \mid 0,5 \mid 0,4 \mid 0,0 \mid 0,0 \mid 0,0 \mid 0,3 \mid 0,0 \mid 0,3 \rangle,$$

$$T_2 = \langle 0,0 \mid 7,6 \mid 0,5 \mid 0,0 \mid 0,4 \mid 0,7 \mid 0,4 \mid 0,3 \mid 0,3 \mid 0,0 \rangle.$$

Note that each EB has two tokens in the token-states, one corresponding to the master EHB and the other corresponding to the slave EHB. The completion-function-based refinement map obtained using Procedures 3.1 and 3.2 for any state $w$ of processor model $M_5$ whose elastic controller network is in state $S_1$ is

$$\langle pc_s, rf_s, im_s, dm_s \rangle = fdc(dec(emc(mmc(\langle pc_w, rf_w, im_w, dm_w \rangle, l5_w^s), em_w^s), de_w^s), fd_w^s).$$

For any state $w$ of processor model $M_5$ whose elastic controller network is in state $S_2$, the refinement map obtained using Procedures 3.1 and 3.2 is

$$\langle pc_s, rf_s, im_s, dm_s \rangle = fdc(fdc(dec(mmc(\langle pc_w, rf_w, im_w, dm_w \rangle, mm_w^s), de_w^s), fd_w^s), fd_w^m).$$

Here, the superscripts $s$ and $m$ denote the slave and master EHBs of a buffer, respectively.

4. Results

The token-aware completion functions approach was used to verify safety for six elastic pipelined processors, $M_0, \ldots, M_5$. The results are shown in Table 1. Verification was performed using the ACL2-SMT system, which incorporates a translator that reduces the correctness theorem to a decision problem in the form of a formula in a decidable logic that Yices can handle. The decision problem is then checked by Yices. Column "Bool Vars" gives the number of Boolean variables in the decision problem. The experiments were conducted on a 1.8 GHz Intel Core Duo CPU with an L1 cache size of 2048 KB. As the table shows, each of the elastic 5-stage DLX-based processors was verified against its high-level instruction set architecture (ISA) within 25 seconds, demonstrating the efficiency of our approach.

5. Conclusions

We have developed a method for checking the correctness of elastic pipelined processors against their high-level instruction set architectures. The approach was demonstrated by verifying six DLX-based elastic processor models. In future work, we plan to explore the scalability of the verification method further.