#### Abstract

The design of addition arithmetic plays a crucial role in high-performance digital systems. This paper proposes a systematic method to formalize and verify adders in the formal proof assistant COQ. The proposed approach formalizes the gate-level implementations and verifies the functional correctness of the most important adders of interest in industry, in a faithful, scalable, and modular way. The methodology can be extended to other adder architectures as well.

#### 1. Introduction

Demonstrating the functional correctness of an arithmetic implementation is a challenging topic that has been studied for several decades. Testing and simulation, the traditional methods, have earned a good reputation and are used extensively in industry. For large-scale designs, however, these methods may find counterexamples but cannot establish that a design is correct, because exhaustive testing is impractical.

As an alternative, formal methods have been increasingly adopted to validate arithmetic implementations. One main branch of formal methods is model checking, which is known for its automation and has succeeded in numerous industrial applications. However, the inherent state explosion problem prevents it from scaling to large designs.

Another branch of verification is theorem proving, which, unlike model checking, testing, and simulation, is not restricted by the scale of the design. The main obstacle to its widespread use is that it requires a strong background in logic and heavy user interaction. Nevertheless, there have been quite a few successful applications with different theorem provers. With Boyer-Moore, a microprocessor is verified in [1], and an N-bit comparator as well as mean-value circuits are verified in [2]. With HOL, a ripple carry adder and a sequential device are verified in [3], and an ATM switch fabric is verified in [4]. With COQ, a sequential multiplier is verified in [5], and an asynchronous transfer mode switch fabric is verified in [6].

The main contribution of this work is a holistic methodology to formalize and verify adders in COQ [7]. Adders are chosen because they are the most fundamental arithmetic units and are widely employed in advanced digital systems, such as IBM POWER6, whose correctness depends significantly on the correctness of its addition subcomponents. The methodology provides a uniform way to formalize and verify various implementations of arithmetic addition; in this work, it is applied to primary and high-speed adders of interest in industry, including the Carry Look-ahead Adder (CLA), the Ling Adder (LA), and the Parallel Prefix Adder (PPA).

Benefiting from the techniques of COQ, the methodology has the following desirable features.

(i) *Scalability*: the formalization of an adder is parameterized by a natural number (named *length*), and the correctness proof applies to any length.

(ii) *Modularization*: the verified adders are encapsulated as instances of an abstract module, which makes them reusable in a uniform way in advanced arithmetic units. The formalization and verification of an advanced arithmetic unit can thus be assembled from verified units, ignoring their detailed implementations.

(iii) *Fidelity*: the adders are formalized by (recursive) functions that correspond clearly to the gate-level implementations of circuits. The addends and sum of an adder are formalized as vectors, which faithfully model arrays and additionally provide type checking that prevents misuse of inputs.

The rest of the paper is organized as follows. Related work is introduced in Section 2. To our knowledge, we verify not only most adders appearing in the literature, but also some for the first time by theorem proving. Section 3 explains our methodology in detail, using the ripple carry and carry look-ahead adders as examples, and introduces the necessary preliminaries. Some definitions and most proofs are not presented in this paper, but they are available on the author's webpage (http://superwalter.github.io/dev/veriadder.zip). Sections 4 and 5 are devoted to LA and PPA, respectively.

#### 2. Related Work

Despite the extensive applications of primary adders, their verification by theorem proving is far from readily available. In particular, a formalization and verification of the Ling adder cannot be found in the literature. Reference [8] proves the correctness of RCA by formalizing adders with dependent types in COQ. Reference [9] proves the correctness of RCA in higher-order logic with a reusable library for formalizing circuits. Reference [2] verifies an RCA written in VHDL, as well as other circuits, in higher-order logic. Reference [10] develops a semiformal correctness proof of CLA or PPA. Reference [11] presents a pencil-and-paper proof of general prefix adders, as well as a proof of the related RCA. Furthermore, [12] formalizes and verifies these adders in COQ. By rewriting and induction, [13] verifies PPA using powerlists. An algebraic formalization of PPA and its correctness proof are presented in [14]. Besides formalizing and verifying most primary adders, our methodology provides good features that, to our knowledge, appear only partially in other work and have never been integrated together in any previous work.

#### 3. A Holistic Methodology

Various kinds of adders are designed to perform well under different circumstances, while all of them implement the same addition functionality. This work proposes a holistic methodology that captures all these different adders and provides the desired features.

##### 3.1. A Unified Proof Structure

Basically, the methodology answers four questions: (i) how to formalize the related data types; (ii) which method is used to formalize an adder; (iii) what should be proved; and (iv) how to organize the formalizations and verifications of different adders.

These questions are answered by a uniform specification that uses the module system of COQ:

```coq
Definition mbadder (n: nat):=
  data (S n) -> data (S n) -> bit -> hyb (S n).
Definition mbadder_correct n (f: mbadder n):=
  forall (X Y: data (S n)) c,
  |[X]| + |[Y]| + |c| = |(f X Y c)|.
Module Type GenAdder.
  Parameter adder: forall n, mbadder n.
  Axiom adder_correct: forall (n:nat),
    mbadder_correct (@adder n).
End GenAdder.
```

Lines 1 and 2 answer the first two questions. n, in line 1, is a parameter (named *length*) that captures the inherent nature of an adder: how many bits it can process (an adder of type mbadder n processes n + 1 bits). The input carry-in and the output carry-out are formalized as Booleans (bit). The input addends and the returned sum are formalized as vectors of Booleans (data), a dependent type indexed by the length; in line 2, they have type data (S n). hyb is another dependent type, standing for a pair of a bit and a vector of the given length; it is used in line 2 to combine the carry-out and the sum. Thus, an adder is formalized as a function that takes two addends and a carry-in and returns a pair of a carry-out and a sum. This function is normally defined recursively, as shown later.

Lines 3, 4, and 5 answer the third question. The correctness of an adder is ensured by proving that the natural-number denotations of the inputs and the outputs are equal. In line 5, |c| is the natural-number denotation of a bit c, and |[X]| and |(f X Y c)| are the natural-number denotations of the vector X and of the result pair f X Y c, respectively. Big-endian order is used to implement these two functions.

Lines 6–10 answer the last question. A general adder is formalized as an abstract module (a module type) that declares the adder and requires its correctness. Every verified adder, such as a Ripple Carry Adder (RCA), should be an instance of it.
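To make the specification concrete, here is a minimal Python sketch that lies outside the COQ development; all names in it are hypothetical. An adder is modeled as a function on big-endian Boolean lists that returns a (carry-out, sum) pair, and the equation in line 5 is checked for every input of one fixed width. Unlike the COQ theorem, which covers every length at once, such a test only covers the width it enumerates.

```python
from itertools import product
from typing import Callable, List, Tuple

Bits = List[bool]                      # big-endian: index 0 is the most significant bit
Result = Tuple[bool, Bits]             # (carry-out, sum), playing the role of hyb
Adder = Callable[[Bits, Bits, bool], Result]

def denote_bit(b: bool) -> int:
    """|c|: the natural-number value of one bit."""
    return int(b)

def denote_vec(v: Bits) -> int:
    """|[X]|: the big-endian value of a bit vector."""
    value = 0
    for b in v:
        value = 2 * value + int(b)
    return value

def denote_result(r: Result) -> int:
    """|(cout, S)|: the carry-out weighs one bit above the sum's MSB."""
    cout, s = r
    return denote_bit(cout) * 2 ** len(s) + denote_vec(s)

def adder_correct_upto(adder: Adder, width: int) -> bool:
    """Test the line-5 equation for every input of one fixed width."""
    for xs, ys in product(product([False, True], repeat=width), repeat=2):
        for c in (False, True):
            X, Y = list(xs), list(ys)
            lhs = denote_vec(X) + denote_vec(Y) + denote_bit(c)
            if lhs != denote_result(adder(X, Y, c)):
                return False
    return True

def rca(X: Bits, Y: Bits, c: bool) -> Result:
    """Ripple carry adder: full adders chained from the rightmost bit."""
    s = [False] * len(X)
    for i in reversed(range(len(X))):
        s[i] = X[i] ^ Y[i] ^ c
        c = (X[i] and Y[i]) or ((X[i] ^ Y[i]) and c)
    return c, s

print(adder_correct_upto(rca, 4))  # True
```

In this model, "being an instance of GenAdder" loosely corresponds to passing `adder_correct_upto` for every width, which testing can never fully establish.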

##### 3.2. An Example Explaining the Methodology

The Carry Look-ahead Adder (CLA) improves on RCA by computing all the carries in advance, which reduces the long carry-propagation delay. In the formalization, this is represented by extending the general module with abstract functions P, G, and carries, which compute all the propagated carries, generated carries, and carries, respectively, from the inputs:

```coq
Module Type LookAheadAdder <: GenAdder.
  Parameter P: forall n, data n -> data n -> data n.
  Parameter G: forall n, data n -> data n -> data n.
  Parameter carries: forall n, data (S n) -> data (S n) -> bit -> hyb (S n).
  Parameter adder: forall n, mbadder n.
  Parameter adder_correct: forall n, mbadder_correct (@adder n).
End LookAheadAdder.
```

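The symbol <: in line 1 states that this module must be an instance of the general verified adder. RCA is formalized according to the following equations, where $p_i = x_i \oplus y_i$ and $g_i = x_i y_i$ are the propagated and generated carries of bit $i$, $c_i$ is the carry into bit $i$, and $c_0$ is the carry-in:

$$c_{i+1} = g_i + p_i\, c_i, \qquad s_i = p_i \oplus c_i, \qquad 0 \le i \le n. \tag{1}$$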

In a CLA, the carry into each bit is computed by iteratively unfolding $c_i$ in (1) until reaching $c_0$, the overall carry-in, as in the following example:

$$c_3 = g_2 + p_2 c_2 = g_2 + p_2 g_1 + p_2 p_1 g_0 + p_2 p_1 p_0 c_0.$$

This process, together with the definitions of P and G, is formalized as follows:

```coq
Definition P n (X Y: data n):= X ⊕ Y.
Definition G n (X Y: data n):= X ∧ Y.
Definition carries n (X Y: data (S n)) (cin: bit): hyb (S n).
induction n as [|n rec].
+ exact (bandor (hd Y) (hd X) cin, [cin]).
+ set (recs:= rec (tl X) (tl Y)).
  exact (bandor (hd Y) (hd X) (fst recs),
         (* carry into the leftmost bit, followed by the lower carries *)
         fst recs :: snd recs).
Defined.
```

⊕ and ∧ in lines 1 and 2, and ∨ used later, extend the logical Boolean operations xorb, andb, and orb to vectors, applying them to the elements at the same position of the two vectors. The + symbols in lines 5 and 6 mark the two branches of the recursion: length zero and a successor length. In carries, X and Y receive the propagated and generated carry vectors, and bandor g p c performs one step of (1), that is, g || (p && c). The operator hd in lines 5 and 7 returns the leftmost element of a vector; correspondingly, tl in line 6 returns the rightmost n elements of an (n + 1)-bit vector. [b] is a vector with the single bit b. fst and snd return the first and second components of a pair, respectively. The operator :: in line 9 joins a bit and an n-bit vector into an (n + 1)-bit vector.

The adder is defined as follows, and its correctness is proved by induction on the length, reusing the correctness result of the full adder:

```coq
Definition adder: forall n, mbadder n.
intros n X Y cin.
set (cc:= carries (P X Y) (G X Y) cin).
exact (fst cc, (P X Y) ⊕ (snd cc)).
Defined.
Theorem adder_correct: forall n, mbadder_correct (@adder n).
Proof.
induction n as [|n rec].
…
Qed.
```
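The recursion above can be mirrored in a short executable model. The following Python sketch is an illustration, not the COQ code: `P`, `G`, `bandor`, `carries`, and `cla` are hypothetical counterparts of the definitions above, with `bandor` assumed to compute g ∨ (p ∧ c), one step of (1). Vectors are big-endian lists, so the head is the most significant bit and the recursion descends into the tail, as the tactic script does.

```python
from itertools import product
from typing import List, Tuple

Bits = List[bool]  # big-endian: index 0 is the most significant bit

def P(X: Bits, Y: Bits) -> Bits:
    """Propagated carries p_i = x_i xor y_i, position by position."""
    return [x ^ y for x, y in zip(X, Y)]

def G(X: Bits, Y: Bits) -> Bits:
    """Generated carries g_i = x_i and y_i, position by position."""
    return [x and y for x, y in zip(X, Y)]

def bandor(g: bool, p: bool, c: bool) -> bool:
    """One step of (1): c_{i+1} = g_i + p_i c_i."""
    return g or (p and c)

def carries(p: Bits, g: Bits, cin: bool) -> Tuple[bool, Bits]:
    """Returns (carry-out, [c_n, ..., c_0]) by recursion on the length."""
    if len(p) == 1:                                # base case: a single bit
        return bandor(g[0], p[0], cin), [cin]
    cout_low, cs_low = carries(p[1:], g[1:], cin)  # recurse on the tails
    return bandor(g[0], p[0], cout_low), [cout_low] + cs_low

def cla(X: Bits, Y: Bits, cin: bool) -> Tuple[bool, Bits]:
    cout, cs = carries(P(X, Y), G(X, Y), cin)
    return cout, [p ^ c for p, c in zip(P(X, Y), cs)]   # s_i = p_i xor c_i

def to_bits(v: int, w: int) -> Bits:
    """The low w bits of v, most significant first."""
    return [bool((v >> (w - 1 - i)) & 1) for i in range(w)]

for w in range(1, 6):
    for a, b, c in product(range(2 ** w), range(2 ** w), (False, True)):
        cout, s = cla(to_bits(a, w), to_bits(b, w), c)
        assert (cout, s) == (bool((a + b + c) >> w), to_bits(a + b + c, w))
```

The final loop compares the model with integer addition for widths 1 to 5; it is a sanity check of the model, not a substitute for the inductive proof.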

##### 3.3. Features Provided by the Methodology

There are several benefits to the use of this methodology for the verification of adders.

###### 3.3.1. Scalability

The formalization and verification of an adder scale to any data-width, because the length parameter can be set to an arbitrary natural number. A 4-bit CLA can be obtained as follows:

```coq
Definition CLA4:= CLA 3.
Corollary CLA4_correct: forall (X Y: data 4) c,
  |[X]| + |[Y]| + |c| = |(CLA4 X Y c)|.
Proof. intros; apply CLA_correct. Qed.
```

Note that a 4-bit CLA is CLA 3, because we require the addends of an adder to have at least one bit: an adder with length n processes n + 1 bits. The correctness proof of a CLA with a specific length follows directly from the proof of CLA for an arbitrary length.

###### 3.3.2. Modularization

Some high-speed adders divide the input addends into groups. Each group is computed independently by a Carry Select Adder (CSA), and the groups are then concatenated in order. Since a CSA needs the input carry-in only in its very last step, such designs have a shorter propagation delay and thus higher performance. We formalize an abstract architecture for this kind of design, which illustrates the modularity of our method and may also help verify complex adders in the future.

CSA takes an abstract verified adder as a parameter and is itself an instance of the general verified adder:

```coq
Module CSA (M: GenAdder) <: GenAdder.
  Definition adder n: mbadder n.
  intros X Y c.
  set (a1:= M.adder X Y true).
  set (a0:= M.adder X Y false).
  set (sum:= (dmap (band c) (snd a1)) ∨ (dmap (band (negb c)) (snd a0))).
  set (c':= (fst a1 && c) || (fst a0 && negb c)).
  exact (c', sum).
  Defined.
  Theorem adder_correct: forall n, mbadder_correct (@adder n).
  Proof. … rewrite M.adder_correct. … Qed.
End CSA.
Module CSA_CLA:= CSA CLA.
```

Lines 2 to 11 define CSA and prove its correctness. Two adders compute the sum and the carry-out for carry-in 1 and 0 in lines 4 and 5, respectively. A multiplexer then selects the actual sum and carry-out according to the real carry-in c in lines 6 and 7, so the carry-in is needed only in this final selection step. dmap, in line 6, applies a function to each element of a vector. The correctness of CSA holds because its addition units are correct; thus, CSA is an instance of the general adder. The parameterized module can be instantiated with any verified adder; line 13 defines a CSA whose addition units are CLAs.

The grouped architecture is formalized by another parameterized module:

```coq
Module Type GroupAdder (M: GenAdder) <: GenAdder.
  Parameter part: list nat.
  Fixpoint adder_rec (n lens len: nat)
    : mbadder lens.
  destruct n.
  + exact (@M.adder lens).
  + specialize adder_rec with (1:=n)
      (lens:= pred (cur_index_abr n len)) (2:=len).
    … exact (cast_comb (combination (@M.adder (lens - (cur_index_abr n len))) adder_rec)
      (aux _ _ Hc3 Hc2)).
  Defined.

  Definition adder n:= adder_rec (sect n) n n.
  Lemma adder_correct: forall n, mbadder_correct (adder n).
  Proof. … Qed.
End GroupAdder.
```

The formalization and verification of this adder are quite complex because of the problems with dependent types described in [15, 16]; therefore, unimportant details are omitted. part, in line 2, is a partition of the addends. The partition must be valid: its elements are strictly ordered and do not exceed the total data-width. Lines 3 to 12 define the adder recursively, by combining an adder with another adder that is the combination of the remaining groups, obtained by recursion. combination, in line 9, performs the combining operation, and cast_comb, also in line 9, converts an adder of one length into an adder of another length, taking as an argument a proof that the two lengths are equal. The initial values of this recursive function are specified in line 13. The correctness can be proved by induction on the length of the partition, using the correctness of combining correct adders.

The parameterized module can be instantiated with any verified adder. When it is instantiated with CSA, it verifies many popular high-speed adders.
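The functor style of CSA and GroupAdder can be imitated with higher-order functions. The Python sketch below is a hypothetical illustration: `csa` takes any adder and returns a carry select adder, and `group_adder` chains one adder per group. For simplicity, the partition is given as a list of group sizes (most significant group first), which is an assumption of this sketch rather than the paper's strictly ordered `part`.

```python
from itertools import product
from typing import Callable, List, Tuple

Bits = List[bool]                                  # big-endian
Adder = Callable[[Bits, Bits, bool], Tuple[bool, Bits]]

def rca(X: Bits, Y: Bits, c: bool) -> Tuple[bool, Bits]:
    """A verified-adder stand-in: ripple carry from the rightmost bit."""
    s = [False] * len(X)
    for i in reversed(range(len(X))):
        s[i] = X[i] ^ Y[i] ^ c
        c = (X[i] and Y[i]) or ((X[i] ^ Y[i]) and c)
    return c, s

def csa(M: Adder) -> Adder:
    """Like the functor CSA (M: GenAdder): run M for carry-in 1 and 0,
    then let the actual carry-in select the outputs (a multiplexer)."""
    def adder(X: Bits, Y: Bits, c: bool) -> Tuple[bool, Bits]:
        c1, s1 = M(X, Y, True)
        c0, s0 = M(X, Y, False)
        s = [(c and b1) or ((not c) and b0) for b1, b0 in zip(s1, s0)]
        return (c1 and c) or (c0 and not c), s
    return adder

def group_adder(M: Adder, sizes: List[int]) -> Adder:
    """Split the addends into groups of the given sizes (most significant
    group first), add each group with M, and chain the carries."""
    def adder(X: Bits, Y: Bits, c: bool) -> Tuple[bool, Bits]:
        assert sum(sizes) == len(X), "the partition must cover the width"
        out: Bits = []
        end = len(X)
        for size in reversed(sizes):               # least significant group first
            c, s = M(X[end - size:end], Y[end - size:end], c)
            out = s + out
            end -= size
        return c, out
    return adder

def to_bits(v: int, w: int) -> Bits:
    return [bool((v >> (w - 1 - i)) & 1) for i in range(w)]

adder6 = group_adder(csa(rca), [2, 3, 1])          # a 6-bit carry select design
for a, b, c in product(range(64), range(64), (False, True)):
    cout, s = adder6(to_bits(a, 6), to_bits(b, 6), c)
    assert (cout, s) == (bool((a + b + c) >> 6), to_bits(a + b + c, 6))
```

Any function with the `Adder` shape can replace `rca`, much as any module of type GenAdder can instantiate the COQ functors.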

###### 3.3.3. Fidelity

There are normally two ways to formalize the addends and sum of an adder in COQ: with dependent types, as in [6, 8] and this work, or with nondependent types, as in [12]. Both [6] and [8] explain why dependent types are more suitable for the verification of adders. Generally speaking, nondependent lists are more suitable for formalizing linked lists, whose length is obtained by computation, while dependent vectors are more suitable for formalizing arrays, whose length is inherent. The functionality of adders is formalized by interactively defined (recursive) functions that correspond clearly to gate-level descriptions of circuits.

#### 4. Ling Adder

The Ling Adder (LA) was proposed in [17]. Instead of computing all the *carries* in advance, as CLA does, LA computes all the pseudo carries, whose propagation involves smaller fan-in and fan-out. With a proper grouping of the input addends, LA needs fewer levels of gates and consequently performs better.

Similar to the propagated and generated carries, LA uses a complementing signal $k_i$ and a previous-stage propagate signal $t_i$, defined in (4) and (5), respectively, as follows:

$$k_i = x_i\, y_i, \tag{4}$$
$$t_i = x_i + y_i. \tag{5}$$

The pseudo carries are defined recursively. To our knowledge, [17] and other material on LA define the pseudo carries without considering the case $i = 0$, which this paper treats in (6b):

$$h_i = k_i + t_{i-1}\, h_{i-1}, \qquad 0 < i \le n, \tag{6a}$$
$$h_0 = k_0 + c_{\mathrm{in}}. \tag{6b}$$

Without this case, the default values of $t_{-1}$ and $h_{-1}$ are both 0, which is equivalent to our definition under the assumption that $c_{\mathrm{in}}$ is always 0. Intuitively, that algorithm ignores the carry-in to the least significant bit, $c_0$, which restricts it to special applications such as the addition of two registers. We generalize it to provide the full functionality of an adder. The sum is defined similarly, so that it takes the carry-in to the least significant bit into account:

$$s_i = (t_i \oplus h_i) + k_i\, t_{i-1}\, h_{i-1}, \qquad 0 < i \le n, \tag{7a}$$
$$s_0 = (t_0 \oplus h_0) + k_0\, c_{\mathrm{in}}. \tag{7b}$$

The abstract module of LA extends the general one by adding the signatures of K, T, and H:

```coq
Module Type LingAdder <: GenAdder.
  Parameter K: forall n, data n -> data n -> data n.
  Parameter T: forall n, data n -> data n -> data n.
  Parameter H: forall n, data (S n) -> data (S n) -> bit -> data (S n).
  Parameter adder: forall n, mbadder n.
  Parameter adder_correct: forall n, mbadder_correct (@adder n).
End LingAdder.
```

To compute the $i$-th pseudo carry $h_i$, the $i$-th bit of K and the $(i-1)$-th bit of T are needed. Therefore, the two parameters of H stand for the vector K and a left shift of T. The formalization of H, assuming that its parameters are aligned in this way, is as follows:

```coq
Definition H n (X Y: data (S n)): data (S n).
induction n as [|n rec].
+ exact ([hd X || hd Y]).
+ set (recs:= rec (tl X) (tl Y)).
  exact ((hd X || (hd Y && hd recs)) :: recs).
Defined.
```

H is defined recursively. Line 3 deals with the case $n = 0$, and lines 4 and 5 deal with the recursive case: recs holds the lower pseudo carries, obtained by recursion, and hd recs stands for $h_{i-1}$.

LA is defined according to (7a) and (7b), using the definition of H:

```coq
Definition adder n (X Y: data (S n)) (cin: bit)
  : hyb (S n).
set (KXY:= K X Y).
set (TXY:= T X Y).
set (Tshft:= shiftin cin TXY).
set (Hc:= H KXY (tl Tshft)).
set (Hcshft:= shiftin true Hc).
set (sum:= (TXY ⊕ Hc) ∨ (KXY ∧ (tl Hcshft) ∧ (tl Tshft))).
exact (hd TXY && hd Hc, sum).
Defined.
```

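Since the $i$-th bit of the sum depends on the $(i-1)$-th bits of T and H, these vectors are shifted in lines 5 and 7. The reason why $c_{\mathrm{in}}$ is shifted into T is explained above; true is shifted into H to ensure $t_{-1} h_{-1} = c_{\mathrm{in}}$, where $t_{-1} = c_{\mathrm{in}}$ and $h_{-1} = 1$ are the bits shifted into T and H, respectively. The carry-out of LA is $t_n h_n$, which equals $c_{n+1}$, as shown in (8):

$$c_{i+1} = t_i\, h_i, \qquad 0 \le i \le n. \tag{8}$$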

The formalization of (8) is complicated, but its proof is straightforward by induction and case analysis. The correctness of LA follows from a lemma stating that CLA and LA produce the same outputs for the same inputs. This lemma is proved by induction, using (8):

```coq
Lemma LA_CLA_eq: forall n (X Y: data (S n)) c_in,
  LAdder.adder X Y c_in = CLAdder.adder X Y c_in.
Proof. induction n as [|n rec]. … Qed.
Theorem adder_correct: forall n (X Y: data (S n)) c,
  |[X]| + |[Y]| + |c| = |(LAdder.adder X Y c)|.
Proof. intros; rewrite LA_CLA_eq. apply CLAdder.adder_correct. Qed.
```
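As an informal cross-check, the following Python sketch models the Ling adder with the shifted vectors of lines 5 and 7 and compares it with integer addition, in the spirit of LA_CLA_eq. It assumes $k_i = x_i y_i$ and $t_i = x_i + y_i$; all function names are hypothetical, and list slicing stands in for shiftin followed by dropping the leftmost bit. It also checks, bit by bit, that the rippled carry $c_{i+1}$ equals $t_i h_i$.

```python
from itertools import product
from typing import List, Tuple

Bits = List[bool]  # big-endian: index 0 is the most significant bit

def K(X: Bits, Y: Bits) -> Bits:
    return [x and y for x, y in zip(X, Y)]    # k_i = x_i y_i (assumed)

def T(X: Bits, Y: Bits) -> Bits:
    return [x or y for x, y in zip(X, Y)]     # t_i = x_i + y_i (assumed)

def H(k: Bits, t_prev: Bits) -> Bits:
    """Pseudo carries; t_prev holds t_{i-1} at the position of bit i,
    with c_in in the rightmost position."""
    if len(k) == 1:
        return [k[0] or t_prev[0]]                       # h_0 = k_0 + c_in
    recs = H(k[1:], t_prev[1:])                          # lower pseudo carries
    return [k[0] or (t_prev[0] and recs[0])] + recs      # h_i = k_i + t_{i-1} h_{i-1}

def ling(X: Bits, Y: Bits, cin: bool) -> Tuple[bool, Bits]:
    k, t = K(X, Y), T(X, Y)
    t_prev = (t + [cin])[1:]      # shift in c_in, drop the leftmost bit
    h = H(k, t_prev)
    h_prev = (h + [True])[1:]     # shift in true, drop the leftmost bit
    s = [(ti ^ hi) or (ki and hp and tp)                 # sum, bit by bit
         for ti, hi, ki, hp, tp in zip(t, h, k, h_prev, t_prev)]
    return t[0] and h[0], s       # carry-out t_n h_n

def to_bits(v: int, w: int) -> Bits:
    return [bool((v >> (w - 1 - i)) & 1) for i in range(w)]

for w in range(1, 6):
    for a, b, c in product(range(2 ** w), range(2 ** w), (False, True)):
        X, Y = to_bits(a, w), to_bits(b, w)
        assert ling(X, Y, c) == (bool((a + b + c) >> w), to_bits(a + b + c, w))
        # c_{i+1} = t_i h_i, with carries rippled from the rightmost bit
        t_, h_ = T(X, Y), H(K(X, Y), (T(X, Y) + [c])[1:])
        carry = c
        for pos in reversed(range(w)):
            carry = (X[pos] and Y[pos]) or ((X[pos] ^ Y[pos]) and carry)
            assert carry == (t_[pos] and h_[pos])
```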

Reference [18] proposed an extension of Ling's adder, given by Equations (9)–(11), which are stated in terms of the group propagated and generated carries defined later in Section 5. Equation (11) is also proved in this work.

#### 5. Parallel Prefix Adder

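CLA improves RCA by computing all the carries in advance, by unfolding (1). However, computing all the carries this way causes large fan-in and fan-out, especially when $n$ is large. The Parallel Prefix Adder (PPA) avoids this through divide-and-conquer, which provides an efficient way to compute all the carries in parallel. The basic definitions are as follows, where $P_{i:j}$ and $G_{i:j}$ ($j \le i$) are the group propagated and generated carries of bits $j$ to $i$:

$$c_{i+1} = G_{i:j} + P_{i:j}\, c_j, \qquad 0 \le j \le i \le n, \tag{12}$$
$$P_{i:j} = \begin{cases} p_i, & i = j,\\ p_i\, P_{i-1:j}, & i > j, \end{cases} \tag{14}$$
$$G_{i:j} = \begin{cases} g_i, & i = j,\\ g_i + p_i\, G_{i-1:j}, & i > j. \end{cases} \tag{15}$$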

Due to the similarity between (14) and (15), only the formalization of (15) is shown. An auxiliary function, defined recursively on the difference of $i$ and $j$, is reluctantly introduced to define it in COQ:

```coq
Definition GpG_rec n (gp gg: data (S n)) (d i: nat): bit.
revert i; induction d as [|d rec]; intros i.
+ exact (nth (n-i) gg).
+ exact (nth (n-i) gg || (nth (n-i) gp && rec (pred i))).
Defined.
Definition GpG n (X Y: data (S n)) i j:= GpG_rec X Y (i-j) i.
```

In line 1, the parameters gp and gg stand for the propagated and generated carry vectors, and the parameter d is the difference of $i$ and $j$. The function nth k v returns the k-th element of v, counting from the leftmost bit, which has index 0. pred i computes the predecessor of i.
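The recursion of GpG_rec on d = i − j can be modeled directly. In the Python sketch below (hypothetical names), bit i of a big-endian vector of length n + 1 sits at position n − i, matching the index n−i in the COQ code; the loop then checks $c_{i+1} = G_{i:j} + P_{i:j} c_j$ against carries computed by ripple propagation for all 4-bit inputs.

```python
from itertools import product
from typing import List

Bits = List[bool]  # big-endian: bit i of an (n+1)-bit vector sits at position n - i

def gpg_rec(n: int, gp: Bits, gg: Bits, d: int, i: int) -> bool:
    """Mirror of GpG_rec: returns G_{i:i-d}, recursing on d = i - j."""
    if d == 0:
        return gg[n - i]                                     # G_{i:i} = g_i
    return gg[n - i] or (gp[n - i] and gpg_rec(n, gp, gg, d - 1, i - 1))

def gpg(n: int, gp: Bits, gg: Bits, i: int, j: int) -> bool:
    return gpg_rec(n, gp, gg, i - j, i)

def gpp(n: int, gp: Bits, i: int, j: int) -> bool:
    """Group propagate P_{i:j} = p_i p_{i-1} ... p_j."""
    return all(gp[n - k] for k in range(j, i + 1))

n = 3  # 4-bit inputs
for xs, ys, cin in product(product([False, True], repeat=n + 1),
                           product([False, True], repeat=n + 1), (False, True)):
    p = [x ^ y for x, y in zip(xs, ys)]
    g = [x and y for x, y in zip(xs, ys)]
    c = [cin]                                  # c[k] is the carry into bit k
    for k in range(n + 1):
        c.append(g[n - k] or (p[n - k] and c[k]))
    for i in range(n + 1):
        for j in range(i + 1):
            assert c[i + 1] == (gpg(n, p, g, i, j) or (gpp(n, p, i, j) and c[j]))
```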

To compute all the carries in parallel and in advance, the carry $c_{i+1}$ should not depend on any $c_j$ with $0 < j \le i$; it may depend only on $c_0$, the overall carry-in. Therefore, the carries of PPA are computed according to the following variation of (12):

$$c_{i+1} = G_{i:0} + P_{i:0}\, c_0, \qquad 0 \le i \le n, \tag{16}$$

and different PPAs employ different parallel prefix methods to compute the group carries $G_{i:0}$ and $P_{i:0}$, for all $i$, for the sake of high performance. To capture the various PPAs in a uniform framework, this computation is described abstractly as a function groups in the following abstract module:

```coq
Module Type GroupCarries.
  Parameter groups: forall n,
    data2 (S n) -> data2 (S n).
  Axiom groups_correct: forall n
    (X Y: data (S n)),
    groups (P X Y, G X Y) =
    correct_groups (P X Y, G X Y).
End GroupCarries.
```

data2 n, in line 3, is the dependent type of a pair of vectors, both of length n. Therefore, the argument of groups stands for the vectors of propagated and generated carries, as shown in line 6. groups_correct, in line 4, is the assumption that the function groups is correct; correctness is expressed as an extensional equality between groups and another function known to be correct. In line 7, correct_groups is that function: it computes the group carries directly according to (14) and (15). Its correctness is established by first computing all the carries with this function and then proving that they are equal to the carries of CLA:

```coq
Definition correct_carries (n: nat) (c_in: bool)
  (X Y: data (S n)): hyb (S n).
set (PXY:= P X Y).
set (GXY:= G X Y).
set (bvGp:= correct_groups (PXY, GXY)).
set (all_c:= shift_map c_in bvGp).
exact (hd all_c, tl all_c).
Defined.
Lemma carries_correct: forall n (X Y: data (S n)) c_in,
  correct_carries c_in X Y = CLAdder.carries (P X Y) (G X Y) c_in.
```

shift_map, used in correct_carries, is a compositional operation: it first applies Equation (16) to all $G_{i:0}$ and $P_{i:0}$, which are stored in the second and first components of bvGp, respectively, and then shifts in the overall carry-in to obtain all the carries. Since the computation of $G_{i:j}$ depends on subgroups of the group propagated carries $P_{i:j}$, the *fundamental carry operator* "∘" of [19] is used in the function correct_groups to compute the group propagated and generated carries simultaneously, and it should be used in all implementations of the function groups:

$$(P_1, G_1) \circ (P_2, G_2) = (P_1 P_2,\; G_1 + P_1 G_2). \tag{17}$$

The function correct_groups is only one particular, verified implementation of groups. Many other implementations of groups are possible, based on the following lemmas, which are proved by induction on the difference between $i$ and $j$ using (14) and (15): for all $j \le k < i$,

$$P_{i:j} = P_{i:k+1}\, P_{k:j}, \qquad G_{i:j} = G_{i:k+1} + P_{i:k+1}\, G_{k:j}. \tag{18}$$

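Equation (18) can be rewritten as a single equation using the operator ∘: for all $j \le k < i$,

$$(P_{i:j}, G_{i:j}) = (P_{i:k+1}, G_{i:k+1}) \circ (P_{k:j}, G_{k:j}). \tag{19}$$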
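This decomposition, together with the overlapping variant mentioned next, can be checked exhaustively on small widths. The following Python sketch (hypothetical names; `p[k]` and `g[k]` are indexed by bit number rather than vector position) also checks that the operator is associative, the property that lets prefix networks regroup the computation freely.

```python
from itertools import product
from typing import List, Tuple

PG = Tuple[bool, bool]   # (P, G) of one group of bits

def o(hi: PG, lo: PG) -> PG:
    """Fundamental carry operator: (P1, G1) o (P2, G2) = (P1 P2, G1 + P1 G2)."""
    (p1, g1), (p2, g2) = hi, lo
    return (p1 and p2, g1 or (p1 and g2))

def group(p: List[bool], g: List[bool], i: int, j: int) -> PG:
    """(P_{i:j}, G_{i:j}) by the recursive definitions; p[k], g[k] belong to bit k."""
    P, G = p[j], g[j]
    for k in range(j + 1, i + 1):
        P, G = p[k] and P, g[k] or (p[k] and G)
    return P, G

# The operator is associative.
for a, b, c in product(product([False, True], repeat=2), repeat=3):
    assert o(a, o(b, c)) == o(o(a, b), c)

# Concatenated subgroups (l = k + 1) and overlapping ones (l <= k).
w = 5
for bits in product([False, True], repeat=2 * w):
    p, g = list(bits[:w]), list(bits[w:])
    for i in range(w):
        for j in range(i + 1):
            for k in range(j, i):
                for l in range(j, k + 2):
                    assert o(group(p, g, i, l), group(p, g, k, j)) == group(p, g, i, j)
```

Overlap is harmless because $G + P\,G = G$ and $P\,P = P$, so bits counted twice do not change the result.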

Equation (19) shows that any group carry can be computed from its concatenated (or even overlapping) subgroups, and a proper divide-and-conquer split of the bits of the input addends yields a high-performance implementation of groups. PPAs form a family of adders that differ only in how they compute groups; thus, a general PPA can be formalized as a module parameterized by GroupCarries:

```coq
Module PPAdder (Import M: GroupCarries) <: GenAdder.
Definition adder n (X Y: data (S n)) (c_in: bit)
  : hyb (S n).
(* all group propagated and generated carries, in advance *)
set (GT0:= groups (P X Y, G X Y)).
set (all_carries:=
  shift_map c_in (fst GT0) (snd GT0)).
set (sum:= (P X Y) ⊕ (tl all_carries)).
exact (hd all_carries, sum).
Defined.
Theorem adder_correct: forall n (X Y: data (S n)) c_in,
  |[X]| + |[Y]| + |c_in| = |(adder X Y c_in)|.
Proof.
intros n X Y c_in; unfold adder.
rewrite CLAdder.adder_correct.
unfold CLAdder.adder.
rewrite <- carries_correct.
unfold correct_carries.
rewrite groups_correct; trivial.
Qed.
End PPAdder.
```

Line 5 uses the abstract function groups from the parameterized module to compute all the group carries in advance. The function shift_map, in line 7, implements the operation in (16). Lines 6 and 8 compute all the carries and the sum, respectively.

The correctness of PPA is proved from the assumption groups_correct of the abstract parameterized module. Line 15 rewrites the left-hand side of the equation into the result computed by CLA. Line 17 uses the result that the carries of CLA are identical to the carries computed by (14), (15), and (16). Finally, the assumption groups_correct is used to prove that groups computes the same result as (14) and (15) do.
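The parameterization by GroupCarries can be imitated by passing the group-carry function as an argument. In the Python sketch below, which is illustrative only, `serial_groups` plays the role of correct_groups (a simple serial prefix), and `ppa` builds an adder from any such function by applying (16) and then shifting in the carry-in, as shift_map does. Swapping in a faster prefix network changes only the argument.

```python
from itertools import product
from typing import Callable, List, Tuple

Bits = List[bool]                       # big-endian
Data2 = Tuple[Bits, Bits]               # (propagate vector, generate vector)
Groups = Callable[[Data2], Data2]

def o(hi, lo):
    """Fundamental carry operator on (P, G) pairs."""
    (p1, g1), (p2, g2) = hi, lo
    return (p1 and p2, g1 or (p1 and g2))

def serial_groups(pg: Data2) -> Data2:
    """(P_{i:0}, G_{i:0}) at every position, by a serial prefix from the LSB."""
    p, g = pg
    out_p, out_g = [False] * len(p), [False] * len(p)
    acc = None
    for pos in reversed(range(len(p))):          # rightmost (least significant) first
        acc = (p[pos], g[pos]) if acc is None else o((p[pos], g[pos]), acc)
        out_p[pos], out_g[pos] = acc
    return out_p, out_g

def ppa(groups: Groups):
    """Analogue of the functor PPAdder: any groups function yields an adder."""
    def adder(X: Bits, Y: Bits, cin: bool) -> Tuple[bool, Bits]:
        p = [x ^ y for x, y in zip(X, Y)]
        g = [x and y for x, y in zip(X, Y)]
        gp, gg = groups((p, g))
        # c_{i+1} = G_{i:0} + P_{i:0} c_0 for every i, then shift in c_0
        all_c = [gi or (pi and cin) for pi, gi in zip(gp, gg)] + [cin]
        return all_c[0], [pi ^ ci for pi, ci in zip(p, all_c[1:])]
    return adder

def to_bits(v: int, w: int) -> Bits:
    return [bool((v >> (w - 1 - i)) & 1) for i in range(w)]

add = ppa(serial_groups)
for w in range(1, 6):
    for a, b, c in product(range(2 ** w), range(2 ** w), (False, True)):
        cout, s = add(to_bits(a, w), to_bits(b, w), c)
        assert (cout, s) == (bool((a + b + c) >> w), to_bits(a + b + c, w))
```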

The rest of this section shows, using the Kogge-Stone addition algorithm as an example, how this general PPA specializes to particular adders. The algorithm is formalized following [20], where it was proposed. Other implementations of PPA can be formalized and verified similarly.

```coq
Module Kogge_Stone_Group_Carry <: GroupCarries.
Fixpoint KS_PG_rec (n m: nat) (bvPG: data2 (S n))
  : data2 (S n) :=
  match m with
  | 0 => bvPG
  | S m' =>
    let recur := KS_PG_rec m' bvPG in
    data2_op1 recur
      (shiftin_group (power2 m') recur)
  end.
Definition groups n (PG: data2 (S n)):=
  KS_PG_rec (S (log2 (S n))) PG.
Theorem groups_correct: forall n (X Y: data (S n)),
  groups (P X Y, G X Y) = correct_groups (P X Y, G X Y).
Proof. … Qed.
End Kogge_Stone_Group_Carry.
```

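Lines 2 to 10 describe the main function, KS_PG_rec, which defines the Kogge-Stone implementation of the function groups. m is a simple counter indicating how many *stages* are needed; it should be the logarithm of the data-width. Initially, the input bvPG holds the two vectors of propagated and generated carries, that is, $P_{i:i} = p_i$ and $G_{i:i} = g_i$, for all $i$. At any stage $m$, this function computes the group carries of maximum length $2^m$. An 8-bit Kogge-Stone adder illustrates this procedure. At stage 2 ($m = 2$), the group carries of maximum length 4 have been computed according to line 8:

$$P = (P_{7:4}, P_{6:3}, P_{5:2}, P_{4:1}, P_{3:0}, P_{2:0}, P_{1:0}, P_{0:0}),$$
$$G = (G_{7:4}, G_{6:3}, G_{5:2}, G_{4:1}, G_{3:0}, G_{2:0}, G_{1:0}, G_{0:0}).$$

At the next stage ($m = 3$), as in lines 8 and 9, the function shiftin_group first shifts both vectors of the pair simultaneously $2^2 = 4$ times, shifting in 1 and 0, respectively. The result is represented by $(P', G')$:

$$P' = (P_{3:0}, P_{2:0}, P_{1:0}, P_{0:0}, 1, 1, 1, 1),$$
$$G' = (G_{3:0}, G_{2:0}, G_{1:0}, G_{0:0}, 0, 0, 0, 0).$$

Second, data2_op1 applies the fundamental operator ∘ of (17) position by position to the two operands $(P, G)$ and $(P', G')$, and the result is

$$(P_{7:0}, P_{6:0}, \ldots, P_{0:0}), \qquad (G_{7:0}, G_{6:0}, \ldots, G_{0:0}).$$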

In line 11, the function groups sets the number of stages to $\lfloor \log_2(n+1) \rfloor + 1$, where $n + 1$ is the data-width.

The correctness theorem cannot be proved by induction on the data-width as usual, because the Kogge-Stone implementation of groups recurses on the stages, not on the data-width, as shown in the definition of KS_PG_rec.

Note that in the theorem each side of the equation is a pair of vectors, so the equality holds if and only if the corresponding elements are pairwise identical:

```coq
Lemma data_eq_nth_eq_data2: forall n (gx gy: data2 (S n)),
  (forall k, k < S n ->
     (nth k (fst gx), nth k (snd gx)) = (nth k (fst gy), nth k (snd gy)))
  <-> gx = gy.
```

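However, since the result of KS_PG_rec changes with the stage $m$, the first thing to prove is an invariant of KS_PG_rec stating how it gradually approaches the result of correct_groups as the number of stages increases. Suppose, without loss of generality, that $p_k = 1$ and $g_k = 0$ for all $k < 0$; then, for all $m$ and all $0 \le i \le n$, the element of KS_PG_rec m (P, G) at bit $i$ is

$$\left(P_{i:\,i-2^m+1},\; G_{i:\,i-2^m+1}\right).$$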

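With this invariant, the existence of fixed points can be proved next: once $2^m \ge n + 1$, we have $i - 2^m + 1 \le 0$ for every bit $i$, so every element already equals $(P_{i:0}, G_{i:0})$, and further stages change nothing. The least such stage is $\lceil \log_2(n+1) \rceil$. Hence, for all $m$ with $2^m \ge n + 1$, KS_PG_rec m (P, G) = correct_groups (P, G). In particular, the function groups, which iterates data2_op1 for $\lfloor \log_2(n+1) \rfloor + 1$ stages, computes the same result as correct_groups, which is the correctness theorem. The whole proof has been carried out in COQ, although it is presented informally here for the reader's convenience.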
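The invariant and the fixed-point argument can also be exercised by brute force. The Python sketch below is a hypothetical model of KS_PG_rec: `shiftin_group` fills with the identity pair (1, 0), the combination step applies the fundamental operator position by position, and `ks_groups` uses `width.bit_length()` stages, which equals ⌊log₂ w⌋ + 1 for a width w ≥ 1. For every stage, it checks that the element at bit i equals $(P_{i:j}, G_{i:j})$ with $j = \max(i - 2^m + 1, 0)$, and finally that the result matches the serial definition.

```python
from itertools import product
from typing import List, Tuple

Bits = List[bool]                 # big-endian: bit i sits at position len - 1 - i
Data2 = Tuple[Bits, Bits]         # (propagate vector, generate vector)

def op_pairwise(hi: Data2, lo: Data2) -> Data2:
    """data2_op1 analogue: the fundamental operator, position by position."""
    (p1, g1), (p2, g2) = hi, lo
    return ([a and b for a, b in zip(p1, p2)],
            [ga or (pa and gb) for pa, ga, gb in zip(p1, g1, g2)])

def shiftin_group(k: int, pg: Data2) -> Data2:
    """Shift both vectors k places toward the MSB, filling with (1, 0)."""
    p, g = pg
    k = min(k, len(p))
    return p[k:] + [True] * k, g[k:] + [False] * k

def ks_pg_rec(m: int, pg: Data2) -> Data2:
    if m == 0:
        return pg
    recur = ks_pg_rec(m - 1, pg)
    return op_pairwise(recur, shiftin_group(2 ** (m - 1), recur))

def ks_groups(pg: Data2) -> Data2:
    width = len(pg[0])
    return ks_pg_rec(width.bit_length(), pg)   # floor(log2 width) + 1 stages

def group(p: Bits, g: Bits, i: int, j: int) -> Tuple[bool, bool]:
    """(P_{i:j}, G_{i:j}) by the serial definition, for bits i >= j >= 0."""
    w = len(p)
    P, G = p[w - 1 - j], g[w - 1 - j]
    for k in range(j + 1, i + 1):
        P, G = p[w - 1 - k] and P, g[w - 1 - k] or (p[w - 1 - k] and G)
    return P, G

for w in range(1, 7):
    for bits in product([False, True], repeat=2 * w):
        p, g = list(bits[:w]), list(bits[w:])
        for m in range(w.bit_length() + 1):
            sp, sg = ks_pg_rec(m, (p, g))
            for i in range(w):
                j = max(i - 2 ** m + 1, 0)        # the invariant after stage m
                assert (sp[w - 1 - i], sg[w - 1 - i]) == group(p, g, i, j)
        P0 = [group(p, g, i, 0)[0] for i in reversed(range(w))]
        G0 = [group(p, g, i, 0)[1] for i in reversed(range(w))]
        assert ks_groups((p, g)) == (P0, G0)
```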

The Kogge-Stone adder is obtained by combining the general PPA module with this specific Kogge-Stone module for computing the group carries, which provides not only the computation method but also the correctness proof:

```coq
Module Kogge_Stone <: GenAdder :=
  PPAdder Kogge_Stone_Group_Carry.
```

#### 6. Conclusion and Future Work

In this work, we proposed a holistic methodology to formalize and verify primary adders (RCA, CLA, LA, and PPA) in the theorem prover COQ. They are formalized using dependent types, higher-order recursion, and the module system, in order to provide fidelity, scalability, and modularization.

In particular, PPA is a family of adders sharing the same structure, only differing in the methods of parallel prefix computing. We provide a novel way to describe the general PPA and show how to use this general module to verify a specific PPA, exemplified by Kogge-Stone adder.

Other advanced arithmetic designs can be verified compositionally by reusing the formalizations and verifications of this work, as illustrated by the example of carry select adders.

All the work has been carried out in COQ. The whole development contains around 2,000 lines of COQ scripts, only about one third of the size of [12], another work dedicated to verifying addition designs in COQ. Thus, this work uses fewer scripts but verifies more addition designs than [12].

This work can be continued in two directions. First, advanced arithmetic designs, such as IBM POWER6, can be verified incrementally on top of these verified adders. Second, since the constructive formalization corresponds clearly to gate-level descriptions of circuits, HDL code could be generated from the verified designs, which may provide an alternative way to design correct arithmetic implementations.

#### Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

#### Acknowledgments

This work has been supported by the National Science Foundation of China Grant 61272002, the Tsinghua National Laboratory for Information Science and Technology (TNList) Cross-discipline Foundation 2011-9, the Major Research plan of the National Natural Science Foundation of China Grant 91218302, and the National Basic Research Program of China (973 Program) Grant 2010CB328003.