Abstract

Addition usually affects the overall performance of digital systems and an arithmetic function. Adders are most widely used in applications like multipliers, DSP (i.e., FFT, FIR, and IIR). In digital adders, the speed of addition is constrained by the time required to propagate a carry through the adder. Various techniques have been proposed to design fast adders. We have derived architectures for carry-select adder (CSA), Common Boolean Logic (CBL) based adders, ripple carry adder (RCA), and Carry Look-Ahead Adder (CLA) for 8-, 16-, 32-, and 64-bit length. In this work we have done comparative analysis of different types of adders in Synopsis Design Compiler using different standard cell libraries at 32/28 nm. Also, the designs are analyzed for the stuck at faults (s-a-0, s-a-1) using Synopsis TetraMAX.

1. Introduction

In any processing unit, adders are basic building block. We look for an area and power efficient fast adders to increase system performance [1, 2]. Ripple carry adders (RCA) [3] are area and power efficient, but with drawback of being slow. Carry-select adders (CSA) [2, 410] are one of the fastest adders among traditional adders, but they are not power and area efficient. Common Boolean Logic (CBL) [7] adders are area-power-delay efficient adders. Carry Look-Ahead Adder (CLA) [11] is among the fastest adders with area overhead.

The organization of this paper is as follows. Section 2 deals with the analysis of adder using different topologies. Section 3 presents logic for CBL adder design. Section 4 deals with comparative analysis based on delay, area, and power among different adder architectures using Synopsis Design Compiler. The fault testing concept (s-a-0, s-a-1) for various adders is also discussed. Section 5 gives the conclusion followed by the references.

2. Different Topologies for Adder

In this section, we have discussed in brief the different architectures of adder, namely, ripple carry adder (RCA), Carry Look-Ahead Adder (CLA), CSA, SQRT-CSA, and Common Boolean Logic (CBL) adder.

2.1. Ripple Carry Adder (RCA)

It is basic parallel adder where a chain of adders is cascaded with carry rippled from one stage to another. Block diagram for 4-bit RCA is shown in Figure 1(a). Delay is due to the rippling of carry since the consequent blocks of adder have to wait for the carry generated from the previous adder. Delay of -bit RCA adder is where is input setup time (for generating XOR and AND of basic inputs), is the delay of two gates, that is, AND gate and OR gates, and is the delay of the XOR gate.

A 64-bit RCA is shown in Figure 1(b). There are 16 blocks and each block is a 4-bit RCA. Each block has 4 full adders except the first block which has 3 full adders and 1 half adder as the lowest bit of addends can be added using the half adder instead of a full adder.

2.2. Carry Look-Ahead Adder (CLA)

CLA use the concept of generating carry and propagating carry. We know that is generating function which generates the carry regardless of the value of and   is propagating function. Carry propagates if either of or is 1:On simplifying, we haveExpressions shown above are implemented using a two-level AND-OR circuit so that can be evaluated very quickly. Due to independent computation of carry, the CLA based adder is fast as compared to RCA but at a cost of increased hardware. Block diagram for 4-bit CLA is shown in Figure 2(a).

Since circuit of CLA Adder for higher order bit is complex to implement directly, we have realized it by cascading 4-bit CLA blocks. Figure 2(b) shows 64-bit adder based on CLA blocks. There are 16 blocks. Each block is of 4 bits and has circuit as shown in Figure 2(a). The carry generated from any block is rippled to its successor block. This is not a pure 64-bit adder. It has been implemented just for study purpose.

2.3. Carry-Select Adder (CSA)

The carry-select adder consists of two ripple carry adders and a set of multiplexers. The block diagram for 4-bit addition using CSA is given in Figure 3(a). For adding two 4-bit numbers using CSA, we require two 4-bit full adders and that can be ripple carry adder (RCA) or Carry Look-Ahead Adder (CLA). 1st stage computes the 4-bit addition using and 2nd stage computes the same with . After the two results are available, the correct sum and carry-out are then decided by the multiplexer once the correct carry is known [4]. For designing 64-bit adder, we can cascade the structure shown in Figure 3(a). It means we need 16 stages of CSA to create 64-bit CSA adder. The number of bits in each CSA block is uniform.

Figure 3(b) shows block diagram of 64-bit CSA. There are 2 blocks, each having 8 blocks consisting of 4-bit RCA. All blocks other than RCA0 are in pair whose is connected to logic 0 or 1 which acts as and the results of whom block is to be considered depend upon the carry propagating from the previous block which is controlled by the MUX which can be seen after every pair of blocks (smaller). Depending upon the carry the sum is also considered to be taken. The bunch of MUX can be seen at the end of both blocks; these control the sum. The RCA block can be of any bit not necessarily of 4 bits and the number of adders in each RCA block controls the speed and power dissipation. The more the number of multiplexers in the circuit, the more the area and power dissipation.

A CLA-CSA can be designed in the same fashion. The only difference will be that the RCA block will be replaced by the CLA blocks. The block diagram of 64-bit CLA-CSA is shown in Figure 3(c).

2.4. Square-Root Carry-Select Adder (SQRT-CSA)

A 16-bit CSA with variable size is created by cascading four CSA with variable input size. The block diagram for the same is given in Figure 4(a). We have five stages of CSA with different inputs bits to each stage. We have an adder with block sizes of 2, 2, 3, 4, and 5, respectively. In the modified SQRT-CSA, the block of CSA with is replaced with BEC (Binary to Excess Converter) [5].

Figure 4(b) shows the block diagram of 64-bit SQRT-CSA. We have an adder with block sizes of 2, 3, 4, 5, 6, 7, 8, 9, 10, and 10, respectively.

Another interesting adder is based on CBL. It is explained in Section 3 in detail.

3. Common Boolean Logic Adders

Table 1 shows the output pattern of 1-bit full adder. From the table we conclude that if the carry propagating from previous adders is “0,” the current sum () and the carry-out bit () are the XOR and AND of the two input bits, respectively. On the other hand, if the carry propagating from previous adder is “1,” the current sum () bit is the XNOR and the carry-out bit () is OR of the two input bits, respectively. Hence,

The implementation of the CBL based adder is shown in Figure 5(a).

From Figure 5(a), it can be seen that MUX decides the final sum depending upon the carry propagating from previous logic cell. The carry which will propagate will decide the final sum and the carry for the next logic cell. If the carry propagating from previous adder () is “0” then the sum () will be ; otherwise would be . Similarly if C0 is “0” then carry which will propagate to next cell would be ; otherwise propagating carry would be .

Delay of -bit CBL adder iswhere is as defined earlier, is the delay of a 2 × 1 MUX, and is the delay of the XOR gate.

Figure 5(b) shows the block diagram of 64-bit Common Boolean Logic based adder. There are 4 blocks, having 16 blocks each, which means 64 blocks of Common Boolean Logic.

4. Comparative Analysis of Different Adder Architecture

In this section, we have presented the comparison of delay, power, and area of various adders using 32/28 nm digital standard cell library using Synopsis Design Compiler.

4.1. Design Compiler

Design Compiler is developed by Synopsis. Important aspects of any digital circuit like area, power, delay, and so forth can be calculated by Design Compiler just providing the HDL code of the design and digital standard cells library of the technology on which you want to work. Different standard cells libraries are provided by the Synopsis for the educational purpose. The libraries are in three formats  .db,  .lib, and  .v. In Design Compiler we need  .db format;  .lib format is readable format of the  .db format.

4.2. Area-Delay Estimation of Basic Circuit Using Design Compiler

We have considered 2 input AND, OR, and XOR gates. The area, delay, Dynamic Power Dissipation, and leakage are given in Table 2. Leakage power is considered for  V DC at room temperature as per the data sheet. Operating frequency is 500 MHz.

For the understanding purpose, we are analyzing a simple circuit of full adder as shown in Figure 6. The critical path of the full adder is highlighted in the figure. It should be the double of the XOR gate, that is, .16 ns, which is exactly the same as the propagation delay as per the Design Compiler.

The area of the circuit should be the area of the 2 XOR gates (i.e., 8.640896 ), 2 AND gates (i.e., 4.066308 ), one OR gate (i.e., 2.033152 ), and the area of the nets (i.e., 1.070661 ) which is equal to 15.811017 . The area calculated by Design Compiler is 15.811013 which is approximately the same.

We have computed the area, power, or delay for the different adder circuit. We have developed HDL code for each adder module for different input bit length. Figures 7, 8, and 9 show the delay for the various adders for 32/28 nm technology for HVT (High Voltage Threshold), RVT (Regular Voltage Threshold), and LVT (Low Voltage Threshold), respectively, for different adder architecture for 8-bit, 16-bit, 32-bit, and 64-bit adder. RVT is equivalent to standard Voltage Threshold. Units are in nanoseconds. EDK libraries for 32 and 28 nm technology are the same as provided by the Synopsis.

Figures 10, 11, and 12 show the Dynamic Power Dissipation for 32/28 nm technology for HVT, RVT, and LVT, respectively. Units are in microwatt. Figure 13 shows silicon area required to design hardware for various adders of different length. Units are in square micrometer.

Figure 14 shows leakage power dissipation for different adder architecture for 8-bit, 16-bit, 32-bit, and 64-bit adders. Units are in microwatt.

4.3. Required Test Patterns to Test Various Adder Circuits

In the analysis of digital circuits, testing plays a very important role [12]. In the traditional testing we used to apply all the input and check the output corresponding to the applied inputs. If all the results used to be fine then we used to declare the hardware good only. As the size of the digital design increases the number of the input patterns increases. Then different methods were introduced to minimize the number of the input test patterns. Some of the methods like ATPG are discussed in [12] to save the time. As in the environment of such huge competition where the manufacture wants to launch the product as soon as possible, it requires lesser number of input test patterns. The hardware which will need lesser input test patterns to test will be better as per the testing point of view.

Similarly, to find the stuck at faults we need some input patterns. We apply those patterns to the input of the design and check the output. If all the outputs corresponding to all the inputs are correct we can say that the hardware is free from stuck at faults.

TetraMAX is used to generate pattern using ATPG for different circuits. With TetraMAX, designers can generate high-quality manufacturing test patterns without compromising on high performance design techniques. While such techniques may impede other ATPG tools, TetraMAX is able to obtain coverage on the resulting complex logic. Input test patterns using ATPG for different digital circuits are generated by TetraMAX and details are given in Figure 15. These patterns can be used to test the circuit for various stuck at faults after manufacturing.

Figure 15 shows number of input test patterns required to test various adders for 8, 16, 32, and 64 bits. For example for 8-bit RCA we will need only 6 patterns to declare the hardware good while there can be total of 65536 input patters to add two 8-bit values.

5. Conclusion

A comparative analysis of various digital adders has been performed in this paper. The parameters of comparison were delay, area, and power (dynamic and leakage) using digital standard cell library. Test generation analysis on the adders was also performed.

With Respect to Delay. For 8-bit adders RCA-CSA is approximately 3% faster than SQRT-CSA and CLA-CSA, 42% faster than RCA and CSA-CBL, and 60% faster than CLA.

For 16-bit adders RCA-CSA is approximately 2% faster than SQRT-CSA and CLA-CSA, 90% faster than CSA-CBL and RCA, and 125% faster than CLA.

For 32-bit adders SQRT-CSA is approximately 14% faster than RCA-CSA and CLA-CSA, 170% faster than CSA-CBL and RCA, and 210% faster than CLA.

For 64-bit adders SQRT-CSA is approximately 33% faster than RCA-CSA and CLA-CSA, 260% faster than CSA-CBL and RCA, and 310% faster than CLA.

Adders with LVT cells are 20% and 60% slower than with RVT and HVT cells, respectively.

With Respect to Power Dissipation. CLA consumes power approximately 35% lesser than RCA, 40% lesser than CSA-CBL, 50% lesser than CLA-CSA, and 70% lesser than SQRT-CSA and RCA-CSA.

Adders with HVT cells are 1.5% and 2% lesser than with RVT and LVT cells, respectively.

With Respect to Area. CLA require area approximately 15% lesser than RCA, 30% lesser than CSA-CBL, 50% lesser than CLA-CSA, and 60% lesser than RCA-CSA and SQRT-CSA.

With Respect to Leakage Power. CLA have leakage power approximately 15% lesser than RCA, 30% lesser than CSA-CBL, 50% lesser than CLA-CSA, and 60% lesser than RCA-CSA and SQRT-CSA.

Considering the number of input test patterns required to test the adder CLA, RCA, and CSA-CBL architecture are far better than CSA. Also among non-CSA architectures CLA is the best and CSA-CBL is the worst while among CSA architectures CLA-CSA is better than RCA-CSA and SQRT-CSA.

So depending upon the requirements we can choose the adder accordingly.

Similar analysis can also be performed on other various adders for different standard digital libraries, that is, 360 nm and 90 nm.

Competing Interests

The authors declare that they have no competing interests.