Research Article  Open Access
V. Kokilavani, K. Preethi, P. Balasubramanian, "FPGABased Synthesis of HighSpeed Hybrid Carry Select Adders", Advances in Electronics, vol. 2015, Article ID 713843, 13 pages, 2015. https://doi.org/10.1155/2015/713843
FPGABased Synthesis of HighSpeed Hybrid Carry Select Adders
Abstract
Carry select adder is a squareroot time highspeed adder. In this paper, FPGAbased synthesis of conventional and hybrid carry select adders are described with a focus on high speed. Conventionally, carry select adders are realized using the following: (i) full adders and 2 : 1 multiplexers, (ii) full adders, binary to excess 1 code converters, and 2 : 1 multiplexers, and (iii) sharing of common Boolean logic. On the other hand, hybrid carry select adders involve a combination of carry select and carry lookahead adders with/without the use of binary to excess 1 code converters. In this work, two new hybrid carry select adders are proposed involving the carry select and sectioncarry based carry lookahead subadders with/without binary to excess 1 converters. Seven different carry select adders were implemented in Verilog HDL and their performances were analyzed under two scenarios, dualoperand addition and multioperand addition, where individual operands are of sizes 32 and 64bits. In the case of dualoperand additions, the hybrid carry select adder comprising the proposed carry select and sectioncarry based carry lookahead configurations is the fastest. With respect to multioperand additions, the hybrid carry select adder containing the carry select and conventional carry lookahead or sectioncarry based carry lookahead structures produce similar optimized performance.
1. Introduction
Carry select adder (CSLA) belongs to the family of highspeed squareroot time adders [1, 2] and provides a good compromise between the low area occupancy of ripple carry adders (RCAs) and the highspeed performance of carry lookahead adders (CLAs) [2, 3]. In the existing literature, many flavors of carry select addition have been realized on both ASIC and FPGA platforms with ASIC implementations being predominant. CSLAs usually involve duplication of RCA structures with presumed carry inputs of binary 0 and binary 1 to enable parallel addition and thereby speed up the addition process [3, 4]. To minimize the area metric of CSLAs owing to replication of the RCA structures, an addone circuit (also called, binary to excess1 converter, viz. BEC) is introduced [5–7]. Carry select addition can also be performed by utilizing the common Boolean logic (CBL) [8] shared between the sum and carry outputs of a full adder [9]. Nevertheless, due to the serial cascading of full adder modules, the delay metric would not decrease although the area parameter would reduce. Further, optimizations at the device, gate levels [10–15], and realization styles [16, 17] have been carried out to reduce area, improve speed, and minimize the powerdelay product of CSLAs on the basis of semicustom and fullcustom ASICstyle synthesis. Rather than realizing pure CSLAs, hybrid architectures incorporating carry select and carry lookahead structures have also been proposed [18–21] to improve the design efficiency of CSLAs. Moreover, some FPGA implementations of CSLAs have been attempted [21–23]. Overall, a survey of published literature reveals that CSLAs have been widely implemented using the following topologies and computational elements:(i)(Conventional) CSLA – full adders and 2 : 1 multiplexers (MUXes)(ii)CSLA with BEC – Full adders, BECs, and 2 : 1 MUXes(iii)CSLA based on CBL sharing(iv)Hybrid CSLA and CLA structures(v)Hybrid CSLA and CLA including BECs.In general, CSLAs are composed using a carry select architecture with/without BECs or may consist of a mix of carry select and carry lookahead configurations with/without BECs. CSLAs constructed using pure carry select structures are called “homogeneous CSLAs” and CSLAs realized using a combination of carry select and carry lookahead structures are labeled as “heterogeneous/hybrid CSLAs.” The interest behind hybrid CSLAs is supported by the fact that heterogeneous adders tend to better optimize the design metrics compared to homogeneous adders [24]. In a recent work [25], sectioncarry based CLAs (SCBCLAs) were proposed as an alternative to conventional CLAs; for a 32bit addition operation, the SCBCLA was found to exhibit reduced propagation delay than the conventional CLA by 15.2%. Motivated by this result, two new hybrid CSLA architectures are proposed in this work, a hybrid CSLA incorporating CSLA and SCBCLA and another hybrid CSLA embedding CSLA, SCBCLA, and BECs. This paper builds upon our prior work [21] by analyzing the performance of different CSLA architectures with respect to diverse input partitions for different addition widths for the case of dualoperand addition and further evaluates the efficacy of the conventional and proposed CSLAs with respect to multioperand additions.
The remaining part of this paper is organized as follows. With 8bit addition as a running example, Section 2 describes the conventional CSLA topologies with and without BEC logic and also the CSLA based on sharing of CBL. Section 3 presents the architectures of hybrid CSLAs incorporating CLAs and SCBCLAs with/without BEC logic. In Section 4, the performance of different CSLA topologies is evaluated for dualoperand and multioperand additions with operand sizes of 32 and 64bits. Finally, the conclusions follow in Section 5.
2. Homogeneous CSLA Architectures
The RCA and homogeneous CSLA architectures are shown in Figure 1 for an example case of 8bit addition. Figure 1(a) depicts an 8bit RCA, which is formed by a cascade of full adder modules; the full adder [9] is an arithmetic building block that adds an augend and addend bit (say, and ) along with any carry input (cin) and produces two outputs, namely, sum (Sum) and carry overflow (Cout). Since there is a rippling of carry from one full adder stage to another, the propagation delay of the RCA varies linearly in proportion to the adder width. The CSLA basically partitions the input data into groups and addition within the groups is carried out in parallel; that is, the CSLA is composed of partitioned and duplicated RCAs. It can be seen from Figure 1 that the least significant 4bit adder stages of RCA and CSLAs are identical. However, the carry produced by the least significant nibble is simply propagated through the more significant nibble in the case of the RCA bitbybit, while the carry corresponding to the least significant nibble serves as the selection input for MUXes present in the more significant position in the case of CSLAs.
(a) 8bit RCA
(b) 8bit conventional CSLA comprising full adders and 2 : 1 MUXes (CSLA type)
(c) 8bit conventional CSLA comprising full adders, 2 : 1 MUXes, and BEC logic (CSLA_BEC type)
Figure 1(b) shows the 8bit conventional CSLA comprising full adders and 2 : 1 MUXes, henceforth referred to as simply “CSLA.” In the case of CSLA shown in Figure 1(b), the full adders present in the most significant nibble position are duplicated with carry inputs (cin) of 0 and 1 assumed; that is, one 4bit RCA with a carry input (“cin”) of 0 and another 4bit RCA with a carry input (“cin”) of 1 are used. Notice that both these RCAs have the same augend and addend inputs. While the least significant 4bit RCA would be adding the augend inputs ( to ) with the addend inputs ( to ), the more significant 4bit RCAs would be simultaneously adding up the augend inputs ( to ) with the addend inputs ( to ), with presumed carry inputs (cin) of 0 and 1. Due to two addition sets, two sets of sum and carry outputs are produced, one based on 0 as the carry input and another based on 1 as the carry input, which are in turn fed as inputs to the 2 : 1 MUXes. The number of MUXes used depends on the size of the RCA duplicated. To determine the true sum outputs and the real value of carry overflow pertaining to the most significant nibble position, the carry output () from the least significant 4bit RCA is used as the common select input for all the MUXes; thereby the correct result corresponding to either the RCA with 0 as the carry input or the RCA with 1 as the carry input is displayed as output.
Figure 1(c) portrays the 8bit CSLA containing full adders, 2 : 1 MUXes, and BEC logic, henceforth identified as “CSLA_BEC”. Figure 1(c) also shows the internals of the 5bit BEC, which is depicted by the circuit shown within the oval. The CSLA_BEC is rather different from the CSLA in that instead of having an RCA with a presumed carry input of 1 in the more significant nibble position, the BEC circuit is introduced. The BEC logic adds binary 1 to the least significant bit of its binary inputs and produces the resultant sum and carry as output. As seen in Figure 1(c), the BEC accepts as inputs the sum and carry outputs of the RCA having a presumed carry input of 0, adds binary 1 to this input, and produces the resulting sum and carry overflow as output. Now the correct result exists between choosing the output of the RCA with a presumed carry input of 0 and the output of the BEC logic. The carry output of the least significant RCA is used to determine the correct set of the most significant nibble position sum and carry outputs. The logic equations governing the 5bit BEC are given below. In the equations, signifies logical inversion, implies logical XOR, and represents logical conjunction. Consider The CSLA constructed on the basis of sharing of CBL is depicted through Figure 2, which will be referred to as “CSLA_CBL” henceforth. The CSLA_CBL adder is founded upon utilizing the full adder logic, whose underlying equations are given below with , , and cin being the primary inputs, and Sum and Cout being the primary outputs. In (3), “+” implies logical disjunction: Referring to (2) and (3), it may be understood that, for a carry input (cin) of 0, (2) and (3) reduce to and . With , (2) and (3) become and . Based on this principle, sum and carry outputs for both possible values of input carries are generated simultaneously and fed as inputs to two 2 : 1 MUXes. The correct sum and carry outputs are determined by the carry input, serving as the select input for the two MUXes. Though the exorbitant duplicated RCA and RCA with BEC logic structures are eliminated through this approach, leading to savings in terms of area, nevertheless, since the carry propagates from stagetostage, the critical data path delay tends to be proportional to the size of the full adders cascade. As a consequence, the delay of the CSLA_CBL adder may be close to that of the RCA which is confirmed by the simulation results given in Section 4.
3. Heterogeneous/Hybrid CSLA Architectures
Apart from synthesizing basic CSLA topologies viz. CSLA, CSLA_BEC, and CSLA_CBL, hybrid CSLA architectures involving CSLA and CLA/SCBCLA were also implemented with the intention of minimizing the maximum propagation path delay. It is well known that a CLA is faster than a RCA, and hence it may be worthwhile to have a CLA as a replacement for the least significant RCA in the CSLA structure. Although the concept of carry lookahead is widely understood, the concept of sectioncarry based carry lookahead may not be that well known, and hence to explain the distinction between the two, sample 4bit lookahead logic realized using these two approaches is portrayed in Figure 3 for an illustration. For details on different sectioncarry based carry lookahead structures and SCBCLA constructions using them, an avid reader is directed to references [25–27], which constitute prior works in the realm of synchronous and asynchronous designs.
The sectioncarry based carry lookahead generator shown enclosed within the circle in Figure 3 produces a single lookahead carry signal corresponding to a “section” or “group” of the adder inputs (hence the term “sectioncarry”), while the conventional carry lookahead generator encapsulated within the rectangle produces multiple lookahead carry signals corresponding to each pair of augend and addend primary inputs. The sectioncarry based carry lookahead generator differs from the traditional carry lookahead generator in that bitwise lookahead carry signals are not required to be computed for the former. The XOR and AND gates used for producing the necessary propagate and generate signals ( to and to ) are highlighted using dotted lines in Figure 3; these constitute the propagategenerate logic referred to in Figures 4 and 5.
(a) 8bit hybrid CSLA with a conventional CLA in the least significant stage
(b) 8bit hybrid CSLA featuring BEC with a least significant CLA stage
(a) 8bit hybrid CSLA with 4bit SCBCLA in the least significant stage
(b) 8bit hybrid CSLA incorporating BEC with a least significant SCBCLA stage
8bit hybrid CSLAs with/without BEC logic and comprising a CLA in the least significant stage viz. “CSLACLA” and “CSLA_BECCLA” adder types are shown in Figure 4. On the other hand, 8bit hybrid CSLAs with/without BEC logic and incorporating a SCBCLA in the least significant stage viz. “CSLASCBCLA” and “CSLA_BECSCBCLA” adder varieties are portrayed in Figure 5. Both the conventional CLA and SCBCLA constitute three functional blocks: propagategenerate logic, lookahead carry generator, and the sum producing logic. Not only is the carry lookahead generator different for CLA and SCBCLA adders, but the sum producing logic is also different; in case of CLA, the sum producing logic comprises only XOR gates, whereas in the SCBCLA, the sum producing logic consists of full adders and an XOR gate, with the XOR gate providing the sum of the primary inputs , , and . While rippling of carries occurs internally within the carrypropagate adder constituting the SCBCLA and producing the requisite sums, the lookahead carry signal corresponding to an adder section is generated independently (in parallel) and serves as the lookahead carry input for the successive CSLA stage.
4. Results and Discussion
Three homogeneous CSLA architectures viz. CSLA, CSLA_BEC, and CSLA_CBL and four heterogeneous CSLA architectures viz. CSLACLA, CSLA_BECCLA, CSLASCBCLA, and CSLA_BECSCBCLA were described topologically in Verilog HDL similar to previous works [16, 21–23, 25] to perform two kinds of addition operations viz. dualoperand addition and multioperand addition. For dualoperand addition, two binary operands having corresponding sizes of 32bits and 64bits were considered. For multioperand addition, addition of four binary operands, each of size 32bits, and another multioperand addition involving four binary operands with each having size of 64bits were considered. Moreover, two types of multioperand additions were performed based on (i) carry save adder (CSA) topology, and (ii) bitpartitioned addition scheme. All the adders were synthesized using a 90 nm FPGA (XC3S1600E) [28], with speed optimization specified as the design goal in the Xilinx 9.1i ISE design suite. The critical path delay and area values (in terms of number of basic logic elements viz. BELs) were ascertained after automatic placeandroute. The results of dualoperand additions shall be presented first, followed by the results obtained for multioperand additions.
4.1. DualOperand Addition
CSLAs can be implemented on the basis of uniform or nonuniform primary input partitions; accordingly they are labeled as “uniform” or “nonuniform” CSLAs, in a structural sense. “Input partitioning” basically means splitting up of the primary inputs into groups of inputs so as to pave the way for addition to be done in parallel within the partitions; it should be noted that input partitioning is inherent to all CSLAs except the CSLA_CBL type (shown in Figure 2) which has a regular carry select structure and hence is void of input partitions. Referring to Figure 1(b), it can be seen that 8 pairs of inputs have been split into two uniform or equalsized groups of 4input pairs; thus it can be said that the 8bit CSLA is realized according to a 44 input partition.
For synthesis, 3 uniform input partitions (44444444, 8888, and 1616) and 2 optimum nonuniform input partitions (37654322 [29] and 8764322 [15]) were considered for realizing the 32bit CSLAs. Figure 6 visually portrays the variations in propagation delay corresponding to different primary input partitions for the six CSLA types. On the other hand, 4 uniform input partitions viz. 4444444444444444, 88888888, and 16161616, 3232, and a nonuniform input partition viz. 810987654322 [29] were considered for realizing the 64bit CSLAs. Figure 7 depicts the propagation delay variations subject to different primary input partitions for the six CSLA architectures. The trend line highlighted in Figure 6 shows that the uniform 8888 input partition consistently paves the way for least propagation delay (varying from 17 ns to 20 ns) with respect to various 32bit homogeneous and heterogeneous CSLAs. Similarly the trend line indicated in black in Figure 7 conveys that the uniform 16161616 input partition results in the least data path delay (varying from 27 ns to 29 ns) for the different homogeneous and heterogeneous 64bit CSLAs.
The maximum combinational path delay (also called, “critical path delay”) encountered and the total number of BELs consumed by different homogeneous and heterogeneous CSLAs to perform the addition of two 32bit operands and two 64bit operands separately is shown in Tables 1 and 2, respectively. The optimum delay and area values are in bold font in the tables. Note that the symbol signifies the proposed hybrid CSLA architectures in the tables.


From Table 1, it is evident that the CSLASCBCLA hybrid adder based on the 8888 input partition features the least propagation delay (17.897 ns) amongst all homogeneous and hybrid CSLAs, and hence the 8888 input partition is deemed to be optimum. The 32bit RCA has critical path delay of 30.604 ns, while the 32bit CSLA_CBL adder is found to have the longest path delay of 37.604 ns. Compared to the maximum delay of the hybrid CSLASCBCLA, the hybrid CSLA_BECSCBCLA adder which is another proposed hybrid CSLA topology has a comparable speed performance of 18.052 ns. However with respect to area, the RCA and CSLA_CBL structures require less number of BELs than all the CSLAs. Hence it is inferred from Figure 6 and Table 1 that for the addition of two input operands having sizes of 32bits the hybrid CSLASCBCLA adder is preferable over all other homogeneous and heterogeneous CSLAs and the favorable input data partition is 8888.
Based on a similar observation, by referring to Figure 7 and Table 2, it can be seen that the 16161616 input partition is found to be optimum from a delay (i.e., speed) perspective for 64bit dualoperand addition. The proposed CSLA_BECSCBCLA constructed using the 16161616 input data partition leads to the least latency amongst all other adder topologies; however, the other proposed CSLA viz. CSLASCBCLA based on a similar input partition features almost a similar delay metric. In terms of area occupancy though, the 64bit RCA is optimized. Nevertheless, the RCA encounters considerably more data path delay by 1.6× in comparison with the proposed CSLA_BECSCBCLA based on a 16161616 input partition.
4.2. Multioperand Addition
The performance of different homogeneous and heterogeneous CSLAs is evaluated based on the case studies of multioperand addition involving 4 binary operands, with respective sizes of 32bits and 64bits. Two multioperand addition schemes are considered, one involving the carry save adder (CSA) topology, and another involving the bitpartitioning method.
4.2.1. CSA Based Multioperand Addition
The structure of an example CSA used to add four bit binary numbers is shown in Figure 8. Here, to , to , to , and to represent the primary inputs and the sum bits and to Sum_{0} represents the primary outputs. The subscript 0 denotes the LSB and the subscript () denotes the MSB. As shown in Figure 8, there are three adders in three levels to perform the addition of four input operands. In each CSA, the carry output signal of the current bit at a level is not transferred to the next bit adder of the same level as the carry input; instead, the carry output is transferred to the next bit adder in the lower level as the carry input. In the toplevel adder, three numbers (, , and ) are added simultaneously; that is, the bits corresponding to any number could act as input carries for the full adders of the first level CSA. In the next lower level, an extra number () is added. The adder in the bottom level, shown within the ellipse in Figure 8, is a simple RCA which is what portrayed here but it may be any dualoperand adder that can be used to compute the final sum.
Experimentation was performed by having different dualoperand adders viz. RCA and various homogeneous and heterogeneous CSLAs in the final adder stage of the CSA, shown in Figure 8, to analyze their relative performance for two different addition scenarios: (i) addition of four binary operands, each of size 32bits, and (ii) addition of four binary operands with each having size of 64bits.
The FPGAbased synthesis results viz. delay and area obtained for the addition of four binary operands, each having size of 32bits, are given in Table 3 with the optimized values in bold font. Since the 8888 primary input partition was found to yield the least data path delay, as evident from Figure 6 and Table 1, it was preferred for the various CSLA realizations. It can be seen from Table 3 that the hybrid CSLA_BECCLA when used in the final adder stage of the CSA encounters the least propagation delay, with the proposed CSLA_BECSCBCLA adder closely following it with just a 1.7% delay difference. The conventional CLA, when used in the final adder stage of the CSA as a “homogeneous adder,” reports a critical path delay of 34.306 ns. On the contrary, when the conventional CLA is used along with the CSLA inclusive of the BEC as a “heterogeneous adder” (CSLA_BECCLA), it enables considerable decrease in maximum data path delay by 37.8% vindicating the observation made in [24] that heterogeneous adders are preferable over homogeneous adders for delay optimization. Although the use of RCA and CSLA_CBL adders in the final adder stage of the CSA helps to minimize the area occupancy compared to their counterparts, they suffer from an exacerbated increase in delay of about 87% over the CSLA_BECCLA type.

The synthesis results obtained for the addition of four binary operands, each having sizes of 64bits, is shown in Table 4 and the optimized values are in bold font. Since the 16161616 uniform input partition was found to be delay optimal (refer to Figure 7 and Table 2), it was adopted for implementing all the CSLAs. Again, the CSLA_BECCLA variant reports the least propagation delay compared to others as in the previous case, with the proposed CSLA_BECSCBCLA reporting almost a similar performance. However due to less logic complexity, the usage of RCA or CSLA_CBL in the final adder stage of the CSA results in the least area occupancy in comparison with the rest, albeit at the expense of a considerable increase in delay by about 1.4x.

4.2.2. BitPartitioned Multioperand Addition
In CSAs, rowwise parallel addition is performed where the tree height (i.e., number of adder levels) grows with an increase in the number of input operands by an approximate linear order. To reduce the logic depth of the adder tree, a bitpartitioning strategy was presented in [30] in the context of selftimed multioperand addition, which involved splitting up of the entire group of data operands into a desired number of subgroups, and the intermediate addition results of the subgroups are finally added to produce the final sum. The bitpartitioning approach basically parallelizes the multioperand addition and is illustrated through Figure 9 for an example scenario where addition of “” binary operands with each operand having a size of “” bits is considered whilst assuming “” to be even. A “dot” represents a bit position in Figure 9.
The entire set of input operands from bit position 0 to bit position () is divided into two equalsized groups (for an example) as _field, which comprises inputs from bit positions 0 to () and the _field consisting of inputs from bit positions () to (). Addition within the individual fields (i.e., _field and _field) is performed simultaneously and the sum bits generated as intermediate outputs from these individual fields (_field and _field) are then added together using a final dualoperand adder to produce the required sum. The bitpartitioning scheme might help to speedup the addition, especially when several operands have to be added by way of performing parallel columnwise addition of rowwise partitions. For example, considering the addition of 32 data operands, each of size 32 bits, the CSA topology would encounter thirty full adder delays plus the delay associated with the final dualoperand adder. On the other hand, based on the bitpartitioning technique, considering eight partitions with each partition comprising four data operands, the bitpartitioned multioperand adder based upon the CSA topology could encounter a reduced propagation delay of about four full adder delays plus the delay of a dualoperand adder, depending upon the implementation. Also, a high regularity would be implicit within the overall architecture as the gatelevel hardware is being duplicated.
In this work, the bitpartitioning scheme was employed to partition the set of four inputs into two input groups (_field and _field, as shown in Figure 9) and the outputs of and fields were then added to produce the final sum. Several dualoperand adders were used to realize the bitpartitioned addition separately viz. RCA, CSLA_CBL, CSLA, CSLA_BEC, CSLACLA, CSLA_BECCLA, CSLASCBCLA, and CSLA_BECSCBCLA. The different bitpartitioned addition structures were individually synthesized using the same FPGA (XC3S1600E). It should be noted that the focus here is only on evaluating the performance of the RCA and different CSLAs as employed for multioperand addition and not to comment upon the efficacy of the bitpartitioning scheme as such (i.e., no comparison with the results of the previous subsection). This is because, as mentioned in the preceding discussions, the bitpartitioning technique is scalable, can be customdefined, and could potentially benefit in terms of latency reduction primarily for additions involving typically higher dimensions as compared with conventional combinational tree structures.
Table 5 presents the timing and area results obtained for the synthesis of bitpartitioned multiinput addition of 4 binary operands, each of size 32bits, on the basis of RCA and various homogeneous and heterogeneous CSLAs. Since the 8888 uniform input partition was found to be delayoptimum for realizing the 32bit CSLAs (refer to Figure 6 and Table 1), only this uniform input partition has been considered for implementing the various homogeneous and hybrid CSLAs corresponding to field and _field of the bitpartitioned multioperand addition. To sum up the outputs of field and _field, a 33bit dualoperand adder would be required in which case an extra bit has been added to the most significant position of various CSLA input partitions. The optimum synthesis metrics obtained for the example multiinput addition are in bold font in Table 5. It can be seen that the proposed CSLA_BECSCBCLA paves the way for least computation time (27.056 ns) amongst all. In comparison, the undesirable increases in delay values for other bitpartitioned multioperand adders incorporating RCA, CSLA_CBL, CSLA, CSLA_BEC, CSLACLA, CSLA_BECCLA, and CSLASCBCLA types are found to be 47.6%, 56.1%, 15.9%, 3%, 15.9%, 3%, and 2.1%, respectively. However, the RCA results in the lowest area occupancy (190 BELs) and the CSLA_CBL adder occupies nearly the same area with just 5 more BELs. Nevertheless, the bitpartitioned multioperand adder based upon the RCA pays a 47.6% delay penalty in comparison with that utilizing the CSLA_BECSCBCLA.

Table 6 shows the delay and area values obtained for the synthesis of bitpartitioned addition of four input operands of sizes 64 bits, corresponding to different adder architectures, with the CSLAs utilizing the 16161616 uniform input partition since this partition was found to be delay optimal (refer to Figure 7 and Table 2). With respect to less area, the RCA is found to be the optimum architecture. However, in terms of less critical path delay, the proposed CSLASCBCLA benefits by achieving a good delay reduction of 38.2% compared to the maximum path delay of the RCA based bitpartitioned multioperand adder.

5. Conclusions
CSLA is an important member of the highspeed adder family. In this paper, existing CSLA architectures viz. homogeneous and heterogeneous have been described and two new hybrid CSLA topologies were put forward: (i) carry selectcumsectioncarry based carry lookahead adder (CSLASCBCLA) and (ii) carry selectcumsectioncarry based carry lookahead adder including BEC logic (CSLA_BECSCBCLA). The speed performances of the various CSLA structures have been analyzed based on the case studies of 32bit and 64bit dualoperand and multioperand additions. Both uniform and nonuniform input data partitions were considered for the various CSLA implementations and FPGAbased synthesis was performed. It has been found for dualoperand additions; the proposed CSLASCBCLA/CSLA_BECSCBCLA architecture is faster and outperforms all other homogeneous and heterogeneous CSLAs. For bitpartitioned multiinput additions, the proposed CSLASCBCLA/CSLA_BECSCBCLA architecture promises high speed. Nevertheless, for multioperand addition based on the CSA topology, the conventional CSLA_BECCLA and the proposed CSLA_BECSCBCLA architectures were found to exhibit an optimized and comparable speed performance. From the inferences derived through this work, it is likely that the proposed hybrid CSLA architectures could achieve enhanced performance over conventional CSLAs for ASICbased synthesis as well.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgment
The authors thank the constructive comments of the reviewers, especially the pointing out of some typos in the initial submitted version by a reviewer, which has helped to improve this paper’s presentation.
References
 O. J. Bedrij, “Carryselect adder,” IRE Transactions on Electronic Computers, vol. 11, no. 3, pp. 340–346, 1962. View at: Publisher Site  Google Scholar
 A. R. Omondi, Computer Arithmetic Systems: Algorithms, Architecture and Implementation, Prentice Hall, 1994.
 I. Koren, Computer Arithmetic Algorithms, A K Peeters/CRC Press, 2nd edition, 2001.
 B. Parhami, Computer Arithmetic: Algorithms and Hardware Designs, Oxford University Press, New York, NY, USA, 2nd edition, 2010.
 T.Y. Chang and M.J. Hsiao, “Carryselect adder using single ripplecarry adder,” Electronics Letters, vol. 34, no. 22, pp. 2101–2103, 1998. View at: Publisher Site  Google Scholar
 Y. Kim and L.S. Kim, “64bit carryselect adder with reduced area,” Electronics Letters, vol. 37, no. 10, pp. 614–615, 2001. View at: Publisher Site  Google Scholar
 B. Ramkumar and H. M. Kittur, “Lowpower and areaefficient carry select adder,” IEEE Transactions on VLSI Systems, vol. 20, no. 2, pp. 371–375, 2012. View at: Publisher Site  Google Scholar
 I.C. Wey, C.C. Ho, Y.S. Lin, and C.C. Peng, “An areaefficient carry select adder design by sharing the common boolean logic term,” in Proceedings of the International MultiConference of Engineers and Computer Scientists (IMECS '12), vol. 2, pp. 1091–1094, March 2012. View at: Google Scholar
 P. Balasubramanian and N. E. Mastorakis, “High speed gate level synchronous full adder designs,” WSEAS Transactions on Circuits and Systems, vol. 8, no. 2, pp. 290–300, 2009. View at: Google Scholar
 W. Jeong and K. Roy, “Robust highperformance lowpower carry select adder,” in Proceedings of the Asia and South Pacific Design Automation Conference, pp. 503–506, Kitakyushu, Japan, January 2003. View at: Publisher Site  Google Scholar
 M. Alioto, G. Palumbo, and M. Poli, “A gatelevel strategy to design carry select adders,” in Proceedings of the IEEE International Symposium on Circuits and Systems, vol. 2, pp. 465–468, IEEE, May 2004. View at: Publisher Site  Google Scholar
 M. Alioto, G. Palumbo, and M. Poli, “Optimized design of parallel carryselect adders,” Integration, the VLSI Journal, vol. 44, no. 1, pp. 62–74, 2011. View at: Publisher Site  Google Scholar
 A. Nève, H. Schettler, T. Ludwig, and D. Flandre, “Powerdelay product minimization in highperformance 64bit carry select adders,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 12, no. 3, pp. 235–244, 2004. View at: Publisher Site  Google Scholar
 Y. He, C.H. Chang, and J. Gu, “An area efficient 64bit square root carryselect adder for low power applications,” in Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS '05), vol. 4, pp. 4082–4085, May 2005. View at: Publisher Site  Google Scholar
 B. K. Mohanty and S. K. Patel, “Areadelaypower efficient carry select adder,” IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 61, no. 6, pp. 418–422, 2014. View at: Publisher Site  Google Scholar
 J. Monteiro, J. L. Güntzel, and L. Agostini, “A1CSA: an energyefficient fast adder architecture for cellbased VLSI design,” in Proceedings of the 18th IEEE International Conference on Electronics, Circuits and Systems (ICECS '11), pp. 442–445, Beirut, Lebanon, December 2011. View at: Publisher Site  Google Scholar
 Y. Chen, H. Li, K. Roy, and C.K. Koh, “Cascaded carryselect adder (C^{2}SA): a new structure for lowpower CSA design,” in Proceedings of the International Symposium on Low Power Electronics and Design, pp. 115–118, August 2005. View at: Google Scholar
 Y. Wang, C. Pai, and X. Song, “The design of hybrid carrylookahead/carryselect adders,” IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, vol. 49, no. 1, pp. 16–24, 2002. View at: Publisher Site  Google Scholar
 G. A. Ruiz and M. Granda, “An areaefficient static CMOS carryselect adder based on a compact carry lookahead unit,” Microelectronics Journal, vol. 35, no. 12, pp. 939–944, 2004. View at: Publisher Site  Google Scholar
 H. G. Tamar, A. G. Tamar, K. Hadidi, A. Khoei, and P. Hoseini, “High speed area reduced 64bit static hybrid carrylookahead/carryselect adder,” in Proceedings of the 18th IEEE International Conference on Electronics, Circuits and Systems (ICECS' 11), pp. 460–463, December 2011. View at: Publisher Site  Google Scholar
 V. Kokilavani, P. Balasubramanian, and H. R. Arabnia, “FPGA realization of hybrid carry selectcumsectioncarry based carry lookahead adders,” in Proceedings of the 12th International Conference on Embedded Systems and Applications, pp. 81–85, 2014. View at: Google Scholar
 R. Yousuf and Najeebuddin, “Synthesis of carry select adder in 65nm FPGA,” in Proceedings of the IEEE Region 10 Conference (TENCON '08), pp. 1–6, November 2008. View at: Publisher Site  Google Scholar
 U. Sajesh Kumar and K. K. Mohamed Salih, “Efficient carry select adder design for FPGA implementation,” Procedia Engineering, vol. 30, pp. 449–456, 2012. View at: Google Scholar
 J.G. Lee, J.A. Lee, B.S. Lee, and M. D. Ercegovac, “A design method for heterogeneous adders,” in Embedded Software and Systems, vol. 4523 of Lecture Notes in Computer Science, pp. 121–132, Springer, 2007. View at: Google Scholar
 K. Preethi and P. Balasubramanian, “FPGA implementation of synchronous sectioncarry based carry lookahead adders,” in Proceedings of the IEEE 2nd International Conference on Devices, Circuits and Systems (ICDCS '14), pp. 1–4, IEEE, Combiatore, India, March 2014. View at: Publisher Site  Google Scholar
 P. Balasubramanian, D. A. Edwards, and W. B. Toms, “Selftimed sectioncarry based carry lookahead adders and the concept of alias logic,” Journal of Circuits, Systems and Computers, vol. 22, no. 4, Article ID 1350028, 2013. View at: Publisher Site  Google Scholar
 P. Balasubramanian, D. A. Edwards, and H. R. Arabnia, “Robust asynchronous carry lookahead adders,” in Proceedings of the 11th International Conference on Computer Design, pp. 119–124, 2011. View at: Google Scholar
 Xilinx, http://www.xilinx.com.
 K. K. Parhi, “Lowenergy CSMT carry generators and binary adders,” IEEE Transactions on VLSI Systems, vol. 7, no. 4, pp. 450–462, 1999. View at: Publisher Site  Google Scholar
 P. Balasubramanian, D. A. Edwards, and W. B. Toms, “Selftimed multioperand addition,” International Journal of Circuits, Systems and Signal Processing, vol. 6, no. 1, pp. 1–11, 2012. View at: Google Scholar
Copyright
Copyright © 2015 V. Kokilavani et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.