Subthreshold circuit designs are popular for ultra-low power applications, where minimum energy consumption is the primary concern. However, owing to the weak driving current, these circuits generally suffer from severe performance degradation. Therefore, in this paper, we primarily analyze the performance of a near-threshold circuit (NTC), which retains the excellent energy efficiency of the subthreshold design while improving the performance to a certain extent. A modified row-based dual-VDD 4-operand carry save adder (CSA) design is reported in the present work using 45 nm technology. Moreover, to assess the effectiveness of the near-threshold operation of the 4-operand CSA design, it has been compared with the other design styles. From the simulation results, obtained for a frequency of 20 MHz, we found that the proposed scheme of CSA design consumes 3.009 × 10^−7 Watt of average power, which is almost 90.9% less than that of the conventional CSA design, whereas, in terms of the maximum delay at output, the proposed scheme provides a 44.37% improvement compared to the subthreshold CSA design.

1. Introduction

Subthreshold digital circuit design is a well-established technique for implementing highly energy-constrained, ultra-low power applications such as implanted sensors, pacemakers, and mobile peripheral processors [1, 2]. However, the primary challenge, which limits its usage to low performance systems, is the weak driving current. For subthreshold or near-threshold operation, the MOS transistor is provided with a gate-to-source voltage that is either lower than or close to the threshold voltage (VTH) of the device. At the same time, the supply voltage (VDD) can be scaled below VTH or set close to VTH. Minimum power consumption, and hence a longer battery lifetime, can thus be achieved with this technique [2]. However, this advantage in energy consumption comes at the cost of performance degradation, mainly because the charging and discharging of the load capacitances of the circuit (with the change in logic function) are driven by the weak subthreshold leakage current [3].

It has been observed that a notable improvement in the performance of a CMOS circuit is possible if a small sacrifice in energy consumption is accepted [3]. This concept has triggered the increasing usage of near-threshold circuits (NTCs). More precisely, a circuit that operates with a supply voltage equal to or slightly greater than VTH is called an NTC [4].

On the other hand, assigning a dual-VDD scheme to a CMOS circuit can be very effective in reducing both the dynamic and the leakage power [5, 6]. It provides the higher supply voltage (VDDH) to the timing-critical logic gates, whereas the other, noncritical logic gates of the circuit are driven by a lower supply voltage (VDDL). Therefore, with this dual-VDD technique, it is possible to reduce the overall power consumption without degrading the performance of the circuit too much [2, 4]. However, using VDDH to speed up the timing-critical logic gates and VDDL for the noncritical logic gates to minimize the total power of the circuit requires additional level-shifters, which cause extra power consumption as well as area overhead [7]. In the case of an NTC, the key advantage lies in the fact that the values of VDDH and VDDL used in the circuit are very close to each other. Such a small difference between the two supply voltages can eliminate the need for voltage level-shifters [4]. Thereby, by properly selecting the subset of logic gates to be assigned VDDH, we can significantly improve the performance of the circuit at an affordable power cost [4].

Although the assignment of dual VDD can be very attractive for NTCs, in terms of physical design implementation this approach may incur an extra cost [4]. To reduce this extra cost of routing overhead, we may adopt the row-based dual-VDD assignment, where the different rows of the circuit are prioritized based on their timing criticality; accordingly, the rows residing in the critical path are driven by VDDH, while the rest of the rows in the circuit are supplied with VDDL [4]. In this work, to assess the effectiveness of the row-based dual-VDD assignment for NTCs, the scheme is implemented on an example circuit, namely, the 4-operand CSA described in [8]. The rest of the paper is organized as follows. Section 2 introduces several design issues for subthreshold circuits. In Section 3, the row-based dual-VDD assignment for a 4-operand CSA is presented, whereas the near-threshold operation of the 4-operand CSA and its performance analysis are illustrated in Section 4. Section 5 concludes this work.

2. Subthreshold Circuit Design Issues

2.1. Modeling the Minimum Energy Point

In the case of subthreshold operation (VDD < VTH), the current that flows through the channel of a transistor is mainly due to diffusion [9]. To estimate the minimum energy point of a subthreshold circuit, we can use the subthreshold current model, which serves as the basis for the entire analysis [9]. Assuming that the total drain current in the subthreshold regime equals the subthreshold current, the model of [3] expresses this current in terms of the subthreshold slope factor n, the thermal voltage, the linearized drain induced barrier lowering (DIBL) coefficient, and the subthreshold slope S. In this model, I0 denotes the drain current at VGS (gate-to-source voltage) equal to VTH (threshold voltage), and the VDS (drain-to-source voltage) dependence in the “quasisaturation” region is included in the model [9].

Again, for a subthreshold circuit, the gate delay is expressed as in [3], in terms of a delay fitting parameter and the output load capacitance of the gate.

Now, for VGS = VDD, this delay expression can be rewritten as in [3]. Thus, the propagation delay of the gate depends exponentially on VDD as well as on VTH.

Next, the total energy consumed per cycle by a single gate (assuming rail-to-rail swing, i.e., VGS = VDD for the “ON” current) can be expressed as in [3], where the expression involves the leakage current of the gate and an activity factor that gives the low-to-high switching activity at the output of the gate [3].
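The expressions of this subsection can be explored numerically with a minimal sketch. It assumes the commonly used textbook forms of the subthreshold model, namely I_sub = I0 · exp((VGS − VTH + η·VDS)/(n·Vth)) · (1 − exp(−VDS/Vth)), t_d = K · C · VDD / I_on, and E = α · C · VDD² + I_leak · VDD · T_cycle; these forms, the function names, and all parameter names are assumptions made for illustration and may differ in detail from the equations of [3].

```python
import math

K_BOLTZMANN = 1.380649e-23    # J/K
Q_ELECTRON = 1.602176634e-19  # C


def thermal_voltage(temp_k=300.0):
    """Thermal voltage kT/q in volts."""
    return K_BOLTZMANN * temp_k / Q_ELECTRON


def subthreshold_current(v_gs, v_ds, v_th, i0, n, eta, temp_k=300.0):
    """Assumed textbook form of the subthreshold drain current.

    i0  : drain current at v_gs == v_th (with eta * v_ds neglected)
    n   : subthreshold slope factor
    eta : linearized DIBL coefficient
    """
    vt = thermal_voltage(temp_k)
    gate_term = math.exp((v_gs - v_th + eta * v_ds) / (n * vt))
    drain_term = 1.0 - math.exp(-v_ds / vt)  # roll-off at small v_ds
    return i0 * gate_term * drain_term


def gate_delay(v_dd, v_th, i0, n, k_fit, c_load, temp_k=300.0):
    """Assumed delay model for VGS = VDD: t_d = K * C * VDD / I_on."""
    vt = thermal_voltage(temp_k)
    i_on = i0 * math.exp((v_dd - v_th) / (n * vt))
    return k_fit * c_load * v_dd / i_on


def energy_per_cycle(v_dd, c_load, alpha, i_leak, t_cycle):
    """Assumed energy model: dynamic part plus leakage over one cycle."""
    e_dyn = alpha * c_load * v_dd ** 2
    e_leak = i_leak * v_dd * t_cycle
    return e_dyn + e_leak
```

With these assumed forms, lowering VDD by 0.1 Volt at n = 1.5 lengthens the delay by roughly an order of magnitude, which illustrates the exponential dependence noted above.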

2.2. Optimum Sizing of the Various Logic Gates
2.2.1. Subthreshold Voltage Transfer Characteristics (VTC) of the CMOS Inverter Circuit

For the 45 nm technology node, the SPICE model used for simulation has the threshold voltage set to 0.469 Volt for the NMOS and −0.418 Volt for the PMOS. Figure 1 shows the voltage transfer characteristic (VTC) curves of an inverter circuit, where the supply voltage is varied from 0 to 0.4 Volt (in steps of 0.1 Volt) to inspect the behavior of the circuit in the subthreshold region. It is observed that, for a ratio of the PMOS width (WP) to the NMOS width (WN) of around 4 : 1, there is a sharp transition at the output whenever the input crosses the VDD/2 level.

2.2.2. Subthreshold XOR Gate Using Transmission Gate Logic

Figure 2(a) shows the conventional transmission gate based 8-transistor XOR that works at ultra-low voltages [9]. Besides, the use of transmission gates in the design helps to balance the number of parallel devices which are operating with the minimum voltage [9].

Figure 2(b) plots the average power and the maximum delay at output against the WP and WN sizes of the transistors used in the circuit. It is seen that WP/WN = 800 nm/200 nm gives the optimum point, where the two curves cross each other.

2.2.3. Subthreshold Operation of a Two-Inverter Chain or a Buffer Circuit

Here a two-inverter chain, or buffer circuit, is first simulated with a single VDD and thereafter with a dual VDD (where VDDL is taken as 0.4 Volt and VDDH is taken as 0.8 Volt). In the first case, where VDD is set to 0.4 Volt and the frequency is 200 MHz, we considered different WP/WN values (maintaining the above-mentioned ratio) for the transistors used in the buffer circuit. With the gate length (L) = 45 nm and WP/WN = 800 nm/200 nm, we found that the average power of the circuit is 3.528 × 10^−8 Watt and the maximum delay at output is 1.378 × 10^−10 Second.

In the case of dual-VDD assignment for any CMOS circuit, the major problem occurs when a low input swing drives a high-VDD gate. So, whenever a high voltage gate has to be driven by a low voltage gate, a level converter (LC) is normally required [3]. The LC performs the job of shifting the voltage from a lower level to a higher one. However, as LCs do not implement any logic function, the use of a large number of LCs in a circuit may ultimately result in area as well as energy overhead [7].

To mitigate this issue, the use of a second threshold voltage for the PMOS transistors in the higher voltage gates (which are driven by the lower voltage gates) has been described in [7]. We followed a similar concept here (as shown in Figure 3), except that, to increase the threshold of those PMOS transistors, we increased their gate lengths [10]. The overall performance of this buffer circuit, with a dual VDD, is given in Table 1.

From Table 1, it can be seen that the best case results are obtained when the gate length of the PMOS transistor in the higher voltage inverter circuit is set to 90 nm.

2.3. Obtaining the Operating Supply Voltage for a Full Adder Circuit

First, the full adder (FA) circuit of Figure 4 is driven by a single VDD [11, 12], with inputs having a frequency of 200 MHz. This FA circuit (which has no buffer circuits at its sum and carry outputs) will hereafter be called FA1 unless otherwise mentioned.

To find a suitable operating supply voltage for FA1, we varied VDD from 0.1 Volt to 0.8 Volt (in steps of 0.1 Volt) and measured the changes in the values of the average power and the maximum delay at output (Table 2).

It is observed that, for the region of VDD = 0.4 Volt to 0.6 Volt, the power delay product is minimum. However, considering the average power (which increases with VDD), we have opted for VDD = 0.4 Volt as the operating supply voltage for the FA1 circuit.
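The selection reasoning used here (find the supply range where the power delay product is lowest, then prefer the lowest supply in that range because power grows with the supply) can be sketched as follows. The function name, the 10% tolerance, and the sweep values are hypothetical placeholders chosen for illustration; they are not the entries of Table 2.

```python
def pick_supply(sweep, tolerance=0.10):
    """Pick a supply voltage from a sweep of (vdd, avg_power, max_delay).

    Every point whose power delay product (PDP) lies within `tolerance`
    of the minimum PDP is treated as equally good; among those, the
    lowest supply is returned, since average power rises with VDD.
    """
    pdp = [(vdd, power * delay) for vdd, power, delay in sweep]
    best = min(value for _, value in pdp)
    candidates = [vdd for vdd, value in pdp if value <= best * (1.0 + tolerance)]
    return min(candidates)


# Placeholder sweep: (VDD in Volt, power in Watt, delay in Second).
# These numbers are synthetic and only shaped like a typical sweep.
example_sweep = [
    (0.3, 1.0e-8, 8.0e-9),
    (0.4, 3.0e-8, 1.0e-9),
    (0.5, 6.0e-8, 5.0e-10),
    (0.6, 1.1e-7, 2.8e-10),
    (0.7, 2.0e-7, 2.0e-10),
]
chosen = pick_supply(example_sweep)
```

For the placeholder sweep, the points at 0.4, 0.5, and 0.6 Volt fall within the tolerance band, and the lowest of them is selected.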

Next, the same full adder circuit of Figure 4 is provided with two buffer circuits at its sum and carry outputs. In those buffers, the first inverter is driven by a supply of VDDL, whereas the second one is driven by a supply of VDDH. Besides, as mentioned earlier in Section 2.2, the gate length of the PMOS transistor of the second inverter is taken as L = 90 nm. This FA circuit, which is supplied with the dual VDD, will hereafter be called FA2.

Table 3 shows the performance of this FA2 circuit when VDDL is set to 0.4 Volt, the frequency is taken as 200 MHz, and VDDH is varied in the range of 0.4 Volt to 0.8 Volt.

From Table 3, it can be inferred that the best case result, in terms of the power delay product, is obtained when VDDL = 0.4 Volt and VDDH = 0.5 Volt.

3. Row-Based Dual-VDD Assignment for a 4-Operand CSA

Figure 5 shows a 4-operand CSA, where four 4-bit binary numbers (say, A3A2A1A0, B3B2B1B0, C3C2C1C0, and D3D2D1D0) can be added with an initial carry-in [8]. The upper two rows of the circuit (as shown in Figure 5) form the 4-bit CSA, whereas the third row serves as the carry propagation adder (CPA) [8].
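The three-row structure described above can be checked with a small behavioural model. The sketch below assumes that the first row compresses A, B, and C, that the second row compresses the resulting sum and carry words with D, and that the third row is a ripple-carry CPA that also takes the initial carry-in; the exact wiring of Figure 5 is not reproduced, and all function names are illustrative.

```python
def full_adder(a, b, c):
    """One-bit full adder: returns (sum_bit, carry_out)."""
    return a ^ b ^ c, (a & b) | (b & c) | (a & c)


def csa_row(x, y, z, width):
    """One carry-save row: an independent full adder per bit position.

    Returns (sum_word, carry_word); each carry bit is placed one
    position higher, so that x + y + z == sum_word + carry_word.
    """
    sum_word, carry_word = 0, 0
    for i in range(width):
        s, c = full_adder((x >> i) & 1, (y >> i) & 1, (z >> i) & 1)
        sum_word |= s << i
        carry_word |= c << (i + 1)
    return sum_word, carry_word


def ripple_cpa(x, y, carry_in, width):
    """Carry propagation adder: the carry ripples from bit 0 upward."""
    total, carry = 0, carry_in
    for i in range(width):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        total |= s << i
    return total | (carry << width)


def add_four_operands(a, b, c, d, carry_in=0, width=4):
    """Add four `width`-bit numbers using two CSA rows and one CPA row."""
    w = width + 2                      # headroom: the result needs width + 2 bits
    s1, c1 = csa_row(a, b, c, w)       # row 1 (assumed operand grouping)
    s2, c2 = csa_row(s1, c1, d, w)     # row 2
    return ripple_cpa(s2, c2, carry_in, w)  # row 3


result = add_four_operands(0b1111, 0b1111, 0b1111, 0b1111, carry_in=1)
```

Only the last row propagates a carry along the word, which is why the CSA rows contribute a delay that does not grow with the word length.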

To fine-tune the performance, we can opt for the near-threshold operation of this example circuit by selectively using VDDH for the gates in the critical path and VDDL for the rest of the gates, to reduce the overall power consumption [4]. The dotted line in Figure 5 denotes the critical path of the circuit. Moreover, from the viewpoint of physical design implementation, the row-based dual-VDD assignment has been adopted here. For that, the entire circuit is partitioned into three different clusters of rows. The first cluster is formed from the subset of rows that are not time-critical (hence driven by VDDL), whereas the third cluster is formed from the subset of rows that are time-critical (hence driven by VDDH). The row in the second cluster must be populated with gates that are able to interface between the row at VDDL and the row at VDDH.

With this notion, in our modified 4-operand CSA design (as illustrated in Figure 5), row1 is driven by VDDL (= 0.4 Volt), row2 is driven by a dual supply of VDDL (= 0.4 Volt) and VDDH (= 0.5 Volt), and row3 is driven by VDDH (= 0.5 Volt) only. Furthermore, as the basic building blocks of the CSA design, we have used FA1 blocks for both row1 and row3 and FA2 blocks for the intermediate row2.
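The row assignment just described can be written down as data and checked for supply steps that lack an interfacing row. This is a minimal sketch: the dictionary layout and the function names are hypothetical, and the check only encodes the rule that a row supplied solely by a lower voltage should not directly drive a row supplied solely by a higher one.

```python
VDDL = 0.4  # Volt, lower supply
VDDH = 0.5  # Volt, higher supply

# Row assignment of the modified CSA, listed in signal-flow order.
ROWS = [
    {"name": "row1", "supplies": (VDDL,), "cell": "FA1"},
    {"name": "row2", "supplies": (VDDL, VDDH), "cell": "FA2"},
    {"name": "row3", "supplies": (VDDH,), "cell": "FA1"},
]


def needs_interface(upstream, downstream):
    """True if every supply of `upstream` is below every supply of `downstream`."""
    return max(upstream["supplies"]) < min(downstream["supplies"])


def unbridged_steps(rows):
    """List adjacent row pairs where a low row directly drives a high row."""
    return [
        (up["name"], down["name"])
        for up, down in zip(rows, rows[1:])
        if needs_interface(up, down)
    ]


ok = unbridged_steps(ROWS)                           # dual-supply row2 bridges the step
without_row2 = unbridged_steps([ROWS[0], ROWS[2]])   # the step is exposed
```

In this representation the dual-supply FA2 row plays the interfacing role, so no adjacent pair is flagged for the assignment used in this work.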

4. Near-Threshold Operation of the Proposed Scheme of CSA Design and Its Performance Analysis

When the conventional CSA design of [8] is simulated with a larger supply voltage (VDD = 1 Volt) at a frequency of 20 MHz, the maximum delay at output is obtained as 2.071 × 10^−10 Second. For the subthreshold operation (VDD = 0.4 Volt) of the same circuit, although the power consumption reduces drastically, the delay increases to a much higher value of 7.774 × 10^−9 Second. Consequently, the application of the subthreshold design is mostly limited to low performance systems.

To maintain this excellent energy efficiency of the subthreshold design while boosting the speed of operation by a significant amount, we can explore the performance of the design for near-threshold operation [13], and that is what has been done in this work. To evaluate the effectiveness of the near-threshold operation of our modified CSA design, it has been compared with the conventional CSA design as well as the subthreshold CSA design (as shown in Table 4).

While operating at a frequency of 20 MHz, the proposed scheme of CSA design consumes 3.009 × 10^−7 Watt of average power, which is almost 90.9% less than that of the conventional CSA design. Again, in terms of the delay at output, the proposed scheme of CSA design provides a 44.37% improvement in the maximum delay compared to that of the subthreshold CSA design.
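The figures quoted in this section can be related to one another with simple arithmetic. In the sketch below, the three input values are taken from the text, while the two "implied" quantities are back-calculated from the stated percentages and are therefore only approximate; they are not values read from Table 4.

```python
# Values quoted in the text (20 MHz operation).
DELAY_CONVENTIONAL = 2.071e-10   # Second, VDD = 1 Volt
DELAY_SUBTHRESHOLD = 7.774e-9    # Second, VDD = 0.4 Volt
POWER_PROPOSED = 3.009e-7        # Watt, proposed dual-supply design

# How much slower the subthreshold design is than the conventional one.
slowdown = DELAY_SUBTHRESHOLD / DELAY_CONVENTIONAL

# Delay implied by a 44.37% improvement over the subthreshold design.
implied_delay_proposed = DELAY_SUBTHRESHOLD * (1.0 - 0.4437)

# Conventional power implied by "almost 90.9% less" (approximate).
implied_power_conventional = POWER_PROPOSED / (1.0 - 0.909)

print(f"subthreshold slowdown      : {slowdown:.1f}x")
print(f"implied proposed delay     : {implied_delay_proposed:.3e} s")
print(f"implied conventional power : {implied_power_conventional:.3e} W")
```

Under these assumptions, the subthreshold design is roughly 37 times slower than the conventional one, and the proposed design recovers a little under half of that delay at a power level about one order of magnitude below the conventional design.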

Figure 6 illustrates the variation in power consumption values, considering all the three design styles, for different frequencies like 20 MHz, 50 MHz, and 200 MHz.

The key points regarding the performance of the modified 4-operand CSA design presented in this work are as follows.
(i) The first is the flexibility in choosing any higher supply voltage (as per the requirement) for the gates in the critical path. Where the speed of the subthreshold circuit is important, it can be tuned by increasing VDDH, even up to 0.8 Volt.
(ii) From Figure 6, it can be inferred that the proposed scheme works well not only at an operating frequency of 20 MHz but also at higher frequencies.
(iii) Row-based dual-VDD assignment has been incorporated in the proposed scheme of CSA design to facilitate the physical design implementation.
(iv) However, where performance tuning is not required and power consumption is the only concern, we may simply opt for the subthreshold CSA design, in which the power consumption is minimum (Table 4).
(v) Lastly, the upper and lower limits of the power consumption are obtained by using VDD = 1 Volt and VDD = 0.4 Volt, respectively. Correspondingly, at these two limits, we get the lower and higher values of the maximum delay at output. The proposed scheme of CSA design fits in between, with a major advantage, namely, the flexibility in the choice of VDDH (which can be used for performance tuning).

5. Conclusion

In this work we have mainly focused on the performance analysis of a row-based dual-VDD CSA design which operates in the near-threshold voltage regime. For that purpose, we used two supply voltages: VDDH (= 0.5 Volt) and VDDL (= 0.4 Volt). Besides, the entire circuit is partitioned into different clusters of rows, and all the logic gates residing in a particular cluster are driven by a single supply (either VDDL or VDDH). Moreover, a fair comparison among the different design styles for a 4-operand CSA has also been presented. From the results obtained, we can infer that the near-threshold operation of the proposed scheme of CSA design can be very effective in reducing the overall energy consumption, like a subthreshold design. At the same time, it is also useful in tuning the performance of the circuit so that the maximum delay at output is reduced.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.


The authors would like to thank SMDP-II Project Lab., IC Design & Fabrication Centre, Jadavpur University, for giving them the opportunity to carry out this work using SPICE Tools.