#### Abstract

Two novel theorems are developed which prove that certain logic functions are more robust to errors than others. These theorems are used to construct datapath circuits that give increased immunity to error over naive implementations. A link between probabilistic operation and ultra-low energy computing has been shown in prior work. These novel theorems and designs are used to further improve the probabilistic design of ultra-low power datapaths. This culminates in an asynchronous design that achieves the maximum energy savings for a given error rate. SPICE simulation results using a commercially available and well-tested technology are given, verifying the ultra-low power, probabilistic full-adder designs. Further, close to 6X energy savings is achieved for a probabilistic full-adder over the deterministic case.

#### 1. Introduction

As digital technology marches on, ultra-low voltage operation, atomic device sizes, device mismatch, and thermal noise are becoming commonplace and so are the significant error rates that accompany them. These phenomena are causing ever increasing bit-error rates, and with billion-transistor digital chips being produced today, even a 1-in-100-million bit error rate becomes costly.

This paper presents a novel discovery in Boolean logic: certain logic gates are more robust to error than others, and in fact it will be shown that some logic even *improves* the error rate just through natural computation. The paper shows how these principles translate into CMOS and other implementations, but the principles are *independent of technological implementation* since they are properties of Boolean logic itself. Thus these design principles will stand the test of time.

The motivation behind studying computing architectures robust to error then becomes clear, especially as technological breakthroughs have recently shown that extreme power savings can be traded for a certain level of error, an approach known as probabilistic computing [1]. Paying more attention to error rates is critical if the scaling of power consumption and device size is to continue [2].

Recently, ultra-low power computing has been achieved by lowering the supply voltage of digital circuits into the near-threshold or even the subthreshold region [1, 3]. Indeed, a fundamental limit to voltage scaling technology has been proposed: the thermodynamic limit of these devices [4]. When the supply voltage becomes comparable to thermal noise levels in these types of ultra-low power designs, devices start to behave probabilistically, giving an incorrect output with some nonzero probability [4, 5]. Kish predicts that the thermal noise phenomenon will result in the "death" of Moore's law [6]. The International Technology Roadmap for Semiconductors predicts that thermal noise will cause devices to fail catastrophically during normal operation, even without supply voltage scaling, in the next 5 years [6, 7].

A paradigm-shifting technology called probabilistic CMOS, or PCMOS, has been introduced in part by the authors to combat the failure of voltage scaling at the thermal noise limit. Experiments have already been completed showing that this thermal noise barrier can be overcome by computing with deeply scaled *probabilistic* CMOS devices. It was shown, in part by the authors, that probabilistic operation of devices allows for applications in arithmetic and digital signal processing that use a fraction of the power of their deterministic counterparts [1, 8]. This paper builds upon that work and offers improved solutions for ultra-low power datapath units.

Logic gates are the fundamental building blocks of all digital technology, and it has been discovered that not all logic gates propagate error equally *regardless of technological implementation*. The main contributions of this paper are the following.

(i) It is shown that not all logic gates propagate error equally, *regardless of implementation or technology generation*.

(ii) Several theorems are given that offer a guideline for logic that most reduces error rates.

(iii) A case study using full-adders is given that uses these principles to achieve reduced error rates strictly through intelligent implementation (e.g., without error correction logic).

(iv) An analysis is given for asynchronous logic showing that a lower error rate is achieved when compared to its synchronous counterparts.

(v) It is shown that reducing error rates can be translated into reduced power consumption via probabilistic computing.

This work is an expansion of the work proposed in [9]. The background to this research and related work is given in Section 2. Some definitions and assumptions are given in Section 3. Theorems illustrating the reliability of logic circuits under a probabilistic fault model are presented in Section 4. A case study analyzing which implementation is optimal for noisy full-adders is given in Section 5. Conclusions and future directions are discussed in Section 6.

#### 2. Logic Tables and Device Physics

##### 2.1. All Logic Is Not Created Equal

Boolean logic functions are most simply represented by a *truth table* mapping each input combination to an output. When a bit error is present at the input (in other words, one of the input bits is flipped) and the new input combination maps to the same output, no error results in calculating the given logic function. In this case the logic function did not *propagate* the error. If one assumes a small, given error rate per bit, then a single-bit error is more likely than two simultaneous bit errors, which is more likely than three simultaneous errors, and so forth. A logic function with the fewest input-output mappings in which a single-bit error on an input changes the output will be the least likely to propagate a bit error. This phenomenon is shown in Figures 1(a) and 1(b) using a NAND function and an XOR function as examples.

Figure 1 illustrates the theme that not all logic gates propagate errors equally. If one calculates the average probability of error across all possible input combinations of NAND and XOR logic, an extremely interesting result emerges. Assume that a given probability of error is present at the inputs of these gates. Further assume that full-adders are built such that an input probability of error is also present, and that one of the full-adders is built with a NAND-NAND implementation and the other with a standard XOR implementation. Figure 2 shows the drastically differing output probabilities that result. Note that in Figure 2, the gate itself has no probability of computing erroneously associated with it. This shows that different gates have differing abilities to mask errors regardless of input combination.

As shown in Figure 2, the NAND gate has a much lower output probability of error than the XOR gate under the same input error conditions. An interesting property also emerges: a NAND gate actually propagates an *improved* error probability through its natural computation, owing to its logic properties. Obviously, the input error probability used in this example is orders of magnitude higher than the actual error rates seen in digital logic, but it will later be shown that the output error probability of logic gates decreases monotonically with decreasing input error rates, and thus this concept holds true at more realistic bit-error rates. Subsequent sections explain these properties in more depth.
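
The comparison between NAND and XOR can be checked by brute force. The following is a minimal sketch, assuming (as in Figure 2) that each input bit flips independently with the same probability and that the gate itself is noiseless. The helper name `output_error` and the value `eps = 0.1` are illustrative choices and are not taken from this paper.

```python
from itertools import product

def output_error(func, n_inputs, eps):
    """Average probability that independent input bit flips change the output.

    Assumptions (illustrative): every input combination is equally likely,
    each input bit flips independently with probability eps, and the gate
    itself computes without error.
    """
    total = 0.0
    for bits in product((0, 1), repeat=n_inputs):
        correct = func(*bits)
        for flips in product((0, 1), repeat=n_inputs):
            noisy = tuple(b ^ f for b, f in zip(bits, flips))
            if func(*noisy) != correct:
                k = sum(flips)  # number of flipped input bits
                total += eps**k * (1 - eps) ** (n_inputs - k)
    return total / 2**n_inputs

def nand2(a, b):
    return 1 - (a & b)

def xor2(a, b):
    return a ^ b

eps = 0.1  # hypothetical input error probability
nand_err = output_error(nand2, 2, eps)
xor_err = output_error(xor2, 2, eps)
print(f"NAND2: {nand_err:.4f}  XOR2: {xor_err:.4f}")
```

Under these assumptions the NAND2 output error falls below the input error probability, while the XOR2 output error is nearly twice the input error probability, which is the masking effect described above.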

##### 2.2. MOSFET Implementation and Device Properties

There are two key issues that cause device failures in digital logic implemented with MOSFET transistors: device mismatch and thermal noise. This work addresses the thermal noise problem; device mismatch will be addressed in future work. As for other types of noise, the switching speed of devices is faster than the frequencies typically seen for flicker noise, so there is no need to address that effect. A plot of thermal noise measured from a chip is shown in Figure 3(a). The derivation of the probability of a digital "1" or "0" is seen in Figure 3(b) using a comparator.

Device mismatch continues to get worse for minimum-size devices as process sizes scale down. Mismatch problems can be mitigated with larger device sizes, but this is not always reasonable, increases power dissipation, and may reduce computing performance. Other techniques for tuning digital gates are possible and are the subject of much recent work [10, 11], which has shown promise for programmatically tuning devices with floating gates. With sufficient calibration using these tuning techniques, the device mismatch effect can be removed. This result is quite significant, as interest in subthreshold digital circuits continues to increase. Mismatch in threshold voltage impacts subthreshold circuits exponentially, on the scale of the thermal voltage kT/q, or 25.4 mV. Hence a mismatch of 20 mV, which occurs for minimum-size devices in a CMOS process, could change the current by a significant factor [12]. As we scale down, this level of mismatch continues to increase. The failure in this case is one of timing: gate transitions do not occur when expected, on a probabilistic basis.

Even a conservative estimate of the thermal noise problem is disconcerting. The thermal noise magnitude for a digital gate is approximated as kT/C noise. For a small 1 fF capacitor, the rms noise level is roughly 2 mV. However, because of a trend of larger interconnect capacitances, large digital buffers, and higher operating temperatures due to transistor density, thermal noise has been shown to increase to 200 mV in even nanoscale digital circuits [5]. In low-power digital logic, a supply voltage in the hundreds of mV is not unreasonable, especially in subthreshold logic, and hence an error becomes exceedingly likely in this case [3]. Moreover, the relevant quantity is the probability of error over a large number of gates, say 1 billion, and at that scale never having an error becomes a demanding requirement.
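
The 2 mV figure can be reproduced directly from the kT/C expression. The sketch below also estimates a per-node flip probability and the chance of at least one error across many gates. The temperature of 300 K, the Gaussian noise model with a decision threshold at half the supply voltage, and all function names are assumptions made for illustration; they are not the derivation used in this paper.

```python
import math

BOLTZMANN = 1.380649e-23  # J/K

def ktc_noise_rms(cap_farads, temp_kelvin=300.0):
    """RMS thermal noise voltage sqrt(kT/C) on a capacitor (300 K assumed)."""
    return math.sqrt(BOLTZMANN * temp_kelvin / cap_farads)

def flip_probability(vdd, sigma):
    """Probability that Gaussian noise of rms value sigma exceeds vdd / 2.

    Assumed model: zero-mean Gaussian noise and a threshold at half the supply.
    """
    return 0.5 * math.erfc(vdd / (2.0 * math.sqrt(2.0) * sigma))

def prob_any_error(p, n_gates):
    """Probability of at least one error among n_gates independent nodes."""
    # 1 - (1 - p)**n evaluated in a numerically stable way for tiny p
    return -math.expm1(n_gates * math.log1p(-p))

print(f"rms noise on 1 fF: {ktc_noise_rms(1e-15) * 1e3:.2f} mV")
print(f"flip probability at 0.4 V supply, 200 mV noise: {flip_probability(0.4, 0.2):.3f}")
print(f"P(any error), p = 1e-8, 1e9 gates: {prob_any_error(1e-8, 10**9):.5f}")
```

Even a 1-in-100-million error probability per node makes at least one error nearly certain across a billion nodes under this independence assumption.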

An estimate given by Kish in [6] predicts untenable device failure due to thermal noise at the 22 nm transistor node under the assumptions stated there, making the case that the problem is not far off. So far, device mismatch is the larger effect. However, thermal noise is the eventual limit that will be reached in low-power circuit design, and thus merits the treatment given here.

##### 2.3. Theoretical Background for Probabilistic Gates

Design with probabilistic gates is an old problem that has been given a new application via ultra-low power computing. In [13], von Neumann showed two important results: reliable circuits can be built from noisy (probabilistic) gates with arbitrarily high reliability, but such a circuit will compute more slowly than the same circuit built with noiseless gates because of the necessary error correction logic. von Neumann refers to triple modular redundancy (TMR), and others refer to N-modular redundancy (NMR), to achieve the results shown in [13]; however, these techniques often result in high circuit overhead. Pippenger showed exactly how much error correction would be needed [14] to achieve arbitrary reliability.

Pippenger improved upon this result by deriving the fraction of layers of a computing element that must be devoted to error correction for reliable computation, but he was unable to show how such a computing element could be built [14].

Others have addressed soft error rate (SER) reduction and error masking through time redundancy, which catches transient errors due to particle strikes as well as delay-based errors [15, 16]. This work differs because here we consider bits that are truly probabilistic due to thermal noise, so that the error rate is independent of time. Therefore, the shadow latching and timing redundancy techniques presented in the aforementioned work are not as effective. Moreover, for time-independent errors these techniques pose the same drawback as TMR: they involve significant circuit overhead and often require pipeline flushes if an error is detected. In contrast, in this work we show that logic synthesis can be done such that the fabric of the logic itself reduces error propagation.

The work presented in [17] presents a probabilistic analysis framework similar to the one presented in this paper and addresses adders and logic gates also similarly to this paper. However, they neither come to the conclusion of the relative error propagation characteristics of different logic networks nor are they able to link probability to power the way this paper does. It also presents a framework for analysis and not necessarily a solution to probabilistic errors.

Krishnaswamy et al. in [18] address probabilistic gates by using logic transformations that reduce error rates by exploiting observability don't-care (ODC) sets. That work differs from the current paper in that probabilistic thermal noise faults are considered herein, not deterministic stuck-at faults. The present work is superior to previous methods in the metric of energy savings in that no error correction logic or overhead is used.

Synthesizing probabilistic logic into CMOS using Markov Random Fields was presented in [8]. However, the CMOS circuits presented are not standard and use far more transistors (20 transistors for an inverter) than the gates proposed here.

Finally, Chakrapani et al. in [19] have done work in synthesizing probabilistic logic for inherently probabilistic applications, but do not address probabilistic design for deterministic applications.

#### 3. Defining Probabilistic Gates via Matrices

A tight analysis of the differing error rates present in boolean logic is given. To give a framework for the discussion two definitions are presented.

*Definition 1. * is the probability that a bit flip occurred on a circuit node.

*Definition 2. * is the probability that the output of a network of gates is incorrect as a function of .

The assumptions used in this work are similar to those of previous works on the subject [13, 14]; in particular, each node is assumed to fail (flip) independently and with a uniform probability. All input combinations are assumed to be equally likely. Stein in [5] and Cheemalavagu et al. in [4] show that thermal noise can be modeled as errors occurring with a precise probability defined by the ratio of noise to signal level, and that this noise can be modeled at the input or output of each gate; thus the bit-flip probability of Definition 1 is defined at the *nodes* of the circuit. The probability model used in this paper is the same as in [4], which has been verified by HSPICE circuit simulation and by measurements from a chip. "Min", "Maj", and "AOI" will be used throughout this paper to stand for the minority gate, majority gate, and And-Or-Invert gate, respectively, which are shown in [20]. To review, a minority gate simply outputs the value present on the minority of its inputs. Thus, for a 3-input minority function with two inputs at "0" and one input at "1", the output would be "1".

Transfer matrix methodology is used for calculating probabilities at the outputs of a complex network of gates. This methodology allows the probability to be calculated in a strict, mathematically defined way regardless of input [21]. The ideal transfer matrix (ITM) and probability transfer matrix (PTM) for a NAND2 gate can be seen in Figure 4, where each entry in the matrix is the probability of that input-output combination occurring. The top row of numbers in Figure 4(a), "00", "01", "10", "11", gives the possible input combinations, and the column on the left side gives the possible output values. An ITM is simply a matrix with a "1" in each entry that corresponds to the correct output for a given input combination and a "0" elsewhere. In other words, an ITM is the PTM of a gate that never errs.
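
The two matrices for a NAND2 gate can be sketched as follows. This sketch assumes a gate whose output flips with probability `eps`, and it stores one row per input combination and one column per output value, which is the transpose of the layout described for Figure 4. The function names are illustrative only.

```python
from itertools import product

def nand2(a, b):
    return 1 - (a & b)

def gate_itm(func, n_inputs):
    """Ideal transfer matrix: rows are input combinations, columns are outputs 0 and 1."""
    rows = []
    for bits in product((0, 1), repeat=n_inputs):
        out = func(*bits)
        rows.append([1.0 if out == 0 else 0.0, 1.0 if out == 1 else 0.0])
    return rows

def gate_ptm(func, n_inputs, eps):
    """PTM of a gate whose output flips with probability eps (assumed gate model)."""
    return [
        [(1 - eps) * row[0] + eps * row[1], (1 - eps) * row[1] + eps * row[0]]
        for row in gate_itm(func, n_inputs)
    ]

itm = gate_itm(nand2, 2)
ptm = gate_ptm(nand2, 2, 0.1)  # eps = 0.1 is a hypothetical value
for label, ideal_row, noisy_row in zip(("00", "01", "10", "11"), itm, ptm):
    print(label, ideal_row, noisy_row)
```

Each row of either matrix is a probability distribution over the output, so its entries sum to one.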

The PTM methodology is used in the definition of the output error probability (Definition 2), illustrated in Figure 5. To calculate it, all inputs are considered to be equally likely. Once the final probability transfer matrix is calculated for the circuit, giving the probability of each output occurring for each input combination, the probability of error is calculated by summing the probability of error for each input combination and then dividing by the number of input combinations. For the circuit in Figure 5, the final sum is therefore divided by the number of possible input combinations, which is two raised to the number of circuit inputs, to get the average probability of error.
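
The averaging step can be written compactly. The sketch below assumes matrices stored with one row per input combination and one column per output value; `average_error` is a hypothetical helper name, and the example matrices describe a NAND2 gate whose output flips with an illustrative probability of 0.1.

```python
def average_error(ptm, itm):
    """Mean, over equally likely inputs, of the probability of a wrong output."""
    total = 0.0
    for noisy_row, ideal_row in zip(ptm, itm):
        # Sum the probability placed on outputs that the ideal circuit never produces.
        total += sum(p for p, ideal in zip(noisy_row, ideal_row) if ideal == 0)
    return total / len(ptm)

# Rows: inputs 00, 01, 10, 11. Columns: outputs 0, 1.
itm = [[0, 1], [0, 1], [0, 1], [1, 0]]
ptm = [[0.1, 0.9], [0.1, 0.9], [0.1, 0.9], [0.9, 0.1]]

err = average_error(ptm, itm)
print(f"average probability of error: {err:.3f}")
```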

The algorithm described in [22] shows how to calculate an overall circuit PTM from individual gate PTMs. Briefly, one calculates the final PTM from the output. Each level of the circuit is represented by a matrix. Each level is then matrix multiplied with the level before it. If there is more than one gate at a given level, the PTM of that level is calculated using the Kronecker product of the PTMs of all gates at that level.
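
The level-by-level composition can be sketched as follows. This is an illustration under stated assumptions, not the implementation of [22]: the example circuit NAND2(NAND2(a, b), NAND2(c, d)) is hypothetical, each gate output is assumed to flip with probability `eps`, and matrices are stored with one row per input combination and one column per output value, so levels are multiplied from inputs toward the output.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]

def matmul(a, b):
    """Ordinary matrix product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def nand2_ptm(eps):
    """NAND2 whose output flips with probability eps (assumed gate model).

    Rows are inputs 00, 01, 10, 11; columns are outputs 0, 1.
    """
    return [[eps, 1 - eps], [eps, 1 - eps], [eps, 1 - eps], [1 - eps, eps]]

def two_level_nand_ptm(eps):
    """PTM of the hypothetical circuit NAND2(NAND2(a, b), NAND2(c, d))."""
    level1 = kron(nand2_ptm(eps), nand2_ptm(eps))  # 16 x 4: two gates in parallel
    level2 = nand2_ptm(eps)                        # 4 x 2: the output gate
    return matmul(level1, level2)                  # 16 x 2: the whole circuit

circuit = two_level_nand_ptm(0.1)
print(circuit[0])  # output distribution for a = b = c = d = 0
```

Setting `eps` to zero reduces the result to the ideal transfer matrix of the circuit, which is a convenient sanity check on the ordering of the Kronecker and matrix products.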

#### 4. Theorems for Noisy Gates

Two theorems proving properties of noisy gates are outlined here, which can be used as a guideline for designing circuits with minimal probability of error. The first theorem proves which gate types have the lowest probability of propagating an error, and the second one proves that the probability of an error at the output of a circuit increases as the depth of that circuit increases.

Theorem 1. *A noisy gate of 2 or 3 inputs will have a minimal probability of error when the cardinality of its ON-Set or OFF-Set is 1.*

Note that the cardinality of the ON-Set of an AND gate is 1 and the cardinality of the OFF-Set of a NAND gate is 1. In other words, there is one input-output mapping that gives a value of ON (1) and OFF (0), respectively.

*Proof.* The probability of a *correct* value at the output of the gate, , is calculated in (1).
where = Gates + Inputs, = number of columns in transfer matrix, = number of rows in transfer matrix.

The PTM for a gate including its inputs is the dot product of the PTM of the gate itself with the Kronecker product of the PTMs of the inputs.

where are PTMs for inputs through = Gates.

The error function is a monotonic function with respect to and thus so is .

The equation for the minimal as calculated from (1) is shown for each possible value of in (3). For a 3-input function, .

where

The equation for for each value of for 2-input functions is given in (5).

For , the probability of error, , for 2 and 3 input gates is minimal for .

Corollary 1. *NAND and NOR have the minimal probability of error of any gate type subjected to noise for 2 and 3 inputs. Among the set of 2- and 3-input gates subjected to noise, {NAND, NOR, XOR, AOI, OAI, Maj, Min}, NAND and NOR have a lower probability of error than any other gate in this set.*

*Proof.* NAND and NOR are gates whose ON-Set or OFF-Set has cardinality 1, and the other gates in the set are such that both the ON-Set and the OFF-Set have cardinality greater than 1. Therefore, from Theorem 1, NAND and NOR have the minimal probability of error, which is less than that of the XOR, AOI, OAI, Maj, and Min gates.

It is easy to conclude from Figure 6 that the XOR2 gate has a higher output error than the NAND2 gate for all bit-flip probabilities shown. Thus it could be said that XOR2 gates propagate errors in a circuit with a higher probability than NAND2 gates. As per Theorem 1, NAND2 will in fact have the lowest error rate of any gate for any bit-flip probability in the assumed range.

Another implication of Theorem 1 and its corollary is that the results can be extended to circuits of arbitrary size. That is to say, any logic function will have the least probability of error when implemented with gates whose *ON-Set* or *OFF-Set* has cardinality 1. Further, the smaller the minimum cardinality of either the *ON-Set* or *OFF-Set* of these gates, the smaller the error probability of the network of gates.
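
The role of the ON-Set and OFF-Set cardinality can be illustrated by enumerating all sixteen 2-input Boolean functions. The sketch below is an illustration, not the proof: it assumes noiseless gates whose inputs flip independently with probability `eps`, uses the hypothetical value 0.1, and groups functions by the smaller of the two cardinalities. Constant functions (cardinality 0) trivially never err and are not gates in the sense of Theorem 1.

```python
from itertools import product

INPUTS = list(product((0, 1), repeat=2))  # 00, 01, 10, 11

def table_error(table, eps):
    """Average output error of a 2-input truth table under independent input flips."""
    total = 0.0
    for i, bits in enumerate(INPUTS):
        for j, noisy in enumerate(INPUTS):
            if table[j] != table[i]:
                k = sum(b != m for b, m in zip(bits, noisy))  # flipped bits
                total += eps**k * (1 - eps) ** (2 - k)
    return total / len(INPUTS)

def errors_by_cardinality(eps):
    """Map min(|ON-Set|, |OFF-Set|) to the errors of all functions in that class."""
    groups = {}
    for table in product((0, 1), repeat=4):
        on_set = sum(table)
        card = min(on_set, 4 - on_set)
        groups.setdefault(card, []).append(table_error(table, eps))
    return groups

groups = errors_by_cardinality(0.1)
for card in sorted(groups):
    print(card, round(min(groups[card]), 4), round(max(groups[card]), 4))
```

In this model every function of cardinality 1 (AND, NAND, OR, NOR, and their input-inverted variants) has the same error, and that error is lower than the error of every function of cardinality 2.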

Secondly, the error rate of a NAND gate decreases as a function of number of inputs, for example going from a 2-input to a higher input version, whereas the opposite is true for an XOR gate.
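
This trend can be illustrated with closed-form expressions. The formulas below were derived for this sketch under the assumption of noiseless gates with independent input flips of probability `eps` and equally likely inputs; they are not quoted from the proof above, and the function names are hypothetical.

```python
def nand_error(n, eps):
    """Average output error of an n-input NAND under independent input flips.

    The all-ones input errs if any bit flips; every other input errs only if
    the flips produce the all-ones pattern. Averaging over 2**n inputs gives
    2 * (1 - (1 - eps)**n) / 2**n.
    """
    return 2.0 * (1.0 - (1.0 - eps) ** n) / 2**n

def xor_error(n, eps):
    """Average output error of an n-input XOR under independent input flips.

    The output is wrong exactly when an odd number of inputs flip.
    """
    return (1.0 - (1.0 - 2.0 * eps) ** n) / 2.0

eps = 0.1  # hypothetical input error probability
for n in (2, 3, 4):
    print(n, round(nand_error(n, eps), 5), round(xor_error(n, eps), 5))
```

For this value of `eps`, the NAND error shrinks as inputs are added while the XOR error grows, consistent with the statement above.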

Theorem 2. *The probability of error at an output of a noisy circuit increases as the logic depth of that circuit increases.*

*Proof. *Let be the probability transfer matrix (PTM) for a given noisy circuit of depth . Depth is defined as the longest path in number of gates from primary input to primary output. To increase the length of this circuit to , additional noisy logic is added to compute the inputs of . Let the kronecker product of the PTMs of the additional input logic to be . The resulting PTM of the new circuit of length is according to [21].

Assume . Then according to (1), the noisy circuit would have to have , a contradiction. Therefore, .

The results for increasing logic depth of an inverter chain with different values of can be seen in Figure 7.

From Figure 7, one can see that no matter the value of the bit-flip probability, the error of the circuit increases as the logic depth increases. Another result available from the experiment is that the rate at which the error increases with logic depth is proportional to the bit-flip probability. This implies that logic depth becomes increasingly important as the level of noise present in the circuit grows. This further implies that as deep submicron transistor generations are explored, logic cones and pipeline stages must become increasingly smaller to sustain the same level of reliability.
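
The inverter-chain experiment can be approximated with a few lines of PTM arithmetic. This is a sketch under assumptions rather than a reproduction of Figure 7: each inverter output is assumed to flip independently with probability `eps`, matrices hold one row per input value and one column per output value, and the function names and the sampled values of `eps` are illustrative.

```python
def matmul(a, b):
    """Ordinary matrix product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def inverter_chain_error(depth, eps):
    """Average output error of a chain of `depth` noisy inverters."""
    inverter = [[eps, 1 - eps], [1 - eps, eps]]  # rows: input 0, 1; columns: output 0, 1
    ptm = [[1.0, 0.0], [0.0, 1.0]]               # identity: a chain of depth zero
    for _ in range(depth):
        ptm = matmul(ptm, inverter)
    ideal_out_for_zero = depth % 2               # ideal output when the input is 0
    err_zero = ptm[0][1 - ideal_out_for_zero]    # wrong output given input 0
    err_one = ptm[1][ideal_out_for_zero]         # wrong output given input 1
    return (err_zero + err_one) / 2

for eps in (0.01, 0.05, 0.1):
    print(eps, [round(inverter_chain_error(d, eps), 4) for d in (1, 2, 4, 8)])
```

In this model the error rises with every added stage, and it rises faster for larger `eps`.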

Table 1 shows the different probabilities of error of gates given .

Several observations regarding logic synthesis under a probabilistic fault model were made.

(i) A minority gate followed by an inverter will have the same value as a majority gate.

(ii) For : : XOR3 has the minimal error rate, .

(iii)

(iv) For : is minimal.

Table 1 confirms the results predicted by the theory given earlier in this section. Namely, NAND and NOR type gates have the lowest probability of error of any gate. Further, for a given number of inputs, as the cardinality of the smaller of the ON-Set and OFF-Set increases, so does the probability of error. For example, a gate for which this cardinality is 1, such as a NAND3 gate, has a minimal error. The error is increased for a gate with a larger cardinality, such as an AOI3 gate, and increases further for the XOR3 gate.

#### 5. Case Study: The Full-Adder

The full-adder is a primary building block in digital arithmetic computation, present in nearly all digital adders and multipliers, and is one of the most well-studied circuits with a large variety of implementations. Its diverse array of circuit implementations and relative importance in creating low-power datapath units make it a prime candidate for study.

##### 5.1. Full-Adder Microarchitecture

In [20, 23], a CMOS full-adder optimized for transistor count is presented, known as the 28-transistor implementation or "mirror adder"; it will be referred to as "f28". In [23], the authors claim that the f28 is not only faster but also consumes less power than the low-power complementary pass-transistor logic (CPL) implementation. The transistor-level implementation of the "f28" full-adder is shown in Figure 8.

It should also be noted that a majority gate is nothing more than a minority gate with an inverted output. A majority gate can also be achieved by inverting the inputs to the minority gate, which is useful should the inverted inputs be needed for some other part of the circuit. This phenomenon is illustrated in Figure 9.
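
Both equivalences are easy to confirm exhaustively over the eight input combinations of a 3-input gate. The function names below are illustrative.

```python
from itertools import product

def maj3(a, b, c):
    """3-input majority: 1 when at least two inputs are 1."""
    return 1 if a + b + c >= 2 else 0

def min3(a, b, c):
    """3-input minority: the complement of the majority."""
    return 1 - maj3(a, b, c)

def inv(x):
    return 1 - x

all_inputs = list(product((0, 1), repeat=3))

# Majority obtained by inverting the output of a minority gate.
inverted_output_ok = all(inv(min3(a, b, c)) == maj3(a, b, c) for a, b, c in all_inputs)

# Majority obtained by inverting the inputs of a minority gate.
inverted_inputs_ok = all(
    min3(inv(a), inv(b), inv(c)) == maj3(a, b, c) for a, b, c in all_inputs
)

print(inverted_output_ok, inverted_inputs_ok)
```

The second identity holds because the majority function is self-dual: inverting all of its inputs inverts its output.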

Finally, a dual-rail asynchronous adder is also presented for comparison. Asynchronous logic is treated in more depth in [24]. The circuit for carry-out computation is shown in Figure 10.

In terms of the theorems proven earlier, dual-rail asynchronous logic is quite attractive because of a unique property. This type of logic switches between a *valid* state and an *invalid* state, and when the logic is in an invalid state, between one and many bit-flips on the input can be sustained before an output error occurs. These states are defined such that if bit is *invalid* then , . Otherwise, if or , then is in *valid* state. Table 2 summarizes the result of single-bit flips (SBF) on the inputs of dual-rail asynchronous logic for each relevant input/output mapping.

Table 2 shows the single-bit flips at the input that would cause the output to erroneously flip for the carryout calculation of the dual-rail asynchronous adder. For example, no single-bit flip would cause an output error when the async-adder is in *invalid* state because at the very least the output will float. The async-adder is designed to leak back to the previous input in the case of a floating output. As another example if the async-adder is in state , a bit flip on or would cause to flip. This is because the output is supposed to be pulled up to but with the input a bit flip on either of or will create a DC path to ground, pulling the output down to . As it turns out, the async-adder has the least single-bit flip errors on the input that can cause an output bit flip of any adder.

Several other adders were built, including an "fmaj" full-adder built with an XOR3 gate for the sum bit and a Maj3 gate for the carryout. A baseline adder called "fbase", measured at full voltage so that no probabilistic errors would occur, was built for comparison. The "fbase" adder was sized and built with several stages according to logical effort to maximize speed.

Simulations were run using the PTM algorithm in [22] to assess which full-adder types were most robust to error. These 4 full-adders were built in *Cadence* with TSMC technology and voltage scaled (except for the deterministic "fbase" adder) such that probabilistic errors occurred at each node in the circuit due to thermal noise, and then measured with the HSPICE circuit simulator. The results can be seen in Table 4. The particular node error probability used was chosen because it was shown in [1] that adders and multipliers built to compute with a nonzero probability of error can still successfully be used for image processing.

By scaling down supply voltages, the circuits will of course run more slowly; however, the energy-performance product metric still showed a gain in these experiments, which is omitted from Table 4 for simplicity.

#### 6. Conclusion

It was shown that probabilistic design principles can be used for ultra-low power computing. Probability and energy become closely linked in ultra-low power regimes because supply voltage is lowered so much so that it becomes comparable to the thermal noise level rendering the circuit probabilistic.

As shown in Table 4, error-aware logic design can produce a great reduction in the error rate, and thus in energy consumption, over the baseline case, which is optimized for speed. It was shown that an asynchronous adder in fact had the best performance, producing both the lowest probability of error for a given node error probability and the least energy consumption.

The principles behind the efficiency improvement were established in two theorems developed under the probabilistic transfer matrix model: not all gates propagate errors equally, and the depth of a digital circuit has a direct effect on its efficiency in terms of energy consumed for a given error rate. Future work includes extending these theorems to a general logic synthesis algorithm, and continued work on tuning threshold voltages through floating gate technology to mitigate other device effects such as device mismatch.

#### Acknowledgments

The authors would like to thank the National Science Foundation for supporting this research and would also like to thank the reviewers for their insightful comments.