VLSI Design

Volume 2015, Article ID 651785, 16 pages

http://dx.doi.org/10.1155/2015/651785

## A Discrete Event System Approach to Online Testing of Speed Independent Circuits

Department of Computer Science and Engineering, Indian Institute of Technology, Guwahati 781 039, India

Received 23 November 2014; Revised 30 March 2015; Accepted 30 March 2015

Academic Editor: Marcelo Lubaszewski

Copyright © 2015 P. K. Biswal et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

With the increase in soft failures in deep submicron ICs, online testing is becoming an integral part of design for testability. Some techniques for online testing of asynchronous circuits have been proposed in the literature; they involve development of a checker that verifies the correctness of the protocol. This checker involves Mutex blocks, making its area overhead quite high. In this paper, we adapt the theory of fault detection and diagnosis available in the literature on discrete event systems to online testing of speed independent asynchronous circuits. The scheme involves development of a state based model of the circuit, under normal and various stuck-at fault conditions, and finally the design of state estimators termed detectors. The detectors monitor the circuit online and determine whether it is functioning in normal or failure mode. The main advantages are nonintrusiveness and low area overhead compared to similar schemes reported in the literature.

#### 1. Introduction

With the advancement of VLSI technology for circuit design, there is a need to monitor the operation of circuits to detect faults [1]. This need has increased dramatically in recent times because, with the widespread use of deep submicron technology, the probability of faults developing during operation has risen. Testing a circuit only before operation and assuming continued fault-free behaviour may decrease the reliability of operation. In other words, there is a need for *online testing (OLT) of VLSI circuits, whereby they are designed to verify, during normal operation, whether their output response conforms to the correct behaviour*.

Most of the circuits used in VLSI designs are synchronous. Compared to synchronous circuits, asynchronous designs offer advantages such as freedom from clock skew, low power consumption, and average case rather than worst case performance. Testing asynchronous circuits is, however, considered more difficult than testing synchronous circuits because of the absence of a global clock [2]. OLT has been studied for the last two decades and can be broadly classified into the following main categories:

(i) Self-checking design.

(ii) Signature monitoring in FSMs.

(iii) Duplication.

(iv) Online BIST.

The approach of self-checking design consists of encoding the circuit outputs using some error detecting code and then checking some code invariant property (e.g., Parity) [3–5]. Some examples are Parity codes [6], m-out-of-n codes [7], and so forth. The area overhead for making circuits self-checkable is usually not high. These techniques, termed as “intrusive OLT methodologies,” require some special properties in the circuit structure to limit the scope of fault propagation. These properties can be achieved by resynthesis and redesign of the original circuit, which may affect the critical paths in the circuit.

Signature monitoring techniques for OLT [8, 9] work by studying the state sequences of the circuit FSM model during its operation. These schemes detect faults that lead to illegal paths in the control flow graph, that is, paths having transitions which do not exist in the specified FSM. To make the runtime signature of the fault-free circuit FSM different from the one with a fault, a *signature invariant property* is forced during FSM synthesis, making the technique intrusive. Further, the state explosion problem in FSM models makes the application of this scheme difficult for practical circuits.

The duplication based OLT technique works by simply replicating the original circuit and comparing the output responses [10]; a fault is detected if the outputs do not match. The major advantage of the duplication based scheme is nonintrusiveness; however, the area overhead is more than double. To address this issue, partial duplication is applied [11, 12]. This scheme first generates a complete set of test vectors for all possible faults, using Automatic Test Pattern Generation (ATPG) algorithms. After that, a subset of faults is selected (based on the required coverage), and a subset of test vectors for the selected faults (based on the tolerable latency) is taken and synthesized into a circuit which is used for OLT. It may be noted that ATPG algorithms are optimized to generate the minimum number of test vectors that detect all faults. As the scheme applies ATPG algorithms in a reverse philosophy, it becomes prohibitively complex for large circuits.

The technique of designing circuits with additional on-chip logic, which can be used to test proper operation of the circuit before it starts, is called off-line BIST. Off-line BIST resources are now being used for online testing [13–16]. This technique utilizes the idle time of various parts of the circuit to perform online BIST. Idle times are shrinking because of pipelining and parallelism techniques, so online BIST schemes are of limited utility.

*Motivation of the Work*. From the above discussion we may state that an efficient approach for OLT should satisfy the following criteria:

(i) The OLT scheme should be nonintrusive. This is the most important constraint, as designers tune the circuit to meet requirements such as frequency, area, and power and do not want test engineers to change the resulting design.

(ii) The OLT technique should support well accepted fault models.

(iii) The scheme should be computationally efficient so that it can handle reasonably large circuits.

Most of the papers cited in the above discussion are for OLT of synchronous circuits and only a few of them [4, 5, 10] are applicable to asynchronous circuits. Now, we elaborate on these three works on OLT of asynchronous circuits and derive motivation of the present work.

Traditionally, double redundancy methods were used for OLT of asynchronous designs [10]. In this scheme, two copies of the same circuit work in parallel and the online tester checks whether they generate the same output. This scheme results in more than 100% area and power overheads. Further, since both copies are the same circuit, they are susceptible to failures of a similar nature. The schemes reported in [4, 5] work by checking whether the output of the asynchronous circuit maintains a predefined protocol (i.e., there is no premature or late occurrence of transitions). The checker circuit is implemented using David cells (DCs), Mutual Exclusion (Mutex) elements, C-elements, and logic gates. The checker circuit has two modes: normal-operation mode and self-test mode. In normal-operation mode, the checker detects any violation of the protocol being executed by the circuit under test (CUT). In self-test mode, the checker detects faults that may occur within the checker itself. Mutex elements (components of an asynchronous arbiter) are used to grant exclusive access to the shared DCs between the different modes of operation. The area overhead of the Mutex blocks is high, even compared to the original circuit. So, the area overhead of the online tester in this case would be much higher than that of the original circuit and even that of the redundancy based methods. Further, this tester only checks the protocol, so fault coverage or detection latency cannot be guaranteed.

Discrete event system (DES) model based methods are used for failure detection in a wide range of applications because of the simplicity of the model as well as of the associated algorithms [17]. A DES is characterized by a discrete state space and event driven dynamics. A finite state machine (FSM) based model is a simple example of a DES. In the state based DES model, the model is partitioned according to the failure or normal condition. The failure detection problem consists in determining, in finite time after the occurrence of a failure, whether the system is traversing the normal or a failure subsystem. A fault is detectable by virtue of certain transitions (in the failure states) which are called fault detecting transitions ($FD$-transitions). An $FD$-transition is a transition of the faulty subsystem for which there is no corresponding equivalent transition in the normal subsystem. Using the $FD$-transitions, a DES fault detector is designed, which is a kind of state estimator of the system. For OLT of circuits, the detector is synthesized as a circuit which is executed concurrently with the circuit under test (CUT). Biswas et al. [18, 19] have developed an OLT scheme for synchronous circuits using the FSM based DES theory, which satisfies most of the criteria mentioned above for an efficient online tester design. In this paper, we aim at using the theory of failure detection of DES models for OLT of speed independent (SI) circuits.

As with synchronous circuits, the basic FSM framework is also used to model asynchronous circuits, with slight modification. In synchronous circuits, state changes in the FSM occur only at the active edge of the register clock, irrespective of when the inputs change. In asynchronous circuits, on the other hand, state changes can occur immediately after a transition in the inputs. An FSM used to model an asynchronous circuit is called an AFSM [20]. An alternative to the AFSM is the burst-mode (BM) state machine [20]. AFSMs and BM state machines are similar from the modeling perspective; however, in a BM state machine transitions are labeled with signal changes rather than with explicit signal values, as is the case in AFSMs. AFSMs and BM state machines assume that the inputs change first, followed by the outputs, and that finally the new state is reached. Because of this strict sequence of signal changes, not all asynchronous protocols can be modeled using AFSMs or BM state machines. Extended BM state machines address this modeling issue by allowing some inputs in a burst to change monotonically along a sequence of bursts rather than in a particular burst. The Petri net (PN) is a widely accepted modeling framework for highly concurrent systems [21]. A PN models a system using interface behaviors, which are represented by allowed sequences of transitions, or traces. Viewing an asynchronous circuit as a concurrent system makes PN based models more appropriate than AFSMs and BM state machines for analysis and synthesis. There are several variants of PNs, among which the signal transition graph (STG) is generally used to model asynchronous circuits. The major reason is that the STG interprets transitions as signal transitions and specifies circuit behavior by defining causal relations among these signal transitions [22].

In this paper, we aim at using the theory of failure detection of DES models for OLT of SI asynchronous circuits. Several modifications are made to the DES framework used for synchronous circuits [18, 19] when it is applied to SI circuits. The modifications are as follows:

(i) We first model SI circuits along with their faults as STGs and then translate them into state graphs. State graphs are similar to FSM based DES models, from which $FD$-transitions can be determined.

(ii) In the case of synchronous circuits, the fault detector is an FSM which detects the occurrence of $FD$-transitions. A synchronous circuit that performs online testing can be synthesized in a straightforward way from the FSM specification [18, 19]. Why the same design cannot be synthesized as an asynchronous circuit is discussed in this paper. As the use of a synchronous circuit for OLT of asynchronous modules is not desirable, we propose a new technique for detector design which can be synthesized as an SI circuit. The detector is designed as a state graph model which is live and has complete state coding (CSC); these properties ensure its synthesizability as an SI circuit.

The paper is organized as follows. In Section 2, we present some definitions and formalisms of the DES framework. Section 3 illustrates DES modeling for a speed independent circuit under normal and stuck-at faults. In Section 4, the DES detector for the SI asynchronous circuit is designed. Synthesizing the DES detector as online tester circuit is also discussed in the same section. Section 5 presents experimental results regarding area overhead and fault coverage of the DES detector based online tester. Also, comparison of area overhead of the proposed approach with other similar schemes is reported. Finally, we conclude in Section 6.

#### 2. DES Modeling Framework: Definitions and Formalisms

A *discrete event system (DES) model* $G$ is defined as $G = \langle V, X, \Im, X_0 \rangle$, where $V = \{v_1, v_2, \ldots, v_n\}$ is a finite set of discrete variables assuming values from the sets $D_1, D_2, \ldots, D_n$, called the domains of the variables, $X$ is a finite set of states, $\Im$ is a finite set of transitions, and $X_0 \subseteq X$ is the set of initial states. (When SI circuits are modeled as DES, the state variables are the values of the I/O signals, so in this work we use the terms signal and variable interchangeably.) A state $x$ is a mapping of each variable to one of the elements of the domain of the variable. A *transition* $\tau \in \Im$ from a state $x$ to another state $x^+$ is an ordered pair $\langle x, x^+ \rangle$, where $x$ is denoted by $\mathrm{initial}(\tau)$ and $x^+$ is denoted by $\mathrm{final}(\tau)$.
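The four components of a DES model can be pictured as a small data structure. The following is a minimal Python sketch under stated assumptions: the class name `DES`, the helper names `make_state`, `initial_of`, and `final_of`, and the two-signal toy model are all illustrative and are not taken from the paper.

```python
from dataclasses import dataclass


def make_state(**values):
    """Build a hashable state: a sorted tuple of (variable, value) pairs."""
    return tuple(sorted(values.items()))


@dataclass
class DES:
    domains: dict      # variable name -> set of values it may take
    states: set        # states built with make_state
    transitions: set   # (source_state, destination_state) pairs
    initial: set       # initial states, a subset of `states`

    def validate(self):
        """Raise ValueError if the four components are inconsistent."""
        for state in self.states:
            assignment = dict(state)
            if set(assignment) != set(self.domains):
                raise ValueError(f"state {state} does not assign every variable")
            for var, val in assignment.items():
                if val not in self.domains[var]:
                    raise ValueError(f"{var}={val} is outside its domain")
        for src, dst in self.transitions:
            if src not in self.states or dst not in self.states:
                raise ValueError("transition endpoint is not a known state")
        if not self.initial <= self.states:
            raise ValueError("initial states must be a subset of the states")
        return True


def initial_of(transition):
    """Source state of an ordered pair (source, destination)."""
    return transition[0]


def final_of(transition):
    """Destination state of an ordered pair (source, destination)."""
    return transition[1]


# Hypothetical toy model with two binary signals a and b.
s00 = make_state(a=0, b=0)
s10 = make_state(a=1, b=0)
s11 = make_state(a=1, b=1)
toy = DES(
    domains={"a": {0, 1}, "b": {0, 1}},
    states={s00, s10, s11},
    transitions={(s00, s10), (s10, s11)},
    initial={s00},
)
toy.validate()
```

Storing a state as a sorted tuple keeps it hashable, so states and transitions can live in ordinary Python sets.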

##### 2.1. Failure Modeling

The failure of the system is modeled by dividing the DES model into submodels, each of which models the system under normal or failure conditions. To differentiate between the submodels, each state $x$ is assigned a failure label by a status variable $C$ whose domain is $\{N, F_1, F_2, \ldots, F_p\}$, where $N$ is the normal status, $F_i$, $1 \le i \le p$, is a failure status, and $p$ is the number of possible faults.

*Definition 1 (normal $G$-state).* A $G$-state $x$ is normal if $x(C) = N$. The set of all normal states is denoted by $X_N$.

*Definition 2 ($F_i$-$G$-state).* A $G$-state $x$ is a failure state, or synonymously an $F_i$-state, if $x(C) = F_i$. The set of all $F_i$-states is denoted by $X_{F_i}$.

*Definition 3 (normal $G$-transition).* A $G$-transition $\langle x, x^+ \rangle$ is called a normal $G$-transition if $x, x^+ \in X_N$.

*Definition 4 ($F_i$-$G$-transition).* A $G$-transition $\langle x, x^+ \rangle$ is called an $F_i$-$G$-transition if $x, x^+ \in X_{F_i}$.

*Definition 5 (equivalent states).* Two states $x$ and $y$ are said to be equivalent, denoted by $x E y$, if $x|_V = y|_V$ and $x(C) \neq y(C)$.

In other words, two states are said to be equivalent if they have the same values for the state variables and different values for the status variable.

A transition $\langle x, x^+ \rangle$, where $x(C) \neq x^+(C)$, is called a *failure* transition, indicating the first occurrence of some failure in the system. Since failures are assumed to be *permanent*, there is no transition from any state in $X_{F_i}$ to any state in $X_N$ or from any state in $X_{F_i}$ to any state in $X_{F_j}$ with $j \neq i$.
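The status labels and the permanence assumption can be illustrated with a short Python sketch. It assumes, purely for illustration, that a state is a pair `(signals, status)` with the status given as a string such as `"N"` or `"F1"`; the function name `classify` and the category names it returns are hypothetical.

```python
NORMAL = "N"  # assumed label for the normal status


def classify(transition):
    """Classify a transition ((signals, status), (signals, status)).

    Returns one of:
      "normal"  - both end states carry the normal status
      "faulty"  - both end states carry the same failure status
      "failure" - the status changes from normal to some failure status
      "illegal" - the status leaves a failure status, which permanent
                  failures do not allow
    """
    (_, src_status), (_, dst_status) = transition
    if src_status == dst_status:
        return "normal" if src_status == NORMAL else "faulty"
    if src_status == NORMAL:
        return "failure"
    return "illegal"


# Hypothetical two-signal states, written as bit strings.
print(classify((("00", "N"), ("10", "N"))))    # normal
print(classify((("10", "N"), ("10", "F1"))))   # failure
print(classify((("10", "F1"), ("11", "F1"))))  # faulty
print(classify((("10", "F1"), ("10", "N"))))   # illegal
```

A model that respects the permanence assumption contains no transition for which `classify` returns `"illegal"`.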

*Definition 6 (equivalent transitions).* Two transitions $\tau_1 = \langle x_1, x_1^+ \rangle$ and $\tau_2 = \langle x_2, x_2^+ \rangle$ are equivalent, denoted by $\tau_1 E \tau_2$, if $x_1 E x_2$ and $x_1^+ E x_2^+$, and they are associated with the same signal change.

Suppose that there is a transition in the failure DES model for which there is no corresponding equivalent transition in the normal DES model; then that transition is called a failure detecting transition ($FD$-transition). The failure is detected when the system traverses the $FD$-transition. Thus, we can define an $FD$-transition as follows.

*Definition 7 ($FD$-transition).* An $F_i$-$G$-transition $\tau'$ of the faulty DES model is an $FD$-transition if there is no $G$-transition $\tau$ in the normal DES model such that $\tau E \tau'$.

The motivation of failure detection using a DES model is to find such $FD$-transitions and to design the DES detector using them. In the next section, we discuss how to model SI circuits using DES.
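The search for failure detecting transitions described above can be sketched in Python. This is an illustrative sketch, not the paper's algorithm: states are assumed to be pairs `(signals, status)`, the function names are hypothetical, and the small normal and faulty transition sets are invented only to exercise the code.

```python
def equivalent_states(x, y):
    """Same signal values, different status label."""
    return x[0] == y[0] and x[1] != y[1]


def equivalent_transitions(t1, t2):
    """Source states equivalent and destination states equivalent.

    Because a transition is stored as a pair of complete signal valuations,
    equal valuations at both ends imply the same signal change.
    """
    return equivalent_states(t1[0], t2[0]) and equivalent_states(t1[1], t2[1])


def fd_transitions(normal, faulty):
    """Faulty transitions that have no equivalent normal transition."""
    return {
        tf for tf in faulty
        if not any(equivalent_transitions(tn, tf) for tn in normal)
    }


# Hypothetical normal behaviour: 00 -> 10 -> 11 -> 01 -> 00.
normal = {
    (("00", "N"), ("10", "N")),
    (("10", "N"), ("11", "N")),
    (("11", "N"), ("01", "N")),
    (("01", "N"), ("00", "N")),
}
# Hypothetical behaviour under fault F1: the second signal never rises.
faulty = {
    (("00", "F1"), ("10", "F1")),
    (("10", "F1"), ("00", "F1")),
}

detecting = fd_transitions(normal, faulty)
```

In this toy example only the faulty transition from `10` to `00` has no normal counterpart, so it is the only member of `detecting`. The pairwise comparison is quadratic in the number of transitions; indexing the normal transitions by their signal values would avoid that for larger models.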

#### 3. DES Model of a Speed Independent Circuit: Normal and Faulty

As already discussed, the first step in designing a DES based online tester is to obtain the normal and faulty state based models of the CUT. However, the traditional state based DES paradigm cannot be used directly for modeling SI circuits. So in this case we start with the signal transition graph (STG), which is a type of Petri net based DES, to specify fault-free and faulty conditions. The STGs are then converted into state graphs (similar to FSMs) using the concept presented in [22].

We first discuss fault modelling at the STG level using an example and concepts from [22]. In addition to the models (i.e., faults in transistors of the C-elements) given in [22], we have also modeled stuck-at faults on all wires (i.e., input/output of gates).

##### 3.1. Fault Modeling

The SI asynchronous CUT example considered to illustrate the proposed scheme is shown in Figure 1 (taken from [22]). Traditionally, synchronous circuits consist of blocks of combinational logic connected with clocked latches or registers, whereas SI circuit designs use logic gates as building blocks together with C-elements, which act as storage elements. The transistor level diagram of the C-element is shown in Figure 2; the logic function of the C-element can be described by the Boolean equation $c^{+} = ab + c(a + b)$, where $a$ and $b$ are the inputs, $c^{+}$ is the next state, and $c$ is the old state value [22, 23]. The output of the C-element becomes logically high (low) when both inputs are logically high (low); otherwise it keeps its previous logic value. There are two types of C-elements used in SI circuits: the static C-element and the dynamic C-element. The static version guarantees that the information inside it can be stored for unbounded periods. However, the dynamic version provides gains in terms of area, power, and delay [23–26]. Since circuits with high operating speed, low area, and low power consumption are preferred today, we have chosen SI circuits with dynamic C-elements instead of static ones.
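The behaviour of the C-element described above can be checked with a few lines of Python. The sketch below is illustrative: the names `a`, `b`, and `c` for the two inputs and the stored value, and both function names, are assumptions. It compares the standard sum-of-products form of the C-element next-state function with the verbal description (output follows the inputs when they agree, otherwise holds).

```python
from itertools import product


def c_element_equation(a, b, c):
    """Next state from the Boolean form ab + c(a + b)."""
    return (a & b) | (c & (a | b))


def c_element_behaviour(a, b, c):
    """Next state from the verbal description: follow the inputs when
    they agree, otherwise hold the previous value."""
    return a if a == b else c


# The two descriptions agree on all eight input/state combinations.
agree = all(
    c_element_equation(a, b, c) == c_element_behaviour(a, b, c)
    for a, b, c in product((0, 1), repeat=3)
)

# Drive the element with a short hypothetical input sequence, starting low.
state = 0
trace = []
for a, b in [(1, 0), (1, 1), (0, 1), (0, 0)]:
    state = c_element_equation(a, b, state)
    trace.append(state)
```

In the trace, the output rises only once both inputs are high and falls only once both are low, which is the storage behaviour that lets the C-element act as the state-holding element of an SI circuit.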