Department of Informatics, Technological Educational Institute of Athens, Athens 12210, Greece
Test set embedding built-in self test (BIST) schemes are a class of pseudorandom BIST techniques where the test set is embedded into the sequence generated by the BIST pattern generator, and they displace common pseudorandom schemes in cases where reverse-order simulation cannot be applied. Single-seed embedding schemes embed the test set into a single sequence and demand extremely small hardware overhead, since no additional control or memory is required to reconfigure the test pattern generator. The challenge in this class of schemes is to choose the best pattern generator among various candidate configurations. This, in turn, requires evaluating the location of each test pattern in the sequence as fast as possible, so that as many candidate configurations of the test pattern generator as possible can be tried. This problem is known as the test vector-embedding problem.
In this paper, we present a novel solution to the test vector-embedding problem for sequences generated by accumulators. The time overhead of the solution is of the order O(1). The applicability of the presented method to embedding test sets for real-world circuits is investigated through experimental results on some well-known benchmarks; comparisons with previously proposed schemes indicate that comparable test lengths are achieved, while the time required for the calculations is reduced by a factor of more than 30.
1. Introduction
The problem of testing
VLSI chips is becoming more and more time- and memory-consuming. For the
testing of the chips fabricated today, complicated testing scenarios are applied, which incorporate both external testers and on-chip resources. The latter fall into the category of built-in self-test (BIST) techniques that
provide for test pattern generation and response verification operations on chip [1].
BIST pattern generators apply the
test vectors to the inputs of the circuit under test. The effectiveness of a
BIST pattern generator is judged by the hardware overhead imposed on the
circuit, the length of the applied test sequence, and the impact on the timing parameters of the circuit. In pseudorandom BIST schemes [2], either easily
synthesizable modules (i.e., modules that can be easily implemented by altering existing registers, such as linear feedback shift registers or cellular automata
[3]) or modules that already exist in VLSI chips (e.g., counters or accumulators [4]) are utilized for the generation of test patterns.
With pseudorandom BIST, both the
hardware overhead and the impact on the circuit timing parameters are kept low. In order to reduce the length of the pseudorandom sequence, test vector embedding [5, 6] has been proposed. With test
vector embedding, a precomputed (deterministic) test set is embedded into a sequence generated by a pseudorandom generator. In this way, the number of applied pseudorandom patterns is decreased without affecting the hardware overhead or the impact on the timing parameters; furthermore, such schemes remain applicable when reverse-order simulation [7] cannot be used. Test vector embedding schemes rely on an embedding algorithm [5], which calculates the location of a vector in the sequence generated by a generator that starts from a specific value. This location is the number of cycles that the generator module needs to operate until the vector appears at its outputs.
Embedding test vectors into
sequences generated by hardware modules has been the goal of various
researchers. For example, Lempel et al. [5] presented an algorithm for embedding test vectors into sequences generated by
LFSRs, utilizing results of the theory of discrete logarithms, based on the results of [8, 9] in time proportional to ,
where is the number of stages of the LFSRs. Kagaris and Tragoudas [10, 18] have presented results on
embedding test vectors into counter-based sequences by using permutation and
complementation operations on the counter outputs.
Current VLSI chips (implementing
data paths or digital signal processors) commonly contain accumulators (see Figure 1); thereby, the utilization of such modules for the generation of test patterns or verification of responses of a circuit under test has no impact on the circuit timing parameters. For example, in [12] a response compaction scheme
based on accumulators is presented for the testing of RAM modules. In [13], a scheme was presented that
generates weighted patterns, that is, patterns where the probability of an
output is different from 0.5 (purely pseudorandom vectors) based on a properly
modified accumulator module. The pseudorandom nature of accumulator-generated
sequences has been studied in [4]. Dorsch and Wunderlich [6] presented a test vector embedding
approach utilizing accumulators and results from the theory of reduced ordered binary
decision diagrams [9].
Independently, Stroele and Meyer [7] explored methods to reduce test
application time for accumulator-based self-test by skipping test patterns and
utilizing reverse-order simulation. Recently, Manich et al. [14, 15] further advanced the field by
presenting a scheme that minimizes memory requirements for storing the seeds
and addends that feed the accumulator inputs. Their scheme is based on the
observation that by using as addends
the test patterns extracted from an automatic test pattern generator tool, the fault coverage is increased.
Figure 1: Typical DSP core.
The above-mentioned schemes [6, 7, 14, 15] are based on the utilization of multiple seeds, where the accumulator is loaded with different values during the test phases and different addends feed the accumulator inputs. Therefore, they share the common need to store the accumulator addends and seeds, as well as additional control to handle the BIST operations. However, the requirement for additional memory and control for BIST purposes cannot always be met. In certain low-budget applications, the BIST hardware overhead needs to be as simple as possible. In these cases, single-seed solutions, where the test pattern generator is initialized and left to operate for a predetermined number of cycles until all
faults under question are detected, may be a preferable solution. The
cornerstone of such schemes is their test set embedding algorithm. However, the problem of embedding test patterns into hardware-generated sequences has been typically considered to be of exponential complexity (see, e.g., [5]).
In a recent work [11], a solution to the problem of embedding a test pattern into an accumulator-generated sequence was presented whose running time depends on the number of stages of the accumulator; that is, it is linear in the number of stages. However, when a whole test set has to be embedded, the complexity also grows with the number of test vectors.
Nikolos et al. [16] exploited
ideas from number theory [17] in order to speed up the calculation of the
locations of the test patterns of the test set. More precisely, they found a
way to speed up the calculation from to ; therefore, the time required to
calculate the locations of all the patterns of the test set is reduced by a factor that ranges from 16
to 29 times compared to [11].
The hardware overhead of single-seed
accumulator-based BIST schemes is extremely low, since the need for storage is
eliminated; for example, the module presented in Figure 1 can be easily
configured in such a way that the inputs of the accumulator are driven by the
outputs of a register of the register file. Hence, the hardware overhead is
minimal, compared even to LFSR-based schemes, where the hardware overhead is
two-way multiplexers ( is the width of the LFSR). In Figure 2, the configuration of a 4-stage accumulator that accumulates the pattern 1001 is
presented. The register (one of the registers of the register file of Figure 1)
is set to the 1001 value.
Figure 2: Accumulator fed by the pattern “1001” during BIST.
Another advantage of the schemes
in [11, 16] is that no additional reordering of the inputs is required, as has
been proposed by other schemes (see, e.g., [10, 18]); therefore, the data path
does not have to be reconfigured during the BIST operations.
In this paper, we present a novel
solution to the problem of embedding test patterns into accumulator-generated
sequences; more precisely, we prove that the location of a pattern in the
sequence generated by an -stage accumulator containing one’s complement adder
that accumulates the pattern where is an integer, can be
calculated by a simple formula; hence the embedding algorithm is of the order O(1). To the best of our knowledge,
this is the only result on embedding test vectors of the order O(1) presented in the literature.
Comparisons with previous single-seed accumulator-based schemes [11, 16]
indicate that significant reduction is achieved in the calculation time to
embed the test set, while the length of the resulting sequence is comparable.
The proposed scheme may well be incorporated
into a generic scheme for the testing of processor cores, since it can be
effectively utilized to test combinational parts of the core. For example, as
shown in Section 3.3, the testing of a specific benchmark (c6288), which is a
16 × 16 array multiplier, can be performed in a realistically short time.
The paper is organized as follows. In Section 2, we present the theory underlying
the proposed scheme. In Section 3, comparisons with previously proposed schemes
[11, 16] are performed. Finally, in Section 4, we conclude the paper.
2. Theoretical Background
In the sequel, will represent
an integer number that is, a power of two, that is, The symbol denotes an integer number less
than that is, µ
represents the th-power of 2, that is,
Definition 1. An ()-sequence is the sequence of vectors .
For example, the (7, 2)-sequence is (0, 2, 4, 6, 1, 3, 5, 7). The (15, 4)-sequence is (0, 4, 8, 12, 1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15).
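As an illustration, the two example sequences above can be reproduced by simulating an accumulator with end-around carry. This is a minimal sketch, assuming that the first element of the pair equals 2^k - 1 for a k-bit accumulator and that the second element is the accumulated constant; the function names and the parameter `k` are hypothetical and do not come from the paper.

```python
def ones_complement_add(a, b, k):
    """Add two k-bit values with end-around carry (carry-rotate adder)."""
    mask = (1 << k) - 1
    total = a + b
    # The carry out of the most significant bit is fed back into the least significant bit.
    return (total & mask) + (total >> k)


def accumulator_sequence(k, addend):
    """States of a k-bit accumulator that starts at 0 and adds `addend` every cycle."""
    state = 0
    sequence = [state]
    for _ in range((1 << k) - 1):
        state = ones_complement_add(state, addend, k)
        sequence.append(state)
    return sequence


print(accumulator_sequence(3, 2))  # [0, 2, 4, 6, 1, 3, 5, 7]
```

With `k = 4` and an addend of 4, the same routine yields the (15, 4)-sequence listed above.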
It has been proved (see, e.g., [4]) that an accumulator with one’s complement adder starting from a
nonzero value and accumulating a constant pattern generates all nonzero
vectors if and only if is mutually prime with . Therefore, an ()-sequence
as described in Definition 1 generates all -bit vectors since and are always mutually prime. Indeed, the only numbers dividing are On the other hand, is an odd number
since is even.
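The coverage condition can be checked exhaustively for small accumulators. The sketch below assumes a k-bit accumulator that starts from 0 (as in the example sequences of Definition 1) and compares the observed coverage with the coprimality condition; `covers_all_vectors` and its parameters are hypothetical names introduced for illustration.

```python
from math import gcd


def covers_all_vectors(k, addend):
    """True if a k-bit one's-complement accumulator starting at 0 visits every k-bit vector."""
    n = (1 << k) - 1
    state, seen = 0, {0}
    for _ in range(n):
        total = state + addend
        state = (total & n) + (total >> k)  # end-around carry
        seen.add(state)
    return len(seen) == n + 1


# Every power-of-two addend is coprime with 2**k - 1, so each one covers all vectors.
k = 4
for m in range(k):
    assert gcd(1 << m, (1 << k) - 1) == 1
    assert covers_all_vectors(k, 1 << m)

# An addend that shares a factor with 2**k - 1 (here 3 and 15) does not.
assert not covers_all_vectors(k, 3)
```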
From Definition 1, an accumulator
containing a carry-rotate adder that accumulates a constant pattern generates
an ()-sequence.
From the definition of the ()-sequence, it is evident that for every value of
there exist exactly -sequences, one for each number ,
for For example, for , the four -sequences are presented in Table 1.
Table 1: ()-sequences.
Definition 2. The location of a vector in an
()-sequence, denoted by , is the position of the
vector in the ()-sequence starting from 0.
From Definition 2, indicates when will be generated if the -bit carry-rotate
accumulator accumulates the constant value Therefore, if then For example, from Table 1, it
is easy to verify that while
Following Definitions 1 and 2, the problem of embedding a vector in a sequence
generated by an -stage accumulator fed by a constant pattern is transformed
into calculating The presented scheme is based on Theorem 1. In the
sequel, “mod” will be used to indicate the remainder of the division of two
integer numbers.
Theorem 1. If and then
Proof. It is enough to prove that
Indeed, since
from the definition of the and operators and therefore
thus (3) gives
From this, we have
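One way to see why a closed form of this kind exists is sketched below. The derivation is not the proof given above; it assumes a k-bit accumulator, so that the sequence length is governed by n = 2^k - 1, and an addend equal to 2^m with 0 ≤ m < k. The symbols k, m, q, r, v, and L are chosen here for illustration.

```latex
\begin{aligned}
v &= q \cdot 2^{m} + r, \qquad 0 \le r < 2^{m},\quad 0 \le q < 2^{k-m},\\
L &= r \cdot 2^{k-m} + q,\\
L \cdot 2^{m} &= r \cdot 2^{k} + q \cdot 2^{m}
              = r\,(2^{k}-1) + \bigl(q \cdot 2^{m} + r\bigr)
              = r \cdot n + v \equiv v \pmod{n}.
\end{aligned}
```

Since 0 ≤ L ≤ n, accumulating 2^m for L cycles starting from 0 reaches v, where the all-ones vector plays the role of the second representation of zero in one's complement arithmetic. In bit terms, L is v rotated right by m positions.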
In Algorithm 1, the function implementing the formula (1) is presented.
Algorithm 1: -language routine implementing the presented scheme.
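The sketch below is not the routine of Algorithm 1. It is a constant-time location function that is consistent with the example sequences of Definition 1, under the assumption that the accumulator has k bits, starts from 0, and accumulates 2^m. The names `location`, `simulate`, `k`, and `m` are hypothetical.

```python
def location(v, k, m):
    """Position (counted from 0) of the k-bit vector v in the sequence of a
    one's-complement accumulator that starts at 0 and accumulates 2**m."""
    low = v & ((1 << m) - 1)   # v mod 2**m
    high = v >> m              # v div 2**m
    # Rotating v right by m bits undoes the multiplication by 2**m modulo 2**k - 1.
    return (low << (k - m)) | high


def simulate(k, m):
    """Reference sequence obtained by stepping the accumulator cycle by cycle."""
    n = (1 << k) - 1
    state, sequence = 0, [0]
    for _ in range(n):
        total = state + (1 << m)
        state = (total & n) + (total >> k)  # end-around carry
        sequence.append(state)
    return sequence


# Cross-check the closed form against the simulation for the (15, 4)-sequence.
assert all(location(v, 4, 2) == i for i, v in enumerate(simulate(4, 2)))
print(location(13, 4, 2))  # 7
```

The function uses a fixed number of shift and mask operations regardless of the accumulator width, which is the property that makes trying every candidate addend cheap.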
Example 1. Suppose we want to utilize the results of Theorem 1 in order to calculate the location of
in the sequence generated by a 5-stage accumulator with one’s
complement adder accumulating the constant value . Since , we have and Following (1), we
can see that is calculated by
It is easy to see that therefore, Thus, is expected at the th position in
the -sequence. From Table 5 of the appendix, we can see that, indeed,
this is the case.
Example 2. For , (i.e., ), and , the is
calculated as follows:
Indeed, it is easy to see that
3. Comparisons
In this section, we evaluate the proposed scheme in three directions.
First, we compare the proposed scheme to serial- and linear-search
algorithms with respect to the time required to calculate the locations of all
patterns. This first comparison is indicative of the speed of the
algorithm. Then, we will perform comparisons with the scheme proposed in [11]
for randomly generated test sets. The purpose of this comparison is to
investigate the effect of narrowing the search space from (in
[11, 16]) to (in the proposed scheme). Finally, we will compare the proposed scheme for test sets of real benchmarks, from the ISCAS’85 suite [19],
in order to investigate the applicability of the proposed scheme in real-world
circuits.
3.1. Comparisons with Serial- and Linear-Search Algorithms
We implemented the serial-search algorithm that examines the test
vectors until the target vector is found (this algorithm operates in O () time and is, therefore, representative of the exponential time algorithms) as well as the linear-search algorithm [11], that is representative of the time
algorithms. The
-language routine we utilized for the implementation of the serial-search
algorithm is shown in Algorithm 2. For the linear search, we utilized the
algorithm given in [11].
Algorithm 2: Serial-search algorithm.
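For comparison, a serial search can be sketched as follows. This is not the routine of Algorithm 2; it is a minimal illustration that steps a k-bit one's-complement accumulator from 0 until the target vector appears, so its running time grows with the length of the sequence. The function name and parameters are hypothetical.

```python
def serial_search_location(v, k, addend):
    """Count the cycles needed until v appears, stepping the accumulator one cycle at a time."""
    n = (1 << k) - 1
    state, cycles = 0, 0
    while state != v:
        if cycles == n:
            # The whole sequence has been examined without meeting v.
            raise ValueError("vector does not appear in the sequence")
        total = state + addend
        state = (total & n) + (total >> k)  # end-around carry
        cycles += 1
    return cycles


print(serial_search_location(13, 4, 4))  # 7
```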
We ran the programs in order to find the locations of all nonzero
vectors for a single seed () of the accumulator (the computer used was Pentium
III 933 MHz, with 256 MB of RAM)
for various values of . The execution times of the programs are presented in Table 2. For the calculation of the time required by the serial search for
values of we simulated the time required to
calculate the locations
of a number of vectors for and then projected these values to the total
number of vectors.
Table 2: Execution times of serial and linear searches versus the proposed O(1)-time algorithm.
From Table 2, it is evident that
as the number of bits increases, the time required by the exponential as well
as the linear-search algorithms may become impractical, whilst the time
required by the presented method remains very low.
3.2. Comparisons for Randomly Generated Patterns
In order to validate the
applicability of the presented algorithm and the quality of the applied test
sequence, we performed simulations to embed sets of test patterns into
accumulator-generated sequences. Our aim was to choose a “good” pair of numbers
(), where is the constant value accumulated and is the initial value of the
accumulator such that the whole test set is generated in as few cycles as
possible. We performed simulations utilizing random vectors generated by the random function
of for various values of the CUT inputs.
We experimented with all candidate
values of the input vector . For every value of we kept and
that is, the locations where the first and last vectors of the
test set were generated. We also calculated the distance as Every time a value of was found that generated the test
set within fewer cycles, that is, was assigned to and the new values of and were stored. For the calculation of
the initial seed of the accumulator, denoted by it is trivial to see that if
the accumulator is initialized to the starting value calculated by and operates for clock cycles, then the whole test set
is generated. Therefore, in the sequel, is not stated explicitly since it can
be directly calculated from these quantities.
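The selection procedure described above can be sketched as follows. The sketch assumes that the candidate addends are the powers of two 2^m with 0 ≤ m < k, that the distance is the difference between the largest and smallest locations, and that the seed is the test vector with the smallest location. These definitions, as well as the names `location` and `best_power_of_two_addend`, are assumptions made for illustration.

```python
def location(v, k, m):
    """Constant-time location of v for a k-bit accumulator adding 2**m (rotate right by m)."""
    low = v & ((1 << m) - 1)
    return (low << (k - m)) | (v >> m)


def best_power_of_two_addend(test_set, k):
    """Return (addend, seed, distance) for the candidate that spans the test set in fewest cycles."""
    best = None
    for m in range(k):
        locations = [location(v, k, m) for v in test_set]
        l_min, l_max = min(locations), max(locations)
        distance = l_max - l_min  # assumed definition: cycles needed after loading the seed
        if best is None or distance < best[2]:
            seed = test_set[locations.index(l_min)]
            best = (1 << m, seed, distance)
    return best


print(best_power_of_two_addend([5, 9, 13], 4))  # (4, 5, 2)
```

For this small test set, loading 5 and accumulating 4 for two cycles produces 5, 9, and 13 in consecutive cycles.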
For each value of (the CUT
inputs), we performed experiments for four values of (the number of test
vectors in the test set), namely, 10, 20, 50, and 100. For each value, we
performed three experiments. The average value of these three experiments is
presented in Table 3. In Table 3, the first column presents the number of the
inputs of the CUT. The second column presents the number of (randomly generated)
vectors of the test set. The third column presents the minimum distance given by the scheme in [11].
The fourth column presents the minimum distance for the proposed scheme. The
fifth column presents the % difference in of the proposed
scheme over the one in [11].
The value of is,
in general,
expected to be higher than the one in [11], since in the proposed work, we
have a smaller solution space ( instead of in [11]). In the sixth column, the
expected mean value of denoted by is presented; the mean value is
statistically equal to The last cell of the table
(rightmost cell of the last line) indicates the average increase of of the proposed scheme over the scheme of [11]. This cell indicates an
average increase of 0.39%, that is, negligible. Therefore, we can conclude that the
quality of the test sequence of the proposed scheme (measured by the length of
the sequence) is comparable to that of [11, 16]. Furthermore, by comparing the
values of the fourth and sixth columns, we can see that the values of given by the proposed scheme are smaller than the expected mean value of
Table 3: Test set embedding into randomly generated test sets (average of three experiments).
Next, we investigated the
relationship of with (the maximum value of
for each experiment). We present the value of the ratio for various values of and in Figure 3. From Figure 3, it can be seen that, as expected, the smaller the
value of the better the performance of the embedding task, since this gives
lower values for the quantity Furthermore, for
small numbers of patterns (i.e., 10 and 20), we can see a
decreasing trend as the number of inputs increases.
Figure 3: for various values of and
3.3. Comparisons for Benchmark Circuits
In order to illustrate the
applicability of the presented scheme in real-world circuits, we applied our
embedding algorithm to test sets extracted by COMPACTEST [20] for the ISCAS'85 circuits [19]; the fault coverage achieved by
the utilized test patterns exceeds 99% of the detectable single stuck-at
faults. The BIST community generally considers the ISCAS’85 circuits a good platform
for evaluating testing methodologies. Following the rationale of [10, 18], we have considered that the
test set, once given, is not altered. This approach is mostly favorable when
embedded modules such as intellectual property (IP) cores are utilized, whose
inner structure is not available to the test designer; in such cases, the test
designer utilizes test sets given by the designers of the modules and cannot
exploit techniques such as reverse-order simulation (see, e.g., [7]).
The scheme in [11] introduced a linear-time
algorithm to calculate the location of a test pattern in a sequence generated
by an accumulator that accumulates a constant pattern. Nikolos et al. [16] proposed the use of the
Diophantine equation in order to calculate the location of one of the test
patterns of the test set. For the remaining patterns, they utilized a formula
given in [17],
eliminating the need to resolve the Diophantine equation. Therefore, they achieved
a reduction of 16–29 times; that
is, the time is reduced to 3.45% (in
the best case) of the time reported in [11].
In Table 4, we present comparison
data for the three schemes. In the first three columns, the circuit name, the
number of its inputs, and the number of vectors extracted by COMPACTEST are
presented. The fault coverage of these patterns (for single stuck-at faults) is
over 99%. In the fourth column, the test length reported by [11, 16] is shown (the test lengths of
[11, 16] are similar, having a difference from 0% to 2.8%, i.e.,
negligible) and in the fifth column, the test length of the proposed scheme is
illustrated. In the sixth column, the % difference of the test length is
presented (the symbol is used since the test lengths of [11, 16] are not exactly equal); in the last row of the table, the calculation
times of the three schemes are presented. It is noted
that the complexity of the scheme presented in [11] is , where is the number of patterns in the
test set, the complexity of [16] is and the complexity of the proposed
scheme is
Table 4: Simulation results for the ISCAS’85 circuits.
The results reveal that for
almost all benchmarks, the proposed scheme results in test sequence lengths
that are comparable to those reported in [11, 16]; in one case (c1908), it even
outperforms previous schemes. This result is somewhat surprising, since the solution space
of [11],
which includes a much larger set of candidate values for the addend, should
give much better results with respect to the test length. However,
this apparent paradox may be explained as follows. In the experiments conducted in [11], since the times were
prohibitively large, simulation was stopped after a predetermined time limit, for
example, 20 minutes. Therefore, although better solutions might exist, they
were not found. The scheme proposed in this work, due to the extremely low time
required for the calculations, exhausts the solution space very quickly.
As for the calculation time, the
proposed scheme always requires less than 0.7 seconds; that is, the time is reduced
by a factor of 850
compared to the time of [11]
(i.e., to less than 0.12%) and by a factor of 30 compared
to the scheme in [16].
It should be further noted that
the proposed scheme could be utilized in combination with the schemes proposed
in [11, 16] as follows. At first, a very
quick (on the order of milliseconds) run can be performed using the scheme
proposed here in order to investigate whether an acceptable test length can be
achieved. If the test length given by the proposed scheme is acceptable, then
the procedure stops; if a shorter test sequence is required, then the schemes
in [11, 16]
can be utilized and left to run for a longer time in order to try to find such a shorter sequence. Furthermore, if the achieved test length is still not acceptable, then we may resort to alternative solutions such as reseeding and
multiple addends; such solutions are the subject of ongoing research.
4. Conclusions
A novel solution has been presented to the problem of embedding test vectors into
sequences generated by accumulators containing one’s complement adders. The
presented solution calculates the location of a pattern in the sequence
generated by a carry-rotate accumulator accumulating a constant pattern that is a power of two. The time complexity of the presented algorithm is of the order O(1). To the best of our knowledge,
this is the first solution to the problem of embedding patterns into
hardware-generated sequences of the order O(1)
presented in the literature. Comparisons with previous schemes based on
exponential- and linear-time algorithms
reveal that the proposed scheme results in comparable test lengths in
noticeably less time. For the examined ISCAS'85 benchmark circuits, the
time required to embed the test set is 30 times lower than that of [16] and 850
times lower than that of [11].
Appendix
See Table 5 for the (31, b)-sequences.