School of Engineering Systems, Queensland University of Technology (QUT), 2 George Street, Brisbane, QLD 4000, Australia
Sigma-delta modulated systems have a number of very appealing properties and are, therefore, heavily used in
analog to digital converters, amplifiers, and modulators. This paper presents new results which indicate that they
may also have significant potential for general purpose arithmetic processing. The paper introduces new arithmetic
processing structures for ternary (i.e., +1, 0, or −1) sigma-delta modulated signals. Simulations show that these new
structures can be implemented very efficiently and have relatively good accuracy.
1. Introduction
Oversampled sigma-delta modulation (SDM)
signal representations have several key advantages over traditional Nyquist
rate pulse code modulated formats. When signals are put into SDM format, they
typically have very short word-lengths (e.g., binary or ternary). This very
simple representation creates the potential for reduced hardware complexity,
simpler signal bus routing, and resilience against electronic component inaccuracies
[1, 2]. There is yet another
advantage to the use of SDM (or bitstream) systems. If signals are maintained
in bitstream form all the way through the processing chain, one does not need format conversions (or the associated
interpolation and decimation filters). This is so because within many systems,
the “front end” analog to digital converters and “back end”
digital to analog converters use SDM bitstream format, but the intermediate
processing stages are typically implemented in multibit format. If the
intermediate processing stages of a system can be operated in binary or ternary
format as well, then the format conversions are unnecessary.
Several works have proposed digital bitstream
arithmetic processing using pulse width modulation, particularly in digital
neural networks, for example, [3, 4]. O'Leary and Maloberti [5] also presented a binary bitstream adder in which the
sum is stored and fed back to the adder to reduce the truncation error (i.e.,
the carries of the full-adder). This same approach has been adopted by
[6] to implement a
ternary bitstream adder using 2's complement format. However, the possibility
of compensating the ignored carries is confined to the immediate next sample.
The compensation fails when the next sample addition generates a carry as well.
Another well-cited work on bitstream arithmetic is that conducted in [7], where various binary
arithmetic circuits were proposed. However, most of the arithmetic circuits
proposed in [7] suffer
from two main drawbacks. First, many of these structures do not operate fully
in the short word-length domain; they partially operate in the multibit domain.
In particular, many of these structures use integrators which consist of a
recursive multibit adder followed by an SDM requantizer. The second major
drawback in the arithmetic units within [7] was the limited accuracy of the structures.
This paper attempts to address both of the above
listed limitations. Ternary quantized SDM processing is the assumed format throughout
the paper. Ternary format (i.e., +1, 0, −1) is used rather than binary because
the extra zero state reduces quantisation error and so enables greater
accuracy. At the same time, the zero state often corresponds to a no hardware
operation, and so ternary formats often require minimal extra hardware
components [8].
In this paper, ternary arithmetic processing
modules are proposed, and an attempt is made to provide a measure of the
accuracy of these systems. This is done by determining the resolution or number
of bits in a multibit counterpart with similar accuracy. Both DC and slowly
varying input signals are considered in this paper.
2. Basic 1-Trit Ternary Adder/Subtractor
The adder is a critically important component of an
arithmetic processor since it is a fundamental building block upon which many
other processing operations are built. Therefore, it is highly desirable to
create adders with minimal complexity. Consider two ternary input
bitstreams. It is assumed that the inputs to the proposed adder are obtained
with ternary sigma-delta modulators (TSDMs). A
sample first-order TSDM is shown in Figure 1. These modulators may well be
incorporated into analog to digital converter hardware. Each input sample takes one of
three levels within the allowable range of the input, and the desired output of the
proposed ternary adder is likewise a ternary bitstream that represents the sum of the
two inputs. The task of designing the adder involves a tradeoff
between implementation complexity and accuracy. One can implement a very simple
adder by truncating any result whose magnitude exceeds the largest ternary level,
at the cost of a relatively large quantization error. This simple adder is defined by (2).
Figure 1: The structure of a first-order
ternary sigma-delta modulator.
The above “basic” ternary adder can be
implemented using a traditional ternary half-adder. It is perfectly accurate
except when the two input signal values are identical and nonzero, in which case a carry is
generated (and neglected).
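The behaviour described above can be sketched in a few lines of Python. This is only an illustrative model, not the circuit of Figure 1: the quantizer decision level of 0.5, the function names `ternary_sdm` and `basic_add`, and the reading of "truncation" as saturation to ±1 are all assumptions made for the example.

```python
def ternary_sdm(u, threshold=0.5):
    """First-order ternary sigma-delta modulator (illustrative model).

    `threshold` is an assumed quantizer decision level; the text does not
    state one.
    """
    v = 0.0      # integrator state
    y_prev = 0   # previous output trit, fed back to the input
    out = []
    for sample in u:
        v += sample - y_prev          # integrate the input minus the feedback
        if v > threshold:
            y = 1
        elif v < -threshold:
            y = -1
        else:
            y = 0
        out.append(y)
        y_prev = y
    return out


def basic_add(x, y):
    """Trit-wise addition in which sums of +2 or -2 are truncated to +1 or -1."""
    return [max(-1, min(1, a + b)) for a, b in zip(x, y)]


# Two DC inputs, modulated and then added in the ternary domain.
x = ternary_sdm([0.3] * 1000)
y = ternary_sdm([0.4] * 1000)
z = basic_add(x, y)
mean_x = sum(x) / len(x)   # tracks the DC input of 0.3
mean_z = sum(z) / len(z)   # falls short of 0.7 whenever carries are dropped
```

Averaging a bitstream recovers the value it encodes, so comparing `mean_z` with the sum of the two input means gives a quick view of the carries that the basic adder discards.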
The quantization noise at the output of the basic
ternary adder has two components: (i) the quantization noise inherent in the
two input signals and (ii) the quantization noise due to the truncation
operation which occurs when the two inputs have identical nonzero values. These two
components are quantified below.
(i) If there was white, uniformly distributed quantization noise in each of the inputs, then the power
spectral density (PSD) of that noise would be flat, with a level determined by the sampling
frequency and the quantization step. The quantization noise will not be white, however, because
of the noise shaping inherent in SDM signal formats [9, 10]. If TSDMs of a given order
have been used to create the two inputs, then the PSD of the quantization
noise in both inputs is this flat level weighted by the high-pass noise shaping of that order.
It is worth noting that we are
dealing here with the whole oversampled spectrum since the
operations are achieved within the ternary domain and there is no decimation
process.
(ii) Now, it is necessary to determine an expression
for the quantization noise corresponding to the truncation introduced by the
ternary adder. The truncation error signal will only be nonzero when the two input
samples are equal and nonzero. This condition would be expected to occur on average
about two samples out of every nine, assuming that the probability of obtaining
a value of +1, −1, or 0 is equal
to 1/3. The total average power of the truncation error is, therefore, set by this
rate of occurrence. The spectral shape of the truncation error depends on the
correlation between the two inputs. If the inputs are perfectly
correlated, then the spectrum has a delta function at DC. If the signals are
uncorrelated, then the spectrum tends to be white.
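The "two samples out of every nine" figure can be checked by enumerating all pairs of trits. The sketch below assumes independent, equiprobable trits and, as a further assumption, that the truncated output saturates at ±1 so that each dropped carry has unit magnitude; the names are chosen for illustration only.

```python
from fractions import Fraction
from itertools import product

TRITS = (-1, 0, 1)

# All nine equally likely input pairs.
pairs = list(product(TRITS, repeat=2))

# Pairs whose sum falls outside the ternary range and must be truncated.
overflow = [(a, b) for a, b in pairs if abs(a + b) > 1]
p_overflow = Fraction(len(overflow), len(pairs))

# Mean-square truncation error, assuming the output saturates at +/-1.
error_power = sum(
    Fraction((a + b - max(-1, min(1, a + b))) ** 2) for a, b in pairs
) / len(pairs)
```

Under these assumptions both `p_overflow` and `error_power` come out as 2/9, the latter in units of the squared trit amplitude.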
The expected PSD due to the truncation process
is obtained by applying the Fourier transform operator and the
expectation operator to the truncation error signal. The total
power spectral density at the output of the adder is then the sum of the
quantization noise corresponding to the two input signals and the truncation noise.
The subtraction operation can be easily accomplished
by negating one of the ternary bitstreams and using the same proposed adder.
3. Improved Adder
Knowing the
source of errors in the ternary adder specified in (2), one may alleviate this
error using a simple technique. If the lost carries are compensated for
whenever possible in the next samples, then in the average sense, the adder
would have improved accuracy. This can be done by introducing a ternary
flip-flop in the adder
circuit to store any carry overflows and propagate this carry information to
subsequent samples. Figure 2 shows a block diagram of the improved ternary
adder version. The rationale behind the new adder circuit is that any carry
arising from the addition of the current two input samples should be stored in
a flip-flop and added to the next output sample. If there is any carry
generated from doing this addition to the next output sample, then that
resulting carry should also be fed back and stored. The operation of the
circuit in Figure 2 is described mathematically by (7).
The improved adder can be
implemented by using three ternary half-adder (THA) modules and one delay
element. Each THA operates according to the truth-table shown in Table 1. The
ternary adder (TA) defined in (7) can easily be implemented with either
conventional digital gates (e.g., [11]) or with multiple-valued logic (e.g., [12, 13]).
Table 1: Truth-table of the proposed THA.
Figure 2: An improved version of the proposed
adder (TA) constructed using three ternary half-adders
(THAs).
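One plausible reading of the three-THA structure is sketched below in Python. It is a hypothetical model rather than a transcription of (7) or Table 1: the half-adder `tha` is assumed to return a saturated sum plus a unit-weight carry, and the wiring (inputs, then stored carry, then carry merge) is inferred from the description of Figure 2.

```python
def tha(a, b):
    """Assumed ternary half-adder: saturated sum and unit-weight carry."""
    total = a + b
    s = max(-1, min(1, total))
    return s, total - s


def improved_add(x, y):
    """Carry-feedback ternary adder built from three half-adders and one delay."""
    stored = 0                       # ternary flip-flop holding a pending carry
    out = []
    for a, b in zip(x, y):
        s1, c1 = tha(a, b)           # THA 1: add the two input trits
        z, c2 = tha(s1, stored)      # THA 2: add the carry stored last sample
        stored, _lost = tha(c1, c2)  # THA 3: merge carries; overflow is lost
        out.append(z)
    return out


# A carry from the first sample is emitted one sample later.
compensated = improved_add([1, 0, 0], [1, 0, 0])             # [1, 1, 0]
# Two consecutive overflows of the same sign lose one carry.
partly_lost = improved_add([-1, -1, 0, 0], [-1, -1, 0, 0])   # [-1, -1, -1, 0]
```

In the first example the output sums to 2, matching the true sum. In the second it sums to −3 against a true sum of −4, which illustrates a carry that cannot be compensated.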
According to (7), the adder output can be
re-expressed in terms of the two inputs and the stored carry.
By recombining lines 3 and 4 of (7), one obtains an
expression for the output that contains an error term. It should be noted that the error term is only
nonzero when both inputs are equal and
nonzero (or equivalently, when the magnitude of their sum is 2).
Implicitly, this leads to the fact that an error will only occur when this happens at
two consecutive samples with the same sign (i.e., with two
consecutive sums of +2 (or of −2)).
Note that if the probabilities of +1, 0, and −1 are assumed
equal for a trit, then the positive and negative error events are equally probable, and the
total probability of a nonzero error is the sum of the two.
Now, assuming ergodicity, the average value of the error can be
calculated as a time average, and its autocorrelation function
is obtained in the same way (12).
The precise form for the
autocorrelation function will depend on the nature of the signals and, in
particular, the correlation between them. By taking the Fourier transform of
(12), one can determine the power spectral density of the error. Note that
the discrete Fourier transform in this case uses cosine basis functions rather than complex
exponential ones because the transformed quantity is an
autocorrelation function and is, therefore, even.
An alternative approach is to recombine
lines 3, 4, and 5 of (7), which yields an expression for the output that includes the
truncation error due to
“uncompensatable” carries.
Rearranging this expression gives (15). Taking the z-transform of (15),
the output can be expressed as (16). Examination of (16) reveals that the output comprises a
true component (the sum of the two inputs) and an
error term. The
error term is, in turn, comprised of two components, the first corresponding to
carries which are eventually “compensated” and
the second due to uncompensated or “lost” carries. The
structure of the adder causes the compensated carry error component to be
high-pass filtered, as per the $1 - z^{-1}$ term. This
high-pass filtering causes significant attenuation of the error term and
accounts for the improvement provided by the adder. The uncompensated carries
error term does not get attenuated, but fortunately, it is relatively low in
power because it is nonzero only with a low average probability.
Because of the high-pass filtering of the error term
which is inherent in this adder, a significant reduction in the average
quantization error can be achieved. This is illustrated in simulations in
Section 5.
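The split of the error into a first-differenced (high-pass) part and a lost-carry part can be checked numerically. The sketch below uses an assumed model of the carry-feedback adder (saturating half-adders with unit-weight carries); under that model the output satisfies z[n] = x[n] + y[n] − (r[n] − r[n−1]) − l[n], where r is the stored carry and l the lost carry. The symbols r and l and the function name are illustrative, not the paper's notation.

```python
import random


def clip(v):
    return max(-1, min(1, v))


def improved_add_trace(x, y):
    """Assumed carry-feedback adder that also records stored and lost carries."""
    r = 0
    z, stored, lost = [], [], []
    for a, b in zip(x, y):
        s1 = clip(a + b)
        c1 = a + b - s1               # carry from adding the inputs
        u = s1 + r
        out = clip(u)
        c2 = u - out                  # carry from adding the stored carry
        r_next = clip(c1 + c2)
        lost.append(c1 + c2 - r_next) # carry that cannot be stored
        r = r_next
        z.append(out)
        stored.append(r)
    return z, stored, lost


rng = random.Random(0)
n = 10000
x = [rng.choice((-1, 0, 1)) for _ in range(n)]
y = [rng.choice((-1, 0, 1)) for _ in range(n)]
z, r, lost = improved_add_trace(x, y)

# Sample-by-sample check of z[n] = x[n] + y[n] - (r[n] - r[n-1]) - l[n].
r_prev = [0] + r[:-1]
identity_holds = all(
    zn == xn + yn - (rn - rp) - ln
    for zn, xn, yn, rn, rp, ln in zip(z, x, y, r, r_prev, lost)
)

# Accumulated output error: only the final stored carry and the lost carries remain.
total_error = sum(x) + sum(y) - sum(z)
```

Because the stored carry enters only through its first difference, its contribution to the accumulated error telescopes to the final stored value, while each lost carry contributes in full.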
4. Format Conversion via an SDM with Ternary Integrator
As discussed
earlier, ternary arithmetic is significantly more accurate than binary
arithmetic, at least for pulse width modulation type signal formats. To
implement practical ternary arithmetic, it may sometimes be necessary to
convert incoming signals from binary format to ternary format. It is clearly
desirable that this be done efficiently. This section proposes an
efficient new structure for binary to ternary format conversion, with this new
structure involving an SDM whose internal integrator is formed from the adder
proposed in the previous section.
In the short word-length literature, digital
integrators have generally been constructed from a multibit subsystem (such as
up-down counters) followed by a 1-bit noise shaper to restore the format to the
short word-length domain [7]. This approach is computationally intensive. By using
the ternary adder (TA) from the previous section, a novel integrator is
proposed that operates entirely in the ternary domain (see Figure 3, inside
the box).
Figure 3: Realization of a first-order SDM
using the proposed adder (TA) and the integrator.
Simulation results presented in Section 5 show that
the new 1-trit ternary integrator outperforms the traditional counterpart in
[7].
Having devised a digital integrator, the next step is
to construct an SDM-based format conversion
structure. This structure uses both the proposed adder and new integrator and
is shown in Figure 3.
Consider first the leftmost TA in Figure 3. Equation
(16) provides a general expression for the output of an arbitrary TA, and using
this result, one can obtain the output for the leftmost TA, which includes the errors
due to the compensated and uncompensated carries in
the leftmost TA. (Note that because of the synchronous clocking which is used,
there is effectively a single sample delay in the feedback path of the SDM in
Figure 3. This effective delay is not explicitly shown in Figure 3 by
convention, since SDM representations normally do not explicitly show a delay.)
Now consider the rightmost TA. Again, using the result
in (16), one can obtain the z-transform of its output, which includes the error
due to compensated carries and the error
due to uncompensated carries in the rightmost TA. Combining these two
expressions yields the overall output of the format conversion structure.
The performance of this new
format conversion structure is evaluated in the next section.
5. Simulations
As we are dealing with arithmetic processing, there is
a need to determine the resolution of the new ternary bitstream structures. To
make a reasonable comparison with the multibit domain, the output stream has to
be windowed and averaged to determine an equivalent multibit value. The length
of the time window should be greater than or equal to the oversampling ratio
(OSR) to ensure a fair comparison. The SNR of this averaged output ternary
bitstream is then calculated, and an equivalent resolution (equivalent number
of bits) can be obtained.
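The windowing and equivalent-resolution steps described above might be sketched as follows. The conversion from mean squared error to bits assumes the usual uniform-quantizer model, in which an N-bit quantizer spanning a given dynamic range has a step of range/2^N and an MSE of step²/12; this model and the function names are assumptions made for the example.

```python
import math


def moving_average(stream, window):
    """Mean of every full-length window of `stream` (valid positions only)."""
    total = sum(stream[:window])
    out = [total / window]
    for n in range(window, len(stream)):
        total += stream[n] - stream[n - window]   # slide the window by one sample
        out.append(total / window)
    return out


def mean_squared_error(estimate, reference):
    return sum((e - r) ** 2 for e, r in zip(estimate, reference)) / len(estimate)


def equivalent_bits(mse, dynamic_range):
    """Bits of a uniform quantizer with the same MSE over the same range."""
    step = math.sqrt(12.0 * mse)      # invert MSE = step**2 / 12
    return math.log2(dynamic_range / step)


# An ideal 8-bit quantizer over a unit range maps back to 8 bits.
bits = equivalent_bits((1.0 / 256) ** 2 / 12.0, 1.0)
```

In use, the ternary adder output would be passed through `moving_average` with a window at least as long as the oversampling ratio, compared against the true sum with `mean_squared_error`, and the result converted with `equivalent_bits`.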
5.1. Simulation Results for the Proposed Adders
Four different
input signals were considered in this work. The first two were sinusoids
corrupted by additive white Gaussian noise with an SNR of about 25 dB. These
two noisy sinusoids were mapped to the symmetrical ternary
domain using ternary quantizer sigma-delta modulators. Figure 4 shows the
spectra of the two resulting bitstreams and of their
summation obtained with the adder specified in (7). Additional
simulations have revealed that the adder is quite robust to the presence of DC
components. That is, spurious tones do
not appear when there is a DC component present.
Figure 4: The spectra of the input ternary
streams and their ternary summation.
The final two types of input signals considered were
the DC signal and the ramp signal. The former is considered to be one of the
most challenging signals for SDMs to deal with, since it can easily produce limit
cycles [14, 15]. Figure 5 shows plots of the sum of the two inputs obtained with
(i) the basic adder defined in (2), (ii) the improved adder defined in (7), and
(iii) a 32 bit precision multibit adder. As seen in Figure 5, the curves
corresponding to the 32 bit precision adder and the improved ternary adder are
almost indistinguishable. The average output signal error power was calculated
by subtracting the true signal value from a multibit reconstruction of the
ternary signal representation. This reconstruction was achieved by filtering
the ternary signal with a
moving average filter. The mean squared error of the improved adder (method 2)
was lower than that of the basic adder (method 1), which corresponds to a higher equivalent
multibit resolution. An improvement has,
therefore, been achieved by using the improved adder rather than the basic
adder. This result is consistent with the expectation expressed in Section 3.
Figure 5: A comparison between the proposed
ternary adders with DC ramp input, (red) infinite-precision, (blue) basic (method
1), and (black) improved (method 2).
To compare the resolutions achieved with the proposed
adder against those of the adding techniques presented in [7] (1-bit adder) and [6] (ternary adder), the same DC and
ramp inputs as above were used. The adder output was averaged over a window of samples, and
the mean squared error (MSE) was calculated and compared with an equivalent
N-bit quantizer that produced the same value of MSE (for the same dynamic range
(–0.5)). Table 2 summarizes the outcomes. The improved
adder proposed in this paper clearly outperforms the existing adders.
Table 2: Comparison among bitstream adders.
5.2. The Proposed Binary to Ternary Format Conversion Structure
It should be noted that as long as we are dealing with
short word-length systems (i.e., with no need to go back and forth between
decimation/interpolation stages), one must be concerned about the whole range
of the frequency spectrum.
For the simulations in this section, a 16-bit PCM
signal was
modulated using an SDM to produce a binary bitstream. This binary bitstream was used as
input to the newly proposed SDM format conversion structure (shown in
Figure 3). The power spectrum of the output is shown in Figure 6, together with
the output obtained if the same format
conversion structure and inputs are used, but with a traditional integrator (of
the form proposed in [7]) instead of the integrator proposed in Section 4.
Figure 6: Output of the proposed format
conversion structure in Figure
3 and output of its traditional counterpart.
The oversampling ratio is always
.
For the traditional and proposed format conversion
structures, the ensemble average (1000 runs) of the in-band SNR
was higher for the new
integrator than for the
traditional one, as was the whole-band SNR. That is, improvements in both the in-band
and the whole-band SNR were
obtained by using the new integrator within the format converter structure.
This improvement is a promising finding as integrators are common structures in
many digital electronic circuits. Moreover, the proposed format converter
structure not only outperforms its traditional counterpart but also permits
more efficient hardware implementation.
5.3. Realization of Exponential/Trigonometric Functions
This section
illustrates the use of the improved adder as a building block for realising
practically important functions such as exponential and trigonometric
functions. To create these functions, the improved adder was first used to
create a multiplier according to the model in [7]. That is, the multiplier in
[7] was realized by
simply replacing the original adder components with the new adder introduced in
Section 3. Once the multiplier was constructed, the exponential and
trigonometric functions were created by using two non-DC terms of
their series expansions.
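For reference, the accuracy ceiling imposed by the series truncation itself can be examined in ordinary floating point, independently of any ternary implementation. The sketch below assumes that "two non-DC terms" means the two lowest-order non-constant terms of each Maclaurin series; the function names and the test grid are illustrative choices.

```python
import math


def exp_two_terms(x):
    """1 + x + x^2/2: the constant term plus two non-constant terms."""
    return 1.0 + x + x * x / 2.0


def sin_two_terms(x):
    """x - x^3/6: the sine series has no constant term."""
    return x - x ** 3 / 6.0


def cos_two_terms(x):
    """1 - x^2/2 + x^4/24: the second non-constant term is fourth order."""
    return 1.0 - x * x / 2.0 + x ** 4 / 24.0


def truncation_mse(approx, exact, points):
    """Mean squared error of a truncated series against the exact function."""
    return sum((approx(p) - exact(p)) ** 2 for p in points) / len(points)


grid = [k / 100.0 for k in range(-50, 51)]   # assumed input range of -0.5 to 0.5
mse_exp = truncation_mse(exp_two_terms, math.exp, grid)
mse_sin = truncation_mse(sin_two_terms, math.sin, grid)
mse_cos = truncation_mse(cos_two_terms, math.cos, grid)
```

Any error measured for the ternary realization adds to this truncation error, so comparing against the truncated series (rather than the exact function) isolates the contribution of the bitstream arithmetic.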
Figure 7(a) shows the averaged (with a 128 sample
moving average filter) ternary realization of the exponential function for a ramp
input
compared with its 2-term infinite-precision counterpart. The resulting mean squared error is
equivalent to the quantization noise of a 5.6 bit system.
Figure 7: Realization of series expansion
ternary bitstream (dotted) functions compared with their 32 bit precision
counterparts (red): (a) Exponential function; (b) sine and cosine functions.
Figure 7(b) shows averaged
ternary sine and cosine functions which were created using the same approach as
was used for the exponential function. These functions are drawn versus the
index of the input
signal so as to
provide a clear visual assessment of the outcome. The MSE and the equivalent resolution
in bits were evaluated separately for the sine and for the cosine, with the cosine being
the less accurate of the two. The reduction in accuracy in the cosine function is
attributed to the fourth-order term implemented in its series expansion.
6. Conclusions
Novel 1-trit ternary arithmetic structures have been
proposed in this paper for adders, integrators, and format converters. The
internal processing and the output for these new structures are all kept
entirely in the ternary domain. The operation of the proposed adder is assessed
in terms of its accuracy (expressed as the equivalent number of bits in
a corresponding multibit system). Simulations show that the proposed structures are
efficient and, therefore, have the potential to realize
multiplication, division, and exponential/trigonometric functions.