Abstract

We present a compact and low-power rank-order searching (ROS) circuit that can be used for building associative memories and rank-order filters (ROFs) by employing time-domain computation and floating-gate MOS techniques. The architecture inherits the accuracy and programmability of digital implementations as well as the compactness and low power consumption of analog ones. Our first priority is the location identification function; the filtering function can be implemented once the location of the candidate has been identified. The prototype circuit was designed and fabricated in a 0.18 μm CMOS technology. It consumes only 132.3 μW for an eight-input demonstration case.

1. Introduction

Searching is an important function in recognition systems. In conventional recognition systems, only the nearest matched template among a vast number of templates can be retrieved. However, in some applications, such as in -neighbor selectors or internet routers, finding an th nearest matched data is necessary. Although this kind of operation can be carried out by a sorting processor, sorting is computationally expensive and time consuming, making it unsuitable for building low-power systems.

In image and speech processing, data compression, communication, neural networks, and so forth, nonlinear filters find many applications, such as attenuating impulsive noise while preserving sudden changes in the signal. Among the many types of nonlinear filters, MIN, MAX, and MEDIAN are the most popular. These filters can be implemented by using rank-order filters (ROFs) with appropriate rank-order settings. Several ROFs have been implemented in fully digital [1, 2], mixed-signal [3], and analog approaches [4–6]. When circuit area must be saved so that the structure can serve as a basic block of a parallel processing array, analog implementations of ROFs are preferred. Although they achieve low power consumption and small chip real estate, their main drawback is limited accuracy, caused for example by mismatches between transconductance amplifiers [4].

In this paper, we develop a compact and low-power rank-order searching (ROS) circuit by employing a time-domain computation technique, in which a time interval or delay time is used to represent a value. The circuitry in this study is the core that can be used for building associative memories and ROFs. Because it employs the time-domain technique, the architecture not only achieves the small chip real estate and low power consumption of analog implementations but also improves on their accuracy. In the design, identifying the location of the candidate is the first priority. Reading out its content, that is, the filtering function, can be implemented easily once the location has been found.

In the rest of the paper, the system organization and the major circuits used in the prototype chip design are described in Sections 2 and 3. Section 4 presents the experimental results from the test chips fabricated in a 0.18 μm CMOS technology. The conclusions are given in Section 5.

2. System Organization

A ROF employing pulse-width modulation (PWM) signals was proposed in [7] in a fully digital architecture. That architecture works well for the filtering function, but it suffers from narrow-pulse signals (glitches) that may occur at the outputs of the XOR gates in the address encoder circuit, leading to errors in the location identification function. The problem becomes more pronounced as the number of inputs grows. To overcome this problem and to obtain a compact architecture that scales to the large number of inputs required in many applications, an analog rank-order searching engine employing time-domain computation techniques is proposed in Figure 1. It is the basic core for building ROFs and associative memories, and identification is the first objective of this architecture. It consists of analog-to-delay-time converters (ATCs), a rank-order setting circuit, a comparator based on floating-gate MOS technology, a binary encoder, and a binary counter. The ATCs convert the analog input values to delay-time signals, which the rank-order searching circuit then uses as input data. The final output is a binary code representing the location of the input with the requested rank, that is, the input whose value has that rank counting from the smallest in the analog voltage domain or, equivalently, whose signal has that rank among the rising edges in the time domain. In addition, to provide a smooth interface to subsequent digital processing, a binary counter can be added to the system; the value of the selected input is then available in digital format at the output of this counter. The filtering/searching operation is carried out within a period called the operation slot, which is determined by the SLOT signal. The value of SLOT is selected mainly according to the desired resolution of computation. For example, SLOT is set to 256 (2⁸) clock cycles for 8-bit resolution. From now on, a rank is represented by a binary number RANK equal to the rank minus one. For example, a rank of four is represented by the rank value 011₂.
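The behavior of the searching engine can be sketched at a purely functional level. The following Python model is an illustration only, not the circuit itself: the function name, the representation of delays as integer clock counts, and the requirement that all delays be distinct are assumptions made here. The encoding of the rank setting as the rank minus one follows the measurement example in Section 4.2, where the rank value 100₂ selects the 5th smallest signal.

```python
SLOT = 256  # clock cycles per operation slot (8-bit resolution, as in the text)


def rank_order_search(delays, rank):
    """Return (location, clock) of the rank-th earliest rising input.

    delays: distinct integer clock counts at which each input rises (assumed).
    rank:   1-based rank; the hardware setting RANK is rank - 1.
    """
    if not 1 <= rank <= len(delays):
        raise ValueError("rank must be between 1 and the number of inputs")
    if len(set(delays)) != len(delays):
        raise ValueError("this sketch assumes distinct delays")
    rank_code = rank - 1
    # Sweep the operation slot clock by clock, counting inputs that have risen.
    for clock in range(SLOT):
        risen = sum(1 for d in delays if d <= clock)
        if risen > rank_code:
            # The input that rose at this clock is the candidate.
            return delays.index(clock), clock
    raise ValueError("fewer than `rank` inputs rose inside the operation slot")


if __name__ == "__main__":
    location, value = rank_order_search([40, 200, 10, 120, 75], rank=4)
    print(location, value)
```

The returned location plays the role of the address code, and the returned clock count plays the role of the optional binary counter output.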

3. Circuit Implementations

3.1. Analog-to-Delay-Time Converter (ATC)

Input voltages are converted to delay-time signals by the ATCs, as shown in Figure 1. The input analog voltages are applied to the negative nodes of voltage comparators, while a common ramp voltage signal is applied to the positive nodes. Each comparator compares its input analog voltage with the ramp voltage. The comparator output remains at the “0” level until the ramp exceeds the input voltage; at that moment, the output switches to the “1” level. In this manner, an analog voltage is converted to a delay-time signal, and a smaller analog voltage corresponds to a shorter delay time.
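The conversion can be illustrated with an idealized model. In the sketch below, the linear ramp starting at 0 V, the 1.8 V full-scale value, and the function name are assumptions chosen for illustration; the comparator is treated as ideal with zero response time.

```python
def atc_delay(v_in, v_full_scale=1.8, slot=256):
    """Clock count at which an ideal linear ramp first exceeds v_in.

    v_full_scale: ramp voltage reached at the end of the slot (assumed value).
    slot:         number of clock cycles in one operation slot.
    Returns None if the ramp never exceeds v_in within the slot.
    """
    step = v_full_scale / slot  # ramp increment per clock cycle
    for clock in range(slot):
        if clock * step > v_in:  # comparator output switches from "0" to "1"
            return clock
    return None


if __name__ == "__main__":
    # A smaller input voltage yields a shorter delay.
    print(atc_delay(0.3), atc_delay(0.5))
```

Changing the ramp slope (here, `v_full_scale / slot`) changes the spacing between the delay-time signals produced by nearby voltages.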

3.2. Floating-Gate-MOS-Based Comparator and Rank-Order Setting Circuit

In order to reduce the circuit area compared with [7], the carry save adder (CSA) and the subtractor in [7] are replaced by a simple floating-gate-MOS-based comparator and a rank-order setting circuit. Simplified schematic of the floating-gate-MOS-based comparator utilized is shown in Figure 2. The voltages at the floating gates are determined as linear weighted summations of multiple input signals and calculated by [8]: where ,    is either “” or “.”

are the input voltages getting one of two levels: or ; are capacitive coupling coefficients between the floating gate and each of the input gates; is the capacitive coupling coefficient between the floating gate and the substrate. and are necessary to guarantee that is smaller than at each given rank and to fit the range of and inside the input range of the comparator. Smallest MIM capacitances of 16 fF of the fabrication process were chosen to save chip area.
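For reference, a linear weighted summation of this kind is commonly written as a capacitive divider. The form below is a sketch under stated assumptions rather than a transcription of the relation in [8]: the symbol names are chosen here for illustration, with $V_{FG}$ the floating-gate potential, $V_i$ the input voltages, $C_i$ the coupling capacitances to the input gates, and $C_0$ the capacitance to the substrate. It also assumes a grounded substrate and no initial charge on the floating gate.

```latex
V_{FG} \;=\; \frac{\sum_{i=1}^{n} C_i V_i}{C_0 + \sum_{i=1}^{n} C_i}
```

With equal unit capacitors $C_i = C$ and $m$ of the $n$ inputs at the high level $V_H$, this reduces to $V_{FG} = m\,C\,V_H / (C_0 + nC)$, which grows linearly with $m$.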

For a given rank-order value, the rank-order setting circuit connects some of its capacitors to the high voltage level and the others to ground, thereby setting a reference floating-gate voltage proportional to the rank value. The floating-gate voltage on the input side rises in proportion to the number of “1” inputs. When it exceeds the reference, the comparator output COUT becomes “high,” as shown in Figure 2, after a small delay due to the response of the comparator.
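The comparison described above can be sketched numerically. Everything beyond the 16 fF unit capacitor is an assumption made for illustration: the 1.8 V high level, the substrate capacitance value, the function names, and in particular the half-unit offset capacitor, which is one hypothetical way to place the reference midway between two adjacent input levels. The actual offset arrangement of the chip is not reproduced here.

```python
C_UNIT = 16e-15  # unit MIM capacitor, 16 fF (from the text)
C_SUB = 8e-15    # floating-gate-to-substrate capacitance (hypothetical value)
V_HIGH = 1.8     # high input level in volts (assumed)


def floating_gate_voltage(caps, volts, c_sub=C_SUB):
    """Capacitive-divider voltage of a floating gate with no initial charge."""
    return sum(c * v for c, v in zip(caps, volts)) / (c_sub + sum(caps))


def comparator_out(inputs, rank):
    """True when at least `rank` of the binary inputs are at the high level."""
    n = len(inputs)
    rank_code = rank - 1  # RANK setting
    caps = [C_UNIT] * n + [C_UNIT / 2]  # n unit caps plus a half-unit offset cap
    # Input side: one capacitor per delay-time signal, offset cap grounded.
    v_in = floating_gate_voltage(caps, [V_HIGH * b for b in inputs] + [0.0])
    # Reference side: rank_code caps tied high, the rest grounded, offset high.
    ref_levels = [V_HIGH] * rank_code + [0.0] * (n - rank_code) + [V_HIGH]
    v_ref = floating_gate_voltage(caps, ref_levels)
    return v_in > v_ref


if __name__ == "__main__":
    print(comparator_out([1, 1, 1, 0, 0], rank=4))  # three inputs risen
    print(comparator_out([1, 1, 1, 1, 0], rank=4))  # four inputs risen
```

In this sketch the reference sits half a unit step above the level produced by `rank - 1` high inputs, so the output switches exactly when the `rank`-th input rises.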

Figure 3 demonstrates the case of a 5-input system with the required rank order of four. As can be seen in this example, the filtered input, that is, the candidate, is and a binary code of will be generated by the address encoder.

3.3. Address Encoding

Figure 4 illustrates the schematic diagram of the address encoding circuit. It identifies the location of the filtered input and represents this location as a binary code. It consists of simple XOR gates, narrow-pulse filters, domino buffers, and a binary encoder. Location identification is carried out by taking the XOR of COUT with each delay-time signal and then searching for the “0” that remains at the outputs of the domino buffers. Ideally, if the response time of the comparator were zero, the XOR gate of the candidate signal would produce no pulse at all. In practice, because of the response time, a narrow pulse (i.e., a glitch) occurs at the output of that XOR gate. This pulse is narrower than those occurring at the other XOR outputs, so narrow-pulse filters are placed at the outputs of the XOR gates to remove it. The domino buffers following the filters detect “0”-to-“1” transitions. As a result, only one output of these buffers remains at the “0” level at the end of the operation slot, and the others become “1.” The binary priority encoder that follows senses the “0” input and generates the corresponding address. As can be seen in the example of Figure 3, the delay-time signal of the fourth smallest data is nearly identical to the signal COUT, and the glitch at the corresponding XOR output is removed by the narrow-pulse filter.
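The encoding chain can be sketched as a timing model. In the following illustration, each XOR output is treated as a single pulse whose width is the interval between the rising edge of its delay-time signal and the rising edge of COUT. The 3.8 ns response time is the measured value reported in Section 4.2; the 5 ns filter threshold and the function name are hypothetical.

```python
def address_encode(delays_ns, rank, t_response_ns=3.8, t_filter_ns=5.0):
    """Return the location of the rank-th earliest signal.

    delays_ns:     rising-edge times of the delay-time signals, in ns.
    t_response_ns: comparator response time (3.8 ns measured in the text).
    t_filter_ns:   pulses narrower than this are removed (hypothetical value).
    """
    order = sorted(range(len(delays_ns)), key=lambda i: delays_ns[i])
    # COUT rises one response time after the rank-th signal rises.
    t_cout = delays_ns[order[rank - 1]] + t_response_ns
    # XOR(COUT, signal) is high between the two rising edges.
    pulse_widths = [abs(t_cout - d) for d in delays_ns]
    # A domino buffer latches "1" only if its pulse survives the filter.
    domino = [1 if w >= t_filter_ns else 0 for w in pulse_widths]
    zeros = [i for i, bit in enumerate(domino) if bit == 0]
    if len(zeros) != 1:
        raise ValueError("signals too close together: location is ambiguous")
    return zeros[0]  # the priority encoder outputs the address of the "0"


if __name__ == "__main__":
    print(address_encode([40.0, 200.0, 10.0, 120.0, 75.0], rank=4))
```

When two signals rise closer together than the filter threshold allows, both of their pulses are removed and the sketch raises an error, mirroring the wrong decision the encoder may make in that situation.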

3.4. Narrow-Pulse Filter (Glitch Filter)

A well-known delay cell [9] has been employed as a narrow-pulse filter in this design. Figure 5 shows the basic schematic of the filter. At the beginning of operation, the output of the filter is reset to “0” by and . The output changes its state when the voltage of the parasitic capacitor becomes smaller than the threshold voltage of the inverter. Therefore, if the discharge time , namely, the time required to discharge the parasitic capacitance from to , is larger than the pulse width of , the pulse is filtered. The advantage of this filter over conventional RC filters is that the filtered pulse width is programmable by changing a bias voltage . By this manner, the filter is programmed to filter only the narrowest pulse , which is related to the candidate signal , while other pulses are just delayed by filtering.

4. Experimental Results

4.1. Chip Fabrication

The proof-of-concept chip was designed and fabricated in a standard 0.18 μm CMOS technology. The chip includes eight inputs for demonstration. Time-domain signals are applied directly as input data; that is, the ATCs shown in Figure 1 were not implemented in the test chip. For simplicity, the binary counter that counts the number of clocks representing the digital value of the ranked input was not implemented either.

A photomicrograph of the test chip is shown in Figure 6, and its specifications are summarized in Table 1. The core size is 0.006 mm². The power dissipation is 132.3 μW, and the time accuracy is 9.5 ns. Assuming that the system has 8-bit resolution, each operation slot takes 256 clock cycles. As a result, the filtering latency is 256 × 9.5 ns = 2.432 μs.
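The latency figure can be reproduced with a short calculation. The sketch assumes that one clock period equals the 9.5 ns time accuracy, which is consistent with the reported numbers; the function name is chosen here for illustration.

```python
def filtering_latency_us(resolution_bits=8, clock_period_ns=9.5):
    """Latency of one operation slot in microseconds."""
    clock_cycles = 2 ** resolution_bits  # 256 cycles for 8-bit resolution
    return clock_cycles * clock_period_ns / 1000.0


if __name__ == "__main__":
    print(filtering_latency_us())  # 256 cycles x 9.5 ns
```

Under the same assumption, each additional bit of resolution doubles the latency.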

4.2. Measurement Results and Discussion

In order to verify the operation of the prototype chip, an arbitrary pattern of 8 time-domain signals, as shown in Figure 7, was applied as input data. In this example, a rank value of 100₂, corresponding to searching for the 5th smallest signal, is applied to the rank-order setting circuit. Figure 8 shows the measurement results of the searching operation. The searched input (filtered input) is the input . A winner address code of was generated by the encoder. These waveforms are captured at the maximum time accuracy (i.e., time resolution) of 9.5 ns.

Figure 9 shows the response of the comparator. The comparator has a response time of 3.8 ns at the bias voltage of 0.9 V. Reducing will increase the response time with the expense of more power dissipation. The time accuracy of the system is the minimum interval between two successive delay-time signals that the system can still distinguish correctly. In this design, the time accuracy is at least twice the response time of the comparator because of the XOR function and the filtering operation. If two successive signals violate the time accuracy, they both generate narrow pulses at the outputs of the XOR gates, and both pulses are filtered. Consequently, the binary encoder may give a wrong decision. For the test chip, a time accuracy as small as 9.5 ns is achieved. The response time can be reduced by employing faster comparators, usually at the cost of more power consumption. High-speed synchronous comparators such as the one in [10] can be used in the system, since the time-domain signals can be synchronized with the system clock. The time accuracy is not a critical issue, because the requirement can be met by changing the slope of the ramp voltage signal in the ATC; it mainly affects the latency required for a given resolution.

The performance of the test chip is summarized in Table 2 along with some ROF implementations from the literature. As can be seen from the table, the analog design in [5] gives the best performance in terms of compactness. It is also quite fast, but its precision is not good. The digital implementation in [2] is very fast, but it occupies a large area. The architecture in this study achieves a small core size and low power consumption. Although this architecture still suffers from some sources of error, such as comparator offset and process variations, most of these errors can be eliminated by increasing the time resolution of the system to a certain value, by changing the slope of the ramp voltage in the ATCs, so that the system can distinguish successive time signals correctly. The tradeoff is a larger latency. The problem of glitches in [7], which may occur at the outputs of the XOR gates, is solved by using programmable narrow-pulse filters.

The rank-order setting circuit can be removed in certain applications where the rank is fixed, making the architecture simpler, and thus saving chip area. In addition, chip real estate becomes smaller if such a high- MIM capacitance technology is available. The area required for capacitors in the floating-gate-MOS-based comparator, therefore, would be reduced significantly. In terms of computation accuracy, the proposed approach can preserve the precision of digital approaches which is difficult to achieve with pure analog implementations.

The narrow-pulse filter employing delay elements described in Section 3.4 can be replaced by a better version shown in Figure 10. The power-hungry current source caused by in Figure 5 is removed, thus reducing the total DC power consumption. As a result, an estimated power consumption as small as 77 μW is achieved.

Once the location of the desired input is identified, the ROF function can be implemented by either an additional counter, as shown in Figure 1, to receive the filtered value in a binary code or an additional multiplexer to select the filtered analog signal.

5. Conclusions

A low-power analog implementation of a rank-order searching circuit for building ROFs and associative memories has been developed by using a time-domain computation scheme. The architecture preserves the accuracy of digital implementations while achieving the advantages of analog implementations in terms of low power dissipation and small chip real estate. The architecture is also a promising solution when a large number of inputs is required, because it does not need many additional circuits. The circuit operation has been verified by experimental results obtained from the fabricated proof-of-concept chip.

Acknowledgments

The authors would like to thank Mr. Liem T. Nguyen for his original ideas on the rank-order filter using the time-domain digital computation technique presented in [7]. The VLSI chip was fabricated through the chip fabrication program of the VLSI Design and Education Center, The University of Tokyo, in collaboration with Rohm Corporation and Toppan Printing Corporation.