Abstract

Quantum-dot cellular automata (QCA) is an attractive nanotechnology with the potential alterative to CMOS technology. QCA provides an interesting paradigm for faster speed, smaller size, and lower power consumption in comparison to transistor-based technology, in both communication and computation. This paper describes the design of a 4-bit multifunction nanosensor data processor (NSDP). The functions of NSDP contain (i) sending the preprocessed raw data to high-level processor, (ii) counting the number of the active majority gates, and (iii) generating the approximate sigmoid function. The whole system is designed and simulated with several different input data.

1. Introduction

Nanotechnology is a multidisciplinary field that brings together many science and engineering disciplines including physics, chemistry, biosciences, material science, computer science, and electrical and mechanical engineering. Nanotechnology will radically affect all these disciplines and their application areas. The economic impact is foreseen to be comparable to information technology and telecommunication industries [1]. Nanotechnology has direct applications in sensing, sensor miniaturization, and new materials development as well as in electronics and electromechanical devices. Substantial advances in nanotechnology development have been achieved in the fields of engineering and bioscience [24]. For example, integrating a large number (496) of programmable FET nodes in a small area of about 960 μm2 and programing them into a full adder, subtractor, multiplexer, and demultiplexer has been reported. Work reported in [3] demonstrated that a lateral integration of 700 rows of ZnO nanowires produces a peak voltage of 1.26 V at a low strain of 0.19%, which is potentially sufficient to recharge an AA battery. The nanosensor development described in [5], based on nanowires, is emerging as a powerful and general class of ultrasensitive, electrical sensors for the direct detection of biological and chemical species, from proteins and DNA to drug molecules and viruses down to the ultimate level of a single molecule. It also shows that a nanosensor array that contains 100 addressable elements provides unique opportunities for label-free multiplexed detection of biological and chemical species. The literature [6] provides an in-depth view of nanosensor technology and electromagnetic communication among nanosensors. With the integration of the technologies described in [3, 5, 6], that is, the integration of nanosensors, self-powered nanodevices, and wireless nanosensors, many nanosensor applications can be considered. These applications can be classified into four groups: biomedical, environmental, industrial, and military applications. In biomedical applications, health monitoring systems and drug delivery systems are some of the examples. In environmental applications, plant monitoring systems and plague defeating systems can be given as examples. In industrial and consumer goods applications, the examples are ultrahigh sensitivity touch surfaces and haptic interfaces. For military and defense applications, examples are nuclear, biological, and chemical defenses and damage detection systems.

In the above-mentioned applications, although the total number of nanosensors used in each case depends on the scale of the deployed devices, generally this number ranges from 103 to 107 or higher. These application devices have the following common characteristics.(i)Simple Data. Each nanosensor delivers only simple data. The processing of the data provided by each individual nanosensor does not need complicated operation.(ii)Ambiguity. Because some of the nanosensors may work at charging phase (self-powered) or the power may be low, nanosensors may provide ambiguous signals. The receiver (processing) side cannot obtain accurate data at all times. Also, like any other communication system, the signal may fade because of environment conditions and noise.(iii)Massive Data. Nanosensing devices contain a large number of nanosensors (103–107 or higher) which generate data continuously. The data from each nanosensor may be primitive and ambiguous. However, massive amount of this type of data, if processed correctly, delivers useful and meaningful information from the device as a whole.

To analyze data from the nanosensing devices mentioned above, classical computing and processing models, such as loosely coupled distributed systems or tightly coupled parallel systems, can be employed. However, in the classical computing and processing models, the number of processors is usually in the range of 101–103. This number of processors can be increased, but it will bring tremendous increase of cost and system dimensions. Each processing unit usually has a local memory and can conduct complex operation independently. Also, they consume large amounts of energy (power) and heat dissipation becomes a critical issue. The above-mentioned computing and processing models are not suitable for processing data provided by nanosensors. The computing and processing models for nanosensing should meet the following three principles.(i)Simplicity. The basic processing element (PE), that is, the cell, is simple. PE does not need complicated operations and does not need many instructions, as those in the existing general-purpose CPUs. It only needs a small number of operations because it will function as a bridge between nanosensors and high-level processors.(ii)Parallelism. There will be a vast number of cells operating in parallel. Hence, the processing must be distributed.(iii)Locality. All interactions take place on a purely local basis. A cell can interact with a few other cells.

ITRS report [7] summarizes several possible technology solutions for nanosensor data processing. Quantum-dot cellular automata (QCA) is an interesting possibility. Since QCAs were introduced in 1993 [8], several experimental devices have been developed [913]. Although they are certainly “not ready for prime time,” recent papers show that QCAs may eventually achieve high density [14], fast switching speed [15], and room temperature operation [10, 16].

This paper describes the design of a PE for nanosensor data processing, named as nanosensor data processor (NSDP), based on QCA. NSDP works as a bridge between the nanosensors and high-level processor. Its functions are limited to three: (i) sending the preprocessed raw data to high-level processor, (ii) counting the number of the active majority gates (the active means that the output of a majority gate is 1), and (iii) generating the approximate sigmoid function for postprocessing based on artificial neural network. This paper is organized as follows. Section 2 presents the background of QCA focusing on its unique clocking scheme. Section 3 shows the details design of NSDP. Simulation results are shown in Section 4. Conclusions and future works are given in Section 5.

2. QCA Background

2.1. QCA Cells and Wires

A QCA cell is a square nanostructure with a quantum dot in each of the four corners [17], as shown schematically in Figure 1. The cell is populated with two electrons that can tunnel between two pairs of quantum dots connected via a tunnel junction. The two electrons occupy antipodal sites within the cell due to Coulombic repulsion. Tunneling action only occurs within the cell and no tunneling happens between cells. The combination of quantum confinement, Coulombic repulsion, and the discrete electronic charge produces bistable behavior. The two charge configurations can be used to represent binary “0” and “1” with polarization of −1 and +1, respectively. In contrast to a physical wire, a QCA “wire” is a chain of cells where the cells are adjacent to each other, as shown in Figure 1(b). Since no electrons tunnel between cells, QCA provides a mechanism for transferring information without current flow.

2.2. QCA Logic Gates

In QCA, three-input majority gates and inverters serve as the fundamental gates. A majority gate, as shown in Figure 2(a), consists of five QCA cells that realize the function of . An inverter, as shown in Figure 2(b), is made by positioning cells diagonally from each other to achieve the inversion functionality. Figures 2(c) and 2(d) show the variation layouts of an inverter. Majority gates and inverters form a universal set; that is, any logic function can be implemented by using this set. For example, a two-input AND gate is realized by fixing one of the majority gate inputs to “0,” that is, . Similarly, an OR gate is realized by fixing one input to “1,” that is, .

2.3. QCA Clocking Scheme

Adiabatic switching is used for QCA clocking, which significantly reduces metastability issues and enables deep pipelines [18]. During each clock cycle, half of the wire is active for signal propagation, while the other half is unpolarized. During the next clock cycle, half of the previous active clock zone is deactivated and the remaining active zone cells trigger the newly activated cells to be polarized. Thus, signals propagate from one clock zone to the next. The circuit area is divided into four sections and they are driven by four phase clock signals, as shown in Figure 3. In each zone, the clock signal has four states: high-to-low, low, low-to-high, and high. The cell begins computing during the high-to-low state and holds the value during the low state. The cell is released when the clock is in the low-to-high state and inactive during the high state.

2.4. QCA Design Rules

A nominal cell size of 20 nm by 20 nm is assumed. The cell has a width and height of 18- and 5-nm-diameter quantum dots. The cells are placed on a grid with a cell center-to-center distance of 20 nm. QCA design rules are well studied in [1921] and are summarized in the following.

(A) Layout Design Rules(1)Maximum Number of Cells in a Single Clocking Zone. It can be as large as 47 cells; the maximum length of QCA wire is 25 cells. Any long QCA wire exceeding the length of 25 needs to be partitioned into different clocking zones.(2)Minimum Number of Cells in a Single Clocking Zone. It can be one cell. However, the waveform of a one-cell clocking zone can become distorted and cascading of this kind of clocking zone could lead to incorrect results [22]. To observe correct outputs from a circuit, it is recommended that clocking zones should consist of at least two cells.(3)Minimum Wire Spacing for Signal Separation. A space of one QCA cell size is sufficient separation between two wires carrying different signals.(4)Wire Crossover. A unique property of QCA layout is the possibility of implementing crossovers by using only one layer, known as coplanar crossing. Coplanar crossing uses both 45° and 90° cells. However, they can easily fail due to low robustness [23] and fabrication issues [24]. Another alternative is multilayer crossing, which uses more than one layer of cells similar to the routing of metal wires in CMOS technology. The extra layers of QCA are believed to be useful as active components of the circuits and consume less area compared to coplanar circuits [23].(5)QCA Equivalent λ-Rule. “λ-rule” for QCA circuit design could be defined according to the size of a QCA cell, or perhaps the cell size itself could be used as the equivalent λ.

(B) Timing Design Rules(1)Logic Component Timing Rule. The timing constraint on a QCA majority gate is that all three inputs are expected to reach the device cell (central cell) at the same time in order to have fair voting.(2)Clocking Zone Assignment Rule. In QCA circuits, cells in each clocking zones should be synchronized.

(C) Special Rules for QCA(1)Majority Logic Reduction. The logic primitive used in QCA is the majority gate. The majority logic-based reduction method [25] can significantly reduce the complexity of QCA circuits.(2)Systolic Design. The features of systolic architecture in terms of synchrony, deep pipelines, and local interconnection are particularly suitable for accommodating the special timing requirement in QCA circuits. When applying systolic architecture to QCA, significant benefits can be achieved, even more than when applied to CMOS-based technology [26].

In the design of NSDP, the second rule (i.e., Minimum number of cells in a single clocking zone) in layout design is applied in the following way. For the fixed-value input, such as fixing one input of a majority gate to “−1.00” (binary 0) to make it an AND gate, it uses one-cell clocking zone. Or for the limited space, it uses one-cell clocking zone, but no cascading. For the 4th rule (i.e., wire crossover), it employs the multilayer crossing technique. The following section shows the details design of NSDP.

3. NSDP Architecture

NSDP is a processor that works as a bridge between nanosensors and the high-level processor. Its functions are (i) sending the preprocessed raw data to high-level processor, (ii) counting the number of the active majority gates, and (iii) generating the approximate sigmoid function. Among these functions, the last one is the focus of NSDP because it provides the sigmoid function output for the high-level processor to conduct the processing based on the artificial neural network (ANN). The block diagram of NSDP is shown in Figure 4. The detail of each block is explained below.

3.1. Preprocessing

Considering that a large number of nanosensors will generate a great amount of “0” or “1” data, and some nanosensors may generate ambiguous data, the first processing in NSDP computer is the majority operation that determines which one (“0” or “1”) takes the majority. Generally, the number of the inputs of this unit can be any number. The larger the number of the inputs is, the better the majority operation is. However, considering the capability of the available QCA design tool, QCADesigner, the number of the inputs in NSDP is set at 12. This processing unit consists of four majority gates, and it has 12 inputs and 4 outputs. Figure 5(a) shows the schematic, Figure 5(b) shows the QCA layout, and Figure 5(c) shows the simulation result by QCADesigner, respectively. The output of this unit is given by

As shown in Figure 5(c), for the input , , , and , the output , , , , respectively, and the delay is clock.

3.2. Counter

The output of the preprocessing unit is separated into two, one goes to M-Latch and the other goes to the input of the counter. This unit is to count “1” in the output of the preprocessing unit. To count “1” from 4 parallel bits of the output of the preprocessing unit, it employs three full adders, as shown in Figure 6(a). QCA full adder has been well studied. Some representative QCA adders are [19, 2731]. The QCA carry flow adder reported in [19] is a layout optimized multilayer full adder. In QCA, the path from carry-in to carry-out uses one majority gate that requires one clocking zone per bit in a ripple carry adder. This adder (referred to as Cho adder in this paper) consumes only one clocking zone delay per bit, which significantly reduces the delay for large adders. Three Cho adders are employed to implement the counter. The QCA layout is shown in Figure 6(b), and the simulation result shown in Figure 6(c). The output is given by

As shown in Figure 6(c), for the input , , , , the output . The delay is 2 clocks. Note that the “+” operator here means arithmetic addition. In the following, “+” means logic OR operation if there is no specific notation.

3.3. Decoder

The decoder unit and the sigmoid pattern generator unit are used to approximate the sigmoid function . As shown in Table 1, NSDP employs 4 points to approximate the sigmoid function. This is 3-to-4 decoder; the inputs are , , and , and the outputs are , , , and . The output corresponds to that is equal to 0 or 1; that is, it corresponds to that equals to 1000, 0100, 0010, 0001, or 0000. It is worth to note that the position of “1” does not matter. For the four inputs, , , , and , of the counter, does not mean the most significant bit and does not mean the least significant bit because the twelve input sensors can only be considered “fired” or “not fired” and the majority gate can be numbered in any order. This is the same here and after. Likewise, the output corresponds to that which is equal to 2, that is, it corresponds to that which is equal to 1100, 0110, 0011, 1010, 0101, or 1001. The output corresponds to that , is equal to 3, that is, it corresponds to that which is equal to 1110, or 0111. The output corresponds to that is equal to 4; that is, it corresponds to that equals to 1111. The outputs , , , and are given by

Figure 7(a) shows the schematic of this 3-to-4 decoder. Figure 7(b) shows the QCA layout, and Figure 7(c) shows the simulation result. As shown in Figure 7(c), when the input (sum) is equal 0, 1, 2, 4, 0, 3, 1, and 2, the output will be equal to 0001, 0001, 0010, 1000, 0001, 0100, 0001, and 0010, respectively, and the delay is one clock.

3.4. Pattern Generator

The output of the decoder is used to generate four patterns, 1111, 1101, 0011, and 0000, as listed in the third column in Table 1, which are used to approximate the sigmoid function. These four patterns are generated in the following way: The schematics for , , , and are shown in Figure 8(a). The QCA layout for is shown in Figure 8(b). The layout for , and is similar to the one in Figure 8(b), except that the pattern to be generated is 1101, 0011, and 0000, respectively. The simulation result is shown in Figure 8(c). When the input (sum) is equal to 0, 1, 2, 4, 0, and 3, the generated pattern is 0000 and 0000 for , 0011 for , 1111 for , 0000 for , and 1101 for , respectively. The delay is clock. Note that the delay for and is one clock longer than of and . This is because that the input for and is delayed by one clock.

3.5. Sigmoid Function Output

The output of the sigmoid function is one of the four patterns, , , , and , depending on the number of 1’s of the counter. Therefore, the output of the sigmoid function can be written as The schematics for are shown in Figure 9(a). The QCA layout for are shown in Figure 9(b). This layout is also for other outputs , except that the inputs and output will be changed to and , correspondingly. The simulation result is shown in Figure 9(c). When the input (sum) is equal to 0, 1, 2, 4, 0, and 3, the output is 0000 , 0000 , 0011 , 1111 , 0000 , and 1101 , correspondingly. The delay from the output of the counter to the output of the sigmoid function is clocks. Figure 10 shows the graph of the approximated sigmoid function, generated by the decoder and pattern generator unit described above.

3.6. Function Control

NSDP is a multifunction nanosensor data processor. The function selection of NSDP is determined by the function control unit, which is a 2-to-3 decoder. The truth table of the function control unit is shown in Table 2. When control bus is equal to 00, the output of NSDP is the raw majority gate output, that is, from 4 independent majority gates. The bit position does not matter; that is, does not mean MSB and does not mean LSB. When control bus is equal to 01, the output of NSDP is the number of 1s of the majority gate output, given by . Note that here is LSB and MSB is always 0. When control bus is equal to 10, the output of NSDP is the sigmoid function output. The output of the function control unit is given by The schematic of the function control unit is shown in Figure 11(a), the QCA layout is shown in Figure 11(b), and the simulation result is shown in Figure 11(c). When input is equal to 00, 01, and 10, the output is equal to 001, 010, and 001, respectively. The delay from the input to output is 1 clock.

3.7. NSDP Output

The function control unit of NSDP determines which output of the three function units (raw majority data, the number of the active majority gates, and sigmoid function) will be the output of NSDP. is given by

The schematic of NSDP output unit is shown in Figure 12(a), and the partial QCA layout of in is shown Figure 12(b). Note that , , and are the outputs of M-latch, S-latch, and G-latch for the inputs , , and , respectively, which are not shown in Figure 12(b) (refer to Figure 13 for details).

The complete QCA layout of NSDP is shown in Figure 13. NSDP consists of the following units: preprocessing, counter, 3-to-4 decoder, pattern generator, sigmoid function output, function control, M-latch, S-latch, G-latch, and NSDP output. Each unit is surrounded by a red dotted line in Figure 13. The complete experiment results are given in the next section.

4. Results

NSDP is designed and simulated by using QCADesigner Ver. 2.0.3. NSDP contains 4436 cells. Its dimension is (6.9 ). The simulation employs the coherence vector engine. Figure 14 shows the parameter setting for coherence vector simulation engine.

4.1. Raw Majority Data Output ()

In this mode, the data flow path is Preprocessing M-Latch NSDP Output (refer to Figure 13 for details).

As shown in Figure 15, 12 inputs are grouped into 4 groups: Input 3 (), Input 2 (), Input 1 (), and Input 0 (). The input data are as follows:Input 3: 011 , 100 , 101 , 110 , 111 , Input 2: 101 , 100 , 011 , 010 , 001 , Input 1: 101 , 100 , 011 , 010 , 001 , Input 0: 011 , 100 , 001 , 010 , 110 ,

The majority output on bus () is equal to 1111 , 0000 , 1110 , 1000 , 1001 . When is equal to 00, becomes 1, and and both become 0. NSDP picks the data on bus as the final output. This is shown in 6th row, in Figure 15; that is, Output () is equal to 1111 , 0000 , 1110 , 1000 , 1001 , in accompany with the clocked .

4.2. Counter Output ()

In this mode, the data flow path is Preprocessing Counter S-Latch NSDP Output (refer to Figure 13 for details).

As shown in Figure 16, the input data are as follows:Input 3: 000 , 001 , 010 , 011 , 100 , 101 , 110 , 111 , Input 2: 000 , 111 , 110 , 101 , 100 , 011 , 010 , 001 , Input 1: 000 , 100 , 110 , 101 , 100 , 011 , 010 , 001 , Input 0: 000 , 001 , 010 , 011 , 100 , 001 , 010 , 110 ,

The counter output on sum bus () is equal to 000 , 001 , 010 , 100 , 000 , 011 , 001 , 010 . When is equal to 01, becomes 1, and and both become 0. NSDP picks the data on bus and extends to 4 bits by padding leftmost bit with 0, as the final output. This is shown in 6th row, in Figure 16; that is, the Output () is equal to 0000 , 0001 , 0010 , 0100 , 0000 , 0011 , 0001 , 0010 , in accompany with the clocked .

4.3. Sigmoid Function Output ()

In this mode, the data flow path is Preprocessing Counter 3-to-4 Decoder, Pattern Generator Sigmoid Function Output G-Latch NSDP Output (refer to Figure 13 for details).

As shown in Figure 17, the input data are as follows:Input 3: 000 , 001 , 010 , 011 , 100 , 101 , 110 , 111 , Input 2: 000 , 111 , 110 , 101 , 100 , 011 , 010 , 001 , Input 1: 000 , 100 , 110 , 101 , 100 , 011 , 010 , 001 , Input 0: 000 , 001 , 010 , 011 , 100 , 001 , 010 , 110

The counter output on sum bus () is equal to 000 , 001 , 010 , 100 , 000 , 011 , 001 , 010 , as shown in 5th row, in Figure 17. The sigmoid function converts these inputs to 0000 , 0000 , 0011 , 1111 , 0000 , 1101 , 0000 , 0000 , and outputs on the Sigmoid bus, as shown in 6th row. When is equal to 10, becomes 1, and and both become 0. NSDP picks the data on the sigmoid bus as the final output. This is shown in 7th row, in Figure 17; that is, the Output () is equals to 0000 , 0000 , 0011 , 1111 , 0000 , 1101 , 0000 , 0000 , in accompany with the clocked .

In above experiments, the processing in sigmoid function is the most complicated and the delay from input to the output is the longest. For the raw majority data output and the counter output, the delay is inserted to make them synchronize with the output of the sigmoid function output. Therefore, the delay of NSDP from the Input 3 (), Input 2 (), Input 1 (), and Input 0 () to the Output () is clocks.

5. Conclusions and Future Works

This paper presents the design of a 4-bit nanosensor data processor (NSDP) which has 3 functions: (i) sending the result of 4 majority gates to high-level processor; (ii) counting the number of active majority gates (i.e., the output of the majority gate is 1); (iii) generating the approximate sigmoid function for the postprocessing based on the artificial neural network. QCA circuits have significant wire delays. For a fast design in QCA, it is generally necessary to minimize the complexity. The design uses systolic array structures to produce an output on every clock cycle with low latency to the first output. The layouts and functionality checks were done using QCADesigner. NSDP is a relatively complicated system. The simulation results show that its three functions work correctly. It is hoped that this paper will inspire further ideas on developing application systems based on QCA circuits.

As mentioned in Section 1, the nanosensor system generates a large number of nanosensor data. This type of system needs 103–107 PEs to process the real-time data. NSDP described in this paper works as a PE. A large number of PEs (103–107) need to be arranged in a parallel and distributed manner to handle the huge amount of nanosensor data. NSDP is a 4-bit processor. It will be more convenient to handle large amount of nanosensor data, by extending it to a 8-bit processor. All these are our future works.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work is partially supported by a Grant from AFRL under Minority Leaders Program, Contract no. TENN13-S567-019-02-C2. The authors would like to thank the anonymous reviewers for their careful review and valuable comments.