Abstract
This work presents a novel coefficient optimization algorithm to reduce the area and improve the performance of finite impulse response (FIR) filter designs. Two basic architectures are commonly used in filters: direct and transposed. The coefficients of a filter can be encoded with the fewest possible nonzero bits using canonic signed digit (CSD) expressions. The proposed optimization algorithm shares common subexpressions (CSs) and reduces the number of repeated operations that involve the CSD coefficients of filters with a transposed architecture. The effectiveness of the algorithm is confirmed using a filter from the code division multiple access (CDMA) standard, a 121-tap highpass filter, and 105- and 325-tap lowpass filters as benchmarks. For example, the 105-tap filter optimized by the proposed algorithm has a 30.44% smaller combinational logic area and a 16.69% better throughput/area ratio than the best design that has been developed to date. Experimental results reveal that the proposed algorithm outperforms earlier designs.
1. Introduction
Digital filters have a wide range of applications because they are more stable and reliable than analog filters. They are used in image and audio processing and in a wide range of wired and wireless communication systems. The designs of digital filters vary widely across applications. Digital filters can be divided into finite impulse response (FIR) and infinite impulse response (IIR) filters.
A finite impulse response (FIR) filter has a linear phase and an arbitrary amplitude response, and it is easily implemented. The main goal of previous designs has been to avoid a high-cost multiplier at the transmitting side, since a multiplier must be used at the receiving side. The design approach herein involves simplifying the digital filter’s coefficients to reduce the area cost of the filter. The coefficients of a digital filter can be separated into various coefficient groups. The filter hardware comprises logical adders, subtractors, and shift registers. If the number of these logical components can be reduced by simplification, then the overall system can be improved.
The rest of this paper is organized as follows. Section 2 briefly describes previous research on filter optimization. Section 3 then describes the coefficient optimization method. Next, Section 4 summarizes the experimental results and compares them with those of previous designs. Conclusions are finally drawn in Section 5, along with recommendations for future research.
2. Related Works
2.1. Coefficient Simplification Methods
Coefficient simplification is one of the most effective ways of improving the area and performance of a finite impulse response filter. Numerous methods of coefficient simplification for filters have been developed. The minimum number of signed power-of-two (MNSPT) method [1] has been developed to simplify the coefficients. The canonic signed digit (CSD) representation [2] reduces the number of binary “1”s in the coefficients and thereby the area required to realize constant multiplications. Simplification algorithms are utilized to reduce the number of constant multipliers required in FIR filter realization. Using an algorithm to determine the relationships among coefficients and to extract the common terms in their binary formats can reduce the number of redundant logical operations.
In the literature, horizontal and vertical relationships have been identified between coefficients; these relationships can be exploited to design an algorithm that extracts their common factors. Such an algorithm commonly has low complexity. The algorithm of Paško et al. [3] performs a global search but consumes too much time. In some studies [4–9], horizontal and vertical relationships between coefficients and the displacement and delay characteristics of coefficients were used to perform the simplifications. Ernesto and Dolecek [10] utilized linear programming to identify the largest common factors of coefficients. Searches for common factors using low-complexity methods can be divided into two categories: horizontal and vertical.
2.1.1. Horizontal Search Algorithms
Horizontal search algorithms find shifting relationships between coefficients. For example, in Figure 1, the coefficients H0 and H1 in binary format are shifted relative to each other, so they have the same multiplication block. They can both be multiplied by performing only one calculation.
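The shifting relationship described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: coefficients are treated as plain integers, the helper names `odd_part` and `share_multiplier_block` are hypothetical, and the values 169 and 338 are chosen for the example rather than taken from Figure 1.

```python
def odd_part(value: int) -> tuple[int, int]:
    """Split a nonzero integer into (odd factor, shift) so that value == odd << shift."""
    if value == 0:
        raise ValueError("zero has no odd part")
    shift = 0
    while value % 2 == 0:
        value //= 2
        shift += 1
    return value, shift


def share_multiplier_block(h_a: int, h_b: int) -> bool:
    """True when the two coefficients differ only by a power-of-two shift."""
    return odd_part(h_a)[0] == odd_part(h_b)[0]


# Illustrative values (assumed): 338 is 169 shifted left by one bit.
print(share_multiplier_block(169, 338))
print(share_multiplier_block(169, 21))
```

When the check succeeds, one product can serve both coefficients, because the shift itself is realized by wiring rather than by additional logic.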
2.1.2. Vertical Search Algorithms
Vertical search algorithms find the delay relationships between the values in corresponding positions of binary representations of coefficients. Coefficients with such a relationship have the same addition block. They can be added by performing a single calculation. Figure 2 presents the vertical search.
2.2. Literature Review
Coefficient representations can be categorized into binary and canonic signed digit (CSD) representations. The CSD representation primarily involves reducing the number of “1”s in the original binary representation of coefficients. More “1”s result in more repeated additions and require more adders in the corresponding circuit realizations. Additionally, a run of repeated additions can be replaced by one larger power of two and one subtraction. For instance, in binary, the value seven can be expressed as (2^{2} + 2^{1} + 2^{0}); it can also be expressed as (2^{3} − 2^{0}) in standard CSD notation. The value seven in CSD notation uses a single subtraction instead of the two additions that are required using binary notation. The cost of realization is reduced by using the CSD notation. Numerous coefficient simplification algorithms are described below.
In 1999, Paško et al. [3] proposed the representation of coefficients using the CSD notation and utilized the horizontal search algorithm to find the common factors (1, 0, 1), (1, 0, −1), (1, 0, 0, 1), (1, 0, 0, −1), and so on. The algorithm did not consider the inversion relationship between pairs of coefficients. The most frequently occurring CSD-based common factor is the one extracted from the coefficients. The algorithm is performed until no common factor of the coefficients can be found. In 2002, Jang and Yang [4] proposed the representation of coefficients using CSD notation and used a vertical search algorithm to find the common factors (1, −1) and (−1, 1). The algorithm considered terms that were related by inversion. In 2003, Vinod et al. [5] utilized the vertical common term extraction method that was developed by Jang and Yang [4] to simplify coefficients. Their method first performs horizontal searches for common factors (1, 0, 1), (1, 0, −1), (1, 0, 0, 1), and (1, 0, 0, −1) and then vertical searches for factors (1, 0, 1) and (−1, 0) related by inversion.
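As an illustration of the CSD encoding discussed above, the following Python sketch converts an integer to signed digits using the standard non-adjacent-form recurrence. The function names `to_csd` and `from_csd` are hypothetical, and the code is a minimal sketch rather than the encoder used in any of the cited works.

```python
def to_csd(value: int) -> list[int]:
    """Return CSD digits (each -1, 0, or 1), most significant digit first."""
    if value < 0:
        return [-d for d in to_csd(-value)]
    digits = []  # built least significant digit first
    while value != 0:
        if value % 2 == 0:
            digit = 0
        else:
            # value mod 4 == 1 gives +1; value mod 4 == 3 gives -1,
            # which guarantees that the next digit is zero.
            digit = 2 - (value % 4)
        digits.append(digit)
        value = (value - digit) // 2
    return digits[::-1] or [0]


def from_csd(digits: list[int]) -> int:
    """Evaluate a signed-digit string back to an integer."""
    value = 0
    for d in digits:
        value = 2 * value + d
    return value


print(to_csd(7))  # [1, 0, 0, -1], i.e. 2^3 - 2^0
```

For the value seven, the three nonzero bits of the binary form collapse to two nonzero digits, so one subtraction replaces two additions.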
In 2005, Vinod and Lai [11] improved the horizontal and vertical search algorithms that were developed in 2003 [5]. They constructed multiplier block adders (MBAs) and then structure adders (SAs). Their new algorithm yielded a final result after adding one or more delays to the logic gates that realize the noncommon factors of the coefficients. The algorithm that was presented by Takahashi and Yokoyama [6] extracts the common factor with the highest frequency. If two or more common factors have the same frequency, then the smallest one is extracted. Applying the algorithm to a filter with 26 coefficients shows that the factors (1, 0, ) and (1, 0, 0, ) appear most frequently. Maskell and Liewo [12] developed an algorithm for reducing the height of every adder tree that is composed of common factors. The height of the adder tree can be reduced by properly setting the width of the adder. Accordingly, extracting more common factors resulted in narrower adders with low latency and low area cost. A local search algorithm first extracts the common factors (1, 0, 1) and (1, 0, −1). The algorithm also uses a specific multiplier block (MB) in place of a full adder (FA), reducing the area cost by 67%.
In 2007, Smitha and Vinod [7] proposed the use of binary representation to extract common factors of coefficients with only two- to four-bit terms. The algorithm uses as few adders as possible to generate the coefficients and performs a horizontal search to find the common factors (1, 1), (1, 0, 1), (1, 1, 1), (1, 1, 1, 1), (1, 1, 0, 1), (1, 0, 1, 1), and (1, 0, 0, 1). In 2010, Vinod et al. [9] developed a new algorithm to perform a horizontal search by extracting the common factors (1, 0, 1), (1, 0, −1), (1, 0, 0, 1), and (1, 0, 0, −1) of the coefficients in CSD notation. Then, the algorithm performs a vertical search to extract the factors (1, 1) and (1, 0, 1). The horizontal search part of this algorithm takes into account the inversion of the filter’s coefficients. In previous works, the search algorithms effectively reduced the number of logical operations required for multiplication by a constant by extracting the common factors. The present work proposes a new search algorithm to improve the area cost and the performance over those of earlier designs.
3. Proposed Coefficient Optimization Method
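Equation (1) describes the finite impulse response (FIR) filter. At each time point n, the input sample x(n − i) is multiplied only by the coefficient H(i). Consider the following:

y(n) = \sum_{i=0}^{N-1} H(i)\, x(n-i), \qquad (1)

where N is the number of coefficients of the filter.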
The basis of the common subexpression elimination (CSE) algorithm is to find the common factors of the coefficients of a filter. In the transposed form, presented in Figure 3, the common factors of the coefficients are evaluated as the shared multiplication blocks (MBs). Therefore, the overall area of the FIR circuit can be reduced by the sharing of MBs.
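To make the role of the shared multiplication blocks concrete, the sketch below models a transposed-form FIR filter in Python and compares it with the usual convolution sum y(n) = Σ h(k)·x(n − k). The function names and the test coefficients are assumptions made for illustration; the code is a behavioral model, not a description of the hardware in Figure 3.

```python
def fir_direct(h, x):
    """Reference convolution sum: y(n) = sum over k of h[k] * x[n - k]."""
    return [
        sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
        for n in range(len(x))
    ]


def fir_transposed(h, x):
    """Transposed form: every input sample feeds all coefficient multipliers."""
    taps = len(h)
    state = [0] * taps  # partial sums held in the delay registers
    y = []
    for sample in x:
        # The multiplier block: the same sample times every coefficient.
        products = [coeff * sample for coeff in h]
        y.append(products[0] + state[0])
        # Move each partial sum one register closer to the output.
        state = [products[k] + state[k] for k in range(1, taps)] + [0]
    return y


h = [169, 338, -5]          # illustrative coefficients (assumed)
x = [3, -1, 4, 1, -5, 9]    # arbitrary input samples
print(fir_transposed(h, x) == fir_direct(h, x))
```

Because all products in one clock cycle are formed from the same input sample, any subexpression shared by several coefficients needs to be computed only once per sample.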
This section elucidates the use of a new CSE algorithm to extract the common factors of the coefficients in CSD notation. The algorithm gathers statistics on how frequently subexpressions appear in the coefficients and looks for the inverted (negated) forms of those subexpressions when searching for common factors. Algorithm 1 presents the pseudocode of the proposed CCSE (CSD-based common subexpression elimination) algorithm. The steps of the proposed CCSE algorithm are as follows.

Initial. Set the boundary condition 0 ≦ i ≦ (filter’s order − 1) for the loop index i and initialize i to zero.
Step 1. Find the ith coefficient and find all nonzero bit positions (from high to low) of the coefficient. Record these positions and list the combinations of subexpressions (SEs) with more than one nonzero bit. Use all of the combinations as the basic elements (BEs) in simplification (tabulate the BEs). If an input element matches one of the BEs in the table, then increase the statistical frequency of that BE. Otherwise, the input element becomes a new BE and is added to the table.
Step 2. Set i = i + 1 and determine whether the value exceeds the boundary condition set in the initialization. If it does not, then repeat Step 1; otherwise, proceed to Step 3.
Step 3. Examine all of the BEs in the table to find those that have inverted (negated) counterparts. If an inverted SE exists, then use the positive SE as the basic element and add the number of appearances of the negative SE to its frequency.
Step 4. Examine all of the BEs in the table to find the one with the highest appearance frequency. If the highest frequency is one, then the algorithm proceeds to the final step. If the highest frequency exceeds one, then select that BE as the common subexpression (CS); if more than one BE has the same highest frequency, select the shorter BE as the extracted CS.
Step 5. Find all of the coefficients that contain the CS selected in Step 4 or the corresponding inverted CS and perform the elimination process. When the process is complete, substitute a new variable for the CS in the original expressions and reset the loop index i to zero, before returning to Step 1.
Final. Complete the algorithm and output the simplification results.
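The counting and selection stages (Steps 1 to 4) can be sketched in Python as follows. This is a simplified illustration, not the authors' Algorithm 1: the function names are hypothetical, coefficients are assumed to be given as tuples of CSD digits, any pattern that starts with −1 is folded into its negation even when the positive form has not yet been seen, and the substitution of Step 5 is omitted. The three example coefficients are inferred from the subexpression lists in the worked example and are an assumption.

```python
from collections import Counter
from itertools import combinations


def subexpressions(digits):
    """Step 1: yield every pattern built from two or more nonzero digits."""
    positions = [i for i, d in enumerate(digits) if d != 0]
    for size in range(2, len(positions) + 1):
        for chosen in combinations(positions, size):
            first, last = chosen[0], chosen[-1]
            yield tuple(
                digits[i] if i in chosen else 0 for i in range(first, last + 1)
            )


def select_common_subexpression(coefficients):
    """Steps 1-4: tabulate patterns, fold inverted ones, pick a candidate."""
    table = Counter()
    for digits in coefficients:                 # Steps 1 and 2: loop over taps
        table.update(subexpressions(digits))
    folded = Counter()
    for pattern, count in table.items():        # Step 3: merge inverted forms
        if pattern[0] < 0:
            pattern = tuple(-d for d in pattern)
        folded[pattern] += count
    best = max(folded.values(), default=0)
    if best <= 1:                               # Step 4: nothing is shared
        return None
    candidates = [p for p, c in folded.items() if c == best]
    return min(candidates, key=len), best       # ties go to the shorter pattern


# Assumed example: H(1) is H(0) shifted by one bit, H(2) is an inverted (1, 0, 1).
h0 = (1, 0, 1, 0, 1, 0, 0, 1)
h1 = (1, 0, 1, 0, 1, 0, 0, 1, 0)
h2 = (-1, 0, -1)
print(select_common_subexpression([h0, h1, h2]))
```

For these inputs, the pattern (1, 0, 1) is counted twice in each of the first two coefficients and once more through its inverted form, so it is returned with a frequency of five.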
Table 10 presents the example of a filter with three coefficients to elucidate the actual processes of the algorithm. Intermediate results are obtained after each step of the algorithm, as described in the following statements.
Initial. Set i = 0 and the boundary condition 0 ≦ i ≦ 2.
Explanation 1 (i equals zero). Select the coefficient H(0) and list all of the SEs that have more than one nonzero bit. Take these SEs as BEs for simplification. The appearance frequencies of these SEs are as follows.
SE (1, 0, 1) appears twice.
SEs (1, 0, 0, 1), (1, 0, 0, 0, 1), (1, 0, 1, 0, 1), (1, 0, 0, 0, 0, 1), (1, 0, 0, 0, 0, 0, 0, 1), (1, 0, 0, 0, 1, 0, 0, 1), (1, 0, 1, 0, 0, 0, 0, 1), and (1, 0, 1, 0, 1, 0, 0, 1) all appear once.
Initially, no BE is recorded in the table, so all SEs are taken as BEs for subsequent simplification.
Explanation 2 (i = i + 1). i equals one and satisfies 0 ≦ i ≦ 2. The algorithm executes Step 1.
Explanation 3 (i equals one). Select the coefficient H(1) and list all of the SEs that have more than one nonzero bit. The intermediate results are as follows.
SE (1, 0, 1) appears twice.
SEs (1, 0, 0, 1), (1, 0, 0, 0, 1), (1, 0, 1, 0, 1), (1, 0, 0, 0, 0, 1), (1, 0, 0, 0, 0, 0, 0, 1), (1, 0, 0, 0, 1, 0, 0, 1), (1, 0, 1, 0, 0, 0, 0, 1), and (1, 0, 1, 0, 1, 0, 0, 1) appear once.
The BEs in the table are as follows.
(1, 0, 1) appears four times.
(1, 0, 0, 1), (1, 0, 0, 0, 1), (1, 0, 1, 0, 1), (1, 0, 0, 0, 0, 1), (1, 0, 0, 0, 0, 0, 0, 1), (1, 0, 0, 0, 1, 0, 0, 1), (1, 0, 1, 0, 0, 0, 0, 1), and (1, 0, 1, 0, 1, 0, 0, 1) all appear twice.
Explanation 4 (i = i + 1). i equals two and satisfies 0 ≦ i ≦ 2. The algorithm executes Step 1.
Explanation 5 (i equals two). Select the coefficient H(2) and list all of the SEs that have more than one nonzero bit. The intermediate results are as follows.
SE (−1, 0, −1) appears once.
The BEs in the table are as follows.
(1, 0, 1) appears four times.
(−1, 0, −1) appears once.
(1, 0, 0, 1), (1, 0, 0, 0, 1), (1, 0, 1, 0, 1), (1, 0, 0, 0, 0, 1), (1, 0, 0, 0, 0, 0, 0, 1), (1, 0, 0, 0, 1, 0, 0, 1), (1, 0, 1, 0, 0, 0, 0, 1), and (1, 0, 1, 0, 1, 0, 0, 1) all appear twice.
Explanation 6 (i = i + 1). i equals three and so does not fall in the range 0 ≦ i ≦ 2. The algorithm goes to Step 3.
Explanation 7. Determine whether the BEs in the table have corresponding inverted forms. SEs (−1, 0, −1) and (1, 0, 1) are the inverse of each other. The SE (1, 0, 1) becomes the basic element, and the number of appearances of SE (−1, 0, −1) is added to its frequency.
The BEs in the table are as follows.
(1, 0, 1) appears five times.
(1, 0, 0, 1), (1, 0, 0, 0, 1), (1, 0, 1, 0, 1), (1, 0, 0, 0, 0, 1), (1, 0, 0, 0, 0, 0, 0, 1), (1, 0, 0, 0, 1, 0, 0, 1), (1, 0, 1, 0, 0, 0, 0, 1), and (1, 0, 1, 0, 1, 0, 0, 1) all appear twice.
Explanation 8. Select the BE with the highest frequency as the CS to be simplified. BE (1, 0, 1) is selected as the CS after Step 4 is performed.
Explanation 9. The CS (1, 0, 1) is taken from Step 4 and the inverse CS (−1, 0, −1) is also utilized in the simplification. A new variable replaces the CS in the original coefficients and the intermediate outputs are as presented in Table 11. After the five steps have been completed, the algorithm resets the loop index to zero and returns to Step 1.
Explanation 10. Repeat the five steps and find the BEs of coefficients until the appearance frequencies of BEs are equal to zero. Table 12 presents the outputs of the algorithm.
The algorithm yields the following CSs. C5 = (1, 0, 1), whose decimal value is 5; C21 = (5, 0, 1), whose decimal value is 21; C169 = (21, 0, 0, 1), whose decimal value is 169.
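The nesting of these CSs can be checked with a short Python sketch. The helper names `evaluate` and `times_169` are hypothetical, and the shift-and-add chain is one plausible reading of how the nested CSs map to adders, stated here as an assumption.

```python
def evaluate(digits):
    """Evaluate a digit string whose leading entry may be an earlier CS value."""
    value = 0
    for d in digits:
        value = 2 * value + d
    return value


c5 = evaluate((1, 0, 1))
c21 = evaluate((c5, 0, 1))
c169 = evaluate((c21, 0, 0, 1))
print(c5, c21, c169)  # 5 21 169


def times_169(x):
    """Multiply by 169 with three additions, reusing each intermediate result."""
    x5 = (x << 2) + x        # C5   = (1, 0, 1)
    x21 = (x5 << 2) + x      # C21  = (C5, 0, 1)
    x169 = (x21 << 3) + x    # C169 = (C21, 0, 0, 1)
    return x169
```

Each CS is built from the previous one with a single shift and a single addition, which is why replacing a CS with a variable, rather than deleting it, exposes further sharing in later iterations.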
Based on the above explanations and the pseudocode of the proposed algorithm, the main goal of the first step is to identify all subexpressions with nonzero bits. An algorithm that finds more subexpressions is more likely to find the best simplification. The third step counts the inverted SEs, with a view to improving the area reduction achieved by the simplification. The fourth and fifth steps are the major reduction steps. The major difference between the proposed algorithm and earlier ones concerns the CSE processes. The proposed algorithm does not directly eliminate the CSs from the coefficients but replaces the CSs with new variables. In the next iteration, the algorithm extracts new CSs from both the coefficients that contain new variables and those without a CS. This new approach can extract more new CSs and achieve better simplification results.
To confirm the effectiveness of the proposed algorithm, the filter with three taps is utilized to estimate the required logical operations at the architectural level. The input variable is set to 12 bits. The minimum bit widths of the coefficients and common factors are utilized in the estimation. A represents the adder; S denotes the subtractor; I denotes the inverter. The realization areas of a filter with the original three coefficients in CSD notation are as follows. The realization of coefficient H(0) needs (72 A, 0 S, 0 I), H(1) needs (60 A, 0 S, 0 I), and H(2) needs (0 A, 30 S, 0 I). The total area of the filter is (132 A, 30 S, 0 I). After the algorithm is implemented, the realization of common subexpression C5 needs (15 A, 0 S, 0 I), C21 needs (17 A, 0 S, 0 I), and C169 needs (20 A, 0 S, 0 I). H(0) and H(1) share the common subexpression C169 with a shifting relationship. The shift relationship can be realized without occupying additional area. H(2) has the inverted C5. The realization of H(2) only needs (15 A, 0 S, 15 I) after the algorithm is executed. The total area cost of the filter is (52 A, 0 S, 15 I). In the estimation of the area, the subtractor and the adder are assumed to have the same area cost. The area cost of the inverter is 1/10 of that of the adder. Accordingly, the original area cost of the filter is 162 A. The proposed algorithm reduces the area cost of the filter to 53.5 A. The algorithm thus reduces the area cost of realizing the coefficients of the filter by 67%.
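The area estimate above can be reproduced with a few lines of Python. The function name `area_in_adders` is hypothetical; the weights follow the assumptions stated in the text (a subtractor costs the same as an adder, and an inverter costs one tenth of an adder).

```python
def area_in_adders(adders, subtractors, inverters):
    """Express an (A, S, I) triple in adder-equivalent units."""
    return adders + subtractors + inverters / 10


original = area_in_adders(132, 30, 0)   # sum of H(0), H(1), and H(2)
optimized = area_in_adders(52, 0, 15)   # C5 + C21 + C169 plus the inverters
reduction = 1 - optimized / original

print(original, optimized, round(100 * reduction, 2))
```

The computed reduction is about 66.98%, which the text rounds to 67%.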
4. Experimental Results
In the experiments herein, four filters are used to compare the performance of various search algorithms: a symmetric filter with 48 taps defined in the CDMA 2000 communication protocol, a highpass filter with 121 taps, and two lowpass filters with 105 and 325 taps. CDMA 2000 [13] is a 3G mobile communication standard. The 3G system offers various telecommunications services, including voice, multimedia, and high-speed and low-speed data transmissions. The system requires a baseband filter to suppress intersymbol interference. The CDMA 2000 standard recommends the use of a symmetric finite impulse response filter with 48 taps to eliminate the interference. A highpass filter with 121 taps [14] and lowpass filters with 105 and 325 taps [15] are also used to confirm the effectiveness of simplification by the search algorithms. The symmetric coefficients of these filters are realized with the transposed architecture.
In the realization, the Synopsys Design Compiler (DC) SP1 software is utilized to obtain data concerning the synthesis of the circuit. The process technology adopts the CBDK Arm 4.0_TSMC 0.18 μm cell library with the default system parameters. The data in the comparison tables are rounded to the second decimal place; the unit of circuit area is μm^{2}, the unit of data arrival time is nanoseconds (ns), the unit of throughput is gigabits per second (Gbps), and the unit of throughput per area is bps/μm^{2}. In the comparison of areas, the combinational logic area is utilized to confirm the simplification performance of each algorithm, because the search algorithms simplify the coefficients, which are realized using combinational logic units. In the following performance comparisons, the coefficients with original CSD-based expressions are simplified only by the Synopsys DC tool. The other search algorithms perform simplifications of the coefficients in CSD notation.
According to Table 1, the search algorithm proposed by Jang and Yang [4] has a coefficient simplification ratio of 4.14% compared with the original filter, which is better than that of any other previous algorithm. That algorithm extracts more common factors than the others but also causes the longest path delays. The proposed CCSE algorithm reduces the combinational logic area by 13.43% and the total filter area by 8.56% compared with the original filter. The proposed algorithm reduces the area of the filter by more than the previous algorithms. According to Table 2, the search algorithm that was developed by Jang and Yang [4] has the best simplification ratio among the previous algorithms, 15.85%, which corresponds to their best reduction of the area of coefficient realization. The proposed algorithm reduces the combinational logic area by 22.91% and the total filter area by 15.57%. Both of these results are the best achieved using any algorithm. Tables 3 and 4 also reveal that the proposed algorithm reduces the area of the filter more than does any other.
As more common factors are extracted using the search algorithms, the path delay (or the data arrival time) increases. After the search algorithms are applied, the realizations with coefficients in CSD notation have shorter path delays than those with coefficients in original binary notation. The throughputs are affected in the same way as the path delays. In Table 5, all of the throughput/area ratios are reduced by realizing the constant multiplications with the coefficients in CSD notation. The proposed algorithm increases the throughput/area ratio by 2.84%, which is the largest increase of any algorithm. In Table 7, the coefficient simplifications using the proposed algorithm increase the throughput/area ratio by 16.69% over the original coefficient simplifications using Synopsys DC. Tables 6 and 8 also reveal that the proposed algorithm has the best simplification ratio of any of the compared algorithms.
Previously proposed search algorithms show different strengths in simplifying the coefficients of the four filters described above. All of the algorithms effectively reduce the area cost of realizing the FIR filters; most of them sacrifice throughput by increasing the path delays. The proposed CCSE algorithm achieves the largest reduction of filter area while also providing a higher throughput than the other algorithms. According to experimental observations, the proposed algorithm performs best not only in area reduction but also in the throughput/area ratio.
In addition, the layout realizations of the 48-, 121-, 105-, and 325-tap filters are utilized to reveal the effectiveness of the proposed algorithm. The Cadence SOC Encounter software is utilized to place and route the designed filters. Each filter has 40 I/O pins. The synthesis and layout processes utilize TSMC 0.18 μm mixed-signal RF 1P6M CMOS technology. Table 9 shows the placement and routing information of the four designed filters. The die size of the 48-tap CDMA filter is about 0.16 mm^{2} and the total gate count of the filter is approximately 12.5 K. The filter can operate at 72 MHz with 7.531 mW power dissipation. Figure 4 shows the layout of the 48-tap filter with 40 I/O pins. The 325-tap filter has the largest chip size and can operate at 55 MHz with an area of 87.68 K gates.
5. Conclusions
In summary, coefficient simplifications by the search algorithms are useful in reducing the combinational logic area and the total filter area. Previously developed search algorithms provide area reductions but sacrifice the throughput/area ratio. The proposed CCSE algorithm offers the best area reduction and the best throughput/area ratio. The main feature of the proposed algorithm is that the common subexpressions are not eliminated from the coefficients after they are extracted; instead, they are retained as new variables so that new common factors, generated by combining existing common factors, can be extracted to maximize the area reduction. Experimental results further reveal that the proposed algorithm performs best.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
The authors would like to thank the Ministry of Education of the Republic of China, Taiwan, for financially supporting this research under Contract no. C30205. Ted Knoy is appreciated for his editorial assistance.