#### Abstract

Optimization for power is one of the most important design objectives in modern digital signal processing (DSP) applications. The digital finite duration impulse response (FIR) filter is one of the most essential components of DSP, and researchers have consequently carried out extensive work on the power optimization of such filters. Data-driven clock gating (DDCG) and multibit flip-flops (MBFFs) are two low-power design methods that are usually treated separately. Combining these methods into a single algorithm enables further power savings in the FIR filter. The experimental results show that the proposed FIR filter achieves power consumption reductions of 25% and 22% compared with the conventional design.

#### 1. Introduction

Power consumption is an important issue when designing electronic devices such as mobile phones. Power in digital electronic circuits can be divided into static, dynamic, leakage, and short-circuit power. The main advantage of CMOS VLSI circuits is their low static power, so dynamic power is the dominant component. Dynamic power consumption is mainly due to the high switching rate of the clock signal. The finite impulse response (FIR) filter, in turn, is widely used as a critical component in digital signal processing (DSP) hardware because of its guaranteed linear phase and stability. FIR filters perform key operations in various recent mobile computing and portable multimedia applications such as high-efficiency video coding (HEVC), channel equalization, speech processing, and software-defined radio (SDR). This has pushed designers to search for new methods to achieve low power consumption in the FIR filter. In several applications, such as the SDR channelizer, the FIR filters need to be implemented in reconfigurable hardware [1, 2]. In [3, 4], the authors minimized the power consumption of the FIR filter by reducing the filter coefficients without modifying its order. In [5], an approximate signal processing technique is used. In several approaches, the structure of the filter is simplified by add and shift operations. For low-power architectures, many techniques are used [6]. An integer linear programming (ILP) approach to designing optimal finite-wordlength linear-phase FIR filters in the logarithmic number system (LNS) domain has been proposed in [7], and different input wordlengths and numbers of filter taps are adopted in [8]. In [9], a reduced dynamic signal representation technique is used. In [10], a reversible technique has been used. A memristor-based FIR filter has been proposed in [11]. In [12], a multibit flip-flop (MBFF) technique has been introduced for FIR power optimization. In [13], the data-driven clock gating (DDCG) technique has been used for digital filter power optimization.

Several works have been proposed in the last decade using the clock gating technique for digital filters. In [14], data-driven clock gating for digital filters has been implemented.

In this study, we propose combining the MBFF and DDCG techniques in a single algorithm applied to an appropriate structure of the FIR filter for power saving.

The remainder of this paper is organized as follows. Section 2 presents the background of the existing FIR filter. Section 3 discusses power optimization by the clock gating technique. Section 4 analyzes power optimization by the MBFF technique. Section 5 introduces clock gating into MBFFs, and Section 6 describes the implementation methodology. Section 7 presents and discusses the results for the proposed FIR filter. Finally, conclusions are drawn in Section 8.

#### 2. Background and Existing FIR Filters

An *N*-th order FIR filter performs an *N*-point linear convolution of the input sequence with the filter coefficients for each new input sample. The input-output relation of the linear time-invariant (LTI) FIR filter can be expressed as the following equation:

$$y(n) = \sum_{k=0}^{N-1} h_k \, x(n-k)$$

where *N* represents the length of the filter, *h*_{k} is the coefficient, and *x*(*n* − *k*) are the input data at time instant *n* − *k*.

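The *z* transform of the data output is

$$Y(z) = H(z)\,X(z)$$

where *X*(*z*) is the *z* transform of the input data and *H*(*z*) is the transfer function of the filter given by

$$H(z) = \sum_{k=0}^{N-1} h_k \, z^{-k}$$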

FPGA implementations, however, come at a cost in speed, power, and overhead compared to ASICs. The improvement in filter performance achievable through algorithm reformulation is limited by the generalized reconfigurable nature of the hardware. For this reason, several architectures have been proposed in recent years.

A filter can be implemented in the direct form (DF) or the transposed direct form (TDF). Figures 1 and 2 present, respectively, the structure of the DF and the TDF of the FIR filter. The transposed form and the direct form of an FIR filter are equivalent. It is easy to prove that, in the direct form, as shown in Figure 1, the wordlength of each delay element is equal to the wordlength of the input signal. However, in the transposed form, each delay element has a longer wordlength than that in the direct form; moreover, the delay elements are used to delay the product or sum of products. The transposed structure reduces the critical path delay, but it uses more hardware. In the critical path, there are 1 multiplier + (*M* − 1) adders in the DF but only 1 multiplier + 1 adder in the TDF. The improvement on performance is more observable for large *M*.
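The equivalence of the two forms can be checked with a small behavioural model. The following is a minimal sketch, not the authors' implementation: the function names `fir_direct` and `fir_transposed` are hypothetical, and plain Python integers stand in for the fixed-wordlength hardware signals. In the direct form the registers hold past input samples, whereas in the transposed form they hold partial sums of products, which is why the latter needs longer wordlengths but only one adder per register stage.

```python
def fir_direct(h, x):
    """Direct form: the delay line stores past inputs x(n), x(n-1), ..."""
    delay = [0] * len(h)
    y = []
    for sample in x:
        delay = [sample] + delay[:-1]          # shift the input delay line
        # one multiplier per tap followed by a chain of M - 1 adders
        y.append(sum(hk * xk for hk, xk in zip(h, delay)))
    return y


def fir_transposed(h, x):
    """Transposed form: the registers store partial sums of products."""
    m = len(h)
    state = [0] * (m - 1)                      # partial-sum registers
    y = []
    for sample in x:
        products = [hk * sample for hk in h]   # all taps multiply the same input
        y.append(products[0] + (state[0] if m > 1 else 0))
        # each register takes its product plus the old value of the next register
        state = [
            products[k + 1] + (state[k + 1] if k + 1 < m - 1 else 0)
            for k in range(m - 1)
        ]
    return y
```

Both functions return the same output sequence for the same coefficients and input; only the content of the registers, and hence the critical path between registers, differs.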

In VLSI implementation, the TDF is preferred over the direct form because of its inherently pipelined accumulation section. A TDF filter consists of two modules: multiple constant multiplication (MCM) and product accumulation, which together produce the filter output. In the last decade, a lot of effort has been put into reducing the complexity of the MCM module, whereas the product accumulation module is often ignored. In [14], Xin et al. proposed a novel structure for area-efficient implementation of FIR filters by replacing some of the long-wordlength structural adders (SAs) with shorter-wordlength SAs. Figure 2 shows the proposed transposed direct form FIR filter architecture.

The number of adders can be estimated as [15]

Figure 3 shows the proportion of the MCM block and the accumulation block in terms of full adders (FAs). The MCM block consumes about 10% of the total FAs, while the accumulation block consumes about 90%. In [16], Proakis et al. prove that if the phase of an *M*-tap filter is linear, the coefficients are symmetric or antisymmetric:

$$h_k = \pm h_{M-1-k}, \qquad k = 0, 1, \ldots, M-1$$

If *M* is odd, the structure in Figure 1 can be improved to save hardware cost, as shown in Figure 3. The structure in Figure 3 uses only (*M* + 1)/2 multipliers, a reduction of almost 50% for large *M*, and uses the same number of adders as the structure in Figure 2. Since multipliers consume the most area in the filter, the optimization based on the symmetric structure can reduce power dissipation.
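The multiplier saving from coefficient symmetry can be illustrated with a short behavioural sketch. It assumes an odd number of taps *M* and symmetric (not antisymmetric) coefficients; the function name `fir_symmetric` and the argument `h_half`, which holds only the first (*M* + 1)/2 coefficients, are hypothetical names introduced for this example.

```python
def fir_symmetric(h_half, x):
    """Odd-length symmetric FIR filter using (M + 1) / 2 multiplications per output.

    h_half = [h_0, ..., h_c] with c = (M - 1) / 2; the remaining coefficients
    follow from the symmetry h_k = h_(M-1-k).
    """
    c = len(h_half) - 1
    m = 2 * c + 1
    delay = [0] * m
    y = []
    for sample in x:
        delay = [sample] + delay[:-1]
        acc = h_half[c] * delay[c]                 # centre tap: one multiplication
        for k in range(c):
            # pre-add the two samples that share a coefficient, then multiply once
            acc += h_half[k] * (delay[k] + delay[m - 1 - k])
        y.append(acc)
    return y
```

For a 5-tap filter with coefficients 1, 2, 3, 2, 1, calling `fir_symmetric([1, 2, 3], x)` performs three multiplications per output sample instead of five, while the number of additions stays at four.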

The canonical signed digit (CSD) arithmetic can be used for reducing the area and power for the *M*-tap filter. Transposed structure and symmetry structure can be used to improve the delay and cost. Figure 4 shows the transposed symmetric structure of the FIR filter with *M* being odd.
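As an illustration of why CSD helps, the sketch below converts an integer coefficient into CSD digits. It is a generic textbook-style conversion rather than anything specific to the filters discussed here, and the function name `to_csd` is hypothetical. Each nonzero digit corresponds to one shifted addition or subtraction in a shift-and-add multiplier, and no two adjacent digits are nonzero.

```python
def to_csd(value):
    """Return the CSD digits of an integer (each -1, 0, or +1), least significant first."""
    digits = []
    while value != 0:
        if value & 1:
            # choose +1 when value % 4 == 1 and -1 when value % 4 == 3,
            # so that the next digit is guaranteed to be zero
            digit = 2 - (value & 3)
            value -= digit
        else:
            digit = 0
        digits.append(digit)
        value >>= 1
    return digits
```

For example, the coefficient 23 is 10111 in binary (four nonzero bits), while its CSD form 32 − 8 − 1 has three nonzero digits, so one adder is saved in the corresponding constant multiplier.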

Therefore, in this paper, we focus on the optimization of the accumulation block. If we can reduce the power of the accumulation block, we can reduce the power consumption of the filter.

In [8], Xin et al. have expressed (5) as

Figure 5 shows the structure of the filter with an odd order *N*. A similar structure as in Figure 5, without the tap can be obtained for an even order *N*.

The structure in Figure 5 achieves average power reductions of 41% and 23.5% over the structures in [16] and [17], respectively. In the rest of this paper, we will use the filter structure in Figure 5.

#### 3. Power Optimization by Clock Gating Technique

Clock gating is a technique that reduces the switching power dissipation of the clock signals.

When the present and the next states of the *D* flip-flop are observed, it can be seen that when two consecutive inputs are identical, the output of the *D* flip-flop does not change. Even if the inputs do not change from one clock cycle to the next, the flip-flop still consumes clock power.

At each clock edge, the clock controller block performs the following steps:

(i) It checks whether the input data of the FIR block in the current clock period *T*_{n} are identical to the input data in the previous clock period *T*_{n−1}.

(ii) If the condition in (i) holds, the clock controller disables the clock signal of the FIR filter.
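The behaviour of the clock controller can be sketched with a small behavioural model that counts how many clock pulses actually reach a register bank. This is an illustrative assumption about the controller, not the authors' circuit; the function name `clock_pulses` and the flag `gated` are hypothetical.

```python
def clock_pulses(data, gated):
    """Count the clock pulses delivered to a register bank fed with `data`.

    With gated=True, the clock is enabled only when the incoming word differs
    from the stored word, which is the comparison a data-driven clock gater makes.
    """
    stored = None          # register contents are unknown before the first load
    pulses = 0
    for word in data:
        enable = (not gated) or (word != stored)
        if enable:
            pulses += 1
            stored = word  # the register captures the new word on this pulse
    return pulses
```

For the input sequence 3, 3, 3, 5, 5, 7, 7, 7, the ungated register bank receives eight clock pulses and the gated one receives three, while the stored value after each cycle is the same in both cases.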

The clock gating methods can be classified into three groups:

- Synthesis-based methods: the clock enable signals are synthesized based on the logic of the underlying system.
- Data-driven methods: the clock gating system compares the data values and generates the clock signal.
- Autogating flip-flops: this method is simple, but it yields only small power savings.

#### 4. Power Optimization by MBFF Technique

Flip-flops (FFs) are commonly used in digital systems to store data. To reduce the clock power, several FFs can be grouped into a module called a multibit FF (MBFF), which replaces the individual flip-flops [18]. Each flip-flop contains two inverters that generate opposite-phase clock signals. The use of MBFFs was proposed for optimizing clock delay, controlling clock skew, and improving routing resource utilization. The MBFF can be used at the RTL design level to reduce the clock-to-Q propagation delay (*t*_{pCQ}). The driving capability of a clock buffer can be evaluated by the number of minimum-sized inverters that it can drive for a given rise or fall time. Because of this driving capability, several flip-flops can share a common clock buffer to avoid unnecessary power waste. Figure 6 shows the block diagrams of 1- and 2-bit flip-flops. If we replace the two 1-bit flip-flops shown in Figure 6(a) with a 2-bit flip-flop as shown in Figure 6(b), the total power consumption can be reduced because the two flip-flops share the same clock buffer.

**Figure 6:** Block diagrams of (a) two 1-bit flip-flops and (b) a 2-bit flip-flop.

The best grouping of FFs that minimizes the energy consumption has been explained in [19].

#### 5. Introducing Clock Gating into MBFF

This approach consists of maximizing clock deactivation at the gate level, where the clock signal driving a flip-flop is deactivated (gated) when the flip-flop state will not change in the next clock cycle. Since the clock can be disabled only when the inputs to all the flip-flops in a group do not change, it is beneficial to group flip-flops whose switching activities are highly correlated and to derive a joint enable signal for them.
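The benefit of grouping correlated flip-flops can be illustrated with a small sketch. It assumes that the joint enable signal is the OR of the per-bit comparisons between the new and the stored data, which is a common way to build a data-driven gater; the function name `group_enable_rate` and the example bit streams are hypothetical.

```python
def group_enable_rate(streams):
    """Fraction of clock cycles in which a group of FFs sharing one gated clock is clocked.

    `streams` is a list of equal-length bit sequences, one per flip-flop.
    """
    cycles = len(streams[0])
    enabled = 0
    for t in range(1, cycles):
        # joint enable: OR over the per-bit XOR of the new and the stored value
        if any(s[t] != s[t - 1] for s in streams):
            enabled += 1
    return enabled / (cycles - 1)


# Bits a and b toggle in the same cycles, as do bits c and d.
a = [0, 1, 1, 0, 0, 0, 0, 0, 0]
b = [1, 0, 0, 1, 1, 1, 1, 1, 1]
c = [0, 0, 0, 0, 0, 1, 1, 0, 0]
d = [1, 1, 1, 1, 1, 0, 0, 1, 1]

correlated = (group_enable_rate([a, b]), group_enable_rate([c, d]))
uncorrelated = (group_enable_rate([a, c]), group_enable_rate([b, d]))
```

With the correlated grouping, each shared clock is enabled in a quarter of the cycles; with the uncorrelated grouping, each shared clock is enabled in half of the cycles, so fewer clock pulses are suppressed.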

If *p* denotes the probability that the data toggle in a clock cycle, the energy *E*_{1} consumed by a 1-bit FF can be expressed as follows:

We denote *λ*_{1} and *λ*_{k} as, respectively, the energy of the FF’s and MBFF’s internal clock drivers and as, respectively, the data toggling energy of the FFs and per-bit data toggling energy of MBFFs

In [19], the authors have shown that the expected energy is

The relation (8) represents the worst case. In practice, correlation between the toggling of the grouped FFs yields higher energy savings. The energy saving potential of a *k*-MBFF can be expressed as follows:

For *p* = 0, the energy savings are approximately 35% for the 2-MBFF and 55% for the 4-MBFF. Figure 7 illustrates a filter architecture with DDCG integrated into a *k*-MBFF. The group size *k* that maximizes the energy savings solves the following equation:

where *C*_{FF} is the clock input load of an FF and *C*_{latch} is the clock input load of the latch.
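The trade-off behind the choice of group size can be explored numerically. The sketch below is based on a simplified savings model that is an assumption of this example and is not taken from the text: when *k* FFs with independent toggling probability `toggle_prob` share one gater, the clock load saved per FF is taken as `c_ff * (1 - toggle_prob)**k - c_latch / k`, that is, the FF clock load weighted by the probability that the whole group is idle, minus the latch load shared among the *k* FFs. All function and parameter names are hypothetical.

```python
def savings_per_ff(k, toggle_prob, c_ff, c_latch):
    """Assumed model: clock load saved per FF when k FFs share one clock gater."""
    idle_prob = (1.0 - toggle_prob) ** k      # probability that no FF in the group toggles
    return c_ff * idle_prob - c_latch / k     # gating benefit minus shared latch overhead


def best_group_size(toggle_prob, c_ff, c_latch, k_max=64):
    """Integer group size in 1..k_max that maximizes the assumed per-FF savings."""
    return max(
        range(1, k_max + 1),
        key=lambda k: savings_per_ff(k, toggle_prob, c_ff, c_latch),
    )
```

Under this assumed model, a larger group amortizes the latch overhead better but is idle less often, so a higher toggling probability favours smaller groups. Setting the derivative of the model with respect to *k* to zero gives the continuous optimum, which the integer search above approximates.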

#### 6. Implementation Methodology

Several design methodologies have been proposed in the literature [20–22] that differ according to system requirements, abstraction level, model refinements, and other design issues. In this paper, the platform-based design (PBD) methodology has been used. This methodology allows designers to work at high levels of abstraction.

PBD [23–26] is a design methodology that was proposed to decrease time to market and enhance product reuse. This methodology uses neither the bottom-up nor the top-down view; it is defined as a “meeting in the middle” process where refinements of specifications meet with abstractions of potential implementations. It also allows the system to be designed at high levels of abstraction without distinguishing between hardware and software tasks. After defining the system specifications and requirements, the designer defines the parts that will be implemented as hardware components, the parts that will be implemented as software running on the component, and the parts realized with reconfigurable hardware.

#### 7. Results and Analysis

We have used a speech signal for testing the proposed FIR filter. Several parameters are also used in the following discussion. As a metric of power savings, we use the power consumption ratio *P*_{r}, defined as the ratio of the proposed filter power consumption to the conventional filter power consumption:

$$P_r = \frac{P_{\text{proposed}}}{P_{\text{conventional}}}$$

The leakage power saving ratio *P*_{sl} is expressed as

$$P_{sl} = 1 - P_{rl}$$

where *P*_{rl} is the consumption ratio of the leakage power. The dynamic power saving ratio *P*_{sd} is expressed as

$$P_{sd} = 1 - P_{rd}$$

where *P*_{rd} is the consumption ratio of the dynamic power. Similarly,

$$P_{st} = 1 - P_{rt}$$

where *P*_{st} is the total power saving ratio and *P*_{rt} is the consumption ratio of the total power.

We compare the proposed structural FIR filter, implemented using data-driven clock gating and multibit flip-flops, with the conventional FIR filter. Table 1 presents the leakage power comparison, in milliwatts, for random and speech input signals. The proposed structural reconfigurable FIR filter consumes less power than the conventional FIR filter design.

The specifications of the implemented FIR filters are as follows:

- 16-bit input sequence data and coefficients
- Data range of [−1, 1]
- 16-bit multipliers
- 24-bit digital output signal

Speech corpus database in the wav format from ITU-T P.50 [29] and NOIZEUS [30] has been used.

Table 2 shows the dynamic power comparison, in milliwatts, for random and speech input signals. The proposed FIR filter consumes less power than the classic FIR filter architectures. Table 3 shows the total power comparison, in milliwatts, for random and speech input signals. The total power of the proposed FIR filter is lower than that of the conventional design.

As a measure of filter performance degradation, we use the mean square error (MSE) between the proposed reconfigurable filter output and the original filter output. Table 4 presents the comparison in terms of MSE for the random and speech signals. The MSE is reduced compared to the conventional works:

$$\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( S - S_i \right)^2$$

where *n*, *S*, and *S*_{i} are the number of samples, the expected output, and the proposed output, respectively.

The signal power to mean square error ratio (SMR) [31] is defined as the ratio of the desired signal power to the distorted error signal power.
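Both quality metrics can be computed directly from two output sequences. The following is a minimal sketch with hypothetical function names `mse` and `smr_db`; expressing the SMR in decibels is an assumption of this example, since the text defines it only as a power ratio.

```python
import math


def mse(expected, actual):
    """Mean square error between the expected and the proposed filter outputs."""
    return sum((s - si) ** 2 for s, si in zip(expected, actual)) / len(expected)


def smr_db(expected, actual):
    """Signal power to mean square error ratio, expressed in decibels (assumed unit)."""
    signal_power = sum(s * s for s in expected) / len(expected)
    error_power = mse(expected, actual)
    if error_power == 0.0:
        return math.inf                        # identical outputs: no distortion
    return 10.0 * math.log10(signal_power / error_power)
```

A higher SMR indicates that the output of the proposed filter stays closer to the expected output relative to the signal level.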

Table 5 shows the comparison in terms of SMR for both existing and proposed FIR filter designs. Compared to the existing designs, the proposed reconfigurable FIR filter shows a decrease in SMR as the number of taps increases.

Examples of simulated gain versus frequency for a low-pass FIR filter with two cutoff frequencies, 100 Hz and 5 kHz, are shown in Figures 8(a) and 8(b). The proposed architecture is also implemented on a Virtex-5 FPGA (XC5VLX110T-FF136). Table 6 lists the resources occupied by the implementation of the FIR filter with and without the proposed method. The results show that the method does not degrade other design parameters such as the hardware resources occupied.

**Figure 8:** Gain versus frequency of the low-pass FIR filter for the two cutoff frequencies, panels (a) and (b).

The proposed FIR filter designed with multibit flip-flops and clock gating has advantages in terms of area, delay, and power consumption. Therefore, its performance is high compared to the FIR filter designed with single-bit flip-flops. In this architecture, we have used the carry-lookahead adder and the array multiplier to implement the FIR filter. For example, for the 9-tap FIR filter, the output is conventionally obtained at the 8th clock pulse, whereas with the proposed technique it is obtained at the 2nd clock pulse. Therefore, the delay is reduced, the speed of the circuit is increased, and the power consumption is reduced thanks to the multibit flip-flops.

The proposed method has also been synthesized using TSMC 0.25 µm CMOS technology. Power consumption is measured in SPICE-level simulations using *nanosim* [33] at an operating frequency of 100 MHz. In Table 7, the proposed architecture is compared with previous works [5, 32] in terms of power saving and MSE. The proposed filter shows greater power savings than the filters in [5, 32].

#### 8. Conclusion

In this paper, a novel architecture of the FIR filter is proposed. The proposed design is based on the combination of data-driven clock gating (DDCG) and multibit flip-flops (MBFFs) applied to an appropriate FIR filter structure. To compare the power consumption of the conventional and the proposed FIR filters, two types of inputs are used: a speech signal and a random signal. In the proposed filter structure, the leakage power, the dynamic power, and the total power are all reduced. The proposed methods allow a power saving of up to 22%. The design results show that the loss in resources is less than 10%, which is considered negligible compared to the gain in power consumption.

#### Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

#### Conflicts of Interest

The authors declare that they have no conflicts of interest.