Research Article  Open Access
Lamjed Touil, Abdelaziz Hamdi, Ismail Gassoumi, Abdellatif Mtibaa, "Design of Low-Power Structural FIR Filter Using Data-Driven Clock Gating and Multibit Flip-Flops", Journal of Electrical and Computer Engineering, vol. 2020, Article ID 8108591, 9 pages, 2020. https://doi.org/10.1155/2020/8108591
Design of Low-Power Structural FIR Filter Using Data-Driven Clock Gating and Multibit Flip-Flops
Abstract
Optimization for power is one of the most important design objectives in modern digital signal processing (DSP) applications. The digital finite duration impulse response (FIR) filter is considered to be one of the most essential components of DSP, and consequently extensive work has been carried out by researchers on the power optimization of these filters. Data-driven clock gating (DDCG) and multibit flip-flops (MBFFs) are two low-power design methods that are widely used but often treated separately. Combining these methods into a single algorithm enables further power savings in the FIR filter. The experimental results show that the proposed FIR filter achieves 25% and 22% power consumption reduction compared to the conventional design.
1. Introduction
Power consumption is an important issue when designing electronic devices such as mobile phones. Power in digital electronic circuits can be classified as static, dynamic, leakage, and short-circuit power. The main advantage of CMOS VLSI circuits is their low static power, and dynamic power is the largest contributor of them all. Dynamic power consumption is mainly due to the high switching rate of the clock signal. On the other hand, the finite impulse response (FIR) filter is widely used as a critical component in several digital signal processing (DSP) hardware circuits because of its guaranteed linear phase and stability. These circuits perform key operations in various recent mobile computing and portable multimedia applications such as high-efficiency video coding (HEVC), channel equalization, speech processing, and software-defined radio (SDR). This has pushed designers to search for new methods to achieve low power consumption in the FIR filter. In several applications, such as the SDR channelizer, there is a need to implement the FIR filters in reconfigurable hardware [1, 2]. In [3, 4], the authors minimized the power consumption of the FIR filter by reducing the filter coefficients without modifying its order. In [5], an approximate signal processing technique is used. In several approaches, the structure of the filter is simplified by add and shift operations. For low-power architectures, many techniques are used [6]. An integer linear programming (ILP) approach to design optimal finite word length linear-phase FIR filters in the logarithmic number system (LNS) domain has been proposed in [7], and different input word lengths and filter taps are adopted in [8]. In [9], a reduced dynamic signal representation technique is used. In [10], a reversible technique has been used. A memristor-based FIR filter has been proposed in [11]. In [12], a multibit flip-flop (MBFF) technique has been introduced for FIR power optimization.
In [13], the data-driven clock gating (DDCG) technique has been used for digital filter power optimization.
Several works have been proposed in the last decade using the clock gating technique for digital filters. In [14], data-driven clock gating for digital filters has been implemented.
In this study, we propose combining the MBFF and DDCG techniques in a single algorithm applied to an appropriate structure of the FIR filter for power saving.
The remainder of this paper is organized as follows. Section 2 presents the background of the existing FIR filter. Section 3 discusses power optimization by the clock gating technique. Section 4 analyzes power optimization by the MBFF technique. Section 5 introduces clock gating into the MBFF, and Section 6 describes the implementation methodology. Section 7 presents the results and discussion of the proposed FIR filter. Finally, conclusions are drawn in Section 8.
2. Background and Existing FIR Filters
An Nth-order FIR filter performs an N-point linear convolution of the input sequence with the filter coefficients for each new input sample. The output of the linear time-invariant (LTI) FIR filter can be expressed as the following equation:

$$y(n) = \sum_{k=0}^{N-1} h_{k}\, x(n-k), \quad (1)$$

where N represents the length of the filter, $h_{k}$ is the kth coefficient, and $x(n-k)$ is the input data at time instant $n-k$.
The z transform of the output data is

$$Y(z) = H(z)\, X(z), \quad (2)$$

where H(z) is the transfer function of the filter given by

$$H(z) = \sum_{k=0}^{N-1} h_{k}\, z^{-k}. \quad (3)$$
However, FPGA implementation comes at the cost of speed, power, and overhead compared to ASICs. The improvement of the filter performance by algorithm reformulation is limited by the generalized reconfigurable nature of the device. For this reason, several architectures have been proposed in recent years.
A filter can be implemented in the direct form (DF) or the transposed direct form (TDF). Figures 1 and 2 present, respectively, the structure of the DF and the TDF of the FIR filter. The transposed form and the direct form of an FIR filter are equivalent. It is easy to show that, in the direct form, as shown in Figure 1, the word length of each delay element is equal to the word length of the input signal. However, in the transposed form, each delay element has a longer word length than that in the direct form, because the delay elements are used to delay a product or a sum of products. The transposed structure reduces the critical path delay, but it uses more hardware. The critical path contains 1 multiplier + (M − 1) adders in the DF but only 1 multiplier + 1 adder in the TDF. The performance improvement is more noticeable for large M.
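The equivalence of the two forms can be checked with a small behavioural model. The following is a minimal sketch in Python, not the authors' hardware description; the function names `fir_direct` and `fir_transposed` and the integer test data are illustrative assumptions. The direct form delays input samples, whereas the transposed form delays partial sums of products.

```python
def fir_direct(h, x):
    """Direct form: the delay line stores past input samples."""
    delay = [0] * len(h)                      # x(n), x(n-1), ..., x(n-M+1)
    out = []
    for sample in x:
        delay = [sample] + delay[:-1]         # shift the new sample in
        out.append(sum(c * d for c, d in zip(h, delay)))
    return out


def fir_transposed(h, x):
    """Transposed direct form: registers store partial sums of products."""
    m = len(h)
    reg = [0] * (m - 1)                       # registers between the adders
    out = []
    for sample in x:
        products = [c * sample for c in h]    # all constant multiplications
        y = products[0] + (reg[0] if m > 1 else 0)
        # Each register loads its product plus the old value of its neighbour.
        reg = [products[k + 1] + (reg[k + 1] if k + 1 < m - 1 else 0)
               for k in range(m - 1)]
        out.append(y)
    return out


if __name__ == "__main__":
    h = [1, 2, 3]
    x = [1, 0, 0, 4]
    print(fir_direct(h, x), fir_transposed(h, x))
```

In the transposed model, each output needs only one multiplication and one addition after the registers, which mirrors the shorter critical path described above.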
In VLSI implementation, the TDF is preferred over the direct form due to its inherently pipelined accumulation section. A TDF consists of two modules: multiple constant multiplication (MCM) and product accumulation to produce the filter output. In the last decade, a lot of effort has been put into reducing the complexity of the MCM module. However, the product accumulation module is often ignored. In [14], Xin et al. proposed a novel structure for area-efficient implementation of FIR filters by replacing some of the long word length structural adders (SAs) with shorter word length SAs. Figure 2 shows the proposed transposed direct form FIR filter architecture.
The number of adders can be estimated as [15]
Figure 3 shows the proportion of the MCM block and the accumulation block in terms of FAs (full adders). MCM consumes about 10% of the total FAs, while the accumulation block consumes about 90%. In [16], Proakis and Manolakis prove that if the phase of the filter is linear, the coefficients are symmetric or antisymmetric:

$$h_{k} = \pm h_{M-1-k}, \quad k = 0, 1, \ldots, M-1, \quad (5)$$

where $h_{k}$ is the kth filter coefficient and M is the number of taps.
If M is odd, the structure in Figure 1 can be improved to save hardware cost, as shown in Figure 3. The structure in Figure 3 uses only (M + 1)/2 multipliers, a reduction of almost 50% for large M, and uses the same number of adders as the structure in Figure 2. Since multipliers consume the most area in the filter, the optimization based on the symmetric structure can reduce power dissipation.
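The multiplier saving of the symmetric structure can be illustrated with a short behavioural sketch. This is an assumption-laden Python model, not the structure of the figures: the function name `fir_symmetric` is hypothetical, and only the symmetric case with odd M is covered. Samples that share a coefficient are added first, so each output needs (M + 1)/2 multiplications.

```python
def fir_symmetric(h_half, x):
    """Symmetric M-tap FIR, M odd; h_half holds the first (M + 1) / 2 coefficients."""
    half = len(h_half)
    m = 2 * half - 1                          # total number of taps
    delay = [0] * m
    out = []
    for sample in x:
        delay = [sample] + delay[:-1]
        acc = h_half[-1] * delay[half - 1]    # centre tap has no partner
        for k in range(half - 1):
            # Pre-add the two samples that share coefficient k, then multiply once.
            acc += h_half[k] * (delay[k] + delay[m - 1 - k])
        out.append(acc)
    return out


if __name__ == "__main__":
    # Impulse response of the 5-tap filter with coefficients 1, 2, 3, 2, 1.
    print(fir_symmetric([1, 2, 3], [1, 0, 0, 0, 0]))
```

For the 5-tap example, three multiplications per output replace the five of the plain direct form.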
The canonical signed digit (CSD) arithmetic can be used for reducing the area and power of the M-tap filter. The transposed structure and the symmetric structure can be used to improve the delay and cost. Figure 4 shows the transposed symmetric structure of the FIR filter with M being odd.
Therefore, in this paper, we are interested in the optimization of the accumulation block. If we can reduce the power of the accumulation block, we can reduce the power consumption of the filter.
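As an illustration of why CSD helps, the sketch below converts a non-negative integer coefficient to CSD digits in {−1, 0, +1}. It uses the standard conversion rule and is not taken from the paper; the names `to_csd` and `csd_value` are hypothetical. Each nonzero digit corresponds to one shifted add or subtract, so fewer nonzero digits means fewer adders in a shift-and-add multiplier.

```python
def to_csd(value):
    """Return the CSD digits of a non-negative integer, least significant first."""
    digits = []
    while value != 0:
        if value & 1:
            digit = 2 - (value & 3)   # +1 if value % 4 == 1, -1 if value % 4 == 3
            value -= digit
        else:
            digit = 0
        digits.append(digit)
        value >>= 1
    return digits


def csd_value(digits):
    """Rebuild the integer from CSD digits (least significant first)."""
    return sum(d * (1 << i) for i, d in enumerate(digits))


if __name__ == "__main__":
    # 7 = 0111 in binary (three nonzero bits) becomes 8 - 1 (two nonzero digits).
    print(to_csd(7), csd_value(to_csd(7)))
```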
In [8], Xin et al. have expressed (5) as
Figure 5 shows the structure of the filter with an odd order N. A similar structure as in Figure 5, without the tap can be obtained for an even order N.
The structure in Figure 5 achieves an average power reduction of 41% and 23.5% over the designs in [16] and [17], respectively. In the rest of this paper, we will use the filter structure in Figure 5.
3. Power Optimization by Clock Gating Technique
Clock gating is a technique that reduces the switching power dissipation of the clock signals.
When the present and next states of a D flip-flop are observed, it can be seen that when two consecutive inputs are identical, the output of the D flip-flop does not change. Even if the inputs do not change from one clock cycle to the next, the latch still consumes clock power.
At each clock edge, the clock controller block operates as follows:
(i) It verifies whether the input data of the FIR block at the current clock period T_{n} are identical to the input data at the previous clock period T_{n−1}.
(ii) If (i) is verified, the clock controller disables the clock signal of the FIR filter.
Clock gating methods can be classified into three groups:
(i) Synthesis-based methods: the clock enable signals are synthesized based on the logic of the underlying system.
(ii) Data-driven methods: the clock gating circuit compares the data values and generates the gated clock signal.
(iii) Auto-gated flip-flops: this method is simple but yields only small power savings.
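The behaviour of the clock controller can be sketched as a cycle-level model. This is a simplified illustration under stated assumptions, not the authors' circuit: the function name `gated_clock_pulses`, the 16-bit width, and the reset value of zero are all hypothetical. The register receives a clock pulse only when an XOR comparison shows that the new data differ from the stored data.

```python
def gated_clock_pulses(samples, width=16):
    """Count clock pulses seen by a register without and with data-driven gating."""
    mask = (1 << width) - 1
    stored = 0                            # register contents after reset (assumed 0)
    ungated = 0
    gated = 0
    for sample in samples:
        d = sample & mask
        ungated += 1                      # a free-running clock toggles every cycle
        enable = (d ^ stored) != 0        # XOR comparator: does any bit differ?
        if enable:
            gated += 1                    # the pulse passes the clock gate
            stored = d
    return ungated, gated


if __name__ == "__main__":
    # Slowly varying input: only three of six cycles need a clock pulse.
    print(gated_clock_pulses([5, 5, 5, 7, 7, 0]))
```

Signals with long runs of repeated values benefit most, since every repeated sample suppresses one clock pulse in this model.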
4. Power Optimization by MBFF Technique
Flip-flops (FFs) are usually used in digital systems to store data. To reduce the clock power, several FFs can be grouped into a module called a multibit FF (MBFF), replacing several flip-flops [18]. Each flip-flop contains two inverters which generate opposite-phase clock signals. The use of MBFFs was proposed for optimizing clock delay, controlling clock skew, and improving routing resource utilization. The MBFF can be used at the RTL design level to reduce the clock-to-Q propagation delay (t_{pCQ}). The driving capability of a clock buffer can be evaluated by the number of minimum-sized inverters that it can drive for a given rise or fall time. Because of this, several flip-flops can share a common clock buffer to avoid unnecessary power waste. Figure 6 shows the block diagrams of 1-bit and 2-bit flip-flops. If we replace the two 1-bit flip-flops shown in Figure 6(a) by the 2-bit flip-flop shown in Figure 6(b), the total power consumption can be reduced because the two 1-bit flip-flops can share the same clock buffer.
Figure 6: Block diagrams of (a) two 1-bit flip-flops and (b) a 2-bit flip-flop.
The best grouping of FFs that minimizes the energy consumption has been explained in [19].
5. Introducing Clock Gating into MBFF
This approach consists of maximizing the clock deactivation at the gate level, where the clock signal driving a flip-flop is deactivated (gated) when the flip-flop state is not subject to a change in the next clock cycle. Since the clock can be disabled only when the inputs to all the flip-flops in a group do not change, it is beneficial to group flip-flops whose switching activities are highly correlated so that they can share a joint enable signal.
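The benefit of grouping correlated flip-flops can be seen in a toy example. The sketch below is illustrative only: the function name `enabled_cycles` and the toggle traces are made up for the example, with a 1 marking a cycle in which the flip-flop would change state. A group clock must be enabled whenever any member toggles.

```python
def enabled_cycles(toggle_traces):
    """Number of cycles in which at least one flip-flop of the group toggles."""
    return sum(1 for cycle in zip(*toggle_traces) if any(cycle))


if __name__ == "__main__":
    # Hypothetical toggle traces: a and b are correlated, as are c and d.
    a = [1, 0, 0, 1, 0, 0]
    b = [1, 0, 0, 1, 0, 0]
    c = [0, 1, 0, 0, 1, 0]
    d = [0, 1, 0, 0, 1, 0]
    correlated = enabled_cycles([a, b]) + enabled_cycles([c, d])
    uncorrelated = enabled_cycles([a, c]) + enabled_cycles([b, d])
    print(correlated, uncorrelated)
```

With these traces, pairing the correlated flip-flops keeps the two group clocks enabled for 4 group-cycles in total, compared with 8 for the mismatched pairing.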
If we consider p to be the probability that the data toggle in a clock cycle, the energy E_{1} consumed by a 1-bit FF can be expressed as follows:

$$E_{1}(p) = \lambda_{1} + p\,\mu_{1}. \quad (7)$$

We denote by λ_{1} and λ_{k}, respectively, the energy of the FF’s and MBFF’s internal clock drivers, and by μ_{1} and μ_{k}, respectively, the data toggling energy of the FF and the per-bit data toggling energy of the MBFF.
In [19], the authors have shown that the expected energy of a k-MBFF is

$$E_{k}(p) = \lambda_{k} + k\,p\,\mu_{k}. \quad (8)$$

Relation (8) represents the worst case. In fact, correlation between the toggling of the FFs yields higher energy savings. The energy saving potential of a k-MBFF can be expressed as follows:

$$k\,E_{1}(p) - E_{k}(p) = \left(k\lambda_{1} - \lambda_{k}\right) + k\,p\left(\mu_{1} - \mu_{k}\right). \quad (9)$$
For p = 0, the energy savings are approximately 35% for the 2-MBFF and 55% for the 4-MBFF. Figure 7 illustrates a filter architecture with DDCG integrated into a k-MBFF. The group size k that maximizes the energy savings solves the following equation: where C_{FF} is the clock input load of an FF and C_{latch} is the clock input load of the latch.
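The savings quoted for p = 0 can be related to a simple energy model. The sketch below assumes a linear model in which a 1-bit FF consumes a clock-driver energy plus p times a data toggling energy, and a k-MBFF consumes one shared clock-driver energy plus k·p times a per-bit toggling energy. All numeric values are hypothetical, normalized so that a 1-bit FF clock driver costs 1.0 and chosen only to reproduce the 35% and 55% figures; they are not measurements from the paper.

```python
def ff_energy(p, lam1, mu1):
    """Expected energy of one 1-bit FF with toggle probability p (assumed linear model)."""
    return lam1 + p * mu1


def mbff_energy(p, k, lam_k, mu_k):
    """Expected energy of one k-bit MBFF under the same assumed model."""
    return lam_k + k * p * mu_k


def saving_fraction(p, k, lam1, mu1, lam_k, mu_k):
    """Fraction of energy saved by replacing k 1-bit FFs with one k-MBFF."""
    baseline = k * ff_energy(p, lam1, mu1)
    return (baseline - mbff_energy(p, k, lam_k, mu_k)) / baseline


if __name__ == "__main__":
    # Hypothetical normalized clock-driver energies: 1.0 (FF), 1.3 (2-MBFF), 1.8 (4-MBFF).
    print(saving_fraction(0.0, 2, 1.0, 0.5, 1.3, 0.5))
    print(saving_fraction(0.0, 4, 1.0, 0.5, 1.8, 0.5))
```

At p = 0 only the clock-driver terms remain, so the saving depends solely on how much cheaper one shared driver is than k separate ones.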
6. Implementation Methodology
Several design methodologies have been proposed in the literature [20–22] that differ according to system requirements, abstraction level, model refinements, and other design issues. In this paper, the platform-based design methodology has been used. This methodology allows designers to work at high levels of abstraction.
Platform-based design (PBD) [23–26] is a design methodology that was proposed to decrease time to market and enhance product reuse. This methodology uses neither the bottom-up nor the top-down view; it is defined as a “meeting in the middle” process where refinements of specifications meet abstractions of potential implementations. This methodology also allows the system to be designed at high levels of abstraction without distinguishing between hardware and software tasks. After defining system specifications and requirements, the designer defines the parts that will be implemented as hardware components, the parts that will be implemented as software running on the component, and the parts realized with reconfigurable hardware.
7. Results and Analysis
We have used a speech signal for testing the proposed FIR filter. Several parameters are used in the following discussion. As a metric of power savings, we use the power consumption ratio P_{r}, defined as the ratio of the proposed filter power consumption to the conventional filter power consumption:

$$P_{r} = \frac{P_{\text{proposed}}}{P_{\text{conventional}}}.$$

The leakage power saving ratio P_{sl} is expressed as

$$P_{sl} = 1 - P_{rl},$$

where P_{rl} is the consumption ratio of the leakage power. The dynamic power saving ratio P_{sd} is expressed as

$$P_{sd} = 1 - P_{rd},$$

where P_{rd} is the consumption ratio of the dynamic power. Similarly,

$$P_{st} = 1 - P_{rt},$$

where P_{st} is the total power saving ratio and P_{rt} is the consumption ratio of the total power.
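These ratios are straightforward to compute from a pair of power measurements. The sketch below assumes that each saving ratio is one minus the corresponding consumption ratio; the function names and the power values in the example are hypothetical and are not taken from the paper's tables.

```python
def consumption_ratio(p_proposed, p_conventional):
    """Ratio of the proposed filter power to the conventional filter power."""
    return p_proposed / p_conventional


def saving_ratio(p_proposed, p_conventional):
    """Power saving ratio, assumed here to be 1 minus the consumption ratio."""
    return 1.0 - consumption_ratio(p_proposed, p_conventional)


if __name__ == "__main__":
    # Made-up example: 78 mW for the proposed design versus 100 mW conventional.
    print(consumption_ratio(78.0, 100.0), saving_ratio(78.0, 100.0))
```

The same two functions apply to leakage, dynamic, and total power alike; only the measured inputs change.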
We compare the proposed structural FIR filter using data-driven clock gating and multibit flip-flops with the conventional FIR filter. Table 1 presents the leakage power comparison, in milliwatts, for random and speech input signals. The proposed structural reconfigurable FIR filter has lower power consumption than the conventional FIR filter design.
The specifications of the implemented FIR filters are as follows:
(i) 16-bit input sequence data and coefficients
(ii) Data range of [−1, 1]
(iii) 16-bit multipliers
(iv) 24-bit digital output signal
A speech corpus database in the WAV format from ITU-T P.50 [29] and NOIZEUS [30] has been used.
Table 2 shows the dynamic power comparison, in milliwatts, for random and speech input signals. The proposed FIR filter has lower power consumption than the classic FIR filter architectures. Table 3 shows the total power comparison, in milliwatts, for random and speech input signals. The total power of the proposed FIR filter is lower than that of the conventional design.
As a measure of filter performance degradation, we use the mean square error (MSE) between the proposed reconfigurable filter output and the original filter output. Table 4 presents the comparison in terms of MSE for the random and speech signals. The MSE is reduced compared to the conventional works:

$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left(S - S_{i}\right)^{2},$$

where n, S, and S_{i} are the number of samples, the expected output, and the proposed output, respectively.
The signal power to mean square error ratio (SMR) [31] is defined as the ratio of the desired signal power to the distorted error signal power.
Table 5 shows the comparison in terms of SMR for both existing and proposed FIR filter designs. The proposed reconfigurable FIR filter shows a decrease in SMR as the number of taps increases, when compared to the existing designs.
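Both quality metrics can be computed directly from two output sequences. The sketch below is a minimal illustration with hypothetical function names `mse` and `smr`. It assumes that the expected output is compared sample by sample with the proposed output and that SMR is reported as a plain ratio; if a decibel scale is needed, 10·log10 of the ratio can be taken.

```python
def mse(expected, actual):
    """Mean square error between the expected and the proposed filter outputs."""
    n = len(expected)
    return sum((s - si) ** 2 for s, si in zip(expected, actual)) / n


def smr(expected, actual):
    """Signal power to mean square error ratio (plain ratio, not dB)."""
    signal_power = sum(s * s for s in expected) / len(expected)
    return signal_power / mse(expected, actual)


if __name__ == "__main__":
    reference = [1.0, -1.0, 1.0, -1.0]    # made-up expected output
    proposed = [1.0, -1.0, 1.0, 0.0]      # made-up output with one wrong sample
    print(mse(reference, proposed), smr(reference, proposed))
```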
Example simulation results of gain versus frequency for a low-pass FIR filter with two cutoff frequencies of 100 Hz and 5 kHz are presented in Figures 8(a) and 8(b). The proposed architecture is also implemented using an FPGA of the Virtex-5 family (XC5VLX110TFF136). The design results in Table 6 show the resources occupied by the implementation of the FIR filter with and without the proposed method. The results show that the use of this method does not degrade other design parameters such as the hardware resources occupied (Table 6).
Figure 8: Gain versus frequency of the low-pass FIR filter, panels (a) and (b).
The proposed FIR filter designed with multibit flip-flops and clock gating has advantages in terms of area, delay, and power consumption. Therefore, the circuit performance is high compared to the FIR filter designed with single-bit flip-flops. In this architecture, we have used the carry-lookahead adder and the array multiplier for implementing the FIR filter. For example, a conventional 9-tap FIR filter produces its output at the 8th clock pulse, whereas with the proposed technique the output is available at the 2nd clock pulse. Therefore, the delay is reduced and the speed of the circuit is increased, and the power consumption of the circuit is also reduced due to the multibit flip-flops.
The proposed method has also been synthesized using TSMC 0.25 μm CMOS technology. Power consumption is measured in SPICE-level simulations using NanoSim [33] at an operating frequency of 100 MHz. In Table 7, the proposed architecture is compared with previous works [5, 32] in terms of power saving and MSE. The proposed filter shows greater power savings than the filters in [5, 32].
8. Conclusion
In this paper, a novel architecture of the FIR filter is proposed. The proposed design is based on the combination of data-driven clock gating (DDCG) and multibit flip-flops (MBFFs) applied to an appropriate FIR filter structure. To compare the power consumption of the conventional and the proposed FIR filters, two types of inputs are used: a speech signal and a random signal. In the proposed filter structure, the leakage power, the dynamic power, and the total power are reduced. The proposed methods allow a power saving of up to 22%. The design results show that the loss in resources is less than 10%. This loss is considered negligible when compared to the gain in power consumption.
Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
References
[1] R. Mahesh and A. P. Vinod, “New reconfigurable architectures for implementing FIR filters with low complexity,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 29, no. 2, pp. 275–288, 2010.
[2] S. Y. Park and P. K. Meher, “Efficient FPGA and ASIC realizations of a DA-based reconfigurable FIR digital filter,” IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 61, no. 7, pp. 511–515, 2014.
[3] O. Gustafsson, “A difference based adder graph heuristic for multiple constant multiplication problem,” in Proceedings of the 2007 IEEE International Symposium on Circuits and Systems, pp. 1097–1100, New Orleans, LA, USA, May 2007.
[4] R. Mahesh and A. P. Vinod, “A new common subexpression elimination algorithm for realizing low-complexity higher order digital filters,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 27, no. 2, pp. 217–229, 2008.
[5] J. T. Ludwig, S. H. Nawab, and A. P. Chandrakasan, “Low-power digital filtering using approximate processing,” IEEE Journal of Solid-State Circuits, vol. 31, no. 3, pp. 395–400, 1996.
[6] E. Chitra and T. Vigneswaran, “An efficient low power and high speed distributed arithmetic design for FIR filter,” Indian Journal of Science and Technology, vol. 9, no. 4, pp. 1–5, 2016.
[7] S. A. Alam and O. Gustafsson, “Design of finite word length linear-phase FIR filters in the logarithmic number system domain,” VLSI Design, vol. 2014, Article ID 217495, 14 pages, 2014.
[8] Z. Yu, M.-L. Yu, K. Azadet, and A. N. Willson Jr., “A low power FIR filter design technique using dynamic reduced signal representation,” in Proceedings of the 2001 International Symposium on VLSI Technology, Systems, and Applications. Proceedings of Technical Papers (Cat. No. 01TH8517), pp. 113–116, Hsinchu, Taiwan, April 2001.
[9] Z. Yu, M.-L. Yu, K. Azadet, and A. N. Willson Jr., “A low power adaptive filter using dynamic reduced 2’s-complement representation,” in Proceedings of the IEEE 2002 Custom Integrated Circuits Conference (Cat. No. 02CH37285), Orlando, FL, USA, May 2002.
[10] S. Padmapriya and V. Lakshmi Prabha, “Design of an efficient dual mode reconfigurable FIR filter architecture in speech signal processing,” Microprocessors and Microsystems, vol. 39, no. 7, pp. 521–528, 2015.
[11] Y. Hong and Y. Lian, “A memristor-based continuous-time digital FIR filter for biomedical signal processing,” IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 62, no. 5, pp. 1392–1401, 2015.
[12] Y. T. Chang, C. Hsu, M. P. Lin, Y. Tsai, and S. Chen, “Post-placement power optimization with multi-bit flip-flops,” in Proceedings of the 2010 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), San Jose, CA, USA, November 2010.
[13] A. Bonanno, A. Bocca, A. Macii, E. Macii, and M. Poncino, “Data-driven clock gating for digital filters,” Integrated Circuit and System Design. Power and Timing Modeling, Optimization and Simulation, vol. 5953, pp. 96–105, 2009.
[14] X. Lou, P. Meher, Y. Yu, and W. Ye, “Novel structure for area-efficient implementation of FIR filter,” IEEE Circuits and Systems Society, vol. 64, no. 10, pp. 1212–1216, 2016.
[15] A. Bonanno, A. Bocca, A. Macii, E. Macii, and M. Poncino, “Data-driven clock gating for digital filters,” in Integrated Circuit and System Design. Power and Timing Modeling, Optimization and Simulation, Springer, Berlin, Germany, 2010.
[16] J. Proakis and D. Manolakis, Digital Signal Processing: Principle, Algorithms and Applications, Prentice-Hall, Upper Saddle River, NJ, USA, 4th edition, 2006.
[17] K. Johansson, O. Gustafsson, and L. Wanhammar, “Bit-level optimization of shift-and-add based FIR filters,” in Proceedings of the 2007 14th IEEE International Conference on Electronics, Circuits and Systems, vol. 3, pp. 713–716, Marrakech, Morocco, December 2007.
[18] H. Kao, C. Hsu, and S. Huang, “Two-stage multi-bit flip-flop clustering with useful skew for low power,” in Proceedings of the 2019 2nd International Conference on Communication Engineering and Technology (ICCET), Nagoya, Japan, April 2019.
[19] Y. Shyu, J. Lin, C. Huang, C. Lin, Y. Lin, and S. Chang, “Effective and efficient approach for power reduction by using multi-bit flip-flops,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 21, no. 4, pp. 624–635, 2013.
[20] A. S. Bensalem, M. Bozga, J. Combaz, M. Jaber, T. Nguyen, and J. Sifakis, “Rigorous component-based system design using the BIP framework,” IEEE Software, vol. 28, no. 3, pp. 41–48, 2011.
[21] J. Jensen, D. Chang, and E. Lee, “A model-based design methodology for cyber-physical systems,” in Proceedings of the International Conference on Wireless Communications and Mobile Computing Conference (IWCMC), Istanbul, Turkey, 2011.
[22] A. Gamatié, S. Beux, E. Piel et al., “A model-driven design framework for massively parallel embedded systems,” ACM Transactions on Embedded Computing Systems, vol. 10, no. 4, 2011.
[23] I. Augé, F. Pétrot, F. Donnet, and P. Gomez, “Platform-based design from parallel C specifications,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 24, no. 12, pp. 1811–1826, 2005.
[24] A. S. Vincentelli and G. Martin, “Platform based design and software design methodology for embedded systems,” IEEE Design and Test of Computers, vol. 18, no. 6, pp. 23–33, 2001.
[25] L. P. Carloni, F. De Bernardinis, C. Pinello, A. L. Sangiovanni-Vincentelli, and M. Sgroi, “Platform-based design for embedded systems,” in The Embedded Systems Handbook, R. Zurawski, Ed., CRC Press, Boca Raton, FL, USA, 2005.
[26] S. KurtKeutzer, A. Malik, J. Richard Newton, M. Rabaey, and A. Sangiovanni-Vincentelli, “System-level design: orthogonalization of concerns and platform-based design,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 19, no. 12, 2000.
[27] S. Hwang, G. Han, S. Kang, and J. Kim, “New distributed arithmetic algorithm for low-power FIR filter implementation,” IEEE Signal Processing Letters, vol. 11, no. 5, pp. 463–466, 2004.
[28] L. Seok-Jae, J.-W. Choi, S. W. Kim, and J. Park, “A reconfigurable FIR filter architecture to trade off filter performance for dynamic power consumption,” IEEE Transactions on Very Large Scale Integration (VLSI), vol. 19, no. 12, pp. 2221–2228, 2011.
[29] http://www.itu.int/net/itu-t/sigdb/menu.aspx.
[30] http://ecs.utdallas.edu/loizou/speech/noizeus.
[31] J. G. Proakis, Digital Communications, McGraw-Hill, New York, NY, USA, 3rd edition, 1995.
[32] K.-H. Chen and T.-D. Chiueh, “A low-power digit-based reconfigurable FIR filter,” IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 53, no. 8, pp. 617–621, 2006.
[33] Synopsys Inc., Nanosim Reference Guide, Synopsys Inc., Mountain View, CA, USA, 2007.
Copyright
Copyright © 2020 Lamjed Touil et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.