Research Article  Open Access
Mohd Tausif, Ekram Khan, Mohd Hasan, Martin Reisslein, "Lifting-Based Fractional Wavelet Filter: Energy-Efficient DWT Architecture for Low-Cost Wearable Sensors", Advances in Multimedia, vol. 2020, Article ID 8823689, 13 pages, 2020. https://doi.org/10.1155/2020/8823689
Lifting-Based Fractional Wavelet Filter: Energy-Efficient DWT Architecture for Low-Cost Wearable Sensors
Abstract
This paper proposes and evaluates the LFrWF, a novel lifting-based architecture to compute the discrete wavelet transform (DWT) of images using the fractional wavelet filter (FrWF) approach. In order to reduce the memory requirement of the proposed architecture, only one image line is read into a buffer at a time. Aside from a multiplier-based LFrWF version, we develop a multiplierless LFrWF version, which reduces the critical path delay (CPD) to the delay of an adder. The proposed multiplier-based and multiplierless LFrWF architectures are compared in terms of the required adders, multipliers, memory, and critical path delay with state-of-the-art DWT architectures. Moreover, the proposed LFrWF architectures, along with the state-of-the-art FrWF architectures (with and without multipliers), are compared through implementation on the same FPGA board. The multiplier-based LFrWF requires 22% fewer lookup tables (LUTs), 34% fewer flip-flops (FFs), and 50% fewer compute cycles (CCs), and consumes 65% less energy than the multiplier-based FrWF. Also, the proposed multiplierless LFrWF architecture requires 50% fewer CCs and consumes 43% less energy than the multiplierless FrWF. Thus, the proposed LFrWF architectures appear suitable for computing the DWT of images on wearable sensors.
1. Introduction
1.1. Motivation
The availability of low-cost, small-sized cameras attached to wearable sensors and portable imaging devices has opened up a wide range of imaging-oriented applications, including assisted living, smart healthcare, traffic monitoring, virtual sports experiences, and posture recognition [1–12]. An interconnection of visual sensor nodes (sensor nodes with attached cameras) is known as a visual sensor network (VSN) [13, 14] or a wireless multimedia sensor network (WMSN) [15, 16]. Wearable visual sensors may also be part of the Internet of Things (IoT) [17–21]. Low-cost IoT wearable sensors [22] enable a wide range of activities for the benefit of society, e.g., hazard avoidance systems for worker safety [23], navigation aids for visually impaired individuals [24], activity monitoring [25], smart irrigation [26], and sports [27].
In many visual applications of wearable sensors and portable imaging devices, images captured by the camera need to be transmitted wirelessly to a body-worn or nearby hub device. The wearable sensors and portable imaging devices have limited resources, and the wireless links have narrow bandwidth [28], making it impractical to directly send the raw (uncompressed) images. Thus, an image coder is needed to compress the images before transmission [29]. In an image coder, an image is generally first transformed using the discrete cosine transform (DCT) [30] or the discrete wavelet transform (DWT) [31, 32] and then quantized and entropy coded. The DWT, which is also used in JPEG 2000 [33], is popular in a wide variety of applications, including activity monitoring [34], fault detection in inverter circuits [35], medical imaging [36], image denoising [37], image recognition [38], image reconstruction [39], watermarking [40], computer graphics, and real-time processing [41] due to its multiresolution feature and excellent energy compaction properties [42, 43].
The hardware architectures for wearable visual sensors and portable imaging devices in the IoT and wireless multimedia sensor networks should require minimal hardware resources and consume little energy to achieve a small form factor and long battery life [44, 45]. Generally, the computational capabilities of visual sensor nodes have been increasing in recent years [46]. Nevertheless, due to the economic pressures on visual sensor designs, and despite the emergence of specialized hardware acceleration components, e.g., FPGAs [47–49], the computational resources of visual sensors will likely remain scarce. Emerging computing and communication paradigms, such as mobile ad hoc cloud computing [50, 51], expect the nodes not only to transmit sensed images but also to participate in some service computing functions, e.g., for localized image analysis and decision making, which can be orchestrated through software-defined networking and control structures [52–54]. In order to make the economical functioning of wearable visual sensors in such networked systems feasible, the resource usage of the image coding and transform must be very low. In particular, as the DWT is an important component of an image coder for visual sensors, the DWT hardware architecture should have minimal area and energy consumption.
1.2. Related Work
The conventional convolution-based DWT computation of an image requires a huge amount of memory due to its row- and column-wise scanning [55, 56], making it unsuitable for memory-constrained wearable sensors. The different low-memory architectures reported in the literature for the computation of the DWT can be categorized as line-based architectures [57], stripe-based architectures [58, 59], block-based architectures [60, 61], and the fractional wavelet filter (FrWF) architecture [62]. For an image of dimension $N \times N$ pixels, the line-, stripe-, and block-based architectures require random access memory (which we refer to as RAM or memory for brevity) proportional to the image width $N$, i.e., several image lines, while the FrWF architecture requires the least RAM of these approaches [62].
Another low-memory pipeline-based architecture has been proposed in [63]. However, the design in [63] is based on the nonseparable DWT computation approach, which is unpopular because of its higher computational requirements compared to the conventional separable approach. It is well known that, at a given throughput, the separable 2-D DWT computation approach is computationally more efficient than the nonseparable approach [64]. A dual data scanning-based DWT architecture is reported in [65]. In this architecture, several 2-D DWT units are combined into a parallel multilevel architecture for computing up to six DWT levels. However, this architecture needs a large amount of memory. An architecture based on an interlaced read scan algorithm (IRSA) is proposed in [66] in conjunction with a lifting-based approach with a 5/3 filterbank. However, the long critical path delay (CPD) of the architecture in [66], which comprises a multiplier delay ($T_m$) plus adder delays ($T_a$), may limit its use in real-time applications.
An LUT-based lifting architecture for computing the DWT has been reported in [67]. The design [67] has low area and power requirements. However, it has a long CPD that comprises a lookup table (LUT) delay plus a chain of full-adder delays that grows with the word length. A lifting-based architecture for computing both the 1-D and 2-D DWT has been presented in [68]. However, this design uses a sizeable transpose buffer. An energy-efficient block-based DWT architecture has been proposed in [61]. However, this architecture requires a large number of multipliers, namely, 16 and 36 multipliers for the 5/3 and 9/7 filters, respectively. Another energy-efficient lifting-based reconfigurable DWT architecture has been proposed in [69], mainly for medical applications. However, the operating frequency of this architecture is limited to 20 MHz. An energy-efficient lifting-based configurable DWT architecture for neural sensing applications has been proposed in [70], requiring 12 adders and 12 multipliers. However, its operating frequency is limited to only 400 kHz and 80 kHz for the gating and interleaving architectures used in the main architecture, respectively.
A power-efficient modified form of the DWT architecture has been presented in [71], using Radix-8 Booth multipliers. This architecture uses bit truncation to reduce the area and power. However, bit truncation degrades the quality of the reconstructed image when the inverse DWT is applied. There have also been some DWT implementations on graphics processing units (GPUs) [72–78]; however, GPUs are relatively expensive for low-cost sensing platforms.
The recently proposed FrWF architecture requires very little memory and has a CPD equal to the delay of a multiplier, $T_m$ [62]. A multiplierless FrWF architecture was also reported in [62], which reduces the CPD to the delay of an adder, $T_a$. However, the FrWF architecture (with and without multipliers) has high energy consumption owing to its large number of compute cycles. The high energy consumption of the FrWF architecture may be prohibitive for wearable sensors and portable imaging devices with tight memory and energy constraints [79].
1.3. Contributions and Structure of This Article
This paper proposes the LFrWF, a novel lifting-based energy-efficient architecture to compute the DWT coefficients of an image with a 5/3 filterbank. At the core of the proposed LFrWF architecture is a novel basic Lift_block that computes the high-pass (H) and low-pass (L) subband coefficients with only two two-input adders and one multiplier (plus two pipeline registers), thus greatly reducing the hardware requirements compared to prior convolution architectures. Moreover, a multiplierless implementation of the proposed architecture is designed. The multiplierless LFrWF has a shorter CPD than the multiplier-based LFrWF architecture. The proposed LFrWF architectures are not only energy efficient but also require fewer adders, multipliers, and registers than the state-of-the-art FrWF architectures (with and without multipliers). We compare the proposed architectures with state-of-the-art DWT computation architectures in terms of the required adders, multipliers, memory, and critical path delay. We also implement the proposed architectures and the state-of-the-art FrWF architectures on the same FPGA board. Experimental results demonstrate that the proposed LFrWF architectures have lower hardware resource requirements and energy consumption than the state-of-the-art FrWF architectures.
The remainder of this paper is organized as follows. Section 2 gives a brief overview of the DWT and FrWF techniques. The proposed lifting-based LFrWF architecture is described in detail in Section 3, along with its memory requirement. The evaluation results and related discussions are presented in Section 4. Finally, Section 5 concludes the paper.
2. Background
This section briefly reviews the DWT and FrWF techniques along with the FrWF architecture. The main notations used in this article are summarized in Table 1.

2.1. Discrete Wavelet Transform (DWT)
The most popular approach for computing the two-dimensional (2-D) DWT of an image is the separable approach, in which the rows are filtered first, followed by column-wise filtering of the resulting coefficients. When a row is convolved (filtered) with a low-pass filter (LPF) and a high-pass filter (HPF), followed by downsampling by a factor of two, the results are known as approximation and detail coefficients, respectively. For a 1-D signal $x_k$, $k = 0, 1, \ldots, N-1$, of dimension $N$, which we consider as a preliminary step for computing the 2-D DWT, there are $\lfloor N/2 \rfloor$ approximation coefficients $a_i$ and $\lfloor N/2 \rfloor$ detail coefficients $d_i$. Combining the downsampling with the convolution operation, the approximation coefficients $a_i$ and the detail coefficients $d_i$ for $i = 0, 1, \ldots, \lfloor N/2 \rfloor - 1$ can be expressed mathematically as [55]
$$a_i = \sum_{j=0}^{L_l - 1} l_j \, x_{2i+j}, \qquad d_i = \sum_{j=0}^{L_h - 1} h_j \, x_{2i+j},$$
respectively, whereby $l_j$ and $h_j$ denote the LPF and HPF coefficients, respectively, $x_k$ denotes the signal sample, while $L_l$ and $L_h$ are the numbers of LPF and HPF coefficients, respectively. The largest integer less than or equal to $z$ is denoted by $\lfloor z \rfloor$.
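As a concrete illustration of the two equations above, the following sketch (ours, not the authors' code; the periodic boundary handling via modulo indexing is a simplifying assumption) computes one analysis level of the 1-D DWT by combining convolution with downsampling by two, using the standard 5/3 (LeGall) analysis taps:

```python
def dwt_1d(x, lpf, hpf):
    """Return (approximation, detail) coefficients of signal x.

    Convolution is combined with downsampling by two, so the filter
    window advances by two samples per output coefficient.
    """
    n = len(x)
    approx, detail = [], []
    for i in range(n // 2):
        # Low-pass filtering fused with downsampling by two.
        approx.append(sum(lpf[j] * x[(2 * i + j) % n] for j in range(len(lpf))))
        # High-pass filtering fused with downsampling by two.
        detail.append(sum(hpf[j] * x[(2 * i + j) % n] for j in range(len(hpf))))
    return approx, detail

LPF_53 = [-1/8, 2/8, 6/8, 2/8, -1/8]   # 5/3 low-pass analysis taps
HPF_53 = [-1/2, 1, -1/2]               # 5/3 high-pass analysis taps
```

For a constant signal, the low-pass taps (which sum to one) reproduce the constant, while the high-pass taps (which sum to zero) yield zero detail coefficients.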
In the separable approach, all image rows are first convolved separately by a HPF and a LPF, followed by downsampling by a factor of two, resulting in the H and L subbands. Then, the columns of the H and L subbands are convolved by a HPF and a LPF, followed by downsampling by a factor of two, resulting in the HH, HL, LH, and LL subbands [80]. However, this approach needs to store the entire image in the RAM of the sensor (board) system. Thus, this DWT computation approach requires a huge amount of memory, making it unsuitable for low-cost wearable sensors and portable imaging devices with limited RAM [55, 56].
The lifting scheme [81] computes the DWT of images using in-place computations, which save memory. Moreover, the lifting scheme uses predict and update steps for computing the subbands. In particular, the low-pass filtered coefficients are predicted using the high-pass filtered coefficients. Thus, the lifting scheme reduces the convolution operations needed for the LPF coefficients and hence reduces the number of arithmetic computations required for computing the image DWT [82].
The lifting scheme for a 5/3 filterbank is shown in Figure 1. In this figure, $x_0, x_1, \ldots, x_8$ are the input signal samples. Among these samples, $x_0, x_2, x_4, x_6$, and $x_8$ are the even-indexed samples, while $x_1, x_3, x_5$, and $x_7$ are the odd-indexed samples. Also, $\alpha$ and $\beta$ are the high-frequency and low-frequency lifting parameters, respectively, whereby $\alpha = -1/2$ and $\beta = 1/4$, and the scaling parameters are unity for the 5/3 filterbank [66]; $d_0, d_1, \ldots$ are the high-frequency wavelet coefficients, while $a_0, a_1, \ldots$ are the low-frequency wavelet coefficients. The high- and low-frequency wavelet coefficients are computed following the diagram in Figure 1; for instance, $d_0 = x_1 + \alpha (x_0 + x_2)$ and $a_0 = x_2 + \beta (d_0 + d_1)$.
It should be noted that the arrows without an associated symbol in Figure 1 carry a unit multiplication factor, i.e., 1.
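The predict/update structure of Figure 1 can be sketched in software as follows (an illustrative rendering by us, under the simplifying assumption of periodic boundary extension; the paper's hardware realizes the equivalent dataflow of Figure 1):

```python
ALPHA = -0.5  # high-frequency (predict) lifting parameter
BETA = 0.25   # low-frequency (update) lifting parameter

def lift_53(x):
    """One 5/3 lifting level; x must have even length.

    Returns (low, high) frequency coefficients. Periodic extension
    (index wrap-around) is used at the signal boundaries.
    """
    n = len(x)
    # Predict step: each odd sample minus the mean of its even neighbors.
    high = [x[2 * i + 1] + ALPHA * (x[2 * i] + x[(2 * i + 2) % n])
            for i in range(n // 2)]
    # Update step: each even sample plus a correction from the two
    # neighboring high-frequency coefficients (high[-1] wraps around).
    low = [x[2 * i] + BETA * (high[i - 1] + high[i])
           for i in range(n // 2)]
    return low, high
```

A constant signal passes through unchanged into the low band while all high-frequency coefficients vanish, reflecting the energy compaction property mentioned above.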
2.2. Fractional Wavelet Filter (FrWF)
The FrWF is a low-memory DWT computation technique [56]. It uses a specific image data scanning technique to reduce the memory required for computing the DWT. It selects a vertical filter area (VFA) spanning $L_l$ rows of the image stored on an SD card (where $L_l$ is the number of LPF coefficients). The rows in a VFA are read in raster-scan order. Once the reading of all the image rows in a VFA is complete, the VFA is shifted down by two lines in the vertical direction. This shifting of the VFA incorporates the dyadic downsampling. One line each of the HH, HL, LH, and LL subbands is computed from one VFA. All the image lines are covered by shifting the VFA; the VFA is shifted $N/2$ times for an image of dimension $N \times N$ pixels. The FrWF has been combined with a low-memory image coding algorithm to design an efficient image coder for WMSNs in [83].
An FPGA architecture for the FrWF with a 5/3 filterbank has been proposed in [62]. This FrWF architecture, which follows the FrWF data scanning order, requires very little memory but a large total number of compute cycles. The large number of compute cycles results in a high energy consumption, which may be prohibitive for resource-constrained wearable visual sensors and portable imaging devices. The proposed LFrWF therefore focuses on reducing the energy consumption for computing the DWT of images.
3. Proposed LFrWF Low Energy Architecture
This section presents the proposed lifting-based LFrWF architecture to compute the DWT of an image using the FrWF approach with a 5/3 filterbank.
3.1. Data Scanning Order
The proposed lifting-based architecture follows the data scanning order of the FrWF algorithm [56]. It is assumed (as is common for low-memory implementations of the DWT computation) that the original image is stored on an SD card; throughout, the SD accesses are appropriately buffered to compensate for the latencies of the SD card accesses. Initially, a vertical filter area (VFA), which spans $L_l$ image lines ($L_l$ is the number of LPF coefficients), is marked on the SD card. The rows of the image are read in raster-scan order from the VFA, one line at a time, into the RAM buffer P_store (as shown in Figure 2). After the processing of all the rows of the VFA is completed, the VFA is shifted down by two lines and the new rows are again read into buffer P_store in raster-scan order. The complete image is read by repeatedly shifting the VFA downwards by two lines until all the rows are read. In the proposed architecture, one complete line is read at a time and scanned in raster order; in contrast, the FrWF architecture in [62] reads only 5 coefficients of an image line at a time.
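The scanning order described above can be modeled as follows (a hypothetical software sketch by us; SD-card access and buffering are abstracted to a list of lines, and the boundary extension used at the bottom of the image is omitted for brevity):

```python
L_L = 5  # number of LPF taps for the 5/3 filterbank (VFA height)

def scan_vfa_positions(image_lines):
    """Yield (vfa_index, line) pairs in the order lines are consumed.

    A vertical filter area (VFA) of L_L lines slides down the image two
    lines at a time (dyadic step); within each VFA position, the lines
    are read one at a time in raster order (into the P_store buffer).
    """
    n = len(image_lines)
    vfa_index, top = 0, 0
    while top + L_L <= n:               # interior VFA positions only
        for row in range(top, top + L_L):
            yield vfa_index, image_lines[row]  # one full line per read
        top += 2        # shift the VFA down by two lines
        vfa_index += 1
```

Each VFA position yields one line of each of the four output subbands; note that lines near a VFA boundary are re-read by the next VFA position, which is the price paid for the low buffer memory.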
3.2. Proposed LiftingBased LFrWF Architecture
This subsection describes the proposed lifting-based DWT architecture in detail.
3.2.1. TopLevel Architecture
Figure 2 shows the top-level block diagram of the proposed LFrWF architecture. The LFrWF architecture works as follows. First, the input image pixels of a line are read into the register P_store. This P_store register stores the original image pixels of 8 bits each. The pixels of the image from P_store are sent to the Lift_block (as detailed in Figure 3) to compute the H and L subband coefficients using the lifting scheme. The generated H and L subband coefficients are saved in the register 1D_store. The contents of the 1D_store register are used as inputs for the Conv_block (as shown in Figure 4), which generates intermediate coefficients that are saved in the HH_store, HL_store, LH_store, and LL_store registers. These intermediate values are successively updated by the next image lines. After updating, the intermediate values in the registers HH_store, HL_store, LH_store, and LL_store give the values of the HH, HL, LH, and LL subbands, respectively. Once the final coefficients of the HH, HL, LH, and LL subbands are computed, they are transferred to and saved on an external SD card. The functioning of the different blocks leading to the computation of the subbands is described next.
3.2.2. Lifting Block
In the lifting scheme with a 5/3 filterbank, two previous high-pass filtered coefficients are used to predict a low-pass filtered coefficient. For the efficient implementation of the lifting scheme, we introduce a novel basic Lift_block. As illustrated in Figure 3, the basic Lift_block computes two H subband coefficients and one L subband coefficient from a group of five input pixels in three steps. The inputs (Input_1, Input_2, Input_3, and Lift_par) and output (Out_1) of the adders and multiplier used in Figure 3 for the different steps are shown in Table 2. The first two steps compute two coefficients of the H subband and the third step computes a coefficient of the L subband. In Table 2, $x_0, x_1, x_2, x_3$, and $x_4$ are the first five pixels of an image line. $d_0$ and $d_1$ are the first two high-pass filtered coefficients, which are stored as the first two elements of the register 1D_store. $a_0$ is the first low-pass filtered coefficient and is stored as the third element of the register 1D_store. The high-pass filtered coefficients ($d_0$ and $d_1$) and the low-pass filtered coefficient ($a_0$) are computed as
$$d_0 = x_1 + \alpha \,(x_0 + x_2), \tag{3}$$
$$d_1 = x_3 + \alpha \,(x_2 + x_4), \tag{4}$$
$$a_0 = x_2 + \beta \,(d_0 + d_1), \tag{5}$$
where $\alpha = -1/2$ and $\beta = 1/4$ are the lifting parameters [66]. Once the five pixels ($x_0, \ldots, x_4$) are processed, the first two pixels are discarded and two new pixels are read in along with the previous last three pixels. The same procedure, in equations (3)–(5), is repeated on these new pixels to compute the subsequent H and L subband coefficients.

The basic Lift_block in Figure 3 requires two two-input adders and one multiplier. The functionality of this basic Lift_block essentially replaces the functionality of the convolution stage-1 block in the FrWF architecture, as shown in Figure 3 in [62] and elaborated in Figures 4–7 in [62]. For an LPF length of $L_l$ and an HPF length of $L_h$, the convolution stage-1 block in [62] requires $L_l - 1$ two-input adders and $L_l$ multipliers for the low-pass filtering as well as $L_h - 1$ two-input adders and $L_h$ multipliers for the high-pass filtering. Thus, for a 5/3 filter ($L_l = 5$, $L_h = 3$), the FrWF convolution stage-1 block requires six adders as well as eight multipliers.
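The three steps of equations (3)–(5) can be expressed as a direct software model (ours, for illustration only; in the hardware, each step reuses the same two adders and one multiplier of the Lift_block):

```python
ALPHA = -0.5  # high-frequency lifting parameter
BETA = 0.25   # low-frequency lifting parameter

def lift_block(x0, x1, x2, x3, x4):
    """Process a five-pixel window into (d0, d1, a0), cf. eqs. (3)-(5)."""
    d0 = x1 + ALPHA * (x0 + x2)   # step 1: first high-pass coefficient
    d1 = x3 + ALPHA * (x2 + x4)   # step 2: second high-pass coefficient
    a0 = x2 + BETA * (d0 + d1)    # step 3: low-pass coefficient,
                                  #         predicted from d0 and d1
    return d0, d1, a0
```

After one window is processed, the window slides by two pixels ($x_2, x_3, x_4$ are retained and two new pixels are appended), matching the dyadic downsampling.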
3.2.3. Convolution Block
In the Conv_block in Figure 4, the H subband coefficients from the 1D_store register are multiplied by a suitable HPF or LPF coefficient (as determined by a multiplexer) and then added to/stored with the previous values in the registers HH_store and HL_store, respectively. Similarly, the L subband coefficient in the 1D_store register is multiplied by a suitable HPF or LPF coefficient (as determined by a multiplexer) and then added to/stored with the previous values in the registers LH_store and LL_store, respectively. The values in the registers HH_store, HL_store, LH_store, and LL_store are updated to compute the coefficients of the HH, HL, LH, and LL subbands, respectively.
We note that the Conv_block in Figure 4 is essentially equivalent to the aggregation of the FrWF convolution stage-2 blocks in Figures 4–7 in [62]. The Conv_block in Figure 4 requires four two-input adders and four multipliers. In comparison, the aggregation of the FrWF convolution stage-2 blocks in Figures 4–7 in [62] requires two two-input adders and two multipliers.
3.2.4. Pipeline Registers
The Lift_block and the Conv_block use two and four pipeline registers, respectively, to temporarily save the intermediate results after each compute cycle. Through the use of the pipeline registers, the critical path delay (CPD) of the proposed LFrWF architecture becomes the multiplier delay $T_m$.
Overall, for a 5/3 filter, considering both the basic Lift_block (Figure 3) and the Conv_block (Figure 4), the proposed multiplier-based LFrWF requires six two-input adders and five multipliers, compared to the eight two-input adders and ten multipliers of the FrWF architecture (Figures 4–7 in [62]).
The proposed LFrWF architecture stores the original image and the computed subbands on the SD card. Thus, higher wavelet decomposition levels can be computed with the same architecture, whereby the LL subband coefficients are taken as input.
3.3. Proposed Multiplierless LFrWF Implementation
The 5/3 filterbank coefficients (shown in Table 3) and the 5/3 filterbank lifting parameters involve only multiplication and division by small powers of two. Thus, they can be implemented using the shift-and-add method. For example, shifting a number two positions to the right is equivalent to dividing it by 4. The shift-and-add concept, as applied to the 5/3 filter coefficients, operates as follows: (1) the filter coefficient $-1/8$ can be implemented by three right-shift operations, followed by a complement operation; (2) the filter coefficient $2/8 = 1/4$ can be implemented by two right-shift operations; (3) the filter coefficient $6/8 = 1/4 + 1/2$ can be implemented by two right-shift operations, followed by addition with a one-position right shift; (4) the filter coefficient $-1/2$ can be implemented by one right-shift operation, followed by a complement operation; (5) the coefficient $1$ requires no shifting.

With these specified shifting operations, the convolution block can be simplified and implemented using only shifters and adders. Multiplierless computation blocks for the 5/3 LPF and HPF coefficients are given in Figures 5 and 6, respectively. One benefit of the multiplierless implementation over the multiplier-based architecture in Section 3.2 is that the multiplierless implementation reduces the CPD from the multiplier delay $T_m$ down to the adder delay $T_a$.
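In software terms, the shift-and-add realizations listed above can be sketched as follows (an illustrative integer fixed-point model by us; note that `>>` truncates toward negative infinity on signed values, so the results can differ slightly from exact rational arithmetic, and the hardware "complement" operation is modeled here as two's-complement negation):

```python
def times_neg_eighth(x):      # coefficient -1/8
    return -(x >> 3)          # three right shifts, then complement (negate)

def times_quarter(x):         # coefficient 2/8 = 1/4
    return x >> 2             # two right shifts

def times_three_quarters(x):  # coefficient 6/8 = 1/4 + 1/2
    return (x >> 2) + (x >> 1)  # two right shifts, plus a one-shift term

def times_neg_half(x):        # coefficient -1/2
    return -(x >> 1)          # one right shift, then complement (negate)

def times_one(x):             # coefficient 1: no shifting required
    return x
```

Because every operation is a shift or an addition, the longest combinational path through such a block is set by the adders, which is what reduces the CPD to $T_a$.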
3.4. Memory Requirement
In order to compute the DWT coefficients, the proposed LFrWF architecture uses four registers (HH_store, HL_store, LH_store, and LL_store), two register arrays (P_store and 1D_store), and six pipeline registers. The register array P_store (of size $N$ words) is used to store an image line. The H and L subband coefficients computed by the Lift_block are saved in the register array 1D_store of 3 words. The four registers HH_store, HL_store, LH_store, and LL_store hold $N/2$ words each. The total memory requirement of the proposed architecture is equal to the sum of all registers, i.e.,
$$M_{\mathrm{LFrWF}} = N + 3 + 4 \cdot \frac{N}{2} + 6 = 3N + 9 \text{ words}. \tag{6}$$
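The register budget just described can be tallied as a quick sanity check (a sketch by us; word widths are abstracted away, and the four subband registers are assumed to hold one intermediate subband line of $N/2$ words apiece):

```python
def lfrwf_memory_words(n):
    """Total register storage (in words) for an n x n image."""
    p_store = n                   # one full image line
    one_d_store = 3               # two H coefficients and one L coefficient
    subband_regs = 4 * (n // 2)   # HH_store, HL_store, LH_store, LL_store
    pipeline = 6                  # 2 in the Lift_block + 4 in the Conv_block
    return p_store + one_d_store + subband_regs + pipeline
```

The total grows linearly with the image dimension, which motivates the line segmentation of Section 3.5.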
3.5. Line Segmentation
Equation (6) indicates that the LFrWF memory requirement grows with the image dimension $N$ and thus will be significantly greater than the FrWF memory requirement for large images. In order to reduce the memory requirement of the proposed LFrWF architectures, each image line may be segmented into $S$ segments, as illustrated in Figure 7, with overlapping of coefficients at both boundaries of the second through the second-to-last segment (the first and last segments only require overlapping at one boundary) (Appendix E in reference [88]). In this approach, only one line segment needs to be read into the register array P_store at a time. Thus, the memory requirement of the LFrWF with $S$ line segments scales with $N/S$ plus a small overlap overhead, as given by equation (7).
For the 5/3 filterbank with a VFA of $L_l = 5$ lines, the corresponding memory requirement is given by equation (8).
The other resource requirements are independent of line segmentation and remain unchanged.
The line segmentation reduces the memory requirement of the proposed LFrWF architectures so that it can be brought below the memory required by the FrWF architectures of [62]. The FrWF architecture does not include a line segmentation provision; therefore, its memory requirement cannot be reduced further. We observe from Table 4 that, without segmentation, the memory requirements of the proposed LFrWF architectures are greater than the FrWF memory requirements. However, by incorporating the line segmentation approach, the memory requirement of the LFrWF architectures can be reduced below that of the FrWF architectures. In the case of the 5/3 filterbank, comparing the FrWF memory requirement in Table 4 with the LFrWF memory requirement with $S$ line segments in equation (8) yields the minimum number of segments $S$ for which the LFrWF memory requirement is less than the FrWF memory requirement.
4. Results and Discussion
This section presents the implementation of the proposed LFrWF architecture and its comparison with state-of-the-art architectures. First, we compare the proposed LFrWF architecture with several state-of-the-art architectures in terms of the required numbers of adders and multipliers, as well as the critical path delay (CPD) and the required memory. Next, the post-implementation results of the proposed LFrWF architectures are compared with those of the state-of-the-art FrWF architectures [62] by implementing both sets of architectures on the Xilinx Artix-7 FPGA platform.
4.1. Adders, Multipliers, CPD, and Memory
Table 4 compares the numbers of required adders and multipliers, as well as the CPD and the required RAM, of the proposed LFrWF architectures with state-of-the-art architectures. The numbers of adders and multipliers of the existing state-of-the-art architectures shown in Table 4 have been taken from the corresponding papers. We observe from Table 4 that the proposed LFrWF architecture requires the smallest number of adders (namely, only six adders; see Figures 3 and 4) among the state-of-the-art architectures. While the proposed LFrWF reduces the number of required adders only by two compared to the FrWF architecture, the proposed LFrWF reduces the number of adders to less than half of that of the other prior architectures. Among the architectures using multipliers, the proposed multiplier-based LFrWF architecture also requires the smallest number of multipliers, namely, only five multipliers; see Figures 3 and 4. Only the RMA [85] has a similarly low multiplier requirement with six multipliers (but requires approximately twice the memory of the LFrWF). The other prior architectures require twice as many or more multipliers than the proposed architecture.
We also observe from Table 4 that the CPDs of the proposed multiplier-based LFrWF architecture and the multiplier-based FrWF architecture [62] are equal to the multiplier delay $T_m$, which is less than the CPDs of the architectures in [85, 86]. We note from Table 4 that the multiplierless LFrWF and FrWF reduce the CPD to the adder delay $T_a$, which is less than the CPDs of the other state-of-the-art architectures. The CPD of $T_a$ achieved by the proposed multiplierless LFrWF architecture cuts the shortest CPD of any existing architecture, achieved by the Aziz architecture [87], down to half. Note that the shifter delay $T_s$ is commonly larger than the adder delay $T_a$, i.e., $T_s > T_a$; thus, the PMA architecture [85] has a longer CPD than the Aziz architecture. The benefit of the reduction in CPD is that the architectures can be operated at higher frequencies, since the maximum operating frequency equals 1/CPD: as the CPD decreases, the maximum operating frequency increases.
Table 4 furthermore indicates that the FrWF architecture has the lowest memory requirement. However, the memory requirement of the proposed LFrWF architecture is less than the memory requirements of the other state-of-the-art architectures in Table 4. As noted in Section 4.3, with segmentation of a line of $N$ words (pixels) into $S$ segments, the LFrWF memory requirement drops below the FrWF memory requirement when a sufficiently large number of segments is used.
4.2. FPGA Implementation
The proposed LFrWF architecture computes the DWT coefficients of images based on lifting while following the FrWF approach. As observed from Table 4, the FrWF architecture [62] requires the least memory among the state-of-the-art architectures. Thus, we implemented the FrWF architectures [62] and the proposed LFrWF architectures (initially without segmentation, i.e., $S = 1$) on an Artix-7 FPGA (family: Artix-7, device: xc7a15t, package: csg324). The implementations used identical multipliers, adders, and other components provided by the Xilinx Artix-7 FPGA family. All architectures used an input pixel width of 8 bits and a datapath width of 16 bits. Table 5 summarizes the FPGA implementation comparison. We report averages over evaluations with seven popular (8 bits/pixel) test images, namely, "lena," "barbara," "goldhill," "boat," "mandrill," "peppers," and "zelda," obtained from the Waterloo Repertoire (http://links.uwaterloo.ca). The energy consumption is evaluated by multiplying the number of compute cycles by the average power consumption and the compute (clock) cycle duration, i.e., 5.0 ns and 1.5 ns for the architectures with and without multipliers, respectively. These clock cycle durations were selected to satisfy the CPD constraint given in Table 5, namely, a CPD of 4.8 ns for the design with multipliers and a CPD of 1.45 ns for the multiplierless design. The number of compute cycles and the average power consumption were evaluated by simulation with the Xilinx Vivado software suite, version 2018.2.
 
LUT, lookup tables; FF, flip-flops; CC, compute cycles.
We observe from Table 5 that the proposed multiplier-based LFrWF architecture requires approximately 22% fewer LUTs, 34% fewer FFs, and 50% fewer compute cycles, and consumes 65% less energy than the multiplier-based FrWF architecture. Due to the reduced number of hardware components (LUTs and FFs), the area occupied by the multiplier-based LFrWF architecture will be less than the area of the corresponding multiplier-based FrWF architecture. Moreover, the proposed multiplierless LFrWF architecture requires 2.6% fewer FFs and 50% fewer compute cycles and consumes 43% less energy than the multiplierless FrWF architecture [62]. The proposed multiplierless LFrWF architecture requires slightly more LUTs than the multiplierless FrWF architecture.
We also observe from Table 5 that the proposed LFrWF reduces the number of required compute cycles to roughly half the compute cycles required by the FrWF. More specifically, while the FrWF requires on the order of 10 million compute cycles per image, the proposed LFrWF requires only a little more than 5 million compute cycles. This substantial reduction is primarily due to the computational efficiency of the novel Lift_block (see Section 3.2.2) for computing the decomposition subband coefficients.
Moreover, we observe from Table 5 that the power consumption of the proposed multiplier-based LFrWF architecture is less than the power consumption of the corresponding multiplier-based FrWF architecture, while the multiplierless LFrWF and FrWF have approximately the same power consumption. The energy consumption is evaluated by multiplying the clock cycle duration (which is based on the CPD) by the number of clock cycles and the consumed power. Due to the roughly halved number of compute cycles and the lower (or equal) power consumption, the energy consumption levels of the proposed LFrWF architectures are substantially lower than those of the FrWF architectures. We further observe from Table 5 that, compared to the designs with multipliers, the multiplierless designs of both the LFrWF and the FrWF have the same numbers of clock cycles but shorter CPDs and (slightly) reduced power levels; thus, the multiplierless designs have substantially reduced energy consumption levels.
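The energy and EDP bookkeeping used in this evaluation can be expressed compactly as follows (a sketch by us with illustrative numbers only, not the measured Table 5 values):

```python
def energy_joules(compute_cycles, clock_period_s, avg_power_w):
    """Energy = number of cycles x clock period x average power."""
    return compute_cycles * clock_period_s * avg_power_w

def edp(compute_cycles, clock_period_s, avg_power_w):
    """Energy-delay product = energy x clock period."""
    return energy_joules(compute_cycles, clock_period_s, avg_power_w) \
        * clock_period_s
```

This formulation makes the two levers explicit: halving the compute cycles halves the energy directly, while a shorter CPD (allowing a shorter clock period) reduces both the energy and, quadratically, the EDP.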
We also observe from Table 5 that the corresponding LFrWF and FrWF architectures have the same CPD. We note that the numbers of hardware components, e.g., adders, multipliers, LUTs, and FFs, and the other parameters, such as the number of clock cycles, the memory, and the CPD, are independent of the platform on which the design is implemented and of the test image. Among the results presented in Tables 4–6, only the energy consumption, the power consumption, and the energy delay product (EDP) depend on the platform and the image.

4.3. Line Segmentation
We observe from Table 6 that increasing the number of line segments reduces the memory requirement while increasing the number of compute cycles and the energy consumption. The increases in compute cycles and energy consumption are mainly due to the overlapping coefficients at the line segment boundaries, which need to be read twice. However, for all considered line segmentations, the number of compute cycles and the energy consumption are lower than for the corresponding FrWF architectures; see Table 5. We observe from Tables 5 and 6 that even with the largest considered number of segments per line, the number of compute cycles and the energy consumption of the proposed LFrWF architectures are lower than those of the corresponding FrWF architectures. Since the FrWF architectures of [62] read only 5 pixels at a time, the segmentation approach cannot be incorporated into the FrWF architecture. Hence, the memory of the FrWF architectures cannot be further reduced through line segmentation.
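The trade-off described above can be sketched with a simple model. The sketch below is a hypothetical illustration, not the paper's exact accounting: the overlap width of 4 pixels per boundary is an assumed value that in practice depends on the 5/3 filter support and the implementation.

```python
def segmentation_tradeoff(line_width, num_segments, overlap=4):
    """Rough model of the line-segmentation trade-off (illustrative).

    Splitting each image line into num_segments pieces shrinks the line
    buffer to one segment, but every internal segment boundary re-reads
    `overlap` pixels, increasing the compute cycles per line.
    Returns (buffer size in pixels, total pixel reads per line)."""
    buffer_pixels = -(-line_width // num_segments)   # ceiling division
    extra_reads = (num_segments - 1) * overlap       # boundary pixels read twice
    return buffer_pixels, line_width + extra_reads
```

For a 512-pixel line, four segments shrink the buffer from 512 to 128 pixels at the cost of 12 extra pixel reads per line, mirroring the memory-versus-cycles trend in Table 6.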
The EDPs of the LFrWF and FrWF architectures with and without multipliers are compared in Figures 8 and 9, respectively. The EDP, which characterizes both the consumed energy and the computational performance, is evaluated by multiplying the consumed energy with the corresponding clock cycle duration. We observe from Figures 8 and 9 that the EDPs of the proposed LFrWF architectures (with and without multipliers) are lower than the EDPs of the corresponding FrWF architectures (with and without multipliers). The EDP of the proposed architecture with multipliers is approximately 65% less than that of the FrWF architecture with multipliers, and the EDP of the proposed multiplierless architecture is approximately 43% less than that of the multiplierless FrWF architecture. We observe from Figures 8 and 9 that the EDPs of the proposed LFrWF architectures increase with the number of segments. However, even with the largest number of segments per image line, the EDPs of the proposed LFrWF architectures remain lower than those of the corresponding FrWF architectures.
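The EDP definition used above can be expressed directly. The sketch below uses the paper's definition (energy times clock cycle duration); the function name, units, and numbers are illustrative assumptions.

```python
def energy_delay_product(energy_uj, cpd_ns):
    """EDP as defined in the text: consumed energy multiplied by the
    clock cycle duration (set by the CPD). Units here: uJ * ns."""
    return energy_uj * cpd_ns

# Hypothetical comparison: at equal energy, the design with the shorter
# critical path (e.g., a multiplierless design) has the lower EDP.
edp_mult = energy_delay_product(energy_uj=100.0, cpd_ns=8.0)
edp_multless = energy_delay_product(energy_uj=100.0, cpd_ns=5.0)
```

Since the multiplierless designs reduce both the consumed energy and the CPD, both factors of the EDP shrink, consistent with the gaps seen in Figures 8 and 9.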
5. Conclusion
This paper proposed and evaluated a lifting-based architecture to compute the DWT coefficients of an image based on the FrWF approach with a 5/3 filter bank. The proposed architecture requires fewer adders and multipliers than state-of-the-art architectures. The proposed architecture, in versions with and without multipliers, and the state-of-the-art FrWF architectures (with and without multipliers) [62] have been implemented on the same FPGA board and compared.
The experimental results show that the proposed LFrWF architecture with multipliers requires fewer hardware components (and thus less area) and consumes 65% less energy than the FrWF architecture with multipliers. Moreover, the proposed multiplierless LFrWF architecture consumes 43% less energy with only a slight increase in area compared to the multiplierless FrWF architecture. The lower energy consumption with minimal area overhead makes the proposed architectures promising candidates for computing the DWT of images on resource-constrained wearable sensors.
An important direction for future research is to integrate the LFrWF architecture with efficient architectures of state-of-the-art wavelet-based image coding algorithms to design FPGA-based image coders for real-time applications on wearable visual sensors and IoT platforms. Another interesting future research direction is the examination of the use of our proposed approach in the context of compressive sensing [15, 89].
Data Availability
The evaluation data used to support the findings of this study are included within the article.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
References
[1] Y. Gu, Y. Tian, and E. Ekici, "Real-time multimedia processing in video sensor networks," Signal Processing: Image Communication, vol. 22, no. 3, pp. 237–251, 2007.
[2] T. Kamal, R. Watkins, Z. Cen, J. Rubinstein, G. Kong, and W. M. Lee, "Design and fabrication of a passive droplet dispenser for portable high resolution imaging system," Scientific Reports, vol. 7, pp. 1–13, 2017.
[3] R. LeMoyne and T. Mastroianni, "Wearable and wireless gait analysis platforms: smartphones and portable media devices," in Wireless MEMS Networks and Applications, D. Uttamchandani, Ed., pp. 129–152, Woodhead Publishing, Cambridge, UK, 2017.
[4] A. Nag, S. C. Mukhopadhyay, and J. Kosel, "Wearable flexible sensors: a review," IEEE Sensors Journal, vol. 17, no. 13, pp. 3949–3960, 2017.
[5] H. Ng, W.-H. Tan, J. Abdullah, and H.-L. Tong, "Development of vision based multi-view gait recognition system with MMUGait database," The Scientific World Journal, vol. 2014, 2014.
[6] S. Plangi, A. Hadachi, A. Lind, and A. Bensrhair, "Real-time vehicles tracking based on mobile multi-sensor fusion," IEEE Sensors Journal, vol. 18, no. 24, pp. 10077–10084, 2018.
[7] S. Seneviratne, Y. Hu, T. Nguyen et al., "A survey of wearable devices and challenges," IEEE Communications Surveys & Tutorials, vol. 19, no. 4, pp. 2573–2620, 2017.
[8] L. Tsao, L. Li, and L. Ma, "Human work and status evaluation based on wearable sensors in human factors and ergonomics: a review," IEEE Transactions on Human-Machine Systems, vol. 49, no. 1, pp. 72–84, 2019.
[9] B. Velusamy and S. C. Pushpan, "An enhanced channel access method to mitigate the effect of interference among body sensor networks for smart healthcare," IEEE Sensors Journal, vol. 19, no. 16, pp. 7082–7088, 2019.
[10] G. Yang, W. Tan, H. Jin, T. Zhao, and L. Tu, "Review wearable sensing system for gait recognition," Cluster Computing, vol. 22, no. S2, pp. 3021–3029, 2019.
[11] M. Zheng, P. X. Liu, R. Gravina, and G. Fortino, "An emerging wearable world: new gadgetry produces a rising tide of changes and challenges," IEEE Systems, Man, and Cybernetics Magazine, vol. 4, no. 4, pp. 6–14, 2018.
[12] J. Wang and S. Payandeh, "Hand motion and posture recognition in a network of calibrated cameras," Advances in Multimedia, vol. 2017, Article ID 2162078, 2017.
[13] C.-H. Hsia, J.-M. Guo, and J.-S. Chiang, "A fast Discrete Wavelet Transform algorithm for visual processing applications," Signal Processing, vol. 92, no. 1, pp. 89–106, 2012.
[14] S. Soro and W. Heinzelman, "A survey of visual sensor networks," Advances in Multimedia, vol. 2009, Article ID 640386, 2009.
[15] A. S. Unde and P. P. Deepthi, "Rate-distortion analysis of structured sensing matrices for block compressive sensing of images," Signal Processing: Image Communication, vol. 65, pp. 115–127, 2018.
[16] S. Heng, C. So-In, and T. G. Nguyen, "Distributed image compression architecture over wireless multimedia sensor networks," Wireless Communications and Mobile Computing, vol. 2017, Article ID 5471721, 2017.
[17] F. Sun, C. Mao, X. Fan, and Y. Li, "Accelerometer-based speed-adaptive gait authentication method for wearable IoT devices," IEEE Internet of Things Journal, vol. 6, no. 1, pp. 820–830, 2019.
[18] H. Baali, H. Djelouat, A. Amira, and F. Bensaali, "Empowering technology enabled care using IoT and smart devices: a review," IEEE Sensors Journal, vol. 18, no. 5, pp. 1790–1809, 2018.
[19] S. Hiremath, G. Yang, and K. Mankodiya, "Wearable Internet of Things: concept, architectural components and promises for person-centered healthcare," in Proceedings of the 7th International Conference on Wireless Mobile Communication and Healthcare (MOBIHEALTH), pp. 304–307, Vienna, Austria, 2014.
[20] M. Manas, A. Sinha, S. Sharma, and M. R. Mahboob, "A novel approach for IoT based wearable health monitoring and messaging system," Journal of Ambient Intelligence and Humanized Computing, vol. 10, no. 7, pp. 2817–2828, 2019.
[21] L. E. Lima, B. Y. L. Kimura, and V. Rosset, "Experimental environments for the Internet of Things: a review," IEEE Sensors Journal, vol. 19, no. 9, pp. 3203–3211, 2019.
[22] F. Javed, M. K. Afzal, M. Sharif, and B. Kim, "Internet of Things (IoTs) operating systems support, networking technologies, applications, and challenges: a comparative review," IEEE Sensors Journal, vol. 20, no. 3, pp. 2062–2100, 2018.
[23] K. Kim, H. Kim, and H. Kim, "Image-based construction hazard avoidance system using augmented reality in wearable device," Automation in Construction, vol. 83, pp. 390–403, 2017.
[24] Z. Bauer, A. Dominguez, E. Cruz, F. Gomez-Donoso, S. Orts-Escolano, and M. Cazorla, "Enhancing perception for the visually impaired with deep learning techniques and low-cost wearable sensors," Pattern Recognition Letters, vol. 137, pp. 27–36, 2020.
[25] U. Lee, K. Han, H. Cho et al., "Intelligent positive computing with mobile, wearable, and IoT devices: literature review and research directions," Ad Hoc Networks, vol. 83, pp. 8–24, 2019.
[26] M. Ayaz, M. Ammad-uddin, I. Baig, and E.-H. M. Aggoune, "Wireless sensor's civil applications, prototypes, and future integration possibilities: a review," IEEE Sensors Journal, vol. 18, no. 1, pp. 4–30, 2018.
[27] A. Kos, V. Milutinović, and A. Umek, "Challenges in wireless communication for connected sensors and wearable devices used in sport biofeedback applications," Future Generation Computer Systems, vol. 92, pp. 582–592, 2019.
[28] M. Cagnazzo, F. Delfino, L. Vollero, and A. Zinicola, "Trading off quality and complexity for a HVQ-based video codec on portable devices," Journal of Visual Communication and Image Representation, vol. 17, no. 3, pp. 564–572, 2006.
[29] A. Chefi, A. Soudani, and G. Sicard, "Hardware compression scheme based on low complexity arithmetic encoding for low power image transmission over WSNs," AEU - International Journal of Electronics and Communications, vol. 68, no. 3, pp. 193–200, 2014.
[30] M. Chen, Y. Zhang, and C. Lu, "Efficient architecture of variable size HEVC 2D-DCT for FPGA platforms," AEU - International Journal of Electronics and Communications, vol. 73, pp. 1–8, 2017.
[31] A. Madanayake, R. J. Cintra, V. Dimitrov et al., "Low-power VLSI architectures for DCT/DWT: precision vs approximation for HD video, biomedical, and smart antenna applications," IEEE Circuits and Systems Magazine, vol. 15, no. 1, pp. 25–47, 2015.
[32] T. K. Araghi, A. A. Manaf, A. Alarood, and A. B. Zainol, "Host feasibility investigation to improve robustness in hybrid DWT-SVD based image watermarking schemes," Advances in Multimedia, vol. 2018, Article ID 1609378, 2018.
[33] H. Persson, A. Brunstrom, and T. Ottosson, "Utilizing cross-layer information to improve performance in JPEG2000 decoding," Advances in Multimedia, vol. 2007, Article ID 024758, 2007.
[34] B. Yan, T. Pei, and X. Wang, "Wavelet method for automatic detection of eye-movement behaviors," IEEE Sensors Journal, vol. 19, no. 8, pp. 3085–3091, 2019.
[35] B. D. E. Cherif and A. Bendiabdellah, "Detection of two-level inverter open-circuit fault using a combined DWT-NN approach," Journal of Control Science and Engineering, vol. 2018, Article ID 1976836, 2018.
[36] A. F. R. Guarda, J. M. Santos, L. A. da Silva Cruz, P. A. A. Assunção, N. M. M. Rodrigues, and S. M. M. de Faria, "A method to improve HEVC lossless coding of volumetric medical images," Signal Processing: Image Communication, vol. 59, pp. 96–104, 2017.
[37] M. L. L. De Faria, C. E. Cugnasca, and J. R. A. Amazonas, "Insights into IoT data and an innovative DWT-based technique to denoise sensor signals," IEEE Sensors Journal, vol. 18, no. 1, pp. 237–247, 2018.
[38] F. Han, X. Qiao, Y. Ma, W. Yan, X. Wang, and X. Pan, "Grass leaf identification using dbN wavelet and CILBP," Advances in Multimedia, vol. 2020, Article ID 1909875, 8 pages, 2020.
[39] J. G. Escobedo and K. B. Ozanyan, "Tiled-block image reconstruction by wavelet based, parallel-filtered backprojection," IEEE Sensors Journal, vol. 16, no. 12, pp. 4839–4846, 2016.
[40] D. Bhowmik and C. Abhayaratne, "2D+t wavelet domain video watermarking," Advances in Multimedia, vol. 2012, Article ID 973418, 19 pages, 2012.
[41] P. R. Hill, N. Anantrasirichai, A. Achim, M. E. Al-Mualla, and D. R. Bull, "Undecimated dual-tree complex wavelet transforms," Signal Processing: Image Communication, vol. 35, pp. 61–70, 2015.
[42] T. Brahimi, F. Laouir, L. Boubchir, and A. Ali-Chérif, "An improved wavelet-based image coder for embedded greyscale and colour image compression," AEU - International Journal of Electronics and Communications, vol. 73, pp. 183–192, 2017.
[43] H. Liu, K.-K. Huang, C.-X. Ren, Y.-F. Yu, and Z.-R. Lai, "Quadtree coding with adaptive scanning order for spaceborne image compression," Signal Processing: Image Communication, vol. 55, pp. 1–9, 2017.
[44] R. Banerjee and S. Das Bit, "An energy efficient image compression scheme for wireless multimedia sensor network using curve fitting technique," Wireless Networks, vol. 25, no. 1, pp. 167–183, 2019.
[45] M. Tükel, A. Yurdakul, and B. Örs, "Customizable embedded processor array for multimedia applications," Integration, vol. 60, pp. 213–223, 2018.
[46] D. G. Costa, "Visual sensors hardware platforms: a review," IEEE Sensors Journal, vol. 20, no. 8, pp. 4025–4033, 2020.
[47] L. Linguaglossa, S. Lange, S. Pontarelli et al., "Survey of performance acceleration techniques for network function virtualization," Proceedings of the IEEE, vol. 107, no. 4, pp. 746–764, 2019.
[48] G. S. Niemiec, L. M. S. Batista, A. E. Schaeffer-Filho, and G. L. Nazar, "A survey on FPGA support for the feasible execution of virtualized network functions," IEEE Communications Surveys & Tutorials, vol. 22, no. 1, pp. 504–525, 2020.
[49] P. Shantharama, A. S. Thyagaturu, and M. Reisslein, "Hardware-accelerated platforms and infrastructures for network functions: a survey of enabling technologies and research studies," IEEE Access, vol. 8, pp. 132021–132085, 2020.
[50] A. J. Ferrer, J. M. Marquès, and J. Jorba, "Towards the decentralised cloud: survey on approaches and challenges for mobile, ad hoc, and edge computing," ACM Computing Surveys (CSUR), vol. 51, no. 6, 2019.
[51] M. Mehrabi, D. You, V. Latzko, H. Salah, M. Reisslein, and F. H. Fitzek, "Device-enhanced MEC: multi-access edge computing (MEC) aided by end device computation and caching: a survey," IEEE Access, vol. 7, pp. 166079–166108, 2019.
[52] N. Karakoc, A. Scaglione, A. Nedic, and M. Reisslein, "Multilayer decomposition of network utility maximization problems," IEEE/ACM Transactions on Networking, vol. 28, no. 5, pp. 2077–2091, 2020.
[53] T. Luo, H.-P. Tan, and T. Q. S. Quek, "Sensor OpenFlow: enabling software-defined wireless sensor networks," IEEE Communications Letters, vol. 16, no. 11, pp. 1896–1899, 2012.
[54] P. Shantharama, A. S. Thyagaturu, N. Karakoc et al., "LayBack: SDN management of multi-access edge computing (MEC) for network access services and radio resource sharing," IEEE Access, vol. 6, pp. 57545–57561, 2018.
[55] S. Rein and M. Reisslein, "Low-memory wavelet transforms for wireless sensor networks: a tutorial," IEEE Communications Surveys & Tutorials, vol. 13, no. 2, pp. 291–307, 2011.
[56] S. Rein and M. Reisslein, "Performance evaluation of the fractional wavelet filter: a low-memory image wavelet transform for multimedia sensor networks," Ad Hoc Networks, vol. 9, no. 4, pp. 482–496, 2011.
[57] B. K. Mohanty, A. Mahajan, and P. K. Meher, "Area and power-efficient architecture for high-throughput implementation of lifting 2D DWT," IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 59, no. 7, pp. 434–438, 2012.
[58] R. K. Bhattar, K. R. Ramakrishnan, and K. S. Dasgupta, "Strip based coding for large images using wavelets," Signal Processing: Image Communication, vol. 17, no. 6, pp. 441–456, 2002.
[59] Y. Hu and C. C. Jong, "A memory-efficient scalable architecture for lifting-based discrete wavelet transform," IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 60, no. 8, pp. 502–506, 2013.
[60] L. Ye and Z. Hou, "Memory efficient multilevel discrete wavelet transform schemes for JPEG2000," IEEE Transactions on Circuits and Systems for Video Technology, vol. 25, no. 11, pp. 1773–1785, 2015.
[61] Y. Hu and V. K. Prasanna, "Energy and area-efficient parameterized lifting-based 2D DWT architecture on FPGA," in Proceedings of the IEEE High Performance Extreme Computing Conference (HPEC), pp. 1–6, MA, USA, 2014.
[62] M. Tausif, A. Jain, E. Khan, and M. Hasan, "Low memory architectures of fractional wavelet filter for low-cost visual sensors and wearable devices," IEEE Sensors Journal, vol. 20, no. 13, pp. 6863–6871, 2020.
[63] C. Zhang, C. Wang, and M. O. Ahmad, "A pipeline VLSI architecture for fast computation of the 2D discrete wavelet transform," IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 59, no. 8, pp. 1775–1785, 2012.
[64] B. K. Mohanty and P. K. Meher, "Memory-efficient high-speed convolution-based generic structure for multilevel 2D DWT," IEEE Transactions on Circuits and Systems for Video Technology, vol. 23, no. 2, pp. 353–363, 2013.
[65] Y. Zhang, H. Cao, H. Jiang, and B. Li, "Memory-efficient high-speed VLSI implementation of multilevel discrete wavelet transform," Journal of Visual Communication and Image Representation, vol. 38, pp. 297–306, 2016.
[66] C.-H. Hsia, J.-S. Chiang, and J.-M. Guo, "Memory-efficient hardware architecture of 2D dual-mode lifting-based discrete wavelet transform," IEEE Transactions on Circuits and Systems for Video Technology, vol. 23, no. 4, pp. 671–683, 2013.
[67] G. Hegde, K. S. Reddy, and T. K. Shetty Ramesh, "A new approach for 1D and 2D DWT architectures using LUT based lifting and flipping cell," AEU - International Journal of Electronics and Communications, vol. 97, pp. 165–177, 2018.
[68] M. M. A. Basiri and S. N. Mahammad, "An efficient VLSI architecture for lifting based 1D/2D discrete wavelet transform," Microprocessors and Microsystems, vol. 47, pp. 404–418, 2016.
[69] C. Wang, J. Zhou, L. Liao et al., "Near-threshold energy- and area-efficient reconfigurable DWPT/DWT processor for healthcare-monitoring applications," IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 62, no. 1, pp. 70–74, 2015.
[70] T. Wang, P. Huang, K. Chen et al., "Energy-efficient configurable discrete wavelet transform for neural sensing applications," in Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS), pp. 1841–1844, Melbourne, Australia, June 2014.
[71] G. Kumar, N. Balaji, N. Balaji, K. Reddy, and V. Thanuja, "Power and area efficient radix-8 Booth multiplier for 2D DWT architecture," International Journal of Intelligent Engineering and Systems, vol. 12, no. 3, pp. 148–155, 2019.
[72] J. Franco, G. Bernabé, J. Fernández, and M. Ujaldón, "The 2D wavelet transform on emerging architectures: GPUs and multicores," Journal of Real-Time Image Processing, vol. 7, no. 3, pp. 145–152, 2012.
[73] V. Galiano, O. López, M. P. Malumbres, and H. Migallón, "Parallel strategies for 2D Discrete Wavelet Transform in shared memory systems and GPUs," The Journal of Supercomputing, vol. 64, no. 1, pp. 4–16, 2013.
[74] T. Ikuzawa, F. Ino, and K. Hagihara, "Reducing memory usage by the lifting-based discrete wavelet transform with a unified buffer on a GPU," Journal of Parallel and Distributed Computing, vol. 93–94, pp. 44–55, 2016.
[75] W. J. van der Laan, A. C. Jalba, and J. B. T. M. Roerdink, "Accelerating wavelet lifting on graphics hardware using CUDA," IEEE Transactions on Parallel and Distributed Systems, vol. 22, no. 1, pp. 132–146, 2011.
[76] M. Kucis, D. Barina, M. Kula, and P. Zemcik, "2D discrete wavelet transform using GPU," in Proceedings of the International Symposium on Computer Architecture and High Performance Computing Workshop, pp. 1–6, Paris, France, 2014.
[77] T. M. Quan and W.-K. Jeong, "A fast discrete wavelet transform using hybrid parallelism on GPUs," IEEE Transactions on Parallel and Distributed Systems, vol. 27, no. 11, pp. 3088–3100, 2016.
[78] R. Khemiri, F. Sayadi, M. Atri, and R. Tourki, "MatLab acceleration for DWT "Daubechies 9/7" for JPEG2000 standard on GPU," in Proceedings of the Global Summit on Computer & Information Technology (GSCIT), pp. 1–4, Sousse, Tunisia, 2014.
[79] G. Suseela and Y. Asnath Victy Phamila, "Energy efficient image coding techniques for low power sensor nodes: a review," Ain Shams Engineering Journal, vol. 9, no. 4, pp. 2961–2972, 2018.
[80] H. Sun and Y. Q. Shi, Image and Video Compression for Multimedia Engineering: Fundamentals, Algorithms, and Standards, CRC Press, Boca Raton, FL, USA, 2nd edition, 2008.
[81] H. ZainEldin, M. A. Elhosseini, and H. A. Ali, "A modified listless strip based SPIHT for wireless multimedia sensor networks," Computers & Electrical Engineering, vol. 56, pp. 519–532, 2016.
[82] W. Sweldens, "The lifting scheme: a construction of second generation wavelets," SIAM Journal on Mathematical Analysis, vol. 29, no. 2, pp. 511–546, 1998.
[83] S. A. Rein, F. H. P. Fitzek, C. Gühmann, and T. Sikora, "Evaluation of the wavelet image two-line coder: a low complexity scheme for image compression," Signal Processing: Image Communication, vol. 37, pp. 58–74, 2015.
[84] A. Skodras, C. Christopoulos, and T. Ebrahimi, "The JPEG 2000 still image compression standard," IEEE Signal Processing Magazine, vol. 18, no. 5, pp. 36–58, 2001.
[85] A. D. Darji, S. S. Kushwah, S. N. Merchant, and A. N. Chandorkar, "High-performance hardware architectures for multilevel lifting-based discrete wavelet transform," EURASIP Journal on Image and Video Processing, vol. 2014, 2014.
[86] G. Savić and V. Rajović, "Novel memory efficient hardware architecture for 5/3 lifting-based 2D inverse DWT," Journal of Circuits, Systems and Computers, vol. 28, no. 7, 2018.
[87] S. M. Aziz and D. M. Pham, "Efficient parallel architecture for multilevel forward discrete wavelet transform processors," Computers & Electrical Engineering, vol. 38, no. 5, pp. 1325–1335, 2012.
[88] M. Tausif, E. Khan, M. Hasan, and M. Reisslein, "SMFrWF: segmented modified fractional wavelet filter: fast low-memory discrete wavelet transform (DWT)," IEEE Access, vol. 7, pp. 84448–84467, 2019.
[89] R. Li, H. Liu, Y. Zeng, and Y. Li, "Block compressed sensing of images using adaptive granular reconstruction," Advances in Multimedia, vol. 2016, 2016.
Copyright
Copyright © 2020 Mohd Tausif et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.