Research Article  Open Access
Qin Jiancheng, Lu Yiqin, Zhong Yu, "Fast Algorithm of Truncated Burrows-Wheeler Transform Coding for Data Compression of Sensors", Journal of Sensors, vol. 2018, Article ID 6908760, 17 pages, 2018. https://doi.org/10.1155/2018/6908760
Fast Algorithm of Truncated Burrows-Wheeler Transform Coding for Data Compression of Sensors
Abstract
Lots of sensors in the IoT (Internet of things) may generate massive data, which will challenge the limited sensor storage and network bandwidth, so the study of big data compression is very useful in the field of sensors. In practice, BWT (Burrows-Wheeler transform) can gain good compression results for some kinds of data, but the traditional BWT algorithms are neither concise nor fast enough for the hardware of sensors, which limits the BWT block size to a very small and inadequate scale. To solve this problem, this paper presents a fast algorithm of truncated BWT named “CZBWT algorithm” and implements it in the shareware named “ComZip.” CZBWT supports BWT blocks of up to 2 GB (or larger) and uses bucket sort. It is very fast, with time complexity O(N), and fits big data compression. The experiment results indicate that ComZip with the CZBWT filter is obviously faster than bzip2, and it can obtain a better compression ratio than bzip2 and p7zip in some conditions. In addition, CZBWT is more concise than current BWT with SA (suffix array) sorts and fits the hardware BWT implementation of sensors.
1. Introduction
With the rapid expansion of IoT (Internet of things), lots of sensors are available in various fields, which may generate massive data. Meanwhile, the storage capacity of sensors and the network bandwidth are limited, especially in a WSN (wireless sensor network). GBs or TBs of big data in IoT pose enormous challenges to the sensors.
Data compression is a smart way to reduce the storage usage and speed up the network transportation. In addition, BWT (Burrows-Wheeler transform [1]) can gain good compression results for some kinds of data. For example, a zone of a WSN may contain a lot of lightweight sensors that obtain temperature data, and most of the data are similar. Thus, a practical way is to use some high-performance nodes in this WSN to gather these data, compress them with BWT, and transmit them to the back-end cloud platform.
BWT is also valuable in the field of bioinformatics. For example, the big genome data need compression and indexing, and BWT is an effective way [2, 3]. The DNA data are special and fit the BWT compression. Although we cannot simply compare bioinformatics software such as BWA (Burrows-Wheeler Aligner) with universal compression software such as bzip2, analyzing their BWT algorithms is meaningful.
But a practical problem is the speed of BWT for sensors and big data. High compression speed is important because the sensors have to treat GBs of data or more, while the traditional BWT algorithms are neither concise nor fast enough, which limits the BWT block size to a very small and inadequate scale. In our previous paper, we have discussed the traditional compression software bzip2 [4]. Its BWT block size is not more than 900 KB, which limits the compression ratio. Although this block is not large enough to deal with big data, enlarging it would noticeably slow down the compression. The primary reason is the computing cost of the traditional BWT algorithms. Besides, the hardware performance and energy consumption of the sensors are limited, which makes it difficult to increase the BWT block size for big data compression.
We have designed a combined parallel algorithm named “CZ algorithm” to compress and encrypt the big data efficiently and developed our compression software named “ComZip” [5]. Now we have made ComZip compatible with Linux platforms. As mentioned in the figures of [4, 5], ComZip has a BWT filter. This paper focuses on the BWT filter and proposes a fast algorithm of truncated BWT named “CZBWT algorithm” to compress the big data efficiently. The CZBWT algorithm has the following features:
(1) It uses truncated BWT to simplify the algorithm and gain good performance.
(2) It uses bucket sort to speed up the BWT encoding and decoding with time complexity O(N), so that the BWT block size can rise to 2 GB or more to fit big data compression.
(3) It can simplify the hardware design of the BWT filter, so that the sensors may use hardware to accelerate the BWT compression.
We did some experiments on both the x86/64 and ARM (advanced RISC machines) platforms to compare the efficiency of data compression among ComZip with/without CZBWT, bzip2, and p7zip. The experiment results indicate that ComZip with the CZBWT filter is obviously faster than bzip2, and it can obtain a better compression ratio than bzip2, p7zip, and ComZip itself without the CZBWT filter in some conditions. In addition, the algorithm analysis suggests that CZBWT is more concise than current BWT with SA (suffix array) sorts and fits the hardware BWT implementation of sensors.
To enable further experiments, we provide 2 versions of ComZip on the website: for Ubuntu Linux (x86/64 platform) and Raspbian (ARM platform). Researchers may download them from http://www.28x28.com/doc/cz_bwt.html.
The remainder of this paper is structured as follows:
Section 2 describes the problems of BWT for sensors and big data compression. Section 3 introduces the algorithm of CZBWT encoding and decoding. Section 4 analyzes the complexities of the CZBWT algorithm. The experiment results are given in Section 5. The conclusions and future work are given in Section 6.
2. Problems of BWT for Sensors and Big Data Compression
Numerous sensors in IoT can generate big data, but the bottlenecks of data transportation, storage, and computation in the networks of sensors need to be eliminated. Data compression meets this requirement. Figure 1 shows a typical scene in a WSN with both lightweight and heavy nodes, where BWT is feasible.
This WSN has lots of lightweight nodes to sense the situation and generate massive data. Since they have limited energy, storage capacity, and computing resources, they cannot keep the data or achieve long-distance transportation, while a few heavy nodes in the WSN can gather and compress the data and then transport them to the back-end cloud platform. The cloud platform has plenty of resources to store, decompress, and analyze the data.
We have discussed the big data compression and encryption in the heavy nodes of a WSN in [5], but if the heavy nodes use BWT, we still have the following problems:
(1) How can the BWT block be enlarged without a rapid decrease of the encoding/decoding speed?
(2) Can we design simplified hardware BWT filters for the sensors?
A larger BWT block can gain a better compression ratio. In this paper and the previous [4, 5], we use the same definition of the compression ratio as follows:
$$R = \frac{D_{zip}}{D} \qquad (1)$$
$D_{zip}$ and $D$ are the volumes of the compressed and original data, respectively. If the original data are not compressed, $R = 1$. If the compressed data are larger than the original data, $R > 1$. Always $R > 0$. With this definition, a smaller $R$ means a better compression ratio.
Facing GBs or TBs of big data, a small block of 900 KB cannot show the power of BWT. But enlarging the block will cause the performance bottleneck. As the analyses in Section 4 reveal, BWT encoding speed depends on the string sorting algorithm, and traditional BWT encoding has the time complexity O(N^{2}lbN). N is the block size. If we change the block from 900 KB to 60 MB without any optimization, the encoding will become very slow. This is the first problem.
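To get a feel for this growth, the bound can be evaluated for the two block sizes mentioned above. The following is only a rough sketch: it compares the worst-case bound O(N^{2}lbN), not measured running times, and the helper name `worst_case_cost` is introduced here for illustration.

```python
import math

KB = 1024
MB = 1024 * KB

def worst_case_cost(n: int) -> float:
    """Worst-case bound N^2 * lb(N) of comparison-based full-string sorting."""
    return n * n * math.log2(n)

small_block = 900 * KB   # largest bzip2 block mentioned in the text
large_block = 60 * MB    # enlarged block mentioned in the text

# How much the bound grows when only the block size changes.
growth = worst_case_cost(large_block) / worst_case_cost(small_block)
print(f"block grows by {large_block / small_block:.1f}x, bound grows by about {growth:.0f}x")
```

Under these assumptions the block grows by a factor of about 68 while the bound grows by a factor of roughly 6,000, which is why a linear-time sort matters for large blocks.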
Although the hardware development improves the performance of the heavy sensors, it is still a challenge for the sensors to achieve fast BWT encoding/decoding. For example, ARM platforms have multicore CPUs with low energy consumption, and the current flash memory has enough capacity and good performance to support a large BWT block, but a practical BWT filter must be fast enough. This is the reason we consider making hardware BWT filters for the sensors.
The problem is that the complex traditional BWT algorithms bring difficulties to the hardware design. If a hardware BWT filter is very complex, its performance will be limited and its energy consumption will be high, and then it is unfit for the sensors.
To solve the problems, we need to review the main related works around sensors and big data compression.
In [5], we have discussed that current mathematical models and methods of lossless compression can be divided into 3 classes:
(1) The compression based on the probabilities and statistics
(2) The compression based on the dictionary indexes
(3) The compression based on the order and repeat of the symbols; BWT belongs to this class
Current popular compression software packages are comprehensive applications of the above basic classes, and they have different features, which determine their compression ratio and speed. In particular, to compress big data in the sensors, we have 2 requirements:
(1) Compression speed: fast enough. Since the hardware performance of a sensor is limited, the speed is very important. Software that is too slow, such as PAQ and WinUDA, is unfit for big data.
(2) Compression ratio: high. A large data window with good algorithms can benefit the compression ratio. Software with too small data windows, such as WinZip (512 KB), WinRAR (2 MB), gzip (32 KB), and bzip2 (900 KB block), is unfit for big data.
In [4, 5], we have developed and updated the compression software ComZip, and in this paper, we developed its Linux version, so that it can run in some sensors such as ARM platforms. ComZip uses all the 3 compression classes:
(1) In class 1, ComZip uses arithmetic coding [6] and the PPM (prediction by partial matching) algorithm [7], which can gain a pretty good compression ratio.
(2) In class 2, the LZ77 algorithm [8] is used, which has the advantage of speed.
(3) In class 3, BWT is used, which is the focus of this paper.
To solve the problem of BWT encoding/decoding speed, a lot of algorithms have been developed. Current string sorting algorithms for BWT can reach the linear time complexity O(N), for example, some algorithms using SA (suffix array) [9] such as the 3 most popular linear-time algorithms: KS [10], KA [11], and SAIS [12]. Among them, SAIS is currently the fastest. Moreover, further optimization of the SAIS algorithm has been studied [13], and the first linear nonrecursive algorithm, named GSACA, is a new approach for the future [14].
Although current algorithms with SA are faster than the traditional BWT algorithms, it is not so easy to apply them directly to the sensors with their hardware and energy limits. Considering the large BWT block for big data, the memory requirement of the SA construction is many times the block size. Meanwhile, if we try to design a hardware BWT filter for the sensors, we will meet the complexity of the algorithms, such as the recursive computation in SAIS [13]. GSACA is nonrecursive, but currently, it is slower than SAIS, and its memory consumption is quite large [14], which are weaknesses given the limited computing resources of the sensors.
Parallel algorithms and the hardware design of BWT are also studied to improve the speed, including the parallel architecture [3, 15] and practical hardware acceleration, for example, FPGA (field-programmable gate array) [16] and GPU (graphics processing unit) [17]. The advancement is that parallel algorithms benefit the hardware BWT performance, and the researchers tend to simplify the hardware design so that they can obtain higher speeds [3, 17], but algorithms such as SAIS are still complex for the sensors. Thus, finding faster and simpler BWT algorithms is useful.
In [3], a limited SA length k is brought into the string sorting, which can reduce the computation. We call this method “truncated BWT.” We also use truncated BWT in this paper, but the limited length is different because we do not use SA, and we use bucket sorting instead of traditional merging or comparison-based sorting.
3. CZBWT Encoding and Decoding
3.1. Concepts of CZBWT
The compression software ComZip uses the parallel pipeline named “CZ pipeline” and the truncated BWT named “CZBWT.” We have introduced the framework of CZ encoding pipeline in [5], and the reverse framework is CZ decoding pipeline. Figure 2 is the same encoding framework, and the only difference is the alternative BWT filter in use. CZBWT is working in the BWT filter.
CZBWT combines the following methods to improve the performance and simplify the algorithm design:
(1) CZBWT uses truncated string sorting instead of SA sorting. As shown in the first figure of [16], the principle of BWT is sorting the data to fit the compression. Sorting is the primary computation in BWT, which determines the performance. We use the same example as that in [16] to explain the truncated string sorting in CZBWT. Figure 3 shows the matrices for BWT sorting. The block size is N = 8. As shown in (a), the traditional BWT uses full string sorting, which needs to compare entire strings, for example, Row 0 “XYZAACOL” and Row 1 “YZAACOLX.” The sorting result of (a) is shown in (b). Column 0 “AACLOXYZ” is the sorted string, and Column 7 “ZAAOCLXY” is the BWT output string. As shown in (c), the SA sorting ought to compare the suffixes of the same string, e.g., Row 0 “XYZAACOL” and Row 1 “YZAACOL,” but the SA algorithms have been optimized to avoid such slow comparison [9–14]. Because the common SA sorting result is not always the same as the initial BWT result [1] shown in (b), a special ending symbol “$” is attached to the string tail in order to bridge the gap between the different results. This special symbol has a smaller code (e.g., −1) than any 8b binary code (0 to 255), which means the SA sorting algorithm needs special treatment for the ending symbol besides the normal 8b charset. As shown in (d), the truncated string sorting only compares short strings with length k, for example, Row 0 “XYZ” and Row 1 “YZA” with k = 3. Figure 3(e) shows the truncated string sorting result of (d). In this example, (b) and (e) are the same, but in practice, if 2 truncated strings are the same, for example, “ABC” compared with “ABC,” the sorting result depends on their original positions. So the decoding algorithm of CZBWT is different from that of common BWT.
(2) CZBWT sorts simple integers instead of strings, and in doing so it actually reverses the character sorting sequence. Figure 4 shows different types of data comparisons. Truncated string comparing, for example, “XYZ” and “ACO” in (a), can be changed into simple integer comparing if we regard “XYZ” as a 24b integer. A 64b integer can substitute a truncated string with length k ≤ 8, but on most platforms, for example, x86/64 and ARM, the LSB (least significant byte) is in the front, so the sorting sequence of the characters is reversed. As shown in (b), “Z” is the MSB (most significant byte) of the integer “XYZ,” so its actual sorting result will be the same as the reverse string sorting in (c).
(3) CZBWT uses bucket sorting instead of traditional merging or comparison-based sorting. Since the string sorting is changed into integer sorting, CZBWT can use bucket sorting. When we use truncated strings with k = 3, 256^{3} buckets are needed. If the memory is sufficient, k = 4 is feasible and 256^{4} buckets are needed. Both encoding and decoding in CZBWT use bucket sorting.
Figure 3: Matrices for BWT sorting. (a) Traditional full string sorting; (b) BWT sorted matrix; (c) Suffix array sorting; (d) Truncated string sorting; (e) Truncated BWT sorted matrix.
Figure 4: Different types of data comparisons. (a) String comparing; (b) Integer comparing; (c) Reverse string comparing.
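The effect of reading a truncated string as a little-endian integer can be checked with a few lines. This is a minimal sketch, assuming 3-byte keys read cyclically from the example block; the helper name `truncated_key` is hypothetical and not taken from ComZip.

```python
def truncated_key(block: bytes, i: int, k: int = 3) -> int:
    """Read k bytes starting at position i (cyclically) as a little-endian integer."""
    n = len(block)
    chunk = bytes(block[(i + j) % n] for j in range(k))
    return int.from_bytes(chunk, "little")

block = b"XYZAACOL"
key_xyz = truncated_key(block, 0)   # bytes "XYZ": 'Z' becomes the most significant byte
key_aco = truncated_key(block, 4)   # bytes "ACO": 'O' becomes the most significant byte

# Comparing the integers gives the same answer as comparing the reversed strings.
same_order = (key_xyz < key_aco) == (b"ZYX" < b"OCA")
print(hex(key_xyz), hex(key_aco), same_order)
```

Sorting by such keys is therefore equivalent to sorting the truncated strings read from right to left, which is the reversal described in method (2).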
3.2. CZBWT Encoding
We use another example of a BWT block with length N. There are 2 phases in CZBWT encoding:
Phase 1 (building the bucket sorting links). We assume the BWT block data is a “cycle” string s of length N, which has the following feature (2): its index wraps around, so that s[i + N] = s[i]. Each position of the block then gives a 24b integer made of 3 consecutive characters (k = 3). Then we build the bucket sorting links on s. Figure 5 shows the example of 2 links: “ZYX” and “OCA.” The bucket array has 256^{3} link headers, and all links have the same end: null pointer. We define the structure of the links and their headers as follows:
Phase 2 (outputting the sorted data). We follow each link to output the data. Figure 6 shows the example of the link “OCA,” which will output the characters “MA,” referring to Figure 3(e). And finally we output the start position of the block for CZBWT decoding.
Algorithm 1 shows the CZBWT encoding algorithm.
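The two phases can be illustrated with a short sketch. It is not Algorithm 1 itself but a simplified illustration under stated assumptions: a dictionary of position lists stands in for the 256^{3} link headers and their links, the key of a position is the k bytes that follow it in the cyclic block (compared from left to right, without the little-endian integer trick), equal keys keep their scan order, and the reported start value is the output row that holds the first byte of the block. The name `tbwt_encode` and the start convention are assumptions made here.

```python
def tbwt_encode(block: bytes, k: int = 3) -> tuple[bytes, int]:
    """Truncated BWT sketch: sort positions by the k bytes that follow them."""
    n = len(block)
    if n == 0:
        return b"", 0
    # Phase 1: drop every position into the bucket of its k-byte key.
    # Appending in scan order keeps positions with equal keys in their original order.
    buckets: dict[bytes, list[int]] = {}
    for i in range(n):
        key = bytes(block[(i + j) % n] for j in range(1, k + 1))
        buckets.setdefault(key, []).append(i)
    # Phase 2: walk the buckets in key order and emit the byte at each position.
    order = [i for key in sorted(buckets) for i in buckets[key]]
    out = bytes(block[i] for i in order)
    start = order.index(0)   # assumed convention: row holding the first byte of the block
    return out, start

encoded, start = tbwt_encode(b"XYZAACOL")
print(encoded, start)
```

For the block “XYZAACOL” this sketch yields the same output string “ZAAOCLXY” as Column 7 in Figure 3. A real implementation would replace the dictionary with a flat array of 256^{k} headers so that no key comparison is needed at all.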

3.3. CZBWT Decoding
We use the same example as shown in Figure 3, but the string is reversed into “LOCAAZYX” because of the MSB/LSB in the integer sorting. Figure 7(a) shows the full decoding matrix of CZBWT, which corresponds to Figure 3(e). And we ought to pay attention to the column numbers of this matrix: Row 6 is the reversed string “YZAACOLX,” and Row 5 is the original data, reversed string “XYZAACOL.”
Figure 7: CZBWT decoding. (a) CZBWT full decoding matrix; (b) Phase 1; (c) Phase 2; (d) Phase 3; (e) Phase 4.
Recovering the whole decoding matrix is not necessary. Because CZBWT uses truncated data sorting, its decoding is different from that of the general BWT. As shown in Figure 7, there are 4 phases in CZBWT decoding:
Phase 1 (building the second column of the matrix). As shown in Figure 7(b), this phase is the 8b integer bucket sorting. The second column is Column 7, and Column 0 stores the input data, which are the output data of CZBWT encoding. The bucket array has 2^{8} counters, so that we can scan Column 0 once and write the sorted data to Column 7.
Phase 2 (building the third column of the matrix). As shown in Figure 7(c), this phase is the 16b integer bucket sorting. The bucket array has 2^{16} counters, so that we can scan Column [0,7] once and write the sorted data to Column [7,6]. Because the amount of the 16b integers is related to the previous 8b integers, Column 7 will be the same as that in phase 1. Thus, we can write Column 6 only.
Phase 3 (building the fourth column of the matrix). As shown in Figure 7(d), this phase is the 24b integer bucket sorting. The bucket array has 2^{24} counters, and we can scan Column [0,7,6] once and write the sorted data to Column [7,6,5]. But this time, we need not write Column 5, because the link headers in phase 4 already have the same 24b sorting effect. Omitting the write-back operation simplifies the algorithm and improves the decoding speed.
Phase 4 (outputting along the bucket sorting links). As shown in Figure 7(e), this phase is the outputting of the decoded block. We can easily change the bucket array from data counters into link headers by accumulating the counter values, because Column [7,6,5] is sorted, so the current link header position plus the current counter value gives the next link header. For example, in Figures 7(d) and 7(e), we focus on Column [7,6,5] and notice
According to the “cycle” string as shown in (2), we define the data counters in Figure 7 as follows:
Then we can get the link headers in Figure 7(e) from the data counters in Figure 7(d) as follows:
Here is a trick for the algorithm optimization. The exact value of a header ought to be 1 smaller than that in (7). For example, according to (7), , while in (5), exactly. Now we explain this trick:
There are 256^{3} links in Column [7,6,5] in Figure 7(e), and their 256^{3} headers are dynamic. In this phase, the headers are calculated with (7) at first, and then each output character of string s causes the corresponding header to switch to the next link node position:
When a counter gets a value larger than 1, for example, in Figures 5 and 6 (the string is reversed), the dynamic header is useful to determine which is the current link node position. The next position of the same link is easily calculated with (8) because Column [7,6,5] in Figure 7(e) are already sorted.
The trick saves operations in the algorithm. Each time, we have to fetch a header value to locate the position and decrease the value with (8) for the future fetch, so we may simply merge the “fetch” and “decrease” operations. So long as the initial value of each header in (7) is 1 larger than the exact value, we can use the “fetch & decrease” operation each time.
Algorithm 2 shows the CZBWT decoding algorithm. We use 2 bucket arrays to mix the phases and save the time of data accessing.
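The four phases and the “fetch & decrease” trick can be illustrated with the sketch below. It is not Algorithm 2 itself. It assumes a simplified encoder convention: output rows are sorted by the k bytes that follow each byte in the cyclic block, rows with equal keys stay in position order, and the start value is the row holding the first byte of the block. Python's `sorted` and `Counter` stand in for the 8b/16b bucket sorts and the counter array, and the name `tbwt_decode` is hypothetical.

```python
from collections import Counter

def tbwt_decode(data: bytes, start: int, k: int = 3) -> bytes:
    """Invert a truncated BWT whose rows are sorted by the k bytes after each byte."""
    n = len(data)
    if n == 0:
        return b""
    # Phases 1..k-1: sorting the tuples (data byte, known key bytes) of all rows
    # yields the key columns extended by one more column.
    cols: list[list[int]] = []
    for _ in range(k - 1):
        rows = sorted((data[r], *(c[r] for c in cols)) for r in range(n))
        cols = [[row[j] for row in rows] for j in range(len(cols) + 1)]
    # Full key of the row whose position precedes the position of row r.
    prev_key = [(data[r], *(c[r] for c in cols)) for r in range(n)]
    # Last counting phase: only bucket sizes are needed, no sorted column is written.
    counts = Counter(prev_key)
    header = {}
    total = 0
    for key in sorted(counts):
        total += counts[key]
        header[key] = total          # one past the last row of this bucket
    # Phase 4: follow the links; each visit fetches and decreases the header.
    out = bytearray()
    r = start
    for _ in range(n):
        header[prev_key[r]] -= 1
        r = header[prev_key[r]]
        out.append(data[r])
    out.reverse()                    # bytes were produced from last to first
    return bytes(out)

print(tbwt_decode(b"ZAAOCLXY", 6))
```

Because the walk visits positions from the end of the block to its beginning, the first visit to a bucket must take its last row, which is why every header starts one past the end of its bucket and is decreased before use. With the input “ZAAOCLXY” the bytes appear in the order “LOCAAZYX” before the final reversal.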

3.4. Example of CZBWT Decoding
To explain how Algorithm 2 works, we provide another example of CZBWT decoding in detail.
In this example, we use the reversed string “XYXYXCOL” as the original data, which has two matches of the characters “XYX.” Figure 8 shows the CZBWT encoding of Algorithm 1. We can find the relationship between Figures 3(d) and 3(e) and Figures 8(a) and 8(c). And in Figure 8(a), characters “XYX” in positions 1 and 2 are sorted. As a result, Figure 8(b) shows the unchanged “XYX” position sequence. The similar situation is in Figure 6.
Figures 9 and 10 show decoding phases 1 to 4 according to Figure 7. As the phases and key operations are described in Algorithm 2, we can see the data changes from Figures 9 and 10, so that we can follow the process of decoding the reversed string “XOCYLYXX.”
In Figure 9(a), the input data “XOCYLYXX” are counted and then sorted in Figure 9(b). This is a typical bucket sorting with 256 counters. And the bucket sorting proceeds again in Figures 9(c) and 9(d), with 256^{2} counters. The array link stores the sorting results.
In Figure 9(e), the data are counted with 256^{3} counters, but the sorting is hidden in the calculation of link headers in Figure 9(f). Thus, the link does not store Column 5 indeed.
Figure 10 shows the data output process in phase 4. The practical output operation in Algorithm 2 is using a string s to store the output characters. Figures 10(a) and 10(b) give the example of outputting s[0] and s[1].
The start position is Row 4 in Figure 10(a). Each time, the algorithm outputs a character by the following steps:
(1) Outputting the character of Column 0 in the link.
(2) Fetching and decreasing the link header value with Column [0,7,6] in the link.
(3) Keeping the header value as the next position to output a character.
In Figure 10(b), the same 3 steps are executed again to output s[1].
The steps go on repeating until we obtain the full-length string s, which is reversed. As mentioned in Section 3.3, CZBWT decoding outputs reversed data. As CZBWT encoding also outputs reversed data through the backward links in Figures 5 and 6, this decoding algorithm reverses the data again and finally obtains the original data.
As shown in Figures 10(c) and 10(d), the decoding algorithm can maintain the correct values of the dynamic headers, for example, header[“XYX”], which keeps the proper order of the character outputs.
4. Analyses of the CZBWT Algorithm
Quite a few recent advancements of BWT algorithms are driven by the rapid development of genome information technologies [2, 3, 13, 18], and there are many DNA software packages using BWT, including DNA compression, alignment, sequencing, and indexing. Due to the difference between the DNA and common data charsets, we cannot run direct experiments to compare DNA software such as BWA (Burrows-Wheeler Aligner) with universal compression software such as ComZip or bzip2, but we can analyze their BWT algorithms to investigate their advantages and shortcomings.
4.1. Time Complexities
We may study the BWT encoding and decoding algorithms by analyzing their time and space complexities in the worst cases. First, it is known that the traditional BWT encoding algorithm has the time complexity O(N^{2}lbN). N is the block size. The analyses are as follows:
According to the principle of BWT compression [1], the key computation of BWT encoding is the string sorting, which determines the encoding speed. And the string sorting consists of 2 algorithms:
(1) The comparison of 2 strings: The length of each string is equal to the BWT block size N, so this string comparison has the time complexity O(N).
(2) The sorting of data elements: In BWT encoding, a data element is a string, and the number of strings is equal to the block size N. Some traditional sorting algorithms such as quick sort and heap sort have the time complexity O(NlbN), and it has been proven that O(NlbN) is the fastest level among all comparison-based algorithms.
From the above 2 algorithms, we find that the fastest traditional BWT encoding has the time complexity O(N^{2}lbN). It is not fast enough. As mentioned in Section 2, the fastest well-known string sorting algorithm at present is SAIS, which has the time complexity O(N). So we compare the CZBWT encoding, which is used in ComZip, with the SAIS encoding, which is used in BWA.
According to Algorithm 1, we find that the bucket sorting also has the time complexity O(N), but we can compare more details.
SAIS requires a special ending symbol for the block and recursive reduction to a shorter string [2, 13], which are more complex than CZBWT. SAIS needs to scan the BWT block more than 3 times, while according to Algorithm 1, CZBWT encoding scans the block just twice, in phases 1 and 2. Thus, CZBWT encoding is indeed faster.
GSACA also requires a special ending symbol. It is nonrecursive, and it also has 2 phases [14], but each phase has many more operations than simply scanning the block as in CZBWT encoding. These operations make GSACA slower than SAIS and CZBWT currently.
4.2. Space Complexities
The memory usage is important for the sensors. SAIS, GSACA, and CZBWT have the same space complexity O(N). In detail, BWA uses the RAM (random access memory) of 5.37N; GSACA uses 12N besides 4N for the suffix array, and CZBWT uses 4N to store the links for the block size up to 2 GB. Moreover, it is easy for CZBWT to use 5N for the block size of up to 512 GB. In this view, CZBWT needs less memory for the block than BWA with SAIS and the GSACA program.
But CZBWT requires extra RAM for the bucket array. If the block size is not more than 2 GB, a bucket counter uses 4 B, and a bucket array has 256^{k} elements. If k = 3, a bucket array uses 64 MB, which is feasible in a heavy sensor node. And if k = 4, it needs 16 GB, which is feasible in the current cloud platforms.
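These figures follow directly from the array sizes, as the short calculation below shows. It is a sketch under the assumptions stated in the text (4 B per bucket counter, 4 B or 5 B per link entry); the helper names are introduced here for illustration only.

```python
MB = 1024 ** 2
GB = 1024 ** 3

def bucket_array_bytes(k: int, counter_bytes: int = 4) -> int:
    """RAM for one bucket array with 256^k counters."""
    return (256 ** k) * counter_bytes

def link_bytes(block_size: int, index_bytes: int = 4) -> int:
    """RAM for the links: one index entry per byte of the block."""
    return block_size * index_bytes

print(bucket_array_bytes(3) // MB)   # 64 (MB) for k = 3
print(bucket_array_bytes(4) // GB)   # 16 (GB) for k = 4
print(link_bytes(2 * GB) // GB)      # 8 (GB) of links for a 2 GB block, i.e., 4N
```

A 4 B signed index can address 2^{31} positions (2 GB), and a 5 B signed index can address 2^{39} positions (512 GB), which matches the 4N and 5N figures above.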
4.3. Complexities of Hardware Design
Hardware acceleration is valuable for the sensors which have limited computing resources. Due to the complexity of SAIS, it is difficult to implement the hardware BWT with SAIS. As a contrast, CZBWT is simpler and easier for the hardware acceleration.
The figures in [16] show that the truncated BWT fits the hardware design, but it uses merge sort, which has the time complexity O(N^{2}lbN). CZBWT uses bucket sort, which is both fast and easy for the hardware design. Figure 11 shows the primary hardware design of bucket sort. Take Figure 11(a) as an example. The accumulator indicates the current position in the block; thus, it can directly connect to the address bus of the block data RAM. This RAM provides the data, for example, “LXY,” to the address bus of the bucket array RAM. Then, the latter RAM provides the current counter value to the ALU (arithmetic and logic unit), which updates the value and writes it back to the RAM. We can use simple sequence-control logic circuits to make this module work. From the hardware point of view, both (a) and (b) in Figure 11 are succinct, and their speed is easy to optimize.
Figure 11: Primary hardware design of bucket sort. (a) Counting; (b) Sorting.
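In software terms, the counting and sorting modules correspond to the two passes of a bucket (counting) sort. The sketch below uses 8b keys and 256 counters for brevity; it mirrors the read-modify-write pattern described above but is not a model of the actual circuit, and the function name is hypothetical.

```python
def bucket_sort_positions(data: bytes) -> list[int]:
    """Return the positions of data in sorted order (stable), using 256 counters."""
    counters = [0] * 256
    # Pass (a), counting: one read-modify-write of a counter per input byte.
    for b in data:
        counters[b] += 1
    # Accumulate: turn each counter into the first output slot of its bucket.
    slot = 0
    for value in range(256):
        counters[value], slot = slot, slot + counters[value]
    # Pass (b), sorting: each byte goes to the next free slot of its bucket.
    order = [0] * len(data)
    for pos, b in enumerate(data):
        order[counters[b]] = pos
        counters[b] += 1
    return order

data = b"XOCYLYXX"
order = bucket_sort_positions(data)
print(order, bytes(data[i] for i in order))
```

Both passes touch each input byte once and need only an address lookup and an increment, which is the property that makes the design easy to map to an accumulator, two RAMs, and an ALU.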
4.4. Weaknesses
As a truncated BWT, CZBWT cannot use the standard BWT decoding, which can be used by the BWT with SAIS. Both CZBWT and the standard BWT decoding have the time complexity O(N), but the latter is faster. According to Algorithm 2, CZBWT decoding has to scan the block 4 times from phases 1 to 4, while the standard BWT decoding scans the block only twice.
This weakness is acceptable for the sensors. Although CZBWT decoding is slower than the standard BWT decoding, its speed is still linear. And it has no complex implementation such as the special ending symbol and the recursive algorithm, so the hardware acceleration for CZBWT can be applied to the sensors in a relatively easy way. Moreover, the typical scene in Figure 1 implies that most of the BWT decoding events occur in the cloud platform, which has plenty of computation resources. So the speed of CZBWT decoding is fast enough in this case.
Another weakness is that the truncated BWT has lower compression ratios than the standard BWT. But we can use a larger block in CZBWT to make up for the compression ratio. The experiment results confirm this.
Table 1 shows the comparison of typical BWT algorithms. We ought to distinguish the concepts of BWT: CZBWT is a kind of truncated BWT, while bzip2, SAIS, and GSACA use standard BWT, but bzip2 uses the traditional BWT, which is slow.

5. Experimental Results
We have done some experiments to compare ComZip, WinRAR, and 7zip in [5]. The results indicate that ComZip with a large data window has a better compression ratio than WinRAR and 7zip in most cases, and its compression speed is faster than 7zip. But those experiments do not use the BWT filter, which contains the CZBWT algorithms for ComZip.
In this paper, we compare the following software in the experiments: ComZip with CZBWT, bzip2, ComZip without CZBWT, and p7zip (7zip for Linux). When we test ComZip without CZBWT, we vary its data window size. When we test ComZip with CZBWT, we vary its block size and use a fixed 4 MB data window for its LZ77 algorithm [8]. We choose a small data window of 4 MB to highlight the abilities of CZBWT, and a data window smaller than 4 MB may reduce the performance of the BWT filter.
The experiments in this paper are on 2 hardware platforms: x86/64 and ARM. Their performance may provide references for future and current heavy sensor nodes. The operating systems of both experiment platforms are Linux. We have developed ComZip for Linux, and we still provide ComZip on the website. Researchers may use it to do more experiments with new data. It can be downloaded from http://www.28x28.com/doc/cz_bwt.html.
5.1. Tests on the x86/64 Platform
This platform is a common laptop with the following hardware and software: Intel Core i7-4700MQ 4-core, 8-thread CPU, 16 GB DDR3 RAM, 128 GB SSD (solid state disk), and Ubuntu Linux 12.10 (x64). We regard this laptop as a future high-end mobile sensor when the fuel cell can provide enough energy. The software versions are ComZip v20171019 (64b), bzip2 1.0.6, and p7zip 9.18.
In this experiment, we use different data windows or block sizes to compress the same original file named “book.htm,” which is an example that some kinds of data can show the advantage of BWT in the compression ratio. This is a real Chinese bookshop data file of storage records in HTML/XML format. Its original length is 346,499,594 B. It can be downloaded from http://www.28x28.com/doc/book.htm.bz2.
Table 2 and Figure 12 show the relationship of the compressed file size and the data window/block size. From (1), we can find that this relationship is virtually the relationship of the compression ratio and the window size.

Table 3 and Figure 13 show the relationship of the compression/decompression time and the data window/block size. Figure 13 hides the decompression time because this weakness of CZBWT is analyzed in Section 4. We focus on the compression performance first, and the optimization of decompression for ComZip is our future work.

In Table 2 and Figure 12, we observe that ComZip with CZBWT has the best compression ratio among these software packages, and p7zip has the worst except for the 0.1 MB block of bzip2. The 0.1 MB block is too small for big data compression. If the block is large enough, the standard BWT in bzip2 indeed has a better compression ratio than the truncated BWT. When bzip2 uses a 0.9 MB block, ComZip has to use a block of about 7 MB to gain a better compression ratio.
But enlarging the block for bzip2 is not practical. Table 3 and Figure 13 show that bzip2 has the slowest compression speed, and its curve rises rapidly, which is consistent with the analysis that traditional BWT has the time complexity O(N^{2}lbN). We can estimate the speed of bzip2 with a 512 MB block.
According to the compression speed shown in Figure 13, ComZip with CZBWT is slower than p7zip, but their curves are close. The curve from 1 to 8 MB shows that a block smaller than 8 MB may reduce the performance of CZBWT with a 4 MB data window, and the curve from 8 to 512 MB is consistent with the analysis that CZBWT has the time complexity O(N). If we could find a universal compression software with SAIS, we suppose its curve would resemble this one for CZBWT.
ComZip without CZBWT is much faster than the others in Figure 13. We can provide 2 possible reasons. The first reason is the parallel CZ encoding pipeline, which is introduced in [5]. This platform with an 8-thread CPU, large RAM, and SSD may bring out the good performance of the pipeline. The second reason is that the data file for this experiment fits the optimized LZ77 algorithm, which is mentioned in [4], so the performance of ComZip is evident.
Above all, the experiment results on this x86/64 platform show that ComZip with CZBWT can have the best compression ratio among these software packages, and its compression speed is close to that of p7zip, which is practical for big data.
5.2. Tests on the ARM Platform
This platform is a popular Raspberry Pi 2 Model B with the following equipment: an ARM Cortex-A7 4-core CPU, 1 GB DDR RAM, a 64 GB Micro SDXC (SD eXtended Capacity) card, and Raspbian Linux 7. We regard this inexpensive Raspberry Pi as a current heavy node of mobile sensors. The software versions are ComZip v20171019 (32b), bzip2 1.0.6, and p7zip 9.20.
In this experiment, we still use different data window or block sizes to compress the same original file, “book.htm,” so that we can see how the results differ between the x86/64 and ARM platforms.
Table 4 and Figure 14 show the relationship between the compressed file size and the data window/block size, and Table 5 and Figure 15 show the relationship between the compression/decompression time and the data window/block size.


The only difference between Tables 2 and 4 is the size of the file compressed by ComZip. Even when the data window or block size is the same, ComZip generates a different compressed file. The reason is explained in [5]: ComZip is also chaotic encryption software. If the same file is compressed by ComZip twice, we get 2 thoroughly different compressed files. But the difference in their lengths is so tiny that its influence on the compression ratio can be ignored.
This experiment is limited by the platform hardware, especially the 1 GB RAM. When the block size is enlarged to 64 MB, ComZip with CZBWT aborts because of insufficient RAM. Both the bucket array and the operating system occupy extra RAM; thus, the total RAM capacity of 1 GB is inadequate. If the RAM is enlarged to 2 GB, we estimate that the workable block size may reach 300 MB.
Figure 15 shows that bzip2 is much slower than the others, and its curve also rises rapidly. ComZip with CZBWT is faster than p7zip when their data window/block size is between 2 and 8 MB and slower than p7zip in the other cases. ComZip without CZBWT is also the fastest on this platform, but the 4-core CPU limits the performance of the parallel CZ encoding pipeline.
Overall, the experiment results on this ARM platform also show that the compression speed of ComZip with CZBWT is practical. Although the block size is limited by the RAM, ComZip with CZBWT has the best compression ratio among these programs.
5.3. Tests with Other Data
BWT does not always yield a better compression ratio than methods without BWT. We find that only some special kinds of data fit BWT compression well. This experiment uses the same x86/64 platform, but the data file is changed to “lamp.vdi,” which is a real virtual machine image file of a Linux data partition. The original length of this file is 527,467,008 B.
Table 6 shows the relationship between the compressed file size and the data window/block size, and Table 7 shows the relationship between the compression/decompression time and the data window/block size.


In Table 6, we observe that bzip2 has the lowest compression ratio among these programs, and ComZip with CZBWT has the second lowest. In Table 7, we observe that bzip2 and ComZip with CZBWT are not faster than p7zip and ComZip without CZBWT. Thus, the experiment results provide an example in which BWT improves neither the compression ratio nor the speed for some kinds of data.
This paper focuses on the BWT algorithms. Researchers may use their own data to find what kinds of data fit BWT well.
All of the above experiment results support the advantages of CZBWT: its compression ratio for some kinds of data and its compression time compared with the other universal BWT compression software. These results also provide references for the performance of CZBWT running on x86/64 and ARM platforms, which suggest that using CZBWT in future and current sensors is feasible and practical.
But these results also reveal that BWT cannot always gain a better compression ratio than other compression algorithms. Thus, the BWT filter in ComZip remains optional. And compared to the standard BWT, CZBWT has a lower compression ratio, and its decompression is slower. So we regard eliminating these weaknesses of CZBWT as our future work.
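As a starting point for such a check, the sketch below compares a BWT-based codec with an LZ77-based codec on the same input using only the Python standard library. It is an illustration under stated assumptions, not the test procedure of this paper: `bz2` stands in for bzip2, `lzma` stands in for the LZMA family used by p7zip, ComZip and CZBWT are not involved, the function name `compare_codecs` is hypothetical, and the sample records are synthetic. The ratio printed is original size divided by compressed size, which is an assumption of this sketch and may differ from the definition in (1).

```python
import bz2
import lzma


def compare_codecs(data: bytes) -> dict:
    """Return compressed sizes in bytes for a few BWT and LZ77 settings."""
    sizes = {}
    # bz2 is BWT-based; compresslevel 1..9 selects a 100 kB..900 kB block.
    for level in (1, 9):
        sizes[f"bz2 block {level}00k"] = len(bz2.compress(data, compresslevel=level))
    # LZMA2 is LZ77-based; dict_size plays the role of the data window.
    for dict_mb in (1, 8):
        filters = [{"id": lzma.FILTER_LZMA2, "dict_size": dict_mb << 20}]
        packed = lzma.compress(data, format=lzma.FORMAT_XZ, filters=filters)
        sizes[f"lzma dict {dict_mb}M"] = len(packed)
    return sizes


if __name__ == "__main__":
    # Synthetic stand-in for record-like data; replace with your own file,
    # e.g. data = open(path, "rb").read() for a path of your choice.
    data = b"".join(
        b"<tr><td>book %d</td><td>%d</td></tr>\n" % (i, i % 97)
        for i in range(20000)
    )
    for name, size in compare_codecs(data).items():
        print(f"{name}: {size} B, ratio {len(data) / size:.2f}")
```

Running the same function on several files of different types gives a quick indication of which kinds of data favor the BWT-based codec.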
6. Conclusions and Future Work
The rapid expansion of IoT leads to numerous sensors, which generate massive data and bring challenges of data transmission and storage. A valuable way to address these challenges is data compression, and BWT can gain good compression ratios for some kinds of data, so it can be used in sensors.
But problems remain when BWT is used in sensors for big data. Due to the limited computation resources of each sensor, enlarging the BWT block without a rapid decrease of the encoding/decoding speed is a problem. If the sensor needs hardware acceleration for BWT, simplifying the complex BWT for the hardware design is another problem.
To solve these problems, this paper presents the CZBWT algorithm, a fast algorithm of truncated BWT using bucket sort. CZBWT is implemented in the shareware ComZip. It currently supports BWT blocks up to 2 GB, and it can easily be extended to support larger blocks, which meets the requirements of big data compression.
The analyses indicate that CZBWT encoding has the time complexity O(N), and it is faster than BWT encoding with SA-IS. The space complexity of CZBWT encoding is also O(N), and it uses less RAM than encoding with SA-IS if the block size is large enough and the RAM for the bucket array can be ignored. The preliminary hardware design of the bucket sort suggests that hardware acceleration for CZBWT is relatively easy to realize.
The experiment results support that ComZip with CZBWT is obviously faster than bzip2, and it can obtain a better compression ratio than bzip2, p7zip, and ComZip without CZBWT for some kinds of data. These results also provide references for the performance of CZBWT running on x86/64 and ARM platforms, which suggest that using CZBWT in future and current sensors is feasible and practical.
On the other hand, these experiment results also confirm the weaknesses of CZBWT identified in the analyses. Compared to the standard BWT, CZBWT has a lower compression ratio, and its decompression is slower. How can the loss of the compression ratio be analyzed for the truncated BWT? Can we enhance the truncated BWT encoding to obtain a better compression ratio? Can we change CZBWT into the standard BWT and keep its advantages? Can we optimize the decompression algorithms of ComZip, especially CZBWT, to get better speed? Solving these problems is our future work.
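To make the general idea concrete, the following is a minimal sketch of a truncated BWT built on stable bucket passes. It is not the CZBWT implementation in ComZip: the function name `truncated_bwt`, the truncation depth `k`, and the use of an LSD radix sort over cyclic rotations are assumptions made for illustration only. Rotations are ordered by their first `k` bytes, ties keep their original position order, and the output is the byte preceding each sorted rotation.

```python
def truncated_bwt(block: bytes, k: int = 2) -> bytes:
    """Truncated BWT sketch: sort rotations by their first k bytes only."""
    n = len(block)
    if n == 0:
        return b""
    # Rotation start positions, initially in position order.
    order = list(range(n))
    # LSD radix sort: one stable bucket pass per context byte, starting
    # from the last context byte so that the first byte dominates.
    for depth in range(k - 1, -1, -1):
        buckets = [[] for _ in range(256)]
        for start in order:
            buckets[block[(start + depth) % n]].append(start)
        order = [start for bucket in buckets for start in bucket]
    # Emit the byte that cyclically precedes each sorted rotation
    # (index -1 wraps to the last byte when start is 0).
    return bytes(block[start - 1] for start in order)


if __name__ == "__main__":
    print(truncated_bwt(b"banana", k=1))
    print(truncated_bwt(b"banana", k=6))
```

Each pass touches every position once, so for a fixed `k` the work grows linearly with the block length. A practical encoder would also record what the decoder needs to invert the transform, which this sketch omits.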
Conflicts of Interest
The authors declare that there is no conflict of interest regarding the publication of this paper.
References
[1] M. Burrows and D. J. Wheeler, A Block-Sorting Lossless Data Compression Algorithm, Systems Research Center, Palo Alto, CA, USA, 1994.
[2] H. Li and R. Durbin, “Fast and accurate short read alignment with Burrows–Wheeler transform,” Bioinformatics, vol. 25, no. 14, pp. 1754–1760, 2009.
[3] Y. Liu, T. Hankeln, and B. Schmidt, “Parallel and space-efficient construction of Burrows-Wheeler transform and suffix array for big genome data,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 13, no. 3, pp. 592–598, 2016.
[4] J. C. Qin and Z. Y. Bai, “Design of new format for mass data compression,” The Journal of China Universities of Posts and Telecommunications, vol. 18, no. 1, pp. 121–128, 2011.
[5] Q. Jiancheng, L. Yiqin, and Z. Yu, “Parallel algorithm for wireless data compression and encryption,” Journal of Sensors, vol. 2017, Article ID 4209397, 11 pages, 2017.
[6] A. Moffat, R. M. Neal, and I. H. Witten, “Arithmetic coding revisited,” ACM Transactions on Information Systems, vol. 16, no. 3, pp. 256–294, 1998.
[7] A. Moffat, “Implementing the PPM data compression scheme,” IEEE Transactions on Communications, vol. 38, no. 11, pp. 1917–1921, 1990.
[8] J. Ziv and A. Lempel, “A universal algorithm for sequential data compression,” IEEE Transactions on Information Theory, vol. 23, no. 3, pp. 337–343, 1977.
[9] S. J. Puglisi, W. F. Smyth, and A. H. Turpin, “A taxonomy of suffix array construction algorithms,” ACM Computing Surveys, vol. 39, no. 2, p. 4, 2007.
[10] J. Kärkkäinen, P. Sanders, and S. Burkhardt, “Linear work suffix array construction,” Journal of the ACM, vol. 53, no. 6, pp. 918–936, 2006.
[11] P. Ko and S. Aluru, “Space efficient linear time construction of suffix arrays,” Journal of Discrete Algorithms, vol. 3, no. 2–4, pp. 143–156, 2005.
[12] G. Nong, S. Zhang, and W. H. Chan, “Two efficient algorithms for linear time suffix array construction,” IEEE Transactions on Computers, vol. 60, no. 10, pp. 1471–1484, 2011.
[13] N. Timoshevskaya and W. C. Feng, “SAIS-OPT: on the characterization and optimization of the SA-IS algorithm for suffix array construction,” in 2014 IEEE 4th International Conference on Computational Advances in Bio and Medical Sciences (ICCABS), pp. 1–6, Miami, FL, USA, 2014.
[14] U. Baier, Linear-time suffix sorting – a new approach for suffix array construction, [M.S. thesis], Ulm University, Baden-Württemberg, Germany, 2015.
[15] V. Z. Grajeda, C. F. Uribe, and R. C. Parra, “Parallel hardware/software architecture for the BWT and LZ77 lossless data compression algorithms,” Computación y Sistemas, vol. 10, no. 2, pp. 172–188, 2006.
[16] U. I. Cheema and A. Khokhar, “A high performance architecture for computing Burrows-Wheeler transform on FPGAs,” in 2013 International Conference on Reconfigurable Computing and FPGAs (ReConFig), pp. 1–6, Cancun, YUC, Mexico, 2013.
[17] M. Deo and S. Keely, “Parallel suffix array and least common prefix for the GPU,” ACM SIGPLAN Notices, vol. 48, no. 8, p. 197, 2013.
[18] C. H. Chang, M. T. Chou, Y. C. Wu et al., “sBWT: memory efficient implementation of the hardware-acceleration-friendly Schindler transform for the fast biological sequence mapping,” Bioinformatics, vol. 32, no. 22, pp. 3498–3500, 2016.
Copyright
Copyright © 2018 Qin Jiancheng et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.