Abstract

We present a lossless and low-complexity image compression algorithm for endoscopic images. The algorithm consists of a static prediction scheme and a combination of Golomb-Rice and unary encoding. It does not require any buffer memory and is suitable for use with any commercial low-power image sensor that outputs image pixels in raster-scan fashion. The proposed lossless algorithm achieves a compression ratio of approximately 73% for endoscopic images. Compared to existing lossless compression standards such as JPEG-LS, the proposed scheme has a better compression ratio, lower computational complexity, and a lower memory requirement. The algorithm is implemented in a 0.18 μm CMOS technology; it occupies 0.16 mm × 0.16 mm of silicon area and consumes 18 μW of power when working at 2 frames per second.

1. Introduction

Wireless capsule endoscopy (WCE) [1–4] is a state-of-the-art technology for obtaining images of the human intestine for medical diagnostics. In this technique, the patient ingests a specially designed electronic capsule with imaging and wireless circuitry embedded inside (as shown in Figure 1). While the capsule travels through the gastrointestinal (GI) tract, it captures images and sends them wirelessly to an external workstation (e.g., a PC), where the images are reconstructed and displayed on a monitor for medical diagnostics. The development of wireless capsule endoscopy has turned video endoscopy of the small intestine into a much less invasive and more complete examination. The increasing use of these resources, together with the comfort and ease with which some of these examinations can be performed, makes it likely that wireless capsule video imaging will have a substantial impact on the management of small intestinal disease as well as diseases of other parts of the body. The capsule runs on button batteries that need to supply power for about 8–10 hours [1]. In this paper, we focus on the image compressor in the capsule. We propose an image compression algorithm that exploits the unique properties of endoscopic images. The scheme consists of a simple, static prediction scheme followed by encoding of the prediction error using both Golomb-Rice [5, 6] and unary codes. The algorithm is particularly suitable for use with commercial low-power image sensors [7, 8] that output image pixels in raster-scan fashion, eliminating the need for a large buffer memory to store the complete image frame. The proposed algorithm has low computational complexity and is simple to implement.

Several works have been reported on the image compressor of the capsule. In [9–11], discrete cosine transform (DCT) based image compressors are proposed. In DCT-based image compressors, 4×4 or 8×8 pixel blocks need to be accessed from the image sensor. However, commercial CMOS image sensors [7, 8] send pixels in a row-by-row (i.e., raster-scan) fashion and do not provide buffer memory. Therefore, to implement these DCT-based algorithms, a buffer memory must be implemented inside the capsule to store an image frame. For instance, storing a 256×256 color image (i.e., 24 bits per pixel) requires a buffer memory of at least 192 kB. Such a memory occupies a large area and consumes a significant amount of power, which can be a noticeable overhead. Moreover, the computational cost of such transform coding (i.e., multiplications, additions, data scheduling, etc.) results in high area and power consumption. In addition, DCT-based compression algorithms are lossy, whereas lossless images are more desirable for proper medical diagnostics. Among lossless image compression standards, JPEG-LS [12, 13] can be a good choice for the endoscopic capsule application, because it can work with pixels arriving in raster-scan fashion and does not need to buffer the whole image in memory. In [14, 15], the design of an image compressor based on the JPEG-LS algorithm is described. However, it needs memory to store at least one row of the image and approximately 1.9 kB of register arrays to store other key control parameters and contexts of JPEG-LS [16].
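The 192 kB figure follows directly from the frame dimensions. A minimal Python check, assuming 1 kB = 1024 bytes (our assumption; the text does not state the convention), with a one-row buffer shown for comparison:

```python
def frame_buffer_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Bytes needed to hold width x height pixels uncompressed."""
    return width * height * bits_per_pixel // 8


size = frame_buffer_bytes(256, 256, 24)      # full 256x256 color frame
print(size, "bytes =", size // 1024, "kB")   # 196608 bytes = 192 kB

row = frame_buffer_bytes(256, 1, 24)         # a single image row
print(row, "bytes")                          # 768 bytes
```

The two orders of magnitude between a full frame and a single row are why raster-scan schemes (JPEG-LS, or the mode-1 predictor used later) are attractive inside the capsule.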

The rest of the paper is organized as follows. Section 2 describes the design criteria of the image compressor. Section 3 presents several analyses of endoscopic images. Section 4 describes the proposed compression algorithm and its performance. Section 5 discusses the complexity of the proposed algorithm. Section 6 presents the hardware architecture of the compressor, and Section 7 concludes the paper.

2. Design Criterion

Considering the application needs and the operating environment, we have set the following design criteria:
(i) The capsule should operate at low power, as its battery life is limited. Therefore, image-processing algorithms with high complexity (such as transform coding, wavelet decomposition, etc.) cannot be performed within the capsule. Hence, we focus on algorithms with very low complexity. This also leaves room to add further features (e.g., robotic capability, speed control [17–20], etc.) to the available capsules.
(ii) The size of a typical capsule is limited to 28 mm × 11 mm [2]. Thus, area is a critical issue that limits the usable memory size, since memory consumes considerable silicon area and power. Here, we focus on algorithms that do not require much memory.
(iii) The compressor should be able to work with commercially available CMOS image sensors, which send data in raster-scan fashion.
(iv) For an accurate diagnosis, the quality of the reconstructed image is very important. So, we focus on lossless image compression algorithms.
(v) Finally, the compression algorithm must reduce the data sufficiently that the transmitter can send images at approximately 2 frames per second (fps) within its limited bandwidth.

3. Analysis of Endoscopic Images

In this section, we present our analysis of several endoscopic color images from [21]. One hundred test images taken from twenty different positions of the GI tract were used for the study; 20 of them are shown in Figure 17.

3.1. RGB to YUV Conversion

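First of all, we convert the test images from the RGB color space to YUV using

$$\begin{aligned}
Y &= \left\lfloor \frac{66R + 129G + 25B + 128}{256} \right\rfloor + 16,\\
U &= \left\lfloor \frac{-38R - 74G + 112B + 128}{256} \right\rfloor + 128,\\
V &= \left\lfloor \frac{112R - 94G - 18B + 128}{256} \right\rfloor + 128.
\end{aligned} \tag{1}$$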
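A minimal Python sketch of the integer conversion, assuming the usual ITU-R BT.601 integer form with U = (−38R − 74G + 112B + 128)/256 + 128; its three U coefficients sum to zero, so gray pixels map to U = V = 128. The division by 256 is modelled as an arithmetic right shift by 8, which floors negative values; this mirrors how such a converter is typically built in hardware but is our assumption, not a detail from the text.

```python
def rgb_to_yuv(r: int, g: int, b: int) -> tuple[int, int, int]:
    """Integer RGB -> YUV conversion of Eq. (1) for 8-bit inputs.

    The +128 term rounds before the divide-by-256, done here as >> 8
    (Python's >> is an arithmetic shift, i.e. it floors negatives).
    """
    y = ((66 * r + 129 * g + 25 * b + 128) >> 8) + 16
    u = ((-38 * r - 74 * g + 112 * b + 128) >> 8) + 128
    v = ((112 * r - 94 * g - 18 * b + 128) >> 8) + 128
    return y, u, v


print(rgb_to_yuv(0, 0, 0))        # (16, 128, 128)   black
print(rgb_to_yuv(255, 255, 255))  # (235, 128, 128)  white
print(rgb_to_yuv(255, 0, 0))      # (82, 90, 240)    pure red
```

Black and white differ only in 𝑌, which illustrates why the 𝑈 and 𝑉 planes of the reddish, color-homogeneous GI images stay nearly constant.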

Figure 2 shows a 3D plot of the component values (red, green, and blue) of an endoscopic image against pixel position. In the RGB space, the pixel values change considerably, which indicates that each of the three components carries substantial information. After conversion to YUV, the changes in pixel values are reduced, as shown in Figure 3. In YUV, the intensity (luminance) is stored in the 𝑌 component, and the 𝑈 and 𝑉 components contain the color information (chrominance). Figure 3 shows that the 𝑈 and 𝑉 component values vary little with position and remain almost constant across the image plane. Note also that, in theory, this transformation neither degrades the image quality nor loses any information. The small information content of the 𝑈 and 𝑉 planes allows them to be coded with fewer bits without losing any information.

In addition, most commercially available image sensors are now equipped with an RGB-to-YUV converter. The proposed scheme takes advantage of this on-chip converter; that is, (1) does not need to be implemented inside the image compressor, which saves area and power in the compressor.

3.2. Prediction

The JPEG compression standard supports a lossless mode where a simple lossless predictive coding (LPC) method is employed [5]. A predictor combines the values of up to three neighboring samples (𝐴, 𝐵, and 𝐶) to form a prediction of the sample indicated by 𝑋 in Figure 4.

Among the seven prediction modes shown in Figure 5, mode 1 is the simplest to implement in hardware: because image sensors send pixels in raster-scan fashion, mode-1 prediction requires no buffer memory. On the other hand, mode 4 gives the most accurate prediction. Mode 4 has a geometric interpretation, shown in Figure 6: if each pixel is interpreted as a point in three-dimensional space with the pixel's intensity as its height, then the value 𝐴 + 𝐵 − 𝐶 places the prediction 𝑋_𝑝 on the plane defined by the pixels 𝐴, 𝐵, and 𝐶 [5]. This produces the most accurate prediction when there is no sharp edge in the neighborhood. However, implementing mode-4 prediction in hardware requires buffering the row preceding the current pixel, which increases the hardware cost.
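For reference, the seven predictors of the lossless JPEG mode can be written as below. This sketch assumes the neighbor layout of the JPEG standard (𝐴 = left, 𝐵 = above, 𝐶 = upper-left of 𝑋), which is consistent with mode 1 needing no row buffer; the halving in modes 5 to 7 is done with an arithmetic right shift, as in the standard.

```python
def predict(mode: int, a: int, b: int, c: int) -> int:
    """Lossless-JPEG predictor X_p (a = left, b = above, c = upper-left)."""
    if mode == 1:
        return a                     # only the previous pixel in the same row
    if mode == 2:
        return b
    if mode == 3:
        return c
    if mode == 4:
        return a + b - c             # planar prediction (Figure 6)
    if mode == 5:
        return a + ((b - c) >> 1)
    if mode == 6:
        return b + ((a - c) >> 1)
    if mode == 7:
        return (a + b) >> 1
    raise ValueError(f"unknown prediction mode {mode}")


print([predict(m, 10, 20, 5) for m in range(1, 8)])  # [10, 20, 5, 25, 17, 22, 15]
```

Only mode 1 avoids 𝐵 and 𝐶, which is why every other mode implies a one-row buffer in a raster-scan design.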

For a comparative study, the compression ratio for each prediction mode is provided in Table 2.

The prediction 𝑋_𝑝 is subtracted from the actual value of the sample 𝑋 as shown in (2), and the difference 𝑑𝑋 is then entropy encoded:

$$dX = X - X_p. \tag{2}$$

3.3. Difference Encoding

The next important step is to find a suitable variable-length code. For this purpose, we analyse the 𝑑𝑋 values of the 𝑌, 𝑈, and 𝑉 components for mode-1 prediction. These differences (i.e., 𝑑𝑌, 𝑑𝑈, and 𝑑𝑉) and their frequencies of occurrence in an endoscopic image and in the standard "baboon" image are shown in Figures 7(a) and 7(b), respectively. For endoscopic images, the 𝑑𝑈 and 𝑑𝑉 values occur in a narrower range than for standard images. This is due to the color homogeneity of the GI tract and the absence of bluish objects in the human intestine. The plots in Figure 7 show two-sided geometric distributions.
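The histograms of Figure 7 are built from mode-1 differences. A minimal sketch for one row of one component follows; the prediction used for the first pixel of a row (`first_pred`) is a hypothetical choice, since the text does not state it, and the sample values are made up for illustration.

```python
from collections import Counter


def mode1_differences(row: list[int], first_pred: int = 128) -> list[int]:
    """dX = X - X_p with mode-1 prediction X_p = A (the left neighbour).

    first_pred is a hypothetical prediction for the first pixel of the row.
    """
    diffs = []
    prev = first_pred
    for x in row:
        diffs.append(x - prev)
        prev = x
    return diffs


u_row = [120, 120, 121, 121, 120, 120, 120, 119]   # made-up U samples
d = mode1_differences(u_row, first_pred=120)
print(d)                           # [0, 0, 1, 0, -1, 0, 0, -1]
print(sorted(Counter(d).items()))  # [(-1, 2), (0, 5), (1, 1)]
```

A peak at zero with rapidly decaying tails on both sides is the two-sided geometric shape the encoder is designed around.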

3.3.1. Golomb-Rice Coding for 𝑑𝑌 Component

For a geometric distribution, the Golomb code gives the optimum code length [6]. The Golomb-Rice code is a simpler version of the Golomb code that is easier to implement in hardware and has compression efficiency close to that of the Golomb code [5]. Hence, we have chosen it to encode the difference values (𝑑𝑋). Since 𝑑𝑋 can be either positive or negative, whereas the Golomb-Rice code works only with non-negative integers, we map non-negative 𝑑𝑋 to even integers and negative 𝑑𝑋 to odd integers using (3). The mapped integers (𝑚_𝑑𝑋) are then encoded using the Golomb-Rice code:

$$m_{dX} = \begin{cases} 2\,dX, & dX \ge 0,\\ 2\,|dX| - 1, & dX < 0. \end{cases} \tag{3}$$

Our experiments show that the values of 𝑑𝑌 generally stay within the range −128 to +127, because endoscopic images do not contain extremely sharp changes between consecutive pixels. The 𝑑𝑈 and 𝑑𝑉 values vary in an even narrower range. It can therefore be safely assumed that the mapped integers 𝑚_𝑑𝑋 range from 0 to 255 and can be expressed in binary using 8 bits. The optimized Golomb-Rice coding is as follows.
(i) We define the variable 𝐼 as
$$I = 2^8 = 256. \tag{4}$$
(ii) 𝑚_𝑑𝑋 is divided by 𝑀, a predefined integer that is a power of 2, as shown in (5):
$$M = 2^k, \qquad q = \left\lfloor \frac{m_{dX}}{M} \right\rfloor, \qquad r = m_{dX} \bmod M. \tag{5}$$
(iii) The quotient 𝑞 is expressed in unary using 𝑞 + 1 bits. The remainder 𝑟, expressed in binary using 𝑘 bits, is then concatenated to the unary code. Because the Golomb-Rice code becomes very long for large values, its length is limited by a parameter 𝑔limit. If 𝑞 ≥ 𝑔limit − log₂𝐼 − 1, the unary code of 𝑔limit − log₂𝐼 − 1 is generated instead. This acts as an escape code for the decoder and is followed by the binary representation of 𝑚_𝑑𝑋 in log₂𝐼 bits.
(iv) The maximum Golomb-Rice code length 𝑔limit is chosen as 32. The length of a Golomb-Rice code can then be calculated using (6):
$$j = g_{\text{limit}} - \log_2 I - 1, \qquad \mathit{gr\_len} = \begin{cases} q + 1 + k, & q < j,\\ g_{\text{limit}}, & q \ge j. \end{cases} \tag{6}$$
With 𝑔limit = 32 and log₂𝐼 = 8, 𝑗 = 23, and an escaped code occupies (𝑗 + 1) + 8 = 32 bits.
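Steps (i) to (iv) can be sketched in Python as follows. The bit polarity of the unary part (𝑞 zeros followed by a terminating 1, as in JPEG-LS) is our assumption, since the text specifies only its length; codes are returned as '0'/'1' strings for readability rather than as packed bits.

```python
LOG2_I = 8     # log2(I) with I = 2**8 = 256, Eq. (4)
G_LIMIT = 32   # maximum code length g_limit


def map_error(dx: int) -> int:
    """Eq. (3): non-negative dX -> even integers, negative dX -> odd integers."""
    return 2 * dx if dx >= 0 else 2 * (-dx) - 1


def golomb_rice(m: int, k: int, limit: int = G_LIMIT) -> str:
    """Length-limited Golomb-Rice code of m_dX, returned as a '0'/'1' string.

    Assumption: the unary part is q zeros followed by a terminating 1
    (the JPEG-LS convention); the text does not fix the bit polarity.
    """
    j = limit - LOG2_I - 1                  # escape threshold, Eq. (6)
    q, r = m >> k, m & ((1 << k) - 1)       # Eq. (5) with M = 2**k
    if q < j:
        rem = format(r, f"0{k}b") if k > 0 else ""
        return "0" * q + "1" + rem          # q + 1 + k bits
    # Escape: unary code of j, then m_dX in log2(I) bits -> exactly `limit` bits
    return "0" * j + "1" + format(m, f"0{LOG2_I}b")


print(golomb_rice(map_error(-3), 2))   # m_dX = 5 -> q = 1, r = 1 -> 0101
```

Because 𝑀 is a power of two, the division and modulo reduce to a shift and a mask, which is what makes this code cheap in hardware.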

The length of the Golomb-Rice code for different values of 𝑘 is shown in Figure 8. Figure 7 shows that the most frequent value of 𝑑𝑈 and 𝑑𝑉 is zero and that the remaining values are very close to zero. For 𝑑𝑌, a wider range of values occurs. We therefore choose 𝑘 = 2 for encoding the mapped integers of 𝑑𝑌.
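Equation (6) makes the trade-off of Figure 8 easy to tabulate. The sketch below computes code lengths for a few 𝑘 values and, as an illustration only, picks the 𝑘 that minimizes the total length over a histogram of mapped values. The paper fixes 𝑘 = 2 from its observations; the `best_k` helper is a hypothetical way to check such a choice on data, not the authors' procedure.

```python
def gr_len(m: int, k: int, g_limit: int = 32, log2_i: int = 8) -> int:
    """Length in bits of the length-limited Golomb-Rice code, Eq. (6)."""
    j = g_limit - log2_i - 1
    q = m >> k
    return q + 1 + k if q < j else g_limit


def best_k(hist: dict[int, int], candidates=range(8)) -> int:
    """Hypothetical helper: k minimizing the total code length for a
    histogram {m_dX: count}. Ties go to the smallest k."""
    return min(candidates,
               key=lambda k: sum(n * gr_len(m, k) for m, n in hist.items()))


for k in range(4):
    print(k, [gr_len(m, k) for m in (0, 1, 4, 16, 64, 255)])
# 0 [1, 2, 5, 17, 32, 32]
# 1 [2, 2, 4, 10, 32, 32]
# 2 [3, 3, 4, 7, 19, 32]
# 3 [4, 4, 4, 6, 12, 32]
```

A histogram concentrated at zero favors 𝑘 = 0 (unary), while a wider spread, like that of 𝑑𝑌, favors a larger 𝑘.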

3.3.2. Unary Coding for 𝑑𝑈 and 𝑑𝑉 Component

The values 𝑑𝑈 = 0 and 𝑑𝑉 = 0 occur very frequently, as shown in Figure 7. A Golomb-Rice code with 𝑘 = 2 needs 3 bits to represent "0", whereas a unary code needs only 1 bit. To achieve good compression, shorter codes must be assigned to zero and near-zero values. Because 𝑑𝑈 and 𝑑𝑉 are mostly zero (and the remaining values are very close to zero), unary coding gives better compression for these components. A unary code can be generated by setting 𝑘 = 0 in the Golomb-Rice encoder. We set the maximum unary code length 𝑢limit to 32, the same as 𝑔limit for the Golomb-Rice code. Note that unary codes become very long for large values, as shown in Figure 8. Since 𝑑𝑌 has a wider range of values than 𝑑𝑈 and 𝑑𝑉 (Figure 7), unary coding of 𝑑𝑌 would not improve the compression ratio. The coding scheme is summarized in Table 1.

3.4. Corner Clipping

WCE images are generally displayed inside a circular area, as shown in Figure 9(a), owing to the cylindrical shape of the GI tract. The black corners of the image have no diagnostic value, so the capsule may discard these pixels during compression and thereby achieve a higher compression ratio. From a hardware implementation point of view, it is easier to cut the four corners diagonally than to cut the image along a circle.

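For a square image of side 𝑊, as shown in Figure 9(b), we calculate using (7) the maximum value of 𝛼 for which no pixel inside the circle of radius 𝑊/2 is discarded:

$$\alpha = \frac{d - W/2}{\sin 45^\circ} = \sqrt{2}\left(\frac{W}{\sqrt{2}} - \frac{W}{2}\right) = W\left(1 - \frac{1}{\sqrt{2}}\right) \approx 0.29\,W, \tag{7}$$

where 𝑑 = 𝑊/√2 is the distance from a corner of the square to its center. For a 256×256 image, 𝛼 = 75. Once 𝛼 is determined, the clipping algorithm can be implemented using a few combinational logic blocks, as shown in Figure 10.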
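A minimal sketch of the corner test that the combinational logic of Figure 10 performs. The 0-based indices, the strict inequality, and measuring the distance along the two edges from the nearest corner are our assumptions about the boundary convention; 𝛼 is rounded to the nearest pixel so that it matches the value 75 quoted above.

```python
import math


def clip_alpha(width: int) -> int:
    """Eq. (7): alpha = W * (1 - 1/sqrt(2)), rounded to the nearest pixel."""
    return round(width * (1 - 1 / math.sqrt(2)))


def is_clipped(col: int, row: int, width: int, height: int, alpha: int) -> bool:
    """True if the pixel lies in one of the four diagonal corner triangles.

    Assumption: 0-based indices and a strict inequality on the distance,
    measured along the two edges, from the nearest image corner.
    """
    c2 = width - 1 - col     # distance from the right edge
    r2 = height - 1 - row    # distance from the bottom edge
    return min(col + row, c2 + row, col + r2, c2 + r2) < alpha


alpha = clip_alpha(256)
print(alpha)  # 75
clipped = sum(is_clipped(c, r, 256, 256, alpha)
              for r in range(256) for c in range(256))
print(clipped)  # 11400 pixels, about 17% of the frame
```

Each corner triangle holds 75 × 76 / 2 = 2850 pixels, so the four corners account for 11400 of the 65536 pixels that never need to be encoded.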

4. Proposed Compression Algorithm

Based on the above analysis, we have constructed a new image compression algorithm dedicated to the capsule endoscopy application. The image compressor reads the pixel values of the image in raster-scan fashion. The overall algorithm is shown as a block diagram in Figure 11. An encoder and a decoder implementing the proposed compression scheme have been written in PC software and verified.

We have applied the proposed compression algorithm with each of the seven prediction schemes to all one hundred test endoscopic images of the GI tract, from the larynx to the anus; the average compression ratios are shown in Table 2. The compression ratio (CR) is calculated using (8), and the overall peak signal-to-noise ratio (PSNR) is calculated using (9):

$$\mathrm{CR} = \left(1 - \frac{\text{Total Bits After Compression}}{\text{Total Bits Before Compression}}\right) \times 100, \tag{8}$$

$$\mathrm{PSNR}_{\text{overall}} = 20 \log_{10} \frac{255}{\sqrt{\dfrac{1}{M \times N \times 3} \displaystyle\sum_{c=1}^{3} \sum_{n=1}^{N} \sum_{m=1}^{M} \left(x_{m,n,c} - \hat{x}_{m,n,c}\right)^2}}, \tag{9}$$

where 𝑀 and 𝑁 are the image width and height, respectively; 𝑥 and 𝑥̂ are the original and reconstructed component values, respectively; and 𝑐 indexes the three colour components red, green, and blue.
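Equations (8) and (9) translate directly into code. In this sketch the image components are passed as flat sequences of 8-bit values (a simplification of the 𝑚, 𝑛, 𝑐 indexing); a zero mean squared error, the lossless case, returns infinity.

```python
import math


def compression_ratio(bits_after: int, bits_before: int) -> float:
    """Eq. (8): percentage of bits saved by compression."""
    return (1 - bits_after / bits_before) * 100


def psnr_overall(orig, recon) -> float:
    """Eq. (9) over all pixels and the three colour components.

    orig, recon: equal-length flat sequences of 8-bit component values.
    Returns math.inf when the reconstruction is exact (lossless).
    """
    mse = sum((a - b) ** 2 for a, b in zip(orig, recon)) / len(orig)
    if mse == 0:
        return math.inf
    return 20 * math.log10(255 / math.sqrt(mse))


print(compression_ratio(25, 100))               # 75.0
print(psnr_overall([10, 20, 30], [10, 20, 30]))  # inf
```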

From Table 2, the mode-4 prediction scheme gives the best compression ratio: 72% without corner clipping and 76.8% with corner clipping. The mode-1 prediction scheme is the simplest to implement in hardware, as it does not need a buffer memory to store one row of image pixels; its compression ratio is 67% without corner clipping and 73.2% with corner clipping. With an 800 kbps transceiver [22], compressed images can be transmitted at 1.9 and 2.2 frames per second using the mode-1 and mode-4 prediction schemes, respectively. As the proposed compression is lossless, the reconstructed images are identical to the original images, as shown in Figure 12. Hence, the overall average PSNR of the reconstructed images is infinite and the structural similarity (SSIM) index [23, 24] is 1.
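The quoted frame rates can be reproduced from the corner-clipped compression ratios, assuming 256×256 frames at 24 bits per pixel and ignoring any packet or protocol overhead of the transceiver (both are our assumptions):

```python
def frames_per_second(cr_percent: float, link_bps: float = 800e3,
                      width: int = 256, height: int = 256, bpp: int = 24) -> float:
    """Frames per second a link of link_bps can carry at a given CR (Eq. 8)."""
    raw_bits = width * height * bpp
    compressed_bits = raw_bits * (1 - cr_percent / 100)
    return link_bps / compressed_bits


print(round(frames_per_second(73.2), 1))  # 1.9  (mode 1, corner clipping)
print(round(frames_per_second(76.8), 1))  # 2.2  (mode 4, corner clipping)
```

This also shows how close mode 1 sits to the 2 fps target of criterion (v): every percentage point of CR matters at this link rate.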

Table 2 also compares the proposed algorithm with some existing works on endoscopic image compression. The proposed algorithm is the only lossless compression algorithm in this comparison, producing reconstructed images with infinite PSNR. In terms of compression ratio, the proposed algorithm outperforms [14, 15, 26, 27]. Note that the works in [9, 11, 25, 28] use DCT-based approaches, which are computationally very expensive. Moreover, their reconstructed images may contain blocking artifacts due to the inherent nature of DCT-based algorithms, whereas the proposed scheme introduces no blocking artifacts. Thus, considering the trade-off between CR and PSNR, the proposed algorithm outperforms all other schemes by a clear margin. The proposed lossless compression algorithm was also applied to two standard images; the resulting compression ratios are shown in Table 3. Comparing Tables 2 and 3, the proposed algorithm works much better on WCE images than on standard images. This is because the 𝑈 and 𝑉 components vary considerably more in nonendoscopic images than in endoscopic images. As a result, 𝑑𝑈 and 𝑑𝑉 span a wider range in nonendoscopic images, as shown in Figure 7, and unary coding cannot produce a good compression ratio because unary codes become very long for large values, as shown in Figure 8.

5. Complexity Analysis

Table 4 compares the proposed lossless algorithm with the standard JPEG-LS. The proposed algorithm has lower computational complexity (owing to its static prediction and static 𝑘 parameter) and a lower memory requirement. It also outperforms the standard JPEG-LS algorithm [29] in compression ratio by 15% for endoscopic images. Table 5 compares the complexity of the proposed scheme with that of other works. For an image of 𝑛 pixels, the proposed algorithm has the lowest computational complexity, 𝑂(𝑛). For the mode-1 prediction scheme shown in Figure 5, no buffer memory is required; for mode-4 prediction, however, a buffer memory is required to store one row of image pixels. On the other hand, the DCT-based algorithms [9–11, 25, 28] have a complexity of 𝑂(𝑛 log 𝑛) and need a memory buffer to store the image frame.

Compared with the JPEG-LS scheme in [14], the presented algorithm implements a simpler prediction scheme and works in the YUV color space. Moreover, no buffer memory is required to store the contexts needed in standard JPEG-LS. The proposed algorithm does not implement the "run mode" because endoscopic images do not contain long runs. The work in [26] uses wavelet-based coding; although the complexity of the wavelet transform is 𝑂(𝑛), its actual implementation consumes more power and area than the proposed predictive-coding-based scheme. The work based on sensing theory [27] has the highest computational complexity, 𝑂(𝑛³).

6. Hardware Architecture

The proposed lossless compression algorithm has been implemented in VHDL and simulated for functional verification. The compressor is designed to compress QVGA (320×240) color images with 24 bits per pixel. Figure 13 shows the overall block diagram of the proposed compressor. The input lines are designed so that they can be connected to any commercial image sensor with a digital video port (DVP) [7, 8]. The compressor outputs a 32-bit encoded bit vector and a 5-bit code length. A parallel-to-serial (P2S) converter is needed to connect it to a commercial transceiver [22], which accepts data serially using the serial peripheral interface (SPI) protocol (or any other serial protocol).

Most commercial CMOS image sensors [7, 8] send image data bytes (in both RGB and YUV) in raster-scan fashion using a common DVP interface [30]. The input lines of the proposed compressor implement the DVP interface, so commercial CMOS image sensors can be interfaced directly with the compressor. In a DVP interface, the VD (or VSYNC) and HD (or HSYNC) pins indicate the end of a frame and the end of a row, respectively. The image compressor samples the YUV pixel bytes from the DATA_BUS(7:0) bus on each positive edge of DCLK and generates the compressed variable-length code on the CODE_DATA(31:0) bus and its end bit index on the CODE_LEN(4:0) bus within a single clock cycle (cc) of DCLK. Thus, the latency of the proposed design is 1 cc. Whenever a new code appears on the data bus, the IS_NEW_CODE signal goes from low to high. The compressed output bitstream is available for sampling at the SERIAL_DATA pin at each positive edge of SERIAL_CLK.

6.1. Image Compressor

The hardware architecture of the image compressor is shown in Figure 14. The position registers Col, Row, and ByteCount store the column, the row, and the byte position within the row of the currently sampled pixel, respectively. From Col and Row, the Clipper module decides whether the pixel should be clipped. If the pixel lies inside the defined visual region, the mode-1 prediction is computed and subtracted from the current pixel value. The error is then mapped to a non-negative integer, and a variable-length Golomb-Rice code (for the 𝑌 component) or a variable-length unary code (for the 𝑈 and 𝑉 components) is generated along with its code length.
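A behavioural software model can help when checking such a datapath against a reference. The sketch below is hypothetical in several respects that the text does not pin down: bytes are assumed to arrive as Y, U, V for each pixel; each component's mode-1 predictor is assumed to start at 128 at the beginning of every row; and clipped pixels are skipped without updating the predictors, so that a decoder can mirror the encoder. It emits one code per component byte, as the hardware emits one code per DCLK cycle.

```python
def _map(dx: int) -> int:
    """Eq. (3): non-negative errors -> even, negative errors -> odd."""
    return 2 * dx if dx >= 0 else -2 * dx - 1


def _gr(m: int, k: int, limit: int = 32, log2_i: int = 8) -> str:
    """Length-limited Golomb-Rice code (unary = q zeros then a 1, assumed)."""
    j = limit - log2_i - 1
    q = m >> k
    if q >= j:
        return "0" * j + "1" + format(m, f"0{log2_i}b")
    return "0" * q + "1" + (format(m & ((1 << k) - 1), f"0{k}b") if k else "")


def _clipped(col: int, row: int, w: int, h: int, alpha: int) -> bool:
    """Diagonal corner clipping (strict inequality assumed)."""
    c2, r2 = w - 1 - col, h - 1 - row
    return min(col + row, c2 + row, col + r2, c2 + r2) < alpha


def compress_frame(data: bytes, width: int, height: int, alpha: int) -> list[str]:
    """Behavioural model: one code per YUV byte of every non-clipped pixel."""
    k_for = (2, 0, 0)                # Golomb-Rice for Y, unary for U and V
    codes = []
    i = 0
    for row in range(height):
        prev = [128, 128, 128]       # hypothetical per-row predictor reset
        for col in range(width):
            pixel = data[i:i + 3]    # assumed byte order: Y, U, V
            i += 3
            if _clipped(col, row, width, height, alpha):
                continue             # clipped pixels emit no code
            for comp, x in enumerate(pixel):
                codes.append(_gr(_map(x - prev[comp]), k_for[comp]))
                prev[comp] = x       # mode 1: left neighbour of the next pixel
    return codes


print(compress_frame(bytes([130, 128, 127] * 2), 2, 1, 0))
# ['0100', '1', '01', '100', '1', '1']
```

The model is purely sequential; the hardware achieves the same per-byte result combinationally within one DCLK cycle.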

6.2. Parallel to Serial Converter (P2S)

The architecture of the P2S block is shown in Figure 15.

It samples the CODE_DATA and CODE_LEN buses at the low-to-high transition of IS_NEW_CODE and then sends the variable-length code bitstream serially (MSB first) using the SERIAL_DATA and SERIAL_CLK pins. DCLK_32 is the input clock signal of this module; its frequency is at least 32 times the DCLK frequency, so that the maximum-length code (𝑔limit = 𝑢limit = 32 bits) can be safely transmitted serially within one clock cycle of DCLK. At each positive edge of SERIAL_CLK, the RF transceiver can sample the bits from the SERIAL_DATA pin.
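The P2S behaviour can be modelled in a few lines. The text says CODE_LEN carries the code's end bit index but does not specify the alignment of the code within CODE_DATA, so the right-aligned convention below (the code occupies bits CODE_LEN down to 0) is our assumption; the clock helper simply restates the 32× requirement.

```python
def serialize(code_data: int, code_len: int) -> list[int]:
    """Bits of one code in the order they appear on SERIAL_DATA (MSB first).

    Hypothetical convention: the code sits in CODE_DATA(code_len downto 0),
    i.e. CODE_LEN (0..31) is the index of the code's most significant bit.
    """
    if not 0 <= code_len <= 31:
        raise ValueError("CODE_LEN is a 5-bit field (0..31)")
    return [(code_data >> i) & 1 for i in range(code_len, -1, -1)]


def min_dclk32_hz(dclk_hz: float, max_code_bits: int = 32) -> float:
    """Lowest DCLK_32 frequency that shifts out a worst-case code per DCLK cycle."""
    return dclk_hz * max_code_bits


print(serialize(0b0101, 3))   # [0, 1, 0, 1]
print(min_dclk32_hz(1e6))     # 32000000.0
```

Sizing DCLK_32 for the worst-case 32-bit code keeps the P2S free of any FIFO, at the cost of idle serial cycles whenever codes are short.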

The overall design, including the P2S module, is synthesized in 0.18 μm CMOS technology using standard Artisan library cells. Figure 16 shows the chip layout of the compressor, and the chip specification is presented in Table 6. For a fairer assessment of the proposed compression scheme, the P2S converter (which merely facilitates interfacing with the RF transceiver) should not be considered part of the compressor. Without the P2S module, the image compressor core itself uses 618 cells and consumes 18 μW of power at 2 fps.

Table 7 compares the implementation cost with other existing schemes. The proposed compressor occupies the smallest core area and has the lowest hardware cost (i.e., gate count). It does not require any memory buffer for temporary storage of image frames or blocks. Its power consumption is also the lowest (less than 1 mW) compared with all other works. The latency of [11, 28] was not reported. However, these schemes use 2-D 8×8 DCT coding similar to the DCT implementations presented in [31, 32]. As a result, the latency of [11, 28] is estimated to be in the range of 92 to 144 clock cycles (based on [31] and [32], respectively). Compared to all these existing designs, the proposed scheme has the lowest latency, 1 cc. Thus, the scheme meets all the design criteria set in Section 2 and is a strong candidate for an energy-efficient implantable lossless image compressor for wireless endoscopic applications.

We have also implemented the image compressor with mode-4 prediction, which gives the best compression ratio. However, mode-4 prediction requires buffering one row of image pixels for all 3 color components. Table 8 shows the chip specification with mode-4 prediction. Comparing Tables 6 and 8, the hardware cost is much higher with mode-4 prediction. Note, however, that the hardware cost can be reduced by using a memory compiler synthesis tool.

7. Conclusion

In this paper, a lossless image compression algorithm for endoscopic images has been presented. The algorithm consists of a static prediction scheme followed by encoding of the prediction error using Golomb-Rice and unary codes. It is suitable for use with commercial low-power image sensors that output image pixels in raster-scan fashion, eliminating the need for buffer memory. The proposed lossless compression algorithm is implemented in a 0.18 μm CMOS technology; it occupies 0.16 mm × 0.16 mm of silicon area and consumes 18 μW of power when working at 2 frames per second.

Acknowledgments

The authors would like to acknowledge the Natural Science and Engineering Research Council of Canada (NSERC) for its support to this research work. The authors are also indebted to the Canadian Microelectronics Corporation (CMC) for providing the hardware and software infrastructure used in the development of this design.