Abstract

Original images are often compressed for communication applications. To avoid the computational burden of decompression, it is desirable to segment images directly in the compressed domain. This paper presents a simple rate-distortion-based scheme to segment images in the JPEG2000 domain. It is based on the binary arithmetic code table used in the JPEG2000 standard, which is available at both the encoder and the decoder; thus, there is no need to transmit the segmentation result. Experimental results on the Berkeley image database show that the proposed algorithm is preferable in terms of running time and two quantitative measures: the probabilistic Rand index (PRI) and the boundary displacement error (BDE).

1. Introduction

Data segmentation is important in many applications [1–6]. Early research work on image segmentation was mainly at a single scale, especially for medical images [7–9]. In the human visual system (HVS), the perceived image is decomposed into a set of band-pass subimages by means of filtering with simple visual cortical cells, which can be well modeled by Gabor filters with suitable spatial frequencies and orientations [10]. Other state-of-the-art multiscale techniques are based on the wavelet transform (WT), which provides an efficient multiresolution representation in accord with the properties of the HVS [11]. Specifically, the higher-detail information of an image is projected onto a shorter basis function with higher spatial resolution. Various WT-based features and algorithms have been proposed in the literature for image segmentation at multiple scales [12–14].

For communication applications, original images are compressed in order to make good use of memory space and channel bandwidth. Thus, it is desirable to segment a compressed image directly. The Joint Photographic Experts Group (JPEG) standard adopts the discrete cosine transform for subband image coding. In order to improve the compression performance of JPEG and to offer more coding advantages, for example, embedded coding and progressive transmission, the JPEG2000 standard adopts WT as the underlying transform algorithm. Specifically, embedded coding codes an image into a single code stream, from which the decoded image at any bit rate can be obtained. The embedded code stream of an image is organized in decreasing order of significance for progressive transmission over band-limited channels. This property is particularly desirable for Internet streaming and database browsing applications [15–17]. Zargari proposed an efficient method for JPEG2000 image retrieval in the compressed domain [18]. Pi proposed a simple scheme to estimate the probability mass function (PMF) of wavelet subbands by counting the number of 1-bits and used the global PMF as features to retrieve similar images from a large database [19]. For image segmentation, however, the local PMF is needed. In [20], we proposed a simple method to compute the local PMF of wavelet coefficients based on the MQ table. It can be applied to a JPEG2000 code stream directly, and the local PMF can be used as features to segment a JPEG2000 image in the compressed domain.

Motivated by the idea behind the postcompression rate distortion (PCRD) algorithm [15], we propose a simple algorithm called the rate-distortion-based merging (RDM) algorithm for JPEG2000 image segmentation. It can be applied to a JPEG2000 code stream instead of the decoded image. As a result, the burden of decoding computation can be saved. In addition, the RDM algorithm is based on the MQ table, which is available at both the encoder and the decoder; thus, no transmission overhead is added from a segmentation viewpoint. The remainder of the paper proceeds as follows. In Section 2, the JPEG2000 standard is reviewed briefly. In Section 3, the MQ-table-based rate distortion slope (MQRDS) is proposed to examine the significance of wavelet segments; based on the MQRDS, the RDM algorithm is proposed to merge wavelet segments with similar characteristics. Experimental results on the Berkeley color image database are given in Section 4. Conclusions can be found in Section 5.

2. Review of the JPEG2000 Standard

The core module of the JPEG2000 standard is the embedded block coding with optimized truncation (EBCOT) algorithm [15], which adopts the wavelet transform (WT) as the underlying method to decompose an image into multiresolution subbands. WT has many desirable properties, for example, the self-similarity of wavelet coefficients across subbands of the same orientation, the joint space-spatial frequency localization with orientation selectivity, and the energy clustering within each subband [11]. The fundamental idea behind EBCOT is to take advantage of the energy clustering property of wavelet coefficients. EBCOT is a two-tier algorithm; tier-1 consists of bit plane coding (BPC) followed by arithmetic coding (AC); tier-2 is primarily for optimal rate control. Three coding passes, namely, the significance propagation (SP) pass, the magnitude refinement (MR) pass, and the clean-up (CU) pass, are involved with four primitive coding operations, namely, the significance coding operation, the sign coding operation, the magnitude refinement coding operation, and the clean-up coding operation. For a wavelet coefficient that is currently insignificant, if any of the 8 neighboring coefficients is already significant, it is coded in the SP pass using the significance coding operation; otherwise, it is coded in the CU pass using the clean-up coding operation. If this coefficient becomes significant, its sign is then coded using the sign coding operation. The magnitudes of the significant wavelet coefficients that have been found in the previous coding passes are updated using the magnitude refinement coding operation in the MR pass. The resulting code streams of the coding passes can be compressed further by using a context-based arithmetic coder known as the MQ coder. JPEG2000 defines 18 context labels for the MQ coder and stores their respective probability models in the MQ table. Specifically, 10 context labels are used for the significance coding operation and the clean-up coding operation, 5 context labels are used for the sign coding operation, and 3 context labels are used for the magnitude refinement coding operation.
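The pass-selection rule described above can be sketched in a few lines. This is only an illustrative sketch, not the normative EBCOT procedure: the names `CodingPass` and `select_pass` are hypothetical, and the state of a coefficient is reduced to two inputs (whether it was already found significant, and how many of its 8 neighbors are significant).

```python
from enum import Enum


class CodingPass(Enum):
    SP = "significance propagation"
    MR = "magnitude refinement"
    CU = "clean-up"


def select_pass(already_significant: bool, significant_neighbors: int) -> CodingPass:
    """Pick the coding pass for one coefficient on the current bit plane.

    already_significant: the coefficient was found significant in a
        previous coding pass (simplifying assumption of this sketch).
    significant_neighbors: how many of its 8 neighbors are significant.
    """
    if already_significant:
        # Significant coefficients only have their magnitude refined.
        return CodingPass.MR
    if significant_neighbors > 0:
        # Insignificant, but at least one neighbor is significant.
        return CodingPass.SP
    # Insignificant with no significant neighbor.
    return CodingPass.CU


print(select_pass(False, 3).value)
print(select_pass(False, 0).value)
print(select_pass(True, 0).value)
```

The sign coding operation is not modeled here; in the description above it is triggered only when a coefficient becomes significant during the SP or CU pass.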

In JPEG2000, a large image can be partitioned into nonoverlapping subimages called tiles for computational simplicity. WT is then applied to the tiles of an image for subband decomposition, and each wavelet subband is further divided into small blocks called code blocks. The code blocks of an image are independently coded from the most significant bit plane (MSB) to the least significant bit plane (LSB). Based on the true rate-distortion slope (RDS) of code blocks, JPEG2000 concatenates the significant code streams with large RDS using the post compression rate distortion (PCRD) algorithm for optimal rate control. More specifically, let $\{B_i\}$ be the set of code blocks in the whole image. The code stream of $B_i$ can be terminated at the end of a coding pass, say $n_i$, with the bit rate denoted by $R_i^{n_i}$; all the end points of coding passes are possible truncation points. The distortion incurred by discarding the coding passes after $n_i$ is denoted by $D_i^{n_i}$. PCRD selects the optimal truncation points to minimize the overall distortion, $D = \sum_i D_i^{n_i}$, subject to the rate constraint $R = \sum_i R_i^{n_i} \le R_c$, where $R_c$ is a given bit rate. It is noted that the coding passes with nonincreasing RDS are candidates for the optimal truncation points. Motivated by this idea, a new technique is proposed to segment JPEG2000 images in the JPEG2000 domain; the details are given in the following section.
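The constrained minimization performed by PCRD is commonly solved with a Lagrange multiplier: each code block independently picks the truncation point that minimizes distortion plus a multiplier times rate, and the multiplier is adjusted until the rate budget is met. The sketch below illustrates that idea under stated assumptions. Each block is given as a list of (rate, distortion) pairs whose first entry is the empty truncation (rate 0), and the function names and the bisection search are choices made for this example, not part of the standard.

```python
def truncate_block(points, lam):
    """Index of the truncation point minimizing distortion + lam * rate.

    points: list of (rate, distortion) pairs for one code block, ordered
        by increasing rate; points[0] is assumed to be (0, D0).
    """
    return min(range(len(points)),
               key=lambda n: points[n][1] + lam * points[n][0])


def total_rate(blocks, lam):
    """Total rate when every block is truncated for the multiplier lam."""
    return sum(b[truncate_block(b, lam)][0] for b in blocks)


def pcrd(blocks, rate_budget, iterations=60):
    """Choose one truncation point per block under a total rate budget."""
    lo, hi = 0.0, 1.0
    # Grow the multiplier until the budget is satisfied.
    while total_rate(blocks, hi) > rate_budget:
        hi *= 2.0
    # Bisection: keep hi feasible, push it as low as possible.
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if total_rate(blocks, mid) > rate_budget:
            lo = mid
        else:
            hi = mid
    return [truncate_block(b, hi) for b in blocks]


blocks = [
    [(0, 100), (10, 40), (20, 30)],
    [(0, 50), (10, 45), (20, 5)],
]
print(pcrd(blocks, rate_budget=30))
```

In the second block the middle point is never selected, because it lies above the convex hull of the rate-distortion curve; this is the practical meaning of the remark that only coding passes with nonincreasing RDS are candidates.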

3. Image Segmentation in the JPEG2000 Domain

This section presents a simple merging algorithm for JPEG2000 image segmentation. It merges wavelet segments with similar characteristics based on the change of the estimated RDS in the JPEG2000 domain. Thus, the proposed algorithm can be applied to a JPEG2000 code stream without the cost of decompression.

3.1. MQ Table-Based Probability Mass Function

In JPEG2000, the wavelet coefficients of an image are quantized with bit planes, and binary wavelet variables are almost independent across bit planes. The probability mass function (PMF), known as the wavelet histogram [19], can therefore be approximated by

$$P(|c| = x) \approx \prod_{i=0}^{n-1} P_i(x_i), \quad (1)$$

where $|c|$ is the magnitude of a wavelet coefficient, $x = \sum_{i=0}^{n-1} x_i 2^i$, $P_i(\cdot)$ is the PMF of the binary wavelet variable, $x_i$, on the $i$th bit plane, and $n$ is the number of bit planes. For image segmentation, the local PMF is needed. We previously proposed a simple method to estimate the local PMF based on the MQ table [20]. Specifically, the probability of 1-bit, $P_i(x_i = 1)$, is given by

$$P_i(x_i = 1) = \begin{cases} 1 - Q_e, & \text{if MPS} = 1, \\ Q_e, & \text{if MPS} = 0, \end{cases} \quad (2)$$

where $Q_e$ is the probability of the less probable symbol (LPS), which is stored in the MQ table, and MPS denotes the more probable symbol. The set $\{P_i(x_i = 1)\}$ obtained from the MQ table can be used to compute the local PMF. As the MQ table is also available at the decoder, no transmission overhead is needed for the computation of the PMF. In addition, JPEG2000 defines only 18 context labels to model the binary wavelet variables; thus, the computation of the PMF is simple.
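As a small illustration of this computation, the sketch below derives the probability of a 1-bit from an LPS probability and the current MPS value, and then forms the magnitude PMF as a product over bit planes under the independence assumption stated above. The function names are hypothetical, and the LPS probability is assumed to be given already as a real number in (0, 0.5]; the MQ table itself stores it in a scaled integer form, and that conversion is not shown.

```python
def prob_one(lps_prob: float, mps: int) -> float:
    """Probability that a bit equals 1.

    lps_prob: probability of the less probable symbol (assumed real-valued).
    mps: value (0 or 1) of the more probable symbol.
    """
    # If the MPS is 1, a 1-bit is the more probable symbol.
    return 1.0 - lps_prob if mps == 1 else lps_prob


def magnitude_pmf(p_one):
    """PMF of the coefficient magnitude from per-bit-plane probabilities.

    p_one[i] is the probability of a 1-bit on bit plane i (i = 0 is the
    least significant plane). Bits are treated as independent.
    """
    n = len(p_one)
    pmf = []
    for x in range(2 ** n):
        p = 1.0
        for i in range(n):
            bit = (x >> i) & 1
            p *= p_one[i] if bit else 1.0 - p_one[i]
        pmf.append(p)
    return pmf


# Two bit planes: the lower plane has MPS = 0, the upper plane has MPS = 1.
p = [prob_one(0.25, 0), prob_one(0.25, 1)]
pmf = magnitude_pmf(p)
print(p)
print(pmf, sum(pmf))
```

Because only a small number of context states exist, the per-bit probabilities can be tabulated once and reused for every coefficient.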

3.2. MQ Table-Based Rate Distortion Slope and Merging Algorithm

Motivated by the post compression rate distortion (PCRD) algorithm [15], we propose the MQ table-based rate distortion slope (MQRDS) for image segmentation in the JPEG2000 domain. The MQRDS of a wavelet segment is computed from the distortion of the segment and an estimate of its code length. The distortion of a wavelet segment is defined over the wavelet coefficients at the locations in that segment, each of which is represented by its binary variables on the bit planes. The estimate of the distortion is computed from quantities that can be obtained from the binary arithmetic code table known as the MQ table. The estimate of the code length can be efficiently obtained by using [2]; it involves the binary variables of a wavelet coefficient on each bit plane, which are independent across bit planes, the number of bit planes, the feature space dimension, the number of wavelet coefficients in the segment, the total number of wavelet coefficients, and an entropy operation.

After merging two wavelet segments, say $m$ and $n$, the change of MQRDS is determined by the MQRDS of the two wavelet segments, their sizes, and the MQRDS of the merged wavelet segment. The change of MQRDS is likely to be large for wavelet segments with similar characteristics. Thus, we propose a simple algorithm called the rate-distortion-based merging (RDM) algorithm for JPEG2000 image segmentation, which is presented in the steps below.

The RDM Algorithm
Step 1. Given a JPEG2000 code stream, compute the MQ table-based local PMF of wavelet coefficients using (2).
Step 2. As mentioned in [2], a set of oversegmented regions known as superpixels is in general needed for any merging algorithm; this low-level initial segmentation can be obtained by coarsely clustering the local PMF as features.
Step 3. For all pairs of superpixels, compute their respective changes of MQRDS using (12), and merge the pair with the maximum change of MQRDS.
Step 4. Repeat the merging process in Step 3 until the change of MQRDS is insignificant.
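The control flow of Steps 3 and 4 can be sketched as a greedy loop. The sketch below is an illustration under stated assumptions rather than the implementation used in the experiments: the change of MQRDS is passed in as a caller-supplied function `delta` (a hypothetical name), each superpixel is represented simply as a list of feature values, and the toy `delta` in the usage example (negative distance between segment means) is purely a stand-in and is not the MQRDS.

```python
from itertools import combinations


def rdm_merge(segments, delta, threshold):
    """Greedily merge segments until the best gain is insignificant.

    segments: list of lists, one list of feature values per superpixel.
    delta(a, b): stand-in for the change of MQRDS obtained by merging
        segments a and b (supplied by the caller).
    threshold: merging stops when the best change is not above this value.
    """
    segments = [list(s) for s in segments]
    while len(segments) > 1:
        # Step 3: evaluate all pairs and pick the one with maximum change.
        i, j = max(combinations(range(len(segments)), 2),
                   key=lambda ij: delta(segments[ij[0]], segments[ij[1]]))
        # Step 4: stop once the best change is insignificant.
        if delta(segments[i], segments[j]) <= threshold:
            break
        segments[i] = segments[i] + segments[j]
        del segments[j]  # i < j, so index i is unaffected
    return segments


def toy_delta(a, b):
    """Illustrative similarity score only: closer means give a larger value."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return -abs(mean_a - mean_b)


print(rdm_merge([[1.0], [1.2], [5.0], [5.1]], toy_delta, threshold=-1.0))
```

Evaluating every pair in every iteration is quadratic in the number of superpixels, which is acceptable only because the initial oversegmentation is coarse.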

In order to reduce the computation time, (6) can be approximated; moreover, the cross terms of the approximating equation are not significant and can be discarded for computational simplicity. Figure 1 depicts the flowchart of the RDM algorithm. It is noted that the MQ table defined in JPEG2000 is finite; thus, (10) can be obtained by a look-up table (LUT), which reduces the computation time further. As shown in Figure 2, RDM can be applied to a JPEG2000 code stream directly; this is one of the advantages of RDM.

4. Experimental Results

In the first experiment, the potential of the MQ table-based local PMF (LPMF) is shown by segmenting images with Brodatz textures. As noted, the essential characteristics of textures are mainly contained in the middle-high-frequency wavelet subbands; thus, we applied a simple clustering algorithm known as K-means to the LPMF of wavelet coefficients to generate an initial segmentation. The number of superpixels was set to 30, and the superpixels were then finely merged using the RDM algorithm. Figure 3(a) shows the test image with two Brodatz textures, namely, wood and grass. The segmentation result and the error image, with white pixels representing misclassifications, are shown in Figure 3(b) and Figure 3(c), respectively. Figure 3(d) shows the percentages of errors at various rates in bits per pixel (bpp). It is noted that the segmentation results are still satisfactory even at low-middle bpp rates. Hence, a small portion of a JPEG2000 code stream is sufficient for the segmentation task.

The RDM algorithm has also been extensively evaluated on the Berkeley image database [21]. We adopted the Waveseg algorithm [14] to compute the initial superpixels of a natural color image. In order to avoid decoding a JPEG2000 code stream, the Waveseg algorithm was applied to the estimated wavelet coefficients instead of the decoded wavelet coefficients. More specifically, each wavelet coefficient is estimated from the MQ table-based LPMF, using the probability of 1-bit on each bit plane, which can be obtained from the MQ table. The resulting superpixels were then merged by RDM with the threshold set to 0.1. We compared the RDM algorithm with two other state-of-the-art algorithms known as Mean-shift [22] and CTM [2]. In Mean-shift, the spatial and range bandwidth parameters, $h_s$ and $h_r$, were set to 13 and 19, respectively; in CTM, the threshold was set to 0.1, as suggested in [2]. The original images shown at the top of Figure 4 are natural images contained in the Berkeley database, namely, Pyramids, Landscape, Horses, and Giraffes. Their respective segmentation results using RDM, CTM, and Mean-shift are shown in the second, third, and fourth rows. Visual inspection shows that RDM and Mean-shift have similar performances for the first three images; RDM and CTM perform similarly in detecting the giraffes shown in the fourth image.

In addition to visual inspection [23, 24], two commonly used measures, namely, the probabilistic Rand index (PRI) and the boundary displacement error (BDE) [25], were adopted for quantitative comparisons. Table 1 gives the average PRI performance on the Berkeley database. PRI ranges from 0 to 1, and higher is better. BDE measures the average displacement error of boundaries between segmented images, which is nonnegative, and lower is better. The average BDE performance is given in Table 2. It is noted that RDM outperforms CTM and Mean-shift in terms of the PRI and BDE measures.
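For readers unfamiliar with the PRI, the sketch below computes it for label maps flattened to one-dimensional lists. It relies on the fact that the PRI of a segmentation against several ground truths equals the average of the ordinary Rand indices against each ground truth. The function names are hypothetical, and the exhaustive pair enumeration is meant only for tiny examples, not for full-size images.

```python
from itertools import combinations


def rand_index(seg_a, seg_b):
    """Fraction of pixel pairs on which two labelings agree.

    A pair agrees when both labelings put the two pixels in the same
    segment, or both put them in different segments.
    """
    pairs = list(combinations(range(len(seg_a)), 2))
    agree = sum((seg_a[i] == seg_a[j]) == (seg_b[i] == seg_b[j])
                for i, j in pairs)
    return agree / len(pairs)


def probabilistic_rand_index(seg, ground_truths):
    """Average Rand index of seg against a set of ground-truth labelings."""
    return sum(rand_index(seg, g) for g in ground_truths) / len(ground_truths)


seg = [0, 0, 1, 1]
ground_truths = [[0, 0, 1, 1], [0, 0, 0, 1]]
print(probabilistic_rand_index(seg, ground_truths))  # 0.75
```

The first ground truth matches the test segmentation exactly (Rand index 1), while the second agrees on 3 of the 6 pixel pairs (Rand index 0.5), giving 0.75 on average.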

The running times on a PC are given in Table 3. They show that RDM is faster than CTM and Mean-shift, due largely to the simple computations of (8) and (13). Moreover, RDM can be applied to a JPEG2000 code stream directly, whereas most algorithms, such as Mean-shift and CTM, are primarily applied to the original or decoded image and thus require additional time to decode a compressed image.

5. Conclusions

The MQ table defined in the JPEG2000 standard provides useful information that can be used to compute the local probability mass function (LPMF) of wavelet coefficients. A simple LPMF-based scheme has been proposed to estimate the rate distortion slope (RDS) of a wavelet segment. It is noted that the RDS increases significantly after merging a pair of wavelet segments with similar characteristics into a single segment. Similar ideas can be used to improve the rate control performance of JPEG2000 [26–28]. In this paper, we propose the rate-distortion-based merging (RDM) algorithm to segment images in the framework of JPEG2000. RDM has been evaluated on images with Brodatz textures and on the Berkeley color image database. Experimental results show that the segmentation performance even at low-middle bpp rates is rather promising. For natural images with high-detail contents, RDM is preferable in terms of the average PRI and BDE measures. In addition, the total running time of RDM, which includes the computation of superpixels and the merging process, is shorter than that of Mean-shift and CTM.

As RDM is based on the MQ table, which is available at both encoder and decoder, no overhead transmission is needed to compute the LPMF of wavelet coefficients. RDM can be applied to a JPEG2000 code stream directly; thus, the burden of decompressing computation can be avoided, and memory space that is required to store the decompressed image is no longer necessary from the segmentation point of view.

Acknowledgments

The authors are grateful to the maintainers of the Berkeley image database. The National Science Council of Taiwan, under Grants NSC100-2628-E-239-002-MY2 and NSC100-2410-H-216-003, supported this work.