Abstract

Image compression is necessary in various applications, especially for efficient transmission over a band-limited channel. It is thus desirable to segment an image directly in the compressed domain, so that the computational burden of decompression can be avoided. Motivated by the adaptive binary arithmetic coder (MQ coder) of JPEG2000, we propose an efficient scheme to segment the feature vectors that are extracted from the code stream of an image. We modify the Compression-based Texture Merging (CTM) algorithm to alleviate the overmerging problem by making use of the rate distortion information. Experimental results show that the MQ coder-based image segmentation is preferable in terms of the boundary displacement error (BDE) measure. It also saves computational cost, as the segmentation results are satisfactory even at low rates of bits per pixel (bpp).

1. Introduction

Image segmentation is important in many applications, ranging from industrial monitoring to medical diagnosis. Among numerous techniques, the feature-based approach has received a lot of attention, largely due to its computational efficiency [1]. However, the segmentation result depends on the selection of feature vectors [2–7]. Early research on feature extraction was mainly carried out at a single scale. It is noted that an image is decomposed into band-pass subimages by simple visual cortical cells in the human visual system (HVS) [8], which can be modeled by Gabor filters with properly tuned spatial frequencies and orientations [9]. Wavelet transform (WT) provides an efficient multiresolution representation, in which the higher detail information of an image is projected onto the shorter basis function with higher spatial resolution, and the lower detail information is projected onto the larger basis function with higher spectral resolution. This property matches the characteristics of HVS [10]. Various WT-based schemes were proposed to extract image features at multiple scales [11–14]. In addition, WT has the advantage of taking multiscale phenomena into account [15–17], which is fundamental in nonlinear time series [18–20] and fractal time series [21].

With the rapid growth of multimedia technologies [22–25] and Internet applications, image compression is still in great demand [26]. It is therefore desirable to extract image features directly in the compressed domain, such that the burden of decompressing an image can be avoided [14, 27, 28]. The Joint Photographic Experts Group (JPEG) standard shows satisfactory results at moderate compression rates. The JPEG2000 standard, which adopts WT as the underlying transform, is preferable for additional advantages, for example, embedded coding and progressive transmission [29, 30]. In embedded coding, the original image is coded into a single code stream, from which the decoded image at any bit rate can be obtained. For progressive transmission, which is especially beneficial to image browsing and Internet streaming applications, JPEG2000 uses the postcompression rate distortion (PCRD) algorithm to arrange the code stream of an image in decreasing order of information importance [31]. It is based on the rate distortion theory; more specifically, the rate distortion slope (RDS) should be nonincreasing as the number of coding bits increases. For image segmentation applications, two interesting questions are thus raised. (1) Can image segmentation be carried out in the compressed domain such that the burden of decoding computations is avoided? (2) Is there a common piece of information, based on which image features can be constructed at both encoder and decoder? If so, there is no need to transmit these features from encoder to decoder.

This paper presents an efficient scheme to segment an image in the compressed domain. It is a two-step algorithm. In the first step, the MQ coder-based image features are coarsely clustered into small regions known as superpixels by using the simple 𝐾-means algorithm. In the second step, the inherently oversegmented superpixels are merged recursively by using the Compression-based Texture Merging (CTM) algorithm [32]. In order to avoid overmerging, we propose a simple RDS-based method to terminate CTM accordingly. The remainder of this paper proceeds as follows. In Section 2, the JPEG2000 standard is briefly reviewed. In Section 3, the modified CTM algorithm with the MQ coder-based image features is proposed to segment JPEG2000 images. Experimental results are presented in Section 4. Conclusions are given in Section 5.

2. Introduction to JPEG2000

The core of JPEG2000 is the embedded block coding with optimized truncation (EBCOT) algorithm [29], which adopts wavelet transform (WT) as the underlying method for subband decompositions. WT provides many desirable properties, for example, joint space-spatial frequency localization with orientation selectivity, self-similarity of wavelet coefficients across subbands of the same orientation, and energy clustering within each subband [10]. Among various WT-based image features, the commonly used ones are magnitude, energy, the generalized Gaussian distribution signature, and the co-occurrence measures [11–14].

EBCOT is a two-tier algorithm. Tier-1 consists of bit-plane coding (BPC) followed by arithmetic coding (AC). Tier-2 aims for optimal rate control. BPC involves three coding passes, namely, the significance propagation (SP) pass, the magnitude refinement (MR) pass, and the cleanup (CU) pass, and four primitive coding operations, namely, the significance coding operation, the sign coding operation, the magnitude refinement coding operation, and the cleanup coding operation. For a wavelet coefficient that is currently insignificant, if any of its 8 neighboring coefficients is already significant, it is coded in the SP pass using the significance coding operation; otherwise, it is coded in the CU pass using the cleanup coding operation. If this coefficient becomes significant, its sign is coded immediately using the sign coding operation. In the MR pass, magnitudes of the significant coefficients are updated using the magnitude refinement coding operation. The output bit streams of the coding passes can be further coded by a context-based arithmetic coder known as the MQ coder to improve the compression performance. Based on the 8 neighboring coefficients, the MQ coder defines 18 context labels with their respective probability models stored in the MQ table [29].
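The pass assignment described above can be illustrated with a minimal Python sketch. It is a simplification under stated assumptions: the significance map is treated as a fixed snapshot, whereas an actual coder updates it while scanning and does not refine a coefficient in the same bit-plane in which it became significant. The function name `coding_pass` and the boolean-grid representation are hypothetical and do not come from the standard.

```python
from typing import List


def coding_pass(significant: List[List[bool]], row: int, col: int) -> str:
    """Return the pass ("SP", "MR" or "CU") that codes the coefficient at (row, col).

    `significant` is a snapshot of the significance state of a code block
    (assumption: it is not updated during the scan).
    """
    if significant[row][col]:
        # Already significant: its magnitude is refined in the MR pass.
        return "MR"
    rows, cols = len(significant), len(significant[0])
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue
            r, c = row + dr, col + dc
            # Neighbors outside the code block are treated as insignificant.
            if 0 <= r < rows and 0 <= c < cols and significant[r][c]:
                return "SP"  # insignificant, with a significant 8-neighbor
    return "CU"  # insignificant, with no significant 8-neighbor


state = [
    [False, False, False, False],
    [False, True, False, False],
    [False, False, False, False],
]
for position in [(1, 1), (0, 0), (0, 3)]:
    print(position, coding_pass(state, *position))
```

In this toy block, the center coefficient falls in the MR pass, its direct neighbors in the SP pass, and the isolated corner in the CU pass.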

In JPEG2000, a large image can be partitioned into nonoverlapped subimages called tiles, each tile is decomposed into subbands by WT, each subband is divided into small blocks called code blocks, and each code block is independently coded from the most significant bit-plane to the least significant bit-plane. For optimal rate control, JPEG2000 adopts the postcompression rate distortion (PCRD) algorithm. Specifically, let $\{B_i\}$ be the code blocks of an image. The embedded code stream of $B_i$ can be terminated at some point, say $n_i$, with a bit rate denoted by $R_i^{n_i}$; all the end points of coding passes are possible truncation points. PCRD selects the optimal truncation points to minimize the overall distortion $D = \sum_i D_i^{n_i}$ subject to the rate constraint $R = \sum_i R_i^{n_i} \le R_c$, where $D_i^{n_i}$ denotes the distortion incurred by discarding the coding passes after $n_i$, and $R_c$ is the target bit rate. It is noted that the coding passes with nonincreasing rate distortion slopes (RDS) are candidates for the optimal truncation points. Based on the above, we propose an efficient scheme to segment JPEG2000 images in the following section.
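The truncation-point selection can be sketched as follows. This is a hedged illustration rather than the normative PCRD procedure: each code block is assumed to be given as a list of (cumulative rate, remaining distortion) pairs starting with the "nothing coded" point, the threshold is searched by a plain scan over the observed slopes instead of a bisection, and the names `hull_with_slopes`, `truncate`, and `pcrd` are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (cumulative rate, remaining distortion)


def hull_with_slopes(points: List[Point]) -> List[Tuple[int, float]]:
    """Keep the truncation points whose rate distortion slopes strictly decrease.

    points[0] is the "nothing coded" point. Returns (index, slope) pairs.
    """
    hull: List[Tuple[int, float]] = []
    for k in range(1, len(points)):
        rate_k, dist_k = points[k]
        while True:
            prev = hull[-1][0] if hull else 0
            d_rate = rate_k - points[prev][0]
            d_dist = points[prev][1] - dist_k
            if d_rate <= 0 or d_dist <= 0:
                break  # no distortion gain for extra bits: skip this point
            slope = d_dist / d_rate
            if hull and slope >= hull[-1][1]:
                hull.pop()  # previous point is not on the convex hull
                continue
            hull.append((k, slope))
            break
    return hull


def truncate(points: List[Point], lam: float) -> int:
    """Index of the last hull point whose slope is at least lam (0 = discard)."""
    choice = 0
    for index, slope in hull_with_slopes(points):
        if slope >= lam:
            choice = index
    return choice


def pcrd(blocks: List[List[Point]], rate_budget: float) -> List[int]:
    """Choose one truncation point per block so that the total rate fits the budget."""
    slopes = sorted({s for b in blocks for _, s in hull_with_slopes(b)}, reverse=True)
    best = [0] * len(blocks)
    for lam in slopes:  # lowering the threshold keeps more coding passes
        picks = [truncate(b, lam) for b in blocks]
        if sum(b[i][0] for b, i in zip(blocks, picks)) > rate_budget:
            break
        best = picks
    return best


block_a = [(0, 100), (10, 40), (20, 30), (30, 5)]
block_b = [(0, 50), (5, 20), (15, 10)]
print(pcrd([block_a, block_b], rate_budget=40))
```

In the example, the second truncation point of `block_a` is removed from the hull because the slope after it is larger than the slope before it, which is the situation that the nonincreasing-slope condition rules out.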

3. Image Segmentation in the JPEG2000 Domain

In this section, we modify the Compression-based Texture Merging (CTM) algorithm [32] to segment the MQ coder-based image features [28] in an adaptive manner. As a result, the image segmentation task can be conducted in the JPEG2000 domain directly, and the burden of decompressing computation can be avoided.

3.1. The MQ Coder-Based Image Feature

The distribution of wavelet coefficients known as the wavelet histogram has been widely used for image segmentation. As the binary variables of wavelet coefficients are almost independent across bit-planes [14], the joint probability mass function (PMF) representing the wavelet histogram can be approximated as
$$P(|c| = x) = \prod_{i=0}^{n-1} P_i(x_i), \tag{3.1}$$
where $x$ is the absolute value of a wavelet coefficient, $c$, which can be written as
$$x = \sum_{i=0}^{n-1} x_i 2^i; \quad x_i \in \{0, 1\}, \tag{3.2}$$
$n$ is the number of bit-planes, and $P_i(\cdot)$ is the $i$th bit-plane's PMF. Based on the MQ table defined in JPEG2000, we proposed a simple scheme to estimate the local PMF [28]. Specifically, let $P_i(x_i = 1)$ be the probability of 1 bit for variable $x_i$ on the $i$th bit-plane, which can be obtained from the MQ table as follows:
$$P_i(x_i = 1) = \begin{cases} \text{Qe\_Value} & \text{if MPS} = 0, \\ 1 - \text{Qe\_Value} & \text{if MPS} = 1, \end{cases} \tag{3.3}$$
where Qe_Value is the probability of less probable symbol (LPS) stored in the MQ table, and MPS stands for more probable symbol. Note that the set $\{P_i(x_i = 1);\ i = 0, \ldots, n-1\}$ obtained from the MQ table can be used to estimate the local PMF. As the MQ table is available at both encoder and decoder, there is no need to transmit the overhead information to construct the MQ features.
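The following Python sketch illustrates how a per-bit-plane feature vector and the product-form PMF could be evaluated from MQ coder states. It assumes that Qe_Value has already been converted to a probability (the standard stores it as a fixed-point integer) and that one (Qe_Value, MPS) pair per bit-plane is available; how these pairs are gathered locally is described in [28] and is not reproduced here. The function names are hypothetical.

```python
from typing import List, Tuple


def prob_bit_is_one(qe_value: float, mps: int) -> float:
    """P_i(x_i = 1): the LPS probability if MPS = 0, its complement if MPS = 1."""
    if mps not in (0, 1):
        raise ValueError("MPS must be 0 or 1")
    return qe_value if mps == 0 else 1.0 - qe_value


def mq_feature(states: List[Tuple[float, int]]) -> List[float]:
    """Feature vector {P_i(x_i = 1)}, one entry per bit-plane (index 0 = LSB)."""
    return [prob_bit_is_one(qe, mps) for qe, mps in states]


def magnitude_pmf(feature: List[float], x: int) -> float:
    """Product over bit-planes of P_i(x_i), where x_i is the i-th bit of x."""
    prob = 1.0
    for i, p_one in enumerate(feature):
        bit = (x >> i) & 1
        prob *= p_one if bit else 1.0 - p_one
    return prob


# Assumed (Qe_Value, MPS) pairs for three bit-planes; the values are illustrative.
feature = mq_feature([(0.25, 0), (0.25, 1), (0.5, 0)])
print(feature)
print(magnitude_pmf(feature, 3))
```

Because each bit-plane is modeled independently, the values of `magnitude_pmf` over all magnitudes representable with the given number of bit-planes sum to one.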

3.2. The Modified CTM Algorithm

In this section, we modify the CTM algorithm [32] to segment the MQ feature vectors of an image. With a set of MQ feature vectors, the number of coding bits can be approximated as
$$L(\varepsilon) = \frac{N + D}{2} \log_2 \det\left(I + \frac{D}{\varepsilon^2 N} \Sigma\right) + \frac{D}{2} \log_2\left(1 + \frac{\mu^T \mu}{\varepsilon^2}\right), \tag{3.4}$$
where $\mu$ is the mean vector, $\Sigma$ is the covariance matrix, $\varepsilon$ is the distortion incurred, $D$ is the feature dimension, and $N$ is the number of feature vectors. For $K$ sets of MQ feature vectors, the total number of coding bits is given by
$$L_{\mathrm{tot}}(\varepsilon) = \sum_{i=1}^{K} \left[ L_i(\varepsilon) - N_i \log_2 \frac{N_i}{N} \right], \tag{3.5}$$
where $L_i(\varepsilon)$ and $N_i$ are the number of coding bits obtained by (3.4) and the number of MQ feature vectors in the $i$th set, respectively, and $N$ is the total number of MQ feature vectors, that is, $N = \sum_{i=1}^{K} N_i$. The idea behind CTM is to merge two sets of feature vectors such that the coding bits can be reduced maximally. The pairwise merging procedure of CTM is performed iteratively until no merge can reduce the coding bits any more. As mentioned in [32], the termination of CTM is dependent on the distortion parameter, $\varepsilon$, which can be determined by
$$\varepsilon^{*} = \min\{\varepsilon : d(\varepsilon) \ge \gamma\}, \tag{3.6}$$
where $d(\varepsilon)$ is the distance between a pair of segments with respect to $\varepsilon$, and $\gamma$ is a threshold.
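A minimal sketch of the coding length and of the greedy pairwise merging is given below in plain Python. Several assumptions are made and should be checked against [32]: the matrix inside the determinant is taken as the identity plus the sample covariance scaled by $D/\varepsilon^2$ (that is, $\Sigma/N$ in (3.4) is read as the sample covariance), every pair of sets is a merge candidate (spatial adjacency of superpixels is ignored), and all function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def log2_det(matrix: List[List[float]]) -> float:
    """log2 of the determinant of a positive-definite matrix (Gaussian elimination)."""
    a = [row[:] for row in matrix]
    total = 0.0
    for k in range(len(a)):
        pivot = a[k][k]  # stays positive for a positive-definite matrix
        total += math.log2(pivot)
        for r in range(k + 1, len(a)):
            factor = a[r][k] / pivot
            for c in range(k, len(a)):
                a[r][c] -= factor * a[k][c]
    return total


def coding_length(vectors: List[Vector], eps: float) -> float:
    """Approximate number of bits needed to code one set of feature vectors."""
    n, d = len(vectors), len(vectors[0])
    mean = [sum(v[j] for v in vectors) / n for j in range(d)]
    cov = [[sum((v[i] - mean[i]) * (v[j] - mean[j]) for v in vectors) / n
            for j in range(d)] for i in range(d)]
    scale = d / eps ** 2  # assumption: Sigma / N is the sample covariance `cov`
    m = [[(1.0 if i == j else 0.0) + scale * cov[i][j] for j in range(d)]
         for i in range(d)]
    mean_term = 1.0 + sum(x * x for x in mean) / eps ** 2
    return (n + d) / 2 * log2_det(m) + d / 2 * math.log2(mean_term)


def total_length(groups: List[List[Vector]], eps: float) -> float:
    """Total bits: per-set coding length plus the cost of coding set membership."""
    n_total = sum(len(g) for g in groups)
    return sum(coding_length(g, eps) - len(g) * math.log2(len(g) / n_total)
               for g in groups)


def ctm(groups: List[List[Vector]], eps: float) -> List[List[Vector]]:
    """Greedily merge the pair of sets that reduces the total coding length most."""
    groups = [list(g) for g in groups]
    while len(groups) > 1:
        best_len, best_pair = total_length(groups, eps), None
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                rest = [g for k, g in enumerate(groups) if k != i and k != j]
                trial = total_length(rest + [groups[i] + groups[j]], eps)
                if trial < best_len:
                    best_len, best_pair = trial, (i, j)
        if best_pair is None:
            break  # no merge reduces the coding bits any more
        i, j = best_pair
        merged = groups[i] + groups[j]
        groups = [g for k, g in enumerate(groups) if k != i and k != j] + [merged]
    return groups


square = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(len(ctm([square, square], eps=1.0)))
```

Two sets with identical statistics are always merged by this procedure, since keeping them apart pays the membership cost twice without saving any coding bits.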

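Motivated by the rate distortion theory, which has been widely used in embedded image coding for optimal rate control, we propose a simple scheme to determine the candidates of $\varepsilon$. Specifically, for a sequence of increasing distortion values $\varepsilon_1 < \varepsilon_2 < \cdots$, the number of segments and the total number of coding bits are monotonically decreasing, that is, $K_1 > K_2 > \cdots$ and $L_{\mathrm{tot}}(\varepsilon_1) > L_{\mathrm{tot}}(\varepsilon_2) > \cdots$. The rate distortion slope (RDS) is thus defined as
$$S(\varepsilon_i) = \frac{\Delta D_i}{\Delta R_i}, \tag{3.7}$$
where
$$\Delta R_i = \frac{L_{\mathrm{tot}}(\varepsilon_{i-1}) - L_{\mathrm{tot}}(\varepsilon_i)}{N}, \quad \Delta D_i = \varepsilon_i - \varepsilon_{i-1}, \tag{3.8}$$
and $N$ is the number of MQ feature vectors. As RDS should be nondecreasing, that is,
$$S(\varepsilon_i) \le S(\varepsilon_{i+1}) \quad \text{for } \varepsilon_i < \varepsilon_{i+1}, \tag{3.9}$$
if $S(\varepsilon_i) > S(\varepsilon_{i+1})$, $\varepsilon_i$ can be considered as a candidate to terminate the merging process of CTM. Thus, we modify the selection of $\varepsilon$ as follows:
$$\varepsilon^{*} = \max_i \{\varepsilon_i : S(\varepsilon_i) > S(\varepsilon_{i+1}),\ d(\varepsilon_i) \ge \gamma\}. \tag{3.10}$$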
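The candidate search based on the RDS can be sketched in a few lines of Python. The sketch covers only the slope test, that is, it returns every distortion value at which the slope drops; the additional threshold condition on the segment distance is deliberately left out. The inputs are assumed to be an increasing list of distortion values and the corresponding strictly decreasing total coding lengths, and the function names are hypothetical.

```python
from typing import List, Sequence


def rate_distortion_slopes(eps: Sequence[float], l_tot: Sequence[float],
                           n: int) -> List[float]:
    """Slopes S(eps[i]) for i = 1, 2, ...; entry k of the result belongs to eps[k + 1]."""
    slopes = []
    for i in range(1, len(eps)):
        d_dist = eps[i] - eps[i - 1]
        d_rate = (l_tot[i - 1] - l_tot[i]) / n
        if d_rate <= 0:
            raise ValueError("total coding length must be strictly decreasing")
        slopes.append(d_dist / d_rate)
    return slopes


def rds_candidates(eps: Sequence[float], l_tot: Sequence[float],
                   n: int) -> List[float]:
    """Distortion values eps_i with S(eps_i) > S(eps_{i+1})."""
    slopes = rate_distortion_slopes(eps, l_tot, n)
    return [eps[k + 1] for k in range(len(slopes) - 1) if slopes[k] > slopes[k + 1]]


# Illustrative numbers only: coding length drops sharply between eps = 4 and eps = 5.
eps_values = [1.0, 2.0, 3.0, 4.0, 5.0]
total_bits = [1000.0, 800.0, 700.0, 650.0, 500.0]
print(rds_candidates(eps_values, total_bits, n=100))
```

In the example, the slope rises from 0.5 to 2.0 and then falls when a large drop in coding length occurs, so the distortion value just before that drop is reported as a candidate for terminating the merging process.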

Figure 1 depicts the flowchart of the modified CTM with the RDS-based adaptive selection of 𝜀, where the MQ coder-based image features are projected into a low-dimensional space via principal component analysis (PCA) in order to reduce the computational cost further, and the initial superpixels are obtained by using the simple 𝐾-means algorithm. Taking the image shown in Figure 2(a) as an example, the candidates of 𝜀 are shown in Figure 2(e), where the horizontal and vertical axes are the distortion and the RDS values, respectively. Figures 2(b)–2(d) show the segmentation results with the first, second, and third candidates of 𝜀. As one can see, the rate distortion information can be used to avoid overmerging of CTM.

4. Experimental Results

The proposed algorithm has been extensively evaluated on the Berkeley database [33]. The 9/7-wavelet filters adopted by JPEG2000 are used to extract the MQ coder-based image features. The number of initial superpixels is set to 50. In addition to visual inspection, the boundary displacement error (BDE) and the probabilistic Rand index (PRI) [34] are used for quantitative evaluation. The segmentation results are compared with those of CTM, Mean-Shift, and NCuts. In Mean-Shift, the parameters 𝑠 and 𝑟 are set to 13 and 19, respectively; in NCuts, the number of segments is 20. The threshold 𝛾 of CTM is set to 0.1, as suggested in [32].
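For reference, the PRI of a test segmentation against several ground-truth segmentations can be computed as the average pairwise label agreement, as in the following minimal Python sketch. It assumes that each segmentation is given as a flat list of integer labels, one per pixel, and it enumerates all pixel pairs, which is only practical for small inputs; the function name is hypothetical and the code is not the evaluation tool used in the experiments.

```python
from itertools import combinations
from typing import List, Sequence


def probabilistic_rand_index(test: Sequence[int],
                             truths: List[Sequence[int]]) -> float:
    """Average agreement between `test` and the ground truths over all pixel pairs."""
    score = 0.0
    pairs = 0
    for i, j in combinations(range(len(test)), 2):
        # Fraction of ground truths that put pixels i and j in the same segment.
        p_same = sum(g[i] == g[j] for g in truths) / len(truths)
        score += p_same if test[i] == test[j] else 1.0 - p_same
        pairs += 1
    return score / pairs


test_labels = [0, 0, 1, 1]
ground_truths = [[5, 5, 7, 7], [0, 0, 0, 0]]
print(probabilistic_rand_index(test_labels, ground_truths))
```

The index lies between 0 and 1, and higher values indicate better agreement; label values themselves are irrelevant, since only the grouping of pixel pairs is compared.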

We first evaluate the segmentation performance at various compression rates. Figure 3(a) shows a test image with two Brodatz textures, namely, wood and grass. Figure 3(b) depicts the percentage of segmentation errors at various rates of bits per pixel (bpp). It is noted that the segmentation results are satisfactory even at low bpp rates; thus, a small portion of the code stream is sufficient for the segmentation task. This saves transmission time, computational cost, and memory space, which is especially desirable for Internet applications.

Table 1 shows the BDE and PRI performances compared to CTM, Mean-Shift, and NCuts. The proposed algorithm is preferable in terms of the average BDE.

The improvements in PRI and BDE obtained by using (3.10) are shown in Figures 4(a) and 4(b), respectively, where the horizontal axis is the threshold 𝛾. It is shown that the proposed algorithm is more robust, as it takes account of the rate distortion information to avoid overmerging.

Figures 5, 6, 7, 8, 9, and 10 are representative of the Landscape, Objects, Urban, Water, Portraits, and Animals images in the Berkeley database. The original images are shown in the left column. The segmentation results by using the proposed algorithm and the CTM algorithm are given in the middle and right columns, respectively. It is noted that, for images with high-detail contents, the proposed algorithm improves the segmentation results visually.

5. Conclusion

The MQ coder provides effective probability models, which are available at both encoder and decoder and can therefore be used to extract image features directly in the JPEG2000 domain. As a result, no overhead transmission is necessary to extract the feature vectors, and moreover the burden of decompressing a JPEG2000 image can be avoided. Based on the MQ coder, an efficient scheme of segmenting an image has been proposed. In order to avoid overmerging, the CTM algorithm has been modified by taking account of the rate distortion information. The proposed algorithm has been evaluated on images with Brodatz textures and the Berkeley image database. It is shown that the segmentation results at low to middle bpp rates are rather promising. In addition, for images with high-detail contents, the proposed algorithm is preferable in terms of the average BDE measure and visual comparison.

Acknowledgments

The authors are grateful to the maintainers of the Berkeley image database. This work was supported by the National Science Council of Taiwan under Grant NSC100-2628-E-239-002-MY2.