#### Abstract

A novel reversible data-hiding scheme is proposed to embed secret data into a side-matched-vector-quantization- (SMVQ-) compressed image and achieve lossless reconstruction of a vector-quantization- (VQ-) compressed image. The rather randomly distributed histogram of a VQ-compressed image can be relocated to locations close to zero by SMVQ prediction. With this strategy, fewer bits can be used to encode SMVQ indices with very small values. Moreover, no indicator is required to encode these indices, which yields extra space to hide secret data. Hence, a high embedding capacity and a low bit rate are achieved. More specifically, experimental results show that the proposed scheme outperforms former data-hiding schemes for VQ-based, VQ/SMVQ-based, and search-order-coding- (SOC-) based compressed images in terms of the embedding rate, the bit rate, and the embedding capacity.

#### 1. Introduction

With the explosive growth of communication over the Internet, information processing and management at any time have become a standard service for most people, which in turn raises security problems such as interception, modification, and montage. Thus, one critical issue is how to protect information transmitted over the Internet effectively. To cope with this, data hiding techniques have attracted attention because they can embed secret data into a cover image with minimal perceptual degradation.

In general, data hiding techniques can be classified into two categories, namely, reversible and irreversible data hiding schemes. With irreversible schemes, only the secret data can be extracted, and the cover image cannot be restored. Irreversible schemes are therefore unsuited to applications that demand that the original cover image remain unaltered by the data hiding process. Conversely, reversible data hiding schemes can extract the secret data and recover the original cover image at the decoder. Moreover, data hiding can be performed in three domains, that is, the spatial domain [1–5], the frequency domain [6–8], and the compressed domain [9–15]. In the spatial domain, Tian [1] proposed a reversible data hiding scheme called difference expansion (DE). Tian’s work expands the difference between two neighboring pixel values to embed a bit. This scheme is easy to realize, yet the embedding capacity depends heavily on the smoothness of the image. Ni et al. [2] utilized the zero or minimum points of the image histogram and slightly modified the pixel values to embed secret data into the peak points of the histogram. The scheme is simple to implement and supports high visual quality. However, the embedding capacity is restricted by the number of pixels at the peak points of the histogram. In the frequency domain, a cover image can be transformed with various transform kernels, for example, DCT, DFT, or DWT. Normally, frequency domain techniques embed a message by modulating the transformed coefficients of subbands within the just noticeable difference (JND) according to the sensitivity of the human visual system (HVS).

Data hiding techniques in the compressed domain enjoy the advantages of both data hiding and compression for multimedia distribution. Jo and Kim [9] first proposed an irreversible VQ-based scheme that is easy to realize, yet its hiding capacity is rather small. Later, many VQ/SMVQ-based schemes with indicators were proposed. For example, Chang et al. [10] developed a VQ-based data hiding scheme with recovery capability. The scheme has a low embedding capacity and a high bit rate (BR) because an indicator is added in front of most encoded indices and only two clusters are used to hide secret data. In [11, 12], secret data hiding was proposed based on the search-order coding compression method of VQ indices to increase the embedding capacity. Search-order coding selects the neighboring indices of the index being encoded to form a state codebook, and then the search-order code (SOC) and the original index value (OIV) are employed to hide a secret bit 0 or 1. A 1-bit indicator is always employed to distinguish between SOC and OIV. For this scheme, the original cover image can be restored completely. Shie et al. [13] developed an adaptive data hiding scheme based on VQ-compressed images, in which image blocks are classified into embeddable and unembeddable blocks based on their variances and side-match distortions (SMDs). The embeddable blocks are employed to hide secret data, while the unembeddable blocks remain unchanged. An indicator is required for the block judgment. Although this scheme can yield a high embedding capacity, the quality of the reconstructed image decreases as the quantity of secret data increases. In addition, this scheme is irreversible. In [14], both VQ and SMVQ are used to hide secret data; a 1-bit indicator is always required for each index, and only 1 bit of secret data can be embedded in each index when the VQ technique is applied. Conversely, more than one bit of secret data can be embedded in each index when the SMVQ technique is applied. This scheme can provide a higher hiding capacity, yet it is irreversible. In [15], Yang and Lin extended the scheme proposed by Chang et al. [10] by dividing the VQ codebook into clusters, half of which are used to embed secret data, where BS denotes the size of the secret data embedded into each VQ index. In this scheme, both VQ and SMVQ are applied to hide secret data, and a 1-bit indicator is required for index identification. Moreover, it is a reversible data-hiding scheme. In all of these schemes, indicators are always used, which causes a higher bit rate.

By exploiting the special index distribution of a SMVQ-compressed image, in which frequently used indices are encoded by short codes while rarely used indices are encoded by long codes, this work proposes a novel reversible data hiding scheme in the SMVQ-compressed domain. The scheme can increase the hiding capacity and completely restore the cover image from an embedded image after the hidden data are extracted. Moreover, the indices of a VQ-compressed image can be relocated close to zero by using SMVQ predictions. SMVQ indices located around zero can be encoded using fewer bits. Compared with the previous schemes [10, 12, 15], in which the bit rates are increased because indicators are required for most of the indices, the proposed scheme employs SMVQ prediction, and thus most of the indices are encoded without an indicator. As a result, a high embedding capacity and a low bit rate can be achieved. As documented in the experimental results, the proposed scheme is significantly superior to the previous works [10, 12, 15] in terms of the embedding rate, bit rate, and embedding capacity.

The rest of this paper is organized as follows. Section 2 briefly introduces the concepts of VQ and SMVQ. Section 3 describes the proposed scheme in detail. Section 4 gives the experimental results and compares the performance of the proposed scheme with that of former approaches. Finally, Section 5 draws conclusions.

#### 2. Backgrounds

##### 2.1. Brief Concept of the VQ

VQ begins with codebook construction from a set of training images: the training images are partitioned into nonoverlapping blocks, and the most representative blocks are selected to form the codebook, whose elements are called codewords. In general, the LBG algorithm [16] is employed to produce the desired codebook. With the generated codebook, each block in an image is encoded with the index of the nearest codeword, so that the storage space required for the image is greatly reduced. To measure the similarity between a block $x$ and a codeword $cw_i$, where $cw_i$ is the $i$th codeword in the codebook, the squared Euclidean distance is generally used:

$$d(x, cw_i) = \sum_{j=1}^{k} \left(x_j - cw_{i,j}\right)^2, \tag{1}$$

where $k$ is the number of pixels in a block.

When the encoding process is completed, block $x$ is simply represented by the index $i$ of its nearest codeword $cw_i$. For decoding, a table lookup is performed with the same codebook as that used by the encoder.
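The VQ encoding and decoding steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names (`squared_distance`, `vq_encode`, `vq_decode`) and the toy three-codeword codebook are assumptions introduced here, and a real codebook would be trained with the LBG algorithm.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(block: Vector, codeword: Vector) -> float:
    """Squared Euclidean distance between a flattened block and a codeword."""
    if len(block) != len(codeword):
        raise ValueError("block and codeword must have the same dimension")
    return sum((b - c) ** 2 for b, c in zip(block, codeword))


def vq_encode(blocks: Sequence[Vector], codebook: Sequence[Vector]) -> List[int]:
    """Replace every block by the index of its nearest codeword."""
    return [
        min(range(len(codebook)), key=lambda i: squared_distance(block, codebook[i]))
        for block in blocks
    ]


def vq_decode(indices: Sequence[int], codebook: Sequence[Vector]) -> List[List[float]]:
    """Decoding is a plain table lookup in the same codebook."""
    return [list(codebook[i]) for i in indices]


# Toy data (hypothetical): 2x2 blocks flattened to 4 values, 3 codewords.
toy_codebook = [[0, 0, 0, 0], [100, 100, 100, 100], [200, 200, 200, 200]]
toy_blocks = [[10, 5, 0, 3], [190, 210, 205, 199], [90, 110, 100, 95]]
toy_indices = vq_encode(toy_blocks, toy_codebook)
toy_reconstruction = vq_decode(toy_indices, toy_codebook)
```

Each block is stored as a single index, which is where the compression comes from; the reconstruction is lossy because a block is replaced by its nearest codeword.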

##### 2.2. SMVQ

Kim initially proposed SMVQ for image coding [17]. SMVQ exploits the high correlation among neighboring blocks and side-match prediction to create a state codebook with a size smaller than that of the original VQ codebook. The side-match prediction assumes that the values of the adjacent pixels between the neighboring blocks are equal. In SMVQ, blocks located in the first row and first column of an image are encoded by traditional VQ and the remaining blocks are predicted using the corresponding state codebook. Figure 1 shows an example of the relationships among an encoding block , its upper neighboring block , and its left neighboring block for blocks of size . This work denotes the border vector and the side vector of block as and . The SMD between a block ,predicted by blocks and , and a codeword cw in the codebook is measured by the squared Euclidean distance denoted as follows:

For each block, SMVQ sorts the codebook CB according to SMDs, and thus the first codewords in the sorted codebook are picked to form the state codebook SC, in which SMDs are the smallest for block . Subsequently, the SMVQ searches the closest codeword from the state codebook for block , and then the index of the closest codeword is used to encode block . Because the size of the state codebook is smaller than that of the VQ codebook, the size of the SMVQ index is reduced to improve the coding gain. However, there is a distortion between the block, decoded by the SMVQ index, and the corresponding block, decoded by the VQ index.

#### 3. The Proposed Scheme

A grayscale cover image of size is partitioned into nonoverlapped blocks of pixels, in which . Normally, the variables . Each block of the image can be encoded using VQ, and thus each block is represented with an index of the nearest codeword in the codebook. A VQ-compressed image, denoted by , consists of indices. The size of an index is proportional to the number of codewords in the codebook. A SMVQ-compressed image is generated by applying SMVQ predictions to a VQ-compressed image. To reconstruct a VQ-compressed image completely, the size of the state codebook and that of the VQ codebook are first set as identical. Next, the codebook is sorted according to the SMDs which denote the differences between an encoding block and the codewords in the VQ codebook. Finally, the state codebook is generated using all of the codewords in the sorted codebook. For a VQ index, the corresponding SMVQ index can be obtained using the state codebook and no distortion is presented between the two codewords decoded by the two indices.
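The lossless mapping between a VQ index and its SMVQ index can be illustrated as a rank lookup in the SMD-sorted codebook. The sketch below is only an illustration under stated assumptions: the SMD of every codeword for the current block is taken as given (its computation from the neighboring blocks is not shown), ties are broken by the VQ index, and the names `sorted_state_codebook`, `vq_to_smvq`, and `smvq_to_vq` are hypothetical.

```python
from typing import List, Sequence


def sorted_state_codebook(smds: Sequence[float]) -> List[int]:
    """VQ indices ordered by ascending SMD (ties broken by VQ index).

    Because the state codebook has the same size as the VQ codebook,
    it is simply a permutation of all VQ indices.
    """
    return sorted(range(len(smds)), key=lambda i: (smds[i], i))


def vq_to_smvq(vq_index: int, smds: Sequence[float]) -> int:
    """SMVQ index = rank of the VQ codeword in the SMD-sorted codebook."""
    return sorted_state_codebook(smds).index(vq_index)


def smvq_to_vq(smvq_index: int, smds: Sequence[float]) -> int:
    """Inverse mapping: look the rank up in the same sorted codebook."""
    return sorted_state_codebook(smds)[smvq_index]


# Hypothetical SMDs of a 4-codeword codebook for one block.
toy_smds = [40.0, 3.5, 12.0, 90.0]
best_match_rank = vq_to_smvq(1, toy_smds)   # codeword 1 has the smallest SMD
recovered_vq = smvq_to_vq(vq_to_smvq(0, toy_smds), toy_smds)
```

When the side-match prediction is accurate, the VQ codeword of a block has one of the smallest SMDs, so its rank (the SMVQ index) is close to zero. The inverse lookup returns the original VQ index exactly, provided the decoder can recompute the same SMDs from blocks it has already restored.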

Figures 2 and 3 show histograms of the VQ-compressed Peppers image and the corresponding SMVQ-compressed image, respectively, with a codebook of size 256. Based on the SMVQ prediction, most indices of the SMVQ-compressed image are distributed close to zero. For a SMVQ-compressed image, a very small value, binary digits ( bits) representing unsigned integers from 0 to , can cover most of the indices. In the proposed scheme, the numbers are employed as indicators and/or encoding codes. Table 1 shows the function of each number of the binary digits ( bits). A binary number between 0 and represents an encoding code and indicator. Binary numbers and represent indicators, denoted by *indicator 0* and *indicator 1*, respectively.

##### 3.1. The Encoding Process

Suppose the size of a secret data, sd, is bits. During the encoding process, a SMVQ index with a value less than , which is defined as a tiny-value index, is denoted by , and the corresponding encoded index is denoted by . Then, is expressed in (3), where and denote exponential and concatenation operators, respectively,

Most of the SMVQ indices can be encoded using (3) including no indicator, and the rest are encoded with the help of *indicator 0* or *indicator 1*. However, the adoption of the indicator increases the size of the encoded result. A SMVQ index with a value in between and is defined as a middle-value index. Indices with middle values are encoded using *indicator 0* concatenated with an encoding space with bits. A middle-value index is denoted by , and the corresponding encoded index is denoted by which can be expressed in the following:

The -bit encoding space of (4) is not to hide data but to collect more indices. *Indicator 1* is employed to assist encoding a SMVQ index which cannot be encoded using (3) or (4). A SMVQ index with a value more than is defined as a large-value index, denoted by , and the corresponding encoded index is denoted by which can be expressed in the following:

Equation (5) not only hides no data but also increases an additional indicator with bits. For a SMVQ-compressed image, most indices with tiny values are encoded using (3), which hides bits and generates an encoded result with bits for each index; a few indices are encoded using (4), which generates an encoded result with bits for each index; few indices are encoded using (5), which generates an encoded result with bits for each index, where cs indicates the size of a codebook. Table 2 shows the ranges and symbols of tiny-value, middle-value, and large-value indices. The size of an embedded image can be measured with (6), where denotes the size of and, , and denote the number of indices with tiny values, middle values, or large values, respectively:

In (6), and can be controlled in less or more than bits and only the third item always requires a space with more than bits. It is remarkable that a larger leads to a larger embedding capacity, and vice versa. Moreover, a larger increases , while decreasing , and vice versa. Based on the special index distribution property of the SMVQ, most indices can be encoded using (3), in which no indicator is required and -bit data are hidden in each index, and thus a high embedding capacity and a low embedded image size can be achieved in the encoding process. The algorithm of the embedding process is summarized as follows and an example of the index classification and encoding schemes is shown in Figure 4.

*Input.* A grayscale cover image of size , a codebook CB of size cs, a secret data stream, and parameters and .

*Output.* The code stream in binary form and parameters and .

*Step 1. *Compress the cover image using VQ to obtain the VQ-compressed image of size .

*Step 2. *Perform SMVQ predictions with the VQ-compressed image to create the SMVQ-compressed image of size .

*Step 3. *Read the next SMVQ index, denoted as , from the SMVQ-compressed image with the raster scan order.

*Step 4. *If (a tiny-value index), then we have the following.

*Step 4.1.* Read -bit data, sd, from the secret data stream,

*Step 4.2.* is encoded using (3), , where denotes a concatenation operation, and D2B denotes a decimal to binary operation.

*Step 5. *If is in between and (a middle-value index), then is encoded using (4), .

*Step 6. * If (a large-value index), then is encoded using (5), .

*Step 7. *Repeat Steps 3–7 until all of the indices in the SMVQ-compressed image are processed.

*Step 8. *Output the code stream and parameters and .
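The embedding steps above can be sketched as follows. Because the parameter symbols are not legible in this copy of the text, the sketch introduces its own hypothetical names: `m` is the bit width of the short code, `n` the bit width of the middle-value offset, `s` the number of secret bits hidden per tiny-value index, and `cs` the codebook size. It further assumes that *indicator 0* and *indicator 1* are the two largest `m`-bit values, that a middle-value index is stored as its offset from *indicator 0*, and that the secret stream is zero-padded when it runs out. These are assumptions for illustration, not a definitive statement of the scheme.

```python
from typing import List, Sequence, Tuple


def to_bits(value: int, width: int) -> str:
    """Return `value` as a fixed-width binary string."""
    if not 0 <= value < (1 << width):
        raise ValueError(f"{value} does not fit in {width} bits")
    return format(value, f"0{width}b")


def encode_indices(indices: Sequence[int], secret: str,
                   m: int, n: int, s: int, cs: int) -> Tuple[str, int]:
    """Encode SMVQ indices and embed secret bits; return (code stream, bits used)."""
    indicator0 = (1 << m) - 2                 # assumed value of indicator 0
    indicator1 = (1 << m) - 1                 # assumed value of indicator 1
    middle_max = indicator0 + (1 << n) - 1    # largest middle-value index (assumed)
    index_bits = (cs - 1).bit_length()        # bits of a plain VQ/SMVQ index
    out: List[str] = []
    used = 0
    for idx in indices:
        if idx < indicator0:
            # Tiny-value index: short code followed by s secret bits, no indicator.
            chunk = secret[used:used + s].ljust(s, "0")  # zero padding is an assumption
            used = min(used + s, len(secret))
            out.append(to_bits(idx, m) + chunk)
        elif idx <= middle_max:
            # Middle-value index: indicator 0 followed by the n-bit offset.
            out.append(to_bits(indicator0, m) + to_bits(idx - indicator0, n))
        else:
            # Large-value index: indicator 1 followed by the full index.
            out.append(to_bits(indicator1, m) + to_bits(idx, index_bits))
    return "".join(out), used


# Hypothetical parameters: 2-bit codes, 3-bit offsets, 5 secret bits, 256 codewords.
code_stream, bits_used = encode_indices([0, 3, 127], "00000", m=2, n=3, s=5, cs=256)
```

With these assumed parameters, index 0 costs 7 bits and carries 5 secret bits, index 3 costs 5 bits, and index 127 costs 10 bits, compared with 8 bits for a plain index in a 256-word codebook.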

Figure 5 shows an example of the encoding process for hiding 5-bit secret data. Let be *indicator 0* and *indicator 1*, where , , , and , and the size of the state codebook is identical to that of the VQ codebook. For each image (Figure 5), let to be the indices located in the first row and to the indices located in the second row, and so on. For the SMVQ-compressed image, has a value 0, and it is less than . The index is encoded using (3). Secret data are (00000)_{2}. According to (3), the index with bits followed by secret data (00000)_{2} is presented as the encoded result for . has a value of 3, and it is in between and . An indicator , 2, followed by the offset index, 1, is the encoded result for . has a value of 127 and it is more than . An indicator , followed by the index, 127, is the encoded result for . All of the encoded results are in binary form, and the length of the encoded result of each index is , or bits using (3), (4), or (5), respectively.

##### 3.2. The Extraction and Reversion Process

The goal of the data extraction and reversion process is to extract the embedded secret data from an embedded image and then recover the original VQ-compressed image. The extraction and reversion process is introduced as below. For an encoded index, bits are firstly read from code stream and the -bit binary value is converted to the decimal value, denoted by . In the first case, if is less than , the encoded index is an index. Then, the original index is directly restored by . Next, read bits from the code stream to reconstruct the secret data. In the second case, if is , the encoded index is an index. Then, read the next bits and convert them into decimal value, denoted by ; the original index is recovered by . In the third case, if is , the encoded index is a index. Then, read the next bits and convert them into decimal value ; the original index is recovered by . When the original SMVQ-compressed image has been reconstructed, SMVQ predictions can be employed to restore the original VQ-compressed image. In summary, the extraction and reversion process of the proposed scheme is organized as below.

*Input.* The code stream in binary form and parameters and .

*Output.* The reconstructed VQ-compressed image of size and the retrieved secret data stream.

*Step 1. *Set ptr = 1 and the restored sd = Null.

*Step 2. *Read the next bits from the code stream and convert the bits to decimal value, denoted by , , .

*Step 3. *If (a tiny-value index), then restored index = ; read the next bits, denoted by sd′, from code stream , restored sd = restored sd sd′, .

*Step 4. *If (*indicator 1*), then read the next bits from the code stream *c*_bs, = B2D(, restored index = .

*Step 5. *If (*indicator 0*), then read the next bits from the code stream , , , restored index = .

*Step 6. *Repeat Steps 2–6 until all of the bits in the code stream are processed.

*Step 7. *Apply the SMVQ prediction to the reconstructed SMVQ-compressed image to obtain the original VQ-compressed image.

*Step 8. *Output the reconstructed VQ-compressed image and the restored secret data stream.
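The extraction steps above can be sketched as a simple bit-stream parser. As with any reading of this copy, the parameter names are hypothetical (`m`: short-code width, `n`: offset width, `s`: secret bits per tiny-value index, `cs`: codebook size), and the sketch assumes that *indicator 0* and *indicator 1* are the two largest `m`-bit values and that a middle-value index is stored as an offset from *indicator 0*. The final SMVQ-to-VQ restoration (Step 7) is not shown.

```python
from typing import List, Tuple


def decode_stream(code: str, m: int, n: int, s: int, cs: int) -> Tuple[List[int], str]:
    """Recover the SMVQ indices and the embedded secret bits from a code stream."""
    indicator0 = (1 << m) - 2               # assumed value of indicator 0
    index_bits = (cs - 1).bit_length()      # bits of a plain VQ/SMVQ index
    indices: List[int] = []
    secret: List[str] = []
    pos = 0

    def read(width: int) -> str:
        """Consume `width` bits from the stream."""
        nonlocal pos
        if pos + width > len(code):
            raise ValueError("truncated code stream")
        bits = code[pos:pos + width]
        pos += width
        return bits

    while pos < len(code):
        value = int(read(m), 2)
        if value < indicator0:
            # Tiny-value index: the value is the index; s secret bits follow.
            indices.append(value)
            secret.append(read(s))
        elif value == indicator0:
            # Middle-value index: add the n-bit offset back to indicator 0.
            indices.append(indicator0 + int(read(n), 2))
        else:
            # Large-value index: the full index follows indicator 1.
            indices.append(int(read(index_bits), 2))
    return indices, "".join(secret)


# Hand-built stream for indices 0 (with secret 00000), 3, and 127.
example_stream = "0000000" + "10001" + "1101111111"
decoded_indices, decoded_secret = decode_stream(example_stream, m=2, n=3, s=5, cs=256)
```

The parser needs no per-index indicator for tiny-value indices, because the first `m` bits alone tell it which of the three cases applies. If the encoder padded the secret stream, the decoder cannot distinguish padding from data, so the secret length would have to be known separately.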

#### 4. Experimental Results

In this section, we present the experimental results for evaluating the performance of the proposed scheme in terms of the embedding rate, bit rate, and embedding capacity. Various gray-level images as shown in Figure 6 are employed as the training images or cover images, each of which is of size . The codebooks of sizes 128, 256, 512, and 1024 with codewords of 16 dimensions were trained by the LBG algorithm with “Airplane,” “Boat,” “Lena,” “Peppers,” and “Sailboat” as the training images.

**Figure 6.** Test images: (a) Airplane, (b) Lena, (c) Peppers, (d) Boat, (e) Sailboat, (f) Tiffany, (g) House, (h) Elaine, (i) Zelda, (j) Gold hill.

The secret data in the experiment are binary, consisting of 0s and 1s generated by a pseudorandom number generator. If a high level of security is required, the secret data can be encrypted prior to embedding using well-known cryptographic methods such as DES or RSA. In this work, the embedding rate (ER), bit rate (BR), and embedding capacity (EC), as defined in (7)–(9), respectively, are employed to evaluate the performance of the various schemes. In general, a data-hiding method with a small BR and large EC and ER performs well, and vice versa.

##### 4.1. Experiments on Selecting the Appropriate Parameter

To obtain the best performance of the proposed scheme, the parameter must be properly determined. As mentioned, when and are set to a larger value, , EC, and BR are increased. The encoding length of a VQ index is bits. A VQ index encoded as an or index yields an additional space of bits, which can be used to hide secret data or to collect more indices. Experiments are performed to choose an appropriate value that achieves the highest embedding rate for different codebooks and different images. First of all, and are set to . Figure 7 shows the performance of the proposed algorithm using different , codebook sizes, and test images. For codebooks of sizes 128 and 256, SMVQ provides accurate prediction, and most of the VQ indices are predicted by the codewords with smaller indices in the SMVQ state codebook. In these cases, the highest ERs are obtained when is set to a small value, 3. For codebooks of sizes 512 and 1024, must be set to greater values, 4 or 5, to obtain a high ER. For each image, the largest ERs appear with values of 3, 3, 4, and 5 for codebooks of sizes 128, 256, 512, and 1024, respectively.


The experiments are performed with the selected values of for the four codebooks of sizes 128, 256, 512, and 1024. Experimental results for the test images are shown in Table 3, in which the columns of simulation results are obtained with (cs, , , ) = (128, 3, 4, 4), (256, 3, 5, 5), (512, 4, 5, 5), and (1024, 5, 5, 5), respectively. From Table 3, it can be seen that the BRs of the proposed scheme are only somewhat larger than the original VQ BRs, while the embedding capacities of the proposed scheme are very high. Moreover, for most images, high embedding capacities are still obtained when the codebook size is increased. A smooth image, such as “Tiffany,” has a higher embedding capacity than a complicated image, such as “Boat,” because the SMVQ prediction normally performs well for a smooth image, and thus many indices are encoded with indices. With the proposed method, when the codebooks are of sizes 128, 256, 512, and 1024, the average ECs are 3.246, 3.637, 3.848, and 3.872 bits/index, respectively. In addition, for applications of secret communication, can be set to a value smaller (larger) than to obtain a lower (higher) embedding capacity, and consequently the corresponding bit rate changes as well.

##### 4.2. Experiments on Comparing the Proposed Method with Other Methods

To demonstrate the superiority of the proposed algorithm, it is compared with the VQ-based scheme proposed by Chang et al. [10], the VQ/SMVQ-based scheme proposed by Yang and Lin [15], and the SOC-based scheme proposed by Shie and Lin [12], as detailed below. To ensure a fair comparison, the , , and values of the proposed scheme are set to obtain an embedding capacity close to that of the other schemes, and the codebook size for each comparison is set to be identical.

The results on the left and in the middle of Table 4 show the ERs, BRs, and ECs of the schemes developed by Chang et al. [10] and Yang and Lin [15], respectively. The results on the right of Table 4 show the ERs, BRs, and ECs of the proposed scheme. In Table 4, Chang et al.’s scheme is VQ-based, Yang and Lin’s scheme is VQ/SMVQ-based, and the proposed scheme is SMVQ-based. First of all, , , and of the proposed scheme are set to obtain an embedding capacity close to that of the other schemes. Thus, when the codebooks are of sizes 128, 256, 512, and 1024, (, , ) are set at (2, 3, ), (3, 3, ), (3, 3, ), and (4, 3, ), respectively. SMVQ can relocate the histogram of the VQ indices to locations near zero. In the proposed scheme, indices with values close to zero are encoded using short codes without any indicator, while indices with values far from zero are encoded using the original index with an indicator. Owing to the characteristics of SMVQ, most of the indices have small values, and only a few have large values. This property leads to a low bit rate and a high embedding capacity with the proposed scheme. In Chang et al.’s [10] scheme, when the number of embeddable indices is increased, the embedding capacity of each index is decreased, and vice versa, so a satisfactory EC cannot be obtained. Moreover, most indices are encoded by an indicator concatenated with the encoding code. The length of an indicator or an encoding code is identical to that of the original VQ index, and the bit rate of an embedded image is thus increased. In Yang and Lin’s [15] scheme, a 1-bit indicator is always used to distinguish a SMVQ encoding from a VQ encoding, and other indicators are required for VQ encoding, leading to a high bit rate. Compared with the other schemes, the proposed scheme has a low bit rate yet a high embedding capacity and embedding rate. In particular, when the embedding capacities of the proposed scheme are controlled to values close to those of the other schemes, the proposed scheme presents much lower BRs, even smaller than the original VQ BRs, for all of the test images.

The results on the left of Table 5 show the ERs, BRs, and ECs of the SOC-based scheme developed by Shie and Lin [12], with codebooks of sizes 256 and 512, respectively. The results on the right of Table 5 show the ERs, BRs, and ECs of the proposed scheme with codebooks of sizes 256 and 512, respectively. In Shie and Lin’s [12] scheme, a 1-bit indicator is always used to distinguish a SOC index from an OIV index; a SOC index is encoded using bits, while an OIV index is encoded using bits. The variable indicates the length of a search path and is set to 2 to yield better performance. The proposed scheme encodes and indices using and bits, where is set to 3 to obtain embedding capacities close to those of Shie and Lin’s method. The length of a VQ index is , in which or bits are used to encode an index or a SOC index, and the remaining bits can be used to hide secret data. In Table 5, first of all, the encoded result for each or SOC index of the two schemes is set to bits, which yields a hiding space of bits for a SOC index and bits for a index. For each test image with a specific codebook size, the bold-number columns in Table 5 show that the EC values of the proposed scheme are significantly greater than those of Shie and Lin’s scheme, while lower BR values and higher ER values are achieved with the proposed scheme.

Next, the hiding space of Shie and Lin’s [12] scheme is set to bits, which achieves a high embedding space, and the length of the encoded result for a SOC index is 1 + 2 + bits. Because the proposed scheme performs well in terms of ER, BR, and EC, , , and the hiding space of the proposed scheme are set to 3, and bits; this setting offers an EC close to that of Shie and Lin’s scheme and decreases the length of the encoded result of an index to bits. For each codebook size and test image, the unmarked columns in Table 5 show that the EC values of the proposed scheme are still much greater than those of Shie and Lin’s scheme under the high hiding space, while the proposed scheme preserves low BR values and high ER values. Notably, the “Zelda” and “Gold hill” images are not among the training images, yet the proposed scheme still achieves superior performance on them.

As shown above, the proposed scheme outperforms the VQ-based scheme proposed by Chang et al. [10], the VQ/SMVQ-based scheme proposed by Yang and Lin [15], and the SOC-based scheme proposed by Shie and Lin [12] in three aspects, that is, embedding rate, bit rate, and embedding capacity. Thus, the proposed method can be considered an effective candidate for reversible data hiding applications.

#### 5. Conclusions

In this paper, a novel reversible data-hiding scheme for SMVQ-compressed images is proposed. In most VQ/SMVQ hiding schemes, the bit rate is increased because an indicator is required to encode an index. The proposed scheme applies the SMVQ prediction to relocate most of the indices of a VQ-compressed image to locations close to zero. Consequently, these frequently used small-value indices are encoded using fewer bits, and no indicator is required, which increases the embedding capacity and decreases the bit rate. The experimental results demonstrate that the proposed scheme can completely extract the secret data and recover the original VQ-compressed image. Moreover, compared with other schemes, the proposed scheme achieves better efficiency under each codebook size for various test images.

#### Acknowledgment

The work was supported by National Science Council, Republic of China, Taiwan, under Grant NSC100-2221-E-182-047-MY3.