Abstract

Multimedia data is a popular communication medium, but requires substantial storage space and network bandwidth. Vector quantization (VQ) is suitable for multimedia data applications because of its simple architecture, fast decoding ability, and high compression rate. Full-search VQ can typically be used to determine optimal codewords, but requires considerable computational time and resources. In this study, a hybrid VQ combining a tree structure and a Voronoi diagram is proposed to improve VQ efficiency. To efficiently reduce the search space, a tree structure integrated with principal component analysis is proposed, to rapidly determine an initial codeword in low-dimensional space. To increase accuracy, a Voronoi diagram is applied to precisely enlarge the search space by modeling relations between each codeword. This enables an optimal codeword to be efficiently identified by rippling an optimal neighbor from parts of neighboring Voronoi regions. The experimental results demonstrated that the proposed approach improved VQ performance, outperforming other approaches. The proposed approach also satisfies the requirements of handheld device application, namely, the use of limited memory and network bandwidth, when a suitable number of dimensions in principal component analysis is selected.

1. Introduction

With the development of wireless networks and handheld devices, multimedia data has become a popular communication medium. Generally, it requires substantial storage space and network bandwidth. However, handheld devices that offer limited memory and network bandwidth are seriously restricted in multimedia data applications. To overcome these problems, vector quantization (VQ), which yields a simple architecture, rapid decoding ability, and high compression rate, is widely used to reduce the size of multimedia data [15]. Thus, the ability of VQ to balance storage and computational complexity is useful in handheld device applications that involve multimedia data. VQ then could be applied to different applications such as 3D medical image compression [6], speech coding [7], information hiding [8, 9], and artistic style learning [10].

A VQ is defined as a mapping function that maps a -dimensional vector space into a finite subset . Each vector is called a codeword and a set with codewords is called a codebook. Generally, VQ involves three procedures: the codebook design, encoding, and decoding procedures. During the encoding procedure, an input vector is turned into an index value that indicates the location of the chosen codeword in , allowing the size of the multimedia data to be substantially reduced. Conversely, during the decoding procedure, a codebook lookup operation is applied to determine the optimal codeword, allowing the quantized multimedia data to be reconstructed. Therefore, the quality of reconstructed multimedia data depends on the codebook and chosen codewords. The codebook design procedure is used to design the codebook used in the encoding and decoding procedures. A typical codebook is obtained from numerous training vectors extracted from a set of multimedia data by using a clustering algorithm. The well-known Linde-Buzo-Gray (LBG) clustering algorithm [11] is widely applied in codebook construction.

Numerous approaches have been proposed to improve codeword search efficiency and these can be classified as the full-search equivalent and partial-search approaches. Full-search vector quantization (FSVQ) is the most simple and intuitive method, but it is inefficient at large codebook sizes [12]. Full-search equivalent approaches [1316] have been proposed to save computational time. These approaches not only use rough distortion elimination instead of executing the entire dimensions, but also filter out impossible codewords by using certain rejection conditions. However, the search times using these approaches are larger than those when using partial-search approaches. Moreover, regarding multimedia data, a certain degree of distortion is tolerable; thus, developing a partial-search-based VQ is critical when using multimedia data in handheld devices.

Partial-search approaches primarily use specifying data structures to organize codebooks and search for optimal codewords within a local search domain. This enables such approaches to save substantial computational time. Among partial-search approaches, tree-structured vector quantization (TSVQ) is one of the most effective approaches [17]. In TSVQ, a binary tree is applied to reduce computational complexity from to . However, the codeword determined by TSVQ is not sufficiently near the input vector, yielding poor quality of the reconstructed multimedia data. To improve TSVQ performance, multistage (MTSVQ) [18] and closest-couple TSVQ [19] have been used to determine the optimum codeword by enlarging the search space. However, these approaches may traverse wrong direction paths caused by the decision criteria in each node, degrading performance.

To avoid traversing wrong direction paths, projection-based VQ approaches [2024] have been proposed. Most of these approaches applied principal component analysis (PCA) to preserve most information of data, which is called the ratio of preserved information. This allows a vector to be transformed into a low-dimensional coordinate system; geometric measurements are subsequently applied to determine the nearest codeword. Cui et al. applied a B+ tree to rapidly locate an initial codeword and subsequently used -nearest neighboring candidates to determine an optimal codeword [23]. Because the candidates of each leaf node may overlap, the candidates are not well organized for locating the optimal codeword. Voronoi diagrams [2427] can be used to efficiently interpret the relations between neighboring codewords, and Chang and Wu [24] used a Voronoi diagram to model the geometric interpretation of neighboring codewords in a transformed two-dimensional (2D) coordinate system. By using Voronoi edges and vertices, an initial codeword was quickly found, and a planar-oriented ripple search algorithm was then applied to enlarge the search space. However, it was limited by the 2D space of the Voronoi diagram; this prevented it from precisely modeling the geometric interpretation.

In this study, a hybrid vector quantization (HVQ) combining a tree structure and a Voronoi diagram is proposed to accelerate the VQ quantization process. By retaining certain suitable principal components of PCA, sufficient information regarding relations between each pair of codewords is preserved in the projected coordinate system. A B-tree that indexed the principal components is used to rapidly determine an initial codeword. To precisely enlarge the search space of the initial codeword stored in a leaf node, a Voronoi diagram is applied to generate the Voronoi regions and their corresponding neighbors. However, because increasing the number of principal components causes a corresponding increase in the number of neighbors for each Voronoi region, the neighbors of each Voronoi region are clustered into groups, and a greedy search algorithm is applied to rapidly search for the optimal codeword.

The rest of this paper is organized as follows. Section 2 describes the HVQ, including the tree structure, Voronoi diagram, and greedy search algorithm. Section 3 presents the results of a series of experiments examining the performance of the proposed approach. In Section 4, the conclusion and recommendations for future research are presented.

2. Hybrid Vector Quantization

In this section, a new HVQ combining a tree structure and a Voronoi diagram is presented to improve performance beyond that of previously proposed methods. Figure 1 shows the HVQ structure, which combines a tree structure and a Voronoi diagram. First, a tree structure in a selected dimensional space is applied to quickly find an initial codewords. Second, Voronoi diagram is used to enlarge the search space of an initial codeword. Finally, a greedy search algorithm is proposed to search an optimum codeword in Voronoi diagram. The procedure for constructing the hybrid structure is detailed in the following subsections. PCA-based tree structure and Voronoi diagram are detailed in Sections 2.1 and 2.2. Then, the greedy search algorithm based vector search procedure is presented in Section 2.3.

2.1. PCA-Based Tree Structure

Because the expected minimal distance between two vectors in a high-dimensional space is large, codewords tend to be nearly equidistant to an input vector, rapidly degrading performance [28]. Selecting a suitable number of codeword dimensions could overcome this problem; thus, PCA efficiently reduces dimensions when a ratio of preserved information is selected. In PCA, codewords are normalized to be zero mean and unit variance (denoted as ) and allowing a covariance matrix to be obtained as follows: where denotes the mean of codewords and is the size of the codebook. Using the covariance method, the eigenvalues , where , and the corresponding eigenvectors can be computed. For a specified application, the computational complexity and ratio of preserved information in the projected subspace, , is balanced by selecting a suitable . The ratio of preserved information, , is defined as follows:

Using , the codewords in are easily projected to , resulting in a set . is then used to construct a tree structure to rapidly determine an initial codeword. A pseudocode representation of the tree construction is as shown in Pseudocode 1.

Input: a set of projected codewords
Output: a tree-structure based codebook
Tree construction procedure:
(1) root = new node // obtain the root of the tree structure
(2) ConstructTree(root, ) // apply the recursive function to construct the tree
(3) return root
Function ConstructTree(node, W)
(4) According to the set of input codewords, W, apply the LBG algorithm to
estimate two centroids, and
(5) Let and
(6) Each is assigned to or depending on whether w is closer to
or
(7) If the number of is greater than a predefined threshold then {
(8)    = new node
(9)   ConstructTree( , )
(10) }
(11) Else {
(12)
(13)  Find a codeword such that is minimum
(14)
(15) }
(16) If the number of is greater than a predefined threshold , then {
(17) = new node
(18) ConstructTree( , )
(19) }
(20) Else {
(21)
(22)  Find a codeword such that is minimum
(23)
(24) }

Lines 1–3 are used to generate a root for tree structure and the recursive function ConstructTree allows the construction of the entire tree structure for . In ConstructTree, a predefined threshold is applied to limit the levels of the tree structure. When is set as 1, each leaf node is linked to a codeword in in a one-to-one and onto relationship. Regarding each internal node, and are used to traverse the paths, allowing an initial codeword stored in leaf node to be found.

2.2. Voronoi Diagram Structure

The search space of the initial codeword can be enlarged to reduce distortion. Using a Voronoi diagram to interpret relations of neighboring codewords in [26, 27] allows the search space of the initial codeword to be enlarged. Using the PCA detailed in Section 2.1, is then used to generate the Voronoi diagram structure, which is a partition of . A set of Voronoi regions for can then be obtained and denoted as . is the Voronoi region of and must satisfy the following property:

It is clear that each Voronoi region contains only one projected codeword. Moreover, all points in are closer to than to any other projected codewords; thus, must be a nonempty set. A Voronoi region is bounded by Voronoi edges and Voronoi vertices, but some Voronoi regions are necessarily unbounded. In this study, a lifting-based algorithm, using tangent hyper planes, was used to construct a Voronoi diagram, as detailed in [28]. Figure 1 exemplifies a Voronoi diagram structure.

Increasing the number of dimensions of the Voronoi diagram structure substantially increases the number of neighbors of each Voronoi region. To reduce the search space, centroids of neighbors are estimated using the LBG algorithm and used to partition the search space into groups. An input vector is guided to efficiently search for an optimal codeword by using the partition with the closest centroid.

2.3. Greedy Search Algorithm

In this subsection, a greedy search algorithm is proposed as the VQ encoding procedure; this can be summarized in three steps. First, an input vector is projected to by using the eigenvectors proposed in Section 2.1. Second, a tree-structure-based search algorithm is adopted to find an initial codeword. In this step, two centroids stored in the internal nodes of the tree structure are used to traverse a path, exhibiting minimal distortion. This allows finding a leaf node contained an initial codeword and linked to a Voronoi region . Third, the neighboring Voronoi regions of are adopted to enlarge the search space, searching for a suitable optimal codeword. In this study, using neighboring regions to enlarge the search space is referred to as “rippling.” To accelerate the search process by using the Voronoi diagram, the neighboring Voronoi regions are classified into groups, and the centroid of the th group is denoted as . Increasing the number of ripples, the number of Voronoi regions to search correspondingly increases. To reduce this effect, a Voronoi region with minimal distortion is chosen from the neighboring Voronoi regions and is searched for the optimal codeword. A pseudocode representation of the greedy search algorithm is as shown in Pseudocode 2.

Input: an input vector x, a root node of tree structure root, and a set of Voronoi regions V
Output: an optimal codeword
Greedy search algorithm:
(1) Project x to , and denote the projected vector as
(2) Call = to find a Voronoi region
(3) Call = to find an optimal codeword with a
predefined number of ripples, , and set the minimal distance is as infinite
(4) Return
Function
(5) If then {
(6)   If then
(7)    Return a Voronoi region
(8)   Else
(9)    Call ,
(10) }
(11) Else {
(12)   If then
(13)    Return a Voronoi region
(14)   Else
(15)    Call ,
(16) }
Function , , ,
(17) If > 1 then {
(18)  Find a such that is minimized
(19)  Let be the neighbors in group of Voronoi region
(20) Find a such that the distance, , between and the codeword of is minimized
(21)  If then {
(22)    , , ,
(23)   Return
(24)  }
(25) }
(26) Return

A recursive strategy is used in the TreeSearch and GreedyRipple functions, searching for an initial codeword and a corresponding Voronoi region and traversing the neighboring Voronoi regions to locate an optimal codeword. In the proposed approach, Lines 20–24 yield an optimal codeword in a Voronoi region and traverse the Voronoi regions in the Voronoi diagram structure.

3. Experimental Results and Discussion

To evaluate the proposed approach, mel-frequency cepstrum (MFCC) based vectors that comprise 39 dimensions (12-MFCC, 12-MFCC, 12-MFCC, Energy, Energy, and Energy) were used in this study [29]. Two sets of 6654 and 3026 vectors were generated based on the TCC300 [30] and treated as the training and testing sets, respectively. Using the training set, the LBA algorithm was adopted to construct a codebook comprising 2048 codewords (denoted as CB2048).

3.1. Experimental Results of the Ratio of Preserved Information

First, the effect of the ratio of preserved information was examined. Regarding vectors comprising 39 dimensions, the ratios of preserved information were set as 56.56%, 68.38%, and 83.23% for the remaining 2, 3, and 4 principal components, respectively, by setting the variances of normal distributions. Then, 5012 vectors and 1024 vectors were randomly generated to form the simulated training and testing sets, respectively. A codebook comprising 2048 codewords was then constructed to analyze the characteristics of the proposed approach when using various ratios of preserved information.

The proposed approach was examined using one ripple to find an initial codeword and corresponding optimal codeword. The experimental results in Euclidean distance are shown in Table 1. The effect of the tree structure was analyzed, indicating that the depths of the tree structures for codebooks comprising various numbers of dimensions were almost the same. Thus, the effect of the number of dimensions can be neglected. Consequently, only the number of neighboring Voronoi regions was considered in this experiment. Increasing the ratio of preserved information from 56.56% to 82.23%, the distortion was reduced from 4.105 to 3.385, and the distortion reduction rate was 17.54%. Thus, the ratio of preserved information substantially affected the accuracy of the initial codeword.

Second, the effect of the Voronoi diagram was examined using one ripple and treating the neighboring Voronoi regions as one group. Increasing the number of dimensions from 2 to 4 increased the average number of neighbors for a Voronoi region from 6.4 to 39.1, decreasing the distortion of the optimal codeword from 3.227 to 2.197. The distortion reduction ratio was 31.92%. Clearly, increasing the ratio of preserved information greatly improved accuracy and increased the computational complexity. Thus, selecting a suitable ratio of preserved information is necessary to attain a balance between accuracy and search speed.

3.2. Experimental Results of Finding Initial Codeword

Because the number of dimensions and levels of a tree structure substantially affect performance, CB2048 was adopted to analyze these two factors. To control the depths of the tree structures, the number of tree nodes was limited to 256, 512, 1024, and 2048. In addition, the number of dimensions was set as 2, 3, and 4.

Figure 2 shows the experimental results, indicating that distortion was reduced by either increasing the number of dimensions or depths of the tree structure. First, we analyzed the effect of the depth of tree structure. Increasing the nodes of the tree structure from 256 to 2048 yielded distortion reduction rates of 6.68%, 13.21%, and 16.57% for 2, 3, and 4 dimensions, respectively. Performance can be improved by increasing the number of depths in a tree structure, but the distortion reduction rate at 4 dimensions was much higher than was that at 2 dimensions. Therefore, the ratio of preserved information plays a critical role in establishing decision criterion for the tree structure.

Second, we analyzed the effect of the number of dimensions. Increasing the number of dimensions from 2 to 4 yielded distortion reduction rates of 12.75%, 16.06%, 19.22%, and 22.00% after limiting the nodes of tree structure to 256, 512, 1024, and 2048, respectively. These results indicated that the distortion reduction rates for controlling the number of dimensions were higher compared with those for controlling the nodes of the tree structure. Thus, the number of dimensions yielded a greater effect on performance than did the number of depths.

Finally, we analyzed the storage and computation requirements for the tree structure. To simplify the analysis, the tree structures were treated as balanced trees, and the number of traversed nodes in the tree structure was treated as , where is the number of nodes. Hence, the number of traversed nodes was considered 8, 9, 10, and 11 for 256, 512, 1024, and 2048 nodes, respectively. When the number of tree nodes increased from 256 to 2048, the traversed nodes would only increase three nodes. Therefore, the differences in computational times for various numbers of tree nodes can be ignored in many applications. However, the storage requirement necessitates 1792 additional nodes. This serious increase in required storage presents a critical problem in applications with limited memory. When we increased one dimension of tree structure, it needed additional 8 to 11 comparison operations for 256 and 2048 nodes, respectively. This would also increase one additional storage space in each node. However, the additional storage space of increasing the dimensions was smaller than that of increasing number of depths.

To overcome this disadvantage, we proposed enlarging the search space of the Voronoi diagram. The experimental results are shown in Figure 3. The distortion for HVQ with 2 dimensions and 256 tree structure nodes was 6.771. The distortion for HVQ with 4 dimensions and 2048 tree structure nodes was 6.525. These results are similar, allowing the storage requirement to be reduced without degrading performance; thus, HVQ is suitable in applications with limited storage, such as handheld devices and microcontroller units.

3.3. Experimental Results of Searching Optimal Codeword

CB2048 was adopted to examine the characteristics of the Voronoi diagram structure. The number of neighboring Voronoi regions of each Voronoi region was counted when the number of dimensions was 2, 3, and 4. These results are shown as histograms in Figure 4. The average numbers of neighboring Voronoi regions were 7.0, 17.1, and 42.4 for 2, 3, and 4 dimensions, respectively. Approximately 99.46%, 97.61%, and 93.31% of the Voronoi regions at 2, 3, and 4 dimensions exhibited less than 19, 34, and 79 neighbors, respectively. Therefore, the number of nodes in the search space enlarged by the ripple strategy is greatly increased by , where and are the numbers of neighboring Voronoi regions and ripples. This increase in computational time is unsuitable for various applications.

To overcome this disadvantage, two approaches were proposed and examined: enlarging the search space based on the optimal neighboring Voronoi regions and retaining parts of the neighboring Voronoi regions to search for the optimal neighboring region. At 4 dimensions, most Voronoi regions had less than 79 neighbors; thus, the neighboring Voronoi regions were divided into two groups. Because the full-search strategy is inappropriate for most applications, HVQ, which enlarges the search space by using an optimal neighboring Voronoi region, was investigated. Retaining half of the neighboring Voronoi regions to search for the optimal neighboring Voronoi region (denoted as HVQ_GS) was then adopted to compare the performance. The maximal number of ripples was set at 3 because this yielded acceptable results.

Figure 5 shows the experimental results of the HVQ, expressed as the distortion and numbers of traversed Voronoi regions. In Figure 5(a), the distortion reduction rates are approximately 70% for the optimal codewords searched using one-ripple Voronoi regions in the Voronoi diagram structure. This indicated that the relations between codewords were correctly modeled using the Voronoi diagram structure. Furthermore, the reduction in distortion when the number of dimensions was increased was greater than it was when the number of ripples was increased. Therefore, the number of dimensions was determined to be the primary factor involved in reducing VQ distortion.

Figure 5(b) shows the computed number of traversed Voronoi regions. Enlarging the search space by one ripple yielded average numbers of traversed Voronoi regions of 6.31 and 37.32 at 2 and 4 dimensions, respectively. Enlarging the search space by three ripples yielded average numbers of traversed Voronoi regions of 13.93 and 77.72 at 2 and 4 dimensions, respectively. The number of traversed Voronoi regions was increased by increasing the number of dimensions; however, this increases the computational complexity.

To reduce the computational complexity without considerably degrading accuracy, HVQ_GS was proposed. Figure 6 presents the experimental results. Comparing Figures 6(a) and 5(a) indicates that the accuracy slightly degraded. Comparing Figures 6(b) and 5(b) demonstrates that the proposed approach halved the number of traversed Voronoi regions. Thus, HVQ_GS was determined to be an appropriate solution for reducing computational complexity.

3.4. Comparisons with Other Approaches

Finally, other existing VQ algorithms were compared with the proposed approach. Chang and Wu [24] showed that PVDS outperforms other approaches such as DTPC [20], SCS [21], TSVQ [31], and HOSM [32]; therefore, PVDS was chosen as the baseline approach. Because the proposed approach involved applying a binary search algorithm based on 2D projection and Voronoi vertices to find an initial codeword, the number of dimensions for Voronoi diagram must be less than 3. Thus, HVQ and 2 dimensions, denoted as HVQ(2), were compared with PVDS. Three search strategies, namely, full search, tracking an optimal neighboring Voronoi region based on all neighboring Voronoi regions, and tracking an optimal neighboring Voronoi region based on half the neighboring Voronoi regions, were adopted and denoted as HVQ(4F), HVQ(4), and HVQ_GS(4), respectively.

Figure 7 shows the experimental results. HVQ(2) and PVDS performed similarly at one ripple. As the number of ripples increased, PVDS achieved a more satisfactory performance than did HVQ(2). HVQ(2) only selects optimal neighboring Voronoi regions to determine the optimal codeword; this substantially reduces the search space, easily attaining a local minimum solution. In low-dimensional space, searching all neighboring Voronoi regions could avert this situation. However, searching all neighboring Voronoi regions is unfeasible in high-dimensional space because of the large number of neighboring regions.

Reduction of the search space of a Voronoi diagram structure with high dimensions was investigated. At one ripple, the distortions of HVQ(4F), HVQ(4), and HVQ_GS(4) were smaller than were those of PVDS and HVQ(2), nearly reaching the optimal performance of PVDS and HVQ(2). Moreover, HVQ increased the search depth to achieve optimal performance more efficiently compared with the full-search approach. Thus, a Voronoi diagram structure that uses high dimensions can precisely enlarge the search space, thereby enhancing performance. Furthermore, a tree structure that exhibits high dimensions can be used to precisely locate an initial codeword. Increasing the number of dimensions improves the performance of both the tree and Voronoi diagram structures. Therefore, HVQ that uses one ripple was a suitable approach to reduce the search space of a Voronoi diagram structure with high dimensions.

To evaluate its performance, the proposed approach was compared with that of tree-structure-based VQ and projection-based VQ. Dynamic path tree-structured VQ [33] was chosen to represent tree-structure-based VQ and was denoted as DP_TSVQ. To compare this approach with enlarging the search space by using the Voronoi diagram structure, a tree structure whose leaf nodes were linked to -closest codewords was proposed and denoted as Tree_N. PVDS was selected to represent projection-based VQ. To objectively compare these approaches, the number of traversed Voronoi regions or -nearest codewords was set at approximately 50. Table 2 lists the experimental results. The proposed approach outperformed the other approaches, and increasing the number of dimensions by using the tree structure and the Voronoi diagram enhanced performance. HVQ can balance computational complexity and accuracy when a suitable number of dimensions is selected.

4. Conclusion

In this study, an HVQ combining a tree structure and a Voronoi diagram is proposed to improve VQ efficiency. A tree structure integrated with PCA was successfully applied to reduce the search space in low-dimensional space, allowing an initial codeword to be efficiently located. To improve accuracy, a Voronoi diagram was proposed to precisely model the relations among each codeword, enlarging the search space of the initial codeword. The search space in high-dimensional space was greatly reduced by retaining parts of neighboring Voronoi regions to search for an optimal codeword. According to the experimental results, the proposed approach improved VQ performance and outperformed other approaches. Moreover, it satisfied the computational complexity, storage, and precision requirements when suitable numbers of dimensions were selected. Therefore, the proposed approach is suitable for implementation in handheld devices with limited memory, computational capability, and network bandwidth.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgment

The authors would like to thank the National Science Council of the Republic of China, Taiwan, for financially supporting this research under Contract NSC 102-2221-E-218-001.