Abstract

Effective and efficient image comparison plays a vital role in content-based image retrieval (CBIR). The earth mover’s distance (EMD) is an enticing measure for image comparison, offering intuitive geometric interpretation and modelling the human perceptions of similarity. Unfortunately, computing EMD, using the simplex method, has cubic complexity. FastEMD, based on min-cost flow, reduces the complexity to (O( )). Although both methods can obtain the optimal result, the high complexity prevents the application of EMD on large-scale image datasets. Thresholding the ground distance can make EMD faster and more robust, since it can decrease the impact of noise and reduce the range of transportation. In this paper, we present a new image distance metric, , which applies a threshold to the ground distance. To compute , the FastEMD approach can be employed. We also propose a novel linear approximation algorithm. Our algorithm achieves complexity with the benefit of qualified bins. Experimental results show that (1) our method is 2 to 3 orders of magnitude faster than EMD (computed by FastEMD) and 2 orders of magnitude faster than FastEMD and (2) the precision of our approximation algorithm is no less than the precision of FastEMD.

1. Introduction

The development of advanced multimedia technology increases the importance of image retrieval, since large-scale image datasets are proliferating. Effective and efficient methods for computing the similarity between images is vitally important to content-based image retrieval (CBIR) systems.

The earth mover’s distance (EMD) is a famous measure for image retrieval [1]. It not only takes the corresponding bins into account, but also considers the correlations between noncorresponding bins and thus is robust to histogram shift and rotation. The effectiveness of EMD and/or its variants is witnessed in many application domains, such as image retrieval [14], common pattern discovery [5, 6], mathematical symbol retrieval [7], contour matching [8], texture retrieval [1, 9], shape matching [10], graph matching [11], visual event recognition [12], face verification [13, 14], visual tracker [15], and uncertain or probability data [16, 17].

The EMD defines the comparison of two histograms as the transportation problem. (In [1], EMD is used to compute the distance between two signatures. For the convenience of statement, we use histogram in this paper. In fact, our algorithm can be used to both histogram and signature.) The minimum cost to transport one histogram to another one is taken as the distance of the two histograms. Rubner et al. [1] employ the transportation simplex algorithm [18] to compute the EMD. Thus the computation of EMD is very expensive, with cubic time complexity (). Pele and Werman [19] present the FastEMD algorithm which is faster than the simplex algorithm. FastEMD is based on the min-cost flow algorithm, thus it still suffers from quadratic complexity (). Ling and Okada [10] propose EMD- which employs -norm as the ground distance. EMD- can reduce the number of variables in the transportation simplex algorithm from to , so the time complexity can be reduced to . But only using -norm as ground distance limits the application of EMD-. Rahimi and Kiram [20] propose a greedy algorithm with complexity of . Shirdhonkar and Jacobs [21] present a linear time approximate EMD algorithm, Wavelet EMD, based on a weighted wavelet transform. To approximate EMD, wavelet EMD uses the weighted wavelet coefficients of the difference histogram. Another linear time approximate EMD algorithm proposed by Jang et al. [22] is MHC which is used for dominant color descriptor (DCD). This algorithm employs Hilbert curves to fill the multidimensional color space and computes cost according to the Hilbert order of dominant color. To improve accuracy, MHC uses Hilbert curves to fill the -dimensional space, then computes cost values, and takes the minimum cost as the distance between two images.

Furthermore, EMD is not a metric if the two histograms have different total weight. To solve this drawback, Ljosa et al. [4] introduce a special bin, called the bank, for each image. This extra bin allows the two histograms to exchange mass. The bank has the same ground distance to all other bins. Pele and Werman [3] propose a variant of EMD, named , which takes the difference of two weights as an extra mass. The moving distance of the extra mass is times maximum ground distance. If the ground distance is a metric and , is a metric for all histograms.

The EMD with thresholded ground distance has proven to be more robust, since it can reduce the interference of noise and correspond to the way humans perceive distance [2, 3, 19]. With the help of thresholded ground distance, FastEMD can reduce the number of edges of flow network from to [19] and thus makes an improvement in both accuracy and speed.

In this paper, we present a new image distance metric, , based on thresholded ground distance. We prove that is a metric by deducing that equals with the same ground distance threshold. Therefore, we can employ FastEMD [19] to compute the value of . However, its quadratic complexity makes FastEMD a nonideal choice. According to the definition of , we derive a novel linear approximation algorithm which can compute very fast. In experiments we find that our algorithm is 2 to 3 orders of magnitude faster than EMD (computed by FastEMD) and 2 orders of magnitude faster than FastEMD. Moreover, the results of our approximation algorithm are no worse than the exact results.

In the remainder of the paper, we first review the earth mover’s distance and in Section 2. We then formalize the in Section 3. Section 4 proposes the details of our novel linear approximation algorithm. In Section 5, we experimentally evaluate the practical performance of our algorithm. We conclude this paper in Section 6.

2. Earth Mover’s Distance

The earth mover’s distance is a perceptual and intuitive measure between two histograms. EMD which defines the distance between two histograms as the minimum cost that must be paid to transform one histogram to another one is based on the famous transportation problem. We consider one histogram as a supplier with a given amount of products distributed at some places and another histogram as a demander that has some warehouses with a given limited capacity. To transport products from the supplier to the demander, the optimal solution is required to minimize the transportation cost. The following is the formal definition of EMD.

Definition 1 (earth mover’s distance). Given two histograms and with sizes and respectively, where is the weight of bin and is the weight of bin . Let be the distance between and and let represent the amount of weight transferred from to . The total cost of matching and is defined as with the following constraints:
The earth mover’s distance is defined as the minimum total cost for solving this problem; that is,

The EMD is a many-to-many matching method which can handle variable-size structures. If the ground distance is a metric and the total weights of two histograms are equal, that is, , EMD is a metric [1]. However, in many applications, the total weights of two histograms cannot be equal. To solve this drawback, Pele and Werman [3] present a new EMD variant, , which is defined as subject to the same constraints as the EMD.

If the two histograms are probability distributions (i.e., total weight is one) EMD and are equivalent. Pele and Werman proved that if and the ground distance is a metric, is a metric. The proof can be found in [3].

is more appropriate for two cases. One case is when the total weight of histograms is important. And the other one is when difference of total weight between histograms is distinguished [3].

3. New Distance Metric: EMD+

Thresholded distances are distances that are limited by a threshold. Let be the distance between and and let be a threshold; the thresholded distance between and is defined as .

As mentioned in [2, 3, 19], thresholding the ground distance can improve the performance of EMD family. Motivated by this thought, we design a new variant of EMD with thresholded ground distance. We call this new measure . If the ground distance is a metric, the is a metric. Firstly, we formally define in Definition 2.

Definition 2 (distance measure: ). Given two histograms and with sizes and , respectively, where is the weight of bin and is the weight of bin . Let be the ground distance between and , let be the amount of weight transferred from to , and let be a threshold of ground distance. If , moves as much mass as possible from to ; otherwise, does not move any mass from to . Support is the maximum amount that can be transported from to (). The distance between and is defined as s.t

We denote the total weight of and as and respectively. Without loss of generality, we assume . The condition in Definition 2 limits the range of bins that can be matched by . Only the bins whose distances to are smaller than the threshold have the chance to receive mass from . The optimal solution is the minimum total cost of moving amount of mass from to . After that, we ignore the limitation of capacity of and transport all the remainder weight of to with cost of .

Figure 1 shows two examples of . From Figure 1, we obtain , , and . Set -norm to be the ground distance and to be the threshold. We get If we calculate the EMD values, we get That is to say, , while . Actually, is more similar to than . Thus, has more discernibility.

In fact, the is equivalent to the with a thresholded ground distance and [19]. Because the is a metric, we can deduce that is a metric.

Theorem 3. If the ground distance is a metric, then EMD+ is a metric.

Proof. A thresholded ground distance was introduced in [19]. is the ground distance between and , and is a threshold. The thresholded ground distance is defined as . If is a metric, is a metric; a proof can be found in [19]. Without loss of generality, we assume . Set ; we rewrite as
Due to the fact that is a metric with , thus is a metric.

4. Fast Computation of EMD+

Intrinsically, has the same optimization problem as EMD and . Hence, can be computed by either transportation simplex algorithm [18] (e.g., EMD [1]) or min-cost flow algorithm (e.g., FastEMD [19]). However, the high complexity of simplex and min-cost flow algorithms drives us to develope an optimal method for efficient image retrieval. In this section, we proceed to develop a more efficient algorithm to obtain the value of .

In the definition of (Definition 2), there is a very important condition, , which restricts that each bin in one histogram can only match a small part of bins in another histogram (Figure 1). Taking advantage of this characteristic, we derive a linear approximate algorithm to compute . We will explain our basic idea by an example (Figure 2).

In Figure 2, it is clear that only the bins that fall into the solid or dash circle have the qualification of receiving mass from or , respectively. We call these bins as qualified bins. If is one qualified bin of , it is easy to get the qualified bin of at the corresponding position by paralleling . Inspired by this heuristic, we present to precompute all the possible qualified bins of a reference bin (e.g., ) and then utilize the possible qualified bins to improve the computation of .

To precompute the qualified bins, we propose a novel method which is composed of function (Algorithm 1) and (Algorithm 2) (in Algorithm 2, function getBins() uses -norm as the ground distance, in other cases it is distance dependent). first calls the function to enumerate all possible qualified bins of the reference bin (e.g., ) (line 3) and then sorts the bins in ascending order according the distances to the reference bin (line 4). Let be the th qualified bin to reference bin. For each bin , we can get its th possible qualified bin by paralleling . Notice that the reference bin itself does not appear in qualified bin set; the reason will be shown later.

   Input: : threshold; : dimensionality of histogram
   Output: Qualified bin set
(1)   ;   qualified bin set
(2) ;   reference bin, if ,
(3) getBins( );   Algorithm  2
(4) Sort( );

   Input: : one qualified bin; : qualified bin set;
    : threshold; : dimensionality;
    : current dimension.
   Output: Qualified bin set
(1)  for   to   do
(2)      ;
(3)    if     then
(4)      ;
(5)    if     then
(6)        ;
(7)        getBins( );
(8)      , ;
(9)     if     then
(10)     ;
(11)      getBins( );
(12)  else
(13)      if     then
(14)       ;
(15)      else
(16)       ;
(17)      if     then
(18)       ;
(19)      getBins( );

Let be the size of qualified bin set , ( is the dimensionality of histogram). Thus the getBins() (Algorithm 2) can be done in time and Sort() (line 4) can be done in time. In fact, is very small corresponding to the size of histogram and can be treated as a constant. Thus the time of Algorithm 1 can be ignored (in our experiments (Section 5), the time of precomputing qualified bins is no more than 0.001 second).

Next thing is to compute the value of , that is, the core of our idea. The details are depicted in the following.(1)The first step is to saturate all the zero-cost flows, since matching corresponding bins does not cause cost. After this step, many bins will be empty (Figure 2).(2)The second step focuses on cross-bin matching. We plan to transport as much mass as possible when the cost is as low as possible. We employ the qualified bin set generated by Algorithm 1 to accelerate the speed of our algorithm. We traverse from front to end. In the th iteration, we match each nonempty bin to its corresponding th qualified bin. If the th qualified bin of exists, we transfer the highest possible mass; otherwise, we do nothing. Since corresponding bins are already handled in the first step, we have no necessity to store the reference bin in the qualified bin set .

Algorithm 3 gives the pseudocode of our algorithm. The proposed algorithm is named as LinearEMD, because its complexity is linear. The very important parameter stores the qualified bins of original point (e.g., for 2-dimensional histogram). Notice that only needs to be computed once.

     Input: , : histogram; : threshold; : qualified bins
      of original point.
     Output: Distance between and
(1)       ;
(2)       ;
 //  Part 1, corresponding bin
(3)    foreach     do
(4)       if  corresponding exists  then
(5)        ; // : weight of bin
(6)        ;
(7)        ;
(8)        ;
(9)      if     then
(10)         remove from ;
(11)     if     then
(12)         remove from ;
 //  Part 2, cross bin
(13)  for   to   do
(14) foreach     do
(15)   if   th qualified bin exists  then
(16)        ;
(17)        ;
(18)        ;
(19)        , ;
(20)        if     then
(21)            remove from ;
(22)        if     then
(23)            remove from ;
 //  Part 3, compute the
(24) ;

LinearEMD is composed of 3 parts. Part 1 focuses on the corresponding bins (lines 3~12). Since corresponding bin has no cost (), we should move as much mass as possible from to . Then we delete the empty bins (lines 9~12). Once part 1 is done, and will shrink dramatically. The time complexity of part 1 is where . Part 2 adopts greedy idea to match cross bins (lines 13~23). The outer loop (line 13) guarantees that nearest bins get preferential treatment and each bin in is shifted to the same position. Ideally, many bins in or will be empty after several iterations. The speed will be faster and faster. Assume the average of bin numbers in each iteration is () and the size of is . is a constant. The complexity of part 2 is . Part 3 which gets the result of by a simple calculation has an expensive (line 31). Therefore, the total cost is ().

5. Experimental Evaluation

This section aims to experimentally test and verify the following things.(1)LinearEMD is efficient enough for massive image retrieval.(2)Thresholding the ground distance can get a more effective result.(3)The result set obtained by LinearEMD is very close the exact value.

To do that, we use image retrieval as an example to perform the experiments. Section 5.1 gives the setup of our experiments and Section 5.2 describes some evaluation criteria. The experimental results are shown in Section 5.3.

5.1. Experimental Setup

Wang’s dataset [23] is a benchmark for image retrieval. This dataset contains 1,000 color images in 10 classes. Each class has 100 images. Since there are some ambiguous images in this dataset, we use a subset of Wang’s dataset filtered by [19] (we downloaded the filtered image list from http://www.seas.upenn.edu/ofirpele/FastEMD/). This subset is composed of 773 images. We exact the first 5 images from each classe as the queries. We perform our experiments on 5 different size RGB color histograms, that is, , , , , and , respectively. We use RGB color histogram because it is simple and has the ability to estimate the experimental goals. We do not claim that RGB color histogram is the best descriptor for image retrieval.

We compare with (in this section, when we say , we mean with thresholded ground distance), EMD-, and EMD. To compute , we use LinearEMD algorithm (we implement this algorithm in C++). For EMD-, we use the authors implementation [10] (we downloaded the code from http://www.dabi.temple.edu/hbling/code_data.htm/). For , we use FastEMD [19] (we downloaded the code from http://www.seas.upenn.edu/ofirpele/FastEMD/code/). For EMD, we also use FastEMD [19], since simplex is very expensive. The ground distance of EMD- is, of course, -norm. For others, we choose both - and -norm. For and , the ground distance thresholds are 3.0, 3.5, 4.0, 4.5, and 5.0 corresponding to the size of RGB color histogram, respectively. We set the coefficient of to be equal to 1. Since equals , the value of (computed by LinearEMD) is approximate value and the value of (computed by FastEMD) is the exact one.

The experiments are run on a computer with one Intel Xeon E5645 2.40 GHz CPU, 8 GB memory. The OS is CentOS release 6.3 and gcc version is 4.4.6.

5.2. Evaluation Criteria

This section introduces some criteria to fairly evaluate the performance of these distance measures and algorithms.

To evaluate the speed, we compare the average runtime of each query. The effectiveness is measured by the average number of correct retrieved images. We define the mean overlap rate (MOR) to calculate the common images retrieved by LinearEMD and FastEMD. MOR is the average Jaccard coefficient: where is the number of query, is the result set of the th query retrieved by algorithm , and is the result set of the th query retrieved by algorithm .

To measure how close the values computed by two methods are, we employ the symmetric mean absolute percentage error (SMAPE) [24]. SMAPE is defined as follows: where is the number of common retrieved images and and are the values of the th image computed by algorithms and .

5.3. Results

Firstly, we compare the speed of each method. Figure 3 shows the average running time of each query. Obviously, our algorithm exceeds other methods in both ground distances. Our algorithm is faster by two to three orders of magnitude faster than EMD, two orders of magnitude faster than and an order of magnitude faster than EMD-. The larger the size of RGB color histogram, the better our algorithm. When the size of RGB color histogram is , the LinearEMD is 300 to 500 times faster than the FastEMD and nearly 3000 to 4500 times faster than the EMD. Therefore, the LinearEMD is competent enough to handle massive image retrieval task.

The second thing of this section is to compare the retrieved images of each method. Figure 4 shows the average number of correct retrieved images of each query. It is obvious that thresholding the ground distance of EMD (i.e., and ) can obtain better retrieval result. That is, and are more robust and have more discernibility. and with norm as ground distance can achieve better results than -norm as ground distance. However, for EMD, as ground distance retrieves more images. Generally, () (() denotes with -norm as the ground distance; () and EMD() have similar meanings) gets the best results and () is better than (). In addition, Figure 4 illustrates the precision of and will be improved while the size of histogram is increased. Although our method is the approximate algorithm, the precision of our method is no less, even higher, than the exact algorithm (FastEMD). Figure 5 shows an example of retrieved images. The images retrieved by (), (), and () are all correct, yet the last image retrieved by () is incorrect. However, there are only 2 correct images in the results retrieved by EMD-, EMD(), and EMD(), respectively. This result shows the advantage of thresholding the ground distance once again.

The third thing is to count the common images retrieved by our algorithm and the FastEMD. To this end, we employ the mean overlap rate (MOR). Figure 6(a) shows the MOR of the two 50 nearest neighbors obtained by our algorithm and the FastEMD (we only count the correct retrieved images). The MOR()s (MOR() denotes the MOR value of LinearEMD and FastEMD with -norm as the ground distance; SAMPE() has the same meaning.) are between 95% and 98% and the MOR()s are between 93% and 95%. Although the MOR()s are a little bit lower than the MOR()s, they are still very high. Figure 6(a) indicates that the results between our algorithm and the FastEMD share many common images.

Finally, we compute the SMAPE of our algorithm and the FastEMD. Like the MOR, we use the two 50 nearest neighbors and only count the correct retrieved images. The SMAPE results are shown in Figure 6(b). Obviously, the SMAPE()s are lower than the SMAPE()s. In spite of this, the maximum value is lower than . Figure 6(b) illustrates that the values computed by our approximate algorithm highly approach the values computed by the FastEMD.

In one word, our approximate algorithm is an optimal choice for comparing images.

6. Conclusion

In this paper, we have presented a new variant of EMD with thresholded ground distance, named as . We proved that is a metric for any kind of histogram. To compute the value of , we have proposed a linear approximate algorithm. This algorithm is an effective and efficient method for image retrieval. Experimental results show that our method break through the comparing methods. And thus, our method is competent enough for massive image retrieval task. Our method may also be able to compare other histograms (e.g., HSV color histogram) and descriptors (e.g., SIFT).

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.