#### Abstract

Mutual information (MI) has been widely used in multisensor image matching, but it may produce mismatches between images with cluttered backgrounds. Additional prior information can greatly improve the matching performance. In this paper, a robust Bayesian estimated mutual information, named BMI, is proposed for multisensor image matching. The method is implemented using gradient prior information: the prior is estimated by the kernel density estimate (KDE) method, and the likelihood is modeled according to the distance between orientations. To further improve the robustness, we restrict the matching to the regions where the corresponding pixels of the template image are salient enough. Experiments on several groups of multisensor images show that the proposed method outperforms the standard MI in robustness and accuracy and performs comparably to Pluim’s method, at a far lower computational cost.

#### 1. Introduction

The matching of multisensor images is an important preprocessing step for many applications, such as navigation, data fusion, and visualization tasks. Because different sensors have different physical characteristics, the relationship between the intensities of matching pixels is often complex. Visual features present in one sensor image may not appear in the other, and vice versa; image contrast may differ between the images in the same regions; and multiple intensities in one image may be mapped to a single intensity in the other, and vice versa. These uncorrelated intensities make ordinary intensity-based matching methods ineffective; therefore, multisensor image matching remains a challenging problem in computer vision.

Previous works on multisensor image matching can be generally categorized into two classes [1]. The first class seeks representations that are invariant among multisensor images, such as point features [2], edge features [3], contour features [4], and edge orientation maps [5]; the problem can then be solved with traditional monosensor matching methods. Nevertheless, feature extraction is regarded as one of the most difficult problems in image processing, especially in multisensor cases, since a feature present in one image may not appear in the other. Additionally, these methods usually discard most of the image details, which may lead to a wrong match.

The second class uses an invariant similarity measure between images. These methods operate directly in the intensity domain, and they have been successfully used in a variety of multisensor registrations. Within these methods, the images are treated as combinations of different sets. Each set stands for a separate gray level, under the assumption that a given component of an object has consistent intensity within one image, so that pixels of that component with similar intensities in one image correspond to pixels with similar intensities in the other image. In other words, there may be no pixel-to-pixel correspondence in intensity, but there do exist set-to-set correspondences between two images; namely, ignoring the intensity property, the topology of the corresponding sets must be similar between two images. This correspondence is well shown by the joint intensity scatter plot (JISP) or the joint histogram (JH) [6], and by measuring their dispersion one can quantify the matching degree. One of the most successful measures is the entropy. The lower the entropy is, the more tightly clustered the scatter plot will be and, hence, the closer the match will be. Mutual information (MI) [7, 8] and normalized mutual information (NMI) [9] are considered the state-of-the-art matching frameworks among the entropy-based methods.

Nevertheless, the robustness of the MI-based matching methods is questionable. Figure 1 shows a mismatched NMI result between an infrared (IR) and a visible (VS) image, in which the statistical relationship between the images is weakened by the cluttered background. In addition, a “grey stripe” registration example in [10] also demonstrates this pitfall of NMI.

Recently, many approaches have been proposed to improve the MI by incorporating additional spatial information. Studholme et al. [11] introduce a prior of labeled connected regions as an additional information channel to the MI. Guo and Lu [12] adopt the gradient vector flow (GVF) to enhance the original intensity images before using the MI, which improves the success rate. Kim et al. [13] propose a method based on a 3D joint histogram (JH) that integrates the intensity information and the edge orientation information, which makes the matching more robust. Lee et al. [14] extend [13] by using a modified GVF to enlarge the capture range of gradient-based MI. Wang and Tian [15] encode the images into 32 bins according to the gradient magnitude and orientation and then compute the MI directly on the encoded field, which improves the computational efficiency of the MI. Liu et al. [16] improve this method with an adaptive strategy that combines an intensity-based MI and a coded gradient-based MI. Fan et al. [17] propose a method that incorporates spatial information through a feature-based selection mechanism, such as the Harris corner, which increases the JH to 4D. Loeckx et al. [18] propose conditional mutual information (cMI) as a new similarity measure for image registration; cMI starts from a 3D joint histogram that incorporates, besides the two intensity dimensions, a spatial dimension expressing the location of the joint intensity pair. Pluim et al. [19] modify the MI-based method by multiplying the MI by a gradient term. The gradient term not only seeks to align locations of high gradient magnitude but also aims for a similar orientation of the gradients at these locations. This method is referred to as GNMI in this paper.

These improvements to the MI-based methods depend strongly on the invariant characteristics mentioned for the first class, for example, the gradient magnitude and orientation, as well as corner features. Since they take full advantage of both classes, we classify them as a third class and regard this additional information as prior information in this paper.

It should be pointed out that some methods that introduce the prior into MI are problematic. For example, in [15], the pixels are encoded according to their gradient magnitude and orientation, and the intensities in the standard MI are replaced with these gradient codes. It is known that the gradient orientations of corresponding pixels tend to be parallel in multisensor images, so they must be distributed along the diagonal of the JH at the true match. However, because the codes play the same role in the JH as the intensities in the standard MI, and MI measures only the dispersion of the JH, this method still gives a high correlation even when the compact bins are located far from the diagonal of the JH. Moreover, the gradient magnitudes used in the coding are not very reliable, as the magnitudes vary considerably between multisensor images [13, 14]. Better methods are presented in [13, 16, 19], which treat a measure of the gradient vectors between multisensor images as a weight on the MI. However, they are computationally expensive.

In this paper, a new similarity measure that incorporates prior information into MI by the Bayesian method is proposed. For simplicity, it is named Bayesian MI (BMI). BMI is derived by substituting the classical probabilities in MI with Bayesian probabilities, namely the prior, the likelihood, and the posterior. For example, the classical conditional probability can be regarded as a likelihood or a posterior probability in the Bayesian framework, through which subjective experience can be introduced into the estimation. To avoid computing the JH statistics, the BMI is defined directly on each pixel pair, and the similarity of two images is then measured by the mean BMI of the valid matching pixel pairs.

Then, an implementation of the BMI based on gradient prior information is proposed. Since the magnitude information between multisensor images is not very reliable, we consider only whether a magnitude is salient or not and treat the salient pixels as the saliency. Generally, the saliency of an image contributes more to matching [20, 21], so we restrict the matching pixel pairs to a mask where the pixels of the template image are salient.

In the implementation, the orientation of each pixel is defined first. Salient pixels take the absolute value of their orientation, while nonsalient ones are set to zero. Second, the prior distribution is estimated by the kernel density estimate (KDE) method, and the likelihood between orientations is modeled by a Gaussian function of their distance: the closer the orientations are, the more similar the pixel pair is. Finally, the sequential statistic BMI based on the gradient information is formulated.

At the end of this paper, experiments are conducted on six groups of images captured by different sensors. The results suggest that the proposed method is more robust and more accurate than the standard NMI and comparable to Pluim’s method [19], at a far lower computational cost.

The rest of this paper is organized as follows. Section 2 briefly introduces the standard MI and derives the proposed BMI from the Bayesian formula. Section 3 shows the application of the BMI based on the gradient prior information. Section 4 presents experimental results and discussion. Finally, conclusions are drawn in Section 5.

#### 2. Proposed Method

##### 2.1. Mutual Information

The MI method is of general significance because it makes few assumptions on image intensities. It assumes neither linear correlation nor even functional correlation but only statistical dependence [10].

Given two images, $A$ and $B$, one can define their joint probability (JP) $p(a,b)$ by simply normalizing the JH. Let $p(a)$ and $p(b)$ denote the corresponding marginal probabilities, which can be obtained directly from $p(a,b)$. The MI between $A$ and $B$ based on Shannon entropy is given by

$$I(A,B)=\sum_{a,b}p(a,b)\log\frac{p(a,b)}{p(a)\,p(b)}. \tag{1}$$

Apart from (1), MI can also be described in many other forms, as summarized by [22]:

$$I(A,B)=H(A)-H(A\mid B)=H(B)-H(B\mid A)=H(A)+H(B)-H(A,B), \tag{2}$$

where $H(A)$ and $H(B)$ are the Shannon entropies of images $A$ and $B$, which indicate the amount of uncertainty about $A$ and $B$, respectively. $H(A\mid B)$ and $H(B\mid A)$ denote the conditional entropies, indicating the uncertainty left in $A$ or $B$ when $B$ or $A$ is known. $H(A,B)$ is the joint entropy, which indicates the total uncertainty of $A$ and $B$. When $b$ is known, the conditional probability of $a$ is denoted $p(a\mid b)$; it can be easily deduced from the JP by classical probability theory.

In fact, $-\log p(a)$ can be regarded as the self-information of $a$, and the entropy $H(A)$ represents its mean. Similarly, the mutual information between $a$ and $b$ can be defined as

$$i(a,b)=\log\frac{p(a\mid b)}{p(a)}, \tag{3}$$

and $I(A,B)$ can be deemed the mathematical expectation of $i(a,b)$.
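As a minimal sketch of the definitions above, the following Python snippet computes the MI of two discrete sequences directly from their joint histogram and checks it against the entropy form. The function names and the toy data are illustrative assumptions, not part of the original method; real images would first be flattened and binned.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in nats) of a sequence of discrete values."""
    n = len(values)
    return -sum(c / n * math.log(c / n) for c in Counter(values).values())


def mutual_information(a, b):
    """MI (in nats) of two equally long sequences, from the joint histogram."""
    n = len(a)
    joint = Counter(zip(a, b))  # joint histogram (JH); dividing by n gives the JP
    hist_a, hist_b = Counter(a), Counter(b)  # marginal histograms
    mi = 0.0
    for (va, vb), count in joint.items():
        # p(a,b) / (p(a) p(b)) expressed with raw counts
        mi += count / n * math.log(count * n / (hist_a[va] * hist_b[vb]))
    return mi


# Hypothetical toy data: img_b is a many-to-one remapping of img_a, so the
# intensities are statistically dependent but not linearly related.
img_a = [0, 0, 1, 1, 2, 2, 3, 3]
img_b = [7, 7, 2, 2, 9, 9, 2, 2]

mi = mutual_information(img_a, img_b)
h_joint = entropy(list(zip(img_a, img_b)))
# The entropy form (marginal entropies minus joint entropy) should agree.
mi_from_entropies = entropy(img_a) + entropy(img_b) - h_joint
```

Because `img_b` is fully determined by `img_a` in this toy case, the MI equals the entropy of `img_b`, which illustrates that MI rewards statistical dependence rather than intensity similarity.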

##### 2.2. Bayesian Mutual Information

BMI is deduced directly from the MI by regarding the conditional probability in (2) as a posterior probability, which can be estimated according to the prior information based on the Bayesian method.

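Let $\pi(a)$ be the prior probability; then, according to the Bayesian formula, we have

$$\pi(a\mid b)=\frac{\pi(a)\,l(b\mid a)}{m(b)}, \tag{4}$$

where $l(b\mid a)$ and $m(b)$ denote the likelihood probability and the evidence probability, respectively. $m(b)$ can be estimated from the joint of the prior and the likelihood by the total probability formula, $m(b)=\sum_{a}\pi(a)\,l(b\mid a)$. Here, $\pi$ is used to distinguish the Bayesian probabilities from the classical probability $p$. Using (2) and (4), the BMI can be defined as

$$\mathrm{BMI}(A,B)=\sum_{a,b}p(a,b)\log\frac{\pi(a\mid b)}{\pi(a)}=\sum_{a,b}p(a,b)\log\frac{l(b\mid a)}{m(b)}. \tag{5}$$

Similar to the MI, (5) needs the JP statistics beforehand. For convenience, we recast (5) into a sequential statistic form. Note that $p(a,b)=\frac{1}{N}\sum_{x}\delta(a-a_x)\,\delta(b-b_x)$, where $N$ is the total number of pixels in the matching, $\delta$ is the Kronecker function, and $a_x$ and $b_x$ are the pixel values of $A$ and $B$ at $x$, respectively. Thus, (5) can be rewritten as

$$\mathrm{BMI}(A,B)=\frac{1}{N}\sum_{x}\mathrm{BMI}(a_x,b_x),\qquad \mathrm{BMI}(a_x,b_x)=\log\frac{l(b_x\mid a_x)}{m(b_x)}, \tag{6}$$

where $\mathrm{BMI}(a_x,b_x)$ denotes the BMI of the pixel pair $(a_x,b_x)$. Obviously, a higher BMI results from a higher likelihood and a lower evidence. Equation (6) is applicable regardless of whether $A$ and $B$ are discrete or continuous, while (5) holds for the discrete form only.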
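The sequential form can be sketched in a few lines of Python. The sketch assumes that the per-pixel score is the logarithm of the likelihood divided by the evidence (consistent with the remark that a higher likelihood and a lower evidence raise the BMI) and that the evidence follows from the total probability formula. The two-code prior and likelihood tables below are hypothetical values chosen only for illustration.

```python
import math


def evidence(prior, likelihood, b):
    """Total-probability evidence for value b: sum over a of prior(a) * l(b | a)."""
    return sum(p_a * likelihood[a][b] for a, p_a in prior.items())


def mean_bmi(pairs, prior, likelihood):
    """Mean per-pixel BMI over (a, b) pixel pairs; no joint histogram is needed."""
    total = 0.0
    for a, b in pairs:
        total += math.log(likelihood[a][b] / evidence(prior, likelihood, b))
    return total / len(pairs)


# Hypothetical two-valued example: the likelihood favours equal values.
prior = {0: 0.5, 1: 0.5}
likelihood = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.1, 1: 0.9}}

aligned = [(0, 0), (1, 1), (0, 0), (1, 1)]     # pairs at a correct match
misaligned = [(0, 1), (1, 0), (0, 0), (1, 1)]  # half of the pairs disagree

score_aligned = mean_bmi(aligned, prior, likelihood)
score_misaligned = mean_bmi(misaligned, prior, likelihood)
```

In this toy setting the aligned pairs score higher than the misaligned ones, and the misaligned mean is negative, which anticipates the need to handle negative per-pixel values.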

BMI can be comprehended favourably using the communication theory when regarding , and as the message source, the channel, and the sink; respectively, BMI is just the evaluation of the correct rate after transmission [23].

It should be noticed that , , , and their logarithmic values can be calculated offline; moreover, as , the cost of our computation can be remarkably saved by utilizing the lookup-table (LUT) technique.
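The lookup-table idea can be illustrated as follows. This is a sketch under assumptions: pixel values are taken to be a small set of discrete codes, the table stores the logarithm of likelihood over evidence for every code pair, and entries with zero likelihood are stored as 0. The names `build_bmi_lut` and `lut_score` and the numeric tables are hypothetical.

```python
import math


def build_bmi_lut(prior, likelihood, codes):
    """Precompute the per-pair score log(l(b|a) / m(b)) for every code pair."""
    # Evidence per code b via the total probability formula.
    m = {b: sum(prior[a] * likelihood[a][b] for a in codes) for b in codes}
    lut = {}
    for a in codes:
        for b in codes:
            l_ab = likelihood[a][b]
            # Assumption: pairs with zero likelihood (or zero evidence) score 0.
            lut[(a, b)] = math.log(l_ab / m[b]) if l_ab > 0.0 and m[b] > 0.0 else 0.0
    return lut


def lut_score(pairs, lut):
    """Mean score of (a, b) pairs using only table lookups and additions."""
    return sum(lut[pair] for pair in pairs) / len(pairs)


# Hypothetical tables; built once, offline, before any search over positions.
codes = [0, 1]
prior = {0: 0.25, 1: 0.75}
likelihood = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.25, 1: 0.75}}
lut = build_bmi_lut(prior, likelihood, codes)

score = lut_score([(0, 0), (1, 1), (1, 1), (1, 0)], lut)
```

With the table built once, evaluating a candidate position costs one lookup and one addition per pixel pair, with no logarithm or histogram update inside the search loop.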

#### 3. Implementation

In this section, we implement the proposed BMI based on the prior information of the gradient. In fact, BMI is applicable whenever the prior and the likelihood can be modeled; candidates include the intensity and local statistics in monosensor matching, and the gradient orientation, gradient magnitude, and edges (but not the intensity) in multisensor matching.

As mentioned in the introduction, the saliency carries most of the information [24]; thus, it contributes more to matching. In [21], Qin et al. extract a salient vector for each pixel using principal axis analysis (PAA); the more similar the corresponding vectors are, the higher the weight of the pixel pair in the JP. In [20], Kim et al. extract salient regions by expanding the edges; the MI is then calculated on those regions in which both images are salient. This method enhances the statistical correlation between the IR and VS images.

In this study, we simply define the saliency as those pixels whose gradient magnitude is higher than a threshold, which is set relative to the mean gradient magnitude of the full image. Unlike Kim’s scheme, the proposed BMI is computed only on those pixel pairs in which the template image is salient, regardless of the reference image.
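A minimal sketch of such a saliency mask is given below. It assumes central-difference gradients (one-sided at the borders), an image stored as a list of rows with at least two rows and two columns, and a threshold equal to a fraction `a` of the mean gradient magnitude. The function names and the default `a=1.0` are illustrative assumptions, and the Gaussian pre-filtering is omitted.

```python
import math


def gradient(img):
    """Per-pixel (gx, gy) by central differences, one-sided at the borders."""
    h, w = len(img), len(img[0])
    gx = [[0.0] * w for _ in range(h)]
    gy = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            x0, x1 = max(x - 1, 0), min(x + 1, w - 1)
            y0, y1 = max(y - 1, 0), min(y + 1, h - 1)
            gx[y][x] = (img[y][x1] - img[y][x0]) / (x1 - x0)
            gy[y][x] = (img[y1][x] - img[y0][x]) / (y1 - y0)
    return gx, gy


def saliency_mask(img, a=1.0):
    """True where the gradient magnitude exceeds a * (mean gradient magnitude)."""
    gx, gy = gradient(img)
    mag = [[math.hypot(u, v) for u, v in zip(row_x, row_y)]
           for row_x, row_y in zip(gx, gy)]
    mean_mag = sum(map(sum, mag)) / (len(img) * len(img[0]))
    threshold = a * mean_mag
    return [[m > threshold for m in row] for row in mag]


# Hypothetical 4 x 6 image with a vertical step edge between columns 2 and 3.
image = [[0, 0, 0, 10, 10, 10] for _ in range(4)]
mask = saliency_mask(image)
```

In the step-edge example only the two columns adjacent to the edge are marked salient, so flat regions on either side would not take part in the matching.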

Let and denote the reference image and the template image, respectively, and denotes the subimage cropped from the top-left corner of , which has the same size as . Both of the two images are filtered by a Gaussian window to suppress the noise.

The realization of the BMI includes the gradient orientation mapping, the prior probability estimation, and the likelihood probability modeling.

It should be noted that the following implementation assumes that there is only a translation between the two images, and the same holds for the experiments. This assumption does not prevent the experiments from demonstrating the effect of our method.

##### 3.1. Gradient Orientation Mapping

Let and be the gradient orientation maps of and , and is the coordinate of pixel. Taking for an example, it is defined as where is a function which returns the phase of vector , is its magnitude, and is the sign function, which reverses the vectors in the 3rd and 4th quadrant and limits the to . is the threshold which separates the gradient of salient structures from noises, and where a is an empirical percentage of the mean magnitude, is the total pixel number of . Practically, . Here, the fact that gradient orientations are parallel to each other between multisensor images is utilized. In other words, their orientations are either the same or opposite. When the gradient magnitude is small, is set to zero, and the pixels with horizontal orientation are defined as .

Let be the resolution angle; then we can get the discrete form of (7)

In this paper, the pixels whose code is zero are named zero-code pixels, those whose code is nonzero are named nonzero-code pixels, and the nonzero-code pixels are defined as the saliency. The saliency of the template image forms the mask in which the matching is conducted, and we denote the set of the corresponding coordinates as .
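The mapping from a gradient vector to an orientation code can be sketched as follows. Several details are assumptions made for illustration rather than the exact definitions of (7) and (8): opposite gradients are folded together by taking the phase modulo pi, a horizontal gradient is assigned pi so that zero stays reserved for nonsalient pixels, and the folded angle is quantized into `k` codes with a ceiling rule. The names `orientation_code`, `threshold`, and `k` are hypothetical.

```python
import math


def orientation_code(gx, gy, threshold, k=8):
    """Return 0 for a nonsalient pixel, else an orientation code in 1..k."""
    if math.hypot(gx, gy) <= threshold:
        return 0  # zero-code pixel: gradient too weak to be trusted
    # Fold the phase into [0, pi): a vector and its opposite share one value.
    theta = math.atan2(gy, gx) % math.pi
    if theta == 0.0:
        theta = math.pi  # horizontal orientation, kept distinct from code 0
    delta = math.pi / k  # resolution angle (assumed uniform)
    # Ceiling quantization, clamped to guard against floating-point overshoot.
    return min(k, max(1, math.ceil(theta / delta)))


weak = orientation_code(0.1, 0.1, threshold=0.5)
right = orientation_code(1.0, 0.0, threshold=0.5)
left = orientation_code(-1.0, 0.0, threshold=0.5)
slanted = orientation_code(2.0, 1.0, threshold=0.5)
slanted_opposite = orientation_code(-2.0, -1.0, threshold=0.5)
```

Here `right` and `left` receive the same code, as do `slanted` and `slanted_opposite`, which reflects the observation that gradients in multisensor images are either parallel or antiparallel.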

##### 3.2. Prior Probability Density Estimation

Normally, the prior probability can be defined by the histogram of . However, when the template image is small (the small sample size problem), the histogram cannot characterize the probability precisely, because classical probability theory requires as many samples as possible. The kernel density estimate (KDE), or Parzen window method [25, 26], is a good solution to this problem and has been widely used for nonparametric distribution estimation [27]. Therefore, the probability density of can be written as where is the kernel function and corresponds to the bandwidth. The larger is, the smoother will be. Constant makes .

Actually, the KDE method distributes each to its neighbors by a bounded with and makes more accurate and more robust than the discrete one. Frequently used kernel functions include Uniform, Triangle, Epanechnikov, and Normal [28]. In this study, the Normal (Gaussian) kernel is employed.
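To illustrate why KDE helps with small samples, the sketch below estimates a prior over discrete orientation codes with a Gaussian kernel and normalizes it to sum to one. It is a simplified assumption-laden version: distances are taken between code indices, the wrap-around of orientations is ignored, and `kde_prior` and the bandwidth `h` are illustrative names.

```python
import math


def kde_prior(samples, codes, h=1.0):
    """Gaussian-kernel density estimate evaluated on codes, normalised to sum 1."""
    raw = [sum(math.exp(-0.5 * ((c - s) / h) ** 2) for s in samples)
           for c in codes]
    z = sum(raw)  # normalising constant
    return {c: r / z for c, r in zip(codes, raw)}


# Hypothetical small sample of nonzero orientation codes from a template.
samples = [3, 3, 4, 7]
codes = range(1, 9)
prior = kde_prior(samples, codes, h=1.0)

# A plain histogram would assign zero probability to the unobserved code 1.
histogram_prob_of_1 = samples.count(1) / len(samples)
```

Unlike the histogram, the KDE prior is strictly positive for every code, so the logarithms used later stay finite even for orientations that happen to be absent from a small template.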

##### 3.3. Likelihood Probability Density Model

Likelihood is a subjective probability in our work. It is a connection between the prior and the evidence. In this section, we sort the valid pixel pairs under the mask into two classes: nonzero-code to nonzero-code matching and nonzero-code to zero-code matching, as shown in Figure 2. Their likelihoods will be modeled, respectively.

As for the first class, its likelihood can be modeled according to their distance between and . It is clear that the closer the orientation is, the higher the likelihood will be. Taking two random orientations, and , as an example, their likelihood probability density can be modeled as where is a hat-like function which is high in the middle but low on both sides, for example, the Gaussian function. defines its bandwidth. is the intersection angle between the two orientations, and , which is calculated by is a constant which ensures satisfy the boundary condition of .

Figure 3 shows an example of when . It indicates that the bigger the intersection angle is, the lower the likelihood will be, and that it approaches 0 as tends to . Combining (10) and (11), the evidence probability density can be obtained:

The continuous function, , is just a smoothed form of obtained by using the function , and the two overlap when , as shown in Figure 4.
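The nonzero-code to nonzero-code likelihood and the resulting evidence can be sketched as below. The sketch assumes discrete codes 1..k spanning a half circle, an intersection angle that wraps around (so it never exceeds a right angle), a Gaussian hat function with bandwidth `sigma`, and a normalization that makes each likelihood row sum to one. These choices and all function names are assumptions for illustration.

```python
import math


def intersection_angle(a, b, k):
    """Angle between two orientation codes, wrapped so it is at most pi / 2."""
    d = abs(a - b)
    return min(d, k - d) * math.pi / k


def likelihood_row(a, k, sigma):
    """Gaussian likelihood of every code b given code a, normalised to sum 1."""
    weights = {b: math.exp(-0.5 * (intersection_angle(a, b, k) / sigma) ** 2)
               for b in range(1, k + 1)}
    z = sum(weights.values())
    return {b: w / z for b, w in weights.items()}


def evidence(prior, k, sigma):
    """Evidence per code b: the prior smoothed by the likelihood kernel."""
    rows = {a: likelihood_row(a, k, sigma) for a in prior}
    return {b: sum(prior[a] * rows[a][b] for a in prior)
            for b in range(1, k + 1)}


k, sigma = 8, 0.4
row = likelihood_row(1, k, sigma)
# Hypothetical prior concentrated on a single orientation code.
ev = evidence({3: 1.0}, k, sigma)
```

Codes 2 and 8 are equally likely given code 1 because both lie one step away once the wrap-around is taken into account. With a prior concentrated on one code, the evidence is simply that code's likelihood row, which shows the smoothing role of the likelihood.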

With regard to the nonzero-code to zero-code matching, it can be treated as mismatching, and its likelihood can be defined as 0. Finally, we have

##### 3.4. Practical Issues

Now, the BMI of and can be derived by substituting , , , and in (6) with , , , and , respectively. First, some issues should be noted.

(1) BMI is defined as 0 if the likelihood equals 0, as in the second class.

(2) There exist some pixel pairs that satisfy , which results in a negative BMI. Sometimes the mean BMI may be negative too, especially at mismatched locations, which never occurs in the standard MI [22]. In this study, the pixel pairs with negative BMI are regarded as mismatched, and their BMIs are clipped to zero, so that they are consistent with the second class.
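The two rules above amount to a small clamping step on the per-pixel score, sketched here. The per-pixel form (logarithm of likelihood over evidence) is an assumption carried over from the sequential definition, and `pixel_bmi` and `mean_clamped_bmi` are hypothetical helper names.

```python
import math


def pixel_bmi(likelihood, evidence):
    """Per-pixel BMI with both practical rules applied."""
    if likelihood <= 0.0 or evidence <= 0.0:
        return 0.0  # rule (1): zero likelihood is treated as a mismatch
    # Rule (2): a negative score (likelihood below evidence) is clipped to zero.
    return max(0.0, math.log(likelihood / evidence))


def mean_clamped_bmi(values):
    """Mean of clamped scores over (likelihood, evidence) pairs under the mask."""
    values = list(values)
    return sum(pixel_bmi(l, m) for l, m in values) / len(values)


mismatch_zero = pixel_bmi(0.0, 0.2)      # nonzero-code to zero-code pair
mismatch_negative = pixel_bmi(0.1, 0.2)  # likelihood below evidence
good_match = pixel_bmi(0.4, 0.2)
mean_score = mean_clamped_bmi([(0.0, 0.2), (0.1, 0.2), (0.4, 0.2), (0.4, 0.2)])
```

After clamping, every pixel pair contributes a nonnegative amount, so the mean score cannot be negative and mismatched pairs simply add nothing.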

Finally, the BMI-based matching formula between and can be derived as

The true match of the template image can be obtained by maximizing the throughout the interval of the reference image .
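Under the translation-only assumption, the maximization can be carried out by an exhaustive search, sketched below. The images are assumed to be already converted to orientation-code maps (lists of rows), only template pixels with a nonzero code take part, and the per-pair score is passed in as a function. The exact-match score used in the example is a stand-in for the BMI lookup, not the measure itself.

```python
def best_translation(reference, template, score):
    """Slide template over reference; return (best mean score, (row, col))."""
    ref_h, ref_w = len(reference), len(reference[0])
    tpl_h, tpl_w = len(template), len(template[0])
    best = (float("-inf"), None)
    for r in range(ref_h - tpl_h + 1):
        for c in range(ref_w - tpl_w + 1):
            # Only salient (nonzero-code) template pixels form pixel pairs.
            pairs = [(template[y][x], reference[r + y][c + x])
                     for y in range(tpl_h) for x in range(tpl_w)
                     if template[y][x] != 0]
            if not pairs:
                continue
            mean_score = sum(score(a, b) for a, b in pairs) / len(pairs)
            if mean_score > best[0]:
                best = (mean_score, (r, c))
    return best


# Hypothetical code maps; the template content sits at row 1, column 2.
reference = [
    [0, 0, 0, 0, 0],
    [0, 0, 3, 5, 0],
    [0, 0, 7, 2, 0],
    [0, 0, 0, 0, 0],
]
template = [
    [3, 0],
    [7, 2],
]

# Stand-in score for illustration only: 1 for equal codes, 0 otherwise.
best_score, best_offset = best_translation(
    reference, template, lambda a, b: 1.0 if a == b else 0.0)
```

In practice the `score` argument would be a lookup into a precomputed table of per-pair BMI values, so the inner loop involves only lookups and additions.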

#### 4. Experimental Validation

To evaluate the performance of the proposed method, ten pairs of IR and VS images are first used in the test, as shown in Figures 5 and 6, respectively. The IR images are captured by a long-wave IR camera, whose sizes are , and they are taken as the template images; the VS images are captured by a CCD camera, whose sizes are , and they are taken as the reference images. All the images were captured on a sunny afternoon, with only a random translation between them. As the ground truth is unknown for these sets of real images, it is estimated by manual matching in Photoshop. We also test the four compared methods on two pairs of images cropped from [29], as shown in Figures 7 and 8.

The matching results obtained from the proposed method are compared with those of the NMI, GNMI [19], and GCMI [15] methods. These methods are chosen because they have been widely used in multisensor image matching and provide relatively good matching performance. In the NMI and GNMI methods, the densities are estimated by the JH with a fixed number of 64 × 64 bins for both images. The intensity values are linearly mapped into these bins. The parameters used in these methods are , , and . The matching results of our method are shown in Figure 10.

The comparison results of the NMI, GNMI, GCMI, and our method are shown in Figure 9. A global search strategy was applied to obtain a full view of the correlation curve. The figures indicate the following.

(1) The BMI outperforms the NMI significantly. First, the difference between the highest and the second-highest peak, which measures the interference-suppressing ability, is more salient in BMI than in NMI throughout all the tests. In particular, the NMI mismatches in the 8th, 9th, and 11th tests, while the BMI remains applicable. Second, the highest peak of BMI, whose sharpness measures the position accuracy, is much sharper than that of NMI. The matching results are shown in Table 1, from which we can see that all the methods have matching errors, but our method is more accurate.

(2) Because GNMI, GCMI, and BMI all adopt gradient information as a measure, the performance of our method is similar to that of GNMI and GCMI: they have similar matching accuracy and a salient highest peak. However, as discussed previously, our method is remarkably cheaper to compute when the LUT technique is used. In the MATLAB environment, the cost ratio of the four methods is about 1 : 2 : 3 : 4 (BMI : GCMI : NMI : GNMI), and the ratio can be further improved with other programming tools, such as the C and C++ languages.
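The peak-difference criterion mentioned above can be made concrete with a small helper. This is an illustrative sketch for a one-dimensional matching curve: the local-maximum rule, the name `peak_margin`, and the sample curves are assumptions, and the experiments in this paper use two-dimensional search surfaces.

```python
def peak_margin(curve):
    """Gap between the highest and second-highest local maxima of a 1D curve.

    Returns None when the curve has fewer than two local maxima.
    """
    peaks = []
    for i, value in enumerate(curve):
        rises = i == 0 or value > curve[i - 1]
        falls = i == len(curve) - 1 or value >= curve[i + 1]
        if rises and falls:
            peaks.append(value)
    peaks.sort(reverse=True)
    return peaks[0] - peaks[1] if len(peaks) >= 2 else None


# Hypothetical similarity curves over a 1D search range.
distinct_curve = [0.1, 0.3, 0.2, 0.9, 0.4, 0.5, 0.2]   # one dominant peak
ambiguous_curve = [0.1, 0.8, 0.2, 0.9, 0.4, 0.5, 0.2]  # strong competing peak

margin_distinct = peak_margin(distinct_curve)
margin_ambiguous = peak_margin(ambiguous_curve)
```

A larger margin means that the best match stands out more clearly from competing positions, which is the sense in which the criterion reflects interference suppression.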

Additional experiments have also been done to test the robustness with respect to the parameter settings, including , , and , which refer to (10), (11), and (14), respectively. The results are listed below.

(1) When , the noise increases and makes the matching curve of BMI less smooth, but the main peak of the curve remains unchanged. When , the salient regions shrink, and the position accuracy decreases.

(2) The change of has no substantial influence on , as long as .

(3) determines the cover range of the likelihood: the higher is, the smoother the curve will be and, hence, the more robust the BMI will be.

#### 5. Conclusions

In this paper, a robust Bayesian estimated mutual information method for multisensor image matching is proposed. The method is applicable whenever the prior and the likelihood can be modeled, and we have implemented it using the prior information of the gradient: the prior probability density is estimated by the KDE method, and the likelihoods are modeled separately according to the class of the matching pixel pair. Our method is compared with the standard NMI and GNMI by experiments on multisensor images, and the results show that (1) our method provides higher robustness and position accuracy than NMI; (2) as the key quantities of BMI can be precalculated offline, our method has a remarkably low computational cost; and (3) the parameters of the BMI can be set easily, which facilitates its application.

Further studies include improving the matching accuracy by suppressing the speckle noise and refining the saliency mask. Moreover, to enhance its robustness, additional information can be introduced according to the practical conditions.