Random Binary Local Patch Clustering Transforms Based Image Matching for Nonlinear Intensity Changes
This paper presents a new feature descriptor that is suitable for image matching under nonlinear intensity changes. The proposed approach consists of the following three steps. First, a binary local patch clustering transform response is employed as the transform space. The value of the new space exhibits a high similarity after changes in intensity. Then, a random binary pattern coding method extracts raw feature histograms from the new space. Finally, the discrimination of the proposed feature descriptor is enhanced by using a multiple spatial support region-based binning method. Experimental results show that the proposed method is able to provide a more robust image matching performance under nonlinear intensity changes.
1. Introduction
With the rapid development of modern sensors and big data technology, image matching has been applied to many computer vision applications, such as multiple sensor fusion [1, 2], 3D reconstruction, and depth information estimation [4, 5]. Nevertheless, when the intensity values of an image pair contain an obvious nonlinear variation, image matching becomes very challenging. Typical intensity changes include changes in illumination [6, 7], different foci, flash versus no-flash, multispectral images [10, 11], multimodalities, and blurred images. The most direct approach to this challenging task is to construct a feature descriptor that is effective for different types of image transformations.
Image matching under illumination changes has a very important effect on the performance of visual object detection, tracking, and recognition. The local relationship between the intensity values of pixels is employed to construct illumination-independent feature descriptors, such as the local binary pattern (LBP) and its related family of features, namely, the center symmetric local binary pattern (CS-LBP) and the center symmetric local trinary pattern (CS-LTP). Local features extracted from intensity rankings rather than raw intensities have recently become more popular because intensity order distributions are invariant to monotonic intensity changes. In this regard, Tang proposed the ordinal spatial intensity distribution (OSID), which uses the relative order of pixel intensities within a patch to generate illumination-robust features. However, ordering pixels by discrete intensity yields significantly different order distributions when an illumination change alters the intensities nonlinearly. To address this problem, Kim proposed the exact order based descriptor (EOD), which uses an exact ordering method.
Image matching in multispectral cases is more challenging than in cases of illumination change. Long wavelength infrared (LWIR) intensity variations are related to variations in the temperatures of objects, while variations in the RGB intensities come from colored objects and light reflections. Because of these different natures, the intensity values of the two image types are nonlinearly related and their gradients are poorly correlated, which significantly complicates image matching. Furthermore, LWIR images appear smoother, with losses of detail and texture, as Figure 1(c) shows. Thus, in multispectral cases, many feature descriptors, such as the scale invariant feature transform (SIFT), lose their effectiveness. Inspired by SIFT, many SIFT-like descriptors have been proposed for RGB/LWIR image matching. Cristhian describes the Edge Oriented Histogram (EOH) descriptor, which is robust for characterizing multispectral key-points. Additionally, Mouats employs multioriented and multiscale Log-Gabor filters instead of multioriented Sobel descriptors in order to construct a new and upgraded feature descriptor, PCEOH.
Figure 1: (a) Illumination change. (b) Different exposures.
Local self-similarity (LSS) is another method for feature extraction that captures the local internal layouts of self-similarities in the image. Inspired by LSS, Heinrich proposed a modality independent neighbourhood descriptor (MIND) for multimodal image matching. Kim proposed a dense adaptive self-correlation (DASC) descriptor with a series of adaptive self-similarity measures between patches sampled by randomized receptive field pooling, for which the sampling pattern was obtained through discriminative learning. However, every image patch has its own internal layout information. Consequently, despite employing a machine-learning algorithm, the self-similarity group feature based on a fixed sampling pattern (DASC) could not achieve perfect performance across all test images.
This paper presents a new descriptor called a random binary pattern of patch clustering (RBPPC), which employs binary local patch clustering transform responses as the transform space of the input image. Then, groups of random sampling responses in the new space are converted into different patterns as the “cluster bin” of the proposed histogram feature descriptor. Finally, multiple spatial support regions are designed in order to enhance the discrimination of the proposed feature descriptor.
This paper is organized as follows. Section 2 introduces the proposed RBPPC feature descriptor. Section 3 describes the experimental process and results analysis. Finally, the concluding remarks are given in Section 4.
2. Proposed Feature Descriptor
2.1. Binary Local Patch Clustering Transform
Instead of constructing a descriptor directly from a series of self-similarity measures, this paper proposes a binary local patch clustering transform as the transform space, which is robust to intensity changes. The typical intensity changes include changes in illumination, different exposures, and multispectral images. For multispectral images (e.g., RGB/LWIR images) in particular, the LWIR image has not only little texture information but also low contrast compared to the RGB image. Therefore, a binary local patch clustering transform can generate a more similar transform space.
Figure 1 shows an example of the response of the binary local patch clustering transform to intensity changes. The first row of Figure 1 shows the input image pairs, featuring different types of intensity changes, and the second row shows the response of the image pairs to the binary local patch clustering transform. This example confirms that the new transformed space can be less sensitive to intensity changes than the raw intensity space. Therefore, building a robust intensity change feature descriptor requires an effective matching approach that confines the feature values of the transformed space in response to the changes in intensity.
The proposed binary local patch clustering transform consists of the following steps:
Extract the feature vector sets with identical orders from the source and reference images by employing a sliding window in the same direction (from left to right and from top to bottom), as shown in Figure 2:
$$\mathcal{P} = \{P_1, P_2, \ldots, P_N\}, \qquad \mathcal{Q} = \{Q_1, Q_2, \ldots, Q_N\}, \tag{1}$$
where $\mathcal{P}$ and $\mathcal{Q}$ are the local feature vector sets of the source and reference images, respectively. $P_i$ and $Q_i$ are the feature vectors, which consist of the intensity values of the corresponding local image patches of the source and reference images. $N$ is the total number of local feature vectors, as Figure 2 shows.
Build a codebook of the source image using a K-means clustering algorithm from $\mathcal{P}$:
$$C_P = \{P_{j1}, P_{j2}\}, \qquad P_{j1}, P_{j2} \in \mathcal{P}, \tag{2}$$
where $C_P$ represents the codebook of the source image, which consists of two feature vectors. $P_{j1}$ and $P_{j2}$ represent the binary clustering centers extracted by K-means from $\mathcal{P}$. Therefore, $P_{j1}$ and $P_{j2}$ are actually the $j1$-th and $j2$-th feature vectors in $\mathcal{P}$.
Build a codebook of the reference image by sharing the index information of the source image binary clustering centers $P_{j1}$ and $P_{j2}$:
$$C_Q = \{Q_{j1}, Q_{j2}\}, \qquad Q_{j1}, Q_{j2} \in \mathcal{Q}, \tag{3}$$
where $C_Q$ represents the codebook of the reference image and $j1$ and $j2$ are the indices of the codebook feature vectors within the source local feature set $\mathcal{P}$. The relationship established in (3) means that the source and reference codebooks share the same clustering indices. Therefore, although the codebooks $C_P$ and $C_Q$ consist of different intensities, they represent the local patch clustering centers of the source image and the reference image, respectively. This simple method deals effectively with the quantization problem caused by intensity differences.
Generate the binary local patch clustering transform response using codebook-based quantization:
$$M_P(x, y) = \begin{cases} 0, & \mathrm{SSD}(P(x, y), P_{j1}) \le \mathrm{SSD}(P(x, y), P_{j2}), \\ 1, & \text{otherwise}, \end{cases} \qquad M_Q(x, y) = \begin{cases} 0, & \mathrm{SSD}(Q(x, y), Q_{j1}) \le \mathrm{SSD}(Q(x, y), Q_{j2}), \\ 1, & \text{otherwise}, \end{cases} \tag{4}$$
where $M_P$ and $M_Q$ are the binary local patch clustering transform responses of the source and reference images, respectively. SSD denotes the sum of squared differences of the intensity values, and $P(x, y)$ and $Q(x, y)$ represent the intensity features of the local patches located at the point $(x, y)$ in the source image and the reference image, respectively. As a result, every source and reference local patch is assigned a binary class label. Figure 2 shows the entire processing of the binary local patch clustering transform algorithm.
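The steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' MATLAB implementation. It makes three assumptions: the two K-means centres are initialised from the darkest and brightest patches; each centre is snapped to the nearest actual patch so that an index (j1, j2) can be shared with the reference image; and label 0 means "closer to the first codebook entry". The function names are hypothetical.

```python
def extract_patches(img, a):
    """Slide an a-by-a window left to right, top to bottom; return flattened patches."""
    h, w = len(img), len(img[0])
    return [
        [img[y + dy][x + dx] for dy in range(a) for dx in range(a)]
        for y in range(h - a + 1)
        for x in range(w - a + 1)
    ]


def ssd(u, v):
    """Sum of squared differences between two equally long vectors."""
    return sum((p - q) ** 2 for p, q in zip(u, v))


def two_means_indices(patches, iters=20):
    """Run 2-means and return the indices (j1, j2) of the patches nearest to the centres."""
    # Assumed initialisation: darkest and brightest patch by intensity sum.
    c1 = min(patches, key=sum)
    c2 = max(patches, key=sum)
    for _ in range(iters):
        g1 = [p for p in patches if ssd(p, c1) <= ssd(p, c2)]
        g2 = [p for p in patches if ssd(p, c1) > ssd(p, c2)]
        if not g1 or not g2:
            break
        new_c1 = [sum(col) / len(g1) for col in zip(*g1)]
        new_c2 = [sum(col) / len(g2) for col in zip(*g2)]
        if new_c1 == c1 and new_c2 == c2:
            break
        c1, c2 = new_c1, new_c2
    j1 = min(range(len(patches)), key=lambda i: ssd(patches[i], c1))
    j2 = min(range(len(patches)), key=lambda i: ssd(patches[i], c2))
    return j1, j2


def quantize(patches, c1, c2):
    """Assign label 0 to patches closer (in SSD) to c1, and label 1 otherwise."""
    return [0 if ssd(p, c1) <= ssd(p, c2) else 1 for p in patches]


def blpct(source, reference, a=3):
    """Return the binary responses of source and reference in raster order."""
    P = extract_patches(source, a)
    Q = extract_patches(reference, a)
    j1, j2 = two_means_indices(P)  # clustering is done on the source only
    # The reference codebook reuses the same patch indices j1 and j2.
    return quantize(P, P[j1], P[j2]), quantize(Q, Q[j1], Q[j2])
```

Because the reference codebook is built from the reference patches at the shared indices, a global gain and offset applied to the reference image scales every SSD by the same factor and leaves the labels unchanged, which is the behaviour the text describes for intensity changes.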
In order to illustrate the advantages of the proposed binary local patch clustering transform (BLPCT) algorithm, three classical image binarization algorithms, namely, the Otsu method, the Kittler method, and the Niblack method, are employed for comparison. The comparison experiment is set up as follows: 100 local image patch pairs under intensity changes with different resolutions are randomly selected from the DB1 illumination changes database (Leuven), as the first row of Figure 3 shows. Then, the similarity of the binarization results of each patch pair, defined in (5), is employed as an evaluation function to measure the effectiveness of the different binarization algorithms. Finally, the average similarity of the binarization results of the 100 image patch pairs under intensity changes is used as a quantitative evaluation indicator of their performance.
Table 1 shows the comparison experimental results of the proposed method and the Otsu, Kittler, and Niblack methods. The first row of Table 1 contains the average similarity of the different image binarization results for 100 random image patch pairs under intensity changes. The second row of Table 1 contains the average running time of the different image binarization algorithms. Table 1 shows that the average similarity of the binarization results of the proposed method is higher than that of the other classical image binarization algorithms, which illustrates that the proposed BLPCT algorithm performs more effectively under intensity changes than the conventional methods. Moreover, its computational cost is similar to that of the Otsu algorithm.
Figure 3 shows some experimental result examples of the proposed binary local patch clustering transform (BLPCT) and the classical Otsu, Kittler, and Niblack binarization algorithms. Specifically, the first column of Figure 3 shows the test image patch pair under intensity changes (the source image patch and the reference image patch). The second column of Figure 3 shows the proposed BLPCT responses of the test image patch pair, and the third column of Figure 3 shows the Otsu algorithm responses of the test image patch pair. The fourth column shows the Kittler binarization algorithm results of the test image patch pair, and the fifth column shows the Niblack binarization algorithm results of the test image patch pair.
The corresponding evaluation results are summarized in Table 2. It is clear that the proposed binary local patch clustering transform algorithm performs more effectively than other classical image binarization algorithms under intensity changes, since the average similarity of the BLPCT responses under intensity changes is 6–7% higher than that of other binarization algorithm responses.
In theory, according to the definition of the classical Otsu algorithm, the image histogram is separated into two groups by a gray value. The gray value that maximizes the variance between the two groups, and thereby minimizes the variance within the groups, is taken as the optimal threshold. The Kittler algorithm is also a gray level threshold based image binarization method, whose goal is to find the minimum error threshold. The Niblack algorithm employs the local mean and the local standard deviation to estimate the threshold.
The proposed BLPCT algorithm is instead based on the classification of local image patches. Compared with a global or local intensity threshold, a local image patch provides more information, such as the neighbouring intensity relationship and the spatial position information, which matters more for robustness to intensity changes.
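For readers unfamiliar with the Otsu baseline, the criterion described above can be sketched as follows. This is a generic textbook formulation, not code from the paper; the function name and the convention that class 0 holds pixels less than or equal to the threshold are assumptions.

```python
def otsu_threshold(pixels, levels=256):
    """Return the grey level t that maximises the between-class variance.

    `pixels` is a flat sequence of integer grey values in [0, levels).
    Class 0 holds the pixels <= t and class 1 holds the pixels > t.
    """
    hist = [0] * levels
    for v in pixels:
        hist[v] += 1

    total = len(pixels)
    sum_all = sum(i * count for i, count in enumerate(hist))

    w0 = 0          # number of pixels in class 0
    sum0 = 0        # sum of grey values in class 0
    best_t, best_var = 0, -1.0
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue            # class 0 still empty
        w1 = total - w0
        if w1 == 0:
            break               # class 1 empty from here on
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        # Between-class variance up to a constant factor 1 / total**2.
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t


def binarize(pixels, t):
    """Label pixels above the threshold as 1 and the rest as 0."""
    return [1 if v > t else 0 for v in pixels]
```

Since the threshold is chosen from the grey-level histogram alone, two images of the same scene with nonlinearly related intensities can receive thresholds that split the scene differently, which is the weakness the comparison in this section targets.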
2.2. Random Binary Pattern Coding Strategy
In order to convert the response of a binary local patch clustering transform to a feature histogram, this paper proposes a random binary pattern coding method, defined in (6), where F(Фj(I)) represents the binary pattern index of the j-th random sampling response of image I. Фj(I) is a random binary pattern that consists of four sampling point responses of image I (Фj1(I), Фj2(I), Фj3(I), and Фj4(I)).
Mathematically, Фj(I) represents a permutation of four random sample point responses of the input image I. Therefore, given that each random sample of a binary local patch clustering transform response is either 0 or 1, when (6) is applied to the binary local patch clustering transform response M, Фj(M) can take 2^4 = 16 different patterns, as shown in Table 3.
The process of the random binary pattern coding method is explained clearly in Figure 4.
With the pattern index in Table 3, a feature mapping function π is defined to map the Фj(I) permutation into a 16-dimensional feature vector, whose elements are all 0 except for the i-th element, which is 1, where i is the pattern index. The mathematical definition of π is
$$\pi\big(F(\Phi_j(I))\big) = [v_1, v_2, \ldots, v_{16}], \qquad v_k = \begin{cases} 1, & k = F(\Phi_j(I)), \\ 0, & \text{otherwise}. \end{cases} \tag{7}$$
Therefore, based on the above definitions, the histogram feature of the proposed RBPPC can be defined by
$$H_{\mathrm{RBPPC}}(M) = \sum_{j=1}^{n} \pi\big(F(\Phi_j(M))\big), \tag{8}$$
where M represents the binary local patch clustering transform response and n represents the number of random samplings. Figure 5 illustrates the generation process of the RBPPC histogram feature. First, four points are randomly sampled n times and the pattern index is calculated according to (6) for each sampling response. Then, the HRBPPC histogram is constructed by counting the occurrences of each pattern index according to (8).
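The coding and counting steps can be illustrated with the short sketch below. Table 3 is not reproduced here, so the mapping from a four-bit pattern to an index is an assumption: the four responses are read as a binary number, giving a 0-based index from 0 to 15. The function names and the fixed random seed are also illustrative choices.

```python
import random


def pattern_index(bits):
    """Map four binary responses to an index in 0..15 (bit order is an assumption)."""
    b1, b2, b3, b4 = bits
    return 8 * b1 + 4 * b2 + 2 * b3 + b4


def rbppc_histogram(response, n=1000, seed=0):
    """Build a 16-bin histogram from n random four-point samples of a binary response.

    `response` is a 2D list of 0/1 labels, for example a binary local patch
    clustering transform response.
    """
    rng = random.Random(seed)   # fixed seed, so the sampling positions are repeatable
    h, w = len(response), len(response[0])
    hist = [0] * 16
    for _ in range(n):
        bits = [response[rng.randrange(h)][rng.randrange(w)] for _ in range(4)]
        # Incrementing one bin is equivalent to adding the one-hot vector pi(F(...)).
        hist[pattern_index(bits)] += 1
    return hist
```

Using the same seed for the source and the reference response means that both histograms are built from the same sampling positions, which is one plausible way to keep the two descriptors comparable.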
2.3. Chamfer Distance-Based Spatial Constraint
With the help of the proposed binary local patch clustering transform, the transform space exhibits high similarity compared to the original image space under intensity changes, as shown by the above examples. However, there are still some tiny differences between the transform responses of the source and reference images. In other words, it is difficult for the binary local patch clustering transform to provide an absolutely identical transform space under intensity changes, such as those shown in Figures 7(c) and 7(d).
Therefore, in order to extract features that are more similar from the binary local patch clustering transform space under intensity changes, the Chamfer distance response is applied in order to provide a spatial constraint for the RBPPC descriptor extraction. This spatial constraint can effectively enhance the robustness of the RBPPC descriptor. The detailed process is described as follows. First, the Chamfer distance map is extracted from the binary local patch clustering transform response of the source image according to (9), as shown in Figure 7(e).
Then, the spatial sampling point candidate regions are built from the Chamfer distance map by applying a distance threshold, as defined in (10), where Rsam represents the spatial sampling point candidate regions that are shown by the yellow region of Figure 7(f). Figure 6 shows the entire spatial constraint enhanced RBPPC histogram feature extraction process. With the help of the sampling candidate region Rsam, the RBPPC histogram features under intensity changes become more similar.
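A possible reading of this spatial constraint is sketched below. Two assumptions are made that the text does not confirm: the distance is measured from each pixel to the nearest boundary between the two labels of the binary response, and the candidate region keeps the pixels whose distance is at least the threshold, which matches the later remark that a very large threshold leaves too small a region. The classical two-pass 3-4 chamfer approximation is used, and the function names are hypothetical.

```python
def chamfer_distance_map(binary):
    """Two-pass 3-4 chamfer distance from each pixel to the nearest label boundary.

    A pixel is treated as a boundary pixel (distance 0) if any 4-neighbour
    carries a different label. Distances are in chamfer units: 3 per
    horizontal or vertical step and 4 per diagonal step.
    """
    h, w = len(binary), len(binary[0])
    inf = 10 ** 9
    dist = [[inf] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for dy, dx in ((0, 1), (1, 0), (0, -1), (-1, 0)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and binary[ny][nx] != binary[y][x]:
                    dist[y][x] = 0

    forward = ((-1, -1, 4), (-1, 0, 3), (-1, 1, 4), (0, -1, 3))
    backward = ((1, 1, 4), (1, 0, 3), (1, -1, 4), (0, 1, 3))

    # Forward pass: propagate distances from the top-left corner.
    for y in range(h):
        for x in range(w):
            for dy, dx, cost in forward:
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w:
                    dist[y][x] = min(dist[y][x], dist[ny][nx] + cost)

    # Backward pass: propagate distances from the bottom-right corner.
    for y in range(h - 1, -1, -1):
        for x in range(w - 1, -1, -1):
            for dy, dx, cost in backward:
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w:
                    dist[y][x] = min(dist[y][x], dist[ny][nx] + cost)
    return dist


def sampling_region(dist, threshold):
    """Keep pixels at least `threshold` chamfer units away from a boundary."""
    return [[1 if d >= threshold else 0 for d in row] for row in dist]
```

Under these assumptions, raising the threshold removes a wider band around the label boundaries, where the source and reference responses are most likely to disagree, at the cost of a smaller sampling area.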
Figure 7 shows an example of the RBPPC descriptor with an RGB/LWIR image pair under different generation conditions. Figures 7(g) and 7(h) show RBPPC histograms of input image pairs without spatial sampling point candidate region constraints, and there are obvious differences between those two histograms. In contrast, as Figures 7(i) and 7(j) show, when using spatial sampling point candidate region, the RBPPC feature pairs extracted from the source and reference images exhibit high similarity.
Moreover, Figure 8 shows an example of the RBPPC descriptor with a different exposure image pair under different generation conditions and Figure 9 shows an example of the RBPPC descriptor with an image pair featuring different illuminations. The second rows of Figures 7, 8, and 9 feature different RBPPC descriptor generation conditions with and without spatial sampling point candidate region constraints. These examples demonstrate that the similarity of the proposed RBPPC descriptors for image pairs under intensity change is markedly improved by using the Chamfer distance transform based spatial sampling point candidate regions.
2.4. Multiple Spatial Support Regions Binning Method
In order to enhance the discrimination of the proposed RBPPC feature descriptor, this study applies a multiple spatial region-based binning method. Figure 10 shows 17 different spatial support region binary masks designed to enhance the discrimination of the proposed RBPPC descriptor. The white parts of the spatial support region masks represent 1 while the black parts of spatial support region masks represent 0.
The multiple spatial support region-based RBPPC histogram feature descriptor vectors are defined as follows:
$$\mathbf{H}(M) = [H_1(M), H_2(M), \ldots, H_{17}(M)], \qquad H_k(M) = \sum_{j=1}^{n} \pi\big(F(\Phi_j(M; R_k))\big), \qquad R_k = B_k \cap R_{sam}, \tag{11}$$
where $H_k$ represents the RBPPC histogram of the $k$-th spatial support region, in which the random sampling points of $\Phi_j$ are drawn from $R_k$. $\mathbf{H}$ represents the multiple spatial support region-based RBPPC histogram feature descriptor vectors, which consist of 17 histograms $H_k$. π is a mapping function, which is defined to map a pattern index F(Фj(X)) into a 16-dimensional feature vector, as (7) shows.
Moreover, $R_k$ represents the actual spatial random point sampling candidate region of the $k$-th spatial support region for the binary local patch clustering transform response M, and is generated from the following two components, as (11) shows:
the binary mask $B_k$ of the $k$-th spatial support region, which is shown in Figure 10;
the spatial sampling point candidate regions Rsam, which are shown by the yellow regions of Figure 11.
Therefore, the multiple support region-based RBPPC has (16 × 17 =) 272-dimensional feature descriptor vectors. The flowchart of the feature extraction of multiple spatial support region-based RBPPC histograms is shown in Figure 11. Figure 12 shows examples of the multiple support region binning method enhanced RBPPC feature descriptor, used with different types of image pairs.
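The way 17 region histograms of 16 bins each combine into a 272-dimensional vector can be sketched as follows. The actual masks of Figure 10 are not reproduced in the text, so `grid_masks` builds a purely hypothetical set (the whole patch plus the cells of a 4 × 4 grid) just to obtain 17 regions. The pattern index convention and all function names are likewise assumptions.

```python
import random


def region_histogram(response, region, n, rng):
    """16-bin pattern histogram from n four-point samples drawn inside `region`."""
    points = [(y, x) for y, row in enumerate(region) for x, v in enumerate(row) if v]
    hist = [0] * 16
    if not points:
        return hist             # empty region: all-zero histogram
    for _ in range(n):
        idx = 0
        for _k in range(4):
            y, x = rng.choice(points)
            idx = 2 * idx + response[y][x]   # read four bits as a binary number
        hist[idx] += 1
    return hist


def grid_masks(h, w):
    """Hypothetical stand-in for Figure 10: whole patch plus a 4 x 4 grid (17 masks)."""
    masks = [[[1] * w for _ in range(h)]]
    for gy in range(4):
        for gx in range(4):
            masks.append([
                [1 if (y * 4 // h == gy and x * 4 // w == gx) else 0 for x in range(w)]
                for y in range(h)
            ])
    return masks


def multi_region_descriptor(response, r_sam, masks, n=200, seed=0):
    """Concatenate one 16-bin histogram per support region into a single vector."""
    rng = random.Random(seed)
    descriptor = []
    for mask in masks:
        # Actual sampling region: support-region mask AND candidate region Rsam.
        region = [[m & r for m, r in zip(mrow, rrow)] for mrow, rrow in zip(mask, r_sam)]
        descriptor.extend(region_histogram(response, region, n, rng))
    return descriptor
```

With 17 masks the returned list has 17 × 16 = 272 entries, matching the descriptor length stated above.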
3. Experimental Results
We implemented the proposed algorithm in MATLAB 2016b on a desktop computer with a 3.30 GHz Intel 5 processor. Three popular published image databases under intensity changes were employed as test data, covering illumination changes, different exposures, and multispectral images (RGB/LWIR). These datasets all provide ground truth information about the key-point positions and their corresponding matching results.
To be specific, the illumination changes database (Leuven) is a standard Oxford image matching dataset with natural illumination changes, which consists of six outdoor scene images under various degrees of illumination change. These images are employed in this paper as the image matching test DB1 (illumination changes), which is shown in Figure 13(a). The second image matching test DB2 (different exposures) consists of four image pairs with obviously different exposures, and three image pair examples are shown in Figure 13(c). The third image matching test DB3 (multispectral images) consists of 44 multispectral image pairs (each pair consisting of an RGB image and an LWIR image of the same scene), and three examples are shown in Figure 13(b).
3.1. Parameters Analysis of the Proposed Feature Descriptor
Figure 14 shows the results of an extensive analysis of the RBPPC descriptor performance when varying its associated parameters, namely, the codebook patch scale a, the random sampling time n, and the threshold of the Chamfer distance.
A successful image matching rate, Prate, is employed as an evaluation indicator in order to evaluate the performance quantitatively.
Note that, as Figure 14(a) shows, the matching performance of the proposed algorithm is not strongly affected by the local patch size a of the codebook for any of the test data types, since the performance curve is nearly flat. Varying the threshold of the Chamfer distance has a larger effect on performance, as shown in Figure 14(b). With the illumination change data, increasing the threshold causes the matching quality to degrade. However, with the exposure change and the multispectral data, increasing the threshold across a certain range improves the matching quality. The reason for this is that the local patch clustering transform generates more similar responses under illumination changes than under other kinds of intensity changes. A large Chamfer distance threshold effectively removes the dissimilar regions. Therefore, with a large Chamfer distance threshold, better matching quality is achieved for different exposures and multispectral images. However, if the Chamfer distance threshold is too large, the generated sampling candidate region becomes too small and the proposed histogram feature loses its discrimination; as a result, the matching quality falls. Likewise, as Figure 14(c) shows, the matching performance is also improved by increasing the random sampling time n within a certain range for all test data types.
3.2. Comparison Experiments
The image matching performances of the proposed RBPPC feature descriptor and four conventional descriptors (EOH, DASC, LBP, and PCEOH) under intensity changes were compared. To evaluate the performance of the feature descriptors while avoiding bias due to the feature detector performance, we follow an approach similar to that of prior work. We extract the key-point descriptors from the source images and then project them into the corresponding reference image using the matching results of the different feature descriptors. The ground truth homography information is provided by the published databases [8, 31, 32], so that the matches can be checked against the ground truth and the performance of the different descriptors can be evaluated. The histogram intersection method is employed as the matching measure for the feature descriptors, defined in
$$d(H_1, H_2) = \sum_{i=1}^{dim} \min\big(H_1(i), H_2(i)\big), \tag{13}$$
where $H_1$ and $H_2$ are the histogram feature descriptors and dim represents the dimensions of $H_1$ and $H_2$.
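As a small illustration of the matching measure, histogram intersection sums the bin-wise minima of two histograms, so a larger value means a closer match. The sketch below uses the unnormalised form; whether the descriptors are normalised before comparison is not stated in the text, and the helper `best_match` is a hypothetical nearest-neighbour selection, not a documented part of the method.

```python
def histogram_intersection(h1, h2):
    """Sum of bin-wise minima of two equally long histograms (larger is more similar)."""
    if len(h1) != len(h2):
        raise ValueError("histograms must have the same number of bins")
    return sum(min(a, b) for a, b in zip(h1, h2))


def best_match(query, candidates):
    """Return the index of the candidate histogram most similar to `query`."""
    scores = [histogram_intersection(query, c) for c in candidates]
    return max(range(len(candidates)), key=lambda i: scores[i])
```

For example, `histogram_intersection([1, 2, 3], [3, 2, 1])` sums the minima 1, 2, and 1.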
The precision-recall curve is selected as a common criterion for evaluating local feature descriptors. The curve is based on the number of correct matches and the total number of matches obtained for an image pair. “Recall” indicates the ratio of the correct matches to the number of correspondences of the image pair. “Precision” denotes the ratio of the correct matches to the total matches. “Recall” and “Precision” are defined in (14) and (15), respectively:
$$\mathrm{Recall} = \frac{\#\,\text{correct matches}}{\#\,\text{correspondences}}, \tag{14}$$
$$\mathrm{Precision} = \frac{\#\,\text{correct matches}}{\#\,\text{total matches}}. \tag{15}$$
Table 4 shows the image matching performances of the different feature descriptors with the different test data types. All the descriptors perform better with the illumination change dataset than with the other datasets. The reason for this is that the intensity changes under global illumination changes are approximately linear. However, the proposed RBPPC feature descriptor performs more effectively for all datasets than the other conventional methods, with high recall and low 1-precision.
Figures 15–17 show the key-point matching performances of the proposed RBPPC feature descriptor with the different test data types. Although there is a nonlinear intensity change between every test image pair (illumination change, different exposures, or multispectral images), key-point matching performs well with the help of the proposed RBPPC feature.
Figure 18 shows examples of the performances of the proposed RBPPC and the four other conventional feature descriptors (EOH, LBP, DASC, and PCEOH) under nonlinear intensity changes, that is, with RGB/LWIR multispectral image patch pairs.
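The two ratios follow directly from three counts, as the sketch below shows. The function and argument names are illustrative, and returning 0.0 for an empty denominator is a convention chosen here rather than something the text specifies.

```python
def recall_and_precision(num_correct, num_correspondences, num_matches):
    """Compute recall and precision from match counts for one image pair.

    num_correct:         matches that agree with the ground truth
    num_correspondences: ground-truth correspondences between the two images
    num_matches:         all matches returned by the descriptor
    """
    recall = num_correct / num_correspondences if num_correspondences else 0.0
    precision = num_correct / num_matches if num_matches else 0.0
    return recall, precision


def one_minus_precision(num_correct, num_matches):
    """Fraction of returned matches that are false, the x-axis of many PR plots."""
    _, precision = recall_and_precision(num_correct, 1, num_matches)
    return 1.0 - precision
```

A descriptor with high recall and low 1-precision therefore recovers most of the true correspondences while returning few false matches.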
4. Conclusions
A new feature descriptor is proposed in this paper for image matching under intensity changes. In contrast to conventional methods, a binary local patch clustering transform is proposed to transform the spaces of the original intensity change image pair. The values of the new spaces have higher similarities than the raw intensities, and a random binary pattern coding method converts the values of the transformed spaces into feature histograms. Moreover, to enhance the discrimination of the proposed RBPPC, a Chamfer distance-based threshold is combined with a multiple spatial support region mask. The experimental results demonstrate that, given its high matching success rate, the proposed RBPPC descriptor performs more effectively than the conventional methods of image matching with intensity change image pairs.
The DB1, DB2, and DB3 image data used to support the findings of this study are included within the article.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Han Wang and Hanseok Ko came up with the main idea for the research; Han Wang performed the research and numerical analysis; Zhihuo Xu performed some additional parts of the experiments analysis and partially edited the paper; Hanseok Ko gave valuable suggestions on the revision of the paper and partially edited the paper.
This work was supported in part by the Natural Science Foundation of China under Grants 61872425, 61771265, and 61801247, in part by the Nantong Science and Technology Bureau Foundation under Grants GY12016020 and GY12016017, in part by the University Science Research of Jiangsu Province Foundation under Grants 17KJB520029 and 17KJB510047, in part by Nantong University (17R31), and in part by the Basic Science Research Program of NRF (NRF-2017R1A2B4012720).
P. Sen, N. K. Kalantari, M. Yaesoubi, S. Darabi, D. B. Goldman, and E. Shechtman, “Robust patch-based HDR reconstruction of dynamic scenes,” ACM Transactions on Graphics, vol. 31, no. 6, 2012.
G. Petschnigg, R. Szeliski, M. Agrawala, M. Cohen, H. Hoppe, and K. Toyama, “Digital photography with flash and no-flash image pairs,” in Proceedings of ACM SIGGRAPH 2004, pp. 664–672, August 2004.
X. Shen, L. Xu, Q. Zhang, and J. Jia, “Multi-modal and multi-spectral registration for natural images,” in Proceedings of the 13th European Conference on Computer Vision (ECCV), pp. 309–324, Zürich, Switzerland, 2014.
S. Kim, D. Min, B. Ham, M. N. Do, and K. Sohn, “DASC: Dense Adaptive Self-Correlation Descriptor for Multi-modal and Multispectral Correspondence,” in Proceedings of the 32th IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2103–2112, Boston, MA, USA, June 2015.
B. Wang, Y. Zhang, X. Wang, and C. Wu, “A new regularized minimum error thresholding method,” Transactions of Nanjing University of Aeronautics and Astronautics, vol. 32, no. 4, pp. 355–364, 2015.
W. Niblack, An Introduction to Digital Image Processing, Prentice Hall, 1986.
C. A. Aguilera, A. D. Sappa, and R. Toledo, “LGHD: A feature descriptor for matching across non-linear intensity variations,” in Proceedings of the 22th International Conference on Image Processing (ICIP), pp. 178–181, Quebec City, Canada, September 2015.