Abstract

We propose a novel landmark matching based method for aligning multimodal images, which is accomplished uniquely by resolving a linear mapping between different feature modalities. This linear mapping yields a new similarity measure for images captured from different modalities. In addition, our method solves for this linear mapping and the landmark correspondences simultaneously by minimizing a convex quadratic function. Our method can estimate complex relationships between image modalities and nonlinear nonrigid spatial transformations even in the presence of heavy noise, as shown in our experiments carried out on a variety of image modalities.

1. Introduction

Multimodal/multispectral images acquired from multiple modalities or different spectral bands of the same subject or organ are of great importance for medical diagnosis and computer-aided surgery, benefiting from the complementary information captured by sensors of different modalities/spectra (e.g., magnetic resonance imaging and computed tomography, or multispectral imaging) [13]. They are also increasingly used in other fields, such as computer vision and computational photography, via different imaging modalities (e.g., RGB and near infrared) or various imaging conditions (e.g., flash and no-flash, or depth and color images) [4].

Image alignment resolves spatial correspondences between images and plays a fundamentally important role in practical applications of multimodal images. There currently exist various techniques [4–9] for multimodal image alignment, which can be broadly categorized into feature-based and patch-based methods. The feature-based methods detect sparse salient points and extract features to describe their local photometric/geometric pattern [10, 11]. Different from alignment of generic images, multimodal image alignment requires the features, together with their similarity measurement, to be able to deal with image variations caused by the modality difference [6]. The patch-based methods measure the similarity between local patches by computing their mutual information [12], cross correlation [4, 6, 13], or a combination of the two [14].

Despite the promising results reported in existing papers, multimodal image alignment still remains a challenge, mainly due to the complex and unknown relationship between image modalities (as shown by the left two images in Figure 1(c)). The common information between multimodal images is needed for defining image features. However, it is not always trivial to recognize, model, or learn this information in practice due to outliers, large displacement, and the complex relationship [4]. Moreover, predefined image features work well only when the corresponding measurement of feature similarity fits these features, which is not always an easy task in practice. Finally, in most existing works on multimodal image alignment, the definition of image features and similarity is independent from the computation of spatial correspondences, which may lead to suboptimal solutions.

In this paper, we propose a new landmark matching based multimodal image alignment method which uniquely builds an implicit linear mapping from the features extracted to describe each landmark in one image to the features of the corresponding landmark in the other image taken in a different modality/condition. It operates by resolving the linear mapping and the landmark correspondences simultaneously, minimizing squared differences between mapped features. Our method bears several advantages over the state-of-the-art techniques. First, the resolved linear mapping enables our method to gain an effective similarity measure by adaptively discovering common information between images, even based only on common image features for describing local image properties at each landmark and the L2 norm for measuring feature differences. Second, simultaneous optimization of the linear mapping and landmark correspondences results in an optimal solution, benefiting from their mutual interactions during the optimization process. Third, we formulate the problem as an integer quadratic program and resolve it with an efficient conditional gradient algorithm.

2. Problem Definition

Suppose we have a pair of images, denoted by $I_1$ and $I_2$, which are taken under different modalities. From each of them, we extract a set of landmark points, represented by $\{p_i\}_{i=1}^{m}$ for $I_1$ and $\{q_j\}_{j=1}^{n}$ for $I_2$, respectively. We aim to align $I_1$ and $I_2$ by searching correspondences between $\{p_i\}$ and $\{q_j\}$.

3. Linear Mapping

Great challenges in corresponding the landmarks in $I_1$ with the ones in $I_2$ arise from the complex relationship between $I_1$ and $I_2$, in the sense of not only the brightness value at each landmark but also the local photometric/geometric pattern in the vicinity of each landmark. In order to tackle this hard problem, we first extract a set of features denoted by a vector $f_i \in \mathbb{R}^d$ for each landmark $p_i$ in $I_1$ and a set of features $g_j \in \mathbb{R}^d$ for landmark $q_j$ in $I_2$, respectively. By stacking the features of all landmarks together, we have $F = [f_1, \ldots, f_m] \in \mathbb{R}^{d \times m}$ and $G = [g_1, \ldots, g_n] \in \mathbb{R}^{d \times n}$. Then, if landmark $p_i$ in $I_1$ corresponds to landmark $q_j$ in $I_2$, we solve a projection matrix $A$ such that $A f_i \approx g_j$. In other words, we assume that there is a linear mapping from $f_i$ to $g_j$. At the same time, we assume that all pairs of corresponding landmarks follow the same linear mapping, which can be written as
$$A F = G C^{\top}, \qquad (1)$$
where $C$ denotes a correspondence matrix. $C$ is a binary matrix, that is, $C \in \{0, 1\}^{m \times n}$, for which rows correspond to the landmarks of $I_1$ and columns are associated with $I_2$. Elements of $C$ take a value of 1 when the related landmarks correspond and 0 otherwise.
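The shared-mapping assumption above can be checked numerically with a minimal sketch (made-up dimensions and variable names, not the paper's code): when the correspondence matrix is correct, a single projection matrix explains every matched feature pair at once.

```python
import numpy as np

# Illustrative sketch of the linear-mapping assumption.
rng = np.random.default_rng(0)
d, m = 8, 5                               # feature dimension, landmark count

F = rng.standard_normal((d, m))           # per-landmark features from image 1 (columns)
A_true = rng.standard_normal((d, d))      # unknown modality mapping
perm = rng.permutation(m)                 # ground-truth correspondence
G = (A_true @ F)[:, perm]                 # features from image 2, in shuffled order

# Correspondence matrix C: rows index landmarks of image 1, columns those of
# image 2; C[i, j] = 1 iff landmark i in image 1 matches landmark j in image 2.
C = np.zeros((m, m))
C[perm, np.arange(m)] = 1.0

# With the correct C, one linear map explains all matched pairs: A F = G C^T.
assert np.allclose(A_true @ F, G @ C.T)
```

The point of the demo is that the modality change never has to be modeled explicitly: any correspondence hypothesis can be scored by how well a single linear map fits all matched feature pairs.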

4. Objective Function

We resolve the correspondence matrix $C$ together with the projection matrix $A$ of the linear mapping by optimizing the following objective function with the Frank-Wolfe algorithm [15]:
$$\min_{C, A} \; \|A F - G C^{\top}\|_F^2 + \lambda \|A\|_F^2, \qquad (2)$$
where the right term enforces an $\ell_2$ regularizer on $A$ and $\lambda$ is an adjusting parameter. When $C$ is fixed, (2) becomes a ridge regression problem [16] with respect to $A$ and generates a solution
$$A = G C^{\top} F^{\top} \left( F F^{\top} + \lambda I \right)^{-1}, \qquad (3)$$
where $I$ is an identity matrix. By combining (2) and (3), we have
$$\min_{C} \; \mathrm{Tr}\left( G C^{\top} M C G^{\top} \right), \qquad (4)$$
where $\mathrm{Tr}(\cdot)$ means the computation of trace and $M$ is written as
$$M = I - F^{\top} \left( F F^{\top} + \lambda I \right)^{-1} F. \qquad (5)$$
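Assuming the ridge step takes its standard closed form, the collapse of the objective onto a quadratic in the correspondence matrix can be verified numerically (illustrative names and dimensions only):

```python
import numpy as np

# Sketch: with C fixed, A has the ridge closed form
# A = G C^T F^T (F F^T + lam I)^{-1}; substituting it back collapses the
# objective to Tr(G C^T M C G^T) with M = I - F^T (F F^T + lam I)^{-1} F.
rng = np.random.default_rng(1)
d, m, n, lam = 6, 5, 5, 0.6

F = rng.standard_normal((d, m))
G = rng.standard_normal((d, n))
C = np.eye(m)                              # some fixed correspondence

K = np.linalg.inv(F @ F.T + lam * np.eye(d))
A = G @ C.T @ F.T @ K                      # ridge solution for this fixed C

def objective(A):
    return (np.linalg.norm(A @ F - G @ C.T, 'fro')**2
            + lam * np.linalg.norm(A, 'fro')**2)

# The collapsed quadratic in C reproduces the objective value at the optimal A.
M = np.eye(m) - F.T @ K @ F
collapsed = np.trace(G @ C.T @ M @ C @ G.T)
assert np.isclose(objective(A), collapsed)

# And A is indeed a minimizer: any perturbation can only increase the objective.
for _ in range(5):
    assert objective(A + 1e-3 * rng.standard_normal((d, d))) >= objective(A)
```

Because the inner minimization over the mapping has this closed form, the search reduces to an optimization over the correspondence matrix alone.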

5. Enforcing Priors

In order to avoid degenerate solutions, which in practice are obviously different from a reasonable solution, we enforce two constraints on $C$ in (4) based on our prior knowledge about the landmark matching problem. The first one aims to limit the number of landmarks in $I_2$ associated with each landmark in $I_1$; that is, we do not want many landmarks in one image to be assigned to a single landmark in the other image. This constraint can be expressed as
$$C \mathbf{1}_n \le \mu \mathbf{1}_m, \qquad (6)$$
where $\mathbf{1}$ is an all-ones vector and $\mu$ is a parameter to be set empirically. In (6), $\mu$ controls the number of points. The second constraint is introduced to prevent two landmark points from being associated if they are too distant to be a true pair. It can be written as
$$C \circ \left( (X - \hat{X}) \circ (X - \hat{X}) + (Y - \hat{Y}) \circ (Y - \hat{Y}) \right) \le \tau^2 \, \mathbf{1}_m \mathbf{1}_n^{\top}, \qquad (7)$$
where $\circ$ denotes the Hadamard product, $\tau$ is a distance threshold, $X$ and $\hat{X}$ are matrices created by repeating copies of the horizontal vector composed of the $x$ coordinates of landmarks in $I_1$ and $I_2$, respectively, and $Y$ and $\hat{Y}$ are built in a similar way by using $y$ coordinates instead.
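The coordinate-matrix construction behind the spatial prior can be sketched as follows (an illustrative reading of the constraint; the coordinates and the threshold `tau` here are made up):

```python
import numpy as np

# Sketch of the spatial prior: repeat each image's landmark coordinates into
# matrices, take elementwise squared differences, and forbid matches whose
# distance exceeds a threshold tau.
x1 = np.array([0.0, 1.0, 2.0]); y1 = np.array([0.0, 0.0, 1.0])   # landmarks in image 1
x2 = np.array([0.1, 2.2]);      y2 = np.array([0.0, 1.1])        # landmarks in image 2
m, n, tau = len(x1), len(x2), 0.5

X  = np.tile(x1[:, None], (1, n))   # m x n, copies of image-1 x-coords
Xh = np.tile(x2[None, :], (m, 1))   # m x n, copies of image-2 x-coords
Y  = np.tile(y1[:, None], (1, n))
Yh = np.tile(y2[None, :], (m, 1))

D2 = (X - Xh)**2 + (Y - Yh)**2      # squared distances between all landmark pairs
feasible = D2 <= tau**2             # C[i, j] may be 1 only where this holds
print(feasible)
```

In effect the prior prunes the correspondence search to geometrically plausible pairs before the quadratic objective is ever evaluated.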

6. Optimization

Combining (4), (6), and (7), our landmark matching problem is formulated as the minimization of the following objective function:
$$\min_{C \in \{0,1\}^{m \times n}} \; \mathrm{Tr}\left( G C^{\top} M C G^{\top} \right) \quad \text{subject to (6) and (7)}. \qquad (8)$$

Equation (8) is a positive semidefinite quadratic function, and its minimization over binary $C$ is NP hard. We optimize (8) by using a technique similar to [15], which consists of relaxing the binary matrix $C$ to be continuous and optimizing over the convex hull of the feasible set (via the Frank-Wolfe algorithm [15]), followed by a procedure that rounds the resulting continuous solution of $C$ back to a binary one by minimizing the Euclidean distance between the binary and the continuous matrices with a linear programming optimization. As shown in [15], the Frank-Wolfe algorithm can find the global optimum of the correspondence matrix in (2); therefore, we can simply initialize it randomly.
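A minimal sketch of this relax-optimize-round scheme is given below (assuming the collapsed quadratic $\mathrm{Tr}(G C^{\top} M C G^{\top})$ and using an assignment solver as the linear minimization oracle and as the rounding step; this is an illustration, not the authors' implementation):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Frank-Wolfe on the relaxed quadratic over the assignment polytope,
# followed by rounding the continuous solution to a binary matrix.
rng = np.random.default_rng(2)
d, m, lam = 6, 8, 0.6

F = rng.standard_normal((d, m))
G = rng.standard_normal((d, m))
K = np.linalg.inv(F @ F.T + lam * np.eye(d))
M = np.eye(m) - F.T @ K @ F                # PSD matrix from the ridge step
Q = G.T @ G

f = lambda C: np.trace(C.T @ M @ C @ Q)    # Tr(G C^T M C G^T) = Tr(C^T M C G^T G)

C = np.full((m, m), 1.0 / m)               # start at the polytope barycenter
values = [f(C)]
for _ in range(50):
    grad = 2 * M @ C @ Q                   # gradient of the quadratic
    r, c = linear_sum_assignment(grad)     # linear minimization oracle: best vertex
    S = np.zeros_like(C); S[r, c] = 1.0
    D = S - C
    a = np.trace(D.T @ M @ D @ Q)          # exact line search on the segment
    b = np.sum(grad * D)
    gamma = 1.0 if a <= 0 else min(1.0, max(0.0, -b / (2 * a)))
    C = C + gamma * D
    values.append(f(C))

# Rounding: the closest binary matrix maximizes <C, P>, another assignment problem.
r, c = linear_sum_assignment(-C)
C_bin = np.zeros_like(C); C_bin[r, c] = 1.0
```

With exact line search on a convex quadratic, each Frank-Wolfe step is guaranteed not to increase the objective, which is why a random (or uniform) initialization suffices.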

7. Landmark Detection and Features

In our algorithm, landmark points are specified as the keypoint locations of SIFT [17] due to its well-known stability. The features for describing each landmark point are the gradient orientation matrices (GOM) [18].
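As a rough intuition for why gradient-orientation features suit multimodal data, the toy function below pools gradient orientations of a patch into a magnitude-weighted histogram (a simplified stand-in; the actual GOM descriptor is defined in [18]):

```python
import numpy as np

# Simplified gradient-orientation feature for one landmark patch: orientations
# are computed from image gradients and pooled into a histogram, which is
# robust to monotonic brightness changes across modalities.
def orientation_histogram(patch, bins=8):
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)         # orientation, ignoring sign
    hist, _ = np.histogram(ang, bins=bins, range=(0, np.pi), weights=mag)
    return hist / (np.linalg.norm(hist) + 1e-12)    # L2-normalized descriptor

patch = np.tile(np.arange(16.0), (16, 1))           # synthetic vertical-edge patch
h = orientation_histogram(patch)
```

For this synthetic edge patch, all gradient energy falls into a single orientation bin, regardless of the absolute brightness values.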

8. Experimental Results

We implemented our algorithm in MATLAB® and its processing time for a 2500 × 2300 image is less than 2 minutes on a 2.39 GHz Core i7 computer. In our experiments, we empirically set λ = 0.6, λ1 = 0.8, λ2 = 0.1, and μ = 1.5. We employ a coarse-to-fine strategy based on an image pyramid created by using 3 scales with a downsampling rate of 0.8.
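The coarse-to-fine pyramid described above can be sketched as follows (illustrative code with SciPy-style resampling, not the MATLAB implementation): 3 scales with a downsampling rate of 0.8, where matching at the coarsest level would initialize the next finer level.

```python
import numpy as np
from scipy.ndimage import zoom

# Build a 3-level image pyramid with downsampling rate 0.8, coarsest first.
def build_pyramid(img, levels=3, rate=0.8):
    pyramid = [img]
    for _ in range(levels - 1):
        pyramid.append(zoom(pyramid[-1], rate))     # resample by the rate
    return pyramid[::-1]                            # coarsest level first

img = np.random.default_rng(3).random((100, 100))
pyr = build_pyramid(img)
print([p.shape for p in pyr])
```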

In order to validate our algorithm, we collected a dataset which consists of a pair of aerial and orthophoto images shipped with MATLAB, 6 pairs of flash/no-flash indoor images taken with a Canon camera, 6 pairs of RGB/depth images captured by Microsoft Kinect, and 10 pairs of multispectral imaging (MSI) ocular images acquired with an Annidis RHA™ instrument (Annidis Health Systems Corp., Ottawa, Canada). Every pair of images comes from the same scene/object; for example, the MSI images of each pair share the same retina. In our experiments, we converted all RGB images to grayscale.

We compared our linear mapping based method with the classic mutual information based approach [19] and the recently proposed robust measurement based technique [4], both qualitatively and quantitatively. We first ran the three algorithms on our dataset and visually compared their performances by both overlaying the transformed image on the other image and showing the connections between matched points. Observed misalignment in the overlaid image or in the matched-point connections indicates an inferior performance of the matching algorithm. In our experiments, we found that our algorithm outperforms the other two methods for 19 pairs (an example pair of MSI images is shown in Figures 1 and 2) and produces comparable results for the remaining 4 pairs. Then, we added to all images Gaussian noise with zero mean and variances of 0, 0.01, 0.02, 0.04, and 0.08, respectively. For each pair of images, a trained rater manually marked 10 points which are easy to recognize in both images (as shown in Figure 2). We treated the 230 manually marked point pairs as the ground truth and computed the quantitative errors (as shown in Figure 3) of the three methods under evaluation. Specifically, we estimated the 12-parameter transformation model for the retina [20, 21] for MSI images and an affine model for the other images, used it to transform the manually marked points in one image into the other image, and then computed the spatial distance between each transformed point and the corresponding manually marked point.
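For the affine case, this evaluation protocol can be sketched as follows (illustrative code with synthetic points; the hypothetical helpers `fit_affine` and `alignment_errors` are not from the paper):

```python
import numpy as np

# Fit an affine model to manually marked point pairs, map points from one
# image into the other, and report per-point distances as alignment errors.
def fit_affine(src, dst):
    # src, dst: (k, 2) arrays of matched points; solve dst ~ [src, 1] @ params
    X = np.hstack([src, np.ones((len(src), 1))])    # homogeneous coordinates
    params, *_ = np.linalg.lstsq(X, dst, rcond=None)
    return params                                   # (3, 2) affine parameters

def alignment_errors(src, dst, params):
    X = np.hstack([src, np.ones((len(src), 1))])
    return np.linalg.norm(X @ params - dst, axis=1) # per-point distances

rng = np.random.default_rng(4)
src = rng.random((10, 2)) * 100                     # 10 marked points per pair
A = np.array([[1.1, 0.1], [-0.05, 0.95]])
t = np.array([3.0, -2.0])
dst = src @ A + t                                   # a known affine warp
params = fit_affine(src, dst)
err = alignment_errors(src, dst, params)
```

With noise-free synthetic correspondences the fitted model recovers the warp exactly, so the errors reported in Figure 3 reflect genuine misregistration rather than model-fitting residue.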

As shown by the results in Figures 1–3, we have at least three findings. First, our algorithm performs better than the two representative state-of-the-art techniques, as shown by its fewer vessel misalignments in the overlaid images, especially in the area indicated by the white arrow in the rectangular patches of Figure 1, and by the smaller quantitative errors in Figure 3. Second, our algorithm can automatically discover the complex relationship (as shown by the left two images in Figure 1(c)) between images taken from different modalities and therefore achieves better accuracy. Third, the linear mapping demonstrates better robustness to image noise, and the simultaneous optimization of the linear mapping and landmark correspondences shows an extraordinary ability to estimate nonlinear nonrigid transformations (e.g., the retina in Figure 1).

9. Conclusion

We have presented a novel landmark matching based multimodal image alignment technique. It is distinguished from existing image alignment techniques by at least two unique characteristics. First, it automatically discovers the latent complex relationship between different feature modalities. Second, it simultaneously solves for the linear mapping and landmark correspondences through the minimization of a convex quadratic function. Our future work includes extending our algorithm to other features for describing landmark points (e.g., learned features [22]), to different features for different modalities, and to a supervised alignment scheme.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was made possible through the support from the Natural Science Foundation of China (NSFC) (61572300, 61672329, 61402267), Natural Science Foundation of Shandong Province in China (ZR2014FM001, ZR2016FQ20, ZR2014FQ004), Taishan Scholar Program of Shandong Province in China (TSHW201502038), National Institutes of Health (NIH) (P30 EY001583), Project of Humanities and Social Sciences in Universities of Shandong Province (J13WH07), and Shandong Provincial Project for Science and Technology Development (2014GGX101026).