Abstract

This paper presents a nonrigid coarse correspondence computation algorithm for volumetric images. Our matching algorithm first extracts then correlates image features based on a revised and improved 3DSIFT (I3DSIFT) algorithm. With a scale-related keypoint reorientation and descriptor construction, this feature correlation is less sensitive to image rotation and scaling. Then, we present an improved spectral matching (ISM) algorithm on correlated features to obtain a one-to-one mapping between corresponded features. One can effectively extend this feature correspondence to dense correspondence between volume images. Our algorithm can benefit nonrigid volumetric image registration in many tasks such as motion modeling in medical image analysis and processing.

1. Introduction

Establishing spatial correspondence between volume images is an important and fundamental computational problem in 3D image and video processing. It can greatly benefit many tasks such as nonrigid image registration [14], registration evaluation [58], volumetric ultrasound panorama [9], object recognition [10], and three-dimensional classification [11], to name a few. In biomedical computation, many medical scans (CT and MRI) are volumetric images, whose spatial and temporal correspondence are critical in clinical monitoring, diagnosis, and treatment planning [1214]. However, robust volumetric images registration is often challenging due to the nonrigid characteristics of the deformations of the organs and tissues.

A nonrigid registration is often formulated as a nonlinear optimization which is usually nonconvex and very expensive to solve. Feature correspondence suggests desirable initial guess to this optimization problem which can improve the computation efficiency. Furthermore, it can be adopted as an extra guidance in the dense registration computation to overcome many undesirable local minima and obtain a reliable global solution.

In 2D image registration, the SIFT descriptor and its variants [15, 16] are widely adopted feature modeling methods due to their great robustness and discrimination power. 3DSIFT [1719] generalizes the 2D SIFT into volumetric images and has been demonstrated to be effective in volume image matching problems. However, the existing 3DSIFT algorithms are still inadequate in handling dynamically deforming volumetric images such as aforementioned medical images, due to their sensitivity against image rotation and scaling. Also, the extracted feature correspondences using local descriptors are usually many-to-many and include mismatches. We need a reliable correspondence selection/refinement algorithm to remove mismatches and obtain an objective mapping between feature point sets extracted from the source and target images. In the recent computer vision literature, such a correspondence refinement procedure is often formulated as a graph matching problem; Leordeanu and Hebert [20] proposed an efficient and effective spectral algorithm to solve this graph matching problem.

In this work, we propose a volumetric image matching algorithm in a three-step pipeline: (1) feature extraction and initial correlation, (2) feature matching and correspondence refinement, and (3) extending the coarse correspondence (feature matching) to dense intervolume correspondence. Specifically, consider the following:(1)we develop an improved 3DSIFT (I3DSIFT) descriptor that is less sensitive to rotation and scaling, to reliably extract feature correspondences between given volumes;(2)we propose an improved spectral matching (ISM) algorithm that can more effectively refine the initial feature correlation and obtain correct matching between corresponding features;(3)we implement a spline fitting algorithm to extend these coarse correspondences to dense intervolume correspondences.

The remainder of this paper is organized as follows. We briefly review related work in Section 2. After giving an overview of our algorithm pipeline in Section 3, we elaborate the three steps of this pipeline in Sections 4, 5, and 6. Some volumetric matching results are shown in Section 7 and one application on the motion modeling of medical images is shown in Section 8.

Our algorithm of computing dense correspondences between volumetric data is guided by coarse correspondences between features. The major challenges and our contributions in this work lie in the robust extraction and matching of features. Therefore, we mainly discuss related work in these fields.

2.1. Extraction of Correlated Features

Volumetric feature extraction is the process to build coarse correspondences between volume images. It has been widely studied.

Scovanner et al. [17] first proposed the 3DSIFT descriptor and showed its application in action recognition. Their work is only on 3DSIFT descriptor construction and does not suggest feature correspondences construction. Furthermore, their descriptor is sensitive to scaling and rotation change. Ni et al. [9] implemented a 3DSIFT algorithm for feature extraction and applied it in volumetric ultrasound panorama. They extended scale space extreme detection and dominant direction reorientation step of SIFT to 3DSIFT. However, they did not utilize scale information so that their method is sensitive to scaling change. Cheung and Hamarneh extended SIFT to the so-called N-dimension SIFT (N-SIFT) [18]. They demonstrated its effectiveness using synthetic experiments on volumetric MRI and 4DCT data matching. However, N-SIFT algorithm is still sensitive to both scaling and rotation. Allaire et al. [19] proposed a full 3DSIFT algorithm. They filtered out potential keypoints that either have low contrast or locate on edges or ridges. But many correct corresponding keypoints are filtered out unnecessarily. As a result, only 0.1%~8% of potential keypoints were kept, which is sometimes too few for reliable volumetric matching. Also, this 3DSIFT is sensitive to scale change. Cheung and Hamarneh [18] also proposed the global histogram feature and reoriented global histogram feature. Both of them use single histogram on a keypoint region, while the latter has an additional reorientation step. The reoriented global histogram feature normalizes the direction of the gradients with respect to the highest peak in the histogram to increase robustness to orientation change. Unfortunately, both are less discriminative than N-SIFT for the lack of space information. Han [21] proposed a method based on 3DSURF. It is based on the partial derivatives of local regions and their absolute values. Therefore, 3DSURF is sensitive to noise.

2.2. Feature Correspondence Refinement

Feature extraction and matching algorithms depending on local descriptors usually compute a set of correspondences between feature pairs. Some correspondences are many-to-many and some are mismatches. In order to provide correct and one-to-one matching for the subsequent dense correspondence computation, a refinement is needed. The RANdom SAmple Consensus (RANSAC) algorithm [22, 23] is a widely adopted randomized algorithm that iteratively estimates correspondences and eliminates outliers. It starts with a small feasible dataset and enlarges this set with consistent data correspondences when possible. The RANSAC algorithm is nondeterministic and is only stable in a probabilistic sense. In Xu et al. [24], we used 3DSIFT to extract features and applied RANSAC to obtain the feature matching.

Spectral matching [20, 25], on the other hand, is a deterministic algorithm that has been reported with better efficiency [20]. In spectral matching, a correspondence set is treated as a bipartite graph. It is based on the observation that correct matches corelate with each other strongly, while incorrect matches corelate with each other accidently. Despite its successful application when the bipartite graph is connected, how well the algorithm works when the graph is not connected (which is the case in our pipeline) is not clear.

3. Algorithm Overview

In the paper, we propose a pipeline to find dense correspondences between volumetric images. The pipeline has three steps.(1)Given two volume images and , we extract an initial coarse correspondence set using a revised 3DSIFT (I3DSIFT) algorithm. This initial coarse correspondence set contains all the potential correspondences between features on and features on . In the following is called the potential correspondence set or initial correspondence set (Section 4).(2)Some correspondences in are mismatches; therefore, we perform a revised spectral matching (ISM) algorithm to filter them out and get a refined coarse correspondence set (Section 5).(3)Finally, we perform a B-spline fitting algorithm, guided by , to compute the dense correspondences between and (Section 6).

4. Revised 3DSIFT

As discussed in Section 2, a reliable feature extraction algorithm should be insensitive to scaling and rotation change. We implement a revised 3DSIFT (I3DSIFT) algorithm consisting of four main steps: keypoint detection, orientation assignment, direction reorientation, and descriptor construction. Figure 1 illustrates the feature extraction through these steps. The main changes on I3DSIFT are in the direction reorientation and descriptor construction. We propose a scale-related region definition method in these two steps. Moreover, 5-linear interpolation and feature normalization are introduced in the descriptor construction step to improve the robustness. Compared with the original 3DSIFT with fixed-size region definition, I3DSIFT is insensitive to rotation and scaling transformations. We will elaborate these four steps in the following.

Keypoint detection is an essential procedure in volumetric image matching. In 3DSIFT, scale space extrema are identified as keypoints. The detection is based on the scale space theory [26, 27]. In real implementation, DoG (difference of Gaussian) [15, 28] is usually adopted to detect keypoints. Given a volume image, its DOG images can be defined as a stack of images that are smoothed using series of DoG functions with the variable . For efficiency, an image pyramid with octaves and levels per octave is implemented to represent DoG images. The extremes of 3 × 3 × 3 × 3 across scale in each octave are defined as keypoints. The of the Gaussian image in which the keypoint shows up is selected as the keypoint’s scale. For example, a keypoint that shows up on the th level image, its scale is assigned as . The scale of a keypoint can represent its surrounding feature region size (e.g., keypoint with big scale has a big feature region and vice versa). Therefore, we implement a scale-related region definition method in the following direction reorientation and descriptor construction steps.

Orientation assignment provides pixel attributes for descriptor construction. It has been shown that gradient-based descriptors, such as SIFT [15], are more discriminative than intensity-based descriptors [16]. 3DSIFT is a gradient-based descriptor. Voxel gradient can be depicted according to its azimuth (, ), elevation (, ), and magnitude ().

Direction reorientation is an essential procedure to deal with image rotation. It includes two steps: dominant direction definition and keypoint region reorientation. In the previous 3DSIFT algorithms [17], the size of the keypoint regions is defined as a constant (e.g., sized 16 × 16 × 16). However, this definition does not distinguish keypoints with different feature regions. Keypoints with small feature regions have the same size of keypoint regions with the large ones. Accordingly, the definition of the dominant direction is not accurate. In the paper, inspired by SIFT [15], we implement a direction reorientation step that is related to scale. Our size of the keypoint region for orientation definition is defined as , where is a user-defined parameter and is the scale of the keypoint defined above. The definition of is related to the content of volume images. In our lung CT images application, we set . Consequently, this definition is more accurate.

Descriptor construction is to build description for each keypoint. In order to design a 3DSIFT descriptor that is insensitive to scale change, similarly, we adore a scale-related descriptor region. We set the region size . After subdivision of the descriptor region into 4 × 4 × 4 subregions, we add magnitudes of the pixels in each subregion (its size is ) according to its and . In order to avoid all boundary effects that the descriptor abruptly changes as a sample shifts from being within one subregion to another or one orientation to another during this step, we implement a 5-linear interpolation method to interpolate within three subregion dimensions (, , and ) and two gradient orientations ( and ). Each entry into a bin is multiplied by a weight of for each dimension, where is the distance of the sample from central value of the bin as measured in units of the histogram bin spacing. Then, we join the feature of these subregions and get the descriptor. In order to discard information due to noise or illumination, we normalize the descriptor to and discard features below a threshold (0.2 in the paper) by setting it as 0. Finally we normalize the descriptor into and get the final descriptor.

5. Revised Spectral Matching

After initial feature extraction and matching (Section 4), we obtain an initial correspondence set , indicating potential matches between features on and . To filter out incorrect matches, we can use the spectral matching algorithm. Compared with the well-known RANSAC [22] algorithm, spectral matching is in general more efficient [20, 25], especially when a good initial correspondence set is given.

5.1. Basic Idea of Spectral Matching

Spectral matching is an efficient spectral method to find consistent correspondences in an initial correspondence set . An assignment graph is assigned to represent . Each node represents a correspondence, while the weights on the edges represent the pairwise agreements between the correspondences. can be represented by an affinity matrix . The definition of is shown as follows: where and are two correspondences from the potential correspondence set. is the distance between points and , the same as . is a nonnegative symmetric matrix and each encodes the pairwise agreement of right matches between and correspondences.

The basic idea of spectral matching is from an observation: correct assignments are likely to establish link with each other and thus form a strongly connected cluster, while incorrect correspondences establish links with others only accidently so that they are unlikely to belong to strongly connected clusters. Accordingly, correct matches form a strong cluster in , while incorrect matches do not. This phenomenon can be illustrated by the principal eigenvector of (denoted as ). The elements of represent the confidence of each match. Correct matches have big values in , while incorrect matches have small ones. Consequently, a greedy selection method based on is presented to pick out the correct matches. Intuitively, the matches are picked based on the elements from big value to small value, while the one-one matching constraint is considered.

5.2. A Revision on Spectral Matching

The basic idea of spectral matching is strongly based on the analysis of the affinity matrix and the assignment graph . When most of the elements in are larger than 0, usually corresponds to a connected graph; when most of them are 0, usually corresponds to a unconnected graph.

The basic spectral matching algorithm works well when is highly connected. However, when a discriminative descriptor was used to establish the potential correspondence set, the mismatch pairs have been significantly reduced, and is often sparsely connected or even not fully unconnected (see Figures 2(a)2(c)). For such graphs, the basic spectral matching algorithms have the following problems.

The first problem is the theoretical error on unconnected graphs for spectral matching. As was mentioned above, spectral matching can be illustrated by the principal eigenvector of . However, this is not true when is not a primitive matrix (i.e., corresponds to a unconnected graph) according to the Perron-Frobenius theorem [29]. Directly applying spectral matching on unconnected graphs is theoretically unsupported.

The second problem is the failure on filtering out unobvious wrong matches. Spectral matching considers all the positive values in . It can filter out those matches whose values equal zero or which do not meet the matching constraint. However, when wrong matches have some connection with others they have positive value in . Spectral matching can not filter them out correctly.

The existing spectral matching algorithms [20, 25] can not work well when is unconnected, specifically, when initial correspondences in are almost one-one matching. In this work we present a revised spectral matching (ISM) to overcome these problems (see Figures 2(d)2(f)).

First, is decomposed into several submatrices. Suppose that has , connected components; it can be decomposed into primitive submatrices. Hence, it is theoretically supported to apply spectral matching on each submatrix.

Second, a threshold is given to filter out unobvious wrong matches. In particular, we add a reject ratio , to discard mismatches. represents the rate of matches that we want to discard. A small could not filter out mismatches, while a larger leads to a higher accuracy but could reject many correct matches at the same time. We find that a suitable should be corelated with the accuracy rate of . We always choose or a slightly larger value. The accuracy rate can always be estimated by randomly choosing a small amount of correspondences for verification. For example, in our biomedical application (see Section 8), we model the deformation of lung volume images during the respiratory cycles, where organs will not deform too drastically. And medical therapists suggest that correspondences should not have the offset more than 3 voxel in -direction and 20 voxels in - or -directions. These type of rules can be used as guidance to estimate and therefore .

6. Feature-Guided Image Registration

This algorithm is based on our previous paper [24]. However, unlike [24], which focuses on solving a continuous parameterization and smooth trajectory of an image sequence, we solve this registration to evaluate the effectiveness of our coarse correspondence results. Hence we used a simplified model with only the intensity term and regularization term. The computation of this model is hence simpler.

Assume that we have two volume images. One image, the moving image , is deformed to fit the other image, the fixed image . and are defined on their own spatial domain: and , respectively. We are going to find a transformation that makes spatially aligned to . The transformation is defined as a mapping from the fixed image to the moving image; that is, . The quality of alignment is defined by a cost function with a regularization term . In here, we will adopt the sum of squared differences with a feature regularization term: where weighs similarity against feature constraints. To solve the above minimization problem, there are basically two approaches: parametric and nonparametric. In here we will consider the parametric methods by introducing a parametrization model of the transformation. The original optimization problem thus becomes where the subscript indicates that the transformation has been parameterized. The vector contains the values of the “transformation parameters.” In here, we use the nonrigid B-spline transformation. Consider with the knot points, the cubic multidimensional B-spline polynomial, the B-spline coefficient vectors (the control point displacements), the B-spline control point spacing, and the set of all knot points within the compact support of the B-spline at . The parameters are formed by the B-spline coefficients .

The cost function is defined as with the domain of the fixed image and the number of voxels. Given a transformation , this measure can easily be implemented by looping over the voxels in the fixed image , calculating by interpolation, and adding the squared difference to the sum.

Basing on our proposed feature matching algorithm, we can get the feature correspondences between two volume images. They will guide the spline-fitting. The regularization term therefore minimizes the distance of two corresponding feature points: where indicates the size of the correspondence set and are corresponding features from the fixed and moving images, respectively.

7. Experimental Results

We denote the first two steps of our algorithm (that use I3DSIFT and ISM) as I3DSIFT-ISM. This section demonstrates the effectiveness of our feature matching (the first two steps of our algorithm). In our experiments, we generate an image pyramid with octaves and levels, where we set and . We set for I3DSIFT-ISM. Our experiment materials are lung volumetric images. We perform experiments on two sets of data: and . includes 3 real volume images (SE_1, SE_5, and SE_8) from different respiratory periods of the same person. is from the public POPI data [30]. The volume images in have a resolution of 465 × 465 × 20, while images are 482 × 360 × 141. In preprocessing, we extract region of interests (ROI) through the graph-cut image segmentation and rescale the gray level of each volume image to .

In our experiments, we detect scale space extremes as keypoints. We match descriptors based on NN (nearest neighborhood) search and only those matches whose ratio of NN distance and the second NN distance is smaller than a threshold ( in our paper). We perform the NN search in two directions and a matched pair is kept only if it satisfies the NN optimality in both directions. We conduct NN search using a k-d tree algorithm for the sake of efficiency. We adopt a 1.5 voxel matching accuracy in the experiments on synthetic images. In the paper, most existing volumetric image matching algorithms are included to make a full range of comparison on volumetric image matching. The algorithms include global histogram (Global) [18], reoriented global histogram (Reor.) [18], N-SIFT [18], and 3DSURF [21].

We implement a 3DSURF descriptor based on the paper [21] and use a source code provided in [18] on global histogram, reoriented global histogram, and N-SIFT. I3DSIFT and I3DSIFT-ISM are implemented in C++ and ITK. We show the performance of I3DSIFT and I3DSIFT-ISM on synthetic images to demonstrate the performance with scaling and rotation change. Moreover, we show results on real data.

7.1. On Synthetic Images

Rotation insensitive is an important property to evaluate volumetric matching methods. In order to demonstrate the performance of I3DSIFT-ISM with rotation change, we do matching on SE_1 (from the first dataset ) and its synthetic images with transforms of varying angles about -axis. Figure 3 shows the result. In Figure 3, I3DSIFT gets an accuracy about 80% while N-SIFT [18] and 3DSURF [21] will fail when rotation change is large (larger than ). I3DSIFT outperforms N-SIFT [18] and 3DSURF [21]. I3DSIFT performs much better when the rotation angle is times of 10°, because we define the dominant direction every 10 degrees. As a result, we can change the way we define the dominant direction, for example, every 5° definition instead. However, we should make a compromise between accuracy and efficiency. I3DSIFT-ISM gets an accuracy of almost 100% in every rotation angle. Above all, I3DSIFT is less insensitive to rotation change to the state of the art, while, furthermore, I3DSIFT-ISM is nearly rotation-invariant.

Scale insensitive is another important property to evaluate volumetric matching methods. In order to demonstrate the performance of I3DSIFT-ISM with scale change, we show matching results on SE_1 and its down-sampling synthetic images by a varying factor along each of the 3 axes. Figure 4 shows the result. I3DSIFT outperforms N-SIFT [18] and 3DSURF [21]. There are some further improvement when using I3DSIFT-ISM. I3DSIFT is relatively robust to scale change and I3DSIFT-ISM further improves the robustness.

In conclusion, I3DSIFT-ISM is very robust against rotation and insensitive to scaling.

7.2. Matching Real Data

Our dataset images are volumetric CT images scanned during respiratory cycles. indicates the th frame. We pick three frames 1, 5, and 8, and use I3DSIFT-ISM to match their corresponding pairs: SE_15, SE_58, and SE_18. The gray values of corresponding points vary little (i.e., the lighting condition is stable) in the volume images. Therefore, we compute mean of Sum of Square Difference to evaluate the matching results. Effective matching should produce small value. Consider where and are the fixed image and the moving image, is a matching pair between the two images, and is the match number. Table 1 shows the results.

I3DSIFT finds matches about 2 times of N-SIFT and 1.5 times of 3DSURF. Moreover, I3DSIFT is more accurate. The I3DSIFT-ISM filters out mismatches of the matching result of I3DSIFT. The I3DSIFT-ISM is the most accurate. Figure 5 shows some matching results on volumetric images using I3DSIFT.

8. Application on Feature-Guided Image Registration

Integrating the step-3, we now apply our whole volumetric matching algorithm on the publicly available POPI dataset. In the 3D volume at time frame , the coherent landmarks (a set of 3D points, denoted as ) are available and can be used to evaluate the registration.

In our experiments, we adopt the 3D consecutive image registration with the alinement of corresponded features enforced. The registration results are evaluated by the mean target registration error (MTRE) between the set of landmark points . MTRE is defined by where is a landmark in time and is the transformation from time to time . In our experiments, we set the control weight in (3) as . Table 2 shows the comparison results between feature-guided matching and the registration algorithm without any feature constraints . By comparing the MTRE errors, we can see that the method with both I3DSIFT and I3DSIFT-ISM outperforms (by 8%) the method that does not consider the feature correspondences.

Also, the method with I3DSIFT-ISM outperforms the method with I3DSIFT. This demonstrates that(i)correct feature correspondences (computed by our first two-steps) are very important in improving the volumetric matching accuracy;(ii)better (more matches, higher correctness rate) feature correspondences lead to better matching.

Based on our feature-guided image registration, we can do the dynamic tumor motion modeling. With our volumetric registration between consecutive images, we can build up (by linear interpolation) a temporal deformation model. The motion and dynamics of the tumor can be analyzed using this model. The coherent tracking of the deforming tumor is visualized in Figure 6.

9. Conclusion

We propose a volumetric feature modeling and coarse correspondence algorithm for volumetric images. The pipeline aims to facilitate general nonrigid volumetric image registration by providing a reliable coarse feature matching. Our algorithm includes three main steps. (1) We develop a revised 3DSIFT (I3DSIFT) that is less sensitive to scaling and rotation. (2) We propose a revised spectral matching (ISM) that more reliably refines the initially extracted feature correspondences. (3) We use a B-spline function to solve the dense correspondences from the coarse (feature) correspondences.

We demonstrate the effectiveness of our coarse matching computation on both synthetic and real data by showing that our algorithm outperforms the existing volumetric matching algorithm, such as N-SIFT [18], 3DSURF [21], global histogram [18], and the reoriented global histogram [18]. Our method can be used for reliable volumetric matching. As its application, we show effective feature-guided image registration using our volumetric matching pipeline. We show that it outperforms the image registration method without feature guidance.

A limitation of the current pipeline is the expensive computational cost in 3DSIFT. Feature extraction based on this needs a convolution in 3-dimensional space so it is quite slow. However, in this multiscale space, the computation can be easily conducted in a parallel manner. In the near future, we will explore its GPU/multithread implementation to achieve more efficient computation.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work is partially supported by the US National Science Foundation IIS-1320959 and IIS-1251095; US Louisiana Board of Regents LEQSF(2009-12)-RD-A-06, LEQSF-EPS(2009)-PFUND-133, and LEQSF-EPS(2013)-PFUND-312; National Natural Science Foundation of China no. 61170323 and no. 61373147; Science and Technology Research Projects of XMUT YKJ11012R and YKT10037R; and the 12th Five-year Plan Project of Xiamen Education Science no. 1250.