Abstract

To align multimodal images is important for information fusion, clinical diagnosis, treatment planning, and delivery, while few methods have been dedicated to matching computerized tomography (CT) and magnetic resonance (MR) images of lumbar spine. This study proposes a coarse-to-fine registration framework to address this issue. Firstly, a pair of CT-MR images are rigidly aligned for global positioning. Then, a bending energy term is penalized into the normalized mutual information for the local deformation of soft tissues. In the end, the framework is validated on 40 pairs of CT-MR images from our in-house collection and 15 image pairs from the SpineWeb database. Experimental results show high overlapping ratio (in-house collection, vertebrae , blood vessel ; SpineWeb, vertebrae , blood vessel ) and low target registration error (in-house collection, ; SpineWeb, ) are achieved. The proposed framework concerns both the incompressibility of bone structures and the nonrigid deformation of soft tissues. It enables accurate CT-MR registration of lumbar spine images and facilitates image fusion, spine disease diagnosis, and interventional treatment delivery.

1. Introduction

Spine is the backbone of body trunk. It protects the most significant nerve pathway in the spinal cord and the body. On the other hand, spine injury and disorders affect up to 80% world population and may cause deformity and disability, which become a major health and social problem [13]. For instance, the lumbar degenerative disease accompanied by pathological changes might result in lumbocrural pain, neural dysfunction, instability of facet joints, and spino-pelvic sagittal imbalance, and thus, the quality of life decreases dramatically. In addition, due to the aging population, the global burden relating to spinal disease remedy is expected to raise significantly in the next decades.

To align intrapatient multimodal images, such as computerized tomography (CT) and magnetic resonance (MR), benefits clinical diagnosis, treatment planning, and delivery for lumbar spinal diseases [4, 5]. However, few methods were dedicated to matching lumbar spine images. Panigrahy et al. developed a method for CT-MR cervical spine images which needed anatomical landmarks to guide image registration [6]. Palkar and Mishra combined different orthogonal wavelet transforms with various transform sizes for CT-MR spine image fusion, while interactive localization of control points was required [7]. Tomazevic et al. implemented an approach for rigid alignment of volumetric CT or MR to X-ray images [8]. To simplify the registration problem in real-world scenarios, images were acquired from a cadaveric lumbar spine phantom and three-dimensional (3D) images contained only one of the five vertebrae. Otake et al. proposed a registration method for 3D and planar images which was used for spine intervention and vertebral labeling in the presence of anatomical deformation [9]. Harmouche et al. designed an articulated model for MR and X-ray spine images [10]. Hille et al. presented an interactive framework, and rough annotation of the center regions in different modalities was used to guide the registration [11].

Accurate alignment of intrapatient CT-MR images is challenging. From the anatomy, human spine consists of inflexible vertebrae surrounded by soft tissues, such as nerves, vessels, and muscles. Moreover, the vertebrae of lumbar spine are connected by facet joints in the back, which allows for forward and backward extension and twisting movements. Moreover, spinal deformity imposes difficulties on multimodal image registration. Specifically, during image acquisition, patients can lay flatly for a short time due to pain, and subsequently, motion becomes unavoidable. Last but not the least, there are intrinsic differences between CT and MR imaging.

Figure 1 shows a pair of intrapatient CT-MR images. It is found that in CT images, the lumbar spine region easily highlights itself from the rest of soft tissues (the top row), while in MR images, soft tissues show various intensities and in particular, it might be hard to distinguish rigid bones from soft tissues (the bottom row). In the figure, soft tissues in MR images are with various contrast than those in CT images (red arrows), undesirable artifacts caused by the bias field are observed in MR images (green arrows), and these pairs of images show different imaging field of views. It is obvious that these facets pose difficulties in image registration.

Image registration is important in medical image analysis [12, 13, 14]. Based on similarity metrics, registration methods could be generally classified into intensity- and feature-based methods. Among the intensity-based methods, mutual information (MI) is well known, and it was primarily presented for MR breast image alignment [15]. Afterwards, the metric is used in multimodal medical image registration [16]. For specific applications, MI has been modified to enhance the performance of image registration. For instance, normalized MI (NMI) was proposed for invariant entropy measure [17], regional MI was implemented to capture volume changes when local tissue contrast varied in serial MR images [18], localized MI was designed for atlas matching and prostate segmentation [19], conditional MI was developed to incorporate joint histogram and intensity distribution for image description [20], self-similarity weighted αMI was presented for handheld ultrasound and MR image alignment [21], and MI was also advanced with spatially encoded information [22].

Feature-based methods aim to quantify detected landmarks with features for image registration. Ou et al. collected multiscale multiorientation Gabor features to weight mutual-saliency points for matching [23]. Zhang et al. used scale-invariant features and corner descriptors for lung image registration [24]. Heinrich et al. designed modality independent neighborhood descriptor (MIND) which extracted the distinctive structure in small image patches for multimodal deformation registration [25]. Via principal component analysis of deformation, a low-dimension statistical model was learned [26]. Toews et al. combined invariant features of volumetric geometry and appearance for image alignment [27]. Determined by the moments of image patches, a self-similarity inspired local descriptor was presented [28]. Jiang et al. designed a discriminative local derivative pattern which encoded images of different modalities into similar representation [29]. Woo et al. combined spatial and geometric context of detected landmarks [30], and Carvalho et al. considered intensity and geometrical features [31] into a similarity metric. Weistrand and Svensson constrained image registration with anatomical landmarks for local tissue deformation [32].

Embedding a proper penalty term into a similarity metric is helpful in specific applications. Rueckert et al. used a term to regularize the local deformation to be smooth in breast MR image registration [33]. Rohlfing et al. designed a local volume preservation constraint, assuming the soft tissues incompressible in small deformation [34]. Staring et al. proposed a rigidity penalty and modeled the local transform when thorax images with tumors were aligned [35]. To model fetal brain motion, Chen et al. utilized the total-variation regularization and a penalty was adopted toward piece-wise convergence [12]. Due to local tissue rigidity characteristics, Ruan et al. added a regularization term for aligning inhale-exhale CT lung images [36]. Fiorino et al. designed the Jacobian-volumehistogram of deforming organs to evaluate the parotid shrinkage [37].

This study proposes a coarse-to-fine framework to address the registration of intrapatient CT-MR images of lumbar spine. It develops a similarity metric that penalizes a bending energy term into NMI for local deformation of soft tissues. The most similar work is from the comparison of bending energy penalized global and local MI metrics in aligning positron emission tomography and MR images [38], while this study differs itself from the proposed coarse-to-fine registration framework, the bending energy penalized NMI (BEP-NMI) and the application to CT-MR lumbar spine images.

3. Materials and Methods

3.1. Data Collection

Two data sets were analyzed. One is our in-house collection which contains 40 pairs of lumbar spine images from the Department of Radiology, Shenzhen Second People’s Hospital, the First Affiliated Hospital of Shenzhen University. CT images were acquired through SIEMENS SOMATO. The voxel resolution is , and the matrix size is with slices. T2-weighted MR images were acquired using a 1.5 Tesla scanner (SIEMENS Avanto). The physical resolution is , the matrix size is , and the slice number ranges between 60 and 75.

The other data set is accessible online, namely SpineWeb (http://spineweb.digitalimaginggroup.ca). It includes 15 image pairs of lumbar spine. The physical resolution of CT images is , the image size is , and the slice number is 77 per volume. The resolution of T1-weighted MR images is , the image size is , and each volume contains 42 slices.

3.2. The Proposed Framework

The proposed framework consists of two steps both of which use intensity-based image registration methods. An intensity-based registration method can be treated as an optimization problem, and the similarity metric performs as the cost function. Given a fixed image and a moving image in 3D space, image registration aims for mapping the moving image to the space of the fixed image guided by the metric . When an additional regularization term of is penalized into , the registration problem can be formulated as,

where is a transform model, compromises the metric and the regularity term , is the transform coefficients, and is the initialized model by .

Figure 2 illustrates the proposed framework. It indicates a rigid registration stage and a hierarchical deformation stage, and NMI and BEP-NMI, respectively, perform as the similarity metric. Moreover, adaptive stochastic gradient descent (ASGD) [39] is applied for hyperparameter optimization. Specifically, an affine transformation with 12 degrees of freedom is employed in the first stage, and a B-spline elastic model is used for free-form deformation in the second stage.

3.2.1. Rigid Registration

An affine transform model is used here. The transform can be formulated by

where is a matrix that contains the rotation, scale, and shear coefficients, is the center of rotation, is a translation vector, and is a vector of 12 degrees of freedom in volumetric image registration.

Rigid registration attempts for global positioning of the whole body, and thus, an initial alignment of lumbar spine. A 3-level recursive pyramid denotes smoothing that downsamples the source volumes by a factor of 2. Besides, the metric NMI and the affine transform are employed in each scale.

3.2.2. Hierarchical Deformation

Hierarchical deformation is a coarse-to-fine adjustment procedure [40]. This setup utilizes Gaussian pyramid without downsampling to match images from the global structures toward the fine details.

B-spline transform. The B-splines are used to depict the local shape difference between the lumbar vertebrae. To construct the B-spline based free-form deformation model, let be a spatial domain of a 3D image. A lattice () of control points is denoted as , spanning the integer grid in , and denotes the control point at () on the mesh . Then, the elastic model can be expressed as a 3D tensor product of the uniform B-spline of order 3 as below,where , and repents the lth basis function of the B-spline,

where . The basic functions weigh the contribution of each control point to based on its distance to the point .

Since the B-splines can be locally controlled, it makes the computation efficient for a large number of control points. In particular, changing a control point affects only the transforms of its local neighborhood.

BEP-NMI. The metric MI is preferred in multimodal image registration. Given and with intensity bins of and , MI is quantified from a joint probability function and marginal probability distribution functions.

of and . The metric MI between a pair of images, and , can be described aswhere and are the marginal entropy and the is the joint entropy of and .

The metric NMI is more robust to the change of overlapped tissue regions. It uses a Parzen-window approach to estimate the probability density function. The entropy of a fixed image is defined as , where is a probability distribution estimated by using Parzen-windows. The entropy of a moving image can be computed in a similar way. And subsequently, the NMI between and can be presented as

In order to regularize the B-spline deformation and to prevent the rigid structures from being smoothed, a BEP term is added to the NMI. The new cost function, BEP-NMI, is formulated aswhere and are predefined constants to weigh between global similarity and local regularity. In this study, off-line experiments indicated that was a good choice.

The penalty terms are commonly based on the first or second-order spatial derivatives of the transform [35, 36]. In this study, the BEP term is composed of the second-order derivatives [35, 40] in the volumetric space,where is a 3D image. The Equation (8) can be approximated as a discretely sampled sum over the volume as below,where is the number of voxels in , and denotes a sum of the squared second-order derivatives of inside the integral part in Equation (8) at a voxel location . Specially, the derivative approximation with finite differences can be restricted to the local neighborhood of the control point.

Optimization. Given an initial parameter , an optimization algorithm updates an incremental to reduce the cost function iteratively. ASGD is used in the study, since it runs faster and less likely to get trapped in the local minima when compared to other gradient-based optimization algorithms [39]. Notably, ASGD implemented in the elastix package (http://elastix.isi.uu.nl) is used for adaptive step size prediction and the initial parameters are set as those in [39, 40].

3.3. A Comparison Method

The MIND is a feature-based method and it has been widely used in multimodal deformable registration [25, 41]. It aims to represent the distinctive image structure in a local neighborhood and explore the similarity of small image patches by using Gaussian-weighted patch distances [25].

MIND can be formulated by a distance , a spatial search region and a variance estimate as below,

where is a normalization constant, the search region, a convolution filter of size , a convolution filter, and a dense sampling on . As such, an image can be represented by a vector of size at each location . Moreover, can be computed based on a mean of the patch distances within a small neighborhood

In Equation (10) to Equation (12), denotes a six-connected neighborhood and indicates a volume block.

The similarity metric used in MIND comes from the sum of absolute difference. To the fixed image () and the moving image (), the local difference at a voxel is

The default value of is 6 and it means 6-connected neighbors are taken into computation.

3.4. Performance Evaluation
3.4.1. Tissue Overlapping

Tissue overlapping quantifies the overlapping ratio of outlined tissue regions in the fixed and its aligned image, which can distinguish the reasonable from the poor registration [42, 43]. This study focuses on the region of lumbar vertebrae and blood vessels. Assuming the outlined tissues in the fixed and aligned image are, respectively, denoted as and , the voxel-wise Jaccard () index and Dice () coefficient can be, respectively, described aswhere indicates the number of voxels per volume.

3.4.2. Target Registration Errors

As for landmark annotation, ImageJ (http://imagej.nih.gov/ij/) was used. A pair of CT-MR images are displayed side-by-side. Then, landmarks are identified and manually annotated by an imaging radiologist (3+ year experience) and further confirmed by a senior radiologist (10+ year experience). Once landmarks are annotated, their locations in 3D space are recorded. In this study, anatomical landmark points are localized on the vertebral body center (VBC), neural edge (NE), disc center (DC), and blood vessel edge (BVE).

Target registration error (TRE) evaluates the distance between anatomical point pairs in the fixed and moving image. Here, assuming and , respectively, denotes the corresponding landmark point pairs in the fixed and moving image, the mean for a given is defined aswhere is the number of pairs of landmark, and indicates Euclidian distance in 3D space.

3.5. Software and Platform

The whole framework is implemented with Insight Segmentation and Registration Toolkit (http://www.itk.org) and the elastix package [40]. Experiments are performed on a desktop computer equipped with dual-core Intel i7 CPU (3.70 GHz) and 16 GB RAM memory.

4. Results

4.1. Tissue Overlapping

Figure 3 illustrates the tissue overlapping measure J of CT-MR image registration on the in-house collection (left) and the SpineWeb (right). The left shows that the proposed framework outperforms the MIND method on the vertebrae ( versus ) and blood vessel ( versus ) overlapping. In the right figure, the framework achieves higher values (vertebrae, ; blood vessel, ) than the MIND method (vertebrae, ; blood vessel, ), and thus, it leads to better performance.

Figure 4 shows the overlapping ratio D of multimodal image registration on the in-house collection (left) and the SpineWeb (right). The left figure indicates that the coarse-to-fine registration framework obtains better results than the MIND method on the vertebrae ( versus 0.77±0.05) and blood vessel ( versus ) overlapping. In the right figure, the MIND method (vertebrae, ; blood vessel, ) obtains inferior performance than the proposed framework (vertebrae, ; blood vessel, ).

4.2. Target Registration Errors

Figure 5 demonstrates the mean TRE value of anatomical landmark points between the proposed framework and the MIND algorithm on the in-house collection. The error-bar plot indicates that the TRE of the proposed framework is less than 3.00 mm (DC), while that of the MIND algorithm is larger than 4.00 mm (VBC) on average. In addition, statistical analysis indicates that the proposed framework significantly outperforms the MIND algorithm in each of the four sets of landmarks (, two-sample -test).

Table 1 shows the TRE values (, ) with respect to different landmark sets. The coarse-to-fine framework achieves TRE between (BVE) and (DC), while the TRE of the MIND method ranges from (BVE) to (DC), correspondingly larger than that from the proposed framework.

The mean TRE on the SpineWeb dataset is shown in Figure 6. It is observed that the TRE value of the proposed framework is less than 3.00 mm (VBC and NE), while the MIND algorithm leads to the TRE values larger than 5.00 mm.

Statistical analysis indicates significant difference of the TRE values between the proposed framework and the MIND algorithm on aligning the pairs of VBC and BVE landmarks (, two-sample -test).

Table 2 summarizes the mean TRE values on different sets of landmark pairs. The proposed framework achieves the TRE values between (BVE) to (VBC), and the TRE values of the MIND algorithm ranges from (BVE) to (VBC).

4.3. Perceived Quality of Image Alignment

Visual assessment of registration quality is perceived from the fusion of CT and MR images and observed from three perspective views in Figure 7, where are the CT image, are the MR image, are the aligned image from the proposed framework, and are the aligned image from the MIND algorithm. Red arrows directing to the soft tissue regions and green arrows directing to the bone regions are used for comparison. Before registration, both bones and tissues are misaligned, such as acantha , bones and nerves and muscles . After image registration, the proposed framework aligns these parts in the MR images with fine deformation to the CT images. Specifically, both rigid bones and soft tissues are well matched, and the anatomical textures shows consistent distributions in the aligned image. On contrary, the MIND algorithm fails to overlap the acantha (), bones and nerves () and muscles () accurately.

4.4. Computation Time

Based on the software and platform, it took about 62 seconds to complete the affine registration and 427 seconds to complete the deformable registration. And thus, it required a total of 8.15 minutes to fulfill the coarse-to-fine registration for a pair of CT-MR lumbar spine image.

5. Discussion

Intrapatient multimodal image registration can fuse multisource information that benefits disease diagnosis and treatment delivery. This study develops a coarse-to-fine framework and aligns intrapatient CT-MR lumbar spine images. It first utilizes the similarity metric NMI for global positioning, and then, bending energy penalized NMI for local deformation of soft tissues. The proposed framework achieves high tissue overlapping ratio and low target registration error. It not only preserves the incompressibility of vertebrae but also well matches local soft tissues that provide accurate elastic registration of lumbar spine images for clinical applications.

The proposed framework is a coarse-to-fine approach for multimodal image registration. It aligns anatomical structures and addresses the potential difference on the fields of view and the intrinsic differences between medical imaging. The metric NMI is used, since it is a robust and accurate measure in multimodal image registration [17, 44]. After global positioning, a new similarity metric that integrates a bending energy term into NMI is used for local deformation and registration of soft tissues in medical images. It is worth of note that the term encourages smooth displacements in registration [33]. Ceranka et al. embedded the term to improve multiatlas segmentation of the skeleton from whole-body MR images [45], and de Vos et al. integrated the term into unsupervised affine and deformable image registration by using a convolutional neural network [46]. Both works [45, 46] figured out that the term caused significantly less folding in image registration.

The framework takes the incompressibility of vertebrae into account. Vertebrae are bony structures which are connected to each other by the ligamentum flavum at the neural arch [47]. The proposed framework enables global and local image structures well matched, and inflexible bones and soft tissues properly deformed. Its superior performance has been verified on the in-house collection and the SpineWeb database. Experiential results demonstrate that the overlapping ratio of annotated vertebrae and blood vessels are larger than 0.85, and the target registration error is less than 2.40 mm on average. It outperforms the MIND algorithm partly due to its proper deformation of local soft tissues and incompressible lumbar vertebrae. The registration quality is further perceived in a CT-MR image pair. It is found that the marked tissues keep relative location after image registration by using the proposed framework, since it not only well tackles the local soft tissue deformation but also conserves the rigid lumbar vertebrae.

Even if the proposed framework achieves superior performance on aligning CT-MR lumbar spine images, there is still room for further improvement. One way to enhance registration accuracy is by transferring multimodal image registration into mono-modal image registration. Wachinger and Navab developed structural representations, such as Entropy and Laplacian images, which could represent the images in a third space where the images showed close intensity or gradient distribution [48]. Moreover, deep networks have been explored to estimate CT images from MR images directly and in particular, the mapping between CT and MR images was learned without any patch-level pre- or postprocessing [49]. Another straightforward way is to utilize deep networks to learn the deformation field between different imaging modalities [50]. In addition, interactive image registration is admirable in interventional surgery and a doctor user could localize landmarks to guide and to update the registration procedure [51].

There are several limitations in this study. One limitation comes from no comparison of NMI and BEP-NMI on deformable image deformation, since our off-line experimental results show that the NMI based deformable registration is prone to distortion of lumbar spine and unnatural deformation of soft tissues. Moreover, demons and its variants [52, 53, 54] failed in the registration of lumbar spine images. Thus, this study reports the performance of the proposed framework and the MIND method. In addition, how to properly balance the BEP term and the NMI is always a problem and no existing methods could well tackle this issue, while prior knowledge [35, 37] could be employed for further improvement of the registration accuracy.

6. Conclusions

This paper presents a coarse-to-fine framework for the registration of intrapatient CT-MR lumbar spine images. It integrates the bending energy term into normalized mutual information for fine deformation of soft tissues around the incompressible vertebrae. Its high performance benefits multisource information fusion for accurate spine disease diagnosis, treatment planning, interventional surgery, and radiotherapy delivery.

Data Availability

The in-house collection of MR-CT image pairs used to support the findings of this study are restricted by the Medical Ethics Committee of Shenzhen Second People’s Hospital in order to protect patient privacy. The SpineWebdata set of MR-CT images used to support the findings is freely available online (https://spineweb.digitalimaginggroup.ca/spineweb/index.php?n=Main.Datasets). If interested, requests for access to these data can be made to the author Shibin Wu (https://[email protected]). Since the database is freely available, requests for access to these data can also be made to the author Shibin Wu (https://[email protected]).

Conflicts of Interest

The authors declare there is no conflict of interest. The founding sponsors had no role in the design of this study, in the collection, analysis or interpretation of data, in the writing of this manuscript, nor in the decision to publish the experimental results.

Acknowledgments

The authors would like to thank the editor and anonymous reviewers for their invaluable comments that have helped to improve the paper quality. This work is supported in part by grants from Shenzhen matching project (GJHS20170314155751703), National Key Research and Develop Program of China (2016YFC0105102); the National Natural Science Foundation of China (61871374); the Leading Talent of Special Support Project in Guangdong (2016TX03R139); the Science Foundation of Guangdong (2017B020229002, 2015B02023301); and CAS Key Laboratory of Health Informatics.