Abstract

This work presents a novel method for multimodal medical image registration based on histogram estimation from a continuous image representation. The proposed method, termed "fast continuous histogram estimation," employs a continuous image representation to estimate the joint histogram of the two images to be registered. The Jensen–Arimoto (JA) divergence serves as the similarity measure, quantifying the statistical dependence between medical images from different modalities; the estimated joint histogram is exploited to calculate the JA divergence during multimodal medical image registration. In addition, to reduce the grid effect caused by grid-aligning transformations between the two images and to improve the speed of the registration method, random samples, instead of all pixels, are extracted from the images to be registered. The goal of the registration is to maximize the JA divergence, which is maximal when two misregistered images are perfectly aligned; the downhill simplex method is used for this optimization, yielding the optimal geometric transformation. Experiments are conducted on affine registration of 2D and 3D medical images. The results demonstrate the superior performance of the proposed method compared with the standard histogram, Parzen window estimation, the particle filter, and histogram estimation based on continuous image representation without random sampling.

1. Introduction

Image registration is the task of finding the optimal geometric transformation between two images. It is widely used in numerous fields, such as medical imaging, computer vision, and remote sensing. Medical images from different modalities provide complementary information; therefore, the registration of multimodal medical images is valuable in multimodal diagnosis and computer-aided surgery [13].

Information-theory-based image registration has become a popular approach for multimodal medical images. In these methods, mutual information (MI) is a frequently used similarity measure, proposed simultaneously by Collignon et al. [4], Maes et al. [5], Wells III et al. [6], and Viola and Wells III [7]. These approaches do not require any preprocessing and have been shown to perform well across a range of applications. Many other information-theoretic similarity measures have been employed for medical image registration, for example, normalized MI [8], MI combined with gradient information [9], cumulative residual entropy (CRE) [10, 11], and cross-CRE [12].

The critical element of all these information-theory-based approaches to image registration is the estimation of probability density functions (PDFs). The standard histogram has been the most frequently used estimator of probability distributions [4]; it is a straightforward algorithm that is easily implemented. However, Rajwade et al. [13] pointed out that the quality of the estimated PDFs depends on the number of histogram bins and that a relatively small number of bins can yield preferable PDFs. The standard histogram method also yields discontinuous probability estimates, and there is no principled technique for selecting a suitable number of bins when calculating the PDFs of the images to be registered. The Parzen window (PZW) [7] has also been exploited for probability density estimation in image registration; this method was developed further by Thévenaz and Unser [14]. However, the optimal selection of the kernel function and kernel width is a challenging task when utilizing PZW to estimate PDFs. Nonparametric (NP) windows [15] were introduced to compute PDFs more accurately than PZW, but their computational and implementation complexity limits the method's practical application. Joshi et al. [16] simplified the implementation of NP windows by using planar interpolation. The standard histogram and PZW methods share one property: both regard an image as a set of pixels or samples [17]. The discrete histogram transform (DHT) was introduced in [18]. This method averages the estimates from several multivariate histograms to alleviate the discontinuities at the boundaries of histogram bins. However, the computational cost of DHT increases quickly as the number of histograms grows. López-Rubio and Muñoz-Pérez [19] proposed multivariate frequency polygons (MFP) to reduce the computational complexity. Nevertheless, probability estimates based on DHT and MFP still rely on the training samples and the bin width. Maes et al. [5] introduced partial volume estimation (PVE) to calculate probability distributions. Chen and Varshney proposed the generalized PVE [20] technique, which applies a general kernel function to determine the fractional votes of each intensity pair. Estimation methods based on partial volume construct the joint probability distribution through a continuous image representation (CIR), but the PVE approach still essentially requires the choice of a kernel function [17].

In our earlier work [21], a novel divergence measure called the Jensen–Arimoto (JA) divergence was presented as a registration criterion for 2D-2D rigid medical registration. To calculate the JA divergence, the joint probability distributions of the two images to be registered must be estimated using the standard/simple histogram (SH) algorithm, PVE, or the PZW method. The critical element of SH-based estimation is selecting a suitable number of bins, whereas PVE and PZW are both continuous density estimators. A comparison of these two approaches to density estimation was presented in [22]. Darkner and Sporring [23] also concluded that PZW is algorithmically a superior estimator for NMI compared with GPV. Nevertheless, both approaches need an appropriately selected kernel function. In short, the number of bins for SH and the selection of the kernel function for PVE and PZW are the current challenges and limitations in density estimation. To overcome these problems, we introduce a CIR for estimating the joint histogram of the two images to be registered. Unlike the aforementioned approaches, our method avoids both the binning problem and the choice of a kernel function. The most straightforward strategy for estimating joint histograms by CIR is to use all voxels of the aligned images; however, this is time-consuming for large images. Additionally, the grid effect caused by grid-aligning transformations and the discontinuities of similarity measures are two major problems associated with information-theory-based methods [24]. Therefore, to address these issues, we choose a certain number of random samples from the images to be registered, instead of all pixel points, to estimate the joint histogram. The computation time for histogram estimation is thereby reduced by the random sampling strategy. We term the proposed histogram estimation method "fast continuous histogram estimation" (FCHE).
We apply the FCHE algorithm to estimate the joint probability distribution and then calculate the JA divergence for multimodal medical registration. Affine transformation models are employed, and the Nelder–Mead simplex method [25] is used to search for the maximum of the JA divergence. To evaluate the performance of our registration technique based on FCHE, we conduct experiments on 2D and 3D medical images. The results demonstrate that the FCHE method provides better registration accuracy and improved implementation efficiency.

The remainder of this paper is organized as follows. Section 2 introduces the FCHE method for joint histogram estimation and reviews the JA divergence measure. The resulting image registration technique is detailed in Section 3. Section 4 presents our experimental results on 2D and 3D medical images and compares these findings with those of registration strategies based on continuous histogram estimation (CHE) without random sampling, SH, the particle filter (PF), and PZW. Concluding remarks and perspectives are presented in Section 5.

2. Theory

2.1. The FCHE Method

In information-theory-based registration approaches, the joint probabilities between the two images to be registered are the critical element for successfully calculating the similarity measure. As mentioned above, algorithms for computing joint probabilities include SH, PZW, and PVE. In this work, a spatially continuous representation of the image is adopted to estimate the joint histogram from which the joint probabilities are obtained.

2.1.1. Joint Histogram in 2D

Two images, namely, the reference image R and the floating image F, are considered, with their intensity values normalized into a specified number of bins. Rajwade et al. [13] estimated, for every grid point in each image, the intensity values at four neighborhood points located at a distance of 0.5 pixels from the grid point (Figure 1(a)). The intensity values at these four nongrid neighbors must be estimated through interpolation, which inevitably increases the computational cost and introduces interpolation error into the registration results. To avoid this, we instead use four grid points to compose a square (a pixel) for histogram estimation (Figure 1(b)) and then split the square into two triangles (half-pixels).

The intensity values at the three vertices of each triangle are regarded as linear functions of the vertices' coordinates:

r_i = a_1 x_i + b_1 y_i + c_1,  f_i = a_2 x_i + b_2 y_i + c_2,  i = 1, 2, 3,  (1)

where (x_i, y_i) denotes the coordinates of vertex no. i of the triangle shown in Figure 1(d), and r_i and f_i represent the corresponding intensity values in images R and F, respectively. The coefficients a_1, b_1, c_1 and a_2, b_2, c_2 are obtained by solving the equations defined in (1). With the values of these coefficients, the intensity value at any coordinate (x, y) within each triangle of the two images can be calculated. Conversely, given a pair of intensity values (one in R and one in F, denoted (r, f)) within the range of the specified bins, we obtain two isointensity lines, a_1 x + b_1 y + c_1 = r and a_2 x + b_2 y + c_2 = f. The joint histogram is updated if the intersection of the two lines falls inside a pair of corresponding triangles of the two images:

h(r, f) = h(r, f) + 1,  (2)

where h(r, f) denotes the entry (r, f) of the joint histogram between images R and F. We repeat the aforementioned process over all entries in the range of bins to acquire the entire joint histogram.
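As an illustration of this per-triangle update, the following sketch fits the two linear models of (1) for one triangle and votes for bin (r, f) when the isointensity lines of (2) intersect inside it. This is a minimal sketch under our own naming, not the authors' implementation; the full method loops over all triangle pairs and all bin entries.

```python
import numpy as np

def plane_coeffs(xs, ys, vals):
    """Fit intensity = a*x + b*y + c through a triangle's three vertices."""
    A = np.column_stack([xs, ys, np.ones(3)])
    return np.linalg.solve(A, vals)  # returns (a, b, c)

def point_in_triangle(p, tri):
    """Barycentric inside-test of point p against triangle vertices tri (3x2)."""
    a, b, c = tri
    v0, v1, v2 = c - a, b - a, p - a
    d00, d01, d11 = v0 @ v0, v0 @ v1, v1 @ v1
    d20, d21 = v2 @ v0, v2 @ v1
    denom = d00 * d11 - d01 * d01
    u = (d11 * d20 - d01 * d21) / denom
    v = (d00 * d21 - d01 * d20) / denom
    return u >= 0 and v >= 0 and u + v <= 1

def update_histogram_2d(hist, tri, coeffs_r, coeffs_f, r, f):
    """Vote for bin (r, f) if the lines a1*x + b1*y + c1 = r and
    a2*x + b2*y + c2 = f intersect inside the shared triangle."""
    a1, b1, c1 = coeffs_r
    a2, b2, c2 = coeffs_f
    M = np.array([[a1, b1], [a2, b2]])
    if abs(np.linalg.det(M)) < 1e-12:
        return  # parallel isointensity lines: no unique intersection
    p = np.linalg.solve(M, np.array([r - c1, f - c2]))
    if point_in_triangle(p, tri):
        hist[r, f] += 1
```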

2.1.2. Joint Histogram in 3D

Similar to the 2D case, we again adopt the CIR for 3D images. In [17], the eight vertices of a voxel at grid points are selected to constitute one cube, which is divided into 12 or 24 tetrahedrons through the cube center. The intensity values of the eight vertices (grid points) can be acquired directly, whereas that of the center must be evaluated by interpolation because it lies at a nongrid point. By contrast, we partition the cube (one voxel) into six equal volumes using only its eight vertices. Hence, we obtain six tetrahedrons (Figure 2(b)) without needing the cube's center. As in the 2D case, the intensity values at the four vertices of each tetrahedron are treated as linear functions of the vertices' coordinates, as shown below:

Likewise, (x_i, y_i, z_i) and (r_i, f_i), i = 1, ..., 4, are the coordinates and intensity values of the four vertices of each tetrahedron in the two images R and F:

r_i = a_1 x_i + b_1 y_i + c_1 z_i + d_1,  f_i = a_2 x_i + b_2 y_i + c_2 z_i + d_2,  i = 1, ..., 4,  (3)

where a_1, b_1, c_1, d_1 and a_2, b_2, c_2, d_2 are the coefficients that determine the linear equations in (3); their values are computed by solving these equations. On the basis of the coefficients, the two isointensity planes a_1 x + b_1 y + c_1 z + d_1 = r and a_2 x + b_2 y + c_2 z + d_2 = f are established. For the 3D case, we first find the line of intersection of the two planes. Then, the joint histogram entry (r, f) is accumulated by measuring the length of the segment of this line lying within each pair of homologous tetrahedrons from the two respective images:

h(r, f) = h(r, f) + L,  (4)

where L and h(r, f) denote the segment length and the joint histogram, respectively. Considering one tetrahedron, the calculation process for the joint histogram in 3D is shown in Figure 2(c).
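The per-tetrahedron computation can be sketched analogously: fit the linear model of (3) through the four vertices and intersect the two isointensity planes. This is a hedged sketch with our own helper names; the full update of (4) would additionally clip the intersection line to the tetrahedron and accumulate the clipped segment's length, which is omitted here.

```python
import numpy as np

def tetra_coeffs(pts, vals):
    """Fit intensity = a*x + b*y + c*z + d through a tetrahedron's four vertices."""
    A = np.hstack([pts, np.ones((4, 1))])
    return np.linalg.solve(A, vals)  # returns (a, b, c, d)

def iso_plane_intersection(coeffs_r, coeffs_f, r, f):
    """Return (point, unit direction) of the line where the isointensity planes
    n1 . x + d1 = r and n2 . x + d2 = f intersect, or None if parallel."""
    n1, d1 = coeffs_r[:3], coeffs_r[3]
    n2, d2 = coeffs_f[:3], coeffs_f[3]
    direction = np.cross(n1, n2)
    if np.linalg.norm(direction) < 1e-12:
        return None  # parallel planes: no intersection line
    # Two plane equations plus an anchoring constraint pin down one point.
    A = np.vstack([n1, n2, direction])
    b = np.array([r - d1, f - d2, 0.0])
    point = np.linalg.solve(A, b)
    return point, direction / np.linalg.norm(direction)
```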

2.1.3. Random Sampling

Information-theory-based similarity measures rely on the computation of a joint histogram between the two images or volumes to be registered. The straightforward strategy for calculating the joint histogram is to use all pixels or voxels. However, the high implementation cost impedes this strategy for large or 3D images. An alternative is to select a subset of pixels or voxels from the entire image. Generally, the subset is chosen from a uniform grid or from random coordinates. Notably, random samples can be taken at grid points or at nongrid coordinates [25]. Bhagalia et al. [26] proposed selecting samples that lie on notable image features.

Press et al. [25] demonstrated that random samples at nongrid locations can improve the smoothness of MI. They also compared their experimental results with those obtained from several sampling methods evaluated by Klein et al. [27]. Random sampling off the grid can reduce the grid effect and the discontinuities of information-theory-based similarity measures. Accordingly, in our registration method, we adopt a subset of samples selected from the nongrid coordinates of the fixed image to estimate the joint histogram in 2D or 3D. Under FCHE (Algorithm 1), the execution time of histogram estimation is substantially shortened while the registration accuracy is preserved. For instance, estimating the joint histogram between two 3D images of 41 × 41 × 41 voxels takes approximately 21.7 s without random sampling; by contrast, it takes only about 1.5 s under the FCHE strategy when choosing 5000 random samples from the fixed image.

Input: Two images R and F.
Stage 1: The coordinates of the sample points of R and F are obtained by the random sampling strategy.
Stage 2: The pixel (2D) or voxel (3D) containing each sample point is partitioned into two triangles or six tetrahedrons, respectively.
Stage 3: The isointensity lines (2D) or planes (3D) of the corresponding sample points from the two images are constructed.
Stage 4: The joint histogram between R and F is updated using the isointensity lines or planes. Go to Stage 2
unless all sample points are exhausted.
Output: The joint histogram between R and F.
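Stage 1 of the algorithm above can be sketched as follows; drawing continuous coordinates strictly between grid points is what distinguishes off-grid sampling from sampling on the grid. The function name and the fixed seed are our own choices for reproducibility, not part of the original method.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed for reproducibility

def random_offgrid_samples(shape, n):
    """Draw n continuous (off-grid) coordinates uniformly inside an image.

    Sampling between grid points, rather than on them, is what reduces the
    grid effect for grid-aligning transformations.
    """
    lo = np.zeros(len(shape))
    hi = np.array(shape, dtype=float) - 1.0
    return rng.uniform(lo, hi, size=(n, len(shape)))

# e.g. 5000 sample coordinates inside a 41 x 41 x 41 volume
samples = random_offgrid_samples((41, 41, 41), 5000)
```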
2.2. JA Divergence

The Arimoto entropy [28], a generalization of the Shannon entropy, was introduced by Arimoto and further developed by Liese and Vajda [29]. For a random variable X with probability distribution p, it is defined as

H_α(X) = (α/(α − 1)) [1 − (Σ_x p(x)^α)^(1/α)],  α > 0, α ≠ 1.  (5)

Unlike the Shannon entropy, the Arimoto entropy is nonextensive. The parameter α reflects the degree of nonextensivity: the nonextensivity gradually weakens as α approaches 1, and the limit of the Arimoto entropy equals the Shannon entropy when α → 1.
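A small sketch (ours, not from the paper) of the Arimoto entropy in (5), with a numerical check that it approaches the Shannon entropy as α → 1:

```python
import numpy as np

def arimoto_entropy(p, alpha):
    """Arimoto entropy: H_alpha(p) = alpha/(alpha-1) * (1 - (sum p^alpha)^(1/alpha))."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]  # zero-probability entries contribute nothing
    return alpha / (alpha - 1.0) * (1.0 - np.sum(p ** alpha) ** (1.0 / alpha))

def shannon_entropy(p):
    """Shannon entropy in nats, for comparison in the alpha -> 1 limit."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log(p))
```

For a degenerate distribution the entropy is zero, and for α close to 1 the value is numerically close to the Shannon entropy, matching the limit stated above.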

We then follow the derivation of the Jensen–Shannon divergence reported by Lin [30]. In our previous work [21], a divergence measure based on the Arimoto entropy, called the JA divergence, was introduced. The JA divergence measures the distance between two random variables and showed superior performance in registering medical images compared with other information-theoretic measures.

Given n probability distributions p_1, ..., p_n, we define the JA divergence [21] as

JA_α(p_1, ..., p_n) = H_α(Σ_i ω_i p_i) − Σ_i ω_i H_α(p_i),  (6)

where H_α denotes the Arimoto entropy and ω_1, ..., ω_n represent weight factors such that Σ_i ω_i = 1 with ω_i ≥ 0. When the parameter α approaches 1, the Arimoto entropy converges to the standard Shannon entropy, and the limit of the JA divergence in (6), obtained by L'Hôpital's rule, resembles the traditional MI.
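The JA divergence in (6) can be computed directly from a set of probability distributions and weights. The following is a minimal sketch with our own function names; it reuses the Arimoto entropy definition from (5):

```python
import numpy as np

def arimoto_entropy(p, alpha):
    """Arimoto entropy of distribution p (zero entries ignored)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return alpha / (alpha - 1.0) * (1.0 - np.sum(p ** alpha) ** (1.0 / alpha))

def ja_divergence(dists, weights, alpha):
    """JA_alpha = H_alpha(sum_i w_i p_i) - sum_i w_i H_alpha(p_i), as in (6)."""
    dists = np.asarray(dists, dtype=float)
    weights = np.asarray(weights, dtype=float)
    mixture = weights @ dists  # weighted mixture of the distributions
    return arimoto_entropy(mixture, alpha) - sum(
        w * arimoto_entropy(p, alpha) for w, p in zip(weights, dists))
```

The divergence vanishes for identical distributions and is positive for distinct ones, which is why maximizing it over transformation parameters drives the images toward alignment.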

3. Description of the Proposed Method

Herein, we describe our medical registration method. First, the FCHE method is applied to estimate the joint histogram of the two images to be registered and to calculate their joint probability distribution. Then, the JA divergence, used as the similarity measure, is derived in an affine transformation space. Finally, the simplex optimization scheme is employed to search for the maximum of the JA divergence, thereby obtaining the optimal transformation. A block diagram of our registration algorithm (Algorithm 2), consisting of five stages, is displayed in Figure 3, where the sequence numbers ①–⑤ denote these stages, respectively.

Input: Reference image R, floating image F.
Stage 1: Given the initialized transformation T, output the transformed floating image F'.
Stage 2: The random sampling strategy is applied to R and F', and the coordinates of the sample points are output.
Stage 3: The joint histogram between R and F' is estimated using these sample points. The output is the
joint probability distribution of R and F'.
Stage 4: The joint probability distribution is used to calculate the JA similarity measure between R and F'.
The output of this stage is the value of JA between R and F'.
Stage 5: The simplex optimization method is employed. If the stopping criteria of the optimization method are satisfied,
output the optimal transformation T*; otherwise, go to Stage 1.
3.1. Registration Algorithm

For two misaligned images to be registered, one is selected as the floating image F and the other is considered the reference image R. As an example, Figure 4 shows corresponding slices of T1-weighted magnetic resonance (MR) and T2-weighted MR volumes ((a) denotes the reference image, and (b) denotes the floating image). The goal of image registration is to search for the optimal transformation between R and F. We denote by x and x′ an arbitrary point in R and its corresponding point in F, respectively. The spatial transformation relation between the corresponding points in R and F can be formulated as

x′ = T(x, θ),  (7)

where T represents the mapping function and θ is a vector of parameters.

Image registration aims to align the floating image to the reference image by maximizing a similarity measure between R and F. In other words, the registration problem is formulated as the following optimization problem:

θ* = arg max_θ S(R, T(F, θ)),  (8)

where S denotes a suitable similarity measure, which achieves its maximum when the reference and floating images are completely registered, and T(F, θ) stands for the floating image transformed by the spatial mapping T.

3.2. Transformation Model

Rigid, affine, perspective, and curved (elastic) transformation models have been reported to simulate geometric transformations of images [3]. Rajwade et al. [17] reported that CHEs are not suitable for deformable registration because the derivatives of histograms acquired by CHE cannot be derived analytically. Therefore, in this paper, we restrict the transformation to rigid cases. The parameter vector θ holds the 6 degrees of freedom (DoFs) of a 3D rigid transformation: three translations t_x, t_y, and t_z along the x, y, and z directions and three rotation angles θ_x, θ_y, and θ_z around the x-, y-, and z-axes. Hence, we model the transformation and rewrite (7) as follows:

x′ = Rot(θ_x, θ_y, θ_z) x + (t_x, t_y, t_z)^T,  (9)

where x and x′ hold the same denotations as in Section 3.1, t_x, t_y, and t_z are the translation parameters, and θ_x, θ_y, and θ_z denote the rotation parameters. In the following sections, we adopt the transformation model defined by (9).
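A 6-DoF rigid transformation of the form (9) can be assembled as a homogeneous matrix. This is an illustrative sketch; the composition order R_z R_y R_x and the use of radians are our assumptions, since the text does not specify a rotation order:

```python
import numpy as np

def rigid_transform_3d(tx, ty, tz, rx, ry, rz):
    """Homogeneous 4x4 matrix for a 6-DoF rigid transform: rotations
    (radians) about x, y, z composed as Rz @ Ry @ Rx, then translation."""
    cx, sx = np.cos(rx), np.sin(rx)
    cy, sy = np.cos(ry), np.sin(ry)
    cz, sz = np.cos(rz), np.sin(rz)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    T = np.eye(4)
    T[:3, :3] = Rz @ Ry @ Rx
    T[:3, 3] = [tx, ty, tz]
    return T
```

Applying the matrix to a point in homogeneous coordinates rotates it and then adds the translation, matching (9).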

3.3. Similarity Measure

In image registration, the value of the similarity measure for two misaligned images is expected to be maximal when the images are perfectly aligned. Hence, choosing an appropriate similarity measure is an important task. As mentioned in Section 2.2, the JA divergence reaches its maximum when the difference between the random variables is minimal. Therefore, the JA divergence in (6) is applied as the similarity measure, given as follows:

S(R, T(F, θ)) = JA_α(p(f | r_1), p(f | r_2), ..., p(f | r_B)).  (10)

By combining (8) and (10), we formulate the optimal spatial mapping between R and F as

θ* = arg max_θ JA_α(p(f | r_1), p(f | r_2), ..., p(f | r_B)),  (11)

where θ* denotes the set of estimated parameters and p(f | r_i), i = 1, ..., B, represents the conditional probability distribution of the transformed floating image given the reference image.

The intensity values in R and T(F, θ) are denoted by r and f, respectively, and B is the number of intensity bins chosen for estimating the joint probabilities. Substituting the joint probability p(r, f) and the marginal probability p(r) into the conditional probability, we can rewrite it as p(f | r) = p(r, f)/p(r). We choose the JA divergence with these weights and distributions as the similarity measure in (10). In (6), the reference image is adopted as the basis for registration, and the marginal probability distribution of the reference image is chosen as the weights; that is, ω_i = p(r_i). By substituting p(f | r_i) and ω_i into (6), we rewrite the similarity measure as

S(R, T(F, θ)) = H_α(Σ_i p(r_i) p(f | r_i)) − Σ_i p(r_i) H_α(p(f | r_i)).  (12)

The similarity measure defined in (12) is then maximized using an optimization method to search for the optimal transformation parameters.

3.4. Optimization

The optimization algorithm mainly influences the convergence speed of a similarity measure; hence, an appropriate optimization method is valuable to a registration framework. The FCHE method does not provide the derivatives of the probability distributions, so gradient-based optimization techniques are not applicable here. In this work, we exploit the downhill simplex method (Nelder–Mead [31], or Amoeba optimization) to maximize the similarity measure defined in (12), considering that this method requires only function evaluations and no derivatives.

Notably, the downhill simplex optimization scheme is terminated if the difference between the current function value and the best function value is less than 0.01 and the maximum difference between the current and best values over all parameters is less than 0.1. The scheme is also terminated when the number of iterations reaches a previously fixed value MAX. In our work, MAX is set to 100.
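As an illustration of this optimization stage, the sketch below maximizes a similarity function by minimizing its negative with SciPy's Nelder–Mead implementation rather than the authors' ITK-based code; the fatol/xatol options mirror the 0.01 and 0.1 tolerances stated above, and the toy similarity function is our own stand-in for the JA measure.

```python
import numpy as np
from scipy.optimize import minimize

def register(similarity, theta0):
    """Maximize `similarity` over transformation parameters with the
    downhill simplex (Nelder-Mead) method, which needs no derivatives."""
    result = minimize(lambda theta: -similarity(theta), theta0,
                      method="Nelder-Mead",
                      options={"fatol": 0.01, "xatol": 0.1, "maxiter": 100})
    return result.x

# Toy usage: a smooth surrogate similarity peaked at the "true" parameters.
true_theta = np.array([2.0, -1.0])
estimated = register(lambda t: -np.sum((t - true_theta) ** 2),
                     np.array([0.5, 0.5]))
```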

4. Experiments and Results

To evaluate the FCHE method, two groups of experiments were implemented on 2D and 3D medical images. Section 4.1 describes the datasets used. The registration accuracy of the FCHE method is exemplified through 2D registration in Section 4.2, and Section 4.3 addresses the rigid registration of 3D medical images.

The FCHE method, the CHE algorithm without random sampling, and the SH and PZW algorithms were coded in Visual C++ and implemented in Visual Studio 2012, employing classes and functions from the Insight Toolkit (ITK) [32]. All experiments were performed on a personal computer with a 3.10 GHz CPU and 8 GB of memory. We set the nonextensive parameter of the JA divergence to α = 1.50 and used downhill simplex optimization to maximize the JA measure in all experiments.

4.1. Test Images

The datasets for the 2D tests were obtained from the Visible Human Project [33]. We chose the head subset, including three protocols of MR images, T1-weighted MR, T2-weighted MR, and proton density- (PD-) weighted MR images (Figure 5), for the 2D registration experiments. These images were acquired in the Portable Network Graphics (.png) format and originally aligned with one another at dimensions of 256 × 256 pixels. The 3D registration experiments were carried out on simulated 3D datasets obtained from the Brainweb database of the Montreal Neurological Institute [34]. These data contain 3D brain MR imaging (MRI) volumes simulated with three different protocols, T1-weighted, T2-weighted, and PD-weighted, at various slice thicknesses, noise levels, and intensity nonuniformities. In the simulated 3D experiments, the size of the volumes (with voxels coded on 8 bits) was 181 × 217 × 181 voxels at a voxel size of 1 mm × 1 mm × 1 mm. All corresponding slices in these volumes were aligned with each other.

4.2. Registration for 2D Images

The performance of our registration scheme based on FCHE method was first assessed using the real 2D brain data. The role of the number of bins and random samples employed in our method was examined and compared with those in the CHE without random sampling, SH, PZW, and particle filter (PF) [35].

4.2.1. Parameter Setting

The registration parameters were chosen by trial and error on the real data. To demonstrate the effect of these parameters, we performed the following experimental process. Using the pair of MR T1 and MR PD images mentioned in Section 4.1, we chose the former as the reference image, while the latter was randomly transformed and used as the floating image. Twenty initial transformations were generated, with the x and y translations drawn from the range 0–10 mm. To quantitatively evaluate the registration results, we calculated the difference between the true and estimated values as the registration error.

Figure 6 displays the mean and standard deviation of the registration errors when varying the number of bins and the number of random samples. The lowest mean registration error was obtained with 32 bins (using 3000 random samples) (Figure 6(a)), while 3000 random samples achieved relatively low errors (Figure 6(b)). In the sequel, we used 32 bins and 3000 random samples for 2D images.

4.2.2. Accuracy

To illustrate the registration accuracy of our method, we compared the CHE, SH, PZW, and PF methods with FCHE on the real 2D brain datasets MR T1 and MR PD. The MR T1 image was again used as the reference image. We selected 20 affine transformations, with displacements along the x and y directions generated randomly from the range 10–20 mm, rotation angles generated randomly from 0° to 20°, and scales generated randomly from 0.9 to 1.1. Consequently, a total of 20 pairs of test images and 100 affine registrations were obtained for the five methods. Table 1 shows the means and standard deviations of the registration errors over all 100 trials for the five algorithms. Notably, the registration errors incurred by the FCHE method are lower than those of the other four methods. Furthermore, the FCHE method improved the execution efficiency of estimating joint distributions compared with the CHE algorithm.

4.2.3. Noise Resistance

This section assesses the performance of our method, as well as the CHE, SH, PZW, and PF algorithms, in the presence of different noise levels. Similar to Section 4.2.1, MR T1 and MR PD images were chosen as the reference and floating images. Three levels of Gaussian noise with zero mean were added to the MR PD image by applying the imnoise function of Matlab. The respective variances were 0.01, 0.05, and 0.1. Then, 20 initial affine transformations were obtained to transform each noisy MR PD image. As a result, 20 pairs of test images were acquired for each noise level. Altogether, 300 registrations were carried out for the five approaches.

The resulting registration errors are displayed in Table 2. In each cell, the numbers in round brackets denote the means of the translation errors in the x and y directions, and the rotation-angle errors with their standard deviations are in square brackets. As shown in Table 2, the registration accuracy of FCHE is superior to that of the other four algorithms at every noise level.

4.3. Registration for 3D Images

To evaluate the performance of the FCHE approach for 3D image registration, we compared it with CHE, SH, and PZW on the three groups of simulated 3D brain datasets introduced in Section 4.1, namely, MR T1 and MR T2, MR T1 and MR PD, and MR T2 and MR PD. In each pair of test images, subvolumes of size 41 × 41 × 41 voxels were used for the registration experiments, with the former image serving as the reference image. Zero-mean Gaussian noise with a variance of 0.05 was added to the latter image of each pair using the imnoise function of Matlab. We selected 20 rigid transformations with x, y, and z translations generated randomly from the range 0–10 mm to transform the noisy images. Consequently, 20 pairs of test images were generated for each group of data, and a total of 240 3D rigid registrations were implemented for the four methods. Table 3 displays the statistics of the registration errors of the successful experiments among the 20 tests conducted on the three groups of images with the four algorithms. Here, an experiment is regarded as a success if the registration error in every direction is less than 1 mm. Table 3 shows that the registration accuracies of the FCHE and CHE methods are comparable and better than those of the SH and PZW approaches. Meanwhile, FCHE also improved the registration efficiency compared with CHE.

4.4. Discussion of Experimental Results

Experiments were conducted on 2D and 3D datasets; the former used real brain data (MR T1, MR T2, and MR PD images), and the latter used simulated 3D volumes (MR T1, MR T2, and MR PD). In the 2D registration experiments, to compare the registration accuracy of our method with those of the other four algorithms, 100 trials were carried out with affine transformations whose parameters were randomly generated within a certain range. To assess robustness to noise, three levels of Gaussian noise were added to the floating images and 300 trials were performed. The results on the 2D real images show that our alignment technique provides lower registration errors and better robustness to noise than the SH, PZW, and PF strategies and CHE without random sampling.

For the 3D case, three groups of simulated 3D brain datasets, MR T1 and MR T2, MR T1 and MR PD, and MR T2 and MR PD, were employed to evaluate the performance of our method. For each group, 20 pairs of test images were generated, and a total of 240 3D registrations were implemented for the four methods. The experimental results on simulated 3D data show better behavior of the FCHE method, with low registration errors compared with the other methods; subvoxel registration accuracy is achieved by the FCHE method in almost all cases.

5. Conclusion

This study introduced the FCHE method to estimate the joint probability distribution and accelerate the registration process by employing a CIR and random samples at nongrid points. Estimation methods for 2D and 3D joint histograms were elaborated. The proposed approach reduces the grid effect caused by grid-aligning transformations between the two images to be registered; this reduction is achieved by computing the joint histogram from a subset of pixels randomly selected from nongrid coordinates of the fixed image instead of from the entire images. A similarity measure based on the Arimoto entropy, called the JA divergence, was then introduced for medical image registration; it is a generalization of the well-known Jensen–Shannon divergence. We employed an affine transformation as the parameter space, the JA divergence as the registration criterion, and the FCHE algorithm to estimate the joint probability distribution of the two images to be registered, and we used the downhill simplex optimization approach to search for the maximum of the similarity measure.

As mentioned above, the similarity measure computed with the FCHE algorithm is not differentiable, and differentiability is an important property for nonrigid registration. As a result, gradient-based optimization schemes are not applicable to the FCHE-based registration method. In the future, we intend to extend our application to other multimodal medical images, such as ultrasound and MR images, and to other organs.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper. They also confirm that the grants mentioned in the Acknowledgments did not lead to any conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the Natural Science Foundation of Henan Province under Grant 162300410338, by the National Natural Science Foundation of China under Grant 61772576, by Key Scientific Research Projects of Henan Province under Grant 17A510006, by Scientific and Technological Program Projects of Henan Province under Grant 172102210071, by the project sponsored by SRF for ROCS, SEM, by Natural Science Foundation of Jiangsu Province under Grants DZXX-031 and BY2014127-11, and by the Qing Lan Project.