Computational and Mathematical Methods in Medicine

Volume 2016 (2016), Article ID 7419307, 14 pages

http://dx.doi.org/10.1155/2016/7419307

## A Parallel Nonrigid Registration Algorithm Based on B-Spline for Medical Images

^{1}School of Electronic & Information Engineering, Lanzhou Jiaotong University, Lanzhou 730070, China^{2}Lanzhou Yuxin Information Technology Limited Liability Company, Lanzhou 730000, China^{3}College of Electrical & Information Engineering, Shaanxi University of Science & Technology, Xi’an 710021, China

Received 13 August 2016; Accepted 2 November 2016

Academic Editor: Kostas Marias

Copyright © 2016 Xiaogang Du et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

The nonrigid registration algorithm based on B-spline Free-Form Deformation (FFD) plays a key role and is widely applied in medical image processing due to the good flexibility and robustness. However, it requires a tremendous amount of computing time to obtain more accurate registration results especially for a large amount of medical image data. To address the issue, a parallel nonrigid registration algorithm based on B-spline is proposed in this paper. First, the Logarithm Squared Difference (LSD) is considered as the similarity metric in the B-spline registration algorithm to improve registration precision. After that, we create a parallel computing strategy and lookup tables (LUTs) to reduce the complexity of the B-spline registration algorithm. As a result, the computing time of three time-consuming steps including B-splines interpolation, LSD computation, and the analytic gradient computation of LSD, is efficiently reduced, for the B-spline registration algorithm employs the Nonlinear Conjugate Gradient (NCG) optimization method. Experimental results of registration quality and execution efficiency on the large amount of medical images show that our algorithm achieves a better registration accuracy in terms of the differences between the best deformation fields and ground truth and a speedup of 17 times over the single-threaded CPU implementation due to the powerful parallel computing ability of Graphics Processing Unit (GPU).

#### 1. Introduction

Image registration is a process used in medical image analysis to determine a spatial transformation that aligns image data according to the spatial coordinate of pixels. The process involves a reference image and a moving image; the moving image is deformed by a spatial transformation to align with the reference image [1]. With the development of medical imaging technologies, it is easy to obtain many images with different modalities and gain a more complete understanding of anatomy and function by registering and fusing these images. Generally, according to spatial transformation, image registration is categorized into two types: rigid registration and nonrigid registration. The rigid registration only describes the motion which is limited to global rotations and translations, and the nonrigid registration always includes very complex local and elastic deformations except rigid transformation.

Practically, nonrigid registration is always widely applied in the many clinical procedures to improve the geometric precision, such as disease diagnostic [2], neurosurgery [3] and Image-Guided Radiation Therapy (IGRT) [4]. Up to present, several classical nonrigid registration algorithms have been proposed and improved in the performance of registration results, including viscous fluid registration [5], demons registration [6], Finite Element Model (FEM) [7] and B-spline registration [8]. B-spline registration always employs uniform cubic B-spline curves to determine a displacement vector field which maps voxels in the moving images to those in the reference images [8]. Each voxel’s displacement between the moving image and the reference image is parameterized in terms of uniformly spaced control points that are aligned with the voxel mesh, and displacement vectors are obtained via interpolation of the control-point coefficients using piecewise continuous B-spline basis functions. Recently, B-spline registration became very popular because of their flexibility and robustness and is therefore able to perform both monomodality and multimodality image registration. However, In the B-spline registration algorithm, there are three time-consuming steps: B-spline interpolation, similarity metric computation, and similarity metric gradient computation. First, in the process of B-spline interpolation, a coarse array of B-spline coefficients is considered as the input and a fine array of displacement values is computed as the output defining the deformation vector field from the moving image to the reference image. Therefore, B-spline interpolation is expensive in terms of executing time for the fine deformation mesh with many control points. In addition, similarity metric is a very crucial step to measure the registration results, and it has an important influence on the accuracy of the registration results. In addition, similarity metric is calculated in each iteration of the B-spline registration algorithm, and its computational efficiency has a serious impact on the performance of the whole registration process. Finally, similarity metric gradient computation requires evaluating the partial derivatives of the similarity metric with respect to each spline-coefficient value of all control points; it is therefore a very time-consuming step to obtain better accuracy of registration results in the case of much more control points. In summary, the B-spline registration algorithm always needs more execution time for the larger amount of medical image data, which limits the clinical applications of B-spline registration [9, 10].

Aiming to improve the accuracy and significantly accelerate B-spline registration algorithm, we develop a B-spline-based nonrigid registration algorithm that is suitable for executing on GPU by considering the LSD as similarity metric. In our algorithm, all of the computation-intensive tasks are designed and implemented on GPU using the parallel computing framework, Computing Unified Device Architecture (CUDA). Experiments show that our algorithm improves the accuracy of registration results and achieves a speedup ratio of 17 times comparing to the single-threaded serial implementation. The main contributions are summarized as follows:(1)We introduce the LSD as similarity metric in the B-spline registration algorithm to obtain the higher registration accuracy for medical images.(2)We design a parallel computing strategy and create LUTs for B-splines registration to improve the execution performance. Meanwhile, three time-consuming steps including B-splines interpolation, the similarity metric LSD, and its gradient computation are designed in the form of the kernel functions which are executed in parallel on GPU.

The rest of this paper is organized as follows. Section 2 presents the related works of accelerating B-spline registration using cluster, many-core and GPU. Section 3 describes our B-spline registration algorithm employed the LSD as the similarity metric and NCG optimization. Section 4 explains the parallel computing details of our proposed algorithm based on CUDA. Section 5 shows experiments results and analysis. Finally, Section 6 concludes our work.

#### 2. Related Work

The parallel computing technology is an effective way to accelerate many general purpose applications and has a very crucial influence on the field of high performance computing. Parallel operations can be performed independently on different partition of medical images due to the good data parallel characteristic of medical images. At present, parallel computing has been widely used in medical image processing [11, 12], for example, image registration [13, 14]. There are three parallel computing technologies for accelerating medical image registration: cluster, many-core, and GPU computing.

Cluster computing is a well-established technology of accelerating image processing algorithms and nonrigid registration is also no exception. The main strategy of cluster computing is composed of two stages: dividing the medical image into many different small partitions and processing them independently on different computers while minimizing the communication cost. Zheng et al. [15] used mutual information as a similarity metric and developed parallel nonrigid registration algorithms on a computer cluster. Results reported in Zheng et al. [15] for B-splines show a speedup of for processors compared to a sequential implementation.

The recent developments in many-core processor technology offer new opportunities for truly large-scale and cost-effective parallel computing on a single chip. Meanwhile, multiprocessor computing techniques are utilized for accelerating image registration algorithms. Jiang et al. [16] used a FPGA-based implementation leading to a speedup of 3.2 times compared with a 2.666 GHz CPU execution. Rohlfing and Maurer Jr. [17] reduced computation time more than 50 times by using 64 CPUs of a shared memory supercomputer. More recently, Rohrer et al. [18] presented a multicore implementation of the B-spline computation based on a Cell Broadband Engine platform, reporting a speedup of 40 times over a sequential CPU implementation. Although these techniques provide considerable computation time improvements, they require either high technical knowledge or hardware with prices inhibiting wide adoption.

Modern GPU is the extension of many-core technology and has a large number of cores on the chip, and this many-core architecture of GPU is more suitable for parallel computing. For algorithms that can be parallelized within the SIMD model, a single GPU offers the computing power equivalent to a small cluster of desktop computers. As noted in the introduction, a bottleneck of the B-spline FFD registration algorithm is the cubic B-spline computation. Consequently, some work has been done to speed up this part by using GPU. For example, Sigg and Hadwiger [19] developed a cubic B-spline interpolation method utilizing linear interpolation of the GPU hardware; however, such an approach suffers from the limitations imposed by the low fixed-point precision of the hardware interpolator as reported by Ruijters et al. [20], which limits its applicability to robust and precise medical image registration.

In addition, researchers proposed many improved approaches to accelerate B-spline registration algorithm with respect to B-spline interpolation and similarity metric computation. Modat et al. [21] presented a parallel-friendly formulation algorithm that is suitable for GPU, it only take less than 1 min to perform registration of T1-weighted MR images without reducing the registration accuracy. Shackleford et al. [22] employed the SSD as similarity metric and accelerated B-spline registration within the stream-processing model using GPU, and this algorithm achieved a good speedup over the single-threaded CPU. Ikeda et al. [23] proposed an efficient GPU-based acceleration method for the nonrigid registration of multimodal images, and their main contribution is efficient utilization of on-chip memory for Normalized Mutual Information (NMI). Ellingwood et al. [24] adopted the sum of squared tissue volume difference as a similarity metric and thus developed a multilevel B-spline transform method to implement nonrigid mass-preserving registration of two CT lung images on GPU, and this method outperforms its multithreaded CPU version in terms of running time.

#### 3. Our Algorithm

Aiming to significantly improve the accuracy and the performance of B-spline registration algorithm, we propose a CUDA-based parallel B-spline registration algorithm which employs LSD as the similarity metric and used NCG as optimization strategy. In this section, we describe the details of our B-spline nonrigid image registration algorithm. Firstly, we represent the theory of describing nonrigid deformation using the cubic B-spline basis function. Secondly, we introduce the LSD as the similarity metric of our algorithm. Finally, we explain the NCG optimization procedure and the gradient of similarity metric in our algorithm.

##### 3.1. B-Spline Free-Form Deformation

B-spline FFD is used as the nonrigid transformation model, and three-dimensional (3D) deformation field is defined by 3D mesh of control points in the Cartesian coordinates. The deformation field for a voxel located at coordinates in the reference image can be mathematically described as follows [15, 21]:

In (1), is the cubic B-spline basis function in the direction [8], and represents the deformation mesh. In the 3D medical image data, a partition region is composed by the eight adjacent control points, and the 3D spacing coordinate of this partition region is shown as follows:where , , and are the voxel numbers of the , , and directions of the control point mesh. Displacement of voxels in each region are affected around 64 control points. The 64 control points coordinates are calculated in the following scheme [8, 25]:

In the medical image volume data, the voxel with the coordinate has the local coordinate in the region , and the value is normalized in as follows [8, 25]:

##### 3.2. The Similarity Metric LSD

A similarity metric quantifies the degree of alignment and is used to optimize the transformation’s parameters to achieve maximal alignment. The similarity metric has a key influence on the accuracy of registration results. Sum of Squared Difference (SSD) is a very classical similarity metric based on pixel computation, and it is widely used in the monomodality medical image registration. However, when SSD is used as the similarity metric, due to the fact that the SSD values of the different iterations are very closed in the later stage of optimization process, it leads to appear premature convergence rate and also affects the accuracy of the registration results.

Since the function value of the logarithm transformation has different influence on the different intervals of the definition domain, especially when the independent variable value is more and more small, the function value changes more quickly. In other words, the difference between two values of logarithmic function is more sensitive to the interval which has a small logarithmic values than the interval which has a large logarithmic values. In the latter stage of the registration process, the difference between the reference image and the moving image subject to local deformation becomes much smaller, the logarithmic operation can make the SSD of similarity metric to change more quickly, and it is helpful to find best deformation mesh more effectively. Therefore, in order to further improve the accuracy of registration results, we perform logarithmic operation on SSD and introduce the LSD as the similarity metric of the B-spline registration algorithm. LSD can be mathematically described as follows:

In (5), represents the reference image, is the moving image with B-spline deformation, and , , and are separately the voxel numbers in the , , and directions of reference and moving images. In order to promote smooth deformation, a penalty term is added to the LSD value. The final cost function to be optimized is a balance between the LSD and the deformation penalty:where . The penalty-term was first used for image registration by Rueckert et al. [8].

##### 3.3. B-Spline Registration Optimization Based on NCG

Conjugate Gradient (CG) is a very effective optimization method for optimization problem. CG overcomes the shortcoming of slow convergence of the gradient descent method and also avoids the shortage from Newton’s method, which needs to store and compute the Hessian matrix and the inverse matrix [26]. The NCG method is an extension of CG and suitable for minimizing general nonlinear functions. Due to the fact that the B-spline registration process is a nonlinear optimization problem, the NCG method is adopted for the entire registration process optimization in our algorithm. So it is important to calculate the gradient of the similarity metric for control points in every iteration and update the control point mesh according to the negative gradient direction from the previous search. Iteratively, the process of updating parameters of control point mesh can be mathematically described in where denotes the displacement vector of control points in the iteration process. is the iteration number. is a scalar, which represents the displacement magnitude of the negative gradient direction, and is obtained using the inexact linear search method. represents the search direction, which is calculated aswhere is a variable, which is calculated as

As seen in (10), we thus require the derivative of the cost function:

In order to decrease the computational complexity, we propose a voxel-centric approach to evaluate the point-centric gradient using the chain rule. We first compute the gradient for every voxel and then gather the information from all voxels to compute the gradient for control points. In (11), is calculated via the chain rule aswhere denotes the partial derivative of similarity metric for voxels, and it is calculated as follows:where is the moving image after the deformation, is the reference image, and denotes the initial gradient of the moving image. In (12), the second term is the derivative value of deformation field with respect to a control point, and the calculation form is described in

In (14), is the only metric related to the B-spline basis function, and it is also a constant value when the local coordinates of voxels in the mesh partition are determined. In order to simplify the parallel computing process of our algorithm, we set to 0.

#### 4. Parallel Accelerating Procedure for Our Algorithm

To effectively improve the execution efficiency of the B-spline registration, we employ the GPU to accelerate the time-consuming steps of our algorithm. In this section, we present the parallel computing details of our algorithm using the GPU. First, we explain the main procedure of our algorithm based on GPU. Secondly, we develop the parallel computing strategy of B-spline registration based on the characteristic of locality of B-spline transformation and design the LUTs for storing the reusable data for parallel kernel functions executing on the GPU. Then, we present kernel functions for these time-consuming steps. Finally, we summarize our proposed parallel algorithm based on GPU.

##### 4.1. The Main Procedure of Our Algorithm

In general, data parallelism is the main requirement for an algorithm to benefit from GPU execution. Our nonrigid registration algorithm based on B-spline FFD comprises three components, which may be considered independently: transformation of the moving image using the splines interpolation, evaluation of a similarity metric, and optimization against this similarity metric. In these three components, the optimization process always requires adequate iterations and the current iteration is always dependent to the previous iteration, so it is serial process and is not implemented in parallel on the GPU. Individually, B-splines interpolation and evaluation of a similarity metric may be formulated in a data parallel manner as they mainly consist of voxel-wise computations. In our algorithm, NCG is used for the optimization, so the first order derivative of the similarity metric is computed to determine the optimal direction of the next iteration. Since the LSD is used as the similarity metric for evaluating the registration quality in our algorithm, we also design the parallel kernel function of the gradient of similarity metric LSD in terms of the highly parallel data features. The main framework of our algorithm using GPU is shown in Algorithm 1.