About this Journal Submit a Manuscript Table of Contents
International Journal of Biomedical Imaging
Volume 2011 (2011), Article ID 680765, 16 pages
http://dx.doi.org/10.1155/2011/680765
Research Article

Numerical Solution of Diffusion Models in Biomedical Imaging on Multicore Processors

1University of Naples Federico II, Complesso Universitario M.S. Angelo, Via Cintia, 80126 Naples, Italy
2SPACI (Southern Partnership of Advanced Computing Infrastructures), c/o Complesso Universitario M.S. Angelo, Via Cintia, 80126 Naples, Italy
3University of Naples Parthenope, Centro Direzionale, Isola C4, 80143 Naples, Italy

Received 1 April 2011; Revised 16 June 2011; Accepted 24 June 2011

Academic Editor: Khaled Z. Abd-Elmoniem

Copyright © 2011 Luisa D'Amore et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

In this paper, we consider nonlinear partial differential equations (PDEs) of diffusion/advection type underlying most problems in image analysis. As case study, we address the segmentation of medical structures. We perform a comparative study of numerical algorithms arising from using the semi-implicit and the fully implicit discretization schemes. Comparison criteria take into account both the accuracy and the efficiency of the algorithms. As measure of accuracy, we consider the Hausdorff distance and the residuals of numerical solvers, while as measure of efficiency we consider convergence history, execution time, speedup, and parallel efficiency. This analysis is carried out in a multicore-based parallel computing environment.

1. Introduction

High-quality images are crucial to accurately diagnose a patient or determine treatment. In addition to requiring the best images possible, safety is a crucial consideration. Many imaging systems use X-rays to provide a view of what is beneath a patient’s skin. X-ray radiation levels must be kept at a minimum to protect both patients and staff. As a result, raw image data can be extremely noisy. In order to provide clear images, algorithms designed to reduce noise are used to process the raw data and extract the image data while eliminating the noise. In video imaging applications, data often have to be processed at rates of 30 images per second or more. Filtering noisy input data and delivering clear, high-resolution images at these rates require tremendous computing power. This gave rise to the need of developing high-end computing algorithms for image processing and analysis which are able to exploit the high performance of advanced computing machines.

In this paper, we focus on the computational kernels which arise as basic building blocks of the numerical solution of medical imaging applications described in terms of partial differential equations (PDEs) of parabolic/hyperbolic type. Such PDEs arise from the scale-space approach for description of most inverse problems in imaging [1]. One of the main reasons for using PDEs to describe image processing applications is that PDE models preserve the intrinsic locality of many image processing operations. Moreover, we can rely on standard and up-to-date literature and software about basic computational issues arising in such case (such as the construction of suitable discretization schemes, the availability of a range of algorithmic options, and the reuse of software libraries that allow the effective exploitation of high-performance computing resources). Finally, PDEs appear to be effectively implemented on advanced computing environments [2].

We consider two standard discretization schemes of nonlinear time-dependent PDEs: semi-implicit scheme and fully implicit scheme [3]. The former leads to the solution of a linear system at each time (scale) step, while the computational kernel of the fully implicit scheme is the solution of a nonlinear system, to be performed at each time (scale) step. Taking into account that we aim to solve such problems on parallel computer in a scalable way, in the first case, we use, as linear solver, Krylov iterative methods (GMRES) with algebraic multigrid preconditioners (AMG) [4, 5]. Regarding the fully implicit scheme, we use the Jacobian-Free Newton-Krylov (JFNK) method as nonlinear solver [6].

In recent years, multicore processors are becoming dominant systems in high-performance computing [7]. We provide a multicore implementation of numerical algorithms arising from using the semi-implicit and the implicit discretization schemes of nonlinear diffusion models underlying most problems in image analysis. Our implementation is based on parallel PETSc (Portable Extensible Toolkit for Scientific Computation) computing environment [8]. Parallel software uses a distributed memory model where the details of intercore communications and data managements are hidden within the PETSc parallel objects.

The paper is organized as follows. In Section 2, an overview of the PDE model equation used in describing some of inverse problems in imaging applications will be given. Then, the segmentation problem of medical structures is discussed. Numerical approach will be introduced in Section 3. Section 4 is devoted to the discussion of numerical algorithms based on semi-implicit and implicit numerical schemes. In Section 6, we describe the experiments that we carried out to show both the accuracy and the performance of these algorithms, while Section 7 concludes the work.

2. Diffusion Models Arising in Medical Imaging

The task in medical imaging is to provide in a noninvasive way information about the internal structure of the human body. The basic principle is that the patient is scanned by applying some sort of radiation and its interaction with the body is measured. This result is the data, whose origin has to be identified. Hence, we face an inverse problem. Most medical imaging problems lead to ill-posed (inverse) problems in the sense of Hadamard [911]. A standard approach for dealing with such intrinsic instability is to use additional information to construct families of approximate solution. This principle characterizes regularization methods that, starting from the milestone Tikhonov regularization [12], are now one of the most powerful tools for solution of inverse ill-posed problems. In 1992, Rudin et al. introduced the first nonquadratic regularization functional (i.e., the total variation regularization) [13] to denoise images. Moreover, the authors derive the Euler-Lagrange equations as a time-dependent PDE. In the same years Perona and Malik introduced the first nonlinear multiscale analysis [14].

Scale-space theory has been developed by the computer vision community to handle the multiscale nature of image data. A main argument behind its construction is that if no prior information is available about what are the appropriate scales for a given data set, then the only reasonable approach for a vision system is to represent the input data at multiple scales. This means that the original image should be embedded into a one-parameter family of derived images, in which fine-scale structures are successively suppressed: A crucial requirement is that structures at coarse scales in the multiscale representation should constitute simplifications of corresponding structures at finer scales—they should not be accidental phenomena created by the method for suppressing fine-scale structures. A main result is that if rather general conditions are imposed on the types of computations that are to be performed, then convolution by the Gaussian kernel and its derivatives is singled out as a canonical class of smoothing transformations [15, 16].

A strong relation between regularization approaches and the scale-space approach exists via the Euler-Lagrange equation of regularization functionals: it consists of a PDE of parabolic/hyperbolic (diffusion/advection) type [17], defined as follows.

Nonlinear Diffusion Models
Let and , defined in , be the scale-space representation of the brightness function image defined in describing the real (and unknown) object and the observed image (the input data). Let us consider the following PDE problem: is the scale (time) interval; is a nonincreasing real valued function (for ) which tends to zero as . Initial and boundary conditions will be provided according to the problem to be solved (denoising, segmentation, deblurring, registration, and so on).

Equations in (2) describe the motion of a curve (a moving front) with a speed depending on a local curvature. Such equations, known as level set equations, were first introduced in [18]. The original idea behind the level set method was a simple one. Given an interface in of codimension one (i.e., its dimension is ), bounding an (perhaps multiply connected) open region , we wish to analyze and compute its subsequent motion under a velocity field . This velocity can depend on position, time, the geometry of the interface (e.g., its normal or its mean curvature), and the external physics. The idea, as devised in 1988 by Osher and Sethian, is merely to define a smooth (at least Lipschitz continuous) function , that represents the interface as the set where . Thus, the interface is to be captured for all later time, by merely locating the set for which vanishes. The motion is analyzed by convecting the values (levels) with the velocity field . This elementary equation is Actually, only the normal component of is needed: and the motion equation becomes Taking into account that the mean curvature of is equation (5) describes the motion of under a speed proportional to its curvature (Mean Curvature Motion, MCM equation) [1820]. This basic model has received a lot of attention because of its geometrical interpretation. Indeed, the level sets of the image solution or level surfaces in 3D images move in the normal direction with a speed proportional to their mean curvature. In image processing, equations like (5) arise in nonlinear filtration, edge detection, image enhancement, and so forth, when we are dealing with geometrical features of the image-like silhouette of object corresponding to level line of image intensity function. Finally, the level set approach instead of explicitly following the moving interface itself takes the original interface and embeds it in higher dimensional scalar function , defined over the entire image domain. The interface is now represented implicitly as the zeroth level set (or contour) of this function, which varies with space and time (scale) using the partial differential equation in (2), containing terms that are either hyperbolic or parabolic. The theoretical study of the PDE was done by [21] which proved existence and uniqueness of viscosity solutions.

2.1. A Case Study: Image Segmentation

In this paper, we use equations (2) for image segmentation. The task of image segmentation is to find a collection of nonoverlapping subregions of a given image. In medical imaging, for example, one might want to segment the tumor or the white matter of a brain from a given MRI image.

The idea behind level set (also known implicit active contours, or implicit deformable models) for image segmentation is quite simple. The user specifies an initial guess for the contour, which is then moved by image-driven forces to the boundaries of the desired objects. More precisely, the input to the model is a user-defined point-of-view , centered in the object we are interested in segmenting. The output is the function . Function in (2) is the segmentation function, represents the initial contour (initial state of the segmentation function), and the image to segment is . Moreover, as proposed in [22], instead of following evolution of a particular level set of , the PDE model follows the evolution of the entire surface of under speed law dependent on the image gradient, without regard to any particular level set. Suitably chosen, this flow sharpens the surface around the edges and connects segmented boundaries across the missing information. In [22, 23], the authors formalized such model as the Riemannian mean curvature flow where the variability in the parameter also improves the segmentation process and provides a sort of regularization. Thus, (2) becomes accompanied with initial condition and zero Dirichlet boundary conditions. Regarding , it is usually defined as a circle completely enclosed inside the region that one wish to segment.

The term , called edge detector, is a nonincreasing real function such that while , and it is used for the enhancement of the edges. Indeed, it controls the speed of the diffusion/regularization: if has a small mean in a neighborhood of a point , this point is considered the interior point of a smooth region of the image and the diffusion is therefore strong. If has a large mean value on the neighborhood of , is considered an edge point and the diffusion spread is lowered, since is small for large . A popular choice in nonlinear diffusion models is the Perona and Malik function [14]: , . In many models, the function is replaced by its smoothed version , where is a smoothing kernel, for example, the Gauss function, which is used in presmoothing of image gradients by the convolution. For shortening notations, we will use abbreviation In conclusion, we use the Riemannian mean curvature flow, as model equation of the segmentation of medical structures: given , the initial image and equals to a circle contained inside an object of the image , we are interested in segmenting, we compute by solving (7). The level sets of , at steady state, provide approximations of the contour to detect.

3. Numerical Schemes

Nonlinear PDE in (7) can be expressed in a compact way as where

Scale Discretization
That is discretization with respect to . If is the scale interval and nscales is the number of scale steps, we denote by the th scale-step for all , so that , where is the step-size.

Using the Euler forward finite difference scheme to discretize the scale derivative on the left hand side of (9), we get or, equivalently Let us denote as , , the function evaluated at . Equation (12) is rewritten as

Depending on the collocation value, used to evaluate with respect to the parameter , inside the function on the right hand side of (12) three iterative schemes derive: (i)explicit scheme: , that is, the function is evaluated at ;(ii)semi-implicit scheme: , that is, we use to discretize the numerator of the fraction . Other quantities are evaluated at ;(iii)implicit scheme: , that is, the function is evaluated at .

In summary, the difference between the semi-implicit and the implicit scheme relies on the scale discretization of the term at the numerator of inside the function . This term controls the diffusion process, and it plays the role of edge-enhancement. If we consider the three-dimensional (3D) domain , the semi-implicit scheme employs a sort of 2D + 1 discretization of proceeding along nscales two dimensional (2D) slices each one obtained at , while the fully implicit scheme uses a fully 3D discretization of . This difference suggests that the fully implicit scheme may provide a more accurate edge detection than the semi-implicit scheme. This difference is highlighted by considering their discretization errors.

Space Discretization
That is discretization with respect to . If is the space domain, we introduce a rectangular uniform grid on consisting of (for simplicity we assume that is a rectangular of dimension ; this means that and ), nodes ,, , and we use finite volumes to discretize the partial derivatives of , as in [24, 25].

Scale-Space Discretization
Let be the vector obtained from the scale-space discretization of the function , we have the following iteration formulas.(i)Explicit scheme: where, for each , the matrix and is the unit matrix, while is the scale steps number.

(i)Semi-implicit scheme: where, for each , the matrix and is the unit matrix and is the scale steps number.(ii)Fully-implicit scheme: where is the scale steps number and , for each , is a nonlinear vector operator on , depending on .

In particular, we apply the Crank-Nicholson scheme [3] which uses the average of the forward Euler method at step and the backward Euler method at step : where is a matrix on , depending on , and is a nonlinear vector operator on , depending on .

4. Algorithms and Their Computational Complexity

The effectiveness of these schemes depends on a suitable balance between accuracy (scale-space discretization error), number of flop/s per iteration (algorithm complexity), and the total execution time needed to reach a prescribed accuracy (software performance).

Let us denote the discretization error . It is

Explicit scheme is accurate at the first order both with respect to scale and space, that is, ; anyway, it is the one straightforwardly computable. The computational kernel is a matrix-vector product, at every scale step. This scheme requires very small time steps in order to be stable (CFL (Courant-Friedrich-Levy) condition that guarantees the stability of the evolution), and its use is limited rather by its stability than accuracy. This constraint is practically very restrictive, since it typically leads to the need for a huge amount of iterations [3]. Semi-implicit scheme is absolutely stable for all scale steps. The accuracy, in terms of discretization error with respect to both scale and space, is of the first order, because , [24, 25]. Crank-Nicholson provides a discretization error of second order, that is, , but it requires extra computations, leading to a nonlinear system of equations, at every time step, while stability is ensured for all scale steps [3]. In the following, we collect these results:

Then, the fully implicit scheme provides an order of accuracy greater than that provided by the others. This difference may be important in those applications of image analysis where the edges are fundamental to recognize some pathologies.

Algorithm complexity of these schemes depends on the choice of the numerical solver. Concerning the semi-implicit scheme, we employ Krylov subspace methods, which are the most effective approaches for solving large linear systems [5]. In particular, we use Generalized Minimal RESidual method (GMRES) equipped with Algebraic multigrid (AMG) preconditioner. Such techniques are convenient because they require as input only the system matrix corresponding to the finest grid. In addition, they are suitable to implement in a parallel computing environment. For the fully implicit scheme we use the Jacobian Free Newton Krylov Method (JFNK) [6]. JFNK methods are synergistic combinations of Newton-type methods for superlinearly convergent solution of nonlinear equations and Krylov subspace methods for solving the Newton correction equations. The link between the two methods is the Jacobian-vector product, which may be probed approximately without forming and storing the elements of the true Jacobian.

Let us briefly describe the numerical algorithms that we are going to implement, which are based on the semi-implicit and the implicit discretization schemes, together with their complexity.

Algorithm SI (Semi-Implicit Scheme)
For all solution of with respect to . , for each , is a matrix . As space derivative we use the 2nd order finite covolume discretization scheme (see [24, 25] for convergence, consistence and stability). By this way, matrix is a block pentadiagonal matrix with tridiagonal blocks along the main diagonal and diagonal blocks along the upper and lower diagonals.

Algorithm I (Implicit Scheme)
For all solution of with respect to . , for each , is a nonlinear vector operator on .

Algorithm SI
For each scale step, to solve the linear system (21), we employ GMRES iterative method (see Algorithm 1). Computational kernel of GMRES is a matrix-vector product. Taking into account the structure of the coefficient matrix (we assume that , then ), the computational cost of GMRES is where is the maximum iterations of GMRES (over the scale steps). Computational complexity of Algorithm SI is

alg1
Algorithm 1: Preconditioned GMRES for solving a linear system . Input: (matrix coefficient), (right hand side), (preconditioner). Output: , approximate solution at the th step. For all iteration, a matrix-vector product is required.

Algebraic multigrid (AMG) method follows the main idea of (geometric) multigrid (MG), where a sequence of grids is constructed from the underlying geometry with corresponding transfer operators between the grids [26]. The main idea of MG is to remove the smooth error, that cannot be eliminated by relaxation on the fine grid, by coarse-grid correction. The solution process then as usual consists of presmoothing, transfer of residuals from fine to coarse grids, interpolation of corrections from coarse to fine levels, and optional postsmoothing. In contrast to geometric multigrid, the idea of AMG is to define an artificial sequence of systems of equations decreasing in size. We call these equations coarse-grid equations. The interpolation operator and the restriction operator define the transfer from finer to coarser grids and vice versa. Finally, the operator on the coarser grid at level is defined by

The AMG method consists of two main parts, the setup phase and the solution phase. During the setup phase, the coarse-grids and the corresponding operators are defined. The solution phase consists of a multilevel iteration. The number of recursive calls, which is the number of levels , depends on the size and structure of the matrix. For our case, we use the V-cycle pattern with the FALGOUT-CLJP coarse grid selection [27]. Looking at the Algorithm SI in Table 1, the preconditioner is just .

tab1
Table 1: Comparisons between two algorithms: .

Computational cost of each iteration of GMRES is that of the AMG preconditioner plus the matrix-vector products: then, we get:

Following picture shows a schematic description of Algorithm SI that emphasizes its main steps and the most time consuming operation, that is, the matrix vector products needed at each step of GMRES.

Algorithm I
For each scale step, to solve the nonlinear equations (22), we employ the Jacobian-Free Newton-Krylov (JFNK) method. JFNK is a nested iteration method consisting of at least two and usually four levels. The primary levels, which give the method its name, are the loop over the Newton method: and the loop building up the Krylov subspace out of which each Newton step is drawn: Outside of the Newton loop, a globalization method is often required. We use line search.

Forming each element of which is a matrix of dimension requires taking derivatives of the system of equations with respect to . This can be time consuming. Using the first order Taylor series expansion of , it follows that where is a perturbation. JFNK does not require the formation of the Jacobian matrix; it instead forms a result vector that approximates this matrix multiplied by a vector. This Jacobian-free approach has the advantage to provide the quadratic convergence of Newton method without the costs of computing and storing the Jacobian. Conditions are provided on the size of that guarantee local convergence.

Algorithm 3 shows a schematic description of Algoritm I that emphasizes its main steps and the most time consuming operation, that is, evaluations of the nonlinear operator at each innermost step.

Algorithm complexity of JFNK is where is the maximum number of Newton’s steps, over the scale steps, is the maximum number of GMRES iterations (computed over Newton’s steps and scale steps)), is the number of evaluations of . Finally, we get

A straightforward comparison between the algorithm complexity of these algorithms shows that Algorithm SI asymptotically seems to be comparable with respect to Algorithm SI. Of course, the performance analysis must also take into account the efficiency of these two schemes in a given computing environment. Next section describes the PETSc-based implementation of these algorithms that we have developed in a multicore computing environment.

5. The Multicore-Based Implementation

The software has been developed using the high-level software library PETSc (Portable Extensible Toolkit for Scientific Computations) (release 3.1, March 2010) [8]. PETSc provides a suite of data structures and routines as building blocks for the implementation of large-scale codes to be used in scientific applications modeled by partial differential equation. PETSc is flexible: its modules, that can be used in application codes written in Fortran, C, and C++, are developed by means of object-oriented programming techniques.

The library has a hierarchical structure: it relies on standard basic computational (BLAS, LAPACK) and communication (MPI) kernels and provides mechanism needed to write parallel application codes. PETSc transparently handles the moving of data between processes without requiring the user to call any data transfer function. This includes handling parallel data layouts, communicating ghost points, gather, scatter and broadcast operations. Such operations are optimized to minimize synchronization overheads.

Our parallelization strategy is based on domain decomposition: in particular, we adopt the row-block data distribution, which is the standard PETSc data distribution. Row-block data distribution means that blocks of dimensions of contiguous rows of matrices of dimension are distributed among contiguous processes. By the same way, vectors of size are distributed among processors as blocks of size . Such partitioning has been chosen because overheads, due to redistribution before the solution of the linear systems, are avoided. Further, row-block data distribution introduces a coarse grain parallelism which is best oriented to exploit concurrency of multicore multiprocessors because it does not require a strong cooperation among computing elements: each computing element has to locally manage the blocks that are assigned to it.

The computing platform that we consider is made of 16 blades (1 blade consisting of 2 quad core Intel Xeon E5410@2.33 GHz) Dell PowerEdge M600, equipped with IEEE double precision arithmetic. Because high performance technologies can be employed in medical applications only to the extent that the overall cost of the infrastructure is affordable, and because we consider single images of medium size, we show results obtained by using 1 blade, that is, we run the parallel algorithms on up to cores of a single blade. Of course, in case of multiple images or sequences of images, the use of a greater number of cores may be interesting.

6. Experiments

In this section we present and discuss computational results obtained by implementing Algorithm I and Algorithm SI in a multicore parallel computing machine. Before illustrating experimental results, let us briefly describe the choice of(1)test images, (2)comparison criteria,(3)parameters selection.

(1) Test Images
we have carried out many experiments in order to analyze the performance of these algorithms. Here we show results concerning the segmentation of a (malignant) melanoma (see Figure 7) [28]. Epiluminescence microscopy (ELM) has proven to be an important tool in the early recognition of malignant melanoma [29, 30]. In ELM, halogen light is projected onto the object, thus rendering the surface translucent and making subsurface structures visible. As an initial step, the mask of the skin lesion is determined by a segmentation algorithm. Then, a set of features containing shape and radiometric features as well as local and global parameters is calculated to describe the malignancy of the lesion. In order to better validate computed results and to analyze the software performance, we first consider a synthetic test image simulating the object we are interested in segmenting (see Figure 1).

680765.fig.001
Figure 1: Test 1. Image size: , simulated image and its contour to compute.

(2) Comparison Criteria
We compare the algorithms using the following criteria:(a)distance from original solution. As measure of the difference between two curves, we use the Hausdorff distance measured between the computed curve and the original one. It is well known that the Hausdorff distance is a metric over the set of all closed bounded sets (see [31]), here we restrict ourselves to finite point sets because that is all that is necessary for segmentation algorithms [32]. Given two finite point sets and , the Hausdorff distance between the sets and is defined as follows: where is the euclidean norm. It identifies the point that is fastest from any point of and vice versa, then it keeps the maximum,(b) efficiency: execution time of (serial) algorithms,(c)convergence history: behavior of residuals and iteration numbers of inner solvers,(d)Parallel performance: execution time, speedup, and efficiency versus cores number.

(3) Parameters Selection
We set and . Let us explain how we select the values of the scale step size and the number of scale steps. Regarding , its value is chosen according to that required to . Taking into account that, in Algorithm SI, is accurate at the first order with respect to , while it is accurate at the second order with respect to , in Algorithm I, by requiring that the discretization error is about the same, we get

Finally, in Algorithm SI, the stopping criterion of the linear solver (GMRES) uses the tolerance while the preconditioner AMG uses the tolerance and 25 as maximum number of AMG-levels. In the Algorithm 2, the stopping criterion of the nonlinear solver uses the tolerance

alg2
Algorithm 2: Sketch of Algorithm SI.

alg3
Algorithm 3: Sketch of Algorithm I.

Regarding the number of scale steps ( and ), taking into account that its choice depends on and on the value of , that is, the value of the scale parameter corresponding to steady state of the segmentation function , solution of the PDE model. To check the steady state, we require that the residuals, corresponding to different scale steps, reach the tolerance

We found that this corresponds to (see Figure 2) for Test 1 and to for Test 2.

680765.fig.002
Figure 2: Test 1: The 3D visualization of the segmentation function at steady state .

Test 1: Synthetic Image
In Tables 1, 2, and 3, we show the Hausdorff distance and execution time by requiring that discretization error is of the first, second, and third order, that is, ), ), and ), respectively. Hence, we get the following values of the scale step size:

tab2
Table 2: Comparisons between two algorithms: .
tab3
Table 3: Comparisons between two algorithms: .

Moreover, concerning the number of scale steps, it follows that

Note that while in the first case, that is, if we require , these two algorithms are quite numerically equivalent, both in terms of execution time and of the computed result; as discretization error decreases, Algorithm I appears to be more robust than Algorithm SI, in the sense that Algorithm I reaches the steady state with high accuracy (the Hausdorff distance is of 95%), while the computed results of Algorithm SI are less accurate, even though the execution time of Algorithm I sometimes slightly increases. These results suggest that if it needs to get an accurate and reliable result, Algorithm I should be preferable. Figure 3 show segmentation results. Finally, note that the execution time of these two algorithms asymptotically is the same, as stated by the analysis of computational cost carried on in Section 4.

680765.fig.003
Figure 3: Test 1: Comparisons between segmentation results. On the left Algorithm I. On the right Algorithm SI. First row: , . Second row: , . Third row: ,  .

Convergence History
Convergence history is illustrated by showing the behavior of relative residuals versus the scale steps (see Figures 4, 5, and 6), and by reporting iteration number of the GMRES and of Newton’s method, respectively, (see Tables 4, 5, 6, and 7). We consider , 0.016, and .

tab4
Table 4: Convergence history of Algorithm SI. . First column reports the scale step number, second column reports the number of GMRES iterations at each scale step, and last column reports the maximum number of AMG levels at each GMRES iteration.
tab5
Table 5: Convergence history of Algorithm SI. . On the first column, we denote the scale step number, and the symbol is used to denote the steps ranging from the -th until to the -th. Second column reports the number of GMRES iterations at each scale step, third column reports the maximum number of AMG levels at each GMRES iteration.
tab6
Table 6: Convergence history of Algorithm SI. . First column reports the scale step number, second column reports the number of Newton's steps at each scale step, and last column reports the number of GMRES iterations at each Newton's step.
tab7
Table 7: Convergence history of Algorithm SI. . First column reports the scale step number, second column reports the number of Newton’s steps at each scale step, and last column reports the number of GMRES iterations at each Newton's step.
680765.fig.004
Figure 4: Algorithm SI. Behavior of the relative residual versus 3 scale steps. .
680765.fig.005
Figure 5: Algorithm SI. Behavior of the relative residual versus 25 scale steps. .
680765.fig.006
Figure 6: Algorithm I. Behavior of the relative residual versus 10 scale steps. .
680765.fig.007
Figure 7: Test 2: ELM of a melanoma. Image size is .

In the following, we show results corresponding to the segmentation of a melanoma (see Figure 7). As expected, because this is a real image, the steady state is reached at a scale greater than that of the synthetic test image, that is, , thus both algorithms require a greater number of scale steps to the reach the steady state. Tables 8, 9, and 10, and Figure 8 compare results of Algorithm SI and Algorithm I.

tab8
Table 8: Test 2: Comparisons between two algorithms: .
tab9
Table 9: Test 2: Comparisons between two algorithms: .
tab10
Table 10: Test 2: Comparisons between two algorithms: .
680765.fig.008
Figure 8: Test 2: Comparisons between segmentation results. On the left Algorithm I. On the right Algorithm SI. First row:, , . Second row: , . Third row: ,.

Parallel Performance
We show the performance of the multicore-based parallel algorithms and their scalability as the number of cores increases. We run the parallel algorithms using up to cores of the parallel machine.

Following Figures report execution time, speedup, and efficiency of Algorithm SI, at scale step size (i.e., (i.e., Figures 9, 10 and 11), then same results are shown at scale step size , corresponding to (i.e., Figures 12, 13 and 14).

680765.fig.009
Figure 9: Test 1: Algorithm SI: Total execution time versus the number of cores. . 3 scale steps.
680765.fig.0010
Figure 10: Test 1: Algorithm SI: Speedup versus the number of cores. . 3 scale steps.
680765.fig.0011
Figure 11: Test 1: Algorithm SI: Parallel efficiency versus the number of cores. . 3 scale steps.
680765.fig.0012
Figure 12: Test 1: Algorithm SI: Total execution time versus the number of cores. .
680765.fig.0013
Figure 13: Test 1: Algorithm SI: Speedup versus the number of cores. .
680765.fig.0014
Figure 14: Test 1: Algorithm SI: Parallel efficiency versus the number of cores. .

Finally, we report execution time, speedup, and efficiency of Algorithm I, at scale step (i.e., Figures 15, 16 and 17) and (i.e., Figures 18, 19, and 20), respectively. Note that parallel efficiency of both algorithms always is, at least, of 60% and, on average, of about . In particular, parallel efficiency of Algorithm I is about 90%. Execution time of both algorithms reduces to about 2 seconds on eight cores in the first case (i.e., , and to about 10 seconds in the second case (). This means that, in a multicore computing environment, both algorithms provide the requested solution within a response time that can be considered quite acceptable in medical imaging applications and, in particular, that Algorithm I is competitive with Algorithm SI.

680765.fig.0015
Figure 15: Test 1: Algorithm I: Total execution time versus the number of cores. .
680765.fig.0016
Figure 16: Test 1: Algorithm I: Speedup. .
680765.fig.0017
Figure 17: Test 1: Algorithm I: Efficiency. .
680765.fig.0018
Figure 18: Test 1: Algorithm I: Total execution time versus the number of cores. .
680765.fig.0019
Figure 19: Test 1: Algorithm I: Speedup. .
680765.fig.0020
Figure 20: Test 1: Algorithm I: Efficiency. .

Figures 21, 22, 23, 24, 25, and 26 show time, speedup, and parallel efficiency of two algorithms in case of Test 2. We only consider the case of and .

680765.fig.0021
Figure 21: Test 2: Algorithm SI: Total execution time versus the number of cores. .
680765.fig.0022
Figure 22: Test 2: Algorithm SI: Speedup. .
680765.fig.0023
Figure 23: Test 2: Algorithm SI: Efficiency. .
680765.fig.0024
Figure 24: Test 2: Algorithm I: Total execution time versus the number of cores. .
680765.fig.0025
Figure 25: Test 2: Algorithm I. Speed up. .
680765.fig.0026
Figure 26: Test 2: Algorithm I. Efficiency. .

Finally, we show results on scalability of parallel algorithms. Let be the execution time of the parallel algorithm running on cores for segmenting an image of size . We measure the scalability of these algorithms by measuring as varies, once is fixed, and by measuring as and grow. We note that, in case of Algorithm SI, the scaling factor is while, for Algorithm I, we get as scaling factor

This means that Algorithm I scales better than Algorithm SI. In Figures 27 and 28 (for Algorithm SI) and Figure 29 and 30 (for Algorithm SI), we report as and varies. In particular, each point of the graph refers to the execution time of the parallel algorithm at , where , , and .

680765.fig.0027
Figure 27: Scalability of parallel Algorithm SI, as is fixed and varies, and . Each line refers to the execution time of the algorithm at a fixed value of .
680765.fig.0028
Figure 28: Scalability of parallel Algorithm SI, as and vary. Each point of the graph refers to the execution time of the parallel semi-implicit algorithm at , , where , , and .
680765.fig.0029
Figure 29: Scalability of parallel Algorithm I, as is fixed and varies, and . Each line refers to the execution time of the algorithm at a fixed value of . .
680765.fig.0030
Figure 30: Scalability of parallel Algorithm I, as and vary. Each point of the graph refers to the execution time of the parallel implicit algorithm at , , where , and . .

7. Conclusions

A straightforward comparison between the semi-implicit and the fully implicit discretization schemes of nonlinear PDE of parabolic/hyperbolic type states that fully implicit discretization usually leads to too expensive algorithms. In this paper, we provide a multicore implementation of two numerical algorithms arising from using these two discretization schemes: semi-implicit (Algorithm SI) and fully implicit (Algorithm I). Taking into account that we aim to solve such problems on parallel computer in a scalable way, in the first case, we use, as linear solver, Krylov iterative methods (GMRES) with algebraic multigrid preconditioners (AMG). Regarding the fully implicit scheme, we use the Jacobian-Free-Newton Krylov (JFNK) method as nonlinear solver.

We compare these two algorithms using different metrics measuring both the accuracy and the efficiency. We note that if we require that the discretization error is , these two algorithms are quite numerically equivalent, both in terms of execution time and of the computed result; while, as discretization error decreases, Algorithm I appears to be more robust than Algorithm SI, in the sense that Algorithm I reaches the steady state with high accuracy (the Hausdorff distance is of 95%), while the computed results of Algorithm SI are less accurate, even though the execution time of Algorithm I sometimes slightly increases. These results suggest that if it needs to get accurate and reliable results, Algorithm I should be preferable.

The parallel efficiency of both algorithms always is, at least, of and, on average, of about . In particular, parallel efficiency of Algorithm I is of about . Execution time of both algorithms reduces to about 2 seconds on eight cores if and to about 10 seconds if . This means that, in a multicore computing environment, Algorithm I is competitive with Algorithm SI.

In conclusion, our results suggest that if it is required high accuracy of the computed solution in a suitable turnaround time, using a multicore computing environment fully implicit scheme provides an accurate and reliable solution within a response time of few seconds, quite acceptable in medical imaging applications, such as computer-aided-diagnosis.

References

  1. G. Aubert and P. Kornprobst, Mathematical Problems in Image Processing: Partial Differential Equations and the Calculus of Variations, vol. 147 of Applied Mathematical Sciences, Springer, 2nd edition, 2006.
  2. http://www.cs.purdue.edu/research/cse/pses.
  3. J. W. Thomas, Numerical Partial Differential Equations, vol. 22 of Text in Applied Mathematics, Springer, 1995.
  4. Y. Saad and M. Schultz, “GMRES: a generelized minimal residual algorithm for solving nonsymmetric linear systems,” SIAM Journal on Scientific Computing, vol. 7, pp. 856–869, 1986.
  5. Y. Saad, Iterative Methods for Sparse Linear Systems, SIAM, 2nd edition, 1993.
  6. D. A. Knoll and D. E. Keyes, “Jacobian-free Newton-Krylov methods: a survey of approaches and applications,” Journal of Computational Physics, vol. 193, no. 2, pp. 357–397, 2004. View at Publisher · View at Google Scholar · View at MathSciNet
  7. J. Dongarra, D. Gannon, G. Fox, and K. Kennedy, “The impact of multicore on computational science software,” CTWatch Quarterly, vol. 3, no. 1, 2007.
  8. http://www.mcs.anl.gov/petsc/petsc-as/index.html.
  9. M. Bertero and P. Boccacci, Introduction to Inverse Problems in Imaging, IOP Publishers, Bristol, UK, 1998.
  10. A. K. Louis, “Medical imaging: state of the art and future development,” Inverse Problems, vol. 8, no. 5, pp. 709–738, 1992. View at Publisher · View at Google Scholar
  11. W. Rundell and H. Eng, Inverse Problems in Medical Imaging & Nondesctructive Testing, Springer, 1997.
  12. A. N. Tikhonov and V. Y. Arsenin, Solutions of Ill-Posed Problems, John Wiley & Sons, New York, NY, USA, 1977.
  13. L. Rudin, S. Osher, and E. Fatemi, “Nonlinear total variation based noise removal algorithms,” Physica D, vol. 60, no. 1–4, pp. 259–268, 1992.
  14. P. Perona and J. Malik, “Scale-space and edge detection using anisotropic diffusion,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 12, no. 7, pp. 629–639, 1990. View at Publisher · View at Google Scholar
  15. J. J. Koenderink, “The structure of images,” Biological Cybernetics, vol. 50, no. 5, pp. 363–370, 1984.
  16. T. Lindeberg , Scale-Space Theory in Computer Vision, Springer, 1994.
  17. O. Scherzer and J. Weickert, “Relations between regularization and diffusion filtering,” Journal of Mathematical Imaging and Vision, vol. 12, no. 1, pp. 43–63, 2000. View at Publisher · View at Google Scholar · View at MathSciNet
  18. S. Osher and J. A. Sethian, “Fronts propagating with curvature-dependent speed: algorithms based on hamilton-jacobi formulations,” Journal of Computational Physics, vol. 79, no. 1, pp. 12–49, 1988.
  19. J. A. Sethian, Level Set Methods and Fast Marching Methods: Evolving Interfaces in Computational Geometry, Fluid Mechanics, Computer Vision and Materials Science, Cambridge University Press, Cambridge, UK, 1999.
  20. S. Osher and R. Fedkiw, Level Set Methods and Dynamic Implicit Surfaces, Springer, 2003.
  21. L. C. Evans and J. Spruck, “Motion of level sets by mean curvature I,” Transaction of the American Mathematical Society, vol. 330, no. 1, 1992.
  22. A. Sarti and G. Citti, “Subjective surface and Riemannian mean curvature flow graphs,” Acta Mathematica Universitatis Comenianae, vol. 70, no. 1, pp. 85–104, 2001.
  23. R. Malladi and J. A. Sethian, “Level set methods for curvature flow, image enchancement, and shape recovery in medical images,” in Visualization and Mathematics, H. C. Hege and K. Polthier, Eds., pp. 329–345, Springer, Heidelberg, Germany, 1997.
  24. N. J. Walkington, “Algorithms for computing motion by mean curvature,” SIAM Journal on Numerical Analysis, vol. 33, no. 6, pp. 2215–2238, 1996.
  25. A. Handlovivcov, K. Mikula, and F. Sgallari, “Semi-implicit complementary volume scheme for solving level set like equations in image processing and curve evolution,” Numerishe Mathematik, vol. 93, no. 4, pp. 675–695, 2003.
  26. J. W. Ruge and K. Stüuben, “Algebraic multigrid methods (AMG) applied to fluid flow problems,” Frontiers in Applied Mathematics, 1986.
  27. V. E. Henson and U. M. Yang, “BoomerAMG: a parallel algebraic multigrid solver and preconditioner,” Applied Numerical Mathematics, vol. 41, no. 1, pp. 155–177, 2002. View at Publisher · View at Google Scholar · View at MathSciNet
  28. H. Ganster, A. Pinz, R. Röhrer, E. Wildling, M. Binder, and H. Kittler, “Automated melanoma recognition,” IEEE Transactions on Medical Imaging, vol. 20, no. 3, pp. 233–239, 2001. View at Publisher · View at Google Scholar · View at PubMed
  29. H. Pehamberger, A. Steiner, and K. Wolff, “In vivo epiluminescence microscopy of pigmented skin lesions. I. Pattern analysis of pigmented skin lesions,” Journal of the American Academy of Dermatology, vol. 17, no. 4, pp. 571–583, 1987.
  30. F. Nachbar, W. Stolz, T. Merkle, et al., “The ABCD rule of dermatoscopy: high prospective value in the diagnosis of doubtful melanocytic skin lesions,” Journal of the American Academy of Dermatology, vol. 30, no. 4, pp. 551–559, 1994.
  31. A. Csazar, General Topology, Adam HIlger, Bristol, UK, 1978.
  32. D. P. Huttenlocher, G. A. Klanderman, and W. J. Rucklidge, “Comparing images using the hausdorff distance,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 15, no. 9, pp. 850–863, 1993. View at Publisher · View at Google Scholar