Abstract

3D surfaces are important geometric models for many objects of interest in image analysis and Computational Anatomy. In this paper, we describe a Bayesian inference scheme for estimating a template surface from a set of observed surface data. In order to achieve this, we use the geodesic shooting approach to construct a statistical model for the generation and the observations of random surfaces. We develop a mode approximation EM algorithm to infer the maximum a posteriori estimation of initial momentum , which determines the template surface. Experimental results of caudate, thalamus, and hippocampus data are presented.

1. Introduction

3D surfaces are important geometric models for many objects of interest in image analysis and Computational Anatomy. For example, they are often used to represent the shape of 3D objects, the surface of human faces, and the boundaries of brain structures or of other human organs. Most data analysis methods in this domain are template-centered, and a proper estimation of a template plays an important role to obtain high quality results. This paper is devoted to the description of statistically supported template estimation method which is adapted to surface data sets.

Our approach will be to build a generative statistical shape model in which the template is a parameter. We will then estimate it using maximum likelihood. This model relies on the very natural setting for which an observed surface is a noisy version of a random deformation of the template. This is the most generic and most basic approach of the deformable template paradigm, even if we add a small refinement by including a prior distribution on the template, based on what we will call a hypertemplate. Even with this global scheme which is fairly simple, we will see that implementing it in the context of surfaces will constitute a significant theoretical and numerical challenge.

At the exception of the recent work of [1], this approach significantly differs from what has been mostly proposed in the literature, in which most of the methods compute templates as averages over specific common parametrizations of the surfaces (using, for example, the sphere as a parameter space [2]). Parametric representations, however, are limited by the fact that, because they are defined a priori and independently for each object, they cannot be assumed to suitably align important features in a given data set of surfaces (i.e., give similar coordinates to similar features in the surfaces). This usually results in oversmoothed template surfaces (which is the equivalent of getting blurry template images in the case of image averaging). In [1], a similar diffeomorphic transformation model is used, but, as we will see, our Bayesian construction will provide a well-specified template whereas [1] needs to rely to topologically unconstrained approximations to end up with a manageable template.

In addition to the references above, there have been several publications addressing the issue of shape averaging over a dataset, although most of them work with 3D volume data or landmark points set. In several cases, the average is based on metric properties of the space of shapes [37], and the template is computed as an intrinsic average, minimizing the sum of square distances to each element of the set (Fréchet mean). Such methods have been implemented in the context of diffeomorphic deformation models (which are also models of interest in the present paper) for landmark matching [8], for 3D average digital atlas construction [9], and to quantify variability in heart geometry [10]. Other definitions of the average, adapted to situations in which the data is corrupted by noise, have been proposed [1113], based on variational approaches (but not relying on a generative statistical model).

Our approach to build a generative statistical shape model is reminiscent of the construction developed in [14] for linear models of deformations, and in [15] for large diffeomorphic deformations in 3D volume image averaging. Adapting these ideas to surfaces will however require new algorithms and numerical developments.

In order to present our model, we need to first provide some background materials and notation, describing in particular the geodesic shooting equations that we will use to generate deformed surfaces. We will then introduce a random statistical model describing the generation and the observations of random surfaces. We then develop a Mode Approximation EM algorithm for surface averaging, to estimate the template from observations. In the optimization part, we derive and implement a new variational scheme, which is also applicable to surface matching, providing an alternative approach to the one originally proposed in [16, 17]. Finally, we present and discuss experimental results on caudate, thalamus, and hippocampus data.

2. EPDiff for Surface Evolution

We will base our random shape model on the so-called EPDiff equation, which describes the evolution of deformable structures (like images, surfaces, or landmarks) under the action of groups of diffeomorphisms. It is a geodesic equation for a Riemannian metric on diffeomorphisms, and describes a momentum conservation law in the associated Hamiltonian system. The reader interested by the theory behind this equation can refer to [1820], but most of this background will not be needed for the present paper, in which we will only use the specific form of the equations for surface evolution. The term EPDiff comes from its determination as an Euler-Poincaré equation in the group of diffeomorphisms, as introduced in [21]. One of its main interests here is that it provides a numerically stable, easily described, Hamiltonian evolution over diffeomorphisms, which will constitute an ideal support for our shape models.

The EPDiff equations describe the combined time evolution of a diffeomorphism, denoted and of what can be interpreted as a momentum, denoted . The initial conditions are always for , and some initial value, , for . This initial momentum will be a key component of the statistical model that will be built later on.

Let us start with the simplest form of the equation, which assumes that is a vector-valued function over , that is, . It involves a smoothing kernel, , defined on , a typical choice being Letting denote the gradient of with respect to its first variable, the corresponding EPDiff equation is Here, the notation refers to the usual dot product between vectors in . If is smooth enough, this system has solutions for arbitrary large , and the mapping is a diffeomorphism at all times.

The interesting fact about these equations is that they can have singular variants that are described in a similar way and have the same existence properties. The simplest way to relate the variants to the previous equation is to replace the Lebesgue's measure in the integrals by another, possibly singular, measure. For example, taking a surface in , we can use the volume form on as a reference measure and obtain the equations (in which and need only to be defined on ) where is the volume form on . Note that the first equation is defined for , but it suffices to use it for to obtain an equation for the evolving surface

We can write a discrete form of the equations by replacing by a sum of Dirac measures (at points in ), which gives, letting , Similarly to (3), the first equation is valid for all , but it suffices to solve it for , to obtain the evolution of the point set Also, (5) can be seen as a discretization of (3) in which are the vertices of a triangulation of , and , where is the area of a surface element around .

The evolution of point sets is the most important from a practical point of view, since it is an ODE that can be easily implemented. Assuming a radial kernel like in (1), and denoting , and , (5) can be rewritten as Once the initial position of the vertices, , and the initial momentum, , are provided, the evolution of the point set is uniquely determined.

3. Generative Model for Surface Observation

3.1. Random Triangulated Surfaces

If a triangulated template surface with vertices is given, and we solve, until time , (5) initialized with and a random initial momentum , the displaced vertices provide a random perturbation of the initial surface that will be denoted by . This is stated in the following definition.

Definition 1. Let be a triangulated surface with vertices . Let be a collection of vectors in . Let be the solution of (5) with initial condition and . One defines to be the triangulated surface with vertices and the same topology as .

By letting be random, we build as a random deformation of . This will form the “ideal”, unobserved, surface, of which only a noisy version is observable (the noise process will be described in the next section).

Following [15], we will use a Bayesian formulation in which is itself represented as a random deformation , where is a fixed surface that we will call the hypertemplate, and is a prior initial momentum shooting from to (same notation as in Definition 1). One of the main interests of using a hypertemplate is to fix the topology of so that it belongs, by construction, to the same class of objects as .

So, if surfaces are observed, we need to model the probability distribution of the prior momentum, (starting at ), which specifies and of deformation momenta which specify the surfaces . We now provide a statistical model for the joint probability distribution of .

We first introduce some notation. Letting be the kernel introduced in the previous section to define the geodesic shooting equations, we let be the by matrix formed with the 3 by 3 blocks . We define, for a triangulated surface with vertices , and ,

We define the joint distribution of on by where is a fixed parameter regulating the weight on the hypertemplate.

There is a technical difficulty here, which is that one must make sure that this probability can be normalized ( exists), which requires that the exponential is integrable. That this is true is not straightforward, and we have not been able to find a proof that works with any choice of the kernel . One way to deal with this is to introduce a constant (which can be chosen arbitrarily large so that it does not interfere with the algorithms that will follow), and add to the model the constraint that is smaller than . Under such an assumption, one obtains (after integrating out the 's) This is finite, since, for any given , the transformation is the restriction of a diffeomorphism to the vertices of (as seen from (5)). This implies that is nonsingular, and its determinant is bounded away from 0 when is restricted to a compact space.

In fact, the choice can be proved to be acceptable for a large class of kernels. Those are kernels for which the smallest eigenvalue of   decreases at a speed which is at most polynomial in the minimal distance, , between the vertices. A list of kernels satisfying this property can be found in [22]. For such kernels, we find that ( being fixed) . Just sketching the argument here, one can prove, using elementary properties of dynamical systems, that for some constant , so that the log determinant in (9) is linear in and is well defined, even with . For very smooth kernels, including the Gaussian, bounds on the smallest eigenvalue of are much worse (with a decay which is exponential in ), and the previous argument does not work. Since the bounds in [22] hold uniformly with respect to the number of points, a polynomial bound may still be valid for a fixed , although we were unable to discover it.

Notice that, conditionally to the template, the momenta are independent and follow a Gaussian distribution with inverse covariance matrix given by . An example of simulated random deformations obtained using such a model is provided in Figure 1.

3.2. Observation and Noise

The second part of our generative model is to describe the observation process, which takes an ideal surface generated according to the model above, and returns the noisy observable.

Modeling such a noise process is a tricky issue. Obvious choices (like adding noise to the vertices of ) do not work because one cannot assume that the observed surfaces are discretized consistently with the template. In this paper, we will work around this issue by assimilating the observation of a surface that of a singular Gaussian process.

For this, we consider that surfaces in are not observable directly, but only via their action on test functions, that we will call sensors. We define a sensor to be a smooth vector field over (typically with small support). Given an oriented surface , define where is the normal to . The real number, is the measurement of through the sensor .

Now, modeling noisy surfaces will result in assuming that, given any , the measurement is a random variable. We will assume that it is Gaussian, and more generally, that, given sensors, , the random vector is Gaussian.

, via its action on sensors, is therefore modeled as a Gaussian random field. Given an ideal surface , we will assume that its mean is given by and the process is thereafter uniquely characterized by its covariance operator We will assume that this covariance is associated to a symmetric operator so that (The apparently redundant parameter , which could have been included in , appears here because it can be easily estimated by the algorithm, with the operator remaining constant.)

To finalize our model, it remains to describe how is discretized, that is, to make explicit a finite family of sensors through which is measured. Let form a regular grid of points in . Let be a radial basis function (a Gaussian, e.g.,) and define, for and where is the th vector of the standard basis of (this therefore specifies sensors). The resulting observed variables are where is the th coordinate of . These variables are, by assumption, jointly Gaussian, with means and covariance matrix Assuming that is translation invariant, the resulting expression is a function of that we will denote

Let be the inverse matrix of the one with coefficients . The log likelihood of the process will include error terms taking the form Replacing by its expression in (16), we have Let us analyze the first term. We have with the notation

Treating the other terms similarly, we can rewrite the error term in the form and we will abbreviate this (introducing a notation for the right-hand side) as . Thus, we can write

We have the following important proposition.

Proposition 1. Assume that is taken equal to the Green function of . Then, when the grid becomes finer and is given by (22), one has

See the appendix for a proof of this proposition (which requires some background on the theory of Hilbert spaces with a reproducing kernel) and possible extensions. For practical purposes, we will use instead of in , therefore assuming that the sensors are chosen according to the proposition. It is interesting to notice that the resulting norm in this case is precisely the norm that has been introduced in [16] to compare surfaces, when they are considered as elements of a reproducing kernel Hilbert space of currents. We refer to [16] for details on the mathematical construction.

In the rest of the paper, we develop a parametric procedure to estimate the template from the observation of i.i.d. surfaces generated as described above. This includes in particular hidden deformation momenta , such that the complete distribution of observed and unobserved variables is where we have written for short .

We now discuss the estimation of the parameter , and of the associated template , which is the main purpose of this paper. This will be implemented with a mode approximation of the EM algorithm, as described in the next section.

4. Algorithms for Surface Template Estimation

4.1. Mode Approximation EM

Let be the hidden part of the process, representing the collection of initial momenta, and let be the collection of observed surfaces. The complete distribution for the process, including the prior is given by (26).

The EM algorithm is an iterative method that updates a current estimation of using the following two - and -steps.

-step: determine the conditional expectation .

-step: maximize this expression with respect to and replace the current estimation of by the obtained maximizer.

The conditional expectation can be expanded as

Considering the highly nonlinear relation between and the deformed surface , an explicit computation in (27) is impossible. We will therefore rely on the classic mode approximation in the EM which replaces the conditional distribution by a Dirac measure taken at the conditional mode, yielding where

This results in a maximum a posteriori procedure that maximizes alternatively in and in the 's. Like in [15], we will refer to it as a mode approximation to the EM (MAEM) rather than a MAP algorithm, in order to strengthen the fact that it is an approximation of the maximum a posteriori procedure relying on the likelihood of the observed data only. As illustrated in [14], the MAEM can be biased (leading to inexact estimation of the template, even with a very large number of samples), especially when the noise is important, but it is obviously more feasible than the exact EM. Notice that the MAEM method is a special form of EM algorithm, and as such optimizes a lower bound of the log likelihood of the observed data.

We summarize the two steps of the th iteration of the MAEM in our case. Suppose and the 's are the current variables to be updated. Then the next iteration is as follows.

MAE step: with (and therefore ) fixed, find, for , to minimize and replace by .

step: with fixed, update with the minimizer of

We now discuss how each of these steps can be implemented.

4.2. MAE Step

Our goal in this section is to optimize (30). We work with fixed and drop it from the notation to simplify the expressions. The objective function is This problem is equivalent to the surface matching algorithm considered in [16], with a slightly different formulation since [16] optimize an energy with respect to a time-dependent momentum instead of just the initial momentum (i.e., they solved simultaneously the geodesic estimation and the matching problems). These two formulations are equivalent when using continuous time (they produce the same minima), but they yield different results when discretized. In our setting, formulating the problem as in (32) is natural, and focuses on the modeled random variable, .

We need to compute the variation of with respect to . This computation will be useful for the -step also. We first discuss the discretization of the error term, which follows [16]. Let be a triangulated surfaces, with vertices and faces . Each face is represented by an ordered triple of vertices: , and we define the face centers and area-weighted normals by Then a discrete approximation of is where is considered as fixed and is therefore considered as a function of the vertices, , of .

With this notation, we can write We want to compute the gradient of and for this, apply the chain rule to the transformations , yielding where is the transpose matrix of .

The gradient of with respect to has been computed in [16], and is given as follows. We denote by the set of faces (triangles) that contain a vertex in a triangulated surface . For , we let denote the edge opposed to in , positively oriented in . With , we have where if and if .

Now let us derive the variation of with respect to the initial momentum, . We know that , where and evolve according to the system (7) with and (and are short for and ). Now an infinitesimal variation in the initial condition induces infinitesimal variations and over time, and the pair obeys the following differential system, that can be obtained from a formal differentiation of (7): with .

One can rewrite it in the matrix form: where with Solving this system with initial condition and provides what we have denoted One does not need to compute all the coefficients of the matrix using this equation in order to apply the transpose in (36). This is fortunate because this would constitute a computationally demanding effort given that this matrix is by with large. The right hand side of (36) can be in fact computed directly using a single dynamical system, given by where is defined in (39). If (44) is solved from time to time with and , then This is a simple consequence of the theory of linear differential systems (a proof is provided in the Appendix for completeness). Note that the matrix depends on the solution of (7) computed with initial conditions and . To emphasize this dependency, we will denote it in the following.

Given this, we see that a variation induces a first-order variation of the energy given by where the -dot product is and is the matrix formed with 3 by 3 blocs .

We choose to operate the gradient descent with respect to this dot product and therefore choose a variation proportional to . So, the algorithm to compute an optimal is the following.

Algorithm 1 (MAE Step for Surface Template Estimation). (1)Compute the variation using (37). (2)Solve backward in time (44) initialized with and .(3)Replace by (using a line-search to optimize ).

This algorithm has to be applied times (for all , ) in the MAE step.

Remark 1. The matrix being typically very badly conditioned, we numerically compute after adding a small positive number to the diagonal of . The inversion itself is computed using conjugate gradient.

4.3. M Step

There are many similarities between the -step and the -step variational problems, so that we will be able to only sketch the detail of the computation here. We need to minimize with

Let us consider the variation of each term in the sum (fixing , that we temporarily drop from the notation). Since we can write

The function being defined as before, we see that the derivative of the second term is given by applying the chain rule again, this time in the form

Like in the previous section, the transpose of the differential applied to the gradient of can be computed by solving a dynamical system backward in time. In fact, it is the same system as with the variation in , namely, still initialized with and , but the relevant result now is . The gradient of is then

Once this is computed, the next step is to compute (reintroducing in the notation) This follows a similar procedure, using (44), with instead of and instead of . This requires solving initialized with and . The variation of associated to an infinitesimal variation of is then

We summarize the M step in the following algorithm.

Algorithm 2 (-Step Algorithm for Surface Template Estimation). ()For : ()Compute using (37). ()Solve system (44) backward in time with initial condition and . ()Compute using (50).()Solve system (55) backward in time with and .()Replace by , being optimized with a line search.

4.4. Surface Template Estimation Algorithm

We finally summarize the surface template estimation algorithm:

Algorithm 3 (Surface Template Estimation). Having the hypertemplate and observed surfaces , the goal is to estimate the template . Let denote the current estimation with initial guess , , . Then, in the next iteration, update with the following steps: (i)With fixed, apply Algorithm 1 to update each , . (ii)With 's fixed, apply Algorithm 2 to obtain a new value for . (iii)Solve (7) initialized with the hypertemplate and the newly obtained to update the estimated template, .

5. Result and Discussion

We applied the algorithm to surface data of human brain's caudate, thalamus, and hippocampus. All data are courtesy of Center for Imaging Science at Johns Hopkins University. Each surface has around 5–10 thousand triangle cells. We randomly chose one as the hypertemplate and the others as observed surfaces. In these experiments, we set and .

Figures 2 and 3 are the template estimation result for caudate and thalamus, respectively.

We also applied the algorithm to 101 hippocampus surface data in the BIRN Project (Biomedical Informatics Research Network). In Figure 4, Panels (a)–(h) are 8 examples of the 101 observations. Panel (i) is the hypertemplate and Panel (j) is the estimated template.

The result is visually satisfying in the sense that the estimated template is found to agree with a qualitative representation of the population. For example, in the caudate experiment, the estimated template has an obviously narrower upper tip than the hypertemplate. This captures the population characteristic since most observed data have narrower upper tips. Notice that the obtained template does not look smoother than the rest of the population, as would typically yield template estimation methods that average over fixed parametrizations.

Figure 5 shows how the energy in (31) changes with the iteration in the caudate experiment. This confirms the effectiveness and convergence of the algorithm. One can see the energy drops quickly in the first twenty iterations, then gradually slows down. After the th iteration, the energy changes little and the estimated template remains stable.

In our model, the hypertemplate can be provided by an atlas obtained from other studies, although we here simply choose one of the surfaces in the population. Actually, as Figure 6 shows, different choices for the hypertemplate yield very similar results.

6. Conclusion

In this paper, we have presented a Bayesian approach for surface template estimation. We have built, for this purpose, a generative statistical model for surfaces: the construction first applies a random initial momentum to the template surface, then assumes an observation process using test functions and noise. The template is assumed to be generated as a random deformation of a hypertemplate, completing the Bayesian model. We used a mode approximation EM scheme to estimate the surface template, and introduced for this purpose a novel surface matching algorithm optimizing with respect to the initial momentum. The procedure has been tested with caudate, thalamus, and hippocampus surface data, showing its effectiveness and convergence, and also experimentally proved to be robust to variations in the choice of the hypertemplate.

Appendices

A. Proof of Proposition 1

Let us assume that , that is, is the Green Kernel of the operator . This assumption implies, in particular, that . Given a smooth function , denote Define the vector space Introduce the RKHS with scalar product given by Then can be identified to the orthogonal projection of on , because is the inverse of the Gram matrix (matrix of dot products) of , which are the generators of , and by assumption.

Now, let be a sequence of progressively finer grids with resolution tending to 0 when tends to infinity. We prove that , for which it suffices to prove that is dense in . This is equivalent to the fact that no nonzero in can be orthogonal to all the 's. But orthogonal to is only possible when, for all , Since the form an arbitrarily fine grid in and functions in are continuous, this implies .

So, in , which implies, for example, pointwise convergence as soon as is embedded in the set of continuous functions (that is, if is continuous). This directly implies Proposition 1 in the case , by taking for a given .

Extensions of this result is when is obtained by applying some linear operator, say , to . One then has and passing to the limit in the sum (for which one needs and continuous on ), one gets .

B. Transposing Linear Differential Equations

We here justify the procedure described in Section 4, and prove the following fact: consider the solution, , of the ordinary differential equation where   is a time-dependent operator (a by matrix). Then, for any , we have where is the solution of the differential equation with .

Let us prove this result. First remark that since is the solution of a linear equation, it depends linearly on the initial condition, . More precisely, let be the by matrix such that and . Then , and, obviously, . Using the identity , we obtain so that . Computing the transposed equation yields and taking Thus, if , we have and as announced.

Acknowledgments

This paper is partially supported by R01-MH056584, R01-EB000975, R01-MH60883, P50-MH071616, P01-AG003991, P01-AG0262761, P41-RR015241, R24-RR021382, and NSF DMS-0456253.