Abstract

Determining the outline or boundary contour of a two-dimensional object, or the surface of a three-dimensional object, is difficult when there is substantial measurement noise or uncertainty. By adapting the mathematical approach of stochastic function recovery to this task, it is possible to obtain usable estimates of these boundaries, even in the presence of large amounts of noise. The technique is applied to parametric boundary data and has potential applications in biomedical imaging. It should be considered one of several techniques for improving the visualization of images.

1. Introduction

Three-dimensional (3D) computer reconstruction of a target volume or surface is an important activity in modern biomedical imaging. Accurate anatomical reconstruction in trauma assessment or for use in image-guided intervention relies on mathematical imaging technology. This paper develops the mathematical technique of stochastic function recovery [1] and illustrates its use for noisy boundary reconstruction. It is an alternative to standard polynomial-based methods, and we see it as an add-on or complement to other techniques in use or under development for improving the reconstruction of noisy boundary data and providing enhanced biomedical visualization.

The ability to distinguish features related to boundaries is intrinsic to visualization technology. For example, in MRI, a range of specialized methods have been developed for extracting information from signals so as to reconstruct images representing internal body structures [2]. Boundary recovery techniques also apply to complex surgical procedures such as electroanatomical mapping, which tracks the position of catheters inside the body using sparse signals recorded from electrodes at the catheter tip. The resulting surface maps must integrate real-time measurements with preoperative MR or CT images and account for registration errors in the mapping data as well as errors due to patient movement [3]. In general, when signals are affected by noise, the noise must be effectively removed in order to improve the visualization; compared with other medical imaging modalities, ultrasound images suffer from speckle noise that often produces weak or incomplete boundaries [4].

Our approach is to use stochastic convolution-deconvolution operators [1, 5-7], which have useful statistical properties, to smooth noisy surface data in a manner that does not obliterate detail and that effectively removes Gaussian noise. The motivation for the approach is that stochastic interpolation, which uses probabilistic kernels as the generating function of the row space of the linear operator, performs well at removing noise when used to approximate data. The difficulty in applying these methods directly, however, is that they bias the data toward a mean of zero.

In approximating one-dimensional data, this is not usually a difficulty. In approximating multidimensional data, however, it can cause an apparent shift in the approximant when working in parametric coordinates. This is because the Gaussian kernels smooth positive values toward their mean, thus potentially shifting the coordinates nearer the origin. When we approximate in one dimension, the data values are shifted toward the mean but the coordinates are untouched. With parametric data, however, the coordinates are the data values, and this means that the data can be shifted in $x$ or $y$ space. The sparser the data, or the greater the smoothing, the larger this shift becomes.

For example, when data is taken from a circle centered on the origin, the approximating curve is a circle centered on the origin but of smaller radius, and the greater the smoothing applied in the approximation (as when the data is very noisy), the smaller the area circumscribed by the approximating curve. As the number of sample points is increased, the approximation improves. However, for coarse surface data that is only smoothly varying, this can cause difficulties.

One approach is to make use of stochastic interpolation: a dense, still noisy, data set is created from the sparse noisy data by interpolation, and this fine data set is then approximated to recover the smoothed boundary curve, thereby mitigating the shift toward the mean. While workable, this approach has several disadvantages, most notably that it requires a more costly interpolation step. It also requires applying the technique more than once, and in the second or subsequent applications the approximation must be done on the fine data set, meaning that many more points require approximation, again incurring a larger computational cost.

The solution is to use the convolution-deconvolution properties of stochastic interpolation, combining the densification step with the approximation step. This would still be expensive; however, we introduce an approximate means of performing the interpolation that interpolates smooth data but approximates noisy data, thus avoiding the costly construction of the inverse operator needed for exact interpolation.

2. Development

Consider the task of sampling a known function $f(x)$ at the points $x_i$, $i = 1, \ldots, n$, with $0 \le x_i \le 1$, so as to determine its value at the output points $y_j$, $j = 1, \ldots, m$. The stochastic interpolant [5] to the data $f_i = f(x_i)$ is given by
$$\hat{\mathbf{f}} = S\,D^{-1}\mathbf{f}, \qquad (1)$$
where $\mathbf{f} = (f_1, \ldots, f_n)^T$ is the data vector, and where $D$ is a row-stochastic matrix whose coefficients consist of the values $B_j(x_i)$. Choosing the generator of the row space of $D$ to be the Bernstein functions [1] (so named because this form can be derived from the Bernstein polynomials), we have
$$B_j(x) = \frac{\exp\!\left(-\dfrac{(x - x_j)^2}{2\delta/n}\right)}{\displaystyle\sum_{k=1}^{n} \exp\!\left(-\dfrac{(x - x_k)^2}{2\delta/n}\right)}, \qquad (2)$$
with the nodes $x_j$ on the partition of $[0,1]$, the entries nonnegative, and each row summing to one, yielding a stochastic matrix. Setting $x = x_i$ in (2) generates the entries of $D$, and setting $x$ to any set of $m$ consecutive values $y_j$ in $[0,1]$ generates the coefficients of $S$; that is, the coefficients of $S$ are constructed in the same manner as those of $D$, except that the nodes at which (2) is evaluated may differ from the values $x_i$ at which the data are given. While any probability density function (pdf) can be used, appropriately replacing the mean and variance of the Gaussian in (2), a pdf based on the normal distribution is consistent with the problem of filtering Gaussian noise.
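For concreteness, a minimal numerical sketch of (1) and (2) is given below in Python. The notation and the Gaussian (Bernstein-type) kernel form follow the reconstruction above, the sampled function and parameter values are purely illustrative, and an exact linear solve stands in for the approximate inverse discussed later.

```python
import numpy as np

def bernstein_matrix(x_eval, x_nodes, delta):
    """Row-stochastic matrix of Gaussian (Bernstein-type) kernels with
    variance delta/n, as in the form assumed for (2); each row sums to one."""
    n = len(x_nodes)
    W = np.exp(-n * (x_eval[:, None] - x_nodes[None, :])**2 / (2.0 * delta))
    return W / W.sum(axis=1, keepdims=True)

# Sample an illustrative function at n nodes and interpolate it at m output points.
n, m, delta = 12, 200, 0.01
x = np.linspace(0.0, 1.0, n)          # data nodes on [0, 1]
f = np.sin(2.0 * np.pi * x)           # data vector
y = np.linspace(0.0, 1.0, m)          # output coordinates

D = bernstein_matrix(x, x, delta)     # n x n row-stochastic matrix of (1)
S = bernstein_matrix(y, x, delta)     # m x n convolution matrix of (1)

preimage = np.linalg.solve(D, f)      # discrete deconvolution D^{-1} f
f_hat = S @ preimage                  # convolution: interpolates the data at y
```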

In stochastic interpolation, with the coefficients of $D$ generated by (2), we can interpret $D^{-1}\mathbf{f}$ as the discrete deconvolution of the data, yielding the preimage generated by $\mathbf{f}$. This preimage is then convolved by $S$ to yield an $m$-vector of values that interpolates the data at the output coordinates $y_j$, $j = 1, \ldots, m$; the output coordinates are those that were used to generate the coefficients of $S$. It is for this reason that we have elected to represent this matrix by a symbol different from $D$, since it is desirable to emphasize its role in convolution.

Defining the nodes $x_i$ to be equispaced and taking $\delta$ to be constant yields a coefficient structure in which $D$ is a diagonal matrix times a symmetric Toeplitz matrix: $D = \Lambda T$. Inversion or solution of such systems can be accomplished efficiently with fast Toeplitz solvers [8]; however, in the cases of interest to us, this is not necessary. It is possible to do better using an approximate inverse $\tilde{D}^{-1}$ whose row space is generated directly. While evaluating $\tilde{D}^{-1}\mathbf{f}$ still requires a full matrix-vector product, it is significantly faster than the Toeplitz inversion of $D$ needed to obtain $D^{-1}$.
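The following short sketch, under the same assumptions as above, exhibits the diagonal-times-Toeplitz structure for equispaced nodes; SciPy's toeplitz helper is used only for convenience.

```python
import numpy as np
from scipy.linalg import toeplitz

# For equispaced nodes and constant delta, the unnormalized Gaussian kernel
# matrix is symmetric Toeplitz; the row normalization supplies the diagonal
# factor, so that D = Lambda @ T (sizes and delta are illustrative).
n, delta = 16, 0.05
x = np.linspace(0.0, 1.0, n)

T = toeplitz(np.exp(-n * (x[0] - x)**2 / (2.0 * delta)))   # symmetric Toeplitz factor
Lam = np.diag(1.0 / T.sum(axis=1))                         # diagonal normalization
D = Lam @ T                                                # equals bernstein_matrix(x, x, delta)
```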

The approximate inverse is an approximation precisely because $D\tilde{D}^{-1} \ne I$; that is, in applying stochastic interpolation to the data $\mathbf{f}$, there is an error
$$\mathbf{e}_0 = \mathbf{f} - D\tilde{D}^{-1}\mathbf{f}. \qquad (3)$$
Thus the interpolant to the data can be expressed using successive correction of the errors,
$$\mathbf{p}_{k+1} = \mathbf{p}_k + \tilde{D}^{-1}\mathbf{e}_k, \qquad \mathbf{e}_k = \mathbf{f} - D\mathbf{p}_k, \qquad \mathbf{p}_0 = \tilde{D}^{-1}\mathbf{f}. \qquad (4)$$
Substituting for $\mathbf{e}_k$, from $k = 0$ to $K$, gives
$$\mathbf{p}_K = \sum_{k=0}^{K} \tilde{D}^{-1}\left(I - D\tilde{D}^{-1}\right)^{k}\mathbf{f}, \qquad (5)$$
and truncating the sum gives a working formula in which $D\mathbf{p}_K \approx \mathbf{f}$ even for larger values of $\delta$, so that the method nearly interpolates. Truncating at $K$ and applying the formula to generate $m$ output values instead of $n$ gives
$$\hat{\mathbf{f}} = S\,\mathbf{p}_K = S \sum_{k=0}^{K} \tilde{D}^{-1}\left(I - D\tilde{D}^{-1}\right)^{k}\mathbf{f}, \qquad (6)$$
where $S$ is the $m \times n$ convolution matrix of (1). Provided that $\delta$ in (2) is small, the error in approximating $D^{-1}$ by $\tilde{D}^{-1}$ is small; however, if larger values of $\delta$ are used, then greater smoothing is applied to the data, and the use of (6) becomes necessary when $\delta$ is larger than 0.05. However, it will become apparent that it is precisely because of this greater smoothing that it is unnecessary to apply the corrections in (6), and thus the direct computation of $S\tilde{D}^{-1}\mathbf{f}$ is found to be convenient and efficient, requiring only a single matrix multiply once the preimage has been formed.
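The truncated correction amounts to iterative refinement of the preimage with the approximate inverse. The sketch below illustrates the recursion behind (4)-(6); the diagonal stand-in suggested in the comment is hypothetical and serves only to make the example self-contained, and it is not the reciprocal-based construction described later.

```python
import numpy as np

def corrected_preimage(D, f, Dtilde_inv, K=2):
    """Successive residual correction of the preimage, as in (4)-(6);
    K = 0 reduces to a single application of the approximate inverse."""
    p = Dtilde_inv @ f
    for _ in range(K):
        residual = f - D @ p              # error left by the approximate inverse
        p = p + Dtilde_inv @ residual     # correct the preimage
    return p

# Illustrative stand-in for the approximate inverse (not the construction used
# in this paper):
#   Dtilde_inv = np.diag(1.0 / np.diag(D))
#   f_hat = S @ corrected_preimage(D, f, Dtilde_inv, K=2)
```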

In working with stochastic data recovery, it is apparent that applying the Bernstein functions alone mollifies the data vector $\mathbf{f}$, and thus provides an approximation vector of length $m$ to the initial vector of length $n$. Consider the evaluation of $S(\delta_S)\,D^{-1}(\delta_D)\,\mathbf{f}$, where the generators of the row spaces of $S$ and $D$ are given by (2). This interpolates the data provided that $\delta_S = \delta_D$, meaning that the variance of the Gaussian pdf used to generate $S$ is the same as the variance used to generate $D$. If instead $\delta_S > \delta_D$, then the preimage will be oversmoothed when $S$ is applied to it, and the result will be an approximation. Similarly, if $\delta_S < \delta_D$, then the preimage will not be smoothed sufficiently, and the data will be roughened, or, more appropriately, deconvolved.
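A brief illustration of this convolution-deconvolution behavior, reusing bernstein_matrix from the sketch following (1) and (2) and with purely illustrative parameter values, is the following.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 20, 200
t = np.linspace(0.0, 1.0, n)
f = np.cos(4.0 * np.pi * t) + 0.1 * rng.standard_normal(n)   # noisy samples
y = np.linspace(0.0, 1.0, m)

delta_D = 0.01
preimage = np.linalg.solve(bernstein_matrix(t, t, delta_D), f)

f_interp = bernstein_matrix(y, t, delta_D) @ preimage   # delta_S = delta_D: interpolation
f_approx = bernstein_matrix(y, t, 0.05)    @ preimage   # delta_S > delta_D: approximation
```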

With statistical errors present in multidimensional data, interpolation in parametric coordinates may yield an extremely complex curve, and the errors may cause the curve to wiggle excessively, often crossing over on itself. Clearly, some form of smoothing is necessary. However, as noted in Section 1, simply approximating the data so as to smooth these errors may introduce translation errors in the approximating surface representation, particularly when the number of data points is small or the smoothing specified by $\delta$ is large. Since the approximation is convergent, a simple work-around can be achieved by densification of the data by interpolation, as this minimizes the shift on subsequent smoothing. This leads to a computationally efficient approach to boundary recovery that avoids translation errors while smoothing the noise, based on evaluating $S_2 S_1 \tilde{D}^{-1}\mathbf{f}$, where the number of rows $m$ of $S_2$ is the desired number of output points representing the boundary, $n$ is the number of input data points, and the intermediate dimension $k$ (the number of columns of $S_2$ and of rows of $S_1$) satisfies $k \gg n$.

To demonstrate this form, note that the intended construction is to evaluate $S_1\tilde{D}^{-1}\mathbf{f}$, where the number of output points $k$ is significantly greater than $n$ and both matrices are generated with a small smoothing parameter $\delta_1$, and then to apply smoothing to this densified data by multiplying by $S_2$, generated with $\delta_2 > \delta_1$. In effect, this interpolates the data and then smooths it by applying the approximation to the densified data. Thus a two-step algorithm can be described as follows:

(1) densify: $\mathbf{g} = S_1(\delta_1)\,\tilde{D}^{-1}(\delta_1)\,\mathbf{f}$, with $S_1$ of size $k \times n$ and $k \gg n$;
(2) smooth: $\hat{\mathbf{f}} = S_2(\delta_2)\,\mathbf{g}$, with $S_2$ of size $m \times k$ and $\delta_2 > \delta_1$.
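A sketch of this two-step construction, under the same assumptions as the earlier sketches (with an exact solve standing in for the approximate inverse and illustrative parameter values), is the following.

```python
import numpy as np

def recover_boundary(t, f, t_out, k=400, delta1=0.005, delta2=0.2):
    """Two-step recovery: (1) densify onto k >> n points by near-interpolation
    with a small delta1, (2) smooth the densified data with delta2 > delta1.
    Reuses bernstein_matrix from the earlier sketch."""
    t_fine = np.linspace(0.0, 1.0, k)
    D = bernstein_matrix(t, t, delta1)
    S1 = bernstein_matrix(t_fine, t, delta1)
    g = S1 @ np.linalg.solve(D, f)           # step (1): densified, still noisy, data
    S2 = bernstein_matrix(t_out, t_fine, delta2)
    return S2 @ g                            # step (2): mollified boundary values
```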

This algorithm is equivalent to the following:

(1) $\hat{\mathbf{f}} = S_3(\delta_3)\,\tilde{D}^{-1}\mathbf{f}$; that is, there exist $\delta_3$ and $S_3$ such that applying $S_3\tilde{D}^{-1}$ to $\mathbf{f}$ is the same, or nearly the same, as applying $S_2 S_1\tilde{D}^{-1}$. The use of $\tilde{D}^{-1}$ instead of $D^{-1}$ over a wide range of $\delta$ introduces some additional smoothing, allowing less smoothing to be used in the convolution step, that is, when applying $S_3$. Since it saves an unnecessary matrix multiply, it is clearly faster. The construction of $\tilde{D}^{-1}$ is not difficult, remarkably being given by the inverse of the generator of the row space of $D$: the elements of the approximate inverse are given by the reciprocals of the coefficients of $D$. It is for this reason that the direct inversion of $D$, or solution of the system using efficient Toeplitz solvers, is not needed. The only exception may be when the data is free of noise and exact interpolation without any smoothing is desired.

The reconstruction of boundary data is done using a parametric representation in which $(x(t), y(t))$ describes two-dimensional data and $(x(t), y(t), z(t))$ describes three-dimensional data, with $0 \le t \le 1$. The recovery of the data is done using (1), or its modification using the approximate inverse $\tilde{D}^{-1}$, applied successively to the data pairs $(t_i, x_i)$, $(t_i, y_i)$ in two dimensions and to the data triplets $(t_i, x_i)$, $(t_i, y_i)$, $(t_i, z_i)$ in three dimensions. For example, in the case of interpolating two-dimensional data, we apply $SD^{-1}$ to $\mathbf{x}$ and to $\mathbf{y}$ to obtain the interpolants to the two components of the position vector, or we apply $S\tilde{D}^{-1}$ to obtain the approximate interpolants.
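The sketch below illustrates this componentwise application to a noisy closed curve, reusing recover_boundary from the two-step sketch; the circle and the noise level are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 40
t = np.linspace(0.0, 1.0, n, endpoint=False)                      # parameter along the boundary
x = 5.0 * np.cos(2.0 * np.pi * t) + 0.5 * rng.standard_normal(n)  # noisy circle of radius 5
y = 5.0 * np.sin(2.0 * np.pi * t) + 0.5 * rng.standard_normal(n)

t_out = np.linspace(0.0, 1.0, 500)
x_rec = recover_boundary(t, x, t_out)       # recover x(t) and y(t) separately;
y_rec = recover_boundary(t, y, t_out)       # (x_rec, y_rec) traces the recovered boundary
```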

In reconstructing a parametrically represented surface generated from image data, from pixel values for instance, an algorithm for boundary detection and sorting is necessary. In our analysis it is assumed that this is available; however, the errors generated in parameterizing the surface may not be entirely random, and thus systematic errors in the surface representation will also be introduced. For the purpose of assessing the performance of the boundary recovery algorithms, it will be assumed that the errors are Gaussian with a mean of zero and variable variance $\sigma_e^2$, generated using the random variable
$$\varepsilon = \sigma_e \sqrt{-2\ln r_1}\,\cos(2\pi r_2), \qquad (7)$$
where $r_1$ and $r_2$ are two random numbers uniformly distributed in the interval $(0,1)$. Thus the parametrically ordered data $(t_i, x_i, y_i)$ in two dimensions, or $(t_i, x_i, y_i, z_i)$ in three dimensions, with $i = 1, \ldots, n$, are perturbed to yield the noisy data sets used below.
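A perturbation routine consistent with this model (the Box-Muller form assumed for (7)) might be sketched as follows; the noise level shown in the usage comment is illustrative.

```python
import numpy as np

def gaussian_perturbation(size, sigma, rng=None):
    """Zero-mean Gaussian errors built from two uniform deviates, following the
    Box-Muller form assumed for (7)."""
    rng = rng or np.random.default_rng()
    r1 = 1.0 - rng.random(size)            # in (0, 1], avoids log(0)
    r2 = rng.random(size)
    return sigma * np.sqrt(-2.0 * np.log(r1)) * np.cos(2.0 * np.pi * r2)

# Example usage on parametrically ordered boundary data x, y:
#   x_noisy = x + gaussian_perturbation(len(x), 0.5)
#   y_noisy = y + gaussian_perturbation(len(y), 0.5)
```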

In applying stochastic data recovery to the problem of finding the shape of a parametrically defined boundary, the problem of closed curves needs to be addressed. In the presence of a large amount of noise, the two endpoints of the parametrically defined curve may not match: while ideally $(x(0), y(0)) = (x(1), y(1))$ for a closed loop, in the presence of errors this will not be the case.

Finally, in implementing the algorithm, it was found that the dependence of (2) on $n$ in the denominator made the choice of $\delta$ dependent on $n$: as the boundary data density increased, recovery using any given value of $\delta$ produced increasingly rougher curves as $n$ increased, and thus it was found convenient to evaluate $D$ and $S$ with $\delta$ scaled by $n$ so that the effective kernel variance $\delta/n$ remained constant. Additionally, the algorithm was applied to all of the boundary data associated with a parametric data set to construct the boundary curve, rather than decomposing the data into overlapping blocks. The merit of using all of the data is a slight improvement in accuracy, while the drawback is that the cost of evaluating the algorithm increases with the block size, and this should be borne in mind in applying the algorithm to large, complex three-dimensional data sets.

The choice of $\delta$ depends on the smoothness of the desired boundary curve: the larger the value of $\delta$, the smoother the result. The values of $\delta_1$ and $\delta_2$ depend on the amount of noise, as well as on the presumed smoothness of the boundary data. While this seems to present difficult choices, it is less complicated than it appears, as the choice of $\delta_1$ will usually be any sufficiently small value. For example, a small $\delta_1$ allows the data to be represented as nearly piecewise linear along the boundary. In contrast, the choice of $\delta_2$ requires some evaluation, as it determines the smoothness of the recovered boundary.

3. Results and Discussion

We begin by applying the process to the recovery of the boundary of a disk defined parametrically by 12 uniformly distributed points with no random errors in the data, as shown in Figure 1. The figure clearly illustrates the aforementioned difficulty of attempting to approximate, that is, to smooth, parametric data.

In examining Figure 1(a), it is also important to observe that the beginning and the end of the curve are joined by a straight line in this example. The reason is that the slope of the approximant at the start of the data is not the same as its slope at the end of the data, and thus in this example the first approximated point does not agree with the last approximated point; the two are joined graphically with a straight line to close the curve.

A solution to the mismatch is to overlap the curves during reconstruction, that is, to begin the construction several points before the first point and to end it several points past the last point, and then to use only the portion of the curve from the first to the last point. In all of the studies presented there is no attempt to overlap the curves, in order to magnify these boundary effects and to demonstrate that they are mostly negligible, as seen in Figure 1(b), whenever the construction is done correctly. For large data sets where it may be computationally advantageous to block the data, overlapping the endpoints is readily accomplished.
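A sketch of this overlapping, reusing recover_boundary from the two-step sketch with an arbitrary overlap of pad points, is the following.

```python
import numpy as np

def recover_closed_boundary(t, x, y, t_out, pad=4, **kw):
    """Wrap `pad` points from each end of a closed loop around the other end
    before recovery, then evaluate only on the original parameter range,
    mitigating the endpoint mismatch discussed above."""
    t_ext = np.concatenate([t[-pad:] - 1.0, t, t[:pad] + 1.0])   # extend t past [0, 1]
    x_ext = np.concatenate([x[-pad:], x, x[:pad]])
    y_ext = np.concatenate([y[-pad:], y, y[:pad]])
    return (recover_boundary(t_ext, x_ext, t_out, **kw),
            recover_boundary(t_ext, y_ext, t_out, **kw))
```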

It is important to realize that if the number of points representing the disk were to be much more than 12, then it would have been difficult to visualize the contraction of the recovered boundary curve since the approximation is convergent. Thus for a sufficiently large number of data points, the difference between the approximating curve and the curve itself becomes arbitrarily small.

Recovering the boundary of a disk becomes more difficult, as shown in Figure 2, particularly if the amount of noise is quite large. In this example, the Gaussian noise for a disk of radius 5 is specified so as to yield an extensively scattered data set. While the level of noise in this illustrative example is much higher than would be expected in any realistic imaging situation, the example serves a twofold purpose: (1) on the one hand, it shows the robustness of the method at recovering a reasonable representation of the boundary from data that is more consistent with noise than with signal, and (2) it shows that the effects of errors in constructing the parameterization are less of an issue than might be presumed. In the figure, the connectivity of the boundary data is no longer ordered when moving counterclockwise around the circle, and so one might expect that even minor errors in parameterization, for example those arising from the simplest unconstrained nearest-neighbor ordering of the data, would cause unrecoverable errors in the recovered surface representation.

As expected, reducing the noise significantly improves the recovery, even with half as many points, as shown in Figure 3.

While both of these figures are typical, the symmetry and smoothness of the boundary of the disk make the recovery somewhat less challenging than for a more complex geometrical shape. Thus the performance of the algorithm is examined on a star-shaped region generated from the vertex data (8,65), (72,65), (92,11), (112,65), (174,65), (122,100), (142,155), (92,121), (42,155), (60,100), (8,65), with the recovery shown in Figure 4. Note that the recovery uses the same value of $\delta$ in both steps as was used in recovering the boundary of the disk.

With a noisier data set, the shape of the star recovered in Figure 5, using the same smoothing parameters as in the case of lesser noise, has become more wiggly. The problem is a classical one: there is no means of discerning the shape of the figure from the noisy data, except to note that an acceptable shape is determined by the smoothness of the boundary that is intrinsic to the figure. In this case, changing the smoothing can accommodate this subjective assessment, as illustrated in Figure 6. Note that the recovered boundary is consistent with the curve recovered from the less noisy data set: compare Figures 4 and 6, as shown in Figure 7.

The effects of doubling the smoothing by taking the smoothing parameter $\delta_2$ to be twice as large are clearly evident. Since the noise in both cases was generated using the same seed, it is only the magnitude of the excursions away from the star's boundary that changes, and hence the figures are directly comparable. This perhaps most plainly illustrates the interaction between noise and smoothing, as the two recovered curves are nearly identical. Even when the noise is doubled again, the shape of the recovered curve is remarkably consistent, as shown in Figure 8. At this level of noise there is some loss of resolution of the limbs; however, the recovered boundary is recognizable as being related to the two boundaries obtained in Figures 4 and 6. While this is artificial in that the amount of noise in the data in real problems is not known, it does demonstrate the robustness of the algorithm at consistently recovering the boundary irrespective of the added noise.

A final remark concerns the computation of the approximate interpolant. Since the approximate interpolant fails to interpolate when $\delta$ is large or, more precisely, fails to interpolate rapidly varying data, considerable effort was expended in developing mechanisms for using large $\delta$ for smoothing while still maintaining some fidelity with the boundary data when constructing the preimage. As it turned out, this iterative correction was not necessary: the algorithm performed well even without the introduction of any corrections. In part, this is due to the rather simple shapes and the relatively large amounts of noise that were examined.

In the case of a surface that oscillates rapidly, with the noise much smaller than this surface oscillation, it is clearly necessary to implement the algorithm using this additional correction. Indeed, given sufficiently rough surface data, it may be necessary to use stochastic interpolation, as otherwise some high-frequency surface detail will be oversmoothed by the approximate inverse construction.

4. Conclusions

Examination of several attempts at domain boundary reconstruction in two dimensions using stochastic data recovery techniques is encouraging. In particular, qualitative assessments demonstrate the viability of stochastic function recovery methods for reconstructing parametrically defined edges. Since the approach is intrinsically one dimensional, its extension to three dimensions is not difficult, and it can easily be implemented and tested on more realistic problems. Moving to three dimensions poses no additional cost other than that more data must be processed; that is, the algorithmic cost scales directly with the number of lines being evaluated and the number of points on each line at which the algorithm is applied.

It should be noted that the proposed technique does not solve all problems involving noisy surface reconstruction; however, it does provide an additional tool for analysis. The Gaussian-based kernels (the Bernstein functions) used to generate the row spaces of the matrices were remarkably effective at cancelling the noise, even under the most extreme conditions where the noise essentially obliterated the shape of the image. Of course, this noise was Gaussian, and so it is only reasonable that the proposed approach works well in these circumstances. For other types of noise, the use of alternative probability density functions for generating the mollifiers is easily accomplished; the method thus has substantial design flexibility, and these options need to be explored in detail to ascertain their utility at cancelling other types of noise.
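Swapping the mollifier is straightforward: the sketch below parameterizes the kernel by an arbitrary pdf shape, with a Laplace kernel chosen purely for illustration.

```python
import numpy as np

def kernel_matrix(x_eval, x_nodes, scale, pdf=lambda u: np.exp(-np.abs(u))):
    """Row-stochastic kernel matrix built from an arbitrary pdf shape; the
    default Laplace kernel is an illustrative alternative to the Gaussian."""
    W = pdf((x_eval[:, None] - x_nodes[None, :]) / scale)
    return W / W.sum(axis=1, keepdims=True)
```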

The computational advantages of the technique are that it requires only two matrix multiplies of the data vector, and thus the approach is relatively cost effective. Moreover, the method is easily implemented in parallel, and further gains in efficiency can be achieved by blocking the data, since typically only a segment of the entire data vector is needed to recover the boundary in any region. If fixed block sizes can be implemented, then the cost of the two matrix-vector multiplies can be further reduced, as the matrix-matrix multiply needs to be done only once, and the algorithm reduces to a single matrix-vector multiply per block.
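A sketch of such a blocked application, under the same assumptions as the earlier sketches and with illustrative sizes and parameters, is the following.

```python
import numpy as np

# For a fixed block size, the composite smoothing operator is formed once and
# then applied to every block with a single matrix-vector multiply
# (reuses bernstein_matrix from the earlier sketch).
n_block, m_block, delta1, delta2 = 64, 256, 0.005, 0.2
t_in = np.linspace(0.0, 1.0, n_block)
t_out = np.linspace(0.0, 1.0, m_block)

M = bernstein_matrix(t_out, t_in, delta2) @ np.linalg.inv(bernstein_matrix(t_in, t_in, delta1))

def recover_block(f_block):
    return M @ f_block                     # one matrix-vector multiply per data block
```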