Determining the outline or boundary contour of a two-dimensional object, or the surface of a
three-dimensional object poses difficulties particularly when there is substantial measurement noise or uncertainty. By adapting the mathematical approach of stochastic function recovery to this task, it is possible to obtain usable estimates for these boundaries, even in the presence of large amounts of noise. The technique is applied to parametric boundary data and has potential applications in biomedical imaging. It should be considered as one of several techniques to improve the visualization of images.
1. Introduction
Three-dimensional (3D) computer reconstruction of a target volume
or of a surface is an important activity in modern biomedical imaging. The
accurate anatomical reconstruction in trauma or for use in image-guided
intervention relies on mathematical imaging technology; and this paper develops
the mathematical technique of stochastic function recovery [1] and illustrates its use for
noisy boundary reconstruction. This is an alternative approach to the standard
polynomial-based methods that we see as an add-on or complement to other
techniques in use or being developed to improve upon the reconstruction of
noisy boundary data to provide enhanced biomedical visualization.
The ability to distinguish features related to
boundaries is intrinsic to technology of visualization. For example, in MRI
imaging, a range of specialized methods have been developed for extracting
information from signals so as to reconstruct images representing internal body
structures [2]. Boundary
recovery techniques apply to complex surgical procedures as with
electroanatomical mapping that tracks the position of catheters inside the body
with sparse signals recorded from electrodes at the tip of the catheter.
Resulting surface maps must integrate real-time measurements with preoperative
MR or CT images, and account for mapping data errors in registration and error
due to patient movement [3]. In general, when signals are affected by noise, it
must be effectively removed in order to improve the visualization and compared
with other medical imaging modalities, ultrasound images suffer from speckle
noise that often makes for weak or incomplete boundaries [4].
Our approach is to use stochastic
convolution-deconvolution operators [1, 5–7], which have useful statistical properties, to smooth
noisy surface data in a manner that does not obliterate detail, and which
effectively removes Gaussian noise. The motivation for the approach is that,
intrinsically, stochastic interpolation using probabilistic kernels for the
generating function of the row space of the linear operator
performs well at removing noise when used to
approximate data. However, the difficulty in applying these methods directly is
that they bias the data to a mean of zero.
In approximating one-dimensional data, this is not usually a difficulty. However, in
approximating multidimensional data, this can cause an apparent shift in the
approximant when working in parametric coordinates. This is because the
Gaussian kernels used smooth positive values to their mean value, thus
potentially shifting the coordinates nearer the origin. When we approximate in
one dimension, the data values are shifted to the mean, however, the
coordinates are not touched. If we are using parametric data, the coordinates
are the data values ; and this means that the data can be shifted in or space. The less data
existing or the greater the smoothing, the more
will be this shift.
For example, when data is taken from a circle centered
on the origin, the approximating curve is a circular curve centered on the
origin, but of smaller radius, and the greater the smoothing applied in the
approximation (if the data is very noisy), the smaller the area circumscribed
by the approximating curve. As the number of sample points is increased, the
approximation improves. However, for coarse surface data that is only smoothly
varying, this can cause difficulties.
One approach is to make use of stochastic
interpolation. Creating a dense noisy data set from the sparse noisy data using
interpolation is followed by approximating this fine data set to recover the
smoothed surface curve, thereby mitigating the shifting of the mean. While
workable, this approach has several disadvantages, most notably that it
requires a more costly interpolation step. It also requires the application of
the technique more than once, and in the second or subsequent applications, the
approximation must be done on a fine data set, meaning that many more points
require approximation, again incurring a larger computational cost.
The solution is to make use of the
convolution-deconvolution properties of stochastic interpolation combining the
densification step with the approximation step. This would still be expensive.
However, we introduce an approximate means for doing the interpolation that
interpolates for smooth data, but which approximates for noisy data, thus avoiding
the costly need to construct the inverse operator needed for interpolation.
2. Development
Consider the
task of sampling a known function at points with so as to determine its value at .
The stochastic interpolant [5] to the data , is given bywhere is the data vector, and where is a row stochastic matrix whose coefficients
consist of the values .
Choosing the generator of the row space of to be the Bernstein functions [1] (named after Bernstein as
the derivation of this form that can be obtained from the Bernstein polynomials), we
havewith on the partition ,
with and yielding a stochastic
matrix. Setting generates the entries of ,
and setting to any set of consecutive values in generates the coefficients of ,
that is, the coefficients of are constructed in the same manner as for ,
except that the nodes at which is evaluated may differ from the values at
which the data are given. While any probability density function (pdf) can be
used, appropriately replacing the mean and variance of the Gaussian in (2), a
pdf based on the normal distribution is consistent with the problem of
filtering Gaussian noise.
In stochastic interpolation, with the coefficient of generated by (2), we can interpret as the discrete deconvolution of the data
yielding the preimage generated by .
This preimage is then convolved by to yield an -vector of values that interpolates the data at the output coordinates , ;
the output coordinates are those that were used to generate the coefficients of .
It is for this reason that we have elected
to represent the matrix using another symbol since it is desirable to emphasize its role in
convolution.
Defining with and with constant yields a coefficient structure in
which is a diagonal matrix times a symmetric
Toeplitz matrix: .
Inversion or solution of these matrices can be accomplished in operations [8], however in the cases of interest to us, this is not
necessary. It is possible to do better than this using an approximate inverse
in which the row space of is generated directly. While evaluating is still ,
it is significantly faster than the Toeplitz matrix inversion of to obtain .
The approximate inverse is an approximation precisely because ,
that is, in applying stochastic interpolation to the data ,
there is an error: Thus the
interpolant to the data can be expressed using successive correction to the
errors usingSubstituting for , from to ,
givesand truncating the sum gives a
working formula in which even for larger values of ,
so that the method nearly interpolates. Truncating at and applying the formula to generate output values instead of giveswhere .
Provided that in (2) is small, the error in constructing using is small, however if larger values of are used, then greater smoothing is applied to
the data and the use of (6) becomes necessary when is larger than 0.05. However, it will become
apparent that it is because of this greater smoothing that it is unnecessary to apply any corrections as shown in
(6), and thus the direct computation of is found to be convenient and efficient, requiring
only a single matrix multiply.
In working with stochastic data recovery, it is
obvious that using the Bernstein functions mollifies the
data vector ,
and thus provides an approximation vector of length to the initial vector of length .
Consider the evaluation of where the generator of the row space of and are given by (2). This interpolates the data
provided that ,
meaning that the variance of the Gaussian pdf that is used to generate , is the same as
the variance that is used to generate .
If instead ,
then the preimage will be oversmoothed when is applied to the preimage, and the result
will be approximation. Similarly, if ,
then the preimage will not be smoothed sufficiently, and the
data will be roughened, or more appropriately it will be deconvolved.
With statistical errors present in multidimensional
data, interpolation in parametric coordinates may yield an extremely complex
curve, and the errors may cause the curve to wiggle excessively, often crossing
over on itself. Clearly, some form of smoothing is necessary. However, as noted
in Section 1, simply approximating the data so as to smooth these errors may
introduce translation errors in the approximating surface representation,
particularly when the number of data points is small, or the smoothing
specified by is large. Since the approximation is
convergent, a simple work-around can be achieved by densification of the data
by interpolation as this will minimize this shift on subsequent smoothing. This
leads to a computationally efficient approach to surface recovery that avoids
translation errors while smoothing the noise based on evaluating ,
where is the desired number of output points
representing the boundary, and are the number of input data points and .
To demonstrate this form, note that the intended
construction is to evaluate ,
where is significantly greater than with ,
and then applying smoothing to this densified data by multiplying by ,
where .
In effect, this interpolates the data and then smooths it by the application of
the approximation to the densified data. Thus a two-step
algorithm can be described as follows:
(1) (2)
This
algorithm is equivalent to the following:
(1)
that is,
there exist and such that applying to is the same, or nearly the same, as applying to .
The use of for a wide range of instead of introduces some additional smoothing, allowing
for less smoothing to be used on the convolution step, that is, when applying .
Since it saves an unnecessary matrix multiply, it is clearly faster. The
construction of is not difficult, remarkably being given by
the inverse of the generator of the row space of .
The elements of the approximate inverse are given by the reciprocals of the
coefficients of .
It is for this reason that the direct inversion of
, or solution of the system using efficient Toeplitz
solvers, is not needed. The only exception may be when the
data is free of noise and exact interpolation without any
smoothing is desired.
The reconstruction of surface data is done using a
parametric representation in which for two-dimensional data, and for three-dimensional data, where .
The recovery of the data is done using (1) or its modification using an
approximate inverse ,
applied successively to the data pairs , in two dimensions and to the data triplets , , in three dimensions. For example, in the case
of interpolating two-dimensional data, we apply and to obtain the interpolant to the position
vector and the position vector ,
or we apply and to obtain the approximate interpolants.
In reconstructing a parametrically represented surface
generated from image data from pixel values, for instance, an algorithm for
boundary detection and sorting is necessary. In our analysis, it is assumed
that this is available, however the errors generated in parameterizations of
the surface may not be entirely random, and thus systematic errors in surface
representation will also be introduced. For the purpose of assessing the
performance of the boundary recovery algorithms, it will be assumed that the
errors are random Gaussian with a mean of zero, with variable variances ,
generated using the random variablewhere and are two random numbers uniformly distributed
in the interval .
Thus the parametrically ordered data ,
with or ,
and ,
are perturbed to yield the data sets .
In applying stochastic data recovery to the problem of
finding the shape of a parametrically defined boundary, the problem of closed
curves needs to be addressed. In the presence of large amount of noise, the two
endpoints of the parametrically defined curves may not match: while ideally in the case of a close loop, in the presence
of errors this will not be the case.
Finally, in implementing the algorithm, it was found
that the dependence of (2) on in the denominator made the choice of dependent on ,
as the boundary data density increased, the recovery using any given value of produced increasingly rougher curves as increased, and thus it was found convenient to
evaluate and based on a constant value of .
Additionally, the algorithm was applied to all of the boundary data associated
with a parametric data set to construct the boundary curve, rather than
decomposing the data into overlapping blocks. The merits of using all of the
data are a slight improvement in accuracy, while the drawbacks are that the
cost of evaluating the algorithm increases as the block size increases, and
this should be born in mind in applying the algorithm to large complex
three-dimensional data sets.
The choice of depends on the smoothness of the desired
boundary curve. The larger is the value of ,
the smoother are the results. The values for and depend on the amount of noise, as well as on
the presumed smoothness of the boundary data. While this seems to present
difficult choices, it is less complicated than it appears, as the choice for will usually be any sufficiently small value.
For example, setting allows for representing the data as nearly
piecewise linear along the boundaries. In contrast, the choice for requires some evaluation as this determines
the smoothness of the recovered boundary.
3. Results and Discussion
We begin by applying the process to the recovery of
the boundary of a disk defined parametrically by 12 uniformly distributed
points with no random errors in the data as shown in Figure 1. The figure
clearly illustrates the aforementioned difficulty of attempting to approximate
(smooth parametric data).
Figure 1: Recovery of the
boundary of disk specified by 12 points, starting at on the -axis, and moving counterclockwise around the
circle. Note that in the circle in (a) the boundary of the recovered disk is
approximated with using ,
resulting in the inscribed disk, while in (b) the boundary is constructed using
approximate interpolation using and applying .
In examining Figure 1(a), it is also important to
observe that the beginning and the end of the curve are joined by a straight
line in this example. The reason is that the slope of the approximant to the
data is not the same as the slope to the data ,
and thus in this example, the first approximated point does not agree with the last approximated
point ,
and are joined graphically with a straight line to close the curve.
A solution to the mismatch is to overlap the curves
during reconstruction, that is, at several points away from the last point, and
ending the construction several points away from the first point, then only
using the curve from the first to the last point. In all of the studies
presented, there is no attempt to overlap the curves in order to magnify these
boundary affects, and to demonstrate that they are mostly negligible, as seen
in Figure 1(b), whenever the construction is done correctly. For large data sets
where it may be computationally advantageous to block the data, overlapping the
endpoints is readily accomplished.
It is important to realize that if the number of
points representing the disk were to be much more than 12, then it would have
been difficult to visualize the contraction of the recovered boundary curve
since the approximation is convergent. Thus for a sufficiently large number of
data points, the difference between the approximating curve and the curve
itself becomes arbitrarily small.
Recovering the boundary of a disk becomes more
difficult, as shown in Figure 2, particularly if the amount of noise is quite
large. In this example, the Gaussian noise for a disk of radius 5 is specified
as yielding an extensively scattered data set.
While the level of noise in this illustrative example is much higher than would
be expected in any realistic imaging situation, the example serves a two-fold
purpose: (1) on the one had it shows the robustness of the method at recovering
a reasonable representation of the surface from the data that is more consistent
with noise than data, and (2) it shows that the effects of errors in
constructing the parameterization are less of an issue than might be presumed.
In the figure, the connectivity of the boundary data is not ordered in moving counterclockwise around the circle, and
so it is quite likely that any minor errors in parameterization, for example,
using even the simplest unconstrained nearest neighbor search of the data,
would cause unrecoverable errors in the surface representation that is
recovered.
Figure 2: Recovery of the boundary of disk specified by 96 points with Gaussian
noise of ,
starting at on the -axis and moving counterclockwise around the
circle with the boundary recovered using approximate interpolation, with and .
The recovered curve is shown as a thicker line.
As expected, reducing the noise to significantly improves the recovery, even for
half as many points, as shown in Figure 3.
Figure 3: Recovery of
the boundary of disk specified by 48 points with Gaussian noise of
as in Figure
2.
While both of these figures are typical, the symmetry
and the smoothness of the boundary of the disk make the recovery somewhat less
challenging than for a more complex geometrical shape. Thus the performance of
the algorithm is examined on a star-shaped region generated from the vertex
data (8,65), (72,65), (92,11), (112,65), (174,65),
(122,100), (142,155), (92,121), (42,155), (60,100), (8,65).
Note that the recovery makes use of that on both steps is the same as that used in
recovering the boundary of the disk.
The shape of the star in Figure 5 has
become more wiggly using the smoothing parameters the same as in the case of
lesser noise. The problem is a classical one: there is no means to discern the
shape of the figure from the noisy data, except to note that an acceptable
shape is determined by the smoothness of the boundary that is intrinsic to the
figure. In this case, changing the smoothing can accommodate this subjective
assessment, as illustrated in Figure 6. Note that the recovered boundary is
consistent with the curve recovered from the less noisy data set: compare Figures
4 and 6 as shown in Figure 7.
Figure 4: Recovery of the boundary of the star
specified by 100 points: 10 along each arm with Gaussian noise of .
The boundary is constructed using approximate interpolation, using and .
The recovered curve is shown as a thicker line.
Figure 5: Recovery of
the boundary of the star specified by 100 points with Gaussian noise of
as in Figure
4. The recovered curve is shown as
a thicker line.
Figure 6: Recovery of the
boundary of the star specified by 100 points with Gaussian noise of
as in Figure
5. The boundary is constructed
using approximate interpolation with
and
.
The recovered curve is shown as a thicker line.
Figure 7: Recovery of the boundary of the star
comparing the recovered boundary shown in Figure
4 to the recovered boundary
shown in Figure
6. In this comparison, the recovery algorithm is implemented so
that the smoothing is proportional to the noise in the cases being compared,
yielding excellent results.
The effects of doubling the smoothing by taking to be twice as large are clearly
evident. Since the noise in both cases was generated using the same seed, it is
only the magnitude of the excursions away from the star's boundary that change,
and hence the figures are directly comparable. This perhaps most plainly
illustrates that interaction between noise and smoothing, as the two curves are
nearly identical. Even when the noise is doubled again to ,
the shape of the recovered curve using is remarkably consistent, as shown in Figure
8. At this level of noise there is some loss of resolution of the limbs,
however the recovered boundary is recognizable as being related to the two
boundaries obtained in Figures 4 and 6. While this is artificial in that the
amount of noise in the data in real problems is not known, it does demonstrate
the robustness of the algorithm at consistently recovering the boundary data
irrespective of the added noise.
Figure 8: Recovery of the boundary of
the star specified by 100 points with Gaussian noise of .
In this case, an extremely large amount of noise is added, however the
algorithm is effective at recovering features of the star shape. The boundary
is constructed using approximate interpolation by using and .
The recovered surface is shown as a thicker line.
A final remark on the computing of the approximate
interpolant is the following. Since the approximate interpolant fails to
interpolate when is large or, more appropriately, fails to
interpolate rapidly varying data, quite some effort was expended in developing
mechanisms for being able to use large for smoothing and yet still maintain some
fidelity with the boundary data while constructing the preimage. As it
developed, this iterative correction was not necessary: the algorithm performed
well even without the introduction of any corrections. In part, this is due to
the rather simple shapes, and the relatively large amounts of noise that were
examined.
In the case when a surface is oscillating rapidly with
the noise much less than this surface oscillation, it is clearly necessary to
implement the algorithm using this additional correction. Indeed, given
sufficiently rough surface data, it may be necessary to use stochastic
interpolation as otherwise some high-frequency surface details will be
oversmoothed by the approximate inverse construction.
4. Conclusions
Examination of several attempts at domain boundary
reconstruction in two dimensions is encouraging
using the stochastic data recovery techniques. In particular, qualitative
assessments demonstrate the viability of utilizing stochastic function recovery
methods for reconstructing parametrically defined edges. Since the approach is
intrinsically one dimensional, its extension to three dimensions is not
difficult, and thus can easily be implemented and tested on more realistic
problems. Moving to three dimensions poses no additional cost other than that
more data has to be processed, that is, the algorithmic costs scale directly
with the number of lines being evaluated, and the number of points on each line
at which the algorithm is applied.
It should be noted that the proposed technique does
not solve all problems involving noisy surface reconstruction, however it does
provide an additional tool for analysis. The Gaussian-based kernels (the
Bernstein functions) which were used to generate the row space of the matrices,
were remarkably effective at cancelling the noise even under the most extreme
conditions where the noise essentially obliterated the image shape. Of course,
this noise was Gaussian and so it is only reasonable that the proposed approach
would work well in these circumstances. For other types of noise, the use of alternative
probability density functions for generating the mollifiers is easily
accomplished, and thus the method has substantial design flexibility, and these
options need to be explored in detail to ascertain their utility at cancelling
these other types of noise.
The computational advantages of the technique are that
it requires only two matrix multiplies of the data vector, and thus the
approach is relatively cost effective. Moreover, the method is easily
implemented in parallel, and further computational gains in efficiency can be
achieved by blocking the data since typically only a segment of the entire data
vector is needed to recover the data in any region. If fixed block sizes can be
implemented, then the cost of the two matrix vector multiplies can be further
reduced as the matrix-matrix multiply needs to be done only once, and so the
algorithm reduces to a single-block matrix-vector multiply.