Abstract
We consider the problem of differentiating a function specified by noisy data. Regularizing the differentiation process avoids the noise amplification of finite-difference methods. We use total-variation regularization, which allows for discontinuous solutions. The resulting simple algorithm accurately differentiates noisy functions, including those which have a discontinuous derivative.
1. Introduction
In many scientific applications, it is necessary to compute the derivative of functions specified by data. Conventional finite-difference approximations will greatly amplify any noise present in the data. Denoising the data before or after differentiating does not generally give satisfactory results (see an example in Section 4).
A method which does give good results is to regularize the differentiation process itself. This guarantees that the computed derivative will have some degree of regularity, to an extent that is often under control by adjusting parameters. A common framework for this is Tikhonov regularization [1] of the corresponding inverse problem. That is, the derivative of a function , say on , is the minimizer of the functional where is a regularization or penalty term that penalizes irregularity in , is the operator of antidifferentiation, is a data fidelity term that penalizes discrepancy between and , and is a regularization parameter that controls the balance between the two terms.
The data fidelity term is most commonly the square of the norm, , as is appropriate if has additive, white Gaussian noise [2]. Other data fidelity terms can be used if the noise has a different, known distribution; see [3] for an alternative in the case of Poisson noise, and [4] for a more general approach.
In [1], the regularization term is the squared norm; this controls the size of , without forcing minimizers to be regular. Tikhonov regularization was first applied to numerical differentiation by Cullum [5], where the regularization is the squared norm, . This forces minimizers to be continuous, as is required for the norm to be finite. This prevents the accurate differentiation of functions with singular points.
Other variational methods have the same drawback of forcing smoothness. An approach that penalizes the norm of forces the minimizer to be a cubic spline (see [6–8]). The variational approach of Knowles and Wallace [9] does not fall into the category of Tikhonov regularization, but explicitly assumes that is smooth.
2. Total-Variation Regularization
We propose to use total-variation regularization in (1.1). We will thus compute the derivative of on as the minimizer of the functional We assume (an empty assumption in the discrete case), and for convenience that (in practice we simply subtract from ). The functional is defined on , the space of functions of bounded variation. It is in fact continuous on , as is continuously embedded in , and is continuous on (being an integral operator with bounded kernel). Existence of a minimizer for follows from the compactness of in [10, page 152] and the lower semicontinuity of the seminorm [10, page 120]. This and the strict convexity of are sufficient to guarantee that has a unique minimizer .
Use of total variation accomplishes two things. It suppresses noise, as a noisy function will have a large total variation. It also does not suppress jump discontinuities, unlike typical regularizations. This allows for the computation of discontinuous derivatives, and the detection of corners and edges in noisy data.
Total-variation regularization is due to Rudin et al. in [11]. It has since found many applications in image processing. Replacing in the two-dimensional analog of (2.1) with the identity operator gives a method for denoising an image ; see [12, 13] for an example where is the Abel transform, giving a method for regularizing Abel inversion.
3. Numerical Implementation
A simple approach to minimizing (2.1) is gradient descent. This amounts to evolving to stationarity the PDE obtained from the Euler-Lagrange equation: where is the -adjoint of . Replacing the in the denominator with for some small avoids division by zero. Typically, (3.1) is implemented with explicit time marching, with discretized as for some fixed .
The problem with (3.1) is that convergence is slow. A faster algorithm is the lagged diffusivity method of Vogel and Oman [14]. The idea is to replace at each iteration of (3.1) the nonlinear differential operator with the linear operator . The algorithm has been proven to converge to the minimizer of [15].
We consider two discrete implementations of the lagged diffusivity algorithm. The first uses explicit matrix constructions, and is faster for smaller problems, but becomes impractical for data of more than a few thousand points. We assume that is defined on a uniform grid . Derivatives of are computed halfway between grid points as centered differences, . This defines our differentiation matrix . This approach avoids the consideration of boundary conditions for differentiation, and we find it gives better algorithmic results at the boundary. Integrals of are likewise computed halfway between grid points, using the trapezoid rule to define an matrix . Let be the diagonal matrix whose th entry is . Let , . The matrix is an approximation of the Hessian of at . The update is the solution to , where , and we solve the equation using MATLAB's backslash operator. For further algorithm details, see [16].
For larger problems, we use a modified version that avoids explicit matrices and uses simpler numerics. The differentiation matrix is constructed as the square, sparse matrix for simple forward differences, with periodic boundary conditions. We use a function handle for , simply using MATLAB's cumulative sum operator, as well as for . It is important that the discretization of be consistent with the discretization of , in the sense that if one were to use the function handles to construct matrices, then they would be transposes of each other. In other respects, the algorithm is as above, except the definition of now uses periodic boundary conditions, and the equation is solved using preconditioned conjugate gradient. For the preconditioner, we perform incomplete-Cholesky factorization on the sum of and the diagonal matrix whose entries are the row sums of , these sums being computable analytically.
Less straightforward is the choice of the regularization parameter . One approach is to use the discrepancy principle: the mean-squared difference between and should equal the variance of the noise in . This has the effect of choosing the most regular solution of the ill-posed inverse problem that is consistent with the data . In practice, the noise in is not generally known, but the noise variance can be estimated by comparing with a smoothed version of . The other approach is simply to use trial and error, adjusting to obtain the desired balance between suppression of oscillations and fidelity to the data. In the next section, we will see an example showing the effect of the value of .
4. Examples
4.1. A Simple Nonsmooth Example
Let , defined at 100 evenly spaced points in . We obtain by adding Gaussian noise of standard deviation 0.05. Figure 1 shows the resulting . First, we show in Figure 2 the result of computing by simple centered differencing. The noise has been greatly amplified, so much that denoising the result is hopeless.
We compare this with the result in Figure 3 of denoising before computing by differencing. The denoising is done by regularization, minimizing an appropriate denoising mechanism for continuous functions. We use , using the discrepancy principle, as this results in the norm of being 0.5, the expected value of the norm of the noise vector . The residual noise in the denoised is still amplified enough by the differentiation process to give an unsatisfactory result.
Now, we implement our total-variation regularized differentiation, (2.1). We use the matrix-based version described above, using and , initializing with the naive derivative (specifically [0; diff( f ); 0], to obtain a vector of the appropriate size). Although convergence is nearly complete after 100 iterations, the points closest to the jump move much more slowly, adopting their final positions after 7000 iterations. This takes just 13.1 s, running on a conventional dual-core desktop. The result is in Figure 4. The overall shape of is captured almost perfectly. The jump is correctly located. The one inaccuracy is the size of the jump: there is a loss of contrast, which is typical of total-variation regularization in the presence of noise. Decreasing the size of the jump reduces the penalty term in (2.1), at the expense of increasing the data-fidelity term.
We also show the result of applying the antidifferentiation operator to the computed , and compare with in Figure 5. The corner is sharp and the lines are straight, though a little too flat.
4.2. Respirometry Data
Now, we consider data obtained from a whole-room calorimeter, courtesy of Edward L. Melanson of the University of Colorado Denver. The metabolic rate of a subject within the calorimeter can be determined via respirometry, the measurement of oxygen consumption, and carbon dioxide production within the room. The raw data traces need to undergo response corrections in order to be useful, which involves differentiation. Quoting [17]:
In essence, the first derivative of the trace is calculated, multiplied by a constant derived from the volume of the room and the volumetric flow rate, and added to the original data. Because of the long time constant of the room (5 h), the multiplicative constant is very large. Consequently, any significant noise in the derivatized data will overwhelm the original trace.
Thus, we see the need to regularize the differentiation process for this application.
In Figure 6 there is an example of the raw data for the oxygen consumption rate, which needs to be differentiated for the purposes of response correction. The data consists of samples taken every second for most of a day, for a total of 82,799 samples. In Figure 7 there is the result of computing the unregularized, finite-difference derivative. If restricted to the same vertical range as the following plots, the plot would be a solid black rectangle.
The data size is much too large for the explicit-matrix implementation, so we use the implicit approach. In each case, we use 60 iterations, taking about 5 minutes. We compare the total-variation regularized derivative with that computed with regularization, for two different regularization strengths. First, a stronger regularization, with a value of for the TV case. The result is in Figure 8. We then adjust the parameter for regularization until the curve matches the TV result away from the large bump, namely, ; see Figure 9. A value of was used in both cases. The TV regularization is much more tolerant of rapid rises and falls while the result smooths away this information. We also compare the results of antidifferentiating the derivative and comparing with the original trace, with Figure 10 displaying a zoomed-in portion. The curve is unable to conform to the peak, as the regularization term heavily penalizes what would be the curvature of the curve in this figure. Away from the peak, the integrated derivatives follow the original trace, but not too closely, ignoring small-scale fluctuations. This is the effect of the regularization, with the choice of serving to determine the scale of fluctuation that is considered insignificant.
The above result would be appropriate if the rapid rise and fall in the derivative corresponded to the only feature of interest. Now, we examine the result of weaker regularization, so as to preserve smaller-scale features. In this instance, we use for the TV regularization. As before, we choose for the regularization so that the result matches the TV result over most of the time period, in this case . This time a slightly larger value of was required, in order to offset the poorer conditioning of the linear system solved in the lagged diffusivity algorithm when is smaller (in general, there is a tradeoff between better conditioning with larger values of and greater accuracy with smaller values).
Figures 11 and 12 show the results. Both derivatives capture more oscillations, including more structure in the rapid rise and fall. But the TV result is able to capture a discontinuity in the derivative that the result smooths away. When we compare the antiderivatives with the original function (Figure 13), we find that they follow the trace much more closely, conforming to smaller-scale fluctuations. Zooming in further, we see the greater curvature penalty on the curve prevents it from following the sharp corner, thus missing the discontinuous drop in the derivative.
5. Conclusions
We presented a method for regularizing the numerical derivative process, using total-variation regularization. Unlike previously developed methods, the TV method allows for discontinuities in the derivatives, as desired when differentiating data corresponding to nonsmooth functions. We used the lagged diffusivity algorithm, which enjoys proven convergence properties, with one implementation that works rapidly for small problems, and a second more suitable for large problems. The TV regularization allows the derivative to capture more features of the data, while adjusting the regularization parameter controls the scale of fluctuations in the data that are ignored.
Acknowledgments
This paper was supported by the U.S. Department of Energy through the LANL/LDRD Program. The authors wishes to thank John Lighton of Sable Systems International and Edward L. Melanson of the University of Colorado Denver for the data used in the examples.