Abstract

We consider the problem of differentiating a function specified by noisy data. Regularizing the differentiation process avoids the noise amplification of finite-difference methods. We use total-variation regularization, which allows for discontinuous solutions. The resulting simple algorithm accurately differentiates noisy functions, including those which have a discontinuous derivative.

1. Introduction

In many scientific applications, it is necessary to compute the derivative of functions specified by data. Conventional finite-difference approximations will greatly amplify any noise present in the data. Denoising the data before or after differentiating does not generally give satisfactory results (see an example in Section 4).

A method which does give good results is to regularize the differentiation process itself. This guarantees that the computed derivative has some degree of regularity, to an extent that can be controlled by adjusting parameters. A common framework for this is Tikhonov regularization [1] of the corresponding inverse problem. That is, the derivative of a function f, say on [0, L], is the minimizer of the functional

F(u) = αR(u) + DF(Au − f),   (1.1)

where R(u) is a regularization or penalty term that penalizes irregularity in u, Au(x) = ∫_0^x u is the operator of antidifferentiation, DF(Au − f) is a data fidelity term that penalizes discrepancy between Au and f, and α is a regularization parameter that controls the balance between the two terms.

The data fidelity term DF is most commonly the square of the L² norm, DF(·) = ∫_0^L |·|², as is appropriate if f has additive, white Gaussian noise [2]. Other data fidelity terms can be used if the noise has a different, known distribution; see [3] for an alternative in the case of Poisson noise, and [4] for a more general approach.

In [1], the regularization term is the squared L² norm; this controls the size of u, without forcing minimizers to be regular. Tikhonov regularization was first applied to numerical differentiation by Cullum [5], where the regularization is the squared H¹ seminorm, R(u) = ∫_0^L |u′|². This forces minimizers to be continuous, as is required for the H¹ norm to be finite, and thus prevents the accurate differentiation of functions with singular points.

Other variational methods have the same drawback of forcing smoothness. An approach that penalizes the L² norm of the second derivative forces the minimizer to be a cubic spline (see [6–8]). The variational approach of Knowles and Wallace [9] does not fall into the category of Tikhonov regularization, but explicitly assumes that u is smooth.

2. Total-Variation Regularization

We propose to use total-variation regularization in (1.1). We will thus compute the derivative of f on [0, L] as the minimizer of the functional

F(u) = α ∫_0^L |u′| + (1/2) ∫_0^L |Au − f|².   (2.1)

We assume f ∈ L² (an empty assumption in the discrete case), and for convenience that f(0) = 0 (in practice we simply subtract f(0) from f). The functional F is defined on BV[0, L], the space of functions of bounded variation. It is in fact continuous on BV, as BV is continuously embedded in L², and A is continuous on L² (being an integral operator with bounded kernel). Existence of a minimizer of F follows from the compactness of the embedding of BV in L² [10, page 152] and the lower semicontinuity of the BV seminorm [10, page 120]. This and the strict convexity of F are sufficient to guarantee that F has a unique minimizer u.
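
For concreteness, the discrete counterpart of (2.1) can be evaluated in a few lines of MATLAB. The following is only a sketch, assuming u and f are column vectors sampled on a uniform grid with spacing dx, f(1) = 0, and a simple cumulative-sum quadrature for A; it is not the discretization used in Section 3.

% Sketch: evaluating a discrete version of the functional (2.1).
function F = tv_functional(u, f, alpha, dx)
    Au = dx * cumsum(u);                              % approximate antiderivative of u
    tv = sum(abs(diff(u)));                           % total variation of u
    F  = alpha * tv + 0.5 * dx * sum((Au - f).^2);    % penalty plus data fidelity
end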

Use of total variation accomplishes two things. It suppresses noise, as a noisy function will have a large total variation. It also does not suppress jump discontinuities, unlike typical regularizations. This allows for the computation of discontinuous derivatives, and the detection of corners and edges in noisy data.

Total-variation regularization is due to Rudin et al. in [11]. It has since found many applications in image processing. Replacing 𝐴 in the two-dimensional analog of (2.1) with the identity operator gives a method for denoising an image 𝑓; see [12, 13] for an example where 𝐴 is the Abel transform, giving a method for regularizing Abel inversion.

3. Numerical Implementation

A simple approach to minimizing (2.1) is gradient descent. This amounts to evolving to stationarity the PDE obtained from the Euler-Lagrange equation:

u_t = α (d/dx)(u′/|u′|) − A^T(Au − f),   (3.1)

where A^T v(x) = ∫_x^L v is the L²-adjoint of A. Replacing the |u′| in the denominator with √((u′)² + ε) for some small ε > 0 avoids division by zero. Typically, (3.1) is implemented with explicit time marching, with u_t discretized as (u^{n+1} − u^n)/Δt for some fixed Δt.
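
As an illustration of this time-marching scheme, here is a minimal MATLAB sketch. The cumulative-sum discretizations of A and A^T, the boundary handling, and the fixed step size are our own simplifying assumptions, not the implementation used for the examples below.

% Sketch: explicit time marching for (3.1).  f and u0 are column vectors on a
% uniform grid with spacing dx; dt must be small for stability.
function u = tv_diff_descent(f, u0, alpha, epsilon, dt, niter, dx)
    u = u0;
    for k = 1:niter
        ux  = diff(u) / dx;                            % u' at the half-grid points
        phi = ux ./ sqrt(ux.^2 + epsilon);             % regularized u'/|u'|
        div = [phi(1); diff(phi); -phi(end)] / dx;     % (d/dx)(u'/|u'|), crude boundary handling
        Au  = dx * cumsum(u);                          % approximate antiderivative
        ATr = dx * flipud(cumsum(flipud(Au - f)));     % A^T(Au - f)
        u   = u + dt * (alpha * div - ATr);            % forward Euler step
    end
end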

The problem with (3.1) is that convergence is slow. A faster algorithm is the lagged diffusivity method of Vogel and Oman [14]. The idea is to replace, at each iteration of (3.1), the nonlinear differential operator u ↦ (d/dx)(u′/|u′|) with the linear operator u ↦ (d/dx)(u′/|(u^n)′|). The algorithm has been proven to converge to the minimizer of F [15].

We consider two discrete implementations of the lagged diffusivity algorithm. The first uses explicit matrix constructions, and is faster for smaller problems, but becomes impractical for data of more than a few thousand points. We assume that u is defined on a uniform grid {x_i}_{i=0}^{n} = {0, Δx, 2Δx, …, L}. Derivatives of u are computed halfway between grid points as centered differences, Du(x_i + Δx/2) = u(x_{i+1}) − u(x_i); this defines our n × (n+1) differentiation matrix D. This approach avoids the consideration of boundary conditions for differentiation, and we find it gives better results at the boundary. Integrals of u are likewise computed halfway between grid points, using the trapezoid rule to define an n × (n+1) matrix A. Let E_n be the diagonal matrix whose ith entry is ((u^n(x_i) − u^n(x_{i−1}))² + ε)^{−1/2}, and let L_n = Δx D^T E_n D and H_n = A^T A + α L_n. The matrix H_n is an approximation of the Hessian of F at u^n. The update s_n = u^{n+1} − u^n is the solution of H_n s_n = −g_n, where g_n = A^T(A u^n − f) + α L_n u^n, and we solve the equation using MATLAB's backslash operator. For further algorithmic details, see [16].
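
The iteration just described is compact enough to sketch in MATLAB. The following is a simplified illustration under our own conventions (dense matrices, a same-length centered-difference initialization, and the antiderivative compared with the data at the grid points rather than the half-grid points); it is not the authors' implementation, and the value of α appropriate for it depends on these scaling choices.

% Sketch of the explicit-matrix lagged diffusivity iteration.  f is a column
% vector of n+1 samples on a uniform grid with spacing dx, with f(1) already
% subtracted off; the quadrature and scaling conventions are illustrative.
function u = tv_diff_lagged(f, alpha, epsilon, niter, dx)
    n  = length(f) - 1;
    D  = [zeros(n,1) eye(n)] - [eye(n) zeros(n,1)];    % undivided differences, n x (n+1)
    A  = dx * (tril(ones(n, n+1), 1) ...
             - 0.5 * [zeros(n,1) eye(n)] ...
             - 0.5 * [ones(n,1) zeros(n,n)]);          % trapezoid rule, rows = x_1, ..., x_n
    fi = f(2:end);                                     % data at x_1, ..., x_n (f(x_0) = 0)
    u  = gradient(f, dx);                              % naive initial derivative
    for k = 1:niter
        En = diag(1 ./ sqrt((D*u).^2 + epsilon));      % lagged diffusivity weights
        Ln = dx * (D' * En * D);
        Hn = A' * A + alpha * Ln;                      % approximate Hessian of F
        gn = A' * (A*u - fi) + alpha * Ln * u;         % gradient at the current iterate
        u  = u - Hn \ gn;                              % solve Hn * s = -gn and update
    end
end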

For larger problems, we use a modified version that avoids explicit matrices and uses simpler numerics. The differentiation matrix D is constructed as a square, sparse matrix of simple forward differences, with periodic boundary conditions. We use function handles for A and A^T, implemented with MATLAB's cumulative sum operator. It is important that the discretization of A^T be consistent with that of A, in the sense that if one were to use the function handles to construct matrices, they would be transposes of each other. In other respects, the algorithm is as above, except that the definition of E_n now uses periodic boundary conditions, and the equation H_n s_n = −g_n is solved by preconditioned conjugate gradients. For the preconditioner, we perform an incomplete Cholesky factorization of the sum of α L_n and the diagonal matrix whose entries are the row sums of A^T A, these sums being computable analytically.
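
A matrix-free sketch of this variant follows. It keeps the structure described above (sparse periodic forward differences, function handles for A and A^T, and pcg with an incomplete-Cholesky preconditioner), but the quadrature, scalings, and solver tolerances are our own illustrative choices; in particular, the row sums of A^T A are obtained here by a single matrix-free application rather than analytically.

% Sketch of the matrix-free lagged diffusivity variant.  f is a column vector
% on a uniform grid with spacing dx, with f(1) already subtracted off.
function u = tv_diff_lagged_large(f, alpha, epsilon, niter, dx)
    n  = length(f);
    e  = ones(n, 1);
    D  = spdiags([-e e], [0 1], n, n) / dx;             % forward differences
    D(n, 1) = 1 / dx;                                    % periodic boundary condition
    A  = @(v) dx * cumsum(v);                            % antidifferentiation handle
    At = @(v) dx * flipud(cumsum(flipud(v)));            % consistent transpose of A
    u  = gradient(f, dx);                                % naive initial derivative
    rowsum = At(A(e));                                   % row sums of A'*A
    for k = 1:niter
        En = spdiags(1 ./ sqrt((D*u).^2 + epsilon), 0, n, n);   % lagged diffusivity weights
        Ln = dx * (D' * En * D);
        gn = At(A(u) - f) + alpha * (Ln * u);            % gradient at the current iterate
        Hn = @(v) At(A(v)) + alpha * (Ln * v);           % Hessian applied matrix-free
        P  = ichol(alpha * Ln + spdiags(rowsum, 0, n, n));       % preconditioner factor
        [s, ~] = pcg(Hn, -gn, 1e-6, 100, P, P');         % solve Hn * s = -gn
        u  = u + s;
    end
end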

Less straightforward is the choice of the regularization parameter 𝛼. One approach is to use the discrepancy principle: the mean-squared difference between 𝐴𝑢 and 𝑓 should equal the variance of the noise in 𝑓. This has the effect of choosing the most regular solution of the ill-posed inverse problem 𝐴𝑢=𝑓 that is consistent with the data 𝑓. In practice, the noise in 𝑓 is not generally known, but the noise variance can be estimated by comparing 𝑓 with a smoothed version of 𝑓. The other approach is simply to use trial and error, adjusting 𝛼 to obtain the desired balance between suppression of oscillations and fidelity to the data. In the next section, we will see an example showing the effect of the value of 𝛼.
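
In MATLAB, the noise-estimation step and the discrepancy-principle target can be sketched as follows; the moving-average smoother and its window length w are our assumptions, since the text does not specify the smoother.

% Sketch: choosing alpha by the discrepancy principle.
sigma  = std(f - movmean(f, w));        % estimated noise standard deviation
target = sigma * sqrt(length(f));       % expected L2 norm of the noise vector
% ... adjust alpha until norm(A*u - f) is approximately equal to target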

4. Examples

4.1. A Simple Nonsmooth Example

Let f_0(x) = |x − 1/2|, defined at 100 evenly spaced points in [0, 1]. We obtain f by adding Gaussian noise of standard deviation 0.05. Figure 1 shows the resulting f. First, we show in Figure 2 the result of computing f′ by simple centered differencing. The noise has been so greatly amplified that denoising the result is hopeless.
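
For reference, a sketch of how this test problem and the naive derivative can be generated in MATLAB (the noise realization is random, so the figures will not be reproduced exactly):

% Sketch: the test problem of this subsection and the derivative of Figure 2.
x  = linspace(0, 1, 100)';                % 100 evenly spaced points in [0,1]
f0 = abs(x - 0.5);                        % the underlying nonsmooth function
f  = f0 + 0.05 * randn(size(x));          % Gaussian noise of standard deviation 0.05
df_naive = gradient(f, x(2) - x(1));      % centered differences greatly amplify the noise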

We compare this with the result, in Figure 3, of denoising f before computing f′ by differencing. The denoising is done by H¹ regularization, minimizing

α ∫_0^L |u′|² + (1/2) ∫_0^L |u − f|²,   (4.1)

an appropriate denoising mechanism for continuous functions. We use α = 3.5 × 10⁻³, chosen by the discrepancy principle: this results in the L² norm of u − f being 0.5, the expected value of the L² norm of the noise vector f − f_0. The residual noise in the denoised f is still amplified enough by the differentiation process to give an unsatisfactory result.
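
For completeness, this H¹ denoising step can be sketched by solving the linear optimality condition of (4.1). The discretization and scaling below are our own, so the value of α that satisfies the discrepancy principle need not match the one quoted above; f, alpha, and the grid spacing dx are assumed to be defined.

% Sketch: H1 denoising of f, followed by differencing (cf. Figure 3).
n  = length(f);
e  = ones(n - 1, 1);
Dm = spdiags([-e e], [0 1], n - 1, n);                    % undivided differences
u  = (speye(n) + (2 * alpha / dx^2) * (Dm' * Dm)) \ f;    % denoised f
df_h1 = gradient(u, dx);                                  % derivative of the denoised data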

Now, we implement our total-variation regularized differentiation (2.1). We use the matrix-based version described above, with α = 0.2 and ε = 10⁻⁶, initializing with the naive derivative (specifically [0; diff( f ); 0], to obtain a vector of the appropriate size). Although convergence is nearly complete after 100 iterations, the points closest to the jump move much more slowly, adopting their final positions after 7000 iterations. This takes just 13.1 s, running on a conventional dual-core desktop. The result is in Figure 4. The overall shape of f_0′ is captured almost perfectly. The jump is correctly located. The one inaccuracy is the size of the jump: there is a loss of contrast, which is typical of total-variation regularization in the presence of noise. Decreasing the size of the jump reduces the penalty term in (2.1), at the expense of increasing the data-fidelity term.
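
In terms of the lagged diffusivity sketch given in Section 3 (the hypothetical tv_diff_lagged above), this computation looks roughly as follows; since that sketch uses its own scaling conventions and a same-length centered-difference initialization rather than [0; diff( f ); 0], the parameter values may need adjustment.

% Sketch: TV-regularized differentiation of the noisy f constructed above.
dx    = x(2) - x(1);
df_tv = tv_diff_lagged(f - f(1), 0.2, 1e-6, 100, dx);    % TV-regularized derivative
f_rec = dx * cumsum(df_tv);                              % antidifferentiate, cf. Figure 5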

We also show the result of applying the antidifferentiation operator to the computed f′, and compare with f_0 in Figure 5. The corner is sharp and the lines are straight, though a little too flat.

4.2. Respirometry Data

Now, we consider data obtained from a whole-room calorimeter, courtesy of Edward L. Melanson of the University of Colorado Denver. The metabolic rate of a subject within the calorimeter can be determined via respirometry, the measurement of oxygen consumption and carbon dioxide production within the room. The raw data traces need to undergo response correction in order to be useful, which involves differentiation. Quoting [17]:

In essence, the first derivative of the trace is calculated, multiplied by a constant derived from the volume of the room and the volumetric flow rate, and added to the original data. Because of the long time constant of the room (5 h), the multiplicative constant is very large. Consequently, any significant noise in the derivatized data will overwhelm the original trace.

Thus, we see the need to regularize the differentiation process for this application.

Figure 6 shows an example of the raw data for the oxygen consumption rate, which needs to be differentiated for the purposes of response correction. The data consist of samples taken every second for most of a day, for a total of 82,799 samples. Figure 7 shows the result of computing the unregularized, finite-difference derivative; if restricted to the same vertical range as the following plots, the plot would be a solid black rectangle.

The data size is much too large for the explicit-matrix implementation, so we use the implicit approach. In each case, we use 60 iterations, taking about 5 minutes. We compare the total-variation regularized derivative with that computed with H¹ regularization, for two different regularization strengths. First, we use a stronger regularization, with α = 0.1 for the TV case; the result is in Figure 8. We then adjust the parameter for the H¹ regularization until the curve matches the TV result away from the large bump, namely α = 500; see Figure 9. A value of ε = 10⁻⁸ was used in both cases. The TV regularization is much more tolerant of rapid rises and falls, while the H¹ result smooths away this information. We also antidifferentiate the computed derivatives and compare with the original trace, with Figure 10 displaying a zoomed-in portion. The H¹ curve is unable to conform to the peak, as the H¹ regularization term heavily penalizes what appears in this figure as the curvature of the curve. Away from the peak, the integrated derivatives follow the original trace, but not too closely, ignoring small-scale fluctuations. This is the effect of the regularization, with the choice of α determining the scale of fluctuation that is considered insignificant.

The above result would be appropriate if the rapid rise and fall in the derivative were the only feature of interest. Now, we examine the result of weaker regularization, so as to preserve smaller-scale features. In this instance, we use α = 10⁻³ for the TV regularization. As before, we choose α for the H¹ regularization so that the result matches the TV result over most of the time period, in this case α = 1. This time a slightly larger value of ε = 10⁻⁶ was required, in order to offset the poorer conditioning of the linear system solved in the lagged diffusivity algorithm when α is smaller (in general, there is a tradeoff between better conditioning with larger values of ε and greater accuracy with smaller values).

Figures 11 and 12 show the results. Both derivatives capture more oscillations, including more structure in the rapid rise and fall. But the TV result is able to capture a discontinuity in the derivative that the 𝐻1 result smooths away. When we compare the antiderivatives with the original function (Figure 13), we find that they follow the trace much more closely, conforming to smaller-scale fluctuations. Zooming in further, we see the greater curvature penalty on the 𝐻1 curve prevents it from following the sharp corner, thus missing the discontinuous drop in the derivative.

5. Conclusions

We have presented a method for numerical differentiation based on total-variation regularization. Unlike previously developed methods, the TV method allows for discontinuities in the derivative, as desired when differentiating data corresponding to nonsmooth functions. We used the lagged diffusivity algorithm, which enjoys proven convergence properties, with one implementation that works rapidly for small problems and a second more suitable for large problems. The TV regularization allows the derivative to capture more features of the data, while adjusting the regularization parameter controls the scale of fluctuations in the data that are ignored.

Acknowledgments

This work was supported by the U.S. Department of Energy through the LANL/LDRD Program. The authors wish to thank John Lighton of Sable Systems International and Edward L. Melanson of the University of Colorado Denver for the data used in the examples.