Abstract

In many applications, observed signals are contaminated by both random noise and blur. This paper proposes a blind deconvolution procedure for estimating a regression function with possible jumps preserved, by removing both noise and blur when recovering the signals. Our procedure is based on three local linear kernel estimates of the regression function, constructed from observations in a left-side, a right-side, and a two-side neighborhood of a given point, respectively. The estimated function at the given point is then defined by one of the three estimates with the smallest weighted residual sum of squares. To better remove the noise and blur, this estimate can also be updated iteratively. Performance of this procedure is investigated by both simulation and real data examples, from which it can be seen that our procedure performs well in various cases.

1. Introduction

Nonparametric regression analysis provides us statistical tools for estimating regression functions from noisy data [1]. When the underlying regression function has jumps, the estimated functions by conventional nonparametric regression procedures are not statistically consistent at the jump positions. However, the problem to estimate jump regression functions is important because the true regression functions are often discontinuous in applications [2]. This paper focuses on estimation of the jump regression function when the observed data are contaminated by both random noise and blur.

In the literature, there are some existing procedures for estimating regression curves with jumps preserved in cases when only random noise is present in observed data. These procedures include the one-sided kernel estimation methods (e.g., [39]), the local least-squares estimation procedures (e.g., [1012]), the wavelet transformation method [13], the spline smoothing method [14], and the direct estimation methods (e.g., [15, 16]).

In some applications, our observations are both blurred and contaminated by pointwise noise (e.g., signals of groundwater levels in geothermy). It is, therefore, important to remove both noise and blur when estimating the true regression function. In the nonparametric regression literature, we have not seen any discussion about this problem yet. In the context of image processing, which can be regarded as a two-dimensional nonparametric regression problem [2], there is a related research area concerned about deblurring images. Most existing image deblurring procedures assume that the point-spread function (psf), which describes the blurring mechanism, either is known or has a parametric form, and this function is homogeneous in the entire image (e.g., [1720]). In many applications, however, it is difficult to specify the psf completely or partially by a parametric model. Our major goal in this paper is to provide a method to estimate the true regression function properly in cases when both noise and blur are present and the psf is unspecified.

The remaining part of the paper is organized as follows. In next section, our proposed method is discussed in detail. Some comparative results are presented in Section 3 with several simulated examples. A real data application is presented in Section 4. Some concluding remarks are included in Section 5.

2. Our Proposed Method

Suppose that the regression model concerned is where are design points, are i.i.d. random noise with mean 0 and variance , and is an unknown nonparametric regression function that is continuous in except on some jump points . In (2.1), is a psf, and denotes the convolution between and , defined by

If is not zero at the origin and zero everywhere else, then there is no blurring in the observed curve. In such cases, model (2.1) is the conventional nonparametric regression model. Generally speaking, the problem described by model (2.1) is ill posed if both and are unknown, because infinite sets of and would correspond to the same response in such cases. Therefore, proper estimation of based on (2.1) is a challenging problem. As a demonstration, in Figure 1(a), the solid lines denote a true regression function with one jump point at , the dotted curve denotes the blurred regression function , and the small pluses denote a set of observations from model (2.1).

To estimate , by the conventional local linear kernel (LLK) smoothing [1], we would consider the problem where is a density kernel function with support and is a bandwidth parameter. Then the solution of (2.3) to , denoted as , is defined as the LLK estimator of . In Figure 1(b), the dotted curve denotes the blurred regression function, and the dashed curve denotes the LLK estimate of . It can be seen that the LLK estimate removes the noise well; but the blur is not removed and it is actually made more serious around the jump point.

Qiu [15] finds that the major reason why the conventional LLK estimate could not preserve the jump at the jump point is that it uses a “continuous” function (i.e., a linear function) locally to approximate the jump regression function. To overcome this limitation, Qiu suggests fitting a piecewise linear function around as follows: where is an indicator function defined by if and otherwise. The solution to and are denoted as and . It is easy to see that and are actually LLK estimates of constructed from observations in the left-sided neighborhood and the right-sided neighborhood , respectively, with kernel functions and . Then, Qiu suggests the following jump-preserving estimate of : where and are the Weighted Residual Mean Squares(WRMSs)of the left-sided and right-sided estimates, respectively, defined by

Qiu [15] proves that is a consistent estimate of when there is no blurring in the observed data.

Since only part of observations is actually used in , this estimator would be quite noisy in continuity regions of . To overcome this problem, similar to the method in [16], we propose a modification as follows: By (2.7), when is far away from any jump points, would be estimated by the conventional LLK estimate. It would still be estimated by one of the one-sided estimates around the jump points. Therefore, it is expected that would be better for estimating than . The solid curve in Figure 2 denotes constructed from the data shown in either plot of Figure 1. The two dotted curves show and , respectively. It can be seen that indeed estimates well in this case.

The estimate defined in (2.7) can also be updated iteratively as follows. The estimated values can be used as observed data, and the estimate can be updated by (2.7) with all quantities on the right-hand side of (2.7) computed from , and this process can continue iteratively. Numerical results in the next section suggest that a good estimate can usually be generated after about 5 iterations in all the cases considered there.

In our procedure, the bandwidth parameter should be chosen properly. To this end, we use the following cross-validation (CV) procedure:

where is the estimate of using all of the data points except the th point .

3. Simulation Study

In this section, some simulated examples are presented concerning the numerical performance of our proposed procedure. In all numerical examples presented in this paper, the Epanechnikov kernel function is used. We consider the following two true regression functions:

The function has two jumps at 0.4, 0.8, and a roof discontinuity (i.e., jump in the slope) at 0.2. The function has a jump at 0.78, and an unbalanced cusp (i.e., a sharp angle) at 0.26. These functions are shown by the solid curves in Figure 3. Our observations are generated from model (2.1) with random noise from the distribution and the psf . One set of observations from each regression function is shown by the little pluses in Figure 3.

For the proposed procedure, its Mean-Squared Error (MSE) values with several different combinations are presented in Figure 4 as functions of the number of iterations, where the bandwidth is chosen by the CV procedure in each iteration. All of the MSE values are computed based on 100 replications. From the plots in Figure 4, it can be seen that, for each combination, MSE values first decrease and then increase with the iteration number. The optimal number of iteration is around in each case, which is the number that we recommend to use in applications.

Next, we compare the proposed procedure (denoted as NEW) with the conventional local linear kernel (LLK) smoothing procedure and the jump-preserving curve estimation (JPCE) procedure by Qiu [15]. Figure 5 presents the estimated regression functions by all three methods from the observed data shown in Figure 3. For procedure NEW, 5 iterations are used. From the plots in Figure 5, it can be seen that LLK blurs the jumps, JPCE preserves the jumps well but its estimates are quite noisy in continuity regions, and our proposed procedure NEW preserves the jumps well and also removes noise efficiently.

Tables 1 and 2 present the MSE values and the corresponding standard errors of the three methods in various cases. We use 5 iterations in the proposed procedure NEW. From Tables 1 and 2, it can be seen that procedure NEW performs the best in all cases.

4. An Application

In this section, we apply our proposed method to a groundwater level data. Possible jumps in groundwater level arise from changes in subsurface fluid currents, which has become an important predictor of earthquakes. In Figure 6, little pluses denote the groundwater level observed by the Seismograph Network Stations of China Earthquake Center during January and May 2007. The solid curve denotes the estimated regression curve by our proposed procedure in which all parameters are chosen in the same way as in the simulation examples presented in Section 3. As indicated by the plot, the jumps around Mar 23, Aril 10, and Aril 17 are preserved well by our procedure. We checked the earthquake history and it is confirmed by China Earthquake Center that earthquakes with magnitude of more than 4.0 occurred in these periods in the area of observation.

5. Concluding Remarks

We have presented a blind deconvolution procedure for jump-preserving curve estimation when both noise and blur are present in the observed data. Numerical results show that it performs well in various cases. However, theoretical properties of the proposed method are not available yet, which requires much future research. We believe that the proposed method can be generalized to two-dimensional cases to solve problems such as image deblurring and restoration.

Acknowledgments

This research was supported in part by an NSF grant. The authors thank the guest editor of this special issue, Professor Ming Li, for help during the paper submission process. They also thank the three anonymous referees for their careful reading of the paper.