Abstract
Smoothing noisy data is commonly encountered in engineering domain, and currently robust penalized regression spline models are perceived to be the most promising methods for coping with this issue, due to their flexibilities in capturing the nonlinear trends in the data and effectively alleviating the disturbance from the outliers. Against such a background, this paper conducts a thoroughly comparative analysis of two popular robust smoothing techniques, the type estimator and estimation for penalized regression splines, both of which are reelaborated starting from their origins, with their derivation process reformulated and the corresponding algorithms reorganized under a unified framework. Performances of these two estimators are thoroughly evaluated from the aspects of fitting accuracy, robustness, and execution time upon the MATLAB platform. Elaborately comparative experiments demonstrate that robust penalized spline smoothing methods possess the capability of resistance to the noise effect compared with the nonrobust penalized LS spline regression method. Furthermore, the estimator exerts stable performance only for the observations with moderate perturbation error, whereas the estimator behaves fairly well even for heavily contaminated observations, but consuming more execution time. These findings can be served as guidance to the selection of appropriate approach for smoothing the noisy data.
1. Introduction
Penalized spline smoothing technique has already been perceived as a popular nonparametric smoothing approach for smoothing noisy data in engineering domain [1–3] during the past 20 years due to its ease of fitting, flexible choice of the inner knots, and smoothing parameter. The ideas of employing the combination of regression splines, with substantially smaller number of inner knots than the sample size, and the introduction of the penalized parameter, which compromises the fitting accuracy and the smoothness of the regression curve, can be traced back to at least O’Sullivan [4] who utilized a cubic Bspline basis for estimation in illposed inverse problems.
During the early stage, Kelly and Rice [5] and Besse et al. [6] approximated the smoothing splines by the hybrid splines equipped with the inner knots equal to the data samples and a penalty parameter for determining the smoothing amount of regression curve, which was usually calculated according to the (generalized) crossvalidation criterion. Other criterions included the Mallows’ criterion [7] and the Akaike information criterion [8, 9]. Besides, Eilers and Marx [10] proposed flexible criteria for choosing an optimal penalty parameter of dealing with the Bsplines coefficients. Wand [11] additionally derived a quick and simple formula approximation to the optimal penalty parameter for the penalized spline regression. Moreover, Ruppert and Carroll [12] investigated the spatially varying penalties and Ruppert [13] recommended the selection of designed inner knots. More discussion and examples on the penalized regression spline models can be referred to the excellent monograph written by Ruppert et al. [14].
The subsequent explorations on penalized spline smoothing remained a hot area arousing great interests from both theory researchers and practice engineers for several years. For example, the theoretical aspects and practical applications of penalized spline smoothing were fully discussed in a dissertation composed by Krivobokova [15] from financial mathematics. Crainiceanu et al. [16] investigated the implementation of nonparametric Bayesian analysis for penalized spline regression using WinBUGS, in which the Markov Chain Monte Carlo (MCMC) mixing operator was substantially improved by employing the low rank thin plate splines basis. Besides, Marra and Radice [2] presented an accessible overview of generalized additive models based on the penalized regression splines and explored their potential applications into medical research. Generally, the penalized regression spline model is referred to the estimation of univariate smooth functions based on noisy data. However, some researchers also put forward its extension into higher dimension. Typical models include the thin plate regression splines [17] which are low rank smoothers obtained by truncating the basis of standard thin plate splines thereby requiring less computation to fit to large data sets and the penalized bivariate splines [18] with application on triangulations for irregular regions.
The regression curve for penalized spline is established by minimizing the sum of squared residuals subject to a bound on the size of the spline coefficients. Accordingly, a closedform formula can be derived from the penalized minimization problem. Apparently, such penalized least squares regression splines cannot be resistant to the disturbance from the outliers. A simple and direct idea is to replace the squared residuals by some slowerincreasing loss function similarly as employed in regression estimator [19, 20] for alleviating the outlier effects from the atypical observations. Early research regarding the type smoothing technique started from Huber et al. [21] and Cox’s [22] work on cubic regression splines. Another robust model can be the estimation [23] which possesses highbreakdown point and minimizes the residual scale in an extremely robust manner. A fast algorithm has been created by SalibianBarrera and Yohai [24] for computing the regression estimates. These two estimators have been elaborately compared in the linear regression by ÇetIn and Toka [25] with the results indicating that the estimator is more capable of resisting the outlier disturbance compared with the estimator but with lower efficiency.
Up until almost a decade ago, however, little work has been delivered taking the penalized spline regression into the category of robust estimation. Oh et al. [26] suggested a robust smoothing approach implemented to the period analysis of variable stars by simply imposing Huber’s favorite loss function upon the generalized crossvalidation criterion. And Lee and Oh [27] proposed an iterative algorithm for calculating the type estimator of penalized regression splines via introducing the empirical pseudodata. Afterwards, Finger [28] confirmed that such type estimator compromised between the very robust estimators and the very efficient least square type estimators for the penalized spline regressions when investigating the performance of different estimation techniques in crop insurance applications. Meanwhile, by replacing the least squares estimation with a suitable estimator, Tharmaratnam et al. [29] put forward an estimation for penalized regression spline as well, which proves to be equivalent to a weighted penalized least squares regression and behaves well even for heavily contaminated observations.
Against such a background, this paper performs a thoroughly comparative analysis mainly based on these two popular robust smoothing techniques, the type estimator and estimation for penalized regression splines, both of which are reelaborated starting from their origins, with their derivation process reformulated and the corresponding algorithms reorganized under a unified framework. The rest of the paper is organized as follows. Section 2 reelaborates the type estimator and estimation for penalized regression splines, starting from the least squares estimator and reorganizes the corresponding algorithms within a unified framework. Section 3 describes the comparative experiments upon both simulated synthetic data and one real weather balloon data set. Concluding remarks are included in the end of Section 4.
2. Robust Penalized Regression Splines
The following discussion is devoted to the reelaboration of penalized regression splines starting from the origin, with their derivation processes reformulated and the corresponding practical algorithms reorganized under a unified framework.
The idea of the penalized regression spline begins with the following model: where is supposed to be a smooth but unknown regression function which needs to be estimated based on the sample observations , and denotes the constant variance of the random deviation error between the response variable and the regression function .
It is a flexible concept as regards the penalized spline smoothing since different basis function may correspond to different penalized spline smoothing regression function. A common selection is the truncated polynomial bases, but other choices can also be explored straightway similarly where and denotes a vector of regression coefficients, is the specified inner knots, and indicates the exponential order for truncated power basis.
2.1. Penalized Least Squares Regression Splines
Given a group of sample observations , an increasingly popular way for obtaining the estimation of is via transforming it into the category of a least squares problem, where we need to seek out the member from the class which minimizes the sum of squared residuals. To prevent overfitting, a constraint on the spline coefficients is always imposed; that is, for some constant as discussed in Ruppert et al. [14]. Thus, by employing the Lagrange multiplier, the coefficients of the penalized least squares regression spline are equivalent to the minimizer of for some smoothing (penalty) parameter with respect to , which compromises the bias and variance tradeoff of the fitted regression spline curve.
Introduce , the spline design matrix the vector of responses , and the diagonal matrix which indicates that only the spline coefficients are penalized. Thus, the closedform minimizer for the objective function (4) can arrive by direct calculations; namely, Consequently, the corresponding estimation vector generated by the penalized least square regression spline can be represented as For figuring out the penalty parameter in (7), the (generalized) crossvalidation technique is employed hereafter, which is computed by leaveoneout of the residual from sum of squares so as to avoid overfitting in the regression spline. The crossvalidation criterion is given by where is the regression spline estimated by leaving out the th observation point. The optimized penalty parameter can be calculated by minimizing over . For the linear smoothing matrix, , and it can be verified [14] that Substitution of in (8) with the formula (9) results in The generalized crossvalidation criterion is proposed [30] simply by replacing in (10) with their average ; namely,
2.2. Type Estimator for Penalized Regression Splines
The above estimate of the penalized least squares regression splines may greatly suffer from various robustness issues especially when the data is contaminated with outliers. For alleviating this effect, a penalized regression estimator can be straightforwardly constructed by replacing the previous squared residual loss function with the following type criterion [27]: where is even, nondecreasing in and , for which a common choice is taken of the Huber loss function with cutoff , Apparently, the above Huber’s function is a parabola in the vicinity of zero, and it increases linearly at a given level . A default choice of the tuning constant aims for a 95% asymptotic efficiency with respect to the standard normal distribution.
Denote the derivative of as . When a set of fitted residuals , from the penalized spline regression are given, a robust scale estimator [19, 20] can be employed for estimating the standard deviation of these residuals as follows: A common choice for a measure of scale with a high finite sample breakdown point can be the value of determined by where is the population median. Besides, the standard estimation of usually employs the median absolute deviation (MAD) statistic, which is defined as where is the usual sample median of the fitted residuals and MAD is actually the sample median of the values , with its finite sample breakdown point approximately being 0.5 [20]. Suppose the fitted residuals are randomly sampled from a normal distribution; note that MAD does not estimate the corresponding standard deviation, , but rather estimates (with being the 0.75 quantile of the standard normal distribution). To put MAD in a more familiar context, it typically needs to be rescaled so as to estimate especially when the residuals are sampled from a normal distribution. In particular, By utilizing the aforementioned spline basis functions, an type penalized spline estimator using the standardized residuals can be computed as , with its regression coefficients being as follows: Due to the nonlinear nature of and , it is nontrivial to seek the minimizer of (18) while satisfying (14). Under such circumstances, an type iterative algorithm for calculating the penalized regression splines is proposed [27] by introducing the empirical pseudodata where is the penalized least square estimator corresponding to the above pseudodata As proved in Theorem 1 of Lee and Oh’s [27] work, the penalized least square estimator converges to asymptotically, which implies an type iterative algorithm consisting of the steps in Algorithm 1.

Regarding the type estimator for the penalized regression splines (see Algorithm 1), the initial curve estimate in Step 0 is suggested [27], choosing the nonrobust penalized least square regression spline presented by (7).
2.3. Estimation for Penalized Regression Splines
The robustness properties of the aforementioned type estimator for penalized regression splines are not completely satisfactory since they have low breakdown point [31], which was clarified by Maronna and Yohai [32] for the general estimators when considering both the regression and scale, simultaneously.
Another more robust and relatively latest approach for smoothing the noisy data is the estimation for penalized regression splines [29], whose core idea is utilizing a flexible estimator to replace the traditional least squares estimator. The coefficient vector of the penalized regression splines is defined as follows: in which, for each , satisfies where is cumulative distribution function of the standard norm distribution, , and the loss function is taken from Tukey’s bisquare family [33] It is evidently known from theoretical assumption that each fitted residual vector should follow a normal distribution with mean of zero; that is, , , however its median statistic, Median, in which , may greatly depart from zero to a large extent in the general practice. Considering this, we ameliorate the estimation of by using MADN in (17) with the addition of the absolute value of residual median; namely, The penalized estimator for the regression spline model is computed as , with its coefficients vector satisfying both (21) and (22), simultaneously. The following formula derivations indicate that such penalized estimator can actually be equivalent to a weighted penalized least squares regression.
Before proceeding with the derivation, it is forewarned that the symbol is simply written as just for clarity. Firstly, when (21) reaches its minimum, the following equation should be satisfied: On the other hand, taking the derivative of the scale function in (22) with respect to leads to in which , , and , .
Deformation of the above formula produces Substituting of (27) into (25), Recall that and import new symbol ; we have And, thus, a further simple deformation arrives at the coefficient vector with an iteration form which forms the basis of constructing an iterative algorithm for penalized regression spline.
The above formula derivations verify that is actually the solution to the following weighted penalized least squares regression problem: And such representation indicates that the GCV criterion in (11) can be employed straightforwardly with response variable , weighted design matrix , and the penalty term . Meanwhile, due to some weight elements being zero, the penalty parameter is thus calculated based on the following Regularized GCV criterion: where and denotes the number of nonzero elements in weight matrix .
Consequently, based on the foregoing indepth discussions, the estimator iterative algorithm for penalized regression splines can be constructed as in Algorithm 2.

In particular, it should be pointed out that since the employed loss function is bounded giving rise to the nonconvexity of , the objective function in (21) may have several critical points, each of which only corresponds to one local minima. Therefore, the convergent critical point of the iterations derived from the above algorithm completely depends on the prescribed initial regression coefficients vector, , in Step 0 of Algorithm 2.
One possible solution to cope with this issue proposed by Tharmaratnam et al. [29] is based on the random sampling theory, which can be briefly described as, firstly, creating random subsamples with a proper size, for example, one fifth of the sample number; namely, , from the original sample observations. Each subsample brings about an initial candidate of the coefficient vector, for instance, assigned by least squares fitting for the penalized regression splines or the type estimator with a small amount of additional computation cost, with the initial regression coefficient vector of the th subsample being denoted by , staring from which the final iterative convergent vector produced through Algorithm 2 is being written as , . Eventually, the global optimum is taken among all these convergent vectors to be the one corresponding to the minimum objective function in (21); namely, Therefore, the robust estimator for penalized regression splines can be computed as .
3. Experimental Performance Evaluation
This section is devoted to the comparative experiments upon both simulated synthetic data and one real weather balloon data set for the foregoing explored penalized spline smoothing methods, including the nonrobust penalized least square regression spline (LS) presented by (7) and two robust smoothing approaches, the type estimator [27] and estimation [29], for penalized regression splines as described in Algorithms 1 and 2.
3.1. Configuration and Criteria
It should be incidentally mentioned that the exponential order for truncated power basis is always employed for all these three methods, and the knot locations are selected in accordance with the recommended choice by Ruppert et al. [14], th sample quantile of the Unique(), ; together with a simple choice of the knot number , where . Besides, the corresponding penalty parameter is determined according to the generalized crossvalidation (GCV) criterion for the LS fitting and estimation methods and the regularized GCV criterion for the estimation method. Meanwhile, the termination tolerance and the maximum iteration number of 100 are assigned for the initial input parameters in Algorithms 1 and 2 and subsamples are randomly generated for exploring the ultimate optimum.
Meanwhile, the hardware equipment and software environment established for the experiments below are Dell XPS 8700 Desktop (CPU: Intel Core i74770 @ 3.40 GHz), MATAB R2014a installed upon Windows 7 SP1 Ultimate (64 Bit).
3.2. Synthetic Data
The simulated study is designed the same as in Lee and Oh [27] with the testing function The designed samples are generated with their horizontal axes being uniformly extracted from the open interval and the corresponding error distribution employing nine possible scenarios with the probability distribution function (pdf) specified if necessary, including (1) uniform distribution, , (2) standard normal distribution, , (3) logistic distribution with its pdf as , (4) double exponential distribution with its pdf as , (5) contaminated normal distribution I, , that is the error constituted with 95% from and 5% from , (6) contaminated normal distribution II, , (7) slash distribution, defined from two statistically independent random variables following (1) and (2), that is, , (8) Cauchy distribution with the pdf as , whose simulated outliers can be generated from uniform distribution by inverse transformation technique [34], and (9) an asymmetric normal mixture distribution, . For the sake of reducing the simulation variability, the sample size is taken as a fixed value, . It should be mentioned incidentally that the above (1)~(8) error scenarios are symmetrically distributed and provided with the ascending order according to the heaviness of their tails, and scenario (9) denotes the standard normal distribution mixed with 10% asymmetric outliers from .
Figure 1 presents the spline regression curves from the penalized LS fitting, estimator, and estimator for one time simulated scattered samples in nine possible scenarios, each of which with a subfigure panel demonstrating the comparative regression results for these three penalized spline smoothing methods.
(a)
(b)
(c)
(d)
(e)
(f)
(g)
(h)
(i)
For proceeding further quantitative evaluation of the comparative analysis for these three penalized spline smoothing methods, the statistical indicator of average squared error (ASE) is utilized here to qualify the goodness of fitting between the regression spline for the th simulation replicate from each of the above outlier scenarios and the true simulated function with the specified replicate number for each distribution.
Performances of the penalized spline smoothing methods are thoroughly evaluated from the aspects of the fitting accuracy, robustness, and execution time. In the following, Figures 2 and 3 illustrate the box plots of both ASEs and code running times using (a) penalized LSestimation, (b) penalized estimation, and (c) penalized estimation for the spline smoothing methods, as well as their median value comparisons of these two indicators for the simulated samples from each of the 9 distribution scenarios.
(a)
(b)
(c)
(d)
(a)
(b)
(c)
(d)
From the ASE comparisons in Figure 2, it can be found that the penalized LSestimation behaves well for the symmetric error scenarios with small tail weights, such as the simple uniform and standard normal distribution. However, it cannot cope with the complex conditions such as the three latter error distributions possessing heavy tails, thereby binging about large ASEs which are specially rescaled to the additive vertical coordinate on the right so as to be accommodated with the ASEs results for other error sources in the same window (refer to panels (a) and (d) in Figure 2). In comparison, the robust penalized type estimation presents satisfactory regression fittings for the distributions with moderate tail weights like (2)~(5) scenarios, which are located at the middle in each panel. Meanwhile, the robust estimator provides the comparatively meritorious smoothing results especially for the complicated circumstances like slash, Cauchy, or the asymmetric distributions with heavy tails.
In terms of the robustness, LSestimator simply does not have the qualifications to be mentioned from this aspect. Instead, the estimator demonstrates stable performance for the designed samples with moderate perturbation error and the estimator behaves dominantly stable for all kinds of error source scenarios.
Unfortunately, as for the execution efficiency, the most timeelapsed one is the estimator, followed by the estimator, with the least consuming estimator being the LSestimator since it has been treated as the initial curve estimate for the two former approaches. Moreover, although both estimator and estimator are composed of similar iterative process, the random sampling theory is further involved in the former model for exploring the ultimate solution for a nonconvex optimization problem.
3.3. Balloon Data
This section is devoted to the comparative experiments of the aforementioned penalized spline smoothing methods upon a set of weather balloon data [35]. The data consist of 4984 observations collected from a balloon at about 30 kilometers altitude, with the outliers caused due to the fact that the balloon slowly rotates, causing the ropes from which the measuring instrument is suspended to cut off the direct radiation from the sun.
As illustrated above, Figure 4 displays these balloon observational data together with the regression fitting curves by the penalized LS method, the robust penalized estimation method, and the robust estimation method. Besides, it can also be explicitly discovered by contrast that the nonrobust LS regression curve suffers from the disturbance of the outliers, and this phenomenon seems even more significantly around the position of horizontal axis being 0.8, while the other two robust splines pass through approximately the majority of the observations. Meanwhile, the estimator seems to behave slightly more superior to the estimator since the former accords with the data trend better especially at the ridge and two terminal nodes.
At the same time, under the platform specified in Section 3.1, the algorithm execution times of this weather balloon data set with 4984 observations are 2.3 s, 35.1 s, and 58.9 s for the penalized LS regression, the penalized type estimator, and the penalized estimator, respectively.
4. Conclusions
This paper conducts a comprehensively comparative analysis of the current two popular robust penalized spline smoothing methods, the penalized type estimator and the penalized estimation, both of which are reelaborated starting from their origins, with their derivation process reformulated and the ultimate algorithms reorganized under a unified framework.
Experiments consist of firstly one simulated synthetic data set with the outliers in 9 possible scenarios and then a practical weather balloon data set for the aforementioned penalized regression spline, whose performances are thoroughly evaluated from the aspects of fitting accuracy, robustness, and execution time.
Conclusions from these experiments are that robust penalized spline smoothing methods can be resistant to the influence of outliers compared with the nonrobust penalized LS spline fitting method. Further, the estimator can exert stable performance only for the observed samples with moderate perturbation error while the estimator behaves dominantly well even for the cases with heavy tail weights or higher percentage of contaminations, but necessary with more executive time. Therefore, taking both the solving accuracy and implementation efficiency of the fitted regression spline into account, type estimator is generally preferred, whereas if only the fitting credibility is of concern, then the estimator naturally deserves to become the first choice.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
Mr. Bin WANG is the beneficiary of a doctoral grant from the AXA Research Fund. Besides, the authors acknowledge the funding support from the Ministry of Science and Technology of China (Project nos. 2012BAJ15B04 and 2012AA12A305) and the National Natural Science Foundation (Project no. 41331175).