Advances in Decision Sciences
Volume 2011 (2011), Article ID 485974, 22 pages
Research Article

Some Asymptotic Theory for Functional Regression and Classification

Department of Mathematics and Statistics, Texas Tech University, Lubbock, TX 79409, USA

Received 10 October 2011; Accepted 2 November 2011

Academic Editor: Wing Keung Wong

Copyright © 2011 Frits Ruymgaart et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Exploiting an expansion for analytic functions of operators, the asymptotic distribution of an estimator of the functional regression parameter is obtained in a rather simple way; the result is applied to testing linear hypotheses. The expansion is also used to obtain a quick proof for the asymptotic optimality of a functional classification rule, given Gaussian populations.

1. Introduction

Certain functions of the covariance operator (such as the square root of a regularized inverse) are important components of many statistics employed for functional data analysis. If $\Sigma$ is a covariance operator on a Hilbert space, $\hat\Sigma$ a sample analogue of this operator, and $\varphi$ a function on the complex plane which is analytic on a domain containing a contour around the spectrum of $\Sigma$, a tool of generic importance is the comparison of $\varphi(\hat\Sigma)$ and $\varphi(\Sigma)$ by means of a Taylor expansion:
$$\varphi(\hat\Sigma)=\varphi(\Sigma)+\dot\varphi_\Sigma(\hat\Sigma-\Sigma)+\text{remainder}. \tag{1.1}$$
(It should be noted that $\dot\varphi_\Sigma$ is not in general equal to $\varphi'(\Sigma)$, where $\varphi'$ is the numerical derivative of $\varphi$; see also Section 3.) In this paper, two further applications of the approximation in (1.1) will be given, both related to functional regression.
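In finite dimensions, the expansion (1.1) can be checked numerically. The following sketch is illustrative only, not code from the paper: it uses a synthetic covariance-like matrix, the convenient choice $\varphi(z)=z/(\delta+z)$, and a central difference as a stand-in for the Fréchet derivative term, so that the residual reflects the $O(\|\Pi\|^2)$ remainder in (1.1).

```python
import numpy as np

# Hypothetical finite-dimensional sketch of expansion (1.1). All data are
# synthetic; phi(z) = z/(delta + z) is chosen for convenience.
rng = np.random.default_rng(0)
delta = 0.5

def phi(S):
    # functional calculus: apply phi to the eigenvalues, keep the eigenvectors
    w, V = np.linalg.eigh(S)
    return (V * (w / (delta + w))) @ V.T

A = rng.standard_normal((5, 5))
Sigma = A @ A.T / 5                          # covariance-like matrix
Pi = 1e-4 * rng.standard_normal((5, 5))
Pi = (Pi + Pi.T) / 2                         # small symmetric perturbation

increment = phi(Sigma + Pi) - phi(Sigma)
derivative_term = (phi(Sigma + Pi) - phi(Sigma - Pi)) / 2   # ~ dphi_Sigma(Pi)
remainder = increment - derivative_term                     # O(||Pi||^2)
```

The central difference approximates $\dot\varphi_\Sigma\Pi$ up to $O(\|\Pi\|^3)$, so the residual above is dominated by the quadratic remainder of the expansion, which is much smaller than the increment itself.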

The first application (Section 4) concerns the functional regression estimator itself. Hall and Horowitz [1] have shown that the IMSE of their estimator, based on a Tikhonov type regularized inverse, is rate optimal. In this paper, as a complementary result, the general asymptotic distribution is obtained, with potential application to testing linear hypotheses of arbitrary finite dimension, mentioned in Cardot et al. [2] as an open problem: these authors concentrate on testing a simple null hypothesis. Cardot et al. [3] establish convergence in probability and almost sure convergence of their estimator, which is based on spectral cutoff regularization of the inverse of the sample covariance operator. In the present paper, the covariance structure of the Gaussian limit will be completely specified. The proof turns out to be routine thanks to a "delta-method'' for $\varphi(\hat\Sigma)-\varphi(\Sigma)$, which is almost immediate from (1.1).

The second application (Section 5) concerns functional classification, according to a slight modification of a method by Hastie et al. [4], exploiting penalized functional regression. It will be shown that this method is asymptotically optimal (Bayes) when the two populations are represented by equivalent Gaussian distributions with the same covariance operator. The simple proof is based on an upper bound for the norm of $\varphi(\hat\Sigma)-\varphi(\Sigma)$, which follows at once from (1.1).

Let us conclude this section with some comments and further references. The expansion in (1.1) can be found in Gilliam et al. [5], and the ensuing delta method is derived and applied to regularized canonical correlation in Cupidon et al. [6]. For functional canonical correlation see also Eubank and Hsing [7], He et al. [8], and Leurgans et al. [9]. When the perturbation ($\hat\Sigma-\Sigma$ in the present case) commutes with $\Sigma$, the expansion (1.1) can already be found in Dunford & Schwartz [10, Chapter VII], and the derivative does indeed reduce to the numerical derivative. This condition is fulfilled only in very special cases, for instance, when the random function, whose covariance operator is $\Sigma$, is a second order stationary process on the unit interval. In this situation, the eigenfunctions are known and only the eigenvalues are to be estimated. This special case, which will not be considered here, is discussed in Johannes [11], who in particular deals with regression function estimators and their IMSE in Sobolev norms, when the regressor is such a stationary process. General information about functional data analysis can be found in the monographs by Ramsay and Silverman [12] and Ferraty and Vieu [13]. Functional time series are considered in Bosq [14]; see also Mas [15].

2. Preliminaries and Introduction of the Models

2.1. Preliminaries

As will be seen in the examples below, it is expedient to consider functional data as elements of an abstract Hilbert space $\mathbb{H}$, infinite dimensional, separable, and over the real numbers. Inner product and norm in $\mathbb{H}$ will be denoted by $\langle\cdot,\cdot\rangle$ and $\|\cdot\|$, respectively. Let $(\Omega,\mathcal{F},\mathbb{P})$ be a probability space, $X:\Omega\to\mathbb{H}$ a Hilbert space valued random variable (i.e., measurable with respect to the $\sigma$-field $\mathcal{B}$ of Borel sets in $\mathbb{H}$), and $\eta:\Omega\to\mathbb{R}$ a real valued random variable. For all that follows it will be sufficient to assume that
$$\mathbb{E}\|X\|^4<\infty,\qquad \mathbb{E}\eta^2<\infty. \tag{2.1}$$
The mean and covariance operator of $X$ will be denoted by
$$\mathbb{E}X=\mu_X,\qquad \mathbb{E}(X-\mu_X)\otimes(X-\mu_X)=\Sigma_{X,X}, \tag{2.2}$$
respectively, where $a\otimes b$ is the tensor product in $\mathbb{H}$. The Riesz representation theorem guarantees that these quantities are uniquely determined by the relations
$$\mathbb{E}\langle a,X\rangle=\langle a,\mu_X\rangle,\quad a\in\mathbb{H},\qquad \mathbb{E}\langle a,X-\mu_X\rangle\langle X-\mu_X,b\rangle=\langle a,\Sigma_{X,X}b\rangle,\quad a,b\in\mathbb{H}; \tag{2.3}$$
see Laha & Rohatgi [16]. Throughout, $\Sigma_{X,X}$ is assumed to be one-to-one.

Let $\mathcal{L}$ denote the Banach space of all bounded linear operators $T:\mathbb{H}\to\mathbb{H}$, equipped with the norm $\|\cdot\|_{\mathcal{L}}$. An operator $U\in\mathcal{L}$ is called Hilbert-Schmidt if
$$\sum_{k=1}^{\infty}\|U e_k\|^2<\infty, \tag{2.4}$$
for any orthonormal basis $e_1,e_2,\dots$ of $\mathbb{H}$. (The number in (2.4) is in fact independent of the choice of basis.) The subspace $\mathcal{L}_{HS}\subset\mathcal{L}$ of all Hilbert-Schmidt operators is a Hilbert space in its own right with the inner product
$$\langle U,V\rangle_{HS}=\sum_{k=1}^{\infty}\langle U e_k,V e_k\rangle, \tag{2.5}$$
again independent of the choice of basis. This inner product yields the norm
$$\|U\|_{HS}^2=\sum_{k=1}^{\infty}\|U e_k\|^2, \tag{2.6}$$
which is the number in (2.4). The tensor product for elements $a,b\in\mathbb{H}$ will be denoted by $a\otimes b$, and that for elements $U,V\in\mathcal{L}_{HS}$ by $U\otimes_{HS}V$.

The two problems to be considered in this paper both deal with cases where the best predictor of $\eta$ in terms of $X$ is linear:
$$\mathbb{E}(\eta\mid X)=\alpha+\langle X,f\rangle,\quad \alpha\in\mathbb{R},\ f\in\mathbb{H}. \tag{2.7}$$
Just as in the univariate case (Rao [17, Section 4g]), we have the relation
$$\Sigma_{X,X}f=\mathbb{E}(\eta-\mu_\eta)(X-\mu_X)=\Sigma_{X,\eta}. \tag{2.8}$$
It should be noted that if $\Sigma_{X,X}$ is one-to-one and $\Sigma_{X,\eta}$ lies in its range, we can solve (2.8) and obtain
$$f=\Sigma_{X,X}^{-1}\Sigma_{X,\eta}. \tag{2.9}$$

Since the underlying distribution is arbitrary, the empirical distribution of a sample $(X_1,\eta_1),\dots,(X_n,\eta_n)$ of independent copies of $(X,\eta)$ can be substituted for it. The minimization property is now the least squares property, and the same formulas are obtained with $\mu_X$, $\Sigma_{X,X}$, $\mu_\eta$, and $\Sigma_{X,\eta}$ replaced with their estimators
$$\hat\mu_X=\frac{1}{n}\sum_{i=1}^{n}X_i=\bar X, \tag{2.10}$$
$$\hat\Sigma_{X,X}=\frac{1}{n}\sum_{i=1}^{n}(X_i-\bar X)\otimes(X_i-\bar X), \tag{2.11}$$
$$\hat\mu_\eta=\frac{1}{n}\sum_{i=1}^{n}\eta_i=\bar\eta, \tag{2.12}$$
$$\hat\Sigma_{X,\eta}=\frac{1}{n}\sum_{i=1}^{n}(\eta_i-\bar\eta)(X_i-\bar X). \tag{2.13}$$
Let us next specify the two problems.
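On a discretization grid, the estimators (2.10)-(2.13) become ordinary matrix and vector averages. A hypothetical sketch (synthetic curves built from a few sine modes; all names and parameter values are illustrative, not from the paper):

```python
import numpy as np

# Hypothetical discretized sketch of (2.10)-(2.13): each curve X_i is
# represented by its values on an m-point grid, so the covariance operator
# is represented by an m x m matrix (its kernel evaluated on the grid).
rng = np.random.default_rng(1)
n, m = 200, 50
t = np.linspace(0.0, 1.0, m)
basis = np.vstack([np.sin(np.pi * t), np.sin(2 * np.pi * t), np.sin(3 * np.pi * t)])
X = rng.standard_normal((n, 3)) @ basis          # synthetic random curves
eta = X.mean(axis=1) + 0.1 * rng.standard_normal(n)

mu_hat = X.mean(axis=0)                          # (2.10)
Xc = X - mu_hat
Sigma_hat = Xc.T @ Xc / n                        # (2.11): sample covariance kernel
eta_bar = eta.mean()                             # (2.12)
Sigma_Xeta_hat = Xc.T @ (eta - eta_bar) / n      # (2.13): cross-covariance
```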

2.2. Functional Regression Estimation

The model here is
$$\eta=\alpha+\langle X,f\rangle+\varepsilon, \tag{2.14}$$
where $\varepsilon$ is a real valued error variable and the following assumption is satisfied.

Assumption 2.1. The error variable has a finite second moment, and
$$\varepsilon\perp\!\!\!\perp X,\qquad \mathbb{E}\varepsilon=0,\qquad \operatorname{Var}\varepsilon=v^2<\infty. \tag{2.15}$$

Example 2.2. The functional regression model in Hall and Horowitz [1] is essentially obtained by choosing $\mathbb{H}=L^2(0,1)$, so that the generic observation is given by
$$\eta=\alpha+\int_0^1 X(t)f(t)\,dt+\varepsilon. \tag{2.16}$$

Example 2.3. Mas and Pumo [18] argue that in the situation of Example 2.2, the derivative $X'$ of $X$ may contain important information and should therefore be included. Hence, these authors suggest choosing for $\mathbb{H}$ the Sobolev space $W^{2,1}(0,1)$, in which case the generic observation satisfies
$$\eta=\alpha+\int_0^1 X(t)f(t)\,dt+\int_0^1 X'(t)f'(t)\,dt+\varepsilon. \tag{2.17}$$

Example 2.4. Just as in the univariate case, we have that the model
$$\eta=\alpha+\langle X,f\rangle^2+\varepsilon, \tag{2.18}$$
quadratic in the inner product of $\mathbb{H}$, is in fact linear in the inner product of $\mathcal{L}_{HS}$, because
$$\langle X,f\rangle^2=\langle X\otimes X,f\otimes f\rangle_{HS}. \tag{2.19}$$
We will not pursue this example here.

In the infinite dimensional case $\hat\Sigma_{X,X}$ cannot be one-to-one, and in order to estimate $f$ from the sample version of (2.9), a regularized inverse of Tikhonov type will be used, as in Hall & Horowitz [1]. Thus, we arrive at the estimator (see also (2.11) and (2.13))
$$\hat f_\delta=(\delta I+\hat\Sigma_{X,X})^{-1}\frac{1}{n}\sum_{i=1}^{n}\eta_i(X_i-\bar X)=(\delta I+\hat\Sigma_{X,X})^{-1}\hat\Sigma_{X,\eta},\quad\text{for some }\delta>0. \tag{2.20}$$
Let us also introduce
$$f_\delta=(\delta I+\Sigma)^{-1}\Sigma f,\quad f\in\mathbb{H}. \tag{2.21}$$
In Section 4, the asymptotic distribution of this estimator will be obtained, and the result will be applied to testing.
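A minimal numerical sketch of (2.20), assuming curves observed on an equispaced grid so that the covariance operator acts through the quadrature weight $h$; the data-generating process and the choice $\delta=10^{-2}$ are illustrative assumptions, not taken from the paper:

```python
import numpy as np

# Hypothetical sketch of the Tikhonov-regularized estimator (2.20) on a grid.
rng = np.random.default_rng(2)
n, m = 500, 60
t = np.linspace(0.0, 1.0, m)
h = t[1] - t[0]                                       # quadrature weight
f_true = np.sin(2 * np.pi * t)
# Brownian-motion-like regressor curves (illustrative choice)
X = np.cumsum(rng.standard_normal((n, m)), axis=1) * np.sqrt(h)
eta = h * X @ f_true + 0.05 * rng.standard_normal(n)  # eta_i = <X_i, f> + eps_i

Xc = X - X.mean(axis=0)
Sigma_hat = h * Xc.T @ Xc / n               # discretized covariance operator
g_hat = Xc.T @ (eta - eta.mean()) / n       # discretized Sigma_hat_{X,eta}
delta = 1e-2
# (2.20): f_hat_delta = (delta I + Sigma_hat)^{-1} Sigma_hat_{X,eta}
f_hat = np.linalg.solve(delta * np.eye(m) + Sigma_hat, g_hat)
```

With $\delta$ fixed, $\hat f_\delta$ targets the smoothed parameter $f_\delta$ of (2.21) rather than $f$ itself, which is why the asymptotics in Section 4 center at $f_\delta$.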

2.3. Functional Classification

The method discussed here is essentially the one in Hastie et al. [4] and Hastie et al. [19, Sections 4.2 and 4.3]. Let $P_1$ and $P_2$ be two probability distributions on $(\mathbb{H},\mathcal{B})$ with means $\mu_1$ and $\mu_2$ and common covariance operator $\Sigma$. Consider a random element $(I,X):\Omega\to\{1,2\}\times\mathbb{H}$ with distribution determined by
$$\mathbb{P}\{X\in B\mid I=j\}=P_j(B),\ B\in\mathcal{B},\qquad \mathbb{P}\{I=j\}=\pi_j>0,\quad \pi_1+\pi_2=1. \tag{2.22}$$
In this case, the distribution of $X$ is $\pi_1P_1+\pi_2P_2$, with mean
$$\mu_X=\pi_1\mu_1+\pi_2\mu_2, \tag{2.23}$$
and covariance operator
$$\Sigma_{X,X}=\Sigma+\pi_1\pi_2(\mu_1-\mu_2)\otimes(\mu_1-\mu_2). \tag{2.24}$$

Hastie et al. [19] now introduce the indicator response variables $\eta_j=1_{\{j\}}(I)$, $j=1,2$, and assume that the $\eta_j$ satisfy (2.7) for suitable $\alpha_j\in\mathbb{R}$ and $f_j\in\mathbb{H}$. Note that
$$\mu_{\eta_j}=\mathbb{E}\eta_j=\pi_j, \tag{2.25}$$
$$\Sigma_{X,\eta_j}=\mathbb{E}\eta_j(X-\mu_X)=\begin{cases}\pi_1\pi_2(\mu_1-\mu_2),& j=1,\\ \pi_1\pi_2(\mu_2-\mu_1),& j=2.\end{cases} \tag{2.26}$$
Since $\eta_j$ is Bernoulli, we have, of course, $\mathbb{E}(\eta_j\mid X)=\mathbb{P}\{I=j\mid X\}$. Precisely as for matrices (Rao & Toutenburg [20, Theorem A.18]), the inverse of the operator in (2.24) equals
$$\Sigma_{X,X}^{-1}=\Sigma^{-1}-\gamma\,\Sigma^{-1}(\mu_1-\mu_2)\otimes(\mu_1-\mu_2)\Sigma^{-1}, \tag{2.27}$$
where $\gamma=\pi_1\pi_2/\bigl(1+\pi_1\pi_2\langle\mu_1-\mu_2,\Sigma^{-1}(\mu_1-\mu_2)\rangle\bigr)$, provided that the following assumption is satisfied.

Assumption 2.5. The vector $\mu_1-\mu_2$ lies in the range of $\Sigma$; that is,
$$\Sigma^{-1}(\mu_1-\mu_2)\ \text{is well defined}. \tag{2.28}$$
It will also be assumed that
$$\pi_1=\pi_2=\tfrac12. \tag{2.29}$$

Assuming (2.28), (2.8) can be solved and yields after some algebra
$$f_j=\Sigma_{X,X}^{-1}\Sigma_{X,\eta_j}=\begin{cases}\gamma\,\Sigma^{-1}(\mu_1-\mu_2),& j=1,\\ \gamma\,\Sigma^{-1}(\mu_2-\mu_1),& j=2.\end{cases} \tag{2.30}$$
If only $X$, and not $I$, is observed, the rule in Hastie et al. [19] assigns $X$ to $P_1$ if and only if
$$\mathbb{E}(\eta_1\mid X)>\mathbb{E}(\eta_2\mid X), \tag{2.31}$$
which can be rewritten as
$$\bigl\langle X-\mu_X,\Sigma^{-1}(\mu_1-\mu_2)\bigr\rangle>\frac{\pi_2-\pi_1}{2\gamma}. \tag{2.32}$$
Because of assumption (2.29), the rule reduces to
$$\bigl\langle X-\tfrac12(\mu_1+\mu_2),\Sigma^{-1}(\mu_1-\mu_2)\bigr\rangle>0. \tag{2.33}$$

Hastie et al. [19] claim that in the finite dimensional case, their rule reduces to Fisher's linear discriminant rule, and to the usual rule when the distributions are normal. This remains in fact true in the present infinite dimensional case. Let us assume that
$$P_j=\mathcal{G}(\mu_j,\Sigma),\quad j=1,2, \tag{2.34}$$
where $\mathcal{G}(\mu,\Sigma)$ denotes a Gaussian distribution with mean $\mu$ and covariance operator $\Sigma$. It is well known [21-23] that under Assumption 2.5 these Gaussian distributions are equivalent. This is important since there is no "Lebesgue measure'' on $\mathbb{H}$ [24]. However, now the density of $P_1$ with respect to $P_2$ can be considered; it is well known that
$$\frac{dP_1}{dP_2}(x)=e^{\langle x-\frac12(\mu_1+\mu_2),\,\Sigma^{-1}(\mu_1-\mu_2)\rangle},\quad x\in\mathbb{H}. \tag{2.35}$$
This leads at once to (2.33) as an optimal (Bayes) rule, equal in appearance to the one for the finite dimensional case.

In most practical situations, $\mu_1$, $\mu_2$, and $\Sigma$ are not known, but a training sample $(I_1,X_1),\dots,(I_n,X_n)$ of independent copies of $(I,X)$ is given. Let
$$\bar X_j=\frac{1}{n_j}\sum_{i\in\mathcal{J}_j}X_i,\qquad \mathcal{J}_j=\{i: I_i=j\},\quad \#\mathcal{J}_j=n_j, \tag{2.36}$$
$$\hat\Sigma=\frac{1}{n}\sum_{j=1}^{2}\sum_{i\in\mathcal{J}_j}(X_i-\bar X_j)\otimes(X_i-\bar X_j), \tag{2.37}$$
and we have (cf. (2.24))
$$\hat\Sigma_{X,X}=\hat\Sigma+\frac{n_1n_2}{n^2}(\bar X_1-\bar X_2)\otimes(\bar X_1-\bar X_2). \tag{2.38}$$

Once again the operator $\hat\Sigma$ (and $\hat\Sigma_{X,X}$, for that matter) cannot be one-to-one. In order to obtain an empirical analogue of the rule (2.32), Hastie et al. [4] employ penalized regression, and Hastie et al. [19] also suggest using a regularized inverse. (The methods are related.) Here the latter method will be used, and $X$ will be assigned to $P_1$ if and only if
$$\bigl\langle X-\tfrac12(\bar X_1+\bar X_2),(\delta I+\hat\Sigma)^{-1}(\bar X_1-\bar X_2)\bigr\rangle>0. \tag{2.39}$$
Section 5 is devoted to showing that this rule is asymptotically optimal when Assumption 2.5 is fulfilled.
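The empirical rule (2.39) is straightforward to implement on discretized curves. A hypothetical sketch with two synthetic Gaussian populations sharing a common covariance (all means, sample sizes, and the value of $\delta$ are illustrative assumptions):

```python
import numpy as np

# Hypothetical discretized sketch of the classification rule (2.39).
rng = np.random.default_rng(3)
m = 40
t = np.linspace(0.0, 1.0, m)
h = t[1] - t[0]
mu1, mu2 = np.sin(np.pi * t), -np.sin(np.pi * t)

def sample(mu, n):
    # group mean plus Brownian-motion-like fluctuations (common covariance)
    return mu + np.cumsum(rng.standard_normal((n, m)), axis=1) * np.sqrt(h)

X1, X2 = sample(mu1, 100), sample(mu2, 100)
Xbar1, Xbar2 = X1.mean(axis=0), X2.mean(axis=0)
pooled = np.vstack([X1 - Xbar1, X2 - Xbar2])
Sigma_hat = h * pooled.T @ pooled / pooled.shape[0]   # discretized (2.37)
delta = 1e-2
w = np.linalg.solve(delta * np.eye(m) + Sigma_hat, Xbar1 - Xbar2)

def classify(x):
    # rule (2.39): assign to population 1 iff the inner product is positive
    return 1 if h * (x - (Xbar1 + Xbar2) / 2) @ w > 0 else 2

acc = np.mean([classify(x) == 1 for x in X1] + [classify(x) == 2 for x in X2])
```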

3. A Review of Some Relevant Operator Theory

It is well known [16] that the covariance operator $\Sigma$ is nonnegative, Hermitian, and of finite trace, hence Hilbert-Schmidt and therefore compact. The assumption that $\Sigma$ is one-to-one is equivalent to assuming that $\Sigma$ is strictly positive. Consequently, $\Sigma$ has eigenvalues $\sigma_1^2>\sigma_2^2>\cdots>0$, all of finite multiplicity. If we let $P_1,P_2,\dots$ be the corresponding eigenprojections, so that $\Sigma P_k=\sigma_k^2P_k$, we have the spectral representation
$$\Sigma=\sum_{k=1}^{\infty}\sigma_k^2P_k,\quad\text{with }\sum_{k=1}^{\infty}P_k=I. \tag{3.1}$$

The spectrum of $\Sigma$ equals $\sigma(\Sigma)=\{0,\sigma_1^2,\sigma_2^2,\dots\}\subset[0,\sigma_1^2]$. Let us introduce a rectangular contour $\Gamma$ around the spectrum as in Figure 1, where $\delta>0$ is the regularization parameter in (2.20), and let $\Omega$ be the open region enclosed by $\Gamma$. Furthermore, let $D\supset(\Omega\cup\Gamma)=\bar\Omega$ be an open neighborhood of $\bar\Omega$ and suppose that
$$\varphi: D\to\mathbb{C}\ \text{is analytic}. \tag{3.2}$$

Figure 1

We are interested in approximations of $\varphi(\hat\Sigma)=\varphi(\Sigma+\Pi)$, where $\Pi$ is a perturbation. The application we have in mind arises for $\Pi=\hat\Pi=\hat\Sigma-\Sigma$ and yields an approximation of $\varphi(\hat\Sigma)$; see also Watson [25] for the matrix case. Therefore, we will not in general assume that $\Pi$ and $\Sigma$ commute. In the special case where $X$ is stationary, as considered in Johannes [11], there exists a simpler estimator $\tilde\Sigma$ of $\Sigma$, such that $\Sigma$ and the corresponding perturbation do commute, which results in a simpler theory; see also Remark 4.8.

The resolvent of $\Sigma$,
$$R(z)=(zI-\Sigma)^{-1},\quad z\in\rho(\Sigma)=[\sigma(\Sigma)]^c, \tag{3.3}$$
is analytic on the resolvent set $\rho(\Sigma)$, and the operator
$$\varphi(\Sigma)=\frac{1}{2\pi i}\oint_\Gamma\varphi(z)R(z)\,dz \tag{3.4}$$
is well defined. For the present operator $\Sigma$, as given in (3.1), the resolvent equals more explicitly
$$R(z)=\sum_{k=1}^{\infty}\frac{1}{z-\sigma_k^2}P_k. \tag{3.5}$$
Substitution of (3.5) in (3.4) and application of the Cauchy integral formula yields
$$\varphi(\Sigma)=\sum_{k=1}^{\infty}\varphi(\sigma_k^2)P_k. \tag{3.6}$$
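For a finite-dimensional symmetric $\Sigma$, the functional calculus (3.6) amounts to applying $\varphi$ to the eigenvalues. A quick illustrative check (synthetic matrix; with $\varphi_1(z)=1/(\delta+z)$, $\varphi_1(\Sigma)$ must coincide with the Tikhonov-regularized inverse):

```python
import numpy as np

# Illustrative check of (3.6) in finite dimensions.
rng = np.random.default_rng(4)
A = rng.standard_normal((6, 6))
Sigma = A @ A.T / 6                      # symmetric nonnegative definite
delta = 0.3
phi = lambda z: 1.0 / (delta + z)        # phi_1 from (3.7) below

w, V = np.linalg.eigh(Sigma)
# (3.6): phi(Sigma) = sum_k phi(sigma_k^2) P_k with rank-one projections P_k
phi_Sigma = sum(phi(lam) * np.outer(v, v) for lam, v in zip(w, V.T))
reg_inverse = np.linalg.inv(delta * np.eye(6) + Sigma)
```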

Example 3.1. The two functions
$$\varphi_1(z)=\frac{1}{\delta+z},\qquad \varphi_2(z)=\frac{z}{\delta+z},\quad z\in\mathbb{C}\setminus\{-\delta\}, \tag{3.7}$$
are analytic on a domain that satisfies the conditions. With the help of these functions we may write (cf. (2.20) and (2.21))
$$\hat f_\delta-f_\delta=\varphi_1(\hat\Sigma)\frac{1}{n}\sum_{i=1}^{n}\varepsilon_i(X_i-\bar X)+\bigl(\varphi_2(\hat\Sigma)-\varphi_2(\Sigma)\bigr)f,\quad f\in\mathbb{H}, \tag{3.8}$$
and (cf. (2.39))
$$(\delta I+\hat\Sigma)^{-1}(\bar X_1-\bar X_2)=\varphi_1(\hat\Sigma)(\bar X_1-\bar X_2). \tag{3.9}$$

Regarding the following brief summary and slight extension of some of the results in [5], we also refer to Dunford & Schwartz [10], Kato [26], and Watson [25]. Henceforth, we will assume that
$$\|\Pi\|\le\frac{\delta}{4}. \tag{3.10}$$
For such perturbations, we have $\sigma(\hat\Sigma)=\sigma(\Sigma+\Pi)\subset\Omega$, so that the resolvent set of $\hat\Sigma$ satisfies
$$\rho(\hat\Sigma)=\rho(\Sigma+\Pi)\supset\Omega^c\supset\Gamma. \tag{3.11}$$
It should also be noted that
$$\|R(z)\|=\sup_{k\ge1}\frac{1}{|z-\sigma_k^2|}\le\frac{2}{\delta},\quad z\in\Omega^c. \tag{3.12}$$

The basic expansion (similar to Watson [25])
$$\hat R(z)=(zI-\hat\Sigma)^{-1}=R(z)+\sum_{k=1}^{\infty}R(z)(\Pi R(z))^k,\quad z\in\Omega^c, \tag{3.13}$$
can be written as
$$\hat R(z)=R(z)+R(z)\Pi R(z)(I-\Pi R(z))^{-1}, \tag{3.14}$$
useful for analyzing the error probability for $\delta\downarrow0$, as $n\to\infty$, and also as
$$\hat R(z)=R(z)+R(z)\Pi R(z)+R(z)(\Pi R(z))^2(I-\Pi R(z))^{-1}, \tag{3.15}$$
useful for analyzing the convergence in distribution of the estimators.

Let us decompose the contour $\Gamma$ into the two parts $\Gamma_0=\{-\tfrac12\delta+iy: -1\le y\le1\}$ and $\Gamma_1=\Gamma\setminus\Gamma_0$, write $M_\varphi=\max_{z\in\Gamma}|\varphi(z)|$, and observe that (3.10) and (3.12) entail that
$$\bigl\|(I-\Pi R(z))^{-1}\bigr\|\le2,\quad z\in\Omega^c. \tag{3.16}$$
We now have
$$\Bigl\|\frac{1}{2\pi i}\oint_\Gamma\varphi(z)R(z)(\Pi R(z))^k(I-\Pi R(z))^{-1}\,dz\Bigr\|\le\frac{1}{\pi}M_\varphi\|\Pi\|^k\oint_\Gamma\|R(z)\|^{k+1}\,|dz|\le\frac{1}{\pi}M_\varphi\|\Pi\|^k\Bigl\{\int_{-1}^{1}\Bigl(\tfrac14\delta^2+y^2\Bigr)^{-(k+1)/2}dy+\int_{\Gamma_1}|dz|\Bigr\}\le\frac{1}{\pi}M_\varphi\|\Pi\|^k\Bigl\{\Bigl(\frac{4}{\delta}\Bigr)^{k}+5+2\|\Sigma\|\Bigr\},\quad k\in\mathbb{N}. \tag{3.17}$$
Multiplying (3.14) and (3.15) by $\varphi(z)$, taking $(1/2\pi i)\oint_\Gamma$, and using $0<C<\infty$ as a generic constant that does not depend on $\Pi$ or $\delta$, these expansions yield the following.

Theorem 3.2. Provided that (3.10) is fulfilled, one has
$$\bigl\|\varphi(\Sigma+\Pi)-\varphi(\Sigma)\bigr\|\le C M_\varphi\frac{\|\Pi\|}{\delta}, \tag{3.18}$$
$$\bigl\|\varphi(\Sigma+\Pi)-\varphi(\Sigma)-\dot\varphi_\Sigma\Pi\bigr\|\le C M_\varphi\frac{\|\Pi\|^2}{\delta^2}, \tag{3.19}$$
where $\dot\varphi_\Sigma$ is a bounded operator, given by
$$\dot\varphi_\Sigma\Pi=\sum_{k}\varphi'(\sigma_k^2)P_k\Pi P_k+\sum_{j\ne k}\frac{\varphi(\sigma_k^2)-\varphi(\sigma_j^2)}{\sigma_k^2-\sigma_j^2}P_j\Pi P_k. \tag{3.20}$$
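The derivative formula (3.20) can be verified numerically in finite dimensions, where it is the classical divided-difference (Daleckii-Krein) representation of the derivative of a matrix function. A hypothetical check against a central finite difference, using synthetic matrices and $\varphi_2$ from (3.7):

```python
import numpy as np

# Illustrative finite-dimensional check of formula (3.20).
rng = np.random.default_rng(5)
delta = 0.5
phi = lambda z: z / (delta + z)              # phi_2 from (3.7)
dphi = lambda z: delta / (delta + z) ** 2    # its numerical derivative

def phi_mat(S):
    w, V = np.linalg.eigh(S)
    return (V * phi(w)) @ V.T

A = rng.standard_normal((5, 5))
Sigma = A @ A.T / 5
Pi = rng.standard_normal((5, 5))
Pi = (Pi + Pi.T) / 2

w, V = np.linalg.eigh(Sigma)
P = [np.outer(v, v) for v in V.T]            # eigenprojections P_k
# (3.20): diagonal term with phi', off-diagonal divided differences
D = sum(dphi(w[k]) * P[k] @ Pi @ P[k] for k in range(5))
D += sum((phi(w[k]) - phi(w[j])) / (w[k] - w[j]) * P[j] @ Pi @ P[k]
         for j in range(5) for k in range(5) if j != k)

eps = 1e-6
fd = (phi_mat(Sigma + eps * Pi) - phi_mat(Sigma - eps * Pi)) / (2 * eps)
```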

Remark 3.3. If $\Sigma$ and $\Pi$ commute, so will $P_k$ and $\Pi$, and $R(z)$ and $\Pi$, and the expressions simplify considerably. In particular, (3.20) reduces to
$$\dot\varphi_\Sigma\Pi=\sum_{k}\varphi'(\sigma_k^2)P_k\Pi, \tag{3.21}$$
that is, $\dot\varphi_\Sigma=\varphi'(\Sigma)$, where $\varphi'$ is the numerical derivative of $\varphi$, and $\varphi'(\Sigma)$ should be understood in the sense of "functional calculus'' as in (3.6). For the commuting case, see Dunford & Schwartz [10].

Let us now present some basic facts that are useful for subsequent statistical applications. Dauxois et al. [27] have shown that there exists a Gaussian random element $\mathcal{G}_\Sigma:\Omega\to\mathcal{L}_{HS}$ such that $\sqrt n(\hat\Sigma-\Sigma)\xrightarrow{d}\mathcal{G}_\Sigma$, as $n\to\infty$, in $\mathcal{L}_{HS}$. Because the identity map from $\mathcal{L}_{HS}$ into $\mathcal{L}$ is continuous, we may state
$$\sqrt n(\hat\Sigma-\Sigma)\xrightarrow{d}\mathcal{G}_\Sigma,\quad\text{as }n\to\infty,\ \text{in }\mathcal{L}, \tag{3.22}$$
by the continuous mapping theorem. This entails that
$$\|\Pi\|=\|\hat\Sigma-\Sigma\|=\mathcal{O}_p\Bigl(\frac{1}{\sqrt n}\Bigr),\quad\text{as }n\to\infty, \tag{3.23}$$
so that condition (3.10) is fulfilled with arbitrarily high probability for $n$ sufficiently large. Expansions (3.14) and (3.15) and the resulting inequalities hold true for $\hat\Sigma$ replaced with $\hat\Sigma(\omega)=\Sigma+\Pi(\omega)$ for $\omega\in\{\|\Pi\|\le\delta/4\}$.

Example 3.4. Application to asymptotic distribution theory. In this application $\delta>0$ will be kept fixed; see also Section 4.1. It is based on the delta method for functions of operators [6], which follows easily from (3.19). In conjunction with (3.22) this yields
$$\sqrt n\bigl(\varphi_2(\hat\Sigma)-\varphi_2(\Sigma)\bigr)\xrightarrow{d}\dot\varphi_{2,\Sigma}\mathcal{G}_\Sigma,\quad\text{as }n\to\infty,\ \text{in }\mathcal{L}. \tag{3.24}$$
In turn this yields
$$\sqrt n\bigl(\varphi_2(\hat\Sigma)-\varphi_2(\Sigma)\bigr)f\xrightarrow{d}\bigl(\dot\varphi_{2,\Sigma}\mathcal{G}_\Sigma\bigr)f,\quad\text{as }n\to\infty,\ \text{in }\mathbb{H}, \tag{3.25}$$
for any $f\in\mathbb{H}$, by the continuous mapping theorem. This result will be used in Section 4.

Example 3.5. Application to classification. Here we will let $\delta=\delta(n)\downarrow0$, as $n\to\infty$, and write $\varphi_{1,n}(z)=1/(\delta(n)+z)$ to stress this dependence on the sample size. Since
$$\max_{z\in\Gamma}|\varphi_{1,n}(z)|\le\frac{1}{\delta(n)}, \tag{3.26}$$
it is immediate from (3.18) that
$$\bigl\|\varphi_{1,n}(\hat\Sigma)-\varphi_{1,n}(\Sigma)\bigr\|=\mathcal{O}_p\Bigl(\frac{1}{\delta^2(n)\sqrt n}\Bigr),\quad\text{as }n\to\infty, \tag{3.27}$$
a result that will be used in Section 5.

4. Asymptotics for the Regression Estimator

4.1. The Asymptotic Distribution

The central limit theorem in Hilbert spaces entails at once
$$\frac{1}{\sqrt n}\sum_{i=1}^{n}\varepsilon_i(X_i-\mu)\xrightarrow{d}\mathcal{G}_0,\quad\text{as }n\to\infty,\ \text{in }\mathbb{H}, \tag{4.1}$$
where $\mathcal{G}_0$ is a zero mean Gaussian random variable in $\mathbb{H}$, and
$$\frac{1}{\sqrt n}\sum_{i=1}^{n}\bigl\{(X_i-\mu)\otimes(X_i-\mu)-\Sigma\bigr\}\xrightarrow{d}\mathcal{G}_\Sigma,\quad\text{as }n\to\infty,\ \text{in }\mathcal{L}_{HS}, \tag{4.2}$$
where $\mathcal{G}_\Sigma$ is a zero mean Gaussian random variable in $\mathcal{L}_{HS}$. These convergence results remain true with $\mu$ replaced by $\bar X$ and, because $\varepsilon\perp\!\!\!\perp X$ by assumption (2.15), we also have jointly
$$\Bigl(\frac{1}{\sqrt n}\sum_{i=1}^{n}\varepsilon_i(X_i-\bar X),\ \sqrt n(\hat\Sigma-\Sigma)\Bigr)\xrightarrow{d}(\mathcal{G}_0,\mathcal{G}_\Sigma),\quad\text{as }n\to\infty,\ \text{in }\mathbb{H}\times\mathcal{L}_{HS}, \tag{4.3}$$
where $\mathcal{G}_\Sigma$ is the same as in (3.22), and
$$\mathcal{G}_0\perp\!\!\!\perp\mathcal{G}_\Sigma. \tag{4.4}$$

Because the limiting variables are generated by the sums of iid variables on the left in (4.1) and (4.2), we have
$$\mathbb{E}\,\mathcal{G}_0\otimes\mathcal{G}_0=\mathbb{E}\{\varepsilon(X-\mu)\}\otimes\{\varepsilon(X-\mu)\}=\mathbb{E}\varepsilon^2\,\mathbb{E}(X-\mu)\otimes(X-\mu)=v^2\Sigma, \tag{4.5}$$
$$\mathbb{E}\,\mathcal{G}_\Sigma\otimes_{HS}\mathcal{G}_\Sigma=\mathbb{E}\bigl\{(X-\mu)\otimes(X-\mu)-\Sigma\bigr\}\otimes_{HS}\bigl\{(X-\mu)\otimes(X-\mu)-\Sigma\bigr\}, \tag{4.6}$$
for the respective covariance structures. These are important to further specify the limiting distribution of the regression estimator, as will be seen in Section 4.2.

Let us write, for brevity,
$$\hat f_\delta-f_\delta=U_n+V_n, \tag{4.7}$$
where, according to (2.20) and (2.21),
$$U_n=\varphi_1(\hat\Sigma)\frac{1}{n}\sum_{i=1}^{n}\varepsilon_i(X_i-\bar X),\qquad V_n=\bigl(\varphi_2(\hat\Sigma)-\varphi_2(\Sigma)\bigr)f. \tag{4.8}$$
Note that $\varphi_1$ and $\varphi_2$ depend on $\delta$.

With statistical applications in mind, it would be interesting if there existed numbers $a_n\to\infty$ and $\delta(n)\downarrow0$, as $n\to\infty$, such that
$$a_n\bigl(\hat f_{\delta(n)}-f\bigr)\xrightarrow{d}\mathcal{Z},\quad\text{as }n\to\infty,\ \text{in }\mathbb{H}, \tag{4.9}$$
where $\mathcal{Z}$ is a nondegenerate random vector. It has been shown in Cardot et al. [28], however, that such a convergence in distribution when we center at $f$ is not in general possible.

Theorem 4.1. For fixed $\delta>0$, one has
$$\sqrt n\bigl(\hat f_\delta-f_\delta\bigr)\xrightarrow{d}\mathcal{Z}=\mathcal{Z}_1+\mathcal{Z}_2,\quad\text{as }n\to\infty,\ \text{in }\mathbb{H}, \tag{4.10}$$
where $\mathcal{Z}_1=\varphi_1(\Sigma)\mathcal{G}_0$ and $\mathcal{Z}_2=(\dot\varphi_{2,\Sigma}\mathcal{G}_\Sigma)f$ are zero mean Gaussian random elements, and $\mathcal{Z}_1\perp\!\!\!\perp\mathcal{Z}_2$.

Further information about the structure of the covariance operator of the random vector on the right in (4.10) will be needed in order to be able to exploit the theorem for statistical inference. This will be addressed in the next section.

4.2. Further Specification of the Limiting Distribution

It follows from (4.5) that $\mathcal{G}_0$ has a Karhunen-Loève expansion
$$\mathcal{G}_0=\sum_{j=1}^{\infty}v\,\sigma_j Z_j p_j, \tag{4.11}$$
where the real valued random variables
$$Z_j\ (j\in\mathbb{N})\ \text{are iid}\ N(0,1). \tag{4.12}$$
Accordingly, $\mathcal{Z}_1$ in (4.10) can be further specified as
$$\mathcal{Z}_1=\varphi_1(\Sigma)\mathcal{G}_0=v\sum_{j=1}^{\infty}\frac{\sigma_j}{\delta+\sigma_j^2}Z_j p_j. \tag{4.13}$$

The Gaussian operator in (4.2) has been investigated in Dauxois et al. [27], and here we will briefly summarize some of their results in our notation. By evaluating the inner product in $\mathcal{L}_{HS}$ in the basis $p_1,p_2,\dots$ it follows from (4.6) that
$$\mathbb{E}\langle p_j\otimes p_k,\mathcal{G}_\Sigma\rangle_{HS}\langle\mathcal{G}_\Sigma,p_\alpha\otimes p_\beta\rangle_{HS}=\mathbb{E}\langle X-\mu,p_j\rangle\langle X-\mu,p_k\rangle\langle X-\mu,p_\alpha\rangle\langle X-\mu,p_\beta\rangle-\delta_{j,k}\delta_{\alpha,\beta}\sigma_k^2\sigma_\beta^2. \tag{4.14}$$

This last expression does not in general simplify further. However, if we assume that the regressor $X$ satisfies
$$X\overset{d}{=}\text{Gaussian}(\mu,\Sigma), \tag{4.15}$$
it can be easily seen that the
$$\langle X-\mu,p_j\rangle\overset{d}{=}N(0,\sigma_j^2)\ \text{are independent}, \tag{4.16}$$
so that the expression in (4.14) equals zero if $(j,k)\ne(\alpha,\beta)$. As in Dauxois et al. [27], we obtain in this case
$$\mathbb{E}\langle p_j\otimes p_k,\mathcal{G}_\Sigma\rangle_{HS}\langle\mathcal{G}_\Sigma,p_\alpha\otimes p_\beta\rangle_{HS}=\begin{cases}0,&(j,k)\ne(\alpha,\beta),\\ v_{j,k}^2,&(j,k)=(\alpha,\beta),\end{cases} \tag{4.17}$$
where
$$v_{j,k}^2=\begin{cases}2\sigma_j^4,& j=k,\\ \sigma_j^2\sigma_k^2,& j\ne k.\end{cases} \tag{4.18}$$

Consequently, the $p_j\otimes p_k$ $(j,k\in\mathbb{N})$ are an orthonormal basis of eigenvectors of the covariance operator of $\mathcal{G}_\Sigma$, with eigenvalues $v_{j,k}^2$. Hence $\mathcal{G}_\Sigma$ has the Karhunen-Loève expansion (in $\mathcal{L}_{HS}$)
$$\mathcal{G}_\Sigma=\sum_{j=1}^{\infty}\sum_{k=1}^{\infty}v_{j,k}Z_{j,k}\,p_j\otimes p_k, \tag{4.19}$$
where the random variables
$$Z_{j,k}\ (j,k\in\mathbb{N})\ \text{are iid}\ N(0,1). \tag{4.20}$$

Let us, for brevity, write (see (3.7) for $\varphi_2$)
$$w_{j,k}=\begin{cases}\dfrac{\varphi_2(\sigma_k^2)-\varphi_2(\sigma_j^2)}{\sigma_k^2-\sigma_j^2},& j\ne k,\\[2mm]\varphi_2'(\sigma_j^2),& j=k,\end{cases}\qquad\text{so that}\qquad w_{j,k}=\frac{\delta}{(\delta+\sigma_j^2)(\delta+\sigma_k^2)},\quad j,k\in\mathbb{N}. \tag{4.21}$$
Then
$$\mathcal{Z}_2=\bigl(\dot\varphi_{2,\Sigma}\mathcal{G}_\Sigma\bigr)f=\sum_{j=1}^{\infty}\sum_{k=1}^{\infty}w_{j,k}P_j\mathcal{G}_\Sigma P_k f=\sum_{j=1}^{\infty}\sum_{k=1}^{\infty}w_{j,k}P_j\Bigl(\sum_{\alpha=1}^{\infty}\sum_{\beta=1}^{\infty}v_{\alpha,\beta}Z_{\alpha,\beta}\,p_\alpha\otimes p_\beta\Bigr)P_k f=\sum_{j=1}^{\infty}\sum_{k=1}^{\infty}w_{j,k}v_{j,k}Z_{j,k}\langle f,p_k\rangle p_j. \tag{4.22}$$
Summarizing, we have the following result.

Theorem 4.2. The random element $\mathcal{Z}_1$ on the right in (4.10) can be represented by (4.13). If one assumes that the regressor $X\overset{d}{=}\text{Gaussian}(\mu,\Sigma)$, the random element $\mathcal{Z}_2$ on the right in (4.10) can be represented by (4.22), where the $Z_{j,k}$ in (4.20) are stochastically independent of the $Z_j$ in (4.12).

4.3. Asymptotics under the Null Hypothesis

Let us recall that $f_\delta$ is related to $f$ according to (2.21), so that the equivalence
$$H_0: f_\delta=0\iff f=0, \tag{4.23}$$
where again $\delta>0$ is fixed, holds true. The following is immediate from Theorem 4.1.

Theorem 4.3. Under the null hypothesis in (4.23), one has
$$n\|\hat f_\delta\|^2\xrightarrow{d}\|\mathcal{Z}\|^2=\|\mathcal{Z}_1\|^2,\quad\text{as }n\to\infty, \tag{4.24}$$
where
$$\|\mathcal{Z}_1\|^2\overset{d}{=}v^2\sum_{j=1}^{\infty}\frac{\sigma_j^2}{(\delta+\sigma_j^2)^2}Z_j^2. \tag{4.25}$$
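The weighted chi-square series (4.25) has no closed-form quantiles, but truncating it and simulating gives approximate critical values for a test based on $n\|\hat f_\delta\|^2$. A hypothetical sketch, assuming (for illustration only) the eigenvalue decay $\sigma_j^2=j^{-2}$ and $v=1$:

```python
import numpy as np

# Hypothetical Monte Carlo sketch of the limiting null distribution (4.25):
# a weighted chi-square series, truncated at J terms.
rng = np.random.default_rng(6)
v, delta, J = 1.0, 0.05, 50
sigma2 = 1.0 / np.arange(1, J + 1) ** 2            # assumed eigenvalue decay
weights = v ** 2 * sigma2 / (delta + sigma2) ** 2  # coefficients in (4.25)

Z2 = rng.standard_normal((100_000, J)) ** 2        # Z_j^2, iid chi-square(1)
samples = Z2 @ weights                             # draws from truncated series
crit_95 = np.quantile(samples, 0.95)               # approximate 5% critical value
```

In practice the $\sigma_j^2$ (and $v^2$) are unknown and would themselves have to be estimated, which is the uniform-consistency issue raised in Remark 4.7 below.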

An immediate generalization of the hypothesis in (4.23) is
$$H_0: f_\delta=f_{\delta,0}=\varphi_2(\Sigma)f_0\iff f=f_0, \tag{4.26}$$
for some given $f_0\in\mathbb{H}$. This hypothesis is in principle of interest for confidence sets. Of course, testing (4.26) can be reduced to testing (4.23) by replacing the $\eta_i$ with
$$\tilde\eta_i=\eta_i-\langle X_i,f_{\delta,0}\rangle=\alpha+\langle X_i,f-f_{\delta,0}\rangle+\varepsilon_i, \tag{4.27}$$
and then using the estimator
$$\tilde f_\delta=(\delta I+\hat\Sigma)^{-1}\frac{1}{n}\sum_{i=1}^{n}\tilde\eta_i(X_i-\bar X). \tag{4.28}$$
Since $f_\delta-f_{\delta,0}=0$ under the null hypothesis, the following is immediate.

Theorem 4.4. Assuming (4.27), one has
$$n\|\tilde f_\delta\|^2\xrightarrow{d}\|\mathcal{Z}_1\|^2,\quad\text{as }n\to\infty, \tag{4.29}$$
where $\|\mathcal{Z}_1\|^2$ has the same distribution as in (4.25). Related results can be found in Cardot et al. [2].

Another generalization of the hypothesis in (4.23) is
$$H_0: f_\delta\in\mathbb{M}=\operatorname{span}\{q_1,\dots,q_M\}, \tag{4.30}$$
where $q_1,\dots,q_M$ are orthonormal vectors in $\mathbb{H}$. Let $Q$ and $Q^\perp$ denote the orthogonal projections onto $\mathbb{M}$ and $\mathbb{M}^\perp$, respectively, and note that $f_\delta\in\mathbb{M}$ if and only if $Q^\perp f_\delta=0$. A test statistic might be based on $\|Q^\perp\hat f_\delta\|^2$, and we have the following.

Theorem 4.5. Under $H_0$ in (4.30), one has
$$n\|Q^\perp\hat f_\delta\|^2\xrightarrow{d}\|Q^\perp\mathcal{Z}\|^2,\quad\text{as }n\to\infty. \tag{4.31}$$

The distribution on the right in (4.31) is rather complicated if $q_1,\dots,q_M$ remain arbitrary. But if we are willing to assume (4.20), it follows from (4.13) and (4.22) that
$$\|Q^\perp\mathcal{Z}\|^2=\|Q^\perp\mathcal{Z}_1\|^2+\|Q^\perp\mathcal{Z}_2\|^2+2\langle Q^\perp\mathcal{Z}_1,Q^\perp\mathcal{Z}_2\rangle=v^2\sum_{j=1}^{\infty}\sum_{k=1}^{\infty}\frac{\sigma_j}{\delta+\sigma_j^2}\frac{\sigma_k}{\delta+\sigma_k^2}Z_jZ_k\langle Q^\perp p_j,Q^\perp p_k\rangle+\sum_{j=1}^{\infty}\sum_{k=1}^{\infty}\sum_{\alpha=1}^{\infty}\sum_{\beta=1}^{\infty}w_{j,k}w_{\alpha,\beta}v_{j,k}v_{\alpha,\beta}Z_{j,k}Z_{\alpha,\beta}\langle f,p_k\rangle\langle f,p_\beta\rangle\langle Q^\perp p_j,Q^\perp p_\alpha\rangle+2v\sum_{j=1}^{\infty}\frac{\sigma_j}{\delta+\sigma_j^2}Z_j\sum_{\alpha=1}^{\infty}\sum_{\beta=1}^{\infty}w_{\alpha,\beta}v_{\alpha,\beta}Z_{\alpha,\beta}\langle f,p_\beta\rangle\langle Q^\perp p_j,Q^\perp p_\alpha\rangle. \tag{4.32}$$

A simplification is possible if we are willing to modify the hypothesis in (4.30) and use a so-called neighborhood hypothesis. This notion has a rather long history and was investigated by Hodges & Lehmann [29] for certain parametric models. Dette & Munk [30] have rekindled interest in it through an application in nonparametric regression. In the present context we might replace (4.30) with the neighborhood hypothesis
$$H_{0,\varepsilon}: \|Q^\perp f_\delta\|^2\le\varepsilon^2,\quad\text{for some }\varepsilon>0. \tag{4.33}$$
It is known from the literature that the advantage of using a neighborhood hypothesis is not only that such a hypothesis might be more realistic and that the asymptotics are much simpler, but also that without extra complication we might interchange null hypothesis and alternative. This means in the current situation that we might as well test the null hypothesis
$$\tilde H_{0,\varepsilon}: \|Q^\perp f_\delta\|^2\ge\varepsilon^2,\quad\text{for some }\varepsilon>0, \tag{4.34}$$
which could be more suitable, in particular in goodness-of-fit problems.

The functional $g\mapsto\|Q^\perp g\|^2$, $g\in\mathbb{H}$, has a Fréchet derivative at $f_\delta$ given by the functional $g\mapsto2\langle g,Q^\perp f_\delta\rangle$, $g\in\mathbb{H}$. Therefore, the delta method in conjunction with Theorem 4.1 entails the following result.

Theorem 4.6. One has
$$\sqrt n\bigl(\|Q^\perp\hat f_\delta\|^2-\|Q^\perp f_\delta\|^2\bigr)\xrightarrow{d}2\langle\mathcal{Z},Q^\perp f_\delta\rangle,\quad\text{as }n\to\infty. \tag{4.35}$$

The limiting distribution on the right in (4.35) is normal with mean zero and the rather complicated variance
$$\Delta^2=4\,\mathbb{E}\langle\mathcal{Z},Q^\perp f_\delta\rangle^2=4\bigl\{\mathbb{E}\langle\mathcal{Z}_1,Q^\perp f_\delta\rangle^2+\mathbb{E}\langle\mathcal{Z}_2,Q^\perp f_\delta\rangle^2\bigr\}=4v^2\sum_{j=1}^{\infty}\frac{\sigma_j^2}{(\delta+\sigma_j^2)^2}\langle p_j,Q^\perp f_\delta\rangle^2+4\sum_{j=1}^{\infty}\sum_{k=1}^{\infty}w_{j,k}^2v_{j,k}^2\langle f,p_k\rangle^2\langle p_j,Q^\perp f_\delta\rangle^2. \tag{4.36}$$

Remark 4.7. As we see from the expressions in (4.24), (4.32), and (4.36), the limiting distributions depend on infinitely many parameters that must be suitably estimated in order to be in a position to use the statistics for actual testing. Estimators for the individual parameters are not too hard to obtain. The eigenvalues $\sigma_j^2$ and eigenvectors $p_j$ of $\Sigma$, for instance, can in principle be estimated by the corresponding quantities of $\hat\Sigma$. Although in any practical situation only a finite number of these parameters can be estimated, theoretically this number must increase with the sample size, and some kind of uniform consistency will be needed for a suitable approximation of the limiting distribution. This interesting question of uniform consistency seems to require quite some technicalities and will not be addressed in this paper.

Remark 4.8. In this paper we have dealt with the situation where $\Sigma$ is entirely unknown. It has been observed in Johannes [11] that if $X$ is a stationary process on the unit interval, the eigenfunctions $p_j$ of the covariance operator are always the same, known system of trigonometric functions, and only its eigenvalues $\sigma_j^2$ are unknown. Knowing the $p_j$ leads to several simplifications. In the first place, $\Sigma$ can now be estimated by the expression on the right in (3.1) with only the $\sigma_k^2$ replaced with estimators. If $\tilde\Sigma$ is this estimator, it is clear that $\Sigma$ and $\tilde\Pi=\tilde\Sigma-\Sigma$ commute, so that the derivative $\dot\varphi_{2,\Sigma}$ now simplifies considerably (see Remark 3.3). Secondly, we might consider the special case of $H_0$ in (4.30) where $q_j=p_j$, $j=1,\dots,M$. We now have
$$f_\delta\in\mathbb{M}=\operatorname{span}\{p_1,\dots,p_M\}\iff f\in\mathbb{M}, \tag{4.37}$$
so that even for fixed $\delta$ we can test the actual regression function. In the third place, under the null hypothesis in (4.37), the number of unknown parameters in (4.32) reduces considerably, because now $Q^\perp p_j=0$ for $j=1,\dots,M$. When the $p_j$ are known, in addition to all the changes mentioned above, the limiting distribution of $\tilde\Sigma$ also differs from that of $\hat\Sigma$. Considering all the modifications that are needed, it seems better not to include this important special case in this paper.

4.4. Asymptotics under Local Alternatives

Again we assume that $X$ is Gaussian. Suppose that
$$f=f_n=f^*+\frac{1}{\sqrt n}g,\quad\text{for }f^*,g\in\mathbb{H}. \tag{4.38}$$
For such $f_n$ only minor changes in the asymptotics are required, because the conditions on the $X_i$ and $\varepsilon_i$ are still the same and do not change with $n$. Let us write
$$f_\delta=f_{n,\delta}=f^*_\delta+\frac{1}{\sqrt n}g_\delta, \tag{4.39}$$
where $f^*_\delta=(\delta I+\Sigma)^{-1}\Sigma f^*$ and $g_\delta=(\delta I+\Sigma)^{-1}\Sigma g$. The following is immediate from a minor modification of Theorem 4.1.

Theorem 4.9. For $f_\delta=f_{n,\delta}$ as in (4.39), one has
$$\sqrt n\bigl(\hat f_\delta-f^*_\delta\bigr)\xrightarrow{d}g_\delta+\mathcal{Z}_1+\mathcal{Z}_2^*, \tag{4.40}$$
where $\mathcal{Z}_1=\varphi_1(\Sigma)\mathcal{G}_0$ is the same as in (4.13), $\mathcal{Z}_2^*$ is obtained from $\mathcal{Z}_2$ in (4.22) by replacing $f$ with $f^*$, and $\mathcal{Z}_1\perp\!\!\!\perp\mathcal{Z}_2^*$.

By way of an example, let us apply this result to testing the neighborhood hypothesis and consider the asymptotics of the test statistic in (4.35) under the local alternatives $f_{n,\delta}$ in (4.39) with
$$\|Q^\perp f^*_\delta\|^2=\varepsilon^2,\qquad \langle g_\delta,Q^\perp f^*_\delta\rangle>0. \tag{4.41}$$

Note that under such alternatives, $\hat f_{\delta(n)}$ is still a consistent estimator of $f^*$ for any $\delta(n)\downarrow0$, as $n\to\infty$, and so is $\hat f_\delta$ for $f^*_\delta$. To conclude this section, let us assume that the parameters involved can be suitably estimated. We then arrive at the limiting distribution of a test statistic that allows the construction of an asymptotic level-$\alpha$ test whose asymptotic power can be computed. The following is immediate from Theorem 4.9.

Theorem 4.10. For $f_\delta=f_{n,\delta}$, as in (4.40) and (4.41), one has
$$T_n=\frac{\sqrt n}{\hat\Delta}\bigl(\|Q^\perp\hat f_\delta\|^2-\varepsilon^2\bigr)\xrightarrow{d}N\Bigl(\frac{2\langle g_\delta,Q^\perp f^*_\delta\rangle}{\Delta},1\Bigr),\quad\text{as }n\to\infty, \tag{4.42}$$
assuming that $\hat\Delta$ is a consistent estimator of $\Delta$ in (4.36). Note that the limiting distribution is $N(0,1)$ under $H_{0,\varepsilon}$ (i.e., $g_\delta=0$).

5. Asymptotic Optimality of the Classification Rule

In addition to Assumption 2.5 and (2.34), it will be assumed that the smoothing parameter $\delta=\delta(n)$ in (2.39) satisfies
$$\delta(n)\downarrow0,\qquad \delta(n)\,n^{1/4}\to\infty,\quad\text{as }n\to\infty. \tag{5.1}$$
We will also assume that the sizes of the training samples $n_1$ and $n_2$ (see (2.36)) are deterministic and satisfy $(n=n_1+n_2)$
$$0<\liminf_{n\to\infty}\frac{n_j}{n}\le\limsup_{n\to\infty}\frac{n_j}{n}<1. \tag{5.2}$$
Let us recall from (3.7) that $\varphi_1(z)=\varphi_{1,n}(z)=1/\{\delta(n)+z\}$, $z\ne-\delta(n)$.

Under these conditions the probability of misclassification equals $\mathbb{P}\{\langle X-\tfrac12(\bar X_1+\bar X_2),\varphi_{1,n}(\hat\Sigma)(\bar X_1-\bar X_2)\rangle>0\mid X\overset{d}{=}\mathcal{G}(\mu_2,\Sigma)\}$. Let us note that
$$\Bigl|\bigl\langle X-\tfrac12(\bar X_1+\bar X_2),\varphi_{1,n}(\hat\Sigma)(\bar X_1-\bar X_2)\bigr\rangle-\bigl\langle X-\tfrac12(\mu_1+\mu_2),\varphi_{1,n}(\Sigma)(\mu_1-\mu_2)\bigr\rangle\Bigr|\le\tfrac12\bigl(\|\bar X_1-\mu_1\|+\|\bar X_2-\mu_2\|\bigr)\bigl\|\varphi_{1,n}(\Sigma)(\mu_1-\mu_2)\bigr\|+\bigl\|X-\tfrac12(\bar X_1+\bar X_2)\bigr\|\Bigl\{\bigl\|\varphi_{1,n}(\hat\Sigma)-\varphi_{1,n}(\Sigma)\bigr\|\,\|\bar X_1-\bar X_2\|+\bigl(\|\bar X_1-\mu_1\|+\|\bar X_2-\mu_2\|\bigr)\bigl\|\varphi_{1,n}(\Sigma)\bigr\|\Bigr\}. \tag{5.3}$$
Since $\|\bar X_j-\mu_j\|=\mathcal{O}_p(n^{-1/2})$, $\|\varphi_{1,n}(\Sigma)\|=\mathcal{O}(\delta^{-1}(n))$, and, according to (3.27),
$$\bigl\|\varphi_{1,n}(\hat\Sigma)-\varphi_{1,n}(\Sigma)\bigr\|=\mathcal{O}_p\Bigl(\frac{1}{\delta^2(n)\sqrt n}\Bigr), \tag{5.4}$$
it follows from (5.1) that the limit of the misclassification probability equals
$$\lim_{n\to\infty}\mathbb{P}\Bigl\{\bigl\langle X-\tfrac12(\mu_1+\mu_2),\varphi_{1,n}(\Sigma)(\mu_1-\mu_2)\bigr\rangle>0\mid X\overset{d}{=}\mathcal{G}(\mu_2,\Sigma)\Bigr\}=1-\Phi\Bigl(\tfrac12\sqrt{\langle\mu_1-\mu_2,\Sigma^{-1}(\mu_1-\mu_2)\rangle}\Bigr), \tag{5.5}$$
where $\Phi$ is the standard normal cdf.

For (5.5) we have used the well-known property of regularized inverses that $\|(\delta I+\Sigma)^{-1}\Sigma f-f\|\to0$, as $\delta\downarrow0$, for all $f\in\mathbb{H}$, and the fact that we may choose $f=\Sigma^{-1}(\mu_1-\mu_2)$ by Assumption 2.5. Since rule (2.33) is optimal when the parameters are known, we have obtained the following result.
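The limiting error probability in (5.5) can be evaluated and cross-checked by simulation in finite dimensions, where $\Sigma^{-1}(\mu_1-\mu_2)$ is always well defined. A hypothetical sketch (synthetic means and covariance; the Monte Carlo part applies rule (2.33) to draws from $P_2$):

```python
import math

import numpy as np

# Illustrative finite-dimensional evaluation of the limiting error (5.5).
rng = np.random.default_rng(7)
d = 6
A = rng.standard_normal((d, 5 * d))
Sigma = A @ A.T / (5 * d)                    # well-conditioned covariance
mu1 = 0.5 * rng.standard_normal(d)
mu2 = -mu1

gap = mu1 - mu2
maha2 = gap @ np.linalg.solve(Sigma, gap)    # <mu1-mu2, Sigma^{-1}(mu1-mu2)>
Phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
err = 1.0 - Phi(0.5 * math.sqrt(maha2))      # limiting error probability (5.5)

# Monte Carlo cross-check: draws from P_2 classified by rule (2.33)
L = np.linalg.cholesky(Sigma)
X = mu2 + rng.standard_normal((200_000, d)) @ L.T
w = np.linalg.solve(Sigma, gap)
mis = np.mean((X - (mu1 + mu2) / 2) @ w > 0)
```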

Theorem 5.1. Let $P_j=\mathcal{G}(\mu_j,\Sigma)$, $j=1,2$, and let Assumption 2.5 and (5.1) be satisfied. Then the classification rule (2.39) is asymptotically optimal.


  1. P. Hall and J. L. Horowitz, “Methodology and convergence rates for functional linear regression,” The Annals of Statistics, vol. 35, no. 1, pp. 70–91, 2007.
  2. H. Cardot, F. Ferraty, A. Mas, and P. Sarda, “Testing hypotheses in the functional linear model,” Scandinavian Journal of Statistics, vol. 30, no. 1, pp. 241–255, 2003.
  3. H. Cardot, F. Ferraty, and P. Sarda, “Functional linear model,” Statistics & Probability Letters, vol. 45, no. 1, pp. 11–22, 1999.
  4. T. Hastie, A. Buja, and R. Tibshirani, “Penalized discriminant analysis,” The Annals of Statistics, vol. 23, no. 1, pp. 73–102, 1995.
  5. D. S. Gilliam, T. Hohage, X. Ji, and F. Ruymgaart, “The Fréchet derivative of an analytic function of a bounded operator with some applications,” International Journal of Mathematics and Mathematical Sciences, vol. 2009, Article ID 239025, 17 pages, 2009.
  6. J. Cupidon, D. S. Gilliam, R. Eubank, and F. Ruymgaart, “The delta method for analytic functions of random operators with application to functional data,” Bernoulli, vol. 13, no. 4, pp. 1179–1194, 2007.
  7. R. L. Eubank and T. Hsing, “Canonical correlation for stochastic processes,” Stochastic Processes and their Applications, vol. 118, no. 9, pp. 1634–1661, 2008.
  8. G. He, H.-G. Müller, and J.-L. Wang, “Functional canonical analysis for square integrable stochastic processes,” Journal of Multivariate Analysis, vol. 85, no. 1, pp. 54–77, 2003.
  9. S. E. Leurgans, R. A. Moyeed, and B. W. Silverman, “Canonical correlation analysis when the data are curves,” Journal of the Royal Statistical Society, Series B, vol. 55, no. 3, pp. 725–740, 1993.
  10. N. Dunford and J. T. Schwartz, Linear Operators, Part I: General Theory, Interscience, New York, NY, USA, 1957.
  11. J. Johannes, “Privileged communication,” 2008.
  12. J. O. Ramsay and B. W. Silverman, Functional Data Analysis, Springer Series in Statistics, Springer, New York, NY, USA, 2nd edition, 2005.
  13. F. Ferraty and P. Vieu, Nonparametric Functional Data Analysis, Springer Series in Statistics, Springer, New York, NY, USA, 2006.
  14. D. Bosq, Linear Processes in Function Spaces, vol. 149 of Lecture Notes in Statistics, Springer-Verlag, New York, NY, USA, 2000.
  15. A. Mas, Estimation d'opérateurs de corrélation de processus fonctionnels: lois limites, tests, déviations modérées, Ph.D. thesis, Université Paris VI, 2000.
  16. R. G. Laha and V. K. Rohatgi, Probability Theory, Wiley Series in Probability and Mathematical Statistics, John Wiley & Sons, New York, NY, USA, 1979.
  17. C. R. Rao, Linear Statistical Inference and Its Applications, John Wiley & Sons, New York, NY, USA, 1965.
  18. A. Mas and B. Pumo, “Functional linear regression with derivatives,” Tech. Rep., Institut de Modélisation Mathématique de Montpellier, 2006.
  19. T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning, Springer Series in Statistics, Springer-Verlag, New York, NY, USA, 2001.
  20. C. R. Rao and H. Toutenburg, Linear Models, Springer Series in Statistics, Springer, New York, NY, USA, 1995.
  21. J. Feldman, “Equivalence and perpendicularity of Gaussian processes,” Pacific Journal of Mathematics, vol. 8, pp. 699–708, 1958.
  22. J. Hájek, “On a property of normal distribution of any stochastic process,” Czechoslovak Mathematical Journal, vol. 8, no. 83, pp. 610–617, 1958.
  23. U. Grenander, Abstract Inference, Wiley Series in Probability and Mathematical Statistics, John Wiley & Sons, New York, NY, USA, 1981.
  24. A. V. Skorohod, Integration in Hilbert Space, Springer, New York, NY, USA, 1974.
  25. G. S. Watson, Statistics on Spheres, John Wiley & Sons, New York, NY, USA, 1983.
  26. T. Kato, Perturbation Theory for Linear Operators, Die Grundlehren der mathematischen Wissenschaften, Band 132, Springer-Verlag, New York, NY, USA, 1966.
  27. J. Dauxois, A. Pousse, and Y. Romain, “Asymptotic theory for the principal component analysis of a vector random function: some applications to statistical inference,” Journal of Multivariate Analysis, vol. 12, no. 1, pp. 136–154, 1982.
  28. H. Cardot, A. Mas, and P. Sarda, “CLT in functional linear regression models,” Probability Theory and Related Fields, vol. 138, no. 3-4, pp. 325–361, 2007.
  29. J. L. Hodges, Jr. and E. L. Lehmann, “Testing the approximate validity of statistical hypotheses,” Journal of the Royal Statistical Society, Series B, vol. 16, pp. 261–268, 1954.
  30. H. Dette and A. Munk, “Validation of linear regression models,” The Annals of Statistics, vol. 26, no. 2, pp. 778–800, 1998.