Abstract

The main result in this paper is the determination of the Fréchet derivative of an analytic function of a bounded operator, tangentially to the space of all bounded operators. Some applied problems from statistics and numerical analysis are included as a motivation for this study. The perturbation operator (increment) is not of any special form and is not supposed to commute with the operator at which the derivative is evaluated. This generality is important for the applications. In the Hermitian case, moreover, some results on perturbation of an isolated eigenvalue, its eigenprojection, and its eigenvector if the eigenvalue is simple, are also included. Although these results are known in principle, they are not in general formulated in terms of arbitrary perturbations as required for the applications. Moreover, these results are presented as corollaries to the main theorem, so that this paper also provides a short, essentially self-contained review of these aspects of perturbation theory.

1. Introduction

Motivated by certain applications in numerical analysis and, in particular, statistics, this paper deals with the Fréchet derivative of an analytic function of a bounded linear operator on a separable Hilbert space (in the sense of the usual functional calculus), tangentially to the Banach space of all bounded linear operators mapping into itself. More precisely, a first-order approximation to the difference is obtained, including the order of magnitude of the remainder. An example of such a function is a generalized or regularized inverse of the square root where is the identity operator. Once the Fréchet derivative has been established (Section 2), it yields the asymptotic distribution of functions of certain random operators via an ensuing delta method, a well-known statistical technique (see Section 4).

Clearly can be regarded as a perturbed version of , and it is not surprising that perturbation methods are employed to obtain the desired result. The authors are aware of the possibility that the rather straightforward result on the Fréchet derivative might be hidden somewhere in the rich literature on perturbation theory [13]. Yet they have not been successful in identifying a reference that states the result in its present form, tailored to the applications they have in mind. Some remarks are particularly in order.

(a) The perturbations are typically of small norm but otherwise arbitrary bounded or Hermitian. In the literature, they are often of the form for operators , and a small number . In statistics, there is no point in representing the perturbation in such a form.

(b) The perturbation and the operator are not assumed to commute, because in our applications such an assumption would not in general be fulfilled. If the operators do commute, however, the Fréchet derivative would reduce to , in the sense of functional calculus with the derivative of . In the case considered here, the actual Fréchet derivative and may differ considerably.

(c) A central theme in perturbation theory concerns the perturbation of an isolated eigenvalue and corresponding eigenprojection (see, e.g., the references mentioned before). Some of the results are included, because they can be easily derived from the main result on the Fréchet derivative by choosing a special function (Section 3). In this way, the paper presents a concise and essentially self-contained review of some basic results in this area. They are again presented in terms of a general (Hermitian) perturbation , as required for statistical applications, in the same vein as, but somewhat more general than, Dauxois et al. [4].

As has already been mentioned in the beginning, will be a separable Hilbert space and the Banach space of all bounded linear operators mapping into itself. The inner product on will be denoted by and the norm by The norm on will be written , and the notation and will be used to denote the subspace of all Hermitian and all compact Hermitian operators, respectively.

We will exclusively deal with infinite dimensional Hilbert spaces and will not attempt to include the simpler finite dimensional case in our formulation. The Fréchet derivative for arbitrary perturbations is well known in the finite dimensional matrix case. This result and further references can be found in the recent monograph by Bhatia [5]. In the finite dimensional case, this derivative is also implicitly present in Theorem 2.1 of Ruymgaart and Yang [6] to obtain the asymptotic distribution of a function of a random matrix.

2. The Fréchet Derivative

Let us fix an arbitrary with spectrum and a bounded open region in the complex plane with smooth boundary , such thatFurthermore, let us consider functions of typewhere is an open neighborhood of . Let us write

The resolventis analytic on the resolvent set , and the operatoris well defined. This relation establishes an algebra homomorphism [7, Section 17.2] which implies in particular thatif is also analytic. In particular, we have
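As a finite-dimensional illustration of the contour-integral functional calculus described above, the following sketch evaluates an analytic function of a symmetric matrix by numerical quadrature over a circle enclosing the spectrum, and compares the result with the value obtained from the eigendecomposition. It assumes the standard Riesz-Dunford form, that is, 1/(2πi) times the contour integral of φ(z) times the resolvent (zI − T)^{-1}. The matrix `T`, the circle (`center`, `radius`), the helper name `phi_via_contour`, and the choice φ = exp are all hypothetical choices made for this example only.

```python
import numpy as np

def phi_via_contour(T, phi, center, radius, n_nodes=200):
    """Trapezoidal-rule approximation of (1 / (2*pi*i)) * contour integral of
    phi(z) * (z*I - T)^{-1} dz over a circle that must enclose the spectrum of T."""
    d = T.shape[0]
    identity = np.eye(d)
    total = np.zeros((d, d), dtype=complex)
    for k in range(n_nodes):
        w = np.exp(2j * np.pi * k / n_nodes)   # point on the unit circle
        z = center + radius * w                # quadrature node on the contour
        resolvent = np.linalg.inv(z * identity - T)
        # dz / (2*pi*i) equals radius * w * d(theta) / (2*pi), with d(theta) = 2*pi / n_nodes
        total += phi(z) * resolvent * (radius * w / n_nodes)
    return total

# Hypothetical symmetric test matrix; its eigenvalues are 3 and 3 +/- sqrt(3).
T = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])

# Reference value of exp(T) from the spectral decomposition.
eigvals, eigvecs = np.linalg.eigh(T)
reference = eigvecs @ np.diag(np.exp(eigvals)) @ eigvecs.T

# The circle |z - 3| = 3 encloses all three eigenvalues.
approx = phi_via_contour(T, np.exp, center=3.0, radius=3.0)
max_error = np.max(np.abs(approx - reference))
print("max entrywise error:", max_error)
```

The trapezoidal rule is used because it converges geometrically for periodic analytic integrands, so a few hundred nodes suffice here; in the infinite-dimensional setting of the paper no such quadrature is implied.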

The operators are well defined for every sufficiently small. Note that according to Dunford and Schwartz [8, Lemma VII.6.11], there is a constant , such that

Theorem 2.1 (Fréchet Derivative). Let and suppose that satisfies (2.2). Then maps the neighborhood into , when defined in the usual way of functional calculus. This mapping is Fréchet differentiable at , tangentially to , with bounded derivative as defined by (2.8). More specifically, we havewhere is defined in (2.9) and

Proof. For to be well defined on the neighborhood, let us first show that To verify this, note that by (2.10) we have for such . Consequently, the operator is bounded for each , which entails (2.14). Hence, is well defined for with .
Applying a Neumann series expansion [9, Section 5.2] to the inverse on the left in (2.15), we obtain just as in Watson [10] for matrices. Term-wise integration yields (2.11).
The upper bounds in (2.12) and (2.13) are immediate from (2.8) and (2.9), respectively, by exploiting (2.3) and (2.15). The boundedness of as a linear operator mapping into itself follows at once from (2.12).
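The statement of the theorem can be checked numerically in a matrix analogue. The sketch below assumes that the derivative has the usual contour form, 1/(2πi) times the integral of φ(z) R(z) Π R(z) with resolvent R(z) = (zI − T)^{-1}, which is the first term produced by the Neumann series argument in the proof. It then verifies that the remainder shrinks quadratically when the perturbation is scaled down by a factor of ten. The matrices `T` and `E`, the contour, and all function names are hypothetical; the perturbation direction `E` is deliberately non-symmetric and does not commute with `T`.

```python
import numpy as np

def contour(center, radius, n_nodes=200):
    """Nodes on a circle and the matching weights dz / (2*pi*i) for the trapezoidal rule."""
    w = np.exp(2j * np.pi * np.arange(n_nodes) / n_nodes)
    return center + radius * w, radius * w / n_nodes

def phi_of(T, phi, nodes, weights):
    """phi(T) as a quadrature of phi(z) * (zI - T)^{-1} over the contour."""
    identity = np.eye(T.shape[0])
    total = np.zeros(T.shape, dtype=complex)
    for z, wt in zip(nodes, weights):
        total += wt * phi(z) * np.linalg.inv(z * identity - T)
    return total

def frechet_derivative(T, Pi, phi, nodes, weights):
    """Assumed form of the derivative: quadrature of phi(z) * R(z) @ Pi @ R(z)."""
    identity = np.eye(T.shape[0])
    total = np.zeros(T.shape, dtype=complex)
    for z, wt in zip(nodes, weights):
        R = np.linalg.inv(z * identity - T)
        total += wt * phi(z) * (R @ Pi @ R)
    return total

T = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])
E = np.array([[0.5, -1.0, 0.2],      # arbitrary direction, not commuting with T
              [0.3, 0.1, 0.7],
              [-0.4, 0.6, -0.2]])

nodes, weights = contour(center=3.0, radius=3.0)
base = phi_of(T, np.exp, nodes, weights)
derivative_E = frechet_derivative(T, E, np.exp, nodes, weights)

def remainder_norm(t):
    """Operator norm of phi(T + tE) - phi(T) - t * (derivative applied to E)."""
    perturbed = phi_of(T + t * E, np.exp, nodes, weights)
    return np.linalg.norm(perturbed - base - t * derivative_E, 2)

ratio = remainder_norm(1e-2) / remainder_norm(1e-3)
print("remainder ratio for a tenfold smaller perturbation:", ratio)
```

A ratio close to 100 is what a second-order remainder predicts; the derivative is linear in the perturbation, so scaling `E` by `t` scales the first-order term by `t`.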

Remark 2.2. It will be seen in Section 4 that for the applications we have in mind it is important that we do not require that and commute. If they do, however, it is clear that the Fréchet derivative in (2.8) reduces to It has been shown in Dunford and Schwartz [8, proof of Theorem VII.6.10] that Combination of (2.11) with (2.18) and (2.19) yields writing, for any , to indicate any quantity (operator, vector, number) whose norm or absolute value is of the given order. Note that in (2.20) the operator is to be understood in the sense of the usual functional calculus as in (2.5) with replaced by its derivative .
In this situation of commuting operators, Dunford and Schwartz [8] obtain the Taylor expansion which implies, of course, (2.20).
Keeping the perturbation as before, we now restrict to the class of compact Hermitian operators. The bounded and countable spectrum consists of the number , whether an eigenvalue or not, and all the nonzero eigenvalues . In this work, we avoid technical issues related to being an eigenvalue, and assume that is one-to-one, that is, implies that . It is well known [7] that such a can be represented aswhere the are the corresponding orthogonal eigenprojections onto the mutually orthogonal finite dimensional eigenspaces. These projections provide a resolution of the identity in , that is,The resolvent has the expansion

Corollary 2.3. Let the conditions of Theorem 2.1 be fulfilled for with expansion (2.23). In this case the Fréchet derivative is given by

Proof. Let us substitute the expansion (2.25) for into the expression for in (2.8). Application of the partial fraction method yieldsThe right-hand side of (2.27) reduces at once to the expression on the right in (2.26) by an application of Cauchy's integral formula.
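The partial fraction step in this proof can be sketched as follows. The notation is an assumption made for this illustration: $T$ is the compact Hermitian operator with eigenvalues $\lambda_j$ and eigenprojections $P_j$, $\Pi$ is the perturbation, $\varphi'_T\Pi$ denotes the Fréchet derivative, the resolvent is taken as $R(z) = \sum_j (z-\lambda_j)^{-1}P_j$, and the derivative is taken in the contour form $\frac{1}{2\pi i}\oint_\Gamma \varphi(z)R(z)\Pi R(z)\,dz$.

```latex
\begin{align*}
\varphi'_T \Pi
  &= \sum_{j}\sum_{k}
     \left( \frac{1}{2\pi i}\oint_{\Gamma}
            \frac{\varphi(z)}{(z-\lambda_j)(z-\lambda_k)}\,dz \right)
     P_j \Pi P_k, \\[4pt]
\frac{1}{(z-\lambda_j)(z-\lambda_k)}
  &= \frac{1}{\lambda_j-\lambda_k}
     \left( \frac{1}{z-\lambda_j} - \frac{1}{z-\lambda_k} \right),
     \qquad j \neq k, \\[4pt]
\frac{1}{2\pi i}\oint_{\Gamma}
     \frac{\varphi(z)}{(z-\lambda_j)(z-\lambda_k)}\,dz
  &= \begin{cases}
       \varphi'(\lambda_j), & j = k, \\[4pt]
       \dfrac{\varphi(\lambda_j)-\varphi(\lambda_k)}{\lambda_j-\lambda_k},
         & j \neq k.
     \end{cases}
\end{align*}
```

Under these assumptions the diagonal terms carry the ordinary derivative of $\varphi$ and the off-diagonal terms carry divided differences, which is the structure one expects of the single sum and the double sum referred to in Corollary 2.3.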

Example 2.4. The function , , is analytic on the entire complex plane so that Corollary 2.3 applies. The Fréchet derivative in (2.26) now reduces to. Of course this result is immediate because in this simple case .

Example 2.5. Next let us, for , consider the functionfor . Note that the choice of ensures that the pole at remains outside the contour . Clearly there exists an open region of the type required, such that is analytic on some open neighborhood of . Hence Corollary 2.3 applies again. The operator represents a regularized or generalized inverse of Tikhonov type, according to whether is injective or not. The Fréchet derivative in (2.26) now equalsfor .
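For a Tikhonov type function of the form φ(z) = 1/(α + z), the derivative in a matrix analogue can be written down directly from the resolvent identity, without any spectral decomposition: the first-order term is −(αI + T)^{-1} Π (αI + T)^{-1}. The sketch below checks this, together with the exact second-order remainder bound that follows from the same identity. The specific form of φ, the matrices `T` and `Pi`, and the value of `alpha` are assumptions made for this illustration.

```python
import numpy as np

alpha = 0.1

# Hypothetical nonnegative Hermitian matrix T = B^T B and a small,
# non-symmetric perturbation Pi that does not commute with T.
B = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])
T = B.T @ B
Pi = 1e-3 * np.array([[0.5, -1.0, 0.2],
                      [0.3, 0.1, 0.7],
                      [-0.4, 0.6, -0.2]])

identity = np.eye(3)
A = np.linalg.inv(alpha * identity + T)             # regularized inverse at T
A_tilde = np.linalg.inv(alpha * identity + T + Pi)  # regularized inverse at T + Pi

# First-order term: the derivative of T -> (alpha*I + T)^{-1} applied to Pi.
derivative = -A @ Pi @ A

# Exact identity: A_tilde - A - derivative = A_tilde @ Pi @ A @ Pi @ A,
# so the remainder is bounded by ||A_tilde|| * ||Pi||^2 * ||A||^2.
remainder = np.linalg.norm(A_tilde - A - derivative, 2)
bound = (np.linalg.norm(A_tilde, 2)
         * np.linalg.norm(Pi, 2) ** 2
         * np.linalg.norm(A, 2) ** 2)
print("remainder:", remainder, "bound:", bound)
```

Because the identity is exact, the bound holds for any perturbation for which αI + T + Π is invertible, not only asymptotically.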

Remark 2.6. For , and commuting the double sum on the right in (2.26) cancels and we obtainin accordance with (2.20). Apparently, the double sum is a correction term needed when and do not commute.
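The role of the off-diagonal correction can be made concrete for symmetric matrices. The sketch below implements a divided-difference form of the derivative (diagonal entries φ'(λ_j), off-diagonal entries (φ(λ_j) − φ(λ_k))/(λ_j − λ_k) in the eigenbasis of T), which is assumed here to be the matrix counterpart of the expression in Corollary 2.3. It compares this with the naive expression φ'(T)Π for a commuting and a non-commuting perturbation. All matrices and function names are hypothetical.

```python
import numpy as np

def apply_function(T, phi):
    """phi(T) for a real symmetric matrix T via its eigendecomposition."""
    lam, U = np.linalg.eigh(T)
    return U @ np.diag(phi(lam)) @ U.T

def spectral_derivative(T, Pi, phi, dphi):
    """Divided-difference form of the derivative of phi at T in direction Pi."""
    lam, U = np.linalg.eigh(T)
    values = phi(lam)
    gap = lam[:, None] - lam[None, :]
    same = np.abs(gap) < 1e-12                 # diagonal (or repeated eigenvalues)
    safe_gap = np.where(same, 1.0, gap)        # avoid division by zero
    divided = np.where(same,
                       dphi(lam)[:, None],
                       (values[:, None] - values[None, :]) / safe_gap)
    return U @ (divided * (U.T @ Pi @ U)) @ U.T

T = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])
phi, dphi = np.exp, np.exp                      # phi = exp, so phi' = exp

# Commuting perturbation: any polynomial in T commutes with T.
Pi_commuting = T @ T
commuting_matches = np.allclose(
    spectral_derivative(T, Pi_commuting, phi, dphi),
    apply_function(T, dphi) @ Pi_commuting, atol=1e-9)

# Non-commuting symmetric perturbation.
Pi_general = np.array([[0.0, 1.0, 0.0],
                       [1.0, 0.0, 0.0],
                       [0.0, 0.0, 0.0]])
D_general = spectral_derivative(T, Pi_general, phi, dphi)
naive = apply_function(T, dphi) @ Pi_general
general_matches = np.allclose(D_general, naive, atol=1e-9)

# Central finite difference as an independent check of the derivative.
t = 1e-5
finite_difference = (apply_function(T + t * Pi_general, phi)
                     - apply_function(T - t * Pi_general, phi)) / (2 * t)
fd_error = np.max(np.abs(finite_difference - D_general))
print(commuting_matches, general_matches, fd_error)
```

In the non-commuting case the naive expression is not even symmetric, whereas the derivative of a symmetric-matrix function in a symmetric direction must be.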

3. Perturbation of Eigenvalues and Eigenvectors

Throughout this section, both and are assumed to be Hermitian, so that also . In addition to this, we assume thatwith one-dimensional eigenspace. Consequently, the eigenprojection can be writtenwhere for the operator is defined by .

The region will now be chosen in such a way that it has a connected component with the propertiesA special analytic function such thatwill play an important role in the sequel. Note, for instance, that

For the Fréchet derivative of at , a special expression can be obtained. Let us writewhere is Hermitian with spectrum . According to the spectral theorem, there exists a resolution of the identity , , such thatIt should be noted thatwhere is the zero operator, and thatLet us define

Lemma 3.1. The Fréchet derivative of at is given by

Proof. This follows by substitution of (3.9) in the expression on the right infor this derivative; see also (2.8). We thus obtain
By Cauchy's integral formulaso that the first term on the right side in (3.13) is the zero operator. Regarding the second, note thatbecause each lies outside the contour . Consequently, the second term equalsSimilarly, the third term equals . The last term cancels, becausesince both and lie outside .

Some results about the perturbation of and in a given direction as in (1.3) that are well known in the literature [1, 2] can be partly recovered for perturbations in some neighborhood, in an essentially self-contained manner, as simple consequences of the results in Section 2.

Corollary 3.2. Under the assumptions (3.1), (3.2), and for sufficiently small, the operator has an isolated eigenvalue with eigenprojection for some unit vector , satisfyingwhere is defined in (3.10).

Proof. In view of (3.5) and (3.11), application of (2.11) with yields . Clearly is Hermitian and, because by (2.6), it is also idempotent, so that it is in fact some projection , say. It follows that for all sufficiently small, and hence the range of must also have dimension 1 [11], so that for some with .
Next, let , , be the identity function. By (2.6), again, on the one hand we have , and on the other . Hence is an eigenvector of with eigenvalue .

Corollary 3.3. Under the assumptions of Corollary 3.2, we have

Proof. Let us first observe that because of (3.8). Hence (3.18) yields , where It suffices to show that for . The idea of the proof can be found in Dauxois et al. [4].
Regarding , note that , once more using (3.18). Hence , as , and therefore for sufficiently small. This entails
For we haveas can be seen from (3.21).
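A matrix analogue of the first-order expansions for a simple eigenvalue and its unit eigenvector can be checked as follows. The sketch assumes the classical forms: the perturbed eigenvalue is approximately λ_1 + ⟨Π p_1, p_1⟩, and the perturbed eigenvector is approximately p_1 + Q Π p_1, where Q is taken to be the sum over j ≠ 1 of (λ_1 − λ_j)^{-1} P_j. Whether this Q coincides in sign and normalization with the operator defined in (3.10) is an assumption; the matrices and variable names are hypothetical.

```python
import numpy as np

T = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])
Pi = 1e-3 * np.array([[0.3, -1.0, 0.5],     # small symmetric perturbation
                      [-1.0, 0.2, 0.7],
                      [0.5, 0.7, -0.4]])

# Unperturbed spectral data; eigh returns eigenvalues in ascending order.
lam, U = np.linalg.eigh(T)
lam1, p1 = lam[-1], U[:, -1]                # largest eigenvalue, assumed simple

# Assumed form of Q: sum over the other eigenvalues of P_j / (lam1 - lam_j).
Q = np.zeros_like(T)
for j in range(len(lam) - 1):
    Q += np.outer(U[:, j], U[:, j]) / (lam1 - lam[j])

# Perturbed eigenpair, with the sign of the eigenvector aligned to p1.
lam_pert, U_pert = np.linalg.eigh(T + Pi)
lam1_pert, p1_pert = lam_pert[-1], U_pert[:, -1]
if p1_pert @ p1 < 0:
    p1_pert = -p1_pert

eigenvalue_error = abs(lam1_pert - (lam1 + p1 @ Pi @ p1))
eigenvector_error = np.linalg.norm(p1_pert - (p1 + Q @ Pi @ p1))

size = np.linalg.norm(Pi, 2)                # operator norm of the perturbation
gap = lam1 - lam[-2]                        # distance to the rest of the spectrum
print(eigenvalue_error, eigenvector_error, size, gap)
```

Both errors should be of the order of the squared norm of the perturbation, with constants governed by the spectral gap.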

Corollary 3.4. Under the assumptions of Corollary 3.2, we have

Proof. With the help of (3.19), we see that . The result follows from a routine calculation combined with the equalities , , and . For the last two equalities we assume that and are Hermitian and by (3.8).

Corollary 3.5. Let be given by (2.23) and satisfy (3.2). Then (3.18) and (3.19) remain true with

Proof. All nonzero eigenvalues of are isolated, in particular . It is immediate from (2.23) that , and this leads to the special expression for in (3.24).

Remark 3.6. The assumption that be Hermitian is in fact not necessary. Of course, if we just require to be bounded, the perturbed operator is not in general Hermitian anymore. In particular, a suitably modified version of Corollary 3.3 will now claim the existence of a pair of eigenvectors, for and for , with expansionsas .

4. Applications

In this section, we will sketch three applications: two in statistics and one in numerical analysis.

4.1. Noisy Integral Equations

Let be a compact injective integral operator, with measurable real kernel denoted by the same symbol without confusion. More specifically, input and output are related according toIn practice, only finitely many data regarding the output are available, usually blurred by random measurement error. If the data are collected according to a random design, we may think of the data set as of independent copies of a pair of random variables, wherethe design variable has a Uniform distribution, the error variable has finite variance and zero mean, and where and are stochastically independent.

The purpose is to recover from these data. It is expedient to “precondition” with the adjoint operator and recover from the equation where is compact, Hermitian, and strictly positive. Under suitable conditions, is an unbiased and -consistent estimator of ; see, for instance, van Rooij and Ruymgaart [12]. Since is unbounded, an estimator of the input is obtained by applying a regularized inverse of to . Here we will use the Tikhonov type inverse where ; see also (2.29). This yields the input estimator To assess the quality of the estimator, one considers the mean integrated squared error (MISE) The behavior of the MISE is well studied in the literature.

Recently, there has been interest in certain econometric models where the operator (or ) is unknown but can be estimated from the data. Let denote an estimator of and assume that is also compact, Hermitian, and nonnegative. In this case, the input estimator will be employed. One expects that estimation of will increase the MISE, and naturally the question arises how much bigger the MISE of will be than that of .

An upper bound for this increase of the MISE can be easily found from the results in Section 2. For large sample size , will be close to , and can be considered as a small random perturbation of . Writing for the Fréchet derivative at , we see from Theorem 2.1 thatApparently, is an extra error term due to the estimation of .

To find an upper bound for its MISE, let us first observe that (2.30) simplifies for and yieldswhere now the and the are the spectral characteristics of . Let us write, for brevity,and note thatWe thus arrive at

Hence, under suitable assumptions, estimation of the kernel yields an extra term in the MISE of the input estimator which is of order . In the Russian literature, sharper bounds can be found; see in particular Bakushinsky and Kokurin [13, Section 2.2]. For results of this type in the statistical literature, obtained in a different manner, see, for instance, Hall and Horowitz [14] and Florens [15].
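The effect of replacing the operator by an estimate can be illustrated in a discretized, deterministic setting. The sketch below is not the estimator of this section: it uses a hypothetical Gaussian kernel on a grid, a hypothetical input function, a fixed symmetric perturbation `Pi` in place of a random estimation error, and the plain Euclidean norm in place of the integrated squared error. It assumes the Tikhonov form (αI + R)^{-1} and shows that the extra error term is, to first order, −(αI + R)^{-1} Π (αI + R)^{-1} q, whose norm is at most ‖Π‖ ‖q‖ / α² when R is nonnegative.

```python
import numpy as np

m = 50
grid = (np.arange(m) + 0.5) / m

# Hypothetical discretized integral operator K and the preconditioned operator R = K^T K.
K = np.exp(-(grid[:, None] - grid[None, :]) ** 2) / m
R = K.T @ K
alpha = 1e-2

# Fixed symmetric perturbation standing in for the estimation error of R,
# rescaled to operator norm 1e-4.
rng = np.random.default_rng(0)
G = rng.standard_normal((m, m))
Pi = (G + G.T) / 2
Pi *= 1e-4 / np.linalg.norm(Pi, 2)

# Noiseless right-hand side q = R f for a hypothetical input f.
f_true = np.sin(2 * np.pi * grid)
q = R @ f_true

identity = np.eye(m)
A = np.linalg.inv(alpha * identity + R)
f_alpha = A @ q                                            # operator known
f_alpha_hat = np.linalg.solve(alpha * identity + R + Pi, q)  # operator perturbed

# First-order extra term predicted by the Frechet derivative of the Tikhonov inverse.
first_order = -A @ Pi @ f_alpha
remainder = f_alpha_hat - f_alpha - first_order

first_order_norm = np.linalg.norm(first_order)
remainder_norm = np.linalg.norm(remainder)
bound = np.linalg.norm(Pi, 2) * np.linalg.norm(q) / alpha ** 2
print(first_order_norm, remainder_norm, bound)
```

The factor 1/α² in the bound shows how the regularization parameter amplifies the estimation error of the operator, which is why the perturbation must be small relative to α for the first-order term to dominate.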

4.2. Some Asymptotics for Functional Canonical Correlations

Let be a real random element in the Hilbert space and assume that . Its mean and covariance operator are well defined by the relations , for all . The operator is known to be of finite trace and hence Hilbert-Schmidt and compact. It is also nonnegative Hermitian. Without real loss of generality, we will assume to be injective, so that it will be strictly positive.

Next suppose that we are given a random sample of independent copies of . The usual estimators of and are and , respectively, where shares all the properties of , except that it cannot be injective because it has a finite dimensional kernel whose range has dimension at most .

Because cannot be injective, the finite dimensional definition of sample canonical correlation has to be modified, and some kind of smoothing or regularization is recommended in the literature [16]. Regularization might even be useful when the population is considered, although is injective [17]. This regularization yields Tikhonov type inverses in an expression for the canonical correlation .

For a precise definition, let and be two closed subspaces of and the orthogonal projection onto (). Let us write , and note that for . Similar notation will be used for . The regularized squared principal canonical correlation for the population is now defined asIts sample analogue is obtained by replacing the with in (4.14). The supremum is actually a maximum, and pairs of maximizers will be denoted by , , and , , respectively. The corresponding canonical variates then are

For an alternative description of these canonical correlations, let us introduce the operatorInterchanging the indices and yields , and replacing with yields and . It can be seen that all these operators are Hilbert-Schmidt and strictly positive Hermitian. It will be assumed that has the largest eigenvalue with one-dimensional eigenspace generated by with . Under this condition, it has been shown in Cupidon et al. [18] thatfor . A similar result holds true for .

It is well known that the asymptotic distribution of the eigenvalues and eigenfunctions of a random operator can be derived from the asymptotic distribution of this random operator itself (see [10] for Euclidean spaces and [4] for Hilbert spaces). This technique is based on the results of Section 3. In the present situation, this means that we have to show the convergence in distribution of the suitably standardized . Because all operators are Hilbert-Schmidt, it can be shown that

Result (4.18) follows easily if convergence in distribution can be established for each of the factors defining , for instance, where this time , compare (2.29). It is known [4] that for some Gaussian random element , where is the Hilbert space of all Hilbert-Schmidt operators mapping into itself. Writing for the Fréchet derivative evaluated at (Section 2) and exploiting the fact that the imbedding of is continuous, we obtain via a kind of delta-method [18, 19] the desired result. A combination of results like this for each of the factors of yields (4.18).

4.3. Solution of a Nonlinear Operator Equation

In Bakushinsky and Kokurin [13], the following problem is considered. Let and be Hilbert spaces and an operator, not necessarily linear. The (nonlinear) equationis studied. Let be a solution of (4.22) and introduce a set , for some . It is assumed that is Fréchet differentiable on . If is the derivative at it is, moreover, assumed thatwhere is a given number. Given an initial point and a sequence , , of regularization parameters, these authors show that, under some further conditions, the generalized Gauss-Newton method generates a sequence of points such that

In their proof of this result, the authors need a crucial upper bound. Under some additional assumptions, we want to derive this upper bound as an immediate consequence of Theorem 2.1. In order to relate the present problem to the setup of our paper, let us assume that , and note thatFor , letand setwhere obviously . It is not hard to see that (4.23) entailsfor some . Let be the contour in (2.1) and the corresponding domain. As in Bakushinsky and Kokurin [13], a function , , is employed in the iteration scheme, which is analytic on .

Narrowing down the generality in Bakushinsky and Kokurin [13] somewhat further, so that the current conditions are satisfied, their proof of the convergence of the iterations requires an upper bound for the expression (in our notation) Keeping fixed, let us briefly write this last expression as . Now Theorem 2.1 applies with , and application yields at once for some , by (4.28).

Acknowledgments

The authors are grateful to the referee for some useful comments. For this research, D. S. Gilliam was supported by AFOSR Grant no. FA9550-04-1027 and F. H. Ruymgaart by NSF Grant no. DMS-0605167.