Abstract
The main result in this paper is the determination of the Fréchet derivative of an analytic function of a bounded operator, tangentially to the space of all bounded operators. Some applied problems from statistics and numerical analysis are included as a motivation for this study. The perturbation operator (increment) is not of any special form and is not supposed to commute with the operator at which the derivative is evaluated. This generality is important for the applications. In the Hermitian case, moreover, some results are included on the perturbation of an isolated eigenvalue, of its eigenprojection, and, if the eigenvalue is simple, of its eigenvector. Although these results are known in principle, they are not in general formulated in terms of arbitrary perturbations as required for the applications. Moreover, these results are presented as corollaries to the main theorem, so that this paper also provides a short, essentially self-contained review of these aspects of perturbation theory.
1. Introduction
Motivated by certain applications in numerical analysis and, in particular, statistics, this paper deals with the Fréchet derivative of an analytic function $\varphi:\Omega\to\mathbb C$ of a bounded linear operator $T:\mathcal H\to\mathcal H$ on a separable Hilbert space $\mathcal H$ (in the sense of the usual functional calculus), tangentially to the Banach space $\mathcal L$ of all bounded linear operators mapping $\mathcal H$ into itself.
More precisely, a first order approximation to the difference
$$\varphi(T+\Pi)-\varphi(T), \qquad \Pi\in\mathcal L, \tag{1.1}$$
is obtained, including the order of magnitude of the remainder. An example of such a function is a generalized or regularized inverse of the square root,
$$\varphi(T)=(\alpha I+T)^{-1/2}, \qquad \alpha>0, \tag{1.2}$$
where $I$ is the identity operator and $T$ is nonnegative Hermitian. Once the Fréchet derivative has been established (Section 2), it yields the asymptotic distribution of functions of certain random operators via an ensuing delta method, a well-known statistical technique (see Section 4).
Clearly $T+\Pi$ can be regarded as a perturbed version of $T$, and it is not surprising that perturbation methods are employed to obtain the desired result. The authors are aware that the rather straightforward result on the Fréchet derivative might be hidden somewhere in the rich literature on perturbation theory [1–3]. Yet they have not succeeded in identifying a reference that states the result in its present form, tailored to the applications they have in mind. A few remarks are in order.
(a)
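The perturbations $\Pi$ are typically of small norm but are otherwise arbitrary bounded or Hermitian operators. In the literature, they are often of the form
$$\Pi=\varepsilon T^{(1)}+\varepsilon^{2}T^{(2)}+\cdots, \tag{1.3}$$
for operators $T^{(1)},T^{(2)},\ldots\in\mathcal L$ and a small number $\varepsilon$. In statistics, there is no point in representing the perturbation in such a form.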
(b)
The perturbation $\Pi$ and the operator $T$ are not assumed to commute, because in our applications such an assumption would not in general be fulfilled. If the operators do commute, however, the Fréchet derivative reduces to $\varphi'(T)\Pi$, in the sense of functional calculus with the derivative $\varphi'$ of $\varphi$. In the case considered here, the actual Fréchet derivative and $\varphi'(T)\Pi$ may differ considerably.
(c)
A central theme in perturbation theory concerns the perturbation of an isolated eigenvalue and the corresponding eigenprojection (see, e.g., the references mentioned before). Some of these results are included here, because they can easily be derived from the main result on the Fréchet derivative by choosing a special function (Section 3). In this way, the paper presents a concise and essentially self-contained review of some basic results in this area. They are again presented in terms of a general (Hermitian) perturbation $\Pi$, as required for statistical applications, in the same vein as, but somewhat more general than, Dauxois et al. [4].
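Remark (b) can be illustrated in the finite dimensional matrix case, where the derivative is well known (see Bhatia [5]). The sketch below is a hypothetical example that is not taken from the paper: it uses $\varphi=\exp$, a diagonal $2\times2$ matrix $T$, and a symmetric $\Pi$ that does not commute with $T$. It compares a central finite difference of $\exp$ at $T$ in the direction $\Pi$ with the naive operator $\varphi'(T)\Pi=\exp(T)\Pi$ and with the divided-difference form of the derivative, which is the matrix analogue of Corollary 2.3. The helpers `matmul`, `add`, and `expm` are ad hoc and assume small matrices of moderate norm.

```python
import math


def matmul(A, B):
    # product of two square matrices stored as lists of rows
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def add(A, B, c=1.0):
    # A + c * B for square matrices
    n = len(A)
    return [[A[i][j] + c * B[i][j] for j in range(n)] for i in range(n)]


def expm(A, terms=40):
    # truncated Taylor series sum_k A^k / k!, adequate for matrices of moderate norm
    n = len(A)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, A)]
        result = add(result, term)
    return result


T = [[0.0, 0.0], [0.0, 1.0]]    # diagonal, eigenvalues 0 and 1
Pi = [[0.0, 1.0], [1.0, 0.0]]   # symmetric, does not commute with T

# central difference of t -> exp(T + t Pi) at t = 0
h = 1e-4
plus, minus = expm(add(T, Pi, h)), expm(add(T, Pi, -h))
fd = [[(plus[i][j] - minus[i][j]) / (2 * h) for j in range(2)] for i in range(2)]

# divided-difference form of the derivative; valid as written because T is diagonal
lam = [0.0, 1.0]


def divided_difference(a, b):
    return math.exp(a) if a == b else (math.exp(a) - math.exp(b)) / (a - b)


frechet = [[divided_difference(lam[i], lam[j]) * Pi[i][j] for j in range(2)] for i in range(2)]

naive = matmul(expm(T), Pi)     # phi'(T) Pi with phi' = exp

print(fd)
print(frechet)
print(naive)
```

The finite difference agrees with the divided-difference matrix, whose off-diagonal entries both equal $e-1$. The naive operator $\exp(T)\Pi$ has off-diagonal entries $1$ and $e$, so it is not even symmetric.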
As mentioned at the beginning, $\mathcal H$ will be a separable Hilbert space and $\mathcal L$ the Banach space of all bounded linear operators mapping $\mathcal H$ into itself. The inner product on $\mathcal H$ will be denoted by $\langle\cdot,\cdot\rangle$ and the norm by $\|\cdot\|$. The norm on $\mathcal L$ will be written $\|\cdot\|_{\mathcal L}$, and the notation $\mathcal L_{\mathcal H}$ and $\mathcal L_{\mathcal C}$ will be used to denote the subspaces of all Hermitian and all compact Hermitian operators, respectively.
We will exclusively deal with infinite dimensional
Hilbert spaces and will not attempt to include the simpler finite dimensional
case in our formulation. The Fréchet derivative for arbitrary perturbations is
well known in the finite dimensional matrix case. This result and further
references can be found in the recent monograph by Bhatia [5]. In the finite dimensional case, this derivative is also implicitly present in Theorem 2.1 of Ruymgaart and Yang [6], where it is used to obtain the asymptotic distribution of a function of a random matrix.
2. The Fréchet Derivative
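Let us fix an arbitrary $T\in\mathcal L$ with spectrum $\sigma(T)$ and a bounded open region $D$ in the complex plane with smooth boundary $\Gamma=\partial D$, such that
$$\sigma(T)\subset D. \tag{2.1}$$
Furthermore, let us consider functions of type
$$\varphi:\Omega\to\mathbb C \ \text{ analytic}, \tag{2.2}$$
where $\Omega\supset\overline D$ is an open neighborhood of $\overline D$. Let us write
$$\|\varphi\|_{\Gamma}=\max_{z\in\Gamma}|\varphi(z)|, \qquad |\Gamma|=\text{length of }\Gamma. \tag{2.3}$$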
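The resolvent
$$R(z)=(zI-T)^{-1}, \qquad z\in\rho(T)=\mathbb C\setminus\sigma(T), \tag{2.4}$$
is analytic on the resolvent set $\rho(T)$, and the operator
$$\varphi(T)=\frac{1}{2\pi i}\oint_{\Gamma}\varphi(z)R(z)\,dz \tag{2.5}$$
is well defined. This relation establishes an algebra homomorphism [7, Section 17.2], which implies in particular that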
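$$(\varphi\psi)(T)=\varphi(T)\psi(T) \tag{2.6}$$
if $\psi:\Omega\to\mathbb C$ is also analytic. Taking for $\varphi$ the constant function $1$ and the identity function, we have in particular
$$\frac{1}{2\pi i}\oint_{\Gamma}R(z)\,dz=I, \qquad \frac{1}{2\pi i}\oint_{\Gamma}zR(z)\,dz=T. \tag{2.7}$$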
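The operators
$$\varphi'_T\Pi=\frac{1}{2\pi i}\oint_{\Gamma}\varphi(z)R(z)\Pi R(z)\,dz, \tag{2.8}$$
$$S_{\varphi,T}(\Pi)=\frac{1}{2\pi i}\oint_{\Gamma}\varphi(z)R(z)\{\Pi R(z)\}^{2}\{I-\Pi R(z)\}^{-1}dz, \tag{2.9}$$
are well defined for every $\Pi\in\mathcal L$ with $\|\Pi\|_{\mathcal L}$ sufficiently small. Note that, according to Dunford and Schwartz [8, Lemma VII.6.11], there is a constant $0<M<\infty$ such that
$$\sup_{z\in\mathbb C\setminus D}\|R(z)\|_{\mathcal L}\le M. \tag{2.10}$$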
Theorem 2.1 (Fréchet Derivative).
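Let $T\in\mathcal L$ and suppose that $\varphi$ satisfies (2.2). Then $\varphi$ maps the neighborhood $\{T+\Pi:\Pi\in\mathcal L,\ \|\Pi\|_{\mathcal L}<1/M\}$ into $\mathcal L$, when defined in the usual way of functional calculus. This mapping is Fréchet differentiable at $T$, tangentially to $\mathcal L$, with bounded derivative $\varphi'_T:\mathcal L\to\mathcal L$ as defined by (2.8). More specifically, we have
$$\varphi(T+\Pi)=\varphi(T)+\varphi'_T\Pi+S_{\varphi,T}(\Pi), \tag{2.11}$$
where $S_{\varphi,T}(\Pi)$ is defined in (2.9) and
$$\|\varphi'_T\Pi\|_{\mathcal L}\le\frac{|\Gamma|\,\|\varphi\|_{\Gamma}\,M^{2}}{2\pi}\,\|\Pi\|_{\mathcal L}, \tag{2.12}$$
$$\|S_{\varphi,T}(\Pi)\|_{\mathcal L}\le\frac{|\Gamma|\,\|\varphi\|_{\Gamma}\,M^{3}}{2\pi}\,\frac{\|\Pi\|_{\mathcal L}^{2}}{1-M\|\Pi\|_{\mathcal L}}. \tag{2.13}$$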
Proof.
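For $\varphi(T+\Pi)$ to be well defined on the neighborhood, let us first show that
$$\sigma(T+\Pi)\subset D \quad\text{for all }\Pi\in\mathcal L\text{ with }\|\Pi\|_{\mathcal L}<1/M. \tag{2.14}$$
To verify this, note that by (2.10) we have $\|\Pi R(z)\|_{\mathcal L}\le M\|\Pi\|_{\mathcal L}<1$ for all $z\in\mathbb C\setminus D$ and such $\Pi$. Consequently, the operator
$$(zI-T-\Pi)^{-1}=R(z)\{I-\Pi R(z)\}^{-1} \tag{2.15}$$
is bounded for each $z\in\mathbb C\setminus D$, which entails (2.14). Hence,
$$\varphi(T+\Pi)=\frac{1}{2\pi i}\oint_{\Gamma}\varphi(z)(zI-T-\Pi)^{-1}dz \tag{2.16}$$
is well defined for $\Pi\in\mathcal L$ with $\|\Pi\|_{\mathcal L}<1/M$.
Applying a Neumann series expansion [9, Section 5.2] to the inverse $\{I-\Pi R(z)\}^{-1}$ on the right in (2.15), we obtain
$$(zI-T-\Pi)^{-1}=R(z)+R(z)\Pi R(z)+R(z)\{\Pi R(z)\}^{2}\{I-\Pi R(z)\}^{-1}, \tag{2.17}$$
just as in Watson [10] for matrices. Term-wise integration yields (2.11).
The upper bounds in (2.12) and (2.13) are immediate from (2.8) and (2.9), respectively, by exploiting (2.3), (2.10), and (2.15), together with the bound $\|\{I-\Pi R(z)\}^{-1}\|_{\mathcal L}\le(1-M\|\Pi\|_{\mathcal L})^{-1}$, $z\in\Gamma$, which also follows from the Neumann series. The boundedness of $\varphi'_T$ as a linear operator mapping $\mathcal L$ into itself follows at once from (2.12).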
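Theorem 2.1 can be checked numerically in a finite dimensional sketch. The contour integrals (2.5) and (2.8) are approximated by the trapezoidal rule on a circle $\Gamma$ enclosing the spectrum, for a hypothetical non-Hermitian $2\times2$ matrix $T$, a non-commuting $\Pi$, and $\varphi=\exp$. The sketch checks $\varphi(T)$ against a truncated Taylor series. It also checks that the remainder $S_{\varphi,T}$ in (2.11) shrinks by a factor of about 4 when $\Pi$ is halved, as (2.13) predicts. The radius, the number of nodes, and all helper names are assumptions made for illustration only.

```python
import cmath
import math


def matmul(A, B):
    # product of two 2x2 matrices stored as lists of rows
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def lincomb(A, B, c):
    # A + c * B
    return [[A[i][j] + c * B[i][j] for j in range(2)] for i in range(2)]


def inv2(A):
    # closed-form inverse of an invertible 2x2 matrix
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def expm(A, terms=40):
    # truncated Taylor series of the matrix exponential (fine for small norms)
    result = [[1.0, 0.0], [0.0, 1.0]]
    term = [[1.0, 0.0], [0.0, 1.0]]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, A)]
        result = lincomb(result, term, 1.0)
    return result


def fro(A):
    # Frobenius norm
    return math.sqrt(sum(abs(x) ** 2 for row in A for x in row))


def contour(T, Pi, phi, radius=2.0, nodes=64):
    # Trapezoidal rule on |z| = radius for (2.5) and (2.8):
    #   phi(T)    = (1/2 pi i) oint phi(z) R(z) dz
    #   phi'_T Pi = (1/2 pi i) oint phi(z) R(z) Pi R(z) dz
    # With z = radius * exp(i theta) and dz = i z dtheta, node z gets weight z / nodes.
    F = [[0j, 0j], [0j, 0j]]
    D = [[0j, 0j], [0j, 0j]]
    for m in range(nodes):
        z = radius * cmath.exp(2j * math.pi * m / nodes)
        R = inv2([[z - T[0][0], -T[0][1]], [-T[1][0], z - T[1][1]]])  # R(z) = (zI - T)^{-1}
        w = phi(z) * z / nodes
        F = lincomb(F, R, w)
        D = lincomb(D, matmul(matmul(R, Pi), R), w)
    return F, D


T = [[0.5, 1.0], [0.0, -0.3]]      # hypothetical non-Hermitian T, spectrum {0.5, -0.3}
Pi = [[0.1, -0.2], [0.3, 0.05]]    # hypothetical perturbation, does not commute with T
phiT, dphi = contour(T, Pi, cmath.exp)


def remainder(t):
    # || exp(T + t Pi) - exp(T) - t phi'_T Pi ||, i.e. the norm of S_{phi,T}(t Pi) in (2.11)
    diff = lincomb(lincomb(expm(lincomb(T, Pi, t)), expm(T), -1.0), dphi, -t)
    return fro(diff)


quad_error = fro(lincomb(phiT, expm(T), -1.0))
ratio = remainder(0.1) / remainder(0.05)
print(quad_error, ratio)   # ratio close to 4: the remainder is of second order
```

The trapezoidal rule is chosen here because it converges very quickly for periodic analytic integrands. Any other accurate quadrature on $\Gamma$ would serve the same purpose.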
Remark 2.2.
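It will be seen in Section 4 that, for the applications we have in mind, it is important that we do not require $T$ and $\Pi$ to commute. If they do, however, it is clear that the Fréchet derivative in (2.8) reduces to
$$\varphi'_T\Pi=\Big\{\frac{1}{2\pi i}\oint_{\Gamma}\varphi(z)R^{2}(z)\,dz\Big\}\Pi. \tag{2.18}$$
It has been shown in Dunford and Schwartz [8, proof of Theorem VII.6.10] that
$$\frac{1}{2\pi i}\oint_{\Gamma}\varphi(z)R^{2}(z)\,dz=\varphi'(T). \tag{2.19}$$
Combination of (2.11) with (2.18) and (2.19) yields
$$\varphi(T+\Pi)=\varphi(T)+\varphi'(T)\Pi+O(\|\Pi\|_{\mathcal L}^{2}), \qquad\text{as }\|\Pi\|_{\mathcal L}\to0, \tag{2.20}$$
writing, for any $p>0$,
$$O(\|\Pi\|_{\mathcal L}^{p}) \tag{2.21}$$
to indicate any quantity (operator, vector, number) whose norm or absolute value is of the given order. Note that in (2.20) the operator $\varphi'(T)$ is to be understood in the sense of the usual functional calculus as in (2.5), with $\varphi$ replaced by its derivative $\varphi'$.
In this situation of commuting operators, Dunford and Schwartz [8] obtain the Taylor expansion
$$\varphi(T+\Pi)=\sum_{k=0}^{\infty}\frac{1}{k!}\,\varphi^{(k)}(T)\,\Pi^{k}, \qquad \|\Pi\|_{\mathcal L}\text{ sufficiently small}, \tag{2.22}$$
which implies, of course, (2.20).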
Keeping the perturbation $\Pi\in\mathcal L$ as before, we now restrict $T$ to the class $\mathcal L_{\mathcal C}$ of compact Hermitian operators. Their bounded and countable spectrum consists of the number $0$, whether an eigenvalue or not, and all the distinct nonzero eigenvalues $\lambda_1,\lambda_2,\ldots$. In this work, we avoid technical issues related to $0$ being an eigenvalue, and assume that $T$ is one-to-one, that is, $Tf=0$ implies that $f=0$. It is well known [7] that such a $T$ can be represented as
$$T=\sum_{k=1}^{\infty}\lambda_kP_k, \tag{2.23}$$
where the $P_k$ are the corresponding orthogonal eigenprojections onto the mutually orthogonal finite dimensional eigenspaces. These projections provide a resolution of the identity in $\mathcal H$, that is,
$$\sum_{k=1}^{\infty}P_k=I. \tag{2.24}$$
The resolvent has the expansion
$$R(z)=\sum_{k=1}^{\infty}\frac{1}{z-\lambda_k}\,P_k, \qquad z\in\rho(T). \tag{2.25}$$
Corollary 2.3.
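Let the conditions of Theorem 2.1 be fulfilled for $T\in\mathcal L_{\mathcal C}$ with expansion (2.23). In this case the Fréchet derivative $\varphi'_T:\mathcal L\to\mathcal L$ is given by
$$\varphi'_T\Pi=\sum_{k=1}^{\infty}\varphi'(\lambda_k)P_k\Pi P_k+\sum_{j\neq k}\frac{\varphi(\lambda_k)-\varphi(\lambda_j)}{\lambda_k-\lambda_j}\,P_j\Pi P_k. \tag{2.26}$$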
Proof.
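Let us substitute the expansion (2.25) for $R(z)$ into the expression for $\varphi'_T\Pi$ in (2.8). Application of the partial fraction method yields
$$\varphi'_T\Pi=\sum_{k=1}^{\infty}\Big\{\frac{1}{2\pi i}\oint_{\Gamma}\frac{\varphi(z)}{(z-\lambda_k)^{2}}\,dz\Big\}P_k\Pi P_k+\sum_{j\neq k}\frac{1}{\lambda_k-\lambda_j}\Big\{\frac{1}{2\pi i}\oint_{\Gamma}\Big(\frac{\varphi(z)}{z-\lambda_k}-\frac{\varphi(z)}{z-\lambda_j}\Big)dz\Big\}P_j\Pi P_k. \tag{2.27}$$
The right-hand side of (2.27) reduces at once to the expression on the right in (2.26) by an application of Cauchy's integral formula.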
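For a single pair $j\neq k$ with $\lambda_j,\lambda_k\in D$, the scalar facts behind the last step of this proof are the partial fraction identity and two instances of Cauchy's integral formula, one for $\varphi$ and one for $\varphi'$:

$$\frac{1}{(z-\lambda_j)(z-\lambda_k)}=\frac{1}{\lambda_k-\lambda_j}\Big(\frac{1}{z-\lambda_k}-\frac{1}{z-\lambda_j}\Big),$$

$$\frac{1}{2\pi i}\oint_{\Gamma}\frac{\varphi(z)\,dz}{(z-\lambda_j)(z-\lambda_k)}=\frac{\varphi(\lambda_k)-\varphi(\lambda_j)}{\lambda_k-\lambda_j}, \qquad \frac{1}{2\pi i}\oint_{\Gamma}\frac{\varphi(z)\,dz}{(z-\lambda_k)^{2}}=\varphi'(\lambda_k).$$

As $\lambda_j\to\lambda_k$, the divided difference tends to $\varphi'(\lambda_k)$. The diagonal and off-diagonal coefficients in (2.26) therefore fit together continuously.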
Example 2.4.
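The function $\varphi(z)=z$, $z\in\mathbb C$, is analytic on the entire complex plane, so Corollary 2.3 applies. The Fréchet derivative in (2.26) now reduces to
$$\varphi'_T\Pi=\sum_{k=1}^{\infty}P_k\Pi P_k+\sum_{j\neq k}P_j\Pi P_k=\sum_{j=1}^{\infty}\sum_{k=1}^{\infty}P_j\Pi P_k=\Pi, \tag{2.28}$$
by (2.24). Of course, this result is immediate because in this simple case $\varphi(T+\Pi)-\varphi(T)=\Pi$.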
Example 2.5.
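Next let us, for $\alpha>0$, consider the function
$$\varphi_\alpha(z)=\frac{1}{\alpha+z}, \tag{2.29}$$
for $z\neq-\alpha$. Note that, for nonnegative $T$, the choice $\alpha>0$ ensures that the pole at $-\alpha$ remains outside the contour $\Gamma$. Clearly there then exists an open region $D$ of the required type, such that $\varphi_\alpha$ is analytic on some open neighborhood $\Omega$ of $\overline D$. Hence Corollary 2.3 applies again. The operator $\varphi_\alpha(T)=(\alpha I+T)^{-1}$ represents a regularized or generalized inverse of Tikhonov type, according to whether $T$ is injective or not. The Fréchet derivative in (2.26) now equals
$$\varphi'_{\alpha,T}\Pi=-\sum_{k=1}^{\infty}\frac{1}{(\alpha+\lambda_k)^{2}}P_k\Pi P_k-\sum_{j\neq k}\frac{1}{(\alpha+\lambda_j)(\alpha+\lambda_k)}P_j\Pi P_k=-(\alpha I+T)^{-1}\Pi(\alpha I+T)^{-1}, \tag{2.30}$$
for $\Pi\in\mathcal L$.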
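The formula in Example 2.5 can be checked in two dimensions. The sketch uses a hypothetical $\alpha=0.1$ and a hypothetical $2\times2$ symmetric $T$ with eigenvalues $2.0$ and $0.5$ and rotated eigenvectors. It evaluates the double sum (2.26) for $\varphi_\alpha(z)=1/(\alpha+z)$ and compares it with two things. The first is $-(\alpha I+T)^{-1}\Pi(\alpha I+T)^{-1}$, which equals the double sum because $(\alpha I+T)^{-1}=\sum_k(\alpha+\lambda_k)^{-1}P_k$. The second is a central finite difference of the inverse. All numerical values are illustrative assumptions.

```python
import math


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def lincomb(A, B, c):
    # A + c * B
    return [[A[i][j] + c * B[i][j] for j in range(2)] for i in range(2)]


def inv2(A):
    # closed-form inverse of an invertible 2x2 matrix
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


alpha = 0.1                                 # hypothetical regularization parameter


def phi(x):
    return 1.0 / (alpha + x)                # (2.29)


def dphi(x):
    return -1.0 / (alpha + x) ** 2


# Hypothetical T = 2.0 P_1 + 0.5 P_2 with eigenvectors rotated by theta.
theta = 0.3
q = [[math.cos(theta), math.sin(theta)], [-math.sin(theta), math.cos(theta)]]
lam = [2.0, 0.5]
P = [[[u[i] * u[j] for j in range(2)] for i in range(2)] for u in q]   # P_k = q_k (x) q_k
T = lincomb(lincomb([[0.0, 0.0], [0.0, 0.0]], P[0], lam[0]), P[1], lam[1])
Pi = [[0.3, -0.1], [-0.1, 0.2]]             # hypothetical Hermitian perturbation

# (2.26): phi'(lambda_k) on the diagonal, divided differences off the diagonal
frechet = [[0.0, 0.0], [0.0, 0.0]]
for j in range(2):
    for k in range(2):
        c = dphi(lam[j]) if j == k else (phi(lam[k]) - phi(lam[j])) / (lam[k] - lam[j])
        frechet = lincomb(frechet, matmul(matmul(P[j], Pi), P[k]), c)

A = lincomb([[alpha, 0.0], [0.0, alpha]], T, 1.0)           # alpha I + T
Ainv = inv2(A)
closed_form = [[-x for x in row] for row in matmul(matmul(Ainv, Pi), Ainv)]

# central difference of t -> (alpha I + T + t Pi)^{-1} at t = 0
h = 1e-5
plus, minus = inv2(lincomb(A, Pi, h)), inv2(lincomb(A, Pi, -h))
fd = [[(plus[i][j] - minus[i][j]) / (2 * h) for j in range(2)] for i in range(2)]

err_closed = max(abs(frechet[i][j] - closed_form[i][j]) for i in range(2) for j in range(2))
err_fd = max(abs(frechet[i][j] - fd[i][j]) for i in range(2) for j in range(2))
print(err_closed, err_fd)
```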
Remark 2.6.
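For $T$ and $\Pi$ commuting, the double sum on the right in (2.26) vanishes, since then $P_j\Pi P_k=P_jP_k\Pi=0$ for $j\neq k$, and we obtain
$$\varphi'_T\Pi=\sum_{k=1}^{\infty}\varphi'(\lambda_k)P_k\Pi=\varphi'(T)\Pi, \tag{2.31}$$
in accordance with (2.20). Thus the double sum is a correction term needed when $T$ and $\Pi$ do not commute.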
3. Perturbation of Eigenvalues and Eigenvectors
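Throughout this section, both $T$ and $\Pi$ are assumed to be Hermitian, so that $T+\Pi\in\mathcal L_{\mathcal H}$ as well. In addition, we assume that
$$\lambda_1\in\sigma(T)\ \text{ is an isolated eigenvalue of }T, \tag{3.1}$$
with one-dimensional eigenspace. Consequently, the eigenprojection can be written
$$P_1=p_1\otimes p_1, \qquad \|p_1\|=1, \tag{3.2}$$
where for $a,b\in\mathcal H$ the operator $a\otimes b$ is defined by $(a\otimes b)x=\langle x,b\rangle a$, $x\in\mathcal H$.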
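The region $D$ will now be chosen in such a way that it has a connected component $D_1$, with boundary $\Gamma_1=\partial D_1$, with the properties
$$\lambda_1\in D_1, \qquad \sigma(T)\setminus\{\lambda_1\}\subset D\setminus\overline{D_1}. \tag{3.3}$$
A special analytic function $\psi:\Omega\to\mathbb C$ such that
$$\psi(z)=\begin{cases}1, & z\in\Omega_1,\\ 0, & z\in\Omega\setminus\Omega_1,\end{cases} \tag{3.4}$$
where $\Omega$ is chosen as the union of two disjoint open sets $\Omega_1\supset\overline{D_1}$ and $\Omega\setminus\Omega_1\supset\overline{D\setminus D_1}$, will play an important role in the sequel. Note, for instance, that
$$\psi(T)=\frac{1}{2\pi i}\oint_{\Gamma_1}R(z)\,dz=P_1. \tag{3.5}$$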
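A special expression can be obtained for the Fréchet derivative of $\psi$ at $T$. Let us write
$$T=\lambda_1P_1+T_1, \qquad T_1=T(I-P_1), \tag{3.6}$$
where $T_1$ is Hermitian, and let $\sigma_1=\sigma(T)\setminus\{\lambda_1\}$. According to the spectral theorem, applied to the restriction of $T$ to $(I-P_1)\mathcal H$, there exists a resolution of the identity $E$ of $(I-P_1)\mathcal H$, supported by $\sigma_1$, such that
$$T_1=\int_{\sigma_1}\lambda\,dE(\lambda), \qquad E(\sigma_1)=I-P_1. \tag{3.7}$$
It should be noted that
$$P_1E(\Delta)=E(\Delta)P_1=O \quad\text{for all Borel sets }\Delta\subset\sigma_1, \tag{3.8}$$
where $O$ is the zero operator, and that
$$R(z)=\frac{1}{z-\lambda_1}P_1+\int_{\sigma_1}\frac{1}{z-\lambda}\,dE(\lambda), \qquad z\in\rho(T). \tag{3.9}$$
Let us define
$$Q_1=\int_{\sigma_1}\frac{1}{\lambda_1-\lambda}\,dE(\lambda). \tag{3.10}$$
Because $\lambda_1$ is isolated, $Q_1$ is a bounded Hermitian operator, and $Q_1P_1=P_1Q_1=O$ by (3.8).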
Lemma 3.1.
The Fréchet derivative of $\psi$ at $T$ is given by
$$\psi'_T\Pi=P_1\Pi Q_1+Q_1\Pi P_1, \qquad \Pi\in\mathcal L. \tag{3.11}$$
Proof.
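This follows by substitution of (3.9) in the expression on the right in
$$\psi'_T\Pi=\frac{1}{2\pi i}\oint_{\Gamma}\psi(z)R(z)\Pi R(z)\,dz=\frac{1}{2\pi i}\oint_{\Gamma_1}R(z)\Pi R(z)\,dz \tag{3.12}$$
for this derivative; see also (2.8). We thus obtain
$$\begin{aligned}\psi'_T\Pi={}&\Big\{\frac{1}{2\pi i}\oint_{\Gamma_1}\frac{dz}{(z-\lambda_1)^{2}}\Big\}P_1\Pi P_1+\int_{\sigma_1}\Big\{\frac{1}{2\pi i}\oint_{\Gamma_1}\frac{dz}{(z-\lambda_1)(z-\lambda)}\Big\}P_1\Pi\,dE(\lambda)\\&+\int_{\sigma_1}\Big\{\frac{1}{2\pi i}\oint_{\Gamma_1}\frac{dz}{(z-\lambda)(z-\lambda_1)}\Big\}dE(\lambda)\,\Pi P_1+\int_{\sigma_1}\!\int_{\sigma_1}\Big\{\frac{1}{2\pi i}\oint_{\Gamma_1}\frac{dz}{(z-\lambda)(z-\mu)}\Big\}dE(\lambda)\,\Pi\,dE(\mu).\end{aligned} \tag{3.13}$$
By Cauchy's integral formula
$$\frac{1}{2\pi i}\oint_{\Gamma_1}\frac{dz}{(z-\lambda_1)^{2}}=0, \tag{3.14}$$
so the first term on the right side of (3.13) is the zero operator. Regarding the second, note that
$$\frac{1}{2\pi i}\oint_{\Gamma_1}\frac{dz}{(z-\lambda_1)(z-\lambda)}=\frac{1}{\lambda_1-\lambda}, \qquad \lambda\in\sigma_1, \tag{3.15}$$
because each $\lambda\in\sigma_1$ lies outside the contour $\Gamma_1$. Consequently, the second term equals
$$P_1\Pi\int_{\sigma_1}\frac{1}{\lambda_1-\lambda}\,dE(\lambda)=P_1\Pi Q_1. \tag{3.16}$$
Similarly, the third term equals $Q_1\Pi P_1$. The last term vanishes, because
$$\frac{1}{2\pi i}\oint_{\Gamma_1}\frac{dz}{(z-\lambda)(z-\mu)}=0, \qquad \lambda,\mu\in\sigma_1, \tag{3.17}$$
since both $\lambda$ and $\mu$ lie outside $\Gamma_1$.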
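Lemma 3.1 can be tested in two dimensions. There, the eigenprojection of a symmetric matrix with eigenvalues $\mu_1>\mu_2$ has the closed form $P_1=(A-\mu_2I)/(\mu_1-\mu_2)$, and $Q_1$ reduces to $P_2/(\lambda_1-\lambda_2)$. The sketch below uses hypothetical matrices and pure Python. It compares $P_1\Pi Q_1+Q_1\Pi P_1$ with a central finite difference of the top eigenprojection of $T+t\Pi$ at $t=0$.

```python
import math


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def lincomb(A, B, c):
    # A + c * B
    return [[A[i][j] + c * B[i][j] for j in range(2)] for i in range(2)]


def top_eigen(A):
    # Largest eigenvalue mu_1 of a real symmetric 2x2 matrix A with distinct
    # eigenvalues, and its eigenprojection P_1 = (A - mu_2 I) / (mu_1 - mu_2).
    a, b, d = A[0][0], A[0][1], A[1][1]
    mean, rad = (a + d) / 2, math.hypot((a - d) / 2, b)
    mu1, mu2 = mean + rad, mean - rad
    P = [[(A[i][j] - (mu2 if i == j else 0.0)) / (mu1 - mu2) for j in range(2)]
         for i in range(2)]
    return mu1, P


T = [[2.0, 0.4], [0.4, 0.5]]       # hypothetical Hermitian T, eigenvalues 2.1 and 0.4
Pi = [[0.3, -0.1], [-0.1, 0.2]]    # hypothetical Hermitian perturbation direction

lam1, P1 = top_eigen(T)
lam2 = T[0][0] + T[1][1] - lam1                           # remaining eigenvalue (trace)
P2 = lincomb([[1.0, 0.0], [0.0, 1.0]], P1, -1.0)
Q1 = [[x / (lam1 - lam2) for x in row] for row in P2]     # Q_1 with one remaining eigenvalue

# Lemma 3.1: psi'_T Pi = P_1 Pi Q_1 + Q_1 Pi P_1
lemma = lincomb(matmul(matmul(P1, Pi), Q1), matmul(matmul(Q1, Pi), P1), 1.0)

# central difference of t -> eigenprojection of T + t Pi at t = 0
h = 1e-5
plus, minus = top_eigen(lincomb(T, Pi, h))[1], top_eigen(lincomb(T, Pi, -h))[1]
fd = [[(plus[i][j] - minus[i][j]) / (2 * h) for j in range(2)] for i in range(2)]

err = max(abs(lemma[i][j] - fd[i][j]) for i in range(2) for j in range(2))
print(err)
```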
Some results about the perturbation of $\lambda_1$ and $P_1$ in a given direction as in (1.3), which are well known in the literature [1, 2], can now be partly recovered for arbitrary perturbations in some neighborhood of the origin. They follow in an essentially self-contained manner as simple consequences of the results in Section 2.
Corollary 3.2.
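Under the assumptions (3.1), (3.2), and for $\|\Pi\|_{\mathcal L}$ sufficiently small, the operator $T+\Pi$ has an isolated eigenvalue $\hat\lambda_1$ with eigenprojection $\hat P_1=\hat p_1\otimes\hat p_1$ for some unit vector $\hat p_1\in\mathcal H$, satisfying
$$\hat P_1=P_1+P_1\Pi Q_1+Q_1\Pi P_1+O(\|\Pi\|_{\mathcal L}^{2}), \qquad\text{as }\|\Pi\|_{\mathcal L}\to0, \tag{3.18}$$
where $Q_1$ is defined in (3.10).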
Proof.
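In view of (3.5) and (3.11), application of (2.11) with $\varphi=\psi$ yields $\psi(T+\Pi)=P_1+P_1\Pi Q_1+Q_1\Pi P_1+O(\|\Pi\|_{\mathcal L}^{2})$. Clearly $\psi(T+\Pi)$ is Hermitian. Because $\psi^{2}=\psi$, we get $\psi(T+\Pi)^{2}=\psi(T+\Pi)$ by (2.6), so it is also idempotent and hence in fact some orthogonal projection $\hat P_1$. From $\|\hat P_1-P_1\|_{\mathcal L}=O(\|\Pi\|_{\mathcal L})$, for example, it follows that $\|\hat P_1-P_1\|_{\mathcal L}<1$ for all $\|\Pi\|_{\mathcal L}$ sufficiently small. Hence the range of $\hat P_1$ must also have dimension 1 [11], so that $\hat P_1=\hat p_1\otimes\hat p_1$ for some $\hat p_1\in\mathcal H$ with $\|\hat p_1\|=1$.
Next, let $\iota(z)=z$, $z\in\mathbb C$, be the identity function. By (2.6), again, on the one hand we have $(\iota\psi)(T+\Pi)=(T+\Pi)\hat P_1$, and on the other $(\psi\iota)(T+\Pi)=\hat P_1(T+\Pi)$. Hence $(T+\Pi)\hat p_1=\hat P_1(T+\Pi)\hat p_1=\langle(T+\Pi)\hat p_1,\hat p_1\rangle\hat p_1$, so $\hat p_1$ is an eigenvector of $T+\Pi$ with eigenvalue $\hat\lambda_1=\langle(T+\Pi)\hat p_1,\hat p_1\rangle$.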
Corollary 3.3.
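Under the assumptions of Corollary 3.2, and with the unit vector $\hat p_1$ chosen such that $\langle\hat p_1,p_1\rangle>0$, we have
$$\hat p_1=p_1+Q_1\Pi p_1+O(\|\Pi\|_{\mathcal L}^{2}), \qquad\text{as }\|\Pi\|_{\mathcal L}\to0. \tag{3.19}$$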
Proof.
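Let us first observe that $Q_1p_1=0$ because of (3.8). Hence (3.18) yields $\hat P_1p_1=c\,\hat p_1=p_1+Q_1\Pi p_1+O(\|\Pi\|_{\mathcal L}^{2})$, where $c=\langle p_1,\hat p_1\rangle$. We may therefore write $\hat p_1-p_1-Q_1\Pi p_1=\Delta_1+\Delta_2$ with
$$\Delta_1=\Big(\frac{1}{c}-1\Big)(p_1+Q_1\Pi p_1), \qquad \Delta_2=\frac{1}{c}\big(\hat P_1p_1-p_1-Q_1\Pi p_1\big). \tag{3.20}$$
It suffices to show that $\|\Delta_j\|=O(\|\Pi\|_{\mathcal L}^{2})$ for $j=1,2$. The idea of the proof can be found in Dauxois et al. [4].
Regarding $\Delta_1$, note that $c^{2}=\langle\hat P_1p_1,p_1\rangle=1+\langle\Pi p_1,Q_1p_1\rangle+O(\|\Pi\|_{\mathcal L}^{2})=1+O(\|\Pi\|_{\mathcal L}^{2})$, once more using (3.18). Here $c>0$ by the choice of $\hat p_1$, which is possible because $c\neq0$ for $\|\Pi\|_{\mathcal L}$ small. Hence $c\to1$ as $\|\Pi\|_{\mathcal L}\to0$, and therefore $c\ge1/2$ for $\|\Pi\|_{\mathcal L}$ sufficiently small. This entails
$$\Big|\frac{1}{c}-1\Big|=\frac{|1-c^{2}|}{c\,(1+c)}\le2\,|1-c^{2}|=O(\|\Pi\|_{\mathcal L}^{2}), \tag{3.21}$$
and hence $\|\Delta_1\|\le|1/c-1|\,(1+\|Q_1\|_{\mathcal L}\|\Pi\|_{\mathcal L})=O(\|\Pi\|_{\mathcal L}^{2})$. For $\Delta_2$, since $\hat P_1p_1-p_1-Q_1\Pi p_1=(\hat P_1-P_1-P_1\Pi Q_1-Q_1\Pi P_1)p_1$, we have
$$\|\Delta_2\|\le\frac{1}{c}\,\big\|\hat P_1-P_1-P_1\Pi Q_1-Q_1\Pi P_1\big\|_{\mathcal L}=O(\|\Pi\|_{\mathcal L}^{2}), \tag{3.22}$$
as can be seen from (3.18) and $c\ge1/2$.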
Corollary 3.4.
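Under the assumptions of Corollary 3.2, we have
$$\hat\lambda_1=\lambda_1+\langle\Pi p_1,p_1\rangle+O(\|\Pi\|_{\mathcal L}^{2}), \qquad\text{as }\|\Pi\|_{\mathcal L}\to0. \tag{3.23}$$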
Proof.
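With the help of (3.19), we see that $\hat\lambda_1=\langle(T+\Pi)\hat p_1,\hat p_1\rangle=\langle(T+\Pi)(p_1+Q_1\Pi p_1),\,p_1+Q_1\Pi p_1\rangle+O(\|\Pi\|_{\mathcal L}^{2})$. The result follows from a routine calculation combined with the equalities $\langle Tp_1,p_1\rangle=\lambda_1$, $\langle TQ_1\Pi p_1,p_1\rangle=0$, and $\langle Tp_1,Q_1\Pi p_1\rangle=0$. For the last two equalities we use that $T$ and $Q_1$ are Hermitian, that $Tp_1=\lambda_1p_1$, and that $Q_1p_1=0$ by (3.8).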
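Corollaries 3.3 and 3.4 can be illustrated in the same two dimensional setting. The sketch below uses hypothetical matrices. It perturbs $T$ by $t\Pi$ and normalizes $\hat P_1p_1$ to obtain $\hat p_1$ with $\langle\hat p_1,p_1\rangle>0$, as in the proof of Corollary 3.3. It then measures the remainders in (3.23) and (3.19). If both remainders are of order $\|\Pi\|_{\mathcal L}^{2}$, halving $t$ should divide each of them by roughly 4.

```python
import math


def lincomb(A, B, c):
    # A + c * B
    return [[A[i][j] + c * B[i][j] for j in range(2)] for i in range(2)]


def apply(A, v):
    return [A[0][0] * v[0] + A[0][1] * v[1], A[1][0] * v[0] + A[1][1] * v[1]]


def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]


def top_eigen(A):
    # largest eigenvalue mu_1 and eigenprojection (A - mu_2 I) / (mu_1 - mu_2)
    # of a real symmetric 2x2 matrix with distinct eigenvalues
    a, b, d = A[0][0], A[0][1], A[1][1]
    mean, rad = (a + d) / 2, math.hypot((a - d) / 2, b)
    mu1, mu2 = mean + rad, mean - rad
    P = [[(A[i][j] - (mu2 if i == j else 0.0)) / (mu1 - mu2) for j in range(2)]
         for i in range(2)]
    return mu1, P


T = [[2.0, 0.4], [0.4, 0.5]]       # hypothetical Hermitian T, eigenvalues 2.1 and 0.4
Pi = [[0.3, -0.1], [-0.1, 0.2]]    # hypothetical Hermitian perturbation direction

lam1, P1 = top_eigen(T)
lam2 = T[0][0] + T[1][1] - lam1
norm = math.hypot(P1[0][0], P1[1][0])
p1 = [P1[0][0] / norm, P1[1][0] / norm]                        # unit vector spanning range(P_1)
Q1 = [[x / (lam1 - lam2) for x in row]
      for row in lincomb([[1.0, 0.0], [0.0, 1.0]], P1, -1.0)]  # Q_1 = P_2 / (lambda_1 - lambda_2)
Pip = apply(Pi, p1)


def errors(t):
    lam_hat, P_hat = top_eigen(lincomb(T, Pi, t))
    v = apply(P_hat, p1)                 # equals <p1, p_hat> p_hat
    nv = math.sqrt(dot(v, v))
    p_hat = [x / nv for x in v]          # phase fixed so that <p_hat, p1> > 0
    e_val = abs(lam_hat - lam1 - t * dot(Pip, p1))                  # remainder in (3.23)
    Qp = apply(Q1, Pip)
    e_vec = math.hypot(p_hat[0] - p1[0] - t * Qp[0],
                       p_hat[1] - p1[1] - t * Qp[1])                # remainder in (3.19)
    return e_val, e_vec


(ev1, ee1), (ev2, ee2) = errors(0.1), errors(0.05)
print(ev1 / ev2, ee1 / ee2)   # both ratios should be close to 4 (second order remainders)
```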
Corollary 3.5.
Let $T\in\mathcal L_{\mathcal C}$ be given by (2.23) and satisfy (3.2). Then (3.18) and (3.19) remain true with
$$Q_1=\sum_{k=2}^{\infty}\frac{1}{\lambda_1-\lambda_k}\,P_k. \tag{3.24}$$
Proof.
All nonzero eigenvalues of $T$ are isolated, in particular $\lambda_1$. It is immediate from (2.23) that $T_1=T(I-P_1)=\sum_{k=2}^{\infty}\lambda_kP_k$, and this leads to the special expression for $Q_1$ in (3.24).
Remark 3.6.
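The assumption that $\Pi$ be Hermitian is in fact not necessary. Of course, if we only require $\Pi$ to be bounded, the perturbed operator $T+\Pi$ is in general no longer Hermitian. In particular, a suitably modified version of Corollary 3.3 will now claim the existence of a pair of eigenvectors, $\hat p_1$ for $T+\Pi$ and $\hat q_1$ for $(T+\Pi)^{*}$, with expansions
$$\hat p_1=p_1+Q_1\Pi p_1+O(\|\Pi\|_{\mathcal L}^{2}), \qquad \hat q_1=p_1+Q_1\Pi^{*}p_1+O(\|\Pi\|_{\mathcal L}^{2}), \tag{3.25}$$
as $\|\Pi\|_{\mathcal L}\to0$.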
4. Applications
In this section, we will sketch three applications: two
in statistics and one in numerical analysis.
4.1. Noisy Integral Equations
Let $K:L^{2}(0,1)\to L^{2}(0,1)$ be a compact injective integral operator, with measurable real kernel denoted by the same symbol without confusion. More specifically, input $f$ and output $g$ are related according to
$$g(x)=(Kf)(x)=\int_0^1K(x,y)f(y)\,dy, \qquad 0\le x\le1. \tag{4.1}$$
In practice, only finitely many data regarding the output are available, usually blurred by random measurement error. If the data are collected according to a random design, we may think of the data set as $n$ independent copies $(X_1,Y_1),\ldots,(X_n,Y_n)$ of a pair $(X,Y)$ of random variables, where
$$Y=g(X)+\varepsilon, \tag{4.2}$$
the design variable $X$ has a Uniform $(0,1)$ distribution, the error variable $\varepsilon$ has finite variance and zero mean, and $X$ and $\varepsilon$ are stochastically independent.
The purpose is to recover $f$ from these data. It is expedient to "precondition" with the adjoint operator $K^{*}$ and recover $f$ from the equation
$$p=K^{*}g=Tf, \qquad T=K^{*}K, \tag{4.3}$$
where $T$ is compact, Hermitian, and strictly positive. Under suitable conditions,
$$\hat p(y)=\frac{1}{n}\sum_{i=1}^{n}Y_iK(X_i,y), \qquad 0\le y\le1, \tag{4.4}$$
is an unbiased and $\sqrt n$-consistent estimator of $p$; see, for instance, van Rooij and Ruymgaart [12]. Since $T^{-1}$ is unbounded, an estimator of the input $f$ is obtained by applying a regularized inverse of $T$ to $\hat p$. Here we will use the Tikhonov type inverse
$$\varphi_\alpha(T)=(\alpha I+T)^{-1}, \tag{4.5}$$
where $\alpha>0$; see also (2.29). This yields the input estimator
$$\hat f_\alpha=(\alpha I+T)^{-1}\hat p. \tag{4.6}$$
To assess the quality of the estimator, one considers the mean integrated squared error (MISE)
$$\mathbb E\|\hat f_\alpha-f\|^{2}. \tag{4.7}$$
The behavior of the MISE is well studied in the literature.
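The unbiasedness of an estimator of $p=K^{*}g$ under the random design (4.2) can be illustrated by simulation. The sketch assumes that the estimator has the form $\hat p(y)=n^{-1}\sum_{i}Y_iK(X_i,y)$. This is unbiased because $\mathbb E\,Y K(X,y)=\int_0^1g(x)K(x,y)\,dx$ when $X$ is uniform on $(0,1)$ and $\varepsilon$ has mean zero and is independent of $X$. The kernel $K(x,y)=\min(x,y)$, the input $f\equiv1$, the Gaussian error with standard deviation $0.1$, the sample size, and the seed are all hypothetical choices made for illustration. For this choice, $g(x)=x-x^{2}/2$ and $p(y)=y/3-y^{3}/6+y^{4}/24$.

```python
import random


def kernel(x, y):
    # hypothetical kernel K(x, y) = min(x, y) on [0, 1]
    return min(x, y)


def g(x):
    # output g = K f for the hypothetical input f = 1
    return x - x * x / 2


def p(y):
    # p = K* g, computed by hand for this hypothetical example
    return y / 3 - y ** 3 / 6 + y ** 4 / 24


def p_hat(y, sample):
    # (1/n) sum_i Y_i K(X_i, y)
    return sum(yi * kernel(xi, y) for xi, yi in sample) / len(sample)


rng = random.Random(12345)
n = 200_000
sample = []
for _ in range(n):
    x = rng.random()                   # design variable X ~ Uniform(0, 1)
    eps = rng.gauss(0.0, 0.1)          # hypothetical Gaussian measurement error
    sample.append((x, g(x) + eps))     # Y = g(X) + eps, cf. (4.2)

y0 = 0.5
estimate, target = p_hat(y0, sample), p(y0)
print(estimate, target)                # target = 0.1484375
```

With $n=200{,}000$, the standard error of the estimate at $y_0=0.5$ is well below $10^{-3}$, so the two printed numbers should agree to about three decimals.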
Recently, there has been interest in certain econometric models where the operator $K$ (or $T=K^{*}K$) is unknown but can be estimated from the data. Let $\hat T$ denote an estimator of $T$, and assume that $\hat T$ is also compact, Hermitian, and nonnegative. In this case, the input estimator
$$\tilde f_\alpha=(\alpha I+\hat T)^{-1}\hat p \tag{4.8}$$
will be employed. One expects that estimation of $T$ will increase the MISE, and naturally the question arises how much bigger the MISE of $\tilde f_\alpha$ will be than that of $\hat f_\alpha$.
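An upper bound for this increase of the MISE can easily be found from the results in Section 2. For large sample size $n$, the estimator $\hat T$ will be close to $T=K^{*}K$, and $\Pi=\hat T-T$ can be considered as a small random perturbation of $T$. Writing $\varphi'_{\alpha,T}$ for the Fréchet derivative of $\varphi_\alpha$ at $T$, we see from Theorem 2.1 that
$$\tilde f_\alpha=\varphi_\alpha(T+\Pi)\hat p=\hat f_\alpha+(\varphi'_{\alpha,T}\Pi)\hat p+S_{\varphi_\alpha,T}(\Pi)\hat p. \tag{4.9}$$
Thus $(\varphi'_{\alpha,T}\Pi)\hat p$ is an extra error term due to the estimation of $T$. By (2.13), the last term is of smaller order.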
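To find an upper bound for its MISE, let us first observe that (2.30) simplifies for $T=K^{*}K$ and $\Pi=\hat T-T$ and yields
$$\varphi'_{\alpha,T}\Pi=-\sum_{j=1}^{\infty}\sum_{k=1}^{\infty}\frac{1}{(\alpha+\lambda_j)(\alpha+\lambda_k)}\,P_j(\hat T-T)P_k=-(\alpha I+T)^{-1}(\hat T-T)(\alpha I+T)^{-1}, \tag{4.10}$$
where now the $\lambda_k$ and the $P_k$ are the spectral characteristics of $T$. Let us write, for brevity,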
(4.11)and note that
(4.12)We thus arrive
at
(4.13)
Hence, under suitable assumptions, estimation of the
kernel yields an extra term in the MISE of the input estimator which is of
order
. In the Russian literature, sharper bounds can be
found; see in particular Bakushinsky and Kokurin [13, Section 2.2]. For
results of this type in the statistical literature, obtained in a different
manner, see, for instance, Hall and Horowitz [14] and Florens [15].
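A crude bound of this kind can be read off directly from the closed form $-(\alpha I+T)^{-1}(\hat T-T)(\alpha I+T)^{-1}$ of the Fréchet derivative of $\varphi_\alpha(z)=1/(\alpha+z)$. It uses only $\|(\alpha I+T)^{-1}\|_{\mathcal L}\le\alpha^{-1}$, which holds for nonnegative Hermitian $T$. Here $T=K^{*}K$, $\hat T$ is its estimator, $\hat p$ is the estimator (4.4), and $\hat f_\alpha=(\alpha I+T)^{-1}\hat p$. This sketch is not claimed to be the bound (4.13):

$$\big\|(\alpha I+T)^{-1}(\hat T-T)(\alpha I+T)^{-1}\hat p\big\|\le\frac{1}{\alpha}\,\|\hat T-T\|_{\mathcal L}\,\|\hat f_\alpha\|,$$

$$\mathbb E\,\big\|(\alpha I+T)^{-1}(\hat T-T)(\alpha I+T)^{-1}\hat p\big\|^{2}\le\frac{1}{\alpha^{2}}\,\big(\mathbb E\|\hat T-T\|_{\mathcal L}^{4}\big)^{1/2}\big(\mathbb E\|\hat f_\alpha\|^{4}\big)^{1/2},$$

where the second inequality is the Cauchy–Schwarz inequality applied to $\|\hat T-T\|_{\mathcal L}^{2}$ and $\|\hat f_\alpha\|^{2}$.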
4.2. Some Asymptotics for Functional Canonical Correlations
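Let $X$ be a real random element in the Hilbert space $\mathcal H$, and assume that $\mathbb E\|X\|^{4}<\infty$. Its mean $\mu$ and covariance operator $\Sigma$ are well defined by the relations $\langle\mu,f\rangle=\mathbb E\langle X,f\rangle$ and $\langle\Sigma f,g\rangle=\mathbb E\langle X-\mu,f\rangle\langle X-\mu,g\rangle$ for all $f,g\in\mathcal H$. The operator $\Sigma$ is known to be of finite trace and hence Hilbert-Schmidt and compact. It is also nonnegative Hermitian. Without real loss of generality, we will assume $\Sigma$ to be injective, so that it will be strictly positive.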
Next suppose that we are given a random sample $X_1,\ldots,X_n$ of independent copies of $X$. The usual estimators of $\mu$ and $\Sigma$ are $\bar X=n^{-1}\sum_{i=1}^{n}X_i$ and $\hat\Sigma=n^{-1}\sum_{i=1}^{n}(X_i-\bar X)\otimes(X_i-\bar X)$, respectively. Here $\hat\Sigma$ shares all the properties of $\Sigma$, except that it cannot be injective, because its range has dimension at most $n$.
Because $\hat\Sigma$ cannot be injective, the finite dimensional definition of sample canonical correlation has to be modified, and some kind of smoothing or regularization is recommended in the literature [16]. Regularization might even be useful when the population is considered, although $\Sigma$ is injective [17]. This regularization yields Tikhonov type inverses in an expression for the canonical correlation.
For a precise definition, let
and
be two closed
subspaces of
and
the orthogonal
projection onto
(
). Let us write
, and note that
for
. Similar notation will be used for
. The regularized squared principal canonical
correlation for the population is now
defined as
(4.14)Its sample analogue
is obtained by
replacing the
with
in (4.14). The
supremum is actually a maximum, and pairs of maximizers will be denoted by 
, and 
, respectively.
The corresponding canonical variates then are
(4.15)
For an alternative description of these canonical
correlations, let us introduce the
operator
(4.16)Interchanging the indices
and
yields
, and replacing
with
yields
and
. It can be seen that all these operators are
Hilbert-Schmidt and strictly positive Hermitian. It will be assumed that
has the largest
eigenvalue with one-dimensional eigenspace generated by
with
. Under this condition, it has been shown in Cupidon et al. [18] that
(4.17)for
. A similar result holds true for
.
It is well known that the asymptotic distribution of
the eigenvalues and eigenfunctions of a random operator can be derived from the
asymptotic distribution of this random operator itself (see [10] for
Euclidean spaces and [4] for Hilbert spaces). This technique is based on the results of Section 3. In the present situation, this means that we have to show the convergence in
distribution of the suitably standardized
. Because all operators are Hilbert-Schmidt, it can be
shown that
(4.18)
Result (4.18) follows easily if convergence in
distribution can be established for each of the factors defining
, for instance,
(4.19)where this time
, compare (2.29). It is known [4] that
(4.20)for some Gaussian random element
, where
is the Hilbert
space of all Hilbert-Schmidt operators mapping
into itself.
Writing
for the Fréchet
derivative evaluated at
(Section 2) and
exploiting the fact that the imbedding of the space of Hilbert-Schmidt operators into $\mathcal L$ is continuous,
we obtain via a kind of delta-method [18, 19]
(4.21)the desired result. A
combination of results like this for each of the factors of
yields (4.18).
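The linearization behind (4.21) rests on the Fréchet derivative of a regularized inverse square root. A finite dimensional sketch can check the divided-difference formula (2.26) for this kind of function against a central finite difference. It assumes the hypothetical choice $\varphi(z)=(\alpha+z)^{-1/2}$ with $\alpha=0.05$, and a $2\times2$ positive definite matrix in the role of the covariance operator. The matrices and the parameter are illustrative assumptions, and matrix functions are evaluated through the closed-form spectral decomposition of a symmetric $2\times2$ matrix.

```python
import math


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def lincomb(A, B, c):
    # A + c * B
    return [[A[i][j] + c * B[i][j] for j in range(2)] for i in range(2)]


def spectral(A):
    # eigenvalues mu_1 > mu_2 and eigenprojections of a real symmetric 2x2 matrix
    a, b, d = A[0][0], A[0][1], A[1][1]
    mean, rad = (a + d) / 2, math.hypot((a - d) / 2, b)
    mu1, mu2 = mean + rad, mean - rad
    eye = [[1.0, 0.0], [0.0, 1.0]]
    P1 = [[(A[i][j] - mu2 * eye[i][j]) / (mu1 - mu2) for j in range(2)] for i in range(2)]
    return [mu1, mu2], [P1, lincomb(eye, P1, -1.0)]


def matfun(f, A):
    # f(A) = f(mu_1) P_1 + f(mu_2) P_2
    (m1, m2), (P1, P2) = spectral(A)
    return lincomb([[f(m1) * x for x in row] for row in P1], P2, f(m2))


alpha = 0.05                          # hypothetical regularization parameter


def phi(x):
    return (alpha + x) ** -0.5        # regularized inverse square root


def dphi(x):
    return -0.5 * (alpha + x) ** -1.5


Sigma = [[1.0, 0.3], [0.3, 0.2]]        # hypothetical covariance, eigenvalues 1.1 and 0.1
Pi = [[0.02, -0.01], [-0.01, 0.03]]     # hypothetical small Hermitian deviation

lam, P = spectral(Sigma)
frechet = [[0.0, 0.0], [0.0, 0.0]]
for j in range(2):
    for k in range(2):
        c = dphi(lam[j]) if j == k else (phi(lam[k]) - phi(lam[j])) / (lam[k] - lam[j])
        frechet = lincomb(frechet, matmul(matmul(P[j], Pi), P[k]), c)

h = 1e-4
plus, minus = matfun(phi, lincomb(Sigma, Pi, h)), matfun(phi, lincomb(Sigma, Pi, -h))
fd = [[(plus[i][j] - minus[i][j]) / (2 * h) for j in range(2)] for i in range(2)]
err = max(abs(frechet[i][j] - fd[i][j]) for i in range(2) for j in range(2))
print(err)
```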
4.3. Solution of a Nonlinear Operator Equation
In Bakushinsky and Kokurin [13], the following
problem is considered. Let
and
be Hilbert
spaces and
an operator,
not necessarily linear. The (nonlinear) equation
(4.22)is studied. Let
be a solution
of (4.22) and introduce a set
, for some
. It is assumed that
is Fréchet
differentiable on
. If
is the
derivative at
it is,
moreover, assumed that
(4.23)where
is a given
number. Given an initial point
and a sequence 
, of regularization parameters, these authors show
that, under some further conditions, the generalized Gauss-Newton method
generates a sequence of points
such
that
(4.24)
In their proof of this result, the authors need a
crucial upper bound. Under some additional assumptions, we want to derive this
upper bound as an immediate consequence of Theorem 2.1. In order to relate the
present problem to the setup of our paper, let us assume that
, and note that
(4.25)For
, let
(4.26)and set
(4.27)where obviously
. It is not hard to see that (4.23)
entails
(4.28)for some
. Let
be the contour
in (2.1) and
the
corresponding domain. As in Bakushinsky and Kokurin [13], a function 
, is employed in the iteration scheme, which is
analytic on
.
Narrowing down the generality in Bakushinsky and
Kokurin [13] somewhat further, so that the current conditions are satisfied,
their proof of the convergence of the iterations requires an upper bound for
the expression (in our notation)
(4.29)Keeping
fixed, let us
briefly write this last expression as
. Now Theorem 2.1 applies with
, and application yields at once
(4.30)for some
, by (4.28).
Acknowledgments
The authors are grateful
to the referee for some useful comments. For this research, D. S. Gilliam was
supported by AFOSR Grant no. FA9550-04-1027 and F. H. Ruymgaart by NSF Grant no. DMS-0605167.
References
- T. Kato, Perturbation Theory for Linear Operators, Springer, Berlin, Germany, 1966.
- F. Rellich, Perturbation Theory of Eigenvalue Problems, Gordon and Breach, New York, NY, USA, 1969.
- F. Chatelin, Spectral Approximation of Linear Operators, Computer Science and Applied Mathematics, Academic Press, New York, NY, USA, 1983.
- J. Dauxois, A. Pousse, and Y. Romain, “Asymptotic theory for the principal component analysis of a vector random function: some applications to statistical inference,” Journal of Multivariate Analysis, vol. 12, no. 1, pp. 136–154, 1982.
- R. Bhatia, Positive Definite Matrices, Princeton Series in Applied Mathematics, Princeton University Press, Princeton, NJ, USA, 2007.
- F. Ruymgaart and S. Yang, “Some applications of Watson's perturbation approach to random matrices,” Journal of Multivariate Analysis, vol. 60, no. 1, pp. 48–60, 1997.
- P. D. Lax, Functional Analysis, Pure and Applied Mathematics, John Wiley & Sons, New York, NY, USA, 2002.
- N. Dunford and J. T. Schwartz, Linear Operators. Part I: General Theory, Wiley Classics Library, Wiley-Interscience, New York, NY, USA, 1988.
- L. Debnath and P. Mikusiński, Introduction to Hilbert Spaces with Applications, Academic Press, San Diego, Calif, USA, 2nd edition, 1999.
- G. S. Watson, Statistics on Spheres, vol. 6 of University of Arkansas Lecture Notes in the Mathematical Sciences, John Wiley & Sons, New York, NY, USA, 1983.
- F. Riesz and B. Sz.-Nagy, Functional Analysis, Dover Books on Advanced Mathematics, Dover, New York, NY, USA, 1990.
- A. C. M. van Rooij and F. Ruymgaart, “Asymptotic minimax rates for abstract linear estimators,” Journal of Statistical Planning and Inference, vol. 53, no. 3, pp. 389–402, 1996.
- A. B. Bakushinsky and M. Yu. Kokurin, Iterative Methods for Approximate Solution of Inverse Problems, vol. 577 of Mathematics and Its Applications, Springer, Dordrecht, The Netherlands, 2004.
- P. Hall and J. L. Horowitz, “Nonparametric methods for inference in the presence of instrumental variables,” The Annals of Statistics, vol. 33, no. 6, pp. 2904–2929, 2005.
- J.-P. Florens, “Inverse problems and structural econometrics: the example of instrumental variables,” in Advances in Economics and Econometrics: Theory and Applications, M. Dewatripont, L. P. Hansen, and S. J. Turnovsky, Eds., vol. 2, pp. 284–311, Cambridge University Press, Cambridge, UK, 2003.
- S. E. Leurgans, R. A. Moyeed, and B. W. Silverman, “Canonical correlation analysis when the data are curves,” Journal of the Royal Statistical Society. Series B, vol. 55, no. 3, pp. 725–740, 1993.
- J. Cupidon, R. Eubank, D. S. Gilliam, and F. Ruymgaart, “Some properties of canonical correlations and variates in infinite dimensions,” Journal of Multivariate Analysis, vol. 99, no. 6, pp. 1083–1104, 2008.
- J. Cupidon, D. S. Gilliam, R. Eubank, and F. Ruymgaart, “The delta method for analytic functions of random operators with application to functional data,” Bernoulli, vol. 13, no. 4, pp. 1179–1194, 2007.
- A. W. van der Vaart, Asymptotic Statistics, Cambridge University Press, Cambridge, UK, 1998.