Abstract

This paper reviews recent research on dependent functional data. After providing an introduction to functional data analysis, we focus on two types of dependent functional data structures: time series of curves and spatially distributed curves. We review statistical models, inferential methodology, and possible extensions. The paper is intended to provide a concise introduction to the subject with plentiful references.

1. Introduction

Functional data analysis (FDA) is a relatively new branch of statistics, going back to the early 1990s, but its mathematical foundations are rooted in much earlier developments in the theory of operators in a Hilbert space and in functional analysis. In the most basic setting, the sample consists of curves X_1(t), X_2(t), …, X_N(t), t ∈ T. The set T is typically an interval of the line. In increasingly many applications, it is however a subset of the plane, or a sphere, or even a 3D region. In those cases, the data are surfaces over a region, or more general functions over some domain, hence the term functional data. This survey is concerned mostly with the analysis of curves, but some references to more general functions are given in Section 1.1.

Functional data are high-dimensional data, as, in a statistical model, each function consists of infinitely many values X(t). In traditional statistics, the data consist of a sample of scalars or vectors. For example, for each survey participant, we may record age, gender, income, and education level. The data point thus has dimension four; it is a vector with quantitative and categorical entries. High-dimensional data typically have dimension comparable to or larger than the sample size. As they are often analyzed using regression models in which the sample size is denoted by n and the number of explanatory variables by p, high-dimensional data often fall into the “large p, small n” paradigm, but clearly they form a much broader class, with a great deal of work focusing on covariance matrices based on a sample of p-dimensional vectors. A distinctive feature of functional data is that the curves or surfaces are assumed to be smooth in some sense; if t is close to s, the values X(t) and X(s) should be similar. In the “large p, small n” paradigm, there need not be any natural ordering of the covariates or any natural measure of distance between them. The analysis often focuses on the selection of a small number of relevant covariates (the variable selection problem). In FDA, the analysis involves obtaining a smooth, low-dimensional representation of each curve.

FDA views each curve in a sample as a separate statistical object. In this sense, FDA is part of object data analysis, in which data points are not scalars or vectors, but structures which are modeled by complex mathematical objects, for example, by graphs. Some references are given in Section 1.1.

However, even curves are far more complicated structures than scalars or vectors. The curves are characterized not only by magnitude but also by shape. The shape of a random curve plays a role analogous to the dependence between the coordinates of a random vector. Human growth curves provide a well-known example. Suppose that there are N randomly selected subjects of the same gender. Let H_n(t) be the height of the nth subject measured at time t from birth. The measurement times are different for different subjects. Using methods of FDA, we can construct continuous and differentiable curves H_n(t). The shapes and magnitudes of these curves give us an idea about the variability in the process of growth, rather than just about the variability of the final height, which can be assessed using the scalars representing the final heights.

Some data can be very naturally viewed as curves. For example, if the height measurements are available at a fairly regular and sufficiently dense grid of times t_j, it is easy to visualize them as curves, even though it is not immediately obvious how to compute derivatives of such curves. In many situations, the points t_j are extremely dense. For example, physical instruments may return an observation every five seconds, so in a day, we will have 17,280 values. A day is a natural time domain in many applications, and the problem is to replace the 17,280 values available in a given day by a smaller, more manageable set of numbers. This is generally possible due to the assumption of some smoothness. At the other extreme are sparse longitudinal data. Such data often arise in medical research. For example, a measurement can be made on a patient only several times during the course of treatment. Yet we know that the quantity that is measured exists at any time, so it is a curve that is observed only at a few sparse time points. References to the relevant functional methodology are given in Section 1.1.

Growth curves or sparse observations on a sample of patients can be viewed as independent curves drawn from a population of interest. A large body of research in FDA has been motivated by various problems arising in such a setting. At the same time, many functional data sets, most notably in physical and environmental sciences, arise from long records of observations. An example is presented in Figure 1, which shows seven consecutive functional observations (curves). These curves show a very rough periodic pattern, but modeling periodicity is difficult, as this pattern is, in fact, severely disturbed several times a month due to ionospheric storms. The 24 h period must however enter into any statistical model, as it is caused by the rotation of the Earth. It is thus natural in this context to treat the long continuous record as consisting of consecutive curves, each defined over a 24 h time interval. Space physics researchers have long been associating the occurrence of enhancements on a given day with physical phenomena in near-Earth space. This gives additional support to treating these data as a time series of curves of evolving shape, which we will call a functional time series. Similar functional series arise, for example, in urban pollution monitoring studies.

A functional time series does not however need to arise from cutting a continuous time record into adjacent pieces of natural length. Figure 2 shows a functional time series of curves representing (centered) Eurodollar futures prices. For each day, the argument is not time within that day, but the time to the expiration of a contract; more details are given in Chapter 14 of Horváth and Kokoszka [1]. The curves show how the prices of contracts with various expiration horizons evolve from day to day.

Functional time series focus on temporal or otherwise sequential dependence between curves. In many applications, the curves are available at points in space. Such spatially indexed functional data will be denoted X(s_k; t), where the s_k are locations in some region and X(s_k; t) is the value of the function at location s_k and time t. To compare such data structures to functional time series, consider two ways of looking at pollution data. If we consider only one location and are interested in the day to day pattern of pollution, we may consider curves X_n(t), where n is the day index and t is the time within the day. In many studies, we are however interested in long-term trends in pollution over a region, for example, a large city. In such a case, we may wish to smooth out the day to day variability and consider the curves X(s_k; t), where s_k is the location of a measurement station within the city and t is time over a period of a few decades. Such a data structure is difficult to show in one graph. The spatial locations can be shown on a map. The curves must be shown on a separate graph, as in Figure 3. As for the functional time series, the main feature of such data is that the characteristics of the curves depend on the distance between them.

Of course, one can consider a more complex structure in which one may study spatially indexed functional time series, with the data being X_n(s_k; t); for a fixed location s_k, we have a functional time series, and for a fixed time index n, we have a functional spatial random field. We briefly discuss such a setting in Section 5.

The focus of this paper is on the recent developments in the research on dependent functional data. Section 3 considers functional time series, and Section 4 spatially indexed functions. The required background is provided in Section 2. A summary, which includes the discussion of some problems of current interest, is given in Section 5.

1.1. Further Reading

In this section, we discuss several important topics mentioned above, mostly by directing the reader to appropriate references.

1.1.1. Monographs on FDA

The two editions of the book of Ramsay and Silverman [2] have done a lot to introduce FDA to the statistics community and beyond. The monograph introduces most of the fundamental ideas of FDA including penalized smoothing, data registration, functional linear regression, functional principal components and functional analysis of derivatives. Ramsay and Silverman [3] elaborate on many data examples in Ramsay and Silverman [2] by providing detailed case studies, while Ramsay et al. [4] focus on the computational aspects of the analyses introduced in Ramsay and Silverman [2]. These books are concerned with iid samples of curves.

The book of Ferraty and Vieu [5] covers nonparametric methods with special emphasis on nonparametric prediction and classification. It has several chapters on dependent curves (α-mixing). It studies both practical and abstract aspects of the problems. The collections Ferraty and Romain [6] and Ferraty [7] contain contributions from prominent researchers illustrating the increasingly many facets of FDA. Ferraty and Romain [6] has a smaller number of detailed papers, while Ferraty [7] has a large number of short communications. Shi and Choi [8] consider Gaussian regression models.

The monograph of Bosq [9] is highly recommended to anyone seeking a fast and rigorous introduction to the theory of functional time series. It contains all practically necessary facts from the theory of probability in Hilbert and Banach spaces, and a great deal of relevant theory of linear functional time series. Bosq and Blanke [10] elaborate on some aspects of this theory in an abstract way.

To fully understand some theoretical aspects of FDA, a good background in the theory of Hilbert spaces is needed. There are many introductory textbooks; my personal favorites are Riesz and Nagy [11], Akhiezer and Glazman [12], and Debnath and Mikusinski [13]. There are fewer books on probability in Hilbert and Banach spaces; I have benefited from the monographs of Araujo and Giné [14], Linde [15], and Vakhaniia et al. [16].

The monograph of Horváth and Kokoszka [1] elaborates on many topics mentioned in this paper, especially those related to the functional principal components.

1.1.2. Basis Expansions

In this paper, we focus on procedures for densely observed curves. For such data, functional objects needed to perform the calculations can be created using the R package fda. We will not address the details of the calculations, but note that all necessary background is given in Ramsay et al. [4]. In the following, we often refer to the choice of a basis and of the number of basis functions. We now briefly discuss this point.

Functions are observed at discrete sampling values t_j, j = 1, 2, …, J, which may or may not be equally spaced. We work with a sample of N such functions. These data are converted to functional objects. In order to do this, we need to specify a basis. A basis is a system of basis functions, a linear combination of which defines the functional objects. The elements of a basis may or may not be orthogonal. We express a functional observation as X_n(t) ≈ Σ_{k=1}^K c_{nk} B_k(t), (1.1) where the B_k are the basis functions. One of the advantages of this approach is that instead of storing all the data points, one stores the coefficients of the expansion, that is, the c_{nk}. This step thus involves an initial dimension reduction and some smoothing. All subsequent computations are performed on the matrices built from the coefficients c_{nk}. The number K of basis functions impacts the performance of some procedures, but others are fairly insensitive to its choice. For the data studied in this paper, we generally choose K so that the plotted functional objects resemble the original data with some smoothing that eliminates the most obvious noise. The two most commonly used basis systems are the Fourier and the B-spline bases. The Fourier basis is usually used for periodic, or nearly periodic, functions with no strong local features and a roughly constant curvature. It is inappropriate for data with discontinuities in the function itself or in its low-order derivatives. The B-spline basis is typically used for nonperiodic, locally smooth data. The B-spline basis functions are not orthogonal.
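
To illustrate the conversion of raw observations to functional objects, the following R sketch creates a Fourier basis and smooths a matrix of densely observed daily curves with the fda package. The grid size, the number of curves, and the random placeholder data are assumptions made only for the example.

library(fda)

t.grid <- seq(0, 1, length.out = 288)          # 5-minute sampling over one day, rescaled to [0, 1]
N      <- 30                                   # number of curves (hypothetical)
Y      <- matrix(rnorm(288 * N), 288, N)       # placeholder for raw observations, one curve per column

basis  <- create.fourier.basis(rangeval = c(0, 1), nbasis = 49)
fd.obj <- smooth.basis(argvals = t.grid, y = Y, fdParobj = basis)$fd

dim(fd.obj$coefs)   # K x N matrix of coefficients c_nk; all later computations use this matrix
plot(fd.obj)        # visual check that the smoothed curves resemble the raw data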

1.1.3. Sparsely Observed Functions

As noted above, in some applications, the data are available only at a few sparsely distributed points, which may be different for different curves, and the data may exhibit nonnegligible measurement errors. Yao et al. [17] introduce methodology to deal with such data; smoothing each curve separately with a basis expansion is at best inappropriate, and often not feasible. The essence of the alternative approach is explained in Müller [18]. While there are various elaborations, see for example, Hall et al. [19], the idea is that smoothing must be applied to all observations collectively, not to individual trajectories (which basically do not exist for sparse data). The mean function is first obtained by smoothing all the observations pooled together. Then the functional principal components are calculated as the eigenfunctions of a covariance kernel obtained by surface smoothing. Extensions of this technique have been applied in many settings, see for example, Müller and Stadtmüller [20], Müller et al. [21], Yao and Müller [22], and Di and Crainiceanu [23].

1.1.4. Software

All procedures described in this paper can be implemented in readily available statistical software. Ramsay et al. [4] provide a solid introduction to computational issues for functional data, as well as numerous examples. Their book describes the R package fda and analogous Matlab code. Clarkson et al. [24] describe the implementation in S+. Many specialized R packages now exist. Techniques for sparse data are implemented in the Matlab package PACE developed at the University of California at Davis, available, at the time of writing, at http://anson.ucdavis.edu/~mueller/data/software.html.

1.1.5. Extensions beyond Curves

Various brain scans can be viewed as functions over a spatial domain. Analysis of such data is discussed in several papers, see for example, Reiss and Ogden [25] and Zipunnikov et al. [26]. Guillas and Lai [27] consider functional regression models with surfaces as regressors, and Morris et al. [28] propose a Bayesian framework for the analysis of images.

The study of the brain also provides motivation to study complex statistical objects. Aydin et al. [29] model blood vessels in the brain as trees. Kenobi et al. [30] and Crane and Patrangenaru [31] provide references to data analysis on manifolds.

2. A Modeling Framework for Functional Data Analysis

This paper focuses on inferential procedures for functional data. To derive them, a statistical model for the data is needed. In particular, the curves must be viewed as elements of some space. Several modeling frameworks have been proposed, including semimetric spaces, see Ferraty and Vieu [5], Sobolev spaces, see for example, Li and Hsing [32], and more general Besov spaces, see for example, Abramovich and Angelini [33] and Pensky and Sapatinas [34]. Sobolev spaces are Hilbert spaces in which derivatives enter into the definition of the inner product. They are used to study procedures which involve smoothing by penalizing functions with large integrated derivatives of higher order (typically order two). Besov spaces have been used to study wavelet expansions of functions; they are typically Banach spaces. In this paper, we consider only the simplest framework in which the curves are assumed to belong to the space L² of square integrable functions on an interval, and, to be more specific, we assume that this interval is [0, 1]. All results we state in the following that involve inner products hold in an arbitrary separable Hilbert space. However, when we use integrals and functions with an argument like t or s, the formulas assume that the Hilbert space is L² = L²([0, 1]). This space is sufficient to describe procedures based on functional means and variances; most inferential procedures for scalar data use only means and variances. The corresponding functional framework is introduced in Section 2.1. In Section 2.2, we define the functional principal components which provide a dimension reduction beyond that obtained via the basis expansion (1.1); the number K of basis functions is typically around one hundred, whereas in our applications the number of the functional principal components needed to effectively approximate the data is a single-digit number, very often 2, 3, or 4.

In Sections 2.1 and 2.2, we define the required first- and second-order characteristics (functional parameters) of a random function X. We cannot start with definitions based on a random sample because, for dependent data, sample statistics often need to be defined in a different way to ensure that they estimate the corresponding population (theoretical) parameters.

2.1. The Hilbert Space Model

The space L² = L²([0, 1]) is the set of measurable real-valued functions x defined on [0, 1] satisfying ∫₀¹ x²(t) dt < ∞. The space L² is a separable Hilbert space with the inner product ⟨x, y⟩ = ∫ x(t) y(t) dt. An integral sign without the limits of integration is meant to denote the integral over the whole interval [0, 1].

We view a random curve X = {X(t), t ∈ [0, 1]} as a random element of L² equipped with the Borel σ-algebra. We say that X is integrable if E‖X‖ = E[∫ X²(t) dt]^{1/2} < ∞. If X is integrable, there is a unique function μ ∈ L² such that E⟨y, X⟩ = ⟨y, μ⟩ for any y ∈ L². It follows that μ(t) = E[X(t)] for almost all t ∈ [0, 1]. The expectation commutes with bounded operators, that is, if Ψ is a bounded operator on L² and X is integrable, then EΨ(X) = Ψ(EX).

If X is square integrable, that is, E‖X‖² = E∫ X²(t) dt < ∞, and EX = 0, the covariance operator of X is defined by C(y) = E[⟨X, y⟩ X], y ∈ L². (2.3) It is easy to see that C is an integral operator with kernel c(t, s) = E[X(t) X(s)], that is, C(y)(t) = ∫ c(t, s) y(s) ds.

One can show that a bounded operator is a covariance operator of some square integrable random function taking values in L² if and only if it is symmetric, positive-definite, and its eigenvalues λ_j satisfy Σ_j λ_j < ∞. To understand these statements, some background on operators in a Hilbert space is needed. The facts we now state will be used in the remainder of the paper.

2.1.1. Operators in a Hilbert Space

Consider an arbitrary separable Hilbert space H with inner product ⟨·, ·⟩ which generates the norm ‖·‖, and denote by L the space of bounded (continuous) linear operators on H with the norm ‖Ψ‖_L = sup{‖Ψ(x)‖ : ‖x‖ ≤ 1}. An operator Ψ ∈ L is said to be compact if there exist two orthonormal bases {v_j} and {f_j}, and a real sequence {λ_j} converging to zero, such that Ψ(x) = Σ_{j=1}^∞ λ_j ⟨x, v_j⟩ f_j, x ∈ H. The λ_j may be assumed positive because one can replace f_j by −f_j, if needed. A compact operator is said to be a Hilbert-Schmidt operator if Σ_{j=1}^∞ λ_j² < ∞. The space S of Hilbert-Schmidt operators is a separable Hilbert space with the scalar product ⟨Ψ₁, Ψ₂⟩_S = Σ_{i=1}^∞ ⟨Ψ₁(e_i), Ψ₂(e_i)⟩, (2.7) where {e_i} is an arbitrary orthonormal basis; the value of (2.7) does not depend on it. One can show that ‖Ψ‖²_S = Σ_j λ_j² and ‖Ψ‖_L ≤ ‖Ψ‖_S.

An operator Ψ ∈ L is said to be symmetric if ⟨Ψ(x), y⟩ = ⟨x, Ψ(y)⟩ for all x, y ∈ H, and positive-definite if ⟨Ψ(x), x⟩ ≥ 0 for all x ∈ H. (An operator with the last property is sometimes called positive semidefinite, and the term positive-definite is then reserved for the case ⟨Ψ(x), x⟩ > 0 for x ≠ 0.)

A symmetric positive-definite Hilbert-Schmidt operator Ψ admits the decomposition Ψ(x) = Σ_{j=1}^∞ λ_j ⟨x, v_j⟩ v_j, x ∈ H, (2.10) with orthonormal v_j which are the eigenfunctions of Ψ, that is, Ψ(v_j) = λ_j v_j. The v_j can be extended to a basis by adding a complete orthonormal system in the orthogonal complement of the subspace spanned by the original v_j. The v_j in (2.10) can thus be assumed to form a basis, but some λ_j may be zero.

An important class of operators in L² are the integral operators defined by Ψ(x)(t) = ∫ ψ(t, s) x(s) ds, with a real kernel ψ(t, s). Such operators are Hilbert-Schmidt if and only if ∫∫ ψ²(t, s) dt ds < ∞, in which case ‖Ψ‖²_S = ∫∫ ψ²(t, s) dt ds.

If ψ(t, s) = ψ(s, t) and ∫∫ ψ(t, s) x(t) x(s) dt ds ≥ 0 for every x ∈ L², the integral operator Ψ is symmetric and positive-definite, and it follows from (2.10) that ψ(t, s) = Σ_{j=1}^∞ λ_j v_j(t) v_j(s).

2.2. Functional Principal Components

Suppose X is a square integrable random function in L². To lighten the notation, assume that EX = 0. It is clear how to modify all formulas by subtracting the mean function μ from X. The eigenfunctions of the covariance operator C defined by (2.3) are called the Functional Principal Components (FPCs). The FPCs v_j are thus defined by C(v_j) = λ_j v_j. The v_j are orthogonal, and we assume that they are normalized to unit norm. The v_j are defined only up to a sign.

The random function X admits the Karhunen-Loève expansion X(t) = Σ_{j=1}^∞ ξ_j v_j(t), and the covariance kernel admits the expansion c(t, s) = Σ_{j=1}^∞ λ_j v_j(t) v_j(s). The random variables ξ_j = ⟨X, v_j⟩ are called the scores of X.

An interpretation of the λ_j is often based on the following decomposition of variance: E‖X‖² = Σ_{j=1}^∞ λ_j, with λ_j = E⟨X, v_j⟩² = E ξ_j². Another interpretation is the following. Fix p and consider the expected squared error E‖X − Σ_{j=1}^p a_j u_j‖², where the u_j are orthonormal functions in L², and the a_j are random scalar coefficients. The above error is minimized if a_j = ⟨X, u_j⟩ and u_j = v_j.
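
On a dense grid, the population quantities above are replaced by their sample counterparts, and the empirical FPCs can be computed by an eigendecomposition of the sample covariance kernel. The following R sketch shows this; the grid size, sample size, and the data matrix X are hypothetical placeholders, and the rescalings by dt are Riemann-sum approximations converting matrix eigenelements into eigenelements of the integral operator.

T.pts  <- 101; N <- 50
t.grid <- seq(0, 1, length.out = T.pts)
X      <- matrix(rnorm(T.pts * N), T.pts, N)   # placeholder: columns are curves on the grid

Xc     <- X - rowMeans(X)                      # center: subtract the sample mean function
c.hat  <- tcrossprod(Xc) / N                   # sample covariance kernel c.hat(t, s) on the grid
eig    <- eigen(c.hat, symmetric = TRUE)

dt     <- 1 / (T.pts - 1)
lambda <- eig$values * dt                      # eigenvalues of the integral operator
v      <- eig$vectors / sqrt(dt)               # FPCs, normalized so that int v_j(t)^2 dt = 1
scores <- t(Xc) %*% v * dt                     # scores xi_j = <X_n - mean, v_j>

cumsum(lambda) / sum(lambda)                   # proportion of variance explained by the first p FPCs

With functional objects created by the fda package, a similar decomposition is returned by the function pca.fd.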

3. Functional Time Series

Functional time series (FTSs) were introduced in Section 1, where several examples were given. Using the framework of Section 2.1, we will understand by an FTS a sequence of curves X_n(t), t ∈ [0, 1], n = 1, 2, …, N. Notice that n is the time index, which typically refers to day or year, and t is the time within that unit. In most textbooks and papers on time series analysis, the index t denotes time, but in FDA it denotes the argument of a function, so we use n to denote time and N to denote the number of available curves; N corresponds to the length of a time series.

The main idea behind functional time series modeling is that in many situations the time record can be split into natural intervals, and instead of modeling periodicity, we treat the curve in each interval as a whole observational unit. There are proponents and opponents of this approach. In several applications, it has been shown that the functional approach yields superior results, see for example, Antoniadis et al. [35]; a few examples are also given in Bosq [9]. But it is clear that in many cases the more standard time series techniques are competitive, if not superior. Functional techniques can be expected to be definitely more useful if the daily or annual curves are not defined on a grid of equidistant time points. An interesting example of such a setting is studied by Liebl [36], who considers curves defined on different intervals on different days. The daily domain interval is defined by the minimum, m_n, and maximum, M_n, of the electricity demand on day n. For a demand value d in this interval, X_n(d) is the price of electricity.

Most methods described in this section are based on the FPCs. In certain applications, different orthonormal systems, similar in spirit to the FPCs but better suited to a specific application, can be used. In addition to the predictive factors of Kargin and Onatski [37], we note the factors introduced by Bathia et al. [38], who considered the problem of estimating the dimension assuming that the functional time series takes values in a finite-dimensional subspace of L².

The remainder of this section is organized as follows. We first review the autoregressive functional model which has been, by far, most extensively used and studied. We then turn to more general ways of describing temporal dependence between functions. We conclude with a discussion of some recent developments.

A reader seeking a good reference to the fundamental concepts of time series analysis is referred to Brockwell and Davis [39] or Shumway and Stoffer [40]. More advanced treatments of the theory of linear time series are given in Brockwell and Davis [41] and Anderson [42].

3.1. Functional Autoregressive Process

The theory of autoregressive and more general linear processes in Hilbert and Banach spaces is developed in the monograph of Bosq [9], so here we focus only on some recent research. A sequence {X_n} of mean zero elements of L² follows a functional AR(1) model if X_n = Ψ(X_{n−1}) + ε_n, (3.1) where Ψ ∈ L and {ε_n} is a sequence of iid mean zero errors in L² satisfying E‖ε_n‖² < ∞.

3.1.1. Estimation

A popular approach to the estimation of the autoregressive operator Ψ involves the FPCs, see Section 2.2. Observe first that E[⟨X_n, x⟩ X_{n+1}] = E[⟨X_n, x⟩ Ψ(X_n)] = Ψ(E[⟨X_n, x⟩ X_n]) = Ψ(C(x)). Define the lag-1 autocovariance operator by C₁(x) = E[⟨X_n, x⟩ X_{n+1}], and denote with a superscript T the adjoint operator. Then Ψ = C₁ C^{−1} (equivalently, Ψ^T = C^{−1} C₁^T), because, by a direct verification, C₁ = Ψ C. We would like to obtain an estimate of Ψ by using a finite sample version of the relation Ψ = C₁ C^{−1}. The operator C does not however have a bounded inverse on the whole of L². To see it, recall that C admits representation (2.10), which implies that C^{−1}(x) = Σ_j λ_j^{−1} ⟨x, v_j⟩ v_j, where the λ_j tend to zero. This makes it difficult to estimate the bounded operator Ψ using the relation Ψ = C₁ C^{−1}. A practical solution is to use only the first p most important EFPCs v̂_j, and to define Ĉ_p^{−1}(x) = Σ_{j=1}^p λ̂_j^{−1} ⟨x, v̂_j⟩ v̂_j. (3.6) The operator Ĉ_p^{−1} is defined on the whole of L², and it is bounded if λ̂_j > 0 for j ≤ p. By judiciously choosing p, we hope to find a balance between retaining the relevant information in the sample, and the danger of working with the reciprocals of small eigenvalues λ̂_j. In formula (3.6), the v̂_j are the empirical (or sample) FPCs and the λ̂_j are their eigenvalues, both defined by the relation Ĉ(v̂_j) = λ̂_j v̂_j, where Ĉ(x) = N^{−1} Σ_{n=1}^N ⟨X_n, x⟩ X_n (3.7) defines the empirical covariance operator. Direct calculations, see Chapter 13 of Horváth and Kokoszka [1], lead to the estimator Ψ̂_p(x) = Σ_{j=1}^p Σ_{k=1}^p λ̂_j^{−1} ⟨x, v̂_j⟩ ⟨Ĉ₁(v̂_j), v̂_k⟩ v̂_k, (3.8) where Ĉ₁(x) = N^{−1} Σ_{n=1}^{N−1} ⟨X_n, x⟩ X_{n+1} is the empirical lag-1 autocovariance operator. If the operator Ψ is a kernel operator defined by Ψ(x)(t) = ∫ ψ(t, s) x(s) ds, then the estimator (3.8) is a kernel operator with the kernel ψ̂_p(t, s) = Σ_{j=1}^p Σ_{k=1}^p λ̂_j^{−1} ⟨Ĉ₁(v̂_j), v̂_k⟩ v̂_j(s) v̂_k(t). (3.9)
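
The following R sketch, which continues the grid-based FPC sketch of Section 2.2 (it assumes the objects Xc, lambda, v, scores, N, and dt from that example), implements the estimator ψ̂_p(t, s) above; the choice p = 3 and the use of simple Riemann sums are assumptions made only for the illustration.

p  <- 3
# lag-1 cross-covariances <C1.hat(v_j), v_k> = (1/N) sum_n <X_n, v_j> <X_{n+1}, v_k>
S  <- scores[1:(N - 1), 1:p, drop = FALSE]     # <X_n, v_j>,     n = 1, ..., N - 1
S1 <- scores[2:N,       1:p, drop = FALSE]     # <X_{n+1}, v_k>
C1 <- crossprod(S, S1) / N                     # p x p matrix with entries (j, k)

# psi.hat(t, s) = sum_{j,k <= p} lambda_j^{-1} C1[j, k] v_j(s) v_k(t)
B       <- sweep(C1, 1, lambda[1:p], "/")      # lambda_j^{-1} C1[j, k]
psi.hat <- v[, 1:p] %*% t(B) %*% t(v[, 1:p])   # T.pts x T.pts matrix psi.hat[t, s]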

In some simulated cases, the estimated surfaces ψ̂_p(t, s) may not closely resemble the true surface ψ(t, s); see Chapter 13 of Horváth and Kokoszka [1].

3.1.2. Prediction

The most direct use of the functional AR(1) model is to predict the curve X_{n+1}, and the most obvious method is to use the predictor X̂_{n+1} = Ψ̂_p(X_n), where Ψ̂_p is the estimator (3.8). For kernel operators, this predictor is given explicitly by X̂_{n+1}(t) = ∫ ψ̂_p(t, s) X_n(s) ds, (3.10) where ψ̂_p is the kernel (3.9). Kargin and Onatski [37] proposed a more sophisticated method that uses the so-called predictive factors rather than the FPCs. Predictive factors are interesting in that they are functions which can be used to expand the curves, very much like the FPCs are used, but they define directions in the space L² which are most relevant for prediction. Using a finite sample implementation, Didericksen et al. [43] found that it does not lead to more accurate predictions. In general, predicted curves that use formula (3.10) tend to be closer to the mean curve and smoother than the actual observations. These predictions are however significantly better than just using the mean curve. A more detailed study is given in Didericksen et al. [43] and in Chapter 13 of Horváth and Kokoszka [1].
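
Continuing the estimation sketch above (and with the same assumed objects), the one-step-ahead prediction (3.10) reduces on the grid to a Riemann sum:

X.pred <- psi.hat %*% Xc[, N] * dt             # X.hat_{N+1}(t) = int psi.hat(t, s) X_N(s) ds
plot(t.grid, X.pred, type = "l")               # predicted (centered) curve for day N + 1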

3.1.3. Order Determination

A generalization of the functional AR(1) model (3.1) is the FAR(d) model X_n = Ψ_1(X_{n−1}) + ⋯ + Ψ_d(X_{n−d}) + ε_n. (3.12) We now use d to denote the order, because the letter p will be used below to denote a different quantity. The work of Kokoszka and Reimherr [44] shows how to determine the order d, and how to estimate the operators Ψ_k. The idea is to write (3.12) as a fully functional linear model. We start by expressing each term Ψ_k(X_{n−k}) as an integral over the interval [0, 1]: a change of variables compresses the argument of the kth lagged curve into a subinterval of [0, 1], and multiplication by the indicator function of that subinterval allows the d lagged curves X_{n−1}, …, X_{n−d} to be combined into a single function, say Y_{n−1}, defined on [0, 1]. With this notation, we have X_n = Ψ(Y_{n−1}) + ε_n, (3.17) where Ψ is an integral Hilbert-Schmidt operator whose kernel ψ is built from the kernels ψ_1, …, ψ_d of the operators Ψ_1, …, Ψ_d.

Thus, if we can estimate the kernel ψ, then we can estimate each of the kernels ψ_k, and hence each of the operators Ψ_k. An estimator of ψ can be constructed by any method valid for the fully functional linear model, for example, using basis or FPC expansions; see Chapter 8 of Horváth and Kokoszka [1].

Determining the order also uses representation (3.17). The FAR(d) model will be rejected in favor of FAR(d + 1) if an estimate of the operator Ψ_{d+1} is large in an appropriate sense. Kokoszka and Reimherr [44] developed a multistage procedure based on this idea, but it is too complex to be described here.

3.2. Approximable Functional Time Series

For many functional time series, it is not clear what specific model they follow, and for many statistical procedures, it is not necessary to assume a specific model. It is however important to know what the effect of temporal dependence on a given procedure is. In this section, we describe a very general notion of dependence, which is convenient to use in the framework of functional data, and we restrict ourselves to this framework. Several related useful results, including a functional central limit theorem, were established by Aue et al. [45, 46] for vector-valued time series. Sharp invariance principles and additional references are given in Berkes et al. [47]. The exposition that follows is based on Hörmann and Kokoszka [48].

For p ≥ 1, we denote by L^p the space of (classes of) real-valued random variables X such that E|X|^p < ∞. Further, we let L^p_H be the space of H-valued random functions X such that ν_p(X) = (E‖X‖^p)^{1/p} < ∞. In this section, we use H to denote the function space L²([0, 1]) to avoid confusion with the space L^p of scalar random variables.

Definition 3.1. A sequence {X_n} ∈ L^p_H is called L^p-m-approximable if each X_n admits the representation X_n = f(ε_n, ε_{n−1}, …), (3.19) where the ε_i are iid elements taking values in a measurable space S, and f is a measurable function f: S^∞ → H. Moreover, we assume that if {ε′_i} is an independent copy of {ε_i} defined on the same probability space, then, letting X_n^{(m)} = f(ε_n, ε_{n−1}, …, ε_{n−m+1}, ε′_{n−m}, ε′_{n−m−1}, …), (3.20) we have Σ_{m=1}^∞ ν_p(X_m − X_m^{(m)}) < ∞. (3.21)

Definition 3.1 implies that {X_n} is strictly stationary. It is clear from the representation of X_n and X_n^{(m)} that E‖X_m − X_m^{(m)}‖^p = E‖X_n − X_n^{(m)}‖^p, so that condition (3.21) could be formulated solely in terms of X_1 and the approximations X_1^{(m)}. Obviously the sequence {X_n^{(m)}, n ≥ 1} as defined in (3.20) is not m-dependent. To this end, we need to define for each n an independent copy {ε_k^{(n)}} of {ε_k} (this can always be achieved by enlarging the probability space) which is then used instead of {ε′_k} to construct X_n^{(m)}, that is, we set X_n^{(m)} = f(ε_n, ε_{n−1}, …, ε_{n−m+1}, ε^{(n)}_{n−m}, ε^{(n)}_{n−m−1}, …). (3.22) We call this method the coupling construction. Since this modification leaves condition (3.21) unchanged, we will assume from now on that the X_n^{(m)} are defined by (3.22). Then, for each m, the sequences {X_n^{(m)}, n ≥ 1} are strictly stationary and m-dependent, and each X_n^{(m)} is equal in distribution to X_n.

The following example gives a good feel for what Definition 3.1 means.

Example 3.2 (functional autoregressive process). Suppose that Ψ ∈ L satisfies ‖Ψ‖_L < 1. Let {ε_n} be iid elements of L² with mean zero. Then there is a unique stationary sequence of random elements X_n ∈ L² such that X_n = Ψ(X_{n−1}) + ε_n. (3.23) The AR(1) sequence (3.23) admits the expansion X_n = Σ_{j=0}^∞ Ψ^j(ε_{n−j}), where Ψ^j is the jth iterate of the operator Ψ. We thus set X_n^{(m)} = Σ_{j=0}^{m−1} Ψ^j(ε_{n−j}) + Σ_{j=m}^∞ Ψ^j(ε^{(n)}_{n−j}). It is easy to verify that, for every A in L, ν_p(A(Y)) ≤ ‖A‖_L ν_p(Y). Since X_m − X_m^{(m)} = Σ_{j=m}^∞ Ψ^j(ε_{m−j} − ε^{(m)}_{m−j}), it follows that ν_p(X_m − X_m^{(m)}) ≤ 2 ν_p(ε_0) Σ_{j=m}^∞ ‖Ψ‖_L^j. By assumption ‖Ψ‖_L < 1, and therefore Σ_{m=1}^∞ Σ_{j=m}^∞ ‖Ψ‖_L^j < ∞, so condition (3.21) holds for any p ≥ 1, as long as ν_p(ε_0) < ∞.

Several other models are L^p-m-approximable; examples are given in Chapter 16 of Horváth and Kokoszka [1]. But we emphasize that the main role of L^p-m-approximability is to provide a convenient nonparametric framework for quantifying temporal dependence of curves. In this sense it is similar to various mixing conditions, but there is no obvious relationship between it and the traditional mixing conditions, as defined by Doukhan [49] or Bradley [50]. It is not related to the conditions of Doukhan and Louhichi [51] either. It is similar to the conditions of Wu [52, 53], in that it assumes that the observed functions are nonlinear moving averages (Bernoulli shifts) of iid unobservable errors.

Recently, Lian [54] extended the concept of L^p-m-approximability by replacing the L^p norm by a more general Orlicz norm. If φ is a convex function on [0, ∞) with φ(0) = 0, then we define ‖X‖_φ = inf{c > 0 : E φ(|X|/c) ≤ 1}. If φ(x) = x^p, then ‖X‖_φ = (E|X|^p)^{1/p}. The remaining elements of Definition 3.1 are extended without difficulty.

3.2.1. Convergence of the Eigenfunctions and Eigenvalues

An important result valid for L⁴-m-approximable FTSs is that the empirical eigenfunctions and eigenvalues converge at the same rate as for iid functions. This property allows us to extend many results established for functional random samples to FTSs. We state this result in Theorem 3.3, in which the v̂_j and λ̂_j are the eigenelements of the empirical covariance operator (3.7).

Theorem 3.3. Suppose that {X_n} is an L⁴-m-approximable sequence such that the p largest eigenvalues of its covariance operator are positive and distinct. Then, for 1 ≤ j ≤ p, lim sup_{N→∞} N E‖ĉ_j v̂_j − v_j‖² < ∞ and lim sup_{N→∞} N E|λ̂_j − λ_j|² < ∞, where ĉ_j = sign(⟨v̂_j, v_j⟩).

Kokoszka and Reimherr [55] showed that under the assumptions of Theorem 3.3 the empirical FPCs v̂_j are asymptotically normal. For linear functional time series, this asymptotic normality is implicit in the work of Mas [56].

3.2.2. The Long-Run Variance

A central concept in time series analysis is the long-run variance (LRV), which replaces the variance in many well-known formulas valid for iid observations. Let {X_n} be a scalar stationary sequence. Its long-run variance is defined as σ² = Σ_{h∈Z} γ_h, where γ_h = Cov(X_0, X_h), provided this series is absolutely convergent. In that case, Var(N^{−1/2} Σ_{n=1}^N X_n) → σ². The concept of the LRV is extended to stationary random vectors. It turns out that L⁴-m-approximability implies that the LRV exists and can be consistently estimated for vector-valued sequences. Details are given in Chapter 16 of Horváth and Kokoszka [1]. In the functional context, it is more natural to work with long-run variance kernels. The following results were established by Horváth et al. [57].

Theorem 3.4. Suppose that {X_n} is an L²-m-approximable sequence of functions in L² with EX_n = 0. Then the infinite sum c(t, s) = Σ_{h∈Z} E[X_0(t) X_h(s)] converges in L²([0, 1] × [0, 1]) (hence c is square integrable). Moreover, N^{−1/2} Σ_{n=1}^N X_n converges in distribution to Γ, where Γ is a mean zero Gaussian element of L² with the covariance function c(t, s).

To apply Theorem 3.4, we must estimate the kernel c(t, s). We now turn to this objective. Let K be a kernel (weight function) defined on the line and satisfying the following conditions: K(0) = 1, K is continuous and bounded, and, in addition, K vanishes outside a bounded interval. (3.28) Condition (3.28) is assumed only to simplify the proofs; a sufficiently fast decay could be assumed instead. Next we define the empirical (sample) autocovariance functions γ̂_h(t, s) = N^{−1} Σ_{n=1}^{N−h} (X_n(t) − X̄_N(t))(X_{n+h}(s) − X̄_N(s)), where X̄_N(t) = N^{−1} Σ_{n=1}^N X_n(t). The estimator for c(t, s) is given by ĉ(t, s) = γ̂_0(t, s) + Σ_{h=1}^{N−1} K(h/q) (γ̂_h(t, s) + γ̂_h(s, t)), where q = q(N) is the smoothing bandwidth satisfying q → ∞ and q/N → 0.
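
A minimal R sketch of this kernel estimator follows; it assumes a T.pts x N matrix Xc of centered curves on a grid (as in the earlier sketches), Bartlett weights K(u) = max(1 − |u|, 0), and a simple rule-of-thumb bandwidth, all of which are illustrative choices rather than recommendations.

lrv.kernel <- function(Xc, q, K = function(u) pmax(1 - abs(u), 0)) {
  N <- ncol(Xc)
  gamma.h <- function(h)                       # gamma.hat_h(t, s) on the grid
    Xc[, 1:(N - h), drop = FALSE] %*% t(Xc[, (1 + h):N, drop = FALSE]) / N
  c.hat <- gamma.h(0)
  for (h in 1:q) {
    g <- gamma.h(h)
    c.hat <- c.hat + K(h / q) * (g + t(g))     # symmetrized weighted autocovariances
  }
  c.hat
}
c.lr <- lrv.kernel(Xc, q = floor(ncol(Xc)^(1/3)))   # bandwidth q growing slowly with N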

Theorem 3.5. Suppose that the functional time series {X_n} is L⁴-m-approximable and satisfies an additional technical condition stated in Horváth et al. [57]. Then, under the conditions on the kernel and the bandwidth stated above, ∫∫ [ĉ(t, s) − c(t, s)]² dt ds → 0 in probability.

3.2.3. Self-Normalization

Estimation of the LRV, even for scalar time series, is difficult due to the selection of the bandwidth q. Data-driven methods exist, most notably those based on the theory of Andrews [58], but they do postulate a specific form of dependence. Moreover, some tests in which the LRV is estimated by such data-driven procedures suffer from the problem of nonmonotonic power, see Shao and Zhang [59]. Block bootstrap and subsampling offer useful tools for solving many problems of time series analysis, but they again depend on bandwidth-type block sizes whose selection is not obvious. Even though some recommendations for the selection of such block sizes exist, see for example, Bühlmann and Künsch [60], Politis et al. [61], and Bickel and Sakov [62], this remains a difficult practical problem. An improved approach has recently been proposed by Shao and Politis [63]. A different idea, self-normalization, was proposed by Shao [64, 65]. We explain it now briefly using a simple example involving a scalar time series.

Suppose that {y_n} is a stationary scalar time series such that, in the Skorokhod space D[0, 1], N^{−1/2} Σ_{n=1}^{⌊Nu⌋} (y_n − μ) →_d σ B(u), (3.34) where B is a standard Brownian motion. The parameter σ² is the LRV: σ² = Σ_{h∈Z} Cov(y_0, y_h). Set S_k = Σ_{n=1}^k y_n, ȳ_N = N^{−1} S_N, and V_N = N^{−2} Σ_{k=1}^N (S_k − (k/N) S_N)². Then, (3.34) implies N (ȳ_N − μ)² / V_N →_d B(1)² / ∫₀¹ (B(u) − u B(1))² du. (3.37)

To see why (3.37) holds, set u_n = y_n − μ and observe that S_k − (k/N) S_N = Σ_{n=1}^k u_n − (k/N) Σ_{n=1}^N u_n, so that N^{1/2} (ȳ_N − μ) = N^{−1/2} Σ_{n=1}^N u_n →_d σ B(1). (3.40) Next, observe that N^{−1/2} (S_{⌊Nu⌋} − (⌊Nu⌋/N) S_N) →_d σ (B(u) − u B(1)) in D[0, 1]. Consequently, V_N = N^{−1} Σ_{k=1}^N [N^{−1/2} (S_k − (k/N) S_N)]² →_d σ² ∫₀¹ (B(u) − u B(1))² du. (3.42) The convergences in (3.40) and (3.42) are joint, so (3.37) follows.

The key point is the cancelation of σ² when the square of (3.40) is divided by (3.42). Relation (3.42) shows that V_N is an inconsistent estimator of σ². The distribution of the right-hand side of (3.37) can however be simulated, and the critical values can be obtained with arbitrary precision. Relation (3.37) can be used to construct a confidence interval for μ without estimating the LRV. Such a construction does not require the selection of a bandwidth parameter in the kernel estimates of σ².
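
A minimal R sketch of these two steps follows: the self-normalized statistic computed from a scalar series y, and Monte Carlo simulation of the limiting null distribution B(1)² / ∫₀¹ (B(u) − u B(1))² du; the grid size and the number of replications are illustrative assumptions.

self.norm.stat <- function(y, mu0 = 0) {       # statistic for H0: E y_n = mu0
  N <- length(y); S <- cumsum(y - mu0)
  V <- sum((S - (1:N) / N * S[N])^2) / N^2     # self-normalizer; no bandwidth required
  N * mean(y - mu0)^2 / V
}

sim.limit <- function(M = 10000, grid = 1000) {
  replicate(M, {
    B <- cumsum(rnorm(grid)) / sqrt(grid)      # approximate Brownian motion on [0, 1]
    bridge <- B - (1:grid) / grid * B[grid]    # B(u) - u B(1)
    B[grid]^2 / mean(bridge^2)
  })
}
quantile(sim.limit(), 0.95)                    # approximate 5% critical value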

Zhang et al. [66] extend the idea of self-normalization to the detection of change points in FTSs. A normalization analogous to V_N is not suitable for the change point problem, see Shao and Zhang [59]. It must be modified to take into account a possible change point. The details are too complex to be discussed here. The change point problem is discussed in Section 3.3.

3.3. Other Research Directions
3.3.1. Two-Sample Problems

In a two-sample problem, we consider two samples X_1, …, X_N and X*_1, …, X*_M, which may be realizations of two functional time series, or of a single series, but taken over different time intervals. The statistical problem consists in testing whether some specified characteristics of the samples are equal. To illustrate, suppose we want to test whether the mean functions are equal. We assume the model X_n = μ + ε_n, X*_m = μ* + ε*_m. We wish to test the null hypothesis H₀: μ = μ* against the alternative that H₀ is false. We can also test for the equality of the covariance operators, or of any other quantities that might be of interest. In the FDA setting, testing for the equality of means, the FPCs, and the covariance operators has received a fair deal of attention.

An important contribution has been made by Benko et al. [67], who developed bootstrap procedures for testing the equality of mean functions, the FPCs, and the eigenspaces spanned by them. Horváth et al. [57] developed asymptotic tests for testing the equality of mean functions. Laukaitis and Račkauskas [68] considered a model with group mean functions and iid innovations, and test the equality of the group means. Other related contributions are Cuevas et al. [69], Delicado [70], and Ferraty et al. [71].

Regarding testing the equality of the covariance operators, Panaretos et al. [72] developed a test assuming the data are iid and Gaussian. Their work was extended by Fremdt et al. [73] and Kraus and Panaretos [74] to non-Gaussian settings. Horváth et al. [75] studied two-sample problems for functional regression models.

3.3.2. The Change Point Problem

When curves form a time series, it is typically assumed that, possibly after some transformation, they have the same distribution in L². If some aspects of this distribution change, inference will become invalid. For example, if the mean changes at some time point, the sample mean will not estimate any population parameter. Change point analysis is a very broad field of statistics with applications in industrial quality control, the study of economic, financial, and environmental time series, and many other areas. Many monographs are available, including Brodsky and Darkhovsky [76], Basseville and Nikiforov [77], Csörgő and Horváth [78], Chen and Gupta [79], and Basseville et al. [80].

The idea of change point analysis is simple, and we explain it in the functional context using the change in the mean function as an example. We observe functions X_1, X_2, …, X_N, and we test the null hypothesis H₀: EX_1 = EX_2 = ⋯ = EX_N. The simplest alternative is that of a single change point: X_n = μ_1 + ε_n for n ≤ k* and X_n = μ_2 + ε_n for n > k*, where the ε_n have the same distribution with mean zero, and μ_1 ≠ μ_2.

Note that under H₀, we do not specify the value of the common mean, and under the alternative we do not specify the change point k* nor the means μ_1 and μ_2. It is also important to distinguish between a change point problem and the problem of testing for the equality of means. In the latter setting, it is known which population or group each observation belongs to. In the change point setting, we do not have any a priori partition of the data into several sets with possibly different means. The change can occur at any point, and we want to test if it occurs or not, and, if it does, to estimate the point of change.

The simplest, and in many settings most effective, change point detection procedures are based on cumulative sums (CUSUM procedures). In the above setting, denote S_k(t) = Σ_{n=1}^k X_n(t). If the mean is constant, the difference S_k(t) − (k/N) S_N(t) is small for all k and all t. However, it can become large due to chance variability if k is close to 1 or to N. It is therefore usual to work with a weighted sequence in which the variability at the end points is attenuated by a parabolic weight function. If the mean changes, the difference S_k(t) − (k/N) S_N(t) is large for some values of k and of t. Since the observations are in an infinite-dimensional domain, projections on the FPCs are used to construct a test statistic and an estimator of a change point. The details are explained in Berkes et al. [81], who assume that the X_n are independent; an extension to approximable sequences is developed in Hörmann and Kokoszka [48]. Theoretical properties of the estimator of a change point are studied in Aue et al. [45, 46].
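
The following R sketch computes an unweighted CUSUM statistic of this type from the scores on the first p empirical FPCs (the N x p matrix scores and the eigenvalues lambda from the earlier FPC sketch are assumed); it follows the projection idea described above for independent curves, and for dependent curves the lambda would have to be replaced by long-run variances of the scores.

cusum.stat <- function(scores, lambda, p = ncol(scores)) {
  N <- nrow(scores)
  stat <- 0
  for (l in 1:p) {
    S <- cumsum(scores[, l])
    Z <- (S - (1:N) / N * S[N]) / sqrt(N)      # CUSUM process of the l-th score
    stat <- stat + mean(Z^2) / lambda[l]       # approximates int_0^1 Z_l(x)^2 dx / lambda_l
  }
  stat                                         # large values indicate a change in the mean
}
cusum.stat(scores[, 1:3], lambda[1:3])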

There have been several extensions to more complex settings. Horváth et al. [82] consider the change point in the covariance operator of the FAR(1) process. Zhang et al. [66] use self-normalization to deal with temporal dependence and study change points in mean and in the covariance structure. Aston and Kirch [83] consider change point detection in a dependent sequence of functions assuming that the alternative is not a single change point but a change interval, the so-called epidemic alternative. The problem of detecting a change in the mean of a sequence of Banach-space valued random elements under an epidemic alternative is theoretically studied by Račkauskas and Suquet [84] who propose a statistic based on increasingly fine dyadic partitions of the index interval, and derive its limit, which is nonstandard.

3.3.3. Other Temporal Dependence Structures

The most extensively used and studied model for FTSs is the FAR(1) model. Some nonlinear ARCH-type models are introduced in Hörmann et al. [85]. Similar models are used by Kokoszka and Reimherr [86] in a simulation study. Kokoszka and Reimherr [86] study the predictability (not the prediction) of the cumulative intraday returns (CIDRs) defined by R_n(t) = 100 [ln P_n(t) − ln P_n(0)], where P_n(t) is the price of a financial asset at time t on day n.

Kokoszka et al. [87], see also Kokoszka and Zhang [88], study the dependence of the CIDRs defined above on other factors, including the CIDRs of market indexes and energy futures. They consider a factor model, (3.49), in which the CIDR of an asset is expressed as a linear combination of such factors plus an error curve. A different class of functional factor models was recently proposed by Hays et al. [89]. These models, introduced to forecast yield curves, say X_n(t), are of the form X_n(t) = Σ_{k=1}^K β_{n,k} f_k(t) + ε_n(t). (3.50) The factors f_k do not depend on n and are orthonormal functions to be estimated. The dynamics are in the coefficients β_{n,k}, which are assumed to follow Gaussian autoregressive processes (the errors ε_n are also Gaussian). Model (3.50) could be termed a statistical factor model. It is designed for temporal forecasting, while model (3.49) is designed for regression-type prediction in which the correlation structure of the factors plays a major role.

Gabrys et al. [90] developed a test to determine if the regressors in the fully functional linear model are correlated. Benhenni et al. [91] consider a nonparametric regression in which the regressors are functions and the responses (and the errors) are scalars. They allow the errors to have either short or long memory.

A different framework is developed by Battey and Sancetta [92], who assume that the FTS is a Markov process and are concerned with the estimation of conditional probabilities of the form P(X_{n+1} ≤ y | X_n = x), where y and x are deterministic functions.

4. Geostatistical Functional Data

An interesting class of functional data are curves observed at several spatial locations, as already mentioned in Section 1. For example, X(s; t) can be the concentration of a pollutant at location s measured at time t. In this section, we thus assume that the data consist of curves X(s_k; t), k = 1, 2, …, K. The analysis of such a data structure must draw heavily on the concepts and tools of spatial statistics. In fact, the random field defined above is a special case of a spatiotemporal process. In this paper, we cannot provide the details of the tools of spatial statistics that we use. The required background, and much more, is given, for example, in Schabenberger and Gotway [93], Gelfand et al. [94], and Sherman [95]. We will however introduce the concepts to the extent needed to get a good idea of the main problems and solutions. This is done in Section 4.1. Then, in Section 4.2, we consider the fundamental problem of the estimation of the mean function. Section 4.3 points to research on functional kriging. We note that there are functional data structures beyond those discussed here; we refer to Delicado et al. [96], who also discuss Bayesian approaches.

4.1. Basic Concepts of Spatial Statistics

A sample of spatial data is Z(s_1), Z(s_2), …, Z(s_K). (The Z(s_k) are now scalars.) The spatial domain is typically a subset of the two-dimensional plane or sphere. Each observed value Z(s_k) is viewed as a realization of a random variable. Just as in time series analysis, stationary random fields play a fundamental role in modeling spatial data. To define arbitrary shifts, we must assume that the domain is either the whole Euclidean space R², or the whole sphere. The random field is then strictly stationary if the joint distribution of Z(s_1 + h), …, Z(s_K + h) does not depend on the shift h, for any points s_1, …, s_K. The covariance function is then defined by C(h) = Cov(Z(s), Z(s + h)). If C(h) depends only on the length ‖h‖ of h, we say that the random field is isotropic. The covariance function of an isotropic random field is typically parametrized as C(h) = σ² ρ(‖h‖), with ρ(0) = 1. The function ρ is called the correlation function. For example, the exponential correlation function is given by ρ(d) = exp(−d/ρ₀), where ρ₀ > 0 is a range parameter. The main idea of spatial modeling is that some sort of empirical estimate of the covariance function is first obtained, which is typically a very irregular, noisy function. Then a suitable parametric covariance function is fitted to ensure that the resulting covariance matrix is positive definite. There are many approaches and nontrivial issues involved in this process.
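
A minimal R sketch of the last step follows: given assumed station coordinates and assumed parameter values, it builds the covariance matrix implied by the exponential correlation model and checks that it is positive definite.

exp.cov <- function(loc, sigma2 = 1, range = 1) {   # loc: K x 2 matrix of coordinates
  D <- as.matrix(dist(loc))                         # matrix of inter-point distances
  sigma2 * exp(-D / range)                          # C(h) = sigma^2 exp(-||h|| / range)
}
loc   <- cbind(runif(20), runif(20))                # hypothetical station locations
Sigma <- exp.cov(loc, sigma2 = 2, range = 0.3)
min(eigen(Sigma, symmetric = TRUE)$values) > 0      # the implied covariance matrix is positive definite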

4.2. Estimation of the Mean Function and of the FPCs

We assume that the functions X(s_k; ·) are observations of a stationary and isotropic spatial random field taking values in the space L². In particular, the mean function μ(t) = E[X(s; t)] does not depend on the location s. The estimation of this mean function is the most important first step of the inference for spatially distributed functional data. If the curves are iid or form a time series, the usual sample mean is used. But if the curves are available at spatial locations, the curves at locations which are close to each other should be given smaller weights because they are very similar and contribute roughly the same information. This suggests that μ could be estimated by μ̂(t) = Σ_{k=1}^K w_k X(s_k; t), (4.4) with the weights w_k defined to minimize E‖Σ_{k=1}^K w_k X(s_k; ·) − μ‖², subject to the condition Σ_{k=1}^K w_k = 1. Using the method of the Lagrange multiplier, it is not difficult to find a closed form expression for the weights w_k. This expression involves the unknown covariances E⟨X(s_k; ·) − μ, X(s_l; ·) − μ⟩. Gromenko et al. [97] proposed an iterative procedure for the estimation of the weights and of these covariances. They showed that the weighted average (4.4) is a significantly better estimator than the usual sample average. Hörmann and Kokoszka [98] showed that the sample average is not a consistent estimator of μ if there are clusters of points. The clusters were defined using a modification of a metric introduced by Park et al. [99]. Gromenko and Kokoszka [100] refined the approach of Gromenko et al. [97] and applied it to testing the hypothesis that the means of the curves over two disjoint regions are different.
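
To make the Lagrange multiplier solution concrete, the following R sketch computes the optimal weights when the matrix of the scalar covariances Sigma[k, l] = E⟨X(s_k; ·) − μ, X(s_l; ·) − μ⟩ is treated as known (in practice it must be estimated, for example iteratively as in Gromenko et al. [97]); minimizing the quadratic form subject to the weights summing to one gives w = Σ^{-1}1 / (1'Σ^{-1}1).

optimal.weights <- function(Sigma) {
  w <- solve(Sigma, rep(1, nrow(Sigma)))       # Sigma^{-1} 1
  w / sum(w)                                   # normalize so the weights sum to one
}
w <- optimal.weights(Sigma)                    # Sigma could be the matrix from the previous sketch
# mu.hat <- X.curves %*% w   # X.curves: T.pts x K matrix of curves at the K locations (hypothetical)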

The approaches outlined above use parametric model fitting to obtain the covariances. It is well known that if the number of spatial locations is small, then the optimization needed to fit a parametric model may fail or the fit may be poor. Gromenko and Kokoszka [101] addressed this issue by developing a nonparametric approach. They showed that in relevant settings, all with a small number of locations, the nonparametric approach is superior to the parametric approach, even excluding the cases when the parametric fit does not converge. The method of Gromenko and Kokoszka [101] is a usable refinement and extension of the ideas of Hall et al. [102] and Hall and Patil [103].

4.2.1. Estimation of the FPCs

If the random functions are realizations of a stationary L²-valued random field, then they have the same FPCs v_j. Estimation of the v_j in the spatial setting involves issues similar to those arising in the estimation of the mean function μ, but the notation and approaches are more complex. They are discussed in Gromenko et al. [97] and Hörmann and Kokoszka [98].

4.3. Kriging and Other Research

A very important problem is to predict a curve at a specified location using the curves at available locations. This problem was addressed in Nerini et al. [104] and Giraldo et al. [105]. Spatial prediction of this type is called kriging. Theoretical background to kriging of scalar fields is given in Stein [106]. The book of Wackernagel [107] gives an accessible introduction to kriging emphasizing multidimensional observations, a setting related to kriging of curves. Just as in the case of estimating the mean function, there are several approaches to kriging. The approach advocated by Giraldo et al. [105] is “most functional” in that it treats the curves as whole data objects. Similarly to (4.4), it predicts the unobserved function X(s₀; ·) by a linear combination X̂(s₀; t) = Σ_{k=1}^K w_k X(s_k; t), with the weights minimizing E‖X̂(s₀; ·) − X(s₀; ·)‖². The estimation of the weights requires some work.
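
For orientation, a minimal R sketch of such kriging weights is given below; it assumes that the scalar covariances C[k, l] = E⟨X(s_k; ·) − μ, X(s_l; ·) − μ⟩ and c0[k] = E⟨X(s_k; ·) − μ, X(s₀; ·) − μ⟩ are available (in practice they must be estimated, which is where most of the work lies), and it imposes the constraint that the weights sum to one through a Lagrange multiplier, as in ordinary kriging.

kriging.weights <- function(C, c0) {
  K <- nrow(C)
  A <- rbind(cbind(C, 1), c(rep(1, K), 0))     # bordered system from the Lagrangian
  b <- c(c0, 1)
  solve(A, b)[1:K]                             # drop the Lagrange multiplier
}
# X.hat(s0; t) = sum_k w_k X(s_k; t), computed on a grid as X.curves %*% kriging.weights(C, c0)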

Giraldo et al. [108] consider the problem of determining clusters in spatially correlated functional data. Härdle and Osipienko [109] show how spatial analysis of functional data can be used to calculate more accurate prices of weather derivatives.

An emerging important class of functional models are those with a hierarchical structure and correlation at some levels, similar to the spatial correlations discussed in this section. Such models have applications in the analysis of medical experiments in which tissue samples are taken at several locations in an organ of a subject. Staicu et al. [110] propose fast methods for the estimation of such models.

5. Summary and Future Directions

We have reviewed recent developments in the analysis of dependent functions. We considered two data structures: functional time series and spatially distributed functions. Functional time series are collections of curves X_n(t), n = 1, 2, …, N, where n can be interpreted as time, most often day or year. The central point in the analysis and modeling of such data is to take into account the dependence between the curves X_n. Spatially distributed functions, or geostatistical functional data, consist of curves X(s_k; t) available at spatial locations s_k. The main issue is to take into account the uneven distribution in space of the points s_k.

Inference for data of both types assumes that the data are stationary, possibly after some transformation. At present there are no suitable tests of stationarity for such functional data. Second-order stationarity of FTS could potentially be tested using spectral methods. For scalar time series the relevant references are Grenander and Rosenblatt [111], Granger and Hatanaka [112], Priestley et al. [113], and Dwivedi and Subba Rao [114]. The recent work of Panaretos and Tavakoli [115] develops the fundamentals of the spectral theory of FTS.

In many settings, hybrid data of the form X_n(s_k; t) can be considered. It could, for example, be the maximum temperature or the count of flu cases on day t of year n at location s_k. Such a data structure could be useful to study long-term trends or changes in the annual pattern of a variable of interest over a region. There has not been much work on functional data of this form; Odei et al. [116] consider Bayesian modeling of snow melt curves. In their setting, X_n(s_k; t) is the amount of snow melt water on day t of year n at a location s_k in Utah.

For long records of geophysical, weather, or environmental data, a serious problem is long segments of missing observations. For example, if X(s_k; t) is the rainfall on day t at location s_k, there may be whole years of missing data at certain locations (when a station is closed), and these segments will be different at different locations. An important challenge is to develop useful approaches that allow us to pool together information from curves at all locations when some of them have long missing segments. This problem is somewhat related to the problem of dealing with sparse functional data mentioned in Section 1.1, but it adds the complication of spatial dependence.

The functions discussed in this paper, whether those observed consecutively over time or at spatial locations, are assumed to be smooth, so that methods relying on basis and FPC expansions can be applied. Some functions do not fall into this category and may exhibit sharp spikes and flat regions. There has not been much work on time series or spatial fields of functions of this type. Timmermans and von Sachs [117] propose an exploratory tool that aims at detecting the closeness of curves whose significant sharp features might not be well aligned.

The study of extremes involves work with point processes. For example, one can consider the point process of threshold exceedances in year n at location s_k. No systematic modeling framework for such point-process-valued time series is available. The estimation of exceedance probabilities is studied in Draghicescu and Ignaccolo [118].

In summary, the study of dependent functional data has reached a level of maturity that makes it a useful subfield of FDA, but many important problems remain to be addressed. It is hoped that this paper has provided a useful introduction to this area.

Acknowledgments

This work was partially completed at the Institute for Mathematical Sciences, National University of Singapore, 2012. The research was partially supported by the NSF Grant DMS-0931948.