Abstract

The aim of this contribution is to describe the difference between the Kalman filter and the particle filter when the state space is of high dimension. In the Gaussian framework, the KF and the PF give the same theoretical result. However, in high dimension and using a finite sampling of the Gaussian distribution, the PF is not able to reproduce the solution produced by the KF. This discrepancy is highlighted from the convergence property of the Gaussian law toward a hypersphere: in high dimension, any finite sample of a Gaussian law lies within a hypersphere centered on the mean of the Gaussian law and of radius the square-root of the trace of the covariance matrix. This concentration of probability suggests the use of the norm as a criterion that discriminates whether a forecast sample can be compatible or not with a given analysis state. The contribution illustrates important characteristics that have to be considered in high dimension but does not introduce a new approach to face the curse of dimensionality.

1. Introduction

From the work of Bengtsson et al. [1] and Snyder et al. [2], the particle filter is known to suffer from the “curse of dimensionality” [3, 4], in the sense that, when the problem is of high dimension, the size of the particle ensemble must grow exponentially with the dimension, which leads to the failure of direct Monte-Carlo strategies.

More precisely, in the particle filter, a weight is computed for each particle from the observational error distribution. This weight measures the proximity of the particle to a given set of observations. If the ensemble size does not grow exponentially with the dimension, then the maximum weight is systematically close to 1, meaning that only one particle is compatible with the observations (or is at least the closest to the observations among the particles): this leads to particle degeneracy [5]. While the observation error distribution is often assumed Gaussian in data assimilation, replacing the observation error distribution by a heavier-tailed distribution may be introduced to limit the particle degeneracy in realistic applications [6]. But this replacement should not be used for theoretical studies, where it would introduce an inconsistency between the way the observations are supposed to behave and the way they are assimilated. Moreover, Snyder et al. [2] have shown that particle degeneracy still occurs with heavier-tailed distributions, for example, for the Cauchy distribution.

In the demonstration proposed by Snyder et al. [2] to study the particle filter, the curse of dimensionality is described in terms of weights, related to the observational space. Hence, it remains difficult to understand what happens in the state space, and also what the real difference is between the ensemble version of the Kalman equations (EnKF), introduced by Evensen [7], and the particle filter approach (PF), introduced by Gordon et al. [8]. The question is not whether the EnKF converges toward the PF in the general framework, for which results are known: (a) the PF is known to converge toward the nonlinear filter in the limit of a large number of particles in interaction, Del Moral [9]; (b) the EnKF is known to converge toward a different solution than the nonlinear filter [10, 11]. How can we explain the paradox that the two algorithms, the EnKF and the PF under Gaussian assumption, should deliver the same result while, in the practical high dimension case, they do not?

In the present contribution, important characteristics are illustrated that have to be considered in high dimension, but no new approach is introduced to face the curse of dimensionality. This work investigates another point of view that relies on multidimensional spheres, or hyperspheres. A similar but different approach has been considered by Chorin and Morzfeld [12]. This perspective, mentioned in the conclusion of Snyder et al. [2], helps to understand in a direct way the effect of the dimension on the difference between the Kalman filter and the particle approximation of the Bayes rule under Gaussian assumption. The main result used in this work is that a normalized Gaussian law in high dimension is not a simple higher dimensional extension of the usual scatter plot in 2D but converges toward the uniform law on a sphere as the dimension increases, a phenomenon known as the Poincaré Lemma [13]. This phenomenon occurs even for “high dimensions” that are very small compared with the degrees of freedom encountered in geophysical applications: the constraints due to the dimension are quite strong for this area.

The understanding of the behaviour in high dimension is required when using alternatives to classical data assimilation to update the probabilistic information contained within an ensemble. Such an update can be motivated by the wish to increase the ensemble size by merging lagged ensembles [14, 15], with the question of how to merge ensembles from different forecast times [16]. This can also be due to real-time constraints: ensemble prediction systems can now rely on an analysis ensemble [17], but in practice there can be an important delay between the operational analysis production (which relies on the Kalman filter formalism) and the ensemble analysis production. As a result, an analysis state exists that could be used to update the last available forecast ensemble, for example, by using the particle filter approach [18, 19], where an adaptation of the metric used in the weight computation is also introduced in the application of the particle filter strategy. This kind of situation is a motivation to clarify the differences between the Kalman filter and the particle filter, at least under Gaussian assumption.

To tackle this issue we first recall the behaviour of the Gaussian distribution in high dimension in Section 2. In Section 3, we review the description of the filtering theory under Gaussian assumption, then detail the asymptotic consequences of the high dimension, and describe the constraints implied on the background and the analysis distributions, which constitute the difference between the EnKF and the PF. The conclusions are reported in Section 4.

2. Behaviour of Gaussian Distribution in High Dimension

2.1. Concentration of Gaussian Law in High Dimension

Human intuition of high dimensional behaviour is often an extrapolation of low dimensional experience. This leads us to assume that graphical representations valid in low dimensions also hold in high dimensions. This is particularly true for the Gaussian distribution, where the similarity of scatter plots in dimensions lower than 3 invites us to speculate that the distribution in high dimensions should not be so different (see Figure 1). But actually this is completely wrong and leads to a misleading interpretation of what a Gaussian distribution really is, despite its intensive use in large systems as encountered in geophysical sciences and especially in data assimilation.

If $X$ is a random Gaussian vector with zero mean and covariance matrix $\mathbf{B}$ (in data assimilation this is the statistical model for the background error), denoted by $X \sim \mathcal{N}(0,\mathbf{B})$, then the asymptotic distribution of the norm $\|X\|$ is given by (see Appendix A)
$$\|X\| \sim \mathcal{N}\left(\sqrt{\mathrm{Tr}(\mathbf{B})},\ \frac{\mathrm{Tr}(\mathbf{B})}{2\nu}\right), \tag{1}$$
where $\mathrm{Tr}$ denotes the trace operator and where $\nu = \mathrm{Tr}(\mathbf{B})^2/\mathrm{Tr}(\mathbf{B}^2)$ is the effective dimension that provides a quantitative measure of the dimensionality [20].

This simple formula implies a deep contradiction between the natural intuition and the reality of Gaussian laws. But at the same time it provides the correct intuition as now detailed.

2.2. Asymptotic Convergence of Normal Law toward a Spherical Shell

For the particular case where $X \sim \mathcal{N}(0,\mathbf{I}_n)$ is a normal law (reduced and centered Gaussian law), the effective dimension is $\nu = n$ and then $\|X\| \sim \mathcal{N}(\sqrt{n}, 1/2)$. This distribution means that, in high dimension, all samples of $X$ are concentrated within a spherical shell of radius $\sqrt{n}$ and thickness $1/\sqrt{2}$. Note that the thickness of the shell is independent of the dimension for the norm $\|X\|$ (while this is not true for the square norm $\|X\|^2$).
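As a minimal numerical sketch (not taken from the paper's experiments), the following Python lines draw samples of $\mathcal{N}(0,\mathbf{I}_n)$ for increasing $n$ and compare the empirical mean and standard deviation of the norm with the theoretical values $\sqrt{n}$ and $1/\sqrt{2}$:

```python
# Sketch: concentration of the norm of a standard Gaussian sample in high dimension.
import numpy as np

rng = np.random.default_rng(0)
n_samples = 10_000

for n in (2, 10, 100, 10_000):
    X = rng.standard_normal((n_samples, n))   # rows are samples of N(0, I_n)
    norms = np.linalg.norm(X, axis=1)
    print(f"n={n:6d}  mean norm={norms.mean():9.3f}  sqrt(n)={np.sqrt(n):9.3f}  "
          f"std norm={norms.std():.3f}  (theory 1/sqrt(2)={1/np.sqrt(2):.3f})")
```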

As a simplified representation, Figure 2 mimics the behaviour of the Gaussian sample as the dimension $n$ increases. For each panel, a sample $X$ of the normal law is represented by the point $(\|X\|\cos\theta, \|X\|\sin\theta)$, where $\theta$ is the argument of the complex number formed from the first two components of $X$. For $n = 2$ this representation reproduces the sample itself, that is, the usual dispersion of a Gaussian sample (see Figure 2(a)). But, for larger values of $n$, the natural intuition fails to anticipate the spherical shell illustrated in Figures 2(b)–2(d). The apparent decrease, for increasing $n$, of the theoretically constant thickness is due to the scaling used for the representation and corresponds to the relative error of the fluctuation over the radius, of order $1/\sqrt{2n}$.

It is striking to realise the absence of any sample in the interior of the sphere, away from the shell. The position of a sample appears binary: either it is at the right distance in the state space, or it is not really a sample of the Gaussian law that was supposed to be sampled.

This simplified illustration provides us with a graphical tool that supports our intuition of what normal Gaussian laws are in high dimensions. We now describe what happens for a general Gaussian law by taking into account the possible correlations between the components of the random vector.

2.3. Asymptotic Convergence of Gaussian Law toward a Spherical Shell

To illustrate the behaviour of the Gaussian distribution in high dimension, one first needs to construct a nontrivial covariance matrix $\mathbf{B}$ and then to generate samples of $\mathcal{N}(0,\mathbf{B})$.

In order to construct a correlation matrix $\mathbf{B}$, similar to the ones encountered in data assimilation, we first consider an earth great circle of radius $a$ (in km), discretized with $n$ grid points separated by a distance $\delta x$. Then, following the background covariance matrix modelling based on the diffusion equation [21, 22], which produces Gaussian correlation functions, a square-root matrix $\mathbf{B}^{1/2}$ is specified such that $\mathbf{B} = \mathbf{B}^{1/2}(\mathbf{B}^{1/2})^T$, with Gaussian correlation functions of the form
$$\rho(x, y) = \exp\left(-\frac{\|x-y\|^2}{2L^2}\right),$$
where $L$ is the correlation length-scale [23]. The resulting matrix $\mathbf{B}$ is a covariance matrix whose diagonal terms, representing the variances, are all set to 1 provided that the square-root is suitably normalized. This theoretical one-dimensional setting allows generating and computing all the quantities required from now on.

As specified above, $\mathbf{B}$ is a correlation matrix and some of its correlation functions are represented in Figure 3(a) for various length-scale values $L$: the larger the length-scale, the broader the correlation function. The effective dimension (Figure 3(b)) illustrates that the larger the length-scale, the lower the number of significant principal directions of $\mathbf{B}$: the effective dimension is equal to the full dimension $n$ for the shortest length-scale, while it drops well below $n$ for larger length-scales.

In practice, samples of $\mathcal{N}(0,\mathbf{B})$ can be obtained as the transformation $\varepsilon^b = \mathbf{B}^{1/2}\zeta$ of samples $\zeta$ of a random vector following a normal law $\mathcal{N}(0,\mathbf{I}_n)$.
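The following sketch illustrates this construction with assumed parameters (the number of grid points, Earth radius, and length-scale are illustrative choices, not the values used in the paper): it builds a Gaussian correlation matrix on a circle, computes the effective dimension $\nu = \mathrm{Tr}(\mathbf{B})^2/\mathrm{Tr}(\mathbf{B}^2)$, and draws samples of $\mathcal{N}(0,\mathbf{B})$ through an eigenvalue square-root:

```python
# Sketch with assumed parameters: Gaussian-correlation matrix B on a circle,
# effective dimension, and sampling of N(0, B).
import numpy as np

rng = np.random.default_rng(1)
n = 360                          # assumed number of grid points on the great circle
a = 6371.0                       # Earth radius (km)
L = 500.0                        # assumed correlation length-scale (km)

theta = 2 * np.pi * np.arange(n) / n
dtheta = np.abs(theta[:, None] - theta[None, :])
dtheta = np.minimum(dtheta, 2 * np.pi - dtheta)      # shortest angular separation
d = a * dtheta                                       # great-circle distance (km)
B = np.exp(-d**2 / (2 * L**2))                       # Gaussian correlations, unit variances

nu = np.trace(B) ** 2 / np.trace(B @ B)              # effective dimension
print(f"Tr(B) = {np.trace(B):.1f},  effective dimension nu = {nu:.1f}")

w, V = np.linalg.eigh(B)                             # B may be numerically rank-deficient
B_half = V @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ V.T
eps_b = (B_half @ rng.standard_normal((n, 5000))).T  # 5000 samples of N(0, B)

norms = np.linalg.norm(eps_b, axis=1)
print(f"mean norm = {norms.mean():.2f}  vs  sqrt(Tr(B)) = {np.sqrt(np.trace(B)):.2f}")
print(f"std  norm = {norms.std():.3f}  vs  sqrt(Tr(B)/(2 nu)) = {np.sqrt(np.trace(B) / (2 * nu)):.3f}")
```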

Similarly to the schematic representation of Figure 2, we illustrate the distribution of samples of $\mathcal{N}(0,\mathbf{B})$ in Figure 4. For the shortest length-scale, the grid points are almost decorrelated and the samples are uniformly distributed within a spherical shell of radius $\sqrt{\mathrm{Tr}(\mathbf{B})} = \sqrt{n}$ and thickness $1/\sqrt{2}$, since the effective dimension is close to $n$ for this length-scale (see Figure 3). When the length-scale increases, the correlation between the first two components of the random vector increases. For the intermediate length-scale, the correlation is higher than 0.9 and the repartition of the sampling points within the spherical shell becomes heterogeneous, with a concentration along the first principal diagonal. This heterogeneity is even more pronounced for the largest length-scale, and the samples appear to be concentrated onto a subspace of an equatorial plane: this corresponds to the concentration of the sampling points within a spherical shell of the same radius but within a subspace of lower dimension (the effective dimension being the quantitative approximation of this dimension).

While the radius of the spherical shell remains at the fixed value $\sqrt{\mathrm{Tr}(\mathbf{B})} = \sqrt{n}$, its thickness increases. This is a consequence of the diminution of the effective dimension, as understood from the asymptotic variance formula $\mathrm{Tr}(\mathbf{B})/(2\nu)$: if $\nu$ decreases, then $\mathrm{Tr}(\mathbf{B})/(2\nu)$ increases, leading to an increase of the variance.

Hence, as the effective dimension decreases, the Gaussian distribution converges toward another Gaussian law of the same total variance but in a lower dimensional linear subspace. The graphical representation of the Gaussian law as a spherical shell is still verified.

Note that, in the continuous limit where the dimension tends to infinity (with vectors tending to functions), the above discussion remains true provided that the covariance matrix converges toward a covariance operator of finite trace. In particular, the continuous limit of the reduced and normalized Gaussian vector, $\mathcal{N}(0,\mathbf{I})$, does not exist since it would imply an operator of infinite trace (the radius of its hypersphere increases to infinity). But the continuous limit of the correlated Gaussian vector exists (the trace of the limit operator is finite): for the Gaussian correlation function, the Fourier spectrum, which is also Gaussian, is summable. Figure 4(b) helps to interpret the finiteness of the covariance operator trace: all the pertinent information about the statistics is contained within a subspace of finite dimension in the functional space.

From these results, the consequences for data assimilation are now described.

3. Consequences of the Concentration of the Measure in Data Assimilation

Data assimilation faces a paradox, recognized as resulting from the curse of dimensionality [2]: while the particle filter and the ensemble Kalman filter are supposed to be two possible algorithms for the discretization of the nonlinear filtering under Gaussian assumption, the particle filter fails to produce the a posteriori distribution. This issue and the consequences of the concentration of Gaussian law are now explored.

3.1. Nonlinear Filtering

Data assimilation aims to provide the probability of the real state of a system knowing the past and present forecasts and observations. The general formalism is, without any assumption of linearity of the dynamics or Gaussianity of the statistics, reduced to the Bayes rule
$$p\left(x_k \mid y^o_{1:k}\right) \propto p\left(y^o_k \mid x_k\right)\, p\left(x_k \mid y^o_{1:k-1}\right),$$
where $p(x_k \mid \cdot)$ denotes the (density of) probability distribution to find the true state at time $t_k$ in the vicinity of $x_k$, and $p(y^o_k \mid x_k)$ is the (density of) probability distribution to measure $y^o_k$ at time $t_k$ when the true state $x_k$ is known. This corresponds to the analysis step (considering the Bayesian data assimilation setting). The vector space of $x_k$ (resp. $y^o_k$) is the state space (resp. observational space), denoted by $\mathbb{R}^n$ (resp. $\mathbb{R}^p$), where $n$ (resp. $p$) is its dimension.

The forecast step transports the conditional distribution from time $t_{k-1}$ to time $t_k$ by using the model propagator. Since we are interested in comparing the respective analysis procedures proposed by the EnKF and by the PF, the forecast step is not detailed further.

3.2. Gaussian Assumption and Concentration Consequences

Under the usual additional Gaussianity assumption, the two distributions $p(x_k \mid y^o_{1:k-1})$ and $p(y^o_k \mid x_k)$ take the form
$$x_k = x^b_k + \varepsilon^b_k, \qquad \varepsilon^b_k \sim \mathcal{N}(0, \mathbf{B}_k),$$
where $x^b_k$ is the background state at time $t_k$ and $\varepsilon^b_k$ is the background error, modeled as a Gaussian random vector of zero mean and covariance matrix $\mathbf{B}_k$; and
$$y^o_k = \mathbf{H}_k x_k + \varepsilon^o_k, \qquad \varepsilon^o_k \sim \mathcal{N}(0, \mathbf{R}_k),$$
where $\mathbf{H}_k$ is the observational operator that maps the state space into the observational space. In general, an observational operator is a nonlinear operator, but, here, it is assumed to be linear. $\varepsilon^o_k$ is the observation error, modeled as a Gaussian random vector of zero mean and covariance matrix $\mathbf{R}_k$. Note that $\mathbf{B}_k$ and $\mathbf{R}_k$ are random matrices that depend on the real state trajectory and the observations assimilated. It results from the Gaussian assumption that $p(x_k \mid y^o_{1:k})$ is also Gaussian, written as
$$x_k = x^a_k + \varepsilon^a_k, \qquad \varepsilon^a_k \sim \mathcal{N}(0, \mathbf{A}_k),$$
where $x^a_k = x^b_k + \mathbf{K}_k\left(y^o_k - \mathbf{H}_k x^b_k\right)$ is the analysis state and $\varepsilon^a_k$ is the analysis error, a Gaussian random vector of zero mean and covariance matrix $\mathbf{A}_k = (\mathbf{I} - \mathbf{K}_k\mathbf{H}_k)\mathbf{B}_k$, with $\mathbf{K}_k = \mathbf{B}_k\mathbf{H}_k^T\left(\mathbf{H}_k\mathbf{B}_k\mathbf{H}_k^T + \mathbf{R}_k\right)^{-1}$ the gain matrix.

Direct consequences of the concentration of the Gaussian law in high dimension are that, according to (1),
$$\left\|\varepsilon^b_k\right\| \sim \mathcal{N}\left(\sqrt{\mathrm{Tr}(\mathbf{B}_k)},\ \frac{\mathrm{Tr}(\mathbf{B}_k)}{2\nu_{\mathbf{B}_k}}\right) \quad\text{and}\quad \left\|\varepsilon^a_k\right\| \sim \mathcal{N}\left(\sqrt{\mathrm{Tr}(\mathbf{A}_k)},\ \frac{\mathrm{Tr}(\mathbf{A}_k)}{2\nu_{\mathbf{A}_k}}\right).$$
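A hedged numerical sketch of these consequences is given below; the covariance matrix, observation network, and error variance are illustrative assumptions, not the configuration of the paper. It computes the Kalman gain, the analysis covariance $\mathbf{A} = (\mathbf{I} - \mathbf{K}\mathbf{H})\mathbf{B}$, and the two radii $\sqrt{\mathrm{Tr}(\mathbf{B})}$ and $\sqrt{\mathrm{Tr}(\mathbf{A})}$:

```python
# Sketch (assumed toy configuration): exact Kalman analysis covariance and the
# background/analysis hypersphere radii implied by the concentration property.
import numpy as np

n = 200
x = np.arange(n)
d = np.minimum(np.abs(x[:, None] - x[None, :]), n - np.abs(x[:, None] - x[None, :]))
B = np.exp(-d**2 / (2 * 10.0**2))             # assumed Gaussian correlations (length-scale: 10 grid points)

H = np.eye(n)[::3]                            # assumed network: one observation every third grid point
R = 0.5 * np.eye(H.shape[0])                  # assumed observation-error variance

K = B @ H.T @ np.linalg.inv(H @ B @ H.T + R)  # Kalman gain
A = (np.eye(n) - K @ H) @ B                   # analysis-error covariance

print(f"background radius sqrt(Tr(B)) = {np.sqrt(np.trace(B)):.2f}")
print(f"analysis   radius sqrt(Tr(A)) = {np.sqrt(np.trace(A)):.2f}")
```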

Other consequences of the concentration property may be obtained, for example, for the observational variance tuning (see Appendix B).

Now we consider the discretization of the nonlinear filtering as it should be observed when Gaussian assumptions are verified.

3.3. Discretization of the Nonlinear Filtering

Under the Gaussian assumption, a discrete version of the prior distribution is given by an ensemble of samples $(x^b_{k,i})_{i=1,\ldots,N_e}$, such that
$$x^b_{k,i} = x^b_k + \varepsilon^b_{k,i},$$
where the $\varepsilon^b_{k,i}$ are independent samples of the Gaussian law $\mathcal{N}(0,\mathbf{B}_k)$. The empirical distribution is given by $p^{N_e}(x) = \frac{1}{N_e}\sum_{i=1}^{N_e}\delta\left(x - x^b_{k,i}\right)$. The convergence of the empirical distribution toward the prior distribution has to be understood as weak convergence, that is, for all test functions $f$ of polynomial type, $\langle p^{N_e}, f\rangle \to \langle p, f\rangle$ as $N_e \to \infty$, where $\langle p, f\rangle = \int f(x)\,p(x)\,dx$.

From the asymptotic concentration of the Gaussian law in high dimension, it results that
$$\left\|x^b_{k,i} - x^b_k\right\| \sim \mathcal{N}\left(\sqrt{\mathrm{Tr}(\mathbf{B}_k)},\ \frac{\mathrm{Tr}(\mathbf{B}_k)}{2\nu_{\mathbf{B}_k}}\right). \tag{13}$$

A numerical experiment supports this asymptotic distribution, as illustrated in Figure 5. At first order, the histogram of the normalized quantity $\left(\|x^b_{k,i} - x^b_k\| - \sqrt{\mathrm{Tr}(\mathbf{B}_k)}\right)/\sqrt{\mathrm{Tr}(\mathbf{B}_k)/(2\nu_{\mathbf{B}_k})}$ fits the theoretical normal law very well, whatever the length-scale $L$. As a result, the background samples are all distributed within a spherical shell of radius $\sqrt{\mathrm{Tr}(\mathbf{B}_k)}$ and of thickness $\sqrt{\mathrm{Tr}(\mathbf{B}_k)/(2\nu_{\mathbf{B}_k})}$. One can notice a slight asymmetry as the length-scale increases (see Figures 5(c) and 5(d)). This is related to the asymmetry of the chi-squared distribution, which is only weakly approximated by a Gaussian distribution (see Appendix A). This approximation is not accurate for the small effective dimensions encountered for the two largest length-scales (see Figure 3(b)).

Note that (13) is useful to understand the effect of the localization based on the Schur product, often used in the EnKF to limit the spurious long-distance correlations resulting from the sampling noise in the background covariance matrix estimation [24]. In this framework, a localized background covariance matrix is built from the element-wise product, $\circ$, of a correlation matrix $\mathbf{C}$ with the ensemble-estimated background covariance matrix $\widehat{\mathbf{B}}$; that is, $\mathbf{B}_{\mathrm{loc}} = \mathbf{C} \circ \widehat{\mathbf{B}}$. Since $\mathbf{C}$ is a correlation matrix, its diagonal coefficients are all equal to 1; as a result the traces of $\widehat{\mathbf{B}}$ and $\mathbf{B}_{\mathrm{loc}}$ are equal. Moreover, since the correlation functions in $\mathbf{C}$ are chosen as compactly supported functions, the effective dimensions are different, with $\nu_{\mathbf{B}_{\mathrm{loc}}} > \nu_{\widehat{\mathbf{B}}}$. Hence, from (13), the variance associated with the localized matrix is smaller. As an illustration from Figure 5, the effect of the localization is equivalent to the transformation from the distribution in Figure 5(d) to the distribution in Figure 5(a). Thereafter, the question of the localization is not considered further.
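The following sketch, under assumed settings (the compactly supported correlation and the ensemble size are illustrative), shows that the Schur product leaves the trace unchanged while it typically increases the effective dimension of a small-size ensemble estimate:

```python
# Sketch (assumed setting): effect of Schur-product localization on the effective dimension.
import numpy as np

rng = np.random.default_rng(2)
n, N_e = 200, 20
x = np.arange(n)
d = np.minimum(np.abs(x[:, None] - x[None, :]), n - np.abs(x[:, None] - x[None, :]))
B = np.exp(-d**2 / (2 * 10.0**2))                        # "true" Gaussian-correlation matrix

w, V = np.linalg.eigh(B)
B_half = V @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ V.T
E = B_half @ rng.standard_normal((n, N_e))               # ensemble of background perturbations
B_hat = E @ E.T / (N_e - 1)                              # noisy sampled covariance

C = np.clip(1.0 - d / 30.0, 0.0, None)                   # simple compactly supported correlation (assumed)
B_loc = C * B_hat                                        # Schur (element-wise) product

def eff_dim(M):
    return np.trace(M) ** 2 / np.trace(M @ M)

print(f"Tr(B_hat) = {np.trace(B_hat):.1f}  =  Tr(B_loc) = {np.trace(B_loc):.1f}")
print(f"nu(B_hat) = {eff_dim(B_hat):.1f},  nu(B_loc) = {eff_dim(B_loc):.1f}  (localization increases nu)")
```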

At this stage, there is no difference between the EnKF and the PF strategies, and the difference takes place in the way the Bayes rule is applied.

3.4. Analysis Step in the EnKF

In this section we do not detail the practical implementation of the EnKF but only the formalism (see, e.g., Houtekamer and Mitchell [24]; Evensen [25] for practical aspects).

For the EnKF, the analysis $x^a_k$ is computed and, when using a “perturbation of observations” method [26], an ensemble of analysis perturbations is generated according to
$$\varepsilon^a_{k,i} = \left(\mathbf{I} - \mathbf{K}_k\mathbf{H}_k\right)\varepsilon^b_{k,i} + \mathbf{K}_k\,\varepsilon^o_{k,i},$$
where $(\varepsilon^o_{k,i})_i$ is an ensemble of independent observational errors, randomly generated from the true covariance matrix $\mathbf{R}_k$ as $\varepsilon^o_{k,i} = \mathbf{R}_k^{1/2}\zeta_i$ with random normal samples $\zeta_i \sim \mathcal{N}(0,\mathbf{I}_p)$. From the computation [26], it follows that $\mathbb{E}\left[\varepsilon^a_{k,i}\left(\varepsilon^a_{k,i}\right)^T\right] = \mathbf{A}_k$. The analysis ensemble is then constructed as $x^a_{k,i} = x^a_k + \varepsilon^a_{k,i}$, and then, from the application of the asymptotic concentration of the Gaussian law,
$$\left\|x^a_{k,i} - x^a_k\right\| \sim \mathcal{N}\left(\sqrt{\mathrm{Tr}(\mathbf{A}_k)},\ \frac{\mathrm{Tr}(\mathbf{A}_k)}{2\nu_{\mathbf{A}_k}}\right). \tag{17}$$
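A minimal sketch of this "perturbation of observations" update is given below, in the same assumed toy setting as above (the true state, network, and variances are illustrative); it checks that the analysis members concentrate at the distance $\sqrt{\mathrm{Tr}(\mathbf{A})}$ from the analysis:

```python
# Sketch (assumed toy setting): perturbed-observation EnKF analysis step and the
# concentration of the analysis members around the radius sqrt(Tr(A)).
import numpy as np

rng = np.random.default_rng(3)
n, N_e = 120, 40
x = np.arange(n)
d = np.minimum(np.abs(x[:, None] - x[None, :]), n - np.abs(x[:, None] - x[None, :]))
B = np.exp(-d**2 / (2 * 8.0**2))

w, V = np.linalg.eigh(B)
B_half = V @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ V.T

H = np.eye(n)[::3]
p = H.shape[0]
sigma_o = np.sqrt(0.5)
R = sigma_o**2 * np.eye(p)

x_t = B_half @ rng.standard_normal(n)                       # synthetic "true" state
y_o = H @ x_t + sigma_o * rng.standard_normal(p)            # observations

x_b = np.zeros(n)                                           # background mean (assumed)
Xb = x_b[:, None] + B_half @ rng.standard_normal((n, N_e))  # background ensemble (columns)

K = B @ H.T @ np.linalg.inv(H @ B @ H.T + R)
Yo = y_o[:, None] + sigma_o * rng.standard_normal((p, N_e)) # perturbed observations
Xa = Xb + K @ (Yo - H @ Xb)                                 # analysis ensemble

x_a = x_b + K @ (y_o - H @ x_b)                             # analysis state
A = (np.eye(n) - K @ H) @ B
radii = np.linalg.norm(Xa - x_a[:, None], axis=0)
print(f"mean ||x_a_i - x_a|| = {radii.mean():.2f}  vs  sqrt(Tr(A)) = {np.sqrt(np.trace(A)):.2f}")
```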

Similarly to the validation of the asymptotic distribution (13), a numerical experiment supports the asymptotic distribution (17), as illustrated in Figure 6. In this experiment, an observation is placed every three grid points, and the observational covariance matrix is set as $\mathbf{R} = \sigma_o^2\mathbf{I}_p$, with $p$ the dimension of the observational space. The matrix operations (trace, product, etc.) are computed directly within the numerical setting. At first order, the histogram of the normalized quantity fits the theoretical normal law very well, whatever the length-scale. One can again notice a slight asymmetry as the length-scale increases (see Figures 6(c) and 6(d)).

We observe that the ensemble of analysis samples is distributed within a spherical shell of radius $\sqrt{\mathrm{Tr}(\mathbf{A}_k)}$ and of thickness $\sqrt{\mathrm{Tr}(\mathbf{A}_k)/(2\nu_{\mathbf{A}_k})}$. As expected, this shows that, from the distance point of view, the covariance matrix of the ensemble is $\mathbf{A}_k$.

The EnKF transforms an ensemble of background samples distributed on a spherical shell of radius $\sqrt{\mathrm{Tr}(\mathbf{B}_k)}$ into an ensemble of analysis samples distributed on a spherical shell of radius $\sqrt{\mathrm{Tr}(\mathbf{A}_k)}$. This is the exact solution of the Bayes rule, as supported by the numerical experiment. Hence, the statistics of the two distances appear to be a strong constraint, whose strength increases with the dimension of the problem. This is the main property we want to investigate in order to explore the difference between the EnKF and the PF.

The analysis step for the PF is now detailed.

3.5. Analysis Step in the PF

Several strategies exist for the particle discretization of the Bayes rule. However, as far as we know, most of these variants suffer from the curse of dimensionality, for example, the importance sampling version as described by Snyder [27]. Hence, in this note, only the bootstrap filter is considered.

The analysis step in the PF [5, 6, 8, 9] follows a quite different route as it directly relies on the Bayes rule: for each member one computes the weight
$$w_{k,i} = \frac{\exp\left(-\frac{1}{2}\left\|y^o_k - \mathbf{H}_k x^b_{k,i}\right\|^2_{\mathbf{R}_k^{-1}}\right)}{\sum_{j=1}^{N_e}\exp\left(-\frac{1}{2}\left\|y^o_k - \mathbf{H}_k x^b_{k,j}\right\|^2_{\mathbf{R}_k^{-1}}\right)},$$
from which the a posteriori distribution is deduced as
$$p^{N_e}\left(x_k \mid y^o_{1:k}\right) = \sum_{i=1}^{N_e} w_{k,i}\,\delta\left(x_k - x^b_{k,i}\right),$$
where $\delta$ stands for the Dirac distribution positioned in $x^b_{k,i}$. For the bootstrap particle filter [8], an ensemble of analyses is generated as random samples of the distribution $p^{N_e}(x_k \mid y^o_{1:k})$.
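A minimal sketch of this bootstrap analysis step is given below (the Gaussian likelihood and multinomial resampling follow the description above; the inputs are those of the assumed EnKF sketch):

```python
# Sketch: bootstrap particle filter analysis step (weighting and resampling).
import numpy as np

def pf_analysis(Xb, y_o, H, R, rng):
    """Weight the background members with the Gaussian likelihood, then resample."""
    innov = y_o[:, None] - H @ Xb                          # innovation of each member, shape (p, N_e)
    Rinv = np.linalg.inv(R)
    log_w = -0.5 * np.einsum('ij,ik,kj->j', innov, Rinv, innov)
    log_w -= log_w.max()                                   # stabilization before exponentiation
    w = np.exp(log_w)
    w /= w.sum()                                           # normalized weights
    idx = rng.choice(Xb.shape[1], size=Xb.shape[1], p=w)   # multinomial resampling
    return Xb[:, idx], w

# Example call (Xb, y_o, H, R as in the EnKF sketch above):
# Xa_pf, w = pf_analysis(Xb, y_o, H, R, np.random.default_rng(4))
# print("max weight:", w.max())    # close to 1 in high dimension: particle degeneracy
```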

Hence, the difference between the EnKF and the PF is that, for the EnKF, background members are corrected to build analysis members, while, for the PF, the analysis ensemble is a resampling of the background ensemble using the a posteriori weights. A natural question is to know when a background sample is an analysis sample of the particle filter. This is now addressed.

3.6. Compatibility Relation and Minimum Ensemble Size

We want to characterize in which case a background sample $x^b_{k,i}$ can be considered as an analysis sample. This is achieved by giving some compatibility relations that are described in the first subsection. This is followed by two subsections that lead to formulating a constraint on the ensemble size.

3.6.1. Compatibility Relations

As shown for the EnKF, all the analysis members are at a given fixed distance from the analysis $x^a_k$. More precisely, since the distance $\|x^a_{k,i} - x^a_k\|$ asymptotically follows the Gaussian distribution (17), the maximum value that this distance can reach for an ensemble of size $N_e$ is given by (see Appendix C)
$$D^a_{\max} = \sqrt{\mathrm{Tr}(\mathbf{A}_k)} + \sqrt{\frac{\mathrm{Tr}(\mathbf{A}_k)}{2\nu_{\mathbf{A}_k}}}\,\sqrt{2\ln N_e}.$$

From the EnKF equations, $x^b_{k,i} = x^b_k + \varepsilon^b_{k,i}$ and $x^a_k = x^b_k + \delta x_k$, where $\delta x_k = \mathbf{K}_k(y^o_k - \mathbf{H}_k x^b_k)$ is the analysis increment; it results that $x^b_{k,i} - x^a_k$ is the random vector $\varepsilon^b_{k,i} - \delta x_k$. We deduce (Appendix A) that the asymptotic distribution of the norm $\|x^b_{k,i} - x^a_k\|$ is the Gaussian law
$$\left\|x^b_{k,i} - x^a_k\right\| \sim \mathcal{N}\left(\sqrt{\mathrm{Tr}(\mathbf{B}_k) + \|\delta x_k\|^2},\ \sigma^2_{b,k}\right), \qquad \sigma^2_{b,k} = \frac{\mathrm{Tr}(\mathbf{B}_k^2) + 2\,\delta x_k^T\mathbf{B}_k\,\delta x_k}{2\left(\mathrm{Tr}(\mathbf{B}_k) + \|\delta x_k\|^2\right)}, \tag{22}$$
which is validated from numerical simulation in Figure 7. The minimum value that the distance $\|x^b_{k,i} - x^a_k\|$ can reach for an ensemble of size $N_e$ is thus (see Appendix C)
$$D^b_{\min} = \sqrt{\mathrm{Tr}(\mathbf{B}_k) + \|\delta x_k\|^2} - \sigma_{b,k}\,\sqrt{2\ln N_e},$$
a quantity that depends on the analysis increment $\delta x_k$.

Hence, for a background sample $x^b_{k,i}$ to be likely an analysis sample, the following distance inequality should necessarily be verified:
$$D^b_{\min} \leq D^a_{\max}. \tag{24}$$
If not, the two sample distributions of $\|x^b_{k,i} - x^a_k\|$ and $\|x^a_{k,i} - x^a_k\|$ are so different that no background sample can be retained as an element of the analysis sample. Of course, this result only occurs for finite sample distributions. In finite dimension, if the ensemble size is infinite, then the EnKF and the PF provide, in this Gaussian case, the same a posteriori distribution, corresponding to an arbitrarily small $D^b_{\min}$ and an arbitrarily large $D^a_{\max}$.

This trade-off is now illustrated from a numerical experiment in Figure 8. Each panel compares the histograms of the distances $\|x^a_{k,i} - x^a_k\|$ and $\|x^b_{k,i} - x^a_k\|$, for a given finite ensemble of size $N_e$. The vertical solid segment specifies the position of the maximum magnitude that can be reached by $\|x^a_{k,i} - x^a_k\|$, that is, $D^a_{\max}$, while the vertical dashed segment is the position of the minimum magnitude associated with $\|x^b_{k,i} - x^a_k\|$, that is, $D^b_{\min}$. For the shortest length-scale, the inequality $D^b_{\min} \leq D^a_{\max}$ implies almost surely the existence of a background sample that can be considered as being an analysis sample. This means that, within a PF, a background point verifying (24) can be observed. However, this is no longer the case for larger length-scale values, where one observes that $D^b_{\min} > D^a_{\max}$.

The consequence is that the number of samples, $N^*_e$, required in order to reverse the inequality is the integer for which the equality $D^b_{\min} = D^a_{\max}$ applies; that is,
$$N^*_e = \exp\left[\frac{\left(\sqrt{\mathrm{Tr}(\mathbf{B}_k) + \|\delta x_k\|^2} - \sqrt{\mathrm{Tr}(\mathbf{A}_k)}\right)^2}{2\left(\sigma_{a,k} + \sigma_{b,k}\right)^2}\right],$$
where $\sigma_{a,k}$ and $\sigma_{b,k}$ denote the standard deviations of the two distance distributions (17) and (22). Hence, $N^*_e$ is an exponentially large number, a function of the analysis increment $\delta x_k$. Note that this is also a function of $\mathbf{B}_k$ because of (22) and the dependence of $\mathbf{A}_k$ on $\mathbf{B}_k$.
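An empirical counterpart of this compatibility relation can be checked directly on the ensembles, without the closed-form bounds (a hedged sketch in the assumed setting of the previous examples): the smallest background-to-analysis distance is compared with the largest analysis-ensemble radius for the finite size $N_e$ at hand.

```python
# Sketch: empirical check of the compatibility between background and analysis ensembles.
import numpy as np

def compatibility(Xb, Xa, x_a):
    """Return (min ||x_b_i - x_a||, max ||x_a_i - x_a||); compatibility requires min <= max."""
    d_b = np.linalg.norm(Xb - x_a[:, None], axis=0)   # distances of background members to the analysis
    d_a = np.linalg.norm(Xa - x_a[:, None], axis=0)   # distances of analysis members to the analysis
    return d_b.min(), d_a.max()

# Example (with Xb, Xa, x_a from the EnKF sketch):
# d_min, d_max = compatibility(Xb, Xa, x_a)
# print(f"min background distance {d_min:.2f} vs max analysis radius {d_max:.2f} "
#       f"-> compatible: {d_min <= d_max}")
```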

The dependence on $\delta x_k$ can be removed by considering a climatological framework, as follows.

3.6.2. Climatological Approximation

In order to eliminate the dependence on the analysis increment, an average estimation of the minimum ensemble size can be computed under the additional assumption that the statistics are samples of a climatology, eliminating the time index $k$. If $\mathbf{B}$ (resp. $\mathbf{R}$) denotes the climatological background (resp. observational) error covariance matrix, then we set $\mathbf{B}_k = \mathbf{B}$ and $\mathbf{R}_k = \mathbf{R}$. Hence, the analysis increment of a given day is the random Gaussian vector $\delta x = \mathbf{K}\left(\varepsilon^o - \mathbf{H}\varepsilon^b\right) \sim \mathcal{N}\left(0, \mathbf{K}\mathbf{H}\mathbf{B}\right)$, leading, in high dimension, to the asymptotic distribution
$$\|\delta x\| \sim \mathcal{N}\left(\sqrt{\mathrm{Tr}(\mathbf{K}\mathbf{H}\mathbf{B})},\ \frac{\mathrm{Tr}(\mathbf{K}\mathbf{H}\mathbf{B})}{2\nu_{\mathbf{K}\mathbf{H}\mathbf{B}}}\right).$$
Moreover, the average value of $\|\delta x\|^2$ can be deduced from the computation
$$\mathbb{E}\left[\|\delta x\|^2\right] = \mathrm{Tr}\left(\mathbf{K}\mathbf{H}\mathbf{B}\right) = \mathrm{Tr}(\mathbf{B}) - \mathrm{Tr}(\mathbf{A}).$$
Hence, the average value of the minimal ensemble size is obtained by replacing $\|\delta x_k\|^2$ by its climatological average $\mathrm{Tr}(\mathbf{B}) - \mathrm{Tr}(\mathbf{A})$ in the expression of $N^*_e$.

From general properties of the Gaussian law, most samples of the analysis distance lie within a few standard deviations of $\sqrt{\mathrm{Tr}(\mathbf{A})}$; another ensemble size can thus be deduced (see Appendix C), corresponding to an ensemble size able to cover the analysis sample with a high probability. This ensemble size may be considered as an upper bound for the maximal size required to sample the analysis distribution with the PF from the weighting and resampling strategy.

The two ensemble sizes (solid line with diamonds and solid line with triangles) are represented in Figure 9 as functions of the length-scale, together with a reference ensemble size (solid line with no mark). In this experiment, the required ensemble size is maximum for medium length-scale values. When the length-scale is very small, the minimal ensemble size remains relatively modest. This relatively small value is due to the observational network: here only one grid point out of three is observed; hence, when there is no correlation in the background, the posterior distribution provided by the few observations is not so different from the prior distribution. When the length-scale is very large, the effective dimension is small (see Figure 3), and only a relatively small ensemble size is required. The intermediate length-scale case is the most demanding; the background error correlation implies a coupling between the points while the effective dimension remains large. Note that these results are in accordance with those presented in Figure 8. The upper bound follows a similar behaviour, but the gap with the minimal ensemble size is not constant and varies with the length-scale.

For the particular case of identity covariance matrices ($\mathbf{B} = \mathbf{I}_n$ with a proportional observational covariance), these two bounds take explicit forms (the first being in accordance with Snyder et al. [2], Figure 2), which are exponential quantities depending on the dimension $n$.

The magnitude reached by the minimal ensemble size recalls the limits of the Monte-Carlo strategy when using the PF algorithm and the pitfall it presents.

3.6.3. Illustration of the Hyperspheres and Orthogonality Relation

From the averaged values of (27) and (29), it is possible to represent the hyperspheres of the distributions; see Figure 10. On all these panels, the + symbol denotes the background $x^b$, while a second marker denotes the average position of the analysis state $x^a$. The hypersphere representing the typical position of the samples of the background distribution is the circle centered on $x^b$ and of radius $\sqrt{\mathrm{Tr}(\mathbf{B})}$. The typical position of the samples of the analysis distribution is the circle with triangles, centered on $x^a$ and of radius $\sqrt{\mathrm{Tr}(\mathbf{A})}$. The circle with diamonds represents the hypersphere that contains the typical positions deduced from the distribution of $\|x^b_i - x^a\|$. Since a background sample should be typical of both the background hypersphere and this last hypersphere, its typical positions, viewed from the point $x^a$, do not lie on the whole hypersphere but only at the intersection of the two hyperspheres, that is, a hypersphere contained in an equatorial plane of the background hypersphere (here, in the two-dimensional representation, the hypersphere of intersection is reduced to the two points of intersection). In particular, it appears that the analysis increment $\delta x$ is orthogonal to most of the background sample errors $\varepsilon^b_i$.

This is another effect of the high dimension of the problem, which can be understood from the computation of the cosine of the angle between the analysis increment $\delta x$ and the background sample errors $\varepsilon^b_i$, where
$$\cos\theta_i = \frac{\left\langle \delta x, \varepsilon^b_i\right\rangle}{\|\delta x\|\,\|\varepsilon^b_i\|}. \tag{32}$$
The histogram of the cosine distribution (32) is illustrated in Figure 11 for the various length-scale experiments (these are the unnormalized distributions, in order to appreciate the concentration of the cosine around its average value $0$). For all the length-scales, the distributions are shown to be centered on zero, meaning that the angle is $\pi/2$ or, said differently, $\delta x \perp \varepsilon^b_i$, for almost all samples. It appears that the orthogonality is more pronounced for the short length-scales.

The distribution of this angle cosine can be theoretically approximated as follows. Since, in high dimension, the norm of $\varepsilon^b_i$ is concentrated around $\sqrt{\mathrm{Tr}(\mathbf{B})}$, the angle cosine can be approximated as follows:
$$\cos\theta_i \approx \frac{\left\langle \delta x, \varepsilon^b_i\right\rangle}{\|\delta x\|\,\sqrt{\mathrm{Tr}(\mathbf{B})}}.$$

Furthermore, the random scalar $\langle \delta x, \varepsilon^b_i\rangle$ is a Gaussian random value since it appears as a linear transformation of the Gaussian random vector $\varepsilon^b_i$. Hence, the distribution of the angle cosine can be asymptotically approximated by
$$\cos\theta_i \sim \mathcal{N}\left(0,\ \frac{\delta x^T\mathbf{B}\,\delta x}{\|\delta x\|^2\,\mathrm{Tr}(\mathbf{B})}\right). \tag{34}$$
The Gaussian approximation (34) is represented, for each length-scale, in Figure 11 by the solid line curve. It appears to be a good approximation of the angle cosine distribution (32).

An averaged theoretical distribution for the angle cosine can be deduced from (28), leading to a Gaussian law whose variance decreases as the effective dimension increases. For the particular case of a nearly uncorrelated background ($\mathbf{B}$ close to $\mathbf{I}_n$), the distribution of the angle cosine is well approximated by $\mathcal{N}(0, 1/n)$, meaning that the orthogonality is more pronounced for small length-scales.
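The following sketch (same assumed setting as the EnKF example) computes the empirical angle cosines and compares their spread with the Gaussian approximation (34):

```python
# Sketch: cosine of the angle between the analysis increment and the background errors.
import numpy as np

def cosines(Xb, x_b, x_a):
    dx = x_a - x_b                                   # analysis increment
    Eb = Xb - x_b[:, None]                           # background sample errors (columns)
    inner = dx @ Eb                                  # inner products <dx, eps_b_i>
    return inner / (np.linalg.norm(dx) * np.linalg.norm(Eb, axis=0))

# Example (with Xb, x_b, x_a, B from the EnKF sketch):
# c = cosines(Xb, x_b, x_a)
# dx = x_a - x_b
# var_theory = (dx @ B @ dx) / ((dx @ dx) * np.trace(B))
# print(f"empirical std of cos(theta): {c.std():.3f}, approximation (34): {np.sqrt(var_theory):.3f}")
```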

3.6.4. Recommendation for PF and Beyond the Gaussian Framework

A known result is that, while the PF is quite interesting for non-Gaussian data assimilation, it faces the curse of dimensionality for problems of large size. The present contribution illustrates the geometrical interpretation of the curse of dimensionality, considering the favorable case where the distribution is Gaussian, so that the EnKF and the PF should produce the same analysis distribution. The limit of the use of the PF in high dimension is that the distance between the background samples and the typical support of the analysis distribution is too large to select some background samples as analysis samples. Of course, in the realm of geophysical data assimilation, where the state dimension is huge (at the moment of writing this), we can expect the distributions to be highly non-Gaussian, and this could be an advantage for the PF. However, for such a high dimension, the diagnostic of the distance is entirely governed by the central limit theorem (or the large deviation principle if we have to consider the correlation; see Appendix A), so the non-Gaussian distribution should lie within the hypersphere associated with the equivalent Gaussian distribution, that is, the Gaussian distribution whose covariance matrix is the covariance matrix of the non-Gaussian distribution (we assume that this covariance matrix exists). Hence, even if non-Gaussian, the analysis distribution should lie within a hypersphere of smaller radius due to the data assimilation. As a consequence, the selection of background samples to produce analysis samples will face the same difficulties as in the Gaussian case, as illustrated in Sections 3.6.1 and 3.6.2.

In this contribution, and considering the illustration in Figure 2, it appears that high dimension starts from surprisingly small effective dimensions. Hence, for problems whose typical effective dimension is larger than this threshold, we recommend not using the PF algorithm, since it cannot be used with a reasonable ensemble size as encountered in the EnKF (dozens of members at the time of writing). Note that this recommendation is stated considering the effective dimension of the problem, not the dimension of the state space where the state vector lies.

As a consequence, and this is our strong recommendation, any new PF algorithm trying to tackle the high dimension issue should prove its ability to face the intrinsic limit, due to the dimension, as explored here in terms of distance. Said differently, the distance can be considered as a diagnostic to test the ability of a new algorithm to cope with the curse of dimensionality: if the new algorithm suffers from the distance consequences, then the algorithm cannot be used in high dimension; and if the new algorithm is compatible with the distance consequences, then nothing more can be said from the distance point of view, and other diagnostics than the distance should be employed to conclude.

4. Conclusion

In this work, the behaviour of the convergence of a Gaussian law toward a hypersphere (or a part of it) in high dimension has been considered in order to discuss the difference between the ensemble Kalman filter and the particle filter.

Both algorithms correspond to discretizations of the Bayes rule (the nonlinear filter), but they are known to lead to different results in practice. This is mainly due to the curse of dimensionality: the particle filter requires an exponentially large number of samples as the dimension of the problem increases.

The consequence of the concentration of Gaussian probability distributions on hyperspheres has been verified within an experimental setting. It appears that the distance between the mean of a Gaussian and one of its samples is equal to the square-root of the trace of the covariance matrix: the fluctuation magnitude of the distance around this square-root value is very small as the dimension increases. The computation of the variance of the distance provides an approximation of the effective dimension of the covariance matrix.

These properties suggest that the distance plays the role of a diagnostic tool determining whether a background sample can be compatible, or not, with a given analysis state. It appears that the ensemble Kalman filter (EnKF) is able to transform a background sample from the background hypersphere to the analysis hypersphere. If the prior and the posterior distributions are too different, no background sample can be selected in the particle filter (PF), at least when the ensemble size is not an exponential function of the dimension of the state space. A formula for the minimal bound of the ensemble size has been obtained from the concentration behaviour.

Some recommendations have been stated considering the non-Gaussian case, where the geometrical constraints associated with the distance imply a test that should be verified by any PF implementation for problems in high dimension.

The concentration behaviour is particularly interesting with the emergence of weighting ensemble strategies that rely on the particle filter formalism to update the probabilistic information contained in ensembles of different origins. When an analysis state is available, the diagnostic of the distance from the members of the ensemble to the analysis state can be used to position the ensemble onto the right hypersphere.

Appendix

A. Asymptotic Distributions for Gaussian Vector Norm

In this section, we want to characterize the behaviour of the norm of a Gaussian random vector in $\mathbb{R}^n$ when the dimension $n$ is high.

If $X$ denotes a Gaussian random vector of zero mean and with covariance matrix $\mathbf{B}$, the behaviour of $\|X\|^2$ can be deduced from a generalization of the central limit theorem (when the coordinate random values are dependent but not too strongly correlated, the large deviation principle can be invoked under the form of the Gärtner-Ellis theorem; see Dembo and Zeitouni [28]; the rigorous derivation is beyond the scope of the present manuscript): in the present case, $\|X\|^2$ asymptotically follows a Gaussian law entirely featured by its mean and its variance. These two parameters are determined as described now.

If $\mathbf{B}^{1/2}$ denotes a (symmetric) square-root of the covariance matrix $\mathbf{B}$, that is, $\mathbf{B} = \mathbf{B}^{1/2}\mathbf{B}^{1/2}$, then a sample of $X$ can be obtained as the transformation $X = \mathbf{B}^{1/2}\zeta$ of a Gaussian random vector $\zeta \sim \mathcal{N}(0,\mathbf{I}_n)$. Then, $\|X\|^2$ can be written as $\|X\|^2 = \zeta^T\mathbf{B}\,\zeta = \sum_{i,j} B_{ij}\zeta_i\zeta_j$. With $\mathbb{E}[\zeta_i\zeta_j] = \delta_{ij}$, where $\delta_{ij}$ is the Kronecker symbol, we obtain that $\mathbb{E}\left[\|X\|^2\right]$ is also
$$\mathbb{E}\left[\|X\|^2\right] = \mathrm{Tr}(\mathbf{B}), \tag{A.1}$$
where $\mathrm{Tr}$ denotes the trace of the matrix. From the Wick formula,
$$\mathbb{E}\left[\zeta_i\zeta_j\zeta_k\zeta_l\right] = \delta_{ij}\delta_{kl} + \delta_{ik}\delta_{jl} + \delta_{il}\delta_{jk}, \tag{A.2}$$
the computation of $\mathbb{E}\left[\|X\|^4\right]$ provides that $\mathbb{E}\left[\|X\|^4\right] = \mathrm{Tr}(\mathbf{B})^2 + 2\,\mathrm{Tr}(\mathbf{B}^2)$. Thus, the variance of $\|X\|^2$ is
$$V\left[\|X\|^2\right] = 2\,\mathrm{Tr}(\mathbf{B}^2). \tag{A.3}$$
Hence, the law of $\|X\|^2$ is given, from the mean equation (A.1) and the variance equation (A.3), by
$$\|X\|^2 \sim \mathcal{N}\left(\mathrm{Tr}(\mathbf{B}),\ 2\,\mathrm{Tr}(\mathbf{B}^2)\right). \tag{A.4}$$
This leads to writing a sample of $\|X\|^2$ as $\|X\|^2 = \mathrm{Tr}(\mathbf{B}) + \sqrt{2\,\mathrm{Tr}(\mathbf{B}^2)}\,\xi$, where $\xi \sim \mathcal{N}(0,1)$.

From the Taylor expansion $\sqrt{1+u} \approx 1 + u/2$, applied for $u = \sqrt{2\,\mathrm{Tr}(\mathbf{B}^2)}\,\xi/\mathrm{Tr}(\mathbf{B})$, one obtains the approximation $\|X\| \approx \sqrt{\mathrm{Tr}(\mathbf{B})} + \sqrt{\mathrm{Tr}(\mathbf{B}^2)/(2\,\mathrm{Tr}(\mathbf{B}))}\,\xi$ and deduces that the asymptotic distribution of the norm is
$$\|X\| \sim \mathcal{N}\left(\sqrt{\mathrm{Tr}(\mathbf{B})},\ \frac{\mathrm{Tr}(\mathbf{B}^2)}{2\,\mathrm{Tr}(\mathbf{B})}\right). \tag{A.5}$$
Introducing the effective dimension $\nu = \mathrm{Tr}(\mathbf{B})^2/\mathrm{Tr}(\mathbf{B}^2)$ of $\mathbf{B}$ [20], the asymptotic distribution takes the form
$$\|X\| \sim \mathcal{N}\left(\sqrt{\mathrm{Tr}(\mathbf{B})},\ \frac{\mathrm{Tr}(\mathbf{B})}{2\nu}\right). \tag{A.6}$$
If $(X_i)_{i=1,\ldots,N_e}$ denotes an ensemble of samples of $X$, an estimation of the effective dimension can be obtained from the estimations of the mean norm, $\widehat{m}$, which is theoretically equal to $\sqrt{\mathrm{Tr}(\mathbf{B})}$, and of the variance of the norm, $\widehat{\sigma}^2$, theoretically equal to $\mathrm{Tr}(\mathbf{B})/(2\nu)$, from which one obtains the estimation
$$\widehat{\nu} = \frac{\widehat{m}^2}{2\,\widehat{\sigma}^2}. \tag{A.7}$$
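A direct implementation of this estimation is sketched below; it checks the estimator against the exact value $\nu = n$ for the identity covariance matrix:

```python
# Sketch: estimation of the effective dimension from an ensemble, nu_hat = m_hat^2 / (2 var_hat).
import numpy as np

def effective_dimension_from_ensemble(E):
    """E has shape (n, N_e): columns are zero-mean samples of N(0, B)."""
    norms = np.linalg.norm(E, axis=0)
    m_hat = norms.mean()                 # estimates sqrt(Tr(B))
    var_hat = norms.var(ddof=1)          # estimates Tr(B)/(2 nu)
    return m_hat**2 / (2 * var_hat)

rng = np.random.default_rng(5)
E = rng.standard_normal((500, 2000))     # samples of N(0, I_500): exact effective dimension is 500
print(f"estimated nu = {effective_dimension_from_ensemble(E):.0f}  (exact 500)")
```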

Note that when $X$ is of covariance matrix $\mathbf{I}_n$, $\|X\|^2$ follows a chi-squared distribution with $n$ degrees of freedom, which is weakly (resp. accurately) approximated by a Gaussian law for small (resp. large) dimension size $n$.

When the mean of $X$ is no longer zero but $\mu$, that is, $X \sim \mathcal{N}(\mu, \mathbf{B})$, the previous route is modified, leading to the corrected asymptotic distributions
$$\|X\|^2 \sim \mathcal{N}\left(\mathrm{Tr}(\mathbf{B}) + \|\mu\|^2,\ 2\,\mathrm{Tr}(\mathbf{B}^2) + 4\,\mu^T\mathbf{B}\mu\right), \qquad \|X\| \sim \mathcal{N}\left(\sqrt{\mathrm{Tr}(\mathbf{B}) + \|\mu\|^2},\ \frac{\mathrm{Tr}(\mathbf{B}^2) + 2\,\mu^T\mathbf{B}\mu}{2\left(\mathrm{Tr}(\mathbf{B}) + \|\mu\|^2\right)}\right).$$

B. Observational Variance Tuning

The observational variance tuning is not a question directly related to the issue of the difference between the EnKF and the PF for small ensemble sizes. However, it is of crucial importance in the realm of data assimilation, where very little is known about the quality of the observations. Since the investigation of observational variance tuning illustrates an interesting use of the hypersphere constraints, we have chosen to discuss this application here.

The observational variance tuning aims to provide an inflation factor $s$ such that $\mathbf{R} = s\,\widetilde{\mathbf{R}}$, where $\mathbf{R}$ is the true observational covariance matrix and $\widetilde{\mathbf{R}}$ is the observational covariance matrix as implemented in a given data assimilation. If an ensemble of background samples is available, then the square norm $\|\mathbf{H}\varepsilon^b\|^2$ verifies the asymptotic distribution
$$\left\|\mathbf{H}\varepsilon^b\right\|^2 \sim \mathcal{N}\left(\mathrm{Tr}\left(\mathbf{H}\mathbf{B}\mathbf{H}^T\right),\ 2\,\mathrm{Tr}\left(\left(\mathbf{H}\mathbf{B}\mathbf{H}^T\right)^2\right)\right). \tag{B.1}$$

Moreover, a synthetic observational error can be generated from the implemented matrix as $\widetilde{\varepsilon}^o = \widetilde{\mathbf{R}}^{1/2}\zeta$ with $\zeta \sim \mathcal{N}(0,\mathbf{I}_p)$, where $p$ denotes the dimension of the observational space. Hence, the asymptotic distribution of the square norm is
$$\left\|\widetilde{\varepsilon}^o\right\|^2 \sim \mathcal{N}\left(\mathrm{Tr}\left(\widetilde{\mathbf{R}}\right),\ 2\,\mathrm{Tr}\left(\widetilde{\mathbf{R}}^2\right)\right). \tag{B.2}$$

Under the usual unbiased assumption, the innovation verifies $y^o - \mathbf{H}x^b = \varepsilon^o - \mathbf{H}\varepsilon^b$, leading in high dimension to the asymptotic distribution
$$\left\|y^o - \mathbf{H}x^b\right\|^2 \sim \mathcal{N}\left(\mathrm{Tr}\left(\mathbf{H}\mathbf{B}\mathbf{H}^T\right) + s\,\mathrm{Tr}\left(\widetilde{\mathbf{R}}\right),\ 2\,\mathrm{Tr}\left(\left(\mathbf{H}\mathbf{B}\mathbf{H}^T + s\widetilde{\mathbf{R}}\right)^2\right)\right). \tag{B.3}$$

Combining together (B.1), (B.2), and (B.3), it follows that the estimator built from the innovation square norm and the ensemble mean of the square norms $\|\mathbf{H}\varepsilon^b_i\|^2$ verifies a (first-order) asymptotic Gaussian distribution centered on the inflation factor $s$; the asymptotic distribution of the second estimator, built from the synthetic observational errors, is obtained in the same way.

Hence, these two quantities provide two estimators for the inflation factor $s$.
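As a hedged sketch only (it assumes the multiplicative convention $\mathbf{R} = s\widetilde{\mathbf{R}}$ used above and a diagonal toy setting), one possible estimator consistent with these relations compares the mean innovation square norm with its expected value:

```python
# Sketch: one possible inflation-factor estimator consistent with the relations above
# (assumed convention R = s * R_tilde; toy diagonal matrices).
import numpy as np

def estimate_inflation(d, HBHt, R_tilde):
    """d: innovations stacked as columns, shape (p, N_d); returns an estimate of s."""
    mean_sq = np.mean(np.sum(d**2, axis=0))            # mean of the innovation square norms
    return (mean_sq - np.trace(HBHt)) / np.trace(R_tilde)

rng = np.random.default_rng(6)
p, N_d, s_true = 80, 5000, 2.0
HBHt = 0.3 * np.eye(p)                                 # assumed background contribution in observation space
R_tilde = 0.5 * np.eye(p)                              # implemented observation-error covariance
d = (np.sqrt(0.3) * rng.standard_normal((p, N_d))      # synthetic innovations with true R = s_true * R_tilde
     + np.sqrt(s_true * 0.5) * rng.standard_normal((p, N_d)))
print(f"estimated s = {estimate_inflation(d, HBHt, R_tilde):.2f}  (true {s_true})")
```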

C. Bounds for Gaussian Samples

The behaviour of the maximum of $N_e$ Gaussian samples of a random variable following a Gaussian law of mean $\mu$ and variance $\sigma^2$ has been characterized by Berman [29], who has shown that the maximum grows like $\mu + \sigma\,u_{N_e}$, where
$$u_{N_e} = \sqrt{2\ln N_e}. \tag{C.1}$$
This is a classical result, from which one has to understand that the maximum deviation that can be reached for a sampling ensemble of size $N_e$ is given by $\sigma\sqrt{2\ln N_e}$ and that the maximum (resp. minimum) sample value is $\mu + \sigma\sqrt{2\ln N_e}$ (resp. $\mu - \sigma\sqrt{2\ln N_e}$).

In reverse, for a given threshold $u$, the expected minimum ensemble size required to observe a maximum value of magnitude $u$ is given by
$$N_e = \exp\left[\frac{1}{2}\left(\frac{u - \mu}{\sigma}\right)^2\right]. \tag{C.2}$$
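The following sketch illustrates the $\sqrt{2\ln N_e}$ growth of the maximum of Gaussian samples and the inversion (C.2) giving the ensemble size needed to reach a given deviation (the five-standard-deviation target is an arbitrary illustration):

```python
# Sketch: growth of the maximum of N Gaussian samples and inversion of the bound.
import numpy as np

rng = np.random.default_rng(7)
mu, sigma = 0.0, 1.0

for N in (10, 100, 10_000, 1_000_000):
    z = mu + sigma * rng.standard_normal(N)
    bound = mu + sigma * np.sqrt(2 * np.log(N))
    print(f"N={N:8d}  empirical max={z.max():6.2f}  mu + sigma*sqrt(2 ln N)={bound:6.2f}")

def min_ensemble_size(u, mu=0.0, sigma=1.0):
    """Ensemble size for which the expected maximum reaches the value u (inversion of the bound)."""
    return np.exp(0.5 * ((u - mu) / sigma) ** 2)

print(f"to reach u = 5 sigma above the mean: N ~ {min_ensemble_size(5.0):.1e} samples")
```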

Competing Interests

The authors declare that there are no competing interests regarding the publication of this paper.

Acknowledgments

Olivier Pannekoucke would like to thank Laure Raynaud, Marc Bocquet, Olivier Talagrand, and Gérald Desroziers for interesting discussions.