Abstract

We relate the matrix of the second moments of a spherically truncated normal multivariate to its full covariance matrix and present an algorithm to invert the relation and reconstruct the latter from the former. While the eigenvectors of the full covariance matrix are left invariant by the truncation, its eigenvalues are nonuniformly damped. We show that the full eigenvalues can be reconstructed from their truncated counterparts via a fixed point iteration, whose convergence we prove analytically. The procedure requires the computation of multidimensional Gaussian integrals over a Euclidean ball, for which we extend a numerical technique, originally proposed by Ruben in 1962, based on a series expansion in chi-square distributions. In order to study the feasibility of our approach, we examine the convergence rate of some iterative schemes on suitably chosen ensembles of Wishart matrices. We finally discuss the practical difficulties arising in sample space and outline a regularization of the problem based on perturbation theory.

1. Introduction

It has been more than forty years since Tallis [1] worked out the moment-generating function of a normal multivariate subject to the ellipsoidal conditioning event (1). The perfect match between the symmetries of the ellipsoid and those of the distribution allows for an exact analytic result, from which the complete set of multivariate truncated moments can be extracted upon differentiation. Consider, for instance, the matrix of the second truncated moments, expressing the covariances among the components of the variate within the ellipsoid. From Tallis’ paper it turns out that this matrix is proportional to the full covariance matrix, as stated in (2), with the proportionality factor expressed in terms of the cumulative distribution function of a chi-square variable. Inverting (2)—so as to express the full covariance matrix as a function of the truncated one—is trivial, since the proportionality factor is a scalar damping factor independent of the matrix entries. In this paper, we shall refer to such an inverse relation as the reconstruction of the covariance matrix from its truncated counterpart. Unfortunately, life is not always so easy. In general, the effects produced on the expectation of functions of the variate by cutting off the probability density outside a generic domain can hardly be calculated in closed form, especially if the boundary of the domain is shaped in a way that breaks the ellipsoidal symmetry of the density. Thus, for instance, unlike (2), the matrix of the second truncated moments is expected in general to display a nonlinear/nontrivial dependence upon the full covariance matrix.

In the present paper, we consider the case where the truncation domain is a Euclidean ball with center at the origin and given square radius. Specifically, we discuss the reconstruction of the full covariance matrix from the matrix of the spherically truncated second moments. To this aim, we need to mimic Tallis’ calculation, with (1) being replaced by the spherical conditioning event (3). This is precisely an example of the situation described in the previous paragraph: although the ball has a higher degree of symmetry than a generic domain, there is still no possibility of obtaining a closed-form relation between the two matrices, since the ball breaks the ellipsoidal symmetry of the density: technically speaking, in this case we cannot perform any change of variable under the defining integral of the moment-generating function which may reduce the dimensionality of the problem, as in Tallis’ paper.

In spite of that, the residual symmetries characterizing the truncated distribution help simplify the problem in the following respects: (i) the reflection invariance of the whole setup still yields vanishing truncated means and (ii) the rotational invariance of the ball preserves the possibility of defining the principal components of the distribution just as in the unconstrained case. In particular, the latter property means that the full covariance matrix and the matrix of the truncated second moments share the same orthonormal eigenvectors. In fact, the reconstruction of the former from the latter amounts to solving a system of nonlinear integral equations, having the eigenvalues of the full covariance matrix as unknown variables and the truncated eigenvalues as input parameters. In the absence of analytic techniques to evaluate exactly the integrals involved, we resort to a numerical algorithm, whose feasibility, performance, and optimization we investigate.

The paper is organized as follows. In Section 2, we describe a few examples illustrating the occurrence of spherical truncations in practical situations. In Section 3, we show that the aforementioned integral equations have the analytic structure of a fixed point vector equation; that is to say, the full eigenvalue spectrum is a fixed point of a suitable nonlinear map. This suggests achieving the reconstruction numerically via suitably chosen iterative schemes. In Section 4, we prove the convergence of the simplest of them by inductive arguments, the validity of which relies upon the monotonicity properties of ratios of Gaussian integrals over the ball. In Section 5, we review some numerical techniques for the computation of Gaussian integrals over the ball with controlled systematic error. These are based on and extend a classic work by Ruben [2] on the distribution of quadratic forms of normal variables. For the sake of readability, we defer proofs of statements made in this section to the appendix. In Section 6, we report on our numerical experiences: since the simplest iterative scheme, namely, the Gauss–Jacobi iteration, is too slow for practical purposes, we investigate the performance of its improved version based on overrelaxation; as expected, we find that the latter has a higher convergence rate; yet it still slows down polynomially as the truncation radius shrinks and exponentially as the number of dimensions grows; in order to reduce the slowing down, we propose an acceleration technique, which boosts the higher components of the eigenvalue spectrum. A series of Monte Carlo simulations enables us to quantify the speedup. In Section 7 we discuss the problems arising when the input matrix is affected by statistical uncertainty and propose a regularization technique based on perturbation theory. To conclude, we summarize our findings in Section 8.

2. Motivating Examples

Spherical truncations of multinormal distributions may characterize different kinds of experimental devices and may occur in various problems of statistical and convex analysis. In this section, we discuss two motivating examples.

2.1. A Two-Dimensional Gedanken Experiment in Classical Particle Physics

Consider the following ideal situation. An accelerator physicist prepares an elliptical beam of classical particles with Gaussian transversal profile. The experimenter knows a priori the spatial distribution of the beam, that is, the covariance matrix of the two-dimensional coordinates of the particles on a plane orthogonal to their flight direction. We can assume with no loss of generality that the transversal coordinate system has its origin at the maximum of the beam intensity and axes along the principal components of the beam; thus the covariance matrix is diagonal in this frame. The beam travels straight until it enters a linear coaxial pipeline with circular profile, schematically depicted in Figure 1, where the beam is longitudinally accelerated. While the outer part of the beam is stopped by an absorbing wall, the inner part propagates within the pipeline. At the end of the beam flight the physicist wants to know whether the transversal distribution of the particles has changed, due to unknown disturbance factors that may have arisen within the pipeline. Accordingly, he measures again the spatial covariance matrix of the beam. Unfortunately, the absorbing wall has cut off the Gaussian tail, thus damping the covariance matrix and making it no longer comparable to the original one. To perform such a comparison in the general case of an elliptical beam, the covariance matrix of the truncated circular beam has to go through the reconstruction procedure described in the next sections.

2.2. A Multivariate Example: Connections to Compositional Data Analysis

Compositional Data Analysis (CoDA) has been the subject of a number of papers, pioneered by Aitchison [3] over the past forty years. As a methodology of statistical investigation, it finds application in all cases where the main object of interest is a multivariate with strictly positive continuous components to be regarded as portions of a total amount (normalization to unit sum is conventionally adopted in the mathematical literature). In other words, compositional variates belong to the unit simplex defined in (4) in terms of the taxi-cab norm, while compositions with different norms can always be projected onto it via the closure operator. There are countless types of compositional data, whose analysis raises problems of interest for statistics [4], for example, geochemical data, balance sheet data, and election data. The simplex constraint induces a kind of dependency among the parts of a composition that goes beyond the standard concept of covariance. This invalidates many ordinary techniques of statistical analysis.

In order to measure distances on the simplex, Aitchison introduced a positive symmetric function, explicitly defined in (5). The Aitchison distance is a key tool in CoDA. It is scale invariant in both its first and second arguments; that is, it is left invariant by rescalings of either argument by a positive constant. Accordingly, its support can be extended to the positive orthant by imposing (6). It was proved in [5] that the Aitchison distance is a norm-induced metric on the simplex, provided the latter is given an appropriate normed vector space structure. Owing to the compositional constraint, the description of a composition in terms of all its components is redundant: an essential representation requires compositions to be properly mapped onto tuples with one fewer component. Among various possibilities, the Isometric Log-Ratio (ILR) transform introduced in [5] is the only known map of this kind leaving the Aitchison distance invariant. More precisely, the ILR fulfills (7).

It is known from [6] that if a variate is log-normal, then its closure is a logistic-normal variate (the reader should notice that in [6] the simplex is defined slightly differently; thus the corresponding property of [6] is here reformulated so as to take such difference into account), with a known relation between the respective parameters. Analogously, it is not difficult to show that the ILR image of the composition is a normal variate, with parameters related to the original ones via the change of basis matrices derived in [5]. To sum up, it holds (8). Now, suppose that a variate fulfills (8) and has a natural interpretation as a composition, a representative set of observations of it is given, and we wish to select from these those units which are compositionally closer to the center of the distribution, according to the Aitchison distance. To see that the problem is well posed, we first turn the observations into a set of compositional observations. Then, we consider the special point representing the center of the distribution in a compositional sense: it minimizes the expected square Aitchison distance over the simplex; see [7]. By virtue of (8) this identifies a corresponding point in the ILR coordinates. In order to select the observations which are closer to the center, we set a threshold and consider only those elements whose Aitchison distance from a sample estimate of the center lies below it. Such a selection rule operates a well-defined truncation of the distribution. Moreover, in view of (7) and (8), we have the correspondence (9). As a consequence, we see that a compositional selection rule based on the Aitchison distance and (8) is equivalent to a spherical truncation of a multinormal distribution. Obviously, once the distribution has been spherically truncated, the covariance matrix of the remaining data is damped; thus an estimate of the full covariance matrix requires the reconstruction procedure described in the next sections.
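For concreteness, a minimal Python sketch of the compositional machinery used above is given below. The clr-based form of the Aitchison distance is standard; the specific orthonormal (Helmert-type) contrast basis chosen for the ILR is just one admissible choice and is not necessarily the basis derived in [5].

```python
import numpy as np

def clr(x):
    """Centered log-ratio transform of a composition with strictly positive parts."""
    logx = np.log(x)
    return logx - logx.mean(axis=-1, keepdims=True)

def aitchison_distance(x, y):
    """Aitchison distance = Euclidean distance between the clr images."""
    return np.linalg.norm(clr(x) - clr(y))

def ilr_basis(D):
    """One possible orthonormal contrast basis (Helmert-type rows, normalized)."""
    H = np.zeros((D - 1, D))
    for i in range(1, D):
        H[i - 1, :i] = 1.0 / i
        H[i - 1, i] = -1.0
        H[i - 1] *= np.sqrt(i / (i + 1.0))
    return H

def ilr(x):
    """Isometric log-ratio transform: maps the D-part simplex onto (D-1)-tuples
    so that the Euclidean distance of the images equals the Aitchison distance."""
    D = x.shape[-1]
    return clr(x) @ ilr_basis(D).T
```

A quick numerical check is that np.linalg.norm(ilr(x) - ilr(y)) reproduces aitchison_distance(x, y) for any two compositions, which is the isometry property (7) invoked above.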

2.3. Compositional Outlier Detection via Forward Search Techniques

Outlier detection in CoDA is a practical problem where the considerations made in Section 2.2 find concrete application. Most of the established methods for outlier detection make use of the Mahalanobis metric. This is however unfit to describe the distance between compositions. Log-ratio transforms allow one to get rid of the compositional constraint and make it possible to apply standard statistical methods [8]. Here we consider an innovative approach, namely, the Forward Search Algorithm (FSA), introduced in [9] and thoroughly discussed in [10]. The FSA admits an elegant extension to CoDA, in which the covariance reconstruction is a key step. In the next few lines we sketch the original algorithm and outline its extension to CoDA.

2.3.1. Construction of the Signal

In its standard formulation the FSA applies to normal data. It assumes a dataset with for . The null hypothesis is that all the elements of are independent realizations of a stochastic variable . The FSA consists of a sequence of steps where data are recursively sorted. Along the recursion a signal is built and tested.

As a preliminary step, a small number of observations are randomly chosen from the bulk of the dataset. Let this be the initial fitting subset; it is used to provide initial estimates of the true mean and the true covariance matrix, respectively. The generic step of the algorithm goes as follows (a sketch in code is given below):
(i) sort the elements of the dataset according to the increasing values of the square Mahalanobis distance (10);
(ii) take the next element of the newly sorted dataset and regard its distance as the corresponding component of the signal;
(iii) let the new fitting subset be the set of the first observations of the newly sorted dataset;
(iv) use the new fitting subset to provide new estimates of the true mean and the true covariance matrix, respectively.
Notice that the fitting subset is a truncated dataset. Therefore, the covariance estimate must include Tallis’ correction factor, (2). While the recursion proceeds, the inliers progressively populate the fitting subset. The recursion stops at the step when the first outlier produces a statistically relevant discontinuity in the signal.
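The following is a minimal sketch of one forward-search step under the null hypothesis of normal data. The function layout and the scalar Tallis-type consistency correction (applied through chi-square CDFs, with coverage m/n) are illustrative assumptions rather than the exact prescription of [9, 10].

```python
import numpy as np
from scipy.stats import chi2

def forward_search_step(X, subset_idx, apply_correction=True):
    """One step of the forward search: estimate location/scatter from the current
    fitting subset, sort all units by squared Mahalanobis distance, grow the subset."""
    S = X[subset_idx]
    n, p = X.shape
    m = subset_idx.size
    mu_hat = S.mean(axis=0)
    Sigma_hat = np.cov(S, rowvar=False)
    if apply_correction:
        # Tallis-type consistency factor: the m retained units roughly correspond
        # to a coverage m/n of the normal mass inside an ellipsoid.
        q = chi2.ppf(m / n, df=p)
        Sigma_hat /= chi2.cdf(q, df=p + 2) / (m / n)
    diff = X - mu_hat
    d2 = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(Sigma_hat), diff)
    order = np.argsort(d2)
    signal = d2[order[m]]        # distance of the (m+1)-th unit: next signal value
    new_subset = order[:m + 1]   # fitting subset for the next step
    return signal, new_subset
```

Iterating this function from a small random initial subset produces the signal trajectory that is then compared with the envelopes discussed in the next subsection.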

2.3.2. Statistical Test of the Signal

Statistical tests are needed to assess the relevance of discontinuities observed in the signal. At each step of the algorithm a new test is actually performed. Specifically, at the generic step, the signal value is computed together with the lowest and highest values which are theoretically admissible for it under the null hypothesis at a given significance level. These values are nothing but suitable percentage points of the signal distribution. Their envelopes generate two curves that surround the signal when plotted versus the step index. More precisely, one curve lies below it and the other lies above, provided the null hypothesis is not violated. The violation of one of the curves by the signal is interpreted as the result of the entrance of an outlier into the fitting subset. Although the distribution of the signal cannot be calculated in closed form, its percentage points are obtained from a general formula, first derived in [11], yielding (11), with the percentage points expressed in terms of those of the Fisher distribution. Equation (11) holds to a good approximation, as confirmed by numerical simulations.

2.3.3. Extension of the Forward Search to CoDA

When the dataset is compositional, it is unnatural to assume that its elements are realizations of a normal multivariate. In this case the use of the FSA as outlined above does not make sense. Sometimes it is reasonable to assume a logistic-normal model, as first argued in [6]. In this case we can use the FSA to find outliers, provided that we first modify the algorithm in two respects:
(i) we replace the Mahalanobis distance by the Aitchison one;
(ii) we perform statistical tests consistently with the change of null hypothesis.
Specifically, at the generic step of the algorithm, we sort the dataset according to the increasing values of the square Aitchison distance (12), where the reference point is the center of the fitting subset. Analogously, given the next element of the newly sorted dataset, we regard its distance as the corresponding component of the signal. The percentage points of the signal are obtained by using the probability correspondence established in (9). Since the fitting subset is a spherically truncated dataset, the estimate of the covariance matrix derived from it must undergo the reconstruction procedure described in the next sections.

2.4. General Covariance Reconstruction Problem

The examples discussed in the previous subsections are special cases of a more general inverse problem, namely, the reconstruction of the covariance matrix of a normal multivariate on the basis of the covariance matrix of the same variate truncated to some (convex) region. This is the simplest nontrivial inverse problem that can be naturally associated with the normal distribution. The spherical case corresponds to a setup where theoretical and practical aspects of the problem can be investigated with relatively modest mathematical effort. It is certainly a well-defined framework in which to study regularization techniques for nonlinear inverse problems in statistics, for which there is still much room for interesting work [12, 13].

3. Definitions and Setup

Let a random vector have a jointly normal distribution in several dimensions. The probability that it falls within the ball is measured by the Gaussian integral (13). Since the covariance matrix is symmetric positive definite, it has orthonormal eigenvectors. Let us denote by the orthogonal matrix having these vectors as columns and by the diagonal matrix of eigenvalues its diagonal counterpart. From the invariance of the ball under rotations, it follows that the probability content depends upon the covariance matrix just by way of its eigenvalues. Accordingly, we rename the Gaussian probability content as in (14). Note that (14) is not sufficient to fully characterize the random vector under the spherical constraint, for which we need to calculate the distribution law as a function of the point. Alternatively, we can describe the vector in terms of the complete set of its truncated moments (15). As usual, these can all be obtained from the moment-generating function (16) by differentiating the latter an arbitrary number of times with respect to the components of its argument; namely, (17). It will be observed that the moment-generating function is in general not invariant under rotations. Therefore, unlike the probability content, the moments depend effectively on both the eigenvalues and the eigenvectors. For instance, for the matrix of the second moments such dependence amounts to (18). By parity, the only nonvanishing terms in the sum of (18) are the diagonal ones. Hence, it follows that the full covariance matrix and the matrix of the truncated second moments share a common diagonalizing matrix. In other words, if the former is diagonalized by a given orthogonal matrix, so is the latter. Moreover, the truncated eigenvalues are related to the full ones by (19). The ratios appearing in (19) are naturally interpreted as adimensional correction factors to the eigenvalues of the full covariance matrix, so they play the same role as the damping factor in (2). However, each correction factor depends explicitly on its subscript; thus each eigenvalue is damped differently from the others as a consequence of the spherical constraint.

Remark 1. In practical terms, (18)-(19) tell us that estimating the covariance matrix from a spherically truncated population of independent units via the classical sample estimator (with the sample mean as location estimate) yields a damped result. Nonetheless, the damping affects only the eigenvalues of the estimator, whereas its eigenvectors are left invariant.
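A quick Monte Carlo check of Remark 1 can be run along the following lines; this is a minimal sketch in which the covariance matrix, square radius, and sample size are arbitrary illustrative choices. The truncated second-moment matrix should exhibit damped eigenvalues but essentially the same eigenvectors as the full covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
Sigma = np.diag([0.5, 1.0, 2.0])             # full covariance in its eigenbasis
Q = np.linalg.qr(rng.normal(size=(3, 3)))[0]
Sigma = Q @ Sigma @ Q.T                      # rotate to a generic frame
rho = 4.0                                    # square radius of the truncation ball

X = rng.multivariate_normal(np.zeros(3), Sigma, size=200_000)
inside = (X ** 2).sum(axis=1) <= rho         # spherical truncation
M = X[inside].T @ X[inside] / inside.sum()   # truncated second-moment matrix

lam_full, V_full = np.linalg.eigh(Sigma)
mu_trunc, V_trunc = np.linalg.eigh(M)
print(mu_trunc / lam_full)                   # componentwise damping factors (< 1)
print(np.abs(V_full.T @ V_trunc))            # close to the identity matrix
```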

3.1. Monotonicity Properties of Ratios of Gaussian Integrals

Equations (14) and (19) suggest introducing a general notation for the Gaussian integrals over the ball, under the assumption of a diagonal covariance matrix. So, we define them as in (20), with each subscript on the left-hand side adding a factor of the corresponding squared component under the integral sign on the right-hand side (no subscripts means no extra factors). Several analytic properties of such integrals are discussed in [14]. In the following proposition, we lay emphasis on some issues concerning the monotonicity trends of the ratios of these integrals.

Proposition 2 (monotonicities). Let a reduced spectrum denote the set of the full eigenvalues without the one singled out by the ratio. The ratios fulfill the following properties:
(1) each ratio is a monotonically increasing function of its own eigenvalue at fixed remaining arguments;
(2) each ratio is a monotonically decreasing function of the other eigenvalues at fixed remaining arguments;
(3) each ratio is a monotonically decreasing function of the remaining argument,
where an innocuous abuse of notation has been made in writing the reduced spectrum in place of the full one.

Proof. In the following, derivatives are taken with respect to the relevant variable. In order to prove property (1), we apply the chain rule of differentiation to the ratio and then pass the derivative under the integral sign in numerator and denominator. In this way, we obtain (21). Moreover, since the truncated marginal density is positive within a set of nonzero measure, the monotonic trend asserted by property (1) is strict.
Properties (2) and (3) are less trivial than (1). Indeed, the same reasoning as above now yields (22) on the one hand and (23) on the other. Despite not being a priori evident, the right-hand sides of both (22) and (23) are negative (and vanish in the limit of infinite radius). The inequalities within Euclidean balls were first discussed in [14], while the inequalities within generalized Orlicz balls have been discussed in [15, 16] for the case where the probability distribution is flat instead of being normal. More recently, a complete proof of both inequalities has been given in [17]. Despite the technical difficulties in proving them, their meaning should be intuitively clear. The variance inequality quantifies the squeezing affecting each component as a consequence of the truncation (in the unconstrained case there would be no squeezing). The covariance inequality follows from the opposition arising among the square components in proximity of the boundary of the ball. Indeed, if one square component is large, the others must be correspondingly small in order for the point to stay within the ball.

3.2. Definition Domain of the Reconstruction Problem

A consequence of Proposition 2 is represented by the following.

Corollary 3. Given the radius, the dimension, and the remaining eigenvalues, each ratio is bounded as in (24), with the bounds expressed in terms of the Kummer confluent hypergeometric function, defined in (25).

Proof. The upper bound of (24) corresponds to the value taken by the ratio in the limit where all the other eigenvalues vanish. This is indeed the maximum possible value allowed according to properties (2) and (3) of Proposition 2. In order to perform this limit, we observe the distributional relation (26), with the symbol on the right-hand side representing the Dirac delta function (the reader who is not familiar with the theory of distributions may refer, for instance, to [18] for an introduction). Accordingly, we arrive at (27). The lower bound corresponds instead to the value taken by the ratio as all the other eigenvalues diverge while the remaining one is kept fixed. In this limit, all the Gaussian factors in the probability density function except one flatten to one. Hence, we obtain (28). Numerator and denominator of the rightmost ratio in (28) are easily recognized to be integral representations of Kummer functions (see, e.g., [19, ch. 13]).

The upper bound of (24) can be sharpened, as clarified by the following.

Proposition 4 (bounds on the truncated moments). Let , , and be given. If is a permutation of such that , then the following upper bounds hold:

Proof. The overall upper bound on the sum of truncated moments follows from (30). At the same time, the sum can be split and bounded from below as in (31). The single upper bounds on the truncated moments are then obtained from (30)-(31). It will be noted that (29) is sharper than the upper bound of (24) only in a limited portion of the parameter space.

From now on, we shall assume, with no loss of generality, that the eigenvalues of are increasingly ordered, namely, (we can always permute the labels of the coordinate axes, so as to let this be the case). An important aspect related to the eigenvalue ordering is provided by the following.

Proposition 5 (eigenvalue ordering). Let , , and be given. If , then holds as well.

Proof. In order to show that the spherical truncation does not violate the eigenvalue ordering, we make repeated use of the monotonicity properties of Proposition 2. Specifically, if two full eigenvalues are ordered, then the chain of inequalities (32) holds, where the symbol above each inequality sign is used to explain where that inequality comes from and the “exchange symmetry” refers to a formal symmetry property of the one-index Gaussian integrals over the ball under exchange of their arguments.

Let us now focus on (19). These equations have to be solved in order to reconstruct the full spectrum from the truncated one. Formally, if we introduce a family of truncation operators (parametrically depending on the radius), as in (33), then the reconstruction amounts to calculating the inverse operator. One should be aware that the truncation operator is not surjective, in view of Corollary 3 and Proposition 4. Therefore, the inverse is only defined within a bounded domain. If we define this domain as in (34), then it is invariant under permutations of the components. From Proposition 4 we conclude that it is contained in the region (35). In fact, there are vectors fulfilling the bounds of (35) without being images of any full spectrum; thus we conclude that the domain is a proper subset of that region. Numerical experiences based on the techniques discussed in the next sections show indeed the strict inclusions (36). A graphical representation of (36) in low dimensions is depicted in Figure 2. The reader should note that until Section 7 we shall always assume that the input spectrum comes from the application of the truncation operator to some full spectrum; thus it belongs to the domain by construction.

Now, we observe that (19) can be written in the equivalent form (37). Since the truncated spectrum and the radius are (nonindependent) input parameters for the covariance reconstruction problem (and in order to keep the notation light), in the sequel we shall leave the dependence of the map upon them implicitly understood; that is, we shall write (37) in a shorthand fixed point form. Hence, we see that the full eigenvalue spectrum is a fixed point of this operator. This suggests obtaining it as the limit of the iteratively defined sequence (39), provided that this can be shown to converge. Note that since the correction factors never exceed one, the image of any spectrum under the map dominates the truncated spectrum, so the sequence is bounded from below by the truncated spectrum itself. In particular, this holds for the first iterate. Therefore, the sequence moves in the right direction at least at the beginning. A formal proof of convergence, based on the monotonicity properties stated by Proposition 2, is given in the next section.
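As an illustration, the following minimal sketch implements the iteration just described. The Gaussian integrals over the ball are estimated here by plain Monte Carlo, as a placeholder for the controlled chi-square expansions of Section 5, and all function names, sample sizes, and tolerances are illustrative assumptions rather than the paper's actual implementation.

```python
import numpy as np

def damping_factors(lam, rho, n_samples=200_000, rng=np.random.default_rng(1)):
    """Monte Carlo estimate of the correction factors alpha_k appearing in (19):
    alpha_k = E[x_k^2 | ball] / lam_k for x ~ N(0, diag(lam)) truncated to ||x||^2 <= rho.
    A placeholder for the Ruben-based integrals of Section 5."""
    x = rng.normal(size=(n_samples, lam.size)) * np.sqrt(lam)
    inside = (x ** 2).sum(axis=1) <= rho
    return (x[inside] ** 2).mean(axis=0) / lam

def reconstruct_gj(mu, rho, tol=1e-4, max_iter=500):
    """Gauss-Jacobi fixed point: start from the truncated spectrum mu and iterate
    lam <- mu / alpha(lam) until componentwise relative convergence."""
    lam = mu.copy()
    for _ in range(max_iter):
        lam_new = mu / damping_factors(lam, rho)
        if np.max(np.abs(lam_new - lam) / lam_new) < tol:
            return lam_new
        lam = lam_new
    return lam
```

Because the factors are estimated stochastically here, the attainable tolerance is limited by Monte Carlo noise; the deterministic expansions of Section 5 remove this limitation.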

4. Convergence of the Fixed Point Equation

We split our argument into three propositions, describing different properties of the sequence. They assert, respectively, that (i) the sequence is componentwise monotonically increasing; (ii) the sequence is componentwise bounded from above by any fixed point of the map; and (iii) if the map has a fixed point, this must be unique. Statements (i) and (ii) are sufficient to guarantee the convergence of the sequence to a finite limit (the unconstrained spectrum is a fixed point of the map). In addition, the limit is easily recognized to be a fixed point itself. Hence, statement (iii) guarantees that the sequence converges to the unconstrained eigenvalue spectrum. We remark that all the monotonicities discussed in Proposition 2 are strict; that is, the ratios have no stationary points at finite eigenvalues and radius, which is crucial for the proof.

Proposition 6 (increasing monotonicity). Given the input data, the sequence defined by (39) is monotonically increasing; namely, each iterate dominates the previous one componentwise.

Proof. The proof is by induction. We first notice that the base case holds; the inequality follows from the observation, made at the end of Section 3, that the sequence is bounded from below by its starting point. Suppose now that the property of increasing monotonicity has been verified up to the nth element of the sequence. Then it propagates to the next element; the inequality follows this time from the inductive hypothesis and from properties (2) and (3) of Proposition 2.

Proposition 7 (boundedness). Given the input data, the sequence defined by (39) is bounded from above, componentwise, by any fixed point of the map.

Proof. We proceed again by induction. We first notice that the base case holds; the inequality follows as previously from the fact that the starting point of the sequence does not exceed any fixed point. Suppose now that the property of boundedness has been verified up to the nth element of the sequence. Then it propagates to the next element; the inequality follows for the last time from the inductive hypothesis and from properties (2) and (3) of Proposition 2.

According to Propositions 6 and 7, the sequence converges. Now, let us consider its limit. Effortlessly, we prove that the limit is a fixed point of the map: it suffices to take the limit on both sides of (39). Note that passing the limit under the integral sign is certainly allowed for Gaussian integrals.

Proposition 8 (uniqueness of the fixed point). Let and be two fixed points of , corresponding to the same choice of , , and . Then, it must be that .

Proof. According to the hypothesis, the two fixed points fulfill the corresponding fixed point equations. Hence, their difference satisfies a linear homogeneous relation whose coefficient matrix involves the Jacobian matrix of the map evaluated at intermediate points. It will be noted that this Jacobian is essentially the covariance matrix of the square components under spherical truncation (we have come across its matrix elements in (21)–(23)). As such, it is symmetric positive definite. Indeed, its associated quadratic form is the variance of a linear combination of the square components, which vanishes only for the null vector. Moreover, the eigenvalues of the coefficient matrix fulfill a secular equation from which it follows that this matrix is positive definite as well (though it is not symmetric). Since the sum of positive definite matrices is positive definite, we conclude that the full coefficient matrix is positive definite too. As such, it is nonsingular. Therefore, from (47), we conclude that the difference of the two fixed points vanishes; that is, the fixed point is unique.

5. Numerical Computation of Gaussian Integrals over the Ball

Let us now see how to compute the Gaussian integrals over the ball with controlled precision. Most of the relevant work was originally done by Ruben in [2], where the plain probability content (no extra powers of the integration variable) is discussed. We extend Ruben’s technique to Gaussian integrals containing powers of the integration variable. Specifically, it is shown in [2] that the probability content can be represented as a series of chi-square cumulative distribution functions, (51). The scale factor appearing there has the same physical dimension as the eigenvalues and the square radius. It is introduced in order to factorize the dependence of the series upon these quantities at each order of the expansion. The series on the right-hand side of (51) converges uniformly on every finite interval of the radius. The coefficients are given by (52), having defined the uniform average operator on the unit sphere as in (53). Unfortunately, (52) is not particularly convenient for numerical computations, since it is only given in integral form. However, it is also shown in [2] that the coefficients can be extracted from the Taylor expansion (at the origin) of the generating function (54). This series converges uniformly in a neighbourhood of the origin. On evaluating the derivatives of the generating function, it is then shown that the coefficients fulfill the recursion (55). Finally, the systematic error produced on considering only the lowest terms of the chi-square series of (51) is estimated by (56).
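To make the construction concrete, here is a minimal Python sketch of the central-case chi-square series (51) with the recursion (55) for the plain probability content only (the one- and two-index extensions of Theorem 9 below are not covered). The choice of the scale factor equal to the smallest eigenvalue and the stopping rule based on the unassigned coefficient mass (which bounds the residual, in the spirit of (56), when the scale factor does not exceed the smallest eigenvalue) are illustrative assumptions.

```python
import numpy as np
from scipy.stats import chi2

def ruben_probability_content(lam, rho, beta=None, eps=1e-12, max_terms=10_000):
    """P(sum_k lam_k z_k^2 <= rho) for independent standard normals z_k,
    via Ruben's series of central chi-square cdfs (central case)."""
    lam = np.asarray(lam, dtype=float)
    v = lam.size
    if beta is None:
        beta = lam.min()                        # any 0 < beta <= min(lam) is admissible
    q = 1.0 - beta / lam                        # ingredients of the recursion
    c = [float(np.prod(np.sqrt(beta / lam)))]   # c_0
    g = [None]                                  # g_0 is never used
    total = c[0]
    result = c[0] * chi2.cdf(rho / beta, df=v)
    n = 0
    while 1.0 - total > eps and n < max_terms:  # leftover coefficient mass bounds the error
        n += 1
        g.append(float(np.sum(q ** n)))
        c.append(sum(g[n - r] * c[r] for r in range(n)) / (2.0 * n))
        total += c[n]
        result += c[n] * chi2.cdf(rho / beta, df=v + 2 * n)
    return result
```

As a sanity check, for equal eigenvalues the series collapses to its first term and reproduces a single chi-square CDF, as it should.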

Now, as mentioned, it is possible to extend the above expansion to all Gaussian integrals . Here, we are interested only in and , since these are needed in order to implement the fixed point iteration and to compute the Jacobian matrix of . The extension is provided by the following.

Theorem 9 (Ruben’s expansions). The one- and two-index integrals admit the series representations (57)-(58), with an arbitrary positive constant playing the role of the scale factor. The series coefficients are given, respectively, by (59)-(60), with the Kronecker symbol appearing in the two-index case. The series on the right-hand side of (57)-(58) converge uniformly on every finite interval of the radius. The functions (61)-(63) are generating functions for the respective families of coefficients; that is, they fulfill (64) in a neighbourhood of the origin. Finally, the coefficients can be obtained iteratively from the recursions (65)-(66), where the auxiliary coefficients are defined by (67)-(68).

It is not difficult to further generalize this theorem, so as to provide a chi-square expansion for any Gaussian integral . The proof follows closely the original one given by Ruben. We reproduce it in the appendix for , just to highlight the differences arising when the Gaussian integral contains powers of the integration variable.

Analogously to (56), it is possible to estimate the systematic error produced when considering only the lowest terms of the chi-square series of the one- and two-index integrals. Specifically, we find analogous upper bounds on the residual sums. In order to evaluate all Ruben series with controlled uncertainty, we first set the scale factor (see once more [2] for an exhaustive discussion on how to choose it); then we choose a unique threshold representing the maximum tolerable systematic error, roughly corresponding, for example, to double floating-point precision, for all the integrals involved; and finally for each integral we compute the smallest integer providing the minimum number of chi-square terms for which the upper bound to the residual sum lies below the threshold. Of course, this procedure overshoots the minimum number of terms really required for the actual residuals to lie below the threshold, since we operate on their upper bounds instead. Nevertheless, the computational overhead is acceptable, as will be shown in the next section. For the sake of completeness, it must be said that typically the required numbers of terms for the three types of integrals at the same eigenvalues and radius are not much different from each other.

To conclude, we notice that the required number of terms depends nontrivially upon the eigenvalues. By contrast, since the residual bound is monotonically increasing in the radius, we clearly see that the required number of terms is monotonically increasing in the radius as well. Now, should one evaluate the integrals for a given spectrum at several values of the radius, it is advisable to save computing resources and work out Ruben coefficients just once, up to the order corresponding to the largest radius, since that order dominates all the others. We made use of this trick throughout our numerical experiences, as reported in the sequel.
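The trick just described can be phrased as a small sketch, assuming the coefficients have already been computed (for instance by a routine like the one sketched earlier in this section) up to the order required by the largest radius of interest; the series is then re-summed at every radius without recomputing the coefficients.

```python
from scipy.stats import chi2

def ruben_series_at(rho_values, v, beta, coeffs):
    """Evaluate Ruben's series at several radii reusing precomputed coefficients,
    which depend on the spectrum and the scale factor beta but not on the radius."""
    return [sum(c * chi2.cdf(rho / beta, df=v + 2 * n) for n, c in enumerate(coeffs))
            for rho in rho_values]
```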

6. Numerical Analysis of the Reconstruction Process

The fixed point iteration (39) represents the simplest iterative scheme that can be used in order to reconstruct the solution. In the literature of numerical methods, this scheme is known as a nonlinear Gauss–Jacobi (GJ) iteration (see, e.g., [20]); accordingly, we shall refer to it with the GJ label in the sequel. As we have seen, the sequence converges with no exception, provided the input spectrum belongs to the domain of the inverse operator. The number of steps needed for an approximate convergence with given relative precision, that is, for the componentwise relative change between successive iterates to fall below a given tolerance, depends not only upon the tolerance, but also on the radius and the full spectrum (note that the stopping rule is well conditioned, since the iterates are bounded away from zero). In order to characterize statistically the convergence rate of the reconstruction process, we must integrate out the fluctuations due to changes of the spectrum; that is, we must average the number of steps by letting the spectrum fluctuate across its own probability space. In this way, we obtain an average iteration count, which better synthesizes the cost of the reconstruction for given dimension and radius. It should be evident that carrying out this idea analytically is hard, for on the one hand the number of steps depends upon the spectrum nonlinearly and on the other hand the spectrum has a complicated distribution, as we briefly explain below.

6.1. Choice of the Eigenvalue Ensemble

Since the full spectrum is the eigenvalue spectrum of a full covariance matrix, it is reasonable to assume its distribution to be a Wishart one, for some scale matrix and some number of degrees of freedom. In the sequel, we shall make the ideal assumption of an identity scale matrix, so that the probability measure of the Wishart matrix is given by (72) (see, e.g., [21]). Under this assumption, the probability measure of the spectrum is obtained by performing a change of variable in (72). Unfortunately, we have no analytic representation of the map from matrix to spectrum. Thus, we have no expression for the distribution of the truncated spectrum either. However, samples can be extracted numerically as follows (a sketch is given below):
(i) generate a Wishart matrix randomly by means of the Bartlett decomposition [22];
(ii) take its ordered eigenvalue spectrum;
(iii) obtain the truncated spectrum by applying the truncation operator to it.
Note that since the truncation operator is only defined within a bounded domain, we need to rescale the radius as the number of degrees of freedom increases. The simplest choice is to keep their ratio fixed. The larger this ratio, the closer the spectrum fluctuates around its expected value. In view of this, large values of the ratio are to be avoided, since they reduce the probability of testing the fixed point iteration on eigenvalue spectra characterized by large condition numbers. For this reason, we have fixed the ratio to a moderate value in our numerical study.
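A minimal sketch of this recipe follows. The Bartlett construction assumes an identity scale matrix and at least as many degrees of freedom as dimensions, and the truncation step is delegated to a user-supplied map (for instance one built on the Ruben-based integrals of Section 5); the function names are illustrative.

```python
import numpy as np

def wishart_bartlett(v, n, rng=np.random.default_rng(2)):
    """Draw a v x v Wishart matrix with identity scale and n >= v degrees of freedom
    via the Bartlett decomposition: W = A A^T with A lower triangular."""
    A = np.zeros((v, v))
    for i in range(v):
        A[i, i] = np.sqrt(rng.chisquare(n - i))  # sqrt of a chi-square(n - i) variate
        A[i, :i] = rng.normal(size=i)            # standard normals below the diagonal
    return A @ A.T

def truncated_spectrum_sample(v, n, rho, truncate):
    """One draw of (full spectrum, truncated spectrum); `truncate` maps lam -> mu."""
    lam = np.sort(np.linalg.eigvalsh(wishart_bartlett(v, n)))
    return lam, truncate(lam, rho)
```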

Having specified an ensemble of matrices from which the eigenvalue spectra are extracted, we are now ready to perform numerical simulations. To begin with, we report in Figure 3 the marginal probability density function of the ordered eigenvalues and their truncated counterparts for the Wishart ensemble, as obtained numerically from a rather large sample of matrices (10^6 units). It will be noted that (i) the effect of the truncation is severe on the largest eigenvalues, as a consequence of the analytic bounds of Corollary 3 and Proposition 4; (ii) while the skewness of the lowest truncated eigenvalues is negative, it becomes positive for the largest ones. This is due to a change in the relative effectiveness of the bounds in (29).

6.2. Choice of the Simulation Parameters

In order to explore the dependence of the convergence rate upon the truncation radius, we need to choose one or more simulation points for the latter. Ideally, it is possible to identify three different regimes in our problem: the strong truncation regime, the crossover, and the weak truncation regime. We cover all of them with the set of points (73), defined in terms of the eigenvalue mode. In principle, it is possible to determine the eigenvalue modes with high accuracy by using analytic representations of the marginal probability densities of the ordered eigenvalues [24]. In practice, the latter become computationally demanding at increasingly large dimensions: for instance, the determination of the probability density of the largest eigenvalue requires a number of sums which is infeasible even at moderate dimensions. Moreover, for our aims, it is sufficient to choose approximate values, provided that these lie not far from the exact ones. Accordingly, we have determined the eigenvalue modes numerically from samples of Wishart matrices. Our estimates are reported in Table 1. They have been obtained from Grenander’s estimator [23], (74), with properly chosen parameters.

We are now in a position to investigate numerically how many terms of Ruben’s expansions must be considered for our choice of the eigenvalue ensemble, with the simulation points set as in Table 1. As an example, we report in Figure 4 the discrete distributions of the required number of terms for the basic Gaussian integral at the largest dimension that we have simulated. As expected, we observe an increase of the required number of terms with the radius. Nevertheless, we see that the number of Ruben’s components to be taken into account for a double precision result remains altogether modest even in the weak truncation regime, which proves the practical usefulness of the chi-square expansions.

6.3. Fixed Point Iteration at Work

The GJ iteration is too slow to be of practical interest. For instance, in the strong truncation regime, with a stopping tolerance corresponding to a reconstruction with single floating-point precision, it is rather easy to extract realizations of the spectrum which require a huge number of iterations to converge. An improvement of the GJ scheme is achieved via overrelaxation (GJOR), as in (75). Evidently, at unit relaxation factor, the GJOR scheme coincides with the standard GJ one. The optimal value of the relaxation factor is not obvious even in the linear Jacobi scheme, where it depends upon the properties of the coefficient matrix of the system. For instance, if the latter is symmetric positive definite, it is demonstrated that the best choice is expressed in terms of the spectral radius of the Jacobi iteration matrix [25]. In our numerical tests with the GJOR scheme, we found empirically that the optimal value of the relaxation factor is close to the linear prediction, provided that the relevant matrix is replaced by its analogue defined as in Section 3.

By contrast, the iteration diverges after a few steps with increasing probability in part of the parameter range if the relaxation factor is kept fixed at that value; in order to restore the convergence, the relaxation factor must be lowered as that limit is approached.
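For completeness, a minimal sketch of the overrelaxed update, layered on the Gauss–Jacobi sketch of Section 3 (it reuses the hypothetical damping_factors helper defined there), is given below; the default relaxation factor is an arbitrary illustrative value.

```python
import numpy as np

def reconstruct_gjor(mu, rho, omega=1.5, tol=1e-4, max_iter=500):
    """Overrelaxed Gauss-Jacobi (GJOR): lam <- lam + omega * (T(lam) - lam),
    where T is the plain fixed point map; omega = 1 recovers the GJ scheme."""
    lam = mu.copy()
    for _ in range(max_iter):
        t = mu / damping_factors(lam, rho)   # T(lam), as in the GJ sketch above
        lam_new = lam + omega * (t - lam)
        if np.max(np.abs(lam_new - lam) / np.abs(lam_new)) < tol:
            return lam_new
        lam = lam_new
    return lam
```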

To give an idea of the convergence rate of the GJOR scheme, we show in Figure 5(a) a joint box-plot of the distributions of the iteration count at selected simulation points. From the plot we observe that the distribution shifts rightwards as the truncation radius decreases: clearly, the reconstruction is faster in the weak truncation regime (where the truncated spectrum is closer to the full one), whereas it takes more iterations in the strong truncation regime. The dependence of the average iteration count upon the radius, systematically displayed in Figure 6, is compatible with the scaling law (76), apart from small corrections occurring at large radius. Equation (76) tells us that the iteration count increases polynomially in the inverse radius at fixed dimension. In order to estimate the parameters of the scaling law in the strong truncation regime (where the algorithm becomes challenging), we performed jackknife fits to (76) of data points at the smallest radii. Results are collected in Figure 5(b), showing that one parameter is roughly constant, while the other increases almost linearly in the dimension. Thus, while the cost of the eigenvalue reconstruction is only polynomial in the inverse radius at fixed dimension, it is exponential in the dimension at fixed radius. The scaling law of the GJOR scheme is therefore better represented by an exponential of the dimension divided by a power of the radius, with a normalization constant independent of both. Although the GJOR scheme improves on the GJ one, the iteration still proves inefficient in a parameter subspace which is critical for applications.

6.4. Boosting the GJOR Scheme

A further improvement can be obtained by letting the relaxation factor depend on the eigenvalue index in the GJOR scheme. Let us discuss how to work out such an adjustment. When commenting on Figure 3, we have already noticed that the largest eigenvalues are affected by the truncation to a larger extent than the smallest ones. Therefore, they must perform a longer run through the fixed point iteration in order to converge to the untruncated values. This is a possible qualitative explanation for the slowing down of the algorithm as the dimension grows. In view of it, we expect to observe some acceleration of the convergence rate if the relaxation factor is replaced, for instance, by the component-dependent choice (77). The trivial choice of the new parameter corresponds obviously to the standard GJOR scheme. Any other choice amplifies the relaxation of the higher components. Therefore, the new scheme is also expected to display a higher rate of failures than the GJOR one in part of the parameter range, for the reason explained in the previous subsection. The componentwise overrelaxation proposed in (77) is only meant to enhance the convergence speed in the strong truncation regime and in the crossover, where the improvement is actually needed.
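As an illustration of the idea (not the actual prescription (77), which is not reproduced here), one could let the relaxation factor grow linearly with the eigenvalue index, so that the higher, more strongly damped components are overrelaxed more aggressively; the linear ramp and its strength parameter below are assumptions.

```python
import numpy as np

def omega_profile(v, omega, delta):
    """Hypothetical component-wise relaxation: omega_k increases with the index k.
    delta = 0 recovers a uniform GJOR relaxation factor."""
    k = np.arange(v)
    return omega * (1.0 + delta * k / max(v - 1, 1))
```

In the GJOR update one would then use the componentwise product omega_profile(v, omega, delta) * (t - lam) in place of the scalar relaxation term.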

In order to confirm this picture, we have explored systematically the effect of the boosting parameter on the convergence rate by simulating the reconstruction process with the parameter varying from 0 to 2 in small steps. First of all, we have observed that the rate of failures is fairly reduced if the first iterations are run with the boosting switched off, and only afterwards is it switched on. Having minimized the failures, we have checked that, for each value of the boosting parameter, the scaling law assumed in (76) is effectively fulfilled. Then, we have computed jackknife estimates of the scaling parameters. These are plotted in Figure 7. Each trajectory (represented by a dashed curve) corresponds to a given value of the boosting parameter; those with darker markers refer to smaller values and the other way round. From the plots we notice that:
(i) all the trajectories with a nonzero boosting parameter lie below the one without boosting;
(ii) one set of trajectories displays a clear increasing trend, yet their slope lessens as the boosting parameter increases; by contrast, the other set develops a mild increasing trend as the boosting parameter increases, though this is not strictly monotonic;
(iii) both sets of trajectories seem to converge to a limit trajectory as the boosting parameter increases; we observe a saturation phenomenon, which thwarts the benefit of increasing the parameter beyond a certain threshold.
We add that pushing the parameter too far is counterproductive, as the rate of failures becomes increasingly relevant in the crossover and eventually also in the strong truncation regime. By contrast, below that threshold the rate of failures remains very low for essentially all simulated points.

Our numerical results signal a strong reduction of the slowing down of the convergence rate. Indeed, (i) means qualitatively that both scaling parameters are reduced as the boosting parameter increases. (ii) means that the parameter mainly responsible for the exponential slowing down with the dimension is reduced as the boosting parameter increases (this is the most important effect). The appearance of a slope in the other set of trajectories indicates that a mild exponential slowing down also develops in the denominator of the scaling law, but it is anyway milder than without boosting. Finally, (iii) means that choosing the boosting parameter beyond the saturation threshold has a minor impact on the performance of the algorithm. In Figure 8, we report a plot of the slope parameter (obtained from least-squares fits of data to a linear model) as a function of the boosting parameter. This quantifies the maximum exponential speedup of the convergence rate which can be achieved by our proposal. When the boosting parameter is close to the saturation threshold, the speedup amounts to a few hundred at the largest simulated dimension.

7. On the Ill-Posedness of the Reconstruction in Sample Space

So far we have discussed the covariance reconstruction under the assumption that represents the exact truncated counterpart of some and we have looked at the algorithmic properties of the iteration schemes which operatively define . Such analysis is essential in order to characterize mathematically; yet it is not sufficient in real situations, specifically when is perturbed by statistical noise.

In this section, we examine the difficulties arising when performing the covariance reconstruction in sample space. We first recall that, according to Hadamard [26], a mathematical problem is well posed provided that the following conditions are fulfilled:
(H1) a solution to the problem always exists;
(H2) the solution is unique;
(H3) the solution depends smoothly on the input data.
Inverse problems are often characterized by the violation of one or more of them; see, for instance, [13]. In such cases, the standard practice consists in regularizing the inverse operator, that is, in replacing it by a stable approximation. With regard to our problem, the reader will recognize that (H1) is violated (and the problem becomes ill-posed) as soon as the space of the input data is allowed to be a superset of the domain of the inverse operator: after clarifying how the truncated covariance matrix is concretely estimated in the applications (Sections 7.1 and 7.2), we propose a perturbative regularization of the inverse operator, which effectively improves the fulfillment of (H1) (Section 7.3). By contrast, Proposition 8 guarantees that whenever a solution exists, it is also unique; thus (H2) is never of concern. Finally, the fulfillment of (H3) depends on how the statistical noise on the input is nonlinearly inflated by the action of the inverse operator. For the sake of conciseness, in the present paper we just sketch the main ideas underlying the perturbative regularization, whereas a technical implementation of it and a discussion of (H3) are deferred to a separate paper [27].

7.1. Definition of the Sample Truncated Covariance Matrix

The examples of Section 2 assume that (i) spherical truncations are operated on a representative sample of finite size, (ii) the truncation radius is known exactly, and (iii) the input budget for the covariance reconstruction is given by the truncated subset defined in (78). As usual in the analysis of stochastic variables in sample space, we assume that the observations are realizations of i.i.d. stochastic variables. Thus, the size of the truncated subset is itself a stochastic variable in sample space, where it reads as in (79), with the characteristic function of the ball appearing under the sum. It is easily recognized that this size is a binomial variate. If we denote the sample expectation operator by the integral with respect to the product measure of the joint variables, then a standard calculation yields its mean and variance, (80). Hence, we see that the relative dispersion of the truncated sample size vanishes as the sample grows. Now, the simplest way to measure the mean and the truncated second moments from the full and truncated samples, respectively, is via the classical estimators (81)-(82). We define the sample estimates of the full and truncated spectra as the eigenvalue spectra of the corresponding estimators. By symmetry arguments we see that the sample mean is unbiased. Indeed, it holds (83). The right-hand side of (83) only makes sense if we conventionally define the integrand to be zero in the subdomain where the truncated sample is empty or, equivalently, if we interpret the expectation as a conditional one (the conditioning event occurs almost surely only in the large sample limit). Since the sample measure is even under a global sign flip of the observations while the integrand is odd, we immediately conclude that the estimator is unbiased.

7.2. Bias of the Sample Truncated Covariance Matrix

The situation gets somewhat less trivial for the truncated covariance estimator: the normalization factor, which has been chosen in analogy with (81), is not sufficient to remove completely the bias at finite sample size, though we aim at showing here that the residual bias is exponentially small and asymptotically vanishing. In order to see this, we observe that the sample expectation of the estimator can be written as in (84), with the expectation corresponding to a multinormal measure with diagonal covariance matrix, conditioned to the ball. Having diagonalized the product measure, we observe that the integrand on the right-hand side is odd in the off-diagonal entries and even in the diagonal ones under a joint sign flip of each component across the sample, similarly to what we did in Section 2. As a consequence, (85) holds, whence we infer that the expectation of the estimator is diagonalized by the same matrix as the full covariance. From (85) we also conclude that the bias concerns only the eigenvalues. It should be observed that the eigenvalue bias is in general not simply related to the matrix bias, since the computation of the sample spectrum requires the diagonalization of the estimator itself, which is in general performed by a different diagonalizing matrix. Nevertheless, if the matrix bias vanishes, then the spectral bias vanishes too. Now, we observe that the expectation splits into three contributions, which can be exactly calculated and expressed in terms of the Gaussian integrals over the ball and the probability content of the truncation. Computing each contribution and putting them together, one finds that the residual bias is suppressed by a factor vanishing exponentially fast with the sample size. Thus, we conclude that the truncated covariance estimator is asymptotically unbiased.

A discussion of the variance of the sample truncated covariance matrix is beyond the scope of the present paper. We just observe that, apart from the above calculation, studying the sample properties of the truncated spectrum is made hard by the fact that eigenvalues and eigenvectors of a diagonalizable matrix are intimately related from their very definition; thus such study would require a careful consideration of the distribution of the sample diagonalizing matrix of .

7.3. Perturbative Regularization

When the truncated spectrum is critically close to the internal boundary of the domain of the inverse operator, a sample estimate may fall outside of it due to statistical fluctuations. In that case the iterative procedure described in the previous sections diverges. On the quantitative side, the ill-posedness of the reconstruction problem is measured by a suitably defined failure probability, which is a highly nontrivial function of the radius, the spectrum, and the sample size. An illustrative example of it is reported in Figure 9(a), which refers to a specific case. The plot suggests that the iterative procedure becomes severely ill-posed in the regime of strong truncation.

In order to regularize the problem, we propose to go back to (19) and consider it from a different perspective. Specifically, we start from the observation that a simplified framework occurs in the special circumstance when the eigenvalue spectra are fully degenerate, which is essentially equivalent to the setup of [1]. If the full spectrum is degenerate, by symmetry arguments it follows that the truncated spectrum is degenerate as well, and the other way round. Equation (19) reduces in this limit to the scalar relation (92). It can be easily checked that the right-hand side of (92) is monotonically increasing in the unknown. In addition, its large-variance limit (93) holds; thus (92) can surely be (numerically) inverted provided that the input lies below this limit. We can regard (92) as an approximation to the original problem (19). When the truncated spectrum is not degenerate, we must define a scalar input in terms of its components. One possibility is to average them, that is, to choose (94). Subject to this, we expect the reconstructed scalar to lie somewhere between the smallest and the largest full eigenvalues. Equation (92) can be thought of as the lowest order approximation of a perturbative expansion of (19) around the degenerate point. If the condition number of the full covariance matrix is not extremely large, such an expansion is expected to converge quickly, so that a few perturbative corrections should be sufficient to guarantee a good level of approximation.
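The lowest order inversion can be sketched as follows. The forward map is Tallis' scalar relation written through chi-square CDFs (which is what (92) reduces to in the fully degenerate, isotropic case), and the input is taken as the average of the truncated spectrum, as in (94); the bracketing interval passed to the root finder is an illustrative assumption, and a root exists only if the input lies below the large-variance limit of the forward map, which equals the square radius divided by the number of dimensions plus two.

```python
import numpy as np
from scipy.stats import chi2
from scipy.optimize import brentq

def degenerate_forward(lam_bar, rho, v):
    """Tallis' scalar relation: truncated variance per component for an isotropic
    normal with variance lam_bar, truncated to the ball of square radius rho."""
    t = rho / lam_bar
    return lam_bar * chi2.cdf(t, df=v + 2) / chi2.cdf(t, df=v)

def degenerate_inverse(mu_bar, rho, v):
    """Lowest order (zeroth) perturbative reconstruction: invert the scalar relation.
    Requires mu_bar < rho / (v + 2); the bracket below is illustrative and may need
    widening when mu_bar is very close to that boundary."""
    f = lambda lam_bar: degenerate_forward(lam_bar, rho, v) - mu_bar
    return brentq(f, 1e-12, 1e8 * rho)

# Typical use: mu_bar = mu.mean() for a non-degenerate truncated spectrum mu.
```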

As mentioned above, a technical implementation of the perturbative approach and a thorough discussion of its properties are deferred to a separate paper [27]. Here, we limit ourselves to observing that the definition domain of perturbation theory is ultimately set by its lowest order approximation, since corrections to (92) are all algebraically built in terms of it, with no additional constraints. Following (94), the domain of the lowest order reconstruction comes to be defined as in (95), and it is clear that it contains the domain of the exact inverse operator (it is sufficient to sum term by term all the inequalities contributing to (36)). In Figure 10, we show the set difference between the two domains in low dimensions. When the exact truncated spectrum lies inside its domain but its sample estimate falls outside, the estimate may well still fall within the larger perturbative domain; that is, the set difference acts as an absorbing shield against the statistical noise. Therefore, if we define the failure probability of the perturbative reconstruction as in (96), we expect it to be generously dominated by the failure probability of the exact iterative reconstruction. An example is given in Figure 9(b): we see that the perturbative failure probability becomes lower than the iterative one by orders of magnitude as soon as the sample size and the radius are not exceedingly small. In this sense, the lowest order operator can be regarded as the first approximation of a regularizing operator for the reconstruction.

8. Conclusions

In this paper we have studied how to reconstruct the covariance matrix of a normal multivariate from the matrix of the spherically truncated second moments, describing the covariances among the components of when the probability density is cut off outside a centered Euclidean ball. We have shown that and share the same eigenvectors. Therefore, the problem amounts to relating the eigenvalues of to those of . Such relation entails the inversion of a system of nonlinear integral equations, which admits unfortunately no closed-form solution. Having found a necessary condition for the invertibility of the system, we have shown that the eigenvalue reconstruction can be achieved numerically via a converging fixed point iteration. In order to prove the convergence, we rely ultimately upon some probability inequalities, known in the literature as square correlation inequalities, which have been recently proved in [17].

In order to explore the convergence rate of the fixed point iteration, we have implemented some variations of the nonlinear Gauss–Jacobi scheme. Specifically, we have found that overrelaxing the basic iteration enhances the convergence rate by a moderate factor. However, the overrelaxed algorithm still slows down exponentially in the number of eigenvalues and polynomially in the truncation radius of the Euclidean ball. We have shown that a significant reduction of the slowing down can be achieved in the regime of strong truncation by adapting the relaxation parameter to the eigenvalue it is naturally associated with, so as to boost the higher components of the spectrum.

We have also discussed how the iterative procedure works when the eigenvalue reconstruction is performed on sample estimates of the truncated covariance spectrum. Specifically, we have shown that the statistical fluctuations make the problem ill-posed. We have sketched a possible way out based on perturbation theory, which is thoroughly discussed in a separate paper [27].

A concrete implementation of the proposed approach requires the computation of a set of multivariate Gaussian integrals over the Euclidean ball. For this, we have extended to the case of interest a technique, originally proposed by Ruben for representing the probability content of quadratic forms of normal variables as a series of chi-square distributions. In the paper, we have shown the practical feasibility of the series expansion for the integrals involved in our computations.

Appendix

Proof of Theorem 9. As already mentioned in Section 5, the proof follows along the lines of the original one in [2]. We detail the relevant steps for the one-index integral, while for the two-index integrals we only explain why it is necessary to distinguish between equal and different indices and the consequences in either case.
In order to prove (57), we first express the integral in spherical coordinates; that is, we perform a change of variable to radial and angular coordinates (with the angular part of the spherical Jacobian and the differentials of the angles collected in a surface measure); then we insert a convenient factor under the integral sign. Hence, the integral reads as in (A.1). The next step consists in expanding the inner exponential in a Taylor series (in his original proof, Ruben considers a more general setup, with the center of the Euclidean ball shifted by a vector from the center of the distribution; in that case, the Gaussian exponential looks different and must be expanded in a series of Hermite polynomials; here, we work in a simplified setup, where the Hermite expansion reduces to Taylor's); namely, we arrive at (A.2). This series converges uniformly on the integration domain. We review the estimate, (A.3), just for the sake of completeness. It follows that we can integrate the series term by term. With the help of the uniform average operator introduced in (53), the integral is recast into (A.4). The presence of an additional square factor in the angular average is harmless, since it is bounded on the sphere. We finally notice that the radial integral can be expressed in terms of a cumulative chi-square distribution function upon a rescaling of the radial variable; namely, (A.5) holds. Inserting (A.5) into (A.4) results in Ruben's representation of the one-index integral. This completes the first part of the proof.
As a next step, we wish to demonstrate that the function of (61) is the generating function of the corresponding coefficients. To this aim, we first recall the integral identities (A.6), valid in a suitable range of the parameter. On inserting them, we see that the generating function can be represented in the integral form (A.7), provided the expansion parameter is small enough. As previously done, we introduce spherical coordinates and expand in a Taylor series. By the same argument as above, the series converges uniformly (the additional factor does not depend on the integration variable), thus allowing term-by-term integration. Accordingly, we arrive at (A.8). We see that the right-hand side of (A.8) looks similar to (A.4), the only relevant differences being the presence of an extra factor under the sum sign and the upper limit of the radial integral. With some algebra, we arrive at (A.9). The series coefficients are recognized to be precisely those of (59).
In the last part of the proof, we derive the recursion fulfilled by the coefficients. To this aim, the derivatives of the generating function have to be evaluated at the origin and then identified with the coefficients. The key observation is that differentiating the generating function reproduces the function itself; that is to say, (A.10) holds, with (A.11) and the auxiliary coefficient being defined as in (67). Equation (A.10) lies at the origin of the recursion. Indeed, from (A.10), it follows that the first derivative is a simple function of the generating function and the auxiliary coefficients. Proceeding analogously yields the general formula (A.12) for the higher derivatives. At the origin, this reads as in (A.13). The last step consists in proving the identity (A.14), with the coefficients defined as in (65). This can be done precisely as explained in [2].
Having reiterated Ruben’s proof explicitly in a specific case, it is now easy to see how the theorem extends to any other Gaussian integral. First of all, from (A.1), we infer that each additional subscript enhances the power of the radial coordinate under the integral sign by 2 units. This entails a shift in the number of degrees of freedom of the chi-square distributions in Ruben’s expansion, amounting to twice the number of subscripts. For instance, since the two-index integral has two subscripts, its Ruben expansion starts with a chi-square distribution whose number of degrees of freedom is shifted by four, independently of whether the two indices are equal or different. Secondly, we observe that in order to correctly identify the generating functions of Ruben’s coefficients for a higher-order integral, we need to take into account the multiplicities of the indices. As an example, consider the two-index cases with different and with equal indices. By going once more through the argument presented in (A.7), we see that the identities (A.6) are sufficient to show that (63) is the generating function in the case of different indices. By contrast, in order to repeat the proof for the case of equal indices, we need an additional integral identity, namely, (A.15), valid once more in the same range of the parameter. Hence, we infer that the generating functions must depend upon the relevant eigenvalues via different powers in the two cases. The different exponents are ultimately responsible for the specific values taken by the auxiliary coefficients of (68).
To conclude, we observe that the estimates of the residuals for the one- and two-index integrals, presented in Section 5 without an explicit proof, do not require any technical insight beyond that already provided by [2] and our considerations above. We leave them to the reader, since they can be obtained once more along the lines of the original derivation.

Competing Interests

The authors declare that they have no competing interests.

Acknowledgments

The authors are grateful to A. Reale for encouraging them throughout all stages of this work and to G. Bianchi for technical support at ISTAT. They also thank R. Mukerjee and S. H. Ong for promptly informing them about their proof of (22) and (23). The computing resources used for their numerical study and the related technical support at ENEA have been provided by the CRESCO/ENEAGRID High Performance Computing infrastructure and its staff [28]. CRESCO (Computational Research Centre for Complex Systems) is funded by ENEA and by Italian and European research programmes.