Abstract

Covariance is used as an inner product on a formal vector space built on 𝑛 random variables to define measures of correlation 𝑀𝑑 across a set of vectors in a 𝑑-dimensional space. For 𝑑=1, one has the diameter; for 𝑑=2, one has an area. These concepts are directly applied to correlation studies in climate science.

1. Introduction

In a study of the earth's climate system, Douglass [1] considered the correlation among a set of 𝑁 climate indices. A distance 𝑑 between two indices 𝑖 and 𝑗 was defined as

$$d_{ij}(t) = \cos^{-1}\lvert \varphi_{ij}(t) \rvert, \tag{1.1}$$

where 𝜑𝑖𝑗 is the Pearson correlation coefficient. It was stated that 𝑑 satisfies the conditions to be a metric. The measure of correlation, or closeness, among the 𝑁 indices was taken to be the diameter

$$D_{I_0}(t) = \max_{i,j} d_{ij}(t), \qquad i,j \in I_0. \tag{1.2}$$

Equation (1.2) was applied to the data from a global set of four climate indices to determine the correlation among them (minima in 𝐷) and to infer 18 changes in the state since 1970 (see Section 8). It was pointed out that the topological diameter 𝐷, as a measure of phase locking among the indices, is convenient for computation but was probably not the best measure. It was suggested that a better measure of correlation among the 𝑁 indices could be based upon the areas of the spherical triangles created by the 𝑁 vectors on the unit sphere.

This paper gives a proof that 𝑑𝑖𝑗 is a metric and generalizes the diameter to higher dimensions. In addition, the data of [1] are analyzed using this generalization to areas (see Section 8), and many new abrupt climate changes are identified.

2. Probability

Let 𝑋 and 𝑌 be random variables with expected values 𝐸(𝑋)=𝜇 and 𝐸(𝑌)=𝜈. With these values, we make several standard definitions.

Definition 2.1. The variance of 𝑋 is defined as
$$\mathrm{Var}[X] = E\big[(X-\mu)^2\big]. \tag{2.1}$$

Definition 2.2. The covariance of 𝑋 and 𝑌 is defined as
$$\mathrm{Covar}[X,Y] = E\big[(X-\mu)(Y-\nu)\big]. \tag{2.2}$$

We now list a few basic properties of variance and covariance (found in [2]).

Properties 2.3
For 𝑋 and 𝑌 as above:
(i) Covar is symmetric.
(ii) Covar is bilinear.
(iii) Var is a quadratic form.
(iv) Covar[𝑋, 𝑌] = 𝐸[𝑋𝑌] − 𝐸[𝑋]𝐸[𝑌].
(v) Covar[𝑋, 𝑋] = Var[𝑋], the variance of 𝑋.

Proof. (i) See [2, page 323]. (ii) Follows easily from the definition. (iii) See [2, page 323]. (iv) See [2, page 323].
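As a numerical sanity check (not part of the original paper), property (iv) can be verified on synthetic samples, since the sample-moment versions of the two sides agree algebraically. A minimal sketch assuming NumPy; the variables are illustrative draws, not data from [1]:

```python
import numpy as np

# Check Properties 2.3 (iv): Covar[X, Y] = E[XY] - E[X]E[Y],
# with expectations replaced by sample means over synthetic draws.
rng = np.random.default_rng(0)
X = rng.normal(size=100_000)
Y = 0.5 * X + rng.normal(size=100_000)

covar_def = np.mean((X - X.mean()) * (Y - Y.mean()))   # Definition 2.2
covar_alt = np.mean(X * Y) - X.mean() * Y.mean()       # Property (iv)

assert abs(covar_def - covar_alt) < 1e-10              # equal up to rounding
```

The two expressions are identical as algebraic identities in the sample means, so the assertion fails only through floating-point rounding.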

3. Vector Spaces

The first way most students learn to compare two vectors is through the dot product. The dot product is one example of the more general idea of an inner product. Here we define an inner product and prove that covariance is an inner product.

Definition 3.1. For any real vector space 𝑉, an inner product is a map
$$\langle \cdot,\cdot \rangle : V \times V \to \mathbb{R} \tag{3.1}$$
that satisfies the following properties for every 𝑢, 𝑣, 𝑤 ∈ 𝑉 and 𝑎 ∈ ℝ:
(i) ⟨𝑢 + 𝑣, 𝑤⟩ = ⟨𝑢, 𝑤⟩ + ⟨𝑣, 𝑤⟩,
(ii) ⟨𝑎𝑣, 𝑤⟩ = 𝑎⟨𝑣, 𝑤⟩,
(iii) ⟨𝑣, 𝑤⟩ = ⟨𝑤, 𝑣⟩,
(iv) ⟨𝑣, 𝑣⟩ ≥ 0, and ⟨𝑣, 𝑣⟩ = 0 if and only if 𝑣 = 0.

We will now construct a vector space for which covariance is an inner product. Let {𝑋1, 𝑋2, …, 𝑋𝑛} be a set of 𝑛 random variables. Also let 𝑉 = Span(𝑋1, 𝑋2, …, 𝑋𝑛), the formal ℝ-vector space with basis elements {𝑋1, 𝑋2, …, 𝑋𝑛}. We must put one mild hypothesis upon 𝑉 in order for it to have the desired properties: the vectors must be "probabilistically independent." That is, for any 𝑐1, …, 𝑐𝑛 ∈ ℝ, we have Var[𝑐1𝑋1 + ⋯ + 𝑐𝑛𝑋𝑛] = 0 if and only if 𝑐1 = ⋯ = 𝑐𝑛 = 0. It should be noted that this independence is in no way related to the linear independence of the random variables.

Proposition 3.2. Let 𝑉 = Span(𝑋1, 𝑋2, …, 𝑋𝑛) be the formal ℝ-vector space generated by the random variables {𝑋1, 𝑋2, …, 𝑋𝑛}, which are assumed probabilistically independent. Then covariance is an inner product on 𝑉.

Proof. We must prove the four properties from Definition 3.1.
(i), (ii), and (iii) follow immediately from Properties 2.3.
(iv) Covar(𝑋, 𝑋) = 𝐸[(𝑋 − 𝜇)²] ≥ 0. The nonnegativity is obvious, as we are squaring a real number. The condition that Covar(𝑋, 𝑋) = 0 ⟺ 𝑋 = 0 follows from the probabilistic independence of {𝑋1, …, 𝑋𝑛}.

The proposition implies that 𝑉 is an inner product space (a vector space equipped with an inner product), and as such it has a norm defined by ‖𝑋‖ = √Covar(𝑋, 𝑋) = SD(𝑋), where SD(𝑋) is the standard deviation of 𝑋. Additionally, it follows from the Cauchy-Schwarz inequality [3] that |Covar(𝑋, 𝑌)| ≤ SD(𝑋) SD(𝑌).

Using the inner product on 𝑉, we are able to define an angle between two vectors. To do this, we first define a new map 𝜌 : (𝑉∖{0}) × (𝑉∖{0}) → ℝ using the standard definition of correlation
$$\rho(X,Y) = \frac{\mathrm{Covar}(X,Y)}{\mathrm{SD}(X)\,\mathrm{SD}(Y)}. \tag{3.2}$$
By the Cauchy-Schwarz inequality, we can easily see that |𝜌(𝑋, 𝑌)| ≤ 1; as such, we implicitly define Γ, the angle between 𝑋 and 𝑌, as follows:
$$\mathrm{Covar}(X,Y) = \mathrm{SD}(X)\,\mathrm{SD}(Y)\cos(\Gamma). \tag{3.3}$$
Therefore, 𝜌(𝑋, 𝑌) = Covar(𝑋, 𝑌)/(SD(𝑋) SD(𝑌)) = cos(Γ).

Definition 3.3. Γ(𝑋, 𝑌) = cos⁻¹(𝜌(𝑋, 𝑌)) is the "Correlation Angle" of 𝑋 and 𝑌.

Our definition of Γ is the standard method of defining an angle from the covariance (or any other) inner product. We will show that Γ is a “metric” on the unit sphere of 𝑉.
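The correlation angle is straightforward to compute from samples. A minimal sketch assuming NumPy, using sample covariance and standard deviations in place of the population quantities:

```python
import numpy as np

# Correlation angle of Definition 3.3: Gamma(X, Y) = arccos(rho(X, Y)),
# computed from sample covariance and sample standard deviations.
def correlation_angle(x, y):
    rho = np.cov(x, y, bias=True)[0, 1] / (x.std() * y.std())
    return np.arccos(np.clip(rho, -1.0, 1.0))  # clip guards rounding past +/-1

rng = np.random.default_rng(1)
x = rng.normal(size=1000)

print(correlation_angle(x, x))    # ~0: maximally positively correlated
print(correlation_angle(x, -x))   # ~pi: maximally negatively correlated
```

The `clip` call is a practical guard: floating-point rounding can push the computed ρ fractionally outside [−1, 1], where `arccos` would return NaN.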

Definition 3.4. For any set 𝑆, a map 𝑑 : 𝑆 × 𝑆 → ℝ is a metric if for any 𝑥, 𝑦, 𝑧 ∈ 𝑆 the following properties are satisfied:
(a) 𝑑(𝑥, 𝑦) ≥ 0, with 𝑑(𝑥, 𝑦) = 0 ⟺ 𝑥 = 𝑦 (positive definite),
(b) 𝑑(𝑥, 𝑦) = 𝑑(𝑦, 𝑥) (symmetry),
(c) 𝑑(𝑥, 𝑧) ≤ 𝑑(𝑥, 𝑦) + 𝑑(𝑦, 𝑧) (triangle inequality).

Theorem 3.5. The map Γ : 𝑉 × 𝑉 → ℝ from Definition 3.3 is a metric on 𝑆(𝑉), the unit sphere of 𝑉.

Proof. We must prove that Γ satisfies the three conditions in Definition 3.4.
(a) cos⁻¹ : [−1, 1] → [0, 𝜋], so the nonnegativity is satisfied trivially. It remains to show that Γ(𝑋, 𝑌) = 0 ⟺ 𝑋 = 𝑌. This is true because if the angle between two vectors is zero, then they are (positive) scalar multiples of each other. Thus, since 𝑋 and 𝑌 are unit vectors, if Γ(𝑋, 𝑌) = 0, we must have 𝑋 = 𝑌.
(b) Γ(𝑋, 𝑌) = cos⁻¹(𝜌(𝑋, 𝑌)) = cos⁻¹(Covar(𝑋, 𝑌)/(SD(𝑋) SD(𝑌))) = cos⁻¹(Covar(𝑌, 𝑋)/(SD(𝑌) SD(𝑋))) = cos⁻¹(𝜌(𝑌, 𝑋)) = Γ(𝑌, 𝑋).
(c) To prove the triangle inequality, a geometric idea in itself, we delve into the geometry being defined. We will complete this part of the proof in Section 4.

Our metric Γ allows us to measure the correlation between two vectors.

Definition 3.6. For 𝑋, 𝑌, Γ, and 𝜌 as above:
(i) if Γ = 0 (𝜌 = 1), then 𝑋 and 𝑌 are maximally positively correlated;
(ii) if Γ = 𝜋 (𝜌 = −1), then they are maximally negatively correlated;
(iii) if Γ = 𝜋/2 (𝜌 = 0), then 𝑋 and 𝑌 are uncorrelated.

It should be noted that cases (i) and (ii) are both considered to be “maximally correlated.”
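The metric properties of Theorem 3.5 can also be spot-checked empirically. The sketch below (NumPy assumed, synthetic data) tests symmetry and the triangle inequality on random triples of sample vectors, with Γ computed via the centered Euclidean dot product:

```python
import numpy as np

# Empirical spot check of Theorem 3.5: symmetry and the triangle inequality
# for Gamma on random triples of mean-centered sample vectors.
rng = np.random.default_rng(2)

def gamma(x, y):
    x = x - x.mean()
    y = y - y.mean()
    rho = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
    return np.arccos(np.clip(rho, -1.0, 1.0))

for _ in range(1000):
    x, y, z = rng.normal(size=(3, 50))
    assert abs(gamma(x, y) - gamma(y, x)) < 1e-12           # symmetry (b)
    assert gamma(x, z) <= gamma(x, y) + gamma(y, z) + 1e-9  # triangle (c)
```

A passing loop is of course not a proof; the proof of (c) is completed in Section 4 via geodesic distance on the sphere.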

4. A Geometric Interpretation

The vector space 𝑉 with inner product Covar lends itself nicely to a geometric interpretation. First we must establish a small amount of background.

Consider 𝑆, the standard unit sphere in Euclidean 𝑛-space (ℝⁿ). Great circles are the intersections of planes through the origin with 𝑆. They share many properties with the standard idea of lines in Euclidean space, including the property that they define the shortest path between any two points. For a thorough treatment of great circles as lines on a sphere, see [4–6] or [7].

For any two nonzero vectors 𝑣1 and 𝑣2 in ℝⁿ, let 𝜃 be the (minimal) angle formed by 𝑣1 and 𝑣2. The unit vectors 𝑣̂1 and 𝑣̂2, corresponding to 𝑣1 and 𝑣2, define two points 𝑝1 and 𝑝2 on 𝑆. In order to measure the distance from 𝑝1 to 𝑝2 along 𝑆, we take the length of the arc of the great circle between the two points. By definition, this is the radian measure of 𝜃.

If 𝑉, the vector space considered in Section 3, is thought of as ℝⁿ, with 𝑣1 and 𝑣2 any two vectors, then we can compute the spherical distance between 𝑣1 and 𝑣2, namely, the distance between 𝑝1 and 𝑝2 on 𝑆. We call this quantity
$$d_{\mathrm{spherical}}(v_1, v_2) = \arccos\big(\rho(\hat{v}_1, \hat{v}_2)\big) = \Gamma. \tag{4.1}$$

Thus far we have identified the inner product space (𝑉, Covar) with ℝⁿ. We solidify this intuition with the following proposition. First we define 𝐴 = (𝐴𝑖,𝑗) = (Covar(𝑋𝑖, 𝑋𝑗)), a real-valued symmetric matrix. As in [3], we use 𝐴 to create an inner product on ℝⁿ.

Proposition 4.1. The inner product space (Span(𝑋1, …, 𝑋𝑛), Covar) is isomorphic to (ℝⁿ, ⟨·,·⟩𝐴), where ⟨·,·⟩𝐴 is a "twisted dot product" defined for two vectors (𝑐1, …, 𝑐𝑛) and (𝑑1, …, 𝑑𝑛) as
$$\big\langle (c_1,\dots,c_n), (d_1,\dots,d_n) \big\rangle_A = (c_1,\dots,c_n)\, A \begin{pmatrix} d_1 \\ \vdots \\ d_n \end{pmatrix}. \tag{4.2}$$

Proof. This follows from the standard method of representing an inner product by a matrix (see [3, chapter 8.1]).
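The identification in Proposition 4.1 can be illustrated numerically: with 𝐴 the sample covariance matrix, the covariance of two linear combinations of the 𝑋𝑖 equals the twisted dot product of their coefficient vectors. A sketch assuming NumPy, with synthetic data and illustrative coefficients:

```python
import numpy as np

# Twisted dot product of Proposition 4.1: Covar(sum c_i X_i, sum d_j X_j)
# equals c^T A d, where A_ij = Covar(X_i, X_j).
rng = np.random.default_rng(3)
data = rng.normal(size=(3, 500))   # rows are samples of X_1, X_2, X_3
A = np.cov(data, bias=True)        # sample covariance matrix

c = np.array([1.0, -2.0, 0.5])
d = np.array([0.3, 1.0, 2.0])

# Covariance of the linear combinations, computed directly from samples
u = c @ data
v = d @ data
covar_direct = np.mean((u - u.mean()) * (v - v.mean()))

assert abs(covar_direct - c @ A @ d) < 1e-8
```

The equality is exact (up to rounding) because sample covariance, like Covar, is bilinear.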

Now we return to our proof of Theorem 3.5.

Proof of Theorem 3.5(c). Let 𝑋, 𝑌, 𝑍 ∈ 𝑉 be unit vectors. It remains to show that Γ(𝑋, 𝑌) + Γ(𝑌, 𝑍) ≥ Γ(𝑋, 𝑍).
Because 𝑋 and 𝑍 are unit vectors, Γ(𝑋,𝑍) is the geodesic distance between 𝑋 and 𝑍. Since geodesic distance satisfies the triangle inequality, Γ must as well.

5. Projective Metric

For scientists, 𝜌 = ±1 (equivalently Γ = 0 or Γ = 𝜋) are often both considered to be "maximally correlated"; see, for example, [1]. To take this into account, we modify our metric on the unit sphere of 𝑉. We think of 𝑉 as a projective space, the space of lines through the origin of 𝑉. We denote this space as ℙ(𝑉).

Our original correlation angle Γ is modified to
$$\Gamma' = \arccos\lvert \rho(X,Y) \rvert = \begin{cases} \Gamma, & 0 \le \Gamma \le \dfrac{\pi}{2}, \\[4pt] \pi - \Gamma, & \dfrac{\pi}{2} < \Gamma \le \pi. \end{cases} \tag{5.1}$$

Proposition 5.1. Γ′(𝑋, 𝑌) is a metric on ℙ(ℝⁿ).

Proof. We must show that the three conditions of Definition 3.4 are met.
(i) Γ′ = 0 corresponds to a correlation angle of 0 or 𝜋. The two vectors are either in the same direction or in opposite directions. In either case, they determine the same line through the origin and hence correspond to the same point in projective space.
(ii) As in Definition 3.4, the symmetry of Γ′ follows from the symmetry of 𝜌.
(iii) As before, the triangle inequality follows because Γ′ is the geodesic distance for a projective space.

The metric Γ′(𝑋, 𝑌) gives the angular distance between 𝑋 and 𝑌. If 𝜌(𝑋, 𝑌) = ±1 (what we called a "maximal correlation"), then Γ′ = 0; however, if 𝜌(𝑋, 𝑌) = 0, which we called orthogonality or noncorrelation, then Γ′ = 𝜋/2.

Proposition 5.2. Let Γ′ be the metric cos⁻¹|𝜌(𝑋, 𝑌)|. Then the pair (ℙ(𝑉), Γ′) is a projective metric space.

Proof. This is by construction.
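The projective angle Γ′ = arccos|ρ| treats a series and its negation as the same point, as the construction requires. A sketch assuming NumPy, with synthetic data:

```python
import numpy as np

# Projective correlation angle of Section 5: Gamma' = arccos(|rho(X, Y)|),
# so rho = +1 and rho = -1 both give distance zero.
def gamma_proj(x, y):
    rho = np.corrcoef(x, y)[0, 1]
    return np.arccos(np.clip(abs(rho), 0.0, 1.0))

rng = np.random.default_rng(4)
x = rng.normal(size=200)

print(gamma_proj(x, 3 * x))    # ~0: same line through the origin
print(gamma_proj(x, -3 * x))   # ~0: opposite direction, same line
```

Note that both positive and negative scalings of `x` sit at distance ~0, which is exactly the identification of antipodal points that turns the sphere into ℙ(𝑉).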

6. Time Dependence

Until this point, we have treated our random variables {𝑋1, …, 𝑋𝑛} as being time independent. However, random variables often depend on time. Therefore, we will now consider each random variable as depending discretely on time. It should be noted that what follows is essentially a replication of what has come before; however, 𝑋 and 𝑌 are now treated as vectors instead of singleton points. Vectors, however, are just points of 𝑉. The additional theory and notation are simply a means of dealing with the additional information.

To make our 𝑛 random variables time dependent, they will now be given as
$$\begin{aligned} X_1 &= \big(X_1(t),\, X_1(t+1),\, X_1(t+2),\, \dots\big), \\ X_2 &= \big(X_2(t),\, X_2(t+1),\, X_2(t+2),\, \dots\big), \\ &\;\;\vdots \\ X_n &= \big(X_n(t),\, X_n(t+1),\, X_n(t+2),\, \dots\big). \end{aligned} \tag{6.1}$$

We must now redefine the covariance. We do this by looking at a time window starting at time 𝑡 with a duration of 𝐾, where 𝐾 is called the summation window:
$$\mathrm{Covar}(X_i, X_j) = \frac{1}{K} \sum_{l=t}^{t+K-1} \big(X_i(l) - \mu\big)\big(X_j(l) - \nu\big), \tag{6.2}$$
where 𝜇 and 𝜈 are the sample means in the summation window of 𝑋𝑖 and 𝑋𝑗, respectively. That is, $\mu = (1/K)\sum_{l=t}^{t+K-1} X_i(l)$.

If we think of 𝑋𝑖 and 𝑋𝑗 as the vectors 𝑋𝑖 = (𝑋𝑖(𝑡), …, 𝑋𝑖(𝑡+𝐾−1)) (resp., for 𝑋𝑗), then we get that
$$\mathrm{Covar}(X_i, X_j) = \frac{1}{K} \big(X_i - \hat{\mu}\big) \cdot \big(X_j - \hat{\nu}\big), \tag{6.3}$$
where "·" is the standard Euclidean dot product, and $\hat{\mu}$ is the length-𝐾 vector (𝜇, …, 𝜇) (resp., for $\hat{\nu}$). This is called the "Pearson Covariance."

In other words, if we define the vectors $\tilde{X}_i = (X_i - \hat{\mu})/\sqrt{K}$ and $\tilde{X}_j = (X_j - \hat{\nu})/\sqrt{K}$, then we define the Pearson Covariance as follows.

Definition 6.1. $\mathrm{Covar}_{\mathrm{Pearson}}(X_i, X_j) = \tilde{X}_i \cdot \tilde{X}_j$, where "·" is the usual Euclidean inner product.

Now we define the Pearson Correlation as
$$\hat{\rho}(X_i, X_j) = \frac{\mathrm{Covar}_{\mathrm{Pearson}}(X_i, X_j)}{\sqrt{\mathrm{Covar}_{\mathrm{Pearson}}(X_i, X_i)\,\mathrm{Covar}_{\mathrm{Pearson}}(X_j, X_j)}} = \cos\Gamma. \tag{6.4}$$
Here again Γ corresponds to the standard Euclidean angle, known as the Pearson Correlation Angle, and the resulting metric is the standard metric studied in classical spherical geometry (see [4–6] or [7]).

Remark 6.2. The angle Γ between $\tilde{X}_i$ and $\tilde{X}_j$ is the same as the angle between the corresponding unit vectors $\hat{X}_i$ and $\hat{X}_j$.
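The windowed construction of this section can be sketched directly: normalize each series over the window as in Definition 6.1, take the Euclidean dot product, and recover the Pearson Correlation Angle. A minimal sketch assuming NumPy; the series are synthetic stand-ins, not climate indices:

```python
import numpy as np

# Pearson correlation angle over a summation window of length K starting at t
# (Section 6): normalize each windowed series and take Euclidean dot products.
def pearson_angle(xi, xj, t, K):
    wi, wj = xi[t:t + K], xj[t:t + K]
    xi_t = (wi - wi.mean()) / np.sqrt(K)       # X-tilde of Definition 6.1
    xj_t = (wj - wj.mean()) / np.sqrt(K)
    rho = (xi_t @ xj_t) / np.sqrt((xi_t @ xi_t) * (xj_t @ xj_t))  # eq. (6.4)
    return np.arccos(np.clip(rho, -1.0, 1.0))

rng = np.random.default_rng(5)
a = rng.normal(size=120)
b = a + 0.1 * rng.normal(size=120)            # nearly identical series

print(pearson_angle(a, b, t=0, K=24))         # small angle: strong correlation
```

Sliding `t` across the record gives the time-dependent angles used in Section 8; the 1/√K factors cancel in ρ̂ but keep the dot product equal to the Pearson Covariance.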

7. Correlation Measures: 𝑀𝑛 and 𝑀𝑛,𝑎

To this point, we have developed a method that will numerically tell us the correlation between two vectors. In this section, we will create two sets of functions that allow us to measure the correlation across a set of vectors. The first set, {𝑀𝑖,𝑎}, is based upon taking the volumes of 𝑖-simplices (a 1-simplex is a line segment, a 2-simplex is a triangle, a 3-simplex is a tetrahedron, etc.). The set {𝑀𝑖,𝑎} benefits from computability but is not as precise as the second set of measures, {𝑀𝑖}, which measure the volumes of 𝑖-dimensional convex hulls.

Given a set of vectors {𝑋1, …, 𝑋𝑚} ⊂ 𝑉, let 𝑈 = {𝑈1, …, 𝑈𝑚} be the set of corresponding unit vectors. We will define a way to measure the closeness of the 𝑈𝑖 to each other using the metric Γ. To do this, we define the diameter of 𝑈 as
$$D = \max_{i,j} \Gamma(U_i, U_j). \tag{7.1}$$
If all of the vectors are taken in the standard way to be points on the unit sphere, then the diameter is a measure of the overall spread of the points. If the diameter is small, then the vectors are all close together and hence highly correlated, whereas if the diameter is large, at least some of the points are far apart and hence not highly correlated. The benefit of the diameter is that it is an easy quantity to calculate; however, it can be somewhat misleading. If, for instance, a large number of points are clustered together and there is one outlying point, the diameter can be quite large despite the fact that the points are generally quite correlated.
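The diameter of equation (7.1) is a one-line maximum over pairwise correlation angles. A sketch assuming NumPy; the four series below are synthetic stand-ins for a set of indices, not the data of [1]:

```python
import numpy as np
from itertools import combinations

# Diameter D of equation (7.1): the largest pairwise correlation angle
# over a set of series.
def diameter(series):
    def gamma(x, y):
        rho = np.corrcoef(x, y)[0, 1]
        return np.arccos(np.clip(rho, -1.0, 1.0))
    return max(gamma(series[i], series[j])
               for i, j in combinations(range(len(series)), 2))

rng = np.random.default_rng(6)
base = rng.normal(size=100)
series = [base + 0.05 * rng.normal(size=100) for _ in range(4)]  # clustered

print(diameter(series))   # small diameter: the four series are highly correlated
```

Appending one uncorrelated outlier series would blow the diameter up toward π/2, which is precisely the weakness of 𝐷 noted above.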

We now proceed to generalize the correlation measure defined by 𝐷. Let 𝑇 be a collection of 𝑡 points on the 𝑛-sphere, and let 𝐷 be the set of 𝑛-simplices made up of points in 𝑇.

Definition 7.1. $M_{n,a}(T) = \max_{\Delta \in D} \mathrm{Vol}(\Delta)$.

This maximum is taken over the $C = C(t, n+1) = \binom{t}{n+1}$ different 𝑛-simplices made of points in 𝑇.
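For 𝑛 = 2, the maximum in Definition 7.1 runs over all triples of points on the 2-sphere. The sketch below (NumPy assumed) computes spherical-triangle areas via l'Huilier's spherical-excess formula, which is our choice of method and is not prescribed by the text:

```python
import numpy as np
from itertools import combinations

# M_{2,a} of Definition 7.1: maximal spherical-triangle area over all triples
# of unit vectors. Areas via l'Huilier's theorem (spherical excess).
def spherical_triangle_area(u, v, w):
    a, b, c = np.arccos(np.clip([v @ w, u @ w, u @ v], -1.0, 1.0))  # side arcs
    s = (a + b + c) / 2                                             # semiperimeter
    t = (np.tan(s / 2) * np.tan((s - a) / 2)
         * np.tan((s - b) / 2) * np.tan((s - c) / 2))
    return 4 * np.arctan(np.sqrt(max(t, 0.0)))   # excess = area on unit sphere

def m2a(points):
    return max(spherical_triangle_area(*tri) for tri in combinations(points, 3))

# The octant triangle (1,0,0), (0,1,0), (0,0,1) has area pi/2.
pts = [np.eye(3)[i] for i in range(3)] + [np.array([1.0, 1.0, 1.0]) / np.sqrt(3)]
print(m2a(pts))   # ~pi/2: the octant triangle dominates
```

The brute-force loop over $\binom{t}{3}$ triples is exactly the computability trade-off discussed below: cheap for small 𝑡, but it grows combinatorially.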

Definition 7.2. 𝑀𝑛(𝑇)=Vol(𝐻).

The volume used in Definition 7.2 is the spherical volume, and 𝐻 is the convex hull of the points of 𝑇 with respect to the spherical measure. That is, it is the smallest geodesically convex set containing 𝑇. (Geodesically convex means that any two points in the set have the minimal geodesic between them completely in the set as well.)

The volume is computed by constructing the convex hull of 𝑇, then disregarding all the points of 𝑇 not contributing to the hull. The hull is then divided into its “essential” 𝑛-simplices, and the volumes of these simplices are summed.

𝑀𝑛 and 𝑀𝑛,𝑎 are each measures of 𝑛-dimensional volume. 𝑀𝑛,𝑎 benefits from being easily computable. 𝑀𝑛, though harder to compute, gives a better measure of the overall spread of the vectors. In the one-dimensional case, however, we have that 𝑀1 = 𝑀1,𝑎 = 𝐷, the diameter. The reason for this is that, when making the hull to compute 𝑀1, all but the two furthest points will be disregarded. This equality is not true in general, a fact which can be easily observed by plotting four points forming a quadrilateral, where 𝑀2,𝑎 < 𝑀2. In the general case, however, we do have the inequality 𝑀𝑛,𝑎(𝑇) ≤ 𝑀𝑛(𝑇). This follows since the maximal simplex will necessarily be a subset of the convex hull; since volume is monotonic, we have the inequality.

Assume that 𝑠 of the 𝑡 points of 𝑇 are essential to the convex hull. There is a constant 𝐵 = 𝐵(𝑠, 𝑛) giving the number of essential simplices that compose the convex hull. That is,
$$M_n(T) = \text{the sum of the volumes of the } B \text{ essential simplices}. \tag{7.2}$$
Replacing the volume of each spherical simplex with the maximal one, that is, 𝑀𝑛,𝑎(𝑇), we get the following inequalities:
$$M_{n,a}(T) \le M_n(T) \le B\, M_{n,a}(T). \tag{7.3}$$
Since 𝐵 depends only on the number of points in 𝑇, we see that, for a fixed data set, 𝑀𝑛 and 𝑀𝑛,𝑎 differ by at most a fixed constant.

To relate 𝑀𝑛 and 𝑀𝑛,𝑎 to Section 6, we note that when 𝑇 time-dependent random variables are looked at over a summation window of length 𝐾 = 𝑛 + 1, we get 𝑇 points on the 𝑛-sphere. In this situation, we can apply the measures of spread given by 𝑀𝑛(𝑇), 𝑀𝑛,𝑎(𝑇), or 𝑀𝑘,𝑎(𝑇) for 𝑘 < 𝑛.

8. Topology of Earth's Climate Indices and Phase-Locked States

In this section, we apply our new correlation measure to data from Douglass's paper [1]. In [1], the diameter (𝑀1=𝑀1,𝑎) is used to analyze a set of climate data; in this section, we use 𝑀2,𝑎 to analyze the same data. Comparing the results of the new analysis to Douglass's original analysis shows the increased effectiveness of the new correlation measure.

Various regions of the Earth's climate system are characterized by temperature and pressure indices. Douglass [1], in a study of a global set of four indices, defines a distance
$$\Gamma_{ij}(t) = \cos^{-1}\big\lvert \hat{\rho}\big(X_i(t), X_j(t)\big) \big\rvert \tag{8.1}$$
between indices that satisfies the properties required to be a metric (Definition 3.4), where $\hat{\rho}(X_i(t), X_j(t))$ is the Pearson correlation coefficient. Note that the distance Γ is an angle.

In Section 7, we showed that the correlation among a set of indices can be measured using 𝑀𝑖,𝑎, by taking the volumes of 𝑖-simplices. In [1], Douglass uses the diameter of the metric space (𝐼0, Γ), defined as
$$D_{I_0}(t) = \max_{i,j} \Gamma\big(X_i(t), X_j(t)\big), \qquad i,j \in I_0. \tag{8.2}$$

In the notation of Section 7, 𝐷 = 𝑀1,𝑎. Geometrically, 𝐷 selects the largest angle Γ(𝑋𝑖, 𝑋𝑗) among the set. The diameter 𝐷 may be considered a "dissimilarity" index because large 𝐷 means weak correlation; thus, the minima in 𝐷 are associated with high correlation among the elements of the set. In Douglass [1], two cases were considered: (1) the set of 3 Pacific Ocean indices and (2) the global set of 4 indices (6 independent pairs). The 𝐷 of the global set is shown (in red) in Figure 1.

The maximal area 𝑀2,𝑎, the generalized correlation measure, was computed for the same four indices of [1]. The plot for the calculation is shown (blue) in Figures 1(a) and 1(b). Comparison of the two plots shows that the area measure reveals more minima (30) than the diameter (18). The various minima are indicated by arrows in Figures 1(a) and 1(b), and a list of dates is given in Table 1.

9. Summary

By using covariance on a set of time-independent random variables, or the covariance defined by the Pearson correlation on a set of time-dependent variables, we create metrics Γ and Γ′ (resp.) on the unit sphere (resp., projective space) of the corresponding formal vector spaces. If 𝑉 is the 𝑛-dimensional formal vector space whose basis is the set of random variables {𝑋1, …, 𝑋𝑛}, we use Γ or Γ′ to create 𝑀𝑛 and 𝑀𝑛,𝑎, two measures of spread on values taken by the 𝑋𝑖. In Section 8, we give an explicit example showing the use of 𝑀2,𝑎 on a global set of climate indices.

The two measures of spread differ by at most a fixed multiplicative constant, so for theoretical purposes, they are of equivalent use. However, when applied, they can have different values. The volume of the convex hull created by {𝑋1, …, 𝑋𝑛}, given by 𝑀𝑛, is the most precise measure of the correlation of the 𝑋𝑖; however, it is computationally difficult. The maximal volume of all possible 𝑛-simplices defined by the 𝑋𝑖, given by 𝑀𝑛,𝑎, is a rougher measure of correlation. However, 𝑀𝑛,𝑎 is a simpler computation than 𝑀𝑛.

In the 2-dimensional example, where all the vectors lie on the 2-sphere, one can apply 𝑀2,𝑎, 𝑀2, or 𝑀1,𝑎 = the diameter. In general, 𝑀1,𝑎 is coarser than 𝑀2,𝑎 but is significantly easier to compute. For example, in [1] and Section 8, the use of 𝑀2,𝑎 yields much finer and cleaner results than the use of 𝑀1,𝑎. More generally, in 𝑛 dimensions one may use 𝑀𝑙 and 𝑀𝑙,𝑎 for any 𝑙 ≤ 𝑛, sacrificing accuracy for ease.