Research Article | Open Access
Gaussian Covariance Faithful Markov Trees
Graphical models are useful for characterizing conditional and marginal independence structures in high-dimensional distributions. An important class of graphical models consists of covariance graph models, where the nodes of a graph represent different components of a random vector, and the absence of an edge between any pair of variables implies marginal independence. Covariance graph models also represent more complex conditional independence relationships between subsets of variables. When the covariance graph captures or reflects all the conditional independence statements present in the probability distribution, the latter is said to be faithful to its covariance graph, though in general this is not guaranteed. Faithfulness, however, is crucial, for instance, in model selection procedures that proceed by testing conditional independences. Hence, an analysis of the faithfulness assumption is important in understanding the ability of the graph, a discrete object, to fully capture the salient features of the probability distribution it aims to describe. In this paper, we demonstrate that multivariate Gaussian distributions that have trees as covariance graphs are necessarily faithful.
Markov random fields and graphical models are widely used to represent conditional independences in a given multivariate probability distribution (see [1–5], to name just a few). Many different types of graphical models have been studied in the literature. Concentration graphs encode conditional independence between pairs of variables given the remaining ones, whereas covariance graphs encode marginal independence between pairs of variables. Formally, let us consider a random vector $X = (X_v, v \in V)$ with a probability distribution $P$, where $V$ is a finite set representing the random variables in $X$. An undirected graph $G_0 = (V, E_0)$ is called the covariance graph (see [1, 6–11]) associated with the probability distribution $P$ if the set of edges $E_0$ is constructed as follows:
$$(u, v) \notin E_0 \iff X_u \perp\!\!\!\perp X_v. \tag{1.1}$$
Note that $(u, v) \notin E_0$ means that the vertices $u$ and $v$ are not adjacent in $G_0$.
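The concentration graph associated with $P$ is an undirected graph $G = (V, E)$, where $V$ is the set of vertices and each vertex represents one variable in $X$. The set $E$ is the set of edges (between the vertices in $V$) constructed using the pairwise rule: for any pair $(u, v) \in V \times V$, $u \neq v$,
$$(u, v) \notin E \iff X_u \perp\!\!\!\perp X_v \mid X_{V \setminus \{u, v\}}, \tag{1.2}$$
where $X_{V \setminus \{u, v\}} := (X_w, w \in V \setminus \{u, v\})$.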
Note that the subscript zero is invoked for covariance graphs (i.e., $G_0$ versus $G$) as the definition of covariance graphs does not involve conditional independences.
Concentration and covariance graphs are used not only to encode relationships between pairs of variables in the random vector $X$ but, as we will see below, also to encode conditional independences that exist between subsets of variables of $X$. First, we introduce some definitions.
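The multivariate distribution $P$ is said to satisfy the “intersection property” if, for any subsets $A$, $B$, $C$, and $D$ of $V$ which are pairwise disjoint,
$$\text{if } X_A \perp\!\!\!\perp X_B \mid X_{C \cup D} \text{ and } X_A \perp\!\!\!\perp X_C \mid X_{B \cup D}, \text{ then } X_A \perp\!\!\!\perp X_{B \cup C} \mid X_D. \tag{1.3}$$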
We will call the intersection property (see ) in (1.3) above the concentration intersection property in this paper in order to differentiate it from another property that is satisfied by $P$ when studying covariance graph models. Though this property can be further relaxed, we will retain the terminology used in .
We first define the concept of separation on graphs. Let $A$, $B$, and $S$ denote pairwise disjoint sets of vertices. We say that a set $S$ separates $A$ and $B$ if all paths connecting $A$ and $B$ in $G$ intersect $S$, that is, $A \perp_G B \mid S$. (This is not to be confused with stochastic independence, which is denoted by $\perp\!\!\!\perp$ as compared to $\perp_G$.) Now, let $P$ satisfy the concentration intersection property. Then, for any triplet $(A, B, S)$ of pairwise disjoint subsets of $V$, if $S$ separates $A$ and $B$ in the concentration graph $G$ associated with $P$, then the random vector $X_A$ is independent of $X_B$ given $X_S$. This latter property is called the concentration global Markov property and is formally defined as
$$A \perp_G B \mid S \implies X_A \perp\!\!\!\perp X_B \mid X_S. \tag{1.4}$$

Kauermann [6] shows that if $P$ satisfies the following property: for any triplet $(A, B, C)$ of pairwise disjoint subsets of $V$,
$$\text{if } X_A \perp\!\!\!\perp X_B \text{ and } X_A \perp\!\!\!\perp X_C, \text{ then } X_A \perp\!\!\!\perp X_{B \cup C}, \tag{1.5}$$
then, for any triplet $(A, B, S)$ of pairwise disjoint subsets of $V$, if $S$ separates $A$ and $B$ in the covariance graph $G_0$ associated with $P$, then $X_A \perp\!\!\!\perp X_B \mid X_{V \setminus (A \cup B \cup S)}$. This latter property is called the covariance global Markov property and can be written formally as follows:
$$A \perp_{G_0} B \mid S \implies X_A \perp\!\!\!\perp X_B \mid X_{V \setminus (A \cup B \cup S)}. \tag{1.6}$$
In parallel to the concentration graph case, property (1.5) will be called the covariance intersection property and is sometimes also referred to as the composition property.

Even if $P$ satisfies both intersection properties, the covariance and concentration graphs may not be able to capture or reflect all the conditional independences present in the distribution; that is, there may exist one or more conditional independences present in the probability distribution that do not correspond to any separation statement in either $G$ or $G_0$. Equivalently, the lack of a separation statement in either $G$ or $G_0$ does not necessarily imply a conditional dependence. In the opposite case, when $P$ contains no conditional independences other than those encoded by the graph, $P$ is said to be faithful to its graphical model (see ).
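Separation $A \perp_G B \mid S$ is a purely combinatorial property and can be checked by a graph search. The following is a minimal Python sketch, assuming the graph is stored as a dictionary that maps each vertex to the set of its neighbours; the function name `separates` and the example graph are illustrative choices, not part of the paper. It searches outward from $A$ without ever entering $S$ and reports separation if $B$ is never reached.

```python
from collections import deque


def separates(adj, A, B, S):
    """Return True if S separates A and B in the undirected graph `adj`.

    `adj` maps every vertex to the set of its neighbours; A, B and S are
    assumed to be pairwise disjoint collections of vertices.
    """
    A, B, S = set(A), set(B), set(S)
    seen = set(A)
    queue = deque(A)
    # Breadth-first search started from A that never enters S.
    while queue:
        u = queue.popleft()
        if u in B:
            return False  # found a path from A to B that avoids S
        for w in adj[u]:
            if w not in S and w not in seen:
                seen.add(w)
                queue.append(w)
    return True


# Path graph 1 - 2 - 3 - 4 (illustrative).
path = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
print(separates(path, {1}, {4}, {3}))    # True: every path from 1 to 4 meets 3
print(separates(path, {1}, {4}, set()))  # False: 1 - 2 - 3 - 4 avoids the empty set
```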
More precisely, we say that $P$ is concentration faithful to its concentration graph $G$ if, for any triplet $(A, B, S)$ of pairwise disjoint subsets of $V$, the following statement holds:
$$A \perp_G B \mid S \iff X_A \perp\!\!\!\perp X_B \mid X_S. \tag{1.7}$$
Similarly, $P$ is said to be covariance faithful to its covariance graph $G_0$ if, for any triplet $(A, B, S)$ of pairwise disjoint subsets of $V$, the following statement holds:
$$A \perp_{G_0} B \mid S \iff X_A \perp\!\!\!\perp X_B \mid X_{V \setminus (A \cup B \cup S)}. \tag{1.8}$$

A natural question of both theoretical and applied interest in probability theory is to understand the implications of the faithfulness assumption. This assumption is fundamental since it yields a one-to-one correspondence between the conditional independences present in the probability distribution and the separation statements in the graph. In this paper, we show that when $P$ is a multivariate Gaussian distribution whose covariance graph is a tree, it is necessarily covariance faithful, that is, such probability distributions satisfy property (1.8). Equivalently, the associated covariance graph $G_0$ is fully able to capture all the conditional independences present in the multivariate distribution $P$. This result can be considered as a dual of a previous probabilistic result proved by Becker et al. [13] for concentration graphs, which demonstrates that Gaussian distributions having concentration trees (i.e., the concentration graph is a tree) are necessarily concentration faithful to their concentration graph (implying that property (1.7) is satisfied). This result was proved by showing that Gaussian distributions satisfy two types of conditional independence properties: the intersection property and the decomposable transitivity property. The approach in the proof of the main result of this paper is vastly different from the one used for concentration graphs (see [13]). Indeed, a naïve or unsuspecting reader could mistakenly think that the result for covariance trees follows simply by replacing the covariance matrix with its inverse in the result of Becker et al. [13]. This is of course incorrect and, in some sense, equivalent to saying that a matrix and its inverse are the same.
The covariance matrix encodes marginal independences, whereas the inverse covariance matrix encodes conditional independences. The resulting covariance and concentration graph models are very different. Moreover, the former is a curved exponential family model, whereas the latter is a natural exponential family model.
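A small numerical sketch (Python with NumPy; the covariance values are our own illustrative choice, not taken from the paper) shows why faithfulness is a genuine assumption when the covariance graph is not a tree. All off-diagonal covariances are nonzero, so $G_0$ is the triangle on $\{1, 2, 3\}$ and the empty set does not separate $\{1\}$ and $\{2\}$; nevertheless $\sigma_{12} = \sigma_{13}\sigma_{23}/\sigma_{33}$, which makes $X_1 \perp\!\!\!\perp X_2 \mid X_3$, so property (1.8) fails. The check relies on the standard Gaussian fact that $X_1 \perp\!\!\!\perp X_2 \mid X_3$ holds exactly when the $(1, 2)$ entry of $\Sigma^{-1}$ vanishes; code indices are 0-based.

```python
import numpy as np

# Illustrative covariance matrix; every off-diagonal entry is nonzero, so the
# covariance graph G_0 is the triangle 1 - 2 - 3 - 1 (not a tree).
Sigma = np.array([[1.00, 0.25, 0.50],
                  [0.25, 1.00, 0.50],
                  [0.50, 0.50, 1.00]])
assert np.all(np.linalg.eigvalsh(Sigma) > 0)  # positive definite

# A = {1}, B = {2}, S = {} gives V \ (A u B u S) = {3}. X_1 and X_2 are
# conditionally independent given X_3 iff the (1, 2) entry of Sigma^{-1} is 0.
K = np.linalg.inv(Sigma)
ci_holds = abs(K[0, 1]) < 1e-12

# The empty set separates 1 and 2 only if no path joins them; here they are
# even adjacent, because sigma_12 != 0.
adjacent = Sigma[0, 1] != 0
print(ci_holds, adjacent)  # True True: a conditional independence without separation
```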
The outline of this paper is as follows. Section 2.1 presents graph-theoretic preliminaries, and Section 2.2 gives a brief overview of covariance and concentration graphs associated with multivariate Gaussian distributions. The proof of the main result of this paper is given in Section 3. Section 4 concludes by summarizing the results in the paper and their implications.
2.1. Graph Theoretic Concepts
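This section introduces notation and terminology that are required in subsequent sections. An undirected graph $G = (V, E)$ consists of two sets $V$ and $E$, with $V$ representing the set of vertices and $E \subseteq (V \times V) \setminus \{(v, v) : v \in V\}$ the set of edges, satisfying $(u, v) \in E \iff (v, u) \in E$. For $(u, v) \in V \times V$, we write $u \sim_G v$ when $(u, v) \in E$, and we say that $u$ and $v$ are adjacent in $G$. A path connecting two distinct vertices $u$ and $v$ in $G$ is a sequence of distinct vertices $(u_0, u_1, \ldots, u_n)$, where $u_0 = u$ and $u_n = v$, such that $u_i \sim_G u_{i+1}$ for every $i = 0, \ldots, n - 1$. Such a path will be denoted $p = p(u, v, G)$, and we say that $p$ connects $u$ and $v$ or, alternatively, that $u$ and $v$ are connected by $p$. We also denote by $\mathcal{P}(u, v, G)$ the set of paths between $u$ and $v$. We now proceed to define the subclass of graphs known as trees. Let $G = (V, E)$ be an undirected graph. The graph $G$ is called a tree if any pair of distinct vertices in $V$ is connected by exactly one path, that is, $|\mathcal{P}(u, v, G)| = 1$ for all $u, v \in V$ with $u \neq v$. The subgraph of $G$ induced by a subset $U \subseteq V$ is denoted by $G(U) = (U, E(U))$, where $E(U) = E \cap (U \times U)$. A connected component of a graph $G$ is a maximal subgraph of $G$ in which each pair of vertices can be connected by at least one path. We now state a lemma, without proof, that is needed in the proof of the main result of this paper.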
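The definitions of $\mathcal{P}(u, v, G)$ and of a tree translate directly into code. The following is a minimal Python sketch, assuming a dictionary-of-neighbour-sets representation; the helper names `all_paths` and `is_tree` are illustrative, and the exhaustive path enumeration is exponential in general, so it is meant only for small graphs.

```python
def all_paths(adj, u, v):
    """Enumerate P(u, v, G): every path of distinct vertices from u to v."""
    paths, stack = [], [(u, [u])]
    while stack:
        node, path = stack.pop()
        if node == v:
            paths.append(path)
            continue
        for w in adj[node]:
            if w not in path:  # the vertices of a path are distinct
                stack.append((w, path + [w]))
    return paths


def is_tree(adj):
    """A graph is a tree iff each pair of distinct vertices has exactly one path."""
    vs = sorted(adj)
    return all(len(all_paths(adj, u, v)) == 1
               for i, u in enumerate(vs) for v in vs[i + 1:])


star = {1: {2, 3, 4}, 2: {1}, 3: {1}, 4: {1}}
cycle = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {1, 3}}
print(is_tree(star), is_tree(cycle))  # True False
print(len(all_paths(cycle, 1, 3)))    # 2: 1-2-3 and 1-4-3
```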
Lemma 2.1. Let $G = (V, E)$ be an undirected graph. If $G$ is a tree, then any subgraph of $G$ induced by a subset of $V$ is a union of connected components, each of which is a tree (or what we will refer to as a “union of tree connected components”).
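Lemma 2.1 can be illustrated on a small example. The Python sketch below (the tree and the helper names `induced`, `components`, and `n_edges` are illustrative assumptions) deletes a vertex from a tree and checks each connected component of the induced subgraph using the standard fact that a connected graph with $|C|$ vertices and $|C| - 1$ edges is a tree.

```python
def induced(adj, U):
    """Subgraph G(U) of `adj` induced by the vertex subset U."""
    U = set(U)
    return {u: adj[u] & U for u in U}


def components(adj):
    """Connected components of an undirected graph, as a list of vertex sets."""
    comps, seen = [], set()
    for s in adj:
        if s in seen:
            continue
        comp, stack = {s}, [s]
        while stack:
            for w in adj[stack.pop()]:
                if w not in comp:
                    comp.add(w)
                    stack.append(w)
        seen |= comp
        comps.append(comp)
    return comps


def n_edges(adj, C):
    """Number of edges of `adj` with both endpoints in C."""
    return sum(len(adj[u] & C) for u in C) // 2


# Illustrative tree on 6 vertices: edges 1-2, 2-3, 2-4, 4-5, 4-6.
tree = {1: {2}, 2: {1, 3, 4}, 3: {2}, 4: {2, 5, 6}, 5: {4}, 6: {4}}
sub = induced(tree, {1, 3, 4, 5, 6})  # delete vertex 2
comps = components(sub)
print(sorted(sorted(C) for C in comps))  # [[1], [3], [4, 5, 6]]
# A connected graph on |C| vertices with |C| - 1 edges is a tree.
print(all(n_edges(sub, C) == len(C) - 1 for C in comps))  # True
```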
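For a connected graph $G = (V, E)$, a separator is a subset $S$ of $V$ such that there exists a pair of nonadjacent vertices $u$ and $v$ with $u, v \notin S$ and $\{u\} \perp_G \{v\} \mid S$. If $S$ is a separator, then it is easily verified that every $S' \supseteq S$ such that $u, v \notin S'$ is also a separator.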
2.2. Gaussian Concentration and Covariance Graphs
In this section, we present a brief overview of concentration and covariance graphs in the case when the probability distribution $P$ is multivariate Gaussian. Consider a random vector $X = (X_v, v \in V) \sim P = \mathcal{N}_{|V|}(\mu, \Sigma)$, where $\mu \in \mathbb{R}^{|V|}$ and $\Sigma = (\sigma_{uv}) \in \mathbb{P}^+$, where $\mathbb{P}^+$ denotes the cone of positive definite matrices. Without loss of generality, we will assume that $\mu = 0$. Gaussian distributions can also be parameterized by the inverse of the covariance matrix, denoted by $K = \Sigma^{-1} = (k_{uv})$. The matrix $K$ is called the precision or concentration matrix. It is well known (see ) that for any pair of variables $(X_u, X_v)$, where $u \neq v$,
$$X_u \perp\!\!\!\perp X_v \mid X_{V \setminus \{u, v\}} \iff k_{uv} = 0.$$
Hence, the concentration graph $G = (V, E)$ can be constructed simply using the precision matrix $K$ and the following rule: $(u, v) \notin E \iff k_{uv} = 0$. Furthermore, it can be easily deduced from a classical result (see ) that, for any Gaussian concentration graph model, the pairwise Markov property in (1.2) is equivalent to the concentration global Markov property in (1.4).
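The rule $(u, v) \notin E \iff k_{uv} = 0$ can be sketched in Python with NumPy as follows. The tridiagonal precision matrix is an illustrative choice (so that $G$ is a path), and the helper `graph_from_matrix` together with its tolerance `tol` are assumptions of this sketch, needed because floating-point zeros are only approximate. Vertices are indexed from 0 in the code.

```python
import numpy as np


def graph_from_matrix(M, tol=1e-10):
    """Edges (u, v), u < v, whose off-diagonal entry is nonzero up to `tol` (0-based)."""
    n = M.shape[0]
    return {(u, v) for u in range(n) for v in range(u + 1, n) if abs(M[u, v]) > tol}


# Illustrative tridiagonal precision matrix (diagonally dominant, so positive definite).
K = np.array([[ 2.0, -0.8,  0.0,  0.0],
              [-0.8,  2.0, -0.8,  0.0],
              [ 0.0, -0.8,  2.0, -0.8],
              [ 0.0,  0.0, -0.8,  2.0]])
G_edges = graph_from_matrix(K)  # concentration graph: (u, v) not in E iff k_uv = 0
print(sorted(G_edges))          # [(0, 1), (1, 2), (2, 3)]

# For comparison, Sigma = K^{-1} has no zero entries, so X_0 and X_2 are
# marginally dependent even though they are conditionally independent given X_1, X_3.
Sigma = np.linalg.inv(K)
print(len(graph_from_matrix(Sigma)))  # 6: every pair of variables is correlated
```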
As seen earlier in (1.1), covariance graphs, on the other hand, are constructed using pairwise marginal independence relationships. It is also well known that, for multivariate Gaussian distributions, $X_u \perp\!\!\!\perp X_v \iff \sigma_{uv} = 0$. Hence, in the Gaussian case, the covariance graph $G_0 = (V, E_0)$ can be constructed using the following rule: $(u, v) \notin E_0 \iff \sigma_{uv} = 0$. It is also easily seen that Gaussian distributions satisfy the covariance intersection property defined in (1.5). Hence, Gaussian covariance graphs can also encode conditional independences according to the following rule: for any triplet $(A, B, S)$ of pairwise disjoint subsets of $V$, if $S$ separates $A$ and $B$ in the covariance graph $G_0$, then $X_A \perp\!\!\!\perp X_B \mid X_{V \setminus (A \cup B \cup S)}$.
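A minimal NumPy sketch of the covariance-graph rule and of the covariance global Markov property, using an illustrative tridiagonal $\Sigma$ (so $G_0$ is a path, 0-based labels). It relies on the standard Gaussian fact that $X_0 \perp\!\!\!\perp X_3 \mid X_2$ holds exactly when the entry for the pair $(0, 3)$ in the inverse of the covariance matrix of $(X_0, X_3, X_2)$ is zero.

```python
import numpy as np

# Illustrative covariance matrix: tridiagonal, so G_0 is the path 0 - 1 - 2 - 3.
Sigma = np.array([[1.0, 0.4, 0.0, 0.0],
                  [0.4, 1.0, 0.4, 0.0],
                  [0.0, 0.4, 1.0, 0.4],
                  [0.0, 0.0, 0.4, 1.0]])
n = Sigma.shape[0]
# Covariance graph: (u, v) is an edge iff sigma_uv != 0.
E0 = {(u, v) for u in range(n) for v in range(u + 1, n) if Sigma[u, v] != 0}
print(sorted(E0))  # [(0, 1), (1, 2), (2, 3)]

# S = {1} separates A = {0} from B = {3} in G_0, so the covariance global
# Markov property predicts X_0 indep X_3 given X_{V \ (A u B u S)} = X_2.
idx = [0, 3, 2]  # A, then B, then the conditioning set
K_sub = np.linalg.inv(Sigma[np.ix_(idx, idx)])
print(abs(K_sub[0, 1]) < 1e-12)  # True
```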
3. Gaussian Covariance Faithful Trees
We now proceed to study the faithfulness assumption in the context of multivariate Gaussian distributions when the associated covariance graphs are trees. The main result of this paper, presented in Theorem 3.1, proves that multivariate Gaussian probability distributions having tree covariance graphs are necessarily faithful to their covariance graphs; that is, all of the independences and dependences in $P$ can be read off by using graph separation. We now formally state Theorem 3.1. The proof follows after a series of lemmas and an illustrative example.
Theorem 3.1. Let $X = (X_v, v \in V)$ be a random vector with Gaussian distribution $P = \mathcal{N}_{|V|}(\mu, \Sigma)$. Let $G_0 = (V, E_0)$ be the covariance graph associated with $P$. If $G_0$ is a disjoint union of trees, then $P$ is covariance faithful to $G_0$.
The proof of Theorem 3.1 requires, among other results, a method to compute the covariance matrix $\Sigma$ from the precision matrix $K$ using the paths in the concentration graph $G$. The result can also be easily extended to show that the precision matrix $K$ can be computed from the covariance matrix $\Sigma$ using the paths in the covariance graph $G_0$. We now formally state this result.
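Lemma 3.2. Let $X = (X_v, v \in V)$ be a random vector with Gaussian distribution $P = \mathcal{N}_{|V|}(\mu, \Sigma)$, where $\Sigma$ and $K = \Sigma^{-1}$ are positive definite matrices. Let $G$ and $G_0$ denote, respectively, the concentration and covariance graph associated with the probability distribution of $X$. For all $(u, v)$ in $V \times V$,
$$\begin{aligned} \sigma_{uv} &= \sum_{p \in \mathcal{P}(u, v, G)} (-1)^{|p| + 1}\, k_{u_0 u_1} k_{u_1 u_2} \cdots k_{u_{n-1} u_n}\, \frac{\det(K \setminus p)}{\det(K)}, \\ k_{uv} &= \sum_{p \in \mathcal{P}(u, v, G_0)} (-1)^{|p| + 1}\, \sigma_{u_0 u_1} \sigma_{u_1 u_2} \cdots \sigma_{u_{n-1} u_n}\, \frac{\det(\Sigma \setminus p)}{\det(\Sigma)}, \end{aligned} \tag{3.1}$$
where, if $p = (u_0 = u, u_1, \ldots, u_n = v)$, $|p| = n + 1$ denotes the number of vertices of $p$, and $K \setminus p$ and $\Sigma \setminus p$ denote, respectively, $K$ and $\Sigma$ with the rows and columns corresponding to the variables in path $p$ omitted. For $u = v$, the only path is the trivial path $p = (u)$, and the empty product is taken to be 1. The determinant of a zero-dimensional matrix is defined to be 1.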
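The path-sum formula of Lemma 3.2 can be checked numerically. The following Python/NumPy sketch (the helper names and the random test matrix are illustrative assumptions, and exhaustive path enumeration limits it to small matrices) rebuilds $K = \Sigma^{-1}$ entry by entry from the paths of the covariance graph and compares the result with a direct inversion. A dense $\Sigma$ is used so that $G_0$ has cycles and several paths contribute to each entry.

```python
import numpy as np


def all_paths(adj, u, v):
    """All paths (sequences of distinct vertices) from u to v; [[u]] if u == v."""
    paths, stack = [], [(u, [u])]
    while stack:
        node, path = stack.pop()
        if node == v:
            paths.append(path)
            continue
        for w in adj[node]:
            if w not in path:
                stack.append((w, path + [w]))
    return paths


def precision_by_paths(Sigma, tol=1e-12):
    """K = Sigma^{-1} computed entrywise from the path sum of Lemma 3.2."""
    n = Sigma.shape[0]
    # Covariance graph G_0: u ~ v iff sigma_uv != 0 (up to tol).
    adj = {a: {b for b in range(n) if b != a and abs(Sigma[a, b]) > tol}
           for a in range(n)}
    det = np.linalg.det(Sigma)
    K = np.zeros((n, n))
    for u in range(n):
        for v in range(n):
            for p in all_paths(adj, u, v):
                # Product of covariances along the path (empty product = 1).
                prod = np.prod([Sigma[p[i], p[i + 1]] for i in range(len(p) - 1)])
                rest = [w for w in range(n) if w not in p]
                minor = np.linalg.det(Sigma[np.ix_(rest, rest)]) if rest else 1.0
                K[u, v] += (-1) ** (len(p) + 1) * prod * minor / det
    return K


rng = np.random.default_rng(0)
M = rng.normal(size=(4, 4))
Sigma = M @ M.T + 4 * np.eye(4)  # a generic (dense) positive definite matrix
print(np.allclose(precision_by_paths(Sigma), np.linalg.inv(Sigma)))  # True
```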
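The lemma above follows immediately from a basic result in linear algebra, which gives the cofactor expression for the inverse of a square matrix. In particular, for an invertible matrix $A = (a_{ij})$, its inverse can be expressed as follows:
$$(A^{-1})_{ij} = (-1)^{i + j}\, \frac{\det(A_{\setminus j, \setminus i})}{\det(A)}, \tag{3.2}$$
where $A_{\setminus j, \setminus i}$ denotes the matrix obtained from $A$ by deleting its $j$th row and $i$th column.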
A simple proof can be found in Brualdi and Cvetkovic [14]. The result has been rediscovered in other contexts (see [15]), but, as noted above, it follows immediately from the expression for the inverse of a matrix.
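The cofactor expression for the inverse can be verified directly. A minimal NumPy sketch (the function name `inverse_by_cofactors` and the test matrix are illustrative); a non-symmetric matrix is used so that the transposed index pattern, row $j$ and column $i$ deleted for entry $(i, j)$, is actually exercised.

```python
import numpy as np


def inverse_by_cofactors(A):
    """Inverse via the cofactor expression:
    (A^{-1})_{ij} = (-1)^{i+j} det(A without row j and column i) / det(A)."""
    n = A.shape[0]
    det = np.linalg.det(A)
    inv = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(A, j, axis=0), i, axis=1)  # drop row j, column i
            inv[i, j] = (-1) ** (i + j) * np.linalg.det(minor) / det
    return inv


A = np.array([[4.0, 1.0, 2.0],
              [0.5, 3.0, 1.0],
              [1.0, 0.0, 2.0]])  # an arbitrary invertible (non-symmetric) matrix
print(np.allclose(inverse_by_cofactors(A), np.linalg.inv(A)))  # True
```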
The proof of our main theorem (Theorem 3.1) also requires the results proved in the lemma below.
Lemma 3.3. Let $X = (X_v, v \in V)$ be a random vector with Gaussian distribution $P = \mathcal{N}_{|V|}(\mu, \Sigma)$. Let $G_0$ and $G$ denote, respectively, the covariance and concentration graphs associated with $P$. Then (i) $G_0$ and $G$ have the same connected components; (ii) if a given connected component in $G_0$ is a tree, then the corresponding connected component in $G$ is complete, and vice versa.
Proof. Proof of (i): the fact that $G_0$ and $G$ have the same connected components can be deduced from the matrix structure of the covariance and precision matrices. The connected components of $G_0$ correspond to diagonal blocks of $\Sigma$ (after a suitable permutation of the variables), with all entries outside these blocks equal to zero. Since $K = \Sigma^{-1}$, by properties of inverting block-diagonal matrices, $K$ has the same block-diagonal structure as $\Sigma$ in terms of the variables that constitute these blocks. These blocks correspond to distinct components in $G_0$ and $G$. Applying the same argument with the roles of $\Sigma$ and $K$ exchanged shows that no block can be split further, so both graphs have the same connected components.
Proof of (ii): let us assume now that the covariance graph $G_0$ is a tree; hence, it is a connected graph with only one connected component. We will prove that the concentration graph $G$ is complete by using Lemma 3.2 and computing any coefficient $k_{uv}$ ($u \neq v$). Since $G_0$ is a tree, there exists exactly one path between any two vertices $u$ and $v$. We will denote this path as $p = (u_0 = u, u_1, \ldots, u_n = v)$. Then, by Lemma 3.2,
$$k_{uv} = (-1)^{|p| + 1}\, \sigma_{u_0 u_1} \sigma_{u_1 u_2} \cdots \sigma_{u_{n-1} u_n}\, \frac{\det(\Sigma \setminus p)}{\det(\Sigma)}. \tag{3.3}$$
First, note that the determinants of the matrices in (3.3) are all positive since principal minors of positive definite matrices are positive. Second, since we are considering a path in $G_0$, $\sigma_{u_i u_{i+1}} \neq 0$ for $i = 0, \ldots, n - 1$. Using these two facts, we deduce from (3.3) that $k_{uv} \neq 0$ for all $u \neq v$. Hence, $u$ and $v$ are adjacent in $G$ for all $u \neq v$. The concentration graph $G$ is therefore complete. The proof that $G$ being a tree implies that $G_0$ is complete follows similarly.
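Both parts of Lemma 3.3 can be observed numerically. In the NumPy sketch below, the tree, its covariance values, and the two-component example are illustrative assumptions; unit variances with small covariances make each $\Sigma$ strictly diagonally dominant and hence positive definite.

```python
import numpy as np

# Illustrative tree covariance graph G_0 on vertices 0..5 (0-based):
# edges 0-1, 1-2, 1-3, 3-4, 3-5. Unit variances and small covariances make
# Sigma strictly diagonally dominant, hence positive definite.
tree_edges = {(0, 1): 0.3, (1, 2): -0.25, (1, 3): 0.2, (3, 4): 0.3, (3, 5): -0.3}
Sigma = np.eye(6)
for (u, v), s in tree_edges.items():
    Sigma[u, v] = Sigma[v, u] = s
K = np.linalg.inv(Sigma)
off_diag = ~np.eye(6, dtype=bool)
# Part (ii): every off-diagonal precision entry is nonzero, so G is complete.
print(np.all(np.abs(K[off_diag]) > 1e-10))  # True

# Part (i): a covariance graph with components {0, 1, 2} and {3, 4}.
Sigma2 = np.eye(5)
for (u, v), s in {(0, 1): 0.4, (1, 2): 0.4, (3, 4): 0.5}.items():
    Sigma2[u, v] = Sigma2[v, u] = s
K2 = np.linalg.inv(Sigma2)
print(np.allclose(K2[:3, 3:], 0.0))  # True: no edge of G joins the two components
```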
We now give an example illustrating the main result in this paper (Theorem 3.1).
Example 3.4. Consider a Gaussian random vector $X = (X_1, \ldots, X_8)$ with covariance matrix $\Sigma$ and its associated covariance graph $G_0$ (which is a tree) as given in Figure 1(a).
Consider the sets $A$, $B$, and $S$. Note that $S$ does not separate $A$ and $B$ in $G_0$, as there is a path from $A$ to $B$ that does not intersect $S$. Hence, we cannot use the covariance global Markov property to claim that $X_A$ is not independent of $X_B$ given $X_{V \setminus (A \cup B \cup S)}$. This is because the covariance global Markov property allows us to read conditional independences present in a distribution only when a separation is present in the graph. It is not an “if and only if” property, in the sense that the lack of a separation in the graph does not necessarily imply the lack of the corresponding conditional independence. We will show, however, that in this example $X_A$ is indeed not independent of $X_B$ given $X_{V \setminus (A \cup B \cup S)}$. In other words, we will show that the graph $G_0$ has the ability to capture this conditional dependence present in the probability distribution $P$.
Let us now examine the relationship between $X_A$ and $X_B$ given $X_{V \setminus (A \cup B \cup S)}$. Note that $A \cup B \cup (V \setminus (A \cup B \cup S)) = V \setminus S$. The covariance graph associated with the probability distribution of the random vector $X_{V \setminus S}$ is the subgraph represented in Figure 1(b) and can be obtained directly as the subgraph $G_0(V \setminus S)$ of $G_0$ induced by the subset $V \setminus S$.
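Since 2 and 5 are connected by exactly one path $p = (u_0 = 2, u_1, \ldots, u_n = 5)$ in $G_0(V \setminus S)$, the coefficient $k^{V \setminus S}_{25}$, that is, the coefficient between 2 and 5 in the inverse of the covariance matrix of $X_{V \setminus S}$, can be computed using Lemma 3.2 as follows:
$$k^{V \setminus S}_{25} = (-1)^{|p| + 1}\, \sigma_{u_0 u_1} \sigma_{u_1 u_2} \cdots \sigma_{u_{n-1} u_n}\, \frac{\det(\Sigma_{(V \setminus S) \setminus p})}{\det(\Sigma_{V \setminus S})}, \tag{3.4}$$
where $\Sigma_{V \setminus S}$ and $\Sigma_{(V \setminus S) \setminus p}$ are, respectively, the covariance matrices of the Gaussian random vectors $X_{V \setminus S}$ and $X_{(V \setminus S) \setminus p}$. Hence, $k^{V \setminus S}_{25} \neq 0$, since the right-hand side of (3.4) is different from zero. Hence, $X_2 \not\perp\!\!\!\perp X_5 \mid X_{V \setminus (S \cup \{2, 5\})}$.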
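Now, recall that, for any Gaussian random vector $X = (X_v, v \in V)$,
$$X_A \perp\!\!\!\perp X_B \mid X_C \iff X_u \perp\!\!\!\perp X_v \mid X_{(A \cup B \cup C) \setminus \{u, v\}} \text{ for all } (u, v) \in A \times B, \tag{3.5}$$
where $A$, $B$, and $C$ are pairwise disjoint subsets of $V$ (both sides are equivalent to the vanishing of the $(A, B)$ block of the inverse of the covariance matrix of $X_{A \cup B \cup C}$). The contrapositive of (3.5) yields
$$\exists\, (u, v) \in A \times B : X_u \not\perp\!\!\!\perp X_v \mid X_{(A \cup B \cup C) \setminus \{u, v\}} \implies X_A \not\perp\!\!\!\perp X_B \mid X_C. \tag{3.6}$$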
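Hence, applying the contrapositive of (3.5) to the pair $(2, 5)$ with $C = V \setminus (A \cup B \cup S)$, we conclude that $X_A$ is not independent of $X_B$ given $X_{V \setminus (A \cup B \cup S)}$, in agreement with the fact that $S$ does not separate $A$ and $B$. Thus, we obtain the desired result:
$$X_A \not\perp\!\!\!\perp X_B \mid X_{V \setminus (A \cup B \cup S)}. \tag{3.7}$$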
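Since the 8-vertex tree of Figure 1 is not reproduced here, the following NumPy sketch applies the same reasoning to an illustrative 6-vertex tree (0-based labels, values chosen to keep $\Sigma$ positive definite). The helper `cond_indep` is a hypothetical name implementing the block criterion behind (3.5): $X_A \perp\!\!\!\perp X_B \mid X_C$ iff every entry of the $(A, B)$ block of the inverse of the covariance matrix of $X_{A \cup B \cup C}$ vanishes.

```python
import numpy as np


def cond_indep(Sigma, A, B, C, tol=1e-10):
    """Gaussian test of X_A indep X_B | X_C: all entries of the (A, B) block of
    the inverse of Sigma restricted to A u B u C must vanish."""
    idx = list(A) + list(B) + list(C)
    K = np.linalg.inv(Sigma[np.ix_(idx, idx)])
    block = K[:len(A), len(A):len(A) + len(B)]
    return bool(np.all(np.abs(block) < tol))


# Illustrative 6-vertex tree: edges 0-1, 1-2, 1-3, 3-4, 3-5.
Sigma = np.eye(6)
for (u, v), s in {(0, 1): 0.3, (1, 2): -0.25, (1, 3): 0.2, (3, 4): 0.3, (3, 5): -0.3}.items():
    Sigma[u, v] = Sigma[v, u] = s
V = set(range(6))


def rest(A, B, S):
    """The conditioning set V minus (A u B u S)."""
    return sorted(V - set(A) - set(B) - set(S))


# S = {2} does not separate 0 and 4 (the path 0-1-3-4 avoids it): dependence.
print(cond_indep(Sigma, [0], [4], rest([0], [4], [2])))  # False
# S = {3} separates 0 and 4: conditional independence, as the Markov property predicts.
print(cond_indep(Sigma, [0], [4], rest([0], [4], [3])))  # True
```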
Figure 1: (a) An 8-vertex covariance tree $G_0$; (b) the subgraph $G_0(V \setminus S)$ induced by $V \setminus S$.
We now proceed to the proof of Theorem 3.1.
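Proof of Theorem 3.1. Without loss of generality, we assume that $G_0$ is a connected tree. Let us assume, to the contrary, that $P$ is not covariance faithful to $G_0$. Then there exists a triplet $(A, B, S)$ of pairwise disjoint subsets of $V$ such that $X_A \perp\!\!\!\perp X_B \mid X_{V \setminus (A \cup B \cup S)}$ but $S$ does not separate $A$ and $B$ in $G_0$, that is,
$$X_A \perp\!\!\!\perp X_B \mid X_{V \setminus (A \cup B \cup S)} \quad \text{and} \quad A \not\perp_{G_0} B \mid S. \tag{3.8}$$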
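As $S$ does not separate $A$ and $B$, and since $G_0$ is a connected tree, there exists a pair of vertices $(u, v) \in A \times B$ such that the single path $p$ connecting $u$ and $v$ in $G_0$ does not intersect $S$ (viewing $p$ as its set of vertices), that is, $p \cap S = \emptyset$. Hence, $p \subseteq V \setminus S = A \cup B \cup (V \setminus (A \cup B \cup S))$. Thus, two cases are possible with regard to where the path $p$ can lie: either $p \subseteq A \cup B$ or $p \cap (V \setminus (A \cup B \cup S)) \neq \emptyset$. Let us examine both cases separately.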
Case 1 ($p \subseteq A \cup B$). In this case, the entire path between $u$ and $v$ lies in $A \cup B$. Since $p$ starts at $u \in A$, ends at $v \in B$, and never leaves $A \cup B$, it must contain two consecutive vertices $u' \in A$ and $v' \in B$; hence, we can find a pair of vertices $(u', v') \in A \times B$ such that $u' \sim_{G_0} v'$. Recall that since $G_0$ is a tree, any subgraph of $G_0$ induced by a subset of $V$ is a union of tree connected components (see Lemma 2.1). Hence, the subgraph $G_0(V \setminus S)$ of $G_0$ induced by $V \setminus S$ is a union of tree connected components. As $u'$ and $v'$ are adjacent in $G_0$ and both lie in $V \setminus S$, they are also adjacent in $G_0(V \setminus S)$ and belong to the same connected component of $G_0(V \setminus S)$. Hence, the only path between $u'$ and $v'$ in $G_0(V \setminus S)$ is precisely the edge $(u', v')$. Using Lemma 3.2 to compute the coefficient $k^{V \setminus S}_{u'v'}$, that is, the $(u', v')$ coefficient in the inverse of the covariance matrix of the random vector $X_{V \setminus S}$, we obtain
$$k^{V \setminus S}_{u'v'} = -\sigma_{u'v'}\, \frac{\det(\Sigma_{(V \setminus S) \setminus \{u', v'\}})}{\det(\Sigma_{V \setminus S})}, \tag{3.9}$$
where $\Sigma_{V \setminus S}$ denotes the covariance matrix of $X_{V \setminus S}$ and $\Sigma_{(V \setminus S) \setminus \{u', v'\}}$ denotes the matrix $\Sigma_{V \setminus S}$ with the rows and the columns corresponding to variables $u'$ and $v'$ omitted. We can therefore deduce from (3.9) that $k^{V \setminus S}_{u'v'} \neq 0$, since $\sigma_{u'v'} \neq 0$ and both determinants are positive. Hence, $X_{u'} \not\perp\!\!\!\perp X_{v'} \mid X_{(V \setminus S) \setminus \{u', v'\}}$. Now, since $P$ is Gaussian, $u' \in A$, and $v' \in B$, we can apply (3.5) to arrive at a contradiction to our initial assumption in (3.8).

Note that, in the case where $V \setminus (A \cup B \cup S)$ is empty, the path $p$ has to lie entirely in $A \cup B$. This is because, by assumption, $p$ does not intersect $S$. The case when $p$ lies in $A \cup B$ is covered in Case 1, and hence it is assumed in what follows that $V \setminus (A \cup B \cup S) \neq \emptyset$.

Case 2 ($V \setminus (A \cup B \cup S) \neq \emptyset$ and $p \cap (V \setminus (A \cup B \cup S)) \neq \emptyset$). In this case, there exists a pair of vertices $(u', v') \in A \times B$ such that $u'$ and $v'$ are connected by exactly one path $p'$ in the subgraph $G_0(V \setminus S)$ of $G_0$ induced by $V \setminus S$ (see Lemma 2.1); for instance, $(u', v') = (u, v)$ and $p' = p$. (In our example in Figure 1, the vertices $u'$ and $v'$ correspond to vertices 2 and 7, respectively, and the path connecting them is entirely contained in $V \setminus S$.)
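Let us now use Lemma 3.2 to compute the coefficient $k^{V \setminus S}_{u'v'}$, that is, the $(u', v')$ coefficient in the inverse of the covariance matrix of the random vector $X_{V \setminus S}$. Writing $p' = (u_0 = u', u_1, \ldots, u_n = v')$, we obtain
$$k^{V \setminus S}_{u'v'} = (-1)^{|p'| + 1}\, \sigma_{u_0 u_1} \sigma_{u_1 u_2} \cdots \sigma_{u_{n-1} u_n}\, \frac{\det(\Sigma_{(V \setminus S) \setminus p'})}{\det(\Sigma_{V \setminus S})}, \tag{3.10}$$
where $\Sigma_{V \setminus S}$ denotes the covariance matrix of $X_{V \setminus S}$ and $\Sigma_{(V \setminus S) \setminus p'}$ denotes $\Sigma_{V \setminus S}$ with the rows and the columns corresponding to the variables in path $p'$ omitted. One can therefore easily deduce from (3.10) that $k^{V \setminus S}_{u'v'} \neq 0$. Thus, $X_{u'}$ is not independent of $X_{v'}$ given $X_{(V \setminus S) \setminus \{u', v'\}}$. Hence, since $u' \in A$ and $v' \in B$, applying (3.5) once more yields a contradiction to (3.8).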
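Theorem 3.1 can also be checked by brute force on a small example. The Python/NumPy sketch below is an illustration under stated assumptions, not part of the proof: it fixes one Gaussian distribution whose covariance graph is an illustrative 6-vertex tree, enumerates every assignment of the vertices to $A$, $B$, $S$, or the remaining set $V \setminus (A \cup B \cup S)$ with $A$ and $B$ nonempty, and counts triplets where separation and conditional independence disagree, which is what covariance faithfulness (1.8) rules out. The conditional independence test uses the block criterion behind (3.5) with a numerical tolerance.

```python
import itertools
from collections import deque

import numpy as np

# Illustrative Gaussian with a tree covariance graph on vertices 0..5:
# edges 0-1, 1-2, 1-3, 3-4, 3-5 (values chosen to keep Sigma positive definite).
n = 6
tree_edges = {(0, 1): 0.3, (1, 2): -0.25, (1, 3): 0.2, (3, 4): 0.3, (3, 5): -0.3}
Sigma = np.eye(n)
adj = {v: set() for v in range(n)}
for (u, v), s in tree_edges.items():
    Sigma[u, v] = Sigma[v, u] = s
    adj[u].add(v)
    adj[v].add(u)


def separates(A, B, S):
    """True if every path from A to B in G_0 meets S (BFS avoiding S)."""
    seen, queue = set(A), deque(A)
    while queue:
        u = queue.popleft()
        if u in B:
            return False
        for w in adj[u] - set(S) - seen:
            seen.add(w)
            queue.append(w)
    return True


def cond_indep(A, B, C, tol=1e-10):
    """X_A indep X_B | X_C iff the (A, B) block of inv(Sigma[A u B u C]) is zero."""
    idx = A + B + C
    K = np.linalg.inv(Sigma[np.ix_(idx, idx)])
    return bool(np.all(np.abs(K[:len(A), len(A):len(A) + len(B)]) < tol))


checked = violations = 0
# Assign every vertex to A, B, S or C = V minus (A u B u S).
for labels in itertools.product("ABSC", repeat=n):
    A, B, S, C = ([v for v in range(n) if labels[v] == c] for c in "ABSC")
    if not A or not B:
        continue
    checked += 1
    # Covariance faithfulness (1.8): separation <=> conditional independence.
    if separates(A, B, S) != cond_indep(A, B, C):
        violations += 1
print(checked, violations)  # 2702 0
```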
Remark 3.5. The dual result of the theorem above for the case of concentration trees was proved by Becker et al. [13]. We note, however, that the argument used in the proof of Theorem 3.1 cannot also be used to prove faithfulness of Gaussian distributions that have trees as concentration graphs. The reason for this is as follows. In our proof, we employed the fact that the subgraph of $G_0$ induced by a subset $U \subseteq V$ is also the covariance graph associated with the Gaussian subrandom vector of $X$ denoted by $X_U$. Hence, it was possible to compute the coefficient $k^{U}_{uv}$, which quantifies the conditional (in)dependence between $X_u$ and $X_v$ given $X_{U \setminus \{u, v\}}$, in terms of the paths in $G_0(U)$ and the coefficients of the covariance matrix of $X_U$. By contrast, in the case of concentration graphs, the subgraph of the concentration graph $G$ induced by $U$ is not in general the concentration graph of the random vector $X_U$. Hence, our approach is not directly applicable in the concentration graph setting.
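The asymmetry described in the remark is easy to see on three variables. In this NumPy sketch, the tridiagonal matrix `T` and the helper `edges_of` are illustrative assumptions: the same path pattern $0 - 1 - 2$ is used once as a covariance matrix and once as a precision matrix, and only variables $U = \{0, 2\}$ are retained. The induced subgraph on $U$ has no edge in both cases, but only the covariance side reproduces it.

```python
import numpy as np


def edges_of(M, tol=1e-10):
    """Off-diagonal support {(i, j): i < j, |M[i, j]| > tol}, 0-based positions."""
    m = M.shape[0]
    return {(i, j) for i in range(m) for j in range(i + 1, m) if abs(M[i, j]) > tol}


# Tridiagonal pattern on variables 0, 1, 2: a path 0 - 1 - 2 (values illustrative).
T = np.array([[1.0, 0.4, 0.0],
              [0.4, 1.0, 0.4],
              [0.0, 0.4, 1.0]])
U = [0, 2]  # keep variables 0 and 2; position pair (0, 1) below means (X_0, X_2)

# Covariance side: take Sigma = T, so G_0 is the path. The covariance matrix of
# X_U is Sigma[U, U], hence its covariance graph is the induced subgraph G_0(U).
cov_marginal = edges_of(T[np.ix_(U, U)])

# Concentration side: take K = T, so G is the path. The precision matrix of X_U
# is inv(Sigma[U, U]) with Sigma = inv(K), which is not K[U, U] in general.
Sigma = np.linalg.inv(T)
conc_marginal = edges_of(np.linalg.inv(Sigma[np.ix_(U, U)]))

print(cov_marginal, conc_marginal)  # set() {(0, 1)}
```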
In this note, we have considered the class of multivariate Gaussian distributions that are Markov with respect to covariance graphs and proved that Gaussian distributions which have trees as their covariance graphs are necessarily faithful. The method of proof used in the paper is also vastly different in nature from the proof of the analogous result for concentration graph models. Hence, the approach that is used could potentially have further implications. Future research in this area will explore whether the analysis presented in this paper can be extended to other classes of graphs or distributions.
D. Malouche was supported in part by a Fulbright Fellowship Grant 68434144. B. Rajaratnam was supported in part by NSF grants DMS0906392, DMS(CMG)1025465, AGS1003823, NSA H98230-11-1-0194, and SUFSC10-SUSHSTF09SMSCVISG0906.
[1] D. R. Cox and N. Wermuth, Multivariate Dependencies, vol. 67 of Monographs on Statistics and Applied Probability, Chapman & Hall, London, UK, 1996.
[2] S. L. Lauritzen, Graphical Models, vol. 17 of Oxford Statistical Science Series, The Clarendon Press, Oxford University Press, New York, NY, USA, 1996.
[3] J. Whittaker, Graphical Models in Applied Multivariate Statistics, Wiley Series in Probability and Mathematical Statistics: Probability and Mathematical Statistics, John Wiley & Sons Ltd., Chichester, UK, 1990.
[4] D. Edwards, Introduction to Graphical Modelling, Springer Texts in Statistics, Springer, New York, NY, USA, 2nd edition, 2000.
[5] B. Rajaratnam, H. Massam, and C. M. Carvalho, “Flexible covariance estimation in graphical Gaussian models,” The Annals of Statistics, vol. 36, no. 6, pp. 2818–2849, 2008.
[6] G. Kauermann, “On a dualization of graphical Gaussian models,” Scandinavian Journal of Statistics, vol. 23, no. 1, pp. 105–116, 1996.
[7] M. Banerjee and T. Richardson, “On a dualization of graphical Gaussian models: a correction note,” Scandinavian Journal of Statistics, vol. 30, no. 4, pp. 817–820, 2003.
[8] N. Wermuth, D. R. Cox, and G. M. Marchetti, “Covariance chains,” Bernoulli, vol. 12, no. 5, pp. 841–862, 2006.
[9] D. Malouche, “Determining full conditional independence by low-order conditioning,” Bernoulli, vol. 15, no. 4, pp. 1179–1189, 2009.
[10] K. Khare and B. Rajaratnam, “Covariance trees and Wishart distributions on cones,” in Algebraic Methods in Statistics and Probability II, vol. 516 of Contemporary Mathematics, pp. 215–223, American Mathematical Society, Providence, RI, USA, 2010.
[11] K. Khare and B. Rajaratnam, “Wishart distributions for decomposable covariance graph models,” The Annals of Statistics, vol. 39, no. 1, pp. 514–555, 2011.
[12] M. Studený, Probabilistic Conditional Independence Structures, Springer, New York, NY, USA, 2004.
[13] A. Becker, D. Geiger, and C. Meek, “Perfect tree-like Markovian distributions,” Probability and Mathematical Statistics, vol. 25, no. 2, pp. 231–239, 2005.
[14] R. A. Brualdi and D. Cvetkovic, A Combinatorial Approach to Matrix Theory and Its Applications, Chapman & Hall/CRC, New York, NY, USA, 2008.
[15] B. Jones and M. West, “Covariance decomposition in undirected Gaussian graphical models,” Biometrika, vol. 92, no. 4, pp. 779–786, 2005.
Copyright © 2011 Dhafer Malouche and Bala Rajaratnam. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.