Qiang Zhao, Panpan Luo, Qing Xi, "Local Structure Recovery of Chain Graphs after Marginalization", Discrete Dynamics in Nature and Society, vol. 2015, Article ID 982495, 5 pages, 2015. https://doi.org/10.1155/2015/982495
Local Structure Recovery of Chain Graphs after Marginalization
This paper discusses local structure recovery of chain graphs (CGs) when there exist unobserved or latent variables or after marginalization over observed variables. Under a condition presented in this paper, it is explained which edges and directions of edges in local structure can be recovered validly and which cannot after marginalization.
Graphical models, including independence graphs (Markov networks), directed acyclic graphs (DAGs, also known as Bayesian networks), and chain graphs (CGs), have been applied widely in many fields, such as stochastic systems, data mining, pattern recognition, artificial intelligence, and causal discovery. CGs are widely used to represent independence, conditional independence, and causal relationships among random variables [1–5]. Structure recovery of CGs has been discussed by many authors [4–6]. A statistical conclusion may be reversed after marginalization over some variables, which is called the Yule-Simpson paradox [7, 8]. For sampling design, since some variables may be unobserved, prior knowledge or assumptions on models, such as the faithfulness assumption [4, 5] and collapsibility [9, 10], are necessary for valid structural learning of CGs and for parameter estimation. On the other hand, for data analysis, conditions are necessary for marginalizing over some observed variables. Various conditions have been presented for avoiding the reversal of a statistical conclusion about association and about parameters of linear models [10–14]. Collapsibility of parameter estimates for undirected graphical models over some variables has been discussed in [9, 15–17], and that of DAGs has been discussed in [18, 19].
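The reversal under marginalization mentioned above (the Yule-Simpson paradox) can be seen in a small numerical sketch. The counts below are purely illustrative and are not taken from this paper; the names `counts`, `stratum_rate`, and `marginal_rate` are hypothetical. Exact fractions are used so that the comparisons do not depend on floating-point rounding.

```python
from fractions import Fraction

# Illustrative (successes, total) counts per treatment and stratum.
# These numbers are an assumption made for the example, not data from the paper.
counts = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}

def stratum_rate(treatment, stratum):
    """Success rate of a treatment within one stratum."""
    successes, total = counts[treatment][stratum]
    return Fraction(successes, total)

def marginal_rate(treatment):
    """Success rate after marginalizing over the stratum variable."""
    successes = sum(s for s, _ in counts[treatment].values())
    total = sum(t for _, t in counts[treatment].values())
    return Fraction(successes, total)

# Treatment A has the higher rate within every stratum ...
a_better_in_each_stratum = all(
    stratum_rate("A", s) > stratum_rate("B", s) for s in ("small", "large")
)
# ... yet treatment B has the higher rate once the stratum is summed out.
b_better_marginally = marginal_rate("B") > marginal_rate("A")

print(a_better_in_each_stratum, b_better_marginally)
```

With these counts the conditional comparison and the marginal comparison point in opposite directions, which is the kind of reversal that collapsibility conditions are meant to rule out.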
In this paper, we discuss local structure recovery for a CG when there exist unobserved or latent variables or after marginalization over observed variables. Suppose that there is an unknown true CG with a large number of variables but we may be interested in construction of a local structure of the CG from a marginal distribution of a subset of variables. We present a condition for this localized recovery and explain which edges and directions of edges in local structure can be recovered validly from the marginal distribution and which edges may be spurious. We say that an edge or a direction is recovered validly from the marginal distribution if it is the same as that recovered from the joint distribution. The condition is useful for both sampling design and data analysis. This localization of structure recovery is related to identification and collapsibility.
2. Notation and Definitions
Let $G = (V, E)$ be a graph with a vertex set $V$ and an edge set $E$. We say that there is an undirected edge, or a line, between vertex $u$ and vertex $v$ (denoted as $u - v$) if $(u, v) \in E$ and $(v, u) \in E$, and that there is a directed edge, or an arrow, from vertex $u$ to $v$ (denoted as $u \to v$) if $(u, v) \in E$ and $(v, u) \notin E$. Chain components of $G$ are obtained by removing all arrows in $G$ and taking the connectivity components of the remaining undirected graph. A chain graph with only undirected edges is known as an undirected graph (UG). A chain graph with only directed edges and without any directed cycles is known as a directed acyclic graph (DAG).
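As a minimal sketch of how chain components can be computed, the following Python function assumes the common encoding in which the edge set is a set of ordered pairs, a line between two vertices is stored in both orientations, and an arrow is stored in one orientation only. The function name `chain_components` and the small example graph are hypothetical.

```python
from collections import defaultdict

def chain_components(vertices, edges):
    """Return the chain components as a list of frozensets.

    Assumed encoding: a line u - v is stored as both (u, v) and (v, u);
    an arrow u -> v is stored as (u, v) only.
    """
    # Keep only the lines; pairs present in one orientation (arrows) are dropped.
    lines = defaultdict(set)
    for u, v in edges:
        if (v, u) in edges:
            lines[u].add(v)

    seen, components = set(), []
    for start in vertices:
        if start in seen:
            continue
        # Depth-first search that follows undirected edges only.
        stack, component = [start], set()
        while stack:
            x = stack.pop()
            if x in component:
                continue
            component.add(x)
            stack.extend(lines[x] - component)
        seen |= component
        components.append(frozenset(component))
    return components

# Hypothetical example: a -> b, b - c, d -> c.
V = ["a", "b", "c", "d"]
E = {("a", "b"), ("b", "c"), ("c", "b"), ("d", "c")}
components = chain_components(V, E)
print(components)
```

In this example the vertices joined by the single line end up in one component, and each remaining vertex forms a component of its own.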
If $u \to v$, then we call $u$ a parent of $v$ and $v$ a child of $u$. In general, we use $pa_G(v)$ and $ch_G(v)$ to represent the sets of parents and children of $v$, respectively. If $u - v$, then we call $u$ a neighbour of $v$. We use $ne_G(v)$ to represent the set of neighbours of vertex $v$ in $G$. If there is an edge between the vertices $u$ and $v$, then we say that $u$ and $v$ are joined or adjacent. The family of $v$ contains $v$ and its parents and is denoted as $fa_G(v) = \{v\} \cup pa_G(v)$. For a set $A \subseteq V$, we can define $pa_G(A)$, $ch_G(A)$, and $ne_G(A)$ similarly. The boundary of $v$ is the set of neighbours and parents of $v$, which is denoted as $bd_G(v) = pa_G(v) \cup ne_G(v)$. Moreover, the boundary of a set $A$ is defined as $bd_G(A) = \bigcup_{v \in A} bd_G(v) \setminus A$. When the underlying CG is clearly specified, the subscript $G$ is often omitted to simplify the notation.
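The neighbourhood notions above translate directly into set comprehensions. The sketch below is only illustrative: it assumes that edges are stored as ordered pairs (a line in both orientations, an arrow in one), and it takes the boundary of a set to be the union of the boundaries of its members with the set itself removed, which is the usual convention but is an assumption here. All function names are hypothetical.

```python
def parents(v, edges):
    """Vertices u with an arrow u -> v."""
    return {u for (u, w) in edges if w == v and (v, u) not in edges}

def children(v, edges):
    """Vertices w with an arrow v -> w."""
    return {w for (u, w) in edges if u == v and (w, v) not in edges}

def neighbours(v, edges):
    """Vertices w joined to v by a line v - w."""
    return {w for (u, w) in edges if u == v and (w, v) in edges}

def family(v, edges):
    """The vertex itself together with its parents."""
    return {v} | parents(v, edges)

def boundary(v, edges):
    """Parents and neighbours of a single vertex."""
    return parents(v, edges) | neighbours(v, edges)

def boundary_of_set(vertex_set, edges):
    """Union of the members' boundaries, excluding the set itself (assumed convention)."""
    vertex_set = set(vertex_set)
    union = set()
    for v in vertex_set:
        union |= boundary(v, edges)
    return union - vertex_set

# Hypothetical example: a -> b, b - c, d -> c.
E = {("a", "b"), ("b", "c"), ("c", "b"), ("d", "c")}
print(parents("b", E), neighbours("b", E), boundary_of_set({"b", "c"}, E))
```

Scanning the whole edge set on every call is quadratic at worst; for larger graphs an adjacency index built once would be the more natural design.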
If there is a sequence of the vertices , , with or , for all , then we call it a route and and the ends of the route. Furthermore if for all there is , we say the route is descending and use to denote a descending route from vertex to . If the vertices in a route are all distinct from each other, then we call the route a path. If there is a descending path from vertex to , then we call an ancestor of vertex and a descendant of vertex , which are denoted as and , respectively. Similarly, the ancestral set of a set is defined as , and the descendant set is defined as . Furthermore, we define . A vertex without any children in is called a terminal. Besides, if , we call a terminal set. We call a route a pseudocycle if it satisfies . Besides, it is called a cycle if it also satisfies and are distinct vertices. A cycle or pseudocycle is directed if it is descending and has for some . Finally, if a graph does not contain any directed cycles or directed pseudocycles, we call it a chain graph (CG).
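The defining property of a chain graph can be tested mechanically. The sketch below is one possible check, not a procedure given in this paper. It assumes the ordered-pair encoding of edges (a line stored in both orientations, an arrow in one) and that every edge endpoint is listed in `vertices`. It relies on the observation that a directed cycle or pseudocycle exists exactly when, after contracting each connectivity component of the undirected part to a single node, the arrows form a self-loop or a directed cycle. The function name `is_chain_graph` is hypothetical.

```python
from collections import defaultdict

def is_chain_graph(vertices, edges):
    """True if the graph has no directed cycle or directed pseudocycle."""
    # Label every vertex with a representative of its undirected component.
    lines = defaultdict(set)
    for u, v in edges:
        if (v, u) in edges:
            lines[u].add(v)
    label = {}
    for start in vertices:
        if start in label:
            continue
        stack = [start]
        while stack:
            x = stack.pop()
            if x not in label:
                label[x] = start
                stack.extend(lines[x])

    # Arrows induce a directed graph on the component labels.
    successors = defaultdict(set)
    indegree = {label[v]: 0 for v in vertices}
    for u, v in edges:
        if (v, u) not in edges:
            a, b = label[u], label[v]
            if a == b:
                return False  # an arrow inside one component closes a directed cycle
            if b not in successors[a]:
                successors[a].add(b)
                indegree[b] += 1

    # Kahn's algorithm: the component graph is acyclic iff every node can be removed.
    queue = [c for c, d in indegree.items() if d == 0]
    removed = 0
    while queue:
        c = queue.pop()
        removed += 1
        for d in successors[c]:
            indegree[d] -= 1
            if indegree[d] == 0:
                queue.append(d)
    return removed == len(indegree)

V = ["a", "b", "c", "d"]
cg = {("a", "b"), ("b", "c"), ("c", "b"), ("d", "c")}      # a -> b, b - c, d -> c
not_cg = {("a", "b"), ("b", "c"), ("c", "b"), ("c", "a")}  # a -> b, b - c, c -> a
print(is_chain_graph(V, cg), is_chain_graph(V, not_cg))
```

The second example contains the descending route from a to b, along the line to c, and back to a by an arrow, so it is rejected.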
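A section of a route $\rho = (v_0, \ldots, v_k)$ in $G$ is any maximal undirected subroute of $\rho$, for example, $\sigma : v_i - \cdots - v_j$ with $0 \le i \le j \le k$. We call $v_i$ and $v_j$ the two ends of the section. If $v_{i-1} \to v_i$ in $G$ for some $i > 0$ (or $v_j \leftarrow v_{j+1}$ in $G$ for some $j < k$), then we call $v_i$ (or $v_j$) a head-terminal; otherwise we call it a tail-terminal. A section is a head-to-head section with respect to the route $\rho$ if it has two head-terminals, and a non-head-to-head section if it has at most one head-terminal. We use $V_\sigma$ to denote the set of vertices of a section $\sigma$. For a set $S \subseteq V$, if $V_\sigma \cap S = \emptyset$, then we say that the section $\sigma$ is outside of the set $S$; if $V_\sigma \cap S \neq \emptyset$, then we say that $\sigma$ is hit by the set $S$.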
Another important concept in CGs is the complex. A complex in $G$ is a special path $(v_0, \ldots, v_k)$, $k \ge 2$, with $v_0 \to v_1$, $v_i - v_{i+1}$ for $i = 1, \ldots, k-2$, and $v_{k-1} \leftarrow v_k$ in $G$, where there is no extra edge among the vertices on the path. The vertices $v_0$ and $v_k$ are called the parents of the complex, and the set $\{v_1, \ldots, v_{k-1}\}$ is called the region of the complex. Moreover, the complex with vertex as one of its parents is denoted by . Finally, the Markov blanket of vertex is defined as .
Next, we give the definition of c-separation in CGs. A route $\rho$ is said to be c-separated by a set $S$ in $G$ if it contains a section $\sigma$ that satisfies one of the following two conditions: (1) $\sigma$ is a head-to-head section with respect to $\rho$ and $\sigma$ is outside of the set $S$, or (2) $\sigma$ is not a head-to-head section with respect to $\rho$ and $\sigma$ is hit by the set $S$. Moreover, we say that two disjoint sets $A$ and $B$ in $G$ are c-separated by a set $S$ disjoint from both if every route between $A$ and $B$ is c-separated by $S$; in that case $A$, $B$, and $S$ form a c-separation.
If two CGs share the same c-separation patterns over the same set of vertices, then we say that they are Markov equivalent. All CGs that are Markov equivalent to each other form an equivalence class, known as a Markov equivalence class. It is well known that two CGs are Markov equivalent if and only if they share the same global skeleton and the same complexes.
If a probability distribution over permits the following factorization , where is the collection of chain components of and after given , the conditional probability distribution of is denoted as ; then we say that is a compatible probability distribution in reference to . It is easy to check that if is a compatible probability distribution in reference to , we have where represents the conditional independence between and given in . If the condition is strengthened to then we say that is faithful to . Let be the family of compatible probability distributions in reference to . A chain graph model is defined as , where is a family of probability distributions that are compatible with .
Example 1. Consider the CG in Figure 1, where and , . We have , , , , , , , and . and are complexes, where and , respectively. The path between and is c-separated by or , while the path between and is c-separated by the empty set. The sets and are c-separated by the set , and form a c-separation .
3. Local Structure Recovery after Marginalization
Let , , and be a partition of all variables in . In this section, we assume that and suppose that variables in are omitted or unobservable. The assumption of conditional independence has been discussed as one of conditions for collapsibility of parameter estimates over unobserved variables [9–12, 14, 18, 19]. Collapsibility of parameter estimates further requires another condition that the separator is a complete subgraph in the moral graph. A decomposition approach of structural learning proposed in  also requires the condition of a complete separator. In this section, we show under the assumption of conditional independence but without the condition of a complete separator that the local structure of over can be partially recovered from the marginal distribution of observed variables in . In many practical applications, the conditional independence can be judged with domain or prior knowledge, such as Markov chain, chain graphical models, and dynamic or temporal models . When all variables in the full set are observed, we can first construct an undirected independence graph over from the observed data; then we find a set which separates and in the undirected graph , and thus we have that holds [2, 3].
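The last step described above, checking whether a candidate set separates two groups of variables in an undirected independence graph, amounts to a reachability test. The following sketch is a minimal illustration under stated assumptions: the graph is given as an adjacency dictionary, and the three vertex sets are passed as the hypothetical parameters `source`, `target`, and `separator`. It does not cover how the undirected independence graph itself is estimated from data.

```python
from collections import deque

def separates(adjacency, source, target, separator):
    """True if every path from `source` to `target` passes through `separator`.

    `adjacency` maps each vertex to the set of vertices it shares a line with.
    """
    source, target, separator = set(source), set(target), set(separator)
    # Breadth-first search from the source set that never enters the separator.
    frontier = deque(source - separator)
    reached = set(frontier)
    while frontier:
        x = frontier.popleft()
        if x in target:
            return False  # found a path that avoids the separator
        for y in adjacency.get(x, ()):
            if y not in reached and y not in separator:
                reached.add(y)
                frontier.append(y)
    return True

# Hypothetical undirected graph: a - c - b, plus a - d.
adjacency = {"a": {"c", "d"}, "c": {"a", "b"}, "b": {"c"}, "d": {"a"}}
print(separates(adjacency, {"a", "d"}, {"b"}, {"c"}))
print(separates(adjacency, {"a", "d"}, {"b"}, set()))
```

A set that passes this test in the undirected independence graph is a candidate for the separating role that the conditional independence assumption of this section requires.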
Two CGs have the same Markov property if and only if they have the same skeleton (i.e., the undirected version of the CG) and the same complexes. Thus, for recovering the structure of a CG, we can only learn the skeleton and complexes from a distribution of observed variables. A marginal distribution obtained from a distribution with the Markov property of a CG may not obey the Markov property of any CG. Although the class of CGs is not closed under marginalization in this sense, we show that the local structure of a CG may be partially recovered under some conditions. First we discuss which edges of local structure over can be recovered validly and which cannot from a marginal distribution of .
Theorem 2. Two vertices in are c-separated by a subset of if and only if they are c-separated by a subset of .
Proof. The sufficiency is obvious since . For the necessity, let and be two vertices in that are c-separated by . Thus there is no edge connecting and in . Since and are contained in , and are contained in , otherwise . Without loss of generality, suppose that is not an ancestor of . Thus we have that c-separates and .
From Theorem 2, we can see that the existence of edges falling into can be determined validly from the marginal distribution of .
Theorem 3. Let and be two vertices in and , respectively. Then and are c-separated by a subset of if and only if they are c-separated by a subset of .
Proof. The sufficiency is trivial. For the necessity, we show below that can c-separate every route connecting and . We consider the following two cases separately: (1) a route is contained in , (2) a route is not completely contained in .
We first discuss case (1). Let . Note that must not be the vertex since there is no edge between and , but may be . Further we consider two subcases: (1.1) is not contained in any head-to-head section, (1.2) is contained in some head-to-head section.
For subcase (1.1), we have for and thus for . So the path is c-separated by node .
For subcase (1.2), let us suppose the head-to-head section on route is , where is the other parent of this head-to-head section. Since is contained in , then ; thus we have the fact that the node must not be . Otherwise, and , which contradicts . Because and , we have that . Thus and then . Thus we have that the route is c-separated by node .
Now we consider case (2) and show first that such a route contains a head-to-head section. Let such that and are contained in but no vertices from to are contained in since is not completely contained in . We know that the arrows must be oriented as and since and . Then there must be a head-to-head section between and on , and none of its descendants is in . Thus vertices on the region of this head-to-head section and its descendants are not in , and we obtain that is c-separated by . This proves the theorem.
According to Theorem 3, the existence of edges crossing and can also be determined validly from the marginal distribution of .
Example 1 (continued). Consider again the CG in Figure 1. Let , , and . We have . Thus according to Theorems 2 and 3, we can obtain a local skeleton from the marginal distribution of , which may have spurious edges, as shown in Figure 2. Similarly, suppose that variables in are observed but those in are not observed. Then we obtain the local skeleton from the marginal distribution of , as shown in Figure 3. Note that the edges , , and falling into are spurious in Figure 2, but they are absent in Figure 3.
Next we discuss recovery of complexes from a marginal distribution of . We say that a complex () can be determined validly from the marginal distribution of if the marginal distribution satisfies the following two conditions: (1) for some and (2) and . Below we discuss which complexes of local structure can be determined validly from the marginal distribution and which cannot. The following two theorems give conditions for determining complexes and for the validity of determined complexes, respectively.
Theorem 4. If at most one vertex of a complex is not contained in , then the complex can be determined from the marginal distribution of .
Proof. Let () be a complex where and are the parents of this complex and at most one of these vertices is not contained in . It follows from that . From Theorems 2 and 3, we can validly determine the presence of edges and find a subset of that c-separates and and does not contain the vertices from to . Thus this complex can be determined from the marginal distribution of .
Theorem 5. If two parents of a complex are contained in , then the directions of the complex can be determined from the marginal distribution of .
Proof. Let be a complex where and are the parents of this complex and . It follows from that ( and may be the same vertex). Rewrite this complex as , where and . Then we can determine the presence of edge from the marginal distribution of . From Theorems 2 and 3, we can validly determine the presence of edges and find a subset of that c-separates and and contains neither the vertex nor . Thus the directions of this complex, and , can be determined from the marginal distribution of .
According to Theorem 4, from the marginal distribution of , we can determine all complexes in the local structure which have at most one vertex in . When two parents of a complex are contained in , it may not be determined since the two parents may not be c-separated by any subset of . From Theorem 5, however, if a complex whose two parents are contained in is determined from the marginal distribution of , the directions of the complex must be valid. A complex that has more than one vertex falling into may be spurious since the edges falling into may be spurious.
From Theorems 2 to 5, we obtain an approach for local structure recovery, in which edges are recovered according to Theorems 2 and 3, and then directions are determined according to Theorems 4 and 5. Suppose that we are interested in the local structure recovery of a CG over a set of variables. Then we must find a set to be observed, based on domain or prior knowledge, such that is large enough to c-separate from the set of unobserved variables.
Example 1 (continued). Now we search for complexes in Figures 2 and 3. For Figure 2, suppose that variable is not observed. According to Theorem 4, three complexes , , and can be found from the marginal distribution of , as shown in Figure 4. The complexes and cannot be found since there exist spurious edges between their parents, although the directions of and are oriented with other complexes. For Figure 3, there is no complex in this local structure because variables , and are conditionally independent given variable .
If variable is omitted or unobservable, we only obtain the local structure in Figure 4, in which all directions and all edges not falling into are valid, but edges falling into (here , , and ) may be spurious.
We showed that the conditional independence is a sufficient condition for local structure recovery when there exist unobserved or latent variables or after marginalization over observed variables. The conditional independence is important prior knowledge for local structure recovery. This prior knowledge may hold in many cases, such as Markov chains, chain graphical models, and dynamic or temporal models. Suppose that we are interested in the local structure of a CG over a set of variables. We must find a set which is large enough to separate from the set of unobserved variables. Based on the conditional independence , we explained which edges and directions of edges in local structure can be recovered validly and which cannot after marginalization.
Based on the theoretical results presented in this paper, we can efficiently recover local structures of a CG. Domain or prior knowledge of conditional independencies can be utilized to facilitate the structural recovery. The theoretical results can also be used for observational study design and split questionnaire survey sampling [22, 23].
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
This research was supported by NSFC (11001155), Doctoral Fund of Shandong Province (BS2010SW030), and A Project of Shandong Province Higher Educational Science and Technology Program (J11LA08).
[1] D. R. Cox and N. Wermuth, Multivariate Dependencies: Models, Analysis, and Interpretation, Chapman and Hall, London, UK, 1996.
[2] D. Edwards, Introduction to Graphical Modelling, Springer Texts in Statistics, Springer, New York, NY, USA, 2nd edition, 2000.
[3] S. L. Lauritzen, Graphical Models, vol. 17 of Oxford Statistical Science Series, Oxford University Press, 1996.
[4] J. Pearl, Causality: Models, Reasoning, and Inference, Cambridge University Press, Cambridge, UK, 2000.
[5] P. Spirtes, C. Glymour, and R. Scheines, Causation, Prediction and Search, vol. 81 of Lecture Notes in Statistics, Springer, New York, NY, USA, 1993.
[6] R. G. Cowell, A. P. Dawid, S. L. Lauritzen, and D. J. Spiegelhalter, Probabilistic Networks and Expert Systems, Springer, New York, NY, USA, 1999.
[7] E. H. Simpson, “The interpretation of interaction in contingency tables,” Journal of the Royal Statistical Society, Series B: Methodological, vol. 13, pp. 238–241, 1951.
[8] G. U. Yule, “Notes on the theory of association of attributes in statistics,” Biometrika, vol. 2, no. 2, pp. 121–134, 1903.
[9] D. Madigan and K. Mosurski, “An extension of the results of Asmussen and Edwards on collapsibility in contingency tables,” Biometrika, vol. 77, no. 2, pp. 315–319, 1990.
[10] N. Wermuth, “Parametric collapsibility and the lack of moderating effects in contingency tables with a dichotomous response variable,” Journal of the Royal Statistical Society, Series B: Methodological, vol. 49, no. 3, pp. 353–364, 1987.
[11] D. R. Cox and N. Wermuth, “A general condition for avoiding effect reversal after marginalization,” Journal of the Royal Statistical Society, Series B: Statistical Methodology, vol. 65, no. 4, pp. 937–941, 2003.
[12] Z. Geng, “Collapsibility of relative risk in contingency tables with a response variable,” Journal of the Royal Statistical Society, Series B: Methodological, vol. 54, no. 2, pp. 585–593, 1992.
[13] N. Wermuth, “Moderating effects of subgroups in linear models,” Biometrika, vol. 76, no. 1, pp. 81–92, 1989.
[14] A. S. Whittemore, “Collapsibility of multidimensional contingency tables,” Journal of the Royal Statistical Society, Series B: Methodological, vol. 40, no. 3, pp. 328–340, 1978.
[15] S. Asmussen and D. Edwards, “Collapsibility and response variables in contingency tables,” Biometrika, vol. 70, no. 3, pp. 567–578, 1983.
[16] M. Frydenberg, “Marginalization and collapsibility in graphical interaction models,” The Annals of Statistics, vol. 18, no. 2, pp. 790–805, 1990.
[17] Z. Geng, K. Wan, and T. Feng, “Mixed graphical models with missing data and the partial imputation EM algorithm,” Scandinavian Journal of Statistics, vol. 27, no. 3, pp. 433–444, 2000.
[18] S.-H. Kim and S.-H. Kim, “A note on collapsibility in DAG models of contingency tables,” Scandinavian Journal of Statistics, vol. 33, no. 3, pp. 575–590, 2006.
[19] X. Xie and Z. Geng, “Collapsibility for directed acyclic graphs,” Scandinavian Journal of Statistics, vol. 36, no. 2, pp. 185–203, 2009.
[20] M. Studeny, “A recovery algorithm for chain graphs,” International Journal of Approximate Reasoning, vol. 17, no. 2-3, pp. 265–293, 1997.
[21] Z. Geng, C. Wang, and Q. Zhao, “Decomposition of search for v-structures in DAGs,” Journal of Multivariate Analysis, vol. 96, no. 2, pp. 282–294, 2005.
[22] R. J. A. Little and D. B. Rubin, Statistical Analysis with Missing Data, Wiley, New York, NY, USA, 2nd edition, 2002.
[23] S. Rässler, Statistical Matching, vol. 168 of Lecture Notes in Statistics, Springer, New York, NY, USA, 2002.
Copyright © 2015 Qiang Zhao et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.