#### Abstract

We propose an approach for structural learning of directed acyclic graphs from multiple databases. We first learn a local structure from each database separately, and then we combine these local structures together to construct a global graph over all variables. In our approach, we do not require conditional independence, which is a basic assumption in most methods.

#### 1. Introduction

Graphical models, including independence graphs, directed acyclic graphs (DAGs), and Bayesian networks, have been applied widely in many fields, such as data mining, pattern recognition, artificial intelligence, complex systems, and causal discovery [1–4]. Graphical models can be used to cope with uncertainty in a large system with a great number of variables. Structural learning of graphical models from data is an important and difficult problem and has been discussed by many authors [1–5]. There are two main kinds of structural learning methods: constraint-based learning and score-based learning. Most structural learning approaches deal with only one database with completely observed data. With the development and popularity of computers, various databases have been built, which may contain different sets of variables and overlap with each other. For example, in medical research, one researcher may collect data on some variables while another collects data on other variables, and the two sets share some common variables.

In this paper, we discuss how to learn the structures of DAGs from multiple databases with different and overlapping variables. In our approach, we first learn a local structure from each database separately, and then we combine these structures to construct a global graph over all variables. Several theoretical results establish the validity of our algorithm, so the approach can validly discover DAGs from multiple databases. We need only a weaker condition than conditional independence, which is a basic assumption in most methods [1–5]. The approach can also use prior knowledge of conditional independencies to reduce the number of variables in each conditioning set.

Section 2 gives notation and definitions. In Section 3, we show how to construct the DAG from multiple databases. We give an example in Section 4 to illustrate our approach for recovering a DAG. Finally, a discussion is given in Section 5.

#### 2. Notation and Definitions

Let $G = (V, E)$ denote a DAG where $V$ is a set of vertices and $E$ is a set of directed edges. A directed edge from a vertex $u$ to another vertex $v$ is denoted by $u \to v$, and we say that $u$ is a parent of $v$ and $v$ is a child of $u$. We denote the set of all parents of a vertex $v$ by $pa(v)$. A path $l$ between two distinct vertices $u$ and $v$ is a sequence of distinct vertices that starts with $u$, ends with $v$, and in which two consecutive vertices are connected by an edge; that is, $l = (c_0 = u, c_1, \ldots, c_{n-1}, c_n = v)$, where $c_{i-1} \to c_i$ or $c_i \to c_{i-1}$ is contained in $E$, for $i = 1, \ldots, n$ ($n \ge 1$), and $c_i \ne c_j$, for all $i \ne j$. We say that $u$ is an ancestor of $v$ and $v$ is a descendant of $u$ if there is a path from $u$ to $v$ in $G$ and all edges on this path point in the direction toward $v$. Denote the set of ancestors of $v$ as $an(v)$. A path $l$ is said to be separated by a set $S$ of vertices if and only if there exists at least one $c_i$, $1 \le i \le n-1$, such that $c_i \in S$. And $l$ is said to be $d$-separated by $S$ if and only if (1) $l$ contains a “chain” $u \to v \to w$ or a “fork” $u \leftarrow v \to w$ such that the middle vertex $v$ is in $S$, or (2) $l$ contains an “inverted fork” $u \to v \leftarrow w$ such that the collision vertex $v$ is not in $S$ and no descendant of $v$ is in $S$.

Two sets $A$ and $B$ of vertices are separated ($d$-separated) by a set $S$ if $S$ separates ($d$-separates) every path from any vertex in $A$ to any vertex in $B$, and we say $S$ is a separator ($d$-separator) of $A$ and $B$. If $u \to w$, $v \to w$, and $u$ and $v$ are not adjacent in a DAG $G$, then the triple $(u, w, v)$ is called an immorality and $u$ and $v$ are called the two parents of the immorality.

Note that two vertexes $u$ and $v$ of the DAG $G$ are nonadjacent if and only if they are $d$-separated by some subset $S$ of $V$. Though this is obviously the case for DAGs, by taking $S$ to be either the parent set $pa(u)$ or the parent set $pa(v)$, there are certain types of graphs in which nonadjacency is not sufficient for separability, for example, the ancestral graph in [6].
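The $d$-separation criterion can be checked mechanically. The following is a minimal Python sketch, not part of the original paper, that uses the standard equivalent test on the moralized ancestral graph: two sets are $d$-separated by a third exactly when the third separates them in the moral graph of the smallest ancestral set containing all three. The dictionary `parents` encodes a hypothetical four-vertex DAG (it is not the graph of Figure 1), and the function names are illustrative assumptions.

```python
from itertools import combinations


def ancestors_of(parents, nodes):
    """Return `nodes` together with all of their ancestors."""
    seen, stack = set(nodes), list(nodes)
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def d_separated(parents, a, b, s):
    """True if the disjoint vertex sets a and b are d-separated by s."""
    a, b, s = set(a), set(b), set(s)
    keep = ancestors_of(parents, a | b | s)
    # Moralize the ancestral subgraph: link each vertex to its parents
    # and "marry" every pair of parents that share a child.
    adj = {v: set() for v in keep}
    for child in keep:
        pa = list(parents.get(child, ()))
        for p in pa:
            adj[p].add(child)
            adj[child].add(p)
        for p, q in combinations(pa, 2):
            adj[p].add(q)
            adj[q].add(p)
    # Search outward from a without ever entering s.
    seen, stack = set(a - s), list(a - s)
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen and w not in s:
                seen.add(w)
                stack.append(w)
    return not (seen & b)


# Hypothetical DAG: a -> c <- b, c -> d (child -> list of parents).
parents = {"c": ["a", "b"], "d": ["c"]}
print(d_separated(parents, {"a"}, {"b"}, set()))   # inverted fork, nothing observed
print(d_separated(parents, {"a"}, {"b"}, {"d"}))   # descendant of the collider observed
```

In this toy graph the pair `a`, `b` forms an immorality with `c`, so conditioning on `c` or on its descendant `d` connects them, while the empty set separates them.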

*Example 2.1. *Consider a DAG in Figure 1, where and . In this DAG, we have and that , , , , and are immoralities. The path between and is $d$-separated by or , while the path between and is $d$-separated by the empty set. Vertex is an ancestor of and is a descendant of . The sets and are $d$-separated by the set .

Let the vertices in a DAG $G$ denote variables $X_1, \ldots, X_n$. If a joint distribution or density $P$ of the variables $X_1, \ldots, X_n$ satisfies

$$P(x_1, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid pa_i),$$

where $P(x_i \mid pa_i)$ is the conditional probability or density of $X_i$ given its parents $pa(X_i) = pa_i$, then the DAG $G$ and the distribution $P$ are said to be compatible [3] and $P$ obeys the global directed Markov property of $G$ [2]. We use the notation $\perp\!\!\!\perp$ in [7] to denote independence. Let $X \perp\!\!\!\perp Y$ denote the independence of $X$ and $Y$ and $X \perp\!\!\!\perp Y \mid Z$ the conditional independence of $X$ and $Y$ given $Z$ for any variables or sets of variables $X$, $Y$, and $Z$. Since in our discussion the joint distribution always corresponds to an underlying DAG, we use the same letters to denote variables and vertexes; however, we will make an explicit reference if the context is not clear.

As pointed out by [3], if sets $A$ and $B$ are $d$-separated by $S$, then $A$ is independent of $B$ conditionally on $S$ in every distribution that is compatible with $G$. In this paper, we assume that all the distributions are compatible with $G$. We also assume that all independencies of a probability distribution of the variables in $V$ can be checked by $d$-separation in $G$, called the faithfulness assumption in [4]. The faithfulness assumption means that all independencies and conditional independencies among variables can be represented by $G$. For a distribution which obeys the faithfulness assumption, we can learn the underlying DAG by checking the pairwise conditional independence $X \perp\!\!\!\perp Y \mid S$, where $X$ and $Y$ are two random variables and $S$ is a subset of variables.
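As a small numerical illustration of compatibility, the sketch below builds a joint distribution from the factorization over a hypothetical chain $X_1 \to X_2 \to X_3$ of binary variables and then checks independencies directly from the joint table. The conditional probability values and all function names are assumptions chosen only for illustration; they do not come from the paper.

```python
from itertools import product

# Hypothetical conditional probability tables for the chain X1 -> X2 -> X3.
p1 = {0: 0.6, 1: 0.4}
p2_given_1 = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}  # p2_given_1[x1][x2]
p3_given_2 = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.5, 1: 0.5}}  # p3_given_2[x2][x3]

# Joint distribution as the product of each variable given its parents.
joint = {
    (x1, x2, x3): p1[x1] * p2_given_1[x1][x2] * p3_given_2[x2][x3]
    for x1, x2, x3 in product((0, 1), repeat=3)
}


def marginal(joint, idx):
    """Marginal table over the variable positions listed in idx."""
    out = {}
    for x, p in joint.items():
        key = tuple(x[k] for k in idx)
        out[key] = out.get(key, 0.0) + p
    return out


def cond_indep(joint, i, j, given=(), tol=1e-9):
    """Check X_i independent of X_j given X_given via P(i,j,g)P(g) = P(i,g)P(j,g)."""
    given = tuple(given)
    pg = marginal(joint, given)
    pig = marginal(joint, (i,) + given)
    pjg = marginal(joint, (j,) + given)
    pijg = marginal(joint, (i, j) + given)
    return all(
        abs(p * pg[k[2:]] - pig[(k[0],) + k[2:]] * pjg[(k[1],) + k[2:]]) <= tol
        for k, p in pijg.items()
    )


print(cond_indep(joint, 0, 2, given=(1,)))  # X1 and X3 given X2
print(cond_indep(joint, 0, 2))              # X1 and X3 marginally
```

With these tables, $X_1$ and $X_3$ are independent given $X_2$, as $d$-separation in the chain predicts, but they are dependent marginally.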

A hypergraph is a collection of vertex sets [8, 9]. Multiple databases are depicted as a hypergraph, where a hyperedge is an observed variable set in a database and [5, 10]. A database with an observed variable set is treated as a sample from a marginal distribution of the variable set . Let , which is the intersection of and the other sets. Given a collection of databases , there is no information on higher interactions over different databases.

*Example 2.2. *Let be a hypergraph, as shown in Figure 2. We can get that , , and .
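The overlap of each observed variable set with the other sets is easy to compute. The sketch below does so for a hypothetical collection of three databases; the database names and variable names are illustrative assumptions and are not the hypergraph of Figure 2.

```python
def overlap_sets(hyperedges):
    """Map each database to the variables it shares with any other database."""
    result = {}
    for name, variables in hyperedges.items():
        # Union of the variable sets of all the other databases.
        others = set().union(*(v for k, v in hyperedges.items() if k != name))
        result[name] = set(variables) & others
    return result


# Hypothetical hypergraph: each hyperedge is the variable set of one database.
hyperedges = {
    "D1": {"a", "b", "c"},
    "D2": {"c", "d"},
    "D3": {"a", "d", "e"},
}
overlaps = overlap_sets(hyperedges)
print(overlaps)
```

Here `b` and `e` each appear in exactly one database, so they belong to no overlap set.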

#### 3. Structural Learning of DAGs

In this section, we propose an approach for structural learning of DAGs. In our approach, we first learn a local structure from each database, and then we combine these local structures together to construct a global graph over all variables.

Note that for a distribution obeying the faithfulness assumption with respect to a certain DAG , there may not exist a DAG which can fully represent all the conditional independencies in the marginal distribution; see [6] for more discussion on this issue. Though this fact implies that we may not expect to learn a DAG for each database, it will be shown that under a certain condition a marginal structure could be learned which partially reflects the original true structure.

We consider a joint distribution on a set of variables which satisfies the faithfulness assumption and denote by the DAG which can fully represent the conditional independencies in this joint distribution. We consider the problem of structure recovery from multiple databases. To facilitate our discussion, we first give the definition of the local structure.

*Definition 3.1. *For a DAG and a subset , we define the local structure to be an undirected graph where the edge set for all .

From the definition, it is known that to judge whether and are adjacent in the local structure , we need only to search for a $d$-separator from all possible variable subsets such that two variables and are independent conditionally on . With the faithfulness assumption, this is equivalent to testing whether and are independent conditionally on , and this can be done by using data observed on only. Note that an edge in the local structure may be spurious in the sense that its two vertexes are not adjacent in the original DAG; we call such edges spurious edges. However, we show below that such a learned local structure can be used to identify part of the true structure of the original DAG under one additional condition. We first give a lemma to be used in the proofs of the theorems.
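The search just described can be sketched in Python as a brute-force loop over conditioning subsets. This is a minimal illustration under stated assumptions, not the paper's implementation: `independent(u, v, s)` is a hypothetical conditional-independence oracle that, in practice, would be a statistical test on the data of one database, and the hard-coded oracle below corresponds to a hypothetical DAG `a <- b -> c`.

```python
from itertools import combinations


def local_structure(variables, independent):
    """Undirected edges among `variables` that no observed subset can separate."""
    variables = sorted(variables)
    edges = set()
    for u, v in combinations(variables, 2):
        rest = [w for w in variables if w not in (u, v)]
        # Try every subset of the other observed variables as a separator.
        separable = any(
            independent(u, v, frozenset(s))
            for size in range(len(rest) + 1)
            for s in combinations(rest, size)
        )
        if not separable:
            edges.add(frozenset((u, v)))
    return edges


def oracle(u, v, s):
    """Hypothetical oracle for a <- b -> c: only a and c given b are independent."""
    return {u, v} == {"a", "c"} and "b" in s


full = local_structure({"a", "b", "c"}, oracle)
partial = local_structure({"a", "c"}, oracle)
print(full)
print(partial)
```

When all three variables are observed, the edge between `a` and `c` is removed. When `b` is unobserved, no available subset separates `a` and `c`, so the local structure keeps a spurious edge between them.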

Lemma 3.2. *If $u$ is not an ancestor of $v$, then $u$ and $v$ are $d$-separated by a subset of $V$ if and only if they are $d$-separated by $pa(u)$.*

*Proof. *See [2].

The following two theorems show the relationship between the local structure and the true structure of the original DAG.

Theorem 3.3. *Let , , and be a partition of . If and are separated by , then two vertices in are $d$-separated by a subset of if and only if they are $d$-separated by a subset of .*

*Proof. *The necessity is obvious since . For the sufficiency, let and be two vertices in that are $d$-separated by . Thus there is no edge connecting and in . Since and are contained in and and are separated by , and are contained in . Without loss of generality, suppose that is not an ancestor of . From Lemma 3.2 we have that $d$-separates and .

From Theorem 3.3, we can see that two vertexes in are adjacent in the DAG if and only if they are adjacent in the local structure . This means that with the faithfulness assumption, the existence of edges falling into can be determined validly from the marginal distribution of .

Theorem 3.4. *Suppose that , , and form a partition of and that and are separated by . Let and . If and are $d$-separated by a subset of , then they are $d$-separated by a subset of .*

*Proof. *The result is obvious since .

According to Theorem 3.4, we can see that and are not adjacent in the DAG if they are not adjacent in the local structure . The converse need not hold, so the local structure may contain spurious edges.

According to the two theorems above, an edge whose two vertices are contained in only one database can be determined by using the marginal distribution of without requiring the other databases. Those edges crossing and or falling into may be spurious. Their existence must be determined according to multiple databases.

Now we give the algorithm for structural learning of directed acyclic graphical models.

*Algorithm 3.5. *Construct a DAG from Multiple Databases

(1) Input: Multiple databases .

(2) Construct a local structure from database separately for each :

(a) Initialize as a complete undirected graph;

(b) Delete edge from if there exists a subset of such that and are conditionally independent given .

(3) Construct the global undirected graph :

(a) Initialize the edge set of as the union of all edge sets of for ;

(b) For an edge , which falls into some , delete it from if it is absent in some other .

(4) Delete spurious edges to construct the global skeleton:

(a) For an edge with in some and in , delete it from if there exists a subset of such that and are conditionally independent given ;

(b) For an edge , which falls in some , delete it from if there exists a subset of or such that and are conditionally independent given .

(5) Construct the equivalence class:

(a) Orient the local skeleton as if and are not adjacent in and ;

(b) Orient other edges if each opposite of them creates either a directed cycle or a new immorality.

(6) Output: The equivalence class of DAGs.

Note that at step 4 denotes all the vertices that are adjacent with vertex . According to Theorems 3.3 and 3.4, the equivalence class constructed by the above algorithm is valid.
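Step 3 of the algorithm can be sketched as follows. This is one possible reading, stated as an assumption: an edge is dropped from the union when both of its endpoints are observed in some database whose local structure does not contain it. The function name, the two databases, and their local structures are hypothetical; the local structures are the ones that a hypothetical DAG `a -> b <- d -> c` would produce under faithfulness. Step 4, which needs further conditional independence tests, is not covered.

```python
def combine_local_structures(databases, local_edges):
    """Union of local edge sets, minus edges refuted by another database.

    databases:   name -> set of observed variables
    local_edges: name -> set of frozenset({u, v}) edges
    """
    global_edges = set().union(*local_edges.values())
    for edge in list(global_edges):
        for name, variables in databases.items():
            # Both endpoints observed here, but the edge is absent locally.
            if edge <= variables and edge not in local_edges[name]:
                global_edges.discard(edge)
                break
    return global_edges


def e(u, v):
    """Helper for writing an undirected edge."""
    return frozenset((u, v))


databases = {"D1": {"a", "b", "c"}, "D2": {"b", "c", "d"}}
local_edges = {
    "D1": {e("a", "b"), e("b", "c")},  # b-c is spurious: d is unobserved in D1
    "D2": {e("b", "d"), e("c", "d")},  # b and c are separated by d in D2
}
global_edges = combine_local_structures(databases, local_edges)
print(global_edges)
```

In this toy case the spurious edge between `b` and `c` learned from `D1` is removed because `D2` observes both endpoints and separates them.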

#### 4. Illustration of Structural Learning

In this section, we illustrate our algorithm using the ALARM network in Figure 3 that is often used to evaluate structural learning algorithms [4, 11, 12]. The ALARM network in Figure 3 describes causal relations among 37 variables in a medical diagnostic system for patient monitoring. Using the network, some researchers generate continuous data from normal distributions and others generate discrete data from multinomial distributions [4, 12]. Our approach is applicable for both continuous and discrete data. Since the validity of our algorithm can be ensured by Theorems 3.3 and 3.4, the algorithm is illustrated by using conditional independencies from the underlying directed acyclic graph in Figure 3 rather than conditional independence tests from simulated data.

Suppose that we have three databases as depicted by the hypergraph in Figure 3. Database contains variables , database contains variables , and database contains variables . Thus , , and . Note that and are not conditionally independent given. At Step 2, the local structures are obtained separately from the three databases, as shown in Figures 4(a), 4(b), and 4(c), respectively. For example, the undirected edge because is independent of conditional on .

Figure 4: Local structures learned from the three databases: (a) the local structure for the first database; (b) the local structure for the second database; (c) the local structure for the third database.

At step 3, we combine the three local structures to construct the global undirected graph, as shown in Figure 5. At step 4, we delete the spurious edges to get the global skeleton, which is the undirected version of Figure 6. For example, the spurious edge can be deleted since variables and are conditionally independent given .

At step 5, we determine immoralities and orient as many edges as possible. For example, the direction of the undirected edge is determined as by so as not to create a new $v$-structure, and the direction of the undirected edge is determined as by and so as not to create a cycle. Finally, we obtain the equivalence class in Figure 6, in which all directed edges are oriented correctly, except that four undirected edges , , , and cannot be oriented because either orientation of each of them leads to a Markov equivalent DAG.
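Identifying immoralities from a skeleton can be sketched with the standard constraint-based rule: for two nonadjacent vertices with a common neighbor, the triple is an immorality when the common neighbor is not in the set that separated the pair. This is a minimal sketch assuming that this rule matches step 5(a); the function name, the skeleton, and the recorded separating sets are hypothetical and correspond to a DAG `a -> b <- d -> c`, not to the ALARM network.

```python
from itertools import combinations


def find_immoralities(skeleton, sepsets):
    """Return triples (u, w, v) to be oriented as u -> w <- v.

    skeleton: set of frozenset({u, v}) undirected edges
    sepsets:  frozenset({u, v}) -> set that separated the nonadjacent pair
    """
    neighbours = {}
    for edge in skeleton:
        u, v = tuple(edge)
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    found = set()
    for w, adjacent in neighbours.items():
        for u, v in combinations(sorted(adjacent), 2):
            pair = frozenset((u, v))
            # u and v must be nonadjacent, with w outside their separator.
            if pair not in skeleton and pair in sepsets and w not in sepsets[pair]:
                found.add((u, w, v))
    return found


skeleton = {frozenset("ab"), frozenset("bd"), frozenset("cd")}
sepsets = {
    frozenset("ad"): set(),   # a and d are marginally independent
    frozenset("bc"): {"d"},   # b and c are independent given d
    frozenset("ac"): set(),
}
immoralities = find_immoralities(skeleton, sepsets)
print(immoralities)
```

Only the triple around `b` qualifies, because `d` belongs to the separator of `b` and `c`; the edge between `c` and `d` therefore stays undirected, which is the kind of edge that either orientation leaves Markov equivalent.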

#### 5. Discussion

In this paper, we presented an approach for structural learning of directed acyclic graphs from multiple databases. In our approach, we require that and are separated by , which is a weaker condition than the condition that and are $d$-separated by . This condition can be judged with experts’ prior knowledge of associations among variables, such as Markov chains, chain graphical models, and time series.

Our approach has several advantages for structural learning. First, we do not require conditional independence, which is a basic assumption in most methods. Second, we search for $d$-separators in or , which is much smaller than . Finally, the theoretical results proposed in this paper can be applied to the design of multiple-database schemes: without loss of information for structural learning of DAGs, a joint data set can be replaced by a group of incomplete data sets based on the prior knowledge.

#### Acknowledgments

I would like to thank the referees for their valuable comments and suggestions. This research was supported by NSFC (11001155), Doctoral Fund of Shandong Province (BS2010SW030), and A Project of Shandong Province Higher Educational Science and Technology Program (J11LA08).