Research Article | Open Access
Statistical Identification of Markov Chain on Trees
The theoretical study of continuous-time homogeneous Markov chains usually rests on the natural assumption of a known transition rate matrix (TRM). In realistic systems, however, the TRM of a Markov chain may be unknown and may need to be identified from partially observable data. The problem of identifying the TRM of the underlying Markov chain from partially observable information is therefore of great significance in applications; this is what we call the statistical identification of a Markov chain. A Markov chain inversion approach has previously been derived for basic Markov chains observed partially at a few states. In the current letter, a more extensive class, Markov chains on trees, is investigated. Firstly, a more tractable type of derivative constraint is developed. Then, it is shown that all Markov chains on trees can be identified using only such derivative constraints on the univariate distributions of sojourn time and/or hitting time at a few states. A numerical example is included to demonstrate the correctness of the proposed algorithms.
The classic study of continuous-time homogeneous Markov chains usually rests on the natural assumption of a known transition probability matrix or transition rate matrix (TRM). In realistic systems, however, the TRM of the underlying Markov chain may be unknown, and it is precisely what needs to be identified from observable data. A Markov chain with a finite state space is fully determined once its TRM is identified. Moreover, in some realistic systems one can observe only part of the motion of the underlying Markov chain; for example, one may observe only the sojourn and hitting sequences at one of the states. One can then fit the PDF of the sojourn (resp., hitting) time by statistical techniques. It is straightforward to identify the TRM of a Markov chain if the PDFs of sojourn time and hitting time are known exactly at every state.
A natural question arises as to whether it is possible to identify the TRM from those few states. Because the states of the underlying Markov chain intercommunicate through transitions, it is indeed possible. Thus, the problem of identifying the TRM of the underlying Markov chain from partially observable information is of great significance in applications.
In fact, the continuous-time homogeneous Markov model describing the gating kinetics of a single ion channel is a good example of a real application in biophysics and neuroscience; for example, see [1, 2] and subsequent publications. In experiments, the transitions between the various states cannot be observed directly; only transitions between a small number of open states are observable. It is straightforward to derive the sojourn time and hitting time distributions for a single state or a small number of states. The PDFs of sojourn or hitting time generally take the form of a sum of exponentials (e.g., [2–4]), and the best-fitting function can be found with a number of readily available computer programs [5–7].
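As an illustration of the sum-of-exponentials form of dwell-time PDFs, the following sketch (our own, with hypothetical weights and rates, not taken from [2–7]) builds a two-component exponential mixture and checks numerically that it is a proper density with mean $\sum_i w_i/r_i$:

```python
import numpy as np
from scipy.integrate import quad

def exp_mixture_pdf(t, weights, rates):
    """PDF of a mixture of exponentials: sum_i w_i * r_i * exp(-r_i * t)."""
    return sum(w * r * np.exp(-r * t) for w, r in zip(weights, rates))

# Hypothetical two-component dwell-time distribution (rates in 1/ms).
weights = [0.7, 0.3]
rates = [5.0, 0.5]

# Check that it integrates to 1 and has mean sum_i w_i / r_i = 0.74.
total, _ = quad(lambda s: exp_mixture_pdf(s, weights, rates), 0, np.inf)
mean, _ = quad(lambda s: s * exp_mixture_pdf(s, weights, rates), 0, np.inf)
print(round(total, 6), round(mean, 6))
```

Fitting such a mixture to an empirical dwell-time histogram is exactly the "second step" preprocessing that the available programs perform.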
Unfortunately, once the fitting is completed, it is very difficult to use these PDFs to identify the TRM of the underlying Markov chain; the main challenge lies in finding solutions and algorithms that perform this reverse operation. Identification has therefore usually been addressed directly via maximum likelihood estimation (MLE) (e.g., see [8–11] and further references) rather than by performing the inversion. Another series of publications uses canonical forms of reduced dimension to analyze finite two-state trajectories, in which the system is aggregated into only two states, defined as the on state and the off state (irreversible transitions are also allowed); see [12, 13] and the references therein for a review. All of these approaches are powerful, but each has pros and cons, and none can perfectly identify all kinds of Markov chains. For example, the univariate sojourn time distributions alone may, in general, be insufficient to parameterize a given Markov model; bivariate distributions of subsequent open and closed times are then also necessary.
Note that those approaches discard the prior information contained in the underlying topology. It is indeed possible to perform the complicated inversion, since the prior information about the underlying architecture of the Markov chain is very helpful in accomplishing it. This is what we call statistical identification of a Markov chain (SIMC). We developed a Markov chain inversion approach (MCIA) that opens the possibility of solving the SIMC. In applications, the problem reduces to three steps. The first is to obtain the necessary data at the fewest possible states by preprocessing the observed data. The second is to accurately fit the corresponding PDFs of the states required by the next step. The final step is to find an algorithm that identifies the TRM of the underlying Markov chain from those PDFs. Throughout this paper, we focus on the final step, since a number of readily available approaches exist for fitting the PDFs.
We explore a class of Markov models in which reversibility is maintained. Some realistic systems obey the principle of microscopic reversibility, or detailed balance, which implies the reversibility of the Markov chain. At least in applications to ion channels, an assumption of reversibility is plausible in many circumstances, corresponding to channels that are not coupled to a free energy source such as an ion concentration gradient; this assumption has been discussed in the literature. As an example, [16, 17] derived a criterion for deciding whether a three-state Markov system is reversible, based on the coarse-grained information of two-state trajectories. Throughout the rest of this paper, the unqualified term “Markov chain” refers to a reversible Markov chain.
It is well known that the classic architectures include star-graph, linear, and cycle chains, and some general constraints and corresponding algorithms for them were explored in earlier publications [18–20]. There is no obvious strategy for selecting the constraints to use for a general model; a key task is to find feasible solutions and the corresponding algorithms for different classes. A subsequent question is whether more complicated architectures can be identified by such an approach. Any architecture can be viewed as a tree provided that it contains no cycles, so the tree is a very important extension of the star graph and line graph and represents one of the two classes of topologies (those without cycles and those with them). It is known that all Markov chains on trees are reversible. Thus, SIMC on trees is explored in this letter, although some such models have not yet appeared in real systems. We show that all Markov chains on trees can be identified by our approach. For SIMC on trees, the main contribution is to discover the fundamental method and the corresponding algorithm for identifying the TRM, given the challenge of performing the reverse operation stated earlier. Hence, this work further opens up the possibility of carrying out statistical identification for all reversible Markov chains.
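To make the inversion idea concrete, here is a minimal sketch (our own construction, with hypothetical rates, not an algorithm from [18–20]) for the simplest case, a star graph observed at its center: the hitting-time PDF of the center is a mixture of exponentials whose exponents are the leaf-to-center rates and whose coefficients encode the center-to-leaf rates, so every rate can be read back off:

```python
import numpy as np

# Hypothetical 4-state star chain: center state 0, leaves 1..3.
# The only nonzero off-diagonal rates are q[0,j] and q[j,0].
q = np.array([
    [-6.0,  1.0,  2.0,  3.0],
    [ 0.5, -0.5,  0.0,  0.0],
    [ 1.5,  0.0, -1.5,  0.0],
    [ 2.5,  0.0,  0.0, -2.5],
])

# Forward direction: observing the center, the hitting-time PDF of state 0
# is f(t) = sum_j c_j * exp(-lam_j * t) with lam_j = q[j,0] and
# c_j = (q[0,j]/q0) * q[j,0], where q0 = -q[0,0] is the sojourn rate at 0.
q0 = -q[0, 0]
lam = np.array([q[j, 0] for j in (1, 2, 3)])
c = np.array([q[0, j] / q0 * q[j, 0] for j in (1, 2, 3)])

# Inverse direction (the identification step): from the fitted exponents
# lam_j, coefficients c_j, and the mean sojourn time 1/q0 at the center,
# recover every transition rate of the star.
q_j0_hat = lam            # leaf -> center rates
q_0j_hat = c / lam * q0   # center -> leaf rates
print(q_j0_hat, q_0j_hat)
```

For richer topologies such as trees the exponents no longer separate this neatly, which is why the derivative constraints of Section 2 are needed.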
This approach has clear advantages over MLE: it can identify most reversible Markov chains without requiring bivariate distributions of subsequent sojourn and hitting times, and the computation is accurate, being based on accurate sojourn time PDFs and the prior information about the underlying topological structure of the Markov chain.
This paper is organized as follows. After recalling the sojourn time distribution, a more tractable type of derivative constraint is developed in Section 2. In Section 3, the solution to SIMC on trees is provided via such derivative constraints. In addition, we address the solutions to three particular and regular representatives of trees, including theoretical conclusions, proofs, and algorithms. Finally, a numerical example is included to demonstrate the correctness of the proposed algorithms.
2. Distributions of Hitting Time and Their Derivative Relationships
Throughout this section, $\operatorname{diag}(a_1,\dots,a_n)$ denotes the diagonal matrix whose diagonal entries are $a_1,\dots,a_n$, $^{T}$ denotes transpose, $\mathbf{1}$ a column vector of ones, and $\mathbf{0}$ a matrix (vector) of zeros, with the dimensions of these being clear from their context. Furthermore, $A_{ij}$ and $A_{\cdot j}$ denote the $(i,j)$th entry and the $j$th column vector of the matrix $A$, respectively.
Let $X=\{X_t : t\ge 0\}$ be a reversible Markov chain with finite state space $S$, conservative TRM $Q=(q_{ij})_{i,j\in S}$ such that $q_{ii}=-\sum_{j\ne i}q_{ij}$, and equilibrium distribution $\pi=(\pi_i)_{i\in S}$ such that $\pi_i q_{ij}=\pi_j q_{ji}$ and $\sum_{i\in S}\pi_i=1$, where $\pi_i>0$; the first condition (i.e., $\pi_i q_{ij}=\pi_j q_{ji}$ for any $i\ne j$) expresses the reversibility.
Assume that the state space may be partitioned into classes as $S=C\cup O\cup A$, where $C$ denotes the class of closed (unobservable) states, $O$ the class of open states that are distinguishable (i.e., having different open levels), and $A$ the class of aggregated open states (i.e., sharing other common open levels).
A proper subset $E$ of the open class is observed, in such a way that the sojourn time and hitting time sequences of $E$ can be obtained from experimental recordings. Let $E^{c}=S\setminus E$, so that the state space is partitioned as $S=E\cup E^{c}$. By the classes $E$ and $E^{c}$, the TRM may be partitioned as $Q=\begin{pmatrix} Q_{EE} & Q_{EE^{c}}\\ Q_{E^{c}E} & Q_{E^{c}E^{c}} \end{pmatrix}$, and the corresponding partition is used for the equilibrium distribution $\pi$; that is, $\pi=(\pi_{E},\pi_{E^{c}})$.
2.1. Distributions of Hitting Time
The sojourn time and hitting time of the observed set $E$ can be defined as $\sigma_{E}=\inf\{t>0 : X_t\notin E\}$ and $\tau_{E}=\inf\{t>0 : X_t\in E\}$, respectively, with the usual convention that the infimum of an empty set is infinity. They are also the hitting time and sojourn time of $E^{c}$, respectively.
For ease of exposition, let $m$ be the number of unobserved states and write $B$ for the corresponding sub-TRM, the block of $Q$ on $E^{c}=S\setminus E$. Note that $B$ is nonsingular since $Q$ is irreducible, and hence all the eigenvalues of $B$ have strictly negative real parts. By reversibility, the eigenvalues $-\lambda_1,\dots,-\lambda_m$ of $B$ are all real, and we may order them so that $0<\lambda_1\le\cdots\le\lambda_m$. By a symmetric reparameterization with respect to $\pi$, $B$ can be diagonalized with an orthogonal matrix, from which the following representation is obtained.
Theorem 1. The PDF of the hitting time of $E$ (resp., the sojourn time of $E^{c}$) takes the form $f_{\tau_E}(t)=\sum_{i=1}^{m} c_i e^{-\lambda_i t}$, a sum of $m$ exponentials, where the $-\lambda_i$ are the real eigenvalues of the sub-TRM on the unobserved states and the coefficients $c_i$ are determined by $Q$ and $\pi$.
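A minimal numerical illustration of this sum-of-exponentials representation (our own sketch, using a hypothetical reversible path chain; all variable names are ours): the hitting-time PDF computed from the eigendecomposition of the sub-generator agrees with the matrix-exponential form and integrates to 1:

```python
import numpy as np
from scipy.linalg import expm

# Hypothetical reversible 4-state chain on the path 0-1-2-3; observe E = {0}.
Q = np.array([
    [-1.0,  1.0,  0.0,  0.0],
    [ 2.0, -5.0,  3.0,  0.0],
    [ 0.0,  1.0, -3.0,  2.0],
    [ 0.0,  0.0,  4.0, -4.0],
])
C = [1, 2, 3]                 # unobserved states
B = Q[np.ix_(C, C)]           # sub-generator on the unobserved states
alpha = Q[0, C] / -Q[0, 0]    # entry distribution into C after leaving 0
r = -B @ np.ones(3)           # exit rates from C back into E = {0}

# Spectral form: f(t) = sum_i c_i * exp(lam_i * t), lam_i = eigenvalues of B.
lam, V = np.linalg.eig(B)
coef = (alpha @ V) * np.linalg.solve(V, r)
f_spec = lambda t: float(np.real(np.sum(coef * np.exp(lam * t))))

# Direct form via the matrix exponential: f(t) = alpha @ expm(B t) @ r.
f_expm = lambda t: float(alpha @ expm(B * t) @ r)

total = float(np.real(np.sum(-coef / lam)))   # integral of f over [0, inf)
print(round(f_spec(0.5), 6), round(f_expm(0.5), 6), round(total, 6))
```

The integral evaluates to 1 because $\int_0^\infty \alpha^{T} e^{Bt}(-B)\mathbf{1}\,dt=\alpha^{T}\mathbf{1}=1$.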
In particular, if $E$ contains only a single state $o$, that is, $E=\{o\}$, then the PDF of its sojourn time degenerates to an exponential.
Corollary 2. The PDF of the sojourn time at a single state $o$ takes the exponential form $f_{\sigma_o}(t)=q_o e^{-q_o t}$, where $q_o=-q_{oo}$.
2.2. The Derivative Constraints
Let $f_{\tau_E}$ denote the hitting time PDF of $E$; then $f^{(k)}_{\tau_E}(0)$ is its $k$th derivative at $t=0$.
A commonly used class of constraints can be derived from the derivatives of the hitting time distribution at $t=0$.
Theorem 3. The derivatives of the hitting time distribution of $E$ at $t=0$ are related to the transition rates. It is noted that many other constraints are required to identify general Markov chains; however, derivative constraints are the most common ones. We therefore investigate the intrinsic structure of the derivative constraints more deeply and show that derivative constraints alone suffice to solve most Markov chains, such as those on trees; see the next section.
In particular, the most useful and commonly used constraints are the first three.
Corollary 4. The following equations hold. In particular, a single state admits a simpler and more transparent form, and by far the most common and convenient constraints are those for a single state.
Corollary 5. Observing a single state $o$, one can obtain the following equations for a tree. The right-hand side of the first equation is a sum over all exit rate flows for state $o$. Conditional upon the first equation, the second and the third are obvious. The bracketed expression in the second (resp., third) equation is the sum over rates of the possible two-step (resp., three-step) transitions departing from a state, having a direct transition to $o$, and returning to itself. These possible transitions encode the intrinsic relationships, so that most transition rates can be identified from the hitting time PDFs.
To close this section, we show that the above quantities are known from the required PDFs at $o$ and follow from Lemma 6. First of all, Corollary 2 shows that $q_o=1/E[\sigma_o]$, where $E[\sigma_o]$ is the expected sojourn time at state $o$. Combined with the equations of Corollary 5, this yields the following lemma.
Lemma 6. The PDFs of sojourn time and hitting time at a single state determine the quantities above. For observations at a single state $o$, one can obtain the corresponding PDFs; thus (6) shows that the remaining quantity is known up to a known constant obtained from the observation at $o$. Indeed, Lemma 6 shows that, based on statistics for a single state $o$, one can obtain the required derivatives from its PDFs.
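The derivative constraints can be checked numerically. In the sketch below (our own, with a hypothetical reversible path chain; one standard form of such constraints, not necessarily the paper's exact formula), the $k$th derivative of the hitting-time PDF at $t=0$ is computed as $\alpha^{T} B^{k} r$, and the first derivative is cross-checked by a central finite difference:

```python
import numpy as np
from scipy.linalg import expm

# Hypothetical reversible 4-state path chain 0-1-2-3; observe o = 0.
Q = np.array([
    [-1.0,  1.0,  0.0,  0.0],
    [ 2.0, -5.0,  3.0,  0.0],
    [ 0.0,  1.0, -3.0,  2.0],
    [ 0.0,  0.0,  4.0, -4.0],
])
C = [1, 2, 3]
B = Q[np.ix_(C, C)]
alpha = Q[0, C] / -Q[0, 0]
r = -B @ np.ones(3)

f = lambda t: float(alpha @ expm(B * t) @ r)   # hitting-time PDF of {0}

# Derivatives at t = 0: since f(t) = alpha @ expm(B t) @ r, the k-th
# derivative at zero is alpha @ B^k @ r.
d0 = float(alpha @ r)            # f(0)
d1 = float(alpha @ B @ r)        # f'(0)
d2 = float(alpha @ B @ B @ r)    # f''(0)

# Cross-check the first derivative by a central difference.
h = 1e-6
d1_num = (f(h) - f(-h)) / (2 * h)
print(round(d0, 6), round(d1, 6), round(d1_num, 6))
```

These are exactly the quantities that, per Corollary 5, link the observable PDF at a state to the unknown transition rates around it.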
3. Solutions to Trees
As stated in the Introduction, the tree is a very important extension of the star graph and line graph and represents one of the two classes of topologies. It is known that all Markov chains on trees are reversible; thus, SIMC on trees is explored in this section, although some such models have not yet appeared in real systems. In this letter, the term “tree” refers to a connected tree. For a Markov chain on a tree with TRM $Q$, the state space consists of the vertices, and an edge between vertices $i$ and $j$ implies nonzero transition rates $q_{ij}$ and $q_{ji}$, where $q_{ij}$ governs transitions from $i$ to $j$.
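The reversibility of tree chains is easy to verify numerically. The sketch below (our own, with a hypothetical tree and random rates) computes the stationary distribution as the null vector of $Q^{T}$ and checks detailed balance on every edge:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tree: edges of a 6-state tree, each with random rates
# in both directions.
edges = [(0, 1), (1, 2), (1, 3), (3, 4), (3, 5)]
n = 6
Q = np.zeros((n, n))
for i, j in edges:
    Q[i, j] = rng.uniform(0.5, 3.0)
    Q[j, i] = rng.uniform(0.5, 3.0)
np.fill_diagonal(Q, -Q.sum(axis=1))   # conservative TRM

# Stationary distribution: eigenvector of Q^T for the eigenvalue 0.
w, V = np.linalg.eig(Q.T)
pi = np.real(V[:, np.argmin(np.abs(w))])
pi = pi / pi.sum()

# Detailed balance pi_i q_ij = pi_j q_ji holds on every edge:
# every Markov chain on a tree is reversible.
imbalance = max(abs(pi[i] * Q[i, j] - pi[j] * Q[j, i]) for i, j in edges)
print(round(imbalance, 12))
```

No matter how the edge rates are chosen, the imbalance is zero up to floating-point error, reflecting the fact that trees contain no cycles around which a net probability current could flow.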
We conclude that all Markov chains on trees can be identified by our approach. There are many kinds of trees, so we cannot provide individual proofs and algorithms for each. Without loss of generality, the complete binary tree is used to present a general proof and solution. According to the classification of Wolfram MathWorld, there are many specific trees: Banana Tree, Cross Graph, E Graph, Fork Graph, Firecracker Graph, H Graph, Spider Graph, and so forth. The Cross Graph, E Graph, Fork Graph, H Graph, and Spider Graph can be structurally viewed as Star-Branch Graphs (special trees), which can be identified by the algorithms in earlier publications [19, 20]. The Double-Star Graph, Banana Tree, and Firecracker Graph are three particular and regular models. To demonstrate solutions for specific trees, these three will also be addressed as representatives of trees.
3.1. General Conclusions on the Tree
At the start of this subsection, we present several useful conclusions from the derivative constraints which are convenient to determine the transition rates of Markov chain on trees as the first step.
Lemma 7. For a subchain on a tree such as in Figure 1, the following conclusions hold.
(i) One can obtain the quantities at least from the sojourn and hitting time PDFs at state ; in other words, at least can be expressed in terms of real functions of , , , and .
(ii) Further, one can obtain and , , , and at least from the sojourn and hitting time PDFs at state , provided that the transitions between and its other children , that is, and for any , are all known or have been identified. That is to say, and , , , and at least can be expressed in terms of real functions of and ().
Proof. (i) Firstly, the first quantities are easy to obtain by Lemma 6. Secondly, Corollary 5 yields two further equations. Together these imply the first assertion.
(ii) On the basis of (i), one needs only to find out the algorithm to obtain , , , and . For ease of exposition, set the constants as , , and since and are all known for any .
Firstly, one obtains a first identity. Secondly, Corollary 5 yields a second identity, from which the stated expressions follow. Finally, the remaining relations follow easily by reversibility, which completes the proof.
Lemma 8. A subtree such as that in Figure 2 is considered, in which is the root, are the leftmost children, and are leaves as all children of state . Let denote the descendants of state for . For example, is empty and .
One can obtain from the sojourn and hitting time PDFs at state , provided that the transitions between and its descendants in for any are all known or have been identified.
Proof. Firstly, one can obtain according to (ii) of Lemma 7.
Secondly, by Corollary 5, one obtains (17) and (18). Only the bold part of (17) is unknown, so it can be determined from (17). Then only the bold part of (18) is unknown, and it can be determined from (18). The remaining rate then follows.
Finally, by induction, it is not difficult to verify that can be identified by the subsequent .
The proof is completed.
Remark 9. The above conclusion is true by observation at for any provided that the transitions between and its other descendants are all known or have been identified.
Lemmas 7 and 8 show that if a state in a tree has several children, and the transitions between that state and all but one of its children, as well as the descendants of those children, are all known or have been identified, then the transitions between the state, the remaining child, and the parent can be identified from the PDFs of sojourn and hitting time at that remaining child.
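The recursion in Lemmas 7 and 8 induces a processing schedule: a state's edges can be handled only after all of its descendants have been handled. A minimal sketch (our own; the tree below is hypothetical) computes such a schedule as a post-order traversal:

```python
# Sketch of the identification schedule implied by Lemmas 7 and 8: process
# each node only after all of its descendants, i.e., a post-order traversal.
# The tree below (root 0, children lists) is hypothetical.
tree = {0: [1, 2], 1: [3, 4], 2: [5, 6], 3: [], 4: [], 5: [], 6: []}

def identification_order(tree, root=0):
    order = []
    def visit(node):
        for child in tree[node]:
            visit(child)
        order.append(node)   # all transitions below `node` are now known
    visit(root)
    return order

order = identification_order(tree)
print(order)  # leaves first, root last: [3, 4, 1, 5, 6, 2, 0]
```

Any such bottom-up order works; the proofs below follow exactly this leaves-to-root pattern.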
3.2. Main Results on Trees
Recall from the Introduction that it is trivial to identify the TRM of an underlying Markov chain provided that every state of the chain is observed. One naturally hopes to observe as few states as possible. There is no strict criterion for how many states must be observed for a general model, other than simplicity and ease of solution; however, the identification is of questionable value if the number of observed states exceeds half of the whole chain. Thus, we impose the mild restriction that the number of observed states is at most half the number of states in the chain.
A solution is first provided for the complete binary tree, in which all interior nodes have two children and all leaves have the same depth (level).
Theorem 10. If the initial distribution of a Markov chain on a complete binary tree is the equilibrium distribution, then this tree can be identified by the PDFs of the sojourn time and hitting time for all leaves.
Proof. Consider a complete binary tree of given depth; its leaves lie at the last level, and the remaining states are nonleaf nodes (including one root).
Now, let us use mathematical induction to prove these facts as follows.
Step 1. We prove the base case. For convenience of expression, number the leaves from left to right at the last level and the nodes from left to right at the level above, and let the root be the remaining state (see Figure 3). Let $\pi_i$ be the equilibrium probability of state $i$. Firstly, according to Lemma 6, the sojourn rates and related quantities are obtained by observation at the leaves. Secondly, (i) of Lemma 7 implies that the rates between the leaves and their parents can be solved.
By observation at the corresponding leaves, according to Corollary 5, one obtains the stated relations, and the remaining rates follow by (ii) of Lemma 7. This completes the proof of the base case.
Step 2. Suppose that the claim is true for any complete binary tree with a given number of levels; we show that it then holds with one more level. A complete binary tree with one more level can be viewed as a root whose left and right subtrees are complete binary trees of the original depth, so by the inductive hypothesis all transition rates within the two subtrees can be identified by observation at their leaves.
For conciseness of notation, consider the subtree rooted at the left child of the enlarged tree, and number the leftmost nodes at each level from the bottom to the top.
Firstly, by observation at the leftmost leaf, according to Corollary 5, one obtains (19). By the inductive assumptions, there is only one unknown in (19), which can thus be determined. Again by the inductive assumptions, one can then obtain the rates involving the right child of that node.
Analogous to the above, one obtains (20), from which the next unknown rate can be determined.
These imply that all transition rates of the left subtree of the enlarged tree can be identified by observations at its leaves.
Secondly, one can identify the rest (i.e., the right part) of the enlarged tree by the same method as for the left part.
This proves that the inductive step holds, completing the induction.
As a matter of fact, this theorem can also be proved simply by Lemma 8.
Remark 11. This proof yields the corresponding algorithm for determining all transition rates. The solution by observation at all leaves satisfies the mild restriction on the number of observed states, even though the number of leaves exceeds the number of nonleaf nodes by one, because the tree can also be identified by observation at suitable subsets of those leaves; for example, for the tree in Figure 3, the rates can be obtained step by step from observations at appropriate pairs of leaves. If the number of leaves is less than the number of nonleaf nodes, as in a tree in which many nodes have only one child, then this solution is particularly effective. These assertions remain valid for the trees considered later.
One can carry this line of argument a bit further to obtain the following result.
Theorem 12. If the initial distribution of a Markov chain on -tree is the equilibrium distribution, then this tree can be identified by the PDFs of the sojourn time and hitting time for all leaves.
One may proceed as previously explained to obtain a more general conclusion for the tree.
Theorem 13. If the initial distribution of a Markov chain on a tree is the equilibrium distribution, then this tree can be identified by the PDFs of the sojourn time and hitting time for all leaves.
Theoretically, the transition rates around a state in a tree can be calculated from the PDFs at any descendant leaf. However, the leaf along the shortest path should be chosen to minimize error propagation.
Furthermore, identification by observation at the leaves is just one of the solutions for Markov chains on trees. Other solutions exist for particular trees, for example, when the number of leaves exceeds the number of nonleaf nodes by one.
3.3. Solutions to Special Trees
In this subsection, the Double-Star Graph, Banana Tree, and Firecracker Graph, as representatives of trees, are addressed to demonstrate solutions for specific trees.
3.3.1. Double-Star Graph
Consider a Double-Star Graph Markov chain (see Figure 4), in which the two central states are observable. The transition rate matrix is given as follows.
For conciseness of notation, denote the states and rates as indicated. The state space is then simple, and the transition rates are nonzero only between each central state and its leaves and between the two central states; these are the rates one intends to derive. With the indicated choice, the equilibrium distribution is obtained and the reversibility condition in (1) is satisfied.
From now on, we investigate the algorithm of how to identify such chain.
Firstly, we observe the first central state and let its sojourn time be given; it follows from Corollary 2 that (23) holds. Secondly, we observe the second central state; it follows from Corollary 2 that (24) holds. Finally, we observe the two central states jointly. Letting the hitting time (resp., sojourn time) of this pair be given, Theorem 1 yields the following lemma.
Lemma 14. The PDF of the hitting time takes the following form, with the coefficients given below.
The following theorem is an immediate consequence of Lemma 14.
Theorem 15. If the initial distribution of the Markov chain with Double-Star Graph in Figure 4 is the equilibrium distribution, then it can be identified by the PDFs of the sojourn time and hitting time for the two central states and .
Proof. Firstly, the two central sojourn rates can be obtained from (23) and (24). Secondly, according to Lemma 14, the next two groups of rates can be obtained from the first two assertions of (26), because all the exponents and coefficients are known from the PDF in Lemma 14 by observation of the two central states. Finally, the remaining rates follow from the final two assertions of (26).
Note that this proof shows the corresponding algorithm to determine all transition rates of such Double-Star Graph chain. In the course of calculation, one can easily derive real roots from comparatively accurate PDFs.
3.3.2. Banana Tree Graph
An $(n,k)$-banana tree, as defined by Chen et al., is a graph obtained by connecting one leaf of each of $n$ copies of a $k$-star graph with a single root vertex that is distinct from all the stars. There are thus leaves, internal nodes, and one root. This is one of the regular trees.
Consider a Markov chain with Banana Tree Graph (see Figure 5).
For the smallest parameter values, it degenerates into the special case of the Star-Branch Graph.
For the next parameter value, there are two leaves on each $k$-star; the number of leaves is then less than the number of nonleaf nodes, so observation at all leaves is valid. According to Theorem 13, a normal solution is as follows.
Theorem 16. If the initial distribution of a Markov chain with Banana Tree Graph is the equilibrium distribution, then it can be identified by the PDFs of the sojourn time and hitting time for all leaves.
For larger parameter values, the number of leaves exceeds that of the nonleaf nodes. An optional solution is as follows.
Theorem 17. If the initial distribution of a Markov chain with Banana Tree Graph is the equilibrium distribution, then it can be identified by the PDFs of the sojourn time and hitting time for the root state and all node states, that is, all nonleaf states.
Proof. For ease of exposition, suppose that one state is the root of the tree (see the introduction of the banana tree in this subsection) and that, on each $k$-star, one state is the central state, one is the connecting leaf, and the others are the remaining states. Firstly, the corresponding rates can be obtained at each nonleaf state. Secondly, according to the proof for the Double-Star Graph, the rates between the root and the stars can be obtained by observation at all nonleaf states, because their local structure is similar to that of the Double-Star Graph observed at its central states. Finally, the remaining rates on each $k$-star follow.
3.3.3. Firecracker Graph
An $(n,k)$-firecracker is a graph obtained by the concatenation of $n$ $k$-stars, linking one leaf from each.
A Markov chain with Firecracker Graph (see Figure 6) is provided.
For the smallest parameter values, it degenerates into a special tree that is similar, but not completely equivalent, to a Star-Branch Graph.
For the next parameter value, there are two leaves on each $k$-star, and the number of leaves is identical to that of the nonleaf nodes, so observation at all leaves is valid. According to Theorem 13, a normal solution for this case is as follows.
Theorem 18. If the initial distribution of a Markov chain with Firecracker Graph is the equilibrium distribution, then it can be identified by the PDFs of the sojourn time and hitting time for all leaves.
For larger parameter values, the number of leaves exceeds that of the nonleaf nodes. Following the technique of the proof of Theorem 17, the following optional solution is easy to verify.
Theorem 19. If the initial distribution of a Markov chain with Firecracker Graph is the equilibrium distribution, then it can be identified by the PDFs of the sojourn time and hitting time for all nonleaf states.
3.4. Numerical Example
To demonstrate how to apply our algorithms to real data, we present a numerical example. As mentioned before, we focus on the final step and therefore omit the preprocessing of the data. Thus, the data at hand are the observed PDFs of sojourn time and hitting time at each leaf, from which all the transition rates of the tree are found.
Example 1. The model is a Markov chain on the binary tree of Figure 3, with the true transition rate matrix given below. We divide the calculation into two steps: fitting the sojourn time and hitting time histograms, and estimating the transition rates.
Step 1 (sojourn time and hitting time histogram estimate). Suppose that we have obtained the PDFs by fitting (simulation) as follows, where the first function (resp., the second) at each leaf is the PDF of the sojourn time (resp., hitting time) at that state.
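Since the entries of the true matrix are not preserved in this text, the following sketch uses hypothetical rates on a 7-state complete binary tree like that of Figure 3 to illustrate this first step: simulate the chain, collect sojourn times at a leaf, and recover the leaf's exit rate as the reciprocal of the mean sojourn time (Corollary 2):

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical TRM on a 7-state complete binary tree (root 0; internal
# nodes 1, 2; leaves 3-6); these are NOT the paper's example rates.
edges = {(0, 1): 1.0, (1, 0): 2.0, (0, 2): 1.5, (2, 0): 1.0,
         (1, 3): 2.0, (3, 1): 3.0, (1, 4): 1.0, (4, 1): 2.0,
         (2, 5): 0.5, (5, 2): 1.0, (2, 6): 2.0, (6, 2): 4.0}
n = 7
Q = np.zeros((n, n))
for (i, j), rate in edges.items():
    Q[i, j] = rate
np.fill_diagonal(Q, -Q.sum(axis=1))

# Gillespie simulation, recording sojourn times at leaf state 3.
state, sojourns = 0, []
for _ in range(150_000):
    rate = -Q[state, state]
    dwell = rng.exponential(1.0 / rate)
    if state == 3:
        sojourns.append(dwell)
    p = Q[state].copy()
    p[state] = 0.0
    state = rng.choice(n, p=p / p.sum())

# Corollary 2: the sojourn time at a single state is exponential with
# rate q_3 = -Q[3,3] = 3.0, so 1/mean(sojourns) estimates q_3.
q3_hat = 1.0 / np.mean(sojourns)
print(round(q3_hat, 2))
```

The same simulated trajectory also yields the hitting-time samples at each leaf, whose multi-exponential PDFs feed the transition-rate estimation of Step 2.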