Blackwell Spaces and ε-Approximations of Markov Chains
On a weakly Blackwell space, we show how to define an approximating Markov chain problem for the target problem. The approximating problem is proved to converge to the optimal reduced problem under different pseudometrics.
A target problem (TP) is defined as a homogeneous Markov chain stopped once it reaches a given absorbing class, the target. Our purpose is to use only the information that is relevant to the target and, consequently, to reduce the available information. A new Markov chain, associated with a new equivalent but reduced matrix, is defined. In the (large) finite case, the problem has been solved for TPs: in [1–3], it has been proved that any TP on a finite set of states has its “best target” equivalent Markov chain. Moreover, this chain is unique and there exists a polynomial time algorithm to reach this optimum.
The question is now to find, in full generality, an ε-approximation of the Markov problem when the state space is measurable. The idea is to merge into one group the points that ε-behave the same with respect to the objective and, at the same time, to keep an almost equivalent Markov chain with respect to the other “groups”. The construction of these groups is done through equivalence relations. Each group will correspond to an equivalence class. In fact, there are many other mathematical fields where approximation problems are handled through equivalences. For instance, in integration theory, we use simple functions; in functional analysis, we use the density of countably generated subspaces; and, in numerical analysis, we use the finite element method.
In this paper, the approximation is made by means of discrete equivalences, which will be defined in the following. We prove that the sequence of approximations tends to the optimal exact equivalence relation defined in [1–3], when we refine the groups. Finer equivalence will imply better approximation, and accordingly the limit will be defined as a countably generated equivalence.
Under a very general Blackwell-type hypothesis on the measurable space, we show that it is equivalent to speak of countably generated equivalence relations or of measurable real functions on the measurable space of states. If we do not work within this framework of Blackwell spaces, we may face paradoxes, as it is explained by , of enlarging σ-algebras while decreasing the information available to a decision-maker. The ε-approximation of the Markov chain always depends on the kind of objective. In , Jerrum deals with ergodic Markov chains. His objective is to approximate the stationary distribution by means of a discrete approximating Markov chain, whose limit distribution is close in a certain sense to the original one. However, unlike our work below, his purpose is not the explicit and unified construction of the approximating process. In this paper, we focus on the target problem. We solve extensively the TP, where the objective is connected with the conditional probability of reaching the target , namely , for any . This part extends the work in [1–3].
2. Main Results
Let be a measurable space, equipped with an assumption (A0) that will be explained when necessary. Let be any transition probability on . A homogeneous Markov process is naturally associated with . In the target problem, we are interested in the probabilities of reaching the target class within steps, namely in The set is given a priori and does not change through the computations. is supposed to be an absorbing set lying in .
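In the finite case, the probabilities of reaching the target within a given number of steps can be computed by a backward recursion. The following is a minimal sketch under stated assumptions: the state space is finite, the kernel is a row-stochastic matrix `P`, and the target is a set of absorbing state indices `F`. The names `reach_probabilities`, `P`, and `F`, as well as the three-state chain, are illustrative and are not taken from the paper.

```python
from fractions import Fraction


def reach_probabilities(P, F, n):
    """Return the list of n-step probabilities of being in F, one per start state.

    P is a row-stochastic matrix (list of lists) and F is a set of
    absorbing target states. Uses the backward recursion
    h_0 = indicator of F,  h_{k+1}(x) = sum_y P[x][y] * h_k(y).
    Since F is absorbing, h_n(x) is the probability of having
    reached F within n steps when starting from x.
    """
    size = len(P)
    h = [Fraction(1) if x in F else Fraction(0) for x in range(size)]
    for _ in range(n):
        h = [sum(P[x][y] * h[y] for y in range(size)) for x in range(size)]
    return h


half = Fraction(1, 2)
# Hypothetical 3-state chain; state 2 is the absorbing target.
P = [
    [half, half, 0],
    [0, half, half],
    [0, 0, 1],
]
F = {2}
two_step = reach_probabilities(P, F, 2)
```

Exact rational arithmetic is used so that equalities between reach probabilities can be tested without rounding issues.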
Definition 2.1. Let be a measurable space and let . Let be a sub σ-algebra of such that . A function is a transition probability on if (i) is a probability measure on , for any , and (ii) is -measurable, for any . Given a transition probability on , we denote by the transition probability on given inductively by We denote by the set of the transition probabilities on . We denote by the set of all transition probabilities on , which we equip with a suitable pseudometric : It is such that This pseudometric, which is compatible with , allows one to approximate by simpler kernels.
A target problem is defined through a transition probability . More precisely, we have the following definition.
Definition 2.2. A target problem is a quadruple , where and . A simple target problem is a target problem where is generated by an at most countable partition of .
The main purpose of this paper is to approximate any target problem by a sequence of simple target problems in the spirit of the construction of the Lebesgue integral, where the integral of a function is approximated by the integral of simple functions . The strategies will play the role of the approximating subdivisions .
Definition 2.3. We call strategy a sequence of maps from the set of the target problems to the set of the simple target problems. A strategy is a target algorithm if it is built as in Section 3.
In the “Lebesgue example” given above, the strategy is related to the “objective” of the problem (the integral) and the pseudometric is required to go to 0 as goes to infinity. Here also, a strategy is meaningful if tends to 0 as goes to infinity. Moreover, for what concerns applications, given a target problem a good strategy should not need the computation of , . The first main result of this paper states that the target algorithms are always good strategies.
Theorem 2.4. For any target problem and any target algorithm , where .
Two questions immediately arise: does the sequence have a limit (and in which sense)? Moreover, since is defined as a pseudometric, does this limit depend on the choice of ?
The extension of the concept of compatible projection given in [1–3] to our framework will enable us to understand better the answer to these questions. A measurable set of a measurable space is an atom if it has no nonempty measurable proper subset. No two distinct atoms intersect. If the σ-field is countably generated, say by the sequence $\{A_n\}_n$, then its atoms are of the form $\bigcap_n C_n$, where each $C_n$ is either $A_n$ or its complement.
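The description of atoms as intersections of generators or their complements can be made concrete on a finite set. The sketch below assumes a finite universe and finitely many generating sets; the function name `atoms` and the sample sets are hypothetical and serve only as an illustration.

```python
from itertools import product


def atoms(universe, generators):
    """Atoms of the sigma-field generated by finitely many subsets.

    Each atom is a nonempty intersection of sets C_k, where C_k is
    either the k-th generator or its complement in `universe`.
    """
    universe = frozenset(universe)
    found = set()
    for choice in product([True, False], repeat=len(generators)):
        cell = universe
        for keep, g in zip(choice, generators):
            g = frozenset(g)
            # Intersect with the generator or with its complement.
            cell = cell & g if keep else cell - g
        if cell:
            found.add(cell)
    return found


# Hypothetical example on {0, ..., 5} with two generators.
example_atoms = atoms(range(6), [{0, 1, 2}, {2, 3}])
```

The atoms are pairwise disjoint and cover the universe, so they form the partition associated with the generated σ-field.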
Definition 2.5. An equivalence relation on a measurable space is measurable (discrete) if there exists a (discrete) random variable ( denotes the Borel σ-algebra), such that and we denote it by . Let be a target problem. A compatible projection is a measurable equivalency such that and We say that if corresponds to partitions finer than . Finally, a compatible projection is said to be optimal if , for any other compatible projection .
Theorem 2.7. If is a compatible projection for the target problem , then there exists a target problem such that for any .
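For a finite chain, the content of this theorem can be checked numerically. The sketch below assumes that compatibility means the following: equivalent states assign the same probability to every equivalence class, and the target is a union of classes. Under this reading (an assumption), the reduced chain on the classes has the same reach probabilities as the original one. All names (`is_compatible`, `reduced_matrix`, `hit`, `labels`) and the four-state chain are hypothetical.

```python
from fractions import Fraction


def class_row(P, labels, classes, x):
    """Probabilities of moving from state x to each class."""
    return tuple(sum(P[x][y] for y in range(len(P)) if labels[y] == c)
                 for c in classes)


def is_compatible(P, labels):
    """True if states with equal labels send equal mass to every class."""
    classes = sorted(set(labels))
    seen = {}
    for x, lab in enumerate(labels):
        row = class_row(P, labels, classes, x)
        if seen.setdefault(lab, row) != row:
            return False
    return True


def reduced_matrix(P, labels):
    """Transition matrix on the classes, read off one representative each."""
    classes = sorted(set(labels))
    return [list(class_row(P, labels, classes, labels.index(c)))
            for c in classes]


def hit(P, target, n):
    """n-step probabilities of being in the absorbing set `target`."""
    h = [Fraction(int(x in target)) for x in range(len(P))]
    for _ in range(n):
        h = [sum(P[x][y] * h[y] for y in range(len(P))) for x in range(len(P))]
    return h


q, half = Fraction(1, 4), Fraction(1, 2)
# Hypothetical chain: states 0 and 1 behave alike, state 3 is the target.
P = [
    [q, q, half, 0],
    [half, 0, half, 0],
    [0, 0, half, half],
    [0, 0, 0, 1],
]
labels = [0, 0, 1, 2]          # projection: state -> class
Q = reduced_matrix(P, labels)  # chain on the three classes
```

In this example the reach probabilities of `P` towards `{3}` coincide, state by state, with those of `Q` towards the class `{2}` evaluated at the class of each state.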
It is not clear a priori that an optimal compatible projection must exist. If one exists, then it is obviously unique.
Theorem 2.8. For any target problem , there exists a (unique) optimal compatible projection .
To conclude the main results, let us first go back to the Lebesgue example. The simple functions are chosen so that increases to and . The following theorem guarantees these two similar facts by showing the “convergence” of any strategy to the optimal problem.
Theorem 2.9. Let , with target algorithm and let be the optimal compatible projection associated to the target problem . Then (i) for any , and ; (ii), for any .
Remark 2.10 (The topology top). In Theorems 2.4–2.9, we have proved the convergence of to with respect to the pseudometric . The pseudometric topology is the topology induced by the open balls , which form a basis for the topology. Accordingly, the previous theorems may be reread in terms of convergence of to on the topological space .
2.1. Connection with Weak Convergence
Given a strategy , if we want to show a sort of weak convergence of to , for any , we face the following two problems: (i) each is defined on a different domain (namely, on ); (ii) we have not required a topology on .
We overcome the first restriction by introducing a new definition of probability convergence. The idea is given in the following example.
Example 2.11. Let , be the σ-algebra on generated by the dyadic subdivision. Suppose we know that is the unique probability on s.t. for any , . Even if is not defined on the Borel sets of , it is clear that, in “some” sense, it must happen that , where is the Lebesgue measure on the Borel sets of . Note that the cumulative function of is not defined, and therefore standard weak convergence cannot be verified.
In fact, we know that that is, in this case, as , we can determine the cumulative function on a dense subset. This fact lets us hope that in a particular sense.
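The remark about the cumulative function on a dense subset can be illustrated numerically. The sketch below rests on an assumption about the example: the level-n measure assigns mass 2^{-n} to each dyadic interval ((i-1)/2^n, i/2^n] of the unit interval. Under this assumption, the cumulative function is determined at every dyadic point of level at most n, where it agrees with that of the Lebesgue measure. The names `dyadic_measure` and `cdf_at` are hypothetical.

```python
from fractions import Fraction


def dyadic_measure(n):
    """Mass 2**-n on each interval ((i-1)/2**n, i/2**n], i = 1..2**n.

    Intervals are stored as (left, right) endpoint pairs.
    """
    step = Fraction(1, 2 ** n)
    return {(i * step - step, i * step): step for i in range(1, 2 ** n + 1)}


def cdf_at(measure, t):
    """Total mass of the stored intervals contained in (0, t].

    This is the value of the cumulative function only when t is an
    endpoint of the subdivision; elsewhere it is not determined.
    """
    return sum(mass for (_, right), mass in measure.items() if right <= t)


level_4 = dyadic_measure(4)
value = cdf_at(level_4, Fraction(3, 4))
```

As the level grows, the set of points where the cumulative function is known becomes dense in the unit interval.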
Definition 2.12. Let be a filtered space, and set . Let , and be probability measures. One says that converges totally to on the topological space as tends to infinity if (converges in weak sense on ), for any , such that . One writes .
Going back to the example, it is simple to check that , where are given in Example 2.11 and is the standard topology on . In fact, let be any extension of to the Borel sets of . For any , we have by (2.9) that where is the cumulative function of , which implies the weak convergence of to and, therefore, .
As for the topology on , we will define the topological space induced by the pseudometric associated to the target problem , and the pseudometric . In this way is only defined with the data of the problem. One may ask: is this topology too poor? The answer is no, since it is defined by the pseudometric . In fact, means that and play “almost the same role” with respect to . A direct algorithm which takes into account needs the computation of at each step. In any case, even if may not be computable, it defines a nontrivial interesting topology on . As expected, we have the following theorem.
Theorem 2.13. Let , with target algorithm. Then, for any given ,
3. The Target Algorithm
In this section, we introduce the core of the approximating target problem, namely a set of strategies which solves the target problem.
Given a measurable space and a target problem , the target algorithm is built in the spirit of the exact one given in [1, 2], which starts from the largest classes and and then reaches the optimal classes according to a backward construction.
The target algorithm defines a strategy , where and it consists of three steps: (1) the choice of a sequence of equivalences on the simplex defined on the unit ball of with ; (2) the definition of a filtration based on where each is generated by a countable partition of ; (3) the choice of a suitable measure and the definition of .
3.1. Preliminary Results on Measurability and Equivalency and the Choice of
Associated with each countably generated sub σ-algebra , we define the equivalence relation induced by the atoms of : Thus, if is a sequence of countably generated σ-algebras, then
Now, the atoms of the σ-algebra of each simple target problem are at most countable, by definition. Then may be represented as a transition matrix on the state set . Each row of is a probability distribution on (i.e., a sequence in the simplex of ). The first step of the target algorithm is to equip with the $\ell^1$-norm and then to define an ε-equivalence on .
We will alternatively use both the discrete equivalencies and the countable measurable partitions, as a consequence of the following result, whose proof is left to the appendices.
Lemma 3.1. Given a measurable space , there exists a natural bijection between the set of discrete equivalencies on and the set of the countable measurable partitions of it.
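On a finite set, the correspondence in this lemma amounts to passing between a labelling map and the partition into its level sets. The following sketch illustrates both directions; the function names and the sample labelling are hypothetical, and measurability plays no role in the finite case.

```python
def partition_from_labels(points, label):
    """Partition `points` into the level sets of the discrete map `label`."""
    blocks = {}
    for x in points:
        blocks.setdefault(label(x), set()).add(x)
    return {frozenset(b) for b in blocks.values()}


def labels_from_partition(partition):
    """Number the blocks and return the induced map point -> block index."""
    ordered = sorted(partition, key=min)
    return {x: i for i, block in enumerate(ordered) for x in block}


points = range(6)
# Hypothetical discrete random variable: x -> x mod 3.
partition = partition_from_labels(points, lambda x: x % 3)
labelling = labels_from_partition(partition)
# Going back from the labelling recovers the same partition.
round_trip = partition_from_labels(points, lambda x: labelling[x])
```

Two different labellings with the same level sets give the same partition, which is why the lemma is stated for equivalences rather than for the maps themselves.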
Let be the unit sphere in and be the simplex on . Let , for any , and be the standard topology on . Denote by the Borel σ-algebra on generated by . We look at as a subset of so that the Borel σ-algebra induced on is .
Definition 3.2. is an ε-equivalence on if it is the product of finite equivalences on each , and whenever .
Remark 3.3. The choice of the $\ell^1$-norm on is linked to the total variation distance between probability measures. This distance between two probability measures and is defined by . On the other hand, the total variation of a measure is , where the supremum is taken over all the possible partitions of . As , we have that ; see . To each corresponds the probability measure on with (and vice versa). In fact, implies and . Therefore, since , we have
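The link between the $\ell^1$-norm and the total variation distance can be checked by brute force on a finite set. The sketch below takes the total variation distance between two probability vectors to be the supremum over events of the absolute difference of their probabilities, which is the standard definition; it then verifies on a sample pair that this equals half the $\ell^1$ distance. The function names and the vectors `p`, `q` are illustrative.

```python
from fractions import Fraction
from itertools import combinations


def l1_distance(p, q):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))


def tv_distance(p, q):
    """sup over events A of |p(A) - q(A)|, by enumeration of all subsets."""
    indices = range(len(p))
    best = Fraction(0)
    for r in range(len(p) + 1):
        for event in combinations(indices, r):
            best = max(best, abs(sum(p[i] - q[i] for i in event)))
    return best


# Hypothetical probability vectors on three points.
p = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
q = [Fraction(1, 4), Fraction(1, 4), Fraction(1, 2)]
```

The enumeration is exponential in the number of points and is meant only as a check of the identity, not as a practical method.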
Example 3.4. Define the ε-cut as follows: , for all , where denotes the integer part of . is an ε-equivalence on . Indeed, (i) for each , define . Then is a finite equivalence on and , for all ; (ii) for any is measurable with respect to ; (iii) for all ,
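A cut of this kind can be sketched on finite probability vectors. The version below is an assumption about the precise rule: coordinate n (counted from 1) is placed on a grid of width ε·2^{-n}, and two vectors are equivalent when all their grid cells agree. With this choice, equivalent vectors differ by less than ε·2^{-n} in coordinate n, hence by less than ε in $\ell^1$-norm. The names `eps_cut_signature` and `eps_equivalent` are hypothetical.

```python
from fractions import Fraction
from math import floor


def eps_cut_signature(p, eps):
    """Grid cell of each coordinate; coordinate n uses width eps * 2**-n."""
    return tuple(floor(p_n / (eps * Fraction(1, 2 ** n)))
                 for n, p_n in enumerate(p, start=1))


def eps_equivalent(p, q, eps):
    """True if p and q fall in the same grid cell in every coordinate."""
    return eps_cut_signature(p, eps) == eps_cut_signature(q, eps)


eps = Fraction(1, 2)
# Hypothetical probability vectors with two coordinates.
p = [Fraction(9, 16), Fraction(7, 16)]
q = [Fraction(5, 8), Fraction(3, 8)]
r = [Fraction(1, 2), Fraction(1, 2)]
```

Here `p` and `q` share the signature `(2, 3)` and are therefore equivalent, while `r` has signature `(2, 4)` and lies in a different class.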
3.2. The Choice of
The following algorithm is a good candidate to be a strategy for the approximating problem we are facing. Given a sequence of ε-equivalences on , we define inductively. Consider the equivalence classes given by and divide them again according to the next rule. Starting from any two points in the same class, we check whether the probabilities of reaching any other -class are ε-the same. Formally, we have the following steps.
Step 0. .
Step n. is based on the equivalence and on , inductively. is generated by a countable partition of , say . We define, for any couple , Lemma 3.6 shows that is a discrete equivalency on , and therefore it defines as generated by a countable partition of .
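One refinement step can be sketched for a finite chain. The code below is an illustration under several assumptions: the state space is finite, the current partition is given by integer labels, the ε-equivalence is a grid cut with width ε·2^{-n} on coordinate n, and a single ε is used instead of a sequence. Two states remain in the same class only if they were already together and their vectors of class-reaching probabilities are cut-equivalent. The names `cut_signature`, `refine`, and the sample chain are hypothetical.

```python
from fractions import Fraction
from math import floor


def cut_signature(vec, eps):
    """Grid cell of each coordinate; coordinate n uses width eps * 2**-n."""
    return tuple(floor(v / (eps * Fraction(1, 2 ** n)))
                 for n, v in enumerate(vec, start=1))


def refine(P, labels, eps):
    """Split each current class by the cut of the class-probability vectors.

    Returns new integer labels defining a partition that is finer
    than (or equal to) the one given by `labels`.
    """
    classes = sorted(set(labels))
    size = len(P)
    keys = []
    for x in range(size):
        to_class = [sum(P[x][y] for y in range(size) if labels[y] == c)
                    for c in classes]
        # Keep the old class in the key so that classes are only split.
        keys.append((labels[x], cut_signature(to_class, eps)))
    renumber = {k: i for i, k in enumerate(sorted(set(keys)))}
    return [renumber[k] for k in keys]


q, half = Fraction(1, 4), Fraction(1, 2)
# Hypothetical chain; state 3 is the absorbing target.
P = [
    [q, q, half, 0],
    [half, 0, half, 0],
    [0, 0, half, half],
    [0, 0, 0, 1],
]
start = [0, 0, 0, 1]                 # target versus its complement
step_1 = refine(P, start, Fraction(1, 4))
step_2 = refine(P, step_1, Fraction(1, 4))
```

In this example the first step separates state 2 from states 0 and 1, and the second step leaves the partition unchanged, so the procedure has stabilized.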
Remark 3.5. The choice of the “optimal” sequence is beyond the scope of this work. We only note that the definition of can be relaxed and the choice of the sequence may be made interactively, yielding fewer classes at each step.
Lemma 3.6. Let be a target problem. defined as above, is a filtration on and for any , is a finite (and hence discrete) equivalency on .
Proof. The monotonicity of is a simple consequence of (3.7).
The statement is true for , since . For the induction step, let be the measurable partition of given by . The map given by is therefore measurable. As is a finite equivalency on , the map is also measurable, where is the natural projection associated with . Thus, two points are such that if and only if their images under are the same point of . It follows that the new partition of built by is obtained by intersecting the sets (which formed the -partition) with the preimages of under . Since the intersection of two measurable finite partitions of is again a measurable finite partition of , we are done.
3.3. The Choice of and the Definition of
Before defining , we need the following result, which will be proved in Section 5.
Theorem 3.7. Let be defined as in the previous section and let . Then is a compatible projection.
More precisely, let be a probability measure on such that , for any (the existence of such a measure is shown in Example 3.8).
For any , let be the -random variable such that . Define is uniquely defined on , the only -null set of being the empty set. We claim that is a probability measure, for any .
We now give an example of the measure used in (3.9), which justifies its existence.
Example 3.8. Let be a sequence of independent and identically distributed geometric random variables, with , . Let and set . There exists a probability measure on such that
and thus, , for all . Moreover, it follows that for any ,
We check by induction that we can embed into , for any . The required measure will be the trace of on the embedded -field .
For , define , . The embedding forms a nontrivial partition, and therefore the restriction of to the embedding of defines a probability measure on with if .
For the induction step, suppose it is true for . Given , we have , where is a nontrivial partition in and therefore the restriction of to the embedding of defines a probability measure on with if .
Given , let . The monotonicity of ensures that each will belong to one and only one . Moreover, by definition of , we have that Since is at most countable, we may order for any . We have accordingly defined an injective map , where
According to the cardinality of , define the -embedding By definition of and (3.12), it follows that we have mapped into a partition in . Moreover, as a consequence of (3.11). The restriction of to the embedding of defines a probability measure on with if . Note that is by construction an extension of to since by (3.14),
Finally, Carathéodory's extension theorem ensures the existence of the required , as , for any . Note that is just mapped to the trace of on the embedded .
The problem of approximation is mathematically different if we start from a Markov process with a countable set of states or with an uncountable one. Let us consider, for the moment, the countable case: is an at most countable set of states and is the power set. Each function on is measurable. If we take any equivalence relation on , it is both measurable and identified by the σ-algebra it induces (see Theorem 4.6). This is not in general the case when we deal with a measurable space , with uncountable. In this section, we want to connect the process of approximation with the upgrading of information. More precisely, a measurable equivalence defines both the partition and the σ-algebra . One wishes these two objects to be related, in the sense that ordering should be preserved. Example 4.5 shows a paradox concerning and when is uncountable. In fact, we have the following lemma.
Lemma 4.1. Let be countably generated sub σ-algebras of a measurable space . Then .
In particular, let be random variables. If , then .
Proof. See Appendix A.
The problem is that even if a partition is more informative than another one, it need not generate a finer σ-algebra; that is, the following implication does not hold for every pair of random variables and : Hence the converse of Lemma 4.1 fails unless we require the further assumption (A0) on the measurable space . This last fact connects the space with the theory of Blackwell spaces (see Lemma 4.3). We will assume only (A0).
Example 4.2 (). We give here a counterexample to assumption (A0), where two random variables generate two different sigma algebras with the same set of atoms. Obviously, assumption (A0) does not hold. Let be a Polish space and suppose . Let and consider the sequence , that determines , that is, , . Let . . As a consequence of Lemma A.3, there exist two random variables such that and . The atoms of are the points of , and then the atoms of are also the points of , since .
We recall here the definition of Blackwell spaces. A measurable space is said to be Blackwell if is a countably generated σ-algebra of and whenever is another countably generated σ-algebra of such that , and has the same atoms as . A metric space is Blackwell if, when endowed with its Borel σ-algebra, it is Blackwell. The measurable space is said to be a strongly Blackwell space if is a countably generated σ-algebra of and (A1) if and only if the sets of their atoms coincide, where and are countably generated σ-algebras with , .
As far as Blackwell spaces are concerned, the literature is quite extensive. Blackwell proved that every analytic subset of a Polish space is, with respect to its relative Borel σ-field, a strongly Blackwell space (see ). Therefore, if is (an analytic subset of) a Polish space and , then cannot be a weakly Blackwell space (see Example 4.2). Moreover, since any (at most) countable set equipped with any σ-algebra may be seen as an analytic subset of a Polish space, it is a strongly Blackwell space. More connections and examples involving Blackwell spaces, measurable sets, and analytic sets in connection with the continuum hypothesis (CH) may be found in [8–11]. Finally, note that assumption (A0) and assumption (A1) coincide, as the following lemma states.
Lemma 4.3. Let be a measurable space. Then (A0) holds if and only if (A1) holds.
Proof. Lemma A.3 in Appendix A asserts that is countably generated if and only if there exists a random variable such that . In addition, as a consequence of Lemma 4.1, we have only to prove that (A1) implies (A0). By contradiction, assume (A1), , but . We have , and then by (A1) and Lemma A.3. On the other hand, from (3.3), we have that .
We call weakly Blackwell space a measurable space such that assumption (A0) holds. If is a weakly Blackwell space, then is a weakly Blackwell space, for any . Moreover, every strong Blackwell space is both a Blackwell space and a weakly Blackwell space whilst the other inclusions are not true in general. In [12, 13], examples are provided of Blackwell spaces which may be shown not to be weakly Blackwell. The following example shows that a weakly Blackwell space need not be Blackwell.
Example 4.4 (weakly Blackwell ⇏ Blackwell). Let be an uncountable set and be the countable-cocountable σ-algebra on . is easily shown not to be countably generated, and therefore is not a Blackwell space. Take any countably generated σ-field , that is, , . (i) Since each set of (or its complement) is countable, we can assume, without loss of generality, the cardinality of to be countable. (ii) Each atom of , is of the form Note that the cardinality of the set is countable, as it is a countable union of countable sets. As a consequence of (4.1), we face two types of atoms: (1) for any , . This is the atom made by the intersection of all the uncountable generators. This is an uncountable atom, as it is equal to . (2) There exists such that . This implies that this atom is a subset of the countable set . Therefore, all the atoms (except ) are disjoint subsets of the countable set and hence they are countable. It follows that the number of atoms of is at most countable. Thus, is a strongly Blackwell space, that is, is a weakly Blackwell space.
Example 4.5 (Information and σ-algebra (see )). Suppose , where is the countable-cocountable σ-algebra on and . Consider a decision-maker who chooses action 1 if and action 2 if . Suppose now that the information is modeled either as the partition of all elements of , , and in this case the decision-maker is perfectly informed, or as the partition . If we deal with σ-algebras as a model of information then and . The partition is more informative than , whereas is not finer than . In fact and therefore if the decision-maker uses as his structure of information, believing it more detailed than , he will never know whether or not the event has occurred and can be led to take the wrong decision. In this case, σ-algebras do not preserve information because they are not closed under arbitrary unions. However, if we deal with Blackwell spaces, any countably generated σ-algebra is identified by its atoms and therefore will possess an informational content (see, e.g., ).
The following theorem, whose proof is in Appendix B, connects the measurability of any relation to the cardinality of the space and assumption (A0). It shows the main difference between the uncountable case and the countable one.
Theorem 4.6. Assume (CH). Let be a measurable space. The following properties are equivalent: (1) any equivalence relation on is measurable and assumption (A0) holds; (2) is a weakly Blackwell space; (3) is countable and .
The following theorem mathematically motivates our approximation problem: any limit of a monotone sequence of discrete equivalence relationships is a measurable equivalence.
Theorem 5.1. For all , let be a discrete equivalency. Then is a measurable equivalency. Conversely, for any measurable equivalency , there exists a sequence of discrete equivalencies such that .
Proof. See Appendix A.
Proof of Theorem 2.7. Let be a compatible projection. We define What remains to prove is that . More precisely, we have to show that is -measurable, for all . By contradiction, there exists such that the random variable is not -measurable. Then , and hence by assumption (A0), which contradicts (2.7).
Proof of Theorem 3.7. As a consequence of Theorem 5.1, , where . Define
We will prove that, for any , is -measurable and consequently will be a compatible projection. This implies that there exists a measurable function so that . Therefore, if , then , which is the thesis.
We need to show that for any and , we have We first show that it is true when by proving that which implies that . The inclusion is always true. For the other inclusion, let . Let ; there exists such that . Therefore, (3.7) and the definition of imply , for any . As , we obtain that . Then (5.3) is true on the algebra .
Actually, let such that . We prove that (5.3) holds for by showing that Again, since , then and therefore . Conversely, the set is empty since the sequence of -measurable maps converges to 0: Then (5.3) is true on the monotone class generated by the algebra , that is, on .
Proof of Theorem 2.8. Given a target algorithm , let be defined as in Theorem 3.7. We claim that is optimal. Let be another compatible projection and let be the target problem given by Theorem 2.7. We are going to prove by induction on that
In fact, for it is sufficient to note that .
Equation (3.7) states that , where is the discrete random variable, given by Lemma 3.1, s.t.(5.8)
Let be defined as . Then(5.9) Obviously, . For the induction step, as , we have that is -measurable, and therefore . Then . Therefore , which implies by Lemma 4.1, and hence is optimal.
Corollary 5.2. does not depend on the choice of .
Proof. is optimal, for all . The optimal projection being unique, we are done.
Proof of Theorem 2.4. Let be defined as in Theorem 3.7 and be given by Theorem 2.7 so that for any . Then each of (3.9) can be rewritten as
where is the -class of equivalence of and since .
Note that . Then, for any , there exists an so that . Therefore we are going to prove by induction on that which completes the proof. If , then by definition of , since , we have that For the induction step, we note that where is the partition of given by . By induction hypothesis, for any , for large enough. Since if , it follows that Equation (5.13) becomes On the other hand, by (5.10), The definition of states that whenever and therefore Since as tends to infinity, we get the result.
Proof of Theorem 2.9. By (3.9) and Lemma 3.6, is a martingale with respect to the filtration , for any . Then, if as in (3.9), we have that
Let be defined as in Theorem 3.7 and given by Theorem 2.7 so that for any . Unfortunately, (5.20) is not enough to state that
even if is countably generated (see, e.g., , for a counterexample). In fact, a Polish assumption is made in  to guarantee (5.21).
Here, we will deal with the specific properties of and to deduce (5.21). Take and . By (5.10) and the definition of , since , we have, for any , that since the only -null set in is the empty set. Then for any and .
5.1. Weak Convergence of Conditional Probabilities
Let the target problem be given and let , where be a target algorithm. In order to prove Theorem 2.13, which states the total convergence of the probability measure towards , we proceed as follows: (i) first, we define the topology on ; (ii) then, we define a “natural” topology on associated to any target algorithm . We prove in Theorem 5.4 the total convergence of to , under this topology; (iii) then, we define the topology on as the intersection of all the topologies ; (iv) finally, we show Theorem 2.13 by proving that . The nontriviality of will imply that of .
We introduce the pseudometric on as follows:
Now, let be the topology generated by . is a closed set if and only if , . In fact, if , for a given , then , for any and therefore is closed. is a topological space.
Remark 5.3. Let us go back to Example 2.11. The topology defined by asking that the sets in each are closed is strictly finer than the standard topology. On the other hand, the same example may be explained with left-closed, right-open dyadic subdivisions, which leads to a different topology that also contains the natural one. Any other “reasonable” choice of subdivision will lead to the same point: the topologies are different, and all contain the standard one. In the same manner, we are going to show that all the topologies contain the standard one, .
Theorem 5.4. Let the target problem and the target algorithm be given. For any target algorithm ,
Proof. Denote by any extension of to . We have to check that , for any closed set of and (see, e.g., ).
Let , with and (take, e.g., as the closure of in ).
Note that, since , we have . But, . Actually, as tends to infinity, from the target algorithm and as tends to infinity, from the continuity of the measure.
An example of a natural extension of to is given by where, for any , is the -random variable such that . As mentioned for , is a probability measure, for any .
Corollary 5.5. For any fixed strategy , let be as in Theorem 2.4. We have for any given .
In order to describe the topology , we will denote by the closure of a set in a given topology . Note that the monotonicity of implies where is the (discrete) topology on generated by . Since is the intersection of all the topologies , we have
Proof of Theorem 2.13. Let be the closed set in so defined
that is, is the complement of an open ball in with center and radius . If we show that , then we are done, as the arbitrary choice of and spans a base for the topology .
We are going to prove which implies . It is always true that ; we prove the nontrivial inclusion . Assume that . Now, , for any , and then there exists a sequence with such that , for any . Thus, , where is the -class of equivalence of . Thus since is -measurable. By Theorem 2.4, for any , Now, let be such that and take sufficiently large s.t. We have and therefore The arbitrary choice of implies , which is the thesis.
A. Results on Equivalence Relations
In this appendix we give the proof of auxiliary results that connect equivalency with measurability.
Proof of Lemma 3.1. Let be a discrete equivalency on . Then defines a countable measurable partition of . Conversely, let be a countable measurable partition on . Define s.t. . Therefore is measurable and is a discrete equivalency on .
Lemma A.1. Let be two random variables such that . Then .
Proof. Let be fixed. We must prove that . If or are empty, then we are done. Assume then that and define . We have the following two cases. Case 1 ( such that ). By definition of , . Conversely, let . Since , then , that is, . Then . Case 2 ( we have that ). Then . Conversely, let . Since , then , which implies , that is, . Then .
The next lemma plays a central role. Its proof uses a standard set-theoretic argument.
Lemma A.2. For all , let be a discrete measurable equivalency. Then there exists a random variable such that .
Proof. Before proving the core of the Lemma, we build a sequence of functions that will be used to define the function .
Take to be the increasing function and let the sequence of functions be defined as follows: