Abstract

In most synthesis evaluation systems and decision-making systems, data are represented by objects and attributes of objects with a degree of belief. Formally, such data can be abstracted in the form (objects, attributes, P), where P represents a degree of belief between objects and attributes, namely a basic probability assignment. In this paper, we provide a kind of probability information system to describe these data and then employ rough set theory to extract probability decision rules. By an extension of Dempster-Shafer evidence theory, we obtain the probabilities of the antecedents and conclusions of probability decision rules. Furthermore, we analyze the consistency of probability decision rules. Based on this consistency, we provide an inference method for probability decision rules, which can be used to decide the class of a new object. We conclude that the inference method of this paper handles not only precise information but also imprecise or uncertain information.

1. Introduction

Processing uncertain or incomplete information belongs to the sphere of artificial intelligence. The term "reasoning with uncertain or incomplete information" in the narrow sense means the way of representing the partial information that is available to a user about a fragment of reality and the way of processing such information. In the broader sense, it denotes the interdisciplinary sphere of research concerned with the search for methods of modeling uncertain or incomplete knowledge. These methods can refer to any application domain and any level of knowledge; one seeks ways of representing both object-level knowledge and meta-level knowledge, the latter being knowledge about the former [15]. Nowadays, the main tools for processing uncertain or incomplete information are fuzzy set theory, probability theory, possibility theory, Dempster-Shafer evidence theory, and rough set theory. The main advantage of fuzzy set theory is that the fuzzy set framework provides many combination operators, which allows the user to adapt the processing scheme to the specificity of the data at hand [6–17]. Probability theory has a solid mathematical foundation, but its inference cannot model uncertain measurements [18–20]. Numerical possibility distributions can encode special convex families of probability measures. The connection between possibility theory and probability theory is potentially fruitful in the scope of statistical reasoning, because variability of observations should be distinguished from incomplete information [21, 22]. Dempster-Shafer evidence theory has the ability to deal with ignorance and missing information [23]. In particular, it provides explicit estimations of imprecision and conflict between information from different sources. Indeed, probability theory may be seen as a limit of Dempster-Shafer evidence theory when it is assumed that there is no imprecision and that only certainty has to be taken into account [24–27]. By using indistinguishability relations, rough set theory can model and handle incomplete information and uncertain knowledge discovered from information systems [2, 28–30].

In most synthesis evaluation and decision-making systems, data reflect the relation between objects and attributes with a degree of belief, that is, an object has an attribute with a degree of belief. Informally, information systems or decision tables with a degree of belief are called probability information systems or probability decision tables. In this paper, we focus on probability decision tables: we analyze the consistency of the probability decision rules extracted from them and provide an inference method based on probability decision rules. The organization of this paper is as follows. In Section 2, we briefly review an extension of Dempster-Shafer evidence theory (DSEV). In Section 3, we introduce a kind of probability information system and probability decision table to represent objects and attributes with a degree of belief, and we show that our probability information system is an extension of the classical information system and a special case of the interval-valued information system. In Section 4, we discuss how to extract probability decision rules from a probability decision table. In Section 5, we discuss the consistency of probability decision rules. In Section 6, we provide a method to perform inference with probability decision rules. We conclude in Section 7.

2. Extension of DSEV

In DSEV, probability masses are allocated to subsets of a frame of discernment, in contrast to Bayesian probability theory, in which only singletons carry probability masses. Subsets with positive mass are called focal elements, and the focal elements together with their masses constitute the basic probability assignment. Hence, DSEV can be seen as a generalization of classical probability theory. Formally, DSEV concerns itself with belief structures, which can be defined as follows: let $\Theta$ be a finite set of elements, and let $m$ be a measure on the subsets of $\Theta$ such that (1) $m(A) \geq 0$ for each $A \subseteq \Theta$; (2) $m(\emptyset) = 0$; (3) $\sum_{A \subseteq \Theta} m(A) = 1$. $m$ is called a basic probability assignment function. Any subset $A$ of $\Theta$ such that $m(A) > 0$ is called a focal element. $m$ and its associated focal elements are called a belief structure. Two important functions play a significant role in DSEV: $Bel(A) = \sum_{B \subseteq A} m(B)$ and $Pl(A) = \sum_{B \cap A \neq \emptyset} m(B)$ for any subset $A$ of $\Theta$. $Bel$ is called a belief function, and $Pl$ is called a plausibility function. $Bel(A)$ measures the total amount of probability that must be distributed among the elements of $A$, and $Pl(A)$ measures the maximal amount of probability that can be distributed among the elements of $A$. Denoting by $\bar{A}$ the complement of $A$, we have (i) $Bel(\emptyset) = Pl(\emptyset) = 0$; (ii) $Bel(\Theta) = 1$, $Pl(\Theta) = 1$; (iii) $Bel(A) \leq Pl(A)$, $Bel(A) + Bel(\bar{A}) \leq 1$; (iv) $Pl(A) = 1 - Bel(\bar{A})$, $Pl(A) + Pl(\bar{A}) \geq 1$.
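To make the definitions above concrete, the following minimal Python sketch computes $Bel$ and $Pl$ for a basic probability assignment stored as a mapping from focal elements (frozensets) to masses; the frame and the masses in the example are illustrative assumptions, not data from the paper.

```python
# A minimal sketch of a belief structure: a basic probability assignment
# maps focal elements (frozensets over the frame) to positive masses.

def bel(bpa, A):
    """Belief of A: total mass of focal elements contained in A."""
    return sum(m for B, m in bpa.items() if B <= A)

def pl(bpa, A):
    """Plausibility of A: total mass of focal elements intersecting A."""
    return sum(m for B, m in bpa.items() if B & A)

# Illustrative frame {a, b, c} with two focal elements (masses sum to 1).
bpa = {frozenset({'a'}): 0.6, frozenset({'b', 'c'}): 0.4}
A = frozenset({'a', 'b'})
print(bel(bpa, A), pl(bpa, A))                        # 0.6 1.0, so Bel(A) <= Pl(A)
print(pl(bpa, A) == 1 - bel(bpa, frozenset({'c'})))   # Pl(A) = 1 - Bel(complement)
```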

In the classical probability model, the probability mass function is a mapping $p: \Theta \rightarrow [0, 1]$ with $\sum_{\theta \in \Theta} p(\theta) = 1$, which indicates how the probability mass is assigned to the elements of $\Theta$. Based on the probability mass function $p$, the set mapping $Prob$ can be induced, where for each $A \subseteq \Theta$, we have $Prob(A) = \sum_{\theta \in A} p(\theta)$. Obviously, the assignment $m(\{\theta\}) = p(\theta)$ is a belief structure whose focal elements are singletons; furthermore, we have $Bel(A) = Prob(A) = Pl(A)$. However, in DSEV, we know the probabilities of the focal sets instead of the probabilities of each element $\theta$; hence, we are not able to calculate the probability $Prob(A)$ associated with the subsets of $\Theta$, but instead use the two measures $Bel(A)$ and $Pl(A)$, corresponding to a lower and an upper bound on the unknown $Prob(A)$, that is, $Bel(A) \leq Prob(A) \leq Pl(A)$.
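The special case just described can be checked with the helpers above: when every focal element is a singleton, belief and plausibility coincide with the classical probability of the event (the mass function below is an illustrative assumption).

```python
# Reusing bel/pl from the previous sketch: a Bayesian belief structure,
# i.e., all mass on singletons, collapses to an ordinary probability.
p = {'a': 0.5, 'b': 0.25, 'c': 0.25}             # classical mass function
bpa = {frozenset({v}): w for v, w in p.items()}  # singleton focal elements

A = frozenset({'a', 'c'})
assert bel(bpa, A) == pl(bpa, A) == p['a'] + p['c']  # both bounds equal 0.75
```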

In [18], the extension of belief structures was proposed. Extending a belief structure means using a belief structure defined on one frame to obtain a belief structure on another frame. Consider two frames $\Theta$ and $\Omega$, whose elements are possible answers to two perhaps related questions. We say that an element $\theta \in \Theta$ is compatible with an element $\omega \in \Omega$ if it is possible, relative to our knowledge and opinion, that $\theta$ is the answer to the question considered by the frame $\Theta$ and $\omega$ is the answer to the question considered by the frame $\Omega$. Denote this as $\theta C \omega$. If for all $\theta \in \Theta$ and all $\omega \in \Omega$ we have $\theta C \omega$, then we say that the two questions are independent.

Let $\Theta$ and $\Omega$ be two frames, and let $\Theta \times \Omega$ be their Cartesian product, which consists of all pairs $(\theta, \omega)$ with $\theta \in \Theta$ and $\omega \in \Omega$; then the compatibility relation $C$ is a subset of $\Theta \times \Omega$ consisting of all pairs $(\theta, \omega)$ for which $\theta$ is compatible with $\omega$, that is, $(\theta, \omega) \in C$ if $\theta C \omega$. In particular, if $\Theta$ and $\Omega$ are independent, then $C = \Theta \times \Omega$. Any compatibility relation $C$ over $\Theta \times \Omega$ can be represented as a multivalued mapping $\Gamma: \Theta \rightarrow 2^{\Omega}$ such that $\Gamma(\theta) = \{\omega \in \Omega : \theta C \omega\}$. Assume that $\Theta$ and $\Omega$ are two frames with a compatibility relation $C$ and associated multivalued mapping $\Gamma$. Let $Bel$ be a belief function defined on the frame $\Theta$. The extension of $Bel$ can be defined as follows: if $A_1, \ldots, A_n$ are the focal elements of $m$ on $\Theta$, where $m(A_i) = a_i$, then the focal elements of the extension are $B_i = \bigcup_{\theta \in A_i} (\{\theta\} \times \Gamma(\theta))$, and the mass of each $B_i$ is the sum of the $a_j$ over all $j$ such that $B_j = B_i$. There exist two special cases of the extension: (1) assume that we know $Bel$, a belief function defined on $\Theta \times \Omega$; the marginal belief function of $Bel$ on $\Theta$ (or $\Omega$) is defined as the extension of $Bel$ to $\Theta$ (or $\Omega$), denoted as $Bel^{\downarrow \Theta}$ (or $Bel^{\downarrow \Omega}$); (2) the extension of a belief function on the frame $\Theta$ to the frame $\Theta \times \Omega$. These two extensions are expressed as follows. (1) Let $Bel$ be a belief function on $\Theta$; then we have a multivalued mapping $\Gamma: \Theta \rightarrow 2^{\Theta \times \Omega}$ in which $\Gamma(\theta) = \{(\theta, \omega) : \theta C \omega\}$. Thus, we have a belief function on $\Theta \times \Omega$ whose focal elements are $B_i = \bigcup_{\theta \in A_i} \Gamma(\theta)$ with $m'(B_i) = m(A_i)$. (2) The marginal belief function on $\Theta$ (or $\Omega$): if $B_1, \ldots, B_n$ are focal elements of $m$ on $\Theta \times \Omega$ with $m(B_i) = b_i$, we get the focal elements of $m^{\downarrow \Theta}$ (or $m^{\downarrow \Omega}$) as $proj(B_i)$ with mass $b_i$ (masses of coinciding projections being added), where $proj(B)$ is the projection of $B$ onto $\Theta$ (or $\Omega$).
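A small sketch of this extension mechanism: a belief structure on $\Theta$ is carried to the product frame through the multivalued mapping $\Gamma$, and focal elements with equal images have their masses merged. The frames and the compatibility mapping below are illustrative assumptions.

```python
# Extend a bpa on Theta to Theta x Omega via a multivalued mapping Gamma:
# each focal element A becomes {(theta, omega): theta in A, omega in Gamma(theta)}.

def extend(bpa, gamma):
    extended = {}
    for A, mass in bpa.items():
        B = frozenset((t, w) for t in A for w in gamma[t])
        extended[B] = extended.get(B, 0.0) + mass  # merge coinciding images
    return extended

gamma = {'t1': {'w1'}, 't2': {'w1', 'w2'}}          # compatibility mapping
bpa = {frozenset({'t1'}): 0.7, frozenset({'t1', 't2'}): 0.3}
print(extend(bpa, gamma))
```

Setting $\Gamma(\theta) = \Omega$ for every $\theta$ yields the minimal extension discussed next: each focal element $A_i$ simply becomes $A_i \times \Omega$ with the same mass.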

Assume that $Bel$ is a belief function on $\Theta$, and let $Bel'$ be the extension of $Bel$ onto $\Theta \times \Omega$ when we know nothing about the answer on $\Omega$. In this case, we call $Bel'$ the minimal extension of $Bel$; in particular, we assume that every answer in $\Omega$ is compatible with any answer in $\Theta$; thus, the compatibility mapping is $\Gamma(\theta) = \Omega$ for every $\theta \in \Theta$. So, the extension of $Bel$ onto $\Theta \times \Omega$ has focal elements $A_i \times \Omega$ with $m'(A_i \times \Omega) = m(A_i)$, and it follows that $Bel'(A \times \Omega) = Bel(A)$ and $Pl'(A \times \Omega) = Pl(A)$ for every $A \subseteq \Theta$. Here $A \times \Omega$ is the set consisting of each element in $A$ coupled with each element in $\Omega$.

Assume that we have two independent sources of evidence on the location of a special element in $\Theta$, which have associated belief structures $m_1$ and $m_2$, respectively. The problem is to find a combined belief structure over $\Theta$ reflecting the "ANDing" of the two pieces of evidence. Let $m_1$ and $m_2$ be two belief structures on $\Theta$ with focal elements $A_1, \ldots, A_n$ and $B_1, \ldots, B_l$, respectively; then their combination, denoted by $m = m_1 \oplus m_2$, is a belief structure over $\Theta$ such that (1) $m(\emptyset) = 0$; (2) $m(A) = \frac{1}{1-k} \sum_{A_i \cap B_j = A} m_1(A_i) m_2(B_j)$ for $A \neq \emptyset$; (3) $k = \sum_{A_i \cap B_j = \emptyset} m_1(A_i) m_2(B_j)$. The focal elements of $m$ are all nonempty sets $A = A_i \cap B_j$ such that $m(A) > 0$. Because $\oplus$ satisfies commutativity and associativity, we have $m_1 \oplus m_2 \oplus \cdots \oplus m_n$ for belief structures $m_1, \ldots, m_n$ on $\Theta$. $1 - k$ indicates the normalization factor needed to assure $\sum_{A \subseteq \Theta} m(A) = 1$. If $k = 0$, then no normalization is required. If $k = 1$, then we cannot obtain $m$ based on the above method. In this case, the belief structures are completely conflicting, and we need another method for combining evidence [18, 20].
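The combination rule reads directly as code. The sketch below implements the normalization by $1 - k$ and rejects completely conflicting inputs ($k = 1$); the two belief structures in the example are illustrative assumptions.

```python
# Dempster's rule of combination for two belief structures on one frame.

def combine(m1, m2):
    combined, k = {}, 0.0
    for A, a in m1.items():
        for B, b in m2.items():
            C = A & B
            if C:
                combined[C] = combined.get(C, 0.0) + a * b
            else:
                k += a * b                    # conflict: empty intersection
    if k == 1.0:
        raise ValueError("completely conflicting belief structures")
    return {C: m / (1.0 - k) for C, m in combined.items()}

m1 = {frozenset({'a'}): 0.8, frozenset({'a', 'b'}): 0.2}
m2 = {frozenset({'a', 'b'}): 0.5, frozenset({'b'}): 0.5}
print(combine(m1, m2))   # conflict k = 0.4; remaining masses renormalized
```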

3. Probability Information Systems

Information systems, sometimes called data tables, attribute-value systems, condition-action tables, knowledge representation systems, and so forth, are used for representing knowledge and have been widely used in artificial intelligence [2]. Formally, a pair $(U, A)$ is called an information system, where $U$ is a nonempty, finite set called the universe and $A$ a nonempty, finite set of attributes, that is, $a: U \rightarrow V_a$ for every $a \in A$, where $V_a$ is called the value set of $a$. Elements of $U$ are called objects and are interpreted as cases, states, processes, patients, or observations. Attributes are interpreted as features, variables, characteristics, or conditions. As a special case of information systems, a decision table has the form $(U, A \cup \{d\})$, where $d \notin A$ is a distinguished attribute called the decision (attribute). The elements of $A$ are called condition attributes (conditions).

In an information system, the relation between an object and a value of an attribute is certain, that is, $a$ is a function, and either $a(x) = v$ or $a(x) \neq v$ is true. In real practice, however, the relation between objects and attribute values may be uncertain, that is, $a(x) = v$ holds with a degree of belief $p$. If $p = 1$, it means that $a(x) = v$. If $p = 0$, it means that $a(x) \neq v$. If $0 < p < 1$, it means that $a(x) = v$ with uncertainty. In this paper, we take the degree of belief to be a probability in $[0, 1]$.

Definition 3.1. A triple $(U, A, P)$ is called a probability information system, where $U$ is a nonempty, finite set called the universe, $A$ a nonempty, finite set of attributes, and $P = \{p_a : a \in A\}$ such that for each $a \in A$, $x \in U$, and $v \in V_a$, $p_a(x, v) = p$ means that object $x$ has value $v$ of attribute $a$ with probability $p$.

In Definition 3.1, for any fixed object $x$ and attribute $a$, $p_a(x, \cdot)$ is a probability density function on the value set $V_a$ of $a$. Hence, a probability information system can also be understood as an information system with a probability density function on the value set of each attribute for each object. An example of a probability information system is shown in Table 1, in which, for object 1 and solar energy, (high, 0.2) means that object 1 has high solar energy with probability 0.2. For object 1 and residual, (high, 0) means that object 1 does not have a high residual value. For object 3 and temperature, (low, 1) means that object 3 has a low temperature with certainty.
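One natural machine representation of Definition 3.1 stores, for every object and attribute, a probability density function over the attribute's value set. The attribute names echo the Table 1 example, but the concrete figures below are assumptions for illustration.

```python
# A probability information system as nested dictionaries:
# object -> attribute -> (value -> probability).
prob_info_system = {
    1: {"solar energy": {"high": 0.2, "low": 0.8},
        "residual":     {"high": 0.0, "low": 1.0}},
    3: {"temperature":  {"high": 0.0, "low": 1.0}},   # p(3, low) = 1
}

# Each entry must be a probability density function on the value set.
for x, row in prob_info_system.items():
    for attr, pdf in row.items():
        assert abs(sum(pdf.values()) - 1.0) < 1e-9
```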

In a probability information system, for any object $x$ and $a \in A$, we denote $V_a^x = \{v \in V_a : p_a(x, v) > 0\}$. If $|V_a^x| = 1$ ($|\cdot|$ is cardinality), then there exists a unique $v \in V_a$ such that $p_a(x, v) = 1$ due to $\sum_{v \in V_a} p_a(x, v) = 1$. In this case, the probability information system reduces to an information system, because for each object $x$ and $a \in A$, there exists only one $v$ such that $a(x) = v$. From this point of view, probability information systems are an extension of information systems. If $|V_a^x| > 1$, then the probability information system becomes an interval-valued information system when the probabilities are not considered; in such a case, for each object $x$ and $a \in A$, we have $a(x) \in V_a^x$. Hence, a probability information system of Definition 3.1 is a special case of an interval-valued information system. The difference between them is that an interval-valued information system records only $a(x) \in V_a^x$, whereas a probability information system additionally carries a probability density function on $V_a^x$.
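The set $V_a^x$ of positively weighted values is easy to compute, and its cardinality indicates which of the cases above applies (reusing prob_info_system from the previous sketch):

```python
# V_a^x: the values of attribute a that object x takes with positive probability.
def support(system, x, attr):
    return {v for v, p in system[x][attr].items() if p > 0}

print(support(prob_info_system, 3, "temperature"))   # {'low'}: classical case
print(support(prob_info_system, 1, "solar energy"))  # two values: uncertain case
```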

Based on probability information systems, probability decision tables have the form $(U, C \cup \{d\}, P)$, where $C$ is the set of condition attributes and $d$ is the decision attribute. Let $P = \{p_a : a \in C \cup \{d\}\}$ be such that for each $a \in C \cup \{d\}$, $x \in U$, and $v \in V_a$, $p_a(x, v)$ is the probability that $x$ has the value $v$ of $a$. For any object $x$ and $a \in C \cup \{d\}$, denote $V_a^x = \{v \in V_a : p_a(x, v) > 0\}$. If $|V_a^x| = 1$ for all $x$ and $a$, then the probability decision table reduces to a classical decision table. If $|V_a^x| > 1$ for some $x$ and $a$, then the probability decision table becomes an interval-valued decision table when the probabilities are not considered.

Let $U = \{x_1, \ldots, x_n\}$ and $C = \{c_1, \ldots, c_m\}$. For each $x_i \in U$ and the attribute value set $V_{c_j}$ of attribute $c_j$, denote by $p_{ij}$ a probability density function on $V_{c_j}$, and denote by $p_{id}$ a probability density function on the decision value set $V_d$. Formally, a probability decision table is shown in Table 2.

In Table 2, $p_{ij}(v)$ is the probability that "$x_i$ has the value $v$ of $c_j$", where $v \in V_{c_j}$, and for each $x_i$ and $c_j$, $\sum_{v \in V_{c_j}} p_{ij}(v) = 1$. For each $x_i$ and $c_j$, denote by $v_{ij}$ a value at which $p_{ij}$ attains its maximum, that is, $p_{ij}(v_{ij}) = \max_{v \in V_{c_j}} p_{ij}(v)$.

Then we obtain the reduced probability decision table shown in Table 3, which is called the maximum probability decision table in this paper.
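The reduction from Table 2 to Table 3 keeps, for every object and attribute, only the value of maximum probability. A minimal sketch, with an assumed two-object table for illustration:

```python
# Reduce a probability decision table to the maximum probability decision
# table: per object and attribute, keep the (value, probability) pair with
# the largest probability.
def max_probability_table(table):
    return {x: {attr: max(pdf.items(), key=lambda kv: kv[1])
                for attr, pdf in row.items()}
            for x, row in table.items()}

decision_table = {
    1: {"c1": {"high": 0.7, "low": 0.3},
        "c2": {"high": 0.4, "low": 0.6},
        "d":  {"yes": 0.9, "no": 0.1}},
    2: {"c1": {"high": 0.2, "low": 0.8},
        "c2": {"high": 0.5, "low": 0.5},
        "d":  {"yes": 0.3, "no": 0.7}},
}
print(max_probability_table(decision_table))
# Object 1 reduces to c1 = (high, 0.7), c2 = (low, 0.6), d = (yes, 0.9).
```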

4. Probability Decision Rules

In a decision table $(U, C \cup \{d\})$, decision rules can be extracted and formalized as $\varphi \rightarrow (d, w)$, where $\varphi$ is a formula generated by finitely many applications of the connectives $\vee$, $\wedge$, and $\neg$ to descriptors $(c, v)$ with $c \in C$ and $v \in V_c$. For more details about decision rules, readers can consult [2, 28–30]. In this section, we discuss some properties and the formalization of probability decision rules based on Table 3.

Firstly, we can use rough set theory to deal with the information of Table 3 without considering the probabilities; for simplicity, denote the value $v_{ij}$ of Table 3 by $c_j(x_i)$. Then the attributes of Table 3 can be represented as functions $c_j: U \rightarrow V_{c_j}$. We can define an equivalence relation on $U$ as follows: for $B \subseteq C$, $x R_B y$ if and only if $c(x) = c(y)$ for every $c \in B$. Then lower and upper approximations, reduction of attributes, decision rules, consistent decision tables, and so on can be discussed. However, in this paper, the problem that we need to deal with is how to get the probability of every decision rule. By using rough set theory, we can get decision rules of the form (4.5): $(c_1, v_1) \wedge \cdots \wedge (c_m, v_m) \rightarrow (d, w)$, where $v_j \in V_{c_j}$ and $w \in V_d$. Let $[x]_{c_j}$ be the equivalence class decided by $(c_j, v_j)$ and $[x]_d$ the equivalence class decided by $(d, w)$, and let $p_j$ be the probability of "$x$ has $v_j$ of $c_j$". Then we have $m$ independent sources of evidence over $U$, and the belief structure of each source of evidence is given by (4.6), with focal element $[x]_{c_j}$ and mass $p_j$. Based on (4.6), we can obtain a combined belief structure over $U$, that is, (4.7), as in [19]. For attribute $c_j$, $p_j$ is the maximum probability that $x$ has the value $v_j$ of $c_j$, so the combined mass in (4.8) is the maximum possibility among all combinations of attribute values. Similarly, we can get the probability of each $(d, w)$ as in (4.12), which is the maximum possibility on $V_d$.

According to (4.8) and (4.12), we can get a probability decision rule of the form (4.14): $(c_1, v_1, p_1) \wedge \cdots \wedge (c_m, v_m, p_m) \rightarrow (d, w, p)$, where $p_j$ is the probability of $(c_j, v_j)$ and $p$ is the probability of $(d, w)$. The decision rule (4.5) is a special case of the probability decision rule (4.14), namely the case in which every $p_j = 1$ and $p = 1$. Obviously, $0 < p_j \leq 1$ for every $j$.
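A sketch of how such a rule can be assembled from one row of the maximum probability decision table. Treating the condition attributes as independent sources of evidence, we attach the product of the per-attribute probabilities to the antecedent; this product form is our reading of the combination in (4.6)-(4.8) and should be checked against the original display equations.

```python
from math import prod

# Build a probability decision rule from a row of the maximum probability
# decision table: each entry is a (value, probability) pair.
def probability_rule(row):
    conditions = {a: vp for a, vp in row.items() if a != "d"}
    antecedent_p = prod(p for _, p in conditions.values())  # independence assumption
    return conditions, antecedent_p, row["d"]

row = {"c1": ("high", 0.7), "c2": ("low", 0.6), "d": ("yes", 0.9)}
print(probability_rule(row))
# Antecedent probability 0.7 * 0.6 (about 0.42); conclusion (yes, 0.9).
```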

Theorem 4.1. Let if , then .

Proof. By , we know that , , , so , by (4.8),

Theorem 4.2. Let be a new object, then if and only if .

Proof. According to (4.7), we know that . So, we have , and this means that if , then , that is, . The converse is obvious.

Corollary 4.3. If , then

According to the maximum possibility, Corollary 4.3 means that the more elements the antecedent set has, the greater the probability of the conclusion is. Intuitively, the elements of the antecedent reflect the degree of belief of a decision rule: the more elements there are, the greater the degree of belief of the decision rule is. Theorem 4.2 means that, sometimes, adding an element can make the degree of belief of a decision rule decrease. According to the condition of Theorem 4.2, one can see that the added value does not carry the maximal probability. From the probability point of view, such an element may not belong to the equivalence class; if it is forced in, then the degree of belief of the decision rule will decrease.

5. Consistency of a Probability Decision Rule

From the logic point of view, if there exists a valuation under which both the antecedent and the conclusion of an implication are true, then the implication is satisfiable; otherwise, it is not satisfiable. We discuss the consistency of a probability decision rule in a manner similar to the satisfiability of an implication.

Definition 5.1. In the probability decision rule (4.14), for a new object $x$, if the probability assignment of $x$ on each condition attribute and on the decision attribute is such that (5.1) holds, then there exists consistency between $x$ and the probability decision rule, and we also call the antecedents and conclusion of the probability decision rule consistent. If there does not exist such a probability assignment, then we call the antecedents and conclusion of the decision rule inconsistent.

By Theorem 4.2, we know that consistency holds if and only if (5.1) is satisfied. Obviously, if the antecedent and conclusion values are maximum probability assignments, respectively, that is, if they satisfy the condition of Corollary 4.3, then (5.1) is true. So, when maximum probability assignments are used to obtain a probability decision rule, its antecedents and conclusion are consistent; and for a new object $x$ and its probability assignments, if we use the maximum probability assignment to decide the class of $x$, then $x$ and the probability decision rule are consistent. However, sometimes the following case arises: for a new object $x$, according to the maximum probability assignment, we get each attribute value, but there does not exist a decision rule whose antecedents and conclusion match these values. In this case, using (5.2), we can choose a probability decision rule such that $x$ and the probability decision rule are consistent. If more than one such decision rule exists, then $x$ is included in the decision rule whose conclusion attains the maximum probability.
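The matching procedure described above can be sketched as follows: classify a new object by the maximum probability assignment on each condition attribute and look for a rule whose antecedent matches. The rules and the object below are illustrative assumptions.

```python
# Consistent classification by maximum probability assignment:
# pick each attribute's most probable value, then match rule antecedents.
rules = [
    ({"c1": "high", "c2": "low"}, "yes"),
    ({"c1": "low",  "c2": "low"}, "no"),
]
new_object = {"c1": {"high": 0.8, "low": 0.2},
              "c2": {"high": 0.3, "low": 0.7}}

assignment = {a: max(pdf, key=pdf.get) for a, pdf in new_object.items()}
matches = [d for antecedent, d in rules if antecedent == assignment]
print(assignment, matches)   # {'c1': 'high', 'c2': 'low'} ['yes']
```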

Corollary 5.2. A new object $x$ and decision rule (4.14) are consistent if and only if the probability assignments of $x$ satisfy (5.2).

Theorem 5.3. For a new object , let if and , then .

Proof. By , we know that On the other hand, By , get , so we have , that is, .

6. Inference of Probability Decision Rules

Inspired by the inference method of Zadeh [13], we provide an inference method for probability decision rules in this section, where the decision rule has the form (4.14). Assume that we have a new object $x$ as shown in Table 4; we infer a probability density function on the decision value set $V_d$ and decide the class of $x$ and its degree of belief. The inference process can be written as (6.1). In (6.1), the probability decision rule has the universes $V_{c_1}, \ldots, V_{c_m}$ for the antecedent and $V_d$ for the conclusion. The rule in (6.1) can be read as "if the degree of belief of $(c_1, v_1)$ is $p_1$ and ... and the degree of belief of $(c_m, v_m)$ is $p_m$, then the degree of belief of $(d, w)$ is $p$"; the input can be read as "the degrees of belief of the attribute values of $x$ are given by Table 4"; and the conclusion to be derived is "what are the degrees of belief of the decision attribute values of $x$." The single-condition probability decision rule has the form (6.2). In this paper, the domains of the antecedents and the conclusion are $V_{c_j}$ and $V_d$, respectively. We rewrite the probability density functions as (6.3). Then, (6.1) and (6.2) can be modified as (6.4) and (6.5).

The logical combinations of antecedent and conclusion are given by four operators, and from them we can get the relations (6.7)-(6.10) on the product of the antecedent and conclusion domains. The inference for (6.5) can be expressed as (6.11), where the operator $\circ$ is the composition defined there. According to (6.11), we can obtain the degrees of belief on $V_d$; in general, they may not sum to 1; however, every degree can be normalized as in (6.12), where the normalization factor is the sum of the degrees. By (6.12), we get the probability density function on $V_d$, (6.13). According to the maximum probability of (6.13), we can decide in which class $x$ is included and its degree of belief.
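Since the display equations (6.5)-(6.13) were not reproduced here, the following sketch only illustrates the overall shape of the method, assuming the operator $\circ$ in (6.11) is Zadeh's max-min composition: the input probability vector over the antecedent values is composed with a rule relation over antecedent and conclusion values, then renormalized as in (6.12). The relation and input below are illustrative assumptions.

```python
# Compositional inference (max-min), followed by the normalization step:
# x maps antecedent values to probabilities; R maps (v, w) pairs to degrees.
def infer(x, R):
    conclusions = {w for _, w in R}
    raw = {w: max(min(x[v], R[(v, w)]) for v in x) for w in conclusions}
    s = sum(raw.values())                       # normalization factor
    return {w: degree / s for w, degree in raw.items()}

R = {("high", "yes"): 0.9, ("high", "no"): 0.1,
     ("low",  "yes"): 0.2, ("low",  "no"): 0.8}
x = {"high": 0.7, "low": 0.3}
print(infer(x, R))   # decide the class of x by the maximum probability
```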

Theorem 6.1. In (6.11), if and are maximum probabilities, respectively, then is the maximum probability of (6.13).

Proof. By (6.6) and (6.11), we know that = . Obviously, and , so we have , that is, is the maximum probability of (6.13).

According to Corollary 4.3, Theorem 6.1 shows that the inference conclusion can make the new object and the decision rule consistent. For simplicity, in the rest of this section, we always assume that the probabilities involved are maximum probabilities.

Theorem 6.2. If , and , then is the maximum probability of (6.13).

Theorem 6.3. If is the maximum probability.

Theorem 6.4. If , then is the maximum probability of (6.13).

Proof. By (6.9) and (6.11), we know that = = = . When , by , we know that , so . , and . Obviously, , and is the maximum probability.

Remark 6.5. According to (5.2), if the antecedent and conclusion values are maximum probability assignments, respectively, then the probability decision rule is consistent. Theorems 6.1–6.4 show that we obtain the maximum probability. Hence, using these operators ensures the consistency of the decision rule.

Theorem 6.6. If , then , is the minimum probability of (6.13).

Proof. By (6.10) and (6.11), we know that = . By , we know that , so . = , so we have . For , by , we get , which is the minimal probability of (6.13).

Remark 6.7. According to (5.2), if the antecedent and conclusion values are maximum probability assignments, respectively, then the probability decision rule is consistent. If we use this operator, Theorem 6.6 shows that we obtain the minimal probability. Hence, using it cannot ensure the consistency of the decision rule.

7. Conclusion

In this paper, we provide a kind of probability information system and probability decision table to represent the degree of belief that "objects have attributes." Formally, this kind of probability information system is an extension of the classical information system and a special case of the interval-valued information system. Based on rough set theory, we discuss the extraction of probability decision rules. Then we analyze the consistency of probability decision rules. Finally, we provide a method to perform inference with probability decision rules.

Appendices

A. The Proof of Theorem 6.2

Proof. By (6.7) and (6.11), we know that = , where the index of shows the column of , when , denoted by . By , , we get . So, by for all, we have , and , so = and obviously, . So, we have = . By for all , we have = . By , , we have ,. Because for all , , so , , and , that is, is the maximum probability of (6.13).

B. The Proof of Theorem 6.3

Proof. By (6.7) and (6.11), we know that = , where the index of shows the column of , when , denoted by . By , , we get ,. So, . By for all , we have , and , so , and obviously, . So, we have ,. By for all , we have = . By , , we have ,. Because for all , , so , , and , that is, is the maximum probability of (6.13).

Acknowledgments

This work is partially supported by the Sichuan Key Laboratory of Intelligent Network Information Processing (SGXZD1002-10), the National Natural Science Foundation of China (61175055, 61105059), the Sichuan Key Technology Research and Development Program (2012GZ0019, 2011FZ0051), and Liaoning Excellent Talents in University (LJQ2011116).