Abstract

The task of finding all the minimal inconsistent subsets of a knowledge base plays a vital role in many theoretical works, especially on large knowledge bases, and it has been proved to be an NP-complete problem. In this work, we first propose a max-term counting based strategy for checking knowledge inconsistency. We then put forward an algorithm for finding all minimal inconsistent subsets, in which a Boolean lattice is established to organize the subsets of the given knowledge base and leaf pruning is used to improve efficiency. Comparative experiments and analysis show the algorithm's improvement over past approaches. Finally, we give an application to inconsistency measure calculation for fuzzy knowledge based systems.

1. Introduction

A large knowledge system operating for a long time almost inevitably becomes polluted by wrong data that make the system inconsistent. Despite this fact, a sizeable part of the system remains unpolluted and retains useful information. It is widely accepted that a maximal consistent subset of a system contains a significant portion of the unpolluted data [1]. Simply characterizing a knowledge base as either consistent or inconsistent is therefore of little practical value, and ensuring consistency becomes an important issue [24].

In practice, there are two types of methods: one is based on minimal inconsistent subsets, that is, inconsistent subsets in which every strict subset is consistent, and the other is directly based on maximal consistent subsets. Actually, the relationship between minimal inconsistent subsets and maximal consistent subsets was discovered independently in [1, 5, 6] and is known as the hitting set problem [7].

As finding minimal inconsistent subsets or maximal consistent subsets is NP-complete, the most efficient algorithm is not known yet, but a number of heuristic optimizations can substantially reduce the size of the search space. In practice, heuristic information [8, 9], optimization [10, 11], and hybrid techniques [12, 13] are recognized means of reducing time complexity. In the latest research, McAreavey et al. presented a computational approach to finding and measuring inconsistency in arbitrary knowledge bases [14], while Mu et al. gave a method for measuring the significance of inconsistency in the viewpoints framework [15]. In all the abovementioned works, efficiently finding minimal inconsistent subsets is the critical step, and it has a great impact on applications, especially for large knowledge bases. Its computational complexity depends on the underlying strategy used for checking the consistency of subsets of the knowledge base, but until now this important issue has not received a satisfying solution.

In this paper, we first propose an efficient strategy to check the consistency of a given knowledge base. We then put forward an algorithm to find all of the minimal inconsistent subsets of the given knowledge base. Thereafter, to illustrate the algorithm's improvement, we conduct thorough comparative experiments and analysis with respect to MARCO [16], one of the most recently proposed algorithms, and give a discussion of the related algorithms DAA [17] and PDDS [18]. Finally, we give an application to inconsistency measure calculation for fuzzy knowledge based systems.

2. Theoretical Basis

Let L denote the propositional language built from a finite set of variables V = {x1, x2, ..., xn} using the logical connectives {¬, ∧, ∨, →} and the logical constants {⊤, ⊥}. Every variable is called an atomic formula or an atom. A literal is an atom or its negation. A clause is a formula restricted to a disjunction of literals, and we let V(φ) denote the set of variables occurring in a clause φ. A knowledge base K is a finite set of arbitrary formulae.

As every formula can be converted into an equivalent conjunctive normal form (CNF) formula, a knowledge base can be normalized in such a way that every formula contained in it is a clause. For a given normalized knowledge base, if it contains no redundant clauses, we say it is an optimized knowledge base.

By the syntactic approach of proof theory, if both φ and ¬φ can be derived from a knowledge base K, then we say K is inconsistent. In the semantic approach of model theory, an interpretation or world ω is a function from V to the set of Boolean values {0, 1}. Let Ω denote the set of worlds of L. A world ω is a model of a formula φ, denoted as ω ⊨ φ, iff φ is true under ω in the classical truth-functional manner. Let Mod(φ) denote the set of models of φ; that is, Mod(φ) = {ω ∈ Ω : ω ⊨ φ}. We say that φ is satisfiable iff there exists a model of φ. Conversely, φ is unsatisfiable iff there are no models of φ. These two approaches coincide in propositional logic; that is, a knowledge base K is consistent iff the conjunction of its formulae is satisfiable.

In the following discussion, Greek lower case letters (φ, ψ, ...) denote formulae from L and English lower case letters (x, y, ...) denote variables from V.

Definition 1. For a Boolean function of n variables x1, ..., xn, a sum term in which each of the n variables appears exactly once (in either its complemented or uncomplemented form) is called a max-term.

Proposition 2. Let x1, ..., xn be variables and M1, M2, ..., M_{2^n} the 2^n different max-terms built on these variables. Then M1 ∧ M2 ∧ ... ∧ M_{2^n} ≡ ⊥.

For example, let φ1 = x1 ∨ x2, φ2 = x1 ∨ ¬x2, φ3 = ¬x1 ∨ x2, and φ4 = ¬x1 ∨ ¬x2 be the four max-terms built on {x1, x2}. Then it is trivial to show that φ1 ∧ φ2 ∧ φ3 ∧ φ4 ≡ ⊥, which means no assignment to x1 and x2 satisfies φ1, φ2, φ3, and φ4 simultaneously.
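For illustration, the following short Python sketch (our own encoding, not part of the original development: a clause is a frozenset of (variable, polarity) pairs) enumerates all max-terms over a variable set and confirms Proposition 2 on the two-variable example above.

from itertools import product

def max_terms(variables):
    # All 2^n max-terms; each is a frozenset of (variable, polarity) literals.
    return [frozenset(zip(variables, polarity))
            for polarity in product([True, False], repeat=len(variables))]

def clause_satisfied(clause, assignment):
    # A disjunction of literals holds if at least one literal matches.
    return any(assignment[v] == pol for v, pol in clause)

variables = ["x1", "x2"]
terms = max_terms(variables)    # x1 ∨ x2, x1 ∨ ¬x2, ¬x1 ∨ x2, ¬x1 ∨ ¬x2
for values in product([True, False], repeat=len(variables)):
    assignment = dict(zip(variables, values))
    # every assignment falsifies at least one max-term, so the conjunction is ⊥
    assert not all(clause_satisfied(t, assignment) for t in terms)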

Definition 3. Let φ be a clause built on a set of variables V = {x1, ..., xn}. Then we call E(φ) the extension of φ; that is, E(φ) = {φ ∨ l1 ∨ l2 ∨ ... ∨ lk : li ∈ {yi, ¬yi}, i = 1, ..., k}, where {y1, ..., yk} = V \ V(φ). In other words, E(φ) is the set of all max-terms over V that contain every literal of φ, so |E(φ)| = 2^(n−|V(φ)|).

For example, let φ = x1 be a formula built on {x1, x2}. Then we have V \ V(φ) = {x2} and E(φ) = {x1 ∨ x2, x1 ∨ ¬x2}. Actually, a simple manipulation leads to φ ≡ (x1 ∨ x2) ∧ (x1 ∨ ¬x2).

Remark 4. It is easy to see that carrying out the extension of φ does not change the original meaning of φ: the conjunction of the max-terms in E(φ) is logically equivalent to φ.
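As an illustration, the extension operation of Definition 3 takes only a few lines of Python under the same clause encoding as above (the helper name extend is ours).

from itertools import product

def extend(clause, variables):
    # E(φ): all max-terms over `variables` that contain every literal of φ.
    missing = [v for v in variables if v not in {var for var, _ in clause}]
    return {clause | frozenset(zip(missing, pol))
            for pol in product([True, False], repeat=len(missing))}

phi = frozenset({("x1", True)})          # the clause x1
print(extend(phi, ["x1", "x2"]))         # {x1 ∨ x2, x1 ∨ ¬x2}, so |E(φ)| = 2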

Theorem 5. Let φ1 and φ2 be two clauses built on a set of variables V = {x1, ..., xn}, in which V(φ1) ⊆ V and V(φ2) ⊆ V. Then the following propositions hold: (i) if there exists a variable x such that one of x and ¬x appears in φ1 and the other one appears in φ2, then E(φ1) ∩ E(φ2) = ∅; (ii) otherwise, |E(φ1) ∩ E(φ2)| = 2^(n−|V(φ1) ∪ V(φ2)|).

Proof. (i) As there exists a variable x such that one of x and ¬x appears in φ1 and the other one appears in φ2, one of x and ¬x must appear in every max-term of E(φ1) and the other one in every max-term of E(φ2), which makes every max-term of E(φ1) differ from every max-term of E(φ2). Therefore we have E(φ1) ∩ E(φ2) = ∅.
(ii) If there does not exist a variable x such that one of x and ¬x appears in φ1 and the other one appears in φ2, then there are two situations that need to be surveyed, respectively.
Situation 1. If V(φ1) = V(φ2), then φ1 = φ2, so the max-terms common to E(φ1) and E(φ2) are exactly those of E(φ1). Hence, we have E(φ1) ∩ E(φ2) = E(φ1), which is equivalent to |E(φ1) ∩ E(φ2)| = 2^(n−|V(φ1)|) = 2^(n−|V(φ1) ∪ V(φ2)|).
Situation 2. If V(φ1) ≠ V(φ2), then the max-terms common to E(φ1) and E(φ2) are exactly those extending the merged clause φ = φ1 ∨ φ2, where V(φ) = V(φ1) ∪ V(φ2). Hence, we have |E(φ1) ∩ E(φ2)| = |E(φ)| = 2^(n−|V(φ1) ∪ V(φ2)|).
Therefore, combining the above two situations, the theorem is proved.

Corollary 6. Let K = {φ1, ..., φm} be a set of clauses built on a set of variables V = {x1, ..., xn}. Then the following propositions hold: (1) if there exist a variable x and two clauses φi and φj such that one of x and ¬x appears in φi and the other one appears in φj, then E(φ1) ∩ E(φ2) ∩ ... ∩ E(φm) = ∅; (2) otherwise, |E(φ1) ∩ E(φ2) ∩ ... ∩ E(φm)| = 2^(n−|V(φ1) ∪ V(φ2) ∪ ... ∪ V(φm)|).
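The counting claims in Theorem 5(ii) and Corollary 6(2) are easy to check numerically. The snippet below is a small sketch under the same clause encoding as before, reusing the extend helper from the sketch above; the example clauses are ours.

from itertools import product

def extend(clause, variables):   # as in the previous sketch
    missing = [v for v in variables if v not in {var for var, _ in clause}]
    return {clause | frozenset(zip(missing, pol))
            for pol in product([True, False], repeat=len(missing))}

V = ["x1", "x2", "x3"]
phi1 = frozenset({("x1", True)})                    # x1
phi2 = frozenset({("x1", True), ("x2", False)})     # x1 ∨ ¬x2 (no clash with phi1)
common = extend(phi1, V) & extend(phi2, V)
assert len(common) == 2 ** (len(V) - 2)             # 2^(3−|V(φ1) ∪ V(φ2)|) = 2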

Theorem 7. Let K = {φ1, ..., φm} be a set of clauses built on a set of variables V = {x1, ..., xn}. Then, after carrying out the extensions of φ1, ..., φm, respectively, and using N to denote the number of different max-terms obtained, we have N = |E(φ1) ∪ E(φ2) ∪ ... ∪ E(φm)| ≤ 2^n. Moreover, if N = 2^n, then K is inconsistent; otherwise K is consistent.

Proof. According to the inclusion-exclusion principle and in light of Proposition 2, the proof is trivial: by Remark 4, K is equivalent to the conjunction of the max-terms in E(φ1) ∪ ... ∪ E(φm), and a conjunction of max-terms is unsatisfiable iff all 2^n max-terms are present, since a missing max-term corresponds to an assignment satisfying all the others.

For example, let K = {x1, ¬x1 ∨ x2} be a set of formulas defined on {x1, x2}. According to Theorem 7, we have N = |{x1 ∨ x2, x1 ∨ ¬x2, ¬x1 ∨ x2}| = 3 < 2^2, and thus K is consistent.
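Putting the pieces together, the Theorem 7 test amounts to counting the distinct max-terms produced by the extensions. The following is a minimal sketch (the function name is_consistent is ours) applied to the example just given.

from itertools import product

def extend(clause, variables):   # as in the previous sketches
    missing = [v for v in variables if v not in {var for var, _ in clause}]
    return {clause | frozenset(zip(missing, pol))
            for pol in product([True, False], repeat=len(missing))}

def is_consistent(clauses, variables):
    # K is consistent iff N = |E(φ1) ∪ ... ∪ E(φm)| < 2^n (Theorem 7).
    covered = set().union(*(extend(c, variables) for c in clauses))
    return len(covered) < 2 ** len(variables)

K = [frozenset({("x1", True)}),                     # x1
     frozenset({("x1", False), ("x2", True)})]      # ¬x1 ∨ x2
print(is_consistent(K, ["x1", "x2"]))               # True: N = 3 < 4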

3. An Algorithm for Finding All Minimal Inconsistent Subsets

In this section, we first propose an algorithm for finding all minimal inconsistent subsets via a Boolean lattice. We then give an illustrative example and a thorough comparative study against the algorithm MARCO, using the number of visited subsets as the benchmark. Besides this, we also give a discussion of the related algorithms DAA and PDDS.

3.1. Algorithm

An algorithm that finds the minimal inconsistent subsets of a given knowledge system must check subsets of the system for inconsistency. One way to proceed is to construct a Boolean lattice of the subsets of the given knowledge system, a structure initially used by Bird and Hinze in the process of finding maximal consistent subsets [19].

A lattice L is called a Boolean lattice if (i) L is distributive, (ii) L has a least element 0 and a greatest element 1, and (iii) each a ∈ L has a complement a′.

Figure 1 sketches the Boolean lattice for a three-formula knowledge base, where the labels of the nodes constitute the power set of the set {φ1, φ2, φ3}.

In Algorithm 1, such a Boolean lattice is established and leaf pruning is adopted to improve efficiency. Because the cardinality of the subsets at each level is smaller than that of the subsets on the level above it, a breadth-first search of the lattice considers all smaller sets before any larger ones. The leaf pruning strategy is based on the following facts: if a node denotes a minimal inconsistent subset, then all of its ancestors are inconsistent (and hence not minimal); dually, if a node denotes a consistent subset, then all of its descendants are consistent.

Input: a knowledge base
Output: all the minimal inconsistent subsets MIS of the knowledge base
Begin
(1) normalize the knowledge base to ensure that every formula contained in it is a clause,
   and denote it as K = {φ1, φ2, ..., φm};
   initialize MIS to be an empty set of sets;
(2) build a Boolean lattice on the subsets of K, with each node denoting a set of formulas
   (if a node denotes a k-element formula set, then we call it a k-degree node), and give each node an unmarked flag;
(3) set up an empty list List, and put all of the 2-degree nodes into List;
   while List is not empty do
    Begin
     fetch the head Head from List;
     if Head is inconsistent then  //by using Theorem 7
      Begin
       put Head into MIS;
       mark all the ancestors of Head;
      End
     else  //Head is consistent
      Begin
       insert all of the unmarked upper neighbors of Head at the front of List;
       mark all the descendants of Head, and if they exist in List, then remove them from List;
      End
    End
(4) return MIS.
End.

According to Theorem 7, directly computing N for every node is time consuming, so we store the intermediate calculation results. For example, in Algorithm 1, for each visited node whose corresponding formula set is {φi1, ..., φik}, we store the value of E(φi1) ∪ ... ∪ E(φik). Apparently, when computing the value of a (k+1)-degree node, the stored value of a k-degree node below it can be reused to save time cost.
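The following Python sketch is one possible rendering of Algorithm 1 with the caching just described (our encoding and naming throughout, not the paper's implementation): each clause's extension set is computed once, node covers are obtained by union, and visited or pruned nodes are kept in a marked set.

from itertools import combinations, product

def extend(clause, variables):   # as in the earlier sketches
    missing = [v for v in variables if v not in {var for var, _ in clause}]
    return {clause | frozenset(zip(missing, pol))
            for pol in product([True, False], repeat=len(missing))}

def minimal_inconsistent_subsets(K, variables):
    full = 2 ** len(variables)
    ext = {c: extend(c, variables) for c in K}         # cached per clause
    mis, marked = [], set()
    todo = [frozenset(p) for p in combinations(K, 2)]  # all 2-degree nodes
    while todo:
        head = todo.pop(0)
        if head in marked:
            continue
        covered = set().union(*(ext[c] for c in head)) # Theorem 7, cached sets
        if len(covered) == full:                       # Head is inconsistent
            mis.append(head)
            marked.add(head)                           # never revisit Head
            rest = set(K) - head                       # mark all ancestors
            marked.update(head | frozenset(s)
                          for r in range(1, len(rest) + 1)
                          for s in combinations(rest, r))
        else:                                          # Head is consistent
            uppers = [head | {c} for c in set(K) - head]
            todo = [u for u in uppers if u not in marked] + \
                   [t for t in todo if not t < head]   # leaf pruning
            marked.update(frozenset(s)                 # mark all descendants
                          for r in range(1, len(head))
                          for s in combinations(head, r))
    return mis

K = [frozenset({("a", True)}),                # φ1 = a
     frozenset({("a", False)}),               # φ2 = ¬a
     frozenset({("a", False), ("b", True)}),  # φ3 = ¬a ∨ b
     frozenset({("b", False)})]               # φ4 = ¬b
print(minimal_inconsistent_subsets(K, ["a", "b"]))   # [{φ1, φ2}, {φ1, φ3, φ4}]

On these four clauses, the sketch performs exactly the six consistency checks traced in Example 8 below.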

In the minimal inconsistent subsets finding algorithm proposed in [14], which derives them indirectly from maximal consistent subsets, there exists a disadvantage: while computing maximal consistent subsets, pseudo-maximal consistent subsets are generated [11, 15]. As our proposed algorithm always checks smaller sets before larger ones, it overcomes this problem.

If the maximal cost of checking the inconsistency of a single subset is C, the complexity of this algorithm is O(2^m · C) in the worst case, since a knowledge base of m clauses has 2^m subsets. In the following, by experiment, we show the relationship between the number of subsets checked for inconsistency and the size of the knowledge base with respect to different probabilities that two formulas are consistent.

In the experiment, we use the generator GENBAL [20] to generate knowledge bases. The graphs in Figures 2 and 3 show the number of subsets that were checked for inconsistency in relation to m, the number of clauses contained in the given normalized knowledge base, and p, the probability that two formulas are consistent. All counts are averaged across 100 knowledge bases randomly generated by GENBAL.

From Figures 2 and 3, we can see that larger values of p mean that more subsets are checked. Moreover, it is easy to see that larger values of p generally also lead to fewer and smaller minimal inconsistent subsets.

3.2. An Illustrative Example and Comparative Study

Apart from our proposed method, there are many other solvers for computing minimal inconsistent subsets. One of the most recently published algorithms is MARCO [16], which adopts the most recent advances. We first give an illustrative example and then compare our method with MARCO.

Example 8. Considering the set of formulas φ1 = a, φ2 = ¬a, φ3 = ¬a ∨ b, and φ4 = ¬b, which is used in [16], we use Algorithm 1 to find all the minimal inconsistent subsets.
At first we establish the Boolean lattice over {φ1, φ2, φ3, φ4}, which is shown in Figure 4.
Then we establish a list List and initialize it as List = [{φ1, φ2}, {φ1, φ3}, {φ1, φ4}, {φ2, φ3}, {φ2, φ4}, {φ3, φ4}].

Fetch the head from List; that is, Head = {φ1, φ2}. According to Theorem 7, {φ1, φ2} is inconsistent; put it into MIS and mark its ancestors {φ1, φ2, φ3}, {φ1, φ2, φ4}, and {φ1, φ2, φ3, φ4}.

Fetch the head from List; that is, Head = {φ1, φ3}. According to Theorem 7, {φ1, φ3} is consistent; insert its unmarked upper neighbor {φ1, φ3, φ4} at the front of List, and mark its descendants {φ1} and {φ3}.

Fetch the head from List; that is, Head = {φ1, φ3, φ4}. According to Theorem 7, {φ1, φ3, φ4} is inconsistent; put it into MIS (its only ancestor {φ1, φ2, φ3, φ4} is already marked).

Fetch the head from List; that is, Head = {φ1, φ4}. According to Theorem 7, {φ1, φ4} is consistent; it has no unmarked upper neighbors, so we only mark its unmarked descendant {φ4}.

Fetch the head from List; that is, Head = {φ2, φ3}. According to Theorem 7, {φ2, φ3} is consistent; insert its unmarked upper neighbor {φ2, φ3, φ4} at the front of List, and mark its unmarked descendant {φ2}.

Fetch the head from List; that is, Head = {φ2, φ3, φ4}. According to Theorem 7, {φ2, φ3, φ4} is consistent; mark its unmarked descendants {φ2, φ4} and {φ3, φ4} and remove them from List.

At this point, List is empty; the algorithm terminates with results {φ1, φ2} and {φ1, φ3, φ4}.

It is apparent that in order to get all the minimal inconsistent subsets we have to judge the consistency of only 6 sets, which are {φ1, φ2}, {φ1, φ3}, {φ1, φ3, φ4}, {φ1, φ4}, {φ2, φ3}, and {φ2, φ3, φ4}.

Fundamentally, the MARCO algorithm operates by repeatedly
(i) selecting an unexplored point in the power set lattice, a subset of K that we call a seed,
(ii) checking the satisfiability of the seed,
(iii) growing or shrinking it to an MSS or a MUS as appropriate, and
(iv) marking a corresponding region of the lattice as explored.
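To make the loop concrete, here is a deliberately simplified, brute-force rendering of MARCO (ours, not the authors' implementation: the real algorithm keeps the explored region in a SAT-based "map" solver and uses optimized grow/shrink subroutines, while this sketch enumerates the power set and reuses the Theorem 7 test as its satisfiability oracle).

from itertools import chain, combinations, product

def extend(clause, variables):   # as in the earlier sketches
    missing = [v for v in variables if v not in {var for var, _ in clause}]
    return {clause | frozenset(zip(missing, pol))
            for pol in product([True, False], repeat=len(missing))}

def consistent(node, variables):
    if not node:
        return True
    covered = set().union(*(extend(c, variables) for c in node))
    return len(covered) < 2 ** len(variables)

def marco(K, variables):
    unexplored = {frozenset(s) for s in chain.from_iterable(
        combinations(K, r) for r in range(len(K) + 1))}
    muses = []
    while unexplored:
        seed = next(iter(unexplored))                 # GetUnexplored()
        if consistent(seed, variables):
            mss = seed                                # grow seed to an MSS
            for c in set(K) - seed:
                if consistent(mss | {c}, variables):
                    mss |= {c}
            unexplored -= {s for s in unexplored if s <= mss}   # block down
        else:
            mus = seed                                # shrink seed to a MUS
            for c in set(seed):
                if not consistent(mus - {c}, variables):
                    mus -= {c}
            muses.append(mus)
            unexplored -= {s for s in unexplored if s >= mus}   # block up
    return muses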

When algorithm MARCO is used on the same example, the consistency of 10 sets needs to be checked one by one. So, our algorithm performs better than MARCO with respect to the number of visited sets.

The difference between our algorithm and MARCO is that our algorithm traverses the Boolean lattice incrementally according to the cardinalities of the sets, while MARCO traverses the Boolean lattice randomly, as the function GetUnexplored() used in MARCO returns an arbitrary unexplored set. As the objective of both algorithms is to find all of the minimal inconsistent sets, the incremental feature of our algorithm brings higher efficiency.

In the comparative study, we also use the generator GENBAL [20] to generate knowledge bases. The number of visited subsets is used as the benchmark, as it is more objective than other benchmarks. For example, we do not choose CPU time as the benchmark, because it is strongly affected by the running environment, including the status of both hardware and software.

The graphs in Figures 5, 6, and 7 show the number of subsets that were checked for inconsistency in relation to m, the number of clauses contained in the given normalized knowledge base, and p, the probability that two formulas are consistent. All counts are averaged across 100 knowledge bases randomly generated by GENBAL. Figures 5, 6, and 7 all show that our algorithm performs better than MARCO with respect to the number of visited sets.

DAA is another algorithm that exploits the hitting set duality between minimal correction sets (MCSes) and minimal unsatisfiable subsets (MUSes) [17]. DAA uses the Grow subroutine on known-satisfiable subsets to produce maximal satisfiable subsets (MSSes) and their complementary MCSes and then computes minimal hitting sets of the MCSes found thus far. PDDS, an approach closely related to DAA, was proposed later [18]. The main differences are that PDDS takes an initial set of either MUSes or MCSes as input and that PDDS does not necessarily compute all hitting sets of the MCSes at each iteration, avoiding the memory scaling issues of DAA.

The DAA and PDDS algorithms have the benefit that they are decoupled from the choice of hitting set algorithm. It has been pointed out that choosing the incremental algorithm presented by Fredman and Khachiyan [21] for computing hitting sets results in a version of the DAA algorithm whose worst-case runtime is subexponential in the size of the output [22]. Studies have also shown that MARCO performs better than DAA and PDDS [16].

4. Inconsistency Measure Calculation for Fuzzy Knowledge Based Systems

In this section, we show an application of Algorithm 1 for inconsistency measure calculation of fuzzy knowledge based systems.

Fuzzy knowledge based systems are typical rule-based inference systems for providing expertise over a domain and are capable of drawing conclusions from given uncertain evidence [23]. In fuzzy knowledge based systems, knowledge is represented by using possibilistic logic.

Let B = {(φ1, α1), (φ2, α2), ..., (φm, αm)} denote a fuzzy knowledge based system, in which φ1, ..., φm are classical propositional logic formulas and α1, ..., αm ∈ (0, 1] are their possibility measures.

Definition 9. Let B be a fuzzy knowledge based system. If B contains two formulas (φ, α1) and (φ, α2) with α1 ≥ α2, then we call (φ, α1) the possibility based deduction result of (φ, α1) and (φ, α2); that is, the two occurrences are merged and the larger possibility measure is kept.

Definition 10. Let (φ, α) be a possibility formula built on a set of variables V = {x1, ..., xn}. Then we call E((φ, α)) the extension of (φ, α); that is, E((φ, α)) = {(ψ, α) : ψ ∈ E(φ)}, where E(φ) is the extension of the classical clause φ given in Definition 3; each resulting max-term inherits the possibility measure α.

Definition 11. Let B = {(φ1, α1), ..., (φm, αm)} be a fuzzy knowledge based system. Then we call B↓ = {φ1, ..., φm} the projection of B onto the classical knowledge base.

Definition 12 (see [24]). Let B be a fuzzy knowledge based system. If B↓ is inconsistent, then its inconsistency measure is defined as Inc(B) = max{α : (B≥α)↓ is inconsistent}, where B≥α = {(φi, αi) ∈ B : αi ≥ α} is the α-cut of B.

Theorem 13. Let B be a fuzzy knowledge based system built on a set of variables V = {x1, ..., xn}. After the extensions and the possibility based deduction of B are performed, B* is derived. If |B*| = 2^n, then B is inconsistent, and its inconsistency measure is Inc(B) = min{α : (M, α) ∈ B*}.

Proof. As |B*| = 2^n and B* is the result of the extensions and possibility based deduction of B, the projection (B*)↓ contains all 2^n max-terms, so by Theorem 7 we know that B is inconsistent. Let α* = min{α : (M, α) ∈ B*}. For every α ≤ α*, (B≥α)↓ still covers all 2^n max-terms and is therefore inconsistent, while for every α > α* at least one max-term is missing and (B≥α)↓ becomes consistent. Hence Inc(B) = α*, and the theorem is proved.

Example 14. Let B = {(x1, 0.8), (¬x1 ∨ x2, 0.6), (¬x2, 0.3)} be a fuzzy knowledge based system built on {x1, x2}.
After carrying out the extensions and possibility based deduction of B, we get B* = {(x1 ∨ x2, 0.8), (x1 ∨ ¬x2, 0.8), (¬x1 ∨ x2, 0.6), (¬x1 ∨ ¬x2, 0.3)}. As |B*| = 4 = 2^2, according to Theorem 13, we know that B is inconsistent and its inconsistency measure is 0.3.
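Under the same clause encoding as before, the Theorem 13 computation can be sketched as follows (the function name inconsistency_measure is ours): extensions carry their possibility measures, duplicate max-terms keep the larger measure (Definition 9), and the measure of an inconsistent system is the smallest degree once all 2^n max-terms are covered.

from itertools import product

def extend(clause, variables):   # as in the earlier sketches
    missing = [v for v in variables if v not in {var for var, _ in clause}]
    return {clause | frozenset(zip(missing, pol))
            for pol in product([True, False], repeat=len(missing))}

def inconsistency_measure(B, variables):
    degree = {}                                  # max-term -> highest degree
    for clause, alpha in B:
        for term in extend(clause, variables):
            degree[term] = max(degree.get(term, 0.0), alpha)
    if len(degree) < 2 ** len(variables):        # some max-term missing:
        return 0.0                               # the projection is consistent
    return min(degree.values())                  # Inc(B), by Theorem 13

B = [(frozenset({("x1", True)}), 0.8),                  # (x1, 0.8)
     (frozenset({("x1", False), ("x2", True)}), 0.6),   # (¬x1 ∨ x2, 0.6)
     (frozenset({("x2", False)}), 0.3)]                 # (¬x2, 0.3)
print(inconsistency_measure(B, ["x1", "x2"]))           # 0.3, as in Example 14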

5. Conclusions

The purpose of this paper is to find all the minimal inconsistent subsets of a given knowledge system. We first propose a max-term counting based knowledge inconsistency checking strategy. We then put forward an algorithm for finding all minimal inconsistent subsets, in which a Boolean lattice is established to organize the subsets of the given knowledge base and leaf pruning is used to improve efficiency. Finally, we give a method for inconsistency measure calculation of fuzzy knowledge based systems.

As a fuzzy knowledge based system may contain several statements in contradiction with each other, how to measure the significance of the inconsistency is a valuable problem for further study.

Conflict of Interests

The author declares that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The work presented in this paper is supported by the Doctoral Foundation of Henan Polytechnic University (B2011-102). The author also gratefully acknowledges the helpful comments and suggestions of the reviewers, which have greatly improved the presentation.