Abstract

As an important tool for data analysis and knowledge processing, formal concept analysis (FCA) has been applied in many fields. In this paper, we introduce a new method for finding all formal concepts of a formal context. The method reduces the number of intent computations required, and a corresponding algorithm is proposed. The main theorems and the algorithm are illustrated with examples. Finally, several real-life databases are analyzed to demonstrate the application of the proposed approach. Experimental results show that the proposed approach is simple and effective.

1. Introduction

Formal concept analysis (FCA), proposed by Wille in 1982 [1], is a field of applied mathematics based on the mathematization of concept and conceptual hierarchy. It thereby activates mathematical thinking for conceptual data analysis and knowledge processing. FCA starts with a formal context defined as a triple containing an object set, an attribute (property) set, and a binary relation between the object set and the attribute set. A formal concept is a pair (object subset, attribute subset) induced by the binary relation, and a concept lattice is an ordered hierarchical structure of formal concepts. A formal context in FCA corresponds to a special information system with input data being two-valued in rough set theory [2].

Most research on FCA concentrates on the following topics: construction and pruning algorithms for concept lattices [3, 4]; the relationship between FCA and rough sets [5–10]; acquisition of rules [11, 12]; and reduction of concept lattices [6, 10, 13]. FCA has also proved useful in many fields, such as the organization of web search results into a hierarchical structure of concepts based on common topics [14], information retrieval [15, 16], hierarchical analysis of software code [17–19], visualization in software engineering [19, 20], detecting suspects in human trafficking [21], analysis of questionnaire data [22], and mining gene expression data [23]. Further references to applications of FCA can be found in [14, 24].

Formal concepts are the central notion of FCA, and intents and extents are their constituent parts. The set of intents (extents) is isomorphic to the corresponding concept lattice under the order relation “$\supseteq$” (“$\subseteq$”). So, once the set of intents is determined, the corresponding concept lattice is identified, and obtaining all intents or extents is therefore a key task. The basic way to obtain all intents or extents is directly from their definitions. If there are $n$ objects, then $2^n$ computations are needed to obtain all intents, which is very expensive. To address this problem, we give a new method to obtain all intents, from which the formal concepts are then determined.

This paper is organized as follows. In Section 2, we briefly review some basic notions related to FCA. In Section 3, a novel concept acquisition approach is introduced and some related conclusions are given. In Section 4, the corresponding algorithm is proposed and experimental results are shown to illustrate the validity of our method. Finally, conclusions are drawn in Section 5.

2. Preliminaries

In this section, we recall some basic notions and properties in FCA.

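Definition 1 (see [24]). A formal context $(G, M, I)$ consists of two sets $G$ and $M$ and a relation $I$ between $G$ and $M$. The elements of $G$ are called the objects and the elements of $M$ are called the attributes of the context. In order to express that an object $g$ is in a relation $I$ with an attribute $m$, we write $gIm$ or $(g, m) \in I$ and read it as “the object $g$ has the attribute $m$.”
With respect to a formal context $(G, M, I)$, Ganter and Wille [24] defined a pair of dual operators for any $X \subseteq G$ and $B \subseteq M$ by
$$X' = \{m \in M \mid gIm \text{ for all } g \in X\}, \qquad B' = \{g \in G \mid gIm \text{ for all } m \in B\}.$$
A formal context $(G, M, I)$ is called canonical if $\forall g \in G$, $g' \neq \emptyset$, $g' \neq M$, and $\forall m \in M$, $m' \neq \emptyset$, $m' \neq G$, where $g'$ and $m'$ abbreviate $\{g\}'$ and $\{m\}'$. We assume that all the formal contexts we study in the sequel are finite and canonical.
Let $(G, M, I)$ be a formal context. $\forall X_1, X_2, X \subseteq G$, $\forall B_1, B_2, B \subseteq M$; the following properties hold.
(1) $X_1 \subseteq X_2 \Rightarrow X_2' \subseteq X_1'$, $B_1 \subseteq B_2 \Rightarrow B_2' \subseteq B_1'$.
(2) $X \subseteq X''$, $B \subseteq B''$.
(3) $X' = X'''$, $B' = B'''$.
(4) $X \subseteq B' \Leftrightarrow B \subseteq X'$.
(5) $(X_1 \cup X_2)' = X_1' \cap X_2'$, $(B_1 \cup B_2)' = B_1' \cap B_2'$.
(6) $(X_1 \cap X_2)' \supseteq X_1' \cup X_2'$, $(B_1 \cap B_2)' \supseteq B_1' \cup B_2'$.
If $X' = B$ and $B' = X$, then $(X, B)$ is called a formal concept, where $X$ is called the extent of the formal concept and $B$ is called the intent of the formal concept. For any $g \in G$, the pair $(g'', g')$ is a formal concept and is called an object concept. Similarly, for any $m \in M$, the pair $(m', m'')$ is a formal concept and is called an attribute concept. The family of all formal concepts of $(G, M, I)$ forms a complete lattice that is called the concept lattice and is denoted by $L(G, M, I)$. For any $(X_1, B_1), (X_2, B_2) \in L(G, M, I)$, the partial order is defined by
$$(X_1, B_1) \leq (X_2, B_2) \Leftrightarrow X_1 \subseteq X_2 \ (\Leftrightarrow B_1 \supseteq B_2).$$
The infimum and supremum of $(X_1, B_1)$ and $(X_2, B_2)$ are defined by
$$(X_1, B_1) \wedge (X_2, B_2) = (X_1 \cap X_2, (B_1 \cup B_2)''), \qquad (X_1, B_1) \vee (X_2, B_2) = ((X_1 \cup X_2)'', B_1 \cap B_2),$$
respectively.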
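As an illustration of the dual operators and of the defining condition of a formal concept, the following minimal Python sketch may help. The toy context `CTX`, the attribute set `ATTRS`, and the function names `up`, `down`, and `is_concept` are hypothetical and do not come from the paper; the context is stored as a map from each object to the attributes it has.

```python
# Hypothetical toy context (not from the paper): 4 objects, 3 attributes.
# Each object is mapped to the set of attributes it has.
CTX = {
    1: frozenset("a"),
    2: frozenset("ab"),
    3: frozenset("bc"),
    4: frozenset("c"),
}
ATTRS = frozenset("abc")


def up(objs, ctx=CTX, attrs=ATTRS):
    """X': the attributes shared by every object in X (all attributes if X is empty)."""
    shared = set(attrs)
    for g in objs:
        shared &= ctx[g]
    return frozenset(shared)


def down(attr_set, ctx=CTX):
    """B': the objects that have every attribute in B."""
    wanted = frozenset(attr_set)
    return frozenset(g for g, row in ctx.items() if wanted <= row)


def is_concept(objs, attr_set, ctx=CTX, attrs=ATTRS):
    """(X, B) is a formal concept exactly when X' = B and B' = X."""
    return (up(objs, ctx, attrs) == frozenset(attr_set)
            and down(attr_set, ctx) == frozenset(objs))


print(sorted(up({1, 2})))          # attributes common to objects 1 and 2
print(sorted(down({"a"})))         # objects having attribute a
print(is_concept({1, 2}, {"a"}))   # a concept of the toy context
print(is_concept({1}, {"a"}))      # not a concept: {a}' is larger than {1}
```

In this toy context the pair ({1, 2}, {a}) satisfies both conditions, whereas ({1}, {a}) fails because the objects having attribute a are 1 and 2.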

Definition 2 (see [10]). Let be a formal context. and . Denote and is an intent of .

Example 3 (see [10]). Table 1 is a formal context . is an object set and is an attribute set. The corresponding concept lattice is shown in Figure 1, in which every set is denoted directly by listing its elements except , , and .

3. A Novel Concept Acquisition Approach

The basic way to obtain all intents or extents is via their definitions. If there are $n$ objects, then $2^n$ computations are needed to get all intents, so the amount of computation is very large. This paper presents a new approach to this problem. In this section, we give the new method and some theorems that establish its rationality and validity.
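To make the cost of the definition-based approach concrete, here is a small Python sketch that derives the common attributes of every object subset and counts the derivations. The toy context `CTX`, the attribute set `ATTRS`, and the function name `all_intents_bruteforce` are hypothetical illustrations, not part of the paper.

```python
from itertools import combinations

# Hypothetical toy context (not from the paper): object -> attributes it has.
CTX = {
    1: frozenset("a"),
    2: frozenset("ab"),
    3: frozenset("bc"),
    4: frozenset("c"),
}
ATTRS = frozenset("abc")


def all_intents_bruteforce(ctx, attrs):
    """Compute X' for every subset X of the objects; return (intents, number of derivations)."""
    objects = list(ctx)
    intents = set()
    derivations = 0
    for k in range(len(objects) + 1):
        for subset in combinations(objects, k):
            shared = set(attrs)          # the empty subset yields all attributes
            for g in subset:
                shared &= ctx[g]
            intents.add(frozenset(shared))
            derivations += 1
    return intents, derivations


intents, derivations = all_intents_bruteforce(CTX, ATTRS)
print(len(intents), derivations)   # 7 distinct intents from 2**4 = 16 derivations
```

Even in this four-object example, more than half of the derivations only rediscover an intent that was already found, which is the redundancy the proposed method aims to avoid.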

Before giving the method, we first introduce a related definition.

Definition 4. Let be a formal context. , . Denote and , and , , where presents the cardinal of a set.

Since the method in this paper aims at obtaining all intents, we use subsets of to determine subsets of . Conversely, if we want to obtain all extents, the subsets of can be used to determine subsets of . This point is discussed further in the sequel.

Theorem 5. is an intent of an object concept.

Proof. The proof is immediately obtained from Definitions 1 and 4.

Theorem 6. If there exists () such that , then .

Proof. Suppose . By Definition 4, there exists such that .
Since , there exists such that . Noting that , we have . Moreover, from , we know that there exists satisfying ; that is, .
Now, we discuss two cases to prove .
The one case is that . In this case, . Thus, .
The other one is that . In this case, . Because , there exists such that . Therefore, we have ; that is, . That means, . Therefore, we have . Thereby, we can obtain .
To sum up the above two cases, holds.

Theorem 6 guarantees the convergence of Algorithm 2 involved in the sequel.

Corollary 7. If there exists () such that , then for any , , we have .

Proof. According to the condition , we have by Theorem 6. Using Theorem 6 repeatedly, we can easily obtain the following results: .

Theorem 8. Suppose , , ; if , then we have .

Proof. Suppose ; there is satisfying ; according to the condition and Definition 4, there is satisfying .
Because , there exists such that ; that is, . Obviously, , and thus, . That is, . That means . Therefore, we have . From Definition 4, we know that . It is a contradiction with .
Therefore, holds.

Corollary 9. Suppose , , and ,; then for any , , .

Proof. Because , and , we have by Theorem 8. Using Theorem 8 repeatedly, we have .

Theorem 10. Suppose is canonical; then if and only if  .

Proof.
Necessity. Suppose .
For any , if , then it is evident that . If , then there exists such that . By Definition 4, there exists such that . Obviously, . Since is arbitrary and , we have .
For any , by Definition 2 and properties of the operator , we have . Without loss of generality, we can suppose . If , then by Definition 4. If  , from Corollary 7, we have . Since and is arbitrary, we obtain .
Therefore, .
Sufficiency. We assume and prove .
If , then there is , but . From Corollary 9, for any , , we have . So, . is canonical and , so . Thus, .
On the other hand, since , by Definition 4, there exists , such that . By the definition of , .
That means there exists one set such that , but . Therefore, .

Theorem 10 gives a necessary and sufficient condition and a computation method to find . The process for calculating all intents is summarized as follows.
Step 1. Calculate and by Definition 4.
Step 2. Calculate and by Definition 4. If , then the set of intents is . Otherwise, proceed to Step 3.
Step 3. Calculate and by Definition 4. If , then the set of intents is . Otherwise, continue to calculate (). The computation stops at the which exactly meets , and the set of intents is then .
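The stepwise procedure can be sketched in Python under an explicit assumption, since the symbols of Definition 4 are not legible here: the sketch reads the k-th family as the set of all X' with |X| = k, and it stops as soon as the next family is contained in the current one. The names `level` and `all_intents_levelwise` and the toy context are hypothetical, and the code is an illustration of the idea rather than the authors' implementation.

```python
from itertools import combinations

# Hypothetical toy context (not from the paper): object -> attributes it has.
CTX = {
    1: frozenset("a"),
    2: frozenset("ab"),
    3: frozenset("bc"),
    4: frozenset("c"),
}
ATTRS = frozenset("abc")


def level(ctx, attrs, k):
    """Assumed reading of the k-th family: all X' for object subsets X with |X| = k."""
    family = set()
    for subset in combinations(list(ctx), k):
        shared = set(attrs)
        for g in subset:
            shared &= ctx[g]
        family.add(frozenset(shared))
    return family


def all_intents_levelwise(ctx, attrs):
    """Collect the families level by level; stop once the next one adds nothing beyond the current one."""
    intents = {frozenset(attrs)}      # intent of the empty extent
    current = level(ctx, attrs, 1)
    k = 1
    while True:
        intents |= current
        following = level(ctx, attrs, k + 1)
        if following <= current:      # termination test (subset check)
            return intents, k
        current, k = following, k + 1


intents, last_level = all_intents_levelwise(CTX, ATTRS)
print(len(intents), last_level)   # 7 intents, collected from levels 1 and 2 only
```

On this toy context, level 3 is computed only to confirm the stopping test, and level 4 is never computed, which mirrors the early termination described in the text.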

The merit of our method is that we do not need to calculate all , and the computation needs only to stop at which exactly meets . Now all the intents have been found and there is no extra computing.

In the following, we use an example in the literature [24] to examine the main results about the new method to find all intents of formal concepts.

The formal context in Table 2 is a minor revision of the famous example based on the film “Living Beings and Water” [24]. Since all the formal contexts in this paper are required to be canonical, we delete the attribute (water) from the original formal context. The objects are the living beings mentioned in the film and are denoted by , where 1 is leech, 2 is bream, 3 is frog, 4 is dog, 5 is spike-weed, 6 is reed, 7 is bean, and 8 is maize. The attributes in are the properties that the film emphasizes: : lives in water, : lives on land, : needs chlorophyll to produce food, : two seed leaves, : one seed leaf, : can move around, : has limbs, and : suckles its offspring.

The corresponding concept lattice of this formal context is shown in Figure 2.

We calculate and () firstly:

Similarly, we can calculate , , , and find . And we can also know . In fact, we only need to calculate , , . Once we have , but , the computation can be stopped.

According to Theorem 10, the set of all the intents is ; that is, , , , , , , , , , , , , , , , , , , . These results are easily examined from Figure 2.
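As an independent cross-check of this example, the sketch below closes the object intents under intersection, which is a standard way to enumerate intents and not the method of this paper. The incidence rows are transcribed from the classic “Living Beings and Water” context as it is commonly reproduced, with the water attribute removed, and should be checked against Table 2. The letters a to h are an assumed labelling that follows the order in which the attributes are listed above, and `ROWS` and `intents_by_closure` are hypothetical names.

```python
# Assumed attribute labels, in the order listed in the text:
# a lives in water, b lives on land, c needs chlorophyll, d two seed leaves,
# e one seed leaf, f can move around, g has limbs, h suckles its offspring.
ROWS = {
    "leech": "af",
    "bream": "afg",
    "frog": "abfg",
    "dog": "bfgh",
    "spike-weed": "ace",
    "reed": "abce",
    "bean": "bcd",
    "maize": "bce",
}
ATTRS = frozenset("abcdefgh")


def intents_by_closure(rows, attrs):
    """Close the object intents under intersection and add the full attribute set."""
    generators = {frozenset(r) for r in rows.values()}
    closed = set(generators) | {frozenset(attrs)}
    frontier = set(generators)
    while frontier:
        # Intersect each newly found set with every object intent.
        new = {x & g for x in frontier for g in generators} - closed
        closed |= new
        frontier = new
    return closed


intents = intents_by_closure(ROWS, ATTRS)
print(len(intents))
for intent in sorted(intents, key=lambda s: (len(s), sorted(s))):
    print("".join(sorted(intent)) or "(empty)")
```

Under these assumptions the closure contains 19 sets, including the empty set and the full attribute set, which agrees with the number of intents enumerated above.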

4. Algorithms and Experiments

4.1. Algorithms

Algorithm 1 is based entirely on Definition 1.

Algorithm 1
(1) input context
(2) step = 1
(3) get first step set
(4) intent set = former step set
(5) step
(6) later step set make arbitrary two ranks meet in the context}
(7) while step ≤ max rank of context {former step set = later step set; intent set = union of later step set and intent set; step++;}
(8) output the set of intents

Algorithm 2
(1) input context
(2) step = 1
(3) get first step set
(4) intent set = former step set
(5) step
(6) later step set make arbitrary two ranks meet in the context}
(7) while (step ≤ max rank of context) {if later step set is not the subset of former step set {former step set = later step set; intent set = union of later step set and intent set; step ;} else {return
(8) output the set of intents

Algorithm 2 is based on our approach, as established by Theorem 10. Compared with Algorithm 1, it adds a condition that terminates the program early.

The time complexity of Algorithm 2 is analyzed as follows.

Denote ; by Definition 4, the time complexity of Step I in Algorithm 1 or 2 is . We therefore obtain the following two facts. (1) The time complexity of Algorithm 1 is . (2) Suppose that Algorithm 2 terminates in the th step; then, by Theorem 10, the time complexity of Algorithm 2 is . We can easily get .

We present an example demonstrating the performance of Algorithm 2. The database “patient and Ill symptoms” shown in Table 3 comes from the UCI Machine Learning Repository [25]. Suppose there are 12 patients, denoted by , and symptoms of patients, denoted by , where is headache, is fever, stands for painful limbs, represents swollen glands in neck, is cold, is stiff neck, is rash, and is vomiting. Input the formal context and run the program; we obtain the set of all intents when :.

4.2. Experimental Results

In this section, we conduct experiments to compare Algorithm 2 with Algorithm 1. The four real-life databases selected are as follows:
(1) Living beings and water [24], introduced in Section 3.
(2) Patients and ill symptoms [25], introduced in Section 4.1.
(3) Bacterial Taxonomy [26]. Data are presented for 6 species, most of which have data for more than one strain, and 16 phenotypic characters (0 and 1). The species are Escherichia coli (ecoli), Salmonella typhi (styphi), Klebsiella pneumoniae (kpneu), Proteus vulgaris (pvul), Proteus morganii (pmor), and Serratia marcescens (smar). The phenotypic characters are H2S, MAN, LYS, IND, ORN, CIT, URE, ONP, VPT, INO, LIP, PHE, MAL, ADO, ARA, and RHA.
(4) Membership of Developing Countries in Supranational Group [24]. In this data set, 130 developing countries are the objects. Six properties (Group of 77, nonaligned, least developed countries, most seriously affected countries, Organization of the Petroleum Exporting Countries, and African, Caribbean, and Pacific Countries) are the attributes.

The results are shown in Table 4 and Figure 3, where Time 1 and Time 2 are the running times of Algorithms 1 and 2, respectively. denotes the number of intents, and the efficiency is defined as (Time 1 − Time 2)/Time 1. It can be seen that Algorithm 2 is much more efficient than Algorithm 1 along with the increase of .

5. Conclusion

Finding new methods for the difficult problem of concept lattice construction is an active research topic, and constructing concept lattices is a relatively new research branch of data processing and data analysis. Different methods play essential roles in different problems. This paper first recalls some basic notions. Based on the notion of intents, we obtain a new criterion for finding all intents of formal concepts. An example is given to illustrate the feasibility of this method. Finally, we give the corresponding algorithm and conduct experiments to illustrate its effectiveness.

For Algorithm 2, we have the following discussion which can be applied to real application. We can compare with of a formal context. If , then we use subsets of to determine subsets of and output the set of intents. Otherwise, according to the duality principle, the subsets of can be used to determine subsets of and output the set of extents. We will improve the corresponding algorithm of this method in the future.
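The choice between computing intents and computing extents can be sketched as follows. The comparison used here, working on whichever of the object set and the attribute set is smaller, is an assumption, because the exact condition is not legible in this text. The names `transpose` and `choose_side` and the toy context are hypothetical.

```python
# Hypothetical toy context (not from the paper): object -> attributes it has.
CTX = {
    1: frozenset("a"),
    2: frozenset("ab"),
    3: frozenset("bc"),
    4: frozenset("c"),
}
ATTRS = frozenset("abc")


def transpose(ctx, attrs):
    """Dual context: map each attribute to the set of objects that have it."""
    return {m: frozenset(g for g, row in ctx.items() if m in row)
            for m in sorted(attrs)}


def choose_side(ctx, attrs):
    """Assumed rule: enumerate subsets of the smaller side.

    Returns ("intents", ctx) when there are no more objects than attributes,
    otherwise ("extents", dual context), to be fed to the same enumeration routine.
    """
    if len(ctx) <= len(attrs):
        return "intents", ctx
    return "extents", transpose(ctx, attrs)


kind, view = choose_side(CTX, ATTRS)
print(kind)
for key, row in view.items():
    print(key, sorted(row))
```

With four objects and three attributes, the sketch switches to the dual context, so an intent-enumeration routine applied to `view` would return the extents of the original context.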

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgment

This work is supported by grants from the National Natural Science Foundation of China (no. 11371014, no. 11071281, and no. 61202206).