Research Article | Open Access
A Novel Concept Acquisition Approach Based on Formal Contexts
As an important tool for data analysis and knowledge processing, formal concept analysis (FCA) has been applied to many fields. In this paper, we introduce a new method to find all formal concepts based on formal contexts. The method reduces the amount of computation needed to obtain intents, and a corresponding algorithm is proposed. The main theorems and the algorithm are each illustrated by examples. Finally, several real-life databases are analyzed to demonstrate the application of the proposed approach. Experimental results show that the proposed approach is simple and effective.
Formal concept analysis (FCA), proposed by Wille in 1982 [1], is a field of applied mathematics based on the mathematization of concept and conceptual hierarchy. It thereby activates mathematical thinking for conceptual data analysis and knowledge processing. FCA starts with a formal context, defined as a triple consisting of an object set, an attribute (property) set, and a binary relation between the object set and the attribute set. A formal concept is a pair (object subset, attribute subset) induced by the binary relation, and a concept lattice is an ordered hierarchical structure of formal concepts. In rough set theory [2], a formal context corresponds to a special information system whose input data are two-valued.
Most research on FCA concentrates on the following topics: construction and pruning algorithms for concept lattices [3, 4]; the relationship between FCA and rough sets [5–10]; acquisition of rules [11, 12]; and reduction of concept lattices [6, 10, 13]. FCA has also proved useful in many fields, such as organizing web search results into a hierarchical structure of concepts based on common topics [14], information retrieval [15, 16], hierarchical analysis of software code [17–19], visualization in software engineering [19, 20], detecting suspects in human trafficking [21], analysis of questionnaire data [22], and mining gene expression data [23]. Further references to applications of FCA can be found in [14, 24].
Formal concepts are the central notions of FCA, and intents and extents are their essential components. The set of intents (extents), ordered by “$\supseteq$” (“$\subseteq$”), is isomorphic to the corresponding concept lattice. Hence, once the set of intents is determined, the concept lattice is identified, and obtaining all intents or extents is therefore essential. The basic way to obtain all intents or extents is via their definitions: if there are $n$ objects, the derivation $X^*$ must be calculated for all $2^n$ subsets $X$ of the object set to obtain all intents, which is computationally very expensive. To address this problem, we give a new method to obtain all intents, from which the formal concepts are then determined.
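To make the cost of the basic approach concrete, the following Python sketch computes $X^*$ for every subset $X$ of the object set, exactly as the definition prescribes, and counts the derivations. The four-object context and the names `up` and `intents_by_definition` are hypothetical, chosen only for illustration.

```python
from itertools import combinations

# Hypothetical toy context: each object is mapped to its object intent g*.
CONTEXT = {
    "o1": {"a", "b"},
    "o2": {"b", "c"},
    "o3": {"a", "c"},
    "o4": {"c", "d"},
}
ATTRIBUTES = frozenset({"a", "b", "c", "d"})


def up(objects):
    """X*: the attributes shared by every object in X (equal to M when X is empty)."""
    shared = set(ATTRIBUTES)
    for g in objects:
        shared &= CONTEXT[g]
    return frozenset(shared)


def intents_by_definition():
    """Evaluate X* for all 2^n subsets X of G; return the intents and the count."""
    objects = sorted(CONTEXT)
    intents, derivations = set(), 0
    for k in range(len(objects) + 1):
        for subset in combinations(objects, k):
            intents.add(up(subset))
            derivations += 1
    return intents, derivations


intents, derivations = intents_by_definition()
print(len(intents), derivations)  # 9 16
```

Even for this tiny context all 16 subsets are visited, although many of them (for example, every subset containing both o1 and o4) yield the same empty intent.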
This paper is organized as follows. In Section 2, we briefly review some basic notions related to FCA. In Section 3, a novel concept acquisition approach is introduced and some related conclusions are given. In Section 4, the corresponding algorithm is proposed and experimental results are shown to illustrate the validity of our method. Finally, conclusions are drawn in Section 5.
In this section, we recall some basic notions and properties in FCA.
Definition 1 (see [24]). A formal context $(G, M, I)$ consists of two sets $G$ and $M$ and a relation $I$ between $G$ and $M$. The elements of $G$ are called the objects and the elements of $M$ are called the attributes of the context. In order to express that an object $g$ is in a relation $I$ with an attribute $m$, we write $gIm$ or $(g, m) \in I$ and read it as “the object $g$ has the attribute $m$.”
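With respect to a formal context $(G, M, I)$, Ganter and Wille [24] defined a pair of dual operators for any $X \subseteq G$ and $B \subseteq M$ by
$$X^* = \{m \in M \mid (g, m) \in I \text{ for all } g \in X\},$$
$$B^* = \{g \in G \mid (g, m) \in I \text{ for all } m \in B\}.$$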
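The two operators translate directly into code. In this minimal Python sketch the relation $I$ is stored as a set of (object, attribute) pairs; the context is a hypothetical example, and `up` and `down` are illustrative names for the two $^*$ operators.

```python
# Hypothetical toy context: G, M and the relation I as a set of pairs.
G = {"o1", "o2", "o3", "o4"}
M = {"a", "b", "c", "d"}
I = {("o1", "a"), ("o1", "b"), ("o2", "b"), ("o2", "c"),
     ("o3", "a"), ("o3", "c"), ("o4", "c"), ("o4", "d")}


def up(X):
    """X* = {m in M | (g, m) in I for all g in X}."""
    return {m for m in M if all((g, m) in I for g in X)}


def down(B):
    """B* = {g in G | (g, m) in I for all m in B}."""
    return {g for g in G if all((g, m) in I for m in B)}


print(sorted(up({"o1", "o3"})))  # ['a']
print(sorted(down({"c"})))       # ['o2', 'o3', 'o4']
```

Note that `up(set())` returns all of $M$ and `down(set())` returns all of $G$, as the definitions require.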
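A formal context $(G, M, I)$ is called canonical if, for all $g \in G$, $g^* \neq \emptyset$ and $g^* \neq M$, and, for all $m \in M$, $m^* \neq \emptyset$ and $m^* \neq G$ (here $g^*$ abbreviates $\{g\}^*$ and $m^*$ abbreviates $\{m\}^*$). We assume that all the formal contexts we study in the sequel are finite and canonical.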
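The canonicity condition is easy to test mechanically. The Python sketch below checks it for a hypothetical context and shows that adding an attribute possessed by every object (like the “water” attribute removed from the example in Section 3) breaks it; `is_canonical` is an illustrative name.

```python
def is_canonical(G, M, I):
    """Check g* != {} and g* != M for every g, and m* != {} and m* != G for every m."""
    for g in G:
        g_star = {m for m in M if (g, m) in I}
        if not g_star or g_star == M:
            return False
    for m in M:
        m_star = {g for g in G if (g, m) in I}
        if not m_star or m_star == G:
            return False
    return True


# Hypothetical toy context.
G = {"o1", "o2", "o3", "o4"}
M = {"a", "b", "c", "d"}
I = {("o1", "a"), ("o1", "b"), ("o2", "b"), ("o2", "c"),
     ("o3", "a"), ("o3", "c"), ("o4", "c"), ("o4", "d")}
print(is_canonical(G, M, I))  # True

# An attribute "w" held by every object has w* = G, so canonicity fails.
I_w = I | {(g, "w") for g in G}
print(is_canonical(G, M | {"w"}, I_w))  # False
```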
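Let $(G, M, I)$ be a formal context. For all $X, X_1, X_2 \subseteq G$ and $B, B_1, B_2 \subseteq M$, the following properties hold:
(1) $X_1 \subseteq X_2 \Rightarrow X_2^* \subseteq X_1^*$, $B_1 \subseteq B_2 \Rightarrow B_2^* \subseteq B_1^*$.
(2) $X \subseteq X^{**}$, $B \subseteq B^{**}$.
(3) $X^* = X^{***}$, $B^* = B^{***}$.
(4) $X \subseteq B^* \iff B \subseteq X^*$.
(5) $(X_1 \cup X_2)^* = X_1^* \cap X_2^*$.
(6) $(B_1 \cup B_2)^* = B_1^* \cap B_2^*$.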
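The properties above can be sanity-checked exhaustively on small contexts. This Python sketch asserts the standard Galois-connection properties for every subset pair of 20 random five-object contexts; the random generator, the seed, and the helper names are assumptions made only for illustration, and passing the check is evidence, not a proof.

```python
import random
from itertools import combinations


def make_operators(G, M, I):
    """Return the two derivation operators of the context (G, M, I)."""
    def up(X):
        return frozenset(m for m in M if all((g, m) in I for g in X))

    def down(B):
        return frozenset(g for g in G if all((g, m) in I for m in B))

    return up, down


def subsets(S):
    items = sorted(S)
    return [frozenset(c) for k in range(len(items) + 1) for c in combinations(items, k)]


def check_properties(G, M, I):
    """Assert properties (1)-(6) exhaustively for one small context."""
    up, down = make_operators(G, M, I)
    obj_sets, attr_sets = subsets(G), subsets(M)
    for X1 in obj_sets:
        assert X1 <= down(up(X1))                          # (2), objects
        assert up(X1) == up(down(up(X1)))                  # (3), objects
        for X2 in obj_sets:
            if X1 <= X2:
                assert up(X2) <= up(X1)                    # (1), objects
            assert up(X1 | X2) == up(X1) & up(X2)          # (5)
        for B in attr_sets:
            assert (X1 <= down(B)) == (B <= up(X1))        # (4)
    for B1 in attr_sets:
        assert B1 <= up(down(B1))                          # (2), attributes
        assert down(B1) == down(up(down(B1)))              # (3), attributes
        for B2 in attr_sets:
            if B1 <= B2:
                assert down(B2) <= down(B1)                # (1), attributes
            assert down(B1 | B2) == down(B1) & down(B2)    # (6)
    return True


rng = random.Random(0)
G, M = set(range(5)), set("abcd")
for _ in range(20):
    I = {(g, m) for g in G for m in M if rng.random() < 0.5}
    check_properties(G, M, I)
print("properties (1)-(6) hold on 20 random contexts")
```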
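If $X^* = B$ and $B^* = X$, then $(X, B)$ is called a formal concept, where $X$ is called the extent of the formal concept and $B$ is called the intent of the formal concept. For any $g \in G$, the pair $(g^{**}, g^*)$ is a formal concept and is called an object concept. Similarly, for any $m \in M$, the pair $(m^*, m^{**})$ is a formal concept and is called an attribute concept. The family of all formal concepts of $(G, M, I)$ forms a complete lattice, called the concept lattice and denoted by $L(G, M, I)$. For any $(X_1, B_1), (X_2, B_2) \in L(G, M, I)$, the partial order is defined by
$$(X_1, B_1) \le (X_2, B_2) \iff X_1 \subseteq X_2 \iff B_2 \subseteq B_1,$$
and the infimum and supremum of $(X_1, B_1)$ and $(X_2, B_2)$ are defined by
$$(X_1, B_1) \wedge (X_2, B_2) = \left(X_1 \cap X_2,\ (B_1 \cup B_2)^{**}\right),$$
$$(X_1, B_1) \vee (X_2, B_2) = \left((X_1 \cup X_2)^{**},\ B_1 \cap B_2\right).$$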
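The lattice operations above can be sketched in Python on a hypothetical four-object context: a concept is represented as an (extent, intent) pair of frozensets, and `object_concept`, `meet`, and `join` (illustrative names) follow the formulas for $(g^{**}, g^*)$, the infimum, and the supremum.

```python
# Hypothetical toy context: object -> object intent g*.
CONTEXT = {"o1": {"a", "b"}, "o2": {"b", "c"}, "o3": {"a", "c"}, "o4": {"c", "d"}}
G = frozenset(CONTEXT)
M = frozenset({"a", "b", "c", "d"})


def up(X):
    """X*: attributes common to all objects in X (M when X is empty)."""
    return M.intersection(*(CONTEXT[g] for g in X))


def down(B):
    """B*: objects having every attribute in B."""
    return frozenset(g for g in G if B <= CONTEXT[g])


def is_concept(X, B):
    return up(X) == B and down(B) == X


def object_concept(g):
    """(g**, g*) for an object g."""
    intent = up({g})
    return down(intent), intent


def leq(c1, c2):
    """(X1, B1) <= (X2, B2) iff X1 is a subset of X2."""
    return c1[0] <= c2[0]


def meet(c1, c2):
    """Infimum: (X1 & X2, (B1 | B2)**)."""
    return c1[0] & c2[0], up(down(c1[1] | c2[1]))


def join(c1, c2):
    """Supremum: ((X1 | X2)**, B1 & B2)."""
    return down(up(c1[0] | c2[0])), c1[1] & c2[1]


c1, c3 = object_concept("o1"), object_concept("o3")
j = join(c1, c3)
print(sorted(j[0]), sorted(j[1]))  # ['o1', 'o3'] ['a']
print(is_concept(*j), leq(c1, j))  # True True
```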
Definition 2 (see ). Let $(G, M, I)$ be a formal context. Denote $L_M(G, M, I) = \{B \mid B \subseteq M \text{ and } B \text{ is an intent of } (G, M, I)\}$.
Example 3 (see ). Table 1 is a formal context . is an object set and is an attribute set. The corresponding concept lattice is shown in Figure 1, in which every set is denoted directly by listing its elements except , , and .
3. A Novel Concept Acquisition Approach
The basic way to obtain all intents or extents is via their definitions. If there are $n$ objects, then $X^*$ must be calculated for all $2^n$ subsets $X \subseteq G$ to obtain all intents, which requires a large amount of computation. This paper therefore presents a new approach to the problem. In this section, we give the new method together with theorems that establish its rationality and validity.
Before giving the method, we first introduce a related definition.
Definition 4. Let $(G, M, I)$ be a formal context. , . Denote and , and , , where $|\cdot|$ denotes the cardinality of a set.
Since the method in this paper aims at obtaining all intents, we use subsets of $G$ to determine subsets of $M$. Conversely, if we want to obtain all extents, subsets of $M$ can be used to determine subsets of $G$. This point is discussed further in the sequel.
Theorem 5. is an intent of an object concept.
Theorem 6. If there exists () such that , then .
Proof. Suppose . By Definition 4, there exists such that .
Since , there exists such that . Noting that , we have . Moreover, from , we know that there exists satisfying ; that is, .
Now, we discuss two cases to prove .
The first case is that . In this case, . Thus, .
The second case is that . In this case, . Because , there exists such that . Therefore, we have ; that is, . This means that . Therefore, we have . Thereby, we obtain .
Combining the two cases, holds.
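The two-case argument above can also be checked empirically. The Python sketch below assumes (our hypothetical reading of Definition 4, not a transcription of it) that the $k$-th family consists of the values $X^*$ for all $k$-element subsets $X \subseteq G$. Under that assumption it tests, on random contexts, that whenever level $k$ contributes no value outside levels $1$ to $k-1$, level $k+1$ contributes none either. The helper names, seed, and random-context generator are illustrative; a passing check is evidence for, not a proof of, Theorem 6.

```python
import random
from itertools import combinations


def level(context, attributes, k):
    """Hypothetical k-th family: {X* : X a subset of G with |X| = k}."""
    result = set()
    for X in combinations(sorted(context), k):
        shared = set(attributes)
        for g in X:
            shared &= context[g]
        result.add(frozenset(shared))
    return result


def theorem6_holds(context, attributes):
    """If level k adds no new value, check that level k + 1 adds none either."""
    n = len(context)
    levels = [level(context, attributes, k) for k in range(1, n + 2)]  # levels 1..n+1
    for k in range(2, n + 1):
        earlier = set().union(*levels[:k - 1])       # levels 1..k-1
        if levels[k - 1] <= earlier and not levels[k] <= earlier:
            return False
    return True


def random_context(rng, n_objects=6, attributes="abcd"):
    """Hypothetical random context: each object has each attribute with probability 1/2."""
    return {f"o{i}": {m for m in attributes if rng.random() < 0.5} for i in range(n_objects)}


rng = random.Random(1)
print(all(theorem6_holds(random_context(rng), set("abcd")) for _ in range(200)))  # True
```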
Corollary 7. If there exists () such that , then for any , , we have .
Theorem 8. Suppose , , ; if , then we have .
Proof. We argue by contradiction.
Suppose . Then there is satisfying . According to the condition and Definition 4, there is satisfying .
Because , there exists such that ; that is, . Obviously, , and thus, . That is, . That means . Therefore, we have . From Definition 4, we know that . This contradicts .
Corollary 9. Suppose , , and ,; then for any , , .
Theorem 10. Suppose $(G, M, I)$ is canonical; then if and only if .
Necessity. Suppose .
For any , if , then it is evident that . If , then there exists such that . By Definition 4, there exists such that . Obviously, . Since is arbitrary and , we have .
For any , by Definition 2 and properties of the operator , we have . Without loss of generality, we can suppose . If , then by Definition 4. If , then by Corollary 7 we have . Since and is arbitrary, we obtain .
Sufficiency. We assume and prove .
If , then there is , but . From Corollary 9, for any , , we have . So, . Since $(G, M, I)$ is canonical and , we have . Thus, .
On the other hand, since , by Definition 4, there exists , such that . By the definition of , .
That is, there exists a set such that , but . Therefore, .
Theorem 10 gives a necessary and sufficient condition, together with a computation method, for finding . The process for calculating all intents is summarized as follows.
Step 1. Calculate and by Definition 4.
Step 2. Calculate and by Definition 4. If , then the set of intents is . Otherwise, proceed to Step 3.
Step 3. Calculate and by Definition 4. If , then the set of intents is . Otherwise, continue calculating () in turn. The computation stops at the first that satisfies , and the set of intents is then .
The merit of our method is that we do not need to calculate all : the computation stops at the first that satisfies , at which point all intents have been found and no further computation is needed.
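The following Python sketch illustrates a level-wise procedure in the spirit of Steps 1 to 3. It assumes (our reading, not a transcription of Definition 4) that step $k$ collects $X^*$ for all $k$-element object subsets and that the computation stops at the first level whose values all appeared at earlier levels; $M = \emptyset^*$ is added at the end. All names are hypothetical, and a brute-force enumeration by definition is included only to check the result.

```python
from itertools import combinations


def up(context, attributes, X):
    """X*: attributes shared by all objects in X."""
    shared = set(attributes)
    for g in X:
        shared &= context[g]
    return frozenset(shared)


def levelwise_intents(context, attributes):
    """Collect X* level by level (|X| = 1, 2, ...) and stop at the first level
    k >= 2 whose values all occurred at earlier levels (assumed stopping rule)."""
    objects = sorted(context)
    found, derivations = set(), 0
    for k in range(1, len(objects) + 2):
        current = set()
        for X in combinations(objects, k):
            current.add(up(context, attributes, X))
            derivations += 1
        if k > 1 and current <= found:
            break
        found |= current
    # M = (empty set)* is always an intent; in a canonical context it is not
    # contained in any object intent, so it is added separately here.
    found.add(frozenset(attributes))
    return found, derivations


def intents_by_definition(context, attributes):
    """Reference: X* for every subset X of G, including the empty set."""
    objects = sorted(context)
    return {up(context, attributes, X)
            for k in range(len(objects) + 1)
            for X in combinations(objects, k)}


CONTEXT = {"o1": {"a", "b"}, "o2": {"b", "c"}, "o3": {"a", "c"}, "o4": {"c", "d"}}
ATTRS = {"a", "b", "c", "d"}
intents, derivations = levelwise_intents(CONTEXT, ATTRS)
print(len(intents), derivations)                         # 9 14
print(intents == intents_by_definition(CONTEXT, ATTRS))  # True
```

In this toy context, level 3 yields only $\emptyset$ and $\{c\}$, both already found at level 2, so the loop stops after 14 derivations instead of the 16 required by the basic approach, and the single four-element subset is never visited.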
In the following, we use an example from the literature [24] to illustrate the main results of the new method for finding all intents of formal concepts.
The formal context in Table 2 is a minor revision of a well-known example based on the film “Living Beings and Water” [24]. Since we require all formal contexts in this paper to be canonical, we delete the attribute “needs water” from the original formal context: every object has this attribute, so its extent would be the whole object set. The objects are living beings mentioned in the film and are denoted by $G = \{1, 2, \ldots, 8\}$, where 1 is leech, 2 is bream, 3 is frog, 4 is dog, 5 is spike-weed, 6 is reed, 7 is bean, and 8 is maize. The attributes are the properties which the film emphasizes: lives in water, lives on land, needs chlorophyll to produce food, two seed leaves, one seed leaf, can move around, has limbs, and suckles its offspring.
The corresponding concept lattice of this formal context is shown in Figure 2.
First, we calculate and ():
Similarly, we can calculate , , and , and find . We can also see that . In fact, we only need to calculate , , and ; once we have but , the computation can be stopped.
4. Algorithms and Experiments
The time complexity of Algorithm 2 is analyzed as follows.
Denote ; by Definition 4, the time complexity of Step I in Algorithm 1 or Algorithm 2 is . Hence, we obtain the following two results.
(1) The time complexity of Algorithm 1 is .
(2) Suppose that Algorithm 2 terminates at the th step; then, by Theorem 10, the time complexity of Algorithm 2 is . It follows easily that .
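To give a feel for the saving behind result (2), the sketch below counts how many subset derivations $X^*$ a level-wise scheme evaluates if it runs through levels $1$ to $k$, that is $\sum_{i=1}^{k} \binom{n}{i}$, compared with the $2^n$ subsets of the basic approach. It counts derivations only (each derivation also costs time proportional to the subset size and $|M|$), the stopping levels are illustrative rather than measured, and $n = 12$ simply matches the number of patients in Table 3. The function name is hypothetical.

```python
from math import comb


def subsets_visited(n, stop_level):
    """Number of object subsets X with 1 <= |X| <= stop_level, i.e. the X*
    evaluations made by a level-wise scheme that runs through levels 1..stop_level."""
    return sum(comb(n, i) for i in range(1, stop_level + 1))


n = 12  # e.g. the 12 patients of Table 3; stopping levels below are illustrative only
for stop_level in (2, 3, 4, 6, n):
    print(stop_level, subsets_visited(n, stop_level), 2 ** n)
```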
We present an example demonstrating the performance of Algorithm 2. The database “patients and ill symptoms” shown in Table 3 comes from the UCI Machine Learning Repository [25]. There are 12 patients, denoted by , and the symptoms of the patients are denoted by , where is headache, is fever, stands for painful limbs, represents swollen glands in the neck, is cold, is stiff neck, is rash, and is vomiting. Running the program on this formal context, we obtain the set of all intents when .
4.2. Experimental Results
In this section, we conduct experiments to compare Algorithm 2 with Algorithm 1. The four real-life databases selected for the experiments are as follows:
(1) Living beings and water [24], introduced in Section 4.1.
(2) Patients and ill symptoms [25], introduced in Section 4.1.
(3) Bacterial taxonomy. Data are presented for 6 species, most of which have data for more than one strain, and 16 phenotypic characters (0 and 1). The species are Escherichia coli (ecoli), Salmonella typhi (styphi), Klebsiella pneumoniae (kpneu), Proteus vulgaris (pvul), Proteus morganii (pmor), and Serratia marcescens (smar). The phenotypic characters are H2S, MAN, LYS, IND, ORN, CIT, URE, ONP, VPT, INO, LIP, PHE, MAL, ADO, ARA, and RHA.
(4) Membership of developing countries in supranational groups. In these data, 130 developing countries are the objects, and six properties (Group of 77, Non-Aligned, least developed countries, most seriously affected countries, Organization of the Petroleum Exporting Countries, and African, Caribbean and Pacific countries) are the attributes.
The results are shown in Table 4 and Figure 3, where Time 1 and Time 2 are the running times of Algorithms 1 and 2, respectively, denotes the number of intents, and the efficiency is defined as (Time 1 − Time 2)/Time 1. It can be seen that Algorithm 2 becomes much more efficient than Algorithm 1 as increases.
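The efficiency measure can be written as a one-line function. The Python sketch below also includes a small timing helper based on `time.perf_counter`; both names are hypothetical, and the example timings are made up for illustration rather than taken from Table 4.

```python
import time


def efficiency(time1, time2):
    """(Time 1 - Time 2) / Time 1: the relative running time saved by Algorithm 2."""
    if time1 <= 0:
        raise ValueError("Time 1 must be positive")
    return (time1 - time2) / time1


def best_time(fn, *args, repeat=5):
    """Smallest wall-clock time of fn(*args) over several runs, in seconds."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best


print(f"{efficiency(2.0, 0.5):.0%}")  # 75%
```

With hypothetical timings of 2.0 s and 0.5 s the efficiency is 75%; a value near 0 means no gain, and a negative value would mean that Algorithm 2 was slower.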
5. Conclusions
Finding new methods for the difficult problem of concept lattice construction is an active research topic. Constructing concept lattices is a novel research branch for data processing and data analysis, and different methods play essential roles in different problems. This paper first defines some basic notions. Based on the notion of intents, we obtain a new criterion for finding all intents of formal concepts. An example is given to explain the feasibility of this method. Finally, we give the corresponding algorithm and conduct experiments to illustrate its effectiveness.
For Algorithm 2, the following observation is useful in real applications. We can compare with of a formal context. If , then we use subsets of $G$ to determine subsets of $M$ and output the set of intents. Otherwise, by the duality principle, subsets of $M$ can be used to determine subsets of $G$, and the set of extents is output. We will improve the corresponding algorithm in future work.
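A sketch of the duality-based choice just described, assuming the rule is to enumerate subsets of whichever of $G$ and $M$ is smaller, since the worst-case number of derivations is 2 to the power of that set's size. The helpers `closed_sets`, `transpose`, and `intents_or_extents` are hypothetical names; `closed_sets` enumerates by definition for clarity, and `transpose` assumes every attribute is held by at least one object, as in a canonical context.

```python
from itertools import combinations


def closed_sets(rows, columns):
    """All sets X* (X ranging over subsets of the row keys), computed by definition."""
    keys = sorted(rows)
    found = set()
    for k in range(len(keys) + 1):
        for X in combinations(keys, k):
            shared = set(columns)
            for r in X:
                shared &= rows[r]
            found.add(frozenset(shared))
    return found


def transpose(context):
    """Turn an object -> g* mapping into an attribute -> m* mapping."""
    result = {}
    for g, attrs in context.items():
        for m in attrs:
            result.setdefault(m, set()).add(g)
    return result


def intents_or_extents(context, attributes):
    """Enumerate subsets of the smaller side (assumed rule); report which family is returned."""
    objects = set(context)
    if len(objects) <= len(attributes):
        return "intents", closed_sets(context, attributes)
    return "extents", closed_sets(transpose(context), objects)


# Hypothetical context with five objects but only two attributes.
CONTEXT = {"p": {"x"}, "q": {"x", "y"}, "r": {"y"}, "s": {"y"}, "t": {"x"}}
kind, family = intents_or_extents(CONTEXT, {"x", "y"})
print(kind, len(family))  # extents 4
```

Here only the $2^2$ attribute subsets are enumerated instead of the $2^5$ object subsets, and the resulting extents determine the concept lattice just as the intents would.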
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
This work is supported by Grants from the National Natural Science Foundation of China (no. 11371014, no. 11071281, and no. 61202206).
References
- R. Wille, “Restructuring lattice theory: an approach based on hierarchies of concepts,” in Ordered Sets, I. Rival, Ed., pp. 445–470, Reidel, Dordrecht, The Netherlands, 1982.
- Z. Pawlak, “Rough sets,” International Journal of Computer and Information Sciences, vol. 11, no. 5, pp. 341–356, 1982.
- L. K. Guo, F. P. Huang, Q. G. Li, and G. Q. Zhang, “Power contexts and their concept lattices,” Discrete Mathematics, vol. 311, no. 18-19, pp. 2049–2063, 2011.
- S. O. Kuznetsov and S. A. Obiedkov, “Comparing performance of algorithms for generating concept lattices,” Journal of Experimental & Theoretical Artificial Intelligence, vol. 14, pp. 189–216, 2002.
- R. Bělohlávek, “Similarity relations in concept lattices,” Journal of Logic and Computation, vol. 10, no. 6, pp. 823–845, 2000.
- W. X. Zhang, Y. Y. Yao, and Y. Leung, Rough Set and Concept Lattice, Xi'an Jiaotong University Press, Xi'an, China, 2006.
- R. E. Kent, “Rough concept analysis: a synthesis of rough sets and formal concept analysis,” Fundamenta Informaticae, vol. 27, no. 2-3, pp. 169–181, 1996.
- J. Saquer and J. S. Deogun, “Concept approximations based on rough sets and similarity measures,” International Journal of Applied Mathematics and Computer Science, vol. 11, no. 3, pp. 655–674, 2001.
- J. J. Qi, L. Wei, and Z. Z. Li, “A partitional view of concept lattice,” in Lecture Notes in Computer Science, vol. 10, pp. 74–83, 2005.
- L. Wei, Reduction Theory and Approach to Rough Set and Concept Lattice, Xi'an Jiaotong University, Xi'an, China, 2005.
- R. Godin, R. Missaoui, and H. Alaoui, “Incremental concept formation algorithms based on Galois (concept) lattices,” Computational Intelligence, vol. 11, no. 2, pp. 246–267, 1995.
- B. Ganter, G. Stumme, and R. Wille, Formal Concept Analysis: Foundations and Applications, Springer, Berlin, Germany, 2005.
- W. X. Zhang, L. Wei, and J. J. Qi, “Attribute reduction in concept lattice based on discernibility matrix,” in Rough Sets, Fuzzy Sets, Data Mining, and Granular Computing, vol. 3642 of Lecture Notes in Computer Science, pp. 157–165, Springer, Berlin, Germany, 2005.
- C. Carpineto and G. Romano, Concept Data Analysis: Theory and Applications, John Wiley and Sons, New York, NY, USA, 2004.
- W. C. Cho and D. Richards, “Improvement of precision and recall for information retrieval in a narrow domain: reuse of concepts by formal concept analysis,” in Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence (WI '04), pp. 370–376, September 2004.
- Y. Okubo and M. Haraguchi, “Finding conceptual document clusters with improved top-N formal concept search,” in Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence (WI '06), pp. 347–351, Hong Kong, China, December 2006.
- U. Dekel and Y. Gil, “Visualizing class interfaces with formal concept analysis,” in Proceedings of the 18th Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA '03), pp. 288–289, Anaheim, Calif, USA, 2003.
- G. Snelting and F. Tip, “Understanding class hierarchies using concept analysis,” ACM Transactions on Programming Languages and Systems, vol. 22, no. 3, pp. 540–582, 2000.
- P. Tonella, “Using a concept lattice of decomposition slices for program understanding and impact analysis,” IEEE Transactions on Software Engineering, vol. 29, no. 6, pp. 495–509, 2003.
- V. Ganapathy, D. King, T. Jaeger, and S. Jha, “Mining security-sensitive operations in legacy code using concept analysis,” in Proceedings of the 29th International Conference on Software Engineering (ICSE '07), pp. 458–467, May 2007.
- J. Poelmans, P. Elzinga, D. I. Ignatov, and S. O. Kuznetsov, “Semi-automated knowledge discovery: identifying and profiling human trafficking,” International Journal of General Systems, vol. 41, no. 8, pp. 774–804, 2012.
- R. Belohlavek, E. Sigmund, and J. Zacpal, “Evaluation of IPAQ questionnaires supported by formal concept analysis,” Information Sciences, vol. 181, no. 10, pp. 1774–1786, 2011.
- M. Kaytoue, S. O. Kuznetsov, A. Napoli, and S. Duplessis, “Mining gene expression data with pattern structures in formal concept analysis,” Information Sciences, vol. 181, no. 10, pp. 1989–2001, 2011.
- B. Ganter and R. Wille, Formal Concept Analysis: Mathematical Foundations, Springer, Berlin, Germany, 1999.
- A. Asuncion and D. J. Newman, UCI Machine Learning Repository, University of California, School of Information and Computer Sciences, Irvine, Calif, USA, 2007.
- A. H. Fielding, Clustering and Classification Techniques for the Biosciences, Cambridge University Press, London, UK, 2002.
Copyright © 2014 Ting Qian and Ling Wei. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.