Abstract

Despite the voluminous studies in the field of intelligent retrieval systems, effective information retrieval has remained an important unsolved problem. Incorporating conceptual knowledge such as ontologies into the information retrieval process has been considered as a way to enhance the quality of results. However, the conceptual formalism supported by a typical ontology may not be sufficient to represent uncertain information, due to the lack of clear-cut boundaries between the concepts of a domain. One possible solution to this type of problem is to introduce fuzzy logic into the ontology construction process. In this article, a novel approach for fuzzy ontology generation with two uncertainty degrees is proposed. First, the uncertainty level of the domain’s concepts (in the Software Maintenance Engineering (SME) domain) is modeled using linguistic variables, and the ontology relations are likewise modeled using fuzzy theory. Then, these uncertain models are combined into a new ontology with two degrees of uncertainty, in both concept expression and relation expression. The generated fuzzy ontology was used to expand initial user queries in the SME domain. Experimental results showed that the proposed model has better overall retrieval performance than keyword-based or crisp ontology-based retrieval systems.

1. Introduction

The process of searching for specific information among a large number of information items is known as information retrieval (IR). Users of IR systems expect to find the items most relevant to a certain query. Measures such as recall and precision are used to appraise the effectiveness of these systems [1]. In general, an information retrieval system does not behave ideally. Users often receive large result sets and have to spend considerable time finding the items that are really relevant to their initial queries. Moreover, this kind of search neglects relevant documents that do not contain the index terms specified in the user’s query. When working with specific domain knowledge, this problem can be tackled by incorporating into existing information retrieval systems a knowledge base, such as an ontology, that captures the relationships between index terms [2].

One of the motivations of the semantic web is the use of ontologies to overcome the limitations of keyword-based search [3]. An ontology is a conceptualization of a domain into a human-understandable, machine-readable format consisting of entities, attributes, relationships, and axioms [4]. It is used as a standard knowledge representation for the semantic web [5]. An ontology-based information retrieval system is constructed on specified domain knowledge. The objective of adding an ontology to an information retrieval system is to apply knowledge structures for filtering and searching the information relevant to the user’s needs. If the searched information is covered by the concepts of the user’s knowledge domain, using an ontology will increase the probability of relevancy [6]. However, the conceptual formalism supported by a typical ontology may not be sufficient to represent the uncertainty commonly found in many application domains, due to the lack of clear-cut boundaries between the concepts of the domains. One possible solution for handling uncertain and vague information is to introduce fuzzy logic into the ontology construction process [7]. Fuzzy set theory, among computational intelligence techniques, is a promising approach to improving the effectiveness of information retrieval systems [8]. It deals with uncertainty that may be present in document and query representations as well as in their relationships. It has already been used for indexing, clustering, and recommendation [2]. A fuzzy ontology is based on the modification of an existing crisp ontology. The modification process is entirely incremental: the conversion to a fuzzy ontology adds membership values to the currently existing relations and may also add new entries to the ontology [9].

Several researchers have considered the use of fuzzy ontologies to model uncertainty in domain knowledge [10]. Lee et al. proposed a fuzzy ontology for news summarization, in which a fuzzy inference mechanism generated the membership degrees for each fuzzy concept of the fuzzy ontology [11]. They also proposed a novel type-2 fuzzy ontology, combining type-2 fuzzy sets with the ontology model, and applied it to the diet assessment domain to build a fuzzy diet assessment agent for people with average levels of physical activity [12]. A Type-2 Fuzzy Ontology (T2FO) is a knowledge representation model for describing domain knowledge with uncertainty. It is an extension of the domain ontology and contains six layers: a domain layer, a category layer, a fuzzy concept layer, a fuzzy variable layer, a fuzzy set layer, and a Type-2 fuzzy set (T2FS) layer. The concepts and relations of the T2FO are constructed from fuzzy variables, fuzzy sets, and T2FSs [12]. In a similar work, Lee et al. developed a type-2 fuzzy ontology and used it for personal diabetic-diet recommendation [13]. Quan et al. presented automatic fuzzy ontology generation for semantic help-desk support of customer services using semantic web technologies. Their work focused on an automatic generation approach, known as fuzzy formal concept analysis (FFCA), for a fuzzy machine service ontology that can also deal with uncertain data [14].

One application of fuzzy ontologies is the query expansion task. The main aim of query expansion is to add new meaningful terms to the initial query. Query expansion can improve the effectiveness of a search engine by adding other terms that are closely related to the original query terms and by disambiguating the user query [9]. Bahri et al. addressed fuzzy ontology implementation and query answering on databases. They proposed a language to define fuzzy ontology schemas and to query fuzzy ontology databases, together with an inferential engine to infer fuzzy concept instances and their membership degrees [15]. Pan et al. proposed a framework of fuzzy query languages for fuzzy ontologies and presented query answering algorithms for query languages over fuzzy DL-Lite ontologies [16].

In this paper, a new method for ontology generation is proposed based on fuzzy theory with two degrees of uncertainty. The main contribution of this work is to consider two uncertainty degrees, one in concept expression and one in relation expression, and to combine these uncertain models into a new fuzzy ontology. In this model, linguistic variables are used to model both the membership degree of concepts in a certain domain and the membership degree of relations to concepts in the SME domain. These uncertain models are then combined, and a new ontology with two uncertainty degrees, in both concept expression and relation expression, is proposed. Finally, the performance of an information retrieval system based on the proposed query expansion algorithm is measured in the SME domain by comparing three situations: no ontology, the crisp ontology, and the fuzzy ontology.

The rest of this paper is organized as follows: Section 2 reviews work related to the subject of the paper. In Section 3, basic concepts of fuzzy set theory are explained. Fuzzification of the modification process ontology is described in Section 4. Section 5 presents the query expansion algorithm. In Section 6, the performance appraisal of the information retrieval system is discussed, and finally Section 7 concludes the paper.

2. Related Work

Fuzzy logic systems (FLSs) have been credited with providing an adequate methodology for designing robust systems that are able to deliver satisfactory performance when contending with the uncertainty, noise, and imprecision attributed to real-world environments and applications [10]. As a result, FLSs have been used in a wide range of applications, including fuzzy ontologies.

Because the aim of this paper is to propose an information retrieval approach based on a fuzzy ontology, this section presents some fuzzy information retrieval models that have been used as tools for improving retrieval performance.

Parry implemented a fuzzy ontology for information retrieval focused on medical document retrieval. This ontology has fuzzy values in its relations [17]. Zhai et al. presented a fuzzy ontology for semantic information retrieval in the e-commerce domain, using a semantic query expansion method for this purpose. Their framework includes three parts: concepts, properties of concepts, and values of properties, in which a property value can be either a standard data type or a linguistic value of a fuzzy concept [18]. They also implemented fuzzy ontologies for semantic information retrieval in the fields of supply chain management, traffic information retrieval, and intelligent transportation systems [19–21].

Leite and Ricarte presented a framework to encode a geographic knowledge base composed of multiple-related ontologies whose relationships were expressed as fuzzy relations. This knowledge organization was used in a fuzzy method to expand the user initial query [22].

An ontology-based spatial query expansion method was proposed by Fu et al., which considered a geographical ontology to expand geographic terms. Various factors are taken into account to support intelligent expansion of a spatial query, including types of spatial terms as encoded in the geographical ontology, types of nonspatial terms as encoded in the domain ontology, as well as the semantics of the spatial relationships and their context of use [23]. Bratsas et al. used a fuzzy query expansion and a fuzzy thesaurus to solve the Medical Computational Problem (MCP). In the experiments, the system was capable of retrieving the same MCP for distinct descriptions of the same problem. The system uses a unique fuzzy thesaurus for query expansion [24].

Ogawa et al. used the keyword connection matrix model, in which the knowledge about relevant keywords or terms is encoded as a single fuzzy relation that expresses the degree of similarity between the terms. The information retrieval process uses the fuzzy keyword connection matrix to find similarities between the query terms as a way to improve query results [25].

Pereira et al. presented the fuzzy relational ontological model in information search systems that considers knowledge base as a fuzzy ontology with concepts representing the categories and the keywords of a domain. When the user enters a query, composed of concepts, the system performs its expansion and may add new concepts based on the ontology knowledge. After expansion, the similarity between the query and the documents is calculated by fuzzy operations [26].

A fuzzy rough set-based web query expansion method was presented by Cock and Cornelis, which used the tight upper approximation in fuzzy rough set theory to find terms to be added to the query. The knowledge base is a thesaurus that consists of a term-term relation. In this approach, a term Y is added to a query represented by the fuzzy set A only if all the terms that are related to Y are also related to at least one keyword of the query. The results of the proposed technique were promising when the query had ambiguous terms [27].

Calegari and Sanchez proposed a fuzzy ontology approach to improve semantic information retrieval and introduced an information retrieval algorithm that derives a unique path among the entities involved in the query in order to obtain maximal semantic associations in the knowledge domain [28]. Bahri et al. implemented a fuzzy ontology for query answering on databases. They proposed a language to define fuzzy ontology schemas and to query fuzzy ontology databases, together with an inferential engine to infer fuzzy concept instances and their membership degrees [29].

By reviewing the studies in fuzzy ontology implementation, it can be observed that the proposed approaches do not have adequate ability in uncertainty representation of concepts and have not considered natural semantic relations between ontology concepts. These weaknesses may cause some problems in fuzzy semantic information retrieval. Since concepts belong with a specific membership degree to a certain domain and similarly relations belong with a certain membership degree to concepts, this paper proposes a fuzzy ontology with two degrees of uncertainty, both in concept expression and relation expression.

3. Fuzzy Membership Function and Linguistic Variables

The special structure of fuzzy numbers makes calculations very time-consuming and complicated. To facilitate calculation and practical use, particular fuzzy numbers are generally used. In this paper, the experts’ opinions are described by linguistic variables expressed as trapezoidal fuzzy numbers. In order to determine the relevancy of the ontology elements to the specific domain (a concept’s membership degree in the main domain and a relation’s membership degree to concepts), seven linguistic variables have been used: “not relevant”, “very low relevant”, “low relevant”, “medium relevant”, “high relevant”, “very high relevant”, and “fully relevant”. Figure 1 presents these linguistic variables and their corresponding trapezoidal fuzzy numbers.
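
As an illustration of how a trapezoidal fuzzy number assigns membership degrees, the following minimal Python sketch may help. The function name `trapezoid_membership` and the sample corner values are assumptions made for this example; the actual corner values of the seven linguistic variables are those given in Figure 1 and are not reproduced here.

```python
def trapezoid_membership(x, a1, a2, a3, a4):
    """Membership degree of x in the trapezoidal fuzzy number [a1, a2, a3, a4]."""
    if not (a1 <= a2 <= a3 <= a4):
        raise ValueError("expected a1 <= a2 <= a3 <= a4")
    if a2 <= x <= a3:              # plateau: full membership
        return 1.0
    if x <= a1 or x >= a4:         # outside the support: no membership
        return 0.0
    if x < a2:                     # rising edge between a1 and a2
        return (x - a1) / (a2 - a1)
    return (a4 - x) / (a4 - a3)    # falling edge between a3 and a4


# Hypothetical corner values, chosen only to make the arithmetic easy to follow.
print(trapezoid_membership(1.5, 1, 2, 3, 4))  # halfway up the rising edge
print(trapezoid_membership(2.5, 1, 2, 3, 4))  # on the plateau
```

The plateau test comes first so that degenerate shapes (for example a1 = a2, a left-shouldered trapezoid) never divide by zero.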

4. Fuzzification of the “Modification Process Ontology”

Software maintenance happens in a relatively disorganized way and naturally leads to the deterioration of software systems’ structure. Maintainers, lacking complete knowledge of all the implementation details, apply modifications that result in a loss of structure, which in turn makes the systems more difficult to understand fully and, therefore, to maintain [30]. To break this vicious circle, we aim to develop a knowledge management approach for the software maintenance domain. This approach is modeled as a fuzzy ontology.

The initial modification process ontology, proposed by Dias, is presented in Figure 2. This ontology organizes concepts from the modification request (and its causes) to the maintenance activities in the SME domain [24]. In this study, the existing uncertainty in the concepts and relations of the modification process ontology is modeled in two phases. First, an uncertainty degree describing the concepts of the ontology is obtained using linguistic variables. Second, the relations of the existing ontology are fuzzified, and finally a new ontology with a unique membership degree for each relation is proposed by applying fuzzy composition. Assume that $U$ and $V$ are two collections of objects. An arbitrary fuzzy set $R$ defined on the Cartesian product $U \times V$ is called a fuzzy relation in the space $U \times V$; $R$ is thus a function defined on $U \times V$ that takes values in the interval $[0, 1]$. Let $R_1$ be a fuzzy relation in $U \times V$ and $R_2$ a fuzzy relation in $V \times W$. For all $(u, w)$ in $U \times W$, the max-min, max-product, and max-average compositions are defined as follows [31]:

\[ \text{Max-Min}(R_1, R_2)(u, w) = \max_{v \in V} \min\{R_1(u, v),\, R_2(v, w)\}, \tag{1} \]

\[ \text{Max-Product}(R_1, R_2)(u, w) = \max_{v \in V} \big( R_1(u, v) \times R_2(v, w) \big), \tag{2} \]

\[ \text{Max-Average}(R_1, R_2)(u, w) = \frac{1}{2} \max_{v \in V} \big( R_1(u, v) + R_2(v, w) \big). \tag{3} \]
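
The three compositions can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the study: the function name `compose`, the dictionary representation of a fuzzy relation, and the sample degrees are all assumptions made for the example.

```python
def compose(r1, r2, mode="max-product"):
    """Compose fuzzy relations r1 (on U x V) and r2 (on V x W).

    Each relation is a dict mapping a pair to a degree in [0, 1];
    pairs that are absent are treated as having degree 0.
    """
    combine = {
        "max-min": lambda a, b: min(a, b),        # inner operator of (1)
        "max-product": lambda a, b: a * b,        # inner operator of (2)
        "max-average": lambda a, b: (a + b) / 2,  # inner operator of (3)
    }[mode]
    us = {u for u, _ in r1}
    vs = {v for _, v in r1} | {v for v, _ in r2}
    ws = {w for _, w in r2}
    return {
        (u, w): max(combine(r1.get((u, v), 0.0), r2.get((v, w), 0.0)) for v in vs)
        for u in us
        for w in ws
    }


# Hypothetical degrees for a single (u, w) pair linked through v1 and v2.
r1 = {("u", "v1"): 0.8, ("u", "v2"): 0.5}
r2 = {("v1", "w"): 0.5, ("v2", "w"): 1.0}
for mode in ("max-min", "max-product", "max-average"):
    print(mode, compose(r1, r2, mode)[("u", "w")])
```

With these sample values the max-min and max-product compositions both yield 0.5 and the max-average composition yields 0.75, which shows how the choice of composition changes the resulting degree.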

4.1. Concepts’ Fuzzification of the “Modification Process Ontology”

Although fuzzification of an ontology’s relations has been the focus of almost all the reviewed papers, fuzzification of concepts has rarely been considered [2, 4, 9, 11, 18–21]. This section therefore considers the fuzzification of ontology concepts in particular. The underlying idea is that the concepts existing in the ontology are not completely related to the considered SME domain, so each concept has a specific membership degree in that domain. To fuzzify the ontology’s concepts, a special questionnaire was designed, and relevant experts were asked to determine the degree of relevancy of the modification process ontology’s concepts to the domain using the following linguistic values: “not relevant”, “very low relevant”, “low relevant”, “medium relevant”, “high relevant”, “very high relevant”, and “fully relevant”. These values were fuzzified by the membership function shown in Figure 1. In order to reach a unique membership degree for each concept of the ontology, the trapezoidal fuzzy variable defuzzifier has been used. The defuzzification value of a trapezoidal fuzzy number $A = [a_1, a_2, a_3, a_4]$, $a_1 \le a_2 \le a_3 \le a_4$, is defined as [32]

\[ D = \frac{a_1 + a_2 + a_3 + a_4}{4}. \tag{4} \]

The results have been presented in Table 1.
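
For concreteness, the defuzzifier (4) amounts to the following short Python sketch. The function name and the sample trapezoid are assumptions for illustration only; the trapezoids actually used are those of Figure 1.

```python
def defuzzify_trapezoid(a1, a2, a3, a4):
    """Crisp value of the trapezoidal fuzzy number [a1, a2, a3, a4], as in (4)."""
    if not (a1 <= a2 <= a3 <= a4):
        raise ValueError("expected a1 <= a2 <= a3 <= a4")
    return (a1 + a2 + a3 + a4) / 4


# Hypothetical trapezoid, not taken from Figure 1.
print(defuzzify_trapezoid(0.25, 0.5, 0.75, 1.0))
```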

4.2. Relations’ Fuzzification of the “Modification Process Ontology”

Following the process described in Section 4.1, another questionnaire was designed for the fuzzification of the ontology relations in Figure 2. The experts were asked to determine the degree of relevancy of each relation to the existing concepts using these values: “not relevant”, “very low relevant”, “low relevant”, “medium relevant”, “high relevant”, “very high relevant”, and “fully relevant”. These linguistic variables were fuzzified with the membership function shown in Figure 1. Table 2 shows the results of modeling the ontology relations with trapezoidal numbers. In order to reach a unique membership degree for each relation of the ontology, the trapezoidal fuzzy variable defuzzifier (4) has been used. Column 2 of Table 2 shows the ontology relations that are presented in Figure 3.

4.3. Proposing the New Combined Model Based on Fuzzy Set Theory

The final step of this phase is to combine the membership degrees of the concepts and relations and to propose a unique membership degree for the ontology relations in the SME domain. Because both the degree of relevancy of the ontology relations to the concepts and the concepts of the SME domain have been fuzzified, the membership degree of the relations in the modification process ontology can be calculated by using fuzzy compositions. In our implementation of the proposed fuzzy ontology, the max-product composition (2) gave better results than the other fuzzy compositions; therefore, this composition type is used. The final fuzzy ontology is shown in Figure 3.

5. The Query Expansion Algorithm

This section discusses the proposed query expansion algorithm, its pseudocode form, and the computational complexity of the crisp and fuzzy ontology query expansion algorithms.

5.1. The Proposed Query Expansion Algorithm Based on the Ontology

In this algorithm, it is assumed that the query terms entered by the user exist in the modification process ontology. If the query terms are not in the ontology of a specific domain, the most common approach is to add synonym terms based on a general dictionary (commonly WordNet, the comprehensive English ontology). General dictionaries do not consider any specific domain; hence, satisfactory accuracy will not be obtained [33]. Therefore, the proposed query expansion algorithm considers only those situations in which the entered query terms exist in the modification process ontology. Different approaches are used for query expansion depending on the existence or lack of relations between the query terms in the ontology. These approaches are explained in the following.

(a) Query Expansion Based on Crisp Ontology
Assuming that the query terms exist in the modification process ontology, two situations are possible, depending on whether or not these terms have a semantic relation with each other.
(i) Query terms exist in the modification process ontology and have a semantic relation
The relations between the terms in the ontology are extracted (one term may be the parent or the child of another term). For each parent term, all its parent terms (generalized terms) are extracted, and for each child term, all its child terms (specialized terms) are extracted as well; these extracted terms are then entered in the expanded query. For example, consider query = (“modification activity”, “enhancement maintenance”). These terms have the relation “modification activity” > “enhancement maintenance” in the ontology; thus, instead of the term “modification activity”, generalized terms such as “maintenance activity” will be entered in the expanded query. Similarly, instead of the term “enhancement maintenance”, specialized terms such as “adaptive maintenance”, “perfective maintenance”, and “preventive maintenance” are added to the expanded query.
(ii) Query terms exist in the modification process ontology and do not have semantic relations
In this situation, all related parent and child terms will be entered in the expanded query. As a case in point, consider query = (“corrective maintenance”, “maintenance project”). Because there is no relation between these query terms in the ontology, all of their parent and child terms are added to the expanded query; so, instead of the term “corrective maintenance”, the term “modification activity”, and instead of the term “maintenance project”, the terms “maintenance activity” and “modification request” will be added to the expanded query.

(b) Query Expansion Based on Fuzzy Ontology
This case is very similar to case (a), with the difference that the terms added to the query must have a membership degree equal to or greater than a certain threshold limit. As discussed in Section 6, the maximum average precision is obtained at the threshold limit 0.78, so this limit is used. For example, consider query = (“modification activity”, “enhancement maintenance”). These terms have the relation “modification activity” > “enhancement maintenance” in the ontology and fall under situation (i); thus, instead of the term “modification activity”, generalized terms with a membership degree equal to or greater than 0.78, such as “maintenance activity”, will be entered in the expanded query. Similarly, instead of the term “enhancement maintenance”, specialized terms with a membership degree equal to or greater than 0.78, such as “adaptive maintenance”, are added to the expanded query.
As a further illustration, consider query = (“corrective maintenance”, “maintenance project”). Because there is no relation between these query terms in the ontology, their parent and child terms are added to the expanded query according to situation (ii); so, instead of the term “corrective maintenance”, the term “modification activity”, and instead of the term “maintenance project”, the term “modification request”, both of which have a membership degree equal to or greater than 0.78, will be added to the expanded query.

5.2. Pseudocode Form of the Algorithm

The pseudocode form of the algorithm based on crisp and fuzzy ontology is as shown in Pseudocode 1.

// situation (a.i)
for ( ) {
  get(query);
  if (query words exist in modification process ontology && have relation together) {
    if (the query word is parent of other words)
      add(other parents of this word to the query terms);
    else if (the query word is child of other words)
      add(other children of this word to the query terms);
  } // end if
} // end for
***************************
// situation (a.ii)
for ( ) {
  get(query);
  if (query words exist in modification process ontology && have no relation together) {
    add(all parents and children of these words to query);
  } // end if
} // end for
***************************
// situation (b.i)
for ( ) {
  get(query);
  if (query words exist in modification process ontology && have relation together) {
    if (the query word is parent of other words && their membership degree >= threshold limit)
      add(other parents of this word to the query terms);
    else if (the query word is child of other words && their membership degree >= threshold limit)
      add(other children of this word to the query terms);
  } // end if
} // end for
***************************
// situation (b.ii)
for ( ) {
  get(query);
  if (query words exist in modification process ontology && have no relation together
      && their membership degree >= threshold limit) {
    add(all parents and children of these words to query);
  } // end if
} // end for
***************************
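
The four situations above can be sketched as one small runnable Python function. This is an illustrative reading of the pseudocode under stated assumptions, not the authors' implementation: the function name `expand_query`, the edge-dictionary representation, and every membership degree in the toy ontology are hypothetical (only the term names come from the examples in Section 5.1). The sketch also assumes that the original query terms are kept in the expanded query, that "relation" means a direct parent-child edge, and that the threshold is applied to the edge leading to each added term. A threshold of 0 reproduces the crisp case (a).

```python
def expand_query(terms, edges, threshold=0.0):
    """Expand query terms using parent/child edges of an ontology.

    edges maps (parent, child) -> membership degree in [0, 1].
    threshold=0.0 gives the crisp case (a); a positive value gives case (b).
    """
    def parents(term):
        return {p for (p, c), d in edges.items() if c == term and d >= threshold}

    def children(term):
        return {c for (p, c), d in edges.items() if p == term and d >= threshold}

    expanded = set(terms)  # assumption: original terms stay in the query
    for term in terms:
        others = [t for t in terms if t != term]
        is_parent = any((term, o) in edges for o in others)
        is_child = any((o, term) in edges for o in others)
        if is_parent:                  # situation (i): generalize the parent term
            expanded |= parents(term)
        if is_child:                   # situation (i): specialize the child term
            expanded |= children(term)
        if not (is_parent or is_child):  # situation (ii): no relation in the query
            expanded |= parents(term) | children(term)
    return expanded


# Toy ontology fragment; all degrees are made up for illustration.
EDGES = {
    ("maintenance activity", "modification activity"): 0.90,
    ("modification activity", "enhancement maintenance"): 0.85,
    ("modification activity", "corrective maintenance"): 0.80,
    ("enhancement maintenance", "adaptive maintenance"): 0.80,
    ("enhancement maintenance", "perfective maintenance"): 0.60,
    ("enhancement maintenance", "preventive maintenance"): 0.50,
}
query = ["modification activity", "enhancement maintenance"]
print(sorted(expand_query(query, EDGES)))                  # crisp expansion
print(sorted(expand_query(query, EDGES, threshold=0.78)))  # fuzzy expansion
```

With these made-up degrees, the crisp expansion adds "maintenance activity" and all three specialized maintenance types, while the fuzzy expansion at threshold 0.78 adds only "maintenance activity" and "adaptive maintenance", mirroring the worked examples in Section 5.1.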

5.3. Computational Time of the Crisp and Fuzzy Ontology Query Expansion Algorithm

The computational time of the crisp and fuzzy ontology query expansion algorithms in the worst case of both situations (a) and (b) of the pseudocode is O(n), where n is the number of terms in the ontology, because in the worst case the user enters all of the terms of the ontology in the query. In the normal case, it is O(1).

6. Performance Evaluation of Information Retrieval System

The two most frequently used basic measures of information retrieval effectiveness are precision and recall [34, 35], so we used these measures to evaluate the performance of the ontology. Precision and recall are defined in terms of a set of retrieved documents (e.g., the list of documents produced by a web search engine for a query) and a set of relevant documents (e.g., the list of all documents on the internet that are relevant to a certain topic):

\[ \begin{aligned} \text{Precision} &= \frac{\#(\text{relevant items retrieved})}{\#(\text{retrieved items})} = P(\text{relevant} \mid \text{retrieved}), \\ \text{Recall} &= \frac{\#(\text{relevant items retrieved})}{\#(\text{relevant items})} = P(\text{retrieved} \mid \text{relevant}). \end{aligned} \tag{5} \]
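
As a small worked example of (5), the following Python sketch computes both measures from sets of document identifiers. The function name and the document identifiers are hypothetical, and returning 0 for an empty set is a convention chosen here to avoid division by zero.

```python
def precision_recall(retrieved, relevant):
    """Precision and recall of a retrieved set against a relevant set, as in (5)."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant items retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Four documents retrieved, three relevant overall, two in common.
precision, recall = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d3", "d5"])
print(precision, recall)
```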

Average precision is computed at each standard recall level across all queries and is used to evaluate overall system performance on a document/query corpus [36, 37].
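
One common way to obtain such a curve is interpolated precision at the 11 standard recall levels 0.0, 0.1, ..., 1.0, averaged over queries. The sketch below assumes that convention, which the text does not state explicitly; the function names and the sample ranking are hypothetical, and each query is assumed to have at least one relevant document.

```python
def interpolated_precision(ranking, relevant, levels=None):
    """Interpolated precision of one ranked list at each recall level.

    The interpolated precision at level r is the highest precision observed
    at any rank whose recall is at least r (0 if that recall is never reached).
    """
    if levels is None:
        levels = [i / 10 for i in range(11)]  # 0.0, 0.1, ..., 1.0
    relevant = set(relevant)
    points = []  # (recall, precision) after each rank
    hits = 0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
        points.append((hits / len(relevant), hits / rank))
    return [
        max((p for r, p in points if r >= level), default=0.0)
        for level in levels
    ]


def average_over_queries(per_query_curves):
    """Average several per-query precision curves level by level."""
    return [sum(column) / len(column) for column in zip(*per_query_curves)]


curve = interpolated_precision(["d1", "d2", "d3", "d4"], {"d1", "d3"})
print(curve)
```

Plotting the output of `average_over_queries` against the recall levels gives one precision-recall curve per retrieval configuration, which is the kind of comparison summarized in Figure 5.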

For performance evaluation of the information retrieval system in local searches, the Google Desktop search engine was used, and a large number of documents related to the Software Maintenance Engineering domain were loaded into it. To determine an appropriate threshold limit for situation (b), various thresholds from 0 to 0.9 were considered. Furthermore, both single-term and multiple-term queries were entered in the search engine. Figure 4 shows the average precision of the fuzzy ontology for different threshold limits. As can be seen in this figure, the maximum average precision is obtained at the threshold limit 0.78.

As a result, the threshold limit 0.78 has been used for situation (b) in the proposed query expansion algorithm. Figure 5 depicts the average precision at different recall levels for three situations: no ontology, crisp ontology query expansion, and fuzzy ontology query expansion. These situations were selected in order to evaluate the performance of the fuzzy ontology relative to the situation without an ontology and to crisp ontology query expansion. As shown in Figure 5, fuzzy query expansion has the best average precision in retrieving relevant documents.

7. Conclusion

In this paper, we proposed a new approach for effective information retrieval based on a fuzzy ontology generation technique with two uncertainty degrees. Compared with a crisp ontology, a fuzzy ontology can handle the uncertainty of relations and makes it possible to find uncertain information in a specific domain. For performance evaluation of the information retrieval system, we considered three different situations: two with an ontology (crisp or fuzzy) and one without any ontology. The empirical results of using the proposed query expansion algorithm showed that the average precision with the crisp ontology increased by 5% compared with using no ontology for expansion, and the average precision with the fuzzy ontology increased by 3% compared with crisp ontology query expansion. Overall, the average precision with the fuzzy ontology increased by 8% compared with using no ontology for expansion. These results show that the quality of the information retrieval system improves when the fuzzy ontology query expansion method is used.

In the future, we will implement fuzzy theory and neural network methods to build fuzzy ontology from unstructured data automatically. Also, we will compare the results with the proposed approach in this paper.

Acknowledgment

This research has been partially supported by Iran Telecommunication Research Center (Contract no. 20127/500), and the authors appreciate its supportive role gratefully.