Abstract

Nowadays, most real-world decision problems involve two or more incommensurable or conflicting objectives that must be optimized simultaneously; these are the so-called multiobjective optimization problems (MOPs). Usually, a decision maker (DM) prefers a single optimum solution in the Pareto front (PF), and the PF’s knee solution is the logical choice when there are no user-specific or problem-specific preferences. In this context, the biomedical ontology matching problem in the Semantic Web (SW) domain is investigated, whose solution helps integrate biomedical knowledge and facilitate translational discoveries. Since biomedical ontologies often contain a large number of concepts with rich semantic meanings, it is difficult to find a perfect alignment that meets all of the DM’s requirements, and the matching process usually needs to trade off two conflicting objectives, i.e., the alignment’s recall and precision. To this end, in this work, the biomedical ontology matching problem is first defined as a MOP, and then a compact multiobjective particle swarm optimization algorithm driven by knee solution (CMPSO-K) is proposed to address it. In particular, a compact evolutionary mechanism is proposed to efficiently optimize the alignment’s quality, and a max-min approach is used to determine the PF’s knee solution. In the experiment, three biomedical tracks provided by the Ontology Alignment Evaluation Initiative (OAEI) are used to test CMPSO-K’s performance. The comparisons with OAEI’s participants and a PSO-based matching technique show that CMPSO-K is both effective and efficient.

1. Introduction

Decision-making requires finding an optimum solution to a decision problem by identifying and evaluating alternatives. Nowadays, most real-world decision problems consist of two or more incommensurable objectives to be optimized simultaneously, the so-called multiobjective optimization problems (MOPs) [1]. In general, a MOP does not have one optimum solution but a set of solutions, the so-called Pareto optimal solutions, which are superior to the others in terms of one or more objectives. In many situations, a decision maker (DM) prefers only one optimum solution in the Pareto front (PF) [2], and therefore, the optimization and the decision process are often combined. One family of multiobjective methods is driven by the PF’s knee solution, i.e., the solution for which a small improvement in one objective results in a severe deterioration in another. Often, the PF’s knee solution is logically preferred by the DM if there are no user-specific or problem-specific preferences [3].

In this context, the biomedical ontology matching problem in the Semantic Web (SW) domain is investigated. Although biomedical ontologies are extensively used in the biomedical domain to model biomedical knowledge, they are developed and maintained by different communities, and the same biomedical knowledge could be described with different terminologies or in different contexts. To bridge the semantic gaps between two biomedical ontologies and support their communication, it is necessary to find their identical concept mappings, which is the so-called biomedical ontology matching. Since biomedical ontologies often contain a large number of concepts with rich semantic meanings, it is difficult to find a perfect alignment that meets all of the DM’s requirements, and the matching process usually needs to trade off two conflicting objectives, i.e., the alignment’s recall and precision. To this end, in this work, the biomedical ontology matching problem is defined as a MOP. Although there are many studies on single-objective approaches for addressing the ontology matching problem [4–8], the research on multiobjective ontology matching techniques is still in its infancy [9]. Inspired by the success of the particle swarm optimization algorithm (PSO) in the ontology matching domain [10], the large-scale discrete optimization domain [11–13], and biomedical engineering [14], in this work, a compact multiobjective particle swarm optimization algorithm driven by knee solution (CMPSO-K) is further proposed to address this problem. In particular, CMPSO-K uses a compact multiobjective evolutionary mechanism to efficiently optimize the alignment’s quality and a max-min approach to determine the PF’s knee solution.

The contributions made in this paper are as follows: (1) a discrete multiobjective optimal model for the biomedical ontology matching problem is constructed; (2) a hybrid biomedical concept similarity metric is proposed, which can effectively calculate the similarity value of two biomedical ontology concepts; and (3) CMPSO-K is proposed to optimize the biomedical ontology alignment, which takes into consideration both the DM’s preference and the algorithm’s performance.

The rest of the paper is organized as follows: Section 2 overviews the swarm intelligence algorithm- (SIA-) based ontology matching techniques; Section 3 defines the biomedical ontology matching problem and presents the similarity measure on biomedical concepts; Section 4 presents the CMPSO-K-based biomedical ontology matching technique in detail; Section 5 shows the experimental results; and finally, Section 6 draws the conclusion and presents the future work.

2. Swarm Intelligence Algorithm-Based Ontology Matching Technique

The first generation of swarm intelligence algorithm- (SIA-) based matchers is dedicated to optimizing the way in which the alignments of various ontology matchers are aggregated. The very first matching system is Genetics for Ontology ALignments (GOAL) [7], which uses an evolutionary algorithm (EA) to optimize the aggregating weight set of different ontology matchers. Later, Alexandru-Lucian and Iftene [15] further used EA to optimize one more parameter that filters the unauthentic concept mappings out of the final alignment. Acampora et al. [4] introduced a local search process into EA’s evolving process to improve the algorithm’s performance. Xue and Wang [16] used a new metric as the fitness function to guide the algorithm’s search direction. Their approach can address the holistic matching problem and determine a universal weight configuration for matching several pairs of ontologies at a time. He et al. [6] proposed to utilize the artificial bee colony algorithm (ABC) to optimize all the parameters in the matching process, and its results are better than those of the EA-based matchers. More recently, Xue et al. [17] proposed a new approach that uses NSGA-III [18] to combine various similarity measures without tuning the aggregating parameters. However, when the number of similarity measures becomes large, e.g., more than 50 similarity measures, this approach could be inefficient.

The above matchers need to calculate all the matchers’ alignments and store them in the main memory before aggregating them, which consumes a large amount of memory. The second generation of SIA-based matchers instead tries to directly find an entity correspondence set that is close to the golden alignment. GAOM (genetic algorithm-based ontology matching) [8] regards two ontologies as two discrete concept sets and employs EA to determine the optimal entity mapping set. Alves et al. [5] utilized a memetic algorithm (MA), which combines EA with a local search strategy, to execute the instance-based ontology matching process. They first matched the instances and then propagated the instance pairs’ similarity values to the corresponding concepts. MapPSO [10] uses PSO to determine the optimal entity correspondence set. In particular, MapPSO introduces a new quality measure on the ontology alignment, which depends on the statistical results on the alignment. More recently, Chu et al. [19] first modeled the ontologies in the vector space so that two entities’ similarity value can be calculated through the cosine function, and then EA was used to determine the optimal alignments. Xue [20] proposed a new similarity metric for measuring the biomedical concepts’ similarity value and then used the firefly algorithm (FA) to optimize the biomedical ontology alignment. However, this proposal suffers from premature convergence when matching large-scale ontologies. Our proposal also belongs to this category, but different from the existing work, the biomedical ontology matching problem is regarded as a discrete MOP, and a multiobjective SIA is presented to address it, which takes into consideration the DM’s preference.

3. Preliminaries

3.1. Biomedical Ontology Matching Problem

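Ontology matching aims at determining the identical entity mappings, which form the so-called ontology alignment. In the past, the ontology alignment’s quality was often measured by f-measure [21], which is defined as follows:

$$\text{recall} = \frac{|R \cap A|}{|R|},$$

$$\text{precision} = \frac{|R \cap A|}{|A|},$$

$$\text{f-measure} = \frac{\text{recall} \times \text{precision}}{\alpha \times \text{recall} + (1 - \alpha) \times \text{precision}},$$

where $R$ and $A$ are, respectively, the alignments determined by the domain expert and the ontology matcher. Recall and precision, respectively, measure $A$’s completeness and soundness, f-measure is their harmonic mean, and $\alpha \in [0, 1]$ is the weight to trade off recall and precision. However, f-measure requires domain experts to provide $R$, which limits its application in real practice. Supposing a golden biomedical ontology alignment’s cardinality is 1 : 1, the larger the cardinality of the concept mapping set and the higher the mean similarity value of the found correspondences, the better the alignment quality. Therefore, three approximate measures, i.e., recall’, precision’, and f-measure’, are used to, respectively, approximate the original recall, precision, and f-measure [22]:

$$\text{recall}' = \frac{|A|}{\min\{|O_1|, |O_2|\}},$$

$$\text{precision}' = \frac{\sum_{i=1}^{|A|} \delta_i}{|A|},$$

$$\text{f-measure}' = \frac{\text{recall}' \times \text{precision}'}{\alpha \times \text{recall}' + (1 - \alpha) \times \text{precision}'},$$

where $|O_1|$ and $|O_2|$ are the two ontologies’ concept scales, $|A|$ is $A$’s concept mapping number, and $\delta_i$ is the $i$-th concept mapping’s similarity value.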
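The approximate measures can be illustrated with a short Python sketch. It assumes that recall’ normalizes the alignment’s cardinality by the smaller ontology’s concept number (the upper bound of a 1 : 1 alignment), that precision’ is the mean similarity value of the found correspondences, and that f-measure’ is their weighted harmonic mean; the function name `approximate_measures` and the sample values are hypothetical.

```python
def approximate_measures(similarities, n_source, n_target, alpha=0.5):
    """Return (recall', precision', f-measure') for one candidate alignment.

    similarities: similarity values of the found correspondences.
    n_source, n_target: concept numbers of the two ontologies.
    alpha: weight in [0, 1] that trades off recall' and precision'.
    """
    if not similarities:
        return 0.0, 0.0, 0.0
    # Assumption: a 1:1 alignment holds at most min(n_source, n_target) mappings.
    recall = len(similarities) / min(n_source, n_target)
    # Mean similarity of the found correspondences.
    precision = sum(similarities) / len(similarities)
    denominator = alpha * recall + (1 - alpha) * precision
    f_measure = recall * precision / denominator if denominator > 0 else 0.0
    return recall, precision, f_measure


# Three correspondences found between ontologies with 4 and 6 concepts.
recall, precision, f_measure = approximate_measures(
    [0.9, 0.8, 1.0], n_source=4, n_target=6
)
```

No reference alignment is needed here, which is the practical point of the approximation: every quantity is available to the matcher at run time.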

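Finally, the discrete multiobjective optimal model of the biomedical ontology matching problem is defined as follows:

$$\begin{aligned} \max\ & F(X) = \left(f_1(X), f_2(X)\right) \\ \text{s.t.}\ & X = \left(x_1, x_2, \ldots, x_{|O_1|}\right)^T, \\ & x_i \in \{1, 2, \ldots, |O_2|\},\quad i = 1, 2, \ldots, |O_1|, \end{aligned}$$

where $|O_1|$ and $|O_2|$ are, respectively, the concept numbers of the two biomedical ontologies, $x_i = j$ means the $i$-th pair of concept correspondence, i.e., the $i$-th source concept is mapped to the $j$-th target concept, and the two objective functions $f_1$ and $f_2$ are to, respectively, maximize recall’ and precision’ of $X$’s corresponding alignment.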

3.2. Similarity Measure on Biomedical Concept

In this work, a hybrid similarity measure is used to enhance the confidence of the calculated similarity value, which takes into consideration a concept’s syntactic, linguistic, and context information. First, for each biomedical concept, the information (the label, comment, and property names) from the concept itself and its context concepts is collected into a separate profile. Then, the similarity value of two biomedical concepts is calculated by comparing their corresponding profiles, in which the similarity value is obtained by integrating the SMOA-based syntactic measure [23] and the Unified Medical Language System (UMLS)-based linguistic measure [24].
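To make the profile comparison concrete, the following Python sketch shows one possible way of scoring two profiles. It is only an illustration under several assumptions: `difflib.SequenceMatcher` stands in for the SMOA syntactic measure, the small `SYNONYMS` table stands in for a UMLS lookup, and the symmetric best-match aggregation in `profile_similarity` is a hypothetical choice rather than the measure used in this work.

```python
from difflib import SequenceMatcher

# Hypothetical synonym pairs standing in for a UMLS-based linguistic lookup.
SYNONYMS = {frozenset({"kidney", "renal"})}


def word_similarity(w1, w2):
    """Linguistic match first, otherwise fall back to a syntactic ratio."""
    w1, w2 = w1.lower(), w2.lower()
    if w1 == w2 or frozenset({w1, w2}) in SYNONYMS:
        return 1.0
    # Stand-in for SMOA: a generic string similarity in [0, 1].
    return SequenceMatcher(None, w1, w2).ratio()


def profile_similarity(profile1, profile2):
    """Average each word's best match in the other profile (assumed scheme)."""
    if not profile1 or not profile2:
        return 0.0
    best1 = sum(max(word_similarity(a, b) for b in profile2) for a in profile1)
    best2 = sum(max(word_similarity(a, b) for a in profile1) for b in profile2)
    return (best1 + best2) / (len(profile1) + len(profile2))


score = profile_similarity(["uterine", "gland"], ["Uterine", "Gland"])
```

Trying the linguistic lookup before the string comparison lets synonymous terms with unrelated spellings still receive a high similarity value.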

4. Compact Multiobjective Swarm Optimization Algorithm Driven by Knee Solution

This work proposes CMPSO-K for solving the biomedical ontology matching problem, which approximates the population-based PSO’s evolving process through a probability vector (PV) [25]. In the following, the objective decomposition approach, the encoding mechanism, the crossover operator, and the knee solution determination are presented in turn, and finally, the details of CMPSO-K are shown through the pseudo-code.

4.1. Objective Decomposition

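In this work, the weighted sum approach is used to transform a MOP into a set of subproblems, which are solved simultaneously:

$$\max\ g^j(X) = \sum_{i=1}^{m} \lambda_i^j f_i(X),\quad j = 1, 2, \ldots, N,$$

where $m$ is the number of objectives, $N$ is the number of decomposed problems, and $\lambda_i^j \geq 0$, $\sum_{i=1}^{m} \lambda_i^j = 1$. The two objectives of our problem are to maximize recall’ and precision’, and the subproblem’s objective is defined as follows:

$$\max\ g^j(X) = \lambda_1^j \times \text{recall}' + \lambda_2^j \times \text{precision}',$$

where $\lambda_1^j \geq 0$ and $\lambda_2^j \geq 0$, $\lambda_1^j + \lambda_2^j = 1$, $j = 1, 2, \ldots, N$.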
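A minimal Python sketch of the weighted sum scalarization follows. The three weight vectors in `WEIGHTS` are an assumed, evenly spread choice made for illustration, and the function name `weighted_sum` is hypothetical.

```python
def weighted_sum(objectives, weights):
    """Scalarize objective values with non-negative weights that sum to 1."""
    if len(objectives) != len(weights):
        raise ValueError("objectives and weights must have the same length")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    return sum(w * f for w, f in zip(weights, objectives))


# Assumed weight vectors: recall' only, an even trade-off, precision' only.
WEIGHTS = [(1.0, 0.0), (0.5, 0.5), (0.0, 1.0)]

# Score one alignment with recall' = 0.75 and precision' = 0.9 per subproblem.
scores = [weighted_sum((0.75, 0.9), w) for w in WEIGHTS]
```

Each weight vector defines one single-objective subproblem, so the same candidate alignment can rank differently depending on which subproblem evaluates it.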

4.2. Encoding Mechanism

In this work, Gray coding, a binary encoding mechanism, is used to encode an alignment. Figure 1 shows an example of the encoding mechanism, in which the source concept “uterine gland” with index 8 is mapped to the target concept “Uterine Gland” with index 6, whose Gray code is 110. In particular, the Gray code 000 means that a source concept is not mapped to any target concept.
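The encoding can be sketched in Python with the standard reflected binary Gray code. The helper names and the fixed width of 3 bits per source concept are assumptions made for illustration. Note that this standard mapping encodes index 6 as 101, so the bit pattern shown in the figure may follow a different indexing or bit convention.

```python
def to_gray(n):
    """Convert a non-negative integer to its reflected binary Gray code."""
    return n ^ (n >> 1)


def from_gray(g):
    """Invert to_gray by folding in successive right shifts."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def encode_alignment(targets, bits):
    """Concatenate fixed-width Gray codes, one per source concept.

    targets[i] is the target index mapped to source concept i; 0 means
    the source concept is unmapped (Gray code 000 for bits = 3).
    """
    return "".join(format(to_gray(t), f"0{bits}b") for t in targets)


def decode_alignment(code, bits):
    """Split the bit string into fixed-width Gray codes and decode each."""
    return [from_gray(int(code[i:i + bits], 2)) for i in range(0, len(code), bits)]


code = encode_alignment([6, 0, 3], bits=3)
decoded = decode_alignment(code, bits=3)
```

Adjacent integers differ in exactly one bit under Gray coding, so a single bit flip in an individual tends to move a mapping to a neighbouring target index rather than to a distant one.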

Moreover, a PV is utilized to characterize the population for solving one decomposed subproblem. Its element number is equal to that of an individual, and each element in the PV represents the probability that the corresponding bit is 1. Therefore, various individuals can be generated in binary code through a PV. Figure 2 shows an example of generating an individual through a PV: by comparing each of the PV’s elements with a random number, the new individual’s element value is determined. When all of the PV’s elements are close to 1 or 0, the algorithm tends to converge. At the end of each generation, the PV is updated according to the best individual by increasing (or decreasing) an element’s value if the corresponding element of that individual is 1 (or 0).
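The sampling and updating of a PV can be sketched as follows. The sketch assumes that each PV element is read as the probability that the bit is 1, which is the reading consistent with the update rule, and that elements are clamped to [0, 1]; the function names, the step length of 0.1, and the vector length are hypothetical.

```python
import random


def sample_individual(pv, rng):
    """Draw one binary individual; bit i is 1 with probability pv[i]."""
    return [1 if rng.random() < p else 0 for p in pv]


def update_pv(pv, best, step):
    """Shift each element toward the best individual's bit, clamped to [0, 1]."""
    updated = []
    for p, bit in zip(pv, best):
        p = p + step if bit == 1 else p - step
        updated.append(min(1.0, max(0.0, p)))
    return updated


rng = random.Random(0)           # fixed seed so the sketch is repeatable
pv = [0.5] * 6                   # maximum uncertainty about every bit
individual = sample_individual(pv, rng)
pv = update_pv(pv, individual, step=0.1)
```

Only the vector of probabilities is stored between generations, which is where the memory saving of a compact algorithm over a full population comes from.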

4.3. Crossover Operator

The crossover operator generates one child individual by mixing the information of two parent individuals, which is carried out according to the crossover probability. In this work, the uniform crossover operator is used, whose pseudo-code is shown in Algorithm 1. Given two parent solutions $ind_1$ and $ind_2$, each gene bit value of their offspring $ind_{new}$ is the same as the corresponding gene bit value of $ind_1$ (or $ind_2$) when a random number is larger than (or smaller than) the crossover rate.

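Algorithm 1: Uniform crossover operator.
Input: two individuals ind_1 and ind_2, crossover probability p_c
Output: a new individual ind_new
ind_new = ind_1;
for i = 1; i ≤ ind_new.length; i++ do
  if rand(0, 1) < p_c then
    ind_new[i] = ind_2[i];
  end if
end for
return ind_new;
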
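The uniform crossover described above can be sketched in Python as follows. The function name `uniform_crossover` and the injectable `rng` argument are illustrative choices; the logic copies the first parent and takes a bit from the second parent whenever a random number falls below the crossover probability.

```python
import random


def uniform_crossover(ind1, ind2, pc, rng=random):
    """Return a child that starts as ind1 and takes bits from ind2.

    Each bit is replaced by the corresponding bit of ind2 when a random
    number in [0, 1) is smaller than the crossover probability pc.
    """
    if len(ind1) != len(ind2):
        raise ValueError("parents must have the same length")
    child = list(ind1)
    for i in range(len(child)):
        if rng.random() < pc:
            child[i] = ind2[i]
    return child


rng = random.Random(42)
child = uniform_crossover([0, 0, 0, 0], [1, 1, 1, 1], pc=0.5, rng=rng)
```

With pc = 0 the child is a copy of the first parent, and with pc = 1 it is a copy of the second, so the crossover probability directly controls how strongly the second parent pulls the child.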
4.4. Determination of Knee Solution

Figure 3 shows an example of the knee solution and the user preferred region. As can be seen from the figure, the knee solution, a subproblem’s elite solution, and the PF form a user preferred region for that subproblem. In each generation, the newly generated solution tends to move toward that region to meet the DM’s requirement. Thus, the knee solution’s determination is of utmost importance for the algorithm’s search performance. In this work, a max-min approach is used to determine the knee solution among the solutions in the PF. In particular, suppose $X_1$ and $X_2$ are two solutions in the PF, and $f_1(X_i)$ and $f_2(X_i)$, $i = 1, 2$, are, respectively, their recall’ and precision’; the two solutions are then compared through the max-min criterion.

On this basis, the best solution can be selected from the PF solutions, and this procedure is shown in Algorithm 2.

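Algorithm 2: Determination of the knee solution.
Input: PF solution set PF
Output: a knee solution X_knee
v_best = 0;
for i = 1; i ≤ PF.length; i++ do
  v = max-min value of PF[i];
  if v > v_best then
    v_best = v;
    X_knee = PF[i];
  end if
end for
return X_knee;
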
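The selection of a knee solution can be illustrated with the Python sketch below. It rests on an assumed reading of the max-min criterion, namely that each PF solution is scored by its worse objective value and the solution with the largest such score is selected; the function names and the sample front are hypothetical.

```python
def max_min_value(solution):
    """Score a (recall', precision') pair by its worse objective (assumed)."""
    recall, precision = solution
    return min(recall, precision)


def knee_solution(front):
    """Return the front member whose worse objective is the largest."""
    if not front:
        raise ValueError("the Pareto front must not be empty")
    knee = front[0]
    best = max_min_value(knee)
    for solution in front[1:]:
        value = max_min_value(solution)
        if value > best:
            knee, best = solution, value
    return knee


# Hypothetical front of (recall', precision') pairs.
front = [(0.95, 0.40), (0.80, 0.75), (0.55, 0.92)]
knee = knee_solution(front)
```

Under this reading, solutions that are excellent in one objective but poor in the other score low, so the selected solution sits in the balanced middle of the front.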
4.5. Pseudo-Code of Compact Multiobjective Particle Swarm Optimization Algorithm Driven by Knee Solution

The pseudo-code of CMPSO-K is presented in Algorithm 3. CMPSO-K first divides the problem into three subproblems through the weighted sum approach of Section 4.1. First, three PVs ($PV_1$, $PV_2$, and $PV_3$) and three local best individuals are, respectively, initialized for the three subproblems, and then the knee solution (or global best individual) is initialized by first applying the nondominated sorting algorithm [26] to the population and then determining the knee solution in the resulting PF. In each generation, CMPSO-K tries to solve each subproblem by approximating PSO’s position updating strategy, i.e., it crosses over an individual with the local best individual and the global best individual to obtain a new individual and then uses the new one to update the local best individual and the PV. After solving each subproblem, the population of the current generation is formed by the 3 local best individuals and 6 individuals in total generated through $PV_1$, $PV_2$, and $PV_3$, respectively; the nondominated sorting algorithm is then used to determine its PF and the current generation’s knee solution $X'_{knee}$. By comparing $X'_{knee}$ with the historical knee solution $X_{knee}$ through the max-min approach, the latter can be updated. Finally, when the generation reaches the maximum generation $maxGen$, the algorithm terminates and returns $X_{knee}$.
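The step that extracts the PF from a small population can be sketched in Python as below. This is a simplified stand-in that keeps only the first nondominated front for two maximized objectives, not the full nondominated sorting procedure; the function names and the sample points are hypothetical.

```python
def dominates(a, b):
    """True if a is no worse than b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points):
    """Keep the points that no other point dominates (maximization)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (recall', precision') values of a small population.
population = [(0.9, 0.5), (0.7, 0.7), (0.6, 0.6), (0.5, 0.9), (0.9, 0.4)]
front = pareto_front(population)
```

The pairwise check is quadratic in the population size, which is affordable here because the population of a compact algorithm holds only a handful of individuals.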

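Algorithm 3: Compact multiobjective particle swarm optimization algorithm driven by knee solution.
Input: maximum generation maxGen, crossover rate p_c, step length st for updating PV
Output: knee solution X_knee
Initialization
initialize generation gen = 0;
initialize PV_1, PV_2, and PV_3 by setting all the elements inside as 0.5;
initialize three local best individuals lbest_1, lbest_2, and lbest_3;
generate individuals through PV_1, PV_2, and PV_3;
determine the PF of the population through nondominated sorting;
initialize the knee solution (or global best individual) X_knee;
Evolving Process
while gen < maxGen do
  Updating PV_1
  generate an individual ind through PV_1;
  ind_new = crossover(ind, lbest_1, p_c);
  ind_new = crossover(ind_new, X_knee, p_c);
  [winner, loser] = compete(ind_new, lbest_1);
  update PV_1 according to winner with step length st;
  if winner == ind_new then
    lbest_1 = ind_new;
  end if
  Updating PV_2
  generate an individual ind through PV_2;
  ind_new = crossover(ind, lbest_2, p_c);
  ind_new = crossover(ind_new, X_knee, p_c);
  [winner, loser] = compete(ind_new, lbest_2);
  update PV_2 according to winner with step length st;
  if winner == ind_new then
    lbest_2 = ind_new;
  end if
  Updating PV_3
  generate an individual ind through PV_3;
  ind_new = crossover(ind, lbest_3, p_c);
  ind_new = crossover(ind_new, X_knee, p_c);
  [winner, loser] = compete(ind_new, lbest_3);
  update PV_3 according to winner with step length st;
  if winner == ind_new then
    lbest_3 = ind_new;
  end if
  Updating X_knee
  generate 2 individuals through PV_1, PV_2, and PV_3, respectively;
  determine the PF of the current population through nondominated sorting;
  determine current generation’s knee solution X'_knee;
  compare X'_knee with X_knee through the max-min approach;
  if X'_knee is better than X_knee then
    X_knee = X'_knee;
  end if
  gen = gen + 1;
end while
return X_knee;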
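The per-subproblem update inside the loop can be sketched in Python on a toy problem. This is a simplified illustration under stated assumptions: the toy fitness counts 1 bits instead of evaluating an alignment, the local best also plays the role of the global best, and the PV moves toward the winner in every position; all names and parameter values are hypothetical.

```python
import random


def sample(pv, rng):
    """Draw a binary individual; bit i is 1 with probability pv[i]."""
    return [1 if rng.random() < p else 0 for p in pv]


def crossover(a, b, pc, rng):
    """Uniform crossover: take b's bit when the random draw is below pc."""
    return [y if rng.random() < pc else x for x, y in zip(a, b)]


def compact_step(pv, local_best, global_best, fitness, pc, step, rng):
    """One generation for a single subproblem; returns (pv, local_best)."""
    candidate = sample(pv, rng)
    candidate = crossover(candidate, local_best, pc, rng)
    candidate = crossover(candidate, global_best, pc, rng)
    # The better of the candidate and the local best becomes the winner.
    winner = candidate if fitness(candidate) > fitness(local_best) else local_best
    # Move every PV element toward the winner's bit and clamp to [0, 1].
    pv = [min(1.0, max(0.0, p + step if bit else p - step))
          for p, bit in zip(pv, winner)]
    return pv, winner


rng = random.Random(1)
fitness = sum                      # toy objective: number of 1 bits
pv = [0.5] * 8
local_best = sample(pv, rng)
history = [fitness(local_best)]
for _ in range(50):
    pv, local_best = compact_step(pv, local_best, local_best,
                                  fitness, 0.6, 0.05, rng)
    history.append(fitness(local_best))
```

Because the local best is replaced only by a strictly better candidate, its fitness never decreases from one generation to the next, while the PV gradually concentrates around it.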

5. Experiment

To test CMPSO-K’s performance, the experiment uses three biomedical tracks of the Ontology Alignment Evaluation Initiative (OAEI), i.e., the anatomy track, the large biomed track, and the disease and phenotype track. The testing cases in these tracks are all practical ontologies that are widely used in the biomedical domain and openly accessible, and they contain a large amount of overlapping information with different representations. OAEI provides the reference alignments for each track to test a matcher’s performance, and Table 1 briefly describes the ontologies in these tracks.

In terms of the alignment’s quality, the CMPSO-K-based matcher is compared with the EA-based matcher [8], the ABC-based matcher [6], the PSO-based matcher [10], and OAEI’s participants in Tables 2 and 3; in terms of the memory consumption and converging speed, the CMPSO-K-based matcher and the other SIA-based matchers are compared in Figures 4 and 5. All the SIAs’ results are the mean values of 30 independent executions. The configurations of EA, ABC, and PSO follow their original literature, and CMPSO-K’s configuration consists of the number of decomposed problems, the maximum generation, the crossover probability, and the step length for updating PV. This configuration is determined in an empirical way and represents a trade-off setting that obtains the highest average results on all testing cases.

First, Friedman’s test [32] is used to figure out whether the matchers’ results differ significantly, and then Holm’s test [33] is used to determine whether one matcher statistically outperforms the others. In Friedman’s test, the null hypothesis is that all the matchers are equivalent, and if the computed $\chi_r^2$ value is equal to or greater than the tabled critical chi-square value at the specified level of significance [34], this hypothesis can be rejected. In this work, a level of significance $\alpha = 0.05$ is chosen, and the critical value for 8 degrees of freedom (since there are 9 matchers), i.e., $\chi_{0.05}^2 = 15.507$, needs to be considered.

In Table 2, the computed $\chi_r^2 = 54.43$, which is greater than 15.507, and therefore, the null hypothesis can be rejected, and Holm’s test can be further carried out. As shown in Table 2, since the CMPSO-K-based matcher has the lowest rank value, it is set as the control matcher that will be compared with the others.
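The Friedman statistic itself can be computed with a few lines of Python. The sketch below uses the common mean-rank form of the statistic and a small hypothetical rank table with 3 matchers and 4 testing cases; the data are illustrative and unrelated to Table 2, and 5.991 is the tabled critical chi-square value for 2 degrees of freedom at the 0.05 level.

```python
def friedman_statistic(ranks):
    """Return (chi-square statistic, mean ranks).

    ranks[i][j] is the rank of matcher j on testing case i (1 = best).
    """
    n = len(ranks)       # number of testing cases
    k = len(ranks[0])    # number of matchers
    mean_ranks = [sum(row[j] for row in ranks) / n for j in range(k)]
    chi2 = 12 * n / (k * (k + 1)) * (
        sum(r * r for r in mean_ranks) - k * (k + 1) ** 2 / 4
    )
    return chi2, mean_ranks


# Hypothetical ranks of 3 matchers on 4 testing cases.
ranks = [[1, 2, 3], [1, 2, 3], [1, 3, 2], [1, 2, 3]]
chi2, mean_ranks = friedman_statistic(ranks)

# Tabled critical value for k - 1 = 2 degrees of freedom at alpha = 0.05.
reject_null = chi2 >= 5.991
```

With 9 matchers, the same comparison would use 8 degrees of freedom and the critical value 15.507.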

In Holm’s test, the $z$ value is the test statistic for comparing the $i$-th and $j$-th matchers, which is used to find the $p$-value, i.e., the corresponding probability from the table of the normal distribution. The $p$-value is then compared with $\alpha/(k - i)$, which is the appropriate level of significance in Holm’s step-down procedure for the $i$-th smallest $p$-value among $k$ matchers. According to Table 3, it is possible to state that our approach statistically outperforms the other biomedical ontology matchers on f-measure at the 5% significance level. Since the multiobjective evolving mechanism can better trade off the two objectives and the knee solution can effectively guide the algorithm’s search direction, CMPSO-K’s solutions are much better than those of the other SIAs.
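Holm’s step-down comparison against a control matcher can be sketched in Python as follows. The sketch assumes the usual rank-based statistic $z = (R_j - R_{control}) / \sqrt{k(k+1)/(6n)}$ and a two-sided $p$-value from the normal distribution; the function name `holm_test` and the mean ranks are hypothetical and unrelated to Table 3.

```python
import math


def holm_test(mean_ranks, control, n_cases, alpha=0.05):
    """Compare every matcher with the control using Holm's procedure.

    Returns tuples (matcher index, z, p-value, threshold, rejected),
    ordered by ascending p-value.
    """
    k = len(mean_ranks)
    standard_error = math.sqrt(k * (k + 1) / (6 * n_cases))
    results = []
    for j, rank in enumerate(mean_ranks):
        if j == control:
            continue
        z = (rank - mean_ranks[control]) / standard_error
        p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
        results.append((j, z, p_value))
    results.sort(key=lambda item: item[2])

    decisions = []
    rejecting = True
    for i, (j, z, p_value) in enumerate(results, start=1):
        threshold = alpha / (k - i)
        # Once one hypothesis is retained, all later ones are retained too.
        rejecting = rejecting and p_value <= threshold
        decisions.append((j, z, p_value, threshold, rejecting))
    return decisions


# Hypothetical mean ranks of 3 matchers over 4 testing cases; matcher 0 is the control.
decisions = holm_test([1.0, 2.25, 2.75], control=0, n_cases=4)
```

In this toy example, the matcher with the larger rank gap is declared different from the control while the closer one is not, which shows how the threshold loosens as the procedure steps down.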

Figures 4 and 5, respectively, compare CMPSO-K with other SIAs on the memory consumption and converging speed. As can be seen from the figures, CMPSO-K can significantly improve the converging speed and reduce the memory consumption, which shows the effectiveness of the compact encoding mechanism and the compact evolutionary operators. To sum up, CMPSO-K-based ontology matching technique can efficiently optimize the biomedical ontology alignments.

6. Conclusion and Future Work

To optimize the biomedical ontology alignment’s quality, in this paper, a discrete multiobjective optimal model is constructed, a hybrid similarity metric is presented to distinguish the heterogeneous biomedical concepts, and a CMPSO-K-based ontology matching technique is then proposed to address the problem. Compared with most existing SIA-based ontology matching techniques, CMPSO-K takes into consideration both the algorithm’s performance and the DM’s preference. Accordingly, three methods are proposed to achieve this goal: (1) the compact encoding mechanism for saving memory consumption and runtime; (2) the multiobjective decomposition and evolutionary mechanism for trading off different objectives; and (3) the max-min strategy for determining the knee solution and guiding the algorithm’s search direction. Our work presents a novel compact multiobjective evolutionary framework that can improve the efficiency of the current SIA-based ontology matching techniques.

In the experiment, the f-measure values obtained by CMPSO-K outperform those of all the other competitors, which shows that CMPSO-K can effectively optimize the ontology alignments. In particular, the quality of CMPSO-K’s alignments is better than that of EA, ABC, and PSO, which shows that CMPSO-K’s multiobjective evolutionary mechanism driven by the knee solution can effectively trade off the two objectives and find a better solution. Moreover, since the compact encoding mechanism uses PVs to represent the swarms and the compact evolutionary operators simplify the population-based PSO’s evolving process, CMPSO-K can significantly reduce the memory consumption and runtime. Since no single similarity measure can effectively distinguish all the heterogeneous concepts in every situation, it is necessary to aggregate several similarity measures to improve the result’s precision. We utilize a hybrid similarity measure that combines three kinds of similarity measures to calculate the entity similarity value, and therefore, CMPSO-K’s results are significantly higher than those of other systems that only take into consideration one or two categories of similarity measures, such as DOME, POMAP++, and LogMap. However, AML applies too many similarity measures, which leads to conflicting results and decreases its recall value. Thus, how many similarity measures should be selected and combined to ensure the quality of the alignment will be one of our future research directions. Last but not least, CMPSO-K takes the DM’s preference into consideration and utilizes the knee solution to guide the algorithm’s search direction, which can effectively trade off the two objectives and find better solutions.

To further improve the matching process’s efficiency, in the future, we will be interested in developing a biomedical ontology partitioning technique to split two ontologies into disjoint ontology segments so that the large-scale problem can be converted into several small-scale segment matching problems. After that, the parallel computation can also be further utilized to match the similar segments and improve the matching process’s performance. With respect to the similarity measure, we would like to develop an adaptive biomedical concept similarity measure framework, which can instantiate an effective similarity metric according to two biomedical ontologies’ heterogeneous characteristics.

Data Availability

The data used to support this study can be found in http://oaei.ontologymatching.org.

Conflicts of Interest

The authors declare that they have no conflicts of interest in the work.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (no. 61503082), the Natural Science Foundation of Fujian Province (no. 2016J05145), the Program for New Century Excellent Talents in Fujian Province University (no. GY-Z18155), the Program for Outstanding Young Scientific Researcher in Fujian Province University (no. GY-Z160149), and the Scientific Research Foundation of Fujian University of Technology (nos. GY-Z17162 and GY-Z15007).