Abstract
A security ontology can be used to build a shared knowledge model for an application domain to overcome the data heterogeneity issue, but security ontologies suffer from their own heterogeneity issue. Finding identical entities in two ontologies, i.e., ontology alignment, is a solution. It is important to select an effective similarity measure (SM) to distinguish heterogeneous entities. However, due to the complex semantic relationships among concepts, no SM is guaranteed to be effective in all alignment tasks. How SMs are aggregated, so that their advantages and disadvantages complement each other, directly affects the quality of alignments. In this work, we formally define this problem, discuss its challenges, and present a problem-specific genetic algorithm (GA) to effectively address it. We experimentally test our approach on bibliographic tracks provided by OAEI and five pairs of security ontologies. The results show that GA can effectively address different heterogeneous ontology-alignment tasks and determine high-quality security ontology alignments.
1. Introduction
A security ontology builds a shared knowledge model for an information system’s security area to facilitate the establishment of trust relationships [1]. Figure 1 shows an example of a security ontology. An oval denotes a concept, such as SecurityProtocol or ProtocolEncryption. The arrow between two concepts denotes a subsumptive relationship; for example, ProtocolSignature is subsumed by SecurityProtocol. A concept might have properties, such as the XACML and ACL properties of ProtocolAccessControl. However, security ontologies are built for different application requirements and with different interests, which causes the ontologies themselves to suffer from the heterogeneity problem. Finding identical entities in two security ontologies, i.e., security ontology alignment, is a solution to this issue [2, 3]. It is important to use a similarity measure (SM) to distinguish heterogeneous entities when aligning security ontologies. However, due to the complex semantic relationships among concepts, no SM is effective in all contexts. Hence, it is important to aggregate SMs so that their advantages and disadvantages complement each other.

The most flexible way to aggregate SMs is the parallel framework, which assigns a weight to each SM to obtain the final alignment. During this procedure, each SM’s similarity matrix is calculated, whose rows and columns, respectively, represent the two ontologies’ entities and whose elements are their similarity values. The aggregated matrix is determined by combining all the matrices with the weighted mean strategy. A threshold is used to filter out elements with low similarity values to obtain the final matrix, which is decoded to the ontology alignment. Determining the optimal aggregating weight set for the SMs is a complex problem since there are many local optimal solutions. Genetic algorithm (GA) [4, 5] is a classic global optimization algorithm, which is adept at solving optimization problems without gradient information about the objective. Inspired by its success in complex optimization domains [6, 7], we build a mathematical model under a parallel aggregating framework to define the security ontology alignment problem, propose a problem-specific GA to address it, and determine high-quality security ontology alignments.
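The parallel framework described above can be illustrated with a short Python sketch. It is only a minimal illustration under stated assumptions: the function names `aggregate` and `extract_alignment`, the toy matrices, and the choice to keep every cell that reaches the threshold (rather than a specific decoding strategy) are hypothetical and not taken from the paper.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def aggregate(matrices: List[Matrix], weights: List[float]) -> Matrix:
    """Weighted mean of equally shaped similarity matrices (one weight per SM)."""
    if len(matrices) != len(weights):
        raise ValueError("one weight per similarity matrix is required")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    rows, cols = len(matrices[0]), len(matrices[0][0])
    return [
        [sum(w * m[r][c] for w, m in zip(weights, matrices)) for c in range(cols)]
        for r in range(rows)
    ]


def extract_alignment(matrix: Matrix, threshold: float) -> List[Tuple[int, int, float]]:
    """Keep (row, column, similarity) triples whose similarity reaches the threshold.

    Assumption: every surviving cell becomes a mapping; a one-to-one
    decoding step could be added on top of this.
    """
    return [
        (r, c, value)
        for r, row in enumerate(matrix)
        for c, value in enumerate(row)
        if value >= threshold
    ]


# Toy similarity matrices for two ontologies with two entities each
# (illustrative values only).
syntactic = [[1.0, 0.2], [0.1, 0.8]]
linguistic = [[1.0, 0.0], [0.0, 1.0]]
taxonomy = [[0.5, 0.5], [0.5, 1.0]]

aggregated = aggregate([syntactic, linguistic, taxonomy], [0.5, 0.25, 0.25])
alignment = extract_alignment(aggregated, threshold=0.85)
```

With these toy values, only the two diagonal cells survive the 0.85 threshold, so the alignment pairs entity 0 with entity 0 and entity 1 with entity 1.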
The remainder of this paper is arranged as follows. Section “Preliminaries” defines the security ontology alignment and similarity measure. Section “Genetic Algorithm to Integrate Security Ontologies” describes the GA-based alignment technique. Experimental results are discussed in section “Experiment,” and section “Conclusions” presents our conclusions.
2. Preliminaries
2.1. Security Ontology Alignment
A security ontology consists of concepts, properties, and axioms, and an ontology alignment is a mapping set. A mapping is a 3-tuple $(e, e', n)$, where $e$ and $e'$ are two ontologies’ entities, and $n$ is their similarity [8, 9]. Aligning ontologies requires us to find the correspondences between the entities of two ontologies to bridge their semantic gap. As shown in Figure 2, the input of ontology alignment is a pair of ontologies. After using different SMs to determine the corresponding similarity matrices, GA is used to optimize their aggregating weights to obtain the final alignment.

A security ontology alignment’s quality can be measured with metrics from the information retrieval domain [10]:
$$\text{recall} = \frac{|R \cap A|}{|R|}, \qquad \text{precision} = \frac{|R \cap A|}{|A|}, \qquad \text{f-measure} = \frac{2 \times \text{recall} \times \text{precision}}{\text{recall} + \text{precision}} \tag{1}$$
where $A$ and $R$ are, respectively, an alignment and the reference alignment, and $|\cdot|$ denotes a set’s cardinality. Here, f-measure is the harmonic mean of recall and precision. On this basis, the security ontology alignment problem has the objective of maximizing the f-measure, and the decision variable is $W = (w_1, w_2, \ldots, w_n)^T$, where $w_i \in [0, 1]$ is the $i$th SM’s aggregating weight and $\sum_{i=1}^{n} w_i = 1$. In this work, we choose the weighted average strategy to aggregate the SMs, which is the most popular and flexible method for combining SMs in the domain of information fusion. Other aggregating mechanisms, such as those from the fields of evidential reasoning and fuzzy reasoning, could also be applied, which we leave for future work.
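The three metrics can be computed directly from set operations, as the following Python sketch suggests. The function name `evaluate` and the example entity pairs are hypothetical; mappings are represented here as plain entity-name pairs, ignoring the similarity value.

```python
from typing import Set, Tuple

Mapping = Tuple[str, str]


def evaluate(alignment: Set[Mapping], reference: Set[Mapping]) -> Tuple[float, float, float]:
    """Return (recall, precision, f-measure) of an alignment against a reference."""
    correct = len(alignment & reference)
    recall = correct / len(reference) if reference else 0.0
    precision = correct / len(alignment) if alignment else 0.0
    if recall + precision == 0:
        return recall, precision, 0.0
    # The f-measure is the harmonic mean of recall and precision.
    f_measure = 2 * recall * precision / (recall + precision)
    return recall, precision, f_measure


# Illustrative data: 3 mappings found, 4 in the reference, 2 in common.
found = {
    ("Protocol", "SecurityProtocol"),
    ("ACL", "AccessControlList"),
    ("Key", "Attack"),
}
reference = {
    ("Protocol", "SecurityProtocol"),
    ("ACL", "AccessControlList"),
    ("Signature", "ProtocolSignature"),
    ("Cipher", "ProtocolEncryption"),
}
recall, precision, f_measure = evaluate(found, reference)
```

Here recall is 2/4, precision is 2/3, and the f-measure is 4/7, which lies between the two, closer to the smaller value.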
2.2. Similarity Measure
SMs can generally be categorized as syntactic, linguistic, or taxonomy SMs [11, 12], which we describe as follows.
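Syntactic SM calculates the similarity of two strings through their edit distance. We use the Levenshtein distance [13]:
$$sim_{syn}(s_1, s_2) = \max\left(0, \frac{\min(|s_1|, |s_2|) - d(s_1, s_2)}{\min(|s_1|, |s_2|)}\right) \tag{2}$$
where $|s_1|$ and $|s_2|$ are the respective character numbers of strings $s_1$ and $s_2$ and $d(s_1, s_2)$ is their edit distance.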
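A possible Python sketch of an edit-distance-based similarity is given below. The normalization used here (dividing by the length of the shorter string and clipping at zero) is one common choice and should be read as an assumption; the function names are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution or match
                )
            )
        previous = current
    return previous[-1]


def syntactic_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; normalization by the shorter length is an assumption."""
    shorter = min(len(a), len(b))
    if shorter == 0:
        return 1.0 if a == b else 0.0
    return max(0.0, (shorter - edit_distance(a, b)) / shorter)


similarity = syntactic_similarity("kitten", "sitting")
```

For "kitten" and "sitting", the edit distance is 3 and the shorter string has 6 characters, so this sketch yields a similarity of 0.5.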
Linguistic SM utilizes an electronic dictionary to measure the similarity of two words. We use WordNet [14, 15] as the electronic knowledge base. Linguistic similarity is defined over two words $w_1$ and $w_2$ derived from two entities, using the number of meanings that WordNet records for a word.
Taxonomy SM uses the context of concepts $c_1$ and $c_2$ to determine their similarity [16, 17]. Let $p_1$ and $p_2$ be the superclasses of $c_1$ and $c_2$, respectively, and let $s_{1i}$ and $s_{2j}$ be, respectively, their $i$th and $j$th direct subclasses. The taxonomy SM determines the similarity value by calculating the average similarity of the two concepts’ parent pair and all their direct subclass pairs.
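The averaging described above can be sketched in Python as follows. This is an illustration only: the function name `taxonomy_similarity`, the exact-name base similarity, and the equal weighting of the parent pair and each subclass pair are assumptions, and concepts are represented by their names.

```python
from typing import Callable, Sequence


def taxonomy_similarity(
    parent_1: str,
    parent_2: str,
    subclasses_1: Sequence[str],
    subclasses_2: Sequence[str],
    base_similarity: Callable[[str, str], float],
) -> float:
    """Average the similarity of the parent pair and of all direct subclass pairs."""
    scores = [base_similarity(parent_1, parent_2)]
    scores.extend(
        base_similarity(s1, s2) for s1 in subclasses_1 for s2 in subclasses_2
    )
    return sum(scores) / len(scores)


def exact_match(a: str, b: str) -> float:
    """Hypothetical base similarity: 1 for identical names, 0 otherwise."""
    return 1.0 if a == b else 0.0


score = taxonomy_similarity(
    "SecurityProtocol",
    "SecurityProtocol",
    ["ProtocolSignature", "ProtocolEncryption"],
    ["ProtocolSignature"],
    exact_match,
)
```

In this toy case there is one parent pair (similarity 1) and two subclass pairs (similarities 1 and 0), so the average is 2/3. Any other SM could be passed in place of `exact_match`.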
3. Genetic Algorithm to Integrate Security Ontologies
3.1. Encoding Mechanism
In this work, we use binary coding [18] to reduce the evolutionary operation’s computational complexity. Because the coding information must contain the weight set of the SMs, we store the weights indirectly by storing cutting points in the coding information. Given $n$ SMs, we sort a set of $n-1$ cutting points, each in $[0, 1]$, in ascending order as $c'_1 \le c'_2 \le \cdots \le c'_{n-1}$, and then we can get the corresponding weight set:
$$w_k = \begin{cases} c'_1, & k = 1 \\ c'_k - c'_{k-1}, & 1 < k < n \\ 1 - c'_{n-1}, & k = n \end{cases} \tag{5}$$
In this way, the cutting points yield aggregating weights that are nonnegative and sum to 1. This work selects three SMs, so we need to encode the information of two cutting points. We use 10 gene bits to represent a cutting point; hence, the length of a chromosome is 20 gene bits. Figure 3 shows an example of the encoding mechanism, where two cutting points represent the aggregating weights of the SMs and, in the figure’s example, five gene bits are used to encode each cutting point. As shown in the figure, a chromosome is decoded to decimal to obtain the cutting point set $\{c_1, c_2\}$, which is sorted to obtain the cutting point set $\{c'_1, c'_2\}$. Then, weights $w_1$, $w_2$, and $w_3$ are calculated according to formula (5).
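The decoding from gene bits to aggregating weights can be sketched in Python as below. The function name `decode_weights` is hypothetical, and mapping each gene to $[0, 1]$ by dividing its integer value by $2^k - 1$ (for $k$ bits) is an assumption, since the exact scaling is not specified here.

```python
from typing import List


def decode_weights(chromosome: List[int], bits_per_cut: int) -> List[float]:
    """Decode a binary chromosome into aggregating weights via sorted cutting points."""
    if len(chromosome) % bits_per_cut != 0:
        raise ValueError("chromosome length must be a multiple of bits_per_cut")
    max_value = 2 ** bits_per_cut - 1  # assumed scaling to [0, 1]
    cuts = []
    for start in range(0, len(chromosome), bits_per_cut):
        gene = chromosome[start:start + bits_per_cut]
        value = int("".join(str(bit) for bit in gene), 2)
        cuts.append(value / max_value)
    cuts.sort()
    # Consecutive differences between 0, the sorted cuts, and 1 give the weights.
    bounds = [0.0] + cuts + [1.0]
    return [upper - lower for lower, upper in zip(bounds, bounds[1:])]


# Two cutting points with five bits each: 01000 -> 8/31 and 11000 -> 24/31.
weights = decode_weights([0, 1, 0, 0, 0, 1, 1, 0, 0, 0], bits_per_cut=5)
```

For this chromosome the weights are 8/31, 16/31, and 7/31. Because the weights are differences of sorted points in $[0, 1]$, they are nonnegative and sum to 1 for any chromosome, so crossover and mutation never produce an infeasible weight set.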

3.2. Selection
The selection operator is the kernel component of GA, which decides whether a solution’s gene information can persist. A solution with a higher fitness value should have a greater probability of selection, but one with a lower fitness value should also have a certain opportunity. This work empirically chooses the classic roulette selection operator. The probability of selecting an individual is the ratio of its fitness value to the sum of the fitness values of all solutions; hence, each individual has the opportunity to be selected. If the $i$th solution has fitness value $f_i$, its selection probability is $p_i = f_i / \sum_{j} f_j$.
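A minimal Python sketch of roulette selection follows. The function names are hypothetical, fitness values are assumed to be nonnegative (as the f-measure is), and falling back to a uniform choice when all fitness values are zero is an assumption of this sketch.

```python
import random
from typing import List, Sequence


def selection_probabilities(fitness: Sequence[float]) -> List[float]:
    """Each individual's probability is its share of the total fitness."""
    total = sum(fitness)
    if total <= 0:
        # Assumed fallback: uniform probabilities when every fitness is zero.
        return [1.0 / len(fitness)] * len(fitness)
    return [value / total for value in fitness]


def roulette_select(fitness: Sequence[float], rng: random.Random) -> int:
    """Spin the wheel once and return the index of the selected individual."""
    spin = rng.random()
    cumulative = 0.0
    for index, probability in enumerate(selection_probabilities(fitness)):
        cumulative += probability
        if spin < cumulative:
            return index
    return len(fitness) - 1  # guards against floating-point round-off


rng = random.Random(42)
fitness_values = [0.5, 1.0, 0.5]
probabilities = selection_probabilities(fitness_values)
counts = [0, 0, 0]
for _ in range(10000):
    counts[roulette_select(fitness_values, rng)] += 1
```

Here the probabilities are 0.25, 0.5, and 0.25, so over many spins the second individual is chosen about twice as often as either of the others, while the weaker individuals still get selected.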
3.3. Crossover
The crossover operator mixes the genes of two parent solutions according to a crossover probability. We randomly select a cutting point using the single-point crossover operator [19], and two children are generated by swapping the right parts of two parents’ genes.
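Single-point crossover on binary chromosomes can be sketched in Python as follows. The function name and the convention of returning unchanged copies when crossover is not triggered are assumptions of this sketch.

```python
import random
from typing import List, Tuple


def single_point_crossover(
    parent_a: List[int],
    parent_b: List[int],
    crossover_rate: float,
    rng: random.Random,
) -> Tuple[List[int], List[int]]:
    """Swap the right parts of two parents at a random cut with a given probability."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same length")
    if rng.random() >= crossover_rate:
        return parent_a[:], parent_b[:]  # no crossover: children copy the parents
    point = rng.randint(1, len(parent_a) - 1)  # cut strictly inside the chromosome
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b


rng = random.Random(7)
zeros, ones = [0] * 20, [1] * 20
child_a, child_b = single_point_crossover(zeros, ones, crossover_rate=1.0, rng=rng)
```

With an all-zero and an all-one parent, the first child is a run of zeros followed by a run of ones, and the second child is its bitwise complement, which makes the swap easy to inspect.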
3.4. Mutation
The mutation operator aims to maintain population diversity, which is critical to the algorithm's searching ability. This work selects the locus mutation operator [20], which judges whether a gene value should be flipped by generating a random number in [0, 1] and comparing it with the mutation probability.
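The per-gene flip decision can be sketched in Python as below; the function name `mutate` is hypothetical.

```python
import random
from typing import List


def mutate(chromosome: List[int], mutation_rate: float, rng: random.Random) -> List[int]:
    """Flip each gene independently when a random draw falls below the mutation rate."""
    return [
        1 - bit if rng.random() < mutation_rate else bit
        for bit in chromosome
    ]


rng = random.Random(3)
original = [0, 1, 0, 1, 1, 0, 0, 1, 0, 1]
mutated = mutate(original, mutation_rate=0.02, rng=rng)
```

A small rate such as 0.02 flips few genes per chromosome on average, which perturbs the encoded cutting points occasionally without destroying good solutions.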
3.5. Pseudocode of Genetic Algorithm
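Given the maximum generation MaxGen, a population P of N individuals, and a chromosome length L, we present the GA pseudocode:
Initialization
  for i = 0; i < N; i++ do
    for j = 0; j < L; j++ do
      P[i][j] = random(0, 1);
    end for
  end for
Evaluation
  for i = 0; i < N; i++ do
    evaluation(P[i]);
  end for
Evolution
  gen = 0;
  while gen < MaxGen do
    crossover(P);
    mutation(P);
    for i = 0; i < N; i++ do
      evaluation(P[i]);
    end for
    selection(P);
    saveElite(P);
    gen = gen + 1;
  end while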
The gene values of each individual are initialized as 1 or 0, and then the population’s solutions are evaluated. In each generation, the crossover and mutation operators are successively applied, and all solutions are re-evaluated. The selection operator is then used to determine the population of the next generation. Finally, the worst solution is replaced by the best one in the history (i.e., the elite solution).
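The overall loop can be sketched in Python as follows. This is a compact illustration under stated assumptions rather than the exact implementation: the function name `run_ga`, the pairing of neighbouring individuals for crossover, and the toy fitness function (the fraction of 1 bits, standing in for the f-measure of the decoded alignment) are all hypothetical.

```python
import random
from typing import Callable, List

Chromosome = List[int]


def run_ga(
    fitness: Callable[[Chromosome], float],
    length: int,
    pop_size: int,
    max_gen: int,
    crossover_rate: float,
    mutation_rate: float,
    rng: random.Random,
) -> Chromosome:
    """Binary GA with roulette selection and elitism; fitness must be nonnegative."""
    # Initialization and first evaluation.
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    scores = [fitness(ind) for ind in population]
    best = max(range(pop_size), key=scores.__getitem__)
    elite, elite_score = population[best][:], scores[best]

    for _ in range(max_gen):
        # Single-point crossover on neighbouring pairs (pairing is an assumption).
        for i in range(0, pop_size - 1, 2):
            if rng.random() < crossover_rate:
                point = rng.randint(1, length - 1)
                a, b = population[i], population[i + 1]
                population[i] = a[:point] + b[point:]
                population[i + 1] = b[:point] + a[point:]
        # Bit-flip mutation.
        population = [
            [1 - g if rng.random() < mutation_rate else g for g in ind]
            for ind in population
        ]
        # Re-evaluation and update of the best solution seen so far.
        scores = [fitness(ind) for ind in population]
        best = max(range(pop_size), key=scores.__getitem__)
        if scores[best] > elite_score:
            elite, elite_score = population[best][:], scores[best]
        # Roulette selection of the next generation.
        if sum(scores) > 0:
            chosen = rng.choices(range(pop_size), weights=scores, k=pop_size)
            population = [population[i][:] for i in chosen]
            scores = [scores[i] for i in chosen]
        # Elitism: the worst individual is replaced by the elite solution.
        worst = min(range(pop_size), key=scores.__getitem__)
        population[worst], scores[worst] = elite[:], elite_score
    return elite


def toy_fitness(chromosome: Chromosome) -> float:
    """Stand-in objective: fraction of 1 bits (not the f-measure)."""
    return sum(chromosome) / len(chromosome)


best_solution = run_ga(
    toy_fitness, length=20, pop_size=20, max_gen=100,
    crossover_rate=0.6, mutation_rate=0.02, rng=random.Random(0),
)
```

In an alignment setting, `toy_fitness` would be replaced by a function that decodes the chromosome into weights, aggregates the similarity matrices, applies the threshold, and returns the f-measure of the resulting alignment. Because the elite solution is only ever replaced by a strictly better one, the returned fitness never decreases across generations.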
3.6. Experiment
We utilized the Bibliographic track from OAEI (http://oaei.ontologymatching.org) to test the performance of our proposal. In particular, 1XX and 2XX are the respective testing cases with IDs beginning with 1 and 2. In 1XX, two ontologies under alignment are exactly the same except for different OWL restrictions, while in 2XX, they are heterogeneous in terms of the entity name and/or the concept’s hierarchical structure. We also chose five pairs of specialized security ontologies in four categories for testing: (1) Network Security Ontologies: Network Attack Ontology (NAO) [21] and Ontology-Based Attack Model (NAM) [22]; (2) Security Requirement-Related Ontologies: Security and Domain Ontology for Security Requirement Analysis (SDOSRA) [23] and Extended Ontology for Security Requirements (EOSR) [24]; (3) Miscellaneous Security Ontologies: Ontological approach toward Cyber Security in Cloud Computing (OCSCC) [25] and Ontology in Cloud Computing (OCC) [26]; (4) Application-Based Security Ontologies: Security Ontology for Mobile Applications (SOMA) [27] and Security Ontology for Mobile Agents Protection (SOMAP) [28], and Cloud Security Policy (CSP) [29] and Cloud Ontology (CO) [30]. The threshold for filtering the final alignment was set as 0.85, and the configuration of GA was empirically set to a maximum of 3000 generations, crossover rate 0.6, and mutation rate 0.02. In the experiment, we compared our approach with OAEI’s participants. Table 1 compares the results in terms of recall and precision, and Figure 4 compares the f-measures. Table 2 shows the results of using GA to align the security ontologies. The results of our approach were the mean values of 30 independent runs.

As shown in Table 1, the recall and precision of our approach were generally higher than those of OAEI’s participants. This is because GA is able to effectively escape many local optima and find the optimal aggregating weights in a large space of feasible solutions. In particular, the precision of our approach was high, which shows that aggregating different similarity measures can effectively distinguish heterogeneous entities.
As can be seen from Figure 4, the results of our approach were the best on 1XX testing cases, which shows that GA can effectively align two ontologies with the same entities and structures. In addition, with respect to different heterogeneous tasks on 2XX testing cases, our approach was also effective, which shows that our approach is able to address the matching problem with different heterogeneity characteristics.
Table 2 presents the results of aligning the five pairs of real security ontologies, which show that our approach achieves a high f-measure on all testing cases. To sum up, our approach was robust in addressing different alignment tasks and could determine high-quality security ontology alignments.
4. Conclusions
To ensure communication and cooperation among different security applications built on security ontologies, we proposed a GA-based ontology alignment technique to address the security ontology heterogeneity problem. We defined the problem, discussed its challenges, and presented a problem-specific GA to effectively address it. Bibliographic tracks provided by OAEI and five pairs of security ontologies were used to test our approach’s performance. The experimental results show that our approach is able to align different heterogeneous ontologies and determine high-quality security ontology alignments.
In the future, we are interested in adaptive similarity selection, which determines effective and nonconflicting similarity measures according to the heterogeneous features of two ontologies under alignment. Moreover, when the number of similarity measures is large, strategies for improving GA’s efficiency should be introduced.
Data Availability
The data used to support this study can be found in the corresponding footnotes.
Conflicts of Interest
The authors declare that they have no conflicts of interest in the work.
Acknowledgments
The authors thank LetPub (https://www.letpub.com) for its linguistic assistance during the preparation of this manuscript. This work was supported by the Natural Science Foundation of Fujian Province, China (grant no. 2019J01889); the “Tiancheng Huizhi” Innovation and Education Promotion Fund, China (grant no. 2018A02005); and the Education-Scientific Research Project for Middle-Aged and Young of Fujian Province, China (grant no. JT180626).