Abstract

Since the Internet of Everything (IoE) makes all the connections that come online more relevant and valuable, these connections are subject to numerous security and privacy concerns. Cybersecurity Ontology (CSO) is a shared knowledge model for tackling the security information heterogeneity issue on IoE and has been widely used in the IoE domain. However, the existing CSOs are developed and maintained independently, yielding the CSO heterogeneity problem. To address this issue, we need to use a similarity measure (SM) to calculate the similarity value of two entities from two CSOs and, on this basis, determine the entity correspondences, i.e., the CSO alignment. Usually, it is necessary to integrate various SMs to enhance the result’s correctness, but how to combine and tune these SMs to improve the alignment’s quality is still a challenge. To face this challenge, this work first models the CSO matching problem as a Constrained Multiobjective Optimization Problem (CMOOP) and then proposes a Coevolutionary Multiobjective Evolutionary Algorithm (CE-MOEA) to effectively address it. In particular, CE-MOEA uses the multiobjective evolutionary paradigm to avoid a biased improvement of the solutions and introduces the coevolutionary mechanism to trade off the Pareto Front’s (PF’s) diversity and convergence. The experiment uses the Ontology Alignment Evaluation Initiative’s (OAEI’s) bibliographic track and conference track and five real CSO matching tasks to test CE-MOEA’s performance. Comparisons with OAEI’s participants and with EA- and MOEA-based matching techniques show that CE-MOEA is able to effectively address various heterogeneous ontology matching problems and determine high-quality CSO alignments.

1. Introduction

The Internet of Everything (IoE) is a technological advancement that represents an interconnected network of people, processes, data, and things. Since IoE makes all the connections that come online more relevant and valuable, these connections are subject to numerous security and privacy concerns [1]. Cybersecurity Ontology (CSO) is the shared knowledge model for standardizing the security terminologies, setting up the relationships among them, and eliminating semantic differences between different security policies on IoE [2]. Figure 1 shows a fragment of a CSO, where an oval node denotes a concept, such as the concepts “SecurityPolicy” and “SecurityObject”; an edge connecting two nodes represents the relationship between two concepts, e.g., the concept “SecurityToken” is subsumed by the concept “SecurityAssertion”; and a concept might have properties, e.g., the concept “AlternativeType” owns the properties “Capability” and “Requirement.”

However, the existing CSOs are developed and maintained independently, yielding the CSO heterogeneity problem. Finding the semantically equivalent entity pairs in two security ontologies, i.e., CSO matching, is an effective solution to this issue. When matching two CSOs, it is necessary to use a similarity measure (SM) to calculate two entities’ similarity value. However, no single SM is effective in all contexts, and we usually need to aggregate several SMs to improve the results’ confidence. In recent years, Evolutionary Algorithm (EA) [3] has become a popular method for optimizing the SMs’ aggregating weights [4, 5], with the goal of maximizing the alignment’s f-measure [6]. According to Xue et al. [7], the single-objective EA tends to improve the solution’s quality by improving either recall (which measures the alignment’s completeness) or precision (which measures the alignment’s correctness) while sacrificing the other, yielding a biased improvement of the solution. To improve the ontology alignment’s quality, this work makes the following contributions: (1) A constrained multiobjective optimization model for the CSO matching problem is constructed, which simultaneously optimizes the alignment’s completeness and correctness. (2) A Coevolutionary Multiobjective Evolutionary Algorithm (CE-MOEA) is proposed to determine the solutions that represent the trade-offs between the alignment’s completeness and correctness. In particular, CE-MOEA uses a new coevolutionary framework that solves the Constrained Multiobjective Optimization Problem (CMOOP) with the assistance of a helper problem. The helper problem is a simpler version of the original problem, and the two problems are separately addressed by the same multiobjective optimizer. CE-MOEA is characterized by the weak cooperation between two populations, which can be more effective than the strong cooperation used in existing MOEAs for solving CMOOP [8].

The rest of this paper is organized as follows: after surveying EA-based ontology matching techniques (Section 2), the definition of the CSO matching problem is given (Section 3), and the problem-specific CE-MOEA for addressing this problem is presented (Section 4), followed by the experiment and the corresponding analysis (Section 5). Finally, the conclusion is drawn and future work is presented (Section 6).

2.1. Evolutionary Algorithm Based Ontology Matching Technique

How to combine and tune different similarity measures to improve the ontology alignment’s quality is a challenging problem [9], and EA is a state-of-the-art methodology to face it [10]. Martinez et al. [11] first propose to improve ontology alignment through EA. They are dedicated to finding a suitable weight set for aggregating three kinds of similarity measures in parallel. After that, Ginsca et al. [12] and Naya et al. [13] further optimize another parameter of the matching process, i.e., the threshold for determining the final alignment. The above three works, which aim to maximize the alignment’s quality, suffer from two drawbacks: (1) a reference alignment should be provided in advance to evaluate the alignment’s quality, but it is not always available in a practical matching task; (2) the biased improvement of the solutions caused by f-measure has a negative impact on the results. To overcome these issues, Xue et al. [14] propose approximate metrics for evaluating the alignment’s quality and introduce the Unanimous Improvement Ratio (UIR) to ensure the solutions’ unanimous improvement during the algorithm’s search process. Their work is able to match more than one pair of heterogeneous ontologies and find the uniform aggregating weights. Later on, Lv et al. [15] not only use the approximate metrics to evaluate the solutions, but also introduce an adaptive selection pressure to improve the algorithm’s efficiency. Moreover, the local search strategy and compact encoding mechanism are also combined with EA to improve its searching efficiency [16]. More recently, Lin et al. [17] propose to use EA to aggregate several similarity measures and optimize the alignment’s quality. To better trade off the completeness and correctness of the alignment and improve the searching efficiency, Acampora et al. [18] and Xue et al. [19, 20] regard the matching problem as a multiobjective optimizing process and, respectively, use two popular MOEAs, i.e., NSGA-II [21] and MOEA/D [22], to address it. Their approaches aim to find a set of non-dominated solutions that represent a balance between an alignment’s completeness and correctness, and the solutions with the best sub-objective values in the Pareto Front (PF) are selected as the output. To improve the algorithm’s efficiency, a meta-model is introduced to evaluate the solution’s fitness, which can effectively address the expensive evaluating issue [23]. However, the constrained multiobjective CSO matching problem poses a stiff challenge to the existing MOEA-based matching techniques, because it is difficult for them to handle both objectives and constraints so as to ensure the solutions’ convergence and diversity.

2.2. Coevolutionary Algorithm for Constrained Multiobjective Optimization Problem

For decades, MOEAs have shown their effectiveness in solving the Multiobjective Optimization Problem (MOOP) [24], and in recent years, more attention has been drawn to CMOOP, with approaches such as the collaborative CTAEA [25] and PPS with biphasic search [26]. A CMOOP is formally defined as follows:

$$\min\ F(x) = (f_1(x), f_2(x), \ldots, f_M(x))^T \quad \text{s.t.}\quad x \in \Omega,\quad g_i(x) \le 0,\ i = 1, \ldots, p,\quad h_j(x) = 0,\ j = 1, \ldots, q,$$

where $x = (x_1, x_2, \ldots, x_D)^T$ is the D-dimensional decision variable; $\Omega$ is the decision space; $F(x)$ consists of M objectives; and $g_i(x)$ and $h_j(x)$ are, respectively, the inequality constraints and the equality constraints. The constraints define a feasible region for CMOOP, and the algorithm should determine the feasible solutions that minimize the objectives as much as possible. Since the constraints and the objectives should be separately handled and balanced, CMOOP should not be regarded as a simple extension of the classical MOOP.
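To make the definition concrete, the following minimal Python sketch computes a total constraint violation and compares two solutions. The comparison uses the widely known constraint-domination principle from NSGA-II, which is chosen here purely for illustration and is not the constraint handling technique of this work; the function names and the tolerance `eps` for equality constraints are assumptions.

```python
from typing import Callable, Sequence

Vector = Sequence[float]
Constraint = Callable[[Vector], float]


def constraint_violation(x: Vector,
                         inequalities: Sequence[Constraint],
                         equalities: Sequence[Constraint],
                         eps: float = 1e-6) -> float:
    """Sum of max(0, g_i(x)) plus max(0, |h_j(x)| - eps); 0 means feasible."""
    violation = sum(max(0.0, g(x)) for g in inequalities)
    violation += sum(max(0.0, abs(h(x)) - eps) for h in equalities)
    return violation


def dominates(f_a: Vector, f_b: Vector) -> bool:
    """Pareto dominance for minimization: no worse everywhere, better somewhere."""
    return (all(a <= b for a, b in zip(f_a, f_b))
            and any(a < b for a, b in zip(f_a, f_b)))


def constrained_dominates(f_a: Vector, cv_a: float,
                          f_b: Vector, cv_b: float) -> bool:
    """Constraint-domination: feasibility first, then violation, then objectives."""
    if cv_a == 0.0 and cv_b > 0.0:
        return True          # feasible beats infeasible
    if cv_a > 0.0 and cv_b == 0.0:
        return False
    if cv_a > 0.0 and cv_b > 0.0:
        return cv_a < cv_b   # both infeasible: smaller violation wins
    return dominates(f_a, f_b)
```

Under this rule a feasible solution with poor objective values is still preferred over an infeasible one, which is one reason a small feasible region can hurt diversity.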

With the development of the Coevolutionary Algorithm and its effectiveness on many challenging problems, the coevolutionary constraint handling technique has been used to address CMOOP. Coello [27] and Huang et al. [28], respectively, propose a Coevolutionary EA and a Coevolutionary Differential Evolution (DE) Algorithm to address the CMOOP. To balance the constraints and objectives, they assign each subpopulation an independent penalty factor and evolve them simultaneously. Liu et al. [29] propose a coevolutionary framework that consists of two subpopulations. One subpopulation is dedicated to optimizing the objectives without considering the constraints, while the other tries to minimize the violation of constraints. Kieffer et al. [30] first decompose the constraints and assign each constraint to a subpopulation. After that, each subpopulation tries to satisfy more constraints on the condition that its assigned constraint is met. Wang et al. [31] use M subpopulations to address M constrained single-objective optimization problems and then use a new subpopulation for solving the M-objective CMOOP.

Although CMOOP has been studied for two decades and various techniques have been suggested in the state-of-the-art MOEAs, it is still difficult to address a CMOOP with a small feasible region, which might lead to poor convergence and diversity [32]. In addition, the strong cooperation between subpopulations makes it difficult to maintain the population’s convergence and diversity. To address these issues, we propose CE-MOEA, which uses two subpopulations within a weakly cooperating framework to address the multiobjective CSO matching problem and is thus able to better balance the solutions’ convergence and diversity.

3. Cybersecurity Ontology Matching Problem

A CSO consists of the class set, the datatype property set, and the object property set [17], and the existing CSOs can be generally grouped into three categories, i.e., generalized security ontologies, specialized security ontologies, and miscellaneous security ontologies [33]. Due to human subjectivity, these CSOs might define classes in different ways, yielding the ontology heterogeneity problem, which hampers their communication. Ontology matching is dedicated to finding the set of entity correspondences between heterogeneous entities, i.e., the ontology alignment. Each entity correspondence consists of two entities, their relationship (typically equivalence, ≡), and the confidence that it holds [34]. Figure 2 shows the flowchart of matching two ontologies. Each SM is used to construct a similarity matrix for the two ontologies under alignment, whose rows and columns are the entities of the two ontologies, and whose elements are the similarity values of the corresponding entity pairs. After that, these similarity matrices are aggregated into one matrix, which is then converted into the ontology alignment.
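The aggregation and conversion steps can be sketched as follows. This is a minimal illustration under stated assumptions rather than the exact procedure of this work: the matrices are combined by a weighted sum, and the alignment is extracted greedily as a 1:1 mapping above a hypothetical `threshold`; both function names are illustrative.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def aggregate(matrices: Sequence[Matrix], weights: Sequence[float]) -> Matrix:
    """Weighted sum of similarity matrices that share the same shape."""
    rows, cols = len(matrices[0]), len(matrices[0][0])
    return [[sum(w * m[i][j] for w, m in zip(weights, matrices))
             for j in range(cols)]
            for i in range(rows)]


def extract_alignment(matrix: Matrix, threshold: float) -> List[Tuple[int, int, float]]:
    """Greedy 1:1 extraction (an assumption): take the highest-scoring pairs first."""
    candidates = sorted(
        ((value, i, j)
         for i, row in enumerate(matrix)
         for j, value in enumerate(row)
         if value >= threshold),
        reverse=True,
    )
    used_rows, used_cols = set(), set()
    alignment = []
    for value, i, j in candidates:
        if i not in used_rows and j not in used_cols:
            alignment.append((i, j, value))   # (source entity, target entity, confidence)
            used_rows.add(i)
            used_cols.add(j)
    return alignment
```

For example, aggregating two 2×2 matrices with equal weights and extracting with a threshold of 0.5 keeps only the diagonal pairs when those carry the highest similarity values.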

Recall and precision are two classical metrics for evaluating the quality of an alignment [5], which, respectively, measure an alignment’s completeness and correctness:

$$recall = \frac{|A \cap RA|}{|RA|}, \qquad precision = \frac{|A \cap RA|}{|A|},$$

where $|A|$ and $|RA|$ are, respectively, the numbers of correspondences in the alignment $A$ and the reference alignment $RA$, and $|A \cap RA|$ is the number of the true positive correspondences in $A$.
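As a small worked example, the sketch below computes recall and precision from an alignment and a reference alignment, each represented as a set of entity pairs (a representation assumed here for illustration). It also returns the f-measure in its usual form as the harmonic mean of the two.

```python
from typing import Hashable, Iterable, Tuple

Pair = Tuple[Hashable, Hashable]


def evaluate(alignment: Iterable[Pair], reference: Iterable[Pair]) -> Tuple[float, float, float]:
    """Return (recall, precision, f-measure) of an alignment against a reference."""
    a, ra = set(alignment), set(reference)
    true_positives = len(a & ra)
    recall = true_positives / len(ra) if ra else 0.0
    precision = true_positives / len(a) if a else 0.0
    total = recall + precision
    f_measure = 2 * recall * precision / total if total > 0 else 0.0
    return recall, precision, f_measure
```

With three found correspondences of which two are correct, and a reference alignment of four correspondences, recall is 2/4, precision is 2/3, and f-measure is 4/7.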

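It is difficult to obtain a perfect ontology alignment, whose recall and precision are both equal to 1.00; therefore, we need to balance them during the matching process [34]. Assume n is the number of similarity measures; the CSO matching problem is formally defined as follows:

$$\max\ F(X) = (recall(X),\ precision(X)) \quad \text{s.t.}\quad X = (w_1, w_2, \ldots, w_n)^T,\quad \sum_{i=1}^{n} w_i = 1,\quad w_i \in [0, 1],\ i = 1, \ldots, n,$$

where $w_i$ is the aggregating weight of the $i$-th similarity matrix, and $recall(X)$ and $precision(X)$, respectively, calculate the recall and precision of the alignment corresponding to the decision variable $X$.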

We can also use the statistic-based approach to approximately calculate the recall and precision of an alignment, which are, respectively, defined as follows [35]:

$$recall' = \frac{\phi_A}{|O_1| + |O_2|}, \qquad precision' = \frac{\sum_{i=1}^{|A|} \delta_i}{|A|},$$

where $|O_1|$, $|O_2|$, and $|A|$ are, respectively, the entity numbers of ontologies $O_1$ and $O_2$ and the number of correspondences in their alignment $A$; $\phi_A$ is the number of mapped entities in $A$; and $\delta_i$ is the $i$-th correspondence’s similarity value. The motivations behind the metrics $recall'$ and $precision'$ are that the more mapped entities there are, the more correct mappings could be found (i.e., the higher the recall could be), and the higher the average similarity value is, the higher the confidence of the alignment could be (i.e., the higher the precision could be). On this basis, we define the helper problem as follows:

$$\max\ F'(X) = (recall'(X),\ precision'(X)) \quad \text{s.t.}\quad X = (w_1, w_2, \ldots, w_n)^T,\quad \sum_{i=1}^{n} w_i = 1,\quad w_i \in [0, 1],\ i = 1, \ldots, n,$$

where $recall'(X)$ and $precision'(X)$, respectively, calculate the approximate recall and precision of the alignment corresponding to the decision variable $X$. The original problem uses the most sound metrics to ensure the population’s convergence, while the helper problem is relatively relaxed, and the population for addressing it could be more diverse. The cooperation between the two populations, which aim to address the original problem and the helper problem, respectively, can bring mutual benefits and guide the algorithm to ensure both convergence and diversity of the population.
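The statistic-based metrics can be sketched as below. The exact formulas are not recoverable from the text, so this sketch follows one plausible reading of the stated motivation: approximate recall is taken as the number of mapped entities divided by the total number of entities in both ontologies, and approximate precision as the mean similarity value of the correspondences. Both formulas and the function name are assumptions.

```python
from typing import Hashable, Sequence, Tuple

Correspondence = Tuple[Hashable, Hashable, float]   # (source, target, similarity)


def approximate_metrics(alignment: Sequence[Correspondence],
                        n_source: int,
                        n_target: int) -> Tuple[float, float]:
    """Return (approximate recall, approximate precision) without a reference alignment."""
    if not alignment:
        return 0.0, 0.0
    mapped_source = {source for source, _, _ in alignment}
    mapped_target = {target for _, target, _ in alignment}
    # Assumed reading: share of entities (from both ontologies) that are mapped.
    approx_recall = (len(mapped_source) + len(mapped_target)) / (n_source + n_target)
    # Assumed reading: average confidence of the correspondences.
    approx_precision = sum(sim for _, _, sim in alignment) / len(alignment)
    return approx_recall, approx_precision
```

Because no reference alignment is needed, these values can be computed for any candidate weight vector, which is what makes them suitable for a relaxed helper problem.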

4. Coevolutionary Multiobjective Evolutionary Algorithm

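Algorithm 1: CE-MOEA.
Input: subpopulation size N, maximum generation MaxGen
Output: final subpopulation $Pop_1$
initialize $Pop_1$ with N random individuals;
initialize $Pop_2$ with N random individuals;
evaluate $Pop_1$ by the original problem;
evaluate $Pop_2$ by the helper problem;
gen = 0;
while gen < MaxGen do
  $Parent_1$ = N/2 individuals randomly selected from $Pop_1$;
  $Parent_2$ = N/2 individuals randomly selected from $Pop_2$;
  $Off_1$ = crossover and mutation on $Parent_1$;
  $Off_2$ = crossover and mutation on $Parent_2$;
  $Pop_1$ = $Pop_1$ ∪ $Off_1$ ∪ $Off_2$;
  $Pop_2$ = $Pop_2$ ∪ $Off_1$ ∪ $Off_2$;
  evaluate $Pop_1$ by the original problem;
  evaluate $Pop_2$ by the helper problem;
  $Pop_1$ = non-dominated sorting and environmental selection on $Pop_1$;
  $Pop_2$ = non-dominated sorting and environmental selection on $Pop_2$;
  gen = gen + 1;
end while
return $Pop_1$;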

This work uses the binary encoding mechanism; please see our previous work [36] for more details. As shown in Algorithm 1, two subpopulations $Pop_1$ and $Pop_2$ with size N are first randomly initialized and then evaluated by the original problem and the helper problem, respectively. In each generation, two parent sets $Parent_1$ and $Parent_2$ with size N/2 are randomly selected from $Pop_1$ and $Pop_2$. Each parent set generates an offspring population with size N/2 through the single-point crossover operator and flip-bit mutation [37]. Afterwards, $Pop_1$ and $Pop_2$ are both combined with the two offspring populations, $Off_1$ and $Off_2$, and are, respectively, evaluated by the original problem and the helper problem. Finally, we execute NSGA-II’s non-dominated sorting and environmental selection on $Pop_1$ and $Pop_2$. When the generation gen reaches the maximum generation MaxGen, the algorithm terminates and returns $Pop_1$ as the output.

CE-MOEA always evaluates the first subpopulation $Pop_1$ by the original problem and evolves the second subpopulation $Pop_2$ to solve the helper problem, and since the helper problem is a simplified version of the original problem, the evaluation by the helper problem does not increase the algorithm’s computational complexity. The helper problem is simpler, and thus $Pop_2$ usually converges quickly and has better diversity. $Pop_2$ assists $Pop_1$ in solving the original problem by sharing its offspring, which improves $Pop_1$’s converging speed and helps it jump out of local optima. Different from other MOEAs, which make subpopulations cooperate throughout the whole evolving process, CE-MOEA evolves the two subpopulations separately except for sharing their offspring in each generation. This weak cooperation offers each subpopulation the freedom to evolve and lets one subpopulation assist the other in addressing the original optimization problem. According to Tian et al. [7], the coevolutionary paradigm with weak cooperation is more effective than one with strong cooperation.
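The two-population loop described above can be sketched in Python as follows. This is an illustrative sketch rather than the authors' implementation: individuals are plain bit tuples, both objectives are maximized, constraint handling is omitted, and the helper names (`fronts`, `crowding`, `select`, `vary`, `ce_moea`) are hypothetical. The two evaluation functions are supplied by the caller and stand for the original and helper problems.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Bits = Tuple[int, ...]
Objs = Tuple[float, ...]


def dominates(a: Objs, b: Objs) -> bool:
    """Maximization: a is no worse on every objective and better on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def fronts(objs: Sequence[Objs]) -> List[List[int]]:
    """Peel off successive non-dominated fronts (simple O(n^2) version)."""
    remaining, result = set(range(len(objs))), []
    while remaining:
        front = [i for i in sorted(remaining)
                 if not any(dominates(objs[j], objs[i]) for j in remaining)]
        result.append(front)
        remaining -= set(front)
    return result


def crowding(front: List[int], objs: Sequence[Objs]) -> Dict[int, float]:
    """Crowding distance inside one front; boundary points get infinity."""
    dist = {i: 0.0 for i in front}
    for m in range(len(objs[front[0]])):
        ordered = sorted(front, key=lambda i: objs[i][m])
        span = objs[ordered[-1]][m] - objs[ordered[0]][m] or 1.0
        dist[ordered[0]] = dist[ordered[-1]] = float("inf")
        for prev, cur, nxt in zip(ordered, ordered[1:], ordered[2:]):
            dist[cur] += (objs[nxt][m] - objs[prev][m]) / span
    return dist


def select(pop: List[Bits], evaluate: Callable[[Bits], Objs], size: int) -> List[Bits]:
    """NSGA-II-style environmental selection: fill front by front, then by crowding."""
    objs = [evaluate(ind) for ind in pop]
    chosen: List[int] = []
    for front in fronts(objs):
        if len(chosen) + len(front) > size:
            dist = crowding(front, objs)
            front = sorted(front, key=dist.get, reverse=True)[: size - len(chosen)]
        chosen.extend(front)
        if len(chosen) == size:
            break
    return [pop[i] for i in chosen]


def vary(parents: List[Bits], pc: float, pm: float, rng: random.Random) -> List[Bits]:
    """One child per parent: single-point crossover with a random mate, then bit flips."""
    children = []
    for parent in parents:
        child = list(parent)
        if len(parent) > 1 and rng.random() < pc:
            mate = rng.choice(parents)
            cut = rng.randrange(1, len(parent))
            child = list(parent[:cut] + mate[cut:])
        children.append(tuple(bit ^ 1 if rng.random() < pm else bit for bit in child))
    return children


def ce_moea(eval_original: Callable[[Bits], Objs], eval_helper: Callable[[Bits], Objs],
            n_bits: int, pop_size: int = 20, max_gen: int = 2000,
            pc: float = 0.65, pm: float = 0.012, seed: int = 0) -> List[Bits]:
    """Evolve two populations that only exchange offspring (weak cooperation)."""
    rng = random.Random(seed)
    pop1 = [tuple(rng.randint(0, 1) for _ in range(n_bits)) for _ in range(pop_size)]
    pop2 = [tuple(rng.randint(0, 1) for _ in range(n_bits)) for _ in range(pop_size)]
    for _ in range(max_gen):
        off1 = vary(rng.sample(pop1, pop_size // 2), pc, pm, rng)
        off2 = vary(rng.sample(pop2, pop_size // 2), pc, pm, rng)
        # Both populations see both offspring sets, but each is judged by its own problem.
        pop1 = select(pop1 + off1 + off2, eval_original, pop_size)
        pop2 = select(pop2 + off1 + off2, eval_helper, pop_size)
    return pop1
```

The only coupling between the two populations is the shared offspring, so either population could be swapped for a different optimizer without touching the other, which is the practical appeal of a weakly cooperating design.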

5. Experiment

5.1. Experimental Configuration

In the experiment, we first compare our approach with the EA-based matching technique [17], the NSGA-II-based matching technique [19], and OAEI’s participants on the bibliographic track and conference track provided by the Ontology Alignment Evaluation Initiative (OAEI). In particular, OAEI’s bibliographic track requires matching two bibliographic ontologies, and the target ontology’s entity names could be random strings or synonyms. The hierarchy could be expanded or flattened, the properties could be suppressed, and the classes could be refined by several subclasses or flattened. OAEI’s conference track requires matching 16 different ontologies on conference organization, which have been used in some actual conference series and the corresponding conference web sites. After that, we compare CE-MOEA with EA-based and NSGA-II-based matching techniques on five pairs of real CSOs, which are all popular ontologies in the cybersecurity domain and contain large numbers of heterogeneous entities:
(1) Network Security Ontologies: Network Attack Ontology (NAO) [38] and Ontology-based Attack Model (NAM) [39].
(2) Security Requirement-related Ontologies: Security and Domain Ontology for Security Requirement Analysis (SDOSRA) [40] and Extended Ontology for Security Requirements (EOSR) [41].
(3) Miscellaneous Security Ontologies: Ontological approach toward Cybersecurity in Cloud Computing (OCSCC) [42] and Ontology in Cloud Computing (OCC) [43].
(4) Application-Based Security Ontologies: Security Ontology for Mobile Applications (SOMA) [44] and Security Ontology for Mobile Agents Protection (SOMAP) [45].
(5) Cloud Security Ontologies: Cloud Security Policy (CSP) [46] and Cloud Ontology (CO) [47].

Finally, we carry out the T-test to statistically compare the three EA-based matching techniques. In particular, the configurations of EA and NSGA-II follow those given in their papers, and the configuration of CE-MOEA is as follows:
(1) Population size = 20.
(2) Maximum generation = 2000.
(3) Crossover rate = 0.65.
(4) Mutation rate = 0.012.

Three categories of similarity measures used by CE-MOEA are as follows:
(1) Syntax-based similarity measure: Levenshtein distance [48].
(2) Linguistic-based similarity measure: WordNet-based distance [49].
(3) Taxonomy-based similarity measure: context-based distance [50].

The algorithm’s configurations are determined through the empirical experiments, and their robustness against different heterogeneous matching tasks is verified through the experimental results. Three similarity measures are the classical ones that belong to three categories of similarity measures in ontology matching domains, which have been proved to have mutual benefits in enhancing the results’ confidence [11].
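Of the three measures, the syntax-based one is the simplest to illustrate. The sketch below computes the Levenshtein distance with the standard dynamic program and turns it into a similarity value in [0, 1]; the normalization by the longer string's length and the case folding are assumptions made for this example, not details taken from this work.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        current = [i]
        for j, char_b in enumerate(b, 1):
            current.append(min(previous[j] + 1,                       # deletion
                               current[j - 1] + 1,                    # insertion
                               previous[j - 1] + (char_a != char_b)))  # substitution
        previous = current
    return previous[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]: 1 minus the distance divided by the longer length."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest
```

Applied to every pair of entity names from two ontologies, this function fills one similarity matrix; the linguistic and taxonomy-based measures would fill the other two.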

5.2. Experimental Results

Tables 1 and 2 make comparisons on OAEI’s testing cases in terms of recall, precision, and f-measure. In particular, f-measure is the harmonic mean of recall and precision. Figures 3, 4, and 5, respectively, compare EA, NSGA-II, and CE-MOEA on the CSO matching tasks. Table 3 compares CE-MOEA with EA and NSGA-II in terms of the mean f-measure and the corresponding standard deviation, and in Table 4, the statistical T-test [51] is executed on the data presented in Table 3. The results of EA, NSGA-II, and CE-MOEA presented in the tables and figures are the mean values of 30 independent runs.

As shown in Tables 1 and 2, compared with OAEI’s participants, EA-, NSGA-II-, and CE-MOEA-based matching techniques comprehensively take several similarity measures into consideration, and their precision values are therefore generally high. In addition, the iterative refinement of the alignment is an effective way of finding more correct entity correspondences; therefore, the recall values of EA-based matching techniques are also high in general.

In Figures 3, 4, and 5, since MOEA is able to better trade off the alignment’s recall and precision, NSGA-II and CE-MOEA’s results are better than those of classical EA. With the introduction of the coevolutionary mechanism, CE-MOEA is able to further improve the results’ quality by helping the algorithm jump out of the local optimum. In particular, the subpopulation for the helper problem can improve the diversity in general, while the subpopulation for the original problem ensures the algorithm’s convergence. The cooperation between them is able to better trade off the PF’s diversity and convergence and further improve the alignment’s quality.

In Table 4, the T-test’s degree of freedom is 2, and the significance level is 0.05. On all testing cases, the p values are smaller than 0.05, and thus, we can draw the conclusion that CE-MOEA statistically outperforms EA- and NSGA-II-based matching techniques at the significance level of 5%. To conclude, the CE-MOEA-based ontology matching technique is able to effectively address various ontology heterogeneity problems and determine high-quality CSO alignments.

6. Conclusion and Future Work

Due to the distributed and independent nature of cybersecurity systems, it is necessary to match various heterogeneous CSOs to manage cybersecurity knowledge on IoE. To this end, this work proposes a CE-MOEA-based matching technique to effectively determine the CSO alignment. CE-MOEA uses the multiobjective evolutionary paradigm to avoid a biased improvement of the solutions and introduces the coevolutionary mechanism to trade off the PF’s diversity and convergence. The experiment uses OAEI’s bibliographic track and conference track and five real CSO matching tasks to test CE-MOEA’s performance. Comparisons with OAEI’s participants and with EA- and MOEA-based matching techniques show that our proposed algorithm is able to effectively address various heterogeneous ontology matching problems and determine high-quality cybersecurity ontology alignments. The experimental results also show that the evolutionary paradigm is able to find better alignments than other artificial techniques and that the weak cooperating framework is effective in further improving MOEA’s performance.

Although the CE-MOEA-based aligning technique shows its superiority in the experiment, it is not able to detect the m:n correspondence, i.e., one in which multiple source entities are mapped to multiple target entities, which is a common complex correspondence pattern. In addition, CE-MOEA is not able to find other semantic relationships among the entities, such as subsumption. The divide-and-conquer approach has been proved to be a viable method for improving the effectiveness of the matching process [52], and we are also interested in utilizing a clustering algorithm, such as the graph clustering algorithm [53], to partition two CSOs, which can help improve the efficiency of the matching process [54-56].

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (no. 62172095), the Natural Science Foundation of Fujian Province (nos. 2020J01875 and 2019J01771), and the Scientific Research Foundation of Fujian University of Technology (no. GY-Z17162).