Abstract

Artificial Internet of Things (AIoT) integrates Artificial Intelligence (AI) with the Internet of Things (IoT) to create sensor networks that can communicate and process data. To implement the communications and co-operations among intelligent systems on AIoT, it is necessary to annotate sensor data with semantic meanings to overcome the heterogeneity problem among different sensors, which requires the utilization of sensor ontology. A sensor ontology formally models the knowledge on AIoT by defining the concepts, the properties describing a concept, and the relationships between two concepts. Due to human subjectivity, a concept in different sensor ontologies could be defined with different terminologies and contexts, yielding the ontology heterogeneity problem. Thus, before using these ontologies, it is necessary to integrate their knowledge by finding the correspondences between their concepts, i.e., the so-called ontology matching. In this work, a novel sensor ontology matching framework is proposed, which aggregates three kinds of Concept Similarity Measures (CSMs) and employs an alignment extraction approach to determine the sensor ontology alignment. To ensure the quality of the alignments, we further propose a compact Particle Swarm Optimization algorithm (cPSO) to optimize the aggregating weights for the CSMs and a threshold for filtering the alignment. The experiment utilizes the Ontology Alignment Evaluation Initiative (OAEI)'s conference track and two pairs of real sensor ontologies to test cPSO's performance. The experimental results show that the quality of the alignments obtained by cPSO statistically outperforms that of other state-of-the-art sensor ontology matching techniques.

1. Introduction

The Internet of Things (IoT) [1] consists of interconnected things with built-in sensors, and the Artificial IoT (AIoT) [2] further integrates Artificial Intelligence (AI) with IoT to create sensor networks that can communicate and process data. To implement the communications and co-operations among intelligent systems on AIoT, it is necessary to annotate sensor data with semantic meanings to overcome the heterogeneity problem among different sensors, which requires the utilization of sensor ontology [3]. A sensor ontology formally models the knowledge on AIoT by defining the concepts, the properties describing a concept, and the relationships between two concepts. Since sensor ontologies are regarded as the solution to data heterogeneity on AIoT, many sensor ontologies [4] have been developed in recent years. However, due to human subjectivity, the overlapping information in these ontologies could be defined with different terminologies and contexts, yielding the ontology heterogeneity problem. Therefore, before using them, it is necessary to integrate their knowledge by finding the correspondences between their concepts. Ontology matching can bring sensor ontologies into mutual agreement by automatically determining identical concept correspondences (i.e., an ontology alignment), and it is regarded as an effective technique to address the ontology heterogeneity problem.

Due to the high computational complexity of the matching process, Swarm Intelligence (SI) algorithms have become a popular methodology for integrating heterogeneous ontologies [5-9]. Martinez-Gil and Montes [10] propose the Genetics for Ontology Alignments (GOAL), which first generates a similarity matrix for each similarity measure and then uses the Genetic Algorithm (GA) to optimize the weights for aggregating these matrices. The aggregating weights determined by GOAL can be reused to match ontologies with similar heterogeneous features. Ginsca and Iftene [11] optimize not only the parameters in the matching process but also the threshold in the alignment filtering process. Acampora et al. [12] try to improve GA's converging speed as well as the solution's quality by introducing a local search strategy. Xue and Wang [13] propose a new metric to approximately measure an alignment's f-measure [14] and, on this basis, utilize a hybrid GA to execute instance-level matching in the Linked Open Data cloud (LOD). More recently, He et al. [15] propose an Artificial Bee Colony algorithm (ABC) based matching technique to aggregate different similarity measures, which can improve the alignment's quality. These SI-based matching techniques need to first store the similarity matrices determined by the similarity measures, which sharply increases the computational complexity. To this end, Genetic Algorithm based Ontology Matching (GAOM) [16] models ontology matching as a bipartite graph matching process and tries to use GA to directly determine a high-quality alignment. Since instance information can effectively improve the alignment's precision value, Alves et al. [17] first propose an instance-based similarity measure and then utilize a hybrid GA to determine the optimal mappings. MapPSO [18] also models ontology matching as a bipartite graph matching problem and proposes to use the Particle Swarm Optimization algorithm (PSO) [19] to address it. MapPSO utilizes the statistical information of the alignment to approximately evaluate its quality and guide the algorithm's search direction, which can automatically determine high-quality alignments.

For dynamic applications on the Semantic Sensor Web (SSW), it is necessary to integrate the sensor ontologies online, and thus, besides the quality of the alignments, the matching efficiency is also of prime importance. Inspired by the success of compact SI algorithms in various applications [20-23], this work proposes a compact PSO (cPSO) to integrate the sensor knowledge in AIoT. Our proposal uses a probabilistic representation of the population to execute the optimizing process, which simulates the behaviour of a population that extensively explores the decision space at the beginning of the optimization and then progressively focuses the search on the most promising genotypes while narrowing the search radius. Thus, a run of cPSO requires far less memory than the standard PSO. In particular, we formally define the sensor ontology matching problem and propose a problem-specific cPSO to effectively address it and integrate the sensor knowledge involved.

The rest of the paper is organized as follows: Section 2 presents the concept similarity measures and the mathematical model of the sensor ontology matching problem; Section 3 gives the details of cPSO; Section 4 presents its pseudo-code; Section 5 shows the experimental results; and finally, Section 6 draws the conclusions.

2. Preliminaries

2.1. Concept Similarity Measure

A Concept Similarity Measure (CSM) is a function that takes as input two concepts' information and outputs a real number in [0, 1] that represents their similarity value. In general, there are three kinds of CSMs, i.e., string-based CSM, linguistic-based CSM, and structure-based CSM. In particular, the string-based CSM takes as input two concepts' labels and compares their syntax information; the linguistic-based CSM also compares two concepts' labels, but it uses an external digital dictionary such as WordNet [24] to calculate their similarity value; and the structure-based CSM calculates the similarity value of two concepts based on their direct super-concepts and subconcepts.

Given two concepts $c_1$ and $c_2$, we first remove the meaningless words (such as the stop words) from their labels and convert them into two token sets $T_1$ and $T_2$; then the string-based similarity value is calculated as follows:

$$sim_{str}(c_1, c_2) = \min\left(\frac{|T_1 \cap T_2|}{|T_1|}, \frac{|T_1 \cap T_2|}{|T_2|}\right),$$

where $|T_1|$ and $|T_2|$ are, respectively, the cardinalities of $T_1$ and $T_2$. The first ratio indicates the overlap fraction of $T_1 \cap T_2$ with respect to $T_1$, the second one indicates that with respect to $T_2$, and the minimum value is selected as their string-based similarity value. The linguistic-based similarity value is defined as follows:

$$sim_{lin}(c_1, c_2) = \frac{\sum_{i=1}^{|T_1|} \sum_{j=1}^{|T_2|} syn(t_{1,i}, t_{2,j})}{\max(|T_1|, |T_2|)},$$

where $t_{1,i}$ denotes the $i$th token in $T_1$, $t_{2,j}$ is the $j$th token in $T_2$, and $syn(t_{1,i}, t_{2,j}) = 1$ if they are synonymous in WordNet, otherwise 0. Finally, supposing $superC_1$ and $superC_2$ are, respectively, the super-concept sets of $c_1$ and $c_2$, and $subC_1$ and $subC_2$ are, respectively, the direct subconcept sets of $c_1$ and $c_2$, the structure similarity value is defined as follows:

$$sim_{struct}(c_1, c_2) = \frac{1}{2} \left( \frac{\sum_{i=1}^{|superC_1|} \sum_{j=1}^{|superC_2|} sim_{str}(sc_{1,i}, sc_{2,j})}{|superC_1| \times |superC_2|} + \frac{\sum_{i=1}^{|subC_1|} \sum_{j=1}^{|subC_2|} sim_{str}(dc_{1,i}, dc_{2,j})}{|subC_1| \times |subC_2|} \right),$$

where $sc_{1,i}$ denotes the $i$th super-class of $c_1$ and $sc_{2,j}$ the $j$th super-class of $c_2$, and $dc_{1,i}$ is the $i$th direct subclass of $c_1$ and $dc_{2,j}$ the $j$th direct subclass of $c_2$.
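To make the string-based CSM concrete, the following Python sketch (our own illustration rather than the paper's implementation; the tokenizer and stop-word list are simplifying assumptions) computes the minimum token-overlap fraction defined above.

import re

STOP_WORDS = {"of", "the", "a", "an", "and", "has"}  # assumed stop-word list

def tokenize(label):
    """Split a concept label such as 'TemperatureSensor' into lowercase tokens."""
    spaced = re.sub(r"([a-z])([A-Z])", r"\1 \2", label)   # split camelCase
    tokens = re.split(r"[\s_\-]+", spaced.lower())        # split on spaces, '_' and '-'
    return {t for t in tokens if t and t not in STOP_WORDS}

def string_similarity(label1, label2):
    """min(|T1 ∩ T2| / |T1|, |T1 ∩ T2| / |T2|), as in the string-based CSM above."""
    t1, t2 = tokenize(label1), tokenize(label2)
    if not t1 or not t2:
        return 0.0
    overlap = len(t1 & t2)
    return min(overlap / len(t1), overlap / len(t2))

print(string_similarity("TemperatureSensor", "Sensor_of_Temperature"))  # -> 1.0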

Since none of the CSMs can ensure effectiveness in all contexts, i.e., distinguishing all the heterogeneous concepts, they are usually combined to enhance the result's confidence. Due to its flexibility, the weighted average strategy has become a popular way of aggregating CSMs, which is defined as follows:

$$sim(c_1, c_2) = \sum_{k=1}^{3} w_k \cdot sim_k(c_1, c_2),$$

where $w_k \in [0, 1]$ is the $k$th CSM's aggregating weight and $\sum_{k=1}^{3} w_k = 1$.
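As a small illustration of this aggregation (the weight values below are arbitrary examples, not tuned ones), the weighted average can be computed as:

def aggregate(sim_values, weights):
    """Return sum_k w_k * sim_k, assuming the weights are non-negative and sum to 1."""
    assert abs(sum(weights) - 1.0) < 1e-9
    return sum(w * s for w, s in zip(weights, sim_values))

# string-based, linguistic-based and structure-based similarity of one concept pair
print(aggregate([0.8, 0.6, 0.4], [0.5, 0.3, 0.2]))  # -> 0.66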

2.2. Alignment Extraction

Each aggregating weight set corresponds to a unique aggregated CSM, which can be further used to construct a similarity matrix $M$ whose element $m_{ij}$ is the similarity value between the $i$th concept of one ontology and the $j$th concept of the other. On this basis, we can extract an alignment with the cardinality 1 : 1 (one concept from the source ontology is only mapped with one concept from the target ontology and vice versa) from $M$ according to the following steps: (1) sort all the similarity values in $M$ in descending order; (2) output the element with the largest value, say $m_{ij}$, as a concept correspondence $(c_{1,i}, c_{2,j})$ in the extracted alignment; (3) set the elements that are in the same row or column as $m_{ij}$ to 0; (4) repeat steps (1) to (3) until all the elements in $M$ are 0. Figure 1 shows an example of the extracting process, where $O_1$ and $O_2$ are two ontologies and $c_{ij}$ denotes the $i$th ontology's $j$th concept, and six correspondences are extracted. Since the last correspondence's similarity value is low and thus regarded as unauthentic, the final alignment consists of the top five correspondences.
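The greedy 1 : 1 extraction above can be sketched as follows (an illustrative reconstruction, with the threshold filter of the last step folded into the loop for brevity; it is not the authors' code).

import numpy as np

def extract_alignment(sim_matrix, threshold):
    """Repeatedly take the largest remaining value, keep it if confident, and zero its row and column."""
    m = sim_matrix.copy()
    alignment = []
    while m.max() > 0:
        i, j = np.unravel_index(np.argmax(m), m.shape)   # step (2): current largest element
        if m[i, j] >= threshold:                         # keep only confident correspondences
            alignment.append((i, j, float(m[i, j])))
        m[i, :] = 0                                      # step (3): enforce the 1 : 1 cardinality
        m[:, j] = 0
    return alignment

m = np.array([[0.9, 0.2],
              [0.1, 0.3]])
print(extract_alignment(m, threshold=0.25))  # -> [(0, 0, 0.9), (1, 1, 0.3)]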

2.3. Sensor Ontology Matching Problem

Since the quality of an alignment is directly proportional to both the mean similarity value of all the correspondences found and the cardinality of the alignment, we utilize the following equation to calculate an alignment $A$'s quality:

$$f(A) = \frac{1}{2} \left( \frac{\sum_{i=1}^{|A|} sim_i}{|A|} + \frac{2|A|}{|O_1| + |O_2|} \right),$$

where $|O_1|$, $|O_2|$, and $|A|$ are, respectively, the cardinalities of the two ontologies $O_1$ and $O_2$ and the alignment $A$, and $sim_i$ is the $i$th correspondence's similarity value.

On this basis, the mathematical model of the sensor ontology matching problem can be defined as follows:

$$\begin{aligned}
\max\quad & f(X) \\
\text{s.t.}\quad & X = (w_1, w_2, w_3, \theta)^T, \\
& \sum_{k=1}^{3} w_k = 1, \quad w_k \in [0, 1], \; k = 1, 2, 3, \\
& \theta \in [0, 1],
\end{aligned}$$

where $w_k$ represents the $k$th similarity measure's aggregating weight, $\theta$ is the threshold for filtering the alignment, and $f(X)$ calculates the quality of the alignment determined by the aggregating weight set and the threshold.
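Putting the pieces together, a decision vector can be evaluated roughly as sketched below; the quality function mirrors the reconstruction of $f(A)$ above and reuses the hypothetical extract_alignment helper from the previous sketch, so the whole snippet is an assumption-laden illustration rather than the authors' implementation.

def alignment_quality(alignment, n1, n2):
    """Average of the mean correspondence similarity and the relative cardinality 2|A| / (|O1| + |O2|)."""
    if not alignment:
        return 0.0
    mean_sim = sum(s for _, _, s in alignment) / len(alignment)
    rel_card = 2 * len(alignment) / (n1 + n2)
    return 0.5 * (mean_sim + rel_card)

def fitness(weights, threshold, sim_matrices):
    """sim_matrices: three |O1| x |O2| matrices, one per CSM; the weights sum to 1."""
    aggregated = sum(w * m for w, m in zip(weights, sim_matrices))
    alignment = extract_alignment(aggregated, threshold)
    n1, n2 = sim_matrices[0].shape
    return alignment_quality(alignment, n1, n2)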

3. Compact Particle Swarm Optimization Algorithm

PSO is inspired by the behaviour of birds, where each bird (particle) has the memory of its best-visited position and moves towards a leading bird (elite particle) with some degree of randomization. This procedure can be described with the following update sequence for the $i$th particle in generation $t$:

$$v_i^{t+1} = \omega v_i^t + c_1 r_1 (pbest_i - x_i^t) + c_2 r_2 (gbest - x_i^t), \qquad (7)$$

$$x_i^{t+1} = x_i^t + v_i^{t+1}, \qquad (8)$$

where $v_i^t$ is the velocity, i.e., a perturbation vector, $x_i^t$ is the $i$th particle's position in the current generation, $pbest_i$ is its best position visited in the history, $gbest$ is the best position found by all the particles, $r_1$ and $r_2$ are random numbers in [0, 1], $c_1$ and $c_2$ are the acceleration coefficients, and $\omega$ is the inertia weight. Eq. (7) and Eq. (8) indicate that PSO updates each particle by exchanging its gene values with both the local best particle and the global best particle to find a better position. Clearly, the original PSO is a population-based SI algorithm, and in this work, we further propose a compact version of PSO to improve the algorithm's performance.
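For reference, the standard real-valued PSO update of Eqs. (7) and (8) can be written as a short sketch (the parameter values are illustrative defaults, not the paper's settings); the compact variant developed below replaces this population-based update with a probability vector.

import numpy as np

def pso_step(x, v, pbest, gbest, w=0.7, c1=1.5, c2=1.5):
    """One iteration of Eqs. (7)-(8) for a single particle."""
    r1, r2 = np.random.rand(*x.shape), np.random.rand(*x.shape)
    v_new = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)   # velocity update, Eq. (7)
    x_new = x + v_new                                               # position update, Eq. (8)
    return x_new, v_new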

3.1. Encoding and Decoding Mechanism

This work uses a binary encoding mechanism, i.e., Gray code, and each particle's gene values can be divided into two parts: one stands for the weight set for aggregating the similarity measures, and the other stands for the similarity threshold for filtering out the correspondences with low similarity values. Concerning the characteristics of the weights in Section 2.1, we normalize them when decoding. We utilize one Probability Vector (PV) to represent a population, whose number of elements is equal to the length of a particle. Each element of PV represents the probability that the corresponding gene bit of a particle is 1. We can use PV to generate various binary particles through its probability in each dimension, and when every dimension's value is close to 1 or 0, the algorithm is about to converge. In addition, PV should be updated in each generation to move towards the elite, which makes the new particles generated in the next generation closer to the elite.
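A possible decoding routine is sketched below; the segment length of 8 bits per variable and the helper names are our own assumptions, not design choices stated in the paper.

BITS = 8  # assumed number of Gray-coded bits per decision variable

def gray_to_float(gray_bits):
    """Decode a Gray-coded bit list into a real number in [0, 1]."""
    binary = [gray_bits[0]]
    for g in gray_bits[1:]:
        binary.append(binary[-1] ^ g)
    value = int("".join(map(str, binary)), 2)
    return value / (2 ** len(gray_bits) - 1)

def decode(particle):
    """particle: list of 4 * BITS bits -> (normalized aggregating weights, filtering threshold)."""
    raw = [gray_to_float(particle[k * BITS:(k + 1) * BITS]) for k in range(4)]
    total = sum(raw[:3]) or 1.0                    # avoid division by zero
    weights = [w / total for w in raw[:3]]         # normalization mentioned in Section 2.1
    threshold = raw[3]
    return weights, threshold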

Figure 2 shows an example of generating a particle through PV. Given a PV $(0.1, 0.3, 0.5, 0.9)^T$, we generate four random numbers in [0, 1], e.g., 0.2, 0.4, 0.6, and 0.1, and determine a new particle by comparing them with PV's elements accordingly, where a bit is set to 1 if the random number is smaller than the corresponding element. To be specific, since 0.2 > 0.1, 0.4 > 0.3, 0.6 > 0.5, and 0.1 < 0.9, the newly generated particle is 0001. In each generation, PV's elements are updated according to the best particle found. If a bit value of the elite particle is 1 (or 0), the corresponding element of PV will be increased (or decreased) by a step length, which makes the newly generated particles closer to the elite particle. For example, given a PV $(0.1, 0.3, 0.5, 0.9)^T$ and an elite particle 1110, since the first bit value of the elite particle is 1, we increase the first element of PV by the step length, which makes the first bit of a newly generated particle more likely to be 1 (the same as the elite particle's first bit value). Therefore, after updating all elements of PV, the newly generated particles tend to be closer to the elite particle in terms of each bit value. When all elements of PV are 1 or 0, the newly generated particles will be identical and the algorithm converges.
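The sampling and update rules just described can be sketched in a few lines (the step length value is illustrative):

import random

def sample_particle(pv):
    """Bit i is set to 1 when a uniform random number is below pv[i], as in the Figure 2 example."""
    return [1 if random.random() < p else 0 for p in pv]

def update_pv(pv, elite, step=0.1):
    """Move each element of PV towards the corresponding bit of the elite particle."""
    return [min(p + step, 1.0) if bit == 1 else max(p - step, 0.0)
            for p, bit in zip(pv, elite)]

pv = [0.1, 0.3, 0.5, 0.9]
print(sample_particle(pv))          # e.g. [0, 0, 0, 1]
print(update_pv(pv, [1, 1, 1, 0]))  # -> [0.2, 0.4, 0.6, 0.8]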

3.2. Crossover Operator

Given two particles particle_1 and particle_2, the crossover operator generates one offspring particle_off by exchanging their gene values. In this work, we generate the offspring by copying a sequential gene fragment from particle_1 into the corresponding gene bits of a copy of particle_2, so that particle_off inherits a sequential gene fragment from both particle_1 and particle_2. For the sake of clarity, the pseudo-code of the crossover operator is shown in Algorithm 1, and a runnable sketch is given after it.

particle_off = particle_2
position = round(rand(0, particle_off.length))
i = 0
while i < position do
    particle_off[i] = particle_1[i]
    i = i + 1
end while
return particle_off
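In Python, Algorithm 1 can be reconstructed as follows (our interpretation of the cut-point rule; the exact boundary handling is an assumption):

import random

def crossover(particle_1, particle_2):
    """Offspring copies particle_1 up to a random cut point and particle_2 afterwards."""
    offspring = list(particle_2)
    position = random.randint(0, len(offspring))    # random cut point
    offspring[:position] = particle_1[:position]    # inherit a sequential fragment of particle_1
    return offspring

print(crossover([1, 1, 1, 1], [0, 0, 0, 0]))  # e.g. [1, 1, 0, 0]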

4. The Pseudo-Code of Compact Particle Swarm Optimization Algorithm

The pseudo-code of cPSO is presented in Algorithm 2. cPSO first initializes the probability vector PV by setting all of its elements to 0.5, and then uses PV to generate a particle that initializes the local best particle P_lbest and the global best particle P_gbest. In each generation, cPSO first tries to update P_lbest through the crossover between it and a new particle generated through PV, and then P_gbest by exchanging its gene values with P_lbest. Finally, cPSO updates PV according to P_gbest so that the particles generated in the next generation move towards it.

set generation t = 0
set all elements in PV as 0.5
generate one particle through PV to initialize the local best particle P_lbest and the global best particle P_gbest
while t < maxGeneration do
    generate a new particle P_new through PV
    P_cross = crossover(P_new, P_lbest)
    if f(P_cross) > f(P_lbest) then
        P_lbest = P_cross
    end if
    P_cross = crossover(P_lbest, P_gbest)
    if f(P_cross) > f(P_gbest) then
        P_gbest = P_cross
    end if
    for each element PV[i] of PV do
        if P_gbest[i] == 1 then
            PV[i] = min(PV[i] + step, 1)
        else
            PV[i] = max(PV[i] - step, 0)
        end if
    end for
    t = t + 1
end while
return P_gbest
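Under the assumptions made in the earlier sketches (decode, fitness, sample_particle, update_pv, and crossover are the hypothetical helpers defined above), Algorithm 2 can be put together as follows; the step length is an illustrative value, while the crossover probability of 0.8 and the 3000 generations follow the experimental setup reported below.

import random

def cpso(sim_matrices, n_bits=4 * BITS, max_gen=3000, step=0.02, p_crossover=0.8):
    """Compact PSO sketch: a probability vector replaces the particle population."""
    def evaluate(particle):
        weights, threshold = decode(particle)
        return fitness(weights, threshold, sim_matrices)

    pv = [0.5] * n_bits                          # initialize all elements of PV to 0.5
    lbest = sample_particle(pv)                  # local best particle
    gbest = list(lbest)                          # global best particle
    for _ in range(max_gen):
        new = sample_particle(pv)                # generate a new particle through PV
        if random.random() < p_crossover:
            new = crossover(new, lbest)
        if evaluate(new) > evaluate(lbest):      # try to update the local best particle
            lbest = new
        cand = crossover(lbest, gbest)
        if evaluate(cand) > evaluate(gbest):     # try to update the global best particle
            gbest = cand
        pv = update_pv(pv, gbest, step)          # move PV towards the elite
    return decode(gbest)                         # optimized weights and threshold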

5. Experimental Results and Analysis

5.1. Experimental Setup

In the experiment, we use the Ontology Alignment Evaluation Initiative (OAEI)'s Conference track (http://oaei.ontologymatching.org/2019/conference/index.html) and two pairs of real sensor ontologies to test cPSO's performance. The experiment compares cPSO with four state-of-the-art sensor ontology matching techniques, i.e., ASMOV [25], CODI [26], SOBOM [27], and FuzzyAlign [28], on all testing cases in terms of f-measure. We empirically set cPSO's crossover probability to 0.8 and the maximum generation to 3000, and cPSO's results in the tables are the average of thirty independent runs. Table 1 briefly describes the ontologies in these testing cases.

5.2. Statistical Comparison

We utilize two popular statistical testing methods, i.e., Friedman's Test (FT) [29] and Holm's Test (HT) [30], to compare the competitors' performance. In particular, FT aims at checking whether there are differences among the competitors, and HT is further used to find whether one competitor statistically outperforms the others. First, we need to reject FT's null hypothesis that all competitors' performances are equivalent. To this end, the computed chi-square value must be equal to or greater than the tabled critical chi-square value at the specified level of significance $\alpha$. In this work, since we are comparing 5 matchers, the critical value for 4 degrees of freedom at $\alpha = 0.05$ is 9.488.
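For illustration, Friedman's test for five matchers over a set of testing cases can be run with SciPy as follows; the f-measure values are made-up placeholders, not the paper's results.

from scipy.stats import friedmanchisquare

# one f-measure per testing case for each matcher (placeholder numbers)
cpso       = [0.71, 0.68, 0.74, 0.70, 0.69, 0.73]
asmov      = [0.62, 0.60, 0.66, 0.61, 0.63, 0.64]
codi       = [0.59, 0.61, 0.60, 0.58, 0.62, 0.60]
sobom      = [0.64, 0.63, 0.65, 0.62, 0.64, 0.66]
fuzzyalign = [0.60, 0.62, 0.63, 0.59, 0.61, 0.62]

stat, p = friedmanchisquare(cpso, asmov, codi, sobom, fuzzyalign)
# with 5 matchers (4 degrees of freedom), the critical chi-square value at
# alpha = 0.05 is 9.488, so the null hypothesis is rejected when stat >= 9.488
print(stat >= 9.488, p < 0.05)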

In Table 2, the computed chi-square value is greater than 9.488, and therefore, the null hypothesis is rejected and HT is further carried out. Since cPSO has the lowest mean rank, it is set as the control matcher that is compared with the others. HT's $z$ value is the test statistic for comparing the $i$th and $j$th competitors, which is used to find the corresponding probability ($p$ value) from the table of the normal distribution; this $p$ value is then compared with an adjusted significance level. According to Table 3, we can state that cPSO statistically outperforms the other competitors in terms of f-measure at the 5% significance level.

6. Conclusion

AIoT aims at creating sensor networks that can communicate and process data, which can be technically implemented by using sensor ontologies to annotate sensor data with semantic meanings. To support the co-operations among ontology-based AIoT applications, it is necessary to integrate these sensor ontologies by finding the alignments between them. In this work, a novel matching framework is proposed, which aggregates three kinds of CSMs and employs an alignment extraction approach to determine the ontology alignment. We propose a compact PSO and use it to optimize the aggregating weights for the CSMs and a threshold for filtering the alignment, which ensures the quality of the results. The experimental results show that our proposal can effectively match different sensor ontologies, and the quality of the alignments obtained by cPSO statistically outperforms that of other state-of-the-art sensor ontology matching techniques.

In the future, we will further improve cPSO to match large-scale sensor ontologies and address the problem of Instance Coreference Resolution (ICR) in the sensor network domain, which requires matching large-scale sensor instances in the Linked Open Data cloud (LOD). We also want to extend cPSO to match ontologies in specific domains such as the biomedical and geographical domains. Particular strategies and techniques need to be proposed to improve the alignment's precision and recall, because these matching tasks require specific background knowledge bases and complex forms of alignment.

Data Availability

The data used to support this study can be found in http://oaei.ontologymatching.org.

Conflicts of Interest

The authors declare that they have no conflicts of interest in the work.

Acknowledgments

This work is supported by the Fujian province undergraduate universities teaching reform research project (No. FBJG20190156), the 2018 Program for Outstanding Young Scientific Researcher in Fujian Province University, the Program for New Century Excellent Talents in Fujian Province University (No. GY-Z18155), the Scientific Research Foundation of Fujian University of Technology (No. GY-Z17162), the Science and Technology Planning Project in Fuzhou City (No. 2019-G-40), the Foreign Cooperation Project in Fujian Province (No. 2019I0019), and the Guangxi Key Laboratory of Automatic Detecting Technology and Instruments (No. YQ20206).