Abstract

Knowledge graphs (KGs) are one of the most widely used techniques of knowledge organization and have been extensively applied in many fields related to artificial intelligence, for example, web search and recommendation. Entity alignment provides a useful tool for integrating multilingual KGs automatically. However, most existing studies rely on entity relationships and ignore the abundant information carried by entity attributes. This paper investigates cross-lingual entity alignment and proposes an iterative cotraining approach (CAREA) that trains a pair of independent models. The two models extract the attribute and the relation features of multilingual KGs, respectively. In each iteration, the two models alternate to predict a new set of potentially aligned entity pairs, which are further filtered by a dynamic threshold to strengthen the supervision of the two models. Experimental results on three real-world datasets demonstrate the effectiveness and superiority of the proposed method: CAREA improves performance by an absolute margin of at least 3.9% across all experimental datasets. The code is available at https://github.com/ChenBaiyang/CAREA.

1. Introduction

Knowledge graphs (KGs), which store machine-readable representations of factual knowledge, are becoming the basis for many applications such as web search (Google and Bing), recommendation (Amazon and eBay), and social networks (Facebook and LinkedIn). Multilingual KGs (e.g., DBpedia [1], YAGO [2], and ConceptNet [3]) are constructed in separate languages from various data sources and contain a wealth of complementary facts. Identifying equivalent entities across multilingual KGs helps bridge language gaps and improve the user experience of downstream cross-lingual applications. Hence, aligning the entities in multilingual KGs has recently attracted increasing research attention; this task is known as the problem of cross-lingual entity alignment.

Most existing entity alignment methods rely entirely on graph structures, while the abundant attribute information in KGs remains unexplored. The attributes of an entity expressed in different languages often share substantial semantic information, providing a potentially valuable view of the entities in multilingual KGs. However, it is nontrivial to capture and exploit such information for cross-lingual entity alignment. First, attribute information can be quite diverse across different KGs, most likely because different applications are built with distinct attribute concerns. Second, the semantic association of attributes cannot be modeled directly since the entities are expressed in different languages. Moreover, the simultaneous use of relationships and attributes across multilingual KGs remains an open challenge in the area of knowledge graphs.

Cotraining is a popular machine learning method in which two complementary models exploit a large number of unlabeled examples to iteratively bootstrap each other's performance [4, 5]. Cotraining can be readily applied to multilingual tasks since the data in these tasks have two or more views (i.e., subsets of features). It is also natural to employ cotraining for entity alignment across multilingual KGs, as the entity attributes and the graph structure naturally form two independent views of a KG. In the cotraining framework, each model is trained on one of the two views, under the assumption that either view is sufficient to make a prediction. In each iteration, the cotraining algorithm selects high-confidence samples ranked by each model to form new auto-labeled samples and then uses both the labeled data and the additional auto-labeled data to update the other model.

This paper introduces a cotraining based approach, CAREA, to learn embeddings from two independent views of knowledge (relationships and attributes) in multilingual KGs. CAREA iteratively trains two component models, an attribute-based model $M_a$ and a relation-based model $M_r$. $M_a$ extracts attribute features according to attribute occurrence frequencies and value data types and employs a Multilayer Perceptron (MLP) to transform both KGs into a unified vector space. $M_r$, on the other hand, adopts a graph attention mechanism to capture the multirelation characteristics of KGs. During each iteration of the cotraining process, the two models alternately predict a set of new potentially aligned entity pairs to strengthen the supervision of cross-lingual learning. Such collaborative predictions gradually improve the performance of each model. To improve the accuracy of aligned entity prediction, we further evaluate the predicted entity pairs through a dynamic threshold. Experimental results on three real-world datasets demonstrate the effectiveness and superiority of our proposed method CAREA.

The rest of this paper is organized as follows. Section 2 summarizes related work. Section 3 formally defines the research problem. Section 4 introduces the proposed approach. Section 5 presents the experimental results. Finally, we conclude this work in Section 6.

2. Related Work

2.1. KG Embedding

Embedding-based entity analysis approaches, which aim to project entities into low-dimensional embedding spaces, have demonstrated their effectiveness in modeling the semantic information of KGs. The KG embedding model TransE [6] interprets a relation as a translation from one entity to another. Such translation-based KG embedding models have shown their feasibility and were later improved by several subsequent studies, such as TransH [7], TransR [8], and TransD [9].
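As a brief illustration of the translation idea (the standard TransE formulation, stated here for completeness rather than taken from this paper), a triple $(h, r, t)$ is scored by the distance

$$f(h, r, t) = \lVert \mathbf{h} + \mathbf{r} - \mathbf{t} \rVert,$$

so that $\mathbf{h} + \mathbf{r} \approx \mathbf{t}$ holds for true triples, and the embeddings are trained to give true triples lower scores than corrupted ones.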

TransH and TransR improve TransE's handling of complex (one-to-many, many-to-one, and many-to-many) relations. TransD uses a dynamic matrix, rather than a fixed one, to transfer entities and relations. R-GCN [10] is a related model that incorporates relation type information by setting a transformation matrix for each relation. Some authors avoid translation-based approaches for KG embedding altogether [11–15]. A representative example is the study of Nathani et al. [15], which extended the graph attention mechanism (GAT) to capture entity and relationship features in the multihop neighborhood of a given entity. Other research exploits additional information in KGs to improve embedding performance. For example, reverse triples and relational paths are combined in PTransE [16], and categorical attributes such as gender and hobby are introduced in KR-EAR [17]. In addition, some works explore types, local structures, and global patterns in KG embedding [10, 18–20].

2.2. Entity Alignment

Entity alignment aims to automatically determine whether an entity pair from two different KGs refers to the same real-world entity. Traditional entity alignment methods exploit various features of KGs, such as the semantics of OWL properties [21], compatible neighbors and attribute values of entities [22], and relation structures [23].

Many recent studies have used embedding methods to deal with the alignment problem in KGs. MTransE [24] deploys three mechanisms (distance-based axis alignment, translation vectors, and linear transformations) to learn multilingual KG embeddings. An improved model, IPTransE [25], combines the advantages of TransE and PTransE to embed a single KG and then adds an iterative, parameter-sharing step for embedding multiple KGs. BootEA [26] improves on JAPE [27] with a bootstrapping strategy that iteratively labels potential entity alignments: the constructed training data are then used to learn alignment-oriented embeddings. MuGNN [28] learns alignment-oriented KG embeddings with a multichannel mechanism that encodes KGs via KG completion and entity pruning. NAEA [29] merges neighborhood subgraph-level information of entities and designs a neighborhood-aware attention representation mechanism for cross-lingual KGs. RDGCN [30] proposes a relation-aware dual-graph convolutional network that leverages relations through attentive interactions between the KG and its dual relation counterpart. MRAEA [31] learns cross-lingual entity embeddings by attending over an entity's neighbors and the meta semantics of its connecting relations.

Some literature on cross-lingual entity alignment has highlighted the role of both KG structures and attributes. JAPE [27] embeds the structures of different KGs into a uniform hidden space and uses the attribute correlations of KGs to refine the entity embeddings. However, JAPE's attribute component can significantly degrade the performance of its structural component when the attributes are heterogeneous or only weakly associated. Graph convolutional networks (GCNs) [32] are also employed in the study [33] to learn embeddings from both the structure and attribute information of entities for cross-lingual alignment.

3. Problem Definition

In a KG, facts are mainly stored in two types of triples, ⟨entity, attribute, value⟩ and ⟨entity, relation, entity⟩, which are called attribute triples and relation triples, respectively. This paper denotes a KG as $G = (E, R, A)$, where $E$ is the set of entities, $R$ is the set of relations, and $A$ represents the set of attributes in the KG. The attributes of an entity consist of a set of key-value pairs.

Definition 1. Cross-lingual entity alignment: let $G_1 = (E_1, R_1, A_1)$ and $G_2 = (E_2, R_2, A_2)$ be two arbitrary KGs in different languages. The entity pairs that refer to the same real-world object are called prealigned entities, denoted as $L = \{(e_1, e_2) \mid e_1 \in E_1, e_2 \in E_2\}$. The task of cross-lingual entity alignment is to find hidden aligned entity pairs based on the prealigned pairs $L$.

4. Proposed Approach

4.1. Overview

This section details the proposed model CAREA, which is built on the cotraining algorithm. Its framework is shown in Figure 1.

We construct two independent models: the attribute-based model $M_a$ and the relation-based model $M_r$. The advantage of the cotraining algorithm is that it reinforces the performance of the two models over the iterations. At each iteration, both models are retrained with the prealigned entity pairs and predict new pairs of potentially aligned entities. Subsequently, dynamic thresholds are used to further filter the predicted results. The method merges the remaining entity pairs into $L$ for the next iteration, until convergence.

4.2. Attribute-Based Model

In our scenario, the attributes of an entity consist of a number of key-value pairs, for example, name:Michael, where “name” is the attribute key, and “Michael” is the attribute value. For simplicity, a key-value pair is also called an attribute.

4.2.1. Attribute Extension

A critical problem of attribute representation is that some actual attributes may not be observed, since they are not explicitly built or captured by the crawlers. Therefore, we first extend the attributes of both KGs by using the prealigned entity pairs. Typically, given a pair of aligned entities, if one entity has an attribute in one KG, the corresponding entity in the other KG should also have this attribute. Based on this observation, we can add a key-value pair to an entity in one KG if its counterpart in the other KG has this key-value pair. Formally, the attributes of an entity $e$ are denoted by $A_e = \{a_1, a_2, \ldots\}$, where each $a_i$ is a key-value pair. For each entity $e_1$ in KG 1 with counterpart $e_2$ in KG 2, its attribute set can be extended to $A'_{e_1}$ by

$$A'_{e_1} = A_{e_1} \cup A_{e_2}.$$

Similarly, its counterpart in KG 2 can be extended into $A'_{e_2} = A_{e_2} \cup A_{e_1}$.
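To make the extension rule concrete, the following minimal Python sketch applies it to attribute dictionaries keyed by entity; the data structures and function name are illustrative assumptions, not the released implementation.

import copy

# Minimal sketch of attribute extension, assuming attributes are stored as
# sets of (key, value) pairs per entity.
def extend_attributes(attrs_kg1, attrs_kg2, prealigned_pairs):
    """Give both entities of each prealigned pair the union of their attributes."""
    attrs_kg1, attrs_kg2 = copy.deepcopy(attrs_kg1), copy.deepcopy(attrs_kg2)
    for e1, e2 in prealigned_pairs:
        union = attrs_kg1.get(e1, set()) | attrs_kg2.get(e2, set())
        attrs_kg1[e1] = union      # A'_{e1} = A_{e1} ∪ A_{e2}
        attrs_kg2[e2] = union      # A'_{e2} = A_{e2} ∪ A_{e1}
    return attrs_kg1, attrs_kg2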

4.2.2. Attribute Feature Representation

In multilingual KGs, the attributes are in different languages and cannot be directly compared. However, our observations show the following:

(1) The occurrence frequencies of equivalent attribute pairs (i.e., attribute keys) in multilingual KGs are approximately similar. For example, an entity representing a person in different KGs often has equivalent attributes such as name, date of birth, and nationality. Although the texts that describe these attributes are multilingual, their frequencies in different KGs are similar to the ratio of person entities to all KG entities.

(2) The values of an equivalent attribute pair in different KGs have the same data type. For example, both the English word "Michael" and the Chinese word "Mai Ke" are strings, and both "3.14" and "3.14159" are floating-point numbers.

Hence, this study represents the attributes of an entity by its attribute key frequencies and attribute value types. The description of entity attribute features can be illustrated briefly by a concrete example shown in Figure 2.

First, the attribute triples in each KG are merged into a set of key-value pairs, where the keys and values are used to represent the frequency and the type features, respectively. The frequency of an attribute $a$ is a floating-point number ranging from 0 to 1, calculated as follows:

$$f(a) = \frac{n_a}{|E|},$$

where $n_a$ is the number of occurrences of attribute $a$ in a KG and $|E|$ is the total number of entities in the KG. In this example, the frequencies of entity "Michael"'s nationality and birthdate are 0.2162 and 0.3351, respectively.

Second, we divide the frequency range (i.e., the interval [0, 1]) into a sequence of small real intervals, so that an attribute can be represented by the number of the interval containing its frequency. In this paper, a proportional (geometric) sequence is applied to split the frequency range, and the interval number of an attribute $a$ is computed by

$$I(a) = \left\lceil \log_q \frac{f(a)}{f_{\min}} \right\rceil,$$

where $q$ is the proportionality constant fixed in this paper and $f_{\min}$ is the least frequency of occurrence of an attribute in the KG. For example, "Nationality" and "Guo Ji" in Figure 2 both fall in interval 2, although their frequencies are different. This setting makes the interval representation more robust to small changes caused by noise, especially when attributes with different frequencies are merged into one interval.
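The two computations can be sketched in Python as follows; the choice $q = 2$ and the helper names are assumptions for illustration, since the paper's fixed value of $q$ is not reproduced here.

import math
from collections import Counter

def attribute_frequencies(entity_attrs, num_entities):
    """Frequency of each attribute key: its occurrence count divided by |E|."""
    counts = Counter(key for attrs in entity_attrs.values() for key, _ in attrs)
    return {key: c / num_entities for key, c in counts.items()}

def interval_number(freq, f_min, q=2.0):
    """Index of the geometric interval [f_min * q^(i-1), f_min * q^i) holding freq."""
    return max(1, math.ceil(math.log(freq / f_min, q)))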

The value type of an attribute is its data type. Following previous work [27], this study distinguishes four data types, that is, Integer, Double, DateTime, and String. We encode the value type by a one-hot vector whose dimension equals the number of data types. For example, the codes for the attribute values "America" and "1958-08-29" are $(0, 0, 0, 1)$ and $(0, 0, 1, 0)$, respectively.

As explained in the above two steps, the primary idea of attribute feature representation is to integrate the representations of the frequency and the type of an attribute. We combine the two representations into a sparse matrix as shown in Figure 2: each row of the matrix holds the one-hot value type vector, and the row index is the frequency interval number. On top of that, we reshape this matrix into a row feature vector $v_a$. In this way, the attribute features of an entity $e$ can be formed as the sum of its attribute vectors:

$$v_e = \sum_{a \in A_e} v_a.$$
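A sketch of this feature construction follows, reusing interval_number from the previous sketch; the number of intervals and the type ordering (Integer, Double, DateTime, String) are assumptions.

import numpy as np

TYPES = {"Integer": 0, "Double": 1, "DateTime": 2, "String": 3}

def entity_feature(attrs, freqs, f_min, num_intervals=32):
    """Sum, over an entity's attributes, of one-hot type vectors placed at the
    row given by each attribute's frequency interval; then flatten to a vector."""
    m = np.zeros((num_intervals, len(TYPES)))
    for key, value_type in attrs:   # attrs: iterable of (key, value-type) pairs
        row = min(interval_number(freqs[key], f_min) - 1, num_intervals - 1)
        m[row, TYPES[value_type]] += 1.0
    return m.reshape(-1)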

To reduce noise, we use an indicator function to transform the attribute vector of an entity into the following binary representation:

$$\tilde{v}_e[i] = \begin{cases} 1, & v_e[i] > 0, \\ 0, & \text{otherwise.} \end{cases}$$

The binary representation is then averaged with those of the entity's neighbors as

$$\bar{v}_e = \frac{1}{|N_e| + 1} \Big( \tilde{v}_e + \sum_{e' \in N_e} \tilde{v}_{e'} \Big),$$

where $N_e$ denotes the neighboring entities of $e$. Then, a three-layer MLP transforms the attribute vectors of the two KGs into a uniform vector space, making the equivalent entities in different KGs close to each other. The MLP output is taken as the attribute-based embedding of an entity. We use ReLU as the activation function, and batch normalization and dropout are added to improve performance. The details of the objective function are introduced in Section 4.4.
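A minimal PyTorch sketch of the three-layer MLP described above, with ReLU, batch normalization, and dropout; the layer widths are illustrative assumptions.

import torch.nn as nn

def make_attribute_mlp(in_dim, hidden_dim, out_dim, p_drop=0.3):
    """Three-layer MLP mapping attribute vectors into the unified entity space."""
    return nn.Sequential(
        nn.Linear(in_dim, hidden_dim), nn.BatchNorm1d(hidden_dim),
        nn.ReLU(), nn.Dropout(p_drop),
        nn.Linear(hidden_dim, hidden_dim), nn.BatchNorm1d(hidden_dim),
        nn.ReLU(), nn.Dropout(p_drop),
        nn.Linear(hidden_dim, out_dim),
    )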

4.3. Relationship-Based Model

In KGs, there are various types of relations describing entity associations, and they are crucial to aligning entities across KGs. Many previous works represent a relation as a transformation between the entities it connects. However, these methods tie the relation too closely to the entities [31], making it difficult to capture the features of multiple relations. Hence, this paper represents entities and relations separately, and their combinations are adopted as the inputs to a graph attention network (GAT) [34]. As a result, the two KGs are embedded into a unified vector space so that the equivalent entities in different KGs are close to each other. This study treats relations as undirected; that is, a triple $(h, r, t)$ is equivalent to $(t, r, h)$.

The idea of GAT is to calculate each entity's hidden representation in the two KGs by attending over its entity neighbors, following a self-attention strategy in the learning process. First, the embedding of each entity and each relation is randomly initialized; this study sets the embedding dimensions of entities and relations to the same value $d$. Second, we average each entity's embedding with those of its neighbors. Then, this averaged entity embedding and the averaged embedding of the entity's connecting relations are concatenated as the input to the GAT network:

$$x_e = \Big( \frac{1}{|N_e| + 1} \big( h_e + \sum_{e' \in N_e} h_{e'} \big) \Big) \,\Big\|\, \Big( \frac{1}{|R_e|} \sum_{r \in R_e} h_r \Big),$$

where $N_e$ represents $e$'s neighboring entities, $R_e$ represents the set of relations outward from $e$, and the notation $\|$ represents the concatenation operation. The attention coefficients can be calculated by

$$c_{ij} = a^\top \left[ x_i \,\|\, x_j \right],$$

where $c_{ij}$ indicates the weighted importance of neighboring entity $j$ to entity $i$ and $a$ is the shared attention weight vector.

Different from the original GAT, there is no weight matrix applied to the input features in equation (8). In this study, the coefficients over all adjacent entities are normalized using the softmax function after a LeakyReLU nonlinearity with a small negative-input slope. Such normalization makes the coefficients of different nodes easy to compare:

$$\alpha_{ij} = \mathrm{softmax}_j\big(\mathrm{LeakyReLU}(c_{ij})\big) = \frac{\exp\big(\mathrm{LeakyReLU}(c_{ij})\big)}{\sum_{k \in N_i} \exp\big(\mathrm{LeakyReLU}(c_{ik})\big)}.$$

A nonlinear ReLU is then applied to the weighted combination of the participating neighbors, yielding the output features of each entity:

$$h_i' = \mathrm{ReLU}\Big( \sum_{j \in N_i} \alpha_{ij} x_j \Big).$$

The training process is stabilized by adopting a multihead mechanism. Specifically, $K$ independent attention heads execute the transformation of equation (10), and their features are averaged to produce the output:

$$h_i' = \mathrm{ReLU}\Big( \frac{1}{K} \sum_{k=1}^{K} \sum_{j \in N_i} \alpha_{ij}^{k} x_j \Big),$$

where $k$ is the head index and $\alpha_{ij}^{k}$ represents the attention coefficient in the $k$-th head. This study also extends the attention mechanism to multihop neighborhood information by adding more layers, thus creating a more globally aware representation of the KG. Let $h_e^{(0)}, h_e^{(1)}, \ldots, h_e^{(l)}$ be the output features of entity $e$ from the 0-th (input features) to the $l$-th layer. We concatenate them to obtain the final output features of entity $e$:

$$z_e = h_e^{(0)} \,\|\, h_e^{(1)} \,\|\, \cdots \,\|\, h_e^{(l)}.$$
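The attention computation described above can be sketched densely in numpy as follows; the 0/1 adjacency format, the slope value 0.2, and the variable names are assumptions, and each node is assumed to have at least one neighbor (e.g., via self-loops).

import numpy as np

def attention_coeffs(X, adj, a, slope=0.2):
    """alpha[i, j] = softmax over j in N(i) of LeakyReLU(a^T [x_i || x_j])."""
    d = X.shape[1]
    c = (X @ a[:d])[:, None] + (X @ a[d:])[None, :]   # c[i, j] = a^T [x_i || x_j]
    e = np.where(c > 0, c, slope * c)                 # LeakyReLU
    e = np.where(adj > 0, e, -np.inf)                 # attend only over neighbors
    e = np.exp(e - e.max(axis=1, keepdims=True))      # numerically stable softmax
    return e / e.sum(axis=1, keepdims=True)

def gat_layer(X, adj, attn_vectors):
    """One multihead layer: average the K heads' weighted sums, then apply ReLU."""
    heads = [attention_coeffs(X, adj, a) @ X for a in attn_vectors]
    return np.maximum(np.mean(heads, axis=0), 0.0)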

4.4. Objective Function

As was mentioned in the previous sections, the two models both provide embeddings of the entities of the two KGs from different views. This section uses the same objective function to optimize both of them. Following previous work [33], the Manhattan distance is employed as the similarity measure. The similarity of $e_1$ and $e_2$ in the joint vector space is measured by the distance

$$d(e_1, e_2) = \big\| z_{e_1} - z_{e_2} \big\|_1.$$

To find the counterpart of an entity $e_1$, the distances to all entities in $E_2$ are calculated in the same way, and the nearest one is chosen as $e_1$'s equivalent. On top of that, we adopt the following margin-based loss function, since it balances positive and negative samples and ensures lower scores for the positive ones:

$$\mathcal{L} = \sum_{(e_1, e_2) \in L} \sum_{(e_1', e_2')} \max\big( 0,\; d(e_1, e_2) - d(e_1', e_2') + \gamma \big),$$

where $\gamma$ is a margin hyperparameter and $e_1'$ and $e_2'$ are the negative counterparts of $e_1$ and $e_2$, respectively. In this work, the negative counterparts are randomly selected from $E_1$ and $E_2$. Adam [35] is adopted to minimize the loss function.
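A compact PyTorch sketch of this loss under the stated Manhattan distance; the tensor shapes and the negative sampling interface are assumptions.

import torch

def alignment_loss(z1, z2, z1_neg, z2_neg, gamma=3.0):
    """Hinge loss: aligned pairs should be at least gamma closer (L1) than negatives."""
    d_pos = (z1 - z2).abs().sum(dim=1)           # distances of aligned pairs
    d_neg = (z1_neg - z2_neg).abs().sum(dim=1)   # distances of negative pairs
    return torch.relu(d_pos - d_neg + gamma).mean()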

4.5. Cotraining Algorithm

In this study, the cotraining of the attribute-based model $M_a$ and the relation-based model $M_r$ is conducted iteratively. The two components alternately take turns to train and to predict new potentially aligned entity pairs at each iteration, until neither of them obtains new pairs. The prediction is based on the cosine similarity of entities in the united vector space: a new pair is proposed by searching, for an entity in one KG, its nearest neighbor (NN) in the other KG. It is worth noting that, in most cases, NNs are asymmetric. For example, although $e_2$ in $G_2$ is the most similar entity to $e_1$ in $G_1$, there may be another entity in $G_1$ that is closer to $e_2$. Thus, the newly predicted entity pairs are required to be bidirectional nearest neighbors.
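The bidirectional nearest-neighbor filter can be sketched as follows; embeddings are L2-normalized so the dot product equals cosine similarity, and the function name is an assumption.

import numpy as np

def mutual_nearest_pairs(Z1, Z2):
    """Keep (i, j) only if j is i's NN in G2 and i is j's NN in G1."""
    Z1 = Z1 / np.linalg.norm(Z1, axis=1, keepdims=True)
    Z2 = Z2 / np.linalg.norm(Z2, axis=1, keepdims=True)
    sim = Z1 @ Z2.T                                      # cosine similarity matrix
    nn12, nn21 = sim.argmax(axis=1), sim.argmax(axis=0)
    return [(i, j, sim[i, j]) for i, j in enumerate(nn12) if nn21[j] == i]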

4.5.1. Dynamic Similarity Threshold

We further evaluate the predicted potentially aligned entity pairs by dynamically adjusting a threshold in each iteration. That is to say, only the entity pairs whose cosine similarity exceeds a threshold $\theta(n)$ are added to the aligned pair set $L$. As a higher similarity threshold implies higher precision, we set higher thresholds for earlier iterations. However, a high threshold may also limit the model's capability to propose a sufficient number of aligned entity pairs, so lower thresholds are used in later iterations. The design of the threshold function is flexible; in this paper, we use a linear threshold function

$$\theta(n) = \theta_0 - k \cdot n,$$

where $\theta_0$ is the initial threshold, $n$ is the iteration number, and $k$ is a coefficient controlling the rate of change per iteration. To control the precision of each component model, we set different threshold parameters for the two models in our experiments. The detailed cotraining procedure of CAREA is given in Algorithm 1.

Input: Two KGs to be aligned, $G_1$ and $G_2$, a set $L$ of prealigned entity pairs, and the parameters of the threshold functions.
Output: The parameters of $M_a$ and $M_r$.
(1) Initialize the iteration number $n = 0$;
(2) repeat
(3)  Reinitialize $M_a$ and $M_r$;
(4)  Train $M_a$ based on $L$;
(5)  Predict a set $P_a$ of bidirectional-NN entity pairs with $M_a$;
(6)  Filter $P_a$ by the threshold $\theta_a(n)$;
(7)  $L \leftarrow L \cup P_a$;
(8)  Train $M_r$ based on $L$;
(9)  Predict a set $P_r$ of bidirectional-NN entity pairs with $M_r$;
(10) Filter $P_r$ by the threshold $\theta_r(n)$;
(11) $L \leftarrow L \cup P_r$;
(12) $n \leftarrow n + 1$;
(13) until no new entity pairs are obtained;
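Putting the pieces together, the following Python sketch mirrors Algorithm 1 at a high level, reusing mutual_nearest_pairs from the earlier sketch; train_model, the model interface, and the parameter values are placeholders rather than the authors' actual implementation.

def cotrain(M_a, M_r, G1, G2, L, train_model, theta0=(0.95, 0.9), k=0.02):
    """Alternate the two models, filtering predictions by a shrinking threshold."""
    n = 0
    while True:
        new_pairs = set()
        for model, t0 in ((M_a, theta0[0]), (M_r, theta0[1])):
            model.reinitialize()
            train_model(model, G1, G2, L)             # margin loss of Section 4.4
            Z1, Z2 = model.embed(G1), model.embed(G2)
            threshold = t0 - k * n                    # linear dynamic threshold
            accepted = {(i, j) for i, j, s in mutual_nearest_pairs(Z1, Z2)
                        if s >= threshold}
            new_pairs |= accepted - L
            L |= accepted
        n += 1
        if not new_pairs:                             # neither model found new pairs
            return M_a, M_r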

5. Experiments

5.1. Datasets

This section uses a popular public entity alignment dataset, DBP15K [27], to evaluate the performance of CAREA. DBP15K contains three cross-lingual subsets built from DBpedia: ZH-EN, JA-EN, and FR-EN. Each of the three subsets contains two KGs in different languages, for example, FR-EN for French and English. Their statistics are displayed in Table 1.

5.2. Experiment Settings

Following previous work [29], we adopt two evaluation metrics: (1) Hits@k: the proportion of correctly aligned entities ranked in the top $k$ candidates; (2) Mean Reciprocal Rank (MRR): the average of the reciprocal ranks of the results. Higher Hits@k and MRR scores indicate better alignment performance. The two metrics are calculated as follows:

$$\mathrm{Hits@}k = \frac{1}{|T|} \sum_{i \in T} \mathbb{1}\big(\mathrm{rank}_i \le k\big), \qquad \mathrm{MRR} = \frac{1}{|T|} \sum_{i \in T} \frac{1}{\mathrm{rank}_i},$$

where $\mathrm{rank}_i$ is the position of the true counterpart of test pair $i$ in the returned list and $T$ is the set of tested entity pairs. $k = 1$ and $k = 10$ are adopted in our experiments.
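For concreteness, both metrics reduce to simple computations over the rank positions; a short sketch follows, with `ranks` holding the 1-based positions of the true counterparts.

import numpy as np

def hits_at_k(ranks, k):
    """Fraction of test pairs whose true counterpart ranks in the top k."""
    return float(np.mean(np.asarray(ranks) <= k))

def mean_reciprocal_rank(ranks):
    """Average reciprocal rank over all test pairs."""
    return float(np.mean(1.0 / np.asarray(ranks)))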

Our method is compared with the baselines under the same evaluation metrics and dataset splits. The experiments randomly use 30% of the prealigned entity pairs as training data and the remaining 70% for testing. Considering the asymmetry of the nearest-neighbor relation across KGs, the average score of both alignment directions (e.g., ZH→EN and EN→ZH) is reported. Each experiment instance is run five times independently, and the average performance is reported as the final result. The same settings are applied to all experimental models unless otherwise stated. The hidden dimensions for attributes, entities, and relations are set to the same value $d$. The margin parameter $\gamma$, the dropout rate, and the learning rate of Adam are 3, 0.3, and 0.005, respectively. For the relation-based model $M_r$, we fix the number of attention heads $K$ and the depth $l$ of the GAT layers. For the cotraining process, we take the results of CAREA's third iteration as its final performance. The threshold coefficient $k$ is the same for both model components, and the initial thresholds of the two components are empirically set to 0.95 and 0.9, respectively.

5.3. Baselines

To demonstrate the advantage of our method, we compare it with the following baselines:

(i) MTransE [24]: a structure-based model for multilingual KG embeddings that provides a simple and automated solution. The model characterizes monolingual relations and deploys three different techniques to represent cross-lingual transitions, namely, axis calibration, translation vectors, and linear transformations.

(ii) JAPE [27]: an attribute-preserving embedding model that incorporates relation and attribute embeddings for entity alignment.

(iii) GCN-Align [33]: employs GCNs to learn embeddings from both the structure and attribute information of entities for cross-lingual KG alignment.

(iv) BootEA [26]: adopts a bootstrapping strategy, which iteratively labels potential entity alignments as training data and leverages them for learning alignment-oriented embeddings.

(v) MuGNN [28]: learns alignment-oriented KG embeddings by robustly encoding two KGs via KG completion and entity pruning.

(vi) NAEA [29]: incorporates neighborhood subgraph-level information of entities and designs a neighborhood-aware attentional representation mechanism on multilingual KGs.

The performance figures of the above baselines come from the results reported in their papers. We also evaluate the effectiveness of the component models of our approach:

(i) Attribute-based model: this model, denoted as CAREA-a, drops the structure embedding component to assess the effect of attribute embedding alone. In other words, only the attribute features are used to align entities, without the cotraining strategy.

(ii) Structure-based model: we also estimate the performance of the network structure embedding component, which uses only structure features and ignores attribute features when aligning entities. This model is analogously denoted as CAREA-s.

5.4. Experiment Results
5.4.1. Overall Performance

The compared methods fall into two groups according to their feature categories. One group is purely based on KG structures, including MTransE, BootEA, MuGNN, NAEA, and CAREA-s. The other leverages both entity attributes and relations for entity alignment, including JAPE, GCN-Align, and CAREA. Table 2 summarizes the overall results of all compared methods on the three datasets.

In the structure-based group, our model CAREA-s outperforms MTransE by at least 27.3% on the three datasets and exceeds MuGNN by at least 8.4%. These comparison results demonstrate the effectiveness of our structure-based component. In the other group, CAREA outperforms JAPE and GCN-Align by at least 28.6% across all datasets, demonstrating our approach's superiority in leveraging both entity attributes and KG structures for entity alignment. Finally, CAREA ranks best among all competing approaches across all datasets; for example, it surpasses NAEA and BootEA by at least 3.9% and 6.0%, respectively.

The other proposed component model, CAREA-a, is excluded from the above comparisons since it is the only one that relies solely on attribute information to align entities, and it achieves lower scores; for example, it reports scores of only 22.1% and 51.8% on one of the datasets. This is mainly because of the attribute heterogeneity across multilingual KGs and because many attributes are not explicitly built or captured by the crawlers. Although CAREA-a does not perform as well as the structure-based approaches, it provides another view for entity alignment and improves the performance of our method on the KG alignment task.

5.4.2. Effects of Cotraining Algorithm

This part examines the contribution of the cotraining algorithm by presenting CAREA's performance at each iteration of the cotraining process. The results are shown in Figure 3. The trends reveal a gradual and similar increase in all evaluation metrics for both component models, verified on all three datasets. The iterative cotraining algorithm significantly improves performance, with an absolute increase of at least 10.5% across all experimental datasets. Both the attribute model and the structure model are enhanced by each iteration. After 3 to 4 iterations, the component models' performance becomes stable.

5.4.3. Parameter Sensitivity Analysis

This part investigates the sensitivity of the proposed CAREA to three primary parameters: (1) the proportion of prealigned entity pairs used for training, (2) the feature dimension $d$, and (3) the margin parameter $\gamma$ of the objective function.

Sensitivity to data proportions: we run CAREA with training proportions from 10% to 50% in steps of 10%. Figure 4 illustrates the change in performance for the different proportions. As expected, the results on all datasets improve as the proportion increases; the amount of training data is a significant factor, since more training data provide more information linking the cross-lingual KGs. Figure 4 also shows that CAREA performs encouragingly when using only 10% of the aligned entities as training data, for example, still reaching scores of 56.2% and 82.4% on one of the datasets. Therefore, CAREA is expected to adapt well to annotation-constrained scenarios.

Sensitivity to the feature dimension $d$: Figure 5 depicts the sensitivity of the model performance to different feature dimensions. The performance of CAREA remains stable as the dimension increases on all datasets. This indicates that a high-dimensional feature space helps preserve entity information and improve entity alignment performance; however, a larger dimension necessarily consumes more computing resources. Hence, we choose $d$ to balance efficiency against effectiveness.

Sensitivity to the margin $\gamma$: the model performance obtained by setting different margin parameters (from 1 to 4) in the objective function is shown in Figure 6. The performance remains steady, varying by at most 2.5% across all datasets. Therefore, CAREA stays stable when $\gamma$ varies within a reasonable range.

6. Conclusion and Future Work

The purpose of the present research was to investigate the cross-lingual entity alignment problem in KGs. This study constructed a cotraining based approach, CAREA, to learn entity embeddings from two independent views of knowledge (relationships and attributes). CAREA is built from two component models, $M_a$ and $M_r$, which extract the attribute and the relation information, respectively. In each iteration, the two models alternately take turns in a train-and-predict process, which gradually improves each model's performance. Experiments on three popular datasets confirm the effectiveness and superiority of CAREA on the entity alignment task. The insights on model construction gained from this study may assist complex multilingual and cross-domain knowledge organization and analysis. Future work will seek to extend the CAREA method to other applications, such as link prediction, information extraction, and entity classification.

Data Availability

The data used to support the findings of this study are publicly available at https://github.com/ChenBaiyang/CAREA or https://github.com/nju-websoft/JAPE.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the Social Science Planning Project of Sichuan Province (no. SC20TJ020), the National Natural Science Foundation (nos. 61902324, 11426179, and 61872298), the Science and Technology Program of Sichuan Province (nos. 2020JDRC0067, 2016JY0244, 2017JQ0059, 2019GFW131, 2020JY, and 2020GFW), the Innovation Fund of Postgraduate, Xihua University (nos. ycjj2019021 and ycjj2020023), the Fund Project of Chengdu Science and Technology Bureau (no. 2017-RK00-00026-ZF), the Scientific Research Fund of Sichuan Provincial Education Committee (nos. 15ZB0134 and 17ZA0360), the Foundation of Cyberspace Security Key Laboratory of Sichuan Higher Education Institutions (no. sjzz2016-73), the Sichuan Youth Science and Technology Innovation Research Team (no. 2021X), and the Open Fund Project of Xihua University (no. 20170410143123). Xianyong Li is gratefully acknowledged for his discussions with the authors.