Abstract
Recently, knowledge graph embedding methods have attracted considerable research interest due to their effectiveness and robustness in knowledge representation. However, existing methods still have limitations. On the one hand, translation-based representation models focus on devising translation principles to represent knowledge from a global perspective, but they fail to learn the various types of relational facts discriminatively. They are prone to entity congestion for complex relational facts in the embedding space, which reduces the precision of the representation vectors associated with entities. On the other hand, parallel subgraphs extracted from the original graph are used to learn local relational facts discriminatively. However, the subgraph extraction is likely to damage some relational facts of the original knowledge graph. Thus, previous methods are unable to learn local and global knowledge representations uniformly. To that end, we propose a multiview translation learning model, named MvTransE, which learns relational facts from global-view and local-view perspectives, respectively. Specifically, we first construct multiple parallel subgraphs from an original knowledge graph by considering entity semantic and structural features simultaneously. Then, we embed the original graph and the constructed subgraphs into the corresponding global and local feature spaces. Finally, we propose a multiview fusion strategy to integrate the multiview representations of relational facts. Extensive experiments on four public datasets demonstrate the superiority of our model in knowledge graph representation tasks compared to state-of-the-art methods.
1. Introduction
Knowledge graphs [1] are directed graphs consisting of entities as nodes and relations between entities as edges. Each relational fact of a knowledge graph is stored as a triplet (head, relation, tail), abbreviated as (h, r, t), where h and t represent the head and tail entities, respectively, and r is a relation from h to t. With the advent of the big data era, the scale of knowledge graphs continues to grow, and diverse large-scale knowledge graphs (e.g., WordNet [2] and Freebase [3]) have appeared. Despite their large scale, current knowledge graphs are still far from complete. For example, 75% of people in Freebase lack nationality information and 71% lack a birthplace [4]. Therefore, it is necessary to design approaches that automatically complete or infer the missing relational facts of an existing knowledge graph.
Recently, embedding-based approaches, which project the entities and relations of knowledge graphs into a dense, continuous, and low-dimensional vector space, have shown strong feasibility and robustness in knowledge graph completion. Among the existing approaches, translation-based approaches have attracted considerable research interest due to their effectiveness and robustness in knowledge representation. The first translation-based method, TransE [5], was proposed by Bordes et al. For each triplet (h, r, t), TransE treats the relation r as a translation operation from h to t in a vector space. If (h, r, t) holds, the translation principle $\mathbf{h} + \mathbf{r} \approx \mathbf{t}$ should be satisfied in the vector space, where $\mathbf{h}$, $\mathbf{r}$, and $\mathbf{t}$ are the vector representations of h, r, and t. TransE is a simple yet effective translation model for 1-to-1 simple relational facts, in which each head entity connects to only one tail entity via a specific relation. It achieves state-of-the-art performance on link prediction. However, the translation principle is too rigid to deal with complex relational facts, including 1-to-N, N-to-1, and N-to-N facts. Technically, it may cause spatial congestion of entities when many head entities (or tail entities) are projected onto a single point of the vector space.
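To make the congestion argument concrete, the following short derivation considers a 1-to-N relation with two tails. It is a worked illustration only; the tail symbols $\mathbf{t}_1$ and $\mathbf{t}_2$ are introduced here for the example and do not appear elsewhere in the paper.

```latex
% Assume (h, r, t_1) and (h, r, t_2) both hold.
% Applying the translation principle to each triplet gives:
\begin{align*}
\mathbf{h} + \mathbf{r} &\approx \mathbf{t}_1, \\
\mathbf{h} + \mathbf{r} &\approx \mathbf{t}_2, \\
\Rightarrow \quad \mathbf{t}_1 &\approx \mathbf{t}_2 .
\end{align*}
```

In other words, a well-fitted TransE model pushes all tails of the same (h, r) pair toward a single point, which is the spatial congestion described above. The same argument applies to the heads of an N-to-1 relation.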
To overcome the weakness of TransE in representing complex relational facts, a series of improved models were proposed, such as TransH [6], TransR [7], TransD [8], and TranSparse [9]. Essentially, these methods focus on designing various translation principles to learn complex relational facts more precisely. However, they still embed a complete knowledge graph into a single vector space from a global perspective, and they fail to learn the various types of relational facts discriminatively. That is, each entity and relation of a knowledge graph is learned as a single representation vector in its space. Hence, vector congestion still occurs in the translation spaces. As a real-world example, the head entities of the triplets (Obama, President, the United States) and (Trump, President, the United States), i.e., Obama and Trump, are projected close together in the vector space because they share the same social status, although they differ considerably in other respects.
To solve this problem, puTransE [10] splits a knowledge graph into multiple parallel spaces in the form of subgraphs from a local perspective and achieves the spatial sparsification of complex relational facts. Specifically, in puTransE, entities and relations have separate feature representations in different parallel spaces. This approach improves the ability to learn complex relational facts because it avoids the spatial congestion of entities in such facts. However, puTransE still has two shortcomings. First, puTransE imposes excessive sparseness on simple relational facts during the parallel space generation, because it applies spatial sparsification not only to complex relational facts but also to simple ones. Consequently, it can hardly learn complete vector representations of simple relational facts within a single subspace. Second, puTransE randomly selects local knowledge to construct multiple parallel spaces, which is prone to impair relational facts of the original knowledge graph. For example, the original graph may contain a golden triplet that cannot be formed from the entities and relations of any parallel space.
In summary, all of the above methods embed a knowledge graph from a single perspective, i.e., from either a local view or a global view. Thus, they fail to learn local and global knowledge representations uniformly. To that end, we borrow the idea of multiview learning methods [11, 12] to propose a multiview translation learning model, named MvTransE, which embeds relational facts from the global view and the local view concurrently. In detail, we first generate multiple parallel subgraphs from the semantic and structural perspectives of entities to accurately capture the local-view knowledge of the knowledge graph. Then, the original knowledge graph and the generated parallel subgraphs are embedded into global-view and local-view spaces, respectively. Finally, we propose a multiview fusion strategy to integrate the multiview representations of relational facts. We outline the main contributions of this paper as follows:
(1) We incorporate the idea of multiview learning into our model MvTransE, which can precisely learn relational facts from both global and local views.
(2) Our model extracts local knowledge from semantic and structural perspectives to construct multiple parallel subgraphs, so as to solve the entity spatial congestion problem by learning local-view representations of relational facts.
(3) MvTransE applies a multiview fusion strategy to combine the global-view and local-view representations of the knowledge graph, which effectively compensates for relational facts that are missing from the parallel spaces.
(4) Extensive experimental results demonstrate that our method outperforms state-of-the-art models in two knowledge graph completion tasks.
2. Related Work
Since the appearance of knowledge graphs, numerous researchers have studied methods to represent the relational facts of these graphs. Initially, embedding-based models such as Structured Embedding (SE) [13], Semantic Matching Energy (SME) [14, 15], Latent Factor Model (LFM) [16], and Neural Tensor Network (NTN) [17] achieved considerable performance in knowledge representation, but they fail to cope with large-scale graphs due to their computational complexity. Recently, translation-based models have attracted much attention due to their effective and robust representation abilities.
TransE [5] is the first translation-based method, which treats a relation r as a translation from h to t for a triplet (h, r, t). Hence, TransE defines the scoring function as $f_r(h, t) = \|\mathbf{h} + \mathbf{r} - \mathbf{t}\|_{l_1/l_2}$, where $\|\cdot\|_{l_1/l_2}$ stands for the $l_1$-norm or the $l_2$-norm. During model training, if a triplet (h, r, t) holds, the translation principle $\mathbf{h} + \mathbf{r} \approx \mathbf{t}$ should be satisfied in the vector space, as illustrated in Figure 1. That is, TransE keeps the translated vector ($\mathbf{h} + \mathbf{r}$) close to the tail vector $\mathbf{t}$. TransE achieves remarkable performance in representing simple relational facts, i.e., 1-to-1 triplets. However, it has limitations in dealing with complex relational facts, including 1-to-N, N-to-1, and N-to-N triplets, due to the rigid translation principle.
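As a small illustration of the TransE scoring function, the sketch below computes the distance between the translated head and the tail using plain Python lists. The function name `transe_score` and the toy vectors are assumptions made for this example, not part of any released implementation.

```python
import math


def transe_score(h, r, t, norm=1):
    """Return ||h + r - t|| under the l1 norm (norm=1) or l2 norm (norm=2).

    h, r, t are equal-length lists of floats. A lower score means the
    triplet fits the translation principle h + r ~ t more closely.
    """
    diffs = [abs(hi + ri - ti) for hi, ri, ti in zip(h, r, t)]
    if norm == 1:
        return sum(diffs)
    return math.sqrt(sum(d * d for d in diffs))


# Toy vectors (made-up numbers): t_good equals h + r, t_bad does not.
h = [1.0, 2.0]
r = [0.5, -1.0]
t_good = [1.5, 1.0]
t_bad = [3.0, 3.0]

good = transe_score(h, r, t_good)        # distance of a well-fitted triplet
bad_l1 = transe_score(h, r, t_bad)       # l1 distance of a poor fit
bad_l2 = transe_score(h, r, t_bad, 2)    # l2 distance of the same poor fit
```

A triplet that satisfies the translation principle exactly receives a score of zero, and the score grows as the tail moves away from the translated head.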
TransH [6] tries to solve the problem of TransE by allowing an entity to have distinct representations when it is involved in different relations. Specifically, for each triplet (h, r, t), TransH projects h and t onto a hyperplane specific to the relation r to obtain the projected vectors $\mathbf{h}_\perp$ and $\mathbf{t}_\perp$. The scoring function is defined as $f_r(h, t) = \|\mathbf{h}_\perp + \mathbf{r} - \mathbf{t}_\perp\|_{l_1/l_2}$.
TransR/CTransR [7] models entities and relations in different vector spaces, i.e., the entity vector space and the relation vector space. For each relation r, it sets a projection matrix $\mathbf{M}_r$ to map the entity vectors from the entity vector space to the relation vector space, i.e., $\mathbf{h}_r = \mathbf{M}_r \mathbf{h}$ and $\mathbf{t}_r = \mathbf{M}_r \mathbf{t}$. Its scoring function is $f_r(h, t) = \|\mathbf{h}_r + \mathbf{r} - \mathbf{t}_r\|_{l_1/l_2}$.
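TransD [8] considers the diversity of entities and relations simultaneously. It uses the product of two vectors of an entity-relation pair to replace a projection matrix, i.e., $\mathbf{M}_{rh} = \mathbf{r}_p \mathbf{h}_p^\top + \mathbf{I}$ and $\mathbf{M}_{rt} = \mathbf{r}_p \mathbf{t}_p^\top + \mathbf{I}$, where $\mathbf{h}_p$, $\mathbf{t}_p$, and $\mathbf{r}_p$ are projection vectors. TransD is more extensible and can be applied to large-scale knowledge graphs. Its scoring function is $f_r(h, t) = \|\mathbf{M}_{rh} \mathbf{h} + \mathbf{r} - \mathbf{M}_{rt} \mathbf{t}\|_{l_1/l_2}$.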
TranSparse [9] considers the heterogeneity and imbalance of entities and relations in a knowledge graph, which are generally ignored by previous works. TranSparse constructs adaptive sparse matrices $\mathbf{M}_r^h(\theta_r^h)$ and $\mathbf{M}_r^t(\theta_r^t)$, instead of projection matrices, to concurrently prevent the overfitting of simple relational facts and the underfitting of complex relational facts. Its scoring function is $f_r(h, t) = \|\mathbf{M}_r^h(\theta_r^h) \mathbf{h} + \mathbf{r} - \mathbf{M}_r^t(\theta_r^t) \mathbf{t}\|_{l_1/l_2}$.
FT [18] and DT [19] design flexible translation principles and dynamic translation principles, respectively. To some extent, they improve the ability to handle complex relational facts. Essentially, they focus on elaborating various translation principles to learn complex relational facts more accurately. TransAt [17] and a GAN-based framework [20] use attention mechanisms and generative adversarial networks, respectively, to improve model performance. However, all of the above methods embed the complete knowledge graph into a uniform vector space from a single perspective and thus fail to solve the space congestion problem thoroughly.
puTransE [21] is an online and robust improvement of TransE that addresses hyperparameter sensitivity, the spatial congestion of entities and relations, and the processing of dynamic knowledge graphs. It adopts multiple parallel spaces to learn the vectors of entities and relations, thus avoiding spatial congestion in complex relational facts. As a result, puTransE achieves state-of-the-art performance on the link prediction task. However, puTransE still has two weaknesses that limit its performance. First, puTransE causes excessive sparseness of simple relational facts during the random parallel space generation. Its knowledge extraction covers not only complex relational facts but also simple relational facts, which are already sparse. Thus, it probably cannot learn complete vector representations of simple relational facts from a single subgraph. Second, puTransE randomly selects local knowledge to construct multiple parallel spaces, which is prone to impair the original facts of the knowledge graph. For example, the original graph may contain a golden triplet, i.e., a positive sample, that cannot be formed from the entities and relations of any parallel space. This situation decreases the relational fact prediction accuracy of puTransE.
3. Our Method
In this section, we introduce the details of MvTransE, which embeds relational facts from the global view and the local view, respectively. The workflow of MvTransE mainly consists of three steps. The first step (Section 3.1) generates multiple parallel subgraphs so as to accurately extract particular local relational facts of the knowledge graph. The second step (Section 3.2) discriminatively embeds the knowledge graph into multiple parallel spaces to acquire multiview representations of relational facts. The last step (Section 3.3) integrates the multiple versions of knowledge representations, i.e., it fuses the local-view and global-view representations of entities and relations. Figure 2 presents the multiview knowledge learning process of MvTransE.
3.1. Subgraph Generation
The subgraph generation aims to extract local relational facts from different perspectives, so as to solve spatial congestion of entities by sparsely embedding entities and relations into different parallel vector spaces in the following graph embedding step. Therefore, we construct multiple parallel subgraphs based on different relations of the knowledge graph. Each subgraph mainly contains the local relational facts selected from a specific relation.
Initially, we define the symbols used in the subgraph generation process. We define a knowledge graph as G = (E, R, T), where E and R denote the entity set and the relation set of graph G, respectively, and T ⊆ E × R × E represents the triplet set of G. The output of the process is the final set of generated subgraphs, and each subgraph in this set has its own entity set and relation set. Algorithm 1 demonstrates the details of the subgraph generation, which mainly consists of the following two steps.
3.1.1. Semantics-Related Entity Selection
To accurately learn local relational facts, we first select relation-relevant entities for a subgraph to ensure the semantic consistency of its knowledge as much as possible. That is, the entities in a subgraph should be semantically related to each other through a specific relation. We randomly select a relation r from the relation set R and then generate the entity set E_{r}, which consists of the entities connected by r. Since E_{r} is generated via the relation r, r is regarded as the semantic center of the current subgraph.
3.1.2. Structure-Related Subgraph Expansion
In order to learn the latent knowledge associated with r more comprehensively, we need to expand each subgraph according to the local graph structure of the entities in the set E_{r}. This step ensures that a generated subgraph contains the semantic and structural features of local relational facts simultaneously. Specifically, we first randomly select an entity e_{i} from E_{r} as a starting entity from which to expand the subgraph. Then, we randomly select a triplet whose head or tail entity is the starting entity e_{i} and add its other entity (head or tail) to the subgraph. We obtain the local structure information of E_{r} with respect to G by repeating these two operations n_{s} times. Because entities and triplets are selected randomly during the expansion, each subgraph expanded from E_{r} may include different relational facts, which makes the generated subgraphs slightly different from each other in terms of semantics and structure. Consequently, these random operations allow MvTransE to learn local relational facts discriminatively, from multiple perspectives.

Besides, to make each subgraph focus on particular relational facts, we need to control the scale of the subgraphs to avoid extracting excessive irrelevant facts. We set the hyperparameter n_{t} to control the number of triplets in a subgraph, the hyperparameter n_{s} to control the expansion speed of each subgraph, a further hyperparameter to control the maximum number of iterations of the triplet selection, and the hyperparameter n to control the number of generated subgraphs.
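The two steps of Section 3.1 can be sketched as follows. This is a simplified illustration under stated assumptions, not a transcription of Algorithm 1: the function name `generate_subgraph`, the parameter `max_iters` (standing in for the maximum-iteration hyperparameter), the decision to let expansion continue from newly added entities, and the choice to store the sampled triplets directly are all hypothetical.

```python
import random


def generate_subgraph(triplets, n_s, n_t, max_iters, rng):
    """Sample one subgraph (a set of triplets) around a random relation.

    triplets  : list of (head, relation, tail) tuples of the full graph
    n_s       : expansion steps per iteration (the "expansion speed")
    n_t       : maximum number of triplets kept in the subgraph
    max_iters : hypothetical cap on the number of expansion iterations
    rng       : random.Random instance, passed in for reproducibility
    """
    # Semantics-related entity selection: pick a relation r as the
    # semantic center and collect the entities it connects (E_r).
    relations = sorted({rel for _, rel, _ in triplets})
    r = rng.choice(relations)
    entities = {e for h, rel, t in triplets if rel == r for e in (h, t)}

    # Structure-related expansion: pick a known entity, pick one of its
    # incident triplets at random, and add that triplet and its entities.
    # Assumption: newly added entities may serve as later starting points.
    subgraph = set()
    for _ in range(max_iters):
        for _ in range(n_s):
            if len(subgraph) >= n_t:
                return r, subgraph
            start = rng.choice(sorted(entities))
            incident = [tr for tr in triplets if start in (tr[0], tr[2])]
            h, rel, t = rng.choice(incident)
            subgraph.add((h, rel, t))
            entities.update((h, t))
    return r, subgraph


# Toy knowledge graph (made-up facts) used only to exercise the sketch.
kg = [
    ("cat", "hypernym", "feline"),
    ("feline", "hypernym", "mammal"),
    ("dog", "hypernym", "canine"),
    ("cat", "has_part", "tail"),
]
center, sub = generate_subgraph(kg, n_s=2, n_t=3, max_iters=5,
                                rng=random.Random(0))
```

Calling the function n times with different random states would yield the n parallel subgraphs; because the selection is random, two subgraphs built around the same relation can still contain different facts.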
3.2. Graph Embedding
The goal of this step is to obtain the global-view and local-view representations of entities and relations. Hence, we embed the original knowledge graph G and the subgraphs to learn global knowledge and local knowledge, respectively. In each vector space, we define the scoring function $f_r(h, t)$ of each triplet (h, r, t) as
$$f_r(h, t) = \|\mathbf{h} + \mathbf{r} - \mathbf{t}\|_{l_1/l_2},$$
where $\mathbf{h}$, $\mathbf{r}$, and $\mathbf{t}$ are the vector representations of h, r, and t, and $\|\cdot\|_{l_1/l_2}$ is the $l_1$-norm or $l_2$-norm distance.
In MvTransE, we use the margin-based loss function as the optimization target in each vector space $\Delta$, which is defined as follows:
$$\mathcal{L}_\Delta = \sum_{(h, r, t) \in T} \sum_{(h', r, t') \in T'} \left[\gamma + f_r(h, t) - f_r(h', t')\right]_+,$$
where $\Delta$ ranges over the set of embedded vector spaces, $[x]_+ = \max(0, x)$, $T$ is the set of positive triplets in a graph, $T'$ is the set of negative triplets generated by randomly replacing the head (or tail) of each positive triplet $(h, r, t) \in T$, and $\gamma$ is a fixed margin for distinguishing positive and negative triplets. We use stochastic gradient descent (SGD) [22] to minimize the loss function.
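The margin-based objective and the negative sampling it relies on can be illustrated with a few lines of Python. This is a minimal sketch, assuming the l1 distance as the scoring function; the helper names `l1_score`, `corrupt`, and `margin_loss` and the toy embeddings are hypothetical, and the SGD update itself is omitted.

```python
import random


def l1_score(h, r, t):
    """TransE-style distance ||h + r - t|| under the l1 norm."""
    return sum(abs(hi + ri - ti) for hi, ri, ti in zip(h, r, t))


def corrupt(triplet, entities, rng):
    """Build a negative triplet by replacing the head or the tail.

    Note: this simple sampler does not check whether the corrupted
    triplet happens to be a true fact of the graph.
    """
    h, r, t = triplet
    if rng.random() < 0.5:
        return (rng.choice([e for e in entities if e != h]), r, t)
    return (h, r, rng.choice([e for e in entities if e != t]))


def margin_loss(pos_score, neg_score, margin):
    """Hinge term max(0, margin + f(positive) - f(negative)) for one pair."""
    return max(0.0, margin + pos_score - neg_score)


# Toy embeddings (made-up numbers) for a single vector space.
ent = {"a": [0.0, 0.0], "b": [1.0, 0.0], "c": [3.0, 4.0]}
rel = {"r": [1.0, 0.0]}


def score(triplet):
    h, r, t = triplet
    return l1_score(ent[h], rel[r], ent[t])


positive = ("a", "r", "b")
negative = corrupt(positive, sorted(ent), random.Random(0))
loss = margin_loss(score(positive), score(negative), margin=2.0)
```

The loss is zero once the negative triplet scores at least one margin worse than the positive one, so only pairs that are still too close contribute to the gradient.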
Algorithm 2 presents the multiview graph embedding process, which embeds the knowledge graph and each subgraph into its own vector space. Thus, we obtain n + 1 vector spaces, including one global-view vector space and n local-view vector spaces. The global-view vector space holds the global-view representation of all entities and relations of the original knowledge graph; the local-view vector spaces separately learn local-view representations of the entities and relations in complex relational facts from different semantic and structural perspectives.

3.3. Multiview Fusion Strategy
In this section, we propose a multiview fusion strategy that adopts an adaptive selection principle to integrate the knowledge representations of the global-view vector space and the local-view vector spaces. For each test triplet (h, r, t), we define a scoring estimation function to calculate the distance score in a vector space and then dynamically select the final representation of the triplet according to the minimum score. The scoring estimation function is defined as follows:
$$f_\Delta(h, r, t) = \|\mathbf{h}_\Delta + \mathbf{r}_\Delta - \mathbf{t}_\Delta\|_{l_1/l_2},$$
where $\Delta$ is a vector space that contains h, r, and t, and $\mathbf{h}_\Delta$, $\mathbf{r}_\Delta$, and $\mathbf{t}_\Delta$ are the vectors of h, r, and t in the vector space $\Delta$. The final score of the triplet is the minimum of $f_\Delta(h, r, t)$ over all such spaces.
Since each parallel vector space generally contains the local relational facts related to a particular relation, our model can address the spatial congestion problem in a fine-grained manner. Additionally, MvTransE constructs a global-view vector space containing the complete knowledge representations of the knowledge graph, which guarantees that every test triplet has at least one knowledge representation. Thus, our model significantly improves the performance of learning simple relational knowledge.
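The adaptive selection principle can be sketched as a minimum over all vector spaces that contain the triplet. The sketch below is an illustration under assumptions: each space is modeled as a dictionary with hypothetical keys `"ent"` and `"rel"`, the l1 distance is used as the score, and the toy vectors are made up.

```python
def l1_score(h, r, t):
    """TransE-style distance ||h + r - t|| under the l1 norm."""
    return sum(abs(hi + ri - ti) for hi, ri, ti in zip(h, r, t))


def fused_score(triplet, spaces):
    """Return the minimum score over every space that contains h, r, and t.

    The global-view space contains all entities and relations, so at
    least one space always qualifies when it is included in `spaces`.
    """
    h, r, t = triplet
    scores = [
        l1_score(s["ent"][h], s["rel"][r], s["ent"][t])
        for s in spaces
        if h in s["ent"] and t in s["ent"] and r in s["rel"]
    ]
    return min(scores)


# One global-view space and one local-view space (toy 1-d embeddings).
global_space = {"ent": {"a": [0.0], "b": [3.0], "c": [1.0]},
                "rel": {"r": [1.0]}}
local_space = {"ent": {"a": [0.0], "b": [1.5]},
               "rel": {"r": [1.0]}}
spaces = [global_space, local_space]

score_ab = fused_score(("a", "r", "b"), spaces)  # local space fits better
score_ac = fused_score(("a", "r", "c"), spaces)  # only the global space applies
```

Here the triplet (a, r, b) is scored by the local space, where it fits better, while (a, r, c) falls back to the global space because the local space does not contain the entity c.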
4. Experiments
In this section, we study the performance of our model on the link prediction and triplet classification tasks using four public datasets, i.e., WN18, WN18RR, WN11, and FB15K-237.
4.1. Datasets
WordNet [2] is a large knowledge graph of English vocabulary that is widely used in graph embedding work. In WordNet, a set of synonyms representing a basic lexical concept is taken as an entity, and various semantic relations are established between these synonym sets. In the following experiments, we use three public subsets of WordNet, i.e., WN18, WN18RR, and WN11. WN18 contains 18 relations and 40943 entities. WN18RR is a modified version of WN18 introduced by Dettmers et al. [23], which removes the reversed relational facts to avoid the information leakage problem in representation tasks. WN11 consists of 11 relations and 38696 entities. Freebase is a large collaborative knowledge graph storing general facts about the real world. We use a subset of Freebase, i.e., FB15K-237 [21], which consists of 237 relations and 14541 entities in total. Table 1 presents the statistics of the above datasets.
4.2. Link Prediction
Link prediction aims to predict the missing head entity h or tail entity t of a test triplet (h, r, t). In this experiment, we take the entity h (or t) that is missing from the test triplet as the correct entity, and all other entities are considered candidate entities. First, we construct candidate triplets by replacing h (or t) of the test triplet. Then, the link prediction score of each triplet is calculated by the scoring function of our model. Finally, the candidate entities and the correct entity are sorted in ascending order of their prediction scores. We adopt two metrics used in [5] to evaluate our model: the average rank of the correct entities (i.e., Mean Rank) and the proportion of correct entities ranked in the top 10 (i.e., Hits@10). A good prediction performance corresponds to a high Hits@10 and a low Mean Rank.
Note that some candidate triplets may already exist in the knowledge graph, so these candidate triplets should also be considered correct. Because such triplets may receive lower scores than the test triplet and thus be ranked above it, they would unfairly worsen its rank. Therefore, we should filter out the candidate triplets that have already appeared in the training, validation, or test sets. We denote an evaluation setting by “Filt” if we filter out these candidate triplets before the test and by “Raw” otherwise.
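The evaluation protocol above can be sketched in a few lines. This is an illustrative sketch only: the function names `rank_of_correct`, `mean_rank`, and `hits_at`, the `known_true` argument, and the toy scores are assumptions for the example. Lower scores are treated as better, following the ascending sort described above.

```python
def rank_of_correct(scores, correct, known_true=frozenset()):
    """Rank of the correct entity among the candidates (1 is best).

    scores     : dict mapping each candidate entity to its score
    correct    : the entity removed from the test triplet
    known_true : other candidates that also form true triplets; they are
                 skipped in the "Filt" setting and kept in "Raw" (empty set)
    """
    target = scores[correct]
    better = [
        e for e, s in scores.items()
        if e != correct and e not in known_true and s < target
    ]
    return 1 + len(better)


def mean_rank(ranks):
    """Average rank of the correct entities over all test triplets."""
    return sum(ranks) / len(ranks)


def hits_at(ranks, k=10):
    """Proportion of correct entities ranked within the top k."""
    return sum(1 for x in ranks if x <= k) / len(ranks)


# Toy candidate scores (made-up numbers) for one test triplet.
scores = {"a": 0.1, "b": 0.5, "c": 0.3, "d": 0.9}
raw_rank = rank_of_correct(scores, "b")                      # a and c rank above b
filt_rank = rank_of_correct(scores, "b", known_true={"a"})   # a is filtered out
```

In this toy case, filtering the known true candidate improves the rank of the correct entity from 3 to 2, which is why the “Filt” results are generally better than the “Raw” ones.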
We compare MvTransE with several state-of-the-art methods in the link prediction task on the WN18, WN18RR, and FB15K-237 datasets. On WN18, MvTransE is compared to RESCAL [24], SE, SME, LFM, TransE, TransH, TransR/CTransR, puTransE, and TransAt in Table 2. In Table 3, we compare MvTransE with three competitive methods, DistMult [25], ComplEx [26], and ConvE [23], on the WN18RR and FB15K-237 datasets, neither of which suffers from the information leakage problem found in the WN18 dataset. We directly use the results reported in the published papers or in [7], since the experimental settings are the same. For MvTransE, our experimental settings are as follows.
4.2.1. Subgraph Generation Setup
On WN18, we set the number of subgraphs n = 5000, the expanding speed of subgraphs n_{s} ∈ [50, 450], and the size of subgraphs n_{t} ∈ [200, 1500]. On WN18RR, we set the number of subgraphs n = 5000, the expanding speed of subgraphs n_{s} ∈ [50, 300], and the size of subgraphs n_{t} ∈ [200, 1000]. On FB15K-237, we set the number of subgraphs n = 5000, the expanding speed of subgraphs n_{s} ∈ [100, 350], and the size of subgraphs n_{t} ∈ [1000, 2200].
4.2.2. Graph Embedding Setup
The dimension of vectors k is selected from {25, 30, 35, 40, 50} and the number of training epochs from {200, 500, 1000}. In the global-view vector space, we select the learning rate from {0.01, 0.005, 0.001, 0.0008}, the margin from {2, 3, 3.5, 4, 4.5, 5, 5.5}, and the minibatch size B_{0} from {100, 200, 500}. For each dataset, we choose the best parameter configuration on the validation set, as follows: k = 35, epoch = 1000, learning rate = 0.005, margin = 3.5, B_{0} = 100 on WN18; k = 35, epoch = 1000, learning rate = 0.0008, margin = 5.5, B_{0} = 100 on WN18RR; k = 35, epoch = 1000, learning rate = 0.001, margin = 3, B_{0} = 100 on FB15K-237. In the local-view spaces, we randomly initialize the learning rate and the margin within a certain range and obtain the best ranges as follows: learning rate ∈ [0.001, 0.005], margin ∈ [1.5, 5.5] on WN18; learning rate ∈ [0.0008, 0.005], margin ∈ [3.5, 5.5] on WN18RR; learning rate ∈ [0.005, 0.015], margin ∈ [3.5, 5.5] on FB15K-237.
Table 2 presents the corresponding experimental results of three model settings: (1) MvTransE (Global-view) denotes the prediction results obtained with the global-view representation. (2) MvTransE (Local-view) denotes the prediction results obtained with all parallel local-view representations. (3) MvTransE (Multiview) denotes the prediction results of the integrated multiview representation derived from the multiview fusion strategy. As Table 2 shows, the multiview representation yields better results than the global-view and local-view representations. These results support our idea of representing knowledge from multiple perspectives. In detail, our method substantially outperforms state-of-the-art methods on the Mean Rank metric. On Hits@10, our method achieves the best performance among all methods under the “Raw” setting and matches the best result, that of TransAt, under the “Filt” setting.
Table 3 presents the results of MvTransE (Multiview) on WN18RR and FB15K-237 to further illustrate the merits of our method. MvTransE (Multiview) markedly outperforms all methods on WN18RR, achieving state-of-the-art performance on both metrics. On FB15K-237, MvTransE (Multiview) achieves the best performance among all methods on the Mean Rank metric and performs the same as ConvE on the Hits@10 metric. In particular, our model achieves excellent performance on all three datasets on the Mean Rank metric, which evaluates the overall quality of the learned knowledge representations. This is because MvTransE learns knowledge representations from multiple perspectives and dynamically fuses these representations into an optimal combination.
To further explain the above observation, we present the prediction results of our method for all types of relational facts in WN18. Table 4 lists the type distribution of the triplets in WN18 based on four relation categories. Table 5 presents the experimental results of the three model settings on each relation category. Specifically, the global-view setting outperforms the local-view setting on both metrics in predicting 1-to-1, 1-to-N head, and N-to-1 tail entities, which fall into the category of learning simple relational facts. In contrast, the local-view setting exhibits superior performance in predicting complex relational facts, i.e., N-to-N, N-to-1 head, and 1-to-N tail entities. Clearly, by combining the advantages of the global-view and local-view settings, the multiview method achieves the best performance in learning both simple and complex relational facts.
4.3. Triplet Classification
The triplet classification task aims to determine whether a given triplet (h, r, t) is correct or not. In this experiment, we adopt WN11 to verify the effectiveness of our method. Following the experimental setting of previous work [5], we set a classification threshold for each relation r. To maximize the classification accuracy, we optimize this threshold on the validation set. Given a test triplet (h, r, t), if its score is lower than the threshold of r, it is classified as a positive sample; otherwise, it is classified as a negative sample.
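The threshold selection for one relation can be sketched as a simple search over candidate values on the validation set. This is an illustration under assumptions: the function names `best_threshold` and `classify`, the candidate grid (the observed scores plus one value above the maximum), and the toy validation data are all hypothetical.

```python
def classify(score, threshold):
    """A triplet is positive when its score is below the threshold."""
    return score < threshold


def best_threshold(scores, labels):
    """Pick the threshold that maximizes accuracy on validation triplets.

    scores : scores of the validation triplets of one relation
    labels : True for positive triplets, False for negative ones
    Candidate thresholds are the observed scores plus one value above
    the maximum; ties are resolved in favor of the smallest candidate.
    """
    candidates = sorted(set(scores))
    candidates.append(candidates[-1] + 1.0)
    best, best_acc = None, -1.0
    for thr in candidates:
        correct = sum(classify(s, thr) == y for s, y in zip(scores, labels))
        acc = correct / len(scores)
        if acc > best_acc:
            best, best_acc = thr, acc
    return best, best_acc


# Toy validation data (made-up numbers) for a single relation.
val_scores = [0.5, 1.0, 3.0, 4.0]
val_labels = [True, True, False, False]
threshold, accuracy = best_threshold(val_scores, val_labels)
prediction = classify(2.0, threshold)  # a test triplet scoring 2.0
```

Repeating this search independently for every relation gives the relation-specific thresholds that are then applied to the test triplets.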
We choose SE, SME, Single Layer Model (SLM) [17], LFM, NTN, TransE, TransH, and puTransE as baselines. We use the results reported in [7] directly since the data set is the same. For MvTransE, our experimental settings are as follows.
4.3.1. Subgraph Generation Setup
We set the number of subgraphs n = 5000, the length of random walk n_{s} ∈ [50, 300], and the size of subgraphs n_{t} ∈ [200, 1000].
4.3.2. Graph Embedding Setup
The dimension of vectors k is selected from {15, 20, 25} and the number of training epochs from {200, 500, 1000}. In the global-view vector space, we select the learning rate from {0.02, 0.01, 0.05}, the margin from {3, 4, 4.5, 5}, and the minibatch size B_{0} from {100, 200, 500}. We choose the best configuration on the validation set, which is as follows: k = 20, epoch = 1000, learning rate = 0.01, margin = 4.5, B_{0} = 100. In the local-view vector spaces, we randomly initialize the learning rate within [0.005, 0.015] and the margin within [3.5, 5.5].
The experimental results of the triplet classification are shown in Figure 3. MvTransE outperforms all of the baseline methods. Compared with the translation-based methods, i.e., TransE and TransH, our method is able to learn complex relational facts in a more fine-grained manner by constructing subgraphs from the original knowledge graph. On the other hand, our method outperforms the recent competitor puTransE owing to our knowledge fusion strategy, which leverages an adaptive selection principle to integrate the global-view and local-view knowledge representations. Therefore, MvTransE is more suitable for embedding large and complex knowledge graphs, and its multiview knowledge learning and fusion methods give it clear advantages in knowledge graph completion tasks.
5. Conclusion and Future Work
In this work, we propose a multiview translation learning model, named MvTransE, which represents the relational facts of a graph from the global view and the local view, respectively. MvTransE achieves state-of-the-art performance by addressing the entity spatial congestion problem and the relational fact impairment problem. Extensive experiments demonstrate that MvTransE outperforms state-of-the-art models on the link prediction and triplet classification tasks. In the future, we will focus on the subgraph construction scheme to learn local relational facts more efficiently.
Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work was partially supported by the National Natural Science Foundation of China (Nos. U1811264, 61966009, and U1711263), the Natural Science Foundation of Guangxi Province (Nos. 2020GXNSFAA159055, 2019GXNSFBA245059, and 2019GXNSFBA245049), the Guangxi InnovationDriven Development Project (No. AA17202024), and the Beihai Science and Technology Project Contract (No. 202082001).