Scientific Programming

Special Issue

Big Data Management and Analytics in Scientific Programming


Research Article | Open Access

Volume 2020 |Article ID 7084958 | https://doi.org/10.1155/2020/7084958

Chenzhong Bin, Saige Qin, Guanjun Rao, Tianlong Gu, Liang Chang, "Multiview Translation Learning for Knowledge Graph Embedding", Scientific Programming, vol. 2020, Article ID 7084958, 9 pages, 2020. https://doi.org/10.1155/2020/7084958

Multiview Translation Learning for Knowledge Graph Embedding

Academic Editor: Zhiang Wu
Received 14 Oct 2019
Accepted 09 Jul 2020
Published 25 Aug 2020

Abstract

Knowledge graph embedding methods have recently attracted considerable research interest because of their effectiveness and robustness in knowledge representation. However, existing methods still have limitations. On the one hand, translation-based models design translation principles that represent knowledge from a global perspective, but they fail to learn the different types of relational facts discriminatively. As a result, the entities of complex relational facts tend to become congested in the embedding space, which reduces the precision of the entity representation vectors. On the other hand, methods that learn local relational facts discriminatively from parallel subgraphs extracted from the original graph are likely to damage some relational facts of the original knowledge graph during subgraph extraction. Previous methods are therefore unable to learn local and global knowledge representations in a unified way. To address this, we propose a multiview translation learning model, named MvTransE, which learns relational facts from both a global view and a local view. Specifically, we first construct multiple parallel subgraphs from the original knowledge graph by considering the semantic and structural features of entities simultaneously. Then, we embed the original graph and the constructed subgraphs into the corresponding global and local feature spaces. Finally, we propose a multiview fusion strategy to integrate the multiview representations of relational facts. Extensive experiments on four public datasets demonstrate the superiority of our model over state-of-the-art methods in knowledge graph representation tasks.

1. Introduction

A knowledge graph [1] is a directed graph whose nodes are entities and whose edges are relations between entities. Each relational fact in a knowledge graph is stored as a triplet (head, relation, tail), abbreviated as (h, r, t), where h and t denote the head and tail entities, respectively, and r is a relation from h to t. With the advent of the big data era, knowledge graphs continue to grow in scale, and diverse large-scale knowledge graphs (e.g., WordNet [2] and Freebase [3]) have appeared. Despite their scale, current knowledge graphs are still far from complete. For example, 75% of the people in Freebase lack nationality information and 71% lack a birthplace [4]. It is therefore necessary to design approaches that automatically complete or infer the missing relational facts of an existing knowledge graph.

Recently, embedding-based approaches, which project the entities and relations of a knowledge graph into a dense, continuous, low-dimensional vector space, have shown strong feasibility and robustness in knowledge graph completion. Among them, translation-based approaches have attracted particular interest because of their effectiveness and robustness in knowledge representation. The first translation-based method, TransE, was proposed by Bordes et al. [5]. For each triplet (h, r, t), TransE treats the relation r as a translation operation from h to t in a vector space. If (h, r, t) holds, the translation principle $\mathbf{h} + \mathbf{r} \approx \mathbf{t}$ should be satisfied in the vector space, where $\mathbf{h}$, $\mathbf{r}$, and $\mathbf{t}$ are the vector representations of h, r, and t. TransE is a simple yet effective translation model for 1-to-1 (simple) relational facts, in which each head entity is connected to only one tail entity via a specific relation. It achieves state-of-the-art performance on link prediction. However, the translation principle is too rigid to deal with complex relational facts, including 1-to-N, N-to-1, and N-to-N facts. Technically, it may cause spatial congestion of entities when many head entities (or tail entities) are projected onto a single point of the vector space.

To overcome the weakness of TransE in representing complex relational facts, a series of improved models were proposed, such as TransH [6], TransR [7], TransD [8], and TranSparse [9]. Essentially, these methods focus on designing various translation principles to learn complex relational facts more precisely. However, they still embed a complete knowledge graph into a single vector space from a global perspective and fail to learn the various types of relational facts discriminatively. That is, each entity and relation of the knowledge graph is learned as a single, unique representation vector in its space. Hence, vector congestion still occurs in the translation spaces. As a real-world example, the head entities of the triplets (Obama, President, the United States) and (Trump, President, the United States), i.e., Obama and Trump, are projected close to each other in the vector space because they share the same social status, although they differ considerably in other respects.

To solve this problem, puTransE [10] splits a knowledge graph into multiple parallel spaces in the form of subgraphs from a local perspective and thereby achieves the spatial sparsification of complex relational facts. Specifically, in puTransE, entities and relations have their own feature representations in the different parallel spaces. This approach improves the ability to learn complex relational facts because it avoids the spatial congestion of the entities involved in them. However, puTransE still has two shortcomings. First, puTransE makes simple relational facts excessively sparse during the generation of parallel spaces, because it applies the spatial sparsification not only to complex relational facts but also to simple ones. Consequently, it can hardly learn complete vector representations of simple relational facts within a single subspace. Second, puTransE randomly selects local knowledge to construct the parallel spaces, which is prone to impair the relational facts of the original knowledge graph. For example, a golden triplet of the original graph may not be covered by the entities and relations of any single parallel space.

In summary, all of the above methods embed a knowledge graph from a single perspective, i.e., from either a local view or a global view, and thus fail to learn local and global knowledge representations in a unified way. To address this, we borrow the idea of multiview learning [11, 12] and propose a multiview translation learning model, named MvTransE, which embeds relational facts from the global view and the local view concurrently. In detail, we first generate multiple parallel subgraphs from the semantic and structural perspectives of entities to accurately capture the local-view knowledge of the knowledge graph. Then, the original knowledge graph and the generated parallel subgraphs are embedded into global-view and local-view spaces, respectively. Finally, we propose a multiview fusion strategy to integrate the multiview representations of relational facts. The main contributions of this paper are as follows:
(1) We incorporate the idea of multiview learning into our model MvTransE, which can precisely learn relational facts from both global and local views.
(2) Our model extracts local knowledge from semantic and structural perspectives to construct multiple parallel subgraphs, so as to solve the entity spatial congestion problem by learning local-view representations of relational facts.
(3) MvTransE applies a multiview fusion strategy to combine the global-view and local-view representations of the knowledge graph, which effectively compensates for the relational facts that are missing in the parallel spaces.
(4) Extensive experimental results demonstrate that our method outperforms state-of-the-art models in two knowledge graph completion tasks.

2. Related Work

Since the appearance of knowledge graphs, numerous researchers have studied methods to represent their relational facts. Initially, embedding-based models such as Structured Embedding (SE) [13], Semantic Matching Energy (SME) [14, 15], Latent Factor Model (LFM) [16], and Neural Tensor Network (NTN) [17] achieved considerable performance in knowledge representation, but they fail to cope with large-scale graphs because of their computational complexity. Recently, translation-based models have attracted much attention because of their effective and robust representation abilities.

TransE [5] is the first translation-based method, which treats a relation r as a translation from h to t for a triplet (h, r, t). Hence, TransE defines the scoring function as $f_r(h, t) = \|\mathbf{h} + \mathbf{r} - \mathbf{t}\|_{l_1/l_2}$, where $\|\cdot\|_{l_1/l_2}$ stands for the norm-1 or norm-2 computation. During model training, if a triplet (h, r, t) holds, the translation principle $\mathbf{h} + \mathbf{r} \approx \mathbf{t}$ should be satisfied in the vector space, as illustrated in Figure 1. That is, TransE keeps the translated vector ($\mathbf{h} + \mathbf{r}$) close to the tail vector $\mathbf{t}$. TransE achieves remarkable performance in representing simple relational facts, i.e., 1-to-1 triplets. However, it has limitations in dealing with complex relational facts, including 1-to-N, N-to-1, and N-to-N triplets, because of the rigid translation principle.
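To make this limitation concrete, the following short derivation (a worked illustration, not taken from the original paper) considers a 1-to-N relation with two tails $t_1$ and $t_2$ for the same head h and relation r, and writes $f_r(h, t) = \|\mathbf{h} + \mathbf{r} - \mathbf{t}\|$ for the TransE distance. By the triangle inequality, the better both facts are fitted, the closer the two tail vectors are forced together.

```latex
\begin{align*}
\|\mathbf{t}_1 - \mathbf{t}_2\|
  &= \|(\mathbf{h} + \mathbf{r} - \mathbf{t}_2) - (\mathbf{h} + \mathbf{r} - \mathbf{t}_1)\| \\
  &\le \|\mathbf{h} + \mathbf{r} - \mathbf{t}_2\| + \|\mathbf{h} + \mathbf{r} - \mathbf{t}_1\| \\
  &= f_r(h, t_2) + f_r(h, t_1).
\end{align*}
```

In the limit where both triplets satisfy the translation principle exactly, $f_r(h, t_1) = f_r(h, t_2) = 0$, so $\mathbf{t}_1 = \mathbf{t}_2$: distinct tails collapse onto one point, which is the congestion described above.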

TransH [6] tries to solve this problem of TransE by allowing an entity to have different representations when it is involved in different relations. Specifically, for each triplet (h, r, t), TransH projects h and t onto a hyperplane specific to the relation r to obtain the projected vectors $\mathbf{h}_\perp$ and $\mathbf{t}_\perp$. The scoring function is defined as $f_r(h, t) = \|\mathbf{h}_\perp + \mathbf{r} - \mathbf{t}_\perp\|_{l_1/l_2}$.

TransR/CTransR [7] models entities and relations in different vector spaces, i.e., the entity vector space and the relation vector space. For each relation r, it sets a projection matrix $\mathbf{M}_r$ to map entity vectors from the entity vector space to the relation vector space, i.e., $\mathbf{h}_r = \mathbf{M}_r \mathbf{h}$ and $\mathbf{t}_r = \mathbf{M}_r \mathbf{t}$. Its scoring function is $f_r(h, t) = \|\mathbf{h}_r + \mathbf{r} - \mathbf{t}_r\|_{l_1/l_2}$.

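TransD [8] considers the diversity of entities and relations simultaneously. It uses the product of two vectors of an entity-relation pair to replace a projection matrix, i.e., $\mathbf{M}_{rh} = \mathbf{r}_p \mathbf{h}_p^\top + \mathbf{I}$ and $\mathbf{M}_{rt} = \mathbf{r}_p \mathbf{t}_p^\top + \mathbf{I}$, where the subscript p marks projection vectors. TransD is more extensible and can be applied to large-scale knowledge graphs. Its scoring function is $f_r(h, t) = \|\mathbf{M}_{rh}\mathbf{h} + \mathbf{r} - \mathbf{M}_{rt}\mathbf{t}\|_{l_1/l_2}$.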

TranSparse [9] considers the heterogeneity and imbalance of entities and relations in a knowledge graph, which are generally ignored by previous works. TranSparse constructs adaptive sparse matrices $\mathbf{M}_r^h(\theta_r^h)$ and $\mathbf{M}_r^t(\theta_r^t)$, instead of dense projection matrices, to prevent the overfitting of simple relational facts and the underfitting of complex relational facts at the same time. Its scoring function is $f_r(h, t) = \|\mathbf{M}_r^h(\theta_r^h)\mathbf{h} + \mathbf{r} - \mathbf{M}_r^t(\theta_r^t)\mathbf{t}\|_{l_1/l_2}$.

FT [18] and DT [19] design flexible translation principles and dynamic translation principles, respectively. To some extent, they improve the ability to handle complex relational facts. Essentially, they focus on elaborating various translation principles to learn complex relational facts more accurately. TransAt [17] and the GAN-based framework [20] use attention mechanisms and generative adversarial networks, respectively, to improve model performance. However, all of the above methods embed the complete knowledge graph into a single uniform vector space from a single perspective and therefore fail to solve the spatial congestion problem thoroughly.

PuTransE [21] is an online and robust improvement of TransE that addresses hyperparameter sensitivity, the spatial congestion of entities and relations, and the processing of dynamic knowledge graphs. It adopts multiple parallel spaces to learn the vectors of entities and relations, thus avoiding spatial congestion in complex relational facts. As a result, puTransE achieves state-of-the-art performance on the link prediction task. However, puTransE still has two weaknesses that limit its performance. First, puTransE makes simple relational facts excessively sparse during the random generation of parallel spaces. Its knowledge extraction covers not only complex relational facts but also simple relational facts, which are already sparse. Thus, it probably cannot learn complete vector representations of simple relational facts from a single subgraph. Second, puTransE randomly selects local knowledge to construct the parallel spaces, which is prone to impair the original facts of the knowledge graph. For example, a golden triplet, i.e., a positive sample, of the original graph may not be covered by the entities and relations of any single parallel space. This situation decreases the relational fact prediction accuracy of puTransE.

3. Our Method

In this section, we introduce the details of MvTransE, which embeds relational facts from global-view and local-view, respectively. The workflow of MvTransE mainly consists of three steps. The first step (Section 3.1) is to generate multiple parallel subgraphs so as to extract particular local relational facts of the knowledge graph accurately. The second step (Section 3.2) aims at discriminatively embedding the knowledge graph into multiple parallel spaces to acquire multiview representations of relational facts. The last step (Section 3.3) integrates multiple versions of knowledge representations, i.e., fuses local-view and global-view representations of entities and relations. Figure 2 presents a multiview knowledge learning process of MvTransE.

3.1. Subgraph Generation

The subgraph generation aims to extract local relational facts from different perspectives, so as to solve spatial congestion of entities by sparsely embedding entities and relations into different parallel vector spaces in the following graph embedding step. Therefore, we construct multiple parallel subgraphs based on different relations of the knowledge graph. Each subgraph mainly contains the local relational facts selected from a specific relation.

We first define the symbols used in the subgraph generation process. We define a knowledge graph as G = (E, R, T), where E and R denote the entity set and the relation set of G, respectively, and T ⊆ E × R × E is the triplet set of G. $G_{sub}$ denotes the final set of generated subgraphs, $G_i \in G_{sub}$ is a single subgraph, and $E_i$ and $R_i$ denote the entity set and the relation set of $G_i$, respectively. Algorithm 1 details the subgraph generation, which mainly consists of the following two steps.

3.1.1. Semantics-Related Entity Selection

To accurately learn local relational facts, we first select relation-relevant entities for a subgraph to keep its knowledge as semantically consistent as possible. That is, the entities in a subgraph should be semantically related to each other through a specific relation. We randomly select a relation r from the relation set R and then generate the entity set $E_r$, which consists of the entities connected by r. Since $E_r$ is generated from the relation r, r is regarded as the semantic center of the current subgraph.

3.1.2. Structure-Related Subgraph Expansion

To learn the latent knowledge associated with r more comprehensively, we need to expand each subgraph according to the local graph structure of the entities in the set $E_r$. This step ensures that a generated subgraph contains the semantic and structural features of local relational facts simultaneously. Specifically, we first randomly select an entity $e_i$ from $E_r$ as the starting entity for expanding the subgraph. Then, we randomly select a triplet whose head or tail entity is the starting entity $e_i$ and add its head and tail entities to the subgraph. We obtain the local structure information of $E_r$ with respect to G by repeating these two operations $n_s$ times. Because entities and triplets are selected randomly during the expansion, each subgraph expanded from $E_r$ may include different relational facts, which makes the generated subgraphs differ slightly from each other in semantics and structure. These random operations thus ensure that MvTransE can learn local knowledge representations discriminatively from multiple perspectives.

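Algorithm 1: Subgraph generation.
Input: Knowledge graph $G = (E, R, T)$; subgraph hyperparameters $n$, $n_s$, $n_t$, and the maximum number of triplet-selection iterations $n_{iter}$.
Output: A subgraph set $G_{sub}$.
(1) $G_{sub} \leftarrow \emptyset$ //Initialize empty set for subgraphs
(2) while $n > 0$ do
(3)   $i \leftarrow n_{iter}$ //Set the maximum iterations
(4)   $r \leftarrow$ Randomly sample a relation from $R$
(5)   $E_r \leftarrow$ Select semantics-related entities of $r$
(6)   $n_s, n_t \leftarrow$ Generate subgraph scale hyperparameters
(7)   $T_r \leftarrow \emptyset$ //Initialize empty set for selected triplets
(8)   while $|T_r| < n_t$ and $i > 0$ do
(9)     $j \leftarrow n_s$
(10)    while $j > 0$ do //Control the scale of subgraphs
(11)      $e \leftarrow$ Randomly sample an entity from $E_r$
(12)      $(h, r', t) \leftarrow$ Randomly select a relevant triplet of $e$ from $T$
(13)      $T_r \leftarrow T_r \cup \{(h, r', t)\}$ //Add a selected triplet
(14)      $j \leftarrow j - 1$
(15)    end while
(16)    $E_r \leftarrow$ All entities collected in $T_r$
(17)    $i \leftarrow i - 1$
(18)  end while
(19)  $G_{sub} \leftarrow G_{sub} \cup \{T_r\}$ //Add a subgraph
(20)  $n \leftarrow n - 1$
(21) end while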

In addition, to make each subgraph focus on particular relational facts, we need to control the scale of the subgraphs so that they do not include excessive irrelevant facts. We use the hyperparameter $n_t$ to control the number of triplets in a subgraph, the hyperparameter $n_s$ to control the expansion speed of each subgraph, a further hyperparameter to control the maximum number of iterations of the triplet selection, and the hyperparameter $n$ to control the number of generated subgraphs.
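The two steps of the subgraph generation can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `generate_subgraph`, the parameter `max_iter` (standing in for the maximum number of triplet-selection iterations), and the exact stopping rule (stop as soon as `n_t` triplets are collected) are choices made for this sketch.

```python
import random
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Triplet = Tuple[str, str, str]  # (head, relation, tail)


def generate_subgraph(triplets: List[Triplet], n_s: int, n_t: int,
                      max_iter: int, rng: random.Random) -> Set[Triplet]:
    """Build one subgraph (a set of triplets) around a randomly chosen relation."""
    if n_s < 1 or n_t < 1:
        raise ValueError("n_s and n_t must be positive")
    # Index the triplets incident to each entity (as head or tail).
    incident: Dict[str, List[Triplet]] = defaultdict(list)
    for h, r, t in triplets:
        incident[h].append((h, r, t))
        if t != h:
            incident[t].append((h, r, t))
    # Semantics-related selection: entities connected by the sampled relation.
    relation = rng.choice(sorted({r for _, r, _ in triplets}))
    entities = sorted({e for h, r, t in triplets if r == relation for e in (h, t)})
    selected: Set[Triplet] = set()
    # Structure-related expansion: at most max_iter rounds of n_s random steps.
    for _ in range(max_iter):
        for _ in range(n_s):
            if len(selected) >= n_t:  # assumed stopping rule: size limit reached
                return selected
            start = rng.choice(entities)
            selected.add(rng.choice(incident[start]))
        # Continue the expansion from every entity collected so far.
        entities = sorted({e for h, _, t in selected for e in (h, t)})
    return selected


# Toy graph with hypothetical entity and relation names.
kg = [("a", "r1", "b"), ("b", "r2", "c"), ("c", "r1", "d"), ("d", "r3", "a")]
rng = random.Random(0)
subgraphs = [generate_subgraph(kg, n_s=2, n_t=3, max_iter=5, rng=rng) for _ in range(4)]
```

Because the relation, the starting entities, and the triplets are all drawn at random, repeated calls generally return subgraphs that overlap but differ, which is the property the method relies on.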

3.2. Graph Embedding

The goal of this step is to obtain the global-view and local-view representations of entities and relations. Hence, we embed the original knowledge graph G and the subgraphs to learn global knowledge and local knowledge, respectively. In each vector space, we define the scoring function $f_r(h, t)$ of a triplet (h, r, t) as

$$f_r(h, t) = \|\mathbf{h} + \mathbf{r} - \mathbf{t}\|_{l_1/l_2},$$

where $\mathbf{h}$, $\mathbf{r}$, and $\mathbf{t}$ are the vector representations of h, r, and t, and $\|\cdot\|_{l_1/l_2}$ is the $l_1$-norm or $l_2$-norm distance.
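As a small illustration, the translation distance ‖h + r − t‖ under the l1 or l2 norm can be computed as below. The function name `transe_score` and the plain-list vector representation are assumptions of this sketch; a real implementation would normally use an array library.

```python
from typing import Sequence


def transe_score(h: Sequence[float], r: Sequence[float], t: Sequence[float],
                 norm: int = 1) -> float:
    """Distance ||h + r - t|| under the l1 (norm=1) or l2 (norm=2) norm."""
    if norm not in (1, 2):
        raise ValueError("norm must be 1 or 2")
    diffs = [hi + ri - ti for hi, ri, ti in zip(h, r, t)]
    if norm == 1:
        return sum(abs(d) for d in diffs)
    return sum(d * d for d in diffs) ** 0.5


# A triplet whose vectors satisfy h + r = t exactly has distance zero.
exact = transe_score([1.0, 0.0], [0.5, 2.0], [1.5, 2.0])
```

A lower score means the triplet fits the translation principle better, so plausible facts should receive small distances and corrupted ones large distances.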

In MvTransE, we use a margin-based loss function as the optimization target in each vector space $\Delta \in \Omega$, which is defined as

$$L = \sum_{(h, r, t) \in T} \sum_{(h', r, t') \in T'} \max\left(0,\ \gamma + f_r(h, t) - f_r(h', t')\right),$$

where $\Omega$ is the set of embedded vector spaces, $T$ is the set of positive triplets in a graph, $T'$ is the set of negative triplets generated by randomly replacing the head (or tail) of each positive triplet (h, r, t) ∈ T, and $\gamma$ is a fixed margin for distinguishing positive and negative triplets. We use stochastic gradient descent (SGD) [22] to minimize the loss function.
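The negative sampling and the margin-based objective described above can be sketched as follows. The helper names `corrupt` and `margin_loss`, the 50/50 choice between replacing the head and the tail, and the `max_tries` guard are assumptions of this sketch; the text only states that the head or the tail of a positive triplet is replaced at random.

```python
import random
from typing import List, Tuple

Triplet = Tuple[str, str, str]  # (head, relation, tail)


def corrupt(triplet: Triplet, entities: List[str], rng: random.Random,
            max_tries: int = 100) -> Triplet:
    """Replace the head or the tail of a positive triplet with a random entity."""
    h, r, t = triplet
    for _ in range(max_tries):
        e = rng.choice(entities)
        candidate = (e, r, t) if rng.random() < 0.5 else (h, r, e)
        if candidate != triplet:  # the corrupted triplet must differ from the original
            return candidate
    raise RuntimeError("could not build a corrupted triplet")


def margin_loss(pos_score: float, neg_score: float, margin: float) -> float:
    """Hinge loss: zero once the negative is at least `margin` farther than the positive."""
    return max(0.0, margin + pos_score - neg_score)


rng = random.Random(0)
negative = corrupt(("a", "r", "b"), ["a", "b", "c", "d"], rng)
```

With a margin of 3.5, a positive distance of 1.0 and a negative distance of 5.0 give zero loss, whereas a negative distance of 3.0 against a positive distance of 2.0 is still penalized.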

Algorithm 2 presents the multiview graph embedding process, which aims at respectively embedding each knowledge graph and subgraph into single vector spaces. Thus, we will get n + 1 vector spaces, including one global-view vector space and n local-view vector spaces. The global-view vector space obtains the global-view representation of all entities and relations regarding the original knowledge graph; the local-view vector spaces differentially learn local-view representation of entities and relations regarding complex relational facts from different semantic and structural perspectives.
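The way each of the n + 1 vector spaces receives its own margin and learning rate can be sketched as below: the global-view space uses fixed values, and every local-view space draws its values at random from a range. The function name `space_hyperparameters` and the use of a uniform distribution are assumptions of this sketch; the example values are the WN18 settings reported in Section 4.2.2.

```python
import random
from typing import Dict, List, Tuple, Union

Config = Dict[str, Union[str, float]]


def space_hyperparameters(num_subgraphs: int, global_margin: float, global_lr: float,
                          margin_range: Tuple[float, float],
                          lr_range: Tuple[float, float], seed: int = 0) -> List[Config]:
    """Return one configuration for the global view and one per local-view subgraph."""
    rng = random.Random(seed)
    configs: List[Config] = [{"view": "global", "margin": global_margin, "lr": global_lr}]
    for _ in range(num_subgraphs):
        configs.append({
            "view": "local",
            "margin": rng.uniform(*margin_range),  # assumed: uniform draw in the range
            "lr": rng.uniform(*lr_range),
        })
    return configs


configs = space_hyperparameters(5, global_margin=3.5, global_lr=0.005,
                                margin_range=(1.5, 5.5), lr_range=(0.001, 0.005))
```

Drawing the local-view hyperparameters at random, rather than tuning each subgraph separately, keeps the cost of training thousands of small spaces manageable.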

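Algorithm 2: Multiview graph embedding.
Input: A set of knowledge graph and subgraphs $\{G\} \cup G_{sub}$, vector dimension $k$, global-view margin $\gamma_G$, global-view learning rate $\lambda_G$.
Output: A set of generated vector spaces $\Omega$.
(1) $\Omega \leftarrow \emptyset$ //Initialize empty set for vector spaces
(2) for $G_i \in \{G\} \cup G_{sub}$ do
(3)   if $G_i = G$ then
(4)     $\gamma, \lambda \leftarrow$ global-view margin $\gamma_G$, global-view learning rate $\lambda_G$
(5)   else
(6)     $\gamma, \lambda \leftarrow$ Randomly initialize local-view margin and learning rate
(7)   end if
(8)   $E_i, R_i \leftarrow$ All entities and relations in $G_i$, respectively
(9)   $\mathbf{e}, \mathbf{r} \leftarrow$ Initialize from a uniform distribution for each $e \in E_i$, $r \in R_i$
(10)  loop
(11)    $T_{batch} \leftarrow$ sample($T_i$, $b$) //Sample a minibatch of size b
(12)    $P_{batch} \leftarrow \emptyset$ //Initialize the set of pairs of triplets
(13)    for $(h, r, t) \in T_{batch}$ do
(14)      $(h', r, t') \leftarrow$ sample($T'_{(h, r, t)}$) //Sample corrupted triplet
(15)      $P_{batch} \leftarrow P_{batch} \cup \{((h, r, t), (h', r, t'))\}$
(16)    end for
(17)    Update vectors w.r.t. the margin-based loss over $P_{batch}$
(18)  end loop
(19)  $\Delta \leftarrow$ (trained vectors of $E_i$ and $R_i$) //Trained parameters are saved as one vector space
(20)  $\Omega \leftarrow \Omega \cup \{\Delta\}$ //Add vector space to sets
(21) end for

3.3. Multiview Fusion Strategy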

In this section, we propose a multiview fusion strategy that adopts an adaptive selection principle to integrate the knowledge representations of the global-view vector space and the local-view vector spaces. For each testing triplet (h, r, t), we define a scoring estimation function that calculates the distance score in each vector space and then dynamically selects the final representation of the triplet according to the minimum score. The scoring estimation function is defined as

$$\mathrm{score}(h, r, t) = \min_{\Delta} \|\mathbf{h}_\Delta + \mathbf{r}_\Delta - \mathbf{t}_\Delta\|_{l_1/l_2},$$

where ∆ ranges over the vector spaces that contain h, r, and t, and $\mathbf{h}_\Delta$, $\mathbf{r}_\Delta$, and $\mathbf{t}_\Delta$ are the vectors of h, r, and t in the vector space ∆.

Since each parallel vector space generally contains the local relational facts related to a particular relation, our model can alleviate the spatial congestion problem. Additionally, MvTransE constructs a global-view vector space that contains the complete knowledge representations of the knowledge graph, which guarantees that every testing triplet can be scored in at least one vector space. Thus, our model also improves the performance of learning simple relational knowledge.
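The adaptive selection described in this section, taking the minimum distance over all vector spaces that contain the head, the relation, and the tail, can be sketched as follows. The dictionary layout of a vector space, the function name `fused_score`, and the use of the l1 norm are assumptions of this sketch.

```python
from typing import Dict, List, Optional, Tuple

# Assumed layout: {"entities": {name: vector}, "relations": {name: vector}}
VectorSpace = Dict[str, Dict[str, List[float]]]


def l1_distance(h: List[float], r: List[float], t: List[float]) -> float:
    """Translation distance ||h + r - t|| under the l1 norm."""
    return sum(abs(a + b - c) for a, b, c in zip(h, r, t))


def fused_score(triplet: Tuple[str, str, str],
                spaces: List[VectorSpace]) -> Optional[float]:
    """Minimum distance over the spaces that contain h, r and t (None if there is none)."""
    h, r, t = triplet
    best: Optional[float] = None
    for space in spaces:
        ents, rels = space["entities"], space["relations"]
        if h in ents and t in ents and r in rels:
            score = l1_distance(ents[h], rels[r], ents[t])
            if best is None or score < best:
                best = score
    return best


global_space = {"entities": {"a": [0.0, 0.0], "b": [2.0, 2.0], "c": [1.0, 0.0]},
                "relations": {"rel": [1.0, 1.0]}}
local_space = {"entities": {"a": [0.0, 0.0], "b": [1.0, 1.0]},
               "relations": {"rel": [1.0, 1.0]}}
spaces = [global_space, local_space]
```

In this toy setup, the triplet ("a", "rel", "b") is scored by the local space, where it fits exactly, while ("a", "rel", "c") falls back to the global space because the entity "c" does not appear in the local one.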

4. Experiments

In this section, we study the performance of our model in link prediction and triplet classification tasks under four public datasets, i.e., WN18, WN18RR, WN11, and FB15K-237.

4.1. Datasets

WordNet [2] is a large knowledge graph of English vocabulary that is widely used in graph embedding work. In WordNet, a set of synonyms representing a basic vocabulary concept is taken as an entity, and various semantic relations are established between these synonym sets. In the following experiments, we use three public subsets of WordNet, i.e., WN18, WN18RR, and WN11. WN18 contains 18 relations and 40943 entities. WN18RR is a modified version of WN18 introduced by Dettmers et al. [23], which removes the inverse relational facts to avoid the information leakage problem in representation tasks. WN11 consists of 11 relations and 38696 entities. Freebase is a large collaborative knowledge graph that stores general facts about the real world. We use a subset of Freebase, i.e., FB15K-237 [21], which consists of 237 relations and 14541 entities in total. Table 1 presents the statistics of the above datasets.


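Table 1: Statistics of the datasets.

| Dataset | #Ent | #Rel | #Train | #Valid | #Test |
|---|---|---|---|---|---|
| WN18 | 40943 | 18 | 141442 | 5000 | 5000 |
| WN11 | 38696 | 11 | 112581 | 2609 | 10544 |
| WN18RR | 40943 | 11 | 86835 | 3034 | 3134 |
| FB15K-237 | 14541 | 237 | 272115 | 17535 | 20466 |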

4.2. Link Prediction

Link prediction aims to predict the missing head entity h or tail entity t of a test triplet (h, r, t). In this experiment, we take the entity h (or t) that is missing from a test triplet as the correct entity and consider all other entities as candidate entities. First, we construct candidate triplets by replacing h (or t) of the test triplet with each candidate entity. Then, the link prediction score of each triplet is calculated by the scoring function of our model. Finally, the candidate entities and the correct entity are sorted in ascending order of their prediction scores. We adopt the two metrics used in [5] to evaluate our model: the average rank of the correct entities (i.e., Mean Rank) and the proportion of correct entities ranked in the top 10 (i.e., Hits@10). A good prediction model should achieve a high Hits@10 and a low Mean Rank.

Note that some candidate triplets may already exist in the knowledge graph, so they should also be regarded as correct triplets. Their scores are likely to be lower than that of the test triplet, which unfairly worsens its rank. Therefore, we filter out the candidate triplets that already appear in the training, validation, or test sets. We denote the evaluation setting by “Filt” if these candidate triplets are filtered out before testing, and by “Raw” otherwise.
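The ranking protocol and the two metrics can be sketched as below. The function names and the tie-handling rule (only strictly lower scores count as ranked ahead of the correct entity) are assumptions of this sketch; passing a non-empty `filtered_out` set corresponds to the “Filt” setting and an empty one to “Raw”.

```python
from typing import Dict, FrozenSet, List


def rank_of_correct(scores: Dict[str, float], correct: str,
                    filtered_out: FrozenSet[str] = frozenset()) -> int:
    """1-based rank of `correct` when candidates are sorted by ascending score."""
    target = scores[correct]
    better = sum(1 for entity, s in scores.items()
                 if entity != correct and entity not in filtered_out and s < target)
    return better + 1


def mean_rank(ranks: List[int]) -> float:
    """Average rank of the correct entities (lower is better)."""
    return sum(ranks) / len(ranks)


def hits_at_k(ranks: List[int], k: int = 10) -> float:
    """Proportion of correct entities ranked in the top k (higher is better)."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Hypothetical distances of three candidate tails for one test triplet.
scores = {"a": 0.1, "b": 0.2, "c": 0.5}
raw_rank = rank_of_correct(scores, "c")
filt_rank = rank_of_correct(scores, "c", frozenset({"a"}))
```

Here the correct entity "c" is ranked third in the raw setting and second once the known-true candidate "a" is filtered out, which shows why filtered ranks are never worse than raw ranks.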

We compare MvTransE with several state-of-the-art methods in the link prediction task on the WN18, WN18RR, and FB15K-237 datasets. On WN18, MvTransE is compared with RESCAL [24], SE, SME, LFM, TransE, TransH, TransR/CTransR, puTransE, and TransAt in Table 2. In Table 3, we compare MvTransE with three competitive methods, DistMult [25], ComplEx [26], and ConvE [23], on the WN18RR and FB15K-237 datasets, neither of which has the information leakage problem found in WN18. Because the experimental settings are the same, we directly use the results reported in the original papers or in [7]. For MvTransE, our experimental settings are as follows.


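Table 2: Link prediction results on WN18.

| Method | Mean rank (Raw) | Mean rank (Filt) | Hits@10 (%) (Raw) | Hits@10 (%) (Filt) |
|---|---|---|---|---|
| RESCAL | 1180 | 1163 | 37.2 | 52.8 |
| SE | 1011 | 985 | 68.5 | 80.5 |
| SME (linear/bilinear) | 545/526 | 533/509 | 65.1/54.7 | 74.1/61.3 |
| LFM | 469 | 456 | 71.4 | 81.6 |
| TransE | 263 | 251 | 75.4 | 89.2 |
| TransH (unif/bern) | 318/401 | 303/388 | 75.4/73.0 | 86.7/82.3 |
| TransR (unif/bern) | 232/238 | 219/225 | 78.3/79.8 | 91.7/92.0 |
| CTransR (unif/bern) | 243/231 | 230/218 | 78.9/79.4 | 92.3/92.3 |
| puTransE | 39 | 29 | 88.1 | 94.9 |
| TransAt | 169 | 157 | 81.4 | 95.1 |
| MvTransE (global-view) | 198 | 186 | 79.7 | 92.2 |
| MvTransE (local-view) | 46 | 42 | 84.1 | 93.9 |
| MvTransE (Multiview) | 29 | 24 | 88.3 | 95.1 |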


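Table 3: Link prediction results on WN18RR and FB15K-237.

| Method | WN18RR Mean Rank (Filt) | WN18RR Hits@10 (Filt) | FB15K-237 Mean Rank (Filt) | FB15K-237 Hits@10 (Filt) |
|---|---|---|---|---|
| DistMult | 5110 | 0.49 | 254 | 0.41 |
| ComplEx | 5261 | 0.51 | 339 | 0.42 |
| ConvE | 5277 | 0.46 | 246 | 0.49 |
| MvTransE | 1323 | 0.52 | 139 | 0.49 |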

4.2.1. Subgraph Generation Setup

On WN18, we set the number of subgraphs n = 5000, the expanding speed of subgraphs ns ∈ [50, 450], and the size of subgraphs nt ∈ [200, 1500]. On WN18RR, we set the number of subgraphs n = 5000, the expanding speed of subgraphs ns ∈ [50, 300], and the size of subgraphs nt ∈ [200, 1000]. On FB15K-237, we set the number of subgraphs n = 5000, the expanding speed of subgraphs ns ∈ [100, 350], and the size of subgraphs nt ∈ [1000, 2200].

4.2.2. Graph Embedding Setup

The vector dimension k is selected from {25, 30, 35, 40, 50} and the number of training epochs from {200, 500, 1000}. In the global-view vector space, we select the learning rate from {0.01, 0.005, 0.001, 0.0008}, the margin from {2, 3, 3.5, 4, 4.5, 5, 5.5}, and the minibatch size B0 from {100, 200, 500}. For each dataset, we choose the best configuration on the validation set: k = 35, epoch = 1000, learning rate = 0.005, margin = 3.5, B0 = 100 on WN18; k = 35, epoch = 1000, learning rate = 0.0008, margin = 5.5, B0 = 100 on WN18RR; and k = 35, epoch = 1000, learning rate = 0.001, margin = 3, B0 = 100 on FB15K-237. In the local-view spaces, we randomly initialize the learning rate and the margin within a given range, and the best ranges are as follows: learning rate ∈ [0.001, 0.005] and margin ∈ [1.5, 5.5] on WN18; learning rate ∈ [0.0008, 0.005] and margin ∈ [3.5, 5.5] on WN18RR; learning rate ∈ [0.005, 0.015] and margin ∈ [3.5, 5.5] on FB15K-237.

Table 2 presents the experimental results of three model settings: (1) MvTransE (Global-view) denotes the prediction results obtained with the global-view representation. (2) MvTransE (Local-view) denotes the prediction results obtained with all parallel local-view representations. (3) MvTransE (Multiview) denotes the prediction results of the integrated multiview representation derived from the multiview fusion strategy. As Table 2 shows, the multiview representation yields better results than the global-view and local-view representations, which supports our idea of representing knowledge from multiple perspectives. In detail, our method substantially outperforms the state-of-the-art methods in the Mean Rank metric. In Hits@10, our method achieves the best performance among all methods under the “Raw” setting and matches the best result, that of TransAt, under the “Filt” setting.

Table 3 presents the results of MvTransE (Multiview) on WN18RR and FB15K-237 to further illustrate the merits of our method. MvTransE (Multiview) outperforms all compared methods on WN18RR in both metrics, with a particularly large margin in Mean Rank. On FB15K-237, MvTransE (Multiview) achieves the best Mean Rank among all methods and the same Hits@10 as ConvE. In particular, our model performs well on all three datasets in the Mean Rank metric, which evaluates the overall quality of the learned knowledge representations. This is because MvTransE learns knowledge representations from multiple perspectives and dynamically fuses these representations into an optimal combination.

To further explain the above observations, we present the prediction results of our method for all types of relational facts in WN18. Table 4 lists the type distribution of the triplets in WN18 over four relation categories. Table 5 presents the experimental results of the three model settings on each relation category. Specifically, the global-view setting outperforms the local-view setting in predicting 1-to-1, 1-to-N head, and N-to-1 tail on both metrics, which all fall into the category of simple relational fact learning. In contrast, the local-view setting exhibits superior performance in predicting complex relational facts, i.e., N-to-N, N-to-1 head, and 1-to-N tail. By combining the advantages of the global-view and local-view settings, the multiview method achieves the best performance in learning both simple and complex relational facts.


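Table 4: Type distribution of triplets in WN18 by relation category.

| Relation category | 1-to-1 | 1-to-N | N-to-1 | N-to-N |
|---|---|---|---|---|
| #Triplet (train) | 1281 | 54655 | 54555 | 31014 |
| #Triplet (valid) | 44 | 1938 | 1897 | 1119 |
| #Triplet (test) | 42 | 1847 | 1981 | 1130 |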


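Table 5: Results of the three model settings on each relation category of WN18 (k = 35). Each cell is given as Raw/Filt.

| Setting | Metric | Head 1-to-1 | Head 1-to-N | Head N-to-1 | Head N-to-N | Tail 1-to-1 | Tail 1-to-N | Tail N-to-1 | Tail N-to-N |
|---|---|---|---|---|---|---|---|---|---|
| Global-view | Hits@10 (%) | 90.5/90.5 | 91.9/92.7 | 56.0/87.6 | 85.7/89.7 | 92.9/93.2 | 54.5/86.9 | 94.1/94.2 | 84.6/88.6 |
| Global-view | Mean rank | 12/12 | 158/158 | 228/198 | 217/216 | 35/35 | 258/228 | 132/132 | 227/225 |
| Local-view | Hits@10 (%) | 71.4/71.4 | 91.8/92.0 | 70.3/86.8 | 94.6/95.0 | 71.4/71.4 | 78.4/91.7 | 84.4/84.4 | 94.3/94.8 |
| Local-view | Mean rank | 97/97 | 23/23 | 72/61 | 22/21 | 177/177 | 44/33 | 67/67 | 23/23 |
| Multiview | Hits@10 (%) | 90.5/90.5 | 94.2/94.3 | 76.1/94.2 | 95.0/95.4 | 92.9/93.9 | 79.8/93.9 | 94.1/94.2 | 94.8/95.2 |
| Multiview | Mean rank | 8/8 | 12/11 | 47/34 | 39/38 | 20/19 | 44/31 | 14/14 | 23/22 |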

The bold values stand for the best performance in the corresponding metric.

4.3. Triplet Classification

The triplet classification task aims to determine whether a given triplet (h, r, t) is correct or not. In this experiment, we adopt WN11 to verify the effectiveness of our method. Following the experimental setting of previous work [5], we set a classification threshold for each relation r. To maximize the classification accuracy, we optimize the threshold of each relation on the validation set. Given a test triplet (h, r, t), it is classified as a positive sample if its score is lower than the threshold of r, and as a negative sample otherwise.
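The per-relation threshold search on the validation set can be sketched as follows. The function name `best_threshold` and the candidate set (every observed score plus one value above the maximum) are assumptions of this sketch; the function would be called once per relation on that relation's validation scores and labels.

```python
from typing import List, Tuple


def best_threshold(scores: List[float], labels: List[bool]) -> Tuple[float, float]:
    """Pick the threshold maximizing accuracy of the rule `score < threshold => positive`."""
    candidates = sorted(set(scores))
    # One extra candidate above the maximum allows labelling every triplet positive.
    candidates.append(candidates[-1] + 1.0)
    best_thr, best_acc = candidates[0], -1.0
    for thr in candidates:
        correct = sum(1 for s, y in zip(scores, labels) if (s < thr) == y)
        acc = correct / len(scores)
        if acc > best_acc:  # keep the smallest threshold among ties
            best_thr, best_acc = thr, acc
    return best_thr, best_acc


# Hypothetical validation distances for one relation: low scores are true facts.
threshold, accuracy = best_threshold([0.5, 1.0, 3.0, 4.0], [True, True, False, False])
```

At test time, a triplet of that relation is then labelled positive when its distance is below the selected threshold.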

We choose SE, SME, Single Layer Model (SLM) [17], LFM, NTN, TransE, TransH, and puTransE as baselines. We use the results reported in [7] directly since the data set is the same. For MvTransE, our experimental settings are as follows.

4.3.1. Subgraph Generation Setup

We set the number of subgraphs n = 5000, the length of random walk ns ∈ [50, 300], and the size of subgraphs nt ∈ [200, 1000].

4.3.2. Graph Embedding Setup

The vector dimension k is selected from {15, 20, 25} and the number of training epochs from {200, 500, 1000}. In the global-view vector space, we select the learning rate from {0.02, 0.01, 0.05}, the margin from {3, 4, 4.5, 5}, and the minibatch size B0 from {100, 200, 500}. We choose the best configuration on the validation set, which is as follows: k = 20, epoch = 1000, learning rate = 0.01, margin = 4.5, B0 = 100. In the local-view vector spaces, we randomly initialize the learning rate within [0.005, 0.015] and the margin within [3.5, 5.5].

The experimental results of the triplet classification are shown in Figure 3. MvTransE outperforms all of the baseline methods. Compared with the translation-based methods TransE and TransH, our method learns complex relational facts more precisely by constructing subgraphs from the original knowledge graph. Our method also outperforms the recent competitor puTransE, owing to our knowledge fusion strategy, which leverages an adaptive selection principle to integrate the global-view and local-view knowledge representations. Therefore, MvTransE is more suitable for embedding large and complex knowledge graphs, and its multiview knowledge learning and fusion give it an advantage in knowledge graph completion tasks.

5. Conclusion and Future Work

In this work, we propose a multiview translation learning model, named MvTransE, which represents the relational facts of a knowledge graph from both the global view and the local view. MvTransE achieves state-of-the-art performance by addressing the entity spatial congestion problem and the relational fact impairment problem. Extensive experiments demonstrate that MvTransE outperforms state-of-the-art models on the link prediction and triple classification tasks. In the future, we will focus on the subgraph construction scheme to learn local relational facts more efficiently.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was partially supported by the National Natural Science Foundation of China (Nos. U1811264, 61966009, and U1711263), the Natural Science Foundation of Guangxi Province (Nos. 2020GXNSFAA159055, 2019GXNSFBA245059, and 2019GXNSFBA245049), the Guangxi Innovation-Driven Development Project (No. AA17202024), and the Beihai Science and Technology Project Contract (No. 202082001).

References

  1. R. Ronald Bakker, “Knowledge graphs: representation and structuring of scientific knowledge,” 1987.
  2. G. A. Miller, “WordNet: a lexical database for English,” Communications of the ACM, vol. 38, no. 11, pp. 39–41, 1995.
  3. K. D. Bollacker, C. Evans, P. Paritosh, T. Sturge, and J. Taylor, “Freebase: a collaboratively created graph database for structuring human knowledge,” in Proceedings of the Special Interest Group on Management of Data, pp. 1247–1250, Vancouver, Canada, June 2008.
  4. X. Dong, E. Gabrilovich, G. Heitz et al., “Knowledge vault: a web-scale approach to probabilistic knowledge fusion,” in Proceedings of the Special Interest Group on Knowledge Discovery and Data Mining, pp. 601–610, New York, NY, USA, August 2014.
  5. A. Bordes, N. Usunier, A. García-Durán, J. Weston, and O. Yakhnenko, “Translating embeddings for modeling multi-relational data,” in Proceedings of Neural Information Processing Systems, pp. 2787–2795, Lake Tahoe, NV, USA, December 2013.
  6. Z. Wang, J. Zhang, J. Feng, and Z. Chen, “Knowledge graph embedding by translating on hyperplanes,” in Proceedings of Association for the Advances of Artificial Intelligence, pp. 1112–1119, Québec City, Canada, July 2014.
  7. Y. Lin, Z. Liu, M. Sun, Y. Liu, and X. Zhu, “Learning entity and relation embeddings for knowledge graph completion,” in Proceedings of Association for the Advances of Artificial Intelligence, pp. 2181–2187, Austin, TX, USA, January 2015.
  8. G. Ji, S. He, L. Xu, K. Liu, and J. Zhao, “Knowledge graph embedding via dynamic mapping matrix,” in Proceedings of the Association for Computer Linguistics, pp. 687–696, Beijing, China, July 2015.
  9. G. Ji, K. Liu, S. He, and J. Zhao, “Knowledge graph completion with adaptive sparse transfer matrix,” in Proceedings of the Association for the Advances of Artificial Intelligence, pp. 985–991, Phoenix, AZ, USA, February 2016.
  10. Y. Tay, A. T. Luu, and S. C. Hui, “Non-parametric estimation of multiple embedding for link prediction on dynamic knowledge graphs,” in Proceedings of the Association for the Advances of Artificial Intelligence, pp. 1243–1249, San Francisco, CA, USA, February 2017.
  11. T. Xia, D. Tao, T. Mei, and Y. Zhang, “Multiview spectral embedding,” IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 40, no. 6, pp. 1438–1446, 2010.
  12. D. Zhai, H. Chang, S. Shan, X. Chen, and W. Gao, “Multiview metric learning with global consistency and local smoothness,” ACM Transactions on Intelligent Systems and Technology, vol. 3, no. 3, pp. 1–22, 2012.
  13. A. Bordes, J. Weston, R. Collobert, and Y. Bengio, “Learning structured embeddings of knowledge bases,” in Proceedings of the Association for the Advances of Artificial Intelligence, pp. 301–306, San Francisco, CA, USA, February 2011.
  14. A. Bordes, X. Glorot, J. Weston, and Y. Bengio, “Joint learning of words and meaning representations for open-text semantic parsing,” in Proceedings of the Artificial Intelligence and Statistics, pp. 127–135, Canary Islands, Spain, April 2012.
  15. A. Bordes, X. Glorot, J. Weston, and Y. Bengio, “A semantic matching energy function for learning with multi-relational data,” Machine Learning, vol. 94, no. 2, pp. 233–259, 2014.
  16. R. Jenatton, N. Le Roux, A. Bordes, and G. Obozinski, “A latent factor model for highly multi-relational data,” in Proceedings of the Neural Information Processing Systems, pp. 3176–3184, Lake Tahoe, NV, USA, December 2012.
  17. R. Socher, D. Chen, C. D. Manning, and A. Y. Ng, “Reasoning with neural tensor networks for knowledge base completion,” 2013.
  18. J. Feng, M. Huang, M. Wang, M. Zhou, H. Yu, and X. Zhu, “Knowledge graph embedding by flexible translation,” in Proceedings of the Knowledge Representation and Reasoning, pp. 557–560, Cape Town, South Africa, April 2016.
  19. L. Chang, M. Zhu, T. Gu, C. Bin, J. Qian, and J. Zhang, “Knowledge graph embedding by dynamic translation,” IEEE Access, vol. 5, pp. 20898–20907, 2017.
  20. P. F. Wang, S. Li, and R. Pan, “Incorporating GAN for negative sampling in knowledge representation learning,” in Proceedings of Association for the Advances of Artificial Intelligence, pp. 2005–2012, New Orleans, LA, USA, February 2018.
  21. K. Toutanova and D. Chen, “Observed versus latent features for knowledge base and text inference,” in Workshop on Continuous Vector Space and their Compositionality, pp. 57–66, Beijing, China, July 2015.
  22. J. C. Duchi, E. Hazan, and Y. Singer, “Adaptive subgradient methods for online learning and stochastic optimization,” Journal of Machine Learning Research, vol. 12, pp. 2121–2159, 2011.
  23. T. Dettmers, P. Minervini, P. Stenetorp, and S. Riedel, “Convolutional 2D knowledge graph embeddings,” in Proceedings of Association for the Advances of Artificial Intelligence, pp. 1811–1818, New Orleans, LA, USA, February 2018.
  24. M. Nickel, V. Tresp, and H. P. Kriegel, “A three-way model for collective learning on multi-relational data,” in Proceedings of International Conference on Machine Learning, pp. 809–816, Bellevue, WA, USA, February 2011.
  25. B. Yang, W.-T. Yih, X. He, J. Gao, and L. Deng, “Embedding entities and relations for learning and inference in knowledge bases,” in Proceedings of International Conference on Learning Representations, pp. 1–13, San Diego, CA, USA, December 2015.
  26. T. Trouillon, J. Welbl, S. Riedel, É. Gaussier, and G. Bouchard, “Complex embeddings for simple link prediction,” in Proceedings of International Conference on Machine Learning, pp. 2071–2080, New York, NY, USA, June 2016.

Copyright © 2020 Chenzhong Bin et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

