Abstract

Knowledge graphs (KGs) are crucial foundations for building intelligent systems, such as question answering and recommendation. However, their usefulness is hampered by their incompleteness, so knowledge graph completion arises to infer whether a triple of the form (head entity, relation, tail entity) is a missing fact. The path-based approach, which encodes paths from the head entity to the tail entity for reasoning, achieves good performance. Previous work suggests that entity type is beneficial for learning path representations. Nevertheless, the semantics of entities are not captured accurately, as many entities are untyped or loosely typed. In addition, previous methods tend to model paths only in the forward direction and fail to capture new path patterns from the reverse direction (i.e., tail entity to head entity). In this paper, we introduce a structure enhanced path reasoning (SPR) framework to address these problems. First, the model utilizes the structure of entities, i.e., their relational contexts (the relations linked from a given entity), to obtain a reliable path representation that captures correct entity semantics. This information is available for all nonisolated entities in any KG, so it can compensate for missing semantics when entities or KGs have no types available. Second, we leverage the structure of paths to derive their reverse paths, and enhance the path representation by additionally encoding the new patterns embedded in them through a dual path encoding method. To verify the effectiveness of the proposed methods, we design different architectures based on LSTM and Transformer, respectively. Experimental results on two benchmark datasets, WN18RR and FB15k-237, show that our approach clearly outperforms state-of-the-art methods on the fact prediction task and the relation prediction task. Furthermore, extensive experiments illustrate the benefits of enhancing path reasoning by exploiting structure information from entity relational contexts and the dual path encoding method.

1. Introduction

Knowledge graphs (KGs) contain millions of structured facts represented as triples, where each fact is stored as (head entity, relation, tail entity). As an effective way to store and search knowledge, KGs are critical for many enterprises to construct intelligent systems, such as web search [1, 2], question answering [3–5], and recommendation [6, 7]. Although many KGs have been built and published, such as WordNet [8], DBpedia [9], and Freebase [10], they are generally incomplete, as a large number of facts are missing [11], which hinders the performance of intelligent systems. Therefore, knowledge graph completion (KGC) is extremely crucial for improving the quality of KGs by inferring missing relations between entities.

Generally, prior methods of KGC can be divided into three categories: embedding-based, rule-based, and path-based models. (1) Embedding-based models [12–15] efficiently learn semantic connections between the head entity, relation, and tail entity by mapping them to a continuous vector space, but they have long been criticized for lacking explainability. (2) Rule-based models [16–18] mine rules composed of relations and variable entities, where each rule is assigned a confidence score indicating the probability that the rule holds. Despite their explainable strengths, they suffer from mining a limited form of rules dictated only by relations, resulting in poor discrimination. (3) Path-based models [19, 20] focus on modeling the path information between entities to interpretably predict their missing relation, which can learn not only regular relation patterns but also the semantics of entities on the path.

Because path-based methods can derive explainable inference results from the explicit and interpretable paths between entities, they have been widely studied. The core of path-based methods is to learn the representations of paths, where each path consists of nodes representing entities and edges representing relations, as the dark green path shown in the left part of Figure 1. Earlier work only learns the relation patterns of paths, either by obtaining the path representation from the probability of executing a random walk between two entities [19, 21], or by encoding the features of the relation sequence on the path through an RNN [22]. Since entities on the path also play an essential role in inference, later work [23, 24] takes entity information, such as the entities themselves and entity types, together with the relations on the path to obtain the path representation by applying an RNN or LSTM [25]. Although exploiting entity type information can simultaneously improve the discrimination and generalization of path-based methods [24], two problems still impede the performance of KGC.

First, these works only consider entities or entity types, many of which are untyped or loosely typed, leading to inaccurate and inadequate path representations. We argue that the local structure of an entity, by which we mean its neighboring relations, namely, its relational contexts [26], can provide valuable contextual information. All nonisolated entities have relational contexts, and entities with different semantics have different relational contexts. Taking Figure 1 as an example, the entity Jackie Chan, which is an actor, has the relation playedInFilm, while the entity Hong Kong, which is a location, links with the relation locationLanguage. This suggests that entities linked with playedInFilm are likely to be actors, not locations. Furthermore, given a query relation, relational contexts should not receive equal attention; only those that are semantically similar to the query are important. For instance, when predicting the profession of Jackie Chan, relational contexts such as playedInFilm and directedFilm should be emphasized, while others such as gender and nationality should be disregarded.

Second, existing methods usually learn path representations only in the forward direction and fail to capture new patterns from the reverse structure of paths, i.e., reverse paths. The reverse path from the tail entity to the head entity contains new relation patterns along a “new” path composed of new relations. As shown in Figure 1, the relation pattern of the path ⟨Jackie Chan → Hong Kong → cantonese⟩ for predicting the query relation personLanguage is “placeOfBirth → locationLanguage → personLanguage.” In contrast, its reverse path ⟨cantonese → Hong Kong → Jackie Chan⟩, which infers the reverse query relation personSpeakTheLanguage, contains the different relation pattern “locationSpeakTheLanguage → personBornInTheLocation → personSpeakTheLanguage.” Thus, the prediction results may differ when the encoding of reverse paths is also taken into account.

To tackle these problems, we propose a structure enhanced path reasoning (SPR) framework for KGC that leverages the structure information of entities and paths, i.e., relational contexts and reverse paths, respectively. First, a multiperspective path encoder is applied to encode the relation and entity information on a path to obtain the path representation. To get an accurate and adequate path representation that captures correct entity semantics, we propose to utilize multiperspective entity information, i.e., the relational contexts and types of each entity, whose importance is weighted through an attention mechanism before being aggregated into entity features. Then, an attentive path aggregator is employed to fuse the path features of all paths between the entity pair. Finally, to obtain path representations enriched by the new relation patterns contained in reverse paths for the final prediction, a dual path encoding method is proposed to combine the path features of forward paths with those of reverse ones. Moreover, to verify the validity of the proposed approach, we not only implement the path encoder based on LSTM as in most previous work but also apply Transformer [27] as a second architecture.

We evaluate our approach for both fact prediction task and relation prediction task on two benchmark datasets, WN18RR [14] and FB15k-237 [28]. Experimental results show that SPR outperforms state-of-the-art KGC approaches and achieves an absolute MAP gain for fact prediction over the best path-based baseline of 3.63% on WN18RR and 4.52% on FB15k-237. For relation prediction, SPR also scores the best, with MRR and hits@1 at 99.2% and 98.7% on WN18RR, respectively. Extensive experiments illustrate the effectiveness of utilizing entity relational contexts and dual path encoding. The code and datasets are available at https://github.com/wylResearch/SPR.

Our main contributions can be summarized as follows:
(i) We introduce relational contexts of entities to learn accurate and sufficient path representations that capture correct entity semantics.
(ii) We propose to learn path representations by encoding relations, entity relational contexts, and entity types on the path and by additionally encoding reverse paths in a dual encoding manner, which enriches path representations and improves prediction performance.
(iii) The proposed method achieves the state of the art for the fact prediction task and the relation prediction task on WN18RR and FB15k-237, and quantitative and qualitative analyses demonstrate the effectiveness of entity relational contexts and the dual path encoding method for enhancing path representations.

The remainder of this article is organized as follows. Section 2 provides a brief description of related work, and Section 3 gives a formal definition of the KGC task. A detailed description of the proposed SPR framework is reported in Section 4. The experimental settings and results are discussed in Sections 5 and 6, respectively. Finally, the main conclusions and future work are summarized in Section 7.

2. Related Work

2.1. Knowledge Graph Completion

There are mainly three categories of approaches to KGC: (1) embedding-based models learn low-dimensional distributed embeddings of entities and relations by designing a score function. These can be further classified into distance-based models, such as TransE [12], TransH [29], RotatE [30], and HAKE [31]; similarity-based models, such as ComplEx [13] and ANALOGY [32]; neural network models, such as NTN [33], ConvE [14], and R-GCN [15]; and models with additional information, such as DKRL [34], TKRL [35], KALE [36], and PTransE [37]. (2) Rule-based models mine interpretable rules from the knowledge graph to fill in missing facts based on existing knowledge, where a rule is defined in the form $\text{head} \leftarrow \text{body}$, in which the head is an atom, i.e., a fact with variable entities, and the body is a conjunction of atoms. Early approaches, such as AMIE [16] and RULES [38], mine the structure of rules in discrete spaces while learning their confidence in continuous spaces. Recent approaches, such as NeuralLP [17], DRUM [18], and NLIL [39], tend to employ neural networks to simultaneously learn the structure and confidence of rules in a continuous space. (3) Path-based models generally extract paths between an entity pair and learn the path features to make predictions. These fall into two major categories: path reasoning methods, such as PRA [19] and Chains [23], and reinforcement-based path finding methods, such as DeepPath [20] and MINERVA [40]. The former pays more attention to the path feature encoding process, while the latter focuses on extracting an effective inference path. In this paper, we focus on path reasoning methods.

2.2. Path Reasoning Methods in KGC

Paths between entities are beneficial to the explainability of reasoning in KGC. Early path reasoning approaches [19, 41] utilize the probability of walking between entities following relation sequences to get the path features. Later on, neural networks are employed to encode the path. Neelakantan et al. [22] apply RNN to encode the relation sequence as path representations. Chains [23] encodes alternate sequences of entity information and relations by RNN. APR [24] separately encodes relation sequence and entity types by LSTM and merges them into the path feature. However, the above-given methods only encode paths in the forward direction but ignore capturing new patterns from reverse paths. Although Cor-PRA [21], which is based on PRA, alleviates this problem by conducting a bidirectional random walk to get the probability as path feature, it neither learns the semantics of relations on the path nor considers encoding entity information. Different from it, we not only model entity information along with its relational contexts to enhance path representations but also capture new patterns from a novel dual path encoding method.

2.3. Entity Information in KGC

Entities are critical for predicting missing knowledge in KGs. To enhance entity embedding learned solely from facts, some embedding-based methods apply additional knowledge to enrich entity semantics, such as entity descriptions [34, 42], entity types [35, 43], and entity neighbors [15, 44]. Recently, PathCon [26] proposes to fuse relational contexts into entity representation and further perform message passing over relation graphs for relation prediction. However, PathCon represents an entity by summing up the representations of its relational contexts, without considering their different contributions. Unlike the above approach, a query relation-guided portrait attention and the self-attention are considered in our LSTM-based and Transformer-based architectures, respectively, to capture the importance of relational contexts for each entity, so that various relational contexts contribute differently.

Among path-based methods, entities are not considered by earlier work [19, 22, 41], where only relations are used. In later approaches, the entities themselves and entity types are taken as entity information for learning path representations [23, 24]. Chains [23] simply sums entity type representations and combines them with the entity embedding to represent an entity. APR [24] employs an attention mechanism on type hierarchies as entity semantics. In contrast, we further leverage entity relational contexts to capture reliable entity semantics, especially for entities that are untyped or loosely typed in KGs.

3. Problem Formulation

3.1. Knowledge Graph

Let $\mathcal{G} = (\mathcal{E}, \mathcal{R}, \mathcal{F})$ be a KG, where $\mathcal{E}$ is the set of entities, $\mathcal{R}$ is the set of relations, and $\mathcal{F}$ is the set of facts. Each fact in $\mathcal{F}$ can be defined as a triple $(h, r, t)$, where $h, t \in \mathcal{E}$ and $r \in \mathcal{R}$. For each fact $(h, r, t)$, we take its reverse form $(t, r^{-1}, h)$ into account, where $r^{-1}$ denotes the reverse relation of $r$.

3.2. Paths

The paths from head entity $h$ to tail entity $t$ are denoted as $P$. A path $p \in P$ with length $n$ can be stated as $p = \langle e_0, r_1, e_1, r_2, \ldots, r_n, e_n \rangle$, where $e_0 = h$, $e_n = t$, $e_i \in \mathcal{E}$, and $r_i \in \mathcal{R}$. We take $p^{-1} = \langle e_n, r_n^{-1}, e_{n-1}, \ldots, e_1, r_1^{-1}, e_0 \rangle$ as the reverse path of $p$, so that $p^{-1} \in P^{-1}$, where $P^{-1}$ is the set of reverse paths from $t$ to $h$ corresponding to $P$.

3.3. Relational Contexts and Entity Types

For an entity $e$, we define its relational contexts as the relations in facts with $e$ as the head entity, that is, $C(e) = \{\, r \mid (e, r, e') \in \mathcal{F} \,\}$. To distinguish them from relations on paths, $c$ is used to denote one of the relational contexts in $C(e)$, i.e., $c \in C(e)$. Let $T(e)$ denote the types of entity $e$, and $\tau \in T(e)$ is one of its types. An example path with relational contexts and types shown for each entity is displayed in Figure 2.
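For illustration, the following minimal Python sketch (our own, not the released code) collects relational contexts from a list of triples exactly as defined above; the entity and relation names in the toy KG are illustrative, and reverse relations are assumed to have been added to the triple list beforehand, as described in Section 5.1.

```python
# Collect relational contexts C(e): relations r appearing in facts (e, r, e') with e as head.
from collections import defaultdict

def relational_contexts(triples):
    """triples: iterable of (head, relation, tail) strings."""
    contexts = defaultdict(set)
    for head, relation, _tail in triples:
        contexts[head].add(relation)
    return contexts

# Toy usage with the entities from Figure 1 (illustrative names only):
kg = [("JackieChan", "placeOfBirth", "HongKong"),
      ("JackieChan", "playedInFilm", "RushHour"),
      ("HongKong", "locationLanguage", "Cantonese")]
print(relational_contexts(kg)["JackieChan"])  # {'placeOfBirth', 'playedInFilm'} (order may vary)
```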

Given a knowledge graph $\mathcal{G}$, a query relation $r_q$, a head entity $h$, a tail entity $t$, and a set of paths $P$ extracted between the entity pair, the goal of KGC is to infer whether the triple $(h, r_q, t)$ is true, i.e., the probability that $(h, r_q, t)$ holds. Important symbols are summarized in Table 1, including those representations that will be described later. In addition, we use a bold letter to represent a vector; e.g., $\mathbf{r}_q$ denotes the embedding of the relation $r_q$.

4. Approach

We propose a structure enhanced path-reasoning framework, namely, SPR, which enhances path representations by incorporating relational contexts into entity semantics and encoding reverse paths in a dual path encoding manner, to perform KGC. The overall architecture is presented in Figure 3, which consists of three components: (1) the multiperspective path encoder (MPE) encodes relations and multiperspective entity information, i.e., relational contexts and types of each entity, to learn accurate and adequate path representations. We explore different architectures that are based on LSTM and Transformer, which are described in Sections 4.1 and 4.2, respectively. (2) The attentive path aggregator fuses the representations of paths, where important and useful paths are highly weighted (Section 4.3). (3) The dual path encoding combines the path representations of forward paths and reverse ones to enhance the final prediction (Section 4.4).

4.1. LSTM-Based Multiperspective Path Encoder

The multiperspective path encoder encodes the relations, entity relational contexts, and entity types on a path $p$ to get its path representation $\mathbf{v}_p$, i.e., $\mathbf{v}_p = f_{\mathrm{MPE}}(p)$, where $f_{\mathrm{MPE}}$ is the function for encoding a path. In this section, we present an architecture based on LSTM to implement $f_{\mathrm{MPE}}$, termed LSTM-MPE, as shown in Figure 4. It consists of an LSTM-based relation encoder and an LSTM-based entity encoder that encode the path information separately.

4.1.1. Relation Encoder

For a path $p$, its relation sequence $(r_1, r_2, \ldots, r_n)$ is the key to inferring the relation between $h$ and $t$. The relation encoder applies an LSTM, namely $\mathrm{LSTM}_r$, to encode this sequence, and the encoding step can be expressed as

$\mathbf{h}_i^{r} = \mathrm{LSTM}_r(\mathbf{h}_{i-1}^{r}, \mathbf{r}_i)$,  (1)

where $\mathbf{h}_i^{r}$ is the hidden state of $\mathrm{LSTM}_r$ at step $i$, and $\mathbf{r}_i$ is the randomly initialized embedding of relation $r_i$. The last hidden state, formulated as $\mathbf{v}_r = \mathbf{h}_n^{r}$, is used to denote the relation representation of path $p$.
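A minimal PyTorch sketch of a relation encoder in the spirit of equation (1) is given below; the module name, embedding size, and hidden size are our assumptions rather than the released implementation.

```python
# Encode the relation sequence of a path with an LSTM and return the last hidden state v_r.
import torch
import torch.nn as nn

class RelationEncoder(nn.Module):
    def __init__(self, num_relations, emb_dim=50, hidden_dim=150):
        super().__init__()
        self.rel_emb = nn.Embedding(num_relations, emb_dim)   # randomly initialized r_i
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)

    def forward(self, relation_ids):
        # relation_ids: (batch, path_len) indices of r_1 ... r_n on each path
        emb = self.rel_emb(relation_ids)                       # (batch, path_len, emb_dim)
        _, (h_n, _) = self.lstm(emb)
        return h_n[-1]                                         # v_r: last hidden state per path

# Usage: v_r = RelationEncoder(num_relations=500)(torch.randint(0, 500, (4, 6)))
```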

4.1.2. Entity Encoder

Entity information plays a crucial role during the inference, which can lead to different path representations even when the relation sequences of paths are the same. Prior path-based work only considers entity or entity types as entity information, which leads to unreliable path representations that capture inaccurate and insufficient entity semantics. To compensate for the entity semantics obtained solely from entity types, we introduce relational contexts as additional entity information.

For the entity sequence of path $p$, its entity information includes two perspectives: entity relational contexts and entity types, namely, the relational context sequence and the entity type sequence. Encoding each of them comprises two modules: a Portrait Attention module and a Sequence Encoding module. Since the two kinds of entity information are encoded in the same way, we use the relational context sequence as an example for a detailed description.

(1) Relational Context Portrait Attention. This module aims to fuse the relational contexts of each entity on the path, i.e., for the entity $e_i$ at step $i$, fuse $C(e_i)$ to obtain the feature vector of $e_i$ about relational contexts. Inspired by APR [24], we use the current path features to lead the selection of important relational contexts of an entity. Furthermore, we consider that when predicting different query relations, the relational contexts of the same entity should be given different emphases. Therefore, the query relation is employed to guide the selection. Formally, for path $p$, the attention to relational contexts is guided by the query relation and the path feature about the relational contexts selected in the previous steps. Denote the path feature about relational contexts before step $i$ as $\mathbf{h}_{i-1}^{c}$, which we will detail later. This historical feature and the embedding of the query relation are first concatenated and then fed into a single-layer feed-forward neural network to obtain the guidance vector $\mathbf{g}_i$:

$\mathbf{g}_i = f_g([\mathbf{h}_{i-1}^{c} ; \mathbf{r}_q])$,  (2)

where $\mathbf{r}_q$ is the embedding of the query relation $r_q$. Then, $\mathbf{g}_i$ is combined with the embedding of each relational context $c_{i,j} \in C(e_i)$ to compute its attention weight $w_{i,j}$:

$w_{i,j} = f_3(f_1(\mathbf{g}_i) + f_2(\mathbf{c}_{i,j}))$,  (3)

where $\mathbf{c}_{i,j}$ denotes the randomly initialized embedding of relational context $c_{i,j}$, and $f_1$, $f_2$, and $f_3$ are all single-layer feed-forward neural networks. Next, the normalized attention weight $\alpha_{i,j}$ of relational context $c_{i,j}$ is calculated as

$\alpha_{i,j} = \exp(w_{i,j}) / \sum_{k=1}^{|C(e_i)|} \exp(w_{i,k})$.  (4)

Now, we can compute the fused relational context representation $\tilde{\mathbf{c}}_i$ for entity $e_i$, which is the weighted sum of all the transformed features of the relational contexts in $C(e_i)$:

$\tilde{\mathbf{c}}_i = \sum_{j=1}^{|C(e_i)|} \alpha_{i,j}\, f_2(\mathbf{c}_{i,j})$.  (5)
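The portrait attention of equations (2)–(5) could be sketched in PyTorch as follows; the layer names ($f_g$, $f_1$, $f_2$, $f_3$), dimensions, and the exact way scores are combined follow our reconstruction above and are assumptions, not the released code.

```python
# Query- and history-guided attention over the relational contexts of one entity.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PortraitAttention(nn.Module):
    def __init__(self, ctx_dim=50, query_dim=50, hidden_dim=150, attn_dim=50):
        super().__init__()
        self.f_g = nn.Linear(hidden_dim + query_dim, attn_dim)  # guidance vector, eq. (2)
        self.f_1 = nn.Linear(attn_dim, attn_dim)                # transforms the guidance
        self.f_2 = nn.Linear(ctx_dim, attn_dim)                 # transforms context embeddings
        self.f_3 = nn.Linear(attn_dim, 1)                       # scalar attention score, eq. (3)

    def forward(self, ctx_emb, query_emb, history):
        # ctx_emb: (batch, n_ctx, ctx_dim) embeddings of C(e_i)
        # query_emb: (batch, query_dim) embedding of r_q
        # history: (batch, hidden_dim) previous hidden state h^c_{i-1}
        g = self.f_g(torch.cat([history, query_emb], dim=-1))               # eq. (2)
        scores = self.f_3(self.f_1(g).unsqueeze(1) + self.f_2(ctx_emb))     # eq. (3)
        alpha = F.softmax(scores, dim=1)                                    # eq. (4)
        return (alpha * self.f_2(ctx_emb)).sum(dim=1)                       # fused c~_i, eq. (5)
```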

(2) Relational Context Sequence Encoding. The goal of this module is to obtain the relational context representation of the whole path, namely $\mathbf{v}_c$, from the relational context sequence of $p$. We employ an LSTM, denoted as $\mathrm{LSTM}_c$, to encode the features on the sequence. Let $\mathbf{h}_i^{c}$ denote its hidden state at step $i$; $\mathbf{h}_{i-1}^{c}$ is the historical feature mentioned in equation (2). It is used to obtain the fused feature vector $\tilde{\mathbf{c}}_i$ (equation (5)) of $e_i$, which serves as the input to $\mathrm{LSTM}_c$ at step $i$. Formally, the encoding step of the relational context sequence of $p$ can be expressed as

$\mathbf{h}_i^{c} = \mathrm{LSTM}_c(\mathbf{h}_{i-1}^{c}, \tilde{\mathbf{c}}_i)$.  (6)

The initialization of the hidden state $\mathbf{h}_0^{c}$ and cell state $\mathbf{s}_0^{c}$ of $\mathrm{LSTM}_c$ for the entity sequence is guided by the relation representation $\mathbf{v}_r$ of the corresponding relation sequence:

$\mathbf{h}_0^{c} = f_h(\mathbf{v}_r), \quad \mathbf{s}_0^{c} = f_s(\mathbf{v}_r)$,  (7)

where both $f_h$ and $f_s$ are single-layer feed-forward neural networks. Let $\mathbf{v}_c$ be the last hidden state of $\mathrm{LSTM}_c$ for $p$, which represents the relational context representation of the path.
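The following sketch combines the portrait attention above with the guided sequence encoding of equations (6) and (7); using an LSTMCell makes explicit that the hidden state both guides the attention of the next step and is initialized from $\mathbf{v}_r$. Module and dimension choices are illustrative assumptions.

```python
# Step-by-step encoding of the relational context sequence, guided by v_r (eqs. (6)-(7)).
import torch
import torch.nn as nn

class ContextSequenceEncoder(nn.Module):
    def __init__(self, attention, input_dim=50, hidden_dim=150):
        super().__init__()
        self.attention = attention                     # a PortraitAttention module as sketched above
        self.cell = nn.LSTMCell(input_dim, hidden_dim)
        self.f_h = nn.Linear(hidden_dim, hidden_dim)   # initializes h^c_0 from v_r, eq. (7)
        self.f_s = nn.Linear(hidden_dim, hidden_dim)   # initializes the cell state from v_r, eq. (7)

    def forward(self, ctx_emb_seq, query_emb, v_r):
        # ctx_emb_seq: list over path steps of (batch, n_ctx, ctx_dim) context embeddings
        # query_emb: (batch, query_dim) embedding of r_q; v_r: (batch, hidden_dim)
        h, s = self.f_h(v_r), self.f_s(v_r)                        # eq. (7)
        for ctx_emb in ctx_emb_seq:
            fused = self.attention(ctx_emb, query_emb, history=h)  # eqs. (2)-(5)
            h, s = self.cell(fused, (h, s))                        # eq. (6)
        return h                                                   # v_c: relational context representation
```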

(3) Entity Type Portrait Attention and Entity Type Sequence Encoding. The method for encoding the entity type sequence has the same structure as that for encoding the relational context sequence, but with different parameters. Denote the LSTM that encodes the entity type sequence as $\mathrm{LSTM}_\tau$; its last hidden state, notated as $\mathbf{v}_\tau$, which represents the entity type representation of path $p$, can be obtained similarly.

Then, the relational context representation $\mathbf{v}_c$ and the entity type representation $\mathbf{v}_\tau$ are summed to represent the entity representation $\mathbf{v}_e$ of path $p$, formally written as

$\mathbf{v}_e = \mathbf{v}_c + \mathbf{v}_\tau$.  (8)

The relation representation $\mathbf{v}_r$ and entity representation $\mathbf{v}_e$ are concatenated as the path representation $\mathbf{v}_p$ as follows, where $[\,\cdot\,;\,\cdot\,]$ means concatenation:

$\mathbf{v}_p = [\mathbf{v}_r ; \mathbf{v}_e]$.  (9)

4.2. Transformer-Based Multiperspective Path Encoder

Transformer-based networks show great power in fields such as natural language processing [45, 46] and computer vision [47, 48]. Although some methods apply the Transformer to the knowledge graph completion problem, they do not use it to encode both the relation and the multiperspective entity information on a path [49, 50]. Therefore, in this section, we investigate its ability to encode path information and propose a Transformer-based multiperspective path encoder, notated as Transformer-MPE, to implement $f_{\mathrm{MPE}}$.

For a path $p$, its relation sequence, entity relational context sequence, and entity type sequence are processed into three separate input sequences, namely, the relation input sequence $X_r$, the entity relational context input sequence $X_c$, and the entity type input sequence $X_\tau$. Inspired by pretrained language models [45], we add the “[CLS]” and “[SEP]” tokens to segment these sequences. As illustrated in Figure 5, “[CLS]” and “[SEP]” are placed at the beginning and the end of a sequence, respectively. The “[SEP]” token is also used to separate the relational contexts or types of different entities in the entity information sequences. The embeddings of these two special tokens are randomly initialized.

Then, these input sequences, i.e., $X_r$, $X_c$, and $X_\tau$, are fed into three separate Transformer encoders, namely, $\mathrm{Trm}_r$, $\mathrm{Trm}_c$, and $\mathrm{Trm}_\tau$, respectively. Since different kinds of information on the path are put into different encoders, we only consider the input embeddings as the sum of token embeddings and position embeddings, without segment embeddings. Moreover, considering the self-attention mechanism in the Transformer, instead of designing additional attention for the relational contexts or types within each entity, we treat them as a sequence in which they share the same position embeddings. Hence, the overall semantics of all entities at different positions on the path can be observed, and important relational contexts or entity types can be identified for inference.

The output hidden states of the “[CLS]” token of these encoders are taken as the relation representation $\mathbf{v}_r$, relational context representation $\mathbf{v}_c$, and entity type representation $\mathbf{v}_\tau$ of path $p$, respectively, which can be written as

$\mathbf{v}_r = \mathrm{Trm}_r(X_r)_{[\mathrm{CLS}]}, \quad \mathbf{v}_c = \mathrm{Trm}_c(X_c)_{[\mathrm{CLS}]}, \quad \mathbf{v}_\tau = \mathrm{Trm}_\tau(X_\tau)_{[\mathrm{CLS}]}$.  (10)
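As a rough sketch of one such encoder (not the released implementation), the input could be built from token and position embeddings and the leading “[CLS]” state read out as in equation (10); the vocabulary layout, special-token indices, and dimensions below are assumptions.

```python
# One Transformer encoder over a [CLS]/[SEP]-segmented input sequence; tokens that belong
# to the same entity share a position id, as described in the text.
import torch
import torch.nn as nn

class ContextTransformerEncoder(nn.Module):
    def __init__(self, vocab_size, dim=50, heads=2, layers=3, max_pos=64):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, dim)      # assumes [CLS]=0 and [SEP]=1
        self.pos_emb = nn.Embedding(max_pos, dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           dim_feedforward=150, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=layers)

    def forward(self, token_ids, position_ids):
        # token_ids / position_ids: (batch, seq_len)
        x = self.tok_emb(token_ids) + self.pos_emb(position_ids)
        h = self.encoder(x)
        return h[:, 0]                                     # hidden state of the leading [CLS] token
```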

Next, $\mathbf{v}_c$ is summed with $\mathbf{v}_\tau$ as the entity representation $\mathbf{v}_e$, which is then concatenated with $\mathbf{v}_r$ to get the path representation $\mathbf{v}_p$ of path $p$, as in the LSTM-based path encoder (equations (8) and (9)). Note that, different from the LSTM-MPE, the query relation $r_q$ is not considered in the Transformer-MPE, which we leave as future work.

4.3. Attentive Path Aggregator

The paths between an entity pair are usually extracted randomly, and most of them are useless or even noisy for inference. It is therefore necessary to select important and useful ones among these paths. The attentive path aggregator is applied to weigh the paths and fuse their features. The weight of path $p$ is determined directly by its path feature $\mathbf{v}_p$, which can be formulated as

$w_p = f_5(f_4(\mathbf{v}_p))$,  (11)

where $f_4$ and $f_5$ are single-layer feed-forward neural networks. Then, the normalized attention weight $\alpha_p$ of $p$ is

$\alpha_p = \exp(w_p) / \sum_{k=1}^{|P|} \exp(w_{p_k})$,  (12)

where $|P|$ is the number of paths in $P$. Finally, the pooled representation of all paths in $P$, i.e., $\mathbf{v}_P$, is computed as the weighted sum of these path representations:

$\mathbf{v}_P = \sum_{p \in P} \alpha_p \mathbf{v}_p$.  (13)
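A PyTorch sketch of the aggregator in equations (11)–(13) is shown below; the layer names mirror our reconstruction, and the padding mask for batching a variable number of paths is our addition.

```python
# Attention over all paths between an entity pair, producing the pooled representation v_P.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PathAggregator(nn.Module):
    def __init__(self, path_dim=300, attn_dim=100):
        super().__init__()
        self.f4 = nn.Linear(path_dim, attn_dim)
        self.f5 = nn.Linear(attn_dim, 1)

    def forward(self, path_feats, mask):
        # path_feats: (batch, n_paths, path_dim) path representations v_p
        # mask: (batch, n_paths), 1 for real paths, 0 for padding
        w = self.f5(self.f4(path_feats)).squeeze(-1)              # eq. (11)
        w = w.masked_fill(mask == 0, float("-inf"))
        alpha = F.softmax(w, dim=-1)                              # eq. (12)
        return torch.einsum("bp,bpd->bd", alpha, path_feats)      # v_P, eq. (13)
```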

4.4. Dual Path Encoding

Prior approaches only use the feature $\mathbf{v}_P$ of forward paths to predict whether $(h, t)$ is connected by $r_q$, i.e., whether $(h, r_q, t)$ holds. However, this overlooks the role of reverse paths in assisting the prediction. Although the inverse sequence of a path $p$ can be obtained directly, it cannot be used to reason about $(t, r_q^{-1}, h)$, because simply inverting the sequence breaks the directionality of relations. For example, take a fact $(e_{i-1}, r_i, e_i)$ on the path; in the inverse sequence it reads as $(e_i, r_i, e_{i-1})$, while the path states $(e_{i-1}, r_i, e_i)$, which is contradictory. In order to enhance the inference about $(h, r_q, t)$ using features of reverse paths, the dual path encoding module additionally models the representations of the reverse paths $P^{-1}$. A reverse path $p^{-1}$ maintains the relation directionality and thus can be used to predict $(t, r_q^{-1}, h)$, which concerns the same entity pair but yields different path features than $p$. This is because the reverse path contains new relation patterns composed of the new relations $r_i^{-1}$, which result in new path features.

This module learns the representation of reverse paths in the same way as for forward paths. That is, for a reverse path $p^{-1} \in P^{-1}$, the multiperspective path encoder encodes it as the path representation $\mathbf{v}_{p^{-1}}$. Then, the paths in $P^{-1}$ are fused into $\mathbf{v}_{P^{-1}}$ by the attentive path aggregator. Finally, the dual-path representations $\mathbf{v}_P$ and $\mathbf{v}_{P^{-1}}$ are utilized by simply applying a two-layer feed-forward neural network to predict the probability $s_{(h, r_q, t)}$:

$s_{(h, r_q, t)} = \sigma(\mathrm{FFN}([\mathbf{v}_P ; \mathbf{v}_{P^{-1}}]))$,  (14)

where $\sigma$ is the sigmoid function.
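A sketch of this prediction head (equation (14)) follows; the hidden size of the two-layer network is an assumption.

```python
# Two-layer feed-forward predictor over the concatenated forward and reverse path features.
import torch
import torch.nn as nn

class DualPathPredictor(nn.Module):
    def __init__(self, agg_dim=300, hidden_dim=100):
        super().__init__()
        self.ffn = nn.Sequential(
            nn.Linear(2 * agg_dim, hidden_dim),   # takes [v_P ; v_{P^{-1}}]
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, v_forward, v_reverse):
        # v_forward, v_reverse: (batch, agg_dim) pooled features of P and P^{-1}
        logit = self.ffn(torch.cat([v_forward, v_reverse], dim=-1))
        return torch.sigmoid(logit).squeeze(-1)   # probability that (h, r_q, t) holds, eq. (14)
```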

Note that encoding reverse paths shares the parameters of the multiperspective path encoder with encoding forward ones. Two attentive path aggregators are applied separately to fuse the features of the paths in $P$ and of the reverse paths in $P^{-1}$. In LSTM-MPE, both the relational context portrait attention and the entity type portrait attention for a reverse path are guided by the reverse query relation $r_q^{-1}$ in equation (2), and the initialization of the hidden and cell states in the relational context sequence encoding and the entity type sequence encoding is guided by the relation representation of the reverse path in equation (7).

4.5. Training

Let $\mathcal{Q}$ denote the set of query relations. For each task (query relation) $r_q \in \mathcal{Q}$, the true triples in the KG with $r_q$ as the relation are regarded as positive samples $\mathcal{D}^{+}_{r_q}$, while unobservable triples with $r_q$ as the relation are treated as negative samples $\mathcal{D}^{-}_{r_q}$. The ground truth $y$ is 1 for a positive sample and 0 for a negative sample. The parameters of the model for each $r_q$ can be trained end to end by minimizing the loss between predictions and ground truths:

$\mathcal{L}_{r_q} = \sum_{(h, r_q, t) \in \mathcal{D}_{r_q}} \ell(s_{(h, r_q, t)}, y_{(h, r_q, t)})$,  (15)

where $\mathcal{D}_{r_q} = \mathcal{D}^{+}_{r_q} \cup \mathcal{D}^{-}_{r_q}$ is the set of training samples for $r_q$, and $\ell$ is the binary cross-entropy loss. The training process of SPR is presented in Algorithm 1.

Input: KG $\mathcal{G}$, query relations $\mathcal{Q}$, training examples $\mathcal{D}_{r_q}$ and validation examples for each query relation, paths between the entity pair of each example, types of each entity.
Output: The set of model parameters for each query relation, i.e., $\{\theta_{r_q}\}_{r_q \in \mathcal{Q}}$.
(1) for each query relation $r_q$ in $\mathcal{Q}$ do
(2)  Initialize model SPR with parameters $\theta_{r_q}$;
(3)  for each epoch do
(4)   for each batch do
(5)    Sample a batch of training examples from $\mathcal{D}_{r_q}$;
(6)    for each example in the batch do
(7)     Compute the prediction probability by equation (14);
(8)     Compute the loss by equation (15);
(9)    Update $\theta_{r_q}$ by minimizing the batch loss;
(10)   Validate using the validation examples;
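A compact sketch of the per-relation training loop of Algorithm 1 is given below; the model constructor, data loaders, and validation callback are placeholders rather than the released training script.

```python
# Train one SPR model per query relation, following the structure of Algorithm 1.
import torch

def train_per_relation(build_model, loaders, validate_fn, epochs=30, lr=1e-3, device="cpu"):
    """loaders: dict mapping each query relation r_q to its training batches."""
    bce = torch.nn.BCELoss()
    params = {}
    for r_q, loader in loaders.items():                 # one model and parameter set per r_q
        model = build_model().to(device)
        optim = torch.optim.Adam(model.parameters(), lr=lr)
        for _ in range(epochs):
            for paths_fwd, paths_rev, labels in loader: # batches of (h, r_q, t) examples
                prob = model(paths_fwd, paths_rev)      # equation (14)
                loss = bce(prob, labels.float())        # equation (15)
                optim.zero_grad()
                loss.backward()
                optim.step()
            validate_fn(r_q, model)                     # pick the best epoch on validation data
        params[r_q] = model.state_dict()
    return params
```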

5. Experiments

5.1. Datasets

We evaluate our model using the data released in APR [24], which is based on two common KG datasets: WN18RR [14] and FB15k-237 [28]. The statistics are shown in Table 2. The data contain positive and negative examples, as well as paths extracted by bidirectional breadth-first search between the entities in each example for each query relation. Reverse relations have already been added to augment these knowledge graphs for path extraction. The 11 relations in WN18RR are all set as query relations, and 10 relations are sampled as queries out of the 237 relations in FB15k-237. The maximum length of a path is 6 for WN18RR and 4 for FB15k-237. The maximum number of paths between each entity pair is 200, which means that if more paths can be extracted, then 200 of them are randomly sampled. The type hierarchies of each entity are provided in both datasets. Specifically, the type data of entities for WN18RR is extracted from the inherited hypernyms available in WordNet [8], and for FB15k-237 it is the data released by Xie et al. [35]. The average and maximum numbers of entity types are 4.6 and 14 for WN18RR, and 6.4 and 7 for FB15k-237. For more details, please refer to APR [24]. As for the relational contexts used in the proposed method, we count them for each entity in both KGs, where the average and maximum numbers of relational contexts are 2.6 and 9 for WN18RR, and 10.6 and 59 for FB15k-237.

5.2. Experimental Settings

We present the settings according to the different components of the proposed SPR. For LSTM-MPE, the hidden dimension of all the LSTMs (i.e., $\mathrm{LSTM}_r$, $\mathrm{LSTM}_c$, and $\mathrm{LSTM}_\tau$) is set to 150 for both datasets. The output dimensions of the fully connected layers for encoding the relational context sequence are set to 50, 50, 1, 150, and 150, respectively, and the same holds for the fully connected layers that encode the entity type sequence. For Transformer-MPE, the number of Transformer layers and the number of heads per layer of $\mathrm{Trm}_r$, $\mathrm{Trm}_c$, and $\mathrm{Trm}_\tau$ are set to 3 and 2 on both datasets. The dimension of the feed-forward network in these Transformer layers is set to 150. For the attentive path aggregator and dual path encoding, the output dimensions of $f_4$, $f_5$, and the final prediction layer are set to 100, 1, and 1, respectively, and the output dimension of the first layer of the prediction network is set to be the same as that of the pooled path representation. Since SPR can be built on different MPEs, we denote the overall models based on LSTM-MPE and Transformer-MPE by SPR_LSTM and SPR_Transformer, respectively.

For both architectures, the embedding matrix of relations is randomly initialized for both datasets. The embedding matrix of entity types for WN18RR is initialized with vectors provided by APR [24], which are mapped from a pretrained Google News word2vec model. The embedding matrix of entity types for FB15k-237 is randomly initialized. As for relational contexts, another randomly initialized embedding matrix, different from that of relations, is applied. Let $d_r$, $d_c$, and $d_\tau$ be the embedding dimensions of relations, relational contexts, and entity types, respectively, whose values are listed in Table 3. The position embedding matrices in SPR_Transformer are randomly initialized, with the same dimensions as the corresponding input tokens (relations, relational contexts, or entity types). Table 3 also reports the maximum numbers of relational contexts and types used for each entity in the model, i.e., $N_c$ and $N_\tau$, as well as the batch size and learning rate. Note that we set $N_\tau$ to the maximum number of types of any entity, i.e., we use all the types an entity has. However, the number of relational contexts of some entities is so large, e.g., up to 59 in FB15k-237, that we use only the top $N_c$ of them instead of all. Due to the limitation of computational resources, SPR_Transformer is set to a smaller batch size than SPR_LSTM, and the learning rate is reduced accordingly. We employ the self-adaptive optimization method Adam [51] for all trainings, and each model is trained for up to 30 epochs, where the best one is chosen based on the validation sets. All experiments are conducted with one Tesla V100 GPU.

5.3. Evaluation Metrics

We evaluate the performance of SPR on two KGC tasks, namely, fact prediction and relation prediction. In the fact prediction task, the system needs to identify whether a given triple $(h, r_q, t)$ is true or not. We utilize the classical mean average precision (MAP) as the evaluation metric, following previous path-based models. It measures the model performance over all categories (here, all query relations) and is the mean of the average precision (AP) of each category. In the relation prediction task, the goal is to predict the missing relation between an entity pair $(h, t)$. We adopt typical ranking metrics, including mean rank (MR), mean reciprocal rank (MRR), and Hits@n.
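For concreteness, these metrics can be computed as in the following sketch (our own helper functions, not the official evaluation script).

```python
# MAP over query relations and ranking metrics for relation prediction.
def average_precision(scored):                 # scored: list of (score, is_positive)
    ranked = sorted(scored, key=lambda x: -x[0])
    hits, total, ap = 0, sum(1 for _, pos in ranked if pos), 0.0
    for i, (_, pos) in enumerate(ranked, start=1):
        if pos:
            hits += 1
            ap += hits / i
    return ap / total if total else 0.0

def mean_average_precision(per_relation):      # dict: relation -> list of (score, is_positive)
    aps = [average_precision(v) for v in per_relation.values()]
    return sum(aps) / len(aps)

def ranking_metrics(ranks, n=1):               # ranks: rank of the true relation per test pair
    mr = sum(ranks) / len(ranks)
    mrr = sum(1.0 / r for r in ranks) / len(ranks)
    hits_n = sum(1 for r in ranks if r <= n) / len(ranks)
    return mr, mrr, hits_n
```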

5.4. Baselines

We select several state-of-the-art models with released code as baselines:
(1) Embedding-based models: TransE [12], PTransE [37], RotatE [30], HAKE [31], PairRE [52], and PathCon [26]. Among them, paths are additionally considered in PTransE and PathCon. We test these embedding-based models with their released code and corresponding optimal parameters, and the hidden dimension is set to 500 on WN18RR and 1000 on FB15k-237. Following Xiong et al. [20], test triples with the same relation are clustered together to calculate the AP and finally obtain the MAP. Note that, since PathCon focuses on the relation prediction task by calculating the probability distribution of relations given an entity pair, we take the probability of the query relation for an entity pair as its triple score to calculate MAP for the fact prediction task.
(2) Path-based models: PRA [19], SFE [41], Path-RNN [22], Chains [23], and APR [24], whose details are described as follows:
(a) PRA [19]: adopts the probabilities of performing random walks between entities following relation sequences as path features.
(b) SFE [41]: improves PRA [19] by regarding relation paths as binary features without calculating the random walk probabilities.
(c) Path-RNN [22]: encodes the relation sequence on the path through an RNN as path features for prediction.
(d) Chains (types) [23]: encodes relations and entity types on the path by an RNN to jointly create the path representation and considers multiple paths for inference, where the features of the multiple types of an entity are averaged.
(e) Chains (types + entities) [23]: goes a step further than (d) by combining the features of the entities themselves to obtain the path representation, in addition to the relations and the averaged entity type features.
(f) APR [24]: utilizes two LSTMs to separately encode relations and weighted entity type hierarchies on the path and combines them into path representations, which are aggregated by an attention mechanism.

The first two approaches, i.e., PRA [19] and SFE [41], are tested with the code released by Gardner and Mitchell [41]; Path-RNN [22] and the two Chains variants [23] are tested with the code released by Das et al. [23], which uses the LogSumExp method proposed in Chains [23] to pool the scores of paths; APR [24] is tested with the code released by Liu et al. [24]. We use their fact prediction results reported by Liu et al. [24].

6. Results and Analysis

In this section, we first analyze the main results of SPR against the baselines for the fact prediction task in Section 6.1. Then, we additionally build a unified version of SPR and report its results for fact prediction in Section 6.2. The results for the relation prediction task are presented in Section 6.3. To explore the effects of using entities, entity relational contexts, entity types, and dual path encoding, we design several variants of SPR_LSTM in Section 6.4. Next, the contribution of each component is described in Section 6.5. We also illustrate the influence of different strategies for choosing relational contexts, as well as the effect of the value of $N_c$, in Section 6.6. At last, to better understand the effectiveness of the proposed approach, several case studies are shown in Section 6.7.

6.1. Main Results of Fact Prediction

The fact prediction results of SPR against various baselines are shown in Table 4. We can observe that both the LSTM-based and Transformer-based SPR outperform all the baselines on both datasets. (1) Compared to path-based models, the results suggest the effectiveness of SPR in enhancing path features by utilizing relational contexts and dual path encoding, where SPR_LSTM achieves 3.63% and 4.52% absolute improvements over the best path-based baseline on WN18RR and FB15k-237, respectively. SPR_Transformer is not as good as SPR_LSTM, probably because the information of the query relation is not fused in Transformer-MPE (see Section 6.5 for more analysis). Insufficient training data may also hinder the power of the Transformer, since each query relation has its own training data and model parameters (a more detailed analysis can be found in Section 6.2). (2) The superiority of SPR is displayed on WN18RR with a MAP higher than 87%, which beats the embedding-based approaches, whose MAP is no higher than 55%, by a large margin. This is probably because SPR does not use entity embeddings as most embedding-based methods do, but instead uses relational contexts and types to represent entities. When there is a huge number of entities in a KG, the use of entity embeddings may introduce noise and lead to performance degradation, as discussed in Section 6.4. In addition, it may also benefit from the ability of SPR to efficiently utilize the information on the path for prediction, where the results in Section 6.4 show that different kinds of information on paths play essential roles in the performance improvement. (3) On FB15k-237, even though the recent embedding-based method HAKE with 58.11% MAP performs slightly better than the path-based method APR with 57.35% MAP, SPR beats both of them. However, compared with the embedding-based approaches, the advantage of SPR on FB15k-237 is not as obvious as on WN18RR, which may be caused by the larger number of relations in FB15k-237. This leads to a great number of combinations of relational patterns on the paths, making it more difficult for SPR to learn knowledge from the paths and make predictions. (4) SPR beats PathCon on both datasets, even though PathCon also utilizes relational contexts and paths between entity pairs, indicating the validity of SPR in treating paths as sequences and identifying the different contributions of the multiple relational contexts of each entity.

6.2. Results of Unified Version
6.2.1. SPR_Unified

Recall that SPR treats different query relations as different tasks for training in Section 4.5. In order to obtain a model that takes all relations as one task for inference, we modify the proposed SPR into a unified model, namely, SPR_Unified. Given an entity pair $(h, t)$, instead of predicting the probability of a fact consisting of a certain query relation $r_q$, SPR_Unified predicts a vector $\mathbf{v}_{\mathrm{rel}}$ representing the relation features between $h$ and $t$, so that the score of $(h, r_q, t)$ can be generated by matching $\mathbf{v}_{\mathrm{rel}}$ against the embedding of $r_q$ via a dot product. Specifically, it replaces the prediction in equation (14) by the following formulas:

$\mathbf{v}_{\mathrm{rel}} = \mathrm{FFN}([\mathbf{v}_P ; \mathbf{v}_{P^{-1}}])$,  (16)

$s_{(h, r_q, t)} = \sigma(\mathbf{v}_{\mathrm{rel}} \cdot \mathbf{r}_q)$,  (17)

where the output dimension of the fully connected layer is changed to be the same as the dimension of the relation embeddings, $\mathbf{r}_q$ is the relation embedding of query relation $r_q$, and $\sigma$ is the sigmoid function.
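A sketch of this unified scoring head (equations (16) and (17)) is given below; the dimensions are illustrative assumptions.

```python
# Predict a relation feature vector and score a query relation via dot product.
import torch
import torch.nn as nn

class UnifiedPredictor(nn.Module):
    def __init__(self, agg_dim=300, hidden_dim=100, rel_dim=50):
        super().__init__()
        self.ffn = nn.Sequential(
            nn.Linear(2 * agg_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, rel_dim),   # output matches the relation embedding size
        )

    def forward(self, v_forward, v_reverse, query_rel_emb):
        v_rel = self.ffn(torch.cat([v_forward, v_reverse], dim=-1))   # eq. (16)
        score = (v_rel * query_rel_emb).sum(-1)                       # dot product with r_q
        return torch.sigmoid(score)                                   # eq. (17)
```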

Input: KG $\mathcal{G}$, query relations $\mathcal{Q}$, training examples $\mathcal{D}$ for all query relations, validation examples for all query relations, paths between the entity pair of each example, types of each entity.
Output: The model parameters for all query relations, i.e., $\theta$.
(1) Initialize model SPR_Unified with parameters $\theta$;
(2) for each epoch do
(3)  for each batch do
(4)   Sample a batch of training examples from $\mathcal{D}$;
(5)   for each example in the batch do
(6)    Compute the prediction probability by equation (17);
(7)    Compute the loss by equation (18);
(8)   Update $\theta$ by minimizing the batch loss;
(9)  Validate using the validation examples;
6.2.2. Training

For training SPR_Unified, the positive samples $\mathcal{D}^{+}$ are composed of all the positive instances corresponding to each query relation in SPR, i.e., $\mathcal{D}^{+} = \bigcup_{r_q \in \mathcal{Q}} \mathcal{D}^{+}_{r_q}$. The negative samples $\mathcal{D}^{-}$ are obtained by corrupting the relation of each positive triple in $\mathcal{D}^{+}$. That is, for a positive triple $(h, r, t)$ in $\mathcal{D}^{+}$, we obtain its negative samples by replacing its relation with all relations that do not exist between $h$ and $t$, i.e., $\{(h, r', t) \mid r' \in \mathcal{R}, (h, r', t) \notin \mathcal{F}\}$. The ground truth is 1 for a positive sample and 0 for a negative one. SPR_Unified is trained by minimizing the loss between predictions and ground truths over the training samples $\mathcal{D}$:

$\mathcal{L} = \sum_{(h, r, t) \in \mathcal{D}} \ell(s_{(h, r, t)}, y_{(h, r, t)})$,  (18)

where $\mathcal{D} = \mathcal{D}^{+} \cup \mathcal{D}^{-}$, and $\ell$ is the binary cross-entropy loss. The training process is shown in Algorithm 2.

6.2.3. Evaluation

Note that the query relation $r_q$ is encoded in SPR_LSTM, so we remove the guidance of $\mathbf{r}_q$ in equation (2) to get its unified version, denoted as SPR_Unified_LSTM. We denote the unified SPR_Transformer as SPR_Unified_Transformer. For comparison, we also modify APR [24] into APR_Unified in the same way. We evaluate these unified models on the fact prediction and relation prediction tasks on WN18RR. It is worth noting that we do not evaluate on FB15k-237, since the dataset released by Liu et al. [24] contains only samples for 10 query relations out of 237 relations and thus cannot support complete training of the unified model for all relations.

Results for fact prediction are listed in Table 5, from which we find that: (1) the unified version SPR_Unified_Transformer achieves the best result of 81.32% MAP, which gains an absolute improvement of 27.48% and 6.7% compared to the previously optimal embedding-based model HAKE and the path-based model APR_Unified, respectively. (2) Comparing SPR_Unified_Transformer with SPR_Unified_LSTM, the Transformer-based architecture is clearly superior to the LSTM-based one, with a 9.49% higher MAP, which illustrates the ability of the Transformer to handle large-scale and complex data. (3) Compared to APR_Unified, which is also a path-based method built on LSTM, SPR_Unified_LSTM shows a performance degradation. This can be explained by the inability of the LSTM to handle complex data and learn useful information when considering relational contexts and the dual path encoding method in the unified version, resulting in a MAP decline for SPR_Unified_LSTM. (4) The performance of the unified versions, including APR_Unified and SPR_Unified, is reduced compared to the corresponding original models. This is because in the unified version there is only one set of model parameters to be trained, rather than one set for each query relation. That is, the total number of parameters of the unified model is reduced compared to the original one, while the amount of data for training it is increased. As a result, the unified version fails to learn and remember all the knowledge, leading to reduced performance. However, the decline of our SPR_Unified_Transformer is the smallest, with a MAP 6.21% lower than SPR_Transformer, while APR_Unified is 10.29% lower than APR [24]. This once again indicates the ability of the Transformer-based architecture to deal with complex and massive data.

6.3. Main Results of Relation Prediction

The relation prediction results of SPR and its unified version on WN18RR are listed in Table 6. As mentioned above, the dataset FB15k-237 released by Liu et al. [24] contains only samples for predicting 10 query relations, which is not suitable for relation prediction task of predicting all relations, so we only evaluate on WN18RR.

We can conclude that: (1) SPR_Unified achieves the best results on several metrics, including MR, MRR, hits@1, and hits@5. Although the hits@3 of SPR_Unified is slightly worse than that of APR_Unified by about 0.1% to 0.2%, it obtains the best MRR and MR since its hits@1 is at least 0.5% higher. (2) SPR_Unified_LSTM and SPR_Unified_Transformer obtain comparable performance, with the former having a higher hits@3 and the latter having a higher hits@1. However, compared to their fact prediction performance in Table 5, SPR_Unified_Transformer outperforms SPR_Unified_LSTM by a large margin, indicating that the MAP for the fact prediction task is a more rigorous evaluation metric. (3) It is not surprising that the unified version SPR_Unified shows better performance than SPR for the relation prediction task, while the opposite conclusion is reached for fact prediction. This is mainly because SPR_Unified learns a probability distribution over all relations in one model, while SPR gets a different probability distribution for each query relation, which has separate model parameters. In SPR, a high triple score for a query relation does not mean that the query relation will rank at the top among all relations.

6.4. Variants Analysis

Several variants of SPR_LSTM are explored for comparison, to investigate the effects of using entities, entity relational contexts and entity types, as well as dual path encoding: (1) Path_LSTMr does not encode entity information, which means that the path representation in equation (9) is obtained only from the relation representation, i.e., $\mathbf{v}_p = \mathbf{v}_r$; (2) Path_LSTMe encodes entity information by directly encoding the entity sequences of the forward and reverse paths, i.e., it uses entity embeddings as input to the LSTM in the entity encoder, rather than encoding sequences of relational contexts and entity types; (3) Path_BiLSTM does not apply the dual path encoding method, i.e., it uses only the forward paths $P$ but conducts bidirectional encoding, where the backward encoding of a path is equivalent to encoding its inverse sequence rather than its reverse path $p^{-1}$. Specifically, Path_BiLSTM encodes the relation sequence by a Bi-LSTM and encodes the sequences of relational contexts and entity types both forward and backward, obtaining forward and backward representations that are combined into the path representation.

Table 7 shows the results of SPR_LSTM against its variants, from which we can find that: (1) learning path representations with the dual path encoding method, taking relational contexts and entity types as entity information, achieves the best results. (2) Directly encoding entities may not bring benefits. Compared to Path_LSTMr, although the results of Path_LSTMe increase on FB15k-237 (from 51.88% to 55.32%), they deteriorate on WN18RR (from 82.64% to 78.21%), which confirms that entity embeddings may introduce noise. This may be due to the larger number of entities and the smaller number of facts in WN18RR, which highlights the drawback of embedding-based methods that learn entity representations directly. (3) Compared with Path_BiLSTM, which encodes the relation sequence and entity information sequence directly forward and backward, SPR_LSTM gains 1.69% and 2.81% on WN18RR and FB15k-237, respectively, indicating that the dual path encoding method in SPR_LSTM works better than encoding forward paths and their inverse sequences in Path_BiLSTM. This suggests the indispensable contribution of the reverse paths, which comprise new relation patterns.

6.5. Ablation Study

We conduct ablation studies to investigate the effectiveness of the different components of SPR: (1) -DualPath: removing the dual path encoding method, which means we only encode forward paths to get $\mathbf{v}_P$ and use it to calculate the probability in equation (14); (2) -Rel_ctx: removing entity relational contexts for forward and reverse paths, which implies that the entity representation in equation (8) is obtained only from the entity type representation, i.e., $\mathbf{v}_e = \mathbf{v}_\tau$; (3) -Type: removing entity types for forward and reverse paths, where the entity representation in equation (8) is obtained only from the relational context representation, i.e., $\mathbf{v}_e = \mathbf{v}_c$; (4) -$r_q$: removing the guidance of the query relation in the relational context portrait attention and the entity type portrait attention in SPR_LSTM, i.e., the attention over relational contexts and types is guided only by the historical path feature in equation (2).

From the results shown in Tables 8 and 9, it can be observed that: (1) removing the dual path encoding (-DualPath) results in performance degradation on both datasets, which demonstrates that the learned features of reverse paths are different from those of forward paths and can help predict the query relation. (2) Using only entity relational contexts (-Type) or only entity types (-Rel_ctx) degrades performance. This shows that these two kinds of information are complementary for portraying the semantics of entities, so together they can enhance the representation of entities. Besides, the performance drop is more pronounced when relational contexts are not used, which indicates their greater contribution compared to types. (3) For SPR_LSTM, the results decline without the guidance of the query relation (-$r_q$), indicating that the query relation can help spotlight the essential relational contexts and types of an entity. This variant performs slightly lower (by 0.05%) than SPR_Transformer on FB15k-237, although neither uses the query relation information, suggesting that the guidance of the query relation is more important on FB15k-237. (4) For SPR_LSTM, the removal of dual path encoding (-DualPath) causes the most noticeable decreases, of 3.91% and 4.62% on WN18RR and FB15k-237, respectively. For SPR_Transformer, the most influential setting is the removal of relational contexts (-Rel_ctx), with drops of 1.73% and 3.83% on WN18RR and FB15k-237, respectively. These results imply that relational contexts and dual path encoding are critical for improving the path reasoning ability.

6.6. Influence of Entity Relational Contexts

We examine the impact of relational contexts in this section. First, three strategies are set to rank the relational contexts of an entity based on how often each appears in the facts with the entity as the head: (1) frequent_sort: sorts the relational contexts in descending order of frequency; (2) infrequent_sort: sorts the relational contexts in ascending order of frequency; (3) random_sort: sorts the relational contexts randomly. Second, we investigate the influence of $N_c$, the maximum number of relational contexts used for each entity. That is, if an entity has more than $N_c$ relational contexts, then only the top $N_c$ of them are used.

The results based on SPR_LSTM are displayed in Figure 6. We can notice that: (1) although the performance on the two datasets displays different trends, the results are relatively high when $N_c$ is close to the average number of relational contexts per entity. (2) On WN18RR, regardless of the value of $N_c$, the performance is approximately the same under all three sorting strategies. This may be because the average number and the variety of relational contexts are both small, i.e., 2.6 and 11, respectively. Thus, when $N_c$ is smaller than the average number of 2.6, most entities select the same relational contexts under all three strategies. Even when $N_c$ is greater than 2.6, the number of combinations of entity relational contexts on the path is not large, so the strategies make little difference. (3) On FB15k-237, the prediction performance is clearly the best with frequent_sort, while it is erratic as $N_c$ increases under the infrequent_sort and random_sort settings. This indicates that frequently occurring relational contexts are more representative of the semantics usually expressed by an entity, whereas infrequent relational contexts would in most cases be inconsistent with the semantics the entity intends to represent.

6.7. Case Study

We analyze the weights of the relational contexts and types of entities on the paths. Figure 7 gives a path of a positive example in FB15k-237 learned by SPR_LSTM. We can observe that new knowledge can be learned from relational contexts, so as to enrich the semantics of entities. For instance, most of the highly weighted types of the entity RazzieAward shown in the first column are the meaningless pad token rather than the meaningful award category type. In contrast, the relational contexts nomineeWorkOfAward and winningWorkOfAward of RazzieAward are given higher weights, which fills a gap in the entity information learned from types. Moreover, relational contexts can capture what kind of semantics an entity is concerned with on a path. For the entity Machete shown in the second column, its relational context awardHonoredFor, which is about awards, is emphasized for inferring the query relation nomineeWorkOfAward, while filmLanguage is neglected.

We also count the important relational contexts learned for head entities and tail entities of several query relations, where the results are in Table 10. As can be seen, most of the emphasized relational contexts are logically meaningful for predicting the query relation. For example, when inferring facts with query relation personProfession, i.e., predicting the profession (tail entity) of a person (head entity), athleteSportTeam and filmDesignby are highly valued, which can infer professions like “athlete” and “actor,” respectively. However, relational contexts such as nationality and gender, which are owned by most entities in the KG, are ignored, suggesting that the model can capture important relational contexts associated with the query relation.

In addition, Table 11 shows the prediction probabilities for inferring a query relation where only one forward path is available for the listed positive examples. The reverse paths and the relational contexts and types of entities are not displayed. The model Path_LSTMt is a variant that removes dual path encoding, entity relational contexts, and the query relation in the entity information attention from SPR_LSTM, which means it only encodes the relations and entity types of forward paths as the path representations. From the comparison with Path_LSTMt, it is clear that SPR_LSTM predicts correctly, which demonstrates the contribution of relational contexts and dual path encoding.

7. Conclusion

In this paper, we propose a path-based framework named SPR for knowledge graph completion, which enhances the path representation through the structure information of each entity and each path, i.e., relational contexts and reverse paths, respectively. We utilize the relational contexts of entities to obtain a reliable path representation that captures accurate and sufficient entity semantics, where the different weights of relational contexts are taken into account. Moreover, a dual path encoding method is used to enrich the path representation by capturing the new path patterns contained in reverse paths. Different architectures based on LSTM and Transformer are designed to encode the information on the path, including relations, entity relational contexts, and entity types. Experimental results on the fact prediction task and relation prediction task show that SPR outperforms state-of-the-art models. Quantitative and qualitative experiments demonstrate the effectiveness of enhancing path reasoning through entity relational contexts and the dual path encoding method.

Part of our future work will focus on designing a Transformer-based architecture that fuses the query relation to guide the selection of important entity relational contexts and entity types. In addition, leveraging textual descriptions encoded by pretrained language models to complement the semantics of entities may be beneficial for path reasoning. Therefore, there is potential for further research on how to integrate textual semantic information with the structure information on the path for the final prediction.

Data Availability

The datasets supporting this research are from previously reported studies, which are open-sourced and have been cited in the paper. The codes used to support the findings of this study are open-sourced at https://github.com/wylResearch/SPR.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Authors’ Contributions

Yilin Wang and Zhen Huang are equally contributing authors.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (Grant no. 62006243).