Abstract

Knowledge graphs, a typical multi-relational structure, contain large-scale facts about the world, yet they are still far from complete. Knowledge graph embedding, as a representation method, constructs a low-dimensional and continuous space to describe latent semantic information and predict missing facts. Among existing solutions, most embedding models have high time and memory-space complexity and are therefore difficult to apply to large-scale knowledge graphs. Other embedding models, such as TransE and DistMult, have lower complexity but ignore inherent features and use only the correlations between different entities to represent the features of each entity. To overcome these shortcomings, we present a novel low-complexity embedding model, SimE-ER, which calculates the similarity of entities in independent and associated spaces. In SimE-ER, each entity (relation) is described in two parts: in the independent space, entity (relation) features are represented by the features the entity (relation) intrinsically owns, while in the associated space they are expressed through the entities (relations) they connect with. The similarity between the embeddings of the same entity in the two representation spaces is required to be high. In experiments, we evaluate our model on two typical tasks: entity prediction and relation prediction. The experimental results demonstrate that SimE-ER outperforms state-of-the-art competitors while having low time and memory-space complexity.

1. Introduction

Knowledge graphs (KGs), as an important part of artificial intelligence, play an increasingly essential role in different domains [1]: question answering systems [2, 3], information retrieval [4], semantic parsing [5], named entity disambiguation [6], biological data mining [7, 8], and so on [9, 10]. In knowledge graphs, facts can be denoted as instances of binary relations (e.g., PresidentOf(DonaldTrump, America)). Nowadays, a great number of knowledge graphs, such as WordNet [11], Freebase [12], DBpedia [13], YAGO [14], and NELL [15], have been constructed independently to describe structured information in various domains [16], and all of them are fairly sparse.

Knowledge representation learning [17–19] is considered an important task for extracting latent features from the associated space. Recently, knowledge embedding [20, 21], an effective feature-extraction method [22], was proposed to compress a high-dimensional and sparse space into a low-dimensional and continuous one. Knowledge embedding can be used to derive new facts from known knowledge bases (e.g., link prediction) and to determine whether a triplet is correct (e.g., triplet classification) [23]. Moreover, embedding representations [24] have been used to support question answering systems [25] and machine reading [26]. However, almost all embedding models use only the features and attributes within the knowledge graph to represent entities and relations, omitting the fact that entities and relations are projections of facts in an independent space. Besides, almost all of them have high time and memory-space complexity and cannot be used on large-scale knowledge graphs.

In this research, we propose a novel similarity-based knowledge embedding model, SimE-ER, which calculates entity and relation similarities between two spaces: an independent space and an associated space. A sketch of the model framework is provided in Figure 1. The basic idea of this paper is that the independent and associated spaces represent the irrelevant and interconnected features of entities (relations), respectively. In the independent space, the features of entities (relations) are independent and irrelevant to each other. By contrast, the features of entities (relations) in the associated space are interconnected and interacting, so entities and relations can be denoted by the entities and relations connected with them. Moreover, the similarity between the representations of the same entity (relation) in the two spaces should be high. As Figure 1 shows, in the independent space the features of an entity are constructed only from the entity itself, whereas in the associated space the entity is denoted by the other entities and relations connected with it, depicted as blue points (lines). We want the features of the same entity in the independent and associated spaces to be similar. Besides, vector embeddings are used to represent knowledge graphs.

In the associated space, take the entity Steve Jobs as an example, which appears in multiple triplets such as (Steve Jobs, Apple Inc., FounderOf), (Steve Jobs, America, Nationality), and (Steve Jobs, Laurene Powell, CoupleOf). If we combine all the corrupted triplets with the same missing entity, such as (…, Apple Inc., FounderOf), (…, America, Nationality), and (…, Laurene Powell, CoupleOf), it is easy to infer that the missing entity is Steve Jobs. Similarly, if we combine all the corrupted triplets with the same missing relation, such as (Steve Jobs, Apple Inc., …), (Jack Ma, Alibaba, …), and (Sundar Pichai, Google, …), we can infer that the missing relation is FounderOf. This scenario is shown in Figure 2. Hence, using the correlations between different entities to represent features is an effective method. In practice, however, it is unsuitable to use only the correlations between different entities and omit the inherent features entities have, such as the attributes of an entity, which are hard to represent through correlations with other entities. Therefore, we construct the independent space, which preserves the inherent features of each entity. We combine the independent and associated spaces to represent the overall features of entities and relations, which in turn represents the knowledge graph more comprehensively. The motivation for employing both types of spaces is to model correlation while preserving individual specificity.

Compared with other embedding models, vector embeddings have evident advantages in time and memory-space complexity. We evaluate SimE-E and SimE-ER on the popular tasks of entity prediction and relation prediction. The experimental results validate that the proposed method achieves competitive results compared with previous models.

Contributions. To summarize, the main contributions of this paper are as follows:
(i) We propose a similarity-based embedding model, SimE-ER. In SimE-ER, we consider the entity and relation similarities between different spaces simultaneously, which extracts the features of entities and relations comprehensively.
(ii) Compared with other embedding models, our model has lower time and space complexity, which makes it more effective for processing large-scale knowledge graphs.
(iii) Through thorough experiments on real-life datasets, our approach is demonstrated to outperform the existing state-of-the-art models on entity prediction and relation prediction tasks.

Organization. We discuss related work in Section 2 and then introduce our method, along with the theoretical analysis, in Section 3. Afterwards, experimental studies are presented in Section 4, followed by conclusion in Section 5.

2. Related Work

In this section, we introduce several related works [19] published in recent years that achieve state-of-the-art results. According to how relation features are represented, we divide embedding models into two categories: matrix-based embedding models [27] and vector-based embedding models [28].

2.1. Matrix-Based Embedding Models

In this part, matrices (tensors) are used to describe relation features.

Structured Embedding. The Structured Embedding model (SE) [29] considers that head and tail entities are overlapping in a specific-relation space where the triplet exists. It uses two relation-specific mapping matrices $M_{r,1}$ and $M_{r,2}$ to extract features from the head entity $\mathbf{h}$ and the tail entity $\mathbf{t}$.

Single Layer Model. Compared with SE, the Single Layer Model (SLM) [30] uses a nonlinear activation function to transform the extracted features and considers the features after activation to be orthogonal with the relation features. The extracted features are composed of the entities' features after mapping plus a bias from their relation.

Neural Tensor Network. The Neural Tensor Network (NTN) [30, 31] is a more complex model and considers that a tensor can be regarded as a better feature extractor than matrices.

Semantic Matching Energy. The basic idea of Semantic Matching Energy (SME) [32] is that, if the triplet is correct, the features of the head entity and tail entity are orthogonal. Similar to SLM, the features of the head (tail) entity are composed of the entity's features after mapping plus a bias from its relation. There are two ways to extract features, i.e., linear and nonlinear.

Latent Factor Model. The Latent Factor Model (LFM) [33, 34] assumes that the features of the head entity are orthogonal with those of the tail entity when the head entity is mapped into the specific-relation space. Its score function can be defined as $f_r(h, t) = \mathbf{h}^{\top} \mathbf{M}_r \mathbf{t}$, where $\mathbf{h}$, $\mathbf{M}_r$, and $\mathbf{t}$ denote the features of the head entity, relation, and tail entity, respectively.

2.2. Vector-Based Embedding Models

In this part, relations are described as vectors rather than matrices to improve the effectiveness of representation models.

Translation-Based Model. The basic idea of the translation-based model TransE [23, 35, 36] is that the relation is a translation vector between the head and tail entities, i.e., $\mathbf{h} + \mathbf{r} \approx \mathbf{t}$. The score function is $f_r(h, t) = \|\mathbf{h} + \mathbf{r} - \mathbf{t}\|$, where $\mathbf{h}$, $\mathbf{r}$, and $\mathbf{t}$ denote the head entity, relation, and tail entity embeddings, respectively. Because TransE only processes simple relations, other translation-based models [37–39] have been proposed to improve it.
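For concreteness, the following is a minimal NumPy sketch of the TransE score function; the toy embeddings and identifiers are ours, not the original implementation:

```python
import numpy as np

def transe_score(h, r, t, norm=1):
    """TransE dissimilarity ||h + r - t||: lower means the triplet is more plausible."""
    return np.linalg.norm(h + r - t, ord=norm)

# Toy usage with random 50-dimensional embeddings.
rng = np.random.default_rng(0)
h, r, t = rng.normal(size=(3, 50))
print(transe_score(h, r, t))
```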

Combination Embedding Model. CombinE [40] describes relation features with the plus and minus combinations of each entity pair. Compared with other translation-based models, CombinE can represent relation features in a more comprehensive way.

Bilinear-Diag Model. DistMult [41] uses a bilinear formulation to represent entities and relations and utilizes the learned embeddings to extract logical rules.
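A minimal sketch of the bilinear-diagonal (DistMult) score, which reduces the full bilinear product to an element-wise one; the identifiers are ours:

```python
import numpy as np

def distmult_score(h, r, t):
    """Bilinear-diagonal score sum(h * r * t): higher means more plausible."""
    return float(np.sum(h * r * t))

rng = np.random.default_rng(0)
h, r, t = rng.normal(size=(3, 50))
print(distmult_score(h, r, t))
```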

Holographic Embedding Model. HOLE [42] utilizes a compositional vector space based on the circular correlation of vectors, which creates fixed-width representations. The compositional representation has the same dimensionality as the representation of its constituents.

Complex Embedding Model. ComplEx [43] divides entity and relation embeddings into two parts, i.e., a real part and an imaginary part. The real part captures the features of symmetric relations, and the imaginary part captures those of asymmetric relations.
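A minimal sketch of the ComplEx score on complex-valued embeddings (identifiers are ours); taking the real part of the trilinear product is what lets the model distinguish symmetric from asymmetric relations:

```python
import numpy as np

def complex_score(h, r, t):
    """ComplEx score Re(sum(h * r * conj(t))): higher means more plausible."""
    return float(np.real(np.sum(h * r * np.conj(t))))

rng = np.random.default_rng(0)
h, r, t = rng.normal(size=(3, 50)) + 1j * rng.normal(size=(3, 50))
print(complex_score(h, r, t))
```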

Project Embedding Model. ProjE [44], a shared-variable neural network model, uses two diagonal matrices to combine the entity and relation features and calculates the similarity between the combined features and each candidate entity. In training, correct triplets have high similarity.

Convolutional Embedding Model. ConvE [45] reshapes the input features into a 2D shape and uses a convolutional neural network to extract the entity and relation features.

Compared with matrix-based embedding models, vector-based models have obvious advantages in time and memory-space complexity. Among these vector-based models, TransE is a classical baseline and has been applied in many applications; TransR improves TransE by handling complex relation types; and DistMult and ComplEx use probability-based methods to represent knowledge and achieve state-of-the-art results.

3. Similarity-Based Model

Given a training set $S$ of triplets, each triplet $(h, r, t)$ is composed of two entities $h, t \in E$ (the set of entities) and a relationship $r \in R$ (the set of relationships). Our model learns entity embeddings ($\mathbf{h}_i$, $\mathbf{t}_i$, $\mathbf{h}_a$, $\mathbf{t}_a$) and relationship embeddings ($\mathbf{r}_i$, $\mathbf{r}_a$) to represent the features of entities and relations, where the subscripts $i$ and $a$ denote the independent and associated spaces, respectively. The entity and relation embeddings take values in $\mathbb{R}^k$, where $k$ is the dimension of the entity and relation embedding spaces.

3.1. Our Models

The basic idea of our model is that, for each entity (relation), the features are divided into two parts. The first part describes the inherent features of entities (relations) in the independent space; these feature embedding vectors are denoted as $\mathbf{h}_i$, $\mathbf{r}_i$, $\mathbf{t}_i$. The second part describes the triplet features in the associated space; these feature embedding vectors are denoted as $\mathbf{h}_a$, $\mathbf{r}_a$, $\mathbf{t}_a$. In the independent space, the feature vectors describe the inherent features that entities (relations) own. In the associated space, the features of an entity $h$ are composed of the other entities and relations that connect with $h$.

The entities (relations) in the associated space are projections of the entities (relations) in the independent space. Hence, the representation features of the same entity in the independent and associated spaces are similar, while the representation features of different entities are not. This can be described as

$$\mathbf{h}_i \approx \mathbf{r}_a \odot \mathbf{t}_a, \qquad \mathbf{t}_i \approx \mathbf{h}_a \odot \mathbf{r}_a, \qquad \mathbf{r}_i \approx \mathbf{h}_a \odot \mathbf{t}_a, \tag{1}$$

where $\odot$ denotes the element-wise product. In detail, in (1), if we combine the features of $\mathbf{r}_a$ and $\mathbf{t}_a$, we obtain part of the head entity features; that is, the combined features $\mathbf{r}_a \odot \mathbf{t}_a$ are similar to $\mathbf{h}_i$. In this paper, we use the Cosine measure to calculate the similarity between different spaces. Taking the head entity as an example, the Cosine similarity between the two spaces can be denoted as

$$\mathrm{Cos}(\mathbf{h}_i, \mathbf{r}_a \odot \mathbf{t}_a) = \frac{\mathrm{Dot}(\mathbf{h}_i, \mathbf{r}_a \odot \mathbf{t}_a)}{\|\mathbf{h}_i\| \cdot \|\mathbf{r}_a \odot \mathbf{t}_a\|}, \tag{2}$$

where Dot denotes the dot product, i.e., $\mathrm{Dot}(\mathbf{x}, \mathbf{y}) = \mathrm{Sum}(\mathbf{x} \odot \mathbf{y})$, and Sum denotes summation over the vector elements. The numerator calculates the similarity, and the norms in the denominator constrain the length of the features. To reduce the training complexity, we consider only the numerator and use regularization terms in place of the denominator. Hence, the similarity of the head entity features in the independent and associated spaces can be described as

$$S_h(h, r, t) = \mathrm{Sum}(\mathbf{h}_i \odot \mathbf{r}_a \odot \mathbf{t}_a). \tag{3}$$

We expect the value of $S_h$ to be larger when $\mathbf{h}_i$ and $\mathbf{r}_a \odot \mathbf{t}_a$ denote the same head entity, and smaller otherwise.

To represent entities in a more comprehensive way, we consider the similarity of head and tail entities simultaneously. The score function can be denoted as

$$S(h, r, t) = \mathrm{Sum}(\mathbf{h}_i \odot \mathbf{r}_a \odot \mathbf{t}_a) + \mathrm{Sum}(\mathbf{h}_a \odot \mathbf{r}_a \odot \mathbf{t}_i). \tag{4}$$

The embedding model based on the similarity of head and tail entities is named SimE-E.

On the basis of entity similarity, we further consider relation similarity, which enhances the representation of relation features. The comprehensive model, which considers all the similarities of entity (relation) features between the two spaces, can be described as

$$S(h, r, t) = \mathrm{Sum}(\mathbf{h}_i \odot \mathbf{r}_a \odot \mathbf{t}_a) + \mathrm{Sum}(\mathbf{h}_a \odot \mathbf{r}_a \odot \mathbf{t}_i) + \mathrm{Sum}(\mathbf{h}_a \odot \mathbf{r}_i \odot \mathbf{t}_a). \tag{5}$$

The embedding model based on the similarity of entity and relation is named as SimE-ER.
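To make the scoring concrete, here is a minimal NumPy sketch of the SimE-E and SimE-ER score functions as reconstructed in (3)-(5); the variable names and toy dimensions are ours:

```python
import numpy as np

def sim_head(h_i, r_a, t_a):
    """Simplified head-entity similarity (3): Sum(h_i * r_a * t_a)."""
    return float(np.sum(h_i * r_a * t_a))

def sime_e(h_i, h_a, r_a, t_i, t_a):
    """SimE-E score (4): head- and tail-entity similarity terms."""
    return float(np.sum(h_i * r_a * t_a) + np.sum(h_a * r_a * t_i))

def sime_er(h_i, h_a, r_i, r_a, t_i, t_a):
    """SimE-ER score (5): adds the relation-similarity term."""
    return sime_e(h_i, h_a, r_a, t_i, t_a) + float(np.sum(h_a * r_i * t_a))

# Toy usage with random k-dimensional embeddings (k = 100).
rng = np.random.default_rng(1)
h_i, h_a, r_i, r_a, t_i, t_a = rng.normal(size=(6, 100))
print(sime_er(h_i, h_a, r_i, r_a, t_i, t_a))
```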

3.2. Training

To learn the proposed embeddings and encourage discrimination between golden triplets and incorrect triplets, we minimize the following logistic ranking loss function over the training set:

$$L = \sum_{(h, r, t) \in \Delta} \log\left(1 + \exp\left(-y_{hrt} \cdot S(h, r, t)\right)\right), \tag{6}$$

where $\Theta$ corresponds to the embeddings $\mathbf{h}_i$, $\mathbf{h}_a$, $\mathbf{r}_i$, $\mathbf{r}_a$, $\mathbf{t}_i$, $\mathbf{t}_a$, and $y_{hrt} \in \{-1, 1\}$ is the label of the triplet: $y_{hrt} = 1$ denotes that $(h, r, t)$ is positive, and $y_{hrt} = -1$ denotes that it is negative. $\Delta$ is a triplet set [28] that contains both the positive triplet set $\Delta^{+}$ and the negative triplet set $\Delta^{-}$.

The set of negative triplets $\Delta^{-}$, constructed according to (7), is composed of training triplets in which either the head (tail) entity or the relation is replaced by a random entity or relation:

$$\Delta^{-} = \left\{ (h', r, t) \mid h' \in E \right\} \cup \left\{ (h, r, t') \mid t' \in E \right\} \cup \left\{ (h, r', t) \mid r' \in R \right\}, \quad (h, r, t) \in \Delta^{+}. \tag{7}$$

Only one entity or relation is replaced for each corrupted triplet, each with the same probability. To prevent overfitting, some constraints are considered when minimizing the loss function $L$:

$$\|\theta\|_2 \le 1 \quad \text{for all } \theta \in \Theta. \tag{8}$$

Constraint (8) bounds the length of the entity (relation) features for SimE-E and SimE-ER. We convert it into the following loss function by means of soft constraints:

$$L' = L + \lambda \sum_{\theta \in \Theta} \|\theta\|_2^2, \tag{9}$$

where $\lambda$ is a hyperparameter that weighs the importance of the soft constraints. We utilize an improved stochastic gradient descent method (Adagrad) [46] to train the models. Compared with SGD, Adagrad shrinks the learning rate effectively as the number of iterations increases, which makes it insensitive to the initial learning rate.
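The following sketch, under the reconstructions in (6), (7), and (9), shows per-triplet negative sampling and the soft-constrained loss; the function names and uniform index sampling are illustrative assumptions, not the authors' code:

```python
import numpy as np

rng = np.random.default_rng(2)

def corrupt(triplet, n_entities, n_relations):
    """Negative sampling per (7): replace the head, tail, or relation uniformly."""
    h, r, t = triplet
    slot = rng.integers(3)
    if slot == 0:
        return (int(rng.integers(n_entities)), r, t)
    if slot == 1:
        return (h, r, int(rng.integers(n_entities)))
    return (h, int(rng.integers(n_relations)), t)

def soft_logistic_loss(score, y, params, lam):
    """Per-triplet logistic loss (6) plus the soft L2 constraint (9)."""
    reg = lam * sum(np.sum(p ** 2) for p in params)
    return np.log1p(np.exp(-y * score)) + reg

print(corrupt((3, 1, 7), n_entities=100, n_relations=10))
```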

3.3. Comparison with Existing Models

To compare the time and memory-space complexities of different models, we show the results in Table 1, where $k$ represents the dimension of the entity and relation embeddings, $s$ is the number of the tensor's slices, and $n_e$ and $n_r$ are the numbers of entities and relations, respectively.

The comparison results are shown as follows:
(i) Except for DistMult and TransE, the baselines use relation matrices to project entity features into relation spaces, which gives these models high memory-space and time complexity. Compared with these models, SimE-E and SimE-ER have lower time complexity and can be applied to large-scale knowledge graphs more effectively.
(ii) In comparison to TransE, SimE-E and SimE-ER can dynamically control the ratio of positive and negative triplets, which enhances the robustness of the representation models.
(iii) DistMult is a special case of SimE-E and SimE-ER in which only a single similarity of entity or relation is considered. That is to say, SimE-E and SimE-ER can extract the features of entities (relations) more comprehensively.
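As a rough illustration of the memory argument, a back-of-the-envelope parameter count for vector-based models, assuming (per Section 3) that each entity and relation carries one vector per space in our models; treat these counts as estimates:

```python
def param_count(n_e, n_r, k, two_spaces=False):
    """Embedding parameters: (n_e + n_r) * k, doubled when each entity and
    relation has both an independent- and an associated-space vector."""
    return (n_e + n_r) * k * (2 if two_spaces else 1)

# FB15K-scale example: ~15k entities, ~1.3k relations, k = 100.
print(param_count(15_000, 1_300, 100))                   # e.g., TransE / DistMult
print(param_count(15_000, 1_300, 100, two_spaces=True))  # e.g., SimE-ER
```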

4. Experiments and Analysis

In this section, our models SimE-E and SimE-ER are evaluated and compared with several baselines that have been shown to achieve state-of-the-art performance. Firstly, two classical tasks are adopted to evaluate our models: entity prediction and relation prediction. Then, we use case studies to verify the effectiveness of our models. Finally, based on the practical experimental results, we analyze the time and memory-space costs.

4.1. Datasets

We use two real-life knowledge graphs to evaluate our method:
(i) WordNet (https://wordnet.princeton.edu/download), a classical dictionary, is designed to describe the correlations and semantic information between different words. Entities are used to describe the concepts of different words, and relationships are defined to describe the semantic relevance between different entities, such as instance hypernym, similar to, and member of domain topic. The data version we use is the same as in [23], where triplets are denoted as (sway_2, has_instance, brachiate_1) or (felis_1, member_meronym, catamount_1). We adopt a subset of WordNet, named WN18 [23].
(ii) Freebase (code.google.com/p/wiki-links), a huge and continually growing knowledge graph, describes a large number of facts about the world. In Freebase, entities are described by labels, and relations are denoted by a hierarchical structure. We employ two subsets of Freebase, named FB15K and FB40K [23].

We show the statistics of the datasets in Table 2. From Table 2, we can see that, compared with WN18, FB15K and FB40K have more relationships and can be regarded as typical large-scale knowledge graphs.

4.2. Experiment Setup

Evaluation Protocol. For each triplet in the test set, each item of the triplet (head entity, tail entity, or relation) is removed and replaced in turn by every item in the dictionary. We score these corrupted triplets with the score function, sort the candidates by score, and store the rank of the correct entity or relation. The whole procedure is repeated for the head entity, tail entity, and relation of each test triplet. Note that some corrupted triplets generated in the removal-and-replacement process are themselves correct; hence, we filter out of the corrupted triplets those that actually exist in the training and validation sets. The evaluation measure before filtering is named "Raw", and the measure after filtering is named "Filter". We use two evaluation measures, similar to [42]:
(i) MRR is an improved measure over MeanRank [23]: instead of the average rank of all the entities (relations), it calculates the average reciprocal rank. Compared with MeanRank, MRR is less sensitive to outliers. We report results under both the Filter and Raw rules.
(ii) Hits@n reports the ratio of correct entities (relations) among the top-n ranked candidates. Because the number of entities is much larger than that of relations, we use larger cut-offs n for the entity prediction task than for the relation prediction task.

A state-of-the-art embedding model should have higher MRR and Hits@n.
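A compact sketch of the filtered ranking evaluation (MRR and Hits@n) for the tail-entity case; `score_fn`, the candidate loop, and the filtering set are schematic assumptions rather than the authors' code:

```python
import numpy as np

def filtered_metrics(test_triplets, all_entities, known_triplets, score_fn, ns=(1, 3, 10)):
    """Rank the true tail against all candidate tails, filtering known positives."""
    ranks = []
    for h, r, t in test_triplets:
        scores = {e: score_fn(h, r, e) for e in all_entities
                  if e == t or (h, r, e) not in known_triplets}
        # Higher score = more plausible; rank of the correct tail among candidates.
        rank = 1 + sum(s > scores[t] for e, s in scores.items() if e != t)
        ranks.append(rank)
    ranks = np.array(ranks)
    return {"MRR": float(np.mean(1.0 / ranks)),
            **{f"Hits@{n}": float(np.mean(ranks <= n)) for n in ns}}
```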

Baselines. Firstly, we compare the proposed methods with CP [49], which uses canonical polyadic decomposition to extract entity and relation features; then we compare them with TransE, which considers that tail entity features are close to the combined features of the head entity and relation. Besides, TransR [47], ER-MLP [48], DistMult [41], and ComplEx [43] are also used for comparison with our methods. We train CP, DistMult, ComplEx, TransE, and TransR using the codes provided by their authors. We select the embedding dimension, the weight of regularization, the learning rate, and the ratio of negative to positive samples by grid search on the validation set. The negative samples in different epochs are different.

Implementation. For experiments using SimE-E and SimE-ER, we select the dimension of the entity and relation embeddings, the weight of regularization, the ratio of negative to positive samples, and the mini-batch size by grid search on the validation set. We utilize the improved stochastic gradient descent method (Adagrad) [46] to train the loss function. As the number of iterations increases, the learning rate in Adagrad decreases, so Adagrad is insensitive to the initial learning rate. The initial embedding values of both SimE-E and SimE-ER are generated uniformly at random within a fixed range determined by the dimension $k$ of the feature vectors. Training is stopped using early stopping on the validation set MRR (using the Filter measure), computed every 50 epochs, with a maximum of 2000 epochs.

For both SimE-E and SimE-ER, the optimal configurations on the validation set (the embedding dimension $k$, the regularization weight $\lambda$, the negative-sample ratio, and the mini-batch size) are selected separately for WN18, FB15K, and FB40K; the selected values differ per dataset and per model.

T-test. In the experiments, each model is run 15 times independently, and we calculate the mean and standard deviation of its results. Then we use Student's t-test to compare the performance of different models; the test is given below [50, 51].

Let $\bar{X}_1$ and $S_1$ be the mean and standard deviation of model 1 over $n_1$ runs, and let $\bar{X}_2$ and $S_2$ be the mean and standard deviation of model 2 over $n_2$ runs. We construct the hypotheses

$$H_0: \mu_1 = \mu_2, \qquad H_1: \mu_1 \neq \mu_2. \tag{10}$$

The t statistic can be described as

$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{S_1^2 / n_1 + S_2^2 / n_2}}. \tag{11}$$

The degrees of freedom ($df$) of the t-distribution can be computed as

$$df = \frac{\left(S_1^2 / n_1 + S_2^2 / n_2\right)^2}{\frac{\left(S_1^2 / n_1\right)^2}{n_1 - 1} + \frac{\left(S_2^2 / n_2\right)^2}{n_2 - 1}}. \tag{12}$$

In the entity and relation prediction tasks, we calculate the mean and standard deviation of MRR and Hits@n and compare the models' performance with the t-test.
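For reference, this two-sample test can be computed directly from the summary statistics with SciPy; a minimal sketch with placeholder numbers, not our reported results:

```python
from scipy.stats import ttest_ind_from_stats

# Welch's t-test from per-model mean/std over 15 independent runs.
t_stat, p_value = ttest_ind_from_stats(
    mean1=0.74, std1=0.01, nobs1=15,   # hypothetical model 1 MRR
    mean2=0.70, std2=0.01, nobs2=15,   # hypothetical model 2 MRR
    equal_var=False)                   # Welch's variant, as in (11) and (12)
print(t_stat, p_value)
```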

4.3. Link Prediction

For link prediction [52–54], we test two subtasks: entity prediction and relation prediction. Entity prediction aims to predict the missing $h$ or $t$ from a fact triplet $(h, r, t)$; similarly, relation prediction is to determine which relation is more suitable for a corrupted triplet $(h, ?, t)$.

Entity Prediction. This set of experiments tests the models' ability to predict entities. Experimental results (mean plus/minus standard deviation) on WN18, FB15K, and FB40K are shown in Tables 3, 4, and 5, and we can observe the following:
(i) On WN18, a small-scale knowledge graph, ComplEx achieves state-of-the-art results on MRR and Hits@n. However, on FB15K and FB40K, two large-scale knowledge graphs, SimE-E and SimE-ER achieve excellent results on MRR and Hits@n, with Hits@n values up to 0.868 and 0.889, respectively. These outstanding results prove that our models can represent different kinds of knowledge graphs effectively, especially large-scale ones.
(ii) ComplEx is better than SimE-ER on WN18; the reason is that ComplEx can distinguish the symmetric and antisymmetric relationships contained in the relation structure of WN18. However, on FB15K and FB40K, SimE-E and SimE-ER are better than ComplEx. The reason is that these datasets have far more relations than WN18, and their relation structure is more complex and harder to represent, which visibly affects the representation ability of ComplEx.
(iii) The results of SimE-E and SimE-ER are similar to each other; the largest margin, 0.013, appears on the filtered MRR on FB15K. This demonstrates that both SimE-E and SimE-ER can extract entity features from the knowledge graph and predict missing entities effectively.
(iv) Compared with DistMult, the special case of our models, SimE-E and SimE-ER achieve better results, especially on FB15K, where the filtered MRR is up to 0.740. These results prove that our models, which use irrelevant and interconnected features to construct the independent and associated spaces, represent entity and relation features more comprehensively.

We use the t-test to evaluate the effectiveness of our models. On FB15K and FB40K, compared with the other baselines, our results achieve significant improvements; for example, when comparing the Hits@n results of ComplEx and SimE-ER, the computed t-value exceeds the critical value. The t-test results prove that, on FB15K and FB40K, our experimental results achieve significant improvement over the other baselines.

Relation Prediction. This set of experiments tests the models' ability to predict relations. Tables 6, 7, and 8 show the prediction performance on WN18, FB15K, and FB40K. From the tables, we discover the following:
(i) Similar to the results of entity prediction, on WN18, ComplEx achieves better results on MRR and one Hits@n cut-off, while SimE-ER obtains better results on the other two Hits@n cut-offs. On FB15K, except for one Hits@n value, the results of SimE-ER are better than ComplEx and the other baselines, and the best Hits@n value is up to 0.842, which is much higher (an improvement of 20.1%) than the state-of-the-art baselines. On FB40K, SimE-ER achieves state-of-the-art results on all measures; in particular, the filtered MRR is up to 0.603.
(ii) In the entity prediction task, the results of SimE-E and SimE-ER are similar. However, in the relation prediction task, SimE-ER achieves significantly better results on Raw MRR and Hits@n. We verify the results with the t-test, and the t-values are larger than the critical value. The difference between the entity and relation tasks demonstrates that considering both entity and relation similarity extracts relation features more effectively while preserving entity-feature extraction.
(iii) On FB15K, the gap is significant: SimE-E and SimE-ER outperform the other models, with a filtered MRR of 0.593 and a Hits@n of 0.842. On both datasets, CP and TransE perform the worst, which illustrates the feasibility of learning knowledge embeddings in the first case and the power of using two mutually constraining parts to represent entities and relations in the second.

We also use the t-test to evaluate our model; for example, when comparing SimE-ER with ComplEx on filtered MRR, the t-value is larger than the critical value. The t-test results prove that the performance of SimE-ER is better than the other baselines on FB15K and FB40K.

To analyze the relation features, Table 9 shows the filtered MRR of each relation on WN18, where $N$ denotes the number of triplets for each relation in the test set. From Table 9, we conclude the following:
(i) For almost all relations on WN18, SimE-E and SimE-ER achieve competitive results compared with the other baselines, which demonstrates that our methods can extract different types of latent relation features.
(ii) Compared with SimE-E, the per-relation MRRs of SimE-ER are much better on most relations, such as hypernym, hyponym, and derivationally_related_form.
(iii) On almost all per-relation MRR results, SimE-ER is better than DistMult, its special case. That is to say, compared with a single embedding space, using two different spaces to describe entity and relation features achieves better performance.

Case Study. Table 10 shows detailed prediction results on the test set of FB15K, illustrating the performance of our models. Given the head and tail entities, the top-5 predicted relations and their scores under SimE-ER are depicted in Table 10. From the table, we observe the following:
(i) In triplet 1, the correct relation is ranked top-2, and in triplet 2 it is ranked top-1. These relation prediction results demonstrate the performance of SimE-ER. However, in triplet 1, the correct result (top-2) has a score similar to the other predictions (top-1, top-3); that is to say, it is difficult for SimE-ER to distinguish similar relationships.
(ii) Across the relation prediction results, the top-5 predicted relations are similar to each other; that is to say, similar relations have similar representation embeddings, which is in line with common sense.

4.4. Complexity Analysis

To compare the time and memory-space complexity of different models, we show the analytical results on FB15K in Table 11, where $k$ represents the dimension of the entity and relation spaces, "Mini-batch" represents the mini-batch size of each iteration, "Params" denotes the number of parameters of each model on FB15K, and "Time" denotes the running time of each iteration. Note that all models are run on standard hardware: an Intel(R) Core(TM) i7 3.5GHz CPU and a GeForce GTX TITAN GPU. We report the average running time over one hundred iterations as the running time of each iteration. From Table 11, we observe the following:
(i) Except for DistMult, SimE-E and SimE-ER have lower time and memory complexity than the baselines, because SimE-E and SimE-ER use only element-wise products between entity and relation vectors to generate the representation embeddings.
(ii) On FB15K, the time costs of SimE-E and SimE-ER per iteration are 5.37 s and 6.63 s, respectively, which are lower than 7.53 s, the time cost of TransE, even though TransE has fewer parameters. The reason is that the mini-batch size of TransE is 2415, which is much larger than the mini-batch sizes of SimE-E and SimE-ER. Besides, SimE-E and SimE-ER need 700 iterations, taking 3760 s and 4642 s in total, respectively.
(iii) Because SimE-E and SimE-ER have low complexity and high accuracy, they can easily be applied to large-scale knowledge graphs while using less computing resources and running time.

5. Conclusion

In this paper, we propose a novel similarity-based embedding model, SimE-ER, which extracts features from a knowledge graph. SimE-ER considers that the similarity of the same entity (relation) across the independent and associated spaces is high. Compared with other representation models, SimE-ER is more effective in extracting entity (relation) features and represents them more flexibly and comprehensively. Besides, SimE-ER has lower time and memory complexity, which makes it applicable to large-scale knowledge graphs. In experiments, our approach is evaluated on the entity prediction and relation prediction tasks. The results prove that SimE-ER achieves state-of-the-art performance. We will explore the following future work:
(i) In addition to the facts in a knowledge graph, there are also large amounts of logical and hierarchical correlations between different facts. How to translate such hierarchical and logical information into a low-dimensional vector space is an attractive and valuable problem.
(ii) In the real world, extracting relations and entities from large-scale text is an important yet open problem. Combining the latent features of knowledge graphs and text sets is a feasible way to construct connections between structured and unstructured data, and it is expected to enhance the accuracy and efficiency of entity (relation) extraction.

Data Availability

All the datasets used in this paper are fully available without restriction upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was partially supported by NSFC under Grants nos. 71690233 and 71331008.