Abstract

In knowledge graph learning and reasoning, existing tensor decomposition techniques consider only the direct relationships between entities and ignore the graph structure of the knowledge graph. To solve this problem, a knowledge graph reasoning algorithm based on multihop relational paths learning (MHRP-learning) and tensor decomposition is proposed in this paper. First, MHRP-learning is adopted to obtain the relationship paths between entity pairs in the knowledge graph. Then, tensor decomposition is performed over these paths to obtain a novel learning framework. Finally, experiments show that the proposed method achieves better results than the compared baselines and is applicable to knowledge graph reasoning.

1. Introduction

A knowledge graph is a knowledge base, also called a semantic network, with a directed graph structure. The nodes of the graph represent entities or concepts, and the edges represent various semantic relationships between them, such as a similarity relationship between two entities. The semantic network was proposed in the late 1950s and early 1960s and can be regarded as a graph-based data structure for storing knowledge [1]. Knowledge is mainly represented as triples (head, relation, tail), where head is the head entity, tail is the tail entity, and relation is the relationship between them. With a semantic network, sentences in natural language can be easily expressed and stored graphically for machine translation [2], question answering systems [3], and natural language understanding [4].

Knowledge graph reasoning starts from existing entity relationship triples and, through inference, establishes new relationships between entities, thereby enriching and expanding the knowledge graph [5]. It can be roughly divided into symbol-based reasoning and statistics-based reasoning. In the first category, symbol-based reasoning applies rules to an existing knowledge graph to derive new relationships between entities and can also detect logical conflicts in the knowledge graph. For example, Oren et al. [6] proposed using a peer-to-peer distributed framework to achieve data reasoning. Urbani et al. proposed an OWL RL query algorithm based on MapReduce [7]. In [8], MapReduce is used to implement an inference algorithm for OWL EL ontologies, and it is shown that MapReduce can handle large-scale OWL EL ontologies. In the second category, statistics-based methods generally refer to relational machine learning methods, which learn new relations between entities from the knowledge graph through statistical regularities. Nickel et al. [9] presented a relational latent feature model, the bilinear model, which considers the interaction of latent features to learn latent entity relationships. Lao et al. proposed a path ranking method that predicts the existence of a possible edge from the features of the edges of the triples in the knowledge graph [10]. Drumond et al. used a pairwise interaction tensor decomposition model to learn the latent relationships in the knowledge graph [11].

The tensor decomposition algorithm regards the entire knowledge graph as a large tensor and decomposes it into several small tensor slices [12]. In this way, the high-dimensional knowledge graph undergoes dimensionality reduction, which greatly reduces the data scale during calculation. In addition, Barmpoutis et al. proposed a method for automatic image tagging in which tensors serve as an appropriate mathematical representation of multilink relations [13]. However, existing tensor decomposition methods consider only the direct associations between entities and do not take into account the multipath characteristics of the knowledge graph [14]. As a result, reasoning performance is limited, and the relationships between entities cannot be explored in depth. Path reasoning algorithms, by contrast, exploit the structure of the knowledge graph: they use the path relationships between entities for inference and can effectively mine new relationships between entities. However, existing path inference algorithms cannot handle long-path reasoning and do not consider path reliability or semantic composition.

In recent years, the path ranking algorithm has become a promising method for learning inference paths in large-scale knowledge graphs [15–17]. However, it operates in a completely discrete space, which makes it difficult to evaluate and compare similar entities and relationships in the knowledge graph. To learn relationship paths better, a translation-based knowledge graph embedding method can be used to encode the continuous state of a reinforcement learning agent, which yields the framework of multihop relational paths learning.

The contributions of the paper can be stated as follows. (1) A knowledge graph reasoning method based on path tensor decomposition is proposed, combining multihop relational paths learning with tensor decomposition. (2) The paths between entity pairs in the knowledge graph are computed by MHRP-learning, and tensor decomposition is used to make inferences over these paths. (3) The multipath relationships between entities and the new facts between entities are explored to further enrich and improve the knowledge graph.

The rest of the paper is organized as follows. Section 2 introduces knowledge graph learning and reasoning. Section 3 presents the proposed method. Section 4 gives the experimental results. Finally, Section 5 concludes the paper.

2. Knowledge Graph Learning and Reasoning

Compared with the earlier semantic network, the knowledge graph has its own characteristics. First, the knowledge graph emphasizes the associations between entities and the attribute values of entities. Although a knowledge graph can also contain hierarchical relationships between concepts, these are far fewer than the relationships between entities, whereas the early semantic network was mainly used to represent natural language sentences. Second, an important source of knowledge graphs is semistructured data extracted from encyclopedias, unlike early semantic networks, which were mainly built manually. High-quality knowledge obtained from such data serves as a seed, and knowledge mining techniques then allow large-scale, high-quality knowledge graphs to be built quickly. Finally, the construction of a knowledge graph emphasizes the integration of knowledge from different sources and knowledge cleaning techniques, which were not the focus of the early semantic network.

Complex relationships and uncertainties arise in large-scale knowledge graph completion, information retrieval, natural language processing, machine learning, and other applied research fields. Knowledge graph completion is an effective way to address these problems, and it mainly includes the following tasks.

2.1. Link Prediction and Entity Resolution

Link prediction is the prediction of possible relationships in the knowledge graph. Some relationships between entities may be missing or wrong [18]. Through link prediction, the missing relationships can be completed and the wrong ones corrected, which improves the knowledge graph. It is a very important task in knowledge graph learning and reasoning.

Entity resolution, also known as entity linkage, addresses the fact that different entity names may represent the same thing and that the same entity name may represent different things. By predicting and identifying whether different entities are the same thing, redundant entities can be removed. The same entity can have different names. For example, “China Mobile Communications Corporation” may also appear as “China Mobile” or “Mobile Communications,” and entity resolution can group these names under one entity. Conversely, the same name can denote different entities, such as Apple, which may be the company started by Mr. Jobs or the fruit. By analyzing the semantics of the surrounding context, the intended meaning of “apple” can be determined.

2.2. Entity Clustering and Classification

Traditional clustering produces clusters, each a set of data objects, with the goal of grouping entities that have the same or similar attributes into one class. In link-based clustering, objects are grouped into the same class not only when the entities are similar but also when their relationships are the same or similar. Link-based clustering makes it possible to query corresponding entities and relationships quickly by predicting the classification of similar triples. Triple classification is a binary classification task that judges whether a given triple is plausible. Triples can be used in question answering systems. For example, in response to the question “Does a dog have a tail?”, the triple (dog, has part, tail) can be constructed and its correctness judged, thereby achieving knowledge graph learning and reasoning [19].

2.3. Improving the Knowledge Graph

Large-scale knowledge graphs contain a large number of entities, relationships, and facts. However, they are incomplete to some extent. For example, about 71% of the people in Freebase have no place-of-birth attribute and about 75% have no nationality attribute [20]. Therefore, improving the knowledge graph is one of the most important tasks in knowledge graph learning and reasoning. Through learning and reasoning algorithms, the missing entities and relationships can be extracted from the Semantic Web or other relevant databases, thus enriching and completing the knowledge graph.

3. Methodology

In this section, multihop relational paths learning (MHRP-learning) is introduced to obtain the relationship path first. Then, we present the tensor decomposition method. Finally, the proposed method of knowledge graph representation based on MHRP-learning and tensor decomposition is presented.

3.1. MHRP-Learning Method

The task of relational reasoning is to predict and find reliable paths between entity pairs. The MHRP-learning method consists of two parts. The first part is the external environment, which specifies the dynamic interaction between the agent and the knowledge graph and is modeled as a Markov decision process (MDP). A tuple $\langle \mathcal{S}, \mathcal{A}, \mathcal{P}, \mathcal{R} \rangle$ is defined to express the MDP, where $\mathcal{S}$ represents the continuous state space, $\mathcal{A}$ is the collection of all available actions, $\mathcal{P}$ is the matrix of transition probabilities, and $\mathcal{R}(s, a)$ represents the reward function of each state-action pair $(s, a)$.

The policy network is the second part of the MHRP-learning method; it maps a state vector to a stochastic policy. Stochastic gradient descent is used to update the neural network parameters. A policy-based method is better suited to the knowledge graph setting in this paper. One reason lies in the nature of path discovery in a knowledge graph: because of the complexity of the relation graph, the action space may be very large. The three components of the MHRP-learning environment, namely actions, states, and rewards, are introduced as follows.

Given the entity pairs $(e_{source}, e_{target})$ associated with a relationship $r$, the policy is expected to find the most informative paths linking these entity pairs. Starting from the source entity $e_{source}$, the agent uses the policy network to select the most promising relationship at each step until it reaches the target entity $e_{target}$. Each state captures the position of the agent in the knowledge graph; after performing an action, the agent moves from one entity to another. The state vector at step $t$ can be expressed as

$$s_t = (e_t,\; e_{target} - e_t), \tag{1}$$

where $e_t$ indicates the embedding of the current entity node and $e_{target}$ indicates the embedding of the target entity. The relationship being inferred is not included in the state because it is not useful there.
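The construction of the state can be illustrated with a short sketch. It assumes that the state concatenates the current-entity embedding with its offset to the target embedding, which is a common layout for translation-based embeddings; the function name `state_vector` and the toy embedding values are illustrative assumptions, not learned parameters.

```python
import numpy as np

def state_vector(e_current: np.ndarray, e_target: np.ndarray) -> np.ndarray:
    """Concatenate the current-entity embedding with its offset to the target.

    Assumed layout: s_t = (e_t, e_target - e_t).
    """
    return np.concatenate([e_current, e_target - e_current])

# Toy 3-dimensional embeddings (illustrative values only).
e_t = np.array([0.1, 0.2, 0.3])
e_target = np.array([0.4, 0.0, 0.3])

s_t = state_vector(e_t, e_target)
print(s_t.shape)  # the state has twice the embedding dimension
```

Under this layout the second half of the state shrinks toward zero as the agent approaches the target entity, which gives the policy network a direct signal of the remaining distance in embedding space.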

In order to find inference paths controlled by the reward function, the supervised policy network is retrained using the reward function. Because the agent follows a stochastic policy, it does not get stuck repeating the same steps. To further improve training efficiency, the episode length is limited to a maximum length: if the agent cannot reach the target entity within the maximum number of steps, the episode ends. The policy network is updated by the gradient

$$\nabla_\theta J(\theta) \approx \nabla_\theta \sum_t \log \pi(a = r_t \mid s_t; \theta)\, R_{total}, \tag{2}$$

where $\pi(a \mid s_t; \theta)$ is the output of the policy network, $r_t$ is the relationship selected at step $t$, $R_{total}$ represents the combination of the reward functions, and the parameter $\theta$ is updated to maximize the expected cumulative reward.
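A reward-weighted policy update of this kind can be sketched as follows. This is a minimal illustration that assumes a linear softmax policy in place of the neural policy network and a single scalar reward per episode; the names `softmax`, `reinforce_update`, the learning rate, and the toy shapes are all assumptions of the sketch.

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over action logits."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def reinforce_update(W, states, actions, total_reward, lr=0.1):
    """One gradient-ascent step on sum_t log pi(a_t | s_t; W) * total_reward.

    W has shape (n_actions, state_dim); the policy is softmax(W @ s).
    For this policy, d log pi(a|s) / dW = outer(one_hot(a) - pi, s).
    """
    grad = np.zeros_like(W)
    for s, a in zip(states, actions):
        probs = softmax(W @ s)
        one_hot = np.zeros_like(probs)
        one_hot[a] = 1.0
        grad += np.outer(one_hot - probs, s)
    return W + lr * total_reward * grad

# Toy episode: 3 candidate relations, 4-dimensional state, one step.
W0 = np.zeros((3, 4))
s = np.ones(4)
W_pos = reinforce_update(W0, [s], [1], total_reward=1.0)
W_neg = reinforce_update(W0, [s], [1], total_reward=-1.0)
print(softmax(W_pos @ s)[1], softmax(W_neg @ s)[1])
```

A positive episode reward raises the probability of the relations chosen along the path, while a negative reward lowers it, which is how the reward function steers the paths that the agent prefers.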

3.2. Tensor Decomposition

A tensor is a general term for a high-dimensional array. Tensor decomposition is the process of decomposing a high-dimensional array into multiple low-dimensional matrices. In this paper, tensor decomposition is used to reduce the dimension of the entity matrix and the relational matrix model. Tensor decomposition methods are currently applied to the learning and reasoning process of knowledge graphs, and the most representative and influential one is the third-order tensor decomposition RESCAL [18]. The focus of this paper is the MHRP-learning algorithm; to highlight its effectiveness and facilitate comparison and verification, the RESCAL algorithm is used. The RESCAL model is briefly introduced as follows. A knowledge graph with n entities and m relationships can be represented by a third-order tensor $\mathcal{X} \in \mathbb{R}^{n \times n \times m}$, in which the k-th relationship between pairs of entities is represented by the k-th slice $X_k$ of the tensor. The k-th slice can be approximated as $X_k \approx A R_k A^T$, where A is an $n \times r$ matrix, n represents the number of entities in the knowledge graph, and r represents the number of features (the dimension) of each entity; each row of A represents an entity.

$R_k$ is an $r \times r$ matrix representing the k-th relationship between entities, and the decomposition diagram is shown in Figure 1. Therefore, the decomposition of the entire tensor can be obtained by solving the optimization problem

$$\min_{A, R_k} f(A, R_k) + g(A, R_k), \tag{3}$$

where $f(A, R_k)$ represents the error of the tensor decomposition and $g(A, R_k)$ is a regularization term added to prevent the overfitting that occurs during the optimization process. Their forms are shown as follows:

$$f(A, R_k) = \frac{1}{2} \sum_{k} \left\| X_k - A R_k A^T \right\|_F^2, \tag{4}$$

$$g(A, R_k) = \frac{1}{2} \lambda \left( \| A \|_F^2 + \sum_{k} \| R_k \|_F^2 \right), \tag{5}$$

where $\lambda \ge 0$ is the regularization weight.
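The bilinear form used by RESCAL, in which slice k is approximated by the product of the entity matrix, a relation matrix, and the transposed entity matrix, can be sketched in a few lines. The sizes, the random factors, and the helper name `score` are illustrative assumptions; in practice A and the relation matrices are learned from data.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, r = 5, 2, 3                  # entities, relations, latent dimension (toy sizes)

A = rng.normal(size=(n, r))        # one row of latent features per entity
R = rng.normal(size=(m, r, r))     # one r x r interaction matrix per relation

# Reconstruct every slice at once: X_hat[k] = A @ R[k] @ A.T
X_hat = np.einsum("ir,krs,js->kij", A, R, A)

def score(head: int, rel: int, tail: int) -> float:
    """Plausibility of the triple (head, rel, tail) under the factorization."""
    return float(A[head] @ R[rel] @ A[tail])

print(X_hat.shape, score(0, 1, 4))
```

Each entry of a reconstructed slice is the score of one triple, so link prediction amounts to reading off, or ranking, the entries of the slice for the relation of interest.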
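The two terms of the objective can be evaluated directly. The sketch below assumes the usual RESCAL forms, namely a fit term equal to half the sum of squared Frobenius errors of the slices and a regularizer equal to half the weighted sum of squared Frobenius norms of the factors; the function name `rescal_objective`, the weight `lam`, and the toy data are assumptions.

```python
import numpy as np

def rescal_objective(X, A, R, lam):
    """Return (fit, reg) for slices X[k] approximated by A @ R[k] @ A.T."""
    fit = 0.5 * sum(
        np.linalg.norm(Xk - A @ Rk @ A.T, "fro") ** 2 for Xk, Rk in zip(X, R)
    )
    reg = 0.5 * lam * (
        np.linalg.norm(A, "fro") ** 2
        + sum(np.linalg.norm(Rk, "fro") ** 2 for Rk in R)
    )
    return fit, reg

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 2))
R = rng.normal(size=(3, 2, 2))
X = np.stack([A @ Rk @ A.T for Rk in R])   # toy tensor that factorizes exactly

fit, reg = rescal_objective(X, A, R, lam=0.1)
print(fit, reg)
```

For a tensor that factorizes exactly, the fit term vanishes and only the regularizer remains, which makes the trade-off controlled by the regularization weight easy to see.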

3.3. The Proposed Method of Knowledge Graph Representation

The inference path learned by MHRP-learning can be regarded as a logical formula, that is, a chain of relationships that predicts the target relationship for an entity pair. Each formula is validated using a bidirectional search. In a typical knowledge graph, an entity node can link to a large number of neighbors through the same relationship. If the formula contains such links, the number of intermediate entities grows exponentially when the formula is followed in the forward direction. The number of intermediate nodes can be greatly reduced if part of the formula is verified from the opposite direction, starting at the tail entity.
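A bidirectional check of a relation chain can be sketched as follows. The sketch assumes the knowledge graph is given as a list of (head, relation, tail) triples and splits the formula in the middle; the helper names `build_index`, `expand`, and `path_holds`, as well as the toy triples, are illustrative assumptions.

```python
from collections import defaultdict

def build_index(triples):
    """Index triples for forward (head -> tails) and backward (tail -> heads) lookup."""
    fwd, bwd = defaultdict(set), defaultdict(set)
    for h, r, t in triples:
        fwd[(h, r)].add(t)
        bwd[(t, r)].add(h)
    return fwd, bwd

def expand(frontier, relations, index):
    """Follow the given relations in order, starting from a set of entities."""
    for rel in relations:
        frontier = {nxt for e in frontier for nxt in index.get((e, rel), ())}
    return frontier

def path_holds(head, tail, relations, fwd, bwd):
    """Check whether the relation chain links head to tail, searching from both ends."""
    mid = len(relations) // 2
    left = expand({head}, relations[:mid], fwd)                   # forward half
    right = expand({tail}, list(reversed(relations[mid:])), bwd)  # backward half
    return bool(left & right)

triples = [
    ("alice", "bornIn", "paris"),
    ("paris", "locatedIn", "france"),
    ("bob", "bornIn", "rome"),
]
fwd, bwd = build_index(triples)
print(path_holds("alice", "france", ["bornIn", "locatedIn"], fwd, bwd))
```

Because each half of the chain is expanded over only about half of the hops, the two frontiers stay much smaller than a single frontier expanded over the full chain when some relations have many neighbors.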

In order to make equation (3) converge as quickly as possible with the terms defined in equations (4) and (5), the alternating least squares method is used to update A and $R_k$. The updates are repeated until the objective converges to a certain value or the maximum number of iterations is reached. The updates of A and $R_k$ can be expressed as follows:

$$A \leftarrow \left[ \sum_{k=1}^{m} X_k A R_k^T + X_k^T A R_k \right] \left[ \sum_{k=1}^{m} B_k + C_k + \lambda I \right]^{-1}, \tag{6}$$

$$\mathrm{vec}(R_k) \leftarrow \left( Z^T Z + \lambda I \right)^{-1} Z^T \mathrm{vec}(X_k), \tag{7}$$

where $B_k = R_k A^T A R_k^T$, $C_k = R_k^T A^T A R_k$, and $Z = A \otimes A$. Decomposing the multirelational data reduces the data dimension and complexity while retaining the characteristics of the original data. The decomposition can be used for collective classification, link prediction, entity resolution, and learning and reasoning over large knowledge graphs, and it achieves good results on these tasks.

Since the tensor decomposition places no constraint between entities and relationships, any entity may be associated with any relationship type. In reality, however, most relationships apply only to a few types of entities.

The proposed algorithm focuses on computing the relationship paths between entities in the knowledge graph. Starting from any head entity, a random walk strategy is used to reach the tail entity, which forms candidate relations between the head and tail entities. The loss function value of each path is then calculated using tensor decomposition, so as to predict new entity relations in the knowledge graph and thereby enrich and expand it.
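The alternating updates can be sketched as follows, assuming the standard RESCAL update rules: the entity matrix is updated in closed form with the relation matrices fixed, and each relation matrix is obtained from a ridge regression on the Kronecker product of the entity matrix with itself. The function names, the fixed iteration count, and the toy data are assumptions, and the explicit Kronecker product is only practical for small examples.

```python
import numpy as np

def update_R(X, A, lam):
    """Ridge solution vec(R_k) = (Z^T Z + lam I)^-1 Z^T vec(X_k), with Z = kron(A, A)."""
    r = A.shape[1]
    Z = np.kron(A, A)                       # shape (n*n, r*r), row-major vec convention
    G = Z.T @ Z + lam * np.eye(r * r)
    return np.stack(
        [np.linalg.solve(G, Z.T @ Xk.reshape(-1)).reshape(r, r) for Xk in X]
    )

def update_A(X, A, R, lam):
    """Closed-form update of A with the relation matrices R held fixed."""
    r = A.shape[1]
    AtA = A.T @ A
    num = sum(Xk @ A @ Rk.T + Xk.T @ A @ Rk for Xk, Rk in zip(X, R))
    den = sum(Rk @ AtA @ Rk.T + Rk.T @ AtA @ Rk for Rk in R) + lam * np.eye(r)
    return np.linalg.solve(den, num.T).T    # equals num @ inv(den); den is symmetric

def rescal_als(X, r, lam=0.1, n_iter=30, seed=0):
    """Alternate the two updates for a fixed number of iterations."""
    rng = np.random.default_rng(seed)
    A = rng.normal(size=(X.shape[1], r))
    R = update_R(X, A, lam)
    for _ in range(n_iter):
        A = update_A(X, A, R, lam)
        R = update_R(X, A, lam)
    return A, R

# Toy tensor with 2 relations over 6 entities, generated from rank-2 factors.
rng = np.random.default_rng(1)
A_true = rng.normal(size=(6, 2))
R_true = rng.normal(size=(2, 2, 2))
X = np.stack([A_true @ Rk @ A_true.T for Rk in R_true])

A_hat, R_hat = rescal_als(X, r=2, lam=0.1)
rel_err = np.linalg.norm(X - np.stack([A_hat @ Rk @ A_hat.T for Rk in R_hat])) / np.linalg.norm(X)
print(rel_err)
```

A practical implementation would add a convergence test on the change of the objective between iterations and would avoid forming the Kronecker product explicitly for large entity sets.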
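The path collection and scoring steps can be sketched as follows. The random walk collects candidate relation paths from a head entity to a tail entity, with a cap of three hops to match the path-length limit used in the experiments. How a path is scored is not spelled out here, so composing the relation matrices along the path into a single bilinear score is an assumption of this sketch, as are the names `random_walk_paths` and `path_score` and the random, untrained factors.

```python
import random
import numpy as np

def random_walk_paths(adj, head, tail, max_len=3, n_walks=200, seed=0):
    """Collect relation sequences of random walks from head that reach tail."""
    rng = random.Random(seed)
    found = set()
    for _ in range(n_walks):
        node, path = head, []
        while len(path) < max_len:
            edges = adj.get(node)
            if not edges:              # dead end: abandon this walk
                break
            rel, node = rng.choice(edges)
            path.append(rel)
            if node == tail:
                found.add(tuple(path))
                break
    return found

def path_score(A, R, head_idx, rel_indices, tail_idx):
    """ASSUMED scoring rule: a_head^T (R_1 R_2 ... R_L) a_tail."""
    M = np.eye(A.shape[1])
    for k in rel_indices:
        M = M @ R[k]
    return float(A[head_idx] @ M @ A[tail_idx])

# Toy graph: a -r1-> b -r2-> c and a -r3-> c.
adj = {"a": [("r1", "b"), ("r3", "c")], "b": [("r2", "c")]}
ent = {"a": 0, "b": 1, "c": 2}
rel = {"r1": 0, "r2": 1, "r3": 2}

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 2))            # untrained factors, for illustration only
R = rng.normal(size=(3, 2, 2))

paths = random_walk_paths(adj, "a", "c")
scores = {p: path_score(A, R, ent["a"], [rel[x] for x in p], ent["c"]) for p in paths}
print(sorted(paths))
```

With trained factors, the candidate paths could then be ranked by their scores, and high-scoring paths between an entity pair would suggest a new relation between the two entities.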

4. Experiments

4.1. The Experimental Dataset

The experiments are based on two general knowledge graph datasets: WN11, a subset of WordNet [19], and FB15k-237, sampled from FB15k [20]. The statistics of the two datasets are shown in Table 1.

To avoid overlap between the training and test sets, path triples that exist in the training set are removed when constructing the test samples. At the same time, to keep the training set from becoming too large because of too many paths, the path length is limited to less than 4 when constructing the data.

4.2. Experimental Results and Analysis

Mean rank and HITS@10 are commonly used metrics for knowledge graph reasoning algorithms. For each test triple, the candidate entities are ranked by the value of the triple score function. Mean rank is the average rank of the correct entity, which reflects how many entities are placed in front of it. HITS@10 is the proportion of test triples whose correct entity is ranked in the top 10. A smaller mean rank and a larger HITS@10 therefore indicate higher prediction accuracy.
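Both metrics can be computed from the candidate scores of each test triple. The sketch below assumes that a higher score means a more plausible candidate and that rank 1 is the best rank; the function names and the toy scores are illustrative assumptions.

```python
import numpy as np

def rank_of_correct(scores: np.ndarray, correct_idx: int) -> int:
    """Rank of the correct candidate (1 = best), counting strictly better candidates."""
    return 1 + int(np.sum(scores > scores[correct_idx]))

def mean_rank_and_hits(all_scores, correct, k=10):
    """Return (mean rank, Hits@k) over a list of test queries."""
    ranks = np.array([rank_of_correct(s, c) for s, c in zip(all_scores, correct)])
    return float(ranks.mean()), float(np.mean(ranks <= k))

# Two toy queries with three candidate entities each.
all_scores = [np.array([0.9, 0.1, 0.5]), np.array([0.2, 0.8, 0.3])]
correct = [0, 2]

mr, hits1 = mean_rank_and_hits(all_scores, correct, k=1)
_, hits2 = mean_rank_and_hits(all_scores, correct, k=2)
print(mr, hits1, hits2)  # 1.5 0.5 1.0
```

The correct entity is ranked first in the first query and second in the second query, so the mean rank is 1.5, half of the queries are hits at cutoff 1, and all are hits at cutoff 2.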

To reflect the ability of the proposed method to answer path questions, it was compared with RESCAL and TRESCAL [21], as shown in Table 2. On the WN11 dataset, the proposed method outperformed RESCAL and TRESCAL, with HITS@10 increasing by 54% and 35.5%, respectively. On the FB15k-237 dataset, the proposed method also outperformed RESCAL and TRESCAL, with HITS@10 increasing by 25.2% and 13.4%, respectively. Therefore, the proposed method can effectively predict the intermediate entity set in a path and achieves better prediction performance on the path datasets.

To assess the prediction performance of the proposed method in improving the knowledge graph, it is compared with several existing knowledge graph reasoning algorithms, and the results are shown in Table 3.

On the WN11 dataset, the proposed method predicts better than TransE, TransR, TransH, and TransD. On the FB15k-237 dataset, the proposed method also shows a slight improvement in HITS@10.

Table 3 shows that the proposed method performs better than several existing single-path inference algorithms in entity link prediction on knowledge graphs. The reason is as follows. The base datasets WN11 and FB15k-237 contain interrelated relations whose combinations constitute multipath information. The proposed method can predict not only single paths but also paths up to the maximum path length, so it obtains better prediction performance.

In summary, the proposed method has better predictive performance in the tasks of answering path questions and predicting entity links, and it is therefore suitable for enriching and expanding knowledge graphs.

5. Conclusion

A knowledge graph reasoning method based on path tensor decomposition is proposed, combining multihop relational paths learning (MHRP-learning) with tensor decomposition. The paths between entity pairs in the knowledge graph are computed by MHRP-learning. Tensor decomposition is then used to make inferences over these paths, and the multipath relationships between entities and the new facts between entities are explored to further enrich and improve the knowledge graph. The experiments show that this method is superior to some existing methods in answering knowledge graph path questions and that it predicts the entity links of the knowledge graph well.

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported in part by the Hunan Natural Science Foundation (nos. 2019JJ40097, 2019JJ40096, and 2020JJ4237), in part by the Youth Research Foundation of Hunan Education Department (no. 20B247), in part by the Outstanding Youth Research Foundation of Hunan Province (no. 2020JJ2015), in part by the Key Research Foundation of Hunan Science and Technology Department (no. 2017NK2390), in part by the Research Foundation of Science and Technology Bureau of Yongzhou City, China (nos. 2019YZKJ08 and 2019YZKJ10), and in part by the construct program of applied characteristic discipline in Hunan University of Science and Engineering.