Abstract

Driven by the rapid development of mobile computing and the Internet of Things, the number of devices connected to the Internet has increased dramatically in recent years. This growth has generated massive amounts of data and highlighted the importance and urgency of using the accumulated big data to improve frequently used services. Deploying a question answering service in the mobile edge computing environment is considered a good way to make efficient use of the data and improve user experience. Powered by breakthroughs in deep learning, question answering systems based on knowledge graphs (KBQA) have flourished in recent years. Knowledge representation, a key technology of KBQA, expresses the knowledge graph as vectors that carry more semantic information and thereby improves the accuracy of the question answering system. This paper proposes a knowledge representation method that integrates more features than traditional methods. In our method, knowledge is represented as a combination of a structured vector reflecting the target triple and the neighborhood information around the entity. By learning semantically richer vectors, our method outperforms TransE, ConvE, and KBAT on link prediction.

1. Introduction

The explosive growth of smartphones and other mobile terminals and the emergence of many new applications have had a great impact on mobile and wireless networks [1–3]. The traditional centralized network cannot meet the needs of mobile users due to heavy load and long delay [4–6]. Therefore, a new architecture has been proposed to open network capabilities from the core network to the edge network, i.e., mobile edge computing [7–9]. Mobile edge computing deploys the services and functions originally located in the cloud data center to the edge of the mobile network and provides computing, storage, network, and communication resources at the edge of the mobile network [10, 11].

A question answering system is an advanced form of information retrieval system that answers users’ questions in accurate and concise natural language. KBQA differs from other types of question answering systems in that it uses knowledge graphs as a highly structured knowledge source. A knowledge graph (KG) is a large-scale multi-relational graph consisting of entities and their relationships. Existing large knowledge graphs include Freebase, DBpedia, YAGO, and XLORE. Although these knowledge graphs are large in scale, they are far from complete. Link prediction has been proposed to address this problem. The common approach is to learn low-dimensional representations of all entities and relationships and use them to predict new facts, which is also known as knowledge representation learning [12].

In recent years, many knowledge representation models have been proposed. The most classic model is TransE [13], which was inspired by Word2vec. TransE is suited only to one-to-one relationships and cannot express polysemous words. Based on TransE, researchers have proposed many extended models. TransH [14] introduces a relation-specific hyperplane, so that an entity has different representations under different relationships. TransR [15] gives each relationship its own semantic space. Neural networks have also been widely used recently to build knowledge representation models. R-GCN [16] uses graph convolutional networks (GCN) [17] to represent relationships. ConvKB [18] arranges the triple vectors into a 3-column matrix and feeds it into a convolutional layer that applies one-dimensional convolution. ConvE [19] uses two-dimensional convolution and multiple nonlinear features to model triples. Whether it is a traditional Trans model or a neural network-based model, each triple is processed independently, and the rich semantic information around the entity cannot be used.

In this paper, we propose KRDGC, a knowledge representation method that integrates multiple features. In order to learn the rich semantic information of nodes in the knowledge graph, KRDGC not only uses the structural information of the triple but also considers the neighborhood information around the entity. KRDGC uses TransD to represent structural information, so that different head and tail entities can be mapped to different relationship spaces according to their own attributes and relationship characteristics. The improved graph attention network (GAT) assigns different weights to adjacent entities according to the relationships between them. We use the improved GAT to obtain the neighborhood information within two hops of the entity. Finally, KRDGC uses the capsule network as a decoder. Our contributions in this paper are as follows: (1) We propose an end-to-end knowledge representation model, KRDGC, that combines the advantages of TransD and GAT. (2) We use the capsule network as a decoder to extract features of the same dimension in multiple feature maps. (3) We evaluate KRDGC for link prediction on the benchmark datasets FB15K-237 and WN18RR. KRDGC obtains the best mean rank and the highest Hits@3.

The rest of the paper is organized as follows. Section 2 reviews the background. Section 3 introduces the proposed KRDGC model. Section 4 describes the datasets and reports the experimental results, followed by our conclusion and future research directions in Section 5.

2. Background

Mobile edge computing provides content storage, computing, and distribution services near the mobile user side through in-depth cooperation with content providers and application developers [20, 21]. This enables applications, services, and content to be deployed in a highly distributed environment to better meet low latency and high bandwidth requirements [22–24]. For our system, the overall deployment framework is shown in Figure 1.

Recent knowledge representation learning models can be divided into three categories: (1) models that use the structural information of the triple itself, (2) models that use external information such as text, images, or rules, and (3) models that fuse entity neighborhood information.

Models based on structured information can be divided into two categories, traditional translation models and neural network-based models. Traditional models based on translation include TransE, TransH, TransR, and TransD [25]. These models do not consider any information other than triples. TransH and TransR focus on the multiple representations of entities in different relations, improving the performance on knowledge completion and triple classification. However, both models only project entities according to the relations in triples, ignoring the diversity of entities. To address this problem, TransD proposes a novel projection method with a dynamic mapping matrix depending on both entities and relations, which takes the diversity of entities as well as relations into consideration. Inspired by the fact that concentric circles in the polar coordinate can naturally reflect the hierarchical structure, HAKE [26] was proposed. HAKE can effectively model the semantic levels in the knowledge graph and has a good performance in link prediction tasks.

Models based on neural networks include ConvKB and ConvE, as well as CapsE [27]. CapsE uses the capsule network to model triples. It has a “deep” architecture for modeling the entries in a triple at the same dimension. The number of interactions that ConvE can capture is limited, so InteractE [28] was proposed. InteractE increases the number of interactions between entities and relationships. It proves that increasing the number of interactions can improve the performance of link prediction.

DKRL [29] utilizes text information to model the corresponding triples and entity description information. TKRL [30] makes good use of the hierarchical information of entities. Compared with TransE and TransR, its performance has been improved by 11.3% and 6.2%, respectively. KALE [31] combines knowledge graphs and logic rules and then expresses and models them in a unified framework. This method can make better predictions outside the scope of pure logical reasoning, but the logic rules are more limited. IKRL [32] is the first attempt to combine images with knowledge graphs for KRL. Its promising performances indicate the significance of visual information for KRL. Although the use of information outside the knowledge graph can add semantics to entities, not all entities have access to additional information, and it is not universally applicable.

PTransE [33] is a path-based model. It combines multiple relationships of entities semantically to obtain a vector representation of the path. GAKE [34] defines three contexts with entities and relationships as subjects and uses these kinds of contextual information for modeling. TCE [35] improves on GAKE and proposes two types of context information, path context and neighborhood context. KBAT [36] is a novel attention-based feature embedding model that captures both entity and relation features in any given entity’s neighborhood.

In order to make the model more effective, we take KBAT as the base model. On the basis of making full use of entity neighborhood information, we combine it with the structured information of triples and use the capsule network to extract deeper information.

3. The Proposed KRDGC Model

In this section, we describe the proposed method. We first define notations. A knowledge graph $G$ is a collection of valid factual triples in the form of (head entity, relation, tail entity) denoted as $(h, r, t)$ such that $h, t \in E$ and $r \in R$, where $E$ is a set of entities and $R$ is a set of relations. An embedding model aims to define a score function $f$ giving a score for each triple, such that valid triples receive higher scores than invalid triples. The overall structure is shown in Figure 2.

3.1. Structured Features

We use TransD to model the structural features of triples. As illustrated in Figure 3, TransD sets up two projection matrices $M_{rh}$ and $M_{rt}$ that, respectively, project the head entity and the tail entity into the relational space. The specific definitions are as follows:

$$M_{rh} = r_p h_p^{\top} + I^{m \times n}, \qquad M_{rt} = r_p t_p^{\top} + I^{m \times n},$$

$$h_{\perp} = M_{rh} h, \qquad t_{\perp} = M_{rt} t,$$

where $h, h_p, t, t_p \in \mathbb{R}^{n}$, $r, r_p \in \mathbb{R}^{m}$, and the subscript $p$ represents that the vector is a projection vector. Therefore, the mapping matrices are determined by both entities and relations. Compared with other Trans models, TransD makes the two projection vectors interact sufficiently because each element of one vector meets every entry of the other.
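The projection step can be illustrated with a small sketch in plain Python. The mapping matrix follows the TransD definition, a rank-one product of the two projection vectors plus a rectangular identity. The scoring function is the standard TransD translation distance and is an assumption here, as are all function names; this is not the authors' implementation.

```python
def mapping_matrix(r_p, e_p):
    """Build M = r_p e_p^T + I^{m x n} from a relation projection vector r_p
    (length m) and an entity projection vector e_p (length n)."""
    m, n = len(r_p), len(e_p)
    return [[r_p[i] * e_p[j] + (1.0 if i == j else 0.0) for j in range(n)]
            for i in range(m)]


def project(M, x):
    """Matrix-vector product: map an entity vector into the relation space."""
    return [sum(a * b for a, b in zip(row, x)) for row in M]


def transd_score(h, h_p, r, r_p, t, t_p):
    """Assumed TransD score: negative squared L2 norm of h_perp + r - t_perp,
    so that more plausible triples receive higher scores."""
    h_perp = project(mapping_matrix(r_p, h_p), h)
    t_perp = project(mapping_matrix(r_p, t_p), t)
    return -sum((a + b - c) ** 2 for a, b, c in zip(h_perp, r, t_perp))
```

Because the matrix depends on both `r_p` and the entity's own projection vector, two entities under the same relation are projected with different matrices, which is the property that distinguishes TransD from TransR.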

3.2. Graph Attention Networks with Relations

In the KG, entities and relationships are not independent, and they all have an impact on each other. The local neighbor nodes of the entity contain a lot of important hidden semantic information. In this paper, we use GAT to extract hidden features in the neighborhood of an entity. The original GAT only considers entities, ignoring edge information. We use the method proposed by KBAT to redefine an attention layer.

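The structure of the graph attention network is shown in Figure 4. In order to update the vector of entity $e_i$, a linear transformation layer is used to learn the vector representation of the combination of entities and relations in a specific triple $t_{ij}^{k} = (e_i, r_k, e_j)$. The corresponding vector after the combination is

$$c_{ijk} = W_1 [h_i \,\|\, h_j \,\|\, g_k],$$

where $h_i$ and $h_j$ are the vectors of entities $e_i$ and $e_j$, $g_k$ is the vector of relation $r_k$, $\|$ denotes concatenation, and $W_1$ denotes the linear transformation matrix. Then, we obtain the absolute attention value $b_{ijk}$ of the triple through another linear transformation matrix $W_2$ and the LeakyReLU nonlinearity:

$$b_{ijk} = \mathrm{LeakyReLU}(W_2 c_{ijk}).$$

We use softmax to normalize $b_{ijk}$ to get the relative attention value $\alpha_{ijk}$:

$$\alpha_{ijk} = \mathrm{softmax}_{jk}(b_{ijk}) = \frac{\exp(b_{ijk})}{\sum_{n \in N_i} \sum_{r \in R_{in}} \exp(b_{inr})},$$

where $N_i$ is the neighborhood of entity $e_i$ and $R_{ij}$ is the set of relations between entities $e_i$ and $e_j$. The new vector representation of entity $e_i$ is obtained by weighted summation over all neighbors according to the relative attention value and is stabilized through the multihead attention mechanism:

$$h_i' = \Big\Vert_{m=1}^{M} \sigma\Big(\sum_{j \in N_i} \sum_{k \in R_{ij}} \alpha_{ijk}^{m} c_{ijk}^{m}\Big),$$

where the superscript $m$ represents the m-th attention head, $M$ is the number of heads, and $\sigma$ represents any nonlinear function.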
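As a rough illustration of the attention layer described above, the following sketch computes a single attention head for one entity in plain Python. The function names, the LeakyReLU slope of 0.2, and the choice of tanh as the nonlinearity are assumptions made for the example. The second transformation is treated as a weight vector so that each triple yields a scalar attention value.

```python
import math


def leaky_relu(x, slope=0.2):
    # The slope value is an assumption; the paper does not state it.
    return x if x > 0 else slope * x


def matvec(M, x):
    return [sum(a * b for a, b in zip(row, x)) for row in M]


def attend(h_i, neighbors, W1, w2):
    """One attention head for entity e_i.

    neighbors: list of (h_j, g_k) pairs, one per triple (e_i, r_k, e_j).
    W1: matrix applied to the concatenation [h_i ; h_j ; g_k].
    w2: weight vector that maps each combined vector to a scalar.
    Returns the relative attention values and the updated entity vector.
    """
    # Combined triple vectors: linear map of the concatenated embeddings.
    cs = [matvec(W1, h_i + h_j + g_k) for h_j, g_k in neighbors]
    # Absolute attention values.
    bs = [leaky_relu(sum(w * c for w, c in zip(w2, c_vec))) for c_vec in cs]
    # Softmax over all triples in the neighborhood (shifted for stability).
    top = max(bs)
    exps = [math.exp(b - top) for b in bs]
    total = sum(exps)
    alphas = [e / total for e in exps]
    # Attention-weighted sum of the triple vectors, then a nonlinearity.
    dim = len(cs[0])
    agg = [sum(a * c[d] for a, c in zip(alphas, cs)) for d in range(dim)]
    return alphas, [math.tanh(v) for v in agg]
```

A multihead layer would call `attend` once per head with separate weights and concatenate the returned vectors.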

In order to keep the relationship dimension and entity dimension consistent, the relationship vector is updated through linear transformation in the GAT. In the last layer, the new vector and the original vector are linearly combined through the weight matrix to prevent the loss of the original information of the entity.

3.3. Capsule Network

After GAT training, we use an improved CapsE [27] as the decoder in our model. It uses a three-column matrix to represent each triple. First, a CNN performs convolution on the triple matrix to generate multiple feature maps. The entries at the same dimension across all feature maps are encapsulated into a corresponding capsule, so each capsule captures the characteristics of one dimension of the embedded triple. The products of these capsules and different weights generate smaller-dimensional capsules, from which one continuous vector is obtained. The dot product of this vector and the weight vector gives the corresponding score, and the sum of all the scores is used to judge the correctness of the given triple. The score function for the triple is as follows:

$$f(h, r, t) = \mathrm{caps}\big(g([v_h, v_r, v_t] * \Omega)\big),$$

where $v_h$, $v_r$, and $v_t$ are the vectors of the head entity, relation, and tail entity, caps denotes a capsule network operator, $g$ is an activation function (ReLU in this paper), $*$ denotes the convolution operator, and $\Omega$ is the shared parameter in the convolution layer. The model is trained using the loss function as follows:

$$\mathcal{L} = \sum_{(h, r, t) \in \{G \cup G'\}} \log\big(1 + \exp(-l_{(h, r, t)} \cdot f(h, r, t))\big),$$

in which

$$l_{(h, r, t)} = \begin{cases} 1, & (h, r, t) \in G, \\ -1, & (h, r, t) \in G', \end{cases}$$

where $G$ and $G'$ are the sets of valid and invalid triples, respectively.

Negative triples are constructed by replacing the head entity or the tail entity of a correct triple with each of the entities in the dataset in turn.
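The construction of negative triples, together with a loss over labeled triples, can be sketched as follows. Two points are assumptions rather than details confirmed by the text: candidates that are already known valid triples are skipped, and the loss takes the soft-margin form used by CapsE-style decoders, with label +1 for valid and -1 for invalid triples. The function names are illustrative.

```python
import math


def corrupt(triple, entities, known):
    """Generate negatives by replacing the head or the tail with every entity.

    known: set of valid triples; candidates found in it are skipped
    (an assumption, so that no valid triple is labeled as invalid).
    """
    h, r, t = triple
    negatives = []
    for e in entities:
        for cand in ((e, r, t), (h, r, e)):
            if cand != triple and cand not in known:
                negatives.append(cand)
    return negatives


def soft_margin_loss(scored):
    """Sum of log(1 + exp(-label * score)) over (score, label) pairs.

    With label +1 for valid and -1 for invalid triples, the loss decreases
    as valid triples score higher and invalid triples score lower.
    """
    return sum(math.log1p(math.exp(-label * score)) for score, label in scored)
```

Corrupting both positions for every entity yields on the order of twice the number of entities per triple, so in practice a random subset of these negatives is usually sampled per training batch.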

4. Experiments and Analysis

4.1. Datasets

In our experiments, we use two widely used benchmark datasets FB15K-237 [37] and WN18RR [19] for evaluation of the performance of link prediction.

FB15K-237 is extracted from Freebase. It contains 14,541 entities and 237 relations. It is an improved version of the FB15K dataset in which all inverse relations are deleted to prevent direct inference of test triples by reversing training triples. WN18RR is created from WN18, which is a subset of WordNet. WN18 consists of 18 relations and 40,943 entities. As with FB15K-237, all inverse relations are deleted to prevent direct inference of test triples by reversing training triples. WN18RR contains 40,943 entities and 11 relations. Details of the datasets are summarized in Table 1.

4.2. Link Prediction

In the link prediction task, the purpose is to predict a missing entity given a relation and another entity. Specifically, for each triple in the test set, we remove the head or tail entity and replace it with each entity in the entity dictionary in turn. We first compute the scores of these corrupted triples and then rank them in descending order; the rank of the correct entity is finally stored. The task emphasizes the rank of the correct entity instead of only finding the best one.

4.3. Evaluation Protocol

We use the filtered setting protocol, i.e., corrupted triples that already appear in the KB are not taken into account. We rank the valid test triple and the corrupted triples in descending order of their scores. We employ the following evaluation metrics: MR, MRR, Hits@1, Hits@3, and Hits@10 (i.e., the proportion of the valid test triples ranking in the top 1, 3, and 10 predictions).
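A minimal sketch of the filtered ranking step is given below. The `score` argument stands for any scoring function over a triple and is a placeholder, not the trained model. Ties are broken optimistically, which is an assumption because the text does not specify tie handling.

```python
def filtered_rank(test_triple, entities, known, score, corrupt_head=False):
    """Rank of the correct entity among corrupted candidates.

    known: set of all valid triples (train, validation, test); corrupted
    triples found in it are skipped, which is the filtered setting.
    score: callable score(h, r, t) where higher means more plausible.
    """
    h, r, t = test_triple
    target = score(h, r, t)
    rank = 1
    for e in entities:
        cand = (e, r, t) if corrupt_head else (h, r, e)
        if cand == test_triple or cand in known:
            continue  # filtered: ignore the test triple itself and valid triples
        if score(*cand) > target:
            rank += 1  # only strictly better candidates push the rank down
    return rank
```

Passing an empty set as `known` gives the unfiltered (raw) rank, which makes the effect of filtering easy to compare on a small example.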

MR means mean rank, and it is calculated as follows:

$$\mathrm{MR} = \frac{1}{|S|} \sum_{i=1}^{|S|} \mathrm{rank}_i,$$

where $S$ is the set of test triples, $|S|$ is the number of triples in the set, and $\mathrm{rank}_i$ refers to the link prediction rank of the i-th triple. The smaller the indicator is, the better the performance is.

MRR means mean reciprocal rank. It is calculated as follows:

$$\mathrm{MRR} = \frac{1}{|S|} \sum_{i=1}^{|S|} \frac{1}{\mathrm{rank}_i}.$$

The symbols involved in the above formula are the same as those involved in the MR calculation formula. The bigger the indicator is, the better the performance is.

Hits@n refers to the proportion of triples whose rank in link prediction is no greater than n. It is calculated as follows:

$$\mathrm{Hits@}n = \frac{1}{|S|} \sum_{i=1}^{|S|} \mathbb{I}(\mathrm{rank}_i \le n),$$

where $\mathbb{I}(\cdot)$ is the indicator function.

Lower MR, higher MRR, or higher Hits@1,3,10 indicate better performance. Final scores on the test set are reported for the model obtaining the highest Hits@1,3,10 on the validation set.
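The three metrics can be computed from a list of ranks with a few lines of Python. The function name and the dictionary layout are illustrative choices.

```python
def link_prediction_metrics(ranks, ns=(1, 3, 10)):
    """Compute MR, MRR, and Hits@n from the ranks of the correct entities."""
    count = len(ranks)
    metrics = {
        "MR": sum(ranks) / count,                     # mean rank, lower is better
        "MRR": sum(1.0 / r for r in ranks) / count,   # mean reciprocal rank
    }
    for n in ns:
        # Fraction of test triples whose correct entity is ranked in the top n.
        metrics[f"Hits@{n}"] = sum(1 for r in ranks if r <= n) / count
    return metrics
```

For example, the ranks 1, 2, 4, and 20 give an MR of 6.75, while Hits@1, Hits@3, and Hits@10 are 0.25, 0.5, and 0.75. MR is dominated by the single poorly ranked triple, whereas MRR and Hits@n are much less sensitive to such outliers.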

4.4. Training Protocol

We first train TransD for 1000 and 3000 epochs on FB15K-237 and WN18RR, respectively. We then obtain 200-dimensional entity and relation vectors and use them to initialize the entity and relation embeddings in the GAT. In the GAT, we use the following hyperparameters for training to select the optimal result. We use Adam learning rate , -norm or -norm, margin , and dropout . The highest Hits@10 and Hits@3 scores and the lowest MR on the validation set are obtained when we use learning rate at , -norm, margin = 1, and dropout = 0.3 for FB15K-237 and learning rate at , -norm, margin = 5, and dropout = 0.3 for WN18RR.

After the GAT, we train the capsule network for 200 epochs. We set the batch size to 128. We use the Adam optimizer with the initial learning rate . We monitor the MRR score after each training epoch and obtain the highest MRR score on the validation set when using the initial learning rate at .

4.5. Results and Analysis

Table 2 compares the experimental results of our model with the common classic methods.

Table 3 shows the comparison between our model and some traditional methods on FB15K-237 and WN18RR. These methods only consider the structure vector of the triples and do not consider the semantic information around the triples.

Table 4 compares the experimental results of our model with previous published results of only using neural network methods.

Compared to the baseline method KBAT, our model outperforms it on FB15K-237 across all the metrics and on three metrics for WN18RR. Figures 5 and 6 show that, on FB15K-237, KRDGC gains a 1.7% relative improvement in MRR and a 3.6% absolute improvement in Hits@10. We confirm previous findings that KBAT is in fact a strong baseline model; e.g., KBAT obtains better MRR and Hits@1 than KRDGC on WN18RR.

In Figure 7, KRDGC achieves better MR, with relative improvements of about 25% on FB15K-237 and about 4.6% on WN18RR.

In summary, combining structural information with neighborhood information can capture more semantic information and improve the effect of knowledge representation learning. The capsule network can extract more semantic information in the same dimension of the feature maps.

5. Conclusion and Future Work

In this paper, we proposed a KG embedding model KRDGC which is able to take advantage of the structure and neighborhood information of the triple. Our method uses TransD to model structural information and GAT to model neighborhood information and finally extracts deep features through a capsule network. We evaluate our model on link prediction, and the experimental results show significant improvements over the major baselines. In the future, we would like to further represent the relation vector in GAT, incorporate our model into the KBQA system, and verify its response time in mobile edge computing.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This study was supported by the National Natural Science Foundation of China (no. 61602044). The authors are grateful to the research of http://export.arxiv.org/pdf/1808.04122 for providing new ideas.