Abstract

With the development of science and technology, the number of science and technology policies increases year by year. Science and technology policies are documents in textual form, characterized by rigorous structure, clear hierarchy, and standard language. Mining template information from policies can optimize data templates and improve the efficiency of recommending data to users. This paper proposes a joint entity relation extraction model based on capsule networks and part-of-speech weighting. To learn more feature information from word vectors, a capsule network built on a bidirectional gated recurrent unit (BGRU) replaces the traditional convolutional neural network. To compensate for the incomplete semantic expression of word vectors, part-of-speech features are added to enrich the text information. Meanwhile, to solve the weight distribution problem between word features and part-of-speech features, an artificial fish swarm algorithm is used to optimize the two feature weights iteratively, and the effectiveness of the proposed model is verified by experiments.

1. Introduction

Big data is gradually becoming a new model and element driving social development and the construction of a modern governance system, and it has had a huge impact on national governance, science and technology decision-making, and intelligence services. Effectively using massive data resources and big data analysis methods and technologies to support decision makers has become one of the consensuses in intelligence services research [1]. A science and technology policy is a document in textual form with a rigorous structure, clear hierarchy, and standard language. Mining template information from policies can optimize data templates and improve the efficiency of recommending data to users. By extracting entity relations from science and technology policies and storing the data topics in the form of a template database, we not only save time and manpower but also promote the efficient use of data and improve the user's data viewing experience, which is of great significance for supporting project research directions and the formulation of science and technology policy.

Entity relation extraction refers to the extraction of relation triples (entity 1, entity 2, relation) from sentences according to context semantics and is one of the important subtasks of information extraction [2]. It is widely used in text summarization, knowledge graphs, and search engines. Relation extraction can be divided into two types according to the relation set: restricted relation extraction, where the relation set is determined in advance and the task is to judge which category the relation between two entities belongs to according to the entities and the context semantics, similar to text classification; and open relation extraction, where there is no predetermined relation set and the target domain of extraction is uncertain.

In this paper, a joint entity relation extraction model based on a capsule network built on a bidirectional gated recurrent unit is constructed. Experiments show that the macro-average F1 value of the BLSTM-based model is higher than that of the BGRU-based model, while the running time of the BGRU-based model is shorter than that of the BLSTM-based model. Considering the incomplete semantic expression of word vectors and the location information retained by the capsule network, two part-of-speech-weighted models based on self_attention are constructed by adding part-of-speech features to enrich the text information, and the validity of the late-combination model is verified by experiments. To address the weight distribution problem between word features and part-of-speech features, the artificial fish swarm algorithm is used to optimize the two feature weights iteratively to improve the classification effect. The effectiveness of both optimization schemes is verified by experiments.

2.1. Entity Relation Extraction

Supervised entity relation extraction based on deep learning mainly includes pipelined extraction and joint extraction [2]. Pipelined extraction first performs named entity recognition (NER) and then judges the relations between the recognized entities; this approach is convenient to handle and flexible to combine, but it ignores the association between the two subtasks and causes errors to accumulate [3]. Joint extraction builds entity recognition and relation extraction within one model and improves the extraction effect by strengthening the dependence between the two subtasks through a shared encoding layer. Miwa and Bansal [4] used neural networks for joint relation extraction to improve the accuracy of the model, which was the first neural joint extraction model. The joint extraction model designed by Li et al. [5] contains three layers: a word embedding layer, a sequence layer, and a dependency layer; the proposed model obtained good results on the SemEval-2008 dataset. Zheng et al. [6] used Bi-LSTM to extract text semantic information at the encoding layer as a shared layer for entity recognition and relation extraction, and fed the output of the encoding layer into the entity recognition model and the relation extraction model; experiments on the public dataset ACE05 reached the highest level at the time and proved the effectiveness of the joint extraction model. Zhang et al. [7] combined convolutional neural networks with support vector machines and conditional random fields, constructed a joint neural network model with parameter sharing, and achieved good results on a drug-manual corpus. Ma et al. [8] put forward a joint entity and relation extraction model with a feedback mechanism, which modifies the parameters of the feature extraction layer and the entity recognition model according to the weighted loss and the feedback of the two loss values.

2.2. Capsule Network

The capsule network is a variant of the convolutional neural network first proposed by Geoffrey Hinton et al. It replaces the neuron nodes of a traditional neural network with capsules: a capsule is a group of neurons whose output is a vector encoding the features, orientation, state, and other information of an entity. The relations between words are trained by dynamic routing, and the feature vectors are then clustered to capture the spatial features of the text; the network comprises an input layer, a convolution layer, a primary capsule layer, a class capsule layer, and an output layer. Sabour et al. [9] built a capsule network architecture that surpassed CNNs on the MNIST dataset and achieved the most advanced performance at that time. Wang et al. [10] designed a hybrid deep network model based on a capsule network and a recurrent neural network and verified its effectiveness against popular text classification methods; the authors showed that the capsule network can learn the location information of the text and the relation between parts and the whole, enriching semantics and reducing the loss of feature information. The transformer-capsule ensemble model proposed by Tang et al. [11] proves that the capsule network can effectively extract phrase features in text.

3. Joint Entity Relation Extraction Model Based on Capsule Network and Part-of-Speech Weighting

3.1. Entity Relation Extraction Based on Capsule Network

This section constructs a parameter-shared joint entity relation extraction model based on a capsule network. As shown in Figure 1, the model is mainly composed of four parts: a vector representation layer, a feature learning layer, an entity recognition layer, and a relation classification layer.

(1) Vector representation layer. Because the computer cannot process text directly, words are embedded into vectors to represent text features, and word2vec [12] is used to train domain word vectors to enhance semantic accuracy.

(2) Feature learning layer. Word vectors can only represent the general meaning of the text, so BGRU [13] is used to learn higher-level contextual semantic features from the word vector representation of sentences.

(3) Weight adjustment layer. Because the feature weights of the learned text are not allocated reasonably and resources cannot be used efficiently, the self-attention mechanism [14] is used to calculate weights in the weight adjustment layer. The vector representation layer, feature learning layer, and weight adjustment layer serve as the shared part of the model.

(4) Entity recognition layer. A conditional random field (CRF) is used to label the text sequentially. The conditional random field is a typical discriminative undirected probabilistic graphical model proposed by Lafferty [15] in 2001. It combines the characteristics of the hidden Markov model (HMM) [16] and the maximum entropy Markov model (MEMM) [17] but does not rely on the two unreasonable assumptions of the hidden Markov model (the homogeneous Markov assumption and the observation independence assumption), which makes the algorithm more flexible and able to use more contextual information. Compared with the maximum entropy model, the conditional random field computes globally optimal node probabilities rather than merely normalizing locally, so the label bias problem is solved. McCallum [18] took the lead in using it for named entity recognition, and after continuous improvement it has become the most successful method for named entity recognition [19]. According to the BIOS tagging scheme, each word in the text is marked with a corresponding tag: the words connected by B and I are combined into one entity, where a single B tag marks the beginning of the entity and multiple consecutive I tags may follow; the O tag indicates that the current word does not belong to any entity and is not processed; the S tag marks a word that forms an entity by itself, and consecutive S tags are recognized as multiple separate entities (a decoding sketch is given after this list).

(5) Relation classification layer. The capsule network replaces the neuron nodes of a traditional neural network with capsules: a capsule is a group of neurons whose output is a vector encoding the features, orientation, state, and other information of an entity. The relations between words are trained by dynamic routing, and the feature vectors are then clustered to capture the spatial features of the text.
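To make the BIOS decoding concrete, the following is a minimal Python sketch (not part of the original model; the typed tag format B-TYPE/I-TYPE/S-TYPE/O and the function name are illustrative assumptions) showing how entity spans can be recovered from a predicted tag sequence:

def decode_bios(tokens, tags):
    """Recover entity spans from a BIOS tag sequence.

    tokens: list of words, aligned with tags such as ["B-ORG", "I-ORG", "O", "S-LOC"].
    Returns a list of (entity_text, entity_type) tuples.
    """
    entities, current, current_type = [], [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):                 # start of a multi-token entity
            if current:
                entities.append(("".join(current), current_type))
            current, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current:   # continuation of the current entity
            current.append(token)
        elif tag.startswith("S-"):               # a single-token entity
            if current:
                entities.append(("".join(current), current_type))
                current, current_type = [], None
            entities.append((token, tag[2:]))
        else:                                    # "O": outside any entity
            if current:
                entities.append(("".join(current), current_type))
                current, current_type = [], None
    if current:
        entities.append(("".join(current), current_type))
    return entities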

The first layer is the input layer, in which the weight-adjusted feature encoding and the recognized entity vectors are spliced together as the input of the capsule network.

The second layer is the convolution layer, which extracts sentence features through a standard N-gram convolution, in which the size of the convolution kernel determines the quality of feature extraction. The input of the capsule network layer is L × V dimensional data, where L is the data length after splicing the sentence features and entity vectors and V is the word embedding dimension. A filter with a window size of K is used to convolve the sentence with a stride of 1, as shown in the following formula:

m_i = f(W · x_(i:i+K−1) + b)  (1)

where W represents the convolution filter, f represents the ReLU activation function, b represents a bias term, and x_(i:i+K−1) represents the data in the window from position i to i + K − 1; a feature sequence of length L − K + 1 is finally obtained.
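As a concrete illustration of this N-gram convolution, the following NumPy sketch (not the authors' implementation; the shapes follow the L, V, and K defined above, and the filter values are random placeholders) produces the feature sequence of length L − K + 1:

import numpy as np

def ngram_convolution(X, W, b, K):
    """Slide a window of size K over the L x V input and apply one filter.

    X: (L, V) matrix of stacked word/entity vectors
    W: (K, V) convolution filter
    b: scalar bias
    Returns a ReLU-activated feature sequence of length L - K + 1.
    """
    L, V = X.shape
    feats = np.empty(L - K + 1)
    for i in range(L - K + 1):
        window = X[i:i + K]                                  # x_(i:i+K-1) in the text
        feats[i] = np.maximum(0.0, np.sum(W * window) + b)   # f = ReLU
    return feats

# Example: L = 10 positions, V = 8 dimensions, window size K = 3
X = np.random.randn(10, 8)
W = np.random.randn(3, 8)
print(ngram_convolution(X, W, b=0.1, K=3).shape)   # (8,) = L - K + 1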

The third layer is the primary capsule layer (PrimaryCaps). The data extracted by the convolution layer are still scalar representations; the primary capsule layer converts the features extracted by the N-gram convolution layer into capsules and retains the instantiated parameters of the features in vector form. In the shared parameter mode, each N-gram vector is multiplied by the shared parameters and then transformed into vector neurons for dynamic routing.

The fourth layer is the class capsule layer (ClassCaps), where the dynamic routing algorithm completes the transformation from the primary capsule layer to the class capsule layer. Dynamic routing ensures that the information of a child capsule is sent to the corresponding parent capsule through a nonlinear mapping; over several iterations the connection strength between child and parent capsules is strengthened or weakened, so dynamic routing is more effective than the max pooling of CNNs, which loses a lot of spatial text information [20]. In the class capsule layer, the modulus of each vector is computed, and the vector with the largest modulus indicates the most likely category.

Dynamic routing is the main part of the algorithm and the computing model is shown in Figure 2.

In Figure 2, b_ij represents the routing weight between child capsule i and parent capsule j, initialized to 0 by default; v1 and v2 are the feature vectors output by the previous capsule layer; W is the transformation matrix corresponding to each vector; r is the number of iterations; and c_ij indicates the coupling (similarity) between the lower capsule and the upper capsule and is updated at each iteration. The vectors are squeezed using the squash function as the activation function, calculated as shown in formula (2):

v_j = (||s_j||^2 / (1 + ||s_j||^2)) · (s_j / ||s_j||)  (2)

where s_j denotes the weighted input to capsule j. The larger the modulus of a vector, the stronger the feature. After each round of calculation, the b values are updated and the next round begins, until the r iterations are completed.
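The squash activation and the routing loop described above can be sketched as follows (a minimal NumPy version in the spirit of Sabour et al. [9]; the variable names, shapes, and the default of r = 3 iterations are illustrative assumptions rather than the authors' code):

import numpy as np

def squash(s, eps=1e-8):
    """Squash a vector so its length lies in (0, 1) while keeping its direction."""
    norm2 = np.sum(s ** 2, axis=-1, keepdims=True)
    return (norm2 / (1.0 + norm2)) * s / np.sqrt(norm2 + eps)

def dynamic_routing(u_hat, r=3):
    """Route child-capsule predictions u_hat to parent capsules.

    u_hat: (num_child, num_parent, dim) prediction vectors (W applied to child outputs)
    r: number of routing iterations
    Returns parent capsule vectors of shape (num_parent, dim).
    """
    b = np.zeros(u_hat.shape[:2])                                 # routing logits, initialized to 0
    for _ in range(r):
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)      # coupling coefficients c_ij
        s = (c[..., None] * u_hat).sum(axis=0)                    # weighted sum per parent capsule
        v = squash(s)                                             # squashed parent outputs, formula (2)
        b = b + np.einsum('ijk,jk->ij', u_hat, v)                 # agreement update of the b values
    return v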

The fifth layer is the output layer, which consists of a flatten operation, a fully connected layer, and a softmax function. The output of the capsule network is cap_num × cap_dim dimensional data, where cap_num represents the number of capsules and cap_dim represents the capsule dimension. The output of the capsule network needs to be flattened into one-dimensional data and passed to the fully connected layer.

The fully connected layer plays the role of a classifier in the neural network and belongs to the feedforward neural network. Its input is the entire output of the previous layer, and its output has n nodes, where n represents the number of categories. The output of the fully connected layer is fed into the softmax function to calculate class probabilities, from which the final classification result is obtained.
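A minimal sketch of this output layer (flatten, fully connected layer, softmax), assuming NumPy arrays and randomly initialized weights purely for illustration:

import numpy as np

def classify(caps_out, W_fc, b_fc):
    """Flatten the class-capsule output and classify it with a dense softmax layer.

    caps_out: (cap_num, cap_dim) output of the capsule network
    W_fc:     (cap_num * cap_dim, n_classes) fully connected weights
    b_fc:     (n_classes,) bias
    Returns a probability distribution over the n relation classes.
    """
    flat = caps_out.reshape(-1)              # flatten to one-dimensional data
    logits = flat @ W_fc + b_fc              # fully connected layer
    exp = np.exp(logits - logits.max())      # numerically stable softmax
    return exp / exp.sum()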

3.2. Entity Relation Extraction Based on Part-of-Speech Weighting

Part-of-speech information can not only improve the accuracy of entity recognition but also benefit text classification. Considering that the capsule network retains location information, this section proposes adding part-of-speech features to reduce the ambiguity of words and enrich the semantic information of the features. In most existing models for text data, part-of-speech and other features are simply added and processed in the same proportion as word features. However, judging from the different effects of part of speech in different experiments, its importance varies across data and models, so a joint entity relation extraction model based on artificial-fish-swarm part-of-speech weighting is constructed. The artificial fish swarm algorithm is used to optimize the weights of word features and part-of-speech features iteratively so as to improve the classification effect [21].

Parts of speech are divided according to grammatical norms and the meanings of words. Learning the contextual features of part of speech through BGRU can enrich the grammatical information in sentence vectors. Considering that different parts of speech have different importance for the features, self-attention is used to weight each part of speech. As shown in Figure 3, the model consists of six parts: an embedding layer, weight adjustment layer 1, a feature learning layer, weight adjustment layer 2, a feature binding layer, and a relation extraction layer.

Embedding layer. According to the word vector dictionary and the part-of-speech vector dictionary, the words and parts of speech are transformed into vectors by word embedding, and these vectors, together with the artificial fish swarm weight Q = q, are used directly as input data.

Weight adjustment layer 1. In weight adjustment layer 1, the part-of-speech vector is weighted according to the artificial fish swarm weight: the weight is multiplied by the part-of-speech vector, thereby adjusting the proportion of the part-of-speech vector relative to the word vector.

Feature learning layer. The feature learning layer uses BGRU to learn the contextual semantics of the word vectors and of the weighted part-of-speech vectors; the resulting vectors represent the context-aware word semantics and the weighted part-of-speech semantics.

Weight adjustment layer 2. Weight adjustment layer 2 adjusts the internal weights of each word and each part of speech in the word vector and the part-of-speech vector, respectively, through self_attention to obtain the weighted word vector and part-of-speech vector representations; at this point both the word vector and the part-of-speech vector are weighted results.

Feature binding layer. The feature binding layer splices the weighted, context-aware word vector with the weighted part-of-speech vector. After splicing, the resulting vector is the output of the feature binding layer; it carries both the contextually weighted semantics of the word features and the contextually weighted semantics of the part-of-speech features.
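The six layers above can be summarized in a short sketch (Python under simplifying assumptions: the BGRU encoder and the self-attention module are passed in as callables, and the dimensions and the weight q = 0.8 are illustrative, not taken from the authors' code):

import numpy as np

def fuse_word_pos(word_emb, pos_emb, q, encode, attend):
    """Sketch of the pipeline in Figure 3.

    word_emb: (L, d_w) word embeddings
    pos_emb:  (L, d_p) part-of-speech embeddings
    q:        scalar part-of-speech weight supplied by the artificial fish swarm
    encode:   the BGRU context encoder (feature learning layer)
    attend:   the self-attention weighting (weight adjustment layer 2)
    """
    pos_weighted = q * pos_emb                                    # weight adjustment layer 1
    word_ctx, pos_ctx = encode(word_emb), encode(pos_weighted)    # feature learning layer
    word_att, pos_att = attend(word_ctx), attend(pos_ctx)         # weight adjustment layer 2
    return np.concatenate([word_att, pos_att], axis=-1)           # feature binding layer

# Toy usage with identity encoders just to show the output shape
identity = lambda x: x
fused = fuse_word_pos(np.random.randn(20, 128), np.random.randn(20, 32), 0.8, identity, identity)
print(fused.shape)   # (20, 160)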

Different from a general functional model with fixed input and output, the output of a deep learning model is random, and different results can be obtained even with the same data. Therefore, the parameters of each layer are fixed by saving the trained model, so that the experimental results on the same data are reproducible and deep learning can be combined with the artificial fish swarm algorithm. The part-of-speech weight optimization based on the artificial fish swarm proceeds as follows (a simplified sketch of this loop is given after the steps):

Step 1: The artificial fish swarm algorithm adjusts the positions of the artificial fish according to the objective function to obtain the optimal solution. However, the randomness of the parameters in the deep learning model makes the experimental results fluctuate even under the same configuration and the same data, which may cause the artificial fish to oscillate back and forth and interfere with the optimization. To reduce the influence of this randomness, the part-of-speech weight is initially set to 1 and fed into the capsule network entity relation extraction model with part-of-speech weighting; after 50 iterations, the trained model is saved so that the parameters of the deep model are fixed.

Step 2: Initialize the population size, visual range, step size, crowding factor, number of trial moves, and the initial position of each artificial fish.

Step 3: Using the model saved in Step 1, calculate the macro-average F1 value of each artificial fish as its objective function value, and write the optimal value to the bulletin board.

Step 4: According to the objective function value of each individual, the artificial fish perform foraging, swarming, following, and other behaviors, and the bulletin board is updated whenever a better value is found.

Step 5: If the termination condition is met, exit the loop; the value on the bulletin board is the final result. Otherwise, return to Step 3.
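A simplified sketch of this optimization loop is given below (Python; the hyperparameter defaults and the evaluate_f1 callback are assumptions, and the following behavior is omitted for brevity, so this is an illustration of the procedure rather than the authors' implementation):

import random

def optimize_pos_weight(evaluate_f1, n_fish=10, visual=0.3, step=0.1,
                        delta=0.618, try_times=5, max_iter=30, lo=0.0, hi=2.0):
    """Simplified one-dimensional artificial fish swarm search for the part-of-speech weight.

    evaluate_f1(weight) -> macro-average F1 of the saved (fixed-parameter) model
    when the part-of-speech vectors are scaled by `weight`.
    """
    def clamp(x):
        return min(hi, max(lo, x))

    fish = [random.uniform(lo, hi) for _ in range(n_fish)]        # Step 2: initialize positions
    scores = [evaluate_f1(x) for x in fish]
    best_x, best_f = max(zip(fish, scores), key=lambda p: p[1])   # Step 3: bulletin board

    for _ in range(max_iter):                                     # Steps 4-5: behaviors + stop check
        for i, x in enumerate(fish):
            candidate = x
            # Foraging: try a few random positions within the visual range
            for _ in range(try_times):
                y = clamp(x + random.uniform(-visual, visual))
                if evaluate_f1(y) > scores[i]:
                    candidate = clamp(x + step * (y - x))
                    break
            # Swarming: move toward the neighborhood center if it looks better
            neigh = [z for z in fish if abs(z - x) < visual and z != x]
            if neigh:
                centre = sum(neigh) / len(neigh)
                if evaluate_f1(centre) > delta * scores[i]:
                    candidate = clamp(x + step * (centre - x))
            fish[i] = candidate
            scores[i] = evaluate_f1(candidate)
            if scores[i] > best_f:                                # update the bulletin board
                best_x, best_f = fish[i], scores[i]
    return best_x, best_f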

The optimal value obtained by the artificial fish swarm algorithm is used as the part-of-speech weight, and the weighted part-of-speech vector is obtained by multiplying this weight by the part-of-speech vector. The weighted part-of-speech vector, spliced with the word vector after the self-attention weighting, is taken as the shared part of the capsule network joint entity relation extraction model with part-of-speech weighting.

4. Experiments

4.1. Entity Relation Extraction Based on Capsule Network

To verify the effectiveness of the self-attention-based capsule network in the joint entity relation extraction model, the macro-average F1 value is taken as the evaluation metric. Because many parameter initial values are generated randomly during deep learning, the experimental results differ even with the same model and the same data. To reduce this random interference, each experiment is run three times and the average of the three runs is taken as the final macro-average F1 result. The proposed model is compared from two aspects, model structure and influential model parameters, using the single-variable method to verify the effectiveness of the proposed approach.
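As a reference for this evaluation protocol, a minimal sketch using scikit-learn's f1_score (the three-run averaging follows the text; the run_model callback and the data arguments are illustrative assumptions):

from statistics import mean
from sklearn.metrics import f1_score

def evaluate(run_model, test_sentences, gold_labels, n_runs=3):
    """Run the model several times and report the averaged macro-average F1.

    run_model(sentences) -> predicted relation labels for one training run.
    """
    scores = []
    for _ in range(n_runs):
        pred = run_model(test_sentences)
        scores.append(f1_score(gold_labels, pred, average="macro"))
    return mean(scores)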

4.1.1. Model Structure

To verify the effectiveness of the proposed model, the joint entity relation extraction model based on the capsule network (denoted Capsnet), the joint entity relation extraction model based on CNN (denoted CNN), and the joint extraction model combining the capsule network with the self-attention mechanism (denoted Capsnet + self_attention) were tested on the same data. First, Capsnet and CNN were run for 300 iterations; the macro-average F1 values of both models fluctuate only within a small range after 100 iterations, as shown in Figure 4, so the number of iterations is set to 100 in subsequent experiments.

As can be seen from Figure 4, the macro-average F1 value of the CNN model is lower when the number of iterations is small, and the two curves cross at the 9th iteration. Averaging the experimental results from iteration 100 to iteration 300, the Capsnet model reaches a macro-average F1 value of approximately 0.78293 (to five significant figures), while the CNN model reaches 0.77489; the Capsnet model is thus about 0.00804 higher than the CNN model.

As can be seen from Figure 5, the difference between the Capsnet model and the Capsnet model with the self-attention mechanism is not large, but from the 20th iteration onward the model with self-attention is slightly higher than or roughly equal to the Capsnet model. Averaging the macro-average F1 values from the 40th to the 100th iteration, the Capsnet model reaches 0.77452 and the model with the self-attention mechanism reaches 0.78043, an improvement of 0.00591. This shows that the self-attention mechanism improves the macro-average F1 value of the model to a certain extent and verifies the effectiveness of introducing self-attention into the capsule network joint entity relation extraction model.

4.1.2. Impact Model Parameters

In a deep learning model, suitable parameters can improve not only the running speed of the model but also its learning efficiency. The effects of the capsule dimension and the convolution window size in the capsule network were studied using the single-variable method.

4.2. Capsule Network Based on Self-Attention Part-of-Speech Weighting

The model is analyzed from two aspects: the macro-average F1 value and the running time. The details are as follows.

4.2.1. Comparison of Macro-Average F1 Values of Four Models

The early-combination and late-combination models proposed in this section, which incorporate part-of-speech features, are compared with the BLSTM-based joint extraction model (denoted BLSTM) and the BGRU-based model proposed in Section 3.1 (denoted BGRU). Each model is run for 50 iterations, and the experimental results are shown in Figure 8.

The macro-average F1 values of the four models were averaged from the 40th to the 50th iteration; in descending order they are: late-combination model 0.78394, BLSTM model 0.77792, early-combination model 0.77632, and BGRU model 0.7733. The early-combination model is 0.00302 higher than the BGRU model but lower than BLSTM alone. The late-combination model has a larger effect: it is 0.01064 higher than BGRU alone and 0.00602 higher than BLSTM alone. This verifies the effectiveness of the capsule network joint entity relation extraction model based on self_attention part-of-speech weighting.

4.2.2. Comparison of Running Time of Four Models

The four models are each run for 30, 50, and 100 iterations to obtain the running times, and the experimental results are shown in Figure 9. Across the three experiments, the running times of the early-combination model and the late-combination model differ little; both are longer than that of BGRU alone but slightly shorter than that of BLSTM alone.

By replacing BLSTM with BGRU and taking part of speech as an extended feature, the macro-average F1 value of the capsule-network-based joint entity relation extraction model can be improved while the running time is reduced, which proves the effectiveness of the joint entity relation extraction model based on self_attention part-of-speech weighting.

4.3. Capsule Network Based on Part-of-Speech Weighting of Artificial Fish Swarm

Given the randomness of the deep learning model, one set of data cannot represent the overall performance of the model. The model is iterated 50 times and the experimental model is saved; this is done three times to obtain three models (model 1, model 2, and model 3). For each model, artificial fish swarm experiments are carried out with the same three groups of data (data 1, data 2, and data 3). In total, 9 experimental results were obtained, each corresponding to the optimal value for one model and one data group, and the average of the 9 experimental results is 0.84293.

The experimental results of the capsule network joint entity relation extraction model without part-of-speech features are shown in Figure 10. The weight of the part-of-speech feature obtained by the artificial fish swarm experiment is 0.8 (an approximate value obtained by rounding), which is used to compare the following models: the capsule network joint entity relation extraction model based on artificial-fish-swarm part-of-speech weighting (denoted artificial fish swarm + self-attention mechanism), the capsule network joint entity relation extraction model based on self_attention part-of-speech weighting (denoted self-attention mechanism), and the Pipeline model (denoted Pipeline) [22]. The final experimental results are shown in Figure 10.

Each experiment was run three times and the results were averaged. The macro-average F1 value of the Pipeline model is 0.77043; that of the capsule network joint extraction model without part-of-speech features is 0.7733; that of the late-combination model using the self-attention mechanism alone is 0.78394; and that of the artificial fish swarm + self-attention mechanism model reaches 0.78565. The capsule network joint extraction model is thus 0.00287 higher than the Pipeline model; adding part-of-speech features raises the macro-average F1 value by 0.01064 over the capsule network joint model; and combining the artificial fish swarm with the self-attention mechanism raises it by a further 0.00171 over the self-attention-only model, which proves the effectiveness of the model.

5. Conclusion

On the basis of the capsule network joint entity relation extraction model, the semantic information is enriched by adding part-of-speech features, and a joint entity relation extraction model based on self_attention is constructed to weight the parts of speech within the sentence. The experimental results show that the macro-average F1 value of the model is improved. To solve the weight distribution problem between word features and part-of-speech features, the artificial fish swarm algorithm is used to optimize the two feature weights iteratively; adjusting the part-of-speech weight controls the proportion of word vectors and part-of-speech vectors and improves the classification effect. In summary, part-of-speech features can improve the extraction effect to a certain extent, and their importance differs from that of word features. In the future, semantic feature representation will be improved by enriching semantic features, refining the model structure, and optimizing the weighting method.

Data Availability

The datasets used and/or analyzed during the current study are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare that there are no conflicts of interest with any financial organizations regarding the material reported in this article.

Acknowledgments

The authors acknowledge the National Natural Science Foundation of China (61373160), the research project “Research on Recognition Method of Knowledge Evolution Path for Sequential Associated Text Based on Graph Neural Network (F2021210003)” of the Natural Science Foundation of Hebei Province, the research project “Knowledge Graph Construction of Multi-Source Domain Data Based on Knowledge Representation Learning (QN2020197)” of the Education Department of Hebei Province, and the research project of Hebei Science and Technology Information Processing Laboratory.