Abstract

To improve the accuracy of question answering for a financial audit robot, we propose, after processing corpus features, combining a Bi-LSTM network with a CRF to recognize domain entities, addressing the low recognition rate of entities in the financial knowledge domain, and introducing an attention mechanism and a CNN to construct a multigranularity question-answer matching model. The methods are then verified experimentally. The results show that AUC, MAP, and MRR increase by 0.74%, 0.85%, and 0.81%, respectively, indicating the feasibility of the improved method.

1. Introduction

Enterprise financial audit is the focus of enterprise financial management, and its quality hinges on auditors having solid professional knowledge. There is therefore an urgent need to give auditors rapid access to large volumes of professional knowledge in the financial field. Traditionally, financial domain knowledge is acquired through web search, which returns much redundant information that must be screened manually. Attention has therefore turned to question-answering systems, which answer questions posed in simple natural language. Many key technologies for question-answering systems have been derived, such as natural language processing (NLP), named entity extraction, intent recognition, knowledge graphs, and deep learning. For example, Qiu XiPeng summarized current pre-trained language models and expounded the value and significance that pre-training brings to NLP [1]. Hu et al. applied NLP to popular words in online communities; the overall performance of their NLP algorithm was 0.77, but when summarizing popular words about smoking cessation it produced "edible nicotine" [2]. Patra et al. applied NLP to the analysis of health determinants and showed that NLP can mine health determinants and has great development potential [3]. Wilfredo et al. applied NLP to intelligent robots, enabling a robot to capture human emotions by extracting intelligent semantics. Beyond these technologies, ANN, Bi-LSTM, and other methods have also been applied to robots to better recognize semantic features [4]. For example, Man and Bai et al. used SVM and related algorithms to identify features and thus capture English semantics, providing a reference for semantic extraction [5]. Zhai used Bi-LSTM to extract emotional semantics and showed that the method obtains important text feature information [6]. However, research on an integrated question-answering robot must mainly solve the problem of extracting semantics and then matching questions. Combining the above research, this paper proposes an improved Bi-LSTM feature-extraction question-answering method and verifies it experimentally.

The length of natural language text used in real life is not fixed, so it is difficult to divide natural language sequences into uniform, fixed dimensions. To address this, the Recurrent Neural Network (RNN) was created to process natural language sequences [6–9]. The design goals were initially achieved, but some inherent disadvantages remained: the RNN considers only short-term dependence and cannot retain long-distance information, and on long sequences it can suffer gradient explosion or gradient vanishing. In view of this, Hochreiter et al. created the Long Short-Term Memory (LSTM) network, which retains the advantages of the traditional RNN model and controls the memory block state through an input gate, a forget gate, and an output gate. The frame structure of the LSTM is shown in Figure 1.

It can be seen from Figure 1 that when the input gate is open, new input can change the existing information of the network; when the output gate is open, access to historical information is allowed and the subsequent output value can be changed. The main function of the forget gate is to clear historical information.

To explain the update mechanism of the LSTM, let $x_t$ be the input data at time $t$, $h_t$ the unit output of the LSTM at time $t$, and $C_t$ the value of the LSTM memory unit.

(1) Compute the value of the candidate memory unit at the current moment:
$$\tilde{C}_t = \tanh\big(W_c x_t + U_c h_{t-1} + b_c\big),$$
where $W_c$ weights the current input data and $U_c$ weights the LSTM output of the previous moment.

(2) Compute the value of the input gate, which is mainly affected by the current input $x_t$ and the previous output $h_{t-1}$:
$$i_t = \sigma\big(W_i x_t + U_i h_{t-1} + b_i\big).$$

(3) Compute the value of the forget gate, which represents the effect of historical information on the state value of the current memory unit:
$$f_t = \sigma\big(W_f x_t + U_f h_{t-1} + b_f\big).$$

(4) Compute the state value of the memory unit at the current moment, which is affected by its previous state and the value of the candidate memory unit:
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t.$$

(5) Compute the output gate:
$$o_t = \sigma\big(W_o x_t + U_o h_{t-1} + b_o\big).$$

(6) The output of the LSTM is
$$h_t = o_t \odot \tanh(C_t).$$
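To make equations (1)–(6) concrete, the following is a minimal NumPy sketch of one LSTM update step. All weights are randomly initialized purely for illustration; the dimensions and variable names are assumptions.

```python
# One LSTM update step, following equations (1)-(6) above.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """W, U, b hold the candidate (c), input (i), forget (f), output (o) parameters."""
    c_tilde = np.tanh(W["c"] @ x_t + U["c"] @ h_prev + b["c"])   # (1) candidate memory
    i_t = sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])       # (2) input gate
    f_t = sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])       # (3) forget gate
    c_t = f_t * c_prev + i_t * c_tilde                           # (4) memory state
    o_t = sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])       # (5) output gate
    h_t = o_t * np.tanh(c_t)                                     # (6) LSTM output
    return h_t, c_t

d_in, d_hid = 4, 8
rng = np.random.default_rng(0)
W = {k: rng.normal(size=(d_hid, d_in)) for k in "cifo"}
U = {k: rng.normal(size=(d_hid, d_hid)) for k in "cifo"}
b = {k: np.zeros(d_hid) for k in "cifo"}
h, c = lstm_step(rng.normal(size=d_in), np.zeros(d_hid), np.zeros(d_hid), W, U, b)
print(h.shape, c.shape)  # (8,) (8,)
```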

In terms of application effect, LSTM has obvious advantages over RNN in information storage, reading, and long-term information update. Bi-LSTM is based on LSTM and adopts bidirectional feature extraction. The specific structure is shown in Figure 2.

The principle of Bi-LSTM is to feed the output of the CNN pooling layer into two LSTMs running in opposite directions: the forward LSTM mainly captures the preceding context of the sequence, and the backward LSTM mainly captures the following context. The final context representation is obtained by splicing the two outputs.
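The following minimal TensorFlow/Keras sketch shows this splicing (TensorFlow is the platform named later in the experiments); the layer sizes are illustrative assumptions, and random vectors stand in for the pooling output described above.

```python
# Bidirectional LSTM: the forward and backward hidden states are
# concatenated per position, doubling the feature dimension.
import numpy as np
import tensorflow as tf

bilstm = tf.keras.layers.Bidirectional(
    tf.keras.layers.LSTM(128, return_sequences=True))

x = np.random.rand(2, 7, 100).astype("float32")  # batch of 2 sequences, 7 steps
h = bilstm(x)
print(h.shape)  # (2, 7, 256): forward and backward contexts spliced
```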

1.1. Conditional Random Field (CRF)

The CRF model analyzes the conditional probability distribution of one group of random variables given another group of random variables. Its characteristic is the assumption that the output random variables constitute a Markov random field (MRF); semantics are annotated through the special linear-chain conditional random field. The specific labeling process is shown in Figure 3 [10–12].

In Figure 3, Y represents the output variable, which marks the sequence, and X represents the input variable, the observation sequence to be labeled. During model training, the model probability is estimated mainly by maximum likelihood; given an input sequence X, the output sequence Y with maximum probability is solved.

2. Domain Entity Recognition Algorithm Based on Bi-LSTM + CRF Fusion

To better realize entity recognition of financial domain knowledge, the basic principles of the Bi-LSTM model are combined with the CRF model and domain knowledge to establish distributed semantic vector representations of words. The Bi-LSTM model performs feature extraction and encoding of the input sequence, and the CRF model carries out entity annotation. The overall recognition model includes three parts: the input layer, the Bi-LSTM layer, and the CRF entity annotation layer. The model framework is shown in Figure 4.

2.1. Input Layer

Different from ordinary deep learning models, the input layer of this model exploits the strong domain relevance of lexical information. The expression is as follows [13]:

$$x_t = \big[\,x_t^{w};\; x_t^{ind};\; x_t^{dict}\,\big],$$

where $x_t^{w}$, $x_t^{ind}$, and $x_t^{dict}$ are the word vector input, the entity indicator-word features, and the features of candidate words in the domain entity dictionary, respectively.
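As a brief sketch of this feature splicing (the feature dimensions are illustrative assumptions, not the paper's settings):

```python
# Input-layer feature concatenation: word vector + indicator-word feature
# + domain-dictionary feature, spliced into one input vector per token.
import numpy as np

def build_input(word_vec, indicator_feat, dict_feat):
    """x_t = [word vector ; indicator feature ; dictionary feature]"""
    return np.concatenate([word_vec, indicator_feat, dict_feat])

x_t = build_input(np.random.rand(100),   # pretrained word vector
                  np.array([1.0]),       # 1 if the token is an indicator word
                  np.array([0.0, 1.0]))  # candidate-dictionary match flags
print(x_t.shape)  # (103,)
```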

2.2. Bi-LSTM Encoding Layer

The LSTM runs in left-to-right order, and under this mechanism a bias in the captured information may occur. For example, once the word "audit" in "financial audit" is captured without the context information of "financial," it is likely to be labeled independently. For domain entity recognition, all words in a sentence matter equally, and the following context is as important as the preceding context, so using the LSTM model alone is not reliable. The Bi-LSTM model is created to capture information from both directions, taking into account and utilizing the full context. It has built-in forward and backward layers, the former advancing from the beginning of the sentence and the latter from its end. By storing and utilizing the context information of the sentence in this way, it better performs entity recognition. The frame structure of the Bi-LSTM model is shown in Figure 5.

The Bi-LSTM encoding layer learns the context information of each word through forward and backward operations, so each word can be represented in a way that helps determine whether it is an entity. The forward and backward calculations of the Bi-LSTM encoding layer are as follows:

$$\overrightarrow{h_t} = \mathrm{LSTM}\big(x_t, \overrightarrow{h_{t-1}}\big), \qquad \overleftarrow{h_t} = \mathrm{LSTM}\big(x_t, \overleftarrow{h_{t+1}}\big), \qquad h_t = \big[\overrightarrow{h_t};\, \overleftarrow{h_t}\big].$$

In the formula, $x_t$ refers to the semantic vector representation of the input word, and $h_t$ refers to the hidden-layer output of each word in the Bi-LSTM.

2.3. Entity Labeling Layer of CRF

The CRF model is responsible for entity labeling, which involves corpus labeling, feature computation, model training, entity recognition, and so on. The first step is to label the corpus with preset labels. For example, the corpus "What are the steps of a duplicate certificate audit?" is labeled, with the results listed in Table 1 [14].

The CRF model can compute the probability distribution of the global sequence and is used to estimate the entity-label probability distribution for the entire sequence. During model training, the entity transition probability matrix regulates the transition probabilities between entity labels, and the sequence score is as follows [15, 16]:

$$s(X, y) = \sum_{i=1}^{n} \big(A_{y_{i-1}, y_i} + P_{i, y_i}\big),$$

where $A$ is the entity transition probability matrix and $P_{i, y_i}$ is the score the encoding layer assigns to label $y_i$ at position $i$.
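The following sketch computes this sequence score for one labeled sequence; the label set and scores are illustrative assumptions (variable names are not the paper's notation).

```python
# Linear-chain CRF sequence scoring: per-token emission scores from the
# encoder plus the transition matrix A regulating label-to-label moves.
import numpy as np

def sequence_score(emissions, tags, A):
    """score(X, y) = sum_i A[y_{i-1}, y_i] + sum_i emissions[i, y_i]"""
    score = emissions[0, tags[0]]
    for i in range(1, len(tags)):
        score += A[tags[i - 1], tags[i]] + emissions[i, tags[i]]
    return score

n_labels, seq_len = 5, 4                       # e.g. a B/I/E/S/O label set
emissions = np.random.rand(seq_len, n_labels)  # per-token label scores
A = np.random.rand(n_labels, n_labels)         # transition scores
print(sequence_score(emissions, [0, 1, 2, 4], A))
```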

3. Constructing the Multigranularity Question-Answering Matching Model

On the basis of identifying knowledge entities in the financial field, implementing financial Q&A also requires question matching. Both question-question matching and question-answer matching belong to the text matching task, and their model structures are consistent; the only difference lies in the choice of matching function. The key to text matching is capturing semantic information at different levels of the text. Traditional text matching models such as LSA, statistical learning methods, and LDA match on single-granularity semantic information, which differs from human text-matching habits: people judge text matching by comprehensively analyzing global semantic information, contextual semantic information, word similarity, and other external knowledge. Integrating information at different granularity levels, the text matching model is built from an input layer, a presentation layer, and a matching layer; its framework is shown in Figure 6 [17].

3.1. Input Layer

The input layer contains the input of the original word sequence and the input of the entity recognition result. Given two input sentences $S_1$ and $S_2$, the entity recognition results can be computed; an entity $E$ may consist of multiple words. The word-sequence input is represented by mapping one-hot vectors through the word vector matrix:

$$x_i = W^{w} v_i,$$

where $x_i$ and $v_i$ are the word vector and the one-hot vector, respectively, and $W^{w}$ is the word vector matrix.

Accordingly, the entity input is represented by mapping one-hot vectors through the entity vector matrix:

$$e_j = W^{e} u_j,$$

where $e_j$ and $u_j$ are the entity vector and the one-hot vector, respectively, and $W^{e}$ is the entity vector matrix.
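Both mappings are table lookups, as the following sketch shows (the vocabulary size and dimension are illustrative assumptions; the same pattern applies to the entity matrix):

```python
# One-hot mapping through an embedding matrix: multiplying by a one-hot
# vector simply selects one column of the matrix.
import numpy as np

vocab_size, dim = 10, 6
W_word = np.random.rand(dim, vocab_size)   # word vector matrix

one_hot = np.zeros(vocab_size)
one_hot[3] = 1.0                           # token with vocabulary index 3
word_vec = W_word @ one_hot                # equivalent to W_word[:, 3]
assert np.allclose(word_vec, W_word[:, 3])
```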

3.2. The Presentation Layer

The presentation layer encodes the input and maps it into a common semantic space to serve as the input features of the matching layer. It is organized as follows.

3.2.1. The Two-Channel Encoded Text Relationship Modeling Based on Attention Mechanism

At present, there are two commonly used text semantic encoding models. One is based on the RNN, which captures global sequence information and models by analyzing context. The other is based on the CNN, which effectively captures phrase-level local information. The two kinds of semantic encoding models are thus complementary. An RNN can store the captured context of a word, but the traditional RNN cannot store long-distance information, so the improved Bi-LSTM model is selected to handle the long-distance dependence problem. It models the preceding information of a word while also storing the following information, enriching the word-level representation. The Bi-LSTM model maps the two word sequences into word vectors in the same semantic space as follows [18–20]:

$$h_t = \big[\overrightarrow{\mathrm{LSTM}}(x_t);\; \overleftarrow{\mathrm{LSTM}}(x_t)\big],$$

where $x_t$ refers to the input word vector, $h_t$ refers to the semantic representation of the word, and the forward and backward arrows refer to the forward and backward operations.

People usually read texts with questions in mind and focus on certain words or fragments, whereas plain encoding does not account for the different weights of words. Therefore, an attention mechanism is introduced to highlight the importance of certain words in expressing semantic information. Based on the characteristics of the domain and the model, the common attention mechanisms here are answer-based attention and entity-based attention, which share the same principle but differ in representation.

The attention mechanism assigns weights according to the importance of different words for the sentence expression, as follows [21]:

$$\alpha_t = \frac{\exp\big(\tanh(M h_t)\big)}{\sum_{k}\exp\big(\tanh(M h_k)\big)}, \qquad v = \sum_t \alpha_t h_t.$$

In the formula, $\alpha_t$ represents the word weight solved by the attention mechanism, $v$ represents the context vector computed with these weights, $M$ represents the parameter matrix of the attention mechanism, $h_t$ represents the hidden-layer output of the Bi-LSTM model, and $\tanh$ represents the nonlinear transformation function.
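A minimal sketch of this step, assuming the scoring parameterization above (dimensions are illustrative):

```python
# Attention over Bi-LSTM hidden states: score each state, softmax-normalize
# the scores into word weights, and form the weighted context vector.
import numpy as np

def attention(H, m):
    """H: (T, d) hidden states; m: (d,) attention parameter vector."""
    scores = np.tanh(H @ m)                        # relevance score per word
    alpha = np.exp(scores) / np.exp(scores).sum()  # softmax word weights
    v = alpha @ H                                  # weighted context vector
    return alpha, v

T, d = 5, 8
H = np.random.rand(T, d)
alpha, v = attention(H, np.random.rand(d))
print(alpha.round(2), v.shape)  # weights sum to 1; v is (8,)
```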

Local features of the original word vectors can be extracted by the convolution and pooling operations of the CNN model, which obtains phrase-level local information. The output of each hidden layer of the Bi-LSTM model is taken as the input of the CNN model, further enriching the word vector information and integrating global semantic information. The core formula of the CNN model is as follows [22]:

$$c = f\big(W \cdot h_{t:t+k-1} + b\big),$$

where $c$ is the text semantic vector represented by the two-channel encoded text relation model, $f$ stands for the nonlinear function, $h_{t:t+k-1}$ refers to a window of Bi-LSTM outputs, $W$ is the convolution filter, and $b$ represents the bias.
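The following sketch of this branch applies a 1-D convolution over the Bi-LSTM outputs and max-pools over positions; the filter count and width are illustrative assumptions.

```python
# Convolution + max-pooling over Bi-LSTM hidden states, yielding a
# fixed-size phrase-level feature vector regardless of sentence length.
import numpy as np

def conv_maxpool(H, filters, bias):
    """H: (T, d); filters: (n_f, k, d) of width k; returns an (n_f,) vector."""
    T, d = H.shape
    n_f, k, _ = filters.shape
    feats = np.empty((n_f, T - k + 1))
    for i in range(T - k + 1):
        window = H[i:i + k].ravel()                    # local k-gram window
        feats[:, i] = np.tanh(filters.reshape(n_f, -1) @ window + bias)
    return feats.max(axis=1)                           # max-pool over positions

H = np.random.rand(7, 8)                               # Bi-LSTM outputs
c = conv_maxpool(H, np.random.rand(16, 3, 8), np.zeros(16))
print(c.shape)  # (16,)
```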

The two-channel encoded model based on the attention mechanism is shown in Figure 7 [23].

3.2.2. Interactive Text Relationship Modeling

Two-channel encoded text relationship modeling represents the semantic vectors of the two texts separately; the representations are independent of each other and may lose subtle semantic information. In interactive text relationship modeling, the interaction between the two texts begins at the input layer: the texts are mapped to a matrix by dot product, 0–1 match, similarity operations, etc., and feature extraction is then carried out to capture more subtle relationships.

In interactive text relationship modeling, the first step is to establish an interaction matrix, controlling the maximum length of the two texts (padding a text if it is too short). The second step is word-by-word matching:

$$M_{ij} = F(w_i, w_j),$$

where $F$ represents one of various calculation methods. With the 0–1 method, if the two words $w_i$ and $w_j$ are identical, 1 is output; otherwise, 0 is output. With the cosine-similarity method, the text interaction matrix is first established and then input into a multilayer CNN model, where more subtle semantic information is obtained through repeated convolution-pooling operations. The cosine similarity is calculated as

$$M_{ij} = \frac{w_i^{\top} w_j}{\lVert w_i\rVert\,\lVert w_j\rVert},$$

where $w_i$ and $w_j$ refer to the semantic vector representations of the words and $M$ refers to the text interaction matrix.
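A sketch of the interaction matrix for both choices of $F$ (word vectors are random stand-ins):

```python
# Word-by-word interaction matrix M_ij = F(w_i, w_j) for the 0-1 match
# and cosine-similarity variants described above.
import numpy as np

def interaction_matrix(S1, S2, mode="cosine"):
    """S1: (n, d) and S2: (m, d) word vectors; returns an (n, m) matrix."""
    if mode == "zero_one":                 # 1 iff the two word vectors match
        return (S1[:, None] == S2[None, :]).all(-1).astype(float)
    n1 = np.linalg.norm(S1, axis=1, keepdims=True)
    n2 = np.linalg.norm(S2, axis=1, keepdims=True)
    return (S1 / n1) @ (S2 / n2).T         # cosine similarity per word pair

M = interaction_matrix(np.random.rand(4, 8), np.random.rand(6, 8))
print(M.shape)  # (4, 6), ready for the multilayer CNN
```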

The framework of the interactive text relation model is shown in Figure 8 [24].

3.2.3. The Entity Information Integration

The matching between questions can be judged from entity information, and inputting entity information into the model can improve its effect. The schematic diagram of entity representation is shown in Figure 9 [25].

The entity recognition results of the two texts are represented by the entity vector matrix and mapped to entity vectors. Since entity information accounts for a small proportion of all word information, the word-average method is used to encode the entity information, which then serves as an input feature of the matching layer:

$$e_s = \frac{1}{n}\sum_{j=1}^{n} e_j,$$

where $e_s$ refers to the entity-related sentence semantic representation and $e_j$ is the $j$-th entity vector.
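The word-average encoding amounts to a simple mean, as in this tiny sketch (entity count and dimension are illustrative):

```python
# Word-average encoding of entity information: the recognized entity
# vectors are averaged into one fixed-size sentence-level feature.
import numpy as np

entity_vectors = np.random.rand(3, 8)  # three recognized entities, dim 8
e_s = entity_vectors.mean(axis=0)      # entity-related sentence representation
print(e_s.shape)  # (8,)
```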

3.3. The Matching Layer

The matching layer calculates the matching score based on the semantic representations from the presentation layer. A knowledge point consists of a question and its answer. Matching on the question or the answer alone may cause serious deviation; for example, two questions may be very similar while the answer does not match them. Therefore, Paraphrase Identification and Answer Selection must be fused to train the final model. In addition, although the model structures of the two matching modes are consistent, there are still differences, and appropriate matching methods need to be selected.

3.3.1. Question-Question Matching

In this matching mode, the user question and the knowledge base question are in the same semantic space, and the matching of the two questions can be determined by their semantic similarity. Through the question-question matching pattern, the semantic vectors of the two texts and their similarity feature are obtained:

$$x_{qq} = \big[v_1;\; v_2;\; \cos(v_1, v_2)\big],$$

where $\cos(v_1, v_2)$ refers to the cosine similarity between the semantic vectors $v_1$ and $v_2$.
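A sketch of this feature construction (vector dimension is an illustrative assumption):

```python
# Question-question matching feature: the two semantic vectors plus their
# cosine similarity, concatenated into one input for the matching layer.
import numpy as np

def qq_feature(v1, v2):
    cos = v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return np.concatenate([v1, v2, [cos]])

x_qq = qq_feature(np.random.rand(8), np.random.rand(8))
print(x_qq.shape)  # (17,)
```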

3.3.2. Question-Answer Matching

In this matching mode, user questions and answers are in different semantic spaces, and their correlation or subordination is determined through semantic matching. Since question-answer matching is not a similarity relation between two semantics of the same kind, the similarity feature cannot represent their relationship; a more abstract relation must be established. A nonlinear function ensures the semantic correlation of the two vectors:

$$x_{qa} = \big[v_q;\; \tanh\big(v_q^{\top} M v_a\big);\; v_a\big],$$

where $M$ represents the implicit abstract relation, which maps the two vectors into the same semantic space.

The multigranularity semantic representation features are integrated, and the two-channel encoded output vectors are concatenated, so that subtle differences between the semantic representations of questions can be more fully exploited. The result is input into a fully connected neural network for learning to solve the final matching score:

$$p_{qq} = \sigma\big(W_{qq}\, x_{qq} + b_{qq}\big), \qquad p_{qa} = \sigma\big(W_{qa}\, x_{qa} + b_{qa}\big).$$

In the formula, $x_{qq}$ and $x_{qa}$ are the input features of question-question matching and question-answer matching, respectively, and $p_{qq}$ and $p_{qa}$ are the matching probabilities between question-question and question-answer, respectively.
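The following hedged sketch ties the two matching modes together: the implicit relation matrix $M$ scores the question-answer pair, and small fully connected layers turn each feature vector into a matching probability. All shapes and the bilinear form are illustrative assumptions.

```python
# Question-answer feature via an implicit relation matrix, then fully
# connected scoring of both the q-q and q-a feature vectors.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def qa_feature(v_q, v_a, M):
    relation = np.tanh(v_q @ M @ v_a)          # implicit abstract relation
    return np.concatenate([v_q, [relation], v_a])

d = 8
rng = np.random.default_rng(1)
v_q, v_a = rng.random(d), rng.random(d)
x_qa = qa_feature(v_q, v_a, rng.random((d, d)))  # question-answer feature
x_qq = rng.random(2 * d + 1)                     # from the q-q sketch above
W_qq, W_qa = rng.random(x_qq.size), rng.random(x_qa.size)
p_qq = sigmoid(W_qq @ x_qq)                      # question-question probability
p_qa = sigmoid(W_qa @ x_qa)                      # question-answer probability
print(round(float(p_qq), 3), round(float(p_qa), 3))
```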

4. Experimental Verification

4.1. Entity Identification Experiment in the Field of Enterprise Financial Audit
4.1.1. Model Training and Parameter Tuning

The TensorFlow platform is used for model development. The model parameters include the word vectors, the implicit relation parameter M, the entity-type vectors, the fully connected layer parameters, the CNN and LSTM parameters, etc. The implicit relation parameter is determined from the implicit relation matrix M in question-answer matching, and the other hyperparameters are determined by grid search, yielding the optimal parameter combination. The specific optimal parameters are shown in Table 2.
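A minimal sketch of such a hyperparameter grid search, assuming a hypothetical `train_and_evaluate` stand-in for the actual TensorFlow training and validation routine (the candidate values are illustrative, not the paper's):

```python
# Exhaustive grid search: try every combination of candidate values and
# keep the one with the best validation score.
import itertools

grid = {
    "learning_rate": [1e-3, 1e-4],
    "hidden_size": [100, 200],
    "dropout": [0.3, 0.5],
}

def train_and_evaluate(**params):
    return 0.0  # placeholder: would train the model and return validation F1

best_score, best_params = -1.0, None
for values in itertools.product(*grid.values()):
    params = dict(zip(grid.keys(), values))
    score = train_and_evaluate(**params)
    if score > best_score:
        best_score, best_params = score, params
print(best_params)
```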

4.1.2. Evaluation Indicators

The entity recognition method proposed in this paper, based on deep learning and integrating domain knowledge, is tested experimentally. The training data used for evaluation are obtained through manual annotation, and appropriate evaluation standards are set. In this experiment, precision, recall, and F1 are used to evaluate the model [26]:

$$P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}, \qquad F1 = \frac{2PR}{P + R}.$$
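These are the standard classification metrics; the following sketch computes them from illustrative counts:

```python
# Precision, recall, and F1 from true-positive, false-positive,
# and false-negative counts.
def prf1(tp, fp, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

print(prf1(tp=85, fp=10, fn=15))  # illustrative counts
```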

4.1.3. Experimental Results and Analysis

By comparing the effects of introducing different features, the validity of the domain knowledge features used in this paper is verified.

Based on the Bi-LSTM + CRF model, different features such as indicator words, the entity dictionary, and part of speech are introduced for comparative experiments against the proposed model. The results are listed in Figure 10, where M_1 to M_5 represent the baseline Bi-LSTM + CRF model and the models with part-of-speech features, entity-dictionary features, indicator features, and all features combined (the proposed model), respectively.

It can be seen from Figure 10 that, based on the Bi-LSTM + CRF model, introducing different features benefits entity recognition. Introducing part-of-speech features increases the F1 value by 0.35%: the part-of-speech features complement the word vector information, but the word vectors already contain rich semantics, so the benefit is small. Introducing entity-dictionary features increases the F1 value and recall by 1.24% and 1.47%, respectively, because an entity dictionary for the professional field of corporate financial audit helps identify relatively sparse entities. Introducing indicator features increases the F1 value and precision by 0.87% and 0.92%, respectively, because indicator features correct error-prone entities. By comparison, the entity recognition effect of the proposed model is superior to both the plain Bi-LSTM + CRF model and the Bi-LSTM + CRF model with any single feature, indicating complementarity among the features: integrating multiple features into the Bi-LSTM + CRF model significantly improves the final entity recognition effect.

4.2. Question and Answer Retrieval Experiment
4.2.1. Experimental Settings

The setup of question and answer retrieval experiment mainly involves two aspects: the training data construction and the model evaluation standard.

(1) Training Data Construction. The collected basic financial data are preprocessed to obtain 10,000 training examples in the format "user question-knowledge point title-answer-label." The training data are manually annotated and then randomly assigned to the training, validation, and test sets at a ratio of 8 : 1 : 1, used respectively for model training, model parameter tuning, and model effect verification.
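A sketch of this random 8 : 1 : 1 split (integers stand in for the annotated records):

```python
# Shuffle the 10,000 annotated examples and split them 8:1:1 into
# training, validation, and test sets.
import random

data = list(range(10_000))  # stand-in for the annotated records
random.seed(42)
random.shuffle(data)
n_train, n_val = 8_000, 1_000
train = data[:n_train]
val = data[n_train:n_train + n_val]
test = data[n_train + n_val:]
print(len(train), len(val), len(test))  # 8000 1000 1000
```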

(2) Model Evaluation Criteria. The key to the question-answer retrieval task is solving sentence similarity, and the setting of the similarity threshold directly affects each model's evaluation indexes. Two sets of evaluation criteria are therefore used. One is AUC, which is independent of the classification threshold and is often used to evaluate binary classification models. The other is mean average precision (MAP) and mean reciprocal rank (MRR), which are associated with information-retrieval ranking.
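For reference, a hedged sketch of the two ranking metrics under their standard definitions: MRR averages the reciprocal rank of the first correct answer per query, and MAP averages the precision at each correct hit.

```python
# MRR and MAP over per-query lists of 0/1 relevance flags in ranked order.
def mrr(ranked_relevance):
    total = 0.0
    for flags in ranked_relevance:
        for rank, rel in enumerate(flags, start=1):
            if rel:
                total += 1.0 / rank  # reciprocal rank of first correct answer
                break
    return total / len(ranked_relevance)

def mean_average_precision(ranked_relevance):
    total = 0.0
    for flags in ranked_relevance:
        hits, ap = 0, 0.0
        for rank, rel in enumerate(flags, start=1):
            if rel:
                hits += 1
                ap += hits / rank    # precision at each correct hit
        total += ap / max(hits, 1)
    return total / len(ranked_relevance)

queries = [[0, 1, 0], [1, 0, 1]]
print(mrr(queries), mean_average_precision(queries))  # 0.75 0.666...
```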

4.2.2. Experimental Results and Analysis

This section tests the validity of the Q&A model proposed in this paper by comparing the recognition effects of different attention mechanisms and entity features.

(1) Comparative Experiments of Different Types of Attention Mechanisms. The model in this paper introduces the entity attention mechanism and the answer attention mechanism. The former is based on the entity encoding vector and solves the weight of each word in the question; the latter is based on the answer encoding vector and likewise solves the weight of each word in the question. The comparative experimental results are listed in Figure 11, where M_1 to M_4 represent the model without an attention mechanism, the model with the entity attention mechanism, the model with the answer attention mechanism, and the proposed model, respectively.

As can be seen from Figure 11, the models that introduce an attention mechanism outperform the model without one. Comparing the entity attention mechanism with the answer attention mechanism, the improvement from entity attention is more significant, because during question representation the entity attention mechanism constrains the model to pay more attention to entity information, identifying the core semantics more accurately.

(2) Comparative Experiment of Entity Features. By comparing the model without entity feature information against the model with it, the effectiveness of entity feature information in improving the Q&A matching model is verified. The experimental results are listed in Figure 12.

According to Figure 12, compared with the Q&A model without entity feature information, after entity feature information was introduced, the AUC, MAP, and MRR of the Q&A model increased by 0.74%, 0.85%, and 0.81%, respectively. This indicates that introducing entity feature information is feasible and effective, because it carries a large amount of prior knowledge about entities. The question-answer matching model learns automatically from the training data, so the influence of entity feature information on the final matching relationship can be clarified, which helps improve the matching effect.

5. Conclusion

As can be seen from the above studies, the indicators obtained by the improved entity recognition method have obvious advantages over traditional methods, and the AUC, MAP, and MRR obtained by the improved method are higher than those of other methods. The improved method therefore has clear advantages. Moreover, the accuracy of semantic recognition is greatly improved; the fundamental reason is that, after combining Bi-LSTM with CRF to process and extract entity features in the financial knowledge field, the features carry a large amount of prior knowledge, which greatly improves recognition accuracy.

Data Availability

The experimental data are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest regarding this work.

Acknowledgments

This work was sponsored in part by Hebei Intelligent Financial Technology Innovation Center.