Abstract
Using intelligent judgment technology to assist human judgment is an inevitable trend in the handling of contemporary legal cases. Using big data and artificial intelligence technology to accurately determine the multiple accusations involved in a legal case is an urgent problem in legal judgment. Solving it hinges on two points: (1) the characterization of legal cases and (2) the classification and prediction of legal case data. Traditional methods of entity characterization rely on feature extraction, which is often based on vocabulary and syntax information. Such characterization therefore requires extensive manual effort and generalizes poorly, which adds a large amount of computation and limits subsequent classification algorithms. This study proposes an intelligent judgment approach called RnRTD, which is based on the relationship-driven recurrent neural network (rdRNN) and restricted tensor decomposition (RTD). We represent legal cases as tensors and propose an innovative RTD method. RTD has low dependence on vocabulary and syntax and extracts the feature structure that is most favorable for improving the accuracy of the subsequent classification algorithm. RTD maps the tensors that represent legal cases into a specific feature space and transforms each original tensor into a core tensor and its corresponding factor matrices. This study uses rdRNN to continuously update and optimize the constraints in RTD so that rdRNN achieves its best legal case classification performance in the target feature space generated by RTD. In addition, rdRNN sets up a new gate and a similar case list to represent the interaction between legal cases. In comparison with traditional feature extraction methods, our proposed RTD method is less expensive and more universal in the characterization of legal cases. Moreover, rdRNN with an RTD layer performs better than a recurrent neural network (RNN) alone on the classification and prediction of multiple accusations in legal cases. Experiments show that, compared with previous approaches, our method achieves higher accuracy in the classification and prediction of multiple accusations in legal cases, and our algorithm is more interpretable.
1. Introduction
In contemporary society, the demand for big data assistance in the judgment of legal cases, such as case intelligence research [1] and judgment [2], big data comprehensive supervision, and assistance in handling legal cases, is increasing with the development of big data and artificial intelligence technology. Researchers are committed to creating an “intelligent legal case judgment” project that combines big data and artificial intelligence. Multi-accusation judgment of legal cases is an important part of realizing such a project. Multi-accusation judgment technology fully applies big data and artificial intelligence technology to support judgment making, legal case handling [3], and services for the public. Big data provides judges with recognized standards for judging legal cases and helps avoid different judgment results in similar legal cases. Artificial intelligence technology reduces human subjectivity, performs scientific and accurate analyses from the perspective of cases and laws, and helps judges make objective judgments in legal cases.
Using big data and artificial intelligence technology to accurately judge multiple accusations in different legal cases involves two main points, namely, (1) construction of a comprehensive and accurate characterization method of legal cases and (2) realization of a classification and prediction algorithm for the multiple accusations involved in a large number of legal cases. Figure 1 shows the process of multi-accusation classification for legal cases. Traditional methods of entity characterization often model an entity by tagging it. However, these feature extraction methods are highly dependent on the vocabulary and syntax in the entity data set and require heavy manpower and material resources. The generality of the tagged model is poor. In addition, feature extraction methods based on vocabulary and syntax require strong expert knowledge as support. The resulting entity characterization considerably limits the subsequent classification algorithm, and the algorithm’s accuracy becomes highly volatile.
This study proposes an intelligent legal case judgment technique called RnRTD, which is based on the relationship-driven recurrent neural network (rdRNN) and restricted tensor decomposition (RTD). Figure 2 shows the framework of our approach. We represent legal case data as a tensor χ and propose an RTD technique. RTD is less dependent on vocabulary and syntax than traditional feature extraction methods, and it focuses more on extracting the information of potential structures in legal case tensors. RTD maximizes the accuracy of rdRNN by combining text and structural information. RTD maps the legal case tensor χ into a specified feature space: under the restricted condition η, it decomposes the original tensor χ into a core tensor and its corresponding factor matrix set. The obtained core tensor represents the tensor structure information that is most helpful in improving the accuracy of the rdRNN classification algorithm; it can be interpreted as the most advantageous feature structure in χ for rdRNN. RTD is thus an important feature extraction and dimensionality reduction operation. This study uses rdRNN to update and optimize the restricted condition η in RTD iteratively so that its feature space continually approaches an ideal region, thus enabling rdRNN to achieve an optimal effect in the classification of multiple accusations in legal cases.
Compared with traditional feature extraction methods, RTD obtains legal case characterization containing tensor element and structural information that is more conducive for improving the accuracy of the rdRNN classification algorithm, and it has lower dependence on vocabulary, syntax, and expert knowledge. That is, the RTD legal case characterization model has better universality and fewer requirements on the dataset format in comparison with traditional feature extraction methods. Compared with the direct use of the original legal case tensor χ as the input of the RNN classification algorithm, rdRNN with an RTD layer has a better effect on the classification of multiple accusations in legal cases. The main reason is that rdRNN constantly updates and optimizes RTD restricted condition η, thereby enabling RTD to point to feature space where rdRNN has the best effect in legal case classification.
The main contributions of this study are summarized in the following points:
(i) This study uses a new method of characterizing legal cases. This study expresses a legal case as a tensor and proposes an RTD method that maps the original legal case tensor into a new feature space. RTD extracts the favorable tensor structure and text information for the subsequent classification algorithm from the original legal case tensor. RTD also extracts valuable tensor features and reduces tensor dimensions. The core tensor obtained by RTD is interpreted as the most valuable tensor structure and textual feature information extracted from the original legal case tensor for the rdRNN classification algorithm.
(ii) This study proposes rdRNN, which is a new approach for intelligent judgment of multiple accusations in legal cases. We add a new gate and a similar case list to control the interaction between tensors of legal cases on the basis of the original neural networks. rdRNN is particularly used for the intelligent judgment of multiple accusations in legal cases. It fully considers the impact of the relationship between legal cases on the judgment results of such cases. For example, highly similar legal cases are likely to have similar judgment results and vice versa.
(iii) This study proposes a neural network-based method for the optimization of the restricted tensor. The restricted tensor is a bridge between the RTD algorithm and rdRNN. rdRNN controls the tensor decomposition process by optimizing the restricted tensor, which guides the core tensor along the direction that is most conducive for improving the accuracy of the classification model. We derive the partial derivative of the loss function in rdRNN for the restricted tensor and realize the optimization operation of the neural network for the restricted tensor.
Section 2 gives the recent research progress on the classification of multiple crimes in legal cases. Section 3 introduces related definitions and the concepts involved in this study. Section 4 introduces the proposed approach for the judgment of multiple accusations in legal cases. Section 5 provides the experimental results and analysis of this study, and Section 6 presents a detailed discussion of the proposed method.
2. Related Work
With the advent of the era of big data and the development of artificial intelligence technology [4], the emergence of deep neural networks provides great prospects for accurate classification and prediction [5]. Neural network-based knowledge representation and reasoning methods enable deep learning approaches to be applied to many scenarios [6]. For the legal field, the combination of artificial intelligence and law has become an inevitable trend [7]. However, current research in this area mainly focuses on legal case modeling [8], legal case document retrieval [9], legal consultation question-and-answer systems [10], and legal case similarity reasoning work [2]. Little research has been conducted on the multi-accusation determination of cases in the legal field.
Bartolini et al. proposed a semantic annotation method for indexing and retrieving legal texts [11]. The method uses a specific segment extraction and text classification algorithm to automatically semantically mark legal documents. Aleven developed a computational model based on artificial intelligence algorithms and professional legal knowledge [2]. The model determines the correlation between cases based on the context and problem scenarios of the case. Joshi et al. proposed a text mining method for electronic evidence review of legal cases [12]. The method uses semantic topic and text classification technology to repeatedly detect the feature vocabulary in legal documents and then automatically segments and screens the documents, avoiding the manual work of legal analysts.
Sulea et al. proposed a legal case judgment system based on an SVM classifier [13]. The method uses machine learning techniques to predict the legal field to which the legal case belongs and the outcome of its judgment. By accurately extracting the features of legal cases, the method can roughly predict the specific date of the case. Brüninghaus and Ashley proposed a text classification method based on facts of legal cases [14]. The method uses artificial intelligence algorithms and legal background knowledge to predict the outcome of legal cases. The method extracts facts of legal cases, indexes and models them according to the features, and finally completes the classification of legal cases.
The critical part for the prediction of legal case judgments is case modeling and case classification. Traditional text modeling methods are based on feature tags, which rely heavily on the syntax and semantic information of the source data. Labeling features requires a lot of manual work and expert knowledge. Therefore, the text classification algorithm formed on this basis is not scalable, and the accuracy is highly volatile.
3. Preliminaries
This section introduces the related methods, definitions, and background knowledge involved in this study. Section 3.1 presents the basic notations and definitions. Section 3.2 provides a formal representation of the tensor decomposition problem. Section 3.3 introduces the calculation process of forward propagation in bidirectional long short-term memory (BiLSTM). Section 3.4 presents a formal description of the intelligent legal judgment problem to be solved in this study.
3.1. Definitions and Notations
This section describes the notations and definitions required in this work. Tensors are multidimensional arrays [15], which we denote with Euler script letters, such as χ and ν. We refer to the dimensions of a tensor as its modes and to the number of a tensor's modes as its order. We denote scalars by lowercase letters and vectors by boldface lowercase letters (such as c and d). We denote matrices by capital letters, such as A and B, and use $A^{\mathsf{T}}$ to represent the transpose of matrix A. We express the identity matrix as I, the identity tensor as τ, and the matrix with all elements equal to 1 as 1. Table 1 shows all the required notations and definitions.
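Definition 1 (outer product). The outer product of vectors $\mathbf{a} \in \mathbb{R}^{I}$ and $\mathbf{b} \in \mathbb{R}^{J}$ is denoted as $\mathbf{a} \circ \mathbf{b}$, where $\mathbf{a} \circ \mathbf{b} \in \mathbb{R}^{I \times J}$ and $(\mathbf{a} \circ \mathbf{b})_{ij} = a_i b_j$.
Definition 2 (element-wise multiplication). The element-wise multiplication of vectors $\mathbf{a} \in \mathbb{R}^{I}$ and $\mathbf{b} \in \mathbb{R}^{I}$ is denoted as $\mathbf{a} \ast \mathbf{b}$, where $\mathbf{a} \ast \mathbf{b} \in \mathbb{R}^{I}$ and $(\mathbf{a} \ast \mathbf{b})_{i} = a_i b_i$. In another case, the element-wise multiplication of vector and matrix is denoted as , where and .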
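Definition 3 (Kronecker product). Given vectors $\mathbf{a} \in \mathbb{R}^{I}$ and $\mathbf{b} \in \mathbb{R}^{J}$, their Kronecker product is denoted as $\mathbf{a} \otimes \mathbf{b}$, where $\mathbf{a} \otimes \mathbf{b} \in \mathbb{R}^{IJ}$ and $\mathbf{a} \otimes \mathbf{b} = [a_1 \mathbf{b}^{\mathsf{T}}, a_2 \mathbf{b}^{\mathsf{T}}, \ldots, a_I \mathbf{b}^{\mathsf{T}}]^{\mathsf{T}}$. Given matrices $A \in \mathbb{R}^{I \times J}$ and $B \in \mathbb{R}^{K \times L}$, their Kronecker product is denoted as $A \otimes B \in \mathbb{R}^{IK \times JL}$.
Definition 4 (Khatri–Rao product). Given matrices $A \in \mathbb{R}^{I \times K}$ and $B \in \mathbb{R}^{J \times K}$, their Khatri–Rao product is denoted as $A \odot B \in \mathbb{R}^{IJ \times K}$, which is calculated by combining the Kronecker product of each corresponding column in A and B, that is,
$$A \odot B = [\mathbf{a}_1 \otimes \mathbf{b}_1, \mathbf{a}_2 \otimes \mathbf{b}_2, \ldots, \mathbf{a}_K \otimes \mathbf{b}_K].$$
Definition 5 (n-mode matricization). Given an N-mode tensor χ, $\chi \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$, χ can be matricized into N forms according to each mode. We denote the n-mode matricization of χ as $X_{(n)}$, where $X_{(n)} \in \mathbb{R}^{I_n \times \prod_{k \neq n} I_k}$. $X_{(n)}$ is obtained by keeping the nth mode unchanged while expanding and concatenating the slices of the remaining modes into a matrix.
Definition 6 (Frobenius norm of a tensor). Given an N-mode tensor χ, $\chi \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$, the Frobenius norm of χ is denoted as
$$\|\chi\|_F = \sqrt{\sum_{i_1=1}^{I_1} \sum_{i_2=1}^{I_2} \cdots \sum_{i_N=1}^{I_N} x_{i_1 i_2 \cdots i_N}^{2}}.$$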
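Definition 7 (n-mode stretch). Given an N-mode tensor ν, , and a weight matrix W, . The n-mode stretch between ν and W is expressed as , where .
Definition 8 (n-mode product). Given an N-mode tensor $\chi \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ and a matrix $A \in \mathbb{R}^{J \times I_n}$, their n-mode product is denoted as $\chi \times_n A \in \mathbb{R}^{I_1 \times \cdots \times I_{n-1} \times J \times I_{n+1} \times \cdots \times I_N}$, where $(\chi \times_n A)_{i_1 \cdots i_{n-1}\, j\, i_{n+1} \cdots i_N} = \sum_{i_n=1}^{I_n} x_{i_1 i_2 \cdots i_N}\, a_{j i_n}$.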
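As a minimal sketch of the standard operations in Definitions 1, 3, 4, 5, 6, and 8, the following NumPy code may help make the index conventions concrete. The helper names (`unfold`, `fold`, `mode_product`, `khatri_rao`) and the example arrays are illustrative assumptions, not part of the proposed method, and the column ordering used by `unfold` is only one of several valid conventions. The n-mode stretch of Definition 7 is not covered.

```python
import numpy as np

def unfold(tensor, mode):
    """n-mode matricization: mode `mode` indexes the rows, all other modes are flattened."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def fold(matrix, mode, shape):
    """Inverse of `unfold` for a tensor whose full shape is `shape`."""
    moved = [shape[mode]] + [s for i, s in enumerate(shape) if i != mode]
    return np.moveaxis(matrix.reshape(moved), 0, mode)

def mode_product(tensor, matrix, mode):
    """n-mode product: multiply every mode-`mode` fiber of `tensor` by `matrix`."""
    new_shape = list(tensor.shape)
    new_shape[mode] = matrix.shape[0]
    return fold(matrix @ unfold(tensor, mode), mode, new_shape)

def khatri_rao(a, b):
    """Column-wise Kronecker product of two matrices with the same number of columns."""
    return np.stack([np.kron(a[:, k], b[:, k]) for k in range(a.shape[1])], axis=1)

a = np.array([1.0, 2.0])
b = np.array([3.0, 4.0, 5.0])
outer = np.outer(a, b)             # Definition 1: entries a_i * b_j, shape (2, 3)
kron = np.kron(a, b)               # Definition 3: vector of length 2 * 3

A = np.arange(6.0).reshape(2, 3)
B = np.arange(12.0).reshape(4, 3)
kr = khatri_rao(A, B)              # Definition 4: shape (2 * 4, 3)

X = np.arange(24.0).reshape(2, 3, 4)
X1 = unfold(X, 1)                  # Definition 5: shape (3, 2 * 4)
fro = np.sqrt((X ** 2).sum())      # Definition 6: Frobenius norm
W = np.ones((5, 3))
Y = mode_product(X, W, 1)          # Definition 8: shape (2, 5, 4)
```

Because `fold` is the exact inverse of `unfold`, the n-mode product computed here does not depend on the chosen column ordering.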
3.2. Tensor Decomposition
Many tensor decomposition methods, such as PARAFAC and Tucker decomposition, are currently available [15]. As shown in Figure 3, tensor decomposition methods decompose the original tensor into a core tensor and a series of corresponding factor matrices. The essence of tensor decomposition is to approximate the original tensor by using the product of the core tensor and the factor matrices. The mathematical description of tensor decomposition is as follows:
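Given an N-mode tensor χ, $\chi \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$, the following formula can be obtained by using the tensor decomposition method:
$$\chi \approx \tau \times_1 A^{(1)} \times_2 A^{(2)} \cdots \times_N A^{(N)},$$
where τ is the core tensor, $\tau \in \mathbb{R}^{J_1 \times J_2 \times \cdots \times J_N}$, and $\{A^{(n)}\}_{n=1}^{N}$ is the corresponding factor matrix set, $A^{(n)} \in \mathbb{R}^{I_n \times J_n}$. Each element in $\{A^{(n)}\}_{n=1}^{N}$ is a column orthogonal matrix. τ and $\{A^{(n)}\}_{n=1}^{N}$ also minimize function φ, where
$$\varphi = \left\| \chi - \tau \times_1 A^{(1)} \times_2 A^{(2)} \cdots \times_N A^{(N)} \right\|_F^{2}.$$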
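To illustrate the idea of approximating a tensor by a core tensor and column-orthogonal factor matrices, the following is a minimal sketch of a truncated higher-order SVD, which is one common way to compute a Tucker-style decomposition. It is an assumption chosen for illustration and is not the RTD method proposed in this study; the function names and the random test tensor are hypothetical.

```python
import numpy as np

def unfold(tensor, mode):
    """n-mode matricization with mode `mode` as rows."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_product(tensor, matrix, mode):
    """n-mode product: contract the columns of `matrix` with axis `mode` of `tensor`."""
    return np.moveaxis(np.tensordot(matrix, tensor, axes=(1, mode)), 0, mode)

def hosvd(tensor, ranks):
    """Return a core tensor and one column-orthogonal factor matrix per mode."""
    factors = []
    for mode, rank in enumerate(ranks):
        # Leading left singular vectors of the unfolding span the mode subspace.
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :rank])
    core = tensor
    for mode, u in enumerate(factors):
        core = mode_product(core, u.T, mode)   # project onto each mode subspace
    return core, factors

def reconstruct(core, factors):
    """Multiply the core tensor by every factor matrix along its mode."""
    out = core
    for mode, u in enumerate(factors):
        out = mode_product(out, u, mode)
    return out

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 5, 6))
core, factors = hosvd(X, ranks=(4, 5, 6))      # full ranks: reconstruction is exact
X_hat = reconstruct(core, factors)
error = np.linalg.norm(X - X_hat)
```

With smaller `ranks`, the core tensor shrinks accordingly and `error` becomes the approximation error that a decomposition objective of this kind would seek to reduce.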
3.3. BiLSTM
RNNs have far-reaching implications for the study of sequence data [16]. The nodes between the hidden layers of an RNN are connected [17]; that is, the input of the hidden layer contains not only the output of the input layer but also the output of the hidden layer at the previous moment. In theory, an RNN can process sequence data of any length. However, vanishing or exploding gradients often occur when an RNN deals with long-distance dependence, thereby making RNN training difficult. The hidden layers of the original RNN have only one kind of state, which is very sensitive to short-term inputs. Long short-term memory (LSTM) deals with long-distance dependence by adding a long-term memory state to the original RNN [18].
As shown in Figure 4, we represent the input value of LSTM at time t as $x_t$, the output value from the previous moment as $h_{t-1}$, and the long-term unit state at time $t-1$ as $c_{t-1}$. We record the unit state entered at time t as $\tilde{c}_t$. The output value of LSTM at time t comprises two parts, namely, the output value of LSTM at the current time, $h_t$, and the unit state of the current time, $c_t$. LSTM sets up three control gates, namely, the forget, input, and output gates, to control the long-term unit state c. The forget gate is used to determine how much of the long-term unit state at the previous moment is retained at the current moment. For example, the forget gate $f_t$ at time t determines the weight of $c_{t-1}$ in the calculation of $c_t$. The input gate is used to determine how much of the input of LSTM is retained in the current long-term unit state. For example, input gate $i_t$ determines the weight that $\tilde{c}_t$ takes while calculating $c_t$. The output gate is used to determine how much the long-term unit state at the current moment contributes to the output of LSTM at the current time. For example, output gate $o_t$ determines the influence of the value of $c_t$ on $h_t$.
The process of forward propagation calculation in LSTM is described as follows:
$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f),$$
$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i),$$
$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o),$$
$$\tilde{c}_t = \tanh(W_c \cdot [h_{t-1}, x_t] + b_c).$$
The long-term unit state $c_t$ at current time t is calculated from $f_t$, $i_t$, and $\tilde{c}_t$ together with the previous state $c_{t-1}$, and the final output of LSTM is calculated from $o_t$ and $c_t$. That is,
$$c_t = f_t \ast c_{t-1} + i_t \ast \tilde{c}_t,$$
$$h_t = o_t \ast \tanh(c_t),$$
where $\ast$ denotes element-wise multiplication, $h_{t-1}$ is the output of LSTM at time $t-1$, $x_t$ is the input of LSTM at time t, σ is the sigmoid function, which is our selected activation function in LSTM, $\tilde{c}_t$ is the unit state input at time t, $W_f$, $W_i$, and $W_o$ are the weight matrices of the forget gate $f_t$, the input gate $i_t$, and the output gate $o_t$, respectively, and $b_f$, $b_i$, and $b_o$ are the bias terms of $f_t$, $i_t$, and $o_t$, respectively. The activation function used in calculating $\tilde{c}_t$ is the hyperbolic tangent function, where $W_c$ is the weight matrix and $b_c$ is the bias term.
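The forward propagation described above can be sketched in a few lines of NumPy. This is a minimal illustration of a standard LSTM step under the usual assumption that each gate applies its weight matrix to the concatenation of the previous output and the current input; the parameter dictionary, the sizes, and the random inputs are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    """One forward step of a standard LSTM cell."""
    z = np.concatenate([h_prev, x_t])                       # [h_{t-1}, x_t]
    f_t = sigmoid(params["W_f"] @ z + params["b_f"])        # forget gate
    i_t = sigmoid(params["W_i"] @ z + params["b_i"])        # input gate
    o_t = sigmoid(params["W_o"] @ z + params["b_o"])        # output gate
    c_tilde = np.tanh(params["W_c"] @ z + params["b_c"])    # unit state input
    c_t = f_t * c_prev + i_t * c_tilde                      # long-term unit state
    h_t = o_t * np.tanh(c_t)                                # output at time t
    return h_t, c_t

rng = np.random.default_rng(0)
input_size, hidden_size = 3, 4
params = {}
for gate in "fioc":
    params[f"W_{gate}"] = 0.1 * rng.standard_normal((hidden_size, hidden_size + input_size))
    params[f"b_{gate}"] = np.zeros(hidden_size)

# Run the cell over a short random sequence, starting from zero states.
h, c = np.zeros(hidden_size), np.zeros(hidden_size)
for x_t in rng.standard_normal((5, input_size)):
    h, c = lstm_step(x_t, h, c, params)
```

Since the output gate lies in (0, 1) and the hyperbolic tangent lies in (-1, 1), every entry of `h` stays strictly inside (-1, 1).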
BiLSTM is a bidirectional RNN [19]. The unit state of the hidden layer in BiLSTM is calculated from the outputs of forward and backward LSTM. We define the output unit state of BiLSTM at time t as , the output unit state of forward LSTM as , and the output unit state of backward LSTM as . The aforementioned forward propagation formula of LSTM implies that
3.4. Problem Description
Problem 1. We express each legal case as a tensor and classify the legal case according to its judgment result. The category of each legal case is indicated by a scalar, such as r. Given a legal case dataset $\mathcal{D} = \{(\chi_n, r_n)\}_{n=1}^{N}$ that contains N legal cases with judgment results, $\chi_n$ represents the nth legal case in the legal case dataset $\mathcal{D}$, and $r_n$ indicates the type of legal judgment result that corresponds to the nth legal case. Our goal is to train a case classification model that can classify legal cases based on their judgment results.
In this study, legal cases are represented as three-dimensional tensors. As shown in Figure 5, the first dimension represents the basic components of the case, such as the defendant’s statement, the plaintiff’s statement, the public prosecution, and the court’s trial. On this basis, the matrix slice that spans the last two dimensions represents the matrix form of the corresponding legal case component. The matrix slice is built by stacking the word vectors of the words inside the legal case component. Generally, when a case component is matricized, the matrix does not include the word vectors of all of its words. Instead, we selectively extract words that are valuable for the legal case classification. These words can be divided into two categories. The first category usually includes nouns or pronouns, such as characters, times, places, and objects; the second category usually comprises adjectives, numerals, or verbs, such as the means of committing accusations, the degree of harm, and the number of accusations.
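The tensor layout described above can be sketched as follows. This is only an illustration under stated assumptions: the `embed` function is a deterministic stand-in for a real word embedding, the fixed number of words per component and the zero padding are hypothetical choices, and the keyword lists are invented examples rather than data from this study.

```python
import numpy as np

EMBED_DIM = 4   # hypothetical word vector length
MAX_WORDS = 3   # hypothetical number of selected words kept per component

def embed(word, dim=EMBED_DIM):
    """Deterministic stand-in for a word embedding lookup (assumption)."""
    seed = sum(ord(ch) for ch in word)
    return np.random.default_rng(seed).standard_normal(dim)

def case_to_tensor(components, max_words=MAX_WORDS, dim=EMBED_DIM):
    """Stack one (max_words x dim) matrix slice per case component."""
    tensor = np.zeros((len(components), max_words, dim))
    for i, words in enumerate(components):
        for j, word in enumerate(words[:max_words]):
            tensor[i, j] = embed(word, dim)    # rows left at zero act as padding
    return tensor

# Selected keywords for four components: defendant's statement,
# plaintiff's statement, public prosecution, and court's trial.
case = [
    ["defendant", "denied", "theft"],
    ["plaintiff", "reported", "loss"],
    ["prosecution", "charged"],
    ["court", "found", "guilty"],
]
chi = case_to_tensor(case)   # shape: (components, words, embedding dimension)
```

The first axis indexes case components, and each slice along it is the matrix form of one component.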
4. Our Approach
This study proposes RnRTD for the multi-accusation determination of legal cases. Figure 6 shows the RnRTD framework. First, we extract core tensors from the original tensors using the RTD method. The core tensor approximates the restricted tensor in terms of the tensor structure and elements. Second, we use rdRNN to optimize the restricted tensor so that it guides the core tensor along the direction that is most conducive for improving the accuracy of the classification model.
4.1. RTD Method
This study proposes a new tensor decomposition method called the RTD method. The inputs of the RTD algorithm are the restricted condition tensor η and the tensor χ that represents the legal case. The RTD outputs are the core tensor and its two corresponding factor matrix sets. RTD decomposes χ into a core tensor under the action of the restricted condition η, such that the core tensor approximates η in terms of tensor structure and internal element values. RTD can be interpreted as a mapping of the original tensor χ to the core tensor. In short, RTD achieves directional decomposition of tensors and extracts vital information from tensors while reducing their dimension. In this study, we define the core tensor as the tensor structure and element value information that is most favorable for the subsequent legal case classification algorithm, namely, rdRNN. On this basis, we construct a deep neural network model for RnRTD that is dedicated to legal intelligence judgments.
RTD decomposes the original tensor under the restricted condition so that the obtained core tensor constantly approaches the restricted tensor in terms of tensor structure and element value. The problems to be solved by the RTD algorithm, illustrated in Figure 7, are formally described as Problems 2 and 3.
Problem 2. Given the tensor χ, the restricted tensor η, and its weight, we derive two factor matrix sets whose elements minimize the objective function ϕ of equation (11). One matrix in this function is preset according to the legal case. The elements of both sets are orthogonal matrices; that is, every element $A$ of either set satisfies the orthogonality condition of equation (12), $A^{\mathsf{T}} A = I$.
In this study, we use the alternating least squares (ALS) algorithm to determine the solution of the objective function ϕ. The ALS algorithm can be divided into four steps: (1) randomly pick a variable as the parameter to be solved and randomly generate the values of the other variables, (2) determine the partial derivative of the loss function ϕ with respect to the specified parameter while fixing the values of the other variables, (3) set this partial derivative to zero and calculate the value of the specified parameter, and (4) select another variable as the parameter and return to Step (2). The ALS algorithm continues to iterate Steps (2), (3), and (4) until the error of the loss function ϕ falls within the tolerable upper limit.
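The four ALS steps can be illustrated on a deliberately simpler problem than the RTD objective. The following sketch applies ALS to a plain low-rank matrix factorization $M \approx U V^{\mathsf{T}}$, where setting the partial derivative to zero for one factor while fixing the other reduces to an ordinary least-squares solve. The objective, the function name `als_factorize`, and the stopping rule are assumptions made for illustration only.

```python
import numpy as np

def als_factorize(M, rank, max_iter=100, tol=1e-10, seed=0):
    """Alternating least squares for M ~ U @ V.T (illustrative objective)."""
    rng = np.random.default_rng(seed)
    # Step (1): random initial values for all variables.
    U = rng.standard_normal((M.shape[0], rank))
    V = rng.standard_normal((M.shape[1], rank))
    losses = []
    for _ in range(max_iter):
        # Steps (2)-(3): fix V; the zero-gradient condition for U is a least-squares solve.
        U = np.linalg.lstsq(V, M.T, rcond=None)[0].T
        # Step (4), then (2)-(3) again: fix U and solve for V in the same way.
        V = np.linalg.lstsq(U, M, rcond=None)[0].T
        losses.append(np.linalg.norm(M - U @ V.T) ** 2)
        # Stop once the loss no longer changes by more than the tolerance.
        if len(losses) > 1 and abs(losses[-2] - losses[-1]) < tol:
            break
    return U, V, losses

rng = np.random.default_rng(1)
M = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 5))   # exactly rank 2
U, V, losses = als_factorize(M, rank=2)
```

Each update solves its subproblem exactly, so the recorded loss cannot increase from one iteration to the next, which is the property that makes the alternating scheme attractive.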
Problem 2 needs to be solved using Lemma 1. The specific definition and proof of Lemma 1 are provided as follows.
Lemma 1. Given function , , α can be a vector, matrix, and tensor. The target function , where , , and satisfy equation (12). For any element in , , the partial differential of function φ to is , where ε is a constant.
The proof of Lemma 1 is shown in Proof 1.
Proof 1. We use μ to represent ; it can be derived that . We abbreviate the target function . We use ν to represent , and we can get that . According to the function derivation rule, we can obtain the following equation: . Since , the calculation formula for the partial derivative of the function φ to is
where ε is a constant, .
According to the iterative process of the ALS algorithm, the precondition for solving the value of and which minimize the function ϕ in equation (11) using the ALS algorithm is to calculate the value of , where . Equation (11) shows that and are solved in the same manner. Proof 2 provides mathematical proof of the calculation of .
Proof 2. We use λ and ϖ to represent and , respectively. According to formula (11), we can obtain that . We abbreviate the aforementioned formula as . Then, we can determine the following formula: . According to the function derivation rule, we derive that , . In combination with Lemma 1, we can obtain that . Finally, the following formula is determined:
We set the value of equation (14) to 0 and obtain that
Let , then . By combining equation (12), we derive that
We use the singular value decomposition (SVD) to decompose Z and find that . P and Q are orthogonal matrices, where P is the left singular matrix and Q is the right singular matrix, and S is a diagonal matrix. After this analysis, the following solution can be obtained:
In summary, according to equations (11)–(17), we can derive a solution of and which are described in equations (14) and (17), respectively. is calculated in the same manner as . On this basis, we calculate the value of and , which minimize the objective function ϕ in formula (11), by using the ALS algorithm.
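The role of the SVD in this step can be illustrated with a standard result: among all column-orthogonal matrices U, the one that maximizes $\operatorname{trace}(U^{\mathsf{T}} Z)$ is $U = P Q^{\mathsf{T}}$, where $Z = P S Q^{\mathsf{T}}$. The sketch below assumes that the orthogonality-constrained update takes this general form; the exact expressions of equations (16) and (17) may differ, and the function name and the random matrix are hypothetical.

```python
import numpy as np

def orthogonal_maximizer(Z):
    """Column-orthogonal U maximizing trace(U.T @ Z), obtained from Z = P S Q^T."""
    P, S, Qt = np.linalg.svd(Z, full_matrices=False)
    return P @ Qt

rng = np.random.default_rng(0)
Z = rng.standard_normal((5, 3))
U = orthogonal_maximizer(Z)

# U satisfies the orthogonality constraint, and the attained objective value
# equals the sum of the singular values of Z.
gram = U.T @ U
objective = np.trace(U.T @ Z)
```

Any other column-orthogonal matrix of the same shape gives a value of $\operatorname{trace}(U^{\mathsf{T}} Z)$ that is no larger than `objective`.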
Algorithm 1 shows the solution of Problem 2 by using the ALS algorithm. The inputs of Algorithm 1 are χ, which represents one legal case, the restricted tensor η, and its weight . In line 2, we randomly initialize the values of , . in line 2 represents the maximum number of iterations of the ALS algorithm. The function in line 4 corresponds to equation (16). Lines 5 and 6 show the calculation process of equation (17).
Another problem to be solved by the RTD algorithm is Problem 3, which is the formal description of the process of tensor decomposition on the original tensor under the action of the restricted tensor and its weight. On the basis of Problem 2, we can obtain factor matrix sets and , which minimize the value of function ϕ in formula (11) while satisfying formula (12).

Problem 3. Given a tensor and factor matrix sets and , , , and are derived from Problem 2. A core tensor is determined, where and minimize the following target function:
Problem 3 needs to be solved using Lemma 2. The specific definition and proof process of Lemma 2 are as follows.
Lemma 2. Given the target function , , , each element in satisfies formula (12). Then, the partial derivative of the target function ψ to where is , , where ε is a constant.
Proof 3. We use κ to represent ; τ is the identity tensor, . Then, the target function ψ can be rewritten as . We use ρ to represent . We can obtain that . From the function derivation rule, we can get the following formula: ; thus,
where ε is a constant, .
After the aforementioned analysis, Proof 4 gives the solution to Problem 3 and its mathematical proof process while combining Lemma 2 and Proof 3.
Proof 4. We use υ to represent and γ to represent , where τ is the identity tensor, . Then, the function can be rewritten as , that is . Known by the definition of function , . Then, we can get the following equation: . It can be derived from the function derivation rule and Lemma 2 that
where ε is a constant, . Let be zero. We can get the final solution of Problem 3 by combining formula (12).
Algorithm 2 implements RTD by using Algorithm 1. Function in line 1 represents the implementation of Algorithm 1, and the inputs are χ, η, and . Function in line 2 shows the calculation of using equations (18)–(21). Finally, the core tensor of χ is obtained by using Algorithm 2, which approximates the restricted tensor η at the level of tensor structure and element information.

4.2. rdRNN
This study proposes a new RNN called rdRNN. Unlike traditional RNN, rdRNN sets up a new gate based on the bidirectional RNN. The new gate uses the similarity matrix between samples as a parameter of the deep neural network training model. Compared with the original bidirectional RNN, the classification result of rdRNN is more accurate and stable. For the intelligent judgment of legal cases, the original deep neural network method does not consider the correlation between legal cases. This disregard may lead to bias in the final case classification. For example, the verdict of a legal case is inconsistent with the description of the case. To solve this problem, rdRNN fully considers the judgment results of legal cases that are similar to the case to be judged. rdRNN uses these results as a parameter of the deep neural network training model and realizes an efficient and accurate classification of multiple accusations in legal cases.
The following steps describe the training of rdRNN’s deep neural network while using Rmsprop as its optimization function:
(i) We use the dataset of core tensors with their category labels, together with the restricted tensor η, as inputs of rdRNN. Each core tensor is obtained by the RTD algorithm, which takes the tensor that represents the legal case and η as its inputs. Each category label is assigned according to the judgment result of the legal case.
(ii) In this study, we combine rdRNN with the softmax layer to complete the classification of legal cases. For a given sample, the softmax layer maps the output vector of rdRNN to the legal case category.
(iii) We use cross entropy as the loss function to update rdRNN. rdRNN uses its forward propagation algorithm and error backpropagation formulas to iterate over the values of parameters in the neural network, such as the weight matrices and bias terms that are associated with the relationship gate, and the restricted tensor η, where d is the number of hidden layers.
(iv) We select Rmsprop as the optimization function of rdRNN. Rmsprop completes the optimization and calculation of the weight matrices, the bias terms, and η by using the corresponding partial derivatives of the loss function.
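Steps (ii) to (iv) can be sketched on a toy example. The code below combines a softmax layer, the cross-entropy loss, and the standard Rmsprop update rule to train a single weight matrix on one fixed output vector. It is a minimal sketch: the vector `h` is a random stand-in for an rdRNN output, the hyperparameter values are assumptions, and the relationship gate and the restricted tensor η are not modeled.

```python
import numpy as np

def softmax(z):
    shifted = z - z.max()            # subtract the maximum for numerical stability
    e = np.exp(shifted)
    return e / e.sum()

def cross_entropy(probs, label):
    return -np.log(probs[label])

def rmsprop_update(param, grad, cache, lr=0.01, decay=0.9, eps=1e-8):
    """Standard Rmsprop step: scale the gradient by a running RMS of past gradients."""
    cache = decay * cache + (1.0 - decay) * grad ** 2
    param = param - lr * grad / (np.sqrt(cache) + eps)
    return param, cache

rng = np.random.default_rng(0)
h = rng.standard_normal(4)           # stand-in for the output vector of rdRNN
label = 2                            # hypothetical category of the legal case
W = np.zeros((3, 4))                 # softmax layer weights for 3 categories
cache = np.zeros_like(W)

losses = []
for _ in range(50):
    probs = softmax(W @ h)
    losses.append(cross_entropy(probs, label))
    grad_logits = probs.copy()
    grad_logits[label] -= 1.0        # gradient of cross entropy w.r.t. the logits
    grad_W = np.outer(grad_logits, h)
    W, cache = rmsprop_update(W, grad_W, cache)
```

With zero initial weights the three categories are equally likely, so the first recorded loss equals the natural logarithm of 3, and the loss decreases as training proceeds.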
4.2.1. Calculation of Forward Propagation in rdRNN
In this study, we fully consider the relationship between legal cases and set up a new gate to complete the classification of legal cases, eliminate contingency errors as much as possible, and avoid inconsistencies between the predicted judgment result and the actual case. Relationship control gate is used to control the similar relationship between legal cases. helps the rdRNN deep neural network make an intelligent judgment by using the judgment results of cases that are similar to the case to be judged.
rdRNN can be divided into forward and backward LSTM. These networks do not have obvious differences, except for the opposite propagation direction. In the case of the rdRNN forward LSTM propagation network, the formal description of relationship control gate is as follows:
where and are the weight matrix and bias term of the relationship control gate , respectively, σ is the activation function, i.e., the sigmoid function, is the output unit state of the neuron at time , and is the input value of the neuron at time t.
In the forward LSTM network, the output of each neuron at time t is calculated by the following formula:
where , , , and are the relationship control, forget, input, and output gates, respectively; is the unit status of current inputs; , , , and are the weighted inputs of their corresponding gates at time t; is the weighted inputs of input state generation function ; σ is the activation function, i.e., the sigmoid function, ; is the hyperbolic tangent function, ; is the weight matrix of relationship control gate , ; , , and are expressed in the same manner as ; and , , , and are the bias terms of their corresponding activation functions.
Subsequently, the unit state of the current moment is calculated by , , , and . The calculation formula is expressed as follows:
The final output of the forward LSTM neural network at time t is calculated by , , and and the similar list of x. It is described as follows:
where is composed of the legal cases seen so far whose similarity with x is greater than a threshold. refers to the output of the forward LSTM neural network that corresponds to legal case . is a function that calculates the similarity between legal cases. In this study, we set function as a weighted combination of the Euclidean distance and the cosine distance between legal cases,
where and refer to the Euclidean distance and the cosine distance between the vectors x and , respectively. is the weight matrix.
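The construction of the similar case list can be sketched as follows. This is a hedged illustration rather than the function used in this study: it assumes a scalar weight instead of a weight matrix, it converts the weighted distance into a similarity score through the hypothetical mapping 1 / (1 + d), and the threshold and example vectors are invented.

```python
import numpy as np

def euclidean_distance(x, y):
    return float(np.linalg.norm(x - y))

def cosine_distance(x, y):
    # Assumes both vectors are nonzero.
    return 1.0 - float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

def similarity(x, y, weight=0.5):
    """Weighted combination of the two distances, mapped into (0, 1] (assumption)."""
    d = weight * euclidean_distance(x, y) + (1.0 - weight) * cosine_distance(x, y)
    return 1.0 / (1.0 + d)

def similar_case_list(x, previous_cases, threshold=0.6, weight=0.5):
    """Indices of previously seen cases whose similarity with x exceeds the threshold."""
    return [i for i, y in enumerate(previous_cases)
            if similarity(x, y, weight) > threshold]

x = np.array([1.0, 0.0])
previous = [np.array([1.0, 0.0]), np.array([0.9, 0.1]), np.array([-1.0, 0.0])]
similar = similar_case_list(x, previous)   # the opposite-pointing case is excluded
```

In a full model, the outputs associated with the cases in this list would enter the final output through the relationship control gate.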
4.2.2. Calculation of Backpropagation in rdRNN
In this section, we describe in detail the backpropagation algorithm of the rdRNN neural network, including the backpropagation of the error along time and the hidden layer. In rdRNN, forward and backward LSTM neural networks have the same principle in the backpropagation algorithm. Therefore, this section mainly uses forward LSTM as an example.
Given the error term at time t , . Calculation of the backpropagation algorithm of the error term along time is to calculate the value of . The full derivative formula shows that
Equations (23)–(25) show that , , , , and are all functions of . Then, we can obtain
The formula on the left represents the variable declaration, and the formula on the right is calculated from equation (23). According to equations (27) and (28), we can further derive the following formula:
where , , , . According to relationship control gate , equations (23) and (25), we determine that
From equation (29), we can finally determine how the error term in rdRNN is passed from the current moment t back to any time k.
Then, we describe in detail the transmission of error between the hidden layers of rdRNN. The error term of the lth hidden layer in rdRNN is assumed to be the partial derivative of the error function with respect to the weighted input . In rdRNN, the input of the lth hidden layer at time t is .
where denotes the activation function of the (l − 1)th hidden layer in rdRNN and denotes the weighted input of the (l − 1)th hidden layer at time t.
Given the error term of the lth hidden layer at time t , , the calculation of error propagation between hidden layers is to determine the value of , where
According to equations (23), (31), and (32), , , , , and are all functions of , and is a function of . Therefore, the full derivative formula shows that
The following formula can be obtained by further calculation:
where is the derivative of function at .
According to equations (27)–(34), we can derive the partial derivative of the loss function to the weight matrix set and the bias term set in rdRNN. Given that , we obtain
4.2.3. Calculation of the Partial Derivative of Loss Function to Restricted Tensor η
This study proposes a new intelligent method for judging legal cases called RnRTD, which combines rdRNN and RTD to complete the classification of legal cases. In the process of training the RnRTD neural network, a new problem is involved: updating of the value of the restricted tensor η so that it can continuously approximate the tensor value that is most beneficial for improving the classification accuracy of the RnRTD algorithm.
The crux to solving this problem is to calculate the partial derivative of the loss function to the restricted tensor η, that is, . Directly solving the value of is difficult. We can use the full derivative rule to obtain that
The backward propagation formula of rdRNN shows that . According to equations (19)–(21), we determine that . From equations (14)–(17), we know that and are all functions of η. Therefore, the function full derivative rule shows that , that is
while
Algorithm 3 provides the optimization process of RnRTD proposed in this study. in line 2 represents the total number of training sessions of RnRTD. in line 6 represents the number of samples per batch while training RnRTD. The RTD tensor decomposition method in line 7 corresponds to Algorithm 2. in line 8 and in line 9 represent forward propagation and error backpropagation algorithms for rdRNN, respectively, which are the implementations of Sections 4.2.1–4.2.3. In line 11, we use the Rmsprop algorithm, the optimization function selected in Section 4.2, to realize parameter optimization of the RnRTD neural network.
