Abstract

Natural optimization algorithms have attracted considerable attention from researchers because they can simulate or explain certain prediction processes. The traditional method of predicting the factor values of legal reporting information based on a causal window suffers from the weakness of its individual classifiers, so its prediction adaptability is poor. Aiming at the construction of an early warning model for legal reporting information, this paper proposes a semi-integrated natural optimization algorithm. The algorithm uses the variance of the supporting-area factors to characterize the smoothness of the factor neighborhood and uses an optimal threshold parameter for factor classification, which alleviates the capacity-distortion problem of traditional legal reporting information hiding algorithms. The experimental results show that the natural optimization algorithm performs better: the classification error rate is reduced to 0.137, which effectively improves the practicability of classification prediction for legal reporting information.

1. Introduction

Optimization technology is an applied technology, based on mathematics, for finding optimal solutions to various problems. An important branch of it is the intelligent natural optimization algorithm, which is developed by simulating or explaining certain natural phenomena or processes. Like an ordinary search algorithm, it is iterative, and it has the advantages of global, parallel, and efficient optimization performance, robustness, and strong versatility [1]. Fuzzy search, precise search, and intelligent search are carried out according to keywords. Fuzzy search and precise search are based on literal keyword matching, while intelligent search means that the system recommends related cases, courts, judges, laws, firms, and lawyers according to the keywords entered by the user, from which the user can select one for precise retrieval. When the user enters a keyword, the system analyzes it against the database and filters out relevant search terms used by other users to auto-complete the query, and the knowledge graph behind the system then completes the reasoning along a full key chain for the user [2-4].

Many methods have been proposed to deal with the traditional uncertainty problem, and the study and prediction of uncertainty problems constitute an important branch of science. The number of samples for these problems is usually very limited, sometimes very small, and most data series do not contain explicit quantitative relationships, which makes these problems difficult to handle. Many methods have achieved good results, but there is still room for improvement [5-7]. In the past, the parameter Q was generally left at its default value, or set somewhat arbitrarily from actual data and models, with no definite rule. Moreover, the optimization model GM(1,1) needs at least four data points, and the improved model FGM(1,1) needs at least three, before a prediction model can be established; the less data a model requires, the less information it obtains, which inevitably affects its prediction performance. In view of this situation, the intelligent natural optimization algorithm PSO is introduced into the optimization model to optimize the parameter of the mean sequence: a suitable fitness function is established, the optimization algorithm searches for a suitable parameter Q, and that parameter is applied to the model to improve its predictive ability [8-11].

The intelligent natural optimization algorithm PSO is applied to the support vector machine model because the classification accuracy and regression effect of a support vector machine are affected by its penalty coefficient, kernel function, and kernel function parameters. Different penalty coefficients and kernel parameters have a great impact on the accuracy of the model, there is no definite rule for selecting them, and fixed values are often chosen from experience, so the selected parameters are frequently reused for different types of data. In the era of big data, more and more attention is paid to the value of data. Data optimization technology uses machine learning algorithms to discover the hidden laws or knowledge behind data; the trained model can be used to predict future data, and the continuous improvement and refinement of natural optimization algorithms makes predictions more and more accurate. Establishing an effective risk early warning model for litigation-related petitions can give warnings before petitions occur and allow countermeasures to be taken in advance, so as to minimize the resulting losses. This paper takes a large number of court legal reporting documents as its starting point and establishes an early warning model by optimizing data according to the key information in the documents. It can also be used to indicate the areas that need improvement according to the proportion of petition-risk cases among all cases in the court.

Based on an in-depth exploration of the evolution and core ideas of legal whistleblower information hiding methods based on histogram shifting, prediction error histogram shifting, and prediction error expansion, it is clarified that the method based on prediction error expansion is closely related to the method based on histogram shifting. From this similarity, the basic idea of the legal whistleblowing information hiding method based on histogram expansion can be summarized: some factors in a histogram domain are moved to make space, which is then used to expand another part of the factors so that hidden secrets can be embedded. This framework includes several existing methods as special cases, has great flexibility and versatility, and has important guiding significance for the design of natural optimization algorithms for hiding legal reporting information based on histogram expansion [12-14].

In the method based on the legal whistleblower ontology, de Almeida et al. [15] considered the characteristics of shortest path length, depth, and refinement; in the method based on word vectorization, the core collection, legal whistleblower text, and a general corpus are used. In the method based on ontology vectorization, Shields et al. [16] first proposed an ontology vectorization construction method based on concept lattices and then presented an ontology vector dimensionality reduction method based on autoencoders. This calculation method provides an algorithmic basis for combining the legal reporting domain ontology with the multilabel naive Bayesian classification method to achieve tax law enforcement risk prevention, and for using domain ontology optimization to achieve legal reporting management risk prevention. Hoseini et al. [17] proposed an improved BSA based on chaos to optimize the parameters of the satellite formation flight problem; the algorithm uses chaotic mapping to replace the original random generation mechanism, which enhances the disorder of the initial population and improves the traversal of the algorithm. In order to make full use of the entire search space, Praticò et al. [18] used the orthogonal design method to replace the original random initialization mechanism and also designed two chaotic maps to improve the selection I and mutation parameters of BSA, respectively. To improve the quality of the historical population, a doctoral thesis proposed using the global optimal information to improve the way the historical population is generated. To enhance the diversity of the initial population, a BSA based on opposition-based learning has been proposed: the initial population and the opposite initial population are generated at the same time to select the optimal candidate solutions, and the jumping strategy from opposition-based learning is added to update the population.

Abualigah and Diabat [19] proposed a hybrid BSA improved by techniques such as ensemble learning guidance, mutation perturbation, and niche exclusion and successfully applied it to time series forecasting with neural network training. Because the "teaching-learning-based" optimization (TLBO) algorithm has strong learning characteristics, researchers have proposed an improved BSA that embeds the learning operator of TLBO. That algorithm combines contemporary optimal information and historical information for mutation with a random probability, so that the remaining individuals can learn from the best individual, the worst individual, and other random individuals of the current generation before the population is updated; experimental results show that the improved algorithm is highly competitive. Another variant uses the DE mutation operator to guide the optimal individual with a certain probability in each iteration, thereby improving the convergence speed without increasing the overall complexity of the algorithm. Researchers have also proposed an improved BSA based on the DE mutation strategy to solve the sidelobe suppression problem of concentric circular antenna arrays: the original BSA mutation operator is combined with the DE/current-to-best/1 mutation strategy, which controls the diversity of the population and improves the search. In the following year, Sultana et al. [20] also proposed a BSA hybridized with the DE mutation operator and applied it to the optimization design of CMOS amplifier circuits; the difference component is calculated from three randomly selected individuals, which improves the convergence speed of BSA. The mutation operator enables the algorithm to use historical experience and optimal individual information to balance exploration and exploitation in different iterative periods. Therefore, this paper conducts in-depth research on BSA. After a comprehensive analysis of the optimization mechanism of the algorithm and its advantages and disadvantages, two improved nature-inspired backtracking search optimization algorithms are proposed, which improve the performance of the algorithm [20-24]. The research work in this paper not only improves the optimization performance of BSA but also provides a new direction for the improved design of other metaheuristic algorithms and is significant for the intelligent optimization design of metaheuristics in engineering.

3. Information Processing Based on Natural Optimization Algorithms

3.1. Natural Optimization Algorithm Architecture

The natural optimization algorithm is a very simple, low-residual optimization algorithm that can effectively optimize various functions. To some extent, it lies between the genetic algorithm and the evolutionary algorithm. It relies heavily on stochastic processes, which also makes it similar to evolutionary programming, and the way it adjusts the closeness to the global and local optima is very similar to the crossover operator in the genetic algorithm.

Its biggest feature is that, when the samples in the sample set are linearly inseparable in the current feature space, the original sample x is first mapped to a higher-dimensional feature space z by a nonlinear mapping, and a classification hyperplane is then found in the high-dimensional space z, making the samples linearly separable. The support vectors determine the optimal position of the classification hyperplane; they are the data points closest to the hyperplane, and the interval between the hyperplane and the support vectors is called the separation margin, denoted by P. The purpose of the SVM is to find an optimal hyperplane that maximizes the separation margin P.
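
As an illustrative sketch of this idea (the data set and parameters here are assumptions, not the experimental setup of this paper), a kernel SVM implicitly maps linearly inseparable samples into a higher-dimensional space and finds the maximum-margin hyperplane there:

```python
# Minimal kernel-SVM sketch (illustrative; not the paper's experimental setup).
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric classes: linearly inseparable in the original 2-D space.
X, y = make_circles(n_samples=300, factor=0.4, noise=0.08, random_state=0)

# An RBF kernel implicitly maps x into a higher-dimensional space z,
# where a maximum-margin hyperplane can separate the classes.
clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)

print("number of support vectors:", clf.support_vectors_.shape[0])
print("training accuracy:", clf.score(X, y))
```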

Pre-pruning estimates a node before dividing it: if splitting the node cannot improve the generalization accuracy of the decision tree, the split is stopped and the node is set directly as a leaf node. Each time, part of the training data is set aside as a validation set to measure generalization accuracy, and the prediction accuracy on this set is compared before and after each candidate split. The advantage is that pre-pruning reduces the risk of overfitting and is more efficient than post-pruning; the disadvantage is that it is a greedy operation, and some splits that cannot improve accuracy immediately may improve it in later splits, so the risk of underfitting increases.
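
A hedged sketch of this greedy stopping rule (the data set, split, and depth-by-depth growth are assumptions made for illustration): the tree is grown one level at a time and growth stops as soon as a deeper split no longer improves accuracy on the held-out set.

```python
# Pre-pruning by validation accuracy (illustrative data and splits).
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

best_depth, best_acc = 1, 0.0
for depth in range(1, 20):
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_tr, y_tr)
    acc = tree.score(X_val, y_val)
    if acc <= best_acc:          # greedy stop: the deeper split gives no gain
        break
    best_depth, best_acc = depth, acc

print(f"pre-pruned depth = {best_depth}, validation accuracy = {best_acc:.3f}")
```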

The algorithm exploits the particles' ability to summarize their own experience and learn from outstanding individuals in the group, so that each particle approaches both its own historical optimum and the historical optimum within the group. These two parameters have little effect on the convergence of the algorithm, but adjusting them properly can reduce the trouble of local minima and also speed up convergence. Since there is no actual mechanism for controlling particle velocity in the fitting natural optimization algorithm, it is necessary to place a limit on the maximum velocity.
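
A minimal PSO sketch with velocity clamping (the inertia weight, learning factors, bounds, and test function are illustrative assumptions, not the settings used in the paper):

```python
# Minimal PSO with a maximum-velocity limit (illustrative parameters only).
import numpy as np

def pso(f, dim=2, n_particles=30, iters=200, v_max=0.5, w=0.7, c1=1.5, c2=1.5):
    rng = np.random.default_rng(0)
    x = rng.uniform(-5, 5, (n_particles, dim))          # positions
    v = rng.uniform(-v_max, v_max, (n_particles, dim))  # velocities
    pbest, pbest_val = x.copy(), np.array([f(p) for p in x])
    gbest = pbest[np.argmin(pbest_val)]
    for _ in range(iters):
        r1, r2 = rng.random((n_particles, dim)), rng.random((n_particles, dim))
        # learn from the particle's own optimum (c1) and the group optimum (c2)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        v = np.clip(v, -v_max, v_max)                    # maximum-velocity limit
        x = x + v
        vals = np.array([f(p) for p in x])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = x[improved], vals[improved]
        gbest = pbest[np.argmin(pbest_val)]
    return gbest, pbest_val.min()

best, val = pso(lambda p: np.sum(p ** 2))                # sphere test function
print("best position:", best, "best value:", val)
```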

This results in a corresponding increase in the coefficient values of all frequency components in the DCT coefficient matrix, while reducing the proportion of the energy of the DC coefficient and the few large AC coefficients relative to the total energy of the factor subblock. From the above analysis, it can be concluded that, when the value of the central factor to be predicted varies continuously over [0, 255], the best value is the one for which the ratio of the energy of the several large coefficients in the corresponding DCT coefficients to the total energy of the factor subblock is largest.
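
A hedged sketch of this energy-ratio criterion (the block size, the number of "large" coefficients kept, and the toy subblock are assumptions): every candidate value for the center factor is tried, and the one that maximizes the energy concentration is kept.

```python
# DCT energy-ratio criterion for predicting the center factor (illustrative).
import numpy as np
from scipy.fft import dctn

def energy_ratio(block, k=4):
    """Share of total energy carried by the DC coefficient and the k largest AC ones."""
    coeffs = dctn(block.astype(float), norm="ortho")
    energy = coeffs ** 2
    total = energy.sum()
    flat = energy.ravel().copy()
    dc = energy[0, 0]
    flat[0] = -1.0                                   # exclude DC from the AC ranking
    ac = np.sort(flat)[::-1][:k]
    return (dc + ac.sum()) / total

def predict_center(block):
    """Pick the center value in [0, 255] that maximizes the energy ratio."""
    best_v, best_r = 0, -1.0
    for v in range(256):
        candidate = block.copy()
        candidate[block.shape[0] // 2, block.shape[1] // 2] = v
        r = energy_ratio(candidate)
        if r > best_r:
            best_v, best_r = v, r
    return best_v

rng = np.random.default_rng(0)
block = rng.integers(100, 120, size=(5, 5))          # a fairly smooth toy subblock
print("predicted center value:", predict_center(block))
```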

3.2. Confidence of Optimization Results

Data optimization converts rules and patterns into models by modeling the data: sample feature data are input and derived results are output. As tools for data optimization, natural optimization algorithms have different characteristics, and finding the most suitable one is essential in the data optimization process; in this paper, ten classical natural optimization algorithms are compared on the same data set. The natural optimization algorithm uses the energy distribution characteristics of the DCT coefficients of the natural legal information subblock to predict factor values, which improves the prediction accuracy of the traditional prediction method for legal information factor values. At the same time, a payload versus legal-information-degradation function is proposed to find the optimal factor moving direction, hidden information embedding position, and factor sorting/selection threshold in each round of the multiround embedding process.

Due to the random search nature of the fitting natural optimization algorithm, it is less likely to fall into local optima, and at the same time, evolving according to a fitness concept ensures its speed. Therefore, the fitting natural optimization algorithm has strong advantages for complex, especially multimodal and high-dimensional, optimization problems. Unlike other evolutionary algorithms, the fitting natural optimization algorithm does not use the concept of "survival of the fittest" and does not directly use a selection function; particles with low fitness values can therefore still survive during optimization and may search any region of the solution space. The quality loss of the secret legal information obtained by embedding large-capacity secret information in all 8 pieces of carrier legal information is very low, and it is difficult for a human observer to perceive any difference between the carrier legal information before and after the secret information is embedded and the legal information in Figure 1.

Uncertainty reflects the physiological mechanisms of natural organisms and is superior to deterministic algorithms for some specific problems. The uncertainty of a bionic optimization algorithm comes from its randomness: its main steps contain random factors, so in the iterative process the occurrence of events is highly uncertain. It is generally believed that the convergence ability of PSO is not sensitive to the population size, and performance does not degrade much when the population shrinks. When selecting the swarm size, the reliability and computation time of the algorithm should be considered together. Generally speaking, for problems of ordinary scale, the population size of PSO can be 30; for more complex problems, the particle swarm size can be 50; and for multimodal function optimization problems, the number of particles can be 100-300.

In addition, the overflow location map can be further compressed with other compression methods to save payload space, such as the widely used JBIG2 binary legal information compression method. For the embedding of auxiliary information, this chapter uses the classic LSB replacement method to store the auxiliary information in the L-layer IL legal information, and the least significant bits replaced at these positions are embedded into the low-resolution legal information together with the payload.
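
A minimal LSB-replacement sketch (array shapes and the amount of auxiliary data are illustrative assumptions): the auxiliary bits overwrite the least significant bits, and the replaced bits are kept so that they can travel with the payload as described above.

```python
# LSB replacement for storing auxiliary information (illustrative sketch).
import numpy as np

def lsb_embed(carrier, bits):
    """Replace the least significant bit of the first len(bits) samples."""
    out = carrier.copy().ravel()
    original_lsbs = out[:len(bits)] & 1                    # saved so they can be re-embedded
    out[:len(bits)] = (out[:len(bits)] & 0xFE) | bits      # overwrite the LSBs
    return out.reshape(carrier.shape), original_lsbs

def lsb_extract(stego, n_bits):
    return stego.ravel()[:n_bits] & 1

rng = np.random.default_rng(0)
carrier = rng.integers(0, 256, size=(8, 8), dtype=np.uint8)
aux_bits = rng.integers(0, 2, size=16, dtype=np.uint8)

stego, saved_lsbs = lsb_embed(carrier, aux_bits)
assert np.array_equal(lsb_extract(stego, 16), aux_bits)
print("auxiliary bits recovered correctly")
```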

3.3. Information Data Dimensionality Reduction

In the unsupervised learning of information data, due to the lack of prior knowledge, manual labeling is difficult or too costly, so the samples are often unlabeled, and sample categories are identified by some natural optimization algorithm. A representative example is clustering, whose purpose is to group similar samples together without caring much about the category. Semisupervised learning combines supervised and unsupervised learning: some samples in the sample set are labeled and some are not.

The task is the same as in supervised learning, namely to build a model that maps samples to labels; what semisupervised learning needs to solve is how to make comprehensive use of labeled and unlabeled samples. Candidate preprocessing steps include low-pass Gaussian filtering and direct downsampling without filtering. For the specific application needs of legal reporting information hiding, all the low-resolution legal information in the legal information samples specified in Figure 2 is obtained from the adjacent high-resolution legal information through undistorted 2 × 2 downsampling.
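
A short sketch of undistorted 2 × 2 downsampling without filtering (the array is a toy example): one sample is kept from each 2 × 2 neighborhood, so the low-resolution layer is taken directly from the high-resolution one.

```python
# Undistorted 2 x 2 downsampling (no low-pass filtering); toy data.
import numpy as np

high_res = np.arange(64, dtype=np.uint8).reshape(8, 8)
low_res = high_res[::2, ::2]          # keep the top-left factor of each 2 x 2 block
print(low_res.shape)                  # (4, 4)
```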

For each branch node, its weight is multiplied according to the information gain principle. Since the weights sum to 1, the more branch nodes a split produces, the higher the purity may appear, which makes the information entropy criterion favor attributes with a wider range of values. The comparison of SSIM further shows that the method proposed in this chapter outperforms the three comparison methods in every respect and tends to preserve the texture details of the secret legal information structure better. Finally, it is not difficult to see from the experimental results that, compared with the traditional reversible information hiding method based on causal window sequential scanning, the adaptive prediction algorithm based on the autoregressive model proposed in this chapter can achieve larger capacity and lower cost for legal information.

The residual error correction model considers the trend of the residuals, but it does not consider how the sign of the residuals changes in the prediction. In fact, the residuals vary largely at random, and Markov forecasting is suitable for forecasting problems with large random fluctuations: the residuals of the predicted data sequence are divided into two states, a Markov state transition matrix is introduced to predict the residual sign, and a new model is established by combining this with the residual correction model. The optimal exponential law means that a nonnegative smooth discrete function can be transformed into a series of approximately exponential sequences. The first-order accumulation sequence of the original data sequence is used as an intermediate sequence in the optimization; using the accumulation technique to convert the original sequence into a monotonically increasing sequence can effectively reduce the noise of the original fuzzy sequence, so that its law can be identified quickly.
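
A hedged sketch of the first-order accumulation (1-AGO) used by grey models such as GM(1,1); the data series is a made-up example, not the paper's data.

```python
# First-order accumulation and the mean (background) sequence of GM(1,1).
import numpy as np

x0 = np.array([3.2, 3.9, 3.5, 4.4, 4.1, 4.8])   # original (possibly noisy) sequence
x1 = np.cumsum(x0)                               # first-order accumulated sequence (1-AGO)
z1 = 0.5 * (x1[1:] + x1[:-1])                    # mean sequence used when fitting GM(1,1)

print("1-AGO sequence:", x1)                     # monotonically increasing
print("mean sequence :", z1)
```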

3.4. Topology Basicization of Natural Optimization Algorithms

The factors in one part of the natural optimization algorithm are predicted from the factors in another part. The resulting PE (prediction error) sequences are sorted according to the magnitude of their local variance, so PE sequences in smooth regions (i.e., regions where the prediction error is close to zero) are used preferentially to embed messages. In the experimental example, 980 bits are needed to transmit the frequencies of the 70 symbols of the carrier sequence, i.e., f = 980, and the length of the last block is about 4000, i.e., L ≈ 4000; its exact value changes with the distortion constraint. The test computer is configured with an Intel Core i7 processor with a main frequency of 2.0 GHz and 2 GB of memory.
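
A sketch of sorting prediction errors by local variance so that smooth regions are embedded first (the window size, toy predictor, and data are illustrative assumptions):

```python
# Sort prediction errors by local variance: smooth positions come first.
import numpy as np

def local_variance(img, k=3):
    """Variance of each pixel's k x k neighborhood (same-size output)."""
    pad = k // 2
    padded = np.pad(img.astype(float), pad, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, (k, k))
    return windows.var(axis=(-2, -1))

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(16, 16)).astype(float)
pred = np.full_like(img, img.mean())             # toy predictor
pe = (img - pred).ravel()                        # prediction errors
order = np.argsort(local_variance(img).ravel())  # low-variance (smooth) positions first
sorted_pe = pe[order]
print("errors at the smoothest positions:", sorted_pe[:5])
```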

The natural optimization algorithm is implemented in Matlab R2008b. Under these conditions, the average time for one round of the embedding process is 9.58 seconds. It can be seen that the experimental results are quite close to the theoretical capacity bound, which also shows the asymptotic optimality of the coding construction method proposed in this paper.

Since the prediction of legal information factor values plays an extremely important role in the whole process of legal reporting information hiding, high prediction accuracy usually means a high embedded amount of secret information and low quality loss of the secret legal information. To demonstrate the superiority of the adaptive prediction algorithm based on the autoregressive model proposed in this paper, it is compared with three other methods for predicting legal information factor values. Traditional statistical learning theory implicitly assumes the asymptotic regime of infinite sample size, that is, the limiting properties as the number of samples tends to infinity, and most existing learning methods are based on this assumption. However, in practical problems the number of samples is often limited, and many methods then struggle to achieve ideal results. A decision tree consists of nodes and directed edges. The nodes are divided into internal nodes and leaf nodes: an internal node represents a feature or attribute of the sample, and a leaf node represents the category of the sample. Splitting proceeds from the root node down to the lowest internal nodes; by continuously splitting each node, the samples contained in each branch node belong to the same category as much as possible, that is, the "purity" of the nodes in Figure 3 is as high as possible.

Among them, in the least squares support vector machine, the regularization parameter γ and the kernel width are the two parameters that must be tuned. Since the two act as a whole, their values directly determine the training and generalization performance of the SVM; however, in theory, the influence of these two parameters on the performance of the least squares support vector machine is not necessarily related. Therefore, in applications, selecting the regularization parameter γ and the kernel width becomes a major problem: there is no feasible parameter tuning method, and mostly trial and error combined with experience is used. In the implementation of the above coding construction method, we use the arithmetic coding algorithm as the entropy code. In the simulation experiment, an 8-bit gray-value carrier signal sequence of length 10^6 is sampled from a Laplace distribution with a mean value of 127.5 and a scale factor of b = 5, and the signals whose statistical frequency is less than 100 are then deleted.
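
A hedged sketch of generating the simulated carrier described above: 8-bit samples drawn from a Laplace distribution with mean 127.5 and scale 5, with rarely occurring symbols removed. The length and frequency threshold follow the text; the rounding and clipping details are assumptions.

```python
# Simulated 8-bit carrier sequence from a Laplace distribution (illustrative details).
import numpy as np

rng = np.random.default_rng(0)
raw = rng.laplace(loc=127.5, scale=5.0, size=10**6)
carrier = np.clip(np.rint(raw), 0, 255).astype(np.uint8)   # quantize to 8-bit gray values

values, counts = np.unique(carrier, return_counts=True)
keep = values[counts >= 100]                                # drop symbols seen < 100 times
carrier = carrier[np.isin(carrier, keep)]

print("alphabet size after pruning:", keep.size)
print("sequence length after pruning:", carrier.size)
```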

To avoid factor value overflow and reduce the degradation of the secret legal information, the factor values of all pixels that may overflow remain unchanged before and after the legal reporting information is hidden. The two-dimensional legal information M is used as an overflow location map that records all possible overflow factors, and it is embedded into the carrier legal information together with the secret information as part of the payload. If the expansion and embedding direction of the hidden information, i.e., the direction of the prediction error histogram, is chosen unreasonably, corresponding to the subtraction of information bits, a large number of potential overflow positions will be generated, resulting in a surge in the size of the compressed overflow location bit stream M-c and affecting the amount that can be embedded.

In fact, as the independent variable in Figure 4 tends to infinity, the effect of the last sequence block can be ignored, so the average embedding rate and distortion can be estimated on a single sequence block of length K. The subsequent coding construction process takes less time than the preceding step of solving the optimal stego-signal marginal distribution pY(y), which shows that the running time of the entire embedding algorithm is determined by that preceding step of estimating the optimal distribution pY. It can be seen that the binary-search BFI algorithm performs poorly, and the method we propose is more efficient and practical for the legal reporting information hiding problem in actual scenarios.

4.1. Natural Optimization Algorithm Factor Evaluation

In high-dimensional space, the inner product computation of the natural optimization algorithm entails a huge amount of calculation, and the cost in time and space is enormous; a kernel function is therefore needed. Its role is to use a function K(x, z) evaluated in the low-dimensional space that equals the inner product in the high-dimensional space, thus avoiding the complex nonlinear transformation. Similarly, for the DLS-PF algorithm and the PLS-PF algorithm, the only difference is that the matrix A22 is the sum of a diagonal matrix and a highly sparse matrix; this slight change is negligible for the complexity of constructing the matrix S, so the complexity of the iterative loop inside Newton's method is almost the same.
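
A worked check of the kernel trick on a toy 2-D example (the kernel choice is illustrative): the polynomial kernel K(x, z) = (x · z)^2 evaluated in the low-dimensional space equals the inner product of the explicit quadratic feature maps in the higher-dimensional space.

```python
# Kernel trick check: low-dimensional kernel value equals the high-dimensional inner product.
import numpy as np

def phi(v):
    """Explicit feature map for the homogeneous quadratic kernel in 2-D."""
    x1, x2 = v
    return np.array([x1 * x1, x2 * x2, np.sqrt(2) * x1 * x2])

x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])

k_low = np.dot(x, z) ** 2            # kernel evaluated without leaving 2-D
k_high = np.dot(phi(x), phi(z))      # inner product in the mapped 3-D space

print(k_low, k_high)                 # both equal 121.0
```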

By analyzing the multifeature fusion technology of legal information, this paper designs and uses SIFT and HOG models to model the texture, shape, and color features of legal information. To maintain the independence of each feature, this paper first performs feature extraction and modeling separately for each single feature and finally fuses them into a complete legal information feature descriptor. In the feature quantization step, this paper uses the visual bag-of-features model BOF to model the features, LLC to encode the legal information features locally and linearly, and then pooling and spatial pyramid models to describe the legal information at a higher level. To verify the feature extraction performance of the natural optimization algorithm proposed in this paper, relevant performance tests and analyses are conducted on standard databases.

Conversely, if the variance of the factor corresponding to the support region is greater than or equal to Te, it is classified into the texture region and predicted using another autoregressive model applied to the texture region prediction. The threshold value Te for determining the legal information factor attribute is determined in an optimal way. This paper uses an exhaustive search method in the range of [0, 250] to find the output minimum average prediction error T as the legal information factor attribute threshold value. From the actual experimental test process, this exhaustive search process is fast, and the calculation time is very short. For some special legal information, such as the carrier legal information whose histogram distribution is concentrated around 0 or 255, the optimal choice of the hidden information expansion and embedding direction will significantly affect the final information hiding performance of the natural optimization algorithm.
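
A hedged sketch of the exhaustive threshold search over [0, 250]: the two region predictors below are simple stand-ins (neighborhood averages), whereas the actual method uses two autoregressive models; the toy image is also an assumption.

```python
# Exhaustive search for the classification threshold Te (stand-in predictors).
import numpy as np

def mean_abs_prediction_error(img, Te):
    """Average |prediction error| when factors are split by support-region variance."""
    errors = []
    for i in range(1, img.shape[0] - 1):
        for j in range(1, img.shape[1] - 1):
            support = img[i - 1:i + 2, j - 1:j + 2].astype(float)
            var = support.var()
            # stand-in predictors: 4-neighbour mean (smooth) vs full-window mean (texture)
            if var < Te:
                pred = (img[i - 1, j] + img[i + 1, j] + img[i, j - 1] + img[i, j + 1]) / 4.0
            else:
                pred = support.mean()
            errors.append(abs(float(img[i, j]) - pred))
    return np.mean(errors)

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(16, 16))
best_Te = min(range(0, 251), key=lambda t: mean_abs_prediction_error(img, t))
print("optimal threshold Te:", best_Te)
```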

To evaluate the modified DLS-PF algorithm in Table 1 under the constraint of an actually given distortion D, we compare its performance on the DLS problem with the original PF algorithm using the same input parameter. Considering the scenario N = B, we first run the PF algorithm and then use the D it obtains. We take 15 sets of experimental values from small to large embedding rates and average the results; they show that the running time of the DLS-PF algorithm is on average almost twice that of the PF algorithm. Since a small part of the data is inevitably missing in the sample set, missing values must be handled in order to reduce their impact on the results and maximize the effect of data optimization. The processing principle of this paper is as follows: if a sample is missing more than one-third of its features, that is, 4 or more missing values, the sample is deleted; if fewer than 4 are missing, the values are filled in, and the filling method is to use the mean of the feature.
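
A sketch of this missing-value rule (the toy DataFrame is an assumption): a sample with 4 or more missing features is dropped, and the remaining gaps are filled with the column mean.

```python
# Missing-value handling: drop heavily incomplete samples, mean-fill the rest.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(6, 12)))        # 12 features per sample (toy data)
df.iloc[0, :5] = np.nan                            # 5 missing -> sample is dropped
df.iloc[1, :2] = np.nan                            # 2 missing -> mean-filled

kept = df[df.isna().sum(axis=1) < 4]               # delete samples with >= 4 missing values
filled = kept.fillna(kept.mean())                  # impute the rest with feature means

print("samples kept:", len(filled), "remaining NaNs:", int(filled.isna().sum().sum()))
```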

4.2. Analysis of Legal Reporting Information Demands

The methods of obtaining legal reporting document data mainly include manual visits and research, web page data crawling, and public data download. Manual visit and research means visiting the people's courts at all levels, conducting on-the-spot research and investigation, and entering court departments to obtain data; the advantages are that it is simple and direct and that the data are true and comprehensive, while the disadvantages are that it depends on the cooperation of the courts and involves relatively large uncertainties. The web page data crawling method refers to cyclically traversing and grabbing document data from the official websites of the various courts by writing a network data collection program (crawler).

Its disadvantage is that, due to the diversity of the HTML source code of each site, a corresponding crawler program must be written for each, resulting in high technical costs; it is also subject to the anticrawling strategy of the target website, and the inconsistent formats of the document data bring additional cost. Public data download refers to the free or paid download of document data from the national legal reporting document website or other open-source document websites. Its advantages are that there is no risk, the acquisition speed is fast, and the data scale is large; it is, however, subject to the download rules of the target website, such as IP restrictions and limits on download volume.

In essence, the above two types of problems form a dual relationship; that is, for a specific value, the optimal solution of the first type of problem is also the optimal solution of the second, and vice versa. Based on the convexity and smoothness of their models, many convex optimization algorithms can be used to find their optimal values, such as the gradient projection method and the interior point method. The PF algorithm shows a relatively constant convergence speed for problems of different data scales, and its main cost lies in solving the linear equations in the Newton iteration. Relatively speaking, the complexity of a single BFI iteration is low, but its convergence is slow, especially for problems with large B values. For a specific problem, the algorithms in Table 2 actually perform better at low embedding rates, and their performance decreases as the embedding rate increases.

In this process, three different frequency values are used to construct 12 Gabor filter kernels in four different orientations. An edge point of legal information is defined as a point where the grayscale structure of the legal information changes abruptly, and on this basis the edge points of the legal information can be sharpened. The sharpening method based on gradient mutation is mainly realized according to how much the gray value f(x, y) changes on either side of the edge in the horizontal and vertical directions; that is, a threshold is set according to the size of the change to determine whether a point is an edge factor.
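
A hedged sketch of the 12-kernel Gabor bank (3 frequencies × 4 orientations); the kernel size, sigma, and wavelengths are illustrative assumptions, not the values used in the experiments.

```python
# Gabor filter bank: 3 frequencies x 4 orientations = 12 kernels (illustrative parameters).
import numpy as np
import cv2

wavelengths = [4.0, 8.0, 16.0]                       # three frequency settings
orientations = [0, np.pi / 4, np.pi / 2, 3 * np.pi / 4]

kernels = [
    cv2.getGaborKernel(ksize=(21, 21), sigma=4.0, theta=theta,
                       lambd=lambd, gamma=0.5, psi=0)
    for lambd in wavelengths
    for theta in orientations
]
print("number of Gabor kernels:", len(kernels))      # 12

# Filtering a toy image with each kernel gives one texture response map per kernel.
img = np.random.default_rng(0).integers(0, 256, size=(64, 64)).astype(np.float32)
responses = [cv2.filter2D(img, cv2.CV_32F, k) for k in kernels]
print("response map shape:", responses[0].shape)
```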

In this template, the thresholds in the horizontal and vertical directions are set to 1, the diagonal to 1/2, and the other directions to 1/8. With this setting, feature detection in multiple directions can be achieved, and finally the second derivative is used to determine the zero crossing between the positive and negative peaks, which decides whether a factor point is an edge point of the picture. In the proposed algorithm, the experimental parameters mainly influence the CBIR retrieval performance through the choice of the visual model size and the threshold of the pseudo-label estimation function. In this paper, the relevant parameters are analyzed experimentally on two open-source databases, and the optimal values are taken as the parameters for the experiments.

4.3. Feature Selection of Information Platform

Each information platform feature cluster center represents a visual word in the BoF codebook, and the BoF modeling process describes all the feature points of an image by their positions in the codebook. This modeling method can greatly reduce the dimension of legal information features and is robust. The shortcomings of BoF modeling with global features are also obvious: since the model represents legal information features through statistical histograms, it does not describe the spatial location distribution in the original image. The core of the feedback mechanism is to move the query vector gradually closer to the optimal query vector during the feedback process, so that the feature description of the image has the highest similarity with the features of its related documents and low similarity with unrelated ones.

Curve smoothing methods include exponential smoothing (first-order, second-order, and third-order), filtering based on a differential threshold, and the moving average. Common region-based shape description methods include invariant moments, orthogonal moments, generic Fourier descriptors, and the corner-radius transform. The basic idea of dimensionality reduction here is to map high-dimensional data to a low-dimensional space through a reasonable linear transformation while keeping the variance of the data along the projected dimensions as large as possible, so as to reduce the dimensionality while retaining as much of the original data as possible. Current legal reporting information hiding is mainly divided into two major steps: (1) first, transform the input carrier legal information and extract from it the carrier feature signal sequence to be embedded; usually the transformation reduces the autocorrelation of the carrier legal information signal itself, and prediction technology is generally used to obtain the prediction error carrier feature sequence; (2) after the carrier feature (prediction error) sequence is obtained, the specific reversible information hiding process is performed and the secret message is embedded in the carrier feature sequence.
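
A minimal sketch of the variance-preserving linear projection described above, using PCA on toy data (the data and the number of retained components are assumptions):

```python
# PCA: linear projection that keeps the directions of largest variance (toy data).
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))                   # 200 samples, 50-dimensional features

pca = PCA(n_components=10)                       # keep the 10 highest-variance directions
X_low = pca.fit_transform(X)

print("reduced shape:", X_low.shape)
print("variance retained: %.2f" % pca.explained_variance_ratio_.sum())
```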

The feedback technology based on weight adjustment in Figure 5 is mainly used in legal information retrieval systems that fuse multiple features, where the weight values are used to optimize the query. Because different low-level features emphasize different aspects of the description, a single low-level feature often cannot provide a complete description of the entire image. At the same time, how to retain more low-level information during legal information feature quantization and how to prevent overfitting in the feature representation process are the difficulties of legal information feature representation.

5.1. Naturally Optimized Data Preprocessing

Because the value ranges of the selected information platform features differ greatly, some being continuous variables and some discrete, and the dimensions of different features are also inconsistent, the natural optimization algorithm may be dominated by features with wide value ranges if the data are not processed, making the model inaccurate. The method chosen in this paper is Z-score standardization, also known as standard deviation standardization. The hidden channel model of legal reporting information not only hopes to embed as many message bits as possible but also needs the receiver to reconstruct the original carrier signal completely losslessly. If it is based on the standard noise-free channel theoretical model and the receiver is required to restore the original carrier signal without any loss, the channel model becomes the theoretical model of the legal reporting information hidden channel.
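
A short sketch of Z-score standardization (the synthetic features are assumptions): each feature is shifted to zero mean and scaled to unit variance so that no feature dominates merely because of its value range.

```python
# Z-score (standard deviation) standardization on mixed-range features (toy data).
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = np.column_stack([
    rng.uniform(0, 1, 100),          # a small-range continuous feature
    rng.uniform(0, 10_000, 100),     # a wide-range continuous feature
    rng.integers(0, 5, 100),         # a discrete feature
]).astype(float)

X_std = StandardScaler().fit_transform(X)        # z = (x - mean) / std, per column
print("column means ~ 0:", X_std.mean(axis=0).round(3))
print("column stds  ~ 1:", X_std.std(axis=0).round(3))
```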

In this paper, color, texture, and shape features are selected to describe the legal information. To increase the robustness of the legal information feature descriptor and retain more local features of the original legal information, this paper performs local linear encoding and pyramid spatial representation on the legal information features based on the obtained low-level features. The value of σ determines the roughness of the legal information after Gaussian filtering: the larger σ is, the rougher (lower resolution) the processed legal information and the fewer details can be displayed; conversely, a smaller σ retains more detail.

This chapter proposes a method of hiding legal whistleblower information based on adaptive prediction with an autoregressive model, which uses factor classification and the autoregressive model to predict factor values and improves on the traditional legal whistleblower information hiding algorithm based on causal window prediction error expansion. Owing to the improved prediction accuracy of the legal information factor values, the capacity-distortion performance of the information hiding algorithm in Figure 6 is greatly improved.

Based on the recursive coding structure of the rate-distortion model of legal whistleblower information hiding, the paper proposes optimality criteria for evaluating prediction methods for legal whistleblower information hiding and, to a certain extent, establishes a bridge between legal whistleblower information hiding and lossless data compression. This paper proposes a minimum-code-rate prediction method for legal reporting information hiding, so that the front-end prediction part and the back-end embedded coding part can be integrated in essence and better overall embedding performance can be achieved. Combined with the optimal recursive coding structure proposed in the paper, and compared with the traditional prediction methods, the minimum-code-rate prediction method achieves an average embedding performance improvement of 1.5 dB to 2.0 dB on general natural legal information carrier sequences.

Area under the curve (AUC) is the area enclosed between the ROC curve and the coordinate axes below and to its right. The ROC curve often only gives an intuitive, qualitative impression of the quality of each model from the legal information; the AUC is its quantitative treatment, and the larger the AUC, the better the performance. Classifiers are usually graded according to the AUC range: AUC = 1, 0.85 ≤ AUC < 1, 0.7 ≤ AUC < 0.85, 0.5 ≤ AUC < 0.7, AUC = 0.5, and AUC < 0.5 represent, in turn, perfect, very good, fair, low, no effect, and poor. Another disadvantage of the SIFT feature detection algorithm is that it describes and analyzes legal information by its feature points, so it can only describe the SIFT feature values at stable feature points of the legal information. When the texture features of the legal information are not obvious, or external influences such as insufficient illumination are large, the image matching efficiency is greatly reduced.
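
A short sketch of computing the AUC and mapping it to the verbal grades above (the labels and scores are synthetic examples):

```python
# AUC computation and grading (synthetic labels/scores).
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=500)
y_score = y_true * 0.6 + rng.normal(0, 0.4, size=500)   # noisy but informative scores

auc = roc_auc_score(y_true, y_score)

def grade(a):
    if a == 1.0:   return "perfect"
    if a >= 0.85:  return "very good"
    if a >= 0.7:   return "fair"
    if a == 0.5:   return "no effect"
    if a > 0.5:    return "low"
    return "poor"

print(f"AUC = {auc:.3f} ({grade(auc)})")
```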

5.2. Simulation Implementation of Legal Reporting Information Platform

In the LTP segmentation of legal reporting information, two general methods are effectively combined, so that the natural optimization algorithm can not only make full use of the good disambiguation ability of machine learning but also flexibly integrate external resources such as legal reporting information sets. In the word segmentation module of LTP, word-based sequence labeling is the basis of the task modeling: for each input sentence, the word sequence is first divided, and then the model labels each word according to the existing algorithm. The optimization strategy also adds special recognition rules, such as for English and Uniform Resource Identifiers, to cope with massive texts, and uses common markers such as spaces or dashes as clues for constructing legal reporting information. The mutual information between words in Figure 7 and the richness of the context are counted on large-scale unlabeled text data.

The method based on integer transformation provides another effective means for legal reporting information hiding. This type of method often has low computational complexity, but because integer transformation usually has low embedding efficiency and makes it inconvenient to control the embedding amount, it is generally only suitable for large-capacity legal reporting information hiding. Using a larger region for factor value prediction yields more accurate legal information factor prediction values. At the same time, characteristics of the human visual system are introduced, and the Just Noticeable Difference (JND) criterion is applied to the original embedding rules: by distinguishing the smooth and texture areas of the image, the subjective quality of the secret legal information is improved.

The distributed crawler adopts the Redis-based Scrapy framework (Scrapy-Redis), and a MongoDB database is used for text storage. A total of four hosts are used: one master server and three slave servers. The master server manages the Redis database and distributes download tasks; the slave servers deploy Scrapy to crawl web pages, use XPath to parse the page data, and finally store the parsed results that satisfy the warehousing rules in the same MongoDB database. In order to crawl the website data normally, the proxy IP mechanism provided by the project is used: free proxy IP websites are crawled first, the proxy IPs are verified, and those that pass verification are stored in the database; once the maximum number of IPs is reached, crawling of free proxy IPs is suspended. After a certain period of time, the IPs are polled to verify their validity, invalid IPs are deleted, and free IPs are crawled again; if the number of IPs in the IP database drops to zero, the operation is repeated.
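
A minimal sketch of one slave-side Scrapy-Redis spider. The site URL, XPath expressions, and field names below are hypothetical placeholders, not the actual court websites or parsing rules used in this paper.

```python
# Slave-side Scrapy-Redis spider sketch (placeholders for site-specific details).
import scrapy
from scrapy_redis.spiders import RedisSpider

class CourtDocumentSpider(RedisSpider):
    name = "court_documents"
    # The master pushes start URLs into this Redis list; slaves pop from it,
    # so download tasks are distributed without conflicts.
    redis_key = "court_documents:start_urls"

    def parse(self, response):
        # Parse one document page with XPath (selectors are illustrative).
        yield {
            "title": response.xpath("//h1/text()").get(),
            "case_number": response.xpath("//*[@class='case-no']/text()").get(),
            "body": " ".join(response.xpath("//div[@class='content']//text()").getall()),
            "url": response.url,
        }
        # Follow pagination links back into the shared queue (illustrative selector).
        for href in response.xpath("//a[@class='next']/@href").getall():
            yield scrapy.Request(response.urljoin(href), callback=self.parse)
```

In a matching settings module, the scheduler and duplicate filter would typically be pointed at the scrapy_redis implementations and REDIS_URL at the master's Redis instance, with a MongoDB item pipeline handling storage.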

First, by using distributed crawling technology, the crawling efficiency can be improved by horizontally expanding host resources, and only a small number of crawling, parsing, and storage rules need to be written when new websites are added. Second, the functions of the crawler are highly modularized, which keeps the architecture in Table 3 loosely coupled and makes it stable and maintainable. Third, after the crawler is started through this deployment, its working status can be monitored in real time; in the case of a midway interruption, the error position can be located accurately from the monitoring log and crawling can resume from the breakpoint. In order to ensure that the factor values of the support regions remain unchanged before and after embedding, so that consistent current factor prediction values are obtained, the embedding scan order must be specified as top to bottom and left to right.

At the secret information extraction end, secret information extraction and original legal information recovery are performed in the reverse scanning order. Since the support area is taken from the lower right of the current factor, the rightmost column and the last row of factors of the carrier legal information (the marked square part) cannot embed secret information through the causal window and serve only as the support area for factor value prediction. By setting the Scrapy-Redis address on each slave server to the address of the master server, multiple slave servers can obtain URLs from the master's Redis database; because of Scrapy-Redis's own queue mechanism, the links obtained by the slave servers do not conflict with each other. In this way, after each slave server completes its fetching task, the results are aggregated into the MongoDB database server. Scrapy can be configured in its settings not to close automatically after crawling but to keep asking whether there is a new URL in the queue.

5.3. Example Application and Analysis

The codebook sizes used in the retrieval experiments on databases of thousands of legal information items include 200, 500, 800, and 1024. Considering that the appropriate BoF codebook size is affected by factors such as the complexity of the image content and the feature selection, this paper specifically verifies the influence of the BoF size on the experimental results: the Corel-1k database is tested with seven different BoF sizes of 50, 100, 200, 500, 800, 1024, and 2048. To obtain more reliable results, the experiment abandons all user feedback and relies solely on Euclidean-distance similarity comparison to obtain the retrieval results. The secret information bit b is determined by the embedded secret information, which is random and uncontrollable, while the prediction error e is determined by the legal information factor value prediction method. The prediction error e can be reduced by improving the prediction accuracy of the legal information factor values, thereby reducing the change of the factor values and hence the change of the carrier legal information. From this point of view, the prediction of legal information factor values is the key part of the legal reporting information hiding method based on prediction error expansion.
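
A worked sketch of prediction error expansion for a single factor value: the error e = x - pred is expanded to 2e + b to carry one bit, and both the bit and the original value are recovered exactly at the extraction end. The predictor output and numbers below are examples, not values from the paper.

```python
# Prediction error expansion (PEE) for one factor value (worked example).
def pee_embed(x, pred, b):
    """Expand the prediction error e = x - pred and embed one bit b."""
    e = x - pred
    return pred + 2 * e + b          # expanded error 2e + b replaces e

def pee_extract(x_stego, pred):
    """Recover the bit and the original value from the marked factor."""
    e_expanded = x_stego - pred
    b = e_expanded % 2               # the embedded bit
    e = e_expanded // 2              # original prediction error (floor division)
    return b, pred + e               # recovered bit and original factor value

x, pred, bit = 118, 120, 1           # original value, predicted value, secret bit
x_marked = pee_embed(x, pred, bit)   # 120 + 2*(-2) + 1 = 117
print(pee_extract(x_marked, pred))   # (1, 118) -> bit and original value recovered
```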

This paper has tried the above three methods. First, it visited and investigated an intermediate people's court in a certain district; after being stationed there for a period of time, the amount of document data obtained was small, fewer than 1,000 pieces, which did not meet the requirements of this paper. For web crawling, because the data format of each website is different, with some in text form and some in image form, the processing is cumbersome, so this method was also abandoned. Finally, although the national legal reporting documents website provides free downloads, the download link was invalid due to an official server problem and could not be used.

Legal reporting document data: the data are in a unified .xlsx standard format with detailed content and meet the requirements of this paper for data volume and content. There are many types of legal reporting documents, including procuratorate enforcement documents, ordinary civil cases, and criminal cases, and the parties are divided into individual-to-individual, individual-to-group, and group-to-group. This paper appropriately simplifies this and selects ordinary civil cases between individual plaintiffs and individual defendants as the research object.

The former uses the causal window to predict the embedded factors of the entire carrier legal information according to a certain scanning order, obtains the prediction errors in Figure 8, and expands them to embed information. The latter divides the carrier legal information into blocks and, through multiple rounds of embedding, embeds into one factor subset of the carrier legal information per round while keeping the factor values of the remaining subsets unchanged, until all factor subsets have been embedded. The embedding/extraction framework of the secret information directly affects the choice of factor value prediction method: for the causal window sequential scanning framework, a directional semienclosed prediction window can be used, while for the block multiround embedding framework, an omnidirectional semienclosed or fully enclosing prediction window can be used.

It can be concluded that the basic idea of legal reporting information hiding technology based on histogram shifting is as follows: first, find the peak point and the zero point in the histogram domain of the carrier legal information; then, shift the factors between them as a whole toward the zero point, so as to make room at the position adjacent to the histogram peak; finally, combined with the bit values of the secret information, expand the factor values at the histogram peak to embed the secret information. It can be seen that the legal reporting information hiding method based on prediction error histogram shifting is a natural extension of the reversible information hiding method based on histogram shifting.
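
A hedged sketch of this peak-and-zero procedure on a 1-D gray-value array (simplified: it assumes the zero point lies above the peak and omits the bookkeeping needed to transmit the peak/zero positions to the receiver):

```python
# Histogram-shifting embedding sketch (simplified assumptions, toy data).
import numpy as np

def hs_embed(carrier, bits):
    vals, counts = np.unique(carrier, return_counts=True)
    peak = vals[np.argmax(counts)]                         # most frequent value
    zero_candidates = np.setdiff1d(np.arange(peak + 1, 256), vals)
    zero = zero_candidates[0]                              # an empty bin above the peak
    out = carrier.astype(np.int32).copy()
    out[(out > peak) & (out < zero)] += 1                  # shift to free the bin at peak + 1
    bit_iter = iter(bits)
    for i in np.flatnonzero(out == peak):                  # embed at the peak positions
        try:
            out[i] += next(bit_iter)                       # bit 1 -> peak + 1, bit 0 -> peak
        except StopIteration:
            break
    return out.astype(np.uint8), peak, zero

rng = np.random.default_rng(0)
carrier = rng.integers(100, 140, size=1000).astype(np.uint8)
stego, peak, zero = hs_embed(carrier, rng.integers(0, 2, size=20))
print("peak:", peak, "zero point:", zero)
```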

6. Conclusion

In this paper, the natural optimization algorithm is used to model legal whistleblower documents, and the content of the documents is mapped into a K-dimensional vector space in which semantically similar documents lie close together, so as to provide analogous similar cases that comprehensively consider the context of a pending case and to provide semantically similar cited cases for the litigation process. Then, based on sentence vectors, high-dimensional vector aggregation is performed to group cases with similar semantics into the same category, and through topic analysis the topic words that reflect the classification at the semantic level are extracted as the search words for this type of case, so as to establish a semantic-level classification portrait. Among traditional word vector representations, there are representations based on legal reporting information, where the quality of the word representation has a great influence and the construction of the legal reporting information involves a huge workload, as well as one-hot encoding based on the bag-of-words model. It can be seen that the legal reporting information hiding method based on prediction error expansion uses the embedding amount to control the threshold t and simultaneously selects multiple prediction errors for factor shifting, expansion, and secret information embedding, so that a higher hiding capacity can be obtained with a single embedding. Based on the rate-distortion model of legal whistleblower information hiding, this paper further extends the fast-solving algorithm and the recursive coding structure, making them suitable for legal whistleblower information hiding under more general distortion metrics. The fast-solving algorithm of the rate-distortion model and the extended recursive coding construction proposed in this paper can be applied to legal reporting information hiding under any distortion metric, such as reversible information hiding for binary legal information or for two-dimensional or even four-dimensional signals, which gives the rate-distortion theoretical model of legal reporting information hiding more general application value [25].

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported by College of Arts and Sciences, National University of Defense Technology.