Abstract

As a means of regulating people’s conduct, law has a close relationship with text, and legal text data has been growing exponentially. Managing and classifying such huge volumes of text has become a major challenge. The PDE-based image segmentation algorithm is an effective natural language processing method for text classification management. Based on the study of image segmentation algorithms and the theory of legal case text classification, an image segmentation model based on partial differential equations is proposed, in which diffusion acts on the level set function indirectly through an auxiliary function. The software architecture of the image segmentation algorithm text classification system is designed using computer technology and a three-layer architecture model, which improves the classification ability of the text classification algorithm. The validity of the PDE image segmentation model is verified by experiments. The experimental results show that the model completes legal case text classification, that each functional module of the legal case text classification system performs well, and that the efficiency and quality of legal case text classification are improved.

1. Introduction

Image segmentation is the basis of computer vision and other high-level image processing and is also a key step of image recognition and registration. It separates the target from a complex background according to characteristics such as gray scale, texture, or shape in the image [1]. Image segmentation methods can be divided into two categories according to the design principle of the model and the information it relies on: one category is based on image information, morphology, topology, partial differential equations, and so on; the other is based on deep learning, including segmentation methods based on region selection, on RNNs, and on upsampling. The partial differential equation method is one of the more successful approaches. According to the image characteristics, an energy functional is defined for the evolution curve; minimizing this functional yields a partial differential equation, and the numerical solution of that equation is the desired segmentation curve [2]. This kind of method can effectively handle changes in the topology of the evolution curve, operates directly on the given image, and does not require large amounts of training data, repeated tuning, or network learning.

With the advent of the intelligent information age, the real-time requirements on image segmentation speed are increasing day by day, and the accuracy and speed of segmentation directly affect the subsequent image processing. Ahmad et al. proposed a contour model that defines an evolution curve in parametric form; the curve gradually converges to the target boundary under the joint action of internal forces derived from the geometric features of the curve itself and external forces derived from the image information [3]. Aziz et al. use the level set to represent the snake model: the curve is expressed implicitly as a level set of a high-dimensional surface, so the evolution of the curve can be described through the evolution of the level set function over time, and changes in topological structure can be handled [4]. The active contour model can be divided into edge-based and region-based models according to the image features used. The edge-based model uses image gradient information to control the velocity of the curve: evolution is faster where the image gradient is small and slower where the image gradient changes sharply, so that the evolution curve stops at the target boundary. According to how the partial differential equation is derived, active contour models can be divided into two types: partial differential equation models designed directly from evolution theory and partial differential equation models derived from an energy functional. The former introduces a time variable and designs the equation by analyzing how the image changes during segmentation, while the latter obtains an energy functional of the objective by analyzing the properties of the image to be segmented and transforms image segmentation into an energy functional minimization problem under certain constraints.

The partial differential equation model of image segmentation can be briefly summarized as follows: define an initial curve in the image, whose evolution direction is jointly controlled by internal and external energy terms [5]. The properties of the curve and its evolution can be described by an energy functional, and minimizing this functional stops the evolution curve at the target boundary, achieving segmentation. In order to deal with topological changes of the evolution curve, the level set method is often used to implement the image segmentation model numerically. Ruthotto et al. implicitly represent the evolving interface as the zero level set of a higher-dimensional level set function; the evolution of the level set function drives the evolution of the curve, and the level set method can effectively handle changes in topological structure [6]. To speed up the evolution of the level set function and improve segmentation speed, the narrow band method and the fast marching method are used to accelerate the level set evolution in the numerical calculation of the geometric active contour model. To further improve segmentation speed, Wang et al. proposed a convex energy functional for image segmentation and implemented it numerically using an alternating minimization method and an operator splitting technique; under certain conditions, the alternating minimization algorithm is convergent [7]. Yang et al. proposed an image segmentation algorithm based on the global positive radial basis function by combining the global positive radial basis function method with partial differential equations [8]. This algorithm integrates the global positive radial basis function difference method into the level set image segmentation model; the resulting difference function has high accuracy and smoothness and overcomes the shortcomings of traditional algorithms, such as repeated reinitialization and sensitivity to the initial contour position.

2.1. Relevant Theories of Legal Texts

As a means of regulating people’s behavior, law is of great importance both socially and personally. There is a close relationship between law and text: without text as the carrier, law cannot be disseminated and published [9]. Compared with other types of texts, legal texts are more rigorous and normative, with stronger regularity than ordinary texts, which makes them ideal objects for analysis by computer technology. The text of legal cases has the following characteristics. The paragraph structure of legal cases is relatively fixed: the first part generally includes the case name, case type, case number, court, names of the parties, and cause of action; the main text generally elaborates the facts, handling decisions, and reasons of the case; and the tail usually includes relevant items, dates, and notes. The text of legal cases generally requires rigorous language and accurate, specific content. Legal cases can be divided into different categories according to different attributes: by case type, the published judgment documents of the people’s courts can be divided into criminal, civil, administrative, enforcement, and compensation cases; by document type, they can be divided into judgments, rulings, mediation documents, and decisions; and by trial procedure, they can be divided into first instance, second instance, retrial, and other procedures.

2.2. Related Theories of Text Classification

Text classification is the process of assigning texts to categories using computer technology, and it mainly includes two kinds of methods: statistics-based and rule-based. In the statistics-based approach, the general form of the statistical model, the feature dimensions, and the labeling system do not change as the corpus grows or shrinks [10]. Once these factors are fixed, only some parameters need to be adjusted. The whole process does not require expert participation, so the level of automation is high and classification is fast, which makes it suitable for rapid prototyping. Rules, by contrast, are derived from a large number of linguistic facts and require experts to summarize domain knowledge. Such methods require human involvement, so the knowledge obtained is readable and understandable. Although logically simple, rule-based methods can sometimes easily solve problems that statistics-based methods cannot. As data for court applications, users have high requirements for accuracy and precision; if the results are inaccurate, the data loses its application value in specific cases.
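To illustrate the contrast between the two approaches, the following sketch pairs a hand-written keyword-rule classifier with a simple statistical bag-of-words classifier. The keyword lists, toy texts, and labels are illustrative assumptions (real legal texts would be Chinese and word-segmented first); this is not the system's actual rule set.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy English stand-ins for legal case texts; real data would be segmented Chinese.
train_texts = [
    "the defendant was sentenced for theft",
    "the accused was convicted of assault",
    "the plaintiff claims damages for breach of contract",
    "the parties dispute the lease agreement",
]
train_labels = ["criminal", "criminal", "civil", "civil"]

# Rule-based method: hand-written keyword rules summarized by domain experts.
RULES = {"criminal": {"defendant", "accused", "sentenced", "convicted", "theft"},
         "civil": {"plaintiff", "contract", "damages", "lease"}}

def rule_classify(text: str) -> str:
    words = set(text.split())
    # Pick the category whose keyword set overlaps the text the most.
    return max(RULES, key=lambda label: len(RULES[label] & words))

# Statistics-based method: bag-of-words features plus a Naive Bayes classifier.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(train_texts, train_labels)

query = "the accused signed a contract before the theft"
print("rule-based:  ", rule_classify(query))
print("statistical: ", model.predict([query])[0])
```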

2.3. Level Set Representation of a Plane Closed Curve

Curve evolution is the basic theory of image segmentation using partial differential equations. Its central idea is to study the deformation of curves over time through their geometric features; the unit normal vector and the curvature are the most commonly used geometric parameters [11]. A plane closed curve C is commonly expressed implicitly as

$$C = \{(x, y) \mid u(x, y) = c\},$$

where u(x, y) is a two-dimensional function. That is, the curve C is the set of points satisfying u(x, y) = c; the curve C is then called a level set of u, and u is the embedding function of the curve C. When the constant c = 0, it is called the zero level set.

If the directional derivative of u is taken along the tangent direction T of the level set at a certain point, then u remains unchanged along the level set, and therefore

$$\frac{\partial u}{\partial T} = u_x \cos\theta + u_y \sin\theta = 0,$$

where θ represents the angle between the tangent vector T and the x-axis. The gradient vector of u is

$$\nabla u = (u_x, u_y).$$

The gradient vector is perpendicular to the tangent vector of the level set, that is, parallel to the normal vector of the level set, and it always points in the direction of increasing u. It follows that the unit normal vector of the level set can be expressed as

$$N = \frac{\nabla u}{|\nabla u|}.$$

Since u(x, y) is positive inside the zero level set and negative outside it, the gradient points from the outside toward the inside of the curve, so N always points toward the interior of the closed curve.

When discussing the level set method of curve evolution, it is stipulated that the curve C we define is the zero level set of the embedding function u, with u > 0 inside the zero level set and u < 0 outside it. The curvature of the level sets of the embedding function u is therefore

$$\kappa = \nabla \cdot \left( \frac{\nabla u}{|\nabla u|} \right) = \frac{u_{xx} u_y^2 - 2 u_x u_y u_{xy} + u_{yy} u_x^2}{\left( u_x^2 + u_y^2 \right)^{3/2}}.$$
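To make the level set quantities above concrete, the following sketch evaluates the unit normal and the curvature numerically with finite differences on a toy embedding function; the grid, the circle radius, and the difference scheme are illustrative assumptions and not part of the original derivation.

```python
import numpy as np

def normal_and_curvature(u: np.ndarray, eps: float = 1e-8):
    """Unit normal N = grad(u)/|grad(u)| and curvature kappa = div(N) of the
    level sets of an embedding function u, via central finite differences."""
    uy, ux = np.gradient(u)                      # np.gradient returns d/drow, d/dcol
    mag = np.sqrt(ux**2 + uy**2) + eps
    nx, ny = ux / mag, uy / mag
    kappa = np.gradient(ny, axis=0) + np.gradient(nx, axis=1)   # divergence of N
    return (nx, ny), kappa

# Embedding function of a circle of radius 10: u > 0 inside, u < 0 outside,
# so the zero level set is the circle and N points toward the interior.
yy, xx = np.mgrid[0:64, 0:64]
u = 10.0 - np.sqrt((xx - 32.0)**2 + (yy - 32.0)**2)
(normal_x, normal_y), curvature = normal_and_curvature(u)
print(curvature[32, 42])   # near the zero level set, curvature ~ 1/10 in magnitude
```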

2.4. Variational Principle

The curve evolution theory is adopted to solve the problem of image segmentation. The evolution equation of the curve is first obtained by minimizing the energy functional of the closed curve, and the purpose of image segmentation is then achieved by introducing the embedding function and solving the resulting partial differential equation [12].

Consider the one-dimensional functional

$$E(u) = \int_{x_0}^{x_1} F\bigl(x, u(x), u'(x)\bigr)\, dx,$$

where the endpoints of the function u(x) are fixed, u(x_0) = u_0 and u(x_1) = u_1, and E(u) attains an extreme value at u(x). A perturbation of u is performed to obtain u + εη(x), where η(x) is a smooth test function. When ε and η are both sufficiently small, a Taylor expansion gives

$$E(u + \varepsilon\eta) = \int_{x_0}^{x_1} F(x, u, u')\, dx + \varepsilon \int_{x_0}^{x_1} \left( \eta \frac{\partial F}{\partial u} + \eta' \frac{\partial F}{\partial u'} \right) dx + O(\varepsilon^2),$$

whereupon

$$\left.\frac{d}{d\varepsilon} E(u + \varepsilon\eta)\right|_{\varepsilon = 0} = \int_{x_0}^{x_1} \left( \eta \frac{\partial F}{\partial u} + \eta' \frac{\partial F}{\partial u'} \right) dx.$$

Because the endpoints are fixed, the perturbation satisfies

$$\eta(x_0) = 0, \qquad \eta(x_1) = 0.$$

Therefore, integrating the second term by parts gives

$$\int_{x_0}^{x_1} \eta' \frac{\partial F}{\partial u'}\, dx = \left[\eta \frac{\partial F}{\partial u'}\right]_{x_0}^{x_1} - \int_{x_0}^{x_1} \eta\, \frac{d}{dx}\frac{\partial F}{\partial u'}\, dx = - \int_{x_0}^{x_1} \eta\, \frac{d}{dx}\frac{\partial F}{\partial u'}\, dx.$$

Substituting this result into the expression for the first variation yields

$$\left.\frac{d}{d\varepsilon} E(u + \varepsilon\eta)\right|_{\varepsilon = 0} = \int_{x_0}^{x_1} \eta \left( \frac{\partial F}{\partial u} - \frac{d}{dx}\frac{\partial F}{\partial u'} \right) dx.$$

When E(u) reaches its extreme value, this first variation must vanish for every admissible perturbation η, and then

$$\frac{\partial F}{\partial u} - \frac{d}{dx}\frac{\partial F}{\partial u'} = 0.$$

Similarly, it can be deduced that the Euler–Lagrange equation of the two-dimensional variational problem $E(u) = \iint_\Omega F(x, y, u, u_x, u_y)\, dx\, dy$ is

$$\frac{\partial F}{\partial u} - \frac{\partial}{\partial x}\frac{\partial F}{\partial u_x} - \frac{\partial}{\partial y}\frac{\partial F}{\partial u_y} = 0.$$
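As a quick check of the Euler–Lagrange equation derived above, here is a short worked example added for illustration (it is not part of the original derivation): the arc-length functional, whose extremals are straight lines.

```latex
% Worked example: the arc-length functional F(x, u, u') = sqrt(1 + u'^2)
% does not depend on u, so dF/du = 0 and the Euler-Lagrange equation reduces to
\[
E(u) = \int_{x_0}^{x_1} \sqrt{1 + u'(x)^2}\, dx ,
\qquad
\frac{\partial F}{\partial u} - \frac{d}{dx}\frac{\partial F}{\partial u'}
  = -\frac{d}{dx}\!\left( \frac{u'}{\sqrt{1 + u'^2}} \right) = 0 .
\]
% Hence u'/sqrt(1 + u'^2) is constant, so u' is constant and the extremal is the
% straight line through the two fixed endpoints, i.e., the shortest path.
```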

3. Image Segmentation Text Classification System Based on Partial Differential Equation

3.1. Preprocessing of Text Data

Feature extraction and preprocessing are key steps in the application of text classification. First, the text data undergoes data cleaning to remove unnecessary characters and words that meet certain conditions. The cleaned data is then passed to a feature extraction method that extracts formal feature information; the extracted features are fed into the classifier for training and testing, and training stops when the model reaches its optimum. Finally, the samples to be predicted are input into the learned classifier for classification prediction [13]. The model parses the text into meaningful elements such as words or phrases, and the tokenization step converts the text into a unified standard format, which facilitates the subsequent text preprocessing steps. The algorithm takes these factors as features and the texts as the input of the model, increasing the information available to the model.
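A minimal sketch of the preprocessing-and-classification pipeline described above follows; the cleaning rules, TF-IDF features, linear classifier, and toy corpus are illustrative assumptions, not the paper's exact configuration.

```python
import re
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

def clean(text: str) -> str:
    """Data cleaning: strip residual markup and punctuation, collapse whitespace."""
    text = re.sub(r"<[^>]+>", " ", text)      # drop residual markup
    text = re.sub(r"[^\w\s]", " ", text)      # drop punctuation
    return re.sub(r"\s+", " ", text).strip().lower()

# Toy corpus standing in for tokenized legal case texts (illustrative only).
docs = [
    "The defendant was convicted of theft and sentenced to prison.",
    "The plaintiff seeks damages for breach of contract.",
    "The court ordered enforcement of the arbitration award.",
    "The accused was found guilty of assault.",
]
labels = ["criminal", "civil", "execution", "criminal"]

X_train, X_test, y_train, y_test = train_test_split(
    [clean(d) for d in docs], labels, test_size=0.25, random_state=0
)

# Feature extraction (TF-IDF) plus a statistical classifier, trained then queried.
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
print(model.predict(X_test))
```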

Legal issues are described by the various parties according to their personal understanding, so the text data is highly subjective and diverse; texts describing the same kind of legal matter may share few or no co-occurring words, yet their words often have different degrees of similarity or correlation along different semantic factors [14]. Different feature extraction methods for word correlation lead to different measurement standards, so different distance matrices are needed to store the measurement results. The word association feature extraction algorithm takes the word as its unit element, so when the data set consists of Chinese text without space separation, word segmentation is needed first. The semantic correlation obtained through pointwise mutual information is different from general semantic similarity. Semantic similarity is mainly reflected in words that have similar properties and similar context information and are strongly substitutable, such as “criminal law” and “civil law,” or “sentencing” and “ruling.” Semantic correlation is mainly reflected in words that have different properties but often appear in the same text, such as “lawyer” and “crime,” or “road” and “security.”
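To illustrate how pointwise mutual information captures such correlation, the sketch below computes PMI from document-level co-occurrence counts; the toy documents and the document-level counting scheme are assumptions for illustration only (Chinese text would first require word segmentation).

```python
import math
from collections import Counter
from itertools import combinations

# Toy tokenized documents (in practice, Chinese text is word-segmented first).
docs = [
    ["lawyer", "crime", "sentencing"],
    ["lawyer", "crime", "ruling"],
    ["contract", "damages", "plaintiff"],
    ["road", "security", "accident"],
]

n_docs = len(docs)
word_df = Counter()          # document frequency per word
pair_df = Counter()          # document frequency per unordered word pair
for doc in docs:
    vocab = set(doc)
    word_df.update(vocab)
    pair_df.update(frozenset(p) for p in combinations(sorted(vocab), 2))

def pmi(w1: str, w2: str) -> float:
    """Pointwise mutual information of two words over document co-occurrence."""
    p1 = word_df[w1] / n_docs
    p2 = word_df[w2] / n_docs
    p12 = pair_df[frozenset((w1, w2))] / n_docs
    return math.log(p12 / (p1 * p2)) if p12 > 0 else float("-inf")

print(pmi("lawyer", "crime"))   # correlated: the pair co-occurs, PMI > 0
print(pmi("lawyer", "road"))    # unrelated: never co-occur, PMI = -inf
```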

3.2. Image Segmentation Model of Partial Differential Equation

For a function φ(x, y, t) containing the spatial variables x, y and the time variable t, the simplest diffusion equation is

$$\frac{\partial \varphi}{\partial t} = \nu \Delta \varphi, \qquad (14)$$

where ν is the diffusion coefficient and Δ represents the Laplace operator. Equation (14) describes the dynamic process of the diffusion of the function φ, which smooths the function φ isotropically.
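A minimal numerical sketch of Equation (14) follows, using an explicit finite-difference scheme; the grid, boundary handling, and step sizes are illustrative assumptions.

```python
import numpy as np

def laplacian(phi: np.ndarray) -> np.ndarray:
    """Five-point Laplacian with replicated (Neumann-like) boundaries."""
    p = np.pad(phi, 1, mode="edge")
    return p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4.0 * phi

def diffuse(phi: np.ndarray, nu: float = 0.2, dt: float = 1.0, steps: int = 100) -> np.ndarray:
    """Explicit Euler time stepping of d(phi)/dt = nu * Laplacian(phi);
    with unit grid spacing, nu * dt <= 0.25 keeps the scheme stable."""
    phi = phi.astype(float).copy()
    for _ in range(steps):
        phi += dt * nu * laplacian(phi)
    return phi

# A noisy binary image is smoothed isotropically by the diffusion.
img = (np.random.rand(64, 64) > 0.5).astype(float)
smoothed = diffuse(img, nu=0.2, dt=1.0, steps=50)
```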

In order to realize the indirect diffusion of the function Φ, the following indirect diffusion equation is proposed:

where γ is a positive parameter. In Formula (16), the diffusion term smooths the function φ isotropically while ensuring that Φ remains sufficiently close to φ. When φ > Φ, the function Φ increases to approach the function φ; when φ < Φ, the function Φ approaches φ by decreasing. Thus, Formula (16) describes the indirect diffusion of Φ obtained by applying the diffusion to the function φ.
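To make the indirect diffusion idea concrete, the toy sketch below diffuses an auxiliary field and lets a second field relax toward it. Because Formula (16) is not reproduced in this text, the simple relaxation coupling γ(φ − Φ) used here is an assumed illustrative form, not the paper's actual equation.

```python
import numpy as np

def indirect_diffusion_step(Phi, phi, nu=0.2, gamma=0.5, dt=1.0):
    """One explicit step of an assumed indirect-diffusion coupling: the
    auxiliary function phi is diffused isotropically, while the level set
    function Phi relaxes toward phi at rate gamma, so Phi is smoothed only
    indirectly (illustrative form, not the paper's Formula (16))."""
    p = np.pad(phi, 1, mode="edge")
    lap = p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4.0 * phi
    phi = phi + dt * nu * lap                 # diffusion acts on the auxiliary function
    Phi = Phi + dt * gamma * (phi - Phi)      # the level set function follows it
    return Phi, phi

# Example: both fields start from the same noisy initialization.
rng = np.random.default_rng(0)
Phi = phi = rng.random((64, 64))
for _ in range(50):
    Phi, phi = indirect_diffusion_step(Phi, phi)
```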

Suppose I(x, y) is a given image defined on the image domain, Φ is a level set function, and φ is an auxiliary function. Based on the above indirect diffusion idea, the level set evolution equation of indirect diffusion for image segmentation is presented as follows:

Initial and boundary conditions are

The term driving the evolution is called the sign-variable driving force. It drives the level set function to move up and down adaptively in the image domain according to the image information, finally separating the target from the background region [15]. The sign-variable driving force has opposite signs inside and outside the target, which can ideally be expressed as

Formula (20) represents the object of interest.

Theorem 1. If the function φ satisfies Formula (21) and has the properties listed in Formula (18), then the solution φ(x, y, t) satisfies

3.3. Image Segmentation Algorithm Text Classification System Software Architecture Design

The system adopts a three-tier architecture model for development, namely the presentation layer, the logic layer, and the data access layer [16]. The presentation layer is the user operation interface of the legal text classification system; it is responsible for interacting with users, receiving input data, and displaying processed data. The logic layer is the bridge between the presentation layer and the data access layer: it responds to and processes the requests put forward by the presentation layer and implements the main services such as user management, model node management, classification thesaurus management, and expression management. The task of the data access layer is to add, delete, modify, and query data according to the requests of the business logic layer and to return the corresponding results. The system software architecture is shown in Figure 1.
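As an illustration of this layered design, the sketch below wires a presentation layer, a logic layer, and a data access layer together; the class names, placeholder classification rule, and in-memory storage are illustrative assumptions, not the system's actual implementation.

```python
from typing import Optional

class CaseRepository:
    """Data access layer: adds, deletes, modifies, and queries case records
    (an in-memory dict stands in for the real database)."""
    def __init__(self) -> None:
        self._cases: dict = {}

    def save(self, case_id: str, record: dict) -> None:
        self._cases[case_id] = record

    def find(self, case_id: str) -> Optional[dict]:
        return self._cases.get(case_id)


class ClassificationService:
    """Logic layer: responds to presentation-layer requests and delegates
    persistence to the data access layer."""
    def __init__(self, repo: CaseRepository) -> None:
        self._repo = repo

    def classify_and_store(self, case_id: str, text: str) -> str:
        label = "criminal" if "defendant" in text else "civil"   # placeholder rule
        self._repo.save(case_id, {"text": text, "label": label})
        return label


class ConsoleUI:
    """Presentation layer: receives user input and displays the result."""
    def __init__(self, service: ClassificationService) -> None:
        self._service = service

    def submit(self, case_id: str, text: str) -> None:
        label = self._service.classify_and_store(case_id, text)
        print(f"{case_id} classified as {label}")


ConsoleUI(ClassificationService(CaseRepository())).submit("c-001", "the defendant ...")
```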

The user management module allows the administrator to manage common users of different categories and responsibilities. Model node management maintains a structure composed of keywords such as cause of action, situation, and circumstance in law. The thesaurus maintenance module can add, modify, and delete thesauruses. The nodes and rules of the node management module provide the judgment basis for text classification.

3.4. System Function Module Design

The user management module has two roles: administrator and common user, and it allows the administrator to manage common users of different categories and responsibilities. The user information maintenance module completes the input of user information and the modification and deletion of personnel information. The personnel authority management module completes the setting and modification of user permissions. Cases in law are mainly divided into five categories: criminal, civil, administrative, compensation, and enforcement, and users in different groups are responsible for different types of data. Ordinary users can view all content but are only allowed to modify relevant data within their own group.

Legal cases contain a large number of situations and circumstances, and each situation or circumstance belongs to a category of the text classification. Model node management maintains a structure whose nodes are keywords such as cause of action, situation, and circumstance in law [17]. This structure makes it easy not only to manage categories but also to understand the relationships between categories of data. The node management module mainly includes node management and node attribute management. Node management includes adding child nodes, adding peer nodes, deleting nodes, modifying nodes, and moving nodes. Node attribute management includes five parts: basic attributes, extraction rules, derivation rules, the fact element XML template, and relevant laws and regulations. The basic attribute module can display and modify basic information such as node ID, name, category, and parent. Legal text sources include the indictment, the court record, and the judgment; each source has a different text structure, so the corresponding extraction rules differ. Extraction rules are the focus of the module, and their function is to configure rules for nodes from different sources. To facilitate system management, all rules are centrally stored in the rule management subsystem. The main function of the fact element XML template is to add nodes in batches by analyzing the XML structure: after traversing the XML hierarchically, XML nodes, attributes, and subsets are added to the model nodes in sequence, as sketched below.
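As an illustration of the batch-add behavior, the sketch below traverses a small, hypothetical fact element XML template with Python's xml.etree.ElementTree and registers each node with its attributes; the XML structure and node names are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical fact-element XML; the real template structure is not given in the text.
xml_text = """
<node name="theft" category="criminal">
  <node name="amount involved"/>
  <node name="voluntary surrender"/>
</node>
"""

def add_nodes(element: ET.Element, parent_path: str = "") -> None:
    """Traverse the XML hierarchically and register each node with its
    attributes under its parent, mirroring the batch-add behavior described."""
    path = f"{parent_path}/{element.get('name', element.tag)}"
    print(f"add model node: {path}  attributes={dict(element.attrib)}")
    for child in element:                 # recurse into the subset of child nodes
        add_nodes(child, path)

add_nodes(ET.fromstring(xml_text))
```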

The classification thesaurus management module includes two parts: classification thesaurus maintenance and thesaurus content maintenance [18]. The thesaurus maintenance module can add, modify, and delete thesauruses. Thesauruses can be divided into general and specialized thesauruses: the general thesaurus applies to all cases, while a specialized thesaurus applies only to certain cases. Thesaurus content maintenance is the function module for adding, deleting, modifying, and querying keywords in a thesaurus. Each entry includes the keyword, word length, and word frequency. The contents of a thesaurus are displayed in two forms, text box and list, which is convenient for all kinds of users to query and analyze the data.

The expression management module includes data processing, expression template generation, element analysis, and other functions. Data processing is the core of the module, including text classification, privacy processing, comparison against the classification thesaurus, and other functions. The nodes and rules of the node management module provide the judgment basis for text classification, and this module lays a solid foundation for realizing the text classification function [19]. The text classification algorithm works as follows: starting from the root node, the data is matched against the rules of the node; if the data matches the rules, the recursion continues down layer by layer; if the data does not match the rules, recursion into that node's descendants stops; finally, the data is assigned to the deepest node whose rules it satisfies. A sketch of this recursive matching is given below.
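A minimal sketch of the recursive matching described above follows; the node tree, keyword rules, and sample text are illustrative assumptions rather than the system's actual rules.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Hypothetical node structure; rule predicates and keywords are illustrative.
@dataclass
class Node:
    name: str
    rule: Callable[[str], bool]
    children: List["Node"] = field(default_factory=list)

def classify(text: str, node: Node) -> Optional[Node]:
    """Starting from the given node, return the deepest node whose rule the
    text matches; stop recursing into a subtree as soon as its rule fails."""
    if not node.rule(text):
        return None
    deepest = node
    for child in node.children:
        hit = classify(text, child)
        if hit is not None:
            deepest = hit                 # keep descending while rules match
    return deepest

# Toy node tree: the root accepts everything, children test for keywords.
root = Node("case", lambda t: True, [
    Node("criminal", lambda t: "defendant" in t, [
        Node("theft", lambda t: "stole" in t),
    ]),
    Node("civil", lambda t: "contract" in t),
])

print(classify("the defendant stole a vehicle", root).name)   # -> theft
```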

The functional module of the image segmentation algorithm text classification system is shown in Figure 2.

3.5. System Database Design

Database design is very important to the overall system design and is closely related to the realization of system functions, stability, and scalability. The database designed for the legal text classification system is shown in Figure 3.

The database entities involved in the legal text classification system mainly include node, source, derivation rule, relevant regulations, case logical section, expression node, expression, thesaurus, and thesaurus keyword [20]. The node entity provides basic information about a node. The source entity describes, within each source category, the rule information units used to express the rules and logic for extracting text. The relevant regulations entity pushes relevant laws and regulations for nodes. The case logical section entity stores and describes the data content to be processed. The expression node entity records basic information about an expression node, and the expression entity records the basic content of an expression. The thesaurus entity describes basic information about a thesaurus, and the thesaurus keyword entity records basic information about the words contained in that thesaurus.

4. Experimental Measurement

4.1. Model Measurement

The experimental model has strong noise resistance. By selecting document images containing ground truth, qualitative and quantitative evaluation results of the proposed model are given, and the proposed model can be effectively applied to the segmentation of some real images. For the segmentation of an ideal binary image, the model parameters are set to 5, 1, and 0.01; the remaining parameter values range from 0.4 to 0.99 and are adjusted according to the image. The segmentation results of the first group of experiments after 100 and 1000 iterations verify that the model has strong antinoise performance and performs well in extracting sharp corners and deep concave edges. The segmentation results after 10, 100, and 500 iterations verify that the model can handle images with deep concave edges well. The segmentation results of the model and the processing results for images with deep concave edges are shown in Figures 4(a) and 4(b).

As can be seen from the figure, with the evolution of the model, the four sharp corners of the rectangle are gradually smoothed, and the contour eventually disappears. After 1000 iterations, the difference between the level set function and the auxiliary function approaches -11, which is consistent with the result of the theorem. The figure also shows the segmentation results of the model after 10, 100, and 500 iterations: as the four sharp corners are gradually smoothed during evolution, the evolution curve falls into a local minimum and cannot enter the deep concave region. The model can accurately extract sharp corners and deep concave boundaries from the two images.

4.2. User Management Function Testing

The data has high security requirements; therefore, user operations must be strictly managed and controlled. The system administrator manages user information and user rights through the user management module: the administrator clicks the management menu and can then query, add, modify, and delete user information, and the permission management page under the permission module allows the administrator to set and modify permissions. The functional test of the user management module includes adding users, modifying users, deleting users, and setting user permissions. The test results are shown in Table 1.

4.3. Model Node Management Function Test

The model node management module mainly includes node management and node attribute management. Click the “criminal,” “civil,” “administrative,” “compensation,” and “execution” buttons, and the system displays five different types of node trees. Users can right-click the menu bar to add child nodes, add peer nodes, delete nodes, modify nodes, and move nodes. Clicking a node displays the node's attributes in the right-hand interface, including basic attributes, extraction rules, derivation rules, the fact element XML template, and relevant laws and regulations. The functional test of the node tree management module includes adding nodes, querying nodes, adding element XML, pushing regulations, and other operations. The test results are shown in Figure 5.

4.4. Classification Thesaurus Management Function Test

Thesaurus maintenance includes adding, modifying, and deleting thesauruses. Click the “Add thesaurus” button to add a thesaurus by configuring the thesaurus name, the relationship between the thesaurus and the cause of action, and other contents. Through the text box form, the system can intuitively display the keywords and word frequencies of a thesaurus, which is convenient for business personnel to view and analyze. Through the list form, the system supports adding, deleting, modifying, and querying thesaurus keywords. The functional test of the thesaurus management module includes adding a thesaurus, querying thesaurus content, adding keywords, and other operations. The test results are shown in Figure 6.

4.5. Expression Management Function Test

The data processing module determines the data source and the range of nodes to be matched through the configuration of the case ID range, cause of action, trial court, nature, ruling procedure, processing node, and other conditions. After data processing, click a node in the expression node tree to view and analyze the corresponding expressions. Expressions in law have many attributes, such as constitute, not constitute, cognizance, and non-cognizance; clicking the corresponding menu box retrieves the expressions under a specific attribute. During analysis, users can add, modify, delete, sort, query, and filter expressions, and after user filtering and maintenance, standardized expressions can be used as expression templates for element extraction. The functional test of the expression management module includes data processing, querying of expressions, and viewing of expression element words. The test results are shown in Figure 7.

5. Conclusion

Image segmentation technology is becoming more and more important in daily life. Its purpose is to separate the region of interest from the image for better image analysis. The image segmentation method based on partial differential equations has been welcomed by scholars at home and abroad, and applying image segmentation technology to text classification analysis has broad prospects. This paper introduces the significance of image segmentation technology and extends the research on image segmentation methods based on partial differential equations. An image segmentation model based on partial differential equations is proposed, in which diffusion acts on the level set function indirectly through an auxiliary function. The software architecture of the image segmentation algorithm text classification system is designed using the three-layer architecture model and computer technology. With the help of partial differential equations, the model performs well in image segmentation text classification and improves the classification ability of the text classification algorithm. The system has relatively high requirements on accuracy and development time; therefore, compared with the traditional rule-based classification method used for text classification, more advanced techniques should be further studied to make subsequent versions of the system more intelligent.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The author declares that he has no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.