Abstract

Through data mining technology, the hidden information behind a large amount of data is discovered, which can help various management services and provide scientific basis for leadership decision-making. It is an important subject of current police information research. This paper conducts in-depth research on the investigation analysis and decision-making of public security cases and proposes a case-based reasoning model based on two case databases. Moreover, this paper discusses in detail the use of data mining technology to automatically establish a case database, which is a useful exploration and practice for the public security department to establish a new and efficient case investigation auxiliary decision-making system. In addition, this paper studies the method of using data mining technology to assist in the establishment of a case database, analyzes the characteristics of traditional case storage methods, and constructs a case investigation model based on artificial intelligence data processing. The research results show that the model constructed in this paper has certain practical effects.

1. Introduction

Informatization has begun to be applied to police work. However, because it is still in its infancy, information systems provide only the most basic functions, such as simple search, adding and deleting files, storage, and statistics, and lack substantive analysis and prediction functions. As the work has developed, the databases behind these systems have grown from small to large. Therefore, fully integrating analysis, judgment, and decision-making functions into data construction allows data to be transformed between the micro and macro levels as required.

At present, the informatization of police work in our country has not yet been popularized. Even where information technology is used, the technical level is generally low, so scientific and technological means cannot be used effectively to collect and manage information resources. In order to crack down severely on high-tech and high-IQ crimes, it is also necessary to combine police work with information construction [1]. Advanced technology and the efficient use of information resources can improve the scientific and technological knowledge of police officers. The full application of modern information technology to police work can improve the detection capabilities of police officers and effectively combat various forms of crime. Moreover, it is also the best way to improve the efficiency of detection, and it can fundamentally control the crime rate and achieve social stability and harmony [2]. At the same time, information technology is used to reveal the underlying problems behind events and to find patterns, so as to meet the detection and investigation needs of police officers. The current development of our country is characterized by diversification, globalization, the popularization of information, and the deepening of technology. However, with the rapid development of society, the unstable factors in the social environment are increasingly exposed [3]. How to better maintain social stability and harmony has become the biggest challenge facing police officers in public security organs. Police officers must adhere to the strategy of strengthening police work through science and technology and fully integrate information technology into practical work to improve the scientific and technological content of our country’s police work. Modernizing police work in an all-round way and making it more efficient and dynamic are the focus of the current work of our police officers [4].

This paper conducts in-depth research on the investigation analysis and decision-making of public security cases and proposes a case-based reasoning model based on two case databases. Moreover, this paper discusses in detail the use of data mining technology to automatically establish a case database, which is a useful exploration and practice for the public security department to establish a new and efficient case investigation auxiliary decision-making system.

As an important part of database information technology, Knowledge Discovery in Databases (KDD) first appeared when database technology was still in its development stage [5]. Later, it became an academic research topic, and the academic community began to hold regular seminars on KDD technology. At one such meeting, someone vividly compared data to mineral deposits, and the term “data mining” came from this. After that, the United States continued to hold seminars on data mining, the research became more and more specialized, and the number of researchers and experts participating in data mining research grew to several thousand. Moreover, academic papers on KDD are increasing, and data mining has become a new topic in the field of information technology [6]. With the deepening of research, data mining has developed from initial basic research to comprehensive research and development. It not only incorporates a variety of research strategies but also applies scientific knowledge from various fields. Data mining has become a hot topic in various seminars, and it has been continuously discussed and studied in depth by scholars from various countries [7]. This paper takes the large-scale data used in the 2013 crime prevention work of the Chicago Police Department (hereinafter referred to as CPD) as an example. CPD is working to create a social graph similar to Facebook. In order to combat violent gang crime, it uses network analysis to track active gang members and issue warnings to them to prevent possible violent crimes [8]. In 2012, there were more than 500 murders in Chicago, many of which were related to violent criminal gang activities. So far this year, the city’s murder rate has dropped by 22% [9]. Here, data mining serves as a network analysis tool that can analyze the conversations, hobbies, and social activities of gang members and provide accurate search results. On this basis, CPD constructed a model.
The model includes data variables such as the number of times each gang member has been shot and the number of criminal records, so as to identify the “most active” residents of Chicago, that is, those who are most likely to be involved in violent activities [10]. At the beginning of 2013, CPD announced a list of the top 20 “most active” members in each of the 22 police districts under the jurisdiction of the city of Chicago. After the list was confirmed, CPD and the “Violent Crime Reduction Strategy” organization cooperated to continuously send warning messages to the gang members on the list. Some of these gang members were surprised to receive such messages [11]. The Rigel software described in [12] is criminal tracking software based on “geographic analysis technology.” By analyzing the distribution of crime scenes, it uncovers the likely distribution of criminals’ residences and other hiding places, especially for criminals in serial cases. Police practice has shown that data mining technology can help police analyze past cases, discover crime patterns and characteristics, and find commonalities and similarities. Moreover, data mining can assist leaders in decision-making and serve the various fields of public security work, such as combating, preventing, managing, and controlling crime. In the past, the analysis of the police situation and of serial and parallel cases had to be completed manually, which consumed huge manpower and material resources, and the analysis results were incomplete and unreliable. At present, by using the latest data mining technology, such analysis can be completed in a few hours, which greatly improves the work efficiency of the public security organs [13].
In the long run, big data and data mining technology can optimize the allocation of police resources and enhance the ability of public security organs to fight crime, thereby enhancing the level of social and economic development and the people’s sense of security and satisfaction [14].

3. Search Criminal Cases

This article applies machine learning data processing methods to case investigation data and then runs and evaluates the algorithm.

The purpose of case retrieval is to retrieve one or more cases that are most similar to the target case from the case database, so case retrieval reduces to comparing the similarity between two cases. In case-based reasoning systems, there are three commonly used retrieval methods: the nearest neighbor method, the inductive index method, and the knowledge guidance method. They can be used individually or in combination.

When searching for similar cases, this paper adopts the nearest neighbor method. The meaning of nearest neighbor is as follows: if case $C_i$ is the nearest neighbor of case $C_t$, that is, $C_i$ and $C_t$ are the most similar, then for any case $C_j$, the following must hold [15]:

$$\mathrm{Sim}(C_t, C_i) \ge \mathrm{Sim}(C_t, C_j)$$

Here, $\mathrm{Sim}$ is the similarity calculation function, and the larger its value, the more similar the objects. Similarity is not only the core of the nearest neighbor retrieval method but also the key to case retrieval.

In a case described by a CBML document, each value compared by the similarity function is an element or attribute value of the document, and it may also be a complex data type element (such as victim object information). If it is a complex data type element, we must first calculate the weighted sum of the local similarities of its nested elements or attributes (using the same calculation method) and then use this weighted sum as the similarity of the complex element in the similarity calculation of the entire case [9].

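We denote the source case (a case in the typical case database) as $S$ and the target case as $T$, and let $a_i$ be the $i$th attribute of a case. According to the nearest neighbor method, the similarity between the source case and the target case is defined as follows:

$$\mathrm{Sim}(S, T) = \sum_{i=1}^{n} w_i \, \mathrm{sim}\left(a_i^{S}, a_i^{T}\right)$$

In the formula, $w_i$ represents the weight of the $i$th attribute, $n$ is the number of attributes, and $\mathrm{sim}(a_i^{S}, a_i^{T})$ is the local similarity of the $i$th attribute values of the two cases.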
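The weighted nearest neighbor retrieval described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: cases are assumed to be plain dictionaries, and the names `case_similarity`, `nearest_case`, and `exact_match`, as well as the sample attributes and weights, are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Case = Dict[str, object]
LocalSim = Callable[[object, object], float]


def exact_match(x: object, y: object) -> float:
    """Default local similarity (assumption): 1.0 if equal, else 0.0."""
    return 1.0 if x == y else 0.0


def case_similarity(source: Case, target: Case,
                    weights: Dict[str, float],
                    local_sims: Dict[str, LocalSim]) -> float:
    """Weighted sum of the per-attribute local similarities."""
    total = 0.0
    for attr, weight in weights.items():
        sim = local_sims.get(attr, exact_match)
        total += weight * sim(source[attr], target[attr])
    return total


def nearest_case(case_base: List[Case], target: Case,
                 weights: Dict[str, float],
                 local_sims: Dict[str, LocalSim]) -> Tuple[Case, float]:
    """Return the source case with the largest similarity to the target."""
    scored = [(c, case_similarity(c, target, weights, local_sims))
              for c in case_base]
    return max(scored, key=lambda pair: pair[1])


# Hypothetical toy data, for illustration only.
case_base = [
    {"case_type": "burglary", "at_night": 1},
    {"case_type": "fraud", "at_night": 0},
]
target = {"case_type": "burglary", "at_night": 0}
weights = {"case_type": 0.7, "at_night": 0.3}
best, score = nearest_case(case_base, target, weights, {})
```

Passing a per-attribute function in `local_sims` lets each attribute type use its own similarity rule, while unlisted attributes fall back to exact matching.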

For criminal cases, after data preprocessing, the attributes can be summarized into four types: numeric types, binary variables, enumeration types, and string types. Different attribute types have different similarity calculation methods, which are divided into four cases:

(1) Numerical type

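It uses the minimum method to calculate the similarity [16]:

$$\mathrm{sim}(x, y) = \frac{\min(x, y)}{\max(x, y)}$$

In the formula, $x$ and $y$ are the values of the attribute in the two cases, and when $x = y$, there is $\mathrm{sim}(x, y) = 1$. Here, $\min(x, y)$ represents the smaller value in $\{x, y\}$, and $\max(x, y)$ represents the larger value in $\{x, y\}$.

For example, the age of a person can be treated as a continuous numeric type.

(2) Binary variables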

It determines similarity based on whether the two values are equal [17]:

$$\mathrm{sim}(x, y) = \begin{cases} 1, & x = y \\ 0, & x \ne y \end{cases}$$

For example, smoking can take two values: 1 means smoking, and 0 means not smoking. If both smoke or neither smokes, $\mathrm{sim}(x, y) = 1$, and if one smokes and the other does not, $\mathrm{sim}(x, y) = 0$.

(3) Enumeration type

This type is divided into two situations. When the enumeration value is a numeric type [6]:

$$\mathrm{sim}(x, y) = 1 - \frac{|x - y|}{\max(E) - \min(E)}$$

In the formula, $E$ represents the set of enumerated values, $\max(E)$ represents the largest one in the set $E$, and $\min(E)$ represents the smallest one in the set $E$. At the same time, $\max(E) - \min(E)$ is the interval distance of the set $E$.

When the enumeration value is a string type,

$$\mathrm{sim}(x, y) = \begin{cases} 1, & x = y \\ 0, & x \ne y \end{cases}$$

For example, the case type is an enumerated string.

(4) String type

It determines similarity based on whether the two strings are equal [17]:

$$\mathrm{sim}(x, y) = \begin{cases} 1, & x = y \\ 0, & x \ne y \end{cases}$$

Alternatively, the “keyword fuzzy matching” method can be used to calculate the similarity value of two strings, instead of simply using 0 and 1 to indicate similarity. This calculation method is more suitable for calculating the similarity of large sections of text.
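The four attribute-type rules above can be sketched as small Python functions. This is a hedged illustration under stated assumptions: the numeric rule is taken to be the ratio of the smaller to the larger value for non-negative inputs, the numeric enumeration rule is taken to be one minus the distance divided by the range of the enumeration, and `sim_keywords` uses a Jaccard overlap of word sets as a simple stand-in for "keyword fuzzy matching", whose exact definition is not given in the text. All function names are hypothetical.

```python
from typing import Iterable


def sim_numeric(x: float, y: float) -> float:
    """Minimum method (assumed form): smaller value over larger value.

    Assumes non-negative inputs; equal values (including 0 and 0) give 1.0.
    """
    if x == y:
        return 1.0
    return min(x, y) / max(x, y)


def sim_binary(x: int, y: int) -> float:
    """1.0 when both binary values agree, otherwise 0.0."""
    return 1.0 if x == y else 0.0


def sim_enum_numeric(x: float, y: float, values: Iterable[float]) -> float:
    """Assumed form: 1 - |x - y| / (max(E) - min(E))."""
    values = list(values)
    span = max(values) - min(values)
    if span == 0:
        return 1.0
    return 1.0 - abs(x - y) / span


def sim_keywords(a: str, b: str) -> float:
    """Stand-in for keyword fuzzy matching: Jaccard overlap of word sets."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a and not words_b:
        return 1.0
    return len(words_a & words_b) / len(words_a | words_b)
```

For instance, under these assumptions two suspects aged 20 and 40 have a numeric similarity of 0.5, and the descriptions "knife robbery at night" and "night robbery" share two of four distinct words.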

Tree construction steps. Unlike the traditional FP-growth (frequent pattern growth) algorithm, this phase is divided into two phases: the insertion phase and the restructuring (remanagement) phase. In the insertion phase, the items of a transaction are inserted into the tree in descending order of frequency. The key feature of this method is that when a transaction is being inserted, the transactions before it must also be considered. In other words, the item order is determined by the frequency of the items in the transactions inserted so far. Restructuring is performed when the proportion given by the item edit distance exceeds a certain threshold.
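The insertion phase can be illustrated with a small prefix-tree sketch. This is only an assumption-laden illustration of single-pass insertion in descending order of the running item frequency; it is not the ISPO-tree implementation, and the names `Node`, `insert_transaction`, and `build_tree` are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional, Tuple


class Node:
    """One node of the prefix tree: an item, its count, and its children."""

    def __init__(self, item: Optional[str] = None) -> None:
        self.item = item
        self.count = 0
        self.children: Dict[str, "Node"] = {}


def insert_transaction(root: Node, transaction: Iterable[str],
                       freq: Counter) -> None:
    """Insert one transaction, ordering items by the running frequency."""
    items = set(transaction)
    # Update the counts first, so earlier transactions shape the order.
    freq.update(items)
    ordered = sorted(items, key=lambda it: (-freq[it], it))
    node = root
    for item in ordered:
        node = node.children.setdefault(item, Node(item))
        node.count += 1


def build_tree(transactions: List[List[str]]) -> Tuple[Node, Counter]:
    """Build the tree in a single pass over the transactions."""
    root, freq = Node(), Counter()
    for transaction in transactions:
        insert_transaction(root, transaction, freq)
    return root, freq


# Hypothetical toy transactions.
root, freq = build_tree([["a", "b"], ["b", "c"], ["b"]])
```

In this toy run the first transaction is stored as the path a, b because both items are tied at that point, while later transactions start with b. Paths inserted early can therefore become out of order as frequencies change, which is the situation a restructuring phase is meant to handle.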

In the frequent item mining stage, the tree is mined with the FP-growth mining technique. The FP-growth algorithm mines frequent items from the constructed ISPO-tree (reedispO tree), and the frequency of a mined item must meet the threshold specified by the user. The FP-growth mining technique is also used in the CP-tree (compressible-prefix tree) and CanTree methods.

The source of this unstable dataset is that these data may be slightly modified, as shown in Figure 1. First, we introduce an example of the ISPO-tree method without a small amount of data change, that is, is in the inserted state, and there is .

It can be expressed by a formula [18].

Then, after the algorithm adjusts the tree, the distance value is cleared. The transaction that has been inserted is shown in Figure 2, and is replaced with 1.

This design is closely related to the characteristics of electronic evidence analysis. For electronic evidence, it is necessary to ensure not only its durability but also its validity. Therefore, evidence must first be copied from the source data, and the source data cannot be modified. However, in the electronic evidence analysis system, the copied data can be corrected in order to analyze the validity of the evidence. For example, when a suspect forges an identity, time, or location, we need to modify the data to better find evidence in the future. Such modifications are necessarily small in number. If the entire analysis were rerun after each one, a lot of time and resources would be wasted. Therefore, an algorithm that handles a small number of modifications is proposed to improve efficiency [19].

Figure 3 shows the ISPO-tree generated after the modification. After the item has been modified, we set the flag bit to 1. Then, we recalculate the distance value and compare it with the threshold to determine whether the tree needs to be reconstructed [21].

represents the affairs to be changed. If is not empty, it proves that there is a task to modify the transaction, and then, the flag bit is changed to 0. The method of calculating the distance is the same as in the previous section. In the method adopted, the “-” sign represents deleting an element, and the “+” sign represents adding an element. For example, in the following table, a transaction is added, the flag bit is set to 1, and is to be changed. Then, the meaning in the table is to change a to f [20].

Since the item header table in the FP-tree links to each item’s positions in the tree, the algorithm locates the elements of the first changed transaction, modifies their frequencies, and recalculates the threshold value. After that, according to the threshold, the algorithm decides whether to reconstruct the tree. Therefore, the tree constructed by the ISPO-tree algorithm has fewer branches than the trees constructed by the previous two algorithms, and it supports the modification of a small number of elements.
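The small-modification step can be sketched as follows. This is a hypothetical illustration rather than the paper's procedure: changes are written as strings such as `"-a"` (delete item a) and `"+f"` (add item f), the item order is the descending-frequency order, and the rebuild rule (Levenshtein distance between the old and new item orders, divided by the longer length, compared with a threshold) is an assumption chosen for the example.

```python
from collections import Counter
from typing import List, Sequence


def apply_changes(transaction: List[str], changes: Sequence[str],
                  freq: Counter) -> List[str]:
    """Apply edits such as ["-a", "+f"] and keep the item counts in sync."""
    items = list(transaction)
    for change in changes:
        op, item = change[0], change[1:]
        if op == "-" and item in items:
            items.remove(item)
            freq[item] -= 1
        elif op == "+" and item not in items:
            items.append(item)
            freq[item] += 1
    return items


def frequency_order(freq: Counter) -> List[str]:
    """Items with a positive count, most frequent first (ties by name)."""
    return sorted((i for i in freq if freq[i] > 0),
                  key=lambda i: (-freq[i], i))


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two item orderings."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


def needs_rebuild(old_order: Sequence[str], new_order: Sequence[str],
                  threshold: float) -> bool:
    """Assumed rule: rebuild when the normalized distance reaches the threshold."""
    longest = max(len(old_order), len(new_order), 1)
    return edit_distance(old_order, new_order) / longest >= threshold


# Hypothetical example: change item a to f in one transaction.
freq = Counter({"a": 2, "b": 3, "c": 1})
old_order = frequency_order(freq)
updated = apply_changes(["a", "b"], ["-a", "+f"], freq)
new_order = frequency_order(freq)
rebuild = needs_rebuild(old_order, new_order, threshold=0.5)
```

Here only one item is appended to the ordering, so the normalized distance stays below the chosen threshold and the existing tree would be kept.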

4. DC-STree Improved Algorithm

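The algorithm defines $\Omega = \{O_1, O_2, \ldots, O_n\}$ as a dataset. Each object is described by a set of features $R = \{r_1, r_2, \ldots, r_m\}$. The algorithm defines an object $O$ as a tuple $(v_1, v_2, \ldots, v_m)$, where $v_i \in D_i$ ($D_i$ is the domain of $r_i$, $1 \le i \le m$). The subdescription of an object $O$ for a feature subset $S \subseteq R$ is defined as $I_S(O)$, that is, the description of $O$ restricted to the feature attributes in $S$, and $O[r]$ represents the value of feature $r$. For each feature subset $S$, there is a Boolean similarity function $f_S$ that describes the relationship between objects. For two objects $O$ and $O'$, $f_S(I_S(O), I_S(O')) = 1$ means that $O$ is similar to $O'$; conversely, $f_S(I_S(O), I_S(O')) = 0$ means that they are not similar. The following two formulas are examples of Boolean similarity functions [22].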

In the formula, is the comparison function of characteristic attribute value. The following two example formulas are as follows:

We need to pay attention to how the similarity functions are used. Equation (10) can use a comparison function such as Equation (12), which tests for equality. However, the similarity functions in Equation (10) or Equation (11) can also combine the comparison function in Equation (12) with the one in Equation (13). Equation (13) defines a similarity comparison that differs from equality.
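How a Boolean similarity function can combine different comparison functions is sketched below. The specific forms are assumptions made for illustration: `equal` stands for an equality comparison, `within(eps)` for a tolerance-based comparison on numeric values, and `similar_all` declares two objects similar when every selected feature passes its comparison. None of these names come from the paper.

```python
from typing import Callable, Dict, Mapping

Comparator = Callable[[object, object], bool]


def equal(x: object, y: object) -> bool:
    """Equality comparison between two feature values."""
    return x == y


def within(eps: float) -> Comparator:
    """Build a tolerance comparison: values are similar if |x - y| <= eps."""
    def compare(x: float, y: float) -> bool:
        return abs(x - y) <= eps
    return compare


def similar_all(a: Mapping[str, object], b: Mapping[str, object],
                comparators: Dict[str, Comparator]) -> bool:
    """Boolean similarity over a feature subset: every feature must match."""
    return all(cmp(a[feature], b[feature])
               for feature, cmp in comparators.items())


# Hypothetical objects with one categorical and one numeric feature.
obj_a = {"case_type": "theft", "suspect_age": 30}
obj_b = {"case_type": "theft", "suspect_age": 33}
loose = similar_all(obj_a, obj_b,
                    {"case_type": equal, "suspect_age": within(5)})
strict = similar_all(obj_a, obj_b,
                     {"case_type": equal, "suspect_age": within(2)})
```

The same pair of objects is similar under the looser age tolerance and not similar under the stricter one, which shows why the choice of comparison function matters for mixed data.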

We define in , there is , the Boolean similarity function is not , and then, the frequent number of (that is, the degree of support) is defined as follows:

If the support is not less than the minimum support threshold, the subdescription is called a frequent similarity pattern. To find all the frequent similar patterns contained in the object set, a useful property is introduced below, which reduces the amount of mining and improves efficiency. The structure of the STree is shown in Figure 4.
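The support test can be sketched as a simple count. This is a minimal sketch that assumes support is the fraction of objects in the dataset that are similar to the pattern under a Boolean similarity function; the text does not say whether support is a count or a fraction, and the names `is_similar`, `support`, and `is_frequent` are hypothetical.

```python
from typing import Callable, Dict, List, Mapping

Comparator = Callable[[object, object], bool]


def is_similar(pattern: Mapping[str, object], obj: Mapping[str, object],
               comparators: Dict[str, Comparator]) -> bool:
    """True when every feature of the pattern matches the object."""
    return all(cmp(pattern[f], obj[f]) for f, cmp in comparators.items())


def support(pattern: Mapping[str, object],
            objects: List[Mapping[str, object]],
            comparators: Dict[str, Comparator]) -> float:
    """Fraction of objects that are similar to the pattern (assumed form)."""
    if not objects:
        raise ValueError("the object set must not be empty")
    hits = sum(1 for obj in objects if is_similar(pattern, obj, comparators))
    return hits / len(objects)


def is_frequent(pattern: Mapping[str, object],
                objects: List[Mapping[str, object]],
                comparators: Dict[str, Comparator],
                min_support: float) -> bool:
    """A pattern is frequent when its support is not less than the threshold."""
    return support(pattern, objects, comparators) >= min_support


# Hypothetical objects and comparison functions.
objects = [
    {"case_type": "theft", "suspect_age": 30},
    {"case_type": "theft", "suspect_age": 34},
    {"case_type": "fraud", "suspect_age": 31},
    {"case_type": "theft", "suspect_age": 50},
]
comparators = {
    "case_type": lambda x, y: x == y,
    "suspect_age": lambda x, y: abs(x - y) <= 5,
}
pattern = {"case_type": "theft", "suspect_age": 32}
pattern_support = support(pattern, objects, comparators)
```

In this toy dataset two of the four objects match the pattern, so it counts as frequent for any threshold up to 0.5.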

For any , there is . Then, the following conclusions will be obtained [2]:

5. Model Building

The data in this article come from the Internet. They are processed by machine learning methods, and mathematical statistics tools are used to produce the statistical charts.

The model system in this paper uses the association rule algorithm in data mining technology to mine the source data in the database of the existing electronic evidence management system, find frequent item sets, generate association rules, and present the results. Moreover, it is used to analyze the relationships between cases and between criminal suspects and to predict possible cases and the motives of criminal suspects. This prototype system is mainly an electronic evidence analysis system. Its data can come from existing computer forensic systems and electronic evidence management and analysis systems. Data can be obtained from hard disks, for example text files, video and audio files, email clients, browser records, and QQ and Fetion client chat records. Data can also come from mobile devices, including portable hard disks, mobile phones, and USB flash drives, for example mobile phone address books, short messages, and chat records of smartphone clients. The data obtained by these forensic tools are stored in the database or data warehouse of the electronic evidence analysis system. The focus of this paper is to use association rule algorithms to mine tacit knowledge from the databases or data warehouses that store electronic evidence and finally to present this knowledge and store it in the knowledge base. The entire system architecture is shown in Figure 5.

After analysis and certification, the system first preprocesses the evidence data in the original database, extracts useful evidence data, and removes irrelevant evidence data. After that, the system organizes the concepts in these evidence data into hierarchies, transforms them, and selects related attributes to convert into the form required by data mining technology. After the preprocessing of the evidence data is completed, the analysts apply the improved FP-growth algorithm proposed in this paper to the data to mine association rules. The system business flow chart is shown in Figure 6.
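The step from frequent item sets to association rules can be sketched as follows. To keep the example short, frequent item sets are found by brute-force enumeration rather than by FP-growth or the improved algorithm of this paper, so this is only an illustration of the support and confidence calculations. The function names and the toy evidence items are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

Itemsets = Dict[FrozenSet[str], float]
Rule = Tuple[FrozenSet[str], FrozenSet[str], float]


def frequent_itemsets(transactions: List[Set[str]], min_support: float,
                      max_size: int = 3) -> Itemsets:
    """Brute-force search for item sets whose support reaches min_support."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    result: Itemsets = {}
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            sup = sum(1 for t in transactions if itemset <= t) / n
            if sup >= min_support:
                result[itemset] = sup
    return result


def association_rules(itemsets: Itemsets, min_confidence: float) -> List[Rule]:
    """Generate rules lhs -> rhs with confidence = sup(lhs + rhs) / sup(lhs)."""
    rules: List[Rule] = []
    for itemset, sup in itemsets.items():
        for size in range(1, len(itemset)):
            for lhs_items in combinations(sorted(itemset), size):
                lhs = frozenset(lhs_items)
                # Every subset of a frequent item set is itself frequent,
                # so its support is already in the dictionary.
                confidence = sup / itemsets[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, confidence))
    return rules


# Hypothetical evidence sources found on four seized devices.
transactions = [
    {"email", "chat"},
    {"email", "chat", "sms"},
    {"email"},
    {"chat", "sms"},
]
itemsets = frequent_itemsets(transactions, min_support=0.5)
rules = association_rules(itemsets, min_confidence=0.9)
```

With these toy transactions, the only rule that reaches a confidence of 0.9 is that devices containing short messages also contain chat records.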

The electronic evidence similarity frequent mining algorithm first proposed in this paper has two advantages. First, time is saved in the data preprocessing part: by formulating similarity rules, the work of frequently modifying data items in the data preprocessing stage is avoided. Second, the time spent executing the entire algorithm is much smaller than that of data preprocessing, as shown in Table 1 and Figures 7 and 8.

6. Model Performance Analysis

This paper analyzes the data processing performance of the model. First, this paper studies the classification effect of the model on criminal cases. The classification profile reflects the specific distribution interval of the 10 classes and the number of cases contained in each class. In order to make the boundaries of the 10 classes clearer, accurate class boundary intervals are obtained by drilling down. The specific money value distribution of the 10 classes is shown in Table 2 and Figure 9.

Next, this paper analyzes the accuracy of case investigation by the model constructed in this paper. This paper collects the case data that has been solved in the past 3 years, inputs the case data into the model for data processing, and compares the obtained results with the real results to calculate the correct rate of case detection. The results are shown in Table 3 and Figure 10.
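The correct rate described above is the share of solved cases for which the model output agrees with the real result. A minimal sketch follows; the function name and the sample labels are hypothetical and are not data from this study.

```python
from typing import Sequence


def detection_accuracy(predicted: Sequence[str], actual: Sequence[str]) -> float:
    """Fraction of cases where the model result equals the real result."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("at least one case is required")
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)


# Made-up labels for five solved cases, for illustration only.
model_results = ["gang", "fraud", "theft", "gang", "fraud"]
real_results = ["gang", "theft", "theft", "fraud", "fraud"]
accuracy = detection_accuracy(model_results, real_results)
```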

As shown in Figure 10 and Table 3, after real data from case investigation scenes were input, the detection rate obtained by the model was distributed between 45% and 65%. This indicates that the model constructed in this paper has a practical effect and can be applied in practice at a later stage.

7. Conclusion

This paper conducts in-depth research on the investigation analysis and decision-making of public security cases and proposes a case-based reasoning model based on two case databases. Moreover, this paper discusses in detail the use of data mining technology to automatically establish a case database, which is a useful exploration and practice for the public security department to establish a new and efficient case investigation auxiliary decision-making system. In addition, this paper studies the method of using data mining technology to assist in the establishment of a case database, analyzes the characteristics of traditional case storage methods, and designs and implements a CBML criminal case modeling language that conforms to the XML standard. In order to mine traditional case data and extract effective case information, this paper discusses how to perform data preprocessing, including operations such as data cleaning, data integration, data transformation, and data reduction. Then, this paper applies outlier data analysis and cluster analysis techniques to case mining and designs an electronic evidence analysis system model based on data mining. The model proposed in this paper mainly includes three parts: electronic evidence preprocessing, electronic evidence frequent pattern mining, and electronic evidence similar frequent pattern mining. The experimental results show that the model constructed in this paper has good performance.

This paper mostly uses simulation combined with a small amount of data for analysis. The model studied in this paper needs further validation in practice in follow-up work.

Data Availability

The labeled dataset used to support the findings of this study is available from the corresponding author upon request.

Conflicts of Interest

The author declares no competing interests.

Acknowledgments

This research is supported by the following: (1) General Project of Humanities and Social Science Research in Colleges and Universities in Henan Province in 2022: Research on the Countermeasures of Telecommunication Network Fraud Crimes in the Post-epidemic Era (2022-ZZJH-014) and (2) Henan scientific and technological research projects (202102310487).