Abstract

Community safety has become an important part of public safety, and building a safe community depends on accumulating community safety capabilities. This paper discusses the application of big data technology to community safety construction and to the improvement of community safety promotion capabilities. We analyzed the sources and collection methods of community data, classified the multisource heterogeneous community data, and organized it into seven types of community data. We then designed the conceptual structure and storage structure of the community database. On the basis of this database, we designed the architecture of a big data platform for community security. The functional requirements of the platform were analyzed from the perspective of different user types, and the overall architecture was designed in light of this demand analysis. On the basis of the overall architecture, the application architecture and technical architecture were designed in more detail, and the key technologies of the community big data platform were analyzed. Finally, we analyze how the community big data platform can be used to predict public security risks by constructing a CART regression tree model.

1. Introduction

For a long time, changes in the natural environment and damage caused by human activity have intensified the deterioration of the ecological environment in western China, and the contradiction between the environment and development has become increasingly prominent [1]. To protect the ecological environment while improving the production and living conditions of farmers and herdsmen in the region, the ecological immigration project has been implemented on a large scale in western China since the 1990s, with vigorous promotion by the government [2]. The implementation of ecological migration has played a certain role in improving the local ecological environment and promoting regional economic development. However, the farmers and herdsmen who relocated, whether voluntarily or under compulsion, have faced both opportunities and challenges. In the process of constructing an immigrant community, there is always some dissatisfaction. At present, safety issues are the most important issues in the ecological immigrant community [3]. This article discusses how to improve the safety management of the community and realize the goal of intelligent community management in combination with big data technology.

Yoko Shiraishi of Ritsumeikan University in Japan studied the safety of the elderly under the safe community movement [4]. After analyzing the data of six international safe communities in Japan, she found that the safe community movement has effectively improved community safety: the number of falls and other types of dangerous events has been reduced, which ultimately improves the quality of life of the elderly [5]. Cecilia studied the factors that promote the sustainability of safe community projects and argued that safety promotion differs from injury prevention, which focuses on technical solutions, in that it tries to influence people’s attitudes. To sustain the development of safe communities, information exchange is a necessary factor: the safe community needs to provide information to decision-makers, residents, and staff on a regular basis so that the safety plan is incorporated into their daily work, and cooperation between the safe community and the media is an area that can be improved in the future [6]. Nordqvist et al. compared communities where safety interventions were carried out with ordinary communities serving as a control group and assessed the safety risk status of children in the two types of communities. The results showed that the risk of injury to children in communities with safety interventions was significantly lower than that in the control group [6]. Chen and Adefila proposed a higher education participation model for “effective disaster prevention and mitigation education.” Their research shows the positive impact of college students participating in community “effective disaster prevention and mitigation education” through field trips and internships, and they concluded that incorporating such education into higher education benefits both the community and students. Local higher education institutions should cooperate with local communities to maintain an impact on the sustainable development of society [7]. Barbosa et al. developed and tested a data service platform with complex authentication, authorization, logging, and auditing mechanisms. The platform provides safety assistance and comprehensive care for the elderly in the community by combining a support service platform with a complex safety mechanism, but the authors also found that the platform still has room for development in terms of data collection [8].

Today, the world has entered the era of “big data.” Big data technologies have matured and are used in many fields, and many scholars study the application of big data in the field of public safety. They design big data platforms, combine and mine urban big data, and from this derive new ideas for public safety governance. Society is also aware of the importance of big data to social development, and the government actively promotes the application and development of big data in various fields [9]. Nevertheless, there are few studies on the application of big data platforms to the construction of community safety. Community security has become the foundation of social security, and big data plays an important role in more and more fields while still having considerable room for development. Therefore, the combination of big data and community security emergency management is likely to be a focus of future research.

2. Construction of Community Management Database

There are many types of data in the community, and there are also many data sources. The first step in establishing a community database is to collect community data, and the basis of collecting community data is to clarify the source of the data and what different types of community data can be collected from these sources. This section will analyze the source of community data.

2.1. Management Department and Organization Data System

The government departments that construct and manage the community hold basic information and data about the community, including the basic information of community residents, infrastructure information, construction information, greening information, and so on. The community information held by government departments is the most basic data of the community [10]. Although these data are considerable in volume and cover many data types, their content is not detailed, and they are not always updated in a timely manner. The data system of the community property management contains the basic information of the owners, house information, and the information of the cleaning personnel, maintenance personnel, security personnel, and other staff in the community.

The community data in the community construction and management department data system can be used as the basic data of the community big data platform database. Starting from the needs of the platform, these basic data are filtered, cleaned, and stored. However, there are barriers to data sharing with management departments. The construction of community big data platforms must actively cooperate with multiple management organizations to seek data sharing and break the barriers between community data systems [11].

2.2. Equipment Collection Data

Detection and monitoring equipment installed in modern communities collects real-time data on changes in the community. This equipment can collect data on the infrastructure equipment in the community, data on community residents, community environment data, and so on. The data covers multiple data types and rich data structures and is updated in real time. The data collected by community detection equipment is an important source of real-time data in the community database. Equipment-collected data in the community mainly takes the following forms:

2.2.1. Sensing Detection Equipment

Various sensing devices such as temperature sensors, infrared sensors, and water immersion sensors installed in the community collect dynamic data on the environment, infrastructure, and residents in the community [12]. The data collected by the detection equipment in the community has the characteristics of timely update, and the data reflects the real-time status of the detection object. The data collected by the detection equipment is a powerful support for the daily management of the community, the prediction and prevention of safety accidents, and the emergency decision-making of emergencies.

2.2.2. Video Surveillance Equipment

Video surveillance equipment collects dynamic video, image, and audio data on the movement of people in the community in real time. Face recognition cameras collect facial image information of people entering and leaving the community. 3D oblique photography collects information on buildings and houses and, after modeling, outputs 3D video data of community houses. The community data collected by video monitoring equipment is mainly semistructured and unstructured. This data is visual and therefore more intuitive, and it can also be collected in real time.

2.2.3. Modern Technology Equipment Such as Drones and Robots

Modern drone technology combines detection equipment and video monitoring equipment to collect data in the community and combines the types and characteristics of data collected by the two types of equipment [13]. Robots patrol the community, collect road information, real-time information on the flow of people, and so on. The community information collected by modern technology equipment can be combined with the characteristics of data collected by multiple devices, including multiple data structures and data types.

2.3. Personnel Collecting Data

The data collected manually by community grid members supplements the data that the equipment cannot collect. Manually collected data covers more types and is more detailed. Manual collection is highly flexible: grid members can check the conditions of equipment, buildings, and the environment, communicate with residents, and collect unstructured text information in the community. Manually collected data is an important source of multisource heterogeneous data in community databases.

2.4. Community Safe Data Network

The previously collected data is stored in seven types of databases. Data entities in different types of databases are closely linked. Starting with one community entity as the center, the entity is connected with other data entities and radiates outward to its surroundings in a “radiation type” pattern. Each data entity can in turn serve as a center that connects other data entities in the same radial model [14]. After continuous connection, the data in the community database eventually forms a community data network that is connected as a whole. Figure 1 shows a schematic diagram of the community data network, with the residents of a certain community as the center, forming one part of the network. The figure gives an example of the data in the local community data network. The data takes people as the main unit and is expressed in a tree-like structure, which is then associated with related items, firefighting facilities, and so on.

The community data network makes the community data a whole, in which a data entity can be described from different angles. For example, a resident of a certain community can be the owner of a house, the owner of a vehicle, the owner of a pet, and the subject of collected video information. This establishes a “community data cube,” that is, “multidimensional” data in which each data item is related to other data from multiple angles. In practical applications, according to the type of accident concerned, a certain community data item is taken as the center, and the data related to the accident is extracted from the multidimensional associated data [15]. For example, in case of a fire in a household, the community resident data is taken as the center, and the associated house data, family data, neighbor data, fire equipment data, sensor data, and so on are selected. Extracting and processing these fire-related data from the community data network supports further data analysis and processing.
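As a minimal sketch of this kind of extraction, the community data network can be treated as a graph and searched outward from the center entity. The entity identifiers, the set of fire-relevant entity types, and the two-hop limit below are all hypothetical choices made for illustration and are not taken from the platform described in this paper.

```python
from collections import deque

# Hypothetical fragment of a community data network: each key is a data
# entity and each value lists the entities it is directly associated with.
NETWORK = {
    "resident:R001": ["house:H101", "vehicle:V09", "pet:P03"],
    "house:H101": ["resident:R001", "sensor:S17", "fire_equipment:F02", "house:H102"],
    "house:H102": ["house:H101", "resident:R002"],
    "resident:R002": ["house:H102"],
    "sensor:S17": ["house:H101"],
    "fire_equipment:F02": ["house:H101"],
    "vehicle:V09": ["resident:R001"],
    "pet:P03": ["resident:R001"],
}

# Entity types assumed to matter for a household fire (illustrative only).
FIRE_TYPES = {"resident", "house", "sensor", "fire_equipment"}


def related_entities(network, center, relevant_types, max_hops=2):
    """Breadth-first search from `center`, keeping entities of relevant types."""
    seen = {center}
    queue = deque([(center, 0)])
    selected = []
    while queue:
        node, hops = queue.popleft()
        if node.split(":")[0] in relevant_types:
            selected.append(node)
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in network.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, hops + 1))
    return selected


fire_data = related_entities(NETWORK, "resident:R001", FIRE_TYPES)
print(fire_data)
```

With the resident as the center, the search keeps the resident's house, the neighboring house, the sensor, and the fire equipment, and it leaves out the vehicle and pet data that are associated with the resident but not with the fire.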

2.5. Design of Community Safety Database Structure

Community data comes from objective entities that exist in the community, and the community database is a collection of community data. The community database is essentially a study of objective entities in the community and reflects the existing connections between entities. However, the physical forms that exist in reality cannot directly enter the computer. They must be abstracted into data and stored in the computer before they can be recognized, analyzed, and processed by the computer. Conceptual structure design is the process of abstracting real entities into conceptual models [16]. The entity-relationship model (i.e., the E-R model) is one of the most commonly used modeling methods for database conceptual models. This section uses the E-R model method to design the conceptual structure of the community database. The main components of the E-R model are entities, attributes, and connections. The specific representation methods are as follows.
(1) The entity is represented by a rectangle, and the entity name is written inside the rectangle.
(2) The attribute is represented by an ellipse with the attribute name inside, and it is connected to the entity by an undirected edge.
(3) The connection between entities is represented by a diamond with the connection name inside. The connected entities are joined to the diamond by undirected lines, and the connection type is indicated on each line. There are three types of connection: one-to-one (1:1), one-to-many (1:n), and many-to-many (m:n).

Taking a fire accident in a house in the community as an example, a fire accident occurs in a house. Multiple residents can live in this house. Multiple firefighting and first-aid equipment are used to participate in fire rescue work. The fire accident will have a harmful impact on residents. Part of the data conceptual model design of the community fire accident is shown in Figure 2. The E-R model diagram shows the conceptual structure and logical relationship of some data.

2.6. The Design of the Storage Structure of the Community Security Database

The community database is a relational database, and there are logical connections between entity data. The data storage structure is designed on the basis of the data conceptual structure model. Each community entity has a unique code that identifies it in the database. This code is the entity’s identification attribute, such as the resident code, house number, equipment number, or vehicle number. On this basis, the layer table, class table, and detailed attribute table of the community database are established [17].

The layer table corresponds to the seven types of information at the database level and records the entity code, entity name, level information, level code, and other content of each level. The class table records the class code, entity code, and entity name. The detailed attribute table records the entity code, entity name, and attribute data of the entity. Take population data in the community as an example. The population data layer table contains the population data layer code and three types of entity codes: the permanent population code, the temporary resident population code, and the staff code. The permanent population class table contains the permanent population code together with the code and name of each permanent resident, and so on. The detailed attribute table of a resident contains detailed attribute data, such as resident code, name, occupation, and contact information.

On the basis of the conceptual structure model of the database, the data are connected by a logical relationship. In the stored data table, the tables of each database layer establish a connection. The category code in the layer table is the same as the category code in the category table, and the category code in the category table is the same as the category code in the detailed attribute table, so that the layer table, the category table, and the detailed attribute table can be linked together. Through these connections, the query language of the database can be used to query and call relational data. The level definition of the data table is clear and simple. Different types of tables are connected by code relationships, and the connection path is concise, which improves the processing efficiency of query and call. Take the analysis of accidents in the community as an example. Data entities such as population data, building data, and infrastructure equipment data are all logically connected to the accident. Each entity level table, entity class table, and entity detailed attribute table are connected through entity codes to achieve multilevel data analysis and description of community accidents.
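The linkage between the three tables can be illustrated with a small in-memory database. The following is only a sketch under assumptions: the table names, column names, codes, and sample rows are hypothetical, and SQLite stands in for whatever relational database the platform actually uses.

```python
import sqlite3

# Hypothetical schema: layer -> class -> detailed attributes, linked by codes.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE layer_table  (layer_code TEXT, layer_name TEXT, class_code TEXT);
CREATE TABLE class_table  (class_code TEXT, entity_code TEXT, entity_name TEXT);
CREATE TABLE detail_table (entity_code TEXT PRIMARY KEY, class_code TEXT,
                           name TEXT, occupation TEXT, contact TEXT);
""")

conn.executemany("INSERT INTO layer_table VALUES (?, ?, ?)", [
    ("L01", "population", "C01"),  # permanent population
    ("L01", "population", "C02"),  # temporary resident population
    ("L01", "population", "C03"),  # staff
])
conn.executemany("INSERT INTO class_table VALUES (?, ?, ?)", [
    ("C01", "R001", "Resident A"),
    ("C01", "R002", "Resident B"),
    ("C03", "S001", "Guard C"),
])
conn.executemany("INSERT INTO detail_table VALUES (?, ?, ?, ?, ?)", [
    ("R001", "C01", "Resident A", "teacher", "555-0101"),
    ("R002", "C01", "Resident B", "driver", "555-0102"),
    ("S001", "C03", "Guard C", "security", "555-0103"),
])

# Follow the code relationships from the population layer down to the
# detailed attributes of every permanent resident.
rows = conn.execute("""
    SELECT l.layer_name, c.class_code, d.entity_code, d.name, d.occupation
    FROM layer_table AS l
    JOIN class_table AS c ON c.class_code = l.class_code
    JOIN detail_table AS d ON d.entity_code = c.entity_code
    WHERE l.layer_code = ? AND l.class_code = ?
    ORDER BY d.entity_code
""", ("L01", "C01")).fetchall()

for row in rows:
    print(row)
```

Because each join follows a single shared code, the query path from layer to class to detailed attributes stays short, which is the property the storage design relies on.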

3. Community Big Data Platform Architecture Design

This part develops the architecture design of a big data platform for community security. The platform is built to realize the functions of community safety management, community risk analysis, and equipment and facility safety monitoring. It facilitates the lives of community residents and the work of community managers, improves community safety capabilities, and reduces community safety risks. Starting from big data, technologies such as big data mining are used to build smart and safe communities. This part covers requirements analysis, platform architecture design, and the key technologies used.

3.1. Demand Analysis

This article divides the user types of the community big data platform into two categories, including general community residents and community manager users. Demand analysis is carried out separately according to user types.

Community residents’ demands on the platform are mainly divided into the following points.
(1) Basic Data Acquisition. Residents obtain public basic information through the output of the data platform, such as the safe operation and maintenance of infrastructure equipment, video surveillance in public areas, environmental conditions in the community, and the location and quantity of firefighting equipment. With this information, residents can keep track of the daily safety situation of the community and respond quickly and appropriately in case of emergencies.
(2) Receive Community Safety Notices. The big data platform carries out risk analysis and risk assessment on the community through data analysis and mining. Community safety information is sent to residents through the platform. Residents understand community risks and learn safety knowledge through the platform. By receiving safety information, accidents can be effectively prevented.
(3) Provide Hazard Information. Residents who find dangerous situations in the community can report them to the data platform immediately.

Community safety management is an important part of the work of community managers, who have community safety management needs for the data platform. The main requirements are as follows.
(1) Community Daily Safety Management. Community managers obtain community facilities and equipment data, resident data, environmental data, monitoring data, sensor acquisition data, and so on through the platform to carry out daily safety management.
(2) Community Risk Analysis. The data platform carries out risk analysis and risk assessment on the community. Knowing the risks existing in the community, managers can investigate and manage them in time, avoid accidents, and improve the safety level of the community.
(3) Community Data Monitoring. Through the data collected by sensing detection equipment and video monitoring equipment in the community, the manager can carry out daily monitoring of the facilities and equipment, buildings, and community environment and can track and monitor suspicious people moving through the community.
(4) Community Emergency Management. When an emergency occurs in the community, the manager uses the data platform for emergency management and takes emergency measures for the accident.
(5) Analysis and Handling of Community Accidents. After a community accident occurs, the manager processes and analyzes the accident data, learns from the accident, and improves the safety management level and the accident emergency response level.

3.2. Platform Architecture Design

Using big data technologies, we build a big data system for community safety that sorts out and integrates the community data collected from multiple sources, and we design the overall architecture of the community big data platform in combination with users’ functional requirements and the technical elements of the platform. Oriented to community security and designed from the perspective of users, the platform processes structured, semistructured, and unstructured community data in real time, so as to make community security management data-driven and intelligent.

The community big data platform is generally divided into the following parts: data input layer, data storage layer, data processing layer, application access layer, and data output layer. The overall architecture is shown in Figure 3.

For community data collected from different sources, the platform extracts, transforms, and loads the data through ETL technology. After the multisource heterogeneous data is preprocessed, it is stored according to different categories. The Hadoop system is used to analyze and process the different types of data. According to the functional requirements of platform users, the community data is analyzed comprehensively and intelligently. The results of data analysis can be output visually through the emergency management platform and mobile terminals.

3.3. Technical Architecture

The technical architecture of the big data platform is divided into five layers:
(1) Data Collection Layer. The data collected by various collection methods in the community are processed and imported through ETL technology.
(2) Distributed Storage Computing Layer. This layer mainly uses HDFS and other technologies for distributed storage of data, MapReduce and Hive for distributed computation and analysis of data, and ZooKeeper for distributed coordination.
(3) Interface Layer. FTP, web services, the R language interface, and other technologies provide interfaces for data transmission.
(4) Business Application Layer. This layer realizes the main functional applications of the community big data platform, such as data query, data fusion, data analysis, data matching, and risk analysis.
(5) Access Layer. The data platform provides users with visual data output through different terminals, such as mobile terminals, web browsers, the community integrated management platform, and so on.

3.4. Data Preprocessing

The multisource heterogeneous data collected in the community through various technical methods is preprocessed with ETL technology, that is, data extraction, cleaning, conversion, and loading, before it reaches the data storage and computing layer. The ETL workflow is shown in Figure 4. After the system data is integrated with the collected data, data extraction and data cleaning are completed to obtain verified data. Each dataset is then matched to its task and stored in the corresponding location.

3.4.1. Data Extraction

The first step of collecting community data is to extract data. The data extraction shall comprehensively determine the extraction rules by considering the data extraction frequency, data update frequency, and other contents based on meeting the needs of users. The data extraction of community data platform shall meet the extraction requirements of structured data, semistructured data, and unstructured data. The frequency of data extraction shall meet the actual needs of real-time updating of community data. The extraction method should include full extraction, incremental extraction, and so on.
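The difference between full and incremental extraction can be sketched as follows. The record layout, the `updated_at` change-tracking field, and the watermark approach are assumptions introduced for illustration; the paper does not specify how incremental changes are detected.

```python
from datetime import datetime

# Hypothetical source records; `updated_at` is an assumed change-tracking field.
RECORDS = [
    {"sensor_id": "S17", "value": 21.5, "updated_at": datetime(2019, 3, 1, 8, 0)},
    {"sensor_id": "S18", "value": 19.0, "updated_at": datetime(2019, 3, 1, 9, 30)},
    {"sensor_id": "S17", "value": 22.1, "updated_at": datetime(2019, 3, 1, 10, 15)},
]


def full_extract(records):
    """Full extraction: copy every record regardless of when it changed."""
    return list(records)


def incremental_extract(records, last_extracted_at):
    """Incremental extraction: only records changed after the last run."""
    return [r for r in records if r["updated_at"] > last_extracted_at]


def next_watermark(batch, previous):
    """Advance the watermark to the newest record seen, if any."""
    return max((r["updated_at"] for r in batch), default=previous)


watermark = datetime(2019, 3, 1, 9, 0)          # time of the previous run
batch = incremental_extract(RECORDS, watermark)
watermark = next_watermark(batch, watermark)
print(len(full_extract(RECORDS)), len(batch), watermark)
```

Full extraction suits the initial load of historical data, while incremental extraction with a short interval between runs is closer to the real-time update requirement described above.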

3.4.2. Data Cleaning and Processing

The data extracted from different sources may have inconsistent storage formats, incomplete records, and mismatched classifications or data types, so the data must subsequently be cleaned, converted, and processed.

The community database is a relational database, and community data is multisource and heterogeneous. Combined with the actual needs of the data platform, the data can be processed in two ways. The management department data system holds structured community data, which can be processed with SQL statements. This method converts and processes structured data clearly and efficiently. Semistructured and unstructured data cannot be processed with SQL. They need to be processed in the ETL engine, which performs data type conversion, data cleaning, data matching queries, and so on, according to functional and storage requirements and defined conversion rules.
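For the structured case, a cleaning step expressed in SQL might look like the following sketch. The table, the columns, and the cleaning rules (trimming names, filling missing phone numbers, dropping rows without an identification code, removing duplicates) are hypothetical examples, with SQLite used only as a convenient stand-in.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_residents (resident_code TEXT, name TEXT, phone TEXT)")
conn.executemany(
    "INSERT INTO raw_residents VALUES (?, ?, ?)",
    [
        ("R001", "  Resident A ", "555-0101"),
        ("R001", "Resident A", "555-0101"),   # duplicate once the name is trimmed
        ("R002", "Resident B", None),          # missing phone number
        (None, "Unknown", "555-0199"),         # missing identification code
    ],
)

# Clean with plain SQL: trim text, fill missing values, drop rows without
# an identification code, and remove exact duplicates.
conn.execute("""
    CREATE TABLE clean_residents AS
    SELECT DISTINCT
        resident_code,
        TRIM(name) AS name,
        COALESCE(phone, '') AS phone
    FROM raw_residents
    WHERE resident_code IS NOT NULL
""")

clean = conn.execute(
    "SELECT resident_code, name, phone FROM clean_residents ORDER BY resident_code"
).fetchall()
print(clean)
```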

In addition, due to the different data formats of different data, in order to keep the subsequent data interfaces consistent, the data needs to be aligned. The specific method is to keep different types of data in the same data type and use the long integer format for data storage.
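One possible reading of this alignment step is sketched here. The paper does not say how non-integer values are mapped to long integers, so the fixed-point scale for measurements, the use of epoch seconds for timestamps, and the treatment of naive timestamps as UTC are all assumptions.

```python
from datetime import datetime, timezone

SCALE = 1000  # assumed fixed-point scale (three decimal places); not from the paper


def to_long(value):
    """Map a heterogeneous field value to a single integer representation."""
    if isinstance(value, bool):          # check bool first: bool is a subclass of int
        return int(value)
    if isinstance(value, datetime):
        if value.tzinfo is None:         # assumption: naive timestamps are UTC
            value = value.replace(tzinfo=timezone.utc)
        return int(value.timestamp())    # seconds since the Unix epoch
    if isinstance(value, (int, float)):
        return round(value * SCALE)      # fixed-point encoding of measurements
    if isinstance(value, str):
        return round(float(value.strip()) * SCALE)  # raises ValueError if not numeric
    raise TypeError(f"unsupported type: {type(value).__name__}")


raw = {
    "temperature": "21.5",
    "door_open": True,
    "people_count": 3,
    "collected_at": datetime(1970, 1, 1, 0, 1),
}
aligned = {field: to_long(value) for field, value in raw.items()}
print(aligned)
```

Free text, images, and video cannot be aligned this way; in such a scheme they would be stored separately and referenced from the aligned record by an integer identifier.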

3.4.3. Data Loading

Loading the extracted, cleaned, transformed, and processed community data into the storage computing layer is the last step of community data preprocessing. The formulation of data loading scheme needs to consider the loading efficiency on the basis of meeting the functional application needs of community platform users. Considering a large amount of historical data and real-time dynamic data collected by the community, the community data platform should realize batch data loading. A large amount of data should be loaded into different databases at the same time, and manual loading is supported.
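Batch loading can be sketched as follows, with one transaction per batch so that a failed batch does not leave partial data behind. The table name, the row layout, and the batch size are illustrative assumptions, and a single SQLite database stands in for the several target databases mentioned above.

```python
import sqlite3
from itertools import islice


def batches(iterable, size):
    """Yield successive lists of at most `size` items."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def load_in_batches(conn, rows, batch_size=1000):
    """Insert rows batch by batch; each batch is committed as one transaction."""
    loaded = 0
    for chunk in batches(rows, batch_size):
        with conn:  # commits on success, rolls back if the batch fails
            conn.executemany("INSERT INTO sensor_readings VALUES (?, ?, ?)", chunk)
        loaded += len(chunk)
    return loaded


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sensor_readings (sensor_id TEXT, reading INTEGER, ts INTEGER)")

# Hypothetical preprocessed readings produced lazily by a generator.
rows = ((f"S{i % 5}", 20000 + i, 1551427200 + i) for i in range(2500))
loaded = load_in_batches(conn, rows, batch_size=1000)
count = conn.execute("SELECT COUNT(*) FROM sensor_readings").fetchone()[0]
print(loaded, count)
```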

4. Risk Prediction Based on CART Regression Tree

4.1. CART Regression Tree Construction

When analyzing the factors related to community public security risks, it is found that the characteristic data is basically composed of four categories and a total of 21 attributes. The decision tree can more accurately find the relationship between attributes and classes, and when the relationship is found, it can be used to predict feature data of unknown categories. Therefore, this article chooses to use CART regression tree to build a single model. CART assumes that the decision tree is a binary tree, the values of internal node characteristics are “yes” and “no,” the left branch is the branch with the value of “yes,” and the right branch is the branch with the value of “no.” Such a decision tree is equivalent to recursively bisecting each feature, dividing the input space, that is, the feature space into finite units, and determining the predicted probability distribution on these units, that is, the conditional probability distribution output under the given input conditions.

Since predicting future community security incidents from various risk factors is a numerical prediction task, the CART regression tree is used when constructing the decision tree model. Figure 5 is the flowchart of the CART regression decision tree algorithm. During model construction, the split with the smallest MSE is selected as the dividing point of the feature data, and the average of all samples that fall into a node is used as that node’s predicted number of community public security cases.
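In the usual least-squares formulation of CART regression, this criterion can be written as follows. The notation ($j$ for the feature index, $s$ for the cut point, $R_1$ and $R_2$ for the two resulting regions, and $c_1$ and $c_2$ for their output values) is introduced here for illustration and does not appear in the original text.

$$\min_{j,\,s}\left[\sum_{x_i \in R_1(j,s)} (y_i - c_1)^2 + \sum_{x_i \in R_2(j,s)} (y_i - c_2)^2\right], \qquad c_m = \frac{1}{|R_m(j,s)|}\sum_{x_i \in R_m(j,s)} y_i, \quad m = 1, 2,$$

where $R_1(j,s) = \{x \mid x^{(j)} \le s\}$ and $R_2(j,s) = \{x \mid x^{(j)} > s\}$. Minimizing the summed squared error is equivalent to minimizing the sample-weighted MSE of the two child nodes, and $c_m$ is the node average used as the predicted value.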

The CART regression tree construction process is as follows:
(1) In order to predict the volume of community security cases and verify the accuracy of the model, all feature data are first divided into a training set and a test set, and a regression tree training sample set is constructed. The community public security risk data set is $S = \{(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)\}$, where $(X_i, Y_i)$ is a single sample, $X_i$ is a vector constructed according to the data sources introduced previously, and $Y_i$ is the output target value, that is, the volume of community security cases to be predicted.
(2) Create a root node based on the community public security risk training sample set constructed in step (1). When only one sample is left, the node is returned as a leaf node. When the samples corresponding to the node all belong to the same category (that is, share the same target value), the node is also returned as a leaf node, and the predicted value is output.
(3) For each feature used to predict community public security cases, perform a division on the feature and calculate the corresponding segmentation index value. This study uses the mean square error as the segmentation index. A feature value in the community public security data is first selected as a cutting point and the mean square error of that division is calculated, and the calculation is then repeated for the remaining values and features. After all feature values have been traversed, the mean square errors are compared, and the feature value with the smallest mean square error is the best cut point for the regression tree.
(4) According to the segmentation index values calculated in step (3), select the best segmentation feature and divide the community public security case sample set S into the sample subsets S1 and S2 accordingly.
(5) For sample subset S1 and sample subset S2, repeat steps (2) to (4) until the generated regression tree no longer produces new branches.
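The construction steps can be sketched in plain Python as follows. This is an illustrative implementation under assumptions, not the code used in this study: the function names, the dictionary representation of the tree, the depth and leaf-size stopping parameters, and the toy data are all hypothetical.

```python
def mse(values):
    """Mean squared error of values around their own mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def best_split(X, y, min_leaf):
    """Return the (feature index, cut point) with the smallest weighted MSE."""
    best, best_score = None, float("inf")
    n = len(y)
    for j in range(len(X[0])):                      # traverse every feature
        for s in sorted({row[j] for row in X}):     # and every candidate cut point
            left = [y[i] for i in range(n) if X[i][j] <= s]
            right = [y[i] for i in range(n) if X[i][j] > s]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            score = (len(left) * mse(left) + len(right) * mse(right)) / n
            if score < best_score:
                best, best_score = (j, s), score
    return best


def build_tree(X, y, depth=0, max_depth=4, min_leaf=1):
    """Recursively grow a regression tree; each leaf stores the mean target."""
    split = None
    if depth < max_depth and len(set(y)) > 1:       # stop on pure or deep nodes
        split = best_split(X, y, min_leaf)
    if split is None:
        return {"value": sum(y) / len(y)}
    j, s = split
    left = [i for i in range(len(y)) if X[i][j] <= s]
    right = [i for i in range(len(y)) if X[i][j] > s]
    return {
        "feature": j,
        "threshold": s,
        "left": build_tree([X[i] for i in left], [y[i] for i in left],
                           depth + 1, max_depth, min_leaf),
        "right": build_tree([X[i] for i in right], [y[i] for i in right],
                            depth + 1, max_depth, min_leaf),
    }


def predict(tree, x):
    """Walk from the root to a leaf and return the leaf's mean value."""
    while "value" not in tree:
        tree = tree["left"] if x[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree["value"]


# Toy data: one feature and two clearly separated groups of target values.
X = [[1], [2], [3], [10], [11], [12]]
y = [5, 6, 7, 20, 21, 22]
stump = build_tree(X, y, max_depth=1)
print(stump["threshold"], predict(stump, [2]), predict(stump, [11]))
```

With a depth limit of 1, the single split separates the two groups, and each leaf predicts the average of its own samples. In practice a library implementation would normally be used instead of hand-written code.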

If the CART regression tree is divided too finely, it overfits the noise in the community public security data, so the tree is postpruned. When noise interferes so much that the model memorizes the noise characteristics and ignores the real relationship between input and output, pruning is the remedy. Moreover, only after pruning does the CART regression tree in this paper retain the most important divisions on community security feature attribute values. This article divides the community public security case data set into a training set and a test set. The training set is used to learn the CART regression tree, and the separate test set is used to evaluate the accuracy of the tree on subsequent data. Using the cost-complexity pruning strategy in postpruning, the surface error rate gain value $\alpha$ of each nonleaf node in the decision tree is calculated, and the nonleaf node with the smallest value is pruned. The calculation formula of $\alpha$ is as follows:

$$\alpha = \frac{R(a) - R(T_a)}{|N_{T_a}| - 1}, \qquad R(a) = r(a)\,p(a)$$

Here, $|N_{T_a}|$ is the number of leaf nodes contained in the subtree $T_a$ rooted at node a; $R(a)$ is the error cost when node a is pruned; $r(a)$ is the error rate of node a; $p(a)$ is the proportion of all data that falls on node a; and $R(T_a)$ is the error cost of the subtree if the node is not pruned.
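A worked example of the pruning criterion is sketched below, assuming the standard cost-complexity form in which the gain of a nonleaf node is the increase in error cost caused by pruning it, divided by the number of leaves removed. The tree, its per-node errors, and its sample counts are invented for illustration, and for a regression tree the node "error rate" is taken here to be the node's mean squared error.

```python
def leaves(node):
    """Return all leaf nodes of the subtree rooted at `node`."""
    if "children" not in node:
        return [node]
    return [leaf for child in node["children"] for leaf in leaves(child)]


def node_cost(node, n_total):
    """Error cost R(a) = r(a) * p(a), with p(a) the share of samples at the node."""
    return node["error"] * node["n"] / n_total


def alpha(node, n_total):
    """Gain value: (R(a) - R(T_a)) / (number of leaves in T_a - 1)."""
    sub = leaves(node)
    subtree_cost = sum(node_cost(leaf, n_total) for leaf in sub)
    return (node_cost(node, n_total) - subtree_cost) / (len(sub) - 1)


def weakest_link(root):
    """Return the nonleaf node with the smallest gain value."""
    n_total = root["n"]
    internal, stack = [], [root]
    while stack:
        node = stack.pop()
        if "children" in node:
            internal.append(node)
            stack.extend(node["children"])
    return min(internal, key=lambda a: alpha(a, n_total))


# Hypothetical tree: `error` is the node's error rate and `n` its sample count.
node_a = {"name": "A", "error": 4.0, "n": 60, "children": [
    {"name": "A1", "error": 3.5, "n": 30},
    {"name": "A2", "error": 3.5, "n": 30},
]}
tree = {"name": "root", "error": 10.0, "n": 100, "children": [
    node_a,
    {"name": "B", "error": 2.0, "n": 40},
]}

print(alpha(tree, 100), alpha(node_a, 100), weakest_link(tree)["name"])
```

Here pruning node A raises the error cost only slightly (a gain of about 0.3, against about 3.55 for the root), so A is the first node to be pruned.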

Finally, after the CART regression tree prediction model is trained, the model is saved and used to predict the community security case test dataset. The prediction accuracy of the model is calculated by comparing the predicted numbers of community security cases with the actual values in the test set.
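The comparison between predicted and actual values can be quantified in several ways. The text does not state which error measure is used, so the sketch below assumes two common choices for regression, the mean absolute error (MAE) and the root mean square error (RMSE). The function names and the sample values are hypothetical.

```python
import math


def mae(actual, predicted):
    """Mean absolute error between observed and predicted case counts."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error between observed and predicted case counts."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


# Hypothetical test-set values, for illustration only.
actual_cases = [10, 12, 9, 15]
predicted_cases = [11, 12, 8, 13]

test_mae = mae(actual_cases, predicted_cases)
test_rmse = rmse(actual_cases, predicted_cases)
```

RMSE penalizes large individual errors more strongly than MAE, which matters when the prediction is used to schedule police resources around peaks.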

4.2. Analysis of Results

Feature engineering is carried out on the original input data using correlation analysis. The correlation between the data from seven days earlier and the target value is 0.978, and the correlation between the data from six days earlier and the target value is 0.982. Both lagged values are therefore strongly correlated with the target value. To enrich the input data, the numbers of community security cases seven days earlier and six days earlier are taken as the input features. The maximum tree depth is set to 4 and the minimum leaf node size is set to 5.
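The lag-based feature construction can be sketched as follows. This is an illustration under stated assumptions: the series `daily_cases` is synthetic, the helper names `pearson` and `make_samples` are hypothetical, and the Pearson coefficient is assumed as the correlation measure. The sketch does not reproduce the correlation values reported above.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def make_samples(series, lags=(7, 6)):
    """Build (X, y): each row holds the values `lag` days before the target."""
    max_lag = max(lags)
    X = [[series[t - lag] for lag in lags] for t in range(max_lag, len(series))]
    y = series[max_lag:]
    return X, y


# Synthetic daily case counts (assumption: slow seasonal variation).
daily_cases = [round(30 + 10 * math.sin(t / 20)) for t in range(120)]

# Correlation between the target and the value observed `lag` days earlier.
corr_lag7 = pearson(daily_cases[:-7], daily_cases[7:])
corr_lag6 = pearson(daily_cases[:-6], daily_cases[6:])

X, y = make_samples(daily_cases)
```

The resulting `X` and `y` can be passed to any regression tree implementation. In scikit-learn, for example, the settings above would correspond to `max_depth=4` and `min_samples_leaf=5` on `DecisionTreeRegressor`, assuming the minimum leaf node size refers to the number of samples per leaf.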

Figure 6 compares the predictions of the CART regression tree model on the test set with the actual observed values. The data have a granularity of one day; the abscissa is the date, and the ordinate is the number of community security cases. The gray solid line is the actual number of community security cases collected every day in Z community of H City in 2019, and the red solid line is the prediction of the CART regression tree model. The trends of the two lines are basically the same, and each has two peaks and two troughs. From the beginning of the year to the first trough, the predicted value is much higher than the actual value, but after the first trough the difference between the predicted and actual values is very small. At the peaks, the predicted value is higher than the actual value. At the first trough, the predicted value is higher than the actual value, and at the second trough, the predicted value is lower than the actual value. Overall, the error between the observed and predicted values is small, which indicates that the proposed prediction model is suitable and supports the feasibility and effectiveness of the method in this paper.

5. Conclusion

Focusing on community security and drawing on big data technology, this paper analyzes, studies, and designs a community big data platform. It reviews the origin and development of the concept and theory of community in China. Using a questionnaire survey, it studies the relationship between factors such as community safety management and residents' satisfaction with the community, and analyzes residents' attitudes towards providing personal data to a noncommercial community big data platform. The classification of community data types was studied, and the community database structure was designed. Based on the community database, this paper summarizes the functional requirements of the community big data platform, designs the platform architecture using Hadoop and other big data technologies, and analyzes the key technologies in the platform. Finally, using the data in the database, a CART regression tree prediction model is constructed to predict community public security risk. The model facilitates the rational scheduling and allocation of community police resources in the future and provides decision support for the prevention and control of community public security risk.

Data Availability

The dataset can be accessed upon request to the author.

Conflicts of Interest

The author declares that there are no conflicts of interest.