Special Issue: Metaheuristics-based Explainable Artificial Intelligence (XAI) Models for Real-world Problems
Research on the Effect Evaluation of Urban Regional Environmental Management and Control Based on Data Depth Mining
Industrial and economic development in China are both advancing rapidly, and China is currently the world’s second-largest economy. The immediate results of this continuous economic development are a steady improvement in living standards, a much better living environment than in the past, and deepening urbanization, with cities becoming the first choice of where to live. Nonetheless, as the number of city dwellers continues to rise, the management and control of China’s urban regional environment face increasingly difficult challenges. The constant production of domestic waste by urban residents has a significant impact on the urban environment, and government departments have therefore issued urban regional environmental control measures to improve the appearance of cities. Because cities are so large and urban regional environmental control affects every aspect of residents’ lives, it has become extremely challenging to evaluate the impact of that control objectively, effectively, and precisely. To address this issue, this paper seeks to develop an evaluation system based on deep data mining together with computer and artificial intelligence technologies, in order to evaluate the effect of environmental management and control in urban areas objectively and to provide a scientific foundation for subsequent decision-making by government departments. Moreover, the model is highly robust and compatible with different levels of management and control regulations, so the effect of each control deployment can still be evaluated objectively.
In order to adapt to the development of social informatization, domestic state-owned enterprises, central enterprises, public institutions, and private enterprises have developed or implemented enterprise-appropriate management information systems [1–3]. The use of information management can not only improve management efficiency, but also standardize it, make it reasonable and justifiable, and contribute to the enhancement of enterprises’ core competitiveness. In comparison to traditional urban management, “digital urban management” is a novel concept. It refers to the use of advanced digital information technology to supervise and manage the city’s public facilities, city appearance, and environment, as well as the use of information-based technical measures to control all city parts and event information.
During the process of urban information management, an enormous quantity of diverse data is abstracted and stored in the database. The long-term accumulation of fundamental data in the database calls for a statistical system that reorganizes the business process data of urban management, generates statistical charts, and extracts information from the data. The conventional information management system serves as a tool to collect and organize data, but it only performs the preliminary work. Intellectualizing the conventional information system refers to the process of gaining knowledge from data. First, the data generated by the operation of the traditional information system are summarized and categorized to produce information, which is then analyzed and studied to produce knowledge. Ultimately, knowledge is transformed into wisdom. Intelligence can assist enterprise managers in making quick and accurate decisions, and the information obtained can assist senior leaders in understanding the enterprise’s current situation.
Due to the immense size of cities, urban area environmental control impacts every aspect of the lives of urban residents. Evaluating the impact of urban area environmental control objectively, effectively, and accurately has become extremely difficult. This paper attempts to develop an evaluation system employing data deep mining technology and computer and artificial intelligence-related technologies in order to achieve an objective evaluation of the effect of environmental management and control in urban areas and to provide a scientific foundation for subsequent government department decisions. In addition, the model is highly robust and compatible with various levels of management and control regulations, even when the control effectiveness of the corresponding control deployment is evaluated objectively.
2. Related Work
2.1. Data Mining
Data mining, also known as the discovery of useful knowledge in a massive amount of data, is a technique for extracting useful patterns and other crucial information from massive data sets [5–7]. Due to advancements in data warehousing technologies and the rise of big data, the use of data mining techniques has increased in recent decades, assisting businesses in transforming raw data into useful knowledge. Despite the constant development of technology to manage massive amounts of data, executives continue to face scalability and automation challenges.
The process of data mining can be roughly understood as three stages: data preparation, data mining, and the interpretation and evaluation of results, as shown in Figure 1.
It has never been easier to get into the world of data mining, and it has never been faster to get useful insights, thanks to technologies like Apache Spark, which combines data analytics and visualization. Artificial intelligence advancements are hastening industry adoption.
The data mining process comprises a variety of steps, from data collection through visualization to the extraction of meaningful information from enormous data sets. Data mining techniques are used to generate descriptions of concepts and to learn rules that classify unseen data. To meet these requirements, computer scientists and data scientists describe the data in terms of patterns, relationships, and similar constructs and store it with high precision; tasks such as classification, regression, and clustering can then be applied to analyze the data and uncover the knowledge hidden in it. In practice, data mining usually involves four phases: setting the objective of the task, collecting high-quality data, implementing the chosen algorithm, and evaluating the task and the algorithm fairly. These four steps are briefly introduced below.
(1) Set the objectives of the current business: This may be the most challenging aspect of applying data mining in practice, and many companies overlook this crucial stage.
(2) Data preparation: Once the scope of the problem is determined, data scientists can more easily determine which collection of data will help them answer the essential business questions. After gathering the data, they clean it, removing noise such as duplicates, missing values, and outliers.
(3) Build the model and find patterns in the data set: Depending on the nature of the research, data scientists may look at any noteworthy relationships in the data, such as sequential patterns, association rules, or correlations [10–13].
(4) Evaluation and application of knowledge: After the data has been compiled, the results must be assessed and understood. Final results should be valid, novel, valuable, and simple to understand.
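The data preparation step (removing duplicates, missing values, and outliers) can be sketched as follows. This is a minimal illustration, not the system's actual preprocessing: the record layout, the field name `pm10`, and the median-absolute-deviation outlier rule are assumptions introduced here, since the paper does not specify them.

```python
from statistics import median


def clean_records(records, field, k=3.0):
    """Drop duplicates, rows with a missing `field`, and outliers in `field`.

    Outliers are flagged with a median absolute deviation (MAD) rule, which
    is an assumption of this sketch; the paper does not name a rule.
    """
    # 1. Remove exact duplicates while preserving the original order.
    seen, unique = set(), []
    for row in records:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)

    # 2. Remove rows whose value for `field` is missing.
    present = [r for r in unique if r.get(field) is not None]
    if not present:
        return []

    # 3. Keep only values within k scaled MADs of the median.
    values = [r[field] for r in present]
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return present
    return [r for r in present if abs(r[field] - med) / (1.4826 * mad) <= k]


# Hypothetical monitoring records, for illustration only.
readings = [
    {"station": "A", "pm10": 41.0},
    {"station": "A", "pm10": 41.0},   # duplicate
    {"station": "B", "pm10": None},   # missing value
    {"station": "C", "pm10": 39.0},
    {"station": "D", "pm10": 44.0},
    {"station": "E", "pm10": 950.0},  # implausible spike
]
cleaned = clean_records(readings, "pm10")
```

In this toy input, stations A, C, and D survive: the duplicate, the missing reading, and the spike are dropped.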
2.2. Environmental Management
Environment and development are major issues of common concern in today’s world. At present, China’s economy is undergoing fast expansion. The expansion of energy consumption and traffic scale and the development of large industrial areas have increased the total emission of pollutants, and the issue of air pollution is growing increasingly serious. Therefore, the importance of environmental protection is becoming increasingly prominent.
The significance of this paper lies in addressing major problems in the fields of population, resources, and environment; relying on scientific and technological progress; promoting the coordinated development of science and technology, economy, environment, and society; further addressing urban ambient air quality; and improving the quality of people’s living environment. We focus on the pollutants of greatest concern to environmental monitoring departments, such as sulfur dioxide, nitrogen dioxide, carbon monoxide, and inhalable particulate matter. Together with the relevant meteorological data and geographic information, this large body of data is mined for important information. The data mining objectives are defined according to the analysis and control requirements of ambient air quality, and data mining is carried out in combination with visualization technology. On this basis, the urban environmental management and control effect evaluation system is established.
2.3. The Evaluation System
China’s software system for quality monitoring and evaluation began relatively late, and there is still a significant gap between China and many developed nations. Currently, the traditional system for quality inspection and evaluation is still in use, which is not only inefficient but also prone to error. According to the traditional system of quality evaluation, when unqualified products are discovered, the quality of the product can be deemed to be unqualified. The final judgment cannot be made until all products have been tested, which significantly increases the product quality inspection workload. The inspection process is susceptible to individual differences and other factors, and the inspection effect is not sufficiently stable. The accuracy of a traditional quality judgment is between 85 and 90 percent based on incomplete statistics, while the rate of unqualified products is between 2 and 5 percent. The manufacturing industry has a relatively large product base, so the number of misjudged products every day is astounding, which hinders the development of enterprises and the improvement of the quality evaluation system.
State-owned enterprises, central enterprises, public institutions, and private enterprises have introduced enterprise-appropriate management information systems to complete scientific, standardized, and information-based enterprise management. Information management can improve management efficiency, standardize it, make it reasonable and justifiable, and boost enterprise competitiveness. “Digital urban management” is a novel concept. It refers to the use of advanced digital information technology to manage the city’s public facilities, appearance, and environment, as well as information-based technical measures to control all city parts and event information.
There are numerous types of data that will be abstracted and stored in the database within the context of urban data management. The database had accumulated a substantial amount of basic data over time. Currently, a statistical system is required to reorganize the urban management business process data, generate statistical charts, and obtain data information. The conventional information management system is merely a tool for collecting and organizing data. The acquisition of knowledge from data is known as intellectualizing the traditional information system. First, the data generated by the operation of a conventional information system are summarized and categorized to produce information, which is then analyzed and studied to produce knowledge. Ultimately, knowledge is transformed into wisdom. The information gathered can assist senior executives in understanding the current state of the company, and intelligence can aid business managers in making quick and accurate decisions.
Using data mining technology to analyze city management statistical data, we can discover the laws governing the urban management industry. Data mining is a form of data intelligence analysis that gained popularity towards the end of the twentieth century. The process can start with a data warehouse and other types of data sources to discover information resources and useful information. Data mining combines multiple technologies. The objective is to extract hidden, meaningful information from vast amounts of data. It is a specific step in the discovery of database knowledge. Finding useful information in the database frequently refers to the entire process of locating useful information in the data warehouse. By employing mining algorithms and evaluation and interpretation patterns, we intend to enhance the discovered knowledge. In this repeated procedure, the outcomes become simple to comprehend. This procedure involves numerous steps. It is composed of three links, as depicted in Figure 2.
3. The Evaluation System of Urban Regional Environmental Management
The requirements of the comprehensive evaluation system are determined by analyzing and investigating the business process and data of the urban environmental management and control evaluation system. The system’s design is completed in this section. Based on the overall architecture diagram, this section first describes the design of the system, then uses the frame diagram to explain each function of the system, and finally describes the database design.
3.1. The Framework of the System
Based on a comparative study of various architectures, the comprehensive evaluation system adopts a browser/server (B/S) structure. The software is designed as a three-tier architecture and uses the SSH framework to ensure development efficiency and system stability, which also facilitates later expansion of the system. The system consists of a data presentation layer, a business logic layer, and a data access layer. Figure 3 depicts the system architecture.
3.1.1. Data Presentation Layer
This layer is mainly responsible for displaying the interface, which includes the view pages for ordinary users and the management pages for system administrators. The overall design is clean and simple, which makes the system convenient to operate. The pages are stored in .jsp files; .css files are loaded to style the pages, and .js files handle simple logic and interaction with the back end. In the presentation layer, page interaction is carried out by the JSP pages, which send requests to the back end and receive its responses.
3.1.2. Business Logic Layer
This layer supports the system’s front-end display and is an integral part of the comprehensive evaluation system. The most important logic processing is implemented in this layer. Its logic is complex, and it acts as a bridge between front-end user operations and the data in the database. It works in two ways: on the one hand, it receives the request sent by the front end, checks the user’s authority, carries out the corresponding user operation, and returns the data to display; on the other hand, it operates on the database, retrieving data according to the user’s operation to complete routine database operations. In the business layer, the Action classes call the DAO components through the business model and the collaborating objects provided by the Spring IoC container to complete the business logic.
3.2. The Design of the Data Mining Algorithm
The data mining module mainly uses the K-means algorithm and the C4.5 algorithm to analyze the data and display the results. The following describes the two algorithms and their application in the system.
3.2.1. The Basic Information about K-Means Algorithm
As a commonly used clustering method in machine learning, the K-means algorithm is an unsupervised learning algorithm. It works on samples without labels and “segments” them according to a similarity rule so that the same or similar objects are grouped together. Table 1 lists the main steps of the K-means algorithm.
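Since Table 1 is not reproduced here, the following is a minimal sketch of the standard K-means iteration (assign each sample to its nearest center, then move each center to the mean of its samples), assuming Euclidean distance and random initialization from the samples. The function names and the toy points are illustrative and are not taken from the paper.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points, k, max_iter=100, seed=0):
    """Return (centers, labels) after running the standard K-means loop."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # pick k samples as initial centers
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        # Update step: each center moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = mean_point(members)
    return centers, labels


# Two well-separated toy groups, for illustration only.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = kmeans(points, k=2)
```

On this toy input the first three points end up in one cluster and the last three in the other; which cluster receives index 0 depends on the random initialization.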
The K-means algorithm has several advantages and disadvantages. First, it is easy to understand and clusters well; although it only converges to a local optimum, a local optimum is frequently sufficient. Second, it scales well when processing big data sets. Third, it works very well when the clusters are approximately Gaussian. Finally, the complexity of the algorithm is low.
However, the K-means algorithm also has some important disadvantages:
(1) The K value must be set manually, and different K values give different results.
(2) It depends strongly on the initial cluster centers, and different initialization approaches yield different results.
(3) The algorithm is sensitive to outliers.
(4) Each sample can be assigned to only one cluster, so it is unsuitable for tasks in which a sample may belong to several categories.
(5) It is not suited to nonconvex cluster shapes, highly scattered clusters, or unbalanced cluster sizes.
Hence, we must refine the algorithm. In the following, we will give some useful approaches to improve the performance of the algorithm.
(1) Preprocess the Dataset. At its foundation, K-means partitions the data into clusters on the basis of Euclidean distance, so dimensions with a large mean and variance have a disproportionate impact on the clustering. As a result, data that have not been normalized or converted to a common unit cannot be used or compared directly. Two typical data preparation approaches are data normalization and data standardization.
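The two preparation approaches can be sketched per feature column as follows, assuming that "normalization" means min-max scaling to [0, 1] and "standardization" means z-scoring with the population standard deviation. The function names and the sample column are illustrative.

```python
from statistics import mean, pstdev


def min_max_normalize(column):
    """Rescale a list of numbers linearly to the interval [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:  # constant column: no spread to rescale
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


def z_score_standardize(column):
    """Shift and scale a list of numbers to zero mean and unit variance."""
    mu, sigma = mean(column), pstdev(column)
    if sigma == 0:
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]


# Hypothetical pollutant concentrations in one unit, for illustration.
column = [10, 20, 30]
normalized = min_max_normalize(column)      # [0.0, 0.5, 1.0]
standardized = z_score_standardize(column)  # mean 0, standard deviation 1
```

Applying either transform to every feature before clustering keeps a feature measured on a large scale from dominating the Euclidean distance.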
(2) Choose an Appropriate Value for K. The choice of K has a significant impact on K-means, which is also one of its main drawbacks. The elbow method and the gap statistic method are two common ways of determining K.
The elbow method is illustrated in Figure 4.
As shown in Figure 4, when K < 3, the curve decreases rapidly. When K > 3, the curve tends to be stable. With the elbow method, the inflection point K = 3 is therefore taken as the best value of K.
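A curve like the one in Figure 4 can be produced by running K-means for a range of K values and recording the within-cluster sum of squared errors (SSE) for each. The sketch below assumes SSE is the quantity on the vertical axis; the helper names, the number of restarts, and the toy data are illustrative choices rather than details from the paper.

```python
import random


def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_sse(points, k, n_init=10, max_iter=50, seed=0):
    """Lowest within-cluster sum of squared errors over n_init K-means runs."""
    rng = random.Random(seed)
    best = float("inf")
    for _ in range(n_init):
        centers = rng.sample(points, k)
        for _ in range(max_iter):
            # Assignment step: group points by nearest center.
            clusters = [[] for _ in range(k)]
            for p in points:
                nearest = min(range(k), key=lambda c: sq_dist(p, centers[c]))
                clusters[nearest].append(p)
            # Update step: an empty cluster keeps its previous center.
            updated = [
                tuple(sum(x) / len(c) for x in zip(*c)) if c else centers[j]
                for j, c in enumerate(clusters)
            ]
            if updated == centers:
                break
            centers = updated
        sse = sum(min(sq_dist(p, c) for c in centers) for p in points)
        best = min(best, sse)
    return best


# Three toy groups of points, for illustration only.
points = [
    (1, 1), (1, 2), (2, 1),
    (8, 8), (8, 9), (9, 8),
    (1, 8), (1, 9), (2, 8),
]
curve = {k: kmeans_sse(points, k) for k in range(1, 7)}
for k, sse in curve.items():
    print(k, round(sse, 2))
```

Plotting `curve` against K and looking for the point where the decrease flattens is the manual step that the elbow method relies on.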
The disadvantage of the elbow method is that the elbow must be identified by eye, so it is not automatic. The gap statistic method, proposed by several scholars at Stanford University, addresses this, as shown in formula (1):

$$\mathrm{Gap}(K) = E(\log D_K) - \log D_K \tag{1}$$

where $D_K$ is the loss function and $E(\log D_K)$ is the expectation of $\log D_K$. This expectation is commonly estimated by Monte Carlo simulation. We generate as many random samples as there are original samples, drawn from a uniform distribution over the region occupied by the data, and run K-means on this random sample to obtain one $D_K$. Repeating this many times, usually 20, gives 20 values of $\log D_K$, and averaging them yields an approximate value of $E(\log D_K)$. The gap statistic can then be calculated. The best K is the one that corresponds to the highest value of the gap statistic.
As can be seen from Figure 5, when $K = 3$, the value of $\mathrm{Gap}(K)$ is the highest, so the best number of clusters is 3.
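The Monte Carlo procedure for the gap statistic can be sketched as follows. The sketch assumes that the loss $D_K$ is the within-cluster sum of squared distances, that the reference samples are drawn uniformly from the bounding box of the data, and that K is smaller than the number of distinct points; the helper names and toy data are illustrative.

```python
import math
import random


def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_sse(points, k, rng, n_init=10, max_iter=50):
    """Lowest within-cluster sum of squares (D_K) over n_init K-means runs."""
    best = float("inf")
    for _ in range(n_init):
        centers = rng.sample(points, k)
        for _ in range(max_iter):
            clusters = [[] for _ in range(k)]
            for p in points:
                nearest = min(range(k), key=lambda c: sq_dist(p, centers[c]))
                clusters[nearest].append(p)
            updated = [
                tuple(sum(x) / len(c) for x in zip(*c)) if c else centers[j]
                for j, c in enumerate(clusters)
            ]
            if updated == centers:
                break
            centers = updated
        sse = sum(min(sq_dist(p, c) for c in centers) for p in points)
        best = min(best, sse)
    return best


def gap_statistic(points, k, n_refs=20, seed=0):
    """Gap(K) = mean of log D_K over uniform reference samples - log D_K."""
    rng = random.Random(seed)
    dims = list(zip(*points))
    lows = [min(d) for d in dims]
    highs = [max(d) for d in dims]
    log_dk = math.log(kmeans_sse(points, k, rng))
    ref_logs = []
    for _ in range(n_refs):
        # One reference sample: as many uniform points as original samples.
        ref = [
            tuple(rng.uniform(lo, hi) for lo, hi in zip(lows, highs))
            for _ in points
        ]
        ref_logs.append(math.log(kmeans_sse(ref, k, rng)))
    return sum(ref_logs) / n_refs - log_dk


# Three toy groups of points, for illustration only.
points = [
    (1, 1), (1, 2), (2, 1),
    (8, 8), (8, 9), (9, 8),
    (1, 8), (1, 9), (2, 8),
]
gaps = {k: gap_statistic(points, k) for k in range(1, 5)}
best_k = max(gaps, key=gaps.get)
```

Here `best_k` is the K with the largest gap value, following the selection rule stated above; published variants of the method also use the spread of the reference values when choosing K, which this sketch omits.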
(3) Use the Kernel Function. K-means based on Euclidean distance assumes that the clusters have the same prior probability and are spherically distributed; however, such a distribution is not typical in practice. To handle nonconvex data distributions, we can employ a kernel function. The resulting algorithm is known as kernel K-means, which is a type of kernel clustering method. The kernel clustering method uses a nonlinear mapping to map data points from the input space to a high-dimensional feature space and then clusters them in the new feature space. The nonlinear mapping increases the probability that the data points are linearly separable. Therefore, when the conventional clustering process fails, the kernel function can be used to obtain more accurate clustering results.
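A minimal sketch of kernel K-means is given below, assuming a Gaussian (RBF) kernel and random initial assignments; the paper does not state which kernel the system uses. The feature-space distance from sample $i$ to cluster $C$ is computed from kernel values only, as $K_{ii} - \frac{2}{|C|}\sum_{j \in C} K_{ij} + \frac{1}{|C|^2}\sum_{j,l \in C} K_{jl}$, so the nonlinear mapping never has to be evaluated explicitly. The names, the `gamma` value, and the toy data are illustrative.

```python
import math
import random


def rbf_kernel(p, q, gamma):
    """Gaussian kernel exp(-gamma * ||p - q||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(p, q)))


def kernel_kmeans(points, k, gamma=1.0, max_iter=100, seed=0):
    """Cluster points in the kernel-induced feature space; return labels."""
    n = len(points)
    K = [[rbf_kernel(p, q, gamma) for q in points] for p in points]
    rng = random.Random(seed)
    labels = [rng.randrange(k) for _ in range(n)]  # random initial assignment
    for _ in range(max_iter):
        clusters = [[i for i in range(n) if labels[i] == c] for c in range(k)]
        # Mean pairwise kernel value inside each cluster (third term above).
        within = [
            sum(K[j][l] for j in m for l in m) / len(m) ** 2 if m else 0.0
            for m in clusters
        ]
        new_labels = []
        for i in range(n):
            best, best_d = labels[i], float("inf")
            for c, members in enumerate(clusters):
                if not members:  # skip empty clusters
                    continue
                cross = sum(K[i][j] for j in members) / len(members)
                d = K[i][i] - 2.0 * cross + within[c]
                if d < best_d:
                    best, best_d = c, d
            new_labels.append(best)
        if new_labels == labels:
            break
        labels = new_labels
    return labels


# A small blob surrounded by a ring: a nonconvex layout, for illustration.
blob = [(0.0, 0.0), (0.2, 0.1), (-0.1, 0.2), (0.1, -0.2)]
ring = [
    (3 * math.cos(t * math.pi / 4), 3 * math.sin(t * math.pi / 4))
    for t in range(8)
]
points = blob + ring
labels = kernel_kmeans(points, k=2, gamma=0.5)
```

As with ordinary K-means, the result depends on the initialization and on the kernel parameter, so several restarts and a search over `gamma` would normally be needed.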
3.2.2. C4.5 Algorithm
C4.5 is a family of classification algorithms used in machine learning and data mining. Its purpose is supervised learning: given a data set in which each tuple is described by a set of attribute values and belongs to exactly one of a set of mutually exclusive classes, C4.5 learns a mapping from attribute values to classes and uses this mapping to classify new items whose class is unknown.
J. Ross Quinlan proposed the C4.5 algorithm, which is derived from ID3. The ID3 algorithm is used to construct a decision tree. Once the decision tree has been constructed, a tuple whose class label is unknown is classified by following the path from the root node to a leaf node, and that leaf node holds the prediction for the tuple. An advantage of decision trees is that they require no domain knowledge or parameter setting, which makes them suitable for exploratory knowledge discovery.
Typically, a decision tree consists of a root node, several internal nodes, and several leaf nodes. Each leaf node corresponds to a decision result, and every other node corresponds to an attribute test. Based on the result of the attribute test, the sample set of each node is partitioned among its child nodes. The root node contains all samples, and the path from the root node to each leaf node corresponds to a sequence of tests. The objective of decision tree learning is to generate a decision tree that generalizes well. Its primary strategy is the straightforward “divide and conquer” tactic.
How to select the optimal partition attribute is the key to decision tree learning. In general, we hope that, as the division proceeds, the samples contained in each branch node belong to the same category as far as possible; in other words, the “purity” of the nodes keeps rising.
(1) Information Gain. Information entropy is the most commonly used index for measuring the purity of a sample set. Suppose that the proportion of samples of class $k$ in the current sample set $D$ is $p_k$ ($k = 1, 2, \ldots, |\mathcal{Y}|$); then the information entropy of $D$ is given by equation (2):

$$\mathrm{Ent}(D) = -\sum_{k=1}^{|\mathcal{Y}|} p_k \log_2 p_k \tag{2}$$

The smaller the value of $\mathrm{Ent}(D)$, the higher the purity of $D$.
Assume that the discrete attribute $a$ has $V$ possible values $\{a^1, a^2, \ldots, a^V\}$. If $a$ is used to divide sample set $D$, $V$ branch nodes are generated, and the $v$-th branch node contains all samples in $D$ whose value on attribute $a$ is $a^v$, denoted $D^v$. We calculate the information entropy of $D^v$ according to equation (2) and, because different branch nodes contain different numbers of samples, give each branch node the weight $|D^v| / |D|$, so that branch nodes with more samples have more influence. The “information gain” obtained by dividing $D$ with attribute $a$ can then be calculated by equation (3):

$$\mathrm{Gain}(D, a) = \mathrm{Ent}(D) - \sum_{v=1}^{V} \frac{|D^v|}{|D|}\, \mathrm{Ent}(D^v) \tag{3}$$

In general, the larger the information gain, the greater the “purity enhancement” obtained by dividing with attribute $a$. As a result, we may use the information gain to choose the decision tree’s partition attributes.
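Information entropy and information gain can be computed directly from class counts, as in the sketch below. Samples are represented as dictionaries of discrete attribute values; this representation, the attribute names, and the toy data are assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Information entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Entropy of the full set minus the weighted entropy after splitting on attr."""
    n = len(rows)
    subsets = {}
    for row, y in zip(rows, labels):
        subsets.setdefault(row[attr], []).append(y)
    remainder = sum(len(s) / n * entropy(s) for s in subsets.values())
    return entropy(labels) - remainder


# Hypothetical samples: "wind" separates the classes, "sky" does not.
rows = [
    {"wind": "weak", "sky": "clear"},
    {"wind": "weak", "sky": "clear"},
    {"wind": "strong", "sky": "clear"},
    {"wind": "strong", "sky": "clear"},
]
labels = ["poor", "poor", "good", "good"]
gain_wind = information_gain(rows, labels, "wind")  # 1.0 bit
gain_sky = information_gain(rows, labels, "sky")    # 0.0 bits
```

Splitting on `wind` removes all uncertainty about the class, so its gain equals the entropy of the full set, while the constant attribute `sky` gains nothing.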
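(2) Gain Rate. In practice, the information gain criterion prefers attributes with a large number of values. For example, if a serial number is used as the partition attribute, every sample falls into its own branch and the information gain is very high, but such a partition is meaningless. To mitigate the negative impact of this preference, the well-known C4.5 algorithm selects the best partition attribute using the gain ratio rather than the information gain. Equations (4) and (5) define the gain rate:

$$\mathrm{Gain\_ratio}(D, a) = \frac{\mathrm{Gain}(D, a)}{\mathrm{IV}(a)} \tag{4}$$

$$\mathrm{IV}(a) = -\sum_{v=1}^{V} \frac{|D^v|}{|D|} \log_2 \frac{|D^v|}{|D|} \tag{5}$$

where $\mathrm{Gain}(D, a)$ is the information gain of attribute $a$ on sample set $D$, $V$ is the number of possible values of $a$, and $D^v$ is the subset of $D$ that takes the $v$-th value of $a$.
Because the gain rate criterion in turn prefers attributes with a small number of values, the C4.5 algorithm does not simply pick the attribute with the highest gain rate but employs a heuristic: first, find the attributes whose information gain is higher than the average level among the candidate partition attributes, and then choose from them the attribute with the highest gain rate.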
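The gain rate and the two-stage selection heuristic can be sketched as follows. This illustrates the selection rule only and is not an implementation of C4.5 as a whole; the dictionary-based sample representation, the treatment of attributes with zero intrinsic value, and the use of "at least average" rather than "strictly above average" (so that the candidate set is never empty) are assumptions of the sketch.

```python
import math
from collections import Counter


def entropy(values):
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def information_gain(rows, labels, attr):
    n = len(rows)
    subsets = {}
    for row, y in zip(rows, labels):
        subsets.setdefault(row[attr], []).append(y)
    return entropy(labels) - sum(len(s) / n * entropy(s) for s in subsets.values())


def gain_ratio(rows, labels, attr):
    """Information gain divided by the intrinsic value IV(attr)."""
    iv = entropy([row[attr] for row in rows])  # entropy of the attribute values
    if iv == 0:  # attribute has a single value: no useful split
        return 0.0
    return information_gain(rows, labels, attr) / iv


def select_attribute(rows, labels, attrs):
    """Among attributes with at least average gain, pick the highest gain ratio."""
    gains = {a: information_gain(rows, labels, a) for a in attrs}
    average = sum(gains.values()) / len(gains)
    candidates = [a for a in attrs if gains[a] >= average - 1e-12]
    return max(candidates, key=lambda a: gain_ratio(rows, labels, a))


# Hypothetical samples: "id" is a serial number, unique for every sample.
rows = [
    {"id": 1, "wind": "weak", "sky": "clear"},
    {"id": 2, "wind": "weak", "sky": "clear"},
    {"id": 3, "wind": "strong", "sky": "clear"},
    {"id": 4, "wind": "strong", "sky": "clear"},
]
labels = ["poor", "poor", "good", "good"]
chosen = select_attribute(rows, labels, ["id", "wind", "sky"])
```

Both `id` and `wind` have an information gain of 1 bit here, but the intrinsic value of `id` is 2 bits against 1 bit for `wind`, so the gain ratio favors `wind`.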
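The CART decision tree uses the “Gini index” to select partition attributes. The purity of dataset $D$ can be measured by the Gini value in equation (6), where $p_k$ is the proportion of samples of class $k$ in $D$:

$$\mathrm{Gini}(D) = \sum_{k=1}^{|\mathcal{Y}|} \sum_{k' \neq k} p_k p_{k'} = 1 - \sum_{k=1}^{|\mathcal{Y}|} p_k^2 \tag{6}$$

Intuitively, $\mathrm{Gini}(D)$ reflects the probability that two samples randomly selected from dataset $D$ have inconsistent category marks. Therefore, the smaller the value of $\mathrm{Gini}(D)$, the higher the purity of dataset $D$. The Gini index of attribute $a$, which divides $D$ into the subsets $D^1, \ldots, D^V$, is defined in equation (7):

$$\mathrm{Gini\_index}(D, a) = \sum_{v=1}^{V} \frac{|D^v|}{|D|}\, \mathrm{Gini}(D^v) \tag{7}$$

As a result, we choose the attribute that minimizes the Gini index after partition as the best partition attribute in the candidate attribute set $A$, that is, $a_* = \arg\min_{a \in A} \mathrm{Gini\_index}(D, a)$.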
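The Gini value, the Gini index of an attribute, and the selection of the minimizing attribute can be sketched as follows, again assuming samples stored as dictionaries of discrete attribute values. The names and toy data are illustrative.

```python
from collections import Counter


def gini(labels):
    """Gini value: one minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def gini_index(rows, labels, attr):
    """Weighted Gini value of the subsets produced by splitting on attr."""
    n = len(rows)
    subsets = {}
    for row, y in zip(rows, labels):
        subsets.setdefault(row[attr], []).append(y)
    return sum(len(s) / n * gini(s) for s in subsets.values())


def best_attribute(rows, labels, attrs):
    """Attribute with the smallest Gini index after partition."""
    return min(attrs, key=lambda a: gini_index(rows, labels, a))


# Hypothetical samples: "wind" separates the classes, "sky" does not.
rows = [
    {"wind": "weak", "sky": "clear"},
    {"wind": "weak", "sky": "clear"},
    {"wind": "strong", "sky": "clear"},
    {"wind": "strong", "sky": "clear"},
]
labels = ["poor", "poor", "good", "good"]
chosen = best_attribute(rows, labels, ["sky", "wind"])
```

The full set has a Gini value of 0.5; splitting on `wind` yields two pure subsets with a Gini index of 0, so `wind` is selected.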
3.2.3. Verification and Evaluation of Data Mining Model
After the classification model is obtained, test data should be used to judge its correctness. The data used for the test are extracted from the database and are independent of the training sample. This paper uses the constructed data classification model to carry out the test. To ensure the accuracy of the test, k-fold cross-validation is used: the samples are divided into k equal groups, each group in turn is used for testing, and the number of correctly classified samples summed over all groups is divided by the total number of samples to obtain the accuracy of the C4.5 model.
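The k-fold procedure can be sketched as follows. To keep the example self-contained, a trivial one-attribute lookup classifier stands in for the C4.5 model; the stand-in, the function names, and the toy data are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds whose sizes differ by at most one."""
    base, extra = divmod(n, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validated_accuracy(X, y, k, fit, predict):
    """Correct predictions summed over all held-out folds / total samples."""
    correct = 0
    for fold in k_fold_indices(len(X), k):
        held_out = set(fold)
        train_X = [x for i, x in enumerate(X) if i not in held_out]
        train_y = [t for i, t in enumerate(y) if i not in held_out]
        model = fit(train_X, train_y)
        correct += sum(predict(model, X[i]) == y[i] for i in fold)
    return correct / len(X)


def fit_lookup(X, y):
    """Stand-in classifier (not C4.5): most frequent class per attribute value."""
    by_value = defaultdict(Counter)
    for x, t in zip(X, y):
        by_value[x][t] += 1
    default = Counter(y).most_common(1)[0][0]
    table = {v: c.most_common(1)[0][0] for v, c in by_value.items()}
    return table, default


def predict_lookup(model, x):
    table, default = model
    return table.get(x, default)


# Hypothetical single-attribute samples, for illustration only.
X = ["high", "low", "high", "low", "high", "low"]
y = ["poor", "good", "poor", "good", "poor", "good"]
accuracy = cross_validated_accuracy(X, y, 3, fit_lookup, predict_lookup)
```

The folds here are contiguous; in practice the samples are usually shuffled before splitting so that each fold reflects the overall class distribution.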
3.3. System Function Design
As shown in Figure 6, the system is divided into five modules: the user management module, the authority management module, the query module, the statistical management and visualization module, and the data mining module.
User management module: after a direct login by the system administrator, an ordinary user can be added to the user management module. After a user has been added to the system, he or she logs in using the default password 1234 or the password set by the administrator. After logging into the system, the user modifies the information via the personal information submodule. The first time a user logs in, they are prompted to change their password. After completing the operation, the user can log out.
Authority management module: this module is primarily subdivided into authority management for logging in to the system and authority management for viewing chart results. The system administrator is responsible for granting permissions. When the administrator adds a standard user to the comprehensive evaluation system, the user must be authorized to log in; otherwise, the user cannot log in. The administrator also sets the permissions to view statistics, data mining results, and case queries. If the administrator does not grant a permission, the corresponding function is hidden from the homepage, and the chart is inaccessible.
Query module: the system administrator can manage queries by configuring query conditions. Standard users are able to query cases based on the query conditions established by the administrator. For instance, the administrator may assign a task number to a case, and the user may then query the case by entering the task number.
Statistical management and visualization module: the system administrator can manage and view statistics, whereas standard users can only view statistics. This module consists of three statistical submodules: workload statistics, postevaluation statistics, and problem type statistics. Users of the system can click various buttons to access the corresponding statistics.
Data mining module: this module consists of three submodules: data preparation, algorithm analysis, and viewing results. The administrator performs the data preparation. After importing, the administrator selects the data source, which is stored in system memory. The system preprocesses the basic data, runs the selected algorithm, obtains the results, and stores them. The saved results are displayed when the view results button on the front end is clicked.
The data mining module is a further analysis of data based on the statistical module, which relies on the Weka software’s data mining API. This module analyzes the statistical data based on the statistical outcomes. The following describes the overall procedure for data mining analysis and result viewing: The administrator chooses the export file to upload. After a successful upload, the backend will convert the .xls file to .csv format, convert it to Weka special data in .arff format, obtain the data set, and store it in memory. The administrator clicks the view button on the data mining result view interface, sends the command to begin data mining to the background, uses the data source in memory to construct the model, and then displays the analysis results on the front-end.
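The CSV-to-ARFF step of this procedure can be illustrated with the sketch below, which writes a minimal ARFF document (`@relation`, `@attribute`, and `@data` sections) from CSV text. It is not the system's code and does not use the Weka API; the relation name, the column names, and the rule for inferring numeric versus nominal attributes are assumptions, and the earlier .xls-to-.csv conversion is left out.

```python
import csv
import io


def _is_number(text):
    try:
        float(text)
        return True
    except ValueError:
        return False


def csv_to_arff(csv_text, relation):
    """Convert CSV text with a header row into a minimal ARFF document.

    Simplifications: names and values are assumed to contain no spaces or
    commas, and missing values are not handled.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    lines = [f"@relation {relation}", ""]
    for col, name in enumerate(header):
        values = [r[col] for r in data]
        if all(_is_number(v) for v in values):
            lines.append(f"@attribute {name} numeric")
        else:
            nominal = ",".join(sorted(set(values)))
            lines.append(f"@attribute {name} {{{nominal}}}")
    lines += ["", "@data"]
    lines += [",".join(r) for r in data]
    return "\n".join(lines) + "\n"


# Hypothetical exported statistics, for illustration only.
csv_text = "district,pm10,grade\nA,41.0,good\nB,95.5,poor\n"
arff_text = csv_to_arff(csv_text, "air_quality")
print(arff_text)
```

A production conversion would more likely rely on Weka's own file converters, which also handle quoting and missing values.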
In the 1990s, data mining technology received considerable attention. It conducts systematic theoretical research on the information held in databases and provides a set of corresponding algorithms. Traditional data mining technology examines the extraction of information from a database in a general sense, but different fields have different data characteristics and information needs. The primary objective of this research is to combine the theoretical results of data mining with the actual characteristics of environmental data in order to develop novel data processing methods and improved decision support for environmental data management.
This paper analyzes the research and application status of data mining, the characteristics of environmental information, and the development and technical characteristics of environmental information management, and conducts specific research on the determination of mining objects and the selection of mining algorithms. Additionally, the use of spatial data mining technologies in environmental data is briefly discussed.
(1) It is feasible to implement data mining technologies in environmental data management. According to the material we have collected, data mining techniques are rarely employed in environmental data management. Based on the outcomes of data mining technology application in other sectors, this study proposes a set of more comprehensive specialized schemes for the application of data mining technology in environmental information management and analyzes them in conjunction with actual data in order to combine data mining technology with environmental information management practice.
(2) The implementation of data mining technology compensates for the deficiencies of conventional environmental data processing technology. Conventional environmental data processing technology is unable to reveal the valuable information hidden behind large amounts of data, whereas data mining technology addresses this issue effectively. By taking the characteristics of environmental data into account, it can mine all types of knowledge to the fullest extent in order to provide an essential foundation for decision support.
(3) The information packaging method and star model technology, two new technologies in current data warehouse system development, are applied. Based on them, the conceptual model, logical model, and physical model of the environmental data warehouse are designed in this paper.
(4) The software architecture of the urban environmental management and control evaluation system is designed using the concept of a three-tier architecture, and the design of each tier is briefly described. The system is then divided into five functional modules based on function, and the design of the two core functional modules, statistical management and visualization and data mining, is described in detail with the aid of flowcharts. Lastly, the primary database tables are presented in table form, in the order of the functional modules, and the key fields are explained.
(5) Following the completion of the system design, the system is implemented and tested. The development environment of the system is described first, followed by the detailed implementation of each functional module, along with the text description, key code, and implementation effect diagram. The system is then subjected to both functional and nonfunctional testing, with the latter including performance, compatibility, and stability, among others.
The data used to support the findings of this study are available from the author upon request.
Conflicts of Interest
The author declares that he has no conflicts of interest.
P. Alexej and B. Stefan, “Identifying phosphorus hotspots: a spatial analysis of the phosphorus balance as a result of manure application,” Journal of Environmental Management, vol. 214, pp. 137–148, 2018.
L. Y. Dai, L. Q. Wang, L. F. Li, T. Liang, and Y. Zhang, “Multivariate geostatistical analysis and source identification of heavy metals in the sediment of Poyang Lake in China,” The Science of the Total Environment, vol. 621, pp. 1433–1444, 2018.
J. J. Fan, C. Cai, H. F. Chi, J. B. Reid, and F. Coulon, “Remediation of Cadmium and Lead polluted soil using thiol-modified biochar,” Journal of Hazardous Materials, vol. 388, pp. 391–402, 2020.
B. F. Hu, X. L. Jia, J. Hu, D. Xu, F. Xia, and Y. Li, “Assessment of heavy metal pollution and health risks in the soil-plant-human system in the yangtze river delta, China,” International Journal of Environmental Research and Public Health, vol. 14, no. 9, 2017.
B. F. Hu, J. Y. Wang, B. Jin, Y. Li, and Z. Shi, “Assessment of the potential health risks of heavy metals in soils in a coastal industrial region of the Yangtze River Delta,” Environmental Science and Pollution Research, vol. 24, no. 24, Article ID 19816, 2017.
N. Mikulic, V. Orescanin, V. Legovic, and R. Zugaj, “Estimation of heavy metals (Cu, Zn, Pb) input into Punat Bay,” Environmental Geology, vol. 46, pp. 62–70, 2004.
G. Nebula, H. Origin, and M. Diamond, “Assessment of Lead, Cadmium, and Zinc contamination of roadside soils surface films, and vegetables in Kampala City, Uganda,” Environmental Research, vol. 101, no. 1, pp. 42–52, 2006.
H. Pelkey and G. Dogan, “Application of positive matrix factorization for the source apportionment of heavy metals in sediments: a comparison with a previous factor analysis study,” Microchemistry Journal, vol. 106, pp. 233–237, 2013.
M. Ueno, “Data mining and text mining technologies for collaborative learning in an ILMS “samurai”,” in Proceedings of the IEEE International Conference on Advanced Learning Technologies (ICALT04), IEEE Computer Society, pp. 1052-1053, Joensuu, Finland, August 2004.
A. Netz, S. Chaudhuri, U. M. Fayyad, and J. Bernhardt, “Integrating data mining with SQL databases: OLE DB for data mining,” in Proceedings of the 17th International Conference on Data Engineering, IEEE Computer Society, pp. 379–387, Heidelberg, Germany, April 2001.
J. Hsu, “Critical and future trends in data mining: a review of key data mining technologies/applications,” Data mining: Opportunities and Challenges, pp. 437–452, 2003.