Abstract

To improve Python-based data analysis and attribute information extraction, a method built on an intelligent decision system is proposed. The method works as follows: a big data mining model for optimal decision-making is created, an intelligent data fusion method is used to fuse data features, management informatization evaluation data are reconstructed, and multidimensional management information parameters are extracted. The frequent-item feature decomposition method is applied for multidimensional information decomposition and feature optimization, features are classified according to their differences, and management and decision-making are optimized. Based on the Python language, combined with rich and powerful libraries such as regular expressions, urllib2, and Beautiful Soup, this paper discusses methods for modular web data collection, HTML parsing, and capturing link data. The experimental results show that the decision support times of the three compared methods diverge as the number of iterations changes. The decision support time of the proposed method is always less than 2 s, while the other two methods take longer; compared with them, the proposed method shortens decision support time by about 1.7 s and 3.1 s, respectively. This is because the method classifies the data attribute gap in decision support, which saves decision support time. This verifies that the method can provide decision support quickly and with a degree of reliability, and demonstrates that an intelligent decision system can effectively improve Python data analysis and attribute information extraction.

1. Introduction

The optimized decision support system architecture based on intelligent data analysis has the advantages of big data mining and latent-law discovery, fast calculation and temporary storage of multisource heterogeneous data streams, and management information scheduling. Building on the efficiency of intelligent data analysis for optimal decision-making, a whole-chain operation and maintenance mechanism covering decision support data acquisition, integration, pooling, and perceptual decision-making is constructed, and the overall framework of the optimized decision support system is designed. To optimize decision-making and information management and improve efficiency, the overall framework of the decision support system is constructed as shown in Figure 1 [1]. In Figure 1, the service quality optimization decision support system, i.e., the optimization decision support system based on intelligent data analysis designed in this paper, includes four modules: big data mining, the fuzzy decision model, management and control information scheduling, and optimization decision data analysis. The big data mining module performs big data statistics and obtains the main data characteristics [2]. The fuzzy decision model module uses a fuzzy decision-making method to decompose the attributes of the data collected by the big data mining module. The management and control information scheduling module reconstructs the intelligent data of the management informatization evaluation information and obtains the multidimensional management information parameters. The optimization decision data analysis module classifies features according to their differences and optimizes management and decision-making [3]. The system functions are realized through the model construction of these four modules.

Python is an interpreted computer language that is popular for its concise syntax, strong community support, and many mature libraries applied across fields. It runs on the Python interpreter and currently supports all mainstream operating systems, including but not limited to UNIX, Microsoft Windows, and Linux. In terms of syntax style, it supports procedural programming like C, as well as object-oriented programming like C++ and Java [4]. Its map and reduce functions also support functional programming in the style of Lisp. In terms of community support, Python is financially backed by the Python Software Foundation, and a growing number of volunteer programmers have joined its development community. At present, there are two stable lines of Python interpreters, Python 2 and Python 3. Among many languages, Python's community foundation stands out, and its scientific computing libraries have been developing rapidly since around 2000 [5]. In terms of library support, thanks to the simplicity of its syntax, Python has a considerable number of libraries in interdisciplinary fields beyond core computer science, such as scientific computing, geographic mapping, and the construction of today's popular deep neural networks. These characteristics make Python an excellent choice for the small and medium-sized, cross-disciplinary projects with specific functions described in this paper.

2. Literature Review

Liu and others first proposed the concept of the decision formal background, which expanded the research direction of rule extraction. For general decision information systems [6], Wang and others proposed concept generation from the decision formal background and then derived an algorithm without redundant rules; the algorithm reduces complexity to a certain extent, but in some cases redundant attributes still remain in the obtained rules [7]. Fedyaev and others granulated the attributes and objects in the formal background, reducing the scale of the formal background and the complexity of the algorithm, but the granulation costs the algorithm some accuracy in knowledge reduction [8]. Li and others studied if-then rules based on formal concept analysis and proposed a nonredundant rule acquisition algorithm, which obtains rules from the relationship between the conditional concept lattice and the decision concept lattice but has low applicability to rule extraction on large-scale data sets. By introducing the concept of the decision premise [9], Shkapov and others proposed the decision implication canonical basis; although it effectively inhibits the generation of redundant rules, generating the canonical basis has exponential complexity [10]. Abed-Elmdoust and others proposed a decision implication canonical basis generation algorithm based on the real premise, which effectively reduces the complexity of the algorithm [11]. Reshadat and others introduced formal concept analysis into granular computing theory, proposed the concept of the fuzzy information granule based on the information granule described by a formal concept, and discussed in detail the relationships among necessary, sufficient, and necessary-and-sufficient information granules [12]. Xiao and others hold that the earliest solution was the rule-based expert system: knowledge is first obtained from domain experts and then modeled, and an effective reasoning engine is embedded to apply the domain knowledge under different decision-making conditions; such knowledge-based systems (KBS) depend on valuable, concise, and high-quality knowledge from domain experts [13]. Liu and others used a CNN (convolutional neural network) to learn the sentence features of the task. The RNN and CNN architectures are the two most effective ways to automatically learn sentence features in deep learning, and this self-learning approach has made great strides in other areas of artificial intelligence, which is very important for natural language work: it removes the need to rely on traditional NLP (natural language processing) tools such as grammar analyzers, and it attracts researchers because sentence features can be learned automatically [14].

Based on the current research, this paper proposes a method built on an intelligent decision system. The specific content of the method is as follows: an optimization decision-making big data mining model is built; an intelligent data fusion method is adopted to fuse data features; intelligent management informatization evaluation data are reconstructed; multidimensional management information parameters are extracted; and the frequent-item feature decomposition method is used for multidimensional information decomposition and feature optimization extraction. According to the differences in the classified attributes, management and decision-making are optimized. Based on Python, combined with regular expressions, urllib2, Beautiful Soup, and other rich and powerful libraries, we explore methods for modular web data collection, HTML parsing, and fetching linked data.

3. Information Extraction Based on Intelligent Decision System

3.1. Big Data Mining Module

In order to implement data mining and information management in an intelligent decision system, a big data statistical analysis model must first be created, a fuzzy correlation constraint method introduced to support data optimization decisions, and a fuzzy data parameter model developed to constrain data relevance, with association rule mining methods introduced as well [15, 16]. Big data mining and database design are combined with the level of information extraction in the intelligent decision-making process, so that the intelligent decision system integrates functions and schedules adaptively. The information extraction system design is combined with fuzzy information scheduling technology to optimize the decision support process and obtain the feature quantity of optimal decision support, which represents the cross-correlation characteristic quantity of the database. Using the statistical analysis method, the fuzziness function of optimization decision support, the constraint index parameter set VI of optimization decision support, and the sample set are obtained as

In the above equation, the two quantities denote the distribution set of optimal decision support scheduling and the joint correlation feature of optimal decision support, respectively. The spatial resolution function of optimal decision support scheduling is

Using the method of correlation feature resolution detection, the ambiguity function of the resource base to be supported by the decision is obtained as

In equation (3), the two quantities denote the quantitative set of the optimization decision and the similarity characteristic quantity, respectively.

3.2. Decision Model Module Based on Data Analysis

The fuzzy decision model of information base management is established. On this basis, the association rule distribution sets for optimizing decision support are established, and the association distribution relationship is obtained as follows:

These expressions are substituted into the above equations to establish the decision function of optimization decision support. The information base model of information management and the statistical analysis model of the optimization decision are obtained by the multiple regression analysis method as follows:

For the big data analysis model of management, the association feature quantity of the optimization decision is extracted, and the fuzzy association feature detection method is used for distributed information detection. The attribute set of information base management is obtained as follows:

An information fusion model is built to optimize decision support, and the similarity feature of the decision is calculated [17]. Denoting it as C(Y), the template function of optimal decision support is obtained as follows:

Combined with the construction of the difference function of optimization decision support, the evaluated fitness function can be obtained; it involves the ambiguity function of the management decision and the distribution feature set, and the time information code is:

The distributed information coding technology is used to obtain the distributed feature quantity, the intelligent data fusion method is used for data feature fusion, and the intelligent optimization model for the optimization decision is established. The optimization iteration formula involves the dimension of information distribution, the statistical feature of information distribution, and two frame difference functions. According to the constructed decision model, optimization decision support and optimization control are carried out.

3.3. Management Information Dispatching Module

Starting from the extracted management informatization assessment data, the assessment data are intelligently reconstructed and a quantitative analysis model for optimal decision-making is developed [18]. The statistical function of decision optimization is

The above formula involves the constraint index parameter set of management and a standard normal distribution function. Using the dynamic optimization method, an optimization decision function composed of characteristic decision variables and characteristic variables is constructed. The mathematical model of the optimization decision is as follows:

Through the similarity fusion method, optimization decision-making and information fusion are carried out, and the optimization decision model is obtained as follows:

The membership degree of the optimization decision is recorded, the multidimensional information parameters of management are extracted, and the frequent-item feature decomposition method is used for multidimensional decomposition and optimized feature extraction of the information.

3.4. Tool Introduction

For data analysts, it is important to be aware of the many libraries available for the Python language. Data analysis is generally divided into the following stages: acquisition, storage, reading, computation, description, and analysis. Understanding the main libraries is therefore an important part of data analysis [19, 20].

3.4.1. NumPy

NumPy is the basic module for numerical computing in the Python language and can also handle large matrices. Its core array structure stores homogeneously typed data in contiguous memory, so it can consolidate many kinds of numerical data, and its performance is much higher than that of Python's nested list structures. Therefore, when Python is used for data analysis, most scientific computing modules build on the NumPy library.
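
As a minimal, illustrative sketch (not taken from the paper), the following compares a pure-Python nested-list sum with the equivalent vectorized NumPy call:

```python
import numpy as np

# Illustrative data: 100,000 rows of 10 floats each.
nested = [[float(i)] * 10 for i in range(100_000)]
arr = np.array(nested)  # contiguous, homogeneously typed storage

list_total = sum(sum(row) for row in nested)  # pure-Python loops
array_total = arr.sum()                       # single vectorized C loop

# Both paths compute the same total; the NumPy call is far faster.
assert list_total == array_total
```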

3.4.2. Pandas

Pandas is the basic module in the Python language for reading, storing, and structuring data. Owing to its flexibility, it can process Excel data more efficiently than manual work in Excel itself: reading spreadsheets, selecting specific columns, cleaning spreadsheet data, converting data types, and more.
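
A minimal sketch of the Excel workflow just described, assuming a hypothetical file sales.xlsx with region and revenue columns (reading .xlsx files also requires the openpyxl package):

```python
import pandas as pd

df = pd.read_excel("sales.xlsx", sheet_name="2021")  # read the spreadsheet

subset = df[["region", "revenue"]]                   # select specific columns
subset = subset.astype({"revenue": "float64"})       # convert data types
by_region = subset.groupby("region")["revenue"].sum()

by_region.to_excel("revenue_by_region.xlsx")         # write the result back out
```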

3.4.3. Matplotlib

Matplotlib is the plotting module in the Python language. Data visualization is an essential part of data analysis: it makes data easier to observe and helps users, learners, and analysts better understand the potential value of the data. Common chart types include line charts, bar charts, pie charts, and scatter charts.
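
For illustration, the following sketch draws the four chart types named above from made-up data:

```python
import matplotlib.pyplot as plt

# Made-up data for illustration.
labels = ["A", "B", "C", "D"]
values = [23, 17, 35, 29]

fig, axes = plt.subplots(1, 4, figsize=(14, 3))
axes[0].plot(values, marker="o")             # line chart
axes[1].bar(labels, values)                  # bar chart
axes[2].pie(values, labels=labels)           # pie chart
axes[3].scatter(range(len(values)), values)  # scatter chart
for ax, title in zip(axes, ["Line", "Bar", "Pie", "Scatter"]):
    ax.set_title(title)
fig.tight_layout()
fig.savefig("charts.png")                    # write the figure to disk
```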

3.4.4. Pyecharts

Pyecharts is a class library for creating ECharts diagrams. ECharts is an open-source JavaScript visualization library from Baidu that produces dynamic, interactive charts with polished visual effects, which data analysts can use for presentation.
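
A minimal sketch assuming the pyecharts v1+ chained API, with made-up data:

```python
from pyecharts import options as opts
from pyecharts.charts import Bar

bar = (
    Bar()
    .add_xaxis(["Q1", "Q2", "Q3", "Q4"])
    .add_yaxis("Revenue", [120, 132, 101, 134])
    .set_global_opts(title_opts=opts.TitleOpts(title="Quarterly revenue"))
)
bar.render("revenue.html")  # writes an interactive HTML chart
```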

4. Experimental Results and Analysis

Python has a simple syntax that is very close to natural language, and its more than 100,000 third-party libraries cover website development, operation and maintenance, data processing, and other fields, which greatly lowers the difficulty of development and makes it very suitable for liberal arts students to learn and use. The application of Python in corpus research mainly focuses on segmentation, text cleaning, part-of-speech tagging, and retrieval of English text based on the NLTK (Natural Language Toolkit) package. Web data acquisition technology based on Python is already very mature, and some scholars have demonstrated detailed Python teaching cases for humanities and social science majors. Therefore, this paper uses Python to obtain the information of 1000 words from the web pages of an online dictionary and to extract the phonetic symbols and written forms of the words as an example of rapid corpus acquisition and extraction. Requests and Beautiful Soup together can retrieve and save the specified information of a web page directly, which is the most efficient approach; however, each page is processed slowly, network exceptions may occur, and if other content is needed later the page must be retrieved again. The safest approach is therefore two separate, slower but less network-sensitive steps: use requests to fetch all the web page information and save it as a local file, and then use Beautiful Soup to extract the specified information.
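
The two-step approach can be sketched as follows; the dictionary URL pattern and the CSS class holding the phonetic symbol are hypothetical stand-ins, since the paper does not name the site:

```python
import requests
from bs4 import BeautifulSoup

WORDS = ["apple", "banana"]  # in the paper, 1000 headwords

# Step 1: fetch each page once with requests and save it locally.
for word in WORDS:
    resp = requests.get(f"https://dictionary.example.com/{word}", timeout=10)
    resp.raise_for_status()
    with open(f"{word}.html", "w", encoding="utf-8") as f:
        f.write(resp.text)

# Step 2: parse the saved files with Beautiful Soup; re-extraction
# needs no further network access.
for word in WORDS:
    with open(f"{word}.html", encoding="utf-8") as f:
        soup = BeautifulSoup(f.read(), "html.parser")
    tag = soup.find("span", class_="phonetic")  # hypothetical markup
    print(word, tag.get_text(strip=True) if tag else "not found")
```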

Using the Python language for data analysis involves the following basic process (a short sketch of steps (3) and (4) follows the list):

(1) Requirement confirmation: first clarify the data use requirements, such as financial data analysis or process flow analysis, so that data analysis methods suited to the characteristics of the target data can be chosen to deeply mine its features and potential value.

(2) Data acquisition: once the requirements are clear, collect the target data as comprehensively as possible, whether from local sources or by web crawler; Python programs can legally obtain relevant data from the network to meet big data acquisition needs.

(3) Data preprocessing: before formal analysis, the target data must be merged, cleaned, transformed, and standardized to meet the needs of subsequent modeling; this improves data quality and thus the efficiency of the analysis.

(4) Modeling and optimization: modeling is a key link in data analysis; the target data can be processed by building clustering models, association rules, intelligent algorithms, and so on, after which performance is evaluated and tuned to fit actual analysis conditions.

(5) Visualization: after the analysis is complete, the results are displayed and output in visual form so that users can readily use them [21].
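
A compact sketch of steps (3) and (4), using hypothetical file and column names and a k-means clustering model from scikit-learn as the example model:

```python
import pandas as pd
from sklearn.cluster import KMeans

df = pd.read_csv("finance.csv")                   # step (2): acquired data

df = df.dropna(subset=["cost", "revenue"])        # step (3): cleaning
features = df[["cost", "revenue"]].astype(float)  # step (3): transformation
standardized = (features - features.mean()) / features.std()  # standardization

model = KMeans(n_clusters=3, n_init=10, random_state=0)  # step (4): modeling
df["cluster"] = model.fit_predict(standardized)

print(df.groupby("cluster")[["cost", "revenue"]].mean())  # step (5): output
```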

Following the modular design idea, the whole system is divided into modules, with each functional class treated as a functional module. This makes the code easier to maintain on the one hand and increases its reusability on the other. With the help of Python's powerful class libraries, the functions of each module are realized: urllib2 handles HTTP communication and data interaction with the server, while regular expressions and Beautiful Soup handle text extraction and HTML parsing. The process is shown in Figure 2.
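
The module split can be sketched as below. The paper's code uses urllib2 under Python 2; this sketch uses the equivalent urllib.request module from Python 3, and the class and method names are illustrative, not the paper's own:

```python
import re
import urllib.request  # "urllib2" in Python 2
from bs4 import BeautifulSoup


class Fetcher:
    """HTTP communication and data interaction with the server."""

    def get(self, url: str) -> str:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.read().decode("utf-8", errors="replace")


class Parser:
    """Text extraction via regular expressions, HTML parsing via Beautiful Soup."""

    TITLE_RE = re.compile(r"<title>(.*?)</title>", re.S | re.I)

    def title(self, html: str) -> str:
        match = self.TITLE_RE.search(html)  # regex text extraction
        return match.group(1).strip() if match else ""

    def links(self, html: str) -> list:
        soup = BeautifulSoup(html, "html.parser")  # HTML parsing
        return [a["href"] for a in soup.find_all("a", href=True)]
```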

Based on Python, a connection is established, the page characteristics are analyzed, and a link path constructor is written to obtain the links, as shown in Figure 3.
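
A link path constructor of the kind described can be sketched as follows; the base URL and relative paths are made up for illustration:

```python
from urllib.parse import urljoin

def build_links(base_url, relative_paths):
    """Join each relative path onto the base URL, dropping duplicates."""
    seen, links = set(), []
    for path in relative_paths:
        full = urljoin(base_url, path)
        if full not in seen:
            seen.add(full)
            links.append(full)
    return links

print(build_links("https://dictionary.example.com/list?page=1",
                  ["/word/apple", "/word/banana", "/word/apple"]))
```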

Several data sets are tested below to verify the correctness and effectiveness of the proposed algorithm. Common data sets from the UCI repository are selected and discretized with the Rosetta software. Then the proposed algorithm (denoted as algorithm A), the rule extraction algorithm based on classification consistency rate (algorithm B), the rule extraction algorithm based on the granular matrix (algorithm C), and a further algorithm (algorithm D) are applied to the test data sets. The experimental process is as follows: 8 groups of UCI data sets (listed in Table 1) are selected, the 4 algorithms are used to obtain rules for each data set, and the number of reduction rules, the rule length, and the running time obtained by each algorithm are recorded. The comparison results are shown in Figures 4–6, and the running time comparisons are listed in Table 1. The algorithm in this paper extracts rules by extending the decision information system to the decision formal background; according to the proposition, a decision information system can be extended into a decision formal background. Therefore, the decision information system was used as the input of the algorithm, the simplest rule set was used as the output, and the running time of each algorithm was recorded. The experimental process covered the data processing, core data structure construction, rule extraction, and rule redundancy removal stages of each algorithm.

The correct recognition rate is the probability that the obtained rule set correctly recognizes the whole of each data set [22]. The specific process is as follows: 50% of each data set is randomly selected as the training sample, each algorithm is applied to obtain the rules of the training data and record its own rule set, and recognition is then carried out over the whole data set. As can be seen from Table 1, on the same data set, algorithm A has an obvious time advantage over algorithms B, C, and D in acquiring rules and reaches the goal in a shorter time. In addition, Figures 4–6 show that algorithm A produces fewer and shorter rules on most data sets while achieving a recognition rate similar to the other algorithms, which indicates that the rule set of algorithm A has stronger representation ability. The advantage of algorithm A is that it takes the formal background as the research object and defines the form vector; rules are then extracted from the form vector according to the knowledge granularity and a heuristic operator, which guarantees nonredundancy and minimizes the number of rules. From the point of view of the covering domain, it also guarantees the minimum rule length and thus a high recognition rate. At the same time, this procedure not only reduces redundant calculation but also greatly reduces the actual time cost of the algorithm and lowers its spatial complexity to a certain extent. The data given in Figure 7 are used as input for optimization decision support, the multidimensional information parameters of management are extracted, the frequent-item feature decomposition method is used for multidimensional information decomposition and feature optimization extraction, and the statistical feature extraction results are obtained, as shown in Figure 8.
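
As an illustration of this evaluation protocol only (the paper gives no code), the following harness sketches the 50% random split and whole-set recognition measurement; extract_rules and classify are hypothetical stand-ins for each algorithm's rule learner and matcher:

```python
import random

def recognition_rate(dataset, extract_rules, classify, seed=0):
    """Train on a random 50% sample, then measure recognition on the whole set.

    Each record is assumed to be a dict carrying its class under "label".
    """
    rng = random.Random(seed)
    train = rng.sample(dataset, len(dataset) // 2)  # 50% training sample
    rules = extract_rules(train)                    # learn the rule set
    correct = sum(classify(rules, record) == record["label"]
                  for record in dataset)
    return correct / len(dataset)                   # whole-set recognition rate
```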

To further validate the reliability of the proposed method, the decision support reliability of this method, the CPT-FDR method, and the PSO method was tested experimentally; the obtained reliability is shown in Figures 9–12.

According to the analysis of Figure 7, optimization decision support can be effectively realized by the proposed method, and the statistical feature distribution is highly dynamic, indicating strong adaptability of the quantitative evaluation. This is because the method reconstructs the intelligent data of the management informatization evaluation information, analyzes the quantification of optimization decision support, and realizes free scheduling of information.

According to Figures 9–12, the reliability of the three methods for optimization decisions varies with the number of iterations, and the reliability gap is large. The decision support reliability of the proposed method is higher than that of the CPT-FDR and PSO methods; compared with these two methods, it improves reliability by about 5.8% and 8%, respectively. This is because, before the optimization decision, the method uses intelligent data analysis to preprocess the relevant data and extract data features, which ensures data reliability and thus improves decision reliability. To further verify the feasibility of the method, the decision support times of the proposed method, the CPT-FDR method, and the PSO method were experimentally analyzed. The results are shown in Figures 13–15.

By analyzing Figures 13–15, it can be seen that the decision support times of the three methods diverge as the number of iterations changes. The decision support time of the proposed method is always less than 2 s, while the other two methods take longer; compared with them, the proposed method shortens decision support time by about 1.7 s and 3.1 s, respectively. This is because the method classifies the data attribute gap in decision support, which saves decision support time. This verifies that the proposed method can provide decision support quickly and with a degree of reliability.

5. Conclusion

By creating a big data mining model for optimal decision-making, the proposed method fuses data features, reconstructs management informatization evaluation data, extracts multidimensional management information parameters, and applies frequent-item feature decomposition to the multidimensional attributes. Features are classified according to their differences, and management and decision-making are optimized accordingly. Based on Python, combined with regular expressions and rich, powerful libraries such as urllib2 and Beautiful Soup, modular web data collection, HTML parsing, and link data retrieval are realized. When the method described in this paper is used for optimal decisions, optimal decision support is effectively implemented and the distribution of statistical characteristics is highly dynamic. The reliability of decision support reaches 99.3%, which is 5.8% and 8% higher than the traditional methods, and the decision support time remains below 2 seconds. The experimental results confirm that the method described in this paper is of great significance for optimizing management decision-making, improving the reliability of decisions, and supporting management informatization scheduling.

Data Availability

No data were used to support this study.

Conflicts of Interest

The author declares that there is no conflict of interest with any financial organizations regarding the material reported in this manuscript.