Abstract

Present-day enterprise accounting solutions have developed to the point where they help ensure the authenticity of accounting information and provide modules for billing, payroll, the general ledger, and more. However, problems remain, such as distortion of accounting information, incomplete selection of indicator variables, and limited, single-method identification approaches. Based on this, this study proceeds in two parts. The first introduces the concepts of decision trees and the support vector machine (SVM) in data mining, constructs accounting distortion information identification models based on them, and verifies the models' effectiveness through experiments. The second establishes a regression model of the relationship between enterprise strategy and accounting information quality to further explore the factors that affect the quality of enterprise accounting information. The results are as follows: (1) Using the SVM-based identification model, the classification and identification accuracy rates for the training set data, the overall data, and the test set data are 99.19%, 96.21%, and 94.8%, respectively. (2) Using the decision-tree-based identification model, the average identification rate for the sample data is 88.5%. (3) The regression coefficients of enterprise strategy on accounting information quality are −0.053 without controlling for industry and year and −0.054 with these controls; both are negative and significant at the 0.1 level. The purpose of this study is to use data mining to achieve high-quality identification of enterprise accounting information and to provide a reference for enterprises choosing or formulating development strategies.

1. Introduction

Accounting is a method of identifying, recording, analyzing, summarizing, and interpreting the financial information of an organization. The data generated as a result of a business entity’s transactions is called accounting information. Once this information is identified, it is classified into categories and recorded so that it can be used in different reports. Accounting information is very useful to the organization’s stakeholders because it is the medium through which the organization communicates with the internal and external world. This information is fed into accounting information systems for computer processing. Accounting activity is recorded and tracked by information technology systems and eventually takes the form of reports used both inside and outside the company.

Organizations around the world use enterprise accounting software to keep track of financial transactions and related data. Such software generally includes modules for accounts payable and receivable, the general ledger, billing, and payroll. It may be deployed on-premises or offered as a cloud-based service. By introducing automation and improving visibility and cross-department collaboration, it streamlines processes and increases productivity. It also supports top management in formulating enterprise strategy and monitoring business performance.

An important requirement in enterprise accounting is that accounting information be reliable and complete. This makes it possible to authenticate the information and to prepare reports that give an accurate picture of the organization’s financial standing, which in turn improves the organization’s credibility. Outside China, methods and technologies for identifying the authenticity of accounting information are mature and abundant, and numerous identification models have been developed. Typical ones in recent years are the regression model, the multivariate discriminant model, the neural network model, and the text mining model [1]. Among these, the logistic regression model and various neural network methods are the most widely studied and used [2].

The remainder of this study is organized as follows. Section 2 reviews related work. Section 3 presents the methodology for identifying and strategically managing enterprise accounting information using data mining technology, and Section 4 presents the proposed regression model. Section 5 discusses and analyzes the results, and Section 6 concludes and outlines future work.

2. Literature Review

Domestic research on identifying accounting information distortion in listed companies started later than foreign research. It has mainly focused on analyzing financial indicators, including profitability, solvency, operating ability, development ability, cash flow, and asset quality [3]. After many financial fraud scandals emerged in China, domestic scholars conducted in-depth research on fraud cases, which exposed the shortcomings of enterprise governance. Subsequently, they also began to focus on nonfinancial indicators such as enterprise governance [4].

Foreign scholars’ research on enterprise strategy is as follows. Some scholars believe that enterprise strategy deviation is positively correlated with enterprise performance and information quality; in this line of work, enterprise performance is measured by return on assets and sales growth rate using factor and cluster analysis [5]. For example, some studies have found that adopting a deviating enterprise strategy positively affects enterprise performance [6]. Other studies attribute uncertainty in enterprise performance to enterprises mainly adopting the industry’s conventional strategy [7]. Scholars in China and abroad have also studied differences in enterprise strategy, for example, the relationship between enterprise strategy and over-investment among A-share enterprises listed in Shanghai and Shenzhen. These studies show that enterprise strategy is significantly positively correlated with over-investment and that an offensive strategy has a greater impact on over-investment than a defensive strategy [8]. In addition, research on enterprise strategy and excess on-the-job consumption has found a significant positive correlation between the two: the more aggressive the enterprise’s overall strategy, the greater the scale of excess on-the-job consumption. If the enterprise’s overall governance mechanism is good, or its top managers have strong management skills, the effect of enterprise strategy on excess consumption is inhibited [9].

The collected literature reveals two gaps. First, although accounting information identification technology is mature, problems remain, such as distortion of accounting information, incomplete selection of indicator variables, and limited, single-method identification approaches. Second, the existing literature has sufficiently researched enterprises’ strategic choices and the quality of enterprise accounting information, but little research addresses the relationship between enterprise strategy and accounting information quality. Based on this, this study uses data mining technology to build an accounting distortion information identification model and verifies it experimentally. It then establishes a regression model of the relationship between enterprise strategic management and enterprise accounting information quality and analyzes the results with statistical software. The innovation is the use of the support vector machine (SVM) and decision tree algorithms from data mining to build the accounting distortion information identification model. This study aims to achieve good identification of distorted enterprise accounting information and to provide a reference for further exploration of the relationship between enterprise accounting information quality and enterprise strategy.

3. Methodology

3.1. Application of Data Mining in Accounting Information Distortion
3.1.1. Accounting Information

Accounting information is defined as “the general term for various acceptable and understandable news, data, and materials, which reflect the past, present, and future of the accounting subject about the flow of funds through actual accounting records or scientific predictions” [10]. Accounting information reveals the enterprise’s financial status, operating results, and capital changes to external users through financial reports or other forms. It is not only an important carrier in the accounting recording process but also a basis for the enterprise’s internal performance evaluation and investment decision-making, and it is a basic condition for the effective operation of the securities market. The most basic requirement of accounting information quality is authenticity, and an important criterion for evaluating the work quality of an accounting information system is whether the accounting information is distorted [11].

3.1.2. Accounting Information Distortion

Distortion of accounting information refers to accounting information that, in violation of the principle of objective authenticity, cannot correctly reflect the financial and operating conditions of the accounting subject [12]. Financial reports based on distorted accounting information are misleading and may lead external users such as investors and creditors to make wrong decisions. Accounting information distortion is divided into unintentional and intentional distortion [13]. Intentional distortion refers to deliberate financial fraud or accounting cheating driven by the personal interests of the staff in charge of basic accounting information; it produces a deviation between the actual and the reported situation. Unintentional distortion refers to unintentional calculation mistakes, which may arise from human error during accounting calculations; it also produces a deviation between the actual and the reported information [14].

3.1.3. Reasons for Distortion of Accounting Information

Most accounting distortion stems from either cheating or calculation mistakes. Mistakes usually result from an accountant’s carelessness or incompetence, whereas accounting cheating is driven by an accountant’s personal interests [14]. The reasons for the distortion of accounting information can be divided into external and internal reasons [15]. The external reasons are shown in Figure 1.

The internal reasons for accounting distortion are shown in Figure 2.

3.2. Data Mining

Data mining is the process of sorting through large datasets to find correlations, patterns, and anomalies and to predict outcomes by creating and testing models. The amount of data produced roughly doubles every two years, and about 90% of it is unstructured. Data mining improves organizational decision-making through data analysis. Data mining techniques can be broadly divided into two categories by purpose: descriptive techniques, which characterize a given dataset, and predictive techniques, which use machine learning algorithms to predict outcomes. The data mining process involves numerous steps, from data collection to the visualization of valuable information; usually there are four main steps: setting business objectives, data preparation, model building, and pattern mining and evaluation of results. Various data mining algorithms and techniques can transform raw data into useful information, such as association rules, SVM, neural networks, K-nearest neighbor, and decision trees. This study uses SVM and decision trees to identify and classify information [16].

3.2.1. SVM

SVM was first proposed in 1995. It can be applied to pattern classification and nonlinear regression, and its theoretical basis is the Vapnik–Chervonenkis theory of statistical learning [17]. The main idea is to find a classification hyperplane that maximizes the separation margin between the two classes of samples and to use it as the decision surface. In short, SVM aims to minimize structural risk [18]. From the 1990s onward, with the development of statistical learning theory and of machine learning methods such as neural networks, SVM received extensive attention and developed rapidly [19]. Because of its advantages in solving small-sample, nonlinear, and high-dimensional pattern identification problems, SVM is also widely used in other machine learning problems, such as function fitting [20]. For pattern classification problems, SVM has good generalization capability. Figure 3 shows its advantages.

The SVM learning method comprises three models. The first is the linearly separable SVM: when the training samples are linearly separable, a linear classifier is learned by hard margin maximization. The second is the linear SVM: when the training samples are approximately, but not strictly, linearly separable, some samples cannot satisfy the functional margin constraints, so a linear classifier is learned by soft margin maximization, which introduces slack variables. The third is the nonlinear SVM: when the training samples are not linearly separable, a nonlinear transformation maps the input space into a feature space, so that a hyperplane found by the SVM in the feature space corresponds to a hypersurface in the input space. The nonlinear SVM is learned using kernel functions and soft margin maximization. Ultimately, SVM achieves global optimization by constructing an optimal classification hyperplane in the attribute space according to the principle of structural risk minimization [21].
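The soft-margin idea can be illustrated with a minimal sketch. It is not the study's MATLAB implementation: it assumes a Pegasos-style stochastic subgradient solver for the linear soft-margin (hinge-loss) objective, with a hypothetical regularization strength `lam`, no bias term, and hypothetical two-feature toy data in which +1 marks distorted and −1 non-distorted records.

```python
import random


def train_linear_svm(X, y, lam=0.01, iters=200, seed=0):
    """Pegasos-style stochastic subgradient descent for a linear soft-margin SVM.

    Minimizes lam/2 * ||w||^2 + mean(max(0, 1 - y_i * <w, x_i>)).
    The hinge term max(0, 1 - margin) plays the role of each sample's slack.
    No bias term: the toy data below are symmetric about the origin.
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    for t in range(1, iters + 1):
        i = rng.randrange(len(X))
        eta = 1.0 / (lam * t)  # Pegasos step-size schedule
        margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
        w = [(1.0 - eta * lam) * wj for wj in w]  # shrink: regularization step
        if margin < 1.0:  # sample inside the margin (positive slack): move toward it
            w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def predict(w, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1


# Hypothetical toy data: +1 = distorted, -1 = not distorted.
pos = [(2.0, 2.0), (2.5, 1.5), (1.5, 2.5), (2.2, 2.8), (2.8, 2.2), (1.8, 1.6)]
neg = [(-a, -b) for a, b in pos]
X = pos + neg
y = [1] * len(pos) + [-1] * len(neg)

w = train_linear_svm(X, y)
accuracy = sum(predict(w, x) == label for x, label in zip(X, y)) / len(X)
print(accuracy)
```

Samples that violate the margin keep pulling the hyperplane toward them, while the shrink step enforces the regularization that trades margin width against slack.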

3.2.2. Decision Tree

Decision trees resemble tree-like flowcharts; they are simple to understand and widely used. In decision tree classification, the top of the tree is the root node, the internal nodes below it are non-leaf nodes, and the bottom nodes are leaf nodes, each of which holds a class label. Each data record starts at the root node and moves to different non-leaf nodes according to the values of its attributes, and from the non-leaf nodes it likewise moves to different class labels according to its attributes [22].

Given an unknown variable X, its class label is determined through the decision tree as follows. First, X starts from the root node and enters the corresponding non-leaf node according to the attribute test there. Then, X is assigned to the corresponding leaf node according to the attribute tests at the non-leaf nodes. Assigning unknown variables to class labels along such a tree-like judgment path is called decision tree classification [23]. The classification process of the decision tree is shown in Figure 4.
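The root-to-leaf judgment path can be sketched in a few lines of Python. The tree, the attribute names (`debt_ratio`, `roa`), and the thresholds below are hypothetical and are not the rules learned in this study.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    label: str  # class label held by a leaf node


@dataclass
class Node:
    attribute: str    # attribute tested at the root or a non-leaf node
    threshold: float  # records with value <= threshold go left, others go right
    left: "Tree"
    right: "Tree"


Tree = Union[Node, Leaf]


def classify(tree: Tree, record: dict) -> str:
    """Follow the judgment path from the root node down to a leaf and return its label."""
    while isinstance(tree, Node):
        tree = tree.left if record[tree.attribute] <= tree.threshold else tree.right
    return tree.label


# Hypothetical tree: attributes and thresholds are illustrative only.
tree = Node("debt_ratio", 0.6,
            left=Node("roa", 0.0, left=Leaf("distorted"), right=Leaf("not distorted")),
            right=Leaf("distorted"))

print(classify(tree, {"debt_ratio": 0.4, "roa": 0.05}))  # not distorted
```

The loop mirrors the description above: each non-leaf node routes the record by one attribute test until a leaf's class label is reached.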

3.3. Overall Process of Accounting Information Distortion Identification Model

Based on the relevant theory of accounting information distortion, the overall process of the accounting information distortion identification model is shown in Figure 5.

3.4. Information Distortion Identification Model Based on SVM

The SVM is trained on the indicator variable data and finally produces an output value, which is the accounting information distortion identification result for the training data. The SVM is a learning machine with a three-layer network structure, multiple inputs, and a single output. Figure 6 shows its architecture.

In Figure 6, K is the kernel function. There are four main types: the linear kernel, the polynomial kernel, the radial basis function (RBF) kernel, and the two-layer perceptron (sigmoid) kernel. In this study, the RBF kernel is selected based on the research of other scholars and a comparison of the various kernel functions. The RBF kernel is one of the most general and widely used kernels. It resembles the K-nearest neighbor algorithm in that nearby points have the greatest influence, and it keeps this advantage while avoiding KNN’s space complexity problem, because a trained SVM needs to store only its support vectors rather than the entire training set. For two points X and Xi, the RBF kernel measures how close they are to each other.
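The RBF kernel is commonly written as K(X, Xi) = exp(−γ‖X − Xi‖²), or equivalently exp(−‖X − Xi‖² / (2σ²)) with γ = 1/(2σ²). The sketch below assumes the γ parameterization; which form the study's MATLAB tooling uses is not stated.

```python
import math


def rbf_kernel(x, xi, gamma):
    """RBF (Gaussian) kernel K(x, xi) = exp(-gamma * ||x - xi||^2).

    Equivalent to exp(-||x - xi||^2 / (2 * sigma^2)) with gamma = 1 / (2 * sigma^2).
    Returns 1.0 for identical points and decays toward 0 as they move apart.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, xi))
    return math.exp(-gamma * sq_dist)


print(rbf_kernel([1.0, 2.0], [1.0, 2.0], gamma=0.5))  # 1.0
print(rbf_kernel([0.0, 0.0], [1.0, 1.0], gamma=0.5))  # exp(-1), about 0.3679
```

A larger γ makes the similarity fall off faster with distance, so each support vector influences a smaller neighborhood.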

In the training process, cross-validation is used to find the best SVM penalty parameter c and kernel function parameter. With this method, the parameter pair that gives the highest cross-validated classification accuracy on the training set is selected [25].
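The cross-validated parameter search can be sketched as a grid search over the penalty c and the RBF parameter γ. As a stand-in for the MATLAB SVM trainer, this sketch assumes a kernelized Pegasos solver, an approximate soft-margin SVM method whose regularization strength is derived from c as λ = 1/(n·c). The grid values other than 64 and 1.52, the fold assignment by index, and the toy data are illustrative assumptions.

```python
import math
import random


def rbf(x, z, gamma):
    """RBF kernel exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def train_kernel_svm(X, y, c, gamma, epochs=20, seed=0):
    """Kernelized Pegasos (approximate soft-margin SVM); returns per-sample violation counts."""
    n = len(X)
    lam = 1.0 / (n * c)  # larger penalty c -> weaker regularization
    alpha = [0] * n
    rng = random.Random(seed)
    t = 0
    for _ in range(epochs):
        order = list(range(n))
        rng.shuffle(order)  # visit every sample once per epoch, in random order
        for i in order:
            t += 1
            s = sum(alpha[j] * y[j] * rbf(X[j], X[i], gamma) for j in range(n) if alpha[j])
            if y[i] * s / (lam * t) < 1.0:  # margin violated: sample becomes a support vector
                alpha[i] += 1
    return alpha


def predict(X, y, alpha, gamma, x):
    s = sum(alpha[j] * y[j] * rbf(X[j], x, gamma) for j in range(len(X)) if alpha[j])
    return 1 if s >= 0 else -1


def cv_accuracy(X, y, c, gamma, k=5):
    """k-fold cross-validation accuracy; fold membership is assigned by index modulo k."""
    correct = 0
    for fold in range(k):
        train = [i for i in range(len(X)) if i % k != fold]
        Xtr = [X[i] for i in train]
        ytr = [y[i] for i in train]
        alpha = train_kernel_svm(Xtr, ytr, c, gamma)
        correct += sum(predict(Xtr, ytr, alpha, gamma, X[i]) == y[i]
                       for i in range(len(X)) if i % k == fold)
    return correct / len(X)


def grid_search(X, y, c_grid, gamma_grid, k=5):
    """Return (c, gamma, accuracy) with the highest cross-validation accuracy."""
    best = (None, None, -1.0)
    for c in c_grid:
        for gamma in gamma_grid:
            acc = cv_accuracy(X, y, c, gamma, k)
            if acc > best[2]:
                best = (c, gamma, acc)
    return best


# Hypothetical toy data: +1 = distorted, -1 = not distorted.
pos = [(2.0, 2.0), (2.5, 1.5), (1.5, 2.5), (2.2, 2.8), (2.8, 2.2), (1.8, 1.6)]
neg = [(-a, -b) for a, b in pos]
X = pos + neg
y = [1] * len(pos) + [-1] * len(neg)

best_c, best_gamma, best_acc = grid_search(X, y, [1.0, 16.0, 64.0], [0.1, 0.5, 1.52])
print(best_c, best_gamma, best_acc)
```

Ties go to the first grid point that reaches the highest accuracy; on real data a finer grid, such as powers of two for both parameters, is common practice.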

3.5. Information Distortion Identification Model Based on Decision Tree

This study identifies accounting information distortion with a decision tree, using it to find the relationship between accounting information distortion and certain characteristics. The decision tree identification model is constructed using the same randomly generated training and test sets as the SVM identification model. The division rules of the training results are then generated by pruning the decision tree. In the constructed decision tree model, the leaf nodes use 1 to denote distorted accounting information and 2 to denote non-distorted accounting information.
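The split criterion used to grow the tree is not named here. Assuming a CART-style Gini impurity criterion, the sketch below finds the best threshold for a single numeric indicator, using the study's label convention (1 = distorted, 2 = non-distorted). Pruning, which the study applies afterward, is not shown, and the indicator values are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of class labels (1 = distorted, 2 = non-distorted)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(k) / n) ** 2 for k in set(labels))


def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best binary split on one numeric attribute."""
    pairs = sorted(zip(values, labels))
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical indicator values and labels.
values = [0.2, 0.3, 0.4, 0.7, 0.8, 0.9]
labels = [2, 2, 2, 1, 1, 1]
threshold, score = best_split(values, labels)
print(threshold, score)  # about 0.55, 0.0
```

Growing a full tree repeats this search over all indicators at every node; pruning then removes splits that do not improve accuracy on held-out data.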

3.6. Simulation Experiment
3.6.1. Data Sample Source

The samples for this study come from a national database, from which the data of A-share listed companies over the past five years are selected as the research sample. Listed companies with distorted accounting information are matched 1:1 by industry with listed companies without distortion. To avoid duplicate samples and ensure high integrity of the indicator system data, 102 companies with 1362 data records are finally selected as research samples: 51 with distorted accounting information and 51 without distortion.
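Industry-based 1:1 matching can be sketched as pairing each distorted company with an unused non-distorted company from the same industry. The firm identifiers and industries are hypothetical, and the first-available selection rule is an assumption; how partners were chosen within an industry is not described.

```python
from collections import defaultdict


def match_one_to_one(distorted, clean):
    """Pair each distorted firm with one unused non-distorted firm from the same industry.

    distorted, clean: lists of (firm_id, industry) tuples.
    Distorted firms with no remaining same-industry partner are left unmatched.
    """
    pool = defaultdict(list)
    for firm_id, industry in clean:
        pool[industry].append(firm_id)
    pairs = []
    for firm_id, industry in distorted:
        if pool[industry]:
            pairs.append((firm_id, pool[industry].pop(0)))  # each clean firm is used once
    return pairs


# Hypothetical firms.
distorted = [("D1", "Manufacturing"), ("D2", "Retail"), ("D3", "Mining")]
clean = [("C1", "Retail"), ("C2", "Manufacturing"), ("C3", "Manufacturing")]
print(match_one_to_one(distorted, clean))  # [('D1', 'C2'), ('D2', 'C1')]
```

Removing each matched firm from the pool guarantees the balanced, non-repetitive sample the study describes.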

In the experiment on the SVM-based accounting information identification model, the number of cross-validation folds is set to 5, and the optimal penalty parameter c and kernel parameter are found to be 64 and 1.52, respectively, with a cross-validation accuracy of 80.27%. An SVM-based classification and identification model is then obtained on the MATLAB platform and used to identify the training set data, the overall indicator variable data, and the test set data. In addition, 500 of the 1362 records are used for simulation experiments on the MATLAB platform.

3.7. The Influence of Enterprise Strategy on Accounting Information
3.7.1. Enterprise Strategy

Enterprise strategy is a general term for all of an enterprise’s strategies and can be divided into several types according to the level and angle of planning. It is an integrated, top-down planning process; from top to bottom, its levels are the company level, functional level, business level, and product level. In terms of overall direction, enterprise strategy can be divided into growth, stabilization, and contraction strategies.

3.7.2. Proposition of Hypotheses

Enterprises at different stages have different requirements for the quality of their accounting information, so the quality and usefulness of the information available to external users also differ. In terms of information efficiency, choosing a growth strategy often means that the enterprise deviates from industry experience and expert opinion, which brings efficiency risks. This can distort its accounting information so that it no longer reflects the enterprise’s actual situation. Enterprises that choose a contraction strategy may shrink their management and financial staff and suspend some departments’ operations; as a result, their accounting information does not fully reflect their true operating status. For enterprises that choose a stable strategy, the quality of accounting information is stable: their accounting information stays close to industry norms and has high quality and credibility. Based on this, the following hypothesis is proposed:

H1: With other conditions unchanged, a stable strategy improves the quality of enterprise accounting information.

4. Proposed Model

4.1. Measurement of Enterprise Strategy

This study uses four indicators to measure enterprise strategy: the ratio of research and development expenditure to sales revenue (X1), the historical growth rate of sales revenue (X2), the ratio of total sales and management expenses to sales revenue (X3), and the ratio of fixed assets to total assets (X4). X4 enters the measure as 1-X4, so that, as with the other three indicators, a higher value corresponds to a more aggressive strategy.

Adapting this approach, the study averages the annual values of X1, X2, X3, and 1-X4 over the past five years.
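A minimal sketch of the five-year averaging step, assuming each enterprise’s yearly indicators are stored as dictionaries with keys `X1` to `X4` (a hypothetical layout with hypothetical values). It stops at the averaged indicators and does not combine them into XT.

```python
def five_year_strategy_averages(yearly):
    """Average X1, X2, X3 and (1 - X4) over the supplied years for one enterprise.

    yearly: list of dicts with keys "X1".."X4", one dict per year (hypothetical layout).
    """
    n = len(yearly)
    return {
        "X1": sum(r["X1"] for r in yearly) / n,
        "X2": sum(r["X2"] for r in yearly) / n,
        "X3": sum(r["X3"] for r in yearly) / n,
        "1-X4": sum(1.0 - r["X4"] for r in yearly) / n,  # fixed-asset ratio reversed
    }


# Hypothetical indicator values for one enterprise over five years.
yearly = [
    {"X1": 0.02, "X2": 0.10, "X3": 0.15, "X4": 0.40},
    {"X1": 0.03, "X2": 0.12, "X3": 0.14, "X4": 0.50},
    {"X1": 0.04, "X2": 0.08, "X3": 0.16, "X4": 0.60},
    {"X1": 0.03, "X2": 0.10, "X3": 0.15, "X4": 0.50},
    {"X1": 0.03, "X2": 0.10, "X3": 0.15, "X4": 0.50},
]
avg = five_year_strategy_averages(yearly)
print(avg)
```

Averaging over five years smooths year-to-year fluctuations, so the resulting measure reflects a persistent strategic posture rather than a single year's figures.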

From these averaged indicators, the enterprise strategy measurement index XT is obtained for the 1423 enterprises over the past five years. The larger the value of XT, the more aggressive the enterprise’s strategy; the smaller the value, the more stable its strategy.

4.2. Control Variables and Interpretation

The choice of control variables matters for the correctness of empirical conclusions. Considering operability and rationality and drawing on existing research, this study uses the following control variables: shareholder connectivity, the number of managers, consistency of independent directors’ office areas, assets and liabilities, return on total assets, and enterprise size. To account for time and industry effects, it also controls for year and industry. The meanings and calculation methods of the explained, explanatory, control, year, and industry variables are shown in Table 1.

4.3. Model Establishment

In the empirical part, an Ordered Probit regression model is established to study the relationship between enterprise strategy and accounting information quality. The main regression is divided into Model (1) and Model (2) according to whether industry and year are controlled: Model (1) excludes the industry and year controls, and Model (2) includes them.

In both models, the explained variable is the accounting information quality of listed company i in year t. The explanatory terms are the enterprise strategic choice of listed company i in year t, the shareholding proportion of its largest shareholder in year t, and the interaction between the largest shareholder’s shareholding and enterprise strategic choice. The control variables are shareholder connectivity, the number of managers, consistency of independent directors’ office areas, assets and liabilities, return on total assets, and enterprise size. Each model also contains a constant term, a residual term, a coefficient on the explanatory variable, and coefficients on the control variables. In Model (2), IND and YEAR denote the industry and year controls, respectively.
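Because the explained variable is an ordered disclosure rating, an Ordered Probit model links a latent index x'β to ordered categories through increasing cutpoints μ: P(y = j) = Φ(μ_j − x'β) − Φ(μ_{j−1} − x'β). The sketch below only evaluates these probabilities for a given index; the four categories and the cutpoint values are assumptions for illustration, and no estimation is performed. A negative coefficient on enterprise strategy lowers x'β as strategy becomes more aggressive and so shifts probability toward lower ratings.

```python
import math


def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def ordered_probit_probs(xb, cutpoints):
    """P(y = j), j = 1..J, for linear index xb = x'beta and increasing cutpoints.

    P(y = j) = Phi(mu_j - xb) - Phi(mu_{j-1} - xb), with mu_0 = -inf and mu_J = +inf.
    """
    bounds = [-math.inf] + list(cutpoints) + [math.inf]
    return [norm_cdf(bounds[j + 1] - xb) - norm_cdf(bounds[j] - xb)
            for j in range(len(bounds) - 1)]


# Hypothetical cutpoints for four ordered rating levels (lowest first).
cutpoints = [-1.0, 0.0, 1.0]
probs = ordered_probit_probs(0.0, cutpoints)
print([round(p, 4) for p in probs])  # [0.1587, 0.3413, 0.3413, 0.1587]
```

Lowering the index, for example from 0.0 to -0.5, raises the probability of the lowest rating and lowers that of the highest.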

4.4. Simulation Experiment

The explained variable comes from the information disclosure rating module of the Shenzhen Stock Exchange. Apart from the accounting information quality data, the research data mainly come from the Guotai’an research database, and the explanatory variable, enterprise strategy, is calculated from indicators in that database. After applying the above data screening criteria, this study obtains 1423 “company/year” observations for Shenzhen-listed companies. Descriptive statistics and regression analysis are performed on the SPSS (Statistical Product and Service Solutions) platform.

5. Results and Analysis

5.1. The Effect of the Accounting Information Identification Model Based on Data Mining

Figure 7 shows the results of applying the SVM-based model to the training set data, the overall data, and the test set data on the MATLAB platform.

As Figure 7 shows, the identification accuracy rates are 99.19%, 96.21%, and 94.8% for the training set data, the overall data, and the test set data, respectively. All exceed 90%, indicating that the model performs well in classifying and identifying distorted accounting information.

5.1.1. The Effect Based on the Decision Tree

Figure 8 shows the results of the decision-tree-based accounting information identification model on the MATLAB platform.

As Figure 8 shows, of the 110 distorted sample records, the model correctly identifies 89 and misclassifies 21, an identification rate of 81%. Of the 390 undistorted records, it correctly identifies 373 and misclassifies 17, an identification rate of 96%. Averaged over the two classes for the 500 sample records, the model’s identification rate is 88.5%. These results show that the decision-tree-based accounting distortion identification model also has good identification ability.
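The identification rates can be checked with simple arithmetic. The sketch computes each class’s rate from the reported counts, their unweighted mean, and the overall accuracy. Unrounded, the class rates are about 80.9% and 95.6%, so their mean is about 88.3%; the reported 88.5% equals the mean of the rounded rates (81% and 96%). The overall share of correctly identified records is 462/500 = 92.4%.

```python
def identification_rates(correct_distorted, n_distorted, correct_clean, n_clean):
    """Per-class identification rates, their unweighted mean, and overall accuracy."""
    rate_distorted = correct_distorted / n_distorted
    rate_clean = correct_clean / n_clean
    return {
        "distorted": rate_distorted,
        "undistorted": rate_clean,
        "mean_of_class_rates": (rate_distorted + rate_clean) / 2,
        "overall_accuracy": (correct_distorted + correct_clean) / (n_distorted + n_clean),
    }


# Counts reported for the decision-tree model.
r = identification_rates(89, 110, 373, 390)
print({k: round(v, 4) for k, v in r.items()})
```

Because the sample is imbalanced (110 distorted versus 390 undistorted records), the mean of class rates weights the minority class more heavily than overall accuracy does.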

5.2. The Influence of Enterprise Strategy on Accounting Information
5.2.1. Descriptive Statistical Analysis

The results of descriptive statistics on the explained variables, explanatory variables, and control variables using the SPSS platform are shown in Figure 9.

As Figure 9 shows, the mean of accounting information quality is 2.97 and the median is 3. For the explanatory variable, enterprise strategy has a median of 34.25, a mean of 53.65, a maximum of 796.25, and a minimum of 0, a wide range. Its standard deviation of 80.55 is large, indicating that enterprise strategies vary widely. For the interaction variable, the mean shareholding ratio of the largest shareholder is 33%, with a maximum of 90% and a minimum of 6%. Among the control variables, the mean of shareholder connectivity is 0.56. The mean and median number of managers are 6.43 and 6, respectively, indicating that Shenzhen-listed companies mostly have about six executives. The mean of the consistency of independent directors’ office areas is 0.39, meaning that, on average, 39% of Shenzhen-listed companies have independent directors who work in the same location where the company is listed. The means and medians of assets and liabilities and of return on total assets are close, and their standard deviations are small. This implies that companies listed on the Shenzhen Stock Exchange are stable in their use of financial leverage and in operating returns, which helps control for other objective factors when studying the relationship between enterprise strategy and accounting information quality.

5.2.2. Model Regression Results

Regression analysis of the explained, explanatory, and control variables is performed on the SPSS platform, and the results are shown in Figure 10.

Figure 10 shows that, without controlling for industry and year, the regression coefficient of enterprise strategy on accounting information quality is −0.053 and is significantly negative at the 0.1 level. With industry and year controlled, the coefficient is −0.054, also significantly negative at the 0.1 level; its absolute value is slightly larger than without these controls. These results indicate that the more aggressive an enterprise’s strategy, the greater its uncertainty and the lower the quality of its accounting information, whereas an enterprise that adopts a stable strategy has higher accounting information quality. Accordingly, H1 is supported.

6. Conclusion and Future Works

This study examines problems in identifying enterprise accounting information. Distorted accounting information is effectively identified using the support vector machine and decision tree algorithms from data mining. The relationship between enterprise strategy and the quality of enterprise accounting information is then studied to identify factors that affect accounting information quality. The results show that (1) the data-mining-based classification and identification models achieve good identification performance and (2) different enterprise strategies have different effects on the quality of enterprise accounting information: the more stable the strategy, the higher the quality of accounting information. A limitation of this study is that it examines only the direct relationship between enterprise strategy and accounting information quality; whether the relationship is affected by mediating variables is unknown. Future work will expand the indicators that affect enterprise accounting information, further study their relationships, and address this limitation by exploring indirect relationships between enterprise strategy and accounting information quality and their effects. This study aims to help produce high-quality enterprise accounting information and thereby promote the further development of enterprises.

Data Availability

The datasets used during the current study are available from the corresponding author upon reasonable request.

Disclosure

Jia Shao and Pei Zheng are the co-first authors of this study.

Conflicts of Interest

The authors declare no conflicts of interest.

Authors’ Contributions

Jia Shao and Pei Zheng contributed equally to this work.

Acknowledgments

This work is supported by Key Projects of the National Social Science Foundation of China “Research on the cost sharing mechanism and effect of technological innovation” (No. 20AJY003).