#### Abstract

The identification of generating units that abuse market power is an essential part of risk prevention in a spot market, especially in the early stage of the construction of the spot market. In this study, a model for identifying generating units that abuse market power is designed based on the AdaBoost-DT algorithm. It is targeted at the imbalance between samples of generating units that abuse market power and normal generating units in the spot market. First, the four main methods by which market power is abused by generating units in the spot market are described: collusion, economic withholding, physical withholding, and extreme quotation. Second, the specific characteristics of the four methods are analyzed, and the identification indexes for generating units that abuse market power are established. Thereafter, a sample set of generating units that abuse market power using different methods is constructed. Furthermore, a training set is formed with samples of normal generating units to construct a model based on the AdaBoost-DT algorithm, for identifying generating units that abuse market power. Finally, the spot market data of a certain region are used for an example analysis. The results show that the accuracy of model identification is 97%, which validates the method.

#### 1. Introduction

Since the promulgation of Zhongfa [2015] No. 9 and its supporting documents, China’s electricity market has been undergoing a new round of reform. The reform aims to discover prices through the spot market and to establish an electricity market with a full range of trading varieties and complete functions [1]. In July 2020, China’s National Development and Reform Commission and National Energy Administration jointly issued the Notice on Strengthening the Work Related to the Power Spot Market Pilot Continuous Trial Settlement to further promote the development of the spot market. The trading rules of the spot market under the new scenario are still preliminary. Consequently, during market trading, some generating units abuse market power by exploiting loopholes in the market rules to obtain high profits. This severely impairs price discovery in the spot market [2]. Therefore, establishing a method for identifying units that abuse market power in the spot market is highly significant for maintaining the safe, stable, and reliable operation of the electricity market.

Based on the abuse of market power by generating units, Xue et al. [3] review the problem of market power abuse in terms of research methods and control. Yan et al. [4] design a mechanism to prevent the abuse of market power by generating units, while Chen et al. [5] summarize the fundamental concept of market power, common market power monitoring indexes, and market power mitigation methods. Moreover, the study introduces the implementation status of three typical electricity markets: the United States, the United Kingdom, and Northern Europe. These studies presented the definition of market power, methods to abuse market power, and methods to mitigate market power. However, these works do not identify the abuse of market power by generating units. Li et al. [6] propose various evaluation methods to comprehensively assess market power in electricity markets from different perspectives. Liu et al. [7] consider five types of market indicators (market supply and demand, market structure, bidding strategy, supplier position, and trading results) and propose a system for evaluating the abuse of market power by generating units. Zhao et al. [8] propose a process for implementing the comprehensive evaluation index system of the electricity market based on the multilevel fuzzy comprehensive evaluation method. In these studies, evaluation indexes based on the characteristics of data on generating units are constructed, as well as the comprehensive evaluation method to identify market power abuse. However, the comprehensive evaluation method is not applicable to large volumes of power transaction data that display high dimensionality. Wu et al. [9] focus on a simplified version of the convex hull pricing model, analyze the potential market manipulation behavior and manipulation ability of market participants in the simplified model, and propose an index to quantify market power. Dai et al. 
[10] build a multi-leader-follower Stackelberg game based on real-time pricing, model the strategic interaction behavior between multiple electricity retailers and users while simultaneously considering the power load uncertainty of users and the price competition among electricity retailers, which can reduce the real-time electricity price. Dai et al. [11] propose a dynamic pricing scheme based on Stackelberg game for an electric vehicle charging station with a photovoltaic system. These researchers have analyzed market power and real-time electricity price, but they have not identified the abuse of market power. Sun et al. [12] construct a method for identifying market power abuse in cartel-type generating units based on the ranked multivariate Logit model. Liu et al. [13] propose the identification of the abuse of market power in the electricity market based on the cloud model and fuzzy Petri net using the pattern recognition approach. Xu et al. [14] propose an intelligent identification method that is applicable to computer analysis to identify violations by power generation enterprises. It is based on an improved support vector machine and provides a reference for the credit evaluation of electricity market entities. These studies identify the market power abuse behavior of generating units, through intelligent identification algorithms. However, the distribution of generating units that abuse market power and normal generating units in the electricity market is unbalanced. Therefore, the accuracy of these algorithms in identifying data with unbalanced positive and negative samples has not been considered. By considering the characteristics of the spot market (high data dimensionality, large data volume, and unbalanced positive and negative samples), this study adopts the AdaBoost algorithm which has significant advantages in handling the unbalanced sample problem.

The AdaBoost algorithm is a classical ensemble learning method that has been used for recognition problems in various fields. In the electric power field, Li et al. [15] propose a classification strategy for composite power quality disturbances based on the conditional mutual information average optimal feature selection method and an AdaBoost dynamic ensemble classifier. Chen et al. [16] address the shortcomings of small-current ground fault line selection, namely unbalanced sample data, the curse of dimensionality, and high empirical risk. They propose a new method for small-current ground fault line selection based on sample data processing and the AdaBoost method. These studies validate the advantages of the AdaBoost algorithm in solving the sample data imbalance problem. However, the speed of recognition needs to be improved. Yao et al. [17] propose the AdaBoost-decision tree (AdaBoost-DT) identification method that integrates multiple features to identify partial discharges of a gas-insulated composite apparatus. Zhang et al. [18] construct a new composite DT algorithm. They design and implement a DT-based remote sensing image classification system. The AdaBoost evolving DT algorithm is proposed by Zhao et al. [19]. Their experimental results show that the AdaBoost evolving DT can achieve a high recognition accuracy in a short period of time. Thus, this study combines the advantage of the AdaBoost algorithm in solving the unbalanced sample data problem with the advantages of the DT (i.e., small computational volume, high recognition accuracy, and high recognition speed). The combination is suitable for scenarios where few generating units abuse market power in the spot market and the market data volume is large.

This study develops a method based on the AdaBoost-DT for identifying generating units that abuse market power in the spot market. First, the overall framework for identifying generating units that abuse market power in the spot market is designed. Second, the different means of market power abuse by generating units in the spot market are analyzed. Furthermore, the identification indexes for market power abuse by generating units are constructed to form the sample and training sets of generating units that abuse market power. Thereafter, an AdaBoost-DT model for identifying generating units that abuse market power is developed. Finally, the method is applied to the spot market in a region and compared with other methods to verify its effectiveness.

#### 2. General Rationale for Identifying Generating Units That Abuse Market Power

A method is designed based on the AdaBoost-DT technique, to identify generating units that abuse market power in the spot market by combining multiple means. The general rationale is as follows.

First, the methods by which generating units abuse market power are classified into collusion, physical withholding, economic withholding, and extreme quotation. Additionally, indicators for identifying each of these four means are developed.

Second, the data on the generating units in the spot market are used to calculate the identification indexes and construct the characteristics of generating units that abuse market power. Thereby, a sample of generating units that abuse market power in different ways is formed. Together with the normal generating unit samples, a sample set is formed.

Thereafter, an equal sample weight is assigned to each generating unit sample in the sample set, and the DT model is trained. According to the classification error rate of the DT model, the weight of the samples in the sample set is adjusted, while the weight of the misclassified samples is increased, and the weight of the correctly classified samples is reduced. The new DT model is trained such that the new model pays more attention to the misclassified samples. Iteration is continued until the misclassified samples are sufficiently few or the iteration terminates when it attains the set value. The weighted voting method is used to combine all the DT models to form the final AdaBoost-DT model for identifying generating units that abuse market power.

Finally, a model for identifying generating units that abuse market power is used to determine such units in the market. Thereafter, the identification results are evaluated.

#### 3. Modalities and Indicators of Market Power Abuse by Generating Units

##### 3.1. Main Methods by Which Generating Units Abuse Market Power

Market power in the electricity market refers to the capability of market members to manipulate the electricity price in the market and maintain it at an abnormal level for a certain period of time with the aim of making a profit. There are several methods by which generating units can abuse market power [20].

###### 3.1.1. Collusion

Collusion refers to the scenario wherein generating units participating in the market conclude an “alliance” through negotiations and contract signing and, subsequently, apply the negotiated quotation strategy to quote high prices to obtain excessive profits. Alternatively, certain generating units quote high prices to increase the market clearing price so that other generating units in the “alliance” can obtain excessive profits and take turns to “be the banker” [21]. When a generating unit has a large market share, the long-term gains that can be obtained through collusion among generating units are relatively substantial, and collusion among generating units is more likely to occur. Additionally, the quotation and its variations by colluding generating units tend to have certain similarities.

###### 3.1.2. Withholding

Generating unit capacity withholding refers to the scenario wherein a unit does not participate in the market with its available capacity (which should be traded in the market by the unit) because of certain deliberate actions of the unit including physical and economic withholding.

*(1) Physical Withholding*. Physical withholding refers to the intentional underreporting of available generating capacity by the generating unit. This is mainly in the form of false claims that the generating unit is faulty and cannot generate electricity, or that the equipment is undergoing or requires overhauling. The intention is to reduce its own generating capacity and, thereby, its supply to the market. Generating units frequently misreport physical withholding as failure of the unit, to evade market regulation and thereby abuse market power. Therefore, if the target generating unit uses physical withholding, its profits are determined by analyzing the influence of the unavailability of the unit on the market clearing price [15].

*(2) Economic Withholding*. Economic withholding refers to the unreasonably high price (significantly higher than the cost of power generation) quoted by a generating unit for a part of its capacity that results in nongeneration by that part. While participating in the market, the quoted price of an economically withheld generating unit is frequently close to the maximum market price [22] or significantly higher than its power generation cost and historical quoted price. This results in an increase in the clearing price and, consequently, a high profit. Therefore, it is possible to determine whether a generating unit has abused its market power through economic withholding, by comparing its historical price quotations with its winning bids.

###### 3.1.3. Extreme Quotation

Extreme quotation refers to the following acts: (1) frequently quoting a price that exceeds that of similar generating units and the unit's own historically quoted prices and (2) frequently quoting at excessively low prices to ensure that the generating unit wins the bids. The main characteristic of generating units with extreme quotations is that they submit a large number of quotations with extremely high or extremely low prices. It is possible to determine whether generating units have abused market power through extreme quotations by comparing their quoted price levels in the market with their winning bids.

##### 3.2. Indicators of Market Power Abuse by Generating Units

To better identify generating units that abuse market power in the spot market, identification indexes are constructed considering three aspects: market structure, market behavior, and market performance. This is based on the characteristics of generating units that abuse market power in different ways and considering the principles of systematicity, scientificity, and operability.

Market structure characteristics mainly reflect the market share and position of the generating units, while market behavior characteristics mainly reflect the behavior of the generating units participating in the market, including the reporting scenario and quotation strategy. Market performance characteristics mainly reflect the performance of the generating units participating in the market, including the winning scenario of the generating units. The specific definitions and formulas of the identifying indicators of market power abuse by the generating units are as follows.

###### 3.2.1. Market Structure Category

*(1) Market Share* [23]. Market share is defined as the proportion of a generating unit’s generating capacity to the total generating capacity of all the generating units in the market. It is calculated using the following formula:

$$s_i = \frac{q_i}{\sum_{j=1}^{N} q_j}, \tag{1}$$

where *s*_{i} is the market share of the *i*-th generating unit in the market, *N* is the total number of generating units in the market, and *q*_{i} is the generating capacity of the *i*-th generating unit in the market, taken as the maximum declared capacity in the unit's current offer. The generating unit’s market share is used to reflect whether it has market power. Moreover, when its market share is excessively large, it is a key generating unit in the spot market and has a decisive influence on the clearing price in the market.

*(2) Key Supplier Index* [23]. The key supplier index indicates whether a generating unit must be utilized to satisfy the market demand. That is, the market is undersupplied when a key supplier does not contribute. It is calculated as follows:

$$\mathrm{OPS}_i = \frac{\sum_{j=1}^{N} q_j - q_i}{D}, \tag{2}$$

where OPS_{i} denotes the key supplier index for generating unit *i*, *q*_{i} denotes the declared capacity of generating unit *i*, *q*_{j} denotes the declared capacity of generating unit *j*, and *D* denotes the entire market demand for power at time *t*. When OPS_{i} < 1, generating unit *i* is a key supplier for that time period and has market power. This generating unit may be a key generator for the entire system or for a particular region.
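The two market structure indicators can be illustrated with a minimal sketch. It assumes the key supplier index takes the residual-supply form, (total declared capacity minus the unit's own capacity) divided by demand, so that a value below 1 marks a key supplier. The function names and the capacity and demand figures are hypothetical.

```python
def market_shares(capacities):
    """Return s_i = q_i / sum_j q_j for each unit's declared capacity q_i."""
    total = sum(capacities)
    if total <= 0:
        raise ValueError("total capacity must be positive")
    return [q / total for q in capacities]


def key_supplier_index(capacities, demand):
    """Return (sum_j q_j - q_i) / D for each unit i.

    Assumed residual-supply form: a value below 1 means the remaining
    units cannot cover demand D without unit i.
    """
    if demand <= 0:
        raise ValueError("demand must be positive")
    total = sum(capacities)
    return [(total - q) / demand for q in capacities]


# Hypothetical declared capacities (MW) and market demand (MW).
capacities = [300.0, 200.0, 100.0]
shares = market_shares(capacities)
ops = key_supplier_index(capacities, demand=350.0)

# Units whose index is below 1 are flagged as key suppliers.
key_suppliers = [i for i, value in enumerate(ops) if value < 1.0]
```

In this toy market only the first unit is flagged: without its 300 MW, the remaining 300 MW cannot cover the 350 MW demand.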

###### 3.2.2. Market Behavior Category

*(1) Weighted Average Quotation* [24]. The weighted average quoted price of a generating unit is defined as the sum of the products of the declared tariff and declared capacity of each effective quoted segment of the unit in the spot market, divided by the total declared capacity. It is calculated as follows:

$$\bar{p}_i = \frac{\sum_{h=1}^{Y} p_{i,h}\, q_{i,h}}{\sum_{h=1}^{Y} q_{i,h}}, \tag{3}$$

where $\bar{p}_i$ is the average quoted price of generating unit *i*, *p*_{i,h} represents the declared price of the *h*-th segment of the generating unit, *q*_{i,h} represents the declared capacity of the *h*-th segment of the generating unit, and *Y* represents the total number of declared segments in the generating unit’s quotation curve. The weighted average quotation reflects the overall price level of the generating unit's offer. The higher the weighted average quotation, the more the generating unit is suspected of quoting a high price.
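As a small worked example of the capacity-weighted average described above, the sketch below computes it for one offer curve. The function name and the three price and capacity segments are hypothetical.

```python
def weighted_average_quotation(segments):
    """Capacity-weighted average price of one unit's offer curve.

    segments: list of (declared_price, declared_capacity) pairs,
    one pair per declared segment.
    """
    total_capacity = sum(capacity for _, capacity in segments)
    if total_capacity <= 0:
        raise ValueError("total declared capacity must be positive")
    return sum(price * capacity for price, capacity in segments) / total_capacity


# Hypothetical three-segment offer: (price, capacity in MW).
offer = [(200.0, 100.0), (300.0, 50.0), (500.0, 50.0)]
average_price = weighted_average_quotation(offer)
```

Here the weighted sum is 60,000 over 200 MW, so the average quoted price is 300, which is lower than the simple mean of the three segment prices because the cheapest segment carries the most capacity.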

*(2) Relative Level of Quotations*. The relative level of quoted prices reflects the overall level of the generating unit’s quoted price in the market. It is calculated by the following formula:where is the relative level of quotations for generating unit *i* and is the average quoted price of the unit at present. The closer the quoted price is to the average quoted price of all the generating units in the market, the closer it is to zero. If the quoted price is significantly different from zero, it implies that it is significantly different from the average quoted price of all the generating units in the market and that the degree of abnormal quoting by the generating unit is high.

*(3) Capacity Pricing Index* [25]. A generating unit's capacity pricing index is the sum of its declared electricity price and the product of the declared capacity multiplier. This index can reflect the declared high price of the generating unit. It is calculated according to the following formula: where CPI_{i} is the capacity pricing index of generating unit *i*. This index can reflect the relationship between the quantity and price of the generating unit.

*(4) High Quotation Rate* [25]. High quotation rate is defined as the number of times a generating unit’s offer attains the high quotation level in a cycle as a fraction of the number of offers by the generating unit in that cycle. It is calculated by the following formula:

$$R_i = \frac{\mathrm{NUM}_{h,i}}{\mathrm{NUM}_i}, \tag{6}$$

where *R*_{i} is the high quotation rate of generating unit *i*, NUM_{h,i} is the number of times that an offer by generating unit *i* attains the high quotation level, and NUM_{i} is the total number of offers by generating unit *i*. A high value of *R*_{i} indicates abnormal offer behavior aimed at increasing the market clearing price and gaining excess profit.
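The ratio can be sketched as follows. The text does not specify how the high quotation level is set, so the sketch takes it as a caller-supplied price threshold and counts an offer as high when it is at or above that threshold. Both choices, the function name, and the prices are assumptions for illustration.

```python
def high_quotation_rate(offer_prices, high_level):
    """Fraction of offers in one cycle that reach the high quotation level.

    high_level is an assumed, externally supplied price threshold.
    """
    if not offer_prices:
        raise ValueError("at least one offer is required")
    num_high = sum(1 for price in offer_prices if price >= high_level)
    return num_high / len(offer_prices)


# Hypothetical offers of one unit in one cycle.
prices = [480.0, 510.0, 300.0, 495.0, 520.0]
rate = high_quotation_rate(prices, high_level=490.0)
```

Three of the five offers reach the threshold, giving a rate of 0.6.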

*(5) Similarity in High Quotation Rates*. The high quotation rate for a generating unit is calculated as shown in (6)*.* The formula for calculating the similarity in high quotation rate between generating units *i* and *j* over *S*cycles is where HB_{i,j} is the similarity in the generating units’ high quotation rate. *R*_{i} and *R*_{j} are the high quotation rates for generating units *i* and *j*, respectively, in the *s-*th cycle. The lower the HB_{i,j} is, the higher the likelihood of simultaneous high quotations by *i* and *j*. The likelihood of collusion between generating units with similar offers can be reflected to a certain extent by the similarity in their high quotation rates over a number of cycles. If HB_{i,j} is close to zero, it is necessary to conduct focused monitoring of these two generating units.

*(6) Generating Unit Failure Rate*. Generating unit failure rate is defined as the fraction of time that a generating unit is out of service or being overhauled. It is calculated by the following formula:

$$F_i = \frac{T_{\mathrm{fail}}}{T_{\mathrm{all}}}, \tag{8}$$

where *F*_{i} denotes the failure rate of the generating unit, *T*_{fail} denotes the total time of failure or overhaul of the generating unit within a period of time, and *T*_{all} denotes the length of that period.

*(7) Correlation Coefficient of the Quotation Curve*. The correlation coefficient of the quotation curve of generating unit *i* reflects the correlation between this curve and the quotation curve of generating unit *j*. The formula is as follows:

$$R_{i,j} = \frac{\mathrm{Cov}(P_i, P_j)}{\sqrt{\mathrm{Var}[P_i]\,\mathrm{Var}[P_j]}}, \tag{9}$$

where *R*_{i,j} denotes the correlation coefficient of the quotation curves of generating units *i* and *j*, Cov(*P*_{i}, *P*_{j}) is the covariance of the quotation series *P*_{i} and *P*_{j}, Var[*P*_{i}] is the variance of *P*_{i}, and Var[*P*_{j}] is the variance of *P*_{j}. The correlation coefficient reflects the degree of correlation between the quotations of the generating units: the larger it is, the higher the similarity between the quotations and the higher the possibility of collusion between the units. Generating units with a high correlation coefficient of quotation curves are those with a risk of “collusion.”
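A minimal sketch of this correlation coefficient, computed directly from the covariance and variances of two quotation series, is given below. It assumes the two series have equal length and are aligned segment by segment. The function name and the three example curves are hypothetical.

```python
import math


def quotation_correlation(p_i, p_j):
    """Pearson correlation Cov(P_i, P_j) / sqrt(Var[P_i] * Var[P_j])."""
    if len(p_i) != len(p_j) or len(p_i) < 2:
        raise ValueError("series must have equal length of at least 2")
    n = len(p_i)
    mean_i = sum(p_i) / n
    mean_j = sum(p_j) / n
    cov = sum((a - mean_i) * (b - mean_j) for a, b in zip(p_i, p_j)) / n
    var_i = sum((a - mean_i) ** 2 for a in p_i) / n
    var_j = sum((b - mean_j) ** 2 for b in p_j) / n
    if var_i == 0 or var_j == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_i * var_j)


# Hypothetical segment prices of three units.
unit_a = [200.0, 250.0, 300.0, 400.0]
unit_b = [210.0, 260.0, 310.0, 410.0]  # same shape as unit_a, shifted by 10
unit_c = [400.0, 300.0, 250.0, 200.0]  # opposite trend

r_ab = quotation_correlation(unit_a, unit_b)
r_ac = quotation_correlation(unit_a, unit_c)
```

Units a and b move together and yield a coefficient of essentially 1, which under the criterion above would single them out for closer monitoring, whereas units a and c are negatively correlated.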

###### 3.2.3. Market Performance Category

*(1) Rate of Increase in Clearing Prices*. The clearing price escalation rate for a specified generating unit is the actual market clearing price minus the simulated clearing price in a fully competitive market, divided by the simulated clearing price in a fully competitive market. The simulated clearing price is obtained by modifying the declaration behavior of the generating units that abuse market power and then recalculating the clearing. The formula is as follows:

$$\rho_i = \frac{P^{\mathrm{clear}} - P_i^{\mathrm{sim}}}{P_i^{\mathrm{sim}}}, \tag{10}$$

where *ρ*_{i} denotes the rate of increase in the clearing price attributable to generating unit *i*, *P*^{clear} denotes the actual market clearing price, and *P*_{i}^{sim} denotes the simulated clearing price in a fully competitive market, obtained after modifying the offer of generating unit *i*. The higher the clearing price escalation rate, the higher the risk of market power abuse by the generating unit.

*(2) Rate of Winning Bids* [24]. The rate of winning bids is defined as the proportion of the generating unit’s total winning bid to its declared total power. It is calculated by the following formula:

$$\mathrm{WR}_i = \frac{Q_i^{\mathrm{win}}}{Q_i^{\mathrm{bid}}}, \tag{11}$$

where WR_{i}, *Q*_{i}^{win}, and *Q*_{i}^{bid} are the winning bid rate, total bid-winning electricity, and total declared electricity, respectively, of generating unit *i*.

*(3) Out-of-Merit Capacity Index* [23]. The out-of-merit capacity index is defined, for a certain trading period, as the ratio of two fractions: the out-of-merit capacity of the generating unit as a fraction of the unit’s actual declared capacity, and the market unsuccessful bid (the sum of the available capacity of all the generating units that participated in the bidding system and failed to win the bid) as a fraction of the market declared capacity. It is calculated using the following formula:

$$\mathrm{OCI}_i = \frac{Q_i^{\mathrm{oc}} / Q_i^{\mathrm{bid}}}{Q^{\mathrm{oc}} / Q^{\mathrm{bid}}} \times 100, \tag{12}$$

where OCI_{i} is the out-of-merit capacity index of generating unit *i*, *Q*_{i}^{oc} is the power of generating unit *i* with unsuccessful bids, *Q*_{i}^{bid} is the declared capacity of generating unit *i*, *Q*^{oc} is the power with unsuccessful bids in the market, and *Q*^{bid} is the total declared capacity in the market.

As is evident from the definition of the out-of-merit capacity index, its magnitude is primarily related to the ratio of out-of-merit capacity to available capacity of the generating unit and the ratio of the remaining system capacity to available market capacity. In an ideal electricity market, the ratio of each generating unit’s out-of-merit capacity to its available capacity should be relatively close to the ratio of the remaining capacity of the system to the available market capacity. Thus, the ideal out-of-merit capacity index should be 100. If the out-of-merit capacity index of a generating unit is less than 100 for a specified time period, it indicates that the unit's offer for that time period is normal. Otherwise, it indicates that the unit's offer for that time period is higher than that of the majority of the units in the system and that the unit may have engaged in collusive bidding behavior.
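The interpretation above can be checked with a minimal sketch. It assumes the index is the unit's out-of-merit fraction divided by the market-wide out-of-merit fraction, scaled by 100 so that the ideal value is 100. The function name and all capacity figures are hypothetical.

```python
def out_of_merit_capacity_index(unit_unsold, unit_declared,
                                market_unsold, market_declared):
    """Unit's unsold fraction relative to the market's unsold fraction, times 100."""
    if unit_declared <= 0 or market_declared <= 0 or market_unsold <= 0:
        raise ValueError("declared capacities and market unsold capacity must be positive")
    unit_ratio = unit_unsold / unit_declared
    market_ratio = market_unsold / market_declared
    return 100.0 * unit_ratio / market_ratio


# Hypothetical market: 15% of the declared capacity fails to clear.
typical_unit = out_of_merit_capacity_index(15.0, 100.0, 1500.0, 10000.0)
suspect_unit = out_of_merit_capacity_index(60.0, 200.0, 1500.0, 10000.0)
```

The first unit leaves 15% of its capacity unsold, matching the market, so its index is about 100. The second leaves 30% unsold, twice the market fraction, giving an index of about 200 and suggesting offers priced above those of most other units.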

Depending on the different means by which the generating unit abuses market power (each of which has different characteristics), the specific problems are analyzed on a case-by-case basis, as shown in Table 1.

#### 4. Model Based on AdaBoost-DT Algorithm for Identifying Generating Units That Abuse Market Power

##### 4.1. Sample and Sample Set of Generating Units That Abuse Market Power

Based on the different methods by which generating units abuse market power and the indicators used to identify these, the sample of generating units that abuse market power and the sample set used for model training are constructed in the spot market context.

First, an indicator set for generating units that abuse market power is constructed. Using spot market data and indicators for identifying market power abuse, the indicator set of generating units that abuse market power is constructed for different methods of abusing market power, as shown in where *X*_{i} denotes the set of indicators of the *i*-th generating unit that abuses market power. denotes the *m*-th indicator of the *j*-th risk of generating unit *i*. *J* denotes the total number of methods of market power abuse, and *M* denotes the number of indicators.

Thereafter, a sample of generating units that abuse market power and a sample set are constructed. Based on the set of indicators obtained from (13)–(15), a sample of generating units that abuse market power and a sample set of generating units that abuse market power are created (as shown in (16)).where *y*_{i} denotes the label of generating unit *i*. It takes values in the range {0, 1, 2, 3, 4}, representing normal, collusive, physically withheld, economically withheld, and extreme quotation generating units, respectively.

denotes the weight of the *m*-th indicator of the *j*-th method of market power abuse. is the threshold of the *j-*th generating unit that abuses market power. All the generating units beyond this threshold are identified as generating units that abuse market power, as determined by experts. *T*_{i} denotes the *i*-th generating unit sample. It includes the labels *y*_{i} of different methods of abusing market power and the indicator *X*_{i,j} of generating units that abuse market power. *T* denotes the set of samples used for model training. It consists of the samples of generating units that abuse market power (*T*_{1}, *T*_{2}, …, *T*_{n}) and a sample of a normal generating unit (*T*_{n+1}, …, *T*_{N}).
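One possible reading of this labeling rule is sketched below. It assumes that, for each method of abuse *j*, a score is formed as the weighted sum of that method's indicators and compared with the expert threshold, and that a unit exceeding several thresholds receives the label with the largest excess. The weighted-sum form, the tie-breaking rule, the function name, and all numbers are assumptions for illustration and are not taken from the source.

```python
LABELS = {
    0: "normal",
    1: "collusion",
    2: "physical withholding",
    3: "economic withholding",
    4: "extreme quotation",
}


def label_unit(indicators, weights, thresholds):
    """Assign a label in {0, 1, 2, 3, 4} to one generating unit.

    indicators[j] and weights[j] list the indicator values and the
    expert weights for abuse method j; thresholds[j] is the expert
    threshold for method j. Label 0 (normal) is returned when no
    threshold is exceeded.
    """
    best_label, best_excess = 0, 0.0
    for j in sorted(thresholds):
        score = sum(w * x for w, x in zip(weights[j], indicators[j]))
        excess = score - thresholds[j]
        if excess > best_excess:
            best_label, best_excess = j, excess
    return best_label


# Hypothetical normalized indicators, weights, and thresholds.
weights = {1: [0.5, 0.5], 2: [1.0], 3: [0.5, 0.5], 4: [1.0]}
thresholds = {1: 0.7, 2: 0.5, 3: 0.6, 4: 0.5}
suspect = {1: [0.9, 0.8], 2: [0.1], 3: [0.2, 0.3], 4: [0.1]}
ordinary = {1: [0.2, 0.1], 2: [0.1], 3: [0.2, 0.3], 4: [0.1]}

suspect_label = label_unit(suspect, weights, thresholds)
ordinary_label = label_unit(ordinary, weights, thresholds)
```

Each labeled unit, together with its indicator values, then forms one training sample *T*_{i}.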

##### 4.2. AdaBoost-DT Recognition Model

The fundamental concept underlying the AdaBoost algorithm [26] is as follows: (1) adjust the sample weights based on the misclassified samples, iterating until a sufficiently small error rate is attained, and (2) combine the classifiers obtained in each iteration to form the final strong classifier (as shown in Figure 1). Specifically, weak classifier 1 is first trained using the original training set. The weights of the samples in the training set are then readjusted, and the reweighted training set is used to train weak classifier 2. Iterations continue until a sufficiently small error rate is achieved. Finally, the weighted voting method is used to combine the weak classifiers. The AdaBoost algorithm has good generalization capability and practicality [27]. By assigning different weights to different samples through cyclic training, it increases the focus on difficult samples and thereby achieves accurate classification.

Using DT as a weak classifier, samples of generating units with different labels in the training set are used as inputs to the AdaBoost algorithm for training. Thereafter, the unknown generating unit samples are identified. The specific steps of the algorithm are as follows.

*Step 1.* Initialize the weight distribution of the training data.

Based on the obtained training set, each generating unit sample in the training set is assigned an identical weight in the first iteration, as shown in (17). The weights of the generating unit samples are updated in each iteration:

$$D(1) = \{\omega_{11}, \omega_{12}, \ldots, \omega_{1N}\}, \qquad \omega_{1i} = \frac{1}{N}, \tag{17}$$

where *ω*_{1i} is the weight of the *i*-th generating unit sample in the first iteration (*i* = 1, 2, …, *N*).

*Step 2.* Iterative training of weak classifiers.

(1) Denote the number of iterations by *k* (*k* = 1, 2, …, *K*). Set the weight coefficient of each generating unit sample at the *k*-th iteration as *D*(*k*) = {*ω*_{k1}, *ω*_{k2}, …, *ω*_{kN}}, and train on the weighted samples to obtain the *k*-th weak classifier *G*_{k}(*x*_{i}).

(2) Calculate the weighted classification error rate *e*_{k} for the weak classifier *G*_{k}(*x*_{i}) on the training set:

$$e_k = \sum_{i=1}^{N} \omega_{ki}\, I\big(G_k(x_i) \neq y_i\big). \tag{18}$$

Here, *I*(·) is the indicator function, so the weighted classification error rate is the sum of the weights of all the generating unit samples that have been misclassified by the current classifier.

(3) Calculate the weight coefficient of the weak classifier *G*_{k}(*x*_{i}) from the weighted error rate *e*_{k} using (19). The weight coefficient indicates the importance of the weak classifier *G*_{k}(*x*_{i}) in the final classifier. The smaller the error rate, the larger the weight coefficient [28]. The weight coefficient of the *k*-th weak classifier *G*_{k}(*x*_{i}) is

$$\alpha_k = \frac{1}{2}\ln\frac{1 - e_k}{e_k}. \tag{19}$$

(4) Update the weight distribution of the training dataset:

$$D(k+1) = \{\omega_{k+1,1}, \omega_{k+1,2}, \ldots, \omega_{k+1,N}\}, \tag{20}$$

where *ω*_{k+1,i} denotes the weight of the *i*-th generating unit sample in the (*k* + 1)-th iteration. It is calculated as follows (written here in the standard binary form, with *y*_{i} and *G*_{k}(*x*_{i}) taking values in {−1, +1}):

$$\omega_{k+1,i} = \frac{\omega_{ki}}{Z_k}\exp\big(-\alpha_k\, y_i\, G_k(x_i)\big), \tag{21}$$

where *Z*_{k} is the normalization factor such that *D*(*k* + 1) is a probability distribution. It is calculated as

$$Z_k = \sum_{i=1}^{N} \omega_{ki}\exp\big(-\alpha_k\, y_i\, G_k(x_i)\big). \tag{22}$$

From the above equations, the weight of the generating unit samples correctly classified in the *k*-th iteration is reduced, whereas the weight of the misclassified generating unit samples is increased. Misclassified generating unit samples therefore play a larger role in the next iteration [29]. This enables each generating unit sample to be learned completely through this process.

(5) Repeat (1)–(4) in Step 2 to obtain a series of weak classifiers and their corresponding weights.

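*Step 3.* Linear combination of weak classifiers using weight parameters.

The weak classifiers are combined linearly using their weight coefficients:

$$f(x) = \sum_{k=1}^{K} \alpha_k G_k(x).$$

The continuous function *f*(*x*) is transformed into a discrete function using the *sign()* function. Thus, the final identification model is

$$G(x) = \operatorname{sign}\big(f(x)\big) = \operatorname{sign}\left(\sum_{k=1}^{K} \alpha_k G_k(x)\right). \tag{23}$$

The identification model is then used to identify generating units that abuse market power in the spot market.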
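Steps 1 to 3 can be sketched in a few lines of plain Python. The sketch makes several simplifying assumptions that are not taken from the source: labels are binary (+1 for an abusing unit, −1 for a normal unit) rather than the five classes used in this study, each weak classifier is a depth-1 decision tree (a stump) found by exhaustive search, and the classifier weight uses the standard form α = ½ ln((1 − e)/e). The data are a hypothetical one-indicator toy set.

```python
import math


def train_stump(X, y, w):
    """Return (error, feature, threshold, sign) of the best depth-1 tree."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted({row[f] for row in X}):
            for sign in (1, -1):
                # Predict `sign` at or below the threshold, `-sign` above it.
                pred = [sign if row[f] <= thr else -sign for row in X]
                err = sum(wi for wi, p, t in zip(w, pred, y) if p != t)
                if best is None or err < best[0]:
                    best = (err, f, thr, sign)
    return best


def stump_predict(stump, row):
    _, f, thr, sign = stump
    return sign if row[f] <= thr else -sign


def adaboost_fit(X, y, n_rounds):
    n = len(X)
    w = [1.0 / n] * n                              # Step 1: equal weights
    model = []
    for _ in range(n_rounds):                      # Step 2: iterate
        stump = train_stump(X, y, w)
        err = stump[0]
        if err >= 0.5:                             # no better than chance
            break
        err = max(err, 1e-10)                      # avoid division by zero
        alpha = 0.5 * math.log((1.0 - err) / err)  # classifier weight
        model.append((alpha, stump))
        # Raise the weights of misclassified samples, lower the others.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for wi, xi, yi in zip(w, X, y)]
        z = sum(w)                                 # normalization factor
        w = [wi / z for wi in w]
    return model


def adaboost_predict(model, row):
    """Step 3: sign of the weighted vote of all weak classifiers."""
    f = sum(alpha * stump_predict(stump, row) for alpha, stump in model)
    return 1 if f >= 0 else -1


# Hypothetical toy data: one indicator, labels not separable by one stump.
X = [[0], [1], [2], [3], [4], [5], [6], [7], [8], [9]]
y = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
model = adaboost_fit(X, y, n_rounds=3)
predictions = [adaboost_predict(model, row) for row in X]
```

No single stump classifies this toy set correctly (the best one misclassifies three of the ten samples), but the weighted vote of three stumps reproduces every training label. A full implementation of the method in this study would use deeper decision trees and a multi-class variant of the weight update.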

#### 5. Analysis of Calculation Examples

The data on the spot market of a region in 2005 (containing information on 170 generating units) are used to validate the method proposed in this study for identifying the anomalous generating units in the spot market. The method is based on AdaBoost and DT techniques. The distribution of positive and negative samples is shown in Figure 2. Among the 170 generating units, about a quarter (40 units) abuse market power, including 14 collusive units, 7 physical withholding units, 9 economic withholding units, and 10 extreme quotation units.

##### 5.1. Samples of Generating Units That Abuse Market Power

Considering the indicators for identifying generating units that abuse market power and the actual scenario of the spot market in a certain region, and based on the relevant data of generating units in the market, the sample of generating units abusing market power and the sample set are constructed. A few samples of the generating units that abuse market power are presented in Tables 2–5.

##### 5.2. Indicators for Model Evaluation

In this study, accuracy, recall, and *F*-measure are used to evaluate the model for identifying generating units that abuse market power. Accuracy (in the sense of precision, denoted pre) is the fraction of genuine positive samples out of all the samples identified as positive, recall (rec) is the fraction of all the positive samples that are identified as positive, and *F*-measure is a composite index of accuracy and recall. The formulas for the three evaluation indexes are as follows:

$$\mathrm{pre} = \frac{TP}{TP + FP}, \qquad \mathrm{rec} = \frac{TP}{TP + FN}, \qquad F = \frac{2 \cdot \mathrm{pre} \cdot \mathrm{rec}}{\mathrm{pre} + \mathrm{rec}},$$

where pre denotes the accuracy of the recognition model (the higher the accuracy, the better the model); rec denotes the recall of the recognition model (the higher the recall, the better the model); *F* denotes the *F*-measure of the recognition model (the closer it is to one, the better the model); TP denotes the number of positive samples that are correctly recognized; FP denotes the number of negative samples that are recognized as positive; and FN denotes the number of positive samples that are recognized as negative.
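As a quick illustration of these three indexes, the sketch below computes them from confusion counts. It assumes the balanced form of the *F*-measure (the harmonic mean of pre and rec); the function name `evaluation_indexes` and the counts are hypothetical and are not results from this study.

```python
def evaluation_indexes(tp, fp, fn):
    """Return (pre, rec, f_measure) from confusion counts.

    tp -- positive samples correctly recognized
    fp -- negative samples recognized as positive
    fn -- positive samples recognized as negative
    """
    # Guard against empty denominators so the function never divides by zero.
    pre = tp / (tp + fp) if (tp + fp) else 0.0
    rec = tp / (tp + fn) if (tp + fn) else 0.0
    # Balanced F-measure (assumed): harmonic mean of pre and rec.
    f_measure = 2 * pre * rec / (pre + rec) if (pre + rec) else 0.0
    return pre, rec, f_measure


# Hypothetical counts for one class of units.
pre, rec, f_measure = evaluation_indexes(tp=8, fp=2, fn=1)
```

With these counts, pre is 0.8, rec is 8/9, and the *F*-measure is 16/19 (about 0.84), which lies between the two as expected for a harmonic mean.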

#### 6. Results of Model Identification

Of the sample set of generating units obtained above, 70% is used as the training set to train the recognition model. The remaining 30% is used as the validation set to verify the accuracy of the model recognition. The accuracy of the classification is low when the number of iterations of the AdaBoost algorithm is too small, whereas an excessively large number of iterations causes overfitting. Within the tested range, the validation accuracy increases with the number of iterations, as shown in Figure 3. Meanwhile, the time required for training also increases with the number of iterations.
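Because the classes are unbalanced, one reasonable way to realize a 70/30 split is to split each class separately so that every abuse type appears in both sets. The sketch below assumes such a stratified split, which the text does not confirm; the function `stratified_split`, the random seed, and the class names used as labels are illustrative. The class counts are those reported in Section 5, with the 130 normal units obtained as 170 minus the 40 abusing units.

```python
import random


def stratified_split(labels, train_fraction=0.7, seed=0):
    """Split sample indices into training and validation sets per class."""
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)

    train, validation = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Round so that small classes still contribute to both sets.
        cut = round(train_fraction * len(indices))
        train.extend(indices[:cut])
        validation.extend(indices[cut:])
    return sorted(train), sorted(validation)


# Class sizes from Section 5 (normal units derived as 170 - 40).
class_counts = {
    "normal": 130,
    "collusion": 14,
    "physical withholding": 7,
    "economic withholding": 9,
    "extreme quotation": 10,
}
unit_labels = [name for name, count in class_counts.items() for _ in range(count)]
train_idx, val_idx = stratified_split(unit_labels)
```

Under these assumptions the split yields 119 training and 51 validation samples, and even the smallest class (7 physical withholding units) keeps 2 units for validation.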

Figure 3 shows that the accuracy of classification increases with the number of iterations. However, it is essentially stable when the number of iterations is higher than 30. Thus, the number of iterations of the classifier can be set as 30. For the different methods by which generating units can abuse market power, the results of the model evaluation indexes for the AdaBoost-DT identification model are shown in Figure 4.

To validate the model, this study compares its identification results with those obtained using DT, SVM [14, 30], and AdaBoost-SVM, as shown in Figure 5.

To better illustrate how the recognition accuracies of AdaBoost-DT and AdaBoost-SVM vary with the number of iterations, the assessment of the recognition results is simplified here: the accuracy of recognition is taken as the mean of the accuracies of the five sets (0, 1, 2, 3, and 4). Figure 6 compares the two models in terms of the relationship between the recognition accuracy and the number of iterations, while Figure 7 compares them in terms of the relationship between the iteration time and the number of iterations.

Figure 6 shows that AdaBoost-SVM identifies more accurately when the number of iterations is less than 10, whereas AdaBoost-DT is significantly more accurate than AdaBoost-SVM when the number of iterations is larger than 10. Meanwhile, it is apparent from Figure 7 that AdaBoost-DT iterates faster. The pre, rec, and *F*-measure results in Figures 4 and 5 for each of the five classes of generating units are averaged for each model to obtain its overall evaluation indexes. The specific results are shown in Table 6.

The experimental results demonstrate that, when the AdaBoost algorithm is not used for ensemble learning, the SVM model performs slightly better than the DT model. When AdaBoost is used for ensemble learning, the recognition accuracy of the AdaBoost-DT model improves from 0.76 to 0.97 relative to the single DT model, a significant improvement. Similarly, the recognition accuracy of the AdaBoost-SVM model improves from 0.82 to 0.93 relative to the single SVM model. Furthermore, the recognition accuracy of the AdaBoost-DT model is higher than that of AdaBoost-SVM, which indicates the effectiveness of the former for identifying generating units that abuse market power.

#### 7. Conclusion

This study analyzes the four methods of market power abuse by generating units based on the actual scenario in the spot market, constructs identification indexes of market power abuse by generating units along three dimensions (market structure, behavior, and performance), and develops an AdaBoost-DT identification model by constructing samples of generating units that abuse market power and sample sets for model training according to the data characteristics of generating units in the spot market. The conclusions are as follows:

(1) This study makes use of the advantage of the AdaBoost-DT algorithm in unbalanced sample identification. The AdaBoost-DT algorithm achieves a high identification accuracy on the unbalanced positive and negative samples of generating units in the spot market. Additionally, it is fast and robust; that is, it can rapidly and effectively identify the generating units that abuse market power.

(2) This study compares the AdaBoost-DT algorithm with the DT, AdaBoost-SVM, and SVM algorithms. The identification accuracies of the DT and SVM algorithms are not as high as those of the AdaBoost-DT and AdaBoost-SVM algorithms. Notably, the AdaBoost-SVM algorithm has a higher accuracy when the number of iterations is small, but the AdaBoost-DT algorithm performs better as the number of iterations increases.

(3) As the number of iterations of the AdaBoost algorithm increases, the accuracy of recognizing units that abuse market power improves, but the recognition time also lengthens. Therefore, the number of iterations should be selected by balancing recognition accuracy against recognition time.

#### Data Availability

The data used to support the findings of this study were supplied by Guangdong Power Exchange Center under license and so cannot be made freely available. Requests for access to these data should be made to Yuting Xie ([email protected]).

#### Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

#### Acknowledgments

This work was supported by the Science and Technology Project Assistance of China Southern Power Grid Corporation (Project No. GDKJXM20200211).