#### Abstract

In this paper, a novel power load forecasting model is proposed to fully extract the periodic characteristics of short-term load at various time scales and to explore the potential correlations between influencing factors and the characteristics of load components. First, the t-distributed stochastic neighbor embedding algorithm is used to map sample points of high-dimensional load influencing factors to a low-dimensional space, and the ensemble empirical mode decomposition algorithm is employed to split the historical load curve into multiple signal components with different frequencies. Then, several long short-term memory networks, comprising nonlinear mapping models and time series models that differ in their inputs, are established to mine the relationship between the low-dimensional comprehensive influencing factors and each intrinsic mode function component. Finally, the effectiveness of the hybrid model is verified using a short-term load dataset with 3-hour granularity from a certain region, and the influence of key model parameters on the forecasting effect is discussed.

#### 1. Introduction

Short-term load forecasting is one of the important daily tasks in contemporary power systems. Unit output plans and economic dispatch strategies based on accurate forecasts can improve the operating stability and economy of the system. Because the short-term load is influenced by a succession of uncertain factors, it can be modeled as the sum of a nonlinear mapping between the load and its related influencing factors and a series of uncertain random loads.

Several state-of-the-art works have explored ways to enhance the accuracy of load forecasting from the aspect of uncertain random loads. Reference [1] used user behavior to reflect the fluctuation of some uncertain loads: smart meters were used to extract user-level data, the similarity of user behavior was analyzed, and the results were introduced into the forecasting model. Reference [2] proposed a multistep forecasting model containing three channels (load, time, and user behavior). The user behavior type was identified by combining a convolutional autoencoder with k-means, and it was combined as a feature with the information obtained by the other two channels for the comprehensive forecasting of short-term power load. Most existing methods for extracting user behavior features start from the similarity of user energy consumption curves and then classify users before forecasting. However, user energy consumption itself is strongly random, and the error generated by the classification is transmitted to the final forecasting results through the forecasting model, which makes the model unstable in practical applications.

At present, mainstream load forecasting research still focuses on how to better and more comprehensively excavate the nonlinear mapping relationship between load and its influencing factors. This kind of research mainly adopts machine learning modeling methods [3–5]. Reference [6] used the Bat algorithm and the Kalman filtering method to optimize a support vector machine (SVM) and combined fuzzy combination weights with empirical mode decomposition for short-term load forecasting. Reference [7] established a combination forecasting model including an SVM and a generalized regression model and used weight determination theory to determine the final forecasting results. In Reference [8], the daily load curve was clustered, and a step-wise compound forecasting model was constructed based on a convolutional neural network (CNN) and a long short-term memory (LSTM) neural network. Reference [9] utilized a CNN to construct a deep learning model and carried out load forecasting for users with similar energy consumption. Even under the combined influence of many factors, short-term power load still has certain periodic characteristics, and the aforementioned methods, which extract the periodic characteristics of load only through the influencing factors, inevitably lead to incomplete feature extraction.

To solve this problem, empirical mode decomposition (EMD) [10–12] was introduced into load forecasting models. EMD can decompose the complex load curve into multiple intrinsic mode functions (IMFs) [10] with various frequencies and amplitudes and a single residual signal. In such studies, every sub-signal was usually modeled with the same forecasting model, such as the state-dependent autoregressive model [10] or the CNN-LSTM model [11], ignoring the characteristics of each sub-signal. Modeling each IMF in a way that suits its characteristics can extract the load characteristics better and more comprehensively.

In this paper, a novel hybrid model for short-term load forecasting is proposed on the basis of previous studies. First, exploiting the time series memory capacity of LSTM [13], the LSTM model is divided into two categories according to the model inputs, and each IMF is modeled in the way that suits it, so as to excavate the potential information of the various IMFs. Second, ensemble empirical mode decomposition (EEMD) [14] is applied to split the load signal, so as to overcome the mode mixing that easily occurs in EMD and avoid its adverse impact on the forecasting effect. The nondestructive dimension reduction of power load influencing factors is carried out by t-distributed stochastic neighbor embedding (TSNE) [15], which reduces the amount of model calculation and the redundant features extracted by the model. Third, several LSTM models of the two types, which have different inputs, are established using the data obtained by TSNE and EEMD. This study introduces the algorithms used in the proposed hybrid model and gradually elaborates the structure and construction process of the model. Finally, two key parameters of the model are discussed through examples and experiments, and the forecasting performance of the model is verified.

The main contributions and highlights can be summarized as follows:

(1) The EEMD algorithm is employed to decompose the power load curve into load components with different time-scale characteristics, and each component is modeled according to its characteristics, which provides a new application form for the time series decomposition of load.

(2) The nondestructive dimension reduction of power load influencing factors is realized through TSNE, which alleviates the long training time of a hybrid model composed of multiple sub-models.

(3) This paper provides a new short-term load forecasting scheme, which can also be applied to the forecasting of new energy generation and integrated loads.

#### 2. Data Processing and Basic Forecasting Model

##### 2.1. Decomposition of Load Curve Based on EEMD

EMD and EEMD are nonlinear signal decomposition algorithms that decompose a nonstationary time series signal into several IMFs, ordered from high to low frequency, and a residual signal representing the overall trend of the original signal. For the power load signal, the IMF components of different frequencies contain the periodic characteristics of the power load at different time scales. As the IMF frequency decreases, the low-frequency components, and ultimately the residual signal, capture the trend of the power load over a period of time.

Because power load data may contain missing or abnormal values introduced during acquisition, the original load signal may be discontinuous or exhibit step changes on the time scale, which makes such data prone to modal aliasing when EMD is used for signal decomposition. Modal aliasing means that a single IMF component contains characteristic signals of different time scales at the same time [14], which degrades both the signal decomposition and the load modeling. As an improvement on EMD, EEMD adds white noise to the signal many times and takes the mean of each sub-signal over these repetitions, which can avoid modal aliasing.

The steps of EEMD signal splitting are as follows:

(1) Set the number of white-noise additions, $M$.

(2) Add normally distributed white noise to the historical load curve to constitute the signal $x_m(t)$, and set $n = 1$, where $n$ denotes the number of sub-signals obtained by decomposition after step (5).

(3) Find all the local maxima and minima contained in the signal, and fit the upper and lower envelope curves through the maximum and minimum points, respectively.

(4) Obtain the mean curve of the two envelope curves and the difference between the signal and this mean curve. The mean curve takes part in the next iteration, and $n = n + 1$.

(5) Repeat steps (3) and (4) until the difference between the signal and the mean curve is small enough, that is, the signal cannot be decomposed again. At this time, the number of sub-signals $c_{m,i}(t)$ obtained by decomposition is $n$.

(6) Repeat steps (2) to (5) until the original signal has been perturbed by white noise and decomposed $M$ times. The mean of each $c_{m,i}(t)$ over the $M$ decompositions gives the signal components obtained by EEMD:

$$c_i(t) = \frac{1}{M}\sum_{m=1}^{M} c_{m,i}(t), \quad i = 1, 2, \ldots, n \tag{1}$$

where $c_i(t)$ is each component of the original load signal obtained by EEMD, and the components satisfy

$$x(t) = \sum_{i=1}^{n-1} \mathrm{IMF}_i(t) + r(t) \tag{2}$$

where $x(t)$ is the original power load signal, $\mathrm{IMF}_i(t)$ are the IMF components, and $r(t)$ is the residual.
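The steps above can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions, not the implementation used in this paper: the envelopes are fitted by linear interpolation rather than the usual cubic splines, each IMF is extracted with a fixed number of sifting passes instead of a convergence test, and the number of IMFs is fixed in advance so that the components of different noisy copies can be averaged. The function names `emd` and `eemd` and all parameter names are hypothetical; the defaults of 200 noise additions and a noise ratio of 0.2 follow the settings reported in Section 3.

```python
import numpy as np


def _envelope_mean(x):
    """Mean of the upper and lower envelopes, or None if x has too few extrema."""
    t = np.arange(len(x))
    d = np.diff(x)
    # Interior local maxima and minima, located by sign changes of the first difference.
    maxima = np.where((d[:-1] > 0) & (d[1:] <= 0))[0] + 1
    minima = np.where((d[:-1] < 0) & (d[1:] >= 0))[0] + 1
    if len(maxima) < 2 or len(minima) < 2:
        return None
    # Assumption: linear interpolation stands in for a cubic-spline envelope fit.
    upper = np.interp(t, maxima, x[maxima])
    lower = np.interp(t, minima, x[minima])
    return 0.5 * (upper + lower)


def emd(x, n_imfs, n_sift=10):
    """Split x into n_imfs IMF-like components plus one residual (last row)."""
    residual = np.asarray(x, dtype=float).copy()
    components = []
    for _ in range(n_imfs):
        h = residual.copy()
        for s in range(n_sift):
            mean = _envelope_mean(h)
            if mean is None:
                if s == 0:
                    # Nothing left to extract: emit an empty component.
                    h = np.zeros_like(h)
                break
            h = h - mean  # steps (3)-(4): subtract the envelope mean
        components.append(h)
        residual = residual - h
    components.append(residual)
    return np.array(components)


def eemd(x, n_trials=200, noise_ratio=0.2, n_imfs=3, seed=0):
    """Average the EMD components of n_trials noisy copies of x (steps (2)-(6))."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    noise_std = noise_ratio * np.std(x)
    total = np.zeros((n_imfs + 1, len(x)))
    for _ in range(n_trials):
        noisy = x + rng.normal(0.0, noise_std, size=len(x))
        total += emd(noisy, n_imfs)
    return total / n_trials
```

For a single EMD pass the rows sum back to the input exactly. For EEMD the sum matches the original signal only up to the averaged noise, which shrinks as the number of noise additions grows.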

##### 2.2. Dimension Reduction of Influencing Factors Based on TSNE

The TSNE algorithm is an improvement on stochastic neighbor embedding (SNE). SNE assumes that points that are close to each other in the original dimension should remain close in the converted dimension, and it uses conditional probabilities to represent this similarity [15].

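In the SNE algorithm, for any data points $x_i$ and $x_j$ of the power load influencing factors in the high-dimensional space, the probability of $x_j$ being a neighbor of $x_i$ is denoted $p_{j|i}$. After mapping to the low-dimensional space, the probability of the low-dimensional mapping point $y_j$ being a neighbor of $y_i$ is denoted $q_{j|i}$. The two probabilities are calculated as

$$p_{j|i} = \frac{\exp\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}{\sum_{k \neq i} \exp\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)} \tag{3}$$

$$q_{j|i} = \frac{\exp\left(-\lVert y_i - y_j \rVert^2\right)}{\sum_{k \neq i} \exp\left(-\lVert y_i - y_k \rVert^2\right)} \tag{4}$$

where $\sigma_i$ is the standard deviation of the Gaussian distribution centered at $x_i$, and $x_k$ and $y_k$ are arbitrary data points in the original dimension and mapping points in the converted dimension, respectively.

The cost function is constructed and defined as

$$C = \sum_i \mathrm{KL}(P_i \,\Vert\, Q_i) = \sum_i \sum_j p_{j|i} \log \frac{p_{j|i}}{q_{j|i}} \tag{5}$$

The dimension reduction problem is thereby transformed into an optimization problem. Through iterative calculation, the optimal mapping of the original samples into the converted dimension is obtained when the cost function reaches its minimum. The iterative process is given by

$$Y^{(t)} = Y^{(t-1)} - \eta \frac{\partial C}{\partial Y} + \alpha(t)\left(Y^{(t-1)} - Y^{(t-2)}\right) \tag{6}$$

where $Y^{(t)}$ represents the sample points in the lower dimension obtained in iteration $t$, $t$ denotes the iteration number, $\eta$ is the learning rate, $\alpha(t)$ is the learning momentum in iteration $t$, and $\partial C / \partial Y$ is the gradient vector. The gradient at point $y_i$ is written as

$$\frac{\partial C}{\partial y_i} = 2 \sum_j \left(p_{j|i} - q_{j|i} + p_{i|j} - q_{i|j}\right)\left(y_i - y_j\right) \tag{7}$$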

In the SNE algorithm, the samples in both the original dimension and the converted dimension use a Gaussian distribution to represent the similarity between data. This leads to data crowding in the converted dimension, where the points become hard to distinguish [16], and it increases the difficulty of model feature extraction. In addition, the conditional probabilities calculated by the SNE algorithm in the original and converted dimensions are asymmetric, and the gradient calculation is complex.

Given these limitations of SNE, the TSNE algorithm makes the following improvements:

(1) TSNE applies the symmetric SNE method to calculate the probability of samples in the original dimension, which simplifies the gradient formula. The joint probability density of data points in the original dimension can be expressed as

$$p_{ij} = \frac{p_{j|i} + p_{i|j}}{2N} \tag{8}$$

where $N$ is the number of data points of the load influencing factors in the high-dimensional space.

(2) SNE uses a Gaussian distribution to represent the similarity between sample points in both spaces, whereas TSNE replaces the Gaussian distribution in the converted dimension with a t-distribution, which relieves the crowding of the data and the difficulty of distinguishing points after dimension reduction. The joint probability density of the low-dimensional samples is defined as

$$q_{ij} = \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}{\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}} \tag{9}$$

where $y_k$ and $y_l$ are the mapping points of any two samples in the low-dimensional space.

TSNE thus uses joint probability densities in place of the conditional probabilities of SNE. For any pair of samples, $p_{ij} = p_{ji}$ and $q_{ij} = q_{ji}$. The improved cost function and gradient can be written as

$$C = \mathrm{KL}(P \,\Vert\, Q) = \sum_i \sum_j p_{ij} \log \frac{p_{ij}}{q_{ij}}, \qquad \frac{\partial C}{\partial y_i} = 4 \sum_j \left(p_{ij} - q_{ij}\right)\left(y_i - y_j\right)\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1} \tag{10}$$

Finally, the TSNE algorithm applies the iteration in (6) repeatedly to obtain the optimal mapping points in the low-dimensional space.
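To make the quantities in this subsection concrete, the following NumPy sketch computes the symmetric joint probabilities in the original space, the t-distribution joint probabilities in the low-dimensional space, the Kullback-Leibler cost, its gradient, and one momentum update. It is a simplified illustration rather than the procedure used in this paper: a single fixed Gaussian bandwidth `sigma` is assumed for all points, whereas full TSNE implementations choose a separate bandwidth per point from a perplexity target. All function names and default values are hypothetical.

```python
import numpy as np


def squared_distances(Z):
    """Pairwise squared Euclidean distances between the rows of Z."""
    sq = np.sum(Z ** 2, axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T
    return np.maximum(D, 0.0)


def joint_p(X, sigma=1.0):
    """Symmetric p_ij in the original space (assumption: one shared sigma)."""
    n = X.shape[0]
    W = np.exp(-squared_distances(X) / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    cond = W / W.sum(axis=1, keepdims=True)  # row i holds p_{j|i}
    return (cond + cond.T) / (2.0 * n)


def joint_q(Y):
    """t-distribution joint probabilities q_ij in the low-dimensional space."""
    W = 1.0 / (1.0 + squared_distances(Y))
    np.fill_diagonal(W, 0.0)
    return W / W.sum()


def kl_cost(P, Q, eps=1e-12):
    """Kullback-Leibler divergence between P and Q over pairs with p_ij > 0."""
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / np.maximum(Q[mask], eps))))


def gradient(P, Q, Y):
    """Gradient of the cost with respect to every low-dimensional point y_i."""
    W = 1.0 / (1.0 + squared_distances(Y))
    np.fill_diagonal(W, 0.0)
    A = (P - Q) * W
    # 4 * sum_j A_ij (y_i - y_j), written in matrix form.
    return 4.0 * (np.diag(A.sum(axis=1)) - A) @ Y


def update(Y, Y_prev, P, eta=10.0, alpha=0.5):
    """One gradient-descent step with momentum; eta and alpha are illustrative."""
    return Y - eta * gradient(P, joint_q(Y), Y) + alpha * (Y - Y_prev)
```

Because both probability matrices are symmetric, the gradients of all points sum to zero, so the update moves the points relative to one another without shifting their centroid.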

##### 2.3. LSTM Neurons

LSTM is an improved recurrent neural network (RNN) published by Hochreiter and Schmidhuber [13], which overcomes the difficulty of training RNNs in practical applications. The LSTM neural network has a strong storage capacity for time series features and adapts well to feature extraction from data with such properties. An LSTM unit can be divided into an input subunit, an output subunit, a forgetting subunit, and an alternative cell state, whose respective states are expressed as

$$\begin{aligned}
\tilde{D}_t &= \tanh\left(W_{xD}\, x_t + W_{hD}\, h_{t-1} + b_D\right) \\
i_t &= \sigma\left(W_{xi}\, x_t + W_{hi}\, h_{t-1} + b_i\right) \\
o_t &= \sigma\left(W_{xo}\, x_t + W_{ho}\, h_{t-1} + b_o\right) \\
f_t &= \sigma\left(W_{xf}\, x_t + W_{hf}\, h_{t-1} + b_f\right)
\end{aligned} \tag{11}$$

where $\tilde{D}_t$ is the alternative cell state at moment $t$, $\tanh$ is the hyperbolic tangent activation function, $W_{xD}$ and $W_{hD}$ are the weights assigned to $x_t$ and $h_{t-1}$ in the calculation of the alternative cell state, $x_t$ is the input of this unit at moment $t$, $h_{t-1}$ is the output of this unit at moment $t-1$, $t$ represents time, $b_D$ is the bias term for the alternative cell state, $i_t$, $o_t$, and $f_t$ are the gating coefficients of the input subunit, output subunit, and forgetting subunit, $\sigma$ is the Sigmoid function, $W_{xi}$, $W_{xo}$, and $W_{xf}$ are the weights assigned to $x_t$ in the three subunits, $W_{hi}$, $W_{ho}$, and $W_{hf}$ are the weights assigned to $h_{t-1}$ in the three subunits, and $b_i$, $b_o$, and $b_f$ are the bias terms of the three subunits.

Thus, the weights and bias of each gate control the size of the corresponding gating coefficient. The gating coefficient of the input subunit controls how much information in the alternative cell state is written into the updated cell state, and the gating coefficient of the forgetting subunit restricts how much information from the previous cell state is inherited by the updated cell state. The cell state of this LSTM unit at moment $t$ can be written as

$$D_t = f_t \odot D_{t-1} + i_t \odot \tilde{D}_t \tag{12}$$

where $D_t$ and $D_{t-1}$ are the states of this cell at moments $t$ and $t-1$, respectively, and $\odot$ denotes element-wise multiplication.

Similarly, the output gating coefficient controls how much of the cell state information, after passing through the activation function, is output from this unit, so the output of a single LSTM unit is given by

$$h_t = o_t \odot \tanh\left(D_t\right) \tag{13}$$

where $h_t$ is the output data of this unit at moment $t$.
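The forward pass of one LSTM unit layer described above can be sketched in NumPy as follows. This is a minimal illustration of the gating arithmetic only, with no training loop. The cell state is written `D` to match the bias term $b_D$ used in the text; the function names, the parameter dictionary layout, and the random initialization are assumptions made for the example.

```python
import numpy as np


def sigmoid(z):
    """Logistic function used for the three gating coefficients."""
    return 1.0 / (1.0 + np.exp(-z))


def init_params(n_in, n_hidden, seed=0):
    """Hypothetical initializer: one (W_x, W_h, b) triple per gate."""
    rng = np.random.default_rng(seed)
    return {
        gate: (
            rng.normal(0.0, 0.1, size=(n_hidden, n_in)),      # weights on x_t
            rng.normal(0.0, 0.1, size=(n_hidden, n_hidden)),  # weights on h_{t-1}
            np.zeros(n_hidden),                               # bias term
        )
        for gate in ("D", "i", "o", "f")
    }


def lstm_step(x_t, h_prev, D_prev, params):
    """Advance the layer by one moment and return (h_t, D_t)."""

    def affine(gate):
        W_x, W_h, b = params[gate]
        return W_x @ x_t + W_h @ h_prev + b

    D_tilde = np.tanh(affine("D"))  # alternative cell state
    i_t = sigmoid(affine("i"))      # input gating coefficient
    o_t = sigmoid(affine("o"))      # output gating coefficient
    f_t = sigmoid(affine("f"))      # forgetting gating coefficient
    D_t = f_t * D_prev + i_t * D_tilde  # updated cell state
    h_t = o_t * np.tanh(D_t)            # unit output
    return h_t, D_t
```

Calling `lstm_step` repeatedly over a sequence, feeding each returned `h_t` and `D_t` into the next call, reproduces the recurrence; the output stays strictly inside (-1, 1) because it is a sigmoid-weighted hyperbolic tangent.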

An LSTM network is composed of multiple memory units, possibly arranged in several LSTM layers. During training, the weights and biases of all LSTM units are determined iteratively with the goal of minimizing the loss function. The input-output mapping of the trained network can then represent either the nonlinear relationship between the influencing factors and the power load or the time series tendency of the historical load.

#### 3. Hybrid Model Based on TSNE-EEMD-LSTM

According to the aforementioned introduction, the TSNE-EEMD-LSTM model proposed in this paper includes power load decomposition, influence factor dimension reduction, and load forecasting combination modeling. The proposed model structure is shown in Figure 1.

In step 1 of Figure 1, the influence factors dimension reduction based on TSNE algorithm and load signal decomposition based on EEMD algorithm are described, respectively.

There are six types of power load influencing factors in the original dataset: time, temperature, pressure, humidity, wind speed, and precipitation. These influencing factors together constitute a high-dimensional sample space, and the points in the original dimension are reduced to a lower dimension by the TSNE algorithm. Let the target dimension be $d$; the projection of the sample points in the $d$-dimensional space onto each coordinate axis is the value of one comprehensive influencing factor.

When $d$ is 2 or 3, the data after dimension reduction can be represented visually, which is convenient for intuitive analysis of the distribution of points in the lower dimension. The visualization effect is shown in Figure 2 and Figure 3. The mapping points in the low-dimensional space still maintain good dispersion after the dimension reduction of the power load influencing factor data by the TSNE algorithm.

According to the relevant conclusions in reference [14], the essential parameters of EEMD are set as follows:

(1) The standard deviation of the white noise added to the original data is 0.2 times the standard deviation of the original data.

(2) The number of white-noise additions is set to 200.

The results of decomposing the power load signal with EEMD are shown in Figure 4.

Each IMF component has a different frequency and different characteristics. The IMF1 component has the highest frequency and changes sharply from one sample to the next, so it contains the load characteristics at the finest granularity. The period of the IMF2 component is almost the same as that of the original signal, and its extreme points follow a trend similar to the peaks and valleys of the original signal, which indicates that the IMF2 component may contain characteristic information related to the peak and valley values of the daily load. The frequency decreases gradually from the IMF3 component to the IMF6 component; these signals may contain load periodic characteristics spanning several days, a week, or longer.

In step 2 of Figure 1, forecasting models based on LSTM are established. Figure 4 shows that the original load signal is decomposed into six IMF components and one residual signal, so $n = 7$. The seven established models can be divided into two categories. One is the nonlinear mapping model based on the influencing factors, which explores the nonlinear relationship between the influencing factors and the corresponding load component. The other is the time series model, which explores the time series trend of a load component on the basis of the component itself.

The nonlinear mapping model takes the comprehensive influencing factors of power load in the low-dimensional space as the model input, and one IMF component or the residual signal as the model output. Assuming that the target dimension of the TSNE algorithm is $d$, the comprehensive influencing factors in the low-dimensional space are $F = [f_1, f_2, \ldots, f_d]$. If the output data of the LSTM network used in supervised learning at moment $t$ is the component value $c_i(t)$, the corresponding input data is $[f_1(t), f_2(t), \ldots, f_d(t)]$.

For the time series model, the input and the output are taken from the same IMF component or residual signal sequence. If the output data of the LSTM network used in supervised learning at moment $t$ is $c_i(t)$, the corresponding input data consists of the values of the same sequence at earlier moments, $c_i(t-1), c_i(t-2), \ldots$
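The difference between the two model types lies only in how the supervised learning samples are assembled, which the following NumPy sketch illustrates. It is an illustration under assumptions rather than the paper's preprocessing code: the function names are hypothetical, the window length `lag` of the time series model is not stated in the text and is chosen freely here, and the arrays are arranged as (samples, time steps, features), a layout commonly expected by LSTM layers.

```python
import numpy as np


def mapping_samples(factors_low, component):
    """Nonlinear mapping model: low-dimensional factors at moment t -> component at t.

    factors_low has shape (T, d); component has shape (T,).
    """
    X = np.asarray(factors_low, dtype=float)[:, None, :]  # (T, 1, d): one time step
    y = np.asarray(component, dtype=float)                # (T,)
    return X, y


def time_series_samples(component, lag):
    """Time series model: the previous `lag` values of a component -> its next value.

    `lag` is a hypothetical window length, not a value taken from the paper.
    """
    c = np.asarray(component, dtype=float)
    windows = [c[i:i + lag] for i in range(len(c) - lag)]
    X = np.stack(windows)[:, :, None]  # (T - lag, lag, 1)
    y = c[lag:]                        # (T - lag,)
    return X, y
```

The mapping samples carry no history, since each input is a single time step, whereas every time series sample is a window of past values. This difference is consistent with using a shallower network for the former.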

The forecasting process is shown in Figure 5. The trained models include nonlinear mapping models and time series models. First, TSNE is used to reduce the dimension of the power load influencing factors for the forecasting day, mapping the data points of the influencing factors from the original dimension to the lower dimension. Then, the mapped influencing factors are input into the trained nonlinear mapping models, and the historical data of the corresponding IMF components or residual signal are input into the time series models. Finally, a total of $n$ model outputs are obtained, and they are reconstructed according to Equation (2) to get the final forecasting results.
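The final reconstruction step amounts to summing the forecasts of all sub-models. The sketch below shows this with the trained models represented as plain Python callables; the function name `combine_forecasts` and the calling convention are assumptions made for illustration, and any trained predictor that returns one value per forecast moment could be substituted.

```python
import numpy as np


def combine_forecasts(mapping_models, series_models, factors_low, histories):
    """Sum the component forecasts of all sub-models into one load forecast.

    mapping_models: callables taking the low-dimensional factors, shape (T, d).
    series_models:  callables taking the history of their own component.
    histories:      one history array per time series model, in the same order.
    """
    outputs = [np.asarray(m(factors_low), dtype=float) for m in mapping_models]
    outputs += [
        np.asarray(m(h), dtype=float) for m, h in zip(series_models, histories)
    ]
    # Reconstruction: the load forecast is the sum of all component forecasts.
    return np.sum(outputs, axis=0)
```

Each callable must return an array of the same length, one entry per forecast moment, so that the component forecasts can be added point by point.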

#### 4. Experimental Verification and Discussion

In this section, the essential parameters of the model are discussed and the forecasting performance of the combined model is tested. The dataset consists of 31 days of load data for a city at 3-hour granularity and the corresponding data for the six kinds of power load influencing factors (time, temperature, pressure, and so on).

##### 4.1. Discussion on Key Parameters of Model

Selecting appropriate parameters helps to enhance the forecasting effect of the proposed model. This part discusses the influence of two parameters on the forecasting performance of the model: the target dimension $d$ of the low-dimensional space and the number $k$ of nonlinear mapping models based on LSTM.

The value of $d$ is expected to affect the calculation and forecasting speed of the nonlinear mapping model. To verify this hypothesis, the six IMFs and one residual signal obtained via EEMD are used as the outputs of supervised learning to establish seven nonlinear mapping models, and the average total time required to complete the training and forecasting of these seven models is recorded for different values of $d$. Figure 6 shows that as the input dimension of the model decreases, the total time required to complete model training and forecasting becomes shorter. That is, the forecasting speed of the model increases as $d$ decreases.

The forecasting speed of the model can therefore be improved by mapping the influencing factors of power load to a low-dimensional space before model building. However, TSNE seeks to keep the distribution relations of the data similar across dimensions, and when the target dimension is too low, projecting the raw data into the lower dimension becomes difficult and feature loss may even occur, because the mapped points cannot carry sufficient feature information.

Therefore, we compare the forecasting accuracy of the combined model for different values of the target dimension $d$ and the number of nonlinear mapping models $k$. Each combined model is composed of seven sub-models, each of which is either a nonlinear mapping model or a time series model: the number of nonlinear mapping models is $k$, and the number of time series models is $7 - k$. Apart from the model structure, the parameters of the two model types are identical. In terms of structure, the nonlinear mapping model consists of one LSTM layer with 60 LSTM units, and the time series model consists of three LSTM layers, each with 60 LSTM units. The reason for this distinction is that the input of the time series model is the load information at historical moments, which has time series characteristics, so more LSTM layers are needed to extract the relevant features. The input of the nonlinear mapping model is the comprehensive influencing factor data in the low-dimensional space, which has no time series relationship, so only one LSTM layer is used to simplify the calculation.

The experimental results are shown in Figure 7. When $d = 4$ and $k = 6$, the combined model has the highest average forecasting accuracy.

Read horizontally, combined models with the same $d$ achieve their best forecasting performance at $k = 6$. That is, the combined model that uses all IMF components to build nonlinear mapping models and the residual signal to build a time series model performs best, which indicates that the relatively flat residual signal is better suited to the time series forecasting method. When $k = 0$, the combined model consists of seven time series models, that is, all IMF components and the residual signal are modeled by time series models. In this case, the average forecasting accuracy of the combined model is the lowest, which indicates that the IMF components are not suitable for time series forecasting.

Read vertically, the models with higher accuracy are concentrated in the middle part of Figure 7, which indicates that a value of $d$ that is too large or too small degrades the performance of the proposed model. Part of the reason is that the influencing factors in the high-dimensional space are correlated with one another. If the inputs to the forecasting models are not dimension-reduced, the models extract redundant features, which increases the amount of calculation and can even reduce the forecasting accuracy of the combined model. Conversely, if the dimension of the influencing factors is reduced to 2 or lower, the data in the low-dimensional space cannot carry all the features of the raw data, which reduces the generalization ability of the model.

##### 4.2. Model Comparison

To test the performance of the proposed model in short-term load forecasting, this paper establishes multiple models for simulation and comparison, including the time series model based on LSTM (TLSTM), the nonlinear mapping model based on LSTM (MLSTM), the back propagation neural network (BP), the model combining TSNE and MLSTM (TSNE-MLSTM), the model combining TSNE and BP (TSNE-BP), the model combining EEMD and MLSTM (EEMD-MLSTM), the model combining EEMD and TLSTM (EEMD-TLSTM), and the model combining EEMD, MLSTM, and TLSTM (EEMD-BLSTM). Among them, BP uses three hidden layers, each hidden layer contains 60 neurons, and LSTM model parameters are set as above.

The forecasting results of each model are shown in Figure 8, and the detailed comparison information is demonstrated in Table 1, including the mean absolute percentage error (MAPE), the root mean square error (RMSE), and the mean absolute error (MAE).
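For reference, the three error measures can be computed as in the NumPy sketch below. The definitions are the standard ones; the function names are chosen for this example, and MAPE is returned as a percentage on the assumption that the actual load values are nonzero.

```python
import numpy as np


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (assumes nonzero actual values)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0)


def rmse(y_true, y_pred):
    """Root mean square error, in the units of the load."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def mae(y_true, y_pred):
    """Mean absolute error, in the units of the load."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))
```

MAPE is scale-free and therefore convenient for comparing models, whereas RMSE and MAE are expressed in load units, with RMSE penalizing large individual errors more heavily.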

In Figure 8, the load forecasts of the EEMD-TLSTM model deviate greatly from the actual load at every sample point, because the IMF components with high-frequency characteristics incur large errors when forecast by the time series model. The TLSTM model is more accurate than EEMD-TLSTM, but its accuracy is still unremarkable among all the models, mainly because the original load signal fluctuates strongly and forecasting it directly with a time series model does not work well. In contrast, when the original load signal is decomposed by EEMD, the IMF components are used to build MLSTM models, and the residual signal is used to build a TLSTM model, each component is forecast by a suitable model. This is the case for EEMD-BLSTM and the proposed model, and it improves the forecasting performance significantly.

Furthermore, because TSNE reduces the dimension of the power load influencing factors, the comprehensive influencing factors in the low-dimensional space contain fewer potentially redundant features, so the forecasting effects of TSNE-MLSTM and TSNE-BP are improved by 1.9186% and 4.2731% compared with the MLSTM and BP models, respectively. The proposed model also improves the accuracy by 1.0235% over the EEMD-BLSTM model, which uses the EEMD algorithm without TSNE. The forecasting values of each sample point for the proposed model are shown in Figure 9. The size of the error bars in Figure 9 indicates the absolute error, and the position of the bar end is the actual load of each sample point. The RMSE of the proposed model is 39.3329, and the MAE is 31.0916. The forecasting results are close to the actual load values.

#### 5. Conclusions

This paper establishes a short-term power load forecasting model based on TSNE-EEMD-LSTM. TSNE is used to map the data of power load influencing factors from the original dimension to a lower dimension, which reduces the calculation spent on redundant features. EEMD is used to split the power load curve into multiple IMF components with different frequencies and a flat residual signal. Based on the characteristics of the IMF components and the residual signal, appropriate LSTM-based models are established, including the nonlinear mapping model, which takes the low-dimensional comprehensive influencing factors as input, and the time series model, which takes the historical data of a signal component as input. Finally, the important parameters of the combined model are determined by comparative experiments, and the feasibility of the proposed hybrid model in short-term load forecasting is verified. The following conclusions are drawn:

(1) There are redundant features among the original power load influencing factors. Repeated extraction of redundant features by the forecasting model increases the amount of calculation, decreases the forecasting speed, and can even lead to poor generalization ability. The TSNE algorithm can largely retain the data structure while reducing redundant feature extraction. When the original influencing factors are time, temperature, pressure, humidity, wind speed, and precipitation, it is best to map the influencing factors to a four-dimensional space.

(2) The IMF signals and the residual signal decomposed by EEMD have different characteristics. Using these characteristics to build several targeted sub-models can improve the accuracy of prediction. In this paper, the LSTM model is divided into a nonlinear mapping model and a time series model by changing the model input. All IMF signals are suitable for modeling by the nonlinear mapping model, and the residual signal is suitable for modeling by the time series model.

(3) The TSNE-EEMD-LSTM model proposed in this paper avoids extracting redundant features as far as possible and fully taps the potential characteristics of the influencing factors, and it performs excellently in the comparison of forecasting performance across multiple models.

The shortcomings of this study and follow-up work:

The models used in this paper are all based on LSTM, and the model types are distinguished only by changing the input of the model. In subsequent research, more advanced and more appropriate forecasting models will be considered to improve the performance of each sub-model in the hybrid model. In the future, it is expected that the proposed scheme can be applied to the forecasting of new energy generation [17,18] and integrated loads [19–21].

#### Data Availability

The data that support the findings of this study are available from the corresponding author upon reasonable request.

#### Conflicts of Interest

The authors declare that they have no conflicts of interest.

#### Acknowledgments

This work was supported by 2022 Innovation and Entrepreneurship Training Program for College Students of Shenyang Institute of Engineering, Scientific Research Project of Education Department of Liaoning Province (LJKQZ2021079), Doctoral Scientific Research Foundation of Liaoning Province (2020-BS-181) and Liaoning Revitalization Talents Program (XLYC1907138).