International Journal of Photoenergy

Volume 2018, Article ID 6973297, 10 pages

https://doi.org/10.1155/2018/6973297

## Short-Term Photovoltaic Power Generation Combination Forecasting Method Based on Similar Day and Cross Entropy Theory

^{1}School of Electrical & Automation Engineering, Nanjing Normal University, Nanjing 210042, China

^{2}Jiangsu Province Gas-Electricity Integrated Energy Engineering Laboratory, Nanjing 210046, China

^{3}State Key Laboratory of Smart Grid Protection and Control, Nanjing 211106, China

Correspondence should be addressed to Qi Wang; wangqi@njnu.edu.cn

Received 30 July 2018; Accepted 11 November 2018; Published 24 December 2018

Academic Editor: Mohammed Ashraf Gondal

Copyright © 2018 Qi Wang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Forecasting photovoltaic (PV) power generation is of great significance for the operation and control of power systems. In this paper, a short-term combination forecasting model for PV power based on similar day and cross entropy theory is proposed. First, the main factors influencing PV power are analyzed. From the perspective of entropy theory, and considering both distance entropy and grey relation entropy, a comprehensive index is proposed to select similar days. Then, the least square support vector machine (LSSVM), the autoregressive moving average (ARMA) model, and the back propagation (BP) neural network are each used to forecast PV power. The weights of the three single forecasting methods are set dynamically by the cross entropy algorithm, and the short-term combination forecasting model for PV power is established. The results show that this method can effectively improve the prediction accuracy of PV power, which is of great significance for real-time economic dispatch.

#### 1. Introduction

Photovoltaic (PV) power generation has been widely deployed around the world in recent years. However, because it is strongly affected by meteorological factors, PV generation is uncertain, volatile, and intermittent, which makes safe dispatch and energy management difficult [1]. Connecting PV generation systems to the grid therefore increases operational risk. Accurate prediction of PV output power provides a basis for dispatching decisions and is of great significance for reducing system operating costs and ensuring the safety and stability of the power system.

Much research has been devoted to forecasting PV power generation. Forecasting methods can be divided into two categories: direct and indirect [2]. The indirect method predicts solar irradiance from the meteorological data of PV power plants and then calculates the PV output power with the relevant formulas or algorithms [3, 4]. The direct method combines meteorological data with historical PV power data to predict the output power [5–8]. Yuan et al. [5] used a BP neural network to predict PV power, taking weather types into account. Lan et al. [6] established a prediction model combining ARMA and a Markov chain for short-term PV power forecasting. Jing et al. [7] used the extreme learning machine method for short-term PV power forecasting. In [8], a PV power forecasting model was established with the least square support vector machine, but its prediction accuracy is affected by the model parameters.

The single direct forecasting methods mentioned above have limitations, and their prediction accuracy can be further improved. Appropriate combination forecasting methods can effectively improve the prediction accuracy of PV power generation [9–12]. Wang et al. [9] proposed a combination forecasting method based on an improved grey BP neural network, in which the fuzzy C-means method was used to classify the historical data and then select similar days as training samples. Li et al. [10] adopted grey relational analysis to determine the meteorological factors with the greatest impact on PV power generation and then, combining the advantages of each individual prediction model, proposed a combined forecasting model based on induced ordered weighted averaging (IOWA) operators. Yang et al. [11] established a combination forecasting method for PV power based on the entropy method to obtain appropriate combination weights. Yang and Chen [12] proposed a combination method for PV power forecasting based on the correlation coefficient, which improved the prediction accuracy. Although these combination methods can effectively improve prediction accuracy, the weights of the single forecasting methods are fixed and cannot reflect real-time changes in PV power.

Accordingly, to further improve PV power forecasting accuracy, this paper proposes a short-term PV power forecasting method based on similar day and cross entropy theory. First, the main factors influencing PV power are analyzed. Distance entropy and grey relation entropy are introduced, and a similarity index is proposed to select similar days. Then, the least square support vector machine (LSSVM), the autoregressive moving average (ARMA) method, and the BP neural network are each used to forecast PV power. The cross entropy algorithm is used to set the weights of the three single forecasting methods dynamically, which establishes a short-term PV power combination forecasting model based on cross entropy theory. The correctness and superiority of this model are verified through case analysis and comparison with a combination method based on the sum of squared errors and a combination method based on the correlation coefficient.

#### 2. Similar Day Selection

##### 2.1. Entropy Theory

Entropy is a measure of the degree of chaos in a sequence. Entropy theory originated from the laws of thermodynamics and has been widely used in many fields such as systems science, information science, and management science.

Suppose the probability of occurrence of the $i$th variable in a sequence is $p_i$ $(i = 1, 2, \ldots, n)$. The entropy of the sequence is then defined as

$$H = -k\sum_{i=1}^{n} p_i \ln p_i, \tag{1}$$

where $k$ is a constant and $n$ is the number of variables.
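As a small illustration of the entropy definition in this section, $H = -k\sum_i p_i \ln p_i$, the Python sketch below computes the entropy of a discrete probability vector. Taking $k = 1$ (natural-log units) as the default is an assumption made here; passing $k = 1/\ln n$ is a common alternative that scales the result to $[0, 1]$.

```python
import math


def entropy(probs, k=1.0):
    """Entropy H = -k * sum(p_i * ln p_i) of a discrete probability vector.

    k = 1 (natural-log units) is an assumed default; pass k = 1 / ln(n)
    to scale the result to [0, 1].
    Terms with p_i == 0 are skipped, using the limit p * ln(p) -> 0.
    """
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return k * sum(-p * math.log(p) for p in probs if p > 0)


print(entropy([0.25, 0.25, 0.25, 0.25]))  # maximal disorder: ln 4, about 1.386
print(entropy([1.0, 0.0, 0.0, 0.0]))      # no disorder: 0.0
```

A uniform distribution gives the largest entropy for a given $n$, and a certain outcome gives zero, which is the sense in which entropy measures the disorder of a sequence.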

##### 2.2. Distance Entropy

Distance entropy combines Euclidean distance with information entropy [13]. Euclidean distance effectively measures the similarity between two sequences, but it treats differences in variables of different nature as equivalent and therefore sometimes cannot meet practical requirements. Combining Euclidean distance with information entropy overcomes this shortcoming. A lower distance entropy indicates that more information is represented, that is, a smaller difference between the comparison sequence and the reference sequence and hence a closer match to the reference sequence.

The Euclidean distance of the $j$th meteorological characteristic variable between historical day $i$ and the forecast day is

$$d_i(j) = \sqrt{\left(x_0(j) - x_i(j)\right)^2}, \tag{2}$$

where $x_0(j)$ and $x_i(j)$ are the $j$th meteorological characteristic variables of the forecast day and historical day $i$, respectively. Both $x_0(j)$ and $x_i(j)$ must be normalized first.

For each historical day, the ratio of the distance of each meteorological characteristic variable to the sum of the distances of all characteristic variables is calculated; this ratio is taken as the probability of occurrence of that variable for the historical day. The distance entropy of historical day $i$ is then

$$H_D(i) = -\sum_{j=1}^{m} \frac{d_i(j)}{\sum_{k=1}^{m} d_i(k)} \ln \frac{d_i(j)}{\sum_{k=1}^{m} d_i(k)}, \tag{3}$$

where $d_i(j)$ is the distance of the $j$th meteorological characteristic variable between historical day $i$ and the forecast day, and $m$ is the total number of meteorological characteristic variables.
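The following Python sketch computes the distance entropy of one historical day: per-variable distances $d_i(j)$, their shares of the total distance, and the entropy of those shares. Min-max normalization over the candidate days is an assumption (the text only says the variables are normalized), the function names are hypothetical, and returning 0 when the two days are identical is a convention chosen here because the distance shares are undefined in that case.

```python
import math


def min_max_normalize(rows):
    """Scale each feature (column) of a list of day vectors to [0, 1]."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(v - mn) / (mx - mn) if mx > mn else 0.0
             for v, mn, mx in zip(row, lo, hi)]
            for row in rows]


def distance_entropy(x0, xi):
    """Distance entropy of historical day xi relative to forecast day x0.

    x0, xi: normalized meteorological feature vectors of equal length m.
    """
    # Per-variable Euclidean distance; for a scalar daily feature this is |x0(j) - xi(j)|.
    d = [math.sqrt((a - b) ** 2) for a, b in zip(x0, xi)]
    total = sum(d)
    if total == 0:
        return 0.0  # identical days: shares undefined, 0 is an assumed convention
    p = [dj / total for dj in d]  # share of each variable in the total distance
    return sum(-pj * math.log(pj) for pj in p if pj > 0)


# Hypothetical feature vectors: [mean T, max T, min T, RH, min RH, sunshine h];
# row 0 is the forecast day, the others are candidate historical days.
days = [[20.1, 25.3, 15.2, 60.0, 40.0, 8.5],
        [19.8, 24.9, 15.0, 62.0, 41.0, 8.1],
        [12.0, 16.5, 8.0, 85.0, 70.0, 1.2]]
norm = min_max_normalize(days)
for i in (1, 2):
    print(f"day {i}: distance entropy = {distance_entropy(norm[0], norm[i]):.4f}")
```

Normalizing first matters because the features have very different units (°C, %, hours); without it the humidity terms would dominate the distance shares.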

##### 2.3. Grey Relation Entropy

Grey relation entropy combines grey relational analysis with information entropy. On the basis of grey relational analysis, information entropy theory is used to describe quantitatively the degree of similarity between the influencing factors in the system, which compensates for the local correlation tendency and the loss of individual information [14]. The greater the grey relation entropy, the stronger the correlation between the comparison sequence and the reference sequence.

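The grey relational (correlation) coefficient of the $j$th variable between historical day $i$ and the forecast day is

$$\xi_i(j) = \frac{\min_i \min_j \lvert x_0(j) - x_i(j)\rvert + \rho \max_i \max_j \lvert x_0(j) - x_i(j)\rvert}{\lvert x_0(j) - x_i(j)\rvert + \rho \max_i \max_j \lvert x_0(j) - x_i(j)\rvert}, \tag{4}$$

where $x_0(j)$ and $x_i(j)$ are the $j$th meteorological characteristic variables of the forecast day and historical day $i$, and $\rho \in (0, 1)$ is the distinguishing coefficient, commonly taken as 0.5. Both $x_0(j)$ and $x_i(j)$ must be normalized first.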

For each historical day, the ratio of the correlation coefficient of each meteorological characteristic variable to the sum of the correlation coefficients of all characteristic variables gives the probability of occurrence of that variable:

$$p_i(j) = \frac{\xi_i(j)}{\sum_{k=1}^{m} \xi_i(k)}, \tag{5}$$

where $\xi_i(j)$ is the correlation coefficient of the $j$th meteorological characteristic variable between historical day $i$ and the forecast day, and $m$ is the total number of meteorological characteristic variables.

The grey relation entropy of historical day $i$ can then be expressed as

$$H_G(i) = -\sum_{j=1}^{m} p_i(j) \ln p_i(j). \tag{6}$$
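A minimal Python sketch of the grey relation entropy follows. It assumes the classical grey relational coefficient, in which the two-level minimum and maximum of $\lvert x_0(j) - x_i(j)\rvert$ are taken over all candidate days and all variables, and a distinguishing coefficient $\rho = 0.5$; both choices and the function name are assumptions for illustration. The coefficients of each day are turned into probabilities by dividing by their sum, and the entropy of those probabilities is returned.

```python
import math


def grey_relation_entropies(x0, history, rho=0.5):
    """Grey relation entropy of every historical day w.r.t. forecast day x0.

    x0: normalized feature vector of the forecast day (length m).
    history: normalized feature vectors of the candidate historical days.
    rho: distinguishing coefficient; 0.5 is an assumed, commonly used value.
    """
    # Absolute differences |x0(j) - xi(j)| for every day i and variable j.
    diffs = [[abs(a - b) for a, b in zip(x0, xi)] for xi in history]
    d_min = min(min(row) for row in diffs)  # two-level minimum over i and j
    d_max = max(max(row) for row in diffs)  # two-level maximum over i and j
    if d_max == 0:
        # Every day equals x0: all coefficients are 1, so p is uniform.
        return [math.log(len(x0))] * len(history)
    entropies = []
    for row in diffs:
        # Grey relational coefficients of this day (all strictly positive).
        coef = [(d_min + rho * d_max) / (d + rho * d_max) for d in row]
        total = sum(coef)
        p = [c / total for c in coef]  # probability of each variable
        entropies.append(sum(-pj * math.log(pj) for pj in p))
    return entropies


x0 = [0.2, 0.5, 0.8]                             # hypothetical forecast day
history = [[0.25, 0.45, 0.8], [1.0, 0.0, 0.0]]   # hypothetical candidate days
print(grey_relation_entropies(x0, history))
```

A day whose variables are all equally well correlated with the forecast day gets uniform probabilities and therefore the largest possible entropy, $\ln m$, which matches the statement that a greater grey relation entropy means a stronger correlation.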

##### 2.4. Influence Factors of PV Power

The output power of a PV system is related to many factors, including solar irradiance, temperature, humidity, wind speed, and weather type. The output power of a PV array is calculated as follows [15]:

$$P = \eta S I \left[1 - 0.005\left(t_0 + 25\right)\right], \tag{7}$$

where $t_0$ is the operating temperature (°C) of the photovoltaic cell, $I$ is the solar irradiance (kW/m^{2}), $S$ is the area of the photovoltaic array (m^{2}), and $\eta$ is the conversion efficiency of the photovoltaic cell (%).
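To make the temperature dependence concrete, the Python sketch below evaluates the PV array output power. It assumes the widely used empirical form $P = \eta S I[1 - 0.005(t_0 + 25)]$ for equation (7); if the paper's equation differs, only the `temp_factor` line needs to change. The array size, efficiency, and irradiance in the usage lines are hypothetical.

```python
def pv_power(irradiance, area, efficiency, cell_temp):
    """PV array output power, assuming P = eta * S * I * [1 - 0.005 * (t0 + 25)].

    irradiance: solar irradiance I in kW/m^2
    area: array area S in m^2
    efficiency: conversion efficiency eta as a fraction (e.g. 0.15 for 15 %)
    cell_temp: operating temperature t0 of the PV cell in deg C
    Returns power in kW.
    """
    temp_factor = 1 - 0.005 * (cell_temp + 25)  # assumed temperature correction
    return efficiency * area * irradiance * temp_factor


# Hypothetical 100 m^2 array, 15 % efficiency, 0.8 kW/m^2 at two temperatures:
for t0 in (10, 30):
    print(t0, round(pv_power(0.8, 100, 0.15, t0), 2))  # 10 9.9, then 30 8.7
```

With $S$ and $\eta$ fixed, the only remaining inputs are irradiance and temperature, which is why the paper needs a substitute (sunshine hours) when forecast-day irradiance is unavailable.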

For short-term PV power prediction, the array area and conversion efficiency are assumed to be constant. Equation (7) then shows that PV power depends on only two factors: solar irradiance and temperature. Solar irradiance differs considerably across seasons and weather types, and so does the output power of PV arrays. Because solar irradiance data for the forecast day are generally difficult to obtain, this paper uses sunshine hours in its place and selects the mean temperature, maximum temperature, minimum temperature, relative humidity, minimum humidity, and sunshine hours as the main influencing factors for selecting similar days. The selected similar days serve as training samples for PV power prediction.

##### 2.5. Similar Day Selection Based on Entropy Theory

The basic steps for similar day selection based on entropy theory are as follows:

*Step 1.* Determine the season and weather type of the forecast day, and then select the historical days with the same season and weather type as samples; let the number of samples be $N$.

*Step 2.* Form the meteorological characteristic vector of each sample from its mean temperature, maximum temperature, minimum temperature, relative humidity, minimum humidity, and sunshine hours. The meteorological characteristic vector of day $i$ is $x_i = [x_i(1), x_i(2), \ldots, x_i(6)]$, where $x_i(1)$, $x_i(2)$, and $x_i(3)$ are the mean, maximum, and minimum temperature of the day, respectively; $x_i(4)$ and $x_i(5)$ are the relative humidity and minimum humidity; and $x_i(6)$ is the number of sunshine hours.

*Step 3.* Calculate the distance entropy and the grey relation entropy of each historical day $i$.

*Step 4.* The distance entropy and grey relation entropy are combined into a comprehensive similarity index that characterizes the similarity of historical day $i$. The calculation formula is
According to the comprehensive similarity index, the six most similar days are selected as the training samples for PV power prediction.
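Steps 1 and 4 can be sketched in Python as a filter followed by a ranking, assuming each candidate day already carries its two entropies from Step 3. The paper's combination formula for the comprehensive index is not reproduced in this text, so `similarity_index` below is a **hypothetical placeholder** (grey relation entropy divided by distance entropy), chosen only because a lower distance entropy and a higher grey relation entropy both indicate a more similar day. The class and field names are also illustrative.

```python
from dataclasses import dataclass


@dataclass
class HistoricalDay:
    date: str
    season: str
    weather: str
    h_dist: float  # distance entropy of the day (Section 2.2)
    h_grey: float  # grey relation entropy of the day (Section 2.3)


def similarity_index(day, eps=1e-12):
    """HYPOTHETICAL stand-in for the paper's index, not the paper's formula.

    Lower distance entropy and higher grey relation entropy both indicate a
    more similar day, so their ratio is used here purely for illustration.
    """
    return day.h_grey / (day.h_dist + eps)


def select_similar_days(history, season, weather, n_similar=6):
    # Step 1: keep only days with the forecast day's season and weather type.
    candidates = [d for d in history if d.season == season and d.weather == weather]
    # Steps 2-3 are assumed done: each day already carries its two entropies.
    # Step 4: rank by the comprehensive index and keep the n_similar best days.
    return sorted(candidates, key=similarity_index, reverse=True)[:n_similar]
```

Filtering by season and weather type before ranking keeps the entropy comparison within days that share broadly similar irradiance conditions; the default of six retained days follows the text.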

##### 2.6. Cross Entropy Theory

Cross entropy (CE) is derived from the definition of entropy and measures the information difference between two random vectors. It is also known as the Kullback-Leibler (K-L) distance. The K-L distance is not a physical length; it describes the difference between two probability distributions. The lower the cross entropy, the more similar the two probability distributions. Cross entropy is defined separately for the discrete and continuous cases [16].

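Discrete case:

$$D(p, q) = \sum_{i=1}^{n} p_i \ln \frac{p_i}{q_i}, \tag{9}$$

Continuous case:

$$D(p, q) = \int p(x) \ln \frac{p(x)}{q(x)}\, dx, \tag{10}$$

where $p$ and $q$ stand for the probability vectors in equation (9) and the probability density functions in equation (10).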

The cross entropy $D(p, q)$ represents the distance between $p$ and $q$; it describes how close the two probability distributions are.

Theorem 1. *If $p$ and $q$ are probability density functions, then $D(p, q) \ge 0$, and $D(p, q) = 0$ only when $p = q$.*

*Property 1.* $D(p, q) \ge 0$, and it equals 0 only when $p = q$ almost everywhere.
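The discrete cross entropy $D(p, q) = \sum_i p_i \ln(p_i / q_i)$ can be checked numerically with the short Python sketch below; the probability vectors are arbitrary examples. It also shows that $D(p, q)$ is generally not equal to $D(q, p)$, which is one reason the K-L "distance" is not a distance in the metric sense.

```python
import math


def kl_divergence(p, q):
    """Discrete cross entropy (K-L distance) D(p, q) = sum_i p_i * ln(p_i / q_i)."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue  # 0 * ln(0 / q_i) is taken as 0
        if qi == 0:
            return math.inf  # p has mass where q has none
        total += pi * math.log(pi / qi)
    return total


p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]
print(kl_divergence(p, p))  # 0.0, the equality case of Theorem 1
print(kl_divergence(p, q) > 0, kl_divergence(p, q) != kl_divergence(q, p))  # True True
```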

The cross entropy algorithm is a stochastic optimization method that can be used to simulate rare (small-probability) events and to solve optimization problems. It has been applied to practical problems such as combinatorial optimization, multiobjective optimization, combination forecasting, and machine learning.
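The paper's procedure for setting the combination weights dynamically is not described in this section, so the Python sketch below only illustrates the generic cross entropy method applied to combination weights. It is an assumption-laden example: candidates are sampled from independent Gaussians, the elite samples with the lowest sum of squared errors are kept, and the sampling distribution is refitted to them with smoothing. The softmax mapping (to keep weights non-negative and summing to one), the parameter values, and all names are choices made here, not taken from the paper.

```python
import math
import random


def combine(weights, forecasts):
    """Weighted sum of the single-model forecasts at each time step."""
    return [sum(w * f[t] for w, f in zip(weights, forecasts))
            for t in range(len(forecasts[0]))]


def sse(weights, forecasts, actual):
    """Sum of squared errors of the combined forecast."""
    return sum((c - a) ** 2 for c, a in zip(combine(weights, forecasts), actual))


def softmax(z):
    """Map unconstrained parameters to non-negative weights that sum to 1."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def ce_weights(forecasts, actual, n_samples=200, elite_frac=0.1,
               iters=60, alpha=0.7, seed=0):
    """Cross entropy method search for combination weights minimizing SSE."""
    rng = random.Random(seed)
    k = len(forecasts)
    mu, sigma = [0.0] * k, [2.0] * k
    n_elite = max(2, int(n_samples * elite_frac))
    for _ in range(iters):
        # 1. Sample candidates from independent Gaussians N(mu_j, sigma_j^2).
        zs = [[rng.gauss(m, s) for m, s in zip(mu, sigma)]
              for _ in range(n_samples)]
        # 2. Keep the elite candidates with the lowest SSE.
        zs.sort(key=lambda z: sse(softmax(z), forecasts, actual))
        elite = zs[:n_elite]
        # 3. Refit mean and std to the elite set, smoothed by factor alpha.
        new_mu = [sum(z[j] for z in elite) / n_elite for j in range(k)]
        new_sigma = [math.sqrt(sum((z[j] - new_mu[j]) ** 2 for z in elite) / n_elite)
                     for j in range(k)]
        mu = [alpha * a + (1 - alpha) * b for a, b in zip(new_mu, mu)]
        sigma = [alpha * a + (1 - alpha) * b for a, b in zip(new_sigma, sigma)]
    return softmax(mu)
```

To make such weights time-varying, the search could be rerun on a sliding window of recent forecasts and observations; how the paper does this is not stated here.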

##### 2.7. Single Forecasting Methods

PV power is affected by solar irradiance, weather type, season, temperature, humidity, and other factors, which makes it difficult to describe with a mathematical model. ARMA, BP neural networks, and LSSVM are often used to forecast PV power. ARMA is a linear model that can predict the overall trend of the data, whereas the BP neural network and LSSVM are nonlinear models with strong nonlinear learning ability. This paper therefore selects these three single methods to forecast PV power.

##### 2.8. ARMA

The autoregressive moving average (ARMA) model is a stochastic time series analysis model popularized by Box and Jenkins and is therefore also called the B-J method. It combines the autoregressive (AR) and moving average (MA) models [17]: the AR part uses past values, and the MA part uses past errors. The linear ARMA model can be expressed as

$$y_t = \sum_{i=1}^{p} \varphi_i y_{t-i} + \varepsilon_t - \sum_{j=1}^{q} \theta_j \varepsilon_{t-j}, \tag{11}$$

where $\varphi_i$ and $\theta_j$ are the AR and MA coefficients, $y_t$ is the output of the ARMA model, $\varepsilon_t$ is the residual, $t$ is the sampling time, and $p$ and $q$ are the orders of the AR and MA parts, respectively.
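Once the coefficients are known, the ARMA recursion yields both the residuals and a one-step-ahead forecast. The Python sketch below assumes the Box-Jenkins sign convention $y_t = \sum_i \varphi_i y_{t-i} + \varepsilon_t - \sum_j \theta_j \varepsilon_{t-j}$ and takes all pre-sample values as zero; both are assumptions, and the function names are hypothetical.

```python
def arma_residuals(y, phi, theta):
    """Recover residuals e_t of an ARMA(p, q) model recursively.

    Uses y_t = sum_i phi_i * y_{t-i} + e_t - sum_j theta_j * e_{t-j};
    values before the start of the series are assumed to be zero.
    """
    eps = []
    for t in range(len(y)):
        ar = sum(phi[i] * y[t - 1 - i] for i in range(len(phi)) if t - 1 - i >= 0)
        ma = sum(theta[j] * eps[t - 1 - j] for j in range(len(theta)) if t - 1 - j >= 0)
        eps.append(y[t] - ar + ma)
    return eps


def arma_forecast_next(y, phi, theta):
    """One-step-ahead forecast: the unknown next residual is replaced by its mean, 0."""
    eps = arma_residuals(y, phi, theta)
    n = len(y)
    ar = sum(phi[i] * y[n - 1 - i] for i in range(len(phi)) if n - 1 - i >= 0)
    ma = sum(theta[j] * eps[n - 1 - j] for j in range(len(theta)) if n - 1 - j >= 0)
    return ar - ma
```

The AR term carries the linear trend of recent values, while the MA term corrects for the most recent forecast errors.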

In this paper, the inputs of the ARMA model are the historical PV power data of similar days. The forecasting process mainly consists of testing the stationarity of the sequence, estimating the model parameters, and determining the model order. Stationarity is tested with the augmented Dickey-Fuller (ADF) unit root test, the model parameters are estimated by the least squares method, and the model order is determined by the Akaike information criterion (AIC).
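The least squares estimation and AIC order selection steps can be sketched in plain Python for the pure AR part. This is a simplified illustration under stated assumptions: the series is taken to be stationary (the ADF test is not implemented) and mean-centered, the MA part is omitted, and AIC is used in the common Gaussian form $n\ln\hat\sigma^2 + 2p$ up to additive constants. Every candidate order is fitted on the same sample so that the AIC values are comparable. The series in the usage lines is simulated with hypothetical coefficients.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [bv] for row, bv in zip(a, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def fit_ar(y, p, start):
    """Least-squares AR(p) fit on targets y[start:]; returns (phi, residual variance)."""
    rows = [[y[t - 1 - i] for i in range(p)] for t in range(start, len(y))]
    target = y[start:]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * v for r, v in zip(rows, target)) for i in range(p)]
    phi = solve(xtx, xty)  # normal equations (X^T X) phi = X^T y
    resid = [v - sum(c * x for c, x in zip(phi, r)) for r, v in zip(rows, target)]
    return phi, sum(e * e for e in resid) / len(resid)


def select_ar_order(y, max_p=5):
    """Choose p in 1..max_p minimizing AIC = n * ln(sigma^2) + 2p."""
    best = None
    for p in range(1, max_p + 1):
        phi, var = fit_ar(y, p, start=max_p)  # same sample for every p
        aic = (len(y) - max_p) * math.log(var) + 2 * p
        if best is None or aic < best[0]:
            best = (aic, p, phi)
    return best[1], best[2]


# Usage on a simulated AR(2) series (hypothetical coefficients 0.6 and -0.3).
rng = random.Random(1)
y = [0.0, 0.0]
for _ in range(2000):
    y.append(0.6 * y[-1] - 0.3 * y[-2] + rng.gauss(0, 1))
p, phi = select_ar_order(y)
print(p, [round(c, 2) for c in phi])
```

AIC trades goodness of fit against model size; it tends to pick slightly too large an order rather than too small, so the selected $p$ may exceed the true order by one or two.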

##### 2.9. BP

The BP neural network is a multilayer feedforward neural network trained by error back propagation. It has good self-organizing learning ability and can approximate nonlinear mappings from input to output. Training consists of forward propagation of the input signal and back propagation of the error signal; the network can process large-scale data in parallel and is robust and fault tolerant [18]. The basic structure of the BP neural network is shown in Figure 1, where $w_{ij}$ is the connection weight between input node $i$ and hidden node $j$, and $w_{jk}$ is the connection weight between hidden node $j$ and output node $k$.
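The forward and backward passes can be illustrated with a deliberately small Python network: one sigmoid hidden layer, one linear output node, and online gradient descent on the squared error. The layer sizes, activation functions, initialization range, and learning rate are assumptions for illustration, not the configuration used in the paper. The comments map the weight lists to $w_{ij}$ and $w_{jk}$ in the text.

```python
import math
import random


class BPNetwork:
    """Minimal three-layer BP network: sigmoid hidden layer, linear output node."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        # w_ih[j][i]: weight from input node i to hidden node j (w_ij in the text).
        self.w_ih = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b_h = [0.0] * n_hidden
        # w_ho[j]: weight from hidden node j to the single output node (w_jk, k = 1).
        self.w_ho = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.b_o = 0.0

    def _forward(self, x):
        """Forward propagation of the input signal."""
        h = [1.0 / (1.0 + math.exp(-(sum(w * xi for w, xi in zip(row, x)) + b)))
             for row, b in zip(self.w_ih, self.b_h)]
        y = sum(w * hj for w, hj in zip(self.w_ho, h)) + self.b_o
        return h, y

    def predict(self, x):
        return self._forward(x)[1]

    def train_step(self, x, target, lr=0.1):
        """One gradient step on the loss 0.5 * (y - target)^2; returns that loss."""
        h, y = self._forward(x)
        err = y - target
        for j, hj in enumerate(h):
            # Back-propagate the error through the sigmoid (uses the old w_ho[j]).
            delta_h = err * self.w_ho[j] * hj * (1.0 - hj)
            self.w_ho[j] -= lr * err * hj
            for i, xi in enumerate(x):
                self.w_ih[j][i] -= lr * delta_h * xi
            self.b_h[j] -= lr * delta_h
        self.b_o -= lr * err
        return 0.5 * err * err

    def fit(self, data, epochs=500, lr=0.1):
        """Online training on (x, target) pairs; returns the mean loss per epoch."""
        return [sum(self.train_step(x, t, lr) for x, t in data) / len(data)
                for _ in range(epochs)]
```

For PV forecasting, `x` would hold features such as the similar days' power and meteorological values, normalized to a small range so the sigmoid units do not saturate.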