#### Abstract

The intermittent and fluctuating character of solar irradiance severely limits most of its applications. Accurate forecasting of solar irradiance is critical for predicting the output power of a photovoltaic power generation system. In the present study, Model I-A and Model I-B, based on the traditional long short-term memory (LSTM) network, are discussed, and the effects of different parameters are investigated; meanwhile, Model II-AC, Model II-AD, Model II-BC, and Model II-BD, based on a novel LSTM-MLP structure with two-branch input, are proposed for hour-ahead solar irradiance prediction. Different lagging time parameters and different main and auxiliary input parameters are discussed and analyzed. The proposed method is verified on 5 years of real data. The experimental results demonstrate that Model II-BD performs best because it considers the weather information of the next moment: the root mean square error (RMSE) is 62.1618 W/m^{2}, the normalized root mean square error (nRMSE) is 32.2702%, and the forecast skill (FS) is 0.4477. In terms of RMSE, the proposed algorithm is 19.19% more accurate than the backpropagation neural network (BPNN).

#### 1. Introduction

With the rapid growth of solar power generation, more and more solar power is being connected to the grid, with substantial economic impact. According to statistics from the International Renewable Energy Agency (IRENA), the total installed PV capacity in China reached 205.493 GW at the end of 2019 [1]. However, power generation from photovoltaic systems is highly variable because it depends on meteorological conditions, and this fluctuation poses a severe challenge to the security of the power grid. An effective solar irradiance forecasting method can therefore mitigate the effects of intermittency, as it provides information about future trends and allows users to make decisions beforehand.

Solar forecasting is a timely topic, and several short-term solar irradiance forecasting approaches have been presented recently. Broadly, forecasting methods can be divided into five categories [2]: (1) time series; (2) regression; (3) numerical weather prediction; (4) image-based forecasting; and (5) machine learning. A time series is a sequence of observations taken sequentially in time, and time series forecasting models are divided into stationary and nonstationary models. Autoregressive (AR), moving average (MA), and autoregressive moving average (ARMA) models are commonly used to forecast stationary trends; integrated moving average (IMA), autoregressive integrated moving average (ARIMA), seasonal autoregressive integrated moving average (SARIMA), and other models are used to forecast nonstationary trends [3–6]. Regression is a statistical process for estimating the relationships among variables; it is a handy tool for describing the relationship between solar irradiance and exogenous variables [7, 8]. Numerical weather prediction (NWP) models directly simulate the irradiance fluxes at multiple levels in the atmosphere, separately considering the shortwave and longwave parts of the solar spectrum [9, 10]. Image-based forecasting uses satellite cloud images and all-sky images as main or auxiliary data sources to forecast irradiance. This can effectively increase forecast skill, as it provides warning of approaching clouds at a lead time of several minutes to hours [11–13]. The machine learning method, as a branch of artificial intelligence, can learn from datasets and construct a nonlinear mapping between input and output data. Nowadays, machine learning (ML) is perhaps the most popular approach in solar forecasting and load forecasting [2].
Although artificial neural networks (ANNs) and support vector machines (SVMs) are still the basis of machine learning methods in solar irradiance prediction, many other approaches have been used recently, such as k-nearest neighbors (kNN), random forest (RF), gradient boosted regression (GBR), hidden Markov models (HMMs), fuzzy logic (FL), wavelet neural networks (WNN), and long short-term memory (LSTM) networks [14–22]. Meanwhile, some hybrid algorithms are used to improve the prediction accuracy. For example, metaheuristic algorithms, such as the cuckoo search (CS) algorithm, the krill herd (KH) algorithm, and the chaotic immune algorithm, are combined with a support vector regression (SVR) model to predict electric load [19, 23–26]. Some signal preprocessing methods, such as variational mode decomposition (VMD) and empirical mode decomposition (EMD), are also used in hybrid models [24, 25]. The methods listed above are not exhaustive; many other applications of machine learning algorithms in solar radiation prediction can be found in the recent literature [27].

As a novel machine learning tool, LSTM has been successfully applied to solar irradiance forecasting [28–30]. Owing to its memory cell structure, it can preserve the important features that should be remembered during the learning process, which improves performance. Therefore, using LSTM to predict irradiance can not only capture the correlation between consecutive hours but also extract long-term (e.g., seasonal) behavior trends [30]. Yu et al. [29] proposed an LSTM-based approach for short-term global horizontal irradiance (GHI) prediction under complicated weather conditions; the results indicated that LSTM outperforms ARIMA, SVR, and NN models, especially on cloudy days and mixed days. Qing and Niu [30] proposed a novel hourly day-ahead solar irradiance prediction method using weather forecasts based on LSTM networks. The proposed algorithm uses the hourly weather forecasts of the same day and data information at the predicted time as the input variables, and the hourly irradiance values of the same anticipated day are taken as the output variable. Experimental results show that the proposed learning algorithm is more accurate than persistence, the linear least squares regression method (LR), and BPNN because it takes time dependence into account. Srivastava and Lessmann [28] studied the ability of LSTM to predict solar irradiance, demonstrated its robustness, and showed that an optimally configured LSTM model outperforms GBR and FFNN for day-ahead GHI forecasting. Abdel-Nasser and Mahmoud [31] proposed an LSTM-based method to forecast the output power of PV systems accurately. Liu et al. [32–34] proposed new hybrid approaches for high-accuracy wind speed prediction based on decomposition algorithms (such as the secondary decomposition algorithm (SDA), empirical wavelet transform (EWT), and VMD) combined with LSTM networks.

However, the LSTM studies mentioned above do not examine in depth how different parameters and structures affect the experimental results, even though these factors affect the prediction accuracy. In this paper, two models based on the traditional LSTM network are applied, and the effects of various parameters are investigated; meanwhile, four models based on a novel LSTM-MLP structure with two-branch input are proposed. For the new LSTM-MLP model, we use historical irradiance (or historical irradiance and meteorological parameters) as the main input and the meteorological parameters at the current time or the next time as the auxiliary input to predict the irradiance at the next time through the multilayer LSTM-MLP network. Experimental results show that the proposed model achieves better prediction results.

The main innovations of this study are as follows: (1) An LSTM-MLP structure with two branches, including a main input and an auxiliary input, is proposed, which can provide a reference for similar models. (2) It is confirmed that the lagging time plays an important role when the LSTM model has few input variables. With more input information, however, a longer lagging time does not necessarily yield higher accuracy. (3) The meteorological parameters at the next moment, which can be obtained from the weather forecast, play a vital role in the prediction accuracy.

The organization of this paper is as follows: The methodology is described in detail in Section 2. Section 3 provides information about the dataset. Experimental results and discussion are presented in Section 4. Finally, conclusions are given in Section 5.

#### 2. Method

##### 2.1. Long Short-Term Memory Network

In the learning phase, a traditional neural network cannot use the information learned in the previous time step to model the data of the current step. This is the main shortcoming of conventional neural networks. Recurrent neural networks (RNNs) attempt to solve this problem by using loops that pass information from one step of the network to the next, ensuring the persistence of the information. In other words, RNNs connect previous information to the current task, so that previous sequence samples can help to interpret the current sample.

The LSTM network, which has time-varying inputs and targets, is a special RNN initially introduced by Hochreiter and Schmidhuber [35]. Owing to its excellent ability to handle both long-term and short-term dependencies, the LSTM network often performs well in processing time series. A general architecture is composed of a cell (the memory part of the LSTM unit) and three “regulators” (usually called gates) of the flow of information inside the LSTM unit: an input gate, an output gate, and a forget gate. The memory cell is an essential component of the LSTM network, as it can store information over an arbitrary time. The input gate, forget gate, and output gate control the signal by adding information to or deleting information from the cell state.

A schematic of the LSTM block can be seen in Figure 1. Every time a new input comes, its information will be accumulated to the cell if the input gate is activated. The prior cell status could be forgotten in this process if the forget gate is activated. Whether the latest cell output will be propagated to the final state is further controlled by the output gate.

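The model input is denoted as $x = (x_1, \ldots, x_T)$, and the output sequence is denoted as $y = (y_1, \ldots, y_T)$, where $T$ is the prediction period. In the context of solar irradiance forecasting, $x$ can be considered as historical input data (e.g., irradiance and meteorological parameters), and $y$ is the forecasting data. The predicted irradiance is calculated iteratively by the following equations [36]:

$$
\begin{aligned}
i_t &= \sigma\left(W_{ix} x_t + W_{im} m_{t-1} + W_{ic} c_{t-1} + b_i\right), \\
f_t &= \sigma\left(W_{fx} x_t + W_{fm} m_{t-1} + W_{fc} c_{t-1} + b_f\right), \\
c_t &= f_t \odot c_{t-1} + i_t \odot g\left(W_{cx} x_t + W_{cm} m_{t-1} + b_c\right), \\
o_t &= \sigma\left(W_{ox} x_t + W_{om} m_{t-1} + W_{oc} c_t + b_o\right), \\
m_t &= o_t \odot h\left(c_t\right), \\
y_t &= W_{ym} m_t + b_y,
\end{aligned}
$$

where $i_t$ denotes the input gate, $f_t$ is the forget gate, $c_t$ is the activation vector for each cell, $o_t$ is the output gate, $m_t$ is the activation vector for each memory block, $W$ denotes the weight matrices, $b$ denotes the bias vectors, "$\odot$" represents the element-wise product of two vectors, and $\sigma(\cdot)$ denotes the standard logistic sigmoid function defined as follows:

$$\sigma(x) = \frac{1}{1 + e^{-x}}.$$

$g(\cdot)$ is a centered logistic sigmoid function with range $[-2, 2]$ defined as follows:

$$g(x) = \frac{4}{1 + e^{-x}} - 2.$$

$h(\cdot)$ is a centered logistic sigmoid function with range $[-1, 1]$ defined as follows:

$$h(x) = \frac{2}{1 + e^{-x}} - 1.$$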
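As a rough illustration of how the gates interact, the following minimal sketch performs one time step of a scalar LSTM cell in plain Python. It assumes a peephole formulation with the logistic sigmoid for the gates and two centered sigmoids, here named `g` and `h`, for the cell input and cell output. The function name `lstm_step`, the weight dictionary `w`, and its keys are hypothetical and are not taken from the paper's code.

```python
import math


def sigmoid(x):
    """Standard logistic sigmoid with range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def g(x):
    """Centered logistic sigmoid with range (-2, 2)."""
    return 4.0 * sigmoid(x) - 2.0


def h(x):
    """Centered logistic sigmoid with range (-1, 1)."""
    return 2.0 * sigmoid(x) - 1.0


def lstm_step(x_t, m_prev, c_prev, w):
    """One time step of a scalar LSTM cell (assumed peephole variant).

    x_t    : current input
    m_prev : previous memory-block output
    c_prev : previous cell state
    w      : dict of scalar weights and biases (hypothetical key names)
    """
    # Input gate: how much of the new candidate enters the cell.
    i_t = sigmoid(w["ix"] * x_t + w["im"] * m_prev + w["ic"] * c_prev + w["bi"])
    # Forget gate: how much of the previous cell state is kept.
    f_t = sigmoid(w["fx"] * x_t + w["fm"] * m_prev + w["fc"] * c_prev + w["bf"])
    # New cell state: kept memory plus gated candidate.
    c_t = f_t * c_prev + i_t * g(w["cx"] * x_t + w["cm"] * m_prev + w["bc"])
    # Output gate: how much of the cell state is exposed.
    o_t = sigmoid(w["ox"] * x_t + w["om"] * m_prev + w["oc"] * c_t + w["bo"])
    m_t = o_t * h(c_t)
    return m_t, c_t
```

With all weights set to zero, every gate evaluates to 0.5 and the candidate term vanishes, so the cell state is simply halved at each step, which makes the role of the forget gate easy to see.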

##### 2.2. Model Development

As previously mentioned, the primary objective of this study is to examine the feasibility of the LSTM network for short-term solar irradiance forecasting and to find the optimal LSTM structure for the forecast. In this section, the standard LSTM solar irradiance forecasting pipeline is introduced first. Then, a classical LSTM model with two input structures and a novel model with four different input structures are constructed to examine the performance of the LSTM network.

Figure 2 presents a standard pipeline for solar irradiance forecasting through LSTM. The data are divided into training, validation, and test sets. Training alternates between a feed-forward pass, which processes the data through the LSTM network, and a feed-backward pass, which updates the network. The error is calculated as the models are developed; it describes the training accuracy and drives the feed-backward pass. In the final stage, the best-performing model is selected for prediction.

The structure of the conventional LSTM model (we call it Model I) for solar irradiance forecasting can be seen in Figure 3. The network structure contains 1 input layer, 1 or 2 LSTM layers, and 1 output layer. The input layer has two alternative structures: input A is the historical irradiance, and input B is the historical irradiance together with meteorological parameters. The resulting models are denoted I-A and I-B. For input A (I-A), the historical irradiance at times *t*−*m*, …, *t*−1 is fed to LSTM layer 1; for input B (I-B), the historical irradiance and meteorological parameters at times *t*−*m*, …, *t*−1 are fed to LSTM layer 1, where *m* is the length of the lagging window in time.

Meanwhile, the novel LSTM-MLP structure (named Model II) is proposed in Figure 4. A two-branch structure is designed, including one main input, one auxiliary input, one main output, and one auxiliary output. The historical irradiance data (or irradiance and meteorological parameters) serve as the main input, which is fed to the LSTM layers. The output of the LSTM layer is split into two parts: one part is emitted as the auxiliary output, and the other part is first combined with the meteorological parameters (auxiliary input) at the current or next time and then sent to a new MLP structure. After several hidden layers of the MLP, the final output is the main output, which is the predicted irradiance at the next time.

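The simplified expression of the above operation is as follows:

$$
\begin{aligned}
H &= \mathrm{LSTM}\left(X_{\mathrm{main}}\right), \\
Y_{\mathrm{aux}} &= \mathrm{FC}\left(H\right), \\
Y_{\mathrm{main}} &= \mathrm{FC}\left(\mathrm{MLP}\left(H \oplus X_{\mathrm{aux}}\right)\right),
\end{aligned}
$$

where $X_{\mathrm{main}}$ represents the main input, which is the time series of historical irradiance (or together with historical meteorological parameters), $\mathrm{LSTM}(\cdot)$ denotes the LSTM layer, $H$ represents the output of the LSTM layer, $X_{\mathrm{aux}}$ denotes the auxiliary input described in Figure 4, $\oplus$ means the concatenate operator, $\mathrm{FC}(\cdot)$ means the fully connected layer, $\mathrm{MLP}(\cdot)$ denotes the MLP layer, and $Y_{\mathrm{aux}}$ and $Y_{\mathrm{main}}$ denote the auxiliary output and main output, respectively.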
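The data flow of the two-branch structure can be sketched in a few lines of plain Python. This is only an illustration of how the LSTM output is routed: the LSTM itself is replaced by a given feature vector `lstm_out`, and the helper `dense`, the parameter names in `p`, the single hidden layer, and the ReLU activation are all assumptions made for the sketch rather than details from the paper.

```python
def dense(vec, weights, bias, activation=None):
    """Fully connected layer: one output per row of `weights`."""
    out = [sum(w * v for w, v in zip(row, vec)) + b
           for row, b in zip(weights, bias)]
    if activation == "relu":
        out = [max(0.0, o) for o in out]
    return out


def two_branch_forward(lstm_out, aux_in, p):
    """Route an LSTM output vector through both branches.

    lstm_out : feature vector produced by the LSTM layers (main branch)
    aux_in   : meteorological parameters at the current or next time
    p        : dict of weights and biases (hypothetical names)
    """
    # Auxiliary output: read directly from the LSTM features.
    aux_out = dense(lstm_out, p["W_aux"], p["b_aux"])
    # Concatenate the LSTM features with the auxiliary input.
    merged = list(lstm_out) + list(aux_in)
    # MLP hidden layer followed by the main output layer.
    hidden = dense(merged, p["W_h"], p["b_h"], activation="relu")
    main_out = dense(hidden, p["W_out"], p["b_out"])
    return main_out[0], aux_out[0]
```

The point of the design is that the auxiliary input bypasses the recurrent layers entirely, so information such as a next-hour weather forecast can influence the main output without having to be part of the lagged sequence.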

As can be seen in Figure 4, there are two options each for the main input and the auxiliary input. According to the different combinations of main inputs A and B and auxiliary inputs C and D in Figure 4, the model can be expressed as II-AC, II-AD, II-BC, and II-BD. To find better network parameters, six experiments are designed with the two models mentioned above: (1) Model I-A; (2) Model I-B; (3) Model II-AC; (4) Model II-AD; (5) Model II-BC; and (6) Model II-BD. In each experiment, the influence of different lagging time parameters (e.g., from 1 to 12) is discussed.

Figure 5 shows the input (or main input) time series structure of the training samples. *S*(*t*) is the current data, *n* is the number of training samples, and *m* is the number of input data in each group, that is, the lagging time or the length of the lagging window. For example, we use *S*(*t*−*m*), *S*(*t*−*m*+1), …, *S*(*t*−2), and *S*(*t*−1) as training input and *S*(*t*) as training output. Then the data are shifted by one step: the input becomes *S*(*t*−*m*−1) to *S*(*t*−2), the output is *S*(*t*−1), and so on.
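The sliding-window construction described above can be sketched as follows. The function name `make_windows` is hypothetical, and the sketch handles a single univariate series only; for input B, each element would be a vector of irradiance and meteorological values instead of a scalar.

```python
def make_windows(series, m):
    """Build (inputs, targets) pairs with a lagging window of length m.

    Each input holds the m values S(t-m), ..., S(t-1), and the
    matching target is S(t).
    """
    if m < 1 or m >= len(series):
        raise ValueError("m must satisfy 1 <= m < len(series)")
    inputs, targets = [], []
    for t in range(m, len(series)):
        inputs.append(list(series[t - m:t]))  # S(t-m), ..., S(t-1)
        targets.append(series[t])             # S(t)
    return inputs, targets
```

A series of length *L* yields *L* − *m* training pairs, so a longer lagging window slightly reduces the number of samples while increasing the size of each input.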

##### 2.3. Forecasting Accuracy Evaluation

To assess the prediction performance of the involved models, five error measures are utilized in the forecasting experiments: the root mean square error (RMSE), the normalized root mean square error (nRMSE), the mean absolute error (MAE), the mean bias error (MBE), and Pearson’s correlation coefficient (*R*).

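These indexes can be defined as follows:

$$
\begin{aligned}
\mathrm{RMSE} &= \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}_i - y_i\right)^2}, \\
\mathrm{nRMSE} &= \frac{\mathrm{RMSE}}{\bar{y}} \times 100\%, \\
\mathrm{MAE} &= \frac{1}{N}\sum_{i=1}^{N}\left|\hat{y}_i - y_i\right|, \\
\mathrm{MBE} &= \frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}_i - y_i\right), \\
R &= \frac{\sum_{i=1}^{N}\left(\hat{y}_i - \bar{\hat{y}}\right)\left(y_i - \bar{y}\right)}{\sqrt{\sum_{i=1}^{N}\left(\hat{y}_i - \bar{\hat{y}}\right)^2 \sum_{i=1}^{N}\left(y_i - \bar{y}\right)^2}},
\end{aligned}
$$

where $N$ denotes the number of testing instances, $\hat{y}_i$ denotes the prediction value of the models, $\bar{\hat{y}}$ denotes the mean value of $\hat{y}_i$, $y_i$ denotes the measured value, and $\bar{y}$ denotes the mean value of $y_i$.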
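A compact way to compute the five measures is sketched below in plain Python. The function name `evaluate` is hypothetical, and two conventions are assumptions of this sketch: the nRMSE is normalized by the mean of the measured values and expressed in percent, and the MBE is taken as predicted minus measured, so a positive value indicates overprediction.

```python
import math


def evaluate(measured, predicted):
    """Return RMSE, nRMSE (%), MAE, MBE, and Pearson's R as a dict."""
    n = len(measured)
    if n == 0 or n != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [p - y for p, y in zip(predicted, measured)]
    mean_y = sum(measured) / n
    mean_p = sum(predicted) / n

    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    mbe = sum(errors) / n

    # Pearson's correlation between predicted and measured values.
    cov = sum((p - mean_p) * (y - mean_y) for p, y in zip(predicted, measured))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_y = sum((y - mean_y) ** 2 for y in measured)
    r = cov / math.sqrt(var_p * var_y)

    return {
        "RMSE": rmse,
        "nRMSE": 100.0 * rmse / mean_y,  # assumed: normalized by mean measured value
        "MAE": mae,
        "MBE": mbe,
        "R": r,
    }
```

Because the MBE keeps the sign of each error, it can be close to zero even when the RMSE is large, which is why the two are reported together.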

Besides, forecast skill (FS) is an indicator that compares a selected model with a reference model (usually the persistence model), regardless of the prediction horizon and location [37, 38]. It is a fair approach to evaluating performance in solar irradiance prediction, as described by the following equation [2]:

$$\mathrm{FS} = 1 - \frac{\mathrm{RMSE}_{\mathrm{model}}}{\mathrm{RMSE}_{\mathrm{persistence}}}.$$

The persistence model is one of the most basic prediction models and is often used as a baseline against which other prediction models are compared. Its definition varies; this paper adopts the most basic one, which assumes that the predicted value at the next time is the same as the present value [39, 40]:

$$\hat{y}(t+1) = y(t).$$
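The persistence baseline and the forecast skill can be sketched together as follows. The helper names `rmse`, `persistence_forecast`, and `forecast_skill` are hypothetical; the sketch assumes that FS is computed from the RMSE of the model and of the persistence forecast on the same target values.

```python
import math


def rmse(measured, predicted):
    """Root mean square error between two equal-length sequences."""
    n = len(measured)
    return math.sqrt(sum((p - y) ** 2 for p, y in zip(predicted, measured)) / n)


def persistence_forecast(series):
    """Predict each value of series[1:] as the value one step earlier."""
    return list(series[:-1])


def forecast_skill(rmse_model, rmse_persistence):
    """FS = 1 - RMSE_model / RMSE_persistence."""
    if rmse_persistence <= 0:
        raise ValueError("persistence RMSE must be positive")
    return 1.0 - rmse_model / rmse_persistence
```

By construction, the persistence model itself has FS = 0 and a perfect forecast has FS = 1; a negative FS means the model is worse than simply repeating the last observation.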

To further evaluate the performance of the adopted model compared with the benchmark model, the promoting percentage of RMSE is employed. The formula is as follows:

$$P = \frac{\mathrm{RMSE}_1 - \mathrm{RMSE}_2}{\mathrm{RMSE}_1} \times 100\%,$$

where $P$ is the promoting percentage of RMSE, and $\mathrm{RMSE}_1$ and $\mathrm{RMSE}_2$ are the root mean square errors computed from the benchmark model and the comparison model, respectively.
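As a worked example of the promoting percentage, the short sketch below applies it to two RMSE values quoted in Section 4 for a one-hour lagging time (110.64 W/m^{2} for Model I-A and 73.2477 W/m^{2} for Model II-AC). The function name `promoting_percentage` is hypothetical, and treating Model I-A as the benchmark here is only for illustration.

```python
def promoting_percentage(rmse_benchmark, rmse_model):
    """Relative RMSE reduction of a model over a benchmark, in percent."""
    if rmse_benchmark <= 0:
        raise ValueError("benchmark RMSE must be positive")
    return 100.0 * (rmse_benchmark - rmse_model) / rmse_benchmark


# RMSE values (W/m^2) at a 1-hour lagging time, as quoted in Section 4.
# Using Model I-A as the benchmark is an illustrative choice.
p_lag1 = promoting_percentage(110.64, 73.2477)
```

Here `p_lag1` comes out at roughly 33.8%, which quantifies how much the auxiliary meteorological branch helps when only one hour of irradiance history is available.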

#### 3. Data and Analysis

The data used in this study come from a solar power plant in Denver, Colorado, USA. Average global horizontal irradiance (GHI; in this paper, solar irradiance refers to GHI) and meteorological data (such as ambient temperature, relative humidity, wind velocity, atmospheric pressure, and precipitation) were collected at one-hour resolution from January 1, 2012, to December 31, 2016, from the NREL Solar Radiation Research Laboratory [41]. The data from 2012 to 2015 are used for training and validation; the data from 2016 are used for testing. The main statistical characteristics of solar irradiance in this dataset are shown in Table 1.
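The chronological split described above can be sketched as follows. The record layout, a list of `(timestamp, value)` tuples, and the function name `split_by_year` are assumptions for illustration; the actual file format of the dataset is not specified here.

```python
from datetime import datetime


def split_by_year(records, train_years=range(2012, 2016), test_year=2016):
    """Split (timestamp, value) records into train/validation and test sets.

    records must be a list (it is scanned twice); timestamps are
    datetime objects. Records outside the given years are dropped.
    """
    train_val = [r for r in records if r[0].year in train_years]
    test = [r for r in records if r[0].year == test_year]
    return train_val, test
```

Splitting by calendar year rather than at random keeps the test year fully unseen during training, which matters for time series because neighboring hours are strongly correlated.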

Pearson’s correlation coefficient is a statistic that measures the strength of the linear association between two continuous variables. The relationship between irradiance and wind speed, atmospheric pressure, air temperature, relative air humidity, and precipitation was analyzed to determine which of these variables should be included as inputs to the network. Table 2 shows the Pearson correlation coefficient between the five weather variables and the solar irradiance in the dataset. It can be observed that only temperature and humidity are highly correlated with the irradiance. The irradiance is not correlated with wind speed, precipitation, or pressure, so these three meteorological parameters are excluded. Figure 6 shows the average hourly irradiance distribution for different months in 2016. There is a strong relationship between the hour of the day and the solar irradiance: the irradiance is low at the beginning of the day, increases to its peak at noon, and then gradually decreases in the afternoon. Meanwhile, the peak irradiance differs from month to month. The highest peak is between June and July, and the lowest peak is between December and January. Consequently, time must be used as an input variable.
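The screening of candidate inputs by correlation can be sketched as below. The function names `pearson` and `select_inputs` are hypothetical, and the cut-off of 0.3 on the absolute correlation is purely an illustrative assumption; the paper does not state a numerical threshold.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def select_inputs(candidates, target, threshold=0.3):
    """Keep candidate variables whose |r| with the target reaches threshold.

    candidates : dict mapping a variable name to its list of values
    target     : list of irradiance values aligned with the candidates
    threshold  : hypothetical cut-off on the absolute correlation
    """
    scores = {name: pearson(values, target) for name, values in candidates.items()}
    kept = [name for name, r in scores.items() if abs(r) >= threshold]
    return kept, scores
```

Using the absolute value matters because a strongly negative correlation, as is typical for relative humidity, is just as informative for the network as a positive one.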

The autocorrelation function (ACF) describes the degree of similarity between a time series and its own lagged series over a continuous time interval. Since irradiance is time-series data, it can be characterized by its ACF. Let $\{y_t\}$ be a time series with length $N$. Denote by $y_{t+k}$ the time series lagged by $k$ periods. The autocorrelation of $y_t$ at lag $k$ is given by

$$\rho_k = \frac{\gamma_k}{\gamma_0}, \qquad \gamma_k = E\left[\left(y_t - \mu\right)\left(y_{t+k} - \mu\right)\right],$$

where $\gamma_k$ is the autocovariance of $y_t$ at lag $k$, $\gamma_0$ is the autocovariance of $y_t$ at lag 0, and $\mu$ is the expected value of $y_t$.
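A sample estimate of the autocorrelation can be sketched in plain Python as follows. The function name `autocorrelation` is hypothetical, the estimator divides both autocovariances by the full series length (a common convention, assumed here), and the cosine signal is synthetic data with a 24-step period that merely stands in for hourly irradiance.

```python
import math


def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at a non-negative integer lag."""
    n = len(series)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(series)")
    mean = sum(series) / n
    # Autocovariance at lag 0 (the variance) and at the requested lag.
    gamma_0 = sum((y - mean) ** 2 for y in series) / n
    gamma_k = sum((series[t] - mean) * (series[t + lag] - mean)
                  for t in range(n - lag)) / n
    return gamma_k / gamma_0


# Synthetic signal with a 24-step period, standing in for hourly irradiance.
signal = [math.cos(2.0 * math.pi * t / 24.0) for t in range(240)]
```

For this synthetic signal the autocorrelation is strongly negative at lag 12 and strongly positive at lag 24, which mirrors the half-day and full-day structure one expects from a diurnal cycle.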

From the ACF plot in Figure 7, we can see that the daily period consists of 24 time steps (where the ACF has its second-largest positive peak). Although this is readily apparent from the natural diurnal cycle, Figure 7 also shows that the interval between the maximum positive and the maximum negative correlation is 12 hours. At the same time, in the actual model calculation, the performance is very similar for lagging times between 12 and 24. Therefore, in this paper, we choose a maximum lagging time of 12 hours.

The networks are optimized on the training dataset with the Adam algorithm, and the sigmoid function is used in the output layer for all models. The code for this paper was run on an Intel® Core™ I7-8600 CPU using Python 3.7.5 and Keras 2.3.1 with a TensorFlow 2.0.0 backend.

#### 4. Results and Discussion

In this section, the six models above are simulated to verify the performance of the proposed method. We discuss the effect of the input length (determined by the lagging time). The forecasting results for different input sequence lengths and different models are shown in Tables 3–6. The details of the forecasting results and analysis are as follows:

(1) For Model I, since it has only a single-branch input, the number of input variables directly affects the prediction accuracy. As can be seen in Table 3, the RMSE and nRMSE decrease continuously as the lagging time increases. This implies that, for this case study, data from previous points in time are vital for forecasting, especially when only historical irradiance is used for prediction.

(2) However, when the historical irradiance and meteorological parameters are input to the LSTM network at the same time, the influence of the lagging time on forecasting accuracy declines significantly. When the lagging time is only one hour, the RMSE of Model I-A is 110.64 W/m^{2}, whereas the RMSE of Model I-B is 75.4654 W/m^{2}, which shows that, for a fixed lagging time, the meteorological parameters contribute substantially to the prediction of irradiance.

(3) As can be seen in Tables 3 and 4, the prediction accuracy generally increases with the lagging time in the range of 1–12 hours. However, a longer lagging time means more input variables and therefore a longer computation time. Considering these factors, we need to choose a reasonable lagging time. In this case, although the best lagging time is 10 hours for Model I-A and 11 hours for Model I-B, we consider a lagging time of 8 hours reasonable. The best lagging time may, of course, differ between datasets.

(4) For Model II-AC and Model II-AD, compared with Model I, we add an independent branch with meteorological parameters as input (C: meteorological parameters at the current time; D: meteorological parameters at the next time), which plays an important role. Comparing the results of Models I-A and II-AC in Tables 3 and 5 at the same lagging time, the prediction accuracy differs noticeably, and the difference is most prominent when the lagging time is small. For instance, when the lagging time is 1 hour, the RMSE is 110.64 W/m^{2} for Model I-A but 73.2477 W/m^{2} for Model II-AC. The best RMSE of the two models is 75.22 W/m^{2} and 71.0791 W/m^{2}, respectively, which shows that the proposed new branch can improve the prediction accuracy. Meanwhile, it can also be seen from Table 5 that, when historical irradiance is used as the main input, the prediction accuracy is the same whether the auxiliary input is the meteorological parameters at the current time or at the next time.

(5) Comparing Tables 5 and 6, we find that using the meteorological parameters of the next moment takes better advantage of the proposed new branch structure. As shown for Model II-BD in Table 6, when historical irradiance and meteorological parameters are the main input and the meteorological parameters at the next moment are the auxiliary input, the prediction is the best; the RMSE and nRMSE are 62.1618 W/m^{2} and 32.2702%, respectively. For Model II-BC, because the current meteorological parameters in the auxiliary input already exist in the main input, the accuracy improvement is not apparent.

The best parameters and architecture of the LSTM network for 1-hour-ahead forecasting with the six models are shown in Table 7. In Model I, two LSTM layers with 100 and 40 neurons (100–40) are used, with lagging times of 10 and 11 for Model I-A and Model I-B, respectively; in Model II, a 64–32 MLP hidden layer is added, and most of the variants use only one LSTM layer.

The performance of the six models with the optimal parameters and structure can be seen in Table 8 and Figure 8. Compared with the persistence model, each model shows a significant improvement in forecast skill (FS) and promoting percentage of RMSE (P). Compared with BPNN, each model also shows an improvement in P, and the advantage of Model II-BD is the most visible, reaching 19.19%.

The RMSE and time cost curves of the different models at different lagging times are shown in Figure 9. As the lagging time increases (that is, as the dimension of the input variable increases), the time cost increases approximately linearly, because more input variables lead to more computation. In Figure 9(f), the sudden change in time cost occurs because the number of LSTM layers increases from 1 to 2. Meanwhile, except for Model I-A, the RMSE of the models does not decrease linearly as the lagging time rises; it shows only a general downward trend, and the curve fluctuates throughout. This indicates that the optimal lagging time is not necessarily the maximum lagging time; the appropriate lagging time must be chosen according to the actual dataset and the required accuracy.

Figure 9: RMSE and time cost of the different models at different lagging times (panels (a)–(f)).

The one-hour-ahead irradiance forecasts of the proposed Model II-BD with the best parameters and architecture are shown in Figure 10. In Figure 10(a), the blue circles (O) represent the measured values and the red asterisks (^{∗}) denote the forecasted values; the two agree closely most of the time. The locally enlarged view shows more clearly that the difference between measured and forecasted values is small. Figure 10(b) shows that the predicted values are strongly correlated with the measured solar irradiance data, with a linear regression coefficient of 0.9642. In summary, the forecasted solar irradiance values agree well with the measured values.

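Figure 10: One-hour-ahead irradiance forecasts of Model II-BD: (a) measured and forecasted values; (b) correlation between forecasted and measured values.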

The experimental results above show that the Model II-BD structure of the LSTM-MLP model has the best prediction accuracy. In what follows, "the LSTM-MLP model" refers specifically to the LSTM-MLP model with the Model II-BD structure.

To verify the performance of the proposed LSTM-MLP model, six models were simulated: a BP network, a general RNN, random forest, SVM, a general LSTM, and the LSTM-MLP model. The forecasting results are shown in Table 9. As can be seen from the table, the proposed LSTM-MLP model outperforms the other five general machine learning models on the RMSE, nRMSE, MAE, MBE, and *R* criteria. Compared with BPNN, RNN, random forest, SVM, and LSTM, its promoting percentage of RMSE (P) is 19.19%, 20.15%, 11.68%, 19.31%, and 13.48%, respectively. This is because the LSTM-MLP model is a hybrid of the LSTM and MLP models, in which the MLP branch adds a new input carrying information about the next moment, which improves the prediction accuracy.

Furthermore, data for three weather conditions are randomly selected from the test set, and the results are shown in Figure 11 and Table 10. On a sunny day (June 27, 2016), all the prediction curves agree well with the measurement curve; the LSTM model gives the best prediction, but the LSTM-MLP model is also highly accurate, with an nRMSE of only 4.73%. On a cloudy day (June 8, 2016), the measured values are highly volatile; the predicted values of the different models follow similar trend curves, but the errors are larger. The rapid change of the cloud layer on a cloudy day makes irradiance prediction very difficult. Even so, the proposed model predicts well, with an nRMSE of 35.89%. On a rainy day (May 17, 2016), the measured irradiance is low, as shown by the solid red line in Figure 11, and the predicted values (red dotted line) follow the changes in the measured values better. This indicates that the proposed LSTM-MLP model performs better on rainy days. All related results are reported in Table 10.

Figure 11: Forecasting results of the different models under three weather conditions (panels (a)–(c)).

To place this work in the context of other published studies, the results of the proposed approach are compared with results from other studies in Table 11. The results are comparable.

#### 5. Conclusions

In this work, a novel LSTM-MLP structure with two-branch input is proposed. The proposed LSTM-MLP includes one main input, one auxiliary input, one main output, and one auxiliary output. The historical irradiance data (or irradiance and meteorological parameters) serve as the main input, which is fed to the LSTM layers. One part of the LSTM layer output is emitted as the auxiliary output, and the other part is first combined with the meteorological parameters (auxiliary input) and then sent to a new MLP structure. The output from several hidden layers of the MLP is the main output, which is the final irradiance prediction value. Four network structures based on LSTM-MLP and two network structures based on traditional LSTM are designed and developed. A real-world test case in Denver, which consists of 5 years of data, is used to verify and discuss the potential of each model.

The experimental results demonstrate that the proposed Model II-BD, which takes historical irradiance and meteorological parameters as the main input and the meteorological parameters at the next moment as the auxiliary input, significantly outperforms the other models in terms of three widely used evaluation criteria: the RMSE is 62.1618 W/m^{2}, the nRMSE is 32.2702%, and the FS is 0.4477. Compared with BPNN, the promoting percentage of RMSE (P) of Model II-BD is 19.19%. The meteorological parameters at the next moment, which can be obtained from the weather forecast, play a vital role in the prediction accuracy. The lagging time is a significant variable for the input of LSTM, especially when only historical irradiance is used as input (e.g., Model I-A).

#### Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

#### Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this article.

#### Acknowledgments

This work was supported by the National Natural Science Foundation of China (Grant nos. 61875171, 61865015, and 61705192) and the National Natural Science Foundation of Yunnan Province (Grant no. 2017FD069).