Abstract

In recent years, solar energy has attracted a great deal of attention from researchers because it is a clean and renewable form of energy. To make good use of solar energy, an effective way to forecast solar radiation is essential to guarantee the reliability of grid-connected photovoltaic installations. Although artificial neural networks (ANNs) are of great importance for this task, they often take irrelevant variables as inputs, which results in a complex model and an intractable computation cost. To remove these irrelevant variables, variable selection methods are combined with ANNs. However, selecting the regularization parameters in these techniques is challenging. This paper investigates a square root elastic net (SREN) based approach to tackle this challenge and select the important variables. An Elman neural network (ENN) is constructed with the important variables selected by SREN as inputs. Based on meteorological data, the resulting SRENENN model has been developed for a 1-year period in the Xinjiang area of China. The proposed model delivers closer agreement between the estimated and measured values than the compared methods.

1. Introduction

Owing to the rapid development of the global economy, the energy crisis and environmental pollution have threatened sustainable development and human health. More and more countries pay attention to green and renewable sources of energy, so it is essential to utilize clean energy sources instead of fossil fuels [1, 2]. In fact, all kinds of energy sources derive from the sun, which has a diameter of $1.39 \times 10^{9}$ m and emits about $3.8 \times 10^{20}$ MW of power, of which the earth receives only a small fraction, about $1.7 \times 10^{14}$ kW [3]. As one of the most significant forms of green energy, solar energy has been used since prehistoric times because it can be captured anywhere. Solar energy is a renewable and clean alternative for solving the worldwide energy shortage and environmental problems [4]. It can be applied in several fields, including locating photovoltaic power plants, scheduling electrical load, and developing a low-carbon economy [5]. Reliable global solar radiation data are needed to investigate, assess, and utilize solar energy resources. Although ground-based measurements provide accurate global solar radiation data, they are not available at all locations [6]. In recent years, geostationary weather satellites have been used to estimate global solar radiation at ground level, but this approach is less accurate than forecasting models because it is indirect [7]. In addition, the weather is intrinsically chaotic and unstable, which greatly affects the global solar radiation. This volatility threatens the stability and quality of the whole power system [8]. Therefore, it is vital to develop models that improve the forecasting accuracy of global solar radiation from atmospheric factors.

Many researchers have studied soft computing methods to forecast solar radiation and evaluate the potential of solar energy. These models include time series regression models (ARMA, ARIMA, and GARCH), empirical models, and machine learning techniques (artificial neural networks, support vector machines, etc.) [9]. Sun et al. proposed an ARMAX-GARCH model to forecast daily global solar radiation using several meteorological variables. The experimental results showed that global solar radiation depends more on sunshine duration than on temperature difference at certain stations [10]. David et al. applied a combination of ARMA and GARCH models to provide probabilistic forecasts of solar irradiance. The proposed recursive ARMA-GARCH model made parameter estimation easier and achieved good accuracy [11]. Quej et al. developed a new empirical model to predict hourly global solar radiation from meteorological factors such as rainfall, temperature, and humidity at six sites in Mexico. Compared with other models, the proposed model had the best forecasting precision [12]. Ouderni et al. utilized several empirical models, including the Benson model, the Page model, and the Angstrom-Prescott-Page model, to assess the solar potential in the Gulf of Tunis [13].

Among the most popular forecasting models, machine learning techniques, including artificial neural networks (ANNs), intelligent optimization algorithms, and support vector machines (SVMs), are self-adaptive and robust and have already been successfully applied to forecast global solar radiation. ANN techniques include backpropagation (BP), radial basis function (RBF), multilayer perceptron (MLP), and extreme learning machine (ELM) [14–17]. Benmouiza and Cheknane used the $k$-means method to find the input samples and took advantage of nonlinear autoregressive (NAR) neural networks to forecast hourly global horizontal solar radiation [18]. Chen et al. presented a model based on fuzzy rules and a neural network to forecast solar radiation; the case study revealed that the proposed technique achieved excellent forecasting accuracy [19]. Renno et al. developed two ANN models to estimate hourly direct normal irradiance and global radiation [5]. Salcedo-Sanz et al. proposed a novel approach, Coral Reefs Optimization-Extreme Learning Machine (CRO-ELM), to predict daily global solar radiation and achieved satisfactory results [20]. Gairaa et al. adopted a new hybrid technique combining the linear ARMA and the nonlinear ANN to forecast daily global solar radiation in Algeria. The experimental results revealed that the hybrid model is superior to the single models [21].

Although ANNs have been widely exploited because of their nonlinear mapping ability, prediction capabilities, and robustness, the optimal parameters of the network, such as the weights, the biases, and the number of hidden layer nodes, are not easy to determine, and the training of the network is likely to converge to a local minimum [22]. Furthermore, the structure of the network becomes quite intricate if all the variables are used as inputs. This causes two problems: (1) the complex structure degrades the forecasting and selection performance and (2) the complex structure needs much computation time. The weights between the nodes of an ANN must be estimated, which takes a lot of time if the ANN has an excessive number of nodes. Based on the above discussion, investigating an effective method to establish a simple neural network is essential. Since the structure of an ANN depends heavily on the number of inputs, variable selection techniques are needed to choose the significant variables that serve as inputs of the ANN.

Some researchers focus on selecting important variables as inputs of forecasting models, including but not limited to ANN and SVM. Benghanem et al. applied the Levenberg-Marquardt learning algorithm to construct an ANN to study the daily global irradiation of Saudi Arabia. Air temperature, sunshine duration, relative humidity, and day of the year were used as the input variables, which achieved good forecasting accuracy [23]. Rahimikhoob used the highest and lowest temperatures to forecast global solar radiation in the southwest of Iran [24]. Qing and Niu developed a long short-term memory (LSTM) network to predict hourly solar irradiance and used weather data (temperature, wind speed, dew point, etc.) as network inputs [25]. Vakili et al. established an MLP neural network to estimate daily solar irradiance using temperature, wind speed, relative humidity, and particulate matter (PM10) [26]. Rohani et al. proposed a Gaussian process model with K-fold cross-validation to forecast daily and monthly solar radiation using temperature, humidity, pressure, and sunshine hours as input variables [27]. It is found that the above hybrid approaches combine the advantages of several single models and perform better. Variable selection algorithms can be used to reduce high-dimensional data by selecting the optimal input variables or model [28, 29]. Jović et al. studied solar radiation and used an adaptive neuro-fuzzy inference system (ANFIS) to select the most relevant factors from temperature, mean sea level, and relative humidity as the predictors [30]. Almaraashi applied four different feature selection methods to determine the input space and forecast daily solar radiation in Saudi Arabia based on a multilayer neural network [31]. Aybar-Ruiz et al. adopted a grouping genetic method to select the relevant atmospheric features in an extreme learning machine model for predicting global solar radiation [32]. Mori chose meteorological variables using graphical modelling to estimate solar radiation [33]. Jiang and Dong developed penalized kernel SVM approaches to select structural variables and forecast global horizontal radiation [34].

As far as we know, current research papers select variables by trying specific combinations or groups. However, there is no theoretical guarantee on how to determine these combinations, and considering all possible combinations of variables is time-consuming. Penalized variable selection methods are advocated because they select the important variables directly without trying possible combinations, and they are more straightforward to use. Furthermore, compared with conventional ANNs and SVMs, the Elman neural network (ENN) is a locally recurrent neural network with a single hidden layer, which has a fast learning rate, good dynamic characteristics, and high global stability [35, 36]. In this paper, an ENN structure is selected as the forecasting technique for global solar radiation. This work advocates a square root elastic net variable selection procedure in the Elman neural network (SRENENN) approach to forecast the global solar radiation in the Xinjiang area of China. The primary novelty and contributions of this study are as follows:
(1) An ENN is applied to forecast global solar radiation with meteorological variables.
(2) The square root elastic net is used to effectively extract the meteorological variables that are applied as inputs of the ENN, and the optimal model is determined by 10-fold cross-validation to improve forecasting precision.
(3) A novel square root elastic net variable selection procedure in the Elman neural network (SRENENN) algorithm is proposed, and the corresponding forecasting results are compared systematically using the Wilcoxon signed-rank test and the Friedman test.

The rest of this study is organized as follows: Section 2 describes the square root elastic net variable selection procedure and the Elman neural network; Section 3 presents the case study based on real data analysis; Section 4 provides the forecasting accuracies and corresponding experimental results; the conclusions are presented in Section 5. A schematic overview of the whole paper is given in Figure 1.

2. Materials and Methods

2.1. Square Root Elastic Net

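Based on the dataset $\{(x_i, y_i)\}_{i=1}^{n}$, the following linear regression model is considered after centering $X$ and $y$:

$$y = X\beta^{*} + \varepsilon, \tag{1}$$

where $y \in \mathbb{R}^{n}$ denotes the target response to be studied, $X \in \mathbb{R}^{n \times p}$ represents the data matrix with $n$ samples and $p$ variables, and $\beta^{*}$ is the coefficient vector of the true model. Let $I$ be the identity matrix; the error term $\varepsilon$ follows a Gaussian distribution with $\varepsilon \sim N(0, \sigma^{2} I)$. To obtain an interpretable model, the following optimization problem is considered:

$$\min_{\beta} \frac{1}{2}\|y - X\beta\|_2^2 + P(\beta; \lambda), \tag{2}$$

where $P(\cdot; \lambda)$ denotes the penalty function with $\lambda$ representing the tuning parameter. When $P(\beta; \lambda) = \lambda\|\beta\|_1$, which is a convex penalty, (2) becomes the well-known LASSO [37] problem given in (3):

$$\min_{\beta} \frac{1}{2}\|y - X\beta\|_2^2 + \lambda\|\beta\|_1. \tag{3}$$

LASSO is easier to compute on big data because of its convex form.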

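In addition to convex penalty functions, nonconvex penalty functions have also been proposed to perform variable selection. For instance, [38] investigates the SCAD penalty, which is given below:

$$P(t; \lambda) = \begin{cases} \lambda |t|, & |t| \le \lambda, \\ \dfrac{2a\lambda|t| - t^{2} - \lambda^{2}}{2(a-1)}, & \lambda < |t| \le a\lambda, \\ \dfrac{(a+1)\lambda^{2}}{2}, & |t| > a\lambda, \end{cases} \tag{4}$$

where $a > 2$ and $\lambda$ is selected by generalized cross-validation. With the elastic net (EN) [39] penalty, the problem is given as

$$\min_{\beta} \frac{1}{2}\|y - X\beta\|_2^2 + \lambda_1\|\beta\|_1 + \frac{\lambda_2}{2}\|\beta\|_2^2. \tag{5}$$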

In this paper, we pursue the following two goals: (G1) model interpretation and (G2) forecasting accuracy. The elastic net can be used to achieve these goals because its penalty function consists of both the LASSO and the ridge penalty. However, its forecasting performance is still affected negatively by the noise level, which is difficult to estimate. To solve this problem, square root regularization is considered in our work by using the square root error loss function $\|y - X\beta\|_2$ instead of the square error loss function $\|y - X\beta\|_2^2$. Therefore, we combine the benefits of the square root error loss and the EN penalty by proposing the square root elastic net (SREN), which considers the following optimization problem:

$$\min_{\beta} \|y - X\beta\|_2 + \lambda_1\|\beta\|_1 + \frac{\lambda_2}{2}\|\beta\|_2^2. \tag{6}$$

Compared with EN, which solves (5), SREN has the following advantage: its two tuning parameters ($\lambda_1$ and $\lambda_2$) can be selected properly since they do not involve the noise level $\sigma$, which is extremely difficult to estimate in data analysis. Specifically, it is known that $\hat{\sigma}^{2} = \mathrm{RSS}/(n - p)$, where RSS represents the residual sum of squares. When $p > n$, $\sigma$ cannot be estimated in this way. Even when $p < n$, high coherence between variables causes a large RSS value, which results in a large $\hat{\sigma}$ value. SREN avoids estimating $\sigma$ in the parameter tuning work, which boosts the forecasting accuracy of the model.
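The following short derivation is a sketch of why the square root loss removes the noise level from the tuning problem. It assumes the linear model $y = X\beta^{*} + \varepsilon$ with $\varepsilon \sim N(0, \sigma^{2} I)$ described in this section; the standardized noise vector $z$ is notation introduced here for illustration. The penalty level has to dominate the gradient of the loss at the true coefficient vector, so it is enough to compare the two gradients.

```latex
% Squared error loss: the gradient at the true coefficients scales with sigma,
% so a suitable penalty level must scale with sigma as well.
\nabla_{\beta}\, \tfrac{1}{2}\|y - X\beta\|_2^2 \,\Big|_{\beta = \beta^{*}}
  = -X^{\top}\varepsilon
  = -\sigma\, X^{\top} z,
  \qquad z = \varepsilon / \sigma \sim N(0, I).

% Square root error loss: sigma cancels between numerator and denominator,
% so the gradient (and hence the penalty level) does not depend on sigma.
\nabla_{\beta}\, \|y - X\beta\|_2 \,\Big|_{\beta = \beta^{*}}
  = -\frac{X^{\top}\varepsilon}{\|\varepsilon\|_2}
  = -\frac{X^{\top} z}{\|z\|_2}.
```

In other words, rescaling the noise changes the first gradient but leaves the second one unchanged, which is why the tuning parameters of a square root loss can be chosen without an estimate of $\sigma$.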

Square Root LASSO (SRL) [40] considers the following optimization problem:

$$\min_{\beta} \|y - X\beta\|_2 + \lambda\|\beta\|_1. \tag{7}$$

Although both SRL and LASSO use the same L1 penalty, SRL applies the square root error loss function, which facilitates the parameter tuning work. Compared with SRL, SREN adds a ridge penalty, which is an L2-type penalty, to handle the high coherence between variables and enforce more shrinkage on the model. Although both apply the square root error loss function, SREN is able to obtain more accurate results in a model with high coherence. Furthermore, SREN uses two tuning parameters ($\lambda_1$ and $\lambda_2$) to adjust the model performance, while SRL uses just one tuning parameter $\lambda$.
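To make the differences between the four criteria concrete, the sketch below evaluates each objective for a given coefficient vector. It is illustrative only: it assumes the unnormalized forms $\frac{1}{2}\|y - X\beta\|_2^2$ and $\|y - X\beta\|_2$ for the two losses and $\frac{\lambda_2}{2}\|\beta\|_2^2$ for the ridge term, and all function names are hypothetical. The paper's own scaling constants may differ.

```python
import math

def residual_norm(X, y, beta):
    """Euclidean norm of y - X beta, with X given as a list of rows."""
    return math.sqrt(sum(
        (yi - sum(xij * bj for xij, bj in zip(row, beta))) ** 2
        for row, yi in zip(X, y)
    ))

def l1_norm(beta):
    return sum(abs(b) for b in beta)

def l2_norm_sq(beta):
    return sum(b * b for b in beta)

def lasso_objective(X, y, beta, lam):
    # Squared error loss + L1 penalty (assumed scaling 1/2).
    return 0.5 * residual_norm(X, y, beta) ** 2 + lam * l1_norm(beta)

def en_objective(X, y, beta, lam1, lam2):
    # Squared error loss + L1 penalty + ridge penalty.
    return (0.5 * residual_norm(X, y, beta) ** 2
            + lam1 * l1_norm(beta) + 0.5 * lam2 * l2_norm_sq(beta))

def srl_objective(X, y, beta, lam):
    # Square root error loss + L1 penalty.
    return residual_norm(X, y, beta) + lam * l1_norm(beta)

def sren_objective(X, y, beta, lam1, lam2):
    # Square root error loss + L1 penalty + ridge penalty.
    return (residual_norm(X, y, beta)
            + lam1 * l1_norm(beta) + 0.5 * lam2 * l2_norm_sq(beta))
```

Multiplying `y` by a constant multiplies the square root loss by the same constant but the squared loss by its square, which is the scaling behaviour that makes the square root criteria insensitive to the noise level.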

Two novel plans are proposed to design the algorithm for solving (6), which are denoted by Plan A and Plan B, respectively.
(i) Plan A: denote the following: The algorithm is designed based on the following iterations: Notice that the soft thresholding operator can be applied to solve (9).
(ii) Plan B: the algorithm is derived based on the following iterations: To solve (9) and (10), threshold functions, the so-called $\Theta$-estimators [41], are applied in our work. The definition of the thresholding rules is given below.

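Definition 1. A thresholding function is a real valued function $\Theta(t; \lambda)$ defined for $-\infty < t < \infty$ and $0 \le \lambda < \infty$ such that (1) $\Theta(-t; \lambda) = -\Theta(t; \lambda)$, (2) $\Theta(t; \lambda) \le \Theta(t'; \lambda)$ for $t \le t'$, (3) $\lim_{t \to \infty} \Theta(t; \lambda) = \infty$, (4) $0 \le \Theta(t; \lambda) \le t$ for $0 \le t < \infty$.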

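It follows from Definition 1 that $\Theta(\cdot; \lambda)$ is an odd, monotone, unbounded shrinkage rule for $t$, at any $\lambda$. $\Theta$ can be applied componentwise if either $t$ or $\lambda$ is given as a vector. The LASSO, SCAD, and EN thresholding functions are provided as follows:

$$\Theta_{S}(t; \lambda) = \operatorname{sgn}(t)\,(|t| - \lambda)_{+},$$

$$\Theta_{SCAD}(t; \lambda) = \begin{cases} \operatorname{sgn}(t)\,(|t| - \lambda)_{+}, & |t| \le 2\lambda, \\ \dfrac{(a-1)t - \operatorname{sgn}(t)\,a\lambda}{a-2}, & 2\lambda < |t| \le a\lambda, \\ t, & |t| > a\lambda, \end{cases}$$

$$\Theta_{EN}(t; \lambda_1, \lambda_2) = \frac{\operatorname{sgn}(t)\,(|t| - \lambda_1)_{+}}{1 + \lambda_2},$$

where $\lambda_1$, $\lambda_2$ are two regularization parameters.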
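The three thresholding rules can be sketched in a few lines of Python. This is a minimal illustration under the standard textbook forms of the soft, SCAD, and elastic net thresholding functions; the default `a=3.7` is the value commonly recommended for SCAD and is an assumption here, not a setting taken from this paper.

```python
def sign(t):
    """Sign function returning -1, 0, or 1."""
    return (t > 0) - (t < 0)

def soft_threshold(t, lam):
    """LASSO (soft) thresholding: shrink t toward zero by lam."""
    return sign(t) * max(abs(t) - lam, 0.0)

def scad_threshold(t, lam, a=3.7):
    """SCAD thresholding: soft near zero, no shrinkage for large |t|."""
    abs_t = abs(t)
    if abs_t <= 2.0 * lam:
        return soft_threshold(t, lam)
    if abs_t <= a * lam:
        return ((a - 1.0) * t - sign(t) * a * lam) / (a - 2.0)
    return float(t)

def en_threshold(t, lam1, lam2):
    """Elastic net thresholding: soft thresholding followed by ridge shrinkage."""
    return soft_threshold(t, lam1) / (1.0 + lam2)
```

All three functions are odd and nondecreasing in `t` and never move a value away from zero, which matches the four conditions of Definition 1. The SCAD rule is continuous at `2 * lam` and `a * lam`, so small inputs are set to zero while large inputs are left unshrunk.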

2.2. Parameter Tuning

Parameter tuning is of great importance in assuring the performance of forecasting methods. The proposed method uses two tuning parameters, $\lambda_1$ and $\lambda_2$. Cross-validation (CV) is a well-known data-driven method that has been widely applied in the machine learning community. Given fixed values of $\lambda_1$ and $\lambda_2$, the in-sample data are randomly partitioned into $K$ pieces of roughly equal size. The forecasting model is trained using $K - 1$ pieces of the in-sample data, and the test error is computed using the remaining $k$th piece. CV repeats this procedure $K$ times, once for each piece. The CV error is obtained by adding the $K$ test errors, and the optimal regularization parameters are those with the smallest CV error.
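The procedure described above can be sketched as a generic grid search. The code below is a minimal illustration, not the authors' implementation: `fit` and `predict` are hypothetical callables supplied by the user, the test error of each fold is taken to be the mean squared error (an assumption), and each grid entry is a tuple of tuning parameters such as `(lam1, lam2)`.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Randomly partition the indices 0..n-1 into k folds of roughly equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cv_grid_search(X, y, fit, predict, grid, k=10, seed=0):
    """Return (best_params, best_cv_error) over a grid of parameter tuples.

    fit(X_train, y_train, *params) -> model      (hypothetical user callable)
    predict(model, X_test)         -> predictions (hypothetical user callable)
    """
    folds = k_fold_indices(len(y), k, seed)
    best = None
    for params in grid:
        cv_error = 0.0
        for held_out in folds:
            held = set(held_out)
            train = [i for i in range(len(y)) if i not in held]
            model = fit([X[i] for i in train], [y[i] for i in train], *params)
            preds = predict(model, [X[i] for i in held_out])
            # Test error on the held-out fold (mean squared error).
            cv_error += sum((y[i] - p) ** 2
                            for i, p in zip(held_out, preds)) / len(held_out)
        # Keep the parameters with the smallest summed CV error.
        if best is None or cv_error < best[0]:
            best = (cv_error, params)
    return best[1], best[0]
```

The same folds are reused for every grid point so that the CV errors of different parameter values are compared on identical partitions.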

2.3. Elman Neural Network

The Elman neural network (ENN) was first advocated by Elman in 1990 to solve a speech recognition problem. It is a typical globally feed-forward, locally recurrent network. Its main structure consists of an input layer, a hidden layer, and an output layer, which is also the structure of the three-layer feed-forward neural network [42] and the backpropagation network [43]. The weights between different layers are trained based on a learning rule. The feedback connection has sets of neurons that record the output, and its weights are fixed. There is also a context layer in the ENN, which stores the output of the hidden layer at the previous time point. Compared with the multilayer perceptron, the ENN has a short-term memory and performs sequence prediction, which lets it adapt to time-varying characteristics. The ENN can be described in the following way:

$$h_t = \sigma_h(W_h x_t + U_h h_{t-1} + b_h), \qquad y_t = \sigma_y(W_y h_t + b_y),$$

where $x_t$ is the input vector, $h_t$ is the hidden layer vector, $y_t$ is the output layer vector, $W$, $U$, and $b$ are the weights and biases of the ENN, and $\sigma_h$ and $\sigma_y$ are activation functions.
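A forward pass of such a network can be sketched as follows. The sketch assumes the standard Elman recurrence, in which the hidden state depends on the current input and on the previous hidden state held in the context layer, and it uses the hyperbolic tangent for both activation functions. The argument names (`W_h`, `U_h`, `b_h`, `W_y`, `b_y`) are illustrative, and training by backpropagation is not shown.

```python
import math

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def elman_forward(xs, W_h, U_h, b_h, W_y, b_y):
    """Run an Elman network over a sequence of input vectors xs.

    W_h: input-to-hidden weights, U_h: context-to-hidden weights,
    W_y: hidden-to-output weights, b_h / b_y: biases.
    Returns the list of output vectors, one per time step.
    """
    h = [0.0] * len(b_h)          # context layer starts empty
    outputs = []
    for x in xs:
        pre = [a + c + b for a, c, b in
               zip(matvec(W_h, x), matvec(U_h, h), b_h)]
        h = [math.tanh(p) for p in pre]                  # new hidden state
        outputs.append([math.tanh(o + b)
                        for o, b in zip(matvec(W_y, h), b_y)])
    return outputs
```

Because the hidden state is carried from one step to the next, an input at time $t$ still influences the output at time $t + 1$ even when the later input is zero. This is the short-term memory that distinguishes the ENN from a plain multilayer perceptron.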

The weights of the network are trained by gradient-based backpropagation through time (BPTT). To reduce the model complexity of the neural network, L2 regularization is often applied and the following optimization problem is considered:

$$\min_{\mathcal{W}} E(\mathcal{W}) + \lambda\|\mathcal{W}\|_2^2,$$

where $E$ denotes the mean square error of the forecasting model, computed from the estimated forecasting values $\hat{y}$. The network weights to be estimated are collected in $\mathcal{W}$, and $\lambda$ is the tuning parameter. Notice that the complexity of the neural network depends on the number of inputs and hidden nodes. If a variable selection method is applied appropriately, the number of inputs is reduced so that a simple neural network can be constructed.

2.4. Square Root Elastic Net Elman Neural Network Model

This paper combines the advantages of SREN and ENN and proposes a novel forecasting model called the SRENENN model. The flowchart of the SRENENN model is shown in Figure 2, which is designed in the following 5 main steps:
Step 1: Split the original global solar radiation dataset into a training dataset and a test dataset (cf. Section 3 for more details).
Step 2: Apply the CV procedure to the training data to select the optimal regularization parameters.
Step 3: Use SREN with these regularization parameters to select the important variables.
Step 4: Establish the Elman neural network with the variables selected by SREN.
Step 5: Evaluate the forecasting performance using the test dataset.

Algorithm 1 shows SRENENN algorithm with , , and defined. When , SRENENN algorithm converges. However, there is no need to let the algorithm run until convergence, which reduces the computation time. The stopping criterion of the SRENENN algorithm is determined by trial and error. The convergence tolerance tol is set to 1e-4, and the maximum number of iterations is set to 100. The SRENENN algorithm uses the Accelerated Gradient Method (AGM) [44] to reduce the number of iterations so that convergence can be achieved with less computation time. AGM has three advantages: (i) it does not involve computing any matrix inverse; (ii) it allows the selection of the unknown parameter and the computation of the gradient to run in parallel; (iii) it makes use of momentum to increase the convergence speed.

Algorithm 1: SRENENN algorithm.
Inputs: X (centered and scaled), y (centered), M: maximum number of iterations, $\lambda_1$, $\lambda_2$: tuning parameters, tol: error tolerance.
Outputs: forecasting errors.
Step 1. Data splitting
  Split the original dataset into training data Dtrn (76% of original data) and test data Dtst (24% of original data).
  Scale the data.
Step 2. Cross-validation
  Divide the training data into K folds;
  for i = 1 to K
    Use the i-th fold as CV test data F and the remaining folds as CV training data T;
    Generate grid values of $\lambda_1$ and $\lambda_2$.
    Step 3. Run SREN algorithm with AGM
    for u = 1 to s
      for v = 1 to m
        Initialization:
        while the stopping criterion (error tolerance tol, maximum number of iterations M) is not met do
          Step 1.
          Step 2. (Plan A) or (Plan B).
          Step 3. (Plan A) or (Plan B).
          Step 4.
        end while
      end for
    end for
    Obtain the solution path and corresponding sparsity pattern using CV training data T.
    Calculate CV errors using F, B and G.
    Find the optimal tuning parameters $\lambda_1$ and $\lambda_2$ with respect to the smallest CV error.
    Step 4. Establish Elman neural network
    Determine the optimal model parameters using the training data with the selected variables considered as inputs.
    Step 5. Evaluate the forecasting performance
    Calculate the test error using the test data.
  end for
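For readers who want to experiment with square root elastic net selection, the sketch below solves a problem of the form $\min_{\beta} \|y - X\beta\|_2 + \lambda_1\|\beta\|_1 + \frac{\lambda_2}{2}\|\beta\|_2^2$ with a simple majorize-and-threshold iteration. It is not the authors' Plan A or Plan B and it uses no acceleration. It relies on the bound $\|r\|_2 \le \|r\|_2^2/(2s) + s/2$, which is tight at $s = \|r\|_2$, and then takes one proximal gradient step with elastic net thresholding. The objective scaling, the function names, and the use of the squared Frobenius norm as a step-size bound are all assumptions made for illustration.

```python
import math

def soft(t, lam):
    """Soft thresholding: sign(t) * max(|t| - lam, 0)."""
    return math.copysign(max(abs(t) - lam, 0.0), t)

def residual(X, y, beta):
    """Residual vector y - X beta, with X given as a list of rows."""
    return [yi - sum(x * b for x, b in zip(row, beta)) for row, yi in zip(X, y)]

def sren_objective(X, y, beta, lam1, lam2):
    """Assumed SREN criterion: ||y - X beta||_2 + lam1 ||beta||_1 + lam2/2 ||beta||_2^2."""
    r = residual(X, y, beta)
    return (math.sqrt(sum(v * v for v in r))
            + lam1 * sum(abs(b) for b in beta)
            + 0.5 * lam2 * sum(b * b for b in beta))

def sren_fit(X, y, lam1, lam2, max_iter=100, tol=1e-4):
    """Majorize-and-threshold iteration for the assumed SREN criterion."""
    p = len(X[0])
    # Squared Frobenius norm: a cheap upper bound on the largest
    # eigenvalue of X^T X, used to keep the step size safe.
    L = sum(x * x for row in X for x in row)
    beta = [0.0] * p
    for _ in range(max_iter):
        r = residual(X, y, beta)
        s = math.sqrt(sum(v * v for v in r))   # current residual norm
        if s == 0.0:
            break                              # perfect fit, nothing to do
        xtr = [sum(row[j] * ri for row, ri in zip(X, r)) for j in range(p)]
        # Gradient step on the quadratic majorizer, then elastic net
        # thresholding with thresholds scaled by the step size s / L.
        new_beta = [
            soft(beta[j] + xtr[j] / L, s * lam1 / L) / (1.0 + s * lam2 / L)
            for j in range(p)
        ]
        change = max(abs(a - b) for a, b in zip(new_beta, beta))
        beta = new_beta
        if change < tol:
            break
    return beta
```

The variables whose coefficients come back as exactly zero are the ones dropped by the selection step; the remaining variables would then be passed to the neural network as inputs. The defaults `max_iter=100` and `tol=1e-4` mirror the settings quoted in the text.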

Define , and the convergence of SRENENN algorithm is guaranteed theoretically in Theorem 1 whose proof is shown in Appendix.

Theorem 1. Let be the step size of SRENENN algorithm and , and assume the following regularity condition holds , where collects all the linear combinations of and . Then the following inequality holds for some .

3. Case Studies

For the real data application, six sites from Xinjiang area in China are considered to demonstrate the advantages of the proposed SRENENN model via comparisons with traditional methods.

3.1. Data Description

Qinghai, Tibet, Xinjiang, and Inner Mongolia are suitable locations to install photovoltaic power stations because sunshine is abundant in these areas. That is why six sites (Site 1, Site 2, Site 3, Site 4, Site 5, and Site 6) are selected from these regions. The locations of the six sites, including their latitudes and longitudes, are provided in Figure 3. The dataset used in this work is collected from the National Renewable Energy Laboratory (NREL) and is available at http://www.nrel.gov/gis/solar.html. In addition to global solar radiation, which is the quantity to be studied, seven meteorological variables are provided in the dataset: solar zenith angle, precipitation, temperature, wind direction, wind speed, relative humidity, and pressure. The samples of this dataset are hourly global solar radiation values from 11:00 to 19:00 in 2014, because solar resources are very abundant during this time interval. The main purpose of this paper is to choose important variables from the seven meteorological variables to perform the forecasting task. The data are split into training data and test data as follows: 19 days in each season are randomly selected as the training data to establish the forecasting model. The test data consist of 6 days, which are also selected randomly from the remaining days in each season. Thus, the size of the training data is 684 (19 days × 9 hours × 4 seasons = 684), and the size of the test data is 216 (6 days × 9 hours × 4 seasons = 216). Furthermore, experiments on each season are also implemented based on the training samples (19 days × 9 hours = 171) and forecasting samples (6 days × 9 hours = 54) presented in Table 1. The forecasting performance of the different models is therefore tested on all four seasons, and the forecasting samples in each season take up 24% of the samples (54 of 225), which is a reasonable proportion.
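The splitting strategy and its sample counts can be checked with a short sketch. The code below is illustrative only: the function name, the representation of a sample as a `(season, day, hour)` tuple, and the seed are assumptions. It draws 25 distinct days per season at random and assigns the first 19 to training and the last 6 to testing, which is equivalent to drawing the test days from the days that remain after the training days are chosen.

```python
import random

HOURS = list(range(11, 20))  # hourly samples from 11:00 to 19:00, 9 per day

def split_days(days_by_season, n_train=19, n_test=6, seed=0):
    """Randomly pick training and test days within each season.

    days_by_season maps a season name to the list of available day labels.
    Returns two lists of (season, day, hour) sample keys.
    """
    rng = random.Random(seed)
    train, test = [], []
    for season, days in days_by_season.items():
        picked = rng.sample(days, n_train + n_test)   # distinct days
        train += [(season, d, h) for d in picked[:n_train] for h in HOURS]
        test += [(season, d, h) for d in picked[n_train:] for h in HOURS]
    return train, test
```

With four seasons this yields 19 × 9 × 4 = 684 training samples and 6 × 9 × 4 = 216 test samples, so the test share is 216 / 900 = 24%.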

3.2. Evaluation Criterion

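To evaluate the forecasting performance of the proposed method and the compared approaches, the mean absolute percent error (MAPE), mean absolute error (MAE), root mean square error (RMSE), and Theil inequality coefficient (TIC) are applied as evaluation criteria [45]. Let $y_i$ be the true value, $\hat{y}_i$ the estimated value, and $N$ the sample size of the test data. The best forecasting model provides the lowest MAPE, MAE, RMSE, and TIC. The evaluation criteria are given below:

$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N} |y_i - \hat{y}_i|, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (y_i - \hat{y}_i)^2},$$

$$\mathrm{MAPE} = \frac{1}{N}\sum_{i=1}^{N} \left|\frac{y_i - \hat{y}_i}{y_i}\right| \times 100\%, \qquad \mathrm{TIC} = \frac{\sqrt{\frac{1}{N}\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}}{\sqrt{\frac{1}{N}\sum_{i=1}^{N} y_i^2} + \sqrt{\frac{1}{N}\sum_{i=1}^{N} \hat{y}_i^2}}.$$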
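The four criteria are simple to compute. The sketch below uses their standard definitions, with MAPE expressed in percent; the function names are illustrative. MAPE is undefined when a true value is zero, so inputs are assumed to be strictly positive radiation values.

```python
import math

def mae(y, y_hat):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)

def rmse(y, y_hat):
    """Root mean square error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))

def mape(y, y_hat):
    """Mean absolute percent error, in percent (requires nonzero true values)."""
    return 100.0 * sum(abs((a - b) / a) for a, b in zip(y, y_hat)) / len(y)

def tic(y, y_hat):
    """Theil inequality coefficient: 0 for a perfect forecast, at most 1."""
    n = len(y)
    denom = (math.sqrt(sum(a * a for a in y) / n)
             + math.sqrt(sum(b * b for b in y_hat) / n))
    return rmse(y, y_hat) / denom
```

For example, with true values `[100, 200, 300, 400]` and forecasts `[110, 190, 330, 380]`, the absolute errors are 10, 10, 30, and 20, so the MAE is 17.5 and the MAPE is 7.5%.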

3.3. Wilcoxon Signed-Rank Test

The Wilcoxon signed-rank test was applied to determine whether the proposed SRENENN model was superior to the SVM, ENN, LASSOENN, PCAENN, SCADENN, and SRLENN models for global solar radiation forecasting. The Wilcoxon signed-rank test is a nonparametric statistical hypothesis test used when comparing two matched or related samples to assess whether their population median ranks differ (i.e., it is a paired difference test). It can be used as an alternative to the parametric paired $t$-test.

Let $n$ be the sample size, that is, the number of pairs. The prediction sample size of each method was 216 (6 days × 9 hours × 4 seasons = 216), and thus $n = 216$. For $i = 1, \ldots, n$, let $x_{1,i}$ and $x_{2,i}$ be the forecasting values of two different approaches and consider the following hypothesis:

$$H_0: m_1 = m_2 \quad \text{versus} \quad H_1: m_1 \neq m_2,$$

where $m_1$ and $m_2$ are the medians of the sequences $\{x_{1,i}\}$ and $\{x_{2,i}\}$. The Wilcoxon signed-rank test proceeds as follows [46–48]:
Step 1: For $i = 1, \ldots, n$, calculate $|x_{2,i} - x_{1,i}|$ and $\operatorname{sgn}(x_{2,i} - x_{1,i})$, where $\operatorname{sgn}$ is the sign function.
Step 2: Exclude pairs with $|x_{2,i} - x_{1,i}| = 0$.
Step 3: Let $n_r$ be the reduced sample size. Order the remaining $n_r$ pairs from smallest absolute difference to largest absolute difference. Rank the pairs, starting with the smallest as rank 1.
Step 4: Calculate the sum of the positive ranks $W^{+}$ and the sum of the negative ranks $W^{-}$.
Step 5: As $n_r$ increases, the sampling distribution of $W = \min(W^{+}, W^{-})$ converges to a normal distribution. Thus, for larger samples, the $z$ statistic can be calculated as $z = \left(W - n_r(n_r + 1)/4\right) / \sqrt{n_r(n_r + 1)(2n_r + 1)/24}$. If $|z| > z_{critical}$, then reject $H_0$. For small samples, $W$ can be calculated as $W = \min(W^{+}, W^{-})$. If $W \le W_{critical}$, then reject $H_0$. Alternatively, a $p$ value can be calculated from enumeration of all possible combinations of $W^{+}$ or $W^{-}$ given $n_r$.
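The steps above can be sketched with the standard library alone. The code below assumes the common large-sample form of the test, in which the smaller of the two rank sums is standardized by its mean $n_r(n_r+1)/4$ and standard deviation $\sqrt{n_r(n_r+1)(2n_r+1)/24}$. Tied absolute differences receive the average of their ranks, and no tie or continuity correction is applied to the variance. These are simplifications, and the function names are illustrative.

```python
import math

def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks

def wilcoxon_signed_rank(x1, x2):
    """Return (W_plus, W_minus, z, two_sided_p) for paired samples x1, x2."""
    diffs = [b - a for a, b in zip(x1, x2) if b != a]   # drop zero differences
    n_r = len(diffs)
    if n_r == 0:
        raise ValueError("all paired differences are zero")
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    w = min(w_plus, w_minus)
    mean = n_r * (n_r + 1) / 4.0
    sd = math.sqrt(n_r * (n_r + 1) * (2 * n_r + 1) / 24.0)
    z = (w - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2.0))   # two-sided normal p value
    return w_plus, w_minus, z, p
```

At the 0.05 significance level the null hypothesis is rejected when the absolute value of `z` exceeds 1.96, or equivalently when `p` is below 0.05.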

3.4. Friedman Test

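The Friedman test is a nonparametric statistical test that can be applied to evaluate the performance of forecasting methods based on different criteria on multiple datasets [49]. The Friedman test considers the following hypothesis:

$$H_0: R_1 = R_2 = \cdots = R_k,$$

where $R_j = \frac{1}{N}\sum_{i=1}^{N} r_i^{j}$, with $r_i^{j}$ representing the rank of the $j$th of $k$ algorithms on the $i$th of $N$ datasets. Based on the Friedman statistic [50]

$$\chi_F^{2} = \frac{12N}{k(k+1)}\left[\sum_{j=1}^{k} R_j^{2} - \frac{k(k+1)^{2}}{4}\right],$$

which follows a chi-square distribution with $k - 1$ degrees of freedom, an $F$-distribution statistic is calculated as

$$F_F = \frac{(N-1)\chi_F^{2}}{N(k-1) - \chi_F^{2}}.$$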

If the null hypothesis is rejected which means there exist significant differences between the comparing algorithms, a post hoc test will be given based on critical difference (CD).
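The Friedman statistics and the critical difference can be sketched as follows. The code assumes the usual formulation for comparing $k$ algorithms on $N$ datasets, in which the chi-square statistic is converted to an $F$-type statistic and the critical difference is $\mathrm{CD} = q_{\alpha}\sqrt{k(k+1)/(6N)}$. The critical value `q_alpha` must be taken from a published table for the chosen post hoc test, and the function names are illustrative.

```python
import math

def rank_row(errors):
    """Rank one dataset's errors (lowest error gets rank 1, ties averaged)."""
    order = sorted(range(len(errors)), key=lambda j: errors[j])
    ranks = [0.0] * len(errors)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and errors[order[j + 1]] == errors[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks

def friedman(errors):
    """errors[i][j] is the error of algorithm j on dataset i.

    Returns (average_ranks, chi_square_statistic, f_statistic).
    """
    n, k = len(errors), len(errors[0])
    rank_rows = [rank_row(row) for row in errors]
    avg = [sum(r[j] for r in rank_rows) / n for j in range(k)]
    chi2 = 12.0 * n / (k * (k + 1)) * (sum(R * R for R in avg)
                                       - k * (k + 1) ** 2 / 4.0)
    denom = n * (k - 1) - chi2
    # The F-type statistic is unbounded when all datasets agree perfectly.
    f_stat = (n - 1) * chi2 / denom if denom > 0 else math.inf
    return avg, chi2, f_stat

def critical_difference(q_alpha, k, n):
    """Two algorithms differ significantly if their average ranks differ by more than this."""
    return q_alpha * math.sqrt(k * (k + 1) / (6.0 * n))
```

After the null hypothesis is rejected, the difference in average rank between the proposed method and each competitor is compared with the critical difference to decide which pairwise differences are significant.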

3.5. Statistical Analysis of Selected Variables

To test whether the selected variables are significant, the following test statistic is considered:

$$F = \frac{(\mathrm{RSS}_0 - \mathrm{RSS}_1)/(p_1 - p_0)}{\mathrm{RSS}_1/(N - p_1 - 1)},$$

where $\mathrm{RSS}_1$ is the residual sum of squares for the least squares fit of the full model with $p_1$ variables, and $\mathrm{RSS}_0$ is the same for the smaller reduced model with $p_0$ variables. Under the Gaussian assumption and the null hypothesis that the smaller model is correct, the test statistic has an $F_{p_1 - p_0,\, N - p_1 - 1}$ distribution. If the $F$ value is larger than the critical value, then the selected variables are determined to be significant.
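As a small worked example, the statistic can be computed directly from the two residual sums of squares. The sketch below assumes the standard nested-model form of the $F$ statistic, with $N - p_1 - 1$ residual degrees of freedom for a full model that has $p_1$ variables plus an intercept. The function name and the numbers in the usage note are illustrative and are not taken from the paper's tables.

```python
def nested_f_statistic(rss0, rss1, p0, p1, n):
    """F statistic comparing a reduced model (p0 variables, RSS rss0)
    with a full model (p1 variables, RSS rss1) fitted to n samples."""
    if not (p1 > p0 and n > p1 + 1):
        raise ValueError("need p1 > p0 and n > p1 + 1")
    numerator = (rss0 - rss1) / (p1 - p0)        # gain per added variable
    denominator = rss1 / (n - p1 - 1)            # residual variance estimate
    return numerator / denominator
```

For instance, `nested_f_statistic(100.0, 50.0, 2, 5, 106)` divides (100 - 50) / 3 by 50 / 100 and returns about 33.3, which would then be compared with the critical value of the $F$ distribution with 3 and 100 degrees of freedom.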

4. Results and Discussion

In this paper, SREN is combined with the Elman neural network (ENN) to select the important variables and forecast the global solar radiation. SREN is a penalized variable selection method with a convex penalty function, which makes it computationally efficient. Compared with subset selection, which considers all possible combinations of the variables, SREN selects the important variables directly.

Lots of approaches are considered for global solar radiation such as SVM, ENN, LASSOENN, SCADENN, SRLENN, and PCAENN [50]. Comparisons between these methods and SRENENN are presented in this part.

Table 2 shows the parameters used to establish the compared forecasting models. The regularization parameters $\lambda_1$ and $\lambda_2$ are selected as 0.0625 and 5e-5 in the SRENENN methods using 10-fold cross-validation. In LASSOENN and SCADENN, the regularization parameters are chosen as 4 and 1. The maximum number of iterations used to train the neural network is set to 2000. The activation function from the input layer to the hidden layer (Func1) is the Tansig transfer function, chosen by trial and error. Similarly, the activation function from the hidden layer to the output layer (Func2) is set to the Tansig transfer function. The number of hidden neurons, which determines the model complexity, is important in constructing an ENN. Its best value is selected from the grid {5, 10, 15, 20}. Backpropagation through time (BPTT) is applied to train the ENN, with the weights initialized using random values from a uniform distribution. Based on trial and error, the momentum and the learning rate of gradient descent with momentum and adaptive learning rate are set to 0.9 and 0.01. SVM is implemented using the R package "e1071" with an RBF kernel function, whose two unknown parameters are selected from two grids using 10-fold CV. The parameters of all models are selected by proper tuning.

The results presented in Table 3 reveal that SRENENN-B achieves the best forecasting accuracy on average at all the sites except Site 3, where SRENENN-A has the best performance. Significant differences are observed between SRENENN-B and the SVM and ENN methods, which do not involve any dimension reduction. For instance, the MAE obtained by SRENENN-B is much lower than that of ENN at Site 2. The error has been reduced by about 37.23% using fewer variables. Compared with SVM, which also performs well, SRENENN-B improves the forecasting accuracy by 17.89%. At Site 4, SRENENN-B reduces the RMSE of the ENN and SVM by 47.87% and 39.36%, respectively. Compared with PCAENN, LASSOENN, and SCADENN, which perform better than ENN, SRENENN-B is still the winner in terms of MAE, RMSE, MAPE, and TIC. PCAENN, LASSOENN, and SCADENN provide similar performances, but LASSOENN delivers better results than PCAENN, SCADENN, and SRLENN at almost all the sites. In terms of MAPE, LASSOENN provides better results at all sites except Site 1 and Site 2. Further, the performance of SVM is better than that of SRLENN in terms of MAE at Sites 2–6, and SRENENN-A outperforms SRLENN except at Site 2. The computation time of the different forecasting methods is shown in the last column of Table 3. SRENENN-B takes less computation time than the other approaches. Both SVM and ENN, which take all the variables as inputs, use more computation time than the penalized ENN models and PCAENN. The computation times of the other forecasting methods are comparable. The corresponding plot is shown in Figure 4. Therefore, SRENENN-B delivers better forecasting results with less computation time.

Table 4 lists the scores of the compared models. The best model has the lowest total score. SRENENN-B provides the lowest score among all the compared methods (see the last column), followed by LASSOENN, SRENENN, SRLENN, SVM, PCAENN, SCADENN, and ENN. Tables 5 and 6 show the performance of the compared forecasting approaches in the four seasons. SRENENN-B gives the highest forecasting accuracy. SVM provides better results than the remaining methods in spring, autumn, and winter, while ENN gives the worst result in all four seasons. The results are quite similar to those observed in Table 3.

The results of the Wilcoxon signed-rank test between SRENENN-B and the other forecasting approaches are summarized in Tables 7 and 8, which show the $z$ statistic values and $p$ values. In this study, the significance level is set to 0.05, so the critical value is 1.96. From the tables, all of the $z$ statistic values are larger than 1.96 in absolute value and the $p$ values are much smaller than 0.05. Thus, the null hypothesis is rejected and we conclude that the proposed SRENENN-B model is significantly different from the other models. Since SRENENN-B has provided the smallest errors at all sites, it is concluded that SRENENN-B is superior to the other models in terms of forecasting accuracy.

Table 9 reports the MAE, RMSE, MAPE, and TIC values of SRENENN and the other forecasting approaches. The results of the Friedman test show that the test statistic follows an $F$ distribution, whose critical value is 0.39. Thus, the null hypothesis that the ranks of the compared methods are equal is rejected. This means that a post hoc test based on the Bonferroni-Dunn test is needed to make further comparisons. The CD value is calculated as 3.59 based on [49]. Therefore, SRENENN performs significantly better than ENN, PCAENN, SCADENN, LASSOENN, and SVM for RMSE and TIC. This is because the differences in average rank between SRENENN-B and these competitors are larger than 3.59. On the other hand, SRENENN-B does not show a significant improvement over SRENENN-A and SRLENN in terms of the evaluation criteria.

Figure 5 summarizes the estimated values against the true values. The estimated values of SRENENN-B are closer to the true values than those of the other compared approaches. ENN provides the worst results because all the variables are employed as inputs, so some redundant features must be contained in the ENN. SRENENN-B gives good forecasting results at all the sites, which demonstrates that the selected variables (temperature, pressure, solar zenith angle, wind direction, and wind speed) are important inputs of the ENN. Table 10 reports the statistical analysis of the selected variables. All the selected variables are significant because the $F$ values are much greater than the critical values. Furthermore, the coefficients of determination are approximately one, which indicates that the established model is trustworthy.

Figure 6 shows boxplots of the daily RMSEs of the compared forecasting models in order to reveal the benefits of SRENENN. In Figures 6(a)–6(f), panels (A), (C), (E), (G), (I), and (K) show the RMSEs of each model with limits that include all the outliers. The median is used for the comparisons because it is less sensitive to outliers. The RMSEs of ENN are far larger than those of the other forecasting approaches at all the sites. In panels (B), (D), (F), (H), (J), and (L) of Figures 6(a)–6(f), SRENENN-B clearly gives the lowest RMSE values. Therefore, based on boxplots (A)–(L) in Figures 6(a)–6(f), SRENENN-B delivers better forecasting performance.
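
To illustrate why the median is preferred for this comparison, the following Python sketch computes one RMSE per day and then summarizes the daily values by both median and mean. The data layout (one list of values per day), the function names, and the numbers are assumptions for illustration only.

```python
import math
from statistics import mean, median


def daily_rmse(actual_by_day, predicted_by_day):
    """Return one RMSE per day from per-day lists of actual and predicted values."""
    rmses = []
    for actual, predicted in zip(actual_by_day, predicted_by_day):
        squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
        rmses.append(math.sqrt(sum(squared) / len(squared)))
    return rmses


def summarize(rmses):
    """Median and mean of the daily RMSEs."""
    return {"median": median(rmses), "mean": mean(rmses)}


# Hypothetical daily RMSEs with a single outlying day.
example = [1.0, 1.0, 1.0, 1.0, 21.0]
print(summarize(example))
```

In this example one extreme day pulls the mean well above the typical daily error, while the median stays at the level of the ordinary days, which is the behavior the boxplot comparison relies on.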

5. Conclusions

Global solar radiation is a vital and active research topic, and predicting it accurately is crucial. A number of methods have been developed to achieve this goal. Our work investigated the SRENENN method. The key findings of this paper are as follows:
(1) To build an interpretable model and overcome the selection inconsistency of existing variable selection methods, this work studies the square root loss function and the elastic net and proposes the square root elastic net (SREN), a novel variable selection method.
(2) To boost the computational capacity of ENN and other penalized ENN models, this paper derives a fast, simple-to-implement algorithm for SREN. The computation time experiments demonstrate the efficiency of the proposed algorithm.
(3) To improve the forecasting accuracy of ENN and other penalized ENN models, this paper establishes the SRENENN model using the inputs selected by SREN.

To sum up, the proposed SRENENN model provides better forecasting performance than the traditional methods on real data from six locations in the Xinjiang area of China. Future research will focus on the following directions: (i) exploring the performance of the SRENENN model on solar radiation problems under complex weather conditions; (ii) investigating the application of SREN to other time series forecasting models; and (iii) studying the application of SREN to high-dimensional data.

Appendix

Proof 1. A proof is provided based on [51, 52]. Define the objective function as . A surrogate function is defined as After some simple algebra, is the same as which can be reformulated as Applying Lemma 7 in reference [53], we obtain that in Plan A. For Plan B we also have and notice that Combining with the iterates defined by Eq. (9) or Eq. (10) and using Taylor expansion, we can get for some with . A simple reformulation of (A.7) yields that Under the regularity condition and large enough, is monotone decreasing. Define and let , we have . Thus, using the optimality conditions, has a unique limit point . Furthermore, satisfies the KKT condition, which means it is a global minimum. This completes our proof.

Nomenclature

Abbreviation
ENN: Elman neural network
EL: Elastic net
MAE: Mean absolute error
MAPE: Mean absolute percentage error
RMSE: Root mean square error
SRL: Square root LASSO
SREN: Square root elastic net
TIC: Theil inequality coefficient
CD: Critical difference.
English Symbols
:Number of nodes in the hidden layer
:Sample size
:Maximum number of iterations in ENN
:Number of variables
:Data matrix
:Response variable.
Greek Symbols
:True variables
:The estimate in the jth iteration
:Regularization parameter for l2 part
:Thresholding rules
:Regularization parameter for l1 part
:Noise level.

Conflicts of Interest

The authors declare that there is no conflict of interest regarding the publication of this article.

Acknowledgments

This research is supported by the National Natural Science Foundation of China (Grant no. 71761016), China Postdoctoral Science Foundation (Grant no. 2017M620277 and no. 2018T110654), and Natural Science Foundation of Jiangxi, China (Grant no. 20171BAA218001).