#### Abstract

In recent decades, the integration of solar energy sources has gradually become the main challenge for global energy consumption. Therefore, it is essential to predict global solar radiation in an accurate and efficient way when estimating outputs of the solar system. Inaccurate predictions either cause load overestimation that results in increased cost or failure to gather adequate supplies. However, accurate forecasting is a challenging task because solar resources are intermittent and uncontrollable. To tackle this difficulty, several machine learning models have been established; however, the forecasting outcomes of these models are not sufficiently accurate. Therefore, in this study, we investigate ensemble learning with square root regularization and intelligent optimization to forecast hourly global solar radiation. The main structure of the proposed method is constructed based on ensemble learning with a random subspace (RS) method that divides the original data into several covariate subspaces. A novel covariate-selection method called square root smoothly clipped absolute deviation (SRSCAD) is proposed and is applied to each subspace with efficient extraction of relevant covariates. To combine the forecasts obtained using RS and SRSCAD, a firefly algorithm (FA) is used to estimate the weights assigned to individual forecasts. To handle the complexity of the proposed ensemble system, a simple and efficient algorithm is derived based on a thresholding rule and accelerated gradient method. To illustrate the validity and effectiveness of the proposed method, global solar radiation datasets of eight locations of Xinjiang province in China are considered. The experimental results show that the proposed RS-SRSCAD-FA achieves the best performances with a mean absolute percentage error, root-mean-square error, Theil inequality coefficient, and correlation coefficient of 0.066, 20.21 W/, 0.016, 3.40 s, and 0.98 in site 1, respectively. For the other seven datasets, RS-SRSCAD-FA still outperforms other approaches. Finally, a nonparametric Friedman test is applied to perform statistical comparisons of results over eight datasets.

#### 1. Introduction

An increasing number of countries are paying substantial attention to environmental issues such as global warming, climate change, and greenhouse gas emissions. A major cause of global warming is the burning of fossil fuels including coal and oil for supplying traditional electricity generation. This encourages the exploitation and utilization of renewable energy sources including solar, wind, hydro, tidal, wave, biomass, and geothermal sources as alternatives [1]. Nevertheless, solar energy, as a typical renewable energy source, has stochastic, intermittence, time-varying, and uncertainty properties, which may result in defects in the stability and reliability of power grid systems [2–4]. This introduces new challenges for the incorporation of solar energy sources into the power grids. In addition, solar radiation can be measured based on sensors including pyrheliometers, pyranometers, and radiometers combined with data-acquisition hardware and software. However, it is time-consuming, cumbersome, and expensive to install sensors across the world [5].

To tackle these challenges, it is necessary to develop accurate global solar radiation forecasting models. A wide variety of models have been established using machine learning techniques. Machine learning, which is classified as an artificial intelligence technology, is a subfield of computer science. It can be applied to numerous domains, and its characteristic is that the model can identify relations between inputs and outputs notwithstanding whether the representation is infeasible. This advantage promotes the application of machine learning models in classification, pattern recognition, spam filtering, data mining, and forecasting [6]. At present, a popular method for forecasting global solar radiation is to use an artificial neural network (ANN). ANNs have been used in various areas with different input covariates (multiple meteorological, geographical, and astronomical parameters such as temperature, humidity, wind speed, precipitation, altitude, declination, and zenith angle) and multiple time scales (hourly, daily, and monthly) [7, 8]. The main advantage of ANNs is that they do not require as many adjustable parameters as other classical techniques and provide higher forecasting accuracy. Ozgoren et al. [9] proposed an ANN model based on a multinonlinear regression (MNLE) algorithm to forecast the monthly mean daily sum global solar radiation for Turkey. Hejase et al. used a multilayer perceptron and radial basis function (RBF) techniques with comprehensive training architectures and different combinations of inputs to forecast global horizontal irradiance for three major cities in the United Arab Emirates [10]. Renno et al. [11] developed two ANNs to forecast daily global radiation and hourly direct normal irradiance for the University of Salerno. Notwithstanding its growing popularity, ANNs have their drawbacks. For example, the backpropagation neural network (BPNN) exhibits difficulty in training the input data owing to the iterative tuning of parameters required for hidden neurons because it does not always produce a unique global solution with different models; moreover, BPNN exhibits a slow response owing to its gradient-based learning algorithm and requires a longer training time [12, 13].

It is well known that the support vector machine (SVM) developed by Vapnik [14] is a powerful tool for exploring appropriate SVM regressors to establish a prediction model with sufficiently high accuracy. The main goal of SVM regressors is to construct the optimum prediction model that boosts the prediction accuracy efficiently. It is a regularization network and exhibits an advantage over ANN model. To reduce the high computation cost, least-squares SVM (LSSVM) uses a least-squares method via a set of linear equations based on structural risk minimization; therefore, it is able to prevent overfitting of the training data and does not require iterative tuning of model parameters [15–17]. Wu and Liu investigated an SVM model to estimate monthly mean daily solar radiation using measured minimum, mean, and maximum air temperatures in 24 sites in China. Five SVMs based on different input covariates were developed [18]. Chen et al. used seven SVMs and five empirical sunshine-based models to forecast daily global solar radiation at three sites in Liaoning, China. The experimental results demonstrated that all SVMs exhibited higher performance than empirical models [19]. Ekici proposed a LSSVM-based intelligent model to forecast the daily solar insolation for Turkey [20]. It was noted that the temporal behavior of global solar radiation and its corresponding atmospheric input covariates are complex and chaotic [21]. Although SVM is flexible in handling nonlinear input covariates, it encounters fluctuations and difficulties in training input datasets with wide frequencies [22]. Some researchers have focused on Gaussian process regression (GPR) methodology, which shows promising results with lower computational costs. Rohani et al. [23] presented GPR with a -fold cross-validation (CV) model to forecast daily and monthly solar radiation at Mashhad city in Iran based on several meteorological parameters. Considering the effect of climatic factors such as sunshine duration, relative humidity, and air temperature, Guermoui et al. [24] used the GPR to predict the daily global solar radiation in the Saharan climate. Guermoui et al. [25] proposed a novel method called weighted Gaussian process regression (WGPR) to predict daily global and direct horizontal solar radiation in Saharan climate. The experimental results illustrate that these developed models exhibit good forecasting performance.

It is challenging for a single forecasting model to provide an optimum solution to a complex problem. Each model has advantages and drawbacks. Thus, it is worthwhile to investigate appropriate combinations of their advantages for implementing the forecasting task. As solar radiation simultaneously exhibits nonstationary and nonlinear characteristics at different time scales, the analysis of these fluctuations at all scales can be performed using multi-scale decomposition methods. Wang et al. [26] decomposed solar radiation into a quasi-stationary process at different frequency scales using wavelet transform (WT) and employed BPNN to predict solar radiation based on temperature, clearness index, and phase space reconstruction radiation. Monjoly et al. [27] applied three multiscale decomposition methods, empirical mode decomposition (EMD), ensemble empirical mode decomposition (EEMD), and WT, to analyze the nonstationary and nonlinear features of the clear sky index, and then developed a hybrid model AR-NN combined autoregressive process (AR) and neural network (NN) to forecast global horizontal solar irradiation (GHI). The experimental results showed that the WT-hybrid model has the best forecasting accuracy. To improve the forecasting performance, Hussain and Alalili [28] investigated a new hybrid technique based on WT and four different architectures of ANN to forecast GHI in the Abu Dhabi region.

Ensemble methods are straightforward and highly effective techniques that average the approximate predictions of a few weak learners to present accurate estimates rather than find a single remarkable sophisticated learner [29–31]. The advent of ensemble methods has gained prominence. The most commonly used ensemble methods include random forest, gradient boosting, and bagging. Although these methods have wide applications in other scientific fields, they have only rarely been adopted in forecasting global solar radiation [6]. Gala et al. [32] used support vector regression, random forest regression, and gradient boosted regression as a hybrid model to improve 3-hour accumulated radiation forecasts for seven sites in Spain. Sun et al. [33] proposed different random forest models with and without considering the air pollution index to estimate solar radiation in three sites in China through comparisons with a few empirical models; their experimental results demonstrated that the performance of random forest techniques with the air pollution index is higher. Aler et al. [34] applied gradient boosting machine learning to improve the performance of diffuse and direct separation models. Ensembles of linear and nonlinear weak learners have both been adopted. The results indicated that the gradient boosting method can reduce large random errors of separation models efficiently.

The computational cost is mainly determined by the complexity of the forecasting model. It is time-consuming if several covariates are employed to establish the model. Thus, it is essential to investigate an efficient method to simplify the model structure. Covariate selection is an efficient method to extract the important covariates. There are several penalized covariate selection methods including least absolute shrinkage and selection operator (LASSO) [35], least angle regression [36], elastic net [37], and adaptive LASSO [38]. LASSO is a popular covariate selection method applying an -type penalty. Yang et al. [39] selected the important time-lagged covariate using LASSO and revealed its advantages over the exponential smoothing (ES) model and the autoregressive integrated moving average (ARIMA) model. LARS is closely associated with LASSO, and their connection is established in Efron et al. [36]. Zou proposed adaptive LASSO involving the addition of weights to the penalty parameter of each feature [38]. It outperforms LASSO in terms of selection consistency. The elastic net was first proposed by Zou, and it considers -type penalty in addition to the -type penalty [37]. Thus, the elastic net applied -type penalty, which provides the following advantages: the part focuses on feature selection and the part can be applied to boost forecasting accuracy. However, Zhang determined that the abovementioned feature selection methods apply convex penalties, which makes it difficult to detect the important covariates when the model coherence is high [40]. To solve this problem, the nonconvex penalized covariate selection approaches are advocated. Knight and Fu proposed bridge regression, which focuses on the penalty function of the form [41]. Fan and Li [42] proposed the smoothly clipped absolute deviation (SCAD) method and demonstrated its advantage over LASSO. However, to the best of the present authors’ knowledge, there is no research work on feature selection using both square root loss function and SCAD penalty in the literature. The main contribution of this study is the advocation of a novel global solar radiation forecasting approach that combines ensemble learning, square root smoothly clipped absolute deviation (SRSCAD), and the firefly algorithm (FA). Specifically, the contributions of this study are given as follows:(i)An SRSCAD method is proposed to extract the important covariates efficiently. Moreover, the optimal regularization parameter is selected by a 10-fold CV; this CV completely utilizes the data through resampling, which is particularly useful for data with a small sample size.(ii)The ensemble learning is the main structure of the proposed method. The diversity created by covariate selection of the ensemble system substantially aids in boosting the forecasting accuracy. The FA, which is demonstrated to obtain the global optimal solution successfully, is applied as a weight policy in the forecasting model.(iii)A simple-to-implement and efficient algorithm is designed based on thresholding rules and an accelerated gradient method is used to further speed up the convergence of the algorithm.

The rest of this paper is organized as follows. Section 2 presents a preliminary discussion of related methodologies. Section 3 shows the proposed method and the design of the algorithm used in the proposed method is also revealed. The empirical study and corresponding results analysis are presented in Section 4. Finally, Section 5 summarizes this study and provides concluding remarks. The nomenclature is provided in Table 1.

#### 2. Methodology and Materials

##### 2.1. Forecasting and Covariate Selection in the Interactive Model

Several nonlinear models including SVMs and ANNs have been considered to forecast global solar radiation because researchers have gradually realized that the outcomes given by a linear regression model are not accurate. Interactive models are one of nonlinear models and have wide applications in many scientific fields. They are favored because they take the connections between covariates into account to capture the nonlinear relationship between the global solar radiation and the other meteorological and geographical covariates. In this study, an interactive model is considered to build the forecasting model in a global solar radiation problem and study the following nonlinear additive setup:where denotes the global solar radiation to be studied, is the data matrix with samples and meteorological and geographical covariates, represents the noise term, which follows a Gaussian distribution with representing the noise level, and let be an identity matrix. The distribution of can also be a binomial distribution if a classification problem is studied. Use to denote the two-way covariates. Here is the intercept term, and and are the coefficients of covariates to be estimated. Note that several covariates are included in this model. Furthermore, the large computation time is one challenge caused by the excessive number of covariates.

In the global solar radiation forecasting domain, a proper metric that measures the fitness of global solar radiation and other meteorological and geographical features needs to be defined. This metric is called a loss function, denoted by , in the machine learning community. Because follows a Gaussian distribution in the interactive model (1), a simple way to define this metric is to consider the negative loglikelihood of the distribution of (after centering), that is, where represents the Euclidean norm. A common procedure is to restrict the number of covariates to avoid the overfitting problem. Namely, a few of the covariates may not be useful to the model and they are considered to be nuisance covariates. Therefore, how to extract the important covariates that make great contributions to the model is worth studying. This problem is well known as covariate selection. Covariate selection is essential because the dimensions are high in the interactive model. LASSO is a famous covariate selection method that minimizes the square error loss function and -type penalty to enforce sparsity into the model. It has been shown to be accurate and can be seen as a Bayesian covariate selection using a Laplace prior distribution [43, 44]. However, LASSO often selects nuisance variables when the model coherence is high. Namely, if two covariates play a similar role in global solar radiation forecasting, it is not easy to distinguish between them. To overcome the disadvantages of LASSO, a nonconvex penalty SCAD is proposed and is shown to have better statistical properties (unbiased, continuity, and sparsity) than LASSO [42]. SCAD considers the following optimization:with the SCAD penalty expressed by for , which is determined using generalized cross-validation.

To further increase the forecasting accuracy and facilitate the parameter tuning work, square root regularization approaches are used in this paper. Square root regularization has attracted the interest of researchers in the statistics and machine learning community for years because it makes the parameter tuning work easier. Specifically, in LASSO or SCAD problems, the tuning parameter depends on the value of the noise level , i.e., . However, the noise level is difficult to estimate. To avoid this problem, Belloni et al. (2009) proposed square root LASSO (SRL), which applied a square root loss function and considered the following optimization problem:Different than LASSO, the regularization parameter in SRL depends on , which is independent of the noise level . As is known, parameter tuning is highly significant in determining the forecasting accuracy. Therefore, square root regularization may obtain accurate outcomes. To this end, this paper studies a novel covariate selection method that borrows strengths from both the square root loss function and the SCAD penalty, which is given as follows:In the following, we focus on (6).

##### 2.2. Ensemble Learning

Famous statistician George E.P. Box said “all the models are wrong but some of them are useful” [45]. This indicates that a single forecasting model cannot always perform very well under all circumstances. The basic idea of ensemble learning is to create a finite number of individual learners and combine them through assigning different weights and summing them up [46]. Ensemble learning is a supervised learning algorithm because it can be used to make predictions and classifications. A single model that is going to be trained can be seen as one hypothesis. However, it is difficult to decide which hypothesis is the best. Thus, the ensemble learning system combines all the hypotheses and creates a better hypothesis although the computational time is increased. Therefore, ensemble learning can represent more flexible functions than a single model. Note that an overfitting problem can be caused by the ensemble techniques. In practice, regularization methods are applied to reduce the problem related to overfitting of the training data.

Empirically, ensemble learning tends to provide better results than a single model using the significant diversity among the models. Therefore, many ensemble methods seek to promote diversity. Specifically, the diversity can be generated from the sample space (boosting and bagging), covariate space (RS [47] and random forest [48]), and parameter space. To gain better performance, the method used to combine single models is very important. The simplest way is just to calculate the average of all the results given by the single models. However, these single models may play different roles in the ensemble system. Thus, linear and nonlinear weighting strategies are applied to evaluate their contributions. In this paper, the FA, which is introduced in the following section, is used as a weighting strategy for single models.

##### 2.3. Firefly Algorithm

Xin-She Yang proposed a nature-inspired metaheuristic in 2008 called FA, which was derived from the flashing behavior of fireflies [49]. The main purpose of the FA is to use a signal flash to attract other fireflies. There are three hypotheses in this algorithm: (i) no consideration of their sex, each firefly will move towards all other fireflies; (ii) attractiveness is associated with brightness, the fireflies with low light intensities will be attracted by the brighter ones in the vicinity; nevertheless, the brightness will increase as their distance decreases; (iii) if no fireflies are brighter than the given one, it will move randomly [50, 51].

Let be the th firefly in the population, where each firefly represents a candidate solution in the search space and is the population size. Fireflies move towards the optimal solution positions. The attractiveness is related to the light intensity; thus we can define the attractiveness between two fireflies and as follows:andfor , where denotes the Euclidean norm, is the number of dimensions, is the Cartesian distance between and , and and are the th components of and , respectively. At the distance , the parameter represents the attractiveness and denotes the light absorption coefficient. According to [52], is used as a typical initial value, , where is the length scale of the optimization problem.

The movement of the firefly towards another brighter firefly can be given bywhere is the step size of the random walk and () is a random sign whereas the random step length is obtained from a Lévy distribution in (10). In this paper, the minimization problems are considered. Let be the fitness function. If , this means that is brighter than the firefly . The main steps of the FA are described as follows:(i)Step 1. Initialize the positions of fireflies (or solutions) randomly based on the following: where and denote the lower and upper bounds for the dimension , respectively. The fitness function can be used to evaluate the position of these fireflies in the initial population.(ii)Step 2. Compare the positions of and (); if , will be attracted by and updates its position based on (9). Then the updated position of each firefly can also be evaluated.(iii)Step 3. If the iteration process satisfies the stopping criteria, the algorithm stops; otherwise, continue with step 2.

#### 3. The RS-SRSCAD-FA Method

In this paper, a novel global solar radiation forecasting method is invented based on RS, SRSCAD, and the FA and is called RS-SRSCAD-FA for short. The main procedure of RS-SRSCAD-FA is described as follows:(i)* Step 1*: Build the interactive model as described in (1).(ii)* Step 2*: The RS method is applied to the interactive model. In particular, randomly select covariates without replacement for times, which results in blocks denoted by with covariates in each block.(iii)* Step 3*: SRSCAD covariate selection method is used in each block .(iv)* Step 4*: Assign weights to blocks and concatenate the estimates using the weights determined by FA.

*Remark*. In step 1, an interactive model is established so that all the two-way covariates that describe the possible relationships between covariates and are considered. Note that the dimension of the model has increased from to . In step 2, similar to random forest, the interactive model is divided into a number of blocks using the RS method, which randomly extracts a number of covariates. This procedure is the same as the divide procedure in the “divide and conquer” strategy of the ensemble learning. The complex model is divided into some simple models that are easy to handle separately. After this procedure, all the solutions are combined. In step 3, the proposed dimension reduction method SRSCAD is used to reduce the number of covariates. This step is essential because it can reduce the number of covariates effectively. This delivers two benefits: the computation issue is solved because only a few covariates are included; the collinearity between covariates is decreased. In step 4, the SRSCAD estimates obtained in step 3 are concatenated using the FA instead of linear combination. This procedure aggregates the diversity generated by steps 2 and 3. The flowchart of the proposed method is shown in Figure 1.

To solve the penalized optimization problems efficiently, the thresholding rule (generally referred to as the function [53]), rather than the penalty function, is applied as the main tool throughout this study. The threshold function is defined rigorously as follows.

*Definition* (threshold rule). A threshold rule is a real-valued function defined for and such that(1);(2) for ;(3); and(4) for .

From the definition, it is evident that is an odd monotone unbounded shrinkage rule for , at any . The LASSO and SCAD rules are defined as follows:and is the sign function. The vector versions of and are denoted by and for any vector .

To describe the proposed RS-SRSCAD-FA algorithm in a simple way, we first define a matrix with its th column given by , which indicates the th covariate and all of its associated two-way covariates. The details of the proposed algorithm are given in Algorithm 1 after initialization with , and . When designing the algorithm, the following points should be noted:(i)* Data splitting scheme*. A total of 75% of the original data are going to be applied as training data and the remaining part (25%) is considered to be the test data. The training data is applied to train a forecasting model and the test data is used to evaluate the model performances.(ii)* Convergence*. * Step size*. The step size is used to ensure the convergence of the SRSCAD algorithm (cf. lines 11–19 in Algorithm 1). Usually, it can be determined based on a theoretical analysis with the surrogate function defined. Another way to determine the step size is to apply a line search method. * Stopping criteria*. The error tolerance between two successive iterates and maximum iteration number are determined by trial and error. In particular, the error tolerance is chosen from the set and the maximum number of iterations is selected from . The optimal combination of the error tolerance and maximum iteration number is determined using the trade-off between forecasting accuracy and computational efficiency.(iii)* Acceleration*. Accelerated gradient method (AGM) is applied to increase the convergence speed. The design of AGM was first proposed by [54] and the computation complexity can be reduced from to , where represents the number of iterations of the algorithm.(iv)* Parameter tuning*. In a regularization-based problem, the choice of regularization parameter is critical. In this work, the regularization parameters are selected based on 10-fold CV. The main procedure can be described as follows: the data is divided into parts with roughly equal sizes. The parts will be used to train the model and the left one part is applied to calculate the test error. CV repeats this procedure for times for each candidate regularization parameter and the final CV error is the average of test errors. The optimal parameter is selected based on the smallest CV error. When , this procedure is known as leave-one-out cross-validation (LOOCV), which is often used in complex problems.

Inputs: : , the dataset | |

: the number of blocks generated by Random Splitting | |

: maximum number of iterations in the SRSCAD algorithm | |

: the error tolerance used in the SRSCAD algorithm | |

: number of folds in cross-validation | |

Output: The ensemble test error | |

1: Randomly divide the original data into the training dataset T and test dataset F. | |

2: Establish interactive models QT and QF, respectively | |

3: Randomly divide TIT into blocks with features in each model | |

4: Initialize FA parameters including the maximum number of iterations | |

5: for to do | |

6: Divide the training data into folds | |

7: forto | |

8: Use the th fold as subset data and the remaining folds are regarded | |

as subset data | |

9: Generate grid values of | |

10: forto | |

11: Initialization: , | |

12: Scaling: , | |

13: while or do | |

14: Step 1. | |

15: Step 2. | |

16: Step 3. | |

17: Step 4. | |

18: end while | |

19: end for | |

20: Obtain the solution path and the corresponding sparsity patterns . | |

21: end for | |

22: Calculate CV errors using and . Determine the optimal tuning parameters w.r.t. | |

obtain the optimal SRSCAD estimator | |

23: endfor | |

24: Concatenate the optimal SRSCAD estimates using w and | |

calculate the fitness using the fitness function | |

25: Obtain the optimal weight vector by the FA algorithm, and concatenate the optimal | |

SRSCAD estimates using | |

26: Calculate the forecasting error using the test dataset QF | |

27: Output the ensemble test error |

#### 4. Forecasting Global Solar Radiation in Xinjiang Province of China

In this section, to demonstrate the forecasting accuracy of the proposed RS-SRSCAD-FA algorithm, comparisons are made between this and other forecasting methods including two benchmark models, the Elman neural network (ENN) and LSSVM, the Angstrom–Prescott empirical model (Angstrom–Prescott) [55], popular feature selection models including LASSO, SCAD, and SRL and existing hybrid models including the combination of cuckoo search and square root progressive quantile variable selection procedure in sparse quadratic radial basis function (CS-SRPQVSP-QRBF) [56], particle swarm optimization in a backpropagation neural network (PSO-BPNN) [57], and the firefly algorithm in a support vector machine (SVM-FA) [58]. The experimental results are evaluated using forecasting accuracy and computation efficiency. Furthermore, the analysis of multihour-ahead cases including 24 h ahead and 48 h ahead is also provided in the experiments.

##### 4.1. Data Collection and Description

The Xinjiang area is rich in sunshine and far from the population centers so that it is appropriate for large-scale solar thermal power industry installations. Therefore, it is worthwhile to investigate the global solar radiation by conducting research in eight sites in this region; see Figure 2. This dataset is collected from the National Renewable Energy Laboratory (NREL) and is available at the following website http://www.nrel.gov/gis/solar.html. In addition to the global solar radiation (W/), this data consists of seven meteorological covariates, which are the solar zenith angle (degrees), precipitation (cm), temperature (°C), wind direction (degrees), wind speed (m/s), relative humidity (%), and air pressure (mbar). It was collected between 11:00 and 20:00 from 1/1/2014 to 12/31/2014 as the sunshine is poor during other periods. As an illustrative example, the global solar radiation signal and other meteorological covariates from 1/1/2014 to 1/31/2014 in Site 1 are shown in Figure 3. The main purpose of this paper is to forecast the hourly global solar radiation using these meteorological covariates accurately and efficiently, which could play a dominant role in the design of solar power plants.

##### 4.2. Individual Models

ENN and LSSVM are implemented using the MATLAB neural network toolbox and LSSVMlab 1.5 toolbox. Specifically, a three-layer ENN is established with the number of input neurons , which is the number of covariates in the interactive model. The optimal number of hidden neurons is 5, which is selected from the set using 10-fold CV. The number of output neurons . Thus, ENN uses a network architecture. Furthermore, to estimate the weights between layers more accurately, weight regularization with penalty function is applied during the weight estimation procedure. The regularization parameter is selected from a grid of values denoted by . The optimal regularization parameter is determined as using the Akaike information criterion (AIC) [59]. LSSVM is established by applying an RBF with two kernel parameters selected from the interval using 10-fold CV. SRL is implemented based on the algorithm COORD, the MATLAB code of which may be downloaded from Belloni’s website. Theoretical selection is applied for the selection of regularization parameter in COORD: in the optimization problem (5) recommended in [60], with representing the cumulative distribution function of a Gaussian distribution. For the Angstrom–Prescott empirical model, we apply the empirical model in the form of a second polynomial, which has been proved to be superior to other forms of model by [55]. The possible maximum monthly mean of the daily sunshine duration is given by with representing the location latitude and , where is the number of days in the year. CS-SRPQVSP-QBF, PSO-BPNN, and SVM-FA are implemented using MATLAB and all the model parameters are tuned to give good performances. The proposed RS-SRSCAD-FA is implemented based on Algorithm 1.

##### 4.3. Statistical Measures of Forecasting Performance

In this paper, four common criteria, the mean absolute percentage error (MAPE), root-mean-square error (RMSE), Theil inequality coefficient (TIC), and correlation coefficient (), are used to evaluate the prediction accuracy. The definitions of the criteria are as follows: where is the sample size of the test data, and represent the true value and estimated value, respectively, and and denote the mean value of and . The data is divided into the training dataset (75%) and test dataset (25%). Specifically, the training data is used to establish the forecasting models, and the out-of-sample forecasting performances are evaluated based on the test data. The division procedure is repeated 30 times, and the median of the MAPEs, RMSEs, TICs, and values is reported. Furthermore, the convergence cost (in seconds) is denoted by CC, and the experiments are implemented using a PC with an Intel Core i7 CPU at 3.60 GHz.

##### 4.4. Results Analysis

Table 2 summarizes the forecasting results and computation cost of all the compared approaches at Sites 1–8. As the median is more robust to the outliers than the mean, we focus on the median when making comparisons. Apparently, the performance of the proposed RS-SRSCAD-FA is remarkable because it provides the lowest RMSE and TIC values, whereas its MAPEs are marginally higher than those of other forecasting methods. For instance, RMSEs of RS-SRSCAD-FA are 20.21, 21.93, 21.20, and 20.29 W/ at sites 1–4, which are approximately 20.37%, 13.22%, 18.99%, and 22.65% higher than these of ENN. The Angstrom–Prescott model performs better than both CS-SRPQVSP-QRBF and SVM-FA in terms of accuracy, except for Site 8. For instance, RMSEs of the Angstrom–Prescott model are given as 23.13, 24.13, 24.63, 25.14, 24.74, 22.32, and 22.22 W/, which are lower than those of CS-SRPQVSP-RQBF at 25.78, 25.16, 25.62, 22.34, 27.41, 22.65, and 22.89 W/. CS-SRPQVSP-QRBF delivers better results than SVM-FA except for Sites 1 and 2. Both CS-SRPQVSP-QRBF and SVM-FA outperform PSO-BPNN in terms of MAPE, RMSE, and TIC. The reasons for this are probably that CS and FA are superior over PSO in parameter estimation. However, PSO-BPNN still produces better outcomes than LSSVM, LASSO, SCAD, and SRL, which demonstrates that the hybrid forecasting methods have proved the model performance to some extent.

LASSO, SCAD, and SRL provide comparable forecasting results and training times that are much shorter than other methods. Although LSSVM is computationally rapid, its performance is not as good as ENN and provides the largest forecasting errors at Sites 2–7. The Angstrom–Prescott model is more computationally efficient than CS-SRPQVSP-QRBF, PSO-BPNN, and SVM-FA, which take more training time. For instance, in Site 4, the computational time of the Angstrom–Prescott model is 4.06 s, and those of CS-SRPQVSP-QRBF, PSO-BPNN, and SVM-FA are 415.35, 502.35, and 408.56 s, respectively. The reason why CS-SRPQVSP-QRBF takes more computation time is that it considers the two-way interaction terms and the selection of parameters in the RBF also takes some time. The the low training speed of PSO-BPNN and SVM-FA is that the models are retrained when the parameters changed. Furthermore, both ENN and RS-SRSCAD-FA compute slowly as their model structures are more complex than those of other models. However, this is an acceptable trade-off between forecasting accuracy and computational efficiency because the accuracy is improved at the cost of more computation time. The correlation coefficients of the compared methods are given in the last column of the table. RS-SRSCAD-FA is still the best by giving the highest values. For instance, the values of RS-SRSCAD-FA are 0.98, 0.96, 0.92, 0.94, 0.94, 0.91, 0.95, and 0.93 for Sites 1—8, respectively. The Angstrom–Prescott model also provides comparable values, which are also higher than 0.90. LSSVM only produces values of 0.97 for Site 1 and around 0.80 for the other sites. The performances of the compared methods on cases 24-hour ahead forecasting and 48-hour ahead forecasting are listed in Tables 3 and 4, respectively. The results are consistent with what we have discussed above. It is observed that RS-SRSCAD-FA provides remarkable results followed by the Angstrom–Prescott model and CS-SRPQVSP-QRBF. The RMSEs of RS-SRSCAD-FA for 24 hours ahead are 13.11, 16.90, 13.72, 14.49, 13.16, 14.71, 13.70, and 14.53 W/ for Sites 1—8, respectively. The correlation coefficients of RS-SRSCAD-FA are given around 0.90, which are also higher than other methods. The RMSEs of LASSO, SCAD, and SRL are approximately 20–21 , which are comparable to but worse than those of CS-SRPQVSP-QRBF, PSO-BPNN, and SVM-FA (approximately 14–18 W/). Similar phenomena can also be seen in Table 4 for the case of 48-hour ahead forecasting.

The boxplots shown in Figure 4 clearly demonstrate that the forecasting values of RS-SRSCAD-FA are critically lower and a few outliers are observed (the red lines represent the median values). Figures 5 and 6 exhibit the relationship between the measured data and the estimated values in the solar radiation datasets. Apparently, the points of RS-SRSCAD-FA are closer to the straight line, which demonstrates its satisfactory performances. ENN demonstrates better forecasting results than the other penalized feature selection methods. Figure 7 exhibits the errors between actual values and forecast values provided by the proposed RS-SRSCAD-FA. It is obvious that the RS-SRSCAD-FA values closely match the actual data in almost all time periods. That is, RS-SRSCAD-FA consistently displays the best forecasting performance over a vast majority of the time period.

##### 4.5. Statistical Comparisons of Results over Multiple Real Datasets

Statistical tests for comparison of the proposed algorithm with other existing algorithm are more reliable and trustworthy on multiple datasets than on a single dataset [61]. To make a fair comparison, a nonparametric Friedman test is applied to evaluate the forecasting and selection performances of the algorithms in this global solar radiation datasets. The Friedman test examines whether the measured average rank is significantly different from the mean rank that is likely under the null hypothesis. Table 5 reveals the ranks of the MAPEs, RMSEs, and TICs of the proposed approach and of the other competitors. It is convenient to calculate the and values as shown in the table. Using the six approaches and eight datasets, is distributed according to and degrees of freedom. The critical value for the statistic for is 1.85; therefore, the null hypothesis is rejected, which indicates that there are differences between these algorithms and more tests are needed.

To confirm the advantage, the newly proposed RS-SRSCAD-FA method is regarded as the control algorithm, and the other competitors are tested against it. The most straightforward method is to calculate critical difference (CD) with the Bonferroni–Dunn test [62]. According to [61], the critical value for for the 10 methods is 2.326. Thus, CD is . It is evident that the forecasting models except LSSVM, PSO-BPNN, and SVM-FA yield comparable MAPEs as the differences between their average ranks are less than the CD value 3.52. For example, the difference between SRL and RS-SRSCAD-FA is less than 3.52 (). In terms of RMSE, RS-SRSCAD-FA performs better than LSSVM, LASSO, and SCAD, and SRL and is comparable with ENN, Angstrom–Prescott, CS-SRPQVSP-QRBF, and SVM-FA based on the differences between the average ranks. For instance, the RMSE differences between LASSO and RS-SRSCAD-FA are larger than 3.52 (). The differences between Angstrom–Prescott and RS-SRSCAD-FA are less than 3.52 (). Similar phenomena can be observed from the aspect of TIC and values: most of the results of the nonparametric Freidman test are consistent with our observations. Unlike ENN and SVM-FA, RS-SRSCAD-FA provides an interpretable model that does not employ all the covariates. Therefore, RS-SRSCAD-FA exhibits a higher prediction capacity relative to other forecasting methods.

#### 5. Conclusions

Forecasting solar radiation is fundamental to solar power technology. For the utilization and conversion of solar power, accurate and continuous radiation data is essential. Therefore, establishing an accurate and efficient forecasting model plays a vital role in global solar radiation forecasting. To overcome the drawbacks of a single model, which yields low forecasting accuracy, a novel global solar radiation forecasting method called RS-SRSCAD-FA has been proposed. The main structure of the novel method is ensemble learning for enhancing the forecasting accuracy and model stability. An efficient covariate selection method SRSCAD is used in the input space; simultaneously, a weight vector, which best represents the importance of each individual model in the ensemble system, is determined by the FA. To illustrate the validity and superiority of the proposed method, the datasets from eight locations in the Xinjiang area of China have been applied as real data examples. The results demonstrate that the proposed RS-SRSCAD-FA method is superior to other forecasting approaches.

#### Data Availability

The data used to support the findings of this study are included within the article.

#### Conflicts of Interest

The authors declare that the received funding did not lead to any conflicts of interest regarding the publication of this manuscript and there are no conflicts of interest regarding the publication of this article.

#### Acknowledgments

This research was supported by the National Natural Science Foundation of China (Grant Nos. 71761016 and 71861012), the Natural Science Foundation of Jiangxi, China (Grant Nos. 20181BAB211020 and 20171BAA218001), the China Postdoctoral Science Foundation (Grant Nos. 2017M620277 and 2018T110654), and Scientific Research Fund of Jiangxi Provincial Education Department (Grant No. GJJ180287).