#### Abstract

Global solar radiation (GSR) is a critical variable for designing photovoltaic cells, solar furnaces, solar collectors, and other solar energy applications. In Nepal, the high initial cost and subsequent maintenance cost of the instruments required to measure GSR have restricted their use across the country. The current study compares six temperature-based empirical models, artificial neural network (ANN) models, and five other machine learning (ML) models for estimating daily GSR from readily available meteorological data at Biratnagar Airport. Among the temperature-based models, the model developed by Fan et al. performs best, with an $R^2$ of 0.7498 and RMSE of . A feed-forward multilayer perceptron (MLP) is used to model daily GSR with extraterrestrial solar radiation, sunshine duration, maximum and minimum ambient temperature, precipitation, and relative humidity as inputs. ANN3 performs better than the other ANN models, with an $R^2$ of 0.8446 and RMSE of . Likewise, stepwise linear regression performs better than the other ML models, with an $R^2$ of 0.8870 and RMSE of . Thus, the model developed by Fan et al. is recommended for estimating daily GSR in regions where only ambient temperature data are available. Similarly, the more robust ANN3 and stepwise linear regression models are recommended for regions where sunshine duration, maximum and minimum ambient temperature, precipitation, and relative humidity data are available.

#### 1. Introduction

Some of the critical global issues currently facing human civilization are global warming and environmental pollution. These are driven in particular by the excessive use of fossil fuels, such as petroleum products and natural gas, and of traditional fuels, such as timber and firewood [1]. Nepal is a developing nation where 60% of the population is engaged in agriculture, and it depends disproportionately on traditional fuels [2]. However, clean and renewable solar energy has attracted increasing attention from both the government and the private sector in recent years. Global solar radiation (GSR) is a critical variable in hydrology, meteorology, agriculture, and renewable energy. In the renewable energy sector, GSR data are needed to predict the capacity and efficiency of solar energy devices such as photovoltaic cells, solar furnaces, and solar collectors. However, unlike routine meteorological parameters, daily GSR data are not readily available at many locations around the world [3], particularly in developing countries such as Nepal. The lack of measured GSR data has led to the development of several methods for estimating GSR, namely, neural networks [4, 5], empirical models [6, 7], stochastic algorithms [8], and satellite-based methods [9]. Despite these developments, the empirical method based on meteorological data is still preferred because of the cost and technical constraints associated with newer methods and technologies [3, 6, 7, 10].

According to the Department of Hydrology and Meteorology (DHM), only 284 meteorological stations are currently in operation in Nepal. Of these, only 64 are equipped with a pyranometer to measure daily GSR, and only 34 have the infrastructure needed to measure daily sunshine duration [11]. Daily GSR data are therefore unavailable for most locations in the country, and other meteorological parameters must be used to estimate daily GSR instead.

As of 2019, more than 294 empirical models were available for estimating GSR from readily available meteorological data [7]. The major categories of empirical models include sunshine-based models [12], temperature-based models [13], cloudiness-based models [14], and complex models. Angstrom pioneered the estimation of GSR with a linear empirical model, which was later modified by Prescott [15]. The Angstrom–Prescott (A-P) model is applied extensively worldwide because it is simple and because sunshine duration correlates strongly with GSR [12, 16, 17]. Since the development of the A-P model, one or more meteorological parameters have been added to the original model to improve its estimates [15]. Sunshine-based models perform better than models based on other meteorological parameters because sunshine duration is strongly correlated with GSR [18–20]. However, the high initial investment and maintenance cost of sunshine-measuring instruments constrain the widespread application of sunshine-based models. Therefore, empirical models that use readily available meteorological parameters such as ambient air temperature, relative humidity, and precipitation are widely preferred.
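For reference, the Angstrom–Prescott relation is usually written in the linear form below. Here $H$ is daily GSR, $H_0$ is daily extraterrestrial radiation, and $S$ is the measured sunshine duration. $S_0$ is the maximum possible sunshine duration (the day length), and $a$ and $b$ are site-specific regression coefficients. These symbols are introduced for illustration only; the A-P model is not among the models evaluated in this study.

$$
\frac{H}{H_0} = a + b\,\frac{S}{S_0}
$$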

Ambient temperature data are the most readily available meteorological data. One of the simplest temperature-based models, which uses the mean monthly maximum and minimum temperatures as inputs, was proposed by Hargreaves and Samani (H-S) [21]. Since the H-S model was introduced, several modifications incorporating other meteorological parameters have been developed to improve its performance. Hassan et al. modified the H-S model by adding a precipitation term, and the modified model performed better than two of the most effective sunshine-based models in the literature [13]. Jahani et al. recently developed two new polynomial models that outperformed several temperature-based models from the literature [3]. All temperature-based models are empirical. However, they rest on the assumption that the variation in ambient temperature depends largely on the solar radiation reaching the Earth's surface [22].

Although empirical models have been widely analyzed and evaluated, their performance varies with geographical location and local climate [6]. Recently, several machine learning (ML) models have been used to estimate GSR at various locations [4, 5]. ML models have three main advantages: they generalize well, they are efficient in time and computational capacity, and they can solve problems that are difficult to represent with an explicit algorithm [23, 24]. The main ML models currently in use include artificial neural networks (ANN), support vector machines (SVM), genetic programming (GP), random forests (RF), and adaptive neuro-fuzzy inference systems (ANFIS). The most widely applied ANN models include the radial basis function network (RBFN) and the multilayer perceptron (MLP). Behrang et al. [4] concluded that MLP was more accurate than RBFN for estimating GSR in Iran. Belaid and Mellit [25] applied SVM with different input combinations and concluded that it required fewer input parameters than ANN to achieve better accuracy.

The current study compares temperature-based empirical models with ML models. The most common type of feed-forward network, MLP, is used to estimate GSR at Biratnagar Airport, located in the Eastern Terai Belt of Nepal. The objectives of the study are as follows:

1. To analyze the performance of six temperature-based empirical models for estimating daily GSR
2. To apply ANN and other ML models to estimate daily GSR
3. To compare the aforementioned models and recommend the best model for estimating GSR

#### 2. Materials and Methods

##### 2.1. Study Location and Data

Nepal is situated between ° N and ° N latitude in the temperate zone. Nepal receives 300 days of sunshine annually, with an annual average solar radiation of [26]. Fourteen months of daily data on various meteorological parameters at Biratnagar Airport (26.4840° N latitude, 87.2662° E longitude, and 236 m altitude) were obtained from DHM. The average annual temperature in Biratnagar is °C, with an average annual rainfall of [27].

A CMP6 pyranometer is used to measure daily GSR on a horizontal surface. The pyranometer consists of a blackened thermopile that absorbs solar radiation and converts it into heat. The thermopile generates a voltage output, which is calibrated to indicate GSR. A data logger records the measured daily GSR. A Campbell–Stokes sunshine recorder is used to measure sunshine duration. Wet-bulb and dry-bulb temperatures are measured with mercury-filled meteorological thermometers at normal ambient temperatures and with alcohol-filled thermometers at low ambient temperatures. Relative humidity is computed from the dry-bulb and wet-bulb temperatures.
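The text does not state how DHM converts psychrometer readings to relative humidity. The snippet below is a hedged sketch that uses the FAO-56 psychrometric formulation instead:

- The saturation vapour pressure follows a Magnus-type expression.
- The actual vapour pressure is the saturation value at the wet-bulb temperature minus a psychrometric correction.
- Station pressure is estimated from the 236 m elevation.

The psychrometer coefficient `a_psy` (0.000662 °C⁻¹, the FAO-56 value for ventilated psychrometers) and the function names are assumptions, not DHM's procedure.

```python
import math


def saturation_vapour_pressure(t_c: float) -> float:
    """Saturation vapour pressure (kPa) at air temperature t_c (deg C), FAO-56 form."""
    return 0.6108 * math.exp(17.27 * t_c / (t_c + 237.3))


def station_pressure(z_m: float) -> float:
    """Approximate atmospheric pressure (kPa) at elevation z_m (m), FAO-56 form."""
    return 101.3 * ((293.0 - 0.0065 * z_m) / 293.0) ** 5.26


def relative_humidity(t_dry: float, t_wet: float, z_m: float = 236.0,
                      a_psy: float = 0.000662) -> float:
    """Relative humidity (%) from dry- and wet-bulb temperatures (deg C).

    a_psy is an ASSUMED psychrometer coefficient (1/deg C) for a ventilated
    psychrometer; the coefficient DHM actually uses is not stated.
    """
    gamma = a_psy * station_pressure(z_m)  # psychrometric constant, kPa/deg C
    e_actual = saturation_vapour_pressure(t_wet) - gamma * (t_dry - t_wet)
    rh = 100.0 * e_actual / saturation_vapour_pressure(t_dry)
    return max(0.0, min(100.0, rh))  # clip to the physical range


print(round(relative_humidity(30.0, 25.0), 1))
```

With a dry bulb of 30 °C and a wet bulb of 25 °C, this sketch gives roughly 67%. Equal readings give 100%.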

##### 2.2. Temperature-Based Empirical Models

Several models relate ambient air temperature to GSR empirically. Two of the most widely used empirical models for estimating GSR from ambient air temperature data alone are the Hargreaves and Samani (H-S) model and the Bristow and Campbell (BC) model. These two models, along with four recently developed models reported to be highly accurate, are selected for the present study.

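Hargreaves and Samani [21] proposed a simple model for estimating daily GSR. It uses the mean monthly maximum temperature $T_{\max}$ and mean monthly minimum temperature $T_{\min}$ as inputs:

$$\frac{H}{H_0} = a\,(\Delta T)^{0.5},$$

where $H$ is the daily GSR, $H_0$ is the daily extraterrestrial solar radiation, $\Delta T = T_{\max} - T_{\min}$, and $a$ is the empirical constant.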
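In its common form $H/H_0 = a\,(T_{\max}-T_{\min})^{0.5}$, the H-S model is linear in its single coefficient and has no intercept. Its least-squares calibration therefore reduces to a closed form. The sketch below assumes that form and uses hypothetical function names. It illustrates the fitting step only and is not the authors' code.

```python
import math
from typing import List, Sequence


def fit_hargreaves_samani(h: Sequence[float], h0: Sequence[float],
                          t_max: Sequence[float], t_min: Sequence[float]) -> float:
    """Least-squares estimate of a in H/H0 = a * (Tmax - Tmin) ** 0.5.

    With no intercept, minimizing sum((k_t - a*s)**2) gives
    a = sum(k_t * s) / sum(s**2), where k_t = H/H0 and s = sqrt(Tmax - Tmin).
    """
    num = den = 0.0
    for hi, h0i, tx, tn in zip(h, h0, t_max, t_min):
        s = math.sqrt(tx - tn)  # assumes Tmax >= Tmin on every day
        num += (hi / h0i) * s
        den += s * s
    return num / den


def predict_hargreaves_samani(a: float, h0: Sequence[float],
                              t_max: Sequence[float],
                              t_min: Sequence[float]) -> List[float]:
    """Daily GSR predicted by the calibrated model, in the units of h0."""
    return [a * math.sqrt(tx - tn) * h0i for h0i, tx, tn in zip(h0, t_max, t_min)]
```

Fitting on the calibration subset and then calling `predict_hargreaves_samani` on the assessment subset mirrors the calibrate-then-assess workflow described in the text.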

Chen and Li [28] developed more than 20 temperature-based empirical models and analyzed their performance. Two of their top-performing models are used in the current study: one takes the ambient temperature range as input, and the other takes the mean monthly maximum and minimum temperatures. The first adds a constant term to the original H-S model with an exponent of "1" (abbreviated as Chen and Li (model 1)); it has two empirical constants.

The second is a multiple regression model that takes the mean monthly maximum and minimum temperatures as inputs (abbreviated as Chen and Li (model 2)); it has four empirical constants.

Bristow and Campbell [29] developed a model based on the assumption that GSR is exponentially related to the ambient temperature range:

$$\frac{H}{H_0} = a\left[1 - \exp\left(-b\,\Delta T^{c}\right)\right],$$

where $a$, $b$, and $c$ are the empirical constants.
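The Bristow–Campbell relation is commonly written as $H/H_0 = a[1-\exp(-b\,\Delta T^{c})]$. The hedged sketch below assumes that form to show its saturating behaviour: the ratio grows with the temperature range and levels off at $a$. The coefficients are illustrative placeholders, not the calibrated values in Table 2. Because the model is nonlinear in $b$ and $c$, calibrating it requires an iterative least-squares solver rather than a closed-form solution.

```python
import math


def bristow_campbell(delta_t: float, a: float, b: float, c: float) -> float:
    """Clearness index H/H0 = a * (1 - exp(-b * dT**c)) for temperature range dT (deg C)."""
    return a * (1.0 - math.exp(-b * delta_t ** c))


# Illustrative (NOT fitted) coefficients: the ratio rises with dT and
# approaches a as the temperature range grows.
for dt in (2.0, 8.0, 14.0):
    print(dt, round(bristow_campbell(dt, a=0.7, b=0.01, c=2.0), 3))
```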

Jahani et al. [3] proposed a model that assumes a polynomial relationship between GSR and the ambient temperature range; it has four empirical constants.

Fan et al. [6] modified the Jahani model to improve its performance in two ways: they used a different exponent on the ambient temperature range term and added an average temperature term. The model has five empirical constants.

Extraterrestrial GSR is computed following [30].

The solar declination angle is determined following [31].

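The day length $N$ (h) is calculated as follows:

$$N = \frac{2}{15}\,\omega_s,$$

where the sunset hour angle $\omega_s$ is expressed in degrees.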

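The sunset hour angle is calculated as follows:

$$\omega_s = \cos^{-1}\left(-\tan\varphi\,\tan\delta\right),$$

where $\varphi$ is the latitude and $\delta$ is the solar declination angle.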
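The daily astronomical quantities used by the temperature-based models can be computed in a few lines. As a sketch, the snippet below uses the FAO-56 (Allen et al., 1998) formulations:

- a solar constant of 0.0820 MJ m⁻² min⁻¹;
- the inverse relative Earth–Sun distance;
- a sine approximation for the declination;
- $\omega_s = \arccos(-\tan\varphi\tan\delta)$;
- a day length of $24\,\omega_s/\pi$ hours.

These may differ in detail from the variants in [30] and [31], and the function name is hypothetical.

```python
import math

G_SC = 0.0820  # solar constant, MJ m^-2 min^-1 (FAO-56 value; an assumption here)


def solar_geometry(lat_deg: float, doy: int):
    """Return (declination rad, sunset hour angle rad, day length h, H0 MJ m^-2 day^-1)."""
    phi = math.radians(lat_deg)
    dr = 1.0 + 0.033 * math.cos(2.0 * math.pi * doy / 365.0)    # inverse relative Earth-Sun distance
    delta = 0.409 * math.sin(2.0 * math.pi * doy / 365.0 - 1.39)  # solar declination (rad)
    x = max(-1.0, min(1.0, -math.tan(phi) * math.tan(delta)))    # clip for polar day/night
    omega_s = math.acos(x)                                       # sunset hour angle (rad)
    day_length = 24.0 / math.pi * omega_s                        # hours; same as 2*omega_s(deg)/15
    h0 = (24.0 * 60.0 / math.pi) * G_SC * dr * (
        omega_s * math.sin(phi) * math.sin(delta)
        + math.cos(phi) * math.cos(delta) * math.sin(omega_s))
    return delta, omega_s, day_length, h0


# Biratnagar Airport (26.4840 N) near the June solstice
_, _, n_hours, h0 = solar_geometry(26.4840, 172)
print(round(n_hours, 2), round(h0, 1))
```

For day 172 at Biratnagar, this sketch gives a day length of about 13.7 h and $H_0$ of about 41 MJ m⁻² day⁻¹.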

##### 2.3. Machine Learning Models

Among the many ANN topologies available in the literature, the multilayer perceptron (MLP) is used in the current study. MLP is particularly well suited to modeling complex problems. Figure 1 illustrates the structure of an MLP, which consists of an input layer, a hidden layer, and an output layer. Input signals are multiplied by a set of weights as they pass through the hidden layer to the output layer.

In a typical MLP with one hidden layer, each hidden neuron computes a weighted sum of the inputs plus a hidden-layer bias term [32]. A nonlinear activation function (typically a sigmoid) is then applied to calculate the output of each neuron [33].
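The sketch below shows the forward pass of an MLP with one hidden layer of logistic-sigmoid neurons and a linear output neuron, a common choice for regression. The weights would come from training (Levenberg–Marquardt in this study). MATLAB's fitting networks typically use the hyperbolic-tangent sigmoid in the hidden layer, so `math.tanh` could be substituted to mirror that. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Logistic sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-z))


def mlp_forward(x: Sequence[float], w_hidden: List[List[float]],
                b_hidden: Sequence[float], w_out: Sequence[float],
                b_out: float) -> float:
    """One-hidden-layer MLP: sigmoid hidden neurons, linear output neuron.

    w_hidden[j] holds the input weights of hidden neuron j and b_hidden[j] its bias;
    w_out and b_out combine the hidden outputs into a single prediction (e.g. daily GSR).
    """
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    return sum(v * h for v, h in zip(w_out, hidden)) + b_out
```

In this study, `x` would hold the six inputs listed in Section 2.4. The number of rows in `w_hidden` is the hidden-layer size that is varied across ANN1 to ANN5.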

Support vector machine (SVM) is a powerful supervised learning technique with excellent generalization ability. It is therefore widely used for pattern recognition, classification, regression, and prediction [34]. For regression, SVM can be combined with various kernel functions. The Gaussian kernel is popular for both SVM classification and regression, because Gaussian-based SVM can represent very complex boundaries and relationships. The medium Gaussian and fine Gaussian variants differ in the width of the Gaussian function. Fine Gaussian uses a very narrow Gaussian function with a very low standard deviation, and the resulting finely separated boundaries make it susceptible to overfitting. The linear regression learner performs multivariate linear regression on the input data. Stepwise linear regression, on the other hand, retains only the variables that are strongly correlated with the output and removes redundant, weakly correlated ones.
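The difference between the fine and medium Gaussian variants can be illustrated with the kernel $k(\mathbf{x},\mathbf{z}) = \exp(-\lVert\mathbf{x}-\mathbf{z}\rVert^2/s^2)$, where $s$ is the kernel scale. To our understanding, MATLAB's Regression Learner presets use $s=\sqrt{P}$ for medium and $s=\sqrt{P}/4$ for fine Gaussian SVM, where $P$ is the number of predictors. Treat these values as assumptions. With a narrow kernel, the similarity between two moderately distant points collapses toward zero. Each training point then influences only a tiny neighbourhood, which is why the fine variant overfits more easily.

```python
import math
from typing import Sequence


def gaussian_kernel(x: Sequence[float], z: Sequence[float], scale: float) -> float:
    """k(x, z) = exp(-||x - z||^2 / scale^2); a smaller scale gives a narrower kernel."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / scale ** 2)


P = 6                       # number of predictors used in this study
medium = math.sqrt(P)       # ASSUMED "medium" kernel scale
fine = math.sqrt(P) / 4.0   # ASSUMED "fine" kernel scale

x, z = [0.0] * P, [0.5] * P  # two standardized input vectors
print(round(gaussian_kernel(x, z, medium), 3), round(gaussian_kernel(x, z, fine), 3))
```

For these two points, the medium kernel gives about 0.779 and the fine kernel about 0.018.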

##### 2.4. Model Training and Testing

The fourteen-month dataset is divided into two subsets. The first 85% of the data is used to develop the ML models and calibrate the empirical models, and the remaining 15% is used for model assessment. The present study uses extraterrestrial solar radiation, sunshine duration, maximum ambient temperature, minimum ambient temperature, precipitation, and relative humidity as inputs for developing the ML models.
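The phrase "initial 85%" implies a chronological cut rather than a random one. A minimal sketch follows, assuming the daily records are already sorted by date. Keeping the order means all assessment days follow the development days, so no later information leaks into calibration. The function name is hypothetical.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def chronological_split(records: Sequence[T],
                        train_fraction: float = 0.85) -> Tuple[List[T], List[T]]:
    """Split time-ordered records without shuffling.

    The first train_fraction of the records is used for model development
    (training/calibration); the remainder is kept for model assessment.
    """
    cut = int(round(len(records) * train_fraction))
    return list(records[:cut]), list(records[cut:])
```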

The performance of the neural network is analyzed by varying the number of neurons in the hidden layer and recording the corresponding statistical indicators. Neural Net Fitting [35], a built-in MATLAB application, is used to design and train the MLP, with the Levenberg–Marquardt backpropagation algorithm used for training. Training terminates when generalization stops improving, as indicated by an increase in the mean square error and a corresponding decrease in $R^2$ on the validation samples. Regression Learner [36], another built-in MATLAB application, is used to analyze and evaluate the performance of five models: linear regression, stepwise linear regression, medium Gaussian SVM, Matern 5/2 Gaussian process regression (GPR), and exponential GPR.
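The stopping rule above can be sketched as follows. This is a simplified, hedged illustration of validation-based early stopping. The parameter name `max_fail` and its default of 6 mirror, to our understanding, MATLAB's training parameter of the same name. The loop itself is hypothetical and only tracks a precomputed validation-error curve.

```python
from typing import Sequence


def early_stopping_epoch(val_mse: Sequence[float], max_fail: int = 6) -> int:
    """Return the epoch whose weights would be kept under validation-based early stopping.

    Training stops once the validation MSE has failed to improve for max_fail
    consecutive epochs; the weights from the best epoch seen so far are kept.
    """
    best_epoch, best_mse, fails = 0, float("inf"), 0
    for epoch, mse in enumerate(val_mse):
        if mse < best_mse:
            best_epoch, best_mse, fails = epoch, mse, 0
        else:
            fails += 1
            if fails >= max_fail:
                break  # generalization has stopped improving
    return best_epoch
```

Returning the best epoch rather than the last one reflects the intent of early stopping: the network is kept at the point where validation error was lowest, before it began to overfit.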

##### 2.5. Statistical Indicators

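Four statistical indicators are used to evaluate the performance of the models: the coefficient of determination ($R^2$), adjusted $R^2$ ($\bar{R}^2$), mean square error (MSE), and root mean square error (RMSE). They are computed as follows:

$$R^2 = \frac{\left[\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})\right]^2}{\sum_{i=1}^{n}(x_i-\bar{x})^2\,\sum_{i=1}^{n}(y_i-\bar{y})^2}$$

$$\bar{R}^2 = 1-\left(1-R^2\right)\frac{n-1}{n-p-1}$$

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}(x_i-y_i)^2, \qquad \mathrm{RMSE} = \sqrt{\mathrm{MSE}}$$

where $x_i$ and $y_i$ are the measured and predicted values, $\bar{x}$ and $\bar{y}$ are the average measured and average estimated values, $n$ is the number of data points, and $p$ is the number of independent regressors.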
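The following sketch computes the four indicators. It assumes $R^2$ is the squared Pearson correlation between measured and predicted values, which is consistent with a definition that uses both the average measured and the average estimated values. Adjusted $R^2$ is taken as $1-(1-R^2)(n-1)/(n-p-1)$. The sketch also shows why $R^2$ and RMSE can rank models differently: a prediction that is off by a constant bias is perfectly correlated with the measurements ($R^2 = 1$) yet still has a nonzero RMSE. The function name is hypothetical.

```python
import math
from typing import Dict, Sequence


def evaluate(measured: Sequence[float], predicted: Sequence[float], p: int) -> Dict[str, float]:
    """R^2 (squared Pearson correlation), adjusted R^2, MSE, and RMSE.

    p is the number of independent regressors; requires n > p + 1.
    """
    n = len(measured)
    mx = sum(measured) / n
    my = sum(predicted) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(measured, predicted))
    sxx = sum((x - mx) ** 2 for x in measured)
    syy = sum((y - my) ** 2 for y in predicted)
    r2 = sxy ** 2 / (sxx * syy)
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    mse = sum((x - y) ** 2 for x, y in zip(measured, predicted)) / n
    return {"R2": r2, "adj_R2": adj_r2, "MSE": mse, "RMSE": math.sqrt(mse)}
```

For measured values [1, 2, 3, 4], predictions [2, 3, 4, 5], and p = 1, the function returns R² = 1 and RMSE = 1.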

RMSE indicates the short-term performance of a model; a lower RMSE corresponds to better performance. $R^2$ indicates the proportion of the variance in the dependent variable that is explained by the independent variables.

#### 3. Results and Discussion

##### 3.1. Calibration of Temperature-Based Empirical Models

Figure 2 illustrates the correlation between the measured and estimated daily GSR for the calibration dataset. The estimated GSR is reasonably well correlated with the measured GSR for all models. According to the statistical indicators (Table 1), the model developed by Fan et al. shows the best correlation among the temperature-based models, with an $R^2$ of 0.6435 and a corresponding adjusted $R^2$ of 0.6373. In terms of RMSE, the model developed by Fan et al. again performs better than the other temperature-based models, with an RMSE of . Chen and Li (model 2) ranks second among the temperature-based models, with $R^2$, adjusted $R^2$, and RMSE of 0.6248, 0.6200, and , respectively. The model developed by Fan et al. is therefore a reasonable choice for regions where only ambient temperature range data are available.

**Figure 2.** Correlation between measured and estimated daily GSR for the calibration dataset (panels (a)–(f)).

The least-squares method is used to fit the empirical coefficients of the temperature-based models. The coefficients obtained for all temperature-based models are listed in Table 2.

##### 3.2. Training and Validation of ANN Models

Figures 3 and 4 illustrate the correlation between the daily GSR estimated by ANN and the measured daily GSR for 6 and 10 neurons in the hidden layer, respectively. Table 3 summarizes the statistical indicators for the ANN models. For the training set, the model with 10 neurons in the hidden layer (abbreviated as ANN5) performs better than the other ANN models, with an $R^2$ of 0.8485 and RMSE of . ANN3 ranks second in the training set, with an $R^2$ of 0.8341 and RMSE of . A comparison of the statistical indicators shows that $R^2$ and RMSE do not always rank the models in the same order. The same holds for the validation set: although ANN5 is the best model in terms of $R^2$, it exhibits a comparatively higher RMSE of . During model development, the ANN models explain more of the variance in the training data than the temperature-based empirical models.

##### 3.3. Training and Validation of Other ML Models

Figure 5 illustrates the correlation between the daily GSR estimated by the various ML models and the measured daily GSR. To reduce the risk of overfitting, 5-fold cross-validation is performed during model development. The statistical indicators for these models are summarized in Table 4.

**Figure 5.** Correlation between measured daily GSR and daily GSR estimated by the ML models (panels (a)–(e)).

Medium Gaussian SVM performs better than the other models, with an $R^2$ of 0.79 and RMSE of . GPR Matern 5/2 ranks second, with an $R^2$ of 0.79 and RMSE of . The performance of GPR exponential degrades substantially under cross-validation, which might be attributed to overfitting of the dataset. In contrast, cross-validation has little or no effect on the performance of the linear regression and stepwise linear regression models. During model development, most of these ML models perform considerably better than the temperature-based empirical models. However, ANN5 accounts for more of the variance in the training data than these ML models.
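5-fold cross-validation partitions the development data into five folds. Each model is trained on four folds and scored on the held-out fold, and the five scores are averaged. For simplicity, the sketch below builds contiguous, unshuffled folds. To our understanding, MATLAB's Regression Learner partitions the data randomly, so this illustrates the idea rather than reproducing its partitioning. The function name is hypothetical.

```python
from typing import List, Tuple


def k_fold_indices(n: int, k: int = 5) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n-1 into k contiguous folds.

    Returns one (train_indices, held_out_indices) pair per fold; every index is
    held out exactly once, and fold sizes differ by at most one.
    """
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        held_out = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        folds.append((train, held_out))
        start += size
    return folds
```

A model that scores well on its training folds but poorly on the held-out folds is overfitting, which is the effect the text reports for GPR exponential.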

##### 3.4. Performance Comparison of Empirical and ML Models

After model development, the models are assessed on the unseen 15% of the data. Figures 6 and 7 illustrate the RMSE and $R^2$, respectively, of the empirical and ML models on the test data. All of the temperature-based empirical models perform reasonably well on the test data. Among them, the model developed by Fan et al. outperforms the others, with an $R^2$ of 0.7498 and RMSE of . As before, $R^2$ and RMSE do not always rank the models in the same order. For example, the H-S model ranks second among the empirical models in terms of RMSE while exhibiting an $R^2$ of only 0.7323. The model proposed by Jahani et al. also performs reasonably well, with an $R^2$ of 0.7356 and RMSE of .

The performance of the ANN models on the test data is not consistent with their performance during model development. Among the ANN models, ANN3 performs best, with an $R^2$ of 0.8446 and RMSE of . ANN1 ranks second, with an $R^2$ of 0.8134 and RMSE of . The performance of ANN4 and ANN5 degrades substantially in the model assessment compared with model development, primarily because they overfit the training data during model development.

As with the ANN models, the performance of the other ML models on the test data is not consistent with their performance during model development, and $R^2$ and RMSE again rank the models differently. In terms of $R^2$, stepwise linear regression performs better than the other ML models, with an $R^2$ of 0.8870 and RMSE of . In terms of RMSE, the linear regression learner performs better than the other ML models, with an $R^2$ of 0.8102 and RMSE of . The performance of medium Gaussian SVM, GPR Matern 5/2, and GPR exponential degrades substantially in the model assessment compared with model development, primarily because these models overfit the training data during model development.

#### 4. Conclusion

The present study analyzes and evaluates six temperature-based empirical models, ANN models, and five other ML models for estimating daily GSR at Biratnagar Airport. First, the six temperature-based empirical models, with ambient temperature range and daily average temperature as inputs, are calibrated. In the model assessment, the model developed by Fan et al. performs better than the other temperature-based models, with an $R^2$ of 0.7498 and RMSE of . The performance of ANN models with more neurons in the hidden layer degrades substantially in the model assessment because a large number of neurons tends to overfit the dataset. In the model assessment, ANN3 performs better than the other ANN models, with an $R^2$ of 0.8446 and RMSE of , and ANN1 ranks second, with an $R^2$ of 0.8134 and RMSE of . Five ML models available in the MATLAB Regression Learner are analyzed and evaluated to determine the best-performing ML model. The performance of medium Gaussian SVM, GPR Matern 5/2, and GPR exponential degrades substantially in the model assessment because these models overfit the training data during model development. In terms of $R^2$, stepwise linear regression performs better than the other ML models, with an $R^2$ of 0.8870 and RMSE of . In terms of RMSE, the linear regression learner performs better than the other ML models, with an $R^2$ of 0.8102 and RMSE of .

Considering the generalization capability of the temperature-based empirical models, the model proposed by Fan et al. is recommended for estimating daily GSR in regions where only ambient temperature data are available. For regions where sunshine duration, maximum and minimum ambient temperature, precipitation, and relative humidity data are available, the more robust ANN3 and stepwise linear regression models are recommended.

#### Nomenclature

GSR: | Global solar radiation |

MLP: | Multilayer perceptron |

SVM: | Support vector machine |

ANN: | Artificial neural network |

DHM: | Department of Hydrology and Meteorology |

ANFIS: | Adaptive neuro-fuzzy inference system |

GP: | Genetic programming |

RF: | Random forest |

RBFN: | Radial basis function network |

GPR: | Gaussian process regression |

: | Measured global solar radiation |

: | Extraterrestrial solar radiation |

: | Solar constant |

: | Day of the year |

: | Day length |

: | Latitude |

: | Solar declination angle |

: | Sunset hour angle |

: | Maximum ambient temperature |

: | Minimum ambient temperature |

: | Ambient temperature range |

: | Average daily temperature |

#### Data Availability

The data used to support the findings of this study are supplied by the Department of Hydrology and Meteorology, Government of Nepal, and are included within the supplementary information file.

#### Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

#### Acknowledgments

The authors are grateful to the Department of Hydrology and Meteorology, Government of Nepal, for providing the data necessary to complete this study. The research and publication of this article were funded by the authors.

#### Supplementary Materials

Supplementary 1: raw data of daily global solar radiation, sunshine duration, maximum and minimum ambient temperature, relative humidity, and rainfall at Biratnagar Airport, Morang, Nepal, used in this study, are included in the supplementary information file. Supplementary 2: the Solar Radiation file contains the average solar radiation measured per second for each day; daily GSR is therefore obtained by multiplying this average by the number of seconds in a day. Supplementary 3: the Sunshine Duration file contains daily sunshine duration data in . Supplementary 4: the *T*_{max} and *T*_{min} file contains daily maximum and minimum ambient temperature data in . Supplementary 5: the Rainfall file contains daily rainfall data in millimeters (mm). Supplementary 6: the RH file contains relative humidity values computed a couple of times a day.
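The supplementary material does not state the unit of the per-second averages in the Solar Radiation file. If they are in W m⁻² (an assumption), then, since 1 W = 1 J s⁻¹, the daily total follows from multiplying by the 86,400 s in a day. Dividing by 10⁶ then converts joules to megajoules. A minimal sketch with a hypothetical function name:

```python
def daily_gsr_mj(mean_irradiance_w_m2: float) -> float:
    """Daily GSR (MJ m^-2 day^-1) from a day's mean irradiance.

    ASSUMES the input is in W m^-2 (= J m^-2 s^-1), averaged over the full 24 h.
    """
    seconds_per_day = 86_400
    return mean_irradiance_w_m2 * seconds_per_day / 1e6


print(daily_gsr_mj(250.0))  # a 250 W m^-2 daily mean corresponds to 21.6 MJ m^-2
```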