Abstract

Accurate and complete global solar radiation (Hs) data at a specific region are crucial for regional climate assessment, crop growth modeling, and all operations that use solar energy. However, in the Minas Gerais state, Southeastern Brazil (SEB), few weather stations measure global solar radiation, and the time series that are available contain gaps. An attractive alternative to solve the data gap problem is to estimate global solar radiation using empirical models. In this study, thirteen models based on maximum and minimum air temperatures, precipitation, sunshine duration, and extraterrestrial solar radiation were compared in the daily estimation of Hs. Data from 10 weather stations, from 1999 to 2017, located in Minas Gerais were used. Also, cluster analysis was used to group the localities (weather stations) with similar patterns of model performance, climatic classification (Köppen–Geiger and Thornthwaite), and seasonal data variability, considering minimum and maximum air temperatures, precipitation, sunshine duration, and global solar radiation. Although the subject is apparently simple, studies on it are scarce, and the few existing ones in Minas Gerais have flaws, which justifies this study. The models were evaluated by root mean square error (RMSE), mean absolute percentage error (MAPE), mean bias error (BIAS), Willmott’s index of agreement (d), and performance index (c-index). Models based on sunshine duration, such as those proposed by Ertekin and Yaldiz and by Newland, showed the best performance (average c-index = 0.71). Models based on temperature and precipitation showed the worst results (average c-index = 0.41). Cluster analysis showed that there is a similar pattern between the performance of the models, climatic classification, and seasonal variability of data among the localities of Minas Gerais. 
In general, the groups in which the models performed extremely poorly were formed by weather stations located in the dry zone but with different climate classifications, and the groups in which the models performed very well (and well) were composed of weather stations located in the humid zone (dry subhumid) with the same climate classification and similar seasonal variability. Furthermore, the models based on temperature tend to overestimate radiation values below 10 MJ·m−2 day−1 and to underestimate values higher than 25 MJ·m−2 day−1. This is a limitation of these models for estimating global solar radiation below and above these levels, and it shows the influence of atmospheric systems and atmospheric attenuation mechanisms on global solar radiation.

1. Introduction

Global solar radiation (Hs) directly influences the physical, chemical, and biological processes that occur in the biosphere and the atmosphere [1]. Spatial, temporal, and seasonal variations in Hs influence planning and decision-making activities related to many fields, such as the energy, hydrological, environmental, agricultural, and meteorological areas [2–4]. Hs is an important input variable for developing thermal and photovoltaic energy systems, estimating evapotranspiration, sizing water demand for irrigation in agriculture, agroclimatic zoning, simulating growth models for crop yield and harvesting, and designing sustainable buildings, not to mention studying climate change [5–7]. For this reason, complete and accurate Hs data at a specific region are crucial for regional climate assessment and all other applications [4].

Although Hs is an important variable, recorded data have been restricted to automatic weather stations (AWSs) [1, 3, 8], and the number of AWSs capable of registering these data has fallen due to the high costs associated with acquiring, maintaining, and calibrating the instruments [9–12]. Furthermore, when Hs is available in the AWS, it does not present a long time series [13]. There is a general lack of data coming from most Brazilian AWSs [1, 13], even in the most developed regions of the country [14]. For example, according to the Integrated Environmental Data System (Sistema Integrado de Dados Ambientais (SINDA)), of the 61 AWSs in operation in the Minas Gerais (MG) state, only 32 have Hs measurement capabilities, and of these 32, only 10 have Hs records available (Figure 1). These data are thus insufficient to carry out studies surrounding this particular variable. For this reason, in the absence of measured Hs data, it is fundamental to opt for its estimation, mainly by using empirical models (EMs) or artificial neural networks (ANNs), and even by using recent techniques such as satellite-based data and mesoscale data [6, 7, 12]. EMs and ANNs are the most widely used approaches, and some studies show that ANNs provide slightly better results. This technique has not, however, demonstrated its superiority in estimating Hs when compared to traditional EMs [12]. EMs use different functional relationships and include easily measurable variables, such as air temperature, precipitation, and insolation [2, 8, 13–15]. This makes EMs attractive since these variables are frequently recorded in weather stations. Thus, EMs can be used as an appropriate tool for estimating Hs [1, 7, 12, 13].

There are many existing EMs, which differ in both their complexity and the input variables considered [1, 3, 5, 16]. They can be classified into three main categories [11]: sunshine duration-based models, cloud-based models [17–22], and air temperature and/or precipitation-based models [6, 23–27]. Most of these models have been used at different locations, and the calibration coefficients have changed considerably. Thus, the empirical relationships vary spatially and temporally, which implies that the models should be calibrated to each specific location [8, 9, 11]. Additionally, few studies report on the scale of EM estimations [12]. In other cases, regression analysis is performed on EMs using monthly average values [9, 14, 28, 29], highlighting the need to perform goodness-of-fit analysis on daily values, commonly used as input values in evapotranspiration, water requirement, irrigation sizing, and crop modeling [30, 31]. Furthermore, when Hs is measured on a daily basis, results are more accurate since they reflect daily changes in Hs [12].

Several studies of this nature have recently been carried out in various locations around the world, including some regions in Brazil, using different models [2, 6, 7, 9, 11, 13, 30]. However, the superiority of one model over the others has not been verified, nor has a best functional relationship between the input variables that should be considered in EM studies been established. In MG, studies of this nature are scarce, and the few existing ones were carried out for Lavras [32], the metropolitan, Zona da Mata, and Vale do Rio Doce regions [16], and the northwest of MG [3], where only the category of EM based on air temperature and precipitation was used [3, 16, 32, 33]. These few studies [3, 16, 32] have some weaknesses: short time series that are insufficient for a good fit (1-2 years only), no model validation, models that consider only average air temperature and precipitation data, and a lack of information on the time scale of the model fit. Additionally, MG is a state that deserves special attention since it has a large land area (586,521.235 km2), is the 4th largest state in Brazil, is the 2nd most populous state with 20,997,560 inhabitants, and contributes greatly to the Brazilian economy with diversified agricultural production [34], making it one of the largest producers in the country.

Given the agricultural and economic importance of MG, applications of solar energy are important for guiding clean agricultural production, energy conservation, and emission reduction. Therefore, reliable estimation of Hs is important and necessary, e.g., for the operation of solar-powered pump station systems, lift irrigation projects, and potential crop yield [4] in the MG state [3, 5, 16]. Given the scarcity of studies of this nature in MG, the aim of this study was to fit and validate a series of daily Hs models that use air temperature, precipitation, and sunshine duration as input data, and to group the performance of these models according to the climatic conditions of Minas Gerais, Southeastern Brazil (SEB) (Figure 1).

2. Materials and Methods

2.1. Observed Climate Data

MG is located in SEB (Figure 1) and is an agricultural state, occupying the fifth overall position in national production level rankings. We emphasize the diversity of the production, which includes fruit trees, olive trees [34], coffee trees [35], sugar cane [36], potato [37], maize [38], soybean [39], and bean plantations [40]. Furthermore, MG has a typical monsoon climate, with two well-defined seasons: the dry winter (June, July, and August) and the humid summer (December, January, and February), influenced by local convective activities and the South American Monsoon System (SAMS) [36], and both contribute to the state ranking with one of the country’s biggest agricultural yields [41]. They also contribute to the great variability of Hs values (Figure 2).

Due to the great spatial and temporal variability of air temperature and precipitation, MG has five climate types according to Köppen–Geiger’s classification [42], mainly Aw (tropical, with water deficiency in winter; 67%) and Cwa (subtropical, with water deficiency in winter and hot summer; 21%) (Table 1). According to Thornthwaite’s climate classification, MG has twenty-five climate types, of which five are predominant, as shown in Table 1.

Daily data were obtained for 10 locations in MG (Figure 1; Table 1) for the maximum and minimum air temperatures (Tmáx and Tmín (°C)), sunshine duration (n (hours)), and precipitation (P (mm)) obtained from the Meteorological Database for Teaching and Research (Banco de Dados Meteorológicos para Ensino e Pesquisa (BDMEP)) provided by the National Institute of Meteorology (Instituto Nacional de Meteorologia (INMET)), and global solar radiation (Hs (MJ·m−2 day−1)) was obtained from SINDA provided by the National Institute of Space Research (Instituto Nacional de Pesquisas Espaciais (INPE)). BDMEP considers the data measured in conventional weather stations (CWSs), while SINDA considers AWSs. The meteorological data sets were available simultaneously on both the BDMEP and SINDA platforms for the period from 1999 to 2017.

The meteorological data were first submitted to quality control in two steps: basic validation and temporal validation [9, 13]. In both steps, an observation considered spurious was removed from the series. The basic validation was based on the following elimination criteria [9]: (a) missing data for any of the variables, (b) Tmáx < Tmín, and (c) Hs/H0 > 1, where H0 is the extraterrestrial solar radiation (MJ·m−2 day−1). The temporal validation was performed as described in [13]. No other quality control criteria were applied because the data sets from CWSs were provided by INMET, which carries out rigorous quality control.
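The three basic validation criteria can be sketched as follows. This is a minimal illustration, not the procedure actually run in this study: the record layout (a dictionary with the hypothetical keys `doy`, `tmax`, `tmin`, `n`, `p`, and `hs`, with `None` marking a missing value) is an assumption, and H0 is computed here with the FAO-56 daily formulation, which the text does not confirm as the one used.

```python
import math

def extraterrestrial_radiation(lat_deg, day_of_year):
    """Daily extraterrestrial radiation H0 (MJ m-2 day-1).

    Assumption: FAO-56 daily formulation; latitude is negative in the
    southern hemisphere.
    """
    gsc = 0.0820  # solar constant, MJ m-2 min-1
    phi = math.radians(lat_deg)
    angle = 2.0 * math.pi * day_of_year / 365.0
    dr = 1.0 + 0.033 * math.cos(angle)            # inverse relative Earth-Sun distance
    delta = 0.409 * math.sin(angle - 1.39)        # solar declination (rad)
    ws = math.acos(-math.tan(phi) * math.tan(delta))  # sunset hour angle (rad)
    return (24.0 * 60.0 / math.pi) * gsc * dr * (
        ws * math.sin(phi) * math.sin(delta)
        + math.cos(phi) * math.cos(delta) * math.sin(ws)
    )

def passes_basic_validation(record, lat_deg):
    """Apply criteria (a), (b), and (c) to one daily record."""
    required = ("tmax", "tmin", "n", "p", "hs")
    # (a) missing data for any of the variables
    if any(record.get(key) is None for key in required):
        return False
    # (b) maximum temperature lower than minimum temperature
    if record["tmax"] < record["tmin"]:
        return False
    # (c) measured radiation exceeding the extraterrestrial radiation
    h0 = extraterrestrial_radiation(lat_deg, record["doy"])
    if record["hs"] / h0 > 1.0:
        return False
    return True
```

A series would then be filtered with something like `[r for r in records if passes_basic_validation(r, lat_deg)]`; the temporal validation step of [13] is not covered by this sketch.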

The data were divided into two independent sets. The first set was used to calibrate the EM coefficients obtained from goodness-of-fit tests (step 1). The second set was used to validate performance (step 2) (Table 1). The data were divided differently for each location based on the number of valid Hs daily data from the database, of which 70% were used for calibration and 30% for validation [1, 13].
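The 70%/30% division can be illustrated with a short sketch. It assumes that the valid daily records of a location are already sorted chronologically and that the first 70% are used for calibration; the text does not state whether the split was chronological or random, so this ordering is an assumption, as is the function name.

```python
def split_calibration_validation(records, calibration_fraction=0.70):
    """Split valid daily records into calibration and validation sets.

    Assumption: `records` is in chronological order, so the earliest
    observations calibrate the model and the latest ones validate it.
    """
    cut = int(round(len(records) * calibration_fraction))
    return records[:cut], records[cut:]
```

Because the number of valid Hs records differs between locations, the cut point is computed per location rather than fixed by date.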

2.2. Global Solar Radiation Models

The EMs selected in this study (Table 2) are widely discussed in the literature but were not fitted and validated to MG climatic conditions. They propose different functional relationships and include easily measurable variables [1, 11]. All of the EMs have a strong relationship to Hs due to the heating of the earth’s surface and changes in the environment [24].

2.3. Statistical Analysis

Statistica software [56] was used to conduct the goodness-of-fit testing for the EMs using the nonlinear estimation procedure with the Gauss–Newton approximation method of ordinary least squares. The quality of the fit was evaluated statistically [1, 12, 57] using the root mean square error (RMSE), mean absolute percentage error (MAPE), mean bias error (BIAS), Willmott’s index of agreement (d), performance index (c-index), and the significance of the fitted coefficients, assessed by the t-test against the null hypothesis.
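As an illustration of the fitting step, the sketch below calibrates a sunshine-based model of the Angström–Prescott form, Hs/H0 = b0 + b1·(n/N). It is not the Statistica procedure: a closed-form ordinary least squares solution is swapped in, which is the solution Gauss–Newton reaches for a model that is linear in its coefficients. The function names are hypothetical, and the model form is assumed from the description of b0 and b1 given in this paper, since Table 2 is not reproduced here.

```python
def fit_angstrom_prescott(n_over_N, hs_over_h0):
    """Ordinary least squares fit of Hs/H0 = b0 + b1 * (n/N).

    n_over_N   : relative sunshine duration for each day
    hs_over_h0 : measured Hs divided by H0 for the same days
    Returns (b0, b1).
    """
    m = len(n_over_N)
    mean_x = sum(n_over_N) / m
    mean_y = sum(hs_over_h0) / m
    sxx = sum((x - mean_x) ** 2 for x in n_over_N)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(n_over_N, hs_over_h0))
    b1 = sxy / sxx               # slope
    b0 = mean_y - b1 * mean_x    # intercept
    return b0, b1

def estimate_hs(b0, b1, n_over_N, h0):
    """Estimate daily Hs (same units as h0) from fitted coefficients."""
    return h0 * (b0 + b1 * n_over_N)
```

The models that are nonlinear in their coefficients would need an iterative solver instead of this closed form.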

The c-index (c) is interpreted as follows: >0.85, excellent; 0.76 to 0.85, very good; 0.66 to 0.75, good; 0.61 to 0.65, reasonable; 0.51 to 0.60, poor; 0.41 to 0.50, very poor; and ≤0.40, extremely poor [58]. The statistics are computed from the estimated values, the average of the estimated values, the observed values, the average of the observed values, and the number of observations.
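The statistics can be sketched in code using their commonly used definitions, which are assumed here because the original equations are not reproduced: BIAS is taken as the mean of estimated minus observed values, d follows Willmott's formulation, and the c-index is taken as the product of Pearson's correlation coefficient and d. The class boundaries in `classify_c_index` assume c is rounded to two decimals. All names are illustrative.

```python
import math

def performance_statistics(observed, estimated):
    """Return RMSE, MAPE (%), BIAS, Willmott's d, and c-index as a dict."""
    m = len(observed)
    mean_obs = sum(observed) / m
    mean_est = sum(estimated) / m
    errors = [e - o for o, e in zip(observed, estimated)]
    sq_err = sum(err ** 2 for err in errors)

    rmse = math.sqrt(sq_err / m)
    mape = 100.0 / m * sum(abs(err / o) for err, o in zip(errors, observed))
    bias = sum(errors) / m  # assumed sign convention: estimated - observed

    # Willmott's index of agreement
    potential = sum((abs(e - mean_obs) + abs(o - mean_obs)) ** 2
                    for o, e in zip(observed, estimated))
    d = 1.0 - sq_err / potential

    # Pearson correlation coefficient
    cov = sum((o - mean_obs) * (e - mean_est)
              for o, e in zip(observed, estimated))
    var_obs = sum((o - mean_obs) ** 2 for o in observed)
    var_est = sum((e - mean_est) ** 2 for e in estimated)
    r = cov / math.sqrt(var_obs * var_est)

    return {"rmse": rmse, "mape": mape, "bias": bias, "d": d, "c": r * d}

def classify_c_index(c):
    """Map a c-index value to the performance classes listed above."""
    if c > 0.85:
        return "excellent"
    if c > 0.75:
        return "very good"
    if c > 0.65:
        return "good"
    if c > 0.60:
        return "reasonable"
    if c > 0.50:
        return "poor"
    if c > 0.40:
        return "very poor"
    return "extremely poor"
```

With these boundaries, the average c-index of 0.71 reported for the sunshine-based models falls in the "good" class and the 0.41 reported for the temperature- and precipitation-based models falls in the "very poor" class.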

For brevity, only the five EMs that presented the best fit, i.e., lower RMSE, MAPE, and BIAS values and higher “d” and c-index values, were selected for performance validation (step 2). Validation was carried out with an independent data set (Table 1), and the quality of the validation was verified via the RMSE, MAPE, “d,” and c-index statistics. Additionally, variance homogeneity between the estimated and observed Hs values for each location was tested using the Bartlett test (null hypothesis: homogeneous variances; alternative hypothesis: heterogeneous variances) [59, 60]. The quantities involved are Bartlett’s test value, the number of data, the variance between observed Hs and estimated Hs (for each model and location), and the number of groups, equal to 2, which refers to the observed and estimated pooled data.
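A sketch of the variance homogeneity check is given below. It uses the textbook form of Bartlett's statistic for k groups, applied here to k = 2 (observed and estimated Hs); this may differ in notation from the expression used in this study, which is not reproduced. The 5% significance level and the corresponding chi-square critical value for one degree of freedom (3.841) are assumptions, since the level adopted in the study is not recoverable from the text.

```python
import math

def sample_variance(values):
    """Unbiased sample variance."""
    m = len(values)
    mean = sum(values) / m
    return sum((v - mean) ** 2 for v in values) / (m - 1)

def bartlett_statistic(*groups):
    """Bartlett's test statistic for homogeneity of variances."""
    k = len(groups)
    sizes = [len(g) for g in groups]
    variances = [sample_variance(g) for g in groups]
    total = sum(sizes)
    # pooled variance, weighted by degrees of freedom
    pooled = sum((n_i - 1) * s2
                 for n_i, s2 in zip(sizes, variances)) / (total - k)
    numerator = (total - k) * math.log(pooled) - sum(
        (n_i - 1) * math.log(s2) for n_i, s2 in zip(sizes, variances))
    correction = 1.0 + (sum(1.0 / (n_i - 1) for n_i in sizes)
                        - 1.0 / (total - k)) / (3.0 * (k - 1))
    return numerator / correction

CHI2_CRIT_DF1_5PCT = 3.841  # assumed significance level of 5%, k - 1 = 1

def variances_are_homogeneous(observed, estimated):
    """True when the hypothesis of homogeneous variances is not rejected."""
    return bartlett_statistic(observed, estimated) <= CHI2_CRIT_DF1_5PCT
```

A model whose estimates cluster in a narrow band while the observations span the full range will have a much smaller variance than the observations and will fail this check.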

Bartlett’s test can also be used to test the normality of data, and this test was used in this study at the same probability. Also, the observed mean values of Hs were compared with the estimated mean values using a paired t-test [60]. In the paired t-test, the null hypothesis that the mean of observed Hs is equal to the mean of estimated Hs (for each model and location) was tested against the alternative hypothesis that the two means are unequal. The quantities involved are the paired t-test value, the average of the observed values, the average of the estimated values, the variance between observed Hs and estimated Hs (for each model and location), and the number of data.
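The paired comparison of means can be sketched as follows. The standard paired t statistic is assumed, with "the variance between observed Hs and estimated Hs" interpreted as the sample variance of the daily differences; the function name is illustrative.

```python
import math

def paired_t_statistic(observed, estimated):
    """Paired t statistic for the mean difference observed - estimated.

    The result is compared with the Student t critical value for
    len(observed) - 1 degrees of freedom at the chosen significance level.
    Undefined (division by zero) if all differences are identical.
    """
    diffs = [o - e for o, e in zip(observed, estimated)]
    m = len(diffs)
    mean_diff = sum(diffs) / m
    var_diff = sum((d - mean_diff) ** 2 for d in diffs) / (m - 1)
    return mean_diff / math.sqrt(var_diff / m)
```

A statistic close to zero indicates that the model reproduces the observed mean, even if individual days are poorly estimated, which is why this test is read together with the RMSE and c-index.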

Cluster analysis was used to group the locations (weather stations) with similar patterns of model performance, climate classification (Köppen–Geiger’s climate classification and Thornthwaite’s climate classification were used), and seasonal data variability (Tmáx, Tmín, n, P, and Hs). The method for grouping locations was Ward’s hierarchical clustering [61], which considers the squared Euclidean distances (De) as a measure of dissimilarity [62, 63]: $D_e = \sum_{i} (X_{il} - X_{ik})^2$, where $D_e$ is the Euclidean distance and $X_{il}$ and $X_{ik}$ are quantitative variables i from elements l and k.

Using De, the number of clusters is determined by finding the level at which the within-group similarity is maximized while the between-group similarity is minimized [63]. As groups are merged, their Euclidean distances are small at first and increase at each step. The process is stopped when a threshold is reached [61].
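The grouping procedure can be illustrated with a compact sketch of Ward's agglomerative clustering on squared Euclidean distances. Two simplifications are assumed: merging stops at a requested number of clusters rather than at a distance threshold, and the variables are taken as already standardized. The function names are illustrative and this is not the software routine used in this study.

```python
def squared_euclidean(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def ward_clustering(points, n_clusters):
    """Agglomerative clustering with Ward's minimum-variance criterion.

    points     : list of feature vectors, one per weather station
    n_clusters : number of clusters at which merging stops (assumption)
    Returns a list of clusters, each a sorted list of point indices.
    """
    clusters = [{"members": [i], "centroid": list(p)}
                for i, p in enumerate(points)]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                a, b = clusters[i], clusters[j]
                na, nb = len(a["members"]), len(b["members"])
                # increase in within-cluster sum of squares if a and b merge
                cost = na * nb / (na + nb) * squared_euclidean(
                    a["centroid"], b["centroid"])
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        a, b = clusters[i], clusters[j]
        na, nb = len(a["members"]), len(b["members"])
        centroid = [(na * x + nb * y) / (na + nb)
                    for x, y in zip(a["centroid"], b["centroid"])]
        merged = {"members": a["members"] + b["members"], "centroid": centroid}
        clusters = [c for idx, c in enumerate(clusters)
                    if idx not in (i, j)] + [merged]
    return [sorted(c["members"]) for c in clusters]
```

The merge cost grows at each step, which matches the description above: the first merges join very similar stations, and later merges join increasingly dissimilar groups.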

3. Results and Discussion

All locations in MG present daily, monthly, and seasonal variability. It is possible to verify that increases in Hs are usually accompanied by Tmáx, Tmín, and n variations but are not accompanied by variations in P (Figure 2). The lowest Hs values (16.66 ± 2.29 MJ·m−2 day−1) were observed in Caratinga (point 4, Figure 1, and CA, Figure 2). The highest values (20.50 ± 2.73 MJ·m−2 day−1) were observed in Viçosa (point 10, Figure 1, and VI, Figure 2). The SAMS causes changes in seasonal cloud patterns [64] influencing the value of the Tmáx, Tmín, n, and P variables, and consequently Hs. According to the study [1], seasonal changes in cloud patterns and latitude are the main factors that determine variations in Hs. Additionally, cold fronts, mainly from the south of MG, reduce temperatures and affect rainfall in the region, especially in the winter months [65]. The highest air temperatures were found in the northern MG state (Araçuaí, Montes Claros, Paracatu, and Pirapora), while the lowest were found in the southern MG state (Lavras and Machado) (Figure 2), corroborating [41]. This shows that the variables used in the empirical relationships vary temporally and spatially in MG, and these factors are crucial for determining the goodness of fit for EMs of Hs [5, 8].

The calibrated coefficients for the same EM may vary considerably between locations, e.g., Gd (b1: 0.3110 to 17.2790), and between the EMs themselves (S1). These results and their discrepancies are expected, as has been discussed in the literature [1], and were also found in the few studies conducted in MG [3, 16, 32], which were fitted specifically for the city of Lavras, the northwestern part of MG, and the metropolitan, Zona da Mata, and Vale do Rio Doce regions. This further emphasizes that EMs must be calibrated to each particular location [9, 11], as this is crucial for improving EM performance [1, 66].

In general, the majority of EM coefficients fitted with daily data are significant by the t-test (S1). This is ideal for EM goodness-of-fit tests [24]. The coefficients change from place to place, with the largest differences being observed for coefficients of the Hg model. For example, regarding the sunshine-based models, such as the AP model and its modified versions (Nw, AE, AD, EM, and EY models) (Table 2), the b0 coefficient values relate to diffuse Hs, while b1 values relate to direct Hs [20, 67], and both vary between 0 and 1 [9]. In clear sky conditions, the diffuse solar radiation decreases (and b0 is closer to 0), while direct solar radiation increases (b1 is higher) with increasing values of Hs. Thus, large (small) values of Hs are related to large (small) values of n/N (Figure 2) [68]. Considering the AP EM, the highest b1 values (lowest b0) (S1) were found in the northern MG state (Montes Claros, Paracatu, and Pirapora) and southern MG (Machado and Lavras), which present dry subhumid and humid climates, respectively, according to Thornthwaite’s classification [42], and have higher n values. The values of the AP EM coefficients for the localities in MG (b0 0.24 to 0.44 and b1 0.17 to 0.60) were quite similar to those obtained in [68] in the Alagoas state, northeastern Brazil (NEB) (b0 0.24 to 0.34 and b1 0.37 to 0.48), in [69] for estimating monthly Hs in the Seropédica city, located in the Rio de Janeiro state, SEB (b0 = 0.28 and b1 = 0.43), and in [2] for Pampas in a central western part of Argentina (average values of b0 = 0.214 and b1 = 0.571). In another example, in the Gd EM (modified Bristow–Campbell model) (Table 2), three empirical coefficients are presented, wherein b0 represents the transmissivity of the atmosphere on a clear sky day and coefficients b1 and b2 determine the incremental effect of differences in air temperature [11] (S1). 
The b0, b1, and b2 coefficients of the Gd EM did not present a distribution pattern with the local climatology and were also not influenced by the thermal amplitude of the analysis areas, similar to that reported in the study [13]. Furthermore, the values of Gd coefficients varied widely among the locations (b0 0.45 to 0.83, b1 0.37 to 17.28, and b2 0.30 to 7.00). The b1 coefficient presented the largest variation, different from that reported in studies [4, 13]. But b0 and b2 (except in Caratinga) were quite similar to those obtained in [4] in China (0.5 < b0 < 0.7 and 1.8 < b2 < 2.3) and in [16] in three MG regions: metropolitan, Zona da Mata, and Vale do Rio Doce (0.632 < b0 < 0.780 and 1.702 < b2 < 2.851).

Generally, the goodness-of-fit tests for EMs with Tmáx, Tmín, and P variables had the worst fits, with higher RMSE (3.36 to 18.29 MJ·m−2 day−1), MAPE (20.12 to 90.43%), and BIAS (−0.90 to 0) values and lower “d” (0.28 to 0.88) and c-index (0.08 to 0.70) values (S1). In summary, it was possible to classify the EMs into three groups based on their goodness of fit: group I had the best fits and comprised the EY, Nw, and AE EMs, which depend on variables n and N; group II had moderate fits and comprised the AP, EM, AD, Gd, and JS EMs; and group III had the worst fits and comprised the remaining Ch, Hn, Hg, HS, and Ws EMs. We emphasize that the Ws EM had a c-index of 0.01 to 0.29, which is one of the worst fits for all the locations studied. This EM was also not efficient in estimating Hs in Cruz das Almas (BA) [5], in North America [70], in Seropédica (SEB) [71], or in the different biomes found in the Mato Grosso state, central-west Brazil [1]. The applicability of Ws is limited because this model has no coefficients that can be fitted, so Hs is estimated only as a function of its input variables, without proper local calibration [1, 5, 70]. This result shows the necessity to calibrate the models to the MG climate conditions, similar to that reported in [1, 67, 68, 71] for other locations of Brazil.

Although air temperature-based EMs are usually better tools for estimating Hs [11, 24], the quality of the fit changes as a function of the large amplitude of the daytime air temperature cycle in MG [27]. In MG, the air temperature is influenced by latitude, altitude, continentality, and the predominance of the South Atlantic Subtropical High (SASH), which causes subsiding movements, inhibits convection, provides clear skies at night, and consequently causes low Tmín due to nighttime radiative losses [41]. Conversely, a sky with fewer clouds facilitates surface heating, resulting in higher Tmáx [72]. These seasonal and temporal temperature variations explain the low quality of the fits of these EMs for the ten locations in MG. For example, the HS and Hg EMs were proposed as convenient, effective, and widely applicable models that require few input data [4]. They presented excellent performance (c-index between 0.78 and 0.95) in Alagoas, NEB [73], and in some regions of Rio de Janeiro, SEB [13] (c-index between 0.75 and 0.96), both located on the coast and near the Atlantic Ocean. However, the HS EM did not show good performance in locations in Brazilian continental regions, such as MG (S1), Jaguaruana city (Ceará state, NEB), Campinas and Jales cities (São Paulo state, SEB) [74], and all regions located in SEB [33]. In addition, the same results were found in [4] in all subzones of China. The values of the b0 coefficient, which is associated with thermal amplitude, vary from 0.14 to 0.17 (S1) and were similar to those obtained in [11] in Madrid (0.146 to 0.161), in [4] in China (0.14 to 0.16), and in [13] in the Rio de Janeiro state, SEB (0.13 to 0.19). 
They were also similar to the recommended values for interior regions (≈0.162) [75], but they differed from those reported in [14] for Telêmaco Borba of the Paraná state, Southern Brazil (SOB) (0.11 to 0.12), and in [13] in Arraial do Cabo and Campos dos Goytacazes cities (both located on the coast in the Rio de Janeiro state) (0.23 to 0.26).
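To make the role of b0 concrete, the sketch below evaluates a temperature-based model of the Hargreaves–Samani type in its widely published form, Hs = b0·√(Tmáx − Tmín)·H0. This form is assumed for the HS EM because Table 2 is not reproduced here, and the input values in the usage note are purely illustrative.

```python
import math

def hargreaves_samani_hs(tmax, tmin, h0, b0=0.162):
    """Estimate daily Hs (same units as h0) from the thermal amplitude.

    tmax, tmin : daily maximum and minimum air temperature (deg C)
    h0         : extraterrestrial radiation (MJ m-2 day-1)
    b0         : empirical coefficient; 0.162 is the value recommended
                 for interior regions quoted in the text
    """
    return b0 * math.sqrt(tmax - tmin) * h0
```

For example, with a hypothetical day of Tmáx = 30 °C, Tmín = 18 °C, and H0 = 35 MJ·m−2 day−1, the interior coefficient gives about 19.6 MJ·m−2 day−1, whereas a coastal value of 0.23 would give about 27.9 MJ·m−2 day−1, which illustrates how strongly the estimate depends on local calibration of b0.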

The JS model (group II) and Hn model (group III) use Tmáx, Tmín, and P (Table 2) in different functional relationships, which makes these particular EMs difficult to fit, especially in sites with seasonal and spatial P variations, and this is exactly what happens in MG [41, 65, 76]. The coefficients associated with P have values ≈ zero in both EMs, indicating that including this variable exerts little influence on Hs [1, 3, 9, 16]. Usually, EMs that have P-dependent calibration coefficients have weak to moderate fits, as occurs with the Hn model. This difficulty is attributed to measurement errors in P instruments and to inaccuracies and failures in recording and filing the data [1].

However, the fit for the JS model was reasonable for most locations in MG (S1) and validation was consistent, except for Araçuaí, Araxá, Belo Horizonte, and Caratinga (Table 3; Figure 3), so the model was categorized in the moderately performing group (II). It is important to note that none of the EMs were able to adequately estimate Hs in these four cities, and this may have occurred due to the lack of calibration in the observed data [20, 77]. Nonetheless, the authors decided to keep the results from Araçuaí, Araxá, Belo Horizonte, and Caratinga as a warning to other users. This was also done in [6], where JS was classified in the intermediate group because it did not show high correlation with Hs for locations in northern Spain. Thus, one may conclude that the P variable influences EM precision, resulting in greater errors, especially on rainy days.

The AP, EM, AD, and Gd models were the other EMs in group II that had reasonable predictive power for estimating Hs. All of these models, except Gd, include n/N in their different functional relationships. n/N is a variable directly related to Hs, since cloud occurrence and formation are mainly responsible for restricting Hs on the earth’s surface. This is the main reason for greater EM accuracy when models are based on sunshine duration and daylength [4, 19, 67]. EMs that contain the n variable have given the most satisfactory results in several studies carried out in different locations [18, 21, 68, 78]. Gd accurately estimated Hs for the Amazon region (RMSE ≈ 2.35 MJ·m−2 day−1) and Brazilian Cerrado (RMSE ≈ 2.76 MJ·m−2 day−1) [1]. For some locations in Egypt [24] and the Yucatán Peninsula, Mexico [26], Gd showed satisfactory results (RMSE ≈ 1.87 and 3.04 MJ·m−2 day−1, MAPE ≈ 9.64 and 15.39%, respectively) and was superior to other EMs that are based on air temperature. In this study, the Gd EM had high RMSE and MAPE values (≈4.17 MJ·m−2 day−1 and ≈31.97%, respectively) and low c-index values (≈0.49), and validation was not consistent for the majority of locations in MG (Table 3). One reason for the limitations of this particular EM is the assumption that radiation is the only mechanism that influences the air temperature; although this is a valid assumption, frontal movements and regional advections also influence temperature [11].

The other EMs, namely AP, EM, and AD (group II) and EY, Nw, and AE (group I), included the n/N variable, constituting a widely used category for estimating Hs [9, 15, 28, 68]. This category is derived from sunshine-based models and cloud-based models, originally proposed by Angström and adapted by Prescott [32, 79]. In theory, sunshine-based EMs perform better in Hs fit tests [8, 12] because changes in Hs/H0 give information on the energy availability at the earth’s surface and changes in the local atmospheric conditions, besides the frequency distribution and occurrence of cloudier days [14, 28, 68]. The good performance of the AP model and its modified versions was also observed for the climate conditions of the Alagoas state (NEB) [68], Seropédica city (SEB) [69], and Santa Maria city (SOB) [8] and of other countries in the world, such as China [4, 9], Spain [28], and Tunisia [21]. Except for Araçuaí, Araxá, Belo Horizonte, and Caratinga, the EY, Nw, and AE models had a c-index of ≈0.75, on average, and these three EMs had the best fits for locations in MG (S1).

For brevity, validation was performed only for the EMs with the best fits in 100% of the locations (EY, Nw, and AE) and, in addition, for the Gd and JS models, which belong to the moderate fit group (II) and had good fits for 60 and 70% of the locations studied, respectively. This additional choice was made because air temperature and P are easily measurable variables, constituting a convenient and attractive alternative [4, 23, 24]. Despite resulting in better fits, EMs that include the n and n/N variables require that these variables be measured, and this is usually done only at CWSs [4, 8, 67].

In general, all of the EMs tended to overestimate Hs values between 5 and 10 MJ·m−2 day−1 and to underestimate values higher than 25 MJ·m−2 day−1 (Figures 3 and 4). Using different EMs and different fit scales, mostly monthly scales, this same tendency was observed in China [9], in Brazilian regions like Cruz das Almas [5], in the metropolitan, Zona da Mata, and Vale do Rio Doce regions [16], in the northwestern region of MG [3], and in the state of Mato Grosso [1]. This occurs because EMs have limitations in estimating Hs below and above these levels, given the high concentration of Hs data between 10 and 20 MJ·m−2 day−1, i.e., around the average, which dampens the extreme values. Although authors [1, 3, 5, 9, 16] do not justify this, the percentage of data within these thresholds is low. In general, of the 13462 observed Hs values, 11.8% were between 5 and 10 MJ·m−2 day−1 and 11.9% were above 25 MJ·m−2 day−1. By synoptically analyzing some locations, one can notice that the majority of data between 5 and 10 MJ·m−2 day−1 occur throughout the year. Data above 25 MJ·m−2 day−1 occur in spring and in the summer months (Figure 2).

Regardless of the specific location within the MG state, validation was similar among all the EMs that depended on n/N, which obtained the best results (Table 3), except for Araçuaí, Araxá, Belo Horizonte, and Caratinga. In summary, Lavras, Machado, Montes Claros, Paracatu, and Viçosa had good validation performance, with a c-index between 0.60 and 0.83. Pirapora had very poor validation performance, with a c-index between 0.41 and 0.49. Araçuaí, Araxá, Belo Horizonte, and Caratinga had extremely poor validation performance, with a c-index between 0.18 and 0.44.

The cluster analysis identified four groups with similar patterns of model performance (Figures 5(a) and 5(b)). These groups were formed considering model performance, Köppen–Geiger’s climate classification (Figure 5(a)) and Thornthwaite’s climate classification (Figure 5(b)), and seasonal data variability. Group 1 comprised stations 1 (Araçuaí) and 4 (Caratinga), where the models performed extremely poorly; both are located in the dry zone (dry subhumid and subhumid climate according to Thornthwaite’s classification). Group 2 comprised stations 2 (Araxá) and 3 (Belo Horizonte), where the models also performed extremely poorly; both are located in the mesothermal and humid zones. Araxá and Belo Horizonte are border regions between the northern and southern parts of MG and have Aw climate according to Köppen–Geiger’s classification. Group 3 was composed of stations 7 (Montes Claros), 8 (Paracatu), and 9 (Pirapora), where the models performed well, with the exception of station 9 (Pirapora); all three stations are located in the dry megathermal zone (dry subhumid). Group 4 comprised stations 5 (Lavras), 6 (Machado), and 10 (Viçosa), where the models performed very well; these stations are located in the humid zone and share the same Köppen–Geiger’s classification (Cwa) and Thornthwaite’s classification (B2rB′4). The three stations are located in the central and southern parts of MG, have similar seasonal variability (Figure 2), and show higher dissimilarity from groups 1 and 2.

In Araçuaí, Araxá, Belo Horizonte, and Caratinga (groups 1 and 2), the EMs had estimated values that were basically constant, around 10 to 20 MJ·m−2 day−1, which points to heteroscedasticity problems, corroborated by the Bartlett test (Table 4). The Gd EM had the worst validation performance in all the locations, followed by the JS EM (Table 3; Figures 3 and 4).

By contrast, for most locations, the Nw, AE, and EY EMs did not violate the normality and homogeneity assumptions. This is a desirable factor when choosing the most appropriate EM (Table 4). Similarly, EMs that violated normality and homogeneity assumptions also showed differences between the observed and simulated averages for Hs, such as the Gd EM, except in Araçuaí, Caratinga, Paracatu, and Viçosa.

The extremely poor validation results made it impossible to choose an EM for estimating Hs in Araçuaí, Araxá, Belo Horizonte, and Caratinga (Table 3; Figure 3). This may have been due to the quality of the observed Hs data, which also impaired the goodness-of-fit tests for the EMs (S1), possibly due to the lack of instrument calibration, similar to what was verified in [1, 5]. Such information was requested from the research institutes and from the researchers responsible for maintaining the Hs recording equipment; however, we did not receive a response. The cities of Lavras, Machado, and Viçosa (belonging to group 4) and Montes Claros, Paracatu, and Pirapora (belonging to group 3) (Table 3; Figures 3 and 4) had similar validation performance for the EY, Nw, and AE EMs and did not show any violation of the normality and homogeneity assumptions for most locations. They also did not show any differences between observed and simulated averages for Hs (Table 4). Nevertheless, the Nw model gave slightly superior results for the city of Lavras. The EY model gave slightly superior results for the cities of Machado, Montes Claros, Paracatu, and Viçosa and was considered the best EM for estimating Hs. In the city of Pirapora, however, EM results were poor (Table 3), and the EY model was slightly superior to the others. This can be verified by the proximity of the points to the 1 : 1 line (Figure 4). Although the EY model did not violate the normality and homogeneity assumptions, it showed differences between the observed and simulated means of Hs (Table 4).

Authors in [4, 12, 21] have suggested that using satellite imagery, ANNs, time series, vectors, or hybrid and stochastic methods may allow for greater space-time coverage for estimating Hs [22]. However, these approaches require more robust input data and complex algorithms, demand more computational power, and have not yet been proven superior to EMs for estimating Hs [12, 24]. By contrast, EMs, especially those selected for validation (Nw for Lavras and EY for Machado, Montes Claros, Paracatu, Pirapora, and Viçosa), are less complex, generally require easily measurable variables, and demand less time, cost, and computational capacity [8, 9, 18]. Less complex models have a greater range of applicability, especially in countries that do not have readily available Hs measurement instruments, like Brazil. Models that consider the n variable may be more restrictive than those that consider air temperature, since some locations do not have records of the n variable [24]. There is a need for constant improvements in the functional relationships surrounding EMs [3], as well as a need to develop more adequate EMs, especially for regions with temporal and spatial variations in Tmáx, Tmín, P, and cloudiness, such as MG, SEB [41, 65].

This study compared the models most widely used in the existing literature but did not propose improvements to their functional relationships. Studies such as [24] have suggested such improvements by inserting Tmáx and Tmín. However, the resulting models remain similar to those already in the literature; for example, model 6 from [24] is similar to the HS and Hg EMs.

By taking into account parameters that have a direct relationship with Hs, such as latitude, altitude, daylength, and sunshine duration, one can improve EM accuracy [19]. This approach also resulted in more accurate estimates for several locations, such as China [4, 19], Saudi Arabia [17], Algeria [18], Paraná [20], Tunisia [21], and Iran [78]. Since EMs are empirical in nature and specific to the atmospheric conditions under which they were developed, they may overestimate or underestimate Hs elsewhere. This, in turn, may make EMs inefficient when applied to other states in Brazil [17] and may result in considerable changes in coefficient values [1, 8, 9, 11, 66]. Additionally, the n variable has a large influence on estimated Hs values and gives better results, both in goodness-of-fit tests (S1) and in validation tests (Table 3; Figures 3 and 4), whereas EMs based on Tmáx and Tmín should be considered when n data are not available, similar to what was observed in [12, 13, 24].
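As an illustration of a temperature-based fallback for stations without n records, the sketch below uses the widely cited Hargreaves-Samani form, Hs = k · sqrt(Tmáx − Tmín) · H0. This is an assumption for illustration: the coefficient k = 0.16 is the generic interior-region value recommended in FAO Irrigation and Drainage Paper 56, not a coefficient calibrated in this study, and the function name and input values are hypothetical.

```python
import math


def hargreaves_samani(t_max, t_min, h0, k=0.16):
    """Estimate daily Hs (MJ m-2 day-1) from the daily temperature range.

    t_max, t_min: daily maximum and minimum air temperature (deg C)
    h0: extraterrestrial solar radiation for that day (MJ m-2 day-1)
    k: empirical coefficient; 0.16 is a generic interior value (assumption),
       and a locally calibrated value should replace it in practice
    """
    if t_max < t_min:
        raise ValueError("t_max must not be lower than t_min")
    return k * math.sqrt(t_max - t_min) * h0


# Hypothetical day: Tmax = 30 deg C, Tmin = 18 deg C, H0 = 32.2 MJ m-2 day-1.
hs_estimate = hargreaves_samani(30.0, 18.0, 32.2)
print(f"Estimated Hs = {hs_estimate:.2f} MJ m-2 day-1")
```

Because the estimate depends only on the square root of the daily temperature range, it responds weakly to cloud cover, which is consistent with temperature-based EMs performing worse than sunshine-based ones.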

4. Conclusions

In Minas Gerais, Southeastern Brazil, there is a limited number of weather stations that measure Hs, and it is thus necessary to estimate this variable using EMs. In this study, thirteen models based on maximum and minimum air temperatures, precipitation, sunshine duration, and extraterrestrial solar radiation were evaluated in the daily estimation of Hs for 10 locations in Minas Gerais. Additionally, cluster analysis was used to group the localities (weather stations) with similar patterns of model performance, climatic classification (Köppen–Geiger and Thornthwaite), and seasonal data variability, considering minimum and maximum air temperatures, precipitation, sunshine duration, and global solar radiation. Minas Gerais has great spatial and temporal weather variability due to a typical monsoon climate, with a dry season in winter and a rainy season in summer, and a predominance of two climate types according to Köppen–Geiger’s classification (Aw and Cwa) and five types according to Thornthwaite’s climate classification (B1rB′4, B2rB′4, C1sA′, C1dA′, and C2rB′4).

EMs based on Tmáx, Tmín, and P had the worst fits, with the highest RMSE (3.36 to 18.29 MJ·m−2 day−1), MAPE (20.12 to 90.43%), and BIAS (−0.90 to 0) values and the lowest d (0.28 to 0.88) and c-index (0.08 to 0.70) values. The poor fits of the models based on air temperature and precipitation are related to the fact that the state is located inland; these models give better fits for coastal regions. Furthermore, the rainy season is in summer, and in the other seasons P is close to zero, so the coefficients associated with this variable become insignificant and close to zero. Sunshine-based models had the best fits for all locations in Minas Gerais, with lower RMSE (2.75 to 4.01 MJ·m−2 day−1), MAPE (15.22 to 24.05%), and BIAS (−0.01 to 0) values and higher d (0.83 to 0.93) and c-index (0.59 to 0.81) values.
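The statistics quoted above can be computed as in the following minimal sketch. The definitions are the commonly used ones and are assumptions here, since this section does not restate them: BIAS is taken as the mean of estimated minus observed values, d is Willmott's index of agreement, and the c-index is taken as the product of Pearson's correlation coefficient r and d. The function name and the sample values are hypothetical.

```python
import math


def evaluate_model(observed, estimated):
    """Return RMSE, MAPE (%), BIAS, Willmott's d, Pearson's r, and c = r * d."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_e = sum(estimated) / n
    errors = [e - o for o, e in zip(observed, estimated)]
    squared_error_sum = sum(err ** 2 for err in errors)

    rmse = math.sqrt(squared_error_sum / n)
    # MAPE is undefined when an observed value is zero.
    mape = 100 * sum(abs(err) / abs(o) for err, o in zip(errors, observed)) / n
    bias = sum(errors) / n  # assumed sign convention: estimated minus observed

    # Willmott's index of agreement.
    potential_error = sum(
        (abs(e - mean_o) + abs(o - mean_o)) ** 2 for o, e in zip(observed, estimated)
    )
    d = 1 - squared_error_sum / potential_error

    # Pearson's correlation coefficient.
    covariance = sum((o - mean_o) * (e - mean_e) for o, e in zip(observed, estimated))
    spread_o = sum((o - mean_o) ** 2 for o in observed)
    spread_e = sum((e - mean_e) ** 2 for e in estimated)
    r = covariance / math.sqrt(spread_o * spread_e)

    return {"rmse": rmse, "mape": mape, "bias": bias, "d": d, "r": r, "c": r * d}


# Hypothetical observed and estimated Hs values (MJ m-2 day-1).
metrics = evaluate_model([10, 20, 30], [12, 18, 33])
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Note that r is undefined when the estimated values are exactly constant, which is the limiting case of the nearly constant estimates reported for some stations.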

The EMs with the best fits for Hs were selected for validation. Three of them (EY, Nw, and AE) use the n and n/N variables. Since clouds are responsible for restricting Hs, EMs based on sunshine duration and daylength are more accurate. The P variable, present in the JS model, negatively influenced the model accuracy. This model, like the temperature-dependent Gd model, had the worst validation performance. EMs based on the n variable had the best fits and validations, with similar results amongst themselves.
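To make the role of n/N concrete, the sketch below computes extraterrestrial radiation H0 and daylength N with the standard FAO Irrigation and Drainage Paper 56 equations and then applies a linear Angström-Prescott relation, Hs = H0 · (a + b · n/N). This is an illustrative stand-in and not the exact EY, Nw, or AE formulation; the coefficients a = 0.25 and b = 0.50 are the generic FAO-56 defaults, not values calibrated in this study, and all function names are hypothetical.

```python
import math

SOLAR_CONSTANT = 0.0820  # MJ m-2 min-1, value used in FAO-56


def sunset_hour_angle(latitude_deg, day_of_year):
    """Return (sunset hour angle, latitude, solar declination), all in radians."""
    phi = math.radians(latitude_deg)
    delta = 0.409 * math.sin(2 * math.pi * day_of_year / 365 - 1.39)
    # Clamp for polar day or night, where the arccos argument leaves [-1, 1].
    x = max(-1.0, min(1.0, -math.tan(phi) * math.tan(delta)))
    return math.acos(x), phi, delta


def daylength(latitude_deg, day_of_year):
    """Maximum possible sunshine duration N (hours)."""
    omega_s, _, _ = sunset_hour_angle(latitude_deg, day_of_year)
    return 24 / math.pi * omega_s


def extraterrestrial_radiation(latitude_deg, day_of_year):
    """Daily extraterrestrial radiation H0 (MJ m-2 day-1)."""
    omega_s, phi, delta = sunset_hour_angle(latitude_deg, day_of_year)
    d_r = 1 + 0.033 * math.cos(2 * math.pi * day_of_year / 365)
    return (24 * 60 / math.pi) * SOLAR_CONSTANT * d_r * (
        omega_s * math.sin(phi) * math.sin(delta)
        + math.cos(phi) * math.cos(delta) * math.sin(omega_s)
    )


def angstrom_prescott(n_hours, latitude_deg, day_of_year, a=0.25, b=0.50):
    """Estimate daily Hs from sunshine duration; a and b are assumed defaults."""
    big_n = daylength(latitude_deg, day_of_year)
    ratio = max(0.0, min(1.0, n_hours / big_n))  # keep n/N within [0, 1]
    return extraterrestrial_radiation(latitude_deg, day_of_year) * (a + b * ratio)


# Hypothetical station at 20 degrees South on day 246 with 8 h of sunshine.
print(f"H0 = {extraterrestrial_radiation(-20.0, 246):.1f} MJ m-2 day-1")
print(f"N  = {daylength(-20.0, 246):.1f} h")
print(f"Hs = {angstrom_prescott(8.0, -20.0, 246):.1f} MJ m-2 day-1")
```

In a calibration exercise such as the one described here, a and b would be fitted per station by regressing Hs/H0 on n/N instead of being fixed in advance.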

It was possible to group similar patterns of model performance with climate classification and seasonal data variability. Except for Araxá and Belo Horizonte, the groups in which models presented extremely poor performance were formed by stations located in the dry zone (dry subhumid and subhumid climate according to Thornthwaite’s classification). Araxá and Belo Horizonte are border regions between the northern and southern parts of Minas Gerais and presented extremely poor validation performance. For this reason, it was impossible to choose the best model to estimate Hs for them. The groups in which models presented very good (and good) performance were composed of weather stations located in the humid zone (dry subhumid) with the same climate classification and similar seasonal variability: Cwa and B2rB′4 (Aw and C1sA′).

The EY model was slightly better than the others in estimating Hs for Machado, Montes Claros, Paracatu, Pirapora, and Viçosa, while the Nw model was better for Lavras. It was not possible to choose an efficient EM to estimate Hs values for Araçuaí, Araxá, Belo Horizonte, and Caratinga, since the estimated values are practically constant, around 10 to 20 MJ·m−2 day−1. In these four cities, the normality and homogeneity assumptions were violated.

The scientific contribution of this work lies primarily in addressing the lack of studies of this nature in Minas Gerais and in determining a coherent model whose calibration (and validation) is statistically adequate for obtaining Hs on a daily scale. The few existing studies for Minas Gerais have shortcomings: time series too short for a good fit (only 1-2 years), no model validation, models that consider only average air temperature and precipitation data, and a lack of information on the time scale of the model fit (i.e., whether they are based on monthly average daily global solar radiation or on daily global solar radiation). These flaws are sources of inconsistencies and may hinder the use of these models for Hs estimation in the cities of Minas Gerais. In addition, the cluster analysis showed that there is a similar pattern between the performance of the models, climatic classification, and seasonal variability of data among the localities of Minas Gerais. This verification had never been performed for Minas Gerais and represents another scientific contribution of this work. Although this study only compares models widely used in the literature, it is important to emphasize that any improvement in the functional relationships can only be achieved through the evaluation (and calibration) of models already described in the literature. Our work did not take into account the effects of climate change and human activities on global solar radiation. We will address this question in future research.

Data Availability

The maximum and minimum daily air temperatures (Tmáx and Tmín (°C)), sunshine duration (n (hours)), and precipitation (P (mm)) for 10 locations in the Minas Gerais state were obtained from the Meteorological Database for Teaching and Research (BDMEP) provided by the National Institute of Meteorology (INMET). These data are available for free at http://www.inmet.gov.br/portal/index.php?r=bdmep/bdmep. The global solar radiation data (Hs (MJ·m−2 day−1)) were obtained from SINDA, provided by the National Institute of Space Research (INPE). These data are available for free at http://sinda.crn.inpe.br/PCD/SITE/novo/site/index.php. If necessary, all data (from BDMEP and SINDA) used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

The authors would like to thank the Coordination for the Improvement of Higher Education Personnel (CAPES) for the first author’s scholarship (case number 1780316) and the Foundation for Research Support of Minas Gerais for financial support through projects APQ 01392-13 and APQ 01258-17. The authors are grateful to Juan Perez, Marcel Carvalho Abreu, and Marcus Vinicius Xavier Senra for helping to improve the quality of maps and figures and the readability of the text.

Supplementary Materials

S1: statistics used for the goodness-of-fit tests for the 13 global solar radiation estimation models, calibration coefficients, and significance for the 10 locations in Minas Gerais, Brazil. (Supplementary Materials)