Solar resource data derived from satellite imagery are now widely available, either as open-source or paid databases. This article assesses open-source databases that cover the region of Indonesia. Four known solar resource databases that spatially cover the Indonesian archipelago have been used, namely, Prediction of Worldwide Energy Resource (POWER), Surface Solar Radiation–Heliosat-East (SARAH-E), CM SAF Cloud, Albedo, Radiation edition 2 (CLARA-A2), and SolarGIS. In addition, a minor portion of the Meteonorm database by Meteotest, around five sample points across Indonesia, has been assessed in terms of coherency with the four mentioned databases. The correlation coefficient and relative bias of the multiyear monthly mean annual cycle of global horizontal irradiation (GHI) between pairs of databases are inspected. Three of the four databases are then validated against the available irradiation ground measurement data provided by the World Radiation Data Centre (WRDC). The correlation between each pair varies mostly between 0.7 and 1, which shows that the four databases agree to a certain extent on how the intermonthly variation behaves throughout the year. On the other hand, the validation result reveals that the three databases, i.e., POWER, CLARA-A2, and SARAH-E, suffer from a positive bias error ranging from 3% to 7%. Despite that, the correlation between measured and estimated values is still acceptable, with SARAH-E showing the best performance among the three. With careful selection and adjustment, these databases can be used as a tool for depicting interannual and intermonthly variations of solar irradiation throughout the Indonesian archipelago.

1. Introduction

In the modern energy society, photovoltaics (PV) has emerged as one of the leading technologies, contributing more than 20% of the worldwide total installed renewable energy power plant capacity by the end of 2018. Around 57% of this PV capacity was installed in Asia, where China, Japan, and India are the top three contributors. Meanwhile, Indonesia contributes less than 0.03% (around 60 MW) of the global total installed PV capacity [1]. Over the last 10 years, the annual growth of installed PV capacity in Indonesia has averaged only 5.2 MW. This is the fourth slowest growth rate among Southeast Asian countries, even though Indonesia is the largest economic power in the region. Although the price of solar panels has dropped significantly over the last decade, the growth of the PV industry in Indonesia does not seem to have benefited much from it. Other issues are frequently raised regarding this slow development, such as ineffective supporting regulation, an opaque framework, a low internal rate of return (IRR) for developers, bank reluctance to finance solar developers, low electricity tariffs, and many more [2, 3].

Nevertheless, the fact remains that a solar PV project requires a high initial investment. Reliable solar resource data with a long-term historical record enable a more detailed solar resource analysis, such as examining how the solar resource at a potential power plant site would behave throughout the investment years. This kind of analysis helps investors understand the uncertainty involved in a solar project investment. Likewise, the developer can refine this analysis into a P50/P90 uncertainty report, which is required to obtain funding from investors. The existence of reliable solar resource data is without a doubt an important factor in stimulating the growth of solar PV projects within a region.

Several studies have been conducted to estimate local solar irradiation in some regions of Indonesia via ground measurements [4–10]; unfortunately, most of the measurements were only conducted for a short period. Some works present a thorough summary of solar resource data acquired from ground measurements throughout the Indonesian archipelago between the mid-70s and mid-90s [11, 12]. Several models have also been proposed as tools to estimate local solar irradiation values, e.g., an artificial neural network (ANN) [13], Weather Research and Forecasting (WRF) [7], a stochastic model [5, 6], and a physics model [10]. Among those prior studies, one work has tried to build a complete map of Indonesian solar resources based on the surface and solar energy (SSE) database, whose data are mainly derived from images of the atmosphere captured by satellite [14].

Large area coverage of satellite images makes them superior in acquiring data over certain regions instantaneously, while the ground-based sensor only acquires data within a perimeter of the sensor’s location. Moreover, the captured images are well-archived, which enables the possibility of building a proper solar resources database from archived historical images. Due to these pronounced advantages, most of the solar resources databases that exist today incorporate satellite images into their algorithms either as a primary or secondary input variable.

Unlike developed countries, which have invested in developing their own solar resource databases and provide the service for free, Indonesia can currently only rely on data provided by third parties. In support of renewable energy development worldwide, some research institutions funded by developed countries have been providing worldwide open-source solar resource database services for quite some time, while some private institutions charge fees for their solar resource data services. Despite their limitations, such as temporal and spatial coverage ranges and resolutions, the open-source databases are preferable for solar PV players in Indonesia since they do not incur any additional cost.

The aforementioned limitations of several open-source solar resource databases are thoroughly discussed in this article, and finally, a common ground map is drawn that shows where all open-source solar resource databases agree within set limits on several statistical measures. Hopefully, the map can act as a guide for solar PV players in Indonesia to utilize the open-source databases and adjust them appropriately to any specific needs.

2. Methods

2.1. Satellite-Derived Solar Resources Databases

As mentioned previously, the open-source solar databases addressed in this article are all satellite-derived. In general, the databases differ only in the satellite selected as the image producer and in the surface irradiation derivation algorithm. The basic derivation algorithm usually starts by predicting the total amount of solar radiation that would reach the Earth’s surface if the sky were clear (commonly known as clear-sky irradiation). The prediction applies physics modeling that normally requires the estimated value of solar radiation at the top of the atmosphere (known as TOA irradiation) and some atmospheric parameters such as aerosol optical depth (AOD), precipitable water vapor (PWV), and others [15].

After predicting clear-sky irradiation, the algorithm continues by accounting for the additional effects caused by cloud conditions. The cloud condition is often described by several cloud properties retrieved from satellite images. The cloud property retrieval algorithm inspects each pixel (normally in the form of a square) of the image and determines the most appropriate cloud properties for that particular pixel. The number of pixels within an image determines the level of detail that can be derived for the captured 2D space, known as the spatial resolution, while the number of images captured over a period of time determines the level of 1D time detail that can be inferred from a group of images, known as the temporal resolution. The quality of a solar resource database also depends on these resolution values: the higher the resolutions, the more closely the database resembles the continuous behavior of the true irradiation values.

The first solar resource database is Prediction of Worldwide Energy Resource (POWER), developed by the NASA Langley Research Center. This database is an outgrowth of the previously mentioned SSE database, with a similar spatial resolution. Unlike most solar resource databases, POWER historical data are built from 3 different sources, one for each period: GEWEX SRB 3.0 (1983–2007), FLASHFlux v2 (2008–2012), and FLASHFlux v3 (2013–present) [16]. Due to this discontinuity, it is not recommended to use this database for analyses that span a change of sources. Validation of daily global irradiance values from the aforementioned sources against ground measurement stations (BSRN, ARM, and ocean buoys) shows that the FLASHFlux product is slightly better than the GEWEX SRB product in terms of the root mean square error (RMSE), by about 0.5%, while the worldwide average relative root mean square error (rRMSE) is around 18% [17].

The second database is Surface Solar Radiation–Heliosat-East (SARAH-E), provided by the EUMETSAT Satellite Application Facility on Climate Monitoring (CM SAF) [18]. Because the easternmost view of the METEOSAT-East satellite is limited to 128° E, only around 70% of the Indonesian archipelago is spatially covered by this database. Despite this limitation, SARAH-E offers high spatial resolution and a long historical dataset, which is beneficial for conducting thorough solar resource analysis. Unlike the POWER database, the global-scale validation of SARAH-E was conducted on only 8 ground measurement stations and included a single-year instantaneous global irradiance dataset. The result reveals that the average rRMSE value is around 33%, while the relative mean bias error (rMBE) shows inconsistent patterns across the eight stations, with some high positive and negative biases [19]. Some local validations within Asia are also available, such as for India [20] and China [21].

The third database is Cloud, Albedo, and Surface Radiation edition 2 (CLARA-A2), also developed by CM SAF [22]. Unlike SARAH-E, which derives its data from the MVIRI sensor of METEOSAT-East, CLARA-A2 utilizes the AVHRR sensor mounted on polar-orbiting satellites operated by NOAA and on the MetOp polar-orbiting satellites operated by EUMETSAT. The CLARA-A2 daily mean global irradiance dataset has been validated against data provided by the BSRN ground measurement station network. The result shows that the database on average suffers from a negative bias of about 1.7 W/m2 [23], while a local validation against Chinese ground measurement stations reveals that the database overestimates the global irradiance value by about 10 W/m2 [21].

The fourth solar resource database is provided by SolarGIS, which comes in paid and free versions. The open-source version of the database is presented in the form of a high spatial resolution local solar resource map product, prepared at the request of the World Bank’s Energy Sector Management Assistance Program (ESMAP) as the funding source [24]. The map itself is presented as a multiyear monthly mean annual cycle of irradiation values (as shown in Figure 1); therefore, it does not hold any historical time series, as a full solar resource database would. In exchange, it offers a very fine spatial resolution dataset, which might be a good alternative for solar players in Indonesia, e.g., government, NGOs, solar PV developers, consultants, R&D entities (universities and research centres), and manufacturers, who wish to use Typical Meteorological Year (TMY) solar radiation datasets. Local validation of the SolarGIS global irradiance database for Indonesia shows that the rMBE and rRMSE values are 0.6% and 2.5% for Bukit Kototabang Station, while for Palangkaraya Station, the values are around −4.6% and 8%, respectively. Since Indonesia is located in the humid tropical climate region, the rMBE value could go as high as 8% outside the measurement sites [25]. Important information regarding these databases is summarized in Table 1.

Nowadays, an initial solar resource database is usually included within common PV design software, e.g., PVsyst with Meteonorm and HOMER with NASA-SSE. Since many Indonesian solar developers are already familiar with both software packages, the solar resource databases bundled with them are assessed in terms of coherency with the four aforementioned satellite-derived databases; NASA-SSE is excluded since it is already represented by the POWER database. Similar to SolarGIS, Meteonorm 7.2 by Meteotest also offers free and paid versions. The open-source version covers a multiyear monthly mean annual cycle irradiation dataset, limited to a certain maximum number of sites. The Meteonorm algorithm combines ground measurement data interpolation with a satellite-derived process and roughly follows this rule [26]:
(1) If the closest station is within a 30 km radius (10 km for Europe), the data will be either similar to the station’s measured data or derived from the interpolation of nearby stations.
(2) If the closest station is within a radius of 30–300 km, the data will be a mixture of interpolation and satellite-derived data.
(3) If there is no station within a 300 km radius, the data will be purely derived from a satellite.
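The distance rule above can be sketched as a small selection function. This is only an illustration of the rule as summarized here, not Meteonorm's implementation: the function name, the returned labels, the treatment of the exact boundary distances, and the assumption that the 10 km limit for Europe is followed by the same 300 km outer limit are all hypothetical.

```python
def meteonorm_source(nearest_station_km: float, in_europe: bool = False) -> str:
    """Return the data source that the summarized rule would select.

    Labels and boundary handling are illustrative assumptions.
    """
    if nearest_station_km < 0:
        raise ValueError("distance must be non-negative")
    # Inner radius: 30 km in general, 10 km for Europe (from the rule above).
    near_limit_km = 10.0 if in_europe else 30.0
    if nearest_station_km <= near_limit_km:
        return "station"      # measured or interpolated from nearby stations
    if nearest_station_km <= 300.0:
        return "mixed"        # interpolation blended with satellite data
    return "satellite"        # purely satellite-derived


if __name__ == "__main__":
    for distance in (12.0, 150.0, 420.0):
        print(distance, meteonorm_source(distance))
```

Only sites for which such a function returns "satellite" would correspond to the purely satellite-derived case.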

To stand on equal terms with the other databases in this article, only data generated under the last item of the aforementioned rule are considered. So far, Meteonorm Satellite Irradiation (MNSI) has only been validated for the Europe, Africa, and Middle East regions, where the rRMSE of hourly values was found to lie around 12–25% [27].

2.2. Solar Radiation Ground Measurement Stations

Besides comparing the databases with one another, the article also checks them locally, which might reveal specific characteristics that the user needs to be aware of. The local validation of a solar resource database requires a reliable ground measurement dataset, and in this article, daily sum global irradiation data provided by the World Radiation Data Centre (WRDC) network stations [28] were selected. As the basic requirement for the validation process, the actual measurement time should coincide with the available timeframe of the satellite database. Even though there are 5 WRDC stations located in Indonesia, only one station satisfies this requirement. Therefore, nearby stations in surrounding countries that fulfill the timeframe requirement are also included in the validation process. A list of stations that comply with the time coincidence requirement is presented in Table 2.

2.3. Physics Parameter and Statistical Measures

Variations of the global horizontal irradiation (GHI) value have been proven able to represent the energy yield variations of a flat PV system [29], which is the most frequent PV installation type in Indonesia nowadays. At least for the present, GHI alone is sufficient as the main parameter for the comparison and validation processes. The main interest is the daily, monthly, and annual behavior of the energy yield potential of flat solar PV; therefore, the main unit of the parameter is initially defined as kWh/m2. Each database usually comes with its own output parameter, mostly either daily total incident irradiation or the daily mean of valid instantaneous irradiance, which later needs to be converted into the energy unit.
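As a minimal sketch of the unit adjustment mentioned above, the following helpers convert a daily mean irradiance in W/m2 and a daily irradiation in MJ/m2 into kWh/m2/day. The function names are hypothetical, and the first conversion assumes the mean is taken over the full 24 hours; a mean over valid daytime samples only would need a different averaging period.

```python
def mean_irradiance_to_daily_irradiation(mean_w_per_m2: float,
                                         hours: float = 24.0) -> float:
    """Convert a daily mean irradiance (W/m2) to irradiation (kWh/m2/day).

    Assumes the mean covers `hours` hours of the day (24 by default).
    """
    return mean_w_per_m2 * hours / 1000.0


def megajoule_to_kwh(mj_per_m2: float) -> float:
    """Convert irradiation from MJ/m2 to kWh/m2 (1 kWh = 3.6 MJ)."""
    return mj_per_m2 / 3.6


if __name__ == "__main__":
    print(mean_irradiance_to_daily_irradiation(200.0))
    print(megajoule_to_kwh(18.0))
```

For example, a 24-hour mean of 200 W/m2 corresponds to 200 × 24 / 1000 = 4.8 kWh/m2/day.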

The first thing to be verified is the GHI annual intermonth trend of each database, which is a cycle formed by the monthly averages of irradiance data within a year. The trend similarity of two databases is estimated by applying the Pearson correlation coefficient (r) to the daily long-term average for each month within a year, that is, a multiyear average of typical daily irradiation data for each month, arranged as a one-year dataset. Since each database has a different spatial and temporal resolution, some adjustments such as data downscaling and timeframe filtering are required to compare both datasets on equal terms. Spatial downscaling via averaging reduces variations within a dataset; therefore, the coefficient of variation (CoV) [30] is employed to estimate how much has been lost; the Ei and Ep terms stand for the GHI value of each pixel and the mean GHI value of pixels after downscaling, respectively. The temporal CoV (CoVT) is also added as extra information to analyze some disagreements on the GHI annual intermonth trend between the four databases, as shown in a later section; the Ej and Eq terms stand for the multiyear average of typical daily GHI value for each month and the mean of Ej over a year, respectively. Since the four databases have different temporal coverage, the 1999–2016 period has been selected to analyze CoVT for POWER, SARAH-E, and SolarGIS, while for CLARA-A2, the 1999–2015 period has been chosen:
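Assuming the standard textbook forms of these measures, the correlation coefficient and the two coefficients of variation can be written as below. The normalization by n rather than n − 1, the use of cov(·,·) for the covariance, and the 12-month sum in CoVT are assumptions made for this sketch, and no equation numbering is implied.

```latex
\[
r = \frac{\operatorname{cov}(db_1, db_2)}{\sigma_{db_1}\,\sigma_{db_2}}
\]
\[
\mathrm{CoV} = \frac{1}{E_p}\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(E_i - E_p\right)^{2}}
\]
\[
\mathrm{CoV}_T = \frac{1}{E_q}\sqrt{\frac{1}{12}\sum_{j=1}^{12}\left(E_j - E_q\right)^{2}}
\]
```

Here n is the number of pixels merged into one downscaled pixel, and j runs over the twelve months of the multiyear monthly mean annual cycle.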

Another statistical measure that has been employed is the mean bias (Bm) of one database relative to another. The computation steps of the mean bias are similar to those of the mean bias error (MBE). However, instead of comparing estimated values against true values of the parameter, it compares two sets of estimated values with each other. The measure is computed on a yearly time frame for each month and then mapped across the Indonesian archipelago to explore the major behavior of a particular database compared to another. Unlike r and CoV, Bm is not a dimensionless parameter; its unit is kWh/m2/day. In addition, the database with the lowest spatial resolution always acts as the reference for the Bm computation. The reference data (db1) simply replace the “measured data” term in the widely known MBE formula, while the db2 term still acts as the estimated data. For validation purposes, common statistical measures such as RMSE and MBE are employed; the Ee and Em terms in equations (4) and (5) stand for estimated and measured solar energy resources, respectively:
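As a minimal sketch of these measures, assuming the common "estimated minus measured" sign convention and relative values expressed as a percentage of the mean measured value, the computations could look as follows. The function names are hypothetical and the sample values are made up for illustration only.

```python
import numpy as np


def mean_bias(db1, db2):
    """Mean bias Bm of db2 relative to the reference db1 (units of the inputs)."""
    db1 = np.asarray(db1, dtype=float)
    db2 = np.asarray(db2, dtype=float)
    return float(np.mean(db2 - db1))


def mbe(estimated, measured):
    """Mean bias error: mean of (Ee - Em)."""
    return mean_bias(measured, estimated)


def rmse(estimated, measured):
    """Root mean square error: sqrt(mean((Ee - Em)^2))."""
    ee = np.asarray(estimated, dtype=float)
    em = np.asarray(measured, dtype=float)
    return float(np.sqrt(np.mean((ee - em) ** 2)))


def relative_percent(value, measured):
    """Express an MBE or RMSE value as a percentage of the mean measured value."""
    return 100.0 * value / float(np.mean(np.asarray(measured, dtype=float)))


if __name__ == "__main__":
    measured = [4.0, 5.0, 6.0]     # made-up daily GHI, kWh/m2/day
    estimated = [4.2, 5.3, 6.1]
    print(mbe(estimated, measured), rmse(estimated, measured))
    print(relative_percent(mbe(estimated, measured), measured))
```

With this convention, a positive MBE or Bm means that the estimated (or non-reference) dataset is higher than the measured (or reference) one.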

In the last step, both computed r and Bm are employed as agreement constraints to depict a spatial common ground for the four databases. The constraints are set to r > 0.8, denoting a “very strong” correlation [31], and −x < Bm < x, where x = 0.4 kWh/m2/day; the selected x value covers roughly 90% of the overall mean bias distribution of the database pairs. The generated common ground map acts as a reminder for solar players in Indonesia that there are some regions where the databases statistically exhibit weak agreement or even do not agree with each other. In these regions, the user is advised to be careful when using the data provided by any one of the aforementioned databases.
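The two agreement constraints can be illustrated with a short per-pixel computation. This is a sketch under stated assumptions, not the processing chain used for the maps: both inputs are assumed to be 12-month climatologies already regridded to a common grid with shape (12, ny, nx), and the function names are hypothetical. Only the thresholds 0.8 and 0.4 kWh/m2/day come from the text.

```python
import numpy as np

R_MIN = 0.8    # correlation threshold from the text
BM_MAX = 0.4   # mean bias limit x, kWh/m2/day, from the text


def pearson_along_months(a, b):
    """Pearson r per pixel between two (12, ny, nx) monthly climatologies."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    da = a - a.mean(axis=0)
    db = b - b.mean(axis=0)
    num = (da * db).sum(axis=0)
    den = np.sqrt((da ** 2).sum(axis=0) * (db ** 2).sum(axis=0))
    # Pixels without temporal variation have den == 0; r is undefined there.
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(den > 0, num / den, np.nan)


def agreement_mask(reference, other):
    """True where r > R_MIN and -BM_MAX < Bm < BM_MAX for a database pair."""
    reference = np.asarray(reference, dtype=float)
    other = np.asarray(other, dtype=float)
    r = pearson_along_months(reference, other)
    bm = (other - reference).mean(axis=0)
    # NaN correlations compare as False, so such pixels are "not agree".
    return (r > R_MIN) & (np.abs(bm) < BM_MAX)
```

In this sketch, a pixel whose annual cycle is flat in either database has an undefined correlation and is therefore labeled as "not agree".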

3. Results and Discussion

3.1. Databases Validation against WRDC Stations

The validations presented in Table 3 are conducted only on databases that provide free access to a historical dataset with at least daily resolution, i.e., POWER, CLARA-A2, and SARAH-E. Overall, POWER and SARAH-E exhibit consistent behavior across all validation stations in terms of both time series trend and bias error direction: they consistently overestimate GHI. Meanwhile, CLARA-A2 performs relatively well on time series coherency, but it does not show a consistent tendency toward either positive or negative bias across the measurement stations. This might be related to its strangely high initial spatial variations, which will be discussed in the CoV topic. Another interesting fact is the low performance of CLARA-A2 (especially on RMSE and r) compared to POWER, which fundamentally offers a lower spatial resolution. Both SARAH-E and CLARA-A2 utilize the MAGIC algorithm for estimating clear-sky irradiance; therefore, the low performance might be related to how the database handles cloudy conditions.

According to the result, there is no doubt that SARAH-E outperforms the other two databases, especially on the RMSE and time series correlation measures. However, there is a concern regarding sudden performance drops at 2 validation sites, namely Kuching and Kota Kinabalu. POWER and CLARA-A2 also suffer from these performance reductions, but the drops are not as pronounced as those of SARAH-E. The two stations are located on the Malaysian side of Kalimantan, close to Indonesian regions where the majority of databases do not agree on the GHI annual intermonth trend. Therefore, the performance drop could also be triggered by the low/no temporal variation in the dataset, as elaborated in the next section, or by transnational forest fire pollution, which has been proven to be one of the causes of air quality degradation in Malaysia [32]. The validations at the Christmas Island station exhibit unsatisfactory MBE and RMSE values, most likely due to the limited amount of validated data. Meanwhile, the validations at the Brunei Airport station reveal that the POWER and CLARA-A2 estimates do not perform well in predicting local time series trends. Since the last data time stamp was recorded in 1995, this is less likely to have been caused by transnational forest fire effects. The most probable reason is the low/no temporal variation within the satellite-derived and/or ground measurement datasets. The Bukit Kototabang station is located at a relatively high altitude compared to the other stations; therefore, its validation result could represent database performance under tropical hilly conditions. Despite the different conditions, the three databases still perform quite well, with SARAH-E exhibiting the best estimation, while CLARA-A2 is still behind POWER on the three statistical measures.

Excluding the Christmas Island validation result, it is clear that all validated databases overestimate the true GHI value by about 0.07–0.32 kWh/m2/day (3–7% rMBE) on average, which is roughly in line with the later statement regarding the position of the true GHI value among the satellite-estimated GHI datasets.

3.2. Comparison between Databases

As implicitly stated by the previous equation, the correlation coefficient describes how two variables move together, in this case the intermonth variations of two GHI databases. Two representative results, POWER vs SolarGIS and CLARA-A2 vs SolarGIS, presented in Figure 2, show that every pairing exhibits a roughly similar GHI annual intermonth trend across the country, indicated by r values close or equal to +1. Some disagreements on the GHI annual intermonth trend occur largely in the central region of Kalimantan and are vaguely visible in the southern and central parts of Sumatera. While the exact cause of the discrepancies has yet to be revealed, some clues from prior works might provide a possible answer.

Looking closely at the distribution of low/no correlation regions, there are two possibilities that might explain this situation. The first cause is likely related to cells with low/no temporal variation within the dataset; without temporal variation, the correlation itself would be minimal or even nonexistent, since the covariance between a pair of databases would also be minimal. The maps of CoVT in Figure 3 represent the magnitude of temporal variation within each cell of each database at its original spatial resolution. As can be seen, the regions with low/no temporal variation are roughly similar across the three representative databases, with the majority of low/no correlation cells coinciding with low/no temporal variation cells. One question remains, however: only some of the low/no temporal variation cells trigger low/no correlation cells, while the rest still exhibit some correlation between databases.

The second most probable cause is a disturbance of the aerosol temporal trend within the atmosphere, most likely related to natural or man-made forest fires, which occur frequently during the dry season. According to summary data released by the Indonesian Ministry of Environment and Forestry (MENLHK) for the 2014–2016 period, Sumatera, Kalimantan, and Papua were the top three regions affected by forest fires, contributing 38%, 33%, and 17%, respectively, of a total of around 3 million hectares of burnt area across the country [34]. Moreover, the locations of the specific provinces where large portions of the forest fires occurred coincide well with low/no correlation regions, e.g., Central Kalimantan, Riau, South Sumatera, and Papua. It is suspected that the combination of regions with low/no temporal variation and the absence of aerosol temporal trends related to the forest fire phenomenon triggers the low/no correlation regions within the result.

CM SAF surface radiation products have been validated at several ground measurement sites around the world; among those sites, there are some cases where the local climate is prone to aerosol load changes, such as China [21], India [20], and the Eastern Mediterranean [33]. These three cases reported that CM SAF surface radiation products exhibit some discrepancies, and the most probable explanation so far is the absence of aerosol temporal trends, which is inherited from the algorithm, as stated in the manuals of both databases [18, 22]. The same reasoning could apply to the present disagreement between solar databases on the GHI annual intermonth trend for some regions in Indonesia.

The spatial distribution of Bm is presented in box plot format instead of as a map, as shown in Figure 4. The box plot shows the upper and lower limits of the dataset through its caps (the ends of the whiskers), which have been set to the 95th and 5th percentiles, respectively, while the upper and lower borders of the box mark the Q3 and Q1 positions along the axis. Points outside the caps (red diamonds) are usually called outliers; in this context, they are interpreted as regions where the mean bias value exceeds either the upper or the lower limit. As a reminder, the zero axis of each box plot is relative to its reference database, i.e., POWER (box plots 1–3), CLARA-A2 (box plots 4 and 5), and SARAH-E (box plot 6).
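The box plot elements described above can be reproduced numerically with percentiles. The sketch below is illustrative only: the function name and the returned keys are hypothetical, and NumPy's default linear interpolation between data points is an assumption about how the percentiles are evaluated.

```python
import numpy as np


def box_plot_summary(values):
    """Return caps (5th/95th percentiles), quartiles, median, and outliers."""
    v = np.asarray(values, dtype=float).ravel()
    v = v[~np.isnan(v)]                      # ignore pixels without data
    p5, q1, median, q3, p95 = np.percentile(v, [5, 25, 50, 75, 95])
    outliers = v[(v < p5) | (v > p95)]       # points beyond the caps
    return {
        "lower_cap": float(p5),
        "q1": float(q1),
        "median": float(median),
        "q3": float(q3),
        "upper_cap": float(p95),
        "outliers": outliers,
    }
```

For plotting, Matplotlib's `boxplot` accepts `whis=(5, 95)` to place the whiskers at the same percentiles.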

As depicted by the first three box plots, the POWER database overvalues the other databases by about 0.5–0.22 kWh/m2/day on average. More than 90%, 60%, and 55% of POWER data estimations overvalue the estimated GHI of SolarGIS, SARAH-E, and CLARA-A2, respectively. Meanwhile, when CLARA-A2 is appointed as the reference (box plots 4 and 5), more than 95% and 75% of its data estimations overvalue SolarGIS and SARAH-E, by 0.22 and 0.14 kWh/m2/day, respectively. The last box plot reveals that around 70% of SARAH-E data estimations overvalue SolarGIS, by 0.09 kWh/m2/day. Sorting the results in ascending order leads to the following ranking, SolarGIS < SARAH-E < CLARA-A2 < POWER, which overall explains the relative position of each database in terms of estimated GHI magnitude.

The loss of spatial variation during the downscaling process is recorded and presented in Figure 5, sorted by the scale of spatial resolution reduction in descending order. Comparing the 95% level of each plot (upper cap), it is clear that the box plots exhibit the expected trend, where the CoV value decreases as the scale of spatial resolution reduction gets smaller. The CoV reduction of the downscaling process resembles linear behavior, with a different decreasing rate (gradient) for each database, i.e., SolarGIS (upper caps 1, 2, and 4) and SARAH-E (upper caps 3 and 5). The plots also reveal a strange fact regarding the variation loss of CLARA-A2. With a much smaller downscaling factor (upper cap 6), it suffers a roughly similar or even larger loss of spatial variation compared to the other databases. This implies that the CLARA-A2-estimated GHI dataset initially has high spatial variations. In addition, all box plots show a right-skewed data distribution, which implies that a minor portion (less than 25%) of the CoV values tends to spread over high values.
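The downscaling-by-averaging step and the resulting CoV can be illustrated with a block-averaging sketch. It assumes an integer resolution ratio between the two grids and perfectly aligned pixel edges, which real database pairs generally do not satisfy without regridding; the function name and the use of the population standard deviation are also assumptions.

```python
import numpy as np


def block_downscale(ghi, factor):
    """Average non-overlapping factor x factor blocks of a 2D GHI grid.

    Returns the coarse-grid mean (Ep) and the CoV of the fine pixels (Ei)
    inside each coarse pixel.
    """
    ghi = np.asarray(ghi, dtype=float)
    ny, nx = ghi.shape
    if ny % factor or nx % factor:
        raise ValueError("grid shape must be a multiple of factor")
    # Split each axis into (coarse index, index within block).
    blocks = ghi.reshape(ny // factor, factor, nx // factor, factor)
    mean = blocks.mean(axis=(1, 3))
    std = blocks.std(axis=(1, 3))     # population standard deviation
    return mean, std / mean


if __name__ == "__main__":
    fine = np.array([[4.0, 6.0], [4.0, 6.0]])   # made-up values, kWh/m2/day
    coarse, cov = block_downscale(fine, 2)
    print(coarse, cov)
```

A block of identical pixels gives a CoV of zero, so a high CoV marks coarse pixels where the averaging has removed a large amount of spatial detail.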

In addition, the spatial CoV loss for the two representative cases that have undergone the strongest downscaling (i.e., SolarGIS to CLARA-A2 and SolarGIS to POWER) is presented in Figure 6. Both figures generally agree on the regions where a significant impact from the downscaling process is estimated to emerge, i.e., northern Kalimantan, most regions in Sulawesi, central Papua, southern Sumatera, and Java. Apart from these regions, the spatial CoV loss is negligible.

By imposing constraints on r (GHI annual intermonth trend) and Bm (magnitude difference range) as requirements, several relative common grounds have been extracted from the previous results and summarized in the maps presented in Figure 7. Each map represents the agreement region of the databases relative to the selected reference database. Assuming the true GHI value lies within the range of the databases’ estimated values, the pixels labeled “agree” should have a higher chance of being close to the true value than those labeled “not agree”, since at least two databases agree on them. Due to the possibility of false signals triggered by one or more databases, three maps with different combinations of databases and different reference databases have been generated.

According to the maps, it is clear that the main “not agree” regions are strongly influenced by the previously discussed GHI annual intermonth trend disagreement around Kalimantan, Sumatera, and Papua. Since the GHI annual intermonth trend does not seem to be the cause of some “not agree” pixels in the Sulawesi and Maluku regions, as shown in Figure 7(a), the disagreement there must be triggered by unfit mean bias values, meaning that the relative bias between two databases does not satisfy the requirement of −x < Bm < x, where x = 0.4 kWh/m2/day. According to Figure 4 (1–4), the three databases (i.e., SolarGIS, SARAH-E, and CLARA-A2) hold some pixels whose Bm values are outside the required range. However, among the three, only SolarGIS and SARAH-E, whose caps extend beyond the defined x value, have a higher probability of exhibiting excessive Bm values. Upon checking the Bm distribution for each possible database pair, it is found that the “not agree” pixels around eastern Sulawesi, eastern Nusa Tenggara, and Maluku are indeed caused by the unfit Bm values of SARAH-E and SolarGIS against POWER as the reference database. Both SARAH-E and SolarGIS employ METEOSAT-East products as either primary or secondary inputs [19, 25]. As mentioned previously, the easternmost viewpoint of METEOSAT-East lies at 128° E, which approximately coincides with the regions in question. The edge of the field of view is known to be prone to the parallax effect, which usually triggers misinterpretation of actual cloud conditions and in the end generates inaccurate irradiance values.

To assess Meteonorm, five sample points across Indonesia were randomly selected. The points were positioned within regions that satisfy the “agree” requirements on the three aforementioned common ground maps. The points were also filtered to consist only of satellite-derived data (no interpolation involved). The assessment is conducted by simply checking the r and Bm values at the designated points and confirming whether the values satisfy the minimum requirements of an “agree” pixel. The assessment results presented in Table 4 reveal that only two sample points, i.e., Sumatera and Kalimantan, fulfill the GHI annual intermonth trend requirement, as indicated by a strong correlation coefficient of more than 0.8, and only one sample point, i.e., Sulawesi, complies with the “agree” cell requirement of “−0.4 < Bm < 0.4 kWh/m2/day.” Overall, none of these sample points passes the minimum requirements of an “agree” pixel. At this point, the Meteonorm satellite-derived dataset appears to be around 0.12–0.15 kWh/m2/day lower than the majority of the databases, while it exceeds only SolarGIS, by about 0.06 kWh/m2/day. Further investigation using a larger pool of data points is suggested to enhance the reliability of the present conclusion.

4. Conclusions

In this article, four satellite-derived databases that cover the region of Indonesia, i.e., POWER, SARAH-E, CLARA-A2, and SolarGIS, have been assessed in terms of data correlation and mean bias relative to each other. Most database pairs show a strong correlation, which implies that they follow a roughly similar GHI annual intermonth trend. Meanwhile, some regions such as Central Kalimantan, Riau, South Sumatera, and Papua exhibit weak/no correlation, which is suspected to be triggered by the existence of cells with low/no temporal variation within the inspected databases and the absence of aerosol temporal trends related to the forest fire phenomenon. The spatial distributions of mean bias values reveal the relative position of each database in terms of estimated GHI magnitude, which on average follows this order: SolarGIS < SARAH-E < CLARA-A2 < POWER. Both the computed correlation coefficient and the relative bias are employed as agreement constraints to depict spatial common grounds for the four databases. The majority of “not agree” regions on the presented common ground maps are influenced by the GHI annual intermonth trend disagreement between databases, while some minor “not agree” regions triggered by mean bias disagreement are believed to be the consequence of METEOSAT-East’s parallax effect.

In addition, a similar assessment was conducted on the Meteonorm satellite-derived dataset relative to the four databases at five selected sample points: Bali, Java, Kalimantan, Sulawesi, and Papua. It reveals that none of these sample points passes the minimum requirements of an “agree” pixel, while the four databases do. Three databases, i.e., POWER, CLARA-A2, and SARAH-E, are validated against WRDC stations located in Indonesia and surrounding nations by employing three statistical measures, namely RMSE, MBE, and the correlation coefficient. According to the result, SARAH-E outperforms the other two databases, especially on the RMSE and time series correlation measures, while CLARA-A2 is still behind POWER on the three statistical measures. The result also reveals that the three satellite-derived databases tend to overestimate the GHI value by about 0.07–0.32 kWh/m2/day (3–7% rMBE) on average compared to the true measured value.


Nomenclature

σdb1: The standard deviation of db1
σdb2: The standard deviation of db2
Bm: Mean bias between one database relative to another
CoV: Coefficient of variation
CoVT: Temporal coefficient of variation
db1: The GHI reference data
db2: The GHI estimated data
Ee: Estimated solar energy resources
Ei: GHI value for each pixel
Ej: Multiyear average of typical daily GHI value for each month
Eq: Mean of multiyear average of typical daily GHI value for each month in a year
Em: Measured solar energy resources
Ep: Mean GHI value of pixels before downscaling
MBE: Mean bias error
n: Amount of data
RMSE: Root mean square error
r: Pearson correlation coefficient
rMBE: Relative mean bias error
rRMSE: Relative root mean square error

Data Availability

The data used to support this study are obtained from NASA Langley Research Center (LaRC) POWER Project funded through the NASA Earth Science/Applied Science Program. The work performed was done (i.a.) by using SARAH-E and CLARA-A2 data from EUMETSAT’s Satellite Application Facility on Climate Monitoring (CM SAF), copyright (2019) EUMETSAT. © 2019, The World Bank, Source: Global Solar Atlas 2.0, Solar resource data: Solargis.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.