#### Abstract

High resolution global irradiance time series are needed for accurate simulations of photovoltaic (PV) systems, since the typical volatile PV power output induced by fast irradiance changes cannot be simulated properly with commonly available hourly averages of global irradiance. We present a two-step algorithm that is capable of synthesizing one-minute global irradiance time series based on hourly averaged datasets. The algorithm is initialized by deriving characteristic transition probability matrices (TPM) for different weather conditions (cloudless, broken clouds and overcast) from a large number of high resolution measurements. Once initialized, the algorithm is location-independent and capable of synthesizing one-minute values based on hourly averaged global irradiance of any desired location. The one-minute time series are derived by discrete-time Markov chains based on a TPM that matches the weather condition of the input dataset. One-minute time series generated with the presented algorithm are compared with measured high resolution data and show a better agreement compared to two existing synthesizing algorithms in terms of temporal variability and characteristic frequency distributions of global irradiance and clearness index values. A comparison based on measurements performed in Lindenberg, Germany, and Carpentras, France, shows a reduction of the frequency distribution root mean square errors of more than 60% compared to the two existing synthesizing algorithms.

#### 1. Introduction

The efficiency of PV modules depends mainly on the irradiance, amongst other secondary effects such as module temperature [1, 2]. The nonlinear dependency of the module efficiency on the irradiance and the influence of temperature on the module efficiency require simulations with a high temporal resolution.

For the understanding of the dynamic interaction of PV generator, storage systems, loads, and grids on a world-wide scale, one-minute data series of high quality in terms of realistic variability and frequency distributions are a key factor. Simulating those systems with hourly averaged values neglects significant behavior patterns like short time power enhancements [3].

To illustrate the importance of one-minute data for the simulation of PV systems, a 1 kWp PV example system with PV generator, DC/AC inverter, and grid is analyzed at the location of HTW Berlin, Germany. DC/AC inverters are used in grid-connected PV systems as power processing interface between the PV power source (DC) and the electric grid (AC). The output power is very sensitive to the temporal variability of the solar radiation which is highest during broken clouds.

In some important markets (e.g., Germany), PV systems can be affected by grid connection restrictions that define the maximum AC power output of the inverter as a percentage of the installed PV power on the DC side, where the usual limit is around 70% [4]. In Figure 1 the power output of the PV example system is shown in a one-minute temporal resolution (grey) and in an hourly averaged temporal resolution (blue) for a day with broken clouds. An energy yield loss of 7% is calculated when the 70% restriction is applied to the hourly averaged power output. When applying the restriction to the one-minute power output values, an energy yield loss of 10% is calculated.

Following Vanicek et al. in his contribution on the energy yield losses as a function of inverter dimensioning [3], we analyzed the dependency of energy yield losses due to maximum power clipping for PV inverters. Figure 2 shows that these losses are dependent on the inverter sizing factor as well and increase significantly when using one-minute instead of hourly averaged time series. In sum, energy losses due to inverter undersizing and maximum power clipping add up to a constant value within the inverter dimensioning range until the reciprocal of the power clipping value is reached (143%). This threshold marks the optimum inverter sizing factor for PV inverters with maximum power clipping, since losses will not decrease when using a larger inverter. With hourly averaged values (grey), the total energy loss is at 1.3% while the more precise simulation with one-minute values (blue) returns a total energy loss of 3.9%. These examples indicate that the use of hourly averaged irradiance datasets can result in falsified yield predictions.

While there exist several commercial providers and free sources of meteorological data in a resolution of one hour (e.g., Meteotest, SolarGIS, and TMY), covering nearly the whole earth, the availability of measured irradiance data with a resolution of less than an hour is very limited. This limited availability leads to the necessity to synthesize one-minute time series from hourly averaged data.

Several algorithms were developed in the past in order to synthesize one-minute global irradiance datasets with realistic variability and frequency distributions from hourly averaged datasets. The supposedly most established algorithms were developed by Aguiar and Collares-Pereira [5, 6], Skartveit and Olseth [7], and Glasbey [8]. Like many similar algorithms, the aim of those approaches is the reproduction of the characteristic frequency distributions of the solar irradiance or the clearness index , which is a measure for atmospheric transmission.

The contribution of Aguiar and Collares-Pereira was originally designed for the generation of hourly averaged time series with daily averages as input. It is based on the modeling of probability densities as Gaussian functions that depend on the clearness index . Skartveit and Olseth focused on the modeling of frequency distributions of global and direct irradiance, depending on intrahour and interhour irradiance variability, while using first-order autocorrelation for the generation of the actual time series. Glasbey proposed nonlinear autoregressive time series generation with joint marginal distributions as multivariate Gaussian mixtures. The estimation of probability density distributions of the irradiance has recently been investigated by Voskrebenzev et al. [9].

Other important contributions to this topic were provided by Assunção et al. [10] with investigations on the dependency of from the air mass and by Tovar et al. [11] with the analysis of the relation of hourly averaged* [clearness indices]* to one-minute clearness indices.

However, current algorithms only insufficiently withstand the validation against measurement values, since they underestimate irradiance enhancements caused by broken clouds, overestimate mid irradiance values, and provide one-minute time series with a variability that is too high.

Therefore, we developed an improved algorithm capable of synthesizing one-minute global irradiance time series based on hourly averaged global irradiance. The algorithm takes three different weather conditions (cloudless, broken clouds and overcast) into consideration. We show that the improved algorithm exceeds the performance of the Aguiar and the Skartveit algorithm in terms of temporal variability and characteristic frequency distributions for the calculation of short-term global irradiance at two exemplary PV installation locations.

#### 2. Measurement Data and Methodology

The new algorithm consists of two parts. The first part comprises a data preparation process that categorizes the input dataset and produces transition probability matrices (TPM) for three weather conditions: cloudless, broken clouds and overcast. The preparation process has to be executed only once.

The input dataset used for the initialization consists of global irradiance measurements conducted by the Baseline Surface Radiation Network (BSRN), featuring more than 50 locations all over the world with up to 20 years of measurements. The BSRN database is updated continuously with new measurement data; in this study we used a snapshot of May 2013. A subset of these data, one-minute global irradiance measurements performed in Lindenberg, Germany (2005), and Carpentras, France (2001), is used for the model validation.

The second part is the synthesis process for one-minute time series from hourly averaged time series. The required input of this process only consists of the prepared set of TPM and the hourly averaged time series of global irradiance that is to be disaggregated. The core of the process is based on Markov chains [12, 13], utilized in a similar way by McCracken [14].

The central idea in both parts of the new algorithm is the classification of weather situations by the temporal feature of the clearness index. In the first part, the preparation process, the BSRN dataset is split into three individual datasets corresponding to three weather conditions: cloudless, broken clouds and overcast. Each subset is then processed separately and transformed into a transition probability matrix. In the second part, the synthesis process, each daily dataset of the hourly averaged input values is categorized as well and processed according to their weather category. As a consequence, the main process steps of the new algorithm are only depending on those weather categories, in disregard of specific location information.

This leads to the advantage that the algorithm can be applied to hourly averaged datasets of arbitrary locations. Furthermore, the only required input is the hourly averaged datasets, once the TPM are created. Hence, the new algorithm combines aspects of existing work on this subject with a universally applicable method for the synthesis of one-minute time series from hourly averaged values.

##### 2.1. Classification of Weather Condition by the Clearness Index

The determination of predominant weather conditions is needed in both steps of the presented algorithm. The weather conditions are determined by the calculation of the clearness index . The clearness index is defined as the ratio of measured global irradiance at Earth’s surface and the irradiance calculated for cloudless conditions at the particular measuring site, denoted by clear sky irradiance : The calculation of the clear sky irradiance has a significant influence on the index. A modification of Bourges’ calculation [15] is used in this work, since it provided the best results for all analyzed locations: where is the elevation of the sun and is the extraterrestrial irradiance. The extraterrestrial irradiance was calculated using Maxwell’s approach [16], whereas the elevation of the sun was modelled by the algorithm of Reda and Andreas [17] from NREL.

The predominant weather condition on a particular day results in a characteristic temporal pattern of that can be used to categorize the day into one of the three classes. The detection algorithm of the weather condition is based on the daily average of hourly averaged values and the variability during a day : where is the number of hours where global and clear sky irradiance is above 0 W/m².

Table 1 gives an overview about the three weather classes and their detection conditions. An example for the classification is shown in Figure 3 for some days in August 2005 in Lindenberg, Germany. For a better visualization we fall back on one-minute values here, whereas it is to be noted that the detection is based on hourly averaged values of the clearness index , because these values form the input of the synthesis algorithm. The classification conditions are visualized as well in Figure 4 for an example dataset of Lindenberg, Germany.

##### 2.2. Transition Probability Matrices

For each class that represents a specific weather situation, matrices of transition probabilities (TPM) are created. The TPM contain information on how probable the switch is from one specific at time to another value at the time . To create those matrices, diurnal courses of measured one-minute values of equal weather class, independent of their location, are analyzed and converted into a common matrix. The frequency of every possible transition in the measured data is registered and afterwards normalized to obtain the transition probabilities. Therefore, a TPM contains all probabilities of the change of a specific value of to from one minute to the next under a specific weather condition. An example for a TPM of broken clouds weather condition is given in Table 2. In this case, the probability of to change from 0.1 to 0.09 during one minute is 17.6%, the probability to stay the same is 53.2%, and the probability to change from 0.1 to 0.01 is 0%.

The excerpt of a TPM shown in Table 2 is an example of how such transition probability matrices are structured. The actual values of the TPM however are subject to the underlying dataset that is used to create those matrices. In this study we will use different subsets of the BSRN databases for the creation process, depending on the dataset we use for validation. The validation dataset is omitted from the dataset for the TPM creation process to avoid self-reference. Hence, the resulting values in the matrices may vary, whereas the presented method to create the matrices is universal. For this reason we refrain from listing all 200 × 200 TPM in this study.

Since the TPM are created using real weather data in one-minute resolution, each measured irradiance within a given time interval leaves a fingerprint in a TPM. Hence, the spatial and temporal validity of the algorithm is increasing with the number of input datasets. As of May 2013, the BSRN comprises more than 6900 irradiance measurement months distributed globally, which is equal to more than 200 000 measurement days in one-minute resolution that leave their fingerprint in the TPM. The influence of the number of input data on the synthesis quality is referred to in Results section as well.

##### 2.3. Generation of Sequences with Markov Chains

To generate one-minute values from hourly averaged sequences of the global irradiance, the weather condition of the day in question is detected at first. Depending on the weather condition the correspondent -TPM is chosen.

The actual generation of the one-minute values is conducted with the help of the so-called discrete-time Markov chains (DTMC). DTMC is a state-based process for the modelling of real-world events. In the first order, the process is memory-less, so that the next state only depends on the current state [12, 13].

To determine the successor of a specific value at a given point in time , the probabilities belonging to are cumulated. Then, a Markov number between 0 and 1 is generated and inserted as a threshold value into the cumulated probability function. The point at which the probability function is bigger than the Markov number for the first time is defined as . The process continues in the same manner and generates a chain of 60 values per hour. From these sequences, the global irradiance for every point in time can be calculated with the help of the clear sky irradiance: This process is repeated until the mean value of the generated one-minute values equals the hourly averaged input value with desired accuracy . If necessary, the values are scaled as well:

#### 3. Results

In the following section the new algorithm is validated with measurement data and compared to the algorithms by Aguiar and Skartveit. The result comparison is conducted for two exemplary datasets of one year at two different locations: Lindenberg, Germany, 2005, and Carpentras, France, 2001. Both datasets are taken from the BSRN database. To avoid self-reference in the presented results, the creation process of the TPM excludes all measurement data of the respective location.

First, the results are presented on the basis of diurnal courses to assess the temporal variability, afterwards in the form of frequency distributions. In addition we provide a table with comparative uncertainties.

When assessing the temporal variability of synthetized one-minute values, the results for days with broken clouds and overcast skies are more important, since the simulation of sunny days is not difficult. In Figure 5 the measured course of the global irradiance (black (a)) is displayed in comparison to the temporal course of the values generated with the new algorithm (blue (b)) and the algorithms by Aguiar (c) and Skartveit (d) for an overcast day.

**(a)**

**(b)**

**(c)**

**(d)**

Although the exact occurrence of irradiance peaks in the modelled time series may differ from the measured time series, the variability of the values modelled with the new algorithm agrees with measured values to a very high degree. The mean variability of irradiance changes from one minute to the next is 7.0 W/m² for measured time series, whereas it is 8.2 W/m² for the data modelled with the new algorithm in the example dataset of Figure 5. With being the number of minutes of a day (1440), the mean variability of the global irradiance is calculated as follows: The methods of Aguiar and Skartveit lead to higher mean variability values of 13.1 W/m² and 16.6 W/m², respectively. Scientists of the Sandia National Laboratories as well refrain from using these algorithms for this reason:

Without an adequate method to account for autocorrelations (of relatively high order) in the one-minute time series of clearness index, simulations using these distribution forms would likely prove too variable, as we found for simulations using Glasbey’s model, and as we suspect would have resulted using the model of Skartveit and Olseth [18].

A more complete picture of the variability of solar irradiance can be obtained by analyzing the frequency of its gradients over a whole year. The gradients, in this case the absolute difference of the irradiance values of one minute to the next for the measured data and model data, are calculated and transferred into frequency plots. Figure 6 shows the frequency of irradiance gradients for Lindenberg, 2005, whereas the data for Carpentras, 2001, is displayed in Figure 7.

In both cases, the frequency distribution of the data modelled by the algorithms of Aguiar and Skartveit, respectively, shows significant overestimations for the gradient range from 10 to 100 W/m², while the new algorithm is able to produce irradiance values that feature a similar frequency distribution in this range. For gradients of less than 10 W/m² the data modelled by all algorithms show similar deviations from the measured data. For gradients of more than 100 W/m², the new algorithm and the approach of Skartveit display similar quality, whereas the algorithm of Aguiar shows significant underestimations for both locations.

For the transfer into deviation indicators, the deviations of the modelled data from the measured ones for each irradiance value are squared, weighed by its frequency, and summed up. The frequency weight is added in order to obtain information about the energetic relevance of each irradiance gradient. For the calculation of root mean square errors, these sums are then divided by the number of gradient steps and the square root is applied. Table 3 shows the results for all three analyzed models. In accordance with the visual impression of the frequency plots in Figures 6 and 7, the RMSE values for the new model presented in this study are significantly smaller than the RMSE values of the other two models by Aguiar and Skartveit: Since the frequency distributions of the global irradiance and the values are more reliable indicators for the applicability to simulations of photovoltaic systems, they are displayed in Figures 8, 9, 10, and 11. Measured values (black) are compared to values calculated by the conventional algorithms by Aguiar and Skartveit (grey dotted and solid), as well as to the new algorithm presented in this study (blue). Each of these distributions is calculated from values of one whole year.

For the generation of those figures, measured one-minute values were averaged to hourly means, which were then fractionized again using the new improved algorithm as well as the approaches of Aguiar and Skartveit. The figures show how often a specific irradiance value occurs in a year. The maximum at high irradiance values represents clear sky situations, while the second maximum at lower values is evoked by skies covered by clouds. Hence, the maximum at high irradiance values is considerably more pronounced at sunnier locations than at locations with very variable weather.

It can be seen that the new algorithm is reproducing the frequency distributions of the global irradiance much better than the conventional approaches. Mid irradiance values are not overestimated, and a good modelling quality is present at high irradiance values. However, very high irradiances above 1100 W/m² are slightly overestimated.

If those frequency distributions are looked at in the form of the clearness index , the problems of the existing algorithms become equally apparent (see Figures 10 and 11). With the improved algorithm the distributions can be reproduced very well, and the typical bimodal character of the distribution is modelled very precisely for cloudy locations (Lindenberg) as well as for sunnier locations (Carpentras) with a pronounced clear sky peak of a value near 1. The practical relevance of these effects was demonstrated with the help of the introductory example of the maximum power clipping at 70%.

These visual impressions give an indication, but an analysis of the uncertainty can be used for quantitative assessment. Table 4 lists root mean square errors (RMSE) for all distribution diagrams shown in Figures 8–11. For each irradiance or clearness index class the modelled distributions are compared to the measured data, and the deviations are squared and summed up for the whole range and then divided by the number of classes . The square root of this value gives the RMSE listed: The RMSE of both the irradiance and the clearness index distributions can be considerably decreased with the new algorithm compared to the conventional ones. In the case of Carpentras both distribution RMSE can be reduced between 24 % and 35 %, in case of Lindenberg between 31 % and 43 %.

To analyze the influence of the amount of input data for the TPM on the synthesis quality of the algorithm, the creation process of the TPM is varied as follows.

First, the algorithm is processed three times with its original setup, which includes all TPM except the ones from the respective location, to estimate the influence of the random Markov number generator on the RMSE range. Second, only TPM of the respective location are used. In a third iteration, the only measurement values included in the creation process are taken from BSRN stations that are located in the same climate zone as per the definition of Köppen [19]. Current data published by Rubel and Kottek [20] is taken to assign the locations to climate zones. Lindenberg is located in the climate zone* Cfb, *which mainly comprises Western Europe. In the BSRN dataset there are another seven locations in this climate zone: Cabauw (the Netherlands), Camborne and Lerwick (Great Britain), Cener (Spain), Lauder (New Zealand), Palaiseau (France), and Payerne (Switzerland). According to Rubel and Kottek, Carpentras lies in climate zone* Csa*, but unfortunately there is no other location of this climate zone in the BSRN dataset. So this third iteration is conducted for Lindenberg only. The fourth iteration comprises the usage of all available TPM, this time including the TPM of Lindenberg and Carpentras.

The synthesis of one-minute irradiance values is now repeated with all varied TPM. The RMSE values are determined according to the previous chapter. Table 5 compares the error values of the variations with the original version of the process.

The repetition of the synthesis process with the original setup (all TPM except own 1–3) demonstrates the RMSE range that can be expected due to the random nature of the Markov number generator. The interesting aspect of the various TPM modifications (own TPM only,* Cfb* TPM only, and all TPM) is that the resulting RMSE mostly lie well within the natural RMSE range of the original algorithm. In other words, the synthesis quality remains approximately the same, whether the algorithm uses only data of the respective location or all globally available data except those from the respective location. By classifying the weather situation on a daily level, the influence of location specific weather phenomena is reduced at the best. This implies that the presented algorithm is location-independent and can be applied to every location worldwide.

#### 4. Conclusions

An improved method for synthesizing one-minute time series of global irradiance has been presented that was developed on the basis of a large worldwide measurement dataset. It combines the advantages of conventional algorithms and adds new elements like the differentiation of weather conditions. It could be demonstrated that with the new approach it is possible to synthesize one-minute values of high statistical quality and realistic temporal variability. The independence on the location has been shown for selected cases. Such an independence would allow synthesizing one-minute time series for any location.

#### Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

#### Acknowledgments

The authors acknowledge support by Deutsche Forschungsgemeinschaft and Open Access Publishing Fund of Leibniz Universität Hannover.