#### Abstract

The increasing use of solar power as a source of electricity has led to growing interest in forecasting PV power output over short time horizons. Short-term forecasts are needed for operational planning, switching sources, backup scheduling, reserve usage, and peak load matching. However, the output of a photovoltaic (PV) system is influenced by irradiance, cloud cover, and other weather conditions, which makes short-term PV output forecasting difficult. In this paper, an experimental database of solar power output, solar irradiance, and air and module temperatures was utilized. It includes data from the Green Energy Office Building in Malaysia, the Taichung Thermal Plant of Taipower, and National Penghu University. Based on the historical PV power and weather data provided in the experiment, the factors that influence photovoltaic-generated energy are discussed. Moreover, five types of forecasting models were developed and used to predict the one-hour-ahead PV output: ARIMA, SVM, ANN, ANFIS, and a combination model that weights the other four using a genetic algorithm (GA). Forecasting results show the high precision and efficiency of this combination model. Therefore, the proposed model is suitable for ensuring the stable operation of a photovoltaic generation system.

#### 1. Introduction

Taiwan imports more than 97% of its energy, and 88% of all energy is generated by burning fossil fuels. Additionally, Taiwan’s power grid is mainly centralized, which can lead to insufficient peak power supply, and it lacks diversified energy sources and distributed auxiliary power for peak loads. Growing environmental awareness at home and abroad has made the development of clean power increasingly important. Wind and solar energy are the most common natural energy sources, and in metropolitan areas with weak winds, solar energy has become the dominant renewable energy. Notably, many developed countries have invested heavily in R&D and provided incentives that promote the use and development of photovoltaic (PV) systems. During 2000–2013, the compound growth rate of the global solar cell market reached 35.5%, indicating that the PV industry is growing rapidly. However, because solar irradiance and climate factors influence solar power output, the variation in generated power may behave as a nonstationary random process. Additionally, the types and installation locations of the PV panels that users install in local power generation systems may vary markedly. Hence, PV generation may affect power system operation once it is integrated into the grid. To reduce the uncertainty associated with PV power generation and to incorporate energy storage systems into power systems, analyzing and predicting PV power output have become essential.

Many factors, such as the sun’s elevation angle, atmospheric conditions, hours of sunshine, and the season, affect solar irradiance, which results in significant randomness in solar power generation. As meteorology has progressed, many studies of solar irradiance have applied clear-sky solar irradiance models [1], such as the American Society of Heating, Refrigerating and Air-Conditioning Engineers (ASHRAE) model, the semi-sinusoidal model, and the Collares-Pereira and Rabl model. These models do not consider variations in surface solar irradiance caused by complex weather and environmental effects, so their results typically differ considerably from actual values. Additionally, although huge databases of weather forecasts exist, determining surface solar irradiance and predicting power output accurately remain difficult tasks. Therefore, artificial intelligence and statistical methods are worth considering for short-term prediction of the power generated by PV systems.
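
As a concrete illustration of a clear-sky model, the ASHRAE model estimates direct normal irradiance as $I_{DN}=A\,\exp(-B/\sin\beta)$, where $\beta$ is the solar elevation angle, $A$ the apparent extraterrestrial irradiance, and $B$ the atmospheric extinction coefficient. The Python sketch below evaluates this expression. The default coefficients (A = 1230 W/m², B = 0.142) are illustrative placeholders of the order commonly tabulated for January, not values used in this study. Because the formula depends only on sun geometry and fixed monthly coefficients, it has no way to represent clouds, which is the limitation noted above.

```python
import math


def ashrae_clear_sky_dni(elevation_deg: float, a: float = 1230.0, b: float = 0.142) -> float:
    """Clear-sky direct normal irradiance in W/m^2 from the ASHRAE model.

    a: apparent extraterrestrial irradiance A (W/m^2)
    b: atmospheric extinction coefficient B (dimensionless)
    The defaults are illustrative placeholders, not values from this study.
    """
    if elevation_deg <= 0.0:
        return 0.0  # sun at or below the horizon: no direct beam
    beta = math.radians(elevation_deg)
    # 1/sin(beta) approximates the air mass: a lower sun means a longer path
    # through the atmosphere and therefore stronger attenuation.
    return a * math.exp(-b / math.sin(beta))


for elevation in (10, 30, 60, 90):
    print(f"elevation {elevation:2d} deg -> DNI {ashrae_clear_sky_dni(elevation):7.1f} W/m^2")
```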

Among the many solar energy-related studies, the prediction of solar irradiance remains the basis for most predictions of PV power generation [2–7]. Many solar irradiance forecasting models have been developed, and they can be divided into two main groups: statistical models and numerical weather prediction (NWP) models. Statistical models are based on the analysis of historical data; they include time-series models, models based on satellite data or sky images, ANN models, and wavelet-based models. NWP models are based on reproducing the underlying physical phenomena. From a practical point of view, the appropriate data sources and forecasting techniques vary significantly with the forecast horizon. Statistical models are typically used for short-term forecasting, that is, from a few minutes to a few hours ahead, because NWP results tend to have large errors at very short horizons. Longer-term forecasts, however, depend much more heavily on NWP models. Additionally, cloud imagery and hybrid models can improve forecasting results when solar irradiance is highly variable, as in many island territories.

In Taiwan, a Weather Research and Forecasting (WRF) based data assimilation system was implemented to configure the operational NWP system at the Central Weather Bureau (CWB) of Taiwan. The NWP system consists of the Advanced Research WRF dynamical core and a three-dimensional variational data assimilation (3DVAR) system. Three nested model domains were centered over Taiwan, with horizontal resolutions of 45, 15, and 5 km and 45 vertical levels. The outermost domain covered most of Asia and the western Pacific in order to better describe the evolution of the subtropical high over the Pacific Ocean and to reduce lateral boundary problems associated with the Tibetan Plateau. The operational NWP system ran four times a day and provided hourly output up to an 84-h forecast length. The shortwave radiation parameterization is based on the Goddard shortwave radiation scheme, a two-stream multiband scheme that accounts for both diffuse and direct solar radiation. This scheme provides the downward shortwave radiation flux used in this study. Standard Monin-Obukhov similarity theory was applied in the model surface layer to interpolate the wind field at 10 m above ground level (AGL).

In recent years, several studies have addressed direct prediction of PV power output [8–17]. These studies mainly used neural network-based prediction techniques [8–10], time-series analysis [11], and hybrid forecasting models [14–17], and hybrid models for PV power forecasting have become increasingly popular. For instance, reference [14] proposed a power forecasting system that combines three forecasting modules: two numerical weather prediction models (one global and one mesoscale) and an artificial intelligence based model. Reference [15] proposed a two-stage method in which a clear-sky model is first used to normalize the solar power and adaptive linear time-series models are then applied for prediction. Reference [17] combines two well-known methods, the seasonal autoregressive integrated moving average (SARIMA) method and support vector machines (SVMs), to predict short-term PV power. Such hybrid models aim to benefit from the advantages of each constituent model and obtain globally optimal forecasting performance. In some of them, for example, statistical and AI-based methods are used to determine the optimal weighting between online measurements and meteorological forecasts; accurate measurements can significantly improve the accuracy of PV power generation predictions.

Renewable energy forecasting differs from load forecasting in several ways. Load forecasting depends heavily on historical data because the load curve is periodic and seasonal. In contrast, most renewable energy output is not periodic; therefore, the input variables of a PV forecasting model need to include not only historical PV measurements but also weather-related variables, such as solar irradiance and temperature.

The main objective of this study is to propose a novel hybrid model for PV power forecasting that combines the ARIMA, SVM, ANN, and ANFIS methods using a genetic algorithm (GA). The hybrid forecasting model is described in detail in the following sections.

#### 2. System Monitoring and Database

In this study, PV power generation systems at three different locations were used as case studies for 1-hour-ahead prediction of PV output: Malaysia’s Green Energy Office (GEO) Building, Taiwan’s Taichung Thermal Power Plant, and an academic building at National Penghu University (NPU). Although the environments of these sites differ slightly, their monitoring records are similar and include power generation, atmospheric temperature, solar irradiance, and module temperature. Table 1 lists the characteristics and related parameters of the three PV systems.

By the end of 2013, the total installed capacity of PV systems in Taiwan had reached 333.4 MW. To encourage solar PV installations in Taiwan, drive economic growth, and support the development of the solar PV industry, the Bureau of Energy of the Ministry of Economic Affairs (BOE, MOEA) launched the “Million Solar Rooftop Program,” which targets an installed PV capacity of 6200 MW in Taiwan by 2030.

Given its abundant agricultural residue, sunshine, and rainfall, Malaysia’s most significant renewable energy sources are biomass, solar, and small hydropower. Currently, renewable energy accounts for less than 1% (55 MW) of Malaysia’s total generation capacity, including 1.5 MW of cumulative grid-connected PV installations. Nevertheless, renewable energy is expected to grow with the implementation of a feed-in tariff (FiT) scheme, under which individuals can sell the power they generate to utility companies such as Tenaga Nasional Berhad (TNB) and Sabah Electricity Sendirian Berhad (SESB) at a fixed premium rate for a specified period.

Although daily PV output varies, a regular daily pattern can be identified from its time-series characteristics. If two days with similar solar irradiance and cloud distribution are compared, their power output patterns should be very similar. However, because cloud cover varies dramatically and is difficult to predict accurately, the actual PV power curve consists of a fundamental waveform with superimposed random fluctuations. For instance, the power output curve of the PV system at Malaysia’s GEO Building fluctuates strongly, mainly because its data are sampled at 15-minute intervals (Figure 1). Short-term variations in solar irradiance and cloud cover also produce different daily fluctuations in its power curve. Figure 2 shows one week of PV power generation data from the Taichung Thermal Power Plant in July as a three-dimensional (3-D) plot; each daily curve has a fundamental waveform with random increasing and decreasing fluctuations. Figure 3 compares PV power generation curves of the Taichung Thermal Power Plant in summer and winter. The power output in summer (solid line) is markedly higher than that in winter (dashed line), but dramatic fluctuations occur in both seasons. Additionally, the variation in power output during the roughly 13-14 hours of sunshine generally follows the variation in solar irradiance (Figure 3). However, when a sunny day suddenly turns rainy, or a very cloudy day becomes partially cloudy, irradiance changes dramatically and the power output prediction becomes inaccurate. Therefore, using hourly weather forecast information, such as the day type (sunny, cloudy, or rainy) and the average cloud cover, as input variables can improve a prediction model’s accuracy when the day type changes suddenly.

#### 3. Determination of Input Variables for the PV Power Forecasting Model

Generally, if sufficiently accurate solar irradiance data are available, they can be input into a formula to derive the predicted output power, so predicting power output from renewable energy sources is closely tied to weather forecasting. To predict solar irradiance or generated power, various environmental factors, such as solar irradiance, cloud cover, atmospheric pressure, and temperature, must be considered along with the conversion efficiency of the PV panels, their installation angles, dust on the panels, and other random factors, since all of these affect PV system output. Hence, the input variables chosen for a prediction model should be deterministic factors that are strongly correlated with power generation. Additionally, time-series data for PV power generation are strongly autocorrelated, so historical power data should also be used as inputs to the forecasting model. Accurate prediction of PV power generation requires a stable and detailed monitoring system that records all information that may aid solar energy prediction. Many large renewable energy systems worldwide have numerous weather stations and highly stable monitoring systems, and this infrastructure is very important for predicting PV power output.

Model input variables must be chosen carefully because they are essential to accurate predictions, and they should include the weather data that affect PV power output. However, such detailed data are not recorded by the existing systems that monitor atmospheric data at many PV power generation sites. For small-capacity PV systems in small areas, the data collected are typically limited to solar irradiance, power output, atmospheric temperature, and module temperature [18]. Notably, most systems cannot monitor cloud cover accurately, and meteorological centers usually predict hourly average cloud cover inaccurately, which makes accurate prediction of PV power output extremely difficult. In this study, correlation analysis was applied to the measured data of the three PV systems to identify the variables that are strongly correlated with PV power output. Figure 4 shows the correlation between power output and module temperature for the Malaysian PV system; the analysis indicates that these two variables are strongly and positively correlated.

Figure 5 shows the correlation between power output and solar irradiance for the Malaysian PV system. Although some data points deviate slightly from the straight line corresponding to a correlation coefficient of 1, the overall trend indicates that these two variables are strongly correlated.
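
The correlation coefficients discussed here are ordinary Pearson coefficients. The dependency-free Python sketch below computes one; the ten hourly irradiance and power values are invented for illustration and are not measurements from the Malaysian system.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample Pearson correlation coefficient between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length series with at least two points")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    if s_xx == 0.0 or s_yy == 0.0:
        raise ValueError("a constant series has no defined correlation")
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical hourly values for one clear day (not data from this study).
irradiance_w_m2 = [0, 120, 380, 640, 820, 900, 810, 600, 350, 100]
power_kw = [0.0, 1.1, 3.6, 6.2, 7.9, 8.6, 7.8, 5.7, 3.4, 0.9]
print(f"r(power, irradiance) = {pearson_r(power_kw, irradiance_w_m2):.3f}")
```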

In this study, the correlation coefficients between power output and the three external variables were calculated for the three PV systems (Table 2). Hourly power output is most strongly correlated with hourly solar irradiance and only very weakly correlated with hourly atmospheric temperature. Module temperature is also strongly correlated with power output; however, it should not be used as an external variable in a prediction model because it cannot be predicted and is only weakly correlated with atmospheric temperature. Solar irradiance, on the other hand, can be estimated using a physical model with appropriate correction. Therefore, solar irradiance is used in this study as the external variable of the prediction model for PV power output. Furthermore, the time series of power output is autocorrelated. Table 2 shows the correlation coefficients of the independent variables for the three PV systems and also lists their first three correlation coefficients. This study calculates correlation statistics for the first three entries and autocorrelations for the first 4–40 entries. The results show that the historical data with high correlation coefficients are the previous entry and the previous and next entries of the previous two cycles. These correlation analyses serve as an important reference when choosing input variables for any prospective model.
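
The lag screening described above can be sketched as follows. The sketch assumes hourly data, uses the standard (biased) sample autocorrelation, and builds a synthetic series with a 24-hour cycle; the profile, the ten-day length, and `max_lag = 48` (two daily cycles) are assumptions for illustration.

```python
import math
from typing import List, Sequence


def autocorrelation(series: Sequence[float], max_lag: int) -> List[float]:
    """Sample autocorrelation r_k for k = 1..max_lag (biased estimator)."""
    n = len(series)
    if not 0 < max_lag < n:
        raise ValueError("max_lag must be between 1 and len(series) - 1")
    mean = sum(series) / n
    dev = [v - mean for v in series]
    c0 = sum(d * d for d in dev)
    if c0 == 0.0:
        raise ValueError("a constant series has no defined autocorrelation")
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
            for k in range(1, max_lag + 1)]


def strongest_lags(acf: Sequence[float], count: int) -> List[int]:
    """Return the `count` lags with the largest autocorrelation, in ascending order."""
    ranked = sorted(range(1, len(acf) + 1), key=lambda k: acf[k - 1], reverse=True)
    return sorted(ranked[:count])


# Hypothetical hourly PV profile: zero at night, half-sine between 06:00 and 18:00,
# repeated for ten identical days. Real data would add weather-driven noise.
daily = [max(0.0, math.sin(math.pi * (h - 6) / 12)) for h in range(24)]
series = daily * 10
acf = autocorrelation(series, max_lag=48)
print("strongest lags:", strongest_lags(acf, 3))
```

On such a clean periodic series, the previous hour and the hours around the same time on the previous day rank highest; with real data, the ranking should be recomputed for each site.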

#### 4. Description of the Proposed PV Power Forecasting System

This work utilizes four statistical and artificial intelligence methods: the autoregressive integrated moving average (ARIMA) model, the least-squares support vector machine (LS-SVM) model, the artificial neural network (ANN), and the adaptive neuro-fuzzy inference system (ANFIS). Furthermore, a novel two-stage model that combines these models using the GA is applied to predict short-term PV power.

Several approaches exist for time-series modeling. In an ARIMA model, the future value of a variable is assumed to be a linear function of several past observations and random errors. Building an ARIMA model involves three iterative steps: model identification, parameter estimation, and diagnostic checking. The autocorrelation function (ACF) and the partial ACF (PACF) of the sample data are the basic tools for identifying the best order of the model; this involves selecting the most appropriate lags for the AR and MA parts and determining whether the variable requires differencing to induce stationarity. Once a tentative model is specified, estimating its parameters is straightforward and usually involves a least-squares procedure. The last step is a diagnostic assessment of model adequacy. This three-step process is typically repeated several times until a satisfactory model is selected, which can then be used for prediction. Compared with the other forecasting techniques, the ARIMA model does not require a meteorological forecast of solar irradiance, which is often complicated to obtain. Because of its simplicity, the ARIMA model has been widely discussed as a statistical model for forecasting PV system output. The ARIMA model is a single-variable time-series model; its general (seasonal) form is

$$\phi_p(B)\,\Phi_P(B^s)\,\nabla^d\,\nabla_s^D\,y_t=\theta_q(B)\,\Theta_Q(B^s)\,\varepsilon_t,$$

where $y_t$ is the PV power at time $t$ and $\varepsilon_t$ is a white-noise error term; $\phi_p(B)$ and $\Phi_P(B^s)$ are the nonseasonal and seasonal autoregressive polynomials, and $\theta_q(B)$ and $\Theta_Q(B^s)$ are the nonseasonal and seasonal moving-average polynomials; $\nabla^d=(1-B)^d$ and $\nabla_s^D=(1-B^s)^D$ are the trend and seasonal difference operators with seasonal period $s$; and $B$ is the backshift operator defined by $By_t=y_{t-1}$. Using the difference equations together with the ACF and PACF plots, the significant historical data at certain time lags were chosen as input variables.
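
A full seasonal ARIMA fit is usually done with a statistics library (for example, the `SARIMAX` class in Python's statsmodels). As a lighter illustration of how the selected lags act as inputs, the NumPy sketch below fits only the autoregressive part by least squares, $y_t=c+\sum_k a_k y_{t-k}$, with no differencing and no MA terms. The lag set {1, 24, 48} and the synthetic data are assumptions, not the orders identified in this study.

```python
import numpy as np
from typing import Sequence


def fit_lagged_ar(series: Sequence[float], lags: Sequence[int]) -> np.ndarray:
    """Least-squares fit of y_t = c + sum_k a_k * y_(t-k) for the given lags.

    Returns [c, a_lag1, a_lag2, ...]. AR-only sketch: no differencing, no MA terms.
    """
    y = np.asarray(series, dtype=float)
    p = max(lags)
    if len(y) <= p + len(lags):
        raise ValueError("series too short for the requested lags")
    # Row for target t (t = p..n-1) holds [1, y_(t-k1), y_(t-k2), ...].
    X = np.column_stack([np.ones(len(y) - p)] + [y[p - k:len(y) - k] for k in lags])
    coef, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    return coef


def predict_next(series: Sequence[float], lags: Sequence[int], coef: np.ndarray) -> float:
    """One-step-ahead forecast of the value that follows the end of `series`."""
    y = np.asarray(series, dtype=float)
    x = np.concatenate(([1.0], [y[-k] for k in lags]))
    return float(x @ coef)


# Hypothetical hourly series: a daily half-sine profile plus small noise.
rng = np.random.default_rng(0)
hours = np.arange(24 * 14)
clean = np.clip(np.sin(np.pi * ((hours % 24) - 6) / 12), 0.0, None)
noisy = clean + 0.02 * rng.standard_normal(hours.size)
lags = [1, 24, 48]  # previous hour, same hour one and two days earlier (assumed)
coef = fit_lagged_ar(noisy[:-1], lags)
print("coefficients:", np.round(coef, 3))
print("forecast:", round(predict_next(noisy[:-1], lags, coef), 3),
      "actual:", round(float(noisy[-1]), 3))
```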

ANN techniques have been widely used to solve forecasting problems. An ANN is a mathematical tool originally inspired by the way the human brain processes information. Because an ANN is a multivariate, nonlinear, and nonparametric method, it should be able to model complex nonlinear relationships much better than conventional linear models. This work uses a multilayer feed-forward network trained by back-propagation, the architecture most frequently used for PV power forecasting, and the Levenberg-Marquardt algorithm is applied to train the ANN model. The accuracy of ANN solutions relies heavily on the chosen input variables. Many methods for identifying ANN input variables have been developed; most rely on correlation analysis combined with heuristics and the operator’s experience. In this work, significant exogenous variables and measured historical data were used as inputs to the ANN model.
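
The Levenberg-Marquardt update can be sketched in NumPy for a network with one tanh hidden layer and a linear output. Each iteration solves $(J^{T}J+\mu I)\,\Delta\theta=-J^{T}\mathbf{r}$ for the weight update, where $J$ is the Jacobian of the residual vector $\mathbf{r}$; the damping $\mu$ is decreased after a successful step and increased otherwise. The network size, the finite-difference Jacobian, the initialization, and the synthetic inputs are assumptions made to keep the example short, not the configuration used in this study.

```python
import numpy as np


def unpack(theta, n_in, n_hid):
    """Split the flat parameter vector into layer weights and biases."""
    i = n_hid * n_in
    W1 = theta[:i].reshape(n_hid, n_in)
    b1 = theta[i:i + n_hid]
    w2 = theta[i + n_hid:i + 2 * n_hid]
    b2 = theta[i + 2 * n_hid]
    return W1, b1, w2, b2


def mlp(theta, X, n_hid):
    """One tanh hidden layer followed by a single linear output neuron."""
    W1, b1, w2, b2 = unpack(theta, X.shape[1], n_hid)
    return np.tanh(X @ W1.T + b1) @ w2 + b2


def train_lm(X, y, n_hid=5, iters=100, mu=1e-2, seed=0):
    """Levenberg-Marquardt on the sum of squared errors; returns weights and SSE history."""
    rng = np.random.default_rng(seed)
    n_par = n_hid * X.shape[1] + 2 * n_hid + 1
    theta = rng.normal(scale=0.5, size=n_par)

    def residuals(t):
        return mlp(t, X, n_hid) - y

    r = residuals(theta)
    history = [float(r @ r)]
    eps = 1e-6
    for _ in range(iters):
        # Forward-difference Jacobian of the residuals (back-propagation can
        # compute the same matrix analytically).
        J = np.empty((len(y), n_par))
        for j in range(n_par):
            step = np.zeros(n_par)
            step[j] = eps
            J[:, j] = (residuals(theta + step) - r) / eps
        A, g = J.T @ J, J.T @ r
        # Damped Gauss-Newton step: raise mu until the error decreases.
        while mu < 1e10:
            delta = np.linalg.solve(A + mu * np.eye(n_par), -g)
            r_new = residuals(theta + delta)
            if r_new @ r_new < history[-1]:
                theta, r = theta + delta, r_new
                history.append(float(r @ r))
                mu = max(mu / 10.0, 1e-12)
                break
            mu *= 10.0
        else:
            break  # no improving step found: treat as converged
    return theta, history


# Hypothetical, already-scaled inputs: [previous-hour power, forecast irradiance].
data_rng = np.random.default_rng(1)
X = data_rng.uniform(0.0, 1.0, size=(80, 2))
y = 0.3 * X[:, 0] + 0.7 * X[:, 1] ** 1.2
theta, history = train_lm(X, y)
print(f"SSE: {history[0]:.4f} -> {history[-1]:.6f}")
```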

Fuzzy systems and neural networks are complementary tools for building intelligent systems; however, fuzzy systems cannot learn or adjust themselves. Merging a neural network with a fuzzy system into an integrated system is therefore a promising approach for building PV power prediction models. ANFIS, a fuzzy inference system based on the Sugeno model, combines the self-learning ability of an ANN with the linguistic expressiveness of fuzzy inference; its membership functions and fuzzy rules are learned from large amounts of existing data rather than set by experience or intuition. The classical ANFIS structure is a five-layer feed-forward neural network comprising a fuzzification layer, a rule layer, a normalization layer, a defuzzification layer, and a single summation neuron. In this work, Gaussian functions are used as the membership functions.
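
To show how the five layers interact, the sketch below evaluates a first-order Sugeno ANFIS with Gaussian membership functions for a given set of parameters. It covers inference only; training the premise and consequent parameters (for example, with the hybrid least-squares and gradient-descent procedure commonly used for ANFIS) is omitted. The two-input, two-set rule grid and all parameter values are hypothetical.

```python
import math
from itertools import product
from typing import Sequence, Tuple


def gaussian_mf(x: float, c: float, sigma: float) -> float:
    """Gaussian membership grade with centre c and width sigma."""
    return math.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))


def anfis_forward(x: Sequence[float],
                  mfs: Sequence[Sequence[Tuple[float, float]]],
                  consequents: Sequence[Sequence[float]]) -> float:
    """First-order Sugeno inference over the full grid of rules.

    mfs[j] lists the (centre, sigma) pairs of the fuzzy sets for input j.
    consequents[k] = (p_1, ..., p_m, r) for rule k, rules in itertools.product order.
    """
    # Layer 1 (fuzzification): grade of each input in each of its fuzzy sets.
    grades = [[gaussian_mf(xj, c, s) for c, s in mfs[j]] for j, xj in enumerate(x)]
    # Layer 2 (rules): firing strength = product of one grade per input.
    firing = [math.prod(combo) for combo in product(*grades)]
    if len(firing) != len(consequents):
        raise ValueError("need one consequent per rule")
    total = sum(firing)
    if total == 0.0:
        raise ValueError("input lies outside the support of every rule")
    # Layer 3 (normalization): relative firing strengths that sum to 1.
    normalized = [w / total for w in firing]
    # Layers 4-5 (defuzzification, summation): weighted sum of linear rule outputs.
    out = 0.0
    for w_bar, coeffs in zip(normalized, consequents):
        *p, r = coeffs
        out += w_bar * (sum(pi * xi for pi, xi in zip(p, x)) + r)
    return out


# Hypothetical model: inputs scaled to [0, 1], two fuzzy sets ("low", "high") each.
mfs = [[(0.0, 0.3), (1.0, 0.3)],   # previous-hour power
       [(0.0, 0.3), (1.0, 0.3)]]   # forecast irradiance
consequents = [(0.2, 0.1, 0.0), (0.1, 0.6, 0.0), (0.5, 0.2, 0.1), (0.3, 0.6, 0.1)]
print(round(anfis_forward([0.4, 0.8], mfs, consequents), 4))
```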

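A support vector machine (SVM) is a machine learning algorithm based on statistical learning theory and the principle of structural risk minimization. SVMs have been applied successfully to pattern recognition, nonlinear regression estimation, and time-series forecasting. The least-squares support vector machine (LS-SVM) is an extension of the standard SVM. Let $\{(\mathbf{x}_i, y_i)\}_{i=1}^{N}$ be independent data pairs, where each $\mathbf{x}_i\in\mathbb{R}^n$ is a vector in the input space, $y_i\in\mathbb{R}$ is its corresponding target value in the output space, and $N$ is the number of data pairs. In an SVM, the data are mapped nonlinearly from the input space into a high-dimensional feature space by $\varphi(\cdot)$, and the forecasting function is $f(\mathbf{x})=\mathbf{w}^{T}\varphi(\mathbf{x})+b$, where $\mathbf{w}$ is the connection weight vector and $b$ is the bias. The optimization problem for the standard SVM is

$$\min_{\mathbf{w},b,\xi,\xi^{*}}\ \frac{1}{2}\lVert\mathbf{w}\rVert^{2}+C\sum_{i=1}^{N}\left(\xi_i+\xi_i^{*}\right)\quad\text{s.t.}\quad\begin{cases}y_i-\mathbf{w}^{T}\varphi(\mathbf{x}_i)-b\le\varepsilon+\xi_i,\\ \mathbf{w}^{T}\varphi(\mathbf{x}_i)+b-y_i\le\varepsilon+\xi_i^{*},\\ \xi_i,\ \xi_i^{*}\ge 0,\end{cases}$$

where $\xi_i$ and $\xi_i^{*}$ are slack variables, $\varepsilon$ is the width of the insensitive zone, and $C$ is a positive real constant that determines the penalty for forecasting errors. The LS-SVM modifies this formulation by replacing the inequality constraints with equality constraints and a squared-error loss; the final LS-SVM model can be represented as

$$f(\mathbf{x})=\sum_{i=1}^{N}\alpha_i K(\mathbf{x},\mathbf{x}_i)+b,$$

where $\alpha_i$ are Lagrange multipliers and $K(\mathbf{x},\mathbf{x}_i)=\varphi(\mathbf{x})^{T}\varphi(\mathbf{x}_i)$ is the kernel function, an inner product in the feature space that avoids computing $\varphi$ explicitly. This work uses the radial basis function (RBF) kernel, $K(\mathbf{x},\mathbf{x}_i)=\exp\!\left(-\lVert\mathbf{x}-\mathbf{x}_i\rVert^{2}/(2\sigma^{2})\right)$.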
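
In practice, training an LS-SVM reduces to solving one linear system,

$$\begin{bmatrix}0&\mathbf{1}^{T}\\ \mathbf{1}&K+\gamma^{-1}I\end{bmatrix}\begin{bmatrix}b\\ \boldsymbol{\alpha}\end{bmatrix}=\begin{bmatrix}0\\ \mathbf{y}\end{bmatrix},$$

where $K_{ij}=K(\mathbf{x}_i,\mathbf{x}_j)$ and $\gamma$ is a regularization constant. The NumPy sketch below solves it with an RBF kernel; the values of $\gamma$ and $\sigma$ and the synthetic two-feature inputs are assumptions, not tuned values from this study.

```python
import numpy as np


def rbf_kernel(A: np.ndarray, B: np.ndarray, sigma: float) -> np.ndarray:
    """K[i, j] = exp(-||A_i - B_j||^2 / (2 sigma^2))."""
    sq_dist = (np.sum(A ** 2, axis=1)[:, None] + np.sum(B ** 2, axis=1)[None, :]
               - 2.0 * A @ B.T)
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * sigma ** 2))


def lssvm_fit(X: np.ndarray, y: np.ndarray, gamma: float = 100.0, sigma: float = 0.5):
    """Solve the LS-SVM dual linear system; returns the bias b and multipliers alpha."""
    n = X.shape[0]
    M = np.zeros((n + 1, n + 1))
    M[0, 1:] = 1.0                      # first row encodes sum(alpha) = 0
    M[1:, 0] = 1.0
    M[1:, 1:] = rbf_kernel(X, X, sigma) + np.eye(n) / gamma
    solution = np.linalg.solve(M, np.concatenate(([0.0], y)))
    return solution[0], solution[1:]


def lssvm_predict(X_train: np.ndarray, alpha: np.ndarray, b: float,
                  X_new: np.ndarray, sigma: float = 0.5) -> np.ndarray:
    """f(x) = sum_i alpha_i K(x, x_i) + b."""
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b


# Hypothetical scaled inputs [P(t-1), forecast G(t)] and target P(t).
rng = np.random.default_rng(2)
X = rng.uniform(0.0, 1.0, size=(50, 2))
y = 0.3 * X[:, 0] + 0.7 * X[:, 1] + 0.01 * rng.standard_normal(50)
gamma, sigma = 100.0, 0.5
b, alpha = lssvm_fit(X, y, gamma, sigma)
print(lssvm_predict(X, alpha, b, np.array([[0.5, 0.9]]), sigma))
```

Each training residual equals $\alpha_i/\gamma$, so a larger $\gamma$ fits the training data more closely at a greater risk of overfitting.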

Each single prediction model has advantages and disadvantages. For instance, some models respond better to rapid changes in a waveform, while others are better at capturing steady periodic variations. This work therefore proposes a novel hybrid model that combines these prediction models in a two-stage forecasting process. In the first stage, the PV power is predicted individually by four models, and their forecasts become the inputs of the second stage. In the second stage, a weight coefficient is assigned to each first-stage model to form the final hybrid model; the weight coefficients are obtained using an adaptive GA. The resulting weighted hybrid forecasting model can be written as

$$\hat{P}(t)=\sum_{i=1}^{n} w_i\,\hat{P}_i(t),\qquad \sum_{i=1}^{n} w_i=1,$$

where $\hat{P}_i(t)$ is the forecast of the $i$th model, $w_i$ is its weight, and $n$ is the number of forecasting models. In this work, $n=4$, corresponding to the four first-stage forecasting models: ARIMA, ANN, ANFIS, and SVM.

The GA used in this work is a parallel, stochastic, and adaptive search algorithm based on natural selection. It evolves an entire population of candidate solutions simultaneously, and the initial population can be generated randomly. New populations are produced by selection, crossover, and mutation according to the fitness given by the objective function, which determines which individuals survive into the offspring and progressively improves the population over successive generations until the best solution is obtained. Figure 6 shows the flowchart of the proposed GA-based hybrid model, and Figure 7 shows the complete model block diagram.
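
A compact GA for the second-stage weights is sketched below in plain Python. It uses tournament selection, arithmetic crossover, Gaussian mutation, and elitism, and it assumes the weights are non-negative and sum to one. The operators, population size, rates, and synthetic first-stage forecasts are illustrative choices rather than the adaptive GA settings used in this work; fitness is the RMSE of the weighted forecast on a training window.

```python
import math
import random
from typing import List, Sequence


def combine(weights: Sequence[float], forecasts_t: Sequence[float]) -> float:
    """Hybrid forecast: weighted sum of the individual model forecasts."""
    return sum(w * f for w, f in zip(weights, forecasts_t))


def rmse(weights, forecasts, actual) -> float:
    """forecasts[t][i] is model i's forecast for hour t."""
    errors = [combine(weights, f) - a for f, a in zip(forecasts, actual)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def normalize(w: Sequence[float]) -> List[float]:
    """Map onto non-negative weights summing to one (an assumed constraint)."""
    clipped = [max(0.0, x) for x in w]
    total = sum(clipped)
    return [x / total for x in clipped] if total > 0 else [1.0 / len(w)] * len(w)


def ga_weights(forecasts, actual, pop_size=40, generations=100,
               mutation_rate=0.2, mutation_scale=0.1, seed=0):
    rng = random.Random(seed)
    n_models = len(forecasts[0])
    # Random initial population plus the equal-weight combination.
    pop = [normalize([rng.random() for _ in range(n_models)]) for _ in range(pop_size - 1)]
    pop.append([1.0 / n_models] * n_models)
    for _ in range(generations):
        scored = sorted(((rmse(w, forecasts, actual), w) for w in pop), key=lambda s: s[0])
        children = [w for _, w in scored[:2]]  # elitism: keep the two best
        while len(children) < pop_size:
            # Tournament selection: best of three random individuals.
            p1 = min(rng.sample(scored, 3), key=lambda s: s[0])[1]
            p2 = min(rng.sample(scored, 3), key=lambda s: s[0])[1]
            # Arithmetic crossover, then Gaussian mutation of individual genes.
            a = rng.random()
            child = [a * u + (1.0 - a) * v for u, v in zip(p1, p2)]
            child = [g + rng.gauss(0.0, mutation_scale) if rng.random() < mutation_rate else g
                     for g in child]
            children.append(normalize(child))
        pop = children
    best = min(pop, key=lambda w: rmse(w, forecasts, actual))
    return best, rmse(best, forecasts, actual)


# Hypothetical first-stage forecasts (ARIMA, ANN, ANFIS, LS-SVM) for 24 hours.
data_rng = random.Random(1)
actual = [max(0.0, math.sin(math.pi * (h - 6) / 12)) for h in range(24)]
forecasts = [[a + data_rng.gauss(0.0, s) for s in (0.15, 0.08, 0.05, 0.05)] for a in actual]
weights, err = ga_weights(forecasts, actual)
print("weights:", [round(w, 3) for w in weights], "RMSE:", round(err, 4))
```

Because the equal-weight combination is seeded into the initial population and the two best individuals always survive, the returned weights are never worse on the training window than simple averaging.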

Many other methods can be used to construct a combined forecasting model. For example, the immune algorithm (IA), which mimics how the immune system defends against bacteria, viruses, and other disease-causing organisms, could also be used; its coding structure is similar to that of a GA but adds diversity and affinity calculations. In this paper, the GA was used to combine the first-stage forecasting models because it is routinely used to generate useful solutions to optimization and search problems, including forecasting.

#### 5. Forecasting Results and Discussions

In this study, five prediction models, four individual models and one combination model, are used for 1-hour-ahead prediction of the power output of the three PV systems. The ARIMA model uses a single input variable, the PV power series itself, and the autocorrelation coefficients of the time series are used to identify the important historical lags. The ANN prediction model uses two input variables: PV power output at lags with high correlation coefficients and the forecast hourly solar irradiance. The LS-SVM and ANFIS prediction models use the previous entry of PV power output and the current entry of predicted solar irradiance. Although the LS-SVM and ANFIS models can use additional input variables, doing so increases the time needed for parameter estimation and prediction; this study therefore uses two input variables to reduce the time required for a prediction. The proposed hybrid prediction model uses the genetic algorithm, an evolutionary approach, to estimate the weight of each single prediction model; these weights are then used to predict PV power output for the next hour.
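
The input alignment described above for the LS-SVM and ANFIS models (previous measured power plus the irradiance forecast for the target hour) can be made explicit with a small helper. The function name and the sample values are hypothetical.

```python
from typing import List, Sequence, Tuple


def make_supervised(power: Sequence[float],
                    irradiance_forecast: Sequence[float]
                    ) -> Tuple[List[Tuple[float, float]], List[float]]:
    """Pair inputs (P(t-1), G_forecast(t)) with the target P(t).

    Only the previous *measured* power and the *forecast* irradiance for hour t
    are used, so no measurement from hour t itself leaks into the inputs.
    """
    if len(power) != len(irradiance_forecast):
        raise ValueError("series must be aligned hour by hour")
    X = [(power[t - 1], irradiance_forecast[t]) for t in range(1, len(power))]
    y = [power[t] for t in range(1, len(power))]
    return X, y


# Hypothetical hourly values (kW and W/m^2).
power = [0.0, 1.2, 3.5, 6.0]
g_forecast = [10.0, 130.0, 390.0, 650.0]
X, y = make_supervised(power, g_forecast)
print(X)
print(y)
```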

Figures 8, 9, and 10 present the prediction results of each model for the PV system installed at NPU. Because of space limitations, only one week of predicted power within the forecast month is shown. In this case, the ANFIS and LS-SVM predictions are the most accurate, and their predicted curves fit the actual power output curve well, although some peaks and turning points of the PV output are not predicted accurately. The ARIMA model generates relatively less accurate predictions; in particular, when the profile of the daily PV curve varies markedly, fitting it with a linear model is very difficult. According to the prediction results, the PV power series predicted by the ARIMA model normally follows the profile of the previous cycle. Values from the previous cycle, or even the previous two cycles, are therefore used as input variables for this model, so its predictions exhibit a memory effect from previous cycles. Figure 10 shows the prediction results of the hybrid model using the genetic algorithm. The predicted time series follows the actual data, although some discrepancies remain.

The performance of the proposed forecasting model must be evaluated. Several evaluation criteria exist for PV forecasting models, such as the mean absolute error (MAE) and the root mean square error (RMSE). In this work, the normalized root mean square error (NRMSE) was used because it allows comparison across PV systems with different installed capacities. It is defined as

$$\mathrm{NRMSE}=\frac{1}{C}\sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(P_{a,i}-P_{f,i}\right)^{2}},$$

where $C$, $P_{a,i}$, and $P_{f,i}$ denote the PV installed capacity, the actual PV power output, and the forecast PV power, respectively, and $N$ is the total number of samples. Table 3 summarizes the forecasting results for the three PV systems obtained with the different forecasting models. The proposed hybrid model yields the smallest forecasting error and thus outperforms the individual statistical and artificial intelligence methods.
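
The NRMSE definition translates directly into code. The sketch below returns a fraction of installed capacity (multiply by 100 for a percentage); the sample numbers are hypothetical, and all powers are assumed to share the unit of `capacity`.

```python
import math
from typing import Sequence


def nrmse(actual: Sequence[float], forecast: Sequence[float], capacity: float) -> float:
    """RMSE normalized by the installed capacity C."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("need two non-empty series of equal length")
    if capacity <= 0:
        raise ValueError("installed capacity must be positive")
    mse = sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)
    return math.sqrt(mse) / capacity


# Hypothetical 4-hour example for a 10 kW system.
print(f"{100 * nrmse([2.0, 5.5, 7.1, 4.0], [2.4, 5.0, 7.5, 3.6], capacity=10.0):.2f} %")
```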

Figure 11 compares the predicted hourly power output of the PV system at the Taichung Thermal Power Plant with the actual data for one week of the forecast month. Although the actual PV power output varied greatly with the weather conditions, the hybrid prediction model still provides a considerably more accurate 1-hour-ahead prediction of power output than the individual models.

#### 6. Conclusions

As the scale and capacity of PV power systems continue to expand, PV power prediction techniques can reduce the impact of the randomness of PV power output. In this study, five prediction models were applied to short-term prediction of the power output of three PV systems in Taiwan and Malaysia. To select input variables, correlation analyses of the related external variables were carried out, and historical power output data and solar irradiance were used as input variables by the prediction models. The results demonstrate that the hybrid prediction model generates the most accurate predictions in most cases; however, additional accurate data are still needed to handle large variations in the PV power output time series. Specifically, each forecast day should first be classified according to weather forecast data, which indicate the expected weather pattern for that day, and the amount of cloud cover should also be an input variable of the PV prediction model. However, this requires long-term and precise observational records, and obvious outliers should be excluded before analysis. Furthermore, for yearly prediction models, the date and time should also be considered as input variables.

#### Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

#### Acknowledgments

This work was partially supported by the National Science Council (NSC) of Taiwan under Grant no. NSC 102-2221-E-194-032-.