Abstract

Solar energy is a free and readily available form of energy that has been shown to be one of the cleanest and most plentiful renewable energy sources. Large-scale solar photovoltaic (PV) facilities are being used in many nations to reduce the pollution and carbon emissions generated by fossil energy. The PV power sequence is influenced by many variables and is highly unpredictable and volatile. Unlike distributed PV, centralized PV shares the same location and irradiance intensity. The obstruction of clouds causes minor variations in the output power of the PV, making power forecasting more difficult. To address these difficulties, this article proposes a new neural network-based technique for PV power optimization and forecasting. First, a cloud trajectory tracking system is created based on cloud photos taken from the ground. Second, an irradiance coefficient prediction model based on cloud trajectory tracking is built. Third, an error correction model is built to increase forecast accuracy. For verification, data from a centralized solar power station are used. The results show that the proposed algorithm is practically applicable and can greatly improve prediction accuracy.

1. Introduction

As solar energy development and utilization continue to advance, the proportion of photovoltaics connected to the grid is increasing. Because the output of large-scale photovoltaics is unstable, grid connection readily causes fluctuations in grid voltage, current, and frequency, affecting the power quality of the grid [1]. To eliminate these adverse effects, it is particularly important to improve the prediction accuracy of photovoltaic power. Accurate centralized photovoltaic power prediction is an important means of improving the operation stability of the power system and its photovoltaic power absorption capacity [2]. The power generation of centralized photovoltaic power plants shows strong volatility and randomness due to the influence of many meteorological and environmental factors [3]. The distribution and motion of clouds and the differences in radiation attenuation between clouds with different characteristics are strongly correlated factors that cause the uncertainty of photovoltaic power changes [4].

Because moving cloud clusters shade solar radiation, photovoltaic power generation fluctuates rapidly and violently within minutes, which has a large impact on the stability of the power grid [5]. The traditional photovoltaic power prediction model, based on the historical power data of photovoltaic power plants and numerical weather forecasts, is restricted by its algorithm principle and data accuracy, and it is difficult for it to accurately predict the minute-level power fluctuation caused by cloud movement. Therefore, beyond the traditional "ultra-short-term" forecast, it is necessary to find a photovoltaic power forecast method suitable for "minute-level" weather changes. In particular, under certain weather conditions such as cloudy skies, the surface irradiance level is affected by moving cloud clusters and fluctuates wildly on a minute-level time scale. At such times, there is almost no correlation between irradiance fluctuations and historical irradiance data [6]. These phenomena pose challenges to minute-level meteorological feature extraction and prediction.

The all-sky imager is currently the mainstream equipment used in minute-level photovoltaic forecasting. Scholars at home and abroad use it to photograph cloud clusters in the sky and obtain intuitive cloud cluster characteristics, on the basis of which the ultra-short-term photovoltaic power is predicted [7, 8]. Reference [9] extracts atmospheric radiation data through ground-based cloud image processing and uses a radial basis function neural network to predict irradiance, but it lacks refined extraction and description of cloud clusters and their motion and ignores influence factors such as air temperature and aerosol concentration, resulting in large prediction errors. Reference [10] uses a prediction method combining numerical weather forecasting and ground-based cloud image processing, which improves the photovoltaic prediction accuracy to a certain extent but ignores the sudden change of photovoltaic power that may be caused by moving cloud clusters, so the power prediction accuracy needs to be improved. Reference [11] established a mixed mapping model of surface illumination based on a deep learning method through cluster analysis of sky image data. However, this method lacks an early warning mechanism for sudden changes in irradiance, resulting in low prediction accuracy when irradiance is variable. To sum up, photovoltaic power prediction based on ground-based cloud images has become a research hotspot at home and abroad in recent years, but there is still room for improvement. In particular, refined extraction and description of moving cloud clusters and the establishment of an irradiance mutation prediction mechanism can improve the ultra-short-term photovoltaic power prediction accuracy. In terms of prediction methods, artificial neural network algorithms have been studied and practiced in the field of photovoltaic power prediction [12, 13]. Reference [14] proposed an ultra-short-term photovoltaic power prediction method based on variational modal decomposition combined with a deep echo state network hybrid model, but its prediction steps and network complexity are too high, so its prediction efficiency on multifeature sample sets and in different scenarios needs improvement. In addition, this method lacks refined error classification and correction strategies, and the prediction accuracy needs to be improved. Reference [15] proposed a hybrid model short-term load prediction method based on a convolutional neural network (CNN) and a long short-term memory network (LSTM), which improved the prediction accuracy to a certain extent. However, this method simply combines the CNN and the LSTM, which destroys the temporal structure of the feature matrix, weakens the inherent correlation between the time series of each feature, and leads to poor overall prediction accuracy.

To address the above problems, this paper proposes an efficient power forecasting algorithm based on cloud image feature extraction. First, the trajectory of the cloud cluster is tracked and the irradiance coefficient is predicted by extracting and matching the features of the color cloud image. Second, the IAM-CNN-LSTM hybrid neural network produces a preliminary prediction. Finally, a classification error correction model based on output volatility identification is applied to obtain the final prediction result. The numerical example shows that the proposed technique can increase the prediction accuracy of solar power under a variety of weather conditions.

The remainder of the paper is organized as follows. Section 2 discusses cloud image feature extraction and correction. Section 3 explains the cloud trajectory tracking model. Section 4 discusses the proposed algorithm. Section 5 elaborates the centralized PV power prediction process. Section 6 evaluates the case study experimentally, and Section 7 concludes the paper.

2. Cloud Image Feature Extraction and Description

2.1. Distortion Correction of Ground-Based Cloud Images

The ground-based cloud image can effectively reflect the characteristics of moving cloud clusters in the whole sky. In this paper, the SRF-02 all-sky imager with no shielding arm is used to capture 180° wide-angle cloud images through a fisheye lens, and the images are stored in JPEG format. The image size is (pixels), the effective sky part accounts for (pixels), and the coverage is an area with a radius of 5 km, which is suitable for most centralized photovoltaic power plants. Since the ground-based cloud image is captured over the whole sky by the fisheye camera, the sky image is distorted, which makes it difficult to describe the cloud movement quantitatively [16].

The traditional latitude and longitude fisheye correction algorithm [17] is widely used because it does not require external equipment calibration, and its process is shown in Figure 1.

The comparison of fisheye cloud images before and after distortion correction is shown in Figure 2.

2.2. Feature Point Extraction Method of Cloud Image

In order to improve the prediction accuracy of photovoltaic power under complex and sudden weather, the meteorological information needs to be as complete and as close to real time as possible. On the "minute-level" time scale, the cloud shape and local characteristics of moving cloud clusters do not change greatly, so the cloud clusters in cloud images at adjacent time points usually have highly similar local features. Therefore, a feature point extraction algorithm can be used to locate cloud clusters quickly and precisely [18].

In order to meet the rapidity requirement of minute-level power prediction, this paper conducts improvement research on the basis of a fast improved algorithm of Scale Invariant Feature Transform (SIFT), namely, the Speeded Up Robust Feature (SURF) algorithm. The traditional SURF algorithm converts the color information of the image into a grayscale image for feature point detection and extraction, but the extracted features are not enough to describe all the information of the feature point. In order to take into account the speed and efficiency of image processing algorithms, this paper proposes a SURF algorithm (denoted as HSV-SURF) that considers HSV color space. The HSV-SURF algorithm consists of three parts: feature point extraction, feature point color space conversion, and feature point description.

2.2.1. Extraction of Feature Points

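The SURF algorithm uses the Hessian matrix to extract feature points. The Hessian matrix is the core of the SURF algorithm. For a function $f(x, y)$, it is

$$H(f(x, y)) = \begin{bmatrix} \dfrac{\partial^2 f}{\partial x^2} & \dfrac{\partial^2 f}{\partial x \partial y} \\ \dfrac{\partial^2 f}{\partial x \partial y} & \dfrac{\partial^2 f}{\partial y^2} \end{bmatrix}$$

The discriminant of the Hessian matrix is

$$\det(H) = \frac{\partial^2 f}{\partial x^2} \frac{\partial^2 f}{\partial y^2} - \left( \frac{\partial^2 f}{\partial x \partial y} \right)^2$$

In the formula, $H$ is the Hessian matrix, and det(·) is the determinant of a matrix; that is, the value of the above discriminant is the product of the eigenvalues of the Hessian matrix. The sign of the discriminant is used to judge whether the point is an extreme point. In the SURF algorithm, the image pixel $I(x, y)$ is used in place of $f(x, y)$, and the Hessian matrix of the pixel point $\mathbf{x} = (x, y)$ at scale $\sigma$ can be defined as

$$H(\mathbf{x}, \sigma) = \begin{bmatrix} L_{xx}(\mathbf{x}, \sigma) & L_{xy}(\mathbf{x}, \sigma) \\ L_{xy}(\mathbf{x}, \sigma) & L_{yy}(\mathbf{x}, \sigma) \end{bmatrix}$$

$L(\mathbf{x}, \sigma)$ is the representation of the image at different resolutions, which is obtained by convolving a Gaussian kernel with the pixel $I(\mathbf{x})$; $g(\sigma)$ is a Gaussian function. Taking $L_{xx}(\cdot)$ and $L_{xy}(\cdot)$ as examples, the calculation is as follows ($*$ represents the convolution operation):

$$L_{xx}(\mathbf{x}, \sigma) = \frac{\partial^2 g(\sigma)}{\partial x^2} * I(\mathbf{x}), \qquad L_{xy}(\mathbf{x}, \sigma) = \frac{\partial^2 g(\sigma)}{\partial x \partial y} * I(\mathbf{x})$$

Since the Gaussian kernel follows a normal distribution, its coefficients decrease with distance from the center point. To improve the operation speed, SURF replaces the Gaussian filter with an approximating box filter. Constructing this fast Hessian gives an approximation of the determinant of the Hessian for each pixel:

$$\det(H_{\mathrm{approx}}) = D_{xx} D_{yy} - (0.9 D_{xy})^2$$

In the formula, $D_{xx}$, $D_{yy}$, and $D_{xy}$ are the convolutions of the box filters with the pixel values at this point, which replace $L_{xx}$, $L_{yy}$, and $L_{xy}$, respectively.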

As shown in Figure 3, taking the filter as an example, if the feature value of the pixel marked in red in Figure 3 is greater than the surrounding pixels, it can be regarded as the feature point of the region. After the above operations, the characteristic points of cloud clusters can be quickly and accurately extracted, which is helpful for the prediction of subsequent cloud clusters’ trajectory and irradiance coefficient and ensures the rapidity and accuracy of ultra-short-term photovoltaic prediction.
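The response computation and neighborhood comparison described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes the box-filter responses are already available, uses the 0.9 weighting from the standard SURF formulation, and compares each pixel only with its eight in-plane neighbours (a full SURF implementation also compares across adjacent scales). The function names are hypothetical.

```python
def hessian_response(dxx, dyy, dxy, weight=0.9):
    """Approximate Hessian determinant from box-filter responses."""
    return dxx * dyy - (weight * dxy) ** 2


def local_maxima(response, threshold=0.0):
    """Return (row, col) of pixels whose response exceeds all 8 neighbours.

    `response` is a 2-D list of Hessian responses; border pixels are skipped.
    """
    rows, cols = len(response), len(response[0])
    peaks = []
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            centre = response[r][c]
            if centre <= threshold:
                continue
            neighbours = [
                response[r + dr][c + dc]
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
                if (dr, dc) != (0, 0)
            ]
            # Keep the pixel only if it is a strict maximum of its neighbourhood.
            if all(centre > n for n in neighbours):
                peaks.append((r, c))
    return peaks
```

The threshold lets weak responses be discarded before the neighbourhood comparison, which keeps the number of candidate feature points small.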

2.2.2. Feature Point Color Space Conversion

The commonly used RGB color space is a standard color model in the industry. This color space is suitable for computer display systems, but it is not suitable for cloud image processing because external brightness easily affects the three components of red (R), green (G), and blue (B). Compared with the RGB color space, the HSV color space intuitively expresses the brightness, hue, and vividness of colors, which makes it convenient to contrast colors.

The HSV model has three parameters that reflect color information: hue (Hue), saturation (Saturation), and brightness (Value). Each parameter independently represents an image attribute.

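The specific steps and calculation methods for converting RGB space to HSV space are as follows:
(1) First, normalize the $R$, $G$, and $B$ values to $[0, 1]$; find the maximum value $\max(R, G, B)$ and the minimum value $\min(R, G, B)$; and calculate the brightness value as follows:

$$V = \max(R, G, B)$$

(2) If the maximum value is equal to the minimum value, namely, $R = G = B$, the point is gray and the saturation is 0:

$$S = 0$$

(3) If the maximum value and the minimum value are not equal, the saturation is calculated according to the brightness, and the hue is calculated according to which component is the maximum:

$$S = \frac{V - \min(R, G, B)}{V}$$

$$H = \begin{cases} 60 (G - B) / (V - \min(R, G, B)), & V = R \\ 120 + 60 (B - R) / (V - \min(R, G, B)), & V = G \\ 240 + 60 (R - G) / (V - \min(R, G, B)), & V = B \end{cases}$$

If $H < 0$, then $H = H + 360$, satisfying $0 \le H \le 360$, $0 \le S \le 1$, and $0 \le V \le 1$.

Finally, it is necessary to convert the values of the 3 parameters of HSV into the proper data types (the cloud image is set to an 8-bit image), and the conversion method is as follows: $V = 255V$, $S = 255S$, and $H = H/2$.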
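The conversion steps above can be sketched for a single pixel as follows. This is a minimal illustration assuming the standard RGB-to-HSV formulas with the common 8-bit scaling (hue halved so that it fits in one byte); the function name is hypothetical, and setting the hue of a gray pixel to 0 is an assumption.

```python
def rgb_to_hsv_8bit(r, g, b):
    """Convert one 8-bit RGB pixel to 8-bit HSV (H in 0..180, S and V in 0..255)."""
    # Step 1: normalise to [0, 1] and take the brightness as the maximum.
    rn, gn, bn = r / 255.0, g / 255.0, b / 255.0
    v = max(rn, gn, bn)
    delta = v - min(rn, gn, bn)

    if delta == 0:
        # Step 2: gray pixel, saturation is 0 (hue set to 0 by assumption).
        s, h = 0.0, 0.0
    else:
        # Step 3: saturation from brightness, hue from the dominant channel.
        s = delta / v
        if v == rn:
            h = 60.0 * (gn - bn) / delta
        elif v == gn:
            h = 120.0 + 60.0 * (bn - rn) / delta
        else:
            h = 240.0 + 60.0 * (rn - gn) / delta
        if h < 0:
            h += 360.0

    # Final step: rescale to 8-bit storage.
    return round(h / 2.0), round(255.0 * s), round(255.0 * v)
```

For example, a pure blue pixel `(0, 0, 255)` has a hue of 240 degrees, which is stored as 120 after halving.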

2.2.3. Description of Feature Points

The HSV-SURF algorithm describes each feature point by combining the Haar wavelet features collected in the circular neighborhood of the feature point with the HSV color information. The center point of Figure 4(a) is the position of the current feature point. Each small cell represents a pixel in the scale space where the feature point neighborhood is located. The direction of the arrow represents the gradient direction of the pixel, and the length of the arrow represents the gradient magnitude, which is weighted with a Gaussian window (blue area in Figure 4). The accumulated value of each gradient direction is plotted on each small block to form a seed point; as shown in Figure 4(b), each feature point consists of 4 seed points, and each seed point consists of 8 direction vectors.

The original SURF feature point descriptor is a 64-dimensional vector, and now, the three parameters of the HSV color space are added to it to obtain a new feature point descriptor, as shown below:

$$D = [d_1, d_2, \ldots, d_{64}, H, S, V]$$

In the formula, $d_1, d_2, \ldots, d_{64}$ are the characteristic factors of Haar wavelet feature extraction in the neighborhood of feature points. According to Equation (9), the feature point descriptor now includes the color information of the image feature point itself. In terms of calculation time, expanding the descriptor from a 64-dimensional vector to a 67-dimensional vector has little impact on the operation time. In terms of performance, once the color information is considered, the feature points carry more detailed information, and the matching accuracy improves to a certain extent.
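A minimal sketch of assembling the 67-dimensional descriptor described above is given below. The function name is hypothetical, and the scaling of the color components to [0, 1] is an assumption that is not stated in the text: it is included so that 8-bit color values do not dominate the much smaller Haar wavelet components when Euclidean distances are computed.

```python
def extend_descriptor(surf_descriptor, h, s, v):
    """Append HSV colour information to a 64-dimensional SURF descriptor.

    h is expected in 0..180 and s, v in 0..255 (8-bit HSV storage).
    """
    if len(surf_descriptor) != 64:
        raise ValueError("expected a 64-dimensional SURF descriptor")
    # Assumption: rescale colour components to [0, 1] before concatenation.
    colour = [h / 180.0, s / 255.0, v / 255.0]
    return list(surf_descriptor) + colour
```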

The final SURF feature point extraction result is shown in Figure 5. The black circle center represents the center position of the extracted feature points, and the neighborhoods of different sizes (circular areas corresponding to the black feature points) are the feature point description areas obtained by the feature points enlarging in equal proportions.

3. Cloud Trajectory Tracking and Cloud Cluster Extraction

3.1. Cloud Trajectory Tracking Model

The HSV-SURF algorithm extracts several feature descriptors. The corresponding feature points must then be matched and their coordinate changes compared to obtain the cloud movement. To match feature points both quickly and accurately, this paper proposes a matching method based on the Fast Library for Approximate Nearest Neighbors (FLANN) and an improved Random Sample Consensus (IRANSAC) algorithm to match and correct cloud feature points.

The feature space of the FLANN algorithm model is generally an $n$-dimensional real vector space $\mathbb{R}^n$. The core of the algorithm is to find the point closest to the instance point by using the Euclidean distance $D(x, y)$, which is calculated as follows:

$$D(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$

The data points are divided into specific parts in the $n$-dimensional space by the random k-d forest algorithm. After the data in the vector space are stored in the random k-d forest structure, the point closest to the reference point can be searched efficiently. The whole search is a top-down recursive process in the random k-d forest.
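The nearest-neighbor search by Euclidean distance can be illustrated with the sketch below. It deliberately uses an exhaustive search instead of the random k-d forest, so it returns exact rather than approximate neighbors and is only meant to show the matching criterion. The nearest-to-second-nearest ratio test and the `max_ratio` value are assumptions added for illustration, and the function names are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two descriptors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match_descriptors(query, train, max_ratio=0.7):
    """Match each query descriptor to its nearest train descriptor.

    Returns a list of (query_index, train_index, distance). A match is kept
    only if the nearest distance is clearly smaller than the second nearest.
    """
    matches = []
    for qi, q in enumerate(query):
        ranked = sorted((euclidean(q, t), ti) for ti, t in enumerate(train))
        if len(ranked) < 2:
            continue
        (d1, best), (d2, _) = ranked[0], ranked[1]
        if d2 > 0 and d1 / d2 < max_ratio:
            matches.append((qi, best, d1))
    return matches
```

The exhaustive search costs time proportional to the product of the two descriptor counts, which is the cost the k-d forest structure is designed to avoid.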

To overcome the long running time of the traditional RANSAC algorithm when the number of matching points is large and the mismatch rate is high, this paper proposes a modified RANSAC (IRANSAC) algorithm. The IRANSAC algorithm finds an optimal parameter matrix that maximizes the number of feature points satisfying the following matrix: where represents the position of the feature point in the cloud image at time and represents the position of the feature point in the cloud image at time . The specific steps of the IRANSAC algorithm are shown in Algorithm 1.

(1) The feature point matching pairs whose number of intersection points with other matching pairs is greater than 3 are eliminated from the matching pair set.
(2) Sort the remaining feature point matching pairs in an ascending order according to the ratio to form a feature point matching pair set . Among them, is the minimum value of the Euclidean distance in the feature point matching pair; is the Euclidean distance of the remaining point pairs.
(3) Divide into four parts, , and in sequence; extract five sample data from in turn; and solve the transformation matrix .
(4) If the error between the remaining matching point pairs in and is less than the agreed threshold (take 0.6), it is a consistent set with the sample drawn.
(5) Use the obtained transformation matrix to traverse the corresponding matching points in , calculate the proportion of matching point pairs in the set that satisfy the transformation matrix under the threshold, repeat the above steps, select the transformation matrix with the largest , and record it as as the final transformation matrix.
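Steps (2) to (5) of Algorithm 1 can be sketched in simplified form as follows. Several assumptions are made for illustration: the transformation is reduced to a pure translation instead of the paper's transformation matrix, the ratio used for sorting is supplied with each match, the five samples are taken as the first five pairs of each part, and the crossing-based elimination of step (1) is omitted. All function names are hypothetical.

```python
import math


def estimate_translation(pairs):
    """Average displacement of the sampled point pairs (translation-only model)."""
    n = len(pairs)
    dx = sum(q[0] - p[0] for p, q in pairs) / n
    dy = sum(q[1] - p[1] for p, q in pairs) / n
    return dx, dy


def inlier_ratio(pairs, shift, threshold):
    """Proportion of pairs whose reprojection error is below the threshold."""
    dx, dy = shift
    hits = sum(
        1
        for (px, py), (qx, qy) in pairs
        if math.hypot(qx - (px + dx), qy - (py + dy)) < threshold
    )
    return hits / len(pairs)


def iransac_translation(matches, threshold=0.6, sample_size=5):
    """matches: non-empty list of (ratio, point_at_t, point_at_next_t)."""
    if not matches:
        raise ValueError("at least one match is required")
    # Step (2): sort matching pairs by ratio in ascending order.
    ordered = sorted(matches, key=lambda m: m[0])
    pairs = [(p, q) for _, p, q in ordered]
    # Step (3): divide the sorted set into four parts and sample each in turn.
    quarter = max(1, math.ceil(len(pairs) / 4))
    parts = [pairs[i:i + quarter] for i in range(0, len(pairs), quarter)]
    best_shift, best_ratio = None, -1.0
    for part in parts:
        shift = estimate_translation(part[:sample_size])
        # Steps (4)-(5): score the model on all pairs and keep the best one.
        ratio = inlier_ratio(pairs, shift, threshold)
        if ratio > best_ratio:
            best_shift, best_ratio = shift, ratio
    return best_shift, best_ratio
```

Sampling from quality-ordered parts rather than uniformly at random is what reduces the number of iterations: the best-ranked part is likely to contain mostly correct matches, so a good model is usually found on the first attempt.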

Using the IRANSAC algorithm to filter the feature points of the cloud image, the feature point set for further eliminating the bad points is obtained. Figures 6 and 7, respectively, show the matching of feature points before and after IRANSAC point pairing.

Based on the spatial scale invariance of the image before and after the time, the four coordinate points of the cloud map at time are analyzed through the transformation matrix, and the numerical changes of the corresponding coordinates of the cloud map at time are analyzed, and the movement direction and speed of the cloud group from time to are extracted. Due to the relatively stable air flow movement at the high altitude where the cloud cluster is located, the speed and direction of the cloud cluster are not prone to sudden changes in a short period of time under normal weather. In this paper, the cloud map at the current time and the cloud maps at the five time nodes before the current time are used to predict the time . The cloud movement speed is as follows:

In the formula, is the influence factor of the speed on the prespeed at the time of .
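As a rough illustration of predicting the next cloud movement speed from recent history, the sketch below combines the most recent velocity estimates with influence factors. The normalized weighted-average form, the function name, and the example weights are assumptions made for illustration; they are not taken from the paper's formula.

```python
def predict_cloud_velocity(velocities, weights):
    """Weighted average of recent (vx, vy) estimates, oldest first.

    `weights` are the influence factors of each past speed on the prediction;
    larger values for recent entries emphasise the latest motion.
    """
    if not velocities or len(velocities) != len(weights):
        raise ValueError("velocities and weights must be non-empty and equal in length")
    total = sum(weights)
    vx = sum(w * v[0] for w, v in zip(weights, velocities)) / total
    vy = sum(w * v[1] for w, v in zip(weights, velocities)) / total
    return vx, vy
```

With five past estimates and weights such as `[1, 2, 3, 4, 5]`, steady motion is reproduced unchanged while a recent change in speed pulls the prediction toward the latest values.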

To sum up, the cloud trajectory tracking model based on FLANN-IRANSAC can effectively reduce the feature point pair mismatch rate and algorithm time consumption and improve the description and prediction accuracy of cloud trajectory, which directly affects the irradiance coefficient prediction accuracy, thereby improving photovoltaic power prediction accuracy.

3.2. Irradiance Coefficient Prediction Method considering Cloud Extraction and Trajectory Prediction
3.2.1. Cloud Extraction Model Based on Improved Threshold Segmentation Method

The thickness of cloud clusters has a significant effect on the transmittance of solar radiation, and cloud clusters with different thicknesses in ground-based cloud images have significant differences in their corresponding pixel gray values [19]. In order to more accurately identify and extract cloud clusters and further improve the estimation accuracy of ground radiation level, this paper proposes an improved threshold segmentation method, which analyzes cloud images with the help of the grayscale of historical clear sky images that match the current clear sky pixels. The judgment is based on the following:

In the formula:, is the average grayscale value of the historical clear sky image after removing the sun and its surrounding areas that are difficult to distinguish; is the grayscale mean value of the background clear sky area at time .

On the basis of checking the cloud clusters, thin clouds and thick clouds are further extracted. Thick cloud thresholds and thin cloud thresholds can be inferred by the maximum interclass variance (Otsu) method. The method flow is shown in Figure 8.

The specific steps are as follows:
(1) Set the part of the area surrounding the sun that is higher than the set grayscale threshold as the extraction error area, and replace it with black pixels to improve the subsequent extraction accuracy.
(2) Calculate the gray value of each pixel point, compare it with the gray value of the best-matching clear sky image in the clear sky image set, and calculate the difference value. Compare the grayscale difference of the two with the set thick cloud threshold to distinguish thick cloud pixels.
(3) For the remaining pixels, introduce a correction factor to improve the accuracy of cloud cluster distinction and further reduce the confounding effect of atmospheric aerosols on thin cloud pixels.
(4) Compare the corrected grayscale difference with the set thin cloud threshold to distinguish the thin cloud pixel points.
The grayscale distribution of the cloud image and the thin and thick cloud thresholds are shown in Figure 9.
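Steps (2) to (4) above can be sketched as a per-pixel classification. This is an illustration under stated assumptions: the grayscale difference is taken as an absolute difference, the correction factor is applied multiplicatively, and both thresholds are passed in directly rather than derived with the Otsu method. The sun-masking step (1) is omitted, and all names are hypothetical.

```python
CLEAR, THIN, THICK = 0, 1, 2


def classify_pixel(gray, clear_gray, thick_threshold, thin_threshold, correction=1.0):
    """Label one pixel by its grayscale difference from the clear-sky reference."""
    diff = abs(gray - clear_gray)
    # Step (2): a large difference indicates thick cloud.
    if diff >= thick_threshold:
        return THICK
    # Steps (3)-(4): apply the correction factor, then test the thin threshold.
    if diff * correction >= thin_threshold:
        return THIN
    return CLEAR


def segment_cloud(image, clear_image, thick_threshold, thin_threshold, correction=1.0):
    """Apply the classification to every pixel of a 2-D grayscale image."""
    return [
        [
            classify_pixel(g, c, thick_threshold, thin_threshold, correction)
            for g, c in zip(row, clear_row)
        ]
        for row, clear_row in zip(image, clear_image)
    ]
```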

3.2.2. Irradiance Coefficient Prediction Method considering Cloud Trajectory Tracking

The cloud cover obtained by the above model can only directly reflect the irradiance level under a specific clear sky background type, because the same cloud layer distribution will cause different irradiance levels under different atmospheric aerosol concentration weather types [20]. Therefore, it is difficult to accurately reflect the solar irradiance under variable meteorological conditions only by refining cloud cover. Aiming at the above problems, this paper proposes the irradiance coefficient obtained by predicting the cloud movement trajectory in advance and transforming it from the cloud image pixel matrix. Intuitively, the irradiance coefficient is the minute-level irradiance level predicted value derived from the real-time cloud map to represent the irradiance level under the cloud layer distribution at the next moment.

After the cloud image is analyzed by the HSV color model, the value representing the brightness of the image is extracted as a direct indicator of the brightness of the sky. The extraction steps of the irradiance coefficient are as follows:
(1) First, obtain the grayscale image distinguishing thin and thick clouds with the improved threshold segmentation model, set the transmission coefficients of thin clouds and thick clouds, respectively, and establish a matrix () with the same pixel dimension as the picture, where and are the pixel sizes of the original cloud image, respectively. Assign the transmittance coefficients of thin cloud and thick cloud to the positions of the thin cloud and thick cloud pixel points, respectively, and assign 1 to the positions of the clear sky pixel points to obtain the cloud image transmittance coefficient matrix at time .
(2) Similarly, the cloud image transmission coefficient matrix at time can be obtained.
(3) Extract the value of the optimal matching clear sky image pixel point at the current moment, and form the target matrix , whose order is the cloud image pixel ().
(4) Build the irradiance coefficient estimation model:

In the formula: is the value of the coordinate pixel point in the matrix ; is the clear sky correction coefficient, which is the ratio of the average value of the clear sky pixel at time to the average value of the best matching clear sky map.
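The construction of the transmittance coefficient matrix in step (1) and one possible aggregation into a single coefficient are sketched below. The aggregation shown, the clear sky correction coefficient multiplied by the mean of the element-wise product of transmittance and clear-sky brightness, is an assumed form chosen for illustration and is not necessarily the paper's estimation model. The label values, the example transmission coefficients, and the function names are likewise hypothetical.

```python
def transmittance_matrix(labels, thin_coeff, thick_coeff):
    """Map pixel labels (0 clear, 1 thin cloud, 2 thick cloud) to transmittance."""
    lookup = {0: 1.0, 1: thin_coeff, 2: thick_coeff}
    return [[lookup[label] for label in row] for row in labels]


def irradiance_coefficient(transmittance, clear_v, clear_sky_correction=1.0):
    """Assumed form: correction * mean(transmittance * clear-sky brightness)."""
    total, count = 0.0, 0
    for t_row, v_row in zip(transmittance, clear_v):
        for t, v in zip(t_row, v_row):
            total += t * v
            count += 1
    return clear_sky_correction * total / count
```

To predict the coefficient for the next moment, the transmittance matrix would first be shifted along the predicted cloud trajectory before the aggregation is applied.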

In engineering practice, the prediction of real-time solar irradiance by traditional numerical weather forecast has the problems of large interval between forecast time points and poor prediction accuracy, and it is difficult to be accurate to the minute-level change of irradiance. The purpose of introducing the irradiance coefficient is to use the advanced irradiance prediction information extracted from the ground cloud image level as one of the input characteristic factors at the prediction time and to improve the adaptability of the photovoltaic prediction model to complex weather conditions.

4. IAM-CNN-LSTM Hybrid Neural Network

The CNN-LSTM hybrid model operates on a time series matrix composed of relatively independent feature sequences and makes full use of the CNN to extract local correlation features in the data space. Since the features used to predict photovoltaic power (temperature, irradiance, irradiance coefficient, cloud cover, etc.) are relatively independent feature time series, it is difficult to describe the intrinsic relationship between them with a CNN or an LSTM alone: the correlation characteristics between sequences and the long-term regularity of the feature time series cannot be extracted at the same time [21, 22]. The traditional CNN-LSTM network simply splices the CNN and the LSTM together, which tends to destroy the correlation between time series. Therefore, the traditional CNN-LSTM needs to be improved to eliminate these drawbacks. This paper proposes an improved CNN-LSTM neural network algorithm combined with an improved attention mechanism. The main advantage of this algorithm is that an improved attention layer is added between the CNN and the LSTM layer to adjust the network's attention to different features. In addition, the Dropout algorithm with a pruning strategy is used to suppress overfitting, and an error correction model based on fluctuation classification is proposed.

As shown in Figure 10, this paper combines the feature sequences at a given time and the photovoltaic power at that time into a feature vector describing the photovoltaic output at that time and uses a sliding window at fixed time intervals to obtain the input feature matrix [23]. Let the time window width be 5, the step size be 1, and the feature vector dimension be $n$, so the input is a feature matrix with 5 rows and $n$ columns, and the input feature matrix shifts along with the moment to be predicted. Then, the feature vector extracted by the CNN is input into the multilayer LSTM recurrent neural network, and the preliminary power prediction value is finally obtained.
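The sliding-window construction of the input feature matrices can be sketched as follows, using the window width of 5 and step size of 1 given above. The function name is hypothetical, and each feature vector is represented as a plain list.

```python
def sliding_windows(feature_vectors, width=5, step=1):
    """Cut a time-ordered list of feature vectors into overlapping windows.

    Each window is a list of `width` consecutive feature vectors, that is, a
    feature matrix with `width` rows and one column per feature.
    """
    last_start = len(feature_vectors) - width
    return [
        feature_vectors[start:start + width]
        for start in range(0, last_start + 1, step)
    ]
```

A series of 7 time steps yields 3 windows with this setting; a series shorter than the window width yields none.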

In order to improve the prediction performance of the traditional CNN-LSTM algorithm, this paper makes the following improvements.

4.1. Improved Attention Mechanism

Due to the large number of features in the input photovoltaic prediction model, in order to highlight more critical factors and improve the accuracy of the model, this paper proposes an improved attention mechanism (IAM) based on the Competitive Random Search (CRS) algorithm [24, 25]. It makes up for the deficiency that the network pays attention to the characteristics of different related factors on the same time scale and improves the accuracy of photovoltaic prediction models through the formulation of differentiated weight distributions.

CRS generates optimal parameter combinations in the attention layer. Figure 11 introduces the operation process of CRS, which consists of four parts “I, II, III, and IV.”

“I” provides the weights of the attention layer, then converted to binary code in “II,” and the subset is the attention weights, which are transmitted to the LSTM neural network, where corresponding prediction errors are generated in the network loss value. Next, select the optimal attention weight subset according to the loss of in “III” and , and repeat the cycle of their subset combinations. Finally, a new attention weight is reconstructed in “IV.”

4.2. Overfitting Method

In an artificial neural network model, when weight learning runs for too many iterations, overfitting readily occurs because the model fits the noise and irrelevant features of the training data. Recent research [26] shows that traditional Dropout, which selects neurons at random, significantly increases the computational load and running time of the algorithm and greatly affects the speed of the prediction algorithm.

The Targeted Dropout algorithm ranks weights or neurons according to a fast approximate measure of weight importance and applies Dropout to the less important elements [27]. The specific implementation method of the Targeted Dropout algorithm is as follows:
(1) Pruning operation: for a neural network parameterized by $\theta$, it is hoped to find the optimal parameter $\theta^*$ so that the loss function $E(W(\theta^*))$ is as small as possible while only the $k$ weights of highest magnitude in the neural network are retained. Pruning follows the methods of weight pruning and unit pruning, whose operations are as follows:

$$W(\theta) = \left\{ \operatorname{argmax-}k_{W_{io}} \, |W_{io}| \;\middle|\; 1 \le i \le N_{\mathrm{row}}(W),\ 1 \le o \le N_{\mathrm{col}}(W),\ W \in \theta \right\}$$

$$W(\theta) = \left\{ \operatorname{argmax-}k_{w_o} \, \lVert w_o \rVert_2 \;\middle|\; 1 \le o \le N_{\mathrm{col}}(W),\ W \in \theta \right\}$$

where $E$ is the network loss function; $W$ is the neural network model parameter matrix; argmax-$k$ is a function that returns the largest $k$ elements among all elements; $w_o$ is the column vector of the $o$th column of the weight matrix $W$; $W_{io}$ is the element in row $i$ and column $o$ of the weight matrix; and $N_{\mathrm{col}}$ and $N_{\mathrm{row}}$ represent the number of columns and rows of the parameter matrix, respectively.
(2) Introducing randomness: randomness is introduced into this process through the targeting ratio $\gamma$ and the deletion probability $\alpha$. The targeting ratio indicates that the smallest $\gamma|\theta|$ weights are selected as the candidate weights for Dropout, and the weights in the candidate set are then removed independently with the deletion probability $\alpha$.
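The randomized step of Targeted Dropout can be sketched on a flat list of weights as follows. This is a minimal illustration of the weight-level variant only; the parameter names `gamma` (targeting ratio) and `alpha` (deletion probability) and the function name are illustrative, and a real implementation would operate on the weight tensors of each layer during training.

```python
import random


def targeted_dropout(weights, gamma, alpha, rng=None):
    """Zero out low-magnitude weights at random.

    The fraction `gamma` of weights with the smallest magnitude forms the
    candidate set; each candidate is dropped independently with probability
    `alpha`. Weights outside the candidate set are never dropped.
    """
    if rng is None:
        rng = random.Random()
    n_candidates = int(gamma * len(weights))
    # Indices ordered from smallest to largest magnitude.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    result = list(weights)
    for i in order[:n_candidates]:
        if rng.random() < alpha:
            result[i] = 0.0
    return result
```

Setting `alpha` to 1 reduces the procedure to deterministic magnitude pruning of the candidate set, and setting it to 0 leaves the weights unchanged.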

The Targeted Dropout algorithm can effectively improve the model's ability to suppress overfitting, so that the model performs as well in actual engineering operation as it did in training, which further improves its generalization ability.

4.3. Error Correction Model Based on Fluctuation Identification
4.3.1. Classification and Identification Method of Photovoltaic Output Volatility

Photovoltaic output is closely related to meteorological factors such as cloud cover and atmospheric transmittance. Figure 12 shows the photovoltaic power curves under three weather conditions. It can be seen that the photovoltaic power fluctuates significantly and violently on cloudy days, and the fluctuation can even reach 50%. On sunny days and on rainy and snowy days, the output fluctuation is relatively gentle because there is little cloud or because thick cloud covers a large area.

It can be seen from Figure 12 that the types of power fluctuations are closely related to weather conditions, and various types of fluctuations are relatively clear. Preliminary simulation results show that different fluctuation segments correspond to different error distributions. Therefore, different types of fluctuations can be used to classify the types of error distribution indicators, and a refined classification error correction model can be established to further improve the prediction accuracy of the model.

This paper proposes the characteristic parameters of volatility , volatility standard deviation , volatility , high output ratio and low output ratio , as shown in Table 1. Among them, is the duration of the fluctuation period. and are the power of the two points in the fluctuation section, respectively; is the number of extreme points in the fluctuation segment; is the mean value of power in the fluctuation section; and are the time when the output level is above 80% and below 20% of the rated capacity, respectively. The fluctuation types are as follows:
(1) Low output and high output fluctuations: corresponding to the power fluctuations of the photovoltaic array under relatively stable low irradiation and relatively stable high irradiation, respectively.
(2) Rising and falling output fluctuations: corresponding to an obvious rising or falling trend of the irradiance of the photovoltaic array, respectively, without accompanying violent fluctuations.
(3) Oscillation fluctuation: the photovoltaic array is under the influence of more complex cloud conditions, causing the photovoltaic output to oscillate at a high frequency for a period of time.
(4) Peak fluctuation: the photovoltaic array is blocked repeatedly by low-transmittance moving clouds, so the output curve oscillates greatly, forming obvious "peaks" and "valleys."
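Two of the characteristic parameters above, the high output ratio and the low output ratio, can be sketched as follows. The sketch assumes that each ratio is the share of the fluctuation period spent above 80% or below 20% of rated capacity and that the power series is uniformly sampled, so that the share of samples equals the share of time. The function name is hypothetical, and the exact definitions in Table 1 may differ.

```python
def output_ratios(power, rated_capacity, high=0.8, low=0.2):
    """Return (high_output_ratio, low_output_ratio) for one fluctuation segment.

    `power` is a uniformly sampled power series for the segment.
    """
    if not power:
        raise ValueError("the fluctuation segment must contain samples")
    n = len(power)
    high_ratio = sum(1 for p in power if p > high * rated_capacity) / n
    low_ratio = sum(1 for p in power if p < low * rated_capacity) / n
    return high_ratio, low_ratio
```

A segment with a high output ratio close to 1 would correspond to a high output fluctuation, and one with a low output ratio close to 1 to a low output fluctuation.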

With a fixed time window, it is difficult to extract the typical fluctuation characteristics of a sequence, whereas a variable time window segmentation method can extract the typical fluctuation segments more completely. In this paper, the typical photovoltaic power curve is first divided into segments with the swinging door algorithm (SDA) [28]. On this basis, the self-organizing map (SOM) neural network is used for cluster analysis to classify and identify the fluctuation processes, as shown in Figure 13.
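The variable-window segmentation can be sketched with a textbook form of the swinging door procedure. This is a minimal sketch and not necessarily the variant used in [28]: the door half-width `eps` is an assumed tuning parameter, samples are assumed to be equally spaced, and the SOM clustering stage is not shown.

```python
from typing import List, Sequence


def swinging_door_breakpoints(values: Sequence[float], eps: float) -> List[int]:
    """Return the indices that delimit the segments of an evenly sampled series.

    Two "doors" pivot at (anchor, value + eps) and (anchor, value - eps).
    Each new sample can only swing them further open; once the upper door
    slope exceeds the lower door slope, the segment is closed at the
    previous sample, which becomes the new anchor.
    """
    if eps <= 0:
        raise ValueError("eps must be positive")
    n = len(values)
    if n == 0:
        return []
    if n == 1:
        return [0]

    breakpoints = [0]
    anchor = 0
    slope_up = float("-inf")
    slope_low = float("inf")
    i = 1
    while i < n:
        dt = i - anchor
        slope_up = max(slope_up, (values[i] - (values[anchor] + eps)) / dt)
        slope_low = min(slope_low, (values[i] - (values[anchor] - eps)) / dt)
        if slope_up > slope_low:
            # Doors opened past parallel: close the segment at sample i - 1
            # and re-examine sample i against the new anchor.
            anchor = i - 1
            breakpoints.append(anchor)
            slope_up = float("-inf")
            slope_low = float("inf")
            continue
        i += 1

    if breakpoints[-1] != n - 1:
        breakpoints.append(n - 1)
    return breakpoints
```

Consecutive breakpoints give the segment boundaries, for example `list(zip(bp, bp[1:]))`. A larger `eps` yields fewer and longer segments, so its value controls how fine the resulting fluctuation segments are.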

4.3.2. Classification Error Correction Model

According to the volatility classification theory and identification method, low output and high output fluctuations, as well as rising and falling fluctuations, are relatively gentle, so their preliminary prediction is less difficult than that of oscillation and peak fluctuations. Error correction is therefore performed on peak fluctuations and oscillation fluctuations to improve the prediction accuracy of the model. Specifically, considering that the prediction errors of the peak and oscillation fluctuation power time series may be related to some strongly correlated meteorological indicators, the mapping relationship between segmental error values and related meteorological factors is established:

$$\hat{e}_t = f_k(x_1, x_2, \ldots, x_n)$$

In the formula, $\hat{e}_t$ is the error prediction value; $f_k(\cdot)$ is the error mapping model corresponding to the $k$th type of fluctuation process; and $x_1, x_2, \ldots, x_n$ are the relevant factors extracted from historical meteorological data (irradiance, temperature, etc.).

Because the output in the fluctuation segments to be corrected is relatively unstable and fluctuates violently, the relationship between the relevant meteorological factors and the output time series is difficult to fit with linear mathematical models such as multiple linear regression. Here, the model back-substitution method is used to construct a data set composed of related meteorological factors and historical errors for the different types of fluctuation segments, and the IAM-CNN-LSTM model is used to predict the fluctuation segment errors. The final power prediction result is then expressed as

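$$\hat{P}_t = g(\cdot) + \hat{e}_t$$

In the formula, $\hat{P}_t$ is the predicted photovoltaic power at time $t$; $g(\cdot)$ is the preliminary photovoltaic prediction algorithm; and $\hat{e}_t$ is the error prediction value given by the error mapping model.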
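The correction rule described above (add a predicted error to the preliminary prediction, and only for oscillation and peak segments) can be sketched as follows. The type labels, the dictionary of error models, and the feature dictionary are hypothetical names chosen for illustration; in the paper the error models are IAM-CNN-LSTM networks, for which simple callables stand in here.

```python
from typing import Callable, Dict, Mapping

# Assumption: only these two fluctuation types receive a correction,
# following the text; the string labels themselves are illustrative.
CORRECTED_TYPES = {"oscillation", "peak"}

ErrorModel = Callable[[Mapping[str, float]], float]


def corrected_power(preliminary_power: float,
                    fluctuation_type: str,
                    weather: Mapping[str, float],
                    error_models: Dict[str, ErrorModel]) -> float:
    """Final prediction = preliminary prediction + predicted error.

    Gentle fluctuation types are passed through unchanged.
    """
    if fluctuation_type not in CORRECTED_TYPES:
        return preliminary_power
    predicted_error = error_models[fluctuation_type](weather)
    return preliminary_power + predicted_error
```

Keeping one error model per fluctuation type mirrors the idea that different fluctuation segments follow different error distributions.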

5. Cloud Trajectory-Based Tracking–IAM-CNN-LSTM for Ultra-Short-Term Centralized PV Power Prediction Process

The main idea of the method proposed in this paper is shown in Figures 14–16, which show the cloud image processing, the feature data matrix composition, and the algorithm flow. The specific steps are as follows.

Step 1. Select the historical power data and historical meteorological data of the centralized photovoltaic power station, together with the time-stamped ground-based cloud images from the same period, and delete missing and erroneous data.

Step 2. Extract the HSVSURF feature points from the ground-based cloud image at adjacent time points, use the FLANN algorithm to perform feature matching, and use the IRANSAC algorithm to filter the mismatched feature points to predict the cloud movement direction and trajectory.

Step 3. Use the improved threshold segmentation method combined with the historical clear sky atlas to extract the thick cloud and thin cloud coverage areas, respectively, and establish an irradiance coefficient prediction model.

Step 4. Construct a convolutional neural network, input the feature matrix into CNN for feature extraction, and sequentially input the extracted time series features into the LSTM layer combined with the improved attention mechanism, and calculate the output result of the IAM-CNN-LSTM.

Step 5. Calculate the network prediction error, and use the Adam optimizer to optimize the parameters of the prediction model.

Step 6. If the maximum number of iterations is reached, the iteration is terminated and the IAM-CNN-LSTM network parameters are output. Otherwise, increment the iteration counter and go to Step 4.

Step 7. Use the IAM-CNN-LSTM network after training to perform ultra-short-term photovoltaic power prediction, and obtain the preliminary predicted power for the period to be predicted.

Step 8. Identify the fluctuation type of the preliminary predicted power, adopt corresponding error correction strategies for different types of fluctuation processes, and finally, add the preliminary predicted power and the error prediction result to complete the final power prediction.
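To make the mismatch filtering of Step 2 more concrete, the sketch below estimates a single cloud displacement vector from already matched feature points using plain RANSAC. It is not the IRANSAC variant named in Step 2, and it assumes that feature extraction and FLANN matching have already produced the point pairs; the function name, the pixel tolerance, and the pure-translation motion model are assumptions.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]
Match = Tuple[Point, Point]  # (position at time t, position at time t + 1)


def ransac_translation(matches: List[Match], tol: float = 2.0,
                       n_iter: int = 200,
                       seed: int = 0) -> Tuple[Point, List[Match]]:
    """Estimate one displacement (dx, dy) that most matches agree with.

    Each iteration takes the displacement of one random match as the
    hypothesis and counts matches within `tol` pixels of it. The largest
    consensus set is kept and its mean displacement is returned.
    """
    if not matches:
        raise ValueError("at least one match is required")
    rng = random.Random(seed)
    best: List[Match] = []
    for _ in range(n_iter):
        (x0, y0), (x1, y1) = rng.choice(matches)
        dx, dy = x1 - x0, y1 - y0
        inliers = [
            (a, b) for a, b in matches
            if math.hypot((b[0] - a[0]) - dx, (b[1] - a[1]) - dy) <= tol
        ]
        if len(inliers) > len(best):
            best = inliers
    mean_dx = sum(b[0] - a[0] for a, b in best) / len(best)
    mean_dy = sum(b[1] - a[1] for a, b in best) / len(best)
    return (mean_dx, mean_dy), best
```

Dividing the returned displacement by the time between the two images gives an apparent cloud velocity in pixels per unit time, which can then be extrapolated to predict the cloud trajectory.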

6. Case Analysis

6.1. Explanation of Experimental Data

In this experiment, a photovoltaic power station in northwest China is selected, which is equipped with an SRF-02 all-sky imager and a meteorological environment data acquisition system. The experimental data include the power output of the power station, the ground-based cloud images, and the irradiance, surface temperature, and other meteorological data from 05:00 to 19:00 every day. The specific characteristic indicators are shown in Table 2.

The sampling interval of the above data is 5 min. Because photovoltaic power is clearly similar under the same weather type, this paper selects days of three typical weather types as the test set and, for each, takes the 10 days before the test day as the training set. The cloud amount is extracted from the cloud image data; together with the irradiance, irradiance coefficient, and measured photovoltaic power, it is used as the input of the model, and the photovoltaic power is used as the output for model training.

To improve the prediction performance of the model and fully reflect the timeliness of prediction, a real-time prediction method is adopted: at each prediction point, spaced 5 minutes apart, the average photovoltaic power over the next 5 minutes is predicted, which avoids the accumulation of errors caused by rolling forecasts.

Historical PV power data and meteorological data usually contain abnormal values caused by data communication errors, packet loss, etc., so these values must be excluded to improve prediction accuracy. In addition, to remove the effect of differing dimensions (units) among the features and to make the features more distinct, the feature data is first normalized. In this paper, a training set is established based on the full-year data, and the 5-day data at the end of each month is set as the validation set to generate historical prediction errors and train the error correction model.
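A minimal sketch of this cleaning and normalization step is given below. The paper does not state which abnormal-data rule or normalization formula is used, so the validity range check and the min-max scaling are assumptions, as are the function names.

```python
import math
from typing import List, Optional, Sequence, Tuple


def drop_abnormal(values: Sequence[Optional[float]],
                  valid_range: Tuple[float, float]) -> List[float]:
    """Keep only samples that are present, finite, and physically plausible."""
    lo, hi = valid_range
    return [
        v for v in values
        if v is not None and not math.isnan(v) and lo <= v <= hi
    ]


def fit_min_max(values: Sequence[float]) -> Tuple[float, float]:
    """Learn the scaling bounds from the training data only."""
    return min(values), max(values)


def apply_min_max(values: Sequence[float], lo: float, hi: float) -> List[float]:
    """Scale values to [0, 1] using bounds learned by fit_min_max."""
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]
```

Fitting the bounds on the training set and reusing them for validation and test data keeps information from the evaluation periods out of the scaling.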

6.2. Prediction Result Verification Based on IAM-CNN-LSTM Algorithm
6.2.1. Determination of Neural Network Input Matrix Step Size

The IAM-CNN-LSTM hybrid neural network used in this paper takes the data matrix as the input of the convolutional neural network. The size of the convolution kernel is selected first, and the network parameters are chosen with the control variable method. Since the one-dimensional convolution kernel Conv1D is selected as the CNN convolution kernel, the width of the input data matrix is fixed at the total number of features, 11. The number of LSTM layers is then kept unchanged while the input step size of the feature matrix is varied to find the value that gives the best prediction result; the selection process is shown in Table 3.
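The role of the input step size can be illustrated by the way the input matrices are cut from the time series. In the sketch below, each sample is a matrix with `steps` rows (time steps) and one column per feature, paired with the power at the next time step; the function name and the pairing with the immediately following sample are assumptions about the data layout rather than details given in the paper.

```python
from typing import List, Sequence, Tuple

Window = List[Sequence[float]]


def make_windows(features: Sequence[Sequence[float]],
                 power: Sequence[float],
                 steps: int) -> List[Tuple[Window, float]]:
    """Build (input matrix, target) pairs for a given input step size.

    `features[i]` is the feature row at time i (11 values in the paper's
    setup) and `power[i]` is the measured power at time i.
    """
    if len(features) != len(power):
        raise ValueError("features and power must have the same length")
    if steps < 1:
        raise ValueError("steps must be at least 1")
    samples = []
    for end in range(steps, len(features)):
        window = list(features[end - steps:end])  # the `steps` most recent rows
        samples.append((window, power[end]))      # target: next time step
    return samples
```

Varying `steps` while holding the rest of the network fixed corresponds to the control variable selection summarized in Table 3.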

6.2.2. Comparison and Analysis of Algorithm Prediction Accuracy

In order to verify the prediction performance of the algorithm in this paper, ARIMA [29], LSTM [30], CNN-LSTM [31], attention-CNN-LSTM, and the proposed method are used to predict the photovoltaic power of different weather types. The prediction results of the algorithm are shown in Table 4, and the curve comparison of the prediction results of different methods is shown in Figure 17.

The weather types a, b, and c in Table 4 represent the multimovement cloud cluster weather, the low-irradiance weather, and the less-movement cloud cluster high-irradiance weather, respectively. It can be seen that the average prediction accuracy of the algorithm in this paper is higher than that of the other four models.

The IAM-CNN-LSTM algorithm proposed in this paper has the best overall prediction performance. Compared with the second-best attention-CNN-LSTM algorithm, the RMSE for the three typical weather types a, b, and c is reduced by 21.45%, 25.89%, and 22.14%, respectively, which indicates the necessity of improving the attention mechanism and introducing the Targeted Dropout algorithm. The attention-CNN-LSTM algorithm, in turn, uses the attention mechanism to further filter the longitudinal correlation of the input sequence in the time dimension on the basis of the plain CNN-LSTM algorithm, so its prediction accuracy is higher than that of CNN-LSTM.
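For reference, the RMSE and a percentage reduction of the kind quoted above can be computed as in the following sketch. It assumes that the reduction is expressed relative to the baseline RMSE, which the paper does not state explicitly, and the numbers in the usage note are illustrative, not values from Table 4.

```python
import math
from typing import Sequence


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error between two equally long series."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("series must be non-empty and equally long")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def relative_reduction_percent(baseline: float, improved: float) -> float:
    """Reduction of `improved` relative to `baseline`, in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (baseline - improved) / baseline * 100.0
```

For example, a baseline RMSE of 4.0 and an improved RMSE of 3.0 correspond to a 25% reduction under this definition.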

Among similar studies, reference [10] adopts a prediction method that combines numerical weather forecasting with ground-based cloud image processing, and reference [32] combines cloud-enhanced analysis with an LSTM neural network for ultra-short-term photovoltaic forecasting. Compared with these studies, the prediction accuracy of the method proposed in this paper is significantly improved; the specific comparison is shown in Table 5.

6.2.3. Prediction Time Evaluation

The photovoltaic power prediction studied in this paper must meet ultra-short-term requirements, so the balance between the calculation speed of the algorithm and the overall prediction performance of the model must be considered carefully. Table 6 shows the time consumption comparison of each algorithm.

Table 6 compares the training and testing times of LSTM, CNN-LSTM, attention-CNN-LSTM, and the IAM-CNN-LSTM hybrid algorithm under different epoch numbers. The IAM-CNN-LSTM hybrid network takes longer to run than the LSTM network. However, considering that the IAM-CNN-LSTM network significantly improves prediction accuracy over the LSTM network, and considering the requirements of ultra-short-term prediction and the other performance indicators, the training time of the IAM-CNN-LSTM is still within the acceptable range.

6.2.4. Irradiance Coefficient Validity

In order to verify the effectiveness of the introduction of the irradiance coefficient into the prediction model and to reflect the impact of the introduction of the irradiance coefficient on the prediction results more specifically, this paper selects the multimovement cloud cluster weather for comparison and verification under the condition of using the IAM-CNN-LSTM hybrid model.

From Figure 18, it can be seen that introducing the irradiance coefficient clearly corrects the “hysteresis” of the power prediction curve, and the power prediction model with the irradiance coefficient has better prediction performance. After the irradiance coefficient was introduced, the RMSE of the prediction model dropped by 14.52%, which indicates that introducing the irradiance coefficient is effective.

6.2.5. Error Correction Based on Volatility Classification

In the preliminary prediction, the error for the multimovement cloud cluster weather type is larger than that for the other two weather types, so error correction based on the volatility classification is carried out for the preliminary prediction of the multimovement cloud cluster weather.

As shown in Figure 19, this paper compares the predictions of the preliminary prediction model with those of the power model after introducing error correction and obtains the error distribution. The errors are highly concentrated at low power and high power, and large errors occur during power dips or swells. The predicted error distribution points cover the measured error distribution points well, which shows that the error correction based on volatility classification fits the error distribution of the multimovement cloud cluster weather type well.

As shown in Figure 20, the error correction model significantly reduces the maximum error of the initial prediction, and the overall error of the system is controlled between -15% and 15%, indicating that the classification error correction has a certain effectiveness in improving the prediction accuracy of the model.

7. Conclusion

To address the problem that the power of centralized photovoltaic power plants fluctuates violently at the minute scale because of occlusion by moving clouds, this research presents an efficient algorithm based on cloud image feature extraction. Compared with the existing methods, the proposed method predicts the irradiance coefficient, which represents the irradiance level at a given moment, by processing the ground-based cloud image; this enhances the model’s ability to perceive abrupt irradiance changes in advance. An IAM-CNN-LSTM hybrid neural network prediction algorithm is proposed, and an error correction strategy based on fluctuation identification is formulated, which significantly improves the prediction accuracy of the model.

The prediction method proposed in this paper can obtain higher prediction accuracy under various meteorological conditions, has good generalization performance, and can meet the practical engineering requirements of “minute-level prediction.”

In this paper, the meteorological factors inside the photovoltaic power station are regarded as the same, and the uneven illumination caused by the cloud occlusion in the photovoltaic power station is not considered. The follow-up research work will focus on the division of the internal area of the large photovoltaic power station and further study the effect of cloud blocking on each area of the photovoltaic power station.

Data Availability

The data used for the findings of this study is available upon request from the corresponding author.

Conflicts of Interest

The authors declare that they have no conflicts of interest to report regarding the present study.