Abstract

This paper proposes a diagnosis method based on time series and support vector machines (SVMs) to improve the timeliness, accuracy, and feasibility of fault diagnosis for photovoltaic (PV) arrays. The method computes the converted output power of the PV array under standard test conditions from real-time measurements of voltage, current, irradiance, and temperature, and normalizes the power values at different time points of the day to form a time series. Using these time series as input to a “one-versus-one” multiclass classifier, the method identifies and classifies typical operational faults of PV arrays, namely random shading, fixed shading, and aging degradation. The model is trained and tested on data sets generated by a PV array simulation device under different fault conditions. The experimental results show that the model achieves an accuracy of 99.5% for faults with known characteristics and 95.2% for faults with unknown characteristics, and that it partly solves the problem of distinguishing shading faults from aging faults, which exhibit similar degradation characteristics.

1. Introduction

In recent years, growing energy shortages and climate change have made the development of clean energy increasingly urgent. Photovoltaic (PV) power generation has attracted widespread attention as a key form of clean energy owing to its technological maturity and market scale [1]. By the end of 2022, China’s grid-connected PV capacity had reached 392.04 GW, and newly added grid-connected PV capacity was expected to exceed 110 GW in 2023, keeping China first in the world in new PV installations for ten consecutive years. PV power generation is therefore becoming an important part of China’s clean energy mix and one of the key means of achieving the strategic goals of “carbon peak” and “carbon neutrality.” However, because of unpredictable factors such as environmental changes, material aging, operating errors, and foreign-object impact, PV arrays will inevitably suffer failures or damage during their 25-year service life [2]. If these problems are not detected and handled in time, they can significantly reduce the investment return of PV power stations and, in serious cases, cause fires that endanger people and property. It is therefore important to study the fault mechanisms and characteristics of PV arrays in depth and to establish a low-cost, highly reliable health-monitoring method that ensures the long-term stable operation of PV stations and improves their economic and safety performance [3].

Current fault diagnosis methods for PV arrays mainly include multisensor detection, infrared image analysis, mathematical modeling, ground capacitance measurement, time-domain reflectometry, and intelligent detection [4–9]. Among them, intelligent detection based on machine learning algorithms has become the most prominent research direction because of its speed and accuracy, which make it well suited to unmanned and remote operation.

Liu et al. proposed an artificial intelligence diagnosis method based on feature parameters and the XGBoost algorithm, which uses a weighted quantile sketch, sparsity-aware split finding, and approximate tree learning to extract effective feature variables and build a fault diagnosis model [10]. Zhu et al. combined neural network models with unsupervised clustering to classify and diagnose fixed shading, short-circuit, and aging faults in PV systems [11]. Liu et al. proposed a neural-network-based method for discriminating complete shading, partial shading, and short-circuit faults [12]. Chen et al. used the random forest algorithm to identify short-circuit, aging degradation, and partial shading faults in PV systems [13]. Although these works use different algorithms, they all rely on similar electrical characteristics for identification, such as open-circuit voltage, short-circuit current, maximum power point voltage, and maximum power point current.

Research shows that these methods can meet the engineering requirements of remote fault diagnosis in most cases. However, shading and aging degradation can produce very similar or even identical electrical characteristics in a PV array, which significantly increases the error rate of these methods and can cause diagnosis to fail. Furthermore, existing research does not distinguish between types of shading, so the related algorithms may misjudge short-term random shading caused by clouds, birds, and the like as fixed shading caused by foreign objects, generating false alarms.

To solve these problems, this paper proposes a method for distinguishing PV array faults based on time series and support vector machines. The method constructs normalized, labeled time series from the output power of PV arrays under different fault conditions and uses them as the input to a “one-versus-one” multiclass classifier, enabling effective classification of faults such as shading and aging degradation. A key feature of the method is that it takes the correlation within each time series into account and treats a fault as a dynamic process, which better reveals how the fault evolves over time. In addition, the SVM handles nonlinear classification problems well, enabling more accurate fault identification.

2. Fault Classification Model

2.1. Basic Principles of Support Vector Machines

Support vector machine (SVM) is a binary classification model that seeks the maximum-margin hyperplane in a given feature space in order to classify a data set linearly [14]:

$$\min_{\boldsymbol{w},\,b}\ \frac{1}{2}\lVert \boldsymbol{w} \rVert^{2} \quad \text{s.t.} \quad y_i\left(\boldsymbol{w}^{\mathsf T}\boldsymbol{x}_i + b\right) \ge 1,\quad i = 1, 2, \ldots, m. \tag{1}$$

In Equation (1), $\boldsymbol{w}$ is the parameter vector of the hyperplane, $b$ is the bias, $\boldsymbol{x}_i$ is the $i$-th input vector with class label $y_i \in \{-1, +1\}$, and $y_i(\boldsymbol{w}^{\mathsf T}\boldsymbol{x}_i + b) \ge 1$ is the constraint. The functional margin in Equation (1) is set to 1 for convenience of derivation and optimization; this choice does not affect the optimal solution of the objective function [15].
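To make Equation (1) concrete, the sketch below evaluates a candidate hyperplane rather than solving the optimization problem: it checks whether a given pair $\boldsymbol{w}, b$ satisfies every constraint and computes the margin width $2/\lVert\boldsymbol{w}\rVert$ that minimizing $\lVert\boldsymbol{w}\rVert^2$ maximizes. The helper names and the four toy 2-D points are hypothetical, chosen only for illustration.

```python
import math


def functional_margins(w, b, xs, ys):
    """Return y_i * (w . x_i + b) for every sample (the left side of the constraint)."""
    return [y * (sum(wk * xk for wk, xk in zip(w, x)) + b) for x, y in zip(xs, ys)]


def is_feasible(w, b, xs, ys):
    """True if (w, b) satisfies every constraint y_i (w . x_i + b) >= 1."""
    return all(m >= 1.0 for m in functional_margins(w, b, xs, ys))


def geometric_margin_width(w):
    """Distance between the supporting hyperplanes w . x + b = +1 and w . x + b = -1."""
    return 2.0 / math.sqrt(sum(wk * wk for wk in w))


# Toy 2-D data (illustrative only): two samples per class.
xs = [(2.0, 2.0), (3.0, 3.0), (0.0, 0.0), (-1.0, -1.0)]
ys = [+1, +1, -1, -1]

w, b = (0.5, 0.5), -1.0  # the hyperplane x1 + x2 = 2
print(functional_margins(w, b, xs, ys), is_feasible(w, b, xs, ys), geometric_margin_width(w))
```

For this toy data the closest opposite-class points, (2, 2) and (0, 0), lie exactly on the two supporting hyperplanes, and the margin width $2\sqrt{2}$ equals their distance, so this is the maximum-margin hyperplane. Halving both $\boldsymbol{w}$ and $b$ describes the same hyperplane but violates the constraints, which is why fixing the functional margin at 1 removes the scaling ambiguity.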

To solve multiclass problems with SVMs, a multiclass classifier must be constructed. There are two common approaches: the first solves for the parameters of all classifiers simultaneously in a single optimization problem, while the second combines multiple binary classifiers [16]. Because the first approach involves complicated computation and is difficult to implement in engineering practice, this paper uses the second approach to construct a “one-versus-one” multiclass classifier. For a sample set with $k$ categories, one binary SVM is trained for every pair of categories, giving $k(k-1)/2$ classifiers, and a test sample is assigned to the category that receives the most votes.
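The voting step can be sketched as follows. The pairwise decision functions are a hypothetical stand-in (a nearest-centroid rule on a single invented scalar feature) used only to show how $k(k-1)/2$ binary decisions become one label; in the paper each pairwise decision would come from a trained SVM. The labels D0 to D3 follow Table 1.

```python
from collections import Counter
from itertools import combinations


def one_vs_one_predict(x, classes, pairwise):
    """Assign x to the class that wins the most pairwise votes.

    pairwise[(a, b)] is a binary decision function (hypothetical interface):
    a value > 0 votes for class a, a value <= 0 votes for class b.
    Ties are broken in favor of the class listed first in `classes`.
    """
    votes = Counter({c: 0 for c in classes})
    for a, b in combinations(classes, 2):
        winner = a if pairwise[(a, b)](x) > 0 else b
        votes[winner] += 1
    best = max(votes[c] for c in classes)
    return next(c for c in classes if votes[c] == best)


# Stand-in binary classifiers: hypothetical class centroids on one scalar feature.
centroids = {"D0": 1.0, "D1": 0.6, "D2": 0.8, "D3": 0.4}
classes = list(centroids)
pairwise = {
    # Positive when x is closer to centroid a than to centroid b.
    (a, b): (lambda x, a=a, b=b: abs(x - centroids[b]) - abs(x - centroids[a]))
    for a, b in combinations(classes, 2)
}

print(len(pairwise), one_vs_one_predict(0.62, classes, pairwise))
```

With four classes there are six pairwise classifiers. In practice the voting rarely needs to be written by hand: scikit-learn's `SVC` class uses the same one-versus-one scheme internally for multiclass problems.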

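In practical applications, SVM maps the data into a higher-dimensional feature space to solve nonlinear classification problems and to increase the expressive power of the linear learner [17]. Its objective function is therefore modified as follows:

$$\min_{\boldsymbol{w},\,b}\ \frac{1}{2}\lVert \boldsymbol{w} \rVert^{2} \quad \text{s.t.} \quad y_i\left(\boldsymbol{w}^{\mathsf T}\phi(\boldsymbol{x}_i) + b\right) \ge 1,\quad i = 1, 2, \ldots, m, \tag{2}$$

where $\phi(\cdot)$ denotes the mapping from the input space to the feature space. Solving Equation (2) through its dual form requires the inner products $\phi(\boldsymbol{x}_i)^{\mathsf T}\phi(\boldsymbol{x}_j)$, which are expensive to compute in a high-dimensional space. SVM therefore introduces a kernel function $\kappa(\boldsymbol{x}_i,\boldsymbol{x}_j) = \phi(\boldsymbol{x}_i)^{\mathsf T}\phi(\boldsymbol{x}_j)$, which evaluates these inner products directly in the input space and yields a nonlinear learner [18]. Commonly used kernel functions include the linear, polynomial, and radial basis function (RBF) kernels. In this paper, the RBF kernel is adopted [19]:

$$\kappa(\boldsymbol{x}_i,\boldsymbol{x}_j) = \exp\left(-\frac{\lVert \boldsymbol{x}_i - \boldsymbol{x}_j \rVert^{2}}{2\sigma^{2}}\right), \tag{3}$$

where $\kappa(\cdot,\cdot)$ is the kernel function and $\sigma > 0$ is the width of the kernel function [20].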
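A minimal sketch of the RBF kernel in the width form $\exp(-\lVert\boldsymbol{x}_i-\boldsymbol{x}_j\rVert^2/(2\sigma^2))$, together with the Gram matrix an SVM solver works with. Some libraries, including scikit-learn, write the same kernel as $\exp(-\gamma\lVert\boldsymbol{x}_i-\boldsymbol{x}_j\rVert^2)$, so $\gamma = 1/(2\sigma^2)$; the helper `sigma_to_gamma` is a hypothetical name for that conversion, and the three sample points are invented.

```python
import math


def rbf_kernel(xi, xj, sigma):
    """RBF kernel with width sigma: exp(-||xi - xj||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def gram_matrix(xs, sigma):
    """Matrix of pairwise kernel values K[i][j] = kappa(x_i, x_j)."""
    return [[rbf_kernel(a, b, sigma) for b in xs] for a in xs]


def sigma_to_gamma(sigma):
    """gamma for libraries that write the kernel as exp(-gamma * ||xi - xj||^2)."""
    return 1.0 / (2.0 * sigma ** 2)


K = gram_matrix([(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)], sigma=1.0)
for row in K:
    print([round(v, 4) for v in row])
```

The matrix is symmetric with ones on the diagonal, and kernel values shrink as points move apart; a smaller width $\sigma$ (a larger $\gamma$) makes that decay faster and the decision boundary more flexible.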

2.2. Construction of Output Power Time Series

The real-time output power is first calculated from the output voltage and current on the DC side of the PV array. Irradiance and cell-temperature compensation are then applied according to Equation (4) to obtain the converted power under standard test conditions (STC: air mass AM 1.5, irradiance 1000 W/m², cell temperature 25°C).

The quantities in Equation (4) are the converted power of the PV array, the measured output power of the PV array, the power temperature coefficient of the PV modules, the PV module temperature, the rated power of the PV array, and the solar irradiance.
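Because Equation (4) is not reproduced in this text, the sketch below assumes a common first-order translation to STC: scale the measured power by $1000/G$ and divide by $1 + \gamma_P (T - 25)$. The function names, this linear temperature model, the optional per-unit step using the rated power, and the example coefficient of −0.004 /°C are all assumptions for illustration; the paper's Equation (4) may combine these quantities differently.

```python
G_STC = 1000.0  # W/m^2, STC irradiance
T_STC = 25.0    # degC, STC cell temperature


def stc_power(p_meas, irradiance, cell_temp, gamma_p):
    """Translate measured DC power to STC (assumed linear model).

    gamma_p: power temperature coefficient in 1/degC (negative for crystalline Si).
    """
    if irradiance <= 0.0:
        raise ValueError("irradiance must be positive")
    return p_meas * (G_STC / irradiance) / (1.0 + gamma_p * (cell_temp - T_STC))


def per_unit(p_stc, p_rated):
    """Optional normalization by the rated array power (assumption)."""
    return p_stc / p_rated


# 0.3 W measured at 500 W/m^2 and 45 degC, with an illustrative -0.4 %/degC coefficient.
print(stc_power(0.3, 500.0, 45.0, -0.004))
```

At 25 °C the temperature term vanishes and only the irradiance scaling remains; at higher cell temperatures a negative coefficient shrinks the denominator and raises the converted power, compensating for the thermal loss.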

To form the one-dimensional time series of Equation (5), $n$ time points are selected uniformly within a specific period of the day, and the converted power at each point is extracted:

$$X = \left[x_1, x_2, \ldots, x_n\right], \tag{5}$$

where $X$ is the time series with $n$ sampling points and $x_1, x_2, \ldots, x_n$ are the converted power values at those points, calculated with Equation (4). To speed up the SVM classification model, each time series is normalized according to Equation (6), and a fault-type label is appended to form a time series that can be used as the model’s input [21]:

$$x_i' = \frac{x_i - \mu_X}{\sigma_X},\quad i = 1, 2, \ldots, n, \tag{6}$$

where $\mu_X$ is the mean value of the sequence and $\sigma_X$ is its standard deviation.
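A minimal sketch of the normalization in Equation (6) followed by labeling. It assumes the population standard deviation (the text does not state which estimator is used); the helper names and the seven power values are hypothetical.

```python
import statistics


def zscore(series):
    """Equation (6): subtract the series mean and divide by its standard deviation."""
    mu = statistics.fmean(series)
    sd = statistics.pstdev(series)  # population standard deviation (assumed estimator)
    if sd == 0.0:
        raise ValueError("a constant series cannot be z-score normalized")
    return [(x - mu) / sd for x in series]


def labeled_sample(series, label):
    """One data-set row: the normalized series followed by its fault label."""
    return zscore(series) + [label]


daily_power = [0.20, 0.45, 0.62, 0.66, 0.61, 0.44, 0.21]  # hypothetical values (W)
row = labeled_sample(daily_power, "D0")
print([round(v, 3) for v in row[:-1]], row[-1])
```

Each normalized series has zero mean and unit standard deviation. Note that the z-score is unchanged when the whole series is multiplied by a positive constant, so it keeps the shape of the daily power curve rather than its absolute level.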

Repeating Equations (4)–(6) for $m$ measurement runs yields $m$ labeled samples, which together form the data set of multiple time series:

$$D = \begin{bmatrix} x_{11}' & x_{12}' & \cdots & x_{1n}' & y_1 \\ x_{21}' & x_{22}' & \cdots & x_{2n}' & y_2 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ x_{m1}' & x_{m2}' & \cdots & x_{mn}' & y_m \end{bmatrix}, \tag{7}$$

where $n$ is the number of sampling points in each time series, $m$ is the number of samples, and $y_j$ is the fault-type label of the $j$-th sample. The samples were randomly split into a training data set and a test data set in a 7 : 3 ratio. During training, the samples of each pair of labels were grouped together, and an independent SVM classifier was trained on each group. In the diagnosis phase, the output was the label that received the most votes.
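The 7 : 3 random split can be sketched as below. The seeded generator and the helper name are assumptions added for reproducibility; integer arithmetic avoids floating-point rounding in the split size. With the 440 samples reported in Section 3.3 it produces the 308/132 partition used there.

```python
import random


def split_7_3(samples, seed=0):
    """Shuffle a copy of the samples and return (train, test) in a 7 : 3 ratio."""
    rng = random.Random(seed)      # local generator, so the global state is untouched
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_train = len(shuffled) * 7 // 10
    return shuffled[:n_train], shuffled[n_train:]


train, test = split_7_3(range(440), seed=42)
print(len(train), len(test))
```

A purely random split like this one does not guarantee identical class proportions in the two subsets; a stratified split per label would be a possible refinement if some fault classes are rare.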

The overall procedure is shown in the flow chart in Figure 1. First, the maximum power point current and voltage, module temperature, irradiance, and other data are collected from a PV power station under different fault conditions and converted into peak power under STC, which is combined with the time stamps to build a time series data set. To make the analysis results more comparable, the time series are normalized, and labels are attached to the different fault categories for subsequent classification. Next, the entire data set is randomly divided into a training data set and a test data set in a 7 : 3 ratio. An SVM classifier is trained for each pair of fault categories, yielding an SVM model with multiclass classification capability. Finally, test data or real-time data are fed into the trained SVM multiclass classifier to evaluate its performance. If the diagnosis accuracy is low, the parameters of the SVM multiclass classifier are adjusted and the model is retrained to improve diagnosis accuracy.

3. Experiment and Analysis

3.1. Simulation Experimental Device

Training and testing the SVM classifier requires a large amount of operating data from the PV array under different fault conditions. Because staging faults on outdoor PV systems and collecting data from them is costly and time-consuming, and can cause catastrophic damage to the power generation system, researchers typically obtain data samples through computer simulation or experimental modeling [22]. However, the literature shows that existing simulation and experimental schemes cannot reflect how the irradiance of the solar light source at different times of day affects the output characteristics of the PV array [23]. This paper therefore designs a PV array simulation device, shown in Figure 2, that reproduces the electrical characteristics of an actual PV power station and allows fault experiments to be conducted. The device uses an LED array in place of the sun and controls its light intensity by varying the current of the LED array, thereby simulating the irradiance changes of the sun during the day.

The experimental device consists of an LED lamp array, a solar cell array, and a system control board. As shown in Figure 3, the LED array consists of 144 round lamp beads arranged in a 12 × 12 square matrix. Each bead has a color temperature of 5200 K and is rated at 700 mA and 3.7 V. The solar array consists of two groups connected in series-parallel, each containing three solar cells. Each solar cell has an open-circuit voltage of 7.2 V, a short-circuit current of 121 mA, and a maximum power of 0.7 W. The system control board consists of four parts: an LED lamp control module, an acquisition module, a storage module, and a display and alarm module.

During the test, the 12 columns of LED lamps were divided into six groups for control. Adjusting the voltage of the DC constant-current source changed the current through the LED array and thus simulated changes in irradiance, while switching different combinations of groups on simulated the irradiance of the sun at 13 different times from morning to evening. Each time the voltage was adjusted, the output voltage of the constant-current source had to be allowed to stabilize. The adjustable voltage range is 0.8–3.6 V: a voltage of 3.6 V or higher is equivalent to an irradiance of 1000 W/m² in everyday conditions, and the lower limit of 0.8 V corresponds to zero irradiance. The recorded data are shown in Figure 4. The simulated light source and the sun show similar irradiance changes over the day, both following an approximately Gaussian profile, which indicates that the simulation device can reproduce the characteristics of solar irradiance at different times.
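The following sketch turns a daily irradiance profile into control-voltage set points for the LED source. Only the 0.8–3.6 V range and its correspondence to 0 and 1000 W/m² come from the text; the linear mapping between those endpoints, the Gaussian profile centered at 12:00 with a 2.5 h spread, and the hourly 6:00 to 18:00 schedule for the 13 time points are assumptions for illustration.

```python
import math

V_MIN, V_MAX = 0.8, 3.6  # V, adjustable control-voltage range quoted in the text
G_MAX = 1000.0           # W/m^2, irradiance equivalent at V_MAX


def gaussian_irradiance(hour, peak=G_MAX, noon=12.0, spread=2.5):
    """Assumed bell-shaped daily irradiance profile (hypothetical parameters)."""
    return peak * math.exp(-((hour - noon) ** 2) / (2.0 * spread ** 2))


def control_voltage(g):
    """Assumed linear map: 0 W/m^2 -> V_MIN, G_MAX -> V_MAX (clamped to the range)."""
    g = min(max(g, 0.0), G_MAX)
    return V_MIN + (V_MAX - V_MIN) * g / G_MAX


hours = [6 + k for k in range(13)]  # 13 time points, 6:00 to 18:00 (assumed)
schedule = [(h, gaussian_irradiance(h), control_voltage(gaussian_irradiance(h))) for h in hours]
for h, g, v in schedule:
    print(f"{h:02d}:00  G = {g:6.1f} W/m^2  V = {v:.2f} V")
```

If the real LED output is not linear in the control voltage, the linear map would be replaced by a calibration curve measured on the device; the schedule logic stays the same.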

3.2. PV Array Fault State Simulation

Faults on the DC side of a PV array can be roughly divided into recoverable and nonrecoverable faults [24]. Recoverable faults mainly refer to power loss or mismatch caused by shading, including random shading caused by clouds and fixed shading caused by foreign objects such as bird droppings, leaves, and dirt. A PV array with such a fault returns to normal once the shading disappears or is cleared [25]. Nonrecoverable faults, on the other hand, mainly include irreversible damage such as short circuits, open circuits, and aging degradation, and the performance of the PV array can only be restored by replacing the damaged parts. Because short circuits and open circuits have distinctive electrical characteristics, diagnosis techniques for them are relatively mature [26, 27]. This paper therefore focuses on faults caused by random shading, fixed shading, and aging degradation, as shown in Figure 5. In the aging degradation experiment, a resistor is added in series at the output of the array. In the cloud shading experiment, data are collected while shading objects are randomly introduced above the array. In the fixed shading experiment, shading objects are randomly placed on the array surface.

Aging degradation directly reduces the output power of PV modules while having relatively little effect on the short-circuit current and open-circuit voltage. Fixed shading causes power losses over a long period with a regular temporal pattern, whereas random shading causes short, transient power losses. To verify the reliability of the simulation device, we measured the I-V curves under different fault conditions, as shown in Figure 6. Aging degradation 1, 2, and 3 correspond to the PV array with series resistors of 12 Ω, 24 Ω, and 36 Ω, respectively. As the resistance increases and the aging deepens, the peak power of the simulated PV system keeps decreasing, while the short-circuit current and open-circuit voltage remain basically unchanged. Fixed shading 1 and 2 correspond to artificially created shadows of larger and smaller area, respectively, and their curves show a clear “hump” effect. Overall, the I-V curves obtained with the simulation device under different fault conditions are basically consistent with the output characteristics of actual PV power stations, so the device is suitable for generating training and test data for classification models. However, the I-V curves for random cloud shading and for fixed shading by foreign objects have similar shapes, both showing a “hump” whose position depends on the shading area and location. It is therefore difficult to distinguish the type of shading from the shape of the I-V curve alone.
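The trend reported for aging degradation 1–3 can be reproduced qualitatively with an ideal single-diode model plus an external series resistor. This is a sketch under stated assumptions: the model itself, the hypothetical lumped diode factor `A = 0.45` V, and the reuse of the single-cell ratings from Section 3.1 as array-level values are illustrative choices, not the parameters of the paper's device.

```python
import math

# Illustrative parameters (assumptions): cell ratings from Section 3.1 reused at
# array level; A is a hypothetical lumped diode factor n * Ns * kT/q.
ISC = 0.121  # A, short-circuit current
VOC = 7.2    # V, open-circuit voltage
A = 0.45     # V
I0 = ISC / (math.exp(VOC / A) - 1.0)  # saturation current chosen so that I(VOC) = 0


def junction_current(vj):
    """Ideal single-diode current at the internal (junction) voltage vj."""
    return ISC - I0 * (math.exp(vj / A) - 1.0)


def iv_curve(r_series, steps=2000):
    """Terminal (V, I) points when an external resistor r_series is added in series."""
    points = []
    for k in range(steps + 1):
        vj = VOC * k / steps
        i = junction_current(vj)
        v = vj - i * r_series  # voltage lost across the added resistor
        if v >= 0.0:
            points.append((v, i))
    return points


def max_power(r_series):
    """Peak power over the sampled terminal I-V curve."""
    return max(v * i for v, i in iv_curve(r_series))


def short_circuit_current(r_series, iters=100):
    """Solve I = junction_current(I * r_series), i.e. terminal V = 0, by bisection."""
    lo, hi = 0.0, ISC
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if junction_current(mid * r_series) - mid > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for r in (0.0, 12.0, 24.0, 36.0):  # resistor values used for aging degradation 1-3
    print(f"R = {r:4.0f} ohm  Pmax = {max_power(r):.3f} W  Isc = {short_circuit_current(r):.4f} A")
```

In this model the open-circuit voltage is untouched because no current flows through the resistor at open circuit, and the short-circuit current barely moves because the resistor's voltage drop at short circuit leaves the diode almost off; only the knee of the curve, and hence the peak power, is pulled down.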

3.3. Sample Data Set Acquisition

Using the experimental device described above, different lamp combinations were used to simulate seven time points between 9:00 and 15:00. Key data, including the output voltage, output current, cell temperature, and irradiance of the PV array, were collected. The test was repeated under various fault states of the PV array, and a time series matrix was constructed according to Equations (4)–(7) as input data for the classification model. Table 1 lists the classification labels and sample information for the different fault types: D0 denotes normal operation, D1 fixed shading, D2 random shading, and D3 aging degradation; the “classification” column gives the fault class, “feature” the working state, “quantity” the number of samples, and “proportion” the share of that class in the total number of samples. The power-time curves of the PV array under the different fault states are shown in Figure 7. Comparison with Figure 6 shows that, although the PV array may exhibit the same or similar electrical characteristics (such as the output power at the maximum power point) under different fault conditions, the differences in the temporal behavior of the output power remain apparent. Using the temporal characteristics of the output power for fault classification can therefore improve the accuracy of the diagnosis model.
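To illustrate why the temporal pattern carries information that a single reading does not, the two profiles below are invented values (not measurements from Figure 7): a fixed shading that lasts from 11:00 to 13:00 and a random shading that causes a brief dip at 12:00. Both give exactly the same power at 12:00, yet after Equation (6)-style normalization their seven-point series are far apart under an RBF kernel with an assumed width σ = 1.

```python
import math
import statistics

HOURS = [9, 10, 11, 12, 13, 14, 15]  # seven sampling points, as in the text

# Hypothetical converted-power profiles (W); values are illustrative only.
fixed_shading = [0.35, 0.50, 0.40, 0.40, 0.40, 0.50, 0.35]   # loss lasts 11:00-13:00
random_shading = [0.35, 0.50, 0.62, 0.40, 0.62, 0.50, 0.35]  # brief dip at 12:00 only


def zscore(series):
    """Normalize a series to zero mean and unit (population) standard deviation."""
    mu, sd = statistics.fmean(series), statistics.pstdev(series)
    return [(x - mu) / sd for x in series]


def rbf(a, b, sigma=1.0):
    """RBF kernel similarity between two equal-length series."""
    return math.exp(-sum((x - y) ** 2 for x, y in zip(a, b)) / (2.0 * sigma ** 2))


noon = HOURS.index(12)
same_snapshot = fixed_shading[noon] == random_shading[noon]
similarity = rbf(zscore(fixed_shading), zscore(random_shading))
print(same_snapshot, round(similarity, 4))
```

A classifier that sees only the 12:00 reading cannot separate these two cases, while a kernel over the whole normalized series treats them as clearly different.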

To increase the diversity of the sample data, environmental parameters such as irradiance and cell temperature were varied during the simulation of PV array faults. In addition, fixed shading was varied by shading area, random shading by shading duration, and aging degradation by the value of the series resistor. A total of 440 sets of data covering the four operating states in Table 1 were generated; 308 of them were used for model training and the remaining 132 for testing.

3.4. Analysis of Experimental Results

Using this sample set, this paper performed automatic parameter tuning of the RBF-kernel SVM classification model, searching the penalty coefficient C over 0.1–100 and the gamma parameter over 1–100 to optimize the classification of PV array faults. Figure 8 shows the classification results of the trained SVM model. The colored regions represent the four operating states of the PV array, and the colored dots represent the test data for the corresponding states: green indicates normal operation, red fixed shading, blue random shading, and orange aging degradation. The classifier in Figure 8(a) was optimized on known fault characteristics, meaning that the training data contain all the fault characteristics present in the test data. Although a few points are misclassified, the accuracy of the classifier on the 132 groups of test data reached 99.5%.
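The tuning step can be sketched as an exhaustive search over a log-spaced grid covering the quoted ranges for C and gamma. The grid sizes, helper names, and `toy_score` objective are placeholders; in practice the score would be cross-validated accuracy of an RBF SVM on the training set (scikit-learn's `GridSearchCV` with `SVC(kernel="rbf")` is one off-the-shelf option). Whether the paper's gamma is the library-style $\gamma = 1/(2\sigma^2)$ is not stated in the text.

```python
import math
from itertools import product


def log_grid(lo, hi, num):
    """num values evenly spaced on a log10 scale from lo to hi inclusive."""
    a, b = math.log10(lo), math.log10(hi)
    return [10 ** (a + k * (b - a) / (num - 1)) for k in range(num)]


def grid_search(score, c_values, gamma_values):
    """Evaluate score(C, gamma) on every grid point; return (best_score, C, gamma)."""
    best = None
    for c, g in product(c_values, gamma_values):
        s = score(c, g)
        if best is None or s > best[0]:
            best = (s, c, g)
    return best


def toy_score(c, gamma):
    """Placeholder objective peaking at C = 10, gamma = 10 (hypothetical)."""
    return -((math.log10(c) - 1.0) ** 2 + (math.log10(gamma) - 1.0) ** 2)


C_GRID = log_grid(0.1, 100.0, 7)      # search range for C quoted in the text
GAMMA_GRID = log_grid(1.0, 100.0, 5)  # search range for gamma quoted in the text
best_score, best_c, best_gamma = grid_search(toy_score, C_GRID, GAMMA_GRID)
print(best_c, best_gamma)
```

A log-spaced grid is the usual choice here because both C and gamma act multiplicatively, so equal steps in log space probe the ranges more evenly than equal linear steps would.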

In practical engineering applications, the training sample set rarely covers all fault characteristics, so classifying faults with unknown characteristics effectively is crucial [28]. Figure 8(b) shows the classification diagram of the SVM model after fault characteristics not included in the training sample set were added. With these unknown characteristics, the fixed shading region (red) becomes separated from the random shading region (blue), and the shape of the aging degradation region (orange) also changes.

The classifier was tested on 42 groups of fault data with unknown characteristics, and the results are presented in Table 2.

Table 2 shows that, across the entire randomized test data set, two samples in the fixed shading category were misclassified, one as aging degradation and the other as random shading, giving an overall accuracy of 95.2% (40 of 42). When the test data set was added to the training data set and the model was retrained, the accuracy rose to 99.5%. Including more diverse situations in the training data therefore improves classification accuracy and yields more precise fault classification of PV arrays. Even without retraining, the proposed method based on time series and SVM achieved 95.2% accuracy on test data whose characteristics were absent from the training set.

In summary, the proposed fault diagnosis method based on time series and SVM achieves high fault classification accuracy and fast convergence, and it can effectively distinguish fixed shading caused by foreign objects from random shading caused by clouds. This provides a solid basis for the assessment of PV arrays and their subsequent operation and maintenance.

4. Conclusion

This paper presents a theoretical and experimental study of the automatic classification of common PV array faults, including aging degradation, random shading, and fixed shading. The output power of a PV array varies markedly under different fault states, which provides the basis for the proposed fault diagnosis method based on support vector machines and time series. The method uses the output power at different times of the day to build a time series and establishes a fault classification model based on a one-versus-one multiclass classifier. Experiments on the PV array simulation device validate the model, with an accuracy of 99.5% for faults with known characteristics and 95.2% for faults with unknown characteristics.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the Doctoral Fund of Chengdu Technological University (grant number 2019RC012), the International Scientific and Technological Cooperation Projects of Chengdu City (grant number 2021-GH02-00087-HZ), and the school-level project of Chengdu Technological University (grant number 2020ZR014).