Abstract

To address fault diagnosis in photovoltaic power generation systems, this paper proposes a fault diagnosis method based on deep reinforcement learning. The method is data-driven. First, a compressed sensing algorithm is used to fill in the missing photovoltaic data. Then, the state, action, strategy, and return function are defined from the environment, the interaction rules, and other factors to establish the fault diagnosis model of the photovoltaic power generation system, and a deep neural network is used to approximate the decision network and find the optimal strategy, thereby realizing fault diagnosis of the photovoltaic power generation system. Finally, the effectiveness and accuracy of the method are verified by simulation. The simulation results show that the method can accurately diagnose the fault types of the photovoltaic power generation system, which is of great significance for enhancing the security of the photovoltaic power generation system and improving its level of intelligent operation and maintenance.

1. Introduction

With the continuous advancement of the energy transition, the proportion of clean energy in the energy supply is increasing year by year. Photovoltaic power generation technology is now relatively mature and is increasingly widely used at home and abroad. Statistics show that, by the end of 2020, the cumulative installed capacity of photovoltaic power generation in China had reached 204.3 million kW, and the total annual photovoltaic power generation had reached 224.3 billion kWh [1]. Because solar energy is intermittent, research on accurate and fast photovoltaic fault diagnosis methods is of great significance for ensuring the normal operation of photovoltaic systems and reducing the lifetime degradation and power loss caused by faults.

With the development of artificial intelligence technology, various fault diagnosis methods based on intelligent algorithms have emerged. The neural network method proposed in [2] establishes several neural network structures and, after learning, can determine whether a short-circuit fault exists. The fuzzy algorithm proposed in [3] estimates the output power under normal conditions and compares it with the real-time measured value; if the difference exceeds a set threshold, a fault is judged to exist. References [4, 5] proposed a photovoltaic array fault detection method based on pattern recognition. This method obtains appropriate fault characteristic parameters through signal decomposition and then uses a fuzzy inference system to judge whether the photovoltaic array has a fault. However, the fuzzy rules must be formulated in advance, which often depends on experience or on experts in the field, so they are difficult to obtain. Reference [6] proposed a fault diagnosis method for photovoltaic power generation systems based on a BP neural network, which has strong adaptive nonlinear pattern recognition ability and is suitable for complex multifault systems. Reference [7] proposed a multiclass support vector machine to diagnose faults between neutral lines and equipment faults of photovoltaic cells. Reference [8] proposed a graph-based semisupervised detection method for diagnosing short-circuit, open-circuit, and line-to-line faults. Reference [9] proposed a method of applying investigation to solve photovoltaic faults, but this method requires the fault data set to be obtained in advance. Reference [10] proposed a photovoltaic array fault diagnosis method based on a long short-term memory (LSTM) neural network. This method establishes an LSTM fault diagnosis model and trains it on characteristic parameters of the photovoltaic array collected under different fault conditions.

The above literature provides a good reference for the fault diagnosis research of the photovoltaic power generation system, but most of the above research depends on specific algorithm models, with low monitoring accuracy and lack of self-learning of diagnosis methods [11].

To accurately diagnose the fault types of the photovoltaic power generation system, this paper proposes a fault diagnosis method based on deep reinforcement learning. First, for the photovoltaic power generation system data gathered by the operation and maintenance platform, the compressed sensing algorithm is used to fill in the missing data. Then, the reinforcement learning algorithm is used to establish the fault diagnosis model of the photovoltaic power generation system, and a deep neural network is used to approximate the decision network and find the optimal strategy, thereby realizing fault diagnosis of the photovoltaic power generation system. Finally, the feasibility and accuracy of the proposed method are verified by simulation experiments.

2. Deep Reinforcement Learning Algorithm

2.1. Reinforcement Learning

Reinforcement learning is a branch of machine learning, which is mainly used to learn control strategies. Its learning process is similar to the process of living organisms getting along with the external environment, which is in line with human behavioral psychology. The model of reinforcement learning is shown in Figure 1. The brain represents the agent, and the Earth represents the environment. The agent continuously interacts with the environment for learning, that is, the process of reinforcement learning [12].

As can be seen from Figure 1, the interaction between the agent and the environment will produce a time series composed of state, action, and return. Based on the premise of time series and certainty, reinforcement learning can be regarded as a Markov decision-making process.

A Markov decision process is usually defined by a five-tuple $(S, A, P, r, \gamma)$:
(1) $S$ represents the state space, which is the external environment that the agent can perceive.
(2) $A$ represents the action space from which the agent can choose. In each state, the agent selects an action according to the strategy and feeds it back to the environment.
(3) $P$ represents the state transition probability of the environment; see formula (1). $P(s_{t+1} \mid s_t, a)$ represents the probability that the environment reaches state $s_{t+1}$ after the agent decides to take action $a$ in state $s_t$. State $s_{t+1}$ depends only on $s_t$ and action $a$ and is independent of all states before time $t$.
(4) $r(s_t, a_t)$ is the return the agent receives for carrying out action $a_t$ when it is in state $s_t$.
(5) $\gamma$ represents the discount factor, $\gamma \in [0, 1]$; it is the parameter that determines the weight of each return.

$$P(s' \mid s, a) = \Pr\{s_{t+1} = s' \mid s_t = s,\ a_t = a\} \quad (1)$$

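In reinforcement learning, two important value functions are defined to describe the value of a state and the value of an action taken in a state, respectively: the state value function $V^{\pi}(s)$ and the action value function $Q^{\pi}(s, a)$ under strategy $\pi$, as shown in the following equations:

$$V^{\pi}(s) = \mathbb{E}_{\pi}\Big[\sum_{k=0}^{\infty} \gamma^{k}\, r(s_{t+k}, a_{t+k}) \,\Big|\, s_t = s\Big]$$

$$Q^{\pi}(s, a) = \mathbb{E}_{\pi}\Big[\sum_{k=0}^{\infty} \gamma^{k}\, r(s_{t+k}, a_{t+k}) \,\Big|\, s_t = s,\ a_t = a\Big]$$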

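Through the interactive process of reinforcement learning, the goal is to find the optimal strategy $\pi^{*}$ that maximizes the return of the agent, where $V^{\pi}(s)$ is the state value function under strategy $\pi$:

$$\pi^{*} = \arg\max_{\pi} V^{\pi}(s)$$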

The state value function is a recursive expression that satisfies the Bellman equation, so it can be solved iteratively. When the transition probabilities between states are known, value iteration is adopted: the state value function is updated iteratively, the strategy is adjusted according to its value, and the result converges to the optimal state value function.
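As an illustration of value iteration with known transition probabilities, the following minimal sketch runs the iteration on a toy two-state problem. The array layout (`P[a, s, s2]`, `R[s, a]`), the tolerance, and the toy problem itself are assumptions made for this example and are not part of the diagnosis model.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-8, max_iter=10_000):
    """P[a, s, s2]: transition probabilities; R[s, a]: immediate return."""
    V = np.zeros(R.shape[0])
    for _ in range(max_iter):
        # Bellman optimality backup: Q[s, a] = R[s, a] + gamma * sum_s2 P[a, s, s2] * V[s2]
        Q = R + gamma * np.einsum("ast,t->sa", P, V)
        V_new = Q.max(axis=1)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    # Greedy strategy with respect to the final value estimate
    Q = R + gamma * np.einsum("ast,t->sa", P, V)
    return V, Q.argmax(axis=1)

# Toy problem (hypothetical): action 0 stays, action 1 switches state;
# staying in state 1 yields a return of 1, everything else yields 0.
P = np.array([[[1.0, 0.0], [0.0, 1.0]],
              [[0.0, 1.0], [1.0, 0.0]]])
R = np.array([[0.0, 0.0],
              [1.0, 0.0]])
V_opt, policy = value_iteration(P, R, gamma=0.9)
```

In this toy problem the iteration converges to values of about 9 and 10 for the two states, and the greedy strategy switches out of state 0 and stays in state 1.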

The core of the Q-learning algorithm is to compute the maximum action value function: the estimate is updated as a weighted average of its past value and the most recent sample, and the resulting optimal action value function yields the optimal state value function and hence the optimal learning strategy [13, 14]. With learning rate $\alpha$, the update is shown in the following equation:

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha\big[r(s_t, a_t) + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t)\big]$$
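The Q-learning update can be sketched in tabular form as follows. The table shape, the learning rate `alpha`, and the `done` flag are assumptions of this example; the paper itself approximates the Q function with a neural network rather than a table.

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9, done=False):
    """One tabular Q-learning step on the table Q[state, action]."""
    # Bootstrapped target: immediate return plus discounted best next value
    target = r if done else r + gamma * np.max(Q[s_next])
    # Weighted average of the old estimate and the new target
    Q[s, a] += alpha * (target - Q[s, a])
    return Q

Q = np.zeros((2, 2))
Q = q_learning_update(Q, s=0, a=1, r=1.0, s_next=1)
```

Starting from an all-zero table, a single update with return 1.0 moves `Q[0, 1]` a fraction `alpha` of the way toward the target.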

2.2. DQN Algorithm

Introducing a deep neural network into the Q-learning algorithm gives the DQN (deep Q-network) algorithm [15]. Neural network training requires labeled samples, whereas reinforcement learning has no direct labels. Therefore, the target Q value is used as the training label, and the purpose of training is to bring the Q value close to the target Q value. The target Q value is calculated by the formula in step 11 of Algorithm 1, and its difference from the output of the current network is then computed. The parameters of the neural network are updated by gradient descent with backpropagation until the Q network converges [16].

The implementation of the DQN algorithm involves the experience replay mechanism; that is, the information of each interaction is stored. During training, samples are drawn at random from the experience pool, which keeps the training samples approximately independent and identically distributed and reduces the correlation between them [17].
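A minimal sketch of such an experience pool is given below. The class name `ReplayBuffer`, the fixed capacity, and the tuple layout `(state, action, reward, next_state, done)` are assumptions for illustration; the paper does not specify these details.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size experience pool; the oldest interactions are discarded first."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement breaks the temporal ordering
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)

buf = ReplayBuffer(capacity=3)
for i in range(5):
    buf.store(i, 0, 1.0, i + 1, False)
batch = buf.sample(2)
```

With a capacity of 3 and five stored interactions, only the last three remain available for sampling.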

In the DQN algorithm, the Q-learning algorithm and deep learning network are trained at the same time. A large number of training samples are obtained through Q-learning, and then the neural network is trained. The key lies in the label (i.e., target Q value).

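Algorithm 1: DQN algorithm.
(1) Randomly initialize all parameters $\theta$ of the Q network and the corresponding values $Q$
(2) Clear the experience replay set $D$
(3) for episode = 1, M do
(4) Initialize state $s_1$, then get the feature vector $\phi(s_1)$
(5) for t = 1, T do
(6) Use ε-greedy to select action $a_t$
(7) Execute action $a_t$ to get return $r_t$ and next state $s_{t+1}$
(8) Set $s_{t+1}$ and get $\phi(s_{t+1})$
(9) Store $(\phi(s_t), a_t, r_t, \phi(s_{t+1}))$ in the experience pool $D$
(10) Randomly sample $(\phi(s_j), a_j, r_j, \phi(s_{j+1}))$ from the experience pool $D$
(11) Update: $y_j = r_j$ if the episode terminates at step $j+1$; otherwise $y_j = r_j + \gamma \max_{a'} Q(\phi(s_{j+1}), a'; \theta)$
(12) Perform a gradient descent step on $(y_j - Q(\phi(s_j), a_j; \theta))^2$
(13) End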
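The target and loss computations in the update and gradient steps of the algorithm can be sketched numerically for a small batch as follows. This is a sketch of the standard DQN target and squared error, not the authors' implementation; the function names, the batch layout, and the example numbers are assumptions.

```python
import numpy as np

def dqn_targets(rewards, next_q_values, dones, gamma=0.99):
    """Target Q values: r for terminal transitions, r + gamma * max Q(s', a') otherwise."""
    rewards = np.asarray(rewards, dtype=float)
    next_q_values = np.asarray(next_q_values, dtype=float)
    dones = np.asarray(dones, dtype=bool)
    bootstrap = gamma * next_q_values.max(axis=1)
    return rewards + np.where(dones, 0.0, bootstrap)

def td_loss(q_values, actions, targets):
    """Mean squared error between targets and the Q values of the actions taken."""
    q_values = np.asarray(q_values, dtype=float)
    chosen = q_values[np.arange(len(actions)), actions]
    return float(np.mean((targets - chosen) ** 2))

# Hypothetical batch of two transitions, the second one terminal
targets = dqn_targets(rewards=[1.0, -1.0],
                      next_q_values=[[0.5, 2.0], [3.0, 1.0]],
                      dones=[False, True],
                      gamma=0.5)
loss = td_loss(q_values=[[1.0, 0.0], [0.0, 0.0]], actions=[0, 1], targets=targets)
```

In a full implementation the gradient of this loss with respect to the network parameters would be taken by a deep learning framework; only the arithmetic is shown here.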

3. Photovoltaic Data Filling Based on Compressed Sensing Algorithm

3.1. Compressed Sensing Algorithm

Compressed sensing is a technique that compresses a signal at a very high compression rate and reconstructs the compressed signal after transmission [18]. It relieves an asymmetry in signal acquisition, transmission, and processing. Signals are generally acquired by sensor devices, which usually have limited storage and endurance and are poorly suited to complex processing such as collecting a large amount of data and then compressing it. After transmission, the computer or other device with strong computing power only needs to perform simple decompression; this asymmetry places great pressure on the sensor acquisition equipment [19]. Compressed sensing completes the compression during sampling, so the sensor only needs to collect a small amount of data, and the computationally heavy reconstruction is left to the computer. For this reason, compressed sensing is widely used in signal processing and related fields.

The compressed sensing algorithm comprises three steps. The first is the sparse representation of the signal. For a given signal, a group of orthogonal transformation bases $\Psi$ is selected to sparsely decompose the signal and obtain a sparse signal $S$. The second is the design of the observation matrix used to observe the signal. The observation matrix is required to be uncorrelated with the sparse orthogonal transformation basis [20, 21], and an observation matrix $\Phi$ of size $M \times N$ is selected. The sparse representation $S$ of the original signal is projected onto an $M$-dimensional reduced vector $Y = \Phi S$, where $M < N$. The third is signal reconstruction. The process of signal reconstruction is the process of finding the optimal solution under constraints. The reconstruction algorithm is equivalent to the following mathematical programming problem.

Objective function: $\min \|S\|_{0}$. Constraints: $Y = \Phi S$.

The flow chart of the compressed sensing algorithm is shown in Figure 2.

Compressed sensing samples the data incompletely, at a rate below that required by the Nyquist sampling theorem, and then reconstructs the original signal, which closely resembles the partial loss of photovoltaic monitoring data. Therefore, the compressed sensing algorithm is used to reconstruct missing photovoltaic data. First, in the photovoltaic monitoring signal, the same physical quantity is sampled in adjacent periods, and the change between two consecutive sampling values is small and smooth, so after sparse transformation the signal has the characteristics of a sparse signal. Second, the observation matrix is designed according to the locations of the missing data, so that it has little correlation with the sparse representation basis. Finally, the signal is reconstructed with the orthogonal matching pursuit algorithm, which can reconstruct the signal with high quality, and the reconstructed signal is used for filling.

3.2. Photovoltaic Data Filling

The photovoltaic monitoring system is an important part of the photovoltaic power generation system and can collect a large amount of data. Extracting and analyzing the collected data yields much valuable information, which helps improve the power generation efficiency of the photovoltaic power generation system and the operation and maintenance of the power station. In practice, however, some of the collected data are missing for various reasons (such as transmission faults and sensor faults), and these missing data may greatly affect the later analysis and mining of photovoltaic data and the fault diagnosis of the photovoltaic power generation system. In serious cases, they may cause the fault diagnosis model of the photovoltaic power generation system to fail outright. In contrast to filling methods based on statistical and intelligent algorithms, this paper uses a compressed sensing algorithm to fill in the missing photovoltaic data. The process is as follows:
(1) Suppose a monitoring signal sampled at a certain time contains some missing data. The missing entries are filled with zeros to obtain a zero-filled signal.
(2) The photovoltaic monitoring data are processed in matrix form, and the resulting signal is sparsely represented by the discrete cosine transform.
(3) Design the observation matrix. Deleting from the identity matrix the rows that correspond to the missing data in the signal gives the observation matrix. The observation is obtained by applying the observation matrix to the sparsely represented signal.
(4) Reconstruct the signal. The reconstruction is equivalent to a mathematical programming problem: find the sparsest coefficient vector that satisfies the observation constraint. The problem is solved by the orthogonal matching pursuit algorithm.
(5) The filling values of the missing data are obtained by the inverse discrete cosine transform of the reconstructed signal.
(6) Calculate the mean square error.
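The filling steps above can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the authors' code: the function names, the orthonormal DCT-II basis built by hand, the fixed `sparsity` stopping rule for orthogonal matching pursuit, and the synthetic test signal are all assumptions.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix C, so that s = C @ x and x = C.T @ s."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    C[0, :] /= np.sqrt(2.0)
    return C

def omp(A, y, sparsity, tol=1e-10):
    """Orthogonal matching pursuit: greedy sparse solution of y = A @ s."""
    norms = np.linalg.norm(A, axis=0)
    residual = y.copy()
    support, sol = [], np.zeros(0)
    for _ in range(sparsity):
        corr = np.abs(A.T @ residual) / norms
        if support:
            corr[support] = 0.0          # never pick the same atom twice
        support.append(int(np.argmax(corr)))
        # Least-squares fit on the atoms selected so far
        sol, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ sol
        if np.linalg.norm(residual) < tol:
            break
    coef = np.zeros(A.shape[1])
    coef[support] = sol
    return coef

def cs_fill(signal, missing_idx, sparsity):
    """Fill the entries at missing_idx using a sparse DCT reconstruction."""
    signal = np.asarray(signal, dtype=float)
    n = signal.size
    observed = np.setdiff1d(np.arange(n), missing_idx)
    Phi = np.eye(n)[observed]            # identity with missing rows deleted
    Psi = dct_matrix(n).T                # inverse DCT: x = Psi @ s
    y = Phi @ signal                     # observed samples only
    x_hat = Psi @ omp(Phi @ Psi, y, sparsity)
    filled = signal.copy()
    filled[missing_idx] = x_hat[missing_idx]
    return filled

# Synthetic smooth signal (hypothetical): two low-frequency DCT components
n = 64
s_true = np.zeros(n)
s_true[2], s_true[5] = 3.0, 1.5
x_true = dct_matrix(n).T @ s_true
missing = [5, 17, 30, 48]
corrupted = x_true.copy()
corrupted[missing] = 0.0                 # zero-fill the missing samples
filled = cs_fill(corrupted, missing, sparsity=2)
mse = float(np.mean((filled - x_true) ** 2))
```

Real monitoring data are only approximately sparse, so the sparsity level (or a residual threshold) would need to be tuned, and the mean square error would be nonzero.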

4. Fault Diagnosis of Photovoltaic Power Generation System Based on DQN Algorithm

4.1. Diagnostic Model

The fault diagnosis model of the photovoltaic power generation system is established based on the DQN algorithm. Figure 3 is the schematic diagram of fault diagnosis of photovoltaic power generation system based on DQN algorithm. The modeling process is as follows.

4.1.1. Diagnostic Tasks and Interaction Rules

The diagnosis task is constructed as a continuous decision-making process of the agent: the agent successively diagnoses the fault of each training sample in the environment, uses the reward to guide the agent to carry out training and learning, and gives the corresponding reward according to a certain reward principle. The training goal is to maximize the cumulative return of agents in diagnostic tasks.

In the fault diagnosis task of the photovoltaic power generation system, to guide the agent to learn the fault diagnosis strategy, the interaction rules between the agent and the environment are formulated: a corresponding return is determined according to the distribution of each category. The principle is that if the agent correctly diagnoses the fault type of a sample, it receives a positive return, and if the agent misdiagnoses the fault, it receives a negative return; that is, the amount is deducted from the reward.

In reinforcement learning, the agent interacts with the environment continuously, and each interaction is recorded completely and stored in the experience pool. Subsequent learning continuously samples from the experience pool for training. Each training process starts from the first sample and ends when a sample of the most common fault type is diagnosed incorrectly. This process is called an episode.

4.1.2. Simulation Environment

The environmental state is an important element of the reinforcement learning model. Because fault diagnosis of the photovoltaic power generation system depends mainly on the data at a given time, the set of photovoltaic monitoring data collected at each time is regarded as one state.

4.1.3. Action Space

The action space of the agent corresponds to the label of the sample (i.e., fault type). There are as many actions as there are fault types for the agent to select during fault diagnosis. Here, the fault types are numbered with Arabic numerals.

4.1.4. Return Function

In the training process, the value of the agent's action is evaluated by the return function. If the fault distribution were balanced, all samples would be treated equally. However, faults of photovoltaic power generation systems are unevenly distributed, and the fault distribution of equipment also differs across regions and manufacturers. To better guide learning and training, the return after each fault diagnosis is given according to the actual distribution of the various faults in the power plant. If the agent correctly diagnoses a fault type with many samples, it receives a relatively small positive return. If the agent correctly diagnoses a fault type with few samples, it receives a relatively large positive return. Conversely, if the agent misdiagnoses a fault type with few samples, it receives a corresponding negative return. If the agent misdiagnoses a common fault type with many samples, this indicates that the agent has not learned any experience or knowledge, so there is no need to continue, and the current round of diagnosis is terminated immediately.

In the fault diagnosis of the photovoltaic power generation system, it is assumed that there are n fault types with labels k. For each label k, the training sample set of that fault type and its number of training samples are used to define the imbalance proportion of category k, as shown in formula (7). All training samples of the most unbalanced categories are collected into one set, and the return function is given by formula (8). When the agent classifies a sample in this set incorrectly, the current classification task is terminated.
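The sketch below illustrates one way a distribution-aware return of this kind could be computed. It does not reproduce formulas (7) and (8): the ratio definition (smallest class count divided by the class count), the symmetric positive and negative returns, and the choice to terminate on an error for the most frequent label are assumptions chosen to match the qualitative rules described in this section.

```python
def imbalance_ratios(class_counts):
    """Assumed ratio: smallest class count divided by each class count."""
    n_min = min(class_counts.values())
    return {label: n_min / count for label, count in class_counts.items()}

def step_return(true_label, predicted_label, ratios, majority_label):
    """Return (reward, episode_terminated) for one diagnosis."""
    magnitude = ratios[true_label]       # rare classes weigh more
    if predicted_label == true_label:
        return magnitude, False
    # A mistake on the most frequent class ends the current episode
    return -magnitude, true_label == majority_label

# Sample counts per label as described for the original data set
counts = {0: 30_000, 1: 4_000, 2: 4_000, 3: 4_000, 4: 4_000, 5: 4_000}
ratios = imbalance_ratios(counts)
majority = max(counts, key=counts.get)

r_rare_ok, end_rare_ok = step_return(1, 1, ratios, majority)
r_common_bad, end_common_bad = step_return(0, 2, ratios, majority)
r_rare_bad, end_rare_bad = step_return(3, 0, ratios, majority)
```

Under these assumptions a correct diagnosis of a rare fault earns the full return of 1, while the frequent normal state contributes only about 0.13 per sample.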

4.1.5. Classification Task Termination Condition

For fault diagnosis of the photovoltaic power generation system, when the agent misdiagnoses a sample of the fault type with the largest number of samples, the episode ends and the agent's score for the episode is cleared. If this does not occur and the agent completes the fault diagnosis of all samples, the agent's cumulative return is reset and a new round of tasks starts.

4.1.6. Strategy

In the training stage, to enable the agent to fully learn knowledge and experience, it focuses first on exploration and then on exploitation, so a linearly annealed ε-greedy strategy is used [22]. The test phase mainly checks what the agent has learned and therefore focuses on exploitation. Accordingly, the greedy strategy is used; that is, the action with the largest Q value is selected every time:

$$a = \arg\max_{a'} Q(s, a')$$
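The two strategies can be sketched as follows. The starting value of 1 for ε follows the parameter setting given later in the paper; the final value `eps_end`, the annealing length `anneal_steps`, and the function names are assumptions of this example.

```python
import numpy as np

def linear_epsilon(step, eps_start=1.0, eps_end=0.01, anneal_steps=10_000):
    """Linearly anneal epsilon from eps_start to eps_end, then hold it."""
    frac = min(max(step, 0) / anneal_steps, 1.0)
    return eps_start + frac * (eps_end - eps_start)

def select_action(q_values, epsilon, rng):
    """Epsilon-greedy: explore with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

rng = np.random.default_rng(0)
eps_first = linear_epsilon(0)
eps_half = linear_epsilon(5_000)
eps_last = linear_epsilon(20_000)
# Test phase: epsilon = 0 reduces to the purely greedy strategy
greedy_action = select_action([0.1, 0.9, 0.3], epsilon=0.0, rng=rng)
```

Setting ε to 0 in the test phase recovers the greedy strategy, so the same selection function serves both phases.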

4.1.7. Training Objectives

Deep reinforcement learning is applied to the fault diagnosis of photovoltaic power generation systems. A large number of training samples are learned through a data-driven method, and the ultimate goal is to correctly diagnose the fault types.

Because the DQN algorithm uses an experience replay mechanism, the photovoltaic power generation system fault diagnosis model based on deep reinforcement learning must also train on samples through this mechanism: the information of each interaction is stored in the experience pool and then randomly sampled to train the Q network. The specific process is shown in Figure 3. When the deep neural network is used to fit the Q function, the actual Q value of the state s is the output value of the current Q network, and the target Q value, recorded as y, is determined by the progress of the classification task, as shown in equation (8).

Taking the target Q value as the label for deep neural network training, the loss function of Q network training is defined as shown in equation (11). According to equation (11), the parameters of the neural network are updated by gradient descent with backpropagation until convergence, and the Q function is obtained.

4.2. Evaluation Index

In this experiment, the fault distribution of the photovoltaic power generation system is first counted from the historical photovoltaic monitoring data. The monitoring data are then sampled in certain proportions to simulate the fault distributions of other photovoltaic power generation systems. This verifies the effectiveness of the model and explores how the fault distribution influences the photovoltaic power generation system fault diagnosis model based on deep reinforcement learning.

In this paper, is taken as the evaluation index of fault diagnosis [23]. Two categories ci and cj are selected from the fault types of photovoltaic power generation system to calculate the index of the two fault diagnosis results, and then, all values are weighted and summed [24, 25]. The calculation formulas are shown in equations (13) and (14), respectively.

In equation (13), TP refers to the number of correct diagnoses of most samples, TN refers to the number of correct diagnoses of a few samples, FP refers to the number of diagnostic errors of most samples, and FN refers to the number of diagnostic errors of a few samples.

At the same time, accuracy and index are used as evaluation indexes.

5. Example Analysis

5.1. Data and Parameter Design

Based on the historical data of a photovoltaic power station, fifty thousand groups of daytime operation monitoring data are selected and recorded as the PV monitoring data set. The data set is shown in Table 1. The collected monitoring information mainly includes meteorological environment information, photovoltaic array information, combiner box information, photovoltaic inverter DC-side and AC-side information, and grid connection information. From this information, the quantities relevant to fault diagnosis are selected as the input for photovoltaic power generation system fault diagnosis.

This paper mainly focuses on the five common fault types in Table 2. There are 6 operating states, including 5 fault states and one normal operating state. Each group of monitoring data has only one operation state. There are thirty thousand groups of normal operation state data, four thousand groups of data for fault 1, four thousand groups of data for fault 2, four thousand groups of data for fault 3, four thousand groups of data for fault 4, and four thousand groups of data for fault 5.

Label the six operating states, respectively, normal operation (label 0), fault 1 (label 1), fault 2 (label 2), fault 3 (label 3), fault 4 (label 4), and fault 5 (label 5).

The simulation experiment in this paper is based on the photovoltaic power station data set. To simulate the fault distributions of different photovoltaic power generation systems and study the impact of the distribution of fault samples on the experimental results, samples labeled 0–5 are drawn from the photovoltaic monitoring data set in different ways, as follows:
(1) The distribution of various faults in the original sample is shown in Figure 4, and this data set is recorded as DS0.
(2) 4000 samples are taken from label 0, and all samples of the other labels are taken. The fault distribution data of the photovoltaic power generation system obtained after sampling is shown in Figure 5. The number of samples of each label is then equal and the distribution is balanced. This data set is recorded as DS1.
(3) Labels 1, 3, and 5 are sampled at 50%. The fault distribution data of the photovoltaic power generation system obtained after sampling is shown in Figure 6. This data set is recorded as DS2.
(4) Labels 0, 2, and 4 are sampled at 50%. The fault distribution data of the photovoltaic power generation system obtained after sampling is shown in Figure 7. This data set is recorded as DS3.
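The construction of the four data sets can be sketched as per-label subsampling. The helper `subsample_labels`, the random seed, and the assumption that DS2 and DS3 are both drawn from the original data set DS0 are illustrative choices, since the text does not state them explicitly.

```python
import numpy as np

def subsample_labels(labels, keep_fraction=None, keep_count=None, seed=0):
    """Return sorted sample indices after subsampling selected labels."""
    rng = np.random.default_rng(seed)
    keep_fraction = keep_fraction or {}
    keep_count = keep_count or {}
    kept = []
    for label in np.unique(labels):
        label = int(label)
        idx = np.flatnonzero(labels == label)
        default_n = int(round(len(idx) * keep_fraction.get(label, 1.0)))
        n = keep_count.get(label, default_n)
        kept.append(rng.choice(idx, size=n, replace=False))
    return np.sort(np.concatenate(kept))

# DS0: 30,000 normal samples (label 0) and 4,000 samples per fault label
labels = np.repeat(np.arange(6), [30_000, 4_000, 4_000, 4_000, 4_000, 4_000])

ds1 = subsample_labels(labels, keep_count={0: 4_000})
ds2 = subsample_labels(labels, keep_fraction={1: 0.5, 3: 0.5, 5: 0.5})
ds3 = subsample_labels(labels, keep_fraction={0: 0.5, 2: 0.5, 4: 0.5})

counts_ds1 = np.bincount(labels[ds1]).tolist()
counts_ds2 = np.bincount(labels[ds2]).tolist()
counts_ds3 = np.bincount(labels[ds3]).tolist()
```

Under these assumptions DS1 contains 4,000 samples per label, DS2 halves the odd-numbered fault labels, and DS3 halves label 0 and the even-numbered fault labels.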

Due to the complexity of the photovoltaic power generation system, many related physical quantities must be monitored, and their units differ. During data analysis, the differences in magnitude caused by the different units of the physical quantities may affect the analysis. Therefore, these data need to be made dimensionless (normalized) before data analysis.
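One common way to remove the effect of units is min-max scaling of each monitored quantity to the range [0, 1], sketched below. The paper does not state which normalization it uses, so the choice of min-max scaling, the function name, and the example values are assumptions.

```python
import numpy as np

def min_max_scale(X, eps=1e-12):
    """Scale each column (physical quantity) of X to the range [0, 1]."""
    X = np.asarray(X, dtype=float)
    lo = X.min(axis=0)
    hi = X.max(axis=0)
    # eps guards against division by zero for constant columns
    return (X - lo) / np.maximum(hi - lo, eps)

# Hypothetical readings: column 0 in one unit, column 1 in a much larger unit
X = [[0.0, 100.0],
     [5.0, 300.0],
     [10.0, 200.0]]
X_scaled = min_max_scale(X)
```

In practice the minimum and maximum would be computed on the training data only and reused for the test data.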

For the PV monitoring data set, a convolutional neural network (CNN) is first used to extract features from the normalized signals. Four convolution layers with a convolution kernel size of three are used, and the fault of the photovoltaic power generation system is then diagnosed through a fully connected neural network. Because the Q network in deep reinforcement learning fits Q values as a regression model, no activation function is applied at the output layer, and the fully connected result is used directly as the output of the neural network.

In training, the number of input neurons of the neural network equals the number of system state variables, and the number of output neurons equals the number of fault types. All hidden neurons use the ReLU activation function, and the loss function is the mean square error. The Adam optimizer is used for model training. The network learning rate is 0.00025, and the discount factor for the return is 0.99. When using the DQN algorithm with the linear annealing strategy, ε starts at 1.

The imbalance rate and return function of the four data sets are calculated for use in the experiment. The details are as follows:
(1) The imbalance rate and return function of the original sample DS0 data set are shown in Table 3.
(2) For the DS1 data set, with four thousand samples taken from label 0 and all samples of the other labels, the imbalance rate and return function are shown in Table 4.
(3) For the DS2 data set, with labels 1, 3, and 5 sampled at 50%, the imbalance rate and return function are shown in Table 5.
(4) For the DS3 data set, with labels 0, 2, and 4 sampled at 50%, the imbalance rate and return function are shown in Table 6.

5.2. Result Analysis

The four processed experimental data sets DS0, DS1, DS2, and DS3 have different imbalance rates, so their return functions in training also differ. Among them, DS1 is a balanced data set in which every class has the same number of samples, and the other data sets can be compared against it.

Table 7 shows the fault diagnosis accuracy of various data sets under the DQN algorithm. Table 8 shows the evaluation indexes of fault diagnosis under the DQN algorithm for different data sets.

It can be seen from Tables 7 and 8 that fault diagnosis of the photovoltaic power generation system based on the deep reinforcement learning algorithm performs well on four data sets with different distributions. This shows that it is feasible to introduce the deep reinforcement learning algorithm into the fault diagnosis of the photovoltaic power generation system and indicates that the method can be applied to different photovoltaic power generation systems in different regions.

In addition, it can be seen from Tables 7 and 8 that the number of samples of each fault type of data set DS1 is the same, the distribution between samples is balanced, and the return function of each fault type is the same during training. Therefore, the fault diagnosis accuracy of the data set reaches 96.6% in the DQN algorithm. Through the comparative experiment of different distributed data sets under the same algorithm, it can be seen that the actual effect of photovoltaic power generation system fault diagnosis is related to the distribution of various faults and the balance rate between faults. For the fault diagnosis of different photovoltaic power generation systems, the return function should be designed according to the distribution of various faults of photovoltaic power generation systems.

To verify the accuracy of the fault diagnosis method proposed in this paper, the model is compared in simulation with cascade random forest [26] and BP neural network [27]. One thousand groups of labeled data are used as training samples, and seven hundred and fifty groups of data are randomly selected as test samples. Fault samples account for 20% of the test samples, with each of fault 1 to fault 5 accounting for 4%; that is, 30 samples are randomly selected for each fault type for the simulation experiments.

It can be seen from Table 9 that the accuracy of the DQN algorithm is 96.6%, that of the cascade random forest model is 89.63%, and that of the BP neural network model is 88.00%. Therefore, under the same sample size, the DQN algorithm has higher accuracy than cascade random forest and BP neural network.

6. Conclusion

Based on the operation and maintenance data of a photovoltaic power station, in order to realize the accurate fault diagnosis of a photovoltaic power generation system, a data-driven photovoltaic power generation system fault diagnosis method based on deep reinforcement learning is proposed. It is verified and analyzed by simulation, and the following conclusions are drawn:
(1) Through the simulation of different distributed data sets under the DQN algorithm, it is concluded that the actual effect of fault diagnosis of photovoltaic power generation system is related to the distribution of various faults and the balance rate between faults.
(2) The accuracy of the photovoltaic power generation system fault diagnosis model based on deep reinforcement learning is 96.60%. Under the same sample size, the proposed method can effectively judge the fault type of the photovoltaic power generation system and has higher accuracy than other diagnostic methods.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This paper is one of the phased achievements of State Grid Gansu Electric Power Company’s science and technology project “Research on distributed photovoltaic power station monitoring and forecasting technology based on ubiquitous power Internet of things holographic sensing” (522722190002) and Tianyou innovation team of Lanzhou Jiaotong University (TY202009).