Abstract

This paper presents a fault diagnosis application of the Latent Nestling Method to IGBTs. The paper extends the Latent Nestling Method, based on Coloured Petri Nets (CPNs), to hybrid systems so that IGBT performance can be modeled. CPNs offer greater capability for synthesis and modeling than Finite State Machine methods, which suffer from the classical combinatorial state explosion. We present an IGBT model with different fault modes, including intermittent ones that can be used as predictive symptoms within a predictive maintenance strategy. Ageing stress tests have been applied experimentally to the IGBT modules, and intermittent faults are diagnosed as precursors of permanent failures. In addition, ageing is validated with morphological analysis (Scanning Electron Microscopy) and semiquantitative analysis (Energy Dispersive Spectrometry).

1. Introduction

Nowadays, digital electronic applications [1], power electronics [2], and even printed circuit boards (PCBs) [3] use intermittent fault (IF) diagnosis techniques to analyze faults caused by corrosion, contamination, overtemperature, overloads, electrochemical migration, and manufacturing defects. IF diagnosis allows preventive maintenance routines to be used instead of corrective maintenance, which increases system reliability.

Power semiconductor devices (PSDs), and particularly IGBTs, are fundamental in many industrial systems. Some of the most important IGBT applications include lighting controls, power supplies, computer systems, industrial control devices, voltage converters [4], motors, and electric generators [5–7]. Recent studies on IGBT diagnosis focus on optimizing their properties as power inverters [8] and as switches [9], and on aging [10], thermal fatigue [11], and manufacturing defects [12].

IF diagnosis in PSDs can be applied to predict the onset of permanent failures. Moreover, it can be used to detect persistent IF episodes that degrade the operation of the system and can themselves be considered a failure. IF diagnosis applied to PSDs under stress tests predicts the wearing-out of the component and can be contrasted with the aging-related damage or morphological changes in the physical structure of the component, which allows the proposed diagnosis model to be validated. IF diagnosis therefore allows the wear-out phase in the hazard rate curve of electronic devices to be estimated and can be applied in preventive maintenance procedures [13].

Low power equipment is subjected to lower levels of energy during operation and is discontinued before reaching the wearing-out stage. On the other hand, high power electronics equipment (IGBTs) is subjected to higher energy levels and faces wearing-out due to aging. The main objective of the paper is to show the relevance of the Latent Nestling Method (LNM) for detecting IFs in IGBTs.

Different methods have been proposed to diagnose semiconductor faults [14]. In [15] a study to characterize IGBT behavior under stress conditions using a SPICE model was introduced. The authors developed an IGBT test circuit and tested it in two conditions: normal operation and under stress. It is important to note that this diagnosis does not allow predictive maintenance tasks.

A new method for IGBT fault detection based on gate voltage monitoring is discussed in [16]. That study takes into account only the degradation due to overcurrent or overtemperature. Its analysis is taken into account in the development of our prototype test.

Another interesting work [17] shows different methods for aging analysis, such as thermal cycling (TC), hot carrier injection under electrical stress, and time-dependent dielectric breakdown. Two of these techniques are applied in our work as accelerated test methods.

The LNM was introduced by García et al. (2008) for fault diagnosis in complex, large scale systems. LNM relies on CPNs as the design platform and on a method for nesting faulty marks in every place of the net. The formalization and methodology, as well as some examples of the LNM, can be seen in [18–21].

The LNM was developed to handle complex discrete event systems, but many systems can be better modeled with hybrid models. This paper extends the LNM to hybrid systems so that it can be applied to diagnose them.

Numerous studies have been carried out to explain hybrid process fault diagnosis using different methodologies [22–24]. New techniques need to be developed for the diagnosis of IFs, like the residual analysis proposed in our method.

Furthermore, some researchers [25] analyzed fault models in hybrid PNs. Other authors propose an approximation of differential places to represent continuous places with negative markings (differential PNs [26]) in each place of latent nesting faults in order to avoid unobservable transitions and allow faulty tokens of discrete type to be nested in places of continuous nature. These approaches offer advantages in solving hybrid systems of increasing complexity and in finding the failure time of each faulty token using the stay time.

IF diagnosis is carried out based on the work in [27], where the authors present a prognosis method to diagnose IFs and predict the lifetime of electromechanical devices.

The paper is structured as follows. Section 2 introduces LNM for hybrid systems. It also includes a simple example to show its performance. Section 3 shows the IF diagnosis modeling based on LNM applied to IGBTs. Section 4 explains the test bench, the analysis, and experimental results. Finally, Section 5 draws some relevant conclusions.

2. Latent Nestling Method in Hybrid Systems

2.1. LNM Definition in Hybrid Systems

LNM is a methodology for fault diagnosis of discrete event complex systems (see [18, 20, 21]). Because this paper introduces a hybrid model for the IGBTs (presented in Section 3.2) we present an update of LNM to handle hybrid systems.

The diagnoser will be a hybrid model of the system including normal and faulty behavior of each device in the system. In order to avoid the combinatorial explosion [19], the model is built using hybrid coloured PNs.

A hybrid CPN for fault diagnosis (HCPNFD) is defined aswhere is a finite set of places, is a finite set of transitions and Pre and Post are the input and output arc functions, with an additional argument which is the color of the transition firing . Thus and correspond in the general case to a linear combination of token colours related to place .

These functions can be divided into two subsets, depending on the transition-type behavior, namely, normal transition or faulty transition where and are the fault and recovery transitions, respectively. is the initial marking. is the subset of fault latent nestling places, where . If includes a faulty token in . This is now called . is the subset of fault verification places.

The places set and transitions set can be divided into two subsets

is the set of discrete places and is the set of continuous places. is the discrete transitions set and is the continuous transitions set.

will represent discrete states of a device, such as the device being on or off, starting or stopping, and so forth.

will represent the continuous states of a device so it computes a differential equation model.

will represent a discrete state change.

will represent step execution of the model contained in a .

In addition, the normal behavior marks can have discrete or continuous nature:

will represent a normal behavior token of a device and its evolution through the diagnoser will show the device state.

is the colour set assigned to different identifiers. , where is the subset of coloured tokens representing the fault set.

Initial marking for a place in (called ) will be , and the initial marking for a place in (called ) will be . or , or . stands for the rational numbers (positives or zero). Then, for ,

Let be the input arc function corresponding to subset . Consider

Let be the output arc function corresponding to subset . In and case, the number of arc functions corresponding to subset of each depends on the continuous places mutually influenced, such that is the initial continuous place influenced and is the last continuous place influenced.

This represents a continuously variable behavior and also allows the nesting of discrete type faults.

: is a composite function that is defined for every place of the net.

: it is the hybrid states set in the analyzed system. This set is composed of the operating states OS, fault signatures , and recovery signatures .

: it is a delay function that associates a rational number to each timed transition, where if for a function , is a delay associated with the transition , expressed in time units, if for a function , , such that represents the maximum firing speed associated with the transition and is the firing frequency that represents the sampling time. The method for delay fixing or fixing the frequency firing depends on the system behaviour.

Definition 1. A normal discrete transition in a HCPNFD is enabled at a marking if each place in meets the condition:

Definition 2. A normal continuous transition in a HCPNFD is enabled at a marking if each place in meets the condition:where is the set of the input places of discrete and is the set of the input places of continuous . Likewise, it is necessary to meet the condition and , . .
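
As an informal illustration of Definitions 1 and 2, the following Python sketch checks enabling in a toy hybrid net. It assumes the textbook hybrid Petri net rule (a discrete input place needs at least as many tokens as the arc weight, and a continuous input place needs a strictly positive marking) and it ignores token colours, so it should not be read as the exact HCPNFD condition. The class name, place names, and markings are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class HybridNet:
    """Toy hybrid Petri net: integer markings for discrete places, real markings for continuous ones."""
    discrete_marking: dict = field(default_factory=dict)    # place name -> int tokens
    continuous_marking: dict = field(default_factory=dict)  # place name -> float >= 0

    def discrete_enabled(self, pre: dict) -> bool:
        """pre maps each discrete input place to its arc weight (assumed rule)."""
        return all(self.discrete_marking.get(p, 0) >= w for p, w in pre.items())

    def continuous_enabled(self, pre_discrete: dict, pre_continuous: list) -> bool:
        """Discrete inputs need enough tokens; continuous inputs need a positive marking."""
        discrete_ok = all(self.discrete_marking.get(p, 0) >= w
                          for p, w in pre_discrete.items())
        continuous_ok = all(self.continuous_marking.get(p, 0.0) > 0.0
                            for p in pre_continuous)
        return discrete_ok and continuous_ok


# Hypothetical marking: the device is "on" and one continuous place holds a voltage.
net = HybridNet(discrete_marking={"on": 1, "off": 0},
                continuous_marking={"v_ge": 7.0})
```

With this marking, a continuous transition whose discrete input is `on` is enabled, while one whose discrete input is `off` is not.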

2.2. Initial Model and Fault Selection

The initial model is similar to that presented in the LNM. However, it includes continuous places in which the continuous behavior of the system variables can be modeled. This step applies the modeling techniques of hybrid PNs [25].

According to [19], the sensors map is defined as sm: , where SR is the sensor readings, such that for marking the expression is given by , . In the discrete case is the set of sensor read output values for each discrete marking, such thatwhere , are subsets of expected and unexpected values accordingly.

2.3. Latent Nestling Places and Trajectories of Fault Verification and Fault Recovery

Latent nestling places are defined by the LNM. However, in a hybrid system, there is a continuous place which represents an operating state during a certain time according to the states of the discrete places. Faults are assigned to this continuous place, such that . This implies that the faults have been generated by the anomalous behavior of the continuous variable, where the faults are nesting in the same continuous place now called owing to this hybrid character.

The trajectories of the faulty tokens are defined only by the fault and recovery transitions. The normal discrete and continuous transitions are defined by a classical method for modeling Hybrid PNs [25], as well as the firing rules for these transitions. Furthermore, fault and recovery transitions must be added to make restrictions that allow including both the place status as tokens of normal behavior.

Definition 3. A fault or recovery transition in a CPNFD or HCPNFD is enabled at a marking for discrete places if each place or in meets the condition: for fault transitions for recovery transitions Let be the fault marking obtained after firing of transition with respect to the fault signature . This fault marking is deducted from the marking by the following relation.
For fault trajectory,For recovery trajectory, is the last , is the initial continuous place influenced, and is the last continuous place influenced.
In the example case of Figure 3 we have for fault verification

And for fault recovery,

To find the residues, it is necessary to obtain the operation dynamic model of the continuous variables. Depending on the complexity, the models could be represented in state variables, as in the hybrid PN analysis [26]. In this case, the approach presented in our example introduces a series of residues of the form in every continuous place. The residue is computed in the continuous place, while the residual evaluation is checked in each fault and recovery transition.

The definitions on states of hybrid operation, fault signatures, and diagnosability can be seen in [21].

3. IFs Diagnosis Using the LNM Based on HCPNFD

3.1. Temporal Modeling of IFs

The main purpose of diagnosing IFs is to generate tools for the preventive maintenance of devices in industrial systems. Data obtained online must be used to determine the best time to replace or repair a component. The basic idea is to employ prediction methods based on process fault information. This information indicates the deterioration that the component is suffering.

From this method we get two measures based on [27]: temporal failure density and pseudo period. Temporal failure density (DF, or density in the rest of the paper) is defined as the average time a particular fault is active within a sliding time window of duration . DF computed at time for failure is defined as

where CNT is the number of faults inside the window, stands for the index of the first fault detected inside the window , and if it exists; otherwise   and takes into account the duration of a failure that occurred before which continues active inside the window. Therefore,

Equation (17) is valid only if   is positive; otherwise , as this fact would indicate that the th failure time is completely outside of the window. In a real system, DF tends to increase with time, thus confirming the hypothesis that IFs progressively damage the faulty device. In our case we only apply this measure DF with the LNM.
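
The density measure described above can be sketched in Python as follows. This is only an illustration under stated assumptions: it takes DF to be the total time a fault is active inside the sliding window divided by the window duration, and it clips each fault episode to the window so that an episode that began before the window contributes only its part inside it, while an episode completely outside contributes nothing. The function name and the interval representation are hypothetical.

```python
def temporal_failure_density(intervals, now, window):
    """Fraction of the window [now - window, now] during which the fault was active.

    intervals: list of (start, end) fault episodes; end is None while the fault is still active.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    window_start = now - window
    active = 0.0
    for start, end in intervals:
        # An unrecovered fault is treated as active up to the current time.
        end = now if end is None else min(end, now)
        overlap = end - max(start, window_start)
        if overlap > 0:  # episodes entirely outside the window add nothing
            active += overlap
    return active / window


# Hypothetical episodes: one before the window, one inside it, one still active.
episodes = [(0.0, 2.0), (8.0, 11.0), (14.0, None)]
df_value = temporal_failure_density(episodes, now=15.0, window=10.0)
```

In this example the window covers times 5 to 15, so only the second episode (3 time units) and the active part of the third (1 time unit) are counted.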

3.2. Initial Hybrid Model

For this case we focus on a nonlinear model that represents the turn-on and turn-off switching waveforms and yields the and values that the IGBT must have. References that model different aspects of IGBTs and MOSFETs, including the turn-on and turn-off waveforms, can be found in [28]. For each state (turn-on, turn-off) there are equations that define its operation.

For the turn-on these equations are as follows.

The increasing time constant from to is limited by

The decreasing time constant from to is limited bywhere is the voltage when it reaches the maximum collector current and is the voltage across the gate to the emitter of the transistor during conduction. The increasing time constant from to is limited by

The reverse transfer capacitance or is approximately equal to because the emitter is connected to ground. Then we will use in our final model.

Based on the equivalent circuit of the IGBT gate, the gate current is deduced by

Note that is directly affected by which causes a large change in gate voltage.

For the turn-off the equations are as follows.

falls from injected to with a time constant given by (21). At this time, there is no change in the values of or .

Then increases in this region, and the rate can be controlled with as shown in the equation below:

Then the value of is maintained at , while decreases at a rate defined by the following equation. The rate of increase can also be controlled with where is the input capacitance measured between the gate and emitter terminals with the collector shorted to the emitter for AC signals, . The value of these fixed capacitances can be found in the data sheet of the manufacturer.
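
To give a feel for the kind of gate dynamics discussed in this section, the sketch below uses a simplified first-order RC charging model of the gate. This is an assumption made for illustration (a textbook simplification) and not necessarily the exact expressions of the model used here. It also assumes the usual datasheet relation in which the input capacitance is the sum of the gate-emitter and gate-collector capacitances. All function names and numeric values are hypothetical.

```python
import math


def input_capacitance(c_ge, c_gc):
    """Assumed datasheet relation: C_ies = C_GE + C_GC (collector AC-shorted to emitter)."""
    return c_ge + c_gc


def gate_voltage(t, v_drive, r_gate, c_in):
    """First-order charging from 0 V: V_GE(t) = V_drive * (1 - exp(-t / (R_G * C_in)))."""
    tau = r_gate * c_in
    return v_drive * (1.0 - math.exp(-t / tau))


def time_to_reach(v_target, v_drive, r_gate, c_in):
    """Time for the gate to charge from 0 V to v_target (needs 0 <= v_target < v_drive)."""
    if not 0 <= v_target < v_drive:
        raise ValueError("v_target must lie in [0, v_drive)")
    return -r_gate * c_in * math.log(1.0 - v_target / v_drive)


# Hypothetical values: 0.8 nF + 0.2 nF gate capacitances, 10 ohm gate resistor, 15 V drive.
c_ies = input_capacitance(0.8e-9, 0.2e-9)
t_threshold = time_to_reach(5.0, 15.0, 10.0, c_ies)
```

A larger gate resistance gives a larger time constant and therefore slower voltage transitions, which is the qualitative behaviour the section relies on.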

3.2.1. Hybrid Model Using Hybrid PNs

The hybrid model is implemented following the scheme of Figure 1. Continuous places and represent the ideal behavior of voltages and , respectively. The continuous place represents the load voltage as a function of the collector current. Transition represents the activation of the IGBT (turn-on) and transition represents the switching off of the IGBT (turn-off).

Anytime during IGBT switching this model represents voltages and . This allows us to detect any small changes in these voltages during the stress tests. Depending on the experimental condition a complete cycle lasts from to 100 ms as shown in the Results section.

As there are two continuous places, the model has two different residues that verify the same fault. It is important to nest in every place the same fault but with a different designation. Therefore we nested faults as if the fault is from the and if the fault comes from the , likewise for fault .

We consider two types of IGBT faults. The first fault is the device in open circuit. When there is a difference between and , such that remains in a positive value, it is considered that the system is in a fault mode because the IGBT does not respond to the control signal for some reason. This fault mode can be caused by two conditions: command level design or an internal failure of the component (intermittent fault). This fault is called . When there is a difference between signals and , such that remains in a negative value, it is considered that the system is in a fault mode because the IGBT does not respond to the control signal for some reason. This fault mode can be caused by the same two previously defined conditions. This fault is called .

The residues were analyzed using a nonlinear model based on HCPNFD.

Observing the model in Figure 2, the residues may be obtained using the sensor readings with the values of continuous places. In this model is the forward transconductance.

Therefore, if the fault comes from the place the faulty mark nested is and if the fault comes from the place the faulty mark nested is , similarly to the faulty mark . The proposed HCPNFD model has been verified by a reasonably good agreement with measurements. Figure 2 shows the resulting waveforms of the turn-on and turn-off.

In this case the turn-on starts with high, zero or negative and constant gate charging current producing a linear increase of the gate voltage. With falling collector-emitter voltage the gate bias current is utilized for changing the charge of () and the gate voltage remains constant. When the collector-emitter voltage has come down becomes larger as much that also at reduced slope of still all the bias supplied gate current is used up. Only when finally the current needed for charging becomes smaller than the bias supplied current the gate voltage rises again.

The turn-off starts with low, positive or greater than the threshold voltage . The gate voltage first decreases nearly linearly. With still low collector-emitter voltage and with only moderate increase there is the strongest change (decrease) of . Decrease of a capacitance at constant charge increases the voltage. As there is a bias source which is drawing current out of the gate, the gate-emitter voltage remains constant. Subsequently increases and most of the gate discharge current is used up for . The gate voltage further remains constant. The charge over process is finished when roughly reaches the operating voltage. Now a further decrease of the gate voltage is possible again.

In this case, a residue signal is obtained that would be expressed as

where is the real reading and is the estimated reading. is the IGBT analyzed, and is the obtained residue number for this IGBT. In the case of the four IGBTs of our test, we obtain the following two residues for each IGBT:
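
A residue check of this kind can be sketched in Python as follows. The sketch assumes that the residue is the real reading minus the estimated reading, and that a fault is flagged only when the residue stays beyond a threshold, with the same sign, for a minimum number of consecutive samples. Following the fault description in this section, a persistently positive residue is mapped to the first fault type and a persistently negative one to the second. The labels `f1` and `f2`, the threshold, and the persistence length are hypothetical.

```python
def residue(measured, estimated):
    """Residue as real reading minus estimated reading (assumed sign convention)."""
    return measured - estimated


def classify_residues(residues, threshold, min_samples):
    """Return 'f1' for a persistent positive residue, 'f2' for a persistent negative one, else None."""
    run_sign, run_len = 0, 0
    for r in residues:
        sign = 1 if r > threshold else -1 if r < -threshold else 0
        if sign != 0 and sign == run_sign:
            run_len += 1                      # the same kind of deviation continues
        else:
            run_sign = sign                   # a new run starts (or the deviation ended)
            run_len = 1 if sign != 0 else 0
        if run_len >= min_samples:
            return "f1" if run_sign > 0 else "f2"
    return None


# Hypothetical readings: the measured voltage departs from the 7 V estimate and stays high.
measured = [7.0, 7.1, 9.0, 9.2, 9.1]
estimated = [7.0] * len(measured)
residues = [residue(m, e) for m, e in zip(measured, estimated)]
fault = classify_residues(residues, threshold=0.5, min_samples=3)
```

Requiring several consecutive samples is one simple way to avoid flagging isolated noise spikes; the real evaluation is performed in the fault and recovery transitions of the net.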

Also, the implemented IF diagnosis needs the computing of DF with (17) and (18). So, some parameters must be computed:
(i) a counter CNT in the place for each type of fault;
(ii) a timer associated with each faulty mark, which is reset each time the fault is recovered;
(iii) a timer associated with the place.

These parameters allow obtaining for each fault the temporal density and analyze the prediction of change for each IGBT. Therefore for each fault

—timed place is , where is the duration of the sliding window.

The counters for each fault are given by = number of times the fault type was isolated in place in a window of duration ; = number of times the fault type was isolated in place in a window of duration ; = number of times the fault type was isolated in place in a window of duration ; = number of times the fault type was isolated in place in a window of duration ; = residence time of the fault in place; = residence time of the fault in place; = residence time of the fault in place; = residence time of the fault in place.

An example of IFs can be seen in Figure 3 with a window of units. There are two iterations of failure and recovery of type . From left to right you can see the iteration number, the fired transition, the fault counter for that specific fault, time on the window, the timer of the fault, and finally the vector that stores timer information every time that a fault occurs.
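
The bookkeeping described above (a fault counter, a timer per faulty mark that is reset on recovery, and a vector that stores the timer values) can be sketched as a small Python class. This is an illustrative sketch only: the class and method names are hypothetical, and counting every episode that overlaps the window is an assumption about how the counter is maintained.

```python
from dataclasses import dataclass, field


@dataclass
class FaultTracker:
    """Tracks fault and recovery events of one fault type for a sliding window."""
    window: float
    episodes: list = field(default_factory=list)  # (start, end); end is None while active

    def fault(self, t):
        """Called when the fault transition fires at time t."""
        if self.episodes and self.episodes[-1][1] is None:
            return  # the fault is already active
        self.episodes.append((t, None))

    def recover(self, t):
        """Called when the recovery transition fires at time t."""
        if self.episodes and self.episodes[-1][1] is None:
            start, _ = self.episodes[-1]
            self.episodes[-1] = (start, t)

    def count(self, now):
        """Counter: episodes that overlap the window [now - window, now]."""
        window_start = now - self.window
        return sum(1 for s, e in self.episodes
                   if s <= now and (e is None or e > window_start))

    def timer(self, now):
        """Time the current fault has been active; zero once it is recovered."""
        if self.episodes and self.episodes[-1][1] is None:
            return now - self.episodes[-1][0]
        return 0.0

    def durations(self):
        """Stored timer values of all recovered episodes."""
        return [e - s for s, e in self.episodes if e is not None]


# Two fault and recovery iterations, followed by a third fault that is still active.
tracker = FaultTracker(window=10.0)
tracker.fault(1.0)
tracker.recover(3.0)
tracker.fault(6.0)
tracker.recover(7.0)
count_after_two = tracker.count(now=8.0)
tracker.fault(9.0)
```

The stored durations are the values needed to compute the temporal density of each fault over the window.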

Based on the analysis of continuous places of Section 2.3 and in Figure 2, we observe that and places are of isolated type; therefore

, where , respectively, to , which are the discrete places that influence the behavior of continuous place . Consider the following.

, respectively, to , which are the discrete places that influence the behavior of continuous place . According to    for continuous isolated places, we obtain , and each state is equal to each fault signature ; therefore,

Applying (13) to fault trajectory, :

:

Applying (16) to fault recovery,

4. Analysis and Experimental Test Results

4.1. Hardware Implementation

The test bench is based on two test circuits. The first circuit is a direct operational model of activation with a resistive load. The second circuit has a driver that protects and regulates the current in the gate of the IGBT to avoid losses and high currents that lead to high temperatures and can damage the IGBT. This circuit also has a resistive load. The main components of the test bench are the IGBTs (commercial reference IRG4BC30KDPBF), an HCPL-316J driver for each IGBT, four driver modules (one for each IGBT), four thin-film PT100 sensors (2 mm × 10 mm), four variable resistive loads of 10 Ω and 25 W (one for each IGBT), and a ceramic heater (10 cm × 10 cm) with a range of 5°C to 540°C. Figure 4 shows a complete scheme of the assembly created for the test bench.

The basic driver circuit was based on the circuit presented in [15] that allows stress aging tests using thermal cycling (TC) and hot carrier injection (HCI).

The TC is strongly associated with failure by degradation and removal of welding. The HCI is another form of accelerated aging. This aging mechanism can be performed by applying high voltages at the gate of the IGBT or can also be produced by magnetic fields.

This circuit uses a  V, ,  Ω,  A. The second circuit works as inverter in industrial installations for motor control or power generating systems. This driver circuit allows precision in the control signal at the IGBT gate. For aging tests using the technique of thermal cycling, it is necessary to limit the current out of the driver; therefore,

The maximum current driver is  A, the maximum switching voltage is  V, and  V according to the manufacturer’s data sheet. Using the low voltage output maximum  V (manufacturer’s data sheet), it has a  Ω. modifies the voltage slope in the and . If is of greater value the transition in and is slower. Therefore we have to employ small values for . The maximum switching frequency is determined bywhere

Likewise, total dissipated power is given bywhere is the maximum input power dissipated, limited by  mW. is the maximum output power dissipated, limited by  mW.

Consider where , , and , are given by the manufacturer of the driver selected as our circuit. is the maximum switching frequency of the driver and is the power dissipated in a resistive load switching defined by

Knowing that , take a  Ω for some aging tests such as electrical overstress (EO) and TC methods; then  A. With these data we obtain μJ. Finally, solving (34) and comparing with the maximum input and output values of power dissipated: 5 V · 16.5 mA = 82.5 mW < 150 mW, and  mA · μJ · 5 kHz = 86.25 mW < 600 mW.
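
The comparison against the driver limits can be reproduced with a few lines of Python. This is a minimal sketch: the 150 mW and 600 mW limits and the 82.5 mW and 86.25 mW results are the figures quoted in the text, while the 5 V supply at 16.5 mA is an assumption used here to reproduce the 82.5 mW input figure. The function and variable names are hypothetical.

```python
DRIVER_LIMITS_MW = {"input": 150.0, "output": 600.0}  # limits quoted in the text


def input_power_mw(supply_v, supply_ma):
    """Input-side power: volts times milliamps gives milliwatts."""
    return supply_v * supply_ma


def within_limit(power_mw, limit_mw):
    """True when the dissipated power stays strictly below the limit."""
    return power_mw < limit_mw


p_in = input_power_mw(5.0, 16.5)  # assumed 5 V supply at 16.5 mA
p_out = 86.25                     # output-side figure quoted in the text, in mW

report = {
    "input": within_limit(p_in, DRIVER_LIMITS_MW["input"]),
    "output": within_limit(p_out, DRIVER_LIMITS_MW["output"]),
}
```

Both checks pass with a wide margin, which is consistent with the statement that the maximum power dissipation is not exceeded.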

Figure 5 shows the driver circuit for each IGBT.

In this case the maximum power dissipation is not exceeded, even in the most demanding tests performed on our test bench.

Figure 6 shows the task execution blocks, interconnected to the data acquisition card and the test bench. Figure 7 shows the graphical user interface for fault diagnosis in the IGBT test bench. Each red number designates an information or task panel:
(1) start/finish test;
(2) test mode, switching frequency, and gate voltage;
(3) input voltage signal;
(4) measured gate voltage;
(5) collector current (by shunt effect);
(6) switching counter;
(7) temperature display;
(8) temperature zoom in;
(9) collector current standard deviation;
(10) type 1 and 2 faults switching counter.

Figure 8 shows, in blue, the short circuit failure of IGBT 22, with a load short circuit fault current of 1.4 A.

4.2. Results

For the fault-free IGBT, the first thing we obtain is the performance curve versus for several new IGBTs. Curve is commonly presented to show the performance of IGBTs (IGBT-IRG4BC30KDPBF).

Figure 9 shows in graph (a) the behavior of the current versus the collector voltage for a 7 V fixed value of the gate voltage. It can be seen in IGBTs 15 to 19 that the inclination angle of the curve remains almost constant regardless of the initial resistive load (graphical detail (b)). Aging of IGBTs modifies - curves, as can be seen in Figure 10(b).

In addition, a morphological analysis and a chemical analysis of some selected samples are presented to determine the compounds of the IGBTs.

In this case the tests were performed on 4 IGBTs per sample, and the IF detection algorithm was run on each IGBT. In total, 64 IGBTs were analyzed under different types of stress. Most tests were performed by TC and by load. We selected the test that gave the best results, with the following characteristics: IGBT surface temperature of , switching frequency of 500 Hz, gate voltage of  7 V,   and load voltage of 10 V. Finally a condition monitoring by loading with .

Figure 11 shows the open circuit fault better than the other tests due to the intensity of the test. Graph (b) shows the initial faults and, from hour 16, the intermittent faults due to wearing-out of the IGBT. The abrupt fault occurs at approximately 23.5 hours. Graph (a) shows the intensity of the switching and that the last fault was detected in switching 198000.

Detail (a) of graph (a) in Figure 12 shows small short circuit IF. From fault number 20 onwards the IGBT only fails in short circuit leading to a short circuit permanent failure. Graph (b) in Figure 12 shows the initial short circuit faults and the wearing-out faults during the last 30 minutes.

Figure 9 shows the performance curve versus in order to see the aging curve. The curves represent the beginning of the normal curve, but the curve at hour 22 shows the wear on the IGBT. The detail (c) of the graph (b) shows that the state of the IGBT tends to open circuit as seen in the IFs shown in Figure 11. The end of the IGBTs life by short circuit at the 43rd hour of operation can be seen in the detail (d) of the graph (a).

To complete this IF analysis, we proceeded with the SEM/EDS analysis of the samples. This analysis corroborates the morphological and physical changes observed in the IGBT structure. The SEM analysis in Figure 13 shows a near separation of the union and a quite appreciable grain size. Although this qualitative information is not very valuable, the semiquantitative information of EDS clearly shows that the compounds of silicon, copper, and tin are increased in the union. This increase is directly related to the test type, intensity, and hours of operation. As stress by TC and by load increases, the amount of these compounds increases too. These tests were conducted at 700 μm.

SEM/EDS analysis is also applied to the gate union in the IGBTs. In this case, testing was performed at 600 μm. Figure 14 shows that the deformations are very remarkable. Analyzing the results, we observed in EDS that silicon and oxygen increase with the aggressiveness of the tests and that copper and tin decrease in the same proportion.

5. Conclusions

Typical electronic devices such as IGBTs have a type B failure characteristic with an infant mortality followed by a constant or slowly increasing failure probability; therefore, they have no identifiable wearing-out age, and an age limit is normally not applicable. The major contribution of our work is the inclusion of intermittent faults in the developed fault diagnosis model. Intermittent faults can be used as precursor symptoms of identifiable wearing-out age permanent failures in order to apply preventive or predictive maintenance in electric and electronic devices. This paper shows the validity of this kind of intermittent fault diagnosis for IGBTs.

We have used models based on the Latent Nestling Method and HCPN. The dynamics of HCPN allow for the representation of transitions between transitory faults and fault-free states including quantitative measures.

Some conclusions can be drawn from the stress tests. The IGBT condition and fault mode depend on the experimental procedure and stress level applied in the tests. Condition 1 of 10 Ω/230°C/500 Hz had hardly any effect on the aging process of the components, leading to no fault. Condition 2 of 5 Ω/250°C/500 Hz produced open circuit intermittent faults leading to short circuit permanent failures. Our aging hypothesis has been confirmed by morphological and chemical analysis (SEM/EDS) carried out on the failed IGBTs.

Nomenclature

PCB: Printed circuit board
PSD: Power semiconductor device
TC: Thermal cycling
LNM: Latent Nestling Method
:Place of latent nesting fault
:Place of fault verification
:Set of hybrid states
:Set of faults
:Set of sensor readings
:Subset of output of expected values
:Subset of output of unexpected values
:Fault marking
:Fault signature
IFs: Intermittent faults
DF: Temporal failure density
HCPNFD: Hybrid coloured Petri net for fault diagnosis
:Gate-emitter voltage
:Collector emitter voltage
:Reverse transfer capacitance
:Input capacitance between gate and emitter
CNT: Faults counter
:Timer associated with the
:Timer associated with the
HCI: Hot carrier injection
:Maximum current driver
:Turn-on
:Turn-off
SEM: Scanning Electron Microscopy
EDS: Energy Dispersive Spectrometry
:Maximum switching frequency of driver
:Power dissipated in a resistive load switching
:Sliding time window
:Number of times the fault type was isolated in in a window of duration .

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work was supported by the Spanish Ministerio de Ciencia y Tecnología Project DPI2009-14744-C03-03, by Generalitat Valenciana Project GV/2010/018, and by Universitat Politècnica de València Project PAID06-08.