Modeling and Experimental Study of Soft Error Propagation Based on Cellular Automaton
To estimate the single-event effect (SEE) soft error performance of complex electronic systems, a soft error propagation model based on a cellular automaton is proposed, and an estimation methodology based on circuit partitioning and error propagation is presented. Simulations indicate that the fault grade jamming level and the coupling factors between cells are the main parameters influencing the vulnerability of the system. Accelerated radiation experiments were conducted to determine the raw soft error vulnerability of each module and the coupling factors. The results indicate that the proposed method is feasible.
Single-event effects (SEEs) are induced by the interaction of an ionizing particle with electronic components. An SEE becomes possible when the collected fraction of the charge liberated by the ionizing particle is larger than the charge stored on a sensitive node. These effects are categorized into hard errors and soft errors. Hard errors are nonrecoverable, while soft errors may be recovered by a reset or simply by rewriting the information. SEEs manifest in a great variety of ways depending on the device considered, such as single-event transients (SET), single-event upsets (SEU), and multibit upsets (MBU) [1, 2]. With very-large-scale integration (VLSI) devices now widely used in space-borne electronic systems, SEEs have become a major cause of accidental failures of space instruments. The Beijing Institute of Spacecraft System Engineering studied 272 satellite failures recorded worldwide since the 20th century and found that 40% of them were induced by SEEs [3, 4].
Three processes can be identified in the transformation of faults into functional failures. The first is the generation of SEE soft errors. The second is the propagation of an SEE soft error. The propagation of an SET pulse in a combinational circuit is affected by three masking effects: electrical masking, logical masking, and timing (latching-window) masking. Masking also occurs at higher levels; for example, Wang et al. showed that 85% of the soft errors in a processor are masked at the architectural level. The third process is the functional failure of the system, which occurs when the raw fault has propagated to the output of the system.
Many efforts have been made in recent decades to measure, model, and mitigate radiation effects, applying numerous techniques and approaching the problem at various abstraction levels. There are three main research methods: mixed-level simulations, fault injection [7–9], and accelerated radiation ground testing [10, 11].
Mixed-level simulations have been used to study the production and propagation of digital single-event transients (DSETs) in scaled silicon CMOS digital logic circuits. The system soft error rate (SER) of a chip or its components can be analyzed accurately by combining Monte Carlo device simulations with SPICE simulations. A technique for evaluating the soft error vulnerability of application-specific integrated circuit (ASIC) designs by employing circuit partitioning and fault propagation has also been presented. Probabilistic model checking, a formal verification technique, has been used to analyze avionics designs at an early stage. Although this method is new, it does not scale to the analysis of large circuits.
Fault injection and accelerated radiation ground testing are the two main methods of evaluating the SEE performance of a device. However, both are time-consuming and cannot be used in the early design phase of a system.
For modern complex electronic systems, especially those built from VLSI devices such as SRAM-based FPGAs and digital signal processors (DSPs), the above methods are not suitable for estimating the SEE soft error vulnerability of the whole system.
A cellular automaton (CA) is a discrete model studied in computability theory, mathematics, physics, complexity science, theoretical biology, and microstructure modeling. Cellular automata can simulate a variety of real-world systems, including biological and chemical ones [17, 18]. In complex electronic systems, SEE soft errors induced by heavy ions or protons can propagate and couple between subcircuits, much as states spread between neighboring cells of a CA. The study of SEE soft error propagation in complex electronic systems can therefore be formulated as a cellular automaton modeling problem.
The purpose of the present work is to extend this line of research by introducing a cellular automaton to model SEE soft error propagation. To the best of our knowledge, this is the first time this approach has been used to model a complex electronic system. It is also the first time the sensitive parameters of a cellular automaton have been acquired through accelerated radiation experiments.
The remainder of this paper is organized as follows. Section 2 introduces background on cellular automata. Section 3 presents the estimation methodology. Section 4 proposes the soft error propagation model based on a CA. Section 5 describes the accelerated radiation experiments used to acquire the parameters of the system and discusses the results.
2.1. The Theory of Cellular Automaton
A cellular automaton can be used to simulate the dynamic processes of a nonlinear system. It consists of four components: the cell, the cell space, the neighborhood cells, and the rule, as shown in Figure 1.
Cell. The cell is the basic unit of the cellular automaton, and each cell takes one of a finite number of states. The state of a cell evolves in discrete time: its state at time $t+1$ depends only on its own state at time $t$ and on the states of its neighborhood cells at time $t$. A subcircuit unit can be represented as a cell, and the state of the cell can be modeled as a fault grade.
Cell Space. The cell space is the grid on which the cells are arranged, including its geometric structure and boundary conditions. For a two-dimensional CA, the grid can be triangular, square, or hexagonal. The boundary conditions determine the neighborhoods of the cells at the edges of the cell space. A two-dimensional CA is adopted in this research because the subcircuits are laid out on a two-dimensional surface.
Neighborhood Cell. A neighborhood cell is a cell located around a center cell that can be affected by the center cell. For a two-dimensional CA with a square grid, the neighborhood may be of von Neumann type, Moore type, or expanded Moore type, with 4, 8, and 24 neighborhood cells, respectively, as shown in Figure 2.
Figure 2: Neighborhood types of a two-dimensional CA with a square grid: (a) von Neumann-type, (b) Moore-type, (c) expanded Moore-type.
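The three neighborhood types can be written as lists of (row, column) offsets. The Python sketch below is illustrative; the names `neighbors`, `MOORE`, and so on are not from the paper. It also applies a circular (toroidal) boundary by wrapping indices modulo the grid size, which is the boundary condition adopted in Section 4.1. The Moore offsets come out in row-major order: top left, top, top right, left, right, bottom left, bottom, bottom right.

```python
# Neighborhood offsets (row, col) relative to the center cell on a square grid
VON_NEUMANN = [(-1, 0), (0, -1), (0, 1), (1, 0)]
MOORE = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]
EXPANDED_MOORE = [(dr, dc) for dr in range(-2, 3) for dc in range(-2, 3) if (dr, dc) != (0, 0)]


def neighbors(i, j, rows, cols, offsets):
    """Coordinates of the neighbors of cell (i, j) under a circular boundary:
    indices wrap modulo the grid size, so the left edge meets the right edge
    and the top edge meets the bottom edge."""
    return [((i + dr) % rows, (j + dc) % cols) for dr, dc in offsets]


print(len(VON_NEUMANN), len(MOORE), len(EXPANDED_MOORE))  # 4 8 24
print(neighbors(0, 0, 5, 5, VON_NEUMANN))  # [(4, 0), (0, 4), (0, 1), (1, 0)]
```

The corner cell (0, 0) picks up neighbors from the opposite edges. This is why a circular boundary gives every cell a full neighborhood.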
Rule. A fixed rule (generally a mathematical function) determines the new state of each cell from the current state of the cell and the states of the cells in its neighborhood. It is modeled as $s_i^{t+1} = f(s_i^{t}, s_N^{t})$, where $s_N^{t}$ is the state of the neighborhood at time $t$ and $f$ is the rule function. Common rules include Pascal's triangle, Wolfram code, and the outer totalistic rule.
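To make the form $s_i^{t+1} = f(s_i^{t}, s_N^{t})$ concrete, the following sketch implements a generic outer totalistic rule. In such a rule, the new state depends on the cell's own state and only on the sum of its Moore neighbors' states. The grid uses 0/1 states and a circular boundary. The default birth/survival sets reproduce Conway's Game of Life; this is used only as a familiar test case of the rule family and is not the fault propagation rule of this paper.

```python
import numpy as np


def step_outer_totalistic(grid, birth=(3,), survive=(2, 3)):
    """One synchronous update s(t+1) = f(s(t), sum of the 8 Moore neighbors at t)
    on a 0/1 grid with a circular boundary (np.roll wraps around the edges).
    Defaults (born on 3, survive on 2 or 3) are Conway's Game of Life."""
    total = sum(
        np.roll(grid, (dr, dc), axis=(0, 1))
        for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)
    )
    born = (grid == 0) & np.isin(total, birth)
    survives = (grid == 1) & np.isin(total, survive)
    return (born | survives).astype(grid.dtype)


blinker = np.zeros((5, 5), dtype=int)
blinker[2, 1:4] = 1                     # a horizontal row of three live cells
print(step_outer_totalistic(blinker))   # becomes a vertical column of three
```

All cells are updated from the same snapshot at time $t$, which is the synchronous update assumed by the CA definition above.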
2.2. Applicability Analysis of a Cellular Automaton
In complex electronic systems, an SEE soft error induced by a heavy ion or a proton can propagate and couple between parts of the system. A complex electronic system can be divided into several subsystems or subcircuits; an SEE soft error is generated in one subcircuit and propagates to the others. SEE soft error propagation in such systems can therefore be studied by modeling the system as a cellular automaton.
As shown in Figure 3, an SEE soft error is generated and propagates in an electronic system consisting of four functional modules. In module 1, a sensitive node of a device is struck by a high-energy particle, and an SET pulse is generated in a CMOS transistor and propagates through the circuit. When the pulse is captured by a flip-flop or latch, an SEU results. The soft error generated in module 1 then propagates through module 4 and induces a functional failure of the system.
Such a system can be modeled as a two-dimensional cellular automaton in which each subcircuit is a cell and the surrounding subcircuits form its neighborhood. The fault grade is introduced to evaluate the working state of a subcircuit. It is the SEE soft error occurrence probability of the subcircuit and takes values in [0, 1]. A value of 0 represents the case where the subcircuit is normal, and a value of 1 denotes the case where the subcircuit is completely disabled.
The fault coupling factor is introduced to indicate the fault propagation probability between subcircuits $i$ and $j$. It also takes values in [0, 1]. A value of 0 means that a fault in subcircuit $i$ never propagates to subcircuit $j$, and a value of 1 means that it always propagates.
3. Proposed Estimation Methodology
The proposed methodology is used to model the propagation behavior of a soft error in modern electronic systems and to evaluate the soft error vulnerability of such a system.
Complex systems consist of multiple interconnected subsystems. A subsystem is called a function module, which is a part of the system. System partitioning can be achieved in several ways depending on the given set of criteria. Typically, this can be done based on the hierarchy of the design.
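Suppose that a system consists of N function modules,
$$S = \{M(1), M(2), \ldots, M(N)\},$$
where $M(i)$ is the $i$th function module of the system.
The raw soft error vulnerability of each module is $P(i)$, $i = 1, 2, \ldots, N$. This is the occurrence probability of output errors of the function module induced by one heavy ion. It is determined by the radiation environment and by the detailed physical technology implementation of the module. It is expressed in errors/particle:
$$P(i) = \frac{N_{\mathrm{err}}(i)}{N_{\mathrm{ion}}},$$
where $N_{\mathrm{err}}(i)$ is the number of output errors of $M(i)$ and $N_{\mathrm{ion}}$ is the number of incident particles.
$P(i)$ has the same meaning as the fault grade defined in Section 2.2.
The soft error propagation rate between modules is $C(i,j)$, $i, j = 1, 2, \ldots, N$, with $0 \le C(i,j) \le 1$. It is the probability that an output error of module $M(i)$ induces an output error of module $M(j)$:
$$C(i,j) = \Pr\{\text{output error of } M(j) \mid \text{output error of } M(i)\}.$$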
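Both quantities can be estimated from observation counts, as the following sketch shows. Here $P(i)$ is the raw vulnerability of module $M(i)$ in errors per particle, and $C(i,j)$ is the conditional probability that an output error of $M(i)$ is accompanied by an output error of $M(j)$. The per-event boolean record format is a hypothetical choice made for illustration. Note that a joint error count also includes cases where $M(j)$ was hit independently. The estimate is therefore only an approximation of the true propagation probability.

```python
def raw_vulnerability(n_errors, n_ions):
    """P(i) in errors/particle: output errors observed at M(i) per incident ion."""
    if n_ions <= 0:
        raise ValueError("n_ions must be positive")
    return n_errors / n_ions


def coupling_factor(err_i, err_j):
    """Estimate C(i, j) = Pr{output error of M(j) | output error of M(i)}.

    err_i, err_j: per-event flags (hypothetical record format) telling whether
    M(i) and M(j) showed an output error in the same observation window."""
    n_i = sum(bool(e) for e in err_i)
    if n_i == 0:
        raise ValueError("no errors observed at M(i); C(i, j) is undefined")
    n_both = sum(1 for a, b in zip(err_i, err_j) if a and b)
    return n_both / n_i


print(raw_vulnerability(12, 4000))                         # 0.003 errors/particle
print(coupling_factor([1, 1, 0, 1, 0], [1, 0, 0, 1, 1]))   # 2 of 3 M(i) errors reached M(j)
```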
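In this paper, the soft error propagation rate is called the coupling factor. It describes the interlink state between modules. Suppose that there are N function modules in the system. Based on the definition of the coupling factor $C(i,j)$, the coupling factor matrix of the system is
$$C = \begin{pmatrix} C(1,1) & C(1,2) & \cdots & C(1,N) \\ C(2,1) & C(2,2) & \cdots & C(2,N) \\ \vdots & \vdots & \ddots & \vdots \\ C(N,1) & C(N,2) & \cdots & C(N,N) \end{pmatrix}.$$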
Moreover, when , then .
The coupling factor matrix $C$ represents the probability that a soft error propagates among function modules, and it is the foundation of the soft error propagation model. Based on this methodology, the system-level soft error propagation model expresses $F(i)$, the failure probability of module $M(i)$, in terms of the raw soft error vulnerabilities and the coupling factor matrix. The coupling factor matrix differs for systems with different structures. Therefore, if the raw soft error vulnerability of each module and the coupling factor matrix can be measured, the functional failure probability of each module and of the whole system can be calculated.
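The following sketch shows one simple way to combine raw vulnerabilities $P(i)$ and a coupling factor matrix $C$ into module failure probabilities $F(i)$. It is an assumption made for illustration, not the model equation of this paper. The sketch uses a first-order linear model for modules indexed in feed-forward (acyclic) order:
$F(j) = P(j) + \sum_{i<j} F(i)\,C(i,j)$.
Treating contributions as additive is reasonable only when all probabilities are small. Diagonal entries of $C$ are not used, and module indices are 0-based in the code.

```python
import numpy as np


def failure_probabilities(P, C):
    """ASSUMED first-order model for modules indexed in feed-forward order:
    F(j) = P(j) + sum over upstream i < j of F(i) * C(i, j).
    Only a good approximation when all probabilities are small."""
    P = np.asarray(P, dtype=float)
    C = np.asarray(C, dtype=float)
    F = np.zeros_like(P)
    for j in range(len(P)):
        # errors raised in M(j) itself plus errors propagated from upstream modules
        F[j] = P[j] + sum(F[i] * C[i, j] for i in range(j))
    return F


P = [1e-3, 2e-3, 5e-4]           # hypothetical raw vulnerabilities (errors/particle)
C = [[0.0, 0.5, 0.0],            # hypothetical coupling factors: C(1,2) = 0.5
     [0.0, 0.0, 0.8],            #                                C(2,3) = 0.8
     [0.0, 0.0, 0.0]]
print(failure_probabilities(P, C))   # approximately [0.001, 0.0025, 0.0025]
```

A system with feedback between modules would need an iterative or matrix-inverse formulation instead of this single forward pass.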
4. The Soft Error Propagation Model Based on Cellular Automata
4.1. The Modeling Method for Soft Error Propagation Based on CA
Based on SEE soft error characteristics and propagation behavior, and combining them with the methodology and advantages of the cellular automaton, the soft error propagation model for complex electronic systems is described as follows. The model is established as a two-dimensional cellular automaton, as shown in Figure 4. An $m \times n$ matrix represents the cell space, where each element $c_{ij}$ ($i = 1, \ldots, m$; $j = 1, \ldots, n$) represents a cell. The neighborhood is Moore-type, and the boundary is circular. That is, the left boundary is connected to the right boundary, and the top boundary is connected to the bottom boundary.
The CA model is defined by the following quantities.
$s_{ij}(t)$ represents the output state of cell $c_{ij}$ at time $t$. A value of 1 indicates that the cell is normal, a value of −1 indicates that the cell is disabled, and a value of 0 indicates that the cell is vacant.
$p_{ij}(t)$ represents the fault grade of cell $c_{ij}$ at time $t$. The larger the value, the greater its influence on the cell and its neighborhood at the next time step.
$p_c$ is the critical value of the fault grade. When the fault grade of a cell exceeds this value, the circuit cell fails.
$P_{ij}^{N}(t)$ is the fault grade matrix of the neighborhood of cell $c_{ij}$, that is, the set of fault grades of its eight Moore-type neighborhood cells. The first neighborhood cell is in the top left corner, the third is in the top right corner, and the eighth is in the bottom right corner.
$A_{ij} = (a_{ij}^{k})$ is the neighborhood matrix of cell $c_{ij}$, which marks the connections between the cell and its neighborhood cells. If cell $c_{ij}$ is connected to its $k$th neighborhood cell, then $a_{ij}^{k} = 1$; otherwise, $a_{ij}^{k} = 0$.
$w_{ij}^{k}$ represents the coupling factor between cell $c_{ij}$ and its $k$th neighborhood cell, with $w_{ij}^{k} \in [0, 1]$.
$W_{ij} = (w_{ij}^{k})$ represents the coupling factor matrix of the neighborhood of cell $c_{ij}$.
$R_{ij}(t)$ represents the SEE soft error jamming (fault jam) acting on cell $c_{ij}$ at time $t$.
The coupling grade of a cell is introduced to evaluate the coupling performance of the circuit. It is the ratio of the number of coupling factors in $W_{ij}$ that exceed a threshold value to the number of nonvacant neighborhood cells.
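The per-cell quantities above map naturally onto arrays. The sketch below makes two illustrative assumptions. First, it gathers the neighborhood fault grades $P_{ij}^{N}$ in the stated order (1 = top left, 3 = top right, 8 = bottom right) under the circular boundary. Second, it computes the coupling grade with the denominator taken as the number of connected neighbors ($a_{ij}^{k} = 1$). The function names and the threshold parameter are hypothetical.

```python
import numpy as np

# Moore offsets in the order of Section 4.1: 1 = top left, 3 = top right, 8 = bottom right
MOORE = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def neighborhood_fault_grades(p, i, j):
    """P_ij^N: fault grades of the 8 Moore neighbors of cell (i, j), circular boundary."""
    rows, cols = p.shape
    return np.array([p[(i + dr) % rows, (j + dc) % cols] for dr, dc in MOORE])


def coupling_grade(a, w, threshold):
    """Share of connected neighbors (a_k = 1) whose coupling factor w_k exceeds
    the threshold (denominator choice is an assumption)."""
    a = np.asarray(a, dtype=bool)
    w = np.asarray(w, dtype=float)
    if not a.any():
        return 0.0
    return float(np.count_nonzero(w[a] > threshold) / np.count_nonzero(a))


p = np.arange(9, dtype=float).reshape(3, 3)
print(neighborhood_fault_grades(p, 1, 1))   # [0. 1. 2. 3. 5. 6. 7. 8.]
a = [1, 1, 0, 0, 1, 0, 0, 1]
w = [0.9, 0.2, 0.95, 0.0, 0.6, 0.0, 0.0, 0.4]
print(coupling_grade(a, w, threshold=0.5))  # 0.5: two of four connected neighbors exceed 0.5
```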
4.2. Algorithm and Simulation
In this section, the algorithm based on the soft error propagation model is presented first. It is followed by simulations of the evolution of soft error propagation.
4.2.1. The Algorithm
To simulate soft error propagation based on a two-dimensional CA, the algorithm flow shown in Figure 5 is adopted. Details are as follows.
Step 1. First, at time $t = 0$, the state $s_{ij}(0)$ of each cell $c_{ij}$ is initialized to 1 or 0, where 1 indicates that a subcircuit occupies the cell and 0 indicates that the cell is vacant. Second, the fault grade $p_{ij}(0)$ is initialized randomly. All cells start in the normal state, so the number of faulty cells is 0. Third, the fault coupling factors $w_{ij}^{k}$ are initialized. Finally, based on the neighborhood type and the boundary condition, the neighborhood matrix $A_{ij}$, the fault grade matrix $P_{ij}^{N}$, and the coupling factor matrix $W_{ij}$ of each cell are formed.
Step 2. The SEE fault jamming situation is set to $R$, and the CA is simulated according to rules (6) and (7). At a given time step, a cell $c_{ij}$ is chosen at random and subjected to the SEE fault jam $R$. The cell is then simulated using rule (8); if its fault grade exceeds the critical value $p_c$, the cell is disabled.
Step 3. At each subsequent time step $t$, rules (6) and (7) are applied, and each cell is checked in turn to determine whether its fault grade exceeds the critical value. If $p_{ij}(t) > p_c$, cell $c_{ij}$ is disabled ($s_{ij}(t) = -1$) and is counted in the number $n(t)$ of disabled cells; otherwise, its state is unchanged.
Step 4. The number of disabled cells is then checked. If $n(t) = n(t-1)$, soft error propagation has reached a steady state and the simulation ends; otherwise, the algorithm returns to Step 3.
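Steps 1 to 4 can be sketched as a small Monte Carlo simulation. Rules (6) to (8) are not reproduced in this text, so the sketch substitutes a HYPOTHETICAL propagation rule. When a cell fails, it passes $w \cdot p$ (coupling factor times its own fault grade) once to each normal Moore neighbor. A neighbor whose accumulated fault grade then exceeds $p_c$ fails at the next step. The grid size, occupancy, ranges, and critical value below are illustrative defaults, not the paper's settings. The loop stops when no new cell fails, that is, when $n(t) = n(t-1)$.

```python
import numpy as np

# Moore offsets: 1 = top left, 3 = top right, 8 = bottom right
MOORE = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def simulate(rows=10, cols=10, occupancy=0.8, p_range=(0.0, 0.3),
             w_range=(0.0, 0.4), p_c=0.5, R=100.0, rng=None):
    """Steps 1-4 with a HYPOTHETICAL rule standing in for rules (6)-(8).
    Returns n(t), the number of disabled cells after each time step."""
    rng = np.random.default_rng() if rng is None else rng
    # Step 1: occupancy (state 1 = normal subcircuit, 0 = vacant), fault grades, couplings
    occ = rng.random((rows, cols)) < occupancy
    state = occ.astype(int)
    p = np.where(occ, rng.uniform(p_range[0], p_range[1], (rows, cols)), 0.0)
    w = rng.uniform(w_range[0], w_range[1], (rows, cols, len(MOORE)))
    cells = np.argwhere(occ)
    if len(cells) == 0:
        return [0]
    # Step 2: apply the fault jam R to one randomly chosen nonvacant cell
    i, j = cells[rng.integers(len(cells))]
    p[i, j] += R
    new = [(i, j)] if p[i, j] > p_c else []
    counts = []
    while True:
        for c in new:
            state[c] = -1                                  # disabled
        counts.append(int(np.sum(state == -1)))
        if not new:                  # Step 4: no new failures, so n(t) == n(t-1)
            return counts
        # Step 3: newly failed cells pass fault grade to their normal neighbors
        received = np.zeros_like(p)
        for (a, b) in new:
            for k, (dr, dc) in enumerate(MOORE):
                na, nb = (a + dr) % rows, (b + dc) % cols  # circular boundary
                if state[na, nb] == 1:
                    received[na, nb] += w[a, b, k] * p[a, b]
        p += received
        new = [tuple(c) for c in np.argwhere((state == 1) & (p > p_c))]


def average_faulty(runs=100, seed=0, **kwargs):
    """Average final number of faulty cells over independent runs."""
    rng = np.random.default_rng(seed)
    return float(np.mean([simulate(rng=rng, **kwargs)[-1] for _ in range(runs)]))


for R in (1, 10, 100, 1000, 10000):
    print(R, average_faulty(R=float(R)))
```

Because disabled cells never recover in this sketch, $n(t)$ is nondecreasing and the loop always terminates.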
4.2.2. Results and Discussion
To evaluate the effect of SEE soft error propagation on complex electronic systems, different fault grade jamming situations and different coupling factors between cells were simulated.
(1) Different Fault Grade Jamming. This experiment evaluated the effect of different fault grade jamming levels acting on the CA. In physical complex electronic systems, fault grade jamming is associated with the SEE conditions, including the energy and flux of the incoming particles.
The CA was two-dimensional, with a Moore neighborhood and a circular boundary. During initialization, 77 nonvacant cells were generated. The fault grades and coupling factors were initialized randomly within preset ranges, with a fixed critical fault grade and a fixed average coupling factor. A fault jam $R$ was then applied to one cell, with $R$ set successively to 1, 10, 100, 1000, and 10000. For each value of $R$, the CA model was simulated 100 times, and the number of faulty cells in the whole cell space was recorded. The average over the 100 runs was taken as the estimated number of faulty cells at each time step.
The simulation was performed on the MATLAB 7.1 platform, and the results are shown in Figure 6.
For $R$ = 1, 10, 100, 1000, and 10000, the number of faulty cells reached, respectively, 1 at time 11 (the jammed cell only), 1.6 at time 13, 2.7 at time 20, 4 at time 20, and 4.2 at time 20.
In general, the number of faulty cells increased with increasing fault jamming, and the time needed to reach a steady state grew longer. In physical complex electronic systems, this means that the higher the energy and flux of the bombarding particles, the more severe the damage to the circuit.
(2) Different Coupling Factors between Cells. This experiment evaluated the effect of different coupling factors acting on the CA. In physical complex electronic systems, the coupling factor is associated with the placement and routing of a circuit.
The CA was again two-dimensional, with a Moore neighborhood and a circular boundary. During initialization, 78 nonvacant cells were generated. The fault grades and coupling factors were initialized randomly within preset ranges, with a fixed critical fault grade. A fault jam of fixed magnitude was applied to one cell. The average coupling factor took the values 0.57576, 0.70034, 0.79125, 0.86869, and 0.99307. For each value, the CA model was simulated 100 times, and the number of faulty cells in the whole cell space was recorded. The average was taken as the estimated number of faulty cells at each time step.
The simulations were performed on the MATLAB 7.1 platform, and the results are shown in Figure 7.
For average coupling factors of 0.57576, 0.70034, 0.79125, 0.86869, and 0.99307, the number of faulty cells was, respectively, 3.3 at time 20, 3.5 at time 20, 3.8 at time 20, 4.3 at time 21, and 4.5 at time 21.
In general, the number of faulty cells increased with increasing coupling factor, and the time required to reach a steady state grew longer. In physical complex electronic systems, a denser circuit has a higher fault coupling factor, and a soft error therefore causes more severe damage to the circuit.
5. Experimental Study
In this section, the method used to determine the raw soft error vulnerability of each module and the coupling factor matrix is described through a case study.
5.1. Analysis of One Case
A data reception and transmission system using a common reception and transmission protocol found in many fields was adopted for study. As shown in Figure 8, it consists of three function modules.
Receiver Module (M1). This received the data bits one by one and formed a byte from each group of 8 bits.
Condition FIFO Module (M2). This was a three-stage FIFO. It received one byte from M1. If the byte value was divisible by 4, the byte was passed to the output; otherwise, the value 0x01 was output.
Transmitter Module (M3). This received one byte of data and transformed it into data bits one by one.
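A behavioral model of the three modules is useful, for example as a software reference for the expected output stream. The following transaction-level sketch makes three assumptions that are not stated in the text: bits are assembled and serialized MSB first, a trailing partial byte is ignored, and cycle timing is not modeled, so the depth-3 FIFO only buffers and preserves order. All function names are hypothetical.

```python
from collections import deque


def receiver(bits):
    """M1: assemble serial bits into bytes, 8 bits per byte (MSB first, assumed)."""
    out = []
    for k in range(0, len(bits) - len(bits) % 8, 8):   # ignore a trailing partial byte
        byte = 0
        for b in bits[k:k + 8]:
            byte = (byte << 1) | b
        out.append(byte)
    return out


def condition_fifo(bytes_in, depth=3):
    """M2: depth-3 FIFO forwarding a byte divisible by 4, otherwise 0x01."""
    fifo, out = deque(), []
    for b in bytes_in:
        fifo.append(b if b % 4 == 0 else 0x01)
        if len(fifo) == depth:       # full: the oldest entry leaves
            out.append(fifo.popleft())
    out.extend(fifo)                 # drain the remaining entries at end of stream
    return out


def transmitter(bytes_in):
    """M3: serialize each byte into bits, MSB first (assumed)."""
    return [(b >> s) & 1 for b in bytes_in for s in range(7, -1, -1)]


def system(bits):
    return transmitter(condition_fifo(receiver(bits)))


bits = [0, 0, 0, 0, 1, 0, 0, 0,     # 0x08: divisible by 4, passed through
        0, 0, 0, 0, 0, 1, 0, 1]     # 0x05: not divisible by 4, replaced by 0x01
print(system(bits))
```

The divisibility check in M2 is a source of logical masking. Any corruption of a byte that keeps it non-divisible by 4 still yields 0x01, so the error never reaches M3.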
Based on the proposed methodology, we define the raw soft error vulnerability of each module as $P(i)$, $i = 1, 2, 3$. We define the soft error propagation rate between modules as $C(i,j)$, $i, j = 1, 2, 3$, with $0 \le C(i,j) \le 1$.
The soft error propagation model on the system level is
The functional failure of the system is
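As an illustration only, assume a first-order model in which each module's failure probability is its own raw vulnerability plus the errors propagated from upstream modules: $F(j) \approx P(j) + \sum_{i<j} F(i)\,C(i,j)$. Here $P(i)$ is the raw soft error vulnerability of $M(i)$ and $C(i,j)$ is the coupling factor from $M(i)$ to $M(j)$. This linear form is an assumption, not the system-level equation of this paper, and it holds only when all probabilities are small. For the chain M1 → M2 → M3 with no direct coupling from M1 to M3 ($C(1,3) = 0$), it expands as

$$\begin{aligned}
F(1) &\approx P(1), \\
F(2) &\approx P(2) + C(1,2)\,P(1), \\
F_{\text{sys}} = F(3) &\approx P(3) + C(2,3)\,P(2) + C(1,2)\,C(2,3)\,P(1).
\end{aligned}$$

Each module contributes one additive term, weighted by the coupling factors along its path to the system output.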
5.2. The Accelerated Ground Radiation Experiment
The accelerated ground radiation experiment was used to study the response of the system in a realistic radiation environment. It yields both the raw soft error vulnerability of each module and the coupling factors.
A test system was established as shown in Figure 9. The radiation board carried the circuit system under test. The motherboard was the main controller; it monitored the radiation board, acquired the test data, and transmitted them to the test computer. Other components included the test computer, a digital power supply, and a remote control computer. Only the remote control computer was in the testing room, and the other equipment was in the radiation room. The radiation board and motherboard were inside the vacuum chamber.
To irradiate the different modules of the circuit at the same time, four copies of the circuit were implemented, as shown in Figure 10. The tested modules, shown as shaded blocks in the diagram, were the receiver module, the condition FIFO module, and the transmitter module. One copy served as the golden reference against which the other copies were compared. The output signals of each module were monitored: fifo1_1 was the output of the receiver module, fifo1_2 was the output of the condition FIFO module, and SDO was the output of the transmitter module. The same arrangement was used for the other copies.
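Judging an irradiated copy against the golden copy amounts to comparing the monitored outputs sample by sample. The sketch below assumes that traces of fifo1_1, fifo1_2, and SDO are sampled on the same clock and aligned by index. The dictionary trace format and the function name `mismatches` are hypothetical.

```python
SIGNALS = ("fifo1_1", "fifo1_2", "SDO")   # outputs of M1, M2, M3 in each copy


def mismatches(golden, dut):
    """Per-signal sample indices where an irradiated copy differs from the golden copy."""
    return {
        s: {k for k, (g, d) in enumerate(zip(golden[s], dut[s])) if g != d}
        for s in SIGNALS
    }


golden = {"fifo1_1": [8, 5, 12], "fifo1_2": [8, 1, 12], "SDO": [0, 1, 0]}
dut = {"fifo1_1": [8, 7, 12], "fifo1_2": [8, 1, 12], "SDO": [0, 1, 0]}
print(mismatches(golden, dut))   # error seen only at fifo1_1, sample 1
```

In this made-up trace, M1 outputs 7 instead of 5. Both values are non-divisible by 4, so M2 still outputs 0x01 and the error is logically masked before it reaches SDO.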
The four copies were implemented in an Actel A54SX32A, which is an antifuse FPGA. Placement and routing of each copy used the area constraint method, so that the irradiated modules of the different copies could be placed on one side of the device.
The package surface of the A54SX32A was removed before the experiment. The silicon area of the device was and the irradiation area was . The heavy ion was Br, and detailed parameters are listed in Table 1. The heavy ions were provided by the HI-13 tandem accelerator at the China Institute of Atomic Energy in Beijing.
The actual test scene is shown in Figure 11, and the experimental flow was as follows:
(1) Establish the testing system shown in Figures 9 and 11.
(2) Adjust the heavy ion beam: the beam was at normal (vertical) incidence. The spot area of the beam was about in size. The flux of the Br ions began at /cm2/s.
(3) Start the experiment. The remote control computer was used to operate the test computer, which recorded the test results. The number of data bits was 703840, and periods of the system were covered.
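Converting beam settings into the errors/particle figure used for $P(i)$ requires the number of ions that struck the irradiated area. The sketch below computes this as flux × exposure time × area. It also computes the conventional SEE cross-section, which is errors divided by fluence. The numbers in the usage lines are hypothetical and are not the values of this experiment.

```python
def ion_count(flux, seconds, area_cm2):
    """Ions striking the irradiated area: flux (ions/cm^2/s) x time (s) x area (cm^2)."""
    return flux * seconds * area_cm2


def errors_per_particle(n_errors, flux, seconds, area_cm2):
    """Raw vulnerability in errors/particle."""
    return n_errors / ion_count(flux, seconds, area_cm2)


def cross_section(n_errors, flux, seconds):
    """SEE cross-section in cm^2: errors divided by fluence (ions/cm^2)."""
    return n_errors / (flux * seconds)


# Hypothetical numbers for illustration only
print(errors_per_particle(20, flux=1e3, seconds=100, area_cm2=0.1))  # about 2e-3
print(cross_section(20, flux=1e3, seconds=100))                      # about 2e-4 cm^2
```

The two figures differ only by the irradiated area: errors per particle equals the cross-section divided by the area.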
5.3. Results and Discussion
Based on the radiation experiment, the soft error vulnerability of each module was
These values give the average functional failure probability of each module induced by one heavy ion (Br). As shown in Table 4, the lower the ratio of combinational to sequential units in a module, the higher its soft error vulnerability.
Based on the experiment, as shown in Table 4, $C(1,2)$ is the coupling factor between modules 1 and 2, and $C(2,3)$ is that between modules 2 and 3. Radiation testing is only an approximate way to determine the coupling factors. It is used because it is almost impossible to change the values of all internal nodes of a circuit in a real physical system. The soft error vulnerability of the whole circuit is
This represents the output error probability of the circuit induced by one heavy ion (Br). These results suggest that the soft error propagation model is a reliable way to evaluate the functional failure of a system. They also suggest that it is suitable for evaluating the soft error vulnerability of circuit systems in the early design phase.
To study SEE soft error propagation in complex electronic systems and extend the relevant knowledge, a soft error propagation model based on a cellular automaton was proposed. A method of estimating the soft error vulnerability of a circuit by employing circuit partitioning and error propagation was also presented.
The basic theory of cellular automata and an analysis of their applicability were presented. Based on the characteristics of soft error propagation, the cellular automaton model was proposed, and the corresponding cell, cell space, neighborhood cells, and rule were defined. The main parameters, the fault grade and the fault coupling factor, were modeled.
A complex electronic system was modeled as a two-dimensional cellular automaton, and the corresponding algorithm was presented and simulated. Different fault grade jamming situations and different coupling factors between cells were simulated. In general, the number of faulty cells increased with increasing fault jamming and coupling factors, and the time required to reach a steady state increased. The soft error vulnerability of each module and the coupling factors between modules were determined by accelerated radiation experiments. A testing system was established, the testing circuit was implemented in an FPGA, and detailed methods were developed to obtain the parameters. The experimental results indicate that the soft error vulnerability of a circuit is determined by the soft error vulnerability of each module and by the interlink state between modules. The output error probability of the circuit induced by one heavy ion was calculated. The contributions of the individual modules add up to the final output error probability of the system.
This approach highlights error propagation behavior through the internal modules of a circuit design. Future work may involve modeling other modern electronic systems to improve confidence in the generality of the results. It may also involve further research on cellular automata, especially on their update rules. The results of the simulations and experiments indicate that the method is feasible.
The authors declare that there are no competing interests regarding the publication of this paper.
This work is supported by National Natural Science Foundation of China (61171019, 61201031).
M. Nicolaidis, Soft Errors in Modern Electronic Systems, Springer, New York, NY, USA, 2011.
Z. Sen, S. Jun, and W. Jiulong, “Satellite on-board failure statistics and analysis spacecraft engineering,” Spacecraft Engineering, vol. 19, no. 4, pp. 41–46, 2010 (Chinese).
N. J. Wang, J. Quek, T. M. Rafacz, and S. J. Patel, “Characterizing the effects of transient faults on a high-performance processor pipeline,” in Proceedings of the International Conference on Dependable Systems and Networks, pp. 61–70, IEEE, July 2004.
N. A. Harward, M. R. Gardiner, L. W. Hsiao, and M. J. Wirthlin, “Estimating soft processor soft error sensitivity through fault injection,” in Proceedings of the 23rd IEEE Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM '15), pp. 143–150, IEEE, Vancouver, Canada, May 2015.
J. Chetia, An Efficient AVF Estimation Technique Using Circuit Partitioning, Graduate School of Vanderbilt University, Nashville, Tenn, USA, 2012.
S. Wolfram, “Statistical mechanics of cellular automata,” Reviews of Modern Physics, vol. 55, no. 3, pp. 601–644, 1983.