Department of Nuclear Engineering, Polytechnic of Milan, Via Ponzio 34/3, Milan 20133, Italy
Artificial neural networks are powerful algorithms for constructing nonlinear empirical models from operational data. Their use is becoming increasingly popular in the complex modeling tasks required by diagnostic, safety, and control applications in complex technologies such as those employed in the nuclear industry. In this paper, the nonlinear modeling capabilities of an infinite impulse response multilayer perceptron (IIR-MLP) for nuclear dynamics are considered in comparison to static modeling by a finite impulse response multilayer perceptron (FIR-MLP) and a conventional static MLP. The comparison is made with respect to the nonlinear dynamics of a nuclear reactor as investigated by IIR-MLP in a previous paper. The superior performance of the locally recurrent scheme is demonstrated.
1. Introduction
Several design and verification activities in the field of
nuclear power plant engineering rely on the simulation of the plant dynamic
response under different scenarios and conditions. However, the complexity and
nonlinearities of the involved processes are such that analytical modelling
becomes burdensome, if at all feasible.
For this reason, empirical modelling is becoming very popular
since it does not require a detailed physical understanding of the processes or
knowledge of the material properties, geometry, and other characteristics of
the plant and its components. The underlying dynamic model is identified by
fitting plant operational data with a procedure often referred to as
“learning”.
In this respect, artificial neural networks are powerful algorithms for
constructing nonlinear empirical models from operational data. In fact,
artificial neural networks are being used with increasing frequency as an
alternative to traditional models in a variety of engineering applications
including monitoring, prediction, diagnostics, control, and safety.
Whereas standard feedforward neural networks can model only static
input/output mappings [1–4], recurrent neural networks (RNNs)
have been proven to be universal approximators of nonlinear dynamic systems [5–7].
Two main methods exist for providing a neural network with dynamic
behavior: the insertion of a buffer somewhere in the network to
provide an explicit memory of the past inputs, or the implementation of
feedbacks.
As for the first method, it builds on the structure of feedforward networks where all input signals
flow in one direction, from input to output. Since a feedforward network does
not have a dynamic memory, tapped delay lines (temporal buffers) of the inputs are introduced. The buffers can be applied at
the network inputs only, keeping the network internally static as in the buffered
multilayer perceptron (MLP) [8] or at the input of each neuron as in
the MLP with finite impulse response filter synapses (FIR-MLP) [9, 10]. The main disadvantage of the
buffer approach is that the past history horizon must be kept limited in
order to keep the size of the network computationally manageable, which
prevents the modelling of arbitrarily long time dependencies between inputs and
outputs [11]. It is also difficult to choose the
length of the buffer for a given application.
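As a minimal sketch of the buffer idea, the following Python fragment turns a scalar input sequence into the windows that a buffered network would see at each time step. The function name `tapped_delay_line` and the zero padding of the first samples are assumptions made for illustration; they are not taken from [8].

```python
def tapped_delay_line(signal, n_delays, pad=0.0):
    """Return, for each time step t, the window [u(t), u(t-1), ..., u(t-n_delays)].

    Samples that would fall before the start of the sequence are replaced
    by `pad` (an illustrative choice; other initializations are possible).
    """
    windows = []
    for t in range(len(signal)):
        window = []
        for d in range(n_delays + 1):
            window.append(signal[t - d] if t - d >= 0 else pad)
        windows.append(window)
    return windows


u = [1.0, 2.0, 3.0, 4.0]
for t, window in enumerate(tapped_delay_line(u, n_delays=2)):
    print(t, window)
```

Any input older than `n_delays` steps never reaches the network, which is the horizon limitation discussed above; enlarging the buffer enlarges every window and hence the number of input weights.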
Regarding the second method, the most general example of implementation of feedbacks in a neural
network is the fully recurrent neural network constituted by a single layer of
neurons fully interconnected with each other [12] or by several
such layers [13, 14]. Because of the
large structural complexity required by this network, in recent years growing
efforts have been devoted to developing methods for implementing temporal
dynamic feedback connections into the widely used
multilayered feedforward neural networks. Recurrent
connections can be added by using two main types of recurrence or feedback: external or internal. External
recurrence is obtained, for example, by feeding back the outputs to the
input of the network, as in NARX networks [15–18]; internal recurrence is obtained by feeding back the outputs of
neurons of a given layer to inputs of neurons of the same layer, giving rise to
the so-called locally recurrent neural networks (LRNNs) [19, 20].
The major advantages of LRNNs with respect to the buffered (tapped-delay)
feedforward networks and to the fully recurrent networks are as follows [19]: (1) the hierarchic multilayer topology on which they are
based is well known and efficient; (2) the use of dynamic neurons makes it possible to
limit the number of neurons required for modelling a given dynamic system,
contrary to the tapped-delay networks; (3) the training procedures for
properly adjusting the network weights are significantly simpler and faster
than those for the fully recurrent networks.
In a previous paper [21], an infinite impulse response
locally recurrent neural network (IIR-LRNN) has been trained by a recursive backpropagation
(RBP) algorithm [19] to track the nonlinear continuous time
dynamics of a nuclear reactor [22]. In the IIR-LRNN, the synapses
are implemented as infinite impulse response (IIR) digital filters, which
provide the network with system state memory.
In this paper, the same case study is considered to show the benefits gained from the use of the
IIR-LRNN, by making comparisons similar to those in [19] with two static
networks, namely, an FIR-MLP and a conventional static MLP.
The paper is organized as follows. For
completeness and self-consistency, in Section 2, the main features of the IIR-LRNN
architecture and forward calculation are briefly summarized [19]. In Section 3, the application of the IIR-LRNN to the reactor neutron
flux dynamics made in [21] is illustrated and then compared to that of the two above-mentioned static
neural models. The conclusions drawn from this comparison are presented in Section 4.
2. Locally Recurrent Neural Networks
2.1. The IIR-LRNN Architecture and Forward Calculation
The following description is a brief synthesis
of the illustration of the IIR-LRNN given in [19, 21].
An LRNN is a
time-discrete network consisting of a global feedforward structure of nodes interconnected
by synapses which link the nodes of the kth
layer to those of the successive (k + 1)th
layer, $k = 0, 1, \ldots, M - 1$, with layer 0 being the
input and layer M being the output.
Different from the classical static feedforward networks, in an LRNN, each
synapse carries taps and feedback connections. In particular, each synapse of
an IIR-LRNN contains an IIR linear filter whose characteristic transfer
function can be expressed as the ratio of two polynomials with poles and zeros
representing the autoregressive (AR) and moving average (MA) parts of the model,
respectively.
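During the forward phase, at the generic time $t$, the
generic neuron $j = 1, 2, \ldots, N_k$ belonging to the generic layer $k = 1, 2, \ldots, M$ receives in input the quantity $y_{jl}^{(k)}(t)$ from neuron $l = 0, 1, \ldots, N_{k-1}$ of layer $k - 1$:
$$y_{jl}^{(k)}(t) = \sum_{p=0}^{L_{jl}^{(k)}-1} w_{jl(p)}^{(k)}\, x_l^{(k-1)}(t-p) + \sum_{p=1}^{I_{jl}^{(k)}} v_{jl(p)}^{(k)}\, y_{jl}^{(k)}(t-p), \tag{1}$$
where $L_{jl}^{(k)} - 1$ and $I_{jl}^{(k)}$ are the orders of the MA and AR parts of the synapse, and $w_{jl(p)}^{(k)}$ and $v_{jl(p)}^{(k)}$ are the corresponding coefficients.
The quantities $y_{jl}^{(k)}(t)$, $l = 0, 1, \ldots, N_{k-1}$, are summed to obtain the net input $s_j^{(k)}(t)$ to the nonlinear activation function $f_k(\cdot)$,
which is typically a sigmoidal Fermi function, of the jth node, $j = 1, 2, \ldots, N_k$, of the kth
layer, $k = 1, 2, \ldots, M$:
$$s_j^{(k)}(t) = \sum_{l=0}^{N_{k-1}} y_{jl}^{(k)}(t). \tag{2}$$
The output of the activation function gives the
state $x_j^{(k)}(t)$ of the jth neuron of the kth layer:
$$x_j^{(k)}(t) = f_k\left[s_j^{(k)}(t)\right]. \tag{3}$$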
For simplicity of illustration, and with no loss of generality, an
example of a network constituted by only one hidden layer (i.e., $M = 2$) is depicted in Figure 1.
Figure 1: Scheme of an IIR-LRNN with one hidden layer.
Note that if all the synapses contain only the MA part (i.e., the AR order is $I_{jl}^{(k)} = 0$ for all j, k, l), the architecture reduces to an FIR-LRNN, and if all the
synaptic filters contain no memory (i.e., the MA order is $L_{jl}^{(k)} - 1 = 0$ and the AR order is $I_{jl}^{(k)} = 0$ for all j, k, l), the classical multilayered feedforward static neural network is
obtained.
Further details about the IIR-LRNN architecture and forward
calculation may be found in [19, 21].
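The forward calculation can be illustrated with a small Python sketch of a single neuron whose synapses are IIR filters. It is an illustrative reading of the description above, not the implementation of [19, 21]: the class names `IIRSynapse` and `IIRNeuron`, the zero initial filter state, the logistic activation, and the treatment of the bias as a separate constant are all assumptions.

```python
import math


class IIRSynapse:
    """One synapse: MA taps on the incoming signal, AR taps on its own past outputs."""

    def __init__(self, ma_coeffs, ar_coeffs=()):
        if len(ma_coeffs) < 1:
            raise ValueError("at least one MA coefficient (the direct weight) is required")
        self.w = list(ma_coeffs)           # w[p] multiplies x(t - p), p = 0, 1, ...
        self.v = list(ar_coeffs)           # v[p - 1] multiplies y(t - p), p = 1, 2, ...
        self.x_hist = [0.0] * len(self.w)  # assumed zero initial state
        self.y_hist = [0.0] * len(self.v)

    def step(self, x):
        """Advance the filter by one time step and return its output y(t)."""
        self.x_hist = [x] + self.x_hist[:-1]  # newest input first
        y = sum(w * xp for w, xp in zip(self.w, self.x_hist))
        y += sum(v * yp for v, yp in zip(self.v, self.y_hist))
        if self.v:
            self.y_hist = [y] + self.y_hist[:-1]  # newest output first
        return y


def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))


class IIRNeuron:
    """Sums the synaptic filter outputs and applies the activation function."""

    def __init__(self, synapses, bias=0.0):
        self.synapses = synapses
        self.bias = bias  # stands in for the memoryless bias synapse

    def step(self, inputs):
        s = self.bias + sum(syn.step(x) for syn, x in zip(self.synapses, inputs))
        return sigmoid(s)


# Hypothetical coefficients, chosen only to exercise the code.
neuron = IIRNeuron([IIRSynapse([0.5, 0.2], [0.3])], bias=-0.1)
outputs = [neuron.step([u]) for u in [0.0, 1.0, 1.0, 1.0, 1.0]]
print([round(o, 4) for o in outputs])
```

With an empty `ar_coeffs` the synapse is a pure FIR filter, and with a single MA coefficient and no AR coefficients it degenerates to an ordinary static weight, mirroring the two special cases noted above.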
2.2. The Recursive Backpropagation (RBP) Algorithm for Batch Training
The recursive backpropagation (RBP) training algorithm [19] is a gradient-based minimization algorithm which makes use of a
particular chain rule expansion for the computation of the necessary
derivatives. When used in batch mode, it is equivalent to real-time recurrent learning
(RTRL) [23] and backpropagation through time (BPTT)
[24]. For brevity, the RBP algorithm is not presented in
this paper; the interested reader may refer to [19, 21] for details.
3. The LRNN Model for the Simulation of Neutron Flux Dynamics
In this section, we first illustrate the case study and
its development by the IIR-LRNN presented in [21] (see Sections
3.1 and 3.2); we then compare the achieved performance
with that of two static neural networks,
namely, an FIR-MLP and a conventional static MLP designed for the purpose
(see Section 3.3).
3.1. The Dynamic System
The neutron flux dynamics are described by a simple model based on a
one-group point kinetics equation with nonlinear power reactivity feedback, combined
with xenon and iodine balance equations [22]:
with the usual nuclear physics meaning of the symbols employed (see Acronyms
and Symbols).
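For orientation, a one-group point kinetics model with power feedback and xenon and iodine balances of the kind described here is commonly written as shown below. This is a sketch assembled from the quantities listed in Acronyms and Symbols, so the symbol names and the exact arrangement of the terms (in particular where the conversion factor $C$ enters) are assumptions rather than a transcription of [22].

```latex
\begin{aligned}
\Lambda \frac{d\Phi}{dt} &= \left(\rho - \frac{\sigma_{Xe}}{C\,\Sigma_f}\,Xe - \gamma\,\Phi\right)\Phi,\\
\frac{dXe}{dt} &= \gamma_{Xe}\,\Sigma_f\,\Phi + \lambda_I\,I - \lambda_{Xe}\,Xe - \sigma_{Xe}\,Xe\,\Phi,\\
\frac{dI}{dt} &= \gamma_I\,\Sigma_f\,\Phi - \lambda_I\,I.
\end{aligned}
```

Here $\Phi$ is the neutron flux, $Xe$ and $I$ are the xenon and iodine concentrations, $\rho$ is the reactivity, $\Lambda$ is the effective neutron mean generation time, $\Sigma_f$ and $\sigma_{Xe}$ are the effective fission macroscopic and xenon microscopic cross-sections, $\gamma_{Xe}$ and $\gamma_I$ are the fission yields, $\lambda_{Xe}$ and $\lambda_I$ are the decay rates, and $\gamma$ is the lumped temperature feedback coefficient. The product $\gamma\,\Phi$ and the ratio $\sigma_{Xe} Xe / \Sigma_f$ are both dimensionless with the units given in the symbol list, which is consistent with their appearing alongside $\rho$.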
The reactor evolution is assumed to start from an equilibrium state
at a nominal flux level $\Phi_0$, expressed in n/cm$^2$ s. The initial reactivity needed to keep
the steady state is $\rho_0$ = 0.071, and
the xenon and iodine concentrations are $Xe_0 = 5.73 \cdot 10^{15}$ nuclei/cm$^3$ and $I_0$ (in nuclei/cm$^3$), respectively. In what follows, the values of flux, xenon, and iodine
concentrations are normalized with respect to these steady-state values.
The objective is to design and train an LRNN to reproduce the
neutron flux dynamics described by the system of differential equations (4),
that is, to estimate the evolution of the normalized neutron flux $\Phi(t)$, knowing the forcing function $\rho(t)$.
Notice that the estimation is based only on the current values of
reactivity. These are fed in input to the locally recurrent model at each time
step t. Thanks to the MA and AR parts
of the synaptic filters, an estimate $\hat{\Phi}(t)$ of the neutron flux at time t is produced, which recurrently accounts for past values of both the network
inputs and the estimated outputs, namely,
$$\hat{\Phi}(t) = F\left(\rho(t), \Theta\right), \tag{5}$$
where $\Theta$ is the set of adjustable parameters of the
network model, that is, the synaptic weights.
On the contrary, the other nonmeasurable system state variables, Xe(t)
and I(t), are not fed in input to the LRNN; the associated information
remains distributed in the hidden layers and connections, which renders the
LRNN modelling task quite difficult.
3.2. The LRNN Training
The LRNN used in this work is characterized by three
layers: the input layer with two nodes (bias
included), the hidden layer
with six nodes (bias included), and the output layer with one node. A sigmoidal activation
function has been adopted for the hidden and output nodes.
The training set is made up of $N_t$ = 250 transients, with each one lasting for T = 2000 minutes and sampled with a time
step $\Delta t$ of 40 minutes, thus
generating $n_p = T/\Delta t = 50$ patterns.
Notice that a temporal length of 2000 minutes allows for the development of the
long-term dynamics which are affected by the long-term Xe oscillations.
All data have been normalized in the range of 0.2–0.8.
Each transient has been created varying the reactivity from its steady-state
value $\rho_0$ according to the following step function:
$$\rho(t) = \begin{cases} \rho_0, & t \le T_s, \\ \rho_0 + \Delta\rho, & t > T_s, \end{cases} \tag{6}$$
where $T_s$ is a
random steady-state time interval and $\Delta\rho$ is a random reactivity variation amplitude. In order to build the 250 different
transients for the training, these two parameters have been randomly chosen
within the range of 0–2000 minutes for $T_s$ and within a fixed amplitude range for $\Delta\rho$.
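The construction of the training inputs can be sketched in Python as follows. The transient length, sampling step, steady-state reactivity, and number of transients come from the text; the amplitude range passed to `make_training_set`, the uniform sampling, the sampling grid starting at t = 0, and the min-max form of the 0.2–0.8 normalization are assumptions made only so that the example runs.

```python
import random

T_TOTAL = 2000.0  # transient length in minutes (from the text)
DT = 40.0         # sampling step in minutes (from the text)
RHO_0 = 0.071     # steady-state reactivity (from the text)


def step_transient(t_s, delta_rho, rho_0=RHO_0, t_total=T_TOTAL, dt=DT):
    """Sample a reactivity step: rho_0 up to time t_s, rho_0 + delta_rho afterwards."""
    n_patterns = int(round(t_total / dt))
    times = [i * dt for i in range(n_patterns)]  # assumed grid: 0, dt, ..., T - dt
    return [rho_0 if t <= t_s else rho_0 + delta_rho for t in times]


def make_training_set(n_transients, delta_rho_range, seed=0):
    """Draw (t_s, delta_rho) uniformly at random and build one transient per draw."""
    rng = random.Random(seed)
    transients = []
    for _ in range(n_transients):
        t_s = rng.uniform(0.0, T_TOTAL)
        delta_rho = rng.uniform(*delta_rho_range)
        transients.append(step_transient(t_s, delta_rho))
    return transients


def scale_to_range(values, v_min, v_max, lo=0.2, hi=0.8):
    """Min-max scaling of `values` from [v_min, v_max] to [lo, hi]."""
    if v_max == v_min:
        raise ValueError("cannot scale a constant signal")
    return [lo + (hi - lo) * (v - v_min) / (v_max - v_min) for v in values]


# The amplitude range below is a placeholder, NOT the range used in the paper.
training_set = make_training_set(250, delta_rho_range=(-0.001, 0.001))
all_values = [v for transient in training_set for v in transient]
v_min, v_max = min(all_values), max(all_values)
normalized = [scale_to_range(tr, v_min, v_max) for tr in training_set]
print(len(training_set), len(training_set[0]))  # 250 transients of 50 patterns each
```

Scaling with the extrema of the whole set, rather than per transient, keeps the relative size of the steps comparable across transients.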
The training procedure has been carried out on
the available data for $n_{\text{epoch}}$ learning epochs (iterations). During each epoch, every transient is
presented to the LRNN $n_{\text{rep}}$ consecutive times. The weight updates are performed in batch at the end of
each training sequence of length T. Neither a
momentum term nor an adaptive learning rate [19] turned
out to be necessary for increasing the efficiency of the training in this case.
The principal training parameters are summarized in Table 1.
Table 1: Training parameters of the LRNN for simulating the reactor neutron flux.
The number of delays (orders of the MA and AR
parts of the synaptic filters) has been set by trial-and-error, so as to obtain
a satisfactory performance of the LRNN, measured in terms of a small root mean
square error (RMSE) on the training set. The best LRNN structure resulting from
these tests is summarized in Table 2.
Table 2: Structure of the LRNN for the simulation of the reactor neutron flux.
3.3. Results
The generalization capability of the LRNN is
verified on test transients generated by forcing functions’ variations quite
different from those used in the training phase (e.g., ramp, sinusoidal, and
random variations).
The evolutions of the flux, normalized with
respect to the steady-state value $\Phi_0$, corresponding to three sample
transients of the test set are reported in Figures 2, 3, and 4. The LRNN
estimate of the output (crosses) is in satisfactory agreement with the actual
transient (circles), even for dynamics quite different from those used for the
network training. Notice the ability of the LRNN to deal with both the
short-term dynamics governed by the instantaneous variations of the forcing
function (i.e., the reactivity step) and the long-term dynamics governed by Xe oscillations. Furthermore, the
computing time is about 5000 times lower than that required by the numerical solution
of the underlying model (4). This makes the LRNN model very attractive for real-time
applications, for example, for control or diagnostic purposes, and applications
for which repeated evaluations are required (e.g., uncertainty and sensitivity
analyses).
Figure 2: Comparison of the model-simulated normalized flux (circles) with the
LRNN-estimated one (crosses), for one sample ramp transient of the test set.
Figure 3: Comparison of the model-simulated normalized flux (circles) with the
LRNN-estimated one (crosses), for one sample sinusoidal transient of the test
set.
Figure 4: Comparison of the model-simulated normalized flux (circles) with the
LRNN-estimated one (crosses), for one sample random transient of the test set.
3.3.1. Comparison with Two Static Neural Networks
Two additional static neural network models have
been examined for comparison: a buffered multilayer perceptron (MLP), where tapped delay lines are applied at the
network inputs only, keeping the network internally static (see Figure 5) [8], and a finite impulse response multilayer perceptron (FIR-MLP), where
temporal buffers are applied at the input of each neuron; that is, all
connection weights are realized by linear FIR filters (see Figure 6) [8, 10, 25–27].
In passing, notice that the buffered MLP and FIR-MLP can be shown to be
theoretically equivalent since the internal buffers can be implemented as an
external one [27]. However, to implement an FIR-MLP as a buffered MLP, the
first layers’ subnetworks must be replicated with shared weights, and this
increases the complexity with respect to the case of considering the internal buffer
[27]. This leads to different architectures of the buffered MLP and
FIR-MLP in their actual implementations.
Figure 5: Example of buffered MLP with input buffer.
Figure 6: (a) Model of the neuron for an FIR-MLP and (b) example of an FIR filter of
fourth order.
For a fair comparison, the structures of the
static neural networks considered have been selected so that they contain
approximately the same number of adaptable parameters as does the IIR-LRNN
described in Section 3.2. In particular, the buffered MLP is chosen with fourteen
hidden neurons (bias included) and fifteen input delays, whereas the FIR-MLP is
selected with ten hidden neurons (bias included) and linear FIR filters of twelfth
order.
Three different learning algorithms have been
used: standard static backpropagation (BP) for the buffered MLP, temporal backpropagation
(TBP) for the FIR-MLP, and recursive backpropagation (RBP) for the IIR-LRNN. The
information concerning the structures and learning algorithms of the three neural
networks is summarized in Table 3.
Table 3: Structures and learning algorithms of the three neural networks involved in
the comparison (i.e., buffered MLP, FIR-MLP, and IIR-LRNN).
The training procedures have been carried out
on the learning dataset described in Section 3.2, and their results have been
expressed in terms of the root mean square error (RMSE) computed after each
learning epoch. From Figure 7, it is evident that the IIR-LRNN outperforms both
the static MLP and the FIR-MLP, showing better modelling capabilities, faster
training, and significantly higher accuracy; the asymptotic RMSE values are
0.081 for the static MLP, 0.075 for the FIR-MLP, and 0.007 for the IIR-LRNN.
Figure 7: Convergence
performance of the buffered MLP, FIR-MLP, and IIR-LRNN applied
to the reactor neutron flux estimation.
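The representation and generalization capabilities
of the three neural architectures considered have then been compared on a
number of different test datasets. The results are summarized in Table 4 in terms of root mean square
error (RMSE), mean absolute error (MAE), and mean relative error (MRE):
$$\begin{aligned}
\text{RMSE} &= \sqrt{\frac{1}{N_t\, n_p}\sum_{n=1}^{N_t}\sum_{o=1}^{n_p}\left(\Phi_{n,o} - \hat{\Phi}_{n,o}\right)^2},\\
\text{MAE} &= \frac{1}{N_t\, n_p}\sum_{n=1}^{N_t}\sum_{o=1}^{n_p}\left|\Phi_{n,o} - \hat{\Phi}_{n,o}\right|,\\
\text{MRE} &= \frac{1}{N_t\, n_p}\sum_{n=1}^{N_t}\sum_{o=1}^{n_p}\frac{\left|\Phi_{n,o} - \hat{\Phi}_{n,o}\right|}{\Phi_{n,o}},
\end{aligned} \tag{7}$$
where $N_t$ is the number of transients in the test set, $n_p$ is the number of patterns in each transient, and $\Phi_{n,o}$ and $\hat{\Phi}_{n,o}$ are the values of the normalized flux and its
neural estimate in the oth pattern of
the nth transient, respectively.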
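A compact Python sketch of the three indices follows. It assumes their usual definitions (errors averaged over every pattern of every transient, with the relative error taken with respect to the true value) and a data layout of one list per transient; both the layout and the function name `error_indices` are illustrative choices.

```python
import math


def error_indices(targets, estimates):
    """Return RMSE, MAE and MRE over all patterns of all transients.

    `targets` and `estimates` are lists of transients, each transient being a
    list of normalized flux values (true and estimated, respectively).
    """
    n = 0
    sq_sum = abs_sum = rel_sum = 0.0
    for target_row, estimate_row in zip(targets, estimates):
        for phi, phi_hat in zip(target_row, estimate_row):
            err = phi - phi_hat
            sq_sum += err * err
            abs_sum += abs(err)
            rel_sum += abs(err) / abs(phi)  # assumes a nonzero true value
            n += 1
    return {
        "RMSE": math.sqrt(sq_sum / n),
        "MAE": abs_sum / n,
        "MRE": rel_sum / n,
    }


# Made-up numbers, used only to exercise the function.
true_flux = [[1.0, 2.0], [4.0]]
estimated_flux = [[1.0, 1.0], [2.0]]
print(error_indices(true_flux, estimated_flux))
```

Because the normalized flux stays well away from zero in this application, the relative error is well defined; for signals that cross zero the MRE would need a different normalization.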
Table 4: Values of the performance indices (RMSE, MAE, MRE) calculated over different
test sets for the buffered MLP, FIR-MLP, and IIR-LRNN trained to estimate the
reactor neutron flux dynamics.
Owing to the richness of the network
architecture, the IIR-LRNN model exhibits consistently better performance
compared to the static models. For instance, considering the ramp test set of
Figure 2, the IIR-LRNN for the normalized neutron flux provides an RMSE of
0.0049 and an MAE of 0.0039. These values are significantly (roughly 11–15 times) lower
than those provided by both the buffered MLP (0.0751 and 0.0503, respectively) and the
FIR-MLP (0.0698 and 0.0435, respectively); these results are pictorially confirmed by a
comparison of Figures 2, 3, and 4 (IIR-LRNN) with Figures 8, 9, and 10
(buffered MLP and FIR-MLP), respectively.
Figure 8: Comparison of the model-simulated normalized flux (circles) with the one estimated
by (a) the buffered MLP (crosses) and by (b) the FIR-MLP (crosses) for one
sample ramp transient of the test set.
Figure 9: Comparison of the model-simulated normalized flux (circles) with the one
estimated by (a) the buffered MLP (crosses) and by (b) the FIR-MLP (crosses) for one sample sinusoidal transient of the
test set.
Figure 10: Comparison of the model-simulated normalized flux (circles) with the one
estimated by (a) the buffered MLP (crosses) and by (b) the FIR-MLP (crosses) for one sample random transient of the test
set.
Figures 8, 9, and 10 point out a key
disadvantage of the buffer and FIR approaches with respect to the locally
recurrent one, that is, the limited past history horizon, which prevents the modelling
of arbitrarily long time dependencies. In this view, the IIR-LRNN represents a
generalization of the FIR-MLP to the infinite memory case [28].
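The difference in memory horizon can be made concrete by comparing the impulse responses of a single FIR synapse and a single IIR synapse. The sketch below uses hypothetical coefficients and a helper named `impulse_response`, both introduced only for illustration.

```python
def impulse_response(ma_coeffs, ar_coeffs, n_steps):
    """Response to a unit impulse of a filter with MA taps on the input x
    and AR taps on its own past outputs y (zero initial state)."""
    x = [1.0] + [0.0] * (n_steps - 1)
    y = []
    for t in range(n_steps):
        acc = sum(w * x[t - p] for p, w in enumerate(ma_coeffs) if t - p >= 0)
        acc += sum(v * y[t - p] for p, v in enumerate(ar_coeffs, start=1) if t - p >= 0)
        y.append(acc)
    return y


fir = impulse_response([1.0, 0.5, 0.25], [], n_steps=8)  # MA part only
iir = impulse_response([1.0], [0.5], n_steps=8)          # one AR coefficient
print("FIR:", fir)
print("IIR:", iir)
```

The FIR response is exactly zero once the impulse has left the three-tap buffer, whereas the IIR response halves at every step and never vanishes: one feedback coefficient provides a memory that would otherwise require an ever longer buffer.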
4. Conclusions
The design, operation, and control of complex industrial
systems, such as nuclear, chemical, and aerospace ones, require the
capability of accurately modelling the nonlinear dynamics of the underlying
processes.
In this respect, artificial neural networks (ANNs) have
gained popularity as valid alternatives to the lengthy and burdensome
analytical approaches for reconstructing complex nonlinear and multivariate
dynamic mappings.
In particular, recurrent neural networks (RNNs) are
attracting significant attention because of their intrinsic potential for
temporal processing, for example, time series prediction, system identification
and control, and
temporal pattern recognition and classification.
In this paper, an infinite impulse response locally
recurrent neural network (IIR-LRNN) has been compared to a finite impulse response
multilayer perceptron (FIR-MLP) and a conventional static MLP. The comparison has
been carried out with respect to the problem of estimating the evolution of the
neutron flux in a simplified nuclear reactor model from the literature, starting from
knowledge of the reactivity evolution only.
The ability of the trained IIR-LRNN to deal with both the
short-term dynamics, governed by the instantaneous variations of the forcing
function (i.e., the reactivity), and the long-term dynamics, governed by the Xe oscillations, is very satisfactory
and turns out to be the main reason for outperforming the static neural modelling
approaches of the buffered MLP and FIR-MLP, in terms of both estimation
accuracy and generalization capabilities.
Acronyms and Symbols
| ANN: | Artificial neural network |
| RNN: | Recurrent neural network |
| LRNN: | Locally recurrent neural network |
| FIR: | Finite impulse response |
| IIR: | Infinite impulse response |
| MLP: | Multilayer perceptron |
| RBP: | Recursive backpropagation |
| NARX: | Nonlinear autoregression with exogenous inputs |
| AR: | Autoregressive |
| MA: | Moving average |
| RTRL: | Real-time recurrent learning |
| BPTT: | Backpropagation through time |
| RMSE: | Root mean square error |
| MAE: | Mean absolute error |
| MRE: | Mean relative error |
| k: | Layer index (in particular, k = 0 and k = M denote the input and the output layers, respectively) |
| $N_k$: | Number of neurons in the kth layer (in particular, $N_0$ and $N_M$ denote the numbers of input and output neurons, respectively) |
| j: | Neuron index |
| t: | Continuous time index |
| $x_j^{(k)}(t)$: | Output of the jth neuron of the kth layer at time t (in particular, $x_0^{(k)} = 1$ refers to the bias inputs; note that $x_j^{(0)}(t)$, j = 1, 2, ..., $N_0$, are the input signals) |
| $L_{jl}^{(k)} - 1$: | Order of the MA part of the synapse of the jth neuron of the kth layer relative to the lth output of the (k − 1)th layer ($L_{jl}^{(k)} \ge 1$ and $L_{j0}^{(k)} = 1$) |
| $I_{jl}^{(k)}$: | Order of the AR part of the synapse of the jth neuron of the kth layer relative to the lth output of the (k − 1)th layer ($I_{jl}^{(k)} \ge 0$ and $I_{j0}^{(k)} = 0$) |
| $w_{jl(p)}^{(k)}$: | (p = 0, 1, ..., $L_{jl}^{(k)} - 1$) coefficients of the MA part of the corresponding synapse (if $L_{jl}^{(k)} - 1 = 0$, the synapse has no MA part, the weight notation becomes $w_{jl}^{(k)}$, and $w_{j0}^{(k)}$ is the bias) |
| $v_{jl(p)}^{(k)}$: | (p = 1, 2, ..., $I_{jl}^{(k)}$) coefficients of the AR part of the synapse (if $I_{jl}^{(k)} = 0$, the synaptic filter is purely MA) |
| $f_k(\cdot)$: | Nonlinear activation function relative to the kth layer |
| $f_k'(\cdot)$: | Derivative of $f_k(\cdot)$ |
| $y_{jl}^{(k)}(t)$: | Synaptic filter output at time t relative to the synapse connecting the jth neuron of the kth layer to the lth input |
| $s_j^{(k)}(t)$: | “Net” input to the activation function of the jth neuron of the kth layer at time t |
| $d_r(t)$: | (r = 1, 2, ..., $N_M$) desired target of output node r at time t |
| $\mu$: | Learning coefficient |
| $\alpha$: | Momentum coefficient |
| $n_{\text{epoch}}$: | Number of learning epochs during training |
| $n_{\text{rep}}$: | Number of consecutive repetitions of each transient during training |
| y(t): | Generic system output vector at time t |
| x(t): | Generic forcing functions vector at time t |
| $\Theta$: | Generic set of adjustable parameters of a model |
| $F(\cdot)$: | Mapping function of a process (possibly nonlinear) |
| $\Phi(t)$: | Normalized neutron flux at time t |
| Xe(t): | Normalized xenon concentration at time t |
| I(t): | Normalized iodine concentration at time t |
| $\rho(t)$: | Reactivity value at time t |
| $\Phi_0$: | Nominal normalized neutron flux |
| $Xe_0$: | Nominal xenon concentration |
| $I_0$: | Nominal iodine concentration |
| $\rho_0$: | Nominal reactivity value |
| $\Delta\rho$: | Reactivity amplitude variation |
| $\Sigma_f$: | Effective fission macroscopic cross-section (cm$^{-1}$) |
| $\sigma_{Xe}$: | Effective xenon microscopic cross-section (cm$^2$) |
| $\gamma_{Xe}$: | Xenon fission yield |
| $\gamma_I$: | Iodine fission yield |
| $\lambda_{Xe}$: | Xenon decay rate (s$^{-1}$) |
| $\lambda_I$: | Iodine decay rate (s$^{-1}$) |
| $\gamma$: | Lumped temperature feedback coefficient (cm$^2$ s) |
| C: | Lumped dimensional conversion factor of xenon concentration to reactivity |
| $\Lambda$: | Effective neutron mean generation time (s) |
| $\hat{\Phi}(t)$: | Neural estimate of the normalized neutron flux at time t |
| $N_t$: | Number of transients in the training/validation/test sets |
| T: | Temporal length of a transient |
| $\Delta t$: | Time step for the numerical simulation of a transient |
| $n_p$: | Number of patterns in a training/validation/test transient |
| n: | Transient index |
| o: | Pattern index |
| $T_s$: | Steady-state time interval (in step and ramp forcing functions) |
| $T_v$: | Variation time interval (in ramp forcing functions) |
| F: | Oscillation frequency (in sinusoidal forcing functions) |
| $\Phi_{n,o}$: | Normalized flux value in the oth pattern of the nth transient |
| $\hat{\Phi}_{n,o}$: | Neural estimate of the normalized flux value in the oth pattern of the nth transient |