#### Abstract

This paper reviews the major developments of modeling techniques applied to nonlinear dynamics and chaos. Model representations, parameter estimation techniques, data requirements, and model validation are some of the key topics that are covered in this paper, which surveys slightly over two decades since the pioneering papers on the subject appeared in the literature.

#### 1. Introduction

The field of nonlinear dynamics experienced a very quick and intense development in the last thirty years or so. To determine the starting point of any subject is very difficult simply because we all stand on somebody elses showlders in any “new” attempt we make. For the sake of presentation, and because we feel that this understanding is not absurd anyway, the origins of what we nowadays call the field of nonlinear dynamics can be traced back to the work of Henri Poincaré.

In his studies on nonlinear dynamical systems Poincaré figured out that, since no analytical solution to most of nonlinear systems can be obtained, the whole set of solutions can be investigated in the so-called phase space spanned by the set of variables required for a complete description of states of the system [1]. A few years later, while investigating the three-body problem, he observed that small perturbations can deeply affect the solution [2]. In order to investigate nonlinear dynamics, Poincaré introduced the concepts of phase portrait, Poincaré section, periodic orbit, return map, bifurcation, fixed point, and so on. Most of these concepts were used by Andronov's school in the 20 s [3] but the first representation in the phase space of a now called “chaotic solution” was due to the solution Edward Norton Lorenz [4]. The so-called Lorenz attractor thus represents the first chaotic attractor ever drawn.

From the publication of Lorenz's paper up to the mid of the 70 s not many papers were published. Among those who contributed to the emergence of “chaos theory,” we can quote Ruelle and Takens paper on turbulence [5], May about bifurcation [6], Rössler's new chaotic attractors [7], Li and Yorke's theorem for existence of chaos in map [8]. These contributions, published before 1980, popularized the word “chaos” as well as related techniques to investigate these new types of solutions.

An important turning point happened around 1980 (Figure 1). There can be little doubt that such a turning point was provoked, by the papers [9, 10]. The reason for this is quite simple. So far, in the investigation of a nonlinear system , with , typically it was assumed that the entire state vector was available. This constraint was clearly too restrictive in practical problems and consequently it was unclear what kind of role could the theory of nonlinear dynamics play in * real world problems*. From a suggestion by Ruelle, Packard and colleagues showed that it was possible to build a phase portrait for the original system using a scalar , with [9]. Furthermore, Takens and Mañé mathematically proved under which conditions the phase space reconstructed using was diffeomorphically equivalent to the one that would be obtained if the entire state vector was available [10, 11]. A decade later, other important works on the subject were published [12, 13]. When such works came to light, hopes of using the “new theory of nonlinear dynamics” in practical problems were kindled, because now all that was required, in principle, and assuming an ideal measurement function was the recording of a single variable.

When, in the early 1980, it became accepted that a scalar-reconstructed attractor could be equivalent to the original one, a great deal of work was devoted to developing tools to quantitatively characterize chaotic attractors. In the following decade, geometric measures such as dimensions [14, 15], Lyapunov exponents [16] and entropies [17] were adapted, applied, and intensively investigated in the context of nonlinear dynamics. By the end of the 1980, there was a large set of works on these subjects as can be found in [18, 19].

Another turning point which can be detected in the number of papers published in the field of nonlinear dynamics and chaos happened in the end of the decade 1980 (Figure 1). Unlike the turning point in the beginning of the decade, which quite clearly can be related to fundamental papers on embedding theory, it is quite hard to relate the turning point in the late 1980 to a specific topic within the field. Most likely, such a turning point was an indication that several important topics started to be investigated as well as many applications started in various fields. Among such topics we are to find the * modeling of nonlinear dynamics and chaos*. To survey the main developments of this topic is the aim of the present paper.

#### 2. Nonlinear System Identification

Before actually addressing how the modeling of nonlinear dynamics and chaos developed within the field, it is important to say that a significant amount of work was developed independently, and sometimes previously, in the field of nonlinear system identification. The basic goals were very similar to those that later were pursued by the nonlinear dynamics community but there were some important differences. Nonlinear system identification, which has its origins in the field of engineering, is usually concerned with nonautonomous systems, discrete-time models, disturbance modeling (this is vital to correctly deal with noisy data), and hardly ever was concerned with chaos. On the other hand, modeling nonlinear dynamics and chaos, with its origins in physics and applied mathematics, usually concerned with autonomous systems, very often considers continuous-time models, does not typically model disturbances, and is strongly focused on chaotic systems.

Having established such main distinctions between the two fields, let us point out that this paper is a survey of modeling techniques applied to nonlinear dynamics and chaos. Many such techniques have indeed been developed in the field of nonlinear system identification and will be mentioned in this survey only to the extent in which they were applied to modeling nonlinear dynamics and chaos. In the reminder of this section, for the sake of completion, we present some issues in nonlinear system identification which will be relevant in the discussion of modeling techniques applied to nonlinear dynamic and chaos. In [20] the authors also point out some conceptual differences between what they call stochastic and deterministic modeling.

By the end of the 1970 linear system identfication was well established. The typical problem was to build a model, usually a transfer function of the form from a set of possibly noisy data , , . In (2.1) and are polynomials in the backward shift operator , that is, . A commonly used representation of (2.1) is the ARX (autoregressive model with exogeneous inputs) model where and are the maximum lags used. Noise is usually assumed to appear in the output . The presence and the type of noise usually are such that the standard least squares estimator will produce wrong results. This calls for disturbance modeling and some type of generalized least squares algorithm, which is no longer a linear estimator. The simples disturbance model which is the moving average (MA), which added to (2.2), yields the well-known ARMAX model. More general disturbance and system models have been proposed and studied in detail in standard texts such as [21, 22].

The approach to nonlinear system identification happened along different lines of action. One such line of action was to build so-called block-oriented models which consisted of a linear transfer function model (see (2.1) either after or before a nonlinear * static* funtion , such were the Hammerstein and Wiener models. Another line of action was the development of nonparametric models like the Volterra models [23]. The numerical algorithms involved in the development of such models were, in one way or another, related to the correlation function. A glimps of the state-of-the-art in the development of block-oriented and Volterra models at the end of 1970 can be found in [24].

A third, and last one to be mentioned here, line of action was to rewrite (2.2) as and to generalize for the case in which was a nonlinear function. Because the regressor variables in (2.3) were of the ARX type, and because was now allowed to be nonlinear, (2.3) was referred to as a NARX model. As before, in the case of noisy data, disturbance modeling is a must and, therefore, additional regressor terms of the MA (moving average) type have to be included in (2.3). This lead to the now well-known acronym NARMAX. The building of a NARMAX model is typically composed of three steps: (1) the choice of the class of models to be used for approximating . Some model classes include: polynomial models [25], rational models [26], neural models [27], radial basis function models [28], and wavelet networks [29]. (2) The following step is to determine which regressor variables to use in (2.3), this is equivalent, although it is more general, to the choice of the embedding space in modeling nonlinear dynamics and chaos. For recent discussions and a survey on such techniques we refer the reader to [30–35]. (3) In model building the final step is to estimate whatever parameters may have. This step is far the easiest. Standard and robust techniques for accomplishing parameter are available [21, 22]. The important problem of model validation, which will be addressed later on, has more to do with model testing than model building.

#### 3. Model Building for Nonlinear Dynamics and Chaos

After presenting a brief introduction (Section 1) and pointing out some standard problems in nonlinear system identification (Section 2), we are ready to start surveying the development of data-driven models for nonlinear dynamics and chaos. There are various possible ways of addressing this vast subject, and it is not clear which will turn out to be the most pedagogical one. We choose to start covering five of the earliest papers in the field and follow by discussing some of the mostly used model classes.

##### 3.1. A Few Pioneering Works

In what follows, we begin by mentioning explicitly five pioneering works, the first four of which were of great influence; [36–40]. The reason for including [40] in this group is its early date. In the remainder of the section we will survey some of the major developments in the field. Together, these five papers have been quoted 2 719 times from their publication up to the end of 2008. (The search machine used was ISI Web of Science.)

###### 3.1.1. The Local Linear Predictor

The first papers on modeling nonlinear dynamics and chaos seem to have appeared in 1987. One of them is related to local linear modeling, the rest is generally concerned with some aspect of global modeling. Because of its historical importance, in this subsection, the main points concerning the local linear predictor [36, 41] will be reviewed.

Suppose that a measured time series lies on a -dimensional attractor of an th-order deterministic dynamical system. The starting point obtains an embedding from the recorded data. A convenient, though not unique, representation is achieved by using * delay coordinates*, for which a delay vector has the following form:
where is the * embedding dimension* and is the * delay time*. Takens has shown that embeddings with will be faithful generically so that there is a smooth map such that
for all integers , and where the forecasting time and are also assumed to be integers. Takens' theorem gives no indication as to how find. One of the first attempts to to build the map from the data was the local linear predictor method, which started by partitioning the embedding space into neighbourhoods within which the dynamics can be appropriately described by a linear map such that

Several choices for have been suggested in the literature such as linear polynomials [36, 42] which can be interpolated to obtain an approximation of the map [43]. Simpler choices include * zeroth-order approximations*, also known as * local constant predictors* [36, 44], and a * weighted predictor* [45]. Linsay proposed to weigh the predictions according to information obtained from the local data [45]. In [46] the authors put forward the concept of * pseudofalse neighbors* which are left out from the construction of a local linear predictor to improve performance, and in [47] the authors consider the multivariate case.

A common difficulty of such approaches is that the data have to be separated into neighbourhoods. Thus given a point in the embedded space the closest neighbours to such a point must be found. It is well-known that for many methods most of the CPU time is spent in searching for close neighbours in the embedding space within the data [48] and that the effort required to accomplish this grows exponentially with the embedding dimension.

Another drawback has to do with model representation. The local linear predictor does not have a closed form, in other words, it is not a global model. This prevents using the obtained model in any kind of analytical investigation.

The local linear predictor has been applied to several real-time series including Wolf's sunspot numbers [20, 49], daily dollar exchange rates [50], R-R heart intervals [49], to mention a few.

###### 3.1.2. Equations of Motion

In 1987 James Crutchfield and Bruce McNamara published a paper that became a reference in the field [37]. The title of this section is a reference to that paper rather to a particular class of models. It seems safe to say that this was the first journal paper in the field devoted to global modeling, as oposed to the “patchwork-models” produced by the local linear approach, and which the authors call the “atlas equation of motion procedure.”

In [37] the authors consider linear-in-the-parameter models of the type , with , where is a vector of function basis and is a vector of parameters to be estimated from data. Discrete-time models are also considered in the paper. The authors described their method without any particular class of function basis in mind. The parameter vector was estimated using a singular value decomposition implementation of least squares. Their method is general and quite basic: the embedding dimension and the number of basis functions used are varied over a set of alternatives. The choosen model is the one with smallest entropy . In their examples, that include the Hénon map, Duffing, and van der Pol oscillators, the function basis used was B-splines and discrete monomials.

In closing, it should be said that more than a method, Crutchfield and McNamara put forward a general philosophy for modeling chaotic data. With the exception of some issues that are specific to such data, their procedure does not differ substantially from works in the field of nonlinear system identification using linear-in-the-parameter models widely available at that time (see [34] for a survey).

###### 3.1.3. Multivariable Functional Interpolation

The paper by David Broomhead and David Lowe is among the four pioneering works that we have selected to start surveying the main developments in modeling of nonlinear dynamics and chaos [38]. Such a paper is often quoted as being responsible for the proposition of using Radial Basis Function (RBF) models to describe nonlinear systems. Although in their paper the authors quote a review work by M. J. D. Powell entitled “Radial basis functions for multivariable interpolation: a review,” it seems that the work by Powell did not consider dynamical systems but only static problems.

A typical model, as proposed by Broomhead and Lowe, can be written as
where is the number of radial basis functions, , , and are the unknown parameters to be estimated, are the centers, and is usually the Euclidean norm. As it will be seen in the next section, Martin Casdagli also worked with RBF models, as pointed out by Broomwhead and Lowe: “We note that in this application our approach is very close to that of Casdagli who has applied radial basis functions to the construction of nonlinear maps from time series data. (Here the authors quote Casdagli's paper as “submitted to * Physica* D.” That paper, that will be mentioned in the next section, was eventually published in 1989.) Unlike Casdagli who used strict interpolation, we will employ the least squares generalization.” By “strict interpolation” the authors mean that the number of unknowns (parameters) and constraints (built from data) is the same (more on this in Section 3.1.4).

This paper is curious in some aspects. First of all, it is clearly devoted to machine learning for purposes of classification. Several pages are devoted to the problem of approximating the exclusive-OR function. The last examples in the paper seem to be an afterthought and consist of the modeling of the doubling map (modulo 1) and of the quadratic map . Also, only three works are listed in the references which are related to modeling nonlinear dynamics and chaos: [36, 39] and a preprint by Lapedes and Farber [51]. (Such a manuscript was at the time submitted to Proceedings of the IEEE. It seems that this paper was never published as such, but has circulated as a technical report. This report is cited by a number of early papers on modeling nonlinear dynamics and chaos, proving to have been quite influential.) It is also very curious that no paper or result on embedding theory was used by Broomhead and Lowe. Because both their examples use 1D maps, there was no need for any serious embedding. In summary, the contribution of [38] seems to be the suggestion of using RBF models for nonlinear dynamics, but for the general view of the embedding-modeling problem we should look into [37] as the pioneering work. The fusion of such papers, that is, the use of RBF models in an embedding environment to model and predict nonlinear dynamics, happened in the next paper to be considered.

###### 3.1.4. Prediction of Chaotic Time Series

From the abstract of the paper we clearly read the objective of the paper: “numerical techniques are presented for constructing nonlinear predictive models directly from time series data” [39]. The basic motivation for such was to “convincingly distinguish low-dimensional chaos from randomness.” In fact, Casdagli argues in favor of his model-based approach when comparing it with the use of dimension calculations and Lyapunov exponents to distinguish noise from chaos.

With respect to the model class, this paper is remarkable because it compared polynomial, rational, and RBF models, plus local linear predictions. Very few papers present results for more than one model class, a more recent exception to this rule is [52].

As for the use of “strict interpolation” (Broomhead and Lowe) or simply “interpolants” (Casdagli) or “approximants” (Casdagli), the situation is not completely clear. The paper mentions both. When describing general techniques (polynomial and rational models), Casdagli mentions the least squares solution. However, when presenting the RBF model the only case considered is that of choosing the model structure so as to have an inverse problem with unique solution. At the end of that section, the least squares solution is mentioned again, but as “ongoing research.” One of the critical problems of the “strict interpolation” approach is the total lack of robustness to noise. In fact, Casdagli does not consider any noise in the “global modeling” examples, but only in the “distinguishing noise from chaos” examples. In two occasions interesting remarks are made. In discussing modeling difficulties Casdagli says: “this does not appear to be due to numerical errors in the least-squares fitting and matrix inversion algorithms” [39, page 343]. This gives the impression that he was actually implementing both: the least-squares algorithm (to fit approximants) and the matrix inversion algorithm (to find the solution to the strict interpolation problem). The first was probably related to polynomial and rational models and the latter to RBF models. The second case, when discussing the use of RBF models for predicting invariant measures, the number of data points is quoted as and the data were noise-free. This clearly suggests “strict interpolation.”

The RBF models considered by Casdagli were of the form (compare with (3.4)) where is the number of data iterates, is a basis of the space of polynomials of degree at most from to , with known, and are parameters. The author points out that “frequently the polynomial term is not included,” which is interesting in the light of other results, to be mentioned in Section 3.2.3.

Casdagli investigated the accuracy of short-term predictions based on model iteration and the * direct* method, which amounts to fitting a predictor with a single prediction horizon equivalent to the various one-step-ahead predictions of the * iterative* method. In the various cases investigated, the iterative method proved superior. A similar aim was pursued in the context of neural networks in [53], in the context of Volterra expansions in [54] and using a kernel-based approach in [43]. Finally, the author showed that the fitted RBF models were capable of reproducing some invariants, such as Poincaré sections and bifurcation diagrams. Although he only considered the noise-free case, a very significant step towards serious model validation was given. Perhaps the greatest lack in this relevant paper is that there is no mention on structure selection. Model instability under iteration is prematurely attributed to the model class and not to an unsuitable model structure.

By model class we mean the * type* of models used, as for instance polynomials, RBFs or neural networks. By model structure we mean the model * topology*, of a particular class. For instance, in the case of RBF models, the model structure is determined by the type and number of basis functions used, the lags used to compose the input space, and so on.

###### 3.1.5. Construction of Differential Equations

The early paper by Cremers and Hübler had the clear objective to “obtain a concise description of an observed chaotic time sequence” [40]. From the beginning, the authors were clearly concerned with chaotic data. The model class they considered was a basis of Legendre polynomials, and in the introduction of the paper it was stated that the true objective was to estimate the parameters of a differential equation. This more restricted aim was confirmed later in [55].

The authors discuss a few aspects of embedding and the effect of dynamical noise. In their numerical examples no noise is considered. They mention sucessful estimation of the parameters for two systems: Lorenz and an autonomous version of the van der Pol oscillator, for which the following 4th-order approximation was used: where are the parameters to be estimated. The Legendre polynomials and must be evaluated at and , in addition to the time derivatives of and at each point in state space considered. The derivatives were estimated numerically, but there is no clear indication as to how this was accomplished nor what would be the effect on such estimates of measurement noise (which they considered not). The authors justify the use of a polynomial expansion by stating “if nothing is known about the properties of the flow vector field, a fit by a polynom (sic) series is frequently favourable” [40, page 800]. This last remark results from a general theorem by Weierstrass stating that any analytical function can be approximated by an infinite polynomial expansion.

Breeden and Hübler later remarked that the 1987 paper also considered the discrete-time case, which is definitely not obvious. Then, so far the field of nonlinear dynamics is concerned, this paper seems to mark the beginning of building differential polynomial equations from data. Although no mention on important practical issues is made in this paper, such as structure selection, derivative and parameter estimation, it has the merit of having opened the way for a prosperous modeling class of techniques, to be surveyed in Section 3.2.1.

The citation relationships among these five pioneering papers are illustrated in Figure 2. In the center of the diagram the embedding papers [9, 10] appear as a common feature, with the exception of the paper [38] (see discussion in Section 3.1.3). This quotation diagram suggests that Cremers and Hübler were not aware of the other papers on model building. (Their paper was submitted in September 1986.) It is interesting to notice (not shown in the diagram) that Broomhead & Lowe and Casdagli mention the work of Lapedes and Farber (see [51]).

##### 3.2. Model Classes

In the description to follow, the order of the model classes mentioned is somewhat arbitrary. We follow a rough chronological order of the first papers in each class.

###### 3.2.1. Continuous-Time Polynomials

A number of papers appeared in the early nineties which had as a common goal fitting ordinary differential equations (ODEs) to observed data. As pointed out in Section 3.1.5, the paper [40] was pioneering in this field, although it did not deal with any practical issue. Unlike Cremers and Hübler, most papers that followed, instead of using a basis of Legendre polynomials, used “Taylor-series expansions.” As an example, such an expansion for a 2D system is [56]

A similar representation was used by Gouesbet who started investigating under which conditions it was possible to reconstruct the vector field from a single measurement of an original vector field [57, 58]. The reconstructed system was expressed as
and both polynomial and rational expansions for were investigated. The polynomial expansions of used were equivalent to (3.7). Although no reconstruction from data was performed in that paper, the challenge was acknowledged and pursued shortly after [59]. In the last paper, the important issue of estimating the derivatives based on finite-difference schemes from data was considered. This was an important step towards dealing with real data but a key hurdle still had to be transposed: measurement noise. To do this was one of the key objectives in [60], where this global modeling technique finally reached maturity. As pointed out in an early paper “noise removal is required because standard systems rely on derivative evaluations” [58, page 1795], to deal with noise was to * remove* it from the data, not so much as for discrete-time modeling techniques which are made to cope with noise.

Other polynomial expansions are to be found in the literature. Giona and colleagues investigated the use of a basis of polynomials to approximate both continuous-time and discrete-time dynamical systems [61]. In both cases the system was approximated by a linear combination of such polynomial functions which are obtained “from the knowledge of the hierarchy of moments.” The procedure is quite involved, and long data sets are required to estimate the parameters. In their examples, was used for the Hénon and Ikeda maps, and was used for the Lorenz system.

Practical implementation of global modelling using continuous-time model requires an integration scheme. For instance, the explicit Euler integration scheme
where with the time step may be used. It is known that the Euler is not very robust against time step change. This is why a Runge-Kutta scheme is most often used. Another alternative was proposed by [62]. They used an Adams-predictor-corrector scheme according to
where are the implicit Adams predictor-corrector coefficients [62]. designates the order of the corrector portion of the integration. The global model has a polynomial form and is optimized with the help of a minimum description length criterion and the error function
which is quadratic with respect to c*œ* fficients . This is thus a least square problem.

Structure selection issues for continuous-time polynomials have been discussed in [63, 64]. In [65] the authors use a priori knowledge of the driving force to help determining the model structure. When differential embeddings are used, the global model (3.8) is usually searched using a truncated polynomial expansion involving successive derivatives of the measured time series. In order to reduce the number of monomials, that is, the length of the model, it is possible to use a rational function whose monomials are selected according to an Ansatz library. Such a library contains canonical forms which can be mapped into an equivalent polynomial system whose variables are combinations of the measured time series and its successive derivatives. The canonical function thus only contains selected terms. This reduced function allows to avoid numerical instabilities usually encountered with rational function. This procedure was introduced in [66] and improved in [67]. For instance a very accurate 7-term global model was obtained from the -variable of the Lorenz system. A 26-term model was obtained from a copper electrodissolution experiments [68]. This model has to be compared to the 52-term model previously obtained in [69].

Other approximations for functions and in (3.7) include Volterra series expansions [70]. Examples of building global continuous-time models from real data can be found in a number of papers [69, 71–74].

###### 3.2.2. Neural Networks

A neural network is a model class that resembles some aspects of a brain. Conventional simplifications made for perceptron models are: (i) to take only one hidden layer of nodes, (ii) to consider the output node linear, and (iii) to consider all the activation functions of the hidden layer nonlinear are the same. A perceptron model with such features is illustrated in Figure 3(a) and can be described mathematically as
where indicates a weight (to be estimated) of the hidden layer that connects the th input to the th neuron of the hidden layer. is the weight (to be estimated) of the th hidden neuron output, ’s are constants, called bias parameters, and the neuron * activation function* is . Finally, and is the number of neurons in the hidden layer. The function shown in the right-hand side of (3.12) is often called * feed-forward* because there are no feedback loops * internal* to the network. It is important to notice that (3.12) is in the form . Common choices for nonlinear activation functions are Gaussian, sigmoidal, and the hyperbolic tangent. In [75] they discuss the use of polynomials as activation functions for neural networks. An accessible introduction on the subject of neural networks applied to modeling chaotic systems is given in [76].

**(a)**

**(b)**

**(c)**

The first papers to build this kind of models from chaotic data seems to have been the references [78, 79]. (An earlier paper built and characterized a chaotic neural model with a single neuron but the network was not trained from data [80].) More recent and impressive results have been discussed in [81].

A rare study of bifurcation diagrams achieved by neural models has been presented in [82]. And an early work which reports on the improved performance of pruned networks is [83]. Neural networks have been applied in a number of instances, some examples include [84–86].

###### 3.2.3. Radial Basis Function Models

The RBF model class is shown in Figure 3(b). One important difference with respect to the first class is that now there are no weights associated to the connections between the inputs and the nodes in the hidden layer. The —which is usually chosen as a radial function—is nonlinear and often depends on certain tuning parameters which are usually known when the weights associated to the connections indicated by solid lines (Figure 3(b)) are to be estimated. As a consequence, the model is linear in the parameters and can be written thus (compare with (3.4) and (3.5) where is the number of radial basis functions. , and are the unknown parameters to be estimated. Two of the first works to use this model class were mentioned in Sections 3.1.3 and 3.1.4 above. Work using this type of model has been recently surveyed in [87].

A number of papers have described the application of RBF networks in the modeling of nonlinear dynamical systems and chaos [88–91]. An accessible introduction on the subject of RBF models applied to modeling chaotic systems is given in [76].

###### 3.2.4. Discrete-Time Polynomials

The model class illustrated in Figure 3(c) is quite similar to the preceding one. The main difference is that the basis functions are usually different, that is, . Another important difference, not revealed in the figure, is that, whereas it is usually assumed in the second class that the input vector is uniform, in the third class such uniformity is not required (see Section 4.1). However, the main difference is that various basis functions are often used in Figure 3(c) in such a way as to enable matching different data features. One possible choice of basis functions is monomials of different degrees of nonlinearity in the range . Each th-order term can contain a th-order factor in and a th-order factor in and is multiplied by a coefficient as follows:
where
and the upper limit is if the summation refers to factors in or for factors in . In the case of noisy data , noise terms * must* be included in (3.14) to avoid the error-in-the-variables problem (see Section 5).

The choice of monomials as basis functions constrains the resulting models to those cases in which the dynamics underlying the data can be approximated by a linear combination of nonlinear monomials. For systems that are more strongly nonlinear, other basis functions should be preferred. On the other hand, the choice of monomial basis functions enables building models which are more information dependent than models for which all the basis functions are of the same type.

Discrete-time polynomial models have been used in a number of papers within the field of nonlinear dynamics and chaos. Bagarinao and colleagues used this model class to reconstruct bifurcation diagrams from data [92–95]. Data analysis and modeling applications include [96–103]. Applications to real data—in the field of nonlinear dynamics—include [104–106].

###### 3.2.5. Rational Models

Rational models are composed by the division of two polynomials such as (3.7) in the continuous time, and such as (3.14) in discrete time. It should be noticed at once that whereas * deterministic* polynomial models are linear-in-the-parameters, rational models, even deterministic, are nonlinear-in-the parameters. This calls for a series of cares in their estimation [107–109].

Rational models for continuous-time systems were considered in [57], although only analytically, as no model building was attempted in that paper. Rational models were estimated from data produced by an electronic oscillator in [110]. The practical difficulties with rational models seem to be revealed by the small number of papers that deal with such models in the context of nonlinear dynamics and chaos. At least for continuous-time model, rational functions lead to many numerical instabilities that are not trivial to remove. Nevertheless, such a problem is no longer observed when return maps are considered. Thus, Lorenz map and Ikeda map have been successfully reproduced with rational models [111].

###### 3.2.6. Wavelet-Based Models

Wavelet models, wavelet networks, or just * wavenets* are similar to the linear-in-the-parameter models described in Sections 3.2.3 and 3.2.4. The main difference is that instead of radial basis functions or monomials, wavenets are composed of a linear combination of a set of wavelet functions formed from a so-called wavelet prototype by dilations and translations such as
where and . As for RBF models, there are various choices of the wavelet function . There seems to be no clear guidance as to which type of function to choose. Since the various functions used to compose the model are different (for different and ), then the structure selection for this type of models resembles that of the discrete polynomials described in Section 3.2.4 because the user must chose not only the number of basis functions to use, but also the features of them (the s and s).

One of the first papers to use wavenets in order to model a chaotic system was [29]. The authors used a mexican hat type wavelet function to model various benchmark models. Optimized tensor product wavelet models were obtained for data from a vibrating string experiment in [112], and B-splines wavelet models for the nonautonomous Duffing oscillator were discussed in [113]. Wei and Billings investigated the use of structure selection algorithms for wavelet models in the context of chaotic systems [114].

###### 3.2.7. Fuzzy Logic

Fuzzy models use internal variables which are linguistic. At a certain point of modeling the numerical variables must become linguistic by taking labels such as small negative large positive. The core of a fuzzy model then consists of a set of rules of the type: if is large, then is medium. Finally, the linguistic variables must be changed back to numerical variables. Of course there are a number of subtelties which actually make this type of modeling work [115–118].

From the list of model classes surveyed in this paper, it seems to us that fuzzy logic models have been the least used to model nonlinear dynamics and chaos. One of the first papers on the subject seems to be [119]. A comparison of serveral techniques, including fuzzy models, in a problem of nonlinear prediction can be found in [120].

###### 3.2.8. Input-Output Models

Input-output models are not a class of models but simply mean that the model class caters for the use of input signals. By input we mean external, time-dependent signal(s). Therefore an input-output model will be nonautonomous.

For the modeling of input-output models it is necessary to record not only the output, but also the input. As a general rule, global differential equations built from data are autonomous (for an exception, see [65] where the method applies to harmonically driven systems). In [121] the authors discuss the use of a bifurcation parameter (as an input) in the estimation of differential equations. However, in general, if inputs must be handled, then discrete-time models are more convenient.

There have been attempts to develop an embedding theory for input-output models [122, 123]. The feasibility of this theory and its benefits are not totally clear. Fortunately, there are other theoretical approaches to the input-output case [25, 124], and a vast number of examples illustrate the practical value of such models.

Nonautonomous chaotic models have been obtained in [65, 113, 125].

###### 3.2.9. Equivalence between Continuous and Discrete-Time Polynomials

As surveyed above, there are various methods for building continuous-time models, typically in the form of ODEs. However, the data available is always discrete in time, and the simulation of ODEs is also carried out on digital computers. Therefore, at some stage some type of discretization of the ODEs must occur. In some methods such discretization is intentional [62, 126], however, in such cases the final aim was to have an ODE at the end of the day. What happens to the ODEs fixed-points after it is discretized was discussed in [127].

A more fundamental question concerns the choice of the integration step of numerical integration schemes. Given an ODE, say that produces a chaotic attractor, what happens to the attractor if the integration is varied? If it becomes too great, certainly the attractor will change, but will the “new” attractor still be some attractor of the original system? These questions have been addressed in [128]. In fact, so long as the frequency corresponding to the integration step is more than twice the highest frequency in the Fourier spectrum (Nyquist criterion), the “new” attractor is still topologically equivalent to an attractor solution to the set of ODEs. Only a displacement in the parameter space is induced by the integration step, which functions as a bifurcation parameter in some cases. It is only when the integration step is larger than the maximum established by Nyquist's criterion that the obtained attractor is spurious, that is, associated with numerical instabilities.

#### 4. Structure Selection

In few words, the problem of structure selection is that of deciding * how many * and in some cases * which* function basis (see, e.g., Figure 3(c)) should be used to build a dynamical model from data. This problem gains different tones depending on the model class considered but it is always present. Often in the context of networks, structure selection is known as “topology determination.” This issue has been addressed in a number of papers [129–133]. In the case of fuzzy models, structure selection boils down to define how to partition the input space and to define the number of rules. Structure selection issues for fuzzy models have been investigated in [35, 134, 135]. Even for local linear models, structure selection would be the definition of the embedding and the number, size and location of neighborhoods, and so forth.

In the field of system identification, the issue of structure selection for nonlinear models gained much attention by the late 80 s. However, in the field of nonlinear dynamics and chaos this trend was delayed a few years. In the early (and sometimes late!) 90 s several papers simply assumed the structure known [55, 136–138]. A good example of what happens when structure selection is not considered during model building is provided in [139].

Soon the practical importance of model structure selection started to be recognized and dealt with in various ways and in varying degrees of success [62, 140, 141]. By the mid 90 s most authors became aware of the necessity of structure selection regardless of the model class chosen. However, the problems due to overparametrization were not well diagnosed, usually being attributed to an allegedly poor model class, poor data, and so on. A detailed study of the dynamicall effects of overparametrization on model attractors and bifurcation diagrams was produced in [142]. Other similar studies followed [143–147].

The key point in structure selection is to choose a model structure that is as simple as possible, but also sufficiently complex to capture the dynamics underlying the data. One of the easiests (and less efficient) methods for model selection is called zeroing-and-refitting [141] which consists of estimating the parameters for a large model—which is already a problem—and to eliminate those which are “sufficiently small.” Of course, the “size” of a parameter will depend on the particular window of data used, on the noise variance, and on the particular type of basis function used, especially if the data are not normalized [148]. Therefore, zeroing-and-refitting does not generally work in practice [62].

An alternative and simple way of tackling this problem for nonlinear systems has been proposed in [149]. Instead of of deleting specific model terms, entire term clusters should be deleted, based on the behavior of the respective cluster parameters. Also, certain term clusters can be deleted to force symmetry [144, 150], which is necessary in some cases [151]. The concept of the nearest neighboors was used in [152] to determine the best input and output lags in NARX (nonlinear autoregressive with exogenous inputs) models.

One way of addressing the structure selection problem are to define some measure of complexity for a given model. In their paper Crutchfield and McNamara were concerned with quantifying and limiting the complexity of their models. They chose models that minimized the model entropy and argue the importance of such a measure in the context of chaotic data [37]. The * maximum description length* (MDL) [153] has been used in several papers [132, 154]. Other ways of tackling the structure selection problem is to define a measure of quality for each regressor * in an orthogonal space* and then use such a criterion to select those regressors that are most relevant. This is the basic idea behind the * Error Reduction Ratio * (ERR) proposed in [155] and used in [156].

Unfortunately, there is no definite solution to the model structure selection problem so far. Situations in which the current methods fail abound. Brown and colleagues report failure of the MDL criterion [150], and Piroddi provides simple examples in which the ERR fails [33]. Fortunately, the benefits and successes outnumber the failures. New algorithms for structure selection are published at an amazing pace [30–35]. The authors of this survey are convinced that the use of a priori information (see, e.g., [144, 151, 157]) will prove useful in addition to the new techniques.

##### 4.1. Uniform and Nonuniform Embeddings

A key issue in modeling nonlinear dynamics is that of selecting an appropriate embedding space. In principle, this would include two stages: the choice of observables [158, 159] and the choice of embedding parameters [160]. In many practical situations, although the observable is determined before data acquisition, the embedding parameters—basically the embedding dimension and the delay time—can be determined by the user a posteriori. In [161] a method has been suggested to try to detect if a given variable is appropriate for global modeling. Using the nomenclature of Figure 3, the input vector, at the left-hand side of the models, determines the embedding space.

It has been duly pointed out that the problem of choosing an embedding in the context of model building is a bona fide stage of the modeling procedure [162]. Also, the uniform embedding defined by taking the elements of in (3.1) to be the coordinates is not necessarily optimal. Therefore, which elements (coordinates) should be chosen to compose is also part of the modeling problem. An optimal solution to such a problem might require an embedding space in which the “temporal distances” from one coordinate to another are not necessarily the same. The authors of [162] classify such embedding spaces as irregular. Despite this, to build discrete-time models using regular embeddings seems to be the rule rather than the exception in most of the literature.

To enable irregular embeddings (see Figure 3) it suffices not to connect certain input nodes to the middle layer. Here it is pointed out that the choice of particular basis functions is, in some cases equivalent to the choice of embedding coordinates which could turn out to be irregular. This is one of the advantages of using the ERR criterion to accomplish structure selection. In fact, (13) and (21) of [156] and (37) of [125] are some examples of models (automatically) built on irregular embeddings. In conclusion, it becomes clear that the modeling procedure followed in [125], for instance, includes the choice of embedding coordinates as a part of the modeling procedure, as pointed out in [162].

#### 5. Parameter Estimation

Before, it is noted that synchronization has been used in parameter estimation problems [163–166].

One of the most commonly used algorithms for estimating unknowns in linear-in-the-parameters models is the least-squares algorithm or some generalization of it [34]. In the case of noisy measured data the least-squares estimator is biased due to the error-in-the-variables problem [167, 168] because such an estimator does not take into account the measurement errors [169]. Fortunately, there are well-established and robust algorithms for such situations which are mildly nonlinear [22]. Some of such algorithms solve the optimization problem in a higher-dimensional space thus successfully avoiding bias. Due to the unbiased estimators—some of which take into account the measurement errors—in the field of system identification, the error-in-the-variables problem only occurs when both input and output are noise contaminated. When only the output is noisy, the unbiased estimators successfully circumvent the error-in-the-variables problem.

All such algorithms are based in some norm of one-step-ahead prediction error. An alternative would be to minimize some norm of a -step-ahead prediction error, with the advantage of gaining robustness to noise. In such a case, however, the resulting optimization problem becomes strongly nonlinear and should be solved with adequate tools. The use of a back-propagation algorithm in this context was used in [170]. In [43], which also followed a local model procedure, the local maps were fitted to the data in such a way as to be consistent with -step-ahead predictions. The authors of that paper reported that estimating parameters by least squares did not always yield good results. This should come as no surprise because they were fitting local maps and therefore required additional information to constrain parameter estimation. Global models are less sensitive to this type of problem.

In what concerns parameter estimation, an important difference between [59, 60] and [56] is that, whereas the former uses linear estimators—at the expense of having to estimate derivatives from data—the latter numerically integrates (3.7) and uses the observed data as they are. This procedure, which was called the * trajectory method*, results in an optimization problem which the authors solve using standard routines in the IMSL10 library. In [55], the authors provide more information on this estimation procedure and also consider the discrete-time case. The trajectory method in the context of RBF models is discussed in [76]. It should be noticed that [55, 56] do not consider modeling problems such as structure selection. The application of the trajectory method was illustrated in [171], and the use of the multiple shooting approach to estimated parameters of ordinary differential equations was discussed in [73, 136, 169].

For * neural networks* the weights and the bias terms are determined by optimization algorithms that search to minimize a cost function which usually depends on the difference between the given data and the network output. Because such models are nonlinear in the parameters, the optimization problem must be solved using some kind of nonlinear estimator, such as the back-propagation algorithm. In a recent study, different approaches to network training were compared [172].

Breeden and Packard have discussed the use of genetic algorithm and evolutionary programming for solving a number of optimization problems which occur in the modeling of nonlinear dynamics and chaos [173]. Other parameter estimation techniques include a generalized Gauss-Newton method for maximizing a likelihood function [74]; ridge regression or regularization [49, 89]. The multiple shooting approach has been used and discussed in a number of papers [136, 174]. Nonparametric estimation (a rare example in nonlinear dynamics) was provided in [175]. The use of regularization has been considered in [49, 76]. A specific procedure for estimating three parameters of a differential-delay equation was put forward in [176] and another method, specific to a certain class of one-dimensional maps, was put forward in [177]. A multiobjective estimator was discussed in [178]. Finally, a number of issues directly concerned with the estimation of dynamical invariants and indirectly related to the estimation of model parameters were discussed in [167].

#### 6. Model Validation

The issue of model validation is vast. In order to cover such a wide subject in limited space, we will base this section on the paper [179] and refer the reader to [180] for a coverage of the field up to the beginning of the nineties.

Before actually starting to describe some results in the literature, a few remarks are in order. First and foremost, the challenge of model validation or of choosing among candidate models should take into account the intended use of the model. Hence, a model could be good for one type of applications and, nonetheless, perform poorly in another. In the context of this paper, the main concern is to assess the model dynamics. A different concern, though equally valid, that would probably require a different approach would be to assess the forecasting capabilities of a model. Second, it should be realized that two similar though different problems are: (i) model validation, which usually aims at an * absolute* answer like: valid or not valid, (ii) model selection, which usually aims at a * relative* answer such as: model is better than model according to criterion . Finally, when it comes to model validation, the safest approach is to use many criteria, rather than just one.

Although very popular in other fields, the computation of various * measures of prediction errors* (one-step-ahead and free-run) in the case of nonlinear dynamical systems is not conclusive in what concerns the overall dynamics of the identified model [81, 132, 180], though it does convey much information on the forecasting capabilities of a model.

Subjective though it is, the * visual inspection of attractors* (or simply comparing the morphology of two time series) is quite common a way of assessing the quality of models [52, 60, 65, 72, 73, 78, 91, 119, 132, 162, 168, 171, 174, 181–183]. Such a procedure is not only subjective but also ineffective to discriminate between “close” models, that is, models with slight, but important, differences in their dynamical behavior. What renders this procedure subjective is the fact that no quantitative mechanism is used to compare how close are two reconstructed attractors. In this respect the work by Pecora and coworkers could be an alternative for determining how close are the original and the model attractors [184]. To the best of our knowledge the statistic measures that put forward in the mentioned paper have not yet been used in the context of model validation.

Still in relation to the visual inspection of attractors, it should be noticed that in many practical instances there is not much more that can be done consistently. For instance, in the case of slightly nonstationary data, to compare short-term predictions with the original data is basically the best that can be done. Building a model for which the free-run simulation approximates the original data in some sense is usually a nontrivial achievement. (In this respect we rather disagree with [185] who consider free-run simulation of models a trivial validity test.)

Other * attractor features* are still in common use when it comes to model validation. Among such features the following are frequently used: (some of such properties have been recently discussed in the context of model validation in [186]) Lyapunov exponents [43, 81, 97, 145, 147, 187, 188]; correlation dimension [78, 81, 132, 188]; location and stability of fixed-points [112, 125, 183]; (In particular, it has been shown that fixed-point stability of nonlinear models is consistent with breathing patterns found in real data [189].) Poincaré sections [81, 176]; geometry of attractors [85]; attractor symmetry [150, 151]; first-return maps on a Poincaré section [66, 190]; probability density functions of recurrence in state-space [191]; topological features such as linking numbers and unstable periodic orbits (UPOs) [192–194].

Meaningful validation can only be accomplished by taking into account the intended use of the model. A model that provides predictions consistent with the observed data will probably not be a good model to study, say, the sequence of bifurcations of the original system.

The use of model free-run simulations in the context of surrogate data analysis for model validation was suggested in [195]. For details on surrogate data analysis the reader is referred to [196, 197] and for achemical process application; see [198]. The main idea is to use estimated models to produce a large number of time series and to use some test statistic to try to assess if it is likely that the data could have been produced by a model (or models) such as those used to produce the surrogates. Of course, a key point in this procedure is the choice of the test statistic. For instance, suppose that the correlation dimension or the largest Lyapunov exponent is chosen to compare a set of free-run simulated data with the measured data. Suppose further that the null hypothesis is that the measured data is compatible with the estimated model. Even if we cannot reject the null hypothesis, that does not guarantee equivalence of * dynamical* behavior because quite different attractors may have similar indices [199]. This is also true for the correlation dimension.

A related method has been described in [200] and has been named consistent nonlinear dynamic (CND) testing. The main idea is depicted in Figure 4.

**(a)**

**(b)**

In the following two subsections we briefly discuss two other procedures which the authors have developed for model validation. The first one, following an early paper by Brown and colleagues [201], is based on the concept of synchronization. The second procedure, which is much more demanding, but also provide a much deeper insight into the model dynamics is based on topological analysis.

##### 6.1. Synchronization

As done in Figure 4, let us denote the “true” dynamics by and a given model by . In the present section it is assumed that the dynamics are continuous, that is [201], where it has been assumed that and where the matrix denotes the coupling between the true system and the model. The scheme illustrated in (6.1) will be referred to as dissipative synchronization or entrainment.

The rationale behind this procedure is as follows. Assume that the data lies on a chaotic attractor. In many situations, provided is adequately chosen and , . That is, the model will synchronize to the system. If and differ slightly, the error will not go to zero but will stay around the origin of the error space. The average distance to the origin of such a space will depend on . Therefore such a distance (which in practice is a measure of quality of “synchronization”) is a measure of how far the estimated model is from the true dynamics [201].

The difference —where is the model state vector * without* any driving force—is compared to . If drops below a certain threshold ( was used in [185]) and is “clearly” smaller than , that is, and it is assumed that the model is synchronized to the data, therefore should be sufficiently close to .

As it often happens in the realm of model validation, this procedure also is highly subjective, since it requires an ad hoc threshold, mentioned in the previous paragraph. In what concerns dissipative synchronization, it is well-known that in many cases by increasing the strength of the coupling (matrix ) it is possible to force a greater degree of synchronization and, in some cases, even attain identical synchronization [201]. For instance, in [185] the authors found that for values of the coupling greater than 2, models of an electronic circuit would synchronize with the measured data. On the other hand, in [202] they have found a lower bound of 0.1 for the coupling strength in order to guarantee synchronization between the Rössler system and perturbed versions of the original equations. It therefore becomes clear that it is sometimes possible to synchronize even a poor model to the data so long as the coupling strength is made sufficiently large. In fact, is has been shown that even different systems can synchronize, at a rather high cost [203].

Therefore although the concept of synchronization could be useful in the context of model validation, it becomes apparent that some adjustments are required to render the procedure more practical. In [179] some steps were given in this direction, and it was shown using numerical examples that synchronization yielded similar results to the use of bifurcation diagrams for model validation [180] but at a much lower computational cost. Bifurcation diagrams in model validation were used in [86, 89].

##### 6.2. Topological Analysis

A topological analysis usually starts by computing a first-return map to a Poincaré section of the phase portrait. This is indeed useful for extracting periodic orbits using a close return method although that other techniques can be used. In the best cases, the first-return map is made of monotonic branches separated by an critical point. The first-return map induces therefore a partition of the phase portrait in zones. A symbol is associated with each branch.

Chaotic trajectories and the periodic orbits constituting their skeleton are thus encoded over the symbol set . Even symbols are associated with increasing branches. Increasing branches are preserving order and decreasing branches are reversing order. The phase portrait is thus divided in preserving order and reversing order strips. A preserving order strip presents an even number of half-turns while a reversing order strip represents an odd number of half-turns. The attractor can thus be schemed by a branched manifold—a template or a knot holder—on which periodic orbits can be drawn.

All topological properties are encoded in the template. Thus it is possible to extract topological invariants like linking numbers from a template construction. Linking number between two periodic orbits is the most often used topological invariant. It can be counted on a regular plane projection of the two periodic orbits. Each crossing (in the plane projection) between the two orbits is associated to according to the third coordinate of each orbit segment. The linking number is then the half sum of the orientated crossings. The template is valid when all linking numbers it predicts are equal to those counted from the periodic orbits extracted from the attractor studied. Many details about topological analysis can be found in [204].

Topological analysis is surely the strongest validation method. Unfortunately, it is up-to-now restricted to 3D dyanmics. Topological validation of global models was performed for the copper electrodissolution experiments [69], a string experiment [193], a Belousov-Zhabotinskii reaction [205], and so forth.

#### 7. Modeling with a Priori Knowledge

We start this section with a quotation from one of the pioneering papers of the field: “If extra information is available about a system in addition to a time series, such as explicit underlying partial differential equations or symmetries, then the inverse approach as presented here does not exploit such information. Consequently, for such systems it may be possible to improve significantly upon the inverse approach using other techniques” [39, page 354].

Despite opinions as the one just quoted, the problem of building models from data * and* additional information (sometimes referred to as a priori or * auxiliary*) has been postponed. In the context of linear models or nonlinear process models, some results are available [206–210]. However, as pointed out in [211] there seems to be less applications of gray-box modeling in the realm of nonlinear dynamics.

The general setting is to have the dynamical data * plus* some other source of information and to use both in building a model. The knowledge of the model equations is optimistic to be considered a priori information, and here it is assumed that structure selection techniques (Section 4) will be used to find out which basis functions should be used to compose the model. If all that is desired of the system is already available in the particular set of data used for model building, then granted that the model class is sufficiently general and that the model structure is adequate, the resulting model should be a sufficiently accurate representation of the system. In this case there is no motivation to use an additional piece of information. However, often in practice, for a number of different reasons—such as presence of noise, poor choice of observable, poor frequency content in the data, limited amplitude excursion, and the like—the available data either does not have all the desired information about the system or such information is difficult to obtain. In such cases it is conceivable that additional information is available and it is natural to enquire if it is possible to build the model. Thus, more often than not, if the data is of “good” quality and “sufficiently” complete, black-box modeling should be the practitioner's first choice.

In the realm of nonlinear dynamics, procedures have been put forward for building models using auxiliary information. A number of fixed-points were used in [144], the location of fixed points were used in [178]. Information about the harmonic driving were used to choose the model structure in [65]. Information about the symmetry was used to constrain not only the topology but also the parameter estimates of network and radial-basis function models [151]. In the last quoted paper, for instance, symmetry was imposed on a multilayer perceptron neural network in order to guarantee the pitchfork bifurcation—which is structurally unstable—when modeling the Duffing-Ueda oscillator. Topological features such as folding and tearing mechanisms [199, 212] in addition to the location and local eigenstructure of fixed points have been used in [213]. Lastly, information about the first period-doubling bifurcation diagram was used in [77].

#### 8. Remaining Challenges and Conclusions

As early as 1987, the issue of modeling spatiotemporal nonlinear dynamical systems can be found in the literature [37]. Defining which type of a priori information is usable and then learning how to use it for a given model class remain a challenge. In this difficult problem it is necessary to find adequate duplets of information and model class. To posses a priori information and not knowing how to use it for a given model class is useless. The reverse is equally true. The combination of models with different properties in an assemble approach [52] seems a promissing line for future research in nonlinear dynamics and chaotic systems.

Over fifteen years ago Robert Gilmore, referring to model building, mentioned: “There are at present two distinct methods to model data. One is analytic, and has not been extensively used. The other is topological, and has been used even less” [214, page 515]. Although this paper did not aim at describing in detail the significant progress made in the field of “topological analysis,” hopefully the reader did get the feeling that much has been done. On the other hand, concerning the “analytic methods” for modeling data, which was the main focus of this paper, it should become clear that quite a lot has been achieved since Gilmore's summary was written.

The view presented in this work, as any review paper, is strongly influenced by the authors' experience. A varied list of introductory and review articles was included. A rather short list of benchmark models and widely available data with indication of works that have used such models and data were provided. Although every effort has been made to survey such a wide subject in a comprehensive way, it goes without saying that a complete list of papers is outside the authors' reach. Nevertheless we are sure that the present survey will serve as a good starting point for future works in modeling nonlinear dynamics and chaos.

#### Appendix

#### A. Benchmark Models and Data Sets

As a helpful aid to those involved in developing new tools for modeling nonlinear dynamics and chaos, in this appendix we list some benchmark toy models and benchmark data sets (widely available) and point out papers that have discussed their modeling and analysis.

##### A.1. Maps

*Logistic*

The logistic map was originally investigated by Robert May [215, 216]. Papers that investigated the estimation of models from data produced by the logistic map include [38, 125].

*Hénon*

The logistic map was originally proposed in [217]. It was considered as a benchmark for modeling in [61, 125, 168, 218]. This map was modeled using a kernel estimation approach in [43], recurrent networks in [84], and a wavelet model in [114].

*Ikeda*

The Ikeda map was introduced in [219] and considered as a benchmark in [39, 61]. In [29] this map was modeled using wavelet networks, and in [132] results are reported for neural networks.

##### A.2. Autonomous Systems

*Van Der Pol's Oscillator*

The autonomous van der Pol oscillator settles to a limit cycle and therefore can be modeled. This oscillator has been considered in [119].

*Lorenz*

The Lorenz system was introduced in [4] and considered as a benchmark in [39, 76], although only model prediction errors are given. The polynomial and rational expansions of that system from a single observable were considered in [58], and the construction of ODEs from data was presented in [59, 60, 62]. This system was modeled using a kernel estimation approach in [43], in [125] it was modeled usign discrete-time polynomials, and in [29] it was modeled using wavelet networks.

Fuzzy models for this system were built in [119]. A piece-wise affine model for this system was developed and analyzed in [213], in a sense such a model bears resemblance with Takagi-Sugeno fuzzy models [116]. Unconstrained [85] neural networks and constrained RBF models [151] were also used in the modeling of this system.

*Rössler*

The well-known Rössler system was proposed in [220]. The polynomial and rational expansions of that system from a single observable were considered in [57], and the construction of ODEs from data was presented in [59, 60]. Fuzzy models for this system were built in [119, 125] it was modeled usign discrete-time polynomials. A piece-wise affine model for this system was developed and analyzed in [213], and a neural network was reported in [132].

*Makey-Glass*

The Makey-Glass delay-differential equation was introduced in [221] and considered as a benchmark in [39, 76, 125]. In [29] this system was modeled using wavelet networks, and in [84] the reader will find results with feedforward and recurrent neural networks.

##### A.3. Nonautonomous Systems

It must be pointed out that a nonautonomous oscillator or order driven by a single frequency, say, an input of the type is, in practice, an autonomous system with dynamical order [191]. In such cases the resulting models do not have an external variable or which can be * any* time-dependent signal.

*Duffing's Oscillator*

This oscillator was considered * for a fixed input* of the type in [37]. In [125] a model valid for any input was built from data. The estimated model and original oscillator have very similar bifurcation diagrams. Wavelet models with similar features were obtained in [113]. A close look at the pitchfork bifurcations will reveal that they are not (the symmetrical) pitchforks but rather (the unsymmetrical) saddle-node bifurcations. Because symmetry is structurally unstable, any small deviation due to noise or “imperfect” estimation from data will destroy it, and with it the hope to reproduce a pitchfork bifurcation. However, by imposing symmetry it is possible to recover the correct bifurcation pattern. This has been done with neural network models for the Duffing oscillator in [151].

*Van Der Pol's Oscillator*

This oscillator appeared in [222]. An experimental implementation of this oscillator was considered * for a fixed input* of the type in [37]. In [125] a model valid for any input was built from data. The estimated model and original oscillator have very similar bifurcation diagrams. Wavelet models for this oscillator have been obtained in [114].

##### A.4. Benchmark Measured Time Series

Several papers in the literature illustrate the proposed techniques using measured time series. It would be impossible to list all of them. In this subsection, however, we list those time series that are most commonly used because they are widely available and therefore constitute typical benchmarks.

*Santa Fé Time Series Competition Data*

In 1994 the Santa Fé Institute promoted a time series prediction competition, which is described in [223]. Some measured and simulated time series were made available as benckmarks for the competition. The description of these data and the files can still be found at http://www-psych.stanford.edu/andreas/Time-Series/SantaFe.html.

The Laser data was considered in [81, 91]. The biomedical data set has been described in detail by Rigney and coworkers [224] in view of the Santa Fé Time Series Prediction and Analysis Competition [225]. Such data were considered in [189, 226–230].

*Wolf's Sunspot Number Time Series*

These data can be found at http://www.ngdc.noaa.gov/ and have been considered in [20, 49, 91, 106, 132, 231, 232]. In the last two references, the original data was transformed to a symetrical space in which the modeling is easier. The transformed data can be found at http://www.atomosyd.net/.

*Copper Electrodissolution Data*

These data can be found at http://www.atomosyd.net/spip.php?article40 and have been considered in [68, 69]. They were recorded by Zihao Fei and Jack Hudson. A 52-term global continuous-time model [69] and a 26-term model [68] were obtained from the data. A structure selection using an Ansatz library was used in the second case.

*Canadian Lynx*

These data results from the recrods of the Hudson Bay Company regarding the populations of lynx and hares. They can be found at http://www.atomosyd.net/spip.php?article39 and have been considered in [233]. They were recorded by Zihao Fei and Jack Hudson.

*Electronic Oscillators*

The electronic oscillator described in [234] was modeled and analyzed in [71, 73]. These data can be found at http://www.y2k.maths.ox.ac.uk/.

The so-called Chua's circuit [235] was modeled and analyzed in [236] using discrete-time polynomials, in [64] usign continuous-time polynomials and in [110] using rational models. Neural networks from simulated data of this circuit are described in [237]. These data can be found at http://www.cpdee.ufmg.br/~MACSIN/.

#### Acknowledgments

This work has been partially supported by CNPq and CAPES (Brazil) and CNRS (France). The authors are grateful to the editors for their encouragement and continued assistance in the preparation of this survey.