Complexity

Volume 2018, Article ID 5608286, 11 pages

https://doi.org/10.1155/2018/5608286

## A Multidimensional Data-Driven Sparse Identification Technique: The Sparse Proper Generalized Decomposition

^{1}ESI Chair, ENSAM ParisTech, 151 bvd. de l'Hôpital, F-75013 Paris, France
^{2}LAMPA, ENSAM ParisTech, 2 bvd. de Ronceray, F-49035 Angers, France
^{3}Aragon Institute of Engineering Research, Universidad de Zaragoza, Maria de Luna s.n., E-50018 Zaragoza, Spain
^{4}Laboratori de Càlcul Numèric, Universitat Politècnica de Catalunya, Jordi Girona 1-3, E-08034 Barcelona, Spain
^{5}ESI Group, Parc Icade, Immeuble le Seville, 3 bis, Saarinen, CP 50229, 94528 Rungis Cedex, France

Correspondence should be addressed to Elías Cueto; ecueto@unizar.es

Received 6 July 2018; Revised 11 September 2018; Accepted 25 September 2018; Published 1 November 2018

Academic Editor: Diyi Chen

Copyright © 2018 Rubén Ibáñez et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Sparse model identification from data is especially cumbersome if the sought dynamics live in a high-dimensional space. It usually requires large amounts of data, unaffordable in such high-dimensional settings. This well-known phenomenon, the curse of dimensionality, is here overcome by means of separate representations. We present a technique, based on the same principles as the Proper Generalized Decomposition, that enables the identification of complex laws in the low-data limit. We provide examples of the performance of the technique in up to ten dimensions.

#### 1. Introduction

In recent years there has been a growing interest in incorporating data-driven techniques into the field of mechanics. While almost classical in other domains of science such as economics or sociology, big data has arrived with an important delay to the field of computational mechanics. It is worth noting that, in our field, the amount of data available is very often not so big, and therefore we speak of data-driven techniques instead of big-data techniques.

Among the first works incorporating data-driven technologies into the field of computational mechanics one can cite those of Kirchdoerfer et al. [1, 2] or the ones by Brunton et al. [3–5]. Previous attempts to construct data-driven identification algorithms exist, however; see, for instance, [6, 7]. More recently, compliance with general laws such as those of thermodynamics, a distinct feature of data-driven mechanics, has also been achieved [8]. Other applications include the identification of biological systems [9] or financial trading [10], to name but a few.

The problem with high-dimensional systems is that data in these systems are often sparse (due precisely to the high-dimensional nature of the phase space), while the system itself very frequently has low-dimensional features. Based on this, a distinction should be made between methods that require an a priori structure of the sampling points and those which do not require such regularity.

Regarding the methods that need a rigid structure of the sampling points, the Nonintrusive Sparse Subspace Learning (SSL) method is a novel technique which has proven to be very effective [11]. The basic ingredient of the technique is that the parametric space is explored hierarchically, with sampling points collocated at the Gauss-Lobatto-Chebyshev integration points. Using a hierarchical basis also allows improving the accuracy by adding more hierarchical levels without perturbing the previous ones. To achieve this hierarchical property, only the difference at a given point between the real function and the value estimated from the preceding hierarchical levels is propagated. For more details about the method, the reader is referred to [11]. However, in the high-dimensional case, this technique shows severe limitations, as will be detailed hereafter.
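The hierarchical-surplus idea just described can be sketched in a few lines. The following is a minimal one-dimensional illustration (our own sketch, not the actual SSL implementation): at each new level only the surplus, i.e., the difference between the true function and the prediction built from the coarser levels, is computed, and at points already sampled the surplus vanishes.

```python
import numpy as np

def cgl_points(level):
    """Chebyshev-Gauss-Lobatto points on [-1, 1]; each level doubles the resolution."""
    n = 2**level + 1
    return np.cos(np.pi * np.arange(n) / (n - 1))

f = lambda x: np.exp(x) * np.sin(2.0 * x)   # function to be identified (arbitrary choice)

# Level 2: interpolate f at 5 CGL points with a degree-4 Chebyshev polynomial.
x2 = cgl_points(2)
fit2 = np.polynomial.chebyshev.Chebyshev.fit(x2, f(x2), deg=len(x2) - 1)

# Level 3: only the surplus w.r.t. the level-2 prediction is propagated.
x3 = cgl_points(3)
surplus = f(x3) - fit2(x3)

# The level-2 points are nested in the level-3 set (every other point),
# so the surplus vanishes there and is only nonzero at the new points.
print(np.max(np.abs(surplus[::2])), np.max(np.abs(surplus)))
```

Because the point sets are nested, refining the approximation only requires storing the surpluses of the newly added points, which is what makes the hierarchy cheap to extend.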

On the other hand, nonstructured data-driven techniques are commonly based on Delaunay triangulation, providing an irregular mesh whose nodes coincide with the sampling points. Depending on the degree of approximation inside each Delaunay triangle, different interpolation techniques then arise; linear, nearest-neighbor, cubic, and natural interpolation, among others, are commonly used. Apart from techniques that depend on a given triangulation, it is worth mentioning Kriging interpolants as an appealing technique to build response surfaces from nonstructured data points. The key ingredient of this technique is that each sampling point is considered as a realization of a random process. Defining a spatial correlation function therefore makes it possible to infer the value at unknown points, as well as to provide confidence intervals based on the distance to the measured points. Nevertheless, the calibration of the correlation matrix has an important impact on the performance of the method itself.

Kriging also possesses a very interesting property: it is able to efficiently filter noise and outliers. Therefore, it is expected that it could also help in problems with noisy data.

However, in high-dimensional settings, all of the just-mentioned techniques fail to identify the nature of the system, due precisely to the curse of dimensionality. A recent alternative for such systems could be Topological Data Analysis (TDA), which is based on the use of algebraic topology and the concept of *persistent homology* [12]. A sparse version of this technique also exists [13].

Hence, if a competitive data-driven identification technique is desired, it should meet the following requirements:

(i) *Nonstructured data set*: this characteristic provides versatility to the method. Indeed, when evaluating the response surface requires a lot of computational effort, recycling previous evaluations of the response surface, which do not coincide with a given structure of the data, may be very useful. In addition, the SSL technique establishes sampling points at locations in the phase space that may have no physical meaning in an industrial setting.

(ii) *Robustness with respect to high dimensionality*: triangulation-based techniques suffer when dealing with multidimensional data simply because a high-dimensional mesh has to be generated. The separation of variables, by contrast, is an appealing way to circumvent the generation of such a high-dimensional mesh.

(iii) *Curse of dimensionality*: all previous techniques suffer when dealing with high-dimensional data. For instance, the SSL needs many sampling points just to reach the first level of approximation. Thus, when dealing with high-dimensional data with many uncorrelated dimensions, plenty of sampling points are required to properly capture a given response surface.

In what follows we present a method based on the concept of separate representations to overcome the curse of dimensionality. Such separate representations have previously been employed by the authors to construct a priori reduced-order modeling techniques, known as Proper Generalized Decompositions [14–20]. This gives rise to a sparse Proper Generalized Decomposition (s-PGD in what follows) approach to the problem. We then analyze the developed technique through a series of numerical experiments in Section 4, showing the performance of the method. Examples in up to ten dimensions are shown. The paper is completed with some discussions.

#### 2. A Sparse PGD (s-PGD) Methodology

##### 2.1. Basics of the Technique

In this section we develop a novel methodology for sparse identification in high-dimensional settings. For ease of exposition and, above all, representation, but without loss of generality, let us begin by assuming that the unknown objective function $f(x,y)$ lives in $\Omega \subset \mathbb{R}^2$ and is to be recovered from sparse data. As in previous references (see, for instance, [21]), we have chosen to begin with a Galerkin projection, in the form

$$\int_\Omega w(x,y)\left(u(x,y) - f(x,y)\right)\, d\Omega = 0, \tag{1}$$

where $\Omega$ stands for the (here, still two-dimensional) domain in which the identification is performed and $w(x,y)$ is an arbitrary test function. Finally, $u(x,y)$ will be the obtained approximation to $f(x,y)$, still to be constructed. In previous works of the authors [8], as well as in other approaches to the problem (e.g., [21]), this projection is subject to additional constraints of thermodynamic nature. In this work no particular assumption is made in this regard, although additional constraints could be imposed on the minimization problem.

Following the same rationale behind the Proper Generalized Decomposition (PGD), the next step is to express the approximated function as a sum of products of separate one-dimensional functions,

$$u(x,y) = \sum_{k=1}^{M} X_k(x)\, Y_k(y). \tag{2}$$

The determination of the precise form of the functional pairs $X_k(x)$, $Y_k(y)$, $k = 1, \dots, M$, is done by first projecting them on a finite element basis and by employing a greedy algorithm such that, once the approximation up to order $M-1$ is known, the new $M$th-order term is found by any nonlinear solver (Picard, Newton, …).

It is well known that this approach produces optimal results for elliptic operators (here, note that we have in fact an identity operator acting on $u$) in two dimensions; see [14] and references therein. There is no proof, however, that this separate representation will produce optimal results (in other words, will obtain *parsimonious models*) in dimensions higher than two. In two dimensions it provides, in fact, the singular value decomposition of $f$ [15]. Our experience, nevertheless, is that it produces almost optimal results in the vast majority of the problems tested so far.
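The two-dimensional link with the singular value decomposition is easy to check numerically. In the sketch below (our illustration, with complete gridded data so that the comparison makes sense), a function built from two separable terms is recovered exactly by two SVD modes; the particular function and grid sizes are arbitrary choices.

```python
import numpy as np

# f(x, y) sampled on a full grid: a sum of two separable (rank-one) terms.
x = np.linspace(0.0, 1.0, 50)
y = np.linspace(0.0, 1.0, 60)
F = np.outer(np.sin(np.pi * x), np.cos(np.pi * y)) + 0.1 * np.outer(x**2, y)

# With complete 2D data, the optimal separated representation is the truncated SVD.
U, s, Vt = np.linalg.svd(F, full_matrices=False)
F2 = (U[:, :2] * s[:2]) @ Vt[:2]          # keep two separated modes
err = np.linalg.norm(F - F2) / np.linalg.norm(F)
print(err)
```

Since the sampled matrix has rank two by construction, the relative error of the two-mode reconstruction is at machine precision; the point of the s-PGD is to reach a comparable separated format when only scattered samples of $F$ are available.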

It is worth noting that the product of the test function times the objective function is only evaluated at a few locations (the ones corresponding to the experimental measurements) and that, in a general high-dimensional setting, we will necessarily be in the low-data limit. Several options can be adopted in this scenario. For instance, the objective function can first be interpolated in the high-dimensional space (still 2D in this introductory example) and then integrated together with the test function. Indeed, this is the so-called *PGD in approximation* [15], commonly used when either $f$ is known everywhere and a separated representation is sought, or when $f$ is known in a separated format but a few modes are needed for any reason. Under this rationale the converged solution tries to capture the already interpolated solution in the high-dimensional space, but in a more compact format. As a consequence, the error due to interpolation of the experimental measurements in the high-dimensional space will persist in the final separate identified function.

In order to overcome such difficulties, we envisage a projection-followed-by-interpolation method. However, since information is only known at the sampling points $(x_i, y_i)$, $i = 1, \dots, P$, it seems reasonable not to express the test function in a finite element context, but rather as a set of Dirac delta functions collocated at the sampling points,

$$w(x,y) = u^*(x,y) \sum_{i=1}^{P} \delta(x - x_i)\,\delta(y - y_i), \tag{4}$$

giving rise to

$$\sum_{i=1}^{P} u^*(x_i, y_i)\left(u(x_i, y_i) - f(x_i, y_i)\right) = 0. \tag{5}$$

The choice of the test function in the form dictated by (4) is motivated by the desire to employ a collocation approach while maintaining the symmetry of the standard Bubnov-Galerkin projection operation.

##### 2.2. Matrix Form

Let us now detail the finite element projection of the one-dimensional functions $X_k(x)$ and $Y_k(y)$, $k = 1, \dots, M$ (often referred to as *modes*), appearing in (2). Several options can be adopted, ranging from standard piecewise-linear shape functions to global nonlinear shape functions, maximum entropy interpolants, splines, kriging, etc. An analysis of the kind of interpolant to use will be performed in the sequel. Nevertheless, no matter which precise interpolant is employed, it can be expressed in matrix form as

$$X_k(x) = \mathbf{N}^T(x)\, \mathbf{a}_k, \tag{6}$$

$$Y_k(y) = \mathbf{N}^T(y)\, \mathbf{b}_k, \tag{7}$$

where $\mathbf{a}_k$ and $\mathbf{b}_k$, $k = 1, \dots, M$, represent the degrees of freedom of the chosen approximation. We employ $\mathbf{N}$ as the most usual nomenclature for the shape function vector. It is important to remark that the approximation basis could even change from mode to mode (i.e., for each $k$). For the sake of simplicity we take the same number of terms for both $X_k$ and $Y_k$, namely, $n$.

By combining (1), (2), (4), (6), and (7), a nonlinear system of equations is derived, due to the products of terms in both spatial directions. An alternating directions scheme is preferred here to linearize the problem, which is also a typical choice in the PGD literature. Note that, when computing the modes $X_k(x)$, the variation in the other spatial direction vanishes, and vice versa.

In order to fully detail the matrix form of the resulting problem, we first employ the notation "$\otimes$" for the standard tensor product and define the operators assembling the shape functions in both directions. For the sake of simplicity, but without loss of generality, evaluations of these operators at a sampling point $(x_i, y_i)$ are denoted by a subscript $i$. Equations (10)-(11) below show the discretized version of the terms appearing in the weak form (1) when computing the modes in the $x$ direction. Again, $M$ stands for the number of modes in the solution, while $P$ denotes the number of sampling points.

Hence, defining the corresponding operator and right-hand side allows us to write a system of algebraic equations for the degrees of freedom in the $x$ direction.

Exactly the same procedure is followed to obtain an algebraic system of equations for the degrees of freedom in the $y$ direction. This allows performing an alternating directions scheme to extract a new couple of modes $X_k(x)$ and $Y_k(y)$.
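A minimal sketch of this greedy, alternating-directions enrichment may help fix ideas. The code below is an illustrative reimplementation, not the authors' code: it uses piecewise-linear hat functions on uniform nodes for both directions (the Kriging basis of Section 2.3 could be substituted) and fits each new rank-one mode to the current residual at the scattered sampling points by alternating linear least squares.

```python
import numpy as np

def hat_basis(pts, nodes):
    """Piecewise-linear (hat) shape functions on uniform nodes, evaluated at pts."""
    h = nodes[1] - nodes[0]
    return np.clip(1.0 - np.abs(pts[:, None] - nodes[None, :]) / h, 0.0, None)

def enrich(Bx, By, r, iters=30):
    """One greedy enrichment: find (a, b) so that (Bx a)*(By b) fits residual r."""
    b = np.ones(By.shape[1])
    for _ in range(iters):
        Yv = By @ b                                            # freeze Y, solve for X
        a = np.linalg.lstsq(Bx * Yv[:, None], r, rcond=None)[0]
        Xv = Bx @ a                                            # freeze X, solve for Y
        b = np.linalg.lstsq(By * Xv[:, None], r, rcond=None)[0]
    return a, b

rng = np.random.default_rng(0)
xs, ys = rng.random(200), rng.random(200)                      # scattered sampling points
f = np.sin(np.pi * xs) * np.cos(np.pi * ys)                    # "measured" values

nodes = np.linspace(0.0, 1.0, 9)
Bx, By = hat_basis(xs, nodes), hat_basis(ys, nodes)

r = f.copy()
for _ in range(3):                                             # three enrichment modes
    a, b = enrich(Bx, By, r)
    r = r - (Bx @ a) * (By @ b)                                # update the residual
print(np.sqrt(np.mean(r**2)))                                  # RMS residual at samples
```

Note that only one-dimensional bases ever appear: no two-dimensional mesh or interpolation is built, which is precisely what makes the scheme scale to many dimensions.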

This formulation has several aspects that deserve to be highlighted:

(1) No assumption about $f(x,y)$ has been made other than knowing its value at the sampling points. Indeed, both problems of either interpolating or building a triangulation in a high-dimensional space are circumvented thanks to the separation of variables.

(2) The operator is composed of rank-one updates, meaning that its rank is at most $P$. Furthermore, if a subset of measured points share the same coordinate $x_i$, the entire subset will increase the rank of the operator by only one.

(3) The position of the sampling points constrains the rank of the PGD operators. That is the reason why, even if a random sampling of points is possible, it is always convenient to use a smart sampling technique such that the rank in each direction tends to be maximized. Indeed, the higher the rank of the PGD operator, the larger the cardinality of $\mathbf{a}_k$ and $\mathbf{b}_k$ that can be demanded without degenerating into an underdetermined system of equations.

There are plenty of strategies to smartly select the position of the sampling points. They are based on either knowing an a priori error indicator or having a reasonable estimation of the sought response surface. Certainly, an adaptive strategy based on the gradient of the precomputed modes could be envisaged. However, the position of the new sampling points would then depend on the response surface calculated using the previous sampling points, making parallelization difficult. That is the reason why Latin hypercube sampling is chosen in the present work. In particular, Latin hypercube sampling places the points in such a way that their projections onto the $x$ and $y$ axes are as spread out as possible.
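For reference, a basic Latin hypercube sampler can be written in a few lines; this is a generic sketch (the authors may well have used a library implementation). Each of the $d$ axes is split into $n$ equal strata, and every stratum of every axis receives exactly one point, which is what maximizes the spread of the one-dimensional projections.

```python
import numpy as np

def latin_hypercube(n, d, seed=None):
    """n points in [0, 1]^d; each axis is split into n strata, one point per stratum."""
    rng = np.random.default_rng(seed)
    pts = np.empty((n, d))
    for k in range(d):
        # random permutation of the strata, plus a random offset inside each stratum
        pts[:, k] = (rng.permutation(n) + rng.random(n)) / n
    return pts

pts = latin_hypercube(10, 2, seed=0)
```

By construction, projecting the ten points onto either axis gives exactly one point per tenth of the interval, so no two sampling points share (or nearly share) a coordinate, which keeps the rank of the PGD operators as high as possible.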

##### 2.3. Choice of the 1D Basis

In the previous section, nothing was specified about the basis in which each one of the one-dimensional modes is expressed. In this subsection, we will use an interpolant based on Kriging techniques. Simple Kriging has long been used to obtain relatively smooth solutions, avoiding the spurious oscillations characteristic of high-order polynomial interpolation, the so-called Runge's phenomenon. It appears when the sampling point locations are not chosen properly; i.e., they are not collocated, in general, at the Gauss-Lobatto-Chebyshev quadrature points. Kriging interpolants consider each point as a realization of a Gaussian process, so that high oscillations are considered as unlikely events.
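Runge's phenomenon is easy to reproduce numerically. The classical example below (our illustration, not taken from the paper) interpolates $1/(1+25x^2)$ with a degree-10 polynomial: equispaced nodes give wild oscillations near the interval ends, while Chebyshev-Lobatto nodes keep the error small.

```python
import numpy as np

runge = lambda x: 1.0 / (1.0 + 25.0 * x**2)

n = 10                                              # polynomial degree
x_equi = np.linspace(-1.0, 1.0, n + 1)              # equispaced nodes
x_cheb = np.cos(np.pi * np.arange(n + 1) / n)       # Chebyshev-Lobatto nodes

dense = np.linspace(-1.0, 1.0, 2001)
p_equi = np.polynomial.polynomial.Polynomial.fit(x_equi, runge(x_equi), n)
p_cheb = np.polynomial.polynomial.Polynomial.fit(x_cheb, runge(x_cheb), n)

err_equi = np.max(np.abs(p_equi(dense) - runge(dense)))
err_cheb = np.max(np.abs(p_cheb(dense) - runge(dense)))
print(err_equi, err_cheb)
```

Since in a data-driven setting the sampling points cannot be placed at the Chebyshev nodes, a basis that tames these oscillations by construction, such as the Kriging interpolant below, is preferable to a high-order polynomial one.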

Hence, by defining a spatial correlation function based on the relative distance $h$ between two points, an interpolant is created over the separated 1D domain,

$$X(x) = \sum_{j=1}^{n_c} \phi_j(x)\, a_j,$$

where $\phi_j(x)$ is a weighting function which strongly depends on the definition of the correlation function, and the coefficients $a_j$ are the nodal values associated to the Kriging control points. Note that these control points are not the sampling points. We have chosen this strategy so as to allow us to accomplish an adaptivity strategy that will be described next. In the present work, these control points are uniformly distributed along the 1D domain. Although several definitions of the correlation function exist, a Gaussian distribution is chosen,

$$c(h) = \exp\left(-\frac{h^2}{2\sigma^2}\right),$$

where $\sigma^2$ is the variance of the Gaussian distribution. Several a priori choices can be adopted to select the value of the variance based on the distance between two consecutive control points, e.g., proportional to it. The magnitude of $\sigma$ should be adapted depending on the desired global character of the support and must, of course, remain strictly positive to ensure the positivity of the variance.

Let us now define a set of $n_c$ control points and the set of $P$ sampling points, together with a correlation matrix between all the control points and a correlation matrix between the control points and the sampling points. Under these settings, the weighting function of each control point at each sampling point is obtained by solving a linear system involving these two correlation matrices.
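As an illustration of this construction, the following sketch (our own minimal version of simple-Kriging-style weights, with hypothetical variable names) builds the two Gaussian correlation matrices and solves for the weights; evaluating at the control points themselves recovers the identity, i.e., the Kronecker delta property of the resulting shape functions.

```python
import numpy as np

def gauss_corr(p, q, sigma):
    """Gaussian correlation between every pair of points in p and q."""
    return np.exp(-((p[:, None] - q[None, :]) ** 2) / (2.0 * sigma**2))

def kriging_weights(controls, samples, sigma):
    """Row i gives the weight of each control point at sample point samples[i]."""
    C_cc = gauss_corr(controls, controls, sigma)   # control-control correlations
    C_cs = gauss_corr(controls, samples, sigma)    # control-sample correlations
    return np.linalg.solve(C_cc, C_cs).T

controls = np.linspace(0.0, 1.0, 7)                # 7 uniform control points
sigma = (controls[1] - controls[0]) / 2.0          # variance tied to the spacing

# Evaluated at the control points themselves, the weights reduce to the identity.
W = kriging_weights(controls, controls, sigma)
```

Note that, unlike the construction described in the paper, this plain simple-Kriging sketch does not by itself enforce the partition-of-unity property; an additional constraint (as in ordinary Kriging) would be needed for that.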

If we reorganize the terms in the same way that we did in the previous section, so as to have a compact and closed format of the shape function vector $\mathbf{N}$, each resulting shape function is given by the corresponding row of the weighting matrix.

Figures 1 and 2 depict the appearance of the simple Kriging interpolants using 7 control points uniformly distributed along the domain, for two different values of the variance. It can be highlighted that both the Kronecker delta (i.e., strict interpolation) and the partition of unity properties are satisfied for any value of the variance. Moreover, it is worth noting that the higher the variance of the correlation function, the more global the shape functions are. Furthermore, it is known that 99 per cent of the probability of a Gaussian distribution is comprised within an interval of roughly two and a half standard deviations around the mean value of the distribution. This explains why the support of each Gaussian shape function spans about 2 elements in the case with the smaller variance. Indeed, the shape of the interpolants is quite similar to standard finite element shape functions, but with a Gaussian profile. The remaining 1 per cent of the probability is comprised in the small ridges appearing in the middle of the elements.