Abstract
The least squares fit of observations with known error covariance to a strongconstraint dynamical model has been developed through use of the time evolution of sensitivity functions—the derivatives of model output with respect to the elements of control (initial conditions, boundary conditions, and physical/empirical parameters). Model error is assumed to stem from incorrect specification of the control elements. The optimal corrections to control are found through solution to an inverse problem. Duality between this method and the standard 4DVar assimilation using adjoint equations has been proved. The paper ends with an illustrative example based on a simplified version of turbulent heat transfer at the sea/air interface.
1. Introduction
Sensitivity function analysis has proved valuable as a mean to both build models and to interpret their output in chemical kinetics (Rabitz et al. [1], Seefeld and Stockwell [2]) and air quality modeling (Russell et al. [3]). Yet, the ubiquitous systematic errors that haunt dynamical prediction cannot be fully understood with sensitivity functions alone. We now include an optimization component that leads to an improved fit of model to observations. The methodology is termed forward sensitivity method (FSM)—a method based on least squares fit of model to data, but where algorithmic structure and correction procedure are linked to the sensitivity functions. In essence, corrections to control (the initial conditions, the boundary conditions, and the physical and empirical parameters) are found through solution to an inverse problem.
In this paper we derive the governing equations for corrections to control and show their equivalence to equations governing the socalled 4DVar assimilation method (fourdimensional variational method)—least squares fit of model to observations under constraint (LeDimet and Talagrand [4]). Beyond this equivalence, we demonstrate the value of the FSM as a diagnostic tool that can be used to understand the relationship between sensitivity and correction to control.
We begin our investigation by laying down the dynamical framework for the FSM: general form of the governing dynamical model, the type and representation of model error that can identified through the FSM, and the evolution of the sensitivity functions that are central to execution of the FSM. The dual relationship between 4DVar/adjoint equations is proved. The stepbystep process of assimilating data by FSM is outlined, and we demonstrate its usefulness by application to a simplified airsea interaction model.
2. Foundation Dynamics for the FSM
We have included a list of mathematical symbols used in this paper. These symbols and associated nomenclature are found in Table 1.
2.1. Prediction Equations
Let denote the state and let denote the parameters of a deterministic dynamical system, where and are column vectors, and are positive integers, denotes the time, and superscript denotes the transpose of the vector or matrix. Let be a mapping, where with for . The vector spaces and are called the model space and parameter space, respectively.
Consider a dynamical system described by a system of ordinary nonlinear differential equations of the form or in component form where denotes the time derivative of the state , with the given initial condition. The control vector for the model is given by , the combination of initial condition and parameters referred to as the control space. It is tacitly assumed that the map of in (1a) and (1b) is such that the solution exists and is unique. It is further assumed that has a smooth dependence on the control vector such that the first () partial derivatives of with respect to the components of also exist. The solution of (1a) and (1b) is known as the deterministic forecast of the state of the system at time . If the map in (1a) and (1b) depends explicitly on , then this system is called a time varying or nonautonomous system; if does not depend on , then the system is known as a time invariant or autonomous system.
Let be the observation vector obtained from the field measurements at time . Let be the mapping from the model space to the observation space .
If denotes the (unknown) true state, then we assume that is given by where is the additive (unobservable and unavoidable) noise. The mapping is known as the forward operator or the observation operator. It is further assumed that is a white Gaussian noise with mean zero possessing a known covariance matrix . That is, .
2.2. A Classification of Forecast Errors
The forecast error is defined as follows: the sum of a deterministic part and the random part induced by the observation noise. Our immediate goal is to analyze and isolate sources and types of forecast errors.
First, if the model map and the forward operator are without error, that is, exact, and if the control vector is also error free, then the deterministic forecast must be correct in the sense that , the true state. Then from (3), the forecast error is purely random or white Gaussian noise. That is,
Second, if and are known perfectly but has an error, then the forecast will have a deterministic error induced by the incorrect control vector. In such a case, we can, in principle, decompose the forecast error as a sum where the deterministic part, is purely a function of the control vector error.
Third, if , , and are in error, then the forecast error can be represented as where the deterministic part may have a complex (additive and/or multiplicative) dependence on errors in , , and .
The following assumption is key to our analysis that follows. The model of choice is faithful to the phenomenon under study. The system is predicted with fidelity—the forecasted state is creditable and useful in understanding the dynamical processes that underpin the phenomenon. Certainly, the forecast will generally exhibit some error, but the primary physical processes are included; that is, the vector field includes the pertinent physical processes. In this situation the forecast error stems from erroneously specified elements of control. Thus, in our study the forecast error assumes the form shown in (6). Dee’s work [5] contains a very good discussion of the estimation of the bias in (7) arising from errors in the model and/or observations.
2.3. Dynamics of FirstOrder Sensitivity Function Evolution
Since our approach is centered on sensitivity functions, we develop the dynamics of evolution of the forward sensitivities in this section.
Differentiating both sides of (1b) with respect to , interchanging the order of differentiation on the lefthand side, we obtain for and with as the initial condition.
These equations can be succinctly written in matrix form as with as initial condition. This system of linear timevarying ordinary nonhomogeneous differential equations describe the evolution of the elements of where the Jacobian matrices and are given by Similarly, by differentiating both sides of (1b) with respect to , we obtain for , with where is the standard Kronecker delta. These equations can be succinctly represented as with , the identity matrix. This system of linear, timevarying homogeneous equations governs the evolution of the elements of the matrix . Notice that (9) and (13) are independent of the observations and have the same system matrix on the righthand sides; thus, the homogeneous solutions to (9) and (13) have the same structure.
The evolution of the sensitivities (solution to (9) and (13)) is dependent on the solution to the governing dynamical equations ((1a) and (1b)). Generally, these equations are solved numerically using the standard fourthorder RungeKutta method. Rabitz et al. work [1] contains more details relating to solutions of (9) and (13). In special cases such as in air quality modeling, the sensitivity equations (9) and (13) exhibit extreme stiffness. Special methods are needed to handle the inherent stiffness of these equations. Seefeld and Stockwell work [2] includes a discussion of these issues. Gear’s work [6] is a good reference for a general discussion of stiff equations.
3. Duality between the FSM and 4DVar Based on Adjoint Method
Let be a sequence of observation vectors at times . The goal is to use these observations to improve the estimate of the control vector . This estimation problem is recast as a constrained minimization of an objective function given by where the model state evolves according to (1a), (1b), and is the known covariance of the observational errors at time .
Fundamental to minimizing (14) is the computation of the gradient of , denoted by . In the following we describe two ways of characterizing where and .
3.1. The Adjoint Method
This method is based on the basic principle that if is the first variation of induced by the perturbation in , then where denotes the standard inner product of two vectors and of the same dimension. Once the first variation is determined, the gradient can be found. In the following we exploit two basic properties of inner product:
(i) linearity
() adjoint property where is the transpose or the adjoint of the matrix .
From first principles (Chapter 24 in Lewis et al. [7]), it can be verified that the first variation of in (14) is given by where the forecast error given by is the Jacobian of forward operator with respect to .
By invoking the adjoint property in (17), (19) becomes where The first variation in at resulting from the perturbation in is given by (A.4) in Appendix A. Using (A.4) and the linearity of the inner product it follows that where we will refer to the first term on the righthand side of (23) as “Term I” and the second as “Term II.” Using the adjoint and linearity property in (17), we get from which we obtain Similarly, rewriting Term II as we get Hence, we obtain the components of which are used in conjunction with the minimization algorithm to find the optimal that minimizes in (14).
We conclude this discussion with an efficient recursive method for evaluating the expressions in (25) and (27). Define and for It can be verified that in (25).
Details on the recursive computation of (27) are given in Appendix C.
3.2. SensitivityBased Approach
Let us first consider the special case when . Then (14) becomes where we recall that and is the solution of the model equations (1a) and (1b) at time .
Setting and , the expression in (30) becomes identical to in (B.16) (Appendix B). Hence, by using (B.20) it follows that where is given by (22) and Now comparing (32) with (25)–(27), we obtain the duality relation: The sensitivity matrices on the lefthand side of (33) are obtained by solving the forward sensitivity equations, whereas the matrix products and are obtained by integrating along the path as given by (A.3) and (A.5) (Appendix A).
The primary advantage of the sensitivitybased approach is that it provides a natural interpretation of the expression for the gradient in (31). Recall from (22) that is the weighted version of the forecast error mapped onto the model space by the linear operator . Also the th column of is the sensitivity of the th component of the state vector with respect to the control vector given by . Thus, it follows from (31) that is a linear combination of the columns of which are the sensitivity vectors, where the coefficients of the linear terms are the components of . Thus, columns of with large norms that are associated with large forecast errors will be dominant in determination of the components of . In other words, we gain some insight into the interplay between corrections to control and forecast errors—something that can be seen through a careful examination of the sensitivity vector at various times from initial state to forecast horizon. (The illustrative example in Section 5 further explores this diagnostic function.) Expression (31) also enables us to isolate the effect of different components of on the performance index .
For the general case of observations at multiple times, (31) assumes the following form: This gradient is the sum of linear combinations of the columns of at various time instances. With so many directions (the directions associated with the columns of contributing to the components of , the connection between sensitivity, and forecast error is obscured.
4. Data Assimilation Using Sensitivity
We seek to find the solution to the following problem using the FSM. Given and , the control vector , the observation , and the error covariance of observations , find a correction to the control vector such that the new model forecast starting from () will render the forecast error purely random; that is, the systematic forecast error is removed and accordingly .
We start by quantifying the change in the solution resulting from a change in . Invoking the standard Taylor series expansion, we obtain where is the th variation of , the fraction of the total change that can be accounted by the th partial derivatives of with respect to and the perturbation . Since practical considerations dictate that the total number of correction terms on the righthand side of (35) be finite, we often settle for an approximation of only terms ( generally 2). This is a tradeoff between improved accuracy resulting from a large value of and the complexity of the resulting inverse problem. Although we have developed the methodology for secondorder analysis ( = 2, where is approximated by the sum ) (Lakshmivarahan and Lewis [8]), our development will follow the firstorder analysis where is given by the first variation ). Secondorder analysis is justified when is a significant fraction of —this occurs when and/or exhibit strong nonlinearity. It is further shown in Section 5 that iterative application of the firstorder method often leads to improved results.
4.1. FirstOrder Analysis
From first principles and (35) we obtain where is the Jacobian of with respect to , and is the Jacobian of with respect to .The matrices and , found as solutions of (13) and (9), respectively, are known as the firstorder sensitivity of the solution with respect to and , respectively, and the elements of these matrices are called sensitivity functions.
4.1.1. Observations at One Time Only
We first consider the case where observation is available at one time . The first variation in induces a variation in the forward operator . Again, by approximating by the first variation, we get where is the Jacobian of with respect to and is given by substituting (36) into (37), we obtain where and . Setting and where and , (39) becomes Given the operating point , our goal is to find the perturbation such that the observation is equal to the model counterpart, that is, or From (43), it follows that the required perturbation is obtained by solving the linear inverse problem where and .
4.1.2. Observations at Multiple Times
The above analysis can be readily extended to the case where observations are available at times. We denote these sets of observation vectors by where . The forecast error is given by Define Then at time we have where Now define a matrix and a vector as Then, the N relations in (46) can be succinctly denoted by A number of special cases arise depending on (a) the value of relative to , namely, over (under) determined cases when and (b) the rank of the matrix , namely, is of full rank or rank deficient. In all these cases, the linear inverse problem (43) is recast as a minimization problem using the standard least squares framework (Lawson and Hanson [9]). The resulting minimization problem can then be solved using one of many standard methods, for example, the conjugate gradient method (Lewis et al. [7]; Nash and Sofer [10]).
As an illustration, consider the case when and that the rank of is , that is, full rank. The solution is then obtained by minimizing the weighted least squares criterion where is an diagonal matrix with as its th diagonal block.
Although it is computationally efficient to minimize (50) by using a method like conjugate gradient, there is an advantage to analyze the properties of the optimal solution via the classical approach, that is, by setting the gradient of to zero. It can be verified that the minimizing is found by solving the symmetric linear system or succinctly as where and are defined in (48) and (51), and subscript “LS” refers to the least squares solution.
From the discussion relating to the classification of forecast errors, recall that the forecast error inherits its randomness from the (unobservable) observation noise. The vector on the right hand side of (53) is random and hence the solution of (53) is also random.
Since we are interested in forecast errors in response to incorrect control, we have Now define Hence, the vector in (48) can be expressed as with since . Substituting (56) into (53) and taking the expectation give Thus, the expected value of the correction to control is indeed a linear function of the forecast error itself. It can be verified (Lewis et al. [7]) that the covariance of the least squares estimate [(53)] is given by where is the Hessian of in (50).
5. A Practical Example: Air/Sea Interaction
We choose a simple but nontrivial differential equation to demonstrate the applicability of the forwardsensitivity method to identification of error in a dynamical model. We break this discussion into three parts as follows: (1) the model, (2) discussion of the diagnostic value of FSM, and (3) numerical experiments with data assimilation using FSM.
5.1. The Model
Consider the case where cold continental air moves out over an ocean of constant surface temperature. We follow a column of air in a Lagrangian frame; that is, the column of air moves with the prevailing lowlevel wind speed. Turbulent transfer of heat from the ocean to air warms the column. The governing equation is taken to be where : temperature of the air column : temperature of the sea surface : turbulent heat exchange coefficient (nondimensional),: speed of air column : height of the column (mixed layer)(m),: time (h).Equation (59) is nondimensionalized by the following scaling: The governing equation then takes the form Assuming , , , then Thus, we take our governing equation to be where . The solution to (63) is with and , that is, and = 2. The solution depends linearly on and but nonlinearly on .
There are three elements of control: initial condition, , boundary condition, , and parameter, .
5.2. Diagnostic Aspects of FSM
The Jacobians of with respect to and are given by and the Jacobians of the solution with respect to and are given by From (9) and (13) the evolution of the forward sensitivities is given by where Either by solving (67)–(70) or by computing directly from (64), it can be verified that the required sensitivities evolve according to The plots of the solution and the three sensitivities are given in Figure 1.
(a) Solution 
(b) Sensitivity of w.r.t. 
(c) Sensitivity of w.r.t. 
(d) Sensitivity of w. r. t 
Let be the direct observation of the state , namely, In this case, is the forecast variable and therefore . Then is the forecast error.
Following the developments in Section 4 for the case of a single observation described by (39)–(46), we obtain the analog of (46) as where is the forward sensitivity vector and . Clearly, (74) corresponds to an underdetermined linear least squares problem, whose optimal solution is given by [7, chapter 5] where is the unit forward sensitivity vector at time and is a scalar that is the forecast error normalized by the norm of the forward sensitivity vector . In other words, the direction of the optimal corrections to the control that annihilate the forecast error is a constant multiple of the unit forward sensitivity vector.
If we assume the following control vector: , that is, [, , and (nondimensional)], we get The time variations of elements of are given in Table 2 (also refer to Figure 1). From this table, it is clear that the direction of corrections to control varies as increases. At , the corrections lie in the direction , where is only sensitive to the initial condition . For large , the corrections lie in the direction , where is only sensitive to the boundary condition (the seasurface temperature). For intermediate times, all the components of control have nonzero sensitivities. reaches its maximum at .
5.3. Numerical Experiments
We assume that the incorrect control vector is ; in dimensional form, , , and (nondimensional). Thus, for an ideal correction to control, To mimic reality, the correction process uses sensitivity functions that stem from the erroneous solution, that is, where the incorrect control is used.
We have explored both the goodness and failure of recovery of control under two different scenarios, where either 3 or 6 observations are used to recover the control vector. Since there are 3 unknowns, the case for 6 observations is an overdetermined system.
We execute numerical experiments where the observations are spread over different segments of the temporal range—generally divided into an “early stage” and a “saturated stage.” By saturated stage we refer to that segment where the solution becomes close to the asymptotic state, that is, . The dividing time between these segments is arbitrary; but generally, based on experimental results to follow, we divide the segments at = 24 where (24) = 10.975, 0.025 less than = 11.0 (see Figure 1).
The following general statement can be made. If more than one of the observations falls in the saturated zone, the matrix becomes illconditioned. As can be seen from (39) and the plots of sensitivity functions in Figure 1, and and 0 as . Accordingly, if two of the observations are made in the saturated zone, this induces linear dependency between the associated rows of the matrix and in turn leads to illconditioning. This illconditioning is exhibited by a large value of the condition number, the ratio of the largest to the smallest eigenvalue of the matrix . The inversion of this matrix is central to the optimal adjustments of control (see (55)).
Illconditioning can also occur as a function of the observation spacing in certain temporal segments. This is linked to lack of variability or lack of change of sensitivity from one observation time to another. And, as can be seen in Figure 1, the absolute value of the slope in sensitivity function curves is generally large at the early stages of evolution and small at later stages. As an example, we find satisfactory recovery, (–0.882, – 0.067, +0.922), when the observations are located at 5.0, 5.1, and 5.2 (a uniform spacing of 0.1). Yet, near the saturated state, at = 20.0, 20.1, and 20.2, again a spacing of 0.1, the recovery is poor with the result = (+5.317, – 0.142, +0.998). The associated condition numbers for these two experiments are and , respectively. Similar results follow from the case where 6 observations are taken. In all of these cases, the key factor is the condition number of . For our dynamical constraint, a condition number less than ~1 portends a good result.
For the case where we have 6 observations at = 2, 7, 12, 17, 22, and 27, with a random error of 0.01 (standard deviation), we have executed an ensemble experiment with 100 members to recover control. In this case, the condition number is . Results are plotted three dimensionally and in twodimensional planes in the space of control, that is, plots of correction in the , , and planes. Results are shown in Figure 2.
(a) 3D clusterplot
(b) Cluster of corrections of versus 
(c) Cluster of corrections of versus 
(d) Cluster of corrections of versus 
Finally, we explore the iterative process of finding corrections to control. Here, the results from the 1st iteration are used to find the new control vector. This vector is then used to make another forecast and find a new set of sensitivity functions. The error of the forecast is obtained, and along with the new sensitivity functions, a second correction to control is found, and so forth. For the experiment with 6 observations that has been discussed in the previous paragraph, we apply this iterative methodology. As can be seen in Figure 3, the correct value of control is found in 3 iterations.
(a) Iterative correction
(b) Projection onto  plane 
(c) Projection onto  plane 
(d) Projection on to  plane 
6. Concluding Remarks
The basic contributions of this paper may be stated as follows.
(1) While the 4DVar has been the primary methodology for operational data assimilation in meteorology/oceanography (Lewis and Lakshmivarahan [11]), and while forward sensitivity has been a primary tool for reaction kinetics and chemistry (Rabitz et al. [1]) and air quality modeling (Russell et al. [3], to our knowledge these two methodologies have not been linked. We have now shown that the method of computing the gradient of by these two approaches exhibits a duality hitherto unknown.
(2) By treating the forward sensitivity problem as an inverse problem in data assimilation, we are able to understand the fine structure of the forecast error. This is not possible with the standard 4DVar formulation using adjoint equations.
(3) While it is true that computation of the evolution of the forward sensitivity involves computational demands beyond those required for solving the adjoint equations in the standard 4DVar methodology, there is a richness or augmentation to the information that comes with this added computational demand. In essence, it allows us to make judicious decisions on placement of observations through understanding of the time dependence of correction to control.
Appendices
A. Dynamics of Evolution of Perturbations
Let be the perturbation in the control vector and the resulting perturbation in the state induced by the dynamics (1a) and (1b). Our goal is to derive the dynamics of evolution of .
From first principles, the evolutionary dynamics of are given by the variational equation (Hirsch et al. [12]) or the tangent linear model (Lewis et al. [7]) where the Jacobians and are given by Define the integrating factor premultiplying both sides of (A.1) by and integrating, we get the solution of the linear nonhomogeneous and nonautonomous equation (A.1) as where From definitions (A.3)–(A.5), it can be verified that, for We now consider two special cases.
Case A. Let , that is, the initial perturbations are confined only to the initial condition, . Then setting , from (A.5) we see that . From (A.4) we get
Case B. Let , that is, the initial perturbations are confined only to the parameter, . Then setting , from (A.3), we see that , the identity matrix. Now from (A.4) it follows that
B. Computation of Sensitivity Functions
Let where and . Let and consider By applying the standard chain rule it can be verified that the gradient with respect to is given by where is the Jacobian of and is the (Jacobian) sensitivity of the vector with respect to at time .
Now consider a quadratic form where is a symmetric and positive definite matrix.
Then by the product rule where . By applying (B.10) to each of the two terms on the right side of (B.14), it follows that Finally, if then expand since does not depend on , by using (B.10) and (B.15), we get Define then That is, the gradient of with respect to is the linear combination of the sensitivity vectors, that is, the columns of , where the coefficients in this linear combination are the elements of the vector .
C. Computation of in (27)
Given , for , the expression on the righthand side of (27) can be efficiently computed as shown in Algorithm 1.

Then = − Grad. It is to be noticed that there is only matrixvector multiplication involved in these operations and not matrixmatrix multiplication.
Acknowledgments
At an early stage of formulating our ideas on this method of data assimilation, Fedor Mesinger and Qin Xu offered advice and encouragement. Qin Xu and Jim Purser carefully checked the mathematical development, and suggestions from the following reviewers went far to improve the presentation: Yoshi Sasaki, Bill Stockwell, and anonymous formal reviewers of the manuscript. S. Lakshmivarahan’s work is supported in part by NSF EPSCoR RII Track 2 Grant 105155900 and by NSF Grant 10515400, and J. Lewis acknowledges the Office of Naval Research (ONR), Grant No. N000140810451, for research support on this project.