#### Abstract

Difficulty of data assimilation arises from a large difference between the sizes of a state vector to be determined, that is, the number of spatiotemporal mesh points of a discretized numerical model and a measurement vector, that is, the amount of measurement data. Flow variables on a large number of mesh points are hardly defined by spatiotemporally limited measurements, which poses an underdetermined problem. In this study we conduct the sensitivity analysis of two- and three-dimensional vortical flow fields within a framework of data assimilation. The impact of measurement strategy, which is evaluated by the sensitivity of the 4D-Var cost function with respect to measurements, is investigated to effectively determine a flow field by limited measurements. The assimilation experiment shows that the error defined by the difference between the reference and assimilated flow fields is reduced by using the sensitivity information to locate the limited number of measurement points. To conduct data assimilation for a long time period, the 4D-Var data assimilation and the sensitivity analysis are repeated with a short assimilation window.

#### 1. Introduction

The use of measurement data to improve a numerical prediction is known as a data assimilation method in meteorological and oceanographic communities [1]. Recent data assimilation methods are based on the optimal control theory. As a consequence there are two major approaches: sequential and variational methods. The application of these methods to large scale problems in meteorology made the development of data assimilation methods somewhat independent of optimal control studies; that is, the effort is put into the reduction of numerical costs of those methods and the handing of the large degree of freedom of forecast models. One example is the invention of ensemble Kalman filter, which approximately represents system error covariance by an ensemble of a limited number of model runs. By this the cost of matrix operations in the Kalman filter algorithm can be drastically reduced. On the other hand, the numerical cost of variational methods such as the four-dimensional variational (4D-Var) method is usually smaller than the ensemble Kalman filter; therefore, the implementation of the 4D-Var data assimilation into the operational weather forecast was earlier than that of the ensemble Kalman filter. However, the cost for maintaining adjoint codes used for the 4D-Var data assimilation and the need for massive parallel computation, which is driven by the architecture of recent supercomputers, would accelerate the use of ensemble Kalman filters in meteorological community. Because of its rational approach to estimate a true state based on both observation and model prediction, the application of data assimilation methods is not limited to the area of meteorological and oceanographic studies. For example, a pioneering “hybrid wind tunnel” concept was proposed by Hayase et al. [2], in which measurements in a wind tunnel are used to improve the numerical simulation with coarse mesh resolution. The experiment was focusing on a von Karman vortex street in low Reynolds number flows.

The present authors have been studying the applicability of data assimilation methods in aeronautical researches. Numerical simulations of atmospheric turbulences such as clear air turbulence and aircraft wake turbulence were performed by the 4D-Var method coupled with aeronautical computational fluid dynamics (CFD) codes [3, 4]. Recent attempt is the mitigation of the uncertainty of a Reynolds-averaged Navier-Stokes (RANS) turbulence model by the use of the ensemble Kalman filter, where parameters of the Spalart-Allmaras turbulence model are optimized based on pressure measurements around the RAE2822 airfoil [5]. On the other hand, a classical nudging technique is used to initialize realistic aircraft wake in a computational domain to simulate aircraft wake vortices, although a high-fidelity RANS flow field is nudged instead of measurement data [6]. To alleviate the large numerical cost of data assimilation methods compared to a single model run, a reduced-order model (ROM) prediction of terrain-induced turbulence has also been conducted in combination with the particle filter [7]. Based on experiences gained from those applications our interest is to have a more general guidance to apply data assimilation methods to fluid dynamic problems with limited measurements.

Predictability in systems with the large degree of freedom is a topic of concern in numerical weather prediction (NWP) in conjunction with data assimilation [1]. Here, the predictability is defined by the growth rate of the distance of initially close trajectories in a phase space. From the work of Lorenz, it is widely known that the predictability of a certain type of dynamical system, for example, the Lorenz model, is limited due to deterministic chaos [8, 9]. This suggests a framework to consider the predictability of an actual dynamical system, such as a system of Navier-Stokes equations. Here the predictability of a system could be rephrased as the ease of data assimilation. To investigate the predictability of a system, the maximum Lyapunov exponent has been used to determine the global predictability. For an actual dynamical system, however, the local Lyapunov exponent, such as the finite size Lyapunov exponent (FSLE), is important to represent the predictability of the system [10]. Carrassi et al. coupled a breeding technique with a data assimilation procedure based on the concept of assimilation in unstable subspace [11]. They showed the stability of the data assimilation system based on the Lyapunov exponent and employed adaptive (targeted) observation to stabilize the system.

The idea of targeted observation has been studied in meteorological community to improve the NWP. The targeted observation has a large potential to use recently available measurements from aircraft (the Aircraft Meteorological DAta Relay, AMDAR) and unmanned aerial vehicles (UAVs) which can be considered as “flying sensors.” One of the major approach of the sensitivity analysis aiming for targeted observations is an adjoint-based method because the major operational weather centers employ the adjoint-based 4D-Var data assimilation system [12]. Daescu provided a framework of the forecast sensitivity with respect to observations in the context of the 4D-Var data assimilation [13]. Hossen et al. studied the effect of random perturbations on the forecast error [14]. The adjoint equation approach has a variety of applications including estimation problems [15–17] as well as data assimilation problems [3, 4, 18].

The present study is an attempt to investigate the impact of measurement strategy in a data assimilation system by using a sensitivity analysis method. We refer to the preceding work of the sensitivity analysis conducted by Daescu and Navon [12]. In this paper we consider an idealized situation in numerical experiments, that is, a system of a counter-rotating vortex pair where self-induced advection velocity realizes a transient flow field. The impact of the number of measurement points on the assimilated flow field is firstly investigated, and the possibility of adaptive/targeted measurement is explored in the numerical experiments where locations of the measurement points are optimized to effectively reduce the error against a reference flow field. The idea of adaptive/targeted observation in meteorology is interpreted in fluid dynamic problems where unsteady flows of much smaller scales are of interest. The rest of this paper is organized as follows. Section 2 describes the numerical methods for flow simulation and data assimilation including the sensitivity analysis. The computational setting of two- and three-dimensional flow fields defined by a vortex pair is presented in Section 3. In Section 4, we discuss the results of the sensitivity analysis and the impact of measurement strategy in the two- and three-dimensional flow fields. Section 5 concludes our study.

#### 2. Numerical Methods

##### 2.1. Governing Equations and Numerical Methods

For flow simulation we employ incompressible Navier-Stokes equations:
where and represent the velocity components in three spatial directions () and the pressure deviation from the reference state , respectively. denotes the strain rate tensor. The summation convention is used for the velocity components . In this study, a typical value of air density is employed: kg/m^{3}. The kinematic viscosity in (1) is defined by the sum of molecular viscosity and eddy viscosity obtained by a subgrid-scale model.

The equations are discretized by the fully conservative fourth-order central difference scheme [19]. Time integration is performed by the third-order low storage Runge-Kutta scheme [20]. The Lagrangian dynamic model is used as a subgrid scale model for large-eddy simulation (LES) [21], which avoids excessive eddy viscosity in vortex core region [22]. The adjoint code is derived first by linearizing the above equations and then by rewriting it from backward by replacing inputs and outputs of each code line. The latter operation corresponds to a transpose of the matrix composed of coefficients of the linearized Navier-Stokes equations. Both Navier-Stokes and the adjoint code are parallelized by a domain decomposition approach using the message passing interface (MPI) library.

##### 2.2. Sensitivity Analysis of an Unsteady Flow Field

The objective of the 4D-Var data assimilation is to obtain an initial flow state which reproduces corresponding measurements during a certain time period (assimilation window) [1]. Figure 1 shows a schematic of the 4D-Var data assimilation. The vertical and horizontal axes show a flow state in a phase space and time, respectively. A solid black line shows a trajectory of a true flow state. Broken lines show trajectories of the simulated flow state starting from different initial states. The 4D-Var data assimilation searches the initial flow state of the true trajectory by evaluating the difference between those trajectories and measurements within an assimilation window.

The difference between measurements (usually measurements have spatiotemporally less information compared to a prediction model) and corresponding numerical results evaluated by conducting the numerical simulation over a period of time is represented as a cost function with respect to an initial flow state : Here is a measurement operator which converts the dimension of computational flow variables into that of measurement data to evaluate these differences, while a vector represents measurements. Subscript shows a time step of the flow computation and is the total number of time steps. Equation (3) is a function of ; that is, the data assimilation process is formulated as a minimization problem of with a control variable . In a standard formulation of a cost function for the 4D-Var data assimilation, a background error term is taken into account. Here we do not consider the background error term because we attempt to retrieve a flow field based on available measurements; therefore, a background state just plays a role of initial conditions for the 4D-Var iteration. Note that we follow the notation of quantities used in meteorological community [23], except for a state vector , because a variable is used as a spatial coordinate in (1). The 4D-Var method handles measurement errors through a measurement error covariance matrix , where its elements are the covariance between measurements. In this study is set to the identity matrix. Basically the covariance matrix defines the relative importance of measurements. We do not consider the relative importance of each measurement in this study; instead we attempt to locate measurement points effectively during the 4D-Var iteration.

To obtain the gradient of which is used for the minimization of , a Lagrange function is introduced using a Lagrange multiplier vector . The procedure to obtain the gradient is finally written as follows [3]: Equation (4) shows that the gradient of is obtained by the inverse time integration of using the adjoint operator with a force term: . After obtaining the gradient, the minimization of is conducted by the quasi-Newton method modifying the initial flow variable . The Hessian matrix is approximated using the limited-memory Broyden-Fletcher-Goldfarb-Shanno (BFGS) method [24, 25]. In this method, memory requirements are reduced because the approximated Hessian matrix is not stored explicitly.

In this study we investigate the impact of measurements on a retrieved flow field within a framework of the 4D-Var data assimilation. We refer to the preceding work of Daescu [13] and Hossen et al. [14] regarding the derivation of the sensitivity of the 4D-Var cost function to the observation (observation sensitivity). We start with rewriting (4) as where a linearized operator advances a flow state from the time to . By differentiating (5) with respect to the observation vector , we obtain On the other hand, the gradient of the 4D-Var cost function (3) could become at a certain step during the 4D-Var iteration, where denotes a nonzero constant vector. Referring to the derivation in Daescu [13] we consider the differentiation of the implicit function, which is a function of an initial flow state and an observation vector , as follows: Note that Daescu [13] considers the implicit function which is a function of an analysis state and an observation vector ; that is, , where the first-order optimality condition is satisfied. On the other hand, we apply the implicit function differentiation to the gradient of the 4D-Var cost function during the minimization process, where the 4D-Var cost function does not satisfy the optimality condition. Numerical experiments are conducted in this paper to investigate the applicability of this treatment in a data assimilation system based on Navier-Stokes equations. The rigorous analysis of the applicability is beyond the scope of the present study.

Using (6) and (7), as well as a chain rule , we obtain the sensitivity of the 4D-Var cost function with respect to the observation as where represents a Hessian matrix, and we use the approximated Hessian matrix obtained from the limited-memory BFGS. As described in Daescu and Navon [12], the use of the approximated Hessian matrix may introduce errors in the observation sensitivity evaluation that are difficult to quantify. A mapping of the observation sensitivity onto the mesh points of the LES is conducted as follows: In the following the observation sensitivity is rephrased as the measurement sensitivity, which is more appropriate to describe quantities acquired in laboratory experiments.

The development of linear and adjoint codes can be done by step-by-step processes in the following way. First the derived linear code is checked by where the left-hand side decreases with the order of . Here the small perturbation to the reference state is scaled by . It is also possible to check the angle, , which decreases with . The adjoint code is a transpose of the linearized code and is derived line by line without composing an explicit matrix, where the following relation holds in the order of 10^{−14} with the FORTRAN double precision real. The above validation procedures can be conducted in small program modules such as convective and diffusion terms of Navier-Stokes equations as well as a whole code including all terms and a time integration part. Finally, the gradient computation is validated by , where again the left-hand side decreases with . Table 1 shows the validation of the adjoint code based on small state perturbations. On the other hand, Table 2 shows the strong scaling of the parallel gradient computation where the number of total mesh points is fixed while increasing processor numbers.

#### 3. Computational Setting

##### 3.1. Computational Domains

We consider a flow field defined by a vortex pair in two- and three-dimensional computational domains. The computational setting of the two-dimensional case is as follows. A flow field is defined by a pair of Lamb-Oseen vortices which are characterized by vortex circulation m^{2}/s, vortex core radius of m, and vortex separation m. The vortex flow field is initialized within a domain bounded by sides of and m as shown in Figure 2. A constant mesh spacing of 2 m is used for all spatial directions, resulting in a total number of mesh points of 16,384. Parallel computation is performed by a domain decomposition approach, where the divisions of and in - and -directions are considered. In the three-dimensional case, we consider a domain bounded by sides of , , and m as shown in Figure 3(a). The vortices are initialized along the -axis. A constant mesh spacing of 2 m is used for all three spatial directions. The resulting number of mesh points is 524,288. Parallel computation is performed with the divisions of , , and in -, -, and -directions, respectively. For both cases, molecular kinematic viscosity is set to m^{2}/s such that the vortex circulation based Reynolds number amounts to . Periodic boundary condition is used for all spatial boundaries in the two- and three-dimensional cases considered in this study.

**(a)**

**(b)**

##### 3.2. Procedure of Numerical Experiments

In the two-dimensional case, a numerical experiment is conducted first by generating a reference flow field starting from the conditions mentioned above. From the reference flow field, we acquire pseudomeasurements based on the following strategies; that is, velocity components of all mesh points are extracted as measurements (referred to as 2D-F), velocity components on every second mesh point in both - and -directions are used (2D-H), and those on every fourth mesh point are used as measurements (2D-Q). In comparison with 2D-F, the number of measurements is one fourth in 2D-H and one sixteenth in 2D-Q. Then the 4D-Var cycle (forward time integration for the evaluation of the cost function, backward time integration of the adjoint code, Hessian matrix computation with limited-memory BFGS, and linear search) is started from an arbitrary flow field, where we consider a weaker vortex pair with larger vortex separation compared to the reference flow field ( m^{2}/s, m and m). Targeted measurement process starts after a few iterations of the 4D-Var cycle. Having an approximated Hessian matrix and the gradient from the adjoint code, the measurement sensitivity can be computed by using (8). Here we only consider the measurement sensitivity at the beginning of time integration. Using the measurement sensitivity mapped onto the numerical mesh by (9), measurement points are redistributed based on the magnitude of the measurement sensitivity, where the total number of the measurement points is kept constant, that is, one fourth of the LES mesh in 2D-HA and one sixteenth of the LES mesh in 2D-QA. The time integration of the Navier-Stokes equations is conducted until one tenth of vortex reference time s, that is, 3.35 s. The vortex pair descends a distance of one tenth of vortex separation during this period [22].

Another procedure of the numerical experiment is considered in two- and three-dimensional vortex evolutions for a long time period of one vortex reference time , where the 4D-Var iteration and the sensitivity analysis are repeated with a short assimilation window. We consider the cases which use velocity components of the all mesh points as measurements (referred to as 2DL-F and 3DL-F for two- and three-dimensional cases, resp.), velocity components on every second mesh point in each spatial direction as measurements (2DL-H, 3DL-H), and the case in which every second mesh point in each spatial direction is used as measurements along with the redistribution based on the measurement sensitivity (2DL-HA, 3DL-HA). The vortex system of the reference case is characterized by vortex circulation m^{2}/s, vortex core radius of m, and vortex separation m. To invoke three-dimensional instability of a vortex pair such as the Crow instability, a sinusoidal perturbation with the amplitude of 7.2 m is applied along vortex centerlines as in Figure 3(a). On the other hand, the data assimilation starts from the conditions of m^{2}/s, m, and m, where the sinusoidal perturbation is not applied as shown in Figure 3(b). The absence of the sinusoidal perturbation critically affects the evolution of a vortex pair in three dimensions; that is, the Crow instability leading to vortex reconnection does not appear without the source of instability such as the sinusoidal perturbation [22].

#### 4. Results and Discussion

##### 4.1. Two-Dimensional Case

Figure 4 shows the history of the cost function defined by (3) for three measurement strategies, while the global error evaluated by the quantities on all mesh points is also presented. Here the horizontal axis shows the number of the 4D-Var iteration. Figure 4 indicates that the decrease of the cost function in the 4D-Var data assimilation is not always connected with the convergence of the retrieved flow field to the reference flow field. In 2D-F the value of the cost function and the global error are identical because three velocity components on all mesh points in the domain are used as measurements. The value of the cost function is proportional to the number of measurement points and time steps; therefore, the drastic reduction is realized in 2D-F compared to 2D-Q. On the other hand, the decrease of the global error becomes slow as the number of measurements is reduced. This confirms that there is a limitation for defining a flow field by using a limited number of measurements, that is, a number smaller than a degree of freedom of a numerical model. Note that the behavior of the cost function and the global error might vary depending on the data assimilation methods used and parameters included in the optimization algorithms; however, the tendency shown here may be unchanged in the case of other data assimilation methods.

Figure 5 shows a similar plot as Figure 4, but with adaptive measurement strategy, that is, 2D-HA and 2D-QA. Here the redistribution of measurement points is conducted at every 5th iteration of the 4D-Var assimilation cycle. Until the 5th iteration the values of the cost function and the global error are the same as those of Figure 4. At the 6th iteration the value of the cost function rapidly increases because the measurement points are redistributed to the regions where the global error is relatively large. The cost function again decreases after a few iterations. During the reduction of the cost function, the global error values are also decreased in 2D-HA and 2D-QA, and the values become smaller than those in Figure 4. This implies that the global error can be effectively reduced by locating measurement points in regions where the relative error is large. And it is realized by evaluating the measurement sensitivity during data assimilation. Even with the adaptive measurement strategy, the global error of 2D-HA and 2D-QA is larger than that of 2D-F where measurements are given at all mesh points.

Figure 6 shows the distribution of measurement points in 2D-HA, where the total number of measurement points is one fourth of that in 2D-F. Figure 6(a) shows the initial distribution, and Figure 6(b) shows the distribution after the first redistribution of measurement points conducted at 6th iteration of the 4D-Var cycle. These measurement points are colored by the magnitude of the measurement sensitivity at those locations, where reddish color indicates larger sensitivity. It is confirmed that the measurement points are clustered near vortices. Since the measurement points are redistributed using the mesh points of the numerical simulation, the measurement points do not come closer than the distance of mesh spacing. The resolution of measurements finer than that of numerical simulation may not improve the retrieved flow field, although accurate estimates of the observational errors, including the error correlation structures, become available in the context of data assimilation. Figure 6(c) shows the distribution of measurement points at 16th iteration, where the distribution is similar to that of Figure 6(b). In the same way, Figure 7 shows the measurement points of 2D-QA where the total number of measurements is one sixteenth of that in 2D-F. As in the 2D-HA the measurement points are clustered near vortices after the first adaptation.

**(a)**

**(b)**

**(c)**

**(a)**

**(b)**

**(c)**

##### 4.2. Two-Dimensional Case for a Long Time Period

In this section, we again consider a two-dimensional case, but for a longer time evolution considering multiple assimilation windows during that period. Figure 8 shows the history of the cost function for several data assimilation strategies, where the measurements produced from the reference flow field are provided for data assimilation in 2DL-F, while every second mesh point in each spatial direction has measurements in 2DL-H; that is, the total number of measurement points is reduced to one fourth of the 2DL-F. On the other hand, 2DL-HA is the case where measurement points are redistributed based on the measurement sensitivity. The number of measurement points is the same as 2DL-H. As in the previous section, the cost function indicates the difference between the reference and the reproduced flow fields evaluated at measurement points. And the global error indicates the error evaluated at all mesh points. The horizontal axis shows the number of iterations where the 108th iteration corresponds to one vortex reference time s. Here the period is divided into nine subperiods and the 4D-Var iteration is conducted for 12 times during each subperiod. The cumulative number of the 4D-Var iteration is shown in the horizontal axis of Figure 8.

The 2DL-F shows a drastic reduction of the cost function, where the values of the cost function and the global error are the same in this case. The 2DL-HA shows the degraded reduction of both cost function and global error. Even so, the global error keeps deceasing during the evolution of a vortex pair. 2DL-HA indicates the improved error reduction compared to 2DL-H.

Two-dimensional evolution of a vortex pair is well evaluated by the evolution of vortex positions because a vortex pair descends due to self-induced velocity; therefore, the velocity distribution needs to be correctly reproduced to obtain correct vortex positions in time. Figure 9 shows the vertical vortex positions from the reference case, from an initial flow field before data assimilation and from the reproduced flow field by data assimilation. Here the results of 2DL-H and 2DL-HA are shown. Before the data assimilation, the vortex descent is reduced due to the weaker vortex circulation compared to the reference case. For 2DL-H and 2DL-HA, a descending distance is comparable to that of the reference case. The difference between 2DL-H and 2DL-HA is not significant.

Figure 10 shows the distribution of measurement points during the time integration in 2DL-HA. Those points are colored by the magnitude of the measurement sensitivity, where reddish color indicates the larger sensitivity. Initially the measurement points distribute near a vortex pair as shown in Figure 10(a). Later the measurement points spread to wider area and the magnitude of measurement sensitivity becomes small.

**(a)**

**(b)**

**(c)**

##### 4.3. Three-Dimensional Case for a Long Time Period

In this section, we consider the three-dimensional evolution of a vortex pair. As in the previous section, we use an iterative procedure for a long time evolution of one vortex reference time. Figure 11 shows the history of the cost function for several data assimilation strategies, where the measurements produced from the reference case are provided for data assimilation in 3D-F, while every second mesh point in each spatial direction has measurements in 3D-H; that is, the total number of measurement points is reduced to one eighth of the 3D-F. On the other hand, 3D-HA is the case where measurements points are redistributed based on the measurement sensitivity. The horizontal axis shows the number of iterations in the same way as in Figure 8. As in the previous two-dimensional case, 3D-HA case effectively reduces the global error compared to 3D-H.

Figure 12 shows the evolution of a vortex pair, where the sinusoidal disturbance applied to the reference case substantially affects the evolution of a vortex pair such as vortex reconnection. The evolution of the reference case is shown in Figures 12(a) to 12(c), while that of the reproduced flow field is shown in Figures 12(d) to 12(f). The initial state of the reproduced vortex pair in Figure 12(d) is disturbed near the domain edge, that is, the position of vortex reconnection; however, it disappears as the data assimilation process proceeds. Note that the vortex pair shown in Figure 3(b) keeps descending without any three-dimensional vortex deformations if we do not apply the 4D-Var data assimilation. Therefore, the flow field at appears very different with or without assimilating measurements.

**(a)**

**(b)**

**(c)**

**(d)**

**(e)**

**(f)**

Figure 13 shows the evolution of measurement points colored by the measurement sensitivity. For visualization purpose, a measurement point which has measurement sensitivity smaller than a certain threshold is not shown. As in Figure 13(a), the measurement points are focused near a vortex pair, especially near the position of vortex reconnection. In the later time in Figure 13(c), the measurement points cluster near the position of vortex reconnection. It is confirmed that the error of the reproduced flow field against the reference flow field remains near the position of vortex reconnection, and the measurement points are effectively distributed near the location with the higher measurement sensitivity.

**(a)**

**(b)**

**(c)**

#### 5. Conclusions

In this study we conducted the sensitivity analysis of a vortical flow field within a framework of the 4D-Var data assimilation. The idea of adaptive/targeted observation in meteorology which aims to effectively determine a flow state by limited measurements was employed in fluid dynamic problems where unsteady vortical flows of much smaller scales are of interest. We firstly investigated a two-dimensional flow field defined by a pair of vortices which descends due to self-induced advection velocity. The amount of measurement points affected the convergence of the cost function as well as that of the global error against the reference flow field. The measurement strategy based on the measurement sensitivity effectively redistributed measurement points near vortices. This resulted in the further reduction of the global error. Then, the same configuration with a longer time period was investigated by repeating 4D-Var data assimilation with a short assimilation window, where the impact of adaptive measurement was also confirmed. In the three-dimensional configuration of a vortex pair, the reproduction of the flow field becomes difficult because there exist three-dimensional mechanisms leading to flow instabilities such as the Crow instability. The redistribution of measurement points based on the measurement sensitivity successfully reduced the global error against the reference flow field. During the data assimilation iterations, the measurement points were focused near the position of the vortex reconnection which helps to reproduce the onset of the vortex reconnection.

#### Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

#### Acknowledgments

The authors would like to thank the anonymous reviewer for the constructive comments, which helped us to improve the paper. The authors also thank the Advanced Fluid Information Research Center at Institute of Fluid Science, Tohoku University, for the computational resources used in this study.