#### Abstract

Based on an internal tidal model, the practical performances of the limited-memory BFGS (L-BFGS) method and two gradient descent (GD) methods (the normal one with Wolfe’s line search and the simplified one) are investigated computationally through a series of ideal experiments in which the open boundary conditions (OBCs) are inverted by assimilating the interior observations with the adjoint method. In the case that the observations closer to the unknown boundary are included for assimilation, the L-BFGS method performs the best. As compared with the simplified GD method, the normal one really uses less iteration to reach a satisfactory solution, but its advantage over the simplified one is much smaller than expected. In the case that only the observations that are further from the unknown boundary are assimilated, the simplified GD method performs the best instead, whereas the performances of the other two methods are not satisfactory. The advanced L-BFGS algorithm and Wolfe’s line search still need to be improved when applied to the practical cases. The simplified GD method, which is controllable and easy to implement, should be regarded seriously as a choice, especially when the classical advanced optimization techniques fail or perform poorly.

#### 1. Introduction

In the recent years, with the rapid advances in ocean observing systems, increasingly more oceanic observation data have become available. At the same time, mainframe supercomputers have become more powerful, with improvements in memory size, the introduction of multiprocessor capabilities as well as enhanced processor speed. These advances have provided us with an opportunity to improve the precision of numerical simulations of the ocean with these available observation data, for both operational and research purposes. Indeed, this is an active topic of a fast-growing research field, data assimilation, which is an effective method for marine research, and it has become widely used in meteorological and oceanographic predictions in recent years. Among all data assimilation methods, four-dimensional variational (4D-Var) data assimilation is often considered as one of the most effective and powerful approaches developed over the past two decades. It is an advanced data assimilation method that involves the adjoint technique. This method has the advantage of directly assimilating various observations distributed in time and space into the numerical model and can maintain dynamical and physical consistency with the model at the same time. Thus, 4D-Var has been widely applied to meteorological and oceanographic data assimilation, sensitivity studies, and parameter estimation. The issue of 4D-Var applied to the shallow water equations model has been the subject of numerous works (see, e.g., [1–29] and references therein).

In this paper, we are interested in solving a large-scale optimization problem minimizing a cost functional where represents the control variables and is defined by the open boundary conditions (OBCs) for a regional internal tidal model, which are denoted by a series of Fourier coefficients in the present work, is the time, denotes the assimilation window where and are the initial and final time, respectively, and and are the time-dependent state variable in an -dimensional Euclidean space and the time-dependent observation variable in an -dimensional Euclidean space , respectively. The operator represents a projection from the -dimensional space () of the model solution to the -dimensional space () of observations. Superscript indicates transpose. The components of are values of the various model fields (elevation, velocity, temperature, salinity, etc., alone or in combination) at each of the model grid locations. The number of components of is denoted by , and the number represents the number of observations at any given time. In general, is larger than . In fact, the objective function is the weighted sum of squares of distance between the model solution and available observations distributed in space and time. is the weighting matrix and theoretically should be the inverse of observation error covariance matrix [1, 2]. However, determining the “exact” form for is far from easy [3]. Since this work represents a preliminary study in the application of an adjoint assimilation model, we proceed under the best and even unrealistic scenario by assuming that there are no observation errors and that the model solutions will have a perfect fit to the observations such that . Therefore, is simplified to be unit matrix [30].

The control variables and the state vector satisfy the time-dependent model equations of the form where is model vector function of . In general, the model equations (2) are nonlinear and are assumed to be twice continuously differentiable.

The objective of the adjoint assimilation method is to find model control variables that minimize the cost functional . Note that is only a function of the control variables because is uniquely defined by the model equations (2) for any prescribed . The control variables can be initial conditions, boundary conditions or parameters to be estimated, or combination of them. In a regional ocean model, the open boundary conditions, which must be prescribed to complete the model description at non-land boundaries, are very important and have a critical impact on the modeling results. Therefore, for simplicity, only OBCs are taken as control variables in this paper. A number of studies have been concerned about the OBC estimation with adjoint method. For example, the work of Lardner et al. [4] discussed the optimal control of OBC in the channel using a 2D adjoint tidal model. Seiler [5] performed a series of identical twin assimilation experiments using the adjoint method to estimate lateral open boundary values of stream function and relative vorticity for a quasigeostrophic open ocean model. Zou et al. [2] also developed a sequential open boundary control scheme augmenting radiation conditions and applied it to idealized barotropic wind-driven ocean simulations. ten Brummelhuis et al. [6] used the data assimilation technique to estimate the OBCs for a shallow sea model of the entire European continental shelf. Heemink et al. [7] used the adjoint approach for a 3D shallow water flow system (TRIWAQ) to estimate the harmonic constants in the OBCs, the friction parameter, viscosity parameter, and water depth by assimilating tide gauge data as well as altimeter data. Shulman [8] proposed a data assimilation approach to specify open boundary conditions, and their method was then tested by Shulman et al. [9] with respect to the improvement of the model prediction skills of the tidal amplitudes and phases in the Yellow Sea. Taillandier et al. [10] used a four-dimensional variational data assimilation method to control boundary conditions in a nonstratified open coastal model. In the work of Zhang et al. [11], lateral tidal OBCs that forced tides in the internal region were estimated by assimilating predicted coastal tidal elevations into a 2D POM with the adjoint method. Gejadze and Copeland [12] studied the open boundary control problem for free-surface barotropic Navier-Stokes equations with adjoint data assimilation technology. By assimilating the tidal harmonic constants derived from T/P altimeter data with a 3D numerical barotropic adjoint tidal model constructed in [13], Zhang and Lu [14] optimized the OBCs and simulated the tide and tidal current in the Bohai and Yellow Seas. Based on a 2D tidal model, Cao et al. [15] and Guo et al. [16] studied the inversion of OBCs by assimilating the T/P altimeter data with adjoint method. Kazantsev [17] used a variational data assimilation technique to control boundary conditions on rigid boundaries for a shallow-water model. Recently, in the work of Chen et al. [18], an adjoint assimilation model for the simulation of internal tides was constructed and simply tested with a series of ideal experiments in which several prescribed spatial distributions of OBCs were successfully inverted by assimilating the model-generated pseudoobservations.

The previous minimization problem (1) requires large-scale optimization techniques. This problem attempts to find a solution of the model equations (2) that best fits, in the generalized least-squares sense, the observations distributed in space and time. In meteorology and oceanography, the fluid dynamics can be enforced through the use of a Lagrangian function constructed by appending the model equations to the cost function as constraints. To solve problem (1), many feasible large-scale optimization methods can be found in [31]. However, the number of studies discussing the performances of various optimization methods in the meteorological and oceanographic application is still relatively small. Wang et al. [19] proposed an adjoint truncated Newton algorithm for the large-scale unconstrained minimization in the meteorology application and applied this algorithm to a limited-area shallow-water equation model with model-generated data where the initial conditions served as control variables. Subsequently, Wang et al. [20] presented a new adjoint Newton algorithm for carrying out large-scale unconstrained optimization required in variational data assimilation and applied this algorithm to three 1D models and to one 2D limited-area shallow-water equation model with both model-generated and First Global Geophysical Experiment data. Zou et al. [21] compared the performances of several limited-memory quasi-Newton and truncated Newton methods for unconstrained nonlinear optimization on some large-scale problems in oceanography and meteorology. Their results showed that among the tested limited-memory quasi-Newton methods, the limited-memory BFGS (L-BFGS) method of Liu and Nocedal [32] has the best overall performance for the problems examined, and the numerical performance of two truncated Newton methods, differing in the inner-loop solution for the search vector, is competitive with that of L-BFGS. More recently, Alekseev et al. [33] compared the performance of several advanced large-scale minimization algorithms, including the conjugate gradient method, quasi-Newton (BFGS), the limited-memory quasi-Newton (L-BFGS), Hessian Free Newton method, and a new hybrid algorithm, applied to the minimization of the cost functional in the solution of ill-posed inverse computational fluid dynamics problems related to parameter estimation. In the work of [14], the comparison of a simple gradient descent (GD) method with the L-BFGS algorithm was conducted within a 3D barotropic tidal model, and the results demonstrate that, compared with the steepest descent method, the L-BFGS really uses much fewer steps to reach a minimum, while the values of the minimized cost function and inverted Fourier coefficient are almost the same. In the work of [18], the L-BFGS version of Liu and Nocedal [32], combined with the adjoint assimilation technique, was applied to the optimization of the OBCs for an internal tidal model and was proved to be very efficient.

The main objective of this paper is to make a computational investigation on the performances of the L-BFGS method and two versions of GD method. All these methods do not require any evaluations of the Hessian matrices but gradient vectors and thus are computationally feasible. In the present work, the number of the control variables (OBCs) is often large. In this case, the limited-memory methods are suggested, and one of the most commonly used limited-memory methods is the L-BFGS method [31, 32]. The L-BFGS method is an adaptation of the standard BFGS method to large-scale optimization problems. This method has been proved to be globally convergent for convex objective functions and provides a fast rate of linear convergence as well as requiring minimal storage. As the competitor, we choose the GD method. The GD method is a fundamental and probably the earliest method for nonlinear optimization so that it attracts many researchers’ attention since its proposition. A recent and interesting report on the GD method can be found in [31, 34]. The GD method has the negative gradient direction as the search direction and is one of the simplest minimization algorithms that require calculation of derivatives. The GD method is particularly useful when the dimension of the problem is very large [35]. Two versions of GD method are considered here, differing in the formulation of the step length. One normally uses the step length satisfying the strong Wolfe conditions [31, 36, 37] which have been widely used in line search (see [38, 39] and references therein). And the other is a simplified GD method without the line search which has been used in a relatively small number of studies [13, 14, 22, 23]. In this paper, based on the model constructed by Chen et al. [18], the performances of the L-BFGS method and the two versions of GD method are compared through a series of ideal experiments in which the OBCs for a numerical internal tidal model are estimated by assimilating the interior observations with the adjoint method.

This paper is organized as follows. We start with a brief introduction of the two-layered internal tidal model in Section 2. For the details of the model, we refer interested readers to [18]. The optimization methods for comparison, including the L-BFGS method and two GD methods, are described briefly in Section 3. In Section 4, as illustrative numerical examples, a series of ideal experiments are carried out to invert the prescribed OBCs with the different methods examined here, and the experimental results are presented. Finally, we make a summary and draw some conclusions in Section 5.

#### 2. Internal Tidal Model

##### 2.1. Forward Model

A two-layer version of the numerical internal tidal model described in [18] is considered in this paper. Assuming that the potential density () in each layer is constant, the layer-averaged, nonlinear, time-dependent continuity and momentum equations of each layer subject to the hydrostatic approximations are derived from the primitive 3D governing equations. Here, the subscripts 1 and 2 refer to the upper and lower layers, respectively. Using spherical coordinates in the horizontal direction and isopycnic coordinates in the vertical one, we obtain the internal mode equations as follows:

upper layer:lower layer:Here, is the time, and are the eastern longitude and northern latitude, respectively, and are horizontal velocities in and , respectively, is the time-varying layer mass, and where and are the undisturbed layer thickness and interface (surface for ) elevation above the undisturbed level, respectively. is the radius of the earth, the gravitational acceleration, the Coriolis parameter, and , where represents the angular speed of Earth’s rotation, the horizontal eddy viscosity coefficient, the Laplace operator, and and are the interface and bottom friction coefficient, respectively, , and . In the forward model , and are the main outputs and called the state variables in this paper.

By defining the barotropic currents we can derive a 2D external mode from the internal mode equations. For further details about the external mode, we refer the interested reader to the previous work of [18].

Usually, the model domain is defined within a certain range of space and time, enclosed by the initial and boundary conditions. In the present model, a zero field is assigned to the model state variables as the initial conditions. The model is run for several tidal cycles, and the simulation results in the last cycle are taken as the model output. The boundary located in the wet grid is treated as the open boundary, and the boundary located in the dry grid (e.g., the island and the land) is treated as the closed boundary. The details of the boundary conditions are described as follows.

The Flather condition which was originally proposed by Flather [40] yielded good results [18, 41] when applied to the open boundaries in this model. In this paper, an adaptation of the Flather condition used by [18] is installed in the external mode of the forward model, which is given as follows: where is external data beyond the model boundary representing the clamped surface elevation relating to the boundary barotropic tidal force and is the time-varying total water depth. The sign in (8) depends on the boundary location (positive for eastern and northern boundaries; negative for western and southern boundaries).

For the internal mode, the relaxation conditions, which have been found to have a robust performance in a variety of situations [42, 43], are applied at open boundaries. In this condition, a relaxation region of several grids close to the boundary is defined. Within this region, total tidal values including the interface elevation and fluid velocity of both barotropic and baroclinic tides are first calculated using the standard discretization and then relaxed towards the barotropic values as follows: to give the final values . Here, represents the tidal states , , or , and represent the barotropic and baroclinic tides, respectively, and is the relaxation factor in grid . In our model, is chosen to be increased linearly from 0 to 1 while the grid is getting close to the open boundary.

Closed boundary conditions for both the internal and the external modes are zero flow normal to the coast; that is, for the grid points at closed boundary, where is the outward unit vector and is the velocity vector.

##### 2.2. Adjoint Model

The adjoint method is a powerful tool for parameter estimation. The basic idea of the adjoint method is quite simple: a model is defined by an algorithm and its control variables including initial conditions, boundary conditions, and empirical parameters. The cost function that measures the data misfit between the model output and observations is minimized through optimizing the control variables. In detail, the cost function decreases along a certain search direction which can be calculated with a certain optimization algorithm according to the gradient of cost function with respect to the control variables, and this gradient is calculated by what has been known as the adjoint model. Based on the governing equations (3a)–(4c) of the forward model, its adjoint model can be constructed as follows. The details of the adjoint model in a generalized form (not limited to the two-layer case) can be found in the work of [18].

Concretely, the cost function is defined by where is layer index, denotes the model domain of both time and space, represents the generalized control variables and is defined by the open boundary conditions (OBCs) for a regional internal tidal model, which are denoted by a series of Fourier coefficients in the present work, and are the models simulated, and and are the observations. In (10), the variables and can be uniquely determined by the control via the model equations (3a)–(4c). Hence, the functional is implicitly dependent on .

The Lagrangian function is defined by where , , and are called adjoint variables of the state variables , , and , respectively. denote the left-hand side of (3a), (3b), …, (4c), respectively.

According to the typical theory of Lagrangian multiplier method, we have the following first-order derivatives of Lagrangian function with respect to all the variables and parameters: Equations (12) return the governing equations (3a)–(4c). The adjoint equations can be derived using (13). From (14), we can obtain the gradients of the cost function with respect to control variables.

Similar to the forward model, the adjoint model also consists of the internal and external modes. Actually, the equations derived from (13) are considered as the internal mode, and the external mode can be derived from the internal mode in a similar way the external mode of the forward model is derived. The details of both the internal and external modes of the adjoint model can be found in [18].

##### 2.3. Discretization

Several numerical methods have been widely used in the discretization of time-dependent 3D primitive equations. The time integration schemes of these methods can be fully explicit [44], semi-implicit [45, 46] or fully implicit [47]. For large-scale oceanic problems, the applications of 3D models are becoming a reality with the aid of modern computers. The fully explicit finite difference method is relatively simple to implement, except that its time step is strictly restricted by the Courant-Friedrich-Lewy (CFL) stability criterion [48]. At present, many existing ocean models are based on an Alternating Direction Implicit (ADI) method which was proposed for the approximate solution of the shallow-water equations in [49]. ADI method results in computational efficiency superior to fully explicit methods because their improved stability allows large time steps to be employed, and it is also easy for implementation (see [13, 14]). Since the model must simulate fields of both velocity and elevations in each isopycnic layer, a technique known as external-internal mode splitting has been used in several ocean models by Simons [50]. Complete details on the model discretization were given in [18].

##### 2.4. Gradients

Among all the control variables in this model, the OBCs are the most important and have critical impacts for a regional tidal simulation. Solutions in model interior are uniquely determined by the tidal OBCs once the initial conditions and other parameters have been determined. The Flather condition (8) can be determined once the external data is described. Hence, the determination of the open boundary conditions for the present model is equivalent to the estimation of the tidal force . Assume that at a certain open boundary grid point , the tidal force at the th time step is subject to where denotes the frequency of constituent, is the time step length, and and are the Fourier coefficients as well as tunable parameters in the model. The gradients of cost function with respect to and can be deduced from (14) which yields Details on computing were given in [18].

Having determined the adjoint variables (, and ) with the adjoint model, the gradients of cost function with respect to the OBCs (i.e., the Fourier coefficients and ) can thus be calculated. Then, the OBCs are optimized with some minimization algorithms which will be described in the following section.

#### 3. Optimization Methods

In general, numerical methods solving the minimization problem (1) have the following iterative formula: where , , and are current iterative point, a positive step length, and a search direction, respectively. For simplicity, and are denoted by and , respectively. There are many formulas to define search direction in formula (17) [31]. In this paper, three optimization methods are compared when applied to the assimilation model. They are different in the selection of or . The three methods are given as follows.

##### 3.1. L-BFGS Method

The L-BFGS method is a well-known limited-memory quasi-Newton method for solving large-scale problems whose Hessian matrices cannot be computed at a reasonable cost or are not sparse [31]. It requires the search direction to be where

Here, , , , , is the identity matrix, and is the initial Hessian approximation. Many previous studies have shown that typically , where does not improve the performance of L-BFGS. In this paper, the L-BFGS version of Liu and Nocedal [32] is employed, and the Fortran codes are authored by Nocedal [51], while and the number of corrections used in the L-BFGS update is taken as 5 [33].

##### 3.2. Gradient Descent Method with Wolfe’s Line Search (GDM-W)

This method requires the search direction to satisfy while the step length satisfies the following strong Wolfe conditions [36, 37]: with . The line search routine enforcing the strong Wolfe conditions can also be found in [31].

##### 3.3. Simplified Gradient Descent Method (GDM-S)

This method requires the search direction to satisfy while the step length is simply defined by where the denominator denotes the norm of the gradient and is an empirical constant. This method has been used in a relatively small number of studies (see [13, 14, 23, 52]) showing that this method is indeed feasible and effective in practical application.

#### 4. Numerical Experiments

##### 4.1. Numerical Implementation

The performances of L-BFGS, GDM-W, and GDM-S are compared by a set of ideal experiments. The computing domain has horizontal grids (with the horizontal resolution ) and two vertical isopycnic layers (Figures 1(a) and 1(b)). The maximum undisturbed water depth is 6000 m, and the undisturbed interface is placed at the depth of 2000 m. The 2D bottom topography is constructed by the superposition of an inclined plane and a Gaussian surface (Figure 1(b)). The angular frequency of tide is , and the whole-time step is 496.863 s (1/90 of the period of tide) for both external and internal modes. The horizontal eddy viscosity coefficient is chosen to be . The coefficients of bottom and interface frictions are taken as and , respectively.

As shown in Figure 1(a), all four boundaries are open and installed with the passive Flather condition allowing the outgoing wave to radiate out of the computing domain through the open boundaries. In this work, for simplicity and no loss of generality, only the eastern boundary is treated as the source of tidal force driving the forward model (meaning that both Fourier coefficients and are equal to zero on the other three boundaries), and only the estimation of Fourier coefficient is studied. To be specific, on the eastern boundary, the Fourier coefficient is always treated as the known parameter (equal to 0 in this work), and the Fourier coefficient to be estimated here is assumed to be spatially distributed and is prescribed with a spatial distribution which is constructed by the trigonometric function (see Figure 1(c)).

Some grids are randomly picked out from the horizontal grids and treated as the observation positions (see Figure 1(a)). The forward model is run with the prescribed OBCs, and the model-generated results of the surface currents (i.e., currents in the upper layer) at these observation points are taken as the pseudo-observations (referred to as observations for brevity hereafter). Then, the OBCs can be optimized with the following steps.

*Step 1. *An initial value is chosen empirically and assigned to the unknown control variables.

*Step 2. *Perform the simulation by running the forward model, and the simulation results, mainly the state variables especially including the current velocity at observation points, are obtained. The value of the cost function is worked out in this step.

*Step 3. * The difference between simulation results and observations serves as the external force driving the adjoint model. Also, the adjoint variables in a period of tide are calculated through backward integration of the adjoint equations.

*Step 4. *Having known both the state and adjoint variables from Steps 2 and 3, respectively, the gradients with respect to control variables to be inverted are calculated by formula (16).

*Step 5. *Update the unknown control variables with a certain optimization algorithm.

*Step 6. *If some stopping criterion is satisfied, then stop and return the optimized control variables, otherwise; replace the initial value with the new control variables and go to Step 2.

With the procedure previous repeated, the OBCs will be optimized continuously, and the difference between simulated values and observations will be diminished. Meanwhile, the difference between the prescribed and the inverted OBCs will also be decreased. The initial value of is taken as zero in each experiment. For all methods examined here, the total number of iterations is allowed to be 100 at most, and we report the cost function values, the gradient norms, and the inversion results obtained by each method. Note that the L-BFGS iteration may stop before it reaches the maximum number due to the termination criterion installed in the L-BFGS routine authored by Nocedal [51]. The number of corrections used in the L-BFGS update is taken as 5. For the GDM-W, Wolfe’s line search is carried out with parameters and . For the GDM-S, the constant is experientially chosen to be 0.1.

##### 4.2. Results

As shown in Figure 1(a), all observations can be divided into two groups: O1 which are closer to the eastern boundary and O2 which are further from the eastern boundary. Accordingly, the experiments are divided into three groups: Group 1 (both O1 and O2 are used), Group 2 (only O1 is used), Group 3 (only O2 is used). After assimilation, the inversion error (in the root mean square sense) and correlation coefficient between the prescribed and inverted OBCs are reported in Table 1, and the inverted OBCs are compared with the prescribed ones in Figure 2. All experiment results will be examined group by group as follows.

**(a) Group 1**

**(b) Group 2**

**(c) Group 3**

First, let us examine the performances of the three methods in the experiments of Group 1 in which observations of both O1 and O2 are assimilated. The results of Group 1 shown in Figure 2(a) and Table 1 indicate that with sufficient observations assimilated, the inversion results obtained by using all three methods are satisfactory in terms of the inverted OBC curve as well as the inversion error and the correlation coefficient. And especially, the solution obtained with the L-BFGS method is the closest to the exact solution (the prescribed OBCs). The variation of the cost function normalized by its initial value, that of the norm of the gradient of the cost function with respect to the OBCs, and that of the inversion error are plotted in Figures 3(a), 3(b), and 3(c), respectively, as a function of the iteration number. The optimization procedure clearly shows that among the three methods examined, the L-BFGS performs the best and its convergence rate is much faster than those of the GD methods, which is consistent with the classical theories about the convergence rate of the quasi-Newton methods and GD method [31]. Now, let us pay more attention to the results obtained with GDM-W and GDM-S. The comparison between the two versions of GD method is rather interesting. At the beginning stage of the iteration, the GDM-W has a faster convergence rate than the GDM-S. After more than 20 iteration steps, the result of GDM-S begins to be better than that of the GDM-S. Then, after a neighborhood of the exact solution is reached, a slow convergence of GDM-S occurs in the following iterations, also with oscillations in both of the cost function and gradient norm which may be caused by the constant making the step length relatively overlarge when the solution is close to the exact one. In contrast, the GDM-W keeps the cost function and the gradient norm declining on the whole due to its step length satisfying the strong Wolfe conditions (21a) and (21b). It is worth noting that in the latter part of the assimilation procedure (after about 30 iteration steps when the oscillations in the cost function and in the gradient norm for the GDM-S appear (see Figures 3(a) and 3(b)), the convergence rate of the GDM-W is a little faster than that of the GDM-S. This leads to the final values of both the cost function and gradient norm obtained with the GDM-W slightly smaller than those obtained with the GDM-S. On the other hand, however, the comparison between the inversion errors for the GDM-W and GDM-S presents a different pattern. As shown in Figure 3(c), during the beginning several iterations, the descent rate for both GD methods is very rapid and the inversion error for the GDM-W has a faster descent rate than that for the GDM-S. This state continues until the 25th iteration step when the inversion error for the GDM-W begins to exceed that for the GDM-S, and these two methods almost begin to have the same slow descent rate of the inversion error. Finally, the inversion result obtained with the GDM-S is better than that obtained with the GDM-W (also see Figure 2(a) and Table 1).

**(a)**

**(b)**

**(c)**

For all experiments in Group 2, only the observations of O1 are assimilated. For this group, the variation of the cost function normalized by its initial value, that of the norm of the gradient of the cost function with respect to the OBCs, and that of the inversion error are plotted in Figures 4(a), 4(b), and 4(c), respectively, as a function of the iteration number. As we can see, the assimilation procedure of Group 2 shown in Figure 4 follows a similar pattern as that of Group 1 shown in Figure 3, although the number of observations for Group 2 is only half of that for Group 1. This suggests that removing the observations that are far from the unknown boundary may have very little effect upon the experiment results, whereas the observations that are closer to the eastern open boundary play a much more important role in the estimation of the OBCs in our model. Probably, this may be the main reason why we can obtain the satisfactory results in Group 1.

**(a)**

**(b)**

**(c)**

To sum up, from the previous results, we have seen that the L-BFGS method can hold its advantage over the other two versions of GD method in terms of both the optimization procedure and the final results providing the observations that are closer to the unknown boundary are included for assimilation, and the L-BFGS method is recommended in this case. Meanwhile, both results obtained with the two versions of GD method are also satisfactory. The comparison between the results obtained with the GDM-W and GDM-S indicates that the Wolfe’s line search is effective in finding a reasonable step length that can achieve adequate reductions in the cost function and can guarantee a rapid rate of convergence at the beginning of iteration. This allows the GDM-W to gain an advantage over the GDM-S in terms of the final values of the cost function and the gradient norm. Nevertheless, their difference is smaller than expected, and finally, the GDM-S excels the GDM-W in the inversion result which is an essential criterion testing whether the whole model is valid. Therefore, the GD method simplified with a constant step length is competitive to that with the Wolfe’s line search in this case.

It is very interesting to find that when we come to the results of Group 3 which are dramatically different from those of Group 2 although the same number of observations is used in these two groups. In Group 3, only the observations O2 that are further from the unknown boundary are assimilated. For this group, the variation of the cost function normalized by its initial value, that of the norm of the gradient of the cost function with respect to the OBCs, and that of the inversion error are plotted in Figures 5(a), 5(b) and 5(c), respectively, as a function of the iteration number. From the optimization procedure shown in Figure 5, we can clearly see that, in this case, the L-BFGS method stops at the 36th iteration step and fails to converge, and the GDM-W has an unacceptable slow convergence rate, both leading to a relatively large discrepancy between the inverted and prescribed OBCs (see Figure 2(c) and Table 1), whereas the simple GDM-S performs the best and, more importantly, can still yield satisfactory inversion result (see Figure 2(c) and Table 1). The performance of GDM-S shown in Figure 5 is more interesting. After about 10 steps of iteration in the beginning, the cost function and the gradient norm begin to vary with oscillations which become increasingly larger as the iteration goes on, but meanwhile, their minimum values are getting smaller. At last when the maximum of 100 iterations is reached, the cost function and the gradient norm are reduced by more than 3 orders of magnitude and more than 1.5 orders of magnitude, respectively, compared with their initial values, which are comparable to the results of GDM-S in Groups 1 and 2. On the other hand, the variation of the inversion error for GDM-S also contains oscillations, but compared with the oscillations in the cost function and gradient norm (see Figures 5(a) and 5(b)), the ones in the inversion error are much smaller, and therefore the variation curve looks much smoother (see Figure 5(c)). This may be caused by the high complexity of the cost function in the control variable space if the observations to be assimilated are far from the unknown boundary, and this high complexity may also be a possible reason for the failure of the L-BFGS method as well as for the inefficiency of Wolfe’s line search in Group 3. The results of Group 3 indicate that the far distance between the observations and the unknown boundary has almost little effect on the validity and feasibility of the GDM-S, and therefore the simplified GD method is recommended in this case, instead of the L-BFGS method.

**(a)**

**(b)**

**(c)**

#### 5. Summary and Conclusions

In this paper, based on the numerical internal tidal model constructed by Chen et al. [18], the practical performances of the L-BFGS method (given by Liu and Nocedal [32]) and two versions of GD method are compared and investigated computationally through a series of ideal experiments in which the OBCs are estimated by assimilating the interior observations with the adjoint method. The two GD methods include the normal one with Wolfe’s line search for the step length and the simplified one with a fixed step length which should be predetermined by modeler. The cost function, the gradient norm, and the inversion result are reported and examined for each experiment.

According to the locations of the observations assimilated, all observations can be divided into two groups. Accordingly, all experiments are divided into three groups. In both of the first two groups, the observations that are closer to the unknown boundary are included for assimilation. In this case, as expected, the L-BFGS method has a remarkably better performance than the GD methods, not only in terms of the convergence rate but also in terms of the final result. This is consistent with the classical theory about the convergence properties of the L-BFGS method and the GD method. On the other hand, compared with the simplified GD method, the normal one with Wolfe’s line search can really use fewer iteration steps to reach a satisfactory solution, but the values of the minimized cost function and the gradient norm are almost the same. Although great computational efforts have been made to implement Wolfe’s line search to optimize the step length, the effect is much smaller than expected, and even the normal GD method finally yields a worse inversion result than the simplified one. In the third group of experiments, only the observations that are further from the unknown boundary are assimilated. In this case, the simplified GD method remains effective and yields satisfactory results, whereas the L-BFGS method fails to converge and the GD method with Wolfe’s line search presents an unacceptable slow convergence rate. This demonstrates that in the practical application, the simplified GD method, which is controllable and easy to implement, is competitive to both the classical L-BFGS method and the normal GD method with Wolfe’s line search. This suggests that when applied to the practical cases, the advanced L-BFGS algorithm and Wolfe’s line search may still need to be improved, while how to take sufficient advantage of some simple methods is still worth investigating. Nonetheless, the simplified GD method should not be ignored and should be regarded seriously as a choice, especially when the classical advanced optimization techniques fail or perform poorly. Furthermore, how to take sufficient advantage of some other simplified methods is also worth investigating.

#### Acknowledgment

The authors would like to deeply thank the editor and two anonymous reviewers for their constructive suggestions whereby their paper has been improved greatly. Partial support for this research was provided by the National Natural Science Foundation of China through Grant Nos. 41076006 and 41206001, the State Ministry of Science and Technology of China under Contract nos. 2008AA09A402, 2013AA091201, and 2013AA091202, the Ministry of Education’s 111 Project through Grant B07036, and the Fundamental Research Funds for the Central Universities 201261006 and 201262007.