Abstract

Based on an isopycnic-coordinate internal tidal model with the adjoint method, the inversion of spatially varying vertical eddy viscosity coefficient (VEVC) is studied in two groups of numerical experiments. In Group One, the influences of independent point schemes (IPSs) exerting on parameter inversion are discussed. Results demonstrate that the VEVCs can be inverted successfully with IPSs and the model has the best performance with the optimal IPSs. Using the optimal IPSs obtained in Group One, the inversions of VEVCs on two different Gaussian bottom topographies are carried out in Group Two. In addition, performances of two optimization methods of which one is the limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) method and the other is a simplified gradient descent method (GDM-S) are also investigated. Results of the experiments indicate that this adjoint model is capable to invert the VEVC with spatially distribution, no matter which optimization method is taken. The L-BFGS method has a better performance in terms of the convergence rate and the inversion results. In general, the L-BFGS method is a more effective and efficient optimization method than the GDM-S.

1. Introduction

Internal tide, which is the internal wave of tidal frequency, is a ubiquitous phenomenon in the oceans. Rattray [1], Baines [2], Bell [3], Baines [4], Craig [5], Gerkema [6], and Llewellyn Smith and Young [7] have developed theoretical models and obtained some analytical solutions of internal tide on ideal topographies. These theoretical models helped them investigate the generation and propagation of internal tide. Although great progress has been made on the internal tide theory and some analytical works were carried out, only a small amount of solutions can be provided, due to the complexity of the problems. For this reason, quantitative analysis with practical significance still has to rely on the combination of numerical simulation, theoretical analysis, experiment, and observation. Numerical simulation is an effective method in marine research and has been widely used in the internal tide research. Kang et al. [8] investigated the internal tide near Hawaii with a two-dimensional, two-layered numerical model and confirmed that the internal tide was generated by barotropic forcing at the Hawaiian Ridge and propagated in north-northeast and south-southwest directions. Based on a high-precision three-dimensional Princeton Ocean Model (POM), Niwa and Hibiya [9] obtained the distribution of the internal tide in the Pacific Ocean using the TOPEX/Poseidon (T/P) satellite data. Cummins et al. [10] simulated the generation and propagation of internal tides near the Aleutian Ridge using T/P altimeter data. The comparison between the altimeter data and their model results showed good agreement for the phase, which also provided evidence for wave fraction near the Aleutian Ridge. With a three-dimensional POM, Niwa and Hibiya [11] investigated the distribution and the energy of the internal tide around the continental shelf edge in the East China Sea. Their numerical experiment results indicated that internal tides are effectively generated over prominent topographies such as sea ridges, island chains, and straits. Jan et al. [12] modified the POM to study the generation of the internal tide and its influence on surface tide in the South China Sea. The conversion from the barotropic energy to the baroclinic energy over topographic ridges in the Luzon Strait was also estimated.

Determination of the vertical eddy viscosity coefficient (VEVC), which describes the vertical mixing in the ocean, plays an important role in the study of energy exchange and material transportation. The VEVC is regularly regarded as a constant in numerical models. Schemes to determine the VEVC mainly include the Prandtl mixing-length hypothesis model, the model, the Pacanowski-Philander mixing model [13], and some turbulent closure models that are more complicated. Many studies have been carried out to investigate the variation of the VEVC [1418]. All these mentioned studies indicate that due to different intensions of the vertical mixing in sea water, the VEVC should not be treated as a constant but a parameter with spatial distribution.

Satellite remote sensing technology and other related technologies provide us with a large number of data. Thus, it is one of the most important missions in physical oceanography to make use of the data efficiently and precisely as well as to combine the observation data with present numerical models. Indeed, data assimilation with the adjoint method provides an effective access to these missions. The use of the adjoint method in marine science can be traced back to 1980s. The adjoint model is capable of optimizing control parameters in numerical simulation. Bennett and McIntosh [19] applied the weak constraint thought to solve the tidal problem and the geostrophic-flow problem. Yu and O’Brien [20] assimilated both meterological and oceanographic data into an oceanic Ekman layer model and deduced the unknown boundary condition, the unknown vertical eddy viscosity, and the current field. Based on a tidal model with a two-level leapfrog method, Lardner [21] inverted the open boundary conditions in three-test problems. Seiler [22] used the adjoint method to assimilate observations into a quasi-geostrophic ocean model and estimated the lateral boundary values in ideal experiments. Navon [23] wrote a summary of the parameter estimation in meteorology and oceanography in the view of applications with four- dimensional variational data assimilation techniques. Using an automatic differentiation compiler, Ayoub [24] constructed the adjoint model of the Massachusetts Institute of Technology Ocean General Circulation Model and inverted the open boundary conditions in the North Atlantic. Zhang and Lu [25] developed a three-dimensional nonlinear numerical tidal model with the adjoint method and designed several numerical experiments to estimate three kinds of parameters including the open boundary conditions, the bottom friction coefficients, and the vertical eddy viscosity coefficients. Zhang and Lu [26] employed a two-dimensional tidal model to study the inversion of the bottom friction coefficients in the Bohai Sea and the Yellow Sea with the adjoint method. Chen et al. [27] constructed a three-dimensional internal tidal model with the adjoint method and estimated six different kinds of open boundary conditions on fourteen types of topography. Based on a tidal model, Zhang and Chen [28] carried out several semiidealized experiments to estimate the partly and fully spatial varying open boundary conditions. Cao et al. [29] investigated the inversion of open boundary conditions with a three-dimensional internal tidal model and simulated the internal tide around Hawaii by assimilating T/P data.

There are two main objectives of this paper. One is to study the inversion of the VEVC with an internal tidal model and the adjoint method. According to the introductions above, a lot of studies have been carried out to investigate the inversion of the control parameters of internal tide such as the open boundary condition [29, 30] and the bottom friction condition [31, 32]. However, few works are found to study the inversion of VEVC. Since VEVC is a decisive factor to describe the vertical mixing in the ocean, it is necessary to pay attention to the inversion of the VEVC. The other objective is to make a computational investigation on the performance of the gradient descent method and the limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) method for the inversion of VEVCs based on the model constructed by Chen et al. [27]. Both of the methods do not require any evaluations of the Hessian matrices but gradient vectors and, thus, are computationally feasible. Chen et al. [30] have made a comparative study on several optimization methods but it is on the inversion of the open boundary conditions which is a one-dimension case. The feasibility of these optimization methods for two-dimensional case such as the inversion of the VEVC needs further studies.

Two groups of numerical experiments are carried out to study the inversion of spatially varying VEVCs based on an isopycnic-coordinate internal tidal model with the adjoint method. In Group One, the influences of independent point schemes (IPSs) exerting on parameter inversion are discussed. Group Two investigates the inversions of VEVCs on two different Gaussian bottom topographies and the performances of two optimization methods which are the GDM-S and the L-BFGS methods.

This paper is organized as follows. Section 2 briefly introduces the adjoint tidal model and the methodology. Two optimization methods, including the GDM-S and the L-BFGS methods, are described in Section 3. Section 4 presents design and process of the experiments in detail. Results of the experiments are discussed in Section 5. Finally, we make a summary and draw some conclusions in Section 6.

2. Numerical Model Introduction

An isopycnic-coordinate internal tidal model with adjoint assimilation method is employed in this paper. There are two parts in the internal tide model. One is forward model with the governing equations and the other is adjoint model with the adjoint equations. The two models are used to simulate the internal tide and to optimize the control variables, respectively. Chen et al. [27] had introduced the two parts in great detail and tested the reasonability and feasibility of the model. The formulation will not be presented in this paper. The derivation of VEVC adjustment, introduction of the two optimization method, test of the adjoint method, and the independent point scheme (IPS) are described in details in this part.

2.1. Test of the Adjoint Method

According to the equations and derivations of Chen et al. [27], the formula to invert the VEVC can be derived. The first derivative of Lagrangian function with respect to VEVC is obtained as follows:where is the value of VEVC at grid in the th layer. The gradient of cost functions with respect to the VEVC in the grid can be deduced as follows:where is the potential density in the th layer, and are horizontal velocities at the th time step, and are the adjoint variables of and , respectively, and is the initial thickness of the th layer. The detailed derivation of (2) is presented in the appendix.

Accurately programming the adjoint in such problems as the present one is quite tricky and experience has shown that it is essential to check the accuracy of the adjoint computation before proceeding with the minimization runs [33]. The correctness of the adjoint method is verified in this section. Take the first-order term of a Taylor expansion for the cost function and we obtain the following equation:

Here, is a general point of the control variable, is the computed gradient, and is an arbitrary unit vector in the parameter space. Based on (3) a function of can be written as follows:where is a small real number that is not equal to zero. If the adjoint methodology is correct, it is supposed that according to (4). In this paper, the VEVC variable is treated as and test of the adjoint method is based on (4).

In order to test the accuracy of the adjoint method, two experiments are carried out in which two different types of are used. The different vector directions are and , respectively.

Figure 1 indicates the trends of as approaches to 0. It is clear that in both experiments when is less than 10−3, values of   (solid lines) are both very close to 1 (dashed lines). Equation is proved and the correctness of gradient computed in the adjoint model is verified.

2.2. Independent Point Scheme

The available observation data may not be sufficient and control parameters to be determined may be excessive in practice. That may cause ill-posedness of the inversion problem. Richardson and Panchang [34] first noted that if an adjoint method is applied, when there is a big error in data, the solution will be unstable and not unique. Many researchers have made progress in solving this problem. A lot of work [26, 27, 30, 32] have been done to prove the capability and feasibility of the independent point scheme (IPS) in solving ill-posed problems of inversion.

In this paper, the IPS is used to optimize the control parameter. The basic idea of IPS is as follows: some grids (e.g., ) are selected as the independent points; it is assumed that represents the value of VEVC in grid , so values of VEVC in all grids can be calculated from with linear interpolation method. The computing formula is given as follows:where is the weight coefficient of the Cressman form [35]:where is the center distance between and and is the influence radius. According to (5), the gradient of with respect to can be written aswhere is the gradient of with respect to the VEVC at the grid and can be calculated using formula (2).

According to Section 2.1, the correctness of the gradient with respect to the independent points should be tested. Based on (5), the perturbations applied to the independent points have a linear impact on those applied to the nonindependent points. The convergences for the nonindependent points will remain the same after linear transformation for independent points.

The values of VEVC at the independent grids can be calculated inside the model and values at other grids are gained through interpolation using (5). Then the spatial distribution of VEVC in the entire area is obtained.

3. Optimization Algorithms

There have been many large-scale optimization methods to solve the minimization problem [36]. Four main methods are line search method (e.g., Wolfe and Goldstein), trust-region method, conjugate gradient method (e.g., Fletcher-Reeves), and quasi-newton method (e.g., BFGS and L-BFGS). However, the number of studies discussing the performances of various optimization methods in the meteorological and oceanographic application is still relatively small [30]. Line search method requires repeated computations for the cost function and the gradient. Therefore it spends too much computation resource during numerical simulations especially for physical oceanography. Besides, line search methods may fail in some cases [37]. The L-BFGS method is commonly used to solve large-scale problems in oceanography and meteorology [30, 38, 39]. Chen et al. [30] have made a computational investigation on the performances of the L-BFGS method and two versions of gradient descent method with an internal tide model. In their work, a simplified gradient descent method is applied which is able to avoid too much computation. The step length is chosen according to the experience of the modeler. Compared with other methods, there are two main advantages of this plan. One is the less computation resource usage and the other is the more controllable optimization process. Many research papers have proved the feasibility of this method [2527, 31, 32].

Generally speaking, numerical methods to solve the minimization problems have the similar iterative formula as follows:where and are the priori and adjusted values of VEVC in the th iteration, respectively and and represent the iteration step length and the search direction, respectively. There are many methods to determine the search direction . Two different optimization methods employed in this paper are the GDM-S and the L-BFGS methods.

3.1. Gradient Descent Method (GDM)

The GDM is a simple and feasible method to define the search direction as follows:where ,  , and are the zonal index, the meridional index, and the layer index of the calculation grid, respectively; represents the step of iteration. is the norm of the gradient of the cost function with respect to the VEVC in the th iteration.

In the GDM-S, the step length is chosen to be a constant according to the experience of the modeler. We have surveyed the performances of different values of and take 0.006 as the best choice. The optimized value of VEVC is obtained after the VEVC in every grid is adjusted according to (8) and (9).

3.2. L-BFGS Method

L-BFGS is an optimization algorithm in the family of quasi-Newton methods that approximates the BFGS algorithm using a limited amount of computer memory. This method is first described in the work of Nocedal [40], where it is called the SQN method. It is a popular algorithm for parameter estimation in machine learning [41, 42]. Due to its resulting linear memory requirement, the L-BFGS method is particularly well suited for optimization problems with a large number of variables.

It requires the search direction to bewhere

Note that is the simplification of for convenience. Here,  ,  ,  , and  . Many studies have shown that typically , where does not improve the performance of L-BFGS [43]. So the number of corrections m used in the L-BFGS update of this paper is taken as 5 [44]. The version of L-BFGS used in this paper is described in Liu and Nocedal [44] and the Fortran codes are authorized by Nocedal [45].

4. Design of Experiments

All the experiments in this paper are implemented in an ideal regional area from 116°E to 124.17°E and from 18°N to 23.17°N with in mind the practical sea area located around the Luzon strait. The horizontal resolution is and there are totally grids in the area. The maximum depth is set to be 1000 meters. The horizontal eddy coefficient is chosen to be and the bottom friction coefficient is taken as . The Coriolis coefficient is taken as the local value. Only tide is considered and its angular frequency is . The whole-time step is 496.86 s which is 1/90 of the period of tide. All the four boundaries are open boundaries and boundary conditions are set to be the local water levels in the Flather form.

Eastern and western boundaries:positive in eastern boundary and negative in western boundary.

North and south boundaries:positive in north boundary and negative in south boundary.

is the surface elevation above the undisturbed sea level. and are the zonal current velocity and the meridional current velocity, respectively. , , and are the surface elevation and the zonal and meridional current velocity relating to the boundary barotropic tidal force, respectively. is the Coriolis coefficient and is the tidal frequency of tide. is the local water depth and is the acceleration of gravity.

Similar as Chen et al. [27], the tidal force at the th time step is subject tothe Fourier coefficients at four open boundaries are set to be 0 and are set to be 1 (−1) at the north and west (south and east) boundaries.

The T/P altimeter data is widely spread throughout the ocean and it can be used to invert VEVC. In this work, we pick 89 calculating points based on the distribution features of T/P altimeter observation as the observation points (Figure 2).

Two kinds of topographies are tested in this paper and they are generated based on the two formulas in (15), respectively (Figure 3). The sea water in the computing area is divided into two layers. The thicknesses of each layers are, respectively, and . The potential densities of corresponding layer are and , respectively. Considerwhere is the height of the topography and MaxDepth is the depth of the water.

For each experiment, the optimization of the VEVC can be implemented with the following steps.

Step 1. The prescribed VEVC is given and the forward model is run. The whole simulation time is 20 period of the tide in order to obtain a stable simulation result. The water elevations in the observation positions are treated as the “pseudoobservations.”

Step 2. Initial value of the control parameter (VEVC) is given and forward model is run to get the simulated results of all the state variables such as current velocity and water elevation. The value of cost function is calculated.

Step 3. Difference between the simulated elevation and the “pseudoobservation” plays as the external force of the adjoint model. Via backward integrating the adjoint equations in a period of the tide values of the adjoint variables are obtained.

Step 4. Using formula (2), along with the state variables and the adjoint variables obtained in Steps 2 and 3, the gradient of cost function with respect to VEVC is calculated.

Step 5. Update the unknown control variables with a certain optimization method.

Step 6. If the stopping criterion of iteration is reached, bring the iteration to an end and return the optimized parameter. Otherwise, update all the parameters and go back to Step 2.

In the experiments of this paper, all initial values of VEVC are set to 0.005 and the total number of iterations is allowed to be 100 at most. The chosen convergence criterion is that the last two values of the cost function are sufficiently close, which is defined bywhere and are the last and the second last values of the cost function, respectively.

Two groups of numerical experiments are carried out: the influence of IPSs on the inversion of VEVC is studied in Group One; in Group Two the ability of this internal tide model to invert different kinds of VEVC with spatial distribution is examined. Two kinds of spatial distribution of VEVC are prescribed and given in Figure 4. For both distributions, the VEVC value ranges from  m2/s to m2/s.

In Group One, nine experiments are carried out to discuss the influence of IPS on the inversion of VEVC. The distance between independent points (IP) ranges from 20′ (length of 2 grids) to 100′ (length of 10 grids) and details are listed in Table 1. Topography A is applied and distribution (a) is chosen to be the prescribed spatial distribution of VEVC.

In Group Two, four numerical experiments (NEs) are carried out which are numbered as NE1~NE4, respectively. Each experiment is implemented with GDM-S and the L-BFGS methods. Information of topographies and prescribed VEVCs for all NEs is listed in Table 2. The IPSs are the optimal schemes obtained in Group One. Other parameters are exactly the same as Group One.

All the experiments in Group One and Group Two are carried out following Steps 1 to 6 described in Section 4. Version of L-BFGS method used in this paper is from the work of Liu and Nocedal [44].

5. Experiment Results and Discussions

5.1. Results and Discussions of Group One

Figure 5 illustrates the relationships between mean absolute errors (MAE, which reflects the error between the inversion result and the given VEVC) and the distance between independent points in Group One.

As is shown in Figure 5, MAEs with different optimization methods vary as the IPSs change. The minimum MAEs with the two methods are not the same (dashed lines). The minimum MAE with the GDM-S is  m2/s while that with the L-BFGS method reaches  m2/s. The corresponding IP distances are 90′ (length of 9 grids, GDM-S) and 30′ (length of 3 grids, L-BFGS), respectively. According to the MAEs illustrated in Figure 5, the optimal IPSs of the two methods are selected as IPS8 (GDM-S) and IPS 2 (L-BFGS).

5.2. Results and Discussions of Group Two

With the respective optimal IPSs and the iteration process in Section 4, the VEVCs are inverted with GDM-S method (I) and L-BFGS method (II), respectively, and the inversion results are shown in Figure 6.

Comparison of the inversion results with the prescribed VEVCs indicates that all the given spatial distributions of VEVC are successfully inverted after 100 iteration steps. The main features of all distributions can be recovered very well. Surfaces of the inverted VEVC with the L-BFGS are much smoother than those with the GDM-S. Compared against the inversion results with the GDM-S (left panels), patterns with the L-BFGS method (right panels) are closer to the prescribed VEVC. More statistic data will be presented in the next paragraphs.

MAEs of the four numerical experiments after assimilation are calculated and listed in Table 3. The initial values of MAE in all NEs are  m2/s. Based on the results shown in Table 3, all the MAEs after assimilation are more than one order of magnitude lower than the initial values, which means the success for both of the two optimization methods. Note that the MAEs with the L-BFGS method are quite close (less than  m2/s and cannot be distinguished in the table) and are less than half of those with the GDM-S. This result demonstrates the ability of the L-BFGS method to deduce the overall errors.

To compare the effectiveness of the two methods to invert the VEVC, we make statistics on the percentages of the grids at which the MAEs are less than  m2/s, which is listed in Table 4. With the GDM-S, the inversion errors are deduced by one order of magnitude at about 40% of the total grids. By contrast, ratios of all NEs with the L-BFGS method are 79.76%, without differences between NEs. This phenomenon indicates that the L-BFGS method is effective at more computation grids than the GDM-S. Furthermore, the L-BFGS method maintains its effectiveness no matter which topography is applied.

Combining the inversion patterns, the inversion errors of the VEVC, and the effectiveness analyses, conclusions can be drawn that the L-BFGS has a better performance in reducing the inversion errors.

Finally we come to the optimization history for all the experiments carried out in Group Two. The variations of the cost function normalized by its initial value, that of the norm of the gradient of the cost function with respect to the VEVCs and that of the inversion error, are plotted in Figures 7(a), 7(b), and 7(c), respectively, as a function of the iteration step.

Note that all experiments with the L-BFGS method reach the convergence criterion and stop after 4 iterations, which indicates that the computation time for the L-BFGS method is one twenty-fifth of that for the GDM-S. Figure 7(a) indicates that all the cost functions are in downward trends throughout the iteration process and decrease by more than 2 (7) orders of magnitude for the GDM-S (L-BFGS method), which means that the final differences between simulation value and the observation of these two methods are less than one-tenth and one-thousandth of their initial values, respectively. As is shown in Figure 7(b), the norms of gradient of the cost function with respect to the VEVC decrease by more than 1 order of magnitude (GDM-S) and 4 orders of magnitude (L-BFGS), compared with their respective initial values. This indicates that the inversion result is becoming increasingly closer to the given VEVC during the iteration. As shown in Figure 7(c), the inversion errors with the two methods keep declining throughout the iterations until the stopping criterions are satisfied. In general, the cost functions, norms of gradient, and the inversion errors of the VEVC have steady descent, which demonstrates that this adjoint model is capable to invert the VEVC. What is more, both the GDM-S and the L-BFGS methods are effective in terms of the inversion of the control parameters with spatially distributions of internal tide.

It is also clear in Figure 7 that the convergence rate for the cost functions, norms of gradient, and the inversion errors of the VEVC is much faster with the L-BFGS method than those with the GDM-S, which is consistent with the classic theories about the convergence rate of the quasi-Newton method and the GDM [36]. This trend is also consistent with the results of numerical experiments to invert the open boundary conditions in Chen et al. [30]. With no doubt, the L-BFGS method is a more effective and efficient optimization method to invert the spatially varying VEVC. However, the GDM-S is easier to understand and to implement in the model. Moreover, the step length and the search direction in the process of GDM-S can be freely controlled by the modelers, which is very convenient in practice. Therefore, for the inversion of the VEVC, the GDM-S should also be regarded as a choice.

6. Summary and Conclusions

Based on an isopycnic-coordinate internal tidal model, the inversion of VEVC is studied in this paper. A series of numerical experiments are carried out to examine the influence factors on the inversion of VEVCs in four aspects: independent point schemes (IPS), topography, the spatial distribution of VEVC, and the optimization methods. For each experiment, the cost function, the norm of gradient of cost function with respect to the VEVC, and the inversion error are calculated and analyzed in details.

The IPS is introduced and discussed in Group One. All the VEVCs can be inverted successfully with IPS. MAE is regarded as the comparison criterion of the result. After comparing the 9 experiments, the correctness of the IPS is confirmed and the optimal IPSs are selected for the GDM-S and the L-BFGS methods, respectively.

Based on the optimal IPSs in Group One, two kinds of VEVC distributions are successfully inverted with this adjoint model on two kinds of topography in Group Two. MAEs after optimization are at the level of 10−4 (10−5) for the GDM-S (L-BFGS), which is one (two) order(s) of magnitude lower than the initial value. All the cost functions and their gradient norms with respect to the VEV lead satisfactory declines no matter which optimization method is taken. Compared with the GDM-S, the L-BFGS method has a remarkably better performance, not only in terms of the convergence rate but also in terms of the final inversion results. The computation time for the L-BFGS method is much shorter than that for the GDM-S. To sum up, the L-BFGS method is a more effective and efficient method than the GDM-S in terms of the inversion of the VEVC. Nevertheless, the GDM-S is more convenient and controllable so it should not be ignored and should be taken seriously as a choice for the inversion of the VEVC with spatially distribution.

The success of numerical experiments lays a solid foundation for the practical experiments and encourages us to carry out experiments in practical sea area with measured data and the real T/P altimeter data.

Appendix

Derivation of (2)

Let us start with the governing equations in Chen et al. [27].Layer 1 (surface layer)Layer   ()Layer (bottom layer)

The variables and background of the governing equations have been introduced in Chen’s [27] work in details. We will not repeat them in this part. The interface and friction terms are expressed bywhere is the vertical eddy viscosity coefficient, , and .

The cost function is defined aswhich is exactly the same as that in [27]. Then the Lagrangian function is defined aswhere same definitions are applied in (A.2a)~(A.2c) and (A.3a)~(A.3c). Then the Lagrangian function can be written aswhere , , and functions , , and are, respectively, defined as Note that (A.11)~(A.13) do not contain the variable , which means The Lagrangian function can be written as

Finally, according to the typical theory of Lagrangian multiplier method, we have the following first-order derivate of Lagrangian function with respect to the control parameter :then the gradient of the cost function with respect to the variable can be deduced from (A.16):

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

Partial support for this research was provided by the National Natural Science Foundation of China through Grant 41371496, the State Ministry of Science and Technology of China through Grant 2013AA122803, and the Fundamental Research Funds for the Central Universities 201262007 and 201362033.