Abstract

Accurate extraction of the modulated annual cycle (MAC) is important for climatic and oceanic research. A variety of methods are available to extract the annual cycle with inconsistent results. Since actual annual cycles are unknown in the observation series, the reliability and applicability of the results extracted by these methods are difficult to estimate. In this study, three widely used decomposition methods, ensemble empirical mode decomposition (EEMD), nonlinear mode decomposition (NMD), and enhanced harmonic analysis (EHA), are evaluated by idealized numerical experiments for extracting modulated annual cycles from climate series. Idealized numerical experiments are carried out and show that the recently proposed EHA had the most accuracy in extracting the MAC from the constructed data. The optimal independent point (IP) number, which makes the most accurate result for EHA, can be found in each ideal experiment. In the actual experiment, two IP selection criteria are proposed for EHA to extract MAC from observations.

1. Introduction

The annual cycle is an important aspect of climatic and oceanic data research. Air temperature, water temperature, water level, and other data have significant annual cycles. For example, the annual cycle can explain 60–90% of monthly sea level variance in the marginal northwest Pacific [1]. In the past, the annual cycle was usually regarded as an exact repeat from year to year [2]. This stationary annual cycle, known as the traditional annual cycle (TAC), is often used in anomalies calculation [13]. Anomaly is usually defined as a deviation of a value from its typical position in the annual cycle. Numerous papers on climate science have been published, which use monthly anomalies and propose a physical mechanism to explain it [2]. However, TAC ignored the variability of amplitudes and phases of the annual cycle and mixed the changes in the annual cycle into the anomalies. In order to explain the annual cycle changes which are usually caused by changes within the nonlinear climate system, a description of the annual cycle needed to be improved, including the amplitude changes from one year to the next, or rather a modulated annual cycle (MAC) [2]. How and why the MAC changes have attracted increasingly more attention and various methods have been used to extract the MAC [4].

The TAC of climatic and oceanic data is generally considered to be expressed as a trigonometric function, and the amplitude and phase of the TAC can be obtained through fitting models, which is the elementary method to extract TAC. Feng et al. [1] extracted the MAC with moving window fitting models. Although this method can obtain the continuously time-varying amplitude and phase of the annual cycle, it usually underestimates the amplitude change. Wu et al. [2] used ensemble empirical mode decomposition (EEMD) [5] to extract MAC. Qian et al. [3] used EEMD to analyze the Niño-3 SST index from 1880 to 2008 and found that its MAC has strong interannual modulation and long-term changes in both amplitude and phase. Deng et al. [6] used nonlinear mode decomposition (NMD) to extract the change of MAC and found that the interannual change of the annual cycle ignored by TAC would affect the quantification of long-term persistence (LTP).

To test the effectiveness of various methods, Deng and Fu [7] compared the extraction results of five methods, including the fitting models, complex demodulation (CD), EEMD, NMD, and seasonal trend decomposition procedure based on loess (STL), through idealized numerical experiments involving MAC with three different time-varying amplitudes, and they concluded that NMD has the optimal performance. In addition, NMD is a more complicated method, which requires more resources on computation.

Jin et al. [8] proposed an enhanced harmonic analysis (EHA) to analyze temporal variations of internal tidal currents in the South China Sea. Based on the independent point (IP) scheme [9], EHA can generate time-varying amplitudes and phases for specific frequencies. Pan et al. [10] developed a MATLAB toolbox of S_TIDE based on the EHA and applied it to explore tide-river interaction in the Columbia River Estuary. Recently, EHA has been applied to investigate the seasonal variation of the principal tidal constituents in the Bohai Sea [11] and the changes of nodal cycles in the Gulf of Maine [12]. In this paper, we firstly apply EHA to the extraction of the MAC and compare it with EEMD and NMD.

The paper is organized as follows. In Section 2, we introduce the extraction methods: EEMD, NMD, and EHA. Section 3 describes the design of idealized numerical experiments. Experimental results are discussed inSection 4. Finally, the conclusions are drawn in Section 5.

2. Methods

2.1. EEMD

Empirical mode decomposition (EMD) is an adaptive signal decomposition algorithm for nonlinear, nonstationary signals proposed by Huang et al. [13]. Through EMD, a complex signal in the time domain can be decomposed into a finite number of components, which are called intrinsic mode functions (IMFs). As an intrinsic mode of a signal, each IMF can be modulated by both amplitude and frequency, meaning that its amplitude and frequency change with time. A time series of climate element data can be decomposed with the EMD method in the following form:where is the number of IMFs, which is determined by factors like the length and variation of the data, end effects, and stoppage criteria, is the -th IMF, and represents the residual trend. Compared with the traditional Fourier transform-based signal analysis method, such as wavelet transform, EMD breaks the limitation of the preselection of wavelet function and has good temporal resolution and adaptability.

While the EMD has these wonderful properties, it also has a serious “mode mixing” problem, which is defined as any IMF consisting of oscillations of dramatically disparate scales. To solve the mode mixing problem, the Ensemble EMD (EEMD) was developed [5]. In this method, multiple noises are added to an observed time series to simulate the scenario of multiple trials for a time series, in order to carry out an ensemble average approach for corresponding IMFs and extract scale-consistent signals. The main steps of the EEMD method are as follows [2]:(1)Add a white noise series to the targeted data and decompose the data with added white noise into IMFs.(2)Repeat step 1 several times, but the white noise series is different in each time.(3)Obtain the (ensemble) means of corresponding IMFs of the decompositions as the final result.

The effect of EEMD is that the added white noise series cancel each other, and the mean IMFs stay within the natural dyadic filter windows, significantly reducing the chance of mode mixing. EEMD is widely used in many fields, such as MAC [24], seismic signal [14], and machinery diagnosis [15].

2.2. NMD

Nonlinear mode decomposition (NMD) was proposed by Iatsenko et al. [16] for nonlinear and nonstationary data. This method can decompose a time series into a set of physically meaningful oscillations in the frequency space, based on a combination of time-frequency analysis techniques and surrogate data tests. A given signal can be decomposed by the NMD method into nonlinear modes (NMs) and some additive noise as follows:

The key of the NMD method is to obtain the time-frequency representation (TFR) of the signal. Theoretically, all traditional TFR techniques can be used, like wavelet transform, windowed Fourier transform, and so on. Based on the TFR, all possible harmonics of each NM can be reconstructed. Then, using surrogate data sets’ tests to eliminate false harmonics, all remaining harmonics are summed to obtain a complete NM. As a new decomposition method, NMD has been applied to many fields, such as city engineering modeling [17], analysis of surface air temperature [6], and photoplethysmography [18].

2.3. EHA

In the traditional fitting models, the amplitude and phase of the annual cycle can be estimated by the least-squares methods, and this method is expressed as follows:where is the observations, is a constant, is the amplitude, and is the phase. Equation (3) can be rewritten as follows:where

EHA assumes the parameters and to be time-varying functions. Then, equation (4) can be modified as

The time-varying parameters in equation (6) are solved with the help of an independent point (IP) scheme. The IP scheme assumes that the coefficients and are changing and selects representative time moments as IP, which are evenly distributed in the time domain. The coefficients and at IPs are called independent parameters ( and ), and the values at the other times are obtained by cubic spline interpolation between two adjacent IPs. is called the IP number. Therefore, the coefficients and can be expressed as a function of and :where is the interpolation weight coefficient of the -th independent point at a time and the coefficient is determined by the interpolation method. Both and are unknown at this time. Substituting equation (7) into equation (6), we get

At times , we can get , so equations can be obtained:

There are unknowns in the system of equation (9). When is much larger than , , , and can be solved by the least-square fitting. Then, the time-varying amplitude and phase can be calculated from and after interpolation. The selection of IP numbers is discussed in detail in Section 4.3. It should be noted that when IP number is one, the amplitude and phase obtained by EHA are constant and the same as the results of the traditional fitting model. This means that the traditional fitting model can be seen as a special case of EHA.

3. Ideal Experiments

In this work, we design seven experiments, each of which has a different amplitude change in the annual cycle. Since the amplitude change is more apparent than the phase change, we mainly consider amplitude change with fixed frequency and phase in this work. Nevertheless, in the section of the discussion, we also conduct some experiments where both amplitudes and phases are time-varying. The constructed data are obtained by adding noise to the preset annual cycles. And seven kinds of amplitude changes are set in this work: constant, linear increase, periodic change, and four kinds of complex nonlinear changes. In each experiment, noise with 1%, 2%, and 4% noise-power ratio (NPR) whose Hurst index is is added, respectively, to test the influence of noise on the extraction results. The noise is generated by using a Fourier-filtering method (FFM) [19]. Then, three methods, EMD, NMD, and EHA, are used to extract the annual cycle. By comparing the extraction results with the preset annual cycle, we examine the effectiveness of EHA and provide some suggestions in the actual application.

Climatic data can be decomposed into long-term trends, annual cycles, and daily high-frequency varying fluctuations [2]. Like Deng and Fu [7], in order to examine the accuracy of the extraction results of various methods, the constructed data was used instead of the observed data, and the influence of the long-term trend was not considered. The original data consisted of a MAC and high-frequency fluctuations on a time scale of 50 years (one number per day).

In this paper, the trigonometric function was used to construct the annual cycle with fixed frequency and phase and changing amplitude . can be expressed as the following form:where the frequency is . In this paper, 7 kinds of amplitude changes were designed: constant, linear increase, periodic change, and 4 kinds of nonlinear changes (Figure 1). The 7 kinds of amplitude changes all ranged from 9 to 11. Similar to Deng and Fu [7], noise was considered in this paper to represent high-frequency fluctuations. Noise with a noise-power ratio of 1%, 2%, and 4% is added in each experiment, respectively.

4. Result and Discussion

4.1. Result

The previously constructed data has been processed by three methods, EEMD, NMD, and EHA, to extract different MACs. This section mainly discusses the deviation between the extraction results and the preset amplitude-modulated annual cycle. In order to avoid the edge error, only 46 years results were selected, excluding the first two years and the last two years. The deviation can be expressed by the root mean square error (RMSE) as follows:where represents the extracted annual cycle and represents the preset annual cycle. is the length of . The RMSE in each experiment is shown in Figure 2. The RMSE of the three methods increases with noise power. The RMSEs of EEMD are the largest and an order of magnitude higher than the RMSEs of other methods. When the changes of the preset annual cycle are relatively simple (the first three types), the RMSE of EHA is an order of magnitude smaller than that of NMD. When the preset annual cycle changes are complex, RMSEs of EHA and NMD are in the same order of magnitude. In each group of experiments, the RMSE of EHA is the smallest, EEMD is the largest, and NMD is between them. The RMSEs of NMD and EHA are also affected by the complexity of amplitude variation. At the same noise-power-ratio level, the more complex the amplitude variation is, the larger the RMSE will be. However, the RMSE of EEMD is almost only affected by the noise-power ratio and is not sensitive to the complexity of amplitude changes. EHA can directly obtain the instantaneous time-varying amplitude. For the extraction results of NMD and EEMD, the time-varying amplitudes of the MAC are obtained through a cubic spline fitting to the local maxima of series [4]. The result of EHA is closest to the preset amplitude (Figure 3).

Among the three methods, both EEMD and NMD are adaptive, while EHA, which is subjective, needs users to tune the IP number. When the IP number is too small, the result obtained by EHA is relatively simple and can only reflect low-frequency changes of MAC. The more the IP is, the more the MAC changes can be reflected. When the IP number is too large, the result of EHA is likely to be severely abnormal. This means that we not only need a large enough IP number to simulate the change of MAC but also need certain restrictions to prevent it from becoming a pure data fitting result. In the idealized numerical experiments, the IP number with the smallest RMSE is the optimal IP number. However, there is no preset MAC in the observed data, so the subjectivity of EHA is a weakness that needs to be overcome. For this reason, we try to propose two criteria to select the optimal IP number in Section 4.4.

4.2. Extended to Time-Varying Phases

In fact, not only the amplitudes but also the phases of the annual cycles are modulated according to the observations. Thus, we also use EHA, EEMD, and NMD to extract the annual cycle with both amplitudes and phases modulated from constructed data. The new MAC can be expressed as the following form:

The frequency (once a year) was fixed. The time-varying amplitude and noise of the annual cycle were set as in Section 3, and the newly added phase change was designed as follows:

The phase varies between positive and negative 20°. Then, we obtained 7 more complicated MACs. The data used in the experiments 8–14 was constructed with the new MACs and noise. And the new MACs are extracted by the EEMD, NMD, and EHA. The RMSEs of experiments 8–14 are shown in Figure 4. It can be seen that, in experiments 8–10, the RMSEs of EHA are significantly larger than those without phase changes, but they are still the smallest among the three methods. And in experiments 11–14, EHA still shows the highest accuracy. As for the amplitude, the error of the NMD result becomes larger, and the EHA result is still closest to the preset amplitude (Figure 5), indicating the superiority of EHA.

4.3. Impact of IP Number on the EHA Extraction Results

The extraction result of EHA depends on the IP number, which is a feature of the EHA method. According to Pan et al. [10], amplitudes and phases obtained by EHA using different IP numbers represent oscillations on different time scales. If the amplitudes and phases of the given MAC are very complicated, it is obvious that more IP numbers should be used in EHA. However, too many IP numbers will induce a seriously abnormal extraction result. In fact, for a given MAC, there always exists an optimal IP number that can describe it the best.

For example, when the amplitude and phase are constant, the RMSE increased with IP number, and the RMSE was the smallest when IP number is one. If the amplitude and phase were not constant, the RMSE decreased first and then increased with the increasing IP number. In other words, in each experiment, an optimal IP number can always be found to minimize the deviation of the EHA result from the preset annual cycle (Tables 1 and 2). The optimal IP number is determined by the complexity of the annual cycle and the noise power. In our idealized experiments, the optimal IP number increases with the complexity of MAC.

4.4. Annual Cycle Extraction of Observation Data

In order to test the performance of these methods (EEMD, NMD, and EHA) in the real observations, they are applied to the daily water temperature at the Rhône Sion Station from 1984 to 2013, for a total of 30 years. Sion Station is located at the bottom of a densely populated Alpine Valley through the Rhône River. We use EEMD, NMD, and EHA with 80 IPs (the reason for the selection of 80 is given later) to extract the annual cycle of this data (Figure 6(a)). The lack of true annual cycle data made it impossible to judge the accuracy of their results. The trend of annual cycle amplitude changes obtained by the three methods was generally consistent, but the EEMD results changed more drastically, and the difference between NMD and EHA is relatively small (Figure 6(b)). It shows that EHA can also effectively extract the annual cycle from the real observed climate data and can decompose annual cycles into time-varying amplitudes and phases.

Compared with the constructed data, the observation data of the temperature time series are more complicated. The way of selecting the optimal IP number has changed since we do not know the real MAC. As indicated before, the more the IP numbers, the more complicated the results obtained by EHA. As many IP numbers as possible should be used in the analysis of observed data. Too many IP numbers, however, will lead to serious abnormal extraction results. Then, the resulted time-varying amplitudes and phases will lose physical meaning and become only artifacts. Here, we attempt to propose two criteria for selecting optimal IP number when extracting MAC from observations using EHA. Firstly, if the IP number is appropriate, the time-varying amplitude obtained by EHA usually fluctuates around the constant amplitude obtained by the traditional fitting model [10]. Thus, the relative size of the time-averaged amplitude can be used to verify the reliability of the results. cannot exceed 5% of the amplitude extracted by traditional fitting model, namely,

Secondly, S_TIDE will also provide a confidence interval of 95% for the obtained amplitude. We can compare the amplitude and interval length (i.e., the signal-to-noise ratio of amplitude) to determine if the result is reasonable:where is time-averaged amplitude and is the corresponding time-averaged confidence interval. It is generally believed that greater than 2 is credible, and usually decreases with increasing IP number (Figure 7). In the observed series analyzed in this paper, the IP number less than 80 can meet the above criteria, so we treat 80 as the optimal IP number.

In the observed data, there may be many cycles, such as the annual cycle and half-year annual cycle. The previous numerical experiments were based on a monofrequency MAC. Therefore, applying the EHA described in Section 2.3 directly to the observed data may not perform as well as the experimental results. In fact, the description of EHA in Section 2.3 is a simplified version for the monofrequency MAC experiment. The complete EHA can process the data which has multiple cycles. When the data has multiple frequency cycles, the form of EHA changes from equation (5) towhere is the frequency of the -th cycle and is the number of cycles. After calculating equation (15), multiple modulated cycles with different frequencies can be obtained. For different types of data, which cycles should be used in EHA decomposition is also different, and its effect needs further verification.

5. Conclusion

The extraction of annual cycles has great significance in the study of climatic and oceanic data. We design a set of idealized experiments to test the effect of EHA on extracting the annual cycle. In the idealized experiments involving the annual cycle with only amplitude changed, the result of EHA has the smallest deviation from the preset annual cycle. The deviations of EHA and NMD extraction results are affected by the complexity of the annual cycle and the noise power. In the experiments extended to time-varying phases, the RMSEs of EHA with modulated phases are significantly larger than the RMSEs with a fixed phase, but EHA can still generate the most accurate result among the three methods.

The performance of EHA is affected by the IP number. For each group of experiments, the optimal IP number minimizing the deviation of the extraction results from the preset annual cycle can always be found. For the actual observation data, due to the lack of the reference annual cycle, the definition of the optimal IP number has changed. Thus, we proposed two criteria for selecting the optimal IP number. In future work, the new method explored in this study is expected to process a variety of climatic and oceanic data.

Data Availability

All data used to support the findings of this study are available from the corresponding author. The NMD toolbox for MATLAB provided by D. Iatsenko can be downloaded at http://www.physics.lancs.ac.uk/research/nbmphysics/diats/nmd/, and S_TIDE toolbox for MATLAB can be downloaded at https://www.researchgate.net/project/A-non-stationary-tidal-analysis-toolbox-S-TIDE. The observed water temperature data can be downloaded at https://github.com/marcotoffolon/air2stream/tree/master/Switzerland.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

Many thanks are due to Zuntao Fu for his guidance on how to generate noise with long-term memory. Thanks are due to G. Rilling for distributing the EMD package. This study was supported by the National Natural Science Foundation of China through Grant 41876003 and the National Key Research and Development Program of China through Grants 2017YFA0604101 and 2016YFC1401404.