Research Article  Open Access
A New Algorithm to Accelerate Harmonic Analysis of Time Series
Abstract
The Lomb periodogram has been traditionally a tool that allows us to elucidate if a frequency turns out to be important for explaining the behaviour of a given time series. Many linear and nonlinear reiterative harmonic processes that are used for studying the spectral content of a time series take into account this periodogram in order to avoid including spurious frequencies in their models due to the leakage problem of energy from one frequency to others. However, the estimation of the periodogram requires long computation time that makes the harmonic analysis slower when we deal with certain time series. Here we propose an algorithm that accelerates the extraction of the most remarkable frequencies from the periodogram, avoiding its whole estimation of the harmonic process at each iteration. This algorithm allows the user to perform a specific analysis of a given scalar time series. As a result, we obtain a functional model made of (1) a trend component, (2) a linear combination of Fourier terms, and (3) the socalled mixed secular terms by reducing the computation time of the estimation of the periodogram.
1. Introduction
The Lomb periodogram is a mathematical tool that allows us to carry out a spectral analysis of a time series looking for unknown periodicities. It is based on the leastsquares technique and was developed by Lomb [1] and Scargle [2] by following different ways. This tool overcomes the main problems of the classic periodogram among which we have to emphasize the fact that the Lomb periodogram can be used for analyzing unequally spaced time series.
The Lomb periodogram is a function that depends on the frequency in a continuous manner. Its maximums or peaks point out the harmonics that explain the largest variance of the data. However, it is unpractical to compute the Lomb periodogram for a time series in an infinite set of frequencies. Therefore, we have to confine ourselves to a discrete approximation of it. One of the problems that we have to deal with when we compute the Lomb periodogram is to fix a step size for the frequency domain. If the time series is evenly spaced, it suffices taking a step size that allows us to detect the maximum period of time. However, we have to be more cautious when we cope with oddly spaced time series. In this case, the shorter the step size we choose, the better and safer approximation we get. However, when the step size for the frequency domain is considerably small or the length of the time series is large, the computation time of the Lomb periodogram increases.
The discretization of the Lomb periodogram implies other problems such as the creation of spurious spectral lines. To avoid including those false peaks in a harmonic model, the Lomb periodogram is considered in a process where the frequencies that better explain the data are extracted one by one. As a frequency is pulled out, the information kept in it is removed from the data. After that, the Lomb periodogram is calculated once again for the residuals and another frequency is extracted. As we could guess, the computation time of the analysis rises. Some of those recurrent processes are the frequency analysis mapping on usual sampling (FAMOUS) proposed by Mignard [3] and the nonlinear harmonic analysis (NLHA) created by Harada [4].
Some authors, such as Mignard [3], have developed techniques in order to accelerate the computation of the periodogram by looking for the larger step size that allows us to detect the harmonic content of the data. Here we suggest another method to remove the significant spectral lines of a time series in a faster manner by patching the Lomb periodogram.
Herein, we describe the NLHA algorithm and the modification applied in order to get a quicker analysis. As an adding, we provide a couple of examples where this technique is compared with the traditional computation of the Lomb periodogram.
The paper is structured as follows. In Section 2, we present and describe the NLHA method along with its mathematical basis. In Section 3, the suggested modification of the NLHA method is explained. Section 4 contains a couple of different time series which are analyzed as an example. Finally, Section 5 provides a brief summary.
2. Nonlinear Harmonic Analysis
Let us consider a time series with observations ( in naturals, ), represented by , where is the time at which the observation is carried out and is the measurement of the phenomenon we are interested in at epoch . Henceforth, in order to facilitate subsequent calculations, given a time series, we will consider its temporal translation , where
For the construction of the NLHA model, we have used the technique introduced by Harada [4] that has been successfully applied to different problems [5]. This technique, which we will describe briefly, is about fitting a time series using the leastsquares method to a function of the form where represents the number of basic functions and are the linear coefficients to solve for. This set of basic functions consists of(1)three polynomial functions which set up the trend of the series: where is the temporal range of the data;(2)two Fourier terms for the angular frequency , denoted by (3)a couple of the socalled mixed secular terms for the frequency : This last kind of base functions does not have to appear necessary in the model for each frequency.
Suppose that angular frequencies have already been included in the functional model, the set of them being denoted by and assume that the subset contains the frequencies associated with mixed secular terms. Next, we search additional frequencies that could be added to the functional model by studying the Lomb periodogram of the residuals obtained from the leastsquares fit of the temporary functional model.
A criterion is needed to guess which frequencies are linked to mixed secular terms and which are not, just to construct our model function in (2). The procedure that allows us to elucidate such an association is based on the Lomb periodogram and an extension of it. The algorithm increases the number of frequencies one by one and adds them to the model. First, we have to compute the spectrum of the Lomb periodogram, which is given by the formula [4]: where and is the weight assigned to the observation .
The peak of this spectrum will point out a significant angular frequency to be included in the model. In the next stage, when the Fourier term of the frequency is already selected as a basic function, we compute the extended periodogram [4]: where
If this maximum of the extended periodogram is larger than the maximum of the Lomb periodogram, mixed secular terms will be linked to this frequency as well. In other cases we will only select classic Fourier terms.
Let us denote by the vector of angular frequencies. As we compute each frequency, an adjustment to the data is made by the leastsquares method just to solve for the value of the linear coefficients . After each estimation of these coefficients that we denote by , and before adding a new angular frequency, we can regard the objective function of the leastsquares problem for our model as a function on the space of frequencies, that is: with . Therefore, our problem is reduced to a minimization problem that depends nonlinearly on the parameters, namely:
To carry out this task we have used the BroydenFletcherGoldfarbShanno (BFGS) algorithm [6] where we have considered as a seed point. Let us denote by the solution of (11). Next stage deals with the adjustment of the model to the data by considering as a vector of frequencies and by using the leastsquares method again. This cycle recurs until the difference between the seed point and the solution of (11) becomes small. After that, we can continue extracting another frequency or we can stop the process if the residual root mean square (RMS) becomes smaller than a fixed level. We can also stop the algorithm by reaching a preset number of frequencies given by the user.
3. Corrected Patched Periodogram
The Lomb periodogram is one of the most used spectral tools when we are interested in looking for the harmonic behaviour of a certain set of data. However, its estimation takes long time sometimes, especially when the time series is large enough. Moreover, when the periodogram is included as a part of an iterative nonlinear harmonic process, such as the one described in previous section, we find out that the computation time is considerably increased due to the estimation of a periodogram each time a frequency is removed from the harmonic contents of the data. Now imagine that the time series is multidimensional; therefore the time the algorithm needs to provide us a result will be huge. Thus, it would be interesting to modify the estimation of the periodogram in some way just to reduce the computation time of the analysis. There are several ways to deal with this problem and get a quicker analysis of a given time series.
On one hand, we can accelerate the calculation of the periodogram by reducing the number of frequencies in the frequency domain. In this case, you take the risk of skipping harmonics that explain the data better than any other one. To cope with this problem Mignard [3] suggests an algorithm that is able to estimate a fixed step size for the frequency domain such that the harmonics that may explain the data are taken into account. However, this procedure needs you to provide a previous set of frequencies that may have something to do with the time series we want to analyze.
On the other hand, instead of reducing the number of frequencies, we can keep the frequency domain and patch the periodogram in a surrounding range of the higher peaks. Once we have estimated the Lomb periodogram for the first time, we will focus on modifying the spectrum by patching it. Herein the algorithm we suggest is described. Here we explain the process for the Lomb periodogram, but it can be applied to the extended periodogram as well.
Step 1. Let us denote by the discrete frequency domain where the periodogram will be estimated. Set up .
Step 2. By using (6), get the Lomb periodogram for the time series :
Step 3. Do . Select the frequency where the maximum of the periodogram, , is achieved, namely, . Then, remove the harmonic information from the data by following the nonlinear harmonic algorithm:
Step 4. If the stop condition of the nonlinear harmonic process is reached, then stop. Otherwise, the process continues extracting another frequency. Now, instead of estimating the whole periodogram for the residuals, we patch the spectrum of the original data. First of all, consider a frequency range surrounding the previous extracted peak with frequencies,
Step 5. Let us denote by the periodogram of the residuals for the harmonics in .
Step 6. Scale and patch the periodogram by doing where is the rate between the median of the values in and the median of the values in the patch . Then, return to Step 3.
To run the algorithm it is necessary to set the length of the range in Step 4. The value may depend on the data, but usually taking is good enough. If you expect your data to be noisy, then choosing a larger value of would be safer. As we have seen, at Step 6 we carry out a scaling of the values of the periodogram that are not patched. The reason why we scale the periodogram is because as a frequency is extracted from the data, the power of the periodogram for the residuals is compressed. Therefore, if we want to avoid creating spurious peaks, we have to reduce the power of the periodogram. A way to do that task is calculating the rate that will give us an idea of the manner the power spectra are reduced. Nevertheless, this scaling process does not assure us not to detect spurious frequencies, so in addition, we have included a correction stage, that is to say, the user has the possibility to order the algorithm to estimate the whole periodogram of the residuals after a specific number of patches. For the following steps, this periodogram will be the new to be patched.
4. Examples
4.1. Synthetic Time Series
In this section the capabilities of the method are shown by using a synthetic time series as input. The number of data is set to 5000, and the time domain goes from −2499.5 to 2499.5, regularly spaced. The harmonic content of the data is detailed in Table 1. Apart from those frequencies, the data are affected by a white noise component with unit amplitude. For the analysis, it has been considered the use of the patched periodogram with , and, besides, no correction stage is ruled out. Figures 1, 2, 3, 4, 5, and 6 show the graphical display of the Lomb periodogram as a patch is performed.
 
^{
a}Frequencies included in the functional model (in cycles per unit of time). ^{ b}Linear coefficient of the sinus Fourier term for the frequency that appears at the same row. ^{ c}Linear coefficient of the cosinus Fourier term for the frequency that appears at the same row. ^{ d}Linear coefficient of the sinus mixed secular term for the frequency appearing at the same row. ^{ e}Linear coefficient of the cosinus mixed secular term for the frequency that appears at the same row. 
The results are shown in Table 2. As we can observe, the nonlinear harmonic method together with the patched periodogram is able to detect properly the frequencies that describe the data. In fact, there are scarcely differences between those estimations and the ones got by using the standard nonlinear harmonic method—without the patched periodogram. In fact, the final RMS is the same for both methods (1.037 units). However, the time needed to perform the analysis with the patched periodogram (80.69 seconds) is almost three times shorter than the time required by the common analysis (233.38 seconds). The overhead has been reduced drastically while achieving the same goodness of the analysis results.
 
^{
a}Estimated value of the frequencies included in the functional model (in cycles per unit of time). ^{ b}Linear coefficient of the sinus Fourier term for the frequency that appears at the same row. ^{ c}Linear coefficient of the cosinus Fourier term for the frequency that appears at the same row. ^{ d}Linear coefficient of the sinus mixed secular term for the frequency appearing at the same row. ^{ e}Linear coefficient of the cosinus mixed secular term for the frequency that appears at the same row. 
4.2. IAU2000A Pole Offsets Time Series
As a real application, the IAU2000A pole offsets time series have been analyzed. The data (available at the IERS web site) refer to the and parameters which represent small changes in the unit vector components of the celestial pole position with regard to the IAU2000A precessionnutation model [7]. The time domain (in Julian days) ranges from September 23rd, 2000, to September 23rd, 2010, and is regularly spaced. As we were interested in including the same harmonic content for and models, the data were treated as a complex time series, . In order to assess the estimations of the harmonics and the computation time by using the patched periodogram, we have also carried out an analysis with the original nonlinear harmonic method. In both cases, 20 frequencies have been extracted from the data after removing a linear trend component. The range of the patched periodogram was set to , and a correction stage is applied after the extraction of each four frequencies.
The numerical results are shown in Tables 3, 4, 5, and 6. Considering the uncertainties for the estimated parameters we can see a good agreement between the model obtained by using the Lomb periodogram and the model estimated by using the patched periodogram. There are six spectral lines linked to mixed secular terms. Those frequencies are also the same for both models, and the estimations of the linear coefficients are quite similar. There are subtle differences between certain periods and their associated coefficients, and we also observe differences in the extracting frequency order but, in general, we can state that the models contain basically the same spectral lines. Moreover, both models provide similar values of the final weighted RMS: 0.09358 milliarcseconds (mas) and 0.12653 mas for and , respectively, by using the common method and 0.09356 mas and 0.12651 mas by using the patched periodogram. Nevertheless, the analysis carried out with the patched periodogram turns out to be faster (1968.73 seconds) than the analysis with the traditional Lomb periodogram (2618.69 seconds). The differences can be reduced by applying the correction stage less often, although we prefer following a conservative strategy in this example.
 
^{
a}Estimated value of the periods included in the functional model. ^{ b}Linear coefficient of the sinus Fourier term for and for the period that appears at the same row. ^{ c}Linear coefficient of the cosinus Fourier term for and for the period that appears at the same row. ^{ d}Linear coefficient of the sinus Fourier term for and for the period that appears at the same row. ^{ e}Linear coefficient of the cosinus Fourier term for and for the period that appears at the same row. 
 
^{
a}Estimated value of the periods included in the functional model. ^{ b}Linear coefficient of the sinus Fourier term for and for the period that appears at the same row. ^{ c}Linear coefficient of the cosinus Fourier term for and for the period that appears at the same row. ^{ d}Linear coefficient of the sinus Fourier term for and for the period that appears at the same row. ^{ e}Linear coefficient of the cosinus Fourier term for and for the period that appears at the same row. 
