Research Article | Open Access
M. Brackstone, A. S. Deakin, "Approximations of Time Series", International Scholarly Research Notices, vol. 2011, Article ID 321683, 10 pages, 2011. https://doi.org/10.5402/2011/321683
Approximations of Time Series
A method is proposed to approximate the main features or patterns, including interventions, that may occur in a time series. Collision data from the Ontario Ministry of Transportation illustrate the approach, using monthly collision counts from police reports over the 10-year period from 1990 to 1999. The domain of the time series is partitioned into nonoverlapping subdomains. The main condition on the approximation requires that the series and the approximation have the same average value over each subdomain. To obtain a smooth approximation, based on the second difference of the approximation, a few iterations are necessary, since an iteration over one subdomain is affected by the previous iteration over the adjacent subdomains.
1. Introduction
A graduated licensing system (GLS) is a method of gradually exposing young novice drivers to the driving environment, allowing them to gain initial experience of driving under supervision, followed by more independent driving under higher-risk circumstances . This model was widely incorporated into driver licensing programs across the US and Canada, as well as in other countries, during the 1990s. Most of these programs have incorporated similar restrictions into their initial phases . These include driving with supervision, restricted driving at night, a limit on teenage passengers, and a zero blood alcohol level while driving. The GLS has had limited long-term evaluation in North America, but long-term follow-up in New Zealand suggested an attenuated but persistent long-term reduction in young-driver collisions as a result of its implementation . The collision data for Ontario drivers around the time of the introduction of the GLS (, p. 126) illustrate the variety of approximations of time series that are possible.
There are many practical techniques for smoothing a time series . The smoothed value at a point is a weighted average of the elements of the series that lie within a local window about the point. One way to generate the weights of such a moving average uses a local polynomial approximation of order 3 (or 5), where the window includes 5 (or 7, and so on) points; the weights are then defined by least-squares regression. Another approach defines the weights in terms of an appropriate kernel, and this method applies more generally to bivariate data . The advantages of local estimates compared with global estimates are discussed in .
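For comparison with the proposed method, the local-polynomial weights just described can be computed directly. The following minimal Python/NumPy sketch (the function names are illustrative, not from the paper) fits a polynomial by least squares over a centred window and returns the weights that produce the fitted value at the centre; the ends of the series are simply left undefined rather than treated by any particular end rule.

```python
import numpy as np


def local_poly_weights(half_width: int, order: int) -> np.ndarray:
    """Moving-average weights from a local least-squares polynomial fit.

    A polynomial of the given order is fitted to the 2*half_width + 1 points
    of a centred window; the returned weights give the fitted value at the
    centre as a weighted average of the window values.
    """
    offsets = np.arange(-half_width, half_width + 1)
    X = np.vander(offsets, order + 1, increasing=True)  # columns 1, j, j^2, ...
    # The fitted value at offset 0 is the intercept, i.e. row 0 of pinv(X) @ y.
    return np.linalg.pinv(X)[0]


def moving_average(y, weights: np.ndarray) -> np.ndarray:
    """Apply centred weights at every interior point; the ends stay NaN."""
    y = np.asarray(y, dtype=float)
    h = len(weights) // 2
    out = np.full(len(y), np.nan)
    for t in range(h, len(y) - h):
        out[t] = weights @ y[t - h:t + h + 1]
    return out


w5 = local_poly_weights(half_width=2, order=3)  # cubic over a 5-point window
w7 = local_poly_weights(half_width=3, order=5)  # quintic over a 7-point window
print(w5 * 35)  # close to [-3, 12, 17, 12, -3]
```

Because the fit is exact for polynomials up to the chosen order, the 5-point cubic filter leaves a cubic series unchanged at interior points; these are the classical Savitzky–Golay smoothing weights.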
The first step in the computational process is to partition the domain of the time series into subdomains, which are then labelled as odd-numbered or even-numbered. Iterations are performed over the odd-numbered subdomains, followed by iterations over the even-numbered subdomains. This process is numerically efficient, since the iterations over one set of subdomains update the boundary conditions for the iterations over the remaining subdomains . To obtain a smooth and accurate approximation, this pair of sweeps is repeated a few times.
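The odd/even scheme can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' Maple code: indices are zero-based, the first and last subdomains are single fixed points (so no external boundary condition is needed), the iteration starts from the averaged series, and the local update is our reading of the closed form in Section 3, namely the values with the smallest sum of squared second differences that have the subdomain mean and meet the current neighbouring values. All function names are hypothetical.

```python
import numpy as np


def second_diff_matrix(n: int) -> np.ndarray:
    """Tridiagonal matrix with -2 on the diagonal and 1 beside it."""
    return -2.0 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)


def local_update(ybar: float, n: int, a_left: float, a_right: float) -> np.ndarray:
    """Smoothest n values with mean ybar, given the two neighbouring values.

    Minimises the sum of squared second differences on the subdomain:
    a straight line between the neighbours plus a quartic correction.
    """
    B = second_diff_matrix(n)
    w = np.linalg.solve(B, np.ones(n))   # quadratic profile, B w = 1
    z = np.linalg.solve(B, w)            # quartic profile, B z = w
    i = np.arange(1, n + 1)
    line = a_left + (a_right - a_left) * i / (n + 1)
    c = n * ybar - line.sum()            # mismatch in the subdomain sum
    return line + c * z / (w @ w)


def approximate(y, sizes, tol=1e-10, max_sweeps=1000):
    """Alternating odd/even sweeps over the subdomains given by `sizes`.

    Assumes the first and last subdomains are single fixed points.
    """
    y = np.asarray(y, dtype=float)
    bounds = np.concatenate(([0], np.cumsum(sizes)))
    if bounds[-1] != len(y) or sizes[0] != 1 or sizes[-1] != 1:
        raise ValueError("sizes must cover y and start and end with 1")
    means = [y[lo:hi].mean() for lo, hi in zip(bounds[:-1], bounds[1:])]
    a = np.repeat(means, sizes)          # start from the averaged series
    for _ in range(max_sweeps):
        previous = a.copy()
        for parity in (0, 1):            # 1st, 3rd, ... then 2nd, 4th, ...
            for j in range(parity, len(sizes), 2):
                lo, hi = bounds[j], bounds[j + 1]
                if lo == 0 or hi == len(y):
                    continue             # fixed end points stay equal to y
                a[lo:hi] = local_update(means[j], hi - lo, a[lo - 1], a[hi])
        if np.max(np.abs(a - previous)) < tol:
            break
    return a
```

Each local update enforces the mean condition exactly, so after the last sweep every subdomain keeps the average of the series; the tolerance on the change between sweeps stands in for repeating the sweeps "a few times".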
This paper is organized as follows. Section 2 describes the form of the approximations, the partition of the time series, and the minimization over the subdomains of the partition. In Section 3, the equations for the approximation are derived, and some computational details are given in Section 4, where it is also shown that, under certain assumptions, time series with variable spacing can be approximated. The time series and their approximations are presented in Section 5, and in Section 6 an example outlines the approach for step level changes, missing data, and outliers. In Section 7, the approximation over a subdomain is shown to be determined by a fourth-order polynomial and a straight line. Finally, guidelines for applying the proposed approach to a time series are outlined in “Concluding remarks.”
2. The General Model
The general model for the approximation of the time series is given by (2.1). In this model, the terms are an outlier term (if present); the $k$th approximation, $k = 1, 2, \ldots$; a trend, which includes the level changes; a nonperiodic oscillatory function; the remainder; and a measure of the variation of the remainder. A restriction on the approximations requires that the root-mean-square (RMS) value of the remainder decreases as $k$ increases. The form of (2.1) is similar to the asymptotic expansion of a function that contains a small positive parameter (, pp. 1–4).
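The partition of the domain of the time series, $T = \{1, 2, \ldots, N\}$, is chosen in order to approximate accurately the possible patterns in the time series. Let $P = \{D_1, \ldots, D_m\}$ be a partition of $T$ into the nonoverlapping subdomains $D_j = \{t_{j-1} + 1, \ldots, t_j\}$, $j = 1, \ldots, m$, with $t_0 = 0$ and $t_m = N$, where $n_j = t_j - t_{j-1}$ is the number of elements in the $j$th subdomain. The overlapping subdomains, over which the iterations are computed, are defined as $D'_j = \{t_{j-1}, \ldots, t_j + 1\}$, that is, $D_j$ extended by one point on each side. For $j = 1$ the left endpoint is $t_0 = 0$, and for $j = m$ the right endpoint is $N + 1$, so that the overlapping subdomains are defined over the interval $[0, N + 1]$.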
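The index sets of a partition can be made concrete with a short helper. This sketch uses 1-based times $t = 1, \ldots, N$ to match the text and assumes that each overlapping subdomain $D'_j$ is the nonoverlapping subdomain $D_j$ extended by a single point on each side, which is what the two boundary values required for the minimization suggest; the function name is hypothetical.

```python
from typing import List, Tuple


def subdomains(sizes: List[int]) -> List[Tuple[range, range]]:
    """Return (D_j, D'_j) for each block size n_j, with times t = 1, ..., N.

    D_j = t_{j-1}+1, ..., t_j holds the n_j points of the j-th subdomain;
    D'_j = t_{j-1}, ..., t_j + 1 adds one neighbour on each side, so the
    overlapping subdomains together span 0, ..., N + 1.
    """
    result = []
    t_prev = 0                               # t_0 = 0
    for n in sizes:
        t_j = t_prev + n
        inner = range(t_prev + 1, t_j + 1)   # D_j
        overlap = range(t_prev, t_j + 2)     # D'_j
        result.append((inner, overlap))
        t_prev = t_j
    return result


for d, d_overlap in subdomains([1, 4, 4, 1]):
    print(list(d), list(d_overlap))
```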
An approximation of a time series $y_t$, $t = 1, \ldots, N$, is determined by a few iterations starting from an initial estimate. Once the desired accuracy is obtained, the last iteration is taken as the approximation. All iterations and the approximation, along with the remainder, satisfy the following properties.
(1) An iteration over a subdomain $D_j$ has the same average value as the time series and, hence, the average value of the remainder is zero over this subdomain. If the sum of the series over a subdomain is regarded as a measure of “energy” in the process, then the approximation conserves energy over each subdomain in the partition. In the particular case of a subdomain with a single element ($n_j = 1$), the approximation equals the series there, the point is a fixed point for the approximation, and the remainder is zero at that point.
(2) The smoothness of the $k$th iteration $a^{(k)}$ at time $t$ is measured by $s_t^{(k)} = a_{t-1}^{(k)} - 2a_t^{(k)} + a_{t+1}^{(k)}$, which is the second difference of $a^{(k)}$ at time $t$. The norm on $D_j$ is defined as the RMS value $\|s^{(k)}\|_j = \big(n_j^{-1}\sum_{t \in D_j} (s_t^{(k)})^2\big)^{1/2}$. Provided that the number of elements in $D_j$ is greater than 1, the condition that $\|s^{(k)}\|_j$ has a minimum value is imposed. The iterate is required at the two points of the overlapping subdomain $D'_j$ just outside $D_j$ in order to determine $s^{(k)}$ at the endpoints of $D_j$; these two values are the boundary conditions for the minimization on $D_j$.
(3) For most of the examples presented in this paper, the first and last values of the series, $y_1$ and $y_N$, are fixed points, so that $n_1 = 1$ and $n_m = 1$. These values provide the boundary conditions for the minimization over the adjacent subdomain. In the general case where $n_1 > 1$ ($n_m > 1$), one of the boundary conditions is missing in $D'_1$ ($D'_m$), so that an external boundary condition is required, as described in the last paragraph of Section 3.
(4) In some cases there are two or more approximations over one or more subdomains, and a criterion is required to choose the best approximation. From (2.1), the second approximation is determined from the remainder of the first approximation over a partition $P_2$. For the example in Section 5, the simplest case occurs when $P_2$ is a refinement of the first partition $P_1$; that is, each subdomain $D_j$ of $P_1$ is covered by one or more subdomains of $P_2$. Then one approximation over $D_j$ is the first approximation, and the other is the sum of the first and second approximations. Let $\rho_1$ denote the RMS value over $D_j$ of the remainder of the first approximation, and let $\rho_2$ denote the RMS value over $D_j$ of the remainder of the two-term approximation. The two-term approximation is a significant improvement if $\rho_2/\rho_1 < \lambda$ for a chosen value of $\lambda$. As shown in Section 7, an upper bound for $\lambda$ takes values between 0.75 and 0.9. For the example in Section 6 involving an outlier, there are two approximations at the suspected outlier.
(5) The magnitude of the remainder is defined to be the RMS value of the remainder over each subdomain in a partition; this definition implies that the RMS value of the remainder term in (2.1) is equal to 1. For the example presented in Section 5, these subdomains are uniform with 12 elements.
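The smoothness measure in item (2) and the significance test in item (4) reduce to a few lines of NumPy. In this sketch the threshold of 0.75 is only an illustrative choice at the lower end of the range quoted in item (4), and the function names are hypothetical.

```python
import numpy as np


def second_difference(a) -> np.ndarray:
    """s_t = a_{t-1} - 2 a_t + a_{t+1} at every interior point of a."""
    return np.diff(np.asarray(a, dtype=float), n=2)


def rms(x) -> float:
    """Root-mean-square value of a sequence."""
    x = np.asarray(x, dtype=float)
    return float(np.sqrt(np.mean(x ** 2)))


def refinement_is_significant(r_first, r_second, threshold=0.75) -> bool:
    """Item (4): accept the two-term approximation on a subdomain if the RMS
    of its remainder is below `threshold` times the RMS of the first one."""
    return rms(r_second) / rms(r_first) < threshold


print(rms(second_difference([0.0, 1.0, 0.0, 1.0, 0.0])))
```

With this threshold, a ratio such as the 0.49 reported in Section 5 would count as a significant improvement.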
3. Mathematical Details
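Given the iterates at the two boundary points $t_{j-1}$ and $t_j + 1$ of the overlapping subdomain $D'_j$, the iterates for $t \in D_j$ are computed such that the sum of squares of the second differences $s_t$, $t \in D_j$, has a minimum value. For the moment, the first and last subdomains, $D_1$ and $D_m$, are excluded. Let $n = n_j$. The following variables are required to set up the equations for the minimization over the subdomain: the $n \times 1$ vectors $\mathbf{a}$ and $\mathbf{s}$ of iterates and second differences on $D_j$, the vector of ones $\mathbf{1}$, and the boundary vector $\mathbf{b} = a_{t_{j-1}}\mathbf{e}_1 + a_{t_j+1}\mathbf{e}_n$, where $\mathbf{e}_1$ and $\mathbf{e}_n$ are the first and last unit vectors; a prime on a matrix indicates the transposed matrix. From the definition of $s_t$ in Section 2, these are related by $\mathbf{s} = B\mathbf{a} + \mathbf{b}$, where $B$ is the $n \times n$ tridiagonal symmetric matrix with $-2$ on the main diagonal and $1$ on the two adjacent diagonals. The equations for the iterations are obtained by replacing $\mathbf{a}$ with the $k$th iterate. Since the sum of the iterates over $D_j$ is the constant $n\bar{y}_j$ for all $k$, where $\bar{y}_j$ is the average value of $y_t$ over $D_j$, and since $B$ is symmetric, $\mathbf{1}'\mathbf{a} = \mathbf{w}'(\mathbf{s} - \mathbf{b}) = n\bar{y}_j$, where $\mathbf{w}$ is the solution of $B\mathbf{w} = \mathbf{1}$. Thus, the condition on the sum in terms of $\mathbf{s}$ is $\mathbf{w}'\mathbf{s} = c$, where $c = n\bar{y}_j + \mathbf{w}'\mathbf{b}$ and $w_i = -i(n + 1 - i)/2$, $i = 1, \ldots, n$ (Section 7). The solution for $\mathbf{s}$ such that $\|\mathbf{s}\|$ has a minimum subject to this condition is $\mathbf{s} = (c/\mathbf{w}'\mathbf{w})\,\mathbf{w}$. Finally, $\mathbf{a} = B^{-1}(\mathbf{s} - \mathbf{b})$, and the solution is
$$\mathbf{a} = -a_{t_{j-1}}B^{-1}\mathbf{e}_1 - a_{t_j+1}B^{-1}\mathbf{e}_n + \frac{c}{\mathbf{w}'\mathbf{w}}\,\mathbf{z}, \qquad \mathbf{z} = B^{-1}\mathbf{w}. \tag{3.5}$$
The next iteration on $D_j$ follows from (3.5) with $a_{t_{j-1}}$ and $a_{t_j+1}$ taken as the current iterates at $t_{j-1} \in D_{j-1}$ and $t_j + 1 \in D_{j+1}$, respectively. For a given $n_j$, $\mathbf{w}$ and then $\mathbf{z}$ are uniquely determined, so $\mathbf{z}/\mathbf{w}'\mathbf{w}$ is computed in advance for all of the subdomain sizes that occur in a time series.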
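The closed-form solution can be checked against a direct numerical solve of the same constrained least-squares problem. The sketch below (hypothetical function names) builds the tridiagonal matrix $B$, forms the boundary vector $\mathbf{b} = a_{\text{left}}\mathbf{e}_1 + a_{\text{right}}\mathbf{e}_n$ from the two neighbouring values, and compares a straight-line-plus-quartic formula with the Lagrange-multiplier (KKT) solution of minimizing $\|B\mathbf{a} + \mathbf{b}\|^2$ subject to $\mathbf{1}'\mathbf{a} = n\bar{y}$. The formula is our reading of the solution, so treat it as an assumption that the test confirms numerically.

```python
import numpy as np


def second_diff_matrix(n: int) -> np.ndarray:
    """Tridiagonal B with -2 on the diagonal and 1 on the adjacent diagonals."""
    return -2.0 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)


def closed_form(ybar, n, a_left, a_right):
    """Straight line between the neighbours plus a scaled quartic z = B^{-1} w."""
    B = second_diff_matrix(n)
    w = np.linalg.solve(B, np.ones(n))                # B w = 1 (quadratic)
    z = np.linalg.solve(B, w)                         # B z = w (quartic)
    i = np.arange(1, n + 1)
    line = a_left + (a_right - a_left) * i / (n + 1)  # equals -B^{-1} b
    c = n * ybar - line.sum()                         # equals n*ybar + w'b
    return line + c / (w @ w) * z


def kkt_solve(ybar, n, a_left, a_right):
    """Minimise ||B a + b||^2 subject to sum(a) = n * ybar via the KKT system."""
    B = second_diff_matrix(n)
    b = np.zeros(n)
    b[0] += a_left
    b[-1] += a_right
    K = np.zeros((n + 1, n + 1))
    K[:n, :n] = 2.0 * B.T @ B
    K[:n, n] = 1.0
    K[n, :n] = 1.0
    rhs = np.concatenate((-2.0 * B.T @ b, [n * ybar]))
    return np.linalg.solve(K, rhs)[:n]


a = closed_form(2.0, 4, 0.5, -1.0)
print(a, a.mean())
```

Precomputing `z / (w @ w)` once per subdomain size, as the text suggests, turns each local update into a line evaluation plus one scaled vector.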
3.1. External Boundary Conditions
If $y_1$ is not a fixed point ($n_1 > 1$), then the subdomain $D_1$ requires an external boundary condition at $t = 0$, just outside the series. Here are three possible external boundary conditions that can be used to reflect the possible behavior of the time series near the endpoint.
(1) The slope of the tangent is zero at the endpoint.
(2) The boundary value at $t = 0$ is defined so that the second differences over $D_1$, given by (3.4), are zero. This condition implies that the approximation over $D_1$ is a segment of a straight line.
(3) An iterative process is used to obtain the boundary value at $t = 0$ such that the RMS value of the remainder over the adjacent subdomain(s) has a minimum value.
Similarly, if $y_N$ is not a fixed point ($n_m > 1$), the external boundary conditions at $t = N + 1$ are obtained by replacing the quantities at the left end of the series with the corresponding quantities at the right end.
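Condition (2) has a simple closed form under the following assumptions: the missing boundary value sits at $t = 0$, the neighbouring value at $t = n_1 + 1$ is known, and the fit on $D_1$ must have the subdomain average $\bar{y}_1$. A straight line through $(0, a_0)$ and $(n_1 + 1, a_{\text{right}})$ has mean $(a_0 + a_{\text{right}})/2$ over $t = 1, \ldots, n_1$, so the mean condition gives $a_0 = 2\bar{y}_1 - a_{\text{right}}$. The helper name is hypothetical.

```python
import numpy as np


def straight_line_start(ybar_first: float, n: int, a_right: float):
    """External condition (2) at the left end, in an assumed discrete form.

    Chooses the value a_0 at t = 0 so that the fit on D_1 is a straight line
    with mean ybar_first; a line through (0, a_0) and (n + 1, a_right) has
    mean (a_0 + a_right) / 2 over t = 1..n, hence a_0 = 2*ybar_first - a_right.
    Returns a_0 and the straight-line values on D_1.
    """
    a0 = 2.0 * ybar_first - a_right
    t = np.arange(1, n + 1)
    return a0, a0 + (a_right - a0) * t / (n + 1)


a0, segment = straight_line_start(3.0, 5, 4.5)
print(a0, segment)
```

At the right end the mirror-image choice $a_{N+1} = 2\bar{y}_m - a_{\text{left}}$ applies under the same assumptions. Conditions (1) and (3) are not sketched here, since their exact discrete form is not spelled out in the text.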
4. Computational Aspects
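For a time series $y_t$ and a partition $P$, there is a related averaged series $\bar{y}_t$ defined by $\bar{y}_t = \bar{y}_j$ for $t \in D_j$, where $\bar{y}_j$ is the average value of $y_t$ for $t \in D_j$. The following property holds for all of the approximations in this paper: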
The approximations for a time series and the averaged time series are the same to the desired accuracy provided that the same partition and the same external boundary conditions (if any) are applied.
Consequently, any time series with variable spacing can be approximated provided that the estimates of the average values of the time series over the subdomains are adequate.
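The averaged series, and one possible way of estimating the subdomain averages when observations are irregularly spaced, can be sketched as follows. Binning observations by time and averaging within each bin is our assumption about how adequate estimates might be obtained, not a procedure described in the paper; the function names are hypothetical.

```python
import numpy as np


def averaged_series(y, sizes) -> np.ndarray:
    """Replace every value by the average of y over its subdomain."""
    y = np.asarray(y, dtype=float)
    bounds = np.concatenate(([0], np.cumsum(sizes)))
    means = [y[lo:hi].mean() for lo, hi in zip(bounds[:-1], bounds[1:])]
    return np.repeat(means, sizes)


def bin_means(times, values, edges) -> np.ndarray:
    """Estimate subdomain averages from irregularly spaced observations.

    Observation k falls in bin j when edges[j] <= times[k] < edges[j + 1];
    observations outside all bins are ignored, and an empty bin gives NaN
    (NumPy also emits a warning in that case).
    """
    times = np.asarray(times, dtype=float)
    values = np.asarray(values, dtype=float)
    edges = np.asarray(edges, dtype=float)
    nbins = len(edges) - 1
    idx = np.searchsorted(edges, times, side="right") - 1
    inside = (idx >= 0) & (idx < nbins)
    sums = np.bincount(idx[inside], weights=values[inside], minlength=nbins)
    counts = np.bincount(idx[inside], minlength=nbins)
    return sums / counts


print(averaged_series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], [1, 2, 3]))
```

In any implementation whose local update uses the series only through its subdomain averages, replacing $y_t$ by `averaged_series(y, sizes)` leaves the approximation unchanged, which is the property stated above.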
The approximation for the averaged series is employed especially for larger subdomains ( or more). The efficiency of the computations is increased if, in the first set of iterations (odd and even), each boundary condition is the average of the four values of the series that straddle the boundary between adjacent subdomains. The averaged series was used in all computations, although the approximation obtained from the original series may be more efficient in special cases.
It is convenient to introduce another notation to represent a partition, in which the numbers of elements in the subdomains are listed block by block, with the first and last blocks given separately. These blocks are a convenient way to separate the seasons or a set of months. The approximation obtained by iterating $k$ times with a given partition is labelled by $k$ and the partition. The number of iterations $k$ is determined from the difference between successive iterations by requiring that it be below a chosen tolerance, with a value specified for Figures 1 and 2, for Figures 3 and 4, and for Figure 5. All calculations in this paper were performed using Maple software .
Two time series, provided by the Ontario Ministry of Transportation (, p. 126), illustrate the approximations. Figure 1 shows the time series of monthly accidents for young novice drivers; its main feature is the intervention at month 52, owing to the introduction of the GLS on April 1, 1994. The corresponding graph for all drivers is shown in Figure 3, where a strong feature of the series is the sharp drop from the maximum in December/January to April in every year except the last 2.
In Figure 1, the partition, which is uniform except for the first and last blocks, provides a smooth approximation and captures the intervention well. In this case, the first and last elements are fixed ($n_1 = 1$ and $n_m = 1$). Since the sum of the elements of the remainder over each subdomain is zero, the remainder also sums to zero over the whole series. Other uniform partitions are possible, with 3 or 6 elements in each subdomain. With 3 elements, the approximation is slightly less smooth than in Figure 1, and the RMS value of the remainder is 26; with 6 elements, the RMS value of the remainder is 30. A more accurate approximation is obtained if the subdomains have two elements; however, this approximation has an angular appearance because it follows the time series more closely.
In Figure 2, the approximation of Figure 1 is expressed as the sum of a trend and an oscillatory series. The trend is computed with its own partition, and the external boundary condition of zero slope makes the tangent horizontal at the endpoints of the series. The trend in this example is a seasonal approximation of the time series in which the subdomains contain 6 elements over the domain of the intervention. The oscillatory series is the remainder, that is, the approximation of Figure 1 minus the trend.
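In Figure 3, the points for January or December (plus one November) and April are fixed points for the first approximation. The second approximation captures the increase in the number of accidents in the summer months by approximating the remainder of the first approximation; the sum of the two gives the improved approximation. The partition for the second approximation is a refinement of the first partition, in which each subdomain of 7 elements is split into smaller subdomains and each subdomain of 8 elements is split into subdomains of 3 and 5 elements. Consequently, the first approximation and the sum of the two approximations have the same average value over the subdomains of the first partition.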
The subdomains common to the two partitions are indicated in Figure 3 by the intervals over which the second approximation is zero. For the remaining intervals, the ratio of the RMS value of the remainder in Figure 4 to the RMS value of the remainder of the first approximation is 0.49, so this second approximation is significant. Furthermore, for each of the ten segments, excluding the segments in which the second approximation is identically zero, this ratio lies between 0.40 and 0.52. The approximation of the first remainder and the resulting remainder appear in Figure 4.
6. Level Changes, Missing Data, and Outliers
For a step level change between two consecutive times, a single approximation may not adequately approximate the time series in the subdomains on both sides of the step. For the part of the series before the step, an external boundary condition is applied just after the step such that the remainder of the approximation has a minimum RMS value. The same approach is applied to the part of the series after the step. These ideas are illustrated in Figure 5, where the approximation is computed over [1,120] with a specified partition. The approximation exhibits a phenomenon similar to the behavior of a Fourier series near a discontinuity (the Gibbs phenomenon), in that the approximation overshoots to the right and undershoots to the left of the jump. The maximum and minimum of the approximation are 1.176 and −0.173. Moving away from the jump in either direction, the oscillations of the approximation decrease rapidly in amplitude. The criterion in item 4 of Section 2 is applied over the subdomains adjacent to the step to choose between the two approximations of the trend. Interventions and level shifts, from an autoregressive moving average point of view, are presented in  and .
A simple example indicates the approach for a series that has a missing value or a possible outlier at a single time. In the case of a missing value, the series is not defined at that time. For both cases, the approximation is determined for a given partition, with the value of the series at that time treated as an unknown that is chosen such that the RMS value of the remainder is a minimum. A good initial estimate for this value is the average value of the time series in a window about the point. An iterative process then gives the optimal value, as shown in Figure 6, and the RMS value of the remainder is 0.104. The smoothest approximation occurs for the value at which the RMS value of the second differences has a minimum. For the case of a possible outlier, the outlier term in (2.1) is the difference between the observed value and the optimal value at that time, and it is zero elsewhere, provided that the ratio of the RMS value of the remainder of the approximation with the optimal value to the RMS value of the remainder of the approximation under the assumption that the point is not an outlier satisfies the condition in item 4 of Section 2.
7. Properties of Approximations
Approximations of Random Samples
The point of this exercise is to determine the threshold in item 4 of Section 2 such that the only reasonable approximation for a series of random samples is the mean of the series. A total of 12,000 random samples from the normal distribution with a mean of 0 and a standard deviation of 1 were generated using Maple to form 100 time series with 120 elements each. For each series, five approximations were determined, for uniform partitions whose subdomains contain 3, 4, 6, 12, and 24 elements. The external boundary condition for the approximation is zero slope of the tangent. For each series, the ratio of the RMS value of the remainder of the approximation to the RMS value of the series was calculated, and the results are given in Table 1. The approximations corresponding to 24 and 12 elements are smooth and appear to reflect an underlying pattern in the series, whereas for 3 and 4 elements the approximations are contorted. An upper bound for the threshold is therefore less than the minimum values in these ranges.
The terms in equation (3.5) for the approximation over $D_j$ have a simple interpretation. In the first two terms of (3.5), the vectors multiplying the two boundary values are points on straight lines. For the last term, Maple solves exactly for the two vectors involved, the second of them from (3.6); this exact computation shows that they are points on the graph of a quadratic and a quartic polynomial, respectively.
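To describe the properties of the quadratic and quartic polynomials, it is necessary to change variables from $t$ to $x = t - t_c$, where $t_c$ is the central point of the interval $D_j$. In the $x$ variable, the boundary conditions are applied at the points $x = \pm h$, where $h = (n_j + 1)/2$. The quadratic polynomial $q$ is defined by $q(x - 1) - 2q(x) + q(x + 1) = 1$ and $q(\pm h) = 0$, so that $q(x) = (x^2 - h^2)/2$. At each integer point of $D_j$, the corresponding component of the quadratic solution vector equals $q$ at the corresponding value of $x$. The quartic polynomial $p$ is determined by $p(x - 1) - 2p(x) + p(x + 1) = q(x)$ and $p(\pm h) = 0$, and the roots of the equation $p(x) = 0$ are $x = \pm h$ and $x = \pm\sqrt{5h^2 + 1}$. Thus, $p(x) = (x^2 - h^2)(x^2 - 5h^2 - 1)/24$. The quantity $G(n_j)$ is a measure of the smoothness of the approximation over $D_j$: G(2) = 1.0, G(3) = 0.59, G(4) = 0.39, G(6) = 0.21, G(12) = 0.062, and G(24) = 0.017.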
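The quoted values of $G$ can be checked numerically. This sketch assumes, as an interpretation rather than a definition stated in the text, that $G(n)$ is the RMS second difference on an $n$-point subdomain per unit difference between the subdomain mean and the mean of its straight-line part; with $B\mathbf{w} = \mathbf{1}$ this works out to $G(n) = \sqrt{n/\mathbf{w}'\mathbf{w}}$, and since $\mathbf{w}'\mathbf{w} = n(n+1)(n+2)(n^2+2n+2)/120$ it also has a closed form. The same code checks the quartic $p(x)$ against $B^{-1}\mathbf{w}$. Function names are hypothetical.

```python
import numpy as np


def b_matrix(n: int) -> np.ndarray:
    """Tridiagonal second-difference matrix: -2 on the diagonal, 1 beside it."""
    return -2.0 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)


def G(n: int) -> float:
    """Assumed definition: sqrt(n / w'w) with B w = 1 on n points."""
    w = np.linalg.solve(b_matrix(n), np.ones(n))
    return float(np.sqrt(n / (w @ w)))


def G_closed(n: int) -> float:
    """Same quantity using w'w = n(n+1)(n+2)(n^2+2n+2)/120."""
    return float(np.sqrt(120.0 / ((n + 1) * (n + 2) * (n * n + 2 * n + 2))))


def quartic_matches(n: int) -> bool:
    """Compare z = B^{-1} w with p(x) = (x^2-h^2)(x^2-5h^2-1)/24, x = i - h."""
    B = b_matrix(n)
    w = np.linalg.solve(B, np.ones(n))
    z = np.linalg.solve(B, w)
    h = (n + 1) / 2
    x = np.arange(1, n + 1) - h
    return bool(np.allclose(z, (x**2 - h**2) * (x**2 - 5 * h**2 - 1) / 24))


for n in (2, 3, 4, 6, 12, 24):
    print(n, round(G(n), 3), quartic_matches(n))
```

Under this assumption the six values listed above are reproduced to the quoted precision, which supports reading $G$ as the factor by which a given mismatch in the subdomain mean is converted into curvature: the larger the subdomain, the smoother the correction.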
Concluding Remarks
The major input for the approximation of a time series is the partition of the domain. Initially, a uniform partition is chosen and, if seasonal behavior is present in the series, a subset of the subdomains is chosen to cover the domain of each season. In general, as the length of the subintervals decreases, the approximation becomes less smooth and more accurate. The best approximation is obtained at the point at which the approximation is still acceptably smooth. The subintervals can be enlarged to determine a much smoother approximation that can be labelled as a trend while still respecting the seasonal aspects of the series; however, if an intervention is present, then some adjustment of the partition may be necessary in the region of the intervention. For time series with a well-defined local maximum or minimum, the approximation can be assigned the same value as the series by taking one subdomain of the partition to be that single point of the domain. For series with jumps and other complexities, the examples in Section 6 suggest how to proceed.
An approach in the literature, as indicated in the introduction, defines the approximation at a point as a weighted average of the values of the time series in a window about the point. This approach may smooth out interesting features in the time series and, if applied over smaller intervals, yields an approximation that is not smooth. Since the proposed model is not based on regression, a comparison of the two approaches has not been considered.
- A. F. Williams and R. A. Shults, “Graduated driver licensing research, 2007–present: a review and commentary,” Journal of Safety Research, vol. 41, no. 2, pp. 77–84, 2010.
- M. Brackstone, Proposal for impact evaluation of graduated licensing system on young drivers in Ontario, M.S. thesis, Department of Epidemiology and Biostatistics, University of Western Ontario, Canada, 2008.
- J. D. Langley, A. C. Wagenaar, and D. J. Begg, “An evaluation of the New Zealand graduated driver licensing system,” Accident Analysis and Prevention, vol. 28, no. 2, pp. 139–146, 1996.
- G. Janacek, Practical Time Series, Oxford University Press, New York, NY, USA, 2001.
- W. Härdle, Applied Nonparametric Regression, vol. 19 of Econometric Society Monographs, Cambridge University Press, New York, NY, USA, 1990.
- L. Keele, Semiparametric Regression for the Social Sciences, Wiley, Hoboken, NJ, USA, 2008.
- B. F. Smith, P. E. Bjørstad, and W. D. Gropp, Domain Decomposition: Parallel Multilevel Methods for Elliptic Partial Differential Equations, Cambridge University Press, New York, NY, USA, 1996.
- J. Kevorkian and J. D. Cole, Perturbation Methods in Applied Mathematics, vol. 34 of Applied Mathematical Sciences, Springer, New York, NY, USA, 1981.
- Maple software, version 13, Maplesoft, Waterloo, Canada.
- W. W. S. Wei, Time Series Analysis: Univariate and Multivariate Methods, Addison Wesley/Pearson, Boston, Mass, USA, 2nd edition, 2006.
- R. S. Tsay, “Outliers, level shifts, and variance changes in time series,” Journal of Forecasting, vol. 7, pp. 1–20, 1988.
Copyright © 2011 M. Brackstone and A. S. Deakin. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.