Parameter Estimation in Mean Reversion Processes with Deterministic Long-Term Trend
This paper describes a procedure based on maximum likelihood technique in two phases for estimating the parameters in mean reversion processes when the long-term trend is defined by a continued deterministic function. Closed formulas for the estimators that depend on observations of discrete paths and an estimation of the expected value of the process are obtained in the first phase. In the second phase, a reestimation scheme is proposed when a priori knowledge exists of the long-term trend. Some experimental results using simulated data sets are graphically illustrated.
In the literature it is common to find studies and applications of mean reversion models, at both practical and theoretical levels, especially in fields such as energy markets, commodities, and interest rates. In these models there is a long-term trend which acts as an attractor making the process to oscillate around it and there is a random component that adds volatility to the movement. The specific characteristics of the models will depend on the structure of the trend and the volatility of the process.
Some common models of mean reversion are those in which all the parameters in the process are constant, such as Ornstein-Uhlenbeck model, the CIR model proposed by Cox et al. , and in general the models obtained from the CKLS structure proposed by Chan et al. . In addition to these models there are other methods not so common in the literature in which the reversion is determined not by a parameter but a deterministic function or another stochastic process [3, 4]. In general, the fit of the models to specific situations is done by assigning values to the parameters and functions, either from a priori knowledge of the problem or from parameter estimation techniques based on historical data series that help to detect unobservable information from the process.
One of the first difficulties in the application of parameter estimation techniques is that it is necessary to have information with the same time resolution that is specified in the model. Since models of stochastic differential equations are formulated in continuous time and that the observed paths of the process can be obtained only in discrete time, it is necessary to discretize the EDE model. An approach frequently used for this purpose is given by Euler-Maruyama. With this methodology a new discrete-time process is obtained, with which it is possible to make inferences that are valid for continuous time model, as in the case of estimating the parameters.
There are different procedures that can be implemented such as methods based on distributional moments , Kernel methods [6, 7], least squares, and maximum likelihood [8–10]. Parameter estimation becomes fundamental too for modeling control systems. In the literature it is possible to find different methodologies for those issues such as those developed for Meng and Ding , where the parameter estimation in an equation-error autoregressive (EEAR) system is done through a transformation of the model to an equivalent one removing the autoregressive term and to obtain the parameters of the original model the principle of equivalence is used. Another methodology is proposed for Cao and Liu  where the parameter estimation in a power system is carried out from a recursive procedure to estimate simultaneously all the parameters from techniques based on the hierarchical identification principle, using gradient algorithms and least squared algorithms. Complementarily, Ding et al.  used methods based on the gradient algorithms and least squares iterative algorithms for system identification in output error (OE) and output error moving average (OEMA) systems, where the parameters that depend on unknown variables are computed using estimates of these unknown variables from previously estimated parameters.
In the area of multirate system identification, Ding et al. [14, 15] used the polynomial transformation technique to deal with the identification problem for dual-rate stochastic systems, while Sahebsara et al.  discussed the parameter estimation of multirate multi-input multioutput systems. Also, Ding and Chen  presented the combined parameter and state estimation algorithms of the lifted state-space models for general dual-rate systems based on the hierarchical identification method. Shi et al.  gave a crosstalk identification algorithm for multirate xDSL FIR systems.
In general, the auxiliary model identification idea has been used to solve the identification problem of dual-rate single-input single-output systems as shown in .
For more detailed information about multi-innovation stochastic gradient algorithm for multi-input multioutput systems and for multirate multi-input systems as well as auxiliary models based multi-innovation extended stochastic identification theory (see  and references therein). We will concentrate on one-factor stochastic model in which the parameters are estimated using maximum likelihood based on the discretized model when the long-term trend is given by a deterministic function. In this case we must estimate both the trend function and the parameters. To estimate the long-term trend function some convolution techniques and numerical differentiation are used, whereas the normality properties of residuals resulting from discretization are used for estimating parameters by maximum likelihood and in this way it is possible to obtain closed formulas for estimators based on the observations of a path of the process and the estimation of the long-term trend.
In the estimation process there may be some bias in the estimates with respect to the actual values despite the fact that they are obtained by the method of maximum likelihood. When this situation occurs it is necessary to develop alternative methodologies to have a greater adjustment to the estimates based on the initial estimates .
This paper is organized as follows. In Section 2 a description of mean reversion processes is presented for cases when the long-term trend is given by a continued deterministic function. The first phase of the estimation technique is shown in Section 3. Section 4 presents some examples and preliminary results for the estimation technique and in Section 5 (second phase), a procedure for reestimating the parameters from additional a priori knowledge is described.
2. Mean Reversion Processes with Deterministic Long-Term Trend
Mean reversion processes of one factor with constant parameters and mean given by a deterministic function can be written aswith initial condition , where , , and are constants, is a deterministic function, and is a Unidimensional Standard Brownian Motion defined on a complete probability space .
parameter is called reversion rate, is the mean reversion level or trend of long-run equilibrium, is the parameter associated with the volatility, and determines the sensitivity of the variance to the level of .
Model (1) is a generalization of the models CKLS, Chan et al. , where the mean reversion level is a deterministic function that captures the trend of the process. In a general way, plays the role of an attractor at each point in the sense that, when the trend term and therefore decreases and when a similar argument establishes that grows.
To establish the relation between the mean reversion function and the expected value of the process , we can consider (1) written in integral form as
Taking the expected value, and thus obtain the ordinary differential equation:
The solution of this equation is .
To illustrate graphically the relation between and , we use as an example. In this case the solution to (3) is given bywhere .
Figure 1 shows the dynamic behavior of , and one path of for some specific values in the parameters.
From the Figure it can be seen that the expected value of the process mimics the long-term mean: that is just the way mean reversion works.
3. Phase 1: Parameter Estimation
Similar to the procedure established in Marín Sánchez and Palacio  and Huzurbazar  the main objective is to obtain closed formulas for the parameter estimators from discrete observations of a sample path of the process. This can be achieved from the discretized maximum likelihood function, which is obtained from the process that results from the Euler-Maruyama discretization scheme applied to (1) and using the normal and independent increments properties of the Brownian Motion over the residuals of the discrete process.
Consider the differential equation (1) with the initial value . and are, respectively, the trend and diffusion functions that depend on , , and a vector of unknown parameters . Now suppose that the conditions that guarantee the existence and uniqueness of the solution Mao  are satisfied, so the expected value of exists and can be defined using and its derivative (see (3)). In this way,
Replacing in (1) results in
Now we can assume that we have the discrete observations for the times , where , . Using the numerical scheme of Euler-Maruyama on (6) it follows thatwhere .
Define the new variable :
Note that , , are independent random variables distributed .
The variable depends on the observations of the sample path of the process and on the unobservable quantities and its derivative . Taking into account that the sample path resumes all the information that is known from the process, it is necessary to estimate the expected value from those observations. As in Tifenbach , p. 31-32, we can assume that the expected value of the process can be approximated using a convolution1 over the sample path, so the variable will depend only on the observations .
Thus, we assume that the expected value of the process can be approximated by taking a convolution of the sample path for some particular convolution function given byThe most common convolution is the moving average, where constants are given by for all , which is denoted by . Other convolutions can be defined varying the relative weight of each observation: for example, they can be defined from the coefficients of odd rows of Pascal’s triangle which is denoted by . In this case the central observations have a greater weight than more distant observations and for this reason the convolution is close to the path. Those specific cases of convolutions are presented in Figure 2.
The convolution is used for the purpose of smoothing the sample path, eliminating the noise and capturing the trend. This objective can also be achieved using other techniques like filtering. In Figure 2 the Hodrick-Prescott  filter denoted by was used to obtain an estimation of that are smoother than the convolutions and .
At this point we have that the expected value can be approximated from the sample path . Now, to get the derivative of we can apply numerical derivatives techniques like Taylor’s theorem: for example, the derivative at the observation can be defined using a three-point rule asfor . For the derivate is given by and for by . This procedure can be done using a different number of points to calculate the derivate at each point.
With the estimation of and we have one realization of and now we can define its normal density function:
Since each is independent the joint density function can be expressed as the product of their marginal densities:
Consequently, the likelihood function is given by
Taking the natural logarithm on both sides it follows that
Now consider the problem of maximizing the likelihood function given by
If we assumed that the value of is known a priori, the estimation process leads to the estimated parameter given by
The estimator is determined by the equation:
Finally, with the estimation of we can now approximate with (5).
4. Preliminary Results
The aim of this section is to show the performance of the estimators defined in the previous section. To make this we will simulate some paths of a process satisfying (1) with a set of parameters and and giving a deterministic form to the function and then find the estimators with the established procedure in order to determine—using statistical properties—if the estimates are approximate to actual parameters.
We analyze the performance of the estimators for sinusoidal and parabolic trend functions and for a set of values of and when . Each generated path consists of 250 observations with . The statistics are calculated for 1000 different simulations. Considering that for the parameters estimation it is necessary to find the expected value of the process from each path, we compare two different convolution functions and the Hodrick-Prescott filter. For numerical derivation a five-point derivation rule is used.
The values of the parameters and functions used in the simulations are presented in Table 1.
In Table 2 we present some basic statistics (mean and standard deviation ) for the estimators obtained from the simulations. The convolution functions used in the simulations are the ones defined for the weights and . weights are the same defined in Section 3 for the moving average and weights are given by for from to . In this convolution the central observation has a relative weight twice the most distant observation.
From Table 2 we can conclude that the estimation technique is working well, especially in the first example (sinusoidal trend). The relative error of the estimators is less than 2% in the case of the parameter but there is a bias in the parameter that depends on the convolution function and at the same time on the parameter used in such convolution.
In Figure 3 we can see the performance of the estimation of the deterministic function . In that figure an estimation from a random path is shown instead of the average. The estimates are close to the theoretical function in both cases but having some differences at the beginning and the final of the observations. One form to measure the fit between the theoretical function and its estimation is using the Root Mean Square error (RMS). Table 3 presents that measure for the proposed estimation in cases 1 and 2.
5. Phase 2: Re-Estimation
From the simulations we can see that the estimate of is not very accurate and that differences come from and , and in turn this affects the estimation of other parameters.
If it were possible to have all the paths of the process it would not be necessary for numerical approximation of the expected value or its derivative to capture information from , and it could be enough to perform an average of all paths at each time and that could generate pretty accurate estimations. As this condition is not the usual but only has a path, the procedure detailed above is necessary and this involves the errors mentioned in the estimates of and .
To correct this deficiency it is proposed to carry out a reestimate procedure trying to adjust a function to the initial estimate from a priori knowledge of the process. This recalculation allows a correction in the other estimators while providing a functional form that allows simulations of future paths of the process.
The following procedure is proposed to perform the reestimate of :(1)Get an estimate of according to the procedure described in Section 4.(2)Define a functional structure from a priori knowledge of the process.(3)Estimate the vector parameters minimizing the error function defined by As an example, suppose the functional form of the long-term mean is defined aswhere . In this case the error function would be given byTo estimate it is necessary to solve the system of equations defined by , which is equivalent to the system of linear equations , whereA similar procedure can be implemented under the assumption that , to find if is known. A comparison between , , and is shown in Figure 4 for both cases and for a random path of the process defined with the same set of parameters of Table 1.
Having a best estimate of allows a reestimation of the parameters and . To achieve this we must perform a similar procedure to that proposed in Section 4, defining
Table 4 summarizes the results obtained from the first estimation and the reestimation procedure for the cases where has a sinusoidal form and a parabolic form and the other parameters are defined in Table 1.
From the results of the above example we can observe how the reestimation procedure reduces the bias in the initial estimation of for both sinusoidal and parabolic cases. The final bias in is approximately 1.5% of the real value of the parameter, reducing almost 90% of the error in the first estimation in the case of sinusoidal function and near 60% for the parabolic function. The reestimations of do not significantly change from the original estimate. For the case of the RMS error is present in Table 5 for a random path of the process. In both cases the error in the reestimation is less than the 50% of the error present in the initial estimation.
6. Conclusions and Comments
This paper shows how starting from a path of a stochastic process is possible to find an approximation of the parameters that define the underlying dynamics It is important to highlight that, for modeling the dynamic behavior of prices of some commodities or interest rates, only one realization of the process in discrete time is available and it represents the only source of information to capture the nature of the process. Therefore, it becomes very important to obtain and interpret the invisible and visible information found on the path to make projections more adjusted to the future process behavior being studied.
It is essential that a model that seeks to reproduce the behavior of a process that is generated in the real world incorporates a priori information provided by an expert in the topic and that this additional information allows the model to better fit the process. This prior information is key in the development of the methodology proposed in this paper, because it allows us to adjust the initial estimates of the parameters making more reliable the results obtained with the adjusted models.
On the other hand, in the development of the proposed methodology it can be seen that it allows us to obtain closed formulas for the maximum likelihood estimators in mean reversion process when the long-term trend is defined by a deterministic function, decreasing computational effort and allowing greater understanding of procedures.
The authors declare that they have no competing interests.
- A convolution function is an even, continuous, and real valued function satisfying the fact that and . A convolution of a curve is another function defined bywhere is a convolution function.In the discrete case the convolution of two sequences can be written as
D. Pilipovic, Energy Risk: Valuing and Managing Energy Derivatives, McGraw-Hill, New York, NY, USA, 2007.
B. Tifenbach, Numerical Methods for Modeling Energy Spot Prices, University of Calgary, 2000.
J. Yu and P. Phillips, “Corrigendum to ‘a Gaussian approach for continuous time models of the shortterm interest rate’,” Econometrics Journal, vol. 14, pp. 126–129, 2011.View at: Google Scholar
X. Mao, Stochastic Differential Equation and Applications, Horwood Publishing Limited, England, UK, 1997.
R. J. Hodrick and E. C. Prescott, “Postwar U.S business cycles: an empirical investigation,” Journal of Money, Credit and Banking, vol. 29, no. 1, pp. 1–16, 1997.View at: Google Scholar