ISRN Signal Processing
Volume 2011, Article ID 138683, 10 pages
Research Article

Linear Estimation of Stationary Autoregressive Processes

1Engineering Department, Persian Gulf University, Davvas, 75169-13798 Bushehr Port, Iran
2Advanced Communication Research Center, Sharif University of Technology, P.O. Box 11356-11155, Tehran, Iran

Received 1 December 2010; Accepted 12 January 2011

Academic Editors: K. M. Prabhu and A. Tefas

Copyright © 2011 Reza Dianat and Farokh Marvasti. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Consider a sequence of an $m$th-order autoregressive (AR) stationary discrete-time process and assume that at least $m-1$ consecutive neighboring samples of an unknown sample are available. The neighbors may all lie on one side of the unknown sample or be split between its left and right sides. In this paper, we find explicit solutions for the optimal linear estimation of the unknown sample in terms of the neighbors. We write the estimation errors as linear combinations of innovation noises and calculate the corresponding mean square errors (MSE). To the best of our knowledge, no explicit solution for this problem has been published; the known solutions are implicit ones obtained through the orthogonality equations. Also, there are no explicit solutions when fewer than $m-1$ samples are available. The order of the process ($m$) and the feedback coefficients are assumed to be known.

1. Introduction

Estimation has many applications in different areas including compression and equalization [1, 2]. The linear estimation is more common due to its mathematical simplicity. The optimal linear estimation of a random variable $x$ in terms of $y_1, y_2, \ldots, y_n$ is the following linear combination:
$$\hat{x} \triangleq \hat{E}\{x \mid y_1, y_2, \ldots, y_n\} = \sum_{i=1}^{n} A_i y_i, \tag{1}$$
where the coefficients $A_i$ must be chosen to minimize the MSE $E\{(x-\hat{x})^2\}$ and $E\{\cdot\}$ stands for the expected value. To minimize the MSE, we must choose the $A_i$'s to satisfy the orthogonality principle as follows:
$$E\{(x-\hat{x})\,y_i\} = 0, \quad i = 1, 2, \ldots, n. \tag{2}$$
We also write the above condition as
$$x - \hat{x} \perp y_i, \quad i = 1, 2, \ldots, n. \tag{3}$$
Therefore, in the optimal linear estimation, we search for the coefficients such that the error is orthogonal to the data.
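As a small numerical illustration of the orthogonality principle (2), the following sketch solves the two-observation case in closed form and checks that the error is uncorrelated with each datum. The second-order statistics and the helper name `solve_2x2` are made up for this illustration and do not come from the paper.

```python
# Illustrative second-order statistics (hypothetical values, not from the paper).
R_yy = [[2.0, 1.0],   # E{y_i y_j}
        [1.0, 2.0]]
r_xy = [1.0, 1.0]     # E{x y_i}

def solve_2x2(R, r):
    """Solve R A = r for a 2x2 system by Cramer's rule."""
    det = R[0][0] * R[1][1] - R[0][1] * R[1][0]
    a1 = (r[0] * R[1][1] - R[0][1] * r[1]) / det
    a2 = (R[0][0] * r[1] - R[1][0] * r[0]) / det
    return [a1, a2]

A = solve_2x2(R_yy, r_xy)

# Orthogonality check: E{(x - sum_j A_j y_j) y_i} = r_xy[i] - sum_j R_yy[i][j] A_j
residual = [r_xy[i] - sum(R_yy[i][j] * A[j] for j in range(2)) for i in range(2)]
print(A, residual)
```

For these assumed statistics both coefficients come out equal, as expected from the symmetry of the inputs, and both residual correlations vanish up to rounding.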

A common model for many signals including image, speech, and biological signals is the AR model [1, 3–5]. This model has applications in different areas including detection [6, 7], traffic modeling [8], channel modeling [9], and forecasting [10]. An AR process is the output of an all-pole causal filter whose input is a white sequence called innovation noise [11]. We introduce another model for the process using an all-pole anticausal filter as well. The optimal linear estimation of an AR process is accomplished through the recursive solution of Yule-Walker (YW) equations using Levinson-Durbin algorithm [12]. This solution is recursive and implicit. As we will see in some cases the equation coefficients do not form a Toeplitz matrix and we cannot enjoy the complexity reduction advantage of Levinson algorithm.

To the best of our knowledge, there is no explicit solution for YW equations. Most researchers focus on estimating the model parameters from observations. When they arrive at YW equations, they stop, since they consider the solution as known through Levinson recursion. Broersen, in his method for autocorrelation function estimation from observations, points to YW equations and mainly concentrates on bias reduction in estimation using a finite set of observations [13, 14]. He does not attempt to find the solution for YW equations. Fattah et al. try to estimate the autocorrelation function of an ARMA model from noisy data; they again refer to the YW equation set and its solution using matrix inversion, and no explicit solution is proposed [15]. Xia and Kamel propose an optimization method to estimate AR model parameters from noisy data [16]. The noise is not necessarily Gaussian. The method finds a minimum for a cost function and exploits a neural-network algorithm. Again, the explicit solution of the orthogonality equations is not the goal of the paper. Hsiao proposes an algorithm to estimate the parameters of a time-varying AR system [17]. He considers the feedback coefficients of a time-varying AR process as random variables. The proposed algorithm maximizes a posteriori probabilities conditioned on the data. The recursive algorithm is compared to Monte Carlo simulation in terms of accuracy and complexity. In that paper, the aim is parameter estimation from data and not the analytic solution of orthogonality equations. In [18], a sequence of Gaussian AR vectors is considered. As the sequence elements are vectors rather than scalars, the AR model is defined by matrix feedback coefficients rather than scalar feedback coefficients. The estimation here is more complex, and some independence conditions are assumed. The method is based on convex optimization, and no exact answer can be provided. Mahmoudi and Karimi propose an LS-based method to estimate AR parameters from noisy data [19]. The method exploits YW equations, but it also does not provide the explicit solution to the equations. Another LS-based estimation method can be seen in [20].

As mentioned above, we could not find the final solution to YW orthogonality equations in the literature. In this work we derive explicit solutions of the orthogonality equations for different cases. Consider a stationary $m$th-order AR process. The order and the feedback coefficients of the process are assumed to be known, and the model parameter estimation is out of the scope of this paper. The main goal of this paper is finding the solution of the orthogonality equations. We will find the optimal linear estimation of a sample in terms of its neighbors where at least $m-1$ consecutive neighbors are available. The consecutive neighbors include the situations where all the $m-1$ or more neighbors are on one side, or some of them are left neighbors and the others are right neighbors. We will show that no more than $m$ consecutive neighbors on each side are needed. Our approach is to find orthogonal estimation errors that are linear combinations of data. We use the well-known causal LTI AR model as well as our anticausal model to form orthogonal errors. The errors are formed as linear combinations of causal and anticausal innovation (process) noises. Beginning from suitable errors that are both orthogonal to the data and linear combinations of the data, we arrive at linear estimations. We follow an LTI system approach rather than trying to solve the orthogonality equations directly. The results of this paper for different cases can be important in situations where the equation matrices are ill-conditioned and the matrix inversion and other recursive algorithms become unstable.

This paper is organized as follows. In Section 2, the causal model is reviewed and the anticausal model is introduced. In Section 3, we review the forward prediction problem. We state the problem symmetries in Section 4. We see how we can use the similarities between two problems to exploit the solution of one problem to find the solution of the other problem. In Section 5, we extract a number of relations for cross-correlation functions that will be used later. We find the interpolation formulae when infinite data are available in Section 6. We find the prediction and interpolation with finite data in Sections 7 and 8, respectively. In Section 9, we present a detailed example to show that our relations and the matrix solution of the orthogonality principle result in the same coefficients. Finally, we conclude the work in Section 10.

2. Causal and Anticausal Models

A discrete-time stationary AR process $s_n$ of order $m$ is modeled as follows:
$$s_n + a_1 s_{n-1} + a_2 s_{n-2} + \cdots + a_m s_{n-m} = I_n, \quad n \in \mathbb{Z}. \tag{4}$$
The above equation is meant for a causal LTI system. $I_n$, the input of the system, is called the innovation noise and is a stationary white sequence with zero expected value, that is, $E\{I_n I_k\} = \sigma^2 \delta[n-k]$ and $E\{I_n\} = 0$, where $\sigma$ is a positive constant, $\delta[0] = 1$, and $\delta[i] = 0$ elsewhere. The system is causal. Therefore $s_n$, the output of the system at time index $n$, is a linear combination of the inputs at time index $n$ and before. So, we can write
$$s_n = h_0 I_n + h_1 I_{n-1} + h_2 I_{n-2} + \cdots = \sum_{i=0}^{\infty} h_i I_{n-i}. \tag{5}$$
In the above equation, $h_n$ is the impulse response of the system. Assuming the causal system model, we have $h_n = 0$ for $n < 0$. From the whiteness of the sequence $\{I_n\}$ and from (5), we get the following result:
$$I_{n+k} \perp s_n, \quad k > 0, \; n \in \mathbb{Z}. \tag{6}$$

Figure 1 is the causal model of the AR process. $H(z)$ is the $Z$-transform of $h_n$, which is defined as
$$H(z) = \sum_{k=-\infty}^{\infty} h_k z^{-k}. \tag{7}$$
For the system defined by (4), we have
$$H(z) = \frac{1}{A(z)} = \frac{1}{1 + a_1 z^{-1} + \cdots + a_m z^{-m}}. \tag{8}$$

Figure 1: The causal model.

Assuming a stable causal system, we conclude that the roots of $A(z) = 0$ must be inside the unit circle $|z| = 1$. The power spectral density function (PSDF) of a process is the $Z$-transform of its autocorrelation function. The PSDF of $s_n$ is [11]
$$S_s(z) = S_I(z) H(z) H(z^{-1}) = \sigma^2 H(z) H(z^{-1}). \tag{9}$$
In the above equation $S_s(z)$ is the PSDF of $s_n$ and $S_I(z)$ is the PSDF of $I_n$.

We now present the anticausal model. If we apply the sequence $s_n$ to an LTI system with the transfer function $H^{-1}(z^{-1})$, we get another innovation noise called $I'_n$. Figure 2 demonstrates the generation of the new innovation noise. To see the whiteness of the sequence $I'_n$, note that the PSDF of $I'_n$, by using Figure 2 and (9), is as follows:
$$S_{I'}(z) = S_s(z) H^{-1}(z^{-1}) H^{-1}(z) = \sigma^2 H(z) H(z^{-1}) H^{-1}(z^{-1}) H^{-1}(z) = \sigma^2. \tag{10}$$

Figure 2: Generation of another innovation noise.

Equivalently we can apply $I'_n$ to the inverse system with the transfer function $H'(z) = H(z^{-1})$ to get $s_n$. The generation of $s_n$ from $I'_n$ is depicted in Figure 3.

Figure 3: Anticausal model.

We have
$$H'(z) = H(z^{-1}) = \frac{1}{A(z^{-1})}. \tag{11}$$
Therefore $h'_n = h_{-n}$. Noting that $h_n = 0$ for $n < 0$, we see that $h'_n = 0$ for $n > 0$. Also note that the roots of $A(z^{-1}) = 0$ are outside the unit circle, as we had the roots of $A(z) = 0$ inside the unit circle. Regarding these points, we know that the system with the transfer function $H'(z)$ is stable and anticausal. We have
$$H'(z) = \frac{1}{1 + a_1 z + a_2 z^2 + \cdots + a_m z^m}. \tag{12}$$

Using the above equation and Figure 3, we get
$$s_n + a_1 s_{n+1} + a_2 s_{n+2} + \cdots + a_m s_{n+m} = I'_n, \quad n \in \mathbb{Z}. \tag{13}$$

Also, note that
$$s_n = \sum_{i=-\infty}^{\infty} h'_i I'_{n-i} = h'_0 I'_n + h'_{-1} I'_{n+1} + h'_{-2} I'_{n+2} + \cdots = h_0 I'_n + h_1 I'_{n+1} + \cdots = \sum_{i=0}^{\infty} h_i I'_{n+i}. \tag{14}$$

From (14) and Figure 3, we see that $s_n$ is a linear combination of $I'_n$ and the inputs after that. The whiteness of the sequence $\{I'_n\}$ then gives
$$I'_{n-k} \perp s_n, \quad n \in \mathbb{Z}, \; k > 0. \tag{15}$$

3. Forward Prediction

Forward prediction can be accomplished by using the whitening filter [11]. The data are whitened, and we use the equivalent white data to achieve the prediction. As an example, consider the 1-step forward prediction of $s_n$. It is seen that $s_n$ is estimated as
$$\hat{s}_n = \hat{E}\{s_n \mid s_{n-k},\, k > 0\} = -\sum_{k=1}^{m} a_k s_{n-k} = -a_1 s_{n-1} - a_2 s_{n-2} - \cdots - a_m s_{n-m}. \tag{16}$$
It can be seen from (4) that the error $s_n - \hat{s}_n$ is equal to $I_n$ and therefore, from (6), it is orthogonal to $s_{n-k}$ for $k > 0$. This proves the optimality of (16).

The 2-step prediction can be done as [11]
$$\hat{s}_n = \hat{E}\{s_n \mid s_{n-k},\, k \ge 2\} = -a_1 \hat{s}_{n-1} - \sum_{i=2}^{m} a_i s_{n-i}. \tag{17}$$
In the above equation, $\hat{s}_{n-1}$ is the prediction of $s_{n-1}$ from its previous data (1-step prediction) and is obtained by replacing $n$ by $n-1$ in (16):
$$\hat{s}_{n-1} = \hat{E}\{s_{n-1} \mid s_{n-k},\, k \ge 2\} = -\sum_{k=1}^{m} a_k s_{n-k-1} = -a_1 s_{n-2} - a_2 s_{n-3} - \cdots - a_m s_{n-m-1}. \tag{18}$$
From (17), (18), and (4), the estimation error is
$$e_n = s_n + \sum_{i=2}^{m} a_i s_{n-i} - a_1 \sum_{k=1}^{m} a_k s_{n-k-1} = I_n - a_1 s_{n-1} - a_1 \sum_{k=1}^{m} a_k s_{n-k-1} = I_n - a_1 I_{n-1}. \tag{19}$$

From (6), it is clear that $I_n$ and $I_{n-1}$ are orthogonal to $s_{n-k}$ for $k \ge 2$. This proves the optimality of (17).

The higher-order predictions can be obtained in the same manner. As the final example of this section, consider the 3-step forward prediction that is accomplished as follows:
$$\hat{s}_n = \hat{E}\{s_n \mid s_{n-k},\, k \ge 3\} = -a_1 \hat{s}_{n-1} - a_2 \hat{s}_{n-2} - \sum_{k=3}^{m} a_k s_{n-k}. \tag{20}$$

In the above equation, $\hat{s}_{n-1}$ and $\hat{s}_{n-2}$ are the 2-step and 1-step predictions of $s_{n-1}$ and $s_{n-2}$, respectively, and are obtained from (17) and (16). The error is
$$e_n = s_n + a_1 \hat{s}_{n-1} + a_2 \hat{s}_{n-2} + \sum_{i=3}^{m} a_i s_{n-i} = I_n - a_1 s_{n-1} - a_2 s_{n-2} + a_1 \hat{s}_{n-1} + a_2 \hat{s}_{n-2} = I_n - a_1 (s_{n-1} - \hat{s}_{n-1}) - a_2 (s_{n-2} - \hat{s}_{n-2}) = I_n - a_1 (I_{n-1} - a_1 I_{n-2}) - a_2 I_{n-2}. \tag{21}$$
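The error expressions (19) and (21) can be checked against the impulse response of the causal model: the coefficients of $I_{n-1}$ and $I_{n-2}$ coincide with $h_1$ and $h_2$ of (5). The sketch below is only an illustration. The helper name `impulse_response` is hypothetical, the coefficient values are borrowed from the paper's own example in Section 9, and the MSE lines assume unit innovation variance.

```python
def impulse_response(a, length):
    """h_n of the causal model (4)-(5): h_n = delta[n] - sum_k a_k h_{n-k}."""
    h = []
    for n in range(length):
        value = 1.0 if n == 0 else 0.0
        for k, a_k in enumerate(a, start=1):
            if n - k >= 0:
                value -= a_k * h[n - k]
        h.append(value)
    return h

a = [0.8, 0.3, -0.1]           # a_1, a_2, a_3 (values of the Section 9 example)
h = impulse_response(a, 3)

# Error coefficients read off from (19) and (21).
two_step = [1.0, -a[0]]                      # I_n - a_1 I_{n-1}
three_step = [1.0, -a[0], a[0] ** 2 - a[1]]  # I_n - a_1 I_{n-1} + (a_1^2 - a_2) I_{n-2}

# With white innovations of variance sigma^2, the MSE is sigma^2 times the sum of squares.
sigma2 = 1.0  # assumed unit variance
mse_two_step = sigma2 * sum(c * c for c in two_step)
mse_three_step = sigma2 * sum(c * c for c in three_step)
print(h, mse_two_step, mse_three_step)
```

Because the innovations are white, each additional prediction step adds one squared impulse-response term to the MSE.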

4. The Problem Symmetries

Consider the following linear interpolation of $s_n$ from the data around it:
$$\hat{s}_n = \hat{E}\{s_n \mid s_{n-k_1}, s_{n-k_1+1}, \ldots, s_{n-1}, s_{n+1}, \ldots, s_{n+k_2}\} = a'_{-k_1} s_{n-k_1} + a'_{-k_1+1} s_{n-k_1+1} + \cdots + a'_{k_2} s_{n+k_2}. \tag{22}$$
The orthogonality principle gives
$$E\{(s_n - a'_{-k_1} s_{n-k_1} - a'_{-k_1+1} s_{n-k_1+1} - \cdots - a'_{k_2} s_{n+k_2})\, s_{n+i}\} = 0, \quad i = -k_1, -k_1+1, \ldots, k_2, \; i \ne 0. \tag{23}$$
The above equations become
$$R_s[i+k_1]\, a'_{-k_1} + R_s[i+k_1-1]\, a'_{-k_1+1} + \cdots + R_s[i-k_2]\, a'_{k_2} = R_s[i], \quad i = -k_1, -k_1+1, \ldots, k_2, \; i \ne 0. \tag{24}$$
In the above equations, $R_s[i] = E\{s_n s_{n-i}\}$.

Now, consider the following estimation:
$$\hat{s}_n = \hat{E}\{s_n \mid s_{n-k_2}, s_{n-k_2+1}, \ldots, s_{n-1}, s_{n+1}, \ldots, s_{n+k_1}\} = a''_{k_1} s_{n+k_1} + a''_{k_1-1} s_{n+k_1-1} + \cdots + a''_{-k_2} s_{n-k_2}. \tag{25}$$
The orthogonality of the error to the data gives
$$E\{(s_n - a''_{k_1} s_{n+k_1} - a''_{k_1-1} s_{n+k_1-1} - \cdots - a''_{-k_2} s_{n-k_2})\, s_{n+i}\} = 0, \quad i = k_1, k_1-1, \ldots, -k_2, \; i \ne 0. \tag{26}$$
They become
$$R_s[i-k_1]\, a''_{k_1} + R_s[i-k_1+1]\, a''_{k_1-1} + \cdots + R_s[i+k_2]\, a''_{-k_2} = R_s[i], \quad i = k_1, k_1-1, \ldots, -k_2, \; i \ne 0. \tag{27}$$
Since $R_s[\cdot]$ is an even function, we notice that the set of equations (24) and the set of equations (27) are exactly the same. Therefore,
$$a'_{-k_1} = a''_{k_1}, \quad a'_{-k_1+1} = a''_{k_1-1}, \quad \ldots, \quad a'_{k_2} = a''_{-k_2}. \tag{28}$$

As an example, consider the following backward prediction:
$$\hat{s}_n = \hat{E}\{s_n \mid s_{n+k},\, k > 0\}. \tag{29}$$
Using (16) and the symmetry, we get
$$\hat{s}_n = \hat{E}\{s_n \mid s_{n+k},\, k > 0\} = -\sum_{k=1}^{m} a_k s_{n+k} = -a_1 s_{n+1} - a_2 s_{n+2} - \cdots - a_m s_{n+m}. \tag{30}$$
The validity of the solution can also be confirmed, since from (13) it is seen that the estimation error is
$$s_n + a_1 s_{n+1} + a_2 s_{n+2} + \cdots + a_m s_{n+m} = I'_n. \tag{31}$$
Using (15), it is clear that the error is orthogonal to the data. This proves the optimality of (30).

5. Cross-Correlation Functions

In this section, we derive a number of properties for the cross-correlations between innovation noises and the AR process. We will exploit these properties to prove our solutions.

We define $R_{sI}[k] = E\{s_n I_{n-k}\}$ and $R_{I's}[k] = E\{I'_n s_{n-k}\}$. The first simple property follows from (6) and (15) as follows:
$$R_{sI}[k] = R_{I's}[k] = 0, \quad k < 0. \tag{32}$$
Now, consider Figure 1. In this figure $I_n$ is the input and $s_n$ is the output. The impulse response of the system is $h_n \triangleq h[n]$. Therefore, we have [11]
$$R_{sI}[k] = R_I[k] * h[k] = \sigma^2 \delta[k] * h[k] = \sigma^2 h[k]. \tag{33}$$
In this equation, $R_I[k] = E\{I_n I_{n-k}\}$ and the "$*$" operator is the discrete convolution. Taking the $Z$-transform of both sides of (33) and using (8), we get
$$S_{sI}(z) = \sigma^2 H(z) = \frac{\sigma^2}{1 + a_1 z^{-1} + \cdots + a_m z^{-m}}, \tag{34}$$
or equivalently
$$S_{sI}(z) \left(1 + a_1 z^{-1} + \cdots + a_m z^{-m}\right) = \sigma^2. \tag{35}$$
Taking the inverse $Z$-transform of this equation, we have
$$R_{sI}[k] + a_1 R_{sI}[k-1] + \cdots + a_m R_{sI}[k-m] = \sigma^2 \delta[k]. \tag{36}$$
The right side of (36) is zero for $k \ne 0$.

Referring to Figure 3, we have [11]
$$R_{I's}[k] = R_{I'}[k] * h'[-k] = \sigma^2 \delta[k] * h'[-k] = \sigma^2 h'[-k] = \sigma^2 h[k]. \tag{37}$$
Again, we conclude that
$$R_{I's}[k] + a_1 R_{I's}[k-1] + \cdots + a_m R_{I's}[k-m] = \sigma^2 \delta[k]. \tag{38}$$

6. Interpolation Using an Infinite Set of Data

In this section, we assume that an infinite number of data are available. However, we will see that only a finite number of them actually enter the estimate.

6.1. Infinite Data on the Left Side

We want to obtain the following estimation:
$$\hat{s}_n = \hat{E}\{s_n \mid s_{n+i},\, i \le k_1,\, i \ne 0\}. \tag{39}$$

$k_1$ is a positive integer constant not greater than $m$. There are $k_1$ data available on the right side of $s_n$ and infinite data on the left side. Define $a_0 \triangleq 1$. We are going to prove the following:
$$\hat{s}_n = \hat{E}\{s_n \mid s_{n+i},\, i \le k_1,\, i \ne 0\} = -\frac{1}{\sum_{k=0}^{k_1} a_k^2} \left( \sum_{k=1}^{k_1} \left( \sum_{p=0}^{k_1-k} a_p a_{p+k} \right) s_{n+k} + \sum_{k=1}^{m-k_1} \left( \sum_{p=0}^{k_1} a_p a_{p+k} \right) s_{n-k} + \sum_{k=m-k_1+1}^{m} \left( \sum_{p=0}^{m-k} a_p a_{p+k} \right) s_{n-k} \right). \tag{40}$$
Observe from (40) that although there are infinite data on the left side of $s_n$, only the $m$ data $s_{n-1}$ to $s_{n-m}$ participate in the estimation. Indeed, (40) is the optimal linear estimation solution for $\hat{s}_n = \hat{E}\{s_n \mid s_{n+i},\, -k_2 \le i \le k_1,\, i \ne 0\}$, where $k_2$ can be any integer greater than or equal to $m$.

To prove the optimality of (40), we must show that the estimation error is orthogonal to the data. Firstly, the estimation error can be calculated by inserting $\hat{s}_n$ from (40) in $e_n = s_n - \hat{s}_n$. Secondly, by expanding the innovation noises using (4), we can confirm that
$$e_n = s_n - \hat{s}_n = \frac{1}{\sum_{k=0}^{k_1} a_k^2} \left( I_n + a_1 I_{n+1} + \cdots + a_{k_1} I_{n+k_1} \right). \tag{41}$$
Indeed, we have obtained (40) from (41). The motivation is that the estimation error has to satisfy two essential conditions: (1) it must be orthogonal to the data and (2) it must be a linear combination of only the data and the variable to be estimated. It remains to prove that the right side of (41) is orthogonal to the data.

Using (6), it is quite clear that $I_n$ to $I_{n+k_1}$ are orthogonal to $s_{n-k}$ for $k > 0$, and so is $e_n$ in (41). Further, we have
$$E\{s_{n+i} (I_n + a_1 I_{n+1} + \cdots + a_{k_1} I_{n+k_1})\} = R_{sI}[i] + a_1 R_{sI}[i-1] + \cdots + a_{k_1} R_{sI}[i-k_1] = R_{sI}[i] + a_1 R_{sI}[i-1] + \cdots + a_i R_{sI}[0], \quad 1 \le i \le k_1. \tag{42}$$
The last equality of (42) is justified as we have $R_{sI}[k] = 0$ for $k < 0$ from (32). Using (32), (36), (41), and (42) it is seen that
$$E\{(I_n + a_1 I_{n+1} + \cdots + a_{k_1} I_{n+k_1})\, s_{n+i}\} = 0, \quad 1 \le i \le k_1. \tag{43}$$
This completes the proof.

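The MSE is
$$E\{e_n^2\} = \frac{1}{\left(\sum_{k=0}^{k_1} a_k^2\right)^2} \cdot E\{(I_n + a_1 I_{n+1} + \cdots + a_{k_1} I_{n+k_1})^2\} = \frac{1}{\left(\sum_{k=0}^{k_1} a_k^2\right)^2} \cdot \left( E\{I_n^2\} + a_1^2 E\{I_{n+1}^2\} + \cdots + a_{k_1}^2 E\{I_{n+k_1}^2\} \right) = \frac{1}{\left(\sum_{k=0}^{k_1} a_k^2\right)^2} \cdot \left( \sigma^2 + a_1^2 \sigma^2 + \cdots + a_{k_1}^2 \sigma^2 \right). \tag{44}$$
Since $a_0 = 1$, the last factor equals $\sigma^2 \sum_{k=0}^{k_1} a_k^2$, so that
$$E\{e_n^2\} = \frac{\sigma^2}{\sum_{k=0}^{k_1} a_k^2}. \tag{45}$$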
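The weights in (40) are simple sums of coefficient products, so they are easy to tabulate. The following sketch is a direct transcription under stated assumptions: the function name `interp_infinite_left` and its return layout are hypothetical, the MSE line uses the simplification of (44) with $a_0 = 1$, and the sample call uses the coefficients of the paper's Section 9 example with $k_1 = 1$.

```python
def interp_infinite_left(a, k1, sigma2=1.0):
    """Weights of (40) and the resulting MSE for an AR(m) process.

    a  : feedback coefficients [a_1, ..., a_m] of (4)
    k1 : number of samples available to the right of s_n (1 <= k1 <= m)
    Returns (right, left, mse); right[k] multiplies s_{n+k}, left[k] multiplies s_{n-k}.
    """
    a = [1.0] + list(a)          # prepend a_0 = 1
    m = len(a) - 1
    denom = sum(a[k] ** 2 for k in range(k1 + 1))
    right = {k: -sum(a[p] * a[p + k] for p in range(k1 - k + 1)) / denom
             for k in range(1, k1 + 1)}
    # The inner sum runs to k1 for k <= m - k1 and to m - k otherwise.
    left = {k: -sum(a[p] * a[p + k] for p in range(min(k1, m - k) + 1)) / denom
            for k in range(1, m + 1)}
    mse = sigma2 / denom         # (44) simplified: sigma^2 * denom / denom^2
    return right, left, mse

right, left, mse = interp_infinite_left([0.8, 0.3, -0.1], k1=1)
print(right, left, mse)
```

Merging the second and third sums of (40) through `min(k1, m - k)` is a design choice of this sketch; it keeps the two index ranges in one expression.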


6.2. Infinite Data on the Right Side

By symmetry, and replacing $s_{n-k}$ by $s_{n+k}$ and vice versa in (40), the following estimation is derived:
$$\hat{s}_n = \hat{E}\{s_n \mid s_{n-i},\, i \le k_1,\, i \ne 0\} = -\frac{1}{\sum_{k=0}^{k_1} a_k^2} \left( \sum_{k=1}^{k_1} \left( \sum_{p=0}^{k_1-k} a_p a_{p+k} \right) s_{n-k} + \sum_{k=1}^{m-k_1} \left( \sum_{p=0}^{k_1} a_p a_{p+k} \right) s_{n+k} + \sum_{k=m-k_1+1}^{m} \left( \sum_{p=0}^{m-k} a_p a_{p+k} \right) s_{n+k} \right). \tag{46}$$

Again, only the $m$ data $s_{n+1}$ to $s_{n+m}$ on the right side of $s_n$ participate in the interpolation, and the data after them are not needed. Therefore, (46) is the solution for all the optimal linear interpolations $\hat{s}_n = \hat{E}\{s_n \mid s_{n-i},\, -k_2 \le i \le k_1,\, i \ne 0\}$, where $k_2$ can be any integer greater than or equal to $m$.

The validity of (46) can also be proved as follows. The error is calculated as $e_n = s_n - \hat{s}_n$, where $\hat{s}_n$ is from (46). By expanding the innovation noises from (13), it can be verified that
$$e_n = s_n - \hat{s}_n = \frac{1}{\sum_{k=0}^{k_1} a_k^2} \left( I'_n + a_1 I'_{n-1} + \cdots + a_{k_1} I'_{n-k_1} \right). \tag{47}$$
Using (15), it is quite clear that $I'_{n-k_1}$ to $I'_n$ are orthogonal to $s_{n+k}$ for $k > 0$, and so is $e_n$ in (47). Further, we have
$$E\{s_{n-i} (I'_n + a_1 I'_{n-1} + \cdots + a_{k_1} I'_{n-k_1})\} = R_{I's}[i] + a_1 R_{I's}[i-1] + \cdots + a_{k_1} R_{I's}[i-k_1] = R_{I's}[i] + a_1 R_{I's}[i-1] + \cdots + a_i R_{I's}[0], \quad 1 \le i \le k_1. \tag{48}$$
The last equality of (48) is justified as we have $R_{I's}[k] = 0$ for $k < 0$ from (32). Using (32), (38), (47), and (48), it is seen that $E\{e_n s_{n-i}\} = 0$ for $1 \le i \le k_1$. This completes the proof.

The MSE is the same as in (45).

6.3. Infinite Data on Both Sides

Now, we want to estimate $s_n$ from all the data around it. We will see that only $m$ data on each side are needed and, as expected, the data at the same distance from $s_n$ participate with the same weight. We have
$$\hat{s}_n = \hat{E}\{s_n \mid s_{n-i},\, i \ne 0\} = -\frac{1}{\sum_{k=0}^{m} a_k^2} \cdot \left( \sum_{k=1}^{m} \left( \sum_{p=0}^{m-k} a_p a_{p+k} \right) (s_{n-k} + s_{n+k}) \right). \tag{49}$$
This estimation can also be obtained by letting $k_1 = m$ in (40) or (46). Again, note that (49) is the optimal solution for the problems $\hat{s}_n = \hat{E}\{s_n \mid s_{n-i},\, i \ne 0,\, -k_1 \le i \le k_2\}$, where $k_1$ and $k_2$ can be any integers greater than or equal to $m$.

The validity of (49) can also be proved as follows. The error is calculated as $e_n = s_n - \hat{s}_n$, where $\hat{s}_n$ is from (49). By expanding the innovation noises from (4), it can be verified that
$$e_n = s_n - \hat{s}_n = \frac{1}{\sum_{k=0}^{m} a_k^2} \left( I_n + a_1 I_{n+1} + \cdots + a_m I_{n+m} \right). \tag{50}$$
Using (6) it is quite clear that $I_n$ to $I_{n+m}$ are orthogonal to $s_{n-k}$ for $k > 0$, and so is $e_n$ in (50). Further, we have
$$E\{s_{n+i} (I_n + a_1 I_{n+1} + \cdots + a_m I_{n+m})\} = R_{sI}[i] + a_1 R_{sI}[i-1] + \cdots + a_m R_{sI}[i-m], \quad i > 0. \tag{51}$$
Using (32), (36), (50), and (51), it is seen that $E\{e_n s_{n+i}\} = 0$ for $i > 0$. This completes the proof.

The MSE is
$$E\{e_n^2\} = \frac{1}{\left(\sum_{k=0}^{m} a_k^2\right)^2} \cdot E\{(I_n + a_1 I_{n+1} + \cdots + a_m I_{n+m})^2\} = \frac{\sigma^2}{\sum_{k=0}^{m} a_k^2}. \tag{52}$$
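The two-sided result (49) can be checked numerically against the orthogonality principle. The sketch below is an illustration only: the helper names are hypothetical, the autocorrelation is approximated from a truncated impulse response with unit innovation variance, and the coefficient values are those of the paper's Section 9 example.

```python
def impulse_response(a, length):
    """h_n of the causal model: h_n = delta[n] - sum_k a_k h_{n-k}."""
    h = []
    for n in range(length):
        value = 1.0 if n == 0 else 0.0
        for k, a_k in enumerate(a, start=1):
            if n - k >= 0:
                value -= a_k * h[n - k]
        h.append(value)
    return h

def autocorrelation(a, max_lag, sigma2=1.0, terms=400):
    """r_k = sigma^2 * sum_n h_n h_{n+k}, truncated after `terms` samples."""
    h = impulse_response(a, terms + max_lag)
    return [sigma2 * sum(h[n] * h[n + k] for n in range(terms))
            for k in range(max_lag + 1)]

def two_sided_weights(a):
    """Weight of (s_{n-k} + s_{n+k}) in (49), for k = 1..m."""
    a = [1.0] + list(a)          # prepend a_0 = 1
    m = len(a) - 1
    denom = sum(c * c for c in a)
    return {k: -sum(a[p] * a[p + k] for p in range(m - k + 1)) / denom
            for k in range(1, m + 1)}

a = [0.8, 0.3, -0.1]
m = len(a)
w = two_sided_weights(a)
r = autocorrelation(a, 2 * m)

# Orthogonality residuals E{e_n s_{n+i}} for i = 1..m (i < 0 follows by symmetry).
residuals = [r[i] - sum(w[k] * (r[abs(i - k)] + r[i + k]) for k in range(1, m + 1))
             for i in range(1, m + 1)]
mse = 1.0 / sum(c * c for c in [1.0] + a)    # (52) with sigma^2 = 1
print(w, residuals, mse)
```

The residuals should vanish up to rounding, which is the numerical counterpart of the statement that (50) is orthogonal to the data.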

7. Prediction with Finite Data

Assume that only $m-1$ consecutive data $s_{n-1}$ to $s_{n-m+1}$ are available. We want to prove the following:
$$\hat{s}_n = \hat{E}\{s_n \mid s_{n-k},\, 1 \le k \le m-1\} = -\frac{1}{1 - a_m^2} \sum_{k=1}^{m-1} (a_k - a_m a_{m-k})\, s_{n-k}. \tag{53}$$

The above estimation can be obtained as follows. Since $s_{n-m}$ is not available, we can estimate it from the data $s_{n-1}$ to $s_{n-m+1}$. The estimated value can then be used to predict $s_n$ using (16):
$$\hat{s}_n = \hat{E}\{s_n \mid s_{n-k},\, 1 \le k \le m-1\} = -\sum_{k=1}^{m-1} a_k s_{n-k} - a_m \hat{s}_{n-m} = -a_1 s_{n-1} - a_2 s_{n-2} - \cdots - a_{m-1} s_{n-m+1} - a_m \hat{s}_{n-m}. \tag{54}$$
On the other hand, $s_{n-m}$ can be backward predicted using (30) as
$$\hat{s}_{n-m} = \hat{E}\{s_{n-m} \mid s_{n-k},\, 1 \le k \le m-1\} = -a_1 s_{n-m+1} - a_2 s_{n-m+2} - \cdots - a_{m-1} s_{n-1} - a_m \hat{s}_n. \tag{55}$$
Now we have two equations, (54) and (55), with two unknowns, $\hat{s}_n$ and $\hat{s}_{n-m}$. Solving these equations, we get (53). The optimality of (53) can also be proved by seeing that the estimation error is equal to
$$e_n = \frac{I_n - a_m I'_{n-m}}{1 - a_m^2}. \tag{56}$$
To derive the above equation, we have used (4) and (13). It is easily seen from (6) and (15) that $I_n$ and $I'_{n-m}$ are orthogonal to the data $s_{n-1}$ to $s_{n-m+1}$. This proves the optimality of (53). To calculate the MSE, we note that
$$E\{e_n^2\} = E\{e_n (s_n - \hat{s}_n)\} = E\{e_n s_n\}. \tag{57}$$
The last equality is justified, as the error is orthogonal to the data and to the estimation, which is a linear combination of the data. Inserting (56) in (57), we get
$$E\{e_n s_n\} = \frac{1}{1 - a_m^2} \cdot E\{(I_n - a_m I'_{n-m})\, s_n\} = \frac{1}{1 - a_m^2} \cdot \left( R_{sI}[0] - a_m R_{I's}[-m] \right). \tag{58}$$
Finally, using (58), (32), and (36), we have
$$E\{e_n^2\} = \frac{\sigma^2}{1 - a_m^2}. \tag{59}$$
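The closed form (53) and its MSE (59) translate into a few lines of code. This is a minimal sketch, assuming zero-based Python lists for the coefficients; the function name `predict_from_m_minus_1` is hypothetical and the sample call uses the coefficients of the paper's Section 9 example.

```python
def predict_from_m_minus_1(a, sigma2=1.0):
    """Weights of (53) and the MSE (59).

    a : feedback coefficients [a_1, ..., a_m] of (4); a_m^2 must differ from 1
    Returns (weights, mse); weights[k] multiplies s_{n-k} for k = 1..m-1.
    """
    m = len(a)
    a_m = a[-1]
    scale = 1.0 - a_m ** 2
    # a[k - 1] is a_k and a[m - k - 1] is a_{m-k}
    weights = {k: -(a[k - 1] - a_m * a[m - k - 1]) / scale for k in range(1, m)}
    return weights, sigma2 / scale

weights, mse = predict_from_m_minus_1([0.8, 0.3, -0.1])
print(weights, mse)
```

Compared with the full one-step predictor (16), whose MSE is $\sigma^2$, dropping $s_{n-m}$ inflates the MSE by the factor $1/(1 - a_m^2)$.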

Higher-order predictions with $m-1$ data can be obtained from (53). As an example, we have
$$\hat{s}_n = \hat{E}\{s_n \mid s_{n-k},\, 2 \le k \le m\} = -a_1 \hat{s}_{n-1} - \sum_{k=2}^{m} a_k s_{n-k}, \tag{60}$$
where $\hat{s}_{n-1}$ is derived by replacing $n$ by $n-1$ in (53).

We could not derive a simple general form for the estimation with fewer than $m-1$ data.

8. Interpolation with Finite Data

We now derive the linear interpolation with fewer than $m$ data on each side. More precisely, we claim that
$$\hat{s}_n = \hat{E}\{s_n \mid s_{n+k},\, -k_2 \le k \le k_1,\, k \ne 0\} = -\frac{1}{\sum_{k=0}^{k_1} a_k^2 - \sum_{k=k_2+1}^{m} a_k^2} \left( \sum_{k=m-k_2}^{k_1} \left( \sum_{p=0}^{k_1-k} a_p a_{p+k} \right) s_{n+k} + \sum_{k=1}^{m-k_2-1} \left( \sum_{p=0}^{k_1-k} a_p a_{p+k} - \sum_{p=k_2+1}^{m-k} a_p a_{p+k} \right) s_{n+k} + \sum_{k=1}^{m-k_1-1} \left( \sum_{p=0}^{k_1} a_p a_{p+k} - \sum_{p=k_2-k+1}^{m-k} a_p a_{p+k} \right) s_{n-k} + \sum_{k=m-k_1}^{k_2} \left( \sum_{p=0}^{k_2-k} a_p a_{p+k} \right) s_{n-k} \right). \tag{61}$$
In (61) we must have $k_1 + k_2 \ge m-1$ and $k_1 \le k_2 \le m-1$. This means that the farthest datum on the right side is no farther from $s_n$ than the farthest datum on the left side. The optimality of (61) follows because, from (61), (4), and (13), the estimation error is
$$e_n = \frac{1}{\sum_{k=0}^{k_1} a_k^2 - \sum_{k=k_2+1}^{m} a_k^2} \cdot \left( I_n + a_1 I_{n+1} + \cdots + a_{k_1} I_{n+k_1} - a_m I'_{n-m} - a_{m-1} I'_{n-m+1} - \cdots - a_{k_2+1} I'_{n-k_2-1} \right). \tag{62}$$
It remains to prove that (62) is orthogonal to the data.
(1) It is clear from (6) and (15) that $I_n$ to $I_{n+k_1}$ and $I'_{n-m}$ to $I'_{n-k_2-1}$ are orthogonal to the data $s_{n-1}$ to $s_{n-k_2}$. Therefore the error in (62) is orthogonal to $s_{n-k}$ for $1 \le k \le k_2$.
(2) Further, from (43) and since $I'_{n-m}$ to $I'_{n-k_2-1}$ are orthogonal to the data $s_{n+1}$ to $s_{n+k_1}$ according to (15), we see that the error in (62) is orthogonal to $s_{n+k}$ for $1 \le k \le k_1$.

Therefore the error is orthogonal to the data and the proof is completed.
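Since (61) was obtained from the error expression (62), one way to compute the weights is to expand (62) with (4) and (13) and collect the coefficient of each sample. The sketch below follows that route instead of transcribing the four sums of (61); the function name `interp_finite` and the `leftover` diagnostic are hypothetical, and the sample calls use the coefficients of the paper's Section 9 example.

```python
def interp_finite(a, k1, k2):
    """Weights for estimating s_n from s_{n-k2..n-1} and s_{n+1..n+k1}.

    a = [a_1, ..., a_m]; requires k1 <= k2 <= m - 1 and k1 + k2 >= m - 1.
    Returns (weights, leftover): weights[o] multiplies s_{n+o}; leftover is the
    largest coefficient magnitude left on samples outside the data window.
    """
    a = [1.0] + list(a)              # prepend a_0 = 1
    m = len(a) - 1
    if not (k1 <= k2 <= m - 1 and k1 + k2 >= m - 1):
        raise ValueError("need k1 <= k2 <= m - 1 and k1 + k2 >= m - 1")
    coeff = {}                       # coefficient of s_{n+o} in (denominator * e_n)
    for j in range(k1 + 1):          # + a_j I_{n+j}, with I_{n+j} expanded by (4)
        for q in range(m + 1):
            coeff[j - q] = coeff.get(j - q, 0.0) + a[j] * a[q]
    for j in range(k2 + 1, m + 1):   # - a_j I'_{n-j}, with I'_{n-j} expanded by (13)
        for q in range(m + 1):
            coeff[q - j] = coeff.get(q - j, 0.0) - a[j] * a[q]
    denom = coeff[0]                 # sum_{k<=k1} a_k^2 - sum_{k>k2} a_k^2
    weights = {o: -coeff[o] / denom for o in range(-k2, k1 + 1) if o != 0}
    leftover = max((abs(c) for o, c in coeff.items() if o < -k2 or o > k1),
                   default=0.0)
    return weights, leftover

a = [0.8, 0.3, -0.1]
w_asym, left_asym = interp_finite(a, k1=1, k2=2)   # one right and two left neighbors
w_sym, left_sym = interp_finite(a, k1=1, k2=1)     # one neighbor on each side
print(w_asym, w_sym)
```

A vanishing `leftover` confirms that the error depends only on $s_n$ and the available data, which is the second condition required of the error.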

From (32), (36), and (62), the MSE is
$$E\{e_n^2\} = E\{e_n s_n\} = \frac{1}{\sum_{k=0}^{k_1} a_k^2 - \sum_{k=k_2+1}^{m} a_k^2} \cdot \left( R_{sI}[0] + a_1 R_{sI}[-1] + \cdots + a_{k_1} R_{sI}[-k_1] - a_m R_{I's}[-m] - a_{m-1} R_{I's}[-m+1] - \cdots - a_{k_2+1} R_{I's}[-k_2-1] \right) = \frac{\sigma^2}{\sum_{k=0}^{k_1} a_k^2 - \sum_{k=k_2+1}^{m} a_k^2}. \tag{63}$$
For the case $k_1 = k_2$, $2k_1 \ge m-1$, $k_1 \le m-1$, we can replace $k_2$ by $k_1$ in (61) to achieve the following:
$$\hat{s}_n = \hat{E}\{s_n \mid s_{n+k},\, -k_1 \le k \le k_1,\, k \ne 0\} = -\frac{1}{\sum_{k=0}^{k_1} a_k^2 - \sum_{k=k_1+1}^{m} a_k^2} \cdot \left( \sum_{k=m-k_1}^{k_1} \left( \sum_{p=0}^{k_1-k} a_p a_{p+k} \right) (s_{n-k} + s_{n+k}) + \sum_{k=1}^{m-k_1-1} \left( \sum_{p=0}^{k_1-k} a_p a_{p+k} - \sum_{p=k_1+1}^{m-k} a_p a_{p+k} \right) (s_{n-k} + s_{n+k}) \right). \tag{64}$$
As expected, we see that the data at the same distance from $s_n$ participate with the same weight.

Now, consider the case that the distance between $s_n$ and the farthest data on the right side is more than the distance between $s_n$ and the farthest data on the left side. It can be handled by the symmetry of the problem. More precisely, if we replace $s_{n-k}$ by $s_{n+k}$ and vice versa in (61), we get the following:
$$\hat{s}_n = \hat{E}\{s_n \mid s_{n-k},\, -k_2 \le k \le k_1,\, k \ne 0\} = -\frac{1}{\sum_{k=0}^{k_1} a_k^2 - \sum_{k=k_2+1}^{m} a_k^2} \left( \sum_{k=m-k_2}^{k_1} \left( \sum_{p=0}^{k_1-k} a_p a_{p+k} \right) s_{n-k} + \sum_{k=1}^{m-k_2-1} \left( \sum_{p=0}^{k_1-k} a_p a_{p+k} - \sum_{p=k_2+1}^{m-k} a_p a_{p+k} \right) s_{n-k} + \sum_{k=1}^{m-k_1-1} \left( \sum_{p=0}^{k_1} a_p a_{p+k} - \sum_{p=k_2-k+1}^{m-k} a_p a_{p+k} \right) s_{n+k} + \sum_{k=m-k_1}^{k_2} \left( \sum_{p=0}^{k_2-k} a_p a_{p+k} \right) s_{n+k} \right). \tag{65}$$
Again in (65), $k_1 \le k_2 \le m-1$ and $k_1 + k_2 \ge m-1$. The estimation error in this case is
$$e_n = \frac{1}{\sum_{k=0}^{k_1} a_k^2 - \sum_{k=k_2+1}^{m} a_k^2} \cdot \left( I'_n + a_1 I'_{n-1} + \cdots + a_{k_1} I'_{n-k_1} - a_m I_{n+m} - a_{m-1} I_{n+m-1} - \cdots - a_{k_2+1} I_{n+k_2+1} \right). \tag{66}$$
The MSE is the same as (63). We could not find a simple general form for the case $k_1 + k_2 < m-1$.

9. A Detailed Example

In this section we work through a detailed example. The optimal linear estimation of the following process is desired:
$$s_n + 0.8 s_{n-1} + 0.3 s_{n-2} - 0.1 s_{n-3} = I_n. \tag{67}$$
$I_n$ is the innovation noise with unit variance ($\sigma = 1$). We have $a_1 = 0.8$, $a_2 = 0.3$, and $a_3 = -0.1$. The process is the response of the following 3rd-order ($m = 3$) all-pole filter to the innovation noise:
$$H(z) = \frac{1}{1 + 0.8 z^{-1} + 0.3 z^{-2} - 0.1 z^{-3}}. \tag{68}$$
The poles of this system are $p_1 = 0.2$ and $p_{2,3} = -0.5 \pm j0.5$. Taking the inverse $Z$-transform of $S_s(z) = H(z) H(z^{-1})$, we get the following autocorrelation function:
$$R_s[k] = r_k = E\{s_n s_{n-k}\} = \frac{625}{13542} \times 5^{-|k|} + \frac{40}{2257} \times 2^{-|k|/2} \left( 103 \cos\left( \frac{3\pi}{4} k \right) - 26 \sin\left( \frac{3\pi}{4} |k| \right) \right). \tag{69}$$
From (69), we have $r_0 = 1.8716$, $r_1 = -1.1339$, $r_2 = 0.2322$, $r_3 = 0.3415$, $r_4 = -0.4563$, $r_5 = 0.2858$, and $r_6 = -0.0576$. Now, we consider different cases.
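The listed autocorrelation values can be cross-checked in two independent ways: by evaluating the closed form and by summing products of the impulse response of the filter. The sketch below does both. The helper names are hypothetical, the closed form is written with the lag as its argument, and the impulse-response sum is truncated after an assumed 400 terms.

```python
import math

def impulse_response(a, length):
    """h_n of the causal model: h_n = delta[n] - sum_k a_k h_{n-k}."""
    h = []
    for n in range(length):
        value = 1.0 if n == 0 else 0.0
        for k, a_k in enumerate(a, start=1):
            if n - k >= 0:
                value -= a_k * h[n - k]
        h.append(value)
    return h

def r_numeric(k, a, terms=400):
    """r_k = sum_n h_n h_{n+k} for unit innovation variance (truncated sum)."""
    h = impulse_response(a, terms + k)
    return sum(h[n] * h[n + k] for n in range(terms))

def r_closed_form(k):
    """Closed-form autocorrelation of the example process at lag k."""
    k = abs(k)
    angle = 3.0 * math.pi * k / 4.0
    return (625.0 / 13542.0) * 5.0 ** (-k) + (40.0 / 2257.0) * 2.0 ** (-k / 2.0) * (
        103.0 * math.cos(angle) - 26.0 * math.sin(angle))

a = [0.8, 0.3, -0.1]
listed = [1.8716, -1.1339, 0.2322, 0.3415, -0.4563, 0.2858, -0.0576]
numeric = [r_numeric(k, a) for k in range(7)]
closed = [r_closed_form(k) for k in range(7)]
for k in range(7):
    print(k, listed[k], round(numeric[k], 4), round(closed[k], 4))
```

The exponential factors $5^{-|k|}$ and $2^{-|k|/2}$ are the pole magnitudes $0.2$ and $\sqrt{0.5}$ raised to the lag, and $3\pi/4$ is the angle of the complex pole pair.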

9.1. Prediction with Finite Data

We want to derive the following optimal linear prediction:
$$\hat{s}_n = \hat{E}\{s_n \mid s_{n-1}, s_{n-2}\} = A_1 s_{n-1} + A_2 s_{n-2}. \tag{70}$$
Using (53), we have
$$\hat{s}_n = -\frac{1}{1 - 0.01} \left[ (0.8 + 0.1 \times 0.3)\, s_{n-1} + (0.3 + 0.1 \times 0.8)\, s_{n-2} \right] = -0.8384\, s_{n-1} - 0.3838\, s_{n-2}. \tag{71}$$
If we want to verify the solution using the orthogonality equations, we have
$$E\{(s_n - A_1 s_{n-1} - A_2 s_{n-2})\, s_{n-k}\} = 0, \quad k = 1, 2. \tag{72}$$
Expanding (72), we get
$$r_0 A_1 + r_1 A_2 = r_1, \qquad r_1 A_1 + r_0 A_2 = r_2, \tag{73}$$
where the $r_k$'s come from (69). Replacing the $r_k$'s from (69) in (73), we get
$$1.8716 A_1 - 1.1339 A_2 = -1.1339, \qquad -1.1339 A_1 + 1.8716 A_2 = 0.2322. \tag{74}$$
Solving (74), we get the same result as (71).
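The agreement between the closed form and the orthogonality equations can be reproduced with a few lines of arithmetic. The sketch below solves the 2-by-2 system by Cramer's rule using the rounded autocorrelation values quoted in the text, so small differences in the fourth decimal place are expected; the variable names are chosen for this illustration.

```python
r0, r1, r2 = 1.8716, -1.1339, 0.2322   # rounded autocorrelation values from the text

# r0*A1 + r1*A2 = r1 and r1*A1 + r0*A2 = r2, solved by Cramer's rule.
det = r0 * r0 - r1 * r1
A1 = (r1 * r0 - r1 * r2) / det
A2 = (r0 * r2 - r1 * r1) / det

# Closed form (53) for m = 3, for comparison.
a1, a2, a3 = 0.8, 0.3, -0.1
A1_closed = -(a1 - a3 * a2) / (1.0 - a3 ** 2)
A2_closed = -(a2 - a3 * a1) / (1.0 - a3 ** 2)
print(A1, A2, A1_closed, A2_closed)
```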

9.2. Interpolation with Finite Data

Consider the following problem:
$$\hat{s}_n = \hat{E}\{s_n \mid s_{n-1}, s_{n+1}\} = A_1 s_{n-1} + A'_1 s_{n+1}. \tag{75}$$
It is the symmetric case of $k_1 = k_2 = 1$ and we have $2k_1 = 2 = m-1$. Using (64), we have
$$\hat{s}_n = -\frac{1 \times 0.8 - 0.3 \times (-0.1)}{1 + 0.64 - 0.09 - 0.01} (s_{n-1} + s_{n+1}) = -0.5390\, (s_{n-1} + s_{n+1}). \tag{76}$$
Let us rederive the solution of (75) using the orthogonality conditions. We have
$$E\{(s_n - A_1 s_{n-1} - A'_1 s_{n+1})\, s_{n-k}\} = 0, \quad k = 1, -1. \tag{77}$$
Expanding (77), we get the following:
$$r_0 A_1 + r_2 A'_1 = r_1, \qquad r_2 A_1 + r_0 A'_1 = r_1. \tag{78}$$
Solving (78), we get the same answer as (76).

Now consider the following nonsymmetric problem:
$$\hat{s}_n = \hat{E}\{s_n \mid s_{n-2}, s_{n-1}, s_{n+1}\} = A_1 s_{n+1} + A'_1 s_{n-1} + A'_2 s_{n-2}, \tag{79}$$
which is the case of $k_1 = 1 < k_2 = 2 \le m - 1$ and $k_1 + k_2 \ge m - 1$. From (61), we get the following result:
$$\hat{s}_n = -\frac{1}{1 + 0.64 - 0.01} \cdot \left(1 \times 0.8\, s_{n+1} + (1 \times 0.8 + 0.8 \times 0.3 - 0.3 \times (-0.1))\, s_{n-1} + 1 \times 0.3\, s_{n-2}\right) = -0.4908\, s_{n+1} - 0.6564\, s_{n-1} - 0.1840\, s_{n-2}. \tag{80}$$
Now, we want to obtain the solution of (79) using the matrix equations, and we expect the same answer as (80). The orthogonality condition is
$$E\{(s_n - A_1 s_{n+1} - A'_1 s_{n-1} - A'_2 s_{n-2})\, s_{n-k}\} = 0, \quad k = -1, 1, 2. \tag{81}$$
It follows that
$$r_0 A_1 + r_2 A'_1 + r_3 A'_2 = r_1, \qquad r_2 A_1 + r_0 A'_1 + r_1 A'_2 = r_1, \qquad r_3 A_1 + r_1 A'_1 + r_0 A'_2 = r_2. \tag{82}$$
The solution of (82) is the same as (80).
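One way to confirm that (80) satisfies (82) without solving the system is to substitute the coefficients and inspect the residuals. The sketch below does this with the rounded $r_k$ values from the text; it relies on $E\{s_{n+i} s_{n+j}\} = r_{|i-j|}$, which follows from stationarity. The names `offsets`, `coeffs`, and `residuals` are illustrative only.

```python
# r_0 ... r_3 as quoted in the text (rounded to four decimals).
r = [1.8716, -1.1339, 0.2322, 0.3415]

# Time offsets of the observed samples relative to n: s_{n+1}, s_{n-1}, s_{n-2}.
offsets = [+1, -1, -2]

# Coefficients copied from the numerical expression in (80), in the same order.
den = 1 + 0.64 - 0.01
coeffs = [
    -(1 * 0.8) / den,
    -(1 * 0.8 + 0.8 * 0.3 - 0.3 * (-0.1)) / den,
    -(1 * 0.3) / den,
]

# Orthogonality (81): for each observed sample s_{n+oi},
#   E{s_n s_{n+oi}} - sum_j A_j E{s_{n+oj} s_{n+oi}} = 0,
# where E{s_{n+i} s_{n+j}} = r_{|i-j|} by stationarity.
residuals = []
for oi in offsets:
    estimate_term = sum(A * r[abs(oj - oi)] for A, oj in zip(coeffs, offsets))
    residuals.append(r[abs(oi)] - estimate_term)

print(coeffs)
print(residuals)
```

The residuals are not exactly zero because the $r_k$ values are rounded, but they should be small compared with the entries of the system.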

9.3. Interpolation with Infinite Data on the Left Side

We want to obtain the following estimation:
$$\hat{s}_n = \hat{E}\{s_n \mid s_{n+i},\ i \le 1,\ i \ne 0\} = A_1 s_{n+1} + A'_1 s_{n-1} + A'_2 s_{n-2} + A'_3 s_{n-3}. \tag{83}$$
We can do it if we let $k_1 = 1$ in (40). It follows that
$$\hat{s}_n = -\frac{1}{1 + 0.64} \cdot \left(1 \times 0.8\, s_{n+1} + (1 \times 0.8 + 0.8 \times 0.3)\, s_{n-1} + (1 \times 0.3 + 0.8 \times (-0.1))\, s_{n-2} + 1 \times (-0.1)\, s_{n-3}\right) = -0.4878\, s_{n+1} - 0.6341\, s_{n-1} - 0.1341\, s_{n-2} + 0.0610\, s_{n-3}. \tag{84}$$

Now we verify (84) using the orthogonality conditions:
$$E\{(s_n - A_1 s_{n+1} - A'_1 s_{n-1} - A'_2 s_{n-2} - A'_3 s_{n-3})\, s_{n-k}\} = 0, \quad k = -1, 1, 2, 3. \tag{85}$$
The following set of equations is obtained:
$$\begin{aligned} r_0 A_1 + r_2 A'_1 + r_3 A'_2 + r_4 A'_3 &= r_1, \\ r_2 A_1 + r_0 A'_1 + r_1 A'_2 + r_2 A'_3 &= r_1, \\ r_3 A_1 + r_1 A'_1 + r_0 A'_2 + r_1 A'_3 &= r_2, \\ r_4 A_1 + r_2 A'_1 + r_1 A'_2 + r_0 A'_3 &= r_3. \end{aligned} \tag{86}$$

Note that the coefficient matrix of (86) is not Toeplitz. The result of (86) is the same as (84).
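The system (86) and the Toeplitz remark can both be checked numerically. The following sketch builds the coefficient matrix from the sample offsets, solves it by plain Gaussian elimination, and tests the Toeplitz property. The helpers `solve` and `is_toeplitz` are hypothetical names written for this illustration, and the $r_k$ values are the rounded ones quoted in the text, so agreement with (84) is expected only to about three decimals.

```python
def solve(matrix, rhs):
    """Solve a small dense system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(aug[i][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for i in range(col + 1, n):
            factor = aug[i][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[i][j] -= factor * aug[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

def is_toeplitz(matrix):
    """True if every diagonal of the square matrix is constant."""
    n = len(matrix)
    return all(matrix[i][j] == matrix[i - 1][j - 1]
               for i in range(1, n) for j in range(1, n))

r = [1.8716, -1.1339, 0.2322, 0.3415, -0.4563]  # r_0 ... r_4 from the text

# Observed samples s_{n+1}, s_{n-1}, s_{n-2}, s_{n-3}: the gap at n breaks
# the uniform spacing, which is why the matrix of (86) is not Toeplitz.
offsets = [+1, -1, -2, -3]
R = [[r[abs(oi - oj)] for oj in offsets] for oi in offsets]
p = [r[abs(oi)] for oi in offsets]
A = solve(R, p)

# For comparison, uniformly spaced samples (pure prediction) give a Toeplitz matrix.
contiguous = [-1, -2, -3]
R_contiguous = [[r[abs(oi - oj)] for oj in contiguous] for oi in contiguous]

print(A)
print(is_toeplitz(R), is_toeplitz(R_contiguous))
```

A practical consequence is that fast Toeplitz solvers of the Levinson type do not apply directly to (86), whereas a general-purpose solver such as the one sketched here does.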

10. Conclusion

We introduced an anticausal LTI model alongside the known causal LTI model for AR processes. Using these models and the related innovation noises, we obtained the optimal linear interpolations for different cases. Specifically, we derived the formulae for the cases in which there are infinite data on the right or on the left side of the variable to be estimated. We also obtained the linear prediction and interpolation with finite data. In that case, the number of available samples must be at least the order of the process minus one; we could not find a general simple form when fewer samples are available. The innovation noises and the orthogonality principle are essential to the proofs of our solutions.


  1. Z.-D. Chen, R.-F. Chang, and W.-J. Kuo, “Adaptive predictive multiplicative autoregressive model for medical image compression,” IEEE Transactions on Medical Imaging, vol. 18, no. 2, pp. 181–184, 1999.
  2. Z. Zhu and H. Leung, “Adaptive blind equalization for chaotic communication systems using extended-Kalman filter,” IEEE Transactions on Circuits and Systems I, vol. 48, no. 8, pp. 979–989, 2001.
  3. D. Matrouf and J. L. Gauvain, “Using AR HMM state-dependent filtering for speech enhancement,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '99), pp. 785–788, March 1999.
  4. A. J. E. M. Janssen, R. N. J. Veldhuis, and L. B. Vries, “Adaptive interpolation of discrete-time signals that can be modeled as autoregressive processes,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 34, no. 2, pp. 317–330, 1986.
  5. R. L. Burr and M. J. Cowan, “Autoregressive spectral models of heart rate variability: practical issues,” Journal of Electrocardiology, vol. 25, pp. 224–233, 1992.
  6. M. B. Sirvanci and S. S. Wolff, “Nonparametric detection with autoregressive data,” IEEE Transactions on Information Theory, vol. 2, no. 6, pp. 725–731, 1976.
  7. H. E. Witzgall and J. S. Goldstein, “Detection performance of the reduced-rank linear predictor ROCKET,” IEEE Transactions on Signal Processing, vol. 51, no. 7, pp. 1731–1738, 2003.
  8. A. Golaup and A. H. Aghvami, “Modelling of MPEG4 traffic at GoP level using autoregressive processes,” in Proceedings of the 56th Vehicular Technology Conference, pp. 854–858, September 2002.
  9. S. Coleri, M. Ergen, A. Puri, and A. Bahai, “Channel estimation techniques based on pilot arrangement in OFDM systems,” IEEE Transactions on Broadcasting, vol. 48, no. 3, pp. 223–229, 2002.
  10. J. H. Kim, “Forecasting autoregressive time series with bias-corrected parameter estimators,” International Journal of Forecasting, vol. 19, no. 3, pp. 493–502, 2003.
  11. A. Papoulis and S. U. Pillai, Probability, Random Variables, and Stochastic Processes, McGraw-Hill, New York, NY, USA, 2002.
  12. N. Levinson, “The Wiener RMS error criterion in filter design and prediction,” Journal of Mathematical Physics, vol. 25, no. 4, pp. 261–278, 1974.
  13. P. M. T. Broersen, “Finite-sample bias in the Yule-Walker method of autoregressive estimation,” in Proceedings of IEEE International Instrumentation and Measurement Technology Conference, pp. 342–347, May 2008.
  14. P. M. T. Broersen, “Finite-sample bias propagation in autoregressive estimation with the Yule-Walker method,” IEEE Transactions on Instrumentation and Measurement, vol. 58, no. 5, pp. 1354–1360, 2009.
  15. S. A. Fattah, W.-P. Zhu, and M. O. Ahmad, “Finite-sample bias in the Yule-Walker method of autoregressive estimation,” in Proceedings of the Canadian Conference on Electrical and Computer Engineering, pp. 001 815–001 818, May 2008.
  16. Y. Xia and M. S. Kamel, “A generalized least absolute deviation method for parameter estimation of autoregressive signals,” IEEE Transactions on Neural Networks, vol. 19, no. 1, pp. 107–118, 2008.
  17. T. Hsiao, “Identification of time-varying autoregressive systems using maximum a posteriori estimation,” IEEE Transactions on Signal Processing, vol. 56, no. 8, pp. 3497–3509, 2008.
  18. J. Songsiri, J. Dahl, and L. Vandenberghe, “Maximum-likelihood estimation of autoregressive models with conditional independence constraints,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '09), pp. 1701–1704, April 2009.
  19. A. Mahmoudi and M. Karimi, “Parameter estimation of autoregressive signals from observations corrupted with colored noise,” Signal Processing, vol. 90, no. 1, pp. 157–164, 2010.
  20. W. X. Zheng, “Autoregressive parameter estimation from noisy data,” IEEE Transactions on Circuits and Systems II, vol. 47, no. 1, pp. 71–75, 2000.