Abstract

Consider a sequence from an $m$th-order autoregressive (AR) stationary discrete-time process, and assume that at least $m-1$ consecutive neighboring samples of an unknown sample are available. The neighbors may all lie on one side of the unknown sample or be split between the left and right sides. In this paper, we find explicit solutions for the optimal linear estimation of the unknown sample in terms of these neighbors. We write the estimation errors as linear combinations of innovation noises and calculate the corresponding mean square errors (MSE). To the best of our knowledge, no explicit solution for this problem has been reported; the known solutions are implicit ones obtained through the orthogonality equations. There are also no explicit solutions when fewer than $m-1$ samples are available. The order of the process ($m$) and the feedback coefficients are assumed to be known.

1. Introduction

Estimation has many applications in different areas, including compression and equalization [1, 2]. Linear estimation is the most common form due to its mathematical simplicity. The optimal linear estimate of a random variable $x$ in terms of $y_1, y_2, \ldots, y_n$ is the linear combination

$$\hat{x} = E\{x \mid y_1, y_2, \ldots, y_n\} = \sum_{i=1}^{n} A_i y_i, \tag{1}$$

where the coefficients $A_i$ must be chosen to minimize the MSE $E\{(x-\hat{x})^2\}$, and $E\{\cdot\}$ stands for the expected value. To minimize the MSE, the $A_i$'s must satisfy the orthogonality principle:

$$E\{(x-\hat{x})\,y_i\} = 0, \quad i = 1, 2, \ldots, n. \tag{2}$$

We also write the above condition as

$$x - \hat{x} \perp y_i, \quad i = 1, 2, \ldots, n. \tag{3}$$

Therefore, in optimal linear estimation we search for the coefficients that make the error orthogonal to the data.
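As a minimal numerical sketch of this principle (assuming NumPy; the second-order moments below are hypothetical and purely illustrative), the conditions (2) are the normal equations $\mathbf{R}\mathbf{A} = \mathbf{r}$ with $R_{ij} = E\{y_i y_j\}$ and $r_i = E\{x y_i\}$:

```python
# Minimal sketch (NumPy assumed): the orthogonality conditions (2) become the
# normal equations R A = r, with R[i, j] = E{y_i y_j} and r[i] = E{x y_i}.
import numpy as np

R = np.array([[2.0, 0.5],          # hypothetical moments, for illustration only
              [0.5, 1.0]])
r = np.array([1.0, 0.3])

A = np.linalg.solve(R, r)          # coefficients of x_hat = A[0]*y_1 + A[1]*y_2
print(A)
```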

A common model for many signals, including image, speech, and biological signals, is the AR model [1, 3-5]. This model has applications in different areas including detection [6, 7], traffic modeling [8], channel modeling [9], and forecasting [10]. An AR process is the output of an all-pole causal filter whose input is a white sequence called the innovation noise [11]. We also introduce another model for the process based on an all-pole anticausal filter. The optimal linear estimation of an AR process is usually accomplished through the recursive solution of the Yule-Walker (YW) equations using the Levinson-Durbin algorithm [12]. This solution is recursive and implicit. As we will see, in some cases the coefficient matrix of the equations is not Toeplitz, so the complexity-reduction advantage of the Levinson recursion cannot be exploited.

To the best of our knowledge, there is no explicit solution for the YW equations. Most of the research effort has focused on estimating the model parameters from observations. When researchers arrive at the YW equations, they stop, since they consider the solution as known through the Levinson recursion. Broersen, in his method for autocorrelation function estimation from observations, points to the YW equations and mainly concentrates on bias reduction when estimating from a finite set of observations [13, 14]; he does not attempt to find an explicit solution of the YW equations. Fattah et al. estimate the autocorrelation function of an ARMA model from noisy data; they again refer to the YW equation set and its solution using matrix inversion, and no explicit solution is proposed [15]. Xia and Kamel propose an optimization method to estimate AR model parameters from noisy data [16], where the noise is not necessarily Gaussian; the method minimizes a cost function using a neural-network algorithm, and again the explicit solution of the orthogonality equations is not the goal of that paper. Hsiao proposes an algorithm to estimate the parameters of a time-varying AR system [17]. He considers the feedback coefficients of a time-varying AR process as random variables, and the proposed algorithm maximizes a posteriori probabilities conditioned on the data; the recursive algorithm is compared to Monte Carlo simulation in terms of accuracy and complexity. There, the aim is parameter estimation from data, not the analytic solution of the orthogonality equations. In [18], a sequence of Gaussian AR vectors is considered. As the sequence elements are vectors rather than scalars, the AR model is defined by matrix rather than scalar feedback coefficients. The estimation is more complex, some independence conditions are assumed, the method is based on convex optimization, and no exact answer can be provided. Mahmoudi and Karimi propose an LS-based method to estimate AR parameters from noisy data [19]. The method exploits the YW equations, but it also does not provide an explicit solution to them. Another LS-based estimation method can be seen in [20].

As mentioned above, we could not find the final solution of the YW orthogonality equations in the literature. In this work, we derive explicit solutions of the orthogonality equations for different cases. Consider a stationary $m$th-order AR process. The order and the feedback coefficients of the process are assumed to be known; model parameter estimation is outside the scope of this paper. The main goal of this paper is to find the solution of the orthogonality equations. We will find the optimal linear estimate of a sample in terms of its neighbors when at least $m-1$ consecutive neighbors are available. The consecutive neighbors include the situations where all $m-1$ or more neighbors lie on one side, as well as those where some are left neighbors and the others are right neighbors. We will show that no more than $m$ consecutive neighbors on each side are needed. Our approach is to find estimation errors that are orthogonal to the data and are linear combinations of the data. We use the well-known causal LTI AR model, as well as our anticausal model, to form such errors. The errors are formed as linear combinations of the causal and anticausal innovation (process) noises. Starting from suitable errors that are both orthogonal to the data and linear combinations of the data, we arrive at the linear estimates. We pursue an LTI-system approach rather than trying to solve the orthogonality equations directly. The results of this paper can be important in situations where the equation matrices are ill-conditioned and matrix inversion or recursive algorithms become unstable.

This paper is organized as follows. In Section 2, the causal model is reviewed and the anticausal model is introduced. In Section 3, we review the forward prediction problem. We state the problem symmetries in Section 4 and show how the similarity between two problems allows the solution of one to be obtained from the solution of the other. In Section 5, we derive a number of relations for cross-correlation functions that will be used later. We find the interpolation formulae when infinite data are available in Section 6, and the prediction and interpolation with finite data in Sections 7 and 8, respectively. In Section 9, we present a detailed example to show that our relations and the matrix solution of the orthogonality principle result in the same coefficients. Finally, we conclude the work in Section 10.

2. Causal and Anticausal Models

A discrete-time stationary AR process $s_n$ of order $m$ is modeled as

$$s_n + a_1 s_{n-1} + a_2 s_{n-2} + \cdots + a_m s_{n-m} = I_n, \quad \forall n. \tag{4}$$

The above equation describes a causal LTI system. Its input $I_n$ is called the innovation noise and is a stationary white sequence with zero mean, that is, $E\{I_n I_k\} = \sigma^2 \delta[n-k]$ and $E\{I_n\} = 0$, where $\sigma$ is a positive constant, $\delta[0] = 1$, and $\delta[i] = 0$ elsewhere. The system is causal; therefore $s_n$, the output of the system at time index $n$, is a linear combination of the inputs at time index $n$ and before. So we can write

$$s_n = h_0 I_n + h_1 I_{n-1} + h_2 I_{n-2} + \cdots = \sum_{i=0}^{\infty} h_i I_{n-i}. \tag{5}$$

In the above equation, $h_n$ is the impulse response of the system; by the causality of the model, $h_n = 0$ for $n < 0$. From the whiteness of the sequence $\{I_n\}$ and (5), we get the following result:

$$I_{n+k} \perp s_n, \quad k > 0, \; \forall n. \tag{6}$$

Figure 1 is the causal model of the AR process. $H(z)$ is the $Z$-transform of $h_n$, defined as

$$H(z) = \sum_{k=-\infty}^{\infty} h_k z^{-k}. \tag{7}$$

For the system defined by (4), we have

$$H(z) = \frac{1}{A(z)} = \frac{1}{1 + a_1 z^{-1} + \cdots + a_m z^{-m}}. \tag{8}$$

Assuming a stable causal system, we conclude that the roots of $A(z) = 0$ must be inside the unit circle $|z| = 1$. The power spectral density function (PSDF) of a process is the $Z$-transform of its autocorrelation function. The PSDF of $s_n$ is [11]

$$S_s(z) = S_I(z) H(z) H(z^{-1}) = \sigma^2 H(z) H(z^{-1}). \tag{9}$$

In the above equation, $S_s(z)$ is the PSDF of $s_n$ and $S_I(z)$ is the PSDF of $I_n$.

We now present the anticausal model. If we apply the sequence $s_n$ to an LTI system with the transfer function $H^{-1}(z^{-1})$, we get another innovation noise, denoted $\bar{I}_n$. Figure 2 demonstrates the generation of this new innovation noise. To see the whiteness of the sequence $\bar{I}_n$, note that its PSDF, obtained from Figure 2 and (9), is

$$S_{\bar{I}}(z) = S_s(z) H^{-1}(z^{-1}) H^{-1}(z) = \sigma^2 H(z) H(z^{-1}) H^{-1}(z^{-1}) H^{-1}(z) = \sigma^2. \tag{10}$$

Equivalently, we can apply $\bar{I}_n$ to the inverse system with the transfer function $\bar{H}(z) = H(z^{-1})$ to get $s_n$. The generation of $s_n$ from $\bar{I}_n$ is depicted in Figure 3.

We have

$$\bar{H}(z) = H(z^{-1}) = \frac{1}{A(z^{-1})}. \tag{11}$$

Therefore $\bar{h}_n = h_{-n}$. Noting that $h_n = 0$ for $n < 0$, we see that $\bar{h}_n = 0$ for $n > 0$. Also note that the roots of $A(z^{-1}) = 0$ are outside the unit circle, since the roots of $A(z) = 0$ are inside it. Regarding these points, the system with the transfer function $\bar{H}(z)$ is stable and anticausal. We have

$$\bar{H}^{-1}(z) = 1 + a_1 z + a_2 z^2 + \cdots + a_m z^m. \tag{12}$$

Using the above equation and Figure 3, we get

$$s_n + a_1 s_{n+1} + a_2 s_{n+2} + \cdots + a_m s_{n+m} = \bar{I}_n, \quad \forall n. \tag{13}$$

Also, note that

$$s_n = \sum_{i=-\infty}^{\infty} \bar{h}_i \bar{I}_{n-i} = \bar{h}_0 \bar{I}_n + \bar{h}_{-1} \bar{I}_{n+1} + \bar{h}_{-2} \bar{I}_{n+2} + \cdots = h_0 \bar{I}_n + h_1 \bar{I}_{n+1} + \cdots = \sum_{i=0}^{\infty} h_i \bar{I}_{n+i}. \tag{14}$$

From (14) and Figure 3, we see that $s_n$ is a linear combination of $\bar{I}_n$ and the inputs after it. The whiteness of the sequence $\{\bar{I}_n\}$ then gives

$$\bar{I}_{n-k} \perp s_n, \quad \forall n, \; k > 0. \tag{15}$$
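The two models can be checked numerically. The following sketch (assuming NumPy and SciPy; the third-order coefficients are illustrative) generates $s_n$ from the causal model (4) and then forms $\bar{I}_n$ from (13); its sample autocorrelation should be close to $\sigma^2\delta[k]$, in agreement with (10).

```python
# Simulation sketch (NumPy/SciPy assumed, illustrative AR(3) coefficients):
# generate s_n causally via (4), then form the anticausal innovation of (13),
# Ibar_n = s_n + a_1 s_{n+1} + ... + a_m s_{n+m}, and check that it is white.
import numpy as np
from scipy.signal import lfilter

rng = np.random.default_rng(0)
a = np.array([0.8, 0.3, -0.1])            # a_1, ..., a_m (a stable example)
m, sigma, N = len(a), 1.0, 200_000

I = sigma * rng.standard_normal(N)        # causal innovation noise I_n
s = lfilter([1.0], np.r_[1.0, a], I)      # s_n = I_n - a_1 s_{n-1} - ... - a_m s_{n-m}

Ibar = s[:N - m].copy()                   # equation (13)
for k in range(1, m + 1):
    Ibar += a[k - 1] * s[k:N - m + k]

lags = range(4)
L = len(Ibar) - max(lags)
acf = [float(np.mean(Ibar[:L] * Ibar[l:l + L])) for l in lags]
print(acf)                                # approximately [sigma^2, 0, 0, 0]
```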

3. Forward Prediction

Forward prediction can be accomplished by using the whitening filter [11]: the data are whitened, and the equivalent white data are used to carry out the prediction. As an example, consider the 1-step forward prediction of $s_n$. It is seen that $s_n$ is estimated as

$$\hat{s}_n = E\{s_n \mid s_{n-k}, \; k > 0\} = -\sum_{k=1}^{m} a_k s_{n-k} = -a_1 s_{n-1} - a_2 s_{n-2} - \cdots - a_m s_{n-m}. \tag{16}$$

It can be seen from (4) that the error $s_n - \hat{s}_n$ is equal to $I_n$ and therefore, from (6), it is orthogonal to $s_{n-k}$ for $k > 0$. This proves the optimality of (16).

The 2-step prediction can be done as [11]

$$\hat{s}_n = E\{s_n \mid s_{n-k}, \; k \geq 2\} = -a_1 \hat{s}_{n-1} - \sum_{i=2}^{m} a_i s_{n-i}. \tag{17}$$

In the above equation, $\hat{s}_{n-1}$ is the prediction of $s_{n-1}$ from its previous data (1-step prediction) and is obtained by replacing $n$ by $n-1$ in (16):

$$\hat{s}_{n-1} = E\{s_{n-1} \mid s_{n-k}, \; k \geq 2\} = -\sum_{k=1}^{m} a_k s_{n-k-1} = -a_1 s_{n-2} - a_2 s_{n-3} - \cdots - a_m s_{n-m-1}. \tag{18}$$

From (17), (18), and (4), the estimation error is

$$e_n = s_n + \sum_{i=2}^{m} a_i s_{n-i} - a_1 \sum_{k=1}^{m} a_k s_{n-k-1} = I_n - a_1 s_{n-1} - a_1 \sum_{k=1}^{m} a_k s_{n-k-1} = I_n - a_1 I_{n-1}. \tag{19}$$

From (6), it is clear that $I_n$ and $I_{n-1}$ are orthogonal to $s_{n-k}$ for $k \geq 2$. This proves the optimality of (17).

Higher-order predictions can be obtained in the same manner. As the final example of this section, consider the 3-step forward prediction, which is accomplished as follows:

$$\hat{s}_n = E\{s_n \mid s_{n-k}, \; k \geq 3\} = -a_1 \hat{s}_{n-1} - a_2 \hat{s}_{n-2} - \sum_{k=3}^{m} a_k s_{n-k}. \tag{20}$$

In the above equation, $\hat{s}_{n-1}$ and $\hat{s}_{n-2}$ are the 2-step and 1-step predictions of $s_{n-1}$ and $s_{n-2}$, respectively, obtained from (17) and (16). The error is

$$e_n = s_n + a_1 \hat{s}_{n-1} + a_2 \hat{s}_{n-2} + \sum_{i=3}^{m} a_i s_{n-i} = I_n - a_1 s_{n-1} - a_2 s_{n-2} + a_1 \hat{s}_{n-1} + a_2 \hat{s}_{n-2} = I_n - a_1 \left(s_{n-1} - \hat{s}_{n-1}\right) - a_2 \left(s_{n-2} - \hat{s}_{n-2}\right) = I_n - a_1 I_{n-1} + \left(a_1^2 - a_2\right) I_{n-2}. \tag{21}$$
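As a numerical illustration (assuming NumPy and SciPy; the AR(3) coefficients are only an example), the sketch below runs the 2-step predictor of (17)-(18) on simulated data; its empirical MSE should be close to $\sigma^2(1 + a_1^2)$, as implied by the error expression (19).

```python
# Sketch (NumPy/SciPy assumed, illustrative AR(3)): apply the 2-step predictor (17)-(18)
# to simulated data and compare the empirical MSE with sigma^2*(1 + a_1^2) from (19).
import numpy as np
from scipy.signal import lfilter

rng = np.random.default_rng(1)
a = np.array([0.8, 0.3, -0.1])                 # a_1, ..., a_m
m, sigma, N = len(a), 1.0, 400_000

I = sigma * rng.standard_normal(N)
s = lfilter([1.0], np.r_[1.0, a], I)           # causal AR model (4)

# 1-step prediction of s_{n-1} from s_{n-2}, ..., s_{n-m-1}, equation (18)
s_hat_prev = np.zeros(N)
for k in range(1, m + 1):
    s_hat_prev[m + 1:] -= a[k - 1] * s[m - k:N - 1 - k]

# 2-step prediction of s_n, equation (17)
s_hat = -a[0] * s_hat_prev
for i in range(2, m + 1):
    s_hat[m + 1:] -= a[i - 1] * s[m + 1 - i:N - i]

mse = np.mean((s[m + 1:] - s_hat[m + 1:]) ** 2)
print(mse, sigma**2 * (1 + a[0] ** 2))         # the two values should be close
```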

4. The Problem Symmetries

Consider the following linear interpolation of $s_n$ from the data around it:

$$\hat{s}_n = E\{s_n \mid s_{n-k_1}, s_{n-k_1+1}, \ldots, s_{n-1}, s_{n+1}, \ldots, s_{n+k_2}\} = A_{-k_1} s_{n-k_1} + A_{-k_1+1} s_{n-k_1+1} + \cdots + A_{k_2} s_{n+k_2}. \tag{22}$$

The orthogonality principle gives

$$E\{(s_n - A_{-k_1} s_{n-k_1} - A_{-k_1+1} s_{n-k_1+1} - \cdots - A_{k_2} s_{n+k_2})\, s_{n+i}\} = 0, \quad i = -k_1, -k_1+1, \ldots, k_2, \; i \neq 0. \tag{23}$$

The above equations become

$$R_s[i+k_1] A_{-k_1} + R_s[i+k_1-1] A_{-k_1+1} + \cdots + R_s[i-k_2] A_{k_2} = R_s[i], \quad i = -k_1, -k_1+1, \ldots, k_2, \; i \neq 0. \tag{24}$$

In the above equations, $R_s[i] = E\{s_n s_{n-i}\}$.

Now, consider the following estimation:

$$\hat{s}_n = E\{s_n \mid s_{n-k_2}, s_{n-k_2+1}, \ldots, s_{n-1}, s_{n+1}, \ldots, s_{n+k_1}\} = A'_{k_1} s_{n+k_1} + A'_{k_1-1} s_{n+k_1-1} + \cdots + A'_{-k_2} s_{n-k_2}. \tag{25}$$

The orthogonality of the error to the data gives

$$E\{(s_n - A'_{k_1} s_{n+k_1} - A'_{k_1-1} s_{n+k_1-1} - \cdots - A'_{-k_2} s_{n-k_2})\, s_{n+i}\} = 0, \quad i = k_1, k_1-1, \ldots, -k_2, \; i \neq 0. \tag{26}$$

These become

$$R_s[i-k_1] A'_{k_1} + R_s[i-k_1+1] A'_{k_1-1} + \cdots + R_s[i+k_2] A'_{-k_2} = R_s[i], \quad i = k_1, k_1-1, \ldots, -k_2, \; i \neq 0. \tag{27}$$

Since $R_s[\cdot]$ is an even function, replacing $i$ by $-i$ in (27) shows that the set of equations (27) is exactly the same as the set of equations (24). Therefore,

$$A'_{k_1} = A_{-k_1}, \quad A'_{k_1-1} = A_{-k_1+1}, \quad \ldots, \quad A'_{-k_2} = A_{k_2}. \tag{28}$$

As an example, consider the following backward prediction:

$$\hat{s}_n = E\{s_n \mid s_{n+k}, \; k > 0\}. \tag{29}$$

Using (16) and the symmetry, we get

$$\hat{s}_n = E\{s_n \mid s_{n+k}, \; k > 0\} = -\sum_{k=1}^{m} a_k s_{n+k} = -a_1 s_{n+1} - a_2 s_{n+2} - \cdots - a_m s_{n+m}. \tag{30}$$

The validity of the solution can also be confirmed directly: from (13), the estimation error is

$$s_n + a_1 s_{n+1} + a_2 s_{n+2} + \cdots + a_m s_{n+m} = \bar{I}_n. \tag{31}$$

Using (15), it is clear that the error is orthogonal to the data. This proves the optimality of (30).

5. Cross-Correlation Functions

In this section, we derive a number of properties for the cross-correlations between innovation noises and the AR process. We will exploit these properties to prove our solutions.

We define $R_{sI}[k] = E\{s_n I_{n-k}\}$ and $R_{\bar{I}s}[k] = E\{\bar{I}_n s_{n-k}\}$. The first simple property follows from (6) and (15):

$$R_{sI}[k] = R_{\bar{I}s}[k] = 0, \quad k < 0. \tag{32}$$

Now, consider Figure 1. In this figure, $I_n$ is the input and $s_n$ is the output, and the impulse response of the system is $h[n]$. Therefore, we have [11]

$$R_{sI}[k] = R_I[k] * h[k] = \sigma^2 \delta[k] * h[k] = \sigma^2 h[k]. \tag{33}$$

In this equation, $R_I[k] = E\{I_n I_{n-k}\}$ and the operator $*$ is the discrete convolution. Taking the $Z$-transform of both sides of (33) and using (8), we get

$$S_{sI}(z) = \sigma^2 H(z) = \frac{\sigma^2}{1 + a_1 z^{-1} + \cdots + a_m z^{-m}}, \tag{34}$$

or equivalently,

$$S_{sI}(z) \left(1 + a_1 z^{-1} + \cdots + a_m z^{-m}\right) = \sigma^2. \tag{35}$$

Taking the inverse $Z$-transform of this equation, we have

$$R_{sI}[k] + a_1 R_{sI}[k-1] + \cdots + a_m R_{sI}[k-m] = \sigma^2 \delta[k]. \tag{36}$$

The right side of (36) is zero for $k \neq 0$.

Referring to Figure 3, we have [11]

$$R_{\bar{I}s}[k] = R_{\bar{I}}[k] * \bar{h}[-k] = \sigma^2 \delta[k] * \bar{h}[-k] = \sigma^2 \bar{h}[-k] = \sigma^2 h[k]. \tag{37}$$

Again, we conclude that

$$R_{\bar{I}s}[k] + a_1 R_{\bar{I}s}[k-1] + \cdots + a_m R_{\bar{I}s}[k-m] = \sigma^2 \delta[k]. \tag{38}$$
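These properties can be checked empirically. The sketch below (assuming NumPy and SciPy; the AR(3) model is illustrative) estimates $R_{sI}[k] = E\{s_n I_{n-k}\}$ from a long simulation and compares it with $\sigma^2 h[k]$ from (33) and with the zero values predicted by (32) for $k < 0$.

```python
# Sketch (NumPy/SciPy assumed, illustrative AR(3)): empirical check of (32) and (33),
# i.e., R_sI[k] = E{s_n I_{n-k}} = sigma^2 * h[k] for k >= 0 and R_sI[k] = 0 for k < 0.
import numpy as np
from scipy.signal import lfilter

rng = np.random.default_rng(2)
a = np.array([0.8, 0.3, -0.1])
sigma, N = 1.0, 400_000

I = sigma * rng.standard_normal(N)
s = lfilter([1.0], np.r_[1.0, a], I)                         # causal model (4)
h = lfilter([1.0], np.r_[1.0, a], np.r_[1.0, np.zeros(9)])   # impulse response h[0..9]

for k in range(-3, 6):
    if k >= 0:
        emp = float(np.mean(s[k:] * I[:N - k]))              # estimate of E{s_n I_{n-k}}
        print(k, emp, sigma**2 * h[k])                       # emp should be close to sigma^2*h[k]
    else:
        emp = float(np.mean(s[:N + k] * I[-k:]))             # k < 0: should be near zero
        print(k, emp, 0.0)
```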

6. Interpolation Using an Infinite Set of Data

In this section, we assume that an infinite number of data is available. However, we will see that only a finite number of data is sufficient.

6.1. Infinite Data on the Left Side

We want to obtain the following estimate:

$$\hat{s}_n = E\{s_n \mid s_{n+i}, \; i \leq k_1, \; i \neq 0\}. \tag{39}$$

Here $k_1$ is a positive integer constant not greater than $m$: there are $k_1$ data available on the right side of $s_n$ and infinitely many on the left side. Define $a_0 \triangleq 1$. We are going to prove the following:

$$\hat{s}_n = E\{s_n \mid s_{n+i}, \; i \leq k_1, \; i \neq 0\} = \frac{-1}{\sum_{k=0}^{k_1} a_k^2} \left[ \sum_{k=1}^{k_1} \left( \sum_{p=0}^{k_1-k} a_p a_{p+k} \right) s_{n+k} + \sum_{k=1}^{m-k_1} \left( \sum_{p=0}^{k_1} a_p a_{p+k} \right) s_{n-k} + \sum_{k=m-k_1+1}^{m} \left( \sum_{p=0}^{m-k} a_p a_{p+k} \right) s_{n-k} \right]. \tag{40}$$

Observe from (40) that although there are infinitely many data on the left side of $s_n$, only the $m$ data $s_{n-1}$ to $s_{n-m}$ participate in the estimation. Indeed, (40) is the optimal linear estimation solution for $\hat{s}_n = E\{s_n \mid s_{n+i}, \; -k_2 \leq i \leq k_1, \; i \neq 0\}$, where $k_2$ can be any integer greater than or equal to $m$.

To prove the optimality of (40), we must show that the estimation error is orthogonal to the data. First, the estimation error can be calculated by inserting $\hat{s}_n$ from (40) into $e_n = s_n - \hat{s}_n$. Second, by expanding the innovation noises using (4), we can confirm that

$$e_n = s_n - \hat{s}_n = \frac{1}{\sum_{k=0}^{k_1} a_k^2} \left( I_n + a_1 I_{n+1} + \cdots + a_{k_1} I_{n+k_1} \right). \tag{41}$$

Indeed, we have obtained (40) from (41). The motivation is that the estimation error must possess two essential properties: (1) it must be orthogonal to the data, and (2) it must be a linear combination of only the data and the variable to be estimated. It remains to prove that the right side of (41) is orthogonal to the data.

Using (6), it is quite clear that $I_n$ to $I_{n+k_1}$ are orthogonal to $s_{n-k}$ for $k > 0$, and so is $e_n$ in (41). Further, we have

$$E\{s_{n+i} (I_n + a_1 I_{n+1} + \cdots + a_{k_1} I_{n+k_1})\} = R_{sI}[i] + a_1 R_{sI}[i-1] + \cdots + a_{k_1} R_{sI}[i-k_1] = R_{sI}[i] + a_1 R_{sI}[i-1] + \cdots + a_i R_{sI}[0], \quad 1 \leq i \leq k_1. \tag{42}$$

The last equality in (42) is justified because $R_{sI}[k] = 0$ for $k < 0$ from (32). Using (32), (36), (41), and (42), it is seen that

$$E\{(I_n + a_1 I_{n+1} + \cdots + a_{k_1} I_{n+k_1})\, s_{n+i}\} = 0, \quad 1 \leq i \leq k_1. \tag{43}$$

This completes the proof.

The MSE is

$$E\{e_n^2\} = \left( \frac{1}{\sum_{k=0}^{k_1} a_k^2} \right)^2 E\{(I_n + a_1 I_{n+1} + \cdots + a_{k_1} I_{n+k_1})^2\} = \left( \frac{1}{\sum_{k=0}^{k_1} a_k^2} \right)^2 \left( E\{I_n^2\} + a_1^2 E\{I_{n+1}^2\} + \cdots + a_{k_1}^2 E\{I_{n+k_1}^2\} \right) = \left( \frac{1}{\sum_{k=0}^{k_1} a_k^2} \right)^2 \left( \sigma^2 + a_1^2 \sigma^2 + \cdots + a_{k_1}^2 \sigma^2 \right). \tag{44}$$

Therefore,

$$E\{e_n^2\} = \frac{\sigma^2}{\sum_{k=0}^{k_1} a_k^2}. \tag{45}$$
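As a numerical cross-check (assuming NumPy; the AR(3) model and the choice $k_1 = 2$ are illustrative), the sketch below builds the coefficients of the closed form (40) and compares them with a direct solution of the orthogonality equations written with the exact autocorrelation of the model; the two sets of coefficients should agree to machine precision.

```python
# Sketch (NumPy assumed, illustrative AR(3)): compare the closed form (40) with a direct
# solution of the orthogonality equations built from the exact autocorrelation function.
import numpy as np

def ar_autocorr(a, sigma2, nlags):
    """R_s[0..nlags] of the AR model s_n + a_1 s_{n-1} + ... + a_m s_{n-m} = I_n."""
    m = len(a)
    aa = np.r_[1.0, a]                                   # a_0 = 1
    M = np.zeros((m + 1, m + 1))
    for k in range(m + 1):                               # Yule-Walker equations
        for j in range(m + 1):
            M[k, abs(k - j)] += aa[j]
    r = list(np.linalg.solve(M, np.r_[sigma2, np.zeros(m)]))
    for k in range(m + 1, nlags + 1):                    # extend by the AR recursion
        r.append(-sum(aa[j] * r[k - j] for j in range(1, m + 1)))
    return np.array(r)

a = np.array([0.8, 0.3, -0.1]); m = len(a); aa = np.r_[1.0, a]
k1 = 2                                                   # data: s_{n+1..n+k1} and s_{n-1..n-m}
lags = list(range(1, k1 + 1)) + list(range(-1, -m - 1, -1))

# coefficient of s_{n+j} according to (40)
D = np.sum(aa[:k1 + 1] ** 2)
def coef(j):
    P = k1 - j if j > 0 else min(k1, m - abs(j))
    return -sum(aa[p] * aa[p + abs(j)] for p in range(P + 1)) / D
closed = np.array([coef(j) for j in lags])

# direct solution of the orthogonality equations: sum_l A_l R_s[i-l] = R_s[i]
r = ar_autocorr(a, 1.0, m + k1)
R = lambda k: r[abs(k)]
G = np.array([[R(i - l) for l in lags] for i in lags])
direct = np.linalg.solve(G, np.array([R(i) for i in lags]))

print(np.max(np.abs(closed - direct)))                   # ~1e-15; MSE is sigma^2/D, eq. (45)
```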

6.2. Infinite Data on the Right Side

By symmetry, replacing $s_{n-k}$ by $s_{n+k}$ (and vice versa) in (40), the following estimate is obtained:

$$\hat{s}_n = E\{s_n \mid s_{n-i}, \; i \leq k_1, \; i \neq 0\} = \frac{-1}{\sum_{k=0}^{k_1} a_k^2} \left[ \sum_{k=1}^{k_1} \left( \sum_{p=0}^{k_1-k} a_p a_{p+k} \right) s_{n-k} + \sum_{k=1}^{m-k_1} \left( \sum_{p=0}^{k_1} a_p a_{p+k} \right) s_{n+k} + \sum_{k=m-k_1+1}^{m} \left( \sum_{p=0}^{m-k} a_p a_{p+k} \right) s_{n+k} \right]. \tag{46}$$

Again, only the $m$ data $s_{n+1}$ to $s_{n+m}$ on the right side of $s_n$ participate in the interpolation, and the data beyond them are not needed. Therefore, (46) is the solution for all the optimal linear interpolations $\hat{s}_n = E\{s_n \mid s_{n-i}, \; -k_2 \leq i \leq k_1, \; i \neq 0\}$, where $k_2$ can be any integer greater than or equal to $m$.

The validity of (46) can also be proved as follows. The error is calculated as $e_n = s_n - \hat{s}_n$, where $\hat{s}_n$ is taken from (46). By expanding the innovation noises using (13), it can be verified that

$$e_n = s_n - \hat{s}_n = \frac{1}{\sum_{k=0}^{k_1} a_k^2} \left( \bar{I}_n + a_1 \bar{I}_{n-1} + \cdots + a_{k_1} \bar{I}_{n-k_1} \right). \tag{47}$$

Using (15), it is quite clear that $\bar{I}_{n-k_1}$ to $\bar{I}_n$ are orthogonal to $s_{n+k}$ for $k > 0$, and so is $e_n$ in (47). Further, we have

$$E\{s_{n-i} (\bar{I}_n + a_1 \bar{I}_{n-1} + \cdots + a_{k_1} \bar{I}_{n-k_1})\} = R_{\bar{I}s}[i] + a_1 R_{\bar{I}s}[i-1] + \cdots + a_{k_1} R_{\bar{I}s}[i-k_1] = R_{\bar{I}s}[i] + a_1 R_{\bar{I}s}[i-1] + \cdots + a_i R_{\bar{I}s}[0], \quad 1 \leq i \leq k_1. \tag{48}$$

The last equality in (48) is justified because $R_{\bar{I}s}[k] = 0$ for $k < 0$ from (32). Using (32), (38), (47), and (48), it is seen that $E\{e_n s_{n-i}\} = 0$ for $1 \leq i \leq k_1$. This completes the proof.

The MSE is the same as in (45).

6.3. Infinite Data on Both Sides

Now, we want to estimate $s_n$ from all the data around it. We will see that only $m$ data on each side are needed and, as expected, the data at the same distance from $s_n$ participate with the same weight. We have

$$\hat{s}_n = E\{s_n \mid s_{n-i}, \; i \neq 0\} = \frac{-1}{\sum_{k=0}^{m} a_k^2} \sum_{k=1}^{m} \left( \sum_{p=0}^{m-k} a_p a_{p+k} \right) \left( s_{n-k} + s_{n+k} \right). \tag{49}$$

This estimate can also be obtained by letting $k_1 = m$ in (40) or (46). Again, note that (49) is the optimal solution for the problems $\hat{s}_n = E\{s_n \mid s_{n-i}, \; -k_1 \leq i \leq k_2, \; i \neq 0\}$, where $k_1$ and $k_2$ can be any integers greater than or equal to $m$.

The validity of (49) can also be proved as follows. The error is calculated as $e_n = s_n - \hat{s}_n$, where $\hat{s}_n$ is taken from (49). By expanding the innovation noises using (4), it can be verified that

$$e_n = s_n - \hat{s}_n = \frac{1}{\sum_{k=0}^{m} a_k^2} \left( I_n + a_1 I_{n+1} + \cdots + a_m I_{n+m} \right). \tag{50}$$

Using (6), it is quite clear that $I_n$ to $I_{n+m}$ are orthogonal to $s_{n-k}$ for $k > 0$, and so is $e_n$ in (50). Further, we have

$$E\{s_{n+i} (I_n + a_1 I_{n+1} + \cdots + a_m I_{n+m})\} = R_{sI}[i] + a_1 R_{sI}[i-1] + \cdots + a_m R_{sI}[i-m], \quad i > 0. \tag{51}$$

Using (32), (36), (50), and (51), it is seen that $E\{e_n s_{n+i}\} = 0$ for $i > 0$. This completes the proof.

The MSE is

$$E\{e_n^2\} = \left( \frac{1}{\sum_{k=0}^{m} a_k^2} \right)^2 E\{(I_n + a_1 I_{n+1} + \cdots + a_m I_{n+m})^2\} = \frac{\sigma^2}{\sum_{k=0}^{m} a_k^2}. \tag{52}$$
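A quick empirical check (assuming NumPy and SciPy; the AR(3) model is illustrative): applying the two-sided interpolator (49) to simulated data, the measured MSE should be close to $\sigma^2/\sum_{k=0}^{m}a_k^2$ from (52).

```python
# Sketch (NumPy/SciPy assumed, illustrative AR(3)): apply the two-sided interpolator (49)
# to simulated data and compare the empirical MSE with sigma^2 / sum_k a_k^2 from (52).
import numpy as np
from scipy.signal import lfilter

rng = np.random.default_rng(3)
a = np.array([0.8, 0.3, -0.1]); m = len(a); aa = np.r_[1.0, a]
sigma, N = 1.0, 400_000

I = sigma * rng.standard_normal(N)
s = lfilter([1.0], np.r_[1.0, a], I)

D = np.sum(aa ** 2)
c = [-sum(aa[p] * aa[p + k] for p in range(m - k + 1)) / D for k in range(1, m + 1)]

s_hat = np.zeros(N)
for k in range(1, m + 1):
    s_hat[m:N - m] += c[k - 1] * (s[m - k:N - m - k] + s[m + k:N - m + k])

mse = np.mean((s[m:N - m] - s_hat[m:N - m]) ** 2)
print(mse, sigma**2 / D)                     # the two values should be close
```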

7. Prediction with Finite Data

Assume that only the $m-1$ consecutive data $s_{n-1}$ to $s_{n-m+1}$ are available. We want to prove the following:

$$\hat{s}_n = E\{s_n \mid s_{n-k}, \; 1 \leq k \leq m-1\} = \frac{-1}{1 - a_m^2} \sum_{k=1}^{m-1} \left( a_k - a_m a_{m-k} \right) s_{n-k}. \tag{53}$$

The above estimate can be obtained as follows. Since $s_{n-m}$ is not available, we can estimate it from the data $s_{n-1}$ to $s_{n-m+1}$; the estimated value can then be used to predict $s_n$ via (16):

$$\hat{s}_n = E\{s_n \mid s_{n-k}, \; 1 \leq k \leq m-1\} = -\sum_{k=1}^{m-1} a_k s_{n-k} - a_m \hat{s}_{n-m} = -a_1 s_{n-1} - a_2 s_{n-2} - \cdots - a_{m-1} s_{n-m+1} - a_m \hat{s}_{n-m}. \tag{54}$$

On the other hand, $s_{n-m}$ can be backward predicted using (30) as

$$\hat{s}_{n-m} = E\{s_{n-m} \mid s_{n-k}, \; 1 \leq k \leq m-1\} = -a_1 s_{n-m+1} - a_2 s_{n-m+2} - \cdots - a_{m-1} s_{n-1} - a_m \hat{s}_n. \tag{55}$$

Now we have the two equations (54) and (55) with the two unknowns $\hat{s}_n$ and $\hat{s}_{n-m}$. Solving these equations, we get (53). The optimality of (53) can also be proved by observing that the estimation error is equal to

$$e_n = \frac{I_n - a_m \bar{I}_{n-m}}{1 - a_m^2}. \tag{56}$$

To derive the above equation, we have used (4) and (13). It is easily seen from (6) and (15) that $I_n$ and $\bar{I}_{n-m}$ are orthogonal to the data $s_{n-1}$ to $s_{n-m+1}$. This proves the optimality of (53). To calculate the MSE, we note that

$$E\{e_n^2\} = E\{e_n (s_n - \hat{s}_n)\} = E\{e_n s_n\}. \tag{57}$$

The last equality is justified because the error is orthogonal to the data and hence to the estimate, which is a linear combination of the data. Inserting (56) into (57), we get

$$E\{e_n s_n\} = \frac{1}{1 - a_m^2} E\{(I_n - a_m \bar{I}_{n-m}) s_n\} = \frac{1}{1 - a_m^2} \left( R_{sI}[0] - a_m R_{\bar{I}s}[-m] \right). \tag{58}$$

Finally, using (58), (32), and (36), we have

$$E\{e_n^2\} = \frac{\sigma^2}{1 - a_m^2}. \tag{59}$$
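A short simulation check of this result (assuming NumPy and SciPy; the AR(3) coefficients are illustrative): predicting $s_n$ from only $m-1$ past samples with the weights of (53), the empirical MSE should be close to $\sigma^2/(1-a_m^2)$ from (59).

```python
# Sketch (NumPy/SciPy assumed, illustrative AR(3)): predict s_n from only m-1 past samples
# using (53) and compare the empirical MSE with sigma^2 / (1 - a_m^2) from (59).
import numpy as np
from scipy.signal import lfilter

rng = np.random.default_rng(4)
a = np.array([0.8, 0.3, -0.1]); m = len(a)
sigma, N = 1.0, 400_000

I = sigma * rng.standard_normal(N)
s = lfilter([1.0], np.r_[1.0, a], I)

am = a[m - 1]
c = [-(a[k - 1] - am * a[m - k - 1]) / (1 - am**2) for k in range(1, m)]   # weight of s_{n-k}

s_hat = np.zeros(N)
for k in range(1, m):
    s_hat[m:] += c[k - 1] * s[m - k:N - k]

mse = np.mean((s[m:] - s_hat[m:]) ** 2)
print(mse, sigma**2 / (1 - am**2))           # the two values should be close
```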

Higher-order predictions with $m-1$ data can be obtained from (53). As an example, we have

$$\hat{s}_n = E\{s_n \mid s_{n-k}, \; 2 \leq k \leq m\} = -a_1 \hat{s}_{n-1} - \sum_{k=2}^{m} a_k s_{n-k}, \tag{60}$$

where $\hat{s}_{n-1}$ is derived by replacing $n$ by $n-1$ in (53).

We could not derive a simple general form for the estimation with fewer than $m-1$ data.

8. Interpolation with Finite Data

We now derive the linear interpolation with fewer than $m$ data on each side. More precisely, we claim that

$$\hat{s}_n = E\{s_n \mid s_{n+k}, \; -k_2 \leq k \leq k_1, \; k \neq 0\} = \frac{-1}{\sum_{k=0}^{k_1} a_k^2 - \sum_{k=k_2+1}^{m} a_k^2} \times \left[ \sum_{k=m-k_2}^{k_1} \left( \sum_{p=0}^{k_1-k} a_p a_{p+k} \right) s_{n+k} + \sum_{k=1}^{m-k_2-1} \left( \sum_{p=0}^{k_1-k} a_p a_{p+k} - \sum_{p=k_2+1}^{m-k} a_p a_{p+k} \right) s_{n+k} + \sum_{k=1}^{m-k_1-1} \left( \sum_{p=0}^{k_1} a_p a_{p+k} - \sum_{p=k_2-k+1}^{m-k} a_p a_{p+k} \right) s_{n-k} + \sum_{k=m-k_1}^{k_2} \left( \sum_{p=0}^{k_2-k} a_p a_{p+k} \right) s_{n-k} \right]. \tag{61}$$

In (61) we must have $k_1 + k_2 \geq m-1$ and $k_1 \leq k_2 \leq m-1$; that is, the distance between $s_n$ and the farthest data on the right side is not greater than the distance between $s_n$ and the farthest data on the left side. The optimality of (61) can be seen from the fact that, by (61), (4), and (13), the estimation error is

$$e_n = \frac{I_n + a_1 I_{n+1} + \cdots + a_{k_1} I_{n+k_1} - a_m \bar{I}_{n-m} - a_{m-1} \bar{I}_{n-m+1} - \cdots - a_{k_2+1} \bar{I}_{n-k_2-1}}{\sum_{k=0}^{k_1} a_k^2 - \sum_{k=k_2+1}^{m} a_k^2}. \tag{62}$$

It remains to prove that (62) is orthogonal to the data.

(1) It is clear from (6) and (15) that $I_n$ to $I_{n+k_1}$ and $\bar{I}_{n-m}$ to $\bar{I}_{n-k_2-1}$ are orthogonal to the data $s_{n-1}$ to $s_{n-k_2}$. Therefore, the error in (62) is orthogonal to $s_{n-k}$ for $1 \leq k \leq k_2$.

(2) Further, from (43), and noting that $\bar{I}_{n-m}$ to $\bar{I}_{n-k_2-1}$ are orthogonal to the data $s_{n+1}$ to $s_{n+k_1}$ according to (15), we see that the error in (62) is orthogonal to $s_{n+k}$ for $1 \leq k \leq k_1$.

Therefore the error is orthogonal to the data and the proof is completed.

From (32), (36), and (62), the MSE is

$$E\{e_n^2\} = E\{e_n s_n\} = \frac{R_{sI}[0] + a_1 R_{sI}[-1] + \cdots + a_{k_1} R_{sI}[-k_1] - a_m R_{\bar{I}s}[-m] - a_{m-1} R_{\bar{I}s}[-m+1] - \cdots - a_{k_2+1} R_{\bar{I}s}[-k_2-1]}{\sum_{k=0}^{k_1} a_k^2 - \sum_{k=k_2+1}^{m} a_k^2} = \frac{\sigma^2}{\sum_{k=0}^{k_1} a_k^2 - \sum_{k=k_2+1}^{m} a_k^2}. \tag{63}$$

For the case $k_1 = k_2$ with $2k_1 \geq m-1$ and $k_1 \leq m-1$, we can replace $k_2$ by $k_1$ in (61) to obtain

$$\hat{s}_n = E\{s_n \mid s_{n+k}, \; -k_1 \leq k \leq k_1, \; k \neq 0\} = \frac{-1}{\sum_{k=0}^{k_1} a_k^2 - \sum_{k=k_1+1}^{m} a_k^2} \left[ \sum_{k=m-k_1}^{k_1} \left( \sum_{p=0}^{k_1-k} a_p a_{p+k} \right) \left( s_{n-k} + s_{n+k} \right) + \sum_{k=1}^{m-k_1-1} \left( \sum_{p=0}^{k_1-k} a_p a_{p+k} - \sum_{p=k_1+1}^{m-k} a_p a_{p+k} \right) \left( s_{n-k} + s_{n+k} \right) \right]. \tag{64}$$

As expected, we see that the data at the same distance from $s_n$ participate with the same weight.

Now, consider the case where the distance between $s_n$ and the farthest data on the right side is greater than the distance between $s_n$ and the farthest data on the left side. It can be handled by the symmetry of the problem: if we replace $s_{n-k}$ by $s_{n+k}$ and vice versa in (61), we get

$$\hat{s}_n = E\{s_n \mid s_{n-k}, \; -k_2 \leq k \leq k_1, \; k \neq 0\} = \frac{-1}{\sum_{k=0}^{k_1} a_k^2 - \sum_{k=k_2+1}^{m} a_k^2} \times \left[ \sum_{k=m-k_2}^{k_1} \left( \sum_{p=0}^{k_1-k} a_p a_{p+k} \right) s_{n-k} + \sum_{k=1}^{m-k_2-1} \left( \sum_{p=0}^{k_1-k} a_p a_{p+k} - \sum_{p=k_2+1}^{m-k} a_p a_{p+k} \right) s_{n-k} + \sum_{k=1}^{m-k_1-1} \left( \sum_{p=0}^{k_1} a_p a_{p+k} - \sum_{p=k_2-k+1}^{m-k} a_p a_{p+k} \right) s_{n+k} + \sum_{k=m-k_1}^{k_2} \left( \sum_{p=0}^{k_2-k} a_p a_{p+k} \right) s_{n+k} \right]. \tag{65}$$

Again, in (65) we must have $k_1 \leq k_2 \leq m-1$ and $k_1 + k_2 \geq m-1$. The estimation error in this case is

$$e_n = \frac{\bar{I}_n + a_1 \bar{I}_{n-1} + \cdots + a_{k_1} \bar{I}_{n-k_1} - a_m I_{n+m} - a_{m-1} I_{n+m-1} - \cdots - a_{k_2+1} I_{n+k_2+1}}{\sum_{k=0}^{k_1} a_k^2 - \sum_{k=k_2+1}^{m} a_k^2}. \tag{66}$$

The MSE is the same as in (63). We could not find a simple general form for the case $k_1 + k_2 < m-1$.
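As a numerical cross-check of this section (assuming NumPy; the AR(3) model and the choice $k_1 = 1$, $k_2 = 2$ are illustrative), the sketch below builds the interpolation coefficients implied by the error expression (62), which are exactly those of the closed form (61), and compares them with a direct solution of the orthogonality equations based on the exact autocorrelation.

```python
# Sketch (NumPy assumed, illustrative AR(3)): compare the coefficients implied by (62)
# (equivalently, the closed form (61)) with a direct solution of the orthogonality equations.
import numpy as np

def ar_autocorr(a, sigma2, nlags):
    """R_s[0..nlags] of the AR model s_n + a_1 s_{n-1} + ... + a_m s_{n-m} = I_n."""
    m = len(a)
    aa = np.r_[1.0, a]
    M = np.zeros((m + 1, m + 1))
    for k in range(m + 1):
        for j in range(m + 1):
            M[k, abs(k - j)] += aa[j]
    r = list(np.linalg.solve(M, np.r_[sigma2, np.zeros(m)]))
    for k in range(m + 1, nlags + 1):
        r.append(-sum(aa[j] * r[k - j] for j in range(1, m + 1)))
    return np.array(r)

a = np.array([0.8, 0.3, -0.1]); m = len(a); aa = np.r_[1.0, a]
k1, k2 = 1, 2                                    # must satisfy k1 <= k2 <= m-1, k1 + k2 >= m-1

def numer(j):
    """Coefficient of s_{n+j} in sum_{q=0}^{k1} a_q I_{n+q} - sum_{q=k2+1}^{m} a_q Ibar_{n-q}."""
    t1 = sum(aa[q] * aa[q - j] for q in range(max(0, j), min(k1, m + j) + 1))
    t2 = sum(aa[q] * aa[q + j] for q in range(max(k2 + 1, -j), min(m, m - j) + 1))
    return t1 - t2

lags = [j for j in range(-k2, k1 + 1) if j != 0]
closed = np.array([-numer(j) / numer(0) for j in lags])   # weights of the data s_{n+j}

r = ar_autocorr(a, 1.0, k1 + k2)
R = lambda k: r[abs(k)]
G = np.array([[R(i - l) for l in lags] for i in lags])
direct = np.linalg.solve(G, np.array([R(i) for i in lags]))

print(np.max(np.abs(closed - direct)))           # ~1e-15; the MSE is sigma^2/numer(0), eq. (63)
```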

9. A Detailed Example

In this section, we work through a detailed example. The optimal linear estimation of the following process is desired:

$$s_n + 0.8 s_{n-1} + 0.3 s_{n-2} - 0.1 s_{n-3} = I_n. \tag{67}$$

Here $I_n$ is the innovation noise with unit variance ($\sigma = 1$), and we have $a_1 = 0.8$, $a_2 = 0.3$, and $a_3 = -0.1$. The process is the response of the following third-order ($m = 3$) all-pole filter to the innovation noise:

$$H(z) = \frac{1}{1 + 0.8 z^{-1} + 0.3 z^{-2} - 0.1 z^{-3}}. \tag{68}$$

The poles of this system are $p_1 = 0.2$ and $p_{2,3} = -0.5 \pm j0.5$. Taking the inverse $Z$-transform of $S_s(z) = H(z) H(z^{-1})$, we get the following autocorrelation function:

$$R_s[k] = r_k = E\{s_n s_{n-k}\} = \frac{625}{13542} \, 5^{-|k|} + \frac{40}{2257} \, 2^{-|k|/2} \left( 103 \cos\frac{3\pi |k|}{4} - 26 \sin\frac{3\pi |k|}{4} \right). \tag{69}$$

From (69), we have $r_0 = 1.8716$, $r_1 = -1.1339$, $r_2 = 0.2322$, $r_3 = 0.3415$, $r_4 = -0.4563$, $r_5 = 0.2858$, and $r_6 = -0.0576$. Now, we consider different cases.

9.1. Prediction with Finite Data

We want to derive the following optimal linear prediction:

$$\hat{s}_n = E\{s_n \mid s_{n-1}, s_{n-2}\} = A_1 s_{n-1} + A_2 s_{n-2}. \tag{70}$$

Using (53), we have

$$\hat{s}_n = \frac{-1}{1 - 0.01} \left[ (0.8 + 0.1 \times 0.3) s_{n-1} + (0.3 + 0.1 \times 0.8) s_{n-2} \right] = -0.8384 s_{n-1} - 0.3838 s_{n-2}. \tag{71}$$

If we want to verify the solution using the orthogonality equations, we have

$$E\{(s_n - A_1 s_{n-1} - A_2 s_{n-2}) s_{n-k}\} = 0, \quad k = 1, 2. \tag{72}$$

Expanding (72), we get

$$r_0 A_1 + r_1 A_2 = r_1, \qquad r_1 A_1 + r_0 A_2 = r_2, \tag{73}$$

where the $r_k$'s come from (69). Substituting the $r_k$'s from (69) into (73), we get

$$1.8716 A_1 - 1.1339 A_2 = -1.1339, \qquad -1.1339 A_1 + 1.8716 A_2 = 0.2322. \tag{74}$$

Solving (74), we get the same result as (71).
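The sketch below (assuming NumPy) reproduces this subsection: it computes the exact autocorrelation values of the model (67) from the Yule-Walker relations, solves the system (74), and compares the result with the closed form (53) used in (71).

```python
# Sketch (NumPy assumed): reproduce Section 9.1 for the model (67).
import numpy as np

a = np.array([0.8, 0.3, -0.1]); m = len(a); aa = np.r_[1.0, a]

# exact autocorrelation r_0..r_6 via the Yule-Walker relations
M = np.zeros((m + 1, m + 1))
for k in range(m + 1):
    for j in range(m + 1):
        M[k, abs(k - j)] += aa[j]
r = list(np.linalg.solve(M, np.r_[1.0, np.zeros(m)]))
for k in range(m + 1, 7):
    r.append(-sum(aa[j] * r[k - j] for j in range(1, m + 1)))
print(np.round(r, 4))          # [1.8716, -1.1339, 0.2322, 0.3415, -0.4563, 0.2858, -0.0576]

# normal equations (73)-(74) for s_hat_n = A_1 s_{n-1} + A_2 s_{n-2}
G = np.array([[r[0], r[1]],
              [r[1], r[0]]])
print(np.linalg.solve(G, np.array([r[1], r[2]])))    # [-0.8384, -0.3838], as in (71)

# closed form (53): A_k = -(a_k - a_3 a_{3-k}) / (1 - a_3^2)
print([-(a[0] - a[2] * a[1]) / (1 - a[2]**2),
       -(a[1] - a[2] * a[0]) / (1 - a[2]**2)])
```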

9.2. Interpolation with Finite Data

Consider the following problem:

$$\hat{s}_n = E\{s_n \mid s_{n-1}, s_{n+1}\} = A_1 s_{n-1} + A_1 s_{n+1}. \tag{75}$$

It is the symmetric case $k_1 = k_2 = 1$, and we have $2k_1 = 2 = m-1$. Using (64), we have

$$\hat{s}_n = \frac{-\left[ 1 \times 0.8 - 0.3 \times (-0.1) \right]}{1 + 0.64 - 0.09 - 0.01} \left( s_{n-1} + s_{n+1} \right) = -0.5390 \left( s_{n-1} + s_{n+1} \right). \tag{76}$$

Let us rederive the solution of (75) using the orthogonality conditions. We have

$$E\{(s_n - A_1 s_{n-1} - A_1 s_{n+1}) s_{n-k}\} = 0, \quad k = -1, 1. \tag{77}$$

Expanding (77), we get

$$r_0 A_1 + r_2 A_1 = r_1, \qquad r_2 A_1 + r_0 A_1 = r_1. \tag{78}$$

Solving (78), we get the same answer as (76).

Now, consider the following nonsymmetric problem:

$$\hat{s}_n = E\{s_n \mid s_{n-2}, s_{n-1}, s_{n+1}\} = A_{-1} s_{n+1} + A_1 s_{n-1} + A_2 s_{n-2}, \tag{79}$$

which is the case $k_1 = 1 < k_2 = 2 \leq m-1$ with $k_1 + k_2 \geq m-1$. From (61), we get

$$\hat{s}_n = \frac{-1}{1 + 0.64 - 0.01} \left[ 1 \times 0.8 \, s_{n+1} + \left( 1 \times 0.8 + 0.8 \times 0.3 - 0.3 \times (-0.1) \right) s_{n-1} + 1 \times 0.3 \, s_{n-2} \right] = -0.4908 s_{n+1} - 0.6564 s_{n-1} - 0.1840 s_{n-2}. \tag{80}$$

Now, we want to obtain the solution of (79) using the matrix equations, and we expect the same answer as (80). The orthogonality condition is

$$E\{(s_n - A_{-1} s_{n+1} - A_1 s_{n-1} - A_2 s_{n-2}) s_{n-k}\} = 0, \quad k = -1, 1, 2. \tag{81}$$

It follows that

$$r_0 A_{-1} + r_2 A_1 + r_3 A_2 = r_1, \qquad r_2 A_{-1} + r_0 A_1 + r_1 A_2 = r_1, \qquad r_3 A_{-1} + r_1 A_1 + r_0 A_2 = r_2. \tag{82}$$

The result of (82) is the same as (80).

9.3. Interpolation with Infinite Data on the Left Side

We want to obtain the following estimate:

$$\hat{s}_n = E\{s_n \mid s_{n+i}, \; i \leq 1, \; i \neq 0\} = A_{-1} s_{n+1} + A_1 s_{n-1} + A_2 s_{n-2} + A_3 s_{n-3}. \tag{83}$$

We can do so by letting $k_1 = 1$ in (40). It follows that

$$\hat{s}_n = \frac{-1}{1 + 0.64} \left[ 1 \times 0.8 \, s_{n+1} + \left( 1 \times 0.8 + 0.8 \times 0.3 \right) s_{n-1} + \left( 1 \times 0.3 + 0.8 \times (-0.1) \right) s_{n-2} + 1 \times (-0.1) s_{n-3} \right] = -0.4878 s_{n+1} - 0.6341 s_{n-1} - 0.1341 s_{n-2} + 0.0610 s_{n-3}. \tag{84}$$

Now we verify (84) using the orthogonality conditions:

$$E\{(s_n - A_{-1} s_{n+1} - A_1 s_{n-1} - A_2 s_{n-2} - A_3 s_{n-3}) s_{n-k}\} = 0, \quad k = -1, 1, 2, 3. \tag{85}$$

The following set of equations is obtained:

$$r_0 A_{-1} + r_2 A_1 + r_3 A_2 + r_4 A_3 = r_1, \qquad r_2 A_{-1} + r_0 A_1 + r_1 A_2 + r_2 A_3 = r_1, \qquad r_3 A_{-1} + r_1 A_1 + r_0 A_2 + r_1 A_3 = r_2, \qquad r_4 A_{-1} + r_2 A_1 + r_1 A_2 + r_0 A_3 = r_3. \tag{86}$$

Note that the coefficient matrix of (86) is not Toeplitz. The result of (86) is the same as (84).
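The sketch below (assuming NumPy) reproduces this subsection: it builds and solves the non-Toeplitz system (86) with the exact autocorrelation values and compares the solution with the closed form (84).

```python
# Sketch (NumPy assumed): reproduce Section 9.3, solving the non-Toeplitz system (86)
# and comparing with the closed form (84), obtained from (40) with k1 = 1.
import numpy as np

a = np.array([0.8, 0.3, -0.1]); m = len(a); aa = np.r_[1.0, a]

# exact autocorrelation r_0..r_4 via the Yule-Walker relations (as in Section 9.1)
M = np.zeros((m + 1, m + 1))
for k in range(m + 1):
    for j in range(m + 1):
        M[k, abs(k - j)] += aa[j]
r = list(np.linalg.solve(M, np.r_[1.0, np.zeros(m)]))
r.append(-sum(aa[j] * r[4 - j] for j in range(1, m + 1)))          # r_4

# system (86); the unknown vector is [A_{-1}, A_1, A_2, A_3]
G = np.array([[r[0], r[2], r[3], r[4]],
              [r[2], r[0], r[1], r[2]],
              [r[3], r[1], r[0], r[1]],
              [r[4], r[2], r[1], r[0]]])
rhs = np.array([r[1], r[1], r[2], r[3]])
print(np.linalg.solve(G, rhs))            # [-0.4878, -0.6341, -0.1341, 0.0610]

# closed form (84): -1/(1 + a_1^2) * [a_1, a_1 + a_1*a_2, a_2 + a_1*a_3, a_3]
D = 1 + a[0]**2
print([-a[0] / D, -(a[0] + a[0] * a[1]) / D, -(a[1] + a[0] * a[2]) / D, -a[2] / D])
```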

10. Conclusion

We introduced an anticausal LTI model in addition to the well-known causal LTI model for AR processes. Using these models and the associated innovation noises, we derived the optimal linear interpolations for different cases. Specifically, we obtained the formulae when there are infinite data on the right or left side of the variable to be estimated, as well as the linear prediction and interpolation with finite data. The number of available data must be at least the order of the process minus one; we could not find a simple general form when fewer data are available. The innovation noises and the orthogonality principle are essential ingredients of the proofs of our solutions.