Abstract

In most missing samples problems, the signals are assumed to be bandlimited. That is, the signals are assumed to be sparsely approximated by a known subset of the discrete Fourier transform basis vectors. We discuss the recovery of missing samples when the signals can be sparsely approximated by an unknown subset of certain unitary basis vectors. We propose the use of the orthogonal matching pursuit to recover missing samples by sparse approximations.

1. Introduction

Discrete signals are usually represented by their samples taken on a uniform sampling grid. However, in many applications, it may happen that some samples are lost or unavailable. In such cases, it is required to convert the irregularly sampled signal to a regularly sampled one, that is, to restore the missing samples. One approach for the recovery of missing data in discrete signals is based on the assumption that the underlying continuous-time signals are bandlimited. The celebrated sampling theorem by Whittaker [1], Kotel'nikov [2], and Shannon [3] implies that any continuous-time bandlimited signal can be reconstructed by its regularly spaced samples if the sampling frequency is higher than two times the maximum frequency component of the signal. The solution of the nonuniform sampling problem poses more difficulties and there exists a vast literature dealing with necessary and sufficient conditions for unique reconstruction and methods for reconstructing a function from its samples, for example, [46]. The numerical reconstruction methods, however, have to operate in a finite-dimensional model, whereas the theoretical results are usually derived for the continuous-time bandlimited functions (an infinite-dimensional subspace). The use of a truncated sampling series results with a finite-dimensional model but may lead, however, to severely ill-posed numerical problems [7].

Another approach is to address this problem in a finite-dimensional model of discrete bandlimited signals. A discrete bandlimited (DBL) signal has a sparse representation in terms of a certain unitary basis vector (e.g., discrete Fourier transform). That is, the signal can be represented by only a known subset of the unitary basis vectors. The recovery of missing samples is, in this case, equivalent to solving a linear system of equations [8]. In this paper, we focus on the recovery problem, when the a priori knowledge is that the signal is sparsely represented by an unknown subset of certain unitary basis vectors. This problem is much harder to solve, and there is an infinite number of possible solutions. Our approach is to choose the sparsest solution. That is, we are interested in approximating the known samples with a minimum number of basis vectors. Standard methods for solving linear systems of equations cannot provide the sparsest solution. We suggest the use of the orthogonal matching pursuit algorithm [911] for determining the sparsest approximation to the given samples from which the unknown samples can be determined.

2. Recovery of Missing Samples of Discrete Bandlimited Signals

Let be a discrete signal with samples. That is, can be described by an -dimensional real vector whose elements are denoted by . These elements correspond to the samples of the signal. Let be an orthonormal transform matrix; that is, , where is the identity matrix and is the transpose of . The orthonormal matrix defines an orthonormal transform of the vector , denoted by , which is by definition the signal . The inverse transform of is by definition the signal . In the complex case, the vectors belong to the space (space of -dimensional complex vectors), and the conjugate transpose of the unitary matrix has to be taken when computing the inverse transform.

The columns of define an orthonormal basis of . That is, each signal can be represented by a linear combination of the columns of , where the coefficients in this linear combination are given by the transform coefficients . Let () be the rows of (i.e., the columns of the matrix ). It follows that for each

Let be a -size proper subset of and assume that only the samples , , are known. Consequently, the available signal samples define systems of equations for the transform coefficients: The recovery of missing samples from an arbitrary signal is of course impossible: there is an infinite number of solutions to the underdetermined system of equations (2). The signals that occur in applications, however, often satisfy known constraints. The constraint used in this paper is called discrete band-limitedness: the signal can be represented by only a known subset of the columns of . That is, the signal is a linear combination of certain subset of certain orthonormal basis vectors of .

Let be a -size proper subset of , where . A discrete bandlimited signal approximation to the signal is obtained by That is, we assume that all transform coefficients with indices are equal to zero.

For example, let be the discrete Fourier transform (DFT) matrix. That is, is the unitary matrix with elements ( is the imaginary unit) A signal is bandlimited if its DFT vanishes on some fixed nonempty proper subset of . The complement of is the subset . When the subset is the set of elements (i.e., ) or equivalently (by periodicity) the signal is called a low-pass signal in DFT domain.

Another example is the discrete cosine transform. Let be the discrete cosine transform (DCT) matrix. That is, is the orthonormal matrix with elements A signal is bandlimited if its DCT vanishes on some fixed nonempty proper subset of . When the complement of (i.e., subset ) is the set of elements the signal is called a low-pass signal in DCT domain.

Assuming that is a discrete bandlimited signal (i.e., , ), then, the available signal samples, , , can now be expressed as Equation (9) is a linear system consisting of equations with unknowns. The existence and uniqueness of the solution depend, for any transform, on the subsets and (see, e.g., [8] for a discussion on the recovery of samples when the signals are bandlimited in the discrete Fourier transform domain). When the system of equations (9) has a full column rank and , we can determine the unique exact solution. When the system has a full column rank and (overdetermined system of equations), we will be interested in the least squares solution.

3. Sparse Approximations

In the previous section, we discussed the recovery of missing samples, when the band-limitedness was known a priori. That is, we had the a priori knowledge of which transform coefficients are equal to zero. The recovery of the missing samples is, in this case, equivalent to solving a linear system of equations for which many algorithms can be used. If we know that the signal is a low-pass bandlimited signal (e.g., in DFT or DCT domain), but the bandwidth is not given, we may start with a low bandwidth model and solve the resulting linear system of equations for increasing bandwidths until a sufficient accuracy in approximating the given samples is obtained. If the only a priori knowledge is that the signal is sparse in terms of the basis vectors, that is, the signal can be represented (or accurately approximated) with a few unknown basis vectors, the situation is very different. In this case, we have also to determine the basis vectors that sparsely approximate the signal.

We have to solve the underdetermined system of equations (2): where is the -size subset of indices of known samples. The underdetermined system of equations (2) has an infinite number of solutions. Since we know that the signal is sparsely approximated by the basis vectors, one reasonable approach is to choose the sparsest solution among the infinite number of solutions. That is, the optimal approximation is defined as either the sparsest approximation (i.e,. with the fewest basis vectors) that yields an approximation error that is smaller than a prespecified threshold or the approximation using a fixed number of basis vectors that minimizes the approximation error. Finding these approximations is an NP hard problem [12, 13].

We will use the following notations. Let be a signal of samples that contains the given samples; that is, , , . Let , , be the columns of the matrix . For each , let be a unit-norm vector of samples, where , , , and the norm is the standard norm in : . The set of unit-norm vectors will be called a dictionary. The problem is to determine a sparse approximation to the vector with the dictionary vectors , .

After determining a sparse solution using dictionary vectors: where the original signal will approximated by from which the missing samples can be determined.

Several algorithms have been proposed for reducing the computational complexity by searching for efficient but nonoptimal approximations. The matching pursuit (MP) [10, 11] is a popular iterative greedy algorithm for approximate decomposition that addresses the sparsity issue directly. Vectors are selected one by one from the dictionary, picking at each iteration the vector that best correlates with the present residual, and thus optimizing the signal approximation (in terms of energy) at each iteration. An intrinsic feature of this algorithm is that when stopped after a few steps, it yields an approximation using only a few dictionary vectors.

We consider the space of real-valued signals of size . Let , , be a dictionary of vectors, having a unit norm. Let be the residual of an term approximation of a given signal . MP subdecomposes the residue by projecting it on the dictionary vector that matches best. Starting from an initial approximation and residue , the MP algorithm builds up a sequence of sparse approximations stepwise. MP begins by projecting on a vector and computing the residue : where is the residual vector after best approximating ( norm sense) with . Since is orthogonal to , It follows that to minimize , we must choose such that is maximum. The process is iterated on the residual. Suppose that for is already computed. The next iteration chooses such that For unit-norm vectors, the last condition is equivalent to That is, the optimal vector is the one which best correlates with the residual. The MP projects on the chosen vector: The orthogonality of and implies Summing (18) from between 0 and (for any integer ) yields Similarly, summation of (19) from between 0 and gives When dealing with finite-dimension signals (as in our case), converges exponentially to 0 when tends to infinity [10].

The approximations of MP are improved by orthogonalizing the directions of projection with a Gram-Schmidt procedure [9]. The resulting orthogonal MP (OMP) converges with a finite number of steps, which is not the case for a standard MP. The vector selected by the MP is a priori not orthogonal to the previously selected vectors , . After subtracting the projection of over , new components are introduced in the direction of , . This is avoided by projecting the residues on an orthonormal set of vectors , , that span the same subspace that is spanned by , . That is, in each iteration, we determine the best approximation of the residual in the subspace spanned by , . When an approximation of sufficient accuracy is obtained, to expand over the original dictionary vectors , , we perform a change of basis by expanding in , .

We demonstrate the applicability of the approach by example. The original 128 samples length signal is the linear combination of two basis vectors of the DCT with randomly chosen coefficients: . The given signal is a noisy version of the original signal with additive white Gaussian noise with standard deviation equal to 0.05. We assume that 30 contiguous samples (9 to 38) of the noisy signal are missing. The results of the OMP are depicted in Figure 1, and we can see that the missing samples were approximately restored. The norm of the missing samples is 0.4339, while the norm of the error is 0.0315. If the samples of the original exact DBL signal are given, the missing 30 samples are completely recovered by the OMP procedure.

4. Conclusions

In this paper, we addressed the recovery of missing samples of discrete bandlimited signals. We have suggested the use of the orthogonal matching pursuit to recover missing samples by sparse approximations, when the signal can be sparsely represented by an unknown subset of basis vectors of a certain orthonormal transform.