Abstract

Transform image processing methods are methods that work in domains of image transforms, such as Discrete Fourier, Discrete Cosine, Wavelet, and alike. They proved to be very efficient in image compression, in image restoration, in image resampling, and in geometrical transformations and can be traced back to early 1970s. The paper reviews these methods, with emphasis on their comparison and relationships, from the very first steps of transform image compression methods to adaptive and local adaptive filters for image restoration and up to “compressive sensing” methods that gained popularity in last few years. References are made to both first publications of the corresponding results and more recent and more easily available ones. The review has a tutorial character and purpose.

1. Introduction: Why Transforms? Which Transforms?

It will not be an exaggeration to assert that digital image processing came into being with introduction, in 1965 by Cooley and Tukey, of the Fast Fourier Transform algorithm (FFT, [1]) for computing the Discrete Fourier Transform (DFT). This publication immediately resulted in impetuous growth of activity in all branches of digital signal and image processing and their applications.

The second wave in this process was inspired by the introduction into communication engineering and digital image processing, in the 1970s, of Walsh-Hadamard transform and Haar transform [2] and the development of a large family of fast transforms with FFT-type algorithms [35]. Whereas Walsh-Hadamard and Haar transforms have already been known in mathematics, other transforms, for instance, quite popular at the time Slant Transform [6], were being invented “from scratch.” This development was mainly driven by the needs of data compression, though the usefulness of transform domain processing for image restoration and enhancement was also recognized very soon [3]. This period ended up with the introduction of the Discrete Cosine Transform (DCT, [7, 8]), which was soon widely recognized as the best choice among all available at the time transforms and resulted in JPEG and MPEG standards for image, audio, and video compression.

The third large wave of activities in transforms for signal and image processing was caused by the introduction, in the 1980s, of a family of transforms that was coined the name “wavelet transform” [9]. The main motivation was achieving a better local representation of signals and images in contrast to the “global” representation that is characteristic to Discrete Fourier, DCT, Walsh-Hadamard, and other fast transforms available at the time. During 1980s–1990s a large variety of discrete wavelet transforms were suggested [10] for solving various tasks in signal and in image processing.

Presently, fast transforms with FFT-type fast algorithms and wavelet transforms constitute the basic instrumentation tool of digital image processing.

The main distinctive feature of transforms that makes them so efficient in digital image processing is their energy compaction capability. In regular image representations, in form of sets of ordered pixels, some pixels, for instance, those that belong to object borders, are more important than the others and there are always some pixels in each particular image that are of no such importance and can be dropped out from image representation and restored from the remaining “important” pixels. But the problem is that one never knows in advance which pixels in the image are “important” and which are not.

The situation is totally different in image representation in transform domain. For orthogonal transforms that feature good energy compaction capability, a lion share of total image “energy” (sum of squared transform coefficients) is concentrated in a small fraction of transform coefficients, which indices are, as a rule, known in advance for the given type of images or can be easily detected. It is this feature of transforms that is called their energy compaction capability. It allows replacing images with their “band-limited,” in terms of a specific transform, approximations, that is, approximations defined by a sufficiently small fraction of image transform coefficients.

The optimal transform with the best energy compaction capability can be defined for groups of images of the same type or ensembles of images that are subjects of processing. Customarily, the “band-limited” image approximation accuracy is evaluated in terms of the root mean square approximation error (RMSE) for the image ensemble. By virtue of this, the optimal transform that secures the least band limited approximation error is defined by the ensemble correlation function, that is, image autocorrelation function averaged over the image ensemble. For continuous (not sampled) signals, this transform is called the Karhunen-Loeve Transform [11, 12]. Its discrete analog for sampled signals is called the Hotelling transform [13] and the result of applying this transform to sampled signals is called signal principal component decomposition. Karhunen-Loeve and Hotelling transforms provide the theoretical lower bound for compact (in terms of the number of terms in signal decomposition) signal discrete representation for RMS criterion of signal approximation.

However, being optimal in terms of the energy compaction capability, Karhunen-Loeve and Hotelling transforms have, generally, high computational complexity: the per pixel number of operations required for their computation is of the order of image size. This is why for practical needs only fast transforms that feature fast transform algorithms with computational complexity of the order of the logarithm of image size are considered. A register of the most relevant fast transforms, their main characteristic features, and application areas is presented in Table 1. Their mathematical definitions are given in Table 2.

Different fast transforms have different energy compaction capability for different types of images. Figure 1 illustrates the energy compaction capabilities of Discrete Fourier, Discrete Cosine, Walsh, and Haar transforms on a particular test image. It shows that for this image DCT demonstrates the best energy compaction capability: 95% of the image energy is contained in only 9.5% of all DCT spectral coefficients, whereas for DFT, Walsh and Haar transforms these fractions are 11%, 15%, and 9.8%, correspondingly. Note also that, though for Haar Transform the fraction of the “meaningful” transform coefficients (9.8%) is close to that for DCT, quality degradations of the band-limited approximation to the image are noticeably more severe.

Experience shows that DCT is in most applications advantageous with respect to other transforms in terms of the energy compaction capability. This property of DCT has a simple and intuitive explanation.

DCT of a discrete signal is essentially, to the accuracy of an unimportant exponential shift factor, DFT for the same signal extended outside its border to a double length by means of its mirrored, in the order of samples, copy [15, 16]. This way of signal extension has a profound implication on the speed of decaying to zero of signal DCT spectra and is a paramount and fundamental reason of the good energy compaction capability of DCT. DFT implies periodical extension of signals, which may cause severe discontinuities at signal borders and thus may require intensive high frequency components to reproduce them. DCT also implies periodical extension, however extension of not the signal itself but of its above-described extended copy. Thanks to such an “even” extension, the extended signal has no discontinuities at its borders, as well as at borders of the initial signal. Therefore such an extended signal is “smooth” and its DFT spectrum decays to zero faster than DFT spectrum of the initial, not extended signal.

It is noteworthy to mention that introducing DCT in [7] was motivated not by the above reasoning but by the finding that basis functions of DCT provide a good approximation to the eigenvectors of the class of Toeplitz matrices with entries ; that is, DCT can be considered as a good approximation to the Karhunen-Loeve Transform for signals with the Toeplitz correlation matrix. In fact, the above-mentioned fundamental reason for good energy compaction capability of DCT is hidden in the even symmetry of the Toeplitz matrix.

In the family of orthogonal transforms, DFT and DCT occupy a special place. DFT and DCT are two discrete representations of the Integral Fourier Transform, which plays a fundamental role in signal and image processing as a mathematical model of signal transformations and of wave propagation in imaging systems [17, 18]. Thanks to this, DFT and DCT are applicable in much wider range of applications than the other fast transforms (see Table 1).

In conclusion of this section, come back to the common feature of fast transform, to the availability for all of these transform of fast algorithms of FFT type. Initially, fast transforms were developed only for signals with the number of samples equal to an integer power of two. These are the fastest algorithms. At present, this limitation is overcome, and fast transform algorithms exist for arbitrary number of signal samples. Note that 2D and 3D transforms for image and video applications are built as separable; that is, they work separably in each dimension. Note that the transform separability is the core idea for fast transform algorithms. All fast transform algorithms are based on representation of signal and transform spectrum sample indices as multidimensional, that is represented by multiple digits, numbers.

The data on computational complexity of the transform fast algorithms are collected in Table 2. These data are more or less commonly known. Not as widely known is the existence of so-called pruned algorithms for the cases, when transform input data contain substantial fraction of zero samples and/or one does not need to compute all transform coefficients [1524]. These algorithms are useful, for instance, in numerical reconstruction of holograms and in interpolation of sampled data by means of transform zero padding [17, 25, 26].

In some applications it is advisable to apply transforms locally in running window rather than globally to entire image frames. For such application, Discrete Cosine and Discrete Fourier Transforms have an additional very useful feature. Computing these transforms in running window can be carried out recursively: signal transform coefficients at each window position can be found by quite simple modification of the coefficients found at the previous window position with per pixel computational complexity proportional to the window size rather than to the product of the window size and its logarithm, which is the computational complexity of the fast transforms [17, 2628].

The rest of the paper is arranged as follows. In Section 2, applications of fast transforms for image data compression are reviewed. In Section 3, we proceed to transform methods for image restoration and enhancement. In Section 4, we show how DFT and DCT can be applied for image perfect, that is, error less, resampling. And finally in Section 5, we address the usage of fast transforms for solving a specific task of image resampling, the task of image recovery from sparse and irregularly taken samples, and, in particular, compressive sensing approach to this problem.

2. Image Compression

2.1. Dilemma: Compressive Discretization or Compression

Ideally, image digitization, that is, converting continuous signals from image sensors into digital signals, should be carried out in such a way as to provide as compact image digital representation as possible provided that quality of image reproduction satisfies given requirements. Due to technical limitation, image digitization is most frequently performed in two steps: discretization, that is, converting image sensor signal into a set of real numbers, and scalar quantization, that is rounding off, of those numbers to a set of fixed quantized values [15, 18].

Image discretization can, in general, be treated as obtaining coefficients of image signal expansion over a set of discretization basis functions. In order to make the set of these representation coefficients as compact as possible, one should choose discretization basis functions that secure the least number of the signal expansion coefficients sufficient for image reconstruction with a given required quality. One can call this general method of signal discretization “Compressive discretization” because it secures the most compact discrete representation of signals, which cannot be further compressed. Note that this term should not be confused with terms “Compressive sensing” and “Compressive sampling,” that gained large popularity in recent years [2936]. We will discuss the “compressive sensing” approach later in Section 5.

There are at least two examples of practical implementation of the principle of general discretization by means of measuring transform domain image coefficients: Coded Aperture Imaging and Magnetic Resonance Imaging (MRI). In coded aperture imaging, images are sensed through binary masks that implement binary basis functions. In MRI imaging, coefficients of Fourier series expansion of sensor data are measured.

However, by virtue of historical reasons and of the technical tradition, image discretization is most frequently in imaging engineering is implemented as image sampling at nodes of a uniform rectangular sampling lattice using sampling basis functions, which are formed from one “mother” function by its shifts by multiple of fixed intervals called “sampling” intervals.

The theoretical foundation of image sampling is the sampling theorem. The traditional formulation of the sampling theorem states that signals with Fourier spectrum limited with bandwidth can be perfectly restored from their samples taken with sampling interval , commonly called the Nyquist sampling interval [3739].

In reality no continuous signal is band-limited, and the image sampling interval is defined not through specifying, in one or another way, of the image bandwidth, but directly by the requirement to reproduce smallest objects and borders of larger objects present in images sufficiently well. The selected in this way image sampling interval specifies image base band .

Since small objects and object borders usually occupy relatively small fraction of the image area, vast portions of images are oversampled, that is, sampled with redundantly small sampling intervals. Hence, substantial compression of image sampled representation is possible. It can be implemented by means of applying to the image primary and redundant sampled representation a discrete analog of the general compressive discretization, that is, by means of image expansion over a properly chosen set of discrete optimal bases functions and limitation of the amount of the expansion coefficients. This is exactly what is done in all transform methods of image compression.

2.2. Transform Methods of Image Compression

Needs of image compression were the primary motivations of digital image processing. Being started in 1950s from various predictive coding methods (a set of good representative publications can be found in [40]), by the beginning of 1970s image compression research began concentrating mostly on what we call now “transform coding” [41, 42], which literally implements the compressed discretization principle.

The principle of image transform coding is illustrated by the flow diagrams sketched in Figure 2.

According to these diagrams, set of image pixels is first subjected to a fast orthogonal transform. Then low intensity transform coefficients are discarded, which substantially reduces the volume of data. This is the main source of image compression. Note that discarding certain image transform coefficients means replacement of images by their band-limited, in terms of the selected transform, approximations. The remaining coefficients are subjected, one by one, to optimal nonuniform scalar quantization that minimizes the average number of transform coefficient quantization levels. Finally, quantized transform coefficients are entropy encoded to minimize the average number of bits per coefficient. For image reconstruction from the compressed bit stream, it must be subjected to corresponding inverse transformations.

In 1970s, main activities of researches were aimed at inventing, in addition to known at the time Discrete Fourier, Walsh-Hadamard, and Haar Transforms, new transforms that have guaranteed fast transform algorithms and improved energy compaction capability [35].

This transform invention activity gradually faded after introduction, in a short note, of the Discrete Cosine transform [7], which, as it was already mentioned, was finally recognized as the most appropriate transform for image compression due to its superior energy compaction capability and the availability of the fast algorithm. But the true breakthrough happened, when it was realized that much higher image compression efficiency can be achieved, if transform coding is applied not to entire image frames but blockwise [41, 42].

Images usually contain many objects, and their global spectra are a mixture of object spectra, whereas spectra of individual image fragments or blocks are much more specific and this enables much easier separation of “important” (most intense) from “unimportant” (least intense) spectral components. This is vividly illustrated in Figure 3, where global and block wise DCT power spectra of a test image are presented for comparison. Although, as one can see in the figure, spectral coefficients of DCT global spectrum are more sparse (spectrum sparsity of this image, i.e., fraction of spectral coefficients that contain 95% of the total spectrum energy, is 0.095) than, on average, those of image of pixel fragments (average sparsity of spectra fragments is 0.14), spectra of individual fragments are certainly more specific and their components responsible for small objects and object borders have much higher energy than in the global spectrum and will not be lost when 95% of the most intense components are selected.

Blockwise image DCT transform compression proved to be so successful, that, by 1992, it was put in the base of the image and video coding standards “JPEG” and “MPEG” [4346], which eventually resulted in the revolution in the industry of photographic and video cameras and led to the emergence of digital television as well [47].

Though the standard does not fix the size of blocks, blocks of pixels are used in most cases. Obviously, the compression may, in principle, be more efficient, if the block size is adaptively selected in different areas of images; hence the use of variable size of blocks was discussed in number of publications (see, for instance, [48]).

3. Transform Domain Filters for Image Restoration and Enhancement

3.1. Transform Domain Scalar Empirical Wiener Filters

Very soon after the image transform compression methods emerged, it was recognized that transforms represent a very useful tool for image restoration from distortions of image signals in imaging system and for image enhancement [49, 50]. The principle is very simple: for image improvement, image transform coefficients are modified in a certain way and then the image is reconstructed by the inverse transform of modified coefficients (Figure 4). This way of processing is called transform domain filtering.

For the implementation of modifications of image transform coefficients, two options are usually considered.(i)Modification of absolute values of transform coefficients by a nonlinear transfer function; usually it is a function that compresses the dynamic range of transform coefficients, which redistributes the coefficients’ intensities in favor of less intensive coefficients and results in contrast enhancement of image small details and edges.(ii)Multiplication of transform coefficients by scalar weight coefficients; this processing is called transform domain “scalar filtering” [49].

For defining optimal scalar filter coefficients, a Wiener-Kolmogorov [51, 52] approach of minimization of mean squared filtering error is used and thus filters implement empirical Wiener filtering principles [17, 18, 26].

Three modifications of the filters based on this principle are (i) proper Empirical Wiener Filters, (ii) Signal Spectrum Preservation Filters, and (iii) Rejecting Filters [18]. For image denoising from additive signal independent noise and image deblurring, they modify input image Fourier spectra at each frequency component according to the equations correspondingly.

Empirical Wiener Filter

Signal Spectrum Preservation Filter

Rejecting Filter where is Fourier spectrum of output image, is power spectrum of additive noise, assumed to be known or to be empirically evaluated from the input noisy image [1518], and is the imaging system frequency response assumed to be known, for instance, from the imaging system certificate.

As one can see in these equations, all these filters eliminate image spectral components that are less intensive than those of noise and the remaining components are corrected by the “inverse” filter with frequency response of . Division of image spectra by compensates image blur in the imaging system due to imperfect ; that is, it performs image deblurring. It is applicable in processing specifically in domains of DFT or DCT, which are, as it was already mentioned, two complementing discrete representations of the integral Fourier Transform.

In addition, empirical Wiener filter modifies image spectrum through deamplification of image spectral components according to the level of noise in them; spectrum preservation filter modifies magnitude of the image spectrum as well by making it equal to a square root of image power spectrum empirical estimate as a difference between power spectrum of noisy image and that of noise. Rejecting filter does not modify remaining, that is, not rejected, spectral components at all. In some publications, image denoising using the spectrum preservation filter is called “soft thresholding” and rejecting filtering is called “hard thresholding” [53, 54].

Versions of these filters are filters that combine image denoising-deblurring and image enhancement by means of -law nonlinear modifications of the corrected signal spectral components where is a parameter that controls the degree of spectrum dynamic range compression and, by this, the degree of enhancement of low intensity image spectral components [18].

The described filters proved to be very efficient in denoising so-called narrow band noises, that is, noises, such as “moirè noise” or “banding noise,” that contain only few nonzero components in its spectrum in the selected transform. Some illustrative examples are given in Figures 5 and 6.

However for image cleaning from wideband or “white” noise, above transform domain filtering applied globally to entire image frames is not efficient. In fact it can even worsen the image visual quality. As one can see from an illustrative example shown in Figure 7, filtered image loses its sharpness and contains residual correlated noise, which is visually more annoying than the initial white noise.

The reason for this inefficiency of global transform domain filtering is the same as for the inefficiency of the global transform compression, which was discussed in Section 2.2: in global image spectra low intensity image components are hidden behind wideband noise spectral components. And the solution of this problem is the same: replace global filtering by local fragment wise filtering. This idea is implemented in local adaptive linear filters. Being applied locally, filters are becoming locally adaptive, as their frequency responses are determined, according to (1)–(4), by local spectra of image fragments within the filter window. Local adaptive linear filters date back to mid-1980s and were refined in subsequent years [16, 18, 26, 55]. As a theoretical base for the design of local adaptive filters, local criteria of processing quality were suggested [56].

It is instructive to note that this is not an accident that the evolution of human vision came up with a similar solution. It is well known that when viewing image, human eye’s optical axis permanently hops chaotically over the field of view and that the human visual acuity is very nonuniform over the field of view. The field of view of a man is about 30°. Resolving power of man’s vision is about 1′. However such a relatively high resolving power is concentrated only within a small fraction of the field of view that has size of about 2° (see, for instance, [57]); that is, the area of the acute vision is about 1/15th of the field of view. For images of pixels this means window of roughly pixels.

The most straightforward way to implement local filtering is to do it in a hopping window, just as human vision does, and this is exactly the way of processing implemented in the transform image coding methods. However “hopping window” processing, being very attractive from the computational complexity point of view, suffers from “blocking effects,” artificial discontinuities that may appear, in the result of processing, at the edges of the hopping window. Blocking effects are artifacts characteristic for transform image coding. The desire to avoid them motivated the advancement of “Lapped” transforms [58], which are applied blockwise with the half of the block size overlap. Obviously, the ultimate solution of the “blocking effects” problem would be pixel by pixel processing in sliding window that scans image row-wise/column/wise.

Thus, local adaptive linear filters work in a transform domain in a sliding window and, at each position of the window, modify, according to the type of the filter defined by (1)–(4), transform coefficients of the image fragment within the window, and then compute an estimate of the window central pixel by means of the inverse transform of the modified transform coefficients. As an option, accumulation of estimates of the all window pixels overlapping in the process of image scanning by the filter window is possible as well [59]. As for the transform for local adaptive filtering, DCT proved to be the best choice in most cases. Figures 8 and 9 give examples of local adaptive filtering for image deblurring and enhancement.

Local adaptive filters can work with color and multicomponent images and videos as well. In this case, filtering is carried out in the corresponding multidimensional transform domains, for instance, domains of 3D (two spatial and one color component coordinates) spectra of color images or 3D spectra of a sequence of video frames (two spatial and time coordinates). In the latter case filter 3D window scans video frame sequence in both spatial and time coordinates. As one can see from Figures 10 and 11, the availability of the additional third dimension substantially adds to the filter efficiency [18, 55, 60].

Certain further improvement of image denoising capability can be achieved if, for each image fragment in a given position of the sliding window, similar, according to some criterion, image fragments over the whole image frame are found and included in the stack of fragments, to which the 3D transform domain filtering is applied. This approach in its simplest form was coined the name “nonlocal means” filtering [6163]. In fact, similar method has been much earlier known as the correlational averaging (see, for instance, [64]). It assumes finding similar image fragments by their cross-correlation coefficient with a template object and averaging of similar fragments, that is, leaving only the dc component of the 3D spectra in this dimension. In [65] the simple averaging was replaced by full scale 3D transform domain filtering. One should, however, take into account that the correlational averaging is prone to produce artifacts due to false image fragments that may be erroneously taken to the stack just because of the presence of noise that has to be cleaned [66].

Obviously, image restoration efficiency of the sliding window transform domain filtering will be higher, if the window size is selected adaptively at each window position. To this goal, filtering can be, for instance, carried out in windows of multiple sizes and, at each window position, the best filtering result should be taken as the signal estimate at this position using methods of statistical tests, for instance, “intersection of confidence intervals method” described in [67]. Another option in multiple window processing is combining, in a certain way (say, by weighted summation or by taking a median or in some other way), filtering results obtained for different windows. All this, however, increases correspondingly the computational complexity of the filtering, which may become prohibitive, especially in video processing.

In conclusion of this section note that DFT and DCT spectra of image fragments in sliding window form the so called “time-frequency representation” of signals, which can be traced back to 1930s-1940s to Gabor and Wigner and works on “visible speech” [6870]. For 1D signals, the dimensionality of the representation is 2D; for images, it is 3D. Such representation is very redundant and this opens an interesting option of applying to it image (for 1D signals) and video (for 2D signals) processing methods for signal denoising and, in general, signal separation.

3.2. Wavelet Shrinkage Filters for Image Denoising

In 1990s, a specific family of transform domain denoising filters, the so-called wavelet shrinkage filters, gained popularity after publications [53, 54, 71, 72]. The filters work in the domain of one of wavelet transforms and implement, for image denoising, the above-mentioned Signal Spectrum Preservation Filter (2) and Rejecting Filter (3), except that they do not include the “inverse filtering” component () that corrects distortions of image spectra in imaging systems. As it was already mentioned, the filters were coined in these publications the names “soft thresholding” and “hard thresholding” filters, correspondingly.

Basis functions of wavelet transforms (wavelets) are formed by means of a combination of two methods of building transform basis functions from one “mother” function, shifting and scaling. Thanks to this, wavelets combine local sensitivity of shift basis functions and global properties of “scaled” basis functions and feature such attractive properties as multiresolution and good localization in both signal and transform domain. Thus, the wavelet shrinkage filters, being applied to entire image frames, possess both global adaptivity and local adaptivity in different scales and do not require for the latter specification of the window size, which is necessary for the sliding window filters.

These filters did show a good denoising capability. However, a comprehensive comparison of their denoising capability with that of sliding window DCT domain filters reported in [59] showed that even when for particular test images the best wavelet transform from a transform set [73] was selected, sliding window DCT domain filters demonstrated in most cases better image denoising results.

The capability of wavelets to represent images in different scales can be exploited for improving the image denoising performance of both families of filters and for overcoming the above-mentioned main drawback of sliding window DCT domain filters, the fixed window size that might not be optimal for different image fragments. This can be achieved by means of incorporating sliding window DCT domain (SWTD) filters into the wavelet filtering structure through replacing soft/hard thresholding of image wavelet decomposition components in different scales by their filtering with SWTD filters working in the window of the smallest size pixels. This idea was implemented in hybrid wavelet/SWTD filters and proved to be fruitful in ([74], see also [55]).

3.3. Local Adaptive Filtering and Wavelet Shrinkage Filtering as Processing of Image Subband Decompositions

Obviously, sliding window transform domain and wavelet processing are just different implementation of the scalar linear filtering in transform domain. This motivates their unified interpretation in order to gain a deeper insight into their similarities and dissimilarities. The common base for such unified interpretation is the notion of signal subband decomposition [75]. One can show that both filter families can be treated as special cases of signal subband decomposition by band-pass filters with point spreads functions defined by the corresponding basis functions of the transform [26, 55]. From the point of view of signal subband decomposition, the main difference between sliding window transform domain and wavelet signal analysis is arrangement of bands in the signal frequency range. While for sliding window transform domain filtering subbands are uniformly arranged with the signal base band, subbands of the wavelet filters are arranged in a logarithmic scale. Hybrid wavelet/SWDCT filtering combines these two types of subband arrangements: “coarse” subband decomposition in logarithmic scale, provided by wavelet decomposition, is complemented with “fine” uniformly arranged sub-subbands within each wavelet subband, provided by the sliding window DCT filtering of the wavelet subbands.

It is curious to note that this “logarithmic coarse—uniform fine” subband arrangement resembles very much the arrangements of tones and semitones in music. In Bach’s equal tempered scale, octaves are arranged in a logarithmic scale and 12 semitones are equally spaced within octaves [76].

4. Image Resampling and Building “Continuous” Image Models

As it was indicated in the introductory section, Discrete Fourier and Discrete Cosine Transforms occupy the unique position among other orthogonal transforms. They are two versions of discrete representation of the integral Fourier Transform, the fundamental transform for describing the physical reality. Among applications specific for DFT and DCT are signal and image spectral analysis and analysis of periodicities, fast signal and image convolution and correlation, and image resampling and building “continuous” image models [17, 18, 26]. The latter is associated with the efficiency, with which DFT and DCT can be utilized for fast signal convolution.

Image resampling is a key operation in solving many digital image processing tasks. It assumes reconstruction of an approximation of the original nonsampled image by means of interpolation of available image samples to obtain samples “in-between” the available ones. The most feasible is interpolation by means of digital filtering implemented as digital convolution. A number of convolutional interpolation methods are known, beginning from the simplest and the least accurate nearest neighbor and linear (bilinear, for 2D case) interpolations to more accurate cubic (bicubic, for 2D case) and higher order spline methods [77, 78]. In some applications, for instance, in computer graphics and print art, the simplest nearest neighbor or linear (bilinear) interpolation methods can provide satisfactory results. In applications that are more demanding in terms of the interpolation accuracy, higher order spline interpolation methods gained popularity. The interpolation accuracy of spline interpolation methods grows with the spline order. However their computational complexity grows as well.

There exists a discrete signal interpolation method that is capable, given finite set of signal samples, to secure virtually error free interpolation of sampled signals, that is, the method that does not add to the signal any distortions additional to the distortions caused by the signal sampling. This method is the discrete sinc-interpolation [18, 25]. Interpolation kernel for the discrete sinc-interpolation of sampled signals of samples is the discrete sinc-function . This function replaces, for sampled signals with a finite number of samples, the continuous sinc-function , which, according to the sampling theory, is the ideal interpolation kernel for reconstruction of continuous signals from their samples provided that the number of samples is infinitely large. Note that the discrete sinc-function should not be confused with a windowed or truncated sinc-function with a finite number of samples, which is very commonly believed to be a good replacement in numerical interpolation of the continuous sinc-function. The windowed sinc-function does not secure interpolation error free signal numerical interpolation, whereas the discrete sinc-interpolation does.

Fast Fourier transform and fast Discrete Cosine transform are the ideal computational means for implementing the convolutional interpolation by the discrete sinc-function. Of these two transforms, FFT has a substantial drawback: it implements the cyclic convolution with a period equal to the number of signal samples, which is potentially prone to causing heavy oscillations at signal borders because of a discontinuity which is very probable between signal values at its opposite borders. This drawback is to a very high degree surmounted when signal convolution is implemented through DCT [18, 26]. As it was already indicated, convolution through DCT is also a cyclic convolution but with the period of samples and of signals that are evenly extended to double length by their inversed, in the order of samples, copies. This completely removes the danger of appearance of heavy oscillations due to discontinuities at signal borders that are characteristic for discrete sinc-interpolation through processing in DFT domain.

Several methods of implementation of DFT/DCT based discrete sinc-interpolation. The most straightforward one is DFT/DCT spectrum zero padding. Interpolated signal is generated by applying inverse DFT (or, correspondingly, DCT) transform to its zero pad spectrum. Padding DFT/DCT spectra of signals of samples with zeroes produces samples of the discrete sinc interpolated initial signal with subsampling rate , that is, with intersample distance of the initial sampling interval. The computational complexity of this method is per output sample.

The same discrete sinc interpolated times subsampled signal can be obtained more efficiently computation-wise by applying to the signal times the signal perfect shifting filter [25, 79] with shifts by at each th application and by subsequent combining the shifted signal copies. The computational complexity of this method is per output sample.

The method is based on the property of Shifted DFT (SDFT, [18, 26, 80]): if one computes SDFT of a signal with some value of the shift parameter in signal domain and then computes inverse SDFT with another value of the shift parameter, the result will be a discrete sinc interpolated copy of the signal shifted by the difference between those values of the shift parameter. In order to avoid objectionable oscillations at signal borders, the DCT based version of this filter is recommended [8183].

Image subsampling using the perfect shifting filter can be employed for creating “continuous” image models for subsequent image arbitrary resampling with any given accuracy [18, 84]. It can also be used for computing image correlations and image spectral analysis with a subpixel accuracy [18, 26]. Particular examples of using created in this way continuous signal models for converting image spectra from polar coordinate system to Cartesian coordinate system in the direct Fourier method for image reconstruction from projections and for converting (rebinning) of image fan beam projections into parallel projections one can find in [18].

Except for creating “continuous” image models, the perfect shifting filter is very well suited for image sheering in the three-step method for fast image rotation by means of three subsequent (horizontal/vertical/horizontal) image sheerings [85].

Perfect interpolation capability of the discrete sinc-interpolation was demonstrated in a comprehensive comparison of different interpolation methods in experiments with multiple 360° rotations reported in [25]. The results of experiments illustrated in Figure 12 clearly evidence that, in contrast to other methods, including high order spline interpolation ones [78, 86, 87], discrete sinc-interpolation does not introduce any appreciable distortions into the rotated image.

In some applications, “elastic” or space variant image resampling is required, when shifts of pixel positions are specified individually for each image pixel. In these cases, the perfect shifting filter can be applied to image fragments in sliding window for evaluating interpolated value of the window central pixel at each window position. Typical application examples are imitation of image retrieval through turbulent media (for illustration see [88]) and stabilization and superresolution of turbulent videos [8991].

Being working in DCT transform domain, “elastic” resampling in sliding window can be easily combined with above-described local adaptive denoising and deblurring [25]. Yet another additional option is making resampling in sliding window adaptive to contents of individual image fragments by means of switching between the perfect shifting and another interpolation method better suited for specific image fragments. This might be useful in application to images that contain very heterogeneous fragments such as photographs and binary drawings and text [25].

A certain limitation of the perfect shifting filter in creating “continuous” image models is its capability of subsampling images only with a rate expressed by an integer or a rational number. In some cases, this might be inconvenient, for instance, when the required resampling rate is a value between one and two, say 1.1, 1.2, or alike. For such cases, there exists the third method of signal resampling with discrete sinc-interpolation. It is based on the general Shifted Scaled DFT, which includes arbitrary analog shift and scale parameters. Using Shifted Scaled (ShSc) DFT, one can apply to the image DFT spectrum inverse ShScDFT with the desired scale parameter and obtain a correspondingly scaled discrete sinc interpolated image [82, 83]. Being the discrete sinc-interpolation method, this method also does not cause any image blurring, which other interpolation methods are prone to do. Figure 13 illustrates this on an example of multiple iterative image zooming-in and zooming-out.

The Shifted Scaled DFT can be presented as a convolution and, as such, can be computed using Fast Fourier or Fast Cosine Transforms [18, 26]. Thus, the computational complexity of the method is per output signal sample, where is the scale parameter. As in all other cases, the use of DCT guaranties the absence of severe boundary effects. This method is especially well suited for numerical reconstruction of digitally recorded color holograms, when it is required to rescale images reconstructed from holograms recorded with different wave lengths of coherent illumination [82].

Among the image processing tasks, which involve “continuous” image models, are also signal differentiation and integration, the fundamental tasks of numerical mathematics that date back to such classics of mathematics as Newton and Leibnitz. One can find standard numerical differentiation and integration recipes in numerous reference books, for instance, [92, 93]. All of them are based on signal approximation through Taylor expansion. However, according to the sampling theory, approximation of sampled signal by Taylor expansion is not optimal, and the direct use of these methods for sampled signals may cause substantial errors in cases, when signals contain substantial high frequency components. In order to secure sufficiently accurate signal differentiation and integration by means of those standard algorithms, one must substantially oversample such signals.

The exact solution for the discrete representation of signal differentiation and integration provided by the sampling theory tells that, given the signal sampling interval and signal sampling and reconstruction devices, discrete frequency responses (in DFT domain) of digital filters for perfect differentiation and integrations should be, correspondingly, proportional and inversely proportional to the frequency index [18, 26]. This result directly leads to fast differentiation and integration algorithms that work in DFT or DCT domains using corresponding fast transforms with computational complexity operation per signal sample for signals of samples. As in all cases, realization of the integration and, especially, differentiation filters in DCT domain is preferable because of much lower, for DCT, border effect artifacts associated with the finite number of signal samples.

The comprehensive comparison of the accuracy of standard numerical differentiation and integration algorithms with perfect DCT-based differentiation and integration reported in [25, 94] shows that the standard numerical algorithms in reality work not with original signals but with their blurred to a certain degree copies. This conclusion is illustrated in Figure 14 on an example of multiple alternative differentiations and integrations of a test rectangular impulse.

Computational efficiency of the DFT/DCT based interpolation error free discrete sinc-interpolation algorithms is rooted in the use of fast Fourier and Fast DCT transforms. Perhaps, the best concluding remark for this discussion of the discrete sinc-interpolation DFT/DCT domain methods would be mentioning that a version of what we call now Fast Fourier Transform algorithm was invented more than 200 hundred years ago by Gauss just for the purpose of facilitating numerical interpolation of sampled data of astronomical observation [95].

5. Image Recovery from Sparse and Irregularly Sampled Data

5.1. Discrete Sampling Theorem Based Method

As we mentioned at the beginning of Section 2, image discretization is usually carried out by image sampling at nodes of a uniform rectangular sampling lattice. In this section we address using fast transforms for solving the problem closely associated with the general compressive discretization: how and with what accuracy can one recover an image from its sparse and irregularly taken samples.

There are many applications, where, contrary to the common practice of uniform sampling, sampled data are collected in irregular fashion. Because image display devices as well as computer software for processing sampling data assume using regular uniform rectangular sampling lattice, one needs in all these cases to convert irregularly sampled images to regularly sampled ones.

Generally, the corresponding regular sampling grid may contain more samples than it is available, because coordinates of positions of available samples might be known with a “subpixel” accuracy, that is, with the accuracy (in units of image size) better than , where is the number of available pixels. Therefore one can regard available samples as being sparsely placed at nodes of a denser sampling lattice with the total amount of nodes .

The general framework for recovery of discrete signals from a given set of their arbitrarily taken samples can be formulated as an approximation task in the assumption that continuous signals are represented in computers by their irregularly taken samples and it is believed that if all samples in a regular sampling lattice were known, they would be sufficient for representing those continuous signals [18, 96]. The goal of the processing is generating, out of an incomplete set of samples, a complete set of signal samples in such a way as to secure the most accurate, in terms of the reconstruction mean square error (MSE), approximation of the discrete signal, which would be obtained if the continuous signal it is intended to represent were densely sampled at all positions.

Above-described discrete sinc-interpolation methods provide band-limited, in terms of signal Fourier spectra, approximation of regularly sampled signals. One can also think of signal band-limited approximation in terms of their spectra in other transforms. This approach is based on the Discrete Sampling Theorem [18, 96, 97].

The meaning of the Discrete Sampling Theorem is very simple. Given samples of a signal one can, in principle, compute certain coefficients of a certain signal transform and then reconstruct samples of a band-limited, in this transform, approximation of the signal by means of inverse transform of those coefficients supplemented with - zero coefficients. If the transform has the highest, for this particular type of signals, energy compaction capability and selected were those nonzero transform coefficients that concentrate the highest, for this particular transform, signal energy, the obtained signal approximation will have the least mean square approximation error. If the signal is known to be band-limited in the selected transform and the computed nonzero coefficients corresponded to this band limitation, signal will be reconstructed precisely.

The discrete sampling theorem is applicable to signals of any dimensionality, though the formulation of the signal band-limitedness depends on the signal dimensionality. For 2D images and such transforms as Discrete Fourier, Discrete Cosine, and Walsh Transforms, the most simple is compact “low-pass” band-limitedness by a rectangle or by a circle sector.

It was shown in ([96], see also [18, 97]) that, for 1D signal and such transforms as DFT and DCT, signal recovery from sparse samples is possible for arbitrary positions of sparse samples. For images signals, the same is true, if band-limitation conditions are separable over the image dimensions. For nonseparable band-limitations, such as circle or circle sector, this may not be true and certain redundancy in the number of available samples might be required to secure exact recovery of band-limited images.

As it was indicated, the choice of the transform must be made on the base of the transform energy compaction capability for each particular class of images, to which the image to be recovered is believed to belong. The type of the band-limitation also must be based on a priori knowledge regarding the class of images at hand. The number of samples to be recovered is a matter of a priori belief of how many samples of a regular uniform sampling lattice would be sufficient to represent the images for the end user.

Implementation of signal recovery/approximation from sparse nonuniformly sampled data according to the discrete sampling theorem requires matrix inversion, which is, generally, a very computationally demanding procedure. In applications, in which one can be satisfied with image reconstruction with a certain limited accuracy, one can apply to the reconstruction a simple iterative reconstruction algorithm of the Gerchberg-Papoulis [98, 99] type, which, at each iteration step, alternatively applies band limitation in spectral domain and then restores available pixels in their given positions in the image obtained by the inverse transform.

Among reported applications, one can mention superresolution from multiple video frames of turbulent video and superresolution in computed tomography that makes use of redundancy of object slice images, in which usually a substantial part of their area is an empty space surrounding the object [96].

5.2. “Compressive Sensing”: Promises and Limitations

The described discrete sampling theorem based methods for image recovery from sparse samples by means of their band-limited approximation in a certain transform domain require explicit formulation of the desired band limitation in the selected transform domain. While for 1D signal this formulation is quite simple and requires, for most frequently used low pass band limitation, specification of only one parameter, signal bandwidth, in 2D case formulation of the signal band limitation, requires specification of a 2D shape of signal band-limited spectrum. The simplest shapes, rectangle and circle sector ones, may only approximate, with a certain redundancy, real areas occupied by the most intensive image spectral coefficients for particular images. Figure 15 illustrates this on an example of spectral binary mask that corresponds to 9.5% of the DCT spectral coefficients of the test image shown in Figure 3 that contain 95% of the spectrum energy and a rectangular outline of this mask, which includes 47% of the spectral coefficients.

In the cases, when exact character of spectrum band limitation is not known, image recovery from sparse samples can be achieved using the “compressive sensing” approach introduced in [2932]. During last recent years, this approach to handling sparse spectral image representation has obtained considerable popularity [3336].

The compressive sensing approach assumes obtaining a band-limited, in a certain selected transform domain, approximation of images as well, but it does not require explicit formulation of the image band-limitation and achieves image recovery from an incomplete set of samples by means of minimization of norm of the image spectrum in the selected transform (i.e., of the number of signal nonzero transform coefficients), conditioned by preservation in the recovered image of its available pixels.

However, there is a price one should pay for the uncertainty regarding the band limitation: the number of samples required for recovering signal samples by this approach must be redundant with respect to the given number nonzero spectral coefficients: [30]. In real applications, this might be a serious limitation. For instance, for the test image of pixels shown in Figure 3 spectrum sparsity (relative number of nonzero spectral components) on the energy level 95% is 0.095, whereas , which means that the “compressive sensing” approach requires in this case more signal samples () than it is required to recover.

5.3. Discrete Signal Band-Limitedness and the Discrete Uncertainty Principle

Signal band-limitedness plays an important role in dealing with both continuous signals and discrete (sampled) signals representing them. It is well known that continuous signals cannot be both strictly band-limited and have strictly bounded support. In fact, continuous signals neither are band-limited nor have strictly bounded support. They can only be more or less densely concentrated in signal and spectral domains. This property is mathematically formulated in the form of the “uncertainty principle” [100] where and are intervals in signal and their Fourier spectral domains that contain a sufficiently large fraction of signal energy.

In distinction to that, sampled signals that represent continuous signals can be sharply bounded in both signal and spectral domains. This is quite obvious for some signal spectral presentation, such as Haar signal spectra: Haar basis functions are examples of sampled functions sharply bounded in both signal and Haar spectral domains. But it turns out that there exist signals that are sharply bounded in both signal and their spectral domains of DFT or DCT, which are discrete representations of the Fourier integral transform. An example of a space-limited/band-limited image is shown in Figure 16. Such images can be generated using the above-mentioned iterative Gershberg-Papoulis type algorithm which, at each iteration, applies to images alternatively space limitation in image domain and band limitation in transform domain. Such images are very useful as test images for testing image processing algorithms.

In a similar way one can generate space-limited and band-limited analogs of the discrete sinc-function, i.e. functions, which form band-limited shift bases in the given space limits. In [97] such functions were coined a name “sinc-lets.” Figure 17 shows an example of a sinc-let in its three positions within a limited interval and, for comparison, the discrete sinc-function for the same band limitation.

The following relationship between signal bounds in signal and DFT spectral domains can be derived: where is the number of signal nonzero samples, is the number of its nonzero spectral samples, and is the number of samples in the signal sampling lattice. This is what one can call the discrete uncertainty principle.

6. Conclusion

Two fundamental features of fast transforms, the energy compaction capability and fast algorithms for transform computation, make them perfect tool for image compression, restoration, reconstruction, and resampling. Of all fast transforms, Discrete Fourier Transform and Discrete Cosine Transform are the most important as they are complementing each other discrete representations of the integral Fourier transform, one of the most fundamental mathematical models for describing physical reality, and, in addition, they enable fast digital convolution. From these two transforms, DCT is preferable in most applications thanks to its ability to very substantially reduce processing artifacts associated with image discontinuities at the borders.

Modern tendency in the imaging engineering is computational imaging. Computer processing of sensor data enables substantial price reduction and sometimes even complete removal of imaging optics and similar imaging hardware. It also gives birth to numerous new imaging techniques in astrophysics, biology, industrial engineering, remote sensing, and other applications. No doubts, this area promises many new achievements in the coming years. And it is certain that fast transforms will remain to be the indispensable tool in this process.

Nomenclature

:Image samples
:Image spectral coefficients
;
:pixel and, correspondingly, spectra coefficient vertical and horizontal indices
:Various constants
:Image sampling intervals
:Sampling interval in the transform domain
:Total number of image samples
:Wavelength of coherent radiation
:Distance between object plane and hologram.
Abbreviations
DCT:Discrete Cosine Transform
DFT:Discrete Fourier Transform
DFrT:Discrete Fresnel Transform
PSNR:Peak Signal-to-Noise Ratio: ratio of signal dynamic range to noise standard deviation
RMSE:root mean square error
SNR:Signal-to-Noise Ratio (ratio of signal and noise variances).

Conflict of Interests

The author declares that there is no conflict of interests regarding the publication of this paper.