Abstract

Time-frequency representation algorithms such as spectrograms have proven to be useful tools in marine biosonar signal analysis. Although there are several different time-frequency representation algorithms designed for different types of signals with various characteristics, it is unclear which algorithms that are best suited for transient signals, like the echolocation signals of echolocating whales. This paper describes a comparison of seven different time-frequency representation algorithms with respect to their usefulness when it comes to marine biosonar signals. It also provides the answer to how close in time and frequency two transients can be while remaining distinguishable as two separate signals in time-frequency representations. This is, for instance, relevant in studies where echolocation signal component azimuths are compared in the search for the exact location of their acoustic sources. The smallest time difference was found to be 20 µs and the smallest frequency difference 49 kHz of signals with a −3 dB bandwidth of 40 kHz. Among the tested methods, the Reassigned Smoothed Pseudo Wigner-Ville distribution technique was found to be the most capable of localizing closely spaced signal components.

1. Introduction

The present paper discusses the advantages and disadvantages of seven time-frequency representation algorithms of short transient acoustic signals such as the echolocation signals from dolphins. Previous studies have shown that time-frequency representations of transient biosonar signals are valuable tools when analyzing details in the signals generated by for instance bottlenose dolphins Tursiops truncatus [1, 2]. However, it is unclear from the literature which time-frequency representation method is best suited for echolocation signals with respect to their inherent properties (signal length, signal to noise levels, frequency component compositions, etc.) or which limitations the different methods have.

This review of methods aims to serve two main purposes. First and foremost it is outlined as a guide to help researchers choose the time-frequency distribution algorithm that is best suited for their specific application in biosonar studies. Secondly, it aims to investigate how close together in time and frequency two transients, with only slightly different timing and/or frequency content, can be from each other while remaining distinguishable by currently existing time-frequency representation algorithms [3, 4].

This smallest distinguishable time difference is relevant in for example marine biosonar studies when comparing component azimuths of echolocation signals (“clicks”) in different positions within the sonar beam. Such studies would, for instance, be useful when trying to determine the propagation path of echolocation clicks inside a dolphin’s head and when trying to establish whether certain echolocation signals originate from one or two sound sources. Reports by Dormer [5], Amundin and Andersen [6], Madsen et al. [7], and Au et al., [8] suggest that the echolocation beam is generated by only one source, the right set of phonic lips. Lammers and Castellote [9] suggest that beluga whales (Delphinapterus leucas) may generate their echolocation clicks by simultaneous activation of both the right and left pairs of phonic lips. Cranford et al. [10] indicate that either the left or the right set of phonic lips or simultaneous activation of both sets of phonic lips present in the animals may produce echolocation signals.

The application of adequate time-frequency representation methods will also be useful in studies of the distribution of the frequency components within the beam, as well as in investigations of the internal beam steering ability of echolocating animals [1113]. Future studies comparing azimuths and frequency contents of different parts of the echolocation beam are likely to be useful in these areas of biosonar research. However, the smallest distinguishable time difference of transients must first be established theoretically. These results are provided in the present paper.

The signal processing methods have been evaluated after application to echolocation-inspired synthetic test signals where the parameters are based on previously reported measurements and theories regarding the generation of dolphin echolocation [811, 14]. As a reference, the methods have also been applied to measurement data recorded from a bottlenose dolphin (Tursiops truncatus) performing a target detection task [15].

2. Method

2.1. Chosen Time-Frequency Representation Methods

The time-frequency representation methods included in this study are Spectrogram (Sp), Wigner-Ville distribution (WV), Choi-Williams distribution (CW), Reassigned Multitaper Spectrogram (RMSp), Locally Stationary Process Multitaper Spectrogram (LMSp), Reassigned Smoothed Pseudo Wigner-Ville distribution (RSmWV), and Spectrogram Multiplication (SpM). All implementations of the algorithms are made by the authors.

The evaluation is based on holistic visual judgments of the ability of the time-frequency representations to localize the signal components rather than on individual quantified parameters. This strategy was chosen after careful consideration and was based on the fact that an analysis that is strictly based on quantified parameters tends to narrow down the perspective of the algorithms’ abilities to visualize the signals in an intuitively interpretable way. This limitation was largely avoided by the chosen method.

The Fractional Fourier Transform has been applied to echolocation signals before [1]. However, we chose here not to include the Fractional Fourier Transform in the algorithm comparison study since others have shown that its implementation is problematic. For instance, Bultheel and Martínez Sulbaran [16] demonstrated that the different ways of implementing the Fractional Fourier Transform did not give coherent results.

2.2. Spectrogram (Sp)

Traditional spectrum analysis techniques characterize the frequency content of signals without providing any information of when in time the different frequency components of the signal appear. However, the spectrogram divides the signal into several shorter sections, where a regular spectrum is estimated from each section, revealing in which shorter time interval the different frequencies occur. The spectrogram of a signal is defined aswhere the window function , centered at time , is multiplied with the signal before the Fourier transform and squared absolute value (see [3]). There are several advantages of using the spectrogram, for example, a fast implementation thanks to the fast Fourier transform (FFT) and a straightforward interpretation. In the ideal case, the window and thereby the time slot should be very short in order to clearly reveal the exact time when the frequencies start and stop. However, a short time slot gives a poor frequency resolution, with the resultant problem that we cannot differ between closely spaced frequencies. Therefore, each application requires a reasonable compromise between high time resolution and high frequency resolution. The length (and shape) of the window function determines the resolution in time and frequency. Usually, a proper trade-off between time and frequency resolution is found by choosing a window length of roughly the same size as the length of the signal. The 3 dB bandwidth will then be approximately , which is also the best achievable frequency resolution (with as the sampling frequency).

2.3. Wigner-Ville Distribution (WV)

The Wigner-Ville distribution is based on the Fourier transform of the so-called instantaneous autocorrelation, which provides the best possible concentration in time as well as frequency (see [3]). Usually, the expression for the Wigner-Ville distribution is referred to aswhere is the measured signal or the corresponding analytical signal. However, this representation can only be used for simple signals, consisting of a single component in the time-frequency plane. With more than one component, cross terms appear. These are located in between the components in the time-frequency plane and can be twice as large as the true signal components. The cross terms show up between all separate components and if the signal is disturbed by high levels of noise, which usually generates many time-frequency components and therefore also many cross terms, it becomes difficult to interpret the Wigner-Ville distribution.

The implementation we have made for this paper is based on the discrete-time, discrete-frequency representation described in Chapter 6 of Boashash [4].

2.4. Choi-Williams Distribution (CW)

Various kernels for smoothing of the Wigner-Ville distribution have been proposed to reduce cross terms. The smoothing is formulated aswhere is the smoothing kernel and denotes double convolution (smoothing), in time as well as frequency. Possibly the most applied method is the Choi-Williams distribution [17], also called the exponential distribution whereand is a smoothing parameter to be chosen. The Choi-Williams distribution gives a reduction of the cross terms but keeps most of the concentration of the auto-terms, when the signal components are located at different frequencies as well as different times. However, for two components located at the same frequency, or at the same time, the cross terms will still be more or less present and the remaining amplitude of the cross terms could cause a misinterpretation of the distributions.

The Choi-Williams kernel is implemented using the sampled discrete-time and discrete-frequency form presented in Chapter 6 of Boashash [4].

2.5. Reassigned Multitaper Spectrogram Using Hermite Functions (RMSp)

The concept of multitapers replaces the calculation of the two-dimensional convolution between the smoothing kernel and the Wigner-Ville distribution with so-called multitaper spectrograms, defined aswhere is the spectrogram using the window and is the number of multitapers. Using Hermite functions as windows has been shown to give the best time-frequency localization and orthonormality in the time-frequency domain. Orthonormality leads to optimal reduction of variance if the disturbances can be defined as white Gaussian noise, which results in a better estimate when the signals are heavily disturbed by noise. Another advantage of the multitaper technique is its easy and fast implementation. The multitaper spectrogram smoothes the time-frequency representation resulting in a lower time-frequency resolution and therefore the reassignment technique is applied as proposed in Xiao and Flandrin [18].

2.6. Locally Stationary Process Multitaper Spectrogram (LMSp)

A locally stationary process is here designed to have a covariance function that is a multiplication of a covariance function of a stationary processand a time-variable functionasThe parameter is a design parameter. The process is nonstationary with properties suitable for modeling measured signals that, for example, have a transient amplitude behavior like the echolocation signals of odontocetes. For such signals, the mean square error optimal smoothing kernel and the corresponding multitaper spectrogram representation is defined according towhere is the spectrogram using the th Hermite function. The parameters , , correspond to a set of weighting factors that are defined depending on the value of the model-parameter [19].

2.7. Reassigned Smoothed Pseudo Wigner-Ville Distribution (RSmWV)

The reassignment principle has existed for some time but was not applied to any large extent until Auger and Flandrin reintroduced the method [20]. The idea behind the reassignment is to first reduce the cross terms by smoothing of the Wigner distribution and then recapture the localization of a single component by reassigning its mass to the center of gravity. In RSmWV, the smoothing is defined aswhere and are separate smoothing window functions for time and frequency, respectively. This so-called separable kernel replaces the 2D convolution of the quadratic time-frequency representation with two 1D convolutions, which might simplify the choice of suitable parameters in the design of the windows. In this form, the spectrogram is easily seen as a weighted sum of all the Wigner-Ville distribution values at the neighboring points. The “mass” at these points should be assigned to the centers of gravity, which are calculated asThe reassignment is then made according towhere is the Dirac impulse which reassigns (moves) the “mass” of to the time and frequency values specified by and . This leads to perfectly localized signals in the case of single component chirp signals, frequency tones, and impulse signals.

We have implemented the discrete time-frequency algorithm called MSPWVD in Auger and Flandrin [20] using the smoothing parameters and . The time-smoothing window has the length samples (approximately a time-smoothing width of . The frequency smoothing window has a 3 dB frequency bandwidth  Hz, determining the possible frequency resolution.

The reassignment procedure can be made for all time and frequency values. However, there are several advantages by setting the spectrum values to a lower limit, below which no reassignment is performed. This limit is mostly related to the “noise floor,” the disturbances showing up randomly everywhere in the time-frequency plane, caused by the noise disturbance of the signal. The other advantage is the faster computation as the reassignment made at every time and frequency value is a heavy calculation procedure.

2.8. Multiplication of Short Time Fourier Transforms (SpM)

The spectrogram multiplication method is based on an optimal combination of spectrograms from a minimum mean cross-entropy criterion, [21] and the solution becomes a multiplication of a long-window and a short-window spectrogram (geometric mean),The idea of this method is to overcome some of the disadvantages of having a trade-off between a high time resolution and a high frequency resolution. The signal components that are present in both spectrograms (both the long-window and the short-window spectrograms) are enhanced with this method.

2.9. Description of the Test Signals

The frequency contents of three different synthetic signals over time are represented with all seven methods. The noise sensitivities of the methods are compared by adding white Gaussian noise to the third test signal. The fourth test signal is a measured click recorded from a bottlenose dolphin (Tursiops truncatus).

The first test signal (S1) is a synthetic two-component signal composed of two transient chirps overlapping in time but separate in frequency content; see Figure 1. The linear chirp rate () is  Hz/s, the start frequencies of each signal component are 40 kHz and 100 kHz, and the total signal is approximately 70 μs long. It is generated by the superposition of two chirp components defined bywhere = time. Each component was sampled with  MHz and windowed with a Gaussian window (GW) before summation. The window can be described bywhereThe windows smoothed the frequency spectra so that the true start frequencies were in fact lower than . The start frequency of the 100 kHz signal component was stepwise decreased until all of the time-frequency representation methods displayed S1 as one single signal and the two components merged into one single click structure. This was done in order to establish the smallest frequency difference that the two components of S1 could have while still being represented as two components by each of the studied time-frequency representations.

The second test signal (S2) (see Figure 2) was a synthetic two-component signal comprising two successive transient chirps with overlapping frequency content; see Figure 2. Both chirps were created in the same way as those in S1. The start frequency of both components was set to 50 kHz and the start time difference was 35 μs. The start time difference was later stepwise decreased until the two signal components were represented as a single signal by all methods. This was done in order to establish the smallest distinguishable time difference between two signal components.

The third tested signal (S3) comprised one transient up-sweep chirp and one shorter transient chirp partially overlapping in both time and frequency content; see Figure 3. The shorter of the two components was windowed with a Gaussian window described by (15) where  s. The amplitude of that component was doubled compared to the other one before windowing to compensate for signal loss due to the narrower window.

Generally, the further apart the components are in either frequency or time, the easier they are to accurately represent in both time and frequency. Therefore, the frequency and/or time differences between the signal components have been chosen in order for some of the studied methods to collapse and not be able to separate the two components. This was done to enlighten both the advantages and limitations of the different methods.

The capacity of the methods to cope with noise was tested by adding white Gaussian noise with a mean value of zero and a variance of 0.1 to S3.

The fourth tested signal (S4) was an echolocation click from a male bottlenose dolphin (Tursiops truncatus). The signal was recorded 45 degrees off the main acoustic axis of the animal and contained two transients, separated in time (see Figure 4). A complete description of how the signal was measured can be found by Branstetter et al. [15]. In summary, the animal was performing a strictly echoic target detection task while stationed on a bite plate. The signal was recorded with 300 kHz and band pass filtered from 100 Hz to 200 kHz.

All the time-frequency representation figures (Figures 510) consist of grids of 300 time points and 819 frequency points ( MHz, FFT-length 4096). The gray scale in the figures ranges linearly from the minimum value to the maximum value of each time-frequency representation. Three of the methods, that is, WV, CW and LMSp, provided negative values in the plots. Hence, the zero level has not been represented as black in these plots, as it has in the other methods. This is also the reason why we chose a linear scale in all figures instead of a logarithmic one.

The applied window functions used in the Sp, RSmWV, and SpM methods are Hanning windows.

3. Results and Discussion

Figure 5 shows the resulting time-frequency representations of S1, plotted with manually optimized parameters for this specific signal. The two signal components that overlapped in time but were separate in frequency content were represented as two partially merged areas by the Sp, the LMSp, and the SpM method. Cross-terms between the signal components dominated the WV and CW time-frequency representations. The RMSp, reassignment based on a multitaper spectrogram, requires rather long time windows to achieve the required frequency resolution of the two time-frequency components. The smoothing in time, combined with the reassignment technique, produces picture where there actually seems to be four components. RSmWV, however, managed to separate the two signal components and demonstrated that there was a chirp-like tendency in them.

The minimum frequency difference between the signal components in S1 was 49 kHz. The components were not distinguishable when the frequency difference was below this value, except with RSmWV, which continued to display two signals (see Table 1).

The visualization using WV and CW are also here heavily disturbed by the cross terms between the two components (see Figure 6). For the Sp and LMSp the leakage causes artifacts and when the time distance of the two components is narrowed, the resulting picture will visualize just one component. When the time difference between the signal components was decreased to 20 μs, only RMSp, RSmWV, and SpM displayed them as separate signals. Signals closer in time than 20 μs were visualized as a single signal. It is also notable that the shape of the slope of the chirp in S2 was discernible with RMSp, in this case (compared to the previous example with S1), and the slope could not be clearly visualized by any of the other methods.

S3 is a good example of a signal that is difficult to display accurately in the time-frequency plane (see Figure 7). RMSp, RSmWV, and LMSp displayed it as containing two components separated in time and partially overlapping in frequency content, as the frequency resolution was too low using the best possible time-resolution. The RSmWV and the LMSp representations factually indicated that the signal contained one short broadband transient in addition to one longer and more narrow band transient. All other methods represented S3 as a triangular structure, caused by the cross terms.

Adding white Gaussian noise of 0.29 Vrms to S3 (see Figure 8) gave the result shown in Figure 9. With the exception of RSmWV, all methods demonstrated a collapse in the presence of intense noise. The Sp, LMSp, and Spm all become difficult to interpret, due to the smoothing of signal components and additional interfering with noise components. The cross terms with noise components in WV and CW make these time-frequency spectra impossible to interpret. In the RMSp reassignment of noise components to single time-frequency points also makes the visualization unclear.

According to all methods, S4 contained two click components and one low-frequency click approximately 63 μs before a more high-frequency click (Figure 10). The center frequency of the two clicks was 25 kHz and 103 kHz. The time-frequency visualizations are most similar to the example signal S3, as there are both a time and a frequency difference between components. The typical view with cross terms between components is seen for the WV and CW and the smoothing, especially in frequency, is clear for the Sp, LMSp, and SpM. However, RMSp and RSmWV showed more localized signal energies as compared to the other methods.

Table 1 summarizes the results in terms of which methods that were able to localize the signal components in both time and frequency for the smallest possible frequency and time differences. The RSmWV method was the only one that managed to localize both signal components accurately for extremely small time and frequency differences between the signal components.

The two-component Gaussian windowed chirp signal used in this study consists of components that are approximately 70 μs long. For components overlapping only in time, a frequency difference no smaller than 49 kHz was required for signals with a −3 dB bandwidth of 40 kHz in order to be able to clearly localize both signal components. For components overlapping only in frequency, the time difference had to be at least 20 μs for any of the investigated methods to visually resolve them as two components. For signals of more complex structure, for example, overlapping in frequency as well as in time, and where the two components were of different length, special care had to be taken when interpreting the results. For such signals, it is advisable to base the interpretation on the results from more than one method.

Furthermore, the results indicate that the RSmWV method was the least sensitive to noise disturbances. It was also the most robust when taking into account that different noise realizations result in rather different appearances of the time-frequency representations. RSmWV and RMSp are both based on smoothing or multitapering, which reduces noise disturbances in the time-frequency plane. The reason for the RMSp and RSmWV to perform different and better than the other methods is however the reassignment technique, which moves mass of the spectrum to the mass center of the corresponding spectrum. Therefore, the reassignment based methods are not useful for finding, for example, the energy or amplitude of components but is a technique for visualization to resolve time-frequency components. A general reflection is that these results also indicate that the reassignment technique is very useful when it comes to noisy transient signals and would thus benefit scientists in marine biosonar research. Nevertheless, there is room for further developments of for instance design parameters and smoothing/multitaper techniques of time-frequency representations to further optimize the reassignment methods to short transient biosonar signals in future studies.

4. Conclusions

The smallest frequency difference between signal components overlapping in time while remaining distinguishable with currently existing time-frequency representation algorithms was found to be 49 kHz. The corresponding smallest time difference between signal components overlapping in frequency content was found to be 20 μs.

Among the tested methods, RSmWV was the one most capable of localizing closely spaced signal components. The results also suggested that it was the most robust technique in noisy environments. However, a disadvantage with RSmWV is the need to find suitable and parameters, in combination with deciding a threshold value for the reassignment procedure. The method hence requires some degree of parameter “tweaking.” Although the SpM method requires the setting of two parameters, these are easily interpretable as time window sizes. This straightforward interpretation is not possible for the and values of the RSmWV method. Therefore, the SpM method might be preferred in studies where there are no extreme demands when it comes to signal component localization precision.

The reassignment technique is effective also for noisy transient signals and is thus well suited for use in marine biosonar research.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgment

The authors thank Brian Branstetter for letting us test the methods on one of his highly interesting recordings.