Abstract

We present a method for coding speech signals for the simulation of a cochlear implant, based on wavelet packet decomposition. We used a seven-level db4 wavelet packet decomposition to generate a series of channels whose bandwidths exactly match those of the Nucleus device, and applied an input stimulus to each channel. The processed signal was then reconstructed and compared with the original signal, and the reconstruction preserved a high percentage of the original content. Finally, the computational complexity of the wavelet packet decomposition was compared with that of other strategies commonly used in cochlear implants. The results show that the method processes the input signal for implant users with lower complexity than the other methods while preserving the content of the input signal to a large extent.

1. Introduction

The human cochlea is a coiled, fluid-filled duct divided into three sections: the scala vestibuli, the scala media, and the scala tympani. The scala media is separated from the scala tympani by the basilar membrane. Input sound vibrations create a pressure difference across the basilar membrane, which causes a corresponding movement of the membrane. The tectorial membrane lies above the basilar membrane, and the hair cells lie between the two membranes.

There are two types of hair cells: the outer hair cells (OHCs) and the inner hair cells (IHCs). Three outer hair cells deliver information to one inner hair cell.

The mechanical movement of the OHCs in response to input sound vibrations is converted by the IHCs into neural activity.

Cochlear damage is mainly associated with damage to or loss of the outer hair cells, while the inner hair cells mostly remain intact [1]. Where OHCs are missing in part of the cochlea, the frequencies associated with that location along the basilar membrane are no longer sensed.

Cochlear implants are widely used for patients with severe cochlear damage. From the inside of the ear outwards, the main parts of a cochlear implant are an electrode array (surgically placed into the inner ear), a receiver (intracranial), a transmitter (held on the skull by a magnet directly over the intracranial receiver), a speech processor (worn behind the ear), and a microphone that collects the sound [2].

A damaged cochlea cannot decompose the input signal into the proper frequency bands. A speech processor is designed to compensate for this deficit by emulating the frequency analysis of a healthy cochlea.

Many studies have been conducted to develop speech processing techniques. Earlier techniques could not provide good time and frequency resolution at the same time. Filter banks, for instance, provide excellent temporal resolution but inadequate frequency resolution, whereas the FFT offers good frequency resolution but limited temporal resolution [3].
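The FFT side of this trade-off can be made concrete with a little arithmetic: for a frame of N samples at sampling rate fs, the bin spacing is fs/N Hz while the frame spans N/fs seconds, so halving one doubles the other. The sketch below assumes the 16 kHz sampling rate used later in Section 2.3; the frame lengths are illustrative.

```python
def stft_resolution(window_len: int, fs: float) -> tuple[float, float]:
    """Return (FFT bin spacing in Hz, frame duration in ms) for one analysis frame."""
    freq_res_hz = fs / window_len           # spacing between adjacent FFT bins
    time_res_ms = 1000.0 * window_len / fs  # time span covered by one frame
    return freq_res_hz, time_res_ms


FS = 16_000.0  # sampling rate assumed in Section 2.3
for n in (64, 128, 256, 512):
    df, dt = stft_resolution(n, FS)
    # df * dt is constant, so finer frequency bins always mean longer frames.
    print(f"N={n:4d}: bin spacing {df:6.1f} Hz, frame length {dt:5.1f} ms")
```

A fixed frame length applies the same compromise to every frequency, which is the limitation the wavelet approach below is meant to relax.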

The wavelet transform overcomes these limitations by providing good time and frequency resolution jointly [4]. The DWT is defined by

$$W(j,k) = \sum_{n} x(n)\, 2^{-j/2}\, \psi\!\left(2^{-j} n - k\right) \qquad (1)$$

where $\psi$ is the mother wavelet, $x(n)$ is the input signal, the factor $2^{-j}$ sets the scale, and the integer $k$ shifts the wavelet to different positions. As a result, the analysis adapts its resolution to each scale: short, well-localized wavelets at high frequencies and long, narrow-band wavelets at low frequencies, so that both temporal and frequency information are captured.
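To see how the scale index j and the shift k act in Eq. (1), the sketch below evaluates the sum directly, using the Haar function as the mother wavelet. Haar is an illustrative assumption here (the method in this paper uses db4); sampling ψ(2^{-j} n − k) on integer n turns each coefficient into a scaled difference between two adjacent blocks of 2^{j-1} samples.

```python
def haar_psi(t: float) -> float:
    """Haar mother wavelet: +1 on [0, 0.5), -1 on [0.5, 1), 0 elsewhere."""
    if 0.0 <= t < 0.5:
        return 1.0
    if 0.5 <= t < 1.0:
        return -1.0
    return 0.0


def dwt_coefficient(x: list[float], j: int, k: int, psi=haar_psi) -> float:
    """Evaluate Eq. (1) literally: W(j,k) = sum_n x(n) 2^(-j/2) psi(2^(-j) n - k)."""
    scale = 2.0 ** (-j)
    return sum(xn * 2.0 ** (-j / 2) * psi(scale * n - k) for n, xn in enumerate(x))


x = [1.0, 2.0, 3.0, 4.0, 4.0, 3.0, 2.0, 1.0]
for j in (1, 2, 3):
    # At scale j there are len(x) / 2^j non-overlapping positions k.
    coeffs = [dwt_coefficient(x, j, k) for k in range(len(x) // 2 ** j)]
    print(j, [round(c, 4) for c in coeffs])
```

Larger j stretches the wavelet over more samples (better frequency selectivity) and leaves fewer positions k (coarser time resolution), which is the scale-dependent trade-off described above.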

This paper is organized as follows. Section 2 describes the preprocessing (Section 2.1), the justification for choosing the wavelet approach and the selection of the wavelet-packet-decomposition tree (Section 2.2), and the signal analysis (Section 2.3). Section 3 presents the results, and Section 4 gives the conclusions.

2. Material and Methods

2.1. Preprocessing

The microphone converts the acoustic input into an electrical signal, and a gain controller is used to improve the quality of this signal. To pre-emphasize the high-frequency components corresponding to the consonant parts of speech, a first-order high-pass Butterworth filter with a cut-off frequency of 1.2 kHz is applied [5].
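A digital counterpart of this pre-emphasis stage can be sketched with the bilinear transform. This assumes the filter runs after sampling at 16 kHz (the rate used in Section 2.3), whereas the text places it before the A/D converter; the coefficients follow the standard first-order Butterworth high-pass design with frequency pre-warping, so the gain at 1.2 kHz is exactly −3 dB.

```python
import cmath
import math


def butter1_highpass(fc: float, fs: float) -> tuple[list[float], list[float]]:
    """First-order Butterworth high-pass via the pre-warped bilinear transform."""
    k = math.tan(math.pi * fc / fs)           # pre-warped analog cut-off
    b = [1.0 / (1.0 + k), -1.0 / (1.0 + k)]   # numerator: zero at DC (z = 1)
    a = [1.0, (k - 1.0) / (k + 1.0)]          # denominator: one real pole
    return b, a


def apply_first_order(b, a, x):
    """Direct form I: y[n] = b0 x[n] + b1 x[n-1] - a1 y[n-1]."""
    y, x_prev, y_prev = [], 0.0, 0.0
    for xn in x:
        yn = b[0] * xn + b[1] * x_prev - a[1] * y_prev
        y.append(yn)
        x_prev, y_prev = xn, yn
    return y


def gain_at(b, a, f, fs):
    """Magnitude response |H(e^{jw})| at frequency f in Hz."""
    z_inv = cmath.exp(-2j * math.pi * f / fs)
    return abs((b[0] + b[1] * z_inv) / (a[0] + a[1] * z_inv))


b, a = butter1_highpass(1200.0, 16000.0)  # 1.2 kHz cut-off, 16 kHz sampling (assumed)
print(round(gain_at(b, a, 1200.0, 16000.0), 4))  # -3 dB point, 0.7071
```

DC is blocked completely and the gain rises to 1 at the Nyquist frequency, so consonant energy above about 1.2 kHz is boosted relative to the vowel-dominated low band.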

The signal is then digitized by an A/D converter, and further filtering removes its DC component (Figure 1(a)).
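The paper does not specify the DC-removal filter. As an assumption, the sketch below uses a common one-pole DC blocker, y[n] = x[n] − x[n−1] + r·y[n−1], with a hypothetical pole radius r = 0.995; values closer to 1 give a narrower notch around 0 Hz.

```python
def remove_dc(x: list[float], r: float = 0.995) -> list[float]:
    """One-pole DC blocker: zero at z = 1, pole at z = r (r is a hypothetical default)."""
    y = []
    x_prev = x[0] if x else 0.0  # start from the first sample to avoid a start-up step
    y_prev = 0.0
    for xn in x:
        yn = xn - x_prev + r * y_prev
        y.append(yn)
        x_prev, y_prev = xn, yn
    return y


# Hypothetical input: a 0.5 offset plus an alternating component at fs/2.
signal = [0.5 + (-1) ** n for n in range(4000)]
clean = remove_dc(signal)
print(sum(clean[-1000:]) / 1000)  # mean of the settled output, close to zero
```

The offset decays away with time constant 1/(1 − r) samples, while components well above 0 Hz pass with a gain close to 1.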

2.2. Selection of the Wavelet-Packet-Decomposition Tree

The next stage is signal analysis, for which, as explained in the Introduction, we chose the wavelet transform. MATLAB provides several wavelet families for wavelet packet analysis, including Haar, Daubechies, Symlets, Coiflets, and Meyer. The main challenge is then to select the wavelet-packet-decomposition (WPD) tree that gives the best speech recognition after reconstruction, since each family has its own advantages. Daubechies wavelets provide a maximally flat response, whereas Symlets are more symmetrical [6].

In 2006, Nogueira et al. applied db3 and Haar wavelets to decompose the input signal. The results, tested on a group of seven patients, showed the significant efficiency of db3 [6]. Studies by Gopalakrishna et al. have shown that Daubechies (db4, six levels) and Symlets give similar results [3]. Based on these comparisons, we chose a seven-level db4 wavelet packet.

There are several reasons for choosing the db4 wavelet packet. Firstly, Daubechies wavelets have compact support, which yields a finite number of filter coefficients and a faster implementation. A short filter is preferred because it reduces the computational load and improves temporal localization. The filter length of a Daubechies wavelet is $2N$, where $N$ is the order, which is shorter than for some other families (e.g., $6N$ for Coiflets).

Secondly, for a given support width, Daubechies wavelets have the maximum number of vanishing moments ($N$ for db$N$), which gives smoother wavelets and a more compact representation of smooth signal segments. Therefore, low-amplitude signal components are less likely to be missed.

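Finally, Daubechies wavelets are orthogonal, so the reconstruction filters are time-reversed copies of the decomposition filters. The same filter coefficients therefore serve both directions, which enables fast, exact reconstruction of the speech signal.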

Thus, db4 meets our expectations for speech signal processing.
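The three properties above can be checked numerically. The sketch hard-codes the standard db4 scaling (low-pass) filter, normalized so its taps sum to √2, derives the wavelet (high-pass) filter by the quadrature-mirror relation g[n] = (−1)^n h[L−1−n], and verifies the filter length 2N = 8, the four vanishing moments, and exact reconstruction of one decomposition level that reuses the same coefficients for synthesis. Periodic boundary handling is an assumption made for brevity; toolboxes also offer other extension modes.

```python
import math

# Standard db4 scaling (low-pass) filter, normalized so that sum(H) = sqrt(2).
H = [0.23037781330885523, 0.7148465705525415, 0.6308807679295904,
     -0.02798376941698385, -0.18703481171888114, 0.030841381835986965,
     0.032883011666982945, -0.010597401784997278]
L = len(H)  # 2N taps for dbN, i.e. 8 for db4
# Wavelet (high-pass) filter from the quadrature-mirror relation.
G = [(-1) ** n * H[L - 1 - n] for n in range(L)]


def moment(filt, p):
    """Discrete p-th moment sum_n n^p * filt[n]; zero means a vanishing moment."""
    return sum((n ** p) * c for n, c in enumerate(filt))


def analyze(x):
    """One decomposition level: filter with H and G, downsample by 2, wrap periodically."""
    n = len(x)
    approx = [sum(H[m] * x[(2 * k + m) % n] for m in range(L)) for k in range(n // 2)]
    detail = [sum(G[m] * x[(2 * k + m) % n] for m in range(L)) for k in range(n // 2)]
    return approx, detail


def synthesize(approx, detail):
    """Inverse of analyze(): the same H and G applied as the transpose (orthogonality)."""
    n = 2 * len(approx)
    x = [0.0] * n
    for k, (a_k, d_k) in enumerate(zip(approx, detail)):
        for m in range(L):
            x[(2 * k + m) % n] += a_k * H[m] + d_k * G[m]
    return x


signal = [math.sin(0.3 * i) + 0.5 * math.cos(1.7 * i) for i in range(32)]
a, d = analyze(signal)
rebuilt = synthesize(a, d)
print("taps:", L)
print("moments of G:", [round(moment(G, p), 10) for p in range(4)])
print("max reconstruction error:", max(abs(u - v) for u, v in zip(signal, rebuilt)))
```

Because the transform is orthonormal (for an even signal length of at least L samples), synthesis needs no separate filter set, which is the practical meaning of "identical forward/backward filter parameters".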

After choosing the wavelet, we decomposed the signal into seven levels so that each channel covers the same frequency range as the corresponding channel of the Nucleus device.

The next stage is signal analysis as described in the next section.

2.3. Signal Analysis

Figure 1 shows a block diagram of the decomposition strategy. We applied a seven-level db4 wavelet packet decomposition, which generates a series of channels whose bandwidths exactly match those of the Nucleus device [7]. The corresponding phi ($W_0$) and psi ($W_1$–$W_6$) functions are shown in Figure 2.

Figure 3 shows the corresponding wavelet packet tree, in which the numbers next to the branches indicate the channel numbers. Adding certain nodes together produces the frequency range assigned to each channel. Node [7,0] contains the DC information and is not used in the calculations in this paper. Table 1 lists the frequency bands of the channels.
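The frequency range of a node follows from the sampling rate and the node's position in the tree. Under the idealized assumption that each split halves the band exactly (real db4 filters overlap somewhat), node [j, p] at 16 kHz spans a band 8000/2^j Hz wide. One subtlety: in the natural (filter-bank) ordering the high-pass branch mirrors the spectrum, so the natural index p is the Gray code of the frequency-ordered band index. The function names below, including the node-merging helper, are hypothetical.

```python
def gray_to_binary(g: int) -> int:
    """Invert the reflected binary Gray code."""
    b = 0
    while g:
        b ^= g
        g >>= 1
    return b


def node_band(level: int, p: int, fs: float = 16_000.0) -> tuple[float, float]:
    """Ideal frequency range (Hz) of wavelet packet node [level, p] in natural order."""
    width = (fs / 2) / 2 ** level   # each level halves the band
    band = gray_to_binary(p)        # natural index -> frequency-ordered index
    return band * width, (band + 1) * width


def channel_band(nodes, fs: float = 16_000.0) -> tuple[float, float]:
    """Range covered by merging spectrally adjacent nodes into one channel (hypothetical helper)."""
    bands = sorted(node_band(j, p, fs) for j, p in nodes)
    for (_, hi), (lo, _) in zip(bands, bands[1:]):
        if abs(hi - lo) > 1e-9:
            raise ValueError("nodes must be spectrally adjacent")
    return bands[0][0], bands[-1][1]


print(node_band(7, 0))  # (0.0, 62.5): the DC node that is excluded
```

For example, at level 2 the natural-order node [2,3] covers 4–6 kHz while [2,2] covers 6–8 kHz, so merging nodes by index alone without this decoding would assign the wrong bands to channels.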

We assume a sampling frequency of 16 kHz, a stimulation rate of 250–2400 pulses per second, and 22 channels corresponding to the 22 electrodes of the cochlear implant. Each decomposition level downsamples the signal further and therefore lowers the available stimulation rate. Depending on the patient's MAP, the output must be repeated or linearly interpolated to raise the stimulation rate [5].
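Each decomposition level halves the coefficient rate, so a node at level j is updated fs/2^j times per second: 125 Hz at level 7 for fs = 16 kHz, below the 250 pulses-per-second lower end of the stimulation range. The sketch computes that rate and raises it by linear interpolation; the 500 and 900 pps targets in the usage lines are hypothetical MAP settings.

```python
def node_rate(level: int, fs: float = 16_000.0) -> float:
    """Coefficient rate (samples per second) of a wavelet packet node at a given level."""
    return fs / 2 ** level


def upsample_linear(env: list[float], rate_in: float, rate_out: float) -> list[float]:
    """Resample an envelope to a higher pulse rate by linear interpolation."""
    if rate_out < rate_in:
        raise ValueError("this sketch only raises the rate")
    if len(env) < 2:
        raise ValueError("need at least two envelope samples")
    # Number of output pulses that fit in the span of the input samples.
    n_out = int((len(env) - 1) * rate_out / rate_in + 1e-9) + 1
    out = []
    for i in range(n_out):
        pos = i * rate_in / rate_out        # position in input-sample units
        k = min(int(pos), len(env) - 2)     # index of the left neighbour
        frac = pos - k
        out.append((1.0 - frac) * env[k] + frac * env[k + 1])
    return out


rate = node_rate(7)                                    # 125.0 coefficients per second
print(upsample_linear([0.0, 1.0], rate, 500.0))        # [0.0, 0.25, 0.5, 0.75, 1.0]
pulses = upsample_linear([0.0, 1.0, 0.5], rate, 900.0) # hypothetical 900 pps MAP
```

Simple repetition (zero-order hold) is the cheaper alternative mentioned in the text; interpolation gives a smoother pulse-amplitude sequence at the cost of one multiply-add per pulse.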

After the input frequency band has been decomposed into the 22 channels, the envelope of each band is detected. A second-order low-pass Butterworth filter with a cut-off frequency of 200–400 Hz is applied for smoothing [8, 9], as shown in Figure 1(b).
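Envelope detection can be sketched as full-wave rectification followed by the smoothing filter. The biquad coefficients are the standard pre-warped bilinear-transform design of a second-order Butterworth low-pass. Two assumptions belong to this sketch rather than to the paper: the filter runs at the full 16 kHz rate on a band signal (not on decimated node coefficients), and the cut-off is 200 Hz, chosen from the quoted 200–400 Hz range.

```python
import math


def butter2_lowpass(fc: float, fs: float):
    """Second-order Butterworth low-pass via the pre-warped bilinear transform."""
    k = math.tan(math.pi * fc / fs)
    norm = 1.0 / (1.0 + math.sqrt(2.0) * k + k * k)
    b0 = k * k * norm
    b = [b0, 2.0 * b0, b0]
    a = [1.0, 2.0 * (k * k - 1.0) * norm, (1.0 - math.sqrt(2.0) * k + k * k) * norm]
    return b, a


def envelope(x: list[float], fs: float, fc: float = 200.0) -> list[float]:
    """Full-wave rectify, then smooth with the biquad (direct form I)."""
    b, a = butter2_lowpass(fc, fs)
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for xn in x:
        r = abs(xn)  # full-wave rectification
        yn = b[0] * r + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        out.append(yn)
        x2, x1 = x1, r
        y2, y1 = y1, yn
    return out


fs = 16_000.0
tone = [math.sin(2 * math.pi * 1000.0 * n / fs) for n in range(3200)]  # 0.2 s, 1 kHz
env = envelope(tone, fs)
print(round(env[-1], 3))  # settles near the mean of the rectified tone, about 0.63
```

The 2 kHz ripple left by rectifying a 1 kHz tone is attenuated by roughly 40 dB at this cut-off, so the output tracks the slowly varying amplitude that is passed on to channel selection.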

Then, an $n$-of-$m$ strategy is used to select the 10 channels with the largest amplitudes out of the 22. These 10 channels carry the largest share of the signal energy, and not all of the other channels are needed to provide a proper hearing sensation [10].
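Per analysis frame, the selection reduces to picking the indices of the n largest channel amplitudes. In the minimal sketch below, the tie-breaking rule (lower channel index wins) and the index-sorted output are assumptions; the amplitude frame in the usage line is made up.

```python
def select_n_of_m(amplitudes: list[float], n: int = 10) -> list[int]:
    """Indices of the n channels with the largest amplitudes in one frame.

    Ties are broken in favour of the lower channel index (an assumption); the
    result is returned sorted by channel index.
    """
    ranked = sorted(range(len(amplitudes)), key=lambda i: (-amplitudes[i], i))
    return sorted(ranked[:n])


frame = [float(i % 7) for i in range(22)]  # hypothetical envelope values for m = 22 channels
print(select_n_of_m(frame, 10))            # [3, 4, 5, 6, 11, 12, 13, 18, 19, 20]
```

Only the selected channels are stimulated in that frame, so the remaining 12 envelopes are simply not passed on to amplitude mapping.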

Finally, amplitude matching is needed to map the decomposed signal onto the dynamic range of the human ear. A nonlinear logarithmic function is used for this purpose [11], as shown in Figure 1(c).
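The exact function of [11] is not reproduced here. As an assumption, the sketch uses a logarithmic compression p = log(1 + ρv)/log(1 + ρ), where v is the envelope normalized between a base level and a saturation level, and then places p linearly between the patient's threshold (T) and comfort (C) levels. The steepness ρ, the base and saturation levels, and the T/C values are all hypothetical.

```python
import math


def log_compress(v: float, base: float, sat: float, rho: float = 400.0) -> float:
    """Map an envelope value to [0, 1] with a logarithmic growth curve (rho is hypothetical)."""
    v_norm = min(max((v - base) / (sat - base), 0.0), 1.0)  # clip to [0, 1]
    return math.log1p(rho * v_norm) / math.log1p(rho)


def to_current_level(v, base, sat, t_level, c_level, rho=400.0):
    """Place the compressed value between threshold (T) and comfort (C) levels."""
    return t_level + (c_level - t_level) * log_compress(v, base, sat, rho)


# Hypothetical MAP: T = 100 and C = 200 current units, envelope range 0.01..1.0.
for v in (0.0, 0.05, 0.2, 1.0):
    print(v, round(to_current_level(v, 0.01, 1.0, 100.0, 200.0), 1))
```

The compression spends most of the narrow electrical range on low envelope values, which is why a logarithmic rather than linear mapping is used.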

3. Results

After preprocessing the input signal and applying the WPD, the signal was analyzed as described in Section 2.3. The decomposed signal was then reconstructed to test the accuracy of the method.

The cross-correlation between the reconstructed and original signals exceeded 90% (depending on the type of input signal). This shows that the technique can represent a high percentage of the original signal for the implant user even though the information in some nodes (those with a small share of the signal energy) is discarded.
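The paper does not state how the cross-correlation figure is normalized. A common choice, assumed here, is the normalized cross-correlation (the cosine similarity of the two waveforms), optionally maximized over a few lags to absorb any filter delay; `max_lag` and the test signal are hypothetical.

```python
import math


def normalized_xcorr(x: list[float], y: list[float], max_lag: int = 0) -> float:
    """Peak normalized cross-correlation of x and y over lags in [-max_lag, max_lag].

    1.0 at zero lag means y is a positively scaled copy of x.
    """
    if len(x) != len(y):
        raise ValueError("signals must have equal length")
    norm = math.sqrt(sum(v * v for v in x) * sum(v * v for v in y))
    if norm == 0.0:
        raise ValueError("signals must not be all zeros")
    best = -math.inf
    for lag in range(-max_lag, max_lag + 1):
        # Pair x[n] with y[n + lag], skipping samples that fall outside y.
        s = sum(x[n] * y[n + lag] for n in range(len(x)) if 0 <= n + lag < len(y))
        best = max(best, s / norm)
    return best


original = [math.sin(0.05 * n) * math.exp(-0.001 * n) for n in range(400)]
reconstructed = [0.8 * v for v in original]  # hypothetical: a scaled copy
print(f"{100 * normalized_xcorr(original, reconstructed):.1f}%")  # 100.0%
```

Because the measure ignores overall gain, a value above 0.9 indicates that the waveform shape survives decomposition and reconstruction, not that the levels match.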

The algorithm's complexity must also be low to minimize power consumption, which is important because it prolongs the battery life of the speech processor. Table 2 compares the approximate number of multiplications required by various speech processing techniques. The results show that the complexity of the wavelet approach used in this study is significantly lower than that of the other methods.
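The wavelet-packet part of such a count can be estimated without the figures of Table 2. Assuming direct convolution with an L-tap filter in which only the retained (decimated) outputs are computed, splitting a node at level j costs L/2^j multiplications per input sample, so a full tree of depth D costs L·D per input sample (56 for db4 over 7 levels); pruned branches, as when nodes are merged into wider channels, cost less. Envelope detection, selection, and mapping are not counted in this sketch.

```python
def wpd_mults_per_sample(leaves, taps: int = 8) -> float:
    """Multiplications per input sample for a wavelet packet tree with the given leaves.

    leaves: iterable of (level, index) nodes in natural order. Every ancestor of
    a leaf is split once; splitting node (j, p) runs two taps-long filters at the
    output rate fs / 2**(j + 1), i.e. taps / 2**j multiplications per input sample.
    """
    internal = set()
    for level, index in leaves:
        for j in range(level):
            internal.add((j, index >> (level - j)))  # ancestor of this leaf at depth j
    return sum(taps / 2 ** j for j, _ in internal)


full_tree = [(7, p) for p in range(2 ** 7)]
print(wpd_mults_per_sample(full_tree))                   # 56.0 for db4 over 7 levels
print(wpd_mults_per_sample([(1, 1), (2, 0), (2, 1)]))    # 12.0 for a small pruned tree
```

The cost per level stays constant because each level doubles the number of nodes while halving their rate, which keeps the total linear in the tree depth.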

4. Conclusion

In this paper, we presented a wavelet-based technique that decomposes the input signal into different frequency bands. For the decomposition, we chose a seven-level db4 wavelet packet decomposition (WPD) and generated a series of channels whose bandwidths exactly match those of the Nucleus device. Reconstruction of the decomposed signal showed that the technique processes the signal with lower complexity than other methods while preserving over 90% of the original signal in terms of cross-correlation. This makes the strategy a good choice for cochlear implant users with respect to both accuracy and power saving.