Audio Streaming with Silence Detection Using 802.15.4 Radios
Short-range, low-data-rate radios are gaining popularity due to their wide commercial availability, and multimedia is a natural and attractive application field for them. Carrying audio over 802.15.4-compliant radios, however, is a challenging task. This paper describes a real-time implementation of audio communication using 802.15.4 radios. Silence detection and software-based ADPCM are the main features of our work. Our results show that silence detection improves bandwidth efficiency and audio communication performance over low bit-rate radios.
Owing to attractive features such as wireless connectivity, easy and quick installation, and low-energy operation, wireless technologies are making inroads into various application domains. The major wireless technologies are WiFi (Wireless Local Area Network over IEEE 802.11a/b/g), Bluetooth (Wireless Personal Area Network, WPAN, over IEEE 802.15.1), and LRWPAN (Low Rate Wireless Personal Area Network over IEEE 802.15.4). WiFi, designed for longer range and higher data rates, has already made its impact on local area networking and wireless data networks. Bluetooth was primarily designed for short-range wireless connectivity and has found effective application in computer and consumer peripheral connectivity. Bluetooth is the most complicated of these protocols, with 188 primitives and events in total; LRWPAN, on the other hand, is the simplest, with only 48 primitives defined in 802.15.4. LRWPAN is an emerging technology for small, low-data-rate networks such as wireless sensor networks (WSN), home automation, security surveillance systems, precision agriculture, medical health care networks, and sensitive military applications. Given this broad coverage of application domains, it has become important to explore multimedia streaming over LRWPAN. Audio transport over LRWPAN is an important area that opens many possibilities in applications such as wildfire detection, habitat monitoring, and environmental monitoring. In particular, it will be useful in distributed surveillance, emergency, and rescue scenarios, where audio and video streaming over low-cost LRWPAN networks is highly desirable. The challenges of multimedia over WSN have been surveyed in prior work, whose authors note that the high bandwidth demand and strict timing constraints of multimedia communication make it significantly harder for sensor networks to operate within their limited energy and processing capacities.
This work presents a case study of a real-world implementation of audio streaming over an LRWPAN short-range radio network, specifically 802.15.4 devices, with real-time software-based ADPCM (Adaptive Differential Pulse Code Modulation) compression, silence detection, and data synchronization. We investigate the problems of software ADPCM implementation and the effects of data buffering and silence detection on important performance metrics such as jitter and bandwidth efficiency. Real-time results show that LRWPAN makes a strong case for audio streaming applications when it is combined with techniques that meet the timing restrictions of real-time data streaming and improve bandwidth efficiency.
Section 2 discusses related work and explains how this work differs from it; the same section also discusses challenges, issues, and their probable solutions. Section 3 presents implementation details and useful tips for application developers and researchers alike. Section 4 discusses real-time results with performance metrics for audio streaming. Section 5 concludes with remarks based on the presented work.
2. Previous Work and Multimedia Streaming over 802.15.4
A few implementations of real-time audio data streaming have been reported. Firefly is a TDMA-based, 42-hop, two-way audio streaming implementation used in a coal mine. Its authors used an AM transceiver as external hardware for time synchronization, which draws an additional 5 mA. Another study analyzed the reliability of two-way radio communication over 802.15.4 devices in a multi-hop scenario; its authors enhanced the TDMA-based TETRA (Terrestrial Trunked Radio) protocol, used SimpliciTI as the low-power RF network protocol, and performed audio data compression with an external vocoder. A third work presented an audio streaming implementation over a ZigBee network, investigating the capability of ZigBee protocols for low-rate voice streaming with packet loss as the performance metric. That implementation uses an external programmable ADPCM processor providing an adjustable audio rate from 16 Kbps to 64 Kbps.
This work differs from the TDMA-based implementations above in that we use CSMA/CA rather than TDMA. TDMA requires either additional hardware or an additional time-synchronization protocol, and both place an extra burden on the system in terms of energy and the complexity of the synchronization protocol and its messages. In investigating the capacity of ZigBee networks for audio streaming, the authors used an external codec; we instead use inline software-based ADPCM compression, which saves the additional compression chip and the energy it would consume. Finally, we implement silence detection and message synchronization, which, to our knowledge, have not previously been reported in this setting.
2.1. Challenges and Issues of Multimedia Streaming over LRWPAN
2.1.1. Bandwidth and Data Rate for Audio
The raw data rate of 802.15.4 radios is 250 Kbps. With acknowledgments and the short addressing scheme, the useful bandwidth is no more than 55.6% of this, giving a maximum usable bandwidth of 139 Kbps. Audio streaming with an 8 kHz sampling rate, a 12-bit sample size, and a 1:3 compression ratio requires 32 Kbps (8000 × 12 / 3 bits per second). Packet size is a further restriction: the maximum data payload is 114 bytes, and in practice it is smaller still because of other signaling and network-related data. A 108-byte payload carries 216 samples of 4-bit audio data, that is, 27 ms of audio. The packet transmission time at full packet load is 6.56 ms.
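The arithmetic in the budget above can be checked with a few integer helpers. This is a minimal sketch using only the figures quoted in the text (250 Kbps, 55.6%, 114-byte payload, 8 kHz, 12-bit samples, 4-bit ADPCM codes); the macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Figures quoted in the text; names are illustrative only. */
#define RAW_RATE_BPS      250000u /* 802.15.4 raw data rate                */
#define USABLE_PERMILLE   556u    /* 55.6% usable with ACK + short address */
#define MAX_PAYLOAD_BYTES 114u    /* maximum data payload per packet       */
#define SAMPLE_RATE_HZ    8000u   /* audio sampling rate                   */
#define ADC_BITS          12u     /* raw sample size                       */
#define ADPCM_BITS        4u      /* compressed sample size (1:3 ratio)    */

/* Usable application throughput in bit/s. */
static uint32_t usable_bps(void) {
    return RAW_RATE_BPS / 1000u * USABLE_PERMILLE;
}

/* Bit rate of an 8 kHz audio stream with the given bits per sample. */
static uint32_t stream_bps(uint32_t bits_per_sample) {
    return SAMPLE_RATE_HZ * bits_per_sample;
}

/* Number of 4-bit ADPCM samples that fit in one payload. */
static uint32_t samples_per_payload(uint32_t payload_bytes) {
    assert(payload_bytes <= MAX_PAYLOAD_BYTES);
    return payload_bytes * 8u / ADPCM_BITS;
}

/* Audio duration carried by one payload, in microseconds. */
static uint32_t payload_audio_us(uint32_t payload_bytes) {
    return samples_per_payload(payload_bytes) * 1000000u / SAMPLE_RATE_HZ;
}
```

With these numbers the 32 Kbps ADPCM stream alone occupies about 23% of the 139 Kbps usable capacity, before protocol headers and retransmissions are counted.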
2.1.2. Data Compression in Audio
ADPCM is simpler than advanced low bit-rate vocoders and can complete encoding and decoding in a relatively short time. The cited author implemented ADPCM on an 8-bit microcontroller, whereas the other cited works used an external CODEC for compression. ADPCM encodes the difference between a predicted sample (derived from previous samples) and the current sample; because this difference is usually much smaller than the sample itself, it can be represented with fewer bits. The prediction adapts dynamically to the changing signal so that the mean squared prediction error is continually minimized. The compression ratio depends on the number of bits used to represent the difference relative to the number of bits per original sample. Code sizes of 2, 3, and 4 bits give data rates of 16 Kbps, 24 Kbps, and 32 Kbps, respectively, at 8 kHz sampling. This implementation uses a 4-bit ADPCM code size, as it gives satisfactory voice quality.
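The text does not name a specific ADPCM variant, so the sketch below assumes the widely used IMA/DVI 4-bit ADPCM algorithm as an illustration of the technique: each code holds a sign bit and three magnitude bits of the prediction error, scaled by a step size that adapts after every sample. The 12-bit to 16-bit mapping helpers are hypothetical and only show how ADC values could be fed into a 16-bit codec.

```c
#include <assert.h>
#include <stdint.h>

/* Standard IMA/DVI ADPCM step-size table (89 entries). */
static const int16_t step_table[89] = {
    7, 8, 9, 10, 11, 12, 13, 14, 16, 17,
    19, 21, 23, 25, 28, 31, 34, 37, 41, 45,
    50, 55, 60, 66, 73, 80, 88, 97, 107, 118,
    130, 143, 157, 173, 190, 209, 230, 253, 279, 307,
    337, 371, 408, 449, 494, 544, 598, 658, 724, 796,
    876, 963, 1060, 1166, 1282, 1411, 1552, 1707, 1878, 2066,
    2272, 2499, 2749, 3024, 3327, 3660, 4026, 4428, 4871, 5358,
    5894, 6484, 7132, 7845, 8630, 9493, 10442, 11487, 12635, 13899,
    15289, 16818, 18500, 20350, 22385, 24623, 27086, 29794, 32767
};

/* Step-index adjustment for each 4-bit code (the sign bit does not matter). */
static const int8_t index_table[16] = {
    -1, -1, -1, -1, 2, 4, 6, 8, -1, -1, -1, -1, 2, 4, 6, 8
};

/* Codec state; the encoder and the decoder each keep one copy. */
typedef struct {
    int32_t predicted; /* last reconstructed sample (signed 16-bit range) */
    int32_t index;     /* current position in step_table */
} adpcm_state;

/* Reconstruct from a 4-bit code and adapt the step size. Both sides run
   exactly this update, so they stay in lock-step unless codes are lost. */
static void adpcm_update(adpcm_state *s, uint8_t code) {
    int32_t step = step_table[s->index];
    int32_t diffq = step >> 3;
    if (code & 4) diffq += step;
    if (code & 2) diffq += step >> 1;
    if (code & 1) diffq += step >> 2;
    s->predicted += (code & 8) ? -diffq : diffq;
    if (s->predicted > 32767) s->predicted = 32767;
    if (s->predicted < -32768) s->predicted = -32768;
    s->index += index_table[code];
    if (s->index < 0) s->index = 0;
    if (s->index > 88) s->index = 88;
}

/* Encode one signed 16-bit sample into a 4-bit code. */
static uint8_t adpcm_encode(adpcm_state *s, int16_t sample) {
    int32_t diff = (int32_t)sample - s->predicted;
    int32_t step = step_table[s->index];
    uint8_t code = 0;
    if (diff < 0) { code = 8; diff = -diff; }
    if (diff >= step) { code |= 4; diff -= step; }
    step >>= 1;
    if (diff >= step) { code |= 2; diff -= step; }
    step >>= 1;
    if (diff >= step) { code |= 1; }
    adpcm_update(s, code);
    return code;
}

/* Decode one 4-bit code back to a signed 16-bit sample. */
static int16_t adpcm_decode(adpcm_state *s, uint8_t code) {
    assert(code < 16);
    adpcm_update(s, code);
    return (int16_t)s->predicted;
}

/* Hypothetical mapping between the unsigned 12-bit converter range
   (0x000..0xfff, midscale 0x800) and signed 16-bit PCM. */
static int16_t adc12_to_pcm16(uint16_t adc) {
    return (int16_t)(((int32_t)adc - 2048) * 16);
}
static uint16_t pcm16_to_12bit(int16_t pcm) {
    return (uint16_t)(((int32_t)pcm + 32768) >> 4);
}
```

Two 4-bit codes pack into each payload byte, which is where the "108 bytes = 216 samples" figure comes from. Because the decoder only tracks the encoder's state, a lost packet leaves the two states out of step, which is one source of the glitches discussed below.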
2.1.3. Silence Processing
Receiver-based error recovery schemes make aggressive use of insertion techniques, in which lost data packets are replaced with prestored silence packets or background noise packets [4, 9]. Silence can also be detected and processed at the transmitting side, which benefits low bit-rate radio communication, since 50%–60% of a normal conversation is silence. Detecting silence and sending it in encoded form has two basic advantages: first, it reduces the data size and hence bandwidth usage; second, it reduces ADPCM compression time.
2.1.4. Performance Parameters and Energy Considerations
The important streaming metrics, such as throughput, packet loss, and jitter, have been considered in [2, 4, 10]. This implementation examines these parameters in conjunction with silence detection and processing.
The implementation uses the Jennic JN5139 microcontroller kit. This chip contains a 12-bit ADC, an 11-bit DAC, and a fully compliant 2.4 GHz IEEE 802.15.4 transceiver. A separate audio signal-processing circuit interfaces the microphone and earphone. The analog signal is a single-ended output in the range 0–3 V.
Figure 1 shows the block diagram of the setup for audio communication between an end node and the master sink device. An 8 kHz timer is used to sample the audio through the built-in 12-bit ADC, whose conversion time is 52 µs (26 clock cycles per sample at a 500 kHz clock). The 8 kHz interrupt captures audio data and fills the input buffers D1 and D2. Two buffers are used so that while the first, once full, is being encoded and transmitted, the other is filled by the interrupt-driven capture process. Each full buffer is passed through silence detection and encoding: non-silent data is encoded with the software ADPCM algorithm and written to the compressed data buffer C-buf, while silence is transferred directly to the compressed buffer. The compressed buffer is then packetized, ready for transmission to the destination device.
At the receiving device, each received packet is decoded with the software ADPCM decoder, and silence data is then filled in at the indicated location with the supplied length. Two buffers are also used at the receiver, so that while one buffer is being played out to the DAC at 8 kHz, the other is free to accept received packet data.
All experiments used a 20-second audio file played for transmission, and were carried out by analysing the packets generated and received at each end device.
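The two-buffer (ping-pong) capture scheme described above can be sketched as follows. This is an illustrative sketch, not the authors' firmware: the type and function names are hypothetical, the 8 kHz timer interrupt is simulated by calling `pp_isr` directly, and the 180-sample buffer size is borrowed from the measurements in Section 4. On real hardware, `pp_take` would also need interrupts masked while it reads and clears `ready`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BUF_SAMPLES 180u  /* illustrative buffer size */

typedef struct {
    uint16_t buf[2][BUF_SAMPLES]; /* D1 and D2 */
    volatile uint8_t  fill;       /* index of the buffer the ISR is filling */
    volatile uint16_t pos;        /* write position within that buffer */
    volatile int8_t   ready;      /* index of a full buffer, or -1 if none */
    volatile uint32_t overruns;   /* full buffers the main loop did not take in time */
} pingpong;

static void pp_init(pingpong *p) {
    p->fill = 0; p->pos = 0; p->ready = -1; p->overruns = 0;
}

/* Called once per 8 kHz timer tick with one ADC sample. */
static void pp_isr(pingpong *p, uint16_t sample) {
    assert(p->pos < BUF_SAMPLES);
    p->buf[p->fill][p->pos++] = sample;
    if (p->pos == BUF_SAMPLES) {
        if (p->ready != -1) p->overruns++; /* previous buffer still unprocessed */
        p->ready = (int8_t)p->fill;        /* hand the full buffer to the main loop */
        p->fill ^= 1u;                     /* keep capturing into the other one */
        p->pos = 0;
    }
}

/* Called from the main loop; returns a full buffer for silence
   detection and encoding, or NULL if none is ready yet. */
static const uint16_t *pp_take(pingpong *p) {
    int8_t r = p->ready;
    if (r < 0) return NULL;
    p->ready = -1;
    return p->buf[r];
}
```

The overrun counter makes the real-time requirement explicit: the main loop must finish encoding and sending one buffer before the other fills, which at 8 kHz is 22.5 ms for 180 samples.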
The decompressed signal can produce glitches at values close to the upper and lower bounds of the ADC range (near 0x000 and 0xfff). Glitches arise when the ADPCM state goes out of sync because of packet loss, or from very small or very large ADPCM-coded values; they can be smoothed out with a soft threshold filter that limits values at the upper and lower bounds.
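The text does not specify the soft threshold filter, so the following is only one possible interpretation: a soft-knee limiter that passes decoded values unchanged inside a band and attenuates excursions beyond it 4:1, so a decoder spike is pulled back from the rails instead of hitting 0x000 or 0xfff. The knee points `KNEE_LO`/`KNEE_HI` and the 4:1 ratio are hypothetical choices.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ADC_MAX 0x0FFF  /* 12-bit full scale */
#define KNEE_LO 0x0100  /* hypothetical lower knee */
#define KNEE_HI 0x0EFF  /* hypothetical upper knee */

/* Soft-knee limiter: values in [KNEE_LO, KNEE_HI] pass unchanged; the part
   beyond a knee is divided by 4, then the result is clamped to 0..ADC_MAX.
   Takes int32_t so out-of-range decoder output can be handled too. */
static uint16_t soft_limit(int32_t x) {
    if (x > KNEE_HI) x = KNEE_HI + (x - KNEE_HI) / 4;
    else if (x < KNEE_LO) x = KNEE_LO - (KNEE_LO - x) / 4;
    if (x > ADC_MAX) x = ADC_MAX;
    if (x < 0) x = 0;
    return (uint16_t)x;
}

/* Apply the limiter to a decoded buffer before it is sent to the DAC. */
static void soft_limit_buffer(const int32_t *in, uint16_t *out, size_t n) {
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) out[i] = soft_limit(in[i]);
}
```

For in-range input the output stays between 192 and 3903, so it never reaches the rails; normal speech levels around midscale are untouched.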
We use a single-ended 0–3.3 V analog signal as input to the 12-bit ADC, which digitizes it into the range 0x000 to 0xfff. Values in the range 0x7f0 to 0x80f are treated as silence: a continuous run of values within this range is assumed to be a silence signal, and the rest of the signal is compressed. Each packet carries a pointer to the start of the silence values and the size of the silence run, along with the remaining compressed values.
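A minimal sketch of the sender and receiver sides of this scheme follows, assuming one signalled silence run per buffer (the longest one) as the description of a single start pointer and size suggests. The silence band 0x7f0..0x80f comes from the text; `MIN_RUN`, the refill value `SIL_FILL`, and all names are hypothetical. The non-silent samples returned by `strip_silence` are what would go to the ADPCM encoder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SIL_LO   0x7F0u  /* silence band from the text */
#define SIL_HI   0x80Fu
#define MIN_RUN  16u     /* hypothetical: shortest run worth signalling */
#define SIL_FILL 0x800u  /* hypothetical value written back at the receiver */

typedef struct {
    uint16_t start;  /* index of the first silent sample in the buffer */
    uint16_t length; /* number of silent samples (0 = none signalled) */
} silence_run;

static int is_silent(uint16_t s) { return s >= SIL_LO && s <= SIL_HI; }

/* Sender: find the longest contiguous silent run; runs shorter than
   MIN_RUN are not signalled and are sent as ordinary audio. */
static silence_run find_silence(const uint16_t *x, uint16_t n) {
    silence_run best = {0, 0};
    uint16_t i = 0;
    while (i < n) {
        if (!is_silent(x[i])) { i++; continue; }
        uint16_t j = i;
        while (j < n && is_silent(x[j])) j++;
        if (j - i > best.length) { best.start = i; best.length = (uint16_t)(j - i); }
        i = j;
    }
    if (best.length < MIN_RUN) { best.start = 0; best.length = 0; }
    return best;
}

/* Sender: copy the samples outside the run, in order, into rest[]. */
static uint16_t strip_silence(const uint16_t *x, uint16_t n, silence_run r, uint16_t *rest) {
    uint16_t k = 0;
    for (uint16_t i = 0; i < n; i++)
        if (i < r.start || i >= r.start + r.length) rest[k++] = x[i];
    return k;
}

/* Receiver: rebuild the buffer, inserting r.length fill samples at r.start. */
static uint16_t restore_silence(const uint16_t *rest, uint16_t m, silence_run r, uint16_t *out) {
    uint16_t k = 0, o = 0;
    assert(r.start <= m);
    for (; o < r.start; o++) out[o] = rest[k++];
    for (uint16_t i = 0; i < r.length; i++) out[o++] = SIL_FILL;
    while (k < m) out[o++] = rest[k++];
    return o;
}
```

This is lossy by design: silent samples come back as a constant fill value rather than their original low-level noise, which is why the band is kept narrow around midscale.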
4. Results and Discussion
4.1. Conversion Time and Power Consumption
Table 1 shows the power consumption and per-sample conversion time of some codec chips that produce ADPCM output. The CPU used in this work is a 32-bit, 16 MIPS processor; its power consumption in active mode, with the ADC/DAC and transceiver idle, is 92 mW. Any external codec chip would therefore add substantially to the energy budget of the design. The total conversion time per sample is 69 µs: 52 µs for ADC conversion, as mentioned above, and 17 µs for software ADPCM encoding. These two parameters support our choice of a software-based ADPCM implementation. ADPCM conversion times were averaged over a number of samples. Without silence detection, ADPCM conversion took 3.1 ms for a 180-sample buffer; with silence detection, the observed conversion time ranged from a minimum of 1 ms to a maximum of 3 ms, with an average of 2.5 ms.
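As a rough real-time check on these figures, assuming the ADC conversion and ADPCM encoding of each sample are serialized within one sample period at 8 kHz:

```latex
\begin{aligned}
T_s &= \frac{1}{8\,\text{kHz}} = 125\,\mu\text{s} \\
t_{\text{ADC}} + t_{\text{ADPCM}} &= 52\,\mu\text{s} + 17\,\mu\text{s} = 69\,\mu\text{s} < T_s \\
180 \times t_{\text{ADPCM}} &= 180 \times 17\,\mu\text{s} = 3.06\,\text{ms} \approx 3.1\,\text{ms} \\
180 \times T_s &= 22.5\,\text{ms}
\end{aligned}
```

Encoding a full 180-sample buffer therefore takes about 14% of the time needed to fill it, and the 17 µs per-sample figure is consistent with the measured 3.1 ms per buffer.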
Jitter is one of the most important parameters to control for smooth audio streaming over the network, and it has two main sources. The first is dropped audio packets caused by inadequate buffer size and unsynchronized packet sending; this can be handled by proper buffer-size calculation and an additional buffer with synchronized packet sending. The second is variation in packet arrival rate due to network conditions, which is difficult to handle in a wireless network because of the unpredictable behavior of the wireless medium and congestion in the network.
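The text does not define how its "% jitter" is computed. As one standard way to quantify the second source, arrival-time variation, the sketch below uses the interarrival jitter estimator from RFC 3550 (Section 6.4.1). It is a swapped-in reference technique, not the paper's metric, and it assumes each packet carries a send timestamp; the struct and function names are hypothetical.

```c
#include <assert.h>

/* Running interarrival-jitter estimate as defined in RFC 3550, 6.4.1.
   Send and arrival times must use the same unit (e.g. microseconds);
   a constant clock offset between sender and receiver cancels out. */
typedef struct {
    int    have_prev;     /* 0 until the first packet has been seen */
    double prev_transit;  /* arrival - send time of the previous packet */
    double jitter;        /* smoothed jitter estimate J */
} jitter_est;

static void jitter_update(jitter_est *j, double send_time, double arrival_time) {
    assert(j);
    double transit = arrival_time - send_time;
    if (j->have_prev) {
        double d = transit - j->prev_transit;  /* D(i-1, i) */
        if (d < 0) d = -d;
        j->jitter += (d - j->jitter) / 16.0;   /* gain of 1/16 per RFC 3550 */
    }
    j->prev_transit = transit;
    j->have_prev = 1;
}
```

Packets that arrive with constant spacing leave the estimate at zero; a single 16-unit deviation raises it by one unit, and the 1/16 gain smooths out isolated spikes. A receiver could size its playout buffer from a multiple of this estimate.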
Figure 2 plots % jitter against the number of buffered samples. Jitter drops markedly as more samples are buffered, and adding silence processing to the same setup reduces it further. However, once adequate buffering of audio samples is in place, silence processing offers little additional jitter reduction.
4.3. Bandwidth Efficiency and Communication Load
Figure 3 shows the average packet size reduction due to silence processing. With a packet size of 50 bytes (100 samples), the average packet size reduction is 62%; with 75 bytes (150 samples per packet), it is 68%; and with 90 bytes (180 samples), it is 69%. In each case the buffer size equals the packet size. As the packet size increases, each packet holds more samples, which makes silence processing more efficient.
The effect of silence processing on the audio data can be clearly seen in Figure 4. The average data rate sent without silence processing is 31476 bits/second, while with silence processing it is 13000 bits/second. Figure 4 also shows the resulting bandwidth utilization.
Bandwidth utilization was calculated as the share of the 250 Kbps channel rate occupied by the transmitted audio data:
$$U = \frac{R_{\text{sent}}}{250\ \text{Kbps}} \times 100\%$$
The communication bandwidth use decreased from an average of 13% without silence processing to an average of 5.2% with it, owing to the sharp drop in the data to be sent over the channel. This is a very useful feature when carrying bandwidth-hungry audio over low-bandwidth 802.15.4 radios. In particular, the QoS of real-time audio transmission can be improved by applying tight congestion control and packet recovery techniques, which require extra bandwidth that can now be made available.
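The utilization figures can be reproduced with integer arithmetic, interpreting the Figure 4 averages as bits per second: 31476 bits/s is 12.6% of 250 Kbps and 13000 bits/s is exactly 5.2%, matching the quoted 13% and 5.2%. This is a small sketch with hypothetical function names; results are truncated, not rounded.

```c
#include <assert.h>
#include <stdint.h>

#define CHANNEL_BPS 250000u  /* nominal 802.15.4 data rate */

/* Channel utilization in hundredths of a percent (e.g. 520 = 5.20%). */
static uint32_t utilization_bp(uint32_t sent_bps) {
    assert(sent_bps <= CHANNEL_BPS);
    return (uint32_t)((uint64_t)sent_bps * 10000u / CHANNEL_BPS);
}

/* Percentage reduction in offered load when silence processing is enabled. */
static uint32_t reduction_pct(uint32_t without_bps, uint32_t with_bps) {
    assert(without_bps > 0 && with_bps <= without_bps);
    return (without_bps - with_bps) * 100u / without_bps;
}
```

On these averages, silence processing removes roughly 58% of the offered load, in line with the 50%–60% share of silence in normal conversation noted in Section 2.1.3.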
This paper presents audio data transmission over LRWPAN, a low-bandwidth wireless network. Audio transport over LRWPAN is difficult because of the tight timing constraints of real-time audio streaming over low-rate wireless radios. The focus of the work is twofold: first, to reduce energy consumption with a software implementation of ADPCM, saving external hardware for signal sampling and data compression; and second, to perform silence processing at the sender, reducing the amount of data sent over the communication channel. Jitter can be reduced by buffering at the receiver side, and adding silence processing did not introduce significant delay at the sender side. The results show a significant reduction in bandwidth utilization due to silence processing, thereby enhancing the capacity for real-time voice streaming over low-rate wireless networks.
E. Gürses and Ö. B. Akan, “Multimedia communication in wireless sensor networks,” Annals of Telecommunications, vol. 60, pp. 799–827, 2005.
D. Brunelli, M. Maggiorotti, L. Benini, and F. L. Bellifemine, “Analysis of audio streaming capability of Zigbee networks,” in Proceedings of the 5th European Conference on Wireless Sensor Networks (EWSN '08), R. Verdone, Ed., pp. 189–204, Springer, Berlin, Germany, 2008.
B. Latré, P. De Mil, I. Moerman, N. Van Dierdonck, B. Dhoedt, and P. Demeester, “Maximum throughput and minimum delay in IEEE 802.15.4,” Lecture Notes in Computer Science, vol. 3794, pp. 866–876, 2005.
J. A. Kang and H. K. Kim, “Adaptive redundant speech transmission over wireless multimedia sensor networks based on estimation of perceived speech quality,” Sensors, vol. 11, pp. 8469–8484, 2011.
R. Alesii, F. Graziosi, L. Pomante, and C. Rinaldi, “Exploiting WSN for audio surveillance applications: the VoWSN approach,” in Proceedings of the 11th EUROMICRO Conference on Digital System Design Architectures, Methods and Tools (DSD '08), pp. 520–524, 2008.