Abstract

In recent years, wireless sensor networks have been applied widely and with great success. However, security is a critical issue in many scenarios, ranging from covert military operations to the organization of social unrest. Because traditional encryption methods easily arouse suspicion, an adaptive audio steganography method is proposed. The method is based on interval and variable low bit coding and can be applied to covert wireless communication. The interval for embedding secret messages into the audio file and the thresholds in variable low bit coding are used to select the embedding location and the embedding bits adaptively; thus the embedding capacity and the embedding rate are variable. Experimental results demonstrate that the proposed method performs better in embedding rate and invisibility than other audio steganography methods.

1. Introduction

Nowadays, wireless sensor networks (WSNs) and multimedia are receiving increasing attention from both the academic and industrial communities. They hold the promise of facilitating large-scale, real-time data processing, including video and audio, in complex real-time multimedia environments, as well as multimedia content retrieval and object detection [1–3]. The security of real-time communication is a critical issue that must be resolved. This paper focuses on real-time audio communication. Most often, the security of WSNs relies on cryptography techniques, which render the content of a message garbled to unauthorized people [4]. However, it is well known that cryptography methods make people aware of the existence of secret information. Hence, steganography, which is the process of embedding secret messages into a cover signal to avoid illegal detection [5], was introduced to ensure the safe transmission of secret information and to authenticate the multimedia data in WSNs [6, 7]. A verifiable diversity ranking search scheme over encrypted outsourced data has also been proposed; it preserves privacy in cloud computing and supports verification of the search results [8].

In recent years, many methods have been proposed for steganography and steganalysis, such as coverless information hiding [9] and steganography and steganalysis based on deep learning [10, 11]. However, using an audio signal as the cover signal remains a challenging problem, because the Human Auditory System (HAS) is very sensitive. Typical existing audio steganography methods can be divided into time domain methods, transform domain methods, compressed domain methods, and phase methods. Most time domain methods are based on the least significant bits (LSB) of the audio data [12]. The algorithm performs scrambling pretreatment on the secret watermarking information using a logistic chaotic map to make the watermarking bits garbled. The public audio signal is transformed by the discrete wavelet transform (DWT), so that the secret information is embedded into the selected wavelet coefficients with multiresolution. The original carrier is not required for watermark recovery. Another time domain method is echo hiding [13]. Because a weak signal is inaudible after a strong signal disappears, an echo is introduced to achieve information hiding. This method is robust but has high computational complexity, low capacity, and low extraction accuracy. The transform domain methods first apply the discrete Fourier transform (DFT), the discrete cosine transform (DCT), the DWT, or a combination of the three, and then select certain frequency coefficients in which to embed the secret message. These methods have good robustness but low capacity and relatively complex calculation [14, 15]. The compressed domain methods are suitable for compressed audio files. These methods are not only difficult to implement but also have limited hiding capacity [16]. Because the HAS is sensitive to relative phase but not to absolute phase, methods based on phase coding have been proposed. The secret message is embedded by replacing the absolute phase of the original audio with the relative phase. These methods are robust, but the capacity is limited and the calculation is more complicated [17, 18]. Depending on the application scenario, a method that strikes a reasonable balance among hiding capacity, invisibility, and robustness is expected.

Several data embedding algorithms are based on LSB. Lowest bit coding embeds secret data only in the LSB. This method minimizes the transition and obtains an embedding capacity of up to 12.5% of the wav file [19]. The parity coding method breaks a signal down into separate regions of samples instead of individual samples. If the secret bit to be encoded does not match the parity bit of the sample region, the method flips the LSB of one sample in the region [20]. In another method, an exclusive OR (XOR) operation is first performed on the LSB, and the LSB of the sample is then modified or kept unchanged according to the result of the XOR operation and the message bit to be embedded [21]. In bit selection, different bits are selected to hide the secret data in each sample. For instance, the first two most significant bits (MSB) of a sample are used for bit selection and only the first three LSB are used for data embedding [22]. A parameter R, which is an interval, is set for embedding the bit stream of a secret message into the bytes of audio packets. Depending on the value of R, the secret message is embedded into every 2 bytes or every 3 bytes of the audio stream [23]. In sample selection, only a few samples are used for data hiding instead of all the samples. For instance, the first three MSB are used to select the next sample for embedding the secret bits [22], or a Fibonacci sequence is used to select the samples for data hiding [24]. In the average amplitude method, the average amplitude of the surrounding audio data is used as a threshold. If the amplitude level is bigger than the average value, 2 LSB are used for embedding; otherwise the secret data is not embedded [19]. Variable low bit coding is an improved version of lowest bit coding that can increase the embedding capacity. Because the sound is silent in the middle range of the audio data, data cannot be embedded in the middle range. Two thresholds are defined based on the standard level, which is calculated from the middle range. The thresholds are used for selecting the embedding bits.

Many improved LSB methods combine several algorithms. In [25], a dual random LSB method is proposed by combining Huffman coding with RSA encryption. The method embeds the data in a variable number of LSB depending on the MSB of the cover audio samples, and the use of Huffman coding increases the capacity. In [26], for each sample, the third and fourth LSB are replaced with the secret message, and the second and fifth LSB are altered by an intelligent algorithm so that the change to the stego sample is minimized. In [27], the audio samples are 16 bits. The secret message is embedded into the coefficients of a cover audio. Each secret bit is embedded into a selected position of a cover coefficient, and the positions are selected from the 0th to the 7th LSB based on the MSB. In [28], a wav-audio steganography algorithm based on modifying amplitude is proposed. Sampling points are grouped into sets of three successive points and the amplitude values are calculated. The secret message is embedded by modifying the amplitude value of the second sampling point after comparing it with the average value of the first and third sampling points. In [29], a novel reversible natural language watermarking method combines arithmetic coding and synonym substitution operations. The original context can be perfectly recovered by decompressing the extracted compressed data and substituting the replaced synonyms with their original synonyms.

In this paper, an improved audio steganography method for covert wireless communication is proposed by combining variable low bit coding with embedding intervals. Unlike steganography methods based on Voice over Internet Protocol (VoIP) [30], the proposed method is applied to non-real-time audio data in covert wireless communication, where hiding capacity, instantaneity, and security are the main concerns. The method performs well in terms of embedding rate, hiding capacity, and invisibility.

The rest of the paper is organized as follows. Variable low bit coding and the interval setting are described in Section 2. In Section 3, the proposed method is presented in detail. The experiments and the analysis of the results are shown in Section 4. The conclusion is given in Section 5.

2.1. Variable Low Bit Coding

Variable low bit coding is an improved LSB method that can increase the capacity. The middle range of the data represents silence. If the audio file is quantized with 16 bits per sample, the audio data ranges from -32768 to 32767, and the sound is silent when the audio data is zero. If the audio file is quantized with 8 bits per sample, the audio data ranges from 0 to 255; the middle of the range is 128, and the sound is silent around that value. Because embedding data into silent sound will reveal the secret data, data cannot be embedded in the middle range. By calculating the standard level, two thresholds, a lower one and an upper one, are set. If the amplitude value is smaller than the lower threshold, no secret data is embedded; if it is between the two thresholds, one bit is used for data embedding; if it is bigger than the upper threshold, two bits are used for data embedding.
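The threshold rule of classic variable low bit coding can be illustrated with a minimal sketch for 8-bit samples. The function name, the parameter names t_low and t_high, and the threshold values used in the usage note are hypothetical; the text does not give them, and the handling of values exactly equal to a threshold is an assumption.

```python
def classic_bits_for_sample(sample: int, t_low: int, t_high: int,
                            midpoint: int = 128) -> int:
    """Return how many low bits classic variable low bit coding would use.

    The amplitude is taken as the distance of the 8-bit sample from the
    silent midpoint. Boundary handling (< versus <=) is an assumption.
    """
    amplitude = abs(sample - midpoint)
    if amplitude < t_low:
        return 0  # near silence: nothing is embedded
    if amplitude <= t_high:
        return 1  # moderate amplitude: one bit
    return 2      # loud sample: two bits
```

With illustrative thresholds of 8 and 32, a silent sample (128) carries no bits, a sample of 140 carries one bit, and a sample of 200 carries two.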

2.2. The Interval Setting

Parameter R is set as the interval for embedding the secret message into the audio file, which means that R bits of the secret message are embedded into a sample when the interval is R.

Figure 1 shows the embedding of the secret message with different values of R. Suppose that the audio file has 8 bits per sample. Figure 1(a) shows that when R = 1, the traditional LSB method is used. The bits of the secret message are embedded into the LSB of each sample, so 1 byte of message is embedded into 8 bytes of audio signal. Figure 1(b) illustrates that when R = 2, 2 bits of the secret message are embedded into the current sample and one sample is skipped before the next embedding. In this case, 1 byte of secret message is embedded into 7 bytes of the audio file. Figure 1(c) illustrates that when R = 3, 3 bits of the secret message are embedded into the current sample and two samples are skipped before the next embedding. Thus 3 bytes of the secret message are embedded into 22 bytes of the audio data.
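The byte counts quoted for Figure 1 can be checked with a short sketch. It assumes that the three cases correspond to R = 1, 2, and 3, that each carrier sample holds R bits, and that R - 1 samples are skipped between consecutive carriers; the function name is hypothetical.

```python
import math


def cover_samples_needed(message_bits: int, r: int) -> int:
    """Number of 8-bit samples spanned when r bits go into each carrier
    sample and r - 1 samples are skipped between consecutive carriers."""
    carriers = math.ceil(message_bits / r)       # samples that hold data
    return carriers + (carriers - 1) * (r - 1)   # plus the skipped samples


for r, message_bytes in [(1, 1), (2, 1), (3, 3)]:
    print(r, cover_samples_needed(8 * message_bytes, r))
```

Under these assumptions the loop reports 8, 7, and 22 samples, matching the three cases described for Figure 1.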

3. The Proposed Method

In this section, the new audio steganography method is described in detail. The proposed method adaptively selects the embedding location and the embedding bits by setting the interval and using variable low bit coding. Wav files are used as the audio files, because the wav file format is a subset of Microsoft’s RIFF specification for the storage of multimedia files and is widely used in the Windows operating system. A wav file includes two parts, the header and the audio data. The first 44 bytes of the audio file are the header and the rest are the data. Because the header is fixed, the secret data is embedded into the audio data, not into the header. The framework of the proposed method is shown in Figure 2.
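As a minimal sketch of the header and data split, the following uses Python's standard wave module to build a small 8-bit mono file in memory and then separates the first 44 bytes from the rest. The helper names are hypothetical. The 44-byte length holds for the canonical PCM layout assumed in the text; wav files that carry additional chunks have longer headers and would need real chunk parsing.

```python
import io
import wave
from typing import Tuple

HEADER_SIZE = 44  # canonical PCM wav header length assumed in the text


def split_wav(raw: bytes) -> Tuple[bytes, bytes]:
    """Split raw wav bytes into (header, audio data) at the 44-byte mark."""
    return raw[:HEADER_SIZE], raw[HEADER_SIZE:]


def make_test_wav(samples: bytes, rate: int = 11025) -> bytes:
    """Build an 8-bit mono wav file in memory (illustration only)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)    # mono
        w.setsampwidth(1)    # 8 bits per sample
        w.setframerate(rate)
        w.writeframes(samples)
    return buf.getvalue()


cover = bytes([128, 130, 126, 140, 120, 128, 128, 129])
header, data = split_wav(make_test_wav(cover))
```

Only the data part would be passed to the embedding step; the header is written back unchanged.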

3.1. Improved Variable Low Bit Coding

The LSB method embeds secret data only in the LSB. The variable low bit coding method improves on lowest bit coding and can increase the embedding capacity. In this paper, variable low bit coding is further improved by combining it with the embedding interval, which means that not every audio sample is used for embedding the secret information. 8-bit mono wav files with a sampling frequency of 11.025 kHz are selected as the carriers. Two thresholds, a lower one and an upper one, are set. If the absolute value of the difference between 128 and the audio data is less than the lower threshold, then one LSB is used for data embedding; if it is between the two thresholds, then two bits are used for data embedding; if it is more than the upper threshold, then three bits are used for data embedding.
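The improved rule can be sketched as two small helpers: one that maps a sample to a bit count and one that overwrites that many low bits. The names, the threshold values in the usage note, and the treatment of values exactly equal to a threshold are assumptions made for illustration.

```python
def bits_for_sample(sample: int, t_low: int, t_high: int,
                    midpoint: int = 128) -> int:
    """Improved rule: every sample can carry 1, 2, or 3 bits."""
    distance = abs(sample - midpoint)
    if distance < t_low:
        return 1
    if distance <= t_high:
        return 2
    return 3


def replace_low_bits(sample: int, bits: str) -> int:
    """Overwrite the len(bits) lowest bits of an 8-bit sample."""
    k = len(bits)
    return (sample >> k << k) | int(bits, 2)
```

For example, with illustrative thresholds of 8 and 32, a sample of 200 lies 72 away from 128 and therefore carries three bits; writing "001" into it gives 201.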

3.2. The Embedding Procedure

Parameter R is set as the interval for embedding the secret message into the audio file, as shown in Figure 1. According to the improved variable low bit coding method, the two thresholds select the embedding bits of a sample. Therefore, by combining the interval with the thresholds, the proposed method can adaptively select the embedding location and the embedding bits. The embedding procedure is shown in Figure 3.

The detailed steps are presented as follows.
Input: a cover audio file and a secret message S.
Output: a stego audio file.
Step 1: Read the cover audio file in binary form.
Step 2: Sample the binary audio file every 8 bits and put the samples into an array.
Step 3: Input the secret message S to be embedded.
Step 4: Encrypt S with the AES algorithm to obtain B, and put B into an array in the form of a binary sequence.
Step 5: Calculate the length L of B.
Step 6: Set the lower and upper thresholds according to the middle range of the audio data.
Step 7: For each visited sample, compute the absolute value of the difference between the audio data and 128, and compare it with the two thresholds.
If the difference is less than the lower threshold, set R = 1, replace the LSB of the sample with the next message bit, and put the sample index into the first key array.
Else if the difference is between the two thresholds, set R = 2, replace the 2 LSB of the sample with the next two message bits, and put the sample index into the second key array.
Else set R = 3, replace the 3 LSB of the sample with the next three message bits, and put the sample index into the third key array.
The loop then advances by R samples and ends when all L bits have been embedded.
Step 8: Form the modified audio samples into the stego audio signal, and send the hiding key, which includes the three key arrays, to the receiver.
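The loop of Step 7 can be sketched as follows. This is one reading of the procedure, not the authors' implementation: it assumes that R is set to the number of bits just embedded and that the loop then advances by R samples, which reproduces the index pattern 45, 46, 49, 51 of the example in Section 4.1. The AES step is omitted (the input is taken to be the already encrypted bit string), the names and threshold values are hypothetical, and zero-padding of a short final chunk is an added assumption.

```python
from typing import Dict, List, Tuple


def embed(samples: List[int], bits: str, t_low: int, t_high: int,
          start: int = 0, midpoint: int = 128
          ) -> Tuple[List[int], Dict[int, List[int]]]:
    """Embed a bit string into 8-bit samples.

    Returns the stego samples and three key arrays, stored in a dict
    keyed by the number of bits embedded (1, 2, or 3).
    """
    stego = list(samples)
    keys: Dict[int, List[int]] = {1: [], 2: [], 3: []}
    i, pos = start, 0
    while pos < len(bits):
        if i >= len(stego):
            raise ValueError("cover audio is too short for the message")
        distance = abs(stego[i] - midpoint)
        # The thresholds choose the bit count; R takes the same value.
        if distance < t_low:
            r = 1
        elif distance <= t_high:
            r = 2
        else:
            r = 3
        # Assumption: a short final chunk is padded with zeros, and the
        # receiver truncates to the known message length L.
        chunk = bits[pos:pos + r].ljust(r, "0")
        stego[i] = (stego[i] >> r << r) | int(chunk, 2)
        keys[r].append(i)  # record the carrier index in the key array
        pos += r
        i += r             # skip R - 1 samples before the next carrier
    return stego, keys


# Invented cover values; only indices 45, 46, 49, and 51 are visited.
cover = [128] * 60
cover[45], cover[46], cover[49], cover[51] = 130, 200, 150, 110
stego, keys = embed(cover, "10011011", t_low=8, t_high=32, start=45)
```

Because the key arrays record which samples carry how many bits, the receiver does not need to recompute the threshold comparison on the modified samples.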

3.3. The Extracting Procedure

The extracting procedure is an inverse process of the embedding procedure and is shown in Figure 4.

The detailed steps are described as follows.
Input: the stego audio file.
Output: the secret message.
Step 1: Read the stego audio file in binary form.
Step 2: Sample the binary audio file every 8 bits and put the samples into an array.
Step 3: Extract the secret message according to the three hiding key arrays, and put the extracted bits into a message array. Calculate the lengths of the three key arrays; then, for each index in the first key array, extract the LSB of the corresponding sample; for each index in the second key array, extract the 2 LSB; and for each index in the third key array, extract the 3 LSB.
Step 4: Decrypt the extracted bits to obtain the corresponding secret information.
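A minimal sketch of Step 3 follows. It assumes the key arrays are stored in a dict keyed by bit count and that the extracted chunks are reassembled in order of sample index; the text does not state how the bit order is restored, so that merge is an assumption, as are the names and the sample values.

```python
from typing import Dict, List


def extract(stego: List[int], keys: Dict[int, List[int]], length: int) -> str:
    """Recover the embedded bit string from stego samples and key arrays."""
    # Pair every recorded index with its bit count, then sort by index.
    carriers = sorted((idx, k) for k, idxs in keys.items() for idx in idxs)
    chunks = []
    for idx, k in carriers:
        low_bits = stego[idx] & ((1 << k) - 1)   # keep the k lowest bits
        chunks.append(format(low_bits, f"0{k}b"))
    return "".join(chunks)[:length]              # drop any padding bits


# Invented stego values at the carrier indices of the example.
stego = [128] * 60
stego[45], stego[46], stego[49], stego[51] = 131, 201, 150, 111
message = extract(stego, {1: [45], 2: [49, 51], 3: [46]}, length=8)
```

The recovered bit string would then be passed to AES decryption, which is outside this sketch.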

4. Experiments and Results Analysis

The program is implemented with socket and multithreaded programming techniques, compiled with Microsoft Visual C++ 2013, and run on Windows 7.

The program is composed of two parts, the ordinary wireless communication and the application of steganography. The sending node and the receiving node transmit audio data in which the secret information is embedded and from which it is extracted. For simplicity, both the sending node and the receiving node are implemented in one program. The communicating parties only need to install the program on their PCs and enter the host’s IP; they can then communicate with each other by hiding information. Because the wav format is widely used, wav-audio files are used as the steganography carriers. We collected 50 mono wav files from the WSNs. The sampling rate is 11.025 kHz with 8 bits per sample. The size of the audio files varies from 39 KB to 652 KB.

4.1. An Example

The following example illustrates the embedding and extracting procedures. Suppose that the encrypted message to be embedded is “10011011” and that the cover audio file is put into an array after sampling and quantization. The cover audio data is shown in Table 1.

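With the two thresholds set, the embedding proceeds as follows. At sample 45, the difference from 128 is below the lower threshold, so the LSB is replaced by “1” and “45” is put into the first key array. At sample 46, the difference is above the upper threshold, so the 3 LSB are replaced by “001” and “46” is put into the third key array. At sample 49, the difference lies between the two thresholds, so the 2 LSB are replaced by “10” and “49” is put into the second key array. At sample 51, the difference again lies between the two thresholds, so the 2 LSB are replaced by “11” and “51” is put into the second key array. The stego audio data is shown in Table 2.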

When the receiver obtains the stego audio file and the hiding key, the receiver can extract the secret message. The extracted secret bits are put into a message array.

According to the first key array, the receiver obtains “1” from the LSB of sample 45.

According to the second key array, the receiver obtains “10” and “11” from the 2 LSB of samples 49 and 51.

According to the third key array, the receiver obtains “001” from the 3 LSB of sample 46.

Finally, the receiver can obtain the message “10011011” and then decrypt it to obtain the secret message.

4.2. Selecting the Thresholds

According to the embedding procedure, two thresholds are set around the middle value 128 to select the embedding bits. If the absolute value of the difference between the audio data and 128 is lower than the lower threshold, one bit is embedded; if it is between the two thresholds, two bits are embedded; if it is more than the upper threshold, three bits are embedded. Therefore, the thresholds influence both the embedding capacity and the quality of the stego audio. The lower threshold should not be too big: if it is very big, nearly every sample carries a single bit and the method may become the traditional LSB method. In the experiments, the embedding rate is set to 0.1 bps. Figure 5 compares the signal-to-noise ratio (SNR) under different values of one threshold while the other is fixed at each of three settings, two of which are 20 and 30. As Figure 5 shows, for the first setting the SNR grows slowly beyond a certain point; for the second setting the growth beyond a certain point is slower still; and for the third setting the SNR grows very slowly. The thresholds used in the simulation experiments are chosen accordingly.

4.3. The Embedding Rate

The embedding rate is the amount of secret information that can be embedded per byte of cover audio data. For instance, in the LSB method, 1 byte of secret message is embedded in 8 bytes of carrier data; thus the embedding rate is 12.5%. In the proposed method, the embedding location and embedding bits are selected adaptively, and thus the embedding rate is variable. When R = 1, it is 12.5%; when R = 2, it is 14.3%; when R = 3, it is 13.6%. Therefore, the average embedding rate is 13.5%. Table 3 compares the embedding rate of the proposed method with those of other LSB methods.
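The three rates follow from the byte counts given for Figure 1, assuming its three cases correspond to R = 1, 2, and 3: 1 message byte per 8 cover bytes, 1 per 7, and 3 per 22. A worked check of the arithmetic, including the unweighted average over the three cases:

```latex
\[
\frac{1}{8} = 12.5\%, \qquad
\frac{1}{7} \approx 14.3\%, \qquad
\frac{3}{22} \approx 13.6\%, \qquad
\frac{12.5\% + 14.3\% + 13.6\%}{3} \approx 13.5\%.
\]
```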

4.4. The Embedding Capacity

The proposed method is an adaptive embedding algorithm, and the embedding capacity is related to the two thresholds, the size of the wav-audio file, the sampling rate, and the quantization value. Because audio data is one of the most important covers in wireless communication, a large amount of audio data is available, and the embedding capacity is therefore considerable.

In variable low bit coding methods, the thresholds determine the embedding capacity. If most of the audio data are around 128, the embedding capacity will be very low. In the proposed method, by contrast, even when most of the audio data are around 128, the embedding capacity is higher than that of the LSB methods.

4.5. Imperceptibility Analysis

SNR, mean square error (MSE), bit error rate (BER), and the waveform are widely used to evaluate the imperceptibility of audio steganography methods. Suppose x is the original audio and y is the stego audio, each with the same number of sampling points.

SNR is calculated by (3). Higher SNR means better invisibility of the algorithm.

MSE is defined as the mean square error between the cover audio and the stego audio. MSE measures the distortion in the audio and is calculated by (4). Lower MSE means better performance of the algorithm.

BER describes the ratio of the number of modified bits in the stego audio to the number of bits in the original audio. Lower BER means better imperceptibility of the algorithm.

Table 4 shows the average values of SNR, MSE, and BER at different embedding rates. The results indicate that, at different embedding rates, SNR stays high while MSE and BER stay low. In Figure 6, the proposed method is compared with the LSB method [19], the Variable Low Bit Coding method [20], and the bit selection method [22] in terms of SNR, MSE, and BER. In Figure 6(a), the SNR value of the proposed method is lower than that of [19] and higher than those of [20, 22]. This means that the invisibility of the proposed method is lower than that of [19] and better than that of [20, 22]. In Figure 6(b), the MSE value of the proposed method is the same as that of [19] and lower than those of [20, 22]. This means that the proposed method performs better than [20, 22] and the same as [19]. In Figure 6(c), the BER values of the four methods are the same, which means that they have the same imperceptibility. In short, the results demonstrate that the proposed method has good imperceptibility.
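The three metrics can be sketched in a few lines. The equations referenced in the text are not reproduced here, so the sketch uses the commonly used definitions, which are assumed (not confirmed) to match the paper's: SNR as ten times the base-10 logarithm of the ratio of signal energy to error energy, MSE as the mean squared sample difference, and BER as the fraction of differing bits. The function names and sample values are hypothetical.

```python
import math
from typing import Sequence


def mse(x: Sequence[int], y: Sequence[int]) -> float:
    """Mean square error between cover samples x and stego samples y."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


def snr_db(x: Sequence[int], y: Sequence[int]) -> float:
    """Signal-to-noise ratio in dB (common definition, assumed here)."""
    signal = sum(a ** 2 for a in x)
    noise = sum((a - b) ** 2 for a, b in zip(x, y))
    if noise == 0:
        return math.inf    # identical signals
    if signal == 0:
        return -math.inf   # all-zero cover with a nonzero error
    return 10 * math.log10(signal / noise)


def ber(x: Sequence[int], y: Sequence[int], bits_per_sample: int = 8) -> float:
    """Fraction of bits that differ between x and y."""
    changed = sum(bin(a ^ b).count("1") for a, b in zip(x, y))
    return changed / (len(x) * bits_per_sample)


cover = [128, 130, 200, 150]
stego = [128, 131, 201, 150]
print(mse(cover, stego), ber(cover, stego), snr_db(cover, stego))
```

In this toy case two samples change by one LSB each, so the MSE is 0.5 and the BER is 2/32.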

Figure 7 shows the waveforms of the original audio and the stego audio when the embedding rate is 0.1 bps. The difference between the two is very small, which also indicates that the imperceptibility of the proposed method is satisfactory.

5. Conclusion

In this paper, an adaptive audio steganography method for wireless communication has been presented, based on interval and variable low bit coding. The interval for embedding the secret message into the audio file and the two thresholds in variable low bit coding are used to select the embedding location and the embedding bits adaptively. The embedding capacity and the embedding rate are variable. The proposed method is simple and fast, so it can meet the instantaneity requirement of wireless communication. In the future, we will investigate other low bit-rate speech codecs applicable to wireless communication and design new steganographic algorithms.

Data Availability

The software code and data used to support the findings of this study are available from the corresponding author upon request. Correspondence should be addressed to Yuling Liu.

Conflicts of Interest

The four authors, Guojiang Xin, Yuling Liu, Ting Yang, and Yu Cao, all declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China under Grant no. 61103215 and Hunan Provincial Natural Science Foundation of China under Grant no. 2018JJ2062.