Research Article  Open Access
Detecting Steganography of Adaptive Multirate Speech with Unknown Embedding Rate
Abstract
Steganalysis of adaptive multirate (AMR) speech is a significant research topic for preventing cybercrimes based on steganography in mobile speech services. Unlike state-of-the-art works, this paper focuses on steganalysis of AMR speech with unknown embedding rate, for which we present three schemes based on support vector machines (SVMs). The first two schemes evolve from existing image steganalysis schemes and adopt different global classifiers. One is trained on a comprehensive speech sample set including original samples and steganographic samples with various embedding rates, while the other is trained on a particular speech sample set containing original samples and steganographic samples with uniform distributions of embedded information. Further, we present a hybrid steganalysis scheme, which employs Dempster–Shafer theory (DST) to fuse all the evidence from multiple specific classifiers and provide a synthesized detection result. All the steganalysis schemes are evaluated using a well-selected feature set based on statistical characteristics of pulse pairs and compared with the optimal steganalysis that adopts specialized classifiers for the corresponding embedding rates. The experimental results demonstrate that all three steganalysis schemes are feasible and effective for detecting the existing steganographic methods with unknown embedding rates in AMR speech streams, while the DST-based scheme outperforms the others overall.
1. Introduction
Steganography is an ancient but effective technique for covert communications that hides confidential messages in seemingly innocent carriers with imperceptible distortion. Although its history can be dated back to 440 BC [1], its candidate carriers have been evolving ceaselessly over the years [2]. In recent years, the steganographic carriers have developed from images [3, 4] to almost all media forms (e.g., video [5, 6], audio [7, 8], text [9, 10], network protocols [11, 12], and Voice over IP [13–16]). However, steganography is a double-edged sword. Illegal usage of this technique would facilitate cybercrime activities and thereby pose a great threat to information security. Thus, its countermeasure, steganalysis, whose purpose is to detect potential steganographic behaviors effectively, has also been attracting considerable attention [17–25].
In today’s mobile world, the adaptive multirate (AMR) codec has become a well-known and important compression standard for speech coding and has been widely employed not only in 3G and 4G speech services [26–28] but also in various mobile instant messaging apps (such as WhatsApp, Snapchat, LINE, and WeChat). Moreover, it is also a popular file format for storing AMR-encoded spoken audio, supported by almost all mobile communication devices. Due to its increasing popularity and broad influence in mobile communications, AMR speech is naturally considered an ideal carrier by the steganographic research community, and some relevant studies have been successfully performed [29–33].
AMR is a typical codec based on an algebraic code-excited linear prediction algorithm, in which algebraic codebook indices (ACIs), also called fixed codebook indices (FCIs), occupy a large percentage of each speech frame [26–28]. Taking the AMR speech codec at 12.2 kbps mode [28] as an example, 140 of the 244 frame bits are allocated to FCIs, so FCIs account for a large proportion (57.38%) of all frame bits [33]. Therefore, they are popularly regarded as good candidates for steganographic carriers in the existing studies [29–33]. Geiser and Vary [29] first incorporated information hiding into speech coding of the AMR codec by modifying the fixed-codebook-search algorithm. Specifically, two secret bits can be hidden in a track pulse by limiting the search range of the second FCI to two of eight candidate values. Their experimental results demonstrate that this method can offer a steganographic bandwidth of 2 kbit/s for the AMR speech codec at 12.2 kbps mode, while guaranteeing an imperceptible impact on speech quality and fairly small computational complexity. Moreover, following a similar idea, Miao et al. [30] proposed an adaptive suboptimal pulse combination constrained method for steganography in the AMR speech stream. Its main advantage over the previous method is that it enables regulation of the steganographic capacity by introducing an embedding factor $\eta$. For example, for the AMR speech codec at 12.2 kbps mode, $\eta$ can typically be set to 1, 2, or 4, so the steganographic bandwidths are correspondingly 1, 2, or 3 kbit/s [32, 33]. It has been demonstrated that, by choosing a suitable $\eta$, this method can achieve a good tradeoff between the distortion of speech quality and the embedding capacity [30].
To prevent potential cybercrimes based on the above steganographic methods, some steganalysis studies have accordingly been conducted. Miao et al. [31] first presented two steganalysis methods for AMR speech. One is a Markov-based method that adopts Markov transition probabilities to evaluate the relationship between pulse positions in each track, while the other is an entropy-based method that employs the joint entropy and the conditional entropy to measure the uncertainty of pulse positions [31]. However, these two kinds of statistical features are not accurate enough for characterizing AMR speech, because they ignore the fact that the pulse positions may often be interchanged in the AMR encoding process [33]. Moreover, Ren et al. [32] presented a steganalysis method called Fast-SPP, which employs probabilities of same pulse positions (SPP) as the features to detect the existing steganographic methods [29, 30]. However, the SPP features only reflect the distributions of two track pulses being in the same position, which are not comprehensive enough to characterize AMR speech [33]. In particular, if a steganographic method deliberately avoids the track pulses with the same positions and the ones that would become the same after the embedding operation, Fast-SPP could not detect any abnormalities [33]. Therefore, in our previous work [33], we presented more accurate and more complete features for steganalysis of AMR speech. To avoid the impact induced by possible interchange of pulse positions in each track, we employ the statistical features of pulse pairs to characterize AMR speech, including the probability distributions of pulse pairs reflecting the long-term distribution of speech signals, Markov transition probabilities of pulse pairs depicting the short-term invariant characteristic of speech signals, and joint probability matrices of pulse pairs characterizing the track-to-track correlation [33].
Moreover, to optimize the feature set as well as cut down the dimension, a feature selection mechanism using adaptive boosting (AdaBoost) [34–38] was designed. Employing the selected optimal feature set, a support-vector-machine (SVM) based steganalysis of AMR speech was presented. The experimental results show that the proposed method significantly outperforms the previous ones.
However, all the above steganalysis methods assume that the embedding rate (also called the usage rate of the cover, which is the ratio between the number of bits actually embedded and the total number of cover bits) of steganographic samples in a given test set is exactly known. In other words, they generally train specific classifiers for steganographic samples with predefined embedding rates, and each specialized classifier is expected to detect the steganographic samples with the corresponding embedding rate. Unfortunately, in practice, we usually cannot ascertain whether the steganographic operation has been performed on a given sample, let alone know the concrete embedding rate. Thus, it is necessary and significant to develop detection techniques for steganography with unknown embedding rate [39–41]. To the best of our knowledge, this paper is the first work dedicated to addressing this concern in the speech steganalysis field. In the image steganalysis field, however, some pioneering researchers have presented two useful schemes for detecting image steganography with unknown embedding rate. Both schemes adopt global classifiers based on a machine-learning algorithm (e.g., SVM) as the detectors, but the components of their training sets are different. Specifically, the training set of the first scheme includes original (untouched) samples and steganographic samples with various embedding rates [40, 41], while that of the other one consists of original samples and steganographic samples with uniform distributions of embedded data [40]. In this work, we first extend the two existing schemes to AMR speech steganalysis with unknown embedding rate, employing the state-of-the-art steganalysis features presented in our recent work [33]. Besides, incorporating Dempster–Shafer theory (DST) [42, 43], we further present a hybrid steganalysis scheme for AMR speech based steganography with unknown embedding rate.
DST, also called evidence theory, is a well-established framework for uncertain reasoning, which can fuse available evidence from different sources and achieve a level of belief (confidence; trust) by considering all of them [42–46]. The main idea behind the presented steganalysis scheme is to employ an algorithm based on DST to combine all the evidence from a set of classifiers intended for detecting steganographic approaches with specific embedding rates and accordingly provide a synthesized judgement on whether hidden information is present. All three steganalysis schemes are evaluated with a large number of AMR-encoded speech samples and compared with the optimal steganalysis that uses every specialized classifier to detect the steganography with the corresponding embedding rate. The experimental results show that all these steganalysis schemes are feasible and efficient for detecting the state-of-the-art steganographic methods with unknown embedding rates in AMR speech streams, while the DST-based scheme can achieve better detection performance than the other ones.
The remainder of this paper is organized as follows. To make this paper self-contained, Section 2 first reviews the state-of-the-art steganalysis features based on statistical characteristics of pulse pairs. Section 3 presents the three steganalysis schemes for detecting AMR speech based steganography with unknown embedding rate. Section 4 evaluates the performance of the three steganalysis schemes by a set of comprehensive experiments, which is followed by concluding remarks given in Section 5.
2. Steganalysis Features Based on Statistical Characteristics of Pulse Pairs
In this work, all the presented steganalysis schemes adopt the state-of-the-art detection features based on statistical characteristics of pulse pairs for AMR speech, which consist of long-term features, short-term features, and track-to-track features [33].
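The probability distributions of the pulse pairs are employed to depict the long-term features of AMR speech. Assume that the given AMR speech sample to be detected has $N$ subframes and each subframe contains $T$ tracks. For the $j$th track in the $i$th subframe, two pulse positions can be extracted as a pulse pair $(u_{i,j}, v_{i,j})$. For a pulse pair $(x, y)$ of the $j$th track, its probability (denoted by $P_j(x, y)$) of appearing in all subframes can be determined as follows:
$$P_j(x, y) = \frac{1}{N} \sum_{i=1}^{N} \delta\big( (u_{i,j} = x \,\&\, v_{i,j} = y) \mid (u_{i,j} = y \,\&\, v_{i,j} = x) \big), \tag{1}$$
where “&” is the binary AND operation, “|” is the binary OR operation, and $\delta(\cdot)$ is a characteristic function defined as follows:
$$\delta(A) = \begin{cases} 1, & \text{if } A \text{ is true}, \\ 0, & \text{otherwise}. \end{cases} \tag{2}$$
Let the number of candidate positions for every pulse in each track be $m$; the number of the possible pulse pairs in a track (denoted by $M$) is
$$M = \binom{m}{2} + m = \frac{m(m+1)}{2}. \tag{3}$$
Therefore, there are $T \times M$ possible pulse pairs in each subframe. That is to say, the dimension of the long-term feature set (LTFS) for pulse pairs is $T \times M$.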
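As a rough illustration of the long-term features, the following Python sketch counts unordered pulse pairs per track and normalizes by the number of subframes. The input layout (a list of subframes, each holding one `(u, v)` tuple of position indices per track) and the function names are assumptions made for this example; they are not taken from the original work.

```python
from itertools import combinations_with_replacement


def pair_list(m):
    """All unordered pairs (x, y) with x <= y over m candidate positions."""
    return list(combinations_with_replacement(range(m), 2))


def long_term_features(subframes, m):
    """Per-track probabilities of unordered pulse pairs.

    subframes: list of N subframes; each is a list of T (u, v) tuples,
    where u and v are position indices in range(m) (assumed layout).
    Returns a flat list of length T * M with M = m * (m + 1) // 2.
    """
    n_sub = len(subframes)
    n_tracks = len(subframes[0])
    pairs = pair_list(m)
    index = {p: k for k, p in enumerate(pairs)}
    counts = [[0] * len(pairs) for _ in range(n_tracks)]
    for sub in subframes:
        for j, (u, v) in enumerate(sub):
            # (u, v) and (v, u) denote the same pair: order before lookup.
            key = (u, v) if u <= v else (v, u)
            counts[j][index[key]] += 1
    return [c / n_sub for track in counts for c in track]
```

Ordering each pair before the lookup is what makes the feature insensitive to interchanged pulse positions within a track.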
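According to the short-term invariance of speech signals [47], the pulse pair of a track in the current subframe is bound to have a strong correlation with the one of the same track in the prior subframe [33]. In this sense, for the $j$th pulse pairs (i.e., the pulse pairs of the $j$th tracks) in all $N$ subframes, the sequence of pulse-position pairs $s_{1,j}, s_{2,j}, \ldots, s_{N,j}$ can be considered as a Markov chain. Accordingly, the Markov transition matrix (MTM) can be employed to describe the transitive correlation of pulse-pair states in the given track. Moreover, as a first-order Markov chain, the sequence satisfies
$$P(s_{i+1,j} \mid s_{i,j}, s_{i-1,j}, \ldots, s_{1,j}) = P(s_{i+1,j} \mid s_{i,j}). \tag{4}$$
In the $j$th tracks of all subframes, the probability that the pulse pair $b$ occurs after the pulse pair $a$ is
$$P_j(b \mid a) = \frac{\sum_{i=1}^{N-1} \delta(s_{i,j} = a \,\&\, s_{i+1,j} = b)}{\sum_{i=1}^{N-1} \delta(s_{i,j} = a)}, \tag{5}$$
where $\delta(\cdot)$ is the characteristic function defined in (2). Further, the MTM for the $j$th track (denoted by $\mathbf{A}_j$) can be determined as follows:
$$\mathbf{A}_j = \begin{bmatrix} P_j(z_1 \mid z_1) & \cdots & P_j(z_M \mid z_1) \\ \vdots & \ddots & \vdots \\ P_j(z_1 \mid z_M) & \cdots & P_j(z_M \mid z_M) \end{bmatrix}, \tag{6}$$
where $M$ is the number of all possible pulse-position pairs for the $j$th track that can be determined as (3); $z_k = (x, y)$ is the $k$th possible pulse-position pair for the $j$th track, where $x$ and $y$ are the potential pulse positions for the $j$th track. Moreover, assume that there are $m$ candidate positions for each pulse; $k$, $x$, and $y$ satisfy a one-to-one relation (7) that assigns each of the $M$ unordered pairs $(x, y)$ a unique index $k \in \{1, 2, \ldots, M\}$.
Since there are $M$ possible pulse-position pairs in each track, the size of each MTM is $M \times M$. Taking the MTMs of all tracks into account, the dimension of the feature set would be very large. However, the characteristics of all the MTMs are similar. Therefore, we adopt the average Markov transition probabilities (MTPs) as the steganalysis features instead. The average MTM (denoted by $\bar{\mathbf{A}}$) is determined as
$$\bar{\mathbf{A}} = \frac{1}{T} \sum_{j=1}^{T} \mathbf{A}_j, \tag{8}$$
where $T$ is the number of tracks in each subframe.
Accordingly, the dimension of the short-term feature set (STFS) for pulse pairs is $M \times M$.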
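The short-term features can be sketched in the same spirit. The Python function below estimates one first-order transition matrix per track and averages them over tracks. The input layout, the lexicographic indexing of unordered pairs, and the choice to leave rows of never-visited states at zero are all assumptions of this sketch rather than details confirmed by the source.

```python
from itertools import combinations_with_replacement


def average_mtm(subframes, m):
    """Average first-order Markov transition matrix over all tracks.

    subframes: list of N subframes, each a list of T (u, v) tuples of
    position indices in range(m) (assumed layout).
    Returns an M x M nested list with M = m * (m + 1) // 2.
    """
    pairs = list(combinations_with_replacement(range(m), 2))
    index = {p: k for k, p in enumerate(pairs)}  # assumed pair indexing
    size = len(pairs)
    n_tracks = len(subframes[0])
    avg = [[0.0] * size for _ in range(size)]
    for j in range(n_tracks):
        # State sequence of track j, with (u, v) and (v, u) merged.
        states = [index[tuple(sorted(sub[j]))] for sub in subframes]
        counts = [[0] * size for _ in range(size)]
        for a, b in zip(states, states[1:]):
            counts[a][b] += 1
        for a in range(size):
            total = sum(counts[a])
            if total == 0:
                continue  # state never seen as a predecessor: row stays zero
            for b in range(size):
                avg[a][b] += counts[a][b] / total / n_tracks
    return avg
```

Flattening the returned matrix row by row yields a feature vector of length M × M.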
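Furthermore, the joint probability matrices of the pulse pairs in different tracks are employed to characterize the track-to-track features. To be specific, for the pulse pair of the $p$th track and the one of the $q$th track, the joint probability matrix (JPM) is
$$\mathbf{J}_{p,q} = \begin{bmatrix} P(z^p_1, z^q_1) & \cdots & P(z^p_1, z^q_M) \\ \vdots & \ddots & \vdots \\ P(z^p_M, z^q_1) & \cdots & P(z^p_M, z^q_M) \end{bmatrix}, \tag{9}$$
where $M$ is the number of all possible pulse-position pairs for each track that can be determined by (3); $z^p_k$ ($z^q_l$) is the $k$th ($l$th) possible pulse-position pair for the $p$th ($q$th) track; and $P(z^p_k, z^q_l)$ is the joint probability of $z^p_k$ and $z^q_l$. Specifically, the joint probability of the pulse-position pair $z^p_k$ in the $p$th track and the pulse-position pair $z^q_l$ in the $q$th track can be determined as follows:
$$P(z^p_k, z^q_l) = \frac{1}{N} \sum_{i=1}^{N} \delta(s_{i,p} = z^p_k \,\&\, s_{i,q} = z^q_l), \tag{10}$$
where $N$ is the number of the subframes, $s_{i,p}$ ($s_{i,q}$) is the pulse pair in the $p$th ($q$th) track of the $i$th subframe, $\delta(\cdot)$ is a characteristic function defined as (2), and “&” is the binary AND operation.
Like STFS above, we adopt the average JPM as the track-to-track feature set (TTFS) instead of all JPMs to reduce the computational complexity. Specifically, the average JPM (denoted by $\bar{\mathbf{J}}$) is
$$\bar{\mathbf{J}} = \frac{1}{|\mathcal{P}|} \sum_{(p,q) \in \mathcal{P}} \mathbf{J}_{p,q}, \tag{11}$$
where $\mathcal{P}$ is the set of all pairs of different tracks.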
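A minimal Python sketch of the track-to-track features follows. It accumulates joint occurrences of pulse pairs over pairs of different tracks and averages them. Averaging over unordered track pairs (p < q), the input layout, and the pair indexing are assumptions of this example; the source does not state which set of track pairs enters the average.

```python
from itertools import combinations, combinations_with_replacement


def average_jpm(subframes, m):
    """Average joint probability matrix over pairs of different tracks.

    subframes: list of N subframes, each a list of T >= 2 (u, v) tuples
    of position indices in range(m) (assumed layout).
    Returns an M x M nested list with M = m * (m + 1) // 2.
    """
    pairs = list(combinations_with_replacement(range(m), 2))
    index = {p: k for k, p in enumerate(pairs)}  # assumed pair indexing
    size = len(pairs)
    n_sub = len(subframes)
    n_tracks = len(subframes[0])
    # Assumption: each unordered pair of different tracks counts once.
    track_pairs = list(combinations(range(n_tracks), 2))
    weight = 1.0 / (n_sub * len(track_pairs))
    avg = [[0.0] * size for _ in range(size)]
    for sub in subframes:
        states = [index[tuple(sorted(p))] for p in sub]
        for p, q in track_pairs:
            avg[states[p]][states[q]] += weight
    return avg
```

Because every subframe contributes exactly one count per track pair, the entries of the returned matrix sum to one.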
Clearly, the dimension of the TTFS is $M \times M$. Accordingly, the total dimension of all three feature sets is $T \times M + 2M^2$. Taking the AMR speech codec at 12.2 kbps mode as an example, there are five tracks in each subframe (i.e., $T = 5$), where two pulses share eight candidate positions, that is, $m = 8$. Thus, there are $M = 36$ pulse pairs in each track, and the total dimension of all feature sets is 2772. This feature set is still too large to be directly adopted in the machine-learning based steganalysis scheme, since very-high-dimensional features would not only cause huge computational costs in the detection phase but also be more likely to induce overfitting in the training phase [33]. Thus, a feature selection mechanism based on AdaBoost [34–38] is employed to optimize the feature set as well as reduce the dimension. In the previous work [33], a reduced feature set with the 498 most effective features was obtained by this mechanism for the AMR speech codec at 12.2 kbps mode; its composition is shown in Table 1. Given that the effectiveness of the selected feature set for steganalysis of AMR speech has been verified, we directly employ it in this paper.
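The dimension count for the 12.2 kbps mode can be checked with a few lines of arithmetic. The breakdown used below (a long-term block of size tracks × pairs plus two pairs × pairs blocks) is inferred from the stated total of 2772, and the function name is hypothetical.

```python
def feature_dimensions(num_tracks, num_positions):
    """Return (pairs per track, LTFS, STFS, TTFS, total) dimensions."""
    pairs = num_positions * (num_positions + 1) // 2
    ltfs = num_tracks * pairs   # one probability per pair and track
    stfs = pairs * pairs        # average Markov transition matrix
    ttfs = pairs * pairs        # average joint probability matrix
    return pairs, ltfs, stfs, ttfs, ltfs + stfs + ttfs


print(feature_dimensions(5, 8))  # (36, 180, 1296, 1296, 2772)
```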

3. Steganalysis Schemes for Detecting AMR Speech Steganography with Unknown Embedding Rate
In this section, we present three steganalysis schemes for detecting AMR speech based steganography with unknown embedding rate employing SVM, which is a well-known machine-learning tool with excellent performance on classification [48–53] and is popularly employed in the steganalysis field [17–20, 24, 25, 33]. The first two schemes are extended from the existing image steganalysis schemes [40, 41], which both employ global classifiers to detect the steganography but adopt different training sets. As depicted in Figures 1 and 2, the first scheme trains the global classifier using a comprehensive speech sample set, including original samples and steganographic samples with various embedding rates, while the second one adopts a particular speech sample set, consisting of original samples and steganographic samples with uniform distributions of embedded data, to train the global classifier. For ease of description, we denote the first scheme as GCM, meaning that it trains the global classifier on mixed samples with various embedding rates, and the second scheme as GCU, meaning that it trains the global classifier on particular samples with uniform distributions of embedded data. In this work, for each AMR speech based steganographic method, the training set of GCM involves the steganographic samples with the embedding rates from 10% to 100%. Moreover, to obtain the steganographic AMR speech samples with uniform distributions of embedded data for GCU, we choose the tracks for hiding information in each subframe in a uniform random manner during the steganographic processes.
In addition, we further present a steganalysis scheme based on Dempster–Shafer theory (DST) for AMR speech based steganography with unknown embedding rate, as shown in Figure 3. To make the paper self-contained, we first review DST briefly. DST is a well-established mathematical theory of evidence first presented by Dempster [42] and Shafer [43], which can combine the evidence from different sources to obtain the probability of a certain event [43]. Owing to its powerful reasoning function based on evidence combination, DST has been popularly employed in many fields, such as information fusion [44], classification [45], and intrusion detection [46].
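Generally, DST is constructed on a finite set of possible elements (denoted by $\Theta$) under consideration, called a frame of discernment. Note that $\Theta$ is exhaustive, and all elements in $\Theta$ are mutually exclusive. Let $2^{\Theta}$ be the set including all possible subsets of $\Theta$. A mass function $m$ for assigning a probability mass to each element of $2^{\Theta}$, also called basic probability assignment, is defined as follows:
$$m: 2^{\Theta} \to [0, 1], \quad m(\emptyset) = 0, \quad \sum_{A \subseteq \Theta} m(A) = 1, \tag{12}$$
where $\emptyset$ is the empty set. Each nonempty subset $A$ of $\Theta$ with $m(A) > 0$ is called a focal element, and its mass $m(A)$ represents the exact belief for the proposition described by $A$. Further, the belief function for a subset $A$ of $\Theta$, denoted by $Bel(A)$, is the sum of the mass values of all its subsets; namely,
$$Bel(A) = \sum_{B \subseteq A} m(B). \tag{13}$$
The plausibility function for a subset $A$ of $\Theta$, denoted by $Pl(A)$, is the sum of the mass values of all the subsets of $\Theta$ that intersect $A$; namely,
$$Pl(A) = \sum_{B \cap A \neq \emptyset} m(B). \tag{14}$$
Moreover, DST provides a combination rule to obtain a synthesized belief value for an element by fusing the evidence from different sources. Formally, assume that $m_1, m_2, \ldots, m_n$ are mass functions on the subsets of $\Theta$ from $n$ different evidence sources; the combination rule for a nonempty subset $A$ can be stated as follows:
$$m(A) = \frac{1}{1 - K} \sum_{A_1 \cap A_2 \cap \cdots \cap A_n = A} \; \prod_{i=1}^{n} m_i(A_i), \tag{15}$$
where $K$ is a conflict factor that measures the degree of conflict for all the evidence and can be determined as follows:
$$K = \sum_{A_1 \cap A_2 \cap \cdots \cap A_n = \emptyset} \; \prod_{i=1}^{n} m_i(A_i). \tag{16}$$
Note that if $K$ approaches 1, all the available evidence is highly contradictory and thereby cannot be directly combined.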
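To make the DST review concrete, here is a small Python sketch of belief, plausibility, and Dempster's combination rule over mass functions stored as dictionaries. The dictionary representation (`frozenset` keys mapped to masses) and the function names are choices made for this illustration only.

```python
from itertools import product


def belief(mass, subset):
    """Sum of the masses of all focal elements contained in subset."""
    return sum(w for s, w in mass.items() if s <= subset)


def plausibility(mass, subset):
    """Sum of the masses of all focal elements that intersect subset."""
    return sum(w for s, w in mass.items() if s & subset)


def combine(mass_functions):
    """Dempster's rule for a list of {frozenset: mass} dictionaries.

    Returns (fused mass function, conflict factor K).
    """
    fused = {}
    conflict = 0.0
    for combo in product(*(mf.items() for mf in mass_functions)):
        inter = frozenset.intersection(*(s for s, _ in combo))
        weight = 1.0
        for _, w in combo:
            weight *= w
        if inter:
            fused[inter] = fused.get(inter, 0.0) + weight
        else:
            conflict += weight  # mass falling on the empty set
    if conflict >= 1.0:
        raise ValueError("total conflict: evidence cannot be combined")
    return {s: w / (1.0 - conflict) for s, w in fused.items()}, conflict
```

Enumerating all combinations of focal elements is exponential in the number of sources in general, but it stays cheap when each source has only two focal elements.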
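In our work, the frame of discernment for detecting AMR speech based steganography with unknown embedding rate is defined as $\Theta = \{C, S\}$, where $C$ and $S$ represent the cover (original) and steganographic samples, respectively, and accordingly, $2^{\Theta} = \{\emptyset, \{C\}, \{S\}, \{C, S\}\}$. As shown in Figure 3, we adopt the specific SVM-based classifiers for the embedding rates from 10% to 100% as ten independent evidence sources. That is to say, there are ten mass functions from the specific SVM-based classifiers for various embedding rates. Specifically, the $i$th mass function from the classifier for the embedding rate of 10% × $i$ is defined as follows:
$$m_i(A) = \begin{cases} p_i^{C}, & A = \{C\}, \\ p_i^{S}, & A = \{S\}, \\ 0, & \text{otherwise}, \end{cases} \tag{17}$$
where $p_i^{C}$ ($p_i^{S}$) is the confidence probability for the test sample belonging to the cover (steganographic) class, offered by the SVM-based classifier for the embedding rate of 10% × $i$.
According to (15), we can get
$$m(\{C\}) = \frac{1}{1 - K} \prod_{i=1}^{10} p_i^{C}, \qquad m(\{S\}) = \frac{1}{1 - K} \prod_{i=1}^{10} p_i^{S}. \tag{18}$$
Incorporating (13) and (14), we can further obtain
$$Bel(\{C\}) = Pl(\{C\}) = m(\{C\}), \qquad Bel(\{S\}) = Pl(\{S\}) = m(\{S\}). \tag{19}$$
Thus, we can finally make a decision by comparing $Bel(\{C\})$ and $Bel(\{S\})$. That is, for a test sample, its classification (denoted by $D$) can be determined as follows:
$$D = \begin{cases} C, & Bel(\{C\}) \geq Bel(\{S\}), \\ S, & \text{otherwise}. \end{cases} \tag{20}$$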
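For the two-class frame used here, the fusion step reduces to comparing two normalized products. The Python sketch below assumes that each specific classifier outputs a cover probability `p` with stego probability `1 - p` and assigns no mass to the whole frame; the function name and the tie-breaking toward "cover" are likewise assumptions of this example.

```python
import math


def dst_decide(cover_probs):
    """Fuse per-classifier cover probabilities with Dempster's rule.

    cover_probs: one cover probability per specific classifier, each in
    [0, 1]; the matching stego probability is taken as 1 - p (assumed).
    Returns (label, fused cover mass, fused stego mass).
    """
    prod_cover = math.prod(cover_probs)
    prod_stego = math.prod(1.0 - p for p in cover_probs)
    agreement = prod_cover + prod_stego  # equals 1 - K for this frame
    if agreement == 0.0:
        raise ValueError("total conflict: evidence cannot be combined")
    m_cover = prod_cover / agreement
    m_stego = prod_stego / agreement
    # Ties are resolved as "cover" here; this choice is an assumption.
    label = "cover" if m_cover >= m_stego else "stego"
    return label, m_cover, m_stego
```

A call such as `dst_decide([0.4] * 10)` returns the label `"stego"`, since ten mildly stego-leaning sources reinforce one another after normalization.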
4. Performance Evaluation and Analysis
In this paper, all the SVM-based classifiers are implemented employing LibSVM [49], a popular open-source software library for SVM. Specifically, the classifiers are constructed on C-support vector classification (C-SVC) with the RBF kernel, in which the default parameters are employed, that is, $C = 1$ and $\gamma = 1/n$, where $n$ is the number of features. Moreover, we collect a total of 3366 ten-second speech samples from audio materials for language learning, of which the components are shown in Table 2. Without loss of generality, we choose the AMR codec at 12.2 kbps mode as the cover codec. In the experiments, all steganalysis schemes are evaluated through detecting the state-of-the-art steganographic methods, namely, Geiser’s method [29] and Miao’s methods at the modes of $\eta$ = 1, 2, and 4 [30]. Prior to the steganographic experiments, we randomly select one half (1683) of the total speech samples as the cover sample set for training (CSST) and take the remaining samples as the cover sample set for detection (CSSD). In the steganographic experiments, the embedded messages are all randomly produced. For the three steganalysis schemes, we define their training sets as follows:
(i) The training set of the first scheme (GCM): for each steganographic method, the training set includes 1400 speech samples randomly selected from CSST and 1400 mixed steganographic speech samples at the embedding rates from 10% to 100%, where there are 140 speech samples at each embedding rate.
(ii) The training set of the second scheme (GCU): for each steganographic method, the training set includes 1400 speech samples randomly selected from CSST and 1400 steganographic speech samples with uniform distributions of embedded messages.
(iii) The training sets of the third scheme (DST-based scheme): for each steganographic method, it is necessary to train the specific classifiers for different embedding rates. Accordingly, for each embedding rate, a training set needs to be created, which includes 1400 speech samples randomly selected from CSST and 1400 samples generated by performing the given steganographic method at the corresponding embedding rate.
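The composition of the mixed (GCM) training set can be sketched as follows. The sample identifiers, the dictionary keyed by embedding rate in percent, the label convention (0 for cover, 1 for steganographic), and the function name are all hypothetical; only the counts (1400 covers and 140 steganographic samples at each of the ten rates) come from the description above.

```python
import random


def build_gcm_training_set(cover_ids, stego_ids_by_rate,
                           n_cover=1400, n_per_rate=140, seed=0):
    """Assemble a labeled GCM-style training set.

    cover_ids: identifiers of cover samples available for training.
    stego_ids_by_rate: {rate in percent: list of stego sample ids}.
    Returns a list of (sample id, label) with label 0 = cover, 1 = stego.
    """
    rng = random.Random(seed)  # fixed seed for a reproducible draw
    covers = rng.sample(cover_ids, n_cover)
    stegos = []
    for rate in sorted(stego_ids_by_rate):
        stegos.extend(rng.sample(stego_ids_by_rate[rate], n_per_rate))
    return [(c, 0) for c in covers] + [(s, 1) for s in stegos]
```

The per-rate training sets of the DST-based scheme follow the same pattern with a single rate and 1400 steganographic samples.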

In addition, to evaluate the steganalysis performance at the various embedding rates from 10% to 100%, we create ten detection sample sets for each steganographic method. Specifically, for each embedding rate, the detection sample set consists of 1400 speech samples randomly chosen from CSSD and 1400 speech samples generated by performing the given steganographic method at the corresponding embedding rate. Further, we evaluate the performance of the three steganalysis schemes by comparing them with the steganalysis based on specific classifiers (SCs) [33]. In all steganalysis experiments, we report accuracy (ACC, the proportion of correct detection results), false positive rate (FPR, the proportion of false positives out of all negatives), and false negative rate (FNR, the proportion of false negatives out of all positives).
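The three metrics can be computed from labeled predictions as in the following Python sketch. Treating the steganographic class as the positive class (label 1) is an assumption of this example, since the text does not spell out which class is taken as positive.

```python
def detection_metrics(y_true, y_pred):
    """Return (ACC, FPR, FNR) for binary labels.

    Labels: 1 = steganographic (assumed positive), 0 = cover.
    Both classes must occur in y_true.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    acc = (tp + tn) / len(pairs)
    fpr = fp / (fp + tn)  # false positives out of all negatives
    fnr = fn / (fn + tp)  # false negatives out of all positives
    return acc, fpr, fnr
```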
Figures 4, 5, 6, and 7, respectively, show the experimental results of detecting the four steganographic methods for the ten-second speech samples at the embedding rates from 10% to 100%, from which we can learn that all three steganalysis schemes in this paper are feasible and effective, while there are some differences in their detection performance. To be specific, the DST-based scheme outperforms GCU and GCM on the whole, as also shown in Tables 3–6, since the detection accuracies of the DST-based scheme are better than the others in most cases and closer to those of the scheme based on SCs overall. Moreover, the FPRs of the DST-based scheme are smaller than the others in all cases. Note that, for a given steganographic method, the FNRs of each steganalysis scheme presented in this paper are almost the same at any embedding rate, since each scheme adopts the identical classifier to detect the cover samples. In the cases of embedding rates smaller than 40%, some detection accuracies of the DST-based scheme are very slightly lower than those of GCU or GCM. The main reason behind this phenomenon is that the detection accuracies of the specific classifiers are relatively low, which makes the evidence from them more likely to be highly contradictory. Overall, since the embedding capacities of ten-second speech samples under embedding rates lower than 40% are very small, the detection performance of all the steganalysis schemes is limited (particularly, the accuracies are lower than 80% for Geiser’s method and Miao’s methods at the modes of $\eta$ = 1 and 2). In this sense, how to further improve the steganalysis performance for relatively low embedding rates is still a question worthy of study.




Figures 4–7 (one per steganographic method), each with three panels: (a) statistical results of ACC; (b) statistical results of FPR; (c) statistical results of FNR.
In addition, to comprehensively evaluate the performance of the presented schemes for detecting steganographic methods at variable embedding rates, we prepare a mixed detection sample set for each steganographic method, which consists of 1400 speech samples randomly chosen from CSSD and 140 steganographic samples generated by performing the given steganographic method at each embedding rate from 10% to 100%. Figure 8 shows the statistical results of the steganalysis experiments. From these charts, we can learn that all three presented schemes achieve relatively good accuracies for detecting the existing steganographic methods. Specifically, for Geiser’s method, the accuracies are more than 79%; for Miao’s method ($\eta$ = 1), the accuracies are more than 75%; for Miao’s method ($\eta$ = 2), the accuracies are more than 80%; and for Miao’s method ($\eta$ = 4), the accuracies are more than 87%. In short, the three presented schemes are effective for detecting the existing steganographic methods at any given embedding rate.
Figure 8, with three panels: (a) statistical results of ACC; (b) statistical results of FPR; (c) statistical results of FNR.
To further assess the performance of the three presented steganalysis schemes and compare them with the steganalysis based on SCs, we draw receiver-operating-characteristic (ROC) curves for detecting all the state-of-the-art steganographic methods at the typical embedding rates of 30%, 60%, and 100%, as shown in Figures 9, 10, 11, and 12, and calculate their areas under the curves (AUC), as shown in Table 7. The experimental results demonstrate again that the three presented steganalysis schemes are feasible and effective for detecting the state-of-the-art steganographic methods, while the DST-based scheme can offer better detection performance than GCU and GCM overall.
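As a reminder of how ROC points and the AUC are obtained from classifier scores, a minimal Python sketch is given below. It assumes that a higher score means "more likely steganographic" and that scores for the two classes are supplied as separate lists; both conventions and the function names are choices made for this example.

```python
def roc_points(scores_pos, scores_neg):
    """ROC points (FPR, TPR) obtained by sweeping the decision threshold."""
    thresholds = sorted(set(scores_pos) | set(scores_neg), reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:
        tpr = sum(s >= t for s in scores_pos) / len(scores_pos)
        fpr = sum(s >= t for s in scores_neg) / len(scores_neg)
        points.append((fpr, tpr))
    return points


def auc_trapezoid(points):
    """Area under a piecewise-linear ROC curve (trapezoidal rule)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


def auc_rank(scores_pos, scores_neg):
    """AUC as the probability that a positive outscores a negative."""
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5  # ties count as half a win
    return wins / (len(scores_pos) * len(scores_neg))
```

The two AUC computations agree, which provides a simple consistency check on an evaluation pipeline.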

Figures 9–12 (one per steganographic method), each with three panels: (a) embedding rate of 30%; (b) embedding rate of 60%; (c) embedding rate of 100%.
5. Conclusions
Due to its increasing popularity and broad influence in mobile communications, AMR speech is naturally considered an ideal carrier by the steganographic research community, and some relevant steganographic techniques have been successfully developed. However, AMR speech based steganography is a double-edged sword. Illegal usage of this technique would facilitate cybercrime activities and thereby pose a great threat to information security. Thus, its countermeasure, steganalysis of AMR speech, has also become a significant problem worthy of study. Although some fruitful steganalysis studies for AMR speech have been conducted, all the state-of-the-art methods deal with the problem under the assumption that the embedding rate of the steganographic samples to be tested is exactly known, which is impractical. Therefore, we are motivated to study steganalysis of AMR speech with unknown embedding rate in this paper. To address this problem, we presented three different schemes based on SVM. The first two schemes are extended from the existing image steganalysis schemes, which both use global classifiers to detect the steganography but adopt different training sets. Specifically, the first scheme trains the global classifier on a comprehensive speech sample set including original samples and steganographic samples with various embedding rates, while the second one trains the global classifier on a particular speech sample set consisting of original samples and steganographic samples with uniform distributions of embedded data. Besides, we further presented a third, hybrid steganalysis scheme based on DST, which adopts DST to combine all the evidence from a set of specific classifiers and accordingly provide a synthesized decision on whether hidden information is present.
All three steganalysis schemes are evaluated employing the optimized feature set based on statistical characteristics of pulse pairs and compared with the optimal steganalysis that uses each specialized classifier to detect the steganography with the corresponding embedding rate. The experimental results demonstrate that all the presented steganalysis schemes are feasible and effective for detecting the existing steganographic methods with unknown embedding rates in AMR speech streams, while the DST-based scheme can provide better performance than the others in most cases.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
Acknowledgments
This work was supported in part by Natural Science Foundation of China under Grant nos. U1536115, 61302094, and U1405254, Program of China Scholarships Council under Grant no. 201507540001, Natural Science Foundation of Fujian Province of China under Grant no. 2014J01238, Program for New Century Excellent Talents in Fujian Province University under Grant no. MJK2016-23, Program for Outstanding Youth Scientific and Technological Talents in Fujian Province University under Grant no. MJK2015-54, Promotion Program for Young and Middle-Aged Teacher in Science & Technology Research of Huaqiao University under Grant no. ZQN-PY115, and Program for Science & Technology Innovation Teams and Leading Talents of Huaqiao University under Grant no. 2014KJTD13.
References
[1] N. Provos and P. Honeyman, “Hide and seek: an introduction to steganography,” IEEE Security and Privacy, vol. 1, no. 3, pp. 32–44, 2003.
[2] E. Zielińska, W. Mazurczyk, and K. Szczypiorski, “Trends in steganography,” Communications of the ACM, vol. 57, no. 3, pp. 86–95, 2014.
[3] A. Cheddad, J. Condell, K. Curran, and P. Mc Kevitt, “Digital image steganography: survey and analysis of current methods,” Signal Processing, vol. 90, no. 3, pp. 727–752, 2010.
[4] B. Li, M. Wang, X. Li, S. Tan, and J. Huang, “A strategy of clustering modification directions in spatial image steganography,” IEEE Transactions on Information Forensics and Security, vol. 10, no. 9, pp. 1905–1917, 2015.
[5] M. M. Sadek, A. S. Khalifa, and M. G. M. Mostafa, “Video steganography: a comprehensive review,” Multimedia Tools and Applications, vol. 74, no. 17, pp. 7063–7094, 2015.
[6] M. Ramalingam and N. A. M. Isa, “A data-hiding technique using scene-change detection for video steganography,” Computers & Electrical Engineering, vol. 54, pp. 423–434, 2016.
[7] F. Djebbar, B. Ayad, K. A. Meraim, and H. Hamam, “Comparative study of digital audio steganography techniques,” Eurasip Journal on Audio, Speech, and Music Processing, vol. 2012, no. 1, article 25, 2012.
[8] G. Hua, J. Huang, Y. Q. Shi, J. Goh, and V. L. L. Thing, “Twenty years of digital audio watermarking: a comprehensive review,” Signal Processing, vol. 128, pp. 222–242, 2016.
[9] E. Satir and H. Isik, “A compression-based text steganography method,” Journal of Systems and Software, vol. 85, no. 10, pp. 2385–2394, 2012.
[10] C.-Y. Chang and S. Clark, “Practical linguistic steganography using contextual synonym substitution and a novel vertex coding method,” Computational Linguistics, vol. 40, no. 2, pp. 403–448, 2014.
[11] J. Lubacz, W. Mazurczyk, and K. Szczypiorski, “Principles and overview of network steganography,” IEEE Communications Magazine, vol. 52, no. 5, pp. 225–229, 2014.
[12] W. Mazurczyk, S. Wendzel, S. Zander, A. Houmansadr, and K. Szczypiorski, Information Hiding in Communication Networks: Fundamentals, Mechanisms, Applications, and Countermeasures, John Wiley & Sons, Inc., Hoboken, New Jersey, 2016.
[13] W. Mazurczyk, “VoIP steganography and its detection: a survey,” ACM Computing Surveys, vol. 46, no. 2, article 20, 2013.
[14] H. Tian, J. Qin, S. Guo et al., “Improved adaptive partial-matching steganography for Voice over IP,” Computer Communications, vol. 70, pp. 95–108, 2015.
[15] H. Tian, J. Qin, Y. Huang et al., “Optimal matrix embedding for Voice-over-IP steganography,” Signal Processing, vol. 117, pp. 33–43, 2015.
[16] Y. Jiang, S. Tang, L. Zhang, M. Xiong, and Y. J. Yip, “Covert voice over internet protocol communications with packet loss based on fractal interpolation,” ACM Transactions on Multimedia Computing, Communications, and Applications, vol. 12, no. 4, article 54, pp. 1–20, 2016.
[17] A. Janicki, W. Mazurczyk, and K. Szczypiorski, “Steganalysis of transcoding steganography,” Annals of Telecommunications/Annales des Télécommunications, vol. 69, no. 7–8, pp. 449–460, 2014.
[18] Z. Xia, X. Wang, X. Sun, and B. Wang, “Steganalysis of least significant bit matching using multi-order differences,” Security and Communication Networks, vol. 7, no. 8, pp. 1283–1291, 2014.
[19] V. Holub and J. Fridrich, “Low-complexity features for JPEG steganalysis using undecimated DCT,” IEEE Transactions on Information Forensics and Security, vol. 10, no. 2, pp. 219–228, 2015.
[20] Z. Xia, X. Wang, X. Sun, Q. Liu, and N. Xiong, “Steganalysis of LSB matching using differences between nonadjacent pixels,” Multimedia Tools and Applications, vol. 75, no. 4, pp. 1947–1962, 2016.
[21] W. Tang, H. Li, W. Luo, and J. Huang, “Adaptive steganalysis based on embedding probabilities of pixels,” IEEE Transactions on Information Forensics and Security, vol. 11, no. 4, pp. 734–744, 2016.
[22] J. Yu, F. Li, H. Cheng, and X. Zhang, “Spatial steganalysis using contrast of residuals,” IEEE Signal Processing Letters, vol. 23, no. 7, pp. 989–992, 2016.
[23] T. Denemark, M. Boroumand, and J. Fridrich, “Steganalysis features for content-adaptive JPEG steganography,” IEEE Transactions on Information Forensics and Security, vol. 11, no. 8, pp. 1736–1746, 2016.
[24] H. Tian, Y. Wu, Y. Cai et al., “Distributed steganalysis of compressed speech,” Soft Computing, vol. 21, no. 3, pp. 795–804, 2017.
[25] H. Tian, Y. Wu, C. C. Chang et al., “Steganalysis of analysis-by-synthesis speech exploiting pulse-position distribution characteristics,” Security and Communication Networks, vol. 9, no. 15, pp. 2934–2944, 2016.
[26] 3GPP/ETSI, “AMR speech codec: general description, version 10.0.0,” Technical Report TS 26 071, Sophia Antipolis Cedex, France, April 2011.
[27] 3GPP/ETSI, “Performance characterization of the adaptive multirate (AMR) speech codec,” Technical Report TR 126 975, Sophia Antipolis Cedex, France, January 2009.
[28] 3GPP/ETSI, “Digital cellular telecommunications system (phase 2+); Universal mobile telecommunications system (UMTS); LTE: mandatory speech codec speech processing functions; Adaptive multirate (AMR) speech codec; Transcoding functions (3GPP TS 26.090 version 13.0.0 Release 13),” Technical Report TR 126 090, Sophia Antipolis Cedex, France, January 2016.
[29] B. Geiser and P. Vary, “High rate data hiding in ACELP speech codecs,” in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '08), pp. 4005–4008, Las Vegas, Nev, USA, April 2008.
[30] H. Miao, L. Huang, Z. Chen, W. Yang, and A. Al-Hawbani, “A new scheme for covert communication via 3G encoded speech,” Computers & Electrical Engineering, vol. 38, no. 6, pp. 1490–1501, 2012.
[31] H. Miao, L. Huang, Y. Shen, X. Lu, and Z. Chen, “Steganalysis of compressed speech based on Markov and entropy,” in Proceedings of the 12th International Workshop on Digital-Forensics and Watermarking (IWDW), pp. 63–76, Auckland, New Zealand, Oct. 2013.
[32] Y. Ren, T. Cai, M. Tang, and L. Wang, “AMR steganalysis based on the probability of same pulse position,” IEEE Transactions on Information Forensics and Security, vol. 10, no. 9, pp. 1801–1811, 2015.
[33] H. Tian, Y. Wu, Y. Huang et al., “Steganalysis of adaptive multi-rate speech using statistical characteristics of pulse pairs,” Signal Processing, vol. 134, pp. 9–22, 2017.
[34] Y. Freund, R. Schapire, and N. Abe, “A short introduction to boosting,” Journal of Japanese Society for Artificial Intelligence, vol. 14, pp. 771–780, 1999.
[35] X. Wen, L. Shao, Y. Xue, and W. Fang, “A rapid learning algorithm for vehicle classification,” Information Sciences, vol. 295, pp. 395–406, 2015.
[36] D. D. Le and S. Satoh, “Feature selection by AdaBoost for SVM-based face detection,” Information Technology Letters, vol. 3, pp. 183–186, 2004.
[37] Y.-J. Yeh and C.-T. Hsu, “Online selection of tracking features using AdaBoost,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 19, no. 3, pp. 442–446, 2009.
[38] L. Guo, P.-S. Ge, M.-H. Zhang, L.-H. Li, and Y.-B. Zhao, “Pedestrian detection for intelligent transportation systems combining AdaBoost algorithm and support vector machine,” Expert Systems with Applications, vol. 39, no. 4, pp. 4274–4286, 2012.
[39] A. D. Ker, P. Bas, R. Böhme et al., “Moving steganography and steganalysis from the laboratory into the real world,” in Proceedings of the 1st ACM Workshop on Information Hiding and Multimedia Security, IH and MMSec 2013, pp. 45–58, France, June 2013.
[40] T. Pevny, “Detecting messages of unknown length,” in Proceedings of the Media Watermarking, Security, and Forensics III, vol. 7880, pp. 1–12, San Francisco Airport, California, USA, 2011.
[41] L. Marvel, B. Henz, and C. Boncelet, “A performance study of ±1 steganalysis employing a realistic operating scenario,” in Proceedings of the 2007 IEEE Military Communications Conference, pp. 1–7, USA, October 2007.
[42] A. P. Dempster, “Upper and lower probabilities induced by a multivalued mapping,” Annals of Mathematical Statistics, vol. 38, pp. 325–339, 1967.
[43] G. Shafer, A Mathematical Theory of Evidence, Princeton University Press, Princeton, NJ, USA, 1976.
[44] R. R. Murphy, “Dempster-Shafer theory for sensor fusion in autonomous mobile robots,” IEEE Transactions on Robotics and Automation, vol. 14, no. 2, pp. 197–206, 1998.
[45] N. R. Pal and S. Ghosh, “Some classification algorithms integrating Dempster-Shafer theory of evidence with the rank nearest neighbor rules,” IEEE Transactions on Systems, Man, and Cybernetics Part A: Systems and Humans, vol. 31, no. 1, pp. 59–66, 2001.
[46] T. M. Chen and V. Venkataramanan, “Dempster-Shafer theory for intrusion detection in ad hoc networks,” IEEE Internet Computing, vol. 9, no. 6, pp. 35–41, 2005.
[47] J. S. Perkell and D. H. Klatt, Invariance and Variability in Speech Processes, Lawrence Erlbaum Associates, Mahwah, New Jersey, USA, 1986.
[48] A. J. Smola and B. Schölkopf, “A tutorial on support vector regression,” Statistics and Computing, vol. 14, no. 3, pp. 199–222, 2004.
[49] C. Chang and C. Lin, “LIBSVM: a library for support vector machines,” ACM Transactions on Intelligent Systems and Technology, vol. 2, no. 3, article 27, 2011.
[50] B. Gu, V. S. Sheng, Z. Wang, D. Ho, S. Osman, and S. Li, “Incremental learning for ν-support vector regression,” Neural Networks, vol. 67, pp. 140–150, 2015.
[51] B. Gu, V. S. Sheng, K. Y. Tay, W. Romano, and S. Li, “Incremental support vector learning for ordinal regression,” IEEE Transactions on Neural Networks and Learning Systems, vol. 26, no. 7, pp. 1403–1416, 2015.
[52] B. Gu, V. S. Sheng, and S. Li, “Bi-parameter space partition for cost-sensitive SVM,” in Proceedings of the 24th International Joint Conference on Artificial Intelligence, pp. 3532–3539, Buenos Aires, Argentina, July 2015.
[53] B. Gu and V. S. Sheng, “A robust regularization path algorithm for ν-support vector classification,” IEEE Transactions on Neural Networks and Learning Systems, vol. 28, no. 5, pp. 1241–1248, 2017.
Copyright
Copyright © 2017 Hui Tian et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.