Abstract

To address the difficulty of extracting a watermark after the carried audio has been subjected to malicious attacks, an audio watermarking algorithm based on spectrum distribution is proposed. An eigenvalue is designed to represent the spectrum distribution of a specified frequency-band, and the binary watermark is embedded by adjusting the difference of the eigenvalues between two adjacent frequency-bands. The polarity of the embedding depth is determined by comparing the two eigenvalues, which minimizes the modification of the audio and thus improves transparency. The binary watermark can be extracted blindly by judging the difference of the eigenvalues. The improved synchronization mechanism takes the small frame with the largest energy in the voiced frame as the synchronization mark to locate the embedding and extracting positions of the watermark, which gives the algorithm strong robustness. Experimental results show that the proposed algorithm has large payload capacity, good transparency, low complexity, and strong robustness against most attacks.

1. Introduction

1.1. Related Works

The rapid development of the Internet and computer technology has made it very convenient for people to spread multimedia resources over networks, but the copyright protection of these resources has become a growing concern. Digital watermarking technology plays an important role in solving copyright protection problems [1–3]. It uses a specific embedding algorithm to conceal in a multimedia resource a watermark that can prove the author's copyright. When a copyright dispute occurs, the author extracts the watermark from the multimedia resource with the extracting algorithm corresponding to the embedding algorithm to prove ownership. Audio digital watermarking is a security technology that embeds the watermark into audio secretly, without much impact on audio quality [4], for purposes such as copyright tracking, integrity protection, content authentication, and recovering advertising timing.

According to the application purpose, audio digital watermarking can be divided into robust watermarking and fragile watermarking. Robust audio watermarking has strong robustness: even when the carried audio is subjected to malicious attacks, the watermark can still be extracted with a very small bit error rate (BER), so it is mainly used for audio copyright protection [5, 6]. In fragile audio watermarking, the watermark is very sensitive to malicious attacks; it can accurately locate and even recover the tampered region of the audio, so it is mainly used for audio integrity protection [7, 8]. Most robust watermarking algorithms are developed in the frequency domain, using transforms such as the discrete Fourier transform (DFT) [9], the discrete cosine transform (DCT) [10, 11], and the discrete wavelet transform (DWT) [12–15]. Megias et al. [16] presented a blind audio watermarking algorithm in the DFT domain to overcome synchronization attacks. The algorithm embedded the watermark in the frequency domain and marked the embedding location in the time domain. However, the synchronization mark in the time domain was vulnerable to attack, which could cause extraction to fail because the watermark's location could not be found. Tewari et al. [17] proposed an audio watermarking algorithm in the DCT domain. The algorithm selected, according to a minimum energy threshold, the audio frames to be processed by DCT, and embedded the watermark by modifying and quantizing the average energy of the DCT coefficients. This algorithm was strongly robust to MP3 attacks, but its robustness to other attacks needed improvement. By introducing the spectrum-shaping technology of the autoregressive model into vector modulation, Hu and Hsu [18] proposed a blind audio watermarking algorithm in the DWT domain.
The embedding depth of the algorithm was consistent with the auditory masking threshold, and the algorithm had high embedding capacity, but its robustness was poor. To extract the watermark blindly, many scholars use quantization index modulation (QIM) to design audio watermarking algorithms. Hwang et al. [2] presented an audio watermarking algorithm based on QIM and singular value decomposition (SVD). The algorithm applied SVD directly to the stereo signal and embedded the watermark according to the ratio of the singular values. It had good robustness against signal processing operations such as amplitude scaling, compression, and resampling.

To improve robustness, many scholars design watermarking algorithms in a hybrid domain. Merrad and Saadi [19] proposed a robust audio watermarking technology in the hybrid DWT-DCT domain, exploiting the strong correlation between two consecutive samples. The algorithm had good robustness against random cropping, echo addition, amplitude scaling, and so on. Q. L. Wu and M. Wu [20] proposed an audio watermarking algorithm in the DWT-DCT domain. This algorithm embedded a binary watermark into audio by modifying the average amplitude of the hybrid-transformed coefficients according to the redundancy of the human auditory system. The experimental results showed good robustness against conventional signal processing attacks. However, it could not resist synchronization attacks, such as jittering and time-scale modification (TSM), because it lacked a synchronization mechanism. In short, as audio watermarking research has developed, scholars have proposed more and more algorithms with different performance profiles. However, while most audio watermarking algorithms are robust against conventional signal processing operations, their resistance to synchronization attacks needs to be improved, mainly because most algorithms lack an effective synchronization mechanism. A synchronization attack seriously damages the structure of the audio data, which makes it difficult to determine the exact location of the watermark in the audio and leads to extraction failure. Therefore, overcoming synchronization attacks has become a very challenging problem in watermarking research. The most common synchronization attacks are random cropping, jittering, TSM, and pitch-shifting modification (PSM). TSM adjusts the playing time of the audio by changing its playing speed; that is, the overall playing time is compressed or expanded while the sample rate remains almost unchanged.
PSM modifies the pitch of the audio without changing its playback speed. Jittering deletes or adds one sample every several samples in the audio. Random cropping randomly cuts out several samples in different parts of the audio. Jiang et al. [21] proposed an audio watermarking algorithm that realizes synchronization indirectly by using the audio frame sequence number based on the global characteristics of audio frames. The scheme is robust against most signal processing operations and some synchronization attacks. Liu et al. [22] proposed an audio watermarking algorithm to resist synchronization attacks. The algorithm constructed a logarithmic-mean characteristic from the frequency-domain coefficients and then used the residuals of two sets of characteristics to design the watermarking algorithm. Its robustness against signal processing operations and some synchronization attacks was better than that of most existing watermarking algorithms. Hu et al. [23] proposed an audio watermarking algorithm with a synchronization mechanism based on the lifting wavelet transform. The algorithm sorted and reconstructed the approximation coefficients to design the embedding and extracting rules according to the expected bit rate of the watermark. It was strongly robust to synchronization attacks, but its transparency was poor. Wu et al. [24] proposed an audio watermarking algorithm with an implicit synchronization mechanism based on SVD and the genetic algorithm (GA). The synchronization mechanism of that algorithm took the sampling point with the largest amplitude in the voiced frame as the synchronization mark to track the location of the watermark in the audio, and it achieved good results in resisting synchronization attacks.
However, that synchronization mechanism used a single sampling point as the synchronization mark, which could locate the watermark region inaccurately because the sampling point might no longer have the largest amplitude after the audio was attacked. Existing synchronization mechanisms mainly include exhaustive search, explicit synchronization [25], implicit synchronization, autocorrelation, and constant watermark. Each has its own advantages and disadvantages and needs to be coordinated with the specific embedding and extracting algorithms to give full play to the best performance of the algorithm.

1.2. Contributions

From the above review of audio watermarking algorithms, we know that robustness is a key performance indicator, on the premise of ensuring good audio quality. After analyzing the characteristics of audio in the time domain and the frequency domain, this paper proposes a robust watermarking algorithm based on spectrum distribution and improves the synchronization mechanism of reference [24]. The main contributions are as follows: (1) We propose a blind audio watermarking algorithm based on spectrum distribution. An eigenvalue that reflects the spectrum distribution of the audio in a narrow frequency-band is constructed, and the difference of the eigenvalues between two adjacent frequency-bands is modified by adjusting the DCT coefficients. The embedding and extracting algorithms are developed according to the correspondence between the binary watermark and the difference of the eigenvalues. The proposed algorithm has good transparency, strong robustness, and blind extraction. To obtain good transparency, the polarity of the embedding depth is determined by comparing the eigenvalues of the two frequency-bands; this minimizes the modification of the DCT coefficients so as to preserve audio quality. Since the spectrum distribution of the carried audio changes little under attack, embedding the watermark in an eigenvalue related to the spectrum distribution makes the algorithm more robust. In addition, the algorithm supports blind extraction, which facilitates its use in practice. (2) We improve the synchronization mechanism of reference [24]. Since most of the audio content is concentrated in the voiced frames, the synchronization mechanism in reference [24] took the single sampling point with the largest amplitude in the voiced frame as the synchronization mark.
That method sometimes located the watermark region inaccurately, mainly because after an attack the marked sampling point might no longer be the one with the largest amplitude, so the synchronization mechanism needed improvement. The improved synchronization mechanism divides the voiced frame into many small frames, takes the small frame with the maximum energy as the synchronization mark, and selects a certain number of consecutive small frames around it to form the embedding region.

The remainder of this paper is organized as follows. Section 1 has reviewed related work on audio watermarking in recent years and introduced our contributions. Section 2 describes the improved synchronization mechanism and its implementation steps in detail. Section 3 elaborates the principle of the proposed audio watermarking algorithm in two parts: the procedure for embedding the watermark and the procedure for extracting it. Section 4 sets out the implementation steps of the embedding and extracting algorithms in detail. Section 5 evaluates the performance of the proposed algorithm and compares it with three related algorithms. Finally, Section 6 concludes the paper and outlines future research.

2. Synchronization Mechanism

A synchronization attack seriously damages the structure of the audio data, leaving the extracting algorithm unable to accurately find the location of the watermark in the audio, so it is a very challenging type of attack [26, 27]. It is therefore particularly important to design a synchronization mechanism that can accurately locate the watermark in the audio. The synchronization mechanism proposed in reference [24] is improved here to give it better performance. This method takes the small frame with the largest energy in a voiced frame as the synchronization mark and takes the audio data in a fixed region around this mark as the carrier of the watermark. The specific steps are as follows:

Step 1: Convert the watermark into a matrix with rows and columns. is the length of the watermark, and . is the bit value, and .
Step 2: Divide the audio into fragments and select the voiced frame with the largest energy from each audio fragment to carry the watermark; the length of each voiced frame is .
Step 3: Divide each voiced frame into small frames with sample-points and calculate the energy of each small frame.
Step 4: Take the small frame with the largest energy as the synchronization mark and select small frames around it as the region used to carry the watermark.

According to the position of the synchronization mark in the voiced frame, three cases arise:

(1) If the number of small frames before the synchronization mark is less than , the synchronization mark is close to the head of the voiced frame, and the watermark region consists of consecutive small frames starting from the first small frame.
(2) If the number of small frames after the synchronization mark is less than , the synchronization mark is close to the end of the voiced frame, and the watermark region consists of consecutive small frames at the end.
(3) Otherwise, the watermark region takes the synchronization mark as the benchmark and selects small frames forward and small frames backward.

After the above steps, the selected audio fragment with consecutive small frames is used to carry the watermark.
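The steps above can be sketched as follows. This is only an illustrative implementation of the improved synchronization mechanism, with parameter names of our own choosing (the paper's symbols did not carry over into this text):

```python
import numpy as np

def find_embedding_region(voiced_frame, small_len, n_before, n_after):
    """Locate the watermark region inside a voiced frame (a sketch of the
    improved synchronization mechanism; variable names are our own).

    The voiced frame is split into small frames of `small_len` samples; the
    small frame with the largest energy is the synchronization mark, and
    n_before + 1 + n_after consecutive small frames around it form the
    embedding region, clamped to the head or tail of the voiced frame."""
    n_small = len(voiced_frame) // small_len
    frames = voiced_frame[:n_small * small_len].reshape(n_small, small_len)
    energies = np.sum(frames ** 2, axis=1)      # energy of each small frame
    mark = int(np.argmax(energies))             # synchronization mark index
    total = n_before + 1 + n_after
    if mark < n_before:                         # case (1): mark near the head
        start = 0
    elif n_small - 1 - mark < n_after:          # case (2): mark near the tail
        start = n_small - total
    else:                                       # case (3): centre on the mark
        start = mark - n_before
    region = frames[start:start + total].reshape(-1)
    return start, region
```

Because the mark is an energy maximum over a whole small frame rather than a single sample, it is less likely to move after an attack than the single-sample mark of reference [24].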

3. Principle of the Watermarking Algorithm

In this section, an audio watermarking algorithm with a synchronization mechanism is developed. The proposed algorithm uses the improved synchronization mechanism to identify the embedding and extracting locations of the watermark, and the embedding and extracting rules are developed based on the spectrum distribution. It not only resists conventional signal processing attacks but also performs excellently against synchronization attacks.

3.1. Principle of Embedding Watermarks

The DCT has strong energy-concentration characteristics and good decorrelation, so it has been widely used in image and audio signal processing. Suppose that is the original audio with sample-points; it can be expressed as the following formula:where is the amplitude of the sample-point. Divide into audio fragments and obtain the voiced frame with the largest energy, with sample-points, in each audio fragment. Use the improved synchronization mechanism in Section 2 to obtain the audio data for carrying the watermark, where is the position number of the audio data in the embedding region. Apply the DCT to , as shown in formulas (2) and (3),where is the component whose frequency is 0 Hz, and is the th harmonic component. The frequency of each harmonic component can be calculated by the following formula:where is the sampling frequency. The spectrum of the audio describes the proportion of each harmonic component in the audio, which is related to the frequency and amplitude of the harmonics. Divide into L fragments to obtain the frequency-bands . Each frequency-band contains spectrum lines, and , . The spectrum distribution function (SD) shown in formula (5) is designed to represent the spectrum distribution of the frequency-band ,where is the initial frequency of the frequency-band, is the frequency of the harmonic component, and . Suppose that after the audio is attacked, the variation of the DCT coefficient is . Then the new spectrum distribution is given by the following formula:

Since is very small and is much larger than , . That is, when the audio is attacked, the SD changes only very little.

The following experiment verifies this deduction. The experimental parameters are as follows: , , , , and . Four attack types are applied to the tested audio: amplitude scaling, noise corruption with 20 dB, low-pass filtering with 4 kHz, and TSM with +5%. The four attacked curves in Figure 1 are basically consistent with the original curve, which indicates that the SD can express the stability of the audio spectrum structure after an attack. The experimental results are also consistent with the deduction from formula (6). Since the spectrum structure has good stability after the audio is attacked, the watermarking algorithm can be developed based on this feature.
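The band-wise eigenvalue can be sketched in code. Note that formula (5) is not restated in this text, so the stand-in below, an energy-weighted mean frequency of the band, is only our illustration of an eigenvalue that, like the paper's SD, depends on both harmonic amplitude and frequency and changes little under small coefficient perturbations. The DCT convention and formula (4) for the line frequencies are standard:

```python
import numpy as np

def dct2(x):
    """Orthonormal DCT-II (the convention used by MATLAB's dct)."""
    N = len(x)
    n = np.arange(N)
    k = n.reshape(-1, 1)
    C = np.cos(np.pi * (2 * n + 1) * k / (2 * N))
    scale = np.full(N, np.sqrt(2.0 / N))
    scale[0] = np.sqrt(1.0 / N)
    return scale * (C @ x)

def band_sd(coeffs, start, width, fs, n):
    """Illustrative band eigenvalue (a stand-in for the paper's formula (5),
    which is not reproduced here): the energy-weighted mean frequency of the
    band. Line k of an N-point DCT corresponds to frequency k*fs/(2N),
    which is formula (4)."""
    band = coeffs[start:start + width]
    freqs = np.arange(start, start + width) * fs / (2 * n)
    e = band ** 2                       # spectral energy on each line
    return float(np.sum(freqs * e) / (np.sum(e) + 1e-12))
```

Adding a small perturbation to the band coefficients moves this eigenvalue only slightly, which mirrors the stability argument of formula (6).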

According to formula (5), the spectrum distributions of and can be calculated as and ; their average value can then be expressed as the following formula:

The modified coefficients and can be expressed as formulas (8) and (9).

According to formula (5), the spectrum distribution of can be calculated as , as shown in the following formula:

Similarly, the spectrum distribution of can be expressed as .

The difference of the spectrum distributions of two adjacent frequency-bands can be described by the following formula:where is the embedding depth, and . can be set according to the following formula:

To prevent serious degradation of audio quality caused by excessive modification of the DCT coefficients, a threshold is set to judge whether the watermark can be embedded in the two adjacent frequency-bands. If , the two frequency-bands are set as invalid and no watermark bit is embedded in them. Otherwise, modify the DCT coefficients according to the following embedding rules: (1) If , set ; then , , and . To improve the transparency of the algorithm, the polarity of must be adjusted according to and , as shown in formula (12). (2) If , set ; then , and .
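The embedding rule can be sketched at the level of the eigenvalues. Since the exact coefficient updates of formulas (8), (9), and (12) are not restated in this text, the function below only expresses our reading of the rule as target eigenvalues: a '1' pushes the difference to the embedding depth with the polarity matching the current difference (minimizing modification, per formula (12)), a '0' pulls both eigenvalues to their mean, and an over-threshold pair is marked invalid. All names are ours:

```python
def embed_bit(sd1, sd2, bit, depth, thresh):
    """One-bit embedding rule on a pair of band eigenvalues (an illustrative
    reading of the paper's rules, not its exact formulas).

    Returns the target eigenvalues (sd1', sd2'), or None if the band pair
    must be marked invalid because its eigenvalue difference exceeds the
    threshold."""
    if abs(sd1 - sd2) > thresh:
        return None                              # invalid pair: skip, embed nothing
    m = (sd1 + sd2) / 2.0                        # mean value, formula (7)
    if bit == 1:
        # Polarity follows the existing difference so the change is minimal.
        sign = 1.0 if sd1 >= sd2 else -1.0
        return m + sign * depth / 2.0, m - sign * depth / 2.0
    return m, m                                  # bit 0: difference driven to zero
```

The actual algorithm realizes these targets by modifying the DCT coefficients of the two bands; this sketch stops at the eigenvalue level.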

3.2. Principle of Extracting Watermarks

In the embedding process, the binary watermark is embedded into two adjacent frequency-bands by adjusting the DCT coefficients. The extracting rule is therefore to extract the binary watermark by calculating the difference between the spectrum distributions of the two adjacent bands, which can be expressed by the following formula:
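Under the same reading as the embedding sketch above (an assumption on our part, since formula (13) is not restated here), blind extraction reduces to thresholding the eigenvalue difference:

```python
def extract_bit(sd1, sd2, depth):
    """Blind extraction rule (our reading of formula (13)): a large
    eigenvalue difference between the two adjacent bands decodes as 1,
    a small one as 0. `depth` is the embedding depth used at embed time."""
    return 1 if abs(sd1 - sd2) >= depth / 2.0 else 0
```

No original audio is needed: the decision uses only eigenvalues computed from the received audio, which is what makes the scheme blind.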

4. Implementation of the Proposed Algorithm

4.1. Procedure for the Embedding Watermark

Figure 2 shows the embedding diagram, and the detailed embedding steps are as follows:

Step 1: Convert the watermark into a binary matrix of .
Step 2: Divide the original audio into fragments. In each fragment, the voiced frame with the largest energy is selected to carry the watermark, and the length of the voiced frame is .
Step 3: Divide each voiced frame into small frames with sample-points and select the small frame with the largest energy as the synchronization mark.
Step 4: Select consecutive small frames around the synchronization mark as the embedding location of the watermark. See Step 4 in Section 2 for details.
Step 5: Apply the DCT to to obtain the DCT coefficients .
Step 6: Starting from the spectrum line, select frequency-bands from ; each frequency-band has spectrum lines.
Step 7: Calculate , and of the two adjacent frequency-bands.
Step 8: If , set these two frequency-bands as invalid and go to Step 7; otherwise, go to Step 9.
Step 9: Embed 1 bit of the watermark by modifying the DCT coefficients of these two frequency-bands according to formulas (8), (9), and (12).
Step 10: Repeat Steps 7 to 9 until bits of the watermark are embedded into the voiced frame.
Step 11: Apply the inverse discrete cosine transform (IDCT) to to obtain .
Step 12: Repeat Steps 3 to 11 until all lines of the binary watermark are embedded into the original audio.
Step 13: Reconstruct all voiced frames carrying the binary watermark to obtain the carried audio .

To improve the robustness of the algorithm, several voiced frames in each audio fragment can repeatedly carry the same line of the binary watermark; for example, the three voiced frames with the largest energy in the same audio fragment can all carry the watermark.

4.2. Procedure for the Extracting Watermark

Figure 3 shows the extracting diagram, and the detailed extracting steps are as follows:

Step 1: Divide the carried audio into fragments. In each fragment, the voiced frame with the largest energy is selected for extraction, and the length of the voiced frame is .
Step 2: Divide each voiced frame into small frames with sample-points and select the small frame with the largest energy as the synchronization mark.
Step 3: Select consecutive small frames around the synchronization mark as the extracting region of the watermark. See Step 4 in Section 2 for details.
Step 4: Apply the DCT to to obtain the DCT coefficients .
Step 5: Starting from the spectrum line, select frequency-bands from ; each band has spectrum lines.
Step 6: Calculate , and of the two adjacent frequency-bands.
Step 7: If , go to Step 6; otherwise, go to Step 8.
Step 8: Extract 1 bit of the watermark from these two frequency-bands according to formula (13).
Step 9: Repeat Steps 6 to 8 until bits of the watermark are extracted from the voiced frame.
Step 10: Repeat Steps 2 to 9 until the whole binary watermark is extracted from the carried audio .

5. Performance Evaluation

In this section, the performance of the proposed algorithm is evaluated in four aspects: payload capacity, transparency, robustness, and complexity. Transparency is evaluated by measuring the decline in audio quality after embedding, from both subjective and objective viewpoints. The subjective index is the mean opinion score (MOS); the objective indices are the signal-to-noise ratio (SNR), shown in formula (14), and the objective difference grade (ODG), which is the output of the perceptual evaluation of audio quality (PEAQ) method. Robustness is the ability of the algorithm to still extract the watermark accurately after the carried audio has been subjected to various attacks. Robustness is usually evaluated by the BER, the proportion of error bits between the extracted watermark and the original watermark, expressed in formula (15). The smaller the BER, the stronger the robustness of the algorithm against attacks. According to the standard of the International Federation of the Phonographic Industry (IFPI), the SNR should be greater than 20 dB for the audio to have good quality, and the BER should be less than 20%.

Normalized correlation (NC) can also be used to evaluate robustness. It reflects robustness through the similarity between the extracted watermark and the original watermark, as shown in formula (16). An NC value close to 1 indicates that the extracted watermark is very similar to the original watermark, that is, the algorithm is strongly robust.
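The three metrics, SNR (formula (14)), BER (formula (15)), and NC (formula (16)), follow their standard definitions and can be computed directly:

```python
import numpy as np

def snr_db(original, watermarked):
    """Formula (14): signal-to-noise ratio of the carried audio, in dB."""
    noise = original - watermarked
    return 10.0 * np.log10(np.sum(original ** 2) / np.sum(noise ** 2))

def ber_percent(w, w_ext):
    """Formula (15): percentage of watermark bits that differ between the
    original and the extracted watermark."""
    return 100.0 * np.mean(w != w_ext)

def nc(w, w_ext):
    """Formula (16): normalized correlation between the original and the
    extracted binary watermarks."""
    w = np.asarray(w, dtype=float)
    w_ext = np.asarray(w_ext, dtype=float)
    return float(np.sum(w * w_ext) / np.sqrt(np.sum(w ** 2) * np.sum(w_ext ** 2)))
```

A perfect extraction gives BER = 0 and NC = 1, matching the no-attack row of Table 1.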

The experimental parameters are as follows: (1) The watermarks are two binary images, shown in Figure 4(a) with the size of and Figure 4(b) with the size of ; (2) The test audio consists of twenty 64-second WAV signals sampled at 44100 Hz with 16-bit resolution, all from the TIMIT standard database; (3) Algorithm parameters: , , , , , , , , and .

The experimental equipment and application software are described as follows: (1) Computer system: 64-bit Microsoft Windows 10; (2) Software for processing audio: Cool Edit Pro; (3) Programming Language: Matlab 2016R.

5.1. Capacity and Transparency

Formula (17) can be used to calculate the payload capacity of the proposed algorithm,where is the duration of the audio used to carry the watermark. In our experiment, is 64 seconds and the binary watermark is 4096 bits, so the payload capacity is 64 bps. Table 1 shows the experimental results, including transparency, evaluated by the SNR (dB), MOS, and ODG of the audio; robustness, evaluated by the BER (%) and NC of the extracted watermark; and the payload capacity, expressed as Cap (bps).
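The capacity computation of formula (17) can be checked directly; the helper name below is ours:

```python
def payload_capacity_bps(total_bits, duration_seconds):
    """Formula (17): payload capacity is the number of embedded watermark
    bits divided by the duration of the carrier audio, in bits per second."""
    return total_bits / duration_seconds
```

With the experimental values of 4096 watermark bits carried by 64 seconds of audio, this gives the 64 bps reported in Table 1.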

According to the experimental results in Table 1, the proposed algorithm has good transparency: at a payload capacity of 64 bps, the average SNR is as high as 26.86 dB, the ODG is −0.45, and the MOS is 4.7, exceeding the IFPI standard. The good transparency is mainly because the polarity of the embedding depth is determined by comparing the SD of the two adjacent frequency-bands, as shown in formula (12), which minimizes the modification of the DCT coefficients. The waveforms of the original audio and the carried audio are shown in Figures 5(a) and 5(b), respectively (only an audio clip of about 3 seconds is shown in order to display the details). The corresponding spectrograms are shown in Figures 6(a) and 6(b). The waveforms and spectrograms before and after embedding show no obvious difference, which indicates that the proposed algorithm has good transparency.

In Table 1, BER is equal to 0 and NC is equal to 1, which indicates that when there is no attack applied on the carried audio, the watermark can be extracted accurately. Therefore, the watermark image extracted in Figure 7(w) is the same as the original image in Figure 4(a), and the watermark image extracted in Figure 8(w) is the same as the original image in Figure 4(b).

The payload capacity of the proposed algorithm is lower than that in reference [13], but it has better transparency. The payload capacity and transparency of the proposed algorithm are both better than those in reference [23]. Under the same payload capacity, the transparency of the proposed algorithm is higher than that in reference [24].

5.2. Robustness

This section evaluates the robustness of the proposed algorithm. Different attacks are carried out on the carried audio, and the extracting algorithm is then used to extract the watermark. Finally, the extracted watermark is compared with the original watermark, and robustness is quantitatively evaluated by BER and NC. The attack types include a variety of conventional signal processing operations and synchronization attacks, as shown in Table 2.

After the above attacks on the carried audio, the average BER (%) and NC values under each type of attack are listed in Table 3. The extracted images are shown in Figures 7 and 8.

According to the experimental results in Table 3 and Figures 7 and 8, the proposed algorithm shows excellent robustness against several attacks, including noise corruption with 35 dB, MP3 compression with 128 kbps, requantization, resampling, echo addition with delays of 100 ms and 50 ms, low-pass filtering with 8 kHz, and amplitude scaling. The extracted watermarks are very clear, all BER values are 0, and all NC values are 1.

The algorithm is also strongly robust against noise corruption with 20 dB, low-pass filtering with 4 kHz, MP3 compression with 64 kbps, jittering, and random cropping. The extracted watermarks are highly similar to the original watermarks; the BER values are below 2.98%, and the NC values are above 0.99.

When resisting TSM, the BER values are relatively high but still meet the IFPI standard, and the main information of the watermarks can be distinguished from the extracted images. When resisting PSM, the BER values are all high and the extracted watermarks are very fuzzy; it is difficult to obtain the main information of the watermark from the extracted image. These results show that the proposed algorithm is weakly robust against PSM. The main reason is that the algorithm takes the small frame with the largest energy in the voiced frame as the synchronization mark. When the carried audio undergoes PSM, the location of the synchronization mark may be offset, and if the offset mark is still used as the benchmark to search for the watermark, the extracted information will be inaccurate.

5.3. Complexity

The complexity of an algorithm can usually be measured by its running time; excessive complexity limits the range of application of the algorithm. In reference [24], the average running time is 1055.99 seconds for embedding the watermark and 0.8544 seconds for extracting it. In our study, the embedding and extraction times are greatly shortened: the average running time is 1.2612 seconds for embedding and 0.6012 seconds for extracting, so the proposed algorithm has low complexity.

The experimental results show that the algorithm has good overall performance. Compared with reference [13], the payload capacity of this algorithm is slightly smaller, but it has higher transparency and better robustness against most attacks. Compared with reference [23], this algorithm has a larger payload capacity and better transparency, and its robustness is stronger except against noise corruption with 20 dB, TSM, PSM, and jittering. Under the same payload capacity, the proposed algorithm has lower complexity, better transparency, and stronger robustness than the algorithm in reference [24] against most attacks, especially low-pass filtering with 4 kHz, the exceptions being noise corruption with 20 dB, TSM, and PSM. This is mainly because the embedding and extracting algorithms are designed on the basis of the spectrum distribution, which has good stability, and because the synchronization mechanism improves on the one proposed in reference [24].

6. Conclusion

The spectrum distribution of the low-frequency components of audio changes little after attacks. Based on this feature, a robust and blind audio watermarking algorithm based on spectrum distribution is proposed. The algorithm designs an eigenvalue to represent the spectrum distribution of a frequency-band and develops the embedding algorithm by adjusting the difference of the eigenvalues between two adjacent frequency-bands. The DCT coefficients are modified according to the embedding rules, and the polarity of the embedding depth is determined by comparing the eigenvalues of the two frequency-bands, which minimizes the modification of the DCT coefficients, so the algorithm has good transparency. When extracting, the binary watermark can be extracted blindly simply by judging the difference of the eigenvalues; the original audio is not needed, which is very convenient for practical application. The improved synchronization mechanism takes the small frame with the largest energy in the voiced frame as the synchronization mark to locate the embedding and extracting positions of the watermark, improving the robustness of the algorithm. In addition, other measures are taken to improve robustness, such as embedding the watermark repeatedly into three voiced frames. From the experimental results and the comparison with similar algorithms, the SNR of the proposed algorithm reaches 26.86 dB at a payload capacity of 64 bps, so it has a large payload capacity and good transparency. When the carried audio is attacked by noise corruption, MP3 compression, low-pass filtering, amplitude scaling, TSM, jittering, or random cropping, the extracted watermarks are very similar to the original watermarks, so the algorithm is strongly robust.

Although the proposed algorithm has advantages in transparency, payload capacity, complexity, and robustness against most attacks, it also has shortcomings: its robustness against TSM and PSM needs to be improved. In future research, we will address these shortcomings and further improve robustness against more attack types.

Data Availability

The data are available upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest.

Acknowledgments

This research was funded by the National Natural Science Foundation of China, China (Grant no. 11601202), the High-Level Talent Scientific Research Foundation of the Jinling Institute of Technology, China (Grant no. jit-b-201918), Jiangsu University Student Innovation Training Program (Grant no. 202213573082Y), Collaborative Education Project of the Ministry of Education (Grant no. 202102089002), and Deputy general manager of the Science and Technology of Jiangsu Province (Grant no. FZ20220114).