Advances in Multimedia

Volume 2017 (2017), Article ID 5879257, 10 pages

https://doi.org/10.1155/2017/5879257

## Commutative Watermarking-Encryption of Audio Data with Minimum Knowledge Verification

Stuttgart Media University, Nobelstrasse 10, 70569 Stuttgart, Germany

Correspondence should be addressed to Roland Schmitz

Received 30 September 2016; Accepted 20 February 2017; Published 20 March 2017

Academic Editor: Akram M. Z. M. Khedher

Copyright © 2017 Roland Schmitz and Jan Gruber. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

We present a scheme for commutative watermarking-encryption (CWE) of audio data and demonstrate its robustness against an important class of attacks, Time-Scale Modifications (TSM). In addition, we show how the proposed CWE scheme can be integrated into a cryptographic protocol enabling public verification of the embedded mark without disclosing the mark or the watermarking key used for embedding.

#### 1. Introduction

Commutative watermarking-encryption (CWE) means the combination of watermarking and encryption in such a way that the encryption and watermarking functions commute; that is,

$$W_k(E_K(x), m) = E_K(W_k(x, m)), \tag{1}$$

where $W$ is the watermarking function, $E$ is the encryption function, $K$ is the encryption key, $k$ is the watermarking key, $x$ is the cleartext media data, and $m$ is the mark to be embedded.

If encryption and watermarking do commute, their combination can serve as an important building block within a Digital Rights Management (DRM) System, as detailed further in Section 2. In the present paper, an existing CWE concept for still images [1] is extended to audio files. To the best of our knowledge, this is the first CWE scheme for audio files to appear in the literature. In addition, we show that the presented CWE scheme can be integrated into a modified version of a protocol due to Craver and Katzenbeisser [2], enabling zero-knowledge verification of the watermark, meaning that a verifier can verify the presence of a watermark without disclosure of the mark $m$ or the watermarking key $k$. The rest of the paper is organized as follows: in Section 2, we motivate the need for CWE schemes and identify some basic requirements. In Section 3, we briefly review existing CWE schemes for still images and encryption/watermarking techniques for audio files, with a special emphasis on those algorithms using techniques similar to those in our approach. In Section 4, we present our CWE scheme in detail. Section 5 provides experimental results on the robustness and fidelity of the watermarking part. Section 6 presents the integration of the CWE scheme into a zero-knowledge protocol for verifying the mark, and Section 7 concludes the paper.

#### 2. Motivation for CWE

The concept of commutative watermarking-encryption (CWE) was first discussed in [3] with a special emphasis on watermarking in the encrypted domain. From the left-hand side of (1) it is clear that the watermarking function must be able to act in the encrypted domain, which means that only a limited set of audiovisual features (if any) is available to the embedder and can be used to embed the mark.

##### 2.1. Dispute Resolve Protocols

The prime motivation to look at CWE schemes originates from the need to implement so-called Dispute Resolve Protocols, where a rights owner $R$ provides a digital media object $O$ to a distributor $D$, who in turn sells $O$ to some customer $C$. In this scenario, a number of attacks are possible, most importantly the case where one of the parties sells a copy of $O$ in his own right. In particular, if such a copy is detected, the Dispute Resolve Protocol must be able to identify $R$ as the rightful owner of $O$ and to identify the offending party.

An obvious solution is that $R$ embeds a watermark identifying $R$ as the rightful owner into $O$ and provides the marked object to $D$. The distributor in turn marks $O$ for each customer $C$ with an additional watermark uniquely identifying $C$. Unfortunately, in this scenario the distributor is able to generate identical copies of the marked object and sell them to further customers. If these copies are marked with the identifier of some specific customer $C$, the distributor can repudiate having generated the copies and the customer $C$ could be held responsible for the offence of $D$.

The basic problem here is that $D$ has access to the marked object in plaintext. If a CWE scheme is available, however, the following protocol between a generic seller $S$ and a generic buyer $B$ becomes possible, as proposed in [4]:

(1) $S$ encrypts $O$ with her symmetric key $K_S$. The result is $E_{K_S}(O)$.

(2) $S$ sends $E_{K_S}(O)$ to $B$, together with an individual mark $m$ that $B$ is to embed into $E_{K_S}(O)$.

(3) $B$ embeds $m$ into $E_{K_S}(O)$ and encrypts the result with his own key $K_B$. The final result $E_{K_B}(W_k(E_{K_S}(O), m))$, where $W_k$ denotes watermark embedding with watermarking key $k$, is sent to $S$.

(4) $S$ verifies that $E_{K_B}(W_k(E_{K_S}(O), m))$ contains $m$ as watermark. If the verification is successful, $S$ removes her own encryption and sends the result $E_{K_B}(W_k(O, m))$ back to the buyer.

(5) $B$ removes his encryption from $E_{K_B}(W_k(O, m))$ and is in possession of the individually marked object $W_k(O, m)$.

If the distributor takes the role of the seller in this protocol and the rightsholder performs the en- and decryption operations in steps (1) and (4), respectively, the problem mentioned above can be solved, provided that a CWE scheme for the media object is available. The need for a CWE scheme becomes obvious in steps (3) and (4), where an encrypted media object is watermarked and the presence of a watermark is verified in an encrypted object, respectively. Moreover, steps (3) and (4) call for a public key watermarking scheme, where there is a private embedding key and a public detection key, or an asymmetric scheme, where it is possible to verify the existence of a watermark without fully disclosing the embedding key or the watermark itself.
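The message flow of the five protocol steps can be illustrated with a deliberately simple toy model. The sketch below is not the scheme of this paper or of [4]: it assumes a cyclic shift as the cipher (chosen only because two such shifts commute with each other) and a hypothetical value-wise mark in which the least significant bit of each sample carries one mark bit. The function names `rotate`, `embed`, and `detect` are illustrative.

```python
import random

def rotate(samples, k):
    """Toy cipher: cyclic shift by k positions. Rotations commute with each other."""
    k %= len(samples)
    return samples[-k:] + samples[:-k]

def embed(samples, mark):
    """Toy value-wise mark: the LSB of a sample is set to the mark bit
    selected by the sample's remaining bits, independent of its position."""
    n = len(mark)
    return [(v & ~1) | mark[(v >> 1) % n] for v in samples]

def detect(samples, n_bits):
    """Read the mark back by majority vote over the LSBs of each class."""
    votes = [[0, 0] for _ in range(n_bits)]
    for v in samples:
        votes[(v >> 1) % n_bits][v & 1] += 1
    return [int(ones > zeros) for zeros, ones in votes]

rng = random.Random(7)
media = [rng.randint(-1000, 1000) for _ in range(256)]
mark = [1, 0, 1, 1]
key_seller, key_buyer = 17, 101

step1 = rotate(media, key_seller)                  # (1) seller encrypts
step3 = rotate(embed(step1, mark), key_buyer)      # (2)+(3) buyer embeds, then encrypts
mark_verified = detect(step3, len(mark)) == mark   # (4) seller verifies in the encrypted domain
step4 = rotate(step3, -key_seller)                 #     and removes her own encryption
marked = rotate(step4, -key_buyer)                 # (5) buyer removes his encryption

print(mark_verified, marked == embed(media, mark))  # True True
```

Because the toy mark depends only on sample values and the toy cipher only on sample positions, the buyer ends up with exactly the object that embedding into the plaintext would have produced, although no party other than the buyer ever handles the marked plaintext.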

##### 2.2. DRM Systems

In Digital Rights Management (DRM) Systems [5], encryption and watermarking are often combined in a natural way: the media data are transferred to a compliant media player in encrypted form, so that access to the plaintext data happens only under control of the compliant player. In addition, watermarks are embedded into the media data to have an additional layer of protection which is present even after the data have been decrypted. These watermarks can be used to claim copyright, enforce copying restrictions, or track illegal copies offered on the Internet. If a CWE scheme is used, compliant media players have the opportunity to detect and insert watermarks even in encrypted data. More generally, it should be possible to protect multimedia data throughout the distribution chain in a flexible way by allowing the encryption and watermarking operations to commute [6].

##### 2.3. Searching in Encrypted Databases

With the advent of cloud computing, new security challenges have arisen. For example, cloud computing clients need to secure their data, not only to protect their data from public attacks, but also to protect their data from their cloud service provider [7]. Thus, clients need to encrypt their data in the cloud. On the other hand, a cloud service provider or a client often has the need to search through the client data according to certain metadata or tags. It is therefore highly desirable to provide techniques which can protect the clients’ privacy and offer a large amount of accessibility at the same time. CWE schemes can provide such a solution, if metadata are used as watermarks and embedded into the encrypted data.

#### 3. Related Work

##### 3.1. CWE Schemes for Image Data

To the best of our knowledge, no CWE schemes for audio data have been proposed so far. However, there have been a number of attempts aimed at still images, of which we only review the so-called *invariant encryption* approach, as it is also used in our audio CWE scheme. For a more comprehensive review of existing CWE schemes for still images, see [8].

The invariant encryption approach to CWE as introduced in [1] is to encrypt the media data completely (as opposed to the partial encryption approach, which leaves part of the data unencrypted to host the watermark), but to use a weaker cipher that leaves a feature space of the media data invariant. This invariant feature space can be used to embed a watermark. For example, a permutation cipher can be used for encryption, leaving the global first-order statistics of the image untouched. The invariant feature space is therefore represented by the image histogram and a histogram-based algorithm can be used to embed the mark. The advantage of the invariant encryption approach is that all media data are encrypted (and not just a subset as in partial encryption schemes). The disadvantage, besides using a weaker cipher, is an inherent lack of robustness of the watermark.
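The invariance property that this approach relies on is easy to check numerically. The following minimal sketch assumes a generic key-seeded shuffle as the permutation cipher (a stand-in, not the cipher of [1]) and a short list of integers in place of image or audio data.

```python
import random
from collections import Counter

def permute(samples, key):
    """Toy permutation cipher: reorder positions with a key-seeded PRNG."""
    order = list(range(len(samples)))
    random.Random(key).shuffle(order)
    return [samples[i] for i in order]

rng = random.Random(0)
signal = [rng.randint(-8, 8) for _ in range(1000)]
cipher = permute(signal, key=1234)

# The permutation moves samples around but never changes a value,
# so the histogram (value -> count) is identical before and after.
same_histogram = Counter(signal) == Counter(cipher)
print(same_histogram)  # True
```

Any embedding rule that reads and modifies only the histogram therefore sees the same input in the plaintext and in the encrypted domain.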

##### 3.2. Audio Ciphering Methods

Let the audio signal consist of a set of sample values $\{a(i)\}$. Most existing audio ciphering methods like, for example, [9] or [10], substitute the audio sample values and change (i.e., flatten) the global histogram of the amplitudes of the sample values. The flattening of the histogram makes it impossible to use the histogram for embedding a watermark. In [11], however, a permutation cipher is used to permute the sample values in the time domain, thereby keeping the histogram invariant. This shows that it is possible to transfer the invariant encryption approach to audio data.

##### 3.3. Audio Watermarking Methods

From the host of existing audio watermarking methods (see [12] for an overview), the method by Xiang et al. [13] is the most important for our work, as it uses (a part of) the amplitude histogram for embedding the mark. The range of the audio sample values is split into $L$ equal-sized bins. The amplitude histogram is an $L$-dimensional vector

$$H = (h(1), h(2), \ldots, h(L)),$$

where $h(i)$ denotes the number of samples falling into the $i$th bin. The relevant part of the amplitude histogram consists of the bins covering the interval $[-\lambda \bar{A}, \lambda \bar{A}]$, where $\bar{A}$ is the mean value of the absolute amplitude values and $\lambda$ is some fixed parameter. This condition makes sure that the bins in the relevant part of the histogram are “well filled,” that is, each of them holds a large number of samples. To embed a watermarking bit $w$, a triple of consecutive histogram bins with heights $a$, $b$, $c$ is used. If $w = 1$, the relation $2b/(a + c) \geq T$ should hold, where $T$ is a predefined threshold value. If the relation is not satisfied by the three bins, a certain number of samples is shifted from the first and third bin of the triple into the second bin by adding and subtracting, respectively, a bin width to the samples. An analogous process is carried out if $w = 0$.
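How many samples have to change bins can be worked out directly from the embedding relation. The sketch below is our own illustration rather than the formula of [13]: it assumes the relation for a bit 1 is $2b/(a + c) \geq T$ on the bin heights $a$, $b$, $c$ and that the analogous relation for a bit 0 is $(a + c)/(2b) \geq T$, and it returns only the smallest total number of samples to move, not how they are split between the outer bins. The name `samples_to_move` is hypothetical.

```python
import math

def samples_to_move(a, b, c, bit, T):
    """Smallest number of samples that must change bins so that the
    triple of bin heights (a, b, c) encodes `bit` with threshold T."""
    if bit == 1:
        # Moving I samples from the outer bins into the middle bin gives
        # 2*(b + I) / (a + c - I) >= T, i.e. I >= (T*(a + c) - 2*b) / (2 + T).
        if 2 * b >= T * (a + c):
            return 0
        return math.ceil((T * (a + c) - 2 * b) / (2 + T))
    # Assumed mirror case for bit 0: moving I samples out of the middle bin gives
    # (a + c + I) / (2*(b - I)) >= T, i.e. I >= (2*T*b - (a + c)) / (1 + 2*T).
    if a + c >= 2 * T * b:
        return 0
    return math.ceil((2 * T * b - (a + c)) / (1 + 2 * T))

print(samples_to_move(100, 100, 100, bit=1, T=1.5))  # 29
print(samples_to_move(100, 100, 100, bit=0, T=1.5))  # 25
print(samples_to_move(10, 100, 10, bit=1, T=1.5))    # 0
```

For three equally filled bins of 100 samples and $T = 1.5$, moving 29 samples into the middle bin yields $2 \cdot 129 / 171 \approx 1.51$, whereas 28 samples would give only about 1.49.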

As embedding the mark has altered the mean value of the amplitude values, the correct mean value has to be searched for within a search space $S$ during extraction. For each mean value in $S$, the corresponding histogram is formed and the distance between the first extracted bits and a known synchronization sequence sync is computed. The mean value associated with the minimum distance is used to extract the remaining watermark bits.
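The search can be sketched as follows. This is a simplified illustration, not the extractor of [13]: it assumes three bins per bit over the interval $[-\lambda \bar{A}, \lambda \bar{A}]$, decodes a 1 whenever $2b \geq a + c$, uses the Hamming distance, and takes a finite list of candidate mean values as the search space. The names `extract_bits` and `find_mean` and the synthetic test signal are hypothetical.

```python
def extract_bits(samples, mean, n_bits, lam=2.0):
    """Build the histogram over [-lam*mean, lam*mean] with three bins per bit
    and decode each triple (a, b, c) as 1 if 2*b >= a + c, else 0."""
    n_bins = 3 * n_bits
    lo = -lam * mean
    width = 2 * lam * mean / n_bins
    h = [0] * n_bins
    for v in samples:
        idx = int((v - lo) // width)
        if 0 <= idx < n_bins:
            h[idx] += 1
    bits = []
    for j in range(n_bits):
        a, b, c = h[3 * j:3 * j + 3]
        bits.append(1 if 2 * b >= a + c else 0)
    return bits

def find_mean(samples, candidates, sync, n_bits):
    """Return the candidate mean whose first extracted bits are closest to sync."""
    def distance(mean):
        bits = extract_bits(samples, mean, n_bits)
        return sum(x != y for x, y in zip(bits, sync))
    return min(candidates, key=distance)

# Synthetic signal: samples sit at the bin centres for a mean of 10, with
# bin heights chosen so that the two triples encode the bits 1 and 0.
true_mean, n_bits, lam = 10.0, 2, 2.0
width = 2 * lam * true_mean / (3 * n_bits)
samples = []
for i, count in enumerate([10, 40, 10, 40, 10, 40]):
    samples += [-lam * true_mean + (i + 0.5) * width] * count

best = find_mean(samples, [20.0, 10.0, 30.0], sync=[1, 0], n_bits=n_bits)
print(best)  # 10.0
```

In this example the candidates 20 and 30 decode to the bit strings 01 and 00, so only the candidate 10 reproduces the synchronization sequence exactly.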

The described synchronization process helps to make the watermark robust against TSM attacks (cf. Section 5.2). Although the watermarking scheme is based on the histogram, it cannot be used in conjunction with a permutation cipher to form a CWE scheme, because only a certain number of sample values in a histogram bin are modified. Therefore, after application of the permutation cipher, sample values different from those selected in the plaintext are modified, which destroys the commutativity property. Moreover, the scheme by Xiang et al. does not use a secret watermarking key $k$.

#### 4. The Proposed CWE Scheme

The proposed scheme is based on the earlier ideas [1, 13] described in Section 3. In order to apply them in the audio domain and in order to make the overall scheme more robust to TSM attacks, some modifications were necessary, which are described in the following paragraphs.

##### 4.1. Ciphering Algorithm

An analogue audio signal is transferred into the digital domain by sampling the time-continuous signal at a certain discrete sampling rate. At the same time, the obtained samples are quantized according to the bit depth available, the result being a set of sample values $a(i)$, where $i$ can be seen as a discrete time coordinate. Common bit depths for representing audio are 16, 20, or 24 bits. The general idea is to permute the discrete points in time, while leaving the sample values untouched. In order to generate the permutations, the discrete version of Arnold's Cat Map [14] was used, because it is a well-known chaotic map used by many authors for generating permutations in image ciphering (see, e.g., [15]). The discrete Cat Map is a two-dimensional map defined on a square $N \times N$ grid by

$$\begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} 1 & a \\ b & ab + 1 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} \bmod N,$$

where $a$ and $b$ are parameters that can serve as the secret key if the function is used for encryption purposes. Two-dimensional permutations of the square grid can be quickly generated by repeated application of the Cat Map. Note, however, that there are only $N^2$ different keys. Therefore, it has been proposed in [16] to change the secret parameters in every iteration of the Cat Map. In order to apply the Cat Map to a discrete audio signal of length $n$, the audio signal is rearranged into a square grid of size $N \times N$. If $n$ is not a square number, the signal is padded with random sample values having the same probability distribution (i.e., the same histogram) as the original signal. This makes sure that the padded values cannot be distinguished from the original values by an attacker. Moreover, the original histogram is largely unchanged by the padding (cf. Figure 1). Figure 2 shows the effect of the Cat Map after five iterations on the waveform of an example signal. The resulting PSNR between original and enciphered signal is 16.47 dB.
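The ciphering step can be sketched in a few lines. The code below is a minimal illustration under stated assumptions, not the implementation used for the experiments: it assumes the commonly used generalized Cat Map matrix with rows $(1, a)$ and $(b, ab + 1)$ taken modulo the grid size, chooses the smallest square grid that holds the signal, approximates the padding distribution by drawing padding values from the signal's own samples, and keeps the parameters fixed over all iterations. The helper names `to_grid` and `cat_map` are hypothetical.

```python
import math
import random

def to_grid(signal, rng):
    """Rearrange a non-empty signal into the smallest square grid that holds it,
    padding with values drawn from the signal itself (empirical distribution)."""
    n = math.isqrt(len(signal) - 1) + 1
    padded = signal + [rng.choice(signal) for _ in range(n * n - len(signal))]
    return [padded[r * n:(r + 1) * n] for r in range(n)]

def cat_map(grid, a, b, rounds, inverse=False):
    """Apply the discrete Cat Map (or its inverse) `rounds` times to a square grid."""
    n = len(grid)
    for _ in range(rounds):
        out = [[None] * n for _ in range(n)]
        for x in range(n):
            for y in range(n):
                nx = (x + a * y) % n
                ny = (b * x + (a * b + 1) * y) % n
                if inverse:
                    out[x][y] = grid[nx][ny]   # pull the sample back from (nx, ny)
                else:
                    out[nx][ny] = grid[x][y]   # push the sample to (nx, ny)
        grid = out
    return grid

def flatten(grid):
    return [v for row in grid for v in row]

rng = random.Random(42)
signal = [rng.randint(-32768, 32767) for _ in range(1000)]  # 16-bit toy signal
grid = to_grid(signal, rng)                                  # 32 x 32 grid, 24 padded values
cipher = cat_map(grid, a=3, b=5, rounds=5)
plain = cat_map(cipher, a=3, b=5, rounds=5, inverse=True)

histogram_kept = sorted(flatten(cipher)) == sorted(flatten(grid))
signal_recovered = flatten(plain)[:len(signal)] == signal
print(histogram_kept, signal_recovered)  # True True
```

Since the map matrix has determinant 1, every grid position has exactly one preimage modulo the grid size, which is why the inverse can be implemented simply by reading each sample back from the position it was moved to.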