Complete Traceability Multimedia Fingerprinting Codes Resistant to Averaging Attack and Adversarial Noise with Optimal Rate
Abstract
In this paper we consider complete traceability multimedia fingerprinting codes resistant to averaging attacks and adversarial noise. Recently it was shown that there are no such codes for the case of an arbitrary linear attack. However, for the case of averaging attacks complete traceability multimedia fingerprinting codes of exponential cardinality resistant to constant adversarial noise were constructed in [1]. We continue this work and provide an improved lower bound on the rate of these codes.
1 Introduction
Multimedia fingerprinting codes are used to protect digital content from illegal copying and redistribution. The key idea of this technique is to embed a unique signal, called watermark, into every copy, so that it can be tracked to its buyer [2, 3]. Watermarks should be able to protect the dealer from collusion attack, when a coalition of dishonest users (pirates) construct a new file, for example, by averaging their copies of the same content. By gathering a big enough coalition it is possible to sufficiently decrease the impact of each individual fingerprint, which makes it hard for the dealer to identify the pirates. In papers [4, 5] the authors propose to use separable (or signature) codes to track all members of the coalition.
A model of multimedia fingerprinting with an adversarial noise was proposed in [6], i.e. the coalition of dishonest users can add some noise to the content in order to hide their fingerprints. In [7] it was shown that there are no multimedia codes resistant to a general linear attack and an adversarial noise. However, in [1] the authors proved that for the most common case of averaging attack one can construct multimedia codes with a non-vanishing rate. We continue their research and prove a new lower bound on the rate, which has the same order as an upper bound. A detailed survey of state-of-the-art results can be found in [8].
2 Problem statement
Vectors are denoted by bold letters, such as , and the th entry is referred to as . The set of integers is abbreviated by . The sign stands for the Euclidean norm. A support of a vector is a set of such coordinates that . Scalar (dot) product of vectors and is denoted as , greatest common divisor of integers and is referred to as . For a given binary matrix with columns and set introduce notation for a result of averaging attack
A binary entropy function is defined as follows
Suppose that multimedia content is represented by a vector , which is being sold to users. Vector is often called a host signal. To protect the content from unauthorized copying the dealer constructs a set of watermarks , which are also called fingerprints. The dealer fixes orthonormal vectors of length and forms watermarks as linear combinations of with binary coefficients
(1) |
Then watermarks are added to the host signal to obtain a final copy for the -th user
We assume that , so the added watermark doesn’t change the content much.
A coalition of dishonest users may come together to forge a new copy and redistribute it among other users. They can apply a linear attack, i.e., create a new copy as a linear combination of their copies. In addition, they may add a noise vector , , to make it harder for the dealer to identify them.
where for each dishonest user in exactly participates in the attack, and to ensure the multimedia content not be changed. Especially in averaging attack, the last condition is for every and it implies that
Note that
therefore, is close enough to the original signal .
In order to find the coalition of dishonest users based on the forged copy , the dealer evaluates
where , and forms a syndrome vector . The syndrome vector can be equivalently defined through the matrix equation
where , for , and , .
The dealer wants to design a matrix in such a way, that by observing he always can find the support if the size of the coalition is at most . The following definition for a noiseless scenario was introduced in [6].
Definition 1.
A binary matrix is called a -multimedia digital fingerprinting code with complete traceability (-MDF code for short) if for any two distinct coalitions , , , we have
for any real vectors and , such that , , , , .
Denote the maximal cardinality and the maximal rate of -MDF code of length as and . Denote by and an upper and a lower limits of as . It is known that
(2) |
The upper bound of (2) can be derived from an upper bound for a binary adder channel from [9]. The lower bound is based on the following observation from [6]. If any columns of a binary matrix are independent over the field of real numbers , then is a -MDF code. Since parity check matrices of binary codes with a distance poses this property, application of Goppa or BCH codes gives an explicit construction with a rate [6]. An improved lower bound can be derived from the results of the paper [10], where the authors proved the existence of binary matrices, , such that any columns are independent over the field , . We note that the latter result was proved with a probabilistic method, i.e. it’s not explicit.
Now we discuss a noisy scenario. In [7] the authors defined -light complete traceability multimedia digital fingerprinting codes and proved that they don’t exist. Informally, if some coefficient is sufficiently small, then it is possible to compensate the signal of th user by the noise so that it would be impossible to identify this user. However, for the case of averaging attacks, when all non-zero coefficients are equal, the situation is different. Let us give the corresponding definition from [1].
Definition 2.
A binary matrix is called a (Euclidean) -light complete traceability code if for any two distinct coalitions , , , we have
for any real vectors , .
In other words, Euclidean distance between vectors and , generated by different coalitions and , , should be big, i.e.
Remark 1.
Although an averaging attack is very restrictive for the coalition, in many papers authors consider only them instead of general linear attacks. One of the arguments is that averaging attack is the most fair choice since all the members of a coalition contribute the same proportion of data into a forged copy [4, 3]. However, in future research it may be reasonable to study a model with different coefficients , which are lower bounded by some constant.
Define codes for the case of noise vectors with a bounded cardinality of their support.
Definition 3.
A binary matrix is called a Hamming -light complete traceability code if for any two distinct coalitions , , , we have
for any real vectors , .
Equivalently, the number of different coordinates of vectors and , generated by distinct coalitions and , , should be big, i.e.
Denote the maximal cardinality of Euclidean and Hamming light complete traceability codes of length by and respectively. Define the rates of these codes as follows
In the following proposition we show an obvious connection between these two families of codes.
Proposition 1.
1. A Hamming -light complete traceability code is a Euclidean -light complete traceability code for .
2. A Euclidean -light complete traceability code is a Hamming -light complete traceability code for .
3. The rates of these codes are connected as follows
Proof.
1. Assume that a Hamming -light complete traceability code is not a Euclidean -light complete traceability code, i.e. there exist two coalitions and , such that
Since the minimal positive value of coordinate is at least , we conclude that there are at most
coordinates, in which and are different. Hence, there are two vectors , , , such that
Therefore, is not a Hamming -light complete traceability code. This contradiction proves the first claim.
2. Assume that a Euclidean -light complete traceability code is not a Hamming -light complete traceability code, i.e. there exist two coalitions and , such that
Since the absolute value of every coordinate of the vector is at most 1, we have
which contradicts the definition of Euclidean -light complete traceability codes.
3. Claim 3 is an obvious corollary of claims 1 and 2. ∎
In [1] it was proved that for constant . An upper bound is the same as in the noiseless case, , since the proof works for an averaging attack. Therefore, there is a gap between the lower and upper bound. We eliminate this gap in the next section.
3 Lower bound on the rate of light complete traceability codes
In this section we prove
Theorem 1.
For
(3) |
Corollary 1.
For , , we have
where
For the case of small noise and a new lower bound has the following form
It improves the previous lower bound and has the same order as the upper bound. However, the new bound is not explicit, i.e. there is no effective encoding or decoding algorithm for a new code.
Proof of Theorem 1.
Consider a random matrix , , in which every entry is chosen independently and equals 1 with a probability . The value of will be specified later. Fix two coalitions and , . Call a row good, if
Otherwise, we call a row bad. Call a pair of coalitions good, if there are at least good rows for them. Otherwise, call such a pair bad. Then the condition that is a Hamming -light complete traceability code is equivalent to the absence of bad pairs of coalitions.
We say that a bad pair of coalitions and is minimal, if there is no another bad pair of coalitions and , . For example, a bad pair of intersecting coalitions and with can’t be minimal, since it contains another bad pair and . Obviously, to prove that is a Hamming -light complete traceability code it is enough to check that there are no minimal bad pairs of coalitions.
We are going to prove that a mathematical expectation of the number of minimal bad pairs of coalitions is tending to zero as . By Markov’s inequality this would imply that for big enough there exists -light complete traceability Hamming code with the rate and .
Now we estimate the probability that a row is bad for coalitions and .
Lemma 2.
The probability that a row is bad for coalitions and , , , , is upper bounded by , . For non-intersecting coalitions and , , the probability that a row is bad is upper bounded by . Moreover, for all .
Proof of Lemma 2.
For the case , , probability of a bad row is equal to
which is not greater than for all .
Now assume that . Denote the cardinality of the intersection of and as . Consider two cases and , .
The first case . Note that for any distribution of zeroes and ones in columns from there exists at most one fraction of ones in which makes the row bad. Hence the probability of obtaining a bad string is upper bounded by
For this bound looks as follows
where in the first inequality a Stirling’s approximation for a maximal binomial coefficient was used.
The second case . Observe that the greatest common divisor is at most , since . Since implies and , it is readily seen that for a bad row the th coordinate in sums and should be divided by and respectively. Therefore, probability of a bad row can be upper bounded by the probability that a binomial random variable is divided by . One can see that for . Now we prove that for the probability is at most .
To estimate a mathematical expectation of the number of minimal bad pairs of coalitions we iterate over all pairs of coalitions having sizes and , , all pairs of non-intersecting coalitions of size , and over all possible amounts of good rows.
where
In inequality a) we used the fact that
since and .
Let . Note that since for all then . For the condition holds, hence, as , which implies that the rate is achievable. For the minimum would be attained at , which tends to , so
Theorem 1 is proved. ∎
4 Conclusion
In this paper we proved a new lower bound on the rate of Euclidean -light complete traceability codes, which shows that the optimal rate has order . However, the proof uses probabilistic arguments and does not provide an explicit construction with efficient encoding and decoding algorithms. A natural open problem is to design a code with an optimal rate and efficient decoding algorithm.
Coefficient shows what proportion of the original content was contributed by user into an illegal copy. It is natural that if the contribution of user was very small then it will be hard for a dealer to identify such user. So, another open task is to design a code capable of finding all members of a coalition for an adversarial noise and linear attack, whose coefficients are lower bounded by some constant, i.e. all users, whose contribution was big enough.
Acknowledgment
The reported study was supported by RFBR and National Science Foundation of Bulgaria (NSFB), project number 20-51-18002, and by RFBR, grant no. 20-01-00559.
References
- [1] E. Egorova, M. Fernandez, G. Kabatiansky, and Y. Miao, “Existence and construction of complete traceability multimedia fingerprinting codes resistant to averaging attack and adversarial noise,” Problems of Information Transmission, vol. 56, no. 4, pp. 388–398, 2020.
- [2] K. R. Liu, W. Trappe, Z. J. Wang, M. Wu, and H. Zhao, Multimedia fingerprinting forensics for traitor tracing. EURASIP Book Series on Signal Processing and Communications: Hindawi Publishing Corporation, 2005.
- [3] W. Trappe, M. Wu, Z. J. Wang, and K. R. Liu, “Anti-collusion fingerprinting for multimedia,” IEEE Transactions on Signal Processing, vol. 51, no. 4, pp. 1069–1087, 2003.
- [4] M. Cheng and Y. Miao, “On anti-collusion codes and detection algorithms for multimedia fingerprinting,” IEEE transactions on information theory, vol. 57, no. 7, pp. 4843–4851, 2011.
- [5] M. Cheng, L. Ji, and Y. Miao, “Separable codes,” IEEE transactions on information theory, vol. 58, no. 3, pp. 1791–1803, 2011.
- [6] E. Egorova, M. Fernandez, G. Kabatiansky, and M. H. Lee, “Signature codes for weighted noisy adder channel, multimedia fingerprinting and compressed sensing,” Designs, Codes and Cryptography, vol. 87, no. 2, pp. 455–462, 2019.
- [7] J. Fan, Y. Gu, M. Hachimori, and Y. Miao, “Signature codes for weighted binary adder channel and multimedia fingerprinting,” IEEE Transactions on Information Theory, vol. 67, no. 1, pp. 200–216, 2020.
- [8] E. E. Egorova and G. A. Kabatiansky, “Separable collusion-secure multimedia codes,” Problems of Information Transmission, vol. 57, no. 2, pp. 178–198, 2021.
- [9] A. G. D’yachkov and V. V. Rykov, “On a coding model for a multiple-access adder channel,” Problemy Peredachi Informatsii, vol. 17, no. 2, pp. 26–38, 1981.
- [10] N. H. Bshouty and H. Mazzawi, “On parity check (0, 1)-matrix over z p,” in Proceedings of the Twenty-Second Annual ACM-SIAM Symposium on Discrete Algorithms. SIAM, 2011, pp. 1383–1394.
- [11] W. Hoeffding, “Probability inequalities for sums of bounded random variables,” in The collected works of Wassily Hoeffding. New York: Springer, 1994, pp. 409–426.