This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Complete Traceability Multimedia Fingerprinting Codes Resistant to Averaging Attack and Adversarial Noise with Optimal Rate

Ilya Vorobyev
Skolkovo Institute of Science and Technology
Abstract

In this paper we consider complete traceability multimedia fingerprinting codes resistant to averaging attacks and adversarial noise. Recently it was shown that there are no such codes for the case of an arbitrary linear attack. However, for the case of averaging attacks complete traceability multimedia fingerprinting codes of exponential cardinality resistant to constant adversarial noise were constructed in [1]. We continue this work and provide an improved lower bound on the rate of these codes.

1 Introduction

Multimedia fingerprinting codes are used to protect digital content from illegal copying and redistribution. The key idea of this technique is to embed a unique signal, called watermark, into every copy, so that it can be tracked to its buyer [2, 3]. Watermarks should be able to protect the dealer from collusion attack, when a coalition of dishonest users (pirates) construct a new file, for example, by averaging their copies of the same content. By gathering a big enough coalition it is possible to sufficiently decrease the impact of each individual fingerprint, which makes it hard for the dealer to identify the pirates. In papers [4, 5] the authors propose to use separable (or signature) codes to track all members of the coalition.

A model of multimedia fingerprinting with an adversarial noise was proposed in [6], i.e. the coalition of dishonest users can add some noise to the content in order to hide their fingerprints. In [7] it was shown that there are no multimedia codes resistant to a general linear attack and an adversarial noise. However, in [1] the authors proved that for the most common case of averaging attack one can construct multimedia codes with a non-vanishing rate. We continue their research and prove a new lower bound on the rate, which has the same order as an upper bound. A detailed survey of state-of-the-art results can be found in [8].

The rest of the paper is structured as follows. In Section 2 we introduce the required notation and definitions and formally describe the problem. Our main result is proved in Section 3. Section 4 concludes the paper and discusses some open problems.

2 Problem statement

Vectors are denoted by bold letters, such as 𝒙\bm{x}, and the iith entry is referred to as xix_{i}. The set of integers {1, 2,,M}\{1,\,2,\,\ldots,\,M\} is abbreviated by [M][M]. The sign \lVert\cdot\rVert stands for the Euclidean norm. A support supp(𝒙)supp(\bm{x}) of a vector 𝒙\bm{x} is a set of such coordinates ii that xi0x_{i}\neq 0. Scalar (dot) product of vectors 𝒙\bm{x} and 𝒚\bm{y} is denoted as 𝒙,𝒚\langle\bm{x},\,\bm{y}\rangle, greatest common divisor of integers aa and bb is referred to as (a,b)(a,b). For a given binary n×Mn\times M matrix HH with columns 𝒉1,,𝒉M\bm{h}_{1},\,\ldots,\,\bm{h}_{M} and set I[M]I\subset[M] introduce notation for a result of averaging attack

σ(HI)=|I|1iI𝒉i.\sigma(H\mid I)=\lvert I\rvert^{-1}\sum\limits_{i\in I}\bm{h}_{i}.

A binary entropy function h(x)h(x) is defined as follows

h(x)=xlog2x(1x)log2(1x).h(x)=-x\log_{2}x-(1-x)\log_{2}(1-x).

Suppose that multimedia content is represented by a vector 𝒙N\bm{x}\in\mathbb{R}^{N}, which is being sold to MM users. Vector 𝒙\bm{x} is often called a host signal. To protect the content from unauthorized copying the dealer constructs a set of watermarks 𝒘1,,𝒘M\bm{w}_{1},\,\ldots,\,\bm{w}_{M}, which are also called fingerprints. The dealer fixes nn orthonormal vectors 𝒇1,,𝒇n\bm{f}_{1},\,\ldots,\,\bm{f}_{n} of length N,𝒇iNN,\,\bm{f}_{i}\in\mathbb{R}^{N} and forms watermarks 𝒘i\bm{w}_{i} as linear combinations of 𝒇j\bm{f}_{j} with binary coefficients hij{0, 1}h_{ij}\in\{0,\,1\}

𝒘i=j=1nhij𝒇j for i[M].\bm{w}_{i}=\sum\limits_{j=1}^{n}h_{ij}\,\bm{f}_{j}\text{ for }i\in[M]. (1)

Then watermarks are added to the host signal to obtain a final copy 𝒚i\bm{y}_{i} for the ii-th user

𝒚i=𝒙+𝒘i.\bm{y}_{i}=\bm{x}+\bm{w}_{i}.

We assume that 𝒘i𝒙\lVert\bm{w}_{i}\rVert\ll\lVert\bm{x}\rVert, so the added watermark doesn’t change the content much.

A coalition of dishonest users I[M]I\subset[M] may come together to forge a new copy and redistribute it among other users. They can apply a linear attack, i.e., create a new copy 𝒚\bm{y} as a linear combination of their copies. In addition, they may add a noise vector 𝜺\bm{\varepsilon}, 𝜺𝒙\lVert\bm{\varepsilon}\rVert\ll\lVert\bm{x}\rVert, to make it harder for the dealer to identify them.

𝒚=iIλi𝒚i+𝜺,\bm{y}=\sum\limits_{i\in I}\lambda_{i}\,\bm{y}_{i}+\bm{\varepsilon},

where λi>0\lambda_{i}>0 for each dishonest user in II exactly participates in the attack, λi\lambda_{i}\in\mathbb{R} and iIλi=1\sum_{i\in I}\lambda_{i}=1 to ensure the multimedia content 𝒙\bm{x} not be changed. Especially in averaging attack, the last condition is λi=1/|I|\lambda_{i}=1/\lvert I\rvert for every iIi\in I and it implies that

𝒚=iIλi𝒚i+𝜺=𝒙+iIλi𝒘i+𝜺.\bm{y}=\sum\limits_{i\in I}\lambda_{i}\,\bm{y}_{i}+\bm{\varepsilon}=\bm{x}+\sum\limits_{i\in I}\lambda_{i}\,\bm{w}_{i}+\bm{\varepsilon}.

Note that

𝒚𝒙=iIλi𝒘i+𝜺max𝒘i+𝜺𝒙,\lVert\bm{y}-\bm{x}\rVert=\lVert\sum\limits_{i\in I}\lambda_{i}\,\bm{w}_{i}+\bm{\varepsilon}\rVert\leq\max\lVert\bm{w}_{i}\rVert+\lVert\bm{\varepsilon}\rVert\ll\lVert\bm{x}\rVert,

therefore, 𝒚\bm{y} is close enough to the original signal 𝒙\bm{x}.

In order to find the coalition of dishonest users based on the forged copy 𝒚\bm{y}, the dealer evaluates

sk\displaystyle s_{k} =𝒚𝒙,𝒇k\displaystyle=\langle\bm{y}-\bm{x},\bm{f}_{k}\rangle
=iIλij=1nhij𝒇j+𝜺,𝒇k=iIλihik+ek,\displaystyle=\langle\sum\limits_{i\in I}\lambda_{i}\,\sum\limits_{j=1}^{n}h_{ij}\,\bm{f}_{j}+\bm{\varepsilon},\bm{f}_{k}\rangle=\sum\limits_{i\in I}\lambda_{i}\,h_{ik}+e_{k},

where ek=𝜺,𝒇ke_{k}=\langle\bm{\varepsilon},\bm{f}_{k}\rangle, and forms a syndrome vector 𝑺=(s1,,sn)\bm{S}=(s_{1},\,\ldots,\,s_{n}). The syndrome vector 𝑺\bm{S} can be equivalently defined through the matrix equation

𝑺=H𝚲T+𝒆,\bm{S}=H\bm{\Lambda}^{T}+\bm{e},

where Λ=(λ1,,λM)\Lambda=(\lambda_{1},\,\ldots,\,\lambda_{M}), λi=0\lambda_{i}=0 for iIi\notin I, and 𝒆=(e1,,en)\bm{e}=(e_{1},\,\ldots,\,e_{n}), 𝒆𝜺\lVert\bm{e}\rVert\leq\lVert\bm{\varepsilon}\rVert.

The dealer wants to design a matrix HH in such a way, that by observing 𝑺\bm{S} he always can find the support supp(𝚲)supp(\bm{\Lambda}) if the size of the coalition II is at most tt. The following definition for a noiseless scenario was introduced in [6].

Definition 1.

A binary n×Mn\times M matrix HH is called a tt-multimedia digital fingerprinting code with complete traceability (tt-MDF code for short) if for any two distinct coalitions II, II^{\prime}, |I|,|I|t\lvert I\rvert,\lvert I^{\prime}\rvert\leq t, we have

H𝚲TH𝚲TH\bm{\Lambda}^{T}\neq H\bm{\Lambda}^{\prime T}

for any real vectors 𝚲=(λ1,,λM)\bm{\Lambda}=(\lambda_{1},\,\ldots,\,\lambda_{M}) and 𝚲=(λ1,,λM)\bm{\Lambda}^{\prime}=(\lambda_{1}^{\prime},\,\ldots,\,\lambda_{M}^{\prime}), such that λi0\lambda_{i}\geq 0, λi0\lambda_{i}^{\prime}\geq 0, i=1Mλi=i=1Mλi=1\sum\limits_{i=1}^{M}\lambda_{i}=\sum\limits_{i=1}^{M}\lambda_{i}^{\prime}=1, supp(𝚲)=Isupp(\bm{\Lambda})=I, supp(𝚲)=Isupp(\bm{\Lambda}^{\prime})=I^{\prime}.

Denote the maximal cardinality and the maximal rate of tt-MDF code of length nn as M(n,t)M(n,t) and R(n,t)=n1log2M(n,t)R(n,t)=n^{-1}\log_{2}M(n,t). Denote by R(t)R^{*}(t) and R(t)R_{*}(t) an upper and a lower limits of R(n,t)R(n,t) as nn\to\infty. It is known that

Ω(log2tt)R(t)R(t)log2t2t(1+o(1)).\Omega\left(\frac{\log_{2}t}{t}\right)\leq R_{*}(t)\leq R^{*}(t)\leq\frac{\log_{2}t}{2t}(1+o(1)). (2)

The upper bound of (2) can be derived from an upper bound for a binary adder channel from [9]. The lower bound is based on the following observation from [6]. If any 2t2t columns of a binary matrix HH are independent over the field of real numbers \mathbb{R}, then HH is a tt-MDF code. Since parity check matrices of binary codes with a distance d>2td>2t poses this property, application of Goppa or BCH codes gives an explicit construction with a rate R(t)1/tR_{*}(t)\geq 1/t[6]. An improved lower bound Ω(log2tt)\Omega\left(\frac{\log_{2}t}{t}\right) can be derived from the results of the paper [10], where the authors proved the existence of binary n×Mn\times M matrices, n1log2M=Ω(log2t/t)n^{-1}\log_{2}M=\Omega(\log_{2}t/t), such that any 2t2t columns are independent over the field p\mathds{Z}_{p}, p>2tp>2t. We note that the latter result was proved with a probabilistic method, i.e. it’s not explicit.

Now we discuss a noisy scenario. In [7] the authors defined (t,δ)(t,\delta)-light complete traceability multimedia digital fingerprinting codes and proved that they don’t exist. Informally, if some coefficient λi\lambda_{i} is sufficiently small, then it is possible to compensate the signal of iith user by the noise so that it would be impossible to identify this user. However, for the case of averaging attacks, when all non-zero coefficients λi\lambda_{i} are equal, the situation is different. Let us give the corresponding definition from [1].

Definition 2.

A binary n×Mn\times M matrix HH is called a (Euclidean) (t,δ)(t,\delta)-light complete traceability code if for any two distinct coalitions I1I_{1}, I2I_{2}, |I1|,|I2|t\lvert I_{1}\rvert,\lvert I_{2}\rvert\leq t, we have

σ(HI1)+𝒆1σ(HI2)+𝒆2,\sigma(H\mid I_{1})+\bm{e}_{1}\neq\sigma(H\mid I_{2})+\bm{e}_{2},

for any real vectors 𝐞1,𝐞2n\bm{e}_{1},\bm{e}_{2}\in\mathbb{R}^{n}, 𝐞1,𝐞2δ\lVert\bm{e}_{1}\rVert,\lVert\bm{e}_{2}\rVert\leq\delta.

In other words, Euclidean distance between vectors σ(HI1)\sigma(H\mid I_{1}) and σ(HI2)\sigma(H\mid I_{2}), generated by different coalitions I1I_{1} and I2I_{2}, |I1|,|I2|t\lvert I_{1}\rvert,\lvert I_{2}\rvert\leq t, should be big, i.e.

σ(HI1)σ(HI2)>2δ.\lVert\sigma(H\mid I_{1})-\sigma(H\mid I_{2})\rVert>2\delta.
Remark 1.

Although an averaging attack is very restrictive for the coalition, in many papers authors consider only them instead of general linear attacks. One of the arguments is that averaging attack is the most fair choice since all the members of a coalition contribute the same proportion of data into a forged copy [4, 3]. However, in future research it may be reasonable to study a model with different coefficients λi\lambda_{i}, which are lower bounded by some constant.

Define codes for the case of noise vectors with a bounded cardinality of their support.

Definition 3.

A binary n×Mn\times M matrix HH is called a Hamming (t,T)(t,T)-light complete traceability code if for any two distinct coalitions I1I_{1}, I2I_{2}, |I1|,|I2|t\lvert I_{1}\rvert,\lvert I_{2}\rvert\leq t, we have

σ(HI1)+𝒆1σ(HI2)+𝒆2,\sigma(H\mid I_{1})+\bm{e}_{1}\neq\sigma(H\mid I_{2})+\bm{e}_{2},

for any real vectors 𝐞1,𝐞2n\bm{e}_{1},\bm{e}_{2}\in\mathbb{R}^{n}, |supp(𝐞1)|,|supp(𝐞2)|T\lvert supp(\bm{e}_{1})\rvert,\lvert supp(\bm{e}_{2})\rvert\leq T.

Equivalently, the number of different coordinates of vectors σ(HI1)\sigma(H\mid I_{1}) and σ(HI2)\sigma(H\mid I_{2}), generated by distinct coalitions I1I_{1} and I2I_{2}, |I1|,|I2|t\lvert I_{1}\rvert,\lvert I_{2}\rvert\leq t, should be big, i.e.

|supp(σ(HI1)σ(HI2))|>2T.\lvert supp(\sigma(H\mid I_{1})-\sigma(H\mid I_{2}))\rvert>2T.

Denote the maximal cardinality of Euclidean and Hamming light complete traceability codes of length nn by ME(n,t,δ)M_{E}(n,t,\delta) and MH(n,t,T)M_{H}(n,t,T) respectively. Define the rates of these codes as follows

RE(n,t,δ)=log2ME(n,t,δ)n,R_{E}(n,t,\delta)=\frac{\log_{2}M_{E}(n,t,\delta)}{n},
RH(n,t,T)=log2MH(n,t,T)n.R_{H}(n,t,T)=\frac{\log_{2}M_{H}(n,t,T)}{n}.

In the following proposition we show an obvious connection between these two families of codes.

Proposition 1.

1. A Hamming (t,T)(t,T)-light complete traceability code HH is a Euclidean (t,δ)(t,\delta)-light complete traceability code for δ=2T/(2t(t1))\delta=\sqrt{2T}/(2t(t-1)).
2. A Euclidean (t,δ)(t,\delta)-light complete traceability code HH is a Hamming (t,T)(t,T)-light complete traceability code for T=2δ2T=\lfloor 2\delta^{2}\rfloor.
3. The rates of these codes are connected as follows

RE(n,t,2T/(2t(t1)))RH(n,t,T),\displaystyle R_{E}(n,t,\sqrt{2T}/(2t(t-1)))\geq R_{H}(n,t,T),
RH(n,t,2δ2)RE(n,t,δ).\displaystyle R_{H}(n,t,\lfloor 2\delta^{2}\rfloor)\geq R_{E}(n,t,\delta).
Proof.

1. Assume that a Hamming (t,T)(t,T)-light complete traceability code HH is not a Euclidean (t,2T/(2t(t1)))(t,\sqrt{2T}/(2t(t-1)))-light complete traceability code, i.e. there exist two coalitions I1I_{1} and I2I_{2}, such that

𝚫2δ, where 𝚫=σ(HI1)σ(HI2),δ=2T/(2t(t1)).\lVert\bm{\Delta}\rVert\leq 2\delta,\text{ where }\bm{\Delta}=\sigma(H\mid I_{1})-\sigma(H\mid I_{2}),\;\delta=\sqrt{2T}/(2t(t-1)).

Since the minimal positive value of coordinate Δi\Delta_{i} is at least 1/(t(t1))1/(t(t-1)), we conclude that there are at most

4δ2t2(t1)2=2T4\delta^{2}t^{2}(t-1)^{2}=2T

coordinates, in which σ(HI1)\sigma(H\mid I_{1}) and σ(HI2)\sigma(H\mid I_{2}) are different. Hence, there are two vectors 𝒖1\bm{u}_{1}, 𝒖2\bm{u}_{2}, |supp(𝒖1)|,|supp(𝒖2)|T\lvert supp(\bm{u}_{1})\rvert,\lvert supp(\bm{u}_{2})\rvert\leq T, such that

σ(HI1)+𝒖1=σ(HI2)+𝒖2.\sigma(H\mid I_{1})+\bm{u}_{1}=\sigma(H\mid I_{2})+\bm{u}_{2}.

Therefore, HH is not a Hamming (t,T)(t,T)-light complete traceability code. This contradiction proves the first claim.

2. Assume that a Euclidean (t,δ)(t,\delta)-light complete traceability code HH is not a Hamming (t,2δ2)(t,\lfloor 2\delta^{2}\rfloor)-light complete traceability code, i.e. there exist two coalitions I1I_{1} and I2I_{2}, such that

|supp(𝚫)|2T, where 𝚫=σ(HI1)σ(HI2),T=2δ2.\lvert supp(\bm{\Delta})\rvert\leq 2T,\text{ where }\bm{\Delta}=\sigma(H\mid I_{1})-\sigma(H\mid I_{2}),\;T=\lfloor 2\delta^{2}\rfloor.

Since the absolute value of every coordinate of the vector 𝚫\bm{\Delta} is at most 1, we have

𝚫2T2δ,\lVert\bm{\Delta}\rVert\leq\sqrt{2T}\leq 2\delta,

which contradicts the definition of Euclidean (t,δ)(t,\delta)-light complete traceability codes.

3. Claim 3 is an obvious corollary of claims 1 and 2. ∎

In [1] it was proved that lim infnRE(n,t,δ)Ω(1/t)\liminf_{n\to\infty}R_{E}(n,t,\delta)\geq\Omega(1/t) for constant δ\delta. An upper bound is the same as in the noiseless case, lim supnRE(n,t,δ)log2t2t(1+o(1))\limsup_{n\to\infty}R_{E}(n,t,\delta)\leq\frac{\log_{2}t}{2t}(1+o(1)), since the proof works for an averaging attack. Therefore, there is a Θ(log2t)\Theta(\log_{2}t) gap between the lower and upper bound. We eliminate this gap in the next section.

3 Lower bound on the rate of light complete traceability codes

In this section we prove

Theorem 1.

For τ<1/4\tau<1/4

lim infnRH(n,t,τn)(12τ)log2t6t(1+o(1)),t.\liminf_{n\to\infty}R_{H}(n,t,\lfloor\tau n\rfloor)\geq\frac{(1-2\tau)\log_{2}t}{6t}(1+o(1)),\;\;t\to\infty. (3)

Combining Theorem 1 and Proposition 1 we obtain the following

Corollary 1.

For δ2=αn\delta^{2}=\alpha n, α<1/8t2(t1)2)\alpha<1/8t^{2}(t-1)^{2}), we have

lim infnRE(n,t,αn)(12τ)log2t6t(1+o(1)),t,\liminf_{n\to\infty}R_{E}(n,t,\sqrt{\alpha n})\geq\frac{(1-2\tau)\log_{2}t}{6t}(1+o(1)),\;\;t\to\infty,

where τ=2αt2(t1)2.\tau=2\alpha t^{2}(t-1)^{2}.

For the case of small noise δ=o(n)\delta=o(\sqrt{n}) and nn\to\infty a new lower bound has the following form

lim infnRE(n,t,δ)limα0lim infnRE(n,t,αn)log2t6t(1+o(1)).\liminf_{n\to\infty}R_{E}(n,t,\delta)\geq\lim\limits_{\alpha\to 0}\liminf_{n\to\infty}R_{E}(n,t,\sqrt{\alpha n})\geq\frac{\log_{2}t}{6t}(1+o(1)).

It improves the previous lower bound Ω(1/t)\Omega(1/t) and has the same order Θ(log2t/t)\Theta(\log_{2}t/t) as the upper bound. However, the new bound is not explicit, i.e. there is no effective encoding or decoding algorithm for a new code.

Proof of Theorem 1.

Consider a random n×Mn\times M matrix HH, M=2RnM=2^{Rn}, in which every entry is chosen independently and equals 1 with a probability 1/21/2. The value of RR will be specified later. Fix two coalitions I1I_{1} and I2I_{2}, |I1|,|I2|t\lvert I_{1}\rvert,\lvert I_{2}\rvert\leq t. Call a row rr good, if

i1I1hr,i1|I1|i2I2hr,i2|I2|.\frac{\sum\limits_{i_{1}\in I_{1}}h_{r,i_{1}}}{\lvert I_{1}\rvert}\neq\frac{\sum\limits_{i_{2}\in I_{2}}h_{r,i_{2}}}{\lvert I_{2}\rvert}.

Otherwise, we call a row bad. Call a pair of coalitions good, if there are at least 2T+12T+1 good rows for them. Otherwise, call such a pair bad. Then the condition that HH is a Hamming (t,T)(t,T)-light complete traceability code is equivalent to the absence of bad pairs of coalitions.

We say that a bad pair of coalitions I1I_{1} and I2I_{2} is minimal, if there is no another bad pair of coalitions I1I_{1}^{\prime} and I2I_{2}^{\prime}, I1I2I1I2I_{1}^{\prime}\cup I_{2}^{\prime}\subset I_{1}\cup I_{2}. For example, a bad pair of intersecting coalitions I1I_{1} and I2I_{2} with |I1|=|I2|\lvert I_{1}\rvert=\lvert I_{2}\rvert can’t be minimal, since it contains another bad pair I1I2I_{1}\setminus I_{2} and I2I1I_{2}\setminus I_{1}. Obviously, to prove that HH is a Hamming (t,T)(t,T)-light complete traceability code it is enough to check that there are no minimal bad pairs of coalitions.

We are going to prove that a mathematical expectation of the number of minimal bad pairs of coalitions is tending to zero as nn\to\infty. By Markov’s inequality this would imply that for big enough nn there exists (t,T)(t,T)-light complete traceability Hamming code with the rate RR and T=τnT=\lfloor\tau n\rfloor.

Now we estimate the probability that a row ii is bad for coalitions I1I_{1} and I2I_{2}.

Lemma 2.

The probability that a row is bad for coalitions I1I_{1} and I2I_{2}, |I1|=q\lvert I_{1}\rvert=q, |I2|=r\lvert I_{2}\rvert=r, q>rq>r, is upper bounded by p(q)=q1/3+o(1)p(q)=q^{-1/3+o(1)}, qq\to\infty. For non-intersecting coalitions I1I_{1} and I2I_{2}, |I1|=|I2|=q\lvert I_{1}\rvert=\lvert I_{2}\rvert=q, the probability that a row is bad is upper bounded by p(q)=q1/2+o(1)p(q)=q^{-1/2+o(1)}. Moreover, p(q)1/2p(q)\leq 1/2 for all qq.

Proof of Lemma 2.

For the case q=rq=r, I1I2=I_{1}\cap I_{2}=\emptyset, probability of a bad row is equal to

22qi=0q(qi)(qi)=22q(2qq)=O(q0.5),\displaystyle 2^{-2q}\sum\limits_{i=0}^{q}\binom{q}{i}\binom{q}{i}=2^{-2q}\binom{2q}{q}=O(q^{-0.5}),

which is not greater than 1/21/2 for all qq.

Now assume that q>rq>r. Denote the cardinality of the intersection of I1I_{1} and I2I_{2} as kk. Consider two cases qk>sq-k>s and qksq-k\leq s, s=q2/3s=q^{2/3}.

The first case qk>sq-k>s. Note that for any distribution of zeroes and ones in columns from I2I_{2} there exists at most one fraction of ones in I1I2I_{1}\setminus I_{2} which makes the row bad. Hence the probability of obtaining a bad string is upper bounded by

maxl(qkl)2qk1/2.\max\limits_{l}\frac{\binom{q-k}{l}}{2^{q-k}}\leq 1/2.

For qq\to\infty this bound looks as follows

maxl(qkl)2qk<1+o(1)π(qk)/2<1+o(1)πs/2=O(q1/3),\max\limits_{l}\frac{\binom{q-k}{l}}{2^{q-k}}<\frac{1+o(1)}{\sqrt{\pi(q-k)/2}}<\frac{1+o(1)}{\sqrt{\pi s/2}}=O(q^{-1/3}),

where in the first inequality a Stirling’s approximation (qk(qk)/2)2qkπ(qk)/2\binom{q-k}{(q-k)/2}\sim\frac{2^{q-k}}{\sqrt{\pi(q-k)/2}} for a maximal binomial coefficient was used.

The second case qksq-k\leq s. Observe that the greatest common divisor d=(q,r)d=(q,r) is at most ss, since dqrqksd\leq q-r\leq q-k\leq s. Since s1/q=s2/rs_{1}/q=s_{2}/r implies (q/d)s1(q/d)\mid s_{1} and (r/d)s2(r/d)\mid s_{2}, it is readily seen that for a bad row ii the iith coordinate in sums jI1𝒉j\sum\limits_{j\in I_{1}}\bm{h}_{j} and jI2𝒉j\sum\limits_{j\in I_{2}}\bm{h}_{j} should be divided by q/dq/d and r/dr/d respectively. Therefore, probability of a bad row can be upper bounded by the probability PP that a binomial random variable ξBin(q,1/2)\xi\sim Bin(q,1/2) is divided by q/dq1/3q/d\geq q^{1/3}. One can see that P<1/2P<1/2 for q/d>1q/d>1. Now we prove that for qq\to\infty the probability PP is at most q1/3+o(1)q^{-1/3+o(1)}.

By Hoeffding’s inequality [11]

Pr(|ξq/2|>qlnq)2e2lnq=O(q2).\Pr(\lvert\xi-q/2\rvert>\sqrt{q\ln q})\leq 2e^{-2\ln q}=O(q^{-2}).

Define S=[q/2qlnqq/d,q/2+qlnqq/d]S=[\lfloor\frac{q/2-\sqrt{q\ln q}}{q/d}\rfloor,\lfloor\frac{q/2+\sqrt{q\ln q}}{q/d}\rfloor]. Then we can estimate PP as follows

P\displaystyle P =l=0dPr(ξ=lq/d)\displaystyle=\sum\limits_{l=0}^{d}\Pr(\xi=l\cdot q/d)
lSPr(ξ=lq/d)+Pr(|ξq/2|>qlnq)\displaystyle\leq\sum\limits_{l\in S}\Pr(\xi=l\cdot q/d)+\Pr(\lvert\xi-q/2\rvert>\sqrt{q\ln q})
maxxPr(ξ=x)2qlnq+1q/d+O(q2)\displaystyle\leq\max\limits_{x}\Pr(\xi=x)\cdot\left\lceil\frac{2\sqrt{q\ln q}+1}{q/d}\right\rceil+O(q^{-2})
1q2qlnq+1q/d+O(q2)\displaystyle\leq\frac{1}{\sqrt{q}}\cdot\left\lceil\frac{2\sqrt{q\ln q}+1}{q/d}\right\rceil+O(q^{-2})
O(q1/3lnq)=q1/3+o(1).\displaystyle\leq O(q^{-1/3}\sqrt{\ln q})=q^{-1/3+o(1)}.

To estimate a mathematical expectation EE of the number of minimal bad pairs of coalitions we iterate over all <Mq+r<M^{q+r} pairs of coalitions having sizes qq and rr, q>rq>r, all pairs of non-intersecting coalitions of size qq, and over all possible amounts L<2T+1L<2T+1 of good rows.

E\displaystyle E <0<rqtMq+rL=02T(nL)(1p(q))Lp(q)nL\displaystyle<\sum\limits_{0<r\leq q\leq t}M^{q+r}\sum\limits_{L=0}^{2T}\binom{n}{L}(1-p(q))^{L}p(q)^{n-L}
<a)q=1tqM2q(2T+1)(n2T)(1p(q))2Tp(q)n2T\displaystyle\stackrel{{\scriptstyle a)}}{{<}}\sum\limits_{q=1}^{t}qM^{2q}(2T+1)\binom{n}{2T}(1-p(q))^{2T}p(q)^{n-2T}
=q=1t22qRnp(q)n2(h(2τ)+o(1))n(1p(q)p(q))2τn\displaystyle=\sum\limits_{q=1}^{t}2^{2qRn}p(q)^{n}2^{(h(2\tau)+o(1))n}\left(\frac{1-p(q)}{p(q)}\right)^{2\tau n}
=q=1t2A(q)n,\displaystyle=\sum\limits_{q=1}^{t}2^{A(q)n},

where

A(q)=2qR+log2p(q)+h(2τ)+2τlog2(1p(q)p(q)).\displaystyle A(q)=2qR+\log_{2}p(q)+h(2\tau)+2\tau\log_{2}\left(\frac{1-p(q)}{p(q)}\right).

In inequality a) we used the fact that

(nL)(1p(q))Lp(q)nL(n2T)(1p(q))2Tp(q)n2T,\binom{n}{L}(1-p(q))^{L}p(q)^{n-L}\leq\binom{n}{2T}(1-p(q))^{2T}p(q)^{n-2T},

since 2τ<1/21p(q)2\tau<1/2\leq 1-p(q) and 2T(1p(q))n2T\leq(1-p(q))n.

Let R^=minq[1,t]log2p(q)+h(2τ)+2τlog2(1p(q)p(q))2q\hat{R}=\min\limits_{q\in[1,t]}-\frac{\log_{2}p(q)+h(2\tau)+2\tau\log_{2}\left(\frac{1-p(q)}{p(q)}\right)}{2q}. Note that since 2τ<1p(q)2\tau<1-p(q) for all qq then R^>0\hat{R}>0. For R<R^R<\hat{R} the condition A(q)<0A(q)<0 holds, hence, E0E\to 0 as qq\to\infty, which implies that the rate R^\hat{R} is achievable. For tt\to\infty the minimum would be attained at qq, which tends to \infty, so

R^=(12τ)log2t6t(1+o(1)),t.\hat{R}=\frac{(1-2\tau)\log_{2}t}{6t}(1+o(1)),\quad t\to\infty.

Theorem 1 is proved. ∎

4 Conclusion

In this paper we proved a new lower bound on the rate of Euclidean (t,δ)(t,\delta)-light complete traceability codes, which shows that the optimal rate has order Θ(log2t/t)\Theta(\log_{2}t/t). However, the proof uses probabilistic arguments and does not provide an explicit construction with efficient encoding and decoding algorithms. A natural open problem is to design a code with an optimal rate and efficient decoding algorithm.

Coefficient λi\lambda_{i} shows what proportion of the original content was contributed by user ii into an illegal copy. It is natural that if the contribution of user ii was very small then it will be hard for a dealer to identify such user. So, another open task is to design a code capable of finding all members of a coalition for an adversarial noise and linear attack, whose coefficients λi\lambda_{i} are lower bounded by some constant, i.e. all users, whose contribution was big enough.

Acknowledgment

The reported study was supported by RFBR and National Science Foundation of Bulgaria (NSFB), project number 20-51-18002, and by RFBR, grant no. 20-01-00559.

References

  • [1] E. Egorova, M. Fernandez, G. Kabatiansky, and Y. Miao, “Existence and construction of complete traceability multimedia fingerprinting codes resistant to averaging attack and adversarial noise,” Problems of Information Transmission, vol. 56, no. 4, pp. 388–398, 2020.
  • [2] K. R. Liu, W. Trappe, Z. J. Wang, M. Wu, and H. Zhao, Multimedia fingerprinting forensics for traitor tracing.   EURASIP Book Series on Signal Processing and Communications: Hindawi Publishing Corporation, 2005.
  • [3] W. Trappe, M. Wu, Z. J. Wang, and K. R. Liu, “Anti-collusion fingerprinting for multimedia,” IEEE Transactions on Signal Processing, vol. 51, no. 4, pp. 1069–1087, 2003.
  • [4] M. Cheng and Y. Miao, “On anti-collusion codes and detection algorithms for multimedia fingerprinting,” IEEE transactions on information theory, vol. 57, no. 7, pp. 4843–4851, 2011.
  • [5] M. Cheng, L. Ji, and Y. Miao, “Separable codes,” IEEE transactions on information theory, vol. 58, no. 3, pp. 1791–1803, 2011.
  • [6] E. Egorova, M. Fernandez, G. Kabatiansky, and M. H. Lee, “Signature codes for weighted noisy adder channel, multimedia fingerprinting and compressed sensing,” Designs, Codes and Cryptography, vol. 87, no. 2, pp. 455–462, 2019.
  • [7] J. Fan, Y. Gu, M. Hachimori, and Y. Miao, “Signature codes for weighted binary adder channel and multimedia fingerprinting,” IEEE Transactions on Information Theory, vol. 67, no. 1, pp. 200–216, 2020.
  • [8] E. E. Egorova and G. A. Kabatiansky, “Separable collusion-secure multimedia codes,” Problems of Information Transmission, vol. 57, no. 2, pp. 178–198, 2021.
  • [9] A. G. D’yachkov and V. V. Rykov, “On a coding model for a multiple-access adder channel,” Problemy Peredachi Informatsii, vol. 17, no. 2, pp. 26–38, 1981.
  • [10] N. H. Bshouty and H. Mazzawi, “On parity check (0, 1)-matrix over z p,” in Proceedings of the Twenty-Second Annual ACM-SIAM Symposium on Discrete Algorithms.   SIAM, 2011, pp. 1383–1394.
  • [11] W. Hoeffding, “Probability inequalities for sums of bounded random variables,” in The collected works of Wassily Hoeffding.   New York: Springer, 1994, pp. 409–426.