This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Repairing Reed-Solomon Codes
with Side Information

Thi Xinh Dinh†∗, Ba Thong Le, Son Hoang Dau, Serdar Boztas, Stanislav Kruglik, Han Mao Kiah,
Emanuele Viterbo**, Tuvi Etzion#, and Yeow Meng Chee#
RMIT University, Nanyang Technological University, Tay Nguyen University,
∗∗Monash University, #National University of Singapore
Abstract

We generalize the problem of recovering a lost/erased symbol in a Reed-Solomon code to the scenario in which some side information about the lost symbol is known. The side information is represented as a set SS of linearly independent combinations of the sub-symbols of the lost symbol. When S=S=\varnothing, this reduces to the standard problem of repairing a single codeword symbol. When SS is a set of sub-symbols of the erased one, this becomes the repair problem with partially lost/erased symbol. We first establish that the minimum repair bandwidth depends on |S||S| and not the content of SS and construct a lower bound on the repair bandwidth of a linear repair scheme with side information SS. We then consider the well-known subspace-polynomial repair schemes and show that their repair bandwidths can be optimized by choosing the right subspaces. Finally, we demonstrate several parameter regimes where the optimal bandwidths can be achieved for full-length Reed-Solomon codes.

I Introduction

To prevent data loss and increase data availability in distributed storage systems, a file is usually split into kk data chunks and transformed (encoded) into n>kn>k coded chunks using an erasure code, and then stored across nn different storage nodes. If the code is MDS [1], such a system can withstand any nkn-k failures because the entire file can be recovered from any kk chunks. When only one node fails, which is usually the most typical case (see, e.g. [2]), a repair/replacement node must download enough data from other helper nodes to recover its lost chunk. In the repair-bandwidth problem [3, 4], one seeks to minimize the repair bandwidth, i.e. the amount of downloaded data required for a successful recovery of the lost chunk. A low-bandwidth repair scheme can also be used for degraded read, in which requests for a chunk stored at an unavailable node can be served by other available nodes [5]. This important problem has been extensively studied in the literature (see, e.g. [6] and the references therein).

In this work we generalize the setting of the repair-bandwidth problem to accommodate side information (see Fig. 1 for a toy example). In information theory, the concept of side information has been investigated in numerous contexts, including source coding [7], channel coding [8], list decoding [9], index coding [10, 11], and private information retrieval [12]. In the context of the repair problem, side information refers to the additional information that the repair node knows about the lost chunk while recovering it. The side information could arise due to a partial loss of data, which means that part of the chunk is still accessible and serves as side information, or due to partial information gained from the previous communications or from other sources. The question of interest is that given the side information, what is the lowest repair bandwidth we can achieve. We refer to this as the repair-bandwidth with side information problem.

Refer to caption
Figure 1: An illustration of repair schemes that recover 𝒂=(a1,a2)\bm{a}=(a_{1},a_{2}) with and without side information. The side information a1+a2a_{1}+a_{2} leads to a reduction of 1 bit in the repair bandwidth. The repair node first obtains a2(a2+b1)b1a_{2}\leftarrow(a_{2}+b_{1})-b_{1}, and then a1(a1+a2)a2a_{1}\leftarrow(a_{1}+a_{2})-a_{2}.

In the scope of this work, we focus on Reed-Solomon codes, which is currently the most widely used families of erasure codes in distributed storage systems (see [13]). The repair bandwidth as well as the closely related metric called I/O cost and sub-packetization size have been investigated in a number of recent works for different families of Reed-Solomon codes [14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 13, 33, 34, 35, 36, 37]. For an [n,k]𝔽q[n,k]_{\mathbb{F}_{q^{\ell}}} Reed-Solomon code over the finite field 𝔽q\mathbb{F}_{q^{\ell}}, a coded chunk, which is an element in 𝔽q\mathbb{F}_{q^{\ell}} (called a symbol), is repaired from a number of 𝔽q\mathbb{F}_{q}-elements (called sub-symbols) extracted from other coded chunks. Each symbol 𝜸𝔽q\bm{\gamma}\in\mathbb{F}_{q^{\ell}} consists of \ell sub-symbols in 𝔽q\mathbb{F}_{q}, i.e. 𝜸=(γ1,,γ)𝔽q\bm{\gamma}=(\gamma_{1},\ldots,\gamma_{\ell})\in\mathbb{F}_{q}^{\ell}. The repair bandwidth is measured in the number of extracted sub-symbols, while the side information of a symbol 𝜸\bm{\gamma} can be represented as a set SS of 𝔽q\mathbb{F}_{q}-linearly independent combinations of its sub-symbols γ1,,γ\gamma_{1},\ldots,\gamma_{\ell}. In summary, our contributions are given below.

  • We show that the minimum repair bandwidth for a codeword symbol of a Reed-Solomon code given the side information SS depends on it size |S||S|, and independent of the specific choice of its elements.

  • We obtain a lower bound on the repair bandwidth of a linear repair scheme for a failed node with side information.

  • For subspace-polynomial repair schemes for [n,k]𝔽q[n,k]_{\mathbb{F}_{q^{\ell}}} Reed-Solomon codes with nkqmn-k\geq q^{m}, m<m<\ell ([15, 16, 17, 31]), we prove that special subspaces can be chosen to minimize the repair bandwidth among all such schemes.

  • A subspace that minimizes the repair bandwidth among all subspace-polynomial repair schemes can be found by solving an optimization problem on subspace intersections, which is of its own interest. We solve the problem in a few parameter regimes, leaving others for future research.

The paper is organized as follows. Section II provides required notations and definitions. Section III is devoted for the description and solutions of the repair with side information problem in different cases. We conclude the paper in Section IV.

II Preliminaries

II-A Definitions and Notations

Let qq be a prime power, 𝔽q\mathbb{F}_{q} be the finite field of qq elements and 𝔽q\mathbb{F}_{q^{\ell}} be the extension field of degree \ell of 𝔽q\mathbb{F}_{q}. We use [n][n] to denote the set {1,2,,n}\{1,2,\ldots,n\}, aba\mid b to denote that aa divides bb, and (a,b)(a,b) for gcd(a,b)\gcd(a,b), for a,ba,b\in\mathbb{Z}. For a set UU, let U=U{0}U^{*}\stackrel{{\scriptstyle\mbox{\tiny$\triangle$}}}{{=}}U\setminus\{0\}, and 𝜸U={𝜸u:uU}\bm{\gamma}U\stackrel{{\scriptstyle\mbox{\tiny$\triangle$}}}{{=}}\{\bm{\gamma}u\colon u\in U\}. We use 𝗌𝗉𝖺𝗇𝔽q(U)\mathsf{span}_{\mathbb{F}_{q}}(U) to denote the 𝔽q\mathbb{F}_{q}-subspace of 𝔽q\mathbb{F}_{q^{\ell}} spanned by a subset U𝔽qU\subseteq\mathbb{F}_{q^{\ell}}. We use dim𝔽q()\dim_{\mathbb{F}_{q}}(\cdot) and 𝗋𝖺𝗇𝗄𝔽q()\mathsf{rank}_{\mathbb{F}_{q}}(\cdot) (or dim()\dim(\cdot) and 𝗋𝖺𝗇𝗄()\mathsf{rank}(\cdot) for short) for the dimension of a subspace and the rank of a set of vectors over 𝔽q\mathbb{F}_{q}. The (field) trace of an element 𝒃𝔽q{\bm{b}}\in\mathbb{F}_{q^{\ell}} over 𝔽q\mathbb{F}_{q} is 𝖳𝗋𝔽q/𝔽q(𝒃)=i=01𝒃qi\mathsf{Tr}_{{\mathbb{F}}_{q^{\ell}}/{\mathbb{F}}_{q}}({\bm{b}})=\sum_{i=0}^{\ell-1}{\bm{b}}^{q^{i}}. We also write 𝖳𝗋()\mathsf{Tr}(\cdot) instead of 𝖳𝗋𝔽q/𝔽q()\mathsf{Tr}_{{\mathbb{F}}_{q^{\ell}}/{\mathbb{F}}_{q}}(\cdot) for simplicity.

Let 𝒞{\mathcal{C}} be a linear [n,k][n,k] code over 𝔽q\mathbb{F}_{q^{\ell}}. Then 𝒞{\mathcal{C}} is an kk-dimensional 𝔽q\mathbb{F}_{q^{\ell}}-subspace of 𝔽qn\mathbb{F}_{q^{\ell}}^{n}. A codeword of 𝒞{\mathcal{C}} is an element 𝒄=(𝒄1,𝒄2,,𝒄n)𝒞\vec{\bm{c}}=({\bm{c}}_{1},{\bm{c}}_{2},\ldots,{\bm{c}}_{n})\in{\mathcal{C}} and its codeword symbols are the components 𝒄j{\bm{c}}_{j}, j[n]j\in[n]. The dual code of a code 𝒞{\mathcal{C}} is the orthogonal complement 𝒞{\mathcal{C}}^{\perp} of 𝒞{\mathcal{C}}, 𝒞={𝒈𝔽qn:𝒄,𝒈=0,𝒄𝒞}{\mathcal{C}}^{\perp}=\{\vec{\bm{g}}\in\mathbb{F}_{q^{\ell}}^{n}:\langle\vec{\bm{c}},\vec{\bm{g}}\rangle=0,\forall\vec{\bm{c}}\in{\mathcal{C}}\}, where 𝒄,𝒈\langle\vec{\bm{c}},\vec{\bm{g}}\rangle is the scalar product of 𝒄\vec{\bm{c}} and 𝒈\vec{\bm{g}}. The code 𝒞{\mathcal{C}}^{\perp} is an 𝔽q\mathbb{F}_{q^{\ell}}-subspace of 𝔽qn\mathbb{F}_{q^{\ell}}^{n} with dimension nkn-k. The elements of 𝒞{\mathcal{C}}^{\perp} are called dual codewords. The number nkn-k is called the redundancy of the code 𝒞{\mathcal{C}}.

Definition 1.

Let A={𝜶j}j=1nA=\{\bm{\alpha}_{j}\}_{j=1}^{n} be a subset of size nn in 𝔽q\mathbb{F}_{q^{\ell}}. A Reed-Solomon code RS(A,k)𝔽qn\text{RS}(A,k)\subseteq\mathbb{F}_{q^{\ell}}^{n} of dimension kk with evaluation points AA is defined as

RS(A,k)={(f(𝜶1),,f(𝜶n)):f𝔽q[x],deg(f)<k},\text{RS}(A,k)=\big{\{}\big{(}f(\bm{\alpha}_{1}),\ldots,f(\bm{\alpha}_{n})\big{)}\colon f\in\mathbb{F}_{q^{\ell}}[x],\ \deg(f)<k\big{\}},\vspace{-5pt}

where 𝔽q[x]\mathbb{F}_{q^{\ell}}[x] is the ring of polynomials over 𝔽q\mathbb{F}_{q^{\ell}}. We also use the notation RS(n,k)(n,k), ignoring the evaluation points.

A generalized Reed-Solomon code, GRS(A,k,𝝀)\text{GRS}(A,k,\vec{\bm{\lambda}}), where 𝝀=(𝝀1,,𝝀n)𝔽qn\vec{\bm{\lambda}}=(\bm{\lambda}_{1},\ldots,\bm{\lambda}_{n})\in\mathbb{F}_{q^{\ell}}^{n}, is the set of codewords (𝝀1g(𝜶1),,𝝀ng(𝜶n))\big{(}\bm{\lambda}_{1}g(\bm{\alpha}_{1}),\ldots,\bm{\lambda}_{n}g(\bm{\alpha}_{n})\big{)}, where 𝝀j0\bm{\lambda}_{j}\neq 0 for all j[n]j\in[n], g𝔽q[x],deg(g)<nkg\in\mathbb{F}_{q^{\ell}}[x],\ \deg(g)<n-k. The dual code of a Reed-Solomon code RS(A,k)\text{RS}(A,k) is a generalized Reed-Solomon code GRS(A,nk,𝝀)\text{GRS}(A,n-k,\vec{\bm{\lambda}}), for some multiplier vector 𝝀\vec{\bm{\lambda}} ([1, Chap. 10]). We sometimes use the notation GRS(n,k)(n,k), ignoring AA and 𝝀\vec{\bm{\lambda}}.

Let f(x)f(x) be a polynomial corresponding to a codeword of the Reed-Solomon code 𝒞=RS(A,k){\mathcal{C}}=\text{RS}(A,k), and g(x)g(x) be a polynomial of degree at most nk1n-k-1, which corresponds to a codeword of the dual code 𝒞{\mathcal{C}}^{\perp}. Then j=1ng(𝜶j)(𝝀jf(𝜶j))=0,\sum_{j=1}^{n}g(\bm{\alpha}_{j})\big{(}\bm{\lambda}_{j}f(\bm{\alpha}_{j})\big{)}=0, and we call the polynomial g(x)g(x) a check polynomial for 𝒞{\mathcal{C}}.

II-B Trace Repair Method

Let RS(n,k)(n,k) be a Reed-Solomon code over 𝔽q\mathbb{F}_{q^{\ell}} with evaluation points AA, 𝒄\vec{\bm{c}}  a codeword corresponding to polynomial f(x)f(x), f𝔽q[x],deg(f)<kf\in\mathbb{F}_{q^{\ell}}[x],\ \deg(f)<k, and 𝒄j=f(𝜶){\bm{c}}_{j^{*}}=f(\bm{\alpha}^{*}) is a codeword symbol/node of 𝒄\vec{\bm{c}}, where 𝜶=𝜶jA\bm{\alpha}^{*}=\bm{\alpha}_{j^{*}}\in A. A (linear) trace repair scheme for f(𝜶)f(\bm{\alpha}^{*}) corresponds to a set of \ell check polynomials {gi(x)}i[]\left\{g_{i}(x)\right\}_{i\in[\ell]}, gi𝔽q[x],deg(gi)<nkg_{i}\in\mathbb{F}_{q^{\ell}}[x],\ \deg(g_{i})<n-k, that satisfies the Full-Rank Condition: 𝗋𝖺𝗇𝗄𝔽q{gi(𝜶)}i[]=\mathsf{rank}_{\mathbb{F}_{q}}\{g_{i}(\bm{\alpha}^{*})\}_{i\in[\ell]}=\ell. The repair bandwidth of such a repair scheme (in 𝔽q\mathbb{F}_{q}-symbols) is 𝖻𝗐=𝜶A{𝜶}𝗋𝖺𝗇𝗄𝔽q({gi(𝜶)}i[]){\sf{bw}}=\sum_{\bm{\alpha}\in A\setminus\{\bm{\alpha}^{*}\}}\mathsf{rank}_{\mathbb{F}_{q}}(\{g_{i}(\bm{\alpha})\}_{i\in[\ell]}). To repair all nn components of 𝒄\vec{\bm{c}}, we need nn such repair schemes (possibly with repetition). See, e.g. [31], for a detailed explanation of why the above scheme works with an example.

III Recovering an Erased Symbol with Side Information

III-A The Problem Description

Let 𝒞{\mathcal{C}} be an RS(n,k)(n,k) code over 𝔽q\mathbb{F}_{q^{\ell}} with evaluation points A𝔽qA\subseteq\mathbb{F}_{q^{\ell}}. Suppose that the codeword symbol f(𝜶)f(\bm{\alpha}^{*}) is erased and needs to be recovered, given a set of 𝔽q\mathbb{F}_{q}-linearly independent combinations of its sub-symbols (elements of 𝔽q\mathbb{F}_{q}) as side information. Note that for each vector of coefficients 𝒂=(a1,,a)𝔽q\vec{{\bm{a}}}=(a_{1},\ldots,a_{\ell})\in\mathbb{F}_{q}^{\ell}, there exists a 𝜷𝔽q\bm{\beta}\in\mathbb{F}_{q^{\ell}} such that the equality i=1aiξi=𝖳𝗋(𝜷𝝃)\sum_{i=1}^{\ell}a_{i}\xi_{i}=\mathsf{Tr}(\bm{\beta}\bm{\xi}) holds for all 𝝃𝔽q\bm{\xi}\in\mathbb{F}_{q^{\ell}}. Therefore, we can represent the side information as a set S={𝜷i}i=1sS=\{\bm{\beta}_{i}\}_{i=1}^{s}, where s=|S|s=|S|, and assume that SS is 𝔽q\mathbb{F}_{q}-linearly independent. We assume that the replacement/recovery node already knows ss traces {𝖳𝗋(𝜷if(𝜶))}i[s]\{\mathsf{Tr}(\bm{\beta}_{i}f(\bm{\alpha}^{*}))\}_{i\in[s]}, where {𝜷i}i[s]𝔽q\{\bm{\beta}_{i}\}_{i\in[s]}\subseteq\mathbb{F}_{q^{\ell}} is 𝔽q\mathbb{F}_{q}-linearly independent. Equivalently, we can represent the side information as a subspace 𝒮=𝗌𝗉𝖺𝗇𝔽q(S){\mathcal{S}}\stackrel{{\scriptstyle\mbox{\tiny$\triangle$}}}{{=}}\mathsf{span}_{\mathbb{F}_{q}}(S). We call SS the side information set and 𝒮{\mathcal{S}} the side information subspace. Note that SS is a basis of 𝒮{\mathcal{S}} and s=dim(𝒮)s=\dim({\mathcal{S}}).

According to the trace-repair method, it needs to reconstruct some s\ell-s traces of f(𝜶)f(\bm{\alpha}^{*}), namely, {𝖳𝗋(𝜷if(𝜶))}i[s+1,]\{\mathsf{Tr}(\bm{\beta}_{i}f(\bm{\alpha}^{*}))\}_{i\in[s+1,\ell]}, referred to as target traces, where {𝜷i}i[]\{\bm{\beta}_{i}\}_{i\in[\ell]} is an 𝔽q\mathbb{F}_{q}-basis of 𝔽q\mathbb{F}_{q^{\ell}}. We refer to T={𝜷i}i[s+1,]T\stackrel{{\scriptstyle\mbox{\tiny$\triangle$}}}{{=}}\{\bm{\beta}_{i}\}_{i\in[s+1,\ell]} as the target set and 𝒯=𝗌𝗉𝖺𝗇𝔽q(T){\mathcal{T}}\stackrel{{\scriptstyle\mbox{\tiny$\triangle$}}}{{=}}\mathsf{span}_{\mathbb{F}_{q}}(T) the target subspace with respect to the side information set SS (or the side information subspace 𝒮{\mathcal{S}}). We capture this discussion in Proposition 1. Its proof is similar to [16, Thm. 4] and can be found in Appendix V-A.

Proposition 1.

Let S={𝛃i}i[s]S\stackrel{{\scriptstyle\mbox{\tiny$\triangle$}}}{{=}}\{\bm{\beta}_{i}\}_{i\in[s]} be a linearly independent set and f(𝛂)f(\bm{\alpha}^{*}) be a symbol of Reed-Solomon code RS(n,k)(n,k) over 𝔽q\mathbb{F}_{q^{\ell}}, nqn\leq q^{\ell}. A linear repair scheme for f(𝛂)f(\bm{\alpha}^{*}) with side information SS corresponds to a collection of s\ell-s polynomials {gi(x)}i[s+1,]𝔽q[x]\{g_{i}(x)\}_{i\in[s+1,\ell]}\subset\mathbb{F}_{q^{\ell}}[x], where deg(gi)<nk\deg(g_{i})<n-k, T={gi(𝛂)}i[s+1,]T\stackrel{{\scriptstyle\mbox{\tiny$\triangle$}}}{{=}}\{g_{i}(\bm{\alpha}^{*})\}_{i\in[s+1,\ell]} and STS\cup T are linearly independent.

III-B Optimal Repair Bandwidths Only Depend on the Side Information Set Size

We demonstrate below that the optimal repair bandwidth for recovering an erasure depends on s=|S|s=|S| but not on the specific choice of SS. We first need an auxiliary lemma.

Lemma 1.

Given two 𝔽q\mathbb{F}_{q}-subspaces 𝒮{\mathcal{S}} and 𝒯{\mathcal{T}}^{\prime} of 𝔽q\mathbb{F}_{q^{\ell}} of dimensions ss and t=st=\ell-s, respectively. Then there exists an element 𝛅𝔽q\bm{\delta}\in\mathbb{F}_{q^{\ell}}^{*} such that 𝒮𝒯=𝔽q{\mathcal{S}}\oplus{\mathcal{T}}=\mathbb{F}_{q^{\ell}}, where 𝒯=𝛅𝒯{\mathcal{T}}=\bm{\delta}{\mathcal{T}}^{\prime}. Equivalently, given two 𝔽q\mathbb{F}_{q}-linearly independent subsets S={𝛃i}i[s]S=\{\bm{\beta}_{i}\}_{i\in[s]} and T={𝛃i}i[s+1,]T^{\prime}=\{\bm{\beta}^{\prime}_{i}\}_{i\in[s+1,\ell]} of 𝔽q\mathbb{F}_{q^{\ell}}, where s[]s\in[\ell], there exists 𝛅𝔽q\bm{\delta}\in\mathbb{F}_{q^{\ell}}^{*} such that STS\cup T, where T=𝛅TT=\bm{\delta}T^{\prime}, forms an 𝔽q\mathbb{F}_{q}-basis of 𝔽q\mathbb{F}_{q^{\ell}}.

Proof.

For each 𝜸𝒯{0}\bm{\gamma}\in{\mathcal{T}}^{\prime}\setminus\{0\}, as {𝜹𝜸:𝜹𝔽q}=𝔽q\{\bm{\delta}\bm{\gamma}\colon\bm{\delta}\in\mathbb{F}_{q^{\ell}}^{*}\}=\mathbb{F}_{q^{\ell}}^{*}, there are exactly qs1q^{s}-1 such 𝜹\bm{\delta} so that 𝜹𝜸𝒮{0}\bm{\delta}\bm{\gamma}\in{\mathcal{S}}\setminus\{0\} (as |𝒮|=qs|{\mathcal{S}}|=q^{s}). Let U{𝜹𝔽q:𝜸𝒯{0} such that 𝜹𝜸S}U\triangleq\{\bm{\delta}\in\mathbb{F}_{q^{\ell}}^{*}\colon\exists\bm{\gamma}\in{\mathcal{T}}^{\prime}\setminus\{0\}\text{ such that }\bm{\delta}\bm{\gamma}\in S\}, then U=𝜸𝒯{𝜹𝔽q:𝜹𝜸𝒮}.U=\bigcup_{\bm{\gamma}\in{\mathcal{T}}^{\prime*}}\{\bm{\delta}\in\mathbb{F}_{q^{\ell}}^{*}\colon\bm{\delta}\bm{\gamma}\in{\mathcal{S}}\}. We have

|U|𝜸𝒯|{𝜹𝔽q:𝜹𝜸𝒮}|=𝜸𝒯(qs1)=(qs1)(qs1)=(q1)(qs+qs2)<q1,\begin{split}|U|&\leq\sum_{\bm{\gamma}\in{\mathcal{T}}^{\prime*}}|\{\bm{\delta}\in\mathbb{F}_{q^{\ell}}^{*}\colon\bm{\delta}\bm{\gamma}\in{\mathcal{S}}\}|\\ &=\sum_{\bm{\gamma}\in{\mathcal{T}}^{\prime*}}(q^{s}-1)=(q^{\ell-s}-1)(q^{s}-1)\\ &=(q^{\ell}-1)-(q^{\ell-s}+q^{s}-2)<q^{\ell}-1,\end{split}

for s[]s\in[\ell]. Hence, there exists a 𝜹U\bm{\delta}\notin U, 𝜹0\bm{\delta}\neq 0, satisfying that for every 𝜸𝒯,𝜹𝜸𝒮\bm{\gamma}\in{\mathcal{T}}^{\prime*},\bm{\delta}\bm{\gamma}\notin{\mathcal{S}}. Thus, 𝒮𝜹𝒯={0}{\mathcal{S}}\cap\bm{\delta}{\mathcal{T}}^{\prime}=\{0\} or S𝜹TS\cup\bm{\delta}T^{\prime} forms a basis of 𝔽q\mathbb{F}_{q^{\ell}} as desired. ∎

Proposition 2.

The minimum repair bandwidth for a codeword symbol of an RS(n,k)(n,k) over 𝔽q\mathbb{F}_{q^{\ell}} given the side information SS depends on |S||S| but not on the specific choice of its elements.

Proof.

Let S={𝜷i}i[s]S=\{\bm{\beta}_{i}\}_{i\in[s]} and S={𝜷i}i[s]S^{\prime}=\{\bm{\beta}^{\prime}_{i}\}_{i\in[s]} be two different sets of side information for repairing the same codeword symbol f(𝜶)f(\bm{\alpha}^{*}). It suffices to show that for every repair scheme for f(𝜶)f(\bm{\alpha}^{*}) with side information SS^{\prime}, there exists a repair scheme for f(𝜶)f(\bm{\alpha}^{*}) with side information SS achieving the same repair bandwidth. To this end, let {gi(x)}i[s+1,]𝔽q[x]\{g_{i}(x)\}_{i\in[s+1,\ell]}\subset\mathbb{F}_{q^{\ell}}[x] corresponds to the repair scheme with side information SS^{\prime}, i.e. gi(𝜶)=𝜷ig_{i}(\bm{\alpha}^{*})=\bm{\beta}^{\prime}_{i}, i[s+1,]i\in[s+1,\ell] and {𝜷i}i[]\{\bm{\beta}^{\prime}_{i}\}_{i\in[\ell]} form an 𝔽q\mathbb{F}_{q}-basis of 𝔽q\mathbb{F}_{q^{\ell}}. According to Lemma 1, there exists 𝜹𝔽q\bm{\delta}\in\mathbb{F}_{q^{\ell}} such that {𝜷1,,𝜷s,𝜹𝜷s+1,,𝜹𝜷}\{\bm{\beta}_{1},\ldots,\bm{\beta}_{s},\bm{\delta}\bm{\beta}^{\prime}_{s+1},\ldots,\bm{\delta}\bm{\beta}^{\prime}_{\ell}\} is linearly independent. Therefore, the polynomials hi(x)𝜹gi(x)h_{i}(x)\triangleq\bm{\delta}g_{i}(x) for all i[s+1,]i\in[s+1,\ell] form a repair scheme for f(𝜶)f(\bm{\alpha}^{*}) with side information SS and with the same repair bandwidth as gi(x)g_{i}(x)’s. ∎

III-C A Lower Bound on the Bandwidth with Side Information

We provide a lower bound for the repair bandwidth with side information for one erasure in a Reed-Solomon code. The lower bound is similar to those in [15, 16, 17, 31], replacing qq^{\ell} by qsq^{\ell-s} at some places. When s=0s=0, it reduces to the existing bound (no side information). Its proof can be found in Appendix V-B.

Proposition 3.

Any linear repair scheme with side information size ss for Reed-Solomon code RS(A,k)(A,k) over 𝔽q\mathbb{F}_{q^{\ell}} requires a repair bandwidth of at least

tb𝖠𝖵𝖤+(n1t)b𝖠𝖵𝖤t\lfloor b_{\mathsf{AVE}}\rfloor+(n-1-t)\lceil b_{\mathsf{AVE}}\rceil

sub-symbols over 𝔽q\mathbb{F}_{q}, where n=|A|qn=|A|\leq q^{\ell}, r=nkr=n-k, b𝖠𝖵𝖤b_{\mathsf{AVE}} and tt are defined as b𝖠𝖵𝖤=logq(n1T)b_{\mathsf{AVE}}\stackrel{{\scriptstyle\mbox{\tiny$\triangle$}}}{{=}}\log_{q}(\frac{n-1}{T}), T=(r1)(qs1)+n1qsT\stackrel{{\scriptstyle\mbox{\tiny$\triangle$}}}{{=}}\frac{(r-1)(q^{\ell-s}-1)+n-1}{q^{\ell-s}}, and t=n1t\stackrel{{\scriptstyle\mbox{\tiny$\triangle$}}}{{=}}n-1 if b𝖠𝖵𝖤b_{\mathsf{AVE}}\in{\mathbb{Z}}, t=T(n1)qb𝖠𝖵𝖤qb𝖠𝖵𝖤qb𝖠𝖵𝖤t\stackrel{{\scriptstyle\mbox{\tiny$\triangle$}}}{{=}}\big{\lfloor}\frac{T-(n-1)q^{-\lceil b_{\mathsf{AVE}}\rceil}}{q^{-\lfloor b_{\mathsf{AVE}}\rfloor}-q^{-\lceil b_{\mathsf{AVE}}\rceil}}\big{\rfloor} otherwise.

In some special cases, the lower bound in Proposition 3 can be explicitly computed.

Corollary 1.

Consider a full-length RS(n=q,k)(n=q^{\ell},k) code over 𝔽q\mathbb{F}_{q^{\ell}} with nk=qmn-k=q^{m} for some 1m<1\leq m<\ell. Assume that (s)(\ell-s)\mid\ell and m(s)\ell\geq m(\ell-s). Then every linear repair scheme with side information set size ss requires a repair bandwidth of at least (q1)(s)(qs1)(qm1)q1(q^{\ell}-1)(\ell-s)-\frac{(q^{\ell-s}-1)(q^{m}-1)}{q-1} sub-symbols in 𝔽q\mathbb{F}_{q}.

Proof.

With n=qn=q^{\ell} and nk=qmn-k=q^{m}, we have

T=(qm1)(qs1)+q1qsT=\dfrac{(q^{m}-1)(q^{\ell-s}-1)+q^{\ell}-1}{q^{\ell-s}}\vspace{-5pt}

and

b𝖠𝖵𝖤=logqq1T=logq(q1(qm1)(qs1)+q1qs).b_{\mathsf{AVE}}=\log_{q}\dfrac{q^{\ell}-1}{T}=\log_{q}\bigg{(}\dfrac{q^{\ell}-1}{(q^{m}-1)(q^{\ell-s}-1)+q^{\ell}-1}q^{\ell-s}\bigg{)}.

Next, we show that s1<b𝖠𝖵𝖤<s\ell-s-1<b_{\mathsf{AVE}}<\ell-s. Indeed, the second inequality is obvious because q1<(qm1)(qs1)+q1q^{\ell}-1<(q^{m}-1)(q^{\ell-s}-1)+q^{\ell}-1. For the first inequality, we need to show that

q1(qm1)(qs1)+q1>1q,\dfrac{q^{\ell}-1}{(q^{m}-1)(q^{\ell-s}-1)+q^{\ell}-1}>\dfrac{1}{q},

which is equivalent to

(q1)(q1)>(qm1)(qs1)q1qm1>qs1q1,(q^{\ell}-1)(q-1)>(q^{m}-1)(q^{\ell-s}-1)\Longleftrightarrow\dfrac{q^{\ell}-1}{q^{m}-1}>\dfrac{q^{\ell-s}-1}{q-1},\vspace{-5pt}

which is true because

q1qm1qm(s)1qm1=i=0s1qmii=0s1qi=qs1q1,\dfrac{q^{\ell}-1}{q^{m}-1}\geq\dfrac{q^{m(\ell-s)}-1}{q^{m}-1}=\sum_{i=0}^{\ell-s-1}q^{mi}\geq\sum_{i=0}^{\ell-s-1}q^{i}=\dfrac{q^{\ell-s}-1}{q-1},

noting that either the first or the second inequality must be strict: if m=1m=1 (so that the second inequality becomes equality) then the first inequality is strict since q>qs=qm(s)q^{\ell}>q^{\ell-s}=q^{m(\ell-s)}. Thus, b𝖠𝖵𝖤b_{\mathsf{AVE}}\notin{\mathbb{Z}} and b𝖠𝖵𝖤=s1\lfloor b_{\mathsf{AVE}}\rfloor=\ell-s-1 and b𝖠𝖵𝖤=s\lceil b_{\mathsf{AVE}}\rceil=\ell-s. Plugging this in the formula for tt in Proposition 3 we obtain

t=T(q1)qb𝖠𝖵𝖤qb𝖠𝖵𝖤qb𝖠𝖵𝖤=(qm1)(qs1)q1.t=\left\lfloor\frac{T-(q^{\ell}-1)q^{-\lceil b_{\mathsf{AVE}}\rceil}}{q^{-\lfloor b_{\mathsf{AVE}}\rfloor}-q^{-\lceil b_{\mathsf{AVE}}\rceil}}\right\rfloor=\dfrac{(q^{m}-1)(q^{\ell-s}-1)}{q-1}.

Finally, using Proposition 3 we obtain a lower bound of

t(s1)+(q1t)(s)=(q1)(s)t\displaystyle t(\ell-s-1)+(q^{\ell}-1-t)(\ell-s)=(q^{\ell}-1)(\ell-s)-t
=(q1)(s)(qs1)(qm1)/(q1)\displaystyle=(q^{\ell}-1)(\ell-s)-(q^{\ell-s}-1)(q^{m}-1)/(q-1)

sub-symbols over 𝔽q\mathbb{F}_{q} on the repair bandwidth as claimed. ∎

III-D Optimal Subspace-Polynomial-Based Repair Schemes

In this section we investigate the repair bandwidth incurred by the subspace-polynomial repair scheme introduced in [17, 31], which generalizes the trace-polynomial-based scheme in [15, 16], under the new assumption of side information. We show that in contrast to the standard repair problem (with no side information), the repair bandwidth of such a scheme depends on the specific choice of the subspace. In particular, we transform the problem of finding subspace-polynomial repair schemes with minimum bandwidths possible into another one on subspace intersection, which on its own is an intriguing problem.

Before presenting Theorem 1, we note that given side information set S={𝜷i}i[s]S=\{\bm{\beta}_{i}\}_{i\in[s]}, to construct a subspace-polynomial repair scheme, one first needs to find a target set T={𝜷i}i[s+1,]T=\{\bm{\beta}_{i}\}_{i\in[s+1,\ell]} such that STS\cup T forms an 𝔽q\mathbb{F}_{q}-basis of 𝔽q\mathbb{F}_{q^{\ell}} (see Proposition 1 and the discussion preceding it). Next, given that r=nkqmr=n-k\geq q^{m}, for some m<m<\ell, one picks an mm-dimensional subspace 𝒲{\mathcal{W}} of 𝔽q\mathbb{F}_{q^{\ell}}, and form the s\ell-s check polynomials gi(x)=L𝒲(𝜷i(x𝜶))x𝜶g_{i}(x)\stackrel{{\scriptstyle\mbox{\tiny$\triangle$}}}{{=}}\frac{L_{\mathcal{W}}\big{(}\bm{\beta}_{i}(x-\bm{\alpha}^{*})\big{)}}{x-\bm{\alpha}^{*}}, i[s+1,]i\in[s+1,\ell]. Note that L𝒲(x)=𝝎𝒲(x𝝎)L_{\mathcal{W}}(x)\stackrel{{\scriptstyle\mbox{\tiny$\triangle$}}}{{=}}\prod_{\bm{\omega}\in{\mathcal{W}}}(x-\bm{\omega}) is the subspace polynomial which is a linearly map constructed over 𝒲{\mathcal{W}} and ker(L𝒲)=𝒲\ker(L_{\mathcal{W}})={\mathcal{W}}. The check polynomials are then used in the trace repair scheme.

Lemma 2.

Consider an RS(n,k)(n,k) with evaluation points A𝔽qA\subseteq\mathbb{F}_{q^{\ell}} satisfying nkqmn-k\geq q^{m}, m<m<\ell. Consider also a repair scheme with side information of size ss, that consists of s\ell-s polynomials {gi(x)}i[s+1,]\left\{g_{i}(x)\right\}_{i\in[s+1,\ell]}, where gi(x)L𝒲(𝛃i(x𝛂))/(x𝛂)g_{i}(x)\triangleq L_{\mathcal{W}}\big{(}\bm{\beta}_{i}(x-\bm{\alpha}^{*})\big{)}/(x-\bm{\alpha}^{*}) and T{𝛃i}i[s+1,]T\triangleq\{\bm{\beta}_{i}\}_{i\in[s+1,\ell]} is a target set. This scheme has bandwidth (with |A|1|A|-1 helper nodes)

(|A|1)(s)𝜶A{𝜶}dim((𝜶𝜶)𝒯𝒲)(|A|-1)(\ell-s)-\textstyle\sum_{\bm{\alpha}\in A\setminus\{\bm{\alpha}^{*}\}}\dim\big{(}(\bm{\alpha}-\bm{\alpha}^{*}){\mathcal{T}}\cap{\mathcal{W}}\big{)}

sub-symbols in 𝔽q\mathbb{F}_{q}, where 𝒯𝗌𝗉𝖺𝗇𝔽q(T){\mathcal{T}}\triangleq\mathsf{span}_{\mathbb{F}_{q}}(T).

Proof.

The node storing f(𝜶)f(\bm{\alpha}) computes s\ell-s traces 𝖳𝗋(L𝒲(𝜷i(𝜶𝜶))𝜶𝜶f(𝜶))\mathsf{Tr}\Big{(}\frac{L_{\mathcal{W}}\big{(}\bm{\beta}_{i}(\bm{\alpha}-\bm{\alpha}^{*})\big{)}}{\bm{\alpha}-\bm{\alpha}^{*}}f(\bm{\alpha})\Big{)}, i[s+1,]i\in[s+1,\ell]. However, due to the linearity of trace, it only needs to send 𝗋𝖺𝗇𝗄𝔽q({L𝒲(𝜷i(𝜶𝜶))}i[s,+1])\mathsf{rank}_{\mathbb{F}_{q}}\big{(}\left\{L_{\mathcal{W}}\big{(}\bm{\beta}_{i}(\bm{\alpha}-\bm{\alpha}^{*})\big{)}\right\}_{i\in[s,\ell+1]}\big{)} traces. To compute this rank, let 𝒰(𝜶𝜶)𝒯=𝗌𝗉𝖺𝗇𝔽q((𝜶𝜶)T){\mathcal{U}}\triangleq(\bm{\alpha}-\bm{\alpha}^{*}){\mathcal{T}}=\mathsf{span}_{\mathbb{F}_{q}}\big{(}(\bm{\alpha}-\bm{\alpha}^{*})T\big{)} and τ:𝒰𝔽q\tau\colon{\mathcal{U}}\to\mathbb{F}_{q^{\ell}} defined as τ(u)=L𝒲(u)\tau(u)=L_{\mathcal{W}}(u) for every u𝒰u\in{\mathcal{U}}. Then dim(𝒰)=dim(𝒯)=s\dim({\mathcal{U}})=\dim({\mathcal{T}})=\ell-s and ker(τ)=𝒰ker(L𝒲)=𝒰𝒲\ker(\tau)={\mathcal{U}}\cap\ker(L_{\mathcal{W}})={\mathcal{U}}\cap{\mathcal{W}}. Using the rank-nullity theorem, we obtain

𝗋𝖺𝗇𝗄𝔽q({L𝒲(𝜷i(𝜶𝜶))}i[s,+1])=dim(𝗂𝗆(τ))=dim(𝒰)dim(ker(τ))=(s)dim((𝜶𝜶)𝒯𝒲).\begin{split}&\mathsf{rank}_{\mathbb{F}_{q}}\big{(}\left\{L_{\mathcal{W}}\big{(}\bm{\beta}_{i}(\bm{\alpha}-\bm{\alpha}^{*})\big{)}\right\}_{i\in[s,\ell+1]}\big{)}\\ &=\dim\big{(}{\sf{im}}(\tau)\big{)}=\dim({\mathcal{U}})-\dim\big{(}\ker(\tau)\big{)}\\ &=(\ell-s)-\dim\big{(}(\bm{\alpha}-\bm{\alpha}^{*}){\mathcal{T}}\cap{\mathcal{W}}\big{)}.\end{split}

Summing this over all 𝜶A{𝜶}\bm{\alpha}\in A\setminus\{\bm{\alpha}^{*}\} completes the lemma. ∎

Given the side information subspace 𝒮{\mathcal{S}}, when constructing a subspace-polynomial repair scheme, we have the freedom in selecting relevant target subspace 𝒯{\mathcal{T}} and 𝒲{\mathcal{W}}. Hence, one can optimize the bandwidth over such 𝒯{\mathcal{T}} and 𝒲{\mathcal{W}}. This is in stark contrast to the case of standard repair without side information, in which any subspace 𝒲{\mathcal{W}} would lead to the same repair bandwidth [17, 31]. Theorem 1 formalizes this fact.

Theorem 1.

Consider an RS(A,k)(A,k), A𝔽qA\subseteq\mathbb{F}_{q^{\ell}}, |A|=n|A|=n, and nkqmn-k\geq q^{m}, m<m<\ell. Then the minimum bandwidth that a subspace-polynomial repair scheme for f(𝛂)f(\bm{\alpha}^{*}) (𝛂A\bm{\alpha}^{*}\in A) can achieve, given the side information subspace 𝒮{\mathcal{S}}, dim(𝒮)=s\dim({\mathcal{S}})=s, is

(|A|1)(s)max𝒯,𝒲𝜶A{𝜶}dim((𝜶𝜶)𝒯𝒲),(|A|-1)(\ell-s)-\textstyle\max_{{\mathcal{T}},{\mathcal{W}}}\sum_{\bm{\alpha}\in A\setminus\{\bm{\alpha}^{*}\}}\dim\big{(}(\bm{\alpha}-\bm{\alpha}^{*}){\mathcal{T}}\cap{\mathcal{W}}\big{)},

where the max\max is taken over all 𝔽q\mathbb{F}_{q}-subspaces 𝒯{\mathcal{T}} and 𝒲{\mathcal{W}} of 𝔽q\mathbb{F}_{q^{\ell}} with dim(𝒯)=s\dim({\mathcal{T}})=\ell-s, 𝒮𝒯=𝔽q{\mathcal{S}}\oplus{\mathcal{T}}=\mathbb{F}_{q^{\ell}}, and dim(𝒲)=m\dim({\mathcal{W}})=m.

Proof.

Follows directly from Proposition 1 and Lemma 2. ∎

Note that Theorem 1 converts the repair bandwidth with side information problem restricted to subspace-polynomial repair scheme to a pure subspace-intersection problem stated below.

(Subspace-Intersection Problem) Given 𝜶A𝔽q\bm{\alpha}^{*}\in A\subseteq\mathbb{F}_{q^{\ell}} and an ss-dimensional subspace 𝒮{\mathcal{S}} of 𝔽q\mathbb{F}_{q^{\ell}}, find 𝒯{\mathcal{T}} and 𝒲{\mathcal{W}} that maximizes the sum 𝜶A{𝜶}dim((𝜶𝜶)𝒯𝒲)\sum_{\bm{\alpha}\in A\setminus\{\bm{\alpha}^{*}\}}\dim\big{(}(\bm{\alpha}-\bm{\alpha}^{*}){\mathcal{T}}\cap{\mathcal{W}}\big{)} among all (s)(\ell-s)-dimensional subspaces 𝒯{\mathcal{T}} and mm-dimensional subspaces 𝒲{\mathcal{W}} satisfying 𝒮𝒯=𝔽q{\mathcal{S}}\oplus{\mathcal{T}}=\mathbb{F}_{q^{\ell}}.

The subspace-intersection problem can be tricky to solve, especially for general AA. Therefore, we limit ourselves to the more tractable case when A𝔽qA\equiv\mathbb{F}_{q^{\ell}}, for which optimal repair bandwidths were known for the standard repair setting (without side information) when qkqmq^{\ell}-k\geq q^{m} [17, 31]. We also assume that (s)(\ell-s)\mid\ell. With these assumptions, in Corollary 2, we can replace 𝒯{\mathcal{T}} by 𝔽qs\mathbb{F}_{q^{\ell-s}} in the optimization problem. Note that while 𝔽qs\mathbb{F}_{q^{\ell-s}} may not be a valid 𝒯{\mathcal{T}} (i.e 𝔽qs𝒮𝔽q\mathbb{F}_{q^{\ell-s}}\oplus{\mathcal{S}}\neq\mathbb{F}_{q^{\ell}}), by Lemma 1, at least one of its cosets is. Finally, although this corollary provides an upper bound instead of an exact formulation for the bandwidth, later on, using the lower bound in Proposition 3, we can establish optimal repair bandwidths for subspace polynomial schemes in some parameter regimes.

Corollary 2.

Consider a full-length RS(n=q,k)(n=q^{\ell},k) with evaluation points A=𝔽qA=\mathbb{F}_{q^{\ell}}, where nkqmn-k\geq q^{m}, m<m<\ell. Let SS be a side information set with s=|S|s=|S| and (s)(\ell-s)\mid\ell. Then there exists a subspace-polynomial repair scheme for f(𝛂)f(\bm{\alpha}^{*}) (𝛂A\bm{\alpha}^{*}\in A), given the side information set SS, with repair bandwidth

(q1)(s)max𝒲𝜸𝔽qdim(𝜸𝔽qs𝒲),(q^{\ell}-1)(\ell-s)-\max_{{\mathcal{W}}}\textstyle\sum_{\bm{\gamma}\in\mathbb{F}_{q^{\ell}}^{*}}\dim\big{(}\bm{\gamma}\mathbb{F}_{q^{\ell-s}}\cap{\mathcal{W}}\big{)}, (1)

where the max\max is taken over all mm-dimensional 𝔽q\mathbb{F}_{q}-subspaces 𝒲{\mathcal{W}} of 𝔽q\mathbb{F}_{q^{\ell}}.

Proof.

Let 𝒮=𝗌𝗉𝖺𝗇𝔽q(S){\mathcal{S}}\stackrel{{\scriptstyle\mbox{\tiny$\triangle$}}}{{=}}\mathsf{span}_{\mathbb{F}_{q}}(S) be the side information subspace. By Theorem 1, the minimum bandwidth achieved by a subspace-polynomial repair scheme with side information subspace 𝒮{\mathcal{S}} is

(q1)(s)max𝒯,𝒲𝜶A{𝜶}dim((𝜶𝜶)𝒯𝒲),(q^{\ell}-1)(\ell-s)-\max_{{\mathcal{T}},{\mathcal{W}}}\sum_{\bm{\alpha}\in A\setminus\{\bm{\alpha}^{*}\}}\dim\big{(}(\bm{\alpha}-\bm{\alpha}^{*}){\mathcal{T}}\cap{\mathcal{W}}\big{)}, (2)

where the max\max is taken over all 𝔽q\mathbb{F}_{q}-subspaces 𝒯{\mathcal{T}} and 𝒲{\mathcal{W}} of 𝔽q\mathbb{F}_{q^{\ell}} with dim(𝒯)=s\dim({\mathcal{T}})=\ell-s, 𝒮𝒯=𝔽q{\mathcal{S}}\oplus{\mathcal{T}}=\mathbb{F}_{q^{\ell}}, and dim(𝒲)=m\dim({\mathcal{W}})=m. To prove the existence of a repair scheme with bandwidth given by (1), we show that a (multiplicative) coset of 𝔽qs\mathbb{F}_{q^{\ell-s}} can be a (valid) target subspace. Indeed, according to Lemma 1, there exists 𝜹𝔽q\bm{\delta}\in\mathbb{F}_{q^{\ell}}^{*} such that 𝒮𝜹𝔽qs=𝔽q{\mathcal{S}}\oplus\bm{\delta}\mathbb{F}_{q^{\ell-s}}=\mathbb{F}_{q^{\ell}}, i.e. 𝜹𝔽qs\bm{\delta}\mathbb{F}_{q^{\ell-s}} is a valid target subspace w.r.t. the side information subspace 𝒮{\mathcal{S}}. Hence, setting 𝒯=𝜹𝔽qs{\mathcal{T}}=\bm{\delta}\mathbb{F}_{q^{\ell-s}} in (2) and using the assumption that A=𝔽qA=\mathbb{F}_{q^{\ell}}, there exists a subspace-polynomial repair scheme given the side information subspace 𝒮{\mathcal{S}} that achieves the bandwidth

(q1)(s)max𝒲𝜶𝔽q{𝜶}dim((𝜶𝜶)𝜹𝔽qs𝒲),(q^{\ell}-1)(\ell-s)-\max_{{\mathcal{W}}}\sum_{\bm{\alpha}\in\mathbb{F}_{q^{\ell}}\setminus\{\bm{\alpha}^{*}\}}\dim\big{(}(\bm{\alpha}-\bm{\alpha}^{*})\bm{\delta}\mathbb{F}_{q^{\ell-s}}\cap{\mathcal{W}}\big{)},

which is the same as (1) when replacing (𝜶𝜶)𝜹(\bm{\alpha}-\bm{\alpha}^{*})\bm{\delta} by 𝜸𝔽q\bm{\gamma}\in\mathbb{F}_{q^{\ell}}^{*}, noting that {(𝜶𝜶)𝜹:𝜶𝔽q{𝜶}}𝔽q\big{\{}(\bm{\alpha}-\bm{\alpha}^{*})\bm{\delta}\colon\bm{\alpha}\in\mathbb{F}_{q^{\ell}}\setminus\{\bm{\alpha}^{*}\}\big{\}}\equiv\mathbb{F}_{q^{\ell}}^{*}. ∎

Using Corollary 2, assuming a full-length code with (s)(\ell-s)\mid\ell, we can now construct a few concrete subspace-polynomial repair schemes that achieve optimal repair bandwidths among all linear schemes. To construct the first repair scheme achieving optimal repair bandwidth, we first prove an auxiliary lemma.

Lemma 3.

For every aa\mid\ell, bb\mid\ell, and (a,b)=1(a,b)=1, and for every 𝛄,𝛅𝔽q\bm{\gamma},\bm{\delta}\in\mathbb{F}_{q^{\ell}}^{*}, it holds that dim(𝛄𝔽qa𝛅𝔽qb){0,1}\dim(\bm{\gamma}\mathbb{F}_{q^{a}}\cap\bm{\delta}\mathbb{F}_{q^{b}})\in\{0,1\}.

Proof.

Note that 𝔽qa={𝝃q1qa1}i=0qa2\mathbb{F}_{q^{a}}^{*}=\big{\{}\bm{\xi}^{\frac{q^{\ell}-1}{q^{a}-1}}\big{\}}_{i=0}^{q^{a}-2} and 𝔽qb={𝝃q1qb1}i=0qb2\mathbb{F}_{q^{b}}^{*}=\big{\{}\bm{\xi}^{\frac{q^{\ell}-1}{q^{b}-1}}\big{\}}_{i=0}^{q^{b}-2}, where 𝝃\bm{\xi} is a primitive element of 𝔽q\mathbb{F}_{q^{\ell}}. To show that dim(𝜸𝔽qa𝜹𝔽qb){0,1}\dim(\bm{\gamma}\mathbb{F}_{q^{a}}\cap\bm{\delta}\mathbb{F}_{q^{b}})\in\{0,1\}, it suffices to show that for any 𝒖,𝒗𝜸𝔽qa𝜹𝔽qb{\bm{u}},{\bm{v}}\in\bm{\gamma}\mathbb{F}_{q^{a}}^{*}\cap\bm{\delta}\mathbb{F}_{q^{b}}^{*}, it holds that 𝒖/𝒗𝔽q{\bm{u}}/{\bm{v}}\in\mathbb{F}_{q}. Indeed, for such 𝒖{\bm{u}} and 𝒗{\bm{v}}, there exist xx, yy, zz, and ww such that

𝒖=𝜸(𝝃q1qa1)x=𝜹(𝝃q1qb1)z,𝒗=𝜸(𝝃q1qa1)y=𝜹(𝝃q1qb1)w,\begin{split}{\bm{u}}&=\bm{\gamma}\big{(}\bm{\xi}^{\frac{q^{\ell}-1}{q^{a}-1}}\big{)}^{x}=\bm{\delta}\big{(}\bm{\xi}^{\frac{q^{\ell}-1}{q^{b}-1}}\big{)}^{z},\ {\bm{v}}=\bm{\gamma}\big{(}\bm{\xi}^{\frac{q^{\ell}-1}{q^{a}-1}}\big{)}^{y}=\bm{\delta}\big{(}\bm{\xi}^{\frac{q^{\ell}-1}{q^{b}-1}}\big{)}^{w},\end{split}

which implies that

𝒖𝒗=(𝝃q1qa1)xy=(𝝃q1qb1)zw𝔽qa𝔽qb=𝔽q(a,b)=𝔽q.\frac{{\bm{u}}}{{\bm{v}}}=\big{(}\bm{\xi}^{\frac{q^{\ell}-1}{q^{a}-1}}\big{)}^{x-y}=\big{(}\bm{\xi}^{\frac{q^{\ell}-1}{q^{b}-1}}\big{)}^{z-w}\in\mathbb{F}_{q^{a}}\cap\mathbb{F}_{q^{b}}={\mathbb{F}}_{q^{(a,b)}}=\mathbb{F}_{q}.

The proof follows. ∎

The following theorem indicates the existence of optimal repair schemes for a full-length Reed-Solomon codes with side information size ss, where (s)|(\ell-s)|\ell. We prove that the existing subspace repair scheme can be constructed from a subfield 𝔽qm\mathbb{F}_{q^{m}} of 𝔽q\mathbb{F}_{q^{\ell}}, where nkqmn-k\geq q^{m}, >m1\ell>m\geq 1, m|m|\ell, and (s,m)=1(\ell-s,m)=1. However, for any coset of 𝔽qm\mathbb{F}_{q^{m}}, the proof is still right.

Theorem 2.

Consider a full-length Reed-Solomon codes RS(n=q,k)(n=q^{\ell},k) over 𝔽q\mathbb{F}_{q^{\ell}} with nkqmn-k\geq q^{m} for some >m1\ell>m\geq 1. If (s)(\ell-s)\mid\ell, mm\mid\ell, and (s,m)=1(\ell-s,m)=1, then there exists a linear repair scheme with side information of size ss that uses the repair bandwidth of (q1)(s)(qs1)(qm1)/(q1)(q^{\ell}-1)(\ell-s)-(q^{\ell-s}-1)(q^{m}-1)/(q-1) sub-symbols in 𝔽q\mathbb{F}_{q}. The scheme is optimal when nk=qmn-k=q^{m}.

Proof.

Set 𝒲=𝔽qm{\mathcal{W}}\stackrel{{\scriptstyle\mbox{\tiny$\triangle$}}}{{=}}\mathbb{F}_{q^{m}}, and let T={𝜷i}i=s+1T\stackrel{{\scriptstyle\mbox{\tiny$\triangle$}}}{{=}}\{\bm{\beta}_{i}\}_{i=s+1}^{\ell} be a basis of  𝒯=𝔽qs{\mathcal{T}}\stackrel{{\scriptstyle\mbox{\tiny$\triangle$}}}{{=}}\mathbb{F}_{q^{\ell-s}}. We now consider the subspace repair scheme constructed from 𝒲{\mathcal{W}}. The statement that the achieved bandwidth is optimal among all linear repair schemes is obvious due to Corollary 1, noting that the assumptions (s)(\ell-s)\mid\ell, mm\mid\ell, and (s,m)=1(\ell-s,m)=1 imply (s)m\ell\geq(\ell-s)m. It remains to show that the stated scheme has the stated bandwidth. Indeed, from Lemma 2, it is sufficient to prove that 𝜸𝔽qdim(𝜸𝒯𝒲)=(qs1)(qm1)q1\sum_{\bm{\gamma}\in\mathbb{F}_{q^{\ell}}^{*}}\dim\big{(}\bm{\gamma}{\mathcal{T}}\cap{\mathcal{W}}\big{)}=\frac{(q^{\ell-s}-1)(q^{m}-1)}{q-1}. By Lemma 3, we note that to get the sum 𝜸𝔽qdim(𝜸𝒯𝔽qm)\sum_{\bm{\gamma}\in\mathbb{F}_{q^{\ell}}^{*}}\dim(\bm{\gamma}{\mathcal{T}}\cap\mathbb{F}_{q^{m}}), we compute the number of elements 𝜸\bm{\gamma} so that dim(𝜸𝒯𝔽qm)=1\dim(\bm{\gamma}{\mathcal{T}}\cap\mathbb{F}_{q^{m}})=1. As the set of 11-dimensional intersections 𝜸𝒯𝒲\bm{\gamma}{\mathcal{T}}\cap{\mathcal{W}} is a partition of 𝒲{\mathcal{W}} into 11-dimensional subspaces, there are |𝒲|q1\frac{|{\mathcal{W}}^{*}|}{q-1} such subspaces. Moreover, each of them is repeated qs1q^{\ell-s}-1 times, since 𝜸𝒯=𝜸𝒯\bm{\gamma}^{\prime}{\mathcal{T}}=\bm{\gamma}{\mathcal{T}} for all 𝜸𝜸𝒯\bm{\gamma}^{\prime}\in\bm{\gamma}{\mathcal{T}}. Thus, the number of elements 𝜸\bm{\gamma} with dim(𝜸𝒯𝔽qm)=1\dim(\bm{\gamma}{\mathcal{T}}\cap\mathbb{F}_{q^{m}})=1 is (qs1)(qm1)/(q1)(q^{\ell-s}-1)(q^{m}-1)/(q-1), which completes the proof. ∎

Now we consider a greedy construction that generates mm-dimensional subspaces 𝒲{\mathcal{W}}, m<m<\ell, that generate subspace-polynomial repair schemes with minimal repair bandwidths. Assume that aa\mid\ell. The aim is to construct a subspace 𝒲{\mathcal{W}} of 𝔽q\mathbb{F}_{q^{\ell}} satisfying dim(𝜸𝔽qa𝒲){0,1}\dim(\bm{\gamma}\mathbb{F}_{q^{a}}\cap{\mathcal{W}})\in\{0,1\} for all 𝜸𝔽q\bm{\gamma}\in\mathbb{F}_{q^{\ell}}^{*}.

Lemma 4.

Assume that aa\mid\ell and that q>(qm12)(qa1q1)2+1q^{\ell}>\binom{q^{m-1}}{2}\Big{(}\frac{q^{a}-1}{q-1}\Big{)}^{2}+1. Then there exists an mm-dimensional subspace 𝒲{\mathcal{W}} satisfying that dim(𝛄𝔽qa𝒲){0,1}\dim(\bm{\gamma}\mathbb{F}_{q^{a}}\cap{\mathcal{W}})\in\{0,1\} for all 𝛄𝔽q\bm{\gamma}\in\mathbb{F}_{q^{\ell}}^{*}.

Proof.

We will construct in a greedy manner a set {𝒘1,,𝒘m}𝔽q\{\bm{w}_{1},\dots,\bm{w}_{m}\}\subset\mathbb{F}_{q^{\ell}}^{*} that satisfies two properties given below.

  • (P1) {𝒘1,,𝒘m}\{\bm{w}_{1},\dots,\bm{w}_{m}\} is 𝔽q\mathbb{F}_{q}-linearly independent, and

  • (P2) the subspace 𝒲m=𝗌𝗉𝖺𝗇𝔽q({𝒘1,,𝒘m}){\mathcal{W}}_{m}\stackrel{{\scriptstyle\mbox{\tiny$\triangle$}}}{{=}}\mathsf{span}_{\mathbb{F}_{q}}\big{(}\{\bm{w}_{1},\dots,\bm{w}_{m}\}\big{)} satisfies that dim(𝜸𝔽qa𝒲m){0,1}\dim(\bm{\gamma}\mathbb{F}_{q^{a}}\cap{\mathcal{W}}_{m})\in\{0,1\} for all 𝜸𝔽q\bm{\gamma}\in\mathbb{F}_{q^{\ell}}^{*}

The first element 𝒘1\bm{w}_{1} can be picked arbitrarily in 𝔽q\mathbb{F}_{q^{\ell}}^{*} because 𝒲1=𝗌𝗉𝖺𝗇𝔽q({w1}){\mathcal{W}}_{1}\stackrel{{\scriptstyle\mbox{\tiny$\triangle$}}}{{=}}\mathsf{span}_{\mathbb{F}_{q}}(\{w_{1}\}) satisfies (P1) and (P2) obviously. Assume that we have already had a set {𝒘1,,𝒘m1}\{\bm{w}_{1},\dots,\bm{w}_{m-1}\} that satisfies (P1) and (P2). We now show that we can find 𝒘m\bm{w}_{m} so that {𝒘1,,𝒘m}\{\bm{w}_{1},\dots,\bm{w}_{m}\} satisfies (P1) and (P2) given that aa\mid\ell and q>(qm12)(qa1q1)2+1q^{\ell}>\binom{q^{m-1}}{2}\Big{(}\frac{q^{a}-1}{q-1}\Big{)}^{2}+1. Consider the set

Bm1={𝜶1𝒖+𝜶2𝒗:𝒖,𝒗𝒲m1,𝒖𝒗,𝜶1,𝜶2𝔽qa}.B_{m-1}\stackrel{{\scriptstyle\mbox{\tiny$\triangle$}}}{{=}}\{\bm{\alpha}_{1}{\bm{u}}+\bm{\alpha}_{2}{\bm{v}}\colon{\bm{u}},{\bm{v}}\in{\mathcal{W}}_{m-1},{\bm{u}}\neq{\bm{v}},\bm{\alpha}_{1},\bm{\alpha}_{2}\in\mathbb{F}_{q^{a}}\}.\vspace{-5pt}

Claim 1: |Bm1|(qm12)(qa1q1)2+1<|𝔽q||B_{m-1}|\leq\binom{q^{m-1}}{2}\Big{(}\frac{q^{a}-1}{q-1}\Big{)}^{2}+1<|\mathbb{F}_{q^{\ell}}|.

Claim 2: Any 𝒘m𝔽qBm1\bm{w}_{m}\in\mathbb{F}_{q^{\ell}}\setminus B_{m-1} satisfies (P1)-(P2).

Proof of Claim 1.

Note that 𝜶1=𝜶2=0\bm{\alpha}_{1}=\bm{\alpha}_{2}=0 gives 0Bm10\in B_{m-1}. Since (τ𝜶)𝒖=𝜶(τ𝒖)(\tau\bm{\alpha}){\bm{u}}=\bm{\alpha}(\tau{\bm{u}}) and τ𝒖𝒲m1\tau{\bm{u}}\in{\mathcal{W}}_{m-1} for τ𝔽q\tau\in\mathbb{F}_{q} and 𝒖𝒲m1{\bm{u}}\in{\mathcal{W}}_{m-1}, to count the elements of Bm1B_{m-1} corresponding to 𝜶10\bm{\alpha}_{1}\neq 0 and 𝜶20\bm{\alpha}_{2}\neq 0, we only need to consider (qa1)/(q1)(q^{a}-1)/(q-1) values for each 𝜶1\bm{\alpha}_{1} and 𝜶2\bm{\alpha}_{2}. Moreover, we can ignore the case 𝜶10\bm{\alpha}_{1}\neq 0 and 𝜶2=0\bm{\alpha}_{2}=0 or vice versa as the resulting elements are already counted for 𝜶10\bm{\alpha}_{1}\neq 0 and 𝜶20\bm{\alpha}_{2}\neq 0 when setting either 𝒗=0{\bm{v}}=0 or 𝒖=0{\bm{u}}=0, respectively. Thus, other than 0, Bm1B_{m-1} has at most (qa1q1)2(qm12)\big{(}\frac{q^{a}-1}{q-1}\big{)}^{2}\binom{q^{m-1}}{2} other elements, where the binomial factor counts the number of distinct pairs 𝒖,𝒗𝒲m1{\bm{u}},{\bm{v}}\in{\mathcal{W}}_{m-1}. Thus, |Bm1|(qa1q1)2(qm12)+1|B_{m-1}|\leq\big{(}\frac{q^{a}-1}{q-1}\big{)}^{2}\binom{q^{m-1}}{2}+1 elements. ∎

Proof of Claim 2.

Since 𝒲m1Bm1{\mathcal{W}}_{m-1}\subseteq B_{m-1}, 𝒘m𝒲m1\bm{w}_{m}\notin{\mathcal{W}}_{m-1}, which implies (P1). Assume, for the sake of contradiction, that dim(𝒲m𝜸𝔽qa)2\dim({\mathcal{W}}_{m}\cap\bm{\gamma}\mathbb{F}_{q^{a}})\geq 2 for some 𝜸𝔽q\bm{\gamma}\in\mathbb{F}_{q^{\ell}}^{*}. Then there exist 𝒖,𝒗𝒲m1{\bm{u}},{\bm{v}}\in{\mathcal{W}}_{m-1}, 𝒖𝒗{\bm{u}}\neq{\bm{v}} so that either a) {𝒘m+𝒖,𝒘m+𝒗}𝜸𝔽qa\{\bm{w}_{m}+{\bm{u}},\bm{w}_{m}+{\bm{v}}\}\subset\bm{\gamma}\mathbb{F}_{q^{a}} and 𝗋𝖺𝗇𝗄𝔽q({𝒘m+𝒖,𝒘m+𝒗})=2\mathsf{rank}_{\mathbb{F}_{q}}(\{\bm{w}_{m}+{\bm{u}},\bm{w}_{m}+{\bm{v}}\})=2, or b) {𝒘m+𝒖,𝒗}𝜸𝔽qa\{\bm{w}_{m}+{\bm{u}},{\bm{v}}\}\subset\bm{\gamma}\mathbb{F}_{q^{a}} and 𝗋𝖺𝗇𝗄𝔽q({𝒘m+𝒖,𝒗})=2\mathsf{rank}_{\mathbb{F}_{q}}(\{\bm{w}_{m}+{\bm{u}},{\bm{v}}\})=2. If a) occurs, then there exist 𝒙,𝒚𝔽qa{\bm{x}},{\bm{y}}\in\mathbb{F}_{q^{a}}, 𝒙0{\bm{x}}\neq 0, 𝒚0{\bm{y}}\neq 0, 𝒙𝒚{\bm{x}}\neq{\bm{y}}, so that 𝒘m+𝒖=𝜸𝒙\bm{w}_{m}+{\bm{u}}=\bm{\gamma}{\bm{x}} and 𝒘m+𝒗=𝜸𝒚\bm{w}_{m}+{\bm{v}}=\bm{\gamma}{\bm{y}}, which implies that 𝒘m=𝒚𝒙𝒚𝒖+𝒙𝒚𝒙𝒗Bm1\bm{w}_{m}=\frac{{\bm{y}}}{{\bm{x}}-{\bm{y}}}{\bm{u}}+\frac{{\bm{x}}}{{\bm{y}}-{\bm{x}}}{\bm{v}}\in B_{m-1}, which contradicts our assumption. The case b) can be treated similarly. ∎

The proof of Lemma 4 follows from these two claims. Indeed, by Claim 1, there exists at least one element in 𝔽qBm1\mathbb{F}_{q^{\ell}}\setminus B_{m-1}, which is the desired 𝒘m\bm{w}_{m} according to Claim 2. ∎

Theorem 3.

Consider a full-length Reed-Solomon codes RS(n=q,k)(n=q^{\ell},k) over 𝔽q\mathbb{F}_{q^{\ell}} with nkqmn-k\geq q^{m} for some >m1\ell>m\geq 1. If (s)(\ell-s)\mid\ell and q>(qm12)(qs1q1)2+1q^{\ell}>\binom{q^{m-1}}{2}\Big{(}\frac{q^{\ell-s}-1}{q-1}\Big{)}^{2}+1, then there exists a linear repair scheme with side information of size ss that uses the repair bandwidth of (q1)(s)(qs1)(qm1)/(q1)(q^{\ell}-1)(\ell-s)-(q^{\ell-s}-1)(q^{m}-1)/(q-1) sub-symbols in 𝔽q\mathbb{F}_{q}. The scheme is optimal when nk=qmn-k=q^{m}.

Proof.

By Lemma 4, there exists an mm-dimensional subspace 𝒲{\mathcal{W}} satisfying dim(𝜸𝔽qs𝒲){0,1}\dim(\bm{\gamma}\mathbb{F}_{q^{\ell-s}}\cap{\mathcal{W}})\in\{0,1\}, for all 𝜸𝔽q\bm{\gamma}\in\mathbb{F}_{q^{\ell}}^{*}. The rest of the proof proceeds similarly to that of Theorem 2. ∎

III-E Bandwidth Reductions Given Side Information

To illustrate the repair bandwidth reduction in the presence of side information, we consider as an example the parameter regime assumed in Theorem 2. Case 1: s=1s=\ell-1 and m=/dm=\ell/d for some constant d2d\geq 2. Theorem 2 gives a repair bandwidth with side information 𝖻𝗐𝖲𝖨=(q1)(q/d1){\sf{bw}_{SI}}=(q^{\ell}-1)-(q^{\ell/d}-1). The optimal repair bandwidth with no side information is 𝖻𝗐=(q11)(11/d){\sf{bw}}=(q^{\ell-1}-1)(1-1/d)\ell (see [17, 31]). Clearly, lim𝖻𝗐𝖲𝖨/𝖻𝗐=0\lim_{\ell\to\infty}{\sf{bw}_{SI}}/{\sf{bw}}=0. Case 2: s=c/(c1)s=c\ell/(c-1), i.e. s=/c\ell-s=\ell/c, and m=/dm=\ell/d, for some constants c,d2c,d\geq 2. Then 𝖻𝗐𝖲𝖨=(q1)/c(q/c1)(q/d1)/(q1){\sf{bw}_{SI}}=(q^{\ell}-1)\ell/c-(q^{\ell/c}-1)(q^{\ell/d}-1)/(q-1), whereas 𝖻𝗐=(q1)(d1)/d{\sf{bw}}=(q^{\ell}-1)(d-1)\ell/d. Clearly, 𝖻𝗐𝖻𝗐𝖲𝖨(q/c1)(q/d1)/(q1){\sf{bw}}-{\sf{bw}_{SI}}\geq(q^{\ell/c}-1)(q^{\ell/d}-1)/(q-1)\to\infty as \ell\to\infty.

IV Conclusions

We proposed the problem of repairing a single erasure of Reed-Solomon codes with side information, which generalizes the standard repair problem, and established a lower bound on the repair bandwidth of a linear repair scheme. The problem of constructing optimal subspace-polynomial repair schemes can be reduced to a subspace intersection problem, which is interesting in its own right. We settled this problem for a few parameter regimes, leaving the general case open for future research.

Acknowledgement

The work of Son Hoang Dau was supported by the Australia Research Council DECRA Grant DE180100768 and DP Grant DP200100731. The work of Han Mao Kiah was supported by the Ministry of Education, Singapore, under its MOE AcRF Tier 2 Award under Grant MOE-T2EP20121-0007 and MOE AcRF Tier 1 Award under Grant RG19/23. The work of Stanislav Kruglik was supported by the Ministry of Education, Singapore, under its MOE AcRF Tier 2 Award under Grant MOE-T2EP20121-0007.

References

  • [1] F. J. MacWilliams and N. J. A. Sloane, The Theory of Error-Correcting Codes.   Amsterdam: North-Holland, 1977.
  • [2] K. V. Rashmi, N. B. Shah, D. Gu, H. Kuang, D. Borthakur, and K. Ramchandran, “A solution to the network challenges of data recovery in erasure-coded distributed storage systems: A study on the Facebook warehouse cluster,” in Proc. USENIX Conf. Hot Topics Storage File Syst. (HotStorage), 2013, pp. 8–8.
  • [3] A. Dimakis, P. Godfrey, Y. Wu, M. Wainwright, and K. Ramchandran, “Network coding for distributed storage systems,” IEEE Trans. Inform. Theory, vol. 56, no. 9, pp. 4539–4551, 2010.
  • [4] A. Dimakis, K. Ramchandran, Y. Wu, and C. Suh, “A survey on network codes for distributed storage,” Proc. IEEE, vol. 99, no. 3, pp. 476–489, 2011.
  • [5] O. Khan, R. Burns, J. Plank, W. Pierce, and C. Huang, “Rethinking erasure codes for cloud file systems: Minimizing i/o for recovery and degraded reads,” in Proc. 13th USENIX Conf. File Storage Technol. (FAST), 2012.
  • [6] S. B. Balaji, M. N. Krishnan, M. Vajha, V. Ramkumar, B. Sasidharan, and P. V. Kumar, “Erasure coding for distributed storage: an overview,” Science China Information Sciences, vol. 61, no. 10, 2018.
  • [7] A. Wyner, “On source coding with side information at the decoder,” IEEE Trans. Inform. Theory, vol. 21, no. 3, pp. 294–300, 1975.
  • [8] G. Keshet, Y. Steinberg, and N. Merhav, Channel Coding in the Presence of Side Information, ser. Foundations and Trends in Communication and Information Theory, 2008.
  • [9] V. Guruswami, “List decoding with side information,” in Proc. IEEE Annual Conference on Computational Complexity, 2003, pp. 300–309.
  • [10] Y. Birk and T. Kol, “Coding on demand by an informed source (ISCOD) for efficient broadcast of different supplemental data to caching clients,” IEEE Trans. Inform. Theory, vol. 52, no. 6, pp. 2825–2830, 2006.
  • [11] Z. Bar-Yossef, Y. Birk, T. S. Jayram, and T. Kol, “Index coding with side information,” IEEE Trans. Inform. Theory, vol. 57, no. 3, pp. 1479–1494, 2011.
  • [12] S. Kadhe, B. Garcia, A. Heidarzadeh, S. E. Rouayheb, and A. Sprintson, “Private information retrieval with side information,” IEEE Trans. Inform. Theory, vol. 66, no. 4, pp. 2032–2043, 2020.
  • [13] T. X. Dinh, L. Y. Nhi Nguyen, L. J. Mohan, S. Boztas, T. T. Luong, and S. H. Dau, “Practical considerations in repairing Reed-Solomon codes,” in Proc. IEEE Int. Symp. Inform. Theory (ISIT), 2022, pp. 2607–2612.
  • [14] K. Shanmugam, D. S. Papailiopoulos, A. G. Dimakis, and G. Caire, “A repair framework for scalar MDS codes,” IEEE J. Selected Areas Comm. (JSAC), vol. 32, no. 5, pp. 998–1007, 2014.
  • [15] V. Guruswami and M. Wootters, “Repairing Reed-Solomon codes,” in Proc. Annu. Symp. Theory Comput. (STOC), 2016.
  • [16] ——, “Repairing Reed-Solomon codes,” IEEE Trans. Inform. Theory, vol. 63, no. 9, pp. 5684–5698, 2017.
  • [17] S. H. Dau and O. Milenkovic, “Optimal repair schemes for some families of Reed-Solomon codes,” in Proc. IEEE Int. Symp. Inform. Theory (ISIT), 2017, pp. 346–350.
  • [18] S. H. Dau, I. Duursma, H. M. Kiah, and O. Milenkovic, “Repairing Reed-Solomon codes with two erasures,” in Proc. IEEE Int. Symp. Inform. Theory (ISIT), 2017, pp. 351–355.
  • [19] ——, “Repairing Reed-Solomon codes with multiple erasures,” IEEE Trans. Inform. Theory, vol. 54, no. 10, pp. 6567–6582, 2018.
  • [20] M. Ye and A. Barg, “Explicit constructions of high-rate MDS array codes with optimal repair bandwidth,” IEEE Trans. Inform. Theory, vol. 63, no. 4, pp. 2001–2014, 2017.
  • [21] W. Li, Z. Wang, and H. Jafarkhani, “A tradeoff between the sub-packetization size and the repair bandwidth for Reed-Solomon code,” in Proc. 55th Annual Allerton Conf. Comm. Control Comput. (Allerton), 2017, pp. 942–949.
  • [22] ——, “On the sub-packetization size and the repair bandwidth of Reed-Solomon codes,” IEEE Trans. Inform. Theory, vol. 65, no. 9, pp. 5484–5502, 2019.
  • [23] ——, “Repairing Reed-Solomon Codes Over GF(2)GF(2^{\ell}),” IEEE Comm. Lett., vol. 24, no. 1, pp. 34–37, 2020.
  • [24] A. Chowdhury and A. Vardy, “Improved schemes for asymptotically optimal repair of MDS codes,” in Proc. 55th Annual Allerton Conf. Comm Control Comput. (Allerton), 2017.
  • [25] ——, “Improved schemes for asymptotically optimal repair of MDS codes,” IEEE Trans. Inform. Theory, vol. 67, no. 8, pp. 5051–5068, 2021.
  • [26] I. Tamo, M. Ye, and A. Barg, “Optimal repair of Reed-Solomon codes: Achieving the cut-set bound,” in Proc. 58th Annual IEEE Symp. Foundations Computer Sci. (FOCS), 2017.
  • [27] ——, “The repair problem for Reed-Solomon codes: Optimal repair of single and multiple erasures with almost optimal node size,” IEEE Trans. Inform. Theory, vol. 65, no. 5, pp. 2673–2695, 2018.
  • [28] S. H. Dau and E. Viterbo, “Repair schemes with optimal I/O costs for full-length Reed-Solomon codes with two parities,” in Proc. IEEE Inform. Theory Workshop (ITW), 2018, pp. 590–594.
  • [29] S. H. Dau, I. Duursma, and H. Chu, “On the I/O costs of some repair schemes for full-length Reed-Solomon codes,” in Proc. IEEE Int. Symp. Inform. Theory (ISIT), 2018, pp. 1700–1704.
  • [30] I. Duursma and S. H. Dau, “Low bandwidth repair of the RS(10,4) Reed-Solomon code,” in Proc. Inform. Theory Applicat. Workshop (ITA), 2017.
  • [31] S. H. Dau, T. X. Dinh, H. M. Kiah, T. T. Luong, and O. Milenkovic, “Repairing Reed-Solomon codes via subspace polynomials,” IEEE Trans. Inform. Theory, vol. 67, no. 10, pp. 6395–6407, 2021.
  • [32] W. Li, H. Dau, Z. Wang, H. Jafarkhani, and E. Viterbo, “On the I/O costs in repairing short-length Reed-Solomon codes,” in Proc. IEEE Int. Symp. Inform. Theory (ISIT), 2019, pp. 1087–1091.
  • [33] T. X. Dinh, S. Boztas, S. H. Dau, and E. Viterbo, “Designing compact repair groups for Reed-Solomon codes,” in Proc. IEEE Int. Symp. Inform. Theory (ISIT), 2023, pp. 2027–2032.
  • [34] J. Xu, Y. Zhang, K. Wang, and Z. Zhang, “Cooperative repair of Reed-Solomon codes via linearized permutation polynomials,” IEEE Trans. Inform. Theory, pp. 1–11, 2023.
  • [35] A. Berman, S. Buzaglo, A. Dor, Y. Shany, and I. Tamo, “Repairing Reed–Solomon codes evaluated on subspaces,” in Proc. IEEE Int. Symp. Inform. Theory (ISIT), 2021, pp. 867–871.
  • [36] R. Con and I. Tamo, “Nonlinear repair of Reed-Solomon codes,” IEEE Trans. Inform. Theory, vol. 68, no. 8, pp. 5165–5177, 2022.
  • [37] R. Con, N. Shutty, I. Tamo, and M. Wootters, “Repairing Reed-Solomon codes over prime fields via exponential sums,” in Proc. IEEE Int. Symp. Inform. Theory (ISIT), 2023, pp. 1330–1335.
  • [38] A. W. Marshall, I. Olkin, and B. C. Arnold, Inequalities: Theory of Majorization and Its Applications, ser. Springer Series in Statistics, 2011.

V Appendix

V-A Proof of Proposition 1

The first part of this appendix is devoted for the discussion on the definition and the existence of a linear repair scheme for a failed node with side information of size ss of Reed-Solomon code RS(n,k)(n,k). Similar to an (exact) linear repair scheme for a failed node with standard repair, a linear repair scheme for a node with side information of size ss is described by elements 𝜸\bm{\gamma}’s used in each trace along with a linear algorithm.

We first propose the definition of a linear repair scheme with side information of size ss which is modeled after the definition of a linear repair scheme with the standard repair in [15].

Definition 2.

A linear repair scheme with side information S={𝜷i}i[s]S=\{\bm{\beta}_{i}\}_{i\in[s]} for a symbol f(𝜶)f(\bm{\alpha}^{*}) of Reed-Solomon code RS(n,k)(n,k) with evaluation point set AA, |A|=n|A|=n, over the coding field 𝔽q\mathbb{F}_{q^{\ell}} and the base field 𝔽q\mathbb{F}_{q} consists of
\bullet a set Q𝜶𝔽qQ_{\bm{\alpha}}\subset\mathbb{F}_{q^{\ell}}, for each 𝜶𝜶\bm{\alpha}\neq\bm{\alpha}^{*}, and
\bullet s\ell-s coefficients ηi𝔽q,i[s+1,]\eta_{i}\in\mathbb{F}_{q},i\in[s+1,\ell], where ηi\eta_{i}’s are 𝔽q\mathbb{F}_{q}-linear coefficients of the queries 𝜶A{0}{𝖳𝗋(𝜸f(𝜶)):𝜸Q𝜶}\cup_{\bm{\alpha}\in A\setminus\{0\}}\{\mathsf{Tr}(\bm{\gamma}f(\bm{\alpha})):\bm{\gamma}\in Q_{\bm{\alpha}}\} so that there is a linear reconstruction algorithm that computes f(𝜶)=i[]ηi𝝂if(\bm{\alpha}^{*})=\sum_{i\in[\ell]}\eta_{i}\bm{\nu}_{i}, where ηi=𝖳𝗋(𝜷if(𝜶))\eta_{i}=\mathsf{Tr}\big{(}\bm{\beta}_{i}f(\bm{\alpha}^{*})\big{)}, i[s]i\in[s], which are already known from the side information SS, and {𝝂1,,𝝂}\{\bm{\nu}_{1},\dots,\bm{\nu}_{\ell}\} is an 𝔽q\mathbb{F}_{q}-basis of 𝔽q\mathbb{F}_{q^{\ell}}. The repair bandwidth bb is the total number of sub-symbols in 𝔽q\mathbb{F}_{q} returned by each node 𝜶\bm{\alpha}, i.e., b=𝜶A{𝜶}Q𝜶b=\sum_{\bm{\alpha}\in A\setminus\{\bm{\alpha}^{*}\}}\mid Q_{\bm{\alpha}}\mid.

Lemma 5.

Suppose there is a linear repair scheme for repairing f(𝛂)f(\bm{\alpha}^{*}) of RS(A,k)(A,k) with side information S={𝛃i}i[s]S=\{\bm{\beta}_{i}\}_{i\in[s]} given by a set {Q𝛂}𝛂A{𝛂}\{Q_{\bm{\alpha}}\}_{\bm{\alpha}\in A\setminus\{\bm{\alpha}^{*}\}} and a linear algorithm as in Definition 2. Then, there is an 𝔽q\mathbb{F}_{q}-linearly independent set B={𝛃i}i[s+1,]B=\{\bm{\beta}_{i}\}_{i\in[s+1,\ell]} so that there are elements 𝛍𝛃i,𝛂\bm{\mu}_{\bm{\beta}_{i},\bm{\alpha}} satisfying 𝛃if(𝛂)=𝛂A{𝛂}𝛍𝛃i,𝛂f(𝛂)\bm{\beta}_{i}f(\bm{\alpha}^{*})=\sum_{\bm{\alpha}\in A\setminus\{\bm{\alpha}^{*}\}}\bm{\mu}_{\bm{\beta}_{i},\bm{\alpha}}f(\bm{\alpha}), for all f𝔽q[x]f\in\mathbb{F}_{q^{\ell}}[x] of degree less than kk, where {𝛍𝛃i,𝛂}𝛃iB𝗌𝗉𝖺𝗇𝔽q(Q𝛂)\{\bm{\mu}_{\bm{\beta}_{i},\bm{\alpha}}\}_{\bm{\beta}_{i}\in B}\subseteq\mathsf{span}_{\mathbb{F}_{q}}(Q_{\bm{\alpha}}).

Proof.

Suppose B={𝜷s+1,,𝜷}B=\{\bm{\beta}_{s+1},\dots,\bm{\beta}_{\ell}\} is an 𝔽q\mathbb{F}_{q}-linearly independent set of 𝔽q\mathbb{F}_{q^{\ell}} so that {𝜷1,,𝜷}\{\bm{\beta}_{1},\dots,\bm{\beta}_{\ell}\} is a basis of 𝔽q\mathbb{F}_{q^{\ell}} and {𝝂1,,𝝂}\{\bm{\nu}_{1},\dots,\bm{\nu}_{\ell}\} is its trace dual basis. According to Definition 2, the linear repair algorithm computes coefficients ηi𝔽q\eta_{i}\in\mathbb{F}_{q} so that f(𝜶)=i[]ηi𝝂if(\bm{\alpha}^{*})=\sum_{i\in[\ell]}\eta_{i}\bm{\nu}_{i}, where ηi=𝖳𝗋(𝜷if(𝜶))\eta_{i}=\mathsf{Tr}\big{(}\bm{\beta}_{i}f(\bm{\alpha}^{*})\big{)} for i[s]i\in[s], and ηi\eta_{i}’s, i[s+1,]i\in[s+1,\ell], are 𝔽q\mathbb{F}_{q}-linear functions of the queries in {𝖳𝗋(𝜸f(𝜶)):𝜸𝜶A{𝜶}Q𝜶}\big{\{}\mathsf{Tr}\big{(}\bm{\gamma}f(\bm{\alpha})\big{)}:\bm{\gamma}\in\cup_{\bm{\alpha}\in A\setminus\{\bm{\alpha}^{*}\}}Q_{\bm{\alpha}}\big{\}}, i.e., for i[s+1,]i\in[s+1,\ell],

ηi=𝜶A{𝜶}𝜸Q𝜶ω𝜶,𝜸𝖳𝗋(𝜸f(𝜶))=𝜶A{𝜶}𝖳𝗋((𝜸Q𝜶ω𝜶,𝜸𝜸)f(𝜶))=𝜶A{𝜶}𝖳𝗋(𝝁𝜷i,𝜶f(𝜶))=𝖳𝗋(𝜶A{𝜶}𝝁𝜷i,𝜶f(𝜶)),\begin{split}\eta_{i}&=\sum_{\bm{\alpha}\in A\setminus\{\bm{\alpha}^{*}\}}\sum_{\bm{\gamma}\in Q_{\bm{\alpha}}}\omega_{\bm{\alpha},\bm{\gamma}}\mathsf{Tr}\big{(}\bm{\gamma}f(\bm{\alpha})\big{)}\\ &=\sum_{\bm{\alpha}\in A\setminus\{\bm{\alpha}^{*}\}}\mathsf{Tr}\big{(}\big{(}\sum_{\bm{\gamma}\in Q_{\bm{\alpha}}}\omega_{\bm{\alpha},\bm{\gamma}}\bm{\gamma}\big{)}f(\bm{\alpha})\big{)}\\ &=\sum_{\bm{\alpha}\in A\setminus\{\bm{\alpha}^{*}\}}\mathsf{Tr}\big{(}\bm{\mu}_{\bm{\beta}_{i},\bm{\alpha}}f(\bm{\alpha})\big{)}\\ &=\mathsf{Tr}\big{(}\sum_{\bm{\alpha}\in A\setminus\{\bm{\alpha}^{*}\}}\bm{\mu}_{\bm{\beta}_{i},\bm{\alpha}}f(\bm{\alpha})\big{)},\end{split}

for ω𝜶,𝜸𝔽q\omega_{\bm{\alpha},\bm{\gamma}}\in\mathbb{F}_{q}, and 𝝁𝜷i,𝜶=𝜸Q𝜶ω𝜶,𝜸𝜸𝗌𝗉𝖺𝗇𝔽q(Q𝜶)\bm{\mu}_{\bm{\beta}_{i},\bm{\alpha}}\stackrel{{\scriptstyle\mbox{\tiny$\triangle$}}}{{=}}\sum_{\bm{\gamma}\in Q_{\bm{\alpha}}}\omega_{\bm{\alpha},\bm{\gamma}}\bm{\gamma}\in\mathsf{span}_{\mathbb{F}_{q}}(Q_{\bm{\alpha}}). Furthermore, 𝖳𝗋(βif(𝜶))=𝖳𝗋(𝜷ij[]ηj𝝂j)=j[]ηj𝖳𝗋(𝜷i𝜸j)=ηi\mathsf{Tr}\big{(}\beta_{i}f(\bm{\alpha}^{*})\big{)}=\mathsf{Tr}\big{(}\bm{\beta}_{i}\sum_{j\in[\ell]}\eta_{j}\bm{\nu}_{j}\big{)}=\sum_{j\in[\ell]}\eta_{j}\mathsf{Tr}(\bm{\beta}_{i}\bm{\gamma}_{j})=\eta_{i}, i[s+1,]i\in[s+1,\ell]. Then, for i[s+1,]i\in[s+1,\ell],

𝖳𝗋(βif(𝜶))=𝖳𝗋(𝜶A{𝜶}𝝁𝜷i,𝜶f(𝜶)).\mathsf{Tr}\big{(}\beta_{i}f(\bm{\alpha}^{*})\big{)}=\mathsf{Tr}\big{(}\sum_{\bm{\alpha}\in A\setminus\{\bm{\alpha}^{*}\}}\bm{\mu}_{\bm{\beta}_{i},\bm{\alpha}}f(\bm{\alpha})\big{)}. (3)

The Equation 3 holds for all polynomials f𝔽q[x]f\in\mathbb{F}_{q^{\ell}}[x], deg(f)<k\deg(f)<k, then for all 𝜸𝔽q\bm{\gamma}\in\mathbb{F}_{q^{\ell}}^{*} and for all i[s+1,]i\in[s+1,\ell] it still holds, i.e., 𝖳𝗋(𝜸βjf(𝜶))=𝖳𝗋(𝜸𝜶A{𝜶}𝝁𝜷i,𝜶f(𝜶))\mathsf{Tr}\big{(}\bm{\gamma}\beta_{j}f(\bm{\alpha}^{*})\big{)}=\mathsf{Tr}\big{(}\bm{\gamma}\sum_{\bm{\alpha}\in A\setminus\{\bm{\alpha}^{*}\}}\bm{\mu}_{\bm{\beta}_{i},\bm{\alpha}}f(\bm{\alpha}^{*})\big{)}, which derives to βif(𝜶)=𝜶A{𝜶}𝝁𝜷i,𝜶f(𝜶)\beta_{i}f(\bm{\alpha}^{*})=\sum_{\bm{\alpha}\in A\setminus\{\bm{\alpha}^{*}\}}\bm{\mu}_{\bm{\beta}_{i},\bm{\alpha}}f(\bm{\alpha}). This completes the proof. ∎

Lemma 5 ensures for the existence of a linear algorithm to repair a failed node f(𝜶)f(\bm{\alpha}^{*}) once there exists a linear repair scheme by ensuring the existence of the set {𝜷i}i[s+1,]\{\bm{\beta}_{i}\}_{i\in[s+1,\ell]} and the elements 𝝁𝜷i,𝜶\bm{\mu}_{\bm{\beta}_{i},\bm{\alpha}}, for i[s+1,]i\in[s+1,\ell] and 𝜶A{𝜶}\bm{\alpha}\in A\setminus\{\bm{\alpha}^{*}\}. We now propose a linear algorithm to repair f(𝜶)f(\bm{\alpha}^{*}) of RS(A,k)(A,k) with side information S={𝜷i}i[s]S=\{\bm{\beta}_{i}\}_{i\in[s]}, which is modeled after the linear algorithm to repair RS(A,k)(A,k) in [15].  

Algorithm 1.

Linear repair with side information S={𝜷i}i[s]S=\{\bm{\beta}_{i}\}_{i\in[s]}.   Input: A set AA of evaluation points, a point 𝜶A\bm{\alpha}^{*}\in A of the failed node f(𝜶)f(\bm{\alpha}^{*}), the ss traces 𝖳𝗋(𝜷if(𝜶))\mathsf{Tr}\big{(}\bm{\beta}_{i}f(\bm{\alpha}^{*})\big{)} corresponding to the side information S={𝜷i}i[s]S=\{\bm{\beta}_{i}\}_{i\in[s]}, the access to linear queries of the form 𝖳𝗋(𝜸f(𝜶))\mathsf{Tr}(\bm{\gamma}f(\bm{\alpha})), for all 𝜶A{𝜶}\bm{\alpha}\in A\setminus\{\bm{\alpha}^{*}\}.
Output: the value f(𝜶)f(\bm{\alpha}^{*}).
Steps:

  1. 1.

    Choose a linearly independent set B={βi}i[s+1,]B=\{\beta_{i}\}_{i\in[s+1,\ell]}.

  2. 2.

    Choose elements 𝝁𝜷i,𝜶𝔽q\bm{\mu}_{\bm{\beta}_{i},\bm{\alpha}}\in\mathbb{F}_{q^{\ell}} for each pair of 𝜷iB\bm{\beta}_{i}\in B and 𝜶A{𝜶}\bm{\alpha}\in A\setminus\{\bm{\alpha}^{*}\} so that βif(𝜶)=𝜶A{𝜶}𝝁𝜷i,𝜶f(𝜶)\beta_{i}f(\bm{\alpha}^{*})=\sum_{\bm{\alpha}\in A\setminus\{\bm{\alpha}^{*}\}}\bm{\mu}_{\bm{\beta}_{i},\bm{\alpha}}f(\bm{\alpha}).

  3. 3.

    for 𝜷iB\bm{\beta}_{i}\in B do

  4. 4.

    Choose an arbitrary spanning set Q𝜶Q_{\bm{\alpha}} for the set {𝝁𝜷i,𝜶}i[s+1,]\{\bm{\mu}_{\bm{\beta}_{i},\bm{\alpha}}\}_{i\in[s+1,\ell]} and get the queries 𝖳𝗋(𝜸f(𝜶))\mathsf{Tr}\big{(}\bm{\gamma}f(\bm{\alpha})\big{)}, 𝜸Q𝜶\bm{\gamma}\in Q_{\bm{\alpha}}.

  5. 5.

    Compute 𝖳𝗋(𝝁𝜷i,𝜶f(𝜶))\mathsf{Tr}\big{(}\bm{\mu}_{\bm{\beta}_{i},\bm{\alpha}}f(\bm{\alpha})\big{)} for each 𝜶A{𝜶}\bm{\alpha}\in A\setminus\{\bm{\alpha}^{*}\} through the traces 𝖳𝗋(𝜸f(𝜶))\mathsf{Tr}(\bm{\gamma}f(\bm{\alpha}))’s.

  6. 6.

    Compute 𝖳𝗋(𝜷if(𝜶))=𝖳𝗋(𝜶A{𝜶}𝝁𝜷i,𝜶f(𝜶)),\mathsf{Tr}\big{(}\bm{\beta}_{i}f(\bm{\alpha}^{*})\big{)}=\mathsf{Tr}\big{(}\sum_{\bm{\alpha}\in A\setminus\{\bm{\alpha}^{*}\}}\bm{\mu}_{\bm{\beta}_{i},\bm{\alpha}}f(\bm{\alpha})\big{)}, i[s+1,]i\in[s+1,\ell], by taking the trace of both sides of the equation in Step 2.

  7. 7.

    end

  8. 8.

    Compute f(𝜶)f(\bm{\alpha}^{*}) from {𝖳𝗋(𝜷if(𝜶))}i[]\{\mathsf{Tr}\big{(}\bm{\beta}_{i}f(\bm{\alpha}^{*})\big{)}\}_{i\in[\ell]}:

    f(𝜶)=i[]𝖳𝗋(𝜷if(𝜶))𝝂i,f(\bm{\alpha}^{*})=\sum_{i\in[\ell]}\mathsf{Tr}\big{(}\bm{\beta}_{i}f(\bm{\alpha}^{*})\big{)}\bm{\nu}_{i},

    where {𝝂1,,𝝂}\{\bm{\nu}_{1},\dots,\bm{\nu}_{\ell}\} is the dual basis of {𝜷1,,𝜷}\{\bm{\beta}_{1},\dots,\bm{\beta}_{\ell}\}.

 

The following proof of Proposition 1 indicates that a linear repair scheme for a node with side information size ss of a code RS(n,k)(n,k) is equivalent to a set of s\ell-s polynomials of degree less than nkn-k.

Proof of Proposition 1.

Supposing that {𝝂i}i=1\{\bm{\nu}_{i}\}_{i=1}^{\ell} is the trace-dual basis of {𝜷i}i=1\{\bm{\beta}_{i}\}_{i=1}^{\ell}, where {bi}i[s]=S\{b_{i}\}_{i\in[s]}=S and {bi}i=s+1=T\{b_{i}\}_{i=s+1}^{\ell}=T. Supposing that f(𝜶)=i=1ηi𝝂i.f(\bm{\alpha}^{*})=\sum_{i=1}^{\ell}\eta_{i}\bm{\nu}_{i}. According Lemma 5, the work of defining f(𝜶)f(\bm{\alpha}^{*}) with side information SS is now the work of defining s\ell-s coefficients ηi\eta_{i}, i[s+1,]i\in[s+1,\ell], and 𝖳𝗋(𝜷if(𝜶))=ηi\mathsf{Tr}\big{(}\bm{\beta}_{i}f(\bm{\alpha}^{*})\big{)}=\eta_{i}. This means that to define ηi\eta_{i}, i[s+1,]i\in[s+1,\ell], it is enough to find 𝖳𝗋(𝜷if(𝜶))\mathsf{Tr}\big{(}\bm{\beta}_{i}f(\bm{\alpha}^{*})\big{)}, or 𝖳𝗋(gi(𝜶)f(𝜶))\mathsf{Tr}\big{(}g_{i}(\bm{\alpha}^{*})f(\bm{\alpha}^{*})\big{)}, i[s+1,]i\in[s+1,\ell]. Each polynomial gi(x)g_{i}(x), i[s+1,]i\in[s+1,\ell], of degree less than nkn-k corresponds to a dual codeword of the Reed-Solomon codes RS(A,k)(A,k), which returns j=1ngi(𝜶j)λjf(𝜶j)=0\sum_{j=1}^{n}g_{i}(\bm{\alpha}_{j})\lambda_{j}f(\bm{\alpha}_{j})=0, where λ=(λ1,,λn)𝔽qn\lambda=(\lambda_{1},\dots,\lambda_{n})\in\mathbb{F}_{q^{\ell}}^{n}. Then, gi(𝜶)λf(𝜶)=𝜶j𝜶gi(𝜶j)λjf(𝜶j)g_{i}(\bm{\alpha}^{*})\lambda^{*}f(\bm{\alpha}^{*})=-\sum_{\bm{\alpha}_{j}\neq\bm{\alpha}^{*}}g_{i}(\bm{\alpha}_{j})\lambda_{j}f(\bm{\alpha}_{j}), which is equivalent to gi(𝜶)f(𝜶)=𝜶j𝜶gi(𝜶j)λjλf(𝜶j)g_{i}(\bm{\alpha}^{*})f(\bm{\alpha}^{*})=-\sum_{\bm{\alpha}_{j}\neq\bm{\alpha}^{*}}g_{i}(\bm{\alpha}_{j})\frac{\lambda_{j}}{\lambda^{*}}f(\bm{\alpha}_{j}). Applying the trace function on two sides of this equation we get 𝖳𝗋(gi(𝜶)f(𝜶))=𝜶j𝜶𝖳𝗋(gi(𝜶j)λjλf(𝜶j))\mathsf{Tr}\big{(}g_{i}(\bm{\alpha}^{*})f(\bm{\alpha}^{*})\big{)}=-\sum_{\bm{\alpha}_{j}\neq\bm{\alpha}^{*}}\mathsf{Tr}\big{(}g_{i}(\bm{\alpha}_{j})\frac{\lambda_{j}}{\lambda^{*}}f(\bm{\alpha}_{j})\big{)}. In conclusion, each ηi\eta_{i}, i[s+1,]i\in[s+1,\ell], can be computed through the traces 𝖳𝗋(gi(𝜶j)λjλf(𝜶j)),𝜶j𝜶\mathsf{Tr}\big{(}g_{i}(\bm{\alpha}_{j})\frac{\lambda_{j}}{\lambda^{*}}f(\bm{\alpha}_{j})\big{)},\bm{\alpha}_{j}\neq\bm{\alpha}^{*}, which can totally be defined through the polynomials gi(x)g_{i}(x), i[s+1,]i\in[s+1,\ell]. ∎

V-B Proof of Proposition 3

Proof of Proposition 3.

According to Proposition 1, a linear repair scheme with side information size ss for a failed node f(𝜶)f(\bm{\alpha}^{*}) corresponds to a set of s\ell-s polynomials. Supposing that the repair scheme is the polynomial set {gi(x)}i[s+1,]\{g_{i}(x)\}_{i\in[s+1,\ell]}, where 𝗋𝖺𝗇𝗄𝔽q({gs+1(𝜶),,g(𝜶)})=s\mathsf{rank}_{\mathbb{F}_{q}}\big{(}\{g_{s+1}(\bm{\alpha}^{*}),\dots,g_{\ell}(\bm{\alpha}^{*})\}\big{)}=\ell-s and 𝗋𝖺𝗇𝗄𝔽q({gs+1(𝜶),,g(𝜶)})=b𝜶\mathsf{rank}_{\mathbb{F}_{q}}\big{(}\{g_{s+1}(\bm{\alpha}),\dots,g_{\ell}(\bm{\alpha})\}\big{)}=b_{\bm{\alpha}}, for all 𝜶A{𝜶}\bm{\alpha}\in A\setminus\{\bm{\alpha}^{*}\}. Therefore, the repair bandwidth of the repair scheme is b=𝜶A{𝜶}b𝜶b=\sum_{\bm{\alpha}\in A\setminus\{\bm{\alpha}^{*}\}}b_{\bm{\alpha}}. For each 𝜶A\bm{\alpha}\in A, let S𝜶={e=(es+1,,e)𝔽qs:i[s+1,]eigi(𝜶)=0}S_{\bm{\alpha}}\stackrel{{\scriptstyle\mbox{\tiny$\triangle$}}}{{=}}\{\vec{e}=(e_{s+1},\dots,e_{\ell})\in\mathbb{F}_{q}^{\ell-s}:\sum_{i\in[s+1,\ell]}e_{i}g_{i}(\bm{\alpha})=0\}. Since 𝗋𝖺𝗇𝗄𝔽q({gi(𝜶)}i[s+1,])=b𝜶\mathsf{rank}_{\mathbb{F}_{q}}\big{(}\{g_{i}(\bm{\alpha})\}_{i\in[s+1,\ell]}\big{)}=b_{\bm{\alpha}}, dim𝔽q(S𝜶)=sb𝜶\dim_{\mathbb{F}_{q}}(S_{\bm{\alpha}})=\ell-s-b_{\bm{\alpha}}. Averaging the size |{𝜶A{𝜶}:eS𝜶}||\{\bm{\alpha}\in A\setminus\{\bm{\alpha}^{*}\}:\vec{e}\in S_{\bm{\alpha}}\}| over all nonzero vectors e𝔽qs\vec{e}\in\mathbb{F}_{q}^{\ell-s}, we have

μ=1qs1e𝔽qs{0}{𝜶A{𝜶}:eS𝜶}=1qs1𝜶A{𝜶}{e𝔽qs{0}:eS𝜶}=1qs1𝜶A{𝜶}(qsb𝜶1).\begin{split}&\mu\stackrel{{\scriptstyle\mbox{\tiny$\triangle$}}}{{=}}\frac{1}{q^{\ell-s}-1}\sum_{\vec{e}\in\mathbb{F}_{q}^{\ell-s}\setminus\{\vec{0}\}}\mid\{\bm{\alpha}\in A\setminus\{\bm{\alpha}^{*}\}:\vec{e}\in S_{\bm{\alpha}}\}\mid\\ &=\frac{1}{q^{\ell-s}-1}\sum_{\bm{\alpha}\in A\setminus\{\bm{\alpha}^{*}\}}\mid\{\vec{e}\in\mathbb{F}_{q}^{\ell-s}\setminus\{\vec{0}\}:\vec{e}\in S_{\bm{\alpha}}\}\mid\\ &=\frac{1}{q^{\ell-s}-1}\sum_{\bm{\alpha}\in A\setminus\{\bm{\alpha}^{*}\}}\big{(}q^{\ell-s-b_{\bm{\alpha}}}-1\big{)}.\end{split}

Then, there exists some e=(es+1,,e)𝔽qs{0}\vec{e}^{*}=(e_{s+1}^{*},\dots,e_{\ell}^{*})\in\mathbb{F}_{q}^{\ell-s}\setminus\{0\} such that |{𝜶A{𝜶}:eS𝜶}|μ|\{\bm{\alpha}\in A\setminus\{\bm{\alpha}^{*}\}:\vec{e}^{*}\in S_{\bm{\alpha}}\}|\geq\mu. Let g(x)=i[s+1,]eigi(x)g(x)\stackrel{{\scriptstyle\mbox{\tiny$\triangle$}}}{{=}}\sum_{i\in[s+1,\ell]}e_{i}^{*}g_{i}(x), g(x)g(x) vanishes on at least μ\mu elements of A{𝜶}A\setminus\{\bm{\alpha}^{*}\}. Furthermore, it follows from {gi(𝜶)}i[s+1,]\{g_{i}(\bm{\alpha}^{*})\}_{i\in[s+1,\ell]} is linearly independent and e0\vec{e}^{*}\neq 0 that g(𝜶)0g(\bm{\alpha}^{*})\neq 0. Hence, g(x)g(x) corresponds to a nonzero dual codeword of RS(A,k)(A,k) and has at most r1r-1 roots, where r=nkr=n-k. Then, μr1\mu\leq r-1, which allows that

𝜶A{𝜶}qb𝜶(r1)(qs1)+n1qs.\sum_{\bm{\alpha}\in A\setminus\{\bm{\alpha}^{*}\}}q^{-b_{\bm{\alpha}}}\leq\frac{(r-1)(q^{\ell-s}-1)+n-1}{q^{\ell-s}}. (4)

Put

T=(r1)(qs1)+n1qs,b𝖠𝖵𝖤=logn1T.T\stackrel{{\scriptstyle\mbox{\tiny$\triangle$}}}{{=}}\frac{(r-1)(q^{\ell-s}-1)+n-1}{q^{\ell-s}},b_{\mathsf{AVE}}\stackrel{{\scriptstyle\mbox{\tiny$\triangle$}}}{{=}}\log\frac{n-1}{T}.

Let

bmin=minb𝜶{0,,s}𝜶A{𝜶}b𝜶b_{\min}\stackrel{{\scriptstyle\mbox{\tiny$\triangle$}}}{{=}}\min_{b_{\bm{\alpha}}\in\{0,\dots,\ell-s\}}\sum_{\bm{\alpha}\in A\setminus\{\bm{\alpha}^{*}\}}b_{\bm{\alpha}} (5)

subject to (4).

The minimum occurs when b𝜶b_{\bm{\alpha}}’s are balanced and equal to b𝖠𝖵𝖤b_{\mathsf{AVE}}. Supposing that tt is the biggest integer satisfying

b1==bt=b𝖠𝖵𝖤,bt+1==bn1=b𝖠𝖵𝖤,b_{1}^{*}=\dots=b_{t}^{*}=\lfloor b_{\mathsf{AVE}}\rfloor,b_{t+1}^{*}=\dots=b_{n-1}^{*}=\lceil b_{\mathsf{AVE}}\rceil,

where i[n1]qbiT\sum_{i\in[n-1]}q^{-b_{i}^{*}}\leq T, and (b1,,bn1)(b_{1}^{*},\dots,b_{n-1}^{*}) is an optimal solution for (5). To obtain this solution, the ````balancing"" procedure as in [31] is applied. The computation for tt is easily obtained. Then, we get the lower bound as desired. ∎

V-C A discussion on the subspace intersections with the lowest repair bandwidth

A condition for an mm-dimensional subspace 𝒲{\mathcal{W}} so that the repair scheme constructed from this subspace by Lemma 2 and Corollary 2 obtains the minimal repair bandwidth among all mm-dimensional 𝔽q\mathbb{F}_{q}-subspaces is that the sum 𝜸𝔽qdim(𝜸𝔽qs𝒲)\sum_{\bm{\gamma}\in\mathbb{F}_{q^{\ell}}^{*}}\dim(\bm{\gamma}\mathbb{F}_{q^{\ell-s}}\cap{\mathcal{W}}) achieves the maximal value among all mm-dimensional 𝔽q\mathbb{F}_{q}-subspaces. One concrete consideration for obtaining the maximal sum is the case when the intersection subspaces have dimension 0 or 11. More particularly, for a parameter mm, if there exists an mm-dimensional subspaces 𝒲0{\mathcal{W}}_{0} with dim(𝜸𝔽qs𝒲0){0,1}\dim(\bm{\gamma}\mathbb{F}_{q^{\ell-s}}\cap{\mathcal{W}}_{0})\in\{0,1\}, for all 𝜸𝔽q\bm{\gamma}\in\mathbb{F}_{q^{\ell}}^{*}, then a sufficient condition for an arbitrary mm-dimensional subspace 𝒲{\mathcal{W}} used to construct subspace polynomial repair scheme that obtains the lowest repair bandwidth, i.e., the sum 𝜶𝔽q{𝜶}dim((𝜶𝜶)𝜹𝔽qs𝒲)\sum_{\bm{\alpha}\in\mathbb{F}_{q^{\ell}}\setminus\{\bm{\alpha}^{*}\}}\dim\big{(}(\bm{\alpha}-\bm{\alpha}^{*})\bm{\delta}\mathbb{F}_{q^{\ell-s}}\cap{\mathcal{W}}\big{)} achieves maximal value among all subspaces dimension mm, is also the condition that dim(𝜸𝔽qs𝒲){0,1}\dim(\bm{\gamma}\mathbb{F}_{q^{\ell-s}}\cap{\mathcal{W}})\in\{0,1\}, for all 𝜸𝔽q\bm{\gamma}\in\mathbb{F}_{q^{\ell}}^{*}. Moreover, for the codes RS(n,k)(n,k), where nk=qmn-k=q^{m}, this condition is the necessary and sufficient condition for the repair scheme constructed by 𝒲{\mathcal{W}} obtaining the optimal repair bandwidth. The repair schemes constructed in Theorems 2 and Theorem 3 are of this consideration. We will make the above discussion clearer in Corollary 3. Since our proof for the conclusion of subspace intersection of dimensions 0 or 11 is based on the majorization of two real number sequences, we first recall some basic results on this problem. For two sequences of real numbers x=(x1,,xp)x=(x_{1},\dots,x_{p}) and x=(x1,,xp)x^{\prime}=(x^{\prime}_{1},\dots,x^{\prime}_{p}), supposing that x1xpx_{1}\geq\dots\geq x_{p} and x1xpx^{\prime}_{1}\geq\dots\geq x^{\prime}_{p}, we say that xx is majorized by xx^{\prime} or xx^{\prime} majorizes xx if i=1pxi=i=1pxi\sum_{i=1}^{p}x_{i}=\sum_{i=1}^{p}x^{\prime}_{i} and i=1jxii=1jxi\sum_{i=1}^{j}x_{i}\leq\sum_{i=1}^{j}x^{\prime}_{i}, for all j[p1]j\in[p-1] [38, A.1, p.8].

Lemma 6.

[38, B.1, p.156] The inequality i=1pϕ(xi)i=1pϕ(xi)\sum_{i=1}^{p}\phi(x_{i})\leq\sum_{i=1}^{p}\phi(x^{\prime}_{i}) is satisfied for all continuous convex function ϕ:\phi:\mathbb{R}\rightarrow\mathbb{R} if and only if xx is majorized by xx^{\prime}.

Now we have the condition to get maximal value for the sum of intersection dimensions. Supposing that p=q1qa1p\stackrel{{\scriptstyle\mbox{\tiny$\triangle$}}}{{=}}\frac{q^{\ell}-1}{q^{a}-1}, which is the number of disjoint cosets of 𝔽qa\mathbb{F}_{q^{a}}^{*} in 𝔽q\mathbb{F}_{q^{\ell}}^{*}. Since each pair of cosets are completely coincided or disjoint, and for all 𝜸𝜸𝔽qa,𝜸𝔽qa=𝜸𝔽qa\bm{\gamma}^{\prime}\in\bm{\gamma}\mathbb{F}_{q^{a}}^{*},\bm{\gamma}^{\prime}\mathbb{F}_{q^{a}}^{*}=\bm{\gamma}\mathbb{F}_{q^{a}}^{*}, we only need to consider the sums over pp disjoint cosets with the value of each dimension in each sum repeated |𝔽qa|=qa1|\mathbb{F}_{q^{a}}^{*}|=q^{a}-1 times. We have the following proposition.

Proposition 4.

Let 𝛄1𝔽qa,,𝛄p𝔽qa\bm{\gamma}_{1}\mathbb{F}_{q^{a}}^{*},\dots,\bm{\gamma}_{p}\mathbb{F}_{q^{a}}^{*} are pp disjoint cosets and the two sequences d=(d1,,dp)d\stackrel{{\scriptstyle\mbox{\tiny$\triangle$}}}{{=}}(d_{1},\dots,d_{p}), d=(d1,,dp)d^{\prime}\stackrel{{\scriptstyle\mbox{\tiny$\triangle$}}}{{=}}(d^{\prime}_{1},\dots,d^{\prime}_{p}) are the dimensions of the intersection of subspaces 𝛄i𝔽qa\bm{\gamma}_{i}\mathbb{F}_{q^{a}} of these cosets with 𝒲{\mathcal{W}} and 𝒲{\mathcal{W}}^{\prime}, respectively. Without lost of generality, we can suppose that d1dpd_{1}\geq\dots\geq d_{p} and d1dpd^{\prime}_{1}\geq\dots\geq d^{\prime}_{p}. Let x=(x1,,xp)x\stackrel{{\scriptstyle\mbox{\tiny$\triangle$}}}{{=}}(x_{1},\dots,x_{p}) and x=(x1,,xp)x^{\prime}\stackrel{{\scriptstyle\mbox{\tiny$\triangle$}}}{{=}}(x^{\prime}_{1},\dots,x^{\prime}_{p}), where xi=qdi1x_{i}\stackrel{{\scriptstyle\mbox{\tiny$\triangle$}}}{{=}}q^{d_{i}}-1 if di>0d_{i}>0 and xi=0x_{i}\stackrel{{\scriptstyle\mbox{\tiny$\triangle$}}}{{=}}0 if di=0d_{i}=0, and xi=qdi1x^{\prime}_{i}\stackrel{{\scriptstyle\mbox{\tiny$\triangle$}}}{{=}}q^{d^{\prime}_{i}}-1 if di>0d^{\prime}_{i}>0 and xi=0x^{\prime}_{i}\stackrel{{\scriptstyle\mbox{\tiny$\triangle$}}}{{=}}0 if di=0d^{\prime}_{i}=0. Then, if xx is majorized by xx^{\prime} then i=1pdii=1pdi\sum_{i=1}^{p}d_{i}\geq\sum_{i=1}^{p}d^{\prime}_{i}.

Proof.

Since xx is majorized by xx^{\prime} and di=ϕ(xi),di=ϕ(xi)d_{i}=-\phi(x_{i}),d^{\prime}_{i}=-\phi(x^{\prime}_{i}), where ϕ(t)=logq(t+1)\phi(t)\stackrel{{\scriptstyle\mbox{\tiny$\triangle$}}}{{=}}-\log_{q}(t+1) is a continuous convex function over [0,+)[0,+\infty). The proof is completed by applying Lemma 6 for xx and xx^{\prime} and the computation of did_{i} and did^{\prime}_{i} through function ϕ(t)\phi(t). ∎

Corollary 3.

Let a|a|\ell, 𝒲{\mathcal{W}} and 𝒲{\mathcal{W}}^{\prime} are two mm-dimensional subspaces of 𝔽q\mathbb{F}_{q^{\ell}} where dim(𝛄𝔽qa𝒲){0,1}\dim(\bm{\gamma}\mathbb{F}_{q^{a}}\cap{\mathcal{W}})\in\{0,1\}, for all 𝛄𝔽q\bm{\gamma}\in\mathbb{F}_{q^{\ell}}^{*}. Then, 𝛄𝔽qdim(𝛄𝔽qa𝒲)𝛄𝔽qdim(𝛄𝔽qa𝒲)\sum_{\bm{\gamma}\in\mathbb{F}_{q^{\ell}}^{*}}\dim(\bm{\gamma}\mathbb{F}_{q^{a}}\cap{\mathcal{W}})\geq\sum_{\bm{\gamma}\in\mathbb{F}_{q^{\ell}}^{*}}\dim(\bm{\gamma}\mathbb{F}_{q^{a}}\cap{\mathcal{W}}^{\prime}). Moreover, 𝛄𝔽qdim(𝛄𝔽qa𝒲)=(qa1)(qm1)q1\sum_{\bm{\gamma}\in\mathbb{F}_{q^{\ell}}^{*}}\dim(\bm{\gamma}\mathbb{F}_{q^{a}}\cap{\mathcal{W}})=\frac{(q^{a}-1)(q^{m}-1)}{q-1}, and if there exists 𝛄𝔽q\bm{\gamma}^{\prime}\in\mathbb{F}_{q^{\ell}}^{*} so that dim(𝛄𝔽qa𝒲)>1\dim(\bm{\gamma}^{\prime}\mathbb{F}_{q^{a}}\cap{\mathcal{W}}^{\prime})>1 then 𝛄𝔽qdim(𝛄𝔽qa𝒲)>𝛄𝔽qdim(𝛄𝔽qa𝒲)\sum_{\bm{\gamma}\in\mathbb{F}_{q^{\ell}}^{*}}\dim(\bm{\gamma}\mathbb{F}_{q^{a}}\cap{\mathcal{W}})>\sum_{\bm{\gamma}\in\mathbb{F}_{q^{\ell}}^{*}}\dim(\bm{\gamma}\mathbb{F}_{q^{a}}\cap{\mathcal{W}}^{\prime}).

Proof.

The proof is completed by applying Proposition 4 and Lemma 6 for two sequences xx, xx^{\prime} in the special case where di{0,1}d_{i}\in\{0,1\}, for all i[p]i\in[p]. When all the intersections 𝜸𝔽qa𝒲\bm{\gamma}\mathbb{F}_{q^{a}}^{*}\cap{\mathcal{W}} is of dimension 0 or 11, the set of 11-dimensional intersections is a partition of 𝒲{\mathcal{W}}, which allows that the number of these subspaces is qm1q1\frac{q^{m}-1}{q-1}. Since each of the intersection is repeated qa1q^{a}-1 times, the total of dimensions is (qa1)(qm1)q1\frac{(q^{a}-1)(q^{m}-1)}{q-1}. If there exists dj>1d^{\prime}_{j}>1, for some j[p]j\in[p], then the strict inequality is achieved. ∎