Repairing Reed-Solomon Codes
with Side Information
Abstract
We generalize the problem of recovering a lost/erased symbol in a Reed-Solomon code to the scenario in which some side information about the lost symbol is known. The side information is represented as a set of linearly independent combinations of the sub-symbols of the lost symbol. When , this reduces to the standard problem of repairing a single codeword symbol. When is a set of sub-symbols of the erased one, this becomes the repair problem with partially lost/erased symbol. We first establish that the minimum repair bandwidth depends on and not the content of and construct a lower bound on the repair bandwidth of a linear repair scheme with side information . We then consider the well-known subspace-polynomial repair schemes and show that their repair bandwidths can be optimized by choosing the right subspaces. Finally, we demonstrate several parameter regimes where the optimal bandwidths can be achieved for full-length Reed-Solomon codes.
I Introduction
To prevent data loss and increase data availability in distributed storage systems, a file is usually split into data chunks and transformed (encoded) into coded chunks using an erasure code, and then stored across different storage nodes. If the code is MDS [1], such a system can withstand any failures because the entire file can be recovered from any chunks. When only one node fails, which is usually the most typical case (see, e.g. [2]), a repair/replacement node must download enough data from other helper nodes to recover its lost chunk. In the repair-bandwidth problem [3, 4], one seeks to minimize the repair bandwidth, i.e. the amount of downloaded data required for a successful recovery of the lost chunk. A low-bandwidth repair scheme can also be used for degraded read, in which requests for a chunk stored at an unavailable node can be served by other available nodes [5]. This important problem has been extensively studied in the literature (see, e.g. [6] and the references therein).
In this work we generalize the setting of the repair-bandwidth problem to accommodate side information (see Fig. 1 for a toy example). In information theory, the concept of side information has been investigated in numerous contexts, including source coding [7], channel coding [8], list decoding [9], index coding [10, 11], and private information retrieval [12]. In the context of the repair problem, side information refers to the additional information that the repair node knows about the lost chunk while recovering it. The side information could arise due to a partial loss of data, which means that part of the chunk is still accessible and serves as side information, or due to partial information gained from the previous communications or from other sources. The question of interest is that given the side information, what is the lowest repair bandwidth we can achieve. We refer to this as the repair-bandwidth with side information problem.

In the scope of this work, we focus on Reed-Solomon codes, which is currently the most widely used families of erasure codes in distributed storage systems (see [13]). The repair bandwidth as well as the closely related metric called I/O cost and sub-packetization size have been investigated in a number of recent works for different families of Reed-Solomon codes [14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 13, 33, 34, 35, 36, 37]. For an Reed-Solomon code over the finite field , a coded chunk, which is an element in (called a symbol), is repaired from a number of -elements (called sub-symbols) extracted from other coded chunks. Each symbol consists of sub-symbols in , i.e. . The repair bandwidth is measured in the number of extracted sub-symbols, while the side information of a symbol can be represented as a set of -linearly independent combinations of its sub-symbols . In summary, our contributions are given below.
-
•
We show that the minimum repair bandwidth for a codeword symbol of a Reed-Solomon code given the side information depends on it size , and independent of the specific choice of its elements.
-
•
We obtain a lower bound on the repair bandwidth of a linear repair scheme for a failed node with side information.
- •
-
•
A subspace that minimizes the repair bandwidth among all subspace-polynomial repair schemes can be found by solving an optimization problem on subspace intersections, which is of its own interest. We solve the problem in a few parameter regimes, leaving others for future research.
II Preliminaries
II-A Definitions and Notations
Let be a prime power, be the finite field of elements and be the extension field of degree of . We use to denote the set , to denote that divides , and for , for . For a set , let , and . We use to denote the -subspace of spanned by a subset . We use and (or and for short) for the dimension of a subspace and the rank of a set of vectors over . The (field) trace of an element over is . We also write instead of for simplicity.
Let be a linear code over . Then is an -dimensional -subspace of . A codeword of is an element and its codeword symbols are the components , . The dual code of a code is the orthogonal complement of , , where is the scalar product of and . The code is an -subspace of with dimension . The elements of are called dual codewords. The number is called the redundancy of the code .
Definition 1.
Let be a subset of size in . A Reed-Solomon code of dimension with evaluation points is defined as
where is the ring of polynomials over . We also use the notation RS, ignoring the evaluation points.
A generalized Reed-Solomon code, , where , is the set of codewords , where for all , . The dual code of a Reed-Solomon code is a generalized Reed-Solomon code , for some multiplier vector ([1, Chap. 10]). We sometimes use the notation GRS, ignoring and .
Let be a polynomial corresponding to a codeword of the Reed-Solomon code , and be a polynomial of degree at most , which corresponds to a codeword of the dual code . Then and we call the polynomial a check polynomial for .
II-B Trace Repair Method
Let RS be a Reed-Solomon code over with evaluation points , a codeword corresponding to polynomial , , and is a codeword symbol/node of , where . A (linear) trace repair scheme for corresponds to a set of check polynomials , , that satisfies the Full-Rank Condition: . The repair bandwidth of such a repair scheme (in -symbols) is . To repair all components of , we need such repair schemes (possibly with repetition). See, e.g. [31], for a detailed explanation of why the above scheme works with an example.
III Recovering an Erased Symbol with Side Information
III-A The Problem Description
Let be an RS code over with evaluation points . Suppose that the codeword symbol is erased and needs to be recovered, given a set of -linearly independent combinations of its sub-symbols (elements of ) as side information. Note that for each vector of coefficients , there exists a such that the equality holds for all . Therefore, we can represent the side information as a set , where , and assume that is -linearly independent. We assume that the replacement/recovery node already knows traces , where is -linearly independent. Equivalently, we can represent the side information as a subspace . We call the side information set and the side information subspace. Note that is a basis of and .
According to the trace-repair method, it needs to reconstruct some traces of , namely, , referred to as target traces, where is an -basis of . We refer to as the target set and the target subspace with respect to the side information set (or the side information subspace ). We capture this discussion in Proposition 1. Its proof is similar to [16, Thm. 4] and can be found in Appendix V-A.
Proposition 1.
Let be a linearly independent set and be a symbol of Reed-Solomon code RS over , . A linear repair scheme for with side information corresponds to a collection of polynomials , where , and are linearly independent.
III-B Optimal Repair Bandwidths Only Depend on the Side Information Set Size
We demonstrate below that the optimal repair bandwidth for recovering an erasure depends on but not on the specific choice of . We first need an auxiliary lemma.
Lemma 1.
Given two -subspaces and of of dimensions and , respectively. Then there exists an element such that , where . Equivalently, given two -linearly independent subsets and of , where , there exists such that , where , forms an -basis of .
Proof.
For each , as , there are exactly such so that (as ). Let , then We have
for . Hence, there exists a , , satisfying that for every . Thus, or forms a basis of as desired. ∎
Proposition 2.
The minimum repair bandwidth for a codeword symbol of an RS over given the side information depends on but not on the specific choice of its elements.
Proof.
Let and be two different sets of side information for repairing the same codeword symbol . It suffices to show that for every repair scheme for with side information , there exists a repair scheme for with side information achieving the same repair bandwidth. To this end, let corresponds to the repair scheme with side information , i.e. , and form an -basis of . According to Lemma 1, there exists such that is linearly independent. Therefore, the polynomials for all form a repair scheme for with side information and with the same repair bandwidth as ’s. ∎
III-C A Lower Bound on the Bandwidth with Side Information
We provide a lower bound for the repair bandwidth with side information for one erasure in a Reed-Solomon code. The lower bound is similar to those in [15, 16, 17, 31], replacing by at some places. When , it reduces to the existing bound (no side information). Its proof can be found in Appendix V-B.
Proposition 3.
Any linear repair scheme with side information size for Reed-Solomon code RS over requires a repair bandwidth of at least
sub-symbols over , where , , and are defined as , , and if , otherwise.
In some special cases, the lower bound in Proposition 3 can be explicitly computed.
Corollary 1.
Consider a full-length RS code over with for some . Assume that and . Then every linear repair scheme with side information set size requires a repair bandwidth of at least sub-symbols in .
Proof.
With and , we have
and
Next, we show that . Indeed, the second inequality is obvious because . For the first inequality, we need to show that
which is equivalent to
which is true because
noting that either the first or the second inequality must be strict: if (so that the second inequality becomes equality) then the first inequality is strict since . Thus, and and . Plugging this in the formula for in Proposition 3 we obtain
Finally, using Proposition 3 we obtain a lower bound of
sub-symbols over on the repair bandwidth as claimed. ∎
III-D Optimal Subspace-Polynomial-Based Repair Schemes
In this section we investigate the repair bandwidth incurred by the subspace-polynomial repair scheme introduced in [17, 31], which generalizes the trace-polynomial-based scheme in [15, 16], under the new assumption of side information. We show that in contrast to the standard repair problem (with no side information), the repair bandwidth of such a scheme depends on the specific choice of the subspace. In particular, we transform the problem of finding subspace-polynomial repair schemes with minimum bandwidths possible into another one on subspace intersection, which on its own is an intriguing problem.
Before presenting Theorem 1, we note that given side information set , to construct a subspace-polynomial repair scheme, one first needs to find a target set such that forms an -basis of (see Proposition 1 and the discussion preceding it). Next, given that , for some , one picks an -dimensional subspace of , and form the check polynomials , . Note that is the subspace polynomial which is a linearly map constructed over and . The check polynomials are then used in the trace repair scheme.
Lemma 2.
Consider an RS with evaluation points satisfying , . Consider also a repair scheme with side information of size , that consists of polynomials , where and is a target set. This scheme has bandwidth (with helper nodes)
sub-symbols in , where .
Proof.
The node storing computes traces , . However, due to the linearity of trace, it only needs to send traces. To compute this rank, let and defined as for every . Then and . Using the rank-nullity theorem, we obtain
Summing this over all completes the lemma. ∎
Given the side information subspace , when constructing a subspace-polynomial repair scheme, we have the freedom in selecting relevant target subspace and . Hence, one can optimize the bandwidth over such and . This is in stark contrast to the case of standard repair without side information, in which any subspace would lead to the same repair bandwidth [17, 31]. Theorem 1 formalizes this fact.
Theorem 1.
Consider an RS, , , and , . Then the minimum bandwidth that a subspace-polynomial repair scheme for () can achieve, given the side information subspace , , is
where the is taken over all -subspaces and of with , , and .
Note that Theorem 1 converts the repair bandwidth with side information problem restricted to subspace-polynomial repair scheme to a pure subspace-intersection problem stated below.
(Subspace-Intersection Problem) Given and an -dimensional subspace of , find and that maximizes the sum among all -dimensional subspaces and -dimensional subspaces satisfying .
The subspace-intersection problem can be tricky to solve, especially for general . Therefore, we limit ourselves to the more tractable case when , for which optimal repair bandwidths were known for the standard repair setting (without side information) when [17, 31]. We also assume that . With these assumptions, in Corollary 2, we can replace by in the optimization problem. Note that while may not be a valid (i.e ), by Lemma 1, at least one of its cosets is. Finally, although this corollary provides an upper bound instead of an exact formulation for the bandwidth, later on, using the lower bound in Proposition 3, we can establish optimal repair bandwidths for subspace polynomial schemes in some parameter regimes.
Corollary 2.
Consider a full-length RS with evaluation points , where , . Let be a side information set with and . Then there exists a subspace-polynomial repair scheme for (), given the side information set , with repair bandwidth
(1) |
where the is taken over all -dimensional -subspaces of .
Proof.
Let be the side information subspace. By Theorem 1, the minimum bandwidth achieved by a subspace-polynomial repair scheme with side information subspace is
(2) |
where the is taken over all -subspaces and of with , , and . To prove the existence of a repair scheme with bandwidth given by (1), we show that a (multiplicative) coset of can be a (valid) target subspace. Indeed, according to Lemma 1, there exists such that , i.e. is a valid target subspace w.r.t. the side information subspace . Hence, setting in (2) and using the assumption that , there exists a subspace-polynomial repair scheme given the side information subspace that achieves the bandwidth
which is the same as (1) when replacing by , noting that . ∎
Using Corollary 2, assuming a full-length code with , we can now construct a few concrete subspace-polynomial repair schemes that achieve optimal repair bandwidths among all linear schemes. To construct the first repair scheme achieving optimal repair bandwidth, we first prove an auxiliary lemma.
Lemma 3.
For every , , and , and for every , it holds that .
Proof.
Note that and , where is a primitive element of . To show that , it suffices to show that for any , it holds that . Indeed, for such and , there exist , , , and such that
which implies that
The proof follows. ∎
The following theorem indicates the existence of optimal repair schemes for a full-length Reed-Solomon codes with side information size , where . We prove that the existing subspace repair scheme can be constructed from a subfield of , where , , , and . However, for any coset of , the proof is still right.
Theorem 2.
Consider a full-length Reed-Solomon codes RS over with for some . If , , and , then there exists a linear repair scheme with side information of size that uses the repair bandwidth of sub-symbols in . The scheme is optimal when .
Proof.
Set , and let be a basis of . We now consider the subspace repair scheme constructed from . The statement that the achieved bandwidth is optimal among all linear repair schemes is obvious due to Corollary 1, noting that the assumptions , , and imply . It remains to show that the stated scheme has the stated bandwidth. Indeed, from Lemma 2, it is sufficient to prove that . By Lemma 3, we note that to get the sum , we compute the number of elements so that . As the set of -dimensional intersections is a partition of into -dimensional subspaces, there are such subspaces. Moreover, each of them is repeated times, since for all . Thus, the number of elements with is , which completes the proof. ∎
Now we consider a greedy construction that generates -dimensional subspaces , , that generate subspace-polynomial repair schemes with minimal repair bandwidths. Assume that . The aim is to construct a subspace of satisfying for all .
Lemma 4.
Assume that and that . Then there exists an -dimensional subspace satisfying that for all .
Proof.
We will construct in a greedy manner a set that satisfies two properties given below.
-
•
(P1) is -linearly independent, and
-
•
(P2) the subspace satisfies that for all
The first element can be picked arbitrarily in because satisfies (P1) and (P2) obviously. Assume that we have already had a set that satisfies (P1) and (P2). We now show that we can find so that satisfies (P1) and (P2) given that and . Consider the set
Claim 1: .
Claim 2: Any satisfies (P1)-(P2).
Proof of Claim 1.
Note that gives . Since and for and , to count the elements of corresponding to and , we only need to consider values for each and . Moreover, we can ignore the case and or vice versa as the resulting elements are already counted for and when setting either or , respectively. Thus, other than , has at most other elements, where the binomial factor counts the number of distinct pairs . Thus, elements. ∎
Proof of Claim 2.
Since , , which implies (P1). Assume, for the sake of contradiction, that for some . Then there exist , so that either a) and , or b) and . If a) occurs, then there exist , , , , so that and , which implies that , which contradicts our assumption. The case b) can be treated similarly. ∎
The proof of Lemma 4 follows from these two claims. Indeed, by Claim 1, there exists at least one element in , which is the desired according to Claim 2. ∎
Theorem 3.
Consider a full-length Reed-Solomon codes RS over with for some . If and , then there exists a linear repair scheme with side information of size that uses the repair bandwidth of sub-symbols in . The scheme is optimal when .
III-E Bandwidth Reductions Given Side Information
To illustrate the repair bandwidth reduction in the presence of side information, we consider as an example the parameter regime assumed in Theorem 2. Case 1: and for some constant . Theorem 2 gives a repair bandwidth with side information . The optimal repair bandwidth with no side information is (see [17, 31]). Clearly, . Case 2: , i.e. , and , for some constants . Then , whereas . Clearly, as .
IV Conclusions
We proposed the problem of repairing a single erasure of Reed-Solomon codes with side information, which generalizes the standard repair problem, and established a lower bound on the repair bandwidth of a linear repair scheme. The problem of constructing optimal subspace-polynomial repair schemes can be reduced to a subspace intersection problem, which is interesting in its own right. We settled this problem for a few parameter regimes, leaving the general case open for future research.
Acknowledgement
The work of Son Hoang Dau was supported by the Australia Research Council DECRA Grant DE180100768 and DP Grant DP200100731. The work of Han Mao Kiah was supported by the Ministry of Education, Singapore, under its MOE AcRF Tier 2 Award under Grant MOE-T2EP20121-0007 and MOE AcRF Tier 1 Award under Grant RG19/23. The work of Stanislav Kruglik was supported by the Ministry of Education, Singapore, under its MOE AcRF Tier 2 Award under Grant MOE-T2EP20121-0007.
References
- [1] F. J. MacWilliams and N. J. A. Sloane, The Theory of Error-Correcting Codes. Amsterdam: North-Holland, 1977.
- [2] K. V. Rashmi, N. B. Shah, D. Gu, H. Kuang, D. Borthakur, and K. Ramchandran, “A solution to the network challenges of data recovery in erasure-coded distributed storage systems: A study on the Facebook warehouse cluster,” in Proc. USENIX Conf. Hot Topics Storage File Syst. (HotStorage), 2013, pp. 8–8.
- [3] A. Dimakis, P. Godfrey, Y. Wu, M. Wainwright, and K. Ramchandran, “Network coding for distributed storage systems,” IEEE Trans. Inform. Theory, vol. 56, no. 9, pp. 4539–4551, 2010.
- [4] A. Dimakis, K. Ramchandran, Y. Wu, and C. Suh, “A survey on network codes for distributed storage,” Proc. IEEE, vol. 99, no. 3, pp. 476–489, 2011.
- [5] O. Khan, R. Burns, J. Plank, W. Pierce, and C. Huang, “Rethinking erasure codes for cloud file systems: Minimizing i/o for recovery and degraded reads,” in Proc. 13th USENIX Conf. File Storage Technol. (FAST), 2012.
- [6] S. B. Balaji, M. N. Krishnan, M. Vajha, V. Ramkumar, B. Sasidharan, and P. V. Kumar, “Erasure coding for distributed storage: an overview,” Science China Information Sciences, vol. 61, no. 10, 2018.
- [7] A. Wyner, “On source coding with side information at the decoder,” IEEE Trans. Inform. Theory, vol. 21, no. 3, pp. 294–300, 1975.
- [8] G. Keshet, Y. Steinberg, and N. Merhav, Channel Coding in the Presence of Side Information, ser. Foundations and Trends in Communication and Information Theory, 2008.
- [9] V. Guruswami, “List decoding with side information,” in Proc. IEEE Annual Conference on Computational Complexity, 2003, pp. 300–309.
- [10] Y. Birk and T. Kol, “Coding on demand by an informed source (ISCOD) for efficient broadcast of different supplemental data to caching clients,” IEEE Trans. Inform. Theory, vol. 52, no. 6, pp. 2825–2830, 2006.
- [11] Z. Bar-Yossef, Y. Birk, T. S. Jayram, and T. Kol, “Index coding with side information,” IEEE Trans. Inform. Theory, vol. 57, no. 3, pp. 1479–1494, 2011.
- [12] S. Kadhe, B. Garcia, A. Heidarzadeh, S. E. Rouayheb, and A. Sprintson, “Private information retrieval with side information,” IEEE Trans. Inform. Theory, vol. 66, no. 4, pp. 2032–2043, 2020.
- [13] T. X. Dinh, L. Y. Nhi Nguyen, L. J. Mohan, S. Boztas, T. T. Luong, and S. H. Dau, “Practical considerations in repairing Reed-Solomon codes,” in Proc. IEEE Int. Symp. Inform. Theory (ISIT), 2022, pp. 2607–2612.
- [14] K. Shanmugam, D. S. Papailiopoulos, A. G. Dimakis, and G. Caire, “A repair framework for scalar MDS codes,” IEEE J. Selected Areas Comm. (JSAC), vol. 32, no. 5, pp. 998–1007, 2014.
- [15] V. Guruswami and M. Wootters, “Repairing Reed-Solomon codes,” in Proc. Annu. Symp. Theory Comput. (STOC), 2016.
- [16] ——, “Repairing Reed-Solomon codes,” IEEE Trans. Inform. Theory, vol. 63, no. 9, pp. 5684–5698, 2017.
- [17] S. H. Dau and O. Milenkovic, “Optimal repair schemes for some families of Reed-Solomon codes,” in Proc. IEEE Int. Symp. Inform. Theory (ISIT), 2017, pp. 346–350.
- [18] S. H. Dau, I. Duursma, H. M. Kiah, and O. Milenkovic, “Repairing Reed-Solomon codes with two erasures,” in Proc. IEEE Int. Symp. Inform. Theory (ISIT), 2017, pp. 351–355.
- [19] ——, “Repairing Reed-Solomon codes with multiple erasures,” IEEE Trans. Inform. Theory, vol. 54, no. 10, pp. 6567–6582, 2018.
- [20] M. Ye and A. Barg, “Explicit constructions of high-rate MDS array codes with optimal repair bandwidth,” IEEE Trans. Inform. Theory, vol. 63, no. 4, pp. 2001–2014, 2017.
- [21] W. Li, Z. Wang, and H. Jafarkhani, “A tradeoff between the sub-packetization size and the repair bandwidth for Reed-Solomon code,” in Proc. 55th Annual Allerton Conf. Comm. Control Comput. (Allerton), 2017, pp. 942–949.
- [22] ——, “On the sub-packetization size and the repair bandwidth of Reed-Solomon codes,” IEEE Trans. Inform. Theory, vol. 65, no. 9, pp. 5484–5502, 2019.
- [23] ——, “Repairing Reed-Solomon Codes Over ,” IEEE Comm. Lett., vol. 24, no. 1, pp. 34–37, 2020.
- [24] A. Chowdhury and A. Vardy, “Improved schemes for asymptotically optimal repair of MDS codes,” in Proc. 55th Annual Allerton Conf. Comm Control Comput. (Allerton), 2017.
- [25] ——, “Improved schemes for asymptotically optimal repair of MDS codes,” IEEE Trans. Inform. Theory, vol. 67, no. 8, pp. 5051–5068, 2021.
- [26] I. Tamo, M. Ye, and A. Barg, “Optimal repair of Reed-Solomon codes: Achieving the cut-set bound,” in Proc. 58th Annual IEEE Symp. Foundations Computer Sci. (FOCS), 2017.
- [27] ——, “The repair problem for Reed-Solomon codes: Optimal repair of single and multiple erasures with almost optimal node size,” IEEE Trans. Inform. Theory, vol. 65, no. 5, pp. 2673–2695, 2018.
- [28] S. H. Dau and E. Viterbo, “Repair schemes with optimal I/O costs for full-length Reed-Solomon codes with two parities,” in Proc. IEEE Inform. Theory Workshop (ITW), 2018, pp. 590–594.
- [29] S. H. Dau, I. Duursma, and H. Chu, “On the I/O costs of some repair schemes for full-length Reed-Solomon codes,” in Proc. IEEE Int. Symp. Inform. Theory (ISIT), 2018, pp. 1700–1704.
- [30] I. Duursma and S. H. Dau, “Low bandwidth repair of the RS(10,4) Reed-Solomon code,” in Proc. Inform. Theory Applicat. Workshop (ITA), 2017.
- [31] S. H. Dau, T. X. Dinh, H. M. Kiah, T. T. Luong, and O. Milenkovic, “Repairing Reed-Solomon codes via subspace polynomials,” IEEE Trans. Inform. Theory, vol. 67, no. 10, pp. 6395–6407, 2021.
- [32] W. Li, H. Dau, Z. Wang, H. Jafarkhani, and E. Viterbo, “On the I/O costs in repairing short-length Reed-Solomon codes,” in Proc. IEEE Int. Symp. Inform. Theory (ISIT), 2019, pp. 1087–1091.
- [33] T. X. Dinh, S. Boztas, S. H. Dau, and E. Viterbo, “Designing compact repair groups for Reed-Solomon codes,” in Proc. IEEE Int. Symp. Inform. Theory (ISIT), 2023, pp. 2027–2032.
- [34] J. Xu, Y. Zhang, K. Wang, and Z. Zhang, “Cooperative repair of Reed-Solomon codes via linearized permutation polynomials,” IEEE Trans. Inform. Theory, pp. 1–11, 2023.
- [35] A. Berman, S. Buzaglo, A. Dor, Y. Shany, and I. Tamo, “Repairing Reed–Solomon codes evaluated on subspaces,” in Proc. IEEE Int. Symp. Inform. Theory (ISIT), 2021, pp. 867–871.
- [36] R. Con and I. Tamo, “Nonlinear repair of Reed-Solomon codes,” IEEE Trans. Inform. Theory, vol. 68, no. 8, pp. 5165–5177, 2022.
- [37] R. Con, N. Shutty, I. Tamo, and M. Wootters, “Repairing Reed-Solomon codes over prime fields via exponential sums,” in Proc. IEEE Int. Symp. Inform. Theory (ISIT), 2023, pp. 1330–1335.
- [38] A. W. Marshall, I. Olkin, and B. C. Arnold, Inequalities: Theory of Majorization and Its Applications, ser. Springer Series in Statistics, 2011.
V Appendix
V-A Proof of Proposition 1
The first part of this appendix is devoted for the discussion on the definition and the existence of a linear repair scheme for a failed node with side information of size of Reed-Solomon code RS. Similar to an (exact) linear repair scheme for a failed node with standard repair, a linear repair scheme for a node with side information of size is described by elements ’s used in each trace along with a linear algorithm.
We first propose the definition of a linear repair scheme with side information of size which is modeled after the definition of a linear repair scheme with the standard repair in [15].
Definition 2.
A linear repair scheme with side information for a symbol of Reed-Solomon code RS with evaluation point set , , over the coding field and the base field consists of
a set , for each , and
coefficients , where ’s are -linear coefficients of the queries
so that there is a linear reconstruction algorithm that computes
,
where , , which are already known from the side information , and is an -basis of .
The repair bandwidth is the total number of sub-symbols in returned by each node , i.e.,
.
Lemma 5.
Suppose there is a linear repair scheme for repairing of RS with side information given by a set and a linear algorithm as in Definition 2. Then, there is an -linearly independent set so that there are elements satisfying , for all of degree less than , where .
Proof.
Suppose is an -linearly independent set of so that is a basis of and is its trace dual basis. According to Definition 2, the linear repair algorithm computes coefficients so that , where for , and ’s, , are -linear functions of the queries in , i.e., for ,
for , and . Furthermore, , . Then, for ,
(3) |
The Equation 3 holds for all polynomials , , then for all and for all it still holds, i.e., , which derives to . This completes the proof. ∎
Lemma 5 ensures for the existence of a linear algorithm to repair a failed node once there exists a linear repair scheme by ensuring the existence of the set and the elements , for and . We now propose a linear algorithm to repair of RS with side information , which is modeled after the linear algorithm to repair RS in [15].
Algorithm 1.
Linear repair with side information .
Input: A set of evaluation points, a point of the failed node , the traces corresponding to the side information , the access to linear queries of the form , for all .
Output: the value .
Steps:
-
1.
Choose a linearly independent set .
-
2.
Choose elements for each pair of and so that .
-
3.
for do
-
4.
Choose an arbitrary spanning set for the set and get the queries , .
-
5.
Compute for each through the traces ’s.
-
6.
Compute , by taking the trace of both sides of the equation in Step 2.
-
7.
end
-
8.
Compute from :
where is the dual basis of .
The following proof of Proposition 1 indicates that a linear repair scheme for a node with side information size of a code RS is equivalent to a set of polynomials of degree less than .
Proof of Proposition 1.
Supposing that is the trace-dual basis of , where and . Supposing that According Lemma 5, the work of defining with side information is now the work of defining coefficients , , and . This means that to define , , it is enough to find , or , . Each polynomial , , of degree less than corresponds to a dual codeword of the Reed-Solomon codes RS, which returns , where . Then, , which is equivalent to . Applying the trace function on two sides of this equation we get . In conclusion, each , , can be computed through the traces , which can totally be defined through the polynomials , . ∎
V-B Proof of Proposition 3
Proof of Proposition 3.
According to Proposition 1, a linear repair scheme with side information size for a failed node corresponds to a set of polynomials. Supposing that the repair scheme is the polynomial set , where and , for all . Therefore, the repair bandwidth of the repair scheme is . For each , let . Since , . Averaging the size over all nonzero vectors , we have
Then, there exists some such that . Let , vanishes on at least elements of . Furthermore, it follows from is linearly independent and that . Hence, corresponds to a nonzero dual codeword of RS and has at most roots, where . Then, , which allows that
(4) |
Put
The minimum occurs when ’s are balanced and equal to . Supposing that is the biggest integer satisfying
where , and is an optimal solution for (5). To obtain this solution, the balancing procedure as in [31] is applied. The computation for is easily obtained. Then, we get the lower bound as desired. ∎
V-C A discussion on the subspace intersections with the lowest repair bandwidth
A condition for an -dimensional subspace so that the repair scheme constructed from this subspace by Lemma 2 and Corollary 2 obtains the minimal repair bandwidth among all -dimensional -subspaces is that the sum achieves the maximal value among all -dimensional -subspaces. One concrete consideration for obtaining the maximal sum is the case when the intersection subspaces have dimension or . More particularly, for a parameter , if there exists an -dimensional subspaces with , for all , then a sufficient condition for an arbitrary -dimensional subspace used to construct subspace polynomial repair scheme that obtains the lowest repair bandwidth, i.e., the sum achieves maximal value among all subspaces dimension , is also the condition that , for all . Moreover, for the codes RS, where , this condition is the necessary and sufficient condition for the repair scheme constructed by obtaining the optimal repair bandwidth. The repair schemes constructed in Theorems 2 and Theorem 3 are of this consideration. We will make the above discussion clearer in Corollary 3. Since our proof for the conclusion of subspace intersection of dimensions or is based on the majorization of two real number sequences, we first recall some basic results on this problem. For two sequences of real numbers and , supposing that and , we say that is majorized by or majorizes if and , for all [38, A.1, p.8].
Lemma 6.
[38, B.1, p.156] The inequality is satisfied for all continuous convex function if and only if is majorized by .
Now we have the condition to get maximal value for the sum of intersection dimensions. Supposing that , which is the number of disjoint cosets of in . Since each pair of cosets are completely coincided or disjoint, and for all , we only need to consider the sums over disjoint cosets with the value of each dimension in each sum repeated times. We have the following proposition.
Proposition 4.
Let are disjoint cosets and the two sequences , are the dimensions of the intersection of subspaces of these cosets with and , respectively. Without lost of generality, we can suppose that and . Let and , where if and if , and if and if . Then, if is majorized by then .
Proof.
Since is majorized by and , where is a continuous convex function over . The proof is completed by applying Lemma 6 for and and the computation of and through function . ∎
Corollary 3.
Let , and are two -dimensional subspaces of where , for all . Then, . Moreover, , and if there exists so that then .
Proof.
The proof is completed by applying Proposition 4 and Lemma 6 for two sequences , in the special case where , for all . When all the intersections is of dimension or , the set of -dimensional intersections is a partition of , which allows that the number of these subspaces is . Since each of the intersection is repeated times, the total of dimensions is . If there exists , for some , then the strict inequality is achieved. ∎