
Adaptive Coding for Matrix Multiplication at Edge Networks

Elahe Vedadi and Hulya Seferoglu University of Illinois at Chicago
Email: {evedad2, hulya}@uic.edu
Abstract

Edge computing is emerging as a new paradigm that allows data to be processed at the edge of the network, where data is typically generated and collected, by exploiting multiple edge devices collectively. However, exploiting the potential of edge computing is challenging mainly due to the heterogeneous and time-varying nature of edge devices. Coded computation, which advocates mixing data in sub-tasks by employing erasure codes and offloading these sub-tasks to other devices for computation, has recently gained interest thanks to its higher reliability, smaller delay, and lower communication cost. In this paper, our focus is on characterizing the cost-benefit trade-offs of coded computation for practical edge computing systems and on developing an adaptive coded computation framework. In particular, we focus on matrix multiplication as a computationally intensive task, and develop an adaptive coding for matrix multiplication (ACM2) algorithm that takes into account the heterogeneous and time-varying nature of edge devices. ACM2 dynamically selects the best coding policy by taking into account the computing time, storage requirements, and successful decoding probability. We show that ACM2 significantly improves the task completion delay as compared to existing coded matrix multiplication algorithms.

I Introduction

A massive amount of data is generated at edge networks by the emerging Internet of Things (IoT), including self-driving cars, drones, health monitoring devices, etc. Transmitting such massive data to the centralized cloud and expecting timely processing is not realistic given the limited bandwidth between an edge network and the centralized cloud. We consider a distributed computing system in which computationally intensive tasks are processed in a distributed manner at the end devices, with possible help from edge servers (fog) and the cloud. However, exploiting the potential of edge computing is challenging mainly due to the heterogeneous and time-varying nature of edge devices.

Coded computation is an emerging field that studies the design of erasure and error-correcting codes to improve the performance of distributed computing through “smart” data redundancy. This breakthrough idea has spawned a significant effort, mainly in the information and coding theory communities [1, 2]. In distributed computation, a master device divides computationally intensive tasks into multiple smaller sub-tasks, and offloads each of them to other devices (end devices, edge servers, and cloud), called workers, for computation. Coded computation (e.g., by employing erasure codes such as Reed-Solomon codes [3, 4]), on the other hand, encodes the data in the sub-tasks, and offloads these coded sub-tasks for computation. The next example demonstrates the potential of coded computation for matrix multiplication.

Example 1

Consider a setup where a master wishes to offload a matrix multiplication task C=ATBC=A^{T}B to three workers. Assume AA and BB are K×KK\times K matrices and matrix AA is divided into two sub-matrices A1A_{1} and A2A_{2}, which are then encoded using a (3,2)(3,2) Maximum Distance Separable (MDS) code, further explained in Section II, to give Z1=A1Z_{1}=A_{1}, Z2=A2Z_{2}=A_{2} and Z3=A1+A2Z_{3}=A_{1}+A_{2}; the master sends each coded sub-matrix to a different worker. When the master receives the computed values (i.e., ZiTBZ_{i}^{T}B) from at least two out of the three workers, it can decode its desired task, which is the computation of ATBA^{T}B. The power of coded computation is that Z3=A1+A2Z_{3}=A_{1}+A_{2} acts as an extra task that can replace either of the other two tasks if they end up straggling or failing. \Box
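
A minimal NumPy sketch of Example 1 is given below; the matrix dimension, the random entries, and the choice of which worker straggles are illustrative assumptions.

```python
import numpy as np

K = 4  # small illustrative dimension (Example 1 uses K x K matrices)
rng = np.random.default_rng(0)
A = rng.standard_normal((K, K))
B = rng.standard_normal((K, K))

# Split A column-wise into two sub-matrices and apply a (3,2) MDS code.
A1, A2 = A[:, : K // 2], A[:, K // 2 :]
Z = {1: A1, 2: A2, 3: A1 + A2}

# Each worker n computes Z_n^T B; suppose worker 2 straggles and never returns.
results = {n: Z[n].T @ B for n in (1, 3)}

# Decode: A1^T B arrives directly, and A2^T B = Z3^T B - Z1^T B.
A1TB = results[1]
A2TB = results[3] - results[1]
C = np.vstack([A1TB, A2TB])          # reassemble A^T B
assert np.allclose(C, A.T @ B)       # matches the uncoded computation
```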

Significant effort is being put on constructing codes for fast and distributed matrix-vector multiplication [1, 5], matrix-matrix multiplication [6, 7, 8, 9], dot product and convolution of two vectors [10, 11], gradient descent [12, 13, 14], distributed optimization [15, 16], Fourier transform [17], and linear transformations [18]. The trade-off between latency of computation and load of communication for data shuffling in MapReduce framework is characterized in [2], and optimum resource allocation algorithm is developed in [19]. This coding idea is extended for cellular networks [20], multistage computation [21], and heterogeneous systems [22, 23].

Our focus in this work is on matrix multiplication, where a master device divides its matrix multiplication computations into smaller tasks and assigns them to workers (possibly including itself) that can process these tasks in parallel. Product [7], polynomial [6], and MatDot (and its extension PolyDot) codes [8] have recently been developed for matrix multiplication. Their main focus is to minimize/optimize the recovery threshold, which is the minimum number of workers that the master needs to wait for in order to compute matrix multiplication (C=ATBC=A^{T}B in Example 1). Although this metric is suitable for large-scale computing systems in data centers, it falls short in edge computing, where other resources including computing power, storage, energy, and networking resources are limited.

In this paper, we analyze the computing time, storage cost, and successful decoding probability of some existing codes for matrix multiplication. Then, we develop an adaptive coding for matrix multiplication (ACM2) algorithm that selects the best coding strategy for each sub-matrix.

We note that the rateless codes considered in [24, 25] also provide adaptive coded computation mechanisms against heterogeneous and time-varying resources. However, the coding overhead of rateless codes can be high in some scenarios, which makes adaptive selection of fixed-rate codes a better alternative. Multi-message communication employing Lagrange coded computation is considered in [26] to reduce under-utilization due to discarding partial computations carried out by stragglers as well as over-computation due to inaccurate prediction of the straggling behavior. A hierarchical coded matrix multiplication is developed in [27] to utilize both slow and fast workers. As compared to [26, 27], we propose an adaptive code selection policy for heterogeneous and time-varying resources. A code design mechanism under a heterogeneous setup is developed in [22, 23], where matrix AA is divided, coded, and offloaded to workers by taking into account the heterogeneity of resources. However, the available resources at helpers may vary over time, which is not taken into account in [22, 23]. Thus, it is crucial to design a coded computation mechanism that is dynamic and adaptive to heterogeneous and time-varying resources, which is the goal of this paper.

We show that ACM2 improves the task completion delay significantly as compared to existing coded matrix multiplication algorithms. The following are the key contributions:

  • We provide computing time analysis of some existing codes designed for matrix multiplication including product [7], polynomial [6], and MatDot codes [8].

  • We characterize storage requirements of existing matrix multiplication codes [7, 6, 8] at master and workers.

  • We design ACM2 for an iterative procedure (e.g., gradient descent) that selects the best code as well as the optimum number of partitions for that code at each iteration to minimize the average computing time subject to storage and successful decoding probability constraints. (We note that ACM2 is generic enough to work with any matrix multiplication codes, although we consider a subset of existing codes in this paper such as product, polynomial, and MatDot codes.)

  • We evaluate the performance of ACM2 through simulations and show that ACM2 significantly improves the average computing time as compared to existing codes.

II Model, Background, and Motivation

II-A Model

II-A1 Setup

We consider a master/worker setup at the edge of the network, where the master device offloads its computationally intensive tasks (matrix multiplication computations) to workers wnw_{n}, n𝒩n\in\mathcal{N} (where 𝒩{1,,N}\mathcal{N}\triangleq\{1,\dots,N\} ) via device-to-device (D2D) links such as Wi-Fi Direct and/or Bluetooth. The master device divides a task (matrix) into smaller sub-matrices, and offloads them to parallel processing workers.

II-A2 Task Model

The master wants to compute functions of its collected data, which are determined by the application. We focus on computing linear functions; specifically, matrix multiplication C=ATBC=A^{T}B, where AL×KA\in\mathbb{R}^{L\times K}, BL×KB\in\mathbb{R}^{L\times K}. Matrix multiplication forms an essential building block of many signal processing (convolution, compression, etc.) and machine learning algorithms (gradient descent, classification, etc.) [1]. We consider an iterative procedure (gradient descent) where a matrix multiplication is calculated at each iteration.

II-A3 Worker Model

The workers have the following properties: (i) failures: workers may fail, “sleep/die”, or leave the network before finishing their assigned computational tasks; (ii) stragglers: workers may incur probabilistic delays in responding to the master.

II-A4 Coding Model

We design and employ an adaptive coding for matrix multiplication (ACM2) that selects the best coding strategy among repetition, MDS [28, 1], polynomial [6], MatDot [8], and product codes [7] by taking into account the computing time, storage cost, and successful decoding probability of these codes. The master device divides the matrix AA into pπp_{\pi} partitions (sub-matrices), where π{rep,mds}\pi\in\{\text{rep},\text{mds}\} for repetition and MDS codes. Both AA and BB matrices are divided into pπp_{\pi} partitions, where π{poly,matdot,pro}\pi\in\{\text{poly},\text{matdot},\text{pro}\} for polynomial, MatDot, and product codes.

II-A5 Delay Model

Each sub-matrix transmitted from the master to a worker wn,n𝒩,w_{n},\ n\in\mathcal{N}, experiences the following delays: (i) transmission delay for sending the sub-matrix from the master to the worker, (ii) computation delay for computing the multiplication of the sub-matrices, and (iii) transmission delay for sending the computed matrix from the worker wnw_{n} back to the master. We model the composite delay of the whole task with a shifted exponential distribution (f(t)=λeλ(t1)f(t)=\lambda e^{-\lambda(t-1)} for t1t\geq 1) [29, 1], where λ\lambda is referred to as the straggling parameter, and the delay of each sub-task with a shifted-scaled exponential distribution (f(t)=αλeαλ(t1α)f(t)=\alpha\lambda e^{-\alpha\lambda(t-\frac{1}{\alpha})} for t1αt\geq\frac{1}{\alpha}), where the scale parameter α\alpha is selected from α{prep,pmds,ppoly2,pmatdot,ppro2}\alpha\in\{p_{\text{rep}},p_{\text{mds}},p^{2}_{\text{poly}},p_{\text{matdot}},p^{2}_{\text{pro}}\}.
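
As a rough illustration of this delay model, the following sketch draws samples from the shifted and shifted-scaled exponential distributions defined above; the values of λ and α are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
lam = 2.0      # straggling parameter
alpha = 4      # scale parameter, e.g. p_mds = 4 partitions

# Shifted exponential: t = 1 + X, X ~ Exp(lambda)  (whole task)
t_full = 1.0 + rng.exponential(scale=1.0 / lam, size=10_000)

# Shifted-scaled exponential: t = (1 + X) / alpha  (one sub-task)
t_sub = (1.0 + rng.exponential(scale=1.0 / lam, size=10_000)) / alpha

print(t_full.mean())   # ~ 1 + 1/lambda = 1.5
print(t_sub.mean())    # ~ (1 + 1/lambda)/alpha = 0.375
```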

II-B Background on Existing Codes for Matrix Multiplication

In this section, we provide a short background on existing coded computation mechanisms for matrix multiplication.

II-B1 Repetition Codes

The master device divides matrix AA column-wise into prepp_{\text{rep}} parts, where prep|Np_{\text{rep}}|N, and generates Nprep\frac{N}{p_{\text{rep}}} copies of each sub-matrix. Sub-matrix AiA_{i}, i=1,,prepi=1,\ldots,p_{\text{rep}}, together with matrix BB, is transmitted to Nprep\frac{N}{p_{\text{rep}}} workers. Workers calculate AiTBA_{i}^{T}B and return their calculations to the master, which finishes the matrix multiplication when it has received AiTBA_{i}^{T}B for all i=1,,prepi=1,\ldots,p_{\text{rep}}.

II-B2 MDS Codes [28, 1]

The master device divides the matrix AA column-wise into pmdsp_{\text{mds}} partitions. An (N,kmds)(N,k_{\text{mds}}) MDS code sets kmds=pmdsk_{\text{mds}}=p_{\text{mds}} and codes the kmdsk_{\text{mds}} sub-matrices into NN sub-matrices using existing MDS codes such as Reed-Solomon codes. The coded sub-matrices AiA_{i}, i=1,,Ni=1,\ldots,N, together with BB, are transmitted to the NN workers. When the master device receives kmdsk_{\text{mds}} of the AiTBA_{i}^{T}B calculations, it can decode and calculate ATBA^{T}B.

II-B3 Polynomial Codes [6]

The master device divides AA and BB column-wise into ppolyp_{\text{poly}} partitions, where ppoly2Np_{\text{poly}}^{2}\leq N. The master constructs the polynomials α(n)=j=1ppolyAjTnj1\alpha(n)=\sum_{j=1}^{p_{\text{poly}}}A_{j}^{T}n^{j-1} and β(n)=j=1ppolyBjn(j1)ppoly\beta(n)=\sum_{j=1}^{p_{\text{poly}}}B_{j}n^{(j-1)p_{\text{poly}}}, and sends them to worker wnw_{n}, which calculates the product α(n)β(n)\alpha(n)\beta(n). When the master receives kpoly=ppoly2k_{\text{poly}}=p_{\text{poly}}^{2} of the α(n)β(n)\alpha(n)\beta(n) products, decoding is completed.
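
The sketch below illustrates polynomial coding and interpolation-based decoding for p_poly = 2; the matrix sizes, evaluation points, and the identity of the straggling worker are assumptions made for the example.

```python
import numpy as np

L, K, N, p = 6, 4, 5, 2          # illustrative sizes; p = p_poly, k_poly = p**2 = 4 <= N
rng = np.random.default_rng(2)
A = rng.standard_normal((L, K))
B = rng.standard_normal((L, K))
A1, A2 = A[:, :K // p], A[:, K // p:]
B1, B2 = B[:, :K // p], B[:, K // p:]

# Worker n receives alpha(n) = A1^T + A2^T n and beta(n) = B1 + B2 n^p.
products = {n: (A1.T + A2.T * n) @ (B1 + B2 * n**p) for n in range(1, N + 1)}

# Decode from any k_poly = 4 responses: every entry of alpha(n)beta(n) is a
# degree-3 polynomial in n, so solve a 4x4 Vandermonde system per entry.
done = [1, 3, 4, 5]                               # pretend worker 2 straggled
V = np.vander(np.array(done, float), 4, increasing=True)
Y = np.stack([products[n] for n in done]).reshape(4, -1)
c = np.linalg.solve(V, Y).reshape(4, K // p, K // p)

# Coefficients are A1^T B1, A2^T B1, A1^T B2, A2^T B2; reassemble C = A^T B.
C = np.block([[c[0], c[2]], [c[1], c[3]]])
assert np.allclose(C, A.T @ B)
```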

II-B4 MatDot Codes [8]

Both matrices AA and BB are divided row-wise into pmatdotp_{\text{matdot}} partitions, where 2pmatdot1N2p_{\text{matdot}}-1\leq N. The master constructs the polynomials α(n)=j=1pmatdotAjTnj1\alpha(n)=\sum_{j=1}^{p_{\text{matdot}}}A_{j}^{T}n^{j-1} and β(n)=j=1pmatdotBjnpmatdotj\beta(n)=\sum_{j=1}^{p_{\text{matdot}}}B_{j}n^{p_{\text{matdot}}-j}, and sends them to worker wnw_{n}, which calculates the product α(n)β(n)\alpha(n)\beta(n). When the master receives kmatdot=2pmatdot1k_{\text{matdot}}=2p_{\text{matdot}}-1 results from the workers, it can decode and calculate ATBA^{T}B.
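
A similar sketch for MatDot with p_matdot = 2 follows; again, the sizes, evaluation points, and the straggling worker are illustrative assumptions.

```python
import numpy as np

L, K, N, p = 6, 4, 4, 2          # illustrative sizes; p = p_matdot, k_matdot = 2p - 1 = 3
rng = np.random.default_rng(3)
A = rng.standard_normal((L, K))
B = rng.standard_normal((L, K))
A1, A2 = A[:L // p, :], A[L // p:, :]     # row-wise partitioning
B1, B2 = B[:L // p, :], B[L // p:, :]

# Worker n receives alpha(n) = A1^T + A2^T n and beta(n) = B1 n + B2.
products = {n: (A1.T + A2.T * n) @ (B1 * n + B2) for n in range(1, N + 1)}

# alpha(n)beta(n) = A1^T B2 + (A1^T B1 + A2^T B2) n + A2^T B1 n^2, so the
# coefficient of n^(p-1) = n is exactly A^T B; recover it from any 3 results.
done = [1, 2, 4]                                  # pretend worker 3 straggled
V = np.vander(np.array(done, float), 3, increasing=True)
Y = np.stack([products[n] for n in done]).reshape(3, -1)
coeffs = np.linalg.solve(V, Y).reshape(3, K, K)
assert np.allclose(coeffs[1], A.T @ B)            # middle coefficient is A^T B
```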

II-B5 Product Codes [7]

Product codes extend MDS codes in such a way that both AA and BB are partitioned column-wise and coded. In particular, both AA and BB are divided into pprop_{\text{pro}} partitions, and these partitions are placed into a ppro×pprop_{\text{pro}}\times p_{\text{pro}} array. Then, every row of the array is encoded with an (N,ppro)(\sqrt{N},p_{\text{pro}}) MDS code, which results in a ppro×Np_{\text{pro}}\times\sqrt{N} array. This array is then coded column-wise with another (N,ppro)(\sqrt{N},p_{\text{pro}}) MDS code, which results in a N×N\sqrt{N}\times\sqrt{N} array. Each coded sub-matrix in this array is sent to a worker (out of NN workers) for calculation. Product codes are decodable if at least one entry of any possible sub-array with size larger than or equal to (Nppro+1)×(Nppro+1)(\sqrt{N}-p_{\text{pro}}+1)\times(\sqrt{N}-p_{\text{pro}}+1) is received successfully.
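
The following sketch illustrates product-code encoding for p_pro = 2 and N = 9, including recovery of one erased block from a column parity; the full iterative (peeling) decoder is omitted, and all sizes are illustrative.

```python
import numpy as np

K, N, p = 4, 9, 2                 # illustrative: p_pro = 2, sqrt(N) = 3 workers per dimension
rng = np.random.default_rng(4)
A = rng.standard_normal((K, K))
B = rng.standard_normal((K, K))

# Column-wise partitions plus the parity block of a (3,2) MDS code per dimension.
Ab = [A[:, :K // p], A[:, K // p:]]
Bb = [B[:, :K // p], B[:, K // p:]]
Ab.append(Ab[0] + Ab[1])
Bb.append(Bb[0] + Bb[1])

# Worker (i, j) in the 3x3 grid computes A_i^T B_j.
grid = [[Ab[i].T @ Bb[j] for j in range(3)] for i in range(3)]

# Example of erasure recovery: if worker (0, 0) straggles, its block is the
# column-parity combination A2^T B0 - A1^T B0.
recovered = grid[2][0] - grid[1][0]
assert np.allclose(recovered, Ab[0].T @ Bb[0])

# The desired product is assembled from the four systematic blocks.
C = np.block([[grid[0][0], grid[0][1]], [grid[1][0], grid[1][1]]])
assert np.allclose(C, A.T @ B)
```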

TABLE I: Comparison of product, polynomial, MatDot, MDS and repetition codes for matrix-matrix multiplication. The last four columns give the probability of successful computation (ρ) for N = 6, 7, 8, 9.

| Code | Recovery threshold (k) | Computing load per worker (γ) | Storage load per worker (μ) | ρ, N=6 | ρ, N=7 | ρ, N=8 | ρ, N=9 |
|---|---|---|---|---|---|---|---|
| Product | 6 | K^3/4 | K^2 + K^2/4 | N/A | N/A | N/A | 0.63 |
| Polynomial | 4 | K^3/4 | K^2 + K^2/4 | 0.67 | 0.82 | 0.91 | 0.96 |
| MatDot | 3 | K^3/2 | 2K^2 | 0.89 | 0.95 | 0.98 | 0.99 |
| MDS | 2 | K^3/2 | 2K^2 | 0.98 | 0.99 | 0.99 | 0.99 |
| Repetition | ⌊N/2+1⌋ | K^3/2 | 2K^2 | 0.67 | N/A | 0.73 | N/A |

II-C Motivation for Adaptive Coding

Assume a canonical setup where AA and BB are K×KK\times K matrices, each divided into two sub-matrices A0A_{0}, A1A_{1}, B0B_{0}, and B1B_{1}. Product codes divide the matrices column-wise, i.e., AiA_{i} and BiB_{i} are K×K2K\times\frac{K}{2} matrices for i{0,1}i\in\{0,1\}, and use two-level MDS codes. With A2=A0+A1A_{2}=A_{0}+A_{1} and B2=B0+B1B_{2}=B_{0}+B_{1}, nine coded computations are constructed as AiTBjA_{i}^{T}B_{j}, for i,j{0,1,2}i,j\in\{0,1,2\}. In polynomial codes, the master device, dividing the matrices column-wise, creates the polynomials α(n)=A0+A1n\alpha(n)=A_{0}+A_{1}n and β(n)=B0+B1n2\beta(n)=B_{0}+B_{1}n^{2} for worker wnw_{n}, which computes the product α(n)β(n)\alpha(n)\beta(n). MatDot follows a similar idea to polynomial codes, with the following difference: AA and BB are divided row-wise.

Table I shows the recovery threshold (kk) [7, 6, 8], computing load per worker (γ\gamma) (this is the simplified analysis presented in Section III-A and further detailed in Appendix A), which shows the number of required multiplications, storage load per worker (μ\mu), which shows the average amount of memory needed to store matrices and their multiplication (as detailed in Section III-B), and probability of successful computation (ρ\rho), which is calculated assuming that the failure probability of workers is 13\frac{1}{3} and independent. As seen, although MDS is the best in terms of recovery threshold (kk), it introduces more computing load per worker (because of partitioning only one of the matrices). Also, MatDot, MDS and repetition codes perform worse than polynomial and product codes in terms of storage load per worker. Product codes require at least N=9N=9 workers due to their very design, and for repetition codes the number of workers should be an even number, but MDS, polynomial and MatDot codes are more flexible. As seen, there is a trade-off among {k,γ,μ,ρ}\{k,\gamma,\mu,\rho\}, which we aim to explore in this paper by taking into account the limited edge computing resources including computing power and storage. For example, if there is no constraint on the total number of workers, but only on computing load, we will likely select product or polynomial codes. Next, we will provide a computing time and storage analysis of existing codes, and develop ACM2 that selects the best code depending on edge constraints.

III Adaptive Coding for Matrix Multiplication

III-A Computing Time Analysis

Assuming a shifted-scaled exponential distribution as the computation delay model, with λ\lambda as the straggling parameter and α{prep,pmds,ppoly2,pmatdot,ppro2}\alpha\in\{p_{\text{rep}},p_{\text{mds}},p^{2}_{\text{poly}},p_{\text{matdot}},p^{2}_{\text{pro}}\} as the scale parameter for each worker, the average computing times for repetition codes TrepT_{\text{rep}} and MDS codes TmdsT_{\text{mds}} are expressed [1] as

Trep1prep(1+prepNλlog(prep)),\displaystyle T_{\text{rep}}\approx\frac{1}{p_{\text{rep}}}\left(1+\frac{p_{\text{rep}}}{N\lambda}\log(p_{\text{rep}})\right), (1)
Tmds1pmds(1+1λlog(NNkmds)).\displaystyle T_{\text{mds}}\approx\frac{1}{{p_{\text{mds}}}}\left(1+\frac{1}{\lambda}\log\left(\frac{N}{N-{k_{\text{mds}}}}\right)\right). (2)
Corollary 1

The average computing times for polynomial codes TpolyT_{\text{poly}} and MatDot codes TmatdotT_{\text{matdot}} are expressed as follows, assuming that a shifted-scaled exponential distribution is used as the delay model.

Tpoly1ppoly2(1+1λlog(NNkpoly)),\displaystyle T_{\text{poly}}\approx\frac{1}{{{p}^{2}_{\text{poly}}}}\left(1+\frac{1}{\lambda}\log\left(\frac{N}{N-{k_{\text{poly}}}}\right)\right), (3)
Tmatdot1pmatdot(1+1λlog(NNkmatdot)).\displaystyle T_{\text{matdot}}\approx\frac{1}{{p_{\text{matdot}}}}\left(1+\frac{1}{\lambda}\log\left(\frac{N}{N-{k_{\text{matdot}}}}\right)\right). (4)

Proof: The proof is provided in Appendix B. \Box

The product codes have different performance in two different regimes. In the first regime the number of workers scales sublinearly with ppro2p_{\text{pro}}^{2}, i.e., N=ppro2+𝒪(ppro)N=p_{\text{pro}}^{2}+\mathcal{O}(p_{\text{pro}}), while in the second regime, the number of workers scales linearly with ppro2p_{\text{pro}}^{2}, i.e., N=ppro2+𝒪(ppro2)N=p_{\text{pro}}^{2}+\mathcal{O}(p_{\text{pro}}^{2}). The computing time analysis of product codes is provided in these two regimes next.

Corollary 2

Assume a shifted-scaled exponential distribution as the delay model with λ\lambda as the straggling parameter and ppro2p^{2}_{\text{pro}} as the scale parameter for each worker. Then, the average computing time TproT_{\text{pro}} for (ppro+τ2,ppro)2(p_{\text{pro}}+\frac{\tau}{2},p_{\text{pro}})^{2} product codes and (ppro+τ2)2(p_{\text{pro}}+\frac{\tau}{2})^{2} workers, where τ\tau is an even integer, as pprop_{\text{pro}} grows to infinity, is expressed in the first regime as

Tpro1ppro2(1+1λlog(ppro+τ2cτ/2+1)),\displaystyle T_{\text{pro}}\approx\frac{1}{p_{\text{pro}}^{2}}\left(1+\frac{1}{\lambda}\log\left(\frac{p_{\text{pro}}+\frac{\tau}{2}}{c_{\tau/2+1}}\right)\right), (5)

where cτ/2+1(1+τ/2)+(1+τ/2)log(1+τ/2)c_{\tau/2+1}\approx(1+\tau/2)+\sqrt{(1+\tau/2)\log(1+\tau/2)} [30], [31]. Assuming the same delay distribution, the lower and upper bounds of the average computing time TproT_{\text{pro}} for (1+δppro,ppro)2(\sqrt{1+\delta}p_{\text{pro}},p_{\text{pro}})^{2} product codes and (1+δ)ppro2(1+\delta)p^{2}_{\text{pro}} workers, for a fixed δ\delta, as pprop_{\text{pro}} grows to infinity, are expressed in the second regime as

Tprolow=1ppro2(1+1λlog(1+δδ)),\displaystyle T_{\text{pro}}^{\text{low}}=\frac{1}{p_{\text{pro}}^{2}}\left(1+\frac{1}{\lambda}\log\left(\frac{1+\delta}{\delta}\right)\right), (6)
Tproup=1ppro2(1+2λlog(1+δ+1+δδ)).\displaystyle T_{\text{pro}}^{\text{up}}=\frac{1}{p_{\text{pro}}^{2}}\left(1+\frac{2}{\lambda}\log\left(\frac{1+\delta+\sqrt{1+\delta}}{\delta}\right)\right). (7)

Proof: The proof is provided in Appendix C. \Box
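
As a quick numeric illustration of (1)-(4), the following sketch evaluates the average computing time for assumed values of N, λ, and the partition counts; product codes are covered by the bounds of Corollary 2 and are checked separately in Appendix C.

```python
import math

N, lam = 12, 2.0                         # assumed number of workers and straggling parameter

def T_rep(p):     return (1 + (p / (N * lam)) * math.log(p)) / p         # equation (1)
def T_mds(p):     return (1 + math.log(N / (N - p)) / lam) / p           # equation (2), k_mds = p
def T_poly(p):    return (1 + math.log(N / (N - p**2)) / lam) / p**2     # equation (3), k_poly = p^2
def T_matdot(p):  return (1 + math.log(N / (N - (2*p - 1))) / lam) / p   # equation (4), k_matdot = 2p - 1

print(T_rep(4), T_mds(4), T_poly(3), T_matdot(4))
```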

III-B Storage Analysis

In this section, we provide the storage requirements of the codes explained in Section II-B. These codes have storage requirements at both the master and the worker devices. In particular, we assume that each entry of matrices AA and BB requires a fixed amount of memory. Our analysis, provided next, quantifies how many of these entries need to be stored at the master and worker devices.

III-B1 Storage at Master

In all codes, we first store matrices AA and BB, where each matrix contains KLKL components. Also, we store the final result, ATBK×KA^{T}B\in\mathbb{R}^{K\times K}, which contains K2K^{2} components. Therefore, 2KL+K22KL+K^{2} entries should be stored at the master device for each code.

In repetition codes, we store the first krepk_{\text{rep}} results obtained from krepk_{\text{rep}} workers, where krep=NNprep+1k_{\text{rep}}=N-\frac{N}{p_{\text{rep}}}+1. Each result contains K2prep\frac{K^{2}}{p_{\text{rep}}} components.

Srepm=krepK2prep+2KL+K2.\displaystyle S_{\text{rep}}^{m}=k_{\text{rep}}\frac{K^{2}}{p_{\text{rep}}}+2KL+K^{2}. (8)

In MDS codes, we store NkmdsN-k_{\text{mds}} coded sub-matrices. Each coded matrix contains KLpmds\frac{KL}{p_{\text{mds}}} components. Then, we need to store the first kmdsk_{\text{mds}} results obtained from kmdsk_{\text{mds}} workers. Each result contains K2pmds\frac{K^{2}}{p_{\text{mds}}} components.

Smdsm=(Nkmds)KLpmds+kmdsK2pmds+2KL+K2.\displaystyle S_{\text{mds}}^{m}=(N-k_{\text{mds}})\frac{KL}{p_{\text{mds}}}+k_{\text{mds}}\frac{K^{2}}{p_{\text{mds}}}+2KL+K^{2}. (9)

In polynomial codes, we store 2N2N coded sub-matrices. Each coded matrix contains KLppoly\frac{KL}{p_{\text{poly}}} components. Then, we need to store the first kpolyk_{\text{poly}} results obtained from kpolyk_{\text{poly}} workers. Each such result contains K2ppoly2\frac{K^{2}}{p_{\text{poly}}^{2}} components.

Spolym=2NKLppoly+kpolyK2ppoly2+2KL+K2.\displaystyle S_{\text{poly}}^{m}=2N\frac{KL}{p_{\text{poly}}}+k_{\text{poly}}\frac{K^{2}}{p_{\text{poly}}^{2}}+2KL+K^{2}. (10)

In MatDot, we store 2N2N coded sub-matrices. Each coded matrix contains KLpmatdot\frac{KL}{p_{\text{matdot}}} components. Then, we need to store the first kmatdotk_{\text{matdot}} results collected from kmatdotk_{\text{matdot}} workers. Each result contains K2K^{2} components.

Smatdotm=2NKLpmatdot+kmatdotK2+2KL+K2.\displaystyle S_{\text{matdot}}^{m}=2N\frac{KL}{p_{\text{matdot}}}+k_{\text{matdot}}K^{2}+2KL+K^{2}. (11)

In product codes, we store 2(Nkpro)2(N-k_{\text{pro}}) coded sub-matrices. Each coded matrix contains KLppro\frac{KL}{p_{\text{pro}}} components. Then, we need to store the first kprok_{\text{pro}} results collected from kprok_{\text{pro}} workers, where kpro=2(ppro1)N(ppro1)2+1k_{\text{pro}}=2(p_{\text{pro}}-1)\sqrt{N}-{(p_{\text{pro}}-1)}^{2}+1 [6]. Each result contains K2ppro2\frac{K^{2}}{p_{\text{pro}}^{2}} components.

Sprom=2(Nkpro)KLppro+kproK2ppro2+2KL+K2.S_{\text{pro}}^{m}=2(N-k_{\text{pro}})\frac{KL}{p_{\text{pro}}}+k_{\text{pro}}\frac{K^{2}}{p_{\text{pro}}^{2}}+2KL+K^{2}. (12)

III-B2 Storage at Workers

In repetition codes, each worker receives one matrix with KLprep\frac{KL}{p_{\text{rep}}} entries and one matrix with KLKL entries, as we decompose only one of the matrices into prepp_{\text{rep}} parts. The resulting matrix has K2prep\frac{K^{2}}{p_{\text{rep}}} entries. Similarly, in MDS codes we decompose only one of the matrices into pmdsp_{\text{mds}} parts. Each worker receives one matrix with KLpmds\frac{KL}{p_{\text{mds}}} entries and one matrix with KLKL entries, and the resulting matrix has K2pmds\frac{K^{2}}{p_{\text{mds}}} entries. Therefore, the storage requirement of repetition and MDS codes is expressed as

Sxw=KLpx+KL+K2px,x{rep,mds}.\displaystyle S_{x}^{w}=\frac{KL}{p_{x}}+KL+\frac{K^{2}}{p_{x}},\forall x\in\{\text{rep},\text{mds}\}. (13)

In polynomial and product codes, each worker receives two matrices of size KLpx\frac{KL}{p_{x}}, x{poly,pro}x\in\{\text{poly},\text{pro}\}. The size of the matrix after multiplication is K2px2\frac{K^{2}}{p_{x}^{2}}. Therefore, the storage requirement of polynomial and product codes at each worker is

Sxw=2KLpx+K2px2,x{poly,pro}.\displaystyle S_{x}^{w}=\frac{2KL}{p_{x}}+\frac{K^{2}}{p_{x}^{2}},\forall x\in\{\text{poly},\text{pro}\}. (14)

On the other hand, the size of the matrix after computation is K2K^{2} in MatDot as matrix partitioning is done differently (i.e., row-wise, which means that AiLpmatdot×KA_{i}\in\mathbb{R}^{\frac{L}{p_{\text{matdot}}}\times K} and BiLpmatdot×KB_{i}\in\mathbb{R}^{\frac{L}{p_{\text{matdot}}}\times K} after partitioning and finally, AiTBiK×KA^{T}_{i}B_{i}\in\mathbb{R}^{K\times K}) as compared to polynomial and product codes (partitions column-wise, which means that we have AiL×KpxA_{i}\in\mathbb{R}^{L\times\frac{K}{p_{x}}} and BiL×KpxB_{i}\in\mathbb{R}^{L\times\frac{K}{p_{x}}} after partitioning, and the final result is AiTBiKpx×Kpx,x{poly,pro}A^{T}_{i}B_{i}\in\mathbb{R}^{\frac{K}{p_{x}}\times\frac{K}{p_{x}}},\forall x\in\{\text{poly},\text{pro}\}). Therefore, the storage requirement of MatDot is expressed as

Smatdotw=2KLpmatdot+K2.\displaystyle S_{\text{matdot}}^{w}=\frac{2KL}{p_{\text{matdot}}}+K^{2}. (15)
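
The storage expressions (8)-(15) translate directly into a small helper; the following sketch evaluates them for an assumed setup (the values of K, L, N, and the partition counts are illustrative, and the repetition and product entries assume p | N and an integer √N, respectively).

```python
import math

K, L, N = 2000, 5000, 9            # assumed matrix sizes and worker count

def master_storage(code, p):
    """Master-side storage in matrix entries, following (8)-(12)."""
    base = 2 * K * L + K**2                      # A, B and the final product A^T B
    if code == "rep":                            # requires p | N
        k = N - N / p + 1
        return base + k * K**2 / p
    if code == "mds":
        k = p
        return base + (N - k) * K * L / p + k * K**2 / p
    if code == "poly":
        k = p**2
        return base + 2 * N * K * L / p + k * K**2 / p**2
    if code == "matdot":
        k = 2 * p - 1
        return base + 2 * N * K * L / p + k * K**2
    if code == "pro":                            # requires sqrt(N) to be an integer
        k = 2 * (p - 1) * math.isqrt(N) - (p - 1)**2 + 1
        return base + 2 * (N - k) * K * L / p + k * K**2 / p**2

def worker_storage(code, p):
    """Worker-side storage in matrix entries, following (13)-(15)."""
    if code in ("rep", "mds"):
        return K * L / p + K * L + K**2 / p
    if code in ("poly", "pro"):
        return 2 * K * L / p + K**2 / p**2
    return 2 * K * L / p + K**2                  # matdot

for code, p in [("rep", 3), ("mds", 2), ("poly", 2), ("matdot", 2), ("pro", 2)]:
    print(code, master_storage(code, p), worker_storage(code, p))
```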

III-C Design of ACM2 Algorithm

In this section, we present our ACM2 algorithm. We consider an iterative process such as gradient descent, where matrix multiplications are required at each iteration. Our goal is to determine the best matrix multiplication code and the optimum number of matrix partitions by taking into account the task completion delay, i.e., the computing time, the storage requirements, and the decoding probability of each code. In particular, ACM2 solves an optimization problem at each iteration, and determines which code is the best as well as the optimum number of partitions for that code. For example, MDS codes may be good at iteration ii, while polynomial codes may be a better fit at later iterations. The optimization problem is formulated as

minπ,pπ\displaystyle\min_{\pi,p_{\pi}}\text{\; \;} Tπ\displaystyle T_{\pi}
subject to SπzSthrz,z{m,w},\displaystyle S_{\pi}^{z}\leq S_{\text{thr}}^{z},z\in\{m,w\},
ρπρthr,\displaystyle\rho_{\pi}\geq\rho_{\text{thr}},
kπN,\displaystyle k_{\pi}\leq N,
pπ2,\displaystyle p_{\pi}\geq 2,
π{rep,mds,poly,matdot,pro}.\displaystyle\pi\in\{\text{rep},\text{mds},\text{poly},\text{matdot},\text{pro}\}. (16)

The objective function selects the best code π\pi from the set {rep,mds,poly,matdot,pro}\{\text{rep},\text{mds},\text{poly},\text{matdot},\text{pro}\} as well as the optimum number of partitions pπp_{\pi}. The first constraint is the storage constraint, which limits the storage usage at the master and worker devices with the thresholds SthrmS_{\text{thr}}^{m} and SthrwS_{\text{thr}}^{w}. The second constraint is the successful decoding constraint, where the successful decoding probability ρπ\rho_{\pi} should be larger than the threshold ρthr\rho_{\text{thr}}. The successful decoding probability is defined as the probability that the master receives all the required results from the workers. If we assume that the failure probability of each worker is (1ϕ)(1-\phi) and independent, the total number of workers is NN, and the number of sufficient results is kπk_{\pi}, one may formulate the probability of success for each coding method as the binomial probability ρπ=i=kπN(Ni)(ϕ)i(1ϕ)Ni\rho_{\pi}=\sum_{i=k_{\pi}}^{N}\binom{N}{i}(\phi)^{i}(1-\phi)^{N-i}. The third constraint ensures that the recovery threshold kπk_{\pi} does not exceed the number of workers. The fourth constraint makes the number of partitions larger than or equal to 2; otherwise there is no matrix partitioning, which is a degenerate case.
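
A minimal sketch of this code-selection step is given below: an exhaustive search over π and p_π using the computing-time expressions of Section III-A, the worker-storage expressions (13)-(15), and the binomial success probability. The parameter values and thresholds are illustrative assumptions, the master-storage constraint is omitted for brevity, and the product-code computing time reuses the MDS-style expression rather than the bounds of Corollary 2.

```python
import math

N, K, L, lam = 9, 2000, 5000, 2.0      # assumed setup
phi = 2 / 3                            # per-worker success probability
S_thr_w, rho_thr = 15_000_000, 0.9     # illustrative worker-storage and decoding thresholds

def recovery(code, p):                 # recovery thresholds k_pi
    return {"rep": N - N / p + 1, "mds": p, "poly": p**2, "matdot": 2 * p - 1,
            "pro": 2 * (p - 1) * math.isqrt(N) - (p - 1)**2 + 1}[code]

def comp_time(code, p, k):             # equations (1)-(4); product codes reuse the MDS-style form
    if code == "rep":
        return (1 + (p / (N * lam)) * math.log(p)) / p
    alpha = p**2 if code in ("poly", "pro") else p
    return (1 + math.log(N / (N - k)) / lam) / alpha

def worker_storage(code, p):           # equations (13)-(15)
    if code in ("rep", "mds"):
        return K * L / p + K * L + K**2 / p
    if code in ("poly", "pro"):
        return 2 * K * L / p + K**2 / p**2
    return 2 * K * L / p + K**2

def success_prob(k):                   # rho_pi as the binomial tail probability
    return sum(math.comb(N, i) * phi**i * (1 - phi)**(N - i) for i in range(k, N + 1))

best = None
for code in ("rep", "mds", "poly", "matdot", "pro"):
    for p in range(2, N + 1):
        if code == "rep" and N % p:                                  # repetition needs p | N
            continue
        if code == "pro" and (math.isqrt(N)**2 != N or p > math.isqrt(N)):
            continue
        k = recovery(code, p)
        if k >= N:                                                   # keep N - k > 0 in (2)-(4)
            continue
        if worker_storage(code, p) > S_thr_w or success_prob(math.ceil(k)) < rho_thr:
            continue
        T = comp_time(code, p, k)
        if best is None or T < best[0]:
            best = (T, code, p)

print(best)    # (average computing time, selected code pi, number of partitions p_pi)
```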

IV Performance Analysis of ACM2

In this section, we evaluate the performance of our algorithm, ACM2, via simulations. We consider a master/worker setup, where the per-sub-matrix computing delay is an i.i.d. random variable following a shifted exponential distribution with straggling parameter λ\lambda. We compare ACM2 with the baselines: repetition, MDS, polynomial, MatDot, and product codes.

Fig. 1 shows the average computing time versus the number of workers NN when there is no storage or successful decoding probability constraint in (III-C). In this setup, K=2000K=2000, L=5000L=5000, ϕ=0.95\phi=0.95, and λ\lambda is randomly and uniformly selected from {2,3,4,5,6,7,8,9,10}\{2,3,4,5,6,7,8,9,10\}. In this setup, workers are fast (since λ>1\lambda>1), so all coding algorithms prefer splitting matrices into more partitions. This causes a low computing load at workers, but the master needs to wait for more results, i.e., the recovery threshold kk is large, to be able to decode the computed sub-matrices. In this setup, the optimum number of partitions of repetition codes is equal to NN (prep=Np^{*}_{\text{rep}}=N), while for MDS codes it is close to NN (0.9<pmdsN<10.9<\frac{p^{*}_{\text{mds}}}{N}<1). This means that repetition codes are the same as no coding, and MDS gets closer to no coding as λ\lambda increases [28, 1]. When NN is small, no coding is the best, so ACM2 chooses MDS codes (which are close to repetition codes) as they behave as no coding. When NN increases, MDS, polynomial, and product codes perform better than repetition codes. In this setup, product codes operate in the first regime, because the workers are fast. Thus, the optimum number of partitions is large and close to NN, so N=ppro2+τpproN=p_{\text{pro}}^{2}+\tau p_{\text{pro}}. Product codes perform better in this regime [7]. When N\sqrt{N} is an integer, product codes are the best as they can use all existing workers and choose the number of partitions as large as possible to decrease the computation load of each worker. However, when N\sqrt{N} is not an integer, product codes may waste the resources of some workers (as they only use N\lfloor\sqrt{N}\rfloor workers), so MDS and polynomial codes perform better. The computing time of MDS codes is less than or equal to that of polynomial codes, because the computation load of polynomial codes is higher than that of MDS codes. The optimum number of partitions for each code increases with increasing NN, which decreases the computation load at each worker. Since all workers are fast (λ>1\lambda>1), all codes choose as large a number of partitions as possible, so they perform close to each other. MatDot performs worse than the other codes because of its different way of partitioning, i.e., row-wise versus column-wise; MatDot thus introduces almost 22 times more computation load. As seen, ACM2 selects the best code among all codes, so it performs the best.

Figure 1: Task completion delay. There are no storage or successful decoding probability constraints.

Fig. 2 demonstrates the average computing time versus the number of workers when there is a storage constraint in (III-C). In this setup, K=2000K=2000, L=5000L=5000, ϕ=0.95\phi=0.95, λ\lambda is selected randomly and uniformly from λ{110,19,18,17,16,15,14,13,12}\lambda\in\{\frac{1}{10},\frac{1}{9},\frac{1}{8},\frac{1}{7},\frac{1}{6},\frac{1}{5},\frac{1}{4},\frac{1}{3},\frac{1}{2}\}, and the storage constraint is set to Sthrw=15S_{\text{thr}}^{w}=15M entries. In this scenario, since the workers are slow, i.e., λ<1\lambda<1, all codes prefer to choose a small number of partitions. However, there is a trade-off between the number of partitions and the storage requirement: the storage requirement decreases with an increasing number of partitions, as each worker multiplies smaller matrices and thus needs less storage. Since there is a storage constraint, all codes are pushed toward a larger number of partitions. ACM2 exploits this trade-off and selects the best code and the optimum number of partitions.

Fig. 3 illustrates the average computing time versus the number of workers when both storage and success probability constraints exist in (III-C). In this setup, K=2000K=2000, L=5000L=5000, ϕ=0.9\phi=0.9, λ\lambda is selected randomly and uniformly from λ{12000,11000,1900,1800,1700,1600,1500}\lambda\in\{\frac{1}{2000},\frac{1}{1000},\frac{1}{900},\frac{1}{800},\frac{1}{700},\frac{1}{600},\frac{1}{500}\}, the storage constraint is set to Sthrw=10S_{\text{thr}}^{w}=10M entries, and the success probability constraint is set to ρthr=0.98\rho_{\text{thr}}=0.98. In this scenario, our proposed algorithm selects each of the MDS, product, polynomial, MatDot, and repetition codes at least once over the iterations. In general, MatDot and polynomial codes perform better in this setup than in Fig. 1 and Fig. 2. Polynomial codes benefit from the tighter storage constraint, and MatDot codes benefit from the success probability constraint, since the success probability decreases as the recovery threshold increases.

Figure 2: Task completion delay. Storage is constrained.
Figure 3: Task completion delay. Storage and success probability are constrained.

V Conclusion

In this paper, we focused on characterizing the cost-benefit trade-offs of coded computation for practical edge computing systems, and developed an adaptive coded computation framework. In particular, we studied matrix multiplication as a computationally intensive task, and developed an adaptive coding for matrix multiplication (ACM2) algorithm by taking into account the heterogeneous and time-varying nature of edge devices. ACM2 dynamically selects the best coding policy by taking into account the computing time, storage requirements, and successful decoding probability.

References

  • [1] K. Lee, M. Lam, R. Pedarsani, D. Papailiopoulos, and K. Ramchandran, “Speeding up distributed machine learning using codes,” IEEE Transactions on Information Theory, vol. 64, no. 3, March 2018.
  • [2] S. Li, M. A. Maddah-Ali, Q. Yu, and A. S. Avestimehr, “A fundamental tradeoff between computation and communication in distributed computing,” IEEE Transactions on Information Theory, vol. 64, no. 1, pp. 109–128, Jan 2018.
  • [3] F. J. MacWilliams and N. J. A. Sloane, The theory of error-correcting codes.   Elsevier, 1977.
  • [4] S. Lin and D. Costello, Error-Correcting Codes.   Prentice-Hall, Inc, 1983.
  • [5] N. S. Ferdinand and S. C. Draper, “Anytime coding for distributed computation,” in 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton).   IEEE, 2016, pp. 954–960.
  • [6] Q. Yu, M. Maddah-Ali, and S. Avestimehr, “Polynomial codes: an optimal design for high-dimensional coded matrix multiplication,” in Advances in Neural Information Processing Systems, 2017.
  • [7] K. Lee, C. Suh, and K. Ramchandran, “High-dimensional coded matrix multiplication,” in 2017 IEEE International Symposium on Information Theory (ISIT).   IEEE, 2017, pp. 2418–2422.
  • [8] M. Fahim, H. Jeong, F. Haddadpour, S. Dutta, V. Cadambe, and P. Grover, “On the optimal recovery threshold of coded matrix multiplication,” in 2017 55th Annual Allerton Conference on Communication, Control, and Computing (Allerton).   IEEE, 2017, pp. 1264–1270.
  • [9] Q. Yu, M. A. Maddah-Ali, and A. S. Avestimehr, “Straggler mitigation in distributed matrix multiplication: Fundamental limits and optimal coding,” in 2018 IEEE International Symposium on Information Theory (ISIT).   IEEE, 2018, pp. 2022–2026.
  • [10] S. Dutta, V. Cadambe, and P. Grover, “Short-dot: Computing large linear transforms distributedly using coded short dot products,” in Advances In Neural Information Processing Systems, 2016, pp. 2100–2108.
  • [11] ——, “Coded convolution for parallel and distributed computing within a deadline,” arXiv preprint arXiv:1705.03875, 2017.
  • [12] R. Tandon, Q. Lei, A. G. Dimakis, and N. Karampatziakis, “Gradient coding: Avoiding stragglers in distributed learning,” in International Conference on Machine Learning, 2017, pp. 3368–3376.
  • [13] W. Halbawi, N. Azizan, F. Salehi, and B. Hassibi, “Improving distributed gradient descent using reed-solomon codes,” in 2018 IEEE International Symposium on Information Theory (ISIT).   IEEE, 2018, pp. 2027–2031.
  • [14] N. Raviv, R. Tandon, A. Dimakis, and I. Tamo, “Gradient coding from cyclic MDS codes and expander graphs,” in Proceedings of the 35th International Conference on Machine Learning, J. Dy and A. Krause, Eds., vol. 80.   PMLR, 10–15 Jul 2018, pp. 4305–4313.
  • [15] C. Karakus, Y. Sun, and S. Diggavi, “Encoded distributed optimization,” in 2017 IEEE International Symposium on Information Theory (ISIT).   IEEE, 2017, pp. 2890–2894.
  • [16] C. Karakus, Y. Sun, S. Diggavi, and W. Yin, “Straggler mitigation in distributed optimization through data encoding,” in Advances in Neural Information Processing Systems, 2017, pp. 5434–5442.
  • [17] Q. Yu, M. A. Maddah-Ali, and A. S. Avestimehr, “Coded fourier transform,” in 2017 55th Annual Allerton Conference on Communication, Control, and Computing (Allerton).   IEEE, 2017, pp. 494–501.
  • [18] Y. Yang, P. Grover, and S. Kar, “Computing linear transformations with unreliable components,” IEEE Trans. on Information Theory, 2017.
  • [19] Q. Yu, S. Li, M. A. Maddah-Ali, and A. S. Avestimehr, “How to optimally allocate resources for coded distributed computing?” in 2017 IEEE International Conference on Communications (ICC).   IEEE, 2017.
  • [20] S. Li, Q. Yu, M. A. Maddah-Ali, and A. S. Avestimehr, “A scalable framework for wireless distributed computing,” IEEE/ACM Transactions on Networking, vol. 25, no. 5, pp. 2643–2654, 2017.
  • [21] S. Li, M. A. Maddah-Ali, and A. S. Avestimehr, “Coded distributed computing: Straggling servers and multistage dataflows,” in 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton).   IEEE, 2016, pp. 164–171.
  • [22] M. Kiamari, C. Wang, and A. S. Avestimehr, “On heterogeneous coded distributed computing,” in GLOBECOM 2017-2017 IEEE Global Communications Conference.   IEEE, 2017, pp. 1–7.
  • [23] A. Reisizadeh, S. Prakash, R. Pedarsani, and A. S. Avestimehr, “Coded computation over heterogeneous clusters,” IEEE Transactions on Information Theory, vol. 65, no. 7, pp. 4227–4242, 2019.
  • [24] A. Mallick, M. Chaudhari, U. Sheth, G. Palanikumar, and G. Joshi, “Rateless codes for near-perfect load balancing in distributed matrix-vector multiplication,” Proceedings of the ACM on Measurement and Analysis of Computing Systems, vol. 3, no. 3, pp. 1–40, 2019.
  • [25] Y. Keshtkarjahromi, Y. Xing, and H. Seferoglu, “Dynamic heterogeneity-aware coded cooperative computation at the edge,” in 2018 IEEE 26th International Conference on Network Protocols (ICNP).   IEEE, 2018, pp. 23–33.
  • [26] E. Ozfatura, S. Ulukus, and D. Gündüz, “Straggler-aware distributed learning: Communication–computation latency trade-off,” Entropy, vol. 22, no. 5, p. 544, 2020.
  • [27] S. Kiani, N. Ferdinand, and S. C. Draper, “Hierarchical coded matrix multiplication,” in 2019 16th Canadian Workshop on Information Theory (CWIT).   IEEE, 2019, pp. 1–6.
  • [28] K. Lee, M. Lam, R. Pedarsani, D. Papailiopoulos, and K. Ramchandran, “Speeding up distributed machine learning using codes,” in 2016 IEEE International Symposium on Information Theory (ISIT), July 2016, pp. 1143–1147.
  • [29] G. Liang and U. C. Kozat, “Tofec: Achieving optimal throughput-delay trade-off of cloud storage using erasure codes,” in IEEE INFOCOM 2014-IEEE Conference on Computer Communications.   IEEE, 2014, pp. 826–834.
  • [30] J. Justesen and T. Hoholdt, “Analysis of iterated hard decision decoding of product codes with reed-solomon component codes,” in 2007 IEEE Information Theory Workshop, Sep. 2007, pp. 174–177.
  • [31] B. Pittel, J. Spencer, and N. Wormald, “Sudden emergence of a giant k-core in a random graph,” Journal of Combinatorial Theory, Series B, vol. 67, no. 1, pp. 111–151, 1996. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0095895696900362
  • [32] A. DasGupta, Finite Sample Theory of Order Statistics and Extremes.   New York, NY: Springer New York, 2011, pp. 221–248.
  • [33] C. Clapham and J. Nicholson, The Concise Oxford Dictionary of Mathematics.   Oxford University Press, 2009. [Online]. Available: https://www.oxfordreference.com/view/10.1093/acref/9780199235940.001.0001/acref-9780199235940

VI Appendix A: Construction of Table I

In this appendix, we provide detailed calculations as well as some definitions related to Table I.

VI-A Recovery Threshold (k)(k)

The recovery threshold is defined as the minimum number of results that the master needs to receive to be able to compute the final result. In product codes, the master can finish the matrix multiplication if it receives any kpro=2(ppro1)N(ppro1)2+1k_{\text{pro}}=2(p_{\text{pro}}-1)\sqrt{N}-(p_{\text{pro}}-1)^{2}+1 results from the workers [6]. In the example of Table I, by plugging in N=9N=9 and ppro=2p_{\text{pro}}=2, the recovery threshold for product codes becomes kpro=6k_{\text{pro}}=6. In polynomial codes, kpoly=ppoly2=4k_{\text{poly}}=p^{2}_{\text{poly}}=4. Indeed, there are four unknowns in α(n)β(n)=A0B0+A1B0n+A0B1n2+A1B1n3\alpha(n)\beta(n)=A_{0}B_{0}+A_{1}B_{0}n+A_{0}B_{1}n^{2}+A_{1}B_{1}n^{3}, so the master needs to receive at least four results from the workers to be able to calculate C=ATBC=A^{T}B. In MatDot codes, kmatdot=2pmatdot1=3k_{\text{matdot}}=2p_{\text{matdot}}-1=3. Since there are three unknowns in α(n)β(n)=(A0+A1n)(B0n+B1)=A0B1+(A0B0+A1B1)n+A1B0n2\alpha(n)\beta(n)=(A_{0}+A_{1}n)(B_{0}n+B_{1})=A_{0}B_{1}+(A_{0}B_{0}+A_{1}B_{1})n+A_{1}B_{0}n^{2}, the master needs to receive at least three results from the workers to calculate C=ATBC=A^{T}B. In MDS codes, kmds=pmds=2k_{\text{mds}}=p_{\text{mds}}=2 [1], and in repetition codes krep=NNprep+1k_{\text{rep}}=N-\frac{N}{p_{\text{rep}}}+1, for prep|Np_{\text{rep}}|N, since the master can decode the final result only when it has received AiTBA_{i}^{T}B for all i=1,,prepi=1,\ldots,p_{\text{rep}}. Therefore, in the example of Table I, by plugging in prep=2p_{\text{rep}}=2, the recovery threshold for repetition codes becomes N2+1\lfloor\frac{N}{2}+1\rfloor.

VI-B Computing Load per Worker (γ\gamma)

We define the computing load as the total number of multiplications (i.e., ai,kbk,ja_{i,k}b_{k,j}, i,k,j\forall i,k,j) needed to compute C=ATBC=A^{T}B. We note that the computing load can be considered as the static counterpart of the computing time analysis. We assume that each ai,kbk,ja_{i,k}b_{k,j} multiplication takes a fixed amount of time, so the total computing time is proportional to the number of such multiplications. Next, we provide the detailed calculations for the computing load per worker.

In product codes, the matrix partitioning is column-wise, so we have AiK×K2A_{i}\in\mathbb{R}^{K\times\frac{K}{2}} and BiK×K2B_{i}\in\mathbb{R}^{K\times\frac{K}{2}}, for i{0,1}i\in\{0,1\}, after partitioning. Thus, K2×K×K2=K34\frac{K}{2}\times K\times\frac{K}{2}=\frac{K^{3}}{4} multiplications are needed to compute AiTBiK2×K2A^{T}_{i}B_{i}\in\mathbb{R}^{\frac{K}{2}\times\frac{K}{2}}. In polynomial codes, the matrix partitioning is also column-wise, and we have AiK×K2A_{i}\in\mathbb{R}^{K\times\frac{K}{2}} and BiK×K2B_{i}\in\mathbb{R}^{K\times\frac{K}{2}}, for i{0,1}i\in\{0,1\} after partitioning, so α(n)K2×K\alpha(n)\in\mathbb{R}^{\frac{K}{2}\times K} and β(n)K×K2\beta(n)\in\mathbb{R}^{K\times\frac{K}{2}}. Similar to the product codes, K2×K×K2=K34\frac{K}{2}\times K\times\frac{K}{2}=\frac{K^{3}}{4} multiplications are needed to compute α(n)β(n)K2×K2\alpha(n)\beta(n)\in\mathbb{R}^{\frac{K}{2}\times\frac{K}{2}}. In MatDot codes, the matrix partitioning is row-wise. It means that we have AiK2×KA_{i}\in\mathbb{R}^{\frac{K}{2}\times K} and BiK2×KB_{i}\in\mathbb{R}^{\frac{K}{2}\times K}, for i{0,1}i\in\{0,1\} after partitioning. Hence, α(n)K×K2\alpha(n)\in\mathbb{R}^{K\times\frac{K}{2}} and β(n)K2×K\beta(n)\in\mathbb{R}^{\frac{K}{2}\times K} and K×K2×K=K32K\times\frac{K}{2}\times K=\frac{K^{3}}{2} multiplications are needed to compute α(n)β(n)K×K\alpha(n)\beta(n)\in\mathbb{R}^{K\times K}. In both MDS and repetition codes, one of the matrices, say AA, is partitioned column-wise, so we have AiK×K2A_{i}\in\mathbb{R}^{K\times\frac{K}{2}}, for i{0,1}i\in\{0,1\}, after partitioning and BK×KB\in\mathbb{R}^{K\times K}. Thus, K2×K×K=K32\frac{K}{2}\times K\times K=\frac{K^{3}}{2} multiplications are needed to compute AiTBK2×KA^{T}_{i}B\in\mathbb{R}^{\frac{K}{2}\times K}.

VI-C Storage Load Per Worker (μ)(\mu)

We can compute the storage load per worker of MDS, repetition, product, polynomial and MatDot codes according to (13), (14) and (15). In the example of Table I, assuming K=LK=L and pmds=prep=ppoly=ppro=pmatdot=2p_{\text{mds}}=p_{\text{rep}}=p_{\text{poly}}=p_{\text{pro}}=p_{\text{matdot}}=2, storage load per worker of polynomial and product codes becomes K2+K24K^{2}+\frac{K^{2}}{4}, while it is 2K22K^{2} for MDS, repetition and MatDot codes.

VI-D Probability of successful computation (ρ)(\rho)

As we discussed in Section III-C, the successful decoding probability can be defined as ρπ=i=kπN(Ni)(ϕ)i(1ϕ)Ni\rho_{\pi}=\sum_{i=k_{\pi}}^{N}\binom{N}{i}(\phi)^{i}(1-\phi)^{N-i}, where (1ϕ)(1-\phi) is the failure probability of each worker. By plugging in (1ϕ)=13(1-\phi)=\frac{1}{3}, kpro=6k_{\text{pro}}=6, kpoly=4k_{\text{poly}}=4, kmatdot=3k_{\text{matdot}}=3, kmds=2k_{\text{mds}}=2 and krep=N2+1k_{\text{rep}}=\lfloor\frac{N}{2}+1\rfloor we can compute the successful decoding probability for different N{6,7,8,9}N\in\{6,7,8,9\} for each coding method in Table I.
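
The sketch below evaluates this binomial tail for the recovery thresholds used in Table I; product codes are only meaningful for N = 9 and repetition codes for even N in that example, and small differences from the tabulated values may come from rounding.

```python
import math

phi = 2 / 3                                     # worker success probability (failure prob. 1/3)

def rho(k, N):                                  # P(at least k of N workers succeed)
    return sum(math.comb(N, i) * phi**i * (1 - phi)**(N - i) for i in range(k, N + 1))

for N in (6, 7, 8, 9):
    k = {"product": 6, "polynomial": 4, "matdot": 3, "mds": 2, "repetition": N // 2 + 1}
    print(N, {name: round(rho(k_pi, N), 2) for name, k_pi in k.items()})
```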

VII Appendix B: Proof of Corollary 1

VII-A Expected Value of the kthk^{\text{th}} Order Statistic of Exponential Distribution

Definition 1

Assume X1,X2,,XnX_{1},X_{2},\dots,X_{n} are any nn real valued random variables. If X(1)X(2)X(n)X_{(1)}\leq X_{(2)}\leq\dots\leq X_{(n)} represent the ordered values of X1,X2,,XnX_{1},X_{2},\dots,X_{n}, then, X(1),X(2),,X(n)X_{(1)},X_{(2)},\dots,X_{(n)} will be called the order statistics of X1,X2,,XnX_{1},X_{2},\dots,X_{n} [32].

Rényi’s Representation: The order statistics of an exponential distribution can be written as linear combinations of independent exponential random variables with a special sequence of coefficients

X(k)=𝐿i=1kXini+1,X_{(k)}\overset{L}{=}\sum_{i=1}^{k}\frac{X_{i}}{n-i+1}, (17)

for k=1,,nk=1,\dots,n, where =𝐿\overset{L}{=} denotes equality in distribution and X1,,XnX_{1},\dots,X_{n} are independent exponential random variables with mean 1λ\frac{1}{\lambda} [32].

The expected value of the kthk^{\text{th}} order statistic of the exponential distribution with mean 1λ\frac{1}{\lambda} is expressed as

E[X(k)]=1λ(HnH(nk)),\displaystyle E[X_{(k)}]=\frac{1}{\lambda}(H_{n}-H_{(n-k)}), (18)

where Hni=1n1iH_{n}\triangleq\sum_{i=1}^{n}\frac{1}{i}.

VII-B Expected Value of the kthk^{\text{th}} Order Statistic of Shifted-Scaled Exponential Distribution

The function f(X)=a+bXf(X)=a+bX with b>0b>0 is monotone since its first derivative does not change sign. When a monotone function is applied to an ordered set, it preserves the given order [33]. Now, let us define Yi=a+bXiY_{i}=a+bX_{i} for i=1,,ni=1,\dots,n, where the XiX_{i} are independent exponential random variables with mean 1λ\frac{1}{\lambda}. By monotonicity, we have Y(k)=a+bX(k)Y_{(k)}=a+bX_{(k)}, and by considering Rényi’s representation for the kthk^{\text{th}} order statistic of exponential random variables, we have

Y(k)=𝐿a+i=1kbXini+1.Y_{(k)}\overset{L}{=}a+\sum_{i=1}^{k}\frac{bX_{i}}{n-i+1}. (19)

The expected value of the kthk^{\text{th}} order statistic of the shifted-scaled exponential distribution with mean a+bλa+\frac{b}{\lambda} is expressed as

E[Y(k)]=a+bλ(HnH(nk)).\displaystyle E[Y_{(k)}]=a+\frac{b}{\lambda}(H_{n}-H_{(n-k)}). (20)
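
As a quick sanity check of (20), the following Monte Carlo sketch compares the empirical kth order statistic of shifted-scaled exponential samples with the closed form; the values of n, k, λ, a, and b are arbitrary.

```python
import numpy as np

n, k, lam = 10, 7, 2.0
a = b = 0.25                              # shift and scale, e.g. a = b = 1/alpha with alpha = 4
rng = np.random.default_rng(5)

# Monte Carlo estimate of E[Y_(k)] for Y_i = a + b X_i, X_i ~ Exp(lambda).
samples = a + b * rng.exponential(scale=1.0 / lam, size=(200_000, n))
empirical = np.sort(samples, axis=1)[:, k - 1].mean()

# Closed form (20): a + (b / lambda) (H_n - H_{n-k}).
H = lambda m: sum(1.0 / i for i in range(1, m + 1))
closed_form = a + (b / lam) * (H(n) - H(n - k))
print(empirical, closed_form)             # the two values should agree closely
```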

VII-C Proof of Corollary 1

In this paper, we assume a shifted exponential distribution for the computation time of the original task and a shifted-scaled exponential distribution for the computation time of each sub-task. Thus, if we let XX be an exponential random variable with mean 1λ\frac{1}{\lambda}, the computation time of the original task can be expressed as t=X+1t=X+1, and if we let XiX_{i}, i=1,,Ni=1,\dots,N, be NN independent exponential random variables with mean 1λ\frac{1}{\lambda}, the computation time of each sub-task can be expressed as ti=Xi+1α,α{prep,pmds,ppoly2,pmatdot,ppro2}t_{i}=\frac{X_{i}+1}{\alpha},\alpha\in\{p_{\text{rep}},p_{\text{mds}},p^{2}_{\text{poly}},p_{\text{matdot}},p^{2}_{\text{pro}}\}. Hence, if we replace both aa and bb with 1α\frac{1}{\alpha} in (20), the expected computation time of each worker for polynomial and MatDot codes can be written as 1α+1αλ(HNH(Nki))\frac{1}{\alpha}+\frac{1}{\alpha\lambda}(H_{N}-H_{(N-k_{i})}), where ki{kpoly,kmatdot}k_{i}\in\{k_{\text{poly}},k_{\text{matdot}}\} and α{ppoly2,pmatdot}\alpha\in\{p^{2}_{\text{poly}},p_{\text{matdot}}\}. Moreover, the NthN^{\text{th}} harmonic number HNH_{N} can be approximated by the natural logarithm, i.e., HNlog(N)H_{N}\approx\log(N) and H(Nki)log(Nki)H_{(N-k_{i})}\approx\log(N-k_{i}). Therefore, the expected computation time of each worker for the aforementioned coding methods becomes

Ti1α(1+1λlog(NNki)),T_{i}\approx\frac{1}{\alpha}(1+\frac{1}{\lambda}log(\frac{N}{N-k_{i}})), (21)

where i{poly,matdot}i\in\{\text{poly},\text{matdot}\}, ki{kpoly,kmatdot}k_{i}\in\{k_{\text{poly}},k_{\text{matdot}}\} and α{ppoly2,pmatdot}\alpha\in\{p^{2}_{\text{poly}},p_{\text{matdot}}\}. \Box

VIII Appendix C: Proof of Corollary 2

VIII-A Asymptotic Computation Time of Product Codes

Although we can directly use order statistics to compute the computation time of repetition, MDS, polynomial and MatDot codes, this is challenging for product codes, because the decodability condition of product codes depends on which specific tasks are completed. However, with the help of the edge-removal process in a bipartite graph [30] and order statistics, one can find an asymptotic computation time for product codes in two different regimes. Assuming an exponential distribution as the delay model with mean 1λ\frac{1}{\lambda} for each worker, the average computing time TproT^{{}^{\prime}}_{\text{pro}} for product codes in the first regime is as follows [7, 30, 31]

Tpro1λlog(ppro+τ2cτ/2+1),\displaystyle T^{{}^{\prime}}_{\text{pro}}\approx\frac{1}{\lambda}\log\left(\frac{p_{\text{pro}}+\frac{\tau}{2}}{c_{\tau/2+1}}\right), (22)

where cτ/2+1(1+τ/2)+(1+τ/2)log(1+τ/2)c_{\tau/2+1}\approx(1+\tau/2)+\sqrt{(1+\tau/2)\log(1+\tau/2)}.

Also, the average computation time of (1+δppro,ppro)2(\sqrt{1+\delta}p_{\text{pro}},p_{\text{pro}})^{2} product codes with (1+δ)ppro2(1+\delta)p^{2}_{\text{pro}} workers (assuming exponential distribution with mean 1λ\frac{1}{\lambda} for the computation time of each worker), for a fixed constant δ\delta, as pprop_{\text{pro}} grows to infinity, is lower bounded as follows [7]

Tpro1λlog(1+δδ).\displaystyle T^{\prime}_{\text{pro}}\geq\frac{1}{\lambda}\log\left(\frac{1+\delta}{\delta}\right). (23)

On the other hand, since we consider the worst-case scenario to compute the recovery threshold of product codes [6], the upper bound on the computation time of product codes in the second regime is the (kpro)th(k_{\text{pro}})^{\text{th}} order statistic of (1+δ)ppro2(1+\delta)p^{2}_{\text{pro}} computation times, where kpro=N(Nppro+1)2+1k_{\text{pro}}=N-(\sqrt{N}-p_{\text{pro}}+1)^{2}+1.

Tpro1λlog(NNkpro)\displaystyle T^{\prime}_{\text{pro}}\leq\frac{1}{\lambda}\log\left(\frac{N}{N-k_{\text{pro}}}\right) (24)
=1λlog(N(Nppro+1)21)\displaystyle=\frac{1}{\lambda}\log\left(\frac{N}{(\sqrt{N}-p_{\text{pro}}+1)^{2}-1}\right) (25)
=1λ(log(NNppro)+log(NNppro+2))\displaystyle=\frac{1}{\lambda}\left(\log\left(\frac{\sqrt{N}}{\sqrt{N}-p_{\text{pro}}}\right)+\log\left(\frac{\sqrt{N}}{\sqrt{N}-p_{\text{pro}}+2}\right)\right) (26)
1λ(log(NNppro)+log(NNppro))\displaystyle\leq\frac{1}{\lambda}\left(\log\left(\frac{\sqrt{N}}{\sqrt{N}-p_{\text{pro}}}\right)+\log\left(\frac{\sqrt{N}}{\sqrt{N}-p_{\text{pro}}}\right)\right) (27)
=2λlog(NNppro)\displaystyle=\frac{2}{\lambda}\log\left(\frac{\sqrt{N}}{\sqrt{N}-p_{\text{pro}}}\right) (28)
=2λlog((1+δ)ppro(1+δ)pproppro)\displaystyle=\frac{2}{\lambda}\log\left(\frac{(\sqrt{1+\delta})p_{\text{pro}}}{(\sqrt{1+\delta})p_{\text{pro}}-p_{\text{pro}}}\right) (29)
=2λlog(1+δ+1+δδ).\displaystyle=\frac{2}{\lambda}\log\left(\frac{1+\delta+\sqrt{1+\delta}}{\delta}\right). (30)

Therefore, based on the above derivation, one can define Tproup:=2λlog(1+δ+1+δδ)T^{\prime\text{up}}_{\text{pro}}:=\frac{2}{\lambda}\log\left(\frac{1+\delta+\sqrt{1+\delta}}{\delta}\right) and Tprolow:=1λlog(1+δδ)T^{\prime\text{low}}_{\text{pro}}:=\frac{1}{\lambda}\log\left(\frac{1+\delta}{\delta}\right) as the upper and lower bounds on the computation time of product codes in the second regime, assuming an exponential distribution as the delay model of each worker.
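
A short numeric check of the chain (24)-(30) under assumed values of λ, δ, and p_pro:

```python
import math

lam, delta, p = 2.0, 0.5, 40             # assumed straggling parameter, delta, and p_pro
N = (1 + delta) * p**2                   # second regime: N = (1 + delta) p_pro^2 workers
k = N - (math.sqrt(N) - p + 1)**2 + 1    # k_pro

exact = (1 / lam) * math.log(N / (N - k))                                  # right-hand side of (24)
upper = (2 / lam) * math.log((1 + delta + math.sqrt(1 + delta)) / delta)   # (30)
print(exact <= upper, exact, upper)      # (24) is indeed bounded by (30)
```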

VIII-B Proof of Corollary 2

The previous subsection gives the asymptotic computation time of product codes assuming an exponential distribution as the delay model for each worker. On the other hand, we assume a shifted-scaled exponential distribution for the computation time of each worker in this paper. Therefore, according to equation (20), we obtain (5), (6) and (7) from (22), (23) and (24), respectively, by replacing both the shift and scale parameters, aa and bb, with 1ppro2\frac{1}{p^{2}_{\text{pro}}} in (20). \Box