Adaptive Coding for Matrix Multiplication at Edge Networks
Abstract
Edge computing is emerging as a new paradigm that allows processing data at the edge of the network, where data is typically generated and collected, by exploiting multiple devices at the edge collectively. However, exploiting the potential of edge computing is challenging mainly due to the heterogeneous and time-varying nature of edge devices. Coded computation, which advocates mixing data in sub-tasks by employing erasure codes and offloading these sub-tasks to other devices for computation, has recently been gaining interest, thanks to its higher reliability, smaller delay, and lower communication cost. In this paper, our focus is on characterizing the cost-benefit trade-offs of coded computation for practical edge computing systems, and on developing an adaptive coded computation framework. In particular, we focus on matrix multiplication as a computationally intensive task, and develop an adaptive coding for matrix multiplication (ACM2) algorithm by taking into account the heterogeneous and time-varying nature of edge devices. ACM2 dynamically selects the best coding policy by taking into account the computing time, storage requirements, and successful decoding probability. We show that ACM2 improves the task completion delay significantly as compared to existing coded matrix multiplication algorithms.
I Introduction
Massive amounts of data are generated at edge networks with the emerging Internet of Things (IoT), including self-driving cars, drones, health monitoring devices, etc. Transmitting such massive data to a centralized cloud and expecting timely processing is not realistic given the limited bandwidth between an edge network and the centralized cloud. We consider a distributed computing system, where computationally intensive tasks are distributively processed at the end devices with possible help from edge servers (fog) and cloud. However, exploiting the potential of edge computing is challenging mainly due to the heterogeneous and time-varying nature of edge devices.
Coded computation is an emerging field, which studies the design of erasure and error-correcting codes to improve the performance of distributed computing through “smart” data redundancy. This breakthrough idea has spawned a significant effort, mainly in the information and coding theory communities [1, 2]. In distributed computation, a master device divides computationally intensive tasks into multiple smaller sub-tasks, and offloads each of them to other devices (end devices, edge servers, and cloud), called workers, for computation. Coded computation (e.g., by employing erasure codes such as Reed-Solomon codes [3, 4]), on the other hand, encodes the data in the sub-tasks, and offloads these coded sub-tasks for computation. The next example demonstrates the potential of coded computation for matrix multiplication.
Example 1
Consider a setup where a master wishes to offload a matrix multiplication task to three workers. Assume $A$ and $B$ are large matrices and matrix $A$ is divided into two sub-matrices $A_1$ and $A_2$, which are then encoded using a Maximum Distance Separable (MDS) code, which is further explained in Section II, to give $A_1$, $A_2$, and $A_1 + A_2$; the master sends each coded sub-matrix, together with $B$, to a different worker. When the master receives the computed values (i.e., $A_1^\top B$, $A_2^\top B$, or $(A_1 + A_2)^\top B$) from at least two out of the three workers, it can decode its desired task, which is the computation of $A^\top B$. The power of coded computation is that $(A_1 + A_2)^\top B$ acts as an extra task that can replace either of the other two tasks if they end up straggling or failing.
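To make the example concrete, the following minimal sketch simulates it with NumPy; the matrix sizes, the choice of which worker straggles, and all variable names are illustrative assumptions rather than part of the system model.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))   # illustrative sizes, not from the paper
B = rng.standard_normal((4, 4))

# Split A into two column blocks and add one MDS-coded (parity) block.
A1, A2 = A[:, :2], A[:, 2:]
coded_tasks = {0: A1, 1: A2, 2: A1 + A2}   # worker i receives coded_tasks[i] and B

# Each worker computes its coded block (transposed) times B.
results = {i: blk.T @ B for i, blk in coded_tasks.items()}

# Suppose worker 1 straggles: decode A^T B from workers 0 and 2 only.
received = {0: results[0], 2: results[2]}
top = received[0]                      # A1^T B
bottom = received[2] - received[0]     # (A1 + A2)^T B - A1^T B = A2^T B
C = np.vstack([top, bottom])

assert np.allclose(C, A.T @ B)
```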
Significant effort is being put on constructing codes for fast and distributed matrix-vector multiplication [1, 5], matrix-matrix multiplication [6, 7, 8, 9], dot product and convolution of two vectors [10, 11], gradient descent [12, 13, 14], distributed optimization [15, 16], Fourier transform [17], and linear transformations [18]. The trade-off between latency of computation and load of communication for data shuffling in MapReduce framework is characterized in [2], and optimum resource allocation algorithm is developed in [19]. This coding idea is extended for cellular networks [20], multistage computation [21], and heterogeneous systems [22, 23].
Our focus in this work is on matrix multiplication, where a master device divides its matrix multiplication computations into smaller tasks and assigns them to workers (possibly including itself) that can process these tasks in parallel. Product [7], polynomial [6], and MatDot (and its extension PolyDot) codes [8] have recently been developed for matrix multiplication. Their main focus is to minimize the recovery threshold, which is the minimum number of workers that the master needs to wait for in order to compute the matrix multiplication (two in Example 1). Although this metric is good for large-scale computing systems in data centers, it falls short in edge computing, where other resources including computing power, storage, energy, and networking resources are limited.
In this paper, we analyze the computing time, storage cost, and successful decoding probability of some existing codes for matrix multiplication. Then, we develop an adaptive coding for matrix multiplication (ACM2) algorithm that selects the best coding strategy for each sub-matrix.
We note that rateless codes considered in [24, 25] also provide adaptive coded computation mechanisms against heterogeneous and time-varying resources. However, the coding overhead in rateless codes can be high in some scenarios, which makes adaptive selection of fixed-rate codes a better alternative. Multi-message communication by employing Lagrange coded computation is considered in [26] to reduce under-utilization due to discarding partial computations carried out by stragglers as well as over-computation due to inaccurate prediction of the straggling behavior. A hierarchical coded matrix multiplication is developed in [27] to utilize both slow and fast workers. As compared to [26, 27], we propose an adaptive code selection policy for heterogeneous and time-varying resources. A code design mechanism under a heterogeneous setup is developed in [22, 23], where the matrix is divided, coded, and offloaded to workers by taking into account the heterogeneity of resources. However, available resources at helpers may vary over time, which is not taken into account in [22, 23]. Thus, it is crucial to design a coded computation mechanism that is dynamic and adaptive to heterogeneous and time-varying resources, which is the goal of this paper.
We show that ACM2 improves the task completion delay significantly as compared to existing coded matrix multiplication algorithms. The following are the key contributions:
- • We design ACM2 for an iterative procedure (e.g., gradient descent) that selects the best code as well as the optimum number of partitions for that code at each iteration to minimize the average computing time subject to storage and successful decoding probability constraints. (We note that ACM2 is generic enough to work with any matrix multiplication codes, although we consider a subset of existing codes in this paper, such as product, polynomial, and MatDot codes.)
- • We evaluate the performance of ACM2 through simulations and show that ACM2 significantly improves the average computing time as compared to existing codes.
II Model, Background, and Motivation
II-A Model
II-A1 Setup
We consider a master/worker setup at the edge of the network, where the master device offloads its computationally intensive tasks (matrix multiplication computations) to $N$ workers $W_1, \ldots, W_N$ via device-to-device (D2D) links such as Wi-Fi Direct and/or Bluetooth. The master device divides a task (matrix) into smaller sub-matrices and offloads them to the workers for parallel processing.
II-A2 Task Model
The master wants to compute functions of its collected data, which are determined by the applications. We will focus on computing linear functions; specifically, the matrix multiplication $C = A^\top B$, where $A$ and $B$ are $m \times m$ matrices. Matrix multiplication forms an essential building block of many signal processing (convolution, compression, etc.) and machine learning algorithms (gradient descent, classification, etc.) [1]. We consider an iterative procedure (gradient descent) where a matrix multiplication is calculated at each iteration.
II-A3 Worker Model
The workers have the following properties: (i) failures: workers may fail or “sleep/die” or leave the network before finishing their assigned computational tasks; (ii) stragglers: workers will incur probabilistic delays in responding to the master.
II-A4 Coding Model
We design and employ an adaptive coding for matrix multiplication (ACM2) algorithm that selects the best coding strategy among repetition, MDS [28, 1], polynomial [6], MatDot [8], and product codes [7] by taking into account the computing time, storage cost, and successful decoding probability of these codes. The master device divides matrix $A$ into $p$ partitions (sub-matrices) for repetition and MDS codes, while both $A$ and $B$ are divided into $p$ partitions each for polynomial, MatDot, and product codes.
II-A5 Delay Model
Each sub-matrix transmitted from the master to a worker experiences the following delays: (i) transmission delay for sending the sub-matrix from the master to the worker, (ii) computation delay for computing the multiplication of the sub-matrices, and (iii) transmission delay for sending the computed matrix from the worker back to the master. We model the composite delay of the original (unpartitioned) task using the shifted exponential distribution, $\Pr[T \le t] = 1 - e^{-\mu(t - t_0)}$ for $t \ge t_0$ [29, 1], with $\mu$ referred to as the straggling parameter and $t_0$ as the shift, and the delay of each sub-task with a shifted-scaled exponential distribution, $\Pr[T_i \le t] = 1 - e^{-c\mu(t - t_0/c)}$ for $t \ge t_0/c$, where the scale parameter $c$ captures the fraction of the original task assigned to each worker and depends on the selected code and the number of partitions.
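The following short sketch shows how per-worker delays can be sampled under this model; the parameter values ($t_0$, $\mu$, $c$) and the function name are assumptions chosen only for illustration.

```python
import numpy as np

def sample_worker_delays(num_workers, mu, t0, c, rng):
    """Sample i.i.d. shifted-scaled exponential delays.

    Each sub-task delay is t0/c plus an exponential with rate c*mu,
    i.e., each sub-task is a 1/c fraction of the original task.
    """
    return t0 / c + rng.exponential(scale=1.0 / (c * mu), size=num_workers)

rng = np.random.default_rng(1)
delays = sample_worker_delays(num_workers=8, mu=2.0, t0=1.0, c=4, rng=rng)
print(np.sort(delays))  # the master waits for the first K of these
```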
II-B Background on Existing Codes for Matrix Multiplication
In this section, we provide a short background on existing coded computation mechanisms for matrix multiplication.
II-B1 Repetition Codes
The master device divides matrix $A$ column-wise into $p$ parts, where $p$ divides $N$, and generates $N/p$ copies of each sub-matrix. Sub-matrix $A_i$, $i = 1, \ldots, p$, is transmitted to $N/p$ workers together with matrix $B$. Workers calculate $A_i^\top B$ and return their calculations back to the master, which finishes the matrix multiplication calculation when it receives $A_i^\top B$ for every $i = 1, \ldots, p$.
II-B2 MDS Codes [28, 1]
The master device divides the matrix $A$ column-wise into $p$ partitions. An $(N, p)$ MDS code encodes the $p$ sub-matrices into $N$ coded sub-matrices $\tilde{A}_1, \ldots, \tilde{A}_N$ using existing MDS codes such as Reed-Solomon codes. $\tilde{A}_n$, as well as $B$, is transmitted to worker $n$. When the master device receives any $p$ calculations $\tilde{A}_n^\top B$, it can decode and calculate $A^\top B$.
II-B3 Polynomial Codes [6]
The master device divides $A$ and $B$ column-wise into $p$ partitions each, where $p^2 \le N$. The master constructs the polynomials $\tilde{A}_n = \sum_{i=1}^{p} A_i x_n^{i-1}$ and $\tilde{B}_n = \sum_{j=1}^{p} B_j x_n^{p(j-1)}$ for distinct evaluation points $x_n$, and sends them to worker $n$, which calculates the multiplication $\tilde{A}_n^\top \tilde{B}_n$. When the master receives $p^2$ such multiplications, decoding is completed by polynomial interpolation.
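A compact numerical sketch of polynomial-code encoding and decoding (via entry-wise polynomial interpolation) is given below; the matrix sizes, the evaluation points, and the assumption that exactly the first $p^2$ workers respond are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
p, N = 2, 5                          # assumed partitions and number of workers
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))
A_blocks = np.hsplit(A, p)           # column-wise partitions A_1, ..., A_p
B_blocks = np.hsplit(B, p)

# Encoding: worker n evaluates the two polynomials at point x_n.
x = np.arange(1.0, N + 1)            # distinct evaluation points (assumption)
A_tilde = [sum(A_blocks[i] * x[n] ** i for i in range(p)) for n in range(N)]
B_tilde = [sum(B_blocks[j] * x[n] ** (p * j) for j in range(p)) for n in range(N)]

# Workers compute coded products; suppose only the first p**2 = 4 respond.
done = list(range(p * p))
Y = np.stack([A_tilde[n].T @ B_tilde[n] for n in done])      # shape (p*p, 2, 2)

# Decoding: interpolate a degree p*p - 1 polynomial entry-wise (Vandermonde solve).
V = np.vander(x[done], p * p, increasing=True)
coeffs = np.linalg.solve(V, Y.reshape(p * p, -1)).reshape(p * p, 2, 2)

# The coefficient of x^(i + p*j) is the block A_i^T B_j; assemble them.
C = np.block([[coeffs[i + p * j] for j in range(p)] for i in range(p)])
assert np.allclose(C, A.T @ B)
```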
II-B4 MatDot Codes [8]
Both matrices $A$ and $B$ are divided row-wise into $p$ partitions, where $2p - 1 \le N$. The master constructs the polynomials $\tilde{A}_n = \sum_{i=1}^{p} A_i x_n^{i-1}$ and $\tilde{B}_n = \sum_{j=1}^{p} B_j x_n^{p-j}$, and sends them to worker $n$ for processing, where worker $n$ calculates the multiplication $\tilde{A}_n^\top \tilde{B}_n$. When the master receives results from any $2p - 1$ workers, it can decode and calculate $A^\top B$ (the desired product is the coefficient of $x^{p-1}$).
II-B5 Product Codes [7]
Product codes extend MDS codes in a way that both $A$ and $B$ are partitioned column-wise and coded. In particular, both $A$ and $B$ are divided into $p$ partitions, and the pairwise products of these partitions are put into a $p \times p$ array. Then, every row of the array is encoded with a $(\sqrt{N}, p)$ MDS code, which results in a $p \times \sqrt{N}$ array. Every column of this array is also coded with a $(\sqrt{N}, p)$ MDS code, which results in a $\sqrt{N}$-by-$\sqrt{N}$ array. Each coded sub-matrix product in this array is sent to a worker (out of $N$ workers) for calculation. Product codes are decodable if at least one entry of every possible sub-array of size larger than or equal to $(\sqrt{N} - p + 1) \times (\sqrt{N} - p + 1)$ is received successfully.
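One way to check decodability of a given set of returned results is the iterative (peeling) procedure sketched below, where a row or column of the worker grid is completed once at least $p$ of its entries are known; the grid size and the loss patterns are illustrative assumptions.

```python
import numpy as np

def product_code_decodable(received_mask, p):
    """Iterative peeling over the sqrt(N) x sqrt(N) worker grid: a row or
    column (each an MDS codeword) is completed once >= p entries are known."""
    known = received_mask.copy()
    changed = True
    while changed:
        changed = False
        for axis in (0, 1):
            counts = known.sum(axis=axis)
            for idx in np.where(counts >= p)[0]:
                line = known[:, idx] if axis == 0 else known[idx, :]
                if not line.all():
                    line[:] = True
                    changed = True
    return bool(known.all())

# 3 x 3 grid of workers (N = 9, p = 2): losing three results in an "L" shape
# is still decodable, but losing a full 2 x 2 sub-array is not.
ok = np.ones((3, 3), dtype=bool); ok[0, 0] = ok[0, 1] = ok[1, 0] = False
bad = np.ones((3, 3), dtype=bool); bad[:2, :2] = False
print(product_code_decodable(ok, p=2), product_code_decodable(bad, p=2))
```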
TABLE I: Comparison of codes for the canonical setup of Section II-C ($p = 2$, $m \times m$ matrices) with $N = 6, 7, 8, 9$ workers.

| Code | Recovery threshold $K$ | Computing load per worker | Storage load per worker | $P^{\text{success}}$, $N{=}6$ | $N{=}7$ | $N{=}8$ | $N{=}9$ |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Product | 6 | $m^3/4$ | $1.25\,m^2$ | N/A | N/A | N/A | 0.63 |
| Polynomial | 4 | $m^3/4$ | $1.25\,m^2$ | 0.67 | 0.82 | 0.91 | 0.96 |
| MatDot | 3 | $m^3/2$ | $2\,m^2$ | 0.89 | 0.95 | 0.98 | 0.99 |
| MDS | 2 | $m^3/2$ | $2\,m^2$ | 0.98 | 0.99 | 0.99 | 0.99 |
| Repetition | $N/2 + 1$ | $m^3/2$ | $2\,m^2$ | 0.67 | N/A | 0.73 | N/A |
II-C Motivation for Adaptive Coding
Assume a canonical setup, where $A$ and $B$ are $m \times m$ matrices and each is divided into two sub-matrices $A_1$, $A_2$ and $B_1$, $B_2$ (i.e., $p = 2$). The product codes divide the matrices column-wise, i.e., $A_i$ and $B_i$ are $m \times m/2$ matrices, for $i = 1, 2$, and use two levels of MDS codes. Considering that $\tilde{A}_3 = A_1 + A_2$ and $\tilde{B}_3 = B_1 + B_2$, nine coded tasks are constructed by $\tilde{A}_i^\top \tilde{B}_j$, for $i, j \in \{1, 2, 3\}$ (with $\tilde{A}_i = A_i$ and $\tilde{B}_j = B_j$ for $i, j \in \{1, 2\}$). In polynomial codes, the master device, by dividing the matrices column-wise, creates the polynomials $\tilde{A}_n = A_1 + A_2 x_n$ and $\tilde{B}_n = B_1 + B_2 x_n^2$ for worker $n$, which multiplies $\tilde{A}_n^\top \tilde{B}_n$. MatDot follows a similar idea with the following difference: $A$ and $B$ are divided row-wise, and worker $n$ multiplies $\tilde{A}_n = A_1 + A_2 x_n$ with $\tilde{B}_n = B_1 x_n + B_2$.
Table I shows the recovery threshold ($K$) [7, 6, 8], the computing load per worker (i.e., the number of required multiplications; this is the simplified analysis presented in Section III-A and further detailed in Appendix A), the storage load per worker (i.e., the average amount of memory needed to store the matrices and their multiplication, as detailed in Section III-B), and the probability of successful computation ($P^{\text{success}}$), which is calculated assuming that workers fail independently with a fixed probability. As seen, although MDS is the best in terms of recovery threshold, it introduces more computing load per worker (because only one of the matrices is partitioned). Also, MatDot, MDS, and repetition codes perform worse than polynomial and product codes in terms of storage load per worker. Product codes require at least nine workers in this setup due to their very design, and for repetition codes the number of workers should be an even number, but MDS, polynomial, and MatDot codes are more flexible. As seen, there is a trade-off among the recovery threshold, computing load, storage load, and success probability, which we aim to explore in this paper by taking into account the limited edge computing resources including computing power and storage. For example, if there is no constraint on the total number of workers, but only on computing load, we will likely select product or polynomial codes. Next, we provide a computing time and storage analysis of existing codes, and develop ACM2, which selects the best code depending on edge constraints.
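For instance, the success probabilities in Table I follow from a simple binomial calculation over the recovery threshold. The sketch below illustrates this computation; the worker failure probability used here is an assumed value and not necessarily the one used to generate the table.

```python
from math import comb

def success_probability(K, N, fail_prob):
    """P(at least K of N independent workers return their result)."""
    ok = 1.0 - fail_prob
    return sum(comb(N, i) * ok**i * fail_prob**(N - i) for i in range(K, N + 1))

# Recovery thresholds from Table I (p = 2); fail_prob = 0.3 is an assumption.
for name, K in [("MDS", 2), ("MatDot", 3), ("Polynomial", 4), ("Product", 6)]:
    print(name, round(success_probability(K, N=9, fail_prob=0.3), 2))
```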
III Adaptive Coding for Matrix Multiplication
III-A Computing Time Analysis
Assuming a shifted-scaled exponential distribution as the computation delay model, with $\mu$ as the straggling parameter and the number of partitions $p$ as the scale parameter for each worker, the average computing times for repetition codes and MDS codes are expressed [1] as

$\mathbb{E}[T_{\text{rep}}] = \frac{t_0}{p} + \frac{H_p}{N\mu}, \quad (1)$

$\mathbb{E}[T_{\text{MDS}}] = \frac{1}{p}\Big(t_0 + \frac{H_N - H_{N-p}}{\mu}\Big), \quad (2)$

where $H_n = \sum_{i=1}^{n} 1/i$ is the $n$-th harmonic number.
Corollary 1
The average computing times for polynomial codes and MatDot codes are expressed as follows, assuming that a shifted-scaled exponential distribution is used as the delay model:

$\mathbb{E}[T_{\text{poly}}] = \frac{1}{p^2}\Big(t_0 + \frac{H_N - H_{N-p^2}}{\mu}\Big) \approx \frac{1}{p^2}\Big(t_0 + \frac{1}{\mu}\log\frac{N}{N-p^2}\Big), \quad (3)$

$\mathbb{E}[T_{\text{MatDot}}] = \frac{1}{p}\Big(t_0 + \frac{H_N - H_{N-2p+1}}{\mu}\Big) \approx \frac{1}{p}\Big(t_0 + \frac{1}{\mu}\log\frac{N}{N-2p+1}\Big). \quad (4)$
Proof: The proof is provided in Appendix B.
The product codes have different performance in two different regimes. In the first regime, the number of workers scales sublinearly with the number of partitioned sub-tasks, while in the second regime, the number of workers scales linearly with the number of sub-tasks. The computing time analysis of product codes in these two regimes is provided next.
Corollary 2
Assume a shifted-scaled exponential distribution as a delay model with $\mu$ as the straggling parameter and the number of partitions as the scale parameter for each worker. The average computing time for product codes with $N$ workers, where $\sqrt{N}$ is an even integer, as $N$ grows to infinity, is expressed in the first regime as
(5)

where the constant in (5) is characterized in [30], [31]. Assuming the same delay distribution, the lower and upper bounds on the average computing time for product codes with $N$ workers, for a fixed ratio of workers to sub-tasks, as $N$ grows to infinity, are expressed in the second regime as

(6)

(7)
Proof: The proof is provided in Appendix C.
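As a rough illustration, the sketch below evaluates closed-form expressions of the form (1)–(4); the harmonic-number forms and the scale factors used here follow the per-worker loads discussed in this section and are assumptions of the sketch, as are all numerical parameter values.

```python
from math import log

def harmonic(n):
    return sum(1.0 / i for i in range(1, n + 1))

def expected_time(code, p, N, mu, t0=1.0):
    """Average computing time via order statistics of shifted-scaled
    exponentials (exact harmonic-number form of (1)-(4))."""
    if code == "repetition":          # p parts, N/p replicas each
        return t0 / p + harmonic(p) / (N * mu)
    if code == "mds":                 # wait for p of N results, scale p
        return (t0 + (harmonic(N) - harmonic(N - p)) / mu) / p
    if code == "polynomial":          # wait for p**2 of N results, scale p**2
        k, c = p * p, p * p
    elif code == "matdot":            # wait for 2p-1 of N results, scale p
        k, c = 2 * p - 1, p
    else:
        raise ValueError(code)
    return (t0 + (harmonic(N) - harmonic(N - k)) / mu) / c

for code in ["repetition", "mds", "polynomial", "matdot"]:
    print(code, round(expected_time(code, p=2, N=8, mu=2.0), 3))
```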
III-B Storage Analysis
In this section, we provide storage requirements of the codes that we explained in Section II-B. These codes have storage requirements both at the master and worker devices. In particular, we assume that each entry of matrices and requires a fixed amount of memory. Our analysis, which is provided next, quantifies how many of these entries are needed to be stored at the master and worker devices.
III-B1 Storage at Master
In all codes, we first store matrices $A$ and $B$, where each matrix contains $m^2$ components. Also, we store the final result, $A^\top B$, which contains $m^2$ components. Therefore, $3m^2$ entries should be stored at the master device for each code.
In repetition codes, we also store the first $K_{\text{rep}} = N - N/p + 1$ results obtained from workers. Each result contains $m^2/p$ components. Thus,

$S^{\text{rep}}_{\text{master}} = 3m^2 + \Big(N - \frac{N}{p} + 1\Big)\frac{m^2}{p}. \quad (8)$
In MDS codes, we store the $N$ coded sub-matrices of $A$. Each coded matrix contains $m^2/p$ components. Then, we need to store the first $p$ results obtained from workers. Each result contains $m^2/p$ components. Thus,

$S^{\text{MDS}}_{\text{master}} = 3m^2 + (N + p)\frac{m^2}{p}. \quad (9)$
In polynomial codes, we store $2N$ coded sub-matrices (one coded partition of $A$ and one of $B$ per worker). Each coded matrix contains $m^2/p$ components. Then, we need to store the first $p^2$ results obtained from workers. Each such result contains $m^2/p^2$ components. Thus,

$S^{\text{poly}}_{\text{master}} = 3m^2 + 2N\frac{m^2}{p} + m^2. \quad (10)$
In MatDot, we store $2N$ coded sub-matrices. Each coded matrix contains $m^2/p$ components. Then, we need to store the first $2p - 1$ results collected from workers. Each result contains $m^2$ components. Thus,

$S^{\text{MatDot}}_{\text{master}} = 3m^2 + 2N\frac{m^2}{p} + (2p - 1)\,m^2. \quad (11)$
In product codes, we store $2\sqrt{N}$ coded sub-matrices. Each coded matrix contains $m^2/p$ components. Then, we need to store the first $K_{\text{product}} = N - (\sqrt{N} - p + 1)^2 + 1$ results collected from workers [6]. Each result contains $m^2/p^2$ components. Thus,

$S^{\text{product}}_{\text{master}} = 3m^2 + 2\sqrt{N}\frac{m^2}{p} + \Big(N - (\sqrt{N} - p + 1)^2 + 1\Big)\frac{m^2}{p^2}. \quad (12)$
III-B2 Storage at Workers
In repetition codes, each worker receives one matrix of size $m \times m/p$ (a partition of $A$) and one matrix of size $m \times m$ (the full matrix $B$), as we decompose only one of the matrices into $p$ parts. So, the size of the resulting matrix is $(m/p) \times m$. Similarly, in MDS codes we decompose only one of the matrices into $p$ parts. Each worker receives one matrix with size of $m \times m/p$ and one matrix with size of $m \times m$. Thus, the size of the resulting matrix is $(m/p) \times m$. Therefore, the storage requirement of repetition and MDS codes at each worker is expressed as

$S^{\text{rep}}_{\text{worker}} = S^{\text{MDS}}_{\text{worker}} = \frac{m^2}{p} + m^2 + \frac{m^2}{p} = \Big(1 + \frac{2}{p}\Big)m^2. \quad (13)$
In polynomial and product codes, each worker receives two matrices of size $m \times m/p$, i.e., one coded partition of $A$ and one of $B$. The size of the matrix after multiplication is $(m/p) \times (m/p)$. Therefore, the storage requirement of polynomial and product codes at each worker is

$S^{\text{poly}}_{\text{worker}} = S^{\text{product}}_{\text{worker}} = \frac{2m^2}{p} + \frac{m^2}{p^2}. \quad (14)$
On the other hand, the size of the matrix after computation is $m \times m$ in MatDot, as matrix partitioning is done differently (i.e., row-wise, which means that $\tilde{A}_n \in \mathbb{R}^{(m/p) \times m}$ and $\tilde{B}_n \in \mathbb{R}^{(m/p) \times m}$ after partitioning and, finally, $\tilde{A}_n^\top \tilde{B}_n \in \mathbb{R}^{m \times m}$) as compared to polynomial and product codes (which partition column-wise, which means that we have $\tilde{A}_n \in \mathbb{R}^{m \times (m/p)}$ and $\tilde{B}_n \in \mathbb{R}^{m \times (m/p)}$ after partitioning, and the final result is $\tilde{A}_n^\top \tilde{B}_n \in \mathbb{R}^{(m/p) \times (m/p)}$). Therefore, the storage requirement of MatDot at each worker is expressed as

$S^{\text{MatDot}}_{\text{worker}} = \frac{2m^2}{p} + m^2. \quad (15)$
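A small helper that collects the per-worker storage expressions (13)–(15) is sketched below; the symbol $m$ for the matrix dimension, the function name, and the numerical values are assumptions of this illustration.

```python
def worker_storage(code, p, m):
    """Entries stored at one worker, per (13)-(15); m is the matrix dimension."""
    if code in ("repetition", "mds"):        # A_i, full B, and (m/p) x m result
        return m * m / p + m * m + m * m / p
    if code in ("polynomial", "product"):    # two m x (m/p) inputs, (m/p)^2 result
        return 2 * m * m / p + (m / p) ** 2
    if code == "matdot":                     # two (m/p) x m inputs, m x m result
        return 2 * m * m / p + m * m
    raise ValueError(code)

for code in ["repetition", "mds", "polynomial", "product", "matdot"]:
    print(code, worker_storage(code, p=2, m=1000))
```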
III-C Design of ACM2 Algorithm
In this section, we present our ACM2 algorithm. We consider an iterative process such as gradient descent, where matrix multiplications are required at each iteration. Our goal is to determine the best matrix multiplication code and the optimum number of matrix partitions by taking into account the task completion delay (i.e., computing time), storage requirements, and decoding probability of each code. In particular, ACM2 solves an optimization problem at each iteration, and determines which code is the best as well as the optimum number of partitions for that code. For example, MDS codes may be good at one iteration, while polynomial codes may suit better in later iterations. The optimization problem is formulated as
$\min_{\text{code} \in \mathcal{C},\; p} \; \mathbb{E}[T_{\text{code}}(p)]$

subject to

$S^{\text{code}}_{\text{master}}(p) \le \theta_{\text{master}}, \quad S^{\text{code}}_{\text{worker}}(p) \le \theta_{\text{worker}},$
$P^{\text{success}}_{\text{code}}(p) \ge \rho,$
$K_{\text{code}}(p) \le N,$
$p \ge 2. \quad (16)$
The objective function selects the best code from the set $\mathcal{C} = \{\text{repetition, MDS, polynomial, MatDot, product}\}$ as well as the optimum number of partitions $p$. The first constraint is the storage constraint, which limits the storage usage at the master and worker devices with thresholds $\theta_{\text{master}}$ and $\theta_{\text{worker}}$. The second constraint is the successful decoding constraint, where the successful decoding probability should be larger than the threshold $\rho$. The successful decoding probability is defined as the probability that the master receives all the required results from workers. If we assume that each worker fails independently with probability $\alpha$, the total number of workers is $N$, and the number of sufficient results is $K$, one may formulate the probability of success for each coding method as the binomial probability $P^{\text{success}} = \sum_{i=K}^{N} \binom{N}{i} (1-\alpha)^{i} \alpha^{N-i}$. The third constraint makes sure that the recovery threshold does not exceed the number of workers. The fourth constraint makes the number of partitions larger than or equal to 2; otherwise there is no matrix partitioning, which is a degenerate case.
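A self-contained sketch of this selection step is given below. It enumerates (code, $p$) pairs, discards infeasible ones, and returns the feasible pair with the smallest expected computing time; the recovery-threshold, delay, and storage expressions are the simplified forms discussed above, and all parameter values and function names are assumptions of the sketch.

```python
from math import comb, isqrt

def harmonic(n):
    return sum(1.0 / i for i in range(1, n + 1))

# Recovery threshold K(p, N) and scale c(p) per code (assumed simplified forms).
CODES = {
    "mds":        {"K": lambda p, N: p,                               "c": lambda p: p},
    "repetition": {"K": lambda p, N: N - N // p + 1,                  "c": lambda p: p},
    "polynomial": {"K": lambda p, N: p * p,                           "c": lambda p: p * p},
    "matdot":     {"K": lambda p, N: 2 * p - 1,                       "c": lambda p: p},
    "product":    {"K": lambda p, N: N - (isqrt(N) - p + 1) ** 2 + 1, "c": lambda p: p * p},
}

def expected_time(K, c, N, mu, t0=1.0):
    # Treat K as an order-statistic threshold (an approximation for repetition).
    return (t0 + (harmonic(N) - harmonic(N - K)) / mu) / c

def p_success(K, N, fail_prob):
    ok = 1.0 - fail_prob
    return sum(comb(N, i) * ok**i * fail_prob**(N - i) for i in range(K, N + 1))

def worker_storage(code, p, m):
    if code in ("repetition", "mds"):
        return m * m / p + m * m + m * m / p
    if code in ("polynomial", "product"):
        return 2 * m * m / p + (m / p) ** 2
    return 2 * m * m / p + m * m          # matdot

def acm2_select(N, m, mu, fail_prob, storage_limit, rho):
    """Pick the (code, p) pair with the smallest expected time among feasible ones."""
    best = None
    for code, fns in CODES.items():
        for p in range(2, N + 1):
            if code == "product" and (isqrt(N) ** 2 != N or p > isqrt(N)):
                continue
            if code == "repetition" and N % p != 0:
                continue
            K = fns["K"](p, N)
            if K > N:
                continue
            if worker_storage(code, p, m) > storage_limit:
                continue
            if p_success(K, N, fail_prob) < rho:
                continue
            t = expected_time(K, fns["c"](p), N, mu)
            if best is None or t < best[0]:
                best = (t, code, p)
    return best

print(acm2_select(N=9, m=1000, mu=2.0, fail_prob=0.1, storage_limit=2.5e6, rho=0.9))
```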
IV Performance Analysis of ACM2
In this section, we evaluate the performance of our algorithm, ACM2, via simulations. We consider a master/worker setup, where the per-sub-matrix computing delay is an i.i.d. random variable following a shifted exponential distribution. We compare ACM2 with the baselines: repetition, MDS, polynomial, MatDot, and product codes.
Fig. 1 shows the average computing time versus the number of workers when there is no storage or successful decoding probability constraint in (16). In this setup, the straggling and scale parameters of the workers are selected randomly and uniformly from a range of large values; the workers are fast, so all coding algorithms prefer splitting the matrices into more partitions. This causes low computing load at workers, but the master needs to wait for more results, i.e., the recovery threshold is large, to be able to decode the computed sub-matrices. In this setup, the optimum number of partitions of repetition codes is equal to $N$, while for MDS codes it is close to $N$. This means that repetition codes are the same as no coding and MDS gets closer to no coding as $N$ increases [28, 1]. When $N$ is small, no coding is the best, so ACM2 chooses MDS codes (which are close to repetition codes) as they behave as no coding. When $N$ increases, MDS, polynomial, and product codes perform better than repetition codes. In this setup, product codes operate in the first regime, because the workers are fast. Thus, the optimum number of partitions is large and close to $\sqrt{N}$, so that almost all $N$ workers are utilized. Product codes perform better in this regime [7]. When $\sqrt{N}$ is an integer, product codes are the best as they can use all existing workers and choose the number of partitions as large as possible to decrease the computation load of each worker. However, when $\sqrt{N}$ is not an integer, product codes may waste the resources of some workers (as they only use $\lfloor\sqrt{N}\rfloor^2$ workers), so MDS and polynomial codes perform better. The computation time of MDS is less than or equal to that of polynomial codes, because the computation load of polynomial codes is higher than that of MDS codes in this regime. The optimum number of partitions for each code increases with increasing $N$, which decreases the computation load at each worker. Since all workers are fast, all codes choose as large a number of partitions as possible. Thus, they perform close to each other. MatDot performs worse than the other codes because of its different way of partitioning, i.e., row-wise versus column-wise. Thus, MatDot introduces almost $p$ times more computation load. As seen, ACM2 exploits the best code among all codes, so it performs the best.

Fig. 2 demonstrates the average computing time versus the number of workers when there is a storage constraint in (16). In this setup, the straggling and scale parameters of the workers are selected randomly and uniformly from a range of small values, i.e., the workers are slow, and the storage constraint is set to $M$ entries. Since the workers are slow, all codes prefer to choose a small number of partitions. However, there is a trade-off between the number of partitions and the storage requirement: the storage requirement reduces with an increasing number of partitions, as smaller matrices are multiplied by each worker, so less storage is needed. Since there is a storage constraint, all codes are pushed toward a larger number of partitions. ACM2 exploits this trade-off and selects the best code and the optimum number of partitions.
Fig. 3 illustrates the average computing time versus the number of workers when there are both storage and success probability constraints in (16). In this setup, the straggling and scale parameters of the workers are again selected randomly and uniformly, the storage constraint is set to $M$ entries, and the success probability constraint is set to the threshold $\rho$. In this scenario, our proposed algorithm selects each of the MDS, product, polynomial, MatDot, and repetition codes at least once during the iterations. In general, in this setup, MatDot and polynomial codes perform better as compared with Fig. 1 and Fig. 2. Polynomial codes work better due to the tighter storage constraint, and MatDot codes perform better because of the success probability constraint, which has an inverse relation with the recovery threshold.


V Conclusion
In this paper, we focused on characterizing the cost-benefit trade-offs of coded computation for practical edge computing systems, and developed an adaptive coded computation framework. In particular, we studied matrix multiplication as a computationally intensive task, and developed an adaptive coding for matrix multiplication (ACM2) algorithm by taking into account the heterogeneous and time-varying nature of edge devices. ACM2 dynamically selects the best coding policy by taking into account the computing time, storage requirements, and successful decoding probability.
References
- [1] K. Lee, M. Lam, R. Pedarsani, D. Papailiopoulos, and K. Ramchandran, “Speeding up distributed machine learning using codes,” IEEE Transactions on Information Theory, vol. 64, no. 3, March 2018.
- [2] S. Li, M. A. Maddah-Ali, Q. Yu, and A. S. Avestimehr, “A fundamental tradeoff between computation and communication in distributed computing,” IEEE Transactions on Information Theory, vol. 64, no. 1, pp. 109–128, Jan 2018.
- [3] F. J. MacWilliams and N. J. A. Sloane, The theory of error-correcting codes. Elsevier, 1977.
- [4] S. Lin and D. Costello, Error-Correcting Codes. Prentice-Hall, Inc, 1983.
- [5] N. S. Ferdinand and S. C. Draper, “Anytime coding for distributed computation,” in 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton). IEEE, 2016, pp. 954–960.
- [6] Q. Yu, M. Maddah-Ali, and S. Avestimehr, “Polynomial codes: an optimal design for high-dimensional coded matrix multiplication,” in Advances in Neural Information Processing Systems, 2017.
- [7] K. Lee, C. Suh, and K. Ramchandran, “High-dimensional coded matrix multiplication,” in 2017 IEEE International Symposium on Information Theory (ISIT). IEEE, 2017, pp. 2418–2422.
- [8] M. Fahim, H. Jeong, F. Haddadpour, S. Dutta, V. Cadambe, and P. Grover, “On the optimal recovery threshold of coded matrix multiplication,” in 2017 55th Annual Allerton Conference on Communication, Control, and Computing (Allerton). IEEE, 2017, pp. 1264–1270.
- [9] Q. Yu, M. A. Maddah-Ali, and A. S. Avestimehr, “Straggler mitigation in distributed matrix multiplication: Fundamental limits and optimal coding,” in 2018 IEEE International Symposium on Information Theory (ISIT). IEEE, 2018, pp. 2022–2026.
- [10] S. Dutta, V. Cadambe, and P. Grover, “Short-dot: Computing large linear transforms distributedly using coded short dot products,” in Advances In Neural Information Processing Systems, 2016, pp. 2100–2108.
- [11] ——, “Coded convolution for parallel and distributed computing within a deadline,” arXiv preprint arXiv:1705.03875, 2017.
- [12] R. Tandon, Q. Lei, A. G. Dimakis, and N. Karampatziakis, “Gradient coding: Avoiding stragglers in distributed learning,” in International Conference on Machine Learning, 2017, pp. 3368–3376.
- [13] W. Halbawi, N. Azizan, F. Salehi, and B. Hassibi, “Improving distributed gradient descent using reed-solomon codes,” in 2018 IEEE International Symposium on Information Theory (ISIT). IEEE, 2018, pp. 2027–2031.
- [14] N. Raviv, R. Tandon, A. Dimakis, and I. Tamo, “Gradient coding from cyclic MDS codes and expander graphs,” in Proceedings of the 35th International Conference on Machine Learning, J. Dy and A. Krause, Eds., vol. 80. PMLR, 10–15 Jul 2018, pp. 4305–4313.
- [15] C. Karakus, Y. Sun, and S. Diggavi, “Encoded distributed optimization,” in 2017 IEEE International Symposium on Information Theory (ISIT). IEEE, 2017, pp. 2890–2894.
- [16] C. Karakus, Y. Sun, S. Diggavi, and W. Yin, “Straggler mitigation in distributed optimization through data encoding,” in Advances in Neural Information Processing Systems, 2017, pp. 5434–5442.
- [17] Q. Yu, M. A. Maddah-Ali, and A. S. Avestimehr, “Coded fourier transform,” in 2017 55th Annual Allerton Conference on Communication, Control, and Computing (Allerton). IEEE, 2017, pp. 494–501.
- [18] Y. Yang, P. Grover, and S. Kar, “Computing linear transformations with unreliable components,” IEEE Trans. on Information Theory, 2017.
- [19] Q. Yu, S. Li, M. A. Maddah-Ali, and A. S. Avestimehr, “How to optimally allocate resources for coded distributed computing?” in 2017 IEEE International Conference on Communications (ICC). IEEE, 2017.
- [20] S. Li, Q. Yu, M. A. Maddah-Ali, and A. S. Avestimehr, “A scalable framework for wireless distributed computing,” IEEE/ACM Transactions on Networking, vol. 25, no. 5, pp. 2643–2654, 2017.
- [21] S. Li, M. A. Maddah-Ali, and A. S. Avestimehr, “Coded distributed computing: Straggling servers and multistage dataflows,” in 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton). IEEE, 2016, pp. 164–171.
- [22] M. Kiamari, C. Wang, and A. S. Avestimehr, “On heterogeneous coded distributed computing,” in GLOBECOM 2017-2017 IEEE Global Communications Conference. IEEE, 2017, pp. 1–7.
- [23] A. Reisizadeh, S. Prakash, R. Pedarsani, and A. S. Avestimehr, “Coded computation over heterogeneous clusters,” IEEE Transactions on Information Theory, vol. 65, no. 7, pp. 4227–4242, 2019.
- [24] A. Mallick, M. Chaudhari, U. Sheth, G. Palanikumar, and G. Joshi, “Rateless codes for near-perfect load balancing in distributed matrix-vector multiplication,” Proceedings of the ACM on Measurement and Analysis of Computing Systems, vol. 3, no. 3, pp. 1–40, 2019.
- [25] Y. Keshtkarjahromi, Y. Xing, and H. Seferoglu, “Dynamic heterogeneity-aware coded cooperative computation at the edge,” in 2018 IEEE 26th International Conference on Network Protocols (ICNP). IEEE, 2018, pp. 23–33.
- [26] E. Ozfatura, S. Ulukus, and D. Gündüz, “Straggler-aware distributed learning: Communication–computation latency trade-off,” Entropy, vol. 22, no. 5, p. 544, 2020.
- [27] S. Kiani, N. Ferdinand, and S. C. Draper, “Hierarchical coded matrix multiplication,” in 2019 16th Canadian Workshop on Information Theory (CWIT). IEEE, 2019, pp. 1–6.
- [28] K. Lee, M. Lam, R. Pedarsani, D. Papailiopoulos, and K. Ramchandran, “Speeding up distributed machine learning using codes,” in 2016 IEEE International Symposium on Information Theory (ISIT), July 2016, pp. 1143–1147.
- [29] G. Liang and U. C. Kozat, “Tofec: Achieving optimal throughput-delay trade-off of cloud storage using erasure codes,” in IEEE INFOCOM 2014-IEEE Conference on Computer Communications. IEEE, 2014, pp. 826–834.
- [30] J. Justesen and T. Hoholdt, “Analysis of iterated hard decision decoding of product codes with reed-solomon component codes,” in 2007 IEEE Information Theory Workshop, Sep. 2007, pp. 174–177.
- [31] B. Pittel, J. Spencer, and N. Wormald, “Sudden emergence of a giant k-core in a random graph,” Journal of Combinatorial Theory, Series B, vol. 67, no. 1, pp. 111–151, 1996. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0095895696900362
- [32] A. DasGupta, Finite Sample Theory of Order Statistics and Extremes. New York, NY: Springer New York, 2011, pp. 221–248.
- [33] C. Clapham and J. Nicholson, The Concise Oxford Dictionary of Mathematics. Oxford University Press, 2009. [Online]. Available: https://www.oxfordreference.com/view/10.1093/acref/9780199235940.001.0001/acref-9780199235940
VI Appendix A: Construction of Table I
In this appendix, we provide detailed calculations as well as some definitions related to Table I.
VI-A Recovery Threshold
The recovery threshold is defined as the minimum number of results that the master needs to receive to be able to compute the final result. In product codes, the master can finish the matrix multiplication calculation if it receives any $N - (\sqrt{N} - p + 1)^2 + 1$ results from workers [6]. In the example of Table I, by plugging in $N = 9$ and $p = 2$, the recovery threshold for product codes becomes $6$. In polynomial codes, $K_{\text{poly}} = p^2 = 4$. Indeed, there are four unknowns in $\tilde{A}_n^\top \tilde{B}_n$ (the blocks $A_i^\top B_j$ for $i, j \in \{1, 2\}$), so the master needs to receive at least four results from workers to be able to calculate $A^\top B$. In MatDot codes, $K_{\text{MatDot}} = 2p - 1 = 3$. Since there exist three unknowns (polynomial coefficients) in $\tilde{A}_n^\top \tilde{B}_n$, the master needs to receive at least three results from workers to calculate $A^\top B$. In MDS codes, $K_{\text{MDS}} = p$ [1], and in repetition codes $K_{\text{rep}} = N - N/p + 1$ in the worst case, since the master can decode the final result when it receives $A_i^\top B$ for every $i = 1, \ldots, p$. Therefore, in the example of Table I, by plugging in $p = 2$, the recovery threshold for repetition codes becomes $N/2 + 1$.
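These thresholds can be collected in a few lines; the snippet below reproduces the values used in Table I (the worst-case formulas for product and repetition codes are the ones stated above; function and variable names are illustrative).

```python
from math import isqrt

def recovery_threshold(code, p, N):
    """Recovery thresholds used in Table I (worst case for product and repetition)."""
    return {
        "product":    N - (isqrt(N) - p + 1) ** 2 + 1,
        "polynomial": p * p,
        "matdot":     2 * p - 1,
        "mds":        p,
        "repetition": N - N // p + 1,
    }[code]

# Reproduce the thresholds of Table I (p = 2; N = 9 for product, N = 6 otherwise).
print([recovery_threshold(c, 2, 9 if c == "product" else 6)
       for c in ["product", "polynomial", "matdot", "mds", "repetition"]])
```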
VI-B Computing Load per Worker
We define the computing load as the total number of scalar multiplications needed at a worker to compute its assigned product. We note that the computing load could be considered as the static version of the computing time analysis. We assume that each multiplication takes a fixed amount of time, so the total computing time will be proportional to the number of such multiplications. Next, we provide the detailed calculations for the computing load per worker.
In product codes, the matrix partitioning is column-wise, so we have $\tilde{A}_n \in \mathbb{R}^{m \times m/p}$ and $\tilde{B}_n \in \mathbb{R}^{m \times m/p}$ after partitioning and coding. Thus, $(m/p) \cdot m \cdot (m/p) = m^3/p^2$ multiplications are needed to compute $\tilde{A}_n^\top \tilde{B}_n$. In polynomial codes, the matrix partitioning is also column-wise, and we have $\tilde{A}_n \in \mathbb{R}^{m \times m/p}$ and $\tilde{B}_n \in \mathbb{R}^{m \times m/p}$ after partitioning, so, similar to the product codes, $m^3/p^2$ multiplications are needed to compute $\tilde{A}_n^\top \tilde{B}_n$. In MatDot codes, the matrix partitioning is row-wise, which means that we have $\tilde{A}_n \in \mathbb{R}^{(m/p) \times m}$ and $\tilde{B}_n \in \mathbb{R}^{(m/p) \times m}$ after partitioning. Hence, $\tilde{A}_n^\top \tilde{B}_n \in \mathbb{R}^{m \times m}$ and $m \cdot (m/p) \cdot m = m^3/p$ multiplications are needed to compute it. In both MDS and repetition codes, one of the matrices, say $A$, is partitioned column-wise, so we have $\tilde{A}_n \in \mathbb{R}^{m \times m/p}$ after partitioning and $\tilde{A}_n^\top B \in \mathbb{R}^{(m/p) \times m}$. Thus, $(m/p) \cdot m \cdot m = m^3/p$ multiplications are needed to compute $\tilde{A}_n^\top B$.
VI-C Storage Load Per Worker
VI-D Probability of successful computation
VII Appendix B: Proof of Corollary 1
VII-A Expected Value of the Order Statistic of Exponential Distribution
Definition 1
Assume $X_1, X_2, \ldots, X_N$ are any real-valued random variables. If $X_{(1)} \le X_{(2)} \le \ldots \le X_{(N)}$ represent the ordered values of $X_1, X_2, \ldots, X_N$, then $X_{(1)}, X_{(2)}, \ldots, X_{(N)}$ will be called the order statistics of $X_1, X_2, \ldots, X_N$ [32].
Rényi's Representation: The order statistics of an exponential distribution can be shown as linear combinations of independent exponential random variables with a special sequence of coefficients:

$X_{(j)} \stackrel{d}{=} \sum_{i=1}^{j} \frac{Z_i}{N - i + 1}, \quad (17)$

for $j = 1, \ldots, N$, where $\stackrel{d}{=}$ means equal in distribution and $Z_1, \ldots, Z_N$ are independent exponential random variables with mean $1/\mu$ [32].
The expected value of the $j$-th order statistic of the exponential distribution with mean $1/\mu$ is expressed as the following:

$\mathbb{E}[X_{(j)}] = \frac{1}{\mu}\big(H_N - H_{N-j}\big), \quad (18)$

where $H_n = \sum_{i=1}^{n} \frac{1}{i}$.
VII-B Expected Value of the Order Statistic of Shifted-Scaled Exponential Distribution
The function $g(x) = (t_0 + x)/c$ is a monotone function as its first derivative does not change sign. When a monotone function is applied to an ordered set, it preserves the given order [33]. Now, let us define $Y_i = (t_0 + X_i)/c$ for $i = 1, \ldots, N$, where $X_1, \ldots, X_N$ are independent exponential random variables with mean $1/\mu$. By monotonicity, we have $Y_{(j)} = (t_0 + X_{(j)})/c$ and, by considering Rényi's representation for the order statistics of exponential random variables, we have

$Y_{(j)} \stackrel{d}{=} \frac{t_0}{c} + \frac{1}{c}\sum_{i=1}^{j} \frac{Z_i}{N - i + 1}. \quad (19)$
The expected value of the $j$-th order statistic of the shifted-scaled exponential distribution (with shift $t_0/c$ and exponential part of mean $1/(c\mu)$) is therefore expressed as the following:

$\mathbb{E}[Y_{(j)}] = \frac{t_0}{c} + \frac{1}{c\mu}\big(H_N - H_{N-j}\big). \quad (20)$
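A quick Monte Carlo check of (20) is sketched below; all parameter values are arbitrary assumptions chosen for illustration.

```python
import numpy as np

def expected_order_stat_closed_form(j, N, mu, t0, c):
    """E[Y_(j)] = t0/c + (H_N - H_{N-j}) / (c*mu), per equation (20)."""
    H = lambda n: sum(1.0 / i for i in range(1, n + 1))
    return t0 / c + (H(N) - H(N - j)) / (c * mu)

rng = np.random.default_rng(0)
N, j, mu, t0, c = 10, 7, 2.0, 1.0, 4
samples = t0 / c + rng.exponential(1.0 / (c * mu), size=(200_000, N))
empirical = np.sort(samples, axis=1)[:, j - 1].mean()
print(round(empirical, 4), round(expected_order_stat_closed_form(j, N, mu, t0, c), 4))
```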
VII-C Proof of Corollary 1
In this paper, we assume a shifted exponential distribution for the computation time of the original task and a shifted-scaled exponential distribution for the computation time of each sub-task. Thus, if we assume $X$ is an exponential random variable with mean $1/\mu$, the computation time of the original task can be expressed as $t_0 + X$, and if we assume $X_1, \ldots, X_N$ are independent exponential random variables with mean $1/\mu$, the computation time of each sub-task can be expressed as $(t_0 + X_n)/c$, where the scale $c$ is the factor by which each coded sub-task is smaller than the original task. Hence, by setting $j$ equal to the recovery threshold $k$ in (20), the expected value of the computation time can be shown as $\frac{1}{c}\big(t_0 + \frac{1}{\mu}(H_N - H_{N-k})\big)$. Moreover, the harmonic number $H_n$ can be approximated by the natural logarithm function, i.e., $H_N - H_{N-k} \approx \log\frac{N}{N-k}$. Therefore, the expected value of the computation time of each coding scheme becomes

$\mathbb{E}[T] \approx \frac{1}{c}\Big(t_0 + \frac{1}{\mu}\log\frac{N}{N-k}\Big), \quad (21)$

where $(c, k) = (p^2, p^2)$ for polynomial codes, which yields (3), and $(c, k) = (p, 2p - 1)$ for MatDot codes, which yields (4).
VIII Appendix C: Proof of Corollary 2
VIII-A Asymptotic Computation Time of Product Codes
Although we can directly use order statistics to compute the computation time of repetition, MDS, polynomial, and MatDot codes, it is challenging for product codes, because the decodability condition of product codes depends on which specific tasks are completed. However, with the help of the idea of the edge-removal process in a bipartite graph [30] and order statistics, one can find an asymptotic computation time for product codes in two different regimes. Assuming an exponential distribution as a delay model with mean $1/\mu$ for each worker, the average computing time for product codes in the first regime is as the following [7, 30, 31]:

(22)
Also, the average computation time of product codes with $N$ workers (assuming an exponential distribution with mean $1/\mu$ for the computation time of each worker), for a fixed ratio of workers to sub-tasks, as $N$ grows to infinity, is lower bounded as follows [7]:

(23)
On the other hand, since we consider the worst-case scenario to compute the recovery threshold of product codes [6], the upper bound on the computation time of product codes in the second regime is given by the $K_{\text{product}}$-th order statistic of the $N$ computation times, where $K_{\text{product}} = N - (\sqrt{N} - p + 1)^2 + 1$.
$\mathbb{E}\big[T^{\text{UB}}_{\text{product}}\big] = \frac{1}{\mu}\Big(H_N - H_{N - K_{\text{product}}}\Big) = \frac{1}{\mu}\Big(H_N - H_{(\sqrt{N} - p + 1)^2 - 1}\Big). \quad (24)$
Therefore, based on the above explanations, one can use (24) and (23) as the upper and lower bounds on the computation time of product codes, with an exponential distribution as the delay model of each worker, in the second regime.
VIII-B Proof of Corollary 2
The previous section of this appendix gives the asymptotic computation time of product codes assuming an exponential distribution as the delay model for each worker. On the other hand, we assume a shifted-scaled exponential distribution for the computation time of each worker in this paper. Therefore, according to equation (20), we can compute (5), (6), and (7) from (22), (23), and (24), respectively, by applying the shift and scale parameters of the product-code sub-tasks in equation (20).