Collaborative Coded Computation Offloading: An All-pay Auction Approach
Abstract
As the amount of data collected for crowdsensing applications increases rapidly due to improved sensing capabilities and the increasing number of Internet of Things (IoT) devices, the cloud server is no longer able to handle the large-scale datasets on its own. Given the improved computational capabilities of the edge devices, coded distributed computing has become a promising approach, as it allows computation tasks to be carried out in a distributed manner while mitigating straggler effects, which often account for long overall completion times. Specifically, by using polynomial codes, the computed results from only a subset of devices are needed to reconstruct the final result. However, there is no inherent incentive for the edge devices to complete the computation tasks. In this paper, we present an all-pay auction to incentivize the edge devices to participate in the coded computation tasks. In this auction, the bids of the edge devices are represented by the allocation of their Central Processing Unit (CPU) power to the computation tasks. All edge devices submit their bids regardless of whether they win or lose in the auction. The all-pay auction is designed to maximize the utility of the cloud server by determining the reward allocation to the winners. Simulation results show that the edge devices are incentivized to allocate more CPU power when multiple rewards are offered instead of a single reward.
Index Terms:
Coded distributed computing, straggler effects mitigation, all-pay auction, Bayesian Nash equilibrium
I Introduction
The increasing sensing capabilities of Internet of Things (IoT) devices such as smart meters and smartphones, coupled with reliable wireless communication technologies, have enabled the development of crowdsensing applications, which leverage large-scale data for analysis. The IoT devices upload the data collected by the on-board sensors to the cloud server for analysis. Through deep learning techniques, the cloud server aims to build accurate decision-making models for smart city applications, such as estimation of road conditions [1], air quality monitoring [2] and tracking of medical conditions [3].
However, as the number of IoT devices increases rapidly, the cloud server may not be able to handle the ever-expanding datasets on its own. Edge computing [4] has become a promising approach that extends cloud computing services to the edge of the network. Specifically, by leveraging the computational capabilities, e.g., Central Processing Unit (CPU) power, of the edge devices, e.g., base stations, the cloud server can offload its computation tasks to the edge devices. Instead of offloading the entire computation to a single edge device, the cloud server can divide the large datasets into smaller sets and distribute them to the edge devices for processing. This alleviates the bottleneck and single-point-of-failure problems. Upon completing the allocated computation subtasks, the edge devices return the computed results to the cloud server, which aggregates them to reconstruct the final result.
To perform the distributed computation tasks efficiently, coded distributed computing [5] is being increasingly used to mitigate the straggler effects caused by factors such as imbalanced work allocation, contention of shared resources and network congestion [6, 7].
In this paper, we consider that a cloud server, i.e., master, aims to build a prediction model in which the training process involves a large number of matrix multiplication computations. Matrix multiplication is an important operation underlying many data analytics applications, e.g., machine learning, scientific computing and graph processing [8]. Due to a lack of computational capabilities and storage resources, the resource-constrained master employs edge devices, i.e., workers, to facilitate the distributed computation task. First, the master uses polynomial codes [8] to determine the algebraic structure of the encoded submatrices and distributes them to the workers. Then, the workers perform local computations based on their allocated datasets, and return the computed results to the master. The master reconstructs the final result by using fast polynomial interpolation algorithms such as the Reed-Solomon decoding algorithm [9].
However, there is no incentive for the workers to participate in or to complete their allocated distributed computation tasks. To design an appropriate incentive mechanism, it is important to consider the unique aspect of the coded distributed computing framework. Specifically, even though workers are allocated a subset of the entire dataset for computations, some of the workers may perform computations without any compensation if their computed results are not used. As such, we propose an all-pay auction to improve worker participation. In traditional auctions such as first-price and second-price auctions, only the winners of the auctions pay. In contrast, in all-pay auctions, regardless of whether the bidders win or lose, they are required to pay. In this all-pay auction, the bids of the workers are represented by the CPU power, i.e., the number of CPU cycles, allocated by the workers to complete the computation tasks. In other words, the larger the CPU power allocated, the higher the bid of the worker.
There are two advantages of an all-pay auction [10]. Firstly, it reduces the probability of non-completion of allocated tasks. This differs from traditional auctions, in which the winners can still choose not to complete their tasks and forgo the reward promised by the auctioneer, forcing the auctioneer to conduct another round of auction to meet its objectives. Secondly, it reduces the coordination cost between the auctioneer and the bidders. Specifically, in traditional auctions, the participants need to bid and then contribute, whereas in all-pay auctions, the bids of the participants are directly determined by their contributions. In other words, the participants do not need to bid explicitly in all-pay auctions.
II System Model
We consider a master-worker setup in a heterogeneous distributed computing network. The system model consists of a master, e.g., a cloud server, and a set of workers, e.g., edge devices such as base stations, which have different computational capabilities. In crowdsensing, for example, the ubiquity of IoT devices as well as their on-board sensing and processing capabilities are leveraged to collect data for many innovative crowdsensing applications, e.g., weather prediction, location-based recommendation and forensic analysis. Given the large amount of sensor data collected from different IoT devices, the master aims to train a machine learning model to complete a user-defined task. As the number of IoT devices that contribute to the crowdsensing task increases, so does the size of the dataset that the master needs to handle. However, the master may not have sufficient resources, i.e., computation power, to handle the growing dataset. Instead, it may utilize the resources of the workers to collaboratively complete the distributed computation tasks.
One of the main challenges in performing distributed computation tasks is the straggler effects. Coding techniques such as polynomial codes [8] can be used to mitigate straggler effects by reducing the recovery threshold, i.e., the number of workers that need to submit their results for the master to reconstruct the final result. In order to perform the distributed coded matrix multiplication $\mathbf{C} = \mathbf{A}^{\top}\mathbf{B}$, where $\mathbf{A} \in \mathbb{F}_q^{s \times r}$ and $\mathbf{B} \in \mathbb{F}_q^{s \times t}$ are input matrices, for integers $r$, $s$ and $t$ and a sufficiently large finite field $\mathbb{F}_q$, there are four important steps, illustrated by the sketch that follows the list:
1. Task Allocation: Given that all $N$ workers are able to store up to a $\frac{1}{m}$ fraction of matrix $\mathbf{A}$ and a $\frac{1}{n}$ fraction of matrix $\mathbf{B}$, the master divides the input matrices into submatrices $\mathbf{A}_j$ and $\mathbf{B}_k$, $j \in \{1, \ldots, m\}$, $k \in \{1, \ldots, n\}$, where $\mathbf{A}_j \in \mathbb{F}_q^{s \times \frac{r}{m}}$ and $\mathbf{B}_k \in \mathbb{F}_q^{s \times \frac{t}{n}}$ respectively. Specifically, $\boldsymbol{f} = (f_1, \ldots, f_N)$ and $\boldsymbol{g} = (g_1, \ldots, g_N)$ represent the vectors of encoding functions such that worker $i$ is assigned the encoded submatrices $\tilde{\mathbf{A}}_i = f_i(\mathbf{A}_1, \ldots, \mathbf{A}_m)$ and $\tilde{\mathbf{B}}_i = g_i(\mathbf{B}_1, \ldots, \mathbf{B}_n)$, respectively. Then, the master distributes the encoded submatrices to the workers over the wireless channels for computations.
2. Local Computation: Each worker $i$ is allocated the encoded submatrices $\tilde{\mathbf{A}}_i$ and $\tilde{\mathbf{B}}_i$ by the master. Based on the allocated submatrices, the workers perform matrix multiplication, i.e., $\tilde{\mathbf{C}}_i = \tilde{\mathbf{A}}_i^{\top}\tilde{\mathbf{B}}_i$, $i \in \{1, \ldots, N\}$.
3. Wireless Transmission: Upon completion of the local computations, the workers transmit the computed results, i.e., $\tilde{\mathbf{C}}_i$, to the master over the wireless communication channels.
4. Reconstruction of Final Result: By using polynomial codes, the master is able to reconstruct the final result $\mathbf{C}$ upon receiving $R$ out of $N$ computed results by using decoding functions. In other words, the master does not need to wait for all $N$ workers to complete the computation task. Note that although there is no constraint on the decoding functions to be used with polynomial codes, a low-complexity decoding function ensures the efficiency of the overall matrix multiplication computations.
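To make the four steps concrete, the following sketch (ours, not from the paper) works through a small instance of the polynomial-code construction of [8] in Python. The dimensions, evaluation points and variable names are assumptions for illustration, and real arithmetic is used in place of the finite field $\mathbb{F}_q$ so that standard linear-algebra routines can be reused.

```python
import numpy as np

# Minimal sketch of polynomial-coded matrix multiplication (C = A^T B), following the
# construction of [8]. Dimensions and evaluation points are assumed for illustration,
# and real arithmetic replaces the finite field F_q for readability.
rng = np.random.default_rng(0)
m, n = 2, 2                          # each worker stores 1/m of A and 1/n of B
N, R = 6, m * n                      # N workers, recovery threshold R = mn
s, r, t = 4, 6, 6
A = rng.standard_normal((s, r))
B = rng.standard_normal((s, t))

A_blocks = np.split(A, m, axis=1)    # A_1, ..., A_m, each s x (r/m)
B_blocks = np.split(B, n, axis=1)    # B_1, ..., B_n, each s x (t/n)

def encode(x):
    """Step 1 (Task Allocation): encoded submatrices for the worker with point x."""
    A_tilde = sum(A_blocks[j] * x**j for j in range(m))
    B_tilde = sum(B_blocks[k] * x**(k * m) for k in range(n))
    return A_tilde, B_tilde

# Steps 2-3 (Local Computation and Wireless Transmission): only R results are needed.
points = np.arange(1.0, N + 1.0)
results = {}
for x in points[:R]:                 # pretend these R workers returned first
    A_tilde, B_tilde = encode(x)
    results[x] = A_tilde.T @ B_tilde

# Step 4 (Reconstruction): C_tilde(x) is a matrix polynomial of degree mn - 1 whose
# coefficient of x^(j + k*m) is A_j^T B_k, so R = mn evaluations pin down every block.
xs = np.array(list(results.keys()))
V = np.vander(xs, R, increasing=True)
coeffs = np.linalg.solve(V, np.stack([results[x].ravel() for x in xs]))
C_blocks = coeffs.reshape(R, r // m, t // n)
C = np.block([[C_blocks[j + k * m] for k in range(n)] for j in range(m)])

assert np.allclose(C, A.T @ B)       # matches the direct computation
```

Any $R = mn$ evaluations determine the coefficient blocks $\mathbf{A}_j^{\top}\mathbf{B}_k$ of the degree-$(mn-1)$ matrix polynomial, which is why the recovery threshold in Equation (1) equals $mn$.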
In our system model, we consider the computations for distributed matrix multiplication. However, there are other types of distributed computation problems, e.g., stochastic gradient descent, convolution and Fourier transform. The matrix multiplication may also involve more than two matrices. Our system model can be easily extended to solve the matrix multiplication of more than two matrices and different computation problems.
By using polynomial codes [8], the optimum recovery threshold that can be achieved, where each worker is able to store up to $\frac{1}{m}$ of matrix $\mathbf{A}$ and $\frac{1}{n}$ of matrix $\mathbf{B}$, is:
$R = mn.$   (1)
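As an illustration of Equation (1), if each worker can store half of $\mathbf{A}$ and half of $\mathbf{B}$, i.e., $m = n = 2$, then $R = 2 \times 2 = 4$, so the master can reconstruct $\mathbf{C}$ from the results of any four workers, regardless of the total number of workers $N$ employed.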
However, there is no incentive for the workers to be among the $R$ workers that complete their local computations and return their computed results to the master. The workers may not want to participate in the distributed computation tasks or may not perform to the best of their abilities. For example, the workers may not allocate large CPU power for the coded distributed tasks, resulting in a longer time needed to complete the computation tasks. In order to incentivize the workers to allocate more CPU power to complete the allocated computation tasks, we adopt an all-pay auction approach to determine the rewards to the workers while maximizing the utility of the master.
II-A Utility of the Master
Given that the master only needs the computed results from $R$ workers to reconstruct the final result, the master offers $R$ rewards, represented by the set $\mathcal{M} = \{M_1, M_2, \ldots, M_R\}$, where $R < N$. Specifically, there are $R$ rewards for which the $N$ workers compete. Since only $R$ rewards are offered, $N - R$ workers do not receive any reward from the master, even though they perform the matrix multiplication computations given the allocated submatrices.
The size of reward $k$ is represented by $M_k$. A worker that allocates larger CPU power is offered a larger reward. In particular, the worker that allocates the largest amount of CPU power receives reward $M_1$, the worker with the second largest allocation receives reward $M_2$, and the worker with the $k$-th largest allocation of CPU power is offered reward $M_k$. If two or more workers allocate the same amount of CPU power to perform the distributed computation tasks, ties are broken randomly. In other words, if two workers are tied at rank $k$, one is ranked $k$ and the other is ranked $k+1$. Hence, without loss of generality, $M_1 \geq M_2 \geq \cdots \geq M_R \geq 0$. The total amount of reward offered by the master is denoted by $M$, i.e., $M = \sum_{k=1}^{R} M_k$. The aim of the master is to spend the entire fixed reward $M$ to maximize the CPU power allocated by the workers.
As such, the expected utility of the master, $\mathbb{E}[u_0]$, is expressed as follows:
$\mathbb{E}[u_0] = \mathbb{E}\left[\sum_{k=1}^{R} p_{(k),N}\right] - \sum_{k=1}^{R} M_k,$   (2)
where $p_{(k),N}$ represents the order statistics of the workers' CPU power allocations. Specifically, $p_{(1),N}$ and $p_{(R),N}$ denote the highest and the $R$-th highest CPU power allocations, respectively, among the $N$ workers.
II-B Utility of the Worker
To perform the local computations on the allocated submatrices, each worker $i$ consumes computational energy, $E_i$, which is defined as:
$E_i = \kappa \mu_i p_i^2,$   (3)
where $\kappa$ is the effective switch coefficient that depends on the chip architecture [11], $\mu_i$ is the total number of CPU cycles required to complete the allocated computation subtask and $p_i$ is the CPU power allocated by worker $i$ for the computation subtask. By using the polynomial codes, the computation task is evenly partitioned and distributed among all workers. As a result, the total number of CPU cycles needed to complete the allocated computation tasks is the same for all workers, i.e., $\mu_i = \mu$, $\forall i \in \{1, \ldots, N\}$. The unit cost of computational energy incurred by worker $i$ is denoted by $c$, where the unit cost of computational energy is the same for all workers.
Each worker $i$ has a valuation $v_i$ for the total reward $M$. The workers' valuations, $v_i$, are independently drawn from the interval $[\underline{v}, \bar{v}]$, where $\underline{v}$ and $\bar{v}$ are strictly positive, according to $F(v)$, where $F(v)$ is the cumulative distribution function (CDF) of the valuations. The total cost of worker $i$ is represented by $cE_i$. As a result, the utility of worker $i$ for winning reward $M_k$, $u_i$, is expressed as:
$u_i = v_i M_k - cE_i = v_i M_k - c\kappa\mu p_i^2.$   (4)
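As a quick numerical illustration of Equations (3) and (4), the following snippet evaluates the energy and utility of a single worker; the parameter values are assumed and normalized so that $c\kappa\mu = 1$, since the paper's exact settings are not reproduced here.

```python
# Illustration of Eqs. (3)-(4); all parameter values are assumed and normalized.
kappa, mu, c = 1e-9, 1e9, 1.0      # effective switch coefficient, CPU cycles, unit energy cost
v_i, M_k, p_i = 0.8, 0.25, 0.3     # valuation, reward won, CPU power allocation (assumed)

E_i = kappa * mu * p_i ** 2        # Eq. (3): computational energy of worker i
u_i = v_i * M_k - c * E_i          # Eq. (4): utility of worker i for winning reward M_k
print(f"E_i = {E_i:.3f}, u_i = {u_i:.3f}")
```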
III All-pay Auction
In this section, we present the design of an all-pay auction that incentivizes the workers to allocate more CPU power for the allocated computation subtasks. In this auction, the master is the auctioneer whereas the workers are the bidders. The bid of a worker is represented by the CPU power that it allocates for its computation subtasks. In this all-pay auction, all workers, i.e., bidders, pay their bids regardless of whether they win or lose the auction.
Each worker $i$ knows its own valuation $v_i$, but does not know the valuation of any other worker $j$, $j \neq i$. This establishes a one-dimensional incomplete information setting. If, in addition, each worker had a different unit cost of computational energy known only to itself, we would have a two-dimensional incomplete information setting. The dimension of private information can be reduced following the procedure in [12]. In this work, we consider a one-dimensional incomplete information setting where the unit cost of computational energy is the same for all workers but the workers' valuations are heterogeneous and private.
Given the utility of worker $i$, $u_i$, in Equation (4), the objective of worker $i$ to maximize its expected utility, $\mathbb{E}[u_i]$, is defined as follows:
$\max_{p_i} \ \mathbb{E}[u_i] = v_i \sum_{k=1}^{R} M_k \rho_{i,k}(p_i) - c\kappa\mu p_i^2,$   (5)
where $\rho_{i,k}(p_i)$ is the winning probability of reward $M_k$ by worker $i$.
Although a worker does not know the exact valuations of the other workers, it knows the distribution of the other workers' valuations based on past interactions, which is common knowledge to all workers and the master. In our model, we consider that all the workers' valuations are drawn from the same distribution, which constitutes a symmetric Bayesian game in which the prior is the distribution of the workers' valuations.
Definition 1.
[13] A pure-strategy Bayesian Nash equilibrium is a strategy profile $\boldsymbol{p}^* = (p_1^*, \ldots, p_N^*)$ that satisfies $\mathbb{E}[u_i(p_i^*, \boldsymbol{p}_{-i}^*)] \geq \mathbb{E}[u_i(p_i, \boldsymbol{p}_{-i}^*)]$ for all $p_i$ and all $i \in \{1, \ldots, N\}$.
The subscript $-i$ represents the indices of the workers other than worker $i$. Specifically, $\boldsymbol{p}_{-i}^*$ represents the equilibrium CPU power allocations of all workers other than worker $i$. At the Bayesian Nash equilibrium, given the belief of worker $i$ about the valuations of the other workers and given that the CPU powers allocated by the other workers $j$, where $j \neq i$, are at equilibrium, i.e., $p_j = p_j^*$, worker $i$ aims to maximize its expected utility.
Proposition 1.
Under the incomplete information setting, the all-pay auction admits a pure-strategy Bayesian Nash equilibrium that is strictly monotonic, where the bid of a worker strictly increases in its valuation.
Proof.
The proof is omitted due to space constraints. ∎
Since the equilibrium CPU power allocation of worker $i$, which is represented by $p_i^*$, is a strictly monotonically increasing function of its valuation $v_i$, we express the equilibrium strategy of worker $i$ as a function $\beta$, i.e., $p_i^* = \beta(v_i)$. Given the strict monotonicity, the inverse function $\beta^{-1}$ also exists, where $v_i = \beta^{-1}(p_i^*)$, and it is an increasing function. Due to the incomplete information setting, the objective of worker $i$ to maximize its expected utility in Equation (5) can be expressed as follows:
$\max_{p_i} \ v_i \sum_{k=1}^{R} M_k \binom{N-1}{k-1} \left(1 - F\!\left(\beta^{-1}(p_i)\right)\right)^{k-1} F\!\left(\beta^{-1}(p_i)\right)^{N-k} - \psi(p_i),$   (6)
where the cost of worker $i$ is represented by the function $\psi(p_i) = c\kappa\mu p_i^2$.
Since the workers are symmetric, i.e., the valuations of the workers are drawn from the same distribution, the symmetric equilibrium strategy $\beta(v_i)$ for each worker $i$ can be derived. We first assume that there are $R$ rewards, where $R < N$. The valuations of the workers are ranked and represented by their order statistics, which are expressed as $v_{(1),N} \geq v_{(2),N} \geq \cdots \geq v_{(N),N}$. In particular, $v_{(j),N}$ represents the $j$-th highest valuation among the $N$ valuations which are drawn from the common distribution $F$. Given the order statistics of the workers' valuations, the corresponding cumulative distribution function and probability density function are represented by $F_{(j),N}$ and $f_{(j),N}$ respectively. Similarly, when dealing with the valuations of all workers other than worker $i$, the order statistic is represented by $v_{(j),N-1}$, which represents the $j$-th highest valuation among the $N-1$ valuations. The corresponding cumulative distribution function and probability density function are represented by $F_{(j),N-1}$ and $f_{(j),N-1}$ respectively.
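To make the order-statistic notation concrete, note that the $j$-th highest of the $N-1$ other valuations is at most $z$ exactly when no more than $j-1$ of them exceed $z$, so
$F_{(j),N-1}(z) = \sum_{l=0}^{j-1} \binom{N-1}{l} \left(1 - F(z)\right)^{l} F(z)^{N-1-l},$
and $f_{(j),N-1}$ is its derivative. For valuations uniformly distributed on $[0,1]$, as in Section V, $F(z) = z$ on this interval.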
Given that the other workers $j$, where $j \neq i$, follow a symmetric, increasing and differentiable equilibrium strategy $\beta$, worker $i$ will never choose to allocate a CPU power greater than the equilibrium strategy evaluated at the highest valuation. In other words, worker $i$ will never allocate $p_i > \beta(\bar{v})$. Besides, the optimal strategy of the worker with the lowest valuation $\underline{v}$ is not to allocate any CPU power. On one hand, when the number of rewards offered is smaller than the number of workers, i.e., $R < N$, the worker with the lowest valuation will not win any reward. On the other hand, when the number of rewards offered is larger than or equal to the number of workers, i.e., $R \geq N$, the worker with the lowest valuation will win a reward without allocating any CPU power. Hence, $\beta(\underline{v}) = 0$. With this, the expected utility of worker $i$ with valuation $v_i$ and CPU power allocation $p_i$ is expressed as follows:
$\mathbb{E}[u_i(v_i, p_i)] = v_i \sum_{k=1}^{R} M_k \left[ F_{(k),N-1}\!\left(\beta^{-1}(p_i)\right) - F_{(k-1),N-1}\!\left(\beta^{-1}(p_i)\right) \right] - \psi(p_i),$   (7)
since the probability that worker $i$ wins exactly reward $M_k$ is $F_{(k),N-1}\!\left(\beta^{-1}(p_i)\right) - F_{(k-1),N-1}\!\left(\beta^{-1}(p_i)\right)$, where $F_{(0),N-1} \equiv 0$, and the cost of worker $i$ is $\psi(p_i) = c\kappa\mu p_i^2$.
By differentiating Equation (7) with respect to the variable $p_i$ and equating the result to zero, we obtain the following:
$v_i \sum_{k=1}^{R} M_k \left[ f_{(k),N-1}\!\left(\beta^{-1}(p_i)\right) - f_{(k-1),N-1}\!\left(\beta^{-1}(p_i)\right) \right] \frac{1}{\beta'\!\left(\beta^{-1}(p_i)\right)} = \psi'(p_i),$   (8)
where $f_{(0),N-1} \equiv 0$.
When the expected utility is maximized, the marginal value of the reward is equal to the marginal cost of the CPU power. Since Equation (8) characterizes the derivative of the cost along the equilibrium path, the cost itself can be recovered by integrating Equation (8). At equilibrium, where $p_i = \beta(v_i)$ and the expected utility of worker $i$, $\mathbb{E}[u_i]$, is maximized, we have the following:
$\psi\!\left(\beta(v_i)\right) = \int_{\underline{v}}^{v_i} t \sum_{k=1}^{R} M_k \left[ f_{(k),N-1}(t) - f_{(k-1),N-1}(t) \right] \mathrm{d}t.$   (9)
Thus, the equilibrium strategy for worker $i$ with valuation $v_i$, $\beta(v_i)$, is expressed as:
$\beta(v_i) = \psi^{-1}\!\left( \int_{\underline{v}}^{v_i} t \sum_{k=1}^{R} M_k \left[ f_{(k),N-1}(t) - f_{(k-1),N-1}(t) \right] \mathrm{d}t \right) = \sqrt{ \frac{1}{c\kappa\mu} \int_{\underline{v}}^{v_i} t \sum_{k=1}^{R} M_k \left[ f_{(k),N-1}(t) - f_{(k-1),N-1}(t) \right] \mathrm{d}t }.$   (10)
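The equilibrium strategy in Equation (10) can be evaluated numerically. The sketch below (ours, for illustration) does so for valuations uniformly distributed on $[0,1]$; the reward values and the normalization $c\kappa\mu = 1$ are assumptions rather than the paper's settings. It rewrites the integrand using the expected reward $W(t) = \sum_{k=1}^{R} M_k \binom{N-1}{k-1}(1-F(t))^{k-1}F(t)^{N-k}$ of a type-$t$ worker, so that $\psi(\beta(v)) = \int_{\underline{v}}^{v} t\,\mathrm{d}W(t) = vW(v) - \int_{\underline{v}}^{v} W(t)\,\mathrm{d}t$ by integration by parts (using $W(\underline{v}) = 0$ when $R < N$).

```python
import numpy as np
from math import comb
from scipy.integrate import quad

# Numerical sketch of the equilibrium strategy in Eq. (10) for valuations ~ U[0, 1].
# The reward values and the normalization c * kappa * mu = 1 are assumed for illustration.
N, R = 15, 4
M = [0.325, 0.275, 0.225, 0.175]           # arithmetic reward sequence, delta = 0.05, sum = 1
cost_scale = 1.0                           # c * kappa * mu (assumed normalization)
F = lambda t: min(max(t, 0.0), 1.0)        # uniform CDF on [0, 1]

def expected_reward(t):
    """W(t): expected reward of a type-t worker when all workers bid monotonically."""
    return sum(M[k - 1] * comb(N - 1, k - 1) * (1 - F(t)) ** (k - 1) * F(t) ** (N - k)
               for k in range(1, R + 1))

def beta(v):
    """Eq. (10): psi(beta(v)) = v W(v) - integral of W over [0, v]; invert psi(p) = cost_scale * p^2."""
    integral = v * expected_reward(v) - quad(expected_reward, 0.0, v)[0]
    return np.sqrt(max(integral, 0.0) / cost_scale)

for v in (0.2, 0.6, 1.0):
    print(f"v = {v:.1f}  ->  equilibrium CPU power allocation: {beta(v):.4f}")
```

Consistent with Proposition 1, the computed allocation increases in the valuation.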
Given the equilibrium strategy of worker $i$, $\beta(v_i)$, the master aims to maximize its expected utility, $\mathbb{E}[u_0]$. By using polynomial codes, the master is able to reconstruct the final result by using the computed results from $R$ workers. Since the master spends the fixed reward $M$ completely, the maximization problem in Equation (2) is equivalent to maximizing the allocation of CPU power, which is expressed as follows:
$\max_{M_1, \ldots, M_R} \ \mathbb{E}\left[\sum_{k=1}^{R} p_{(k),N}\right] \quad \text{subject to} \quad \sum_{k=1}^{R} M_k = M.$   (11)
Since the equilibrium strategy of worker $i$, $\beta(v_i)$, is affected by the reward structure, the master needs to determine the structure of the rewards $M_1, \ldots, M_R$ such that it maximizes the CPU power allocation of the workers, thereby maximizing its own utility, $\mathbb{E}[u_0]$.
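In practice, the master could compare candidate reward structures by Monte Carlo estimation of the objective in Equation (11): draw valuations, map them through the corresponding equilibrium strategy, and average the sum of the $R$ largest allocations. The sketch below (ours) illustrates only the estimation step, with a stand-in increasing bid function in place of Equation (10); all parameter values are assumed.

```python
import numpy as np

# Monte Carlo sketch of the master's objective in Eq. (11): E[sum of the R highest
# CPU power allocations]. beta() is a stand-in monotone bid function; in a full
# evaluation it would be the equilibrium strategy of Eq. (10) for the chosen rewards.
rng = np.random.default_rng(1)
N, R, trials = 15, 4, 100_000

def beta(v):
    return v ** 2                                 # placeholder increasing strategy

v = rng.uniform(0.0, 1.0, size=(trials, N))       # symmetric valuations, one row per trial
p = beta(v)                                       # CPU power allocations (bids)
top_R_sum = np.sort(p, axis=1)[:, -R:].sum(axis=1)
print("Estimated E[sum of top-R allocations]:", top_R_sum.mean())
```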
IV Reward Structure
Given that the master spends the total amount of the reward, $M$, the design of the optimal reward sequence is important to maximize the CPU power allocation of the workers, since the equilibrium strategies of the workers depend on the differences between consecutive rewards.
The master first needs to decide whether to allocate the total amount of reward, $M$, to only one winner, i.e., a winner-take-all reward structure, or to split the reward into several smaller rewards.
Proposition 2.
Given that the cost function $\psi$ is convex, it is not optimal to offer only one reward, i.e., to set $M_1 = M$ and $M_k = 0$ for $k = 2, \ldots, R$, since $\psi''(p_i) > 0$ for $p_i > 0$. In particular, since $\psi(p_i) = c\kappa\mu p_i^2$ is strictly convex, it is not optimal to offer only a single reward.
Proof.
Following the procedure in [12], we show that it is not optimal to offer a single reward given that the cost functions of the workers are convex. The details of the proof are omitted due to space constraints. ∎
Since the winner-take-all reward structure is not optimal, the master is better off offering multiple rewards. Given that $R$ rewards are offered, the master can consider several reward sequences such as (i) the homogeneous reward sequence, (ii) the arithmetic reward sequence and (iii) the geometric reward sequence. Specifically, the reward sequences are expressed as follows:
• Homogeneous reward sequence: $M_k = \frac{M}{R}$, $k = 1, \ldots, R$,
• Arithmetic reward sequence: $M_k - M_{k+1} = \delta$, $k = 1, \ldots, R-1$,
• Geometric reward sequence: $\frac{M_k}{M_{k+1}} = \gamma$, $k = 1, \ldots, R-1$,
where $\delta > 0$ and $\gamma > 1$ are constants.
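For reference, the following sketch (ours) generates the three sequences so that each sums to the total reward $M$; the closed forms follow directly from the defining relations above, and the parameter values are illustrative.

```python
import numpy as np

# Reward sequences that sum to the total reward M (closed forms derived from the
# defining relations above; parameter values are illustrative).
def homogeneous(M, R):
    return np.full(R, M / R)

def arithmetic(M, R, delta):
    # M_k - M_{k+1} = delta and sum M_k = M  =>  M_k = M/R + delta * (R + 1 - 2k) / 2
    k = np.arange(1, R + 1)
    return M / R + delta * (R + 1 - 2 * k) / 2

def geometric(M, R, gamma):
    # M_k / M_{k+1} = gamma and sum M_k = M  =>  M_k = M * gamma^(R-k) * (gamma - 1) / (gamma^R - 1)
    k = np.arange(1, R + 1)
    return M * gamma ** (R - k) * (gamma - 1) / (gamma ** R - 1)

print(homogeneous(1.0, 4))        # [0.25 0.25 0.25 0.25]
print(arithmetic(1.0, 4, 0.05))   # [0.325 0.275 0.225 0.175]
print(geometric(1.0, 4, 2.0))     # approximately [0.533 0.267 0.133 0.067]
```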
In the next section, we show the effects of different reward structures on the allocation of CPU power by the workers.
V Simulation Results
In this section, we evaluate the all-pay auction mechanism. Table I summarizes the simulation parameters. With normalization, we consider the total amount of reward to be 1, i.e., $M = 1$. We also set the matrix partitioning parameters $m$ and $n$ as described in the "Task Allocation" step in Section II.
Parameter | Values
Unit cost of computational energy, $c$ | 1
Effective switch coefficient, $\kappa$ [14] |
Total number of CPU cycles required, $\mu$ |
Valuation of worker $i$, $v_i$ | uniform distribution (see Section V-A)
V-A Monotonic Behaviour of Workers
In the simulations, we consider a uniform distribution for the workers' valuations of the rewards, i.e., the valuations $v_i$ are drawn independently from a uniform distribution. From Figs. 2-5, it can be observed that a worker's CPU power allocation increases monotonically with its valuation. Specifically, the higher the valuation of a worker for the rewards, the larger the amount of CPU power allocated for the computation subtasks. Since the workers are symmetric, i.e., their valuations are drawn from the same distribution, workers with the same valuation contribute the same amount of CPU power.
V-B Winner-take-all
Based on the different reward structures adopted by the master, the workers allocate their CPU power accordingly. Figure 2 shows that when there are 5 workers and only one reward is offered to the worker that allocates the largest amount of CPU power, the worker with the highest valuation of 1, i.e., $v_i = 1$, is only willing to contribute W of CPU power. However, when the master offers multiple rewards, the worker with the same valuation of 1 is willing to contribute as much as W, W and W, as shown in Fig. 2, Fig. 5 and Fig. 5 respectively. With more rewards, the workers have a higher chance of winning one of the rewards. Hence, to incentivize the workers to allocate more CPU power for the computation subtasks, the master is better off offering multiple rewards than a single reward.
V-C Multiple Rewards
Figure 2, Fig. 5 and Fig. 5 show the allocation of CPU power by the workers under the arithmetic and geometric reward sequences. When the difference between consecutive rewards is smaller, the workers are willing to allocate more CPU power. For example, when the difference between consecutive rewards is 0.05, i.e., $\delta = 0.05$, the worker with a valuation of 1 allocates W when there are 15 workers competing for 4 rewards. However, under the same setting of 15 workers competing for 4 rewards, the worker with a valuation of 1 is only willing to allocate W when the difference between consecutive rewards is 0.1.
V-D Effects of Different System Parameters
Apart from the different reward structures, the workers also behave differently under different system parameter values, e.g., the number of workers and the number of rewards.
V-D1 More Workers
When there is only one reward, offered to the worker that allocates the largest amount of CPU power, the workers allocate more CPU power when there are 5 workers than when there are 15 workers. For example, in Fig. 2, the worker with a valuation of allocates W when there are 5 workers but only W when there are 15 workers. When more workers participate in the auction, the competition among the workers is stiffer and the probability of winning the reward decreases. As a result, the workers allocate a smaller amount of CPU power.
However, similar trends are only observed for workers with low valuations, e.g., , when multiple rewards are offered. When the number of workers increases, the workers with low valuations reduce their allocation of CPU power for the computation subtasks. However, this is not observed for workers with high valuations, e.g., . Figure 2, Fig. 5 and Fig. 5 show that the workers with high valuations allocate more CPU power when there are more workers competing for the multiple rewards offered by the master. Specifically, when the master offers 4 rewards with a difference of 0.05 between the consecutive rewards, the worker with a valuation of allocates W when there are 5 workers but allocates W when there are 15 workers. When multiple rewards are offered, since it is possible for the workers to still win one of the remaining rewards even if they do not win the top reward, the workers are more willing to allocate their CPU power for the computation subtasks. Hence, the workers with high valuations allocate more CPU power to increase their chance of winning the top reward.
V-D2 More Rewards
When the number of workers participating in the all-pay auction is fixed, the workers allocate more CPU power when there are more rewards that are offered. It is seen from Fig. 5 that when there are 10 workers in the all-pay auction, the worker with a valuation of allocates CPU power of W when 5 rewards are offered as compared to W and W when 3 and 4 rewards are offered respectively.
VI Conclusion
In this paper, we proposed an all-pay auction to incentivize workers to participate in the coded distributed computation tasks. First, polynomial codes are used to determine the number of rewards to be offered in the all-pay auction. Then, the master determines the reward structure that maximizes its utility given the equilibrium strategies of the workers. For future work, we can consider workers whose valuations are drawn from different probability distributions.
References
- [1] F. Kalim, J. P. Jeong, and M. U. Ilyas, “CRATER: A Crowd Sensing Application to Estimate Road Conditions,” IEEE Access, vol. 4, pp. 8317–8326, 2016.
- [2] J. Dutta, C. Chowdhury, S. Roy, A. I. Middya, and F. Gazi, “Towards Smart City: Sensing Air Quality in City Based on Opportunistic Crowd-Sensing,” in Proceedings of the 18th International Conference on Distributed Computing and Networking, ICDCN ’17, (New York, NY, USA), Association for Computing Machinery, 2017.
- [3] S. Jovanović, M. Jovanović, T. Škorić, S. Jokić, B. Milovanović, K. Katzis, and D. Bajić, “A Mobile Crowd Sensing Application for Hypertensive Patients,” Sensors, vol. 19, no. 2, p. 400, 2019.
- [4] W. Shi and S. Dustdar, “The Promise of Edge Computing,” Computer, vol. 49, no. 5, pp. 78–81, 2016.
- [5] J. S. Ng, W. Y. B. Lim, N. C. Luong, Z. Xiong, A. Asheralieva, D. Niyato, C. Leung, and C. Miao, “A Survey of Coded Distributed Computing,” arXiv preprint arXiv:2008.09048, 2020.
- [6] J. Dean and L. A. Barroso, “The Tail at Scale,” Commun. ACM, vol. 56, p. 74–80, Feb. 2013.
- [7] G. Ananthanarayanan, S. Kandula, A. G. Greenberg, I. Stoica, Y. Lu, B. Saha, and E. Harris, “Reining in the Outliers in Map-Reduce Clusters using Mantri.,” in Osdi, p. 24, 2010.
- [8] Q. Yu, M. Maddah-Ali, and S. Avestimehr, “Polynomial Codes: an Optimal Design for High-Dimensional Coded Matrix Multiplication,” in Advances in Neural Information Processing Systems 30 (I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, eds.), pp. 4403–4413, Curran Associates, Inc., 2017.
- [9] F. Didier, “Efficient Erasure Decoding of Reed-Solomon Codes,” arXiv preprint arXiv:0901.1886, 2009.
- [10] T. Luo, S. K. Das, H. P. Tan, and L. Xia, “Incentive Mechanism Design for Crowdsourcing: An All-Pay Auction Approach,” ACM Trans. Intell. Syst. Technol., vol. 7, Feb. 2016.
- [11] Y. Zhang, J. He, and S. Guo, “Energy-Efficient Dynamic Task Offloading for Energy Harvesting Mobile Cloud Computing,” in 2018 IEEE International Conference on Networking, Architecture and Storage (NAS), pp. 1–4, 2018.
- [12] K. Yoon, “The Optimal Allocation of Prizes in Contests: An Auction Approach,” tech. rep., 2012.
- [13] T. Luo, S. S. Kanhere, S. K. Das, and H.-P. Tan, "Optimal Prizes for All-Pay Contests in Heterogeneous Crowdsourcing," in 2014 IEEE 11th International Conference on Mobile Ad Hoc and Sensor Systems, pp. 136–144, 2014.
- [14] Y. Hao, M. Chen, L. Hu, M. S. Hossain, and A. Ghoneim, “Energy Efficient Task Caching and Offloading for Mobile Edge Computing,” IEEE Access, vol. 6, pp. 11365–11373, 2018.