
University of Washington, Seattle, WA, USA
{sarafa,karlin,jamiemmt}@cs.washington.edu

Competition Alleviates Present Bias in Task Completion

Aditya Saraf (research supported in part by NSF grant CCF-1813135), Anna R. Karlin (research supported by NSF grant CCF-1813135), and Jamie Morgenstern
Abstract

We build upon recent work (Kleinberg and Oren, 2014; Kleinberg et al., 2016, 2017) that considers present biased agents, who place more weight on costs they must incur now than costs they will incur in the future. They consider a graph theoretic model where agents must complete a task and show that present biased agents can take exponentially more expensive paths than optimal. We propose a theoretical model that adds competition into the mix – two agents compete to finish a task first. We show that, in a wide range of settings, a small amount of competition can alleviate the harms of present bias. This can help explain why biased agents may not perform so poorly in naturally competitive settings, and can guide task designers on how to protect present biased agents from harm. Our work thus paints a more positive picture than much of the existing literature on present bias.

Keywords:
present bias · behavioral economics · incentive design

1 Introduction

One of the most influential lines of recent economic research has been behavioral game theory (Ariely and Jones, 2008; Kahneman and Tversky, 1979). The majority of economics research makes several idealized assumptions about the behavior of rational agents to prove mathematical results. Behavioral game theory questions these assumptions and proposes models of agent behavior that more closely align with human behavior. Through experimental research (DellaVigna and Malmendier, 2006; DellaVigna, 2009), behavioral economists have observed and codified several common types of cognitive biases, from loss aversion (Kahneman and Tversky, 1979) (the tendency to prefer avoiding loss to acquiring equivalent gains) to the sunk cost fallacy (Arkes and Blumer, 1985) (the tendency to factor in previous costs when determining the best future course of action) to present bias (Frederick et al., 2002) (the current topic). One primary goal of theorems in game theory is to offer predictive power. This perspective is especially important in the many computer science applications of these results, from modern ad auctions to cryptocurrency protocols. If these theorems are to predict human behavior, the mathematical models ought to include observed human biases. Thus, rather than viewing behavioral game theory as conflicting with the standard mathematical approach, the experimental results of behavioral game theory can inform more sophisticated mathematical models. This paper takes a step towards this goal, building on seminal work of Kleinberg and Oren (2014) and Kleinberg et al. (2016, 2017), who formulated a mathematical model for planning problems where agents are present biased.

Present bias refers to overweighting immediate costs relative to future costs. This is a ubiquitous bias in human behavior that explains diverse phenomena. The most natural example is procrastination, the familiar desire to delay difficult work, even when this predictably leads to negative consequences later. Present bias can also model the tendency of firms to prefer immediate gains to long-term gains and the tendency of politicians to prefer immediate results to long-term plans. One simple model of present bias (Kleinberg and Oren, 2014; Kleinberg et al., 2016, 2017) is to multiply costs in the current time period by a present bias parameter $b$ when making plans. This model is a special case of hyperbolic discounting, where costs are discounted in proportion to how much later one would experience them. But even this special case suffices to induce time-inconsistency, resulting in a rich set of strategies consistent with human behavior.

Examples of time inconsistent behavior extend beyond procrastination. For example, one might undertake a project, and abandon it partway through, despite the underlying cost structure remaining unchanged. One might fail to complete a course with no deadlines, but pass the same course with weekly deadlines. Many people pay for a gym membership but never use it. Kleinberg and Oren (2014) presented the key insight that this diverse range of phenomena can all be expressed in a single graph-theoretic framework, which we describe below.

Fix a directed, acyclic graph $G$, with designated source $s$ and sink $t$. Refer to $G$ as a task graph, where $s$ is the start of the task and $t$ the end. A path through this graph corresponds to a plan to complete the task; each edge represents one step of the plan. Each edge has a weight corresponding to the cost of that step.

The goal of an agent is to complete the task while incurring the least cost (i.e., to take the cheapest path from $s$ to $t$). An optimal agent will simply follow such a cheapest path. A naive present biased agent with bias parameter $b$ behaves as follows. At $s$, they compute their perceived cost for taking each path to $t$ by summing the weights along this path with the first edge scaled up by $b>1$. They choose the path with the lowest perceived cost and take one step along this path, say to $v$, and then recompute their perceived cost for each path from $v$ to $t$. Notice that such an agent may choose a path at $s$, take one edge along that path, and then deviate away from it. This occurs because the agent believes that, after the current choice of edge, they will pick the path from $v$ to $t$ with lowest true cost. But once they arrive at $v$, their perceived cost of a path differs from the true cost, and they pick a path with lowest perceived cost. This is why the agents are considered naive: they incorrectly assume their future self will behave optimally, and thus exhibit time-inconsistent behavior. See Figure 1 for an example.

[Figure 1: task graph on vertices $s, x, v, y, z, t$; see caption.]
Figure 1: The optimal path is $(s,x,t)$ with total cost 6. However, an agent with bias $b=2$ will take path $(s,v,z,t)$, with cost 21. Importantly, when the agent is deciding which vertex to move to from $s$, they evaluate $x$ as having total cost 12, while $v$ has total cost 11. This is because they assume they will behave optimally at $v$ by taking path $(v,y,t)$. However, they apply the same bias at $v$ and deviate to the worst possible path.
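
To make the traversal rule concrete, the following sketch simulates a naive present-biased agent on a small graph. The edge costs below are a reconstruction chosen to be consistent with the numbers in Figure 1's caption (they are an assumption, not necessarily the exact figure), and the code itself is illustrative rather than taken from the paper.

# Hypothetical reconstruction of Figure 1's edge costs (consistent with the caption).
edges = {
    ('s', 'x'): 6, ('x', 't'): 0,
    ('s', 'v'): 0, ('v', 'y'): 11, ('y', 't'): 0,
    ('v', 'z'): 0, ('z', 't'): 21,
}

def successors(u):
    return [(v, c) for (a, v), c in edges.items() if a == u]

def cheapest_cost(u, t='t'):
    """True cost of the cheapest u -> t path (tiny DAG, so plain recursion is fine)."""
    if u == t:
        return 0
    return min(c + cheapest_cost(v, t) for v, c in successors(u))

def naive_walk(b, s='s', t='t'):
    """Naive biased agent: scale only the next edge by b, and assume (incorrectly)
    that the continuation will be the true cheapest path."""
    path, u = [s], s
    while u != t:
        u = min(successors(u), key=lambda vc: b * vc[1] + cheapest_cost(vc[0]))[0]
        path.append(u)
    return path

print(cheapest_cost('s'))   # 6, along (s, x, t)
print(naive_walk(b=2))      # ['s', 'v', 'z', 't'], total cost 21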

The power of this graph theoretic model is that it allows us to answer questions over a range of planning problems, and to formally investigate which tasks represent the “worst-case” cost of procrastination. This is useful both to understand how present-biased behavior differs from optimal behavior and to design tasks to accommodate present bias. We now briefly summarize the existing literature, to motivate our introduction of competition to the model.

1.1 Prior Work

The most striking result is that there are graphs where the cost ratio (the ratio of the biased agent's cost to the optimal agent's cost) is exponential in the size of the graph. In addition, all graphs with exponential cost ratio have a shared structure – they all have a large $n$-fan as a graph minor (and graphs without exponential cost ratio do not) (Kleinberg and Oren, 2014; Tang et al., 2017). So this structure encodes the worst-case behavior for present bias in the standard model (and we later show how competition is especially effective in this graph). An $n$-fan is pictured in Figure 2.

[Figure 2: the $n$-fan. Vertices $s, v_1, \dots, v_n, t$; the edge $(s,t)$ has cost 1, the edges $(s,v_1)$ and $(v_i,v_{i+1})$ have cost 0, and each edge $(v_i,t)$ has cost $c^i$.]
Figure 2: A naive agent with bias $b>c>1$ will continually choose to delay finishing the task.

The exponential cost ratio demonstrates the severe harm caused by present bias. How, then, can designers of a task limit the negative effects of present bias? Kleinberg and Oren (2014) propose a model where a reward is given after finishing the task, and where the agent will abandon the task if, at any point, they perceive the remaining cost to be higher than the reward. Unlike an optimal agent, a biased agent may abandon a task partway through. Figure 3 uses the gym membership example from earlier to show this. To address this, they give the task designer the power to arbitrarily delete vertices and edges, which can model deadlines, as Figure 4 shows. They then investigate the structure of minimally motivating subgraphs – the smallest subgraph where the agent completes the task, for some fixed reward. Follow-up work of Tang et al. (2017) shows that finding any motivating subgraph is NP-hard. Instead of deleting edges, Albers and Kraft (2019) consider the problem of spreading a fixed reward onto arbitrary vertices to motivate an agent, and find that this too is NP-hard (with a constrained budget). For other recent work involving present bias, see (Gravin et al., 2016; Albers and Kraft, 2017; Yan and Yong, 2019; Oren and Soker, 2019; Ma et al., 2019).

[Figure 3: two-edge graph $s \to v \to t$ with $c(s,v)=2$ and $c(v,t)=6$.]
Figure 3: Let $(s,v)$ represent buying a gym membership and $(v,t)$ represent working out regularly for a month (Roughgarden, 2016). At $t$, the agent receives a reward of 11 due to health benefits. With bias $b=2$, the agent initially believes this task is worth completing, but due to his bias, abandons the task at vertex $v$, after having already purchased the membership. This same example works more generally for project abandonment, where $(s,v)$ could represent the easier planning/conceptual stage while $(v,t)$ represents the more difficult execution.
[Figure 4: three-week course graph on vertices $s$, $v_{i,j}$ (week $i$ with $j$ assignments completed), and $t$; see caption.]
Figure 4: Roughgarden (2016) gave this example to represent a three week course with two assignments. Let $v_{i,j}$ correspond to being in week $i$ with $j$ assignments completed, and suppose the reward of the course is 9. Note that a student with bias $b=2$ will procrastinate until $v_{2,0}$ and then ultimately abandon the course. However, if the instructor required the first assignment to be completed by week 2 (which corresponds to removing $v_{2,0}$), the student would complete the course.

The above results all focus on accommodating present bias rather than alleviating it. By that, we mean that the approaches all focus on whether the agent can be convinced to complete the task – via edge deletion or reward dispersal – but not on guarding the agent from suboptimal behavior induced by their bias. Kleinberg et al. (2016) partially investigates the latter question in a model involving sophisticated agents, who plan around their present bias. They consider several types of commitment devices – tools by which sophisticated agents can constrain their future selves. However, these tools may require more powerful agents or designers and don’t necessarily make sense for naive agents. We take a different approach – we show that adding competition can simultaneously explain why present-biased agents may not perform exponentially poorly in “natural” games and guide task designers in encouraging biased agents towards optimal behavior.

1.2 Our Model

In our model, a task is still represented by a directed, acyclic graph $G$, with a designated source $s$ and sink $t$. There are two naive present-biased agents, $A_{1}$ and $A_{2}$, both with bias $b$, who compete to get to $t$ first. The cost of a path is the sum of the weights along the path, and time is represented by the number of edges in the path, which we call the length of the path. In other words, each edge represents one unit of time. The first agent to get to $t$ gets a reward of $r$; ties are resolved by evenly splitting the reward. Recall that naive agents believe that they will behave optimally in the future. Thus, an agent currently at $u$ and planning to move to a neighbor $v$ considers the cost to reach the target $t$ to be $b\cdot c(u,v)$ plus the cost of the optimal path from $v$ to $t$, minus the reward of that path. More formally, let $\mathcal{P}(v\to t)$ denote the set of paths from $v$ to $t$ and let $P(s\to u)$ denote the path the agent has taken to $u$. Let $C_{n}(u,v)$ denote the remaining cost that the naive agent believes they will incur while at $u$ and planning to go to $v$. The subscript $n$ stands for “naive” (to help distinguish from $c(u,v)$, the cost of the edge $(u,v)$). Then:

$$C_{n}(u,v)=b\cdot c(u,v)+\min_{P(v\to t)\in\mathcal{P}(v\to t)}c(P(v\to t))-R_{A_{2}}(P(s\to u)\cup(u,v)\cup P(v\to t)), \qquad (1)$$

where $c(P)=\sum_{e\in P}c(e)$ denotes the cost of path $P$ and $R_{A_{2}}(P)$ denotes the reward of taking path $P$ from $s$ to $t$. This reward depends on the path the other agent $A_{2}$ takes. Specifically, if $A_{2}$ takes a path of length $k$, and $P(s\to u)\cup(u,v)\cup P(v\to t)$ is a path of length $\ell$, then

$$R_{A_{2}}(P(s\to u)\cup(u,v)\cup P(v\to t))=\begin{cases}r&\ell<k\\ r/2&\ell=k\\ 0&\text{otherwise}.\end{cases}$$

We will often rewrite the second term in (1), for ease of notation, as $\min_{P_{v}}c(P_{v})-R_{A_{2}}(P_{s\to u,v\to t})$. We sometimes refer instead to the naive agent's utility, which is the negation of this cost. Given this cost function, the naive agent chooses the successor of node $u$ via

$$S(u)=\operatorname*{argmin}_{v:(u,v)\in E}C_{n}(u,v).$$

See Figure 5 for an example.

[Figure 5: example graph on vertices $s, u, x, y, v, z, t$; see caption.]
Figure 5: Suppose $r=5$, the bias $b=2$, and assume $A_{2}$ takes path $(s,u,x,t)$. Then at $s$, $A_{1}$ prefers to take $u$ for perceived cost $4+4+5-2.5=10.5$. Notice that, due to the reward, the path $A_{1}$ believes he will take from $u$ is $(u,x,t)$, despite $(u,y,z,t)$ having lower cost. However, at $u$, $A_{1}$ evaluates the lower path to be cheaper, despite losing the race. This shows that a reward of 5 does not ensure a Nash equilibrium on $(s,u,x,t)$ when $b=2$.
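
As a concrete illustration of the cost function $C_n$ and the successor rule $S(u)$, the following sketch traces a naive biased agent racing an opponent committed to a fixed path. The tiny graph, its costs, and the parameter values are hypothetical (they are not the instance in Figure 5); the point is only how the race reward enters the perceived cost.

# Hypothetical graph with a dominant path (s, t): cheapest and uniquely quickest.
edges = {('s', 't'): 4, ('s', 'm'): 0, ('m', 't'): 5}

def paths(u, t='t'):
    """All u -> t continuations as (cost, number of edges) pairs."""
    if u == t:
        return [(0, 0)]
    return [(c + pc, 1 + pl)
            for (a, v), c in edges.items() if a == u
            for pc, pl in paths(v, t)]

def prize(my_len, opp_len, r):
    return r if my_len < opp_len else (r / 2 if my_len == opp_len else 0)

def C_n(u, v, steps_taken, b, r, opp_len):
    """Perceived remaining cost at u when stepping to v, as in Equation (1)."""
    return b * edges[(u, v)] + min(pc - prize(steps_taken + 1 + pl, opp_len, r)
                                   for pc, pl in paths(v))

def biased_race(b, r, opp_len, s='s', t='t'):
    path, u = [s], s
    while u != t:
        nbrs = [v for (a, v) in edges if a == u]
        u = min(nbrs, key=lambda v: C_n(u, v, len(path) - 1, b, r, opp_len))
        path.append(u)
    return path

# Against an opponent taking the direct edge (1 step):
print(biased_race(b=2, r=0, opp_len=1))  # ['s', 'm', 't']: bias pushes the agent off the dominant path
print(biased_race(b=2, r=7, opp_len=1))  # ['s', 't']: a modest reward restores optimal behavior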

We now consider how this model of competition might both explain the outcomes of natural games and inform task designers on how to elicit better behavior from biased agents. For a natural game, consider the classic example of two companies competing to expand into a new market. Both companies want to launch a similar product, and are thus considering the same task graph $G$. The companies are also present biased, since shareholders often prefer immediate profit maximization/loss minimization over long term optimal behavior. The first company to enter the market gains an insurmountable advantage, represented by reward $r$. If the companies both enter the market at the same time, they split the market share, each getting reward $r/2$. This arrangement can be modeled within our framework, and the competition between the companies should lead them to play a set of equilibrium strategies.

For a designed game, consider the problem of encouraging students to submit final projects before they are due. The instructor sets a deadline near the end of finals week to give students flexibility to complete the project when it best fits their schedule. The instructor also knows that (1) students tend to procrastinate and (2) trying to complete the final project in a few days is much more challenging than spreading it out. They would like to convince students to work on and possibly submit their assignments early, without changing the deadline (to allow flexibility for the students whom it suits best). One possible solution would be to give a small amount of extra credit to the first submission. How might they set this reward to encourage early submissions?

For another example of a designed game, consider a gym that enjoys increased enrollment at the start of the year. However, their new customers tend to not visit the gym frequently, and eventually cancel their membership. To remedy this, they try a new promotion, where new customers are offered free fitness classes, and the first customer(s) to attend 5 such classes gets a discount on their annual membership. How effective might such a scheme be at encouraging their new customers to regularly use their gym? Across these three examples, the intuition is that competition will alleviate the harms of present bias by driving agents towards optimal behavior.

1.3 Summary of Results

We have introduced a model of competition for completing tasks along some graph. We warm up by analyzing these games absent present bias. Namely, we classify all Nash equilibria for an arbitrary task graph with unbiased agents. This analysis involves characterizing the set of paths which are the cheapest of a given length.

We then analyze these games when agents have equal present bias. We show that a very small reward induces a Nash equilibrium on the optimal path, for any graph with a dominant path. This is a substantial improvement over the exponential worst case cost ratio experienced without competition. We then discuss how time-inconsistency defeats the intuition that higher rewards cause agents to prefer quicker paths. Despite this complication, we provide an algorithm that, given an arbitrary graph $G$ and path $Q$, determines the minimum reward needed to get a Nash equilibrium on $Q$, if possible.

Finally, we add an element of bias uncertainty to the model, by drawing agents’ biases iid from a distribution $F$ and, for the $n$-fan, describe the relationship between $F$ and the reward required for a Bayes-Nash equilibrium on the optimal path. For a wide range of distributions, we find small rewards suffice to ensure that agents behave optimally (with high probability) in equilibrium. For the stronger goal of ensuring a constant expected cost ratio, it suffices to offer reward linear in $n$ when $F$ is not heavy-tailed; competition thus helps here as well.

2 Nash Equilibria with Unbiased Competitors

To build intuition, we first describe the Nash equilibria of these games when agents have no present bias. We also pinpoint where the analysis will change with the introduction of bias. Notice that each path $P$ in the graph is a strategy, with payoffs either $u_{w}=r-c(P)$, $u_{t}=r/2-c(P)$ or $u_{l}=-c(P)$, depending on whether the agent wins, ties or loses, respectively. (These in turn depend on the path taken by the opponent.) We first rule out dominated paths. Notice that if $u_{l}(P)\geq u_{w}(P^{\prime})$, path $P^{\prime}$ is dominated by path $P$, regardless of the path taken by the opponent. Also, if $u_{w}(P)\geq u_{w}(P^{\prime})$ and $|P|\leq|P^{\prime}|$ (where $|P|$ is the number of edges in $P$), then $P^{\prime}$ is dominated. Therefore, for any length $k$, a single cheapest path of length $k$ will (weakly) dominate all other paths of length $k$.

For a given graph $G$, let $P_{1},\ldots,P_{n}$ be a minimal set of non-dominated paths, where $|P_{i}|<|P_{i+1}|$ for each $1\leq i<n$. Thus, $P_{1}$ is the quickest path, i.e., the remaining (non-dominated) path of minimum length. Summarizing what we know about these paths:

  1. Winning is better than losing: for any pair of paths $(P_{i},P_{j})$, we know that $u_{w}(P_{i})>u_{l}(P_{j})$. Thus, in particular, $c(P_{1})-c(P_{n})\leq r$.

  2. Longer paths are cheaper: that is, $c(P_{i})>c(P_{i+1})$ for each $i$. Otherwise, $P_{i+1}$ would be dominated since its length is greater. Therefore, in particular, $P_{n}$ is the cheapest path, i.e., the lowest cost/weight path from $s$ to $t$.

We’re interested in characterizing, across all possible task graphs, the pure Nash equilibria, restricting attention to paths in $P_{1},\ldots,P_{n}$.

Proposition 1

Let $G$ be an arbitrary task graph. As above, let $P_{1},\dots,P_{n}$ be a minimal set of (non-dominated) paths ordered so $P_{1}$ is the quickest and $P_{n}$ the cheapest. Suppose $n\geq 3$. Then, path $P_{i}$, where $i>1$, is a symmetric Nash equilibrium if and only if $c(P_{i-1})-c(P_{i})\geq r/2$. $P_{1}$ is a symmetric Nash if and only if $c(P_{1})-c(P_{n})\leq r/2$. There are no other pure Nash equilibria. Therefore, there can be either 0, 1 or 2 pure Nash equilibria.

If $n=2$, there is an additional asymmetric pure Nash equilibrium where one player plays the quickest path $P_{1}$ and the other plays the cheapest path $P_{2}$ if $c(P_{1})-c(P_{2})=r/2$.

Proof

First, suppose $n\geq 3$ and the opponent picks $P_{i}$, where $i>1$. If the agent plays $P_{i-1}$, they get $u_{w}(P_{i-1})$. Since winning is always better than losing, the agent need not consider any $P_{>i}$. Similarly, since longer paths are cheaper, the agent will not take any $P_{<i-1}$. Thus, the only choices are between tying on $P_{i}$ or winning on $P_{i-1}$, so $P_{i}$ is a symmetric Nash exactly when $u_{t}(P_{i})\geq u_{w}(P_{i-1})$, or equivalently $c(P_{i-1})-c(P_{i})\geq r/2$. Similarly, if the opponent plays $P_{1}$, the agent chooses between tying on $P_{1}$ or losing; in the latter case, losing on the cheapest path $P_{n}$ gives highest utility. The agents therefore have a symmetric Nash exactly when tying on $P_{1}$ is better than losing on $P_{n}$, or $c(P_{1})-c(P_{n})\leq r/2$.

The only strategy profile we haven’t considered is $(P_{i-1},P_{i})$; when might this be a Nash equilibrium? Suppose the opponent plays $P_{i-1}$. Either $i-1>1$, and so the best response is $P_{i-2}$ or $P_{i-1}$, or $i-1=1$, and so the best response is $P_{1}$ or $P_{n}$. If $n\geq 3$, then $P_{2}\neq P_{n}$ and so $(P_{i-1},P_{i})$ is not an equilibrium. Otherwise, if $n=2$, then $P_{2}=P_{n}$ and so we get an asymmetric Nash when tying is dominated, i.e. $u_{t}(P_{1})\leq u_{l}(P_{2})$ and $u_{w}(P_{1})\geq u_{t}(P_{2})$. This is the same as $c(P_{1})-c(P_{2})=r/2$. ∎

If we apply the same reasoning when the tie-breaking rule is different, for example, when ties are resolved by both players getting the full reward, any $P_{i}$ is a (symmetric) pure Nash (and there are no other pure Nash). On the other hand, if ties result in neither player receiving any reward, the same reasoning implies that we only have a pure (asymmetric) Nash if $n=2$; if $n\geq 3$, there are no pure Nash.
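
The characterization in Proposition 1 is easy to sanity-check by brute force. The sketch below enumerates symmetric equilibria over a small, hypothetical list of non-dominated paths (each represented only by its cost and length); the specific numbers are illustrative assumptions.

def utility(i, j, paths, r):
    """Utility of playing paths[i] when the opponent plays paths[j]."""
    (ci, li), (cj, lj) = paths[i], paths[j]
    prize = r if li < lj else (r / 2 if li == lj else 0)
    return prize - ci

def symmetric_nash(paths, r):
    """Indices i (0-based) such that (P_{i+1}, P_{i+1}) is a pure Nash equilibrium."""
    n = len(paths)
    return [i for i in range(n)
            if all(utility(i, i, paths, r) >= utility(k, i, paths, r) for k in range(n))]

# Hypothetical non-dominated paths as (cost, length), ordered by increasing length:
# P_1 = (10, 2) is quickest, P_3 = (1, 4) is cheapest.
paths, r = [(10, 2), (6, 3), (1, 4)], 10
print(symmetric_nash(paths, r))   # [2], i.e. only P_3; this matches Proposition 1:
# c(P_2) - c(P_3) = 5 >= r/2, while c(P_1) - c(P_2) = 4 < r/2 and c(P_1) - c(P_3) = 9 > r/2.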

We next turn our attention to the biased version of this problem. In the unbiased case, we could take a “global” view of the graph, and think about paths purely in terms of their overall length and cost. But when agents are biased, the actual structure of the path is very important; time-inconsistency means that agents look at paths locally, not globally. It is thus very difficult to cleanly rule out dominated paths – even paths with exponentially high cost may be taken, as we see next.

3 Nash Equilibria to Elicit Optimal Behavior from Biased Agents

We now assume that the agents are both naive, present biased agents, with shared bias parameter $b$ (footnote: the homogeneity of the agents is not particularly important to our results in this section; if the agents have different bias parameters, our results go through by setting $b$ equal to the larger of the two biases). Our high-level goal is to show that competition convinces biased agents to take cheap paths, as unbiased agents do without competition. To this end, we show that a small amount of reward creates a Nash equilibrium on the cheapest path, first on the $n$-fan, and then for all graphs which have a dominant path – a cheapest path that is also the uniquely quickest path.

3.1 The $n$-fan

We focus first on the $n$-fan since all graphs with exponential cost ratio for naive agents have a large $k$-fan as a graph minor (Kleinberg and Oren, 2014). For an example of the $n$-fan, refer back to Figure 2; note in particular that we assume the bias parameter $b>c>1$. We show that, with a very small amount of reward, competition encourages optimal behavior in naive agents (and further, we fully characterize all equilibria on the $n$-fan). To aid our proofs and discussions regarding the $n$-fan, we define $P_{i}$ as the path from $s$ to $t$ containing edge $(v_{i},t)$. Define $P_{0}$ as the direct path $(s,t)$.

Theorem 3.1

Let $G$ be an $n$-fan, and for any bias $b>c$, define $\varepsilon=b-c$. Then, with a reward of $r\geq 2\varepsilon$, there will be a Nash equilibrium on the optimal path, for two agents with bias $b$. Further, if $r\leq 2\varepsilon c^{n-1}$, there is also a Nash equilibrium on the longest path. There are no other pure Nash equilibria, for any value of $r$.

Proof

We first show that $r\geq 2\varepsilon$ guarantees the existence (not uniqueness) of an optimal pure Nash equilibrium. Suppose $A_{2}$ takes the optimal path $P_{0}$. While standing at $s$, $A_{1}$ evaluates the cost to $t$ as $b-r/2$, i.e. $C_{n}(s,t)=b-r/2$. Similarly, $C_{n}(s,v_{1})=c$, so with $r\geq 2\varepsilon$, $A_{1}$ (weakly) prefers to go directly to $t$. Using a similar calculation, we can see that when $r\leq 2\varepsilon c^{n-1}$, there is a Nash equilibrium on the longest path, $P_{n}$. This result is somewhat surprising – even with an extremely high reward, the naive agents may take the longest path. To see why, note that at any vertex $v_{i}$, the naive agent is only comparing two options, going to $t$ right now or waiting one step and going to $t$ (since they naively believe they will behave optimally in the future). Further, when $A_{2}$ takes the longest path, $A_{1}$ gets the same reward from $(v_{i},t)$ and $(v_{i+1},t)$. Thus, just as the agent prefers to procrastinate in the $n$-fan, they similarly procrastinate here until $v_{n-1}$, where the reward must be very large for them to deviate from the longest path.

To show that there are no other Nash equilibria, suppose $A_{2}$ takes $P_{i}$, where $1\leq i<n$. As we explained above, since $A_{1}$ gets the same reward from paths $P_{1}$ to $P_{i-1}$ and the same reward from paths $P_{i+1}$ to $P_{n}$, the agent will always procrastinate to at least $v_{i-1}$, and if he chooses to go past $v_{i}$, he will procrastinate until $v_{n}$. Now, suppose $A_{1}$ is standing at $v_{i-1}$. He will take path $P_{i-1}$ if $bc^{i-1}-r<c^{i}-r/2$, which implies that $r>2\varepsilon c^{i-1}$. Either the reward satisfies this, which shows that $P_{i}$ is not a Nash equilibrium, or $r\leq 2\varepsilon c^{i-1}$, and $A_{1}$ continues to $v_{i}$. Suppose he continues to $v_{i}$. Then, he (weakly) prefers $P_{i}$ if $bc^{i}-r/2\leq c^{i+1}$, which implies that $r\geq 2\varepsilon c^{i}$. However, $r\leq 2\varepsilon c^{i-1}<2\varepsilon c^{i}$, so $r$ does not satisfy this. Thus, $A_{1}$ prefers to procrastinate until $v_{n}$. In summary, if $A_{2}$ takes path $P_{i}$, $A_{1}$ either prefers $P_{i-1}$ or $P_{n}$ (depending on the reward). The same argument applies to $A_{2}$, so there are no Nash equilibria on paths $P_{1},\dots,P_{n-1}$, regardless of $r$. ∎

The ability to enable optimal behavior with just a constant reward stands in striking contrast to the setting where agents can abandon the task at any time. With neither competition nor internal rewards, the designer would have to place an exponentially high reward on $t$ (i.e. $r\geq c^{n}$) for biased agents to finish the task (footnote: however, with internal edge rewards, the designer could spend the same total reward to motivate two agents by directly adding a reward to the $(s,t)$ edge; this equivalence doesn’t hold in all graphs, as we show in the next section).
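
Theorem 3.1's thresholds are easy to check numerically. The sketch below simulates a naive biased agent's step-by-step decisions on the $n$-fan against an opponent with a fixed path length; the parameter values are arbitrary illustrative choices.

def fan_race(n, c, b, r, opp_edges):
    """Index i of the path P_i a naive agent with bias b takes on the n-fan,
    racing an opponent whose path uses opp_edges edges.  On the n-fan the naive
    plan from any vertex is to finish at the very next opportunity, so each step
    reduces to comparing 'finish now' against 'wait one step, then finish'."""
    def prize(edges):
        return r if edges < opp_edges else (r / 2 if edges == opp_edges else 0)
    for i in range(n):                            # i = 0: at s; i >= 1: at v_i
        finish_now = b * (1 if i == 0 else c ** i) - prize(i + 1)
        wait_once  = c ** (i + 1) - prize(i + 2)
        if finish_now <= wait_once:
            return i                              # takes path P_i
    return n                                      # procrastinated all the way to v_n

n, c, b = 6, 2.0, 2.5
eps = b - c
# Against an opponent on the optimal path P_0 (1 edge):
print(fan_race(n, c, b, r=2 * eps, opp_edges=1))              # 0: optimal behavior
print(fan_race(n, c, b, r=2 * eps - 0.1, opp_edges=1))        # 6: procrastinates to v_n
# Against an opponent on the longest path P_n (n + 1 edges), even a huge reward
# (just under 2 * eps * c^(n-1)) does not break the equilibrium on P_n:
print(fan_race(n, c, b, r=1.9 * eps * c ** (n - 1), opp_edges=n + 1))  # 6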

3.2 Graphs with an Unbiased Dominant Strategy

We now generalize our results to a larger category of graphs. To focus exclusively on the irrationality of present bias rather than the optimization problem of choosing between cheap, long paths and short, expensive paths, we focus on graphs with a dominant path – a cheapest path (footnote: there may be other cheapest paths which are longer) that is also the uniquely quickest path. This is the case in the $n$-fan. In this setting, the problem is trivial for unbiased agents; simply take this dominant path. But for biased agents, the problem is still interesting; as the example of the $n$-fan shows, they may take paths that are exponentially more costly than the dominant path. However, we prove that a small amount of competition and reward suffices to ensure the existence of a Nash equilibrium where both agents take the dominant path.

Theorem 3.2

Suppose $G$ is a task graph that has a dominant path, $O$. Then, a reward of $r\geq 2b\cdot\max_{e\in O}c(e)$ guarantees a Nash equilibrium on $O$, for two agents with bias $b$.

Proof

Assume that $A_{2}$ takes $O$. Recall that a biased agent at $u$, planning to move to $v$, perceives the remaining traversal cost as:

$$C_{n}(u,v)=b\cdot c(u,v)+\min_{P_{v}}c(P_{v})-R_{A_{2}}(P_{s\to u,v\to t}).$$

We know that for any vertex $v^{*}$ on the dominant path, the path that minimizes the second term is just the fragment of the dominant path from $v^{*}$ to $t$ (it is both the quickest and cheapest way to get from $v^{*}$ to $t$). Further, any deviation from the dominant path results in no reward. So, for any $v$ not on the dominant path, the path that minimizes the second term is again the cheapest path from $v$ to $t$. Thus, the cost equation simplifies to:

$$C_{n}(u,v)=b\cdot c(u,v)+d(v)-r/2\cdot\mathbf{1}\{D\},$$

where $d(v)$, the distance from $v$ to $t$, denotes the cost of the cheapest path from $v$ to $t$ (ignoring rewards) and $\mathbf{1}\{D\}$ is simply an indicator variable that’s 1 if the agent has not deviated from the dominant path.

Now, let $O=(s=v_{0}^{*},v_{1}^{*},v_{2}^{*},\dots,t=v_{l}^{*})$ be the dominant path and suppose $A_{2}$ takes this path. In order for $A_{1}$ to choose $O$, we require, for all $i$:

$$S(v_{i}^{*})=v_{i+1}^{*}$$
$$\iff v_{i+1}^{*}=\operatorname*{argmin}_{v:(v_{i}^{*},v)\in E}\; b\cdot c(v_{i}^{*},v)+d(v)-r/2\cdot\mathbf{1}\{D\}$$
$$\iff \forall v:(v_{i}^{*},v)\in E,\quad bc(v_{i}^{*},v)+d(v)\geq bc(v_{i}^{*},v_{i+1}^{*})+d(v_{i+1}^{*})-r/2$$

Now, let $i$ be arbitrary, let $v\neq v_{i}^{*}$ be an arbitrary neighbor of $v_{i}^{*}$, and for ease of notation, let $c=c(v_{i}^{*},v)$, $c^{*}=c(v_{i}^{*},v_{i+1}^{*})$, $d=d(v)$, and $d^{*}=d(v_{i+1}^{*})$. Then, we get the following bound on the reward: $r/2\geq b(c^{*}-c)+d^{*}-d$. To get a rough sufficient bound, notice that $c+d\geq c^{*}+d^{*}$, since $O$ is the cheapest path. This implies that $bc^{*}>b(c^{*}-c)+d^{*}-d$. Thus, it suffices to set $r\geq 2bc^{*}$ in order to ensure $S(v_{i}^{*})=v_{i+1}^{*}$. Repeating this argument for all $i$, we see that a sufficient reward is $r=2b\cdot\max_{e\in O}c(e)$. ∎

One might object to our claim that $r=2b\max_{e\in O}c(e)$ is “small”. To calibrate our expectations, notice that we can view this problem as an agent trying to pick between several options (i.e. paths), each with their own reward and cost structure. We want to convince the agent to pick one particular option – namely, the cheapest path. But it would be unreasonable to expect that a reward significantly smaller than the cost of any option would sway the agent’s decision. Our theorem above shows that a reward that is at most proportional to the cheapest option suffices; and in many cases the reward is only a fraction of the cost of the optimal path (e.g. when the optimal path has balanced cost among many edges).
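
To see the gap between the sufficient bound in Theorem 3.2 and what is actually needed, the following sketch compares $2b\cdot\max_{e\in O}c(e)$ against the per-vertex requirement $r/2\geq b(c^{*}-c)+d^{*}-d$ from the proof, on a small hypothetical graph (the edge costs below are illustrative assumptions).

import functools

# Hypothetical graph whose dominant path is O = (s, a, t) (cost 4, two edges);
# the detour (s, x, y, t) is cheaper to start but slower and pricier overall.
edges = {('s', 'a'): 3, ('a', 't'): 1,
         ('s', 'x'): 0, ('x', 'y'): 0, ('y', 't'): 5}

@functools.lru_cache(maxsize=None)
def d(u, t='t'):
    """Cheapest cost from u to t, ignoring rewards."""
    if u == t:
        return 0
    return min(c + d(v) for (a, v), c in edges.items() if a == u)

def per_vertex_reward(O, b):
    """Smallest r satisfying r/2 >= b(c* - c) + d* - d at every vertex of O."""
    need = 0.0
    for u, v_star in zip(O, O[1:]):
        c_star, d_star = edges[(u, v_star)], d(v_star)
        for (a, v), c in edges.items():
            if a == u and v != v_star:
                need = max(need, 2 * (b * (c_star - c) + d_star - d(v)))
    return need

b, O = 2, ('s', 'a', 't')
print(2 * b * max(edges[e] for e in zip(O, O[1:])))   # 12: sufficient bound from the theorem
print(per_vertex_reward(O, b))                        # 4: what this particular graph needs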

For a point of comparison, even internal edge rewards (which required the same reward budget as competitive rewards for the $n$-fan) can require $O(n)$ times as much reward in some instances. To see the intuition, notice that internal edge rewards must be applied at every step where the agent might want to deviate. The agent also immediately “consumes” this reward; it doesn’t impact his future decision making. However, the competitive reward is “at stake” whenever the agent considers deviating; this reward can sway the agent’s behavior without being immediately consumed. For a concrete example, see Figure 6.

[Figure 6: a path $s, v_1, \dots, v_n, t$ with a suboptimal deviation through $u_i$ available at each $v_i$; see caption.]
Figure 6: A graph with many suboptimal deviations. For an agent with bias $b>2$, a designer with access to only edge rewards must spend $O(n)$ total reward for optimal behavior ($b-2$ on each $(v_{i},v_{i+1})$ edge). In our competitive setting, only $2(b-2)$ total reward is required.

3.3 Increasing the Number of Competitors

A very natural extension to this model would involve more than 2 agents competing. The winner takes all, and ties are split evenly among those who tied. However, this modification doesn’t change much when trying to get a Nash equilibrium on the dominant path. The only change is that if $m$ agents are competing, the reward needed is $O(m)$, as a single agent will get a $1/m$ fraction of the reward in a symmetric equilibrium. This is true because there is no way for any agent to beat the dominant path, and claim the entire $O(m)$ reward for themselves. So if the reward is scaled appropriately (i.e. in Theorem 3.2 set $r\geq mb\cdot\max_{e\in O}c(e)$), we will still guarantee a Nash equilibrium. Put another way, the per-agent reward needed for a Nash equilibrium on the dominant path does not change as the number of competitors varies.

4 General Nash Equilibria

In this section, we describe a polynomial time algorithm that, given an arbitrary graph $G$, path $Q$ and bias $b$, determines if $Q$ can be made a Nash equilibrium, and if so, the minimum required reward to do so. Finding and using this minimum required reward will generally cost much less than the bound given by Theorem 3.2. Moreover, this algorithm does not assume the existence of a dominant path. We start by describing how time-inconsistency defeats the intuition that higher rewards cause agents to prefer quicker paths. We then present an algorithm which computes the minimum $r$ that results in $Q$ being a symmetric Nash equilibrium (footnote: in fact, the algorithm will return a set of $O(n)$ disjoint intervals containing all values of $r$ that result in $Q$ being a symmetric Nash equilibrium).

4.1 Higher Rewards Need Not Encourage Quicker Paths

The proof of Theorem 3.2 suggests the following algorithm for this problem. Start with a reward of 0, and step along each vertex $u\in Q$, increasing the reward by just enough to ensure $A_{1}$ stays on $Q$ for one more step (assuming $A_{2}$ is taking $Q$). However, if $A_{1}$ wants to deviate onto a quicker/tied path at any point, return $\bot$; decreasing the reward would cause them to deviate earlier, and, intuitively, it seems that increasing the reward could not cause them to switch back to $Q$ from the quicker/tied path. After one pass, simply pass through again to ensure that the final reward doesn’t cause $A_{1}$ to deviate early on. The following lemma shows that this algorithm is tractable, by showing that we can compute the minimum reward required for $A_{1}$ to stay on $Q$ at any step (and determine whether $A_{1}$ wants to deviate onto a quicker path).

Lemma 1

Assuming that $A_{2}$ takes path $Q$, $A_{1}$ can efficiently compute $\min_{P_{v}}c(P_{v})-R_{A_{2}}(P_{s\to u,v\to t})$ by considering the cheapest path (from $v$ to $t$), the cheapest path where $A_{1}$ ties $A_{2}$, and the cheapest path where $A_{1}$ beats $A_{2}$. (Some of these paths may coincide, and at least one must exist).

Proof

If $A_{1}$ assumes that $A_{2}$ takes path $Q$, then the reward is simply $r$ if the path under consideration wins, $r/2$ if it ties, and 0 if it loses. Thus, the minimizer must be the cheapest path in one of these three cases (and the cheapest $v\to t$ path always exists; the other two may not). Specifically, let $k$ be the number of edges from $v$ to $t$ on $Q$, and let $c_{\infty},c_{\leq k},c_{<k}$ denote the cost of the cheapest $v\to t$ paths with any number of edges, at most $k$ edges, and strictly less than $k$ edges respectively. Then, $\min_{P_{v}}c(P_{v})-R_{A_{2}}(P_{s\to u,v\to t})=\min(c_{\infty},c_{\leq k}-r/2,c_{<k}-r)$. These paths can be efficiently computed using the Bellman-Ford algorithm to determine the cheapest path with at most $k$ edges for any $k$. ∎
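
The subroutine the lemma relies on is a standard hop-limited Bellman-Ford computation: the cheapest $v\to t$ path using at most $i$ edges. Here is a minimal sketch; the adjacency-dictionary format and the example graph are assumptions for illustration.

import math

def bellman_ford(graph, v, t, i):
    """Cost of the cheapest v -> t path using at most i edges (math.inf if none).
    graph: {u: [(w, cost), ...]} adjacency dictionary of a DAG."""
    nodes = set(graph) | {t}
    dist = {u: math.inf for u in nodes}   # after k rounds: best cost with <= k edges
    dist[t] = 0.0
    rounds = len(nodes) - 1 if i == math.inf else i
    for _ in range(rounds):
        new = dict(dist)
        for u, nbrs in graph.items():
            for w, c in nbrs:
                new[u] = min(new[u], c + dist[w])
        dist = new
    return dist[v]

# From u, the cheapest path overall is long and cheap, but tying/winning a short
# race forces the pricier two-edge route.
graph = {'u': [('x', 5), ('y', 1)], 'x': [('t', 1)], 'y': [('z', 1)], 'z': [('t', 1)]}
print(bellman_ford(graph, 'u', 't', math.inf))  # 3, via (u, y, z, t)
print(bellman_ford(graph, 'u', 't', 2))         # 6, via (u, x, t)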

Unfortunately, while tractable, the approach described above does not yield a correct algorithm. This is because it relies implicitly on the following two properties, which formalize the intuition that increasing the reward causes agents to favor quicker paths.

Property 1

If a reward $r$ guarantees a Nash equilibrium on some path $Q$, any reward $r^{\prime}>r$ will either (a), still result in a Nash equilibrium on $Q$, or (b), cause an agent to deviate to a quicker path $Q^{\prime}$.

Property 2

If $A_{1}$ deviates from $Q$ onto a quicker/tied path for some reward $r$, increasing the reward will not cause them to follow $Q$.

Both properties are intuitive – if we increase the reward, that should motivate the agent to take a path that beats their opponent. And vice versa – increasing the reward shouldn’t cause them to shift onto a slower path or shift between equal length paths. But surprisingly, both properties are false, as the following examples demonstrate.

[Figure 7(a): graph containing paths $Q=(s,q_1,q_2,t)$, $V=(s,v_1,v_2,v_3,t)$, and $X=(s,v_1,t)$; see caption.]
(a)
[Figure 7(b): graph on vertices $s, q_1, q_2, q_3, v_1, \dots, v_4, x, t$; see caption.]
(b)
Figure 7: Graphs which do not exhibit the two expected properties. Unlabeled edges have cost 0.

For Property 1, consider the graph in Figure 7(a) and define paths $Q=(s,q_{1},q_{2},t)$, $V=(s,v_{1},v_{2},v_{3},t)$, and $X=(s,v_{1},t)$. Suppose both agents have bias 10 and that $A_{2}$ takes $Q$. Then a reward of 1 guarantees that $A_{1}$ takes $Q$, as the optimal path from $v_{1}$ to $t$ would follow $V$. However, if $r=300$, the optimal path from $v_{1}$ to $t$ follows $X$. So, $C_{n}(s,q_{1})=-148>C_{n}(s,v_{1})=-200$ and so the agent goes to $v_{1}$. But at $v_{1}$, the perceived cost of $(v_{1},t)$ is actually 1000, and so the agent prefers to take path $V$. Thus, increasing the reward from 1 to 300 causes the agent to deviate from $Q$ onto a slower path!

For Property 2, consider the graph in Figure 7(b) and define paths $Q$, $V$, and $X$ in the obvious manner. Again, suppose both agents have bias 10 and that $A_{2}$ takes $Q$. Then, with a reward of 10, the optimal path from $v_{1}$ to $t$ would follow $V$. So, $C_{n}(s,v_{1})=5>C_{n}(s,q_{1})=8-5=3$, and $A_{1}$ goes to $q_{1}$ and thus takes $Q$. If $r=2$, the optimal path from $v_{1}$ to $t$ still follows $V$. But now $C_{n}(s,v_{1})=5<C_{n}(s,q_{1})=8-1=7$. Thus, the agent goes to $v_{1}$. And here, with $b=10$, deviating to $X$ is more attractive than remaining on $V$, and thus the agent takes $X$. So, although $A_{1}$ deviates from $Q$ to a quicker path for reward 2, they remain on $Q$ with reward 10.

To summarize, Property 1 fails because present biased agents can take slower paths than they planned and Property 2 fails because present biased agents can take quicker paths than they planned. In other words, while higher rewards do tempt agents to take quicker paths, and lower rewards tempt agents towards cheaper paths, their time inconsistency may make them do the opposite.

4.2 The Algorithm

We now present an algorithm which finds the set of rewards which induce a symmetric Nash equilibrium on a path $Q$ (Algorithm 1). At a high level, the algorithm narrows down the set $\mathcal{I}^{*}$ of feasible rewards (rewards that ensure that $Q$ is a Nash equilibrium) by computing the set $\mathcal{I}_{v}$ of rewards that ensure that $A_{1}$ takes $(u,v)$, for every $(u,v)\in Q$. The key idea is that we can efficiently compute all $r$ that ensure that $A_{1}$ prefers $(u,v)$ over $(u,v^{\prime})$ by splitting into cases based on whether the optimal paths from $v$ to $t$ and $v^{\prime}$ to $t$ involve winning, tying, or losing. We now prove the correctness and runtime of the algorithm.

Theorem 4.1

Algorithm 1 returns the minimum $r$ that ensures that $Q$ is a Nash equilibrium, or $\bot$ if no such $r$ exists. Further, it runs in polynomial time.

Proof

Assume that $A_{2}$ takes $Q$. The correctness of the algorithm follows from this lemma:

Lemma 2

Let $(u,v)\in Q$ and $(u,v^{\prime})\in G$ with $v^{\prime}\neq v$ be arbitrary. $\mathcal{I}_{v,v^{\prime}}$ contains the set of all rewards that ensure that $A_{1}$ prefers $(u,v)\in Q$ over $(u,v^{\prime})\notin Q$ when standing at $u$.

Proof

Suppose that the length of the path from $u$ to $t$ in $Q$ is $k$. From Lemma 1, we know that the second term of $C_{n}(u,v)$ is $\min(c_{\infty},c_{\leq k}-r/2,c_{<k}-r)$. Note that $c_{<k}$ represents the cheapest way to win, $c_{\leq k}$ the cheapest way to win or tie, and $c_{\infty}$ the cheapest way to win, tie, or lose (i.e. the cheapest path). So, $c_{\infty}\leq c_{\leq k}\leq c_{<k}$. $d_{\infty},d_{\leq k}$, and $d_{<k}$ are defined similarly with respect to $v^{\prime}$. Without loss of generality, assume that all these values exist (i.e. that one can win, tie, or lose from $v$ and $v^{\prime}$); if this isn’t the case, then all values which rely on them can be safely removed.

Now, define $c^{*}(r)=\min(c_{\infty},c_{\leq k}-r/2,c_{<k}-r)$. This function $c^{*}(r)$ is simply the piecewise combination of three linear functions; $r_{1}=2(c_{\leq k}-c_{\infty})$ and $r_{2}=2(c_{<k}-c_{\leq k})$ represent the switching points. Define $d^{*}(r)$, $s_{1}$ and $s_{2}$ similarly. Let $\mathcal{I}$ be the pairwise intersection (footnote: to obtain the pairwise intersection of two sets of intervals, simply take the Cartesian product of the two sets to obtain pairs, and then take the intersection within each pair to obtain a new set of intervals) $\{[0,r_{1}],[r_{1},r_{2}],[r_{2},\infty]\}\times^{\cap}\{[0,s_{1}],[s_{1},s_{2}],[s_{2},\infty]\}$. Then, notice that for any $I\in\mathcal{I}$, $d^{*}(r)-c^{*}(r)$ has a closed form over $r\in I$. Since $A_{1}$ prefers $(u,v)$ over $(u,v^{\prime})$ if $bc(u,v^{\prime})+d^{*}(r)>bc(u,v)+c^{*}(r)$, this expression can be solved for $r$ in the interval $I$, yielding a single (sub)interval. Let $I_{r}$ denote this subinterval. Since $\mathcal{I}$ is a partition of all possible $r$, this means that $\mathcal{I}_{v,v^{\prime}}$, the union of all intervals $I_{r}$, will contain all $r$ which ensure that $A_{1}$ prefers $(u,v)$ to $(u,v^{\prime})$. ∎

With this lemma, the proof of correctness is simple. $S(u)=v$ if and only if $r$ is in all $\mathcal{I}_{v,v^{\prime}}$, which is what $\mathcal{I}_{v}$ tracks. And thus $A_{1}$ sticks with $Q$ if and only if $r$ is in all $\mathcal{I}_{v}$, which is exactly what $\mathcal{I}^{*}$ tracks.

To prove that the algorithm is efficient, notice that $\mathcal{I}_{v,v^{\prime}}$ contains at most 5 intervals since $|\mathcal{I}|\leq 5$. Further, the pairwise intersection of two sets of intervals can be computed in time $O(n+m)$ and is of size $\leq n+m$, where $n$ and $m$ are the sizes of the input sets. Thus, all the interval intersections are efficient; in general, the runtime of the algorithm is dominated by the Bellman-Ford subroutine. ∎

Input: A DAG $G$, with source $s$ and sink $t$, path $Q$ in $G$, and bias factor $b$
Output: The minimum reward $r$ for $Q$ to be a Nash equilibrium, or $\bot$ if not possible.
$k \leftarrow \operatorname{len}(Q)$
$\mathcal{I}^{*} = \{[0,\infty]\}$
foreach $(u,v) \in Q$ do
       $k \leftarrow k-1$
       $c_{\infty}, c_{\leq k}, c_{<k} \leftarrow \textsf{Bellman-Ford}(v,\infty), \textsf{Bellman-Ford}(v,k), \textsf{Bellman-Ford}(v,k-1)$
       $r_{1}, r_{2} \leftarrow 2(c_{\leq k}-c_{\infty}), 2(c_{<k}-c_{\leq k})$
       $\mathcal{I}_{v} = \{[0,\infty]\}$
       foreach $(u,v^{\prime} \neq v) \in G$ do
             $d_{\infty}, d_{\leq k}, d_{<k} \leftarrow \textsf{Bellman-Ford}(v^{\prime},\infty), \textsf{Bellman-Ford}(v^{\prime},k), \textsf{Bellman-Ford}(v^{\prime},k-1)$
             $s_{1}, s_{2} \leftarrow 2(d_{\leq k}-d_{\infty}), 2(d_{<k}-d_{\leq k})$
             // $\times^{\cap}$ returns the pairwise intersection of two sets of intervals
             $\mathcal{I} \leftarrow \{[0,r_{1}],[r_{1},r_{2}],[r_{2},\infty]\} \times^{\cap} \{[0,s_{1}],[s_{1},s_{2}],[s_{2},\infty]\}$
             $\mathcal{I}_{v,v^{\prime}} \leftarrow \emptyset$
             foreach $I \in \mathcal{I}$ do
                   $c^{*} \leftarrow \min(c_{\infty}, c_{\leq k}-r/2, c_{<k}-r)$ assuming $r \in I$
                   $d^{*} \leftarrow \min(d_{\infty}, d_{\leq k}-r/2, d_{<k}-r)$ assuming $r \in I$
                   $I_{r} \leftarrow$ subinterval of $I$ containing $r$ s.t. $d^{*}-c^{*} > b(c(u,v)-c(u,v^{\prime}))$
                   $\mathcal{I}_{v,v^{\prime}} \leftarrow \mathcal{I}_{v,v^{\prime}} \cup \{I_{r}\}$
             $\mathcal{I}_{v} \leftarrow \mathcal{I}_{v} \times^{\cap} \mathcal{I}_{v,v^{\prime}}$
       $\mathcal{I}^{*} \leftarrow \mathcal{I}^{*} \times^{\cap} \mathcal{I}_{v}$
if $\mathcal{I}^{*} = \emptyset$ then
       return $\bot$
else
       return smallest $r \in \mathcal{I}^{*}$
Algorithm 1: Set Nash equilibrium on given path, if possible. Uses $\textsf{Bellman-Ford}(v,i)$ as a subroutine, to get the cost of the cheapest $v \to t$ path with at most $i$ edges. Also uses a subroutine to compute the pairwise intersection of two sets of disjoint intervals.
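
The only non-standard helper in Algorithm 1 is the pairwise-intersection operator $\times^{\cap}$ on sets of disjoint intervals. A minimal sketch follows (intervals as (lo, hi) tuples); the quadratic loop mirrors the footnote's Cartesian-product description, though a linear merge is possible since each input set is sorted and disjoint.

def pairwise_intersection(A, B):
    """All nonempty intersections of one interval from A with one from B."""
    out = []
    for a_lo, a_hi in A:
        for b_lo, b_hi in B:
            lo, hi = max(a_lo, b_lo), min(a_hi, b_hi)
            if lo <= hi:
                out.append((lo, hi))
    return out

inf = float('inf')
A = [(0, 3), (3, 7), (7, inf)]
B = [(0, 5), (5, 6), (6, inf)]
print(pairwise_intersection(A, B))
# [(0, 3), (3, 5), (5, 6), (6, 7), (7, inf)]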

5 Extending the Model with Bias Uncertainty and Multiple Competitors

One of the shortcomings of the prior results is that agents are assumed to have publicly known, identical biases. Neither identical nor known biases are very realistic assumptions. In general, it seems odd for the agents to know the exact present bias of their opponent, or for the agents to always have the same bias. From a design perspective, the task designer might not have full knowledge of the agents’ biases, and so may be unsure how to set the reward. We therefore consider this game played by two agents whose biases are uncertain. The agents’ biases are represented by random variables $B_{1}$ and $B_{2}$ drawn iid from distribution $F$, which is publicly known to both the agents and the designer. $b_{1}$ and $b_{2}$ correspond to the realizations of these random variables. Our goal is now to construct, as cheaply as possible, Bayes-Nash equilibria (BNE) where agents behave optimally with high probability. In this setting, the cost equation becomes $C_{n}(u,v)=b\cdot c(u,v)+\min_{P_{v}}c(P_{v})-\mathbb{E}_{A_{2}}[R_{A_{2}}(P_{s\to u,v\to t})]$, where the expectation is over $A_{2}$’s choice of paths.

In this section, we provide a closed form BNE for the $n$-fan. We start with the case of two agents and then consider $m$ competing agents. Since we are searching for BNE, we assume that the agents know their competitor’s strategy.

5.1 Bayes-Nash Equilibria on the $n$-fan

As before, let $P_{i}$ represent the path that includes edge $(v_{i},t)$, and let $P_{0}$ represent the optimal path. Let $\Pr[A_{2}\to\mathcal{P}]$ denote the probability that $A_{2}$ takes any path in the set $\mathcal{P}$ (footnote: for convenience, we write $P_{i}$ instead of $\{P_{i}\}$ for singleton sets). The probability is over $B_{2}\sim F$. Let $P_{>i}$ denote the set of paths $\{P_{i+1},\dots,P_{n}\}$. Further, note that for any vertex $v_{i}$, the path minimizing the second term of $C_{n}(\cdot,v_{i})$ is simply the edge $(v_{i},t)$. This is because going directly to $t$ will always be the cheapest and quickest path from $v_{i}$ to $t$. Thus:

$$C_{n}(v_{i-1},v_{i})=c^{i}-\mathbb{E}[R(P_{i})]$$
$$C_{n}(v_{i-1},t)=bc^{i-1}-\mathbb{E}[R(P_{i-1})]$$

Further, $\mathbb{E}[R(P_{i})]=r/2\cdot\Pr[A_{2}\to P_{i}]+r\cdot\Pr[A_{2}\to P_{>i}]$. With this in mind, we start with a basic claim:

Proposition 2

When standing at vertex $v_{i}$, $A_{1}$ prefers $P_{i}$ over $P_{i+1}$ if:

$$r/2\cdot\Pr[A_{2}\to\{P_{i},P_{i+1}\}]>c^{i}(b_{1}-c).$$
Proof

If $A_{1}$’s expected utility from going immediately to $t$ is higher than their utility from procrastinating, then

$$r\cdot\Pr[A_{2}\to P_{>i}]+r/2\cdot\Pr[A_{2}\to P_{i}]-b_{1}c^{i}>r\cdot\Pr[A_{2}\to P_{>i+1}]+r/2\cdot\Pr[A_{2}\to P_{i+1}]-c^{i+1}.$$
It follows that
$$\begin{aligned}c^{i}(b_{1}-c)&<r(\Pr[A_{2}\to P_{>i}]-\Pr[A_{2}\to P_{>i+1}])+r/2\,(\Pr[A_{2}\to P_{i}]-\Pr[A_{2}\to P_{i+1}])\\ &=r\,\Pr[A_{2}\to P_{i+1}]+r/2\,(\Pr[A_{2}\to P_{i}]-\Pr[A_{2}\to P_{i+1}])\\ &=r/2\,\Pr[A_{2}\to\{P_{i},P_{i+1}\}].\end{aligned}$$

This claim matches intuition; due to the tying mechanism, $A_{1}$ gets a reward $r/2$ higher by going to $t$ from $v_{i}$ if $A_{2}$ either takes $P_{i}$ or $P_{i+1}$. Note that $A_{1}$ is comparing $P_{i}$ to $P_{i+1}$ because the latter describes what he (naively) believes he will do if he procrastinates. We now describe a Bayes-Nash equilibrium in the $n$-fan.

Theorem 5.1

Let $G$ be an $n$-fan with reward $r$ and suppose $B_{1},B_{2}$ are drawn from distribution $B$ with CDF $F$. Let $p$ be the solution to $F(\frac{rp}{2}+c)=p$. If $p>\frac{1}{c^{n-1}+1}$, then the following strategy is a Bayes-Nash equilibrium:

$$P(b)=\begin{cases}\text{take }P_{0},&b\leq\frac{rp}{2}+c\\ \text{take }P_{n},&\text{otherwise}\end{cases}$$

In this equilibrium, for either agent, the probability that they take $P_{0}$ is $p$. So the expected cost ratio will be $p+(1-p)c^{n}$.

Proof

Using Proposition 2, at $s$, $A_{1}$ prefers $P_{0}$ over $P_{1}$ if $r/2\cdot\Pr[A_{2}\to\{P_{0},P_{1}\}]>b_{1}-c$. Since $\Pr[A_{2}\to P_{1}]=0$, this simplifies to $b_{1}<\frac{r\Pr[A_{2}\to P_{0}]}{2}+c$. Let $\Pr[A_{2}\to P_{0}]=p$. Then, since the biases are drawn iid, for this strategy to be a symmetric BNE we need:

$$\Pr[A_{1}\to P_{0}]=\Pr[A_{2}\to P_{0}]$$
$$\Pr\!\left[B_{1}\leq\frac{rp}{2}+c\right]=p$$
$$F\!\left(\frac{rp}{2}+c\right)=p$$

So if $p$ satisfies this equation, by Proposition 2, $A_{1}$ prefers to go from $s$ to $t$ directly. Now, consider the case where $b_{1}>\frac{rp}{2}+c$. Then, $A_{1}$ prefers to go to $v_{1}$. Again using Proposition 2, at $v_{1}$, $A_{1}$ will prefer $t$ over $v_{2}$ if $r/2\cdot\Pr[A_{2}\to\{P_{1},P_{2}\}]>c(b_{1}-c)$, which implies that $b_{1}-c<0$. Since we know that $b_{1}>\frac{rp}{2}+c$, it’s certainly the case that $b_{1}>c$, so this never holds. Thus, $A_{1}$ prefers to go to $v_{2}$. The same argument can be repeated until the agent reaches $v_{n-1}$. Then, the agent will prefer $t$ over $v_{n}$ if $r/2\cdot\Pr[A_{2}\to\{P_{n-1},P_{n}\}]>c^{n-1}(b_{1}-c)$, i.e. if $b_{1}<\frac{r(1-p)}{2c^{n-1}}+c$.

Combining our assumption that $b_{1}>\frac{rp}{2}+c$ with this inequality, we get that $p$ must be less than $\frac{1}{c^{n-1}+1}$. When this is false, the agent will always prefer $P_{n}$ when their bias $b_{1}>\frac{rp}{2}+c$. So, under these conditions, the given strategy is a BNE. ∎

Notice that while the trivial solution $p=0$ satisfies $F(\frac{rp}{2}+c)=p$, this is not above $\frac{1}{c^{n-1}+1}$, so the trivial solution is not relevant for finding Bayes-Nash equilibria. And while it’s possible (for some distributions $B$ and very low rewards $r$) for the non-trivial solution to $p=\Pr[B_{1}\leq\frac{rp}{2}+c]$ to be less than $\frac{1}{c^{n-1}+1}$, for the distributions we have explored this is not the case. One might wonder if there’s a straightforward generalization of this BNE to other graphs with a dominant path, as in the case without bias uncertainty. In Appendix 0.B, we discuss challenges that we encountered trying to do this.

We now use the theorem to understand how much reward is required for optimal behavior with high probability, or a low expected cost ratio (which is a much stronger requirement). Since the expected cost ratio is $p+c^{n}(1-p)$, in order for this to be low, $1-p$ has to be close to $1/c^{n}$. Plugging this into the CDF, we see that for this to happen, we must have

$$F\left(\frac{r}{2}\left(1-\frac{1}{c^{n}}\right)+c\right)=1-\frac{1}{c^{n}},$$

which essentially requires that exponentially little probability mass (in $n$) remains after $r/2$ distance from $c$. For an exponential distribution, this requires $r$ to be linear in $n$, and with a heavier tailed distribution like the Equal Revenue distribution, this requires $r$ to be exponential in $n$. But we may be content with simply guaranteeing optimal behavior with high probability. In that case, so long as $r$ is increasing in $n$, the agents will take the optimal path with high probability for at least the equal revenue, exponential, and uniform distributions. We more precisely explore the probability of optimal behavior and the cost ratios for these distributions in Appendix 0.A.
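
The fixed point in Theorem 5.1 is easy to approximate numerically. The sketch below iterates $p \mapsto F(rp/2+c)$, assuming, purely as an illustration, that the bias is $c$ plus a standard exponential random variable; the distribution, parameters, and reward values are all assumptions.

import math

def solve_p(F, r, c, iters=500):
    """Approximate the largest solution of F(r p / 2 + c) = p by fixed-point iteration."""
    p = 1.0
    for _ in range(iters):
        p = F(r * p / 2 + c)
    return p

c, n = 2.0, 10
F_exp = lambda x: 1.0 - math.exp(-(x - c)) if x >= c else 0.0   # bias = c + Exp(1)

for r in (3, 6, 20):
    p = solve_p(F_exp, r, c)
    print(r, round(p, 4), round(p + (1 - p) * c ** n, 2))
    # reward, Pr[agent takes P_0], expected cost ratio p + (1 - p) c^n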

5.2 Increasing Number of Competitors

We saw earlier that increasing the number of competitors doesn’t change the per-agent reward needed for optimal behavior. But one might hope that in Bayesian settings such as bias uncertainty, increasing the number of agents significantly decreases the per-agent reward needed to encourage optimal behavior – in particular, as the number of agents increases, the probability of some agent having a very low bias increases. We provide evidence against this claim. We start by showing that the following strategy is a Bayes-Nash equilibrium with mm agents.

Theorem 5.2

Let $G$ be an $n$-fan with reward $r$ and suppose there are $m+1$ competitors with biases drawn iid from distribution $B$ with CDF $F$. Let $p$ be the solution to $F(rd(p,m)+c)=p$, where:

$$d(p,m)=\frac{1-(1-p)^{m}(1+pm)}{p(m+1)}$$

If $p>m^{-1}\log\left(1+m+\frac{m+1}{2c^{n-1}}\right)$, the following strategy is a Bayes-Nash equilibrium where the probability of optimal behavior for any agent is $p$:

$$P(b)=\begin{cases}\text{take }P_{0},&b\leq rd(p,m)+c\\ \text{take }P_{n},&\text{otherwise}\end{cases}$$
Proof

See Appendix 0.C.

Now, we consider the equal revenue distribution. In the case of two agents, we required exponentially high reward to get a constant expected cost ratio. Unfortunately, this remains true even as the number of agents competing goes to infinity.

Theorem 5.3

Suppose the biases are drawn from an equal revenue distribution shifted over by c. Let s denote the per-agent reward, so r=(m+1)s. Then, no matter how many agents are competing, for any fixed agent, the probability that they behave optimally is at most \frac{\sqrt{4s+s^{2}}-s}{2}.

Proof

See Appendix 0.C.

Earlier results show that the probability of optimal behavior with two competing agents is \frac{s-1}{s}, which is nearly identical to the theorem's bound as s grows. (To see this, note that \frac{\sqrt{4s+s^{2}}-s}{2} can be written as 1-\frac{\sqrt{4s+s^{2}}-s}{\sqrt{4s+s^{2}}+s}, while \frac{s-1}{s}=1-\frac{1}{s}. Further, \frac{\sqrt{4s+s^{2}}-s}{\sqrt{4s+s^{2}}+s}\approx\frac{2}{2+2s} as s grows, since \sqrt{4s+s^{2}}\approx s+2. So both probabilities are 1-O(s^{-1}).) So even as m\to\infty, the relationship between the reward and the probability of optimal behavior does not significantly change (and thus neither does the relationship between the reward and the expected cost ratio). We conjecture that, in general, increasing the number of agents does not significantly decrease the per-agent reward needed for optimal behavior.
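This comparison is easy to tabulate numerically (a small aside of ours; the values of s are arbitrary):

import math

for s in (10, 100, 1000, 10000):
    many = (math.sqrt(4 * s + s * s) - s) / 2   # Theorem 5.3 bound, any number of agents
    two = (s - 1) / s                           # two-agent probability
    print(f"s={s:>6}  bound={many:.6f}  two-agent p={two:.6f}  s*(1 - bound)={s * (1 - many):.3f}")

The last column tends to 1, consistent with both probabilities being 1-O(s^{-1}).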

6 Conclusion

We studied the impact of competition on present bias, showing that in many settings where naive agents can experience exponentially high cost ratio, a small amount of competition drives agents to optimal behavior. This paper is a first step towards painting a more optimistic picture than much of the work surrounding present bias. Our results highlight why, in naturally competitive settings, otherwise biased agents might behave optimally. Further, task/mechanism designers can use our results to directly alleviate the harms of present bias. This competitive model might be a more natural model than other motivation schemes, such as internal edge rewards, and is able to more cheaply ensure optimal behavior. Our work also leaves open many exciting questions.

First, with bias uncertainty, we only obtain concrete results on the nn-fan. So one obvious direction is to determine which graphs have Bayes-Nash equilibria on the optimal path, and what these equilibria look like. Second, we explore two “dimensions” of competition – the amount of reward and the number of competitors, finding that the latter is unlikely to be significant. Another interesting goal is thus to explore new dimensions of competition.

Lastly, we could extend our work beyond cost ratios, moving to the model where agents can abandon their path at any point. For one, this move would allow us to integrate results on sunk cost bias, represented as an intrinsic cost for abandoning a task that is proportional to the amount of effort expended. Kleinberg et al. (2017) show that agents who are sophisticated with regard to their present bias, but naive with respect to their sunk cost bias, can experience exponentially high cost before abandoning their traversal (this is especially interesting because sophisticated agents without sunk cost bias behave nearly optimally). Can competition alleviate this exponential worst case? There are also interesting computational questions in this model. For instance, given a fixed reward budget r, is it possible to determine in polynomial time whether one can induce an equilibrium where both agents traverse the graph? Such problems are NP-hard for other reward models, but may be tractable with competition. Overall, the abandonment setting has several interesting interactions with competition that we have not explored.

References

  • Albers and Kraft [2017] S. Albers and D. Kraft. On the value of penalties in time-inconsistent planning. arXiv preprint arXiv:1702.01677, 2017.
  • Albers and Kraft [2019] S. Albers and D. Kraft. Motivating time-inconsistent agents: A computational approach. Theory of Computing Systems, 63(3):466–487, 2019.
  • Ariely and Jones [2008] D. Ariely and S. Jones. Predictably irrational. Harper Audio, New York, NY, 2008.
  • Arkes and Blumer [1985] H. R. Arkes and C. Blumer. The psychology of sunk cost. Organizational Behavior and Human Decision Processes, 35(1):124–140, 1985.
  • DellaVigna [2009] S. DellaVigna. Psychology and economics: Evidence from the field. Journal of Economic Literature, 47(2):315–372, 2009.
  • DellaVigna and Malmendier [2006] S. DellaVigna and U. Malmendier. Paying not to go to the gym. American Economic Review, 96(3):694–719, 2006.
  • Frederick et al. [2002] S. Frederick, G. Loewenstein, and T. O’Donoghue. Time discounting and time preference: A critical review. Journal of Economic Literature, 40(2):351–401, 2002.
  • Gravin et al. [2016] N. Gravin, N. Immorlica, B. Lucier, and E. Pountourakis. Procrastination with variable present bias. arXiv preprint arXiv:1606.03062, 2016.
  • Kahneman and Tversky [1979] D. Kahneman and A. Tversky. Prospect theory: An analysis of decision under risk. Econometrica, 47(2):263–292, 1979.
  • Kleinberg and Oren [2014] J. Kleinberg and S. Oren. Time-inconsistent planning: A computational problem in behavioral economics. In Proceedings of the Fifteenth ACM Conference on Economics and Computation, pages 547–564, 2014.
  • Kleinberg et al. [2016] J. Kleinberg, S. Oren, and M. Raghavan. Planning problems for sophisticated agents with present bias. In Proceedings of the 2016 ACM Conference on Economics and Computation, pages 343–360, 2016.
  • Kleinberg et al. [2017] J. Kleinberg, S. Oren, and M. Raghavan. Planning with multiple biases. In Proceedings of the 2017 ACM Conference on Economics and Computation, pages 567–584, 2017.
  • Ma et al. [2019] H. Ma, R. Meir, D. C. Parkes, and E. Wu-Yan. Penalty bidding mechanisms for allocating resources and overcoming present bias. arXiv preprint arXiv:1906.09713, 2019.
  • Oren and Soker [2019] S. Oren and D. Soker. Principal-agent problems with present-biased agents. In International Symposium on Algorithmic Game Theory, pages 237–251. Springer, 2019.
  • Roughgarden [2016] T. Roughgarden. CS269I: Incentives in Computer Science, Lecture #19: Time-inconsistent planning. Lecture notes, 2016.
  • Tang et al. [2017] P. Tang, Y. Teng, Z. Wang, S. Xiao, and Y. Xu. Computational issues in time-inconsistent planning. In Thirty-First AAAI Conference on Artificial Intelligence, 2017.
  • Yan and Yong [2019] W. Yan and J. Yong. Time-inconsistent optimal control problems and related issues. In Modeling, Stochastic Control, Optimization, and Applications, pages 533–569. Springer, 2019.

Appendix 0.A Concrete Distributions for Bias Uncertainty

In this section, we more precisely describe the relationship between rr and pp (the probability of optimal behavior in the BNE from Theorem 5.1) for several distributions.

Lemma 3

Suppose B1B_{1} and B2B_{2} are drawn from an equal revenue distribution that’s shifted over by cc (so b1,b2>cb_{1},b_{2}>c). Then, for any r2r\geq 2, the probability of optimal behavior in the BNE from Theorem 5.1 is 12r1-\frac{2}{r}, and thus the expected cost is 12r+2cnr1-\frac{2}{r}+\frac{2c^{n}}{r}.

Proof

The equal revenue distribution described has CDF F(z)=11zc+1F(z)=1-\frac{1}{z-c+1}. Now, applying Theorem 5.1:

11rp2+cc+1\displaystyle 1-\frac{1}{\frac{rp}{2}+c-c+1} =p\displaystyle=p
rp2+11\displaystyle\frac{rp}{2}+1-1 =p(rp2+1)\displaystyle=p\left(\frac{rp}{2}+1\right)
(r2)p2+(1r2)p\displaystyle\left(\frac{r}{2}\right)p^{2}+\left(1-\frac{r}{2}\right)p =0\displaystyle=0
p\displaystyle p =0 or r2r\displaystyle=0\text{ or }\frac{r-2}{r}

So with reward r2r\geq 2, p=r2rp=\frac{r-2}{r}. ∎
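As a quick sanity check (a throwaway snippet of ours; c=2 is an arbitrary choice), the closed form p=\frac{r-2}{r} can be substituted back into the fixed-point equation of Theorem 5.1:

c = 2.0
F = lambda z: 1 - 1.0 / (z - c + 1)     # shifted equal revenue CDF
for r in (2, 4, 10, 100):
    p = (r - 2) / r
    residual = F(r * p / 2 + c) - p     # should be (numerically) zero
    print(f"r={r:>3}  p={p:.4f}  residual={residual:.1e}")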

Because this distribution is heavy tailed, the expected cost remains exponential unless rr is exponential. However, if rr is any increasing function of nn, then w.h.p. both agents will take the optimal path. Now, we look at the exponential distribution.

Lemma 4

Suppose B_{1} and B_{2} are drawn from an exponential distribution, \mathbf{Exp}(\lambda), shifted over by c (so b_{1},b_{2}>c). Then, for any r, the probability of optimal behavior in the BNE from Theorem 5.1 satisfies p\geq 1-O(e^{-\lambda r/2}), and thus the expected cost is at most 1-O(e^{-\lambda r/2})+O(c^{n}e^{-\lambda r/2}).

Proof

We first solve for pp in Theorem 5.1:

Pr[B1rp2+c]\displaystyle\Pr[B_{1}\leq\frac{rp}{2}+c] =p\displaystyle=p
1eλ(rp2+cc)\displaystyle 1-e^{-\lambda(\frac{rp}{2}+c-c)} =p\displaystyle=p
1eλrp2\displaystyle 1-e^{-\frac{\lambda rp}{2}} =p\displaystyle=p
ln(1p)p\displaystyle\frac{\ln(1-p)}{p} =λr2\displaystyle=\frac{-\lambda r}{2}
p\displaystyle p =0 or 1+2W(λr/2eλr/2)λr\displaystyle=0\text{ or }1+\frac{2W(-\lambda r/2\cdot e^{-\lambda r/2})}{\lambda r}

where W denotes the Lambert W function, i.e., W(z)=x\iff xe^{x}=z. The Taylor series of W around 0 is:

W(x)=n=1(n)n1n!xn=xx2+32x3\displaystyle W(x)=\sum_{n=1}^{\infty}\frac{(-n)^{n-1}}{n!}x^{n}=x-x^{2}+\frac{3}{2}x^{3}-\dots

For our application, notice that x=λr/2eλr/2x=-\lambda r/2\cdot e^{-\lambda r/2} is very small; thus, we let W(x)=xO(x2)W(x)=x-O(x^{2}). So:

p1O(eλr/2)\displaystyle p\geq 1-O(e^{-\lambda r/2})

This means that a reward linear in n, on the order of n\ln(c)/\lambda, suffices for a constant expected cost ratio (and an extremely high probability of optimal behavior). We verify the closed form numerically in the sketch below, and then finally consider the simple example of a uniform distribution.
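The sketch below is ours; it assumes SciPy's lambertw is available and uses \lambda=1 with a few arbitrary rewards. It checks the Lambert-W closed form against the fixed-point equation and compares it with the 1-e^{-\lambda r/2} approximation.

import math
from scipy.special import lambertw   # principal branch by default

lam = 1.0
for r in (4, 10, 20, 40):
    a = lam * r / 2.0
    p = 1 + lambertw(-a * math.exp(-a)).real / a    # the non-trivial solution
    residual = (1 - math.exp(-a * p)) - p            # fixed-point equation, ~0
    print(f"r={r:>3}  p={p:.8f}  residual={residual:.1e}  1 - exp(-lam*r/2)={1 - math.exp(-a):.8f}")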

Lemma 5

Suppose B1B_{1} and B2B_{2} are drawn from the uniform distribution, 𝐔𝐧𝐢𝐟[c,d]\mathbf{Unif}[c,d] (so b1,b2>cb_{1},b_{2}>c). Then, for r2(dc)r\geq 2(d-c), the probability of optimal behavior in the BNE from Theorem 5.1 is 11. For all r<2(dc)r<2(d-c), this probability is zero.

Proof

For the first part of the lemma, note that if rp/2+cdrp/2+c\geq d, then F(rp/2+c)=1F(rp/2+c)=1, so p=1p=1 holds if r/2+cdr/2+c\geq d, or equivalently, r2(dc)r\geq 2(d-c). If we assume rp/2+c<drp/2+c<d, then using CDF F(z)=zcdcF(z)=\frac{z-c}{d-c} to solve for pp:

rp2+ccdc\displaystyle\frac{\frac{rp}{2}+c-c}{d-c} =p\displaystyle=p
r2p\displaystyle\frac{r}{2}p =(dc)p\displaystyle=(d-c)p

Since r<2(dc)r<2(d-c), this is satisfied only with p=0p=0. ∎

More generally, for any distribution bounded above by b^{*}, the reward that guarantees a Nash equilibrium when both agents have public bias b^{*} also guarantees a BNE in this setting (and in particular, this BNE always results in optimal behavior). In the case of the uniform distribution, no smaller reward ever results in optimal behavior. As a direct consequence of this lemma, the expected cost ratio is 1 with reward r=O(d).
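A tiny grid search (ours; the choices c=2, d=5 are arbitrary, and bne_p is a hypothetical helper) illustrates the sharp threshold at r=2(d-c)=6:

c, d = 2.0, 5.0
F = lambda z: max(0.0, min(1.0, (z - c) / (d - c)))   # Unif[c, d] CDF

def bne_p(r, steps=10000):
    # largest p on a grid with F(r*p/2 + c) >= p (the BNE probability of optimal play)
    best = 0.0
    for i in range(steps + 1):
        p = i / steps
        if F(r * p / 2 + c) >= p:
            best = p
    return best

for r in (2, 5, 6, 7, 12):                            # threshold is 2*(d - c) = 6
    print(f"r={r:>2}  p={bne_p(r):.3f}")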

Appendix 0.B Difficulties Finding Bayes-Nash Equilibria in Arbitrary Graphs

The simplest way to generalize our BNE from the nn-fan would be to consider the following schematic:

Sy(b)={take O,bytake P, for some fixed Potherwise\displaystyle S_{y}(b)=\begin{cases}\text{take }O,&b\leq y\\ \text{take }P,\text{ for some fixed }P&\text{otherwise}\\ \end{cases}

where y ideally would depend only on the costs in the graph, the reward r, and p, the solution to F(y)=p. So the agents would either take the optimal path, if their bias is sufficiently small, or some path P that can “catch” agents with high bias. In the fan graph, this was the path around the fan. That path happened to be the natural path for any bias and, perhaps more importantly, the “limiting” natural path as b\to\infty (i.e., the greedy/myopic path, where the agent simply takes the lowest-cost edge at each step). But neither of these ideas generalizes; this simple schematic will not produce a BNE in an arbitrary graph, as the following counterexample shows.

Consider the modified 3-fan below. Let 1<c_{1}=c<c^{2}<c_{2}<c_{2}^{2}<c_{3}, so each successive edge cost to t is more than the square of the previous one.

[Figure: the modified 3-fan. The edges s\to v_{1}\to v_{2}\to v_{3} have cost 0, the direct edge s\to t has cost 1, and the edges from v_{1}, v_{2}, v_{3} to t have costs c_{1}=c, c_{2}, and c_{3} respectively.]

We claim the following:

Proposition 3

There is no symmetric BNE for this graph that assigns positive probability to only two paths, assuming the bias is drawn from a distribution with support [1,\infty).

The main idea behind the proof is to notice first that paths P_{0} and P_{3} will always be taken, when the bias is very low or very high respectively. We then define the intervals of biases (which depend on the reward, the costs, and the probability of optimal behavior) wherein A_{1} wants to take P_{1} or P_{2}, and argue that at least one of these intervals must be non-empty.

Proof

Note that as b:=b1b:=b_{1}\to\infty, for any fixed reward, the agent will always take the greedy option of choosing the cheapest edge at every step. This will cause them to take the path P3P_{3}. Further if their bias is below c1c_{1}, they will take the optimal path, P0P_{0}. So, any BNE must assign positive probability to those two paths. We now prove, by contradiction, that it must assign positive probability to either P1P_{1} or P2P_{2}.

Let pp be Pr[A2P0]\Pr[A_{2}\to P_{0}] and 1p=Pr[A2P3]1-p=\Pr[A_{2}\to P_{3}]. At ss, A1A_{1} prefers to procrastinate to v1v_{1} when b>c+rp/2b>c+rp/2, so suppose this is the case. At v1v_{1}, A1A_{1} would take P1P_{1} if:

bcr(1p)\displaystyle bc-r(1-p) <c2r(1p)\displaystyle<c_{2}-r(1-p)
b\displaystyle b <c2/c\displaystyle<c_{2}/c

So, if b[c+rp/2,c2/c]b\in[c+rp/2,c_{2}/c], A1A_{1} would take P1P_{1}. If this interval is non-empty, then we’re done. If it’s not, then:

\displaystyle c+\frac{rp}{2}>\frac{c_{2}}{c}
rp\displaystyle rp >2c2c2c\displaystyle>\frac{2c_{2}}{c}-2c
r\displaystyle r >2c22c2cp\displaystyle>\frac{2c_{2}-2c^{2}}{cp}

If the reward is at least this high, then A1A_{1} always goes to v2v_{2} at v1v_{1}. Now, at v2v_{2} (where we know b>c2/cb>c_{2}/c), A1A_{1} takes P2P_{2} when:

bc2r(1p)\displaystyle bc_{2}-r(1-p) <c3r2(1p)\displaystyle<c_{3}-\frac{r}{2}(1-p)
bc2\displaystyle bc_{2} <c3+r2(1p)\displaystyle<c_{3}+\frac{r}{2}(1-p)
b\displaystyle b <2c3+r(1p)2c2\displaystyle<\frac{2c_{3}+r(1-p)}{2c_{2}}

As before, if b[c2/c,2c3+r(1p)2c2]b\in[c_{2}/c,\frac{2c_{3}+r(1-p)}{2c_{2}}], A1A_{1} would take P2P_{2}. We claim that this interval must be non-empty. Suppose it wasn’t, and c2/c>2c3+r(1p)2c2c_{2}/c>\frac{2c_{3}+r(1-p)}{2c_{2}}. Then:

r(1p)+2c3\displaystyle r(1-p)+2c_{3} <2c22/c\displaystyle<2c_{2}^{2}/c
r\displaystyle r <2c222c3cc(1p)\displaystyle<\frac{2c_{2}^{2}-2c_{3}c}{c(1-p)}

However, this upper bound causes a contradiction – from the definitions, we know that c22<c3c_{2}^{2}<c_{3}, so clearly c22<c3cc_{2}^{2}<c_{3}c, and thus this bound requires r<0r<0. This contradicts our typical requirement that r0r\geq 0, but even ignoring this requirement, this also contradicts our earlier bound that r>2c22c2cpr>\frac{2c_{2}-2c^{2}}{cp}.

So by contradiction, A1A_{1} will either take P1P_{1} or P2P_{2} with positive probability. ∎
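To make the interval argument concrete, the snippet below (ours) instantiates the modified 3-fan with c=2, c_{2}=5, c_{3}=26 (so that 1<c<c^{2}<c_{2}<c_{2}^{2}<c_{3}) and tabulates the two intervals from the proof for several values of r and p; in every case at least one interval is non-empty.

c, c2, c3 = 2.0, 5.0, 26.0      # satisfies 1 < c < c^2 < c2 < c2^2 < c3

def intervals(r, p):
    # biases for which A1 takes P1, and for which A1 takes P2 (from the proof)
    I1 = (c + r * p / 2, c2 / c)
    I2 = (c2 / c, (2 * c3 + r * (1 - p)) / (2 * c2))
    return I1, I2

for r in (1, 5, 20, 100):
    for p in (0.1, 0.5, 0.9):
        I1, I2 = intervals(r, p)
        nonempty = I1[0] < I1[1] or I2[0] < I2[1]
        print(f"r={r:>3} p={p:.1f}  I1=({I1[0]:6.2f}, {I1[1]:4.2f})  "
              f"I2=({I2[0]:4.2f}, {I2[1]:6.2f})  some interval non-empty: {nonempty}")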

In general, the main challenge with finding a BNE is finding a set of paths {Pi}\{P_{i}\} and bias intervals {Ii}\{I_{i}\} such that, for all intervals IiI_{i}, A1A_{1} must want to traverse PiP_{i} if their bias falls in IiI_{i}. The complication is that A1A_{1}’s decision to traverse PiP_{i} depends on the distribution over A2A_{2}’s choice of path, which is induced by the intervals IiI_{i} and the bias distribution. These interweaving constraints are difficult to manage even for simple graphs, as this preliminary result suggests.

Appendix 0.C Adding More Competitors with Bias Uncertainty

Proof (Proof of Theorem 5.2)

Suppose all agents besides A_{1} follow P(b). Then, at s, A_{1} views procrastinating to v_{1} as having utility -c+r\Pr[A_{i}\to P_{n}]^{m}, using the iid assumption and the fact that A_{1} only wins if every other agent has a bias which, according to P(b), causes them to procrastinate. On the other hand, going directly from s to t has utility -b_{1}+\operatorname*{\mathbb{E}}[R], where N is a random variable counting how many of the other m agents take P_{0} and R=\frac{r}{N+1} is the random variable corresponding to A_{1}'s reward. Notice that N\sim\mathbf{Bin}(m,p), where p=\Pr[A_{i}\to P_{0}]. We now use the following claim:

Claim

Let N𝐁𝐢𝐧(m,p)N\sim\mathbf{Bin}(m,p). Then:

𝔼[1N+1]=1(1p)m+1p(m+1)\operatorname*{\mathbb{E}}\left[\frac{1}{N+1}\right]=\frac{1-(1-p)^{m+1}}{p(m+1)}
Proof

First, using the definition of the expectation and simplifying:

𝔼[1N+1]\displaystyle\operatorname*{\mathbb{E}}\left[\frac{1}{N+1}\right] =i=0m1i+1(mi)pi(1p)mi\displaystyle=\sum_{i=0}^{m}\frac{1}{i+1}\binom{m}{i}p^{i}(1-p)^{m-i}
=i=0m(1p)mm+1(m+1i+1)(p1p)i\displaystyle=\sum_{i=0}^{m}\frac{(1-p)^{m}}{m+1}\binom{m+1}{i+1}\left(\frac{p}{1-p}\right)^{i}
=(1p)m+1p(m+1)i=0m(m+1i+1)(p1p)i+1\displaystyle=\frac{(1-p)^{m+1}}{p(m+1)}\sum_{i=0}^{m}\binom{m+1}{i+1}\left(\frac{p}{1-p}\right)^{i+1}

Now, notice that the binomial theorem says:

(1+p1p)m+1\displaystyle\left(1+\frac{p}{1-p}\right)^{m+1} =(m+10)(p1p)0+j=1m+1(m+1j)(p1p)j\displaystyle=\binom{m+1}{0}\left(\frac{p}{1-p}\right)^{0}+\sum_{j=1}^{m+1}\binom{m+1}{j}\left(\frac{p}{1-p}\right)^{j}
(1+p1p)m+11\displaystyle\left(1+\frac{p}{1-p}\right)^{m+1}-1 =i=0m(m+1i+1)(p1p)i+1\displaystyle=\sum_{i=0}^{m}\binom{m+1}{i+1}\left(\frac{p}{1-p}\right)^{i+1}

Plugging this back in:

\displaystyle\operatorname*{\mathbb{E}}\left[\frac{1}{N+1}\right]=\frac{(1-p)^{m+1}}{p(m+1)}\left(\left(1+\frac{p}{1-p}\right)^{m+1}-1\right)
=((1p)(1+p1p))m+1(1p)m+1p(m+1)\displaystyle=\frac{\left((1-p)\left(1+\frac{p}{1-p}\right)\right)^{m+1}-(1-p)^{m+1}}{p(m+1)}
=1(1p)m+1p(m+1)\displaystyle=\frac{1-(1-p)^{m+1}}{p(m+1)}
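The identity in the claim is also easy to confirm numerically (a throwaway check of ours; direct and closed_form are hypothetical helper names):

from math import comb

def direct(m, p):       # E[1/(N+1)] by direct summation over Bin(m, p)
    return sum(comb(m, i) * p**i * (1 - p)**(m - i) / (i + 1) for i in range(m + 1))

def closed_form(m, p):
    return (1 - (1 - p)**(m + 1)) / (p * (m + 1))

for m, p in [(1, 0.3), (5, 0.5), (20, 0.1), (50, 0.9)]:
    print(f"m={m:>2} p={p}  |direct - closed form| = {abs(direct(m, p) - closed_form(m, p)):.1e}")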

From the claim, we see that 𝔼[R]=r(1(1p)m+1)p(m+1)\operatorname*{\mathbb{E}}[R]=\frac{r(1-(1-p)^{m+1})}{p(m+1)}. Now, A1A_{1} prefers to go immediately from ss to tt when:

b1+𝔼[R]\displaystyle-b_{1}+\operatorname*{\mathbb{E}}[R] >c+r(1p)m\displaystyle>-c+r(1-p)^{m}
b1\displaystyle b_{1} <r(1(1p)m+1p(m+1)(1p)m)+c\displaystyle<r\left(\frac{1-(1-p)^{m+1}}{p(m+1)}-(1-p)^{m}\right)+c
=r(1(1p)m(1+pm)p(m+1))+c\displaystyle=r\left(\frac{1-(1-p)^{m}(1+pm)}{p(m+1)}\right)+c

For ease of notation, let d(p,m)=\frac{1-(1-p)^{m}(1+pm)}{p(m+1)}. As a sanity check, one can show that d(p,m) is always positive (equivalently, that the expected reward from taking P_{0} exceeds the expected reward from procrastinating, so the bias threshold rd(p,m)+c lies above c).

Claim

d(p,m)>0d(p,m)>0

Proof

We want:

1(1p)m(1+pm)p(m+1)\displaystyle\frac{1-(1-p)^{m}(1+pm)}{p(m+1)} >0\displaystyle>0
1(1p)m(1+pm)\displaystyle 1-(1-p)^{m}(1+pm) >0\displaystyle>0
1\displaystyle 1 >(1p)m(1+pm)\displaystyle>(1-p)^{m}(1+pm)
1\displaystyle 1 >epm(1+pm)\displaystyle>e^{-pm}(1+pm) (1+xex1+x\leq e^{x})
epm\displaystyle e^{pm} >1+pm\displaystyle>1+pm

And the last line holds by again using the fact that 1+xex1+x\leq e^{x}. ∎

We’ve shown that when the agent’s bias is less than rd(p,m)+c, they prefer P_{0} to procrastinating to v_{1} (assuming everyone else takes P_{0} with probability p). Now, if their bias is above rd(p,m)+c, we know that they will progress to v_{1}. We claim that they will surely procrastinate all the way to v_{n-1}. To see why, note that since we are analyzing a Bayes-Nash equilibrium, the agent assumes that all other agents follow the equilibrium strategy, so all of their competitors take either P_{0} or P_{n}. Thus, the expected rewards from paths P_{1} through P_{n-1} are all identical. So, by the construction of the n-fan, the agent procrastinates until v_{n-1}. If p is not extremely low, they will then go to v_{n}, as the following claim demonstrates.

Claim

If p>log(1+m+m+12cn1)/mp>\log(1+m+\frac{m+1}{2c^{n-1}})/m and the agent has bias b>rd(p,m)+cb>rd(p,m)+c, then the agent takes path PnP_{n}.

Proof

We’ve just shown that if b>rd(p,m)+cb>rd(p,m)+c, they go to vn1v_{n-1}. From there, going to tt has utility b1cn1+r(1p)m-b_{1}c^{n-1}+r(1-p)^{m} and going to vnv_{n} has utility cn+r/2(1p)m-c^{n}+r/2(1-p)^{m}. So, the agent prefers tt when:

cn+r/2(1p)m\displaystyle-c^{n}+r/2(1-p)^{m} <b1cn1+r(1p)m\displaystyle<-b_{1}c^{n-1}+r(1-p)^{m}
(b1c)cn1\displaystyle(b_{1}-c)c^{n-1} <r/2(1p)m\displaystyle<r/2\cdot(1-p)^{m}
b1\displaystyle b_{1} <r(1p)m2cn1+c\displaystyle<\frac{r(1-p)^{m}}{2c^{n-1}}+c

So, if the agent’s bias is in the interval [rd(p,m)+c,\frac{r(1-p)^{m}}{2c^{n-1}}+c], they prefer to deviate from the equilibrium strategy (in Theorem 5.2) by going from v_{n-1} directly to t (taking P_{n-1}) rather than continuing to v_{n}. We now compute when this interval is empty:

r1(1p)m(1+pm)p(m+1)+c\displaystyle r\frac{1-(1-p)^{m}(1+pm)}{p(m+1)}+c >r(1p)m2cn1+c\displaystyle>\frac{r(1-p)^{m}}{2c^{n-1}}+c
1(1p)m(1+pm)p(m+1)\displaystyle\frac{1-(1-p)^{m}(1+pm)}{p(m+1)} >(1p)m2cn1\displaystyle>\frac{(1-p)^{m}}{2c^{n-1}}
2cn1(1(1p)m(1+pm))\displaystyle 2c^{n-1}(1-(1-p)^{m}(1+pm)) >(1p)mp(m+1)\displaystyle>(1-p)^{m}p(m+1)
2cn1\displaystyle 2c^{n-1} >(1p)mp(m+1)+2cn1(1p)m(1+pm)\displaystyle>(1-p)^{m}p(m+1)+2c^{n-1}(1-p)^{m}(1+pm)
2cn1\displaystyle 2c^{n-1} >(1p)m(2cn1+pm(2cn1+1)+p)\displaystyle>(1-p)^{m}(2c^{n-1}+pm(2c^{n-1}+1)+p)
2cn1\displaystyle 2c^{n-1} >epm(2cn1+pm(2cn1+1)+p)\displaystyle>e^{-pm}(2c^{n-1}+pm(2c^{n-1}+1)+p) (Since (1p)mepm(1-p)^{m}\leq e^{-pm})
epm\displaystyle e^{pm} >1+mp2cn1+12cn1+p2cn1\displaystyle>1+mp\frac{2c^{n-1}+1}{2c^{n-1}}+\frac{p}{2c^{n-1}}
epm\displaystyle e^{pm} >1+m2cn1+12cn1+12cn1\displaystyle>1+m\frac{2c^{n-1}+1}{2c^{n-1}}+\frac{1}{2c^{n-1}} (Since p1p\leq 1)
\displaystyle e^{pm}>1+m+\frac{m+1}{2c^{n-1}}
p\displaystyle p >log(1+m+m+12cn1)m\displaystyle>\frac{\log(1+m+\frac{m+1}{2c^{n-1}})}{m}

This is thus sufficient for the agent to take path PnP_{n}. ∎
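Since r cancels from both sides, the emptiness condition depends only on p, m, c, and n. The sketch below (ours, with the arbitrary choices c=2 and n=6) checks that whenever p clears the stated threshold, d(p,m) indeed dominates \frac{(1-p)^{m}}{2c^{n-1}}:

import math

def d(p, m):
    return (1 - (1 - p)**m * (1 + p * m)) / (p * (m + 1))

c, n = 2.0, 6
for m in (1, 3, 10, 50):
    thr = math.log(1 + m + (m + 1) / (2 * c**(n - 1))) / m
    for p in (thr * 1.001, (thr + 1) / 2, 0.99):     # values at or above the threshold
        if p > 1:
            continue
        empty = d(p, m) >= (1 - p)**m / (2 * c**(n - 1))
        print(f"m={m:>2}  p={p:.3f}  threshold={thr:.3f}  interval empty: {empty}")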

So, by the claim, if p>log(1+m+m+12cn1)/mp>\log(1+m+\frac{m+1}{2c^{n-1}})/m, the following strategy is a Bayes-Nash equilibrium:

P(b)={take P0,brd(p,m)+ctake Pn,otherwise\displaystyle P(b)=\begin{cases}\text{take }P_{0},&b\leq rd(p,m)+c\\ \text{take }P_{n},&\text{otherwise}\\ \end{cases}

This is exactly the strategy stated in the theorem. ∎

Proof (Proof of Theorem 5.3)

Recall that, for the equal revenue distribution shifted by cc, F(z)=11zc+1F(z)=1-\frac{1}{z-c+1}. Now, we attempt to solve F(rd(p,m)+c)=pF(rd(p,m)+c)=p for pp:

11rd(p,m)+1\displaystyle 1-\frac{1}{rd(p,m)+1} =p\displaystyle=p
rd(p,m)+11\displaystyle rd(p,m)+1-1 =p(rd(p,m)+1)\displaystyle=p(rd(p,m)+1)
rd(p,m)\displaystyle rd(p,m) =p(rd(p,m)+1)\displaystyle=p(rd(p,m)+1)
rd(p,m)(1p)\displaystyle rd(p,m)(1-p) =p\displaystyle=p
r(1p)1(1p)m(1+pm)p(m+1)\displaystyle r(1-p)\cdot\frac{1-(1-p)^{m}(1+pm)}{p(m+1)} =p\displaystyle=p
(1p)1(1p)m(1+pm)p2\displaystyle(1-p)\cdot\frac{1-(1-p)^{m}(1+pm)}{p^{2}} =m+1r\displaystyle=\frac{m+1}{r}

We only need an upper bound on p; we do not need to solve this equation for general m. Implicit differentiation shows that p is increasing in m when the above equation is satisfied. Given this fact, an upper bound for p is simply the limit as m\to\infty: since (1-p)^{m}(1+pm)\to 0 for any fixed p>0 and \frac{m+1}{r}=\frac{1}{s}, the limiting equation is:

1pp2\displaystyle\frac{1-p}{p^{2}} =1s\displaystyle=\frac{1}{s}
sspp2\displaystyle s-sp-p^{2} =0\displaystyle=0
p\displaystyle p =4s+s2s2\displaystyle=\frac{\sqrt{4s+s^{2}}-s}{2}
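A short numerical check of this argument (ours; s=10 is an arbitrary per-agent reward, and the bisection assumes the left-hand side above is decreasing in p on (0,1), which holds in these examples) solves the finite-m equation and confirms that p increases toward the stated bound as m grows:

import math

def lhs(p, m):
    return (1 - p) * (1 - (1 - p)**m * (1 + p * m)) / p**2

def solve_p(m, s, iters=200):
    # bisect lhs(p, m) = 1/s on (0, 1)
    lo, hi = 1e-6, 1 - 1e-12
    for _ in range(iters):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if lhs(mid, m) > 1.0 / s else (lo, mid)
    return (lo + hi) / 2

s = 10.0
bound = (math.sqrt(4 * s + s * s) - s) / 2
for m in (1, 2, 5, 20, 100):
    print(f"m={m:>3}  p={solve_p(m, s):.5f}   (m -> infinity bound: {bound:.5f})")

For m=1 the solution is \frac{s-1}{s}=0.9, matching the two-agent probability, and the values increase toward the bound, consistent with the monotonicity claim.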