Email: {sarafa,karlin,jamiemmt}@cs.washington.edu
Competition Alleviates Present Bias in Task Completion
Abstract
We build upon recent work (Kleinberg and Oren, 2014; Kleinberg et al., 2016, 2017) that considers present-biased agents, who place more weight on costs they must incur now than on costs they will incur in the future. They consider a graph-theoretic model where agents must complete a task, and show that present-biased agents can take exponentially more expensive paths than the optimal one. We propose a theoretical model that adds competition into the mix – two agents compete to finish a task first. We show that, in a wide range of settings, a small amount of competition can alleviate the harms of present bias. This can help explain why biased agents may not perform so poorly in naturally competitive settings, and can guide task designers on how to protect present-biased agents from harm. Our work thus paints a more positive picture than much of the existing literature on present bias.
Keywords:
present bias, behavioral economics, incentive design
1 Introduction
One of the most influential lines of recent economic research has been behavioral game theory (Ariely and Jones, 2008; Kahneman and Tversky, 1979). The majority of economics research makes several idealized assumptions about the behavior of rational agents to prove mathematical results. Behavioral game theory questions these assumptions and proposes models of agent behavior that more closely align with human behavior. Through experimental research (DellaVigna and Malmendier, 2006; DellaVigna, 2009), behavioral economists have observed and codified several common types of cognitive biases, from loss aversion (Kahneman and Tversky, 1979) (the tendency to prefer avoiding loss to acquiring equivalent gains) to the sunk cost fallacy (Arkes and Blumer, 1985) (the tendency to factor in previous costs when determining the best future course of action) to present bias (Frederick et al., 2002) (the current topic). One primary goal of theorems in game theory is to offer predictive power. This perspective is especially important in the many computer science applications of these results, from modern ad auctions to cryptocurrency protocols. If these theorems are to predict human behavior, the mathematical models ought to include observed human biases. Thus, rather than viewing behavioral game theory as conflicting with the standard mathematical approach, the experimental results of behavioral game theory can inform more sophisticated mathematical models. This paper takes a step towards this goal, building on seminal work of Kleinberg and Oren (2014) and Kleinberg et al. (2016, 2017), who formulated a mathematical model for planning problems where agents are present biased.
Present bias refers to overweighting immediate costs relative to future costs. This is a ubiquitous bias in human behavior that explains diverse phenomena. The most natural example is procrastination, the familiar desire to delay difficult work, even when this predictably leads to negative consequences later. Present bias can also model the tendency of firms to prefer immediate gains to long-term gains and the tendency of politicians to prefer immediate results to long-term plans. One simple model of present bias (Kleinberg and Oren, 2014; Kleinberg et al., 2016, 2017) is to multiply costs in the current time period by a present-bias parameter b > 1 when making plans. This model is a special case of hyperbolic discounting, where costs are discounted in proportion to how much later one would experience them. But even this special case suffices to induce time-inconsistency, resulting in a rich set of strategies consistent with human behavior.
Examples of time inconsistent behavior extend beyond procrastination. For example, one might undertake a project, and abandon it partway through, despite the underlying cost structure remaining unchanged. One might fail to complete a course with no deadlines, but pass the same course with weekly deadlines. Many people pay for a gym membership but never use it. Kleinberg and Oren (2014) presented the key insight that this diverse range of phenomena can all be expressed in a single graph-theoretic framework, which we describe below.
Fix a directed, acyclic graph G, with designated source s and sink t. Refer to G as a task graph, where s is the start of the task and t the end. A path through this graph corresponds to a plan to complete the task; each edge represents one step of the plan. Each edge e has a weight c(e) corresponding to the cost of that step.
The goal of an agent is to complete the task while incurring the least cost (i.e., to take the cheapest path from s to t). An optimal agent will simply follow such a cheapest path. A naive present-biased agent with bias parameter b > 1 behaves as follows. At s, they compute their perceived cost for taking each path to t by summing the weights along this path, with the first edge scaled up by b. They choose the path with the lowest perceived cost and take one step along it, say to v, and then recompute their perceived cost along each path from v to t. Notice that such an agent may choose a path at s, take one edge along that path, and then deviate away from it. This occurs because the agent believes that, after the current choice of edge, they will pick the path from v to t with lowest true cost. But once they arrive at v, their perceived cost of a path differs from its true cost, and they pick a path with lowest perceived cost. This is why the agents are considered naive: they incorrectly assume their future self will behave optimally, and thus exhibit time-inconsistent behavior. See Figure 1 for an example.
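To make the traversal rule concrete, the following sketch (our own illustration, not code from the paper) simulates a naive agent on a toy task graph; the node names and edge costs are assumptions, chosen so that the agent's plan at the start differs from the path it actually walks:

```python
def true_dist(graph, t):
    """Cheapest true cost from every node to the sink t in a DAG."""
    memo = {t: 0.0}
    def d(v):
        if v not in memo:
            memo[v] = min(c + d(u) for u, c in graph[v])
        return memo[v]
    for v in graph:
        d(v)
    return memo

def naive_step(graph, v, b, d):
    """The successor a naive agent at v chooses: the immediate edge cost is
    scaled by b, and the remainder is priced at true (optimal) cost."""
    return min(graph[v], key=lambda e: b * e[1] + d[e[0]])[0]

def naive_path(graph, s, t, b):
    """The s -> t path a naive agent with bias b actually walks."""
    d = true_dist(graph, t)
    path, v = [s], s
    while v != t:
        v = naive_step(graph, v, b, d)
        path.append(v)
    return path

# Toy graph: at 'a' the agent planned to finish directly (true cost 4), but
# once there the biased comparison 2*4 > 2*1 + 4 makes the detour through 'm'
# look cheaper -- a time-inconsistent switch.
graph = {"s": [("a", 1.0)],
         "a": [("t", 4.0), ("m", 1.0)],
         "m": [("t", 4.0)],
         "t": []}
print(naive_path(graph, "s", "t", 2.0))  # ['s', 'a', 'm', 't']
```

An unbiased agent (b = 1) on the same graph finishes directly via ['s', 'a', 't'].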
The power of this graph theoretic model is that it allows us to answer questions over a range of planning problems, and to formally investigate which tasks represent the “worst-case” cost of procrastination. This is useful both to understand how present-biased behavior differs from optimal behavior and to design tasks to accommodate present bias. We now briefly summarize the existing literature, to motivate our introduction of competition to the model.
1.1 Prior Work
The most striking result is that there are graphs where the cost ratio (the ratio of the biased agent's cost to the optimal agent's cost) is exponential in the size of the graph. In addition, all graphs with exponential cost ratio share a common structure – they all have a large n-fan as a graph minor (and graphs without exponential cost ratio do not) (Kleinberg and Oren, 2014; Tang et al., 2017). So this structure encodes the worst-case behavior for present bias in the standard model (and we later show how competition is especially effective in this graph). An n-fan is pictured in Figure 2.
The exponential cost ratio demonstrates the severe harm caused by present bias. How, then, can designers of a task limit the negative effects of present bias? Kleinberg and Oren (2014) propose a model where a reward is given after finishing the task, and where the agent will abandon the task if, at any point, they perceive the remaining cost to be higher than the reward. Unlike an optimal agent, a biased agent may abandon a task partway through. Figure 3 uses the gym membership example from earlier to show this. To address this, they give the task designer the power to arbitrarily delete vertices and edges, which can model deadlines, as Figure 4 shows. They then investigate the structure of minimally motivating subgraphs – the smallest subgraphs on which the agent completes the task, for some fixed reward. Follow-up work of Tang et al. (2017) shows that finding any motivating subgraph is NP-hard. Instead of deleting edges, Albers and Kraft (2019) consider the problem of spreading a fixed reward onto arbitrary vertices to motivate an agent, and find that this too is NP-hard (with a constrained budget). For other recent work involving present bias, see (Gravin et al., 2016; Albers and Kraft, 2017; Yan and Yong, 2019; Oren and Soker, 2019; Ma et al., 2019).
The above results all focus on accommodating present bias rather than alleviating it. By that, we mean that the approaches all focus on whether the agent can be convinced to complete the task – via edge deletion or reward dispersal – but not on guarding the agent from suboptimal behavior induced by their bias. Kleinberg et al. (2016) partially investigates the latter question in a model involving sophisticated agents, who plan around their present bias. They consider several types of commitment devices – tools by which sophisticated agents can constrain their future selves. However, these tools may require more powerful agents or designers and don’t necessarily make sense for naive agents. We take a different approach – we show that adding competition can simultaneously explain why present-biased agents may not perform exponentially poorly in “natural” games and guide task designers in encouraging biased agents towards optimal behavior.
1.2 Our Model
In our model, a task is still represented by a directed, acyclic graph G, with a designated source s and sink t. There are two naive present-biased agents, A_1 and A_2, both with bias b, who compete to get to t first. The cost of a path is the sum of the weights along the path, and time is represented by the number of edges in the path, which we call the length of the path. In other words, each edge represents one unit of time. The first agent to get to t gets a reward of r; ties are resolved by evenly splitting the reward. Recall that naive agents believe that they will behave optimally in the future. Thus, an agent currently at v considers the cost of first crossing the edge (v, u) to be b · c(v, u), plus the cost of the optimal path from u to t, minus the reward of that path. More formally, let P(v, t) denote the set of paths from v to t. Let c_n(v) denote the remaining cost that the naive agent believes they will incur while at v and planning to go to t. The subscript n stands for “naive” (to help distinguish from c(e), the cost of the edge e). Then:
c_n(v) = min_{(v,u) ∈ E} [ b · c(v, u) + min_{P ∈ P(u, t)} ( c(P) − r(P) ) ]     (1)
where c(P) denotes the cost of path P and r(P) denotes the reward of taking path P from u to t. This reward depends on the path the other agent takes. Specifically, if the opponent takes a path of length L, and the agent's full walk (the path taken so far, the edge (v, u), and then P) has length ℓ, then r(P) = r if ℓ < L, r(P) = r/2 if ℓ = L, and r(P) = 0 if ℓ > L.
We will often rewrite the second term in (1), for ease of notation, as a single net continuation cost from u. We sometimes refer instead to the naive agent's utility, which is the negation of this cost. Given this cost function, the naive agent at node v chooses the successor u attaining the outer minimum in (1).
See Figure 5 for an example.
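Equation (1) can be implemented directly by brute-force path enumeration. In this sketch (our own illustration), the toy graph, parameter values, and opponent path length are assumptions:

```python
from math import inf

def all_paths(graph, v, t):
    """Every v -> t path in a DAG, as a list of vertex lists."""
    if v == t:
        return [[t]]
    return [[v] + rest for u, _ in graph[v] for rest in all_paths(graph, u, t)]

def edge_cost(graph, v, u):
    return next(c for w, c in graph[v] if w == u)

def perceived_cost(graph, v, t, b, steps_taken, opp_len, r):
    """Equation (1): the naive agent's perceived cost-to-go at v, given that
    the opponent finishes in opp_len steps. The first edge of each candidate
    plan is scaled by b; the plan's reward is r, r/2 or 0 depending on
    whether it beats, ties or loses to the opponent."""
    best = inf
    for path in all_paths(graph, v, t):
        length = steps_taken + len(path) - 1
        reward = r if length < opp_len else (r / 2 if length == opp_len else 0.0)
        cost = b * edge_cost(graph, path[0], path[1])
        cost += sum(edge_cost(graph, x, y) for x, y in zip(path[1:], path[2:]))
        best = min(best, cost - reward)
    return best

# Tiny example: a direct edge of cost 4 (which ties a length-1 opponent)
# versus a two-step detour of total cost 3 (which loses).
g = {"s": [("t", 4.0), ("a", 1.0)], "a": [("t", 2.0)], "t": []}
print(perceived_cost(g, "s", "t", 2.0, 0, 1, 2.0))  # 4.0 (detour wins the min)
```

With the small reward of 2, the biased agent perceives the losing detour (2·1 + 2 = 4) as no worse than tying directly (2·4 − 1 = 7); raising r shifts this comparison toward the quicker plan.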
We now consider how this model of competition might both explain the outcomes of natural games and inform task designers on how to elicit better behavior from biased agents. For a natural game, consider the classic example of two companies competing to expand into a new market. Both companies want to launch a similar product, and are thus considering the same task graph G. The companies are also present biased, since shareholders often prefer immediate profit maximization/loss minimization over long-term optimal behavior. The first company to enter the market gains an insurmountable advantage, represented by reward r. If the companies both enter the market at the same time, they split the market share, each getting reward r/2. This arrangement can be modeled within our framework, and the competition between the companies should lead them to play a set of equilibrium strategies.
For a designed game, consider the problem of encouraging students to submit final projects before they are due. The instructor sets a deadline near the end of finals week to give students flexibility to complete the project when it best fits their schedule. The instructor also knows that (1) students tend to procrastinate and (2) trying to complete the final project in a few days is much more challenging than spreading it out. They would like to convince students to work on and possibly submit their assignments early, without changing the deadline (to allow flexibility for the students whom it suits best). One possible solution would be to give a small amount of extra credit to the first submission. How might they set this reward to encourage early submissions?
For another example of a designed game, consider a gym that enjoys increased enrollment at the start of the year. However, their new customers tend to not visit the gym frequently, and eventually cancel their membership. To remedy this, they try a new promotion, where new customers are offered free fitness classes, and the first customer(s) to attend 5 such classes gets a discount on their annual membership. How effective might such a scheme be at encouraging their new customers to regularly use their gym? Across these three examples, the intuition is that competition will alleviate the harms of present bias by driving agents towards optimal behavior.
1.3 Summary of Results
We have introduced a model of competition for completing tasks along some graph. We warm up by analyzing these games absent present bias. Namely, we classify all Nash equilibria for an arbitrary task graph with unbiased agents. This analysis involves characterizing the set of paths which are the cheapest of a given length.
We then analyze these games when agents have equal present bias. We show that a very small reward induces a Nash equilibrium on the optimal path, for any graph with a dominant path. This is a substantial improvement over the exponential worst-case cost ratio experienced without competition. We then discuss how time-inconsistency defeats the intuition that higher rewards cause agents to prefer quicker paths. Despite this complication, we provide an algorithm that, given an arbitrary graph G and path P, determines the minimum reward needed to get a Nash equilibrium on P, if one is possible.
Finally, we add an element of bias uncertainty to the model by drawing agents' biases iid from a distribution D and, for the n-fan, describe the relationship between D and the reward required for a Bayes-Nash equilibrium on the optimal path. For a wide range of distributions, we find that small rewards suffice to ensure that agents behave optimally (with high probability) in equilibrium. For the stronger goal of ensuring a constant expected cost ratio, a modest, linearly sized reward suffices when D is not heavy-tailed; competition thus helps here as well.
2 Nash Equilibria with Unbiased Competitors
To build intuition, we first describe the Nash equilibria of these games when agents have no present bias. We also pinpoint where the analysis will change with the introduction of bias. Notice that each path P in the graph is a strategy, with payoff r − c(P), r/2 − c(P), or −c(P), depending on whether the agent wins, ties or loses, respectively. (These in turn depend on the path taken by the opponent.) We first rule out dominated paths. Notice that if c(P′) ≤ c(P) and |P′| = |P|, path P is (weakly) dominated by path P′, regardless of the path taken by the opponent. Also, if c(P′) ≤ c(P) and |P′| < |P| (where |P| is the number of edges in P), then P is dominated. Therefore, for any length ℓ, a single cheapest path of length ℓ will (weakly) dominate all other paths of length ℓ.
For a given graph G, let P_1, …, P_k be a minimal set of non-dominated paths, ordered so that |P_i| < |P_{i+1}| for each i. Thus, P_1 is the quickest path, and each P_{i+1} is the cheapest remaining path of minimum length. Summarizing what we know about these paths:
1. Winning is better than losing: on any fixed path P_i, winning (utility r − c(P_i)) is strictly better than losing (utility −c(P_i)). Thus, in particular, among losing responses the best is the cheapest path, P_k.
2. Longer paths are more rewarding: That is, c(P_{i+1}) < c(P_i) for each i. Otherwise, P_{i+1} would be dominated, since its length is greater. Therefore, in particular, P_k is the cheapest path, i.e., the lowest cost/weight path from s to t.
We’re interested in characterizing, across all possible task graphs, the pure Nash equilibria, restricting attention to paths in {P_1, …, P_k}.
Proposition 1
Let G be an arbitrary task graph. As above, let P_1, …, P_k be a minimal set of (non-dominated) paths, ordered so P_1 is the quickest and P_k the cheapest. Suppose k ≥ 2. Then, path P_i, where i ≥ 2, is a symmetric Nash equilibrium if and only if 2(c(P_i) − c(P_k)) ≤ r ≤ 2(c(P_{i−1}) − c(P_i)). P_1 is a symmetric Nash if and only if r ≥ 2(c(P_1) − c(P_k)). There are no other pure Nash equilibria. Therefore, there can be either 0, 1 or 2 pure Nash equilibria.
If k = 2, there is an additional asymmetric pure Nash equilibrium, where one player plays the quickest path and the other plays the cheapest path, if r = 2(c(P_1) − c(P_2)).
Proof
First, suppose k ≥ 2 and the opponent picks P_i, where i ≥ 2. If the agent plays some P_j with j < i, they win and get r − c(P_j); among these winning responses, the cheapest is P_{i−1}, so the agent need not consider any other. Similarly, since longer paths are cheaper, the best losing response is the cheapest path P_k, with utility −c(P_k). Thus, the only choices are between tying on P_i, winning on P_{i−1}, or losing on P_k; so P_i is a symmetric Nash exactly when r/2 − c(P_i) ≥ r − c(P_{i−1}) and r/2 − c(P_i) ≥ −c(P_k), or equivalently 2(c(P_i) − c(P_k)) ≤ r ≤ 2(c(P_{i−1}) − c(P_i)). Similarly, if the opponent plays P_1, the agent cannot win, and so chooses between tying on P_1 or losing; in the latter case, losing on the cheapest path gives highest utility. The agents therefore have a symmetric Nash exactly when tying on P_1 is better than losing on P_k, i.e., r/2 − c(P_1) ≥ −c(P_k), or r ≥ 2(c(P_1) − c(P_k)).
The only strategy profile we haven’t considered (up to symmetry) is one player on P_1 and the other on P_k; when might this be a Nash equilibrium? Suppose the opponent plays P_k. The agent wins with any quicker path, and among winning responses the cheapest is P_{k−1}. If k ≥ 3, then c(P_{k−1}) < c(P_1), and so this profile is not an equilibrium. Otherwise, if k = 2, the winner must prefer winning on P_1 to tying on P_2, i.e. r − c(P_1) ≥ r/2 − c(P_2), while the loser must prefer losing on P_2 to tying on P_1, i.e. −c(P_2) ≥ r/2 − c(P_1). Together, these hold exactly when r = 2(c(P_1) − c(P_2)). ∎
If we apply the same reasoning when the tie-breaking rule is different – for example, when ties are resolved by both players getting the full reward – any P_i is a (symmetric) pure Nash (and there are no other pure Nash). On the other hand, if ties result in neither player receiving any reward, the same reasoning implies that we only have a pure (asymmetric) Nash when the reward is large enough that winning beats tying; otherwise, there are no pure Nash.
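The construction of the non-dominated set P_1, …, P_k above can be sketched as a cheapest-cost-per-path-length dynamic program (our own illustration; the toy graph is an assumption):

```python
def nondominated(graph, s, t):
    """Cheapest s -> t cost for each possible path length (edge count), keeping
    a length only when its cheapest cost strictly beats every quicker length.
    The graph is a DAG given as an adjacency dict {v: [(u, cost), ...]}."""
    INF = float("inf")
    n = len(graph)
    best = {(0, s): 0.0}                 # best[(k, v)]: cheapest cost with exactly k edges
    for k in range(n):
        for v in graph:
            for u, c in graph[v]:
                cand = best.get((k, v), INF) + c
                if cand < best.get((k + 1, u), INF):
                    best[(k + 1, u)] = cand
    result, cheapest_so_far = [], INF
    for k in range(1, n + 1):
        c = best.get((k, t), INF)
        if c < cheapest_so_far:          # longer paths survive only if strictly cheaper
            result.append((k, c))
            cheapest_so_far = c
    return result

# Quickest path is expensive; each longer surviving path is strictly cheaper.
g = {"s": [("t", 10.0), ("a", 1.0)], "a": [("t", 3.0), ("b", 1.0)],
     "b": [("t", 1.0)], "t": []}
print(nondominated(g, "s", "t"))  # [(1, 10.0), (2, 4.0), (3, 3.0)]
```

The first pair corresponds to P_1 (quickest) and the last to P_k (cheapest), matching the ordering used in the proposition above.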
We next turn our attention to the biased version of this problem. In the unbiased case, we could take a “global” view of the graph, and think about paths purely in terms of their overall length and cost. But when agents are biased, the actual structure of the path is very important; time-inconsistency means that agents look at paths locally, not globally. It is thus very difficult to cleanly rule out dominated paths – even paths with exponentially high cost may be taken, as we see next.
3 Nash Equilibria to Elicit Optimal Behavior from Biased Agents
We now assume that the agents are both naive, present-biased agents, with shared bias parameter b.1 1 The homogeneity of the agents is not particularly important to our results in this section. If the agents have different bias parameters, our results go through by setting b equal to the larger of the two biases. Our high-level goal is to show that competition convinces biased agents to take cheap paths, as unbiased agents do without competition. To this end, we show that a small amount of reward creates a Nash equilibrium on the cheapest path, first on the n-fan, and then for all graphs which have a dominant path – a cheapest path that is also the uniquely quickest path.
3.1 The n-fan
We focus first on the n-fan, since all graphs with exponential cost ratio for naive agents have a large n-fan as a graph minor (Kleinberg and Oren, 2014). For an example of the n-fan, refer back to Figure 2; note in particular that we assume the bias parameter b > 1. We show that, with a very small amount of reward, competition encourages optimal behavior in naive agents (and further, we fully characterize all equilibria on the n-fan). To aid our proofs and discussions regarding the n-fan, we define P_j as the path from s to t whose last edge leaves the spine of the fan at v_j. Define P_0 as the direct path from s to t.
Theorem 3.1
Let G be an n-fan with two agents of bias b, and let r*(b) denote the constant reward threshold, independent of n, derived in the proof below. Then, with a reward of r ≥ r*(b), there will be a Nash equilibrium on the optimal path. Further, unless r is exponentially large, there is also a Nash equilibrium on the longest path. There are no other pure Nash equilibria, for any value of r.
Proof
We first show that r ≥ r*(b) guarantees the existence (not uniqueness) of an optimal pure Nash equilibrium. Suppose A_2 takes the optimal path P_0. While standing at s, A_1 compares going directly to t, which ties A_2 and earns r/2, against stepping along the spine, which arrives after A_2 and earns nothing. The direct edge's cost is inflated by b, while the spine option is evaluated at the un-inflated cost of the believed-optimal continuation; once r/2 covers the gap between these two perceived costs – a constant independent of n – A_1 (weakly) prefers to go directly to t. Using a similar calculation, we can see that when r is below an exponentially large threshold, there is a Nash equilibrium on the longest path, P_n. This result is somewhat surprising – even with an extremely high reward, the naive agents may take the longest path. To see why, note that at any vertex v_j, the naive agent is only comparing two options: taking the shortcut to t right now, or waiting one step and taking the shortcut from v_{j+1} (since they naively believe they will behave optimally in the future). Further, when A_2 takes the longest path, A_1 gets the same reward from P_j and P_{j+1}. Thus, just as the agent prefers to procrastinate in the n-fan without competition, they similarly procrastinate here until v_n, where the reward must be very large for them to deviate from the longest path.
To show that there are no other Nash equilibria, suppose A_2 takes P_j, where 1 ≤ j ≤ n − 1. As we explained above, since A_1 gets the same reward from every path that beats P_j, and the same reward from every path that loses to it, the agent will always procrastinate to at least v_{j−1}, and if he chooses to go past v_j, he will procrastinate until v_n. Now, suppose A_1 is standing at v_{j−1}. He will take the winning path P_{j−1} only if the full reward outweighs the bias-inflated premium of its shortcut. Either the reward satisfies this, which shows that P_j is not a Nash equilibrium, or it does not, and A_1 continues to v_j. Suppose he continues to v_j. Then, he (weakly) prefers to tie on P_j only if half the reward outweighs the bias-inflated premium of the shortcut at v_j; but the analogous comparison at v_{j−1} already failed with the full reward at stake and a cheaper shortcut, so the reward cannot satisfy this. Thus, A_1 prefers to procrastinate until v_n. In summary, if A_2 takes path P_j, A_1 prefers either P_{j−1} or P_n (depending on the reward). The same argument applies to A_2, so there are no Nash equilibria on paths P_1, …, P_{n−1}, regardless of r. ∎
The ability to enable optimal behavior with just a constant reward stands in striking contrast to the setting where agents can abandon the task at any time. With neither competition nor internal rewards, the designer would have to place an exponentially high reward at t for biased agents to finish the task.2 2 However, with internal edge rewards, the designer could spend the same total reward to motivate two agents by directly adding a reward to the first edge of the optimal path. This equivalence doesn’t hold in all graphs, as we show in the next section.
3.2 Graphs with an Unbiased Dominant Strategy
We now generalize our results to a larger category of graphs. To focus exclusively on the irrationality of present bias, rather than on the optimization problem of choosing between cheap, long paths and short, expensive paths, we focus on graphs with a dominant path – a cheapest path that is also the uniquely quickest path.3 3 There may be other cheapest paths which are longer. This is the case in the n-fan. In this setting, the problem is trivial for unbiased agents: simply take the dominant path. But for biased agents, the problem is still interesting; as the example of the n-fan shows, they may take paths that are exponentially more costly than the dominant path. However, we prove that a small amount of competition and reward suffices to ensure the existence of a Nash equilibrium where both agents take the dominant path.
Theorem 3.2
Suppose G is a task graph that has a dominant path, P*. Then, a reward of r = 2(b − 1) · max_{e ∈ P*} c(e) guarantees a Nash equilibrium on P*, for two agents with bias b.
Proof
Assume that A_2 takes P*. Recall that a biased agent at v perceives the remaining cost of first traversing an edge (v, u) as b · c(v, u), plus the best net continuation cost (cost minus reward) from u. We know that for any vertex u on the dominant path, the continuation that minimizes this second term is just the fragment of the dominant path from u (it is both the quickest and the cheapest way to get from u to t). Further, since P* is uniquely quickest, any deviation from the dominant path results in no reward; so, for any u not on the dominant path, the continuation that minimizes the second term is again the cheapest path from u. Thus, the cost equation simplifies to:

c_n(v) = min_{(v,u)} [ b · c(v, u) + d(u) − (r/2) · 1[on-path] ]

where d(u), the distance from u, denotes the cost of the cheapest path from u to t (ignoring rewards) and 1[on-path] is simply an indicator variable that’s 1 if the agent has not deviated from the dominant path (and u lies on it).
Now, let P* = (s = w_0, w_1, …, w_m = t) be the dominant path and suppose A_2 takes this path. In order for A_1 to choose P*, we require, for all i and all neighbors u ≠ w_{i+1} of w_i:

b · c(w_i, w_{i+1}) + d(w_{i+1}) − r/2 ≤ b · c(w_i, u) + d(u).

Now, let w_i be arbitrary, let u be an arbitrary neighbor of w_i, and, for ease of notation, let e = (w_i, w_{i+1}) and c′ = c(w_i, u). Then, we get the following bound on the reward: r/2 ≥ b · c(e) + d(w_{i+1}) − b · c′ − d(u). To get a rough sufficient bound, notice that c′ + d(u) ≥ c(e) + d(w_{i+1}), since P* is the cheapest path. This implies that b · c(e) + d(w_{i+1}) − b · c′ − d(u) ≤ (b − 1) · c(e). Thus, it suffices to set r ≥ 2(b − 1) · c(e) in order to ensure that A_1 stays on P* at w_i. Repeating this argument for all i, we see that a sufficient reward is r = 2(b − 1) · max_{e ∈ P*} c(e). ∎
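The per-vertex stay-vs-deviate comparison from the proof can be turned into a small checker. This is our own sketch, under the assumption that any deviation off the dominant path loses the race (and so forfeits the reward), as is the case when the dominant path is uniquely quickest:

```python
from math import inf

def dist_to(graph, t):
    """Cheapest true cost from every node to t in a DAG."""
    memo = {t: 0.0}
    def d(v):
        if v not in memo:
            memo[v] = min(c + d(u) for u, c in graph[v])
        return memo[v]
    for v in graph:
        d(v)
    return memo

def min_reward_on_dominant(graph, path, t, b):
    """Smallest reward r making the dominant path a symmetric Nash
    equilibrium: at each vertex, tying on the path (worth r/2) must be
    perceived as no worse than the best losing deviation."""
    d = dist_to(graph, t)
    need = 0.0
    for v, w in zip(path, path[1:]):
        c_e = next(c for u, c in graph[v] if u == w)
        stay = b * c_e + d[w]                 # perceived cost before the r/2 credit
        deviate = min((b * c + d[u] for u, c in graph[v] if u != w), default=inf)
        need = max(need, 2 * (stay - deviate))
    return max(need, 0.0)

# A small fan with bias b = 3 and shortcut growth 2: the dominant path is v0 -> t.
fan = {"v0": [("v1", 0.0), ("t", 1.0)], "v1": [("v2", 0.0), ("t", 2.0)],
       "v2": [("t", 4.0)], "t": []}
print(min_reward_on_dominant(fan, ["v0", "t"], "t", 3.0))  # 2.0
```

On this instance the exact requirement (2.0) is below the sufficient bound of 2(b − 1)·max_e c(e) = 4, illustrating that the theorem's bound is a rough upper estimate.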
One might object to our claim that this reward is “small”. To calibrate our expectations, notice that we can view this problem as an agent trying to pick between several options (i.e. paths), each with their own reward and cost structure. We want to convince the agent to pick one particular option – namely, the cheapest path. But it would be unreasonable to expect that a reward significantly smaller than the cost of any option would sway the agent’s decision. Our theorem above shows that a reward at most proportional to the cost of the cheapest option suffices; and in many cases the reward is only a fraction of the cost of the optimal path (e.g. when the optimal path has its cost balanced among many edges).
For a point of comparison, even internal edge rewards (which required the same reward budget as competitive rewards for the n-fan) can require many times as much reward in some instances. To see the intuition, notice that internal edge rewards must be applied at every step where the agent might want to deviate. The agent also immediately “consumes” this reward; it doesn’t impact his future decision making. However, the competitive reward is “at stake” whenever the agent considers deviating; this reward can sway the agent’s behavior without being immediately consumed. For a concrete example, see Figure 6.
3.3 Increasing the Number of Competitors
A very natural extension to this model would involve more than 2 agents competing. The winner takes all, and ties are split evenly among those who tied. However, this modification doesn’t change much when trying to get a Nash equilibrium on the dominant path. The only change is that if k agents are competing, the required reward scales with k, as a single agent will get only a 1/k fraction of the reward in a symmetric equilibrium. This is true because there is no way for any agent to beat the dominant path and claim the entire reward for themselves. So if the reward is scaled appropriately (i.e., multiplied by k/2 relative to the two-agent reward of Theorem 3.2), we will still guarantee a Nash equilibrium. Put another way, the per-agent reward needed for a Nash equilibrium on the dominant path does not change as the number of competitors varies.
4 General Nash Equilibria
In this section, we describe a polynomial-time algorithm that, given an arbitrary graph G, path P and bias b, determines if P can be made a Nash equilibrium, and if so, the minimum required reward to do so. Finding and using this minimum required reward will generally cost much less than the bound given by Theorem 3.2. Moreover, this algorithm does not assume the existence of a dominant path. We start by describing how time-inconsistency defeats the intuition that higher rewards cause agents to prefer quicker paths. We then present an algorithm which computes the minimum r that results in (P, P) being a symmetric Nash equilibrium.4 4 In fact, the algorithm will return a set of disjoint intervals containing all values of r that result in (P, P) being a symmetric Nash equilibrium.
4.1 Higher Rewards Need Not Encourage Quicker Paths
The proof of Theorem 3.2 suggests the following algorithm for this problem. Start with a reward of 0, and step along each vertex v_i of P, increasing the reward by just enough to ensure A_1 stays on P for one more step (assuming A_2 is taking P). However, if A_1 wants to deviate onto a quicker/tied path at any point, report failure: decreasing the reward would cause them to deviate earlier, and, intuitively, it seems that increasing the reward could not cause them to switch back to P from the quicker/tied path. After one pass, simply pass through P again to ensure that the final reward doesn’t cause A_1 to deviate early on. The following lemma shows that this algorithm is tractable, by showing that we can compute the minimum reward required for A_1 to stay on P at any step (and determine whether A_1 wants to deviate onto a quicker path).
Lemma 1
Assuming that A_2 takes path P, A_1 can efficiently compute c_n(v) by considering the cheapest path from v, the cheapest path from v that ties P, and the cheapest path from v that beats P. (Some of these paths may coincide, and at least one must exist.)
Proof
If A_1 assumes that A_2 takes path P, then the reward is simply r if the path under consideration wins, r/2 if it ties, and 0 if it loses. Thus, the minimizer must be the cheapest path in one of these three cases (and the cheapest path overall always exists; the other two may not). Specifically, let k be the number of edges with which a path from v must finish in order to tie P, and let c_any, c_tie and c_win denote the perceived costs of the cheapest paths from v with any number of edges, at most k edges, and strictly fewer than k edges, respectively. Then c_n(v) = min(c_any, c_tie − r/2, c_win − r). These paths can be efficiently computed using the Bellman-Ford algorithm, which determines the cheapest path with at most j edges, for every j. ∎
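The bounded-hop table the lemma relies on can be sketched as follows (our own illustration of the Bellman-Ford layering; the toy graph is an assumption):

```python
def bounded_cheapest(graph, s, k):
    """table[j][v] = cheapest cost of an s -> v path using at most j edges,
    for j = 0 .. k (one Bellman-Ford relaxation layer per hop bound)."""
    INF = float("inf")
    table = [{v: (0.0 if v == s else INF) for v in graph}]
    for _ in range(k):
        prev, cur = table[-1], dict(table[-1])  # "at most j" includes "at most j-1"
        for v in graph:
            for u, c in graph[v]:
                if prev[v] + c < cur[u]:
                    cur[u] = prev[v] + c
        table.append(cur)
    return table

g = {"s": [("t", 10.0), ("a", 1.0)], "a": [("t", 3.0), ("b", 1.0)],
     "b": [("t", 1.0)], "t": []}
tbl = bounded_cheapest(g, "s", 3)
print(tbl[1]["t"], tbl[2]["t"], tbl[3]["t"])  # 10.0 4.0 3.0
```

The cheapest path with strictly fewer than k edges is read off row k − 1, the tying bound off row k, and the unconstrained cheapest path off the last row, matching the three cases in the lemma.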
Unfortunately, while tractable, the approach described above does not yield a correct algorithm. This is because it relies implicitly on the following two properties, which formalize the intuition that increasing the reward causes agents to favor quicker paths.
Property 1
If a reward r guarantees a Nash equilibrium on some path P, any reward r′ > r will either (a) still result in a Nash equilibrium on P, or (b) cause an agent to deviate to a quicker path.
Property 2
If A_1 deviates from P onto a quicker/tied path for some reward r, increasing the reward will not cause them to follow P.
Both properties are intuitive – if we increase the reward, that should motivate the agent to take a path that beats their opponent. And vice versa – increasing the reward shouldn’t cause them to shift onto a slower path or shift between equal length paths. But surprisingly, both properties are false, as the following examples demonstrate.
For Property 1, consider the graph in Figure 7(a) and define paths P_1, P_2 and P_3 as in the figure. Suppose both agents have bias b and that A_2 takes P_2. Then a moderate reward guarantees that A_1 takes P_2, as the believed-optimal plan at the branch vertex follows P_2. However, for a sufficiently larger reward, the believed-optimal plan from the branch vertex switches to the quicker P_1, and the agent follows its first edge. But at the next vertex, the perceived cost of finishing P_1 is higher than the agent had forecast, and so the agent prefers to finish along the slower P_3. Thus, increasing the reward causes the agent to deviate from P_2 onto a slower path!
For Property 2, consider the graph in Figure 7(b) and define paths P_1, P_2 and P_3 in the obvious manner. Again, suppose both agents have bias b and that A_2 takes P_2. Then, with a small reward r, the believed-optimal plan at the branch vertex follows the quicker P_1, so A_1 steps toward it and deviates from P_2. If the reward is raised to some r′ > r, the believed-optimal plan still follows P_1 at first. But at the next vertex, the larger reward changes the comparison: remaining in contention along P_2 is now more attractive than completing the deviation, and thus the agent takes P_2. So, although A_1 deviates from P_2 to a quicker path for reward r, they remain on P_2 with the larger reward r′.
To summarize, Property 1 fails because present-biased agents can take slower paths than they planned, and Property 2 fails because present-biased agents can take quicker paths than they planned. In other words, while higher rewards do tempt agents to take quicker paths, and lower rewards tempt agents towards cheaper paths, their time inconsistency may make them do the opposite.
4.2 The Algorithm
We now present an algorithm which finds the set of rewards that induce a symmetric Nash equilibrium on a path P (Algorithm 1). At a high level, the algorithm narrows down the set of feasible rewards (rewards that ensure that (P, P) is a Nash equilibrium) by computing the set of rewards that ensure that A_1 takes the edge (v_i, v_{i+1}) for every vertex v_i of P. The key idea is that we can efficiently compute all r that ensure that A_1 prefers v_{i+1} over any other successor u, by splitting into cases based on whether the believed-optimal paths from v_{i+1} and from u involve winning, tying, or losing. We now prove the correctness and runtime of the algorithm.
Theorem 4.1
Algorithm 1 returns the minimum r that ensures that (P, P) is a Nash equilibrium, or reports that no such r exists. Further, it runs in polynomial time.
Proof
Assume that A_2 takes P. The correctness of the algorithm follows from this lemma:
Lemma 2
Let a vertex v_i of P and a successor u of v_i be arbitrary. The set of intervals the algorithm computes for the pair (v_i, u) contains all rewards that ensure that A_1 prefers v_{i+1} over u when standing at v_i.
Proof
Suppose that the length of the path from to in is . From Lemma 1, we know that . Note that represents the cheapest way to win, the cheapest way to win or tie, and the cheapest way to win, tie, or lose (i.e. the cheapest path). So, . , and are defined similarly with respect to . Without loss of generality, assume that all these values exist (i.e. that one can win, tie, or lose from and ); if this isn’t the case, then all values which rely on them can be safely removed.
Now, define . This function is simply the piecewise combination of three linear functions; and represent the switching points. Define and similarly. Let be the pairwise intersection (to obtain the pairwise intersection of two sets of intervals, take the Cartesian product of the two sets to obtain pairs, and then take the intersection within each pair to obtain a new set of intervals). Then, notice that for any , has a closed form over . Since prefers over if , this expression can be solved for in the interval , yielding a single (sub)interval. Let denote this subinterval. Since is a partition of all possible , this means that , the union of all intervals , will contain all which ensure that prefers to . ∎
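The pairwise-intersection operation described above is straightforward to implement. A minimal sketch (the list-of-closed-intervals representation is our assumption, since the paper's notation was lost in extraction):

```python
def intersect_interval_sets(A, B):
    """Pairwise intersection of two sets of closed intervals.

    As described in the text: take the Cartesian product of the two sets
    to obtain pairs, intersect within each pair, and keep the non-empty
    results.
    """
    result = []
    for (a_lo, a_hi) in A:
        for (b_lo, b_hi) in B:
            lo, hi = max(a_lo, b_lo), min(a_hi, b_hi)
            if lo <= hi:  # non-empty intersection
                result.append((lo, hi))
    return result

# Example: intersecting two feasible-reward interval sets
print(intersect_interval_sets([(0, 5), (10, 20)], [(3, 12)]))  # [(3, 5), (10, 12)]
```

Note that the output has at most |A| · |B| intervals and is computed in O(|A| · |B|) time, matching the efficiency claim in the proof of Theorem 4.1.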
With this lemma, the proof of correctness is simple. if and only if is in all , which is what tracks. And thus sticks with if and only if is in all , which is exactly what tracks.
To prove that the algorithm is efficient, notice that contains at most intervals since . Further, the pairwise intersection of two sets of intervals can be computed in time and is of size , where and are the size of the input sets. Thus, all the interval intersections are efficient; in general, the runtime of the algorithm is dominated by the Bellman-Ford subroutine. ∎
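The cheapest-path values the algorithm relies on (the cheapest way to win, to win or tie, and to win, tie, or lose) are single-source shortest-path computations. A minimal sketch of the standard Bellman–Ford subroutine that dominates the runtime (the edge-list graph representation is our assumption, not the paper's):

```python
def bellman_ford(n, edges, src):
    """Single-source shortest paths on a graph with n vertices.

    edges is a list of (u, v, weight) triples; returns the list of
    shortest distances from src (float('inf') for unreachable vertices).
    """
    INF = float("inf")
    dist = [INF] * n
    dist[src] = 0
    for _ in range(n - 1):  # relax every edge n-1 times
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
    return dist

# Example: shortest distances from vertex 0
edges = [(0, 1, 4), (0, 2, 1), (2, 1, 2), (1, 3, 1)]
print(bellman_ford(4, edges, 0))  # [0, 3, 1, 4]
```

Each Bellman–Ford call costs O(nm) time on a graph with n vertices and m edges, which is where the polynomial runtime bound of Theorem 4.1 comes from.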
5 Extending the Model with Bias Uncertainty and Multiple Competitors
One of the shortcomings of the prior results is that agents are assumed to have publicly known, identical biases. Neither assumption is very realistic: in general, it seems odd for an agent to know the exact present bias of their opponent, or for two agents to have exactly the same bias. From a design perspective, the task designer might not have full knowledge of the agents’ biases, and so may be unsure how to set the reward. We therefore consider this game played by two agents whose biases are uncertain. The agents’ biases are represented by random variables and drawn iid from distribution , which is publicly known to both the agents and the designer. and correspond to the realizations of these random variables. Our goal is now to construct, as cheaply as possible, Bayes-Nash equilibria (BNE) in which agents behave optimally with high probability. In this setting, the cost equation becomes , where the expectation is over ’s choice of paths.
In this section, we provide a closed form BNE for the -fan. We start with the case of two agents and then consider competing agents. Since we are searching for BNE, we assume that the agents know their competitor’s strategy.
5.1 Bayes-Nash Equilibria on the -fan
As before, let represent the path that includes edge , and let represent the optimal path. Let denote the probability that takes any path in the set (for convenience, we write instead of for singleton sets; the probability is over ). Let denote the set of paths . Further, note that for any vertex , . This is because going directly to will always be the cheapest and quickest path from to . Thus:
Further, . With this in mind, we start with a basic claim:
Proposition 2
When standing at vertex , prefers over if:
Proof
If ’s expected utility from going immediately to is higher than their utility from procrastinating, then
It follows that
∎
This claim matches intuition; due to the tying mechanism, gets a higher reward by going to from if either takes or . Note that is comparing to because the latter describes what they (naively) believe they will do if they procrastinate. We now describe a Bayes-Nash equilibrium in the -fan.
Theorem 5.1
Let be an -fan with reward and suppose are drawn from distribution with CDF . Let be the solution to . If , then the following strategy is a Bayes-Nash equilibrium:
In this equilibrium, for either agent, the probability that they take is . So the expected cost ratio will be .
Proof
Using Proposition 2, at , prefers over if . Since , this simplifies to . Let . Then, since the biases are drawn iid, for this strategy to be a symmetric BNE we need:
So if satisfies this equation, by Proposition 2, prefers to go from to directly. Now, consider the case where . Then, prefers to go to . Again using Proposition 2, at , will prefer over if , which implies that . Since we know that , it’s certainly the case that , so this never holds. Thus, prefers to go to . The same argument can be repeated until the agent reaches . Then, the agent will prefer over if , i.e. if .
Combining our assumption that with this inequality, we get that must be less than . When this is false, the agent will always prefer when their bias . So, under these conditions, the given strategy is a BNE. ∎
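The exact fixed-point equation defining the equilibrium was lost in extraction, but such equations can be solved numerically by bisection. A sketch under stated assumptions: we look for a non-trivial solution of q = F(g(q)), where F is a shifted exponential CDF and g is a purely hypothetical threshold map g(q) = 1 + r·q — both are stand-ins for the paper's actual expressions, chosen only to illustrate the method:

```python
import math

def F(b):
    """CDF of an exponential distribution shifted to start at 1 (an assumption)."""
    return 1 - math.exp(-(b - 1)) if b >= 1 else 0.0

def g(q, r=3.0):
    """Hypothetical threshold map; the paper's true expression was lost."""
    return 1 + r * q

def solve_fixed_point(lo=0.5, hi=1.0, iters=100):
    """Bisection on h(q) = q - F(g(q)); assumes h changes sign on [lo, hi]."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if mid - F(g(mid)) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

q = solve_fixed_point()
# q satisfies q = 1 - exp(-3q) up to bisection tolerance
```

Starting the bisection away from zero matters: as the discussion below notes, the trivial solution of such equations is not relevant for finding the equilibrium, so one must bracket the non-trivial root.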
Notice that while the trivial solution satisfies , this is not above , so the trivial solution is not relevant for finding Bayes-Nash equilibria. And while it’s possible (for some distributions and very low rewards ) for the non-trivial solution to to be less than , for the distributions we have explored this is not the case. One might wonder if there’s a straightforward generalization of this BNE to other graphs with a dominant path, as in the case without bias uncertainty. In Appendix 0.B, we discuss challenges that we encountered trying to do this.
We now use the theorem to understand how much reward is required for optimal behavior with high probability, or for a low expected cost ratio (a much stronger requirement). Since the expected cost is , in order for this to be low, has to be close to . Plugging this into the CDF, we see that for this to happen, we must have
which essentially requires that exponentially little probability mass (in ) remains after distance from . For an exponential distribution, this requires to be linear in , and with a heavier-tailed distribution like the equal revenue distribution, this requires to be exponential in . But we may be content with simply guaranteeing optimal behavior with high probability. In that case, so long as is increasing in , the agents will take the optimal path with high probability for at least the equal revenue, exponential, and uniform distributions. We explore the probability of optimal behavior and the cost ratios for these distributions more precisely in Appendix 0.A.
5.2 Increasing Number of Competitors
We saw earlier that increasing the number of competitors doesn’t change the per-agent reward needed for optimal behavior. But one might hope that in Bayesian settings such as bias uncertainty, increasing the number of agents significantly decreases the per-agent reward needed to encourage optimal behavior – in particular, as the number of agents increases, the probability of some agent having a very low bias increases. We provide evidence against this claim. We start by showing that the following strategy is a Bayes-Nash equilibrium with agents.
Theorem 5.2
Let be an -fan with reward and suppose there are competitors with biases drawn iid from distribution with CDF . Let be the solution to , where:
If , the following strategy is a Bayes-Nash equilibrium where the probability of optimal behavior for any agent is :
Proof
See Appendix 0.C.
Now, we consider the equal revenue distribution. In the case of two agents, we required exponentially high reward to get a constant expected cost ratio. Unfortunately, this remains true even as the number of agents competing goes to infinity.
Theorem 5.3
Suppose the biases are drawn from an equal revenue distribution shifted over by . Let denote the per agent price, so . Then, no matter how many agents are competing, for any fixed agent, the probability that they behave optimally is at most .
Proof
See Appendix 0.C.
Earlier results show that the probability of optimal behavior with two competing agents is , which is nearly identical to the theorem’s bound as grows (to see this, note that can be written as and ; further, as grows, since , so both probabilities are ). So even as , the relationship between the reward and the probability of optimal behavior doesn’t significantly change (and thus the relationship between the reward and the expected cost ratio won’t significantly change). We conjecture that, in general, increasing the number of agents does not significantly decrease the per-agent reward.
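The asymptotic step in the footnote above rests on the standard limit (1 + x/n)^n → e^x as n grows, which is easy to check numerically:

```python
import math

x = 1.0
for n in (10, 100, 1000):
    # the compound-growth expression approaches e^x from below
    print(n, (1 + x / n) ** n)

# at n = 1000 the value is already within 0.005 of e
assert abs((1 + x / 1000) ** 1000 - math.exp(x)) < 5e-3
```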
6 Conclusion
We studied the impact of competition on present bias, showing that in many settings where naive agents can experience exponentially high cost ratio, a small amount of competition drives agents to optimal behavior. This paper is a first step towards painting a more optimistic picture than much of the work surrounding present bias. Our results highlight why, in naturally competitive settings, otherwise biased agents might behave optimally. Further, task/mechanism designers can use our results to directly alleviate the harms of present bias. This competitive model might be a more natural model than other motivation schemes, such as internal edge rewards, and is able to more cheaply ensure optimal behavior. Our work also leaves open many exciting questions.
First, with bias uncertainty, we only obtain concrete results on the -fan. So one obvious direction is to determine which graphs have Bayes-Nash equilibria on the optimal path, and what these equilibria look like. Second, we explore two “dimensions” of competition – the amount of reward and the number of competitors, finding that the latter is unlikely to be significant. Another interesting goal is thus to explore new dimensions of competition.
Lastly, we could extend our work beyond cost ratios, moving to the model where agents can abandon their path at any point. For one, this move would allow us to integrate results on sunk-cost bias, represented as an intrinsic cost for abandoning a task that’s proportional to the amount of effort expended. Kleinberg et al. (2017) show that agents who are sophisticated with regard to their present bias, but naive with respect to their sunk-cost bias, can experience exponentially high cost before abandoning their traversal (this is especially interesting because sophisticated agents without sunk-cost bias behave nearly optimally). Can competition alleviate this exponential worst case? There are also interesting computational questions in this model. For instance, given a fixed reward budget , is it possible to determine in polynomial time whether one can induce an equilibrium where both agents traverse the graph? Such problems are NP-hard for other reward models, but may be tractable with competition. Overall, the abandonment setting has several interesting interactions with competition that we have not explored.
References
- Albers and Kraft [2017] S. Albers and D. Kraft. On the value of penalties in time-inconsistent planning. arXiv preprint arXiv:1702.01677, 2017.
- Albers and Kraft [2019] S. Albers and D. Kraft. Motivating time-inconsistent agents: A computational approach. Theory of computing systems, 63(3):466–487, 2019.
- Ariely and Jones [2008] D. Ariely and S. Jones. Predictably Irrational. Harper Audio, New York, NY, 2008.
- Arkes and Blumer [1985] H. R. Arkes and C. Blumer. The psychology of sunk cost. Organizational behavior and human decision processes, 35(1):124–140, 1985.
- DellaVigna [2009] S. DellaVigna. Psychology and economics: Evidence from the field. Journal of Economic Literature, 47(2):315–372, 2009.
- DellaVigna and Malmendier [2006] S. DellaVigna and U. Malmendier. Paying not to go to the gym. American Economic Review, 96(3):694–719, 2006.
- Frederick et al. [2002] S. Frederick, G. Loewenstein, and T. O’Donoghue. Time discounting and time preference: A critical review. Journal of Economic Literature, 40(2):351–401, 2002.
- Gravin et al. [2016] N. Gravin, N. Immorlica, B. Lucier, and E. Pountourakis. Procrastination with variable present bias. arXiv preprint arXiv:1606.03062, 2016.
- Kahneman and Tversky [1979] D. Kahneman and A. Tversky. Prospect theory: An analysis of decision under risk. Econometrica, 47(2):263–292, 1979.
- Kleinberg and Oren [2014] J. Kleinberg and S. Oren. Time-inconsistent planning: A computational problem in behavioral economics. In Proceedings of the Fifteenth ACM Conference on Economics and Computation, pages 547–564, 2014.
- Kleinberg et al. [2016] J. Kleinberg, S. Oren, and M. Raghavan. Planning problems for sophisticated agents with present bias. In Proceedings of the 2016 ACM Conference on Economics and Computation, pages 343–360, 2016.
- Kleinberg et al. [2017] J. Kleinberg, S. Oren, and M. Raghavan. Planning with multiple biases. In Proceedings of the 2017 ACM Conference on Economics and Computation, pages 567–584, 2017.
- Ma et al. [2019] H. Ma, R. Meir, D. C. Parkes, and E. Wu-Yan. Penalty bidding mechanisms for allocating resources and overcoming present bias. arXiv preprint arXiv:1906.09713, 2019.
- Oren and Soker [2019] S. Oren and D. Soker. Principal-agent problems with present-biased agents. In International Symposium on Algorithmic Game Theory, pages 237–251. Springer, 2019.
- Roughgarden [2016] T. Roughgarden. CS269I: Incentives in Computer Science, Lecture #19: Time-inconsistent planning. Lecture notes, 2016.
- Tang et al. [2017] P. Tang, Y. Teng, Z. Wang, S. Xiao, and Y. Xu. Computational issues in time-inconsistent planning. In Thirty-First AAAI Conference on Artificial Intelligence, 2017.
- Yan and Yong [2019] W. Yan and J. Yong. Time-inconsistent optimal control problems and related issues. In Modeling, Stochastic Control, Optimization, and Applications, pages 533–569. Springer, 2019.
Appendix 0.A Concrete Distributions for Bias Uncertainty
In this section, we more precisely describe the relationship between and (the probability of optimal behavior in the BNE from Theorem 5.1) for several distributions.
Lemma 3
Suppose and are drawn from an equal revenue distribution that’s shifted over by (so ). Then, for any , the probability of optimal behavior in the BNE from Theorem 5.1 is , and thus the expected cost is .
Proof
Because this distribution is heavy-tailed, the expected cost remains exponential unless is exponential. However, if is any increasing function of , then w.h.p. both agents will take the optimal path. Now, we look at the exponential distribution.
Lemma 4
Suppose and are drawn from an exponential distribution, , that’s shifted over by (so ). Then, for any , the probability of optimal behavior in the BNE from Theorem 5.1 is at least , and thus the expected cost is .
Proof
We first solve for in Theorem 5.1:
where refers to the Lambert function, i.e. . The Taylor series of around is:
For our application, notice that is very small; thus, we let . So:
∎
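The two-term Taylor approximation of the Lambert W function used above, W(x) ≈ x − x² for small x, can be verified numerically. A minimal sketch with a pure-Python Newton iteration (the solver itself is illustrative, not from the paper):

```python
import math

def lambert_w(x, iters=50):
    """Principal branch of the Lambert W function (w * e^w = x), for x >= 0.

    Newton's method on f(w) = w*e^w - x; converges quadratically
    from the starting guesses below.
    """
    w = x if x < 1 else math.log(x)  # reasonable starting guess
    for _ in range(iters):
        ew = math.exp(w)
        w -= (w * ew - x) / (ew * (w + 1))
    return w

x = 0.01
taylor = x - x * x  # two-term Taylor expansion of W around 0
assert abs(lambert_w(x) - taylor) < 1e-4
```

For the small arguments relevant here, the truncation error of the two-term expansion is of order x³, which is why dropping the higher-order terms is harmless in the derivation above.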
This means that if , we get a constant expected cost ratio (and extremely high probability of optimal behavior). We finally consider the simple example of a uniform distribution.
Lemma 5
Suppose and are drawn from the uniform distribution, (so ). Then, for , the probability of optimal behavior in the BNE from Theorem 5.1 is . For all , this probability is zero.
Proof
For the first part of the lemma, note that if , then , so holds if , or equivalently, . If we assume , then using the CDF to solve for :
Since , this is satisfied only with . ∎
More generally, for any distribution bounded by , computing the reward that guarantees a Nash equilibrium if both agents had public biases also suffices to guarantee a BNE in this setting (and in particular, this BNE always results in optimal behavior). For the case of the uniform distribution, this is the only reward that ever results in optimal behavior. As a direct consequence of this lemma, the expected cost ratio is with the reward .
Appendix 0.B Difficulties Finding Bayes-Nash Equilibria in Arbitrary Graphs
The simplest way to generalize our BNE from the -fan would be to consider the following schematic:
where ideally would depend only on the costs in the graph, the reward , and , the solution to . So the agents would either take the optimal path, if their bias was sufficiently small, or some path that can “catch” agents with high bias. In the fan graph, this was the path around the fan. That path happened to be the natural path for any bias, and perhaps more importantly, the “limiting” natural path as (i.e. the greedy/myopic path, where one simply takes the lowest-cost edge at each step). But neither of these ideas generalizes; this simple schematic will not produce a BNE in an arbitrary graph, as the following counterexample shows.
Consider the modified -fan below. Let . So the costs of the edges increase faster than doubling.
We claim the following:
Proposition 3
There is no symmetric BNE for this graph that assigns positive probability to only two paths, assuming the bias is drawn from a distribution with support on .
The main idea behind the proof is to notice first that paths and will always be taken when the bias is very low or very high. We then define the intervals of biases (the intervals depend on the reward, costs, and probability of optimal behavior) wherein wants to take or , and argue that at least one of the intervals must be non-empty.
Proof
Note that as , for any fixed reward, the agent will always take the greedy option of choosing the cheapest edge at every step. This will cause them to take the path . Further if their bias is below , they will take the optimal path, . So, any BNE must assign positive probability to those two paths. We now prove, by contradiction, that it must assign positive probability to either or .
Let be and . At , prefers to procrastinate to when , so suppose this is the case. At , would take if:
So, if , would take . If this interval is non-empty, then we’re done. If it’s not, then:
If the reward is at least this high, then always goes to at . Now, at (where we know ), takes when:
As before, if , would take . We claim that this interval must be non-empty. Suppose it wasn’t, and . Then:
However, this upper bound causes a contradiction – from the definitions, we know that , so clearly , and thus this bound requires . This contradicts our typical requirement that , but even ignoring this requirement, this also contradicts our earlier bound that .
So by contradiction, will either take or with positive probability. ∎
In general, the main challenge with finding a BNE is finding a set of paths and bias intervals such that, for all intervals , must want to traverse if their bias falls in . The complication is that ’s decision to traverse depends on the distribution over ’s choice of path, which is induced by the intervals and the bias distribution. These interweaving constraints are difficult to manage even for simple graphs, as this preliminary result suggests.
Appendix 0.C Adding More Competitors with Bias Uncertainty
Proof (Proof of Theorem 5.2)
Suppose all agents besides follow . Then, at , views procrastinating to as having reward , using the iid assumption and the fact that will only win if everyone else has a bias which, according to , causes them to procrastinate. On the other hand, going directly from to has utility , where is a random variable measuring how many agents take and is the random variable corresponding to the reward. Notice that , where . We now use the following claim:
Claim
Let . Then:
Proof
First, using the definition of the expectation and simplifying:
Now, notice that the binomial theorem says:
Plugging this back in:
∎
From the claim, we see that . Now, prefers to go immediately from to when:
For ease of notation, let . As a sanity check, one can show that is always positive (i.e., the expected reward is always positive).
Claim
Proof
We want:
And the last line holds by again using the fact that . ∎
We’ve shown that when the agent’s bias is less than , they prefer to (assuming everyone else takes with probability ). Now, if their bias is above , we know that they will progress to . We claim that they will surely procrastinate all the way to . To see why, note that since we are analyzing a Bayes-Nash equilibrium, the agent assumes that all other agents follow the equilibrium strategy, which means that all their competitors take either or . Thus, the rewards from paths to are all identical. So by the construction of the -fan, the agent procrastinates until . If is not extremely low, they will then go to , as the following claim demonstrates.
Claim
If and the agent has bias , then the agent takes path .
Proof
We’ve just shown that if , they go to . From there, going to has utility and going to has utility . So, the agent prefers when:
So, if the agent’s bias is in the interval , they prefer to deviate from the equilibrium strategy (in Theorem 5.2) by going to . We now compute when this interval is empty:
This is thus sufficient for the agent to take path . ∎
So, by the claim, if , the following strategy is a Bayes-Nash equilibrium:
This is exactly what the theorem states. ∎
Proof (Proof of Theorem 5.3)
Recall that, for the equal revenue distribution shifted by , . Now, we attempt to solve for :
We need only an upper bound on , not a solution to this equation for general . Implicit differentiation shows that is increasing in when the above equation is satisfied. Given this fact, an upper bound for is simply the limit as , which is:
∎