Concentration-Bound Analysis for Probabilistic Programs and Probabilistic Recurrence Relations
Abstract.
Analyzing probabilistic programs and randomized algorithms is a classical problem in computer science. The first basic problem in the analysis of stochastic processes is to consider the expectation or mean, and another basic problem is to consider concentration bounds, i.e. showing that large deviations from the mean have small probability. Similarly, in the context of probabilistic programs and randomized algorithms, the analysis of expected termination time/running time and their concentration bounds are fundamental problems. In this work, we focus on concentration bounds for probabilistic programs and probabilistic recurrences of randomized algorithms. For probabilistic programs, the basic technique to achieve concentration bounds is to consider martingales and apply the classical Azuma’s inequality (Azuma, 1967). For probabilistic recurrences of randomized algorithms, Karp’s classical “cookbook” method (Karp, 1994), which is similar to the master theorem for recurrences, is the standard approach to obtain concentration bounds. In this work, we propose a novel approach for deriving concentration bounds for probabilistic programs and probabilistic recurrence relations through the synthesis of exponential supermartingales. For probabilistic programs, we present algorithms for synthesis of such supermartingales in several cases. We also show that our approach can derive better concentration bounds than simply applying the classical Azuma’s inequality over various probabilistic programs considered in the literature. For probabilistic recurrences, our approach can derive tighter bounds than the well-established methods of (Karp, 1994) on classical algorithms such as quick sort, quick select, and randomized diameter computation. Moreover, we show that our approach can derive bounds comparable to the optimal bound for quicksort, proposed in (McDiarmid and Hayward, 1996). We also present a prototype implementation that can automatically infer these bounds.
1. Introduction
In this work, we present new methods to obtain concentration bounds for probabilistic programs and probabilistic recurrences. In this section, we start with a brief description of probabilistic programs and recurrences and provide an overview of the basic analysis problems. Then, we discuss previous results and finally present our contributions.
Probabilistic programs and recurrences
The formal analysis of probabilistic models is a fundamental problem that spans across various disciplines, including probability theory and statistics (Durrett, 1996; Howard, 1960; Kemeny et al., 1966; Rabin, 1963; Paz, 1971), randomized algorithms (Motwani and Raghavan, 1995), formal methods (Baier and Katoen, 2008; Kwiatkowska et al., 2011), artificial intelligence (Kaelbling et al., 1996, 1998), and programming languages (Barthe et al., 2020). The analysis of probabilistic programs, i.e. imperative programs extended with random value generators, has received significant attention in the programming languages community (Chakarov and Sankaranarayanan, 2013; Fioriti and Hermanns, 2015; Sankaranarayanan et al., 2013; Esparza et al., 2012; Chatterjee et al., 2018b; Barthe et al., 2018; Barthe et al., 2017; Foster et al., 2016; Cusumano-Towner et al., 2018; Chatterjee et al., 2016; Hark et al., 2020; Olmedo et al., 2016; Dahlqvist and Kozen, 2020). Similarly, the analysis of randomized algorithms is a central and classical problem in theoretical computer science (Motwani and Raghavan, 1995), and a key problem is to analyze the probabilistic recurrence relations arising from randomized algorithms.
Basic analysis problems
In the analysis of probabilistic programs and probabilistic recurrences, one has to analyze the underlying stochastic processes. The first basic problem in the analysis of stochastic processes is to consider the expectation or the mean (Williams, 1991a). However, the expectation (or the first moment) does not provide enough information about the probability distribution associated with the process. Hence higher moments (such as variance) are key parameters of interest. Another fundamental problem is to obtain concentration bounds showing that large deviations from the mean have small probability. A key advantage of establishing strong concentration bounds is that it enables us to provide high-probability guarantees (e.g. with high probability the running time does not exceed a desired bound). In the context of probabilistic programs and randomized algorithms, a key quantity of interest is the termination time of programs or the running time associated with probabilistic recurrences. Thus, the analysis of expected termination time/running time and their concentration bounds for probabilistic programs and probabilistic recurrences are fundamental problems in computer science.
Previous results on expectation analysis
The problem of expected termination time analysis has received huge attention. The expected termination time analysis for probabilistic programs has been widely studied with different techniques, e.g. martingale-based techniques (Chakarov and Sankaranarayanan, 2013; Fu and Chatterjee, 2019) and weakest pre-expectation calculus (Kaminski et al., 2016). The analysis of expected running time of probabilistic recurrences is a fundamental problem studied in randomized algorithms (Motwani and Raghavan, 1995), and automated methods for them have also been considered (Chatterjee et al., 2017).
Previous results on concentration bounds
The analysis of concentration bounds is more involved than expectation analysis, both for the general case of stochastic processes, as well as in the context of probabilistic programs and probabilistic recurrences. For probabilistic programs, the only technique in the literature to obtain concentration bounds is to consider either linear or polynomial supermartingales (Chakarov and Sankaranarayanan, 2013; Chatterjee et al., 2016), and then apply the classical Azuma’s inequality (Azuma, 1967) on the supermartingales to obtain concentration bounds. For probabilistic recurrences of randomized algorithms, the standard approach for concentration bounds is Karp’s classical “cookbook” method (Karp, 1994), which is similar to the master theorem for non-probabilistic recurrences. More advanced methods have also been developed for specific algorithms such as QuickSort (McDiarmid and Hayward, 1996; Dubhashi and Panconesi, 2009).
Our contributions
In this work, we consider the problem of concentration bounds for probabilistic programs and probabilistic recurrences and our main contributions are as follows:
(1) First, we propose a novel approach for deriving concentration bounds for probabilistic programs and probabilistic recurrence relations through the synthesis of exponential supermartingales.
(2) For probabilistic programs, we present algorithms for the synthesis of such supermartingales for several cases. We show that our approach can derive better concentration bounds than simply applying the classical Azuma’s inequality, over various probabilistic programs considered in the literature.
(3) For probabilistic recurrences, we show that our approach can derive tighter bounds than the classical methods of the literature on several basic problems. We show that our concentration bounds for probabilistic recurrences associated with classical randomized algorithms such as QuickSelect (which generalizes median selection), QuickSort and randomized diameter computation beat the bounds obtained using the methods of (Karp, 1994). Moreover, we show that our approach can derive bounds comparable to the optimal bound for QuickSort proposed in (McDiarmid and Hayward, 1996). We also present a prototype implementation that can automatically infer these bounds.
Novelty
The key novelty of our approach is as follows: instead of linear or polynomial martingales, we consider exponential martingales, and our concentration bounds are obtained using the standard Markov’s inequality. This is quite a simple technique. The surprising and novel aspect is that with such a simple technique, we can (a) obtain better concentration bounds for probabilistic programs in comparison with the only existing method of applying Azuma’s inequality; and (b) improve the more than two-decades-old classical bounds for basic and well-studied randomized algorithms.
2. Preliminaries
Throughout the paper, we denote by $\mathbb{N}$, $\mathbb{N}_0$, $\mathbb{Z}$, and $\mathbb{R}$ the sets of positive integers, non-negative integers, integers, and real numbers, respectively. We first review several fundamental concepts in probability theory and then illustrate the problems we study.
2.1. Basics of Probability Theory
We provide a short review of some necessary concepts in probability theory. For a more detailed treatment, see (Williams, 1991a).
Probability Distributions
A discrete probability distribution over a countable set $U$ is a function $q : U \to [0, 1]$ such that $\sum_{u \in U} q(u) = 1$. The support of $q$ is defined as $\mathrm{supp}(q) := \{u \in U \mid q(u) > 0\}$.
Probability Spaces
A probability space is a triple $(\Omega, \mathcal{F}, \mathbb{P})$, where $\Omega$ is a non-empty set (called the sample space), $\mathcal{F}$ is a $\sigma$-algebra over $\Omega$ (i.e. a collection of subsets of $\Omega$ that contains the empty set and is closed under complementation and countable union) and $\mathbb{P}$ is a probability measure on $\mathcal{F}$, i.e. a function $\mathbb{P} : \mathcal{F} \to [0, 1]$ such that (i) $\mathbb{P}(\Omega) = 1$ and (ii) for all set-sequences $A_1, A_2, \ldots \in \mathcal{F}$ that are pairwise-disjoint (i.e. $A_i \cap A_j = \emptyset$ whenever $i \neq j$) it holds that $\sum_{i=1}^{\infty} \mathbb{P}(A_i) = \mathbb{P}\left(\bigcup_{i=1}^{\infty} A_i\right)$. Elements $A \in \mathcal{F}$ are called events. An event $A$ holds almost-surely (a.s.) if $\mathbb{P}(A) = 1$.
Random Variables
A random variable $X$ from a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ is an $\mathcal{F}$-measurable function $X : \Omega \to \mathbb{R} \cup \{+\infty\}$, i.e. a function such that for all $d \in \mathbb{R} \cup \{+\infty\}$, the set $\{\omega \in \Omega \mid X(\omega) < d\}$ belongs to $\mathcal{F}$.
Expectation
The expected value of a random variable $X$ from a probability space $(\Omega, \mathcal{F}, \mathbb{P})$, denoted by $\mathbb{E}[X]$, is defined as the Lebesgue integral of $X$ w.r.t. $\mathbb{P}$, i.e. $\mathbb{E}[X] := \int X \, \mathrm{d}\mathbb{P}$. The precise definition of the Lebesgue integral is somewhat technical and is omitted here (cf. (Williams, 1991a, Chapter 5) for a formal definition). If the range of $X$ is a countable set $\{d_0, d_1, \ldots\}$, then we have $\mathbb{E}[X] = \sum_{k=0}^{\infty} d_k \cdot \mathbb{P}(X = d_k)$.
Filtrations
A filtration of a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ is an infinite sequence $\{\mathcal{F}_n\}_{n=0}^{\infty}$ of $\sigma$-algebras over $\Omega$ such that $\mathcal{F}_n \subseteq \mathcal{F}_{n+1} \subseteq \mathcal{F}$ for all $n \geq 0$. Intuitively, a filtration models the information available at any given point of time.
Conditional Expectation
Let $X$ be any random variable from a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ such that $\mathbb{E}[|X|] < \infty$. Then, given any $\sigma$-algebra $\mathcal{G} \subseteq \mathcal{F}$, there exists a random variable (from $(\Omega, \mathcal{F}, \mathbb{P})$), denoted by $\mathbb{E}[X \mid \mathcal{G}]$, such that:
(E1) $\mathbb{E}[X \mid \mathcal{G}]$ is $\mathcal{G}$-measurable, and
(E2) $\mathbb{E}\left[\,\left|\mathbb{E}[X \mid \mathcal{G}]\right|\,\right] < \infty$, and
(E3) for all $A \in \mathcal{G}$, we have $\int_A \mathbb{E}[X \mid \mathcal{G}] \, \mathrm{d}\mathbb{P} = \int_A X \, \mathrm{d}\mathbb{P}$.
The random variable $\mathbb{E}[X \mid \mathcal{G}]$ is called the conditional expectation of $X$ given $\mathcal{G}$. It is a.s. unique in the sense that if $Y$ is another random variable satisfying (E1)–(E3), then $\mathbb{P}(Y = \mathbb{E}[X \mid \mathcal{G}]) = 1$. We refer to (Williams, 1991a, Chapter 9) for details. Intuitively, $\mathbb{E}[X \mid \mathcal{G}]$ is the expectation of $X$, when assuming the information in $\mathcal{G}$.
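When $\mathcal{G}$ is generated by a finite partition of a discrete sample space, the conditional expectation can be computed directly: on each partition block it is the probability-weighted average of $X$ over that block. The following Python sketch is our own illustration (a fair die with the odd/even partition); it checks properties (E1) and (E3) for this construction:

```python
from fractions import Fraction

omega = [1, 2, 3, 4, 5, 6]           # fair die, P({w}) = 1/6
P = {w: Fraction(1, 6) for w in omega}
X = {w: w for w in omega}            # the random variable X(w) = w

# G is generated by the partition {odd outcomes, even outcomes}
blocks = [[1, 3, 5], [2, 4, 6]]

def cond_exp(X, blocks):
    # on each block, E[X | G] is the P-weighted average of X over the block
    out = {}
    for B in blocks:
        pB = sum(P[w] for w in B)
        avg = sum(P[w] * X[w] for w in B) / pB
        for w in B:
            out[w] = avg
    return out

Y = cond_exp(X, blocks)
assert Y[1] == Y[3] == Y[5] == 3     # constant on blocks: G-measurable (E1)
assert Y[2] == Y[4] == Y[6] == 4
# (E3): the integrals of Y and X agree on every block (hence on all of G)
for B in blocks:
    assert sum(P[w] * Y[w] for w in B) == sum(P[w] * X[w] for w in B)
```

Property (E2) is immediate here, since the sample space is finite.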
Stochastic Processes
A (discrete-time) stochastic process is a sequence $\Gamma = \{X_n\}_{n=0}^{\infty}$ of random variables where the $X_n$’s are all from some probability space $(\Omega, \mathcal{F}, \mathbb{P})$. The process $\Gamma$ is adapted to a filtration $\{\mathcal{F}_n\}_{n=0}^{\infty}$ if for all $n \geq 0$, $X_n$ is $\mathcal{F}_n$-measurable. Intuitively, the random variable $X_n$ models some value at the $n$th step of the process.
Martingales and Supermartingales
A stochastic process $\Gamma = \{X_n\}_{n=0}^{\infty}$ adapted to a filtration $\{\mathcal{F}_n\}_{n=0}^{\infty}$ is a martingale (resp. supermartingale) if for every $n \geq 0$, $\mathbb{E}[|X_n|] < \infty$, and it holds a.s. that $\mathbb{E}[X_{n+1} \mid \mathcal{F}_n] = X_n$ (resp. $\mathbb{E}[X_{n+1} \mid \mathcal{F}_n] \leq X_n$). We refer to (Williams, 1991a, Chapter 10) for a deeper treatment.
Intuitively, a martingale (resp. supermartingale) is a discrete-time stochastic process in which, for an observer who has seen the values of $X_0, \ldots, X_n$, the expected value at the next step, i.e. $\mathbb{E}[X_{n+1} \mid \mathcal{F}_n]$, is equal to (resp. no more than) the last observed value $X_n$. Also, note that in a martingale, the observed values for $X_0, \ldots, X_{n-1}$ do not matter given that $\mathbb{E}[X_{n+1} \mid \mathcal{F}_n] = X_n$. In contrast, in a supermartingale, the only requirement is that $\mathbb{E}[X_{n+1} \mid \mathcal{F}_n] \leq X_n$ and hence $\mathbb{E}[X_{n+1} \mid \mathcal{F}_n]$ may depend on $X_0, \ldots, X_{n-1}$. Also, note that $\mathcal{F}_n$ might contain more information than just the observations of the $X_i$’s.
Stopping Times
Given a probability space $(\Omega, \mathcal{F}, \mathbb{P})$, a stopping time w.r.t. a filtration $\{\mathcal{F}_n\}_{n=0}^{\infty}$ is a random variable $T : \Omega \to \mathbb{N}_0 \cup \{\infty\}$ such that for every $n \geq 0$, the event $\{T \leq n\}$ belongs to $\mathcal{F}_n$.
Example 2.1.
Consider an unbiased and discrete random walk, in which we start at a position $X_0$, and at each second walk one step to either the left or the right with equal probability. Let $X_n$ denote our position after $n$ seconds. It is easy to verify that $\mathbb{E}[X_{n+1} \mid \mathcal{F}_n] = \frac{1}{2}(X_n - 1) + \frac{1}{2}(X_n + 1) = X_n$. Hence, this random walk is a martingale. Note that by definition, every martingale is also a supermartingale. As another example, consider the classical gambler’s ruin: a gambler starts with $Y_0$ dollars of money and bets continuously until he loses all of his money. If the bets are unfair, i.e. the expected value of his money after a bet is less than its expected value before the bet, then the sequence $\{Y_n\}_{n=0}^{\infty}$ is a supermartingale. In this case, $Y_n$ is the gambler’s total money after $n$ bets. On the other hand, if the bets are fair, then $\{Y_n\}_{n=0}^{\infty}$ is a martingale.
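For small horizons, the martingale property of this walk can be checked exhaustively. The following Python sketch is only an illustration (we start the walk at position $0$); it enumerates all equally likely step prefixes and verifies that the conditional expectation of the next position equals the current one:

```python
from itertools import product

def walk_positions(steps, start=0):
    """Positions X_0, X_1, ..., X_n of the walk for a given step sequence."""
    xs = [start]
    for s in steps:
        xs.append(xs[-1] + s)
    return xs

n = 6  # horizon; all 2^n step sequences in {-1, +1}^n are equally likely
for k in range(n):
    for prefix in product((-1, 1), repeat=k):
        x_k = walk_positions(prefix)[-1]
        # average X_{k+1} over the two equally likely next steps
        avg_next = sum(x_k + s for s in (-1, 1)) / 2
        assert avg_next == x_k  # martingale property: E[X_{k+1} | F_k] = X_k

# Consequently E[X_n] = X_0 = 0 when averaging over all histories.
assert sum(sum(steps) for steps in product((-1, 1), repeat=n)) == 0
```

Replacing the fair steps by steps with negative drift turns the equality into the inequality of the gambler’s-ruin supermartingale.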
2.2. Termination and Concentration Bounds
In this work, we consider concentration bounds of the termination time of probabilistic programs and recurrence relations. Below, we illustrate the notions of termination time and concentration bound.
Termination Time
The termination-time random variable counts the number of steps a program takes until termination. In the setting of probabilistic programs, the termination-time random variable is defined as the first time the program hits its termination program counter, which is a stopping time. For convenience, we treat each termination-time random variable as the total accumulated cost until the program terminates, where each execution step of the program incurs a single unit of cost. In the setting of probabilistic recurrence relations, since the number of steps in a preprocessing stage is directly added to the termination time, we treat the number of steps in a preprocessing stage as the cost of the current execution step, and define the termination-time variable as the total accumulated cost until the recurrence relation runs into the empty stack (i.e., terminates). In both cases, we consider the termination-time random variable as the total accumulated cost until the stopping time of termination.
Concentration Bounds
A concentration bound for a random variable $X$ is an inequality of the form $\mathbb{P}(X \geq d) \leq f(d)$ or $\mathbb{P}(X \leq d) \leq f(d)$, which specifies that the probability that the value of $X$ grows too large or too small is bounded by a function $f$ that tends to zero as $d \to \infty$ (for $\mathbb{P}(X \geq d)$) or $d \to -\infty$ (for $\mathbb{P}(X \leq d)$). Concentration bounds are an important probabilistic property of random variables as they witness that a random variable has a substantially small probability of large deviation. When applied to termination time, a concentration bound specifies the following property: the probability that a randomized algorithm runs inefficiently, i.e. takes much longer than its expected termination time, is very small. Thus, compared with the expected termination time, concentration bounds provide a much finer characterization of the running time of randomized algorithms.
Problem Statement
In this work, we consider the following problems: For probabilistic programs, we consider unnested probabilistic while loops. Our aim is to have automated approaches for deriving much tighter exponentially-decreasing concentration bounds in comparison with the existing approach through Azuma or Hoeffding inequalities (Chatterjee et al., 2018b). For probabilistic recurrence relations, our goal is to derive substantially finer bounds compared with the classical results of (Karp, 1994), and for quicksort, we try to derive bounds comparable to the optimal result proposed in (McDiarmid and Hayward, 1996). Obtaining more precise (tighter) concentration bounds is important in many applications, e.g. resource-contention resolution (Kleinberg and Tardos, 2006, Chapter 13), or in resource-constrained environments such as embedded systems.
3. The General Method
In this section, we establish the mathematical foundation for the analysis of concentration bounds, based on which we will later develop novel sound approaches to derive concentration bounds for the termination time arising from probabilistic programs and recurrence relations. Since in certain situations, e.g. probabilistic recurrence relations, the termination time is defined as the total cost accumulated over all execution steps, we consider concentration bounds for a general non-negative random variable. First, we introduce the basic result through which we derive our concentration bounds. This lemma is similar to the basis of Chernoff bounds.
Lemma 3.1.
For any real number $t > 0$, random variable $X$ and $d \in \mathbb{R}$, we have $\mathbb{P}(X \geq d) \leq \mathbb{E}\left[e^{t X}\right] / e^{t d}$.
Proof.
Since $t > 0$, we have $\mathbb{P}(X \geq d) = \mathbb{P}\left(e^{t X} \geq e^{t d}\right)$. Then by applying Markov’s inequality to the non-negative random variable $e^{t X}$, we obtain the desired result. ∎
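Both sides of the lemma can be evaluated exactly on a small example. The following Python sketch is purely illustrative (the choice of $X$ as the number of heads among 10 fair coin flips, and of $d = 8$, is ours); it verifies the inequality for several values of $t$ and shows that optimizing over $t$ tightens the bound, which is exactly the Chernoff-style use of Lemma 3.1:

```python
import math
from itertools import product

# X = number of heads in 10 fair coin flips; compute both sides of
# P(X >= d) <= E[exp(t X)] / exp(t d) exactly by enumeration.
outcomes = list(product((0, 1), repeat=10))
prob = 1 / len(outcomes)          # each outcome has probability 2^-10

def tail(d):
    return sum(prob for w in outcomes if sum(w) >= d)

def mgf_bound(t, d):
    ex = sum(prob * math.exp(t * sum(w)) for w in outcomes)  # E[e^{tX}]
    return ex / math.exp(t * d)

d = 8
p = tail(d)                       # exact tail probability, 56/1024
for t in (0.5, 1.0, 1.5):
    assert p <= mgf_bound(t, d)   # Lemma 3.1 holds for every t > 0

# The bound tightens for a well-chosen t (Chernoff's optimization).
best = min(mgf_bound(k / 100, d) for k in range(1, 300))
assert p <= best < 1
```

Here the exact tail is $56/1024 \approx 0.055$, while the optimized exponential bound is roughly $0.15$; the slack is the usual price of the moment-generating-function method on such small instances.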
Lemma 3.1 provides a mathematical way to derive a concentration bound for any random variable $X$, but we do not yet know an upper bound for $\mathbb{E}[e^{t X}]$. In the setting of Chernoff bounds, one often evaluates $\mathbb{E}[e^{t X}]$ directly, e.g. when the random variable $X$ is a finite sum of independent random variables. In our setting, the situation is more complicated, because we consider $X$ to be the total accumulated cost until a stopping time***If the cost accumulated at each step is equal to one, then the total accumulated cost is equal to the running time.. We use supermartingales and the Optional Stopping Theorem (Williams, 1991a, Chapter 10.10) to derive an upper bound for $\mathbb{E}[e^{t X}]$, as is shown by the following proposition. In the proposition below, $\{C_n\}_{n=0}^{\infty}$ is a stochastic process where each $C_n$ represents the amount of cost accumulated at the $n$th step, so that $X = \sum_{n=0}^{T-1} C_n$, and $\{Y_n\}_{n=0}^{\infty}$ is a key stochastic process for deriving an upper bound for $\mathbb{E}[e^{t X}]$.
Proposition 3.2.
Consider two stochastic processes $\{Y_n\}_{n=0}^{\infty}$ and $\{C_n\}_{n=0}^{\infty}$ adapted to a filtration $\{\mathcal{F}_n\}_{n=0}^{\infty}$ for which $Y_0$ is a constant random variable and each $C_n$ is non-negative, and a stopping time $T$ w.r.t the filtration $\{\mathcal{F}_n\}_{n=0}^{\infty}$. Let $t > 0$ and $c \geq 0$ be real numbers. If it holds that

(C1) $T$ is a.s. finite, i.e., $\mathbb{P}(T < \infty) = 1$, and
(C2) $Y_T \geq -c$ a.s. for the constant $c$, and
(C3) $\mathbb{E}\left[\exp\left(t \cdot (Y_{n+1} + C_n)\right) \mid \mathcal{F}_n\right] \leq \exp\left(t \cdot Y_n\right)$ a.s. for all $n \geq 0$,

then we have $\mathbb{E}\left[e^{t X}\right] \leq \beta$, where $X := \sum_{n=0}^{T-1} C_n$ and $\beta := e^{t (Y_0 + c)}$. Here, the existence and (probabilistic) uniqueness of the conditional expectation in (C3) is due to its non-negativity (see (Agrawal et al., 2018, Proposition 3.1)).
Proof.
Define the stochastic process $\{W_n\}_{n=0}^{\infty}$ as $W_n := \exp\left(t \cdot \left(Y_n + \sum_{i=0}^{n-1} C_i\right)\right)$. By definition, we have that $W_n \geq 0$ for all $n \geq 0$. Note that although $W_n$ may be non-integrable (due to its exponential construction), its conditional expectation still exists, following from non-negativity and (Agrawal et al., 2018, Proposition 3.1). By the condition (C3) and the “take out what is known” property of conditional expectation (see (Williams, 1991a, Chapter 9.7)), we have that $\mathbb{E}[W_{n+1} \mid \mathcal{F}_n] = \exp\left(t \sum_{i=0}^{n-1} C_i\right) \cdot \mathbb{E}\left[\exp\left(t \cdot (Y_{n+1} + C_n)\right) \mid \mathcal{F}_n\right]$. (Note that although the integrability may not be ensured, the proof on (Williams, 1991a, Page 90) guarantees the case for non-negative random variables.) Then we have that $\mathbb{E}[W_{n+1} \mid \mathcal{F}_n] \leq W_n$ a.s., which means $\mathbb{E}[W_{n+1}] \leq \mathbb{E}[W_n]$ from the basic property of conditional expectation. It follows from an easy induction on $n$ that $\mathbb{E}[W_n] \leq W_0 = e^{t Y_0} < \infty$ for all $n \geq 0$; thus each $W_n$ is indeed integrable, the conditional expectation is also taken in the normal sense, and $\{W_n\}_{n=0}^{\infty}$ is then a supermartingale. Moreover, the process $\{W_n\}_{n=0}^{\infty}$ is a non-negative supermartingale by definition. Then by applying the Optional Stopping Theorem (Williams, 1991a, Chapter 10.10) and using (C1), we obtain that $\mathbb{E}[W_T] \leq W_0 = e^{t Y_0}$. Now, from the condition (C2), we have $e^{t X} = W_T \cdot e^{-t Y_T} \leq e^{t c} \cdot W_T$ a.s. It follows that $\mathbb{E}\left[e^{t X}\right] \leq e^{t (Y_0 + c)} = \beta$. ∎
Proposition 3.2 provides a way to bound $\mathbb{E}[e^{t X}]$ by the stochastic process $\{Y_n\}_{n=0}^{\infty}$ and the auxiliary constant $c$, if the conditions (C1)–(C3) are satisfied. By combining Lemma 3.1 and Proposition 3.2, we obtain the following main theorem of this section for concentration bounds of the total accumulated cost until a stopping time.
Theorem 3.3.
Let $\{Y_n\}_{n=0}^{\infty}$ and $\{C_n\}_{n=0}^{\infty}$ be stochastic processes adapted to a filtration $\{\mathcal{F}_n\}_{n=0}^{\infty}$ for which $Y_0$ is a constant random variable, and let $T$ be a stopping time with respect to the filtration. Let $t > 0$ and $c \geq 0$ be real numbers. If the conditions (C1)–(C3) (cf. Proposition 3.2) are fulfilled (by the $Y_n$’s, the $C_n$’s, $T$, $t$ and $c$), then we have that $\mathbb{P}(X \geq \kappa) \leq \beta \cdot e^{-t \kappa}$ for every $\kappa \in \mathbb{R}$, where $X := \sum_{n=0}^{T-1} C_n$ and $\beta := e^{t (Y_0 + c)}$.
Remark 1 (Comparison with Classical Methods).
Compared with classical mathematical methods such as Chernoff bounds and Hoeffding’s inequality that only examine the expected value, variance or range of the related random variables, our method (Theorem 3.3) examines the detailed probability distribution by deriving a better bound for the moment generating function $\mathbb{E}[e^{t X}]$. The derivation is through the construction of an exponential supermartingale (see Proposition 3.2), and depends on the detailed probability distributions in the considered problem. Since we examine the probability distributions in detail, our method can obtain much tighter concentration bounds than the inequalities of Azuma and Hoeffding (Chatterjee et al., 2018b) and Karp’s classical method (Karp, 1994). We do not consider Chernoff bounds as they are typically used for a finite sum of independent random variables, while our method focuses on concentration bounds for the total accumulated cost until a stopping time. We show how the method can be automated for several classical and widely-used probability distributions, such as uniform and Bernoulli.
4. Concentration Bounds for Probabilistic Programs
In this section, we apply our method (Theorem 3.3) to probabilistic programs. Probabilistic programs are imperative programs extended with random number generators. Here we consider the concentration bound of a single probabilistic while loop in which there are no nested loops. We first illustrate how we can apply our method to a single probabilistic while loop. Then we develop automated algorithms for deriving concentration bounds. Finally, we present experimental results and compare our results with previous methods.
4.1. The General Method Applied to Probabilistic While Loops
We consider a probabilistic while loop of the form

(1) while $G$ do $P$ od

where $G$ is the loop guard and $P$ is an imperative probabilistic program that may contain assignment statements with samplings, conditional branches and probabilistic branches, but no (nested) while loops. For a detailed treatment of the syntax, we refer to (Chatterjee et al., 2018b). Since we only consider single probabilistic while loops here, we choose to have a light-weight description of probabilistic programs.
Before we describe probabilistic programs, we first illustrate a simple example.
Example 4.1.
Consider the probabilistic while loop:

while $x \geq 0$ do $x := x + r$ od

In each loop iteration, a random value $r$, drawn from a predefined discrete probability distribution, is added to the program variable $x$. The random value of $r$ is sampled independently in every iteration. The loop ends when the value of $x$ falls below zero.
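The behaviour of such a loop is easy to explore by simulation. In the sketch below, the concrete distribution of $r$ is a hypothetical choice of ours ($r = -2$ or $r = +1$ with probability $1/2$ each, giving drift $-1/2$); the empirical iteration count then concentrates around its mean, which is the phenomenon the rest of this section quantifies:

```python
import random

def run_loop(x0, rng):
    """One execution of: while x >= 0: x := x + r  -- returns iteration count.
    The distribution of r is hypothetical: -2 or +1 with probability 1/2
    each, so the drift is negative and the loop terminates almost surely."""
    x, n = x0, 0
    while x >= 0:
        x += rng.choice((-2, 1))
        n += 1
    return n

rng = random.Random(42)  # fixed seed for reproducibility
runs = [run_loop(10, rng) for _ in range(10_000)]
mean = sum(runs) / len(runs)

# The expected number of iterations is roughly x0 / |drift| = 10 / 0.5 = 20,
# and large deviations from it are rare (the concentration phenomenon).
assert 15 < mean < 30
tail = sum(1 for k in runs if k >= 100) / len(runs)
assert tail < 0.01
```

The exponentially-decreasing bounds derived below make the smallness of such tails precise instead of merely empirical.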
Below we describe the behaviours of a probabilistic while loop. In the sequel, we fix two disjoint sets of variables: the set of program variables and the set of sampling variables. Informally, program variables are real-valued and are directly related to the control flow of a program, while sampling variables represent random inputs sampled from their predefined probability distributions.
Valuations. A valuation over a finite set $V$ of variables is a function $\nu$ that assigns a real value to each variable in $V$. We denote by $\mathit{Val}_V$ the set of all valuations over $V$. For simplicity, we treat each loop guard $G$ of a probabilistic while loop as a subset of the valuations over program variables, so that a valuation satisfies the loop guard iff the valuation falls in the designated subset $G$. Moreover, we may also regard each valuation over program variables (resp. sampling variables) as a vector $\mathbf{x}$ (resp. $\mathbf{r}$) with an implicit ordering between program variables (resp. sampling variables), such that $\mathbf{x}[i]$ (resp. $\mathbf{r}[i]$) represents the value of the $i$th program variable (resp. sampling variable), respectively.
The Semantics. We present a lightweight semantics for probabilistic while loops. We describe the execution of a probabilistic while loop by an infinite sequence $\{\mathbf{x}_n\}_{n=0}^{\infty}$ of vectors of random variables, inductively defined as follows. Below we fix a probabilistic while loop in the form (1).
• $\mathbf{x}_0$ is a vector of constant random variables representing the initial input;
• if $\mathbf{x}_n \in G$ (i.e., $\mathbf{x}_n$ satisfies the loop guard), then $\mathbf{x}_{n+1} = F(\mathbf{x}_n, \mathbf{r}_n)$, where (i) the vector $\mathbf{r}_n$ represents the sampled values for the sampling variables at the $n$th loop iteration, and (ii) $F$ is the update function for the loop body such that, given the current valuation $\mathbf{x}$ for the program variables before the current loop iteration and the values $\mathbf{r}$ sampled in the current loop iteration, $F(\mathbf{x}, \mathbf{r})$ is the updated vector for program variables after the execution of the current loop iteration;
• if $\mathbf{x}_n \notin G$ (i.e., $\mathbf{x}_n$ does not satisfy the loop guard), then $\mathbf{x}_{n+1} = \mathbf{x}_n$.
The inductive definition can be turned into a general state-space Markov chain by defining suitable kernel functions for the update function $F$, see (Meyn and Tweedie, 1993, Chapter 3).
In the paper, we consider the simplified setting in which the sampled values are discrete with bounded range. For this simplified setting, we use a sampling function $\Upsilon$ to describe the sampling, which assigns to every sampling variable $r$ a predefined discrete probability distribution $\Upsilon(r)$ over $\mathbb{R}$. Then, the joint discrete probability distribution $\bar{\Upsilon}$ over valuations of sampling variables is defined as $\bar{\Upsilon}(\mathbf{r}) := \prod_{i} \Upsilon(r_i)(\mathbf{r}[i])$ (where $r_i$ is the $i$th sampling variable) for all valuations $\mathbf{r}$ over sampling variables.
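The product construction of the joint distribution is straightforward to spell out in code. The sketch below is illustrative only: the two sampling variables and their distributions (a uniform sign and a Bernoulli variable) are our own hypothetical choices:

```python
from itertools import product

# Hypothetical sampling function: r1 ~ uniform{-1, +1}, r2 ~ Bernoulli(1/4)
Upsilon = {
    "r1": {-1: 0.5, 1: 0.5},
    "r2": {0: 0.75, 1: 0.25},
}

variables = sorted(Upsilon)  # fixed ordering of the sampling variables

def joint(valuation):
    # product of the per-variable probabilities, as in the definition
    p = 1.0
    for var, val in zip(variables, valuation):
        p *= Upsilon[var].get(val, 0.0)
    return p

support = list(product(*(sorted(Upsilon[v]) for v in variables)))
total = sum(joint(v) for v in support)
assert abs(total - 1.0) < 1e-12          # a valid probability distribution
assert joint((-1, 1)) == 0.5 * 0.25      # e.g. the valuation r1=-1, r2=1
```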
Below we apply our method (Theorem 3.3) to probabilistic while loops, where we consider that the cost to execute the $n$th loop iteration is equal to $1$, so that the total accumulated cost is equal to the number of loop iterations of the loop. Recall that $F$ is the update function for the loop body.
Theorem 4.2 (Concentration for Probabilistic Programs).
Let $Q$ be a probabilistic while loop in the form (1) and let $T$ be the random variable representing the number of loop iterations of $Q$ until termination. Suppose that (i) $T$ is a.s. finite (i.e., $Q$ terminates with probability one) and (ii) there exist real numbers $\alpha, \beta > 1$ and a Borel measurable function $\eta$ over valuations of program variables such that

(A1) $\eta(\mathbf{x}) \geq 0$ for all $\mathbf{x} \in G$;
(A2) $\eta(F(\mathbf{x}, \mathbf{r})) \geq 0$ for all $\mathbf{x} \in G$ and all $\mathbf{r}$ in the support of the joint sampling distribution;
(A3) $\alpha \cdot \mathbb{E}_{\mathbf{r} \sim \bar{\Upsilon}}\left[\beta^{\eta(F(\mathbf{x}, \mathbf{r}))}\right] \leq \beta^{\eta(\mathbf{x})}$ for every $\mathbf{x} \in G$.

Then for any initial valuation $\mathbf{x}_0$, we have $\mathbb{P}(T \geq \kappa) \leq \beta^{\eta(\mathbf{x}_0)} \cdot \alpha^{-\kappa}$ for all $\kappa > 0$.
Proof.
Choose $C_n := \mathbf{1}_{\mathbf{x}_n \in G}$ for every $n \geq 0$. Then $X = \sum_{n=0}^{T-1} C_n = T$. Let $\mathbf{x}_n$ be the vector of random variables representing the valuation for program variables at the $n$th step (where $\mathbf{x}_0$ is a vector of constant random variables that represents the initial input). Define the filtration $\{\mathcal{F}_n\}_{n=0}^{\infty}$ for which $\mathcal{F}_n$ is the smallest sigma-algebra that makes all random variables in $\mathbf{x}_0, \ldots, \mathbf{x}_n$ measurable. Define the stochastic process $\{Y_n\}_{n=0}^{\infty}$ by $Y_n := \frac{\ln \beta}{t} \cdot \eta(\mathbf{x}_n)$ with $t := \ln \alpha$. From the first two conditions, we have that $Y_n \geq 0$ a.s. for all $n \geq 0$, so that the condition (C2) holds with $c = 0$. Note that the second condition specifies that the value of the process is non-negative at the point the loop terminates. Then from the third condition and the “role of independence” property on (Williams, 1991b, Page 90), we have that the condition (C3) holds. Thus, by Theorem 3.3, we have $\mathbb{P}(T \geq \kappa) \leq e^{t Y_0} \cdot e^{-t \kappa} = \beta^{\eta(\mathbf{x}_0)} \cdot \alpha^{-\kappa}$. ∎
4.2. A Synthesis Algorithm for Exponential Supermartingales
Reasoning about exponential terms is an inherent challenge for algorithmic analysis. Here we provide a practical synthesis algorithm for the exponential supermartingales of Theorem 4.2 through ranking-supermartingale (RSM) synthesis. Our heuristic is to treat the function $\eta$ as an RSM-map, which is the core object in existing RSM-synthesis algorithms. Then, based on the synthesized RSM-map $\eta$, we resolve the remaining parameters through polynomial solvers such as MATLAB.
The Loops We Consider. We consider that every assignment in the loop is incremental, i.e., every assignment has the form $x := x + e$, where $x$ is a program variable and $e$ is a linear expression consisting of constants and sampling variables. Although incremental assignments are a restricted form, they are expressive enough to implement random walks, gambler’s ruin and many other examples in the literature. Since we synthesize linear RSMs, we also assume that the loop guard $G$ can be expressed as a polyhedron (i.e., a conjunction of linear inequalities). Also recall that we only handle samplings that observe discrete probability distributions with bounded range.
Before we illustrate the synthesis algorithm, we first recall the notion of RSM-maps, where we adopt a simplified version for a single probabilistic loop. Below, we fix an input probabilistic while loop in the form (1). Recall that $F$ is the update function.
Definition 4.3 (RSM-maps).
An RSM-map is a Borel measurable function $\eta$ over valuations of program variables such that there exist constants $\epsilon > 0$ and $c \geq 0$ such that

(R1) $\eta(\mathbf{x}) \geq 0$ for all $\mathbf{x} \in G$;
(R2) $\eta(F(\mathbf{x}, \mathbf{r})) \geq -c$ for all $\mathbf{x} \in G$ and all $\mathbf{r}$ in the support of the joint sampling distribution;
(R3) $\mathbb{E}_{\mathbf{r} \sim \bar{\Upsilon}}\left[\eta(F(\mathbf{x}, \mathbf{r}))\right] \leq \eta(\mathbf{x}) - \epsilon$ for every $\mathbf{x} \in G$.
Our algorithm fixes $\epsilon$ to be $1$ in the definition, as the function $\eta$ can be scaled by any positive factor. Moreover, in our algorithm, the synthesized $\eta$ satisfies (R1) and (R2) with $c = 0$, which correspond to the same conditions (A1) and (A2) of Theorem 4.2. We do not use the constant $c$ in our algorithm.
The Synthesis Algorithm. The algorithm first synthesizes a linear RSM-map $\eta$ through existing algorithms (which are not the focus of this paper). Then the algorithm searches for a value of $\alpha$ through binary search. Once a value for $\alpha$ is chosen, we solve for the minimum value of $\beta$ (so as to obtain the best bound from Theorem 3.3) through the MATLAB polynomial solver. A key point here is that, since we only consider incremental assignments and the RSM-map is linear, the condition (A3) reduces to a polynomial inequality that only involves $\beta$ once the value for $\alpha$ is chosen. The algorithm thus proceeds by searching larger and larger $\alpha$ through binary search until a given precision is reached, and then finds the corresponding minimal $\beta$. Then we apply Theorem 4.2 to obtain the concentration bound.
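The search loop can be sketched numerically on a toy instance. Everything concrete below is an assumption made for illustration: the incremental loop `while x >= 0: x := x + r` with $r$ taking values $-2$ and $+1$ with probability $1/2$ each, the linear RSM-map $\eta(x) = 2x + 2$ (which satisfies the drift condition with $\epsilon = 1$), and a plain bisection standing in for the MATLAB polynomial solver. With these choices, a condition of the shape of (A3) only constrains $\alpha$ and $\beta$:

```python
# Hypothetical instance: while x >= 0: x := x + r, r in {-2, +1} w.p. 1/2,
# with the linear RSM-map eta(x) = 2x + 2 (so E[eta(next)] = eta(x) - 1).
# Since assignments are incremental and eta is linear, the exponential
# condition reduces to: alpha * E[beta^(2r)] <= 1.

def lhs(alpha, beta):
    return alpha * (beta ** -4 + beta ** 2) / 2  # alpha * E[beta^(2r)]

def argmin_beta(alpha):
    # ternary search over the convex map beta -> lhs(alpha, beta)
    lo, hi = 1.0, 4.0
    for _ in range(200):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if lhs(alpha, m1) < lhs(alpha, m2):
            hi = m2
        else:
            lo = m1
    return lo

def feasible(alpha):
    return lhs(alpha, argmin_beta(alpha)) <= 1.0

# Step 1: push the decay rate alpha as high as feasibility allows.
lo, hi = 1.0, 2.0
for _ in range(60):
    mid = (lo + hi) / 2
    if feasible(mid):
        lo = mid
    else:
        hi = mid
alpha = lo

# Step 2: the minimal beta with lhs(alpha, beta) <= 1 (the constraint
# decreases on [1, argmin], so bisection finds its smallest root).
b_lo, b_hi = 1.0, argmin_beta(alpha)
for _ in range(100):
    m = (b_lo + b_hi) / 2
    if lhs(alpha, m) <= 1.0:
        b_hi = m
    else:
        b_lo = m
beta = b_hi

assert 1.05 < alpha < 1.06 and 1.10 < beta < 1.15
assert lhs(alpha, beta) <= 1.0
```

On this toy instance the search settles near $\alpha \approx 1.058$ and $\beta \approx 1.122$, which can then be plugged into the concentration bound of Theorem 4.2.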
4.3. Prototype Implementation and Experimental Results
Implementation
We implemented the algorithm of Section 4.2 in MATLAB. All results were obtained on a machine with an Intel Core i7-8700 processor (3.2 GHz) and 16 GB of memory, running Microsoft Windows 10.
Benchmarks
We use benchmarks from (Chatterjee et al., 2018b; Ngo et al., 2018; Chatterjee et al., 2018a). Our benchmarks are as follows:
• AdvRW1D: This program models a 1-dimensional adversarial random walk and is taken from (Chatterjee et al., 2018b).
• rdwalk1, rdwalk2, rdwalk3: These programs model skewed 1-dimensional random walks, in which the probabilities of going to the left and to the right at each step are not equal. They are taken from (Ngo et al., 2018). The three variants differ in the exact probabilities of moving to the left and to the right.
• prspeed: This example is also a random walk taken from (Ngo et al., 2018). The main difference is that the step length is not necessarily $1$.
• mini-Rou, mini-Rou2: These programs are taken from (Chatterjee et al., 2018a). They are implementations of the mini-roulette casino game as probabilistic programs. mini-Rou is the benchmark in (Chatterjee et al., 2018a), while mini-Rou2 is a variant where the probabilities of losing gambles are increased in order to reduce the runtime.
Table 1. Experimental results.

Program | Our bound for $\mathbb{P}(T \geq \kappa)$ (decay base) | Bound of (Chatterjee et al., 2018b) (decay base) | Our runtime (s)
AdvRW1D | 1.1915 | 1.1003 | 4.68
rdwalk1 | 1.314 | 1.1547 | 3.82
rdwalk2 | 2.072 | 1.511 | 4.34
rdwalk3 | 3.265 | 2.065 | 5.74
mini-Rou | 1.00124 | 1.00068 | 7.02
mini-Rou2 | 1.607 | 1.309 | 6.68
prspeed | 2.0479 | 1.3048 | 4.58
Results and Discussion
Our experimental results are presented in Table 1. We compare our concentration bounds with those obtained using Azuma’s inequality and Hoeffding’s inequality in (Chatterjee et al., 2018b). As shown in Table 1, our approach successfully obtains tighter bounds in every case. Moreover, note that our approach is very efficient and can obtain these bounds in a few seconds.
5. Concentration bounds for probabilistic recurrences
5.1. Problem Setting and Examples
PRRs
In this section, we consider general Probabilistic Recurrence Relations (PRRs) as defined in (Karp, 1994). A PRR is a relation of the following form:
$T(n) = a(n) + T(H_1) + T(H_2) + \cdots + T(H_m)$

in which $a(n)$ is a non-negative value, each $H_i$ is a non-negative random variable such that we always have $H_i \leq n - 1$, and $T(n) = 0$ for $n \leq 1$. Intuitively, we think of $T(n)$ as the time it takes for a randomized divide-and-conquer algorithm to solve an instance of size $n$. The algorithm first performs a preprocessing procedure, which leads to $m$ smaller instances of random sizes $H_1, \ldots, H_m$. It then solves the smaller instances recursively and merges the solutions in a postprocessing phase. We assume that the preprocessing and postprocessing on an instance of size $n$ take $a(n)$ units of time overall. Our goal is to obtain upper bounds on the tail probability $\mathbb{P}(T(n^*) \geq \kappa)$, where $n^*$ is the size of our initial input.
Example 5.1 (QuickSort (Hoare, 1961a)).
One of the most well-known sorting algorithms is QuickSort. Given an array with distinct elements, QuickSort first chooses an element of the array uniformly at random. It then compares all the other elements of the array with and divides them in two parts: (i) elements that are smaller than , and (ii) elements that are larger than . Finally, it recursively sorts each part. Let be the total number of comparisons made by QuickSort in handling an array of size . Assuming that there are elements in part (i) and elements in part (ii) above, we have:
Moreover, it is easy to see that
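The recurrence can be checked empirically. The following sketch (assuming, as in the text, distinct elements) counts the comparisons made by randomized QuickSort: partitioning an instance of size n costs n − 1 comparisons, followed by the two recursive calls on the parts of sizes summing to n − 1.

```python
import random

def quicksort_comparisons(arr):
    """Comparisons made by randomized QuickSort on an array of distinct elements."""
    if len(arr) <= 1:
        return 0
    pivot = random.choice(arr)
    smaller = [x for x in arr if x < pivot]   # part (i)
    larger = [x for x in arr if x > pivot]    # part (ii)
    # partitioning costs n - 1 comparisons, then the two recursive calls
    return len(arr) - 1 + quicksort_comparisons(smaller) + quicksort_comparisons(larger)

n = 100
samples = [quicksort_comparisons(list(range(n))) for _ in range(2000)]
mean = sum(samples) / len(samples)
print(mean)  # close to the known expectation 2(n+1)H_n - 4n (about 648 for n = 100)
```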
Example 5.2 (QuickSelect (Hoare, 1961b)).
Consider the problem of finding the -th smallest element in an unordered array of distinct items. A classical algorithm for solving this problem is QuickSelect. Much like QuickSort, QuickSelect begins by choosing an element of the array uniformly at random. It then compares all the other elements of the array with and divides them into two parts: (i) those that are smaller than and (ii) those that are larger. Suppose that there are elements in part (i). If , then the algorithm recursively searches for the -th smallest element of part (ii). If , the algorithm terminates by returning as the desired answer. Finally, if , the algorithm recursively finds the -th smallest element in part (i). Note that the classical median selection algorithm is a special case of QuickSelect. While more involved linear-time non-randomized algorithms exist for the same problem, QuickSelect provides a simple randomized variant. In this case, we have the following PRR:
Here, is the number of comparisons performed by QuickSelect over an input of size and is a random variable that captures the size of the remaining array that has to be searched recursively.
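A direct implementation of the description above, counting comparisons (a sketch; the recursion on the two parts and the 1-indexed rank d are spelled out explicitly):

```python
import random

def quickselect_comparisons(arr, d):
    """Comparisons made by QuickSelect while finding the d-th smallest
    element (1-indexed) of an array of distinct items."""
    pivot = random.choice(arr)
    smaller = [x for x in arr if x < pivot]   # part (i)
    larger = [x for x in arr if x > pivot]    # part (ii)
    cost = len(arr) - 1                       # every other element is compared with the pivot
    if d <= len(smaller):
        return cost + quickselect_comparisons(smaller, d)
    if d == len(smaller) + 1:
        return cost                           # the pivot itself is the answer
    return cost + quickselect_comparisons(larger, d - len(smaller) - 1)

n = 1000
samples = [quickselect_comparisons(list(range(n)), n // 2) for _ in range(1000)]
print(sum(samples) / len(samples))  # linear in n, well below the n log n cost of full sorting
```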
5.2. Modeling and Theoretical Results
The Markov Chain of a PRR
In order to apply our general approach to PRRs, we first need to embed them in stochastic processes. Suppose we are given a PRR of the following form:
(2)
Moreover, assume that we are interested in for a specific initial value . We model this PRR as a Markov chain in which each state consists of a non-negative integer , and a stack of additional non-negative integers. Formally, for every , we have . Intuitively, each models a state of our probabilistic divide-and-conquer algorithm in which there are more recursive calls waiting to be executed, and the -th call needs to process an instance of size . Following this intuition, the transitions of are defined as follows:
Basically, we start with the state . In other words, in the beginning, we have to process an instance of size . At each step in the Markov chain, if contains a call to process an instance of size , we can simply remove that call from the stack when transitioning to , because we assumed that . Otherwise, we have to perform a call on an instance of size , which by definition leads to further recursive calls on instances of sizes .
Stopping Time and Costs
Let be the stopping time of . Formally, . We further define to model the total cost of execution until reaching :
Following the definitions, it is easy to see that is the total time cost of running the divide-and-conquer algorithm on an instance of size .
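The construction above is straightforward to simulate: keep a stack of pending instance sizes, pop a size, pay the preprocessing/postprocessing cost, and push the generated recursive calls; the accumulated cost at the stopping time (empty stack) is the total cost. A sketch, instantiated (as an illustrative assumption) with a QuickSort-style split of cost n − 1:

```python
import random

def run_prr(n0, split):
    """Simulate the stack-based Markov chain of a PRR.  `split(n)` models one
    call on an instance of size n, returning the pre/postprocessing cost
    together with the (random) sizes of the generated recursive calls.
    Returns the accumulated cost at the stopping time."""
    stack, total_cost = [n0], 0
    while stack:                 # the stopping time is when the stack empties
        n = stack.pop()
        if n <= 1:               # trivial instance: popped at no cost
            continue
        cost, children = split(n)
        total_cost += cost
        stack.extend(children)
    return total_cost

def quicksort_split(n):          # hypothetical instantiation: QuickSort's split
    h = random.randrange(n)      # rank of a uniformly random pivot
    return n - 1, [h, n - 1 - h]

print(run_prr(100, quicksort_split))
```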
The Exponential Supermartingale of a PRR
Suppose that we are given a function such that for all . In other words, is an upper-bound for the expected runtime of our divide-and-conquer algorithm on an instance of size . Finding such upper-bounds is a well-studied problem and can be automated using approaches such as (Chatterjee et al., 2017). We define a new stochastic process as follows:
Moreover, let be a constant that depends on the initial input size , and define a new stochastic process as . Note that if is a supermartingale, then we can obtain a concentration bound for . More formally, we have the following lemma:
Lemma 5.3.
Let be the size of the initial input instance for the recurrence relation in (2), and let be an upper-bound on the expected value of . If, for some ,
(3)
for all , then for all
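The proof idea behind this lemma is the standard exponential-Markov (Chernoff) recipe: for any t > 0, P(X ≥ a) ≤ E[e^{tX}]·e^{−ta}. The following self-contained numeric sketch illustrates the recipe on a Binomial variable (our own illustrative choice, not one of the PRRs above):

```python
import math
import random

t, a, n, p = 0.5, 70, 100, 0.5
# MGF of Binomial(n, p): E[e^{tX}] = (1 - p + p e^t)^n
mgf = (1 - p + p * math.exp(t)) ** n
chernoff_bound = mgf * math.exp(-t * a)        # P(X >= a) <= E[e^{tX}] e^{-ta}

trials = 100_000
hits = sum(
    sum(random.random() < p for _ in range(n)) >= a
    for _ in range(trials)
)
print(chernoff_bound, hits / trials)           # the empirical tail stays below the bound
```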
5.3. Case Studies
Based on the lemma above, we can now derive concentration bounds for a PRR by synthesizing a suitable . We now demonstrate the process with a few examples. In the next section, we will show how this process can be automated.
5.3.1. Obtaining Concentration Bounds for QuickSelect
Consider the PRR in Example 5.2. We are interested in the tail probability and wish to synthesize an upper-bound for this probability. Suppose that the given function is . In other words, we know that . We apply Lemma 5.3 to obtain sufficient conditions on :
By simply computing the value of the geometric series, we get:
Since and , we strengthen the formula to obtain:
Let . We can rewrite the equation above as:
By basic calculus, we can further prove that holds for . Recall that . Since, for every , increases as increases, our constraint becomes , so one possible solution is . Plugging this value back into Lemma 5.3, we have
Remark 2 (Comparison with (Karp, 1994)).
As shown above, our approach is able to synthesize the concentration bound
for the PRR corresponding to QuickSelect. In contrast, (Karp, 1994) obtains the following concentration bound:
Hence, our bound is better by a factor of more than .
The advantage of our approach is not limited to the specific choice of in the tail probability. See Section 5.5 for concentration bounds for other tail probabilities. We now show how a more general result and a tighter bound can be obtained.
Suppose that we aim to find an upper-bound for the tail probability for an arbitrary constant . Let be a real number and consider the function . Using a similar calculation as above and defining , we obtain:
Since , the inequality holds for , so it suffices to find such that . We choose . Plugging this back into Lemma 5.3 leads to:
Specifically, by letting , we get
Remark 3 (Comparison with (Karp, 1994)).
If we plug into the general result above, our general approach is able to synthesize the concentration bound
for the PRR corresponding to QuickSelect. In contrast, (Karp, 1994) obtains the following concentration bound:
Hence, our bound is better by a factor of more than .
5.3.2. Obtaining Concentration Bounds for QuickSort
Consider the PRR in Example 5.1. Our goal is to synthesize an upper-bound for the tail probability . Given , we apply Lemma 5.3 and obtain the following conditions for (we assume ):
For , we can manually verify that it suffices to set . In the sequel, we assume . Note that is monotonically decreasing on and monotonically increasing on . We partition the summation above uniformly into eight parts and use the maximum of each part to over-approximate the sum. This leads to the following upper bound for the summation:
Plugging in this overapproximation back into the original inequality, we get:
for all . We define for substitution, and we strengthen the inequality using the following formula:
By defining respectively, we obtain:
Now we study the following function:
By basic calculus, we can prove that holds on . Additionally, since for every , increases as increases, by plugging into Lemma 5.3, we obtain:
Remark 4 (Comparison with (Karp, 1994)).
As shown above, our approach is able to synthesize a concentration bound that is of the form
for the PRR corresponding to QuickSort. In contrast, (Karp, 1994) provides the following concentration bound:
Note that the latter bound is a constant, while our bound improves as grows. More concretely, as grows, our bound becomes exponentially better than the one obtained by (Karp, 1994).
Just as in the previous case, the advantage of our approach is not limited to bounding . A similar argument can be applied to obtain similar exponentially-decreasing bounds for with other values of and . Moreover, our bounds improve as increases. This is in contrast to (Karp, 1994), which can only bound . Hence, not only do we beat (Karp, 1994)’s bounds numerically, but we also provide a more fine-grained set of concentration bounds.
Remark 5 (Comparison with (Tassarotti, 2017)).
Recall that our approach synthesized the following bound:
for the PRR corresponding to QuickSort. The work of (Tassarotti, 2017) improves Karp’s cookbook method and provides the following bound:
Note that our bound beats theirs in both the base and the exponent, although both bounds are asymptotically equal to .
Remark 6 (Comparison with (McDiarmid and Hayward, 1996)).
While our approach synthesized the following bound:
for the PRR corresponding to QuickSort, the work of (McDiarmid and Hayward, 1996) provides the asymptotically optimal bound of the following form:
where is a constant. Our bound is comparable to, but slightly worse than, the optimal result in (McDiarmid and Hayward, 1996), by only a factor. However, their method only works for QuickSort, while our method applies to a wide range of algorithms, such as QuickSelect, QuickSort, and diameter computation. Furthermore, their method is completely manual, while our approach can be automated once the monotonic intervals of are obtained; this automation is left as future work.
5.4. Automated Algorithm
In this section, we consider the problem of automating our PRR analysis. We provide a sound polynomial-time algorithm. Our algorithm is able to automatically synthesize concentration bounds that beat previous methods over several classical benchmarks.
The Setup
In this section, we consider PRRs of the following form:
(4)
in which is an expression generated by the following grammar:
where is a real constant. We also assume that no sum in appears with a negative coefficient, and that the coefficient of the most significant non-sum term in (if it exists) is positive. We focus on the problem of synthesizing upper-bounds for the tail probability . We also assume that the input to our algorithm contains a function that serves as an upper-bound on the expected runtime, i.e. . There are well-known approaches for automatically deriving such functions, e.g. see (Chatterjee et al., 2017). To enable algorithmic methods, we further assume that and are generated by the following grammar, in which denotes a real constant:
(5)
We also assume that the coefficient of the most significant term in and is positive. The goal of our algorithm is to synthesize an that satisfies
(6)
Such an directly leads to a concentration bound as in Lemma 5.3. Our algorithm relies on the following simple lemma:
Lemma 5.4.
For any monotonically increasing function defined on the interval , where , we have:
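The exact statement of the lemma is not reproduced here; a standard form of such a bound, for monotonically increasing f, is sum_{i=a}^{b} f(i) ≤ ∫_a^{b+1} f(x) dx, since each term f(i) is at most ∫_i^{i+1} f(x) dx. A quick numeric check, with f(x) = x² as an arbitrary illustrative choice:

```python
def sum_f(f, a, b):
    """Exact value of sum_{i=a}^{b} f(i)."""
    return sum(f(i) for i in range(a, b + 1))

def integral_bound(F, a, b):
    """For monotonically increasing f with antiderivative F,
    sum_{i=a}^{b} f(i) <= F(b+1) - F(a), because each term f(i)
    is at most the integral of f over [i, i+1]."""
    return F(b + 1) - F(a)

f = lambda x: x * x          # illustrative increasing function
F = lambda x: x ** 3 / 3     # its antiderivative, computed symbolically by the algorithm
exact = sum_f(f, 1, 100)
bound = integral_bound(F, 1, 100)
print(exact, bound)          # 338350 <= 343433.33...
```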
Overview of the Algorithm
Our algorithm consists of five steps:
- Step 1. The algorithm symbolically computes Inequality (6).
- Step 2. The algorithm replaces every summation in (6) with an over-approximation, hence strengthening the requirements. The algorithm has two ways of obtaining such over-approximations: if the expression inside the sum has an elementary antiderivative that can be computed symbolically, the algorithm applies Lemma 5.4; otherwise, it finds a bound by sampling and relying on monotonicity.
- Step 3. The algorithm introduces a new variable and substitutes it into the inequality. It also removes non-significant terms, hence further strengthening the inequality.
- Step 4. The algorithm uses a calculus technique to obtain a value for such that the inequality holds on but not on . If no such value is found, the algorithm returns the obvious upper-bound for the tail probability.
- Step 5. The algorithm plugs back into the definition of and obtains a value for . Note that this value depends on .
Our Synthesis Algorithm
We now present each step of the algorithm in more detail:
Step 1. Computing Conditions on
The algorithm creates a variable and symbolically computes Inequality (6).
Step 2. Over-approximating the Summations
Note that, by design, the expressions inside our summations are monotonically increasing with respect to the summation index. As such, we can apply Lemma 5.4 to obtain an upper-bound for each sum. To do so, the algorithm symbolically computes an antiderivative of the expression inside the sums. Also, note that the antiderivative is always concave, given that the initial expression is increasing. The algorithm uses this concavity property to remove floors and ceilings, hence strengthening the inequality.
However, there are cases where no closed-form elementary antiderivative can be obtained, e.g. if the expression is . In such cases, the algorithm partitions the summation uniformly into parts (as in Section 5.3.2) and uses the maximum element of each part to over-approximate all of its elements. Furthermore, to handle floors and ceilings in such cases, we strengthen the inequality by replacing with . The algorithm starts with and doubles to obtain finer over-approximations, repeating this process until it succeeds in synthesizing a concentration bound.
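The partition-based fallback can be sketched as follows. Here f(x) = x / ln(x + 1), whose antiderivative is not elementary, stands in as our illustrative choice of a problematic expression; doubling the number of blocks K visibly tightens the bound:

```python
import math

def partitioned_bound(f, a, b, K):
    """Over-approximate sum_{i=a}^{b} f(i), for monotonically increasing f, by
    splitting the index range uniformly into K blocks and charging every index
    in a block with the block maximum (its right endpoint, by monotonicity)."""
    length = b - a + 1
    total = 0.0
    for k in range(K):
        lo = a + (k * length) // K
        hi = a + ((k + 1) * length) // K - 1
        if hi >= lo:
            total += (hi - lo + 1) * f(hi)
    return total

f = lambda x: x / math.log(x + 1)   # increasing; no elementary antiderivative
exact = sum(f(i) for i in range(1, 1001))
for K in (1, 8, 64):                # doubling K yields finer over-approximations
    print(K, partitioned_bound(f, 1, 1000, K))
```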
Example 5.6.
Continuing with the previous example, we have
The algorithm applies this over-approximation to obtain the following strengthened inequality:
The algorithm then further strengthens the inequality by removing the floors and ceilings due to the concavity of :
Step 3. Substitution and Simplification
Let be the most significant term in , ignoring constant factors. The algorithm defines a new variable and rewrites the inequality in terms of . It then moves everything to the LHS, writing the inequality in the form . It also eliminates all fractions by multiplying the inequality by their denominators. This is sound because, by construction, the denominators of all fractions are positive at this point. Then, the algorithm inspects all the terms in . Consider a term of the form in which contains as a sub-expression: if , the term can be simplified to ; if , we check whether holds, and if so, the term is simplified to . This preserves soundness and strengthens the inequality. Our algorithm eagerly applies as many such simplifications as possible. Finally, the algorithm divides by the greatest common divisor of its terms.
Example 5.7.
Continuing with the previous example, we have , so the most significant term in is . The algorithm therefore defines and rewrites the inequality as follows (note that ):
It then moves everything to the LHS, and eliminates the fractions by multiplying their denominators:
Note that appears on the LHS with a negative coefficient, so its removal strengthens the inequality. The algorithm simplifies the corresponding term, obtaining the following:
The algorithm divides the inequality by , obtaining:
Before moving to the next step, we need two simple lemmas:
Lemma 5.8 (Proof in Appendix B.1).
After Steps 1–3 above, our inequality is simplified to
in which is a univariate function of the following form:
(7)
where and . Also, note that this form is closed under differentiation.
Given that our inequality is now univariate in , we should look for a value such that on . Intuitively, if we have such a , then we can let to solve the problem. Moreover, to find the best possible concentration bound, we would like to find the largest possible . To simplify the matter, we attempt to obtain a such that on and on . We say that is separable (into non-negative and negative parts) iff such a exists. The following lemma provides sufficient conditions for separability:
Step 4. Ensuring Separability and Finding
The algorithm attempts to prove separability of using Lemma 5.9. Rule (iii) of the lemma is always used to simplify the expression . The algorithm first evaluates , ensuring that it is non-negative. Then, it computes the derivative . If the derivative is negative, then case (i) of Lemma 5.9 is satisfied and is separable. Otherwise, the algorithm tries to recursively prove the separability of using the same method, hence ensuring case (ii) of the lemma. If both cases fail, the algorithm has failed to prove the separability of and returns the trivial upper-bound . On the other hand, if is proven to be separable, the algorithm obtains by a simple binary search, using the fact that for all we have .
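The binary search in the last part of this step can be sketched as follows, with a hypothetical separable function of the exponential-polynomial shape of Lemma 5.8 (our own example, not one produced by the running derivation):

```python
import math

def find_separation_point(g, hi, iters=60):
    """Binary search for the separation point z0 of a separable g:
    g >= 0 on [0, z0] and g < 0 on (z0, hi].  Assumes g(0) >= 0 and g(hi) < 0."""
    lo = 0.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if g(mid) >= 0:
            lo = mid        # mid is still in the non-negative part
        else:
            hi = mid        # mid is already in the negative part
    return lo

# Hypothetical separable function: non-negative near 0, eventually negative
g = lambda z: 1 + z - math.exp(z) + 0.5 * z * math.exp(0.5 * z)
z0 = find_separation_point(g, hi=10.0)
print(z0)                   # the sign change of g lies between 1.1 and 1.2
```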
Example 5.10.
Continuing with the previous example, we have
The algorithm evaluates , which is non-negative. Hence, it computes the following derivative:
Note that is not always negative for . Hence, the algorithm tries to recursively prove that is separable. It first simplifies to obtain:
Now it tries to prove the separability of . It first evaluates and then computes the derivative:
Another level of recursion shows that and that is strictly decreasing over . So, it is separable. Hence, it is proven that is separable, too. The algorithm performs a binary search and obtains
Step 5. Applying Lemma 5.3
Note that, for every , increases as increases. Hence, it suffices to find an such that and . The algorithm calls an off-the-shelf solver to obtain the largest possible that satisfies these constraints. It then plugs this into Lemma 5.3 and reports the following concentration bound: for all .
Example 5.11.
Continuing with the previous examples, we had and . So, the algorithm solves the constraints and . It is easy to see that is the optimal solution. Hence, the algorithm computes and reports the following concentration bound:
for all . This is equivalent to:
for all . Note that the bound decreases exponentially as grows.
Theorem 5.12 (Soundness).
Proof Sketch. While the algorithm strengthens the inequality at some points, it never weakens it. Hence, any concentration bound found by our algorithm is valid, and the algorithm is sound.
Theorem 5.13.
Assuming fixed bounds on the number of iterations of the binary search and on the approximation parameter , given a PRR , an initial value , an expression , and an upper-bound function for the expected runtime of , all generated by Grammars (4) and (5), the algorithm above terminates in polynomial time with respect to the size of the input.
Proof Sketch. Note that each level of recursion in Step 4, i.e. simplification and differentiation, strictly reduces the number of terms in the expression being studied. Hence, this step performs linearly many symbolic operations. Moreover, Step 5 can find the optimal value of in operations, where is the length of the expression (not its value). It is easy to verify that every other step of the algorithm takes polynomial time.
5.5. Prototype Implementation and Experimental Results
Implementation
We implemented the algorithm of Section 5.4 in Python and Mathematica (Wolfram Research, 2020). We used the SymPy package (Meurer et al., 2017) for symbolic differentiation and integration. All results were obtained on a machine with an Intel Core i7-8700 processor (3.2 GHz) and 16 GB of memory, running Microsoft Windows 10.
Benchmarks
We experimented with PRRs corresponding to classical randomized divide-and-conquer algorithms, namely QuickSelect, RandomSearch, L1Diameter, and L2Diameter. QuickSelect is already described in Section 5.2. In RandomSearch, the input consists of a sorted array of distinct items and a key . The goal is to find the index of in the array, or to report that it does not appear. The algorithm randomly chooses an index and compares with . If is larger, it recursively searches the first locations of the array. If they are equal, it terminates and returns . If is larger, it recursively searches the last locations. In L1Diameter and L2Diameter, the input is a set of points in 3-d space. The goal is to find the diameter of , or equivalently, to find two points in that are farthest apart from each other. The only difference between the two problems is the norm used for computing the distance (i.e. L1 or L2). Chapter 9 of the classical book (Motwani and Raghavan, 1995) provides randomized divide-and-conquer algorithms for these problems.
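The RandomSearch routine is simple to state in code. A sketch counting the number of probes (assuming, as in the text, a sorted array of distinct items):

```python
import random

def random_search_probes(arr, key, lo=0, hi=None):
    """Probes made by RandomSearch on a sorted array of distinct items.
    Returns the number of randomly chosen indices inspected before the key
    is found or the remaining range becomes empty."""
    if hi is None:
        hi = len(arr) - 1
    if lo > hi:
        return 0                        # key does not appear
    i = random.randint(lo, hi)          # uniformly random probe
    if arr[i] == key:
        return 1
    if arr[i] > key:                    # key can only be among the earlier locations
        return 1 + random_search_probes(arr, key, lo, i - 1)
    return 1 + random_search_probes(arr, key, i + 1, hi)

arr = list(range(0, 2000, 2))           # 1000 sorted, distinct items
samples = [random_search_probes(arr, 1234) for _ in range(2000)]
print(sum(samples) / len(samples))      # logarithmic in the array size on average
```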
PRRs
The PRRs used in our experiments are shown in the table below.
Randomized Algorithm | Probabilistic Recurrence Relation |
QuickSelect | |
RandomSearch | |
L1Diameter | |
L2Diameter |
Results
Our experimental results are shown in Tables 2 and 3. For each recurrence relation, we consider several different tail probabilities and manually compute and report the concentration bound obtained by the classical method of (Karp, 1994). We then provide the results of our algorithm using various functions . Recall that is an upper-bound for . In cases where our algorithm was unable to find antiderivatives, we also report the parameter , i.e. the number of blocks used in over-approximating the summations.
Discussion
As shown in Tables 2 and 3, our algorithm is very efficient and can handle the benchmarks in a few seconds. Moreover, it obtains concentration bounds that are consistently tighter than those of (Karp, 1994), often by one or more orders of magnitude (see the ratio columns). It is also noteworthy that, for the RandomSearch benchmark, our algorithm obtains concentration bounds that decrease as goes up, whereas (Karp, 1994) only provides constant bounds. In this case, the ratio becomes arbitrarily large as grows.
Recurrence | Tail Probability | Karp’s bound | Our bound | Our runtime (s) | Ratio of the bounds | ||
QuickSelect | / | ||||||
RandomSearch | as | / | |||||
Recurrence | Tail Probability | Karp’s bound | Our bound | Our runtime (s) | Ratio of the bounds | ||
L1Diameter | / | ||||||
L2Diameter | |||||||
6. Related Work
Previous results on concentration bounds for probabilistic programs
Concentration bounds for probabilistic programs were first considered in (Monniaux, 2001), where a basic approach for obtaining exponentially-decreasing concentration bounds through abstract interpretation and truncation of the sampling intervals was presented. The work of (Chakarov and Sankaranarayanan, 2013) used Azuma’s inequality to derive exponentially-decreasing concentration results for values of program variables in a probabilistic program. For the termination time of probabilistic programs, exponentially-decreasing concentration bounds for special classes of probabilistic programs were established using the Azuma and Hoeffding inequalities in (Chatterjee et al., 2018b). Recently, for several cases where the Azuma and Hoeffding inequalities are not applicable, a reciprocal concentration bound using Markov’s inequality was presented in (Fu and Chatterjee, 2019), which was then extended to higher moments in (Kura et al., 2019; Wang et al., 2020). For a detailed survey of current methods for concentration bounds of probabilistic programs, see (Sankaranarayanan, 2020).
Previous results on concentration bounds for probabilistic recurrences
Concentration-bound analyses for probabilistic recurrence relations were first considered in the classical work of (Karp, 1994), where cookbook methods (similar to the master theorem for worst-case analysis of recurrences) were obtained for a large class of probabilistic recurrence relations. A variant of the results of Karp that weakened several conditions but obtained comparable bounds was presented in (Chaudhuri and Dubhashi, 1997). The optimal concentration bound for the QuickSort algorithm was presented in (McDiarmid and Hayward, 1996). Recently, concentration bounds for probabilistic recurrence relations were mechanized in a theorem prover (Tassarotti and Harper, 2018), and extended to parallel settings (Tassarotti, 2017).
Comparison with previous approaches on probabilistic programs
Compared with previous results on probabilistic programs, our approach considers the synthesis of exponential supermartingales. As compared to (Monniaux, 2001), our approach is based on the well-studied theory of martingales for stochastic processes. In comparison to previous martingale-based approaches, we either achieve asymptotically better bounds (e.g. exponentially-decreasing bounds as compared to the polynomially-decreasing bounds of (Fu and Chatterjee, 2019; Kura et al., 2019; Wang et al., 2020)) or substantially improve the bounds (e.g. in comparison with (Chatterjee et al., 2018b); see our experimental results in Section 4.3). Moreover, all previous results that achieve exponentially-decreasing bounds, such as (Chakarov and Sankaranarayanan, 2013; Chatterjee et al., 2018b), require bounded differences, i.e. the stepwise difference of the supermartingale needs to be globally bounded in order to apply the Azuma or Hoeffding inequalities. In contrast, our results apply to stochastic processes that are not necessarily difference-bounded.
Comparison with previous approaches on probabilistic recurrences
Compared with previous results on probabilistic recurrence relations, our result is based on the idea of exponential supermartingales and the related automation techniques. It derives much better concentration bounds than the classical approach of (Karp, 1994) (see Remarks 2, 3, and 4 in Section 5.3) and beats the manual approach of (Tassarotti, 2017) in the constants (see Remark 5). Moreover, our approach also derives a tail bound for QuickSort that is comparable to the optimal bound of (McDiarmid and Hayward, 1996) (see Remark 6 in Section 5.3). In addition, the result of (Karp, 1994) requires the key condition to handle recurrences with multiple procedure calls. Whether this condition can be relaxed was raised as an important open problem in (Karp, 1994; Dubhashi and Panconesi, 2009). To address this issue, the approach of (Tassarotti, 2017) proposed a method that does not require this restriction. Our method also does not need this restriction, and hence can be viewed as a new way to resolve this problem.
Key conceptual difference
Finally, our general approach of exponential supermartingales has a key difference with respect to previous approaches. The main conceptual difference is that our approach examines the detailed probability distribution (via the moment generating function). In comparison, previous approaches (e.g. those based on the Azuma and Hoeffding inequalities, or the results of (Karp, 1994)) only consider the expectation, the variance, or the range of the related random variables. This is the key conceptual reason that our approach can derive tighter bounds.
7. Conclusion and Future Work
In this work, we presented a new approach for deriving tighter concentration bounds for probabilistic programs and probabilistic recurrence relations. We showed that our new approach can derive tighter concentration bounds than the classical method of applying Azuma’s inequality for probabilistic programs and the classical approaches for probabilistic recurrences, even for basic randomized algorithms such as QuickSelect and randomized diameter computation. On QuickSort, we beat the approaches of (Karp, 1994) and (Tassarotti, 2017) and derive a bound comparable to the optimal bound of (McDiarmid and Hayward, 1996).
There are several interesting directions for future work. First, while we consider classical probability distributions such as uniform and Bernoulli for algorithmic aspects, extending the algorithms to handle various other distributions is an interesting direction. Second, whether our technique can be used for the relaxation of the key condition in the approach of (Karp, 1994) is another interesting problem. Finally, whether our approach can be directly applied to analyze randomized algorithms, rather than their corresponding probabilistic recurrences, is another direction of future work.
8. Acknowledgements
We sincerely thank Prof. Joseph Tassarotti for pointing out a miscalculation in Remark 6.
References
- Agrawal et al. (2018) Sheshansh Agrawal, Krishnendu Chatterjee, and Petr Novotný. 2018. Lexicographic ranking supermartingales: an efficient approach to termination of probabilistic programs. In POPL. 34:1–34:32.
- Azuma (1967) Kazuoki Azuma. 1967. Weighted sums of certain dependent random variables. Tohoku Mathematical Journal 19, 3 (1967), 357–367.
- Baier and Katoen (2008) Christel Baier and Joost-Pieter Katoen. 2008. Principles of model checking. MIT Press.
- Barthe et al. (2018) Gilles Barthe, Thomas Espitau, Benjamin Grégoire, Justin Hsu, and Pierre-Yves Strub. 2018. Proving expected sensitivity of probabilistic programs. In POPL. 57:1–57:29.
- Barthe et al. (2017) Gilles Barthe, Benjamin Grégoire, Justin Hsu, and Pierre-Yves Strub. 2017. Coupling proofs are probabilistic product programs. In POPL. 161–174.
- Barthe et al. (2020) Gilles Barthe, Joost-Pieter Katoen, and Alexandra Silva (Eds.). 2020. Foundations of Probabilistic Programming. Springer.
- Chakarov and Sankaranarayanan (2013) Aleksandar Chakarov and Sriram Sankaranarayanan. 2013. Probabilistic Program Analysis with Martingales. In CAV 2013. 511–526.
- Chatterjee et al. (2016) Krishnendu Chatterjee, Hongfei Fu, and Amir Kafshdar Goharshady. 2016. Termination Analysis of Probabilistic Programs Through Positivstellensatz’s. In CAV. 3–22.
- Chatterjee et al. (2018a) Krishnendu Chatterjee, Hongfei Fu, Amir Kafshdar Goharshady, and Nastaran Okati. 2018a. Computational Approaches for Stochastic Shortest Path on Succinct MDPs. In IJCAI 2018. 4700–4707.
- Chatterjee et al. (2017) Krishnendu Chatterjee, Hongfei Fu, and Aniket Murhekar. 2017. Automated Recurrence Analysis for Almost-Linear Expected-Runtime Bounds. In CAV. 118–139.
- Chatterjee et al. (2018b) Krishnendu Chatterjee, Hongfei Fu, Petr Novotný, and Rouzbeh Hasheminezhad. 2018b. Algorithmic Analysis of Qualitative and Quantitative Termination Problems for Affine Probabilistic Programs. TOPLAS 40, 2 (2018), 7:1–7:45.
- Chaudhuri and Dubhashi (1997) Shiva Chaudhuri and Devdatt P. Dubhashi. 1997. Probabilistic Recurrence Relations Revisited. Theoretical Computer Science 181, 1 (1997), 45–56.
- Cusumano-Towner et al. (2018) Marco Cusumano-Towner, Benjamin Bichsel, Timon Gehr, Martin T. Vechev, and Vikash K. Mansinghka. 2018. Incremental inference for probabilistic programs. In PLDI. 571–585.
- Dahlqvist and Kozen (2020) Fredrik Dahlqvist and Dexter Kozen. 2020. Semantics of higher-order probabilistic programs with conditioning. In POPL. 57:1–57:29.
- Dubhashi and Panconesi (2009) Devdatt P. Dubhashi and Alessandro Panconesi. 2009. Concentration of Measure for the Analysis of Randomized Algorithms. Cambridge University Press.
- Durrett (1996) R. Durrett. 1996. Probability: Theory and Examples (Second Edition). Duxbury Press.
- Esparza et al. (2012) Javier Esparza, Andreas Gaiser, and Stefan Kiefer. 2012. Proving Termination of Probabilistic Programs Using Patterns. In CAV. 123–138.
- Fioriti and Hermanns (2015) Luis María Ferrer Fioriti and Holger Hermanns. 2015. Probabilistic Termination: Soundness, Completeness, and Compositionality. In POPL. 489–501.
- Foster et al. (2016) Nate Foster, Dexter Kozen, Konstantinos Mamouras, Mark Reitblatt, and Alexandra Silva. 2016. Probabilistic NetKAT. In ESOP. 282–309.
- Fu and Chatterjee (2019) Hongfei Fu and Krishnendu Chatterjee. 2019. Termination of Nondeterministic Probabilistic Programs. In VMCAI. 468–490.
- Hark et al. (2020) Marcel Hark, Benjamin Lucien Kaminski, Jürgen Giesl, and Joost-Pieter Katoen. 2020. Aiming low is harder: induction for lower bounds in probabilistic program verification. In POPL. 37:1–37:28.
- Hoare (1961a) Charles Antony Richard Hoare. 1961a. Algorithm 64: quicksort. Commun. ACM 4, 7 (1961), 321.
- Hoare (1961b) Charles Antony Richard Hoare. 1961b. Algorithm 65: find. Commun. ACM 4, 7 (1961), 321–322.
- Howard (1960) H. Howard. 1960. Dynamic Programming and Markov Processes. MIT Press.
- Kaelbling et al. (1998) L. P. Kaelbling, M. L. Littman, and A. R. Cassandra. 1998. Planning and acting in partially observable stochastic domains. Artificial intelligence 101, 1 (1998), 99–134.
- Kaelbling et al. (1996) L. P. Kaelbling, M. L. Littman, and A. W. Moore. 1996. Reinforcement learning: A survey. Journal of Artificial Intelligence Research 4 (1996), 237–285.
- Kaminski et al. (2016) Benjamin Lucien Kaminski, Joost-Pieter Katoen, Christoph Matheja, and Federico Olmedo. 2016. Weakest Precondition Reasoning for Expected Run-Times of Probabilistic Programs. In ESOP. 364–389.
- Karp (1994) Richard M. Karp. 1994. Probabilistic Recurrence Relations. Journal of the ACM 41, 6 (1994), 1136–1150.
- Kemeny et al. (1966) J.G. Kemeny, J.L. Snell, and A.W. Knapp. 1966. Denumerable Markov Chains. Van Nostrand Company.
- Kleinberg and Tardos (2006) Jon M. Kleinberg and Éva Tardos. 2006. Algorithm design. Addison-Wesley.
- Kura et al. (2019) Satoshi Kura, Natsuki Urabe, and Ichiro Hasuo. 2019. Tail Probabilities for Randomized Program Runtimes via Martingales for Higher Moments. In TACAS.
- Kwiatkowska et al. (2011) Marta Z. Kwiatkowska, Gethin Norman, and David Parker. 2011. PRISM 4.0: Verification of Probabilistic Real-Time Systems. In CAV. 585–591.
- McDiarmid and Hayward (1996) Colin McDiarmid and Ryan Hayward. 1996. Large Deviations for Quicksort. Journal of Algorithms 21, 3 (1996), 476–507.
- Meurer et al. (2017) Aaron Meurer, Christopher P Smith, Mateusz Paprocki, Ondřej Čertík, Sergey B Kirpichev, Matthew Rocklin, Amit Kumar, Sergiu Ivanov, Jason K Moore, Sartaj Singh, et al. 2017. SymPy: symbolic computing in Python. PeerJ Computer Science 3 (2017).
- Meyn and Tweedie (1993) S.P. Meyn and R.L. Tweedie. 1993. Markov chains and stochastic stability. Springer.
- Monniaux (2001) David Monniaux. 2001. An Abstract Analysis of the Probabilistic Termination of Programs. In SAS. 111–126.
- Motwani and Raghavan (1995) Rajeev Motwani and Prabhakar Raghavan. 1995. Randomized Algorithms. Cambridge University Press.
- Ngo et al. (2018) Van Chan Ngo, Quentin Carbonneaux, and Jan Hoffmann. 2018. Bounded expectations: resource analysis for probabilistic programs. In PLDI 2018. 496–512.
- Olmedo et al. (2016) Federico Olmedo, Benjamin Lucien Kaminski, Joost-Pieter Katoen, and Christoph Matheja. 2016. Reasoning about Recursive Probabilistic Programs. In LICS, Martin Grohe, Eric Koskinen, and Natarajan Shankar (Eds.). 672–681.
- Paz (1971) A. Paz. 1971. Introduction to probabilistic automata (Computer science and applied mathematics). Academic Press.
- Rabin (1963) M.O. Rabin. 1963. Probabilistic automata. Information and Control 6 (1963), 230–245.
- Sankaranarayanan (2020) Sriram Sankaranarayanan. 2020. Quantitative Analysis of Programs with Probabilities and Concentration of Measure Inequalities. In Foundations of Probabilistic Programming, Gilles Barthe, Joost-Pieter Katoen, and Alexandra Silva (Eds.). https://www.cs.colorado.edu/~srirams/papers/concMeasureSpringerChapter.PDF
- Sankaranarayanan et al. (2013) Sriram Sankaranarayanan, Aleksandar Chakarov, and Sumit Gulwani. 2013. Static analysis for probabilistic programs: inferring whole program properties from finitely many paths. In PLDI. 447–458.
- Tassarotti (2017) Joseph Tassarotti. 2017. Probabilistic Recurrence Relations for Work and Span of Parallel Algorithms. CoRR abs/1704.02061 (2017). http://arxiv.org/abs/1704.02061
- Tassarotti and Harper (2018) Joseph Tassarotti and Robert Harper. 2018. Verified Tail Bounds for Randomized Programs. In ITP. 560–578.
- Wang et al. (2020) Di Wang, Jan Hoffmann, and Thomas W. Reps. 2020. Tail Bound Analysis for Probabilistic Programs via Central Moments. CoRR abs/2001.10150 (2020). https://arxiv.org/abs/2001.10150
- Williams (1991a) David Williams. 1991a. Probability with Martingales. Cambridge University Press.
- Williams (1991b) David Williams. 1991b. Probability with Martingales. Cambridge University Press.
- Wolfram Research (2020) Wolfram Research. 2020. Mathematica, Version 12.1. https://www.wolfram.com/mathematica
Appendix A Detailed Experimental Results for Section 4.3
This section presents, in Figure 1, plots for the related examples of Section 4.3. The numbers on the x-axis are the termination times, varying from to .
Appendix B Detailed Proofs for Section 5
B.1. Proof of Lemma 5.8
It is easy to see that this form is closed under addition, multiplication, and differentiation. We first prove the following lemma:
Lemma B.1.
For generated by , suppose is the most significant term (ignoring coefficients), which means . Set ; then for every , can be simplified under Step 3’s strategy into a univariate function on as in Lemma 5.8.
Proof.
Suppose , where . Then we have:
For every , if , then this term would become ; otherwise , since . Then if , it would be simplified into ; otherwise it would be simplified into . Since the form is closed under multiplication, we conclude that can be simplified into the form of Lemma 5.8.
∎
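The closure argument above can be illustrated mechanically. As a hedged sketch, assume (for illustration only; this is our assumption, not necessarily the paper's exact class) that the closed form consists of finite sums of terms c·x^p·(ln x)^q. A quick SymPy check then shows that differentiating such a term yields a sum of terms of the same shape:

```python
import sympy as sp

x = sp.symbols('x', positive=True)

# A representative term of the assumed class c * x**p * log(x)**q.
term = 3 * x**2 * sp.log(x)**4

# Differentiate and expand; every resulting summand is again of the
# shape c * x**p * log(x)**q, illustrating closure under differentiation.
d = sp.expand(sp.diff(term, x))

# Mathematically, d = 6*x*log(x)**4 + 12*x*log(x)**3.
print(d)
```

Closure under addition and multiplication follows the same way, since products of powers of x and ln x stay in the class.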
We now return to proving Lemma 5.8. By design, our PRR always has the following form:
where . We consider several cases:
Case (i)
and . Recall that is an over-approximation of ; would be . First note that , since if , then . Since , we have . Hence, the conditions for would be:
By integration and concavity, we have:
Defining , we have:
By moving everything to the left and eliminating the fraction, we get:
By removing common parts and , we derive:
Since are constants, this is a univariate inequality for in the form of Lemma 5.8.
Case (ii)
. Recall that is an over-approximation of ; would be , where . The conditions for would be:
By integration and concavity, we have:
By moving everything to the left and eliminating the fraction, we get:
We define . By Lemma B.1, , and since can be simplified into the desired form, the whole formula can be simplified into the form of Lemma 5.8.
Case (iii)
Otherwise, by design, has no elementary antiderivative. In this case, we partition the summation uniformly into parts and use the maximum element of each part to over-approximate each of its elements. The conditions for would then be:
By uniformly dividing into parts and using the maximum element of each part to over-approximate each of its elements, we derive:
By moving everything to the left, and removing the denominators, we obtain:
We remove the common term , and let , where is the most significant term in (ignoring coefficients). By Lemma B.1, the whole formula can be simplified into the form we want.
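The partition-and-maximize over-approximation used in Case (iii) can be sketched as follows; the function `g` and the parameters here are illustrative assumptions, not the paper's exact instantiation:

```python
import math

def partition_upper_bound(g, n, k):
    """Over-approximate sum(g(i) for i in 1..n) by splitting the index
    range uniformly into k parts and bounding every element of a part
    by that part's maximum value."""
    bound = 0.0
    for j in range(k):
        lo = j * n // k + 1
        hi = (j + 1) * n // k
        if lo > hi:
            continue
        part_max = max(g(i) for i in range(lo, hi + 1))
        bound += part_max * (hi - lo + 1)
    return bound

# The bound is sound for any g, monotone or not.
g = lambda i: i * math.log(i + 1)
exact = sum(g(i) for i in range(1, 101))
approx = partition_upper_bound(g, 100, 8)
assert approx >= exact
```

Increasing the number of parts tightens the bound at the cost of a longer symbolic expression, which is why a small fixed number of parts is used.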
B.2. Proof of Lemma 5.9
Proof for Case (i)
Define and set . Then, by and the continuity of , we have . Since is monotonically decreasing, , so is feasible.
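The existence argument in Case (i) — a continuous, monotonically decreasing function must cross zero between a point where it is positive and one where it is non-positive — can be made concrete by bisection. The function `f` below is a hypothetical stand-in, not the paper's actual function:

```python
def feasible_point(f, lo, hi, iters=60):
    """Bisection: given a continuous, monotonically decreasing f with
    f(lo) > 0 >= f(hi), return a point t with f(t) <= 0 close to the
    zero crossing."""
    assert f(lo) > 0 >= f(hi)
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return hi  # invariant: f(hi) <= 0 throughout, so hi is feasible

# Hypothetical decreasing function with a zero at t = 2.
f = lambda t: 2.0 - t
t = feasible_point(f, 0.0, 10.0)
assert f(t) <= 0 and abs(t - 2.0) < 1e-9
```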
Proof for Case (ii)
Since is separable, there exists such that on and on . By the Newton–Leibniz formula, , so . Since on , by an argument similar to (i) we conclude that is separable.
Proof for Case (iii)
If is separable for some , then there exists such that on and on . Since for , we conclude that on and on , so is separable.
Appendix C Supplementary material for probabilistic recurrences
C.1. On the concentration bound for Quicksort
Here we emphasize that we use exactly the same steps as in our paper to derive the following bound for all :
The steps below, up to the sentence “Finally, we could prove…”, are exactly the same as in our paper on Page 14, without changing any word.
Given , we apply Lemma 5.3 and obtain the following conditions for (we assume ):
For , we can manually verify that it suffices to set . In the sequel, we assume . Note that is monotonically decreasing on and monotonically increasing on . We partition the summation above uniformly into eight parts and use the maximum of each part to over-approximate the sum. This leads to the following upper bound for this summation:
Plugging this over-approximation back into the original inequality, we get:
for all . We define for the substitution, and we use the following formula for strengthening:
By defining respectively, we obtain:
Now we study the following function:
By basic calculus, we can prove that holds on . Additionally, since for every , increases as increases, plugging into Lemma 5.3 yields:
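This kind of elementary-calculus nonnegativity check can be mechanized with SymPy. The function `h` below is a hypothetical example, not the function studied above: `h(1) = 0` and `h'(x) = ln x >= 0` on `[1, oo)` together give `h >= 0` there.

```python
import sympy as sp

x = sp.symbols('x', positive=True)

# Hypothetical stand-in for the function studied above.
h = x * sp.log(x) - x + 1

# h(1) = 0, and h'(x) = log(x) is >= 0 for x >= 1,
# so h is nondecreasing on [1, oo) and hence h(x) >= 0 there.
assert h.subs(x, 1) == 0
assert sp.simplify(sp.diff(h, x) - sp.log(x)) == 0
```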