Concentration-Bound Analysis for Probabilistic Programs and Probabilistic Recurrence Relations

Jinyi Wang (Shanghai Jiao Tong University, Shanghai, China), Yican Sun (Peking University, Beijing, China), Hongfei Fu (Shanghai Jiao Tong University, Shanghai, China), Mingzhang Huang (IST Austria (Institute of Science and Technology Austria), Klosterneuburg, Austria), Amir Kafshdar Goharshady (IST Austria, Klosterneuburg, Austria), and Krishnendu Chatterjee (IST Austria, Klosterneuburg, Austria)
Abstract.

The analysis of probabilistic programs and randomized algorithms is a classical problem in computer science. The first basic problem in the analysis of stochastic processes is to consider the expectation or mean, and another basic problem is to consider concentration bounds, i.e. showing that large deviations from the mean have small probability. Similarly, in the context of probabilistic programs and randomized algorithms, the analysis of expected termination time/running time and their concentration bounds are fundamental problems. In this work, we focus on concentration bounds for probabilistic programs and probabilistic recurrences of randomized algorithms. For probabilistic programs, the basic technique to achieve concentration bounds is to consider martingales and apply the classical Azuma's inequality (Azuma, 1967). For probabilistic recurrences of randomized algorithms, Karp's classical "cookbook" method (Karp, 1994), which is similar to the master theorem for recurrences, is the standard approach to obtain concentration bounds. In this work, we propose a novel approach for deriving concentration bounds for probabilistic programs and probabilistic recurrence relations through the synthesis of exponential supermartingales. For probabilistic programs, we present algorithms for the synthesis of such supermartingales in several cases. We also show that our approach can derive better concentration bounds than simply applying the classical Azuma's inequality over various probabilistic programs considered in the literature. For probabilistic recurrences, our approach can derive tighter bounds than the well-established method of (Karp, 1994) on classical algorithms such as QuickSort, QuickSelect, and randomized diameter computation. Moreover, we show that our approach can derive bounds comparable to the optimal bound for QuickSort proposed in (McDiarmid and Hayward, 1996). We also present a prototype implementation that can automatically infer these bounds.


1. Introduction

In this work, we present new methods to obtain concentration bounds for probabilistic programs and probabilistic recurrences. In this section, we start with a brief description of probabilistic programs and recurrences and provide an overview of the basic analysis problems. Then, we discuss previous results and finally present our contributions.

Probabilistic programs and recurrences

The formal analysis of probabilistic models is a fundamental problem that spans across various disciplines, including probability theory and statistics (Durrett, 1996; Howard, 1960; Kemeny et al., 1966; Rabin, 1963; Paz, 1971), randomized algorithms (Motwani and Raghavan, 1995), formal methods (Baier and Katoen, 2008; Kwiatkowska et al., 2011), artificial intelligence (Kaelbling et al., 1996, 1998), and programming languages (Barthe et al., 2020). The analysis of probabilistic programs, i.e. imperative programs extended with random value generators, has received significant attention in the programming languages community (Chakarov and Sankaranarayanan, 2013; Fioriti and Hermanns, 2015; Sankaranarayanan et al., 2013; Esparza et al., 2012; Chatterjee et al., 2018b; Barthe et al., 2018; Barthe et al., 2017; Foster et al., 2016; Cusumano-Towner et al., 2018; Chatterjee et al., 2016; Hark et al., 2020; Olmedo et al., 2016; Dahlqvist and Kozen, 2020). Similarly, the analysis of randomized algorithms is a central and classical problem in theoretical computer science (Motwani and Raghavan, 1995), and a key problem is to analyze the probabilistic recurrence relations arising from randomized algorithms.

Basic analysis problems

In the analysis of probabilistic programs and probabilistic recurrences, one has to analyze the underlying stochastic processes. The first basic problem in the analysis of stochastic processes is to consider the expectation or the mean (Williams, 1991a). However, the expectation (or the first moment) does not provide enough information about the probability distribution associated with the process. Hence higher moments (such as variance) are key parameters of interest. Another fundamental problem is to obtain concentration bounds showing that large deviations from the mean have small probability. A key advantage of establishing strong concentration bounds is that it enables us to provide high-probability guarantees (e.g. with high probability the running time does not exceed a desired bound). In the context of probabilistic programs and randomized algorithms, a key quantity of interest is the termination time of programs or the running time associated with probabilistic recurrences. Thus, the analysis of expected termination time/running time and their concentration bounds for probabilistic programs and probabilistic recurrences are fundamental problems in computer science.

Previous results on expectation analysis

The problem of expected termination time analysis has received considerable attention. The expected termination time of probabilistic programs has been widely studied with different techniques, e.g. martingale-based techniques (Chakarov and Sankaranarayanan, 2013; Fu and Chatterjee, 2019) and the weakest pre-expectation calculus (Kaminski et al., 2016). The analysis of the expected running time of probabilistic recurrences is a fundamental problem in randomized algorithms (Motwani and Raghavan, 1995), and automated methods for it have also been considered (Chatterjee et al., 2017).

Previous results on concentration bounds

The analysis of concentration bounds is more involved than expectation analysis, both for the general case of stochastic processes, as well as in the context of probabilistic programs and probabilistic recurrences. For probabilistic programs, the only technique in the literature to obtain concentration bounds is to consider either linear or polynomial supermartingales (Chakarov and Sankaranarayanan, 2013; Chatterjee et al., 2016), and then apply the classical Azuma’s inequality (Azuma, 1967) on the supermartingales to obtain concentration bounds. For probabilistic recurrences of randomized algorithms, the standard approach for concentration bounds is Karp’s classical “cookbook” method (Karp, 1994), which is similar to the master theorem for non-probabilistic recurrences. More advanced methods have also been developed for specific algorithms such as QuickSort (McDiarmid and Hayward, 1996; Dubhashi and Panconesi, 2009).

Our contributions

In this work, we consider the problem of concentration bounds for probabilistic programs and probabilistic recurrences and our main contributions are as follows:

  (1)

    First, we propose a novel approach for deriving concentration bounds for probabilistic programs and probabilistic recurrence relations through the synthesis of exponential supermartingales.

  (2)

    For probabilistic programs, we present algorithms for the synthesis of such supermartingales for several cases. We show that our approach can derive better concentration bounds than simply applying the classical Azuma’s inequality, over various probabilistic programs considered in the literature.

  (3)

    For probabilistic recurrences, we show that our approach can derive tighter bounds than the classical methods of the literature on several basic problems. We show that our concentration bounds for probabilistic recurrences associated with classical randomized algorithms such as QuickSelect (which generalizes median selection), QuickSort and randomized diameter computation beat the bounds obtained using the methods of (Karp, 1994). Moreover, we show that our approach can derive bounds comparable to the optimal bound for QuickSort proposed in (McDiarmid and Hayward, 1996). We also present a prototype implementation that can automatically infer these bounds.

Novelty

The key novelty of our approach is as follows: instead of linear or polynomial martingales, we consider exponential martingales, and our concentration bounds are obtained using the standard Markov's inequality. This is quite a simple technique. The surprising and novel aspect is that with such a simple technique, we can (a) obtain better concentration bounds for probabilistic programs in comparison with the only existing method of applying Azuma's inequality; and (b) improve the more than two-decades-old classical bounds for basic and well-studied randomized algorithms.

2. Preliminaries

Throughout the paper, we denote by $\mathbb{N}$, $\mathbb{N}_0$, $\mathbb{Z}$, and $\mathbb{R}$ the sets of positive integers, non-negative integers, integers, and real numbers, respectively. We first review several fundamental concepts in probability theory and then illustrate the problems we study.

2.1. Basics of Probability Theory

We provide a short review of some necessary concepts in probability theory. For a more detailed treatment, see (Williams, 1991a).

Probability Distributions

A discrete probability distribution over a countable set $U$ is a function $p:U\rightarrow[0,1]$ such that $\sum_{u\in U}p(u)=1$. The support of $p$ is defined as $\mathrm{supp}(p):=\{u\in U\mid p(u)>0\}$.

Probability Spaces

A probability space is a triple $(\Omega,\mathcal{F},\Pr)$, where $\Omega$ is a non-empty set (called the sample space), $\mathcal{F}$ is a $\sigma$-algebra over $\Omega$ (i.e. a collection of subsets of $\Omega$ that contains the empty set $\emptyset$ and is closed under complementation and countable union) and $\Pr$ is a probability measure on $\mathcal{F}$, i.e. a function $\Pr\colon\mathcal{F}\rightarrow[0,1]$ such that (i) $\Pr(\Omega)=1$ and (ii) for all sequences $A_1,A_2,\dots\in\mathcal{F}$ of pairwise-disjoint sets (i.e. $A_i\cap A_j=\emptyset$ whenever $i\neq j$) it holds that $\sum_{i=1}^{\infty}\Pr(A_i)=\Pr\left(\bigcup_{i=1}^{\infty}A_i\right)$. Elements of $\mathcal{F}$ are called events. An event $A\in\mathcal{F}$ holds almost-surely (a.s.) if $\Pr(A)=1$.

Random Variables

A random variable $X$ from a probability space $(\Omega,\mathcal{F},\Pr)$ is an $\mathcal{F}$-measurable function $X\colon\Omega\rightarrow\mathbb{R}\cup\{-\infty,+\infty\}$, i.e. a function such that for all $d\in\mathbb{R}\cup\{-\infty,+\infty\}$, the set $\{\omega\in\Omega\mid X(\omega)<d\}$ belongs to $\mathcal{F}$.

Expectation

The expected value of a random variable $X$ from a probability space $(\Omega,\mathcal{F},\Pr)$, denoted by $\mathbb{E}(X)$, is defined as the Lebesgue integral of $X$ w.r.t. $\Pr$, i.e. $\mathbb{E}(X):=\int X\,\mathrm{d}\Pr$. The precise definition of the Lebesgue integral is somewhat technical and is omitted here (cf. (Williams, 1991a, Chapter 5) for a formal definition). If $\mathrm{range}\,X=\{d_0,d_1,\ldots\}$ is countable, then we have $\mathbb{E}(X)=\sum_{k=0}^{\infty}d_k\cdot\Pr(X=d_k)$.

Filtrations

A filtration of a probability space $(\Omega,\mathcal{F},\Pr)$ is an infinite sequence $\{\mathcal{F}_n\}_{n\in\mathbb{N}_0}$ of $\sigma$-algebras over $\Omega$ such that $\mathcal{F}_n\subseteq\mathcal{F}_{n+1}\subseteq\mathcal{F}$ for all $n\in\mathbb{N}_0$. Intuitively, a filtration models the information available at any given point of time.

Conditional Expectation

Let $X$ be any random variable from a probability space $(\Omega,\mathcal{F},\Pr)$ such that $\mathbb{E}(|X|)<\infty$. Then, given any $\sigma$-algebra $\mathcal{G}\subseteq\mathcal{F}$, there exists a random variable (from $(\Omega,\mathcal{F},\Pr)$), denoted by $\mathbb{E}(X\mid\mathcal{G})$, such that:

  • (E1)

    $\mathbb{E}(X\mid\mathcal{G})$ is $\mathcal{G}$-measurable, and

  • (E2)

    $\mathbb{E}\left(\left|\mathbb{E}(X\mid\mathcal{G})\right|\right)<\infty$, and

  • (E3)

    for all $A\in\mathcal{G}$, we have $\int_A\mathbb{E}(X\mid\mathcal{G})\,\mathrm{d}\Pr=\int_A X\,\mathrm{d}\Pr$.

The random variable $\mathbb{E}(X\mid\mathcal{G})$ is called the conditional expectation of $X$ given $\mathcal{G}$. It is a.s. unique in the sense that if $Y$ is another random variable satisfying (E1)–(E3), then $\Pr(Y=\mathbb{E}(X\mid\mathcal{G}))=1$. We refer to (Williams, 1991a, Chapter 9) for details. Intuitively, $\mathbb{E}(X\mid\mathcal{G})$ is the expectation of $X$ given the information in $\mathcal{G}$.

Stochastic Processes

A (discrete-time) stochastic process is a sequence $\Gamma=\{X_n\}_{n\in\mathbb{N}_0}$ of random variables where the $X_n$'s are all from some probability space $(\Omega,\mathcal{F},\Pr)$. The process $\Gamma$ is adapted to a filtration $\{\mathcal{F}_n\}_{n\in\mathbb{N}_0}$ if for all $n\in\mathbb{N}_0$, $X_n$ is $\mathcal{F}_n$-measurable. Intuitively, the random variable $X_i$ models some value at the $i$-th step of the process.

Martingales and Supermartingales

A stochastic process $\Gamma=\{X_n\}_{n\in\mathbb{N}_0}$ adapted to a filtration $\{\mathcal{F}_n\}_{n\in\mathbb{N}_0}$ is a martingale (resp. supermartingale) if for every $n\in\mathbb{N}_0$, $\mathbb{E}(|X_n|)<\infty$ and it holds a.s. that $\mathbb{E}(X_{n+1}\mid\mathcal{F}_n)=X_n$ (resp. $\mathbb{E}(X_{n+1}\mid\mathcal{F}_n)\leq X_n$). We refer to (Williams, 1991a, Chapter 10) for a deeper treatment.

Intuitively, a martingale (resp. supermartingale) is a discrete-time stochastic process in which, for an observer who has seen the values of $X_0,\ldots,X_n$, the expected value at the next step, i.e. $\mathbb{E}(X_{n+1}\mid\mathcal{F}_n)$, is equal to (resp. no more than) the last observed value $X_n$. Also, note that in a martingale, the observed values of $X_0,\ldots,X_{n-1}$ do not matter, given that $\mathbb{E}(X_{n+1}\mid\mathcal{F}_n)=X_n$. In contrast, in a supermartingale, the only requirement is that $\mathbb{E}(X_{n+1}\mid\mathcal{F}_n)\leq X_n$ and hence $\mathbb{E}(X_{n+1}\mid\mathcal{F}_n)$ may depend on $X_0,\ldots,X_{n-1}$. Also, note that $\mathcal{F}_n$ might contain more information than just the observations of the $X_i$'s.

Stopping Times

Given a probability space $(\Omega,\mathcal{F},\Pr)$, a stopping time w.r.t. a filtration $\{\mathcal{F}_n\}_{n\in\mathbb{N}_0}$ is a random variable $T:\Omega\rightarrow\mathbb{N}_0\cup\{\infty\}$ such that for every $n\in\mathbb{N}_0$, the event $\{\omega\mid T(\omega)=n\}$ belongs to $\mathcal{F}_n$.

Example 2.1.

Consider an unbiased and discrete random walk, in which we start at a position $X_0$, and at each second walk one step to either the left or the right with equal probability. Let $X_n$ denote our position after $n$ seconds. It is easy to verify that $\mathbb{E}(X_{n+1}\mid X_0,\ldots,X_n)=\frac{1}{2}(X_n-1)+\frac{1}{2}(X_n+1)=X_n$. Hence, this random walk is a martingale. Note that by definition, every martingale is also a supermartingale. As another example, consider the classical gambler's ruin: a gambler starts with $Y_0$ dollars and bets continuously until he loses all of his money. If the bets are unfair, i.e. the expected value of his money after a bet is less than its expected value before the bet, then the sequence $\{Y_n\}_{n\in\mathbb{N}_0}$ is a supermartingale. In this case, $Y_n$ is the gambler's total money after $n$ bets. On the other hand, if the bets are fair, then $\{Y_n\}_{n\in\mathbb{N}_0}$ is a martingale.
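
The following minimal sketch (ours, for illustration only; the function name simulate_walk and all constants are illustrative) estimates $\mathbb{E}(X_n)$ for the unbiased random walk of Example 2.1 by Monte Carlo simulation; the martingale property predicts that the empirical mean stays at $X_0$ for every $n$.

```python
import random

def simulate_walk(x0, steps):
    """One trajectory of the unbiased +1/-1 random walk of Example 2.1."""
    x = x0
    for _ in range(steps):
        x += random.choice((-1, 1))
    return x

random.seed(0)
trials, steps, x0 = 100_000, 50, 10
mean = sum(simulate_walk(x0, steps) for _ in range(trials)) / trials
print(f"empirical E[X_{steps}] = {mean:.3f}  (martingale predicts {x0})")
```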

2.2. Termination and Concentration Bounds

In this work, we consider concentration bounds of the termination time of probabilistic programs and recurrence relations. Below, we illustrate the notions of termination time and concentration bound.

Termination Time

The termination-time random variable counts the number of steps a program takes until termination. In the setting of probabilistic programs, the termination-time random variable is defined as the first time the program hits its termination program counter, which is a stopping time. For convenience, we treat the termination-time random variable as the total accumulated cost until the program terminates, where each execution step of the program incurs a single unit of cost. In the setting of probabilistic recurrence relations, since the number of steps in a preprocessing stage is directly added to the termination time, we treat the number of steps in a preprocessing stage as the cost of the current execution step, and define the termination-time variable as the total accumulated cost until the recurrence relation runs into the empty stack (i.e., terminates). In both cases, we consider the termination-time random variable as the total accumulated cost until the stopping time of termination.

Concentration Bounds

A concentration bound for a random variable $C$ is an inequality of the form $\Pr(C\geq d)\leq f(d)$ or $\Pr(C\leq d)\leq f(d)$, which bounds the probability that the value of $C$ grows too large or too small by a function $f$ that tends to zero as $d\rightarrow\infty$ (for $\Pr(C\geq d)$) or $d\rightarrow-\infty$ (for $\Pr(C\leq d)$). Concentration bounds are an important probabilistic property of random variables, as they witness that a random variable has a substantially small probability of large deviation. When applied to termination time, a concentration bound specifies the following property: the probability that a randomized algorithm runs inefficiently, i.e. takes much longer than its expected termination time, is very small. Thus, compared with the expected termination time, concentration bounds provide a much finer characterization of the running time of randomized algorithms.

Problem Statement

In this work, we consider the following problems. For probabilistic programs, we consider unnested probabilistic while loops; our aim is to provide automated approaches for deriving much tighter exponentially-decreasing concentration bounds in comparison with the existing approach through the Azuma and Hoeffding inequalities (Chatterjee et al., 2018b). For probabilistic recurrence relations, our goal is to derive substantially finer bounds compared with the classical results of (Karp, 1994), and for QuickSort, we aim to derive bounds comparable to the optimal result of (McDiarmid and Hayward, 1996). Obtaining more precise (tighter) concentration bounds is important in many applications, e.g. resource-contention resolution (Kleinberg and Tardos, 2006, Chapter 13), or in resource-constrained environments such as embedded systems.

3. The General Method

In this section, we establish the mathematical foundation for the analysis of concentration bounds, based on which we will later develop novel sound approaches to derive concentration bounds for the termination time arising from probabilistic programs and recurrence relations. Since in certain situations, e.g. probabilistic recurrence relations, the termination time is defined as the total cost accumulated over the execution steps, we consider concentration bounds for a general non-negative random variable. First, we introduce the basic result through which we derive our concentration bounds. This lemma is similar to the basis of the Chernoff bound.

Lemma 3.1.

For any real number $d$, random variable $C$ and $\beta>1$, we have $\Pr(C\geq d)\leq\mathbb{E}(\beta^C)/\beta^d$.

Proof.

Since $\beta>1$, we have $\Pr(C\geq d)=\Pr(\beta^C\geq\beta^d)$. Then by applying Markov's inequality, we obtain the desired result. ∎
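
To make the lemma concrete, here is a small sketch (ours, not from the paper): for a geometric random variable $C$ with $\Pr(C=k)=2^{-k}$ for $k\geq 1$, we have the closed form $\mathbb{E}(\beta^C)=\beta/(2-\beta)$ for $1<\beta<2$, so Lemma 3.1 yields an exponentially-decreasing tail bound that can be compared with the exact tail $\Pr(C\geq d)=2^{-(d-1)}$.

```python
# Lemma 3.1 on a geometric random variable: Pr(C = k) = 2**-k for k >= 1.
beta = 1.5
E_beta_C = beta / (2 - beta)       # closed form of E[beta^C] for beta < 2
for d in (5, 10, 20):
    bound = E_beta_C / beta ** d   # Lemma 3.1: Pr(C >= d) <= E[beta^C] / beta^d
    exact = 2.0 ** (-(d - 1))      # exact tail of the geometric distribution
    print(f"d = {d:2d}: bound = {bound:.3e}, exact = {exact:.3e}")
```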

Lemma 3.1 provides a mathematical way to derive a concentration bound for any random variable $C$, but we still need an upper bound for $\mathbb{E}(\beta^C)$. In the setting of Chernoff bounds, one often evaluates $\mathbb{E}(\beta^C)$ directly, e.g. when the random variable $C$ is a finite sum of independent random variables. In our setting, the situation is more complicated, because we consider $C$ to be the total accumulated cost until a stopping time (if the cost accumulated at each step is equal to one, then the total accumulated cost is equal to the running time). We use supermartingales and the Optional Stopping Theorem (Williams, 1991a, Chapter 10.10) to derive an upper bound for $\mathbb{E}(\beta^C)$, as shown in the following proposition. In the proposition below, $\{C_n\}_{n\in\mathbb{N}_0}$ is a stochastic process where each $C_n$ represents the cost accumulated at the $n$-th step, so that $C=\sum_{n=0}^{T-1}C_n$, and $\{X_n\}_{n\in\mathbb{N}_0}$ is a key stochastic process for deriving an upper bound for $\mathbb{E}(\beta^C)$.

Proposition 3.2.

Consider two stochastic processes $\{X_n\}_{n\in\mathbb{N}_0}$ and $\{C_n\}_{n\in\mathbb{N}_0}$ adapted to a filtration $\{\mathcal{F}_n\}$ for which $X_0$ is a constant random variable, and a stopping time $T$ w.r.t. the filtration $\{\mathcal{F}_n\}$. Let $\alpha,\beta>1$ be real numbers. If it holds that

  • (C1): $T$ is a.s. finite, i.e., $\Pr(T<\infty)=1$, and

  • (C2): $X_T\geq K$ a.s. for some constant $K\leq 0$, and

  • (C3): $\beta^{C_n}\cdot\mathbb{E}(\alpha^{X_{n+1}}\mid\mathcal{F}_n)\leq\alpha^{X_n}$ a.s. for all $n\in\mathbb{N}_0$,

then we have $\mathbb{E}(\beta^C)\leq\alpha^{X_0-K}$, where $C:=\sum_{n=0}^{T-1}C_n$. Here, the existence and (probabilistic) uniqueness of the conditional expectation $\mathbb{E}(\alpha^{X_{n+1}}\mid\mathcal{F}_n)$ follow from its non-negativity (see (Agrawal et al., 2018, Proposition 3.1)).

Proof.

Define the stochastic process $\{Y_n\}_{n\geq 0}$ by $Y_n:=\alpha^{X_n}\cdot\beta^{\sum_{j=0}^{n-1}C_j}$. By definition, $Y_n>0$ for all $n$. Note that although $Y_n$ may be non-integrable (due to its exponential construction), its conditional expectation $\mathbb{E}(Y_n\mid\mathcal{F}_n)$ still exists, since $Y_n>0$ (Agrawal et al., 2018, Proposition 3.1). By condition (C3) and the "take out what is known" property of conditional expectation (see (Williams, 1991a, Chapter 9.7)), we have $\mathbb{E}(\alpha^{X_{n+1}}\cdot\beta^{\sum_{j=0}^{n}C_j}\mid\mathcal{F}_n)\leq\alpha^{X_n}\cdot\beta^{\sum_{j=0}^{n-1}C_j}$. (Although integrability may not be ensured, the proof on (Williams, 1991a, Page 90) covers the case of non-negative random variables.) Hence $\mathbb{E}(Y_{n+1}\mid\mathcal{F}_n)\leq Y_n$ a.s., which implies $\mathbb{E}(Y_{n+1})\leq\mathbb{E}(Y_n)$ by the basic properties of conditional expectation. It follows by an easy induction on $n$ that $\mathbb{E}(Y_n)\leq\mathbb{E}(Y_0)<\infty$ for all $n\geq 0$; thus each $Y_n$ is indeed integrable, the conditional expectations can be taken in the usual sense, and $\{Y_n\}_{n\in\mathbb{N}_0}$ is a non-negative supermartingale. Then, by applying the Optional Stopping Theorem (Williams, 1991a, Chapter 10.10) and using (C1), we obtain $\mathbb{E}(Y_T)\leq\mathbb{E}(Y_0)=\alpha^{X_0}$. Now, from condition (C2), we have $Y_T\geq\alpha^K\cdot\beta^C$ a.s. It follows that $\mathbb{E}(\beta^C)\leq\alpha^{X_0-K}$. ∎
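
As a sanity check of the construction used in this proof, the following sketch (ours, not part of the paper's development; the walk and the constants $\alpha=2$, $\beta=1.14$ are illustrative) evolves the exact distribution of a biased random walk with unit costs, stopped when the position drops below zero. Condition (C3) here reads $\beta\cdot(0.75/\alpha+0.25\cdot\alpha)\leq 1$, which holds for the chosen constants, so $\mathbb{E}(Y_n)$ must be non-increasing.

```python
# Exact evolution of E[Y_n] for Y_n = alpha^{X_n} * beta^{C_n}, where
# X_{n+1} = X_n - 1 w.p. 0.75 and X_n + 1 w.p. 0.25, C_n = n until the
# walk stops (X_n < 0) and the state is frozen afterwards.
alpha, beta, x0, horizon = 2.0, 1.14, 5, 30

dist = {(x0, 0): 1.0}  # (position, accumulated cost) -> probability
for n in range(horizon + 1):
    mean_Y = sum(p * alpha ** x * beta ** c for (x, c), p in dist.items())
    if n % 5 == 0:
        print(f"n = {n:2d}:  E[Y_n] = {mean_Y:.4f}")   # non-increasing
    nxt = {}
    for (x, c), p in dist.items():
        if x < 0:                                      # stopped: freeze
            nxt[(x, c)] = nxt.get((x, c), 0.0) + p
        else:                                          # one more step
            nxt[(x - 1, c + 1)] = nxt.get((x - 1, c + 1), 0.0) + 0.75 * p
            nxt[(x + 1, c + 1)] = nxt.get((x + 1, c + 1), 0.0) + 0.25 * p
    dist = nxt
```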

Proposition 3.2 provides a way to bound $\mathbb{E}(\beta^C)$ via a stochastic process $\{X_n\}_{n\geq 0}$ and an auxiliary $\alpha>1$, provided the conditions (C1)–(C3) are satisfied. By combining Lemma 3.1 and Proposition 3.2, we obtain the following main theorem of this section on concentration bounds for the total accumulated cost until a stopping time.

Theorem 3.3.

Let $\{X_n\}_{n\in\mathbb{N}_0}$ and $\{C_n\}_{n\in\mathbb{N}_0}$ be stochastic processes adapted to a filtration $\{\mathcal{F}_n\}$ for which $X_0$ is a constant random variable, and let $T$ be a stopping time with respect to the filtration $\{\mathcal{F}_n\}$. Let $\alpha,\beta>1$ be real numbers. If the conditions (C1)–(C3) (cf. Proposition 3.2) are fulfilled (by the $X_n$'s, $C_n$'s, $\alpha$, $\beta$ and $T$), then we have $\Pr(C\geq d)\leq\alpha^{X_0-K}\cdot\beta^{-d}$ for every $d\in\mathbb{R}$, where $C:=\sum_{n=0}^{T-1}C_n$.

Proof.

By Lemma 3.1, we have $\Pr(C\geq d)\leq\mathbb{E}(\beta^C)/\beta^d$. By Proposition 3.2, we have $\mathbb{E}(\beta^C)\leq\alpha^{X_0-K}$. Thus, we have $\Pr(C\geq d)\leq\alpha^{X_0-K}\cdot\beta^{-d}$. ∎

Remark 1 (Comparison with Classical Methods).

Compared with classical mathematical methods such as Chernoff bounds and Hoeffding's inequality, which only examine the expected value, variance or range of the related random variables, our method (Theorem 3.3) examines the detailed probability distribution by deriving a better bound for the moment generating function $\mathbb{E}(\beta^C)$. The derivation is through the construction of an exponential supermartingale (see Proposition 3.2) and depends on the detailed probability distributions in the considered problem. Since we examine the probability distributions in detail, our method can obtain much tighter concentration bounds than the inequalities of Azuma and Hoeffding (Chatterjee et al., 2018b) and Karp's classical method (Karp, 1994). We do not consider Chernoff bounds, as they are typically used for a finite sum of independent random variables, while our method focuses on concentration bounds for the total accumulated cost until a stopping time. We show how the method can be automated for several classical and widely-used probability distributions, such as the uniform and Bernoulli distributions.

4. Concentration Bounds for Probabilistic Programs

In this section, we apply our method (Theorem 3.3) to probabilistic programs. Probabilistic programs are imperative programs extended with random number generators. Here we consider the concentration bound of a single probabilistic while loop in which there are no nested loops. We first illustrate how to apply our method to a single probabilistic while loop. Then we develop automated algorithms for deriving concentration bounds. Finally, we present experimental results and compare our results with previous methods.

4.1. The General Method Applied to Probabilistic While Loops

We consider a probabilistic while loop of the form

(1)  $\textbf{while}~G~\textbf{do}~Q~\textbf{od}$

where $G$ is the loop guard and $Q$ is an imperative probabilistic program that may contain assignment statements with samplings, conditional branches and probabilistic branches, but no (nested) while loops. For a detailed treatment of the syntax, we refer to (Chatterjee et al., 2018b). Since we only consider single probabilistic while loops here, we choose to give a lightweight description of probabilistic programs.

Before we describe probabilistic programs, we first illustrate a simple example.

Example 4.1.

Consider the probabilistic while loop:

while ($x\geq 0$) do $x:=x+r$ od

In each loop iteration, a random sample $r$ is added to the variable $x$, where the distribution of $r$ is given by $\Pr(r=-1)=0.75$ and $\Pr(r=1)=0.25$. The value of $r$ is sampled independently in every iteration. The loop ends when the value of $x$ falls below zero.
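
Before deriving analytical bounds, one can estimate the tail of the termination time $T$ of this loop by direct simulation; the following sketch (ours, for illustration; run_loop is a hypothetical helper) produces empirical tail probabilities against which the bounds of Section 4.2 can be compared.

```python
import random

def run_loop(x):
    """Number of iterations of the loop of Example 4.1 until x < 0."""
    t = 0
    while x >= 0:
        x += -1 if random.random() < 0.75 else 1
        t += 1
    return t

random.seed(0)
x0, trials = 10, 100_000
samples = [run_loop(x0) for _ in range(trials)]
for n in (30, 60, 90):
    print(f"Pr(T >= {n}) ~ {sum(s >= n for s in samples) / trials:.5f}")
```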

Below we describe the behaviours of a probabilistic while loop. In the sequel, we fix two disjoint sets of variables: the set $V_{\mathrm{p}}$ of program variables and the set $V_{\mathrm{r}}$ of sampling variables. Informally, program variables are real-valued and are directly related to the control flow of a program, while sampling variables represent random inputs sampled from their predefined probability distributions.

Valuations. A valuation over a finite set $V$ of variables is a function $\nu:V\rightarrow\mathbb{R}$ that assigns a real value to each variable. We denote by $\mathit{Val}_V$ the set of all valuations over $V$. For simplicity, we treat each loop guard $G$ of a probabilistic while loop as a subset of $\mathit{Val}_{V_{\mathrm{p}}}$, so that a valuation satisfies the loop guard iff it falls in the designated subset $G$. Moreover, we may regard each element $\nu\in\mathit{Val}_{V_{\mathrm{p}}}$ (resp. $\mu\in\mathit{Val}_{V_{\mathrm{r}}}$) as a vector, with an implicit ordering between program variables (resp. sampling variables), such that $\nu[i]$ (resp. $\mu[i]$) represents the value of the $i$-th program variable (resp. sampling variable).

The Semantics. We present a lightweight semantics for probabilistic while loops. We describe the execution of a probabilistic while loop by an infinite sequence $\{\nu_n\}_{n\geq 0}$ of vectors of random variables, defined inductively as follows. Below we fix a probabilistic while loop of the form (1).

  • $\nu_0$ is a vector of constant random variables representing the initial input;

  • if $\nu_n\in G$ (i.e., $\nu_n$ satisfies the loop guard), then $\nu_{n+1}=F(\nu_n,\mu_n)$, where (i) the vector $\mu_n$ represents the values sampled for the sampling variables in the $(n+1)$-th loop iteration, and (ii) $F$ is the update function of the loop body: given the current valuation $\nu\in\mathit{Val}_{V_{\mathrm{p}}}$ of the program variables before the current loop iteration and the values $\mu\in\mathit{Val}_{V_{\mathrm{r}}}$ sampled in the current loop iteration, $F(\nu,\mu)$ is the updated vector of program variables after the execution of the current loop iteration;

  • if $\nu_n\not\in G$ (i.e., $\nu_n$ does not satisfy the loop guard), then $\nu_{n+1}=\nu_n$.

The inductive definition can be turned into a general state-space Markov chain by defining suitable kernel functions for the update function $F$; see (Meyn and Tweedie, 1993, Chapter 3).

In this paper, we consider the simplified setting in which the sampled values are discrete with bounded range. For this simplified setting, we use a sampling function $\Upsilon$ to describe the sampling, which assigns to every sampling variable $r\in V_{\mathrm{r}}$ a predefined discrete probability distribution over $\mathbb{R}$. Then, the joint discrete probability distribution $\overline{\Upsilon}$ over $\mathit{Val}_{V_{\mathrm{r}}}$ is defined as $\overline{\Upsilon}(\mu):=\prod_{r\in V_{\mathrm{r}}}\Upsilon(r)(\mu(r))$ for all valuations $\mu$ over the sampling variables.

Below we apply our method (Theorem 3.3) to probabilistic while loops, where we consider the cost $C_n$ of executing the $(n+1)$-th loop iteration to be equal to $1$, so that the total accumulated cost is equal to the number of loop iterations of the loop. Recall that $F$ is the update function for the loop body.

Theorem 4.2 (Concentration for Probabilistic Programs).

Let $P$ be a probabilistic while loop of the form (1) and $T$ the random variable representing the number of loop iterations of $P$ until termination. Suppose that (i) $T$ is a.s. finite (i.e., $P$ terminates with probability one) and (ii) there exist real numbers $\alpha,\beta>1$, $K\leq 0$ and a Borel measurable function $\eta:\mathit{Val}_{V_{\mathrm{p}}}\rightarrow\mathbb{R}$ such that

  • (A1): $\eta(\nu)\geq K$ for all $\nu\in G$;

  • (A2): $\eta(F(\nu,\mu))\geq K$ for all $\nu\in G$ and $\mu\in\mathit{Val}_{V_{\mathrm{r}}}$;

  • (A3): $\beta\cdot\sum_{\mu\in\mathit{Val}_{V_{\mathrm{r}}}}\overline{\Upsilon}(\mu)\cdot\alpha^{\eta(F(\nu,\mu))}\leq\alpha^{\eta(\nu)}$ for every $\nu\in G$.

Then for any initial valuation $\nu_0\in G$, we have $\Pr(T\geq n)\leq\alpha^{\eta(\nu_0)-K}\cdot\beta^{-n}$ for all $n\geq 0$.

Proof.

Choose $C_n=1$ for every $n\geq 0$. Then $T=C=\sum_{n=0}^{T-1}C_n$. Let $\nu_n$ be the vector of random variables representing the valuation of the program variables at the $n$-th step (where $\nu_0$ is the vector of constant random variables that represents the initial input). Define the filtration $\{\mathcal{F}_n\}_{n\geq 0}$ for which $\mathcal{F}_n$ is the smallest $\sigma$-algebra that makes all random variables in $\nu_0,\dots,\nu_n$ measurable. Define the stochastic process $X_n:=\eta(\nu_n)$ for $n\geq 0$. From the first two conditions, we have $X_n\geq K$ for all $n$; note that the second condition specifies that the value of the process is no less than $K$ at the point where the loop terminates. Then, from the third condition and the "role of independence" property on (Williams, 1991b, Page 90), condition (C3) holds. Thus, by Theorem 3.3, we have $\Pr(T\geq n)\leq\alpha^{X_0-K}/\beta^n$. ∎

4.2. A Synthesis Algorithm for Exponential Supermartingales

Reasoning about exponential terms is an inherent challenge for algorithmic analysis. Here we provide a practical synthesis algorithm for the exponential supermartingales of Theorem 4.2 through ranking-supermartingale (RSM) synthesis. Our heuristic is to treat the function $\eta$ as an RSM-map, which is the core object in existing RSM-synthesis algorithms. Then, based on the synthesized RSM-map $\eta$, we resolve $\beta$ and $\alpha$ through polynomial solvers such as the one in MATLAB.

The Loops We Consider. We consider loops in which every assignment is incremental, i.e., every assignment has the form $x:=x+e$ where $x$ is a program variable and $e$ is a linear expression consisting of constants and sampling variables. Although incremental assignments are of a limited form, they are expressive enough to implement random walks, gambler's ruin and many other examples from the literature. Since we synthesize linear RSMs, we also assume that the loop guard $G$ can be expressed as a polyhedron (i.e., a conjunction of linear inequalities). Also recall that we only handle samplings that observe discrete probability distributions with bounded range.

Before we illustrate the synthesis algorithm, we first recall the notion of RSM-maps, where we adopt a simplified version for a single probabilistic loop. Below, we fix an input probabilistic while loop $P$ of the form (1). Recall that $F$ is the update function.

Definition 4.3 (RSM-maps).

An RSM-map is a Borel measurable function $\eta:\mathit{Val}_{V_{\mathrm{p}}}\rightarrow\mathbb{R}$ for which there exist constants $\epsilon>0$ and $K,K'\leq 0$ such that

  • (B1): $\eta(\nu)\geq 0$ for all $\nu\in G$;

  • (B2): $K\leq\eta(F(\nu,\mu))\leq K'$ for all $\nu\in G$ and $\mu\in\mathit{Val}_{V_{\mathrm{r}}}$ with $F(\nu,\mu)\not\in G$;

  • (B3): $\sum_{\mu\in\mathit{Val}_{V_{\mathrm{r}}}}\overline{\Upsilon}(\mu)\cdot\eta(F(\nu,\mu))\leq\eta(\nu)-\epsilon$ for every $\nu\in G$.

Our algorithm fixes $\epsilon$ to be $1$ in the definition, as the function $\eta$ can be scaled by any factor. Moreover, in our algorithm, the constant $K$ corresponds to the same $K$ in (A1) and (A2). We do not use $K'$ in our algorithm.

The Synthesis Algorithm. The algorithm first synthesizes a linear RSM-map $\eta$ through existing algorithms (which are not the focus of this paper). Then the algorithm searches for a value of $\beta$ through binary search. Once a value of $\beta$ is chosen, we solve for the minimum value of $\alpha$ (so as to obtain the best bound from Theorem 3.3) through the MATLAB polynomial solver. A key point here is that, since we only consider incremental assignments and the RSM-map $\eta$ is linear, condition (A3) reduces to a polynomial inequality that only involves $\alpha$ once the value of $\beta$ is chosen. The algorithm thus proceeds by searching for larger and larger $\beta$ through binary search until a given precision is reached, and then finds the corresponding minimal $\alpha$. Finally, we apply Theorem 4.2 to obtain the concentration bound.
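
The following sketch (ours; the paper's prototype uses MATLAB's polynomial solver instead) instantiates this search on the loop of Example 4.1 with the RSM-map $\eta(x)=x$. For this loop, once $\beta$ is fixed, condition (A3) reduces to the polynomial inequality $0.25\beta\alpha^2-\alpha+0.75\beta\leq 0$ in $\alpha$, which admits a solution $\alpha>1$ iff the discriminant $1-0.75\beta^2$ is non-negative.

```python
import math

def min_alpha(beta):
    """Smallest alpha satisfying 0.25*beta*a^2 - a + 0.75*beta <= 0, or None."""
    disc = 1.0 - 0.75 * beta * beta
    if disc < 0:
        return None                     # (A3) infeasible for this beta
    return (1.0 - math.sqrt(disc)) / (0.5 * beta)

# Binary search for the largest feasible beta (the decay rate of the bound).
lo, hi = 1.0, 2.0
for _ in range(50):
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if min_alpha(mid) is not None else (lo, mid)
beta, alpha = lo, min_alpha(lo)
print(f"beta ~ {beta:.4f}, alpha ~ {alpha:.4f}")
# Theorem 4.2 then gives Pr(T >= n) <= alpha**(x0 + 1) * beta**(-n),
# since eta(x) = x stays >= K = -1 at loop exit.
```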

4.3. Prototype Implementation and Experimental Results

Implementation

We implemented the algorithm of Section 4.2 in MATLAB. All results were obtained on a machine with an Intel Core i7-8700 processor (3.2 GHz) and 16 GB of memory, running Microsoft Windows 10.

Benchmarks

We use benchmarks from (Chatterjee et al., 2018b; Ngo et al., 2018; Chatterjee et al., 2018a). Our benchmarks are as follows:

  • AdvRW1D: This program models a 1-dimensional adversarial random walk and is taken from (Chatterjee et al., 2018b).

  • rdwalk1, rdwalk2, rdwalk3: These programs model skewed 1-dimensional random walks, in which the probabilities of going to the left and to the right at each step are not equal. They are taken from (Ngo et al., 2018). In rdwalk1, there is a 0.75 probability of moving left and a 0.25 probability of moving right. In rdwalk2, the probabilities are 0.875 and 0.125, and in rdwalk3, they are 0.9375 and 0.0625.

  • prspeed: This example is also a random walk taken from (Ngo et al., 2018). The main difference is that the step length is not necessarily 1.

  • mini-Rou, mini-Rou2: These programs are taken from (Chatterjee et al., 2018a). They are implementations of the mini-roulette casino game as probabilistic programs. mini-Rou is the benchmark in (Chatterjee et al., 2018a), while mini-Rou2 is a variant where the probabilities of losing gambles are increased in order to reduce the runtime.

Table 1. Our Experimental Results over Probabilistic Programs.

Program   | α       | β       | κ      | Our bound for P(T>κ) | Bound of (Chatterjee et al., 2018b) for P(T>κ) | RSM (η)      | Our runtime (s)
AdvRW1D   | 1.1915  | 1.1003  | 6X_0   | 2.25×10^-1  | 8.16×10^-1  | 2.8571x      | 4.68
          |         |         | 10X_0  | 3.33×10^-2  | 5.12×10^-1  |              |
          |         |         | 14X_0  | 4.90×10^-3  | 3.07×10^-1  |              |
          |         |         | 18X_0  | 7.28×10^-4  | 1.81×10^-1  |              |
          |         |         | 22X_0  | 1.08×10^-4  | 1.06×10^-1  |              |
rdwalk1   | 1.314   | 1.1547  | 6X_0   | 9.07×10^-2  | 2.10×10^-1  | 2x           | 3.82
          |         |         | 10X_0  | 5.11×10^-3  | 2.06×10^-2  |              |
          |         |         | 14X_0  | 2.88×10^-4  | 1.82×10^-3  |              |
          |         |         | 18X_0  | 1.62×10^-6  | 1.56×10^-4  |              |
          |         |         | 22X_0  | 9.13×10^-7  | 1.31×10^-5  |              |
rdwalk2   | 2.072   | 1.511   | 6X_0   | 4.16×10^-4  | 7.92×10^-3  | (3/4)x       | 4.34
          |         |         | 10X_0  | 1.07×10^-7  | 3.41×10^-5  |              |
          |         |         | 14X_0  | 2.75×10^-11 | 1.32×10^-7  |              |
          |         |         | 18X_0  | 7.05×10^-15 | 4.97×10^-10 |              |
          |         |         | 22X_0  | 1.81×10^-18 | 1.84×10^-12 |              |
rdwalk3   | 3.265   | 2.065   | 6X_0   | 5.07×10^-7  | 7.79×10^-4  | 1.142x       | 5.74
          |         |         | 10X_0  | 2.54×10^-13 | 4.39×10^-7  |              |
          |         |         | 14X_0  | 1.27×10^-19 | 2.24×10^-10 |              |
          |         |         | 18X_0  | 6.35×10^-26 | 1.10×10^-13 |              |
          |         |         | 22X_0  | 3.18×10^-32 | 5.35×10^-17 |              |
mini-Rou  | 1.00124 | 1.00068 | 6X_0   | 1.00×10^-1  | 9.96×10^-1  | 13x-13       | 7.02
          |         |         | 10X_0  | 9.88×10^-1  | 9.99×10^-1  |              |
          |         |         | 14X_0  | 9.74×10^-1  | 1.00×10^-1  |              |
          |         |         | 18X_0  | 9.96×10^-1  | 9.99×10^-1  |              |
          |         |         | 22X_0  | 9.48×10^-1  | 9.98×10^-1  |              |
mini-Rou2 | 1.607   | 1.309   | 6X_0   | 3.90×10^-3  | 1.44×10^-1  | 0.323x-0.323 | 6.68
          |         |         | 10X_0  | 1.77×10^-5  | 3.26×10^-2  |              |
          |         |         | 14X_0  | 8.11×10^-8  | 7.30×10^-3  |              |
          |         |         | 18X_0  | 3.71×10^-10 | 1.60×10^-3  |              |
          |         |         | 22X_0  | 1.69×10^-12 | 3.69×10^-4  |              |
prspeed   | 2.0479  | 1.3048  | 6X_0   | 4.90×10^-1  | 2.52×10^-2  | 1714-1.714x  | 4.58
          |         |         | 10X_0  | 2.40×10^-4  | 6.00×10^-3  |              |
          |         |         | 14X_0  | 1.17×10^-5  | 1.37×10^-3  |              |
          |         |         | 18X_0  | 5.71×10^-8  | 3.07×10^-4  |              |
          |         |         | 22X_0  | 2.79×10^-10 | 6.84×10^-5  |              |
Results and Discussion

Our experimental results are presented in Table 1. We compare our concentration bounds with those obtained using Azuma's inequality and Hoeffding's inequality in (Chatterjee et al., 2018b). As shown in Table 1, our approach successfully obtains tighter bounds in every case. Moreover, note that our approach is very efficient and can obtain these bounds in a few seconds.

5. Concentration Bounds for Probabilistic Recurrences

5.1. Problem Setting and Examples

PRRs

In this section, we consider general Probabilistic Recurrence Relations (PRRs) as defined in (Karp, 1994). A PRR is a relation of the following form:

$\forall n>1,~~\mathcal{T}(n)=a(n)+\sum_{i=1}^{\ell}\mathcal{T}(h_i(n)),\qquad\mathcal{T}(1)=0,$

in which $a(n)$ is a non-negative value and each $h_i(n)$ is a non-negative random variable such that we always have $\sum_{i=1}^{\ell}h_i(n)\leq n-1$. Intuitively, we think of $\mathcal{T}(n)$ as the time it takes for a randomized divide-and-conquer algorithm to solve an instance of size $n$. The algorithm first performs a preprocessing procedure, which leads to $\ell$ smaller instances of random sizes $h_1(n),\ldots,h_\ell(n)$. It then solves the smaller instances recursively and merges the solutions in a postprocessing phase. We assume that the preprocessing and postprocessing on an instance of size $n$ take $a(n)$ units of time overall. Our goal is to obtain upper bounds on the tail probability $\Pr\left[\mathcal{T}(n^*)\geq\kappa\right]$, where $n^*$ is the size of our initial input.

Example 5.1 (QuickSort (Hoare, 1961a)).

One of the most well-known sorting algorithms is QuickSort. Given an array with $n$ distinct elements, QuickSort first chooses an element $p$ of the array uniformly at random. It then compares all the other $n-1$ elements of the array with $p$ and divides them into two parts: (i) elements that are smaller than $p$, and (ii) elements that are larger than $p$. Finally, it recursively sorts each part. Let $\mathcal{T}(n)$ be the total number of comparisons made by QuickSort in handling an array of size $n$. Assuming that there are $h_1(n)$ elements in part (i) and $h_2(n)$ elements in part (ii) above, we have:

$\mathcal{T}(n)=n-1+\mathcal{T}(h_1(n))+\mathcal{T}(h_2(n)).$

Moreover, it is easy to see that $h_1(n)+h_2(n)=n-1$.

Example 5.2 (QuickSelect (Hoare, 1961b)).

Consider the problem of finding the $d$-th smallest element in an unordered array of $n$ distinct items. A classical algorithm for solving this problem is QuickSelect. Much like QuickSort, QuickSelect begins by choosing an element $p$ of the array uniformly at random. It then compares all the other $n-1$ elements of the array with $p$ and divides them into two parts: (i) those that are smaller than $p$, and (ii) those that are larger. Suppose that there are $d'$ elements in part (i). If $d'<d-1$, then the algorithm recursively searches for the $(d-d'-1)$-th smallest element of part (ii). If $d'=d-1$, the algorithm terminates by returning $p$ as the desired answer. Finally, if $d'>d-1$, the algorithm recursively finds the $d$-th smallest element in part (i). Note that the classical median selection algorithm is a special case of QuickSelect. While more involved linear-time non-randomized algorithms exist for the same problem, QuickSelect provides a simple randomized variant. In this case, we have the following PRR:

$\mathcal{T}(n)=n-1+\mathcal{T}(h(n)).$

Here, $\mathcal{T}(n)$ is the number of comparisons performed by QuickSelect over an input of size $n$, and $h(n)$ is a random variable that captures the size of the remaining array that has to be searched recursively.

5.2. Modeling and Theoretical Results

The Markov Chain of a PRR

In order to apply our general approach to PRRs, we first need to embed them in stochastic processes. Suppose we are given a PRR of the following form:

(2)  $\forall n>1,~~\mathcal{T}(n)=a(n)+\sum_{i=1}^{\ell}\mathcal{T}(h_i(n)),\qquad\mathcal{T}(1)=0.$

Moreover, assume that we are interested in $\mathcal{T}(n^*)$ for a specific initial value $n^*$. We model this PRR as a Markov chain $(X_m)_{m\geq 0}$ in which each state $X_i$ consists of a non-negative integer $k_i$, and a stack of $k_i$ additional non-negative integers. Formally, for every $i$, we have $X_i=(k_i,\langle n_1^{(i)},n_2^{(i)},\ldots,n_{k_i}^{(i)}\rangle)$. Intuitively, each $X_i$ models a state of our probabilistic divide-and-conquer algorithm in which there are $k_i$ more recursive calls waiting to be executed, and the $j$-th call needs to process an instance of size $n_j^{(i)}$. Following this intuition, the transitions of $(X_m)_{m\geq 0}$ are defined as follows:

$X_m=\begin{cases}(1,\langle n^*\rangle)&m=0\\(k_{m-1}-1,\langle n_2^{(m-1)},\ldots,n_{k_{m-1}}^{(m-1)}\rangle)&m>0\land n_1^{(m-1)}=1\\(k_{m-1}+\ell-1,\langle h_1(n_1^{(m-1)}),\ldots,h_\ell(n_1^{(m-1)}),n_2^{(m-1)},\ldots,n_{k_{m-1}}^{(m-1)}\rangle)&m>0\land n_1^{(m-1)}>1\end{cases}$

Basically, we start with the state $(1,\langle n^*\rangle)$: in the beginning, we have to process an instance of size $n^*$. At each step in the Markov chain, if $X_{m-1}$ contains a call to process an instance of size $1$, we can simply remove that call from the stack when transitioning to $X_m$, because we assumed that $\mathcal{T}(1)=0$. Otherwise, we have to perform a call on an instance of size $n_1^{(m-1)}$, which by definition leads to further recursive calls on instances of sizes $h_1(n_1^{(m-1)}),\ldots,h_\ell(n_1^{(m-1)})$.

Stopping Time and Costs

Let $\tau$ be the stopping time of $(X_m)_{m\geq 0}$. Formally, $\tau:=\min\{i\mid k_i=0\}$. We further define $(C_m)_{m\geq 0}$ to model the total cost of execution until reaching $X_m$:

$C_m=\begin{cases}0&m=0\\C_{m-1}&m>0\land n_1^{(m-1)}=1\\C_{m-1}+a(n_1^{(m-1)})&m>0\land n_1^{(m-1)}>1\end{cases}$

Following the definitions, it is easy to see that $C_\tau$ is the total time cost of running the divide-and-conquer algorithm on an instance of size $n^*$.
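
The following sketch (ours, for illustration) simulates this Markov chain for the QuickSelect PRR of Example 5.2, using the conservative choice $h(n)=\max(d_p-1,\,n-d_p)$ for a uniformly random pivot rank $d_p$ (the same over-approximation analyzed in Section 5.3.1), and estimates the distribution of $C_\tau$ by Monte Carlo.

```python
import random

def run_prr(n_star):
    """One run of the chain above for QuickSelect; returns C_tau."""
    stack, cost = [n_star], 0
    while stack:
        n = stack.pop()
        if n <= 1:
            continue                       # T(1) = 0: just pop the call
        cost += n - 1                      # a(n) = n - 1 comparisons
        dp = random.randint(1, n)          # uniformly random pivot rank
        stack.append(max(dp - 1, n - dp))  # conservative remaining part
    return cost

random.seed(0)
n_star, trials = 200, 20_000
samples = [run_prr(n_star) for _ in range(trials)]
print(f"mean C_tau ~ {sum(samples) / trials:.1f}")
print(f"Pr[C_tau >= 12 * n*] ~ {sum(s >= 12 * n_star for s in samples) / trials}")
```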

The Exponential Supermartingale of a PRR

Suppose that we are given a function $f:\mathbb{N}\rightarrow[0,\infty)$ such that $f(n)\geq\mathbb{E}[\mathcal{T}(n)]$ for all $n\in\mathbb{N}$. In other words, $f(n)$ is an upper bound for the expected runtime of our divide-and-conquer algorithm on an instance of size $n$. Finding such upper bounds is a well-studied problem and can be automated using approaches such as (Chatterjee et al., 2017). We define a new stochastic process $(Y_m)_{m\geq 0}$ as follows:

$Y_m:=\sum_{j=1}^{k_m}f(n_j^{(m)}).$

Moreover, let $\alpha:=\alpha(n^*)>1$ be a constant that depends on the initial input size $n^*$, and define a new stochastic process $(Z_m)_{m\geq 0}$ by $Z_m:=\alpha^{C_m+Y_m}$. Note that if $Z_m$ is a supermartingale, then we can obtain a concentration bound for $\mathcal{T}(n^*)$. More formally, we have the following lemma:

Lemma 5.3.

Let $n^*$ be the size of the initial input instance for the recurrence relation in (2) and $f:\mathbb{N}\rightarrow[0,\infty)$ an upper bound on the expected value of $\mathcal{T}$. If for some $\alpha>1$,

(3)  $\alpha^{f(n)}\geq\alpha^{a(n)}\cdot\mathbb{E}\left[\alpha^{\sum_{j=1}^{\ell}f(h_j(n))}\right]$

for all $0\leq n\leq n^*$, then $\Pr\left[\mathcal{T}(n^*)\geq\kappa\right]\leq\alpha^{f(n^*)-\kappa}$ for all $\kappa\geq f(n^*)$.

Proof.

Let $(C_m)$, $(Y_m)$, and $(Z_m)$ be defined as above. Equation (3) guarantees that $(Z_m)$ is a supermartingale. Applying Theorem 3.3 to $(C_m+Y_m)$ with $\beta:=\alpha$, we obtain $\Pr\left[\mathcal{T}(n^*)\geq f(n^*)+\kappa'\right]\leq\alpha^{-\kappa'}$ for all $\kappa'\geq 0$. Substituting $\kappa=f(n^*)+\kappa'$ into the latter inequality leads to the desired concentration bound. ∎

5.3. Case Studies

Based on the lemma above, we can now derive concentration bounds for a PRR by synthesizing a suitable α\alpha. We now demonstrate the process with a few examples. In the next section, we will show how this process can be automated.

5.3.1. Obtaining Concentration Bounds for QuickSelect

Consider the PRR in Example 5.2. We are interested in the tail probability $\Pr[\mathcal{T}(n^*)\geq 12\cdot n^*]$ and wish to synthesize an upper bound for this probability. Suppose that the given function $f$ is $f(n):=5\cdot n$; in other words, we know that $\mathbb{E}\left[\mathcal{T}(n)\right]\leq 5\cdot n$. We apply Lemma 5.3 to obtain sufficient conditions on $\alpha$:

$\forall 1\leq n\leq n^*,~~\alpha^{5n}\geq\alpha^{n-1}\cdot\frac{1}{n}\cdot\left(\sum_{i=\lceil n/2\rceil}^{n-1}\alpha^{5i}+\sum_{i=\lfloor n/2\rfloor}^{n-1}\alpha^{5i}\right)$

By simply computing the value of the geometric series, we get:

$\forall 1\leq n\leq n^*,~~\alpha^{4n+1}\geq\frac{1}{n}\cdot\left(\frac{\alpha^{5n}-\alpha^{5\lceil n/2\rceil}+\alpha^{5n}-\alpha^{5\lfloor n/2\rfloor}}{\alpha^5-1}\right)$

Since $\alpha^5-1\geq 5\ln\alpha$ and $\alpha^{5\lceil n/2\rceil}+\alpha^{5\lfloor n/2\rfloor}\geq 2\alpha^{2.5n}$, we strengthen the formula to obtain:

$\forall 1\leq n\leq n^*,~~\alpha^{4n}\geq\frac{2}{n}\cdot\left(\frac{\alpha^{5n}-\alpha^{2.5n}}{5\ln\alpha}\right)$

Let $c:=\alpha^n$. We can rewrite the inequality above as:

$\forall 1\leq n\leq n^*,~~5c^4\ln c\geq 2\cdot(c^5-c^{2.5})$

By basic calculus, we can further prove that $5c^4\ln c\geq 2\cdot(c^5-c^{2.5})$ holds for $c\in[1,2.74]$. Recall that $c=\alpha^n$. Since for every $\alpha\geq 1$, $\alpha^n$ increases as $n$ increases, our constraint becomes $1\leq\alpha$ and $\alpha^{n^*}\leq 2.74$, so one possible solution is $\alpha=(2.74)^{1/n^*}$. Plugging this value back into Lemma 5.3, we have $\Pr[\mathcal{T}(n^*)\geq 12\cdot n^*]\leq(2.74)^{-7}<0.0009$.
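
The two numeric facts used above are easy to check mechanically; the following sketch (ours) evaluates $5c^4\ln c-2\cdot(c^5-c^{2.5})$ on a fine grid over $[1,2.74]$ and computes the resulting bound $(2.74)^{-7}$.

```python
import math

# Grid check of 5 c^4 ln(c) - 2 (c^5 - c^2.5) >= 0 on [1, 2.74].
worst = min(5 * c ** 4 * math.log(c) - 2 * (c ** 5 - c ** 2.5)
            for c in (1 + i * 1.74 / 10_000 for i in range(10_001)))
print(f"min over [1, 2.74]: {worst:.6f}")          # non-negative
print(f"(2.74)^-7 = {2.74 ** -7:.2e}")             # < 0.0009
```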

Remark 2 (Comparison with (Karp, 1994)).

As shown above, our approach is able to synthesize the concentration bound

$\Pr[\mathcal{T}(n^*)\geq 12\cdot n^*]\leq(2.74)^{-7}<0.0009$

for the PRR corresponding to QuickSelect. In contrast, (Karp, 1994) obtains the following concentration bound:

$\Pr[\mathcal{T}(n^*)\geq 12\cdot n^*]\leq\left(\frac{3}{4}\right)^{8}\approx 0.1001.$

Hence, our bound is better by a factor of more than $100$.

The advantage of our approach is not limited to the specific choice of $12\cdot n^*$ in the tail probability. See Section 5.5 for concentration bounds for other tail probabilities. We now show how a more general result and a tighter bound can be obtained.

Suppose that we aim to find an upper bound for the tail probability $\Pr[\mathcal{T}(n^*)\geq r\cdot n^*]$ for an arbitrary constant $r\geq 24$. Let $q>e^2$ be a real number and consider the function $f_q(n):=q\cdot n$. Using a similar calculation as above and defining $c:=\alpha^n$, we obtain:

$\forall 1\leq n\leq n^*,~~c^{q/2-1}\cdot q\ln c-2c^{q/2}+2\geq 0$

Since $q>e^2$, the inequality $c^{q/2-1}\cdot q\ln c-2c^{q/2}+2\geq 0$ holds for $c\in[1,q]$, so it suffices to find $\alpha$ such that $\alpha^{n^*}\leq q$. We choose $\alpha=q^{1/n^*}$. Plugging this back into Lemma 5.3 leads to:

$\Pr[\mathcal{T}(n^*)\geq r\cdot n^*]\leq q^{q-r}$

Specifically, by letting $q=r/\ln r$, we get

$\Pr[\mathcal{T}(n^*)\geq r\cdot n^*]\leq\left(\frac{r}{\ln r}\right)^{\frac{r}{\ln r}-r}.$
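
For concrete values of $r$, this bound is easy to evaluate; the following small helper (ours, illustrative) reproduces, for $r=24$, the number quoted in Remark 3 below.

```python
import math

def bound(r):
    """(r / ln r)^(r / ln r - r), the general QuickSelect tail bound above."""
    q = r / math.log(r)
    return q ** (q - r)

for r in (24, 32, 48):
    print(f"r = {r}: Pr[T(n*) >= {r} n*] <= {bound(r):.3e}")
```
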
Remark 3 (Comparison with (Karp, 1994)).

If we plug $r=24$ into the general result above, our approach is able to synthesize the concentration bound

$\Pr[\mathcal{T}(n^*)\geq 24\cdot n^*]\leq\left(\frac{24}{\ln 24}\right)^{\frac{24}{\ln 24}-24}<3.612\times 10^{-15}$

for the PRR corresponding to QuickSelect. In contrast, (Karp, 1994) obtains the following concentration bound:

$\Pr[\mathcal{T}(n^*)\geq 24\cdot n^*]\leq\left(\frac{3}{4}\right)^{20}\approx 0.00318.$

Hence, our bound is better by a factor of more than $8.8\times 10^{11}$.

5.3.2. Obtaining Concentration Bounds for QuickSort

Consider the PRR in Example 5.1. Our goal is to synthesize an upper bound for the tail probability $\Pr[\mathcal{T}(n^*)\geq 11\cdot n^*\ln n^*+12\cdot n^*]$. Given $f(n):=9\cdot n\ln n$, we apply Lemma 5.3 and obtain the following conditions for $\alpha$ (we assume $0\ln 0:=0$):

$\forall 1\leq n\leq n^*,~~\alpha^{9n\ln n}\geq\alpha^{n-1}\cdot\frac{1}{n}\cdot\sum_{i=0}^{n-1}\alpha^{f(i)+f(n-1-i)}$

For $1\leq n\leq 8$, we can manually verify that it suffices to set $\alpha>1$. In the sequel, we assume $n\geq 8$. Note that $i\ln i+(n-i-1)\ln(n-i-1)$ is monotonically decreasing on $[1,\lfloor n/2\rfloor]$ and monotonically increasing on $[\lceil n/2\rceil,n]$. We partition the summation above uniformly into eight parts and use the maximum of each part to overapproximate the sum. This leads to the following upper bound for the summation:

$\sum_{i=0}^{n-1}\alpha^{f(i)+f(n-1-i)}\leq\sum_{j=0}^{7}\sum_{i=\lceil jn/8\rceil}^{\lfloor(j+1)n/8\rfloor}\alpha^{f(i)+f(n-1-i)}\leq\frac{n}{4}\cdot\left(\alpha^{9n\ln n}+\alpha^{9(\frac{n}{8}\ln\frac{n}{8}+\frac{7n}{8}\ln\frac{7n}{8})}+\alpha^{9(\frac{n}{4}\ln\frac{n}{4}+\frac{3n}{4}\ln\frac{3n}{4})}+\alpha^{9(\frac{5n}{8}\ln\frac{5n}{8}+\frac{3n}{8}\ln\frac{3n}{8})}\right)$

Plugging this overapproximation back into the original inequality, we get:

α9nlnnαn114(α9nlnn+α9(n8lnn8+7n8ln7n8)+α9(n4lnn4+3n4ln3n4)+α9(5n8ln5n8+3n8ln3n8))\alpha^{9\cdot n\cdot\ln n}\geq\alpha^{n-1}\cdot\frac{1}{4}\cdot\left(\alpha^{9\cdot n\cdot\ln n}+\alpha^{9\cdot(\frac{n}{8}\cdot\ln\frac{n}{8}+\frac{7\cdot n}{8}\cdot\ln\frac{7\cdot n}{8})}+\alpha^{9\cdot(\frac{n}{4}\cdot\ln\frac{n}{4}+\frac{3\cdot n}{4}\ln\frac{3\cdot n}{4})}+\alpha^{9\cdot(\frac{5\cdot n}{8}\ln\frac{5\cdot n}{8}+\frac{3\cdot n}{8}\ln\frac{3\cdot n}{8})}\right)

for all 8nn8\leq n\leq n^{*}. We define c:=αnc:=\alpha^{n} to do substitution, and we use the following formula to do strengthening:

αβlnβ+(nβ)ln(nβ)\displaystyle\alpha^{\beta\cdot\ln\beta+(n-\beta)\cdot\ln(n-\beta)} αβlnn+(nβ)ln(nβ)\displaystyle\leq\alpha^{\beta\cdot\ln n+(n-\beta)\cdot\ln(n-\beta)}
=αnlnnα(nβ)lnn+(nβ)ln(nβ)\displaystyle=\alpha^{n\cdot\ln n}\cdot\alpha^{-(n-\beta)\cdot\ln n+(n-\beta)\cdot\ln(n-\beta)}
=αnlnn+(nβ)lnnβn=cc(nβ)nlnnlnnβn\displaystyle=\alpha^{n\cdot\ln n+(n-\beta)\cdot\ln\frac{n-\beta}{n}}=c\cdot c^{\frac{(n-\beta)}{n\cdot\ln n}\cdot\ln\frac{n-\beta}{n}}

Instantiating $\beta=\frac{n}{8},\frac{n}{4},\frac{3n}{8}$ respectively (and applying the estimate to the exponents scaled by $9$), we obtain:

$$\forall 8\leq n\leq n^*,\quad c^{9\ln n}\geq\frac{c}{n}\cdot\frac{n}{4}\cdot\left(c^{9\ln n}+c^{9\ln n+\frac{63}{8}\ln\frac{7}{8}}+c^{9\ln n+\frac{27}{4}\ln\frac{3}{4}}+c^{9\ln n+\frac{45}{8}\ln\frac{5}{8}}\right)$$

$$\forall 8\leq n\leq n^*,\quad 4-\left(c+c^{1-\frac{63}{8}\ln\frac{8}{7}}+c^{1-\frac{27}{4}\ln\frac{4}{3}}+c^{1-\frac{45}{8}\ln\frac{8}{5}}\right)\geq 0$$

Now we study the following function:

$$\psi(c)=4-\left(c+c^{1-\frac{63}{8}\ln\frac{8}{7}}+c^{1-\frac{27}{4}\ln\frac{4}{3}}+c^{1-\frac{45}{8}\ln\frac{8}{5}}\right).$$

By basic calculus, we can prove that $\psi(c)\geq 0$ holds on $[1,2.3]$. Additionally, since for every $\alpha>1$, $\alpha^n$ is increasing in $n$, plugging $\alpha=2.3^{1/n^*}$ into Lemma 5.3 yields:

$$\Pr\left[\mathcal{T}(n^*)\geq 11\cdot n^*\cdot\ln n^*+12\cdot n^*\right]\leq 2.3^{-2\ln n^*-12}.$$
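
The claim that $\psi(c)\geq 0$ on $[1,2.3]$ can also be checked numerically; the following snippet (our own sanity check, not part of the synthesis algorithm) evaluates $\psi$ on a fine grid:

```python
import math

def psi(c):
    return 4 - (c
                + c ** (1 - 63 / 8 * math.log(8 / 7))
                + c ** (1 - 27 / 4 * math.log(4 / 3))
                + c ** (1 - 45 / 8 * math.log(8 / 5)))

grid = [1 + i * 1.3 / 10000 for i in range(10001)]   # 10001 points in [1, 2.3]
assert all(psi(c) >= 0 for c in grid)
print(min(psi(c) for c in grid))  # minimum is 0, attained at c = 1
```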
Remark 4 (Comparison with (Karp, 1994)).

As shown above, our approach synthesizes a concentration bound of the form

$$\Pr\left[\mathcal{T}(n^*)\geq 11\cdot n^*\cdot\ln n^*+12\cdot n^*\right]\leq 2.3^{-2\ln n^*-12}$$

for the PRR corresponding to QuickSort. In contrast, (Karp, 1994) provides the following concentration bound:

$$\Pr\left[\mathcal{T}(n^*)\geq 10\cdot n^*\cdot\ln n^*\right]\leq e^{-4}.$$

Note that the latter bound is a constant, while our bound improves as $n^*$ grows. More concretely, as $n^*$ grows, our bound becomes exponentially better than the one obtained by (Karp, 1994).

Just as in the previous case, the advantage of our approach is not limited to bounding $\Pr[\mathcal{T}(n^*)\geq 11\cdot n^*\cdot\ln n^*+12\cdot n^*]$. A similar argument can be applied to obtain similar exponentially-decreasing bounds for $\Pr[\mathcal{T}(n^*)\geq a_1\cdot n^*\cdot\ln n^*+a_2\cdot n^*]$ with other values of $a_1$ and $a_2$. Moreover, our bounds improve as $a_2$ increases. This is in contrast to (Karp, 1994), which can only bound $\Pr[\mathcal{T}(n^*)\geq a\cdot n^*\cdot\ln n^*]$. Hence, not only do we beat the bounds of (Karp, 1994) numerically, but we also provide a more fine-grained family of concentration bounds.

Remark 5 (Comparison with (Tassarotti, 2017)).

Recall that our approach synthesized the bound

$$\Pr\left[\mathcal{T}(n^*)\geq 11\cdot n^*\cdot\ln n^*+12\cdot n^*\right]\leq 2.3^{-2\ln n^*-12}$$

for the PRR corresponding to QuickSort. The work of (Tassarotti, 2017) improves Karp's cookbook method and provides the following bound:

$$\Pr\left[\mathcal{T}(n^*)\geq 11\cdot n^*\cdot\ln n^*+12\cdot n^*\right]\leq(8/7)^{-(2-\ln(8/7))\cdot\ln n^*-11}$$

Note that our bound beats theirs in both the base and the exponent, although both bounds are asymptotically equal to $\exp(-\Theta(\ln n^*))$.
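
To illustrate the gap concretely, the following snippet (an illustration we add; the formulas are exactly the two bounds above) evaluates both bounds for a few values of $n^*$:

```python
import math

for n_star in (10**3, 10**6, 10**9):
    ln_n = math.log(n_star)
    ours = 2.3 ** (-2 * ln_n - 12)
    theirs = (8 / 7) ** (-(2 - math.log(8 / 7)) * ln_n - 11)
    print(f"n* = 1e{round(math.log10(n_star))}: ours = {ours:.2e}, "
          f"(Tassarotti, 2017) = {theirs:.2e}")
```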

Remark 6 (Comparison with (McDiarmid and Hayward, 1996)).

While our approach synthesized the bound

$$\Pr\left[\mathcal{T}(n^*)\geq 11\cdot n^*\cdot\ln n^*+12\cdot n^*\right]\leq 2.3^{-2\ln n^*-12}$$

for the PRR corresponding to QuickSort, the work of (McDiarmid and Hayward, 1996) provides an asymptotically optimal bound of the following form:

$$\Pr\left[\mathcal{T}(n^*)\geq 11\cdot n^*\cdot\ln n^*+12\cdot n^*\right]\leq e^{-c\cdot\ln n^*\cdot\ln\ln n^*}$$

where $c$ is a constant. Our bound is comparable to, but slightly worse than, the optimal result of (McDiarmid and Hayward, 1996), by only a $\ln\ln n^*$ factor in the exponent. However, their method only works for QuickSort, while our method applies to a wide range of algorithms, such as QuickSelect, QuickSort, and diameter computation. Furthermore, their method is completely manual, while our approach can be automated once the monotonic intervals of $f(i)+f(n-1-i)$ are obtained; we leave this automation as future work.

5.4. Automated Algorithm

In this section, we consider the problem of automating our PRR analysis. We provide a sound, polynomial-time algorithm that can automatically synthesize concentration bounds which beat previous methods over several classical benchmarks.

The Setup

We consider PRRs of the following form:

(4) $$\forall n>0,\quad\mathcal{T}(n)=a(n)+\mathcal{T}(h(n))=:\mathfrak{t},\qquad\mathcal{T}(1)=0,$$

in which $\mathfrak{t}$ is an expression generated by the following grammar:

$$\mathfrak{t}::=c\ \mid\ n\ \mid\ \ln n\ \mid\ n\cdot\ln n\ \mid\ \mathfrak{t}+\mathfrak{t}\ \mid\ c\cdot\mathfrak{t}\ \mid\ \frac{1}{n}\cdot\sum_{i=0}^{n-1}\mathcal{T}(i)\ \mid\ \frac{1}{n}\cdot\left(\sum_{i=\lfloor n/2\rfloor}^{n-1}\mathcal{T}(i)+\sum_{i=\lceil n/2\rceil}^{n-1}\mathcal{T}(i)\right)$$

where $c$ is a real constant. We also assume that no sum in $\mathfrak{t}$ appears with a negative coefficient, and that the coefficient of the most significant non-sum term in $\mathfrak{t}$ (if one exists) is positive. We focus on the problem of synthesizing upper bounds for the tail probability $\Pr[\mathcal{T}(n^*)\geq\kappa]$. We also assume that the input to our algorithm contains a function $f:\mathbb{N}\rightarrow[0,\infty)$ that serves as an upper bound on the expected runtime, i.e. $f(n)\geq\mathbb{E}[\mathcal{T}(n)]$. There are well-known approaches for automatically deriving such functions, e.g. see (Chatterjee et al., 2017). To enable algorithmic methods, we further assume that $\kappa$ and $f(n)$ are generated by the following grammar, in which $c$ denotes a real constant:

(5) $$E::=c\ \mid\ E+E\ \mid\ c\cdot E\ \mid\ n\cdot\ln n\ \mid\ \ln n\ \mid\ n$$

We also assume that the coefficient of the most significant term in $\kappa$ and $f$ is positive. The goal of our algorithm is to synthesize an $\alpha>1$ that satisfies

(6) $$\forall 1\leq n\leq n^*,\quad\alpha^{f(n)}\geq\alpha^{a(n)}\cdot\mathbb{E}\left[\alpha^{f(h(n))}\right].$$

Such an $\alpha$ directly leads to a concentration bound as in Lemma 5.3. Our algorithm relies on the following simple lemma:

Lemma 5.4.

For any monotonically increasing function $f$ defined on the interval $[l,r+1]$, where $l,r\in\mathbb{N}$, we have:

$$\sum_{i=l}^{r}f(i)\leq\int_{l}^{r+1}f(x)\,\text{d}x.$$
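
For instance, the following quick check (ours) instantiates Lemma 5.4 with the increasing function $f(x)=\alpha^{5x}$ that appears in Example 5.6 below:

```python
import math

alpha, l, r = 1.01, 10, 20
lhs = sum(alpha ** (5 * i) for i in range(l, r + 1))
# Closed form of the integral of alpha^(5x) from l to r+1:
rhs = (alpha ** (5 * (r + 1)) - alpha ** (5 * l)) / (5 * math.log(alpha))
assert lhs <= rhs
print(lhs, rhs)  # ~23.49 <= ~24.08
```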
Overview of the Algorithm

Our algorithm consists of five steps:

  1. Step 1. The algorithm symbolically computes Inequality (6).

  2. Step 2. The algorithm replaces every summation in (6) with an over-approximation, hence strengthening the requirements. The algorithm has two ways of obtaining such over-approximations: if the expression inside the sum has an elementary antiderivative that can be computed symbolically, the algorithm applies Lemma 5.4; otherwise, it finds a bound based on sampling and monotonicity.

  3. Step 3. The algorithm introduces a new variable $c:=c(\alpha,n)$ and substitutes it into the inequality. It also removes non-significant terms, hence further strengthening the inequality.

  4. Step 4. The algorithm uses a calculus technique to obtain a value $c^*>1$ for $c$ such that the inequality holds on $[1,c^*]$ but not on $(c^*,+\infty)$. If no such value is found, the algorithm returns the obvious upper bound $1$ for the tail probability.

  5. Step 5. The algorithm plugs $c^*$ back into the definition of $c$ and obtains a value for $\alpha$. Note that this value depends on $n^*$.

Our Synthesis Algorithm

We now present each step of the algorithm in more detail:

Step 1. Computing Conditions on $\alpha$

The algorithm creates a variable α\alpha and symbolically computes Inequality (6).

Example 5.5.

Consider the PRR for QuickSelect (Example 5.2) and assume $f(n):=5\cdot n$. The algorithm symbolically computes Inequality (6) and obtains the following:

$$\forall 1\leq n\leq n^*,\quad\alpha^{5\cdot n}\geq\alpha^{n-1}\cdot\frac{1}{n}\cdot\left(\sum_{i=\lceil n/2\rceil}^{n-1}\alpha^{5\cdot i}+\sum_{i=\lfloor n/2\rfloor}^{n-1}\alpha^{5\cdot i}\right).$$
Step 2. Over-approximating the Summations

Note that, by design, the expressions inside our summations are monotonically increasing with respect to the summation index. As such, we can apply Lemma 5.4 to obtain an upper bound for each sum. To do so, the algorithm symbolically computes an antiderivative of the expression inside the sums. Note also that the antiderivative is always convex, given that the integrand is increasing. The algorithm uses this convexity to remove floors and ceilings, hence strengthening the inequality.

However, there are cases where no closed-form elementary antiderivative exists, e.g. when the expression is $\alpha^{n\cdot\ln n}$. In such cases, the algorithm partitions the summation uniformly into $B$ parts (as in Section 5.3.2) and uses the maximum element of each part to over-approximate every element of that part. Furthermore, to handle floors and ceilings in such cases, we strengthen the inequality by replacing $\lceil n/2\rceil$ with $\lceil(n-1)/2\rceil$. The algorithm starts with $B=2$, doubling $B$ to obtain finer over-approximations, and repeats until it succeeds in synthesizing a concentration bound.

Example 5.6.

Continuing with the previous example, we have

$$\sum_{i=l}^{r}\alpha^{5\cdot i}\leq\int_{l}^{r+1}\alpha^{5\cdot x}\,\text{d}x=\frac{\alpha^{5\cdot(r+1)}-\alpha^{5\cdot l}}{5\cdot\ln\alpha}$$

The algorithm applies this over-approximation to obtain the following strengthened inequality:

$$\forall 1\leq n\leq n^*,\quad\alpha^{5\cdot n}\geq\alpha^{n-1}\cdot\frac{1}{n}\cdot\frac{\alpha^{5\cdot n}-\alpha^{5\cdot\lceil n/2\rceil}+\alpha^{5\cdot n}-\alpha^{5\cdot\lfloor n/2\rfloor}}{5\cdot\ln\alpha}$$

The algorithm then further strengthens the inequality by removing the floors and ceilings, using the convexity of $\alpha^{5\cdot x}$:

$$\forall 1\leq n\leq n^*,\quad\alpha^{5\cdot n}\geq\alpha^{n-1}\cdot\frac{2}{n}\cdot\frac{\alpha^{5\cdot n}-\alpha^{2.5\cdot n}}{5\cdot\ln\alpha}.$$
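
The symbolic part of this step can be reproduced with SymPy; the sketch below (our illustration of the computation, not the tool's exact code) verifies the antiderivative of $\alpha^{5x}$ and builds the integral bound of Lemma 5.4:

```python
import sympy as sp

alpha, x, l, r = sp.symbols('alpha x l r', positive=True)

# Antiderivative of alpha^(5x); we verify it by differentiation, which avoids
# the piecewise case alpha = 1 that sympy.integrate would introduce.
antideriv = alpha ** (5 * x) / (5 * sp.log(alpha))
assert sp.simplify(sp.diff(antideriv, x) - alpha ** (5 * x)) == 0

# Upper bound on sum_{i=l}^{r} alpha^(5 i), as in Example 5.6.
upper = antideriv.subs(x, r + 1) - antideriv.subs(x, l)
print(sp.simplify(upper))
```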
Step 3. Substitution and Simplification

Let $g(n)$ be the most significant term in $f(n)$, ignoring constant factors. The algorithm defines a new variable $c=c(\alpha,n):=\alpha^{g(n)}>1$ and rewrites the inequality in terms of $c$. It then moves everything to the LHS, writing the inequality in the form $\mathfrak{e}\geq 0$, and eliminates all fractions by multiplying the inequality by their denominators. This is sound because, by construction, the denominators of all fractions are positive at this point. Then, the algorithm inspects all terms of $\mathfrak{e}$. Consider a term of the form $\mathfrak{e}_1\cdot c^{\mathfrak{e}_2}$ in which $\mathfrak{e}_2$ contains $n$ as a sub-expression: if $\mathfrak{e}_1\cdot\mathfrak{e}_2>0$, the term is simplified to $\mathfrak{e}_1$; if $\mathfrak{e}_1\cdot\mathfrak{e}_2<0$, the algorithm checks whether $\mathfrak{e}_2\leq 1$ holds and, if so, simplifies the term to $\mathfrak{e}_1\cdot c$. These rewrites preserve soundness and strengthen the inequality. Our algorithm eagerly applies as many such simplifications as possible. Finally, the algorithm divides $\mathfrak{e}$ by the greatest common divisor of its terms.

Example 5.7.

Continuing with the previous example, we have $f(n)=5\cdot n$, so the most significant term in $f(n)$ is $n$. The algorithm therefore defines $c:=\alpha^n$ and rewrites the inequality as follows (note that $\ln c=n\cdot\ln\alpha$):

$$\forall 1\leq n\leq n^*,\quad c^{5}\geq c^{1-1/n}\cdot\frac{2\cdot c^{5}-2\cdot c^{2.5}}{5\cdot\ln c}$$

It then moves everything to the LHS and eliminates the fractions by multiplying by their denominators:

$$\forall 1\leq n\leq n^*,\quad 5\cdot c^{5}\cdot\ln c-2\cdot c^{1-1/n}\cdot(c^{5}-c^{2.5})\geq 0$$

Note that $c^{-1/n}<1$ appears on the LHS with a negative coefficient, so replacing it by $1$ strengthens the inequality. The algorithm simplifies the corresponding term, obtaining the following:

$$\forall 1\leq n\leq n^*,\quad 5\cdot c^{5}\cdot\ln c-2\cdot(c^{6}-c^{3.5})\geq 0$$

The algorithm divides the inequality by $c^{3.5}$, obtaining:

$$\forall 1\leq n\leq n^*,\quad 5\cdot c^{1.5}\cdot\ln c-2\cdot c^{2.5}+2\geq 0$$
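
The soundness of these rewrites is easy to spot-check numerically: each simplified LHS must lower-bound the one before it, so that the final inequality implies the original. A small random test (ours) for the two forms above:

```python
import math, random

random.seed(0)
for _ in range(10_000):
    n = random.randint(1, 10**6)
    c = math.exp(random.uniform(0.05, 2.0))  # c = alpha^n > 1
    original = 5 * c**5 * math.log(c) - 2 * c**(1 - 1 / n) * (c**5 - c**2.5)
    simplified = 5 * c**5 * math.log(c) - 2 * (c**6 - c**3.5)
    # simplified <= original, hence "simplified >= 0" implies "original >= 0"
    assert simplified <= original + 1e-9
```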

Before moving to the next step, we need two simple lemmas:

Lemma 5.8 (Proof in Appendix B.1).

After Steps 1–3 above, our inequality is simplified to

$$\forall 1\leq n\leq n^*,\quad\psi(c)\geq 0$$

in which $\psi(c)$ is a univariate function of the following form:

(7) $$\psi(c)=\sum_{i\in\mathcal{I}}\mu_i\cdot c^{\nu_i}\cdot\ln^{\xi_i}c$$

where $\mu_i,\nu_i\in\mathbb{R}$ and $\xi_i\in\{0,1\}$. Note that this form is closed under differentiation.

Given that our inequality is now univariate in $c=\alpha^{g(n)}$, we should look for a value $c^*>1$ such that $\psi(c)\geq 0$ on $[1,c^*]$. Intuitively, if we have such a $c^*$, then we can let $\alpha:=(c^*)^{1/g(n^*)}$ to solve the problem. Moreover, to find the best possible concentration bound, we would like to find the largest possible $c^*$. To simplify matters, we attempt to obtain a $c^*$ such that $\psi(c)\geq 0$ on $[1,c^*]$ and $\psi(c)<0$ on $(c^*,+\infty)$. We say that $\psi$ is separable (into non-negative and negative parts) iff such a $c^*$ exists. The following lemma provides sufficient conditions for separability:

Lemma 5.9 (Proof in Appendix B.2).

Let $\psi(c)$ be a function of the form (7). If at least one of the following conditions holds, then $\psi$ is separable:

  (i) $\psi$ is strictly decreasing over $[1,+\infty)$ and $\psi(1)\geq 0$.

  (ii) $\psi'$ is separable and $\psi(1)\geq 0$.

  (iii) $\psi/c^{a}$ is separable for some constant $a\in\mathbb{R}$.

Step 4. Ensuring Separability and Finding cc^{*}

The algorithm attempts to prove the separability of $\psi$ using Lemma 5.9. Rule (iii) of the lemma is always used to simplify the expression $\psi$. The algorithm first evaluates $\psi(1)$, ensuring that it is non-negative. Then, it computes the derivative $\psi'$. If the derivative is negative over $[1,+\infty)$, then case (i) of Lemma 5.9 is satisfied and $\psi$ is separable. Otherwise, the algorithm tries to recursively prove the separability of $\psi'$ using the same method, hence ensuring case (ii) of the lemma. If both cases fail, the algorithm has failed to prove the separability of $\psi$ and returns the trivial upper bound $1$. On the other hand, if $\psi$ is proven to be separable, the algorithm obtains $c^*$ by a simple binary search, using the fact that for all $c\geq 1$ we have $\psi(c)<0\Leftrightarrow c>c^*$.

Example 5.10.

Continuing with the previous example, we have

$$\psi(c)=5\cdot c^{1.5}\cdot\ln c-2\cdot c^{2.5}+2$$

The algorithm evaluates $\psi(1)=0$, which is non-negative. Hence, it computes the following derivative:

$$\psi'(c)=7.5\cdot c^{0.5}\cdot\ln c+5\cdot c^{0.5}-5\cdot c^{1.5}$$

Note that $\psi'(c)$ is not always negative for $c\geq 1$. Hence, the algorithm tries to recursively prove that $\psi'(c)$ is separable. It first simplifies $\psi'$, dividing by $c^{0.5}$ as in rule (iii), to obtain:

$$\psi_1(c)=7.5\cdot\ln c+5-5\cdot c$$

Now it tries to prove the separability of $\psi_1$. It first evaluates $\psi_1(1)=0$, and then computes the derivative:

$$\psi_1'(c)=\frac{7.5}{c}-5$$

Another level of recursion shows that $\psi_1'(1)\geq 0$ and $\psi_1'$ is strictly decreasing over $[1,+\infty)$, so $\psi_1'$ is separable. Hence, $\psi_1$, $\psi'$, and finally $\psi$ are separable as well. The algorithm performs a binary search and obtains $c^*\approx 2.74$.
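
A compact implementation of this step (our sketch; the actual tool performs the same recursion symbolically) proves the separability chain for this concrete $\psi$ and recovers $c^*$ by binary search:

```python
import sympy as sp

c = sp.Symbol('c', positive=True)
psi = 5 * c**sp.Rational(3, 2) * sp.log(c) - 2 * c**sp.Rational(5, 2) + 2

assert psi.subs(c, 1) == 0                             # psi(1) >= 0
psi1 = sp.expand(sp.diff(psi, c) / sp.sqrt(c))         # rule (iii): psi' / c^0.5
assert sp.simplify(psi1 - (sp.Rational(15, 2) * sp.log(c) + 5 - 5 * c)) == 0
assert psi1.subs(c, 1) == 0                            # psi_1(1) >= 0
psi2 = sp.diff(psi1, c)                                # 15/(2c) - 5
assert psi2.subs(c, 1) > 0                             # psi_1'(1) >= 0
# psi_1' is strictly decreasing on [1, oo), so it is separable by case (i);
# hence psi_1 (case ii), psi' (case iii) and finally psi (case ii) are separable.

# Binary search for c*, using psi(c) < 0 <=> c > c*.
f = sp.lambdify(c, psi, 'math')
lo, hi = 1.0, 10.0                                     # f(10) < 0
for _ in range(50):
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if f(mid) >= 0 else (lo, mid)
print(round(lo, 3))  # ~2.744, i.e. c* ~ 2.74 as in the text
```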

Step 5. Applying Lemma 5.3

Note that for every $\alpha>1$, $c:=\alpha^{g(n)}$ increases as $n$ increases. Hence, it suffices to find an $\alpha$ such that $\alpha^{g(1)}>1$ and $\alpha^{g(n^*)}\leq c^*$. The algorithm calls an off-the-shelf solver to obtain the largest possible $\alpha$ satisfying these constraints. It then plugs this $\alpha$ into Lemma 5.3 and reports the following concentration bound: $\Pr[\mathcal{T}(n^*)\geq\kappa]\leq\alpha^{f(n^*)-\kappa}$ for all $\kappa\geq f(n^*)$.

Example 5.11.

Continuing with the previous examples, we had $c=\alpha^n$ and $c^*=2.74$. So, the algorithm solves the constraints $\alpha>1$ and $\alpha^{n^*}\leq 2.74$. It is easy to see that $\alpha=2.74^{1/n^*}$ is the optimal solution. Hence, the algorithm computes and reports the following concentration bound:

$$\Pr\left[\mathcal{T}(n^*)\geq\kappa\right]\leq 2.74^{\,5-\kappa/n^*}$$

for all $\kappa\geq 5\cdot n^*$. This is equivalent to:

$$\Pr\left[\mathcal{T}(n^*)\geq\kappa'\cdot n^*\right]\leq 2.74^{\,5-\kappa'}$$

for all $\kappa'\geq 5$. Note that the bound decreases exponentially as $\kappa'$ grows.
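
Putting Examples 5.5–5.11 together, the reported bound is a one-line function of $\kappa'$; the following illustration (ours) evaluates it:

```python
# Step 5 in code (our illustration): with c* = 2.74 and g(n) = n, the optimal
# alpha is c*^(1/n*), so the reported bound only depends on kappa' = kappa/n*.
def quickselect_bound(kappa_prime, c_star=2.74, q=5):
    """Tail bound Pr[T(n*) >= kappa' * n*] <= c_star^(q - kappa'), for kappa' >= q."""
    assert kappa_prime >= q
    return c_star ** (q - kappa_prime)

print(quickselect_bound(6))   # ~0.365, matching the f(n) = 5n row of Table 2
print(quickselect_bound(17))  # ~5.6e-6
```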

Theorem 5.12 (Soundness).

Given a PRR $\mathcal{T}(n)=a(n)+\mathcal{T}(h(n))$, an expression $\kappa$, and an upper-bound function $f$ for the expected runtime of $\mathcal{T}$, all generated by Grammars (4) and (5), any concentration bound

$$\forall\kappa\geq f(n^*),\quad\Pr\left[\mathcal{T}(n^*)\geq\kappa\right]\leq\alpha^{f(n^*)-\kappa}$$

generated by the algorithm above is a correct bound.

Proof Sketch. While the algorithm strengthens the inequality at some points, it never weakens it. Hence, any concentration bound found by our algorithm is valid, and the algorithm is sound.

Theorem 5.13.

Assuming fixed bounds on the number of iterations of the binary search and on the approximation parameter $B$, given a PRR $\mathcal{T}(n)=a(n)+\mathcal{T}(h(n))$, an initial value $n^*\in\mathbb{N}$, an expression $\kappa$, and an upper-bound function $f$ for the expected runtime of $\mathcal{T}$, all generated by Grammars (4) and (5), the algorithm above terminates in polynomial time with respect to the size of the input.

Proof Sketch. Note that each level of recursion in Step 4, i.e. simplification and differentiation, strictly reduces the number of terms in the expression being studied. Hence, this step performs linearly many symbolic operations. Moreover, Step 5 can find the optimal value of $\alpha$ in $O(|g|)$ operations, where $|g|$ is the length of the expression $g$ (not its value). It is easy to verify that every other step of the algorithm takes polynomial time.

5.5. Prototype Implementation and Experimental Results

Implementation

We implemented the algorithm of Section 5.4 in Python and Mathematica (Wolfram Research, 2020), using the SymPy package (Meurer et al., 2017) for symbolic differentiation and integration. All results were obtained on a machine with an Intel Core i7-8700 processor (3.2 GHz) and 16 GB of memory, running Microsoft Windows 10.

Benchmarks

We experimented with PRRs corresponding to four classical randomized divide-and-conquer algorithms, namely QuickSelect, RandomSearch, L1Diameter, and L2Diameter. QuickSelect is already described in Section 5.2. In RandomSearch, the input consists of a sorted array $A$ of $n$ distinct items and a key $k$. The goal is to find the index of $k$ in the array, or to report that it does not appear. The algorithm randomly chooses an index $i$ and compares $A[i]$ with $k$. If $A[i]$ is larger, it recursively searches the first $i-1$ locations of the array. If they are equal, it terminates and returns $i$. If $k$ is larger, it recursively searches the last $n-i$ locations. In L1Diameter and L2Diameter, the input is a set $S$ of $n$ points in 3-d space. The goal is to find the diameter of $S$, or equivalently, to find two points in $S$ that are farthest apart. The only difference between the two problems is the norm used for computing distances (i.e. $L_1$ or $L_2$). Chapter 9 of the classical book (Motwani and Raghavan, 1995) provides randomized divide-and-conquer algorithms for these problems.

PRRs

The PRRs used in our experiments are shown in the table below.

Randomized Algorithm | Probabilistic Recurrence Relation
QuickSelect | $\mathcal{T}(n)=n-1+\frac{1}{n}\left(\sum_{i=\lceil n/2\rceil}^{n-1}\mathcal{T}(i)+\sum_{i=\lfloor n/2\rfloor}^{n-1}\mathcal{T}(i)\right)$
RandomSearch | $\mathcal{T}(n)=1+\frac{1}{n}\left(\sum_{i=\lceil n/2\rceil}^{n-1}\mathcal{T}(i)+\sum_{i=\lfloor n/2\rfloor}^{n-1}\mathcal{T}(i)\right)$
L1Diameter | $\mathcal{T}(n)=n+\frac{1}{n}\sum_{i=0}^{n-1}\mathcal{T}(i)$
L2Diameter | $\mathcal{T}(n)=n\ln n+\frac{1}{n}\sum_{i=0}^{n-1}\mathcal{T}(i)$
Results

Our experimental results are shown in Tables 2 and 3. For each recurrence relation, we consider several different tail probabilities, and we manually compute and report the concentration bound obtained by the classical method of (Karp, 1994). We then provide the results of our algorithm using various functions $f$. Recall that $f(n)$ is an upper bound for $\mathbb{E}[\mathcal{T}(n)]$. In cases where our algorithm was unable to find antiderivatives, we also report the parameter $B$, i.e. the number of blocks used in over-approximating the summations.

Discussion

As shown in Tables 2 and 3, our algorithm is very efficient and handles the benchmarks in a few seconds. Moreover, it obtains concentration bounds that are consistently tighter than those of (Karp, 1994), often by one or more orders of magnitude (see the ratio columns). It is also noteworthy that, for the RandomSearch benchmark, our algorithm obtains concentration bounds that decrease as $n^*$ grows, whereas (Karp, 1994) only provides constant bounds. In this case, the ratio becomes arbitrarily large as $n^*$ grows.

Table 2. Experimental results over the PRRs corresponding to QuickSelect and RandomSearch
Recurrence | Tail Probability | Karp's bound | f(n) | Our bound | Our runtime (s) | Ratio of the bounds ≈ | B
QuickSelect | Pr[𝒯(n*) ≥ 17n*] | (3/4)^13 ≈ 0.024 | 6n | 4.36·10^-8 | 3.27 | 5·10^5 | /
 | | | 7n | 6.11·10^-9 | 3.24 | 3.63·10^6 |
 | | | 10n | 1.86·10^-8 | 3.34 | 1.29·10^6 |
 | Pr[𝒯(n*) ≥ 15n*] | (3/4)^11 ≈ 0.043 | 9n | 6.89·10^-7 | 2.32 | 6.24·10^4 |
 | | | 12.5n | 0.002 | 2.18 | 21.5 |
 | | | 13n | 0.005 | 2.09 | 8.6 |
 | Pr[𝒯(n*) ≥ 11n*] | (3/4)^7 ≈ 0.021 | 5n | 0.0024 | 2.25 | 8.75 |
 | | | 6n | 0.0021 | 2.10 | 10 |
 | | | 8n | 0.0016 | 2.15 | 13.125 |
 | Pr[𝒯(n*) ≥ 8n*] | (3/4)^4 ≈ 0.317 | 5.5n | 0.039 | 2.42 | 8.12 |
 | | | 6n | 0.046 | 1.82 | 6.89 |
 | | | 7n | 0.151 | 2.28 | 2.10 |
 | Pr[𝒯(n*) ≥ 6n*] | (3/4)^2 = 0.5625 | 4.5n | 0.406 | 2.78 | 1.39 |
 | | | 5n | 0.365 | 2.67 | 1.54 |
 | | | 5.2n | 0.402 | 3.45 | 1.39 |
RandomSearch | Pr[𝒯(n*) ≥ 11 ln n*] | (3/4)^(11 − 1/ln(4/3)) ≈ 0.12 | 5 ln n | (n*)^-8.24 | 3.12 | → +∞ as n* → ∞ | /
 | | | 7 ln n | (n*)^-8.11 | 3.07 | |
 | | | 9 ln n | (n*)^-6.75 | 3.20 | |
 | Pr[𝒯(n*) ≥ 10 ln n*] | (3/4)^(10 − 1/ln(4/3)) ≈ 0.154 | 7 ln n | (n*)^-6.08 | 2.28 | |
 | | | 8.5 ln n | (n*)^-3.52 | 2.90 | |
 | | | 9.5 ln n | (n*)^-1.26 | 2.37 | |
 | Pr[𝒯(n*) ≥ 8 ln n*] | (3/4)^(8 − 1/ln(4/3)) ≈ 0.273 | 5.5 ln n | (n*)^-3.94 | 2.50 | |
 | | | 6 ln n | (n*)^-3.49 | 2.37 | |
 | | | 6.5 ln n | (n*)^-2.84 | 2.26 | |
 | Pr[𝒯(n*) ≥ 7 ln n*] | (3/4)^(7 − 1/ln(4/3)) ≈ 0.363 | 4.5 ln n | (n*)^-2.80 | 2.29 | |
 | | | 5.5 ln n | (n*)^-2.36 | 2.47 | |
 | | | 6 ln n | (n*)^-1.74 | 2.40 | |
 | Pr[𝒯(n*) ≥ 5 ln n*] | (3/4)^(5 − 1/ln(4/3)) ≈ 0.645 | 3.7 ln n | (n*)^-0.68 | 2.37 | |
 | | | 4 ln n | (n*)^-0.78 | 2.61 | |
 | | | 4.5 ln n | (n*)^-0.56 | 2.29 | |
Table 3. Experimental results over the PRRs corresponding to L1Diameter and L2Diameter
Recurrence | Tail Probability | Karp's bound | f(n) | Our bound | Our runtime (s) | Ratio of the bounds ≈ | B
L1Diameter | Pr[𝒯(n*) ≥ 13n*] | (1/2)^11 ≈ 4.89·10^-4 | 4.3n | 2.33·10^-9 | 3.76 | 2.09·10^5 | /
 | | | 5n | 1.47·10^-9 | 2.97 | 3.32·10^5 |
 | | | 5.2n | 1.48·10^-9 | 3.34 | 3.34·10^5 |
 | Pr[𝒯(n*) ≥ 11n*] | (1/2)^9 = 0.002 | 2.5n | 1.891·10^-4 | 2.09 | 10.58 |
 | | | 6n | 1.317·10^-6 | 1.80 | 1518.61 |
 | | | 7n | 1.976·10^-5 | 1.82 | 101.21 |
 | Pr[𝒯(n*) ≥ 9n*] | (1/2)^7 = 0.008 | 2.5n | 0.002 | 2.33 | 4 |
 | | | 5.5n | 7.957·10^-5 | 2.27 | 25.14 |
 | | | 6n | 2.963·10^-4 | 2.00 | 6.75 |
 | Pr[𝒯(n*) ≥ 7n*] | (1/2)^5 = 0.032 | 3.5n | 0.002 | 2.07 | 16 |
 | | | 5n | 0.007 | 2.07 | 4.571 |
 | | | 5.5n | 0.018 | 2.50 | 1.778 |
 | Pr[𝒯(n*) ≥ 5n*] | (1/2)^3 = 0.125 | 2.5n | 0.081 | 2.68 | 1.54 |
 | | | 3n | 0.046 | 2.78 | 2.717 |
 | | | 4n | 0.117 | 2.56 | 1.068 |
L2Diameter | Pr[𝒯(n*) ≥ 20n* ln n*] | (1/2)^18 ≈ 3.81·10^-6 | 3.5n ln n | 2.07·10^-6 | 4.90 | 1.84 | 2
 | | | 5n ln n | 5.51·10^-10 | 15.42 | 6914.7 | 4
 | Pr[𝒯(n*) ≥ 15n* ln n*] | (1/2)^13 ≈ 1.23·10^-4 | 5n ln n | 6.715·10^-7 | 12.41 | 567.38 | 4
 | | | 7n ln n | 1.41·10^-5 | 14.21 | 27.02 | 4
 | Pr[𝒯(n*) ≥ 13.5n* ln n*] | (1/2)^11.5 ≈ 3.45·10^-4 | 2.5n ln n | 7.22·10^-5 | 7.51 | 5.27 | 2
 | | | 5n ln n | 5.67·10^-6 | 14.75 | 67.19 | 4
 | Pr[𝒯(n*) ≥ 9n* ln n*] | (1/2)^7 = 0.008 | 2.5n ln n | 0.004 | 5.64 | 2 | 2
 | | | 4.5n ln n | 0.001 | 12.44 | 8 | 4
 | Pr[𝒯(n*) ≥ 8n* ln n*] | (1/2)^6 ≈ 0.016 | 2.5n ln n | 0.009 | 6.14 | 1.79 | 2
 | | | 4.5n ln n | 0.007 | 15.13 | 2.29 | 4

6. Related Work

Previous results on concentration bounds for probabilistic programs

Concentration bounds for probabilistic programs were first considered in (Monniaux, 2001), where a basic approach for obtaining exponentially-decreasing concentration bounds through abstract interpretation and truncation of the sampling intervals was presented. The work of (Chakarov and Sankaranarayanan, 2013) used Azuma's inequality to derive exponentially-decreasing concentration results for values of program variables in a probabilistic program. For the termination time of probabilistic programs, exponentially-decreasing concentration bounds for special classes of probabilistic programs were established in (Chatterjee et al., 2018b) using the Azuma and Hoeffding inequalities. Recently, for several cases where the Azuma and Hoeffding inequalities are not applicable, a reciprocal concentration bound using Markov's inequality was presented in (Fu and Chatterjee, 2019), which was then extended to higher moments in (Kura et al., 2019; Wang et al., 2020). For a detailed survey of current methods for concentration bounds of probabilistic programs, see (Sankaranarayanan, 2020).

Previous results on concentration bounds for probabilistic recurrences

Concentration-bound analyses for probabilistic recurrence relations were first considered in the classical work of (Karp, 1994), where cookbook methods (similar to the master theorem for worst-case analysis of recurrences) were obtained for a large class of probabilistic recurrence relations. A variant of the results of Karp that weakened several conditions but obtained comparable bounds was presented in (Chaudhuri and Dubhashi, 1997). The optimal concentration bound for the QuickSort algorithm was presented in (McDiarmid and Hayward, 1996). Recently, concentration bounds for probabilistic recurrence relations were mechanized in a theorem prover (Tassarotti and Harper, 2018), and extended to parallel settings (Tassarotti, 2017).

Comparison with previous approaches on probabilistic programs

Compared with previous results on probabilistic programs, our approach considers the synthesis of exponential supermartingales. As compared to (Monniaux, 2001), our approach is based on the well-studied theory of martingales for stochastic processes. In comparison to previous martingale-based approaches, we either achieve asymptotically better bounds (e.g. exponentially-decreasing bounds as compared to the polynomially-decreasing bounds of (Fu and Chatterjee, 2019; Kura et al., 2019; Wang et al., 2020)) or substantially improve the bounds (e.g. in comparison with (Chatterjee et al., 2018b); see our experimental results in Section 4.3). Moreover, all previous results that achieve exponentially-decreasing bounds, such as (Chakarov and Sankaranarayanan, 2013; Chatterjee et al., 2018b), require bounded differences, i.e. the stepwise difference of the supermartingale needs to be globally bounded in order to apply the Azuma or Hoeffding inequalities. In contrast, our results apply to stochastic processes that are not necessarily difference-bounded.

Comparison with previous approaches on probabilistic recurrences

Compared with previous results on probabilistic recurrence relations, our result is based on the idea of exponential supermartingales and the related automation techniques. It derives much better concentration bounds than the classical approach of (Karp, 1994) (see Remarks 2, 3, and 4 in Section 5.3) and beats the manual approach of (Tassarotti, 2017) in the constants (see Remark 5). Moreover, our approach also derives a tail bound for QuickSort that is comparable to the optimal bound of (McDiarmid and Hayward, 1996) (see Remark 6 in Section 5.3). In addition, the result of (Karp, 1994) requires the key condition $\sum_i\mathbb{E}[T(h_i(n))]\leq\mathbb{E}[T(n)]$ to handle recurrences with multiple procedure calls. Whether this condition can be relaxed was raised as an important open problem in (Karp, 1994; Dubhashi and Panconesi, 2009). To address this issue, (Tassarotti, 2017) proposed a method that does not require this restriction. Our method does not need this restriction either, and can hence be viewed as a new way to resolve this problem.

Key conceptual difference

Finally, our general approach of exponential supermartingales has a key difference with respect to previous approaches. The main conceptual difference is that our approach examines the detailed probability distribution (via its moment generating function). In comparison, previous approaches (e.g. those based on the Azuma and Hoeffding inequalities, or the results of (Karp, 1994)) only consider the expectation, the variance, or the range of the related random variables. This is the key conceptual reason why our approach can derive tighter bounds.

7. Conclusion and Future Work

In this work, we presented a new approach to derive tighter concentration bounds for probabilistic programs and probabilistic recurrence relations. We showed that our new approach can derive tighter concentration bounds than the classical methods of applying Azuma's inequality for probabilistic programs and the classical approaches for probabilistic recurrences, even for basic randomized algorithms such as QuickSelect and randomized diameter computation. On QuickSort, we beat the approaches of (Karp, 1994) and (Tassarotti, 2017), and derive a bound comparable to the optimal bound of (McDiarmid and Hayward, 1996).

There are several interesting directions for future work. First, while we consider classical probability distributions such as uniform and Bernoulli for the algorithmic aspects, extending the algorithms to handle various other distributions is an interesting direction. Second, whether our technique can be used to relax the key condition $\sum_i\mathbb{E}[T(h_i(n))]\leq\mathbb{E}[T(n)]$ in the approach of (Karp, 1994) is another interesting problem. Finally, whether our approach can be directly applied to analyze randomized algorithms, rather than their corresponding probabilistic recurrences, is another direction of future work.

8. Acknowledgements

We sincerely thank Prof. Joseph Tassarotti for pointing out a miscalculation in Remark 6.

References

  • Agrawal et al. (2018) Sheshansh Agrawal, Krishnendu Chatterjee, and Petr Novotný. 2018. Lexicographic ranking supermartingales: an efficient approach to termination of probabilistic programs. In POPL. 34:1–34:32.
  • Azuma (1967) Kazuoki Azuma. 1967. Weighted sums of certain dependent random variables. Tohoku Mathematical Journal 19, 3 (1967), 357–367.
  • Baier and Katoen (2008) Christel Baier and Joost-Pieter Katoen. 2008. Principles of model checking. MIT Press.
  • Barthe et al. (2018) Gilles Barthe, Thomas Espitau, Benjamin Grégoire, Justin Hsu, and Pierre-Yves Strub. 2018. Proving expected sensitivity of probabilistic programs. In POPL. 57:1–57:29.
  • Barthe et al. (2017) Gilles Barthe, Benjamin Grégoire, Justin Hsu, and Pierre-Yves Strub. 2017. Coupling proofs are probabilistic product programs. In POPL. 161–174.
  • Barthe et al. (2020) Gilles Barthe, Joost-Pieter Katoen, and Alexandra Silva (Eds.). 2020. Foundations of Probabilistic Programming. Springer.
  • Chakarov and Sankaranarayanan (2013) Aleksandar Chakarov and Sriram Sankaranarayanan. 2013. Probabilistic Program Analysis with Martingales. In CAV 2013. 511–526.
  • Chatterjee et al. (2016) Krishnendu Chatterjee, Hongfei Fu, and Amir Kafshdar Goharshady. 2016. Termination Analysis of Probabilistic Programs Through Positivstellensatz’s. In CAV. 3–22.
  • Chatterjee et al. (2018a) Krishnendu Chatterjee, Hongfei Fu, Amir Kafshdar Goharshady, and Nastaran Okati. 2018a. Computational Approaches for Stochastic Shortest Path on Succinct MDPs. In IJCAI 2018. 4700–4707.
  • Chatterjee et al. (2017) Krishnendu Chatterjee, Hongfei Fu, and Aniket Murhekar. 2017. Automated Recurrence Analysis for Almost-Linear Expected-Runtime Bounds. In CAV. 118–139.
  • Chatterjee et al. (2018b) Krishnendu Chatterjee, Hongfei Fu, Petr Novotný, and Rouzbeh Hasheminezhad. 2018b. Algorithmic Analysis of Qualitative and Quantitative Termination Problems for Affine Probabilistic Programs. TOPLAS 40, 2 (2018), 7:1–7:45.
  • Chaudhuri and Dubhashi (1997) Shiva Chaudhuri and Devdatt P. Dubhashi. 1997. Probabilistic Recurrence Relations Revisited. Theoretical Computer Science 181, 1 (1997), 45–56.
  • Cusumano-Towner et al. (2018) Marco Cusumano-Towner, Benjamin Bichsel, Timon Gehr, Martin T. Vechev, and Vikash K. Mansinghka. 2018. Incremental inference for probabilistic programs. In PLDI. 571–585.
  • Dahlqvist and Kozen (2020) Fredrik Dahlqvist and Dexter Kozen. 2020. Semantics of higher-order probabilistic programs with conditioning. In POPL. 57:1–57:29.
  • Dubhashi and Panconesi (2009) Devdatt P. Dubhashi and Alessandro Panconesi. 2009. Concentration of Measure for the Analysis of Randomized Algorithms. Cambridge University Press.
  • Durrett (1996) R. Durrett. 1996. Probability: Theory and Examples (Second Edition). Duxbury Press.
  • Esparza et al. (2012) Javier Esparza, Andreas Gaiser, and Stefan Kiefer. 2012. Proving Termination of Probabilistic Programs Using Patterns. In CAV. 123–138.
  • Fioriti and Hermanns (2015) Luis María Ferrer Fioriti and Holger Hermanns. 2015. Probabilistic Termination: Soundness, Completeness, and Compositionality. In POPL. 489–501.
  • Foster et al. (2016) Nate Foster, Dexter Kozen, Konstantinos Mamouras, Mark Reitblatt, and Alexandra Silva. 2016. Probabilistic NetKAT. In ESOP. 282–309.
  • Fu and Chatterjee (2019) Hongfei Fu and Krishnendu Chatterjee. 2019. Termination of Nondeterministic Probabilistic Programs. In VMCAI. 468–490.
  • Hark et al. (2020) Marcel Hark, Benjamin Lucien Kaminski, Jürgen Giesl, and Joost-Pieter Katoen. 2020. Aiming low is harder: induction for lower bounds in probabilistic program verification. In POPL. 37:1–37:28.
  • Hoare (1961a) Charles Antony Richard Hoare. 1961a. Algorithm 64: quicksort. Commun. ACM 4, 7 (1961), 321.
  • Hoare (1961b) Charles Antony Richard Hoare. 1961b. Algorithm 65: find. Commun. ACM 4, 7 (1961), 321–322.
  • Howard (1960) H. Howard. 1960. Dynamic Programming and Markov Processes. MIT Press.
  • Kaelbling et al. (1998) L. P. Kaelbling, M. L. Littman, and A. R. Cassandra. 1998. Planning and acting in partially observable stochastic domains. Artificial intelligence 101, 1 (1998), 99–134.
  • Kaelbling et al. (1996) L. P. Kaelbling, M. L. Littman, and A. W. Moore. 1996. Reinforcement learning: A survey. Journal of Artificial Intelligence Research 4 (1996), 237–285.
  • Kaminski et al. (2016) Benjamin Lucien Kaminski, Joost-Pieter Katoen, Christoph Matheja, and Federico Olmedo. 2016. Weakest Precondition Reasoning for Expected Run-Times of Probabilistic Programs. In ESOP. 364–389.
  • Karp (1994) Richard M. Karp. 1994. Probabilistic Recurrence Relations. Journal of the ACM 41, 6 (1994), 1136–1150.
  • Kemeny et al. (1966) J.G. Kemeny, J.L. Snell, and A.W. Knapp. 1966. Denumerable Markov Chains. Van Nostrand Company.
  • Kleinberg and Tardos (2006) Jon M. Kleinberg and Éva Tardos. 2006. Algorithm design. Addison-Wesley.
  • Kura et al. (2019) Satoshi Kura, Natsuki Urabe, and Ichiro Hasuo. 2019. Tail Probabilities for Randomized Program Runtimes via Martingales for Higher Moments. In TACAS.
  • Kwiatkowska et al. (2011) Marta Z. Kwiatkowska, Gethin Norman, and David Parker. 2011. PRISM 4.0: Verification of Probabilistic Real-Time Systems. In CAV. 585–591.
  • McDiarmid and Hayward (1996) Colin McDiarmid and Ryan Hayward. 1996. Large Deviations for Quicksort. Journal of Algorithms 21, 3 (1996), 476–507.
  • Meurer et al. (2017) Aaron Meurer, Christopher P Smith, Mateusz Paprocki, Ondřej Čertík, Sergey B Kirpichev, Matthew Rocklin, Amit Kumar, Sergiu Ivanov, Jason K Moore, Sartaj Singh, et al. 2017. SymPy: symbolic computing in Python. PeerJ Computer Science 3 (2017).
  • Meyn and Tweedie (1993) S.P. Meyn and R.L. Tweedie. 1993. Markov chains and stochastic stability. Springer.
  • Monniaux (2001) David Monniaux. 2001. An Abstract Analysis of the Probabilistic Termination of Programs. In SAS. 111–126.
  • Motwani and Raghavan (1995) Rajeev Motwani and Prabhakar Raghavan. 1995. Randomized Algorithms. Cambridge University Press.
  • Ngo et al. (2018) Van Chan Ngo, Quentin Carbonneaux, and Jan Hoffmann. 2018. Bounded expectations: resource analysis for probabilistic programs. In PLDI 2018. 496–512.
  • Olmedo et al. (2016) Federico Olmedo, Benjamin Lucien Kaminski, Joost-Pieter Katoen, and Christoph Matheja. 2016. Reasoning about Recursive Probabilistic Programs. In LICS, Martin Grohe, Eric Koskinen, and Natarajan Shankar (Eds.). 672–681.
  • Paz (1971) A. Paz. 1971. Introduction to probabilistic automata (Computer science and applied mathematics). Academic Press.
  • Rabin (1963) M.O. Rabin. 1963. Probabilistic automata. Information and Control 6 (1963), 230–245.
  • Sankaranarayanan (2020) Sriram Sankaranarayanan. 2020. Quantitative Analysis of Programs with Probabilities and Concentration of Measure Inequalities. In Foundations of Probabilistic Programming, Gilles Barthe, Joost-Pieter Katoen, and Alexandra Silva (Eds.). https://www.cs.colorado.edu/~srirams/papers/concMeasureSpringerChapter.PDF
  • Sankaranarayanan et al. (2013) Sriram Sankaranarayanan, Aleksandar Chakarov, and Sumit Gulwani. 2013. Static analysis for probabilistic programs: inferring whole program properties from finitely many paths. In PLDI. 447–458.
  • Tassarotti (2017) Joseph Tassarotti. 2017. Probabilistic Recurrence Relations for Work and Span of Parallel Algorithms. CoRR abs/1704.02061 (2017). http://arxiv.org/abs/1704.02061
  • Tassarotti and Harper (2018) Joseph Tassarotti and Robert Harper. 2018. Verified Tail Bounds for Randomized Programs. In ITP. 560–578.
  • Wang et al. (2020) Di Wang, Jan Hoffmann, and Thomas W. Reps. 2020. Tail Bound Analysis for Probabilistic Programs via Central Moments. CoRR abs/2001.10150 (2020). https://arxiv.org/abs/2001.10150
  • Williams (1991a) David Williams. 1991a. Probability with Martingales. Cambridge University Press.
  • Williams (1991b) David Williams. 1991b. Probability with Martingales. Cambridge University Press.
  • Wolfram Research (2020) Wolfram Research. 2020. Mathematica, Version 12.1. https://www.wolfram.com/mathematica

Appendix A Detailed Experimental Results for Section 4.3

This appendix presents figures for the related examples of Section 4.3 in Figure 1. The numbers on the x-axis are the termination times $n$, varying from $6X_0$ to $22X_0$.

Figure 1. Detailed plots of the experimental results: panels (a)–(f) show the results for the examples rdwalk1, rdwalk2, prspeed, vad1D, mini-roulette, and mini-roulette2, respectively. In these figures, the solid line represents the Hoeffding bound and the dotted line represents the bound from our approach.

Appendix B Detailed Proofs for Section 5

B.1. Proof of Lemma 5.8

It is easy to see that this form is closed under addition, multiplication, and differentiation. We first prove the following lemma:

Lemma B.1.

For $0\leq h(n)\leq f(n)$ generated by Grammar (5), suppose $g(n)$ is the most significant term of $f$ (ignoring coefficients), i.e. $f(n):=q\cdot g(n)+o(g(n))$. Set $c:=\alpha^{g(n)}$. Then, for every $u\in\mathbb{R}$, the term $u\cdot\alpha^{h(n)}$ can be simplified, under Step 3's strategy, into a univariate function of $c$ as in Lemma 5.8.

Proof.

Suppose $h(n):=\sum_i\omega_i\cdot h_i(n)$, where $h_i(n)\in\{n,\ln n,n\ln n\}$. Then we have:

$$u\cdot\alpha^{h(n)}=u\cdot c^{\frac{h(n)}{g(n)}}=u\cdot\prod_i c^{\omega_i\cdot\frac{h_i(n)}{g(n)}}$$

For every $i$, if $h_i(n)=g(n)$, then the corresponding factor becomes $c^{\omega_i}$; otherwise, $\frac{h_i(n)}{g(n)}\leq 1$, since $h(n)\leq f(n)$. Then, if $\omega_i\cdot u>0$, the factor is simplified to $1$; otherwise, it is simplified to $c$. Since the form is closed under multiplication, we conclude that $u\cdot\alpha^{h(n)}$ can be simplified into the form of Lemma 5.8.

We now return to proving Lemma 5.8. By design, our PRR always has the following form:

$$\mathcal{T}(n)=a(n)+\frac{\gamma}{n}\cdot\left(\sum_{i=\lceil n/2\rceil}^{n-1}\mathcal{T}(i)+\sum_{i=\lfloor n/2\rfloor}^{n-1}\mathcal{T}(i)\right)+\frac{1-\gamma}{n}\cdot\sum_{i=0}^{n-1}\mathcal{T}(i)$$

where $0\leq\gamma\leq 1$. We consider several cases:

Case (i)

$f(n):=q\cdot\ln n$ with $q>0$. Recall that $f(n)$ is an over-approximation of $\mathbb{E}[\mathcal{T}(n)]$; here $a(n)$ has the form $k_1\ln n+k_2$. First, note that $k_1=0$: if $k_1>0$, then $\mathbb{E}[\mathcal{T}(n)]=\Omega(\ln^2 n)>f(n)$. Since $k_1=0$, we have $k_2\geq 0$. Hence, the conditions for $\alpha$ are:

$$\alpha^{q\ln n}\geq\alpha^{k_2}\cdot\left(\frac{\gamma}{n}\cdot\left(\sum_{i=\lceil n/2\rceil}^{n-1}\alpha^{q\ln i}+\sum_{i=\lfloor n/2\rfloor}^{n-1}\alpha^{q\ln i}\right)+\frac{1-\gamma}{n}\cdot\sum_{i=0}^{n-1}\alpha^{q\ln i}\right)$$

By integration and convexity, we have:

$$\alpha^{q\ln n}\geq\alpha^{k_2}\cdot\left(\frac{2\gamma}{n}\cdot\frac{n\cdot\alpha^{q\ln n}-(n/2)\cdot\alpha^{q\ln(n/2)}}{q\ln\alpha+1}+\frac{1-\gamma}{n}\cdot\frac{n\cdot\alpha^{q\ln n}}{q\ln\alpha+1}\right)$$

We define $c:=\alpha^{\ln n}$ and obtain:

$$c^{q}\geq\alpha^{k_2}\cdot\left(\frac{2\gamma}{n}\cdot\frac{n\cdot c^{q}-(n/2)\cdot\frac{c^{q}}{\alpha^{q\ln 2}}}{q\ln\alpha+1}+\frac{1-\gamma}{n}\cdot\frac{n\cdot c^{q}}{q\ln\alpha+1}\right)$$

Moving everything to the left and eliminating the fractions, we get:

$$n\cdot c^{q}\cdot(1+q\ln\alpha)-\alpha^{k_2}\cdot\left(2\gamma\cdot\left(n\cdot c^{q}-(n/2)\cdot\frac{c^{q}}{\alpha^{q\ln 2}}\right)+(1-\gamma)\cdot n\cdot c^{q}\right)\geq 0$$

Removing the common factors $n$ and $c^{q}$, we derive:

$$(1+q\ln\alpha)-\alpha^{k_2}\cdot\left(2\gamma\cdot\left(1-\frac{1}{2\cdot\alpha^{q\ln 2}}\right)+(1-\gamma)\right)\geq 0$$

Since $k_2$, $\gamma$, and $q$ are constants, this is a univariate inequality in $\alpha$ of the form of Lemma 5.8.

Case (ii)

$f(n):=q\cdot n$ with $q>0$. Recall that $f(n)$ is an over-approximation of $\mathbb{E}[\mathcal{T}(n)]$; here $a(n)$ has the form $k_1 n+k_2\ln n+k_3$, where $0<k_1\leq q$ and $k_2,k_3\in\mathbb{R}$. The conditions for $\alpha$ are:

$$\alpha^{qn}\geq\alpha^{k_1 n+k_2\ln n+k_3}\cdot\left(\frac{\gamma}{n}\cdot\left(\sum_{i=\lceil n/2\rceil}^{n-1}\alpha^{qi}+\sum_{i=\lfloor n/2\rfloor}^{n-1}\alpha^{qi}\right)+\frac{1-\gamma}{n}\cdot\sum_{i=0}^{n-1}\alpha^{qi}\right)$$

By integration and convexity, we have:

$$\alpha^{qn}\geq\alpha^{k_1 n+k_2\ln n+k_3}\cdot\left(\frac{2\gamma}{n}\cdot\frac{\alpha^{qn}-\alpha^{qn/2}}{q\ln\alpha}+\frac{1-\gamma}{n}\cdot\frac{\alpha^{qn}-1}{q\ln\alpha}\right)$$

Moving everything to the left and eliminating the fractions, we get:

$$n\cdot q\cdot\ln\alpha\cdot\alpha^{qn}-\alpha^{k_1 n+k_2\ln n+k_3}\cdot\left(2\gamma\cdot\left(\alpha^{qn}-\alpha^{qn/2}\right)+(1-\gamma)\cdot\left(\alpha^{qn}-1\right)\right)\geq 0$$

We define $c:=\alpha^{n}$. By Lemma B.1, $\alpha^{k_1 n+k_2\ln n+k_3}$ can be simplified into the desired form, and since $n\cdot\ln\alpha=\ln c$, the whole formula can be simplified into the form of Lemma 5.8.

Case (iii)

Otherwise, by design, $\alpha^{f(n)}$ has no elementary antiderivative. In this case, we partition the summation uniformly into $B$ parts and use the maximum element of each part to over-approximate every element of that part. The conditions for $\alpha$ are:

$$\alpha^{f(n)}\geq\alpha^{a(n)}\cdot\left(\frac{\gamma}{n}\cdot 2\sum_{i=\lceil(n-1)/2\rceil}^{n-1}\alpha^{f(i)}+\frac{1-\gamma}{n}\cdot\sum_{i=0}^{n-1}\alpha^{f(i)}\right)$$

Uniformly dividing into $B$ parts and using the maximum element of each part to over-approximate its elements, we derive:

$$\alpha^{f(n)}\geq\alpha^{a(n)}\cdot\left(\frac{\gamma}{n}\cdot\frac{2n}{B}\cdot\sum_{i=1}^{B}\alpha^{f\left(\frac{n(B+i)}{2B}\right)}+\frac{1-\gamma}{n}\cdot\frac{n}{B}\cdot\sum_{i=1}^{B}\alpha^{f\left(\frac{in}{B}\right)}\right)$$

Moving everything to the left and removing the denominators, we obtain:

$$B\cdot n\cdot\alpha^{f(n)}-\alpha^{a(n)}\cdot\left(2\gamma n\cdot\sum_{i=1}^{B}\alpha^{f\left(\frac{n(B+i)}{2B}\right)}+n(1-\gamma)\cdot\sum_{i=1}^{B}\alpha^{f\left(\frac{in}{B}\right)}\right)\geq 0$$

We remove the common factor $n$ and let $c:=\alpha^{g(n)}$, where $g(n)$ is the most significant term in $f(n)$ (ignoring coefficients). By Lemma B.1, the whole formula can be simplified into the form of Lemma 5.8.

B.2. Proof of Lemma 5.9

Proof for Case (i)

Define $\psi(\infty):=\lim_{t\to\infty}\psi(t)\in\mathbb{R}\cup\{-\infty\}$ and set $c^*:=\sup\{x\mid x\geq 1\wedge\psi(x)\geq 0\}$. By the definition of $c^*$, we have $\psi(x)<0$ for all $x>c^*$, and by the continuity of $\psi$, $\psi(c^*)\geq 0$. Since $\psi$ is strictly decreasing, $\psi(x)>\psi(c^*)\geq 0$ for all $x<c^*$, so $c^*$ is feasible.

Proof for Case (ii)

Since $\psi'(c)$ is separable, there exists $c'$ such that $\psi'(c)>0$ on $(1,c')$ and $\psi'(c)<0$ on $(c',\infty)$. By the Newton–Leibniz formula, $\psi(c')=\psi(1)+\int_1^{c'}\psi'(t)\,\text{d}t$, so $\psi(c')\geq\psi(1)\geq 0$. Since $\psi'(c)<0$ on $(c',\infty)$, arguing similarly to case (i), we conclude that $\psi$ is separable.

Proof for Case (iii)

If $\psi/c^{a}$ is separable for some $a\in\mathbb{R}$, then there exists $c^*$ such that $\psi(c)/c^{a}>0$ on $(1,c^*)$ and $\psi(c)/c^{a}<0$ on $(c^*,\infty)$. Since $c^{a}>0$ for $c>1$, we conclude that $\psi(c)>0$ on $(1,c^*)$ and $\psi(c)<0$ on $(c^*,\infty)$, so $\psi$ is separable.

Appendix C Supplementary material for probabilistic recurrences

C.1. On the Concentration Bound for QuickSort

Here we emphasize that exactly the same steps as in Section 5.3.2 derive the following bound for all $k_1,k_2>0$:

$$\Pr[\mathcal{T}(n^*)\geq(9+k_1)\cdot n^*\ln n^*+k_2\cdot n^*]\leq 2.3^{-k_1\ln n^*-k_2}$$

The steps below are identical to those of Section 5.3.2 and are repeated for completeness; only the final application of Lemma 5.3 differs.

Given $f(n):=9\cdot n\cdot\ln n$, we apply Lemma 5.3 and obtain the following conditions for $\alpha$ (adopting the convention $0\cdot\ln 0:=0$):

$$\forall 1\leq n\leq n^*,\quad\alpha^{9\cdot n\cdot\ln n}\geq\alpha^{n-1}\cdot\frac{1}{n}\cdot\sum_{i=0}^{n-1}\alpha^{f(i)+f(n-1-i)}$$

For $1\leq n\leq 8$, we can manually verify that it suffices to set $\alpha>1$. In the sequel, we assume $n\geq 8$. Note that $i\cdot\ln i+(n-i-1)\cdot\ln(n-i-1)$ is monotonically decreasing on $[1,\lfloor n/2\rfloor]$ and monotonically increasing on $[\lceil n/2\rceil,n]$. We partition the summation above uniformly into eight parts and use the maximum of each part to over-approximate the sum. This leads to the following upper bound:

$$\sum_{i=0}^{n-1}\alpha^{f(i)+f(n-1-i)}\leq\sum_{j=0}^{7}\;\sum_{i=\lceil j\cdot n/8\rceil}^{\lfloor(j+1)\cdot n/8\rfloor}\alpha^{f(i)+f(n-1-i)}\leq\frac{n}{4}\cdot\left(\alpha^{9\cdot n\cdot\ln n}+\alpha^{9\cdot(\frac{n}{8}\ln\frac{n}{8}+\frac{7n}{8}\ln\frac{7n}{8})}+\alpha^{9\cdot(\frac{n}{4}\ln\frac{n}{4}+\frac{3n}{4}\ln\frac{3n}{4})}+\alpha^{9\cdot(\frac{5n}{8}\ln\frac{5n}{8}+\frac{3n}{8}\ln\frac{3n}{8})}\right)$$

Plugging this over-approximation back into the original inequality, we get:

$$\alpha^{9\cdot n\cdot\ln n}\geq\alpha^{n-1}\cdot\frac{1}{4}\cdot\left(\alpha^{9\cdot n\cdot\ln n}+\alpha^{9\cdot(\frac{n}{8}\ln\frac{n}{8}+\frac{7n}{8}\ln\frac{7n}{8})}+\alpha^{9\cdot(\frac{n}{4}\ln\frac{n}{4}+\frac{3n}{4}\ln\frac{3n}{4})}+\alpha^{9\cdot(\frac{5n}{8}\ln\frac{5n}{8}+\frac{3n}{8}\ln\frac{3n}{8})}\right)$$

for all $8\leq n\leq n^*$. We substitute $c:=\alpha^n$ and strengthen the inequality using the following estimate (for $\beta\in[1,n/2]$):

$$\alpha^{\beta\ln\beta+(n-\beta)\ln(n-\beta)}\leq\alpha^{\beta\ln n+(n-\beta)\ln(n-\beta)}=\alpha^{n\ln n}\cdot\alpha^{-(n-\beta)\ln n+(n-\beta)\ln(n-\beta)}=\alpha^{n\ln n+(n-\beta)\ln\frac{n-\beta}{n}}=c^{\ln n+\frac{n-\beta}{n}\cdot\ln\frac{n-\beta}{n}}$$

Instantiating $\beta=\frac{n}{8},\frac{n}{4},\frac{3n}{8}$ respectively (and applying the estimate to the exponents scaled by $9$), we obtain:

$$\forall 8\leq n\leq n^*,\quad c^{9\ln n}\geq\frac{c}{n}\cdot\frac{n}{4}\cdot\left(c^{9\ln n}+c^{9\ln n+\frac{63}{8}\ln\frac{7}{8}}+c^{9\ln n+\frac{27}{4}\ln\frac{3}{4}}+c^{9\ln n+\frac{45}{8}\ln\frac{5}{8}}\right)$$

$$\forall 8\leq n\leq n^*,\quad 4-\left(c+c^{1-\frac{63}{8}\ln\frac{8}{7}}+c^{1-\frac{27}{4}\ln\frac{4}{3}}+c^{1-\frac{45}{8}\ln\frac{8}{5}}\right)\geq 0$$

Now we study the following function:

$$\psi(c)=4-\left(c+c^{1-\frac{63}{8}\ln\frac{8}{7}}+c^{1-\frac{27}{4}\ln\frac{4}{3}}+c^{1-\frac{45}{8}\ln\frac{8}{5}}\right).$$

By basic calculus, we can prove that $\psi(c)\geq 0$ holds on $[1,2.3]$. Additionally, since for every $\alpha>1$, $\alpha^n$ is increasing in $n$, plugging $\alpha=2.3^{1/n^*}$ into Lemma 5.3 yields:

$$\Pr[\mathcal{T}(n^*)\geq(9+k_1)\cdot n^*\ln n^*+k_2\cdot n^*]\leq 2.3^{-k_1\ln n^*-k_2}$$