Concentration of the Langevin Algorithm’s Stationary Distribution
Abstract
A canonical algorithm for log-concave sampling is the Langevin Algorithm, a.k.a. the Langevin Diffusion run with some discretization stepsize $\eta > 0$. This discretization leads the Langevin Algorithm to have a stationary distribution $\pi_\eta$ which differs from the stationary distribution $\pi$ of the Langevin Diffusion, and it is an important challenge to understand whether the well-known properties of $\pi$ extend to $\pi_\eta$. In particular, while concentration properties such as isoperimetry and rapidly decaying tails are classically known for $\pi$, the analogous properties for $\pi_\eta$ are open questions with algorithmic implications. This note provides a first step in this direction by establishing concentration results for $\pi_\eta$ that mirror classical results for $\pi$. Specifically, we show that for any nontrivial stepsize $\eta > 0$, $\pi_\eta$ is sub-exponential (respectively, sub-Gaussian) when the potential is convex (respectively, strongly convex). Moreover, the concentration bounds we show are essentially tight. We also show that these concentration bounds extend to all iterates along the trajectory of the Langevin Algorithm, and to inexact implementations which use sub-Gaussian estimates of the gradient.
Key to our analysis is the use of a rotation-invariant moment generating function (aka Bessel function) to study the stationary dynamics of the Langevin Algorithm. This technique may be of independent interest because it enables directly analyzing the discrete-time stationary distribution without going through the continuous-time stationary distribution as an intermediary.
1 Introduction
Sampling from a log-concave distribution $\pi \propto e^{-f}$ over $\mathbb{R}^d$ is a foundational problem with applications throughout statistics, engineering, and the sciences. A canonical and well-studied approach is the Langevin Algorithm, which is the discrete-time Markov process
$$X_{t+1} = X_t - \eta \nabla f(X_t) + Z_t, \qquad (1.1)$$
where $\eta > 0$ is a stepsize parameter, and $Z_t \sim \mathcal{N}(0, 2\eta I_d)$ are independent Gaussians. Briefly, the intuition behind the Langevin Algorithm is that it discretizes the Langevin Diffusion, which is a continuous-time Markov process with stationary distribution $\pi$. More precisely, the Langevin Diffusion is the stochastic differential equation
$$dX_t = -\nabla f(X_t)\, dt + \sqrt{2}\, dB_t, \qquad (1.2)$$
where $B_t$ is a standard Brownian motion on $\mathbb{R}^d$, and the Langevin Algorithm is identical except that it updates the gradient every $\eta$ units of time rather than continuously. We refer to the textbooks [7, 2, 16, 10, 13] for a detailed historical account of the Langevin Algorithm and the truly extensive literature surrounding it, which spans multiple decades and communities.
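To make the discretization concrete, here is a minimal NumPy sketch of the update (1.1). The function name `langevin_algorithm` and the test potential $f(x) = \|x\|^2/2$ are our own illustrative choices, not notation from this note.

```python
import numpy as np

def langevin_algorithm(grad_f, x0, eta, n_iters, rng=None):
    """Run the Langevin Algorithm (1.1): X_{t+1} = X_t - eta*grad_f(X_t) + Z_t,
    with Z_t ~ N(0, 2*eta*I_d). Returns the final iterate."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    d = x.shape[0]
    for _ in range(n_iters):
        x = x - eta * grad_f(x) + np.sqrt(2 * eta) * rng.standard_normal(d)
    return x

# Example: sample from pi ∝ exp(-f) for the strongly convex potential f(x) = ||x||^2 / 2.
samples = np.array([
    langevin_algorithm(lambda x: x, np.zeros(5), eta=0.1, n_iters=200)
    for _ in range(1000)
])
# Each coordinate's standard deviation is close to, but slightly above, 1;
# the excess is exactly the discretization bias of pi_eta relative to pi = N(0, I).
print(samples.std(axis=0))
```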
Although the discretization enables simulating the Langevin Diffusion algorithmically, it introduces an asymptotic bias. Specifically, for any discretization stepsize $\eta > 0$, the stationary distribution $\pi_\eta$ of the Langevin Algorithm differs from the stationary distribution $\pi$ of the Langevin Diffusion. An important question is:
Does $\pi_\eta$ enjoy similar concentration properties as $\pi$?
The motivation of this note is that this question is wide open for concentration properties. Indeed, while concentration properties of $\pi$ have been classically known for several decades, the analogous properties for $\pi_\eta$ have remained unknown. Perhaps the most notable example of this is isoperimetry: while classical results imply that $\pi$ satisfies Poincaré/log-Sobolev isoperimetric inequalities when the potential $f$ is convex/strongly convex [4, 6, 11], the analogous properties for $\pi_\eta$ are unknown. The influential paper [19] stated this as an open question and proved rapid mixing of the Langevin Algorithm in Rényi divergence under this conjectural assumption of isoperimetry of $\pi_\eta$. Another example is rapidly decaying tails: while classical results imply that $\pi$ has sub-exponential/sub-Gaussian tails when the potential $f$ is convex/strongly convex [12, 3], the analogous properties for $\pi_\eta$ were previously unknown. Such tail decay results can be exploited to extend fast-mixing results for the Langevin Algorithm from constrained settings to unconstrained settings [1].
Contributions.
The purpose of this note is to provide a first step in this direction. Specifically, in §4 we show that the stationary distribution $\pi_\eta$ of the Langevin Algorithm is sub-Gaussian when the potential $f$ is strongly convex; and in §5 we show that $\pi_\eta$ is sub-exponential when $f$ is convex. These results hold for any nontrivial discretization stepsize $\eta$ (our results apply for $\eta \lesssim 1/M$; this is the relevant regime in both theory and practice since the Langevin Algorithm is transient if $\eta > 2/M$, and in fact $\eta$ is typically taken polynomially smaller to ensure small bias [7]) and mirror the aforementioned classical results for $\pi$. Moreover, the concentration bounds we show are essentially tight.
In §6, we mention two extensions of these results: the concentration bounds extend to all iterates along the trajectory of the Langevin Algorithm, and to inexact implementations of the Langevin Algorithm which use sub-Gaussian estimates of the gradient.
Techniques.
A central obstacle to proving these results is that they require establishing properties of $\pi_\eta$ for any discretization stepsize $\eta$, not just asymptotically small $\eta$. Since the difference between $\pi_\eta$ and $\pi$ grows with $\eta$, we approach this problem by directly analyzing $\pi_\eta$, rather than analyzing properties of $\pi$ and then trying to conclude properties of $\pi_\eta$ via approximation bounds between $\pi$ and $\pi_\eta$. However, directly analyzing $\pi_\eta$ comes with the difficulty that many standard analysis techniques for the continuous-time Langevin Diffusion become much more complicated after discretization. To overcome this, our analysis makes use of a rotation-invariant moment generating function (a.k.a. Bessel function) to directly study the dynamics of the discrete-time Langevin Algorithm at stationarity. Briefly, the key point is that this Lyapunov function readily tracks the effect of gradient descent updates and Gaussian noise convolutions (the two components of the update equation (1.1) for the Langevin Algorithm), thereby enabling a simple and direct analysis of this discrete-time process. This technique may be of independent interest and is developed in §3.
Independent work.
After completing an initial draft of this manuscript, we found out that a recent revision of [19] (arXiv v4) has a new result (Theorem 8) which shows that in the strongly convex setting, $\pi_\eta$ satisfies a log-Sobolev inequality (LSI) and thus is sub-Gaussian with concentration parameters that exactly match our Theorem 4.1. The proof techniques are completely different, and the final results are incomparable in their level of generality. On one hand, their result subsumes Theorem 4.1 in that it shows LSI, which implies sub-Gaussianity. On the other hand, our techniques are more general in that they readily extend to the (non-strongly) convex setting, while the techniques of [19] only apply to the strongly convex setting (it is stated as an open question in their paper whether one can obtain the analogous results for this more general setting). We are grateful to Andre Wibisono for making us aware of this recent revision and for discussing the differences with us.
2 Preliminaries
Notation.
We write $\mathbb{S}^{d-1}$ to denote the unit sphere in $\mathbb{R}^d$, and we abuse notation slightly by writing $v \sim \mathbb{S}^{d-1}$ to indicate that $v$ is a random vector drawn uniformly from $\mathbb{S}^{d-1}$. Throughout, $\|\cdot\|$ denotes the Euclidean norm. Recall that a differentiable function $f : \mathbb{R}^d \to \mathbb{R}$ is said to be $M$-smooth if $\nabla f$ is $M$-Lipschitz; and $f$ is said to be $m$-strongly convex if $f(y) \geq f(x) + \langle \nabla f(x), y - x \rangle + \frac{m}{2}\|y - x\|^2$ for all $x, y \in \mathbb{R}^d$. All logarithms are natural. All other notation is introduced in the main text.
Integrability assumption.
Throughout, we assume that the convex potential $f$ satisfies $\int_{\mathbb{R}^d} e^{-f(x)}\, dx < \infty$ since this suffices to ensure that the stationary distribution of the Langevin Diffusion exists and is equal to $\pi \propto e^{-f}$ [17, 5]. This assumption is automatically satisfied in the strongly convex setting, and is a mild assumption in the convex setting (e.g., it holds if there is a minimizer $x^*$ of $f$ and a large enough radius $R$ such that $f(x) - f(x^*)$ grows at least linearly in $\|x - x^*\|$ at all points $x$ satisfying $\|x - x^*\| \geq R$).
3 Lyapunov function
We propose to use the following Lyapunov function in order to analyze $\pi_\eta$.
Definition 3.1 (Lyapunov function).
For any dimension $d \in \mathbb{N}$ and weight $\lambda > 0$, let $\Lambda = \Lambda_{d,\lambda}$ denote the function
$$\Lambda(x) := \mathbb{E}_{v \sim \mathbb{S}^{d-1}}\left[ e^{\lambda \langle v, x \rangle} \right]. \qquad (3.1)$$
This definition extends to a random vector $X$ by taking the expectation $\mathbb{E}[\Lambda(X)]$. The rest of this section discusses properties and interpretations of this Lyapunov function.
3.1 Relation to rotation-invariant MGF
This Lyapunov function has a natural interpretation in terms of a rotation-invariant version of the standard moment generating function (MGF). Specifically, rather than bounding the standard MGF of a distribution $\mu$, which we recall is $\mathbb{E}_{X \sim \mu}\, e^{\lambda \langle v, X \rangle}$ as a function of a fixed vector $v \in \mathbb{S}^{d-1}$, we will bound the average $\mathbb{E}_{v \sim \mathbb{S}^{d-1}} \mathbb{E}_{X \sim \mu}\, e^{\lambda \langle v, X \rangle}$. The latter is precisely the average of the standard MGF over all vectors $v \in \mathbb{S}^{d-1}$, hence why we refer to it as the “rotation-invariant MGF”.
How do bounds on this rotation-invariant MGF imply concentration inequalities? Since properties like sub-Gaussianity and sub-exponentiality of a distribution are by definition equivalent to MGF bounds which hold uniformly over all vectors $v$ [15], these properties of course imply the same bounds for the rotation-invariant MGF (simply average over $v$). The converse is in general lossy, in that a bound on the rotation-invariant MGF does not imply the same bound on the standard MGF uniformly over all vectors $v$. (It is possible to extract a weaker bound for the standard MGF, but this loses dimension-dependent factors, which leads to loose bounds. In contrast, our analysis is tight up to a small constant factor.) But this conversion is of course lossless for rotationally invariant distributions $\mu$, and this suffices for the purposes of this note.
Let us elaborate: in order to prove concentration inequalities for $\|X\|$ where $X$ is drawn from the (not rotationally symmetric) distribution $\pi_\eta$, it suffices to prove the same concentration inequalities for $\|\tilde{X}\|$, where $\tilde{X} = RX$ is a randomly rotated version of $X$, with $R$ a uniformly random rotation. Since our analysis in the sequel lets us bound the rotation-invariant MGF for the law of $\tilde{X}$, this immediately gives concentration inequalities for $\|\tilde{X}\|$ and thus also for $\|X\| = \|\tilde{X}\|$, as desired.
Briefly, the upshot of using this rotation-invariant MGF is that, unlike the standard MGF, it is a function of the norm $\|x\|$ (see §3.2), and as we describe in the sequel, this enables exploiting contractivity of the gradient descent step in the Langevin Algorithm update. We also mention that since our goal is to prove concentration inequalities on the norm, no information is lost by using the rotation-invariant MGF rather than the standard MGF, since the only difference is averaging over directions.
3.2 Explicit expression via Bessel functions
Of course, by rotational invariance, this Lyapunov function depends on the point $x \in \mathbb{R}^d$ only through its norm $\|x\|$. That is,
$$\Lambda(x) = \psi(\|x\|), \qquad (3.2)$$
where $\psi = \psi_{d,\lambda}$ is the univariate function given by
$$\psi(r) := \mathbb{E}_{v \sim \mathbb{S}^{d-1}}\left[ e^{\lambda r \langle v, e_1 \rangle} \right]. \qquad (3.3)$$
The following lemma provides an elegant closed-form expression for this univariate function $\psi$ in terms of the modified Bessel functions of the first kind $I_\nu$ and the Gamma function $\Gamma$.
Lemma 3.2 (Explicit formula for Lyapunov function).
For any dimension $d \in \mathbb{N}$, weight $\lambda > 0$, and argument $r > 0$,
$$\psi(r) = \Gamma\!\left(\tfrac{d}{2}\right) \left(\frac{2}{\lambda r}\right)^{\nu} I_\nu(\lambda r),$$
where $\nu := \tfrac{d}{2} - 1$. (In dimension $d = 1$, we simply have $\psi(r) = \cosh(\lambda r)$.)
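As a sanity check, the closed form of Lemma 3.2 can be compared against a direct Monte Carlo average over the sphere. The sketch below uses SciPy's modified Bessel function `iv` and Gamma function `gamma`; the helper names `psi_bessel` and `psi_monte_carlo` are our own.

```python
import numpy as np
from scipy.special import gamma, iv

def psi_bessel(r, lam, d):
    # Closed form from Lemma 3.2: psi(r) = Gamma(d/2) (2/(lam r))^nu I_nu(lam r), nu = d/2 - 1.
    nu = d / 2 - 1
    z = lam * r
    return gamma(d / 2) * (2 / z) ** nu * iv(nu, z)

def psi_monte_carlo(r, lam, d, n=400_000, seed=0):
    # Direct estimate of psi(r) = E_{v ~ Unif(S^{d-1})} exp(lam r <v, e_1>).
    g = np.random.default_rng(seed).standard_normal((n, d))
    v1 = g[:, 0] / np.linalg.norm(g, axis=1)   # first coordinate of a uniform point on the sphere
    return np.exp(lam * r * v1).mean()

d, lam, r = 10, 0.7, 3.0
print(psi_bessel(r, lam, d))                      # closed form
print(psi_monte_carlo(r, lam, d))                 # matches to a few decimals
print(psi_bessel(r, lam, 1), np.cosh(lam * r))    # d = 1 reduces to cosh(lam r)
```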
3.3 Properties of the Lyapunov function
The key reason we use the Lyapunov function $\Lambda$ to analyze the dynamics of the Langevin Algorithm is that the Langevin Algorithm's update (1.1) decomposes into a gradient descent step and a noise convolution step, and $\Lambda$ enables tracking the effect of both steps in an elegant manner.
The following lemma describes how $\Lambda$ is affected by the noise convolution. While a similar property holds for the standard MGF, the advantage of the rotation-invariant MGF over the standard MGF is that it enables exploiting contractivity of the gradient descent step. However, the effect of the gradient descent step is slightly different depending on whether the potential $f$ is just convex or also strongly convex, and thus is treated separately in the corresponding sections below.
Lemma 3.3 (Behavior of $\Lambda$ under Gaussian convolution).
For any dimension $d \in \mathbb{N}$, point $x \in \mathbb{R}^d$, weight $\lambda > 0$, and noise variance $\sigma^2 > 0$,
$$\mathbb{E}_{Z \sim \mathcal{N}(0, \sigma^2 I_d)}\left[ \Lambda(x + Z) \right] = e^{\lambda^2 \sigma^2 / 2}\, \Lambda(x).$$
Proof.
We compute:
$$\mathbb{E}_Z\left[\Lambda(x+Z)\right] = \mathbb{E}_Z\, \mathbb{E}_{v \sim \mathbb{S}^{d-1}}\left[ e^{\lambda \langle v, x+Z \rangle} \right] = \mathbb{E}_{v}\left[ e^{\lambda \langle v, x \rangle}\, \mathbb{E}_Z\, e^{\lambda \langle v, Z \rangle} \right] = e^{\lambda^2 \sigma^2/2}\, \mathbb{E}_{v}\left[ e^{\lambda \langle v, x \rangle} \right] = e^{\lambda^2 \sigma^2/2}\, \Lambda(x).$$
Above, the first and last steps are by definition of $\Lambda$; the second step is by Fubini's Theorem; and the third step is by the well-known formula for the moment generating function of the multivariate Gaussian distribution, namely $\mathbb{E}_Z\, e^{\langle u, Z \rangle} = e^{\sigma^2 \|u\|^2/2}$, applied to $u = \lambda v$ (see, e.g., [9, §5.8]). ∎
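The Gaussian-convolution identity of Lemma 3.3 is also easy to verify numerically. The following sketch estimates both sides by Monte Carlo (the sample sizes and parameter values are arbitrary illustrative choices).

```python
import numpy as np

rng = np.random.default_rng(1)
d, lam, sigma2 = 5, 0.8, 0.3
x = rng.standard_normal(d)

# Shared cloud of uniform points on the sphere, used to estimate
# Lambda(p) = E_{v ~ Unif(S^{d-1})} exp(lam <v, p>).
g = rng.standard_normal((5_000, d))
V = g / np.linalg.norm(g, axis=1, keepdims=True)

def Lambda(points):
    points = np.atleast_2d(points)
    return np.exp(lam * points @ V.T).mean()   # averages over sphere AND over rows of `points`

Z = np.sqrt(sigma2) * rng.standard_normal((1_000, d))   # Z ~ N(0, sigma2 * I_d)
lhs = Lambda(x + Z)                                     # E_Z [ Lambda(x + Z) ]
rhs = np.exp(lam**2 * sigma2 / 2) * Lambda(x)           # e^{lam^2 sigma2 / 2} * Lambda(x)
print(lhs, rhs)                                         # agree up to Monte Carlo error
```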
The next lemma collects various basic properties of $\psi$ that we use in our analysis in the sequel. We remark that in the setting of strongly convex potentials, our analysis only makes use of the positivity and monotonicity in item (i) of this lemma; the setting of convex potentials has a more involved analysis and requires the eventual exponential growth in item (ii) of this lemma.
Lemma 3.4 (Properties of rotation-invariant MGF).
For any dimension $d \in \mathbb{N}$ and weight $\lambda > 0$:

(i) The function $\psi$ is positive and increasing on $[0, \infty)$.

(ii) There exists some $\rho = \rho_{d,\lambda} \geq 0$ such that
$$\psi(r') \geq e^{\lambda (r' - r)/2}\, \psi(r)$$
for all $r' \geq r \geq \rho$.
Proof.
Item (i) is immediate because, after symmetrizing $v \mapsto -v$, $\psi(r) = \mathbb{E}_{v}\left[\cosh(\lambda r \langle v, e_1 \rangle)\right]$ is by definition an expectation over functions which are each positive and increasing in $r \geq 0$. For item (ii), observe that Lemma 3.2 expresses $\psi(r)$ as the prefactor $\Gamma(d/2)\, (2/(\lambda r))^{\nu}$, which varies only polynomially in $r$, times the Bessel function $I_\nu(\lambda r)$, which grows exponentially for large arguments due to the Hankel expansion $I_\nu(z) \sim e^z/\sqrt{2\pi z}$ [8, equation (10.40.1)]. The desired exponential growth immediately follows. (The “fudge factor” of $1/2$ in the exponent is used to simply and crudely bound the lower-order polynomial terms, and is irrelevant for our purposes in the sequel.) ∎
Although unnecessary for the purposes of this paper, we remark in passing that non-asymptotic bounds on the exponential growth in item (ii) of the above lemma can be computed in several ways. One way is to use more precise non-asymptotic bounds on Bessel functions since, as mentioned above, $\psi(r)$ is equal to $I_{d/2-1}(\lambda r)$ modulo a lower-order polynomial factor. Another way is to show that the logarithmic derivative of $\psi$ is eventually lower bounded by a constant; we cannot help but mention that an elegant identity for this purpose is that this logarithmic derivative simplifies to
$$\frac{\psi'(r)}{\psi(r)} = \lambda\, \frac{I_{d/2}(\lambda r)}{I_{d/2-1}(\lambda r)},$$
which is a ratio of Bessel functions of consecutive orders.
4 Sub-Gaussian concentration for strongly convex potentials
In this section we show that if the potential $f$ is strongly convex and smooth, then the stationary distribution $\pi_\eta$ of the Langevin Algorithm has sub-Gaussian tails. This parallels the classically known fact that in this strongly convex setting, the unbiased stationary distribution $\pi$ of the continuous Langevin Diffusion is sub-Gaussian [12].
Theorem 4.1 (Sub-Gaussianity of $\pi_\eta$ for strongly convex potentials).
Suppose $f$ is $m$-strongly convex and $M$-smooth for some $0 < m \leq M < \infty$. Consider running the Langevin Algorithm with stepsize $\eta < 2/M$. Then its stationary distribution $\pi_\eta$ satisfies
$$\mathbb{P}_{X \sim \pi_\eta}\left[ \|X - x^*\| \geq 4\sigma\sqrt{d} + 2\sigma\sqrt{2\log(1/\delta)} \right] \leq \delta \quad \text{for all } \delta \in (0,1),$$
where $x^*$ denotes the unique minimizer of $f$, $c := \max\{|1 - \eta m|,\ |1 - \eta M|\}$, and $\sigma^2 := 2\eta/(1-c)$.
The operational interpretation of $c$ is that it is the contraction coefficient corresponding to a gradient descent step with stepsize $\eta$ (see Lemma 4.2 below). Note that $c < 1$ because $0 < \eta < 2/M$. In the typical setting of $\eta \leq 2/(m+M)$, it simplifies to $c = 1 - \eta m$, whereby the variance proxy in Theorem 4.1 simplifies to $\sigma^2 = 2\eta/(\eta m) = 2/m$, matching the classical $O(1/m)$ sub-Gaussian parameter of $\pi$.
The rest of the section is organized as follows. In §4.1 we prove Theorem 4.1 by showing that the rotationally symmetrized version $\tilde{\pi}_\eta$ of $\pi_\eta$ is sub-Gaussian with variance proxy $\sigma^2 = 2\eta/(1-c)$, and in §4.2 we show that this bound is tight in all parameters (up to a constant factor of at most $2$).
4.1 Proof
The proof makes use of the following helper lemma. This fact is well known in convex optimization because it immediately implies a tight bound on the convergence rate of gradient descent for strongly convex, smooth objectives. A short proof can be found in, e.g., Appendix A.1 of [1].
Lemma 4.2 (Contractivity of gradient descent step).
Suppose $f$ is $m$-strongly convex and $M$-smooth over $\mathbb{R}^d$ for some $0 < m \leq M < \infty$, and consider any stepsize $\eta < 2/M$. Then
$$\|x - \eta \nabla f(x) - x^*\| \leq c\, \|x - x^*\| \quad \text{for all } x \in \mathbb{R}^d,$$
where $x^*$ is the unique minimizer of $f$, and $c := \max\{|1 - \eta m|,\ |1 - \eta M|\} < 1$.
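The contraction factor in Lemma 4.2 is easy to probe numerically. The sketch below checks it on a random diagonal quadratic whose curvatures lie in $[m, M]$, an assumption made purely for illustration so that the gradient map is available in closed form.

```python
import numpy as np

rng = np.random.default_rng(2)
m, M, eta, d = 0.5, 4.0, 0.3, 8                # note: eta < 2/M = 0.5
curv = rng.uniform(m, M, size=d)
curv[0], curv[1] = m, M                         # include the extreme curvatures
c = max(abs(1 - eta * m), abs(1 - eta * M))     # contraction coefficient of Lemma 4.2

# f(x) = sum_i curv_i x_i^2 / 2 is m-strongly convex and M-smooth, with x* = 0.
grad_f = lambda x: curv * x
for _ in range(100):
    x = 10 * rng.standard_normal(d)
    assert np.linalg.norm(x - eta * grad_f(x)) <= c * np.linalg.norm(x) + 1e-9
print("contraction verified; c =", c)
```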
Proof of Theorem 4.1.
To avoid cumbersome notation, let us assume that $f$ is centered to have minimizer $x^* = 0$ (this is without loss of generality after a translation). We bound the change in the Lyapunov function $\Lambda = \Lambda_{d,\lambda}$ from running an iteration of the Langevin Algorithm from some point $x$ to
$$X' := x - \eta \nabla f(x) + Z,$$
where $Z \sim \mathcal{N}(0, 2\eta I_d)$. To analyze this, we disassociate the two parts of this update: the gradient descent step and the noise convolution.
First, we analyze the gradient descent step:
$$\Lambda(x - \eta \nabla f(x)) = \psi(\|x - \eta \nabla f(x)\|) \leq \psi(c\|x\|) \leq \psi(\|x\|)^c = \Lambda(x)^c.$$
Above, the first and last steps are by definition of $\Lambda$ in terms of $\psi$ (see (3.2)); the second step is by contractivity of a gradient descent iteration (Lemma 4.2) and monotonicity of $\psi$ (item (i) of Lemma 3.4); and the third step is by an application of Jensen's inequality, namely $\psi(cr) = \mathbb{E}_v[(e^{\lambda r \langle v, e_1 \rangle})^c] \leq (\mathbb{E}_v[e^{\lambda r \langle v, e_1 \rangle}])^c = \psi(r)^c$, which applies since the function $u \mapsto u^c$ is concave for $c \in [0,1]$.
Second, we use Lemma 3.3 (with noise variance $\sigma^2 = 2\eta$) to bound the change in the Lyapunov function from the Gaussian noise convolution:
$$\mathbb{E}_Z\left[\Lambda(y + Z)\right] = e^{\lambda^2 \eta}\, \Lambda(y), \qquad y := x - \eta \nabla f(x).$$
By combining the above two displays, we conclude that
$$\mathbb{E}_Z\left[\Lambda(x - \eta \nabla f(x) + Z)\right] \leq e^{\lambda^2 \eta}\, \Lambda(x)^c. \qquad (4.1)$$
Now take expectations on both sides, drawing $X \sim \pi_\eta$. Note that by stationarity, the law of $X - \eta \nabla f(X) + Z$ is again $\pi_\eta$. Thus
$$\mathbb{E}_{X \sim \pi_\eta}\left[\Lambda(X)\right] \leq e^{\lambda^2 \eta}\, \mathbb{E}_{X \sim \pi_\eta}\left[\Lambda(X)^c\right] \leq e^{\lambda^2 \eta} \left( \mathbb{E}_{X \sim \pi_\eta}\left[\Lambda(X)\right] \right)^{c},$$
where above we have again used Jensen's inequality on the concave function $u \mapsto u^c$. Rearranging and simplifying yields
$$\mathbb{E}_{X \sim \pi_\eta}\left[\Lambda(X)\right] \leq e^{\lambda^2 \eta/(1-c)} = e^{\lambda^2 \sigma^2/2}. \qquad (4.2)$$
By definition of the Lyapunov function $\Lambda$, this implies that
$$\mathbb{E}_{X \sim \pi_\eta}\, \mathbb{E}_{R \sim \mathcal{R}}\left[ e^{\lambda \langle u, RX \rangle} \right] \leq e^{\lambda^2 \sigma^2/2} \quad \text{for every unit vector } u \text{ and every } \lambda > 0, \qquad (4.3)$$
where $\mathcal{R}$ denotes the Haar measure over rotations of $\mathbb{R}^d$. Define $\tilde{\pi}_\eta$ to be the law of $RX$ (in words, $\tilde{\pi}_\eta$ is a rotationally symmetrized version of $\pi_\eta$). Then by definition of sub-Gaussianity, the above display establishes that $\tilde{\pi}_\eta$ is sub-Gaussian with variance proxy $\sigma^2$. By a standard concentration inequality on the norm of a sub-Gaussian vector (see, e.g., [15, Theorem 1.19]),
$$\mathbb{P}_{\tilde{X} \sim \tilde{\pi}_\eta}\left[ \|\tilde{X}\| \geq 4\sigma\sqrt{d} + 2\sigma\sqrt{2\log(1/\delta)} \right] \leq \delta.$$
Since the Euclidean norm is invariant under rotations, this concentration inequality remains true with $\tilde{X} \sim \tilde{\pi}_\eta$ replaced by $X \sim \pi_\eta$ (cf. the discussion in §3.1). This concludes the proof. ∎
4.2 Tightness
Here we observe that, modulo a constant factor of at most $2$, the sub-Gaussianity we proved is tight in all parameters. It is simplest to explain this tightness in terms of the sub-Gaussian parameters, since the final concentration inequality in Theorem 4.1 was proved as a direct consequence of them. To this end, recall that in the proof of Theorem 4.1 above, we proved that the rotationally symmetrized version $\tilde{\pi}_\eta$ of $\pi_\eta$ is sub-Gaussian with variance proxy $\sigma^2 = 2\eta/(1-c)$. Following is a simple construction which matches this upper bound. This calculation is similar to [19, Example 4].
Example 4.3 (Tightness of sub-Gaussianity parameters).
Consider running the Langevin Algorithm on the univariate quadratic potential $f(x) = \frac{\alpha}{2} x^2$, where the curvature $\alpha \in \{m, M\}$ is chosen so as to maximize the contraction coefficient $c = |1 - \eta\alpha|$. Then, for any stepsize $\eta < 2/M$, the Langevin Algorithm has stationary distribution
$$\pi_\eta = \mathcal{N}\!\left(0,\ \frac{2\eta}{1 - c^2}\right).$$
(This fact is proved in [1, §4.2] and follows by simply unraveling the definition of a Langevin Algorithm update, composing $T$ times, and taking $T \to \infty$.) Note that $\tilde{\pi}_\eta = \pi_\eta$ in this example since $\pi_\eta$ is already rotationally symmetric. Moreover, since the sub-Gaussian variance proxy of this Gaussian distribution is exactly its variance $2\eta/(1-c^2)$, and since also $1 - c^2 \leq 2(1-c)$, this construction matches our aforementioned upper bound $\sigma^2 = 2\eta/(1-c)$ on the sub-Gaussian parameter of $\tilde{\pi}_\eta$ up to a constant factor of at most $2$.
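Example 4.3 can be reproduced in a few lines: simulate many independent chains on $f(x) = \frac{\alpha}{2}x^2$ and compare the empirical stationary variance to the exact value $2\eta/(1-c^2)$ and to our upper bound $2\eta/(1-c)$. The particular numbers below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(3)
alpha, eta, T, n_chains = 1.0, 0.5, 500, 20_000
c = abs(1 - eta * alpha)

x = np.zeros(n_chains)
for _ in range(T):
    x = x - eta * alpha * x + np.sqrt(2 * eta) * rng.standard_normal(n_chains)

print(x.var())                 # empirical stationary variance
print(2 * eta / (1 - c**2))    # exact variance of pi_eta = N(0, 2*eta/(1-c^2))
print(2 * eta / (1 - c))       # our upper bound sigma^2; larger by a factor 1+c <= 2
```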
5 Sub-exponential concentration for convex potentials
In this section we show that if the potential $f$ is convex and smooth, then the stationary distribution $\pi_\eta$ of the Langevin Algorithm has sub-exponential tails. This parallels the classically known fact that in this convex setting, the unbiased stationary distribution $\pi$ of the continuous Langevin Diffusion is sub-exponential (see Lemma 5.2 below).
Theorem 5.1 (Sub-exponentiality of $\pi_\eta$ for convex potentials).
Consider running the Langevin Algorithm on a convex, $M$-smooth potential $f$ with stepsize $\eta \leq 1/(2M)$. Then the stationary distribution $\pi_\eta$ of the Langevin Algorithm has sub-exponential concentration; i.e., there exist $r_0, \kappa, C > 0$ and a minimizer $x^*$ of $f$ such that
$$\mathbb{P}_{X \sim \pi_\eta}\left[ \|X - x^*\| \geq r_0 + t \right] \leq C e^{-\kappa t} \quad \text{for all } t \geq 0.$$
Note that in this sub-exponential concentration inequality, the parameters $r_0$, $\kappa$, and $C$ depend on $\pi$ and $\eta$ (this dependence is made explicit in the proof). A dependence on $\pi$ is inevitable since even the unbiased distribution $\pi$ has sub-exponential parameters which depend on $\pi$. As a simple concrete example, consider the potential $f(x) = \mathrm{dist}(x, B_R)$, where $B_R$ is the Euclidean ball of radius $R$ centered at the origin, and $\mathrm{dist}(x, B_R)$ is the Euclidean distance from $x$ to the ball. The distribution $\pi \propto e^{-f}$ then concentrates no better than the uniform distribution on the Euclidean ball of radius $R$, and a randomly drawn point from $\pi$ has norm of order $R$ with high probability. See also the tightness discussion in §5.2 for a fully worked-out example.
The rest of the section is organized as follows. In §5.1 we prove Theorem 5.1, and in §5.2 we discuss tightness of this bound.
5.1 Proof
We begin by recalling the classical fact from functional analysis that every log-concave distribution is sub-exponential. For a proof, see e.g., [3, Lemma 10.6.1].
Lemma 5.2 (Log-concavity implies sub-exponentiality).
If $\mu$ is a log-concave probability distribution on $\mathbb{R}^d$, then there exist $a > 0$ and $b \in \mathbb{R}$ such that its density satisfies $\mu(x) \leq e^{-a\|x\| + b}$ for all $x \in \mathbb{R}^d$.
To apply this lemma, it is convenient to first assume without loss of generality that $f$ is translated and shifted so that $0$ is a minimizer of $f$ with value $f(0) = 0$. Then $\pi(x) \propto e^{-f(x)}$ with $f \geq 0$. By Lemma 5.2, we conclude that
$$f(x) \geq a\|x\| - b \quad \text{for all } x \in \mathbb{R}^d, \qquad (5.1)$$
for some $a, b > 0$. (Here, $a$ plays the role of the quantity $a$ from Lemma 5.2, and $b$ plays the role of $b + \log \int e^{-f}$, which absorbs the normalization constant of $\pi$.) We exploit this super-linear growth (5.1) of $f$ to show that a gradient descent update on $f$ makes significant progress towards the minimizer when the current iterate has sufficiently large norm.
Lemma 5.3 (Gradient descent updates make significant progress for super-linear objectives).
Suppose $f$ is convex, $M$-smooth, achieves its minimum value $f(0) = 0$ at the origin, and satisfies (5.1). Then there exists $R := 2b/a$ such that for any stepsize $\eta \leq 1/(2M)$, it holds that
$$\|x - \eta \nabla f(x)\| \leq \|x\| - \frac{\eta a}{4} \quad \text{for all } \|x\| \geq R.$$
Proof.
We bound
$$\begin{aligned}
\|x - \eta \nabla f(x)\|^2 &= \|x\|^2 - 2\eta \langle \nabla f(x), x \rangle + \eta^2 \|\nabla f(x)\|^2 \\
&\leq \|x\|^2 - 2\eta \langle \nabla f(x), x \rangle + 2M\eta^2 f(x) \\
&\leq \|x\|^2 - 2\eta \langle \nabla f(x), x \rangle + \eta f(x) \\
&\leq \|x\|^2 - \eta f(x) \\
&\leq \|x\|^2 - \eta \left( a\|x\| - b \right) \\
&\leq \|x\|^2 - \frac{\eta a}{2}\|x\| \\
&\leq \left( \|x\| - \frac{\eta a}{4} \right)^2.
\end{aligned}$$
Above, the first step expands the square. The second step uses the folklore fact from optimization (see, e.g., [14, Equation (2.1.8)]) that $M$-smoothness implies $\|\nabla f(x)\|^2 \leq 2M(f(x) - \min f)$, which we use here for $\min f = 0$. The third step uses the assumption that $\eta \leq 1/(2M)$. The fourth step uses convexity and the assumption $f(0) = 0$ to bound $\langle \nabla f(x), x \rangle \geq f(x) - f(0) = f(x)$. The fifth step uses the super-linear growth condition (5.1). The sixth step uses the assumption $\|x\| \geq R = 2b/a$, so that $b \leq a\|x\|/2$. The final step adds the non-negative quantity $\eta^2 a^2/16$ to complete the square. We conclude that $\|x - \eta \nabla f(x)\| \leq \|x\| - \eta a/4$, as claimed. ∎
Proof of Theorem 5.1.
As in the proof of Theorem 4.1, we bound the change in the Lyapunov function $\Lambda = \Lambda_{d,\lambda}$ from running an iteration of the Langevin Algorithm from some point $x$ to
$$X' := x - \eta \nabla f(x) + Z,$$
where $Z \sim \mathcal{N}(0, 2\eta I_d)$. To analyze this, we disassociate the two parts of this update: the gradient descent step and the noise convolution.
First, we analyze the gradient descent step. We consider two cases depending on whether the norm of the current iterate is larger than $r_0 := \max(\rho, R) + \eta a/4$ (the extra $\eta a/4$ ensures that item (ii) of Lemma 3.4 is applicable below), where $\rho$ is the quantity from Lemma 3.4, and $R$ is the quantity from Lemma 5.3. (We will eventually set the weight $\lambda := a/16$.)
• If $\|x\| \geq r_0$, then Lemma 5.3 gives $\|x - \eta \nabla f(x)\| \leq \|x\| - \eta a/4$, whence by monotonicity and eventual exponential growth of $\psi$ (items (i) and (ii) of Lemma 3.4, applicable since $\|x\| - \eta a/4 \geq \rho$),
$$\Lambda(x - \eta \nabla f(x)) = \psi(\|x - \eta \nabla f(x)\|) \leq \psi\!\left(\|x\| - \frac{\eta a}{4}\right) \leq e^{-\lambda \eta a/8}\, \psi(\|x\|) = e^{-\lambda \eta a/8}\, \Lambda(x).$$

• If $\|x\| < r_0$, then since a gradient descent step with stepsize $\eta \leq 1/(2M) < 2/M$ does not increase the distance to the minimizer $0$, monotonicity of $\psi$ gives
$$\Lambda(x - \eta \nabla f(x)) = \psi(\|x - \eta \nabla f(x)\|) \leq \psi(\|x\|) \leq \psi(r_0).$$
Now, since $\psi$ is non-negative (item (i) of Lemma 3.4), we can crudely combine these two cases to conclude that, for every $x \in \mathbb{R}^d$,
$$\Lambda(x - \eta \nabla f(x)) \leq e^{-\lambda \eta a/8}\, \Lambda(x) + \psi(r_0).$$
Now use Lemma 3.3 (with noise variance $2\eta$) to bound the change in the Lyapunov function from the Gaussian noise convolution. Combined with the previous display, this yields
$$\mathbb{E}_Z\left[\Lambda(x - \eta \nabla f(x) + Z)\right] \leq e^{\lambda^2 \eta} \left( e^{-\lambda \eta a/8}\, \Lambda(x) + \psi(r_0) \right). \qquad (5.2)$$
Now take expectations on both sides of (5.2), drawing $X \sim \pi_\eta$. Note that by stationarity, the law of $X - \eta \nabla f(X) + Z$ is again $\pi_\eta$. Thus, writing $m := \mathbb{E}_{X \sim \pi_\eta}[\Lambda(X)]$,
$$m \leq e^{\lambda^2 \eta} \left( e^{-\lambda \eta a/8}\, m + \psi(r_0) \right).$$
Re-arranging and simplifying yields
$$m \leq \frac{e^{\lambda^2 \eta}\, \psi(r_0)}{1 - e^{\lambda^2 \eta - \lambda \eta a/8}},$$
which is valid whenever $\lambda < a/8$, so that the denominator is positive.
Setting $\lambda := a/16$ yields
$$\mathbb{E}_{X \sim \pi_\eta}\left[\Lambda(X)\right] \leq C, \qquad (5.3)$$
where for notational shorthand we have defined $C := e^{\lambda^2 \eta}\, \psi(r_0)/(1-\gamma)$ with $\gamma := e^{-\eta a^2/256}$.
From this, we argue a concentration inequality in a way analogous to Chernoff bounds, except using our rotation-invariant version of the moment generating function. Specifically, bound
$$\mathbb{P}_{X \sim \pi_\eta}\left[ \|X\| \geq r_0 + t \right] \leq \mathbb{P}_{X \sim \pi_\eta}\left[ \psi(\|X\|) \geq \psi(r_0 + t) \right] \leq \frac{\mathbb{E}_{X \sim \pi_\eta}\left[\psi(\|X\|)\right]}{\psi(r_0 + t)} \leq \frac{C}{\psi(r_0)}\, e^{-\lambda t/2} \quad \text{for all } t \geq 0.$$
Above, the first step is by monotonicity of $\psi$ (item (i) of Lemma 3.4), the second step is by Markov's inequality, which is applicable since $\psi$ is non-negative (item (i) of Lemma 3.4), and the third step is by (5.3) and the eventual exponential growth of $\psi$ (item (ii) of Lemma 3.4), which applies since $r_0 \geq \rho$. ∎
5.2 Tightness
Here we observe that, modulo a constant factor, the sub-exponential concentration bound that we proved for $\pi_\eta$ matches the concentration of $\pi$, both in terms of when the sub-exponentiality threshold occurs, and also in terms of the sub-exponential decay rate beyond this threshold. Although no techniques currently exist for extending these lower bounds from $\pi$ to $\pi_\eta$ (this is an interesting open question), we conjecture that this tightness extends from $\pi$ to $\pi_\eta$ since we suspect that $\pi_\eta$ should not have better concentration than $\pi$.
Example 5.4 (Tightness of sub-exponentiality parameters).
Consider running the Langevin Algorithm where the potential is the univariate Huber function
$$f(x) = \begin{cases} \dfrac{a x^2}{2s}, & |x| \leq s, \\[0.4em] a|x| - b, & |x| > s, \end{cases}$$
where $b := as/2$ is set so that $f$ and $f'$ are continuous at the breakpoints $\pm s$. Intuitively, modulo a translation and a quadratic smoothing around $0$, $f$ is essentially the absolute value function times $a$, and therefore $\pi$ is essentially the (two-sided) exponential distribution with parameter $a$. It is straightforward to check that $f$ is convex and $M$-smooth with $M = a/s$, and thus satisfies the conditions in Theorem 5.1. In this example, the quantities in the proof of Theorem 5.1 are as follows. To simplify the asymptotics, suppose that $as \gtrsim 1$.
• The quantities $a$ and $b$ in the super-linear growth inequality (5.1) exactly match the quantities $a$ and $b$ in the construction of this function $f$.
• The quantity $\rho$ from Lemma 3.4 can be taken of order $1/\lambda$ because in dimension $d = 1$, the Lyapunov function is $\psi(r) = \cosh(\lambda r)$, and a direct computation (comparing $\cosh(\lambda r)$ with $e^{\lambda r}/2$, which agree up to the factor $1 + e^{-2\lambda r}$) establishes the desired inequality $\psi(r') \geq e^{\lambda(r'-r)/2}\, \psi(r)$ for all $r' \geq r \geq \rho$.
• The quantity $R$ from Lemma 5.3 is $R = 2b/a = s$.
• The quantity $r_0 = \max(\rho, R) + \eta a/4 = \Theta(s)$, since $\rho \lesssim 1/a \lesssim s$ and $\eta a/4 \leq s/8$.
Therefore, for this potential $f$, Theorem 5.1 establishes sub-exponential tails that decay at rate $\Omega(a)$ beyond norms of size $O(s)$. Similarly, $\pi$ has sub-exponential tails that decay at a rate of order $a$ beyond norms of order $s$, because $\pi$ has tails that are similar to those of an exponential random variable with parameter $a$. We conclude that for all stepsizes $\eta$ that are not exponentially small in the problem parameters (this is a mild assumption, since in all relevant parameter regimes for sampling, $\eta$ is only polynomially small; and moreover, for exponentially small $\eta$, sub-exponential concentration of $\pi_\eta$ is immediate from combining any bias bound between $\pi_\eta$ and $\pi$, which depends polynomially on $\eta$, with the classical result that $\pi$ is sub-exponential (Lemma 5.2)), the concentration inequality in Theorem 5.1 matches that of $\pi$, both in terms of when the sub-exponentiality threshold occurs, as well as the rate of sub-exponential decay beyond this threshold.
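The following minimal simulation (with arbitrary parameter choices) illustrates this tightness discussion: the empirical tails of the Langevin Algorithm on the Huber potential decay exponentially at a rate close to $a$ beyond the threshold $s$, mirroring $\pi$.

```python
import numpy as np

# Simulation of Example 5.4: Langevin on the univariate Huber potential.
rng = np.random.default_rng(4)
a, s = 2.0, 1.5                      # slope a beyond the threshold s; M = a/s
eta = 0.1                            # comfortably below 2/M
fprime = lambda x: np.where(np.abs(x) <= s, a * x / s, a * np.sign(x))

x = np.zeros(50_000)                 # 50k independent chains, initialized at the minimizer
for _ in range(2000):
    x = x - eta * fprime(x) + np.sqrt(2 * eta) * rng.standard_normal(x.size)

# Tail log-probabilities decay roughly linearly in t with slope about -a,
# mirroring the exponential tails of pi ∝ e^{-f}.
for t in [1.0, 2.0, 3.0, 4.0]:
    print(t, np.log((np.abs(x) > t).mean()))
```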
6 Further extensions
6.1 Concentration along the trajectory
In this subsection we mention that the concentration we showed for $\pi_\eta$ extends to concentration of the law of the $T$-th iterate $X_T$ of the Langevin Algorithm, for every $T \in \mathbb{N}$. This is a straightforward extension of our proofs for $\pi_\eta$, recovers those results in the limit $T \to \infty$, and provides slightly tighter concentration bounds for every finite $T$. Moreover, it applies to both the settings of strongly and non-strongly convex potentials, which we studied in §4 and §5, respectively. We state the results for both settings below. These results are most simply stated when the algorithm is initialized at a minimizer $x^*$ of $f$, which is a standard pre-processing step for sampling algorithms (see, e.g., [7]) and can be done efficiently using the same access to first-order queries (e.g., via gradient descent).
Theorem 6.1 (Sub-Gaussianity of Langevin iterates for strongly convex potentials). In the setting of Theorem 4.1, suppose additionally that the Langevin Algorithm is initialized at $X_0 = x^*$. Then for every iteration $T \in \mathbb{N}$, the conclusion of Theorem 4.1 holds for the law of $X_T$, with variance proxy improved to $\sigma_T^2 := \frac{2\eta(1 - c^T)}{1 - c} \leq \sigma^2$.
Proof.
The proof is a simple extension of the proof of Theorem 4.1. Recall the key step (4.1) in that proof. By applying this inequality inductively along the iterates of the Langevin Algorithm (i.e., applying it with $X_t$ in place of $x$, taking expectations, and using Jensen's inequality on $u \mapsto u^c$ as before), we obtain the following recursion for the rotation-invariant MGF:
$$\mathbb{E}\left[\Lambda(X_{t+1})\right] \leq e^{\lambda^2 \eta} \left( \mathbb{E}\left[\Lambda(X_t)\right] \right)^{c}.$$
Moreover, note that $\mathbb{E}[\Lambda(X_0)] = \psi(0) = 1$ since the initialization is $X_0 = x^*$ (recall that we center so that $x^* = 0$). Unraveling this recursion, it follows that
$$\mathbb{E}\left[\Lambda(X_T)\right] \leq \exp\!\left( \lambda^2 \eta \left( 1 + c + \dots + c^{T-1} \right) \right) = \exp\!\left( \frac{\lambda^2 \eta (1 - c^T)}{1 - c} \right) = e^{\lambda^2 \sigma_T^2/2}$$
for all iterations $T \in \mathbb{N}$. This recovers (and improves, for any finite $T$) the upper bound in (4.2), which is the corresponding statement for the stationary distribution $\pi_\eta$ (a.k.a. the limit $T \to \infty$). The rest of the proof then follows identically as done there (simply use sub-Gaussian concentration). ∎
Theorem 6.2 (Sub-exponentiality of Langevin iterates for convex potentials). In the setting of Theorem 5.1, suppose additionally that the Langevin Algorithm is initialized at $X_0 = x^*$. Then for every iteration $T \in \mathbb{N}$, the law of $X_T$ satisfies the sub-exponential concentration inequality of Theorem 5.1.
Proof.
The proof is a simple extension of the proof of Theorem 5.1 (mirroring the extension described above for the strongly convex case). Recall the key step (5.2) in the proof of Theorem 5.1. By applying this inequality inductively along the iterates of the Langevin Algorithm (i.e., applying it with $X_t$ in place of $x$) and setting $\lambda := a/16$ as done in the proof of Theorem 5.1, we obtain the following recursion for the rotation-invariant MGF:
$$\mathbb{E}\left[\Lambda(X_{t+1})\right] \leq \gamma\, \mathbb{E}\left[\Lambda(X_t)\right] + (1 - \gamma)\, C,$$
where for shorthand we denote $\gamma := e^{-\eta a^2/256}$, and $C := e^{\lambda^2\eta}\psi(r_0)/(1-\gamma)$ is as before. Moreover, note that $\mathbb{E}[\Lambda(X_0)] = \psi(0) = 1$ since the initialization is $X_0 = x^*$. Unraveling this recursion, it follows that
$$\mathbb{E}\left[\Lambda(X_T)\right] \leq \gamma^T + (1 - \gamma^T)\, C \leq C \qquad (6.1)$$
for all iterations $T \in \mathbb{N}$. Above, the last step is by a crude bound that uses the facts $\gamma \leq 1$ and $C \geq 1$. This recovers (and improves, for any finite $T$) the upper bound in (5.3), which is the corresponding statement for the stationary distribution (a.k.a. the limit $T \to \infty$). The rest of the proof then follows identically as done there (i.e., by using a Chernoff-type argument). ∎
6.2 Using inexact gradients
In this subsection we mention that the results in this paper apply to inexact implementations of the Langevin Algorithm which use sub-Gaussian estimates of the true gradients. Specifically, here we consider implementations of the form
$$X_{t+1} = X_t - \eta\left( \nabla f(X_t) + \xi_t \right) + Z_t, \qquad (6.2)$$
where the gradient error $\xi_t$ is sub-Gaussian with some variance proxy $\tau^2$ and, conditional on $X_t$, is independent of all other randomness.
This extension applies to all settings in this paper: both the settings of strongly and non-strongly convex potentials, and both the concentration of the stationary distribution as well as of the entire trajectory of the Langevin Algorithm. For brevity, we state these results for the trajectory; the same bounds then hold for the stationary distribution by taking the limit $T \to \infty$. In what follows, we use the standard convention that a random vector $\xi$ being sub-Gaussian with variance proxy $\tau^2$ means $\mathbb{E}[\xi] = 0$ and $\mathbb{E}\left[e^{\lambda \langle u, \xi \rangle}\right] \leq e^{\lambda^2 \tau^2/2}$ for any unit vector $u$ and any $\lambda \in \mathbb{R}$.
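As an illustration of (6.2), the sketch below runs the inexact Langevin Algorithm on the quadratic $f(x) = \|x\|^2/2$ with Gaussian gradient errors $\xi_t \sim \mathcal{N}(0, \tau^2 I)$, which are sub-Gaussian with variance proxy $\tau^2$ in the above sense. For this quadratic, the per-coordinate stationary variance can be computed exactly, and it exhibits the $(2\eta + \eta^2\tau^2)$ enlargement appearing in the proof of Theorem 6.3 below.

```python
import numpy as np

rng = np.random.default_rng(5)
d, eta, tau2, T, n_chains = 5, 0.1, 0.5, 400, 20_000
c = 1 - eta                                    # contraction coefficient for f(x) = ||x||^2 / 2

X = np.zeros((n_chains, d))                    # initialized at the minimizer x* = 0
for _ in range(T):
    xi = np.sqrt(tau2) * rng.standard_normal(X.shape)     # sub-Gaussian gradient error
    Z = np.sqrt(2 * eta) * rng.standard_normal(X.shape)   # Langevin noise
    X = X - eta * (X + xi) + Z                            # update (6.2), grad f(x) = x

print(X.var())                                            # empirical per-coordinate variance
print((2 * eta + eta**2 * tau2) / (1 - c**2))             # exact value for this quadratic
```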
Theorem 6.3 (Sub-Gaussianity of inexact Langevin for strongly convex potentials). In the setting of Theorem 6.1, consider instead running the inexact Langevin Algorithm (6.2). Then the same conclusion holds with the variance proxy enlarged to $\sigma_T^2 := \frac{(2\eta + \eta^2\tau^2)(1 - c^T)}{1 - c}$.
Proof.
By a nearly identical argument, the key recursion (4.1) becomes
$$\mathbb{E}_{Z, \xi}\left[\Lambda\big(x - \eta(\nabla f(x) + \xi) + Z\big)\right] \leq e^{\lambda^2 (\eta + \eta^2 \tau^2/2)}\, \Lambda(x)^c,$$
where the only difference is the extra factor of $e^{\lambda^2 \eta^2 \tau^2/2}$ arising from the inexact gradients and the definition of sub-Gaussianity (the error term $-\eta\xi$ is sub-Gaussian with variance proxy $\eta^2\tau^2$). The rest of the proof is then identical, with the sub-Gaussian proxy changing from $2\eta/(1-c)$ to $(2\eta + \eta^2\tau^2)/(1-c)$ due to this extra term. ∎
Theorem 6.4 (Sub-exponentiality of inexact Langevin for convex potentials). In the setting of Theorem 6.2, consider instead running the inexact Langevin Algorithm (6.2). Then the same sub-exponential concentration conclusion holds, with the constants $\lambda$, $\gamma$, and $C$ adjusted as described in the proof below.
Proof.
By a nearly identical argument, the key recursion (5.2) becomes
$$\mathbb{E}_{Z, \xi}\left[\Lambda\big(x - \eta(\nabla f(x) + \xi) + Z\big)\right] \leq e^{\lambda^2 (\eta + \eta^2 \tau^2/2)} \left( e^{-\lambda \eta a/8}\, \Lambda(x) + \psi(r_0) \right),$$
due to the extra sub-Gaussian term, as explained in the above proof for the strongly convex case. The rest of the proof is then identical to that of Theorem 6.2, except with $\eta$ in the exponential prefactor now replaced by $\eta + \eta^2\tau^2/2$, so that $\gamma$ now becomes $e^{\lambda^2(\eta + \eta^2\tau^2/2) - \lambda \eta a/8}$ (with $\lambda$ decreased if necessary so that $\gamma < 1$), which just changes the definition of $C$. ∎
Acknowledgements.
We thank Sinho Chewi, Murat Erdogdu, and Andre Wibisono for helpful conversations about the related literature. We also thank Sinho Chewi for sharing an early draft of his book [7] on log-concave sampling.
References
- Altschuler and Talwar [2023] Jason M. Altschuler and Kunal Talwar. Resolving the mixing time of the Langevin Algorithm to its stationary distribution for log-concave sampling. In Conference on Learning Theory, pages 2509–2510, 2023.
- Andrieu et al. [2003] Christophe Andrieu, Nando De Freitas, Arnaud Doucet, and Michael I Jordan. An introduction to MCMC for machine learning. Machine Learning, 50(1):5–43, 2003.
- Artstein-Avidan et al. [2015] Shiri Artstein-Avidan, Apostolos Giannopoulos, and Vitali D Milman. Asymptotic Geometric Analysis, Part I, volume 202. American Mathematical Society, 2015.
- Bakry and Émery [1985] Dominique Bakry and Michel Émery. Diffusions hypercontractives. In Seminaire de probabilités XIX 1983/84, pages 177–206. Springer, 1985.
- Bhattacharya [1978] RN Bhattacharya. Criteria for recurrence and existence of invariant measures for multidimensional diffusions. The Annals of Probability, pages 541–553, 1978.
- Bobkov [2003] Sergey G Bobkov. Spectral gap and concentration for some spherically symmetric probability measures. In Geometric Aspects of Functional Analysis, pages 37–43. Springer, 2003.
- Chewi [2022] Sinho Chewi. Log-Concave Sampling. Forthcoming, 2022.
- [8] DLMF. NIST Digital Library of Mathematical Functions. http://dlmf.nist.gov/, Release 1.0.23 of 2019-06-15. F. W. J. Olver, A. B. Olde Daalhuis, D. W. Lozier, B. I. Schneider, R. F. Boisvert, C. W. Clark, B. R. Miller, and B. V. Saunders, eds.
- Grimmett and Stirzaker [2020] Geoffrey Grimmett and David Stirzaker. Probability and Random Processes. Oxford University Press, 2020.
- Jerrum and Sinclair [1996] Mark Jerrum and Alistair Sinclair. The Markov chain Monte Carlo method: an approach to approximate counting and integration. Approximation Algorithms for NP-hard problems, PWS Publishing, 1996.
- Kannan et al. [1995] Ravi Kannan, László Lovász, and Miklós Simonovits. Isoperimetric problems for convex bodies and a localization lemma. Discrete & Computational Geometry, 13(3):541–559, 1995.
- Ledoux [1999] Michel Ledoux. Concentration of measure and logarithmic Sobolev inequalities. In Seminaire de probabilites XXXIII, pages 120–216. Springer, 1999.
- Liu [2001] Jun S Liu. Monte Carlo strategies in scientific computing, volume 10. Springer, 2001.
- Nesterov [2003] Yurii Nesterov. Introductory lectures on convex optimization: A basic course, volume 87. Springer Science & Business Media, 2003.
- Rigollet and Hütter [2015] Phillippe Rigollet and Jan-Christian Hütter. High dimensional statistics. Lecture notes for MIT course 18.S997, 2015.
- Robert and Casella [1999] Christian P Robert and George Casella. Monte Carlo statistical methods, volume 2. Springer, 1999.
- Roberts and Tweedie [1996] Gareth O Roberts and Richard L Tweedie. Exponential convergence of Langevin distributions and their discrete approximations. Bernoulli, pages 341–363, 1996.
- Szegő [1939] Gábor Szegő. Orthogonal polynomials, volume 23. American Mathematical Society, 1939.
- Vempala and Wibisono [2019] Santosh Vempala and Andre Wibisono. Rapid convergence of the unadjusted Langevin algorithm: Isoperimetry suffices. In Advances in Neural Information Processing Systems, pages 8092–8104. 2019.