
Gibbs measures with multilinear forms

Sohom Bhattacharya, Department of Operations Research and Financial Engineering, Princeton University, Princeton, NJ 08540, USA. [email protected]

Nabarun Deb, Econometrics and Statistics, University of Chicago Booth School of Business, Chicago, IL 60637, USA. [email protected]

Sumit Mukherjee, Department of Statistics, Columbia University, New York, NY 10027, USA. [email protected]
Abstract.

In this paper, we study a class of multilinear Gibbs measures with Hamiltonian given by a generalized $\mathrm{U}$-statistic and with a general base measure. Expressing the asymptotic free energy as an optimization problem over a space of functions, we obtain necessary and sufficient conditions for replica-symmetry. Utilizing this, we obtain weak limits for a large class of statistics of interest, which includes the "local fields/magnetizations", the Hamiltonian, the global magnetization, etc. An interesting consequence is a universal weak law for contrasts under replica symmetry, namely, $n^{-1}\sum_{i=1}^{n}c_{i}X_{i}\to 0$ weakly, if $\sum_{i=1}^{n}c_{i}=o(n)$. Our results yield a probabilistic interpretation for the optimizers arising out of the limiting free energy. We also prove the existence of a sharp phase transition point in terms of the temperature parameter, thereby generalizing existing results that were only known for quadratic Hamiltonians. As a by-product of our proof technique, we obtain exponential concentration bounds on local and global magnetizations, which are of independent interest.

Key words and phrases:
Graph limits, magnetization, phase transition, replica-symmetry, tensor Ising model
1991 Mathematics Subject Classification:
82B20, 05C80
The third author’s research is partially supported by NSF grant DMS-2113414.

1. Introduction

Suppose $\mu$ is a (non-degenerate) probability measure on $\mathbb{R}$. Let $H=(V(H),E(H))$ be a finite graph with $v:=|V(H)|\geq 2$ vertices labeled $[v]=\{1,2,\ldots,v\}$, and maximum degree $\Delta$. Fixing $\theta\in\mathbb{R}$, define

(1.1) Z_{n}(\theta):=\frac{1}{n}\log\mathbb{E}_{\mu^{\otimes n}}e^{n\theta\mathbb{U}_{n}({\bf X})}\in(-\infty,\infty],

where $\mathbb{U}_{n}({\bf X})$ is a multilinear form, defined by

(1.2) \mathbb{U}_{n}({\bf X}):=\frac{1}{n^{v}}\sum_{(i_{1},\ldots,i_{v})\in\mathcal{S}(n,v)}\Big(\prod_{a=1}^{v}X_{i_{a}}\Big)\prod_{(a,b)\in E(H)}Q_{n}(i_{a},i_{b}).

Here $\mathcal{S}(n,v)$ is the set of all tuples of distinct indices from $[n]^{v}$ (so that $|\mathcal{S}(n,v)|=v!\binom{n}{v}$), and $Q_{n}$ is a symmetric $n\times n$ matrix with $0$ on the diagonal. If $\theta$ is such that $Z_{n}(\theta)$ is finite, we can define a Gibbs probability measure $\mathbb{R}_{n,\theta}$ on $\mathbb{R}^{n}$ by setting

(1.3) \frac{d\mathbb{R}_{n,\theta}}{d\mu^{\otimes n}}({\bf x})=\exp\Big(n\theta\mathbb{U}_{n}({\bf x})-nZ_{n}(\theta)\Big).

Several Gibbs measures of interest can be expressed in the form (1.3) with suitable choices of $(Q_{n},H,\mu)$. Below we give two examples of such Gibbs measures which have been well studied in probability and statistics.

  • If $H=K_{2}$ is an edge, then

    \frac{d\mathbb{R}_{n,\theta}}{d\mu^{\otimes n}}({\bf x})=\exp\left(\frac{\theta}{n}\sum_{i\neq j}Q_{n}(i,j)x_{i}x_{j}-nZ_{n}(\theta)\right)

    is a Gibbs measure with a quadratic Hamiltonian. In particular, if $\mu$ is supported on $\{-1,1\}$, then $\mathbb{R}_{n,\theta}$ is the celebrated Ising model on $\{-1,1\}^{n}$ with coupling matrix $Q_{n}$ (see [24, 3, 12, 1] for various examples). Popular choices of $Q_{n}$ include the adjacency matrices of the complete graph, line graphs, and random graphs such as Erdős–Rényi or random $d$-regular graphs.

  • If $H=K_{v}$ is the complete graph on $v$ vertices, and $Q_{n}$ is the adjacency matrix of a complete graph, then model (1.3) reduces to

    \frac{d\mathbb{R}_{n,\theta}}{d\mu^{\otimes n}}({\bf x})=\exp\left(\frac{\theta}{n^{v-1}}\sum_{(i_{1},\ldots,i_{v})\in\mathcal{S}(n,v)}\prod_{a=1}^{v}x_{i_{a}}-nZ_{n}(\theta)\right).

    For the special case where $\mu$ is supported on $\{-1,1\}$, $\mathbb{R}_{n,\theta}$ is the $v$-spin version of the Curie–Weiss model, which has attracted attention in recent years (see [29, 28, 33, 15]).
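For small $n$, the multilinear form (1.2) can be evaluated by brute force, which is a convenient way to sanity-check the two specializations above. The sketch below is our own illustration (not code from the paper); it enumerates the distinct tuples in $\mathcal{S}(n,v)$ directly.

```python
import itertools
import numpy as np

def multilinear_form(X, Q, H_edges, v):
    """Brute-force evaluation of U_n(X) in (1.2).

    X       : length-n sample vector
    Q       : symmetric n x n coupling matrix with zero diagonal
    H_edges : edge list of H on vertices 0, ..., v-1
    v       : number of vertices of H
    """
    n = len(X)
    total = 0.0
    # S(n, v) = all tuples of v distinct indices from [n]
    for idx in itertools.permutations(range(n), v):
        term = np.prod([X[i] for i in idx])
        for a, b in H_edges:
            term *= Q[idx[a], idx[b]]
        total += term
    return total / n**v
```

For $H=K_{2}$ this reduces to $\frac{1}{n^{2}}\sum_{i\neq j}Q_{n}(i,j)x_{i}x_{j}$, matching the first example above (the zero diagonal of $Q_{n}$ makes the sum over distinct pairs and the full quadratic form agree).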

In this paper, we study the generalized model (1.3) when the sequence of matrices $\{Q_{n}\}_{n\geq 1}$ converges in the weak cut metric (defined in (1.4)). Our main contributions are:

(a) We give an exact characterization of replica-symmetry for the asymptotic free energy/log-partition function (see Theorem 1.2).

(b) We obtain weak limits for a large family of statistics which includes the Hamiltonian, "local magnetizations", the global magnetization, and contrasts (see Theorems 1.3 and 1.6).

(c) We provide tail bounds for global and local magnetizations (see Theorem 1.4).

(d) We show the existence of a "phase transition" for multilinear Gibbs measures of the form (1.3) with compactly supported $\mu$ (see Theorem 1.9).

1.1. Main results

To establish our main results, we will assume throughout that the sequence of matrices $\{Q_{n}\}_{n\geq 1}$ converges in the weak cut distance (defined below). The cut metric was introduced in the combinatorics literature to study limits of graphs and matrices (see [21]), and has received significant attention in the recent literature ([8, 9, 10, 11]). For more details on the cut metric and its manifold applications, we refer the interested reader to [27]. Below we formally introduce the notions of strong and weak cut distance used in this paper.

Definition 1.1.

Suppose $\mathcal{W}$ is the space of all symmetric real-valued functions in $L^{1}([0,1]^{2})$. Given two functions $W_{1},W_{2}\in\mathcal{W}$, define the strong cut distance between $W_{1}$ and $W_{2}$ by setting

d_{\square}(W_{1},W_{2}):=\sup_{S,T}\Big|\int_{S\times T}\big[W_{1}(x,y)-W_{2}(x,y)\big]\,dx\,dy\Big|.

In the above display, the supremum is taken over all measurable subsets $S,T$ of $[0,1]$. Define the weak cut distance by

\delta_{\square}(W_{1},W_{2}):=\inf_{\sigma}d_{\square}(W^{\sigma}_{1},W_{2})=\inf_{\sigma}d_{\square}(W_{1},W^{\sigma}_{2}),

where $\sigma$ ranges over all measure-preserving bijections $[0,1]\rightarrow[0,1]$ and $W^{\sigma}(x,y):=W(\sigma(x),\sigma(y))$.

Given a symmetric matrix $Q_{n}$, define a function $W_{Q_{n}}\in\mathcal{W}$ by setting

W_{Q_{n}}(x,y):=Q_{n}(i,j)\ \text{ if }\lceil nx\rceil=i,\ \lceil ny\rceil=j.

We will assume throughout the paper that the sequence of matrices $\{Q_{n}\}_{n\geq 1}$ introduced in (1.2) converges in the weak cut distance, i.e., for some $W\in\mathcal{W}$,

(1.4) \delta_{\square}(W_{Q_{n}},W)\rightarrow 0.
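Since $d_{\square}$ involves a supremum over measurable sets, it may help to note that for a step kernel $W_{Q_{n}}$ the supremum is attained on unions of the $n$ cells, so for small $n$ it can be computed exactly by enumeration. The following is our own illustrative sketch, not part of the paper.

```python
import itertools
import numpy as np

def cut_distance_step(W1, W2):
    """Strong cut distance d_box between two n x n step kernels, where
    entry (i, j) is the kernel's constant value on the (i, j) cell of
    the uniform n x n grid on [0,1]^2.  For step kernels the supremum
    over S, T is attained on unions of cells, so we enumerate all 2^n
    choices of S; the cost is exponential -- illustration only."""
    n = W1.shape[0]
    D = (W1 - W2) / n**2            # integral of the difference over one cell
    best = 0.0
    for S in itertools.product([False, True], repeat=n):
        s = np.array(S)
        col = D[s, :].sum(axis=0)   # partial integral over S, per column cell
        # given S, the best T keeps exactly the cells of one sign
        best = max(best, col[col > 0].sum(), -col[col < 0].sum())
    return best
```

For instance, the constant kernels $W\equiv 1$ and $W\equiv 0$ are at cut distance $1$, while any kernel is at distance $0$ from itself.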

We now introduce some notation that will be used throughout the rest of the paper.

Definition 1.2.

Let $\mathcal{M}$ denote the set of probability measures on $[0,1]\times\mathbb{R}$, equipped with the weak topology. Given a probability measure $\nu\in\mathcal{M}$, let $\nu_{(1)}$ and $\nu_{(2)}$ denote its first and second marginals respectively. Also define $\mathfrak{m}_{p}(\nu):=\int|x|^{p}\,d\nu_{(2)}(x)$ for $p\geq 0$. Define $\widetilde{\mathcal{M}}\subseteq\mathcal{M}$ as follows:

\widetilde{\mathcal{M}}:=\{\nu\in\mathcal{M}:\ \nu_{(1)}=\mathrm{Unif}[0,1]\}.

Also define $\widetilde{\mathcal{M}}_{p}\subseteq\widetilde{\mathcal{M}}$ as follows:

(1.5) \widetilde{\mathcal{M}}_{p}:=\{\nu\in\widetilde{\mathcal{M}}:\ \mathfrak{m}_{p}(\nu)<\infty\}.

Note that $\widetilde{\mathcal{M}}_{p}$ is a closed subset of $\widetilde{\mathcal{M}}$ (by Fatou's lemma), and $\widetilde{\mathcal{M}}$ is a closed subset of $\mathcal{M}$, in the weak topology. For two measures $\nu_{1},\nu_{2}$ on $[0,1]\times\mathbb{R}$, define

d_{\ell}(\nu_{1},\nu_{2}):=\sup_{f\in\mathrm{Lip}(1)}\Big|\int f\,d\nu_{1}-\int f\,d\nu_{2}\Big|,

where the supremum is over the set $\mathrm{Lip}(1)$ of functions $f:[0,1]\times\mathbb{R}\mapsto[-1,1]$ which are $1$-Lipschitz.

We now introduce the exponential tilt of the base measure $\mu$, and some related notation. This requires the following assumption, which we make throughout the paper: for all $\lambda>0$ and some $p\in[1,\infty]$, we have

(1.6) \mathbb{E}_{\mu}e^{\lambda|X_{1}|^{p}}<\infty,

where the case $p=\infty$ corresponds to assuming that $\mu$ is compactly supported.

Definition 1.3.

Given (1.6), the function

\alpha(\theta):=\log\int_{\mathbb{R}}e^{\theta x}\,d\mu(x)

is finite for all $\theta\in\mathbb{R}$. Define the $\theta$-exponential tilt of $\mu$ by setting

\frac{d\mu_{\theta}}{d\mu}(x):=\exp(\theta x-\alpha(\theta)).

Then the function $\alpha(\cdot)$ is infinitely differentiable, with

\alpha^{\prime}(\theta)=\mathbb{E}_{\mu_{\theta}}(X),\quad\alpha^{\prime\prime}(\theta)=\mathrm{Var}_{\mu_{\theta}}(X)>0.

Consequently the function $\alpha^{\prime}(\cdot)$ is strictly increasing on $\mathbb{R}$, and has an inverse $\beta(\cdot):\mathcal{N}\mapsto\mathbb{R}$, where $\mathcal{N}:=\alpha^{\prime}(\mathbb{R})$ is an open interval. Let $\mathrm{cl}$ denote the closure of a set in $\mathbb{R}$, and extend $\beta(\cdot)$ to a (possibly infinite-valued) function on $\mathrm{cl}(\mathcal{N})$ by setting

\beta(\sup\{\mathcal{N}\})=+\infty\ \text{ if }\sup\{\mathcal{N}\}<\infty,\qquad\beta(\inf\{\mathcal{N}\})=-\infty\ \text{ if }\inf\{\mathcal{N}\}>-\infty.

We write $D(\cdot\|\cdot)$ to denote the standard Kullback–Leibler divergence. Define a function $\gamma:\beta(\mathrm{cl}(\mathcal{N}))\mapsto[0,\infty]$ by setting

\gamma(\theta):=D(\mu_{\theta}\|\mu)=\theta\alpha^{\prime}(\theta)-\alpha(\theta)\ \text{ if }\theta\in\beta(\mathcal{N})=\mathbb{R},
\gamma(\infty):=D(\delta_{\sup\{\mathcal{N}\}}\|\mu)\ \text{ if }\sup\{\mathcal{N}\}<\infty,
\gamma(-\infty):=D(\delta_{\inf\{\mathcal{N}\}}\|\mu)\ \text{ if }\inf\{\mathcal{N}\}>-\infty.
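For a finitely supported base measure, the tilt $\mu_{\theta}$ and the quantities $\alpha^{\prime}(\theta)$, $\alpha^{\prime\prime}(\theta)$ of Definition 1.3 can be computed directly by reweighting. The sketch below is our own illustration; for $\mu$ uniform on $\{-1,1\}$ it recovers $\alpha(\theta)=\log\cosh\theta$, so $\alpha^{\prime}(\theta)=\tanh\theta$ and $\alpha^{\prime\prime}(\theta)=1-\tanh^{2}\theta$.

```python
import numpy as np

def tilted_moments(support, probs, theta):
    """Mean and variance of the theta-exponential tilt mu_theta of a
    finitely supported base measure mu, as in Definition 1.3."""
    support = np.asarray(support, dtype=float)
    w = np.asarray(probs, dtype=float) * np.exp(theta * support)
    w /= w.sum()                          # normalizing divides out exp(alpha(theta))
    mean = np.dot(w, support)             # alpha'(theta) = E_{mu_theta}(X)
    var = np.dot(w, support**2) - mean**2 # alpha''(theta) = Var_{mu_theta}(X) > 0
    return mean, var
```

For example, `tilted_moments([-1, 1], [0.5, 0.5], theta)` returns $(\tanh\theta,\ 1-\tanh^{2}\theta)$.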
Definition 1.4.

Let $\mathcal{L}$ denote the space of all measurable functions $f:[0,1]\mapsto\mathrm{cl}(\mathcal{N})$ such that $\int_{0}^{1}|f(u)|^{p}\,du<\infty$. Define a map $\Xi:\mathcal{L}\mapsto\widetilde{\mathcal{M}}$ as follows:

For any $f\in\mathcal{L}$, if $(U,V)\sim\Xi(f)$, then $U\sim\mathrm{Unif}[0,1]$, and given $U=u$, one has

V\sim\mu_{\beta(f(u))}\ \text{ if }f(u)\in\mathcal{N},
V=\sup\{\alpha^{\prime}(\mathbb{R})\}\ \text{ if }f(u)=\sup\{\mathcal{N}\}\quad\text{(this can only happen if }\sup\{\mathcal{N}\}<\infty\text{)},
V=\inf\{\alpha^{\prime}(\mathbb{R})\}\ \text{ if }f(u)=\inf\{\mathcal{N}\}\quad\text{(this can only happen if }\inf\{\mathcal{N}\}>-\infty\text{)}.
Definition 1.5.

Fix $W\in\mathcal{W}$ and let $\mathcal{L}$ be as defined above. Define the functional $G_{W}(\cdot):\mathcal{L}\mapsto\mathbb{R}$ by setting

G_{W}(f):=\int_{[0,1]^{v}}\left(\prod_{(a,b)\in E(H)}W(x_{a},x_{b})\right)\left(\prod_{a=1}^{v}f(x_{a})\,dx_{a}\right),

whenever $G_{|W|}(|f|)<\infty$ (see Proposition 1.1 below for sufficient conditions).

Finally, let $\mathfrak{L}_{n}(\cdot)$ be the map from $\mathbb{R}^{n}$ to $\mathcal{M}$ defined by

(1.7) \mathfrak{L}_{n}(\mathbf{x}):=\frac{1}{n}\sum_{i=1}^{n}\delta_{(\frac{i}{n},x_{i})},\qquad\mathbf{x}=(x_{1},\ldots,x_{n}).

The following proposition characterizes the asymptotics of the log-partition function/free energy in terms of an infinite-dimensional optimization problem, and characterizes the class of optimizers in terms of a fixed-point equation.

Proposition 1.1.

Suppose that $\mu$ satisfies (1.6) for some $p\geq v$ and all $\lambda>0$. Let $\{Q_{n}\}_{n\geq 1}$ be a sequence of matrices such that (1.4) holds for some $W\in\mathcal{W}$, and

(1.8) \limsup_{n\to\infty}\lVert W_{Q_{n}}\rVert_{q\Delta}<\infty,

for some $q>1$ such that $\frac{1}{p}+\frac{1}{q}\leq 1$. Then the following conclusions hold.

(i) The function $G_{W}(\cdot)$ is well-defined on $\mathcal{L}$, i.e., $G_{|W|}(|f|)<\infty$ for all $f\in\mathcal{L}$.

(ii) With $Z_{n}(\theta)$ as in (1.1), we have $\sup_{n\geq 1}Z_{n}(\theta)<\infty$ and

(1.9) \lim_{n\to\infty}Z_{n}(\theta)=\sup_{t\in\mathbb{R}:\,I(t)<\infty}\{\theta t-I(t)\}=\sup_{f\in\mathcal{L}:\ \int_{[0,1]}\gamma(\beta(f(x)))\,dx<\infty}\left\{\theta G_{W}(f)-\int_{[0,1]}\gamma(\beta(f(x)))\,dx\right\}=:Z(\theta).

(iii) The supremum in (1.9) is achieved on a set $F_{\theta}\subseteq\mathcal{L}$ (say), which satisfies

(1.10) d_{\ell}(\mathfrak{L}_{n}({\bf X}),\Xi(F_{\theta}))\overset{P}{\longrightarrow}0

under ${\bf X}\sim\mathbb{R}_{n,\theta}$ (as in (1.3)), where $\Xi$ is as in Definition 1.4. Further, $\Xi(F_{\theta})$ is compact in the weak topology.

The above proposition follows from [4, Theorems 1.1 and 1.6].

Remark 1.1.

Under assumptions (1.4) and (1.8), [9, Theorem 2.13] gives

(1.11) \|W\|_{q\Delta}<\infty,

for any $q>1$, a fact that we use throughout the paper. We note in passing that under stronger assumptions on $H$ and $\mu$ (similar to [4, Theorem 1.2]), it is possible to forgo the requirement (1.8) and replace it with weaker assumptions.

1.1.1. Replica-symmetry

The above proposition shows that the infinite-dimensional optimization problem in the second line of (1.9) is useful for understanding the Gibbs measure $\mathbb{R}_{n,\theta}$ (see parts (iii) and (iv)). A natural question is when the set of optimizers of (1.9) consists only of constant functions. Equivalently, borrowing terminology from statistical physics, we want to understand the "replica-symmetric" phase of the Gibbs measure $\mathbb{R}_{n,\theta}$. Our first main result provides necessary and sufficient conditions for the optimizers to be constant functions. For this we need the following two definitions.

Definition 1.6.

Given a symmetric matrix $Q_{n}$, define a symmetric tensor

\mathrm{Sym}[Q_{n}](i_{1},\ldots,i_{v}):=\frac{1}{v!}\sum_{\sigma\in S_{v}}\prod_{(a,b)\in E(H)}Q_{n}\big(i_{\sigma(a)},i_{\sigma(b)}\big),

where $S_{v}$ denotes the set of all permutations of $[v]$. In a similar vein, given a symmetric function $W\in\mathcal{W}$, define the symmetric function

\mathrm{Sym}[W](x_{1},\ldots,x_{v}):=\frac{1}{v!}\sum_{\sigma\in S_{v}}\prod_{(a,b)\in E(H)}W\big(x_{\sigma(a)},x_{\sigma(b)}\big).

As an example, if $H=K_{1,2}$, then $\mathrm{Sym}[W](x_{1},x_{2},x_{3})$ equals

\frac{1}{3}\Big[W(x_{1},x_{2})W(x_{1},x_{3})+W(x_{1},x_{2})W(x_{2},x_{3})+W(x_{1},x_{3})W(x_{2},x_{3})\Big],

whereas if $H=K_{3}$, then $\mathrm{Sym}[W](x_{1},x_{2},x_{3})$ equals $W(x_{1},x_{2})W(x_{1},x_{3})W(x_{2},x_{3})$.

Let

(1.12) \mathcal{T}[\mathrm{Sym}[W]](x):=\int_{[0,1]^{v-1}}\mathrm{Sym}[W](x,x_{2},\ldots,x_{v})\prod_{a=2}^{v}dx_{a},

provided the integral exists and is finite.
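The symmetrization of Definition 1.6 can be checked numerically on a small matrix; the sketch below is our own illustration. For $H=K_{1,2}$ (edges $(0,1),(0,2)$), each of the three choices of "center" index occurs in two of the six permutations, reproducing the three-term average displayed above.

```python
import itertools
import math
import numpy as np

def sym_tensor_entry(Q, H_edges, v, idx):
    """Sym[Q](i_1, ..., i_v) of Definition 1.6: the average, over all v!
    permutations sigma of [v], of the product of Q over the edges of H."""
    total = 0.0
    for sigma in itertools.permutations(range(v)):
        total += np.prod([Q[idx[sigma[a]], idx[sigma[b]]]
                          for a, b in H_edges])
    return total / math.factorial(v)
```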

Definition 1.7.

Let $\mu$ be a measure on $\mathbb{R}$ and let $\beta(\cdot),\gamma(\cdot),\mathcal{N}$ be as in Definition 1.3. We say $\mu$ is stochastically non-negative if, for any $t>0$ with $-t\in\mathcal{N}$, we have $t\in\mathcal{N}$ and $\gamma(\beta(t))\leq\gamma(\beta(-t))$.

We now state the first main result of this paper.

Theorem 1.2 (Replica-symmetry).

Suppose we are in the setting of Proposition 1.1. Then $\mathcal{T}[\mathrm{Sym}[W]](\cdot)$ is finite a.s., and the following conclusions hold:

(i) Any maximizer $f$ of the optimization problem (1.9) satisfies

(1.13) f(x)\stackrel{a.s.}{=}\alpha^{\prime}\left(\theta v\int_{[0,1]^{v-1}}\mathrm{Sym}[W](x,x_{2},\ldots,x_{v})\left(\prod_{a=2}^{v}f(x_{a})\,dx_{a}\right)\right).

(ii) If $\theta\neq 0$ and $\mathcal{T}[\mathrm{Sym}[W]](\cdot)$ is not constant a.s., then none of the maximizers in (1.9) is a non-zero constant function.

(iii) If $\mathcal{T}[\mathrm{Sym}[W]](\cdot)$ is constant a.s. and $\theta W$ is strictly positive a.s., then all of the maximizers in (1.9) are constant functions, provided either $v$ is even or $\mu$ is stochastically non-negative.

(iv) $\mu$ is stochastically non-negative if one of the following conditions holds:

(a) $\mu$ is supported on the non-negative half-line, or

(b) $\mu$ is a non-negative tilt of a symmetric measure, i.e., there exist $B\geq 0$ and a symmetric measure $\tilde{\mu}$ such that $\frac{d\mu}{d\tilde{\mu}}(x)=\exp(Bx-C(x))$.

Remark 1.2.

It follows from the construction of the map $\Xi$ in Definition 1.4 that if $f\in\mathcal{L}$ is a constant function, then $\Xi(f)\in\widetilde{\mathcal{M}}$ is a product measure. Thus, under the conditions of Theorem 1.2 part (iii), any weak limit of the empirical measure $\mathfrak{L}_{n}$ (introduced in (1.7)) under $\mathbb{R}_{n,\theta}$ is a product measure. Further, these product measures have first marginal $\mathrm{Unif}[0,1]$ and second marginal of the form $\mu_{\beta(t)}$, where $t$ satisfies the fixed-point equation

t=\alpha^{\prime}(\theta vt^{v-1}),

by Theorem 1.2 part (i). In particular, if $v=2$ and $\mu$ is supported on $\{-1,1\}$ with $\mu(1)=\exp(2B)/(1+\exp(2B))$ for some $B\in\mathbb{R}$, the above fixed-point equation simplifies to

t=\tanh(2\theta t+B).

The solutions of this equation for $\theta\geq 0$, $B\in\mathbb{R}$ are well understood; see e.g. [14, Page 2] and [18, Page 144, Section 1.1.3]. In Theorem 1.7 below we study a broader class of measures $\mu$, which includes this as a special case.
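Stable roots of $t=\tanh(2\theta t+B)$ can be located numerically by fixed-point iteration; the sketch below is our own illustration (with hand-picked starting points), not part of the paper's argument.

```python
import numpy as np

def iterate_fixed_point(theta, B, t0, n_iter=500):
    """Iterate t -> tanh(2*theta*t + B), the v = 2 fixed-point map.
    Plain iteration converges only to stable roots; this is a
    numerical illustration, not a proof of anything."""
    t = t0
    for _ in range(n_iter):
        t = np.tanh(2.0 * theta * t + B)
    return t
```

For $B=0$ the only root is $t=0$ when $2\theta\leq 1$, while for $2\theta>1$ two additional symmetric roots $\pm t_{\theta}$ appear, giving the classical Curie–Weiss picture.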

Example 1.1 (Necessity of conditions on $\mu$).

To demonstrate the necessity of stochastic non-negativity of $\mu$ in Theorem 1.2 part (iii), we provide an example of a measure $\mu$ and a graphon $W$ such that $\mathcal{T}[\mathrm{Sym}[W]](\cdot)$ is constant a.s. and $\theta W\geq 0$, but none of the maximizers is a constant function. Set $H=K_{3}$ and

W(x,y)=\begin{cases}0&\text{ if }(x,y)\in[0,\frac{1}{3})^{2}\cup[\frac{1}{3},\frac{2}{3})^{2}\cup[\frac{2}{3},1)^{2},\\ 1&\text{ otherwise}.\end{cases}

In this case $W$ is the graphon corresponding to a complete tripartite graph. Let $\mu$ be the probability measure on $\{-1,1\}$ with $\mu(1)=e^{-4}/(e^{-4}+e^{4})$. Set $\theta=9$ and

f(x):=\begin{cases}-0.99&\text{ if }0\leq x<\frac{2}{3},\\ +0.83&\text{ if }\frac{2}{3}\leq x\leq 1.\end{cases}

Numerical computations show that

\theta G_{W}(f)-\int_{[0,1]}\gamma(\beta(f(x)))\,dx>\sup_{t\in[-1,1]}\left\{\frac{2}{3}\theta t^{v}-\gamma(\beta(t))\right\}.

Thus all global optimizers must be non-constant functions when $v=3$ and the measure $\mu$ is a negative tilt of a symmetric distribution. Note that the function $W$ in this counterexample is not strictly positive a.s., but this can be circumvented by a continuity argument that allows small positive values in the diagonal blocks.

Example 1.2 (Necessity of $\theta>0$).

The requirement $\theta>0$ is indeed necessary in Theorem 1.2 part (iii). To see a counterexample, consider the case $v=2$ and

W(x,y)=\begin{cases}2&\text{ if }(x,y)\in(0,0.5)\times(0.5,1)\text{ or }(x,y)\in(0.5,1)\times(0,0.5),\\ 0&\text{ otherwise}.\end{cases}

Let $\mu$ be a compactly supported probability measure which is symmetric about $0$. If $\theta$ is large and negative, then one can show numerically that no optimizer of (1.9) is a constant function, and any optimizer is of the form

f(x)=\begin{cases}a&\text{ if }0<x<0.5,\\ b&\text{ if }0.5<x<1,\end{cases}

where $a$ and $b$ have opposite signs. As in the previous example, even though $W$ is not strictly positive a.s., this can be circumvented by a continuity argument that allows small positive values in the diagonal blocks.

1.1.2. Weak laws and tail bounds

Having studied the optimizers of the limiting free energy under model (1.3) in Theorem 1.2, the next natural question is to obtain weak laws for various statistics of interest under (1.3). Popular examples include the Hamiltonian $\mathbb{U}_{n}$ (see (1.2)), the global magnetization $\sum_{i=1}^{n}X_{i}$, and other interesting linear combinations of the $X_{i}$'s. In the sequel, we obtain a family of such weak limits in a unified fashion. Along the way, we provide a probabilistic interpretation of the optimizers of the limiting free energy. The key tool that allows us to address these questions simultaneously is a sharp analysis of the vector of local fields, which we define below.

Definition 1.8 (Local fields).

Define the local magnetization/field at the $i$-th observation as follows:

(1.14) m_{i}:=\frac{v}{n^{v-1}}\sum_{(i_{2},\ldots,i_{v})\in\mathcal{S}(n,v,i)}\mathrm{Sym}[Q_{n}](i,i_{2},\ldots,i_{v})\left(\prod_{a=2}^{v}X_{i_{a}}\right),

for $i\in[n]$. Here, for $i\in[n]$, $\mathcal{S}(n,v,i)$ denotes the set of all tuples of distinct indices in $[n]^{v-1}$, none of which equals $i$, and $\mathrm{Sym}[Q_{n}]$ is as in Definition 1.6. Set $\mathbf{m}:=(m_{1},\ldots,m_{n})$. Following (1.7), consider the associated empirical measure

(1.15) \mathfrak{L}_{n}(\mathbf{m})=\frac{1}{n}\sum_{i=1}^{n}\delta_{(\frac{i}{n},m_{i})}.

The $m_{i}$'s defined in (1.14) are local magnetizations/fields which capture "how well" one can predict $X_{i}$ given all $X_{j}$, $j\neq i$. More precisely, under model (1.3), the conditional distribution of $X_{i}$ given $X_{j}$, $j\neq i$, is completely determined by $m_{i}$ in the following manner:

(1.16) \mathbb{R}_{n,\theta}(dx_{i}\,|\,x_{j},\ j\neq i)=\mu_{\theta m_{i}}(dx_{i}),

with $\mu_{\cdot}$ as in Definition 1.3. In particular, by (1.16) and Definition 1.3,

(1.17) \mathbb{E}[X_{i}|X_{j},\ j\neq i]=\alpha^{\prime}(\theta m_{i}).

Consequently, understanding the behavior of the vector ${\bf m}$, or of

\boldsymbol{\alpha}:=\alpha^{\prime}(\theta{\bf m})=(\alpha^{\prime}(\theta m_{1}),\ldots,\alpha^{\prime}(\theta m_{n})),

plays an important role in obtaining correlation bounds, tail-decay estimates, and fluctuations for such Gibbs measures (see [22, 13, 17, 16, 7] and the references therein). The major focus of the existing literature is on the special case where $v=2$ and $\mu$ is supported on $\{-1,1\}$; neither restriction is needed in our paper.
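For $H=K_{2}$ the local field is simply $m_{i}=\frac{2}{n}\sum_{j\neq i}Q_{n}(i,j)X_{j}$, and summing $X_{i}m_{i}$ over $i$ recovers the Hamiltonian: $\sum_{i}X_{i}m_{i}=v\,n\,\mathbb{U}_{n}({\bf X})$, since each distinct $v$-tuple is counted once from each of its $v$ coordinates. A quick numerical check of this identity for $v=2$ (our own sketch):

```python
import numpy as np

def local_fields_K2(X, Q):
    """Local fields m_i of (1.14) for H = K_2, where Sym[Q](i, j) = Q(i, j):
    m_i = (2/n) * sum_{j != i} Q(i, j) X_j.  The zero diagonal of Q makes
    the j = i term vanish automatically."""
    return 2.0 * (Q @ X) / len(X)

# Identity: sum_i X_i * m_i = 2 * n * U_n(X), with U_n(X) = (X' Q X) / n^2.
```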

Our second main result gives a weak law for the vector ${\bf m}$, in terms of the empirical measure $\mathfrak{L}_{n}(\mathbf{m})$. For any measure $\nu\in\widetilde{\mathcal{M}}_{p}$ (see (1.5)), define

(1.18) \vartheta_{W,\nu}(u):=v\,\mathbb{E}\left[\mathrm{Sym}[W](U_{1},\ldots,U_{v})\left(\prod_{a=2}^{v}V_{a}\right)\bigg|U_{1}=u\right],\quad\text{for }u\in[0,1],

where $(U_{1},V_{1}),\ldots,(U_{v},V_{v})\overset{i.i.d.}{\sim}\nu$, and $\mathrm{Sym}[\cdot]$ is as in Definition 1.6. In the theorem below, we provide sufficient conditions under which $\vartheta_{W,\nu}(\cdot)$ is well-defined. While the definition of $\vartheta_{W,\nu}$ may seem abstract at first, it simplifies nicely in the context of Theorem 1.2. To see this, assume that $(U,V)\sim\nu\in\Xi(F_{\theta})$ for some $\theta\in\mathbb{R}$, where $F_{\theta}$ is as in Proposition 1.1 part (iv). By Definition 1.4 we have $\mathbb{E}[V|U=u]=f(u)$ for some $f\in F_{\theta}$. So,

\vartheta_{W,\nu}(u)=v\,\mathbb{E}\left[\mathrm{Sym}[W](u,U_{2},\ldots,U_{v})\left(\prod_{a=2}^{v}\mathbb{E}[V_{a}|U_{a}]\right)\right]=v\int_{[0,1]^{v-1}}\mathrm{Sym}[W](u,u_{2},\ldots,u_{v})\left(\prod_{a=2}^{v}f(u_{a})\,du_{a}\right).

Invoking (1.13), we then get, for a.e. $u\in[0,1]$,

(1.19) f(u)=\alpha^{\prime}(\theta\vartheta_{W,\nu}(u)).

Therefore, there is a direct one-to-one correspondence between the two sets of functions $F_{\theta}$ and $\{\vartheta_{W,\nu}(\cdot):\nu\in\Xi(F_{\theta})\}$. This observation will be crucial in the proof of Corollary 1.5 below.

We are now in a position to state our second main result.

Theorem 1.3.

Suppose (1.6) holds for some $p\in[v,\infty]$. Assume that (1.4) and (1.8) hold with some $q$ satisfying $\frac{1}{p}+\frac{1}{q}<1$. Then the following conclusions hold:

(i) With $F_{\theta}$ as in Proposition 1.1 part (iv), and $\vartheta_{W,\nu}$ as in (1.18), we have $\Xi(F_{\theta})\subseteq\widetilde{\mathcal{M}}_{p}$, and $\vartheta_{W,\nu}(\cdot)$ is well-defined a.s. on $[0,1]$ for every $\nu\in\widetilde{\mathcal{M}}_{p}$.

(ii) Set

\mathfrak{B}^{*}_{\theta}:=\{\mathrm{Law}(U,\vartheta_{W,\nu}(U)):\nu\in\Xi(F_{\theta})\}\subseteq\widetilde{\mathcal{M}}.

Then we have:

(1.20) d_{\ell}\left(\mathfrak{L}_{n}(\mathbf{m}),\mathfrak{B}^{*}_{\theta}\right)\overset{P}{\longrightarrow}0.

The above theorem gives a weak limit for the empirical measure of the local-field vector $\bf m$. A weak law for the empirical measure of the conditional means (introduced in (1.17)) then follows from Theorem 1.3 by a continuous-mapping-type argument. The limit in that case is naturally the set

(1.21) \widetilde{\mathfrak{B}}_{\theta}:=\{\mathrm{Law}(U,\alpha^{\prime}(\theta\vartheta_{W,\nu}(U))):\ \nu\in\Xi(F_{\theta})\}.

We stress here that no assumption of replica-symmetry is necessary for Theorem 1.3 to hold.

Remark 1.3.

Given $\nu\in\Xi(F_{\theta})$, let $\nu^{(2|1)}(\cdot)$ denote the conditional distribution of the second coordinate given the first. The proof of (1.20) can be adapted to show the following stronger conclusion:

d_{\ell}\left(\frac{1}{n}\sum_{i=1}^{n}\delta_{(\frac{i}{n},X_{i},m_{i})},\underline{\mathfrak{B}}_{\theta}\right)\overset{P}{\longrightarrow}0,

where

\underline{\mathfrak{B}}_{\theta}:=\{(U,\nu^{(2|1)}(U),\vartheta_{W,\nu}(U)):\ \nu\in\Xi(F_{\theta})\}.

Since this version is not necessary for our applications, we do not prove it here.

In order to obtain weak laws for common statistics of interest using Theorem 1.3, we require appropriate tail estimates for the $X_{i}$'s, the $m_{i}$'s, and the $\alpha^{\prime}(\theta m_{i})$'s. We derive exponential tail bounds for these quantities below, which are of possible independent interest.

Theorem 1.4.

Consider the same setting as in Proposition 1.1. Then the following conclusions hold:

(i) There exists $C_{0}>0$, free of $n$, such that for all $C\geq C_{0}$ we have

(1.22) \limsup_{n\to\infty}\frac{1}{n}\log\mathbb{P}\left(\frac{1}{n}\sum_{i=1}^{n}|X_{i}|^{p}\geq C\right)<0\ \Rightarrow\ \sum_{i=1}^{n}\mathbb{E}|X_{i}|^{p}=O(n),
(1.23) \limsup_{n\to\infty}\frac{1}{n}\log\mathbb{P}\left(\frac{1}{n}\sum_{i=1}^{n}|m_{i}|^{q}\geq C\right)<0\ \Rightarrow\ \sum_{i=1}^{n}\mathbb{E}|m_{i}|^{q}=O(n),
(1.24) \limsup_{n\to\infty}\frac{1}{n}\log\mathbb{P}\left(\frac{1}{n}\sum_{i=1}^{n}|\alpha^{\prime}(\theta m_{i})|^{p}\geq C\right)<0\ \Rightarrow\ \sum_{i=1}^{n}\mathbb{E}|\alpha^{\prime}(\theta m_{i})|^{p}=O(n).

(ii) Moreover,

(1.25) \sup_{\nu\in\Xi(F_{\theta})}\mathfrak{m}_{p}(\nu)<\infty,\quad\sup_{\nu\in\mathfrak{B}_{\theta}^{*}}\mathfrak{m}_{q}(\nu)<\infty,\quad\sup_{\nu\in\widetilde{\mathfrak{B}}_{\theta}}\mathfrak{m}_{p}(\nu)<\infty,

where $\mathfrak{m}_{p}(\nu),\mathfrak{m}_{q}(\nu)$ are as in Definition 1.2.

To interpret part (ii) of the above theorem, note that $\mathfrak{L}_{n}({\bf X})$, $\mathfrak{L}_{n}({\bf m})$, and $\mathfrak{L}_{n}(\boldsymbol{\alpha})$ converge weakly in probability to the sets of probability measures $\Xi(F_{\theta})$ (by (1.10)), $\mathfrak{B}_{\theta}^{*}$ (by (1.20)), and $\widetilde{\mathfrak{B}}_{\theta}$ (as discussed around (1.21)), respectively. Theorem 1.4 part (ii) shows that these limiting sets of measures have uniformly bounded moments of a suitable order.

In view of the above results, coupled with the observation made in (1.19), it is natural to expect a correspondence between elements of $F_{\theta}$ and the map $u\mapsto\alpha^{\prime}(\theta m_{\lceil nu\rceil})$. This is made precise in the following corollary.

Corollary 1.5.

In the setting of Theorem 1.3, we have

(1.26) \inf_{f\in F_{\theta}}\int_{0}^{1}|\alpha^{\prime}(\theta m_{\lceil nu\rceil})-f(u)|^{p^{\prime}}\,du\overset{P}{\longrightarrow}0,

for any $p^{\prime}<p$, under the measure (1.3).

Remark 1.4.

Recall from (1.17) that α(θmi)=𝔼[Xi|Xj,ji]\alpha^{\prime}(\theta m_{i})=\mathbb{E}[X_{i}|X_{j},\ j\neq i]. Therefore (1.26) shows that the functions in FθF_{\theta} are “close” to the vector of conditional expectations of XiX_{i}’s given all the other coordinates. In particular, if Fθ={f}F_{\theta}=\{f\} is a singleton and ff is continuous on [0,1][0,1], then (1.26) and (1.17) together imply that

1ni=1n|α(θmi)f(in)|p𝑃0.\frac{1}{n}\sum_{i=1}^{n}\bigg{|}\alpha^{\prime}(\theta m_{i})-f\left(\frac{i}{n}\right)\bigg{|}^{p^{\prime}}\overset{P}{\longrightarrow}0.

For the special case where v=2v=2, μ\mu is supported on {1,1}\{-1,1\} with μ(1)=μ(1)=0.5\mu(1)=\mu(-1)=0.5, and 𝒯[Sym[W]]()=1\mathcal{T}[\mathrm{Sym}[W]](\cdot)=1 a.s, FθF_{\theta} consists of two constant functions of the form {tθ,tθ}\{-t_{\theta},t_{\theta}\} for some tθ>0t_{\theta}>0 (see [18, Page 144]; also see 1.7 for general μ\mu). The symmetry of n,θ\mathbb{R}_{n,\theta} around 0, coupled with (1.26) implies

1ni=1n||tanh(θmi)|tθ|𝑃0.\frac{1}{n}\sum_{i=1}^{n}\bigg{|}|\tanh(\theta m_{i})|-t_{\theta}\bigg{|}\overset{P}{\longrightarrow}0.
Remark 1.5.

Note that the above displays are no longer true with X_{i} replacing \tanh(\theta m_{i})=\mathbb{E}[X_{i}|X_{j},\ j\neq i]. This shows that the m_{i}'s are "more concentrated" than the X_{i}'s.

As applications of Theorems 1.3 and 1.4, we obtain weak limits under replica-symmetry for linear statistics as well as the Hamiltonian of \mathbb{R}_{n,\theta}, both of which are of independent interest.

Theorem 1.6.

Suppose 𝐗n,θ{\mathbf{X}}\sim\mathbb{R}_{n,\theta} (defined via (1.3)) for some base measure μ\mu which satisfies (1.6) for some p[v,]p\in[v,\infty]. Assume that either vv is even or μ\mu is stochastically non-negative. Further, suppose that {Qn}n1\{Q_{n}\}_{n\geq 1} satisfies (1.4) and (1.8) for some q>1q>1 satisfying 1p+1q<1\frac{1}{p}+\frac{1}{q}<1, and that the limiting WW is strictly positive and satisfies 𝒯[Sym[W]](x)=1\mathcal{T}[\mathrm{Sym}[W]](x)=1 a.s. Then, setting

(1.27) 𝒜θ:=argmint𝒩[γ(β(t))θtv],\displaystyle\mathcal{A}_{\theta}:=\operatorname{argmin}_{t\in\mathcal{N}}\left[\gamma(\beta(t))-\theta t^{v}\right],

we have that \mathcal{A}_{\theta} is a finite set, and the following conclusions hold:

(i) Suppose {ci}i1\{c_{i}\}_{i\geq 1} is a real sequence satisfying i=1nci=o(n)\sum_{i=1}^{n}c_{i}=o(n), and i=1n|ci|r=O(n)\sum_{i=1}^{n}|c_{i}|^{r}=O(n) for some rr such that 1p+1r<1\frac{1}{p}+\frac{1}{r}<1. Then we have:

1ni=1nciXi𝑃0.\frac{1}{n}\sum_{i=1}^{n}c_{i}X_{i}\overset{P}{\longrightarrow}0.

(ii) If we replace i=1nci=o(n)\sum_{i=1}^{n}c_{i}=o(n) with n1i=1ncic~n^{-1}\sum_{i=1}^{n}c_{i}\to\tilde{c}, then

d(1ni=1nciXi,{c~t:t𝒜θ})𝑃0.d_{\ell}\left(\frac{1}{n}\sum_{i=1}^{n}c_{i}X_{i},\{{\tilde{c}t}:\ t\in\mathcal{A}_{\theta}\}\right)\overset{P}{\longrightarrow}0.

(iii) The Hamiltonian satisfies

d(1ni=1nXimi,{vtv:t𝒜θ})𝑃0.d_{\ell}\left(\frac{1}{n}\sum_{i=1}^{n}X_{i}m_{i},\{{vt^{v}}:\ t\in\mathcal{A}_{\theta}\}\right)\overset{P}{\longrightarrow}0.
Remark 1.6.

Part (i) of the above theorem shows that for contrast vectors {\bf c} (i.e. \sum_{i=1}^{n}c_{i}=0) which are delocalized (in the sense \sum_{i=1}^{n}|c_{i}|^{r}=O(n)), the corresponding linear statistic exhibits a universal behavior across general Gibbs measures with higher order multilinear interactions, which does not depend on the matrix sequence \{Q_{n}\}_{n\geq 1}, as long as \mathcal{T}[\mathrm{Sym}[W]](\cdot) is constant, i.e. the symmetrized tensor is regular. In a similar manner, part (ii) gives a universal behavior for the global magnetization \bar{X} for regular tensors. Universality results for \bar{X} were earlier obtained for regular Ising models, which correspond to v=2 and \mu supported on \{-1,1\} with \mu(-1)=\mu(+1)=0.5 (see [3, Theorem 2.1] and [17, Theorems 1.1-1.4]). In this special case, for \theta>0.5 we have \mathcal{A}_{\theta}=\{-t_{\theta},t_{\theta}\} for some t_{\theta}>0 (see Remark 1.4), and symmetry then implies (see Proposition 1.8 part (ii) below for a more general result) that:

X¯𝑑δtθ+δtθ2,1ni=1nXimi𝑃2tθ2.\bar{X}\overset{d}{\longrightarrow}\frac{\delta_{-t_{\theta}}+\delta_{t_{\theta}}}{2},\quad\frac{1}{n}\sum_{i=1}^{n}X_{i}m_{i}\overset{P}{\longrightarrow}2t_{\theta}^{2}.

The more recent work of [25] demonstrates universality for quadratic interactions with log-concave \mu (see [25, Theorem 1.1 and Corollary 1.4]). We note that the results of the current paper require neither quadratic interactions nor log-concave base measures. In the following subsection, we will apply Theorem 1.6 to analyze a broad class of Gibbs measures which are not necessarily quadratic, and cover cubic and higher order interactions (see Theorem 1.9 below).
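The universal contrast law in part (i) can be illustrated exactly in the classical Curie-Weiss special case (v=2, Q_{n}\equiv 1, \mu uniform on \{-1,1\}), where the Gibbs measure depends on the spins only through S=\sum_{i}X_{i}, so that all joint moments are computable by summing over the magnetization. The following sketch is our own illustration, not part of the paper's proofs; it computes the exact variance of the contrast statistic n^{-1}\sum_{i}c_{i}X_{i} for a \pm 1 contrast vector, and confirms that it is small even in the low temperature regime \theta>1/2:

```python
import math

def contrast_variance(n, theta):
    """Exact Var(n^{-1} sum_i c_i X_i) for the Curie-Weiss model
    (v = 2, Q_n = all-ones, mu uniform on {-1, 1}) and a contrast c
    with c_i = +-1 and sum_i c_i = 0.  By exchangeability this equals
    (Var(X_1) - Cov(X_1, X_2)) / n."""
    # P(S = n - 2k) is proportional to C(n, k) * exp(theta * (S^2 - n) / n),
    # where k is the number of -1 spins.
    z, s2_weighted = 0.0, 0.0
    for k in range(n + 1):
        s = n - 2 * k
        w = math.comb(n, k) * math.exp(theta * (s * s - n) / n)
        z += w
        s2_weighted += w * s * s
    es2 = s2_weighted / z                 # E[S^2]
    cov = (es2 - n) / (n * (n - 1))       # Cov(X_1, X_2) = E[X_1 X_2]
    var_x1 = 1.0                          # E[X_1] = 0 by symmetry, X_1^2 = 1
    return (var_x1 - cov) / n

# Even at low temperature (theta = 1 > 1/2) the contrast statistic
# concentrates around 0, unlike the global magnetization.
print(contrast_variance(20, 1.0))
```

By exchangeability the contrast statistic has mean exactly 0, so the small variance is precisely the concentration asserted in part (i), unaffected by the symmetry breaking of \bar{X} at low temperature.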

1.2. Examples

We now apply our general results to analyze some Gibbs measures of interest. In Theorem 1.2, we proved that for regular tensors (i.e. when \mathcal{T}[\mathrm{Sym}[W]](\cdot)=1 a.s.) the optimization problem (1.9) has only constant functions as optimizers, under mild assumptions on \mu or v. In this section, we focus on particular examples of the regular case, and provide a more explicit description of the optimizers.

1.2.1. Quadratic interaction models with symmetric base measure

Suppose μ\mu is a probability measure on \mathbb{R} which is symmetric about the origin. Define a Gibbs measure on n\mathbb{R}^{n} by setting

(1.28) dn,θ,Bquaddμn(𝐗)=exp(θnijQn(i,j)XiXj+Bi=1nXinZnquad(θ,B)),\displaystyle\frac{d\mathbb{R}^{\mathrm{quad}}_{n,\theta,B}}{d\mu^{\otimes n}}({\bf X})=\exp\Big{(}\frac{\theta}{n}\sum_{i\neq j}Q_{n}(i,j)X_{i}X_{j}+B\sum_{i=1}^{n}X_{i}-nZ^{\mathrm{quad}}_{n}(\theta,B)\Big{)},

where \theta\geq 0, B\in\mathbb{R}. In particular, if \mu is supported on \{-1,1\}, the above model reduces to the celebrated Ising model, which has attracted significant attention in probability and statistics (cf. [3, 18, 20, 30, 17] and references therein). The following result analyzes the optimization problem (1.9) in this particular setting (which corresponds to H=K_{2}).

Lemma 1.7.

Let μ\mu be a probability measure symmetric about 0, which satisfies (1.6) with p2p\geq 2, and let α()\alpha(\cdot), β()\beta(\cdot), and 𝒩\mathcal{N} be as in 1.3. Assume that

(1.29) α′′(x)α′′(y) for all |x||y|.\alpha^{\prime\prime}(x)\leq\alpha^{\prime\prime}(y)\quad\mbox{ for all }\quad|x|\geq|y|.

Then, setting

𝔳θ,B,μ(x):=θx2+Bxxβ(x)+α(β(x))\mathfrak{v}_{\theta,B,\mu}(x):=\theta x^{2}+Bx-x\beta(x)+\alpha(\beta(x))

for θ0\theta\geq 0, B,xcl(𝒩)B\in\mathbb{R},x\in{\rm cl}(\mathcal{N}), the following conclusions hold:

  1. (i)

    If 2θ(α′′(0))12\theta\leq(\alpha^{\prime\prime}(0))^{-1} and B=0B=0, then 𝔳θ,B,μ()\mathfrak{v}_{\theta,B,\mu}(\cdot) has a unique maximizer at tθ,B,μ=0t_{\theta,B,\mu}=0.

  2. (ii)

    If B0B\neq 0, then 𝔳θ,B,μ()\mathfrak{v}_{\theta,B,\mu}(\cdot) has a unique maximizer tθ,B,μt_{\theta,B,\mu} with the same sign as that of BB which satisfies tθ,B,μ=α(2θtθ,B,μ+B)t_{\theta,B,\mu}=\alpha^{\prime}(2\theta t_{\theta,B,\mu}+B).

  3. (iii)

    If 2θ>(α′′(0))12\theta>(\alpha^{\prime\prime}(0))^{-1} and B=0B=0, then 𝔳θ,B,μ()\mathfrak{v}_{\theta,B,\mu}(\cdot) has two maximizers ±tθ,B,μ\pm t_{\theta,B,\mu}, where tθ,B,μ>0t_{\theta,B,\mu}>0, and tθ,B,μ=α(2θtθ,B,μ)t_{\theta,B,\mu}=\alpha^{\prime}(2\theta t_{\theta,B,\mu}).

The proof of Lemma 1.7 is provided in Section 3.
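To make the fixed points in parts (ii) and (iii) concrete, consider the Ising base measure \mu(1)=\mu(-1)=0.5, for which \alpha(\theta)=\log\cosh\theta and \alpha^{\prime}=\tanh. The maximizer t_{\theta,B,\mu} can then be computed by iterating the fixed-point equation t=\alpha^{\prime}(2\theta t+B); the sketch below is our own numerical illustration (the starting point and iteration count are ad hoc choices), not part of the proof:

```python
import math

def maximizer(theta, B, iters=500, t0=0.5):
    """Fixed-point iteration t -> tanh(2*theta*t + B) for the Ising base
    measure mu(+1) = mu(-1) = 1/2, where alpha'(x) = tanh(x).
    Started from t0 > 0, it converges to the non-negative maximizer
    t_{theta,B,mu} of Lemma 1.7."""
    t = t0
    for _ in range(iters):
        t = math.tanh(2.0 * theta * t + B)
    return t

# High temperature (2*theta <= 1, B = 0): unique maximizer t = 0.
print(maximizer(0.3, 0.0))   # close to 0
# Low temperature (2*theta > 1, B = 0): a strictly positive maximizer.
print(maximizer(1.0, 0.0))   # close to 0.9575
```

For 2\theta\leq 1 and B=0 the iteration collapses to the unique maximizer 0 of part (i), while for 2\theta>1 it finds the strictly positive maximizer of part (iii), matching the dichotomy in the lemma.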

Remark 1.7.

A few comments about the extra condition (1.29) utilized in the above lemma are in order. First note that if a measure \mu satisfies the celebrated GHS inequality of statistical physics (see [19]), then \mu must satisfy (1.29). Indeed, taking the matrix J in [19, (1.2)] to be the {\bf 0} matrix, it follows on applying the GHS inequality ([19, (1.4)]) that \alpha^{\prime\prime\prime}(\theta)\leq 0 for \theta\geq 0, which immediately implies (1.29). Sufficient conditions on \mu for the GHS inequality (and hence (1.29)) can be found in [19, Theorem 1.2]. In [19, (1.5)] the authors give a counterexample where the GHS inequality fails. Using the same example, it is not hard to show that (1.29) fails in this case as well, and \mathfrak{v}(\cdot) does not have a unique maximizer for B=0 and some \theta\leq(\alpha^{\prime\prime}(0))^{-1}/2.

Proposition 1.8.

Suppose that the measure μ\mu satisfies the assumptions of 1.7, and {Qn}n1\{Q_{n}\}_{n\geq 1} satisfy (1.4) and (1.8) with H=K2H=K_{2} (i.e. Δ=1\Delta=1). Also assume that the limiting graphon WW is strictly positive a.s., and satisfies 01W(.,y)dy=1\int_{0}^{1}W(.,y)dy=1 a.s. With \mathcal{L} as in 1.4, define the functional GW:G_{W}:\mathcal{L}\mapsto\mathbb{R} by setting

(1.30) GW(f):=[0,1]2W(x,y)f(x)f(y)𝑑x𝑑y.\displaystyle G_{W}(f):=\int_{[0,1]^{2}}W(x,y)f(x)f(y)dxdy.

Then GWG_{W} is well defined by 1.1 part (i), and further, the following conclusions hold:

(i) For any θ0\theta\geq 0, BB\in\mathbb{R} the optimization problem

(1.31) supf{θGW(f)[0,1]γ(β(f(x)))𝑑x}\displaystyle\sup_{f\in\mathcal{L}}\{\theta G_{W}(f)-\int_{[0,1]}\gamma(\beta(f(x)))dx\}

has only constant global maximizers, given by

Fθ,B{0 if θ(α′′(0))1/2,B=0,±tθ,B,μ if θ>(α′′(0))1/2,B=0,tθ,B,μ if B0{F}_{\theta,B}\equiv\begin{cases}0&\text{ if }\theta\leq(\alpha^{\prime\prime}(0))^{-1}/{2},B=0,\\ \pm{t_{\theta,B,\mu}}&\text{ if }\theta>(\alpha^{\prime\prime}(0))^{-1}/{2},B=0,\\ {t_{\theta,B,\mu}}&\text{ if }B\neq 0\end{cases}

Here tθ,B,μt_{\theta,B,\mu} is as in 1.7.

(ii) The following weak limits hold:

1ni=1nXimi𝑑2tθ,B,μ2,X¯𝑑{0 if θ(α′′(0))1/2,B=0,δtθ,B,μ+δtθ,B,μ2 if θ>(α′′(0))1/2,B=0,tθ,B,μ if B0.\frac{1}{n}\sum_{i=1}^{n}X_{i}m_{i}\overset{d}{\longrightarrow}2t_{\theta,B,\mu}^{2},\qquad\bar{X}\overset{d}{\longrightarrow}\begin{cases}0&\text{ if }\theta\leq(\alpha^{\prime\prime}(0))^{-1}/{2},B=0,\\ \frac{\delta_{t_{\theta,B,\mu}}+\delta_{-t_{\theta,B,\mu}}}{2}&\text{ if }\theta>(\alpha^{\prime\prime}(0))^{-1}/{2},B=0,\\ {t_{\theta,B,\mu}}&\text{ if }B\neq 0\end{cases}.
Remark 1.8.

We note here that in contrast to Theorem 1.6 part (iii), which only allows us to identify possible limit points for the random variable \frac{1}{n}\sum_{i=1}^{n}X_{i}m_{i}, in this case we are able to identify the limit, even in the low temperature regime \theta>(\alpha^{\prime\prime}(0))^{-1}/2. This is because, even though \mathfrak{v}_{\theta,B,\mu}(\cdot) has two maximizers which are both global optimizers (i.e. in \mathcal{A}_{\theta} defined in Theorem 1.6), the optimizers are symmetric about 0, and the limit of the Hamiltonian is the same under both.

1.2.2. Gibbs measures with higher order interactions

We now focus on Gibbs measures with higher order interactions, which have gained significant attention in recent years (see [29, 2, 23, 26, 31, 32, 33, 6] and the references therein). Here, we analyze the optimization problem (1.27), under some conditions on \theta and \mu. We point the reader to [5, Section 2.1] for related results in the special case where \mu is supported on \{-1,1\}.

Theorem 1.9.

Consider the optimization problem

(1.32) supf:[0,1]γ(β(f(x)))𝑑x<{θGW(f)+Bf(x)𝑑x[0,1]γ(β(f(x)))𝑑x}.\displaystyle\sup_{f\in\mathcal{L}:\ \int_{[0,1]}\gamma(\beta(f(x)))dx<\infty}\left\{\theta G_{W}(f)+B\int f(x)dx-\int_{[0,1]}\gamma(\beta(f(x)))dx\right\}.

Then the following conclusions hold:

(i) Fixing \theta\in\mathbb{R}, the supremum in the optimization problem is attained, and the maximizers satisfy the equation

(1.33) f(x)=a.s.α(θv[0,1]v1Sym[W](x,x2,,xv)(a=2vf(xa)dxa)+B).\displaystyle f(x)\stackrel{{\scriptstyle a.s.}}{{=}}\alpha^{\prime}\left(\theta v\int_{[0,1]^{v-1}}\mathrm{Sym}[W](x,x_{2},\ldots,x_{v})\left(\prod_{a=2}^{v}f(x_{a})\,dx_{a}\right)+B\right).

(ii) If \mathcal{T}[\mathrm{Sym}[W]](\cdot) is constant a.s., W is strictly positive a.s., and \theta,B\geq 0, then all of the maximizers are constant functions, provided either v is even or \mu is stochastically non-negative. Further, any such constant maximizer x satisfies

(1.34) x=a.s.α(θvxv1+B).x\stackrel{{\scriptstyle a.s.}}{{=}}\alpha^{\prime}\left(\theta vx^{v-1}+B\right).

(iii) Suppose further that \mu is compactly supported on [-1,1]. Then the following hold:

  1. (a)

    There exists B0=B0(θ,v)B_{0}=B_{0}(\theta,v) such that if B>B0B>B_{0}, the optimization problem has a unique maximizer.

  2. (b)

    If B=0B=0 and α(0)=0\alpha^{\prime}(0)=0, there exists θc(0,)\theta_{c}\in(0,\infty) such that if θ<θc\theta<\theta_{c}, the optimization problem has the unique maximizer x=0x=0, whereas if θ>θc\theta>\theta_{c}, then x=0x=0 is not a global maximizer.
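In the Ising special case \mu(1)=\mu(-1)=0.5 one has \alpha^{\prime}=\tanh and \gamma(\beta(t))=t\,\mathrm{arctanh}(t)+\frac{1}{2}\log(1-t^{2}), so the dichotomy in part (b) can be probed numerically by comparing \sup_{t}\{\theta t^{v}-\gamma(\beta(t))\} with the value 0 attained at t=0. The following rough grid-search sketch is our own illustration (the grid resolution and the choice v=4 are ours, not the paper's):

```python
import math

def objective(t, theta, v=4):
    """theta * t^v - gamma(beta(t)) for mu uniform on {-1, 1}, where
    gamma(beta(t)) = t * atanh(t) + 0.5 * log(1 - t^2)."""
    return theta * t ** v - (t * math.atanh(t) + 0.5 * math.log(1.0 - t * t))

def best_gain(theta, v=4):
    """Best objective value over a grid of magnetizations in [0, 1);
    a strictly positive value means x = 0 is no longer a global maximizer."""
    return max(objective(k / 1000.0, theta, v) for k in range(1000))

print(best_gain(0.3))  # 0.0: below theta_c, t = 0 is optimal
print(best_gain(1.0))  # > 0: above theta_c, a nonzero maximizer wins
```

Since best_gain vanishes at \theta=0.3 but is strictly positive at \theta=1, the transition point \theta_{c} of part (b) for v=4 lies between these two values.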

1.3. Proof overview and future scope

Let us discuss the proof techniques employed in the characterization of replica symmetry (see Theorem 1.2) and the weak laws (see Theorems 1.3 and 1.6). In Theorem 1.2 part (i), we establish the first-order conditions (in (1.13)) for the optimization problem (1.9). It is immediate from (1.13) that if all optimizers of (1.9) are constants, then \mathcal{T}[\mathrm{Sym}[W]](\cdot) is a constant function (Theorem 1.2, part (ii)). For the other direction, if \mathcal{T}[\mathrm{Sym}[W]](\cdot) is a constant, the crucial observation is that \mathrm{Sym}[W](x_{1},\ldots,x_{v}) is a (possibly unnormalized) probability density function on [0,1]^{v}, with \mathrm{Unif}[0,1] marginals. The conclusion in Theorem 1.2 part (iii) then follows from the equality conditions of Hölder's inequality. We provide examples demonstrating that the conditions we require for replica symmetry are essentially tight.

The weak limits we prove involve a number of technical steps. We distil some of the main ideas here in the context of the universal result that n^{-1}\sum_{i=1}^{n}c_{i}X_{i}\overset{P}{\longrightarrow}0 provided \sum_{i=1}^{n}c_{i}=o(n), under any multilinear Gibbs measure in the replica-symmetric phase (see Theorem 1.6 part (i)). For simplicity, let us assume that the optimization problem (1.9) has a unique constant optimizer, say f(x)\equiv t (note that the actual result does not require uniqueness of optimizers). Let us split the proof outline into a few steps.

Step (i). Recall the definition of mim_{i} from (1.14). We first show that

1ni=1nci(Xi𝔼[Xi|Xj,ji])=1ni=1nci(Xiα(θmi))=oP(1).\frac{1}{n}\sum_{i=1}^{n}c_{i}(X_{i}-\mathbb{E}[X_{i}|X_{j},\ j\neq i])=\frac{1}{n}\sum_{i=1}^{n}c_{i}(X_{i}-\alpha^{\prime}(\theta m_{i}))=o_{P}(1).

This is the subject of 2.10 part (a), and proceeds with a second moment argument, after a suitable truncation. The above display now suggests that it is sufficient to show that n1i=1nciα(θmi)𝑃0n^{-1}\sum_{i=1}^{n}c_{i}\alpha^{\prime}(\theta m_{i})\overset{P}{\longrightarrow}0.

Step (ii). Based on step (i), it is natural to focus on the vector of local fields {\bf m}=(m_{1},\ldots,m_{n}). The advantage of working with {\bf m} rather than {\bf X} is that each m_{i} is a (v-1)-th order "weighted average", and hence the m_{i}'s are much more "concentrated" than the X_{i}'s. We provide a formalization in Theorem 1.3, where (in the current setting) we show that

(1.35) 1ni=1nδmi𝑑δvtv1,\frac{1}{n}\sum_{i=1}^{n}\delta_{m_{i}}\overset{d}{\longrightarrow}\delta_{vt^{v-1}},

which is a degenerate limit. In contrast, by 1.1 part (iii), \frac{1}{n}\sum_{i=1}^{n}\delta_{X_{i}}\overset{d}{\longrightarrow}\mu_{\beta(t)}, which is a non-degenerate limit. The proof of Theorem 1.3 relies primarily on Lemma 2.5, a stability lemma whose proof in turn relies on a counting lemma for L^{p} graphons. In fact, Lemma 2.5 can be viewed as a refinement of the counting lemma in [8, Proposition 2.19] for "star-like" graphs.
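The gap in concentration between the local fields and the spins can be checked exactly in the Curie-Weiss case (v=2, Q_{n}\equiv 1, \mu uniform on \{-1,1\}), where m_{i}=\frac{2}{n}\sum_{j\neq i}X_{j} and all moments are functions of S=\sum_{j}X_{j}. The sketch below is our own illustration (the parameters n=200 and \theta=0.2, which place us in the high temperature regime with t=0, are ad hoc choices):

```python
import math

def field_vs_spin_variance(n, theta):
    """Exact Var(m_1) and Var(X_1) for Curie-Weiss, where
    m_1 = (2/n) * sum_{j != 1} X_j = (2/n) * (S - X_1).
    Uses P(S = n - 2k) proportional to C(n, k) * exp(theta*(S^2 - n)/n)."""
    z, es2 = 0.0, 0.0
    for k in range(n + 1):
        s = n - 2 * k
        w = math.comb(n, k) * math.exp(theta * (s * s - n) / n)
        z += w
        es2 += w * s * s
    es2 /= z
    rho = (es2 - n) / (n * (n - 1))   # E[X_1 X_2]
    # Var(S - X_1) = (n - 1) * (1 + (n - 2) * rho), using E[X_i] = 0 by symmetry
    var_m1 = (4.0 / n ** 2) * (n - 1) * (1.0 + (n - 2) * rho)
    return var_m1, 1.0                # Var(X_1) = 1 exactly

var_m1, var_x1 = field_vs_spin_variance(200, 0.2)
print(var_m1, var_x1)
```

Here \mathrm{Var}(m_{1}) is an order of magnitude smaller than \mathrm{Var}(X_{1})=1, reflecting the degenerate limit in (1.35) versus the non-degenerate limit of the empirical spin distribution.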

Step (iii). Based on (1.35) in step (ii), it is natural to consider the following approximation:

1ni=1nciα(θmi)α(θvtv1)1ni=1nci=o(1).\frac{1}{n}\sum_{i=1}^{n}c_{i}\alpha^{\prime}(\theta m_{i})\approx\alpha^{\prime}(\theta vt^{v-1})\frac{1}{n}\sum_{i=1}^{n}c_{i}=o(1).

In the first approximation, we have essentially replaced the mim_{i}’s by the corresponding weak limit from (1.35). To make this rigorous, we need some moment estimates which are immediate byproducts of the exponential tail bounds in 1.4. The final conclusion in the above display uses the condition i=1nci=o(n)\sum_{i=1}^{n}c_{i}=o(n).

More generally, the weak limit of {\bf m} in Theorem 1.3 has broad applications. We use it in Corollary 1.5 to provide a probabilistic interpretation of the optimizers of (1.9) (note that this does not require the optimizers to be constant functions). We also use Theorem 1.3 to derive other weak laws of interest in Theorem 1.6. We note in passing that many other statistics of interest, such as the maximum likelihood or pseudo-maximum likelihood estimators for the inverse temperature parameter, are also expressible (sometimes implicitly) as functions of {\bf m}. Consequently, one can derive appropriate weak laws for these estimators using Theorem 1.3 as well.

Our work leads to several important future research directions. Our results apply, as a special case, to Ising models with quadratic Hamiltonians and a general base measure. A first question is to extend the techniques of this paper to cover more general Hamiltonians from statistical physics, such as Potts models. Another related question is to go beyond the setting of cut metric convergence, and allow the matrix sequence \{Q_{n}\}_{n\geq 1} to converge in other topologies (such as local weak convergence of bounded degree graphs). A third question is to study Gibbs measures under more general tensor Hamiltonians, which cannot be specified by a matrix Q_{n}. This would require significant development of cut metric theory for cubic and higher order functions. Finally, it remains to be seen whether we can answer more delicate questions about such Gibbs measures, including Central Limit Theorems/limiting distributions.

1.4. Outline of the paper

In Sections 2 and 3, we prove our main results from Sections 1.1 and 1.2 respectively. The proofs of the major technical lemmas (in the order in which they are presented in the paper) are provided in Section 4. In Appendix 5, we defer the proofs of some of our supporting lemmas, which deal with properties of the base measure \mu and with general results on weak convergence.

2. Proof of Main Results

2.1. Proofs of 1.1 and 1.2

In order to prove 1.1, we need the following preparatory result.

Proposition 2.1.

Fix any v2v\geq 2, p1p\geq 1, q>1q>1 such that 1p+1q1\frac{1}{p}+\frac{1}{q}\leq 1 and W𝒲W\in\mathcal{W}. Fix any probability measure ν\nu supported on [0,1]×[0,1]\times\mathbb{R} with first marginal Unif[0,1]\mathrm{Unif}[0,1] and sample (U1,V1),,(Uv,Vv)i.i.d.ν(U_{1},V_{1}),\ldots,(U_{v},V_{v})\overset{i.i.d.}{\sim}\nu. Then the following conclusions hold:

  1. (i)

    We have:

    𝔼((a,b)E(H)|W(Ua,Ub)|)WΔ|E(H)|.\displaystyle\mathbb{E}\left(\prod_{(a,b)\in E(H)}|W(U_{a},U_{b})|\right)\leq\|W\|_{\Delta}^{|E(H)|}.
  2. (ii)

    For any measurable ϕ:v\phi:\mathbb{R}^{v}\to\mathbb{R} we have

    𝔼[((a,b)E(H)|W(Ua,Ub)|)|ϕ(V1,,Vv)|]WqΔ|E(H)|(𝔼|ϕ(V1,,Vv)|p)1p.\displaystyle\mathbb{E}\left[\left(\prod_{(a,b)\in E(H)}|W(U_{a},U_{b})|\right)|\phi(V_{1},\ldots,V_{v})|\right]\leq\|W\|_{q\Delta}^{|E(H)|}\Big{(}\mathbb{E}|\phi(V_{1},\ldots,V_{v})|^{p}\Big{)}^{\frac{1}{p}}.
  3. (iii)

    With Sym[.]\mathrm{Sym}[.] as in 1.6, we have

    𝔼[Sym[|W|](U1,,Uv)q]WqΔq|E(H)|.\displaystyle\mathbb{E}\Big{[}\mathrm{Sym}[|W|](U_{1},\ldots,U_{v})^{q}\Big{]}\leq\|W\|_{q\Delta}^{q|E(H)|}.

Parts (i) and (ii) above follow from [10, Proposition 2.19] and [6, Lemma 2.2] respectively. However, part (iii) is new, and a proof is provided in Section 5.2. While the proof of 1.1 only uses Proposition 2.1 part (ii), the other parts of Proposition 2.1 will be useful in the rest of the paper.

Remark 2.1.

When the RHS of the display in part (ii) of Proposition 2.1 is finite, we can define

TW,ϕ(ν):=𝔼[((a,b)E(H)W(Ua,Ub))ϕ(V1,,Vv)].\displaystyle T_{W,\phi}(\nu):=\mathbb{E}\left[\left(\prod_{(a,b)\in E(H)}W(U_{a},U_{b})\right)\phi(V_{1},\ldots,V_{v})\right].
Proof of 1.1.

Under the conditions of 1.1, Proposition 2.1 part (ii) implies that T_{W,\phi}(\nu) is well-defined and finite.

(i) This is pointed out in [6, Definition 1.6] by invoking [6, Lemma 2.2].

(ii), (iii) These are restatements of parts (i) and (ii) of [6, Theorem 1.6]. The fact that supn1Zn(θ)<\sup_{n\geq 1}Z_{n}(\theta)<\infty follows from the proof of [6, Corollary 1.3]. To prove Ξ(Fθ)\Xi(F_{\theta}) is compact, we invoke [6, Remark 2.1] to note that

Ξ(Fθ)=arginfν~J(ν),\Xi(F_{\theta})=\arg\inf_{\nu\in\widetilde{\mathcal{M}}}J(\nu),

where the function J(.)J(.) defined by

J(ν):=D(ν|ρ)θTW,ϕ(ν)J(\nu):=D(\nu|\rho)-\theta T_{W,\phi}(\nu)

with ϕ(x1,,xv)=a=1vxa\phi(x_{1},\ldots,x_{v})=\prod_{a=1}^{v}x_{a} has compact level sets (by [6, Corollary 1.3, part (ii)]), and ~\widetilde{\mathcal{M}} is a closed subset of probability measures.

Remark 2.2.

We now claim that throughout the rest of the paper, without loss of generality, we can assume that W_{Q_{n}} converges to W in the strong cut metric d_{\square}, instead of the weak cut metric \delta_{\square} (see 1.1). Indeed, by the definition of weak cut convergence, there exists a sequence of permutations \{\pi_{n}\}_{n\geq 1} with \pi_{n}\in S_{n} such that

d(WQnπn,W)0, where Qnπn(i,j):=Qn(πn(i),πn(j)).d_{\square}(W_{Q_{n}^{\pi_{n}}},W)\rightarrow 0,\text{ where }Q_{n}^{\pi_{n}}(i,j):=Q_{n}(\pi_{n}(i),\pi_{n}(j)).

Then setting Yi=Xπn(i)Y_{i}=X_{\pi_{n}(i)} we have

𝕌n(𝐱)=\displaystyle\mathbb{U}_{n}({\bf x})= 1nv(i1,,iv)𝒮(n,v)(a=1vXia)((a,b)E(H)Qn(ia,ib))\displaystyle\frac{1}{n^{v}}\sum_{(i_{1},\ldots,i_{v})\in\mathcal{S}(n,v)}\left(\prod_{a=1}^{v}X_{i_{a}}\right)\left(\prod_{(a,b)\in E(H)}Q_{n}(i_{a},i_{b})\right)
=\displaystyle= 1nv(i1,,iv)𝒮(n,v)(a=1vXπn(ia))((a,b)E(H)Qn(πn(ia),πn(ib)))\displaystyle\frac{1}{n^{v}}\sum_{(i_{1},\ldots,i_{v})\in\mathcal{S}(n,v)}\left(\prod_{a=1}^{v}X_{\pi_{n}(i_{a})}\right)\left(\prod_{(a,b)\in E(H)}Q_{n}(\pi_{n}(i_{a}),\pi_{n}(i_{b}))\right)
=\displaystyle= 1nv(i1,,iv)𝒮(n,v)(a=1vYia)((a,b)E(H)Qn(πn(ia),πn(ib)))=:𝕌~n(𝐱).\displaystyle\frac{1}{n^{v}}\sum_{(i_{1},\ldots,i_{v})\in\mathcal{S}(n,v)}\left(\prod_{a=1}^{v}Y_{i_{a}}\right)\left(\prod_{(a,b)\in E(H)}Q_{n}(\pi_{n}(i_{a}),\pi_{n}(i_{b}))\right)=:\widetilde{\mathbb{U}}_{n}({\bf x}).

Set ~n,θ\widetilde{\mathbb{R}}_{n,\theta} to be the Gibbs probability measure given by

d~n,θdμn(𝐱)=exp(nθ𝕌~n(𝐱)nZ~n(θ)),\displaystyle\frac{d\widetilde{\mathbb{R}}_{n,\theta}}{d\mu^{\otimes n}}({\bf x})=\exp\Big{(}n\theta\widetilde{\mathbb{U}}_{n}({\bf x})-n\widetilde{Z}_{n}(\theta)\Big{)},

where Z~n(θ)\widetilde{Z}_{n}(\theta) is the corresponding normalizing constant. Now if (X1,,Xn)IIDμ(X_{1},\ldots,X_{n})\stackrel{{\scriptstyle IID}}{{\sim}}\mu, then so does (Xπn(1),,Xπn(n))(X_{\pi_{n}(1)},\ldots,X_{\pi_{n}(n)}), and so

enZ~n(θ)=𝔼μnexp(nθU~n(𝐗))=𝔼μnexp(nθUn(𝐗))=enZn(θ).e^{n\widetilde{Z}_{n}(\theta)}=\mathbb{E}_{\mu^{\otimes n}}\exp\Big{(}n\theta\widetilde{U}_{n}({\bf X})\Big{)}=\mathbb{E}_{\mu^{\otimes n}}\exp\Big{(}n\theta{U}_{n}({\bf X})\Big{)}=e^{n{Z}_{n}(\theta)}.

Thus for any λ\lambda\in\mathbb{R} we have

𝔼R~n,θexp(nλU~n(𝐗))=enZ~n(θ+λ)nZ~n(θ)=enZn(θ+λ)nZn(θ)=𝔼Rn,θexp(nλUn(𝐗)).\mathbb{E}_{\widetilde{R}_{n,\theta}}\exp\Big{(}n\lambda\widetilde{U}_{n}({\bf X})\Big{)}=e^{n\widetilde{Z}_{n}(\theta+\lambda)-n\widetilde{Z}_{n}(\theta)}=e^{n{Z}_{n}(\theta+\lambda)-n{Z}_{n}(\theta)}=\mathbb{E}_{{R}_{n,\theta}}\exp\Big{(}n\lambda{U}_{n}({\bf X})\Big{)}.

In the above display, all quantities are finite and well defined by 1.1 part (ii). Thus the distribution of \widetilde{\mathbb{U}}_{n}({\bf X}) under \widetilde{\mathbb{R}}_{n,\theta} is the same as the distribution of \mathbb{U}_{n}({\bf X}) under \mathbb{R}_{n,\theta}. Since d_{\square}(W_{Q_{n}^{\pi_{n}}},W)\to 0, by replacing W_{Q_{n}} with W_{Q_{n}^{\pi_{n}}}, without loss of generality we can assume d_{\square}(W_{Q_{n}},W)\to 0 as claimed, which we do throughout the rest of the paper.

Next, we state an elementary property of \gamma(\cdot) that will be useful in proving Theorem 1.2 below. A short proof is provided in Section 5.

Lemma 2.2.

The function \gamma\circ\beta(\cdot):{\rm cl}(\mathcal{N})\to[0,\infty] is a continuous (possibly extended) real-valued function.

Proof of Theorem 1.2.

(i) By switching the variables of integration, it is easy to check that the optimization problem (1.9) is equivalent to maximizing the function

𝒢W(f):=θ[0,1]vSym[W](x1,,xv)(a=1vf(xa)dxa)[0,1]γ(β(f(x)))𝑑x.\mathcal{G}_{W}(f):=\theta\int_{[0,1]^{v}}\mathrm{Sym}[W](x_{1},\ldots,x_{v})\left(\prod_{a=1}^{v}f(x_{a})\,dx_{a}\right)-\int_{[0,1]}\gamma(\beta(f(x)))dx.

Note that for all \varepsilon\in[0,1], g\in\mathcal{L} and f\in F_{\theta}\subseteq\mathcal{L}, the function f+\varepsilon(g-f)=(1-\varepsilon)f+\varepsilon g\in\mathcal{L}, and so \mathcal{G}_{W}(f+\varepsilon(g-f))\leq\mathcal{G}_{W}(f). This gives

ddε𝒢W(f+ε(gf))|ε=00,\frac{d}{d\varepsilon}\mathcal{G}_{W}(f+\varepsilon(g-f))\bigg{|}_{\varepsilon=0}\leq 0,

which is equivalent to

(2.1) \int_{[0,1]}(g(x_{1})-f(x_{1}))\underbrace{\left(\beta(f(x_{1}))-\theta v\int_{[0,1]^{v-1}}\mathrm{Sym}[W](x_{1},\ldots,x_{v})\left(\prod_{a=2}^{v}f(x_{a})\,dx_{a}\right)\right)}_{\delta(x_{1})}\,dx_{1}\geq 0.

We will show that λ({x1[0,1]:δ(x1)0})=0\lambda(\{x_{1}\in[0,1]:\ \delta(x_{1})\neq 0\})=0, where λ\lambda denotes the Lebesgue measure on \mathbb{R}. Let us assume the contrary. Without loss of generality, assume that λ({x1[0,1]:δ(x1)>0})>0\lambda(\{x_{1}\in[0,1]:\ \delta(x_{1})>0\})>0. On this set, we have

f(x1)>α(θv[0,1]v1Sym[W](x1,,xv)(a=2vf(xa)dxa))=:v(x1),f(x_{1})>\alpha^{\prime}\left(\theta v\int_{[0,1]^{v-1}}\mathrm{Sym}[W](x_{1},\ldots,x_{v})\left(\prod_{a=2}^{v}f(x_{a})\,dx_{a}\right)\right)=:v(x_{1}),

yielding

λ({x1[0,1]:δ(x1)>0,f(x1)>v(x1)})>0.\lambda(\{x_{1}\in[0,1]:\ \delta(x_{1})>0,f(x_{1})>v(x_{1})\})>0.

This implies that there exists ε>0\varepsilon>0 such that

\lambda(\mathcal{A}_{\varepsilon})>0,\quad\mathcal{A}_{\varepsilon}:=\{x_{1}\in[0,1]:\ \delta(x_{1})>\varepsilon,\ f(x_{1})>v(x_{1})+\varepsilon\}.

Define a function g:[0,1]cl(𝒩)g:[0,1]\mapsto{\rm cl}(\mathcal{N}) by setting

g(x_{1}):=\begin{cases}f(x_{1})-\varepsilon&\mbox{if }\ x_{1}\in\mathcal{A}_{\varepsilon},\\ f(x_{1})&\mbox{otherwise}.\end{cases}

Note that gg\in\mathcal{L}, as ff\in\mathcal{L}, and (g(x1)f(x1))δ(x1)𝑑x1<0\int(g(x_{1})-f(x_{1}))\delta(x_{1})dx_{1}<0, contradicting (2.1). This shows that f(x1)=v(x1)f(x_{1})=v(x_{1}) a.s., as desired.

(ii) We will prove the contrapositive. Suppose there exists an almost surely constant function fFθ(1)f\in F_{\theta}^{(1)}, say f(x)=c0f(x)=c\neq 0 for a.e. x[0,1]x\in[0,1]. Then by (1.13), we have c=α(θcv1𝒯[Sym[W]](x))c=\alpha^{\prime}(\theta c^{v-1}\mathcal{T}[\mathrm{Sym}[W]](x)) for a.e. x[0,1]x\in[0,1]. This implies 𝒯[Sym[W]]()=β(c)θcv1\mathcal{T}[\mathrm{Sym}[W]](\cdot)=\frac{\beta(c)}{\theta c^{v-1}} is constant almost surely, which is a contradiction.

(iii) Without loss of generality, assume that 𝒯[Sym[W]](x)=1\mathcal{T}[\mathrm{Sym}[W]](x)=1 for a.e. x[0,1]x\in[0,1]. Then Sym[W](x1,,xv)\mathrm{Sym}[W](x_{1},\ldots,x_{v}) is a probability density function on [0,1]v[0,1]^{v} with all marginals uniformly distributed on [0,1][0,1]. By an application of Hölder’s inequality with respect to the probability measure induced by Sym[W]\mathrm{Sym}[W], we then have

𝒢W(f)=𝔼(Z1,,Zv)Sym[W][a=1vf(Za)][0,1]|f|v(x)𝑑x.\displaystyle\mathcal{G}_{W}(f)=\mathbb{E}_{(Z_{1},\ldots,Z_{v})\sim\mathrm{Sym}[W]}\bigg{[}\prod_{a=1}^{v}f(Z_{a})\bigg{]}\leq\int_{[0,1]}|f|^{v}(x)\,dx.

Consequently, it holds that

(2.2) supf𝒢W(f)\displaystyle\sup_{f\in\mathcal{L}}\mathcal{G}_{W}(f) supf{01[θ|f|v(x)γ(β(f(x)))]𝑑x}suptcl(𝒩){θ|t|vγ(β(t))}.\displaystyle\leq\sup_{f\in\mathcal{L}}\left\{\int_{0}^{1}[\theta|f|^{v}(x)-\gamma(\beta(f(x)))]\,dx\right\}\leq\sup_{t\in{\rm cl}(\mathcal{N})}\{\theta|t|^{v}-\gamma(\beta(t))\}.

(a) If vv is even, then (2.2) gives

supf𝒢W(f)suptcl(𝒩){θtvγ(β(t))}.\sup_{f\in\mathcal{L}}\mathcal{G}_{W}(f)\leq\sup_{t\in{\rm cl}(\mathcal{N})}\{\theta t^{v}-\gamma(\beta(t))\}.

Equality holds in the above display by taking f to be a constant function. Moreover, any maximizing f must achieve equality in Hölder's inequality, and hence must be a constant function a.s.
(b) If \gamma(\beta(t))\leq\gamma(\beta(-t)) for all t\in\mathcal{N}\cap[0,\infty), then the same inequality continues to hold for all t\in{\rm cl}(\mathcal{N})\cap(0,\infty) by Lemma 2.2. Thus (2.2) gives

𝒢W(f)suptcl(𝒩),t0{θtvγ(β(t))},\mathcal{G}_{W}(f)\leq\sup_{\begin{subarray}{c}t\in{\rm cl}(\mathcal{N}),t\geq 0\end{subarray}}\{\theta t^{v}-\gamma(\beta(t))\},

Again, equality holds in the above display by taking the supremum over constant functions, and any maximizing f is again constant a.s.

(iv) The result for (a) follows immediately on noting that γ(β(t))=\gamma(\beta(t))=\infty for all t<0t<0. We thus focus on proving (b). In this case there exists a symmetric measure ν\nu such that μ=νB\mu=\nu_{B}. Fixing t>0t>0 such that tα()-t\in\alpha^{\prime}(\mathbb{R}), using symmetry of ν\nu it follows that tα()t\in\alpha^{\prime}(\mathbb{R}), and

α(θ)=αν(θ+B)αν(B), where αν(θ):=logeθx𝑑ν(x) for all θ.\alpha(\theta)=\alpha_{\nu}(\theta+B)-\alpha_{\nu}(B),\text{ where }{\alpha}_{\nu}(\theta):=\log\int_{\mathbb{R}}e^{\theta x}d{\nu}(x)\text{ for all }\theta\in\mathbb{R}.

Thus, with \beta_{\nu} denoting the inverse of \alpha_{\nu}^{\prime}, we have \beta_{\nu}(t)=\beta(t)+B for all t\in\mathcal{N}_{\mu}, where \mathcal{N}_{\mu} and \mathcal{N}_{\nu} denote the spaces \mathcal{N} corresponding to \mu and \nu respectively. This gives

(2.3) γ(β(t))=tβν(t)Btαν(βν(t))+αν(B)=γν(βν(t))+αν(B)Bt.\displaystyle\gamma(\beta(t))=t{\beta}_{\nu}(t)-Bt-{\alpha}_{\nu}({\beta}_{\nu}(t))+{\alpha}_{\nu}(B)={\gamma}_{\nu}({\beta}_{\nu}(t))+{\alpha}_{\nu}(B)-Bt.

As \nu is symmetric about 0, the map t\mapsto\gamma_{\nu}(\beta_{\nu}(t)) is an even function. The assumption B\geq 0 along with (2.3) then gives \gamma(\beta(t))\leq\gamma(\beta(-t)) for t\geq 0, completing the proof. ∎

In the sequel, we will first prove Theorem 1.4 independently. Then we will prove Theorem 1.3 using Theorem 1.4. In order to prove Theorem 1.4, we need the following lemma, whose proof we defer.

Lemma 2.3.

Suppose μ\mu satisfies (1.6) for some p>1p>1.
(i) Then with α()\alpha(\cdot) as in 1.3 we have

limθ±α(θ)|θ|1p1=0.\lim_{\theta\to\pm\infty}\frac{\alpha^{\prime}(\theta)}{|\theta|^{\frac{1}{p-1}}}=0.

(ii) With β()\beta(\cdot) as in 1.3 we have

limx{inf{𝒩},sup{𝒩}}β(x)|x|p1=.\lim_{x\to\{\inf\{\mathcal{N}\},\sup\{\mathcal{N}\}\}}\frac{\beta(x)}{|x|^{p-1}}=\infty.
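Both limits can be visualized for the \pm 1 base measure, where \alpha^{\prime}(\theta)=\tanh(\theta), \beta(x)=\mathrm{arctanh}(x) and \mathcal{N}=(-1,1). The sketch below is our own illustration (taking p=2 purely for concreteness) and tracks the two ratios numerically:

```python
import math

# alpha'(theta) = tanh(theta) grows slower than |theta|^{1/(p-1)}  (p = 2)
ratios_alpha = [math.tanh(t) / t for t in (10.0, 100.0, 1000.0)]
# beta(x) = atanh(x) grows faster than |x|^{p-1} as x -> sup N = 1  (p = 2)
ratios_beta = [math.atanh(x) / x for x in (0.9, 0.99, 0.999999)]
print(ratios_alpha)  # decreasing toward 0
print(ratios_beta)   # increasing without bound
```

The first ratio decays like 1/\theta since \tanh is bounded, while the second blows up logarithmically as x\uparrow\sup\mathcal{N}=1.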
Proof of Theorem 1.4.

(i) (1.22) follows from [6, Eq 2.28].

We next prove (1.23). Fix i[n]i\in[n] and use Hölder’s inequality to note that

|mi|\displaystyle|m_{i}| vnv1(i2,,iv)[n]v1|Sym[Qn](i,i2,,iv)|a=2v|Xia|\displaystyle\leq\frac{v}{n^{v-1}}\sum_{(i_{2},\ldots,i_{v})\in[n]^{v-1}}|\mathrm{Sym}[Q_{n}](i,i_{2},\ldots,i_{v})|\prod_{a=2}^{v}|X_{i_{a}}|
v(1nv1(i2,,iv)[n]v1|Sym[Qn](i,i2,,iv)|q)1q(1nv1(i2,,iv)[n]v1a=2v|Xia|p)1p\displaystyle\leq v\left(\frac{1}{n^{v-1}}\sum_{(i_{2},\ldots,i_{v})\in[n]^{v-1}}|\mathrm{Sym}[Q_{n}](i,i_{2},\ldots,i_{v})|^{q}\right)^{\frac{1}{q}}\left(\frac{1}{n^{v-1}}\sum_{(i_{2},\ldots,i_{v})\in[n]^{v-1}}\prod_{a=2}^{v}|X_{i_{a}}|^{p}\right)^{\frac{1}{p}}
(2.4) =v(1nv1(i2,,iv)[n]v1|Sym[Qn](i,i2,,iv)|q)1q(1nj=1n|Xj|p)v1p.\displaystyle=v\left(\frac{1}{n^{v-1}}\sum_{(i_{2},\ldots,i_{v})\in[n]^{v-1}}|\mathrm{Sym}[Q_{n}](i,i_{2},\ldots,i_{v})|^{q}\right)^{\frac{1}{q}}\left(\frac{1}{n}\sum_{j=1}^{n}|X_{j}|^{p}\right)^{\frac{v-1}{p}}.

Raising both sides to the q^{th} power and averaging (2.4) over i\in[n] gives

1ni=1n|mi|q\displaystyle\frac{1}{n}\sum_{i=1}^{n}|m_{i}|^{q} vq(1nj=1n|Xj|p)q(v1)p(1nv(i1,,iv)[n]v|Sym[Qn](i1,i2,,iv)|q)\displaystyle\leq v^{q}\left(\frac{1}{n}\sum_{j=1}^{n}|X_{j}|^{p}\right)^{\frac{q(v-1)}{p}}\left(\frac{1}{n^{v}}\sum_{(i_{1},\ldots,i_{v})\in[n]^{v}}|\mathrm{Sym}[Q_{n}](i_{1},i_{2},\ldots,i_{v})|^{q}\right)
(2.5) vq(1nj=1n|Xj|p)q(v1)pWQnqΔq|E(H)|,\displaystyle\leq v^{q}\left(\frac{1}{n}\sum_{j=1}^{n}|X_{j}|^{p}\right)^{\frac{q(v-1)}{p}}\|W_{Q_{n}}\|_{q\Delta}^{q|E(H)|},

where the last inequality uses Proposition 2.1 part (iii), with W\equiv W_{Q_{n}}. The conclusion then follows by (1.8) and (1.22).

Next, we will prove (1.24). By Lemma 2.3 part (i), there exists c_{\mu}>0 such that for all \theta\in\mathbb{R} we have

(2.6) |α(θ)|cμ|θ|1p1.\displaystyle|\alpha^{\prime}(\theta)|\leq c_{\mu}|\theta|^{\frac{1}{p-1}}.

Now, note the following chain of equalities/inequalities with explanations to follow.

|α(θmi)|\displaystyle|\alpha^{\prime}(\theta m_{i})| =|α(θvnv1(i2,,iv)𝒮(n,v,i)|Sym[Qn](i,i2,,iv)|a=2v|Xia|)|\displaystyle=\bigg{|}\alpha^{\prime}\left(\frac{\theta v}{n^{v-1}}\sum_{(i_{2},\ldots,i_{v})\in\mathcal{S}(n,v,i)}|\mathrm{Sym}[Q_{n}](i,i_{2},\ldots,i_{v})|\prod_{a=2}^{v}|X_{i_{a}}|\right)\bigg{|}
cμ(θvnv1(i2,,iv)[n]v1|Sym[Qn](i,i2,,iv)|a=2v|Xia|)1p1\displaystyle\leq c_{\mu}\left(\frac{\theta v}{n^{v-1}}\sum_{(i_{2},\ldots,i_{v})\in[n]^{v-1}}|\mathrm{Sym}[Q_{n}](i,i_{2},\ldots,i_{v})|\prod_{a=2}^{v}|X_{i_{a}}|\right)^{\frac{1}{p-1}}
cμ(θv)1p1(1nv1(i2,,iv)[n]v1|Sym[Qn](i,i2,,iv)|q)1q(p1)(1nj=1n|Xj|p)v1p(p1).\displaystyle\leq c_{\mu}(\theta v)^{\frac{1}{p-1}}\left(\frac{1}{n^{v-1}}\sum_{(i_{2},\ldots,i_{v})\in[n]^{v-1}}|\mathrm{Sym}[Q_{n}](i,i_{2},\ldots,i_{v})|^{q}\right)^{\frac{1}{q(p-1)}}\left(\frac{1}{n}\sum_{j=1}^{n}|X_{j}|^{p}\right)^{\frac{v-1}{p(p-1)}}.

The first inequality follows directly from (2.6). The second inequality follows from Hölder’s inequality, as in (2.4). Raising both sides to the power pp and summing over i[n]i\in[n], we get:

1ni=1n|α(θmi)|p\displaystyle\;\;\;\;\;\frac{1}{n}\sum_{i=1}^{n}|\alpha^{\prime}(\theta m_{i})|^{p}
1ni=1ncμp(θv)pp1(1nv1(i2,,iv)|Sym[Qn](i,i2,,iv)|q)pq(p1)(1nj=1n|Xj|p)v1p1\displaystyle\leq\frac{1}{n}\sum_{i=1}^{n}c_{\mu}^{p}(\theta v)^{\frac{p}{p-1}}\left(\frac{1}{n^{v-1}}\sum_{(i_{2},\ldots,i_{v})}|\mathrm{Sym}[Q_{n}](i,i_{2},\ldots,i_{v})|^{q}\right)^{\frac{p}{q(p-1)}}\left(\frac{1}{n}\sum_{j=1}^{n}|X_{j}|^{p}\right)^{\frac{v-1}{p-1}}
(2.7) cμp(θv)pp1(1+WQnqΔq|E(H)|)(1nj=1n|Xj|p)v1p1.\displaystyle\leq c_{\mu}^{p}(\theta v)^{\frac{p}{p-1}}\left(1+\lVert W_{Q_{n}}\rVert_{q\Delta}^{q|E(H)|}\right)\left(\frac{1}{n}\sum_{j=1}^{n}|X_{j}|^{p}\right)^{\frac{v-1}{p-1}}.

The final inequality follows by noting that |x|pq(p1)1+|x||x|^{\frac{p}{q(p-1)}}\leq 1+|x| and then using 2.1 part (c), with WWQnW\equiv W_{Q_{n}}. The conclusion follows by (1.8) and (1.22).
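For completeness, the elementary bound $|x|^{\frac{p}{q(p-1)}}\leq 1+|x|$ invoked above can be justified in two lines (a sketch, using only the standing assumption $\frac{1}{p}+\frac{1}{q}\leq 1$ and $p>1$):

```latex
% Since 1/p + 1/q \le 1, we have q \ge p/(p-1), so the exponent
% s := p/(q(p-1)) lies in (0, 1]:
\[
0 < s = \frac{p}{q(p-1)} \le \frac{p}{\frac{p}{p-1}\,(p-1)} = 1.
\]
% For any s \in (0,1]: if |x| \le 1 then |x|^{s} \le 1, while if |x| > 1
% then |x|^{s} \le |x|. In either case, |x|^{s} \le 1 + |x|.
```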

(ii) The proof of (1.25) is very similar to that of part (a). First, supνΞ(Fθ)𝔪p(ν)<\sup_{\nu\in\Xi(F_{\theta})}\mathfrak{m}_{p}(\nu)<\infty follows from [6, Eq 2.29]. This also implies:

(2.8) supfFθfp=supνΞ(Fθ)𝔼ν[V|U]psupνΞ(Fθ)𝔼ν|V|p=supνΞ(Fθ)𝔪p(ν)<.\sup_{f\in F_{\theta}}\lVert f\rVert_{p}=\sup_{\nu\in\Xi(F_{\theta})}\lVert\mathbb{E}_{\nu}[V|U]\rVert_{p}\leq\sup_{\nu\in\Xi(F_{\theta})}\mathbb{E}_{\nu}|V|^{p}=\sup_{\nu\in\Xi(F_{\theta})}\mathfrak{m}_{p}(\nu)<\infty.

Next, in the same vein as (2.5), we get

supν𝔅θ𝔪q(ν)vqsupfFθfpq(v1)WqΔq|E(H)|<,\sup_{\nu\in\mathfrak{B}_{\theta}^{*}}\mathfrak{m}_{q}(\nu)\leq v^{q}\sup_{f\in F_{\theta}}\lVert f\rVert_{p}^{q(v-1)}\lVert W\rVert_{q\Delta}^{q|E(H)|}<\infty,

by invoking (2.8) and (1.11), thereby proving the second conclusion. Finally, proceeding similarly to (2.7), we have

supν𝔅~θ𝔪p(ν)cμp(θv)pp1(1+WqΔq|E(H)|)supfFθfp(v1)pp1<,\sup_{\nu\in\widetilde{\mathfrak{B}}_{\theta}}\mathfrak{m}_{p}(\nu)\leq c_{\mu}^{p}(\theta v)^{\frac{p}{p-1}}\left(1+\lVert W\rVert_{q\Delta}^{q|E(H)|}\right)\sup_{f\in F_{\theta}}\lVert f\rVert_{p}^{\frac{(v-1)p}{p-1}}<\infty,

where we have used (2.8) and (1.11). This completes the proof of part (b).

2.2. Proof of 1.3

(i) The fact that Ξ(Fθ)~p\Xi(F_{\theta})\subseteq\widetilde{\mathcal{M}}_{p} follows directly from (1.25). By an application of Hölder’s inequality with 2.1 part (iii), we get:

𝔼[Sym[|W|](U1,U2,,Uv)a=2v|Va|]WqΔ|E(H)|(𝔼|V1|p)v1p,\mathbb{E}\left[\mathrm{Sym}[|W|](U_{1},U_{2},\ldots,U_{v})\prod_{a=2}^{v}|V_{a}|\right]\leq\lVert W\rVert_{q\Delta}^{|E(H)|}\left(\mathbb{E}|V_{1}|^{p}\right)^{\frac{v-1}{p}},

which is finite on using (1.6) and (1.11). By Fubini’s Theorem, ϑW,ν(.)\vartheta_{W,\nu}(.) (see (1.18)) is well-defined a.s. on [0,1][0,1], as desired.

Remark 2.3.

Note that the above argument does not require 1/p+1/q<11/p+1/q<1; the weaker condition 1/p+1/q11/p+1/q\leq 1 suffices.

(ii) We begin the proof with the following definition.

Definition 2.1.

Let 𝒲\mathcal{W} and ~p\widetilde{\mathcal{M}}_{p} be as in 1.1 and 1.2 respectively. Recall that 𝔪p(ν)=|x|p𝑑ν(2)(x)<\mathfrak{m}_{p}(\nu)=\int|x|^{p}\,d\nu_{(2)}(x)<\infty (from 1.2) for ν~p\nu\in\widetilde{\mathcal{M}}_{p}. Define

:={(W,ν),W𝒲,ν~p,WqΔ<}.\mathcal{R}:=\{(W,\nu),\ W\in\mathcal{W},\ \nu\in\widetilde{\mathcal{M}}_{p},\ \lVert W\rVert_{q\Delta}<\infty\}.

Construct the following function Υ:\Upsilon:\mathcal{R}\to\mathcal{M} (the space of probability measures on [0,1]×[0,1]\times\mathbb{R}) by setting:

(2.9) Υ(W,ν):=Law(U1,ϑW,ν(U1)).\displaystyle\Upsilon(W,\nu):=\mathrm{Law}\left(U_{1},\vartheta_{W,\nu}(U_{1})\right).

Here (U1,V1),,(Uv,Vv)i.i.d.ν(U_{1},V_{1}),\ldots,(U_{v},V_{v})\overset{i.i.d.}{\sim}\nu, and ϑW,ν(.)\vartheta_{W,\nu}(.) is as in (1.18). Note that Υ(W,ν)\Upsilon(W,\nu) is well-defined for (W,ν)(W,\nu)\in\mathcal{R}, as the function ϑW,ν(.)\vartheta_{W,\nu}(.) is well-defined a.s. by 1.3 part (i). Also for L>0L>0 and a random variable XX, set X(L)=X𝟏(|X|L)X^{(L)}=X\mathbf{1}(|X|\leq L). For any measure ν~\nu\in\widetilde{\mathcal{M}} and (U,V)ν(U,V)\sim\nu, let ν(L)\nu^{(L)} denote the distribution of the truncated random variable (U,V(L))(U,V^{(L)}).

Set

(2.10) mi,V(L):=vnv1(i2,,iv)[n]v1Sym[Qn](i,i2,,iv)(a=2vXia(L)).m_{i,V}^{(L)}:=\frac{v}{n^{v-1}}\sum_{(i_{2},\ldots,i_{v})\in[n]^{v-1}}\mathrm{Sym}[Q_{n}]\left(i,i_{2},\ldots,i_{v}\right)\left(\prod_{a=2}^{v}X_{i_{a}}^{(L)}\right).

As a shorthand, we denote mi,V:=mi,V()m_{i,V}:=m_{i,V}^{(\infty)}. Let us also define

𝐗(L):={X1(L),,Xn(L)},𝐦V(L):={m1,V(L),,mn,V(L)},𝐦V:={m1,V,,mn,V}.\mathbf{X}^{(L)}:=\{X_{1}^{(L)},\ldots,X_{n}^{(L)}\},\quad\mathbf{m}_{V}^{(L)}:=\{m_{1,V}^{(L)},\ldots,m_{n,V}^{(L)}\},\quad\mathbf{m}_{V}:=\{m_{1,V},\ldots,m_{n,V}\}.

Following (1.7), we have:

𝔏n(𝐦V):=1ni=1nδ(in,mi,V).\mathfrak{L}_{n}(\mathbf{m}_{V}):=\frac{1}{n}\sum_{i=1}^{n}\delta_{\left(\frac{i}{n},m_{i,V}\right)}.

Next we generate UUnif[0,1]U\sim\mathrm{Unif}[0,1]. We define a map 𝔏~n\tilde{\mathfrak{L}}_{n} from n\mathbb{R}^{n} to ~\widetilde{\mathcal{M}} given by

(2.11) 𝔏~n(𝐱):=Law(U,xnU),𝐱=(x1,,xn).\tilde{\mathfrak{L}}_{n}(\mathbf{x}):=\mathrm{Law}(U,x_{\lceil nU\rceil}),\qquad\mathbf{x}=(x_{1},\ldots,x_{n}).

In view of (2.11), note that 𝔏~n(𝐗)\tilde{\mathfrak{L}}_{n}(\mathbf{X}), 𝔏~n(𝐗(L))\tilde{\mathfrak{L}}_{n}(\mathbf{X}^{(L)}), 𝔏~n(𝐦V(L))\tilde{\mathfrak{L}}_{n}(\mathbf{m}_{V}^{(L)}), and 𝔏~n(𝐦V)\tilde{\mathfrak{L}}_{n}(\mathbf{m}_{V}) denote the laws of (U,XnU)(U,X_{\lceil nU\rceil}), (U,XnU(L))(U,X_{\lceil nU\rceil}^{(L)}), (U,mnU,V(L))(U,m_{\lceil nU\rceil,V}^{(L)}), and (U,mnU,V)(U,m_{\lceil nU\rceil,V}) conditioned on X1,,XnX_{1},\ldots,X_{n}, respectively. Also, with Υ\Upsilon as in 2.1, we have

(2.12) Υ(WQn,𝔏~n(𝐗))=𝔏~n(𝐦V),Υ(WQn,𝔏~n(𝐗(L)))=𝔏~n(𝐦V(L)).\Upsilon\left(W_{Q_{n}},\tilde{\mathfrak{L}}_{n}(\mathbf{X})\right)=\tilde{\mathfrak{L}}_{n}(\mathbf{m}_{V}),\quad\Upsilon\left(W_{Q_{n}},\tilde{\mathfrak{L}}_{n}(\mathbf{X}^{(L)})\right)=\tilde{\mathfrak{L}}_{n}(\mathbf{m}_{V}^{(L)}).

In order to prove the above, note that, given any bounded continuous real-valued function ff on [0,1]×[0,1]\times\mathbb{R}, we have:

𝔼Υ(WQn,𝔏~n(𝐗))[f]\displaystyle\mathbb{E}_{\Upsilon\left(W_{Q_{n}},\tilde{\mathfrak{L}}_{n}(\mathbf{X})\right)}[f]
=01f(u1,v[0,1]v1Sym[Qn](nu1,nu2,,nuv)(a=2vXnua)𝑑u2𝑑uv)𝑑u1\displaystyle=\int_{0}^{1}f\left(u_{1},v\int_{[0,1]^{v-1}}\mathrm{Sym}[Q_{n}](\lceil nu_{1}\rceil,\lceil nu_{2}\rceil,\ldots,\lceil nu_{v}\rceil)\left(\prod_{a=2}^{v}X_{\lceil nu_{a}\rceil}\right)\,du_{2}\ldots\,du_{v}\right)\,du_{1}
=i1=1ni11ni1nf(u1,vnv1(i2,,iv)[n]v1Sym[Qn](nu1,i2,,iv)(a=2vXia))𝑑u1\displaystyle=\sum_{i_{1}=1}^{n}\int_{\frac{i_{1}-1}{n}}^{\frac{i_{1}}{n}}f\left(u_{1},\frac{v}{n^{v-1}}\sum_{(i_{2},\ldots,i_{v})\in[n]^{v-1}}\mathrm{Sym}[Q_{n}](\lceil nu_{1}\rceil,i_{2},\ldots,i_{v})\left(\prod_{a=2}^{v}X_{i_{a}}\right)\right)\,du_{1}
=i1=1ni11ni1nf(u1,mnu1,V)𝑑u1=𝔼𝔏~n(𝐦V)[f],\displaystyle=\sum_{i_{1}=1}^{n}\int_{\frac{i_{1}-1}{n}}^{\frac{i_{1}}{n}}f(u_{1},m_{\lceil nu_{1}\rceil,V})\,du_{1}=\mathbb{E}_{\tilde{\mathfrak{L}}_{n}(\mathbf{m}_{V})}[f],

and so the first conclusion of (2.12) holds. The proof of the second conclusion is similar. By the definition of 𝔅θ\mathfrak{B}^{*}_{\theta} in 1.3, we have

(2.13) 𝔅θ={Υ(W,ν):νΞ(Fθ)}=Υ(W,Ξ(Fθ)).\mathfrak{B}^{*}_{\theta}=\{\Upsilon(W,\nu):\ \nu\in\Xi(F_{\theta})\}=\Upsilon(W,\Xi(F_{\theta})).

With 𝔏n(𝐦)\mathfrak{L}_{n}(\mathbf{m}) as in (1.15), the triangle inequality gives

d(𝔏n(𝐦),𝔅θ)\displaystyle\;\;\;\;d_{\ell}(\mathfrak{L}_{n}(\mathbf{m}),\mathfrak{B}^{*}_{\theta})
d(𝔏n(𝐦),𝔏n(𝐦V))+d(𝔏n(𝐦V),𝔏~n(𝐦V))+d(𝔏~n(𝐦V),𝔏~n(𝐦V(L)))\displaystyle\leq d_{\ell}(\mathfrak{L}_{n}(\mathbf{m}),\mathfrak{L}_{n}(\mathbf{m}_{V}))+d_{\ell}(\mathfrak{L}_{n}(\mathbf{m}_{V}),\tilde{\mathfrak{L}}_{n}(\mathbf{m}_{V}))+d_{\ell}(\tilde{\mathfrak{L}}_{n}(\mathbf{m}_{V}),\tilde{\mathfrak{L}}_{n}(\mathbf{m}_{V}^{(L)}))
+d(𝔏~n(𝐦V(L)),Υ(W,𝔏~n(𝐗(L))))+d(Υ(W,𝔏~n(𝐗(L))),𝔅θ)\displaystyle\quad+d_{\ell}(\tilde{\mathfrak{L}}_{n}(\mathbf{m}_{V}^{(L)}),\Upsilon(W,\tilde{\mathfrak{L}}_{n}(\mathbf{X}^{(L)})))+d_{\ell}(\Upsilon(W,\tilde{\mathfrak{L}}_{n}(\mathbf{X}^{(L)})),\mathfrak{B}^{*}_{\theta})
=d(𝔏n(𝐦),𝔏n(𝐦V))+d(𝔏n(𝐦V),𝔏~n(𝐦V))+d(Υ(WQn,𝔏~n(𝐗)),Υ(WQn,𝔏~n(𝐗(L))))\displaystyle=d_{\ell}(\mathfrak{L}_{n}(\mathbf{m}),\mathfrak{L}_{n}(\mathbf{m}_{V}))+d_{\ell}(\mathfrak{L}_{n}(\mathbf{m}_{V}),\tilde{\mathfrak{L}}_{n}(\mathbf{m}_{V}))+d_{\ell}(\Upsilon(W_{Q_{n}},\tilde{\mathfrak{L}}_{n}(\mathbf{X})),\Upsilon(W_{Q_{n}},\tilde{\mathfrak{L}}_{n}(\mathbf{X}^{(L)})))
+d(Υ(WQn,𝔏~n(𝐗(L))),Υ(W,𝔏~n(𝐗(L))))+d(Υ(W,𝔏~n(𝐗(L))),Υ(W,Ξ(Fθ))),\displaystyle\quad+d_{\ell}(\Upsilon(W_{Q_{n}},\tilde{\mathfrak{L}}_{n}(\mathbf{X}^{(L)})),\Upsilon(W,\tilde{\mathfrak{L}}_{n}(\mathbf{X}^{(L)})))+d_{\ell}(\Upsilon(W,\tilde{\mathfrak{L}}_{n}(\mathbf{X}^{(L)})),\Upsilon(W,\Xi(F_{\theta}))),

where the second equality uses (2.12) and (2.13). We now show that each of the terms on the right-hand side converges to 0 as we take limits with nn\to\infty first, followed by LL\to\infty. To this end, we observe that:

d(𝔏n(𝐦V),𝔏~n(𝐦V))\displaystyle d_{\ell}\left(\mathfrak{L}_{n}(\mathbf{m}_{V}),\tilde{\mathfrak{L}}_{n}(\mathbf{m}_{V})\right) =supfLip(1)|1ni=1nf(in,mi,V)i=1ni1ninf(u,mi,V)𝑑u|\displaystyle=\sup_{f\in\mathrm{Lip}(1)}\bigg{|}\frac{1}{n}\sum_{i=1}^{n}f\left(\frac{i}{n},m_{i,V}\right)-\sum_{i=1}^{n}\int_{\frac{i-1}{n}}^{\frac{i}{n}}f(u,m_{i,V})\,du\bigg{|}
(2.14) supfLip(1)i=1ni1nin|f(in,mi,V)f(u,mi,V)|𝑑u1n0.\displaystyle\leq\sup_{f\in\mathrm{Lip}(1)}\sum_{i=1}^{n}\int_{\frac{i-1}{n}}^{\frac{i}{n}}\bigg{|}f\left(\frac{i}{n},m_{i,V}\right)-f(u,m_{i,V})\bigg{|}\,du\leq\frac{1}{n}\to 0.

Based on the above two displays, it now suffices to prove the following:

(2.15) d(𝔏n(𝐦),𝔏n(𝐦V))𝑃0,d_{\ell}(\mathfrak{L}_{n}(\mathbf{m}),\mathfrak{L}_{n}(\mathbf{m}_{V}))\overset{P}{\longrightarrow}0,
(2.16) d(Υ(WQn,𝔏~n(𝐗)),Υ(WQn,𝔏~n(𝐗(L))))𝑃0,d_{\ell}(\Upsilon(W_{Q_{n}},\tilde{\mathfrak{L}}_{n}(\mathbf{X})),\Upsilon(W_{Q_{n}},\tilde{\mathfrak{L}}_{n}(\mathbf{X}^{(L)})))\overset{P}{\longrightarrow}0,

as nn\to\infty followed by LL\to\infty, and

(2.17) d(Υ(WQn,𝔏~n(𝐗(L))),Υ(W,𝔏~n(𝐗(L))))𝑃0,d_{\ell}(\Upsilon(W_{Q_{n}},\tilde{\mathfrak{L}}_{n}(\mathbf{X}^{(L)})),\Upsilon(W,\tilde{\mathfrak{L}}_{n}(\mathbf{X}^{(L)})))\overset{P}{\longrightarrow}0,

as nn\to\infty for every fixed L>0L>0, and

(2.18) d(Υ(W,𝔏~n(𝐗(L))),Υ(W,Ξ(Fθ)))𝑃0,d_{\ell}(\Upsilon(W,\tilde{\mathfrak{L}}_{n}(\mathbf{X}^{(L)})),\Upsilon(W,\Xi(F_{\theta})))\overset{P}{\longrightarrow}0,

as nn\to\infty, followed by LL\to\infty.

We now split the proof into four parts, proving the four preceding displays. We begin with the proof of (2.15), which requires the following lemma. It is a variant of [6, Lemma 2.7]. We omit the details of its proof for brevity.

Lemma 2.4.

Suppose QnQ_{n} satisfies (1.8) for some q>1q>1. Let ϕ~:v1[L,L]\tilde{\phi}:\mathbb{R}^{v-1}\to[-L,L] for some L>0L>0, and let 𝒮(n,v,i)\mathcal{S}(n,v,i) be as in Definition 1.8. Then, given any permutation σ\sigma of [v][v], we get:

limn1nvsup(x1,,xn)ni1=1n\displaystyle\lim\limits_{n\rightarrow\infty}\frac{1}{n^{v}}\sup\limits_{\begin{subarray}{c}(x_{1},\ldots,x_{n})\\ \in\mathbb{R}^{n}\end{subarray}}\sum_{i_{1}=1}^{n} |(i2,,iv)𝒮(n,v,i1)((a,b)E(H)Qn(iσ(a),iσ(b)))ϕ~(xi2,,xiv)\displaystyle\bigg{\lvert}\sum_{\begin{subarray}{c}(i_{2},\ldots,i_{v})\\ \in\mathcal{S}(n,v,i_{1})\end{subarray}}\left(\prod_{(a,b)\in E(H)}Q_{n}(i_{\sigma(a)},i_{\sigma(b)})\right)\tilde{\phi}(x_{i_{2}},\ldots,x_{i_{v}})-
(i2,,iv)[n]v1((a,b)E(H)Qn(iσ(a),iσ(b)))ϕ~(xi2,,xiv)|=0.\displaystyle\sum_{\begin{subarray}{c}(i_{2},\ldots,i_{v})\\ \in[n]^{v-1}\end{subarray}}\left(\prod_{(a,b)\in E(H)}Q_{n}(i_{\sigma(a)},i_{\sigma(b)})\right)\tilde{\phi}(x_{i_{2}},\ldots,x_{i_{v}})\bigg{\rvert}=0.
Proof of (2.15).

With 𝒮(n,v,i)\mathcal{S}(n,v,i) as in Definition 1.8, define

(2.19) mi(L):=vnv1(i2,,iv)𝒮(n,v,i)Sym[Qn](i,i2,,iv)(a=2vXia(L)).m_{i}^{(L)}:=\frac{v}{n^{v-1}}\sum_{(i_{2},\ldots,i_{v})\in\mathcal{S}(n,v,i)}\mathrm{Sym}[Q_{n}]\left(i,i_{2},\ldots,i_{v}\right)\left(\prod_{a=2}^{v}X_{i_{a}}^{(L)}\right).

It then suffices to prove the following:

(2.20) limLlim supn(n1i=1n|mimi(L)|ϵ)=0,\lim\limits_{L\to\infty}\limsup\limits_{n\to\infty}\mathbb{P}\left(n^{-1}\sum_{i=1}^{n}|m_{i}-m_{i}^{(L)}|\geq\epsilon\right)=0,
(2.21) limLlim supn(n1i=1n|mi,Vmi,V(L)|ϵ)=0,\lim\limits_{L\to\infty}\limsup\limits_{n\to\infty}\mathbb{P}\left(n^{-1}\sum_{i=1}^{n}|m_{i,V}-m_{i,V}^{(L)}|\geq\epsilon\right)=0,

for any ϵ>0\epsilon>0, and for any fixed L>0L>0,

(2.22) i[n]|mi(L)mi,V(L)|=oP(n).\sum_{i\in[n]}|m_{i}^{(L)}-m_{i,V}^{(L)}|=o_{P}(n).

Proof of (2.20). Fix p~(1,p)\tilde{p}\in(1,p) such that p~1+q1<1\tilde{p}^{-1}+q^{-1}<1. For any L>1L>1 we have

1ni=1n|mimi(L)|\displaystyle\;\;\;\;\frac{1}{n}\sum_{i=1}^{n}|m_{i}-m_{i}^{(L)}|
vnvA{2,,v},|A|1(i1,,iv)[n]v|Sym[Qn](i1,i2,,iv)|(aA|XiaXia(L)|)(aAc|Xia(L)|)\displaystyle\leq\frac{v}{n^{v}}\sum_{A\subseteq\{2,\ldots,v\},\ |A|\geq 1}\sum_{(i_{1},\ldots,i_{v})\in[n]^{v}}\big{|}\mathrm{Sym}[Q_{n}]\left(i_{1},i_{2},\ldots,i_{v}\right)\big{|}\left(\prod_{a\in A}|X_{i_{a}}-X_{i_{a}}^{(L)}|\right)\left(\prod_{a\in A^{c}}|X_{i_{a}}^{(L)}|\right)
vA{2,,v},|A|1[1nv(i1,,iv)[n]v|Sym[Qn](i1,i2,,iv)|q]1q\displaystyle\leq v\sum_{A\subseteq\{2,\ldots,v\},\ |A|\geq 1}\left[\frac{1}{n^{v}}\sum_{(i_{1},\ldots,i_{v})\in[n]^{v}}\big{|}\mathrm{Sym}[Q_{n}]\left(i_{1},i_{2},\ldots,i_{v}\right)\big{|}^{q}\right]^{\frac{1}{q}}
(aA(1nia=1n|Xia|p~𝟏(|Xia|>L)))1p~(aAc(1nia=1n|Xia|p~𝟏(|Xia|L)))1p~\displaystyle\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\left(\prod_{a\in A}\left(\frac{1}{n}\sum_{i_{a}=1}^{n}|X_{i_{a}}|^{\tilde{p}}\mathbf{1}(|X_{i_{a}}|>L)\right)\right)^{\frac{1}{\tilde{p}}}\left(\prod_{a\in A^{c}}\left(\frac{1}{n}\sum_{i_{a}=1}^{n}|X_{i_{a}}|^{\tilde{p}}\mathbf{1}(|X_{i_{a}}|\leq L)\right)\right)^{\frac{1}{\tilde{p}}}
(2.23) v2v1Lp~pWQnqΔ|E(H)|(1+1ni=1n|Xi|p)v1p~.\displaystyle\leq v2^{v-1}L^{\tilde{p}-p}\lVert W_{Q_{n}}\rVert_{q\Delta}^{|E(H)|}\left(1+\frac{1}{n}\sum_{i=1}^{n}|X_{i}|^{p}\right)^{\frac{v-1}{\tilde{p}}}.

Here the first inequality uses (2.19), the second uses Hölder’s inequality, and the final inequality follows from 2.1 part (iii) along with the inequalities

|x|p~𝟏(|x|>L)Lp~p(1+|x|p),|x|p~𝟏(|x|L)1+|x|p.|x|^{\tilde{p}}\mathbf{1}(|x|>L)\leq L^{\tilde{p}-p}(1+|x|^{p}),\quad|x|^{\tilde{p}}\mathbf{1}(|x|\leq L)\leq 1+|x|^{p}.
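Both displayed truncation bounds admit one-line verifications (a sketch, using only $1<\tilde{p}<p$ and $L>1$):

```latex
% First bound: on the event |x| > L (with L > 1),
\[
|x|^{\tilde{p}} \mathbf{1}(|x|>L)
= |x|^{p}\, |x|^{\tilde{p}-p}\, \mathbf{1}(|x|>L)
\le L^{\tilde{p}-p} |x|^{p}
\le L^{\tilde{p}-p}\bigl(1+|x|^{p}\bigr),
\]
% since \tilde{p} - p < 0 makes t \mapsto t^{\tilde{p}-p} decreasing on (0, \infty).
% Second bound: if |x| \le 1 then |x|^{\tilde{p}} \le 1, and if |x| > 1 then
% |x|^{\tilde{p}} \le |x|^{p}; either way |x|^{\tilde{p}} \mathbf{1}(|x| \le L) \le 1 + |x|^{p}.
```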

The conclusion holds on noting that the RHS of (2.23) converges to 0 on letting nn\to\infty followed by LL\to\infty, since WQnqΔ=O(1)\lVert W_{Q_{n}}\rVert_{q\Delta}=O(1) and 𝔪p(𝔏~n(𝐗))=Op(1)\mathfrak{m}_{p}(\tilde{\mathfrak{L}}_{n}(\mathbf{X}))=O_{p}(1) (which are direct consequences of (1.8) and (1.22) respectively). This proves (2.20).

Proof of (2.21). The proof is the same as that of (2.20). We skip the details for brevity.

Proof of (2.22). Using (2.10) and (2.19), observe that

1ni1=1n|mi1(L)mi1,V(L)|\displaystyle\frac{1}{n}\sum_{i_{1}=1}^{n}|m_{i_{1}}^{(L)}-m_{i_{1},V}^{(L)}|
=1nvi1=1n|1v!σ𝒮v[(i2,,iv)𝒮(n,v,i1)((a,b)E(H)Qn(iσ(a),iσ(b)))a=2vXia(L)\displaystyle=\frac{1}{n^{v}}\sum_{i_{1}=1}^{n}\Bigg{|}\frac{1}{v!}\sum_{\sigma\in\mathcal{S}_{v}}\Bigg{[}\sum_{\begin{subarray}{c}(i_{2},\ldots,i_{v})\\ \in\mathcal{S}(n,v,i_{1})\end{subarray}}\left(\prod_{(a,b)\in E(H)}Q_{n}(i_{\sigma(a)},i_{\sigma(b)})\right)\prod_{a=2}^{v}X_{i_{a}}^{(L)}
(i2,,iv)[n]v1((a,b)E(H)Qn(iσ(a),iσ(b)))a=2vXia(L)]|\displaystyle\qquad-\sum_{\begin{subarray}{c}(i_{2},\ldots,i_{v})\\ \in[n]^{v-1}\end{subarray}}\left(\prod_{(a,b)\in E(H)}Q_{n}(i_{\sigma(a)},i_{\sigma(b)})\right)\prod_{a=2}^{v}X_{i_{a}}^{(L)}\Bigg{]}\Bigg{|}
1nvmaxσ𝒮vi1=1n|(i2,,iv)𝒮(n,v,i1)((a,b)E(H)Qn(iσ(a),iσ(b)))a=2vXia(L)\displaystyle\leq\frac{1}{n^{v}}\max_{\sigma\in\mathcal{S}_{v}}\sum_{i_{1}=1}^{n}\bigg{\lvert}\sum_{\begin{subarray}{c}(i_{2},\ldots,i_{v})\\ \in\mathcal{S}(n,v,i_{1})\end{subarray}}\left(\prod_{(a,b)\in E(H)}Q_{n}(i_{\sigma(a)},i_{\sigma(b)})\right)\prod_{a=2}^{v}X_{i_{a}}^{(L)}-
(i2,,iv)[n]v1((a,b)E(H)Qn(iσ(a),iσ(b)))a=2vXia(L)|.\displaystyle\sum_{\begin{subarray}{c}(i_{2},\ldots,i_{v})\\ \in[n]^{v-1}\end{subarray}}\left(\prod_{(a,b)\in E(H)}Q_{n}(i_{\sigma(a)},i_{\sigma(b)})\right)\prod_{a=2}^{v}X_{i_{a}}^{(L)}\bigg{\rvert}.

The RHS above converges to 0 as nn\to\infty, using 2.4 with ϕ~(x2,,xv)=a=2vxa(L)\tilde{\phi}(x_{2},\ldots,x_{v})=\prod_{a=2}^{v}x^{(L)}_{a} along with the triangle inequality.

In order to prove (2.16) and (2.17), we need the following additional lemma whose proof we defer to Section 4.

Lemma 2.5.

Fix a graph HH with vv vertices and maximum degree Δ\Delta as before. Fix p,q>0p,q>0 such that 1p+1q<1\frac{1}{p}+\frac{1}{q}<1, pvp\geq v. Then Υ\Upsilon is well-defined on \mathcal{R}, and the following conclusions hold:

  1. (i)

    Fix C>0C>0. Then

    limLsupν~:𝔪p(ν)CsupW𝒲:WqΔCd(Υ(W,ν),Υ(W,ν(L)))=0.\lim\limits_{L\to\infty}\sup_{\nu\in\widetilde{\mathcal{M}}:\ \mathfrak{m}_{p}(\nu)\leq C}\sup_{W\in\mathcal{W}:\ \lVert W\rVert_{q\Delta}\leq C}d_{\ell}(\Upsilon(W,\nu),\Upsilon(W,\nu^{(L)}))=0.
  2. (ii)

Suppose Wk,W𝒲W_{k},W_{\infty}\in\mathcal{W}, k1k\geq 1, are such that d(Wk,W)0d_{\square}(W_{k},W_{\infty})\to 0 as kk\to\infty, and sup1kWkqΔ<\sup_{1\leq k\leq\infty}\lVert W_{k}\rVert_{q\Delta}<\infty. Fix L(0,)L\in(0,\infty) and let ~(L)\widetilde{\mathcal{M}}^{(L)} denote the subset of ~\widetilde{\mathcal{M}} consisting of measures whose second marginal is compactly supported on [L,L][-L,L]. Then we have

    limksupν~(L)d(Υ(Wk,ν),Υ(W,ν))=0.\lim\limits_{k\to\infty}\sup_{\nu\in\widetilde{\mathcal{M}}^{(L)}}d_{\ell}(\Upsilon(W_{k},\nu),\Upsilon(W_{\infty},\nu))=0.
  3. (iii)

Fix W𝒲W\in\mathcal{W} such that WqΔ<\lVert W\rVert_{q\Delta}<\infty, and let νk,ν~(L)\nu_{k},\nu_{\infty}\in\widetilde{\mathcal{M}}^{(L)} be such that d(νk,ν)0d_{\ell}(\nu_{k},\nu_{\infty})\to 0. Then,

    limkd(Υ(W,νk),Υ(W,ν))=0.\lim\limits_{k\to\infty}d_{\ell}(\Upsilon(W,\nu_{k}),\Upsilon(W,\nu_{\infty}))=0.
Proof of (2.16).

We can use 2.5 part (i) to get the desired conclusion provided we can show WQnqΔ=O(1)\lVert W_{Q_{n}}\rVert_{q\Delta}=O(1) and 𝔪p(𝔏~n(𝐗))=Op(1)\mathfrak{m}_{p}(\tilde{\mathfrak{L}}_{n}(\mathbf{X}))=O_{p}(1) (these requirements follow from the definition of \mathcal{R}, see 2.1). But these are direct consequences of (1.8) and (1.22) respectively. ∎

Proof of (2.17).

We can use 2.5 part (ii) to get the desired conclusion, if we can verify that d(WQn,W)0d_{\square}(W_{Q_{n}},W)\to 0, WQnqΔ=O(1)\lVert W_{Q_{n}}\rVert_{q\Delta}=O(1), and WqΔ=O(1)\lVert W\rVert_{q\Delta}=O(1). But these are direct consequences of (1.4), (1.8), and (1.11) respectively. ∎

The final step is to establish (2.18), for which we need two results. The first is an immediate corollary of 2.5 parts (i) and (iii) (and hence its proof is omitted), while the second is a simple convergence lemma, whose proof is provided in Section 5.2.

Corollary 2.6.

Consider the same setting as in 2.5. For C>0C>0, define

(2.24) ~p,C:={ν~p:𝔪p(ν)C}.\displaystyle\widetilde{\mathcal{M}}_{p,C}:=\{\nu\in\widetilde{\mathcal{M}}_{p}:\ \mathfrak{m}_{p}(\nu)\leq C\}.

Let W𝒲W\in\mathcal{W} be such that WqΔ<\lVert W\rVert_{q\Delta}<\infty. Then Υ(W,)\Upsilon(W,\cdot) is continuous on ~p,C\widetilde{\mathcal{M}}_{p,C} in the weak topology.

Lemma 2.7.

Let (X,dX)(X,d_{X}) and (Y,dY)(Y,d_{Y}) be two Polish spaces. Let ξn\xi_{n} be a sequence of XX-valued random variables such that dX(ξn,)P0d_{X}(\xi_{n},\mathcal{F})\xrightarrow{\text{P}}0 for some closed set X\mathcal{F}\subseteq X. Assume that there exists a compact set KXK\subseteq X such that

(2.25) limn(ξnK)=0.\displaystyle\lim\limits_{n\to\infty}\mathbb{P}(\xi_{n}\notin K)=0.

Finally, consider a function g:XYg:X\mapsto Y such that gg is continuous on KK. Then we have

dY(g(ξn),g())P0.d_{Y}(g(\xi_{n}),g(\mathcal{F}))\xrightarrow{\text{P}}0.
Proof of (2.18).

Applying 2.5 part (i), for every ε>0\varepsilon>0 we have

limLlim supn(d(Υ(W,𝔏~n(𝐗(L))),Υ(W,𝔏~n(𝐗)))ε)=0.\lim_{L\to\infty}\limsup_{n\to\infty}\;\;\mathbb{P}\left(d_{\ell}(\Upsilon(W,\tilde{\mathfrak{L}}_{n}(\mathbf{X}^{(L)})),\Upsilon(W,\tilde{\mathfrak{L}}_{n}(\mathbf{X})))\geq\varepsilon\right)=0.

It thus suffices to show that

(2.26) d(Υ(W,𝔏~n(𝐗)),Υ(W,Ξ(Fθ)))𝑃0.d_{\ell}(\Upsilon(W,\tilde{\mathfrak{L}}_{n}(\mathbf{X})),\Upsilon(W,\Xi(F_{\theta})))\overset{P}{\longrightarrow}0.

To this effect, use 1.1 part (iv) to note that

(2.27) d(𝔏~n(𝐗),Ξ(Fθ))P0,\displaystyle d_{\ell}(\tilde{\mathfrak{L}}_{n}(\mathbf{X}),\Xi(F_{\theta}))\stackrel{{\scriptstyle P}}{{\to}}0,

where the set Ξ(Fθ)\Xi(F_{\theta}) is compact in the weak topology. Also note that by (1.22), there exists C>0C>0 such that

(2.28) limn(𝔏~n(𝐗)~p,C)=0.\displaystyle\lim\limits_{n\to\infty}\mathbb{P}(\tilde{\mathfrak{L}}_{n}(\mathbf{X})\notin\widetilde{\mathcal{M}}_{p,C})=0.

We will now invoke 2.7 with X=~pX=\widetilde{\mathcal{M}}_{p} and Y=Y=\mathcal{M}, both equipped with the weak topology, ξn=𝔏~n(𝐗)\xi_{n}=\tilde{\mathfrak{L}}_{n}(\mathbf{X}), =Ξ(Fθ)\mathcal{F}=\Xi(F_{\theta}), K=~p,CK=\widetilde{\mathcal{M}}_{p,C}, and g()=Υ(W,)g(\cdot)=\Upsilon(W,\cdot). Once we verify the conditions of 2.7 with these specifications, we will conclude (2.26), which, in turn, completes the proof.

To verify the conditions of 2.7, note that =Ξ(Fθ)\mathcal{F}=\Xi(F_{\theta}) is compact, and is a subset of X=~pX=\widetilde{\mathcal{M}}_{p} by (1.25). Further, (2.27) implies dX(ξn,)𝑃0d_{X}(\xi_{n},\mathcal{F})\overset{P}{\to}0. The conclusion in (2.28) implies (2.25). The fact that g()=Υ(W,)g(\cdot)=\Upsilon(W,\cdot) is well-defined on XX follows from 1.3 part (a) (also see 2.1). Finally, the continuity of gg on KK follows from 2.6.

This finally completes the proof of 1.3. ∎

2.3. Proofs of 1.5 and 1.6

In order to prove 1.5, we need the following results. The first is a lemma about a sequence of functions converging in measure. Its proof is deferred to Section 5.2.

Lemma 2.8.

Let UUnif[0,1]U\sim\mathrm{Unif}[0,1] and p1p\geq 1.

  1. (i)

    Suppose {fn}n1\{f_{n}\}_{n\geq 1} is a sequence of measurable real-valued functions on [0,1][0,1] such that

    lim supn𝔼|fn(U)|p<, and (U,fn(U))𝐷(U,f(U)).\limsup_{n\to\infty}\mathbb{E}|f_{n}(U)|^{p}<\infty,\text{ and }(U,f_{n}(U))\overset{D}{\longrightarrow}(U,f_{\infty}(U)).

    Then for any p~(0,p)\tilde{p}\in(0,p) we have:

    (2.29) 𝔼|fn(U)f(U)|p~0.\mathbb{E}|f_{n}(U)-f_{\infty}(U)|^{\tilde{p}}\longrightarrow 0.
  2. (ii)

    If (U,f(U))=𝐷(U,g(U))(U,f(U))\overset{D}{=}(U,g(U)) for some f,gf,g such that 𝔼|f(U)|p<\mathbb{E}|f(U)|^{p}<\infty and 𝔼|g(U)|p<\mathbb{E}|g(U)|^{p}<\infty, then f(U)=g(U)f(U)=g(U) a.s.

For stating the second result, we recall the definitions of 𝔅θ\mathfrak{B}^{*}_{\theta}, 𝔅~θ\widetilde{\mathfrak{B}}_{\theta}, ϑW,ν\vartheta_{W,\nu}, Ξ(Fθ)\Xi(F_{\theta}), \mathcal{L} from 1.3, (1.21), (1.18), 1.1 part (iv), and 1.4, respectively. Also define

(2.30) M~p,C:={Law(U,f(U)):f,01|f(u)|p𝑑uC},M~p:=CM~p,C.\widetilde{M}_{p,C}:=\left\{\mathrm{Law}(U,f(U)):\ f\in\mathcal{L},\ \int_{0}^{1}|f(u)|^{p}\,du\leq C\right\},\;\;\widetilde{M}_{p}:=\cup_{C\in\mathbb{N}}\ \widetilde{M}_{p,C}.

Based on 1.2 and (2.24), M~p~p\widetilde{M}_{p}\subseteq\widetilde{\mathcal{M}}_{p} and M~p,C~p,C\widetilde{M}_{p,C}\subseteq\widetilde{\mathcal{M}}_{p,C}. We also construct G1:[0,1]×[0,1]×𝒩G_{1}:[0,1]\times\mathbb{R}\to[0,1]\times\mathcal{N} given by G1(x,y):=(x,α(θy))G_{1}(x,y):=(x,\alpha^{\prime}(\theta y)).

We note an elementary observation here which will be used in the sequel. To wit, recall from 1.3 that 𝔅θ={Law(U,ϑW,ν(U)),νΞ(Fθ)}\mathfrak{B}_{\theta}^{*}=\{\mathrm{Law}(U,\vartheta_{W,\nu}(U)),\nu\in\Xi(F_{\theta})\}. Consequently from the definition of G1G_{1} it follows that:

(2.31) 𝔅~θ=G1(𝔅θ):={Law(U,α(θϑW,ν(U))),νΞ(Fθ)}.\displaystyle\widetilde{\mathfrak{B}}_{\theta}=G_{1}(\mathfrak{B}_{\theta}^{*}):=\{{\rm Law}(U,\alpha^{\prime}(\theta\vartheta_{W,\nu}(U))),\nu\in\Xi(F_{\theta})\}.

We now state the following lemma, which formalizes a key property of the sets 𝔅θ\mathfrak{B}_{\theta}^{*} and 𝔅~θ\widetilde{\mathfrak{B}}_{\theta}. Its proof is deferred to Section 4.

Lemma 2.9.

Consider the same setting as in 1.3. Then the set 𝔅θ\mathfrak{B}^{*}_{\theta} is a compact subset of M~q\widetilde{M}_{q} in the weak topology, whereas 𝔅~θ\widetilde{\mathfrak{B}}_{\theta} is a compact subset of M~p\widetilde{M}_{p} in the weak topology.

We are now in a position to prove 1.5.

Proof of 1.5.

By arguments similar to (2.14) we have

d(𝔏n(𝐦),𝔏~n(𝐦))𝑃0.d_{\ell}(\mathfrak{L}_{n}(\mathbf{m}),\tilde{\mathfrak{L}}_{n}(\mathbf{m}))\overset{P}{\longrightarrow}0.

Consequently by invoking 1.3 part (b) we get:

(2.32) d(𝔏~n(𝐦),𝔅θ)𝑃0.\displaystyle d_{\ell}(\tilde{\mathfrak{L}}_{n}(\mathbf{m}),\mathfrak{B}_{\theta}^{*})\overset{P}{\longrightarrow}0.

Further by (1.23), there exists C>0C>0 such that

(2.33) limn(𝔏~n(𝐦)M~q,C)=0.\displaystyle\lim\limits_{n\to\infty}\mathbb{P}(\tilde{\mathfrak{L}}_{n}(\mathbf{m})\notin\widetilde{M}_{q,C})=0.

With the above observations in mind, we invoke 2.7 with X=M~qX=\widetilde{M}_{q}, Y=Y=\mathcal{M} equipped with the topology of weak convergence, ξn=𝔏~n(𝐦)\xi_{n}=\tilde{\mathfrak{L}}_{n}(\mathbf{m}), =𝔅θ\mathcal{F}=\mathfrak{B}_{\theta}^{*}, g=G1g=G_{1} and K=M~q,CK=\widetilde{M}_{q,C} (with CC chosen as in (2.33)). Once we verify the assumptions of 2.7 with the above specifications, by (2.31), we obtain:

(2.34) d(𝔏~n(𝜶),𝔅~θ)𝑃0.d_{\ell}\left(\tilde{\mathfrak{L}}_{n}(\boldsymbol{\alpha}),\tilde{\mathfrak{B}}_{\theta}\right)\overset{P}{\longrightarrow}0.

To verify the conditions of 2.7, note that =𝔅θX=M~q\mathcal{F}=\mathfrak{B}_{\theta}^{*}\subseteq X=\widetilde{M}_{q} by (1.25). Further, (2.32) implies dX(ξn,)𝑃0d_{X}(\xi_{n},\mathcal{F})\overset{P}{\to}0 and 1.4 part (ii) along with Fatou’s lemma implies \mathcal{F} is a compact subset of XX. The conclusion in (2.33) implies (2.25). Finally, the continuity of gg on KK follows from the continuity of α()\alpha^{\prime}(\cdot).

We now use (2.34) to complete the proof. The key tool will once again be 2.7. To set things up, fix C>0C>0 and equip M~p,C\widetilde{M}_{p,C} with the weak topology. Pick any νM~p,C\nu\in\widetilde{M}_{p,C}. Then ν\nu is distributed as (U,f(U))(U,f(U)), where UUnif[0,1]U\sim\mathrm{Unif}[0,1] and f:[0,1]f:[0,1]\mapsto\mathbb{R} is measurable with fpC\lVert f\rVert_{p}\leq C. Consequently, by 2.8 part (ii), the map G2:~p,CLp[0,1]G_{2}:\widetilde{\mathcal{M}}_{p,C}\to L^{p^{\prime}}[0,1] (for some p<pp^{\prime}<p), given by G2(ν)=fG_{2}(\nu)=f, is well-defined.

For any fFθf\in F_{\theta}, setting ν=Ξ(f)\nu=\Xi(f) use (1.19) to note that f(U)=α(θϑW,ν(U))f(U)=\alpha^{\prime}(\theta\vartheta_{W,\nu}(U)) a.s. Consequently, by (2.31), we get:

(2.35) G2(𝔅~θ)=Fθ.\displaystyle G_{2}(\widetilde{\mathfrak{B}}_{\theta})=F_{\theta}.

Moreover, by (1.24), there exists C>0C>0 such that

(2.36) limn(𝔏~n(𝜶)M~p,C)=0.\displaystyle\lim\limits_{n\to\infty}\mathbb{P}(\tilde{\mathfrak{L}}_{n}(\boldsymbol{\alpha})\notin\widetilde{M}_{p,C})=0.

With this observation, we will invoke 2.7 with X=M~pX=\widetilde{M}_{p}, YLp[0,1]Y\equiv L^{p^{\prime}}[0,1], equipped with the topologies of weak convergence and Lp[0,1]L^{p^{\prime}}[0,1] respectively, and ξn=𝔏~n(𝜶)\xi_{n}=\tilde{\mathfrak{L}}_{n}(\boldsymbol{\alpha}), 𝔅~θ\mathcal{F}\equiv\widetilde{\mathfrak{B}}_{\theta}, gG2g\equiv G_{2}, and K=M~p,CK=\widetilde{M}_{p,C} with CC chosen from (2.36). Once we verify the conditions of 2.7, an application of (2.35) will yield

G2(U,α(θmnU))G2(𝔅~θ)p=inffFθ01|α(θmnu)f(u)|p𝑑u𝑃0,\lVert G_{2}(U,\alpha^{\prime}(\theta m_{\lceil nU\rceil}))-G_{2}(\widetilde{\mathfrak{B}}_{\theta})\rVert_{p^{\prime}}=\inf_{f\in F_{\theta}}\int_{0}^{1}|\alpha^{\prime}(\theta m_{\lceil nu\rceil})-f(u)|^{p^{\prime}}\,du\overset{P}{\longrightarrow}0,

which will complete the proof of (1.26).

To verify the conditions of 2.7, note that =𝔅~θX=M~p\mathcal{F}=\widetilde{\mathfrak{B}}_{\theta}\subseteq X=\widetilde{M}_{p} by (1.25). Further, (2.34) implies dX(ξn,)𝑃0d_{X}(\xi_{n},\mathcal{F})\overset{P}{\to}0, and 1.4 part (ii) implies \mathcal{F} is a compact subset of XX. The conclusion in (2.36) implies (2.25). The fact that g()=G2()g(\cdot)=G_{2}(\cdot) is well-defined on X=M~pX=\widetilde{M}_{p} follows from 2.8 part (ii). Continuity of gg on KK follows from 2.8 part (i).

For proving 1.6, we will need the following lemma whose proof we defer to Section 4.

Lemma 2.10.

Suppose 𝐗{\bf X} is a sample from the model (1.3) (θ\theta need not be non-negative). Suppose p[v,]p\in[v,\infty], q>1q>1 satisfy (1.6), lim supnWQnqΔ<\limsup_{n\to\infty}\lVert W_{Q_{n}}\rVert_{q\Delta}<\infty and 1p+1q1\frac{1}{p}+\frac{1}{q}\leq 1.

(i) Given any vector 𝐝(N):=(d1,d2,,dN)\mathbf{d}^{(N)}:=(d_{1},d_{2},\ldots,d_{N}) such that 𝐝(N)=O(1)\lVert\mathbf{d}^{(N)}\rVert_{\infty}=O(1), we have

i=1Ndi(Xiα(θmi))=op(n).\sum_{i=1}^{N}d_{i}(X_{i}-\alpha^{\prime}(\theta m_{i}))=o_{p}(n).

(ii) If 1p+1q<1\frac{1}{p}+\frac{1}{q}<1, then

i=1nmi(Xiα(θmi))=oP(n).\sum_{i=1}^{n}m_{i}\left(X_{i}-\alpha^{\prime}(\theta m_{i})\right)=o_{P}(n).
Proof of 1.6.

By 1.2 part (iii), all the optimizers of the problem in (1.9) are constant functions. Further, (1.25) shows that there exists K>0K>0 (depending on θ\theta) such that all the optimizers of (1.9) have LpL^{p} norm bounded by KK. Combining these two observations, we have that FθF_{\theta} consists only of constant functions where the constants are given by

𝒜θ=argmint𝒩,|t|K[γ(β(t))θtv].\mathcal{A}_{\theta}=\operatorname{argmin}_{t\in\mathcal{N},\ |t|\leq K}[\gamma(\beta(t))-\theta t^{v}].

As a non-constant analytic function can only have finitely many minimizers in a compact set, it follows that 𝒜θ\mathcal{A}_{\theta} is a finite set.
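One way to make this finiteness step explicit (a sketch, assuming, as the surrounding argument indicates, that $t\mapsto\gamma(\beta(t))-\theta t^{v}$ is real-analytic and non-constant on $[-K,K]$):

```latex
% Write h(t) := \gamma(\beta(t)) - \theta t^{v} on the compact interval [-K, K].
% Any minimizer t^* in the open interval (-K, K) satisfies h'(t^*) = 0. Since
% h is analytic and non-constant, h' is analytic and not identically zero, so
% its zeros are isolated; a compact interval contains at most finitely many
% isolated zeros. Including the (at most two) boundary points \pm K, the set
% of minimizers \mathcal{A}_{\theta} is therefore finite.
```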

(i) Define ci(L):=ci1{|ci|L}c_{i}^{(L)}:=c_{i}1\{|c_{i}|\leq L\} and m¯:=n1i=1nmi\bar{m}:=n^{-1}\sum_{i=1}^{n}m_{i}. We claim that the result follows from the following display:

(2.37) limLlim supn𝔼[1ni=1n|cici(L)||Xi|]=0.\lim_{L\to\infty}\limsup_{n\to\infty}\mathbb{E}\left[\frac{1}{n}\sum_{i=1}^{n}\left|c_{i}-c_{i}^{(L)}\right||X_{i}|\right]=0.

This is because given any L>0L>0 and any t𝒜θt\in\mathcal{A}_{\theta} (recall this implies |t|K|t|\leq K), the following inequalities hold:

|1ni=1nciXi|\displaystyle\;\;\;\;\;\bigg{|}\frac{1}{n}\sum_{i=1}^{n}c_{i}X_{i}\bigg{|}
1ni=1n|cici(L)||Xi|+|1ni=1nci(L)(Xiα(θmi))|+|1ni=1nci(L)(α(θmi)t)|+|t|n|i=1nci(L)|\displaystyle\leq\frac{1}{n}\sum_{i=1}^{n}\bigg{|}c_{i}-c_{i}^{(L)}\bigg{|}|X_{i}|+\bigg{|}\frac{1}{n}\sum_{i=1}^{n}c_{i}^{(L)}(X_{i}-\alpha^{\prime}(\theta m_{i}))\bigg{|}+\bigg{|}\frac{1}{n}\sum_{i=1}^{n}c_{i}^{(L)}(\alpha^{\prime}(\theta m_{i})-t)\bigg{|}+\frac{|t|}{n}\bigg{|}\sum_{i=1}^{n}c_{i}^{(L)}\bigg{|}
1ni=1n|cici(L)||Xi|+|1ni=1nci(L)(Xiα(θmi))|+Lni=1n|α(θmi)t|\displaystyle\leq\frac{1}{n}\sum_{i=1}^{n}\bigg{|}c_{i}-c_{i}^{(L)}\bigg{|}|X_{i}|+\bigg{|}\frac{1}{n}\sum_{i=1}^{n}c_{i}^{(L)}(X_{i}-\alpha^{\prime}(\theta m_{i}))\bigg{|}+\frac{L}{n}\sum_{i=1}^{n}|\alpha^{\prime}(\theta m_{i})-t|
+Kn|i=1nci|+KnLr1i=1n|ci|r.\displaystyle\quad\quad\quad+\frac{K}{n}\bigg{|}\sum_{i=1}^{n}c_{i}\bigg{|}+\frac{K}{nL^{r-1}}\sum_{i=1}^{n}|c_{i}|^{r}.

Taking an infimum over t𝒜θt\in\mathcal{A}_{\theta} gives the bound

|1ni=1nciXi|\displaystyle\bigg{|}\frac{1}{n}\sum_{i=1}^{n}c_{i}X_{i}\bigg{|} 1ni=1n|cici(L)||Xi|+|1ni=1nci(L)(Xiα(θmi))|\displaystyle\leq\frac{1}{n}\sum_{i=1}^{n}\bigg{|}c_{i}-c_{i}^{(L)}\bigg{|}|X_{i}|+\bigg{|}\frac{1}{n}\sum_{i=1}^{n}c_{i}^{(L)}(X_{i}-\alpha^{\prime}(\theta m_{i}))\bigg{|}
+inft𝒜θLni=1n|α(θmi)t|+Kn|i=1nci|+KnLr1i=1n|ci|r.\displaystyle+\inf_{t\in\mathcal{A}_{\theta}}\frac{L}{n}\sum_{i=1}^{n}|\alpha^{\prime}(\theta m_{i})-t|+\frac{K}{n}\bigg{|}\sum_{i=1}^{n}c_{i}\bigg{|}+\frac{K}{nL^{r-1}}\sum_{i=1}^{n}|c_{i}|^{r}.

The first and last terms above converge to 0 in probability as nn\to\infty first, followed by LL\to\infty, by using (2.37) and the assumption i=1n|ci|r=O(n)\sum_{i=1}^{n}|c_{i}|^{r}=O(n) for r>1r>1. The remaining terms converge to 0 as nn\to\infty for fixed L>0L>0, by using 2.10 part (i), (1.26), and i=1nci=o(n)\sum_{i=1}^{n}c_{i}=o(n), respectively. This completes the proof.

Next, we prove (2.37). Fix r~(1,r)\tilde{r}\in(1,r) such that 1p+1r~=1\frac{1}{p}+\frac{1}{\tilde{r}}=1. By Hölder’s inequality,

𝔼[1ni=1n|cici(L)||Xi|]=𝔼[1ni=1n|ci|1{|ci|>L}|Xi|]\displaystyle\mathbb{E}\left[\frac{1}{n}\sum_{i=1}^{n}\left|c_{i}-c_{i}^{(L)}\right||X_{i}|\right]=\mathbb{E}\left[\frac{1}{n}\sum_{i=1}^{n}\left|c_{i}\right|1\{|c_{i}|>L\}|X_{i}|\right]
\leq\left(\frac{1}{n}\sum_{i=1}^{n}|c_{i}|^{\tilde{r}}1\{|c_{i}|>L\}\right)^{\frac{1}{\tilde{r}}}\left(\frac{1}{n}\sum_{i=1}^{n}\mathbb{E}|X_{i}|^{p}\right)^{\frac{1}{p}}\leq\frac{1}{L^{\frac{r-\tilde{r}}{\tilde{r}}}}\left(\frac{1}{n}\sum_{i=1}^{n}|c_{i}|^{r}\right)^{\frac{1}{\tilde{r}}}\left(\frac{1}{n}\sum_{i=1}^{n}\mathbb{E}|X_{i}|^{p}\right)^{\frac{1}{p}}.

This, along with (1.22) and the assumption i=1n|ci|r=O(n)\sum_{i=1}^{n}|c_{i}|^{r}=O(n), establishes (2.37).

(ii) As in part (i), we have

inft𝒜θ|1ni=1nciXic~t||1ni=1n(cic~)Xi|+|c~|n|i=1n(Xiα(θmi))|+|c~|ninft𝒜θi=1n|α(θmi)t|.\displaystyle\inf_{t\in\mathcal{A}_{\theta}}\bigg{|}\frac{1}{n}\sum_{i=1}^{n}c_{i}X_{i}-\tilde{c}t\bigg{|}\leq\bigg{|}\frac{1}{n}\sum_{i=1}^{n}(c_{i}-\tilde{c})X_{i}\bigg{|}+\frac{|\tilde{c}|}{n}\bigg{|}\sum_{i=1}^{n}(X_{i}-\alpha^{\prime}(\theta m_{i}))\bigg{|}+\frac{|\tilde{c}|}{n}\inf_{t\in\mathcal{A}_{\theta}}\sum_{i=1}^{n}|\alpha^{\prime}(\theta m_{i})-t|.

The first term converges to 0 in probability by part (i), the second term converges to 0 by 2.10 part (i), and the third term converges to 0 by (1.26). This completes the proof.

(iii) We begin by observing that for any L>1L>1, we have:

L:=supx[L,L]|ddx(xα(x))|<,~L:=infx[L,L]α′′(x)>0,\mathfrak{C}_{L}:=\sup_{x\in[-L,L]}\bigg{|}\frac{d}{dx}(x\alpha^{\prime}(x))\bigg{|}<\infty,\quad\quad\widetilde{\mathfrak{C}}_{L}:=\inf_{x\in[-L,L]}\alpha^{\prime\prime}(x)>0,

both of which follow from standard properties of exponential families. Recall that for any t𝒜θt\in\mathcal{A}_{\theta}, we have t=α(θvtv1)t=\alpha^{\prime}(\theta vt^{v-1}) by 1.2 part (i). This gives

|\alpha^{\prime}(\theta m_{i})-t|=|\alpha^{\prime}(\theta m_{i})-\alpha^{\prime}(\theta vt^{v-1})|\geq|\theta|\widetilde{\mathfrak{C}}_{L|\theta|}|m_{i}-vt^{v-1}|\quad\text{whenever }|m_{i}|\leq L,
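The displayed bound is the mean value theorem applied to \alpha^{\prime}: for some \xi between \theta m_{i} and \theta vt^{v-1},

|\alpha^{\prime}(\theta m_{i})-\alpha^{\prime}(\theta vt^{v-1})|=\alpha^{\prime\prime}(\xi)\,|\theta|\,|m_{i}-vt^{v-1}|\geq\widetilde{\mathfrak{C}}_{L|\theta|}\,|\theta|\,|m_{i}-vt^{v-1}|,

where the last step uses \xi\in[-L|\theta|,L|\theta|], which is valid whenever |m_{i}|\leq L and L\geq vK^{v-1} (recall that |t|\leq K).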

and so for all large LL (depending on θ,K\theta,K) we have

\inf_{t\in\mathcal{A}_{\theta}}\frac{1}{n}\sum_{i=1}^{n}|m_{i}-vt^{v-1}|
\leq\inf_{t\in\mathcal{A}_{\theta}}\frac{1}{n|\theta|\widetilde{\mathfrak{C}}_{L|\theta|}}\sum_{i=1}^{n}|\alpha^{\prime}(\theta m_{i})-t|+\frac{1}{n}\sum_{i=1}^{n}|m_{i}|\mathbbm{1}(|m_{i}|\geq L)
(2.38) \leq\inf_{t\in\mathcal{A}_{\theta}}\frac{1}{n|\theta|\widetilde{\mathfrak{C}}_{L|\theta|}}\sum_{i=1}^{n}|\alpha^{\prime}(\theta m_{i})-t|+\frac{1}{nL^{q-1}}\sum_{i=1}^{n}|m_{i}|^{q}.

The RHS above converges to 0 in probability as n\to\infty, followed by L\to\infty. This is because the first term in (2.38) converges to 0 in probability as n\to\infty for every fixed L, by (1.26), while the second term converges to 0 on taking n\to\infty followed by L\to\infty, by (1.23).

Next choose \tilde{q}\in(1,q) such that p^{-1}+\tilde{q}^{-1}=1. Note that for any t\in\mathcal{A}_{\theta}, vt^{v-1}\alpha^{\prime}(\theta vt^{v-1})=vt^{v} (using the relation t=\alpha^{\prime}(\theta vt^{v-1})). In the same vein as (2.38), by using (1.17), we also get for all L large enough:

inft𝒜θ1ni=1n|miα(θmi)vtv|\displaystyle\;\;\;\;\;\inf_{t\in\mathcal{A}_{\theta}}\frac{1}{n}\sum_{i=1}^{n}|m_{i}\alpha^{\prime}(\theta m_{i})-vt^{v}|
(2.39) \leq\inf_{t\in\mathcal{A}_{\theta}}\frac{\mathfrak{C}_{L|\theta|}}{n}\sum_{i=1}^{n}|m_{i}-vt^{v-1}|+\frac{1}{L^{\frac{q}{\tilde{q}}-1}}\left(\frac{1}{n}\sum_{i=1}^{n}|m_{i}|^{q}\right)^{\frac{1}{\tilde{q}}}\left(\frac{1}{n}\sum_{i=1}^{n}|\alpha^{\prime}(\theta m_{i})|^{p}\right)^{\frac{1}{p}}.

The RHS above converges to 0 in probability as n\to\infty, followed by L\to\infty. This is because the first term converges to 0 by (2.38), and the second term converges to 0 as n\to\infty followed by L\to\infty by using (1.23) and (1.22). Finally, the conclusion in part (iii) follows by combining (2.39) with 2.10 part (ii).

3. Proof of Results from Section 1.2

Proof of 1.8.

(i) Note that quadratic forms correspond to the choice H=K_{2} and v=2 in (1.2). Let \mu_{B} be the tilted probability measure on \mathbb{R} obtained from \mu as in 1.3 (with tilting parameter B). Then a direct computation using (1.28) gives

(3.1) Znquad(θ,B)=α(B)+1nlog𝔼𝐗μBnexp(θnijQn(i,j)XiXj).\displaystyle Z^{\mathrm{quad}}_{n}(\theta,B)={\alpha}(B)+\frac{1}{n}\log{\mathbb{E}_{{\bf X}\sim{\mu}_{B}^{\otimes n}}\exp\left(\frac{\theta}{n}\sum_{i\neq j}Q_{n}(i,j)X_{i}X_{j}\right)}.

Using this along with 1.1 part (iii) we get

Znquad(θ,B)α(B)\displaystyle Z^{\mathrm{quad}}_{n}(\theta,B)-\alpha(B)
supf(θ[0,1]2W(x,y)f(x)f(y)𝑑x𝑑y[0,1]γB(βB(f(x)))𝑑x)\displaystyle\to\sup_{f\in\mathcal{L}}\left(\theta\int_{[0,1]^{2}}W(x,y)f(x)f(y)\,dx\,dy-\int_{[0,1]}{\gamma}_{B}({\beta}_{B}(f(x)))\,dx\right)
(3.2) =supf(θ[0,1]2W(x,y)f(x)f(y)𝑑x𝑑y[0,1](γ(β(f(x)))+α(B)Bf(x))𝑑x).\displaystyle=\sup_{f\in\mathcal{L}}\left(\theta\int_{[0,1]^{2}}W(x,y)f(x)f(y)\,dx\,dy-\int_{[0,1]}(\gamma(\beta(f(x)))+{\alpha}(B)-Bf(x))\,dx\right).

Here \gamma_{B}(.) and \beta_{B}(.) are as in 1.3, but for the tilted measure \mu_{B} instead of \mu, and the last equality uses (2.3). By invoking 1.2 part (iii), if v is even, the set of optimizers F_{\theta}\equiv F_{\theta,B} in the above display consists of constant functions, where the constant is an optimizer of the following optimization problem:

(3.3) supxα()(θx2+Bxxβ(x)+α(β(x))).\sup_{x\in\alpha^{\prime}(\mathbb{R})}\left(\theta x^{2}+Bx-x\beta(x)+\alpha(\beta(x))\right).

By 1.7 (parts (i) and (ii)), if either (a) B\neq 0, or (b) B=0 and \theta\leq(2\alpha^{\prime\prime}(0))^{-1}, then the optimizer is x=t_{\theta,B,\mu}. On the other hand, when B=0 and \theta>(2\alpha^{\prime\prime}(0))^{-1}, by 1.7 part (iii) the optimizers are x=\pm t_{\theta,B,\mu}. Using this, the desired conclusion of part (i) follows.
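As a numerical sanity check (not part of the proof), this characterization can be verified in the classical Curie–Weiss case \mu=\frac{1}{2}(\delta_{-1}+\delta_{1}), where \alpha(t)=\log\cosh t, \alpha^{\prime}(t)=\tanh t, and x\beta(x)-\alpha(\beta(x))=\frac{1}{2}[(1+x)\log(1+x)+(1-x)\log(1-x)]. The sketch below (the values \theta=0.6, B=0.3 and the grid are ours, chosen for illustration) iterates the fixed-point equation t=\tanh(2\theta t+B) and compares the result against a direct grid maximization of (3.3):

```python
import math

def rate(x):
    # x*beta(x) - alpha(beta(x)) for mu = (delta_{-1} + delta_{+1})/2:
    # equals ((1+x)log(1+x) + (1-x)log(1-x))/2 on (-1, 1)
    return 0.5 * ((1 + x) * math.log(1 + x) + (1 - x) * math.log(1 - x))

def objective(x, theta, B):
    # the objective in (3.3): theta*x^2 + B*x - x*beta(x) + alpha(beta(x))
    return theta * x * x + B * x - rate(x)

def fixed_point(theta, B, t=0.5, iters=200):
    # iterate t -> alpha'(theta*v*t^{v-1} + B) = tanh(2*theta*t + B), v = 2
    for _ in range(iters):
        t = math.tanh(2 * theta * t + B)
    return t

theta, B = 0.6, 0.3           # illustrative values with B != 0
t_star = fixed_point(theta, B)
grid = [-0.9999 + 2 * 0.9999 * i / 200000 for i in range(200001)]
x_star = max(grid, key=lambda x: objective(x, theta, B))
print(t_star, x_star)         # the fixed point matches the grid maximizer
```

For B\neq 0 the iteration converges to the unique optimizer t_{\theta,B,\mu}, agreeing with the grid maximizer up to the grid resolution.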

(ii) Recall the definition of 𝒜θ𝒜θ,B\mathcal{A}_{\theta}\equiv\mathcal{A}_{\theta,B} from (1.27), and use part (i) to note that all functions in Fθ,BF_{\theta,B} are constant functions, with constants belonging to the set 𝒜θ,B\mathcal{A}_{\theta,B}. Since v=2v=2, we have

\{vt^{v}:\ t\in\mathcal{A}_{\theta,B}\}=\{2t^{2}:\ t\in\mathcal{A}_{\theta,B}\}=\{2t_{\theta,B,\mu}^{2}\}\Rightarrow\frac{1}{n}\sum_{i=1}^{n}X_{i}m_{i}\overset{d}{\longrightarrow}2t_{\theta,B,\mu}^{2},

where we use 1.6 part (iii).

For the weak limit of X¯\bar{X} we invoke 1.6 part (ii) with ci=1c_{i}=1 which implies c~=1\tilde{c}=1. The conclusion follows by noting that when B=0B=0, the symmetry of μ\mu about the origin implies that X¯\bar{X} and X¯-\bar{X} have the same distribution. ∎
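The weak limits in this proof can also be illustrated by simulation. The following purely illustrative sketch (the Gibbs sampler, parameters, seed, and tolerances are ours, not from the paper) samples the quadratic model in (3.1) with Q_{n}\equiv 1 and \mu=\frac{1}{2}(\delta_{-1}+\delta_{1}), and compares the time-averaged magnetization \bar{X} and the statistic \frac{1}{n}\sum_{i}X_{i}m_{i} with the predicted limits t_{\theta,B,\mu} and 2t_{\theta,B,\mu}^{2}:

```python
import math
import random

random.seed(0)

def sample_quadratic_cw(n=400, theta=0.6, B=0.3, sweeps=2000, burn=500):
    # Gibbs sampler for (3.1) with Q_n = 1: conditionally on the rest,
    # X_i = +1 with probability 1/(1 + exp(-2*h_i)), where
    # h_i = (2*theta/n) * sum_{j != i} X_j + B.
    x = [random.choice((-1, 1)) for _ in range(n)]
    s = sum(x)
    mags, corrs = [], []
    for sweep in range(sweeps):
        for i in range(n):
            h = 2 * theta * (s - x[i]) / n + B
            new = 1 if random.random() < 1.0 / (1.0 + math.exp(-2 * h)) else -1
            s += new - x[i]
            x[i] = new
        if sweep >= burn:
            mags.append(s / n)                      # \bar{X}
            corrs.append(2.0 * (s * s - n) / n**2)  # (1/n) sum_i X_i m_i, v = 2
    return sum(mags) / len(mags), sum(corrs) / len(corrs)

mbar, cbar = sample_quadratic_cw()

# predicted limits: t_{theta,B,mu} solves t = tanh(2*theta*t + B)
t = 0.5
for _ in range(200):
    t = math.tanh(2 * 0.6 * t + 0.3)
print(mbar, t, cbar, 2 * t * t)
```

Since B>0 here, the optimizer is unique and no symmetrization is needed; the empirical averages land close to the predicted constants for moderate n.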

Proof of 1.9.

(i) Let \mu_{B} be the tilted measure obtained from \mu as in 1.3, and let \alpha_{B}(.),\beta_{B}(.),\gamma_{B}(.) be as in 1.3, but for the measure \mu_{B} instead of \mu. Using (2.3) we get

γB(βB(t))=γ(β(t))+α(B)Bt,\gamma_{B}(\beta_{B}(t))=\gamma(\beta(t))+\alpha(B)-Bt,

using which the optimization problem in (1.32) (ignoring the additive constant α(B)\alpha(B)) becomes

(3.4) supf:[0,1]γ(β(f(x)))𝑑x<{θGW(f)[0,1]γB(βB(f(x)))𝑑x}.\displaystyle\sup_{f\in\mathcal{L}:\ \int_{[0,1]}\gamma(\beta(f(x)))dx<\infty}\left\{\theta G_{W}(f)-\int_{[0,1]}\gamma_{B}(\beta_{B}(f(x)))dx\right\}.

Now, we invoke 1.2 part (i) to conclude that any maximizer of the above display satisfies the fixed point equation (1.33).
(ii) It suffices to show that all optimizers of (3.4) are constant functions, for which, invoking 1.2 part (iii), it suffices to show that \mu_{B} is stochastically non-negative (as per 1.7) whenever \mu is. In this case we have \gamma(\beta(t))\leq\gamma(\beta(-t)) for t\geq 0. Along with (2.3), this gives

γB(βB(t))=γ(β(t))+α(B)Btγ(β(t))+α(B)+Bt=γB(βB(t)),\gamma_{B}(\beta_{B}(t))=\gamma(\beta(t))+\alpha(B)-Bt\leq\gamma(\beta(-t))+\alpha(B)+Bt=\gamma_{B}(\beta_{B}(-t)),

where we use the fact that B0B\geq 0. This shows that μB\mu_{B} is stochastically non-negative as well.

Hence, by the proof of 1.2 part (iii), the maximizers of (1.32) are constant functions provided either vv is even or μ\mu is stochastically non-negative. Finally, (1.33) follows from (1.32), on setting f(.)f(.) to be a constant function.

(iii)(a) The optimization problem (1.32) reduces to maximizing

(3.5) Hθ,B(x):=θxv+Bxγ(β(x))\displaystyle H_{\theta,B}(x):=\theta x^{v}+Bx-\gamma(\beta(x))

over x[1,1]x\in[-1,1]. Differentiating we get

(3.6) Hθ,B(x)=θvxv1+Bβ(x),Hθ,B′′(x)=θv(v1)xv2β(x).\displaystyle H^{\prime}_{\theta,B}(x)=\theta vx^{v-1}+B-\beta(x),\quad H^{\prime\prime}_{\theta,B}(x)=\theta v(v-1)x^{v-2}-\beta^{\prime}(x).

Since \mu is supported on [-1,1], we have \lim_{\theta\to\infty}\alpha^{\prime}(\theta)=1, and so

α′′(θ)=𝔼μθ(X2)(α(θ))21(α(θ))20\alpha^{\prime\prime}(\theta)=\mathbb{E}_{\mu_{\theta}}(X^{2})-(\alpha^{\prime}(\theta))^{2}\leq 1-(\alpha^{\prime}(\theta))^{2}\rightarrow 0

as \theta\rightarrow\infty. Hence, there exists B_{0}=B_{0}(\theta,v) such that for B\geq B_{0} we have \alpha^{\prime\prime}(B)<\frac{1}{2\theta v(v-1)}. If x is a global maximizer of H_{\theta,B}(.), then we have

x=α(θvxv1+B)α(B)β(x)B.x=\alpha^{\prime}(\theta vx^{v-1}+B)\geq\alpha^{\prime}(B)\implies\beta(x)\geq B.

However, on the interval {x:β(x)B}\{x:\beta(x)\geq B\}, using boundedness of support, we have

H^{\prime\prime}_{\theta,B}(x)\leq\theta v(v-1)-\frac{1}{\alpha^{\prime\prime}(\beta(x))}<0.

Thus Hθ,B(.)H_{\theta,B}(.) is strictly concave on the interval {x:β(x)B}\{x:\beta(x)\geq B\}, and so the global maximizer must be unique.

(iii)(b) We break the proof into the following steps:

  • There exists \theta_{1c}\in(0,\infty) such that for \theta<\theta_{1c}, 0 is the unique global maximizer of H_{\theta,0}(.).
    Since μ\mu is compactly supported on [1,1][-1,1], we have

    α′′(θ)=Varμθ(X)1, and so β(x)=1α′′(β(x))1.\alpha^{\prime\prime}(\theta)=\mathrm{Var}_{\mu_{\theta}}(X)\leq 1,\text{ and so }\beta^{\prime}(x)=\frac{1}{\alpha^{\prime\prime}(\beta(x))}\geq 1.

    Thus for θ<12v(v1)=:θ1c\theta<\frac{1}{2v(v-1)}=:\theta_{1c} we have

    Hθ,0′′(x)θv(v1)β(x)<0,\displaystyle H^{\prime\prime}_{\theta,0}(x)\leq\theta v(v-1)-\beta^{\prime}(x)<0,

    and so Hθ,0H_{\theta,0} is strictly concave. Since Hθ,0(0)=0H^{\prime}_{\theta,0}(0)=0, x=0x=0 is the unique global maximizer of Hθ,0(.)H_{\theta,0}(.).

  • There exists \theta_{2c}\in(0,\infty) such that for \theta>\theta_{2c}, 0 is not a global maximizer of H_{\theta,0}(.).

    We consider two separate cases:

    • μ\mu is stochastically non-negative.

      In this case there exists x0>0x_{0}>0 such that γ(β(x0))(0,)\gamma(\beta(x_{0}))\in(0,\infty). Then setting θ2c:=x0vγ(β(x0))(0,)\theta_{2c}:=x_{0}^{-v}\gamma(\beta(x_{0}))\in(0,\infty), for θ>θ2c\theta>\theta_{2c} we have

      H_{\theta,0}(x_{0})=\theta x_{0}^{v}-\gamma(\beta(x_{0}))>0=H_{\theta,0}(0),

      and so 0 cannot be a global maximizer of H_{\theta,0}(.).

    • vv is even.

      If there exists x_{0}>0 such that \gamma(\beta(x_{0}))\in(0,\infty), we are through by the previous argument. Otherwise, since \mu is not degenerate at 0, there exists x_{0}<0 such that \gamma(\beta(x_{0}))\in(0,\infty). Again setting \theta_{2c}:=x_{0}^{-v}\gamma(\beta(x_{0}))\in(0,\infty) (note that x_{0}^{-v}>0 since v is even), the same proof works.

  • For any θ>0\theta>0, let xθx_{\theta} be any non-negative global optimizer of Hθ,0(.)H_{\theta,0}(.). Then the map θxθ\theta\mapsto x_{\theta} is non-decreasing.
    Suppose by way of contradiction there exists 0<θ1<θ2<0<\theta_{1}<\theta_{2}<\infty such that 0xθ2<xθ10\leq x_{\theta_{2}}<x_{\theta_{1}}. By optimality of xθ1x_{\theta_{1}} we have

    \theta_{1}x^{v}_{\theta_{1}}-\gamma(\beta(x_{\theta_{1}}))\ \geq\ \theta_{1}x^{v}_{\theta_{2}}-\gamma(\beta(x_{\theta_{2}}))
    θ1(xθ1vxθ2v)\displaystyle\implies\theta_{1}(x^{v}_{\theta_{1}}-x^{v}_{\theta_{2}}) \displaystyle\geq γ(β(xθ1))γ(β(xθ2))\displaystyle\gamma(\beta(x_{\theta_{1}}))-\gamma(\beta(x_{\theta_{2}}))
    θ2(xθ1vxθ2v)\displaystyle\implies\theta_{2}(x^{v}_{\theta_{1}}-x^{v}_{\theta_{2}}) >\displaystyle> γ(β(xθ1))γ(β(xθ2)).\displaystyle\gamma(\beta(x_{\theta_{1}}))-\gamma(\beta(x_{\theta_{2}})).

    Here the last implication uses the fact xθ1>xθ20x_{\theta_{1}}>x_{\theta_{2}}\geq 0. But this contradicts the fact that xθ2x_{\theta_{2}} is a global maximizer for Hθ2,0(.)H_{\theta_{2},0}(.).

    Combining the last three claims, the conclusion of part (iii)(b) follows on setting \theta_{c}:=\sup\{\theta>0:\ 0\text{ is a global maximizer of }H_{\theta,0}(.)\}.
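The three claims above can be illustrated numerically (this is not part of the argument) in the Curie–Weiss case \mu=\frac{1}{2}(\delta_{-1}+\delta_{1}) with v=2, where \gamma(\beta(x))=\frac{1}{2}[(1+x)\log(1+x)+(1-x)\log(1-x)] and the transition is the classical one at \theta_{c}=\frac{1}{2}; the grid and the values of \theta below are ours:

```python
import math

def rate(x):
    # gamma(beta(x)) for mu = (delta_{-1} + delta_{+1})/2
    return 0.5 * ((1 + x) * math.log(1 + x) + (1 - x) * math.log(1 - x))

def maximizer(theta, v=2, n=100001):
    # grid maximizer of H_{theta,0}(x) = theta*x^v - gamma(beta(x)) on [-1, 1]
    xs = [-0.9999 + 2 * 0.9999 * i / (n - 1) for i in range(n)]
    return max(xs, key=lambda x: theta * x**v - rate(x))

# below theta_c = 1/2 the maximizer is 0; above it, the maximizer is
# nonzero, and |x_theta| is non-decreasing in theta
print([round(abs(maximizer(th)), 4) for th in (0.3, 0.45, 0.6, 0.8)])
```
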

4. Proof of Main Lemmas

Proof of 2.5.

As before, we also choose p~(1,p)\tilde{p}\in(1,p) and q~(1,q)\tilde{q}\in(1,q) such that 1p~+1q~=1\frac{1}{\tilde{p}}+\frac{1}{\tilde{q}}=1. Note that Υ(,)\Upsilon(\cdot,\cdot) is well-defined on \mathcal{R} (see 2.1) by using 2.1 (ii), with W(,)W(\cdot,\cdot) as is, ϕ(x1,,xv)=a=2vxa\phi(x_{1},\ldots,x_{v})=\prod_{a=2}^{v}x_{a}, and p,qp,q replaced with p~,q~\tilde{p},\tilde{q}.

(i) Recall the definition of ϑW,ν()\vartheta_{W,\nu}(\cdot) from (1.18), and the connection Υ(W,ν):=Law(U1,ϑW,ν(U1))\Upsilon(W,\nu):=\mathrm{Law}\left(U_{1},{\vartheta_{W,\nu}(U_{1})}\right) between Υ\Upsilon and ϑW,ν\vartheta_{W,\nu} from 2.1. We will prove the following stronger claim.

(4.1) supW:WqΔCsupν:𝔪p(ν)C01|ϑW,ν(u)ϑW,νL(u)|𝑑u0,asL.\sup_{W:\ \lVert W\rVert_{q\Delta}\leq C}\sup_{\nu:\ \mathfrak{m}_{p}(\nu)\leq C}\int_{0}^{1}|\vartheta_{W,\nu}(u)-\vartheta_{W,\nu_{L}}(u)|\,du\to 0,\quad\mbox{as}\ L\to\infty.

Towards this end, fix L>1L>1 and note that

|ϑW,ν(u)ϑW,νL(u)|\displaystyle\;\;\;\;|\vartheta_{W,\nu}(u)-\vartheta_{W,\nu_{L}}(u)|
vA{2,,v},|A|1𝔼[|Sym[W](u,U2,,Uv)|(aA|Va|1{|Va|>L})(aAc|Va|1{|Va|L})].\displaystyle\leq v\sum_{\begin{subarray}{c}A\subseteq\{2,\ldots,v\},\\ |A|\geq 1\end{subarray}}\mathbb{E}\left[|\mathrm{Sym}[W](u,U_{2},\ldots,U_{v})|\left(\prod_{a\in A}|V_{a}|1\{|V_{a}|>L\}\right)\left(\prod_{a\in A^{c}}|V_{a}|1\{|V_{a}|\leq L\}\right)\right].

For every fixed non-empty set A\subseteq\{2,\ldots,v\}, an application of 2.1 part (ii) with W(\cdot,\cdot) as is,

ϕ(x1,,xv)=(aA|xa|𝟙(|xa|L))(aAc|xa|𝟙(|xa|L)),\phi(x_{1},\ldots,x_{v})=\left(\prod_{a\in A}|x_{a}|\mathbbm{1}(|x_{a}|\geq L)\right)\left(\prod_{a\in A^{c}}|x_{a}|\mathbbm{1}(|x_{a}|\leq L)\right),

and p,qp,q replaced by p~\tilde{p}, q~\tilde{q} on the above bound, gives

\sup_{W:\ \lVert W\rVert_{q\Delta}\leq C}\sup_{\nu:\ \mathfrak{m}_{p}(\nu)\leq C}\int_{0}^{1}|\vartheta_{W,\nu}(u)-\vartheta_{W,\nu_{L}}(u)|\,du
vsupW:WqΔCsupν:𝔪p(ν)CA{2,,v},|A|1Wq~Δ|E(H)|((aA𝔼ν[|Va|p~𝟙(|Va|L)])\displaystyle\leq v\sup_{W:\ \lVert W\rVert_{q\Delta}\leq C}\sup_{\nu:\ \mathfrak{m}_{p}(\nu)\leq C}\sum_{A\subseteq\{2,\ldots,v\},\ |A|\geq 1}\lVert W\rVert_{\tilde{q}\Delta}^{|E(H)|}\Bigg{(}\left(\prod_{a\in A}\mathbb{E}_{\nu}[|V_{a}|^{\tilde{p}}\mathbbm{1}(|V_{a}|\geq L)]\right)
(aAc𝔼ν[|Va|p~𝟙(|Va|L)]))1p~\displaystyle\;\;\;\;\;\;\qquad\left(\prod_{a\in A^{c}}\mathbb{E}_{\nu}[|V_{a}|^{\tilde{p}}\mathbbm{1}(|V_{a}|\leq L)]\right)\Bigg{)}^{\frac{1}{\tilde{p}}}
(4.2) v2vsupW:WqΔCsupν:𝔪p(ν)CLp~pWq~Δ|E(H)|(1+𝔪p(ν))v1p0,\displaystyle\leq v2^{v}\sup_{W:\ \lVert W\rVert_{q\Delta}\leq C}\sup_{\nu:\ \mathfrak{m}_{p}(\nu)\leq C}L^{\tilde{p}-p}\lVert W\rVert_{\tilde{q}\Delta}^{|E(H)|}(1+\mathfrak{m}_{p}(\nu))^{\frac{v-1}{p}}\to 0,

as LL\to\infty. This proves (4.1), and hence completes part (i).

(ii) Given W𝒲W\in\mathcal{W}, ν\nu\in\mathcal{M} and any u[0,1]u\in[0,1], define

(4.3) (u;W):=𝔼[Sym[|W|](u,U2,,Uv)]\mathfrak{R}(u;W):=\mathbb{E}[\mathrm{Sym}[|W|](u,U_{2},\ldots,U_{v})]

where U_{2},\ldots,U_{v}\overset{i.i.d.}{\sim}\mathrm{Unif}[0,1]. For k<\infty and T>0, define

ck(T)(u):=1{(u;Wk)T,(u;W)T},c_{k}^{(T)}(u):=1\{\mathfrak{R}(u;W_{k})\leq T,\ \mathfrak{R}(u;W_{\infty})\leq T\},

for u\in[0,1]. With this notation, by a truncation followed by a simple method of moments argument, the conclusion in part (ii) will follow if we can show the following:

(4.4) limTlim supksupν(L)01|ϑWk,ν(u)(1ck(T)(u))|𝑑u=0,\lim\limits_{T\to\infty}\limsup_{k\to\infty}\sup_{\nu\in\mathcal{M}^{(L)}}\int_{0}^{1}|\vartheta_{W_{k},\nu}(u)(1-c_{k}^{(T)}(u))|\,du=0,
(4.5) limTlim supksupν(L)01|ϑW,ν(u)(1ck(T)(u))|𝑑u=0,\lim\limits_{T\to\infty}\limsup_{k\to\infty}\sup_{\nu\in\mathcal{M}^{(L)}}\int_{0}^{1}|\vartheta_{W_{\infty},\nu}(u)(1-c_{k}^{(T)}(u))|\,du=0,
(4.6) supν(L)|01(ϑWk,ν(u)ck(T)(u))r𝑑u01(ϑW,ν(u)ck(T)(u))r𝑑u|0,\displaystyle\sup_{\nu\in\mathcal{M}^{(L)}}\Bigg{|}\int_{0}^{1}\big{(}\vartheta_{W_{k},\nu}(u)c_{k}^{(T)}(u)\big{)}^{r}\,du-\int_{0}^{1}\big{(}\vartheta_{W_{\infty},\nu}(u)c_{k}^{(T)}(u)\big{)}^{r}\,du\Bigg{|}\to 0,

as kk\to\infty, for every T>0T>0, and every rr\in\mathbb{N}.

Proof of (4.4). To begin, for any W𝒲W\in\mathcal{W} we have the bound

supν(L)|ϑW,ν(u)|Lv1(u;W),\displaystyle\sup_{\nu\in\mathcal{M}^{(L)}}|\vartheta_{W,\nu}(u)|\leq L^{v-1}\mathfrak{R}(u;W),

which gives

|ϑWk,ν(u)(1ck(T)(u))|Lv1(u;Wk)(1{(u;Wk)>T}+1{(u;W)>T}).|\vartheta_{W_{k},\nu}(u)(1-c_{k}^{(T)}(u))|\leq L^{v-1}\mathfrak{R}(u;W_{k})\left(1\{\mathfrak{R}(u;W_{k})>T\}+1\{\mathfrak{R}(u;W_{\infty})>T\}\right).

Therefore, (4.4) will follow if we can show that

(4.7) limTlim supk01(u;Wk)(1{(u;Wk)>T}+1{(u;W)>T})𝑑u=0.\lim_{T\to\infty}\limsup_{k\to\infty}\int_{0}^{1}\mathfrak{R}(u;W_{k})\left(1\{\mathfrak{R}(u;W_{k})>T\}+1\{\mathfrak{R}(u;W_{\infty})>T\}\right)\,du=0.

We now complete the proof based on the following claim, whose proof we defer.

(4.8) lim supk01q(u;Wk)𝑑u<,01q(u;W)𝑑u<.\displaystyle\limsup_{k\to\infty}\int_{0}^{1}\mathfrak{R}^{q}(u;W_{k})\,du<\infty,\quad\int_{0}^{1}\mathfrak{R}^{q}(u;W_{\infty})\,du<\infty.

We will now deal with (4.7) term by term. First note that:

01(u;Wk)1{(u;Wk)>T}𝑑u1Tq101q(u;Wk)𝑑u.\displaystyle\int_{0}^{1}\mathfrak{R}(u;W_{k})1\{\mathfrak{R}(u;W_{k})>T\}\,du\leq\frac{1}{T^{q-1}}\int_{0}^{1}\mathfrak{R}^{q}(u;W_{k})\,du.

By the first claim in (4.8), the right hand side above converges to 0 on taking k\to\infty followed by T\to\infty, thus proving the first claim in (4.7). For the second claim in (4.7), setting \tilde{p}=q/(q-1), Hölder's inequality followed by Markov's inequality gives

01(u;Wk)1{(u;W)>T}𝑑u\displaystyle\;\;\;\;\int_{0}^{1}\mathfrak{R}(u;W_{k})1\{\mathfrak{R}(u;W_{\infty})>T\}\,du
(01q(u;Wk)𝑑u)1q(011{(u;W)>T}𝑑u)1p~\displaystyle\leq\left(\int_{0}^{1}\mathfrak{R}^{q}(u;W_{k})\,du\right)^{\frac{1}{q}}\left(\int_{0}^{1}1\{\mathfrak{R}(u;W_{\infty})>T\}\,du\right)^{\frac{1}{\tilde{p}}}
\leq\left(\int_{0}^{1}\mathfrak{R}^{q}(u;W_{k})\,du\right)^{\frac{1}{q}}\frac{1}{T^{\frac{1}{\tilde{p}}}}\left(\int_{0}^{1}\mathfrak{R}(u;W_{\infty})\,du\right)^{\frac{1}{\tilde{p}}},

where the final quantity above converges to 0 taking kk\rightarrow\infty followed by TT\rightarrow\infty using both claims in (4.8). This proves the second claim in (4.7), and hence completes the verification of (4.4), subject to proving (4.8).

Proof of (4.8). Note that

q(u;Wk)\displaystyle\mathfrak{R}^{q}(u;W_{k}) =(𝔼[Sym[|Wk|](u,U2,,Uv)])q𝔼[Sym[|Wk|q](u,U2,,Uv)],\displaystyle=\left(\mathbb{E}[\mathrm{Sym}[|W_{k}|](u,U_{2},\ldots,U_{v})]\right)^{q}\leq\mathbb{E}[\mathrm{Sym}[|W_{k}|^{q}](u,U_{2},\ldots,U_{v})],

where the inequality follows from Lyapunov’s inequality (the function r𝔼[|X|r]1/rr\mapsto\mathbb{E}[|X|^{r}]^{1/r} is non-decreasing on (0,)(0,\infty)). On integrating over uu we get

01q(u;Wk)𝑑u𝔼[Sym[|Wk|q](U1,,Uv)]WkqΔq,\int_{0}^{1}\mathfrak{R}^{q}(u;W_{k})\,du\leq\mathbb{E}[\mathrm{Sym}[|W_{k}|^{q}](U_{1},\ldots,U_{v})]\leq\lVert W_{k}\rVert_{q\Delta}^{q},

where the last inequality follows from 2.1, part (iii). By our assumption lim supkWkqΔ<\limsup_{k\to\infty}\lVert W_{k}\rVert_{q\Delta}<\infty, the first conclusion in (4.8) follows. The second conclusion follows similarly.

Proof of (4.5). This follows the exact same line of argument as the proof of (4.4), and hence is omitted for brevity.

Proof of (4.6). Set hν(u):=𝔼ν[V|U=u]h_{\nu}(u):=\mathbb{E}_{\nu}[V|U=u], and use the definition ϑWk,ν()\vartheta_{W_{k},\nu}(\cdot) in (1.18) to note that

01(ϑWk,ν(u1)ck(T)(u1))r𝑑u1\displaystyle\;\;\;\;\int_{0}^{1}\left(\vartheta_{W_{k},\nu}(u_{1})c_{k}^{(T)}(u_{1})\right)^{r}\,du_{1}
=01ck(T)(u1)([0,1](v1)ri=1r(Sym[Wk](u1,u2(i),,uv(i))a=2vhν(ua(i))dua(i)))𝑑u1\displaystyle=\int_{0}^{1}c_{k}^{(T)}(u_{1})\left(\int_{[0,1]^{(v-1)r}}\prod_{i=1}^{r}\left(\mathrm{Sym}[W_{k}](u_{1},u_{2}^{(i)},\ldots,u^{(i)}_{v})\prod_{a=2}^{v}h_{\nu}(u_{a}^{(i)})\,du_{a}^{(i)}\right)\right)\,du_{1}

We can similarly write out an expression for \int_{0}^{1}\left(\vartheta_{W_{\infty},\nu}(u_{1})c_{k}^{(T)}(u_{1})\right)^{r}\,du_{1}, with \mathrm{Sym}[W_{k}] replaced by \mathrm{Sym}[W_{\infty}]. Accordingly, to establish (4.6), replacing each \mathrm{Sym}[W_{k}](u_{1},u_{2}^{(i)},\ldots,u_{v}^{(i)}) by \mathrm{Sym}[W_{\infty}](u_{1},u_{2}^{(i)},\ldots,u_{v}^{(i)}) sequentially, it suffices to show that:

(4.9) limksupν~(L)|𝔉kν,A|=0,\lim_{k\to\infty}\sup_{\nu\in\widetilde{\mathcal{M}}^{(L)}}\ \big{|}\mathfrak{F}_{k}^{\nu,A}\big{|}=0,

for every fixed L>0L>0 and A{2,,r}A\subseteq\{2,\ldots,r\}, where

𝔉kν,A:=01([0,1]v1(Sym[Wk](u1,u2(1),,uv(1))Sym[W](u1,u2(1),,uv(1)))ck(T)(u1)\displaystyle\mathfrak{F}_{k}^{\nu,A}:=\int_{0}^{1}\bigg{(}\int_{[0,1]^{v-1}}(\mathrm{Sym}[W_{k}](u_{1},u_{2}^{(1)},\ldots,u_{v}^{(1)})-\mathrm{Sym}[W_{\infty}](u_{1},u_{2}^{(1)},\ldots,u_{v}^{(1)}))c_{k}^{(T)}(u_{1})
a=2vhν(ua(1))dua(1))([0,1]|A|×(v1)iA(Sym[Wk](u1,u2(i),,uv(i))ck(T)(u1)a=2vhν(ua(i))a=2vdua(i)))\displaystyle\prod_{a=2}^{v}h_{\nu}(u_{a}^{(1)})\,du_{a}^{(1)}\bigg{)}\left(\int_{[0,1]^{|A|\times(v-1)}}\prod_{i\in A}\left(\mathrm{Sym}[W_{k}](u_{1},u^{(i)}_{2},\ldots,u^{(i)}_{v})c_{k}^{(T)}(u_{1})\prod_{a=2}^{v}h_{\nu}(u^{(i)}_{a})\prod_{a=2}^{v}\,du^{(i)}_{a}\right)\right)
([0,1]|Ac|×(v1)iAc(Sym[W](u1,u2(i),,uv(i))ck(T)(u1)a=2vhν(ua(i))a=2vdua(i)))ck(T)(u1)du1.\displaystyle\left(\int_{[0,1]^{|A^{c}|\times(v-1)}}\prod_{i\in A^{c}}\left(\mathrm{Sym}[W_{\infty}](u_{1},u^{(i)}_{2},\ldots,u^{(i)}_{v})c_{k}^{(T)}(u_{1})\prod_{a=2}^{v}h_{\nu}(u^{(i)}_{a})\prod_{a=2}^{v}\,du^{(i)}_{a}\right)\right)c_{k}^{(T)}(u_{1})\,du_{1}.

In order to establish (4.9), let us further define

𝔫kν,(T)(u):=[0,1]v1Sym[Wk](u,u2,,uv)ck(T)(u)a=2vhν(ua)a=2vdua,\mathfrak{n}_{k}^{\nu,(T)}(u):=\int_{[0,1]^{v-1}}\mathrm{Sym}[W_{k}](u,u_{2},\ldots,u_{v})c_{k}^{(T)}(u)\prod_{a=2}^{v}h_{\nu}(u_{a})\prod_{a=2}^{v}\,du_{a},
𝔭kν,(T)(u):=[0,1]v1Sym[W](u,u2,,uv)ck(T)(u)a=2vhν(ua)a=2vdua,\mathfrak{p}_{k}^{\nu,(T)}(u):=\int_{[0,1]^{v-1}}\mathrm{Sym}[W_{\infty}](u,u_{2},\ldots,u_{v})c_{k}^{(T)}(u)\prod_{a=2}^{v}h_{\nu}(u_{a})\prod_{a=2}^{v}\,du_{a},

and note that

(4.10) supν~(L)supk1max{𝔫kν,(T),𝔭kν,(T)}Lv1T.\displaystyle\sup_{\nu\in\widetilde{\mathcal{M}}^{(L)}}\sup_{k\geq 1}\max\left\{\lVert\mathfrak{n}_{k}^{\nu,(T)}\rVert_{\infty},\lVert\mathfrak{p}_{k}^{\nu,(T)}\rVert_{\infty}\right\}\leq L^{v-1}T.

Proceeding to show (4.9), integrating with respect to all the variables other than u_{1},u^{(1)}_{2},\ldots,u^{(1)}_{v}, we get

|𝔉kν,A|\displaystyle\big{|}\mathfrak{F}_{k}^{\nu,A}\big{|} =|01([0,1](v1)((Sym[Wk](u1,u2(1),,uv(1))Sym[W](u1,u2(1),,uv(1)))\displaystyle=\bigg{|}\int_{0}^{1}\Bigg{(}\int_{[0,1]^{(v-1)}}\Bigg{(}\big{(}\mathrm{Sym}[W_{k}](u_{1},u_{2}^{(1)},\ldots,u_{v}^{(1)})-\mathrm{Sym}[W_{\infty}](u_{1},u_{2}^{(1)},\ldots,u_{v}^{(1)})\big{)}
ck(T)(u1)a=2vhν(ua(1))a=2vdua))(𝔫kν,(T)(u1))|A|(𝔭kν,(T)(u1))|Ac|du1|\displaystyle\quad c_{k}^{(T)}(u_{1})\prod_{a=2}^{v}h_{\nu}(u_{a}^{(1)})\prod_{a=2}^{v}\,du_{a}\Bigg{)}\Bigg{)}\left(\mathfrak{n}_{k}^{\nu,(T)}(u_{1})\right)^{|A|}\left(\mathfrak{p}_{k}^{\nu,(T)}(u_{1})\right)^{|A^{c}|}\,du_{1}\bigg{|}
1v!σSv|((a,b)E(H)Wk(uσ(a),uσ(b))(a,b)E(H)W(uσ(a),uσ(b)))\displaystyle\leq\frac{1}{v!}\sum_{\sigma\in S_{v}}\Bigg{|}\int\left(\prod_{(a,b)\in E(H)}W_{k}(u_{\sigma(a)},u_{\sigma(b)})-\prod_{(a,b)\in E(H)}W_{\infty}(u_{\sigma(a)},u_{\sigma(b)})\right)
(a=2vhν(ua))ck(T)(u1)(𝔫kν,(T)(u1))|A|(𝔭kν,(T)(u1))|Ac|a=1vdua|.\displaystyle\left(\prod_{a=2}^{v}h_{\nu}(u_{a})\right)c_{k}^{(T)}(u_{1})\left(\mathfrak{n}_{k}^{\nu,(T)}(u_{1})\right)^{|A|}\left(\mathfrak{p}_{k}^{\nu,(T)}(u_{1})\right)^{|A^{c}|}\prod_{a=1}^{v}\,du_{a}\Bigg{|}.

Observe that |h_{\nu}| is bounded by L for \nu\in\widetilde{\mathcal{M}}^{(L)}, c_{k}^{(T)} is bounded by definition, and further \mathfrak{n}_{k}^{\nu,(T)} and \mathfrak{p}_{k}^{\nu,(T)} are both bounded by (4.10). The conclusion in (4.9) then follows from [6, Proposition 3.1 part (ii)].

(iii) Note that there exists a sequence of bounded continuous functions Wm𝒲+W_{m}\in\mathcal{W}^{+} such that WmWq0\lVert W_{m}-W\rVert_{q}\to 0 as mm\to\infty. The triangle inequality implies that given any m1m\geq 1, k1k\geq 1, we have:

d(Υ(W,νk),Υ(W,ν))\displaystyle d_{\ell}(\Upsilon(W,\nu_{k}),\Upsilon(W,\nu_{\infty})) d(Υ(W,νk),Υ(Wm,νk))+d(Υ(Wm,νk),Υ(Wm,ν))\displaystyle\leq d_{\ell}(\Upsilon(W,\nu_{k}),\Upsilon(W_{m},\nu_{k}))+d_{\ell}(\Upsilon(W_{m},\nu_{k}),\Upsilon(W_{m},\nu_{\infty}))
(4.11) +d_{\ell}(\Upsilon(W,\nu_{\infty}),\Upsilon(W_{m},\nu_{\infty})).

By part (ii), we have:

limmsupk[1,]d(Υ(Wm,νk),Υ(W,νk))=0.\lim_{m\to\infty}\sup_{k\in[1,\infty]}d_{\ell}(\Upsilon(W_{m},\nu_{k}),\Upsilon(W,\nu_{k}))=0.

Further from the definition of weak convergence we have, for every fixed mm,

d(Υ(Wm,νk),Υ(Wm,ν))0,ask.d_{\ell}(\Upsilon(W_{m},\nu_{k}),\Upsilon(W_{m},\nu_{\infty}))\to 0,\quad\mbox{as}\ k\to\infty.

Combining the two displays above with (4.11) establishes part (iii).

Proof of 2.9.

Recall from (2.31) that 𝔅~θ=G1(𝔅θ)\widetilde{\mathfrak{B}}_{\theta}=G_{1}(\mathfrak{B}_{\theta}^{*}), where G1(x,y)=(x,α(y))G_{1}(x,y)=(x,\alpha^{\prime}(y)) with α(.)\alpha^{\prime}(.) continuous (see 1.3 for definition of α(.)\alpha(.)). The facts that 𝔅θM~q\mathfrak{B}_{\theta}^{*}\subseteq\widetilde{M}_{q} and 𝔅~θM~p\widetilde{\mathfrak{B}}_{\theta}\subseteq\widetilde{M}_{p} follow directly from (1.25). It thus suffices to prove compactness of 𝔅θ\mathfrak{B}_{\theta}^{*} (which will imply compactness of 𝔅~θ\widetilde{\mathfrak{B}}_{\theta}).

To this effect, invoking (1.25), there exists C>0 such that \Xi(F_{\theta})\subseteq\widetilde{\mathcal{M}}_{p,C} (see (2.24) for the definition of \widetilde{\mathcal{M}}_{p,C}). Also, by 2.6, the function \Upsilon(W,\cdot) is continuous on \Xi(F_{\theta}) with respect to the weak topology. Since \Xi(F_{\theta}) is compact in the weak topology (see 1.1 part (iv)) and \mathfrak{B}_{\theta}^{*}=\Upsilon(W,\Xi(F_{\theta})) (from (2.13)), compactness of \mathfrak{B}_{\theta}^{*} follows.

Proof of 2.10.

(i) For any L>0L>0 under n,θ(1)\mathbb{R}_{n,\theta}^{(1)} we have

(4.12) 1ni=1n𝔼|di(XiXi(L))|DnLp1i=1n𝔼|Xi|p,\frac{1}{n}\sum_{i=1}^{n}\mathbb{E}\left|d_{i}\left(X_{i}-X_{i}^{(L)}\right)\right|\leq\frac{D}{nL^{p-1}}\sum_{i=1}^{n}\mathbb{E}|X_{i}|^{p},

where Xi(L):=Xi1{|Xi|L}X_{i}^{(L)}:=X_{i}1\{|X_{i}|\leq L\} and 𝐝D\|{\bf d}\|_{\infty}\leq D. The RHS of (4.12) converges to 0 as nn\to\infty followed by LL\to\infty by using (1.22). Since α(θmi)=𝔼[Xi|Xj,j[n],ji]\alpha^{\prime}(\theta m_{i})=\mathbb{E}[X_{i}|X_{j},\ j\in[n],\ j\neq i], setting

𝒥i(L):=𝔼[Xi(L)|Xj,ji],\mathcal{J}_{i}^{(L)}:=\mathbb{E}\left[X_{i}^{(L)}|X_{j},\ j\neq i\right],

we note that

|α(θmi)𝒥i(L)|𝔼[|Xi|1(|Xi|>L)|Xj,ji]1Lp1𝔼[|Xi|p|Xj,ji].\displaystyle\bigg{|}\alpha^{\prime}(\theta m_{i})-\mathcal{J}_{i}^{(L)}\bigg{|}\leq\mathbb{E}\left[|X_{i}|1(|X_{i}|>L)|X_{j},\ j\neq i\right]\leq\frac{1}{L^{p-1}}\mathbb{E}\left[|X_{i}|^{p}|X_{j},\ j\neq i\right].

Consequently,

(4.13) 1ni=1n𝔼|di(α(θmi)𝒥i(L))|DnLp1i=1n𝔼|Xi|p,\displaystyle\frac{1}{n}\sum_{i=1}^{n}\mathbb{E}\bigg{|}d_{i}\left(\alpha^{\prime}(\theta m_{i})-\mathcal{J}_{i}^{(L)}\right)\bigg{|}\leq\frac{D}{nL^{p-1}}\sum_{i=1}^{n}\mathbb{E}|X_{i}|^{p},

which converges to 0 as nn\to\infty followed by LL\to\infty, by using (1.22) and the fact that p>1p>1. Combining (4.12) and (4.13), it suffices to show i=1ndi(Xi(L)𝒥i(L))=oP(1)\sum_{i=1}^{n}d_{i}\left(X_{i}^{(L)}-\mathcal{J}_{i}^{(L)}\right)=o_{P}(1). Towards this direction, we further define, for iji\neq j,

\mathcal{J}^{(L)}_{i,j}:=\mathbb{E}\left[X_{i}^{(L)}\ \middle|\ X_{k},\ k\notin\{i,j\},\ X_{j}=0\right],

and observe that

𝔼[1ni=1ndi(Xi(L)𝒥i(L))]2\displaystyle\;\;\;\;\mathbb{E}\left[\frac{1}{n}\sum_{i=1}^{n}d_{i}\Big{(}X_{i}^{(L)}-\mathcal{J}_{i}^{(L)}\Big{)}\right]^{2}
=\frac{1}{n^{2}}\sum_{i=1}^{n}d_{i}^{2}\mathbb{E}\left(X_{i}^{(L)}-\mathcal{J}_{i}^{(L)}\right)^{2}+\frac{1}{n^{2}}\sum_{i\neq j}d_{i}d_{j}\mathbb{E}\left[\left(X_{i}^{(L)}-\mathcal{J}_{i}^{(L)}\right)\left(X_{j}^{(L)}-\mathcal{J}_{j}^{(L)}\right)\right]
\leq\frac{4D^{2}L^{2}}{n}+\frac{1}{n^{2}}\sum_{i\neq j}d_{i}d_{j}\mathbb{E}\left[\left(X_{i}^{(L)}-\mathcal{J}_{i,j}^{(L)}+\mathcal{J}_{i,j}^{(L)}-\mathcal{J}_{i}^{(L)}\right)\left(X_{j}^{(L)}-\mathcal{J}_{j}^{(L)}\right)\right].

For iji\neq j the random variable Xi(L)𝒥i,j(L)X_{i}^{(L)}-\mathcal{J}_{i,j}^{(L)} is measurable with respect to the sigma field generated by {Xk,k[n],kj}\{X_{k},\ k\in[n],\ k\neq j\}, and consequently,

𝔼[(Xi(L)𝒥i,j(L))(Xj(L)𝒥j(L))]=0,\mathbb{E}\left[\left(X_{i}^{(L)}-\mathcal{J}_{i,j}^{(L)}\right)\left(X_{j}^{(L)}-\mathcal{J}_{j}^{(L)}\right)\right]=0,

for iji\neq j. Combining the last two displays gives

(4.14) 𝔼[1ni=1ndi(Xi(L)𝒥i(L))]24D2L2n+2LD2n2ij𝔼|𝒥i,j(L)𝒥i(L)|.\mathbb{E}\left[\frac{1}{n}\sum_{i=1}^{n}d_{i}\Big{(}X_{i}^{(L)}-\mathcal{J}_{i}^{(L)}\Big{)}\right]^{2}\leq\frac{4D^{2}L^{2}}{n}+\frac{2LD^{2}}{n^{2}}\sum_{i\neq j}\mathbb{E}\left|\mathcal{J}_{i,j}^{(L)}-\mathcal{J}_{i}^{(L)}\right|.

It suffices to show that the second term in the RHS of (4.14) converges to 0 for every fixed D,LD,L. To control this second term, define

(4.15) mi,j:=vnv1(k2,,kv)𝒮(n,v,{i,j})Sym[Qn](i,k2,,kv)(m=2vXkm)m_{i,j}:=\frac{v}{n^{v-1}}\sum_{\begin{subarray}{c}(k_{2},\ldots,k_{v})\\ \in\mathcal{S}(n,v,\{i,j\})\end{subarray}}\mathrm{Sym}[Q_{n}](i,k_{2},\ldots,k_{v})\left(\prod_{m=2}^{v}X_{k_{m}}\right)

for i\neq j, where \mathcal{S}(n,v,\{i,j\}) denotes the set of all tuples of distinct indices in [n]^{v-1}, none of which equals i or j. For any K>0, by the triangle inequality we have, for any i\neq j,

1n2ij𝔼|𝒥i,j(L)𝒥i(L)|\displaystyle\;\;\;\;\;\frac{1}{n^{2}}\sum_{i\neq j}\mathbb{E}\left|\mathcal{J}_{i,j}^{(L)}-\mathcal{J}_{i}^{(L)}\right|
1n2ij𝔼[(|𝒥i,j(L)|+|𝒥i(L)|)(𝟙(|mi,j|K)+𝟙(|mi|K))]\displaystyle\leq\frac{1}{n^{2}}\sum_{i\neq j}\mathbb{E}\left[\left(\left|\mathcal{J}_{i,j}^{(L)}\right|+\left|\mathcal{J}_{i}^{(L)}\right|\right)\left(\mathbbm{1}(|m_{i,j}|\geq K)+\mathbbm{1}(|m_{i}|\geq K)\right)\right]
\displaystyle\;\;\;\;\;\;\;\;\;\;+\frac{1}{n^{2}}\sum_{i\neq j}\mathbb{E}\left[\left|\mathcal{J}_{i,j}^{(L)}-\mathcal{J}_{i}^{(L)}\right|\mathbbm{1}(|m_{i,j}|\leq K,\ |m_{i}|\leq K)\right]
(4.16) \displaystyle\leq\frac{2L}{n^{2}K}\sum_{i\neq j}\mathbb{E}\left(|m_{i,j}|+|m_{i}|\right)+\frac{1}{n^{2}}\sum_{i\neq j}\mathbb{E}\left[\left|\mathcal{J}_{i,j}^{(L)}-\mathcal{J}_{i}^{(L)}\right|\mathbbm{1}(|m_{i,j}|\leq K,\ |m_{i}|\leq K)\right].

It suffices to show that the RHS of (4.16) converges to 0 as n\to\infty, followed by K\to\infty. We complete the proof using the following claim, whose proof we defer:

(4.17) ij𝔼|mimi,j|=O(n).\sum_{i\neq j}\mathbb{E}|m_{i}-m_{i,j}|=O(n).

By combining (4.17) with (1.23), we also have:

(4.18) ij𝔼|mi,j|=O(n2).\sum_{i\neq j}\mathbb{E}|m_{i,j}|=O(n^{2}).

By combining (4.18) with (1.23), it is immediate that the first term in the RHS of (4.16) converges to 0 as n\to\infty, followed by K\to\infty. For the second term in the RHS of (4.16), let us define the function:

𝔈L(t):=|x|Lxexp(tx)𝑑μ(x)exp(tx)𝑑μ(x)=𝔼Xμt[X𝟙(|X|L)],\mathfrak{E}_{L}(t):=\frac{\int_{|x|\leq L}x\exp(tx)\,d\mu(x)}{\int_{-\infty}^{\infty}\exp(tx)\,d\mu(x)}=\mathbb{E}_{X\sim\mu_{t}}[X\mathbbm{1}(|X|\leq L)],

where μt\mu_{t} is the exponential tilt of μ\mu as introduced in 1.3. From standard properties of exponential families, 𝔈L()\mathfrak{E}_{L}(\cdot) has a continuous derivative on \mathbb{R} and therefore,

sup|t||θ|K|𝔈L(t)|𝔠,\sup_{|t|\leq|\theta|K}|\mathfrak{E}_{L}^{\prime}(t)|\leq\mathfrak{c},

where 𝔠>0\mathfrak{c}>0 depends on |θ||\theta|, LL, and KK. Hence,
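The boundedness of \mathfrak{E}_{L}^{\prime} on compact intervals can be seen concretely by evaluating the truncated tilted mean for an explicit discrete base measure. A minimal numerical sketch, not part of the formal argument; taking \mu to be the Rademacher measure is an illustrative assumption, under which \mathfrak{E}_{L}(t)=\tanh(t) for L\geq 1:

```python
import numpy as np

def truncated_tilted_mean(t, support, weights, L):
    """E_L(t) = (int_{|x|<=L} x e^{tx} dmu) / (int e^{tx} dmu), mu discrete."""
    w = weights * np.exp(t * support)
    return np.sum(support * w * (np.abs(support) <= L)) / np.sum(w)

# Rademacher base measure mu({-1}) = mu({+1}) = 1/2: nothing is truncated
# when L >= 1, and E_L(t) reduces to tanh(t).
support = np.array([-1.0, 1.0])
weights = np.array([0.5, 0.5])

# Finite-difference slopes of E_L on the compact window |t| <= 2: they stay
# bounded, as the proof requires on |t| <= |theta| K.
ts = np.linspace(-2.0, 2.0, 401)
vals = np.array([truncated_tilted_mean(t, support, weights, 1.0) for t in ts])
max_slope = np.max(np.abs(np.diff(vals) / np.diff(ts)))
```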

1n2ij𝔼[|𝒥i,j(L)𝒥i(L)|𝟙(|mi,j|K,|mi|K)]\displaystyle\;\;\;\;\;\frac{1}{n^{2}}\sum_{i\neq j}\mathbb{E}\left[\left|\mathcal{J}_{i,j}^{(L)}-\mathcal{J}_{i}^{(L)}\right|\mathbbm{1}(|m_{i,j}|\leq K,\ |m_{i}|\leq K)\right]
\displaystyle=\frac{1}{n^{2}}\sum_{i\neq j}\mathbb{E}\left[\left|\mathfrak{E}_{L}(\theta m_{i,j})-\mathfrak{E}_{L}(\theta m_{i})\right|\mathbbm{1}(|m_{i,j}|\leq K,\ |m_{i}|\leq K)\right]
(4.19) 𝔠|θ|n2ij𝔼|mi,jmi|=O(1n),\displaystyle\leq\frac{\mathfrak{c}|\theta|}{n^{2}}\sum_{i\neq j}\mathbb{E}|m_{i,j}-m_{i}|=O\left(\frac{1}{n}\right),

for every fixed \theta, L, and K. This completes the proof that the RHS of (4.16) converges to 0 as n\to\infty, followed by K\to\infty.

Proof of (4.17). The symmetry of Sym[Qn]\mathrm{Sym}[Q_{n}] implies

|mimi,j|vnv1|Xj|(k3,,kv)𝒮(n,v1,{i,j})Sym[|Qn|](i,j,k3,,kv)(m=3v|Xkm|).\displaystyle|m_{i}-m_{i,j}|\leq\frac{v}{n^{v-1}}|X_{j}|\sum_{\begin{subarray}{c}(k_{3},\ldots,k_{v})\\ \in\mathcal{S}(n,v-1,\{i,j\})\end{subarray}}\mathrm{Sym}[|Q_{n}|](i,j,k_{3},\ldots,k_{v})\left(\prod_{m=3}^{v}|X_{k_{m}}|\right).

Using this, we bound the left hand side of (4.17) below:

\displaystyle\frac{1}{n}\sum_{i\neq j}\left|m_{i}-m_{i,j}\right|\leq\frac{v}{n^{v}}\sum_{i\neq j}\sum_{\begin{subarray}{c}(k_{3},\ldots,k_{v})\\ \in\mathcal{S}(n,v-1,\{i,j\})\end{subarray}}\mathrm{Sym}[|Q_{n}|](i,j,k_{3},\ldots,k_{v})|X_{j}|\left(\prod_{m=3}^{v}|X_{k_{m}}|\right)
v(1nv(k1,,kv)[n]v|Sym[|Qn|](k1,,kv)|q)1q(1ni=1n|Xi|p)v1p\displaystyle\leq v\left(\frac{1}{n^{v}}\sum_{(k_{1},\ldots,k_{v})\in[n]^{v}}|\mathrm{Sym}[|Q_{n}|](k_{1},\ldots,k_{v})|^{q}\right)^{\frac{1}{q}}\left(\frac{1}{n}\sum_{i=1}^{n}|X_{i}|^{p}\right)^{\frac{v-1}{p}}
vWQnqΔ(1ni=1n|Xi|p)v1p,\displaystyle\leq v\lVert W_{Q_{n}}\rVert_{q\Delta}\left(\frac{1}{n}\sum_{i=1}^{n}|X_{i}|^{p}\right)^{\frac{v-1}{p}},

where the second inequality follows from Hölder’s inequality, and the third inequality uses 2.1 part (iii). Taking expectations in the above display gives

1nij𝔼|mimi,j|vWQnqΔ(1ni=1n𝔼|Xi|p)v1p.\displaystyle\frac{1}{n}\sum_{i\neq j}\mathbb{E}|m_{i}-m_{i,j}|\leq v\lVert W_{Q_{n}}\rVert_{q\Delta}\left(\frac{1}{n}\sum_{i=1}^{n}\mathbb{E}|X_{i}|^{p}\right)^{\frac{v-1}{p}}.

Here, we have used Lyapunov’s inequality coupled with the observation that v1pv-1\leq p. As lim supnWQnqΔ<\limsup_{n\to\infty}\lVert W_{Q_{n}}\rVert_{q\Delta}<\infty by (1.8), an application of (1.22) in the last display above completes the proof of (4.17).

Proof of 2.10.

(ii) Choose q~<q\tilde{q}<q and p~<p\tilde{p}<p such that p~1+q~1=1\tilde{p}^{-1}+\tilde{q}^{-1}=1. Fixing L>0L>0 we have

1n|i=1nmi(XiXi(L))|1Lpp~1(1ni=1n|mi|q~)1q~(1ni=1n|Xi|p)1p~=op(1),\frac{1}{n}\bigg{|}\sum_{i=1}^{n}m_{i}(X_{i}-X_{i}^{(L)})\bigg{|}\leq\frac{1}{L^{\frac{p}{\tilde{p}}-1}}\left(\frac{1}{n}\sum_{i=1}^{n}|m_{i}|^{\tilde{q}}\right)^{\frac{1}{\tilde{q}}}\left(\frac{1}{n}\sum_{i=1}^{n}|X_{i}|^{p}\right)^{\frac{1}{\tilde{p}}}=o_{p}(1),

where the limit is to be understood as n\to\infty followed by L\to\infty. Here we used (1.22) and (1.23). Now, from standard analysis we have the existence of a C^{1} function \psi_{L}:\mathbb{R}\to\mathbb{R} such that \psi_{L}(x)=x for |x|\leq L, |\psi_{L}(x)|\leq|x|, \lVert\psi_{L}\rVert_{\infty}<\infty, and \lVert\psi_{L}^{\prime}\rVert_{\infty}<\infty. This gives

|m_{i}-\psi_{L}(m_{i})|^{\tilde{q}}\leq 2^{\tilde{q}}|m_{i}|^{\tilde{q}}\mathbbm{1}\{|m_{i}|>L\}\leq\frac{2^{\tilde{q}}}{L^{q-\tilde{q}}}|m_{i}|^{q}.

Using this bound along with Hölder’s inequality, we get:

1n|i=1n(miψL(mi))Xi(L)|\displaystyle\frac{1}{n}\bigg{|}\sum_{i=1}^{n}(m_{i}-\psi_{L}(m_{i}))X_{i}^{(L)}\bigg{|} (1ni=1n|miψL(mi)|q~)1q~(1ni=1n|Xi|p~)1p~\displaystyle\leq\left(\frac{1}{n}\sum_{i=1}^{n}|m_{i}-\psi_{L}(m_{i})|^{\tilde{q}}\right)^{\frac{1}{\tilde{q}}}\left(\frac{1}{n}\sum_{i=1}^{n}|X_{i}|^{\tilde{p}}\right)^{\frac{1}{\tilde{p}}}
2Lqq~1(1ni=1n|mi|q)1q~(1nj=1n|Xj|p~)1p~=oP(1)\displaystyle\leq\frac{2}{L^{\frac{q}{\tilde{q}}-1}}\left(\frac{1}{n}\sum_{i=1}^{n}|m_{i}|^{q}\right)^{\frac{1}{\tilde{q}}}\left(\frac{1}{n}\sum_{j=1}^{n}|X_{j}|^{\tilde{p}}\right)^{\frac{1}{\tilde{p}}}=o_{P}(1)

as nn\to\infty followed by LL\to\infty, on using (1.22) and (1.23). Combining the above displays we get

1n|i=1nmiXii=1nψL(mi)Xi(L)|=op(1),\displaystyle\frac{1}{n}\bigg{|}\sum_{i=1}^{n}m_{i}X_{i}-\sum_{i=1}^{n}\psi_{L}(m_{i})X_{i}^{(L)}\bigg{|}=o_{p}(1),

as nn\to\infty followed by LL\to\infty. A similar computation shows

1n|i=1nmi𝔼[Xi|Xj,ji]i=1nψL(mi)𝔼[Xi(L)|Xj,ji]|=op(1)\displaystyle\frac{1}{n}\bigg{|}\sum_{i=1}^{n}m_{i}\mathbb{E}[X_{i}|X_{j},\ j\neq i]-\sum_{i=1}^{n}\psi_{L}(m_{i})\mathbb{E}[X_{i}^{(L)}|X_{j},\ j\neq i]\bigg{|}=o_{p}(1)

in the same sense. Using the last two displays above, it suffices to show that

(4.20) 1n|i=1nψL(mi)(Xi(L)𝔼[Xi(L)|Xj,ji])|=op(1),\displaystyle\frac{1}{n}\bigg{|}\sum_{i=1}^{n}\psi_{L}(m_{i})\left(X_{i}^{(L)}-\mathbb{E}[X_{i}^{(L)}|X_{j},\ j\neq i]\right)\bigg{|}=o_{p}(1),

as n\to\infty for fixed L. Towards this direction, we will use the definitions of m_{i,j}, \mathcal{J}_{i}^{(L)}, \mathcal{J}_{i,j}^{(L)} from the proof of 2.10 part (i). Observe that

1n2𝔼|i=1nψL(mi)(Xi(L)𝒥i(L))|2\displaystyle\;\;\;\;\;\frac{1}{n^{2}}\mathbb{E}\bigg{|}\sum_{i=1}^{n}\psi_{L}(m_{i})\left(X_{i}^{(L)}-\mathcal{J}_{i}^{(L)}\right)\bigg{|}^{2}
\displaystyle\leq\frac{4L^{2}\lVert\psi_{L}\rVert_{\infty}^{2}}{n}+\frac{1}{n^{2}}\sum_{i\neq k}\mathbb{E}\left[\psi_{L}(m_{i})\psi_{L}(m_{k})(X_{i}^{(L)}-\mathcal{J}_{i}^{(L)})(X_{k}^{(L)}-\mathcal{J}_{k}^{(L)})\right].

By Markov’s inequality, in order to establish (4.20), it suffices to show that the above display converges to 0 as nn\to\infty for any fixed LL. As

𝔼[ψL(mi,k)ψL(mk)(Xi(L)𝒥i,k(L))(Xk(L)𝒥k(L))]=0\mathbb{E}\left[\psi_{L}(m_{i,k})\psi_{L}(m_{k})(X_{i}^{(L)}-\mathcal{J}_{i,k}^{(L)})(X_{k}^{(L)}-\mathcal{J}_{k}^{(L)})\right]=0

for iki\neq k, it suffices to show that

1n2ik𝔼|𝒥i(L)𝒥i,k(L)|=o(1),1n2ik𝔼|ψL(mi)ψL(mi,k)|=o(1).\frac{1}{n^{2}}\sum_{i\neq k}\mathbb{E}|\mathcal{J}_{i}^{(L)}-\mathcal{J}_{i,k}^{(L)}|=o(1),\qquad\frac{1}{n^{2}}\sum_{i\neq k}\mathbb{E}|\psi_{L}(m_{i})-\psi_{L}(m_{i,k})|=o(1).

The left hand display is what we bounded in (4.19). As |\psi_{L}(m_{i})-\psi_{L}(m_{i,k})|\leq\lVert\psi_{L}^{\prime}\rVert_{\infty}|m_{i}-m_{i,k}|, the right hand display above follows directly from (4.17). ∎
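The proof above only uses the existence of a C^{1} truncation \psi_{L} with the four stated properties; one explicit choice (our own construction for illustration, not the one from the paper) agrees with the identity on [-L,L] and saturates smoothly beyond:

```python
import math

def psi(x, L):
    """A C^1 truncation: psi(x) = x on [-L, L], saturating towards +-(L + 1).

    For |x| > L the slope is exp(-(|x| - L)), which matches the slope 1 at
    |x| = L; hence psi is C^1 with |psi(x)| <= |x|, |psi| <= L + 1, |psi'| <= 1.
    """
    if abs(x) <= L:
        return x
    return math.copysign(L + 1.0 - math.exp(-(abs(x) - L)), x)
```

For instance, with L = 1, psi leaves 0.5 fixed while mapping 10 to 2 - e^{-9}, so the truncation is bounded yet never increases the magnitude of its argument.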

5. Appendix

In this section, we prove the auxiliary lemmas from earlier in the paper. Section 5.1 collects all results on the properties of the base measure \mu, and Section 5.2 contains some general probabilistic convergence results.

5.1. Proofs of Lemmas 1.7, 2.2, and 2.3

Proof of 1.7.

With \mathfrak{v}_{\theta,B,\mu}(x)=\theta x^{2}+Bx-x\beta(x)+\alpha(\beta(x)) as in the statement of the lemma, differentiation gives \mathfrak{v}^{\prime}_{\theta,B,\mu}(x)=2\theta x+B-\beta(x). Using 2.3 part (ii) we get \lim_{x\to\pm\infty}\mathfrak{v}^{\prime}_{\theta,B,\mu}(x)=\mp\infty (since p\geq 2), and so the continuous function \mathfrak{v}_{\theta,B,\mu}(\cdot) attains its global maximum on \mathbb{R}. Any maximizer (local or global) satisfies \mathfrak{v}^{\prime}_{\theta,B,\mu}(x)=2\theta x+B-\beta(x)=0, which is equivalent to solving \tilde{\mathfrak{v}}_{\theta,B,\mu}(x)=0, where

(5.1) 𝔳~θ,B,μ(x):=xα(2θx+B),𝔳~θ,B,μ(x):=12θα′′(2θx+B).\tilde{\mathfrak{v}}_{\theta,B,\mu}(x):=x-\alpha^{\prime}(2\theta x+B),\quad\tilde{\mathfrak{v}}^{\prime}_{\theta,B,\mu}(x):=1-2\theta\alpha^{\prime\prime}(2\theta x+B).

(i) Here B=0, and symmetry of \mu gives \alpha^{\prime}(0)=\tilde{\mathfrak{v}}_{\theta,0,\mu}(0)=0. To show that 0 is the only root of \tilde{\mathfrak{v}}_{\theta,0,\mu}(\cdot) (and hence the unique maximizer of \mathfrak{v}_{\theta,0,\mu}), using symmetry of \mu it suffices to show that \tilde{\mathfrak{v}}_{\theta,0,\mu} does not have any other roots on (0,\infty). To this effect, using (1.29) it follows that \alpha^{\prime\prime}(\cdot) is non-increasing on (0,\infty), and so \tilde{\mathfrak{v}}_{\theta,0,\mu} is convex there by (5.1). Since \tilde{\mathfrak{v}}^{\prime}_{\theta,0,\mu}(0)=0, it follows that 0 is also a global minimizer of \tilde{\mathfrak{v}}_{\theta,0,\mu}(\cdot) on [0,\infty), and so \tilde{\mathfrak{v}}_{\theta,0,\mu} is non-negative there. If there exists a positive root x_{0} of \tilde{\mathfrak{v}}_{\theta,0,\mu}(\cdot), then by convexity (and symmetry) we have \tilde{\mathfrak{v}}_{\theta,0,\mu}\equiv 0 on [-x_{0},x_{0}]. But this implies \alpha^{\prime}(\cdot) is linear on this domain, and so \alpha(\cdot) must be a quadratic, which is possible only if \mu is Gaussian. This contradicts (1.6), and hence completes the proof of part (i).

(ii) By symmetry, it suffices to consider the case B>0. Comparing x with -x and using the symmetry of \mu, it follows that all global maximizers lie in [0,\infty). Also in this case \alpha^{\prime}(B)>0, which implies \tilde{\mathfrak{v}}_{\theta,B,\mu}(0)<0. As \lim\limits_{x\to\infty}\tilde{\mathfrak{v}}_{\theta,B,\mu}(x)=\infty by 2.3 part (i), \tilde{\mathfrak{v}}_{\theta,B,\mu}(\cdot) either has a unique positive root, or at least 3 positive roots. If the latter holds, then by Rolle’s theorem and (5.1), \tilde{\mathfrak{v}}^{\prime}_{\theta,B,\mu}(\cdot) must have two positive roots x_{1}<x_{2}, which on using (1.29) gives that \alpha^{\prime\prime\prime}(\cdot)\equiv 0 on the interval [2\theta x_{1}+B,2\theta x_{2}+B]. As in part (i), this implies that \mu is Gaussian, a contradiction to (1.6). Thus \mathfrak{v}_{\theta,B,\mu}(\cdot) has a unique positive maximizer t_{\theta,B,\mu}.

(iii) In this case B=0, \tilde{\mathfrak{v}}_{\theta,0,\mu}(0)=0, and \tilde{\mathfrak{v}}_{\theta,0,\mu}^{\prime}(0)<0. Therefore, \tilde{\mathfrak{v}}_{\theta,0,\mu}(\cdot) either has a unique positive root or at least 3 positive roots. Arguing as in part (ii) above, \tilde{\mathfrak{v}}_{\theta,0,\mu}(\cdot) has exactly one positive root t_{\theta,0,\mu}. By symmetry, it follows that -t_{\theta,0,\mu} is the unique negative root of \tilde{\mathfrak{v}}_{\theta,0,\mu}(\cdot), and \pm t_{\theta,0,\mu} are the global maximizers of \mathfrak{v}_{\theta,0,\mu}. ∎
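As a concrete instance of the stationarity equation in (5.1), take \mu uniform on \{-1,+1\} (an illustrative choice, not required by the lemma), so that \alpha(t)=\log\cosh t and \alpha^{\prime}=\tanh; then \tilde{\mathfrak{v}}_{\theta,B,\mu}(x)=0 becomes the classical mean-field equation x=\tanh(2\theta x+B). A sketch locating its positive root by bisection:

```python
import math

def v_tilde(x, theta, B):
    """v_tilde(x) = x - alpha'(2*theta*x + B) with alpha(t) = log cosh(t)."""
    return x - math.tanh(2.0 * theta * x + B)

def positive_root(theta, B, lo=1e-9, hi=10.0, tol=1e-12):
    """Bisection for a root of v_tilde in (lo, hi); assumes a sign change."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if v_tilde(lo, theta, B) * v_tilde(mid, theta, B) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

# theta = 1, B = 0: 2*theta*alpha''(0) = 2 > 1, so a positive root exists,
# matching the two symmetric maximizers of part (iii).
root = positive_root(1.0, 0.0)
# theta = 1/4, B = 0: 2*theta*alpha''(0) = 1/2 < 1 and v_tilde > 0 on (0, inf),
# so 0 is the unique maximizer, matching part (i).
```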

Proof of 2.2.

The function β(.)\beta(.) is smooth (CC^{\infty}) on 𝒩\mathcal{N}, and the function γ(.)\gamma(.) is smooth on \mathbb{R}. Consequently, the function γ(β(.))\gamma(\beta(.)) is smooth on 𝒩\mathcal{N}. To verify continuity on cl(𝒩){\rm cl}(\mathcal{N}), it suffices to cover the (possible) boundary cases:

  • If a:=sup{𝒩}<a:=\sup\{\mathcal{N}\}<\infty, then limuaγ(β(u))=γ(β(a))=γ()\lim_{u\to a}\gamma(\beta(u))=\gamma(\beta(a))=\gamma(\infty).

  • If b:=inf{𝒩}>b:=\inf\{\mathcal{N}\}>-\infty, then limubγ(β(u))=γ(β(b))=γ()\lim_{u\to b}\gamma(\beta(u))=\gamma(\beta(b))=\gamma(-\infty).

We will only prove the first case, as the other case follows similarly. Note that,

limuaβ(u)=limuaμβ(u)=δa,\lim_{u\to a}\beta(u)=\infty\Rightarrow\lim_{u\to a}\mu_{\beta(u)}=\delta_{a},

where the second limit is in the weak topology. Further,

(5.2) lim infuaγ(β(u))=lim infusup{𝒩}D(μβ(u)|μ)D(δa|μ)=γ()\liminf_{u\to a}\gamma(\beta(u))=\liminf_{u\to\sup\{\mathcal{N}\}}D(\mu_{\beta(u)}|\mu)\geq D(\delta_{a}|\mu)=\gamma(\infty)

by the lower semi-continuity of Kullback-Leibler divergence. If μ({a})=0\mu(\{a\})=0, then γ()=\gamma(\infty)=\infty, and (5.2) yields the desired conclusion. If μ({a})>0\mu(\{a\})>0, then γ()=logμ({a})\gamma(\infty)=-\log{\mu(\{a\})}. Also, for any θ\theta\in\mathbb{R}, we have

α(θ)=logexp(θx)𝑑μ(x)θa+logμ({a}).\alpha(\theta)=\log{\int\exp(\theta x)\,d\mu(x)}\geq\theta a+\log{\mu(\{a\})}.

For all uu such that β(u)>0\beta(u)>0 (which holds for all uu close to aa), this gives

\gamma(\beta(u))=u\beta(u)-\alpha(\beta(u))\leq u\beta(u)-a\beta(u)-\log{\mu(\{a\})}\leq-\log{\mu(\{a\})}=\gamma(\infty).

Combining the above display with (5.2) gives limuaγ(β(u))=γ(),\lim_{u\to a}\gamma(\beta(u))=\gamma(\infty), as desired. ∎
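The boundary identity \lim_{u\to a}\gamma(\beta(u))=-\log\mu(\{a\}) can be checked numerically in a simple atomic case. A sketch under the illustrative assumption that \mu is Rademacher, so a=1, \mu(\{a\})=1/2, and \gamma(\beta(u))=t\alpha^{\prime}(t)-\alpha(t) with t=\beta(u)\to\infty and \alpha(t)=\log\cosh t:

```python
import math

def gamma_of_t(t):
    """D(mu_t | mu) = t * alpha'(t) - alpha(t) for mu Rademacher on {-1,+1}."""
    return t * math.tanh(t) - math.log(math.cosh(t))

# As t -> infinity, gamma_of_t(t) increases to -log mu({1}) = log 2.
values = [gamma_of_t(t) for t in (1.0, 5.0, 20.0)]
```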

Proof of 2.3.

(i) We prove limθα(θ)θ1p1=0\lim_{\theta\to\infty}\frac{\alpha^{\prime}(\theta)}{\theta^{\frac{1}{p-1}}}=0, noting that the proof of the other limit is similar. To this effect, we consider the following two cases separately:

  • μ(0,)>0\mu(0,\infty)>0.

    Fixing θ>0\theta>0 and δ>0\delta>0, we have:

    |α(θ)|θ1p1\displaystyle\frac{|\alpha^{\prime}(\theta)|}{\theta^{\frac{1}{p-1}}} |y|exp(θy)𝑑μ(y)θ1p1exp(θy)𝑑μ(y)\displaystyle\leq\frac{\int_{\mathbb{R}}|y|\exp(\theta y)\,d\mu(y)}{\theta^{\frac{1}{p-1}}\int_{\mathbb{R}}\exp(\theta y)\,d\mu(y)}
    δθ1p1|y|δθ1p1exp(θy)𝑑μ(y)+|y|δθ1p1|y|exp(θy)𝑑μ(y)θ1p1exp(θy)𝑑μ(y)\displaystyle\leq\frac{\delta\theta^{\frac{1}{p-1}}\int_{|y|\leq\delta\theta^{\frac{1}{p-1}}}\exp(\theta y)\,d\mu(y)+\int_{|y|\geq\delta\theta^{\frac{1}{p-1}}}|y|\exp(\theta y)\,d\mu(y)}{\theta^{\frac{1}{p-1}}\int\exp(\theta y)\,d\mu(y)}
    δ+|y|exp(|y|pδ1p)𝑑μ(y)θ1p1exp(θy)𝑑μ(y),\displaystyle\leq\delta+\frac{\int_{\mathbb{R}}|y|\exp(|y|^{p}\delta^{1-p})d\mu(y)}{\theta^{\frac{1}{p-1}}\int_{\mathbb{R}}\exp(\theta y)d\mu(y)},

where we use the bound |\theta y|\leq|y|^{p}\delta^{1-p} on the set |y|\geq\delta|\theta|^{\frac{1}{p-1}}. Letting \theta\to\infty we have \int_{\mathbb{R}}\exp(\theta y)d\mu(y)\to\infty, as \mu(0,\infty)>0. Since the numerator of the second term in the display above is finite by (1.6), the second term converges to 0 as \theta\to\infty, allowing us to conclude \limsup_{\theta\to\infty}\frac{|\alpha^{\prime}(\theta)|}{\theta^{\frac{1}{p-1}}}\leq\delta. Since \delta>0 is arbitrary, the desired limit follows.

  • μ(0,)=0\mu(0,\infty)=0.
    In this case, α(θ)0\alpha^{\prime}(\theta)\leq 0. Since α()\alpha^{\prime}(\cdot) is non-decreasing, limθα(θ)\lim_{\theta\to\infty}\alpha^{\prime}(\theta) exists as a finite (non-positive) number. Consequently we have limθα(θ)θ1p1=0\lim_{\theta\to\infty}\frac{\alpha^{\prime}(\theta)}{\theta^{\frac{1}{p-1}}}=0.

(ii) We only study the case when x\to\sup\{\mathcal{N}\}. If \sup\{\mathcal{N}\}<\infty, then the conclusion is immediate as the denominator converges to a finite number while the numerator diverges. Therefore, we only focus on the case \sup\{\mathcal{N}\}=\infty. To this effect, fix M>0; part (i) then gives that for all x large enough (depending on M) we have

α(Mx)x1p1Mxβ(x1p1).\alpha^{\prime}(Mx)\leq x^{\frac{1}{p-1}}\Leftrightarrow Mx\leq\beta\left(x^{\frac{1}{p-1}}\right).

Letting x\to\infty gives

lim infxβ(x1p1)xM.\liminf_{x\to\infty}\frac{\beta\left(x^{\frac{1}{p-1}}\right)}{x}\geq M.

Since M is arbitrary, the desired conclusion follows. ∎
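Part (i) is easy to visualize for a compactly supported base measure: taking \mu=\mathrm{Uniform}[-1,1] (an illustrative choice satisfying (1.6) for every p\geq 2), \alpha^{\prime}(\theta)=\coth\theta-1/\theta stays bounded, so \alpha^{\prime}(\theta)/\theta^{\frac{1}{p-1}}\to 0. A numerical sketch with p=2, not part of the formal argument:

```python
import math

def alpha_prime(theta):
    """alpha'(theta) = coth(theta) - 1/theta for mu = Uniform[-1, 1]."""
    return math.cosh(theta) / math.sinh(theta) - 1.0 / theta

# With p = 2 the normalization is theta^{1/(p-1)} = theta; the ratio decays to 0
# because alpha'(theta) -> 1 (the supremum of the support) while theta grows.
ratios = [alpha_prime(t) / t for t in (10.0, 100.0, 500.0)]
```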

5.2. Proofs of 2.1 and Lemmas 2.7 and 2.8

Proof of 2.1.

(i) and (ii) These are direct consequences of [10, Proposition 2.19] and [6, Lemma 2.2].

(iii) With Sym[|W|]\mathrm{Sym}[|W|] as in 1.6, we have

𝔼[Sym[|W|](U1,,Uv)]q\displaystyle\mathbb{E}\left[\mathrm{Sym}[|W|](U_{1},\ldots,U_{v})\right]^{q} =𝔼|1v!σ𝒮v(a,b)E(H)|W|(Uσ(a),Uσ(b))|q\displaystyle=\mathbb{E}\bigg{|}\frac{1}{v!}\sum_{\sigma\in\mathcal{S}_{v}}\prod_{(a,b)\in E(H)}|W|(U_{\sigma(a)},U_{\sigma(b)})\bigg{|}^{q}
1v!σ𝒮v𝔼(a,b)E(H)|W|q(Uσ(a),Uσ(b))WqΔq|E(H)|,\displaystyle\leq\frac{1}{v!}\sum_{\sigma\in\mathcal{S}_{v}}\mathbb{E}\prod_{(a,b)\in E(H)}|W|^{q}(U_{\sigma(a)},U_{\sigma(b)})\leq\lVert W\rVert_{q\Delta}^{q|E(H)|},

where the first inequality uses Lyapunov’s inequality, and the second inequality follows from 2.1 part (ii), with WW replaced by |W|q|W|^{q}. ∎

Proof of 2.7.

By using (2.25), it follows that the sequence {ξn}n1\{\xi_{n}\}_{n\geq 1} is tight. Passing to a subsequence, w.l.o.g. we can assume ξndξ\xi_{n}\stackrel{{\scriptstyle d}}{{\to}}\xi_{\infty}, where (ξ)=1\mathbb{P}(\xi_{\infty}\in\mathcal{F})=1 (as \mathcal{F} is closed). By the Portmanteau Theorem,

(5.3) (ξKc)lim supn(ξnKc)=0.\displaystyle\mathbb{P}(\xi_{\infty}\in K^{c})\leq\limsup\limits_{n\to\infty}\mathbb{P}(\xi_{n}\in K^{c})=0.

Next we will show that g(ξn)dg(ξ)g(\xi_{n})\stackrel{{\scriptstyle d}}{{\to}}g(\xi_{\infty}). Towards this direction let Hg()H\subseteq g(\mathcal{F}) be a closed set. We will write g1(H)g^{-1}(H) to denote the inverse image of the set HH under gg. Another application of the Portmanteau Theorem implies:

lim supn(g(ξn)H,ξnK)=\displaystyle\limsup\limits_{n\to\infty}\mathbb{P}(g(\xi_{n})\in H,\ \xi_{n}\in K)= lim supn(ξng1(H)K)\displaystyle\limsup\limits_{n\to\infty}\mathbb{P}(\xi_{n}\in g^{-1}(H)\cap K)
(5.4) \displaystyle\geq (ξg1(H)K).\displaystyle\mathbb{P}(\xi_{\infty}\in g^{-1}(H)\cap K).

The last line uses the fact that g^{-1}(H)\cap K is closed, which in turn follows from the continuity of g on K. Finally, by (2.25) and (5.3), we have:

(5.5) lim supn|(g(ξn)H,ξnK)(g(ξn)H)|=0,\displaystyle\limsup\limits_{n\to\infty}|\mathbb{P}(g(\xi_{n})\in H,\ \xi_{n}\in K)-\mathbb{P}(g(\xi_{n})\in H)|=0,
(5.6) (ξg1(H)K)=(ξg1(H)).\displaystyle\mathbb{P}(\xi_{\infty}\in g^{-1}(H)\cap K)=\mathbb{P}(\xi_{\infty}\in g^{-1}(H)).

By combining (5.4), (5.5), and (5.6), it follows that

\limsup\limits_{n\to\infty}\mathbb{P}(g(\xi_{n})\in H)\geq\mathbb{P}(g(\xi_{\infty})\in H).

By the Portmanteau theorem, this yields g(ξn)dg(ξ)g(\xi_{n})\stackrel{{\scriptstyle d}}{{\to}}g(\xi_{\infty}). So for any ε>0\varepsilon>0 we get

lim supn(dY(g(ξn),g())ε)(dY(g(ξ),g())ε)=0\displaystyle\limsup_{n\to\infty}\mathbb{P}(d_{Y}(g(\xi_{n}),g(\mathcal{F}))\geq\varepsilon)\leq\mathbb{P}(d_{Y}(g(\xi_{\infty}),g(\mathcal{F}))\geq\varepsilon)=0

as g(\xi_{\infty})\in g(\mathcal{F}) a.s. Since \varepsilon>0 is arbitrary, and the above argument applies along any subsequence, d_{Y}(g(\xi_{n}),g(\mathcal{F}))\stackrel{{\scriptstyle P}}{{\to}}0, as desired. ∎

Proof of 2.8.

(i) Since \limsup_{n\to\infty}\mathbb{E}|f_{n}(U)|^{p}<\infty and p^{\prime}<p, it follows that the sequence \{|f_{n}(U)|^{p^{\prime}}\}_{n\geq 1} is uniformly integrable, and so \mathbb{E}|f_{\infty}(U)|^{p^{\prime}}<\infty. By standard approximation results, given any \varepsilon>0, there exists h:[0,1]\to\mathbb{R} (depending on \varepsilon) such that h is continuous on [0,1] and \mathbb{E}|h(U)-f_{\infty}(U)|^{p^{\prime}}<\varepsilon. The continuous mapping theorem gives f_{n}(U)-h(U)\overset{D}{\longrightarrow}f_{\infty}(U)-h(U). Since \{|f_{n}(U)-h(U)|^{p^{\prime}}\}_{n\geq 1} is uniformly integrable, \lVert f_{n}-h\rVert_{p^{\prime}}\longrightarrow\lVert f_{\infty}-h\rVert_{p^{\prime}}\leq\varepsilon^{1/p^{\prime}}, and so by the triangle inequality \limsup_{n\to\infty}\lVert f_{n}-f_{\infty}\rVert_{p^{\prime}}\leq 2\varepsilon^{1/p^{\prime}}. As \varepsilon>0 is arbitrary, this completes the proof of part (i).

(ii) The conclusion follows by applying part (i) to the sequence of measures alternating between (U,f(U)) and (U,g(U)) along odd and even subsequences. ∎

References

  • Adamczak et al., [2019] Adamczak, R., Kotowski, M., Polaczyk, B., and Strzelecki, M. (2019). A note on concentration for polynomials in the Ising model. Electron. J. Probab., 24:Paper No. 42, 22.
  • Barra, [2009] Barra, A. (2009). Notes on Ferromagnetic p-spin and REM. Mathematical methods in the applied sciences, 32(7):783–797.
  • Basak and Mukherjee, [2017] Basak, A. and Mukherjee, S. (2017). Universality of the mean-field for the Potts model. Probab. Theory Related Fields, 168(3-4):557–600.
  • [4] Bhattacharya, B. B., Fang, X., and Yan, H. (2022a). Normal approximation and fourth moment theorems for monochromatic triangles. Random Structures & Algorithms, 60(1):25–53.
  • Bhattacharya et al., [2020] Bhattacharya, B. B., Mukherjee, S., and Mukherjee, S. (2020). The second-moment phenomenon for monochromatic subgraphs. SIAM Journal on Discrete Mathematics, 34(1):794–824.
  • [6] Bhattacharya, S., Deb, N., and Mukherjee, S. (2022b). LDP for inhomogeneous U-Statistics. arXiv preprint arXiv:2212.03944.
  • Bhattacharya et al., [2021] Bhattacharya, S., Mukherjee, R., and Ray, G. (2021). Sharp signal detection under Ferromagnetic Ising models. arXiv preprint arXiv:2110.02949.
  • Borgs et al., [2019] Borgs, C., Chayes, J., Cohn, H., and Zhao, Y. (2019). An Lp{L}^{p} theory of sparse graph convergence I: Limits, sparse random graph models, and power law distributions. Transactions of the American Mathematical Society, 372(5):3019–3062.
  • Borgs et al., [2018] Borgs, C., Chayes, J. T., Cohn, H., and Zhao, Y. (2018). An Lp{L}^{p} theory of sparse graph convergence II: LD convergence, quotients and right convergence. The Annals of Probability, 46(1):337–396.
  • Borgs et al., [2008] Borgs, C., Chayes, J. T., Lovász, L., Sós, V. T., and Vesztergombi, K. (2008). Convergent sequences of dense graphs. I. Subgraph frequencies, metric properties and testing. Adv. Math., 219(6):1801–1851.
  • Borgs et al., [2012] Borgs, C., Chayes, J. T., Lovász, L., Sós, V. T., and Vesztergombi, K. (2012). Convergent sequences of dense graphs II. Multiway cuts and statistical physics. Ann. of Math. (2), 176(1):151–219.
  • Bresler and Nagaraj, [2019] Bresler, G. and Nagaraj, D. (2019). Stein’s method for stationary distributions of Markov chains and application to Ising models. Ann. Appl. Probab., 29(5):3230–3265.
  • Chatterjee, [2007] Chatterjee, S. (2007). Estimation in spin glasses: a first step. Ann. Statist., 35(5):1931–1946.
  • Comets and Gidas, [1991] Comets, F. and Gidas, B. (1991). Asymptotics of maximum likelihood estimators for the Curie-Weiss model. Ann. Statist., 19(2):557–578.
  • Daskalakis et al., [2020] Daskalakis, C., Dikkala, N., and Panageas, I. (2020). Logistic regression with peer-group effects via inference in higher-order Ising models. In International Conference on Artificial Intelligence and Statistics, pages 3653–3663. PMLR.
  • Deb et al., [2020] Deb, N., Mukherjee, R., Mukherjee, S., and Yuan, M. (2020). Detecting structured signals in Ising models. The Annals of Applied Probability (To Appear).
  • Deb and Mukherjee, [2023] Deb, N. and Mukherjee, S. (2023). Fluctuations in mean-field Ising models. The Annals of Applied Probability, 33(3):1961 – 2003.
  • Dembo and Montanari, [2010] Dembo, A. and Montanari, A. (2010). Gibbs measures and phase transitions on sparse random graphs. Brazilian Journal of Probability and Statistics, 24(2):137–211.
  • Ellis et al., [1976] Ellis, R. S., Monroe, J. L., and Newman, C. M. (1976). The GHS and other correlation inequalities for a class of even Ferromagnets. Comm. Math. Phys., 46(2):167–182.
  • Ellis and Newman, [1978] Ellis, R. S. and Newman, C. M. (1978). The statistics of Curie-Weiss models. J. Statist. Phys., 19(2):149–161.
  • Frieze and Kannan, [1999] Frieze, A. and Kannan, R. (1999). Quick approximation to matrices and applications. Combinatorica, 19(2):175–220.
  • Gheissari et al., [2019] Gheissari, R., Hongler, C., and Park, S. (2019). Ising model: Local spin correlations and conformal invariance. Communications in Mathematical Physics, 367(3):771–833.
  • Heringa et al., [1989] Heringa, J., Blöte, H., and Hoogland, A. (1989). Phase transitions in self-dual Ising models with multispin interactions and a field. Physical review letters, 63(15):1546.
  • Ising, [1925] Ising, E. (1925). Beitrag zur Theorie des Ferromagnetismus. Zeitschrift für Physik, 31(1):253–258.
  • Lacker et al., [2022] Lacker, D., Mukherjee, S., and Yeung, L. C. (2022). Mean field approximations via log-concavity. arXiv preprint arXiv:2206.01260.
  • Liu et al., [2019] Liu, J., Sinclair, A., and Srivastava, P. (2019). The Ising partition function: Zeros and deterministic approximation. Journal of Statistical Physics, 174(2):287–315.
  • Lovász, [2012] Lovász, L. (2012). Large networks and graph limits, volume 60 of American Mathematical Society Colloquium Publications. American Mathematical Society, Providence, RI.
  • Mukherjee et al., [2021] Mukherjee, S., Son, J., and Bhattacharya, B. B. (2021). Fluctuations of the magnetization in the p-spin Curie–Weiss model. Communications in Mathematical Physics, 387(2):681–728.
  • Mukherjee et al., [2022] Mukherjee, S., Son, J., and Bhattacharya, B. B. (2022). Estimation in tensor Ising models. Information and Inference: A Journal of the IMA, 11(4):1457–1500.
  • Sly and Sun, [2014] Sly, A. and Sun, N. (2014). Counting in two-spin models on dd-regular graphs. Ann. Probab., 42(6):2383–2416.
  • Suzuki and Fisher, [1971] Suzuki, M. and Fisher, M. E. (1971). Zeros of the partition function for the Heisenberg, Ferroelectric, and general Ising models. Journal of Mathematical Physics, 12(2):235–246.
  • Turban, [2016] Turban, L. (2016). One-dimensional Ising model with multispin interactions. Journal of Physics A: Mathematical and Theoretical, 49(35):355002.
  • Yamashiro et al., [2019] Yamashiro, Y., Ohkuwa, M., Nishimori, H., and Lidar, D. A. (2019). Dynamics of reverse annealing for the fully connected p-spin model. Physical Review A, 100(5):052321.