
Large-width functional asymptotics for deep Gaussian neural networks

Daniele Bracale1, Stefano Favaro1,2, Sandra Fortini3, Stefano Peluchetti4
1 University of Torino, 2 Collegio Carlo Alberto, 3 Bocconi University, 4 Cogent Labs
Abstract

In this paper, we consider fully-connected feed-forward deep neural networks where weights and biases are independent and identically distributed according to Gaussian distributions. Extending previous results (Matthews et al., 2018a;b; Yang, 2019) we adopt a function-space perspective, i.e. we look at neural networks as infinite-dimensional random elements on the input space $\mathbb{R}^{I}$. Under suitable assumptions on the activation function we show that: i) a network defines a continuous stochastic process on the input space $\mathbb{R}^{I}$; ii) a network with re-scaled weights converges weakly to a continuous Gaussian Process in the large-width limit; iii) the limiting Gaussian Process has almost surely locally $\gamma$-Hölder continuous paths, for $0<\gamma<1$. Our results contribute to recent theoretical studies on the interplay between infinitely-wide deep neural networks and Gaussian Processes by establishing weak convergence in function-space with respect to a stronger metric.

1 Introduction

The interplay between infinitely-wide deep neural networks and classes of Gaussian Processes has its origins in the seminal work of Neal (1995), and it has been the subject of several theoretical studies. See, e.g., Der & Lee (2006), Lee et al. (2018), Matthews et al. (2018a;b), Yang (2019) and references therein. Consider a fully-connected feed-forward neural network with re-scaled weights composed of $L\geq 1$ layers of widths $n_{1},\dots,n_{L}$, i.e.

f_{i}^{(1)}(x)=\sum_{j=1}^{I}w_{i,j}^{(1)}x_{j}+b_{i}^{(1)},\qquad i=1,\dots,n_{1}, \qquad (1)
f_{i}^{(l)}(x)=\frac{1}{\sqrt{n_{l-1}}}\sum_{j=1}^{n_{l-1}}w_{i,j}^{(l)}\phi(f_{j}^{(l-1)}(x))+b_{i}^{(l)},\qquad l=2,\ldots,L,\ \ i=1,\dots,n_{l},

where $\phi$ is a non-linearity and $x\in\mathbb{R}^{I}$ is a real-valued input of dimension $I\in\mathbb{N}$. Neal (1995) considered the case $L=2$, a finite number $k\in\mathbb{N}$ of fixed distinct inputs $(x^{(1)},\ldots,x^{(k)})$, with each $x^{(r)}\in\mathbb{R}^{I}$, and weights $w_{i,j}^{(l)}$ and biases $b_{i}^{(l)}$ independently and identically distributed (iid) as Gaussian distributions. Under appropriate assumptions on the activation $\phi$, Neal (1995) showed that: i) for a fixed unit $i$, the $k$-dimensional random vector $(f^{(2)}_{i}(x^{(1)}),\dots,f^{(2)}_{i}(x^{(k)}))$ converges in distribution, as the width $n_{1}$ goes to infinity, to a $k$-dimensional Gaussian random vector; ii) the large-width convergence holds jointly over finite collections of $i$'s, and the limiting $k$-dimensional Gaussian random vectors are independent across the index $i$. These results concern neural networks with a single hidden layer, but Neal (1995) also includes preliminary considerations on infinitely-wide deep neural networks. More recent works, such as Lee et al. (2018), established convergence results corresponding to Neal (1995)'s results i) and ii) for deep neural networks, under the assumption that the widths $n_{1},\dots,n_{L}$ go to infinity sequentially over network layers. Matthews et al. (2018a;b) extended the work of Neal (1995) and Lee et al. (2018) by assuming that the width $n$ grows to infinity jointly over network layers, instead of sequentially, and by establishing joint convergence over all $i$ and countably many distinct inputs. The joint growth over the layers is certainly more realistic than the sequential growth, since the infinite Gaussian limit is considered as an approximation of a very wide network. We operate in the same setting as Matthews et al. (2018b), hence from here onward $n\geq 1$ denotes the common layer width, i.e. $n_{1}=\dots=n_{L}=n$. Finally, similar large-width limits have been established for a great variety of neural network architectures; see for instance Yang (2019).
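To fix ideas, the following minimal NumPy sketch (ours, not part of the papers cited above; the function name `sample_network` and its arguments are hypothetical) draws one realization of the re-scaled network (1) at a finite set of inputs. The $1/\sqrt{n_{l-1}}$ factor in the hidden layers is exactly the re-scaling under which the large-width Gaussian limit arises.

```python
import numpy as np

def sample_network(x, widths, sigma_w=1.0, sigma_b=1.0, phi=np.tanh, rng=None):
    """Draw one realization of the network (1) at the k inputs in x ([k, I]).

    Weights and biases are iid N(0, sigma_w^2) and N(0, sigma_b^2); every
    hidden layer carries the 1/sqrt(n_{l-1}) re-scaling of (1).
    Returns the last layer evaluated at all k inputs, shape [n_L, k].
    """
    rng = np.random.default_rng() if rng is None else rng
    k, I = x.shape
    # First layer (no re-scaling): f^(1)_i(x) = sum_j w_ij x_j + b_i.
    W = rng.normal(0.0, sigma_w, size=(widths[0], I))
    b = rng.normal(0.0, sigma_b, size=(widths[0], 1))
    f = W @ x.T + b
    # Deeper layers: f^(l) = W phi(f^(l-1)) / sqrt(n_{l-1}) + b.
    for n_out in widths[1:]:
        n_in = f.shape[0]
        W = rng.normal(0.0, sigma_w, size=(n_out, n_in))
        b = rng.normal(0.0, sigma_b, size=(n_out, 1))
        f = (W @ phi(f)) / np.sqrt(n_in) + b
    return f

# Example: k = 3 inputs in R^2, L = 3 layers of common width n = 1000.
x = np.array([[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]])
f3 = sample_network(x, widths=[1000, 1000, 1000])
```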

The assumption of a countable number of fixed distinct inputs is the common trait of the literature on large-width asymptotics for deep neural networks. Under this assumption, the large-width limit of a network boils down to the study of the large-width asymptotic behavior of the $k$-dimensional random vector $(f^{(l)}_{i}(x^{(1)}),\dots,f^{(l)}_{i}(x^{(k)}))$ over $i\geq 1$ for finite $k$. Such limiting finite-dimensional distributions describe the large-width distribution of a neural network a priori over any dataset, which is finite by definition. When the limiting distribution is Gaussian, as it often is, this immediately paves the way to Bayesian inference for the limiting network. Such an approach is competitive with the more standard stochastic gradient descent training for the fully-connected architectures that are the object of our study (Lee et al., 2020). However, knowledge of the limiting finite-dimensional distributions is not enough to infer properties of the limiting neural network that are inherently uncountable, such as the continuity of the limiting neural network or the distribution of its maximum over a bounded interval. Results in this direction give a more complete understanding of the assumptions being made a priori, and hence of whether a given model is appropriate for a specific application. For instance, Van Der Vaart & Van Zanten (2011) show that for Gaussian Processes the function smoothness under the prior should match the smoothness of the target function for satisfactory inference performance.

In this paper we thus consider a novel, and more natural, perspective on the study of large-width limits of deep neural networks. This is an infinite-dimensional perspective where, instead of fixing a countable number of distinct inputs, we look at $f^{(l)}_{i}(x,n)$ as a stochastic process over the input space $\mathbb{R}^{I}$. Under this perspective, establishing large-width limits requires considerable care and, in addition, it requires showing the existence of both the stochastic process induced by the neural network and its large-width limit. We start by proving the existence of: i) a continuous stochastic process, indexed by the network width $n$, corresponding to the fully-connected feed-forward deep neural network; ii) a continuous Gaussian Process corresponding to the infinitely-wide limit of the deep neural network. Then, we prove that the stochastic process i) converges weakly, as the width $n$ goes to infinity, to the Gaussian Process ii), jointly over all units $i$. As a by-product of our results, we show that the limiting Gaussian Process has almost surely locally $\gamma$-Hölder continuous paths, for $0<\gamma<1$. To make the exposition self-contained we include an alternative proof of the main result of Matthews et al. (2018a;b), i.e. the finite-dimensional limit for fully-connected neural networks. The major difference between our proof and that of Matthews et al. (2018b) is the use of characteristic functions to establish convergence in distribution, instead of relying on a CLT for exchangeable sequences (Blum et al., 1958).

The paper is structured as follows. In Section 2 we introduce the setting under which we operate, whereas in Section 3 we present a high-level overview of the approach taken to establish our results. Section 4 contains the core arguments of the proof of our large-width functional limit for deep Gaussian neural networks, which are spelled out in detail in the supplementary material (SM). We conclude in Section 5.

2 Setting

Let $(\Omega,\mathcal{H},\mathbb{P})$ be the probability space on which all random elements of interest are defined. Furthermore, let $N(\mu,\sigma^{2})$ denote a Gaussian distribution with mean $\mu\in\mathbb{R}$ and strictly positive variance $\sigma^{2}\in\mathbb{R}_{+}$, and let $N_{k}(\mathbf{m},\Sigma)$ be a $k$-dimensional Gaussian distribution with mean $\mathbf{m}\in\mathbb{R}^{k}$ and covariance matrix $\Sigma\in\mathbb{R}^{k\times k}$. In particular, $\mathbb{R}^{k}$ is equipped with $\|\cdot\|_{\mathbb{R}^{k}}$, the Euclidean norm induced by the inner product $\langle\cdot,\cdot\rangle_{\mathbb{R}^{k}}$, and $\mathbb{R}^{\infty}=\times_{i=1}^{\infty}\mathbb{R}$ is equipped with $\|\cdot\|_{\mathbb{R}^{\infty}}$, the norm induced by the distance $d(\textbf{a},\textbf{b})_{\infty}=\sum_{i\geq 1}\xi(|a_{i}-b_{i}|)/2^{i}$ for $\textbf{a},\textbf{b}\in\mathbb{R}^{\infty}$ (Theorem 3.38 of Aliprantis & Border (2006)), where $\xi(t)=t/(1+t)$ for all real values $t\geq 0$. Note that $(\mathbb{R},|\cdot|)$ and $(\mathbb{R}^{\infty},\|\cdot\|_{\mathbb{R}^{\infty}})$ are Polish spaces, i.e. separable and complete metric spaces (Corollary 3.39 of Aliprantis & Border (2006)). We choose $d_{\infty}$ since it generates a topology that coincides with the product topology (line 5 of the proof of Theorem 3.36 of Aliprantis & Border (2006)). The space $(S,d)$ will indicate a generic Polish space such as $\mathbb{R}$ or $\mathbb{R}^{\infty}$ with the associated distance. We indicate with $S^{\mathbb{R}^{I}}$ the space of functions from $\mathbb{R}^{I}$ into $S$, and with $C(\mathbb{R}^{I};S)\subset S^{\mathbb{R}^{I}}$ the space of continuous functions from $\mathbb{R}^{I}$ into $S$. Let $\omega_{i,j}^{(l)}$ be the random weights of the $l$-th layer, and assume that they are iid as $N(0,\sigma_{\omega}^{2})$, i.e.

\varphi_{\omega_{i,j}^{(l)}}(t)=\mathbb{E}[e^{{\rm i}t\omega_{i,j}^{(l)}}]=e^{-\frac{1}{2}\sigma_{\omega}^{2}t^{2}} \qquad (2)

is the characteristic function of $\omega_{i,j}^{(l)}$, for $i\geq 1$, $j=1,\dots,n$ and $l\geq 1$. Let $b^{(l)}_{i}$ be the random biases of the $l$-th layer, and assume that they are iid as $N(0,\sigma_{b}^{2})$, i.e.

\varphi_{b_{i}^{(l)}}(t)=\mathbb{E}[e^{{\rm i}tb_{i}^{(l)}}]=e^{-\frac{1}{2}\sigma_{b}^{2}t^{2}} \qquad (3)

is the characteristic function of $b_{i}^{(l)}$, for $i\geq 1$ and $l\geq 1$. Weights $\omega^{(l)}_{i,j}$ are independent of biases $b_{i}^{(l)}$, for any $i\geq 1$, $j=1,\dots,n$ and $l\geq 1$. Let $\phi:\mathbb{R}\to\mathbb{R}$ denote a continuous non-linearity. For the finite-dimensional limit we will assume the polynomial envelope condition

|\phi(s)|\leq a+b|s|^{m}, \qquad (4)

for any $s\in\mathbb{R}$ and some real values $a,b>0$ and $m\geq 1$. For the functional limit we will use a stronger assumption on $\phi$, namely that $\phi$ is Lipschitz on $\mathbb{R}$ with Lipschitz constant $L_{\phi}$.

Let $Z$ be a stochastic process on $\mathbb{R}^{I}$, i.e. for each $x\in\mathbb{R}^{I}$, $Z(x)$ is defined on $(\Omega,\mathcal{H},\mathbb{P})$ and takes values in $S$. For any $k\in\mathbb{N}$ and $x_{1},\ldots,x_{k}\in\mathbb{R}^{I}$, let $P^{Z}_{x_{1},\ldots,x_{k}}=\mathbb{P}(Z(x_{1})\in A_{1},\ldots,Z(x_{k})\in A_{k})$, with $A_{1},\ldots,A_{k}\in\mathcal{B}(S)$. Then, the family of finite-dimensional distributions of $Z$ is defined as the family of distributions $\{P^{Z}_{x_{1},\ldots,x_{k}}:x_{1},\ldots,x_{k}\in\mathbb{R}^{I}\text{ and }k\in\mathbb{N}\}$. See, e.g., Billingsley (1995). In Definition 1 and Definition 2 we look at the deep neural network (1) as a stochastic process on the input space $\mathbb{R}^{I}$, that is, a stochastic process whose finite-dimensional distributions are determined by a finite number $k\in\mathbb{N}$ of fixed distinct inputs $(x^{(1)},\ldots,x^{(k)})$, with each $x^{(r)}\in\mathbb{R}^{I}$. The existence of the stochastic processes of Definition 1 and Definition 2 will be thoroughly discussed in Section 3.

Definition 1.

For any fixed $l\geq 2$ and $i\geq 1$, let $(f_{i}^{(l)}(n))_{n\geq 1}$ be a sequence of stochastic processes on $\mathbb{R}^{I}$. That is, $f_{i}^{(l)}(n):\mathbb{R}^{I}\rightarrow\mathbb{R}$, with $x\mapsto f_{i}^{(l)}(x,n)$, is a stochastic process on $\mathbb{R}^{I}$ whose finite-dimensional distributions are the laws, for any $k\in\mathbb{N}$ and $x^{(1)},\ldots,x^{(k)}\in\mathbb{R}^{I}$, of the $k$-dimensional random vectors

f^{(1)}_{i}(\textbf{X},n)=f^{(1)}_{i}(\textbf{X})=[f^{(1)}_{i}(x^{(1)},n),\dots,f^{(1)}_{i}(x^{(k)},n)]^{T}=\sum_{j=1}^{I}\omega_{i,j}^{(1)}\textbf{x}_{j}+b_{i}^{(1)}\textbf{1} \qquad (5)
f^{(l)}_{i}(\textbf{X},n)=[f^{(l)}_{i}(x^{(1)},n),\dots,f^{(l)}_{i}(x^{(k)},n)]^{T}=\frac{1}{\sqrt{n}}\sum_{j=1}^{n}\omega_{i,j}^{(l)}(\phi\bullet f^{(l-1)}_{j}(\textbf{X},n))+b_{i}^{(l)}\textbf{1} \qquad (6)

where $\textbf{X}=[x^{(1)},\dots,x^{(k)}]\in\mathbb{R}^{I\times k}$ is an $I\times k$ input matrix of $k$ distinct inputs $x^{(r)}\in\mathbb{R}^{I}$, $\textbf{1}$ denotes a $k\times 1$ vector of $1$'s, $\textbf{x}_{j}$ denotes the $j$-th row of the input matrix, and $\phi\bullet\textbf{X}$ is the element-wise application of $\phi$ to the matrix $\textbf{X}$. Let $f^{(l)}_{r,i}(\textbf{X},n)=\textbf{1}^{T}_{r}f^{(l)}_{i}(\textbf{X},n)=f_{i}^{(l)}(x^{(r)},n)$ denote the $r$-th component of the $k\times 1$ vector $f^{(l)}_{i}(\textbf{X},n)$, where $\textbf{1}_{r}$ is a $k\times 1$ vector with $1$ in the $r$-th entry and $0$ elsewhere.

Remark: in contrast to (1), we have defined (5)-(6) over an infinite number of units $i\geq 1$ in each layer $l$, but the dependency on each previous layer $l-1$ remains limited to its first $n$ components.

Definition 2.

For any fixed $l\geq 2$, let $(\textbf{F}^{(l)}(n))_{n\geq 1}$ be a sequence of stochastic processes on $\mathbb{R}^{I}$. That is, $\textbf{F}^{(l)}(n):\mathbb{R}^{I}\rightarrow\mathbb{R}^{\infty}$, with $x\mapsto\textbf{F}^{(l)}(x,n)$, is a stochastic process on $\mathbb{R}^{I}$ whose finite-dimensional distributions are the laws, for any $k\in\mathbb{N}$ and $x^{(1)},\ldots,x^{(k)}\in\mathbb{R}^{I}$, of the random elements

\left\{\begin{array}{@{}l@{}}\textbf{F}^{(1)}(\textbf{X})=\big[f_{1}^{(1)}(\textbf{X}),f_{2}^{(1)}(\textbf{X}),\dots\big]^{T}\\ \textbf{F}^{(l)}(\textbf{X},n)=\big[f_{1}^{(l)}(\textbf{X},n),f_{2}^{(l)}(\textbf{X},n),\dots\big]^{T}.\end{array}\right.

Remark: for $k$ inputs, the vector $\textbf{F}^{(l)}(\textbf{X},n)$ is an $\infty\times k$ array, and for a single input $x^{(r)}$, $\textbf{F}^{(l)}(x^{(r)},n)$ can be written as $[f^{(l)}_{1}(x^{(r)},n),f^{(l)}_{2}(x^{(r)},n),\dots]^{T}\in\mathbb{R}^{\infty\times 1}$. We define $\textbf{F}_{r}^{(l)}(\textbf{X},n)=\textbf{F}^{(l)}(x^{(r)},n)$ to be the $r$-th column of $\textbf{F}^{(l)}(\textbf{X},n)$. When we write $\langle\textbf{F}^{(l-1)}(x,n),\textbf{F}^{(l-1)}(y,n)\rangle_{\mathbb{R}^{n}}$ (see (11)) we treat $\textbf{F}^{(l-1)}(x,n)$ and $\textbf{F}^{(l-1)}(y,n)$ as elements of $\mathbb{R}^{n}$ and not of $\mathbb{R}^{\infty}$, i.e. we consider only their first $n$ components.

3 Plan sketch

We start by recalling the notion of convergence in law, also referred to as convergence in distribution or weak convergence, for a sequence of stochastic processes. See Billingsley (1995) for a comprehensive account.

Definition 3 (convergence in distribution).

Suppose that $f$ and $(f(n))_{n\geq 1}$ are random elements in a topological space $C$. Then, $(f(n))_{n\geq 1}$ is said to converge in distribution to $f$ if $\mathbb{E}[h(f(n))]\rightarrow\mathbb{E}[h(f)]$ as $n\rightarrow\infty$ for every bounded and continuous function $h:C\rightarrow\mathbb{R}$. In that case we write $f(n)\stackrel{d}{\rightarrow}f$.

In this paper, we deal with continuous real-valued stochastic processes. More precisely, we consider random elements defined on $C(\mathbb{R}^{I};S)$, with $(S,d)$ a Polish space. Our aim is to study in $C(\mathbb{R}^{I};S)$ the convergence in distribution, as the width $n$ goes to infinity, of:

  i) the sequence $(f_{i}^{(l)}(n))_{n\geq 1}$ for a fixed $l\geq 2$ and $i\geq 1$, with $(S,d)=(\mathbb{R},|\cdot|)$, i.e. the neural network process for a single unit;

  ii) the sequence $(\textbf{F}^{(l)}(n))_{n\geq 1}$ for a fixed $l\geq 2$, with $(S,d)=(\mathbb{R}^{\infty},\|\cdot\|_{\mathbb{R}^{\infty}})$, i.e. the neural network process for all units.

Since applying Definition 3 directly in a function space is not easy, we need the following proposition, proved in ??.

Proposition 1 (convergence in distribution in $C(\mathbb{R}^{I};S)$, $(S,d)$ Polish).

Suppose that $f$ and $(f(n))_{n\geq 1}$ are random elements in $C(\mathbb{R}^{I};S)$ with $(S,d)$ a Polish space. Then, $f(n)\stackrel{d}{\rightarrow}f$ if: i) $f(n)\stackrel{f_{d}}{\rightarrow}f$ and ii) the sequence $(f(n))_{n\geq 1}$ is uniformly tight.

We denote by $\stackrel{f_{d}}{\rightarrow}$ the convergence in law of the finite-dimensional distributions of a sequence of stochastic processes. The notion of tightness formalizes the idea that the probability mass is not allowed to “escape to infinity”: a single random element $f$ in a topological space $C$ is said to be tight if for each $\epsilon>0$ there exists a compact $T\subset C$ such that $\mathbb{P}[f\in C\setminus T]<\epsilon$. If a metric space $(C,\rho)$ is Polish, any random element on the Borel $\sigma$-algebra of $C$ is tight. A sequence of random elements $(f(n))_{n\geq 1}$ in a topological space $C$ is said to be uniformly tight if for every $\epsilon>0$ there exists a compact $T\subset C$ such that $\mathbb{P}[f(n)\in C\setminus T]<\epsilon$ for all $n$. (Kallenberg (2002) uses the same term “tightness” for both a single random element and a sequence of random elements; we find that the term “uniform tightness” brings more clarity.)

According to Proposition 1, to achieve convergence in distribution in function spaces we need the following Steps A-D:

Step A) to establish the existence of the finite-dimensional weak limit $f$ on $\mathbb{R}^{I}$. We will rely on Theorem 5.3 of Kallenberg (2002), known as Lévy's theorem.

Step B) to establish the existence of the stochastic processes $f$ and $(f(n))_{n\geq 1}$ as elements of $S^{\mathbb{R}^{I}}$, the space of functions from $\mathbb{R}^{I}$ into $S$. We make use of the Daniell-Kolmogorov criterion (Kallenberg, 2002, Theorem 6.16): given a family of multivariate distributions $\{P_{\mathcal{I}}\text{ probability measure on }\mathbb{R}^{\dim(\mathcal{I})}\mid\mathcal{I}\subset\{x^{(1)},\dots,x^{(k)}\}_{x^{(z)}\in\mathbb{R}^{I},k\in\mathbb{N}}\}$, there exists a stochastic process with $\{P_{\mathcal{I}}\}$ as finite-dimensional distributions if $\{P_{\mathcal{I}}\}$ satisfies the projective property: $P_{J}(\cdot\times\mathbb{R}_{J\setminus\mathcal{I}})=P_{\mathcal{I}}(\cdot)$ for $\mathcal{I}\subset J\subset\{x^{(1)},\dots,x^{(k)}\}_{x^{(z)}\in\mathbb{R}^{I},k\in\mathbb{N}}$. That is, consistency with respect to marginalization over arbitrary components is required. In this step we also suppose, for a moment, that the stochastic processes $(f(n))_{n\geq 1}$ and $f$ belong to $C(\mathbb{R}^{I};S)$, and we establish the existence of such stochastic processes in $C(\mathbb{R}^{I};S)$ endowed with a $\sigma$-algebra and a probability measure to be defined.
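As a small worked instance (ours, for illustration), the Gaussian finite-dimensional families appearing below satisfy the projective property automatically, since marginalizing a multivariate Gaussian simply deletes the corresponding rows and columns of its covariance matrix; e.g. for $J=\{x^{(1)},x^{(2)}\}$ and $\mathcal{I}=\{x^{(1)}\}$,

P_{J}(A\times\mathbb{R})=\int_{A}\int_{\mathbb{R}}N_{2}\Big(\mathbf{0},\begin{bmatrix}\Sigma_{11}&\Sigma_{12}\\ \Sigma_{21}&\Sigma_{22}\end{bmatrix}\Big)(\mathrm{d}f_{1},\mathrm{d}f_{2})=\int_{A}N_{1}(0,\Sigma_{11})(\mathrm{d}f_{1})=P_{\mathcal{I}}(A).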

Step C) to show that the stochastic processes $(f(n))_{n\geq 1}$ and $f$ belong to $C(\mathbb{R}^{I};S)\subset S^{\mathbb{R}^{I}}$. With regard to $(f(n))_{n\geq 1}$ this is a direct consequence of (5)-(6) and the continuity of $\phi$. With regard to the limiting process $f$, under an additional Lipschitz assumption on $\phi$, we rely on the following Kolmogorov-Chentsov criterion (Kallenberg, 2002, Theorem 3.23):

Proposition 2 (continuous version and local Hölderianity, $(S,d)$ complete).

Let $f$ be a process on $\mathbb{R}^{I}$ with values in a complete metric space $(S,d)$, and assume that there exist $a,b,H>0$ such that

\mathbb{E}[d(f(x),f(y))^{a}]\leq H\|x-y\|^{(I+b)},\quad x,y\in\mathbb{R}^{I}.

Then $f$ has a continuous version (i.e. $f$ belongs to $C(\mathbb{R}^{I};S)$), and the latter is a.s. locally Hölder continuous with exponent $c$ for any $c\in(0,b/a)$.

Step D) to show the uniform tightness of $(f(n))_{n\geq 1}$ in $C(\mathbb{R}^{I};S)$. We rely on an extension of the Kolmogorov-Chentsov criterion (Kallenberg, 2002, Corollary 16.9), which is stated in the following proposition.

Proposition 3 (uniform tightness in $C(\mathbb{R}^{I};S)$, $(S,d)$ Polish).

Suppose that $(f(n))_{n\geq 1}$ are random elements in $C(\mathbb{R}^{I};S)$ with $(S,d)$ a Polish space. Assume that $(f(0_{\mathbb{R}^{I}},n))_{n\geq 1}$ (i.e. $f(n)$ evaluated at the origin) is uniformly tight in $S$ and that there exist $a,b,H>0$ such that

\mathbb{E}[d(f(x,n),f(y,n))^{a}]\leq H\|x-y\|^{(I+b)},\quad x,y\in\mathbb{R}^{I},\ n\in\mathbb{N},

uniformly in $n$. Then $(f(n))_{n\geq 1}$ is uniformly tight in $C(\mathbb{R}^{I};S)$.

4 Large-width functional limits

4.1 Limit on $C(\mathbb{R}^{I};S)$, with $(S,d)=(\mathbb{R},|\cdot|)$, for a fixed unit $i\geq 1$ and layer $l$

Lemma 1 (finite-dimensional limit).

If $\phi$ satisfies (4) then there exists a stochastic process $f^{(l)}_{i}:\mathbb{R}^{I}\to\mathbb{R}$ such that $(f^{(l)}_{i}(n))_{n\geq 1}\stackrel{f_{d}}{\rightarrow}f^{(l)}_{i}$ as $n\rightarrow\infty$.

Proof.

Fix $l\geq 2$ and $i\geq 1$. Given $k$ inputs $\textbf{X}=[x^{(1)},\dots,x^{(k)}]$, we show that, as $n\rightarrow+\infty$,

f^{(l)}_{i}(\textbf{X},n)\stackrel{d}{\rightarrow}N_{k}(\textbf{0},\Sigma(l)), \qquad (7)

where $\Sigma(l)$ denotes the $k\times k$ covariance matrix, which can be computed through the recursion $\Sigma(1)_{i,j}=\sigma_{b}^{2}+\sigma_{\omega}^{2}\langle x^{(i)},x^{(j)}\rangle_{\mathbb{R}^{I}}$ and $\Sigma(l)_{i,j}=\sigma_{b}^{2}+\sigma_{\omega}^{2}\int\phi(f_{i})\phi(f_{j})q^{(l-1)}(\mathrm{d}f)$, where $q^{(l-1)}=N_{k}(\textbf{0},\Sigma(l-1))$. By means of (2), (3), (5) and (6),

\left\{\begin{array}{@{}l@{}}f^{(1)}_{i}(\textbf{X})\stackrel{d}{=}N_{k}(\textbf{0},\Sigma(1)),\quad\Sigma(1)_{i,j}=\sigma_{b}^{2}+\sigma_{\omega}^{2}\langle x^{(i)},x^{(j)}\rangle_{\mathbb{R}^{I}}\\ f^{(l)}_{i}(\textbf{X},n)\,|\,f^{(l-1)}_{1,\dots,n}\stackrel{d}{=}N_{k}(\textbf{0},\Sigma(l,n)),\quad\text{for }l\geq 2,\\ \quad\Sigma(l,n)_{i,j}=\sigma_{b}^{2}+\frac{\sigma_{\omega}^{2}}{n}\big\langle\phi\bullet\textbf{F}^{(l-1)}_{i}(\textbf{X},n),\,\phi\bullet\textbf{F}^{(l-1)}_{j}(\textbf{X},n)\big\rangle_{\mathbb{R}^{n}}.\end{array}\right. \qquad (11)

We prove (7) using Lévy's theorem, i.e. the point-wise convergence of the sequence of characteristic functions in (11). We defer to ?? for the complete proof. ∎
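To make the covariance recursion concrete, here is a minimal numerical sketch (ours, not from the paper; `limiting_kernel` and its defaults are hypothetical names) that approximates the integral in the recursion for $\Sigma(l)$ by plain Monte Carlo; its output can be checked against the empirical covariance, over repeated draws, of the `sample_network` sketch in Section 1.

```python
import numpy as np

def limiting_kernel(X, L, sigma_w=1.0, sigma_b=1.0, phi=np.tanh,
                    n_mc=100_000, rng=None):
    """Approximate the k x k matrix Sigma(L) of the recursion in (7):
    Sigma(1)_ij = sigma_b^2 + sigma_w^2 <x^(i), x^(j)>, and for l >= 2
    Sigma(l)_ij = sigma_b^2 + sigma_w^2 E[phi(f_i) phi(f_j)] with
    f ~ N_k(0, Sigma(l-1)).  X has shape [k, I]."""
    rng = np.random.default_rng() if rng is None else rng
    Sigma = sigma_b**2 + sigma_w**2 * (X @ X.T)      # Sigma(1)
    for _ in range(2, L + 1):
        # Monte Carlo estimate of the expectation under N_k(0, Sigma(l-1)).
        f = rng.multivariate_normal(np.zeros(len(X)), Sigma, size=n_mc)
        Sigma = sigma_b**2 + sigma_w**2 * (phi(f).T @ phi(f)) / n_mc
    return Sigma
```

For some non-linearities (e.g. the error function, or ReLU) the expectation is also available in closed form, but Monte Carlo suffices for illustration.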

Lemma 1 establishes Step A. Its proof gives an alternative and self-contained proof of the main result of Matthews et al. (2018b), under the more general assumption that the activation function $\phi$ satisfies the polynomial envelope condition (4). Now we prove Step B, i.e. the existence of the stochastic processes $f^{(l)}_{i}(n)$ and $f^{(l)}_{i}$ on the space $\mathbb{R}^{\mathbb{R}^{I}}$, for each layer $l\geq 1$, unit $i\geq 1$ and $n\in\mathbb{N}$. In ??.1 we show that the finite-dimensional distributions of $f^{(l)}_{i}(n)$ satisfy the Daniell-Kolmogorov criterion (Kallenberg, 2002, Theorem 6.16), and hence the stochastic process $f^{(l)}_{i}(n)$ exists. In ??.2 we prove a similar result for the finite-dimensional distributions of the limiting process $f_{i}^{(l)}$. In ??.3 we prove that, if these stochastic processes are continuous, they are naturally defined on $C(\mathbb{R}^{I};\mathbb{R})$. To prove the continuity, i.e. Step C, note that $f^{(1)}_{i}(x)=\sum_{j=1}^{I}\omega_{i,j}^{(1)}x_{j}+b^{(1)}_{i}$ is continuous by construction; thus, by induction on $l$, if the $f^{(l-1)}_{i}(n)$ are continuous for each $i\geq 1$ and $n$, then $f^{(l)}_{i}(x,n)=\frac{1}{\sqrt{n}}\sum_{j=1}^{n}\omega_{i,j}^{(l)}\phi(f^{(l-1)}_{j}(x,n))+b^{(l)}_{i}$ is continuous, being a composition of continuous functions. For the limiting process $f^{(l)}_{i}$ we assume $\phi$ to be Lipschitz with Lipschitz constant $L_{\phi}$. In particular we have the following:

Lemma 2 (continuity).

If $\phi$ is Lipschitz on $\mathbb{R}$ then $f^{(l)}_{i}(1),f^{(l)}_{i}(2),\dots$ are $\mathbb{P}$-a.s. Lipschitz on $\mathbb{R}^{I}$, while the limiting process $f^{(l)}_{i}$ is $\mathbb{P}$-a.s. continuous on $\mathbb{R}^{I}$ and locally $\gamma$-Hölder continuous for each $0<\gamma<1$.

Proof.

Here we present a sketch of the proof; we defer to ?? and ?? for the complete proof. For $(f^{(l)}_{i}(n))_{n\geq 1}$ it is straightforward to show that, for each $n$,

|f^{(l)}_{i}(x,n)-f^{(l)}_{i}(y,n)|\leq H^{(l)}_{i}(n)\|x-y\|_{\mathbb{R}^{I}},\quad x,y\in\mathbb{R}^{I},\ \mathbb{P}\text{-a.s.}, \qquad (12)

where $H^{(l)}_{i}(n)$ denotes a suitable random variable, defined by the following recursion over $l$:

\left\{\begin{array}{@{}l@{}}H^{(1)}_{i}(n)=\sum_{j=1}^{I}\big|\omega_{i,j}^{(1)}\big|\\ H^{(l)}_{i}(n)=\frac{L_{\phi}}{\sqrt{n}}\sum_{j=1}^{n}\big|\omega_{i,j}^{(l)}\big|H^{(l-1)}_{j}(n).\end{array}\right. \qquad (15)

To establish the continuity of the limiting process $f^{(l)}_{i}$ we rely on Proposition 2. Take two inputs $x,y\in\mathbb{R}^{I}$. From (7) we get that $[f^{(l)}_{i}(x),f^{(l)}_{i}(y)]\sim N_{2}(\textbf{0},\Sigma(l))$ where

\begin{split}&\Sigma(1)=\sigma_{b}^{2}\begin{bmatrix}1&1\\ 1&1\end{bmatrix}+\sigma_{\omega}^{2}\begin{bmatrix}\|x\|_{\mathbb{R}^{I}}^{2}&\langle x,y\rangle_{\mathbb{R}^{I}}\\ \langle x,y\rangle_{\mathbb{R}^{I}}&\|y\|_{\mathbb{R}^{I}}^{2}\end{bmatrix},\\ &\Sigma(l)=\sigma_{b}^{2}\begin{bmatrix}1&1\\ 1&1\end{bmatrix}+\sigma_{\omega}^{2}\int\begin{bmatrix}|\phi(u)|^{2}&\phi(u)\phi(v)\\ \phi(u)\phi(v)&|\phi(v)|^{2}\end{bmatrix}q^{(l-1)}(\mathrm{d}u,\mathrm{d}v),\end{split}

where $q^{(l-1)}=N_{2}(\textbf{0},\Sigma(l-1))$. Defining $\textbf{a}^{T}=[1,-1]$, from (7) we know that $f^{(l)}_{i}(y)-f^{(l)}_{i}(x)\sim N(\textbf{a}^{T}\textbf{0},\textbf{a}^{T}\Sigma(l)\textbf{a})$. Thus

|f^{(l)}_{i}(y)-f^{(l)}_{i}(x)|^{2\theta}\sim|\sqrt{\textbf{a}^{T}\Sigma(l)\textbf{a}}\,N(0,1)|^{2\theta}\sim(\textbf{a}^{T}\Sigma(l)\textbf{a})^{\theta}|N(0,1)|^{2\theta}.

We proceed by induction over the layers. For $l=1$,

\begin{split}\mathbb{E}\big[|f^{(1)}_{i}(y)-f^{(1)}_{i}(x)|^{2\theta}\big]&=C_{\theta}(\textbf{a}^{T}\Sigma(1)\textbf{a})^{\theta}\\ &=C_{\theta}(\sigma_{\omega}^{2}\|y\|_{\mathbb{R}^{I}}^{2}-2\sigma_{\omega}^{2}\langle y,x\rangle_{\mathbb{R}^{I}}+\sigma_{\omega}^{2}\|x\|_{\mathbb{R}^{I}}^{2})^{\theta}\\ &=C_{\theta}(\sigma_{\omega}^{2})^{\theta}(\|y\|_{\mathbb{R}^{I}}^{2}-2\langle y,x\rangle_{\mathbb{R}^{I}}+\|x\|_{\mathbb{R}^{I}}^{2})^{\theta}\\ &=C_{\theta}(\sigma_{\omega}^{2})^{\theta}\|y-x\|_{\mathbb{R}^{I}}^{2\theta},\end{split}

where $C_{\theta}=\mathbb{E}[|N(0,1)|^{2\theta}]$. By the induction hypothesis there exists a constant $H^{(l-1)}>0$ such that $\int|u-v|^{2\theta}q^{(l-1)}(\mathrm{d}u,\mathrm{d}v)\leq H^{(l-1)}\|y-x\|^{2\theta}_{\mathbb{R}^{I}}$. Then,

\begin{split}|f^{(l)}_{i}(y)-f^{(l)}_{i}(x)|^{2\theta}&\sim|N(0,1)|^{2\theta}(\textbf{a}^{T}\Sigma(l)\textbf{a})^{\theta}\\ &=|N(0,1)|^{2\theta}\Big(\sigma^{2}_{\omega}\int[|\phi(u)|^{2}-2\phi(u)\phi(v)+|\phi(v)|^{2}]q^{(l-1)}(\mathrm{d}u,\mathrm{d}v)\Big)^{\theta}\\ &\leq|N(0,1)|^{2\theta}(\sigma^{2}_{\omega}L_{\phi}^{2})^{\theta}\int|u-v|^{2\theta}q^{(l-1)}(\mathrm{d}u,\mathrm{d}v)\\ &\leq|N(0,1)|^{2\theta}(\sigma^{2}_{\omega}L_{\phi}^{2})^{\theta}H^{(l-1)}\|y-x\|_{\mathbb{R}^{I}}^{2\theta},\end{split}

where we used $|\phi(u)|^{2}-2\phi(u)\phi(v)+|\phi(v)|^{2}=|\phi(u)-\phi(v)|^{2}\leq L_{\phi}^{2}|u-v|^{2}$ and Jensen's inequality. Thus,

\mathbb{E}\big[|f^{(l)}_{i}(y)-f^{(l)}_{i}(x)|^{2\theta}\big]\leq H^{(l)}\|y-x\|_{\mathbb{R}^{I}}^{2\theta}, \qquad (16)

where the constant $H^{(l)}$ can be derived explicitly by solving the following system:

\left\{\begin{array}{@{}l@{}}H^{(1)}=C_{\theta}(\sigma^{2}_{\omega})^{\theta}\\ H^{(l)}=C_{\theta}(\sigma^{2}_{\omega}L_{\phi}^{2})^{\theta}H^{(l-1)}.\end{array}\right. \qquad (19)

It is easy to get $H^{(l)}=C_{\theta}^{l}(\sigma^{2}_{\omega})^{l\theta}(L_{\phi}^{2})^{(l-1)\theta}$. Observe that $H^{(l)}$ does not depend on $i$ (this will be helpful in establishing the uniform tightness of $(f^{(l)}_{i}(n))_{n\geq 1}$ and the continuity of $\mathbf{F}^{(l)}$). By Proposition 2, setting $\alpha=2\theta$ and $\beta=2\theta-I$ (since $\beta$ needs to be positive, it is sufficient to choose $\theta>I/2$), we get that $f^{(l)}_{i}$ has a continuous version, and the latter is $\mathbb{P}$-a.s. locally $\gamma$-Hölder continuous for every $0<\gamma<1-\frac{I}{2\theta}$, for each $\theta>I/2$. Since $1-\frac{I}{2\theta}\uparrow 1$ as $\theta\rightarrow+\infty$, every exponent $0<\gamma<1$ is covered, which concludes the proof. ∎

Lemma 3 (uniform tightness).

If $\phi$ is Lipschitz on $\mathbb{R}$ then $(f^{(l)}_{i}(n))_{n\geq 1}$ is uniformly tight in $C(\mathbb{R}^{I};\mathbb{R})$.

Proof.

We defer to ?? for details. Fix $i\geq 1$ and $l\geq 1$. We apply Proposition 3 to show the uniform tightness of the sequence $(f^{(l)}_{i}(n))_{n\geq 1}$ in $C(\mathbb{R}^{I};\mathbb{R})$. By Lemma 2, $f^{(l)}_{i}(1),f^{(l)}_{i}(2),\dots$ are random elements in $C(\mathbb{R}^{I};\mathbb{R})$. Since $(\mathbb{R},|\cdot|)$ is Polish, every probability measure on it is tight, so $f_{i}^{(l)}(0_{\mathbb{R}^{I}},n)$ is tight in $\mathbb{R}$ for every $n$. Moreover, by Lemma 1, $(f_{i}^{(l)}(0_{\mathbb{R}^{I}},n))_{n\geq 1}\stackrel{d}{\rightarrow}f_{i}^{(l)}(0_{\mathbb{R}^{I}})$; therefore, by (Dudley, 2002, Theorem 11.5.3), $(f_{i}^{(l)}(0_{\mathbb{R}^{I}},n))_{n\geq 1}$ is uniformly tight in $\mathbb{R}$.

It remains to show that there exist two values $\alpha>0$ and $\beta>0$, and a constant $H^{(l)}>0$, such that

\mathbb{E}\big[|f_{i}^{(l)}(y,n)-f_{i}^{(l)}(x,n)|^{\alpha}\big]\leq H^{(l)}\|y-x\|_{\mathbb{R}^{I}}^{I+\beta},\quad x,y\in\mathbb{R}^{I},\ n\in\mathbb{N},

uniformly in $n$. Take two points $x,y\in\mathbb{R}^{I}$. From (11) we know that $f_{i}^{(l)}(y,n)|f^{(l-1)}_{1,\dots,n}\sim N(0,\sigma^{2}_{y}(l,n))$ and $f_{i}^{(l)}(x,n)|f^{(l-1)}_{1,\dots,n}\sim N(0,\sigma^{2}_{x}(l,n))$, with joint distribution $N_{2}(\textbf{0},\Sigma(l,n))$, where

\Sigma(1)=\begin{bmatrix}\sigma^{2}_{x}(1)&\Sigma(1)_{x,y}\\ \Sigma(1)_{x,y}&\sigma^{2}_{y}(1)\end{bmatrix},\quad\Sigma(l,n)=\begin{bmatrix}\sigma^{2}_{x}(l,n)&\Sigma(l,n)_{x,y}\\ \Sigma(l,n)_{x,y}&\sigma^{2}_{y}(l,n)\end{bmatrix},

with,

\left\{\begin{array}{@{}l@{}}\sigma^{2}_{x}(1)=\sigma_{b}^{2}+\sigma_{\omega}^{2}\|x\|_{\mathbb{R}^{I}}^{2},\\ \sigma^{2}_{y}(1)=\sigma_{b}^{2}+\sigma_{\omega}^{2}\|y\|_{\mathbb{R}^{I}}^{2},\\ \Sigma(1)_{x,y}=\sigma_{b}^{2}+\sigma_{\omega}^{2}\langle x,y\rangle_{\mathbb{R}^{I}},\\ \sigma^{2}_{x}(l,n)=\sigma_{b}^{2}+\frac{\sigma_{\omega}^{2}}{n}\sum_{j=1}^{n}|\phi\circ f_{j}^{(l-1)}(x,n)|^{2},\\ \sigma^{2}_{y}(l,n)=\sigma_{b}^{2}+\frac{\sigma_{\omega}^{2}}{n}\sum_{j=1}^{n}|\phi\circ f_{j}^{(l-1)}(y,n)|^{2},\\ \Sigma(l,n)_{x,y}=\sigma_{b}^{2}+\frac{\sigma_{\omega}^{2}}{n}\sum_{j=1}^{n}\phi(f^{(l-1)}_{j}(x,n))\phi(f^{(l-1)}_{j}(y,n)).\end{array}\right.

Defining $\textbf{a}^{T}=[1,-1]$, we have that $f_{i}^{(l)}(y,n)-f_{i}^{(l)}(x,n)$, conditionally on $f^{(l-1)}_{1,\dots,n}$, is distributed as $N(\textbf{a}^{T}\textbf{0},\textbf{a}^{T}\Sigma(l,n)\textbf{a})$, where $\textbf{a}^{T}\Sigma(l,n)\textbf{a}=\sigma_{y}^{2}(l,n)-2\Sigma(l,n)_{x,y}+\sigma_{x}^{2}(l,n)$. Consider $\alpha=2\theta$ with $\theta$ a positive integer. Thus

\big|f_{i}^{(l)}(y,n)-f_{i}^{(l)}(x,n)\big|^{2\theta}\,\Big|\,f^{(l-1)}_{1,\dots,n}\sim|\sqrt{\textbf{a}^{T}\Sigma(l,n)\textbf{a}}\,N(0,1)|^{2\theta}\sim(\textbf{a}^{T}\Sigma(l,n)\textbf{a})^{\theta}|N(0,1)|^{2\theta}.

As in the previous theorem, for $l=1$ we get $\mathbb{E}\big[|f_{i}^{(1)}(y,n)-f_{i}^{(1)}(x,n)|^{2\theta}\big]=C_{\theta}(\sigma_{\omega}^{2})^{\theta}\|y-x\|_{\mathbb{R}^{I}}^{2\theta}$, where $C_{\theta}=\mathbb{E}[|N(0,1)|^{2\theta}]$. Set $H^{(1)}=C_{\theta}(\sigma_{\omega}^{2})^{\theta}$, and by the induction hypothesis suppose that for every $j\geq 1$

\mathbb{E}\big[|f^{(l-1)}_{j}(y,n)-f^{(l-1)}_{j}(x,n)|^{2\theta}\big]\leq H^{(l-1)}\|y-x\|_{\mathbb{R}^{I}}^{2\theta}.

Since $\phi$ is Lipschitz by hypothesis,

\begin{split}\mathbb{E}\big[|f_{i}^{(l)}(y,n)-f_{i}^{(l)}(x,n)|^{2\theta}\,\big|\,f^{(l-1)}_{1,\dots,n}\big]&=C_{\theta}(\textbf{a}^{T}\Sigma(l,n)\textbf{a})^{\theta}\\ &=C_{\theta}\big(\sigma_{y}^{2}(l,n)-2\Sigma(l,n)_{x,y}+\sigma_{x}^{2}(l,n)\big)^{\theta}\\ &=C_{\theta}\Big(\frac{\sigma^{2}_{\omega}}{n}\sum_{j=1}^{n}\big|\phi\circ f_{j}^{(l-1)}(y,n)-\phi\circ f_{j}^{(l-1)}(x,n)\big|^{2}\Big)^{\theta}\\ &\leq C_{\theta}\Big(\frac{\sigma^{2}_{\omega}L_{\phi}^{2}}{n}\sum_{j=1}^{n}\big|f_{j}^{(l-1)}(y,n)-f_{j}^{(l-1)}(x,n)\big|^{2}\Big)^{\theta}\\ &=C_{\theta}\frac{(\sigma^{2}_{\omega}L_{\phi}^{2})^{\theta}}{n^{\theta}}\Big(\sum_{j=1}^{n}\big|f_{j}^{(l-1)}(y,n)-f_{j}^{(l-1)}(x,n)\big|^{2}\Big)^{\theta}\\ &\leq C_{\theta}\frac{(\sigma^{2}_{\omega}L_{\phi}^{2})^{\theta}}{n}\sum_{j=1}^{n}\big|f_{j}^{(l-1)}(y,n)-f_{j}^{(l-1)}(x,n)\big|^{2\theta}.\end{split}

Using the induction hypothesis

\begin{split}\mathbb{E}\big[|f_{i}^{(l)}(y,n)-f_{i}^{(l)}(x,n)|^{2\theta}\big]&=\mathbb{E}\big[\mathbb{E}\big[|f_{i}^{(l)}(y,n)-f_{i}^{(l)}(x,n)|^{2\theta}\,\big|\,f^{(l-1)}_{1,\dots,n}\big]\big]\\ &\leq C_{\theta}\frac{(\sigma^{2}_{\omega}L_{\phi}^{2})^{\theta}}{n}\sum_{j=1}^{n}\mathbb{E}\big[|f_{j}^{(l-1)}(y,n)-f_{j}^{(l-1)}(x,n)|^{2\theta}\big]\\ &\leq C_{\theta}(\sigma^{2}_{\omega}L_{\phi}^{2})^{\theta}H^{(l-1)}\|y-x\|^{2\theta}_{\mathbb{R}^{I}}.\end{split}

We obtain the constant $H^{(l)}$ by solving the same system as (19), which gives $H^{(l)}=C_{\theta}^{l}(\sigma^{2}_{\omega})^{l\theta}(L_{\phi}^{2})^{(l-1)\theta}$; note that it does not depend on $n$. By Proposition 3, setting $\alpha=2\theta$ and $\beta=2\theta-I$ (since $\beta$ must be positive, it is sufficient to take $\theta>I/2$), this concludes the proof. ∎

Note that Lemma 3 provides the last Step D, which allows us to prove the desired result, stated in the following theorem.

Theorem 1 (functional limit).

If $\phi$ is Lipschitz on $\mathbb{R}$ then $f^{(l)}_{i}(n)\stackrel{d}{\rightarrow}f^{(l)}_{i}$ on $C(\mathbb{R}^{I};\mathbb{R})$.

Proof.

We apply Proposition 1 to $(f_{i}^{(l)}(n))_{n\geq 1}$. By Lemma 2, $f^{(l)}_{i}$ and $(f^{(l)}_{i}(n))_{n\geq 1}$ belong to $C(\mathbb{R}^{I};\mathbb{R})$. From Lemma 1 we have the convergence of the finite-dimensional distributions of $(f_{i}^{(l)}(n))_{n\geq 1}$, and from Lemma 3 we have the uniform tightness of $(f_{i}^{(l)}(n))_{n\geq 1}$. ∎

4.2 Limit on $C(\mathbb{R}^{I};S)$, with $(S,d)=(\mathbb{R}^{\infty},\|\cdot\|_{\mathbb{R}^{\infty}})$, for a fixed layer $l$

As in the previous section, we prove Steps A-D for the sequence $(\textbf{F}^{(l)}(n))_{n\geq 1}$. Remark that each stochastic process $\textbf{F}^{(l)},\textbf{F}^{(l)}(1),\textbf{F}^{(l)}(2),\dots$ defines on $C(\mathbb{R}^{I};\mathbb{R}^{\infty})$ a joint measure whose $i$-th marginal is the measure induced, respectively, by $f^{(l)}_{i},f^{(l)}_{i}(1),f^{(l)}_{i}(2),\dots$ (see ??.1-??.4). Let $\textbf{F}^{(l)}\stackrel{d}{=}\bigotimes_{i=1}^{\infty}f_{i}^{(l)}$, where $\bigotimes$ denotes the product measure.

Lemma 4 (finite-dimensional limit).

If $\phi$ satisfies (4) then $\textbf{F}^{(l)}(n)\stackrel{f_{d}}{\rightarrow}\textbf{F}^{(l)}$ as $n\rightarrow\infty$.

Proof.

The proof follows from Lemma 1 and the Cramér-Wold theorem applied to finite-dimensional projections of $\textbf{F}^{(l)}(n)$: it is sufficient to establish the large-$n$ asymptotics of linear combinations of the $f^{(l)}_{i}(\textbf{X},n)$'s for $i\in\mathcal{L}\subset\mathbb{N}$, with $\mathcal{L}$ finite. In particular, we show that for any choice of the input matrix $\textbf{X}$, as $n\rightarrow+\infty$,

\textbf{F}^{(l)}(\textbf{X},n)\stackrel{d}{\rightarrow}\bigotimes_{i=1}^{\infty}N_{k}(\textbf{0},\Sigma(l)), \qquad (20)

where $\Sigma(l)$ is defined in (7). The proof is reported in ??. ∎

Lemma 5 (continuity).

If $\phi$ is Lipschitz on $\mathbb{R}$ then $\textbf{F}^{(l)}$ and $(\textbf{F}^{(l)}(n))_{n\geq 1}$ belong to $C(\mathbb{R}^{I};\mathbb{R}^{\infty})$. More precisely, $\textbf{F}^{(l)}(1),\textbf{F}^{(l)}(2),\dots$ are $\mathbb{P}$-a.s. Lipschitz on $\mathbb{R}^{I}$, while the limiting process $\textbf{F}^{(l)}$ is $\mathbb{P}$-a.s. continuous on $\mathbb{R}^{I}$ and locally $\gamma$-Hölder continuous for each $0<\gamma<1$.

Proof.

The result derives immediately from Lemma 2; we defer to ?? and ?? for details. The continuity of each process in the sequence follows immediately from the Lipschitzianity of each component in (12), while the continuity of the limiting process $\textbf{F}^{(l)}$ is proved by applying Proposition 2. Take two inputs $x,y\in\mathbb{R}^{I}$ and fix an even integer $\alpha\geq 1$. Since $\xi(t)\leq t$ for all $t\geq 0$, and by Jensen's inequality,

d\big(\textbf{F}^{(l)}(x),\textbf{F}^{(l)}(y)\big)_{\infty}^{\alpha}\leq\Big(\sum_{i=1}^{\infty}\frac{1}{2^{i}}|f_{i}^{(l)}(x)-f^{(l)}_{i}(y)|\Big)^{\alpha}\leq\sum_{i=1}^{\infty}\frac{1}{2^{i}}|f_{i}^{(l)}(x)-f^{(l)}_{i}(y)|^{\alpha}.

Thus, applying the monotone convergence theorem to the positive increasing sequence $g(N)=\sum_{i=1}^{N}\frac{1}{2^{i}}|f_{i}^{(l)}(x)-f^{(l)}_{i}(y)|^{\alpha}$ (which allows us to exchange $\mathbb{E}$ and $\sum_{i=1}^{\infty}$), we get

\begin{split}\mathbb{E}\Big[d\big(\textbf{F}^{(l)}(x),\textbf{F}^{(l)}(y)\big)_{\infty}^{\alpha}\Big]&\leq\mathbb{E}\Big[\sum_{i=1}^{\infty}\frac{1}{2^{i}}|f_{i}^{(l)}(x)-f^{(l)}_{i}(y)|^{\alpha}\Big]=\lim_{N\rightarrow\infty}\mathbb{E}\Big[\sum_{i=1}^{N}\frac{1}{2^{i}}|f_{i}^{(l)}(x)-f^{(l)}_{i}(y)|^{\alpha}\Big]\\ &=\sum_{i=1}^{\infty}\frac{1}{2^{i}}\mathbb{E}\Big[|f_{i}^{(l)}(x)-f^{(l)}_{i}(y)|^{\alpha}\Big]=\sum_{i=1}^{\infty}\frac{1}{2^{i}}H^{(l)}\|x-y\|^{\alpha}_{\mathbb{R}^{I}}=H^{(l)}\|x-y\|_{\mathbb{R}^{I}}^{\alpha},\end{split}

where we used (16) and the fact that $H^{(l)}$ does not depend on $i$ (see (19)). Therefore, by Proposition 2, for each $\alpha>I$, setting $\beta=\alpha-I$ (which is positive precisely because $\alpha>I$), $\textbf{F}^{(l)}$ has a continuous version which is $\mathbb{P}$-a.s. locally $\gamma$-Hölder continuous for every $0<\gamma<1-\frac{I}{\alpha}$. Letting $\alpha\rightarrow\infty$ we conclude. ∎

Theorem 2 (functional limit).

If $\phi$ is Lipschitz on $\mathbb{R}$ then $(\textbf{F}^{(l)}(n))_{n\geq 1}\stackrel{d}{\rightarrow}\textbf{F}^{(l)}$ as $n\rightarrow\infty$ on $C(\mathbb{R}^{I};\mathbb{R}^{\infty})$.

Proof.

This is Proposition 1 applied to $(\textbf{F}^{(l)}(n))_{n\geq 1}$. Given Lemma 4 and Lemma 5, it remains to show the uniform tightness of the sequence $(\textbf{F}^{(l)}(n))_{n\geq 1}$ in $C(\mathbb{R}^{I};\mathbb{R}^{\infty})$. Let $\epsilon>0$ and let $(\epsilon_{i})_{i\geq 1}$ be a positive sequence such that $\sum_{i=1}^{\infty}\epsilon_{i}=\epsilon/2$. We have established the uniform tightness of each component (Lemma 3). Therefore, for each $i\in\mathbb{N}$ there exists a compact $K_{i}\subset C(\mathbb{R}^{I};\mathbb{R})$ such that $\mathbb{P}[f_{i}^{(l)}(n)\in C(\mathbb{R}^{I};\mathbb{R})\setminus K_{i}]<\epsilon_{i}$ for each $n\in\mathbb{N}$ (such a compact set depends on $\epsilon_{i}$). Set $K=\times_{i=1}^{\infty}K_{i}$, which is compact by Tychonoff's theorem. Note that $K$ is compact in the product space $\times_{i=1}^{\infty}C(\mathbb{R}^{I};\mathbb{R})$ with the associated product topology, and it is also compact in $C(\mathbb{R}^{I};\mathbb{R}^{\infty})$ (see ??.4). Then $\mathbb{P}\big[\textbf{F}^{(l)}(n)\in C(\mathbb{R}^{I};\mathbb{R}^{\infty})\setminus K\big]=\mathbb{P}\big[\bigcup_{i=1}^{\infty}\{f^{(l)}_{i}(n)\in C(\mathbb{R}^{I};\mathbb{R})\setminus K_{i}\}\big]\leq\sum_{i=1}^{\infty}\mathbb{P}\big[f^{(l)}_{i}(n)\in C(\mathbb{R}^{I};\mathbb{R})\setminus K_{i}\big]\leq\sum_{i=1}^{\infty}\epsilon_{i}<\epsilon$, which concludes the proof. ∎

5 Discussion

We looked at deep Gaussian neural networks as stochastic processes, i.e. infinite-dimensional random elements, on the input space $\mathbb{R}^{I}$, and we showed that: i) a network defines a stochastic process on the input space $\mathbb{R}^{I}$; ii) under suitable assumptions on the activation function, a network with re-scaled weights converges weakly to a Gaussian Process in the large-width limit. These results extend previous works (Neal, 1995; Der & Lee, 2006; Lee et al., 2018; Matthews et al., 2018a;b; Yang, 2019) that investigate the limiting distribution of neural networks over a countable number of distinct inputs. From the point of view of applications, convergence in distribution is the starting point for the convergence of expectations. Consider a continuous function $g:C(\mathbb{R}^{I};\mathbb{R}^{\infty})\rightarrow\mathbb{R}$. By the continuous mapping theorem (Billingsley, 1999, Theorem 2.7), we have $g(\textbf{F}^{(l)}(n))\overset{d}{\rightarrow}g(\textbf{F}^{(l)})$ as $n\rightarrow+\infty$, and under uniform integrability (Billingsley, 1999, Section 3) we have (Billingsley, 1999, Theorem 3.5) $\mathbb{E}[g(\textbf{F}^{(l)}(n))]\rightarrow\mathbb{E}[g(\textbf{F}^{(l)})]$ as $n\rightarrow+\infty$. See also Dudley (2002) and references therein.
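As a concrete illustration of this point (our sketch, reusing the hypothetical `sample_network` and `limiting_kernel` helpers from the earlier snippets), take $g$ to be the maximum of a single unit over a fine grid on $[-1,1]$, a proxy for the maximum over the interval: its law under the width-$n$ network approaches its law under the limiting Gaussian Process.

```python
import numpy as np

rng = np.random.default_rng(0)
grid = np.linspace(-1.0, 1.0, 50).reshape(-1, 1)   # inputs in R^1, so I = 1

# Empirical law of max_x f_1^(3)(x, n) under the width-n network ...
maxima_net = np.array([sample_network(grid, widths=[500] * 3, rng=rng)[0].max()
                       for _ in range(1000)])

# ... against the same functional under the limiting Gaussian Process.
Sigma3 = limiting_kernel(grid, L=3, rng=rng)
maxima_gp = rng.multivariate_normal(np.zeros(len(grid)), Sigma3,
                                    size=1000).max(axis=1)

print(maxima_net.mean(), maxima_gp.mean())   # close for large widths
```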

As a by-product of our results, we showed that, under a Lipschitz activation function, the limiting Gaussian Process has almost surely locally $\gamma$-Hölder continuous paths, for $0<\gamma<1$. This raises the question of whether it is possible to strengthen our results to cover the case $\gamma=1$, or even the case of local Lipschitzianity of the paths of the limiting process. In addition, if the activation function is differentiable, does this property transfer to the limiting process? We leave these questions to future research. Finally, while fully-connected deep neural networks represent an ideal starting point for theoretical analysis, modern neural network architectures are composed of a much richer class of layers, which includes convolutional, residual, recurrent and attention components. The technical arguments followed in this paper are amenable to extensions to more complex network architectures. Providing a mathematical formulation of network architectures and convergence results in a way that allows for extensions to arbitrary architectures, instead of providing an ad-hoc proof for each specific case, is a fundamental research problem. Greg Yang's work on Tensor Programs (Yang, 2019) constitutes an important step in this direction.

References

  • Aliprantis & Border (2006) Charalambos D. Aliprantis and Kim Border. Infinite Dimensional Analysis: A Hitchhiker’s Guide. Springer-Verlag Berlin and Heidelberg GmbH & Company KG, 2006.
  • Billingsley (1995) Patrick Billingsley. Probability and Measure. John Wiley & Sons, 3rd edition, 1995.
  • Billingsley (1999) Patrick Billingsley. Convergence of Probability Measures. Wiley-Interscience, 2nd edition, 1999.
  • Blum et al. (1958) J.R. Blum, H. Chernoff, M. Rosenblatt, and H. Teicher. Central Limit Theorems for Interchangeable Processes. Canadian Journal of Mathematics, 10:222–229, 1958.
  • Der & Lee (2006) Ricky Der and Daniel D Lee. Beyond Gaussian Processes: On the Distributions of Infinite Networks. In Advances in Neural Information Processing Systems, pp. 275–282, 2006.
  • Dudley (2002) R.M. Dudley. Real Analysis and Probability. Cambridge University Press, 2002.
  • Kallenberg (2002) Olav Kallenberg. Foundations of Modern Probability. Springer Science & Business Media, 2nd edition, 2002.
  • Lee et al. (2018) Jaehoon Lee, Jascha Sohl-dickstein, Jeffrey Pennington, Roman Novak, Sam Schoenholz, and Yasaman Bahri. Deep Neural Networks as Gaussian Processes. In International Conference on Learning Representations, 2018.
  • Lee et al. (2020) Jaehoon Lee, Samuel Schoenholz, Jeffrey Pennington, Ben Adlam, Lechao Xiao, Roman Novak, and Jascha Sohl-Dickstein. Finite Versus Infinite Neural Networks: an Empirical Study. In Advances in Neural Information Processing Systems, volume 33, 2020.
  • Matthews et al. (2018a) Alexander G. de G. Matthews, Jiri Hron, Mark Rowland, Richard E. Turner, and Zoubin Ghahramani. Gaussian Process Behaviour in Wide Deep Neural Networks. In International Conference on Learning Representations, 2018a.
  • Matthews et al. (2018b) Alexander G. de G. Matthews, Mark Rowland, Jiri Hron, Richard E. Turner, and Zoubin Ghahramani. Gaussian Process Behaviour in Wide Deep Neural Networks. arXiv preprint arXiv:1804.11271, 2018b.
  • Neal (1995) Radford M Neal. Bayesian Learning for Neural Networks. PhD thesis, University of Toronto, 1995.
  • Van Der Vaart & Van Zanten (2011) Aad Van Der Vaart and Harry Van Zanten. Information Rates of Nonparametric Gaussian Process Methods. Journal of Machine Learning Research, 12(6), 2011.
  • Yang (2019) Greg Yang. Wide Feedforward or Recurrent Neural Networks of Any Architecture are Gaussian Processes. In Advances in Neural Information Processing Systems, volume 32, 2019.

SM A

The following inequalities will be used several times in the proofs without explicit mention:

  • 1) For any real values $a_{1},\dots,a_{n}\geq 0$ and $s\geq 1$,

    (a_{1}+\dots+a_{n})^{s}\leq n^{s-1}(a_{1}^{s}+\dots+a_{n}^{s}).

    It follows immediately by applying the convex function $x\mapsto x^{s}$ to the weighted sum $(a_{1}+\dots+a_{n})/n$.

  • 2) For any real values $a_{1},\dots,a_{n}\in\mathbb{R}$ and $0<s<1$,

    |a_{1}+\dots+a_{n}|^{s}\leq|a_{1}|^{s}+\dots+|a_{n}|^{s}.

    It follows immediately by studying the $s$-Hölder function $x\mapsto|x|^{s}$.

By means of (2), (3) and (5), we can write, for $i\geq 1$,

\begin{split}\varphi_{f^{(1)}_{i}(\textbf{X},n)}(\textbf{t})&=\mathbb{E}[e^{{\rm i}\textbf{t}^{T}f^{(1)}_{i}(\textbf{X},n)}]\\ &=\mathbb{E}\Big[\exp\Big\{{\rm i}\textbf{t}^{T}\Big[\sum_{j=1}^{I}\omega_{i,j}^{(1)}\textbf{x}_{j}+b_{i}^{(1)}\textbf{1}\Big]\Big\}\Big]\\ &=\mathbb{E}\Big[\exp\Big\{{\rm i}\textbf{t}^{T}b_{i}^{(1)}\textbf{1}+{\rm i}\textbf{t}^{T}\sum_{j=1}^{I}\omega_{i,j}^{(1)}\textbf{x}_{j}\Big\}\Big]\\ &=\mathbb{E}\Big[\exp\Big\{{\rm i}(\textbf{t}^{T}\textbf{1})b_{i}^{(1)}\Big\}\Big]\prod_{j=1}^{I}\mathbb{E}\Big[\exp\Big\{{\rm i}(\textbf{t}^{T}\textbf{x}_{j})\omega_{i,j}^{(1)}\Big\}\Big]\\ &=\exp\Big\{-\frac{1}{2}\sigma_{b}^{2}(\textbf{t}^{T}\textbf{1})^{2}\Big\}\prod_{j=1}^{I}\exp\Big\{-\frac{1}{2}\sigma_{\omega}^{2}(\textbf{t}^{T}\textbf{x}_{j})^{2}\Big\}\\ &=\exp\Big\{-\frac{1}{2}\Big[\sigma_{b}^{2}(\textbf{t}^{T}\textbf{1})^{2}+\sigma_{\omega}^{2}\sum_{j=1}^{I}(\textbf{t}^{T}\textbf{x}_{j})^{2}\Big]\Big\}\\ &=\exp\Big\{-\frac{1}{2}\textbf{t}^{T}\Sigma(1)\textbf{t}\Big\},\end{split}

i.e.

f^{(1)}_{i}(\textbf{X})\stackrel{d}{=}N_{k}(\textbf{0},\Sigma(1)),

with $k\times k$ covariance matrix whose element in the $i$-th row and $j$-th column is

\Sigma(1)_{i,j}=\sigma_{b}^{2}+\sigma_{\omega}^{2}\langle x^{(i)},x^{(j)}\rangle_{\mathbb{R}^{I}}.

Observe that we can also determine the marginal distributions,

f^{(1)}_{r,i}(\textbf{X})\sim N(0,\Sigma(1)_{r,r}), \qquad (21)

where

\Sigma(1)_{r,r}=\sigma_{b}^{2}+\sigma_{\omega}^{2}\|x^{(r)}\|^{2}_{\mathbb{R}^{I}}.

Now, for $i\geq 1$ and $l\geq 2$, by means of (2), (3) and (6) we can write

\begin{split}\varphi_{f^{(l)}_{i}(\textbf{X},n)|f^{(l-1)}_{1,\dots,n}}(\textbf{t})&=\mathbb{E}[e^{{\rm i}\textbf{t}^{T}f^{(l)}_{i}(\textbf{X},n)}|f^{(l-1)}_{1,\dots,n}]\\ &=\mathbb{E}\Big[\exp\Big\{{\rm i}\textbf{t}^{T}\Big[\frac{1}{\sqrt{n}}\sum_{j=1}^{n}\omega_{i,j}^{(l)}(\phi\bullet f^{(l-1)}_{j}(\textbf{X},n))+b_{i}^{(l)}\textbf{1}\Big]\Big\}\Big|f^{(l-1)}_{1,\dots,n}\Big]\\ &=\mathbb{E}\Big[\exp\Big\{{\rm i}\textbf{t}^{T}b_{i}^{(l)}\textbf{1}+{\rm i}\textbf{t}^{T}\frac{1}{\sqrt{n}}\sum_{j=1}^{n}\omega_{i,j}^{(l)}(\phi\bullet f^{(l-1)}_{j}(\textbf{X},n))\Big\}\Big|f^{(l-1)}_{1,\dots,n}\Big]\\ &=\mathbb{E}\Big[\exp\Big\{{\rm i}(\textbf{t}^{T}\textbf{1})b_{i}^{(l)}\Big\}\Big]\prod_{j=1}^{n}\mathbb{E}\Big[\exp\Big\{{\rm i}\omega_{i,j}^{(l)}\Big(\frac{1}{\sqrt{n}}\textbf{t}^{T}(\phi\bullet f^{(l-1)}_{j}(\textbf{X},n))\Big)\Big\}\Big|f^{(l-1)}_{1,\dots,n}\Big]\\ &=\exp\Big\{-\frac{1}{2}\sigma_{b}^{2}(\textbf{t}^{T}\textbf{1})^{2}\Big\}\prod_{j=1}^{n}\exp\Big\{-\frac{1}{2n}\sigma_{\omega}^{2}\Big(\textbf{t}^{T}(\phi\bullet f^{(l-1)}_{j}(\textbf{X},n))\Big)^{2}\Big\}\\ &=\exp\Big\{-\frac{1}{2}\Big[\sigma_{b}^{2}(\textbf{t}^{T}\textbf{1})^{2}+\frac{\sigma_{\omega}^{2}}{n}\sum_{j=1}^{n}\Big(\textbf{t}^{T}(\phi\bullet f^{(l-1)}_{j}(\textbf{X},n))\Big)^{2}\Big]\Big\}\\ &=\exp\Big\{-\frac{1}{2}\textbf{t}^{T}\Sigma(l,n)\textbf{t}\Big\},\end{split}

i.e.

f^{(l)}_{i}(\textbf{X},n)|f^{(l-1)}_{1,\dots,n}\stackrel{d}{=}N_{k}(\textbf{0},\Sigma(l,n)),

with $k\times k$ covariance matrix whose element in the $i$-th row and $j$-th column is

\Sigma(l,n)_{i,j}=\sigma_{b}^{2}+\frac{\sigma_{\omega}^{2}}{n}\big\langle\phi\bullet\textbf{F}^{(l-1)}_{i}(\textbf{X},n),\,\phi\bullet\textbf{F}^{(l-1)}_{j}(\textbf{X},n)\big\rangle_{\mathbb{R}^{n}}.

Observe that we can also determine the marginal distributions,

f^{(l)}_{r,i}(\textbf{X},n)|f^{(l-1)}_{1,\dots,n}\sim N(0,\Sigma(l,n)_{r,r}), \qquad (22)

where

\Sigma(l,n)_{r,r}=\sigma_{b}^{2}+\frac{\sigma_{\omega}^{2}}{n}\|\phi\bullet\textbf{F}^{(l-1)}_{r}(\textbf{X},n)\|_{\mathbb{R}^{n}}^{2}.

SM A.1: asymptotics for the $i$-th coordinate

First of all, from Definition 1, note that since $f^{(1)}_{i}(\textbf{X})$ does not depend on $n$, we consider the limit as $n\rightarrow\infty$ only for $f^{(l)}_{i}(\textbf{X},n)$ with $l\geq 2$. It follows directly from Equation (6) that, for every fixed $l$ and $n$, the sequence $(f^{(l)}_{i}(\textbf{X},n))_{i\geq 1}$ is exchangeable. In particular, let $p^{(l)}_{n}$ denote the de Finetti (random) probability measure of the exchangeable sequence $(f^{(l)}_{i}(\textbf{X},n))_{i\geq 1}$. That is, by the celebrated de Finetti representation theorem, conditionally on $p^{(l)}_{n}$ the $f^{(l)}_{i}(\textbf{X},n)$'s are iid as $p^{(l)}_{n}$. Now, consider the induction hypothesis that $p_{n}^{(l-1)}\stackrel{d}{\rightarrow}q^{(l-1)}$ as $n\rightarrow+\infty$, where $q^{(l-1)}=N_{k}(\textbf{0},\Sigma(l-1))$. To establish convergence in distribution we rely on Theorem 5.3 of Kallenberg (2002), known as Lévy's theorem, taking into account the point-wise convergence of the characteristic functions. Therefore we can write the following:

\begin{split}\varphi_{f^{(l)}_{i}(\textbf{X},n)}(\textbf{t})&=\mathbb{E}[e^{{\rm i}\textbf{t}^{T}f^{(l)}_{i}(\textbf{X},n)}]\\ &=\mathbb{E}[\mathbb{E}[e^{{\rm i}\textbf{t}^{T}f^{(l)}_{i}(\textbf{X},n)}|f^{(l-1)}_{1,\dots,n}]]\\ &=\mathbb{E}\Big[\exp\Big\{-\frac{1}{2}\textbf{t}^{T}\Sigma(l,n)\textbf{t}\Big\}\Big]\\ &=\mathbb{E}\Big[\exp\Big\{-\frac{1}{2}\Big[\sigma_{b}^{2}(\textbf{t}^{T}\textbf{1})^{2}+\frac{\sigma_{\omega}^{2}}{n}\sum_{j=1}^{n}\Big(\textbf{t}^{T}(\phi\bullet f^{(l-1)}_{j}(\textbf{X},n))\Big)^{2}\Big]\Big\}\Big]\\ &=e^{-\frac{1}{2}\sigma_{b}^{2}(\textbf{t}^{T}\textbf{1})^{2}}\,\mathbb{E}\Big[\exp\Big\{-\frac{\sigma_{\omega}^{2}}{2n}\sum_{j=1}^{n}\Big(\textbf{t}^{T}(\phi\bullet f^{(l-1)}_{j}(\textbf{X},n))\Big)^{2}\Big\}\Big]\\ &=e^{-\frac{1}{2}\sigma_{b}^{2}(\textbf{t}^{T}\textbf{1})^{2}}\,\mathbb{E}\Big[\mathbb{E}\Big[\exp\Big\{-\frac{\sigma_{\omega}^{2}}{2n}\sum_{j=1}^{n}\Big(\textbf{t}^{T}(\phi\bullet f^{(l-1)}_{j}(\textbf{X},n))\Big)^{2}\Big\}\Big|p^{(l-1)}_{n}\Big]\Big]\\ &=e^{-\frac{1}{2}\sigma_{b}^{2}(\textbf{t}^{T}\textbf{1})^{2}}\,\mathbb{E}\Big[\prod_{j=1}^{n}\mathbb{E}\Big[\exp\Big\{-\frac{\sigma_{\omega}^{2}}{2n}\Big(\textbf{t}^{T}(\phi\bullet f^{(l-1)}_{j}(\textbf{X},n))\Big)^{2}\Big\}\Big|p^{(l-1)}_{n}\Big]\Big]\\ &=e^{-\frac{1}{2}\sigma_{b}^{2}(\textbf{t}^{T}\textbf{1})^{2}}\,\mathbb{E}\Big[\prod_{j=1}^{n}\int\exp\Big\{-\frac{\sigma_{\omega}^{2}}{2n}\Big(\textbf{t}^{T}(\phi\bullet f)\Big)^{2}\Big\}p^{(l-1)}_{n}(\mathrm{d}f)\Big]\\ &=e^{-\frac{1}{2}\sigma_{b}^{2}(\textbf{t}^{T}\textbf{1})^{2}}\,\mathbb{E}\Big[\Big(\int\exp\Big\{-\frac{\sigma_{\omega}^{2}}{2n}\Big(\textbf{t}^{T}(\phi\bullet f)\Big)^{2}\Big\}p^{(l-1)}_{n}(\mathrm{d}f)\Big)^{n}\Big].\end{split}

Observe that the last integral is over the $k$ coordinates, i.e. $\mathrm{d}f=(\mathrm{d}f_{1},\dots,\mathrm{d}f_{k})$. Denote by $\stackrel{p}{\rightarrow}$ convergence in probability. We will prove the following lemmas:

  • L1) for each $l\geq 2$ and $s\geq 1$, $\mathbb{P}[p^{(l-1)}_{n}\in Y_{s}]=1$, where $Y_{s}=\{p:\int\|\phi\bullet f\|_{{\mathbb{R}}^{k}}^{s}p(\mathrm{d}f)<+\infty\}$;

  • L2) $\int(\textbf{t}^{T}(\phi\bullet f))^{2}p^{(l-1)}_{n}(\mathrm{d}f)\stackrel{p}{\rightarrow}\int(\textbf{t}^{T}(\phi\bullet f))^{2}q^{(l-1)}(\mathrm{d}f)$, as $n\rightarrow+\infty$;

  • L3) $\int(\textbf{t}^{T}(\phi\bullet f))^{2}\big[1-\exp\big\{-\theta\frac{\sigma_{\omega}^{2}}{2n}(\textbf{t}^{T}(\phi\bullet f))^{2}\big\}\big]p^{(l-1)}_{n}(\mathrm{d}f)\stackrel{p}{\rightarrow}0$, as $n\rightarrow+\infty$, for every $\theta\in(0,1)$.

SM A.1.1: proof of L1

In order to prove the three lemmas we will use the envelope condition (4) repeatedly, without explicit mention. For $l=2$ we have

𝔼[ϕfi(1)(X)ks]𝔼[(r=1k|ϕfr,i(1)(X)|2)s/2]𝔼[(r=1k|ϕfr,i(1)(X)|)s]𝔼[ks1r=1k|ϕfr,i(1)(X)|s]=ks1r=1k𝔼[|ϕfr,i(1)(X)|s]ks1r=1k𝔼[(a+b|fr,i(1)(X)|m)s](2k)s1r=1k(as+bs𝔼[|fr,i(1)(X)|sm])<+,\begin{split}{\mathbb{E}}[\|\phi\bullet f^{(1)}_{i}(\textbf{X})\|_{{\mathbb{R}}^{k}}^{s}]&\leq{\mathbb{E}}\Big{[}\Big{(}\sum_{r=1}^{k}|\phi\circ f^{(1)}_{r,i}(\textbf{X})|^{2}\Big{)}^{s/2}\Big{]}\\ &\leq{\mathbb{E}}\Big{[}\Big{(}\sum_{r=1}^{k}|\phi\circ f^{(1)}_{r,i}(\textbf{X})|\Big{)}^{s}\Big{]}\\ &\leq{\mathbb{E}}\Big{[}k^{s-1}\sum_{r=1}^{k}|\phi\circ f^{(1)}_{r,i}(\textbf{X})|^{s}\Big{]}\\ &=k^{s-1}\sum_{r=1}^{k}{\mathbb{E}}\Big{[}|\phi\circ f^{(1)}_{r,i}(\textbf{X})|^{s}\Big{]}\\ &\leq k^{s-1}\sum_{r=1}^{k}{\mathbb{E}}\Big{[}(a+b|f^{(1)}_{r,i}(\textbf{X})|^{m})^{s}\Big{]}\\ &\leq(2k)^{s-1}\sum_{r=1}^{k}\Big{(}a^{s}+b^{s}{\mathbb{E}}[|f^{(1)}_{r,i}(\textbf{X})|^{sm}]\Big{)}\\ &<+\infty,\end{split}

where we used that from (21), fr,i(1)(X)N(0,σb2+σω2x(r)I2)f^{(1)}_{r,i}(\textbf{X})\sim N(0,\sigma_{b}^{2}+\sigma_{\omega}^{2}\|x^{(r)}\|^{2}_{{\mathbb{R}}^{I}}) and then

𝔼[|fr,i(1)(X)|sm]=Msm(σb2+σω2x(r)I2)sm/2,{\mathbb{E}}[|f^{(1)}_{r,i}(\textbf{X})|^{sm}]=M_{sm}(\sigma_{b}^{2}+\sigma_{\omega}^{2}\|x^{(r)}\|^{2}_{{\mathbb{R}}^{I}})^{\nicefrac{{sm}}{{2}}},

where $M_{c}$ is the $c$-th moment of $|N(0,1)|$, which is finite for every $c>0$. Now assume that L1 is true at level $(l-2)$, i.e. for each $s\geq 1$ it holds that $\int\|\phi\bullet f\|_{{\mathbb{R}}^{k}}^{s}p^{(l-2)}_{n}(\mathrm{d}f)<+\infty$ uniformly in $n$; we prove that it is also true at level $(l-1)$.

\begin{split}{\mathbb{E}}[\|\phi\bullet f_{i}^{(l-1)}(\textbf{X},n)\|_{{\mathbb{R}}^{k}}^{s}|f_{1,\dots,n}^{(l-2)}]&\leq{\mathbb{E}}\Big[k^{s-1}\sum_{r=1}^{k}|\phi\circ f_{r,i}^{(l-1)}(\textbf{X},n)|^{s}\,\Big|\,f_{1,\dots,n}^{(l-2)}\Big]\\ &\leq(2k)^{s-1}\sum_{r=1}^{k}\Big(a^{s}+b^{s}{\mathbb{E}}\Big[|f_{r,i}^{(l-1)}(\textbf{X},n)|^{ms}\,\Big|\,f_{1,\dots,n}^{(l-2)}\Big]\Big)\\ &\leq D_{1}(a,k,s)+D_{2}(b,k,s)\sum_{r=1}^{k}{\mathbb{E}}\Big[|f_{r,i}^{(l-1)}(\textbf{X},n)|^{ms}\,\Big|\,f_{1,\dots,n}^{(l-2)}\Big].\end{split}

From (22) we get

𝔼[|fr,i(l1)(X,n)|ms|f1,,n(l2)]=Mms(σb2+σω2nϕFr(l2)(X,n)n2)sm/2Mms2sm1(σb2sm+σω2smnsmϕFr(l2)(X,n)n2sm)1/2.\begin{split}{\mathbb{E}}\Big{[}|f_{r,i}^{(l-1)}(\textbf{X},n)|^{ms}|f_{1,\dots,n}^{(l-2)}\Big{]}&=M_{ms}\Big{(}\sigma_{b}^{2}+\frac{\sigma_{\omega}^{2}}{n}\|\phi\bullet\textbf{F}^{(l-2)}_{r}(\textbf{X},n)\|_{{\mathbb{R}}^{n}}^{2}\Big{)}^{\nicefrac{{sm}}{{2}}}\\ &\leq M_{ms}2^{sm-1}\Big{(}\sigma_{b}^{2sm}+\frac{\sigma_{\omega}^{2sm}}{n^{sm}}\|\phi\bullet\textbf{F}^{(l-2)}_{r}(\textbf{X},n)\|_{{\mathbb{R}}^{n}}^{2sm}\Big{)}^{\nicefrac{{1}}{{2}}}.\\ \end{split}

Thus we have

𝔼[ϕfi(l1)(X,n)ks|pn(l2)]D1(a,k,s)+D3(b,k,s,m)r=1k(σb2sm+σω2smnsm𝔼[ϕFr(l2)(X,n)n2sm|pn(l2)])1/2,\begin{split}&{\mathbb{E}}[\|\phi\bullet f_{i}^{(l-1)}(\textbf{X},n)\|_{{\mathbb{R}}^{k}}^{s}|p_{n}^{(l-2)}]\\ &\leq D_{1}(a,k,s)+D_{3}(b,k,s,m)\sum_{r=1}^{k}\Big{(}\sigma_{b}^{2sm}+\frac{\sigma_{\omega}^{2sm}}{n^{sm}}{\mathbb{E}}\Big{[}\|\phi\bullet\textbf{F}^{(l-2)}_{r}(\textbf{X},n)\|_{{\mathbb{R}}^{n}}^{2sm}|p^{(l-2)}_{n}\Big{]}\Big{)}^{\nicefrac{{1}}{{2}}},\\ \end{split}

where

𝔼[ϕFr(l2)(X,n)n2sm|pn(l2)]𝔼[nsm1i=1n|ϕfr,i(l2)(X,n)|2sm|pn(l2)]D4(s,m)nsm|ϕ(fr)|2smpn(l2)(dfr)D4(s,m)nsmϕfk2smpn(l2)(df),\begin{split}{\mathbb{E}}\Big{[}\|\phi\bullet\textbf{F}^{(l-2)}_{r}(\textbf{X},n)\|_{{\mathbb{R}}^{n}}^{2sm}|p^{(l-2)}_{n}\Big{]}&\leq{\mathbb{E}}\Big{[}n^{sm-1}\sum_{i=1}^{n}|\phi\circ f_{r,i}^{(l-2)}(\textbf{X},n)|^{2sm}|p_{n}^{(l-2)}\Big{]}\\ &\leq D_{4}(s,m)n^{sm}\int|\phi(f_{r})|^{2sm}p_{n}^{(l-2)}(\mathrm{d}f_{r})\\ &\leq D_{4}(s,m)n^{sm}\int\|\phi\bullet f\|_{{\mathbb{R}}^{k}}^{2sm}p_{n}^{(l-2)}(\mathrm{d}f),\\ \end{split}

where the last inequality follows from $|\phi(f_{r})|^{2sm}\leq\big(\sum_{r=1}^{k}|\phi(f_{r})|^{2}\big)^{sm}$, whence

|ϕ(fr)|2smpn(l2)(dfr)(r=1k|ϕ(fr)|2)smpn(l2)(df1,,dfk)=ϕfk2smpn(l2)(df).\int|\phi(f_{r})|^{2sm}p_{n}^{(l-2)}(\mathrm{d}f_{r})\leq\int\Big{(}\sum_{r=1}^{k}|\phi(f_{r})|^{2}\Big{)}^{sm}p_{n}^{(l-2)}(\mathrm{d}f_{1},\dots,\mathrm{d}f_{k})=\int\|\phi\bullet f\|_{{\mathbb{R}}^{k}}^{2sm}p_{n}^{(l-2)}(\mathrm{d}f).

So, we proved that

𝔼[ϕfi(l1)(X,n)ks|pn(l2)]D1(a,k,s)+D3(b,k,s,m)r=1k(σb2sm+σω2smD4(s,m)ϕfk2smpn(l2)(df))1/2,\displaystyle\begin{split}&{\mathbb{E}}[\|\phi\bullet f_{i}^{(l-1)}(\textbf{X},n)\|_{{\mathbb{R}}^{k}}^{s}|p_{n}^{(l-2)}]\\ &\leq D_{1}(a,k,s)+D_{3}(b,k,s,m)\sum_{r=1}^{k}\Big{(}\sigma_{b}^{2sm}+\sigma_{\omega}^{2sm}D_{4}(s,m)\int\|\phi\bullet f\|_{{\mathbb{R}}^{k}}^{2sm}p_{n}^{(l-2)}(\mathrm{d}f)\Big{)}^{\nicefrac{{1}}{{2}}},\\ \end{split} (23)

which is finite, uniformly in $n$, by the induction hypothesis. To conclude, since, conditionally on $p_{n}^{(l-1)}$, the $f_{i}^{(l-1)}(\textbf{X},n)$'s are iid with common distribution $p_{n}^{(l-1)}$, we get

\begin{split}\int\|\phi\bullet f\|_{{\mathbb{R}}^{k}}^{s}p_{n}^{(l-1)}(\mathrm{d}f)&={\mathbb{E}}[\|\phi\bullet f_{i}^{(l-1)}(\textbf{X},n)\|_{{\mathbb{R}}^{k}}^{s}|p_{n}^{(l-1)}]\\ &={\mathbb{E}}[{\mathbb{E}}[\|\phi\bullet f_{i}^{(l-1)}(\textbf{X},n)\|_{{\mathbb{R}}^{k}}^{s}|p_{n}^{(l-2)}]|p_{n}^{(l-1)}]\\ &\leq\mathrm{const}(a,k,s,m)<\infty\end{split} (24)

which is bounded uniformly in nn since the inner expectation is bounded uniformly in nn by (23).

Remark: for each $s\geq 1$, $Y_{s}$ is a measurable set with respect to the weak topology. Indeed, for each $R\in{\mathbb{N}}$ define the map

TR:U,TR(p)=BR(0)ϕfksp(df)=kϕfks𝒳(BR(0))(f)p(df)T_{R}:U\rightarrow{\mathbb{R}},\quad T_{R}(p)=\int_{B_{R}(0)}\|\phi\bullet f\|_{{\mathbb{R}}^{k}}^{s}p(\mathrm{d}f)=\int_{{\mathbb{R}}^{k}}\|\phi\bullet f\|_{{\mathbb{R}}^{k}}^{s}\mathcal{X}_{(B_{R}(0))}(f)p(\mathrm{d}f)

where $U:=\{p:\ p\text{ is the distribution of a random variable }X:\Omega\rightarrow{\mathbb{R}}^{k}\}$ is endowed with the weak topology. Since $\cap_{R\in{\mathbb{N}}}T_{R}^{-1}(0,\infty)=Y_{s}$ and $(0,\infty)$ is open, it is sufficient to prove that each $T_{R}$ is continuous. Let $(p_{m})\subset U$ be such that $p_{m}$ converges to $p$ with respect to the weak topology; then, by Definition 3,

|TR(pm)TR(p)|=|ϕfks𝒳(BR(0))(f)pm(df)ϕfks𝒳(BR(0))(f)p(df)|0|T_{R}(p_{m})-T_{R}(p)|=\Big{|}\int\|\phi\bullet f\|_{{\mathbb{R}}^{k}}^{s}\mathcal{X}_{(B_{R}(0))}(f)p_{m}(\mathrm{d}f)-\int\|\phi\bullet f\|_{{\mathbb{R}}^{k}}^{s}\mathcal{X}_{(B_{R}(0))}(f)p(\mathrm{d}f)\Big{|}\rightarrow 0

because the function $f\mapsto\|\phi\bullet f\|_{{\mathbb{R}}^{k}}^{s}\mathcal{X}_{(B_{R}(0))}(f)$ is continuous (as a composition of the continuous functions $\phi$ and $\|\cdot\|^{s}$) and bounded on $B_{R}(0)$ by the Weierstrass theorem.

SM A.1.2: proof of L2

By the induction hypothesis, $p^{(l-1)}_{n}$ converges weakly to a limit $p^{(l-1)}$ with respect to the weak topology, and the limit is degenerate, in the sense that it is a.s. equal to the distribution $q^{(l-1)}$. Hence $p^{(l-1)}_{n}$ converges in probability to $p^{(l-1)}$, and for every subsequence $n'$ there exists a further subsequence $n''$ such that $p^{(l-1)}_{n''}$ converges a.s. to $p^{(l-1)}$. By the induction hypothesis, $p^{(l-1)}$ is absolutely continuous with respect to the Lebesgue measure. Since $\phi$ is a.s. continuous and the sequence $\big((\textbf{t}^{T}(\phi\bullet f))^{2}\big)_{n\geq 1}$ is uniformly integrable with respect to $p^{(l-1)}_{n}$ (by the Cauchy–Schwarz inequality and L1, $\int\big(\textbf{t}^{T}(\phi\bullet f)\big)^{2s}p^{(l-1)}_{n}(\mathrm{d}f)\leq\|\textbf{t}\|^{2s}_{{\mathbb{R}}^{k}}\int\|\phi\bullet f\|^{2s}_{{\mathbb{R}}^{k}}p^{(l-1)}_{n}(\mathrm{d}f)<\infty$, so the sequence is $L^{s}$-bounded for each $s\geq 1$ and hence uniformly integrable), we can write

(tT(ϕf))2pn′′(l1)(df)a.s.(tT(ϕf))2q(l1)(df).\int(\textbf{t}^{T}(\phi\bullet f))^{2}p^{(l-1)}_{n^{\prime\prime}}(\mathrm{d}f)\stackrel{{\scriptstyle a.s.}}{{\rightarrow}}\int(\textbf{t}^{T}(\phi\bullet f))^{2}q^{(l-1)}(\mathrm{d}f).

Thus, as n+n\to+\infty

(tT(ϕf))2pn(l1)(df)p(tT(ϕf))2q(l1)(df).\int(\textbf{t}^{T}(\phi\bullet f))^{2}p^{(l-1)}_{n}(\mathrm{d}f)\stackrel{{\scriptstyle p}}{{\rightarrow}}\int(\textbf{t}^{T}(\phi\bullet f))^{2}q^{(l-1)}(\mathrm{d}f).

SM A.1.3: proof of L3

Let $p>1$ and $q>1$ be such that $\frac{1}{p}+\frac{1}{q}=1$. By Hölder's inequality,

ϕfk2(1eσω22n(tT(ϕf))2)pn(l1)(df)(ϕfk2ppn(l1)(df))1/p((1eσω22n(tT(ϕf))2)qpn(l1)(df))1/q.\begin{split}&\int\|\phi\bullet f\|_{{\mathbb{R}}^{k}}^{2}(1-e^{-\frac{\sigma_{\omega}^{2}}{2n}(\textbf{t}^{T}(\phi\bullet f))^{2}})p^{(l-1)}_{n}(\mathrm{d}f)\\ &\leq\Big{(}\int\|\phi\bullet f\|_{{\mathbb{R}}^{k}}^{2p}p^{(l-1)}_{n}(\mathrm{d}f)\Big{)}^{\nicefrac{{1}}{{p}}}\Big{(}\int(1-e^{-\frac{\sigma_{\omega}^{2}}{2n}(\textbf{t}^{T}(\phi\bullet f))^{2}})^{q}p^{(l-1)}_{n}(\mathrm{d}f)\Big{)}^{\nicefrac{{1}}{{q}}}.\end{split}

Since $q\geq 1$ and $0\leq 1-e^{-y}<1$ for every $y\geq 0$, we have $(1-e^{-y})^{q}\leq 1-e^{-y}\leq y$. This implies

ϕfk2(1eσω22n(tT(ϕf))2)pn(l1)(df)(ϕfk2ppn(l1)(df))1/p(σω22n(tT(ϕf))2pn(l1)(df))1/q(ϕfk2ppn(l1)(df))1/p(tk2σω22nϕfk2pn(l1)(df))1/q0,\begin{split}&\int\|\phi\bullet f\|_{{\mathbb{R}}^{k}}^{2}(1-e^{-\frac{\sigma_{\omega}^{2}}{2n}(\textbf{t}^{T}(\phi\bullet f))^{2}})p^{(l-1)}_{n}(\mathrm{d}f)\\ &\leq\Big{(}\int\|\phi\bullet f\|_{{\mathbb{R}}^{k}}^{2p}p^{(l-1)}_{n}(\mathrm{d}f)\Big{)}^{\nicefrac{{1}}{{p}}}\Big{(}\int\frac{\sigma_{\omega}^{2}}{2n}(\textbf{t}^{T}(\phi\bullet f))^{2}p^{(l-1)}_{n}(\mathrm{d}f)\Big{)}^{\nicefrac{{1}}{{q}}}\\ &\leq\Big{(}\int\|\phi\bullet f\|_{{\mathbb{R}}^{k}}^{2p}p^{(l-1)}_{n}(\mathrm{d}f)\Big{)}^{\nicefrac{{1}}{{p}}}\Big{(}\|\textbf{t}\|_{{\mathbb{R}}^{k}}^{2}\frac{\sigma_{\omega}^{2}}{2n}\int\|\phi\bullet f\|_{{\mathbb{R}}^{k}}^{2}p^{(l-1)}_{n}(\mathrm{d}f)\Big{)}^{\nicefrac{{1}}{{q}}}\rightarrow 0,\end{split}

as $n\rightarrow+\infty$, since by L1 the two integrals are bounded uniformly in $n$. Moreover, for every $y>0$ and $\theta\in(0,1)$ we have $e^{-\theta y}\geq e^{-y}$, whence $0\leq 1-e^{-\theta y}\leq 1-e^{-y}\leq 1$, and therefore

0(tT(ϕf))2[1exp{θσω22n(tT(ϕf))2}]pn(l1)(df)tk2ϕfk2[1exp{σω22n(tT(ϕf))2}]pn(l1)(df)0,\begin{split}0&\leq\int\big{(}\textbf{t}^{T}(\phi\bullet f)\big{)}^{2}\Big{[}1-\exp\big{\{}-\theta\frac{\sigma_{\omega}^{2}}{2n}\big{(}\textbf{t}^{T}(\phi\bullet f)\big{)}^{2}\big{\}}\Big{]}p^{(l-1)}_{n}(\mathrm{d}f)\\ &\leq\|\textbf{t}\|_{{\mathbb{R}}^{k}}^{2}\int\|\phi\bullet f\|_{{\mathbb{R}}^{k}}^{2}\Big{[}1-\exp\big{\{}-\frac{\sigma_{\omega}^{2}}{2n}\big{(}\textbf{t}^{T}(\phi\bullet f)\big{)}^{2}\big{\}}\Big{]}p^{(l-1)}_{n}(\mathrm{d}f)\rightarrow 0,\end{split}

as n+n\rightarrow+\infty.

SM A.1.4: combination of the lemmas

We conclude in two steps.

Step 1: uniform integrability. Define y=yn(f)=σω22n(tT(ϕf))2y=y_{n}(f)=\frac{\sigma_{\omega}^{2}}{2n}(\textbf{t}^{T}(\phi\bullet f))^{2}. Thus

φfi(l)(X,n)(t)=e12σb2(tT1)2𝔼[(eyn(f)pn(l1)(df))n]=e12σb2(tT1)2𝔼[An]\begin{split}\varphi_{f^{(l)}_{i}({\textbf{X},n})}(\textbf{t})&=e^{-\frac{1}{2}\sigma_{b}^{2}(\textbf{t}^{T}\textbf{1})^{2}}{\mathbb{E}}\Big{[}\Big{(}\int e^{-y_{n}(f)}p^{(l-1)}_{n}(\mathrm{d}f)\Big{)}^{n}\Big{]}\\ &=e^{-\frac{1}{2}\sigma_{b}^{2}(\textbf{t}^{T}\textbf{1})^{2}}{\mathbb{E}}[A_{n}]\end{split}

where $A_{n}=\big(\int e^{-y_{n}(f)}p^{(l-1)}_{n}(\mathrm{d}f)\big)^{n}$. The sequence $(A_{n})_{n\geq 1}$ is uniformly integrable because it is $L^{s}$-bounded for all $s\geq 1$: indeed, since $0<e^{-y_{n}(f)}\leq 1$,

𝔼[Ans]𝔼[(pn(l1)(df))ns]=𝔼[1]=1{\mathbb{E}}[A_{n}^{s}]\leq{\mathbb{E}}\Big{[}\Big{(}\int p_{n}^{(l-1)}(\mathrm{d}f)\Big{)}^{ns}\Big{]}={\mathbb{E}}\big{[}1\big{]}=1

Step 2: convergence in probability. By the mean value theorem (Lagrange's theorem), for every $y>0$ there exists $\theta\in(0,1)$ such that $e^{-y}=1-y+y(1-e^{-\theta y})$. Then for every $n$ there exists a real value $\theta_{n}\in(0,1)$ such that the following equality holds:

An=(1σω22n[AnAn′′])n.\begin{split}A_{n}=\Big{(}1-\frac{\sigma_{\omega}^{2}}{2n}[A^{\prime}_{n}-A^{\prime\prime}_{n}]\Big{)}^{n}.\end{split}

where

{An=(tT(ϕf))2pn(l1)(df)An′′=(tT(ϕf))2[1exp{θnσω22n(tT(ϕf))2}]pn(l1)(df)\left\{\begin{array}[]{@{}l@{}}A^{\prime}_{n}=\int(\textbf{t}^{T}(\phi\bullet f))^{2}p^{(l-1)}_{n}(\mathrm{d}f)\\ A^{\prime\prime}_{n}=\int(\textbf{t}^{T}(\phi\bullet f))^{2}\Big{[}1-\exp\Big{\{}-\theta_{n}\frac{\sigma_{\omega}^{2}}{2n}(\textbf{t}^{T}(\phi\bullet f))^{2}\Big{\}}\Big{]}p^{(l-1)}_{n}(\mathrm{d}f)\end{array}\right.

Using the limit definition of the exponential function, $e^{x}=\lim_{n\to\infty}(1+\frac{x}{n})^{n}$, together with L2 and L3, we get

An𝑝exp{σω22(tT(ϕf))2q(l1)(df)}, as nA_{n}\overset{p}{\rightarrow}\exp\Big{\{}-\frac{\sigma_{\omega}^{2}}{2}\int(\textbf{t}^{T}(\phi\bullet f))^{2}q^{(l-1)}(\mathrm{d}f)\Big{\}},\quad\text{ as }n\rightarrow\infty

Conclusion: since convergence in probability together with uniform integrability implies convergence in mean, the two steps above yield

φfi(l)(X,n)(t)=e12σb2(tT1)2𝔼[An]exp{σb22(tT1)2σω22(tT(ϕf))2q(l1)(df)}=exp{12[σb2(tT1)2+σω2(tT(ϕf))2q(l1)(df)]}=exp{12tTΣ(l)t},\begin{split}\varphi_{f^{(l)}_{i}({\textbf{X},n})}(\textbf{t})=e^{-\frac{1}{2}\sigma_{b}^{2}(\textbf{t}^{T}\textbf{1})^{2}}{\mathbb{E}}[A_{n}]&\rightarrow\exp\Big{\{}-\frac{\sigma_{b}^{2}}{2}(\textbf{t}^{T}\textbf{1})^{2}-\frac{\sigma_{\omega}^{2}}{2}\int(\textbf{t}^{T}(\phi\bullet f))^{2}q^{(l-1)}(\mathrm{d}f)\Big{\}}\\ &=\exp\Big{\{}-\frac{1}{2}\Big{[}\sigma_{b}^{2}(\textbf{t}^{T}\textbf{1})^{2}+\sigma_{\omega}^{2}\int(\textbf{t}^{T}(\phi\bullet f))^{2}q^{(l-1)}(\mathrm{d}f)\Big{]}\Big{\}}\\ &=\exp\Big{\{}-\frac{1}{2}\textbf{t}^{T}\Sigma(l)\textbf{t}\Big{\}},\end{split}

where Σ(l)\Sigma(l) is a k×kk\times k matrix with elements

Σ(l)i,j=σb2+σω2ϕ(fi)ϕ(fj)q(l1)(df),\Sigma(l)_{i,j}=\sigma_{b}^{2}+\sigma_{\omega}^{2}\int\phi(f_{i})\phi(f_{j})q^{(l-1)}(\mathrm{d}f),

where $q^{(l-1)}=N_{k}(\textbf{0},\Sigma(l-1))$. Then the limiting distribution of $f^{(l)}_{i}(\textbf{X},n)$ is a $k$-dimensional Gaussian distribution with mean $\textbf{0}$ and covariance matrix $\Sigma(l)$, i.e., as $n\rightarrow+\infty$,

fi(l)(X,n)dNk(0,Σ(l)).f^{(l)}_{i}(\textbf{X},n)\stackrel{{\scriptstyle d}}{{\rightarrow}}N_{k}(\textbf{0},\Sigma(l)).
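The recursion above lends itself to a simple numerical sanity check, which we sketch below as an illustration only (it is not part of the proof). We assume a concrete Lipschitz nonlinearity $\phi=\tanh$ and arbitrary constants: the empirical covariance of $f^{(l)}_{i}(\textbf{X},n)$ over independent network draws should approach $\Sigma(l)$ computed from the recursion as $n$ grows.

```python
import numpy as np

# Monte Carlo sanity check (illustrative only, not part of the proof):
# compare the empirical covariance of f_1^{(L)}(X, n) over many network
# draws with the limiting recursion Sigma(L), assuming phi = tanh.
rng = np.random.default_rng(0)
sigma_b2, sigma_w2 = 0.5, 1.0
X = np.array([[0.3, -1.0], [1.2, 0.4]])        # k = 2 inputs in R^I, I = 2
k, I = X.shape
L, n, reps = 3, 500, 2000

def sigma_limit(L, mc=200000):
    S = sigma_b2 + sigma_w2 * X @ X.T          # Sigma(1)
    for _ in range(2, L + 1):                  # Sigma(l) from q^{(l-1)} = N_k(0, Sigma(l-1))
        Z = rng.multivariate_normal(np.zeros(k), S, size=mc)
        S = sigma_b2 + sigma_w2 * (np.tanh(Z).T @ np.tanh(Z)) / mc
    return S

def unit_sample():
    W = rng.normal(0, np.sqrt(sigma_w2), (n, I))
    b = rng.normal(0, np.sqrt(sigma_b2), (n, 1))
    F = W @ X.T + b                            # layer 1, all n units at the k inputs
    for _ in range(2, L + 1):
        W = rng.normal(0, np.sqrt(sigma_w2), (n, n))
        b = rng.normal(0, np.sqrt(sigma_b2), (n, 1))
        F = W @ np.tanh(F) / np.sqrt(n) + b    # re-scaled weights, as in (6)
    return F[0]                                # unit i = 1

samples = np.array([unit_sample() for _ in range(reps)])
print("empirical cov:\n", samples.T @ samples / reps)   # mean is 0 by construction
print("recursion Sigma(L):\n", sigma_limit(L))
```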

SM B.1

Fix $i\geq 1$, $l\geq 1$ and $n\in{\mathbb{N}}$. We prove that there exists a random variable $H^{(l)}_{i}(n)$ such that

|fi(l)(x,n)fi(l)(y,n)|Hi(l)(n)xyI,x,yI,a.s.|f^{(l)}_{i}(x,n)-f^{(l)}_{i}(y,n)|\leq H^{(l)}_{i}(n)\|x-y\|_{{\mathbb{R}}^{I}},\quad x,y\in{\mathbb{R}}^{I},\mathbb{P}-a.s.

i.e., for fixed $\xi\in\Omega$ the function $x\mapsto f^{(l)}_{i}(x,n)(\xi)$ is Lipschitz. We proceed by induction on the layers. Fix $x,y\in{\mathbb{R}}^{I}$. For the first layer, by (5) we get

|fi(1)(x,n)(ξ)fi(1)(y,n)(ξ)|=|j=1Iωi,j(1)(ξ)xj+bi(1)(ξ)(j=1Iωi,j(1)(ξ)yj+bi(1)(ξ))|=|j=1Iωi,j(1)(ξ)xjj=1Iωi,j(1)(ξ)yj|=|j=1Iωi,j(1)(ξ)(xjyj)|j=1I|ωi,j(1)(ξ)||xjyj|xyIj=1I|ωi,j(1)(ξ)|\begin{split}|f^{(1)}_{i}(x,n)(\xi)-f^{(1)}_{i}(y,n)(\xi)|&=\Big{|}\sum_{j=1}^{I}\omega_{i,j}^{(1)}(\xi)x_{j}+b_{i}^{(1)}(\xi)-\big{(}\sum_{j=1}^{I}\omega_{i,j}^{(1)}(\xi)y_{j}+b_{i}^{(1)}(\xi)\big{)}\Big{|}\\ &=\Big{|}\sum_{j=1}^{I}\omega_{i,j}^{(1)}(\xi)x_{j}-\sum_{j=1}^{I}\omega_{i,j}^{(1)}(\xi)y_{j}\Big{|}\\ &=\Big{|}\sum_{j=1}^{I}\omega_{i,j}^{(1)}(\xi)(x_{j}-y_{j})\Big{|}\\ &\leq\sum_{j=1}^{I}\Big{|}\omega_{i,j}^{(1)}(\xi)\Big{|}|x_{j}-y_{j}|\\ &\leq\|x-y\|_{{\mathbb{R}}^{I}}\sum_{j=1}^{I}\Big{|}\omega_{i,j}^{(1)}(\xi)\Big{|}\\ \end{split}

where we used that |xjyj|xyI|x_{j}-y_{j}|\leq\|x-y\|_{{\mathbb{R}}^{I}}. Set Hi(1)(n)=j=1I|ωi,j(1)|H^{(1)}_{i}(n)=\sum_{j=1}^{I}\big{|}\omega_{i,j}^{(1)}\big{|}. Suppose by induction hypothesis that for each j1j\geq 1 there exists a random variable Hj(l1)(n)H_{j}^{(l-1)}(n) such that |fj(l1)(x,n)(ξ)fj(l1)(y,n)(ξ)|Hj(l1)(n)(ξ)xyI|f^{(l-1)}_{j}(x,n)(\xi)-f^{(l-1)}_{j}(y,n)(\xi)|\leq H_{j}^{(l-1)}(n)(\xi)\|x-y\|_{{\mathbb{R}}^{I}}, and let LϕL_{\phi} be the Lipschitz constant of ϕ\phi. Then by (6) we get

|fi(l)(x,n)(ξ)fi(l)(y,n)(ξ)|=|1nj=1nωi,j(l)(ξ)ϕ(fj(l1)(x,n))+bi(l)(ξ)[1nj=1nωi,j(l)(ξ)ϕ(fj(l1)(y,n))+bi(l)(ξ)]|=|1nj=1nωi,j(l)(ξ)ϕ(fj(l1)(x,n))1nj=1nωi,j(l)(ξ)ϕ(fj(l1)(y,n))|1nj=1n|ωi,j(l)(ξ)||ϕ(fj(l1)(x,n))ϕ(fj(l1)(y,n))|Lϕnj=1n|ωi,j(l)(ξ)||fj(l1)(x,n)fj(l1)(y,n)|Lϕnj=1n|ωi,j(l)(ξ)|Hj(l1)(n)(ξ)xyIxyILϕnj=1n|ωi,j(l)(ξ)|Hj(l1)(n)(ξ)\begin{split}|f^{(l)}_{i}(x,n)&(\xi)-f^{(l)}_{i}(y,n)(\xi)|\\ &=\Big{|}\frac{1}{\sqrt{n}}\sum_{j=1}^{n}\omega_{i,j}^{(l)}(\xi)\phi(f^{(l-1)}_{j}(x,n))+b_{i}^{(l)}(\xi)-\Big{[}\frac{1}{\sqrt{n}}\sum_{j=1}^{n}\omega_{i,j}^{(l)}(\xi)\phi(f^{(l-1)}_{j}(y,n))+b_{i}^{(l)}(\xi)\Big{]}\Big{|}\\ &=\Big{|}\frac{1}{\sqrt{n}}\sum_{j=1}^{n}\omega_{i,j}^{(l)}(\xi)\phi(f^{(l-1)}_{j}(x,n))-\frac{1}{\sqrt{n}}\sum_{j=1}^{n}\omega_{i,j}^{(l)}(\xi)\phi(f^{(l-1)}_{j}(y,n))\Big{|}\\ &\leq\frac{1}{\sqrt{n}}\sum_{j=1}^{n}\Big{|}\omega_{i,j}^{(l)}(\xi)\Big{|}\big{|}\phi(f^{(l-1)}_{j}(x,n))-\phi(f^{(l-1)}_{j}(y,n))\big{|}\\ &\leq\frac{L_{\phi}}{\sqrt{n}}\sum_{j=1}^{n}\Big{|}\omega_{i,j}^{(l)}(\xi)\Big{|}\big{|}f^{(l-1)}_{j}(x,n)-f^{(l-1)}_{j}(y,n)\big{|}\\ &\leq\frac{L_{\phi}}{\sqrt{n}}\sum_{j=1}^{n}\Big{|}\omega_{i,j}^{(l)}(\xi)\Big{|}H^{(l-1)}_{j}(n)(\xi)\|x-y\|_{{\mathbb{R}}^{I}}\\ &\leq\|x-y\|_{{\mathbb{R}}^{I}}\frac{L_{\phi}}{\sqrt{n}}\sum_{j=1}^{n}\Big{|}\omega_{i,j}^{(l)}(\xi)\Big{|}H^{(l-1)}_{j}(n)(\xi)\\ \end{split}

Set

Hi(l)(n)=Lϕnj=1n|ωi,j(l)|Hj(l1)(n)\displaystyle H^{(l)}_{i}(n)=\frac{L_{\phi}}{\sqrt{n}}\sum_{j=1}^{n}\Big{|}\omega_{i,j}^{(l)}\Big{|}H^{(l-1)}_{j}(n)

Thus we have proved that, for fixed $l\geq 1$ and $i\geq 1$ and for each $n\in{\mathbb{N}}$,

[{ξΩ:|fi(l)(x,n)(ξ)fi(l)(y,n)(ξ)|Hi(l)(n)(ξ)xyI}]=1.\mathbb{P}\Big{[}\Big{\{}\xi\in\Omega:|f^{(l)}_{i}(x,n)(\xi)-f^{(l)}_{i}(y,n)(\xi)|\leq H^{(l)}_{i}(n)(\xi)\|x-y\|_{{\mathbb{R}}^{I}}\Big{\}}\Big{]}=1.

Thus each process $f^{(l)}_{i}(1),f^{(l)}_{i}(2),\dots$ is $\mathbb{P}$-a.s. Lipschitz; in particular, it is a $\mathbb{P}$-a.s. continuous process, i.e. it belongs to $C({\mathbb{R}}^{I};{\mathbb{R}})$. In order to prove the continuity of $f^{(l)}_{i}$ we cannot simply take the limit as $n\rightarrow+\infty$ in (12), because the left-hand side converges to $|f_{i}^{(l)}(x)-f^{(l)}_{i}(y)|$ only in distribution and not $\mathbb{P}$-a.s.; instead, we prove continuity by applying Proposition 2, as shown in SM B.2.
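The recursion for $H^{(l)}_{i}(n)$ is easy to evaluate numerically for one draw of the weights; the sketch below (our own illustration, assuming $\phi=\tanh$, so $L_{\phi}=1$) does exactly this. Note that the resulting bound typically grows with $n$ (roughly by a factor of order $\sqrt{n}$ per layer), consistently with the remark in SM B.3 that bounding ${\mathbb{E}}[H^{(l)}_{i}(n)]$ uniformly in $n$ is difficult.

```python
import numpy as np

# Illustrative sketch (our own, assuming phi = tanh, so L_phi = 1):
# evaluate the random Lipschitz bounds H_i^{(l)}(n) of SM B.1 for one
# draw of the weights, via H^{(1)}_i = sum_j |w^{(1)}_{ij}| and
# H^{(l)} = (L_phi / sqrt(n)) * |W^{(l)}| H^{(l-1)}.
rng = np.random.default_rng(1)
I, n, L, sigma_w, L_phi = 3, 1000, 4, 1.0, 1.0

H = np.abs(rng.normal(0, sigma_w, (n, I))).sum(axis=1)   # H_i^{(1)}(n)
for l in range(2, L + 1):
    W = rng.normal(0, sigma_w, (n, n))
    H = (L_phi / np.sqrt(n)) * np.abs(W) @ H             # recursion of SM B.1
print("Lipschitz bound for unit 1 at layer", L, ":", H[0])
```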

SM B.2

Fix i1i\geq 1, l1l\geq 1. We show the continuity of the limiting process fi(l)f^{(l)}_{i} by applying Proposition 2. Take two inputs x,yIx,y\in{\mathbb{R}}^{I}. From (7) we know that [fi(l)(x),fi(l)(y)]N2(0,Σ(l))[f^{(l)}_{i}(x),f^{(l)}_{i}(y)]\sim N_{2}(\textbf{0},\Sigma(l)) where

Σ(1)=σb2[1111]+σω2[xI2x,yIx,yIyI2],Σ(l)=σb2[1111]+σω2[|ϕ(u)|2ϕ(u)ϕ(v)ϕ(u)ϕ(v)|ϕ(v)|2]q(l1)(du,dv),\begin{split}&\Sigma(1)=\sigma_{b}^{2}\begin{bmatrix}1&1\\ 1&1\\ \end{bmatrix}+\sigma_{\omega}^{2}\begin{bmatrix}\|x\|_{{\mathbb{R}}^{I}}^{2}&\langle x,y\rangle_{{\mathbb{R}}^{I}}\\ \langle x,y\rangle_{{\mathbb{R}}^{I}}&\|y\|_{{\mathbb{R}}^{I}}^{2}\\ \end{bmatrix},\\ &\Sigma(l)=\sigma_{b}^{2}\begin{bmatrix}1&1\\ 1&1\\ \end{bmatrix}+\sigma_{\omega}^{2}\int\begin{bmatrix}|\phi(u)|^{2}&\phi(u)\phi(v)\\ \phi(u)\phi(v)&|\phi(v)|^{2}\\ \end{bmatrix}q^{(l-1)}(\mathrm{d}u,\mathrm{d}v),\end{split}

where q(l1)=N2(0,Σ(l1))q^{(l-1)}=N_{2}(\textbf{0},\Sigma(l-1)). We want to find two values α>0\alpha>0 and β>0\beta>0, and a constant H(l)>0H^{(l)}>0 such that

𝔼[|fi(l)(y)fi(l)(x)|α]H(l)yxII+β.{\mathbb{E}}\Big{[}|f^{(l)}_{i}(y)-f^{(l)}_{i}(x)|^{\alpha}\Big{]}\leq H^{(l)}\|y-x\|_{{\mathbb{R}}^{I}}^{I+\beta}.

Defining aT=[1,1]\textbf{a}^{T}=[1,-1] we have fi(l)(y)fi(l)(x)N(aT0,aTΣ(l)a)f^{(l)}_{i}(y)-f^{(l)}_{i}(x)\sim N(\textbf{a}^{T}\textbf{0},\textbf{a}^{T}\Sigma(l)\textbf{a}). Consider α=2θ\alpha=2\theta with θ\theta integer. Thus

|fi(l)(y)fi(l)(x)|2θ|aTΣ(l)aN(0,1)|2θ(aTΣ(l)a)θ|N(0,1)|2θ.|f^{(l)}_{i}(y)-f^{(l)}_{i}(x)|^{2\theta}\sim|\sqrt{\textbf{a}^{T}\Sigma(l)\textbf{a}}N(0,1)|^{2\theta}\sim(\textbf{a}^{T}\Sigma(l)\textbf{a})^{\theta}|N(0,1)|^{2\theta}.

We proceed by induction over the layers. For l=1l=1,

𝔼[|fi(1)(y)fi(1)(x)|2θ]=Cθ(aTΣ(1)a)θ=Cθ(σω2yI22σω2y,xI+σω2xI2)θ=Cθ(σω2)θ(yI22y,xI+xI2)θ=Cθ(σω2)θyxI2θ,\begin{split}{\mathbb{E}}\Big{[}|f^{(1)}_{i}(y)-f^{(1)}_{i}(x)|^{2\theta}\Big{]}&=C_{\theta}(\textbf{a}^{T}\Sigma(1)\textbf{a})^{\theta}\\ &=C_{\theta}(\sigma_{\omega}^{2}\|y\|_{{\mathbb{R}}^{I}}^{2}-2\sigma_{\omega}^{2}\langle y,x\rangle_{{\mathbb{R}}^{I}}+\sigma_{\omega}^{2}\|x\|_{{\mathbb{R}}^{I}}^{2})^{\theta}\\ &=C_{\theta}(\sigma_{\omega}^{2})^{\theta}(\|y\|_{{\mathbb{R}}^{I}}^{2}-2\langle y,x\rangle_{{\mathbb{R}}^{I}}+\|x\|_{{\mathbb{R}}^{I}}^{2})^{\theta}\\ &=C_{\theta}(\sigma_{\omega}^{2})^{\theta}\|y-x\|_{{\mathbb{R}}^{I}}^{2\theta},\end{split}

where $C_{\theta}$ is the $\theta$-th moment of the chi-squared distribution with one degree of freedom. Suppose now, by the induction hypothesis, that

|uv|2θq(l1)(du,dv)H(l1)yxI2θ.\int|u-v|^{2\theta}q^{(l-1)}(\mathrm{d}u,\mathrm{d}v)\leq H^{(l-1)}\|y-x\|^{2\theta}_{{\mathbb{R}}^{I}}.

Then, using that $\phi$ is Lipschitz by hypothesis,

|fi(l)(y)fi(l)(x)|2θ|N(0,1)|2θ(aTΣ(l)a)θ=|N(0,1)|2θ(σω2[|ϕ(u)|22ϕ(u)ϕ(v)+|ϕ(v)|2]q(l1)(du,dv))θ=|N(0,1)|2θ(σω2|ϕ(u)ϕ(v)|2q(l1)(du,dv))θ|N(0,1)|2θ(σω2Lϕ2)θ(|uv|2q(l1)(du,dv))θ|N(0,1)|2θ(σω2Lϕ2)θ|uv|2θq(l1)(du,dv)|N(0,1)|2θ(σω2Lϕ2)θH(l1)yxI2θ.\begin{split}|f^{(l)}_{i}(y)-f^{(l)}_{i}(x)|^{2\theta}&\sim|N(0,1)|^{2\theta}(\textbf{a}^{T}\Sigma(l)\textbf{a})^{\theta}\\ &=|N(0,1)|^{2\theta}\Big{(}\sigma^{2}_{\omega}\int[|\phi(u)|^{2}-2\phi(u)\phi(v)+|\phi(v)|^{2}]q^{(l-1)}(\mathrm{d}u,\mathrm{d}v)\Big{)}^{\theta}\\ &=|N(0,1)|^{2\theta}\Big{(}\sigma^{2}_{\omega}\int|\phi(u)-\phi(v)|^{2}q^{(l-1)}(\mathrm{d}u,\mathrm{d}v)\Big{)}^{\theta}\\ &\leq|N(0,1)|^{2\theta}(\sigma^{2}_{\omega}L_{\phi}^{2})^{\theta}\Big{(}\int|u-v|^{2}q^{(l-1)}(\mathrm{d}u,\mathrm{d}v)\Big{)}^{\theta}\\ &\leq|N(0,1)|^{2\theta}(\sigma^{2}_{\omega}L_{\phi}^{2})^{\theta}\int|u-v|^{2\theta}q^{(l-1)}(\mathrm{d}u,\mathrm{d}v)\\ &\leq|N(0,1)|^{2\theta}(\sigma^{2}_{\omega}L_{\phi}^{2})^{\theta}H^{(l-1)}\|y-x\|_{{\mathbb{R}}^{I}}^{2\theta}.\end{split}

Thus we conclude

𝔼[|fi(l)(y)fi(l)(x)|2θ]H(l)yxI2θ,\begin{split}{\mathbb{E}}\Big{[}|f^{(l)}_{i}(y)-f^{(l)}_{i}(x)|^{2\theta}\Big{]}\leq H^{(l)}\|y-x\|_{{\mathbb{R}}^{I}}^{2\theta},\end{split}

where the constant H(l)H^{(l)} can be explicitly derived by solving the following system

{H(1)=Cθ(σω2)θH(l)=Cθ(σω2Lϕ2)θH(l1).\left\{\begin{array}[]{@{}l@{}}H^{(1)}=C_{\theta}(\sigma^{2}_{\omega})^{\theta}\\ H^{(l)}=C_{\theta}(\sigma^{2}_{\omega}L_{\phi}^{2})^{\theta}H^{(l-1)}.\end{array}\right.

Solving the recursion gives $H^{(l)}=C_{\theta}^{l}(\sigma^{2}_{\omega})^{l\theta}(L_{\phi}^{2})^{(l-1)\theta}$; notice that this quantity does not depend on $i$. Therefore, by Proposition 2 with $\alpha=2\theta$ and $\beta=2\theta-I$ (since $\beta$ must be positive we take $\theta>I/2$), for every $\theta>I/2$ there exists a continuous version $f^{(l)(\theta)}_{i}$ of the process $f^{(l)}_{i}$ with $\mathbb{P}$-a.s. locally $\gamma$-Hölder paths for every $0<\gamma<1-\frac{I}{2\theta}$.

  • Thus $f^{(l)(\theta)}_{i}$ and $f^{(l)}_{i}$ are indistinguishable (they have the same trajectories), i.e. there exists $\Omega^{(\theta)}\subset\Omega$ with $\mathbb{P}(\Omega^{(\theta)})=1$ such that for each $\omega\in\Omega^{(\theta)}$, $x\mapsto f^{(l)}_{i}(x)(\omega)$ is locally $\gamma$-Hölder for each $0<\gamma<1-\frac{I}{2\theta}$.

  • Define Ω=θ>I/2Ω(θ)\Omega^{\star}=\bigcap_{\theta>I/2}\Omega^{(\theta)}, then for each 0<δ0<10<\delta_{0}<1 there exists θ0\theta_{0} such that δ0<1I2θ0<1\delta_{0}<1-\frac{I}{2\theta_{0}}<1, thus for each ωΩΩ(θ0)\omega\in\Omega^{\star}\subset\Omega^{(\theta_{0})}, the trajectory xfi(l)(x)(ω)x\mapsto f^{(l)}_{i}(x)(\omega) is locally δ0\delta_{0}-Hölder continuous.

By Proposition 2 we can conclude that fi(l)f^{(l)}_{i} has a continuous version and the latter is \mathbb{P}-a.s locally γ\gamma-Hölder continuous for every 0<γ<10<\gamma<1.
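As a concrete illustration of the exponent bookkeeping (the numbers below are our own, chosen only for the example): with input dimension $I=2$ and $\theta=10>I/2$ we get

\alpha=2\theta=20,\qquad\beta=2\theta-I=18>0,\qquad 1-\frac{I}{2\theta}=0.9,

so the version $f^{(l)(10)}_{i}$ has locally $\gamma$-Hölder paths for every $0<\gamma<0.9$; letting $\theta\to\infty$ along the integers exhausts every $0<\gamma<1$, as claimed.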

SM B.3

Fix $i\geq 1$, $l\geq 1$. We apply Proposition 3 to show the uniform tightness of the sequence $(f^{(l)}_{i}(n))_{n\geq 1}$ in $C({\mathbb{R}}^{I};{\mathbb{R}})$. By Lemma 2, $f^{(l)}_{i}(1),f^{(l)}_{i}(2),\dots$ are random elements in $C({\mathbb{R}}^{I};{\mathbb{R}})$. First we show that the sequence $(f^{(l)}_{i}(0_{{\mathbb{R}}^{I}},n))_{n\geq 1}$ is uniformly tight in ${\mathbb{R}}$. We use the following statement from (Dudley, 2002, Theorem 11.5.3).

Proposition 4.

Let (C,ρ)(C,\rho) be a metric space and suppose f(n)dff(n)\stackrel{{\scriptstyle d}}{{\rightarrow}}f where f(n)f(n) is tight for all nn. Then f(n)n1f(n)_{n\geq 1} is uniformly tight.

Since $({\mathbb{R}},|\cdot|)$ is Polish, every probability measure on it is tight; hence $f_{i}^{(l)}(0_{{\mathbb{R}}^{I}},n)$ is tight in ${\mathbb{R}}$ for every $n$. Moreover, by Lemma 1, $f_{i}^{(l)}(0_{{\mathbb{R}}^{I}},n)\stackrel{d}{\rightarrow}f_{i}^{(l)}(0_{{\mathbb{R}}^{I}})$, so by Proposition 4 the sequence $(f_{i}^{(l)}(0_{{\mathbb{R}}^{I}},n))_{n\geq 1}$ is uniformly tight in ${\mathbb{R}}$. In order to apply Proposition 3 it remains to show that there exist two values $\alpha>0$ and $\beta>0$, and a constant $H^{(l)}>0$, such that

𝔼[|fi(l)(y,n)fi(l)(x,n)|α]H(l)yxII+β,x,yI,n{\mathbb{E}}\Big{[}|f_{i}^{(l)}(y,n)-f_{i}^{(l)}(x,n)|^{\alpha}\Big{]}\leq H^{(l)}\|y-x\|_{{\mathbb{R}}^{I}}^{I+\beta},\quad x,y\in{\mathbb{R}}^{I},n\in{\mathbb{N}}

uniformly in $n$. A first idea would be to try to bound (uniformly in $n$) the expected value of the random variable $H^{(l)}_{i}(n)$ obtained in SM B.1, but this turns out to be very difficult, so we proceed differently. Take two points $x,y\in{\mathbb{R}}^{I}$. From (11) we know that $f_{i}^{(l)}(y,n)|f^{(l-1)}_{1,\dots,n}\sim N(0,\sigma^{2}_{y}(l,n))$ and $f_{i}^{(l)}(x,n)|f^{(l-1)}_{1,\dots,n}\sim N(0,\sigma^{2}_{x}(l,n))$, with joint distribution $N_{2}(\textbf{0},\Sigma(l,n))$, where

\begin{split}&\Sigma(1)=\begin{bmatrix}\sigma^{2}_{x}(1)&\Sigma(1)_{x,y}\\ \Sigma(1)_{x,y}&\sigma^{2}_{y}(1)\end{bmatrix},\\ &\Sigma(l,n)=\begin{bmatrix}\sigma^{2}_{x}(l,n)&\Sigma(l,n)_{x,y}\\ \Sigma(l,n)_{x,y}&\sigma^{2}_{y}(l,n)\end{bmatrix},\end{split}

with,

σx2(1)=σb2+σω2xI2,σy2(1)=σb2+σω2yI2,Σ(1)x,y=σb2+σω2x,yI,σx2(l,n)=σb2+σω2nj=1n|ϕfj(l1)(x,n)|2,σy2(l,n)=σb2+σω2nj=1n|ϕfj(l1)(y,n)|2,Σ(l,n)x,y=σb2+σω2nj=1nϕ(fj(l1)(x,n))ϕ(fj(l1)(y,n))\begin{split}&\sigma^{2}_{x}(1)=\sigma_{b}^{2}+\sigma_{\omega}^{2}\|x\|_{{\mathbb{R}}^{I}}^{2},\\ &\sigma^{2}_{y}(1)=\sigma_{b}^{2}+\sigma_{\omega}^{2}\|y\|_{{\mathbb{R}}^{I}}^{2},\\ &\Sigma(1)_{x,y}=\sigma_{b}^{2}+\sigma_{\omega}^{2}\langle x,y\rangle_{{\mathbb{R}}^{I}},\\ &\sigma^{2}_{x}(l,n)=\sigma_{b}^{2}+\frac{\sigma_{\omega}^{2}}{n}\sum_{j=1}^{n}|\phi\circ f_{j}^{(l-1)}(x,n)|^{2},\\ &\sigma^{2}_{y}(l,n)=\sigma_{b}^{2}+\frac{\sigma_{\omega}^{2}}{n}\sum_{j=1}^{n}|\phi\circ f_{j}^{(l-1)}(y,n)|^{2},\\ &\Sigma(l,n)_{x,y}=\sigma_{b}^{2}+\frac{\sigma_{\omega}^{2}}{n}\sum_{j=1}^{n}\phi(f^{(l-1)}_{j}(x,n))\phi(f^{(l-1)}_{j}(y,n))\end{split}

Defining $\textbf{a}^{T}=[1,-1]$ we have that, conditionally on $f^{(l-1)}_{1,\dots,n}$, the difference $f_{i}^{(l)}(y,n)-f_{i}^{(l)}(x,n)$ is distributed as $N(\textbf{a}^{T}\textbf{0},\textbf{a}^{T}\Sigma(l,n)\textbf{a})$, where

aTΣ(l,n)a=σy2(l,n)2Σ(l,n)x,y+σx2(l,n).\textbf{a}^{T}\Sigma(l,n)\textbf{a}=\sigma_{y}^{2}(l,n)-2\Sigma(l,n)_{x,y}+\sigma_{x}^{2}(l,n).

Consider α=2θ\alpha=2\theta with θ\theta integer. Thus

\Big|f_{i}^{(l)}(y,n)-f_{i}^{(l)}(x,n)\Big|^{2\theta}\,\Big|\,f^{(l-1)}_{1,\dots,n}\sim\big|\sqrt{\textbf{a}^{T}\Sigma(l,n)\textbf{a}}\,N(0,1)\big|^{2\theta}\sim(\textbf{a}^{T}\Sigma(l,n)\textbf{a})^{\theta}|N(0,1)|^{2\theta}.

We start with the case $l=1$:

𝔼[|fi(1)(y,n)fi(1)(x,n)|2θ]=Cθ(aTΣ(1)a)θ=Cθ(σω2yI22σω2y,xI+σω2xI2)θ=Cθ(σω2)θ(yI22y,xI+xI2)θ=Cθ(σω2)θyxI2θ,\begin{split}{\mathbb{E}}\Big{[}|f_{i}^{(1)}(y,n)-f_{i}^{(1)}(x,n)|^{2\theta}\Big{]}&=C_{\theta}(\textbf{a}^{T}\Sigma(1)\textbf{a})^{\theta}\\ &=C_{\theta}(\sigma_{\omega}^{2}\|y\|_{{\mathbb{R}}^{I}}^{2}-2\sigma_{\omega}^{2}\langle y,x\rangle_{{\mathbb{R}}^{I}}+\sigma_{\omega}^{2}\|x\|_{{\mathbb{R}}^{I}}^{2})^{\theta}\\ &=C_{\theta}(\sigma_{\omega}^{2})^{\theta}(\|y\|_{{\mathbb{R}}^{I}}^{2}-2\langle y,x\rangle_{{\mathbb{R}}^{I}}+\|x\|_{{\mathbb{R}}^{I}}^{2})^{\theta}\\ &=C_{\theta}(\sigma_{\omega}^{2})^{\theta}\|y-x\|_{{\mathbb{R}}^{I}}^{2\theta},\end{split}

where $C_{\theta}$ is the $\theta$-th moment of the chi-squared distribution with one degree of freedom. Set $H^{(1)}=C_{\theta}(\sigma_{\omega}^{2})^{\theta}$. Suppose, by the induction hypothesis, that for every $j\geq 1$

𝔼[|fj(l1)(y,n)fj(l1)(x,n)|2θ]H(l1)yxI2θ.{\mathbb{E}}\Big{[}|f^{(l-1)}_{j}(y,n)-f^{(l-1)}_{j}(x,n)|^{2\theta}\Big{]}\leq H^{(l-1)}\|y-x\|_{{\mathbb{R}}^{I}}^{2\theta}.

Since $\phi$ is Lipschitz by hypothesis, we then get

𝔼[|fi(l)(y,n)fi(l)(x,n)|2θ|f1,,n(l1)]=Cθ(aTΣ(l,n)a)θ=Cθ(σy2(l,n)2Σ(l,n)x,y+σx2(l,n))θ=Cθ(σω2nj=1n|ϕfj(l1)(y,n)ϕfj(l1)(x,n)|2)θCθ(σω2Lϕ2nj=1n|fj(l1)(y,n)fj(l1)(x,n)|2)θ=Cθ(σω2Lϕ2)θnθ(j=1n|fj(l1)(y,n)fj(l1)(x,n)|2)θCθ(σω2Lϕ2)θnj=1n|fj(l1)(y,n)fj(l1)(x,n)|2θ.\begin{split}{\mathbb{E}}\Big{[}|f_{i}^{(l)}(y,n)-f_{i}^{(l)}(x,n)|^{2\theta}\Big{|}f^{(l-1)}_{1,\dots,n}\Big{]}&=C_{\theta}(\textbf{a}^{T}\Sigma(l,n)\textbf{a})^{\theta}\\ &=C_{\theta}\Big{(}\sigma_{y}^{2}(l,n)-2\Sigma(l,n)_{x,y}+\sigma_{x}^{2}(l,n)\Big{)}^{\theta}\\ &=C_{\theta}\Big{(}\frac{\sigma^{2}_{\omega}}{n}\sum_{j=1}^{n}\Big{|}\phi\circ f_{j}^{(l-1)}(y,n)-\phi\circ f_{j}^{(l-1)}(x,n)\Big{|}^{2}\Big{)}^{\theta}\\ &\leq C_{\theta}\Big{(}\frac{\sigma^{2}_{\omega}L_{\phi}^{2}}{n}\sum_{j=1}^{n}\Big{|}f_{j}^{(l-1)}(y,n)-f_{j}^{(l-1)}(x,n)\Big{|}^{2}\Big{)}^{\theta}\\ &=C_{\theta}\frac{(\sigma^{2}_{\omega}L_{\phi}^{2})^{\theta}}{n^{\theta}}\Big{(}\sum_{j=1}^{n}\Big{|}f_{j}^{(l-1)}(y,n)-f_{j}^{(l-1)}(x,n)\Big{|}^{2}\Big{)}^{\theta}\\ &\leq C_{\theta}\frac{(\sigma^{2}_{\omega}L_{\phi}^{2})^{\theta}}{n}\sum_{j=1}^{n}\Big{|}f_{j}^{(l-1)}(y,n)-f_{j}^{(l-1)}(x,n)\Big{|}^{2\theta}.\\ \end{split}

Using the induction hypothesis

𝔼[|fi(l)(y,n)fi(l)(x,n)|2θ]=𝔼[𝔼[|fi(l)(y,n)fi(l)(x,n)|2θ|f1,,n(l1)]]Cθ(σω2Lϕ2)θnj=1n𝔼[|fj(l1)(y,n)fj(l1)(x,n)|2θ]Cθ(σω2Lϕ2)θH(l1)yxI2θ.\begin{split}{\mathbb{E}}\Big{[}|f_{i}^{(l)}(y,n)-f_{i}^{(l)}(x,n)|^{2\theta}\Big{]}&={\mathbb{E}}\Big{[}{\mathbb{E}}\Big{[}|f_{i}^{(l)}(y,n)-f_{i}^{(l)}(x,n)|^{2\theta}\Big{|}f^{(l-1)}_{1,\dots,n}\Big{]}\Big{]}\\ &\leq C_{\theta}\frac{(\sigma^{2}_{\omega}L_{\phi}^{2})^{\theta}}{n}\sum_{j=1}^{n}{\mathbb{E}}\Big{[}|f_{j}^{(l-1)}(y,n)-f_{j}^{(l-1)}(x,n)|^{2\theta}\Big{]}\\ &\leq C_{\theta}(\sigma^{2}_{\omega}L_{\phi}^{2})^{\theta}H^{(l-1)}\|y-x\|^{2\theta}_{{\mathbb{R}}^{I}}.\end{split}

We obtain the constant $H^{(l)}$ by solving the same system as (19), which gives $H^{(l)}=C_{\theta}^{l}(\sigma^{2}_{\omega})^{l\theta}(L_{\phi}^{2})^{(l-1)\theta}$, a quantity that does not depend on $n$. Applying Proposition 3 with $\alpha=2\theta$ and $\beta=2\theta-I$ (since $\beta$ must be positive, it is sufficient to take $\theta>I/2$) concludes the proof.
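The moment bound just derived can also be probed by simulation. The snippet below is an illustrative sketch under the assumption $\phi=\tanh$ (hence $L_{\phi}=1$), with arbitrary constants: it estimates ${\mathbb{E}}|f_{i}^{(l)}(y,n)-f_{i}^{(l)}(x,n)|^{2\theta}$ for several widths $n$ and compares it with the $n$-independent bound $H^{(l)}\|y-x\|^{2\theta}_{{\mathbb{R}}^{I}}$, using $C_{\theta}={\mathbb{E}}[Z^{2\theta}]=(2\theta-1)!!$ for $Z\sim N(0,1)$.

```python
import numpy as np

# Monte Carlo sketch (illustrative; phi = tanh so L_phi = 1): the moment
# E|f_1^{(L)}(y,n) - f_1^{(L)}(x,n)|^{2 theta} should stay below the
# n-independent bound H^{(L)} ||y - x||^{2 theta} of SM B.3.
rng = np.random.default_rng(2)
sigma_b2, sigma_w2, L, theta = 0.5, 1.0, 3, 2
x, y = np.array([0.0, 0.0]), np.array([0.1, -0.2])
I = x.size

def diff_samples(n, reps):
    X = np.stack([x, y], axis=1)                     # (I, 2)
    out = np.empty(reps)
    for r in range(reps):
        W = rng.normal(0, np.sqrt(sigma_w2), (n, I))
        b = rng.normal(0, np.sqrt(sigma_b2), (n, 1))
        F = W @ X + b
        for _ in range(2, L + 1):
            W = rng.normal(0, np.sqrt(sigma_w2), (n, n))
            b = rng.normal(0, np.sqrt(sigma_b2), (n, 1))
            F = W @ np.tanh(F) / np.sqrt(n) + b
        out[r] = F[0, 1] - F[0, 0]                   # f_1(y,n) - f_1(x,n)
    return out

C_theta = float(np.prod(np.arange(1, 2 * theta, 2)))  # (2 theta - 1)!!
H_L = C_theta**L * sigma_w2**(L * theta)              # L_phi = 1 for tanh
bound = H_L * np.linalg.norm(y - x)**(2 * theta)
for n in (50, 200, 800):
    d = diff_samples(n, 1000)
    print(n, (d**(2 * theta)).mean(), "<=", bound)
```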

SM C

Fix kk inputs X=[x(1),,x(k)]\textbf{X}=[x^{(1)},\dots,x^{(k)}] and a layer ll. We show that as n+n\rightarrow+\infty

(fi(l)(X,n))i1di=1Nk(0,Σ(l))\displaystyle\Big{(}f^{(l)}_{i}(\textbf{X},n)\Big{)}_{i\geq 1}\stackrel{{\scriptstyle d}}{{\rightarrow}}\bigotimes_{i=1}^{\infty}N_{k}(\textbf{0},\Sigma(l))

where $\bigotimes$ denotes the product measure and $\Sigma(l)$ is as in (7). We prove this statement by establishing the large-$n$ asymptotic behaviour of an arbitrary finite linear combination of the $f^{(l)}_{i}(\textbf{X},n)$'s, for $i\in\mathcal{L}\subset{\mathbb{N}}$ with $\mathcal{L}$ finite; see, e.g., Billingsley (1999) for details. Following the notation of Matthews et al. (2018b), consider a finite linear combination of the function values without the bias, i.e.,

𝒯(l)(,p,X,n)=ipi[fi(l)(X,n)bi(l)1].\mathcal{T}^{(l)}(\mathcal{L},p,\textbf{X},n)=\sum_{i\in\mathcal{L}}p_{i}[f^{(l)}_{i}(\textbf{X},n)-b_{i}^{(l)}\textbf{1}].

Then for the first layer we write

𝒯(1)(,p,X)=ipi[j=1Iωi,j(1)xj]=j=1Iγj(1)(,p,X),\begin{split}\mathcal{T}^{(1)}(\mathcal{L},p,\textbf{X})&=\sum_{i\in\mathcal{L}}p_{i}\Big{[}\sum_{j=1}^{I}\omega_{i,j}^{(1)}\textbf{x}_{j}\Big{]}\\ &=\sum_{j=1}^{I}\gamma_{j}^{(1)}(\mathcal{L},p,\textbf{X}),\end{split}

where

\gamma_{j}^{(1)}(\mathcal{L},p,\textbf{X})=\sum_{i\in\mathcal{L}}p_{i}\omega_{i,j}^{(1)}\textbf{x}_{j},

and for any l2l\geq 2

𝒯(l)(,p,X,n)=ipi[1nj=1nωi,j(l)(ϕfj(l1)(X,n))]=1nj=1nγj(l)(,p,X,n),\begin{split}\mathcal{T}^{(l)}(\mathcal{L},p,\textbf{X},n)&=\sum_{i\in\mathcal{L}}p_{i}\Big{[}\frac{1}{\sqrt{n}}\sum_{j=1}^{n}\omega_{i,j}^{(l)}(\phi\bullet f_{j}^{(l-1)}(\textbf{X},n))\Big{]}\\ &=\frac{1}{\sqrt{n}}\sum_{j=1}^{n}\gamma_{j}^{(l)}(\mathcal{L},p,\textbf{X},n),\end{split}

where

γj(l)(,p,X,n)=ipiωi,j(l)(ϕfj(l1)(X,n)).\begin{split}\gamma_{j}^{(l)}(\mathcal{L},p,\textbf{X},n)=\sum_{i\in\mathcal{L}}p_{i}\omega_{i,j}^{(l)}(\phi\bullet f_{j}^{(l-1)}(\textbf{X},n)).\end{split}

For the first layer we get

\begin{split}\varphi_{\mathcal{T}^{(1)}(\mathcal{L},p,\textbf{X})}(\textbf{t})&={\mathbb{E}}\Big[e^{{\rm i}\textbf{t}^{T}\mathcal{T}^{(1)}(\mathcal{L},p,\textbf{X})}\Big]\\ &={\mathbb{E}}\Big[\exp\Big\{{\rm i}\textbf{t}^{T}\Big[\sum_{j=1}^{I}\sum_{i\in\mathcal{L}}p_{i}\omega_{i,j}^{(1)}\textbf{x}_{j}\Big]\Big\}\Big]\\ &=\prod_{j=1}^{I}\prod_{i\in\mathcal{L}}{\mathbb{E}}\Big[\exp\Big\{{\rm i}\textbf{t}^{T}\Big[p_{i}\omega_{i,j}^{(1)}\textbf{x}_{j}\Big]\Big\}\Big]\\ &=\prod_{j=1}^{I}\prod_{i\in\mathcal{L}}\exp\Big\{-\frac{\sigma^{2}_{\omega}}{2}p_{i}^{2}\big(\textbf{t}^{T}\textbf{x}_{j}\big)^{2}\Big\}\\ &=\exp\Big\{-\frac{\sigma^{2}_{\omega}}{2}\sum_{i\in\mathcal{L}}p_{i}^{2}\sum_{j=1}^{I}\big(\textbf{t}^{T}\textbf{x}_{j}\big)^{2}\Big\}\\ &=\exp\Big\{-\frac{1}{2}\textbf{t}^{T}\Theta(\mathcal{L},p,1)\textbf{t}\Big\},\end{split}

i.e.

𝒯(1)(,p,X)=dNk(0,Θ(,p,1)),\mathcal{T}^{(1)}(\mathcal{L},p,\textbf{X})\stackrel{{\scriptstyle d}}{{=}}N_{k}(\textbf{0},\Theta(\mathcal{L},p,1)),

where the $k\times k$ covariance matrix $\Theta(\mathcal{L},p,1)$ has element in the $i$-th row and $j$-th column given by

Θi,j(,p,1)=pTpσω2x(i),x(j)I,\begin{split}&\Theta_{i,j}(\mathcal{L},p,1)=p^{T}p\sigma_{\omega}^{2}\langle x^{(i)},x^{(j)}\rangle_{{\mathbb{R}}^{I}},\end{split}

where pTp=ipi2p^{T}p=\sum_{i\in\mathcal{L}}p_{i}^{2}. For l2l\geq 2 we get

φ𝒯(l)(,p,X,n)|f1,,n(l1)(t)=𝔼[eitT𝒯(l)(,p,X,n)|f1,,n(l1)]=𝔼[exp{itT[1nj=1nipiωi,j(l)(ϕfj(l1)(X,n))]}|f1,,n(l1)]=j=1ni𝔼[exp{itT[1npiωi,j(l)(ϕfj(l1)(X,n))]}|f1,,n(l1)]=j=1niexp{σω22npi2(tT(ϕfj(l1)(X,n)))2}=exp{σω22nipi2j=1n(tT(ϕfj(l1)(X,n)))2}=exp{12tTΘ(,p,l,n)t},\begin{split}\varphi_{\mathcal{T}^{(l)}(\mathcal{L},p,\textbf{X},n)|f_{1,\dots,n}^{(l-1)}}(\textbf{t})&={\mathbb{E}}\Big{[}e^{{\rm i}\textbf{t}^{T}\mathcal{T}^{(l)}(\mathcal{L},p,\textbf{X},n)}|f_{1,\dots,n}^{(l-1)}\Big{]}\\ &={\mathbb{E}}\Big{[}\exp\Big{\{}{\rm i}\textbf{t}^{T}\Big{[}\frac{1}{\sqrt{n}}\sum_{j=1}^{n}\sum_{i\in\mathcal{L}}p_{i}\omega_{i,j}^{(l)}(\phi\bullet f_{j}^{(l-1)}(\textbf{X},n))\Big{]}\Big{\}}|f_{1,\dots,n}^{(l-1)}\Big{]}\\ &=\prod_{j=1}^{n}\prod_{i\in\mathcal{L}}{\mathbb{E}}\Big{[}\exp\Big{\{}{\rm i}\textbf{t}^{T}\Big{[}\frac{1}{\sqrt{n}}p_{i}\omega_{i,j}^{(l)}(\phi\bullet f_{j}^{(l-1)}(\textbf{X},n))\Big{]}\Big{\}}|f_{1,\dots,n}^{(l-1)}\Big{]}\\ &=\prod_{j=1}^{n}\prod_{i\in\mathcal{L}}\ \exp\Big{\{}-\frac{\sigma^{2}_{\omega}}{2n}p_{i}^{2}\Big{(}\textbf{t}^{T}(\phi\bullet f_{j}^{(l-1)}(\textbf{X},n))\Big{)}^{2}\Big{\}}\\ &=\ \exp\Big{\{}-\frac{\sigma^{2}_{\omega}}{2n}\sum_{i\in\mathcal{L}}p_{i}^{2}\sum_{j=1}^{n}\Big{(}\textbf{t}^{T}(\phi\bullet f_{j}^{(l-1)}(\textbf{X},n))\Big{)}^{2}\Big{\}}\\ &=\exp\Big{\{}-\frac{1}{2}\textbf{t}^{T}\Theta(\mathcal{L},p,l,n)\textbf{t}\Big{\}},\\ \end{split}

i.e.

𝒯(l)(,p,X,n)|f1,,n(l1)=dNk(0,Θ(,p,l,n)),\mathcal{T}^{(l)}(\mathcal{L},p,\textbf{X},n)|f_{1,\dots,n}^{(l-1)}\stackrel{{\scriptstyle d}}{{=}}N_{k}(\textbf{0},\Theta(\mathcal{L},p,l,n)),

where the $k\times k$ covariance matrix $\Theta(\mathcal{L},p,l,n)$ has element in the $i$-th row and $j$-th column given by

Θi,j(,p,l,n)=pTpσω2n(ϕFi(l1)(X,n)),(ϕFj(l1)(X,n))n,\begin{split}&\Theta_{i,j}(\mathcal{L},p,l,n)=p^{T}p\frac{\sigma_{\omega}^{2}}{n}\Big{\langle}(\phi\bullet\textbf{F}^{(l-1)}_{i}({\textbf{X},n})),(\phi\bullet\textbf{F}^{(l-1)}_{j}({\textbf{X},n}))\Big{\rangle}_{{\mathbb{R}}^{n}},\end{split}

where $p^{T}p=\sum_{i\in\mathcal{L}}p_{i}^{2}$. Thus, along lines similar to the proof of the large-$n$ asymptotics for the $i$-th coordinate (just replacing $\sigma_{b}^{2}\leftarrow 0$ and $\sigma^{2}_{\omega}\leftarrow p^{T}p\,\sigma^{2}_{\omega}$), we have that for any $l\geq 2$, as $n\rightarrow+\infty$,

\begin{split}\varphi_{\mathcal{T}^{(l)}(\mathcal{L},p,\textbf{X},n)}(\textbf{t})&\rightarrow\exp\Big\{-\frac{1}{2}p^{T}p\,\sigma_{\omega}^{2}\int\big(\textbf{t}^{T}(\phi\bullet f)\big)^{2}q^{(l-1)}(\mathrm{d}f)\Big\}\\ &=\exp\Big\{-\frac{1}{2}\textbf{t}^{T}\Theta(\mathcal{L},p,l)\textbf{t}\Big\},\end{split}

i.e. 𝒯(l)(,p,X,n)\mathcal{T}^{(l)}(\mathcal{L},p,\textbf{X},n) converges weakly to a kk-dimensional Gaussian distribution with mean 0 and k×kk\times k covariance matrix Θ(,p,l)\Theta(\mathcal{L},p,l) with elements

Θi,j(,p,l)=pTpσω2ϕ(fi)ϕ(fj)q(l1)(df),\begin{split}&\Theta_{i,j}(\mathcal{L},p,l)=p^{T}p\sigma_{\omega}^{2}\int\phi(f_{i})\phi(f_{j})q^{(l-1)}(\mathrm{d}f),\end{split}

where $q^{(l-1)}(\mathrm{d}f)=q^{(l-1)}(\mathrm{d}f_{1},\dots,\mathrm{d}f_{k})=N_{k}(\textbf{0},\Sigma(l-1))\,\mathrm{d}f$. To complete the proof, observe that $\Theta(\mathcal{L},p,l)=p^{T}p\,(\Sigma(l)-\sigma_{b}^{2}\textbf{1}\textbf{1}^{T})$, so that adding back the independent bias term $\sum_{i\in\mathcal{L}}p_{i}b_{i}^{(l)}\textbf{1}$ shows that the full linear combination $\sum_{i\in\mathcal{L}}p_{i}f^{(l)}_{i}(\textbf{X},n)$ converges weakly to $N_{k}(\textbf{0},p^{T}p\,\Sigma(l))$, which is exactly the Cramér–Wold characterization of the product limit $\bigotimes_{i}N_{k}(\textbf{0},\Sigma(l))$.
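The covariance identity used in this last step can be checked numerically. The sketch below is our own illustration (assuming $\phi=\tanh$, arbitrary constants): it draws wide networks and verifies that the covariance of $\sum_{i\in\mathcal{L}}p_{i}f^{(l)}_{i}(\textbf{X},n)$ is close to $p^{T}p\,\Sigma(l)$, where $\Sigma(l)$ can be estimated as in the sketch after SM A.1.

```python
import numpy as np

# Sketch (illustrative; phi = tanh): the finite linear combination
# sum_{i in L} p_i f_i^{(l)}(X, n) should have covariance ~ (p^T p) Sigma(l).
rng = np.random.default_rng(3)
sigma_b2, sigma_w2 = 0.5, 1.0
X = np.array([[0.3, -1.0], [1.2, 0.4]])      # k = 2 inputs, I = 2
k, I = X.shape
p = np.array([1.0, -0.5, 2.0])               # coefficients over L = {1, 2, 3}
n, L, reps = 500, 3, 2000

combos = np.empty((reps, k))
for r in range(reps):
    W = rng.normal(0, np.sqrt(sigma_w2), (n, I))
    b = rng.normal(0, np.sqrt(sigma_b2), (n, 1))
    F = W @ X.T + b
    for _ in range(2, L + 1):
        W = rng.normal(0, np.sqrt(sigma_w2), (n, n))
        b = rng.normal(0, np.sqrt(sigma_b2), (n, 1))
        F = W @ np.tanh(F) / np.sqrt(n) + b
    combos[r] = p @ F[:3]                     # sum_i p_i f_i^{(L)}(X, n)
print("empirical cov / (p^T p):\n", combos.T @ combos / reps / (p @ p))
```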

SM D.1

We will use, without explicit mention, that the series $\sum_{i=1}^{\infty}q^{i}$ converges when $|q|<1$; in particular, for $q=\nicefrac{1}{2}$ the series sums to $1$. Fix $l\geq 1$ and $n\in{\mathbb{N}}$. We prove that there exists a random variable $H^{(l)}(n)$ such that

d(F(l)(x,n),F(l)(y,n))H(l)(n)xyI,x,yI,a.s.d\big{(}\textbf{F}^{(l)}(x,n),\textbf{F}^{(l)}(y,n)\big{)}_{\infty}\leq H^{(l)}(n)\|x-y\|_{{\mathbb{R}}^{I}},\quad x,y\in{\mathbb{R}}^{I},\mathbb{P}-a.s.

This follows immediately from the Lipschitz property of each component; indeed, by (12) we get

d(F(l)(x,n),F(l)(y,n))=i=112i|fi(l)(x,n)fi(l)(y,n)|1+|fi(l)(x,n)fi(l)(y,n)|i=112i|fi(l)(x,n)fi(l)(y,n)|xyIi=112iHi(l)(n).\begin{split}d\big{(}\textbf{F}^{(l)}(x,n),\textbf{F}^{(l)}(y,n)\big{)}_{\infty}&=\sum_{i=1}^{\infty}\frac{1}{2^{i}}\frac{|f^{(l)}_{i}(x,n)-f^{(l)}_{i}(y,n)|}{1+|f^{(l)}_{i}(x,n)-f^{(l)}_{i}(y,n)|}\\ &\leq\sum_{i=1}^{\infty}\frac{1}{2^{i}}|f^{(l)}_{i}(x,n)-f^{(l)}_{i}(y,n)|\\ &\leq\|x-y\|_{{\mathbb{R}}^{I}}\sum_{i=1}^{\infty}\frac{1}{2^{i}}H^{(l)}_{i}(n).\end{split}

It remains to show that the series i=112iHi(l)(n)\sum_{i=1}^{\infty}\frac{1}{2^{i}}H^{(l)}_{i}(n) converges almost surely. By (SM B.1) we get

i=112iHi(l)(n)=i=112iLϕnj=1n|ωi,j(l)|Hj(l1)(n)=Lϕnj=1nHj(l1)(n)i=1|ωi,j(l)|2i.\begin{split}\sum_{i=1}^{\infty}\frac{1}{2^{i}}H^{(l)}_{i}(n)&=\sum_{i=1}^{\infty}\frac{1}{2^{i}}\frac{L_{\phi}}{\sqrt{n}}\sum_{j=1}^{n}\big{|}\omega_{i,j}^{(l)}\big{|}H_{j}^{(l-1)}(n)\\ &=\frac{L_{\phi}}{\sqrt{n}}\sum_{j=1}^{n}H_{j}^{(l-1)}(n)\sum_{i=1}^{\infty}\frac{|\omega_{i,j}^{(l)}|}{2^{i}}.\end{split}

It remains to show the almost sure convergence of the series $\sum_{i=1}^{\infty}\frac{|\omega_{i,j}^{(l)}|}{2^{i}}$. We apply Kolmogorov's three-series criterion (Kallenberg, 2002, Theorem 4.18). Set $X_{i}:=\frac{|\omega_{i,j}^{(l)}|}{2^{i}}$.

  • By Markov's inequality, $\mathbb{P}(X_{i}>1)\leq{\mathbb{E}}[X_{i}]=\frac{{\mathbb{E}}[|N(0,\sigma_{\omega}^{2})|]}{2^{i}}$; thus $\sum_{i=1}^{\infty}\mathbb{P}(X_{i}>1)\leq{\mathbb{E}}[|N(0,\sigma_{\omega}^{2})|]<\infty$.

  • Set $Y_{i}=X_{i}\mathbb{I}_{\{X_{i}\leq 1\}}\leq X_{i}$. Then $\sum_{i=1}^{\infty}{\mathbb{E}}[Y_{i}]\leq\sum_{i=1}^{\infty}{\mathbb{E}}[X_{i}]=\sum_{i=1}^{\infty}\frac{{\mathbb{E}}[|N(0,\sigma_{\omega}^{2})|]}{2^{i}}={\mathbb{E}}[|N(0,\sigma_{\omega}^{2})|]<\infty$.

  • $\mathrm{V}(Y_{i})={\mathbb{E}}[Y_{i}^{2}]-{\mathbb{E}}^{2}[Y_{i}]$, thus $\sum_{i=1}^{\infty}\mathrm{V}(Y_{i})=\sum_{i=1}^{\infty}{\mathbb{E}}[Y_{i}^{2}]-\sum_{i=1}^{\infty}{\mathbb{E}}^{2}[Y_{i}]$. The first series converges since ${\mathbb{E}}[Y_{i}^{2}]\leq{\mathbb{E}}[X_{i}^{2}]=\frac{\sigma_{\omega}^{2}{\mathbb{E}}[\chi^{2}(1)]}{4^{i}}=\frac{\sigma_{\omega}^{2}}{4^{i}}$ (so $\sum{\mathbb{E}}[Y_{i}^{2}]\leq\sigma_{\omega}^{2}\sum\frac{1}{4^{i}}<\infty$), and the second series converges since $0<{\mathbb{E}}[Y_{i}]\leq{\mathbb{E}}[X_{i}]$ implies ${\mathbb{E}}^{2}[Y_{i}]\leq{\mathbb{E}}^{2}[X_{i}]=\frac{{\mathbb{E}}^{2}[|N(0,\sigma_{\omega}^{2})|]}{4^{i}}$ (so $\sum{\mathbb{E}}^{2}[Y_{i}]\leq{\mathbb{E}}^{2}[|N(0,\sigma_{\omega}^{2})|]\sum\frac{1}{4^{i}}<\infty$).

Denoting $Q_{j}^{(l)}=\sum_{i=1}^{\infty}\frac{|\omega_{i,j}^{(l)}|}{2^{i}}$ and setting $H^{(l)}(n):=\frac{L_{\phi}}{\sqrt{n}}\sum_{j=1}^{n}H_{j}^{(l-1)}(n)Q_{j}^{(l)}$, we complete the proof.
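The almost sure convergence of $Q_{j}^{(l)}$ is also easy to see empirically: the snippet below (illustrative only, not part of the proof) shows that the partial sums of $\sum_{i}|\omega_{i,j}^{(l)}|/2^{i}$ stabilize after a handful of terms, since the geometric weights dominate.

```python
import numpy as np

# Quick illustration (not part of the proof): partial sums of
# Q = sum_i |w_i| / 2^i, with w_i iid N(0, sigma_w^2), stabilize rapidly,
# in line with the Kolmogorov three-series argument above.
rng = np.random.default_rng(4)
w = np.abs(rng.normal(0.0, 1.0, 60))          # sigma_w = 1, arbitrary
partial = np.cumsum(w / 2.0 ** np.arange(1, 61))
print(partial[[4, 9, 19, 59]])                # after 5, 10, 20, 60 terms
```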

SM D.2

Fix $l\geq 1$. We show the continuity of the limiting process $\textbf{F}^{(l)}$ by applying Proposition 2. We will use, without explicit mention, that the function $r\mapsto\frac{r}{1+r}$ is bounded by $1$ for $r>0$. Take two inputs $x,y\in{\mathbb{R}}^{I}$ and fix an even integer $\alpha\geq 12$. Since $\sum_{i=1}^{\infty}\frac{1}{2^{i}}\frac{|f^{(l)}_{i}(x)-f^{(l)}_{i}(y)|}{1+|f^{(l)}_{i}(x)-f^{(l)}_{i}(y)|}<\sum_{i=1}^{\infty}\frac{1}{2^{i}}=1$ and, by Jensen's inequality, also $\sum_{i=1}^{\infty}\frac{1}{2^{i}}\big(\frac{|f^{(l)}_{i}(x)-f^{(l)}_{i}(y)|}{1+|f^{(l)}_{i}(x)-f^{(l)}_{i}(y)|}\big)^{\alpha}<\sum_{i=1}^{\infty}\frac{1}{2^{i}}=1$, we get

d(F(l)(x),F(l)(y))α=(i=112i|fi(l)(x)fi(l)(y)|1+|fi(l)(x)fi(l)(y)|)αi=112i(|fi(l)(x)fi(l)(y)|1+|fi(l)(x)fi(l)(y)|)αi=112i|fi(l)(x)fi(l)(y)|α\begin{split}d\big{(}\textbf{F}^{(l)}(x),\textbf{F}^{(l)}(y)\big{)}_{\infty}^{\alpha}&=\Big{(}\sum_{i=1}^{\infty}\frac{1}{2^{i}}\frac{|f_{i}^{(l)}(x)-f^{(l)}_{i}(y)|}{1+|f_{i}^{(l)}(x)-f^{(l)}_{i}(y)|}\Big{)}^{\alpha}\\ &\leq\sum_{i=1}^{\infty}\frac{1}{2^{i}}\Big{(}\frac{|f_{i}^{(l)}(x)-f^{(l)}_{i}(y)|}{1+|f_{i}^{(l)}(x)-f^{(l)}_{i}(y)|}\Big{)}^{\alpha}\\ &\leq\sum_{i=1}^{\infty}\frac{1}{2^{i}}|f_{i}^{(l)}(x)-f^{(l)}_{i}(y)|^{\alpha}\\ \end{split}

Thus, by applying the monotone convergence theorem to the positive increasing sequence $g(N)=\sum_{i=1}^{N}\frac{1}{2^{i}}|f_{i}^{(l)}(x)-f^{(l)}_{i}(y)|^{\alpha}$ (which allows us to exchange ${\mathbb{E}}$ and $\sum_{i=1}^{\infty}$), we get

\begin{split}{\mathbb{E}}\Big[d\big(\textbf{F}^{(l)}(x),\textbf{F}^{(l)}(y)\big)_{\infty}^{\alpha}\Big]&\leq{\mathbb{E}}\Big[\sum_{i=1}^{\infty}\frac{1}{2^{i}}|f_{i}^{(l)}(x)-f^{(l)}_{i}(y)|^{\alpha}\Big]\\ &={\mathbb{E}}\Big[\lim_{N\rightarrow\infty}\sum_{i=1}^{N}\frac{1}{2^{i}}|f_{i}^{(l)}(x)-f^{(l)}_{i}(y)|^{\alpha}\Big]\\ &=\lim_{N\rightarrow\infty}{\mathbb{E}}\Big[\sum_{i=1}^{N}\frac{1}{2^{i}}|f_{i}^{(l)}(x)-f^{(l)}_{i}(y)|^{\alpha}\Big]\\ &=\sum_{i=1}^{\infty}\frac{1}{2^{i}}{\mathbb{E}}\Big[|f_{i}^{(l)}(x)-f^{(l)}_{i}(y)|^{\alpha}\Big]\\ &\leq\sum_{i=1}^{\infty}\frac{1}{2^{i}}H^{(l)}\|x-y\|^{\alpha}_{{\mathbb{R}}^{I}}\\ &=H^{(l)}\|x-y\|_{{\mathbb{R}}^{I}}^{\alpha},\end{split}

where we used (16) and the fact that H(l)H^{(l)} does not depend on ii (see (19)).

Therefore, by Proposition 2, for each $\alpha>I$, setting $\beta=\alpha-I$ (since $\beta$ needs to be positive, it is sufficient to choose $\alpha>I$), $\textbf{F}^{(l)}$ has a continuous version $\textbf{F}^{(l)(\alpha)}$, and the latter is $\mathbb{P}$-a.s. locally $\gamma$-Hölder continuous for every $0<\gamma<1-\frac{I}{\alpha}$.

  • Thus $\textbf{F}^{(l)(\alpha)}$ and $\textbf{F}^{(l)}$ are indistinguishable (they have the same trajectories), i.e. there exists $\Omega^{(\alpha)}\subset\Omega$ with $\mathbb{P}(\Omega^{(\alpha)})=1$ such that for each $\omega\in\Omega^{(\alpha)}$, $x\mapsto\textbf{F}^{(l)}(x)(\omega)$ is locally $\gamma$-Hölder for each $0<\gamma<1-\frac{I}{\alpha}$.

  • Define Ω=α>IΩ(α)\Omega^{\star}=\bigcap_{\alpha>I}\Omega^{(\alpha)}, then for each 0<δ0<10<\delta_{0}<1 there exists α0\alpha_{0} such that δ0<1Iα0<1\delta_{0}<1-\frac{I}{\alpha_{0}}<1, thus for each ωΩΩ(α0)\omega\in\Omega^{\star}\subset\Omega^{(\alpha_{0})}, the trajectory xF(l)(x)(ω)x\mapsto\textbf{F}^{(l)}(x)(\omega) is locally δ0\delta_{0}-Hölder continuous.

By Proposition 2 we can conclude that F(l)\textbf{F}^{(l)} has a continuous version and the latter is \mathbb{P}-a.s locally γ\gamma-Hölder continuous for every 0<γ<10<\gamma<1.
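For completeness, here is a small helper (our own sketch, with the infinite sum truncated to the first $N$ coordinates) implementing the metric $d(\cdot,\cdot)_{\infty}$ on ${\mathbb{R}}^{\infty}$ used throughout SM D:

```python
import numpy as np

# Truncated version of d(u, v) = sum_i 2^{-i} |u_i - v_i| / (1 + |u_i - v_i|),
# the metric on R^infinity used for F^{(l)} (illustrative sketch only).
def d_inf(u, v):
    w = np.abs(np.asarray(u, float) - np.asarray(v, float))
    return float(np.sum(w / (1.0 + w) / 2.0 ** np.arange(1, w.size + 1)))

print(d_inf([0.0, 1.0, -2.0], [0.5, 1.0, 0.0]))  # always < 1 by construction
```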

SM E

General introduction to Daniell-Kolmogorov extension theorem

Let $X$ be an index set and let $\{(E_{x},\mathcal{E}_{x})\}_{x\in X}$ be measurable spaces. On $E:=\times_{x\in X}E_{x}$ we consider the $\sigma$-algebra $\mathcal{E}:=\bigotimes_{x\in X}\mathcal{E}_{x}$, that is

=σ(πx,xX)=σ(xXπx1(x))\mathcal{E}=\sigma(\pi_{x},x\in X)=\sigma\Big{(}\bigcup_{x\in X}\pi_{x}^{-1}(\mathcal{E}_{x})\Big{)}

where for each xXx\in X, πx:EEx,ω:=(ωx)xXπx(ω)=ωx\pi_{x}:E\rightarrow E_{x},\omega:=(\omega_{x})_{x\in X}\mapsto\pi_{x}(\omega)=\omega_{x}. \mathcal{E} is generated by measurable rectangles. A measurable rectangle AA is of the form

A:=×xXAx such that only a finite number of Axx are different from ExA:=\times_{x\in X}A_{x}\text{ such that only a finite number of $A_{x}\in\mathcal{E}_{x}$ are different from $E_{x}$}

σ\sigma-algebra on the space of functions

Fix $X={\mathbb{R}}^{I}$ and let $(S,d)$ be a Polish space. We consider the measurable spaces $\{(E_{x},\mathcal{E}_{x})\}_{x\in X}=\{(S,\mathcal{B}(S))\}_{x\in{\mathbb{R}}^{I}}$; thus we can construct the measurable space

(E,)=(×xXEx,xXx)=(SI,(SI))(E,\mathcal{E})=(\times_{x\in X}E_{x},\bigotimes_{x\in X}\mathcal{E}_{x})=\Big{(}S^{{\mathbb{R}}^{I}},\mathcal{B}(S^{{\mathbb{R}}^{I}})\Big{)}

where SI=×xISS^{{\mathbb{R}}^{I}}=\times_{x\in{\mathbb{R}}^{I}}S is the set of all functions from I{\mathbb{R}}^{I} into SS and

\begin{split}\mathcal{B}(S^{{\mathbb{R}}^{I}})&:=\bigotimes_{x\in{\mathbb{R}}^{I}}\mathcal{B}(S)\\ &=\sigma\Big(\bigcup_{x\in{\mathbb{R}}^{I}}\pi_{x}^{-1}(\mathcal{B}(S))\Big)\\ &=\sigma\Big(\Big\{A:=\times_{x\in X}A_{x}\text{ such that only a finite number of $A_{x}$ are different from $S$}\Big\}\Big)
\end{split}

An example of measurable rectangle is

A=S×Ax(1)×S×Ax(2)×S×S××Ax(k)×S×S×A=S\times A_{x^{(1)}}\times S\times A_{x^{(2)}}\times S\times S\times\dots\times A_{x^{(k)}}\times S\times S\times\dots

where $k\in{\mathbb{N}}$ and only for $x^{(1)},\dots,x^{(k)}$ are the factors of the cartesian product different from $S$.

Denote by Z=(Zx)xIZ=(Z_{x})_{x\in{\mathbb{R}}^{I}}, Zx:(Ω,,)SZ_{x}:(\Omega,\mathcal{H},{\mathbb{P}})\rightarrow S any stochastic process of interest, such as fi(l)(n)f^{(l)}_{i}(n) or fi(l)f^{(l)}_{i} for some l1l\geq 1, i1i\geq 1 and n1n\geq 1 when S=(,||)S=({\mathbb{R}},|\cdot|), or even 𝐅(l)(n)\mathbf{F}^{(l)}(n) or 𝐅(l)\mathbf{F}^{(l)} for l1l\geq 1 and n1n\geq 1 when S=(,)S=({\mathbb{R}}^{\infty},\|\cdot\|_{\infty}). Consider the finite-dimensional distributions of ZZ

Λ={Px(1),,x(k)Z on (Sk)|x(j)I,j{1,,k},k}\Lambda=\{P^{Z}_{x^{(1)},\dots,x^{(k)}}\text{ on }\mathcal{B}(S^{k})|x^{(j)}\in{\mathbb{R}}^{I},j\in\{1,\dots,k\},k\in{\mathbb{N}}\}

If $\Lambda$ is consistent in the sense of the Kolmogorov theorem, then there exists a unique probability measure ${\mathbb{P}}^{\prime}$ on $(S^{{\mathbb{R}}^{I}},\mathcal{B}(S^{{\mathbb{R}}^{I}}))$ such that the canonical process $Z^{\prime}=(Z^{\prime}_{x})_{x\in{\mathbb{R}}^{I}}$, $Z^{\prime}_{x}:S^{{\mathbb{R}}^{I}}\rightarrow S$, $\omega\mapsto Z^{\prime}_{x}(\omega)=\omega(x)$, on $(S^{{\mathbb{R}}^{I}},\mathcal{B}(S^{{\mathbb{R}}^{I}}),{\mathbb{P}}^{\prime})$ has finite-dimensional distributions that coincide with $\Lambda$.

SM E.1: Existence of a probability measure on $S^{{\mathbb{R}}^{I}}$ for the sequence processes

Fix S=S={\mathbb{R}}. Fix a layer ll, a unit i1i\geq 1 on that layer and nn\in{\mathbb{N}}. We want to prove that there exists a probability measure (i,l,n){\mathbb{P}}^{(i,l,n)} on (I,(I))({\mathbb{R}}^{{\mathbb{R}}^{I}},\mathcal{B}({\mathbb{R}}^{{\mathbb{R}}^{I}})) such that the associated canonical process Θx(i,l,n):I\Theta^{(i,l,n)}_{x}:{\mathbb{R}}^{{\mathbb{R}}^{I}}\to{\mathbb{R}}, ωω(x)\omega\mapsto\omega(x) has finite-dimensional distributions that coincide with

Λ(i,l,n)={Px(1),,x(k)(i,l,n)}k,\Lambda^{(i,l,n)}=\Big{\{}P^{(i,l,n)}_{x^{(1)},\dots,x^{(k)}}\Big{\}}_{k\in{\mathbb{N}}},

where $P^{(i,l,n)}_{x^{(1)},\dots,x^{(k)}}$ is the distribution of $f^{(l)}_{i}(\textbf{X},n)$. We do not know the exact form of this distribution, but we know the distribution of the conditional random variable $f^{(l)}_{i}(\textbf{X},n)|f^{(l-1)}_{1,\dots,n}$ (see (11)). Thus, since from (11) the distribution of $f^{(1)}_{i}(\textbf{X})$ is known explicitly, proceeding by induction it is sufficient to prove the existence of two probability measures ${\mathbb{P}}^{(i,1,n)}$ and ${\mathbb{P}}^{(i,l,n)}_{|l-1}$ on $({\mathbb{R}}^{{\mathbb{R}}^{I}},\mathcal{B}({\mathbb{R}}^{{\mathbb{R}}^{I}}))$ such that the associated canonical processes $\Theta^{(i,1,n)}_{x}$ and $\Theta^{(i,l,n)|l-1}_{x}$ have finite-dimensional distributions that coincide, respectively, with

\Lambda^{(i,1,n)}:=\Big\{P^{(i,1,n)}_{x^{(1)},\dots,x^{(k)}}\Big\}_{k\in{\mathbb{N}}}\quad\text{ and }\quad\Lambda^{(i,l,n)}_{|l-1}:=\Big\{P^{(i,l,n)|l-1}_{x^{(1)},\dots,x^{(k)}}\Big\}_{k\in{\mathbb{N}}},

where $P^{(i,1,n)}_{x^{(1)},\dots,x^{(k)}}=N_{k}(\textbf{0},\Sigma(1,\textbf{X}))$ and $P^{(i,l,n)|l-1}_{x^{(1)},\dots,x^{(k)}}=N_{k}(\textbf{0},\Sigma(l,n,\textbf{X}))$, defined on $\mathcal{B}({\mathbb{R}}^{k})$. Observe that, for simplicity of notation, we have so far avoided writing the dependence of the covariance matrix on the input matrix $\textbf{X}$, but here it is important to emphasize it. For the proof we refer to the limit case in the next subsection, since the proof is identical step by step. When $S={\mathbb{R}}^{\infty}$, recall that, given a sequence of probability spaces $\{({\mathbb{R}}^{{\mathbb{R}}^{I}},\mathcal{B}({\mathbb{R}}^{{\mathbb{R}}^{I}}),{\mathbb{P}}^{(i,l,n)})\}_{i\geq 1}$, there exists a unique probability measure ${\mathbb{P}}^{(l,n)}$ on $(\times_{i=1}^{\infty}{\mathbb{R}}^{{\mathbb{R}}^{I}},\bigotimes_{i=1}^{\infty}\mathcal{B}({\mathbb{R}}^{{\mathbb{R}}^{I}}))=\big(({\mathbb{R}}^{\infty})^{{\mathbb{R}}^{I}},\mathcal{B}(({\mathbb{R}}^{\infty})^{{\mathbb{R}}^{I}})\big)$ such that, for each measurable rectangle $A=\times_{i=1}^{\infty}A_{i}$ in which only a finite number of the sets $A_{i}$ differ from ${\mathbb{R}}^{{\mathbb{R}}^{I}}$, one has ${\mathbb{P}}^{(l,n)}(A)=\prod_{i=1}^{\infty}{\mathbb{P}}^{(i,l,n)}(A_{i})$; this probability measure is denoted by ${\mathbb{P}}^{(l,n)}=:\bigotimes_{i=1}^{\infty}{\mathbb{P}}^{(i,l,n)}$. This means that the existence of the stochastic processes $f^{(l)}_{i}(n)$ implies the existence of the stochastic processes $\mathbf{F}^{(l)}(n)$.

SM E.2: Existence of a probability measure on $S^{{\mathbb{R}}^{I}}$ for the limit process

Note that, as observed in the previous section, the existence of the stochastic processes $f^{(l)}_{i}$ on $({\mathbb{R}}^{{\mathbb{R}}^{I}},\mathcal{B}({\mathbb{R}}^{{\mathbb{R}}^{I}}))$ implies the existence of the stochastic processes $\textbf{F}^{(l)}$ on $\big(({\mathbb{R}}^{\infty})^{{\mathbb{R}}^{I}},\mathcal{B}(({\mathbb{R}}^{\infty})^{{\mathbb{R}}^{I}})\big)$. We therefore focus on the proof for $S={\mathbb{R}}$. Fix a layer $l$ and a unit $i\geq 1$ on that layer. We want to prove that there exists a probability measure ${\mathbb{P}}^{(i,l)}$ on $({\mathbb{R}}^{{\mathbb{R}}^{I}},\mathcal{B}({\mathbb{R}}^{{\mathbb{R}}^{I}}))$ such that the canonical process $\Theta^{(i,l)}_{x}:{\mathbb{R}}^{{\mathbb{R}}^{I}}\to{\mathbb{R}}$, $\omega\mapsto\omega(x)$, has finite-dimensional distributions that coincide with

Λ(i,l)={Px(1),,x(k)(i,l)}k,\Lambda^{(i,l)}=\Big{\{}P^{(i,l)}_{x^{(1)},\dots,x^{(k)}}\Big{\}}_{k\in{\mathbb{N}}},

where $P^{(i,l)}_{x^{(1)},\dots,x^{(k)}}$ are the finite-dimensional distributions of $f^{(l)}_{i}$ determined in (7), i.e. $P^{(i,l)}_{x^{(1)},\dots,x^{(k)}}=N_{k}(\textbf{0},\Sigma(l,\textbf{X}))$, defined on $\mathcal{B}({\mathbb{R}}^{k})$. By the Daniell–Kolmogorov extension theorem (Kallenberg, 2002, Theorem 6.16) it is sufficient to prove that, for each $k\in{\mathbb{N}}$ and all $x^{(1)},\dots,x^{(k)}\in{\mathbb{R}}^{I}$,

Px(1),,x(z),,x(k)(i,l)(B(1)××B(z1)××B(z+1)××B(k))=Px(1),,x(z1),x(z+1),,x(k)(i,l)(B(1)××B(z1)×B(z+1)××B(k)),\displaystyle\begin{split}&P^{(i,l)}_{x^{(1)},\dots,x^{(z)},\dots,x^{(k)}}(B^{(1)}\times\dots\times B^{(z-1)}\times{\mathbb{R}}\times B^{(z+1)}\times\dots\times B^{(k)})\\ &=P^{(i,l)}_{x^{(1)},\dots,x^{(z-1)},x^{(z+1)},\dots,x^{(k)}}(B^{(1)}\times\dots\times B^{(z-1)}\times B^{(z+1)}\times\dots\times B^{(k)}),\end{split} (25)

for every $z\in\{1,\dots,k\}$ and every $B^{(j)}\in\mathcal{B}({\mathbb{R}})$, $j=1,\dots,k$, $j\neq z$. Fix $k\in{\mathbb{N}}$, $k$ inputs $x^{(1)},\dots,x^{(k)}$, $z\in\{1,\dots,k\}$ and $B^{(j)}\in\mathcal{B}({\mathbb{R}})$ for all $j=1,\dots,k$, $j\neq z$. Define the projection $\pi_{[z]}:{\mathbb{R}}^{k}\to{\mathbb{R}}^{k-1}$ by $\pi_{[z]}(y_{1},\dots,y_{k})=[y_{1},\dots,y_{z-1},y_{z+1},\dots,y_{k}]^{T}$. Thus, condition (25) is equivalent to the following:

P^{(i,l)}_{x^{(1)},\dots,x^{(k)}}\circ\pi_{[z]}=P^{(i,l)}_{\pi_{[z]}(x^{(1)},\dots,x^{(k)})},

where on the left we have the image measure of $P^{(i,l)}_{x^{(1)},\dots,x^{(k)}}$ under $\pi_{[z]}$. We prove this by showing that the respective Fourier transforms coincide. In the following calculations we define $\textbf{y}=[y_{1},\dots,y_{k}]^{T}$, $\textbf{y}_{[z]}=[y_{1},\dots,y_{z-1},y_{z+1},\dots,y_{k}]^{T}$ and $\textbf{t}=[t_{1},\dots,t_{k}]^{T}$, $\textbf{t}_{[z]}=[t_{1},\dots,t_{z-1},t_{z+1},\dots,t_{k}]^{T}$; then by the definition of image measure we get

\begin{split}\varphi_{\big{(}P^{(i,l)}_{x^{(1)},\dots,x^{(k)}}\circ\pi_{[z]}\big{)}}(\textbf{t}_{[z]})&=\int_{{\mathbb{R}}^{k-1}}e^{{\rm i}\textbf{t}_{[z]}^{T}\textbf{y}_{[z]}}\big{(}P^{(i,l)}_{x^{(1)},\dots,x^{(k)}}\circ\pi_{[z]}\big{)}(\mathrm{d}\textbf{y}_{[z]})\\ &=\int_{{\mathbb{R}}^{k}}e^{{\rm i}\textbf{t}_{[z]}^{T}\pi_{[z]}(\textbf{y})}P^{(i,l)}_{x^{(1)},\dots,x^{(k)}}(\mathrm{d}\textbf{y}).\end{split}

Now, recalling that $\textbf{1}_{j}$ is the $k\times 1$ vector with $1$ in the $j$-th position and $0$ otherwise, since $\pi_{[z]}(\textbf{y})=\textbf{y}_{[z]}$, defining $\pi^{\star}_{[z]}(\textbf{t})=\sum_{j=1,\,j\neq z}^{k}\textbf{1}_{j}t_{j}$ we get $\textbf{t}_{[z]}^{T}\pi_{[z]}(\textbf{y})=\textbf{y}^{T}\pi^{\star}_{[z]}(\textbf{t})$ (for instance, with $k=3$ and $z=2$, $\pi^{\star}_{[2]}(\textbf{t})=[t_{1},0,t_{3}]^{T}$ and $\textbf{t}_{[2]}^{T}\pi_{[2]}(\textbf{y})=t_{1}y_{1}+t_{3}y_{3}=\textbf{y}^{T}\pi^{\star}_{[2]}(\textbf{t})$; that is, $\pi^{\star}_{[z]}$ re-embeds $\textbf{t}_{[z]}$ into ${\mathbb{R}}^{k}$ by placing a zero in the $z$-th coordinate). Then

\begin{split}\varphi_{\big{(}P^{(i,l)}_{x^{(1)},\dots,x^{(k)}}\circ\pi_{[z]}\big{)}}(\textbf{t}_{[z]})&=\int_{{\mathbb{R}}^{k}}e^{{\rm i}\textbf{y}^{T}\pi^{\star}_{[z]}(\textbf{t})}P^{(i,l)}_{x^{(1)},\dots,x^{(k)}}(\mathrm{d}\textbf{y})\\ &=\varphi_{P^{(i,l)}_{x^{(1)},\dots,x^{(k)}}}(\pi^{\star}_{[z]}(\textbf{t}))\\ &=\varphi_{N_{k}(\textbf{0},\Sigma(l,\textbf{X}))}(\pi^{\star}_{[z]}(\textbf{t}))\\ &=\exp\big{\{}-\frac{1}{2}\pi^{\star}_{[z]}(\textbf{t})^{T}\Sigma(l,\textbf{X})\pi^{\star}_{[z]}(\textbf{t})\big{\}}\\ &=\exp\big{\{}-\frac{1}{2}\textbf{t}_{[z]}^{T}\widehat{\Sigma}(l,\textbf{X})\textbf{t}_{[z]}\big{\}},\end{split}

where $\widehat{\Sigma}(l,\textbf{X})$ is the matrix $\Sigma(l,\textbf{X})$ with the $z$-th row and the $z$-th column removed. Since $\widehat{\Sigma}(l,\textbf{X})=\Sigma(l,\pi_{[z]}(x^{(1)},\dots,x^{(k)}))$, we get $\varphi_{\big{(}P^{(i,l)}_{x^{(1)},\dots,x^{(k)}}\circ\pi_{[z]}\big{)}}(\textbf{t}_{[z]})=\varphi_{P^{(i,l)}_{\pi_{[z]}(x^{(1)},\dots,x^{(k)})}}(\textbf{t}_{[z]})$ for each $\textbf{t}_{[z]}$, and thus the two Fourier transforms coincide, as we wanted to prove.
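This consistency can also be checked numerically: dropping the $z$-th coordinate of a $N_{k}(\textbf{0},\Sigma)$ vector must yield exactly $N_{k-1}(\textbf{0},\widehat{\Sigma})$. The following Python sketch does so by Monte Carlo; the covariance matrix below is an arbitrary positive-definite stand-in for $\Sigma(l,\textbf{X})$, not the matrix defined in the paper.

import numpy as np

rng = np.random.default_rng(1)
k, z = 4, 2                                   # drop the z-th coordinate (0-indexed)

A = rng.normal(size=(k, k))
Sigma = A @ A.T + k * np.eye(k)               # arbitrary positive-definite covariance

keep = [j for j in range(k) if j != z]
Sigma_hat = Sigma[np.ix_(keep, keep)]         # Sigma without the z-th row and column

y = rng.multivariate_normal(np.zeros(k), Sigma, size=500_000)
y_proj = y[:, keep]                           # apply the projection pi_[z]
print(np.max(np.abs(np.cov(y_proj, rowvar=False) - Sigma_hat)))   # -> close to 0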

SM E.3: Existence of a probability measure on $C({\mathbb{R}}^{I};{\mathbb{R}})$

If $Z$ is, in addition, a continuous stochastic process, then we will show that there exists a probability measure ${\mathbb{P}}^{Z}$ on $C({\mathbb{R}}^{I};{\mathbb{R}})\subset{\mathbb{R}}^{{\mathbb{R}}^{I}}$, endowed with a $\sigma$-algebra $\mathcal{G}\subset\mathcal{B}({\mathbb{R}}^{{\mathbb{R}}^{I}})$, such that the finite-dimensional distributions of $Z^{\prime}$ and $Z$ coincide.

As suggested by Kallenberg (2002, page 311), we consider $C({\mathbb{R}}^{I};{\mathbb{R}})$ with the topology of uniform convergence on compact sets, metrized by

\rho_{{\mathbb{R}}}:C({\mathbb{R}}^{I};{\mathbb{R}})\times C({\mathbb{R}}^{I};{\mathbb{R}})\rightarrow[0,\infty),\qquad(\omega_{1},\omega_{2})\mapsto\rho_{{\mathbb{R}}}(\omega_{1},\omega_{2})=\sum_{R=1}^{\infty}\frac{1}{2^{R}}\sup_{x\in\overline{B}_{R}(0)}\xi(|\omega_{1}(x)-\omega_{2}(x)|_{{\mathbb{R}}}). (28)
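For intuition, the metric (28) can be approximated numerically by truncating the series over $R$ and discretizing each closed ball. The Python sketch below assumes $I=1$ and $\xi(t)=\min(t,1)$; $\xi$ is specified earlier in the paper, so this particular choice is an assumption of the sketch.

import numpy as np

def rho_R(omega1, omega2, R_max=20, grid=2001):
    # Truncated approximation of the metric in (28) for I = 1,
    # assuming xi(t) = min(t, 1).
    total = 0.0
    for R in range(1, R_max + 1):
        x = np.linspace(-R, R, grid)          # discretized closed ball of radius R
        sup = float(np.max(np.abs(omega1(x) - omega2(x))))
        total += 2.0 ** (-R) * min(sup, 1.0)
    return total

# Example: omega_n(x) = sin(x + 1/n) converges to sin(x) uniformly on compacts.
for n in (1, 10, 100):
    print(n, rho_R(np.sin, lambda x, n=n: np.sin(x + 1.0 / n)))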

The Borel $\sigma$-field $\mathcal{G}:=\mathcal{B}(C({\mathbb{R}}^{I};{\mathbb{R}}),\rho_{{\mathbb{R}}})$ is generated by the evaluation maps $\pi_{x}$; thus it coincides with the product $\sigma$-field, i.e. $\mathcal{G}=\sigma(\Gamma)$, where

\Gamma=\big{\{}\Gamma_{x^{(1)},\dots,x^{(k)}}(A)\,|\,A=A_{x^{(1)}}\times\dots\times A_{x^{(k)}},\ A_{x^{(j)}}\in\mathcal{B}({\mathbb{R}}),\ x^{(j)}\in{\mathbb{R}}^{I},\ j\in\{1,\dots,k\},\ k\in{\mathbb{N}}\big{\}}

where $\Gamma_{x^{(1)},\dots,x^{(k)}}(A)=\big{\{}\omega\in C({\mathbb{R}}^{I};{\mathbb{R}})\,|\,\omega(x^{(1)})\in A_{x^{(1)}},\dots,\omega(x^{(k)})\in A_{x^{(k)}}\big{\}}$. Note that $\sigma(\Gamma)\subset\mathcal{B}({\mathbb{R}}^{{\mathbb{R}}^{I}})$, hence $\mathcal{G}=\sigma(\Gamma)\subset\mathcal{B}({\mathbb{R}}^{{\mathbb{R}}^{I}})$.

Theorem 3.

There exists a unique probability measure ${\mathbb{P}}^{Z}$ on $(C({\mathbb{R}}^{I};{\mathbb{R}}),\mathcal{G})$ such that the canonical process $Z^{\prime}$ restricted to $(C({\mathbb{R}}^{I};{\mathbb{R}}),\mathcal{G})$ has finite-dimensional distributions that coincide with those of $Z$.

For the existence of ${\mathbb{P}}^{Z}$, consider the following lemma.

Lemma 6.

Let $(Z_{x})_{x\in{\mathbb{R}}^{I}}$ be an ${\mathbb{R}}$-valued continuous stochastic process defined on $(\Omega,\mathcal{H},{\mathbb{P}})$. Then

\mathcal{Z}:\Omega\rightarrow C({\mathbb{R}}^{I};{\mathbb{R}}),\qquad\omega\mapsto\mathcal{Z}(\omega)=(Z_{x}(\omega))_{x\in{\mathbb{R}}^{I}}

is a random variable, i.e. measurable from $(\Omega,\mathcal{H})$ into $(C({\mathbb{R}}^{I};{\mathbb{R}}),\mathcal{G})$.

Proof.

By the previous proposition $\mathcal{G}=\sigma(\Gamma)$, so it suffices to check measurability on the generating class $\Gamma$. Take $\mathcal{O}=\Gamma_{x^{(1)},\dots,x^{(k)}}(A)\in\Gamma$ for some $k\in{\mathbb{N}}$, $\{x^{(1)},\dots,x^{(k)}\}\subset{\mathbb{R}}^{I}$ and $A=A_{x^{(1)}}\times\dots\times A_{x^{(k)}}$, $A_{x^{(j)}}\in\mathcal{B}({\mathbb{R}})$; we get

\begin{split}\{\omega\in\Omega\,|\,\mathcal{Z}(\omega)\in\mathcal{O}\}&=\{\omega\in\Omega\,|\,Z_{x^{(1)}}(\omega)\in A_{x^{(1)}},\dots,Z_{x^{(k)}}(\omega)\in A_{x^{(k)}}\}\\ &=\bigcap_{j=1}^{k}\{Z_{x^{(j)}}\in A_{x^{(j)}}\}\in\mathcal{H}\end{split}

where we used that the $Z_{x^{(j)}}$ are random variables from $(\Omega,\mathcal{H})$ into $({\mathbb{R}},\mathcal{B}({\mathbb{R}}))$. ∎

Then we can define a probability measure ${\mathbb{P}}^{Z}$ on $(C({\mathbb{R}}^{I};{\mathbb{R}}),\mathcal{G})$ as the image measure of ${\mathbb{P}}$ under $\mathcal{Z}$, that is

\forall\mathcal{O}\in\mathcal{G},\quad{\mathbb{P}}^{Z}(\mathcal{O})={\mathbb{P}}(\mathcal{Z}\in\mathcal{O}).

Now we prove that the finite-dimensional distributions of $Z^{\prime}$ coincide with those of $Z$. It is sufficient to prove the following

Lemma 7.

${\mathbb{P}}^{Z}$ coincides with the image measure of the canonical process $Z^{\prime}$ under ${\mathbb{P}}^{\prime}$ restricted to $(C({\mathbb{R}}^{I};{\mathbb{R}}),\mathcal{G})$.

Proof.

Fix $\mathcal{O}=\Gamma_{x^{(1)},\dots,x^{(k)}}(A)\in\Gamma$ for some $k\in{\mathbb{N}}$, $\{x^{(1)},\dots,x^{(k)}\}\subset{\mathbb{R}}^{I}$ and $A=A_{x^{(1)}}\times\dots\times A_{x^{(k)}}$, $A_{x^{(j)}}\in\mathcal{B}({\mathbb{R}})$; since $\Gamma$ is a $\pi$-system generating $\mathcal{G}$, it suffices to check the equality on such sets. By definition of ${\mathbb{P}}^{Z}$,

\begin{split}{\mathbb{P}}^{Z}(\mathcal{O})&={\mathbb{P}}(\mathcal{Z}\in\Gamma_{x^{(1)},\dots,x^{(k)}}(A))\\ &={\mathbb{P}}(\{\omega\in\Omega\,|\,\mathcal{Z}(\omega)\in\mathcal{O}\})\\ &={\mathbb{P}}(\{\omega\in\Omega\,|\,Z_{x^{(1)}}(\omega)\in A_{x^{(1)}},\dots,Z_{x^{(k)}}(\omega)\in A_{x^{(k)}}\})\\ &=P^{Z}_{x^{(1)},\dots,x^{(k)}}(A)\end{split}

By the Daniell-Kolmogorov extension theorem the finite-dimensional distributions of $Z$ coincide with those of the canonical process $Z^{\prime}$ under ${\mathbb{P}}^{\prime}$, hence $P^{Z}_{x^{(1)},\dots,x^{(k)}}(A)={\mathbb{P}}^{\prime}(Z^{\prime}\in\mathcal{O})$. ∎

The uniqueness of ${\mathbb{P}}^{Z}$ follows from the uniqueness of ${\mathbb{P}}^{\prime}$.

SM E.4: $\sigma(C({\mathbb{R}}^{I};{\mathbb{R}}^{\infty}))\subset\sigma(\times_{i=1}^{\infty}C({\mathbb{R}}^{I};{\mathbb{R}}))$

First, note that $\times_{i=1}^{\infty}C({\mathbb{R}}^{I};{\mathbb{R}})\simeq C({\mathbb{R}}^{I};{\mathbb{R}}^{\infty})$; indeed, the map

\Xi:C({\mathbb{R}}^{I};{\mathbb{R}}^{\infty})\rightarrow\times_{i=1}^{\infty}C({\mathbb{R}}^{I};{\mathbb{R}}),\qquad\omega\mapsto(\omega_{1},\omega_{2},\dots)

is an isomorphism, since it is linear and bijective: $\omega$ is continuous if and only if each component $\omega_{i}$ is continuous. This means that each element of one space can be regarded as an element of the other and vice versa, but the two spaces carry different topologies. We now prove that the $\sigma$-algebra generated by the topology of uniform convergence on compact sets on $C({\mathbb{R}}^{I};{\mathbb{R}}^{\infty})$ is contained in the $\sigma$-algebra generated by the product topology on $\times_{i=1}^{\infty}C({\mathbb{R}}^{I};{\mathbb{R}})$. For $f,g\in C({\mathbb{R}}^{I};{\mathbb{R}}^{\infty})$ we consider the following distances:

\left\{\begin{array}[]{@{}l@{}}\rho_{prod}(f,g)=\sum_{i=1}^{\infty}\frac{1}{2^{i}}\xi\Big{(}\sum_{R=1}^{\infty}\frac{1}{2^{R}}\sup_{x\in\overline{B}_{R}(0)}\xi(|f_{i}(x)-g_{i}(x)|)\Big{)},\quad\text{ on }\times_{i=1}^{\infty}C({\mathbb{R}}^{I};{\mathbb{R}}),\\ \rho_{unif}(f,g)=\sum_{R=1}^{\infty}\frac{1}{2^{R}}\sup_{x\in\overline{B}_{R}(0)}\xi\Big{(}\sum_{i=1}^{\infty}\frac{1}{2^{i}}\xi(|f_{i}(x)-g_{i}(x)|)\Big{)},\quad\text{ on }C({\mathbb{R}}^{I};{\mathbb{R}}^{\infty}).\end{array}\right. (31)

Using that $\xi$ is increasing and continuous and that $\sup_{x}(\sum_{i}h_{i}(x))\leq\sum_{i}\sup_{x}h_{i}(x)$, it can be proved that there exists a constant $C>0$ such that $\rho_{unif}(f,g)\leq C\rho_{prod}(f,g)$ for all $f,g$. This means that if $h\in B^{prod}_{\epsilon}(f)=\{g:\rho_{prod}(f,g)<\epsilon\}$ then $h\in B^{unif}_{C\epsilon}(f)=\{g:\rho_{unif}(f,g)<C\epsilon\}$, that is $B^{prod}_{\epsilon}(f)\subset B^{unif}_{C\epsilon}(f)$. Hence every $\rho_{unif}$-open set is $\rho_{prod}$-open, which implies $\sigma(\rho_{unif})\subset\sigma(\rho_{prod})$. In particular, each compact set with respect to $\rho_{prod}$ is compact with respect to $\rho_{unif}$: given a $\rho_{prod}$-compact $K$, for every sequence $(k_{i})\subset K$ there exist a subsequence $(k_{i_{j}})\subset K$ and $k\in K$ such that $\rho_{prod}(k_{i_{j}},k)\rightarrow 0$; moreover $\rho_{unif}(k_{i_{j}},k)\leq C\rho_{prod}(k_{i_{j}},k)\rightarrow 0$, i.e. $K$ is compact with respect to $\rho_{unif}$.
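The inequality $\rho_{unif}\leq C\rho_{prod}$ can be sanity-checked numerically. The Python sketch below truncates both infinite series and discretizes the balls (so it is only an approximation, with $I=1$), and assumes $\xi(t)=\min(t,1)$, a choice under which the argument above in fact goes through with $C=1$; both the truncation and the choice of $\xi$ are assumptions of the sketch.

import numpy as np

xi = lambda t: np.minimum(t, 1.0)            # assumed choice of xi

def metrics(f, g, R_max=10, i_max=10, grid=501):
    # Truncated rho_prod and rho_unif from (31), for I = 1;
    # f(x, i) and g(x, i) return the i-th component evaluated on a grid x.
    wR = 2.0 ** -np.arange(1, R_max + 1)
    wi = 2.0 ** -np.arange(1, i_max + 1)
    H, rho_unif = [], 0.0
    for R in range(1, R_max + 1):
        x = np.linspace(-R, R, grid)
        h = np.array([xi(np.abs(f(x, i) - g(x, i))) for i in range(1, i_max + 1)])
        H.append(h)                           # shape (i_max, grid)
        rho_unif += wR[R - 1] * float(np.max(xi(wi @ h)))
    rho_prod = sum(wi[i] * float(xi(sum(wR[R] * float(np.max(H[R][i]))
                                        for R in range(R_max))))
                   for i in range(i_max))
    return rho_prod, rho_unif

f = lambda x, i: np.sin(i * x)
g = lambda x, i: np.sin(i * x + 1.0 / i)
p, u = metrics(f, g)
print(p, u, u <= p)                           # rho_unif <= rho_prod (C = 1 here)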

SM F

In this section we prove Proposition 1.

Proof.

By Proposition 16.6 of Kallenberg (2002), $f(n)\overset{d}{\rightarrow}f$ in $C({\mathbb{R}}^{I};S)$ iff $f(n)\overset{d}{\rightarrow}f$ in $C(K;S)$ for every compact $K\subset{\mathbb{R}}^{I}$. By Lemma 16.2 of Kallenberg (2002) the latter holds iff $f(n)\overset{f_{d}}{\rightarrow}f$ and $(f(n))_{n\geq 1}$ is relatively compact in distribution in $C(K;S)$. Note that convergence of the finite-dimensional distributions holds on ${\mathbb{R}}^{I}$ iff it holds on the restriction to $K$ for every compact $K\subset{\mathbb{R}}^{I}$. The space $(C(K;S),\rho_{K})$, namely the space of continuous functions from a compact $K\subset{\mathbb{R}}^{I}$ to a Polish space $S$, endowed with the uniform metric $\rho_{K}(f,g)=\sup_{x\in K}d(f(x),g(x))$, is itself a Polish space (Aliprantis & Border, 2006, Lemma 3.97 and Lemma 3.99). Thus, by Prohorov's theorem (Kallenberg, 2002, Proposition 16.3), on $C(K;S)$ the sequence $(f(n))_{n\geq 1}$ is relatively compact in distribution iff it is uniformly tight. So far we have shown that $f(n)\stackrel{d}{\rightarrow}f$ in $C({\mathbb{R}}^{I};S)$ iff: i) $f(n)\stackrel{f_{d}}{\rightarrow}f$, and ii) the sequence $(f(n))_{n\geq 1}$ is uniformly tight on $C(K;S)$ for every compact $K\subset{\mathbb{R}}^{I}$. It remains to show that ii) holds if $(f(n))_{n\geq 1}$ is uniformly tight on $C({\mathbb{R}}^{I};S)$. Fix a compact $K\subset{\mathbb{R}}^{I}$ and consider the map

\pi_{K}:(C({\mathbb{R}}^{I};S),\rho_{S})\rightarrow(C(K;S),\rho_{K}),\quad f\mapsto f_{|K}

where $f_{|K}$ is the restriction of $f$ to $K$, and $\rho_{S}$ is the metric $\rho_{{\mathbb{R}}}$ defined in (28) when $S={\mathbb{R}}$ and $\rho_{unif}$ defined in (31) when $S={\mathbb{R}}^{\infty}$. By Proposition 16.4 of Kallenberg (2002), if $\pi_{K}$ is continuous then it maps uniformly tight sequences to uniformly tight sequences. The continuity of $\pi_{K}$ follows from the proof of Proposition 16.6 of Kallenberg (2002). ∎
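As an empirical companion to condition i), finite-dimensional convergence can be observed by simulation: the law of a second-layer unit at a fixed input approaches a Gaussian as the width $n$ grows. The Python sketch below uses a two-layer network with re-scaled weights of the form considered in the paper; standard Gaussian weights and biases and $\phi=\tanh$ are illustrative assumptions, and the Kolmogorov-Smirnov statistic is computed against a Gaussian fitted to the sample.

import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
I = 2
x = np.array([0.3, -1.2])                    # one fixed input in R^I

def f2(n, n_draws=5_000):
    # f_1^{(2)}(x) for width n, over independent draws of all weights and biases.
    W1 = rng.normal(size=(n_draws, n, I))
    b1 = rng.normal(size=(n_draws, n))
    h = np.tanh(W1 @ x + b1)                 # phi(f^{(1)}(x))
    w2 = rng.normal(size=(n_draws, n))
    b2 = rng.normal(size=n_draws)
    return (w2 * h).sum(axis=1) / np.sqrt(n) + b2

for n in (5, 50, 500):
    s = f2(n)
    z = (s - s.mean()) / s.std()
    print(n, stats.kstest(z, "norm").statistic)   # should generally shrink with n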