
Preintegration is not smoothing when monotonicity fails

Alexander D. Gilbert, Frances Y. Kuo and Ian H. Sloan
School of Mathematics and Statistics, UNSW Sydney, Sydney NSW 2052, Australia
Emails: [email protected], [email protected], [email protected]
Abstract

Preintegration is a technique for high-dimensional integration over d-dimensional Euclidean space, which is designed to reduce an integral whose integrand contains kinks or jumps to a (d-1)-dimensional integral of a smooth function. The resulting smoothness allows efficient evaluation of the (d-1)-dimensional integral by a Quasi-Monte Carlo or Sparse Grid method. The technique is similar to conditional sampling in statistical contexts, but the intention is different: in conditional sampling the aim is to reduce the variance, rather than to achieve smoothness. Preintegration involves an initial integration with respect to one well chosen real-valued variable. Griebel, Kuo, Sloan [Math. Comp. 82 (2013), 383–400] and Griewank, Kuo, Leövey, Sloan [J. Comput. Appl. Math. 344 (2018), 259–274] showed that the resulting (d-1)-dimensional integrand is indeed smooth under appropriate conditions, including a key assumption: the smooth function underlying the kink or jump must be strictly monotone with respect to the chosen special variable when all other variables are held fixed. The question addressed in this paper is whether this monotonicity property with respect to one well chosen variable is necessary. We show here that the answer is essentially yes, in the sense that without this property the resulting (d-1)-dimensional integrand is generally not smooth, having square-root or other singularities.

1 Introduction

Preintegration is a method for numerical integration over {\mathbb{R}}^{d}, where d may be large, in the presence of “kinks” (i.e., discontinuities in the gradients) or “jumps” (i.e., discontinuities in the function values). In this method one of the variables is integrated out in a “preintegration” step, with the aim of creating a smooth integrand over {\mathbb{R}}^{d-1}, smoothness being important if the intention is to approximate the (d-1)-dimensional integral by a method that relies on some smoothness of the integrand, such as the Quasi-Monte Carlo (QMC) method [5] or Sparse Grid (SG) method [4].

Integrands with kinks and jumps arise in option pricing, because an option is normally considered worthless if the value falls below a predetermined strike price. In the case of a continuous payoff function this introduces a kink, while in the case of a binary or other digital option it introduces a jump. Integrands with jumps also arise in computations of cumulative probability distributions, see [6].

In this paper we consider the version of preintegration for functions with kinks or jumps presented in the recent papers [10, 11, 12], in which the emphasis was on a rigorous proof of smoothness of the preintegrated (d-1)-dimensional integrand, in the sense of proving membership of a certain mixed derivative Sobolev space, under appropriate conditions.

A key assumption in [10, 11, 12] was that the smooth function (the function \phi in (2) below) underlying the kink or jump is strictly monotone with respect to the special variable chosen for the preintegration step, when all other variables are held fixed. While a satisfactory analysis was obtained under that assumption, it was not clear from the analysis in [10, 11, 12] whether or not the monotonicity assumption is in some sense necessary. That is the question we address in the present paper. The short answer is that the monotonicity condition is necessary, in that in the absence of monotonicity the integrand typically has square-root or other singularities.

1.1 Related work

A similar method has already appeared as a practical tool in many other papers, often under the heading “conditional sampling”, see [8], Lemma 7.2 and preceding comments in [1], and a recent paper [15] by L’Ecuyer and colleagues. Also relevant are root-finding strategies for identifying where the payoff is positive, see a remark in [2] and [13, 17]. For other “smoothing” methods, see [3, 18].

The goal in conditional sampling is to decrease the variance of the integrand, motivated by the idea that if the Monte Carlo method is the chosen method for evaluating the integral then reducing the variance will certainly reduce the root mean square expected error. The reality of variance reduction in the preintegration context was explored analytically in Section 4 of [12]. But if cubature methods are used that depend on smoothness of the integrand, as with QMC and SG methods, then variance reduction is not the only consideration. In the present work the focus is on smoothness of the resulting integrand.

1.2 The problem

For the rest of the paper we will follow the setting of [12]. The problem addressed in [12] was the approximate evaluation of the d-dimensional integral

I_{d}f\,:=\,\int_{{\mathbb{R}}^{d}}f({\boldsymbol{y}})\,\rho_{d}({\boldsymbol{y}})\,{\mathrm{d}}{\boldsymbol{y}}\,=\,\int_{-\infty}^{\infty}\ldots\int_{-\infty}^{\infty}f(y_{1},\ldots,y_{d})\,\rho_{d}({\boldsymbol{y}})\,{\mathrm{d}}y_{1}\cdots{\mathrm{d}}y_{d}, (1)

with

\rho_{d}({\boldsymbol{y}})\,:=\,\prod_{k=1}^{d}\rho(y_{k}),

where \rho is a continuous and strictly positive probability density function on {\mathbb{R}} with some smoothness, and f is a real-valued function of the form

f({\boldsymbol{y}})\,=\,\theta({\boldsymbol{y}})\,\mathop{\rm ind}\big{(}\phi({\boldsymbol{y}})\big{)}, (2)

or more generally

f({\boldsymbol{y}})\,=\,\theta({\boldsymbol{y}})\,\mathop{\rm ind}\big{(}\phi({\boldsymbol{y}})-t\big{)}, (3)

where \theta and \phi are somewhat smooth functions, \mathop{\rm ind}(\cdot) is the indicator function which has the value 1 if the argument is positive and 0 otherwise, and t is an arbitrary real number. When t=0 and \theta=\phi we have f({\boldsymbol{y}})=\max(\phi({\boldsymbol{y}}),0) and thus we have the familiar kink seen in option pricing through the occurrence of a strike price. When \theta and \phi are different (for example, when \theta({\boldsymbol{y}})=1) we have a structure that includes digital options.

The key assumption on the smooth function \phi in [12] was that it has a positive partial derivative with respect to some well chosen variable y_{j} (and so is an increasing function of y_{j}); that is, we assume that for the special choice of j\in\{1,\ldots,d\} we have

\frac{\partial\phi}{\partial y_{j}}({\boldsymbol{y}})>0\qquad\mbox{for all}\quad{\boldsymbol{y}}\in{\mathbb{R}}^{d}. (4)

In other words, \phi is monotone increasing with respect to y_{j} when all variables other than y_{j} are held fixed.

With the variable y_{j} chosen to satisfy this condition, the preintegration step is to evaluate

(P_{j}f)({\boldsymbol{y}}_{-j})\,:=\,\int_{-\infty}^{\infty}f(y_{j},{\boldsymbol{y}}_{-j})\,\rho(y_{j})\,{\mathrm{d}}y_{j}, (5)

where {\boldsymbol{y}}_{-j}\in{\mathbb{R}}^{d-1} denotes all the components of {\boldsymbol{y}} other than y_{j}. Once (P_{j}f)({\boldsymbol{y}}_{-j}) is known we can evaluate I_{d}f as the (d-1)-dimensional integral

I_{d}f\,=\,\int_{{\mathbb{R}}^{d-1}}(P_{j}f)({\boldsymbol{y}}_{-j})\,\rho_{d-1}({\boldsymbol{y}}_{-j})\,{\mathrm{d}}{\boldsymbol{y}}_{-j}, (6)

which can be done efficiently if (P_{j}f)({\boldsymbol{y}}_{-j}) is smooth. In the implementation of preintegration, note that if the integral (6) is to be evaluated by an N-point cubature rule, then the preintegration step in (5) needs to be carried out for N different values of {\boldsymbol{y}}_{-j}.

The key is the preintegration step. Because of the monotonicity assumption (4), for each {\boldsymbol{y}}_{-j}\in{\mathbb{R}}^{d-1} there is at most one value of the integration variable y_{j} such that \phi(y_{j},{\boldsymbol{y}}_{-j})=t. We denote that value of y_{j}, if it exists, by \xi({\boldsymbol{y}}_{-j})=\xi_{t}({\boldsymbol{y}}_{-j}), so that \phi(\xi({\boldsymbol{y}}_{-j}),{\boldsymbol{y}}_{-j})=t. Under the condition (4) it follows from the implicit function theorem that \xi({\boldsymbol{y}}_{-j}) is smooth if \phi is smooth. Then we can write the preintegration step as

(P_{j}f)({\boldsymbol{y}}_{-j})\,=\,\int_{\xi({\boldsymbol{y}}_{-j})}^{\infty}\theta\big{(}y_{j},{\boldsymbol{y}}_{-j}\big{)}\,\rho(y_{j})\,{\mathrm{d}}y_{j}, (7)

which is a smooth function of {\boldsymbol{y}}_{-j} if \theta is smooth.
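To make the workflow concrete, the following Python sketch (an illustration only, with hypothetical helper names such as phi and preintegrate) carries out the preintegration step (7) and the outer integral (6) in the special case \theta\equiv 1 with a standard normal density: for each outer point {\boldsymbol{y}}_{-j}, the location \xi({\boldsymbol{y}}_{-j}) is found by root-finding using monotonicity, and the inner integral reduces to 1-\Phi(\xi({\boldsymbol{y}}_{-j})). The outer integral is sampled here by plain Monte Carlo; in the paper a QMC rule would be used instead.

```python
# Minimal sketch of preintegration (assumes theta = 1, standard normal density,
# and phi strictly increasing in the chosen variable y_j).
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

def phi(y):                      # example: phi(y) = y_2 - y_1^2 (paper's j = 2, Python index 1)
    return y[1] - y[0] ** 2

def preintegrate(phi, y_minus_j, j, t=0.0, bracket=50.0):
    """(P_j f)(y_{-j}) = 1 - Phi(xi), where phi(xi, y_{-j}) = t, using monotonicity in y_j."""
    def phi_slice(yj):
        y = np.insert(y_minus_j, j, yj)
        return phi(y) - t
    lo, hi = -bracket, bracket
    if phi_slice(hi) <= 0.0:     # phi < t on the whole (truncated) line: integrand is 0
        return 0.0
    if phi_slice(lo) >= 0.0:     # phi > t everywhere: integrand is 1
        return 1.0
    xi = brentq(phi_slice, lo, hi)   # unique root by monotonicity
    return 1.0 - norm.cdf(xi)

# Outer (d-1)-dimensional integral (6), here with plain Monte Carlo sampling.
rng = np.random.default_rng(0)
d, j, N = 2, 1, 4096
samples = rng.standard_normal((N, d - 1))
estimate = np.mean([preintegrate(phi, y, j) for y in samples])
print("I_d f approx", estimate)
```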

1.3 Informative examples

We now illustrate the success and failure of the preintegration process with some simple examples. In these examples we take d=2 and t=0, and choose \rho to be the standard normal probability density, \rho(y)=\exp(-y^{2}/2)/\sqrt{2\pi}. We also initially take \theta(y_{1},y_{2})=1, and comment on other choices at the end of the section.

Figure 1: Illustrations for Example 1.
Example 1.

In this example we take

\phi(y_{1},y_{2})\,=\,y_{2}-y_{1}^{2},

see Figure 1 (left). The zero set of this function is the parabolic curve y_{2}=y_{1}^{2}, see Figure 1 (middle). The positivity set of \phi (i.e., the set for which f({\boldsymbol{y}}) defined by (2) is non-zero) is the open region above the parabola.

If we take the special variable to be y_{2} (i.e., if we take j=2) then the monotonicity condition (4) is satisfied, and the preintegration step is truly smoothing: specifically, we see that

(P_{2}f)(y_{1})\,=\,\int_{y_{1}^{2}}^{\infty}\rho(y_{2})\,{\mathrm{d}}y_{2}\,=\,1-\Phi(y_{1}^{2}),

where \Phi(x):=\int_{-\infty}^{x}\rho(y)\,{\mathrm{d}}y is the standard normal cumulative distribution. Thus (P_{2}f)(y_{1}) is a smooth function for all y_{1}\in{\mathbb{R}}, and I_{2}f is the integral of a smooth integrand over the real line,

I_{2}f\,=\,\int_{-\infty}^{\infty}(P_{2}f)(y_{1})\,\rho(y_{1})\,{\mathrm{d}}y_{1}\,=\,\int_{-\infty}^{\infty}\big{(}1-\Phi(y_{1}^{2})\big{)}\,\rho(y_{1})\,{\mathrm{d}}y_{1}.

If on the other hand we take the special variable to be y_{1} (i.e., take j=1) then we have

(P_{1}f)(y_{2})\,=\,\begin{cases}0&\mbox{if }y_{2}\leq 0,\\ \displaystyle\int_{-\sqrt{y_{2}}}^{\sqrt{y_{2}}}\rho(y_{1})\,{\mathrm{d}}y_{1}\,=\,\Phi(\sqrt{y_{2}})-\Phi(-\sqrt{y_{2}})&\mbox{if }y_{2}>0.\end{cases}

The graph of (P_{1}f)(y_{2}), shown in Figure 1 (right), reveals that there is a singularity at y_{2}=0. To see the nature of the singularity, note that since \rho(y_{1})=\rho(0)\exp(-y_{1}^{2}/2)=\rho(0)+{\mathcal{O}}(y_{1}^{2}) as y_{1}\to 0, we can write

(P_{1}f)(y_{2})\,=\,\begin{cases}0&\mbox{if }y_{2}\leq 0,\\ \displaystyle\int_{-\sqrt{y_{2}}}^{\sqrt{y_{2}}}\rho(y_{1})\,{\mathrm{d}}y_{1}\,=\,2\sqrt{y_{2}}\,\rho(0)+{\mathcal{O}}\big{(}y_{2}^{3/2}\big{)}&\mbox{if }y_{2}>0.\end{cases} (8)

Thus in this simple example (P_{1}f)(y_{2}) is not at all a smooth function of y_{2}, having a square-root singularity, and hence an infinite one-sided derivative, at y_{2}=0.
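As a numerical illustration of Example 1, the short sketch below (hypothetical code, not taken from the paper) evaluates both P_{2}f and P_{1}f in closed form and checks the predicted leading behaviour 2\sqrt{y_{2}}\,\rho(0) of P_{1}f just to the right of y_{2}=0.

```python
# Minimal check of Example 1: P_2 f is smooth, while P_1 f behaves like
# 2*sqrt(y_2)*rho(0) just to the right of y_2 = 0 (equation (8)).
import numpy as np
from scipy.stats import norm

rho0 = 1.0 / np.sqrt(2.0 * np.pi)      # standard normal density at 0

def P2f(y1):
    # preintegration over y_2 (monotone direction): smooth in y_1
    return 1.0 - norm.cdf(y1 ** 2)

def P1f(y2):
    # preintegration over y_1 (non-monotone direction): square-root singularity at 0
    if y2 <= 0.0:
        return 0.0
    s = np.sqrt(y2)
    return norm.cdf(s) - norm.cdf(-s)

print("P2f near 0:", [round(P2f(y1), 6) for y1 in (-0.01, 0.0, 0.01)])
for y2 in (1e-2, 1e-4, 1e-6):
    print(f"y2 = {y2:7.0e}   P1f = {P1f(y2):.6e}   2*sqrt(y2)*rho(0) = {2*np.sqrt(y2)*rho0:.6e}")
```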

Figure 2: Illustrations for Example 2.
Example 2.

In this example we take

\phi(y_{1},y_{2})\,=\,y_{2}^{2}-y_{1}^{2}-1,

see Figure 2 (left). The zero set of \phi is now the hyperbola y_{2}^{2}=y_{1}^{2}+1, see Figure 2 (middle), and the positivity set is the union of the open regions above and below the upper and lower branches respectively. Taking j=1, we see that monotonicity again fails, and that specifically

(P_{1}f)(y_{2})\,=\,\begin{cases}0&\mbox{if }y_{2}\in[-1,1],\\ \displaystyle\int_{-\sqrt{y_{2}^{2}-1}}^{\sqrt{y_{2}^{2}-1}}\rho(y_{1})\,{\mathrm{d}}y_{1}\,=\,2\sqrt{y_{2}^{2}-1}\,\rho(0)+{\mathcal{O}}\big{(}(y_{2}^{2}-1)^{3/2}\big{)}&\mbox{if }|y_{2}|>1,\end{cases}

the graph of which is shown in Figure 2 (right). Again we see square-root singularities, this time two of them.

Example 3.

Here we take

\phi(y_{1},y_{2})\,=\,y_{2}^{2}-y_{1}^{2},

see Figure 3 (left). The level set is now the pair of lines y_{2}=\pm y_{1}, see Figure 3 (middle), and the positivity set is the open region above and below the crossed lines. This time P_{1}f is given by

(P_{1}f)(y_{2})\,=\,\int_{-|y_{2}|}^{|y_{2}|}\rho(y_{1})\,{\mathrm{d}}y_{1}\,=\,2|y_{2}|\,\rho(0)+{\mathcal{O}}\big{(}|y_{2}|^{3}\big{)},

revealing in Figure 3 (right) a different kind of singularity (a simple discontinuity in the first derivative), but one still unfavorable for numerical integration.

Example 3 is rather special, in that the preintegration is performed on a line that touches a saddle at its critical point (the “flat point” of the saddle). Example 4 below illustrates another situation, one that is in some ways similar to Example 1, but one perhaps less likely to be seen in practice.

Figure 3: Illustrations for Example 3.
Figure 4: Illustrations for Example 4.
Example 4.

Here we consider

\phi(y_{1},y_{2})\,=\,y_{1}^{3}-y_{2},

see Figure 4 (left). The zero level set of \phi is the graph of y_{2}=y_{1}^{3}, see Figure 4 (middle), and the positivity set is the unbounded domain to the right of the curve. We see that

(P_{1}f)(y_{2})\,=\,\int_{-\infty}^{y_{2}^{1/3}}\rho(y_{1})\,{\mathrm{d}}y_{1}\,=\,\int_{-\infty}^{0}\rho(y_{1})\,{\mathrm{d}}y_{1}+\int_{0}^{y_{2}^{1/3}}\rho(y_{1})\,{\mathrm{d}}y_{1}\,=\,\frac{1}{2}+y_{2}^{1/3}\rho(0)+{\mathcal{O}}\big{(}|y_{2}|\big{)},

which holds regardless of the sign of y_{2}. The graph of P_{1}f in Figure 4 (right) displays the cube-root singularity at y_{2}=0.

In each of the above examples we took \theta(y_{1},y_{2})=1. Other choices for \theta are generally not interesting, as they do not affect the nature of the singularity. An exception is the choice \theta(y_{1},y_{2})=\phi(y_{1},y_{2}), which yields a kink rather than a jump because

\phi({\boldsymbol{y}})\,\mathop{\rm ind}(\phi({\boldsymbol{y}}))\,=\,\max(\phi({\boldsymbol{y}}),0),

and so leads to a weaker singularity. For example, for f(y_{1},y_{2})=\max(\phi(y_{1},y_{2}),0) with \phi as in Example 1, we obtain instead of (8)

(P_{1}f)(y_{2})\,=\,\begin{cases}0&\mbox{if }y_{2}\leq 0,\\ \displaystyle\int_{-\sqrt{y_{2}}}^{\sqrt{y_{2}}}(y_{2}-y_{1}^{2})\,\rho(y_{1})\,{\mathrm{d}}y_{1}\,=\,\frac{4}{3}\,y_{2}^{3/2}\,\rho(0)+{\mathcal{O}}\big{(}y_{2}^{5/2}\big{)}&\mbox{if }y_{2}>0.\end{cases}

With the recognition that kinks lead to less severe singularities than jumps, but located at the same places, from now on we shall for simplicity consider only the case \theta({\boldsymbol{y}})=1.

1.4 Outline of this paper

In Section 2 we study theoretically the smoothness of the preintegrated function, assuming that the original d-variate function is f({\boldsymbol{y}})=\mathop{\rm ind}(\phi({\boldsymbol{y}})-t), with \phi smooth but not monotone. We prove that the behavior seen in the above informative examples is typical. Section 3 contains a numerical experiment for a high-dimensional integrand that allows both monotone and non-monotone choices for the preintegrated variable. Section 4 gives brief conclusions.

2 Smoothness theorems in d dimensions

In the general d-dimensional setting we take \theta\equiv 1, and use the general form (3) with arbitrary t\in{\mathbb{R}}. Thus now we consider

f({\boldsymbol{y}})\,:=\,f_{t}({\boldsymbol{y}})\,:=\,\mathop{\rm ind}\big{(}\phi({\boldsymbol{y}})-t\big{)},\quad{\boldsymbol{y}}\in{\mathbb{R}}^{d}. (9)

A natural setting in which t can take any value is in the computation of the (complementary) cumulative probability distribution of a random variable X=\phi({\boldsymbol{y}}), as in [6]. In the case of option pricing, varying t corresponds to varying the strike price.

For simplicity in this section we shall take the special preintegration variable to be y_{1}, so fixing j=1. The question is then, assuming that \phi in (9) has smoothness at least C^{2}({\mathbb{R}}^{d}), whether or not P_{1}f_{t} given by

(P_{1}f_{t})({\boldsymbol{y}}_{-1})\,\coloneqq\,\int_{-\infty}^{\infty}f_{t}(y_{1},{\boldsymbol{y}}_{-1})\,\rho(y_{1})\,{\mathrm{d}}y_{1}\,=\,\int_{-\infty}^{\infty}\mathop{\rm ind}\big{(}\phi({\boldsymbol{y}})-t\big{)}\,\rho(y_{1})\,{\mathrm{d}}y_{1}

is a smooth function of {\boldsymbol{y}}_{-1}\in{\mathbb{R}}^{d-1}.

To gain a first insight into the role of the parameter t in (9), it is useful to observe that for the examples in Section 1.3 a variation in t can change the position and even the nature of the singularity in P_{1}f_{t}, but does not necessarily eliminate the singularity. For a general t\in{\mathbb{R}} and \phi as in Example 1, we easily find that (8) is replaced by

(P_{1}f_{t})(y_{2})\,=\,\begin{cases}0&\mbox{if }y_{2}\leq t,\\ \displaystyle\int_{-\sqrt{y_{2}-t}}^{\sqrt{y_{2}-t}}\rho(y_{1})\,{\mathrm{d}}y_{1}&\mbox{if }y_{2}>t,\end{cases}

so that the graph of P_{1}f_{t} is simply translated, with the singularity now occurring at y_{2}=t instead of y_{2}=0. The situation is the same for \phi as in Example 4.

For \phi as in Example 2, the choice t=-1 recovers Example 3, while for t>-1 we find

(P_{1}f_{t})(y_{2})\,=\,\begin{cases}0&\mbox{if }y_{2}\in[-\sqrt{1+t},\sqrt{1+t}\,],\\ \displaystyle\int_{-\sqrt{y_{2}^{2}-1-t}}^{\sqrt{y_{2}^{2}-1-t}}\rho(y_{1})\,{\mathrm{d}}y_{1}&\mbox{if }|y_{2}|>\sqrt{1+t}\,,\end{cases}

thus in this case (P_{1}f_{t})(y_{2}) has square-root singularities at y_{2}=\pm\sqrt{1+t}. For the case t<-1 (which we leave to the reader) P_{1}f_{t} has no singularity.

In [12] it was proved that P_{1}f_{t} has the same smoothness as \phi, provided that

\frac{\partial\phi}{\partial y_{1}}({\boldsymbol{y}})>0\qquad\mbox{for all}\quad{\boldsymbol{y}}\in{\mathbb{R}}^{d}, (10)

together with some other technical conditions, see [12, Theorems 2 and 3].

Here we are interested in the situation when \phi is not monotone increasing with respect to y_{1} for all {\boldsymbol{y}}_{-1}. In that case (unless \phi is always monotone decreasing) there is at least one point, say {\boldsymbol{y}}^{*}=(y_{1}^{*},{\boldsymbol{y}}_{-1}^{*})\in{\mathbb{R}}^{d}, at which (\partial\phi/\partial y_{1})({\boldsymbol{y}}^{*})=0. At such a point the gradient of \phi is either zero or orthogonal to the y_{1} axis. If t in (9) has the value t=\phi({\boldsymbol{y}}^{*}) then there is generically a singularity of some kind in P_{1}f_{t} at the point {\boldsymbol{y}}^{*}_{-1}\in{\mathbb{R}}^{d-1}. If t\neq\phi({\boldsymbol{y}}^{*}) then there is in general no singularity in P_{1}f_{t} at the point {\boldsymbol{y}}^{*}_{-1}\in{\mathbb{R}}^{d-1}, but if the application requires many values of t (as in computing a distribution function) then the risk of encountering a singularity, or a near-singularity, for some of those values is high.

The following theorem states a general result for the existence and the nature of the singularities induced in P_{1}f_{t} in the common situation in which the second derivative of \phi with respect to y_{1} is non-zero at {\boldsymbol{y}}={\boldsymbol{y}}^{*}, the point at which the first derivative with respect to y_{1} is zero.

Theorem 1.

Let \phi\in C^{2}({\mathbb{R}}^{d}), and assume that {\boldsymbol{y}}^{*}=(y_{1}^{*},{\boldsymbol{y}}_{-1}^{*})\in{\mathbb{R}}^{d} is such that

\frac{\partial\phi}{\partial y_{1}}({\boldsymbol{y}}^{*})=0,\quad\frac{\partial^{2}\phi}{\partial y_{1}^{2}}({\boldsymbol{y}}^{*})\neq 0,\quad\mbox{and}\quad\nabla\phi({\boldsymbol{y}}^{*})\neq{\boldsymbol{0}}. (11)

Define t:=\phi({\boldsymbol{y}}^{*}). Then (P_{1}f_{t})({\boldsymbol{y}}_{-1}) has a square-root singularity at {\boldsymbol{y}}^{*}_{-1}\in{\mathbb{R}}^{d-1}, similar to those in Examples 1 and 2, along any line in {\mathbb{R}}^{d-1} through {\boldsymbol{y}}^{*}_{-1} that is not orthogonal to \nabla_{\!-1}\phi({\boldsymbol{y}}^{*}):=((\partial\phi/\partial y_{2})({\boldsymbol{y}}^{*}),(\partial\phi/\partial y_{3})({\boldsymbol{y}}^{*}),\ldots,(\partial\phi/\partial y_{d})({\boldsymbol{y}}^{*})).

Proof.

Since \nabla\phi({\boldsymbol{y}}^{*}) is not zero, and has no component in the direction of the y_{1} axis, it follows that \nabla\phi({\boldsymbol{y}}^{*}) can be written as (0,\nabla_{\!-1}\phi({\boldsymbol{y}}^{*})), where \nabla_{\!-1}\phi({\boldsymbol{y}}^{*})=\nabla_{\!-1}\phi(y_{1}^{*},{\boldsymbol{y}}_{-1}^{*}) is a non-zero vector in {\mathbb{R}}^{d-1} orthogonal to the y_{1} axis. Note that as {\boldsymbol{y}}_{-1} changes in a neighborhood of {\boldsymbol{y}}_{-1}^{*}, the function \phi(y_{1}^{*},{\boldsymbol{y}}_{-1}) is increasing in the direction \nabla_{\!-1}\phi({\boldsymbol{y}}^{*}), and also in the direction of an arbitrary unit vector {\boldsymbol{z}} in {\mathbb{R}}^{d-1} that has a positive inner product with \nabla_{\!-1}\phi({\boldsymbol{y}}^{*}). Our aim now is to explore the nature of P_{1}f_{t} on the line through {\boldsymbol{y}}_{-1}^{*}\in{\mathbb{R}}^{d-1} that is parallel to any such unit vector {\boldsymbol{z}}.

For simplicity of presentation, and without loss of generality, we assume from now on that the unit vector {\boldsymbol{z}} points in the direction of the positive y_{2} axis. This allows us to write {\boldsymbol{y}}=(y_{1},y_{2},y^{*}_{3},\ldots,y^{*}_{d})=:(y_{1},y_{2}), temporarily ignoring all components other than the first two. In this 2-dimensional setting we know that

\frac{\partial\phi}{\partial y_{1}}(y_{1}^{*},y_{2}^{*})=0,\quad\frac{\partial\phi}{\partial y_{2}}(y_{1}^{*},y_{2}^{*})>0,\quad\mbox{and}\quad\frac{\partial^{2}\phi}{\partial y_{1}^{2}}(y_{1}^{*},y_{2}^{*})\neq 0.

Since (\partial\phi/\partial y_{2})(y_{1},y_{2}) is continuous and positive at (y_{1},y_{2})=(y_{1}^{*},y_{2}^{*}), it follows that (\partial\phi/\partial y_{2})(y_{1},y_{2}) is positive on the rectangle [y^{*}_{1}-\delta,y^{*}_{1}+\delta]\times[y^{*}_{2}-\delta,y^{*}_{2}+\delta] for sufficiently small \delta>0. Since \phi(y_{1},y_{2}) is increasing with respect to y_{2} on this rectangle, it follows that for each y_{1}\in[y^{*}_{1}-\delta,y^{*}_{1}+\delta] there is at most one value of y_{2} such that \phi(y_{1},y_{2})=t, and further, that for |y_{1}-y^{*}_{1}| sufficiently small there is exactly one value of y_{2} such that \phi(y_{1},y_{2})=t. For that unique value we write y_{2}=\zeta(y_{1}), hence by construction we have \phi(y_{1},\zeta(y_{1}))=t, and \zeta(y^{*}_{1})=y^{*}_{2}.

From the implicit function theorem (or by implicit differentiation of \phi(y_{1},\zeta(y_{1}))=t with respect to y_{1}) we obtain

\zeta^{\prime}(y_{1})\,=\,-\,\frac{(\partial\phi/\partial y_{1})(y_{1},\zeta(y_{1}))}{(\partial\phi/\partial y_{2})(y_{1},\zeta(y_{1}))}, (12)

in which the denominator is non-zero in a neighborhood of y^{*}_{1}. From this it follows that

\zeta^{\prime}(y^{*}_{1})\,=\,0.

Differentiating (12) using the product rule and the chain rule and then setting y_{1}=y^{*}_{1} (so that several terms vanish), we obtain

\zeta^{\prime\prime}(y^{*}_{1})\,=\,-\,\frac{(\partial^{2}\phi/\partial y_{1}^{2})(y^{*}_{1},y^{*}_{2})}{(\partial\phi/\partial y_{2})(y^{*}_{1},y^{*}_{2})},

which by assumption is non-zero. Below we assume that (\partial^{2}\phi/\partial y_{1}^{2})(y^{*}_{1},y^{*}_{2})<0, from which it follows that \zeta^{\prime\prime}(y^{*}_{1}) is positive; the case (\partial^{2}\phi/\partial y_{1}^{2})(y^{*}_{1},y^{*}_{2})>0 is similar. Taylor’s theorem with remainder gives

\zeta(y_{1})\,=\,\zeta(y^{*}_{1})+\tfrac{1}{2}(y_{1}-y^{*}_{1})^{2}\,\zeta^{\prime\prime}(y^{*}_{1})\,(1+o(1))\,=\,y^{*}_{2}+\tfrac{1}{2}(y_{1}-y^{*}_{1})^{2}\,\zeta^{\prime\prime}(y^{*}_{1})\,(1+o(1)), (13)

where o(1)\to 0 as |y_{1}-y^{*}_{1}|\to 0, thus \zeta(y_{1}) is a convex function in a neighborhood of y^{*}_{1}.

Given y_{2} in a neighborhood of y^{*}_{2}, our task now is to evaluate the contribution to the integral

(P_{1}f_{t})(y_{2})\,=\,\int_{-\infty}^{\infty}\mathop{\rm ind}\big{(}\phi(y_{1},y_{2})-t\big{)}\rho(y_{1})\,{\mathrm{d}}y_{1}

from a neighborhood of y^{*}_{1}. Thus we need to find the set of y_{1} values in a neighborhood of y^{*}_{1} for which \phi(y_{1},y_{2})>t. Because of the convexity of \zeta established in (13), the set is either empty, or is the open interval with extreme points given by the solutions of \zeta(y_{1})=y_{2}, i.e.,

y^{*}_{2}+\tfrac{1}{2}(y_{1}-y^{*}_{1})^{2}\,\zeta^{\prime\prime}(y^{*}_{1})\,(1+o(1))\,=\,y_{2},

implying

(y_{1}-y^{*}_{1})^{2}\,=\,\frac{2(y_{2}-y^{*}_{2})}{\zeta^{\prime\prime}(y^{*}_{1})\,(1+o(1))}\,=\,\frac{2(y_{2}-y^{*}_{2})}{\zeta^{\prime\prime}(y^{*}_{1})}(1+o(1)).

There is no solution for y_{2}<y^{*}_{2}, while for y_{2}>y^{*}_{2} the solutions are

y_{1}\,=\,y^{*}_{1}\pm c\sqrt{y_{2}-y^{*}_{2}}\,(1+o(1)),

with c:=\sqrt{2/\zeta^{\prime\prime}(y^{*}_{1})}. Thus the contribution to (P_{1}f_{t})(y_{2}) from the neighborhood of y^{*}_{1} is zero for y_{2}<y^{*}_{2}, and for y_{2}>y^{*}_{2} is

\int_{y^{*}_{1}-c\sqrt{y_{2}-y^{*}_{2}}\,(1+o(1))}^{y^{*}_{1}+c\sqrt{y_{2}-y^{*}_{2}}\,(1+o(1))}\rho(y_{1})\,{\mathrm{d}}y_{1}\,=\,2c\sqrt{y_{2}-y^{*}_{2}}\,\rho(y^{*}_{1})\,(1+o(1)).

Thus there is a singularity in P_{1}f_{t} of exactly the same character as that in Examples 1 and 2. ∎

Remark 1.

We now return to consider the examples in Section 1.3 in the context of Theorem 1; a small symbolic check of the conditions in (11) is sketched after this list.

  • For \phi as in Example 1, we have (\partial\phi/\partial y_{1})(y_{1},y_{2})=-2y_{1}, (\partial^{2}\phi/\partial y_{1}^{2})(y_{1},y_{2})=-2\neq 0, and \nabla\phi(y_{1},y_{2})=(-2y_{1},1)\neq(0,0). Thus (11) holds, e.g., with {\boldsymbol{y}}^{*}=(0,0) and t=\phi({\boldsymbol{y}}^{*})=0, and P_{1}f_{t} indeed displays the predicted square-root singularity at y_{2}=0, see Figure 1.

  • For \phi as in Example 2, we have (\partial\phi/\partial y_{1})(y_{1},y_{2})=-2y_{1}, (\partial^{2}\phi/\partial y_{1}^{2})(y_{1},y_{2})=-2\neq 0, and \nabla\phi(y_{1},y_{2})=(-2y_{1},2y_{2}). Thus (11) holds, e.g., with {\boldsymbol{y}}^{*}=(0,\pm 1) and t=\phi({\boldsymbol{y}}^{*})=0, and P_{1}f_{t} indeed shows the predicted square-root singularities at y_{2}=\pm 1, see Figure 2.

  • For \phi as in Example 3, we have the same derivative expressions as in Example 2. Thus (11) holds, e.g., again with {\boldsymbol{y}}^{*}=(0,\pm 1), but now t=\phi({\boldsymbol{y}}^{*})=1, and we effectively recover Example 2 with square-root singularities for P_{1}f_{t} at y_{2}=\pm 1. However, if we consider instead the point {\boldsymbol{y}}^{\dagger}=(0,0) and t=\phi({\boldsymbol{y}}^{\dagger})=0, as in Figure 3, then we have \nabla\phi({\boldsymbol{y}}^{\dagger})=0, so the non-vanishing gradient condition in (11) fails and Theorem 1 does not apply at this point {\boldsymbol{y}}^{\dagger}. In this case we actually observe an absolute-value singularity for P_{1}f_{t} at y_{2}=0 rather than a square-root singularity.

  • For \phi as in Example 4, we have (\partial\phi/\partial y_{1})(y_{1},y_{2})=3y_{1}^{2}, (\partial^{2}\phi/\partial y_{1}^{2})(y_{1},y_{2})=6y_{1}, and \nabla\phi(y_{1},y_{2})=(3y_{1}^{2},-1)\neq(0,0). It is impossible to satisfy both the first and second conditions in (11), so Theorem 1 does not apply anywhere. In particular, at the point {\boldsymbol{y}}^{\dagger}=(0,0) and t=\phi({\boldsymbol{y}}^{\dagger})=0, as in Figure 4, we have (\partial^{2}\phi/\partial y_{1}^{2})({\boldsymbol{y}}^{\dagger})=0, and in consequence (given that the third derivative does not vanish) P_{1}f_{t} has a cube-root singularity at y_{2}=0 rather than a square-root singularity.
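The derivative conditions quoted in the bullets above can be verified symbolically along the following lines (an illustrative sketch using sympy; not part of the paper).

```python
# Symbolic check of the conditions (11) at a candidate point y* for the four
# two-dimensional examples; illustrative only.
import sympy as sp

y1, y2 = sp.symbols('y1 y2', real=True)
examples = {
    "Example 1": (y2 - y1**2,        (0, 0)),
    "Example 2": (y2**2 - y1**2 - 1, (0, 1)),
    "Example 3": (y2**2 - y1**2,     (0, 1)),
    "Example 4": (y1**3 - y2,        (0, 0)),
}

for name, (phi, ystar) in examples.items():
    subs = {y1: ystar[0], y2: ystar[1]}
    d1   = sp.diff(phi, y1).subs(subs)        # should be 0
    d11  = sp.diff(phi, y1, 2).subs(subs)     # should be nonzero for Theorem 1
    grad = (sp.diff(phi, y1).subs(subs), sp.diff(phi, y2).subs(subs))  # should be nonzero
    t    = phi.subs(subs)
    print(f"{name}: d(phi)/dy1 = {d1}, d2(phi)/dy1^2 = {d11}, grad = {grad}, t = {t}")
```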

From Theorem 1 one might suspect, because t in the theorem has the particular value t=\phi({\boldsymbol{y}}^{*}), that singularities of this kind are rare. However, in the following theorem we show that values of t\in{\mathbb{R}} at which singularities occur in P_{1}f_{t} are generally not isolated. This is essentially because points at which (\partial\phi/\partial y_{1})({\boldsymbol{y}})=0 are themselves generally not isolated.

Theorem 2.

Let \phi\in C^{2}({\mathbb{R}}^{d}), and assume that {\boldsymbol{y}}^{*}\in{\mathbb{R}}^{d} is such that

\frac{\partial\phi}{\partial y_{1}}({\boldsymbol{y}}^{*})=0,\quad\nabla\phi({\boldsymbol{y}}^{*})\neq{\boldsymbol{0}},\quad\mbox{and}\quad\nabla\frac{\partial\phi}{\partial y_{1}}({\boldsymbol{y}}^{*})\neq{\boldsymbol{0}}, (14)

with \nabla\phi({\boldsymbol{y}}^{*}) not parallel to \nabla(\partial\phi/\partial y_{1})({\boldsymbol{y}}^{*}). Then for any t in some open interval containing \phi({\boldsymbol{y}}^{*}) there exists a point {\boldsymbol{y}}^{(t)}\in{\mathbb{R}}^{d} in a neighborhood of {\boldsymbol{y}}^{*} at which

\phi({\boldsymbol{y}}^{(t)})=t,\quad\frac{\partial\phi}{\partial y_{1}}({\boldsymbol{y}}^{(t)})=0,\quad\mbox{and}\quad\nabla\phi({\boldsymbol{y}}^{(t)})\neq{\boldsymbol{0}}. (15)

Moreover, if we assume also that (\partial^{2}\phi/\partial y_{1}^{2})({\boldsymbol{y}}^{*})\neq 0, then the preintegrated quantity (P_{1}f_{t})({\boldsymbol{y}}_{-1}) has a square-root singularity at {\boldsymbol{y}}^{(t)}_{-1}\in{\mathbb{R}}^{d-1} along any line in {\mathbb{R}}^{d-1} through {\boldsymbol{y}}^{(t)}_{-1} that is not orthogonal to \nabla\phi({\boldsymbol{y}}^{(t)}).

Proof.

It is convenient to define \psi({\boldsymbol{y}}):=(\partial\phi/\partial y_{1})({\boldsymbol{y}}), which by assumption is a real-valued C^{1}({\mathbb{R}}^{d}) function that satisfies

\psi({\boldsymbol{y}}^{*})=0\quad\mbox{and}\quad\nabla\psi({\boldsymbol{y}}^{*})\neq{\boldsymbol{0}}.

We need to show that for t in some open interval containing \phi({\boldsymbol{y}}^{*}) there exists {\boldsymbol{y}}^{(t)}\in{\mathbb{R}}^{d} in a neighborhood of {\boldsymbol{y}}^{*} at which

\phi({\boldsymbol{y}}^{(t)})=t,\quad\psi({\boldsymbol{y}}^{(t)})=0,\quad\mbox{and}\quad\nabla\phi({\boldsymbol{y}}^{(t)})\neq{\boldsymbol{0}}.

Clearly, we can confine our search for {\boldsymbol{y}}^{(t)} to the zero level set of \psi, that is, to the solutions of

\psi({\boldsymbol{y}})=0,\quad{\boldsymbol{y}}\in{\mathbb{R}}^{d}.

Since \nabla\psi({\boldsymbol{y}}) is continuous and non-zero in a neighborhood of {\boldsymbol{y}}^{*}, the zero level set of \psi is a manifold of dimension d-1 near {\boldsymbol{y}}^{*}, whose tangent hyperplane at {\boldsymbol{y}}^{*} is orthogonal to \nabla\psi({\boldsymbol{y}}^{*}), see, e.g., [16, Chapter 5]. On this hyperplane there is a search direction starting from {\boldsymbol{y}}^{*} for which \phi({\boldsymbol{y}}) has a maximal rate of increase, namely the direction of the orthogonal projection of \nabla\phi({\boldsymbol{y}}^{*}) onto the tangent hyperplane, noting that this is a non-zero vector because \nabla\phi({\boldsymbol{y}}^{*}) is not parallel to \nabla\psi({\boldsymbol{y}}^{*}). Setting out from the point {\boldsymbol{y}}^{*} in the direction of positive gradient, the value of \phi is strictly increasing in a sufficiently small neighborhood of {\boldsymbol{y}}^{*}, while in the direction of negative gradient it is strictly decreasing. Thus searching on the manifold for a {\boldsymbol{y}}^{(t)} such that \phi({\boldsymbol{y}}^{(t)})=t will be successful in one of these directions for t in a sufficiently small open interval containing \phi({\boldsymbol{y}}^{*}).

Under the additional assumption that (\partial^{2}\phi/\partial y_{1}^{2})({\boldsymbol{y}}^{*})\neq 0, all the conditions of Theorem 1 are satisfied with {\boldsymbol{y}}^{*} replaced by {\boldsymbol{y}}^{(t)}, noting that because \phi\in C^{2}({\mathbb{R}}^{d}) the second derivative is also non-zero in a sufficiently small neighborhood of {\boldsymbol{y}}^{*}. This completes the proof. ∎

Remark 2.

We now show that for \phi as in Examples 1–4 the singularities in P_{1}f_{t}, with f_{t} as in (9), are not isolated, and furthermore we give the exact regions in which the singularities exist.

  • For \phi as in Example 1 we may choose {\boldsymbol{y}}^{*}=(0,0), as in Remark 1. Indeed, the gradient of the first derivative with respect to y_{1} is \nabla(\partial\phi/\partial y_{1})(y_{1},y_{2})=(-2,0), which is not parallel to \nabla\phi(y_{1},y_{2})=(-2y_{1},1) for all (y_{1},y_{2})\in{\mathbb{R}}^{2}. It follows that (14) holds, e.g., at {\boldsymbol{y}}^{*}=(0,0). Hence Theorem 2 implies that for t in some interval around \phi({\boldsymbol{y}}^{*})=0 there is {\boldsymbol{y}}^{(t)}=(y_{1}^{(t)},y_{2}^{(t)}) in a neighborhood of {\boldsymbol{y}}^{*}=(0,0) such that \phi({\boldsymbol{y}}^{(t)})=t and (15) holds. In particular, there is still a square-root singularity in (P_{1}f_{t})(y_{2}) at y_{2}=y_{2}^{(t)}. We confirm that this is indeed the case by taking {\boldsymbol{y}}^{(t)}=(0,t) and by observing that, as is easily verified, (P_{1}f_{t})(y_{2}) has a square-root singularity at y_{2}=t for all real numbers t. This singularity in P_{1}f_{t} is similar to the singularity depicted in Figure 1, but translated by t.

  • For \phi as in Example 3 we can consider {\boldsymbol{y}}^{*}=(0,\pm 1) as in Remark 1. The gradient of the first derivative with respect to y_{1} is \nabla(\partial\phi/\partial y_{1})(y_{1},y_{2})=(-2,0), which is not parallel to \nabla\phi(y_{1},y_{2})=(-2y_{1},2y_{2}) for all (y_{1},y_{2})\in{\mathbb{R}}^{2} with y_{2}\neq 0. Thus (14) holds, e.g., at {\boldsymbol{y}}^{*}=(0,\pm 1). Theorem 2 implies that for t in some interval around \phi({\boldsymbol{y}}^{*})=1 there is a point {\boldsymbol{y}}^{(t)} in a neighborhood of {\boldsymbol{y}}^{*}=(0,\pm 1) such that \phi({\boldsymbol{y}}^{(t)})=t, (15) holds, and (P_{1}f_{t})(y_{2}) has a square-root singularity at y_{2}=y_{2}^{(t)}. Indeed, for any real number t>0, taking {\boldsymbol{y}}^{(t)}=(0,\pm\sqrt{t}) gives \phi({\boldsymbol{y}}^{(t)})=t, and it can easily be verified that (P_{1}f_{t})(y_{2}) has two square-root singularities at y_{2}=\pm\sqrt{t}. In this case the behavior of P_{1}f_{t} is similar to Figure 2, with the location of the singularities now depending on t.

  • Since \phi from Example 2 is simply a translation of Example 3 by -1, similar singularities exist for that case for t>-1.

  • For \phi as in Example 4 the condition (14) never holds, since \nabla(\partial\phi/\partial y_{1})(y_{1},y_{2})=(0,0) whenever (\partial\phi/\partial y_{1})(y_{1},y_{2})=0. So no conclusion can be drawn from Theorem 2 in this case. It is no contradiction that, as is easily seen, there is a singularity (of cube-root character) in (P_{1}f_{t})(y_{2}) at y_{2}=t for every real number t.

3 A high-dimensional example

Motivated by applications in computational finance, for a high-dimensional example we consider the problem of approximating the fair price of a digital Asian option, a problem that can be formulated as an integral as in (1) with a discontinuous integrand of the form (9). When monotonicity holds, it was shown in [12] that preintegration not only has theoretical smoothing benefits, but also that, when followed by a QMC rule to compute the (d-1)-dimensional integral, the computational experience can be excellent. On the other hand, that paper provided no insight as to what happens, either theoretically or numerically, when monotonicity fails. In this section, in contrast, we will deliberately apply preintegration using a chosen variable for which the monotonicity condition fails. We will demonstrate the resulting lack of smoothness, using the theoretical results from the previous section, and show that the performance of the subsequent QMC rule can degrade when the preintegration variable lacks the monotonicity property.

For a given strike price K, the payoff for a digital Asian call option is given by

\mathrm{payoff}\,=\,\mathop{\rm ind}(\phi-K),

where \phi is the average price of the underlying stock over the time period. Under the Black–Scholes model the time-discretised average is given by

\phi({\boldsymbol{y}})\,=\,\frac{S_{0}}{d}\sum_{k=1}^{d}\exp\bigg{(}(r-\tfrac{1}{2}\sigma^{2})\frac{kT}{d}+\sigma{\boldsymbol{A}}_{k}{\boldsymbol{y}}\bigg{)}, (16)

where {\boldsymbol{y}}=(y_{k})_{k=1}^{d} is a vector of i.i.d. standard normal random variables, S_{0} is the initial stock price, T is the final time, r is the risk-free interest rate, \sigma is the volatility and d is the number of timesteps, which is also the dimension of the problem. Note that in (16) we have already made a change of variables to write the problem in terms of standard normal random variables, by factorising the covariance matrix of the discretised Brownian motion as \Sigma=AA^{\top}, where the entries of the covariance matrix are \Sigma_{k,\ell}=\min(k,\ell)\times T/d. Then in (16), {\boldsymbol{A}}_{k} denotes the kth row of this matrix factor.
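For concreteness, a possible implementation of the PCA factorisation and of the average (16) is sketched below (illustrative Python; the function names pca_factor and phi and the default parameter values are assumptions for this sketch, not taken from the paper).

```python
# Sketch: covariance matrix Sigma_{k,l} = min(k,l)*T/d, its PCA factor A
# (columns = eigenvectors scaled by the square roots of the eigenvalues,
# in decreasing order), and the discretised average stock price (16).
import numpy as np

def pca_factor(d, T):
    K = np.arange(1, d + 1)
    Sigma = np.minimum.outer(K, K) * (T / d)      # covariance of the Brownian path
    eigvals, eigvecs = np.linalg.eigh(Sigma)      # ascending order
    idx = np.argsort(eigvals)[::-1]               # reorder to decreasing eigenvalues
    A = eigvecs[:, idx] * np.sqrt(eigvals[idx])   # Sigma = A A^T
    A[:, 0] *= np.sign(A[0, 0])                   # make the first column positive
    return A

def phi(y, A, S0=100.0, r=0.1, sigma=0.1, T=1.0):
    d = A.shape[0]
    k = np.arange(1, d + 1)
    drift = (r - 0.5 * sigma ** 2) * k * T / d
    return (S0 / d) * np.sum(np.exp(drift + sigma * (A @ y)))

A = pca_factor(d=256, T=1.0)
print("column 1 all positive:", np.all(A[:, 0] > 0))                        # monotone direction
print("column 2 has both signs:", np.any(A[:, 1] > 0) and np.any(A[:, 1] < 0))  # non-monotone direction
```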

The fair price of the option is then given by the discounted expected payoff

e^{-rT}\,{\mathbb{E}}[\mathrm{payoff}]\,=\,e^{-rT}\int_{{\mathbb{R}}^{d}}\mathop{\rm ind}(\phi({\boldsymbol{y}})-K)\,\rho_{d}({\boldsymbol{y}})\,\mathrm{d}{\boldsymbol{y}}. (17)

Letting f({\boldsymbol{y}})=\mathop{\rm ind}(\phi({\boldsymbol{y}})-K), this example clearly fits into the framework (9), where \phi is the average stock price (16), t is the strike price K and each \rho is a standard normal density.

There are three popular methods for factorising the covariance matrix: the standard construction (which uses the Cholesky factorisation), Brownian bridge, and principal components or PCA, see, e.g., [7] for further details. In the first two cases all components of the matrix A are non-negative, with at least one positive entry in every column, in which case it is easily seen by studying the derivative of (16) with respect to y_{j} for some j=1,\ldots,d,

\frac{\partial\phi}{\partial y_{j}}({\boldsymbol{y}})\,=\,\frac{S_{0}}{d}\sum_{k=1}^{d}\sigma A_{k,j}\exp\bigg{(}(r-\tfrac{1}{2}\sigma^{2})\frac{kT}{d}+\sigma{\boldsymbol{A}}_{k}{\boldsymbol{y}}\bigg{)},

that \phi is monotone increasing with respect to y_{j} no matter which j is chosen.

In contrast, for the PCA construction, which we consider below, the situation is very different, in that there is only one choice of j for which \phi is monotone with respect to y_{j}. This is because with PCA the factorisation of \Sigma employs its eigendecomposition, with the jth column of A being a (scaled) eigenvector corresponding to the jth eigenvalue, the eigenvalues being labeled in decreasing order. Since the covariance matrix \Sigma has all entries strictly positive, the eigenvector corresponding to the largest eigenvalue can be taken to have all components positive (by the Perron–Frobenius theorem), and thus for j=1 monotonicity of \phi is achieved. On the other hand, every eigenvector other than the first is orthogonal to the first, and therefore must have components of both signs. Given that

A_{k,j}>0\implies\exp(A_{k,j}y_{j})\to\begin{cases}+\infty&\mbox{as }y_{j}\to+\infty,\\ 0&\mbox{as }y_{j}\to-\infty,\end{cases}
A_{k,j}<0\implies\exp(A_{k,j}y_{j})\to\begin{cases}0&\mbox{as }y_{j}\to+\infty,\\ +\infty&\mbox{as }y_{j}\to-\infty,\end{cases}

it follows that for j\neq 1 there is at least one term in the sum over k in (16) that approaches +\infty as y_{j}\to+\infty and at least one other term that approaches +\infty as y_{j}\to-\infty. Given that all terms in the sum over k in (16) are positive, it follows that for the PCA case and j\neq 1, \phi must approach +\infty as y_{j}\to\pm\infty, so is definitely not monotone. Moreover, with respect to each variable y_{j} the function \phi is strictly convex, since

\frac{\partial^{2}\phi}{\partial y_{j}^{2}}({\boldsymbol{y}})\,=\,\frac{S_{0}}{d}\sum_{k=1}^{d}(\sigma A_{k,j})^{2}\exp\bigg{(}(r-\tfrac{1}{2}\sigma^{2})\frac{kT}{d}+\sigma{\boldsymbol{A}}_{k}{\boldsymbol{y}}\bigg{)}\,>\,0\quad\text{for all }{\boldsymbol{y}}\in{\mathbb{R}}^{d}.

For definiteness, in the following discussion we denote by y_{2} the preintegration variable for which monotonicity fails, and denote the other variables by {\boldsymbol{y}}_{-2}=(y_{1},y_{3},\ldots,y_{d}). We now use the results from the previous section to determine the smoothness, or rather the lack thereof, of P_{2}f when f({\boldsymbol{y}})=\mathop{\rm ind}(\phi({\boldsymbol{y}})-K). To do this we will use Theorem 1 and Theorem 2, where with a slight abuse of notation we replace y_{1} by y_{2} as our special preintegration variable.

First, note that we have already established that \phi is not monotone with respect to y_{2}, and since \phi is strictly convex with respect to y_{2}, for a given {\boldsymbol{y}}_{-2} there exists a unique y_{2}^{*}\in{\mathbb{R}} such that (\partial\phi/\partial y_{2})(y_{2}^{*},{\boldsymbol{y}}_{-2})=0. Since \phi is strictly increasing with respect to y_{1}, it follows that \nabla\phi(y_{2}^{*},{\boldsymbol{y}}_{-2})\neq{\boldsymbol{0}}. Furthermore, since (\partial^{2}\phi/\partial y_{2}^{2})(y_{2}^{*},{\boldsymbol{y}}_{-2})>0, Theorem 1 implies that for K=\phi(y_{2}^{*},{\boldsymbol{y}}_{-2}) the preintegrated function P_{2}f has a square-root singularity along any line not orthogonal to \nabla_{\!-2}\phi(y_{2}^{*},{\boldsymbol{y}}_{-2}), with \nabla_{\!-2} defined analogously to \nabla_{\!-1} in Theorem 1.

Furthermore, Theorem 2 implies that this singularity is not isolated. To apply Theorem 2 we note that we have already established the first two conditions in (14) (recall again that we now take y_{2} as the preintegration variable). We also have (\partial^{2}\phi/\partial y_{2}^{2})(y_{2}^{*},{\boldsymbol{y}}_{-2})>0, which implies \nabla(\partial\phi/\partial y_{2})(y_{2}^{*},{\boldsymbol{y}}_{-2})\neq{\boldsymbol{0}}. Moreover, we know that \nabla(\partial\phi/\partial y_{2})(y_{2}^{*},{\boldsymbol{y}}_{-2}) and \nabla\phi(y_{2}^{*},{\boldsymbol{y}}_{-2}) are not parallel, since the former has a positive second component while the latter has a zero second component.

Figure 5: Illustrations for digital Asian option in two dimensions.

To visualise this singularity, in Figure 5 we provide an illustration of the option in two dimensions. (Note that we consider d=2 here for visualisation purposes only; we have shown already that the singularity exists for any choice of d>1. Later we present numerical results for d=256.) Figure 5 gives a contour plot of \phi(y_{1},y_{2})-K (left), the level set \phi(y_{1},y_{2})=K (middle), and the graph of P_{2}f (right). As expected, we can clearly see that P_{2}f has a singularity that is of square-root nature.

To perform the preintegration step P_{2}f in practice, note that since \phi is strictly convex with respect to y_{2}, for each {\boldsymbol{y}}_{-2}\in{\mathbb{R}}^{d-1} there is a single turning point y_{2}^{*}\in{\mathbb{R}} for which (\partial\phi/\partial y_{2})(y_{2}^{*},{\boldsymbol{y}}_{-2})=0 and \phi(y_{2}^{*},{\boldsymbol{y}}_{-2}) is a global minimum. It follows that there are at most two distinct points, \xi_{a}({\boldsymbol{y}}_{-2})\leq\xi_{b}({\boldsymbol{y}}_{-2}), such that

\phi(\xi_{a}({\boldsymbol{y}}_{-2}),{\boldsymbol{y}}_{-2})\,=\,K\,=\,\phi(\xi_{b}({\boldsymbol{y}}_{-2}),{\boldsymbol{y}}_{-2}).

Preintegration with respect to y_{2} then simplifies to

(P_{2}f)({\boldsymbol{y}}_{-2})\,=\,\int_{-\infty}^{\infty}\mathop{\rm ind}(\phi(y_{2},{\boldsymbol{y}}_{-2})-K)\,\rho(y_{2})\,\mathrm{d}y_{2}
\,=\,\begin{cases}1&\text{if }\phi(y_{2}^{*},{\boldsymbol{y}}_{-2})\geq K,\\ \displaystyle\int_{-\infty}^{\xi_{a}({\boldsymbol{y}}_{-2})}\rho(y_{2})\,\mathrm{d}y_{2}+\int_{\xi_{b}({\boldsymbol{y}}_{-2})}^{\infty}\rho(y_{2})\,\mathrm{d}y_{2}&\text{otherwise},\end{cases}
\,=\,\begin{cases}1&\text{if }\phi(y_{2}^{*},{\boldsymbol{y}}_{-2})\geq K,\\ \Phi(\xi_{a}({\boldsymbol{y}}_{-2}))+1-\Phi(\xi_{b}({\boldsymbol{y}}_{-2}))&\text{otherwise}.\end{cases}

In practice, for each {\boldsymbol{y}}_{-2}\in{\mathbb{R}}^{d-1} the turning point y_{2}^{*} and the points of discontinuity \xi_{a}({\boldsymbol{y}}_{-2}) and \xi_{b}({\boldsymbol{y}}_{-2}) are computed numerically, e.g., by Newton’s method.
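A possible implementation of this preintegration step is sketched below (illustrative Python building on the earlier sketch, from which the helpers phi and pca_factor are reused; the bracketing strategy and default values are assumptions, not taken from the paper). It locates the turning point y_{2}^{*} as the minimiser of \phi in y_{2} and then brackets the two roots \xi_{a}\leq\xi_{b} of \phi=K.

```python
# Sketch of P_2 f for the digital Asian option: find the turning point y2*
# (minimum of phi in y2) and the two roots xi_a <= xi_b of phi = K, then
# combine the two Gaussian tails.  Assumes `phi(y, A)` and `pca_factor(d, T)`
# from the previous sketch.
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize_scalar, brentq

def P2f(y_minus2, A, K=110.0):
    def phi_slice(y2):
        y = np.insert(y_minus2, 1, y2)        # y2 is the second coordinate
        return phi(y, A)
    res = minimize_scalar(phi_slice)          # phi is strictly convex in y2
    y2_star, phi_min = res.x, res.fun
    if phi_min >= K:
        return 1.0                            # indicator equals 1 for every y2
    # expand brackets until phi exceeds K on both sides (phi -> +inf as y2 -> +-inf)
    lo, hi = y2_star - 1.0, y2_star + 1.0
    while phi_slice(lo) <= K:
        lo = y2_star - 2.0 * (y2_star - lo)
    while phi_slice(hi) <= K:
        hi = y2_star + 2.0 * (hi - y2_star)
    xi_a = brentq(lambda s: phi_slice(s) - K, lo, y2_star)
    xi_b = brentq(lambda s: phi_slice(s) - K, y2_star, hi)
    return norm.cdf(xi_a) + 1.0 - norm.cdf(xi_b)

# example call with one random outer point (hypothetical parameters)
A = pca_factor(d=256, T=1.0)
rng = np.random.default_rng(1)
print(P2f(rng.standard_normal(255), A))
```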

We now look at how this lack of smoothness affects the performance of using a numerical preintegration method to approximate the fair price for the digital Asian option in d=256 dimensions. Explicitly, we approximate the integral in (17) by applying a (d-1)-dimensional QMC rule to P_{2}f. As a comparison, we also present results for approximating the integral in (17) by applying the same QMC rule to P_{1}f. Recall that \phi is monotone in dimension 1 and, furthermore, it was shown in [12] that P_{1}f is smooth.

For the QMC rule we use a randomly shifted lattice rule based on the generating vector lattice-32001-1024-1048576.3600 from [14] using N=2^{10},2^{11},\ldots,2^{19} points with R=16 random shifts. The parameters for the option are S_{0}=\$100, K=\$110, T=1, d=256 timesteps, r=0.1 and \sigma=0.1. We also performed a standard Monte Carlo approximation using R\times N points and a plain (without preintegration) QMC approximation using the same generating vector.
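For reference, a generic randomly shifted lattice rule estimator of an s-dimensional integral with respect to the standard normal density can be sketched as follows (illustrative code; the generating vector z would be read from the file named above, which is not reproduced here, and g stands for the preintegrated integrand such as P_{1}f or P_{2}f with s=d-1=255).

```python
# Randomly shifted lattice rule (sketch): estimate of int_{R^s} g(y) rho_s(y) dy
# using N lattice points, R independent random shifts, and the inverse normal
# CDF to map points from the unit cube to R^s.
import numpy as np
from scipy.stats import norm

def shifted_lattice_estimate(g, z, s, N, R, rng):
    z = np.asarray(z[:s], dtype=np.int64)
    k = np.arange(N).reshape(-1, 1)
    base = (k * z.reshape(1, -1) % N) / N            # N x s lattice points in [0,1)^s
    estimates = []
    for _ in range(R):
        shift = rng.random(s)
        pts = (base + shift) % 1.0                   # randomly shifted points
        y = norm.ppf(pts)                            # map to R^s
        estimates.append(np.mean([g(yi) for yi in y]))
    estimates = np.array(estimates)
    # mean over shifts, and standard error over the R independent shift estimates
    return estimates.mean(), estimates.std(ddof=1) / np.sqrt(R)
```

The standard error returned here is the sample standard error over the random shifts, which is the error measure reported in Figure 6.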

In Figure 6, we plot the convergence of the standard error, which we estimate by the sample standard error over the different random shifts, in terms of the total number of function evaluations R\times N. We can clearly see that preintegration with respect to y_{2} produces less accurate results compared to preintegration with respect to y_{1}, with errors that are up to an order of magnitude larger for the higher values of N. We also note that to achieve a given error, say 10^{-4}, the number of points for preintegration with respect to y_{2} needs to be increased tenfold compared with preintegration with respect to y_{1}. Furthermore, we observe that the empirical convergence rate for preintegration with respect to y_{2} is N^{-0.9}, which is slightly worse than the rate of N^{-0.98} for preintegration with respect to y_{1}. Hence, when the monotonicity condition fails, not only does the theory for QMC fail due to the presence of a singularity, but we also observe worse results in practice and somewhat slower convergence.

We also plot the error of standard Monte Carlo and QMC approximations, which behave as expected and are both significantly outperformed by the two preintegration methods.

Figure 6: Convergence in NN for different approximations of the fair price for a digital Asian option.

4 Conclusion

If the monotonicity property (10) fails and f=f_{t} is defined by (9), then we have seen that generically there is a singularity in P_{1}f for some values of t, and under known conditions this is true even for values of t in an interval.

It should also be noted that implementation of preintegration is more difficult if monotonicity fails, since instead of a single integral from \xi({\boldsymbol{y}}_{-1}) to \infty, as in (7), there will in general be additional finite or infinite intervals to integrate over, all of whose end points must be discovered by the user for each required value of {\boldsymbol{y}}_{-1}.

To explore the consequences of a lack of monotonicity empirically, we carried out in Section 3 a 256-dimensional calculation of pricing a digital Asian option, first by preintegrating with respect to a variable known to lack the monotonicity property, and then with a variable where the property holds, with the result that both accuracy and rate of convergence were observed to be degraded when monotonicity fails.

There is an additional problem of preintegrating with respect to a variable for which the monotonicity fails, namely that because of the proven lack of smoothness, the resulting preintegrated function no longer belongs to the space of (d-1)-variate functions of dominating mixed smoothness of order one, and as a consequence there is at present no theoretical support for our use of QMC integration for this (d-1)-dimensional integral.

The practical significance of this paper is that effective use of preintegration is greatly enhanced by the preliminary identification of a special variable for which the monotonicity property is known to hold. The paper does not offer guidance on the choice of variable if there is more than one such variable; in such cases it may be natural to choose the variable for which preintegration leads to the greatest reduction in variance.

Acknowledgements

The authors acknowledge the support of the Australian Research Council under the Discovery Project DP210100831.

References

  • [1] N. Achtsis, R. Cools, and D. Nuyens, Conditional sampling for barrier option pricing under the LT method, SIAM J. Financial Math. 4 (2013), 327–352.
  • [2] N. Achtsis, R. Cools, and D. Nuyens, Conditional sampling for barrier option pricing under the Heston model, in: J. Dick, F.Y. Kuo, G.W. Peters, I.H. Sloan (eds.), Monte Carlo and Quasi-Monte Carlo Methods 2012, pp. 253–269, Springer-Verlag, Berlin/Heidelberg (2013).
  • [3] C. Bayer, M. Siebenmorgen, and R. Tempone, Smoothing the payoff for efficient computation of Basket option prices, Quant. Finance 18 (2018), 491–505.
  • [4] H. Bungartz and M. Griebel, Sparse grids, Acta Numer. 13 (2004), 147–269.
  • [5] J. Dick, F. Y. Kuo, and I. H. Sloan, High dimensional integration – the quasi-Monte Carlo way, Acta Numer. 22 (2013), 133–288.
  • [6] A. D. Gilbert, F. Y. Kuo, and I. H. Sloan, Approximating distribution functions and densities using quasi-Monte Carlo methods after smoothing by preintegration, arXiv:2112.10308, (2021).
  • [7] P. Glasserman, Monte Carlo Methods in Financial Engineering, Springer-Verlag, Berlin-Heidelberg, (2003).
  • [8] P. Glasserman and J. Staum, Conditioning on one-step survival for barrier option simulations, Oper. Res., 49 (2001), 923–937.
  • [9] M. Griebel, F. Y. Kuo, and I. H. Sloan, The smoothing effect of the ANOVA decomposition, J. Complexity 26 (2010), 523–551.
  • [10] M. Griebel, F. Y. Kuo, and I. H. Sloan, The smoothing effect of integration in d{\mathbb{R}}^{d} and the ANOVA decomposition, Math. Comp. 82 (2013), 383–400.
  • [11] M. Griebel, F. Y.  Kuo, and I. H. Sloan, Note on “the smoothing effect of integration in d{\mathbb{R}}^{d} and the ANOVA decomposition”, Math. Comp. 86 (2017), 1847–1854.
  • [12] A. Griewank, F. Y. Kuo, H. Leövey, and I. H. Sloan, High dimensional integration of kinks and jumps – smoothing by preintegration, J. Comput. Appl. Math. 344 (2018), 259–274.
  • [13] M. Holtz, Sparse Grid Quadrature in High Dimensions with Applications in Finance and Insurance (PhD thesis), Springer-Verlag, Berlin, 2011.
  • [14] F. Y. Kuo, https://web.maths.unsw.edu.au/~fkuo/lattice/index.html, (2007), accessed December 13, 2021.
  • [15] P. L’Ecuyer, F. Puchhammer, and A. Ben Abdellah, Monte Carlo and Quasi-Monte Carlo density estimation via conditioning, arXiv:1906.04607, (2021).
  • [16] J. M. Lee, Introduction to Smooth Manifolds, Springer, New York, (2000).
  • [17] D. Nuyens and B. J. Waterhouse, A global adaptive quasi-Monte Carlo algorithm for functions of low truncation dimension applied to problems from finance, in: L. Plaskota and H. Woźniakowski (eds.), Monte Carlo and Quasi-Monte Carlo Methods 2010, pp. 589–607, Springer-Verlag, Berlin/Heidelberg (2012).
  • [18] C. Weng, X. Wang, and Z. He, Efficient computation of option prices and greeks by quasi-Monte Carlo method with smoothing and dimension reduction, SIAM J. Sci. Comput. 39 (2017), B298–B322.