This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

1

Fast American Option Pricing using Nonlinear Stencils

Zafar Ahmad Stony Brook UniversityStony BrookUSA [email protected] Reilly Browne Stony Brook UniversityStony BrookUSA [email protected] Rezaul Chowdhury Stony Brook UniversityStony BrookUSA [email protected] Rathish Das University of HoustonHoustonUSA [email protected] Yushen Huang Stony Brook UniversityStony BrookUSA [email protected]  and  Yimin Zhu Stony Brook UniversityStony BrookUSA [email protected]
(20 February 2007; 12 March 2009; 5 June 2009)
Abstract.

We study the binomial, trinomial, and Black-Scholes-Merton models of option pricing. We present fast parallel discrete-time finite-difference algorithms for American call option pricing under the binomial and trinomial models and American put option pricing under the Black-Scholes-Merton model. For TT-step finite differences, each algorithm runs in 𝒪((Tlog2T)/p+T){\mathcal{O}}\left({{\left(T\log^{2}{T}\right)}/{p}+T}\right) time under a greedy scheduler on pp processing cores, which is a significant improvement over the Θ(T2/p)+Ω(TlogT){\Theta}\left({{T^{2}}/{p}}\right)+{\Omega}\left({T\log{T}}\right) time taken by the corresponding state-of-the-art parallel algorithm. Even when run on a single core, the 𝒪(Tlog2T){\mathcal{O}}\left({T\log^{2}{T}}\right) time taken by our algorithms is asymptotically much smaller than the Θ(T2){\Theta}\left({T^{2}}\right) running time of the fastest known serial algorithms. Implementations of our algorithms significantly outperform the fastest implementations of existing algorithms in practice, e.g., when run for T1000T\approx 1000 steps on a 48-core machine, our algorithm for the binomial model runs at least 15×15\times faster than the fastest existing parallel program for the same model with the speed-up factor gradually reaching beyond 500×500\times for T0.5×106T\approx 0.5\times 10^{6}. It saves more than 80% energy when T4000T\approx 4000, and more than 99% energy for T>60,000T>60,000.

Our option pricing algorithms can be viewed as solving a class of nonlinear 1D stencil (i.e., finite-difference) computation problems efficiently using the Fast Fourier Transform (FFT). To our knowledge, ours are the first algorithms to handle such stencils in o(T2){o}\left({T^{2}}\right) time. These contributions are of independent interest as stencil computations have a wide range of applications beyond quantitative finance.

American Option Pricing, Binomial Option Pricing Model, Trinomial Option Pricing, Black-Scholes-Merton Option Pricing Model, Nonlinear Stencil, Fast Fourier Transform, Finite-Difference Method
copyright: noneccs: Computing methodologies Modeling and simulationccs: Computing methodologies Shared memory algorithmsccs: Computing methodologies Modeling methodologiesccs: Mathematics of computing Markov processesccs: Mathematics of computing Solvers

1. Introduction

Option pricing or computing the value of a contract giving one the right to buy/sell an asset under some given constraints is one of the most important computational problems in quantitative finance (Ames, 2014). Rapid changes in financial markets often lead to rapid changes in asset prices which makes the ability to quickly estimate option prices essential in avoiding potential financial losses (Peter and Packer, 2007).

An option is a two-party financial contract that gives one party (called the holder) the right (but not an obligation) to buy/sell (i.e., exercise) an asset from/to the other party (called the writer) at a fixed price (called the strike/exercise price) on or before an expiration date (called the exercise/maturity date). A call option gives the right to buy whereas a put option gives the right to sell. Also, based on the expiration date and the settlement rule, there are two major styles of options: European and American. A European option can only be exercised at the expiration date while an American option can be exercised at any time before that.

The option pricing problem asks for assigning a value or price to an options contract based on the calculated probability that the contract will be exercised at expiration. The theoretical value of an option (Merton, 1973; Smith, 1976; Bensoussan, 1984; Galai and Masulis, 1976; Black and Scholes, 1973; Kumar et al., 2012) is determined by its stock price SS (i.e., its current market price), strike price KK, risk-free rate of return RR (i.e., the theoretical rate of return assuming zero risk), dividend yield YY (i.e., a ratio that shows how much dividend/year is paid relative to SS), volatility VV (i.e., how much the trading price varies over time), and time to expiration EE (e.g., in days).

Table 1. Notations
Symbol Meaning Symbol Meaning
SS stock price KK strike price
RR risk-free rate of return VV volatility
YY dividend yield EE time to expire
TT number of time steps (in days)

The earliest option pricing model (Bachelier, 1900) was based on the assumption of the geometric Brownian motion for asset pricing and the no-arbitrage idea. Many improved models were developed later (Samuelson et al., 2011; Black and Scholes, 1973; Merton, 1973; Derman and Kani, 1998; Dupire, 1994; Stein and Stein, 1991; Heston, 1993; Ball and Roma, 1994; Bergomi, 2015; Merton, 1976; Kou, 2002; Bakshi and Madan, 2000; Duffie et al., 2000; Dempster and Hong, 2002; Matsuda, 2004; Runggaldier, 2003; Madan et al., 1998; Geman et al., 2001; Jackson et al., 2008; Derman and Kani, 1994; Dupire, 1994).

Analytical solutions to the option pricing problem are sometimes available, particularly for European options (Shreve, 2004, 2005; Hull, 2003; Prathumwan and Trachoo, 2020). But they are not available for American options except for a limited number of cases with significant constraints (e.g., American call options with zero or one dividend (Shreve, 2004) and perpetual American put options (MacKean, 1965; Shreve, 2004)). This difficulty in finding closed-form analytical solutions for most option pricing problems makes computational approaches the only path forward. The main computational approaches to solving the option pricing problem include the binomial tree method (Karatzas and Shreve, 1998), the finite difference method (Thomas, 2013; Sewell, 2005; Holmes, 2007; LeVeque, 2007; Ames, 2014; Strikwerda, 2004), and the Monte Carlo method (Duan and Simonato, 1998; Longstaff and Schwartz, 2001; Tsitsiklis and Van R., 2001).

The binomial tree method works by tracing the option’s value at discrete time steps over the life of the option. For a given number of time steps TT between the valuation and expiration dates of the option, a binomial tree of height TT is created with the leaves storing the potential prices of the asset at the time of expiration. Then one works backward to compute for each t[0,T1]t\in[0,T-1] the value of the nodes at depth tt of the tree (each giving a possible price at time step tt) from the values of the nodes at depth t+1t+1 using a simple formula. The value computed for the root node is the required option value. Straightforward iterative implementation of this method runs in Θ(T2){\Theta}\left({T^{2}}\right) time on a single processing core and Θ(T2/p+TlogT){\Theta}\left({T^{2}/p+T\log{T}}\right) time on pp cores (see Figure 1 and Table 2). It provides a discrete-time approximation of the continuous-time option pricing in the Black–Scholes model and is widely used by professional option traders.

BOPM-American-Call(S,K,R,V,Y,E,T)\left(~{}S,~{}K,~{}R,~{}V,~{}Y,~{}E,~{}T~{}\right) (1) ΔtE/T\Delta t\leftarrow{E}/{T}ueVΔtu\leftarrow e^{V\sqrt{\Delta t}}d1/ud\leftarrow{1}/{u}p(e(RY)Δtd)/(ud)p\leftarrow{(e^{(R-Y)\Delta t}-d)}/{(u-d)} meRΔtm\leftarrow e^{-R\Delta t}s0mps_{0}\leftarrow mps1m(1p)s_{1}\leftarrow m(1-p) (2) for j0j\leftarrow 0 to TT do GT,jmax(0,Su2jTK)G_{T,j}\leftarrow\max{\left(0,Su^{2j-T}-K\right)} (3) for iT1i\leftarrow T-1 downto 0 do parallel for j0j\leftarrow 0 to ii do Gi,jmax(s0Gi+1,j+s1Gi+1,j+1,Su2jiK)G_{i,j}\leftarrow\max{\left(s_{0}G_{i+1,j}+s_{1}G_{i+1,j+1},~{}Su^{2j-i}-K\right)} (4) return G0,0G_{0,0}

Figure 1. Standard looping code for pricing American Call options under the Binomial Option Pricing Model.

The trinomial tree method extends the binomial method by accounting for the possibility that an asset value remains the same after a time step (Boyle, 1986). With only a constant factor increase in run-time it provides more precise predictions than the binomial model.

The finite-difference method approximates the continuous-time differential equations describing the evolution of an option price over time by a set of discrete-time difference equations and then solves them iteratively under appropriate boundary conditions. The explicit finite difference method divides the lifetime of the option into TT discrete time steps and then uses the potential values of the asset at time step TT (the time of expiration) to compute the asset values at each time step t[0,T1]t\in[0,T-1] from the asset values at time step t+1t+1 based on the difference equations (i.e., update equations or stencils). The final option value is found at time t=0t=0. Similar to the binomial tree method, the iterative implementation of this method runs in Θ(T2){\Theta}\left({T^{2}}\right) time on a single processing core and Θ(T2/p+TlogT){\Theta}\left({T^{2}/p+T\log{T}}\right) time on pp cores. Other finite difference methods used for option pricing include implicit finite difference and the Crank–Nicolson method (Crank and Nicolson, 1947).

The Monte Carlo method works by generating random backward paths the asset price may follow starting from the time of expiration and ending at the time of valuation. Each of these paths leads to a payoff value for the option and the average of these payoff values can be viewed as an expected value of the option. This method is used for pricing options with complicated features and/or multiple sources of uncertainty that other methods (analytical, tree-based, finite difference) cannot handle (Duan and Simonato, 1998), but is usually not competitive when those methods apply as the convergence rate of Monte Carlo method is sublinear (James, 1980; Metropolis and Ulam, 1949). They are hard to develop for some options, such as Black-Scholes Model for the American put option, but still many results exist on Monte Carlo methods (Ibanez and Zapatero, 2004; Longstaff and Schwartz, 2001; Tsitsiklis and Van R., 2001; Boyle, 1977).

Our Contributions. We present three shared-memory parallel algorithms for American option pricing – call option under the binomial and trinomial option pricing models and put option under the Black-Scholes-Merton model. All three algorithms run in Θ((Tlog2T)/p+T){\Theta}\left({\left(T\log^{2}{T}\right)/p+T}\right) time on pp processing cores which is a significant improvement over the Ω(T2/p+TlogT){\Omega}\left({T^{2}/p+T\log{T}}\right) time taken by the state-of-the-art parallel algorithms, where TT is the number of time steps. When run on a single processing core they run in Θ(Tlog2T){\Theta}\left({T\log^{2}{T}}\right) time compared to the Θ(T2){\Theta}\left({T^{2}}\right) time taken by the fastest existing serial algorithms. We use the Fast Fourier Transform (FFT) to speed up the computation. Table 2 summarizes the results.

We use the work-span model (Cormen et al., 2009) to analyze the performance of parallel programs. Let 𝒯p\mathcal{T}_{p} be the running time on a pp-processor machine under a greedy scheduler. Then 𝒯1\mathcal{T}_{1} and 𝒯\mathcal{T}_{\infty} are called work and span, respectively. The parallel running time 𝒯p=Θ(𝒯1/p+𝒯)\mathcal{T}_{p}={\Theta}\left({\mathcal{T}_{1}/p+\mathcal{T}_{\infty}}\right).

Table 2. Parallel Algorithms for American Option Pricing: The bounds hold for call option pricing under the binomial and trinomial option pricing models, and put option pricing under the Black-Scholes-Merton model. Here, T=T= number of time steps, p=p= number of processing cores, and M=M= cache size. Also, 𝒯p=\mathcal{T}_{p}= running time on pp cores, and thus 𝒯1\mathcal{T}_{1} (Work) and 𝒯\mathcal{T}_{\infty} (Span) represent run-times on one core and an unbounded number of cores, respectively. Under a greedy scheduler, 𝒯p=Θ(𝒯1/p+𝒯)\mathcal{T}_{p}={\Theta}\left({{\mathcal{T}_{1}}/{p}+\mathcal{T}_{\infty}}\right), which is asymptotically optimal.
Algorithm Work Span Parallel Running Time
𝒯1(T)\mathcal{T}_{1}(T) 𝒯(T)\mathcal{T}_{\infty}(T) 𝒯p(T)\mathcal{T}_{p}(T)
Nested Loop (standard, see Figure 1) Θ(T2){\Theta}\left({T^{2}}\right) Θ(TlogT){\Theta}\left({T\log T}\right) Θ(T2p+TlogT){\Theta}\left({\frac{T^{2}}{p}+T\log{T}}\right)
Tiled Loop (cache-aware) (Brunelle, 2022) Θ(TM+TMlogTM){\Theta}\left({TM+\frac{T}{M}\log{\frac{T}{M}}}\right) Θ(T2p+TM+TMlogTM){\Theta}\left({\frac{T^{2}}{p}+TM+\frac{T}{M}\log{\frac{T}{M}}}\right)
Recursive Tiling (cache-oblivious) (Brunelle, 2022; Tang et al., 2011; Frigo and Strumpen, 2005, 2009) Θ(Tlog23){\Theta}\left({T^{\log_{2}3}}\right) Θ(T2p+Tlog23){\Theta}\left({\frac{T^{2}}{p}+T^{\log_{2}3}}\right)
Our Algorithms Θ(Tlog2T){\Theta}\left({T\log^{2}T}\right) Θ(T){\Theta}\left({T}\right) Θ(Tlog2Tp+T){\Theta}\left({\frac{T\log^{2}{T}}{p}+T}\right)

The following proposition, which follows easily from the complexities given in Table 2, notes that each of our parallel algorithms runs asymptotically faster than the corresponding fastest existing parallel algorithm for every value of pp.

Proposition 1.1.

Let 𝒯p(old)(T)\mathcal{T}^{(old)}_{p}(T) be the running time of any existing algorithm from Table 2 on pp cores, and let 𝒯p(new)(T)\mathcal{T}^{(new)}_{p}(T) denote the same for our algorithm. Then for every (positive) value of pp under a greedy scheduler: 𝒯p(new)(T)=o(𝒯p(old)(T))\mathcal{T}^{(new)}_{p}(T)={o}\left({\mathcal{T}^{(old)}_{p}(T)}\right).

We have implemented our algorithms and compared their running times, energy consumption, and cache performance with those of the option pricing implementations available in the Par-bin-ops framework (Brunelle, 2022) developed recently in 2022. Implementations of our algorithms run orders of magnitude faster, consume significantly less energy, and usually incur far fewer L1 cache misses than those implementations.

How Our Algorithms Differ from Existing FFT-based Option Pricing Algorithms. FFTs have been used for European option pricing before. European option pricing is simpler than American option pricing, e.g., the European version of the American option pricing algorithm shown in Figure 1 can be obtained by replacing the assignment Gi,jmax(s0Gi+1,j+s1Gi+1,j+1,Su2jiK)G_{i,j}\leftarrow\max{\left(s_{0}G_{i+1,j}+s_{1}G_{i+1,j+1},~{}Su^{2j-i}-K\right)} in Step 3 with the simpler assignment Gi,js0Gi+1,j+s1Gi+1,j+1G_{i,j}\leftarrow s_{0}G_{i+1,j}+s_{1}G_{i+1,j+1}. The absence of the ‘max’ operator in this assignment makes an efficient evaluation of the doubly-nested loop in Step 3 easier.

Black, Scholes, and Merton (Black and Scholes, 1973; Merton, 1973) showed that the European option can be calculated using a Parabolic PDE with infinite domain constraint. By using the Fourier transform, one gets an integral form for European options. To obtain the numerical value from the integral form, one uses numerical integration (Hildebrand, 1987; Burden et al., 2015) which can be sped up using FFTs.

There are also approximation results (Chang et al., 2007; Oliveira, 2014; Zhylyevskyy, 2010; Lord et al., 2008) based on repeated Richardson extrapolation (Richardson, 1911; Richardson and Gaunt, 1927) and FFT for numerical integration in American options. However, even if the extrapolation is repeated only for a constant number of times for an option that expires in EE days, the approximation takes Ω((E/Δt)NlogN){\Omega}\left({\left({E}/{\Delta t}\right)N\log{N}}\right) time when NN grid points are used to discretize the price of the underlying asset and E/Δt{E}/{\Delta t} exercise points are placed with every pair of consecutive exercise points being Δt\Delta t days apart. Observing that T=E/ΔtT={E}/{\Delta t} corresponds to the number of discrete time steps in the finite difference formulation of the problem, we can rewrite the complexity as Ω(TNlogN){\Omega}\left({TN\log{N}}\right). Usually, NTN\geq T is used in practice (Oliveira, 2014), which reduces the complexity to Ω(T2logT){\Omega}\left({T^{2}\log{T}}\right).

A major weakness of the existing FFT-based numerical integration approach above is that a closed-form expression for the characteristic function of the log-price must be known for the technique to work. However, our approach does not need to know such a closed-form expression as we apply FFT to speed up stencil/finite-difference computations and not numerical integration. Thus, our approach will work on a larger set of option pricing problems. We are not restricted to infinite-domain problems either (Cont and Voltchkova, 2005).

Implications for Nonlinear Stencil Computations. Our option pricing algorithms can be viewed as solving a class of nonlinear 1D stencil computation problems asking to evolve a grid of size Θ(T){\Theta}\left({T}\right) for TT time steps, in Θ(Tlog2T){\Theta}\left({T\log^{2}{T}}\right) work (i.e., serial time) or Θ((Tlog2T)/p+T){\Theta}\left({(T\log^{2}{T})/p+T}\right) parallel time on pp processing cores. As stencil algorithms, they are of independent interest.

A stencil is a pattern (equation) used to update the values of cells in a spatial grid and evolve the grid over a given number of time steps. The process of evolving cell values in the spatial grid according to a stencil is called a stencil computation (Frigo and Strumpen, 2005). The finite-difference method performs a stencil computation with an update equation derived from the differential equations used as a stencil. Stencils are widely used in various fields, including mechanical engineering (Paoli and Schatzman, 2002; Zhang et al., 2006; Renson et al., 2016; Szilard, 2004; Rappaz et al., 2010), meteorology (Murray and Simmonds, 1991; Johnson, 2010; Robert, 1981, 1982; Avissar and Pielke, 1989; Kalnay et al., 1990), stochastic and fractional differential equations (Zhang et al., 2004; Yuan and Agrawal, 2002; Lord and Rougemont, 2004), chemistry (Najm et al., 1998; Snider and Banerjee, 2010; Long et al., 2008; Aubin et al., 2004; Han and Wang, 2015), electromagnetics (Komissarov, 2002; Taflove and Hagness, 2005; Van R., 2012; Atangana and Nieto, 2015), finance (Chen and Forsyth, 2008), and physics (Gammie et al., 2003; Vijayaraghavan and Keith, 1990; Mangeney et al., 2002; Hirouchi et al., 2009; Cundall and Hart, 1992; Pang, 1999; Barth and Deconinck, 2013; Vesely, 1994; Thijssen, 2007), image processing (Weickert, 2001; Peyré, 2011; Vese and Osher, 2002).

Standard stencil algorithms perform Θ(NT){\Theta}\left({NT}\right) work to evolve a grid of size NN for TT time steps, they include looping algorithms, tiled looping algorithms (Bandishti et al., 2012; Bondhugula et al., 2017; Wolfe, 1987; Wolf and Lam, 1991; Wolf et al., 1996; Wonnacott, 2002; Bondhugula et al., 2016; Andonov and Rajopadhye, 1997; Högstedt et al., 1999; Zhang et al., 2015; Malas et al., 2014, 2015), and recursive divide-and-conquer algorithms (Frigo and Strumpen, 2005, 2009; Tang et al., 2011; Palamadai Natarajan et al., 2017; Sato et al., 2010; Krishnamoorthy et al., 2007).

A stencil is called linear if it computes the value of a cell at time step tt as a fixed linear combination of cell values at time steps before tt, otherwise it is called nonlinear. For 1D linear stencils Ahmad et al. (Ahmad et al., 2021) provide FFT-based algorithms that take 𝒪(TlogT){\mathcal{O}}\left({T\log{T}}\right) serial time for periodic grids and 𝒪(Tlog2T){\mathcal{O}}\left({T\log^{2}{T}}\right) serial time for aperiodic grids, assuming that the input grid is of size Θ(T){\Theta}\left({T}\right).

The stencils we encounter in our current work are nonlinear because they do not use a linear combination of cell values from prior time steps for updating a target cell, provided that the resulting value is smaller than the value of a function computed solely based on the spatial coordinates of the target cell and other option pricing parameters (e.g., see Step 3 of Figure 1). Such a stencil divides the space-time grid into two disjoint regions – in one region only the linear combination applies, while in the other only the function value applies. However, the problem is that the boundary between these two regions is not known ahead of time and the location of the boundary may move as the time step tt progresses. As a result, Ahmad et al.’s (Ahmad et al., 2021) results for linear stencils do not apply. To the best of our knowledge, ours are the first algorithms for handling such stencils running in time subquadratic in TT.

Refer to caption

(𝒂)(a)

Refer to caption

(𝒃)(b)

Figure 2. (𝒂)(a) A 3 time-step binomial tree. (𝒃)(b) A binomial tree of 77 time steps embedded in an 8×88\times 8 grid. An upward arrow has a price change factor of 1/u{1}/{u} while a rightward arrow has a price change factor of uu.

2. American Call Option under the Binomial Option Pricing Model

2.1. Binomial Option Pricing Model (BOPM)

BOPM (Sharpe, 1978; Cox et al., 1979; Rendleman, 1979) is a simple discrete-time option pricing model without using advanced mathematical tools. It is a paradigm of practice.

BOPM encodes the various sequences of prices the asset might take as paths in a binomial tree. Each node in the tree represents a possible price at a certain time and the nodes at two successive layers in the tree represent prices at times that are apart by some fixed time step Δt\Delta t. The prices increase or decrease by some factor after every Δt\Delta t time. Figure 2(a)(a) gives an example of a 3-time-step binomial price tree that is produced by moving from the valuation day to the expiration day. Denote the initial price by SS. The price in the next time step (i.e., after Δt\Delta t time) can go up to fu=Suf_{u}=S\cdot u or go down to fd=Sdf_{d}=S\cdot d, where the up factor u=eVΔtu=e^{V\sqrt{\Delta t}} and the down factor d=1/ud={1}/{u} are determined by Δt\Delta t and volatility VV.

Denote the node value as Xnode=S×uNuNdX_{node}=S\times u^{N_{u}-N_{d}}, where NuN_{u} and NdN_{d} are the numbers of ticks up and down, respectively. The final nodes of the tree represent the prices on the expiration date. Given the strike price of KK, the price one can call or put before the contract expires, i.e., the exercise value of each node will be max(XnodeK,0)\max(X_{node}-K,0) for a call option and max(KXnode,0)\max(K-X_{node},0) for a put option.

The risk-neutral valuation of the binomial value is performed iteratively backward. Under the assumption of risk neutrality, the value of the option today is its expected future payoff discounted at the risk-free rate of RR. Let us number the nodes in each layer of Figure 2(a)(a) from top to bottom by successive integers starting from 1. Then the node values Xt,jX_{t,j} and Xt,j+1X_{t,j+1} of a layer representing some time tt can be used to compute the binomial value of the jj-th node in the layer representing time tΔtt-\Delta t as follows: eRΔt(pXt,j+(1p)Xt,j+1)e^{-R\Delta t}(p\cdot X_{t,j}+(1-p)\cdot X_{t,j+1}), where, p=(eRΔtd)/(ud)p={(e^{R\Delta t}-d)}/{(u-d)} (Hull, 2003). Denote m=eRΔtm=e^{-R\cdot\Delta t}, s0=m(1p)s_{0}=m(1-p), and s1=mps_{1}=mp. Then the binomial value of that node is: s0Xt,j+s1Xt,j+1s_{0}\cdot X_{t,j}+s_{1}\cdot X_{t,j+1}. For options on stocks paying a continuous dividend yield YY, p=(e(RY)Δtd)/(ud)p={(e^{(R-Y)\Delta t}-d)}/{(u-d)}.

The value at each node will be equal to its binomial value for European options and the larger of its binomial value and exercise value for American options.

The binomial tree of TT time steps can be embedded in a (T+1)×(T+1)(T+1)\times(T+1) grid. Figure 2(b)(b) shows an example.

Definition 2.1.

Let Gi,jG_{i,j} be the grid value in row i[0,T]i\in[0,T] and column j[0,T]j\in[0,T] of the (T+1)×(T+1)(T+1)\times(T+1) grid GG. Let Gi,jgreen=Su2jiKG^{green}_{i,j}=S\cdot u^{2j-i}-K, and let Gi,jred=s0Gi+1,j+s1Gi+1,j+1G^{red}_{i,j}=s_{0}G_{i+1,j}+s_{1}G_{i+1,j+1} if i[0,T)i\in[0,T), and 0 otherwise. Then

Gi,j={Gi,jred,if Gi,jredGi,jgreenGi,jgreen,otherwise.G_{i,j}=\begin{cases}G^{red}_{i,j},&\text{if }G^{red}_{i,j}\geq G^{green}_{i,j}\\ G^{green}_{i,j},&\text{otherwise.}\end{cases}

We say that cell (i,j)(i,j) of GG is red provided Gi,j=Gi,jredG_{i,j}=G^{red}_{i,j}, and green otherwise. We show in Section 2.2 that all red cells in GG form a single contiguous region and all green cells form another. A single boundary divides the two regions. We analyze the properties of this red-green divider in Section 2.1 which we will exploit to design an efficient algorithm for American call options in Section 2.3.

2.2. Properties of the Red-Green Divider

As shown in the example in Figure 2(b)(b), we assume that a binomial tree for TT time steps is embedded in a (T+1)×(T+1)(T+1)\times(T+1) grid GG with the root at the bottom-left corner G[0,0]G[0,0] and the leaves in the top row G[T,0..T]G[T,0..T]. For 0jiT0\leq j\leq i\leq T, the two children of the binomial tree node at G[i,j]G[i,j] are stored at G[i+1,j]G[i+1,j] and G[i+1,j+1]G[i+1,j+1]. The arrow from G[i,j]G[i,j] to G[i+1,j+1]G[i+1,j+1] represents a price change factor of uu while the one from G[i,j]G[i,j] to G[i+1,j]G[i+1,j] represents a price change factor of d=1/ud=1/u. So, the entire tree occupies only the upper-left triangular part of the grid.

Lemma 2.2 shows that within the upper-left triangle of GG, if a cell is green then the cell to its right is also green.

Lemma 2.2.

(Gi,j+1=Gi,j+1green)\left(G_{i,j+1}=G^{green}_{i,j+1}\right)\implies for i[0,T]i\in[0,T] and j[0,i1]j\in[0,i-1], (Gi,j=Gi,jgreen)\left(G_{i,j}=G^{green}_{i,j}\right).

Proof.

Observe that s0u+s1u=mp(u1u)+mu=eYΔt\frac{s_{0}}{u}+s_{1}u=mp\left(u-\frac{1}{u}\right)+\frac{m}{u}=e^{-Y\Delta t}.

Let δi,j=Gi,jredGi,jgreen\delta_{i,j}=G_{i,j}^{red}-G_{i,j}^{green}, Δi,j=δi,j+1δi,j\Delta_{i,j}=\delta_{i,j+1}-\delta_{i,j}, δ~i,j=Gi,jGi,jgreen\tilde{\delta}_{i,j}=G_{i,j}-G_{i,j}^{green} and Δ~i,j=δ~i,j+1δ~i,j\tilde{\Delta}_{i,j}=\tilde{\delta}_{i,j+1}-\tilde{\delta}_{i,j}.

We will use mathematical induction to show that Δi,j0\Delta_{i,j}\leq 0 for all 0iT0\leq i\leq T and j<ij<i.

As δT,j=KSu2jT\delta_{T,j}=K-Su^{2j-T} and thus ΔT,j=Su2jT(1u2)0\Delta_{T,j}=Su^{2j-T}\left(1-u^{2}\right)\leq 0, the claim holds for i=Ti=T.

Now suppose that the claim holds for some given i+1Ti+1\leq T, i.e., Δi+1,j0\Delta_{i+1,j}\leq 0 for 0j<i+1T0\leq j<i+1\leq T. Since Δi+1,j0\Delta_{i+1,j}\leq 0, there exists a ji+1j_{i+1} such that Gi+1,j=Gi+1,jgreenG_{i+1,j}=G_{i+1,j}^{green} when j>ji+1j>j_{i+1} and Gi+1,j=Gi+1,jredG_{i+1,j}=G_{i+1,j}^{red} when jjij\leq j_{i}, where ji+1j_{i+1} is the largest jj such that δi+1,j>0\delta_{i+1,j}>0. Then,

  • -

    for j>ji+1j>j_{i+1}, Δ~i+1,j=δ~i+1,j+1δ~i+1,j=00=0\tilde{\Delta}_{i+1,j}=\tilde{\delta}_{i+1,j+1}-\tilde{\delta}_{i+1,j}=0-0=0;

  • -

    for j=ji+1j=j_{i+1}, Δ~i+1,j=δ~i+1,j+1δ~i+1,j=δ~i+1,j0\tilde{\Delta}_{i+1,j}=\tilde{\delta}_{i+1,j+1}-\tilde{\delta}_{i+1,j}=-\tilde{\delta}_{i+1,j}\leq 0; and

  • -

    for j<ji+1j<j_{i+1}, Δ~i+1,j=Δi+1,j0\tilde{\Delta}_{i+1,j}=\Delta_{i+1,j}\leq 0.

Thus, Δ~i+1,j0\tilde{\Delta}_{i+1,j}\leq 0 for all j[0,i+1)j\in[0,i+1).

Hence, Δi,j=δi,j+1δi,j\Delta_{i,j}=\delta_{i,j+1}-\delta_{i,j}

=k{0,1}sk((Gi+1,j+k+1Gi+1,j+k+1green)(Gi+1,j+kGi+1,j+kgreen))\displaystyle=\sum_{k\in\left\{0,1\right\}}{s_{k}\left(\left(G_{i+1,j+k+1}-G_{i+1,j+k+1}^{green}\right)-\left(G_{i+1,j+k}-G_{i+1,j+k}^{green}\right)\right)}
+k{0,1}sk(Gi+1,j+k+1greenGi+1,j+kgreen)+Gi,jgreenGi,j+1green\displaystyle\phantom{=}+\sum_{k\in\left\{0,1\right\}}{s_{k}\left(G_{i+1,j+k+1}^{green}-G_{i+1,j+k}^{green}\right)}+G_{i,j}^{green}-G_{i,j+1}^{green}
=s0Δ~i+1,j+s1Δ~i+1,j+1+Su2ji(eYΔt1)(u21)0\displaystyle=s_{0}\tilde{\Delta}_{i+1,j}+s_{1}\tilde{\Delta}_{i+1,j+1}+Su^{2j-i}\left(e^{-Y\Delta t}-1\right)\left(u^{2}-1\right)\leq 0

Therefore, Δi,j0\Delta_{i,j}\leq 0 for all 0i<T0\leq i<T and j<ij<i.

Because Δi,j0\Delta_{i,j}\leq 0, there exists a jij_{i} such that when j>jij>j_{i}, Gi,j=Gi,jgreenG_{i,j}=G_{i,j}^{green} and when jjij\leq j_{i}, Gi,j=Gi,jredG_{i,j}=G_{i,j}^{red}, where jij_{i} is the largest jj such that δi,j>0\delta_{i,j}>0. Now if Gi,j=Gi,jgreenG_{i,j}=G_{i,j}^{green}, we have j>jij>j_{i}, and thus j+1>jij+1>j_{i}, which implies that Gi,j+1=Gi,j+1greenG_{i,j+1}=G_{i,j+1}^{green}. ∎

Lemma 2.3.

(Gi,j=Gi,jgreen)(ji2+12VΔtln(KS1eRΔt1eYΔt))\left(G_{i,j}=G^{green}_{i,j}\right)\implies\Big{(}j\geq\frac{i}{2}+\frac{1}{2V\sqrt{\Delta t}}\ln{\Big{(}\frac{K}{S}\frac{1-e^{-R\Delta t}}{1-e^{-Y\Delta t}}\Big{)}}\Big{)}

Proof.

By Gi,j=Gi,jgreenG_{i,j}=G^{green}_{i,j} and Def. 2.1, we know Su2jiK>s0Gi+1,j+s1Gi+1,j+1S\cdot u^{2j-i}-K>s_{0}G_{i+1,j}+s_{1}G_{i+1,j+1}, Gi+1,jSu2j(i+1)KG_{i+1,j}\geq S\cdot u^{2j-(i+1)}-K, and Gi+1,j+1Su2(j+1)(i+1)KG_{i+1,j+1}\geq S\cdot u^{2(j+1)-(i+1)}-K.

Denote z=Su2jiz=S\cdot u^{2j-i} and use above inequalities:

zKs0(Su2j(i+1)K)+s1(Su2(j+1)(i+1)K)\displaystyle z-K\geq s_{0}(S\cdot u^{2j-(i+1)}-K)+s_{1}(S\cdot u^{2(j+1)-(i+1)}-K)
=m(Su2j(i+1)K)+mp((Su2(j+1)(i+1)K)\displaystyle=m(S\cdot u^{2j-(i+1)}-K)+mp((S\cdot u^{2(j+1)-(i+1)}-K)
(Su2j(i+1)K))\displaystyle\quad-(S\cdot u^{2j-(i+1)}-K))
=(1/u)(mzmKu+mp(u21)z)\displaystyle=(1/u)\left(mz-mKu+mp\left(u^{2}-1\right)z\right)
=1u(mzmKu+m((e(RY)Δt1u)/(u1u))(u21)z)\displaystyle=\frac{1}{u}\left(mz-mKu+m\left({\left(e^{(R-Y)\Delta t}-\frac{1}{u}\right)}/{\left(u-\frac{1}{u}\right)}\right)(u^{2}-1)z\right)
=1u(mKu+ueYΔtz)=eRΔtK+eYΔt\displaystyle=\frac{1}{u}\left(-mKu+u\cdot e^{-Y\Delta t}z\right)=-e^{-R\Delta t}K+e^{-Y\Delta t}

Then we have: zK1eRΔt1eYΔtu2jiKS1eRΔt1eYΔtz\geq K\frac{1-e^{-R\Delta t}}{1-e^{-Y\Delta t}}\implies u^{2j-i}\geq\frac{K}{S}\frac{1-e^{-R\Delta t}}{1-e^{-Y\Delta t}}

\displaystyle\implies ji2+12VΔt(lnKS+ln1eRΔt1eYΔt)\displaystyle j\geq\frac{i}{2}+\frac{1}{2V\sqrt{\Delta t}}\left(\ln{\frac{K}{S}}+\ln{\frac{1-e^{-R\Delta t}}{1-e^{-Y\Delta t}}}\right)

So, (Gi,j=Gi,jgreen)(ji2+12VΔtln(KS1eRΔt1eYΔt))\left(G_{i,j}=G^{green}_{i,j}\right)\implies\left(j\geq\frac{i}{2}+\frac{1}{2V\sqrt{\Delta t}}\ln{\left(\frac{K}{S}\frac{1-e^{-R\Delta t}}{1-e^{-Y\Delta t}}\right)}\right)

Lemma 2.4 shows that within the upper-left triangular area of GG if a cell is green then the cell below it must also be green, and Lemma 2.6 shows that if a cell is red then the cell diagonally left below it must also be red.

Lemma 2.4.

(Gi+1,j=Gi+1,jgreen)(Gi,j=Gi,jgreen)\left(G_{i+1,j}=G^{green}_{i+1,j}\right)\implies\left(G_{i,j}=G^{green}_{i,j}\right) for i[0,T1]i\in[0,T-1] and j[0,i]j\in[0,i].

Proof.

(Gi+1,j=Gi+1,jgreen)(Gi+1,j+1=Gi+1,j+1green)\left(G_{i+1,j}=G^{green}_{i+1,j}\right)\implies\left(G_{i+1,j+1}=G^{green}_{i+1,j+1}\right) by Lemma 2.2. Let z=Su2jiz=Su^{2j-i}, then Gi+1,j=zuKG_{i+1,j}=\frac{z}{u}-K and Gi+1,j+1=uzKG_{i+1,j+1}=uz-K.

Gi,jred=s0Gi+1,j+s1Gi+1,j+1=m(1p)(zuK)+mp(uzK)\displaystyle G^{red}_{i,j}=s_{0}G_{i+1,j}+s_{1}G_{i+1,j+1}=m(1-p)\left(\frac{z}{u}-K\right)+mp(uz-K)
=Su2ji(1eYΔt)+Su2jiKeRΔt\displaystyle=-Su^{2j-i}\left(1-e^{-Y\Delta t}\right)+Su^{2j-i}-Ke^{-R\Delta t}
Su1+1VΔt(lnKS+ln1eRΔt1eYΔt)(1eYΔt)+Su2jiKeRΔt\displaystyle\leq-Su^{1+\frac{1}{V\sqrt{\Delta t}}\left(\ln{\frac{K}{S}}+\ln{\frac{1-e^{-R\Delta t}}{1-e^{-Y\Delta t}}}\right)}\left(1-e^{-Y\Delta t}\right)+Su^{2j-i}-Ke^{-R\Delta t}
(by Lemma 2.3 at Gi+1,jG_{i+1,j})
=uK(1eRΔt)+Su2jiKeRΔt(since u1VΔt=e)\displaystyle=-uK\left(1-e^{-R\Delta t}\right)+Su^{2j-i}-Ke^{-R\Delta t}\quad(\textrm{since~{}}u^{\frac{1}{V\sqrt{\Delta t}}}=e)
K(1eRΔt)+Su2jiKeRΔt=Su2jiK=Gi,jgreen\displaystyle\leq-K\left(1-e^{-R\Delta t}\right)+Su^{2j-i}-Ke^{-R\Delta t}=Su^{2j-i}-K=G^{green}_{i,j}

Hence, the statement holds. ∎

Lemma 2.5.

For any i[0,T2]i\in[0,T-2] and j<ij<i: Gi,jGi+2,j+1G_{i,j}\geq G_{i+2,j+1}.

Proof.

We use mathematical induction. For i=T2i=T-2, we have our base case: Gi,jmax(0,Su2jiK)=GT,j+1G_{i,j}\geq\max(0,Su^{2j-i}-K)=G_{T,j+1}.

Suppose it holds true for all Gi+1,jG_{i+1,j} for some given iT3i\geq T-3 and j[0,i]j\in[0,i]. Then Gi,j=max(s0Gi+1,j+s1Gi+1,j+1,G_{i,j}=\max(s_{0}G_{i+1,j}+s_{1}G_{i+1,j+1}, Su2jiK)max(s0Gi+3,j+1+s1Gi+3,j+2,Su2jiK)=Gi+2,j+1Su^{2j-i}-K)\geq\max(s_{0}G_{i+3,j+1}+s_{1}G_{i+3,j+2},Su^{2j-i}-K)=G_{i+2,j+1}. ∎

Lemma 2.6.

(Gi+1,j+1=Gi+1,j+1red)(Gi,j=Gi,jred)\left(G_{i+1,j+1}=G_{i+1,j+1}^{red}\right)\Rightarrow\left(G_{i,j}=G_{i,j}^{red}\right) for i[0,T1]i\in[0,T-1] and j[0,i1]j\in[0,i-1].

Proof.

Gi+1,j+1=Gi+1,j+1redG_{i+1,j+1}=G_{i+1,j+1}^{red} and Lemma 2.4 gives: Gi+2,j+1=Gi+2,j+1redGi+2,j+1green=Su2jiKG_{i+2,j+1}=G_{i+2,j+1}^{red}\geq G_{i+2,j+1}^{green}=Su^{2j-i}-K. Now, by Lemma 2.5, Gi,jGi+2,j+1Su2jiK=Gi,jgreenG_{i,j}\geq G_{i+2,j+1}\geq Su^{2j-i}-K=G_{i,j}^{green}. So, Gi,j=Gi,jredG_{i,j}=G_{i,j}^{red}. ∎

The following corollary says that at every time step all red cells appear to the left of all green cells, and the boundary between the green and the red regions either remains the same or moves by one cell towards the left every time step.

Corollary 2.7.

For every i[0,T1]i\in[0,T-1], there exists an index ji[0,i]j_{i}\in[0,i] such that all cells Gi,jG_{i,j} with 0jji0\leq j\leq j_{i} are red and all (possibly zero) cells Gi,jG_{i,j} with ji<jij_{i}<j\leq i are green. Also, for i[0,T2]i\in[0,T-2], ji+11jiji+1j_{i+1}-1\leq j_{i}\leq j_{i+1}.

Proof.

Follows from Lemmas 2.2, 2.4, and 2.6. ∎

2.3. Algorithm for American call option pricing under BOPM

Refer to caption
(a) Partitioning the solution space into trapezoids
Refer to caption
(b) Decomposing a trapezoid into smaller trapezoids
Figure 3. American call option pricing under BOPM

The solution space is a right-angle isosceles triangle with base length TT. We know the boundary between the red and green cells in the first row of the triangle (solution space); however, we do not know the locus of the boundary in the subsequent rows of the triangle. We compute the boundary in the following process.

We partition the triangle (solution space) into trapezoids (see Figure 3(a)). We compute the first trapezoid with its first row as the same first row of the triangle (the solution space) and solve this newly created trapezoid (we explain how we create a trapezoid and solve it later in this section). Then we compute the second trapezoid with its first row as the last row of the first trapezoid and solve the second trapezoid. This process continues until we are left with a right-angle isosceles triangle with base size at most T\sqrt{T}. We solve this triangle iteratively by doing quadratic work in time 𝒪(T){\mathcal{O}}\left({T}\right). We describe the process in detail below.

Partitioning the Triangle into Trapezoids. Let abcabc be a right-angle isosceles triangle with base length TT (see Figure 3(a)). Let 1\ell_{1} be the number of red cells on the line segment abab. They will appear consecutively from aa to some point pp. Let dd be the point in the line segment acac that is 1\ell_{1} distance from aa. Draw a horizontal line from dd; let the line intersect bcbc at point ee. Therefore, we get a trapezoid abedabed with height 1\ell_{1} with 1\ell_{1} red cells in its first row.

We solve trapezoid abedabed, which means that we compute the values of all red cells in its last row and thus find the boundary point qq between red and green cells in dede. Let |dq|=2|dq|=\ell_{2}. Let ff be the point on dcdc that |df|=2|df|=\ell_{2}. We draw a horizontal line fgfg, and get the second trapezoid degfdegf of height 2\ell_{2}. The last row of the trapezoid abedabed becomes the first row of degfdegf.

We solve the trapezoid degfdegf and all subsequent trapezoids created following the approach described above until we are left with a right-angle isosceles triangle xycxyc with base length at most T\sqrt{T}. We compute all cell values of xycxyc iteratively in 𝒪(T){\mathcal{O}}\left({T}\right) time.

Solving a Trapezoid. Solving a trapezoid means given all red cell values in its first row computing all red cell values in its last row. Let abcdabcd be a trapezoid of height \ell with \ell red cells from aa to qq in its first row (Figure 3(b)). Let rr be the point on adad such that |ar|=/2|ar|=\left\lfloor\ell/2\right\rfloor. Draw a horizontal line from rr intersecting bcbc at vv. To compute the values of red cells on dcdc, we (1)(1) compute all red cells on rvrv, and (2)(2) using the cell values on rvrv, compute all red cells on dcdc.

(1) Computing all red cells on 𝐫𝐯\boldsymbol{rv}.

Let rvrv and qdqd intersect tt. We compute the cell values on line segment rtrt using the FFT-based stencil algorithm of (Ahmad et al., 2021) which are guaranteed to be red because of Corollary 2.7. Note that |rt|=|pt|=/2|rt|=|pt|=\left\lfloor\ell/2\right\rfloor. The newly created trapezoid pbvtpbvt has height /2\left\lfloor\ell/2\right\rfloor, and there are /2\left\lfloor\ell/2\right\rfloor red cells in its first row. We solve trapezoid pbvtpbvt recursively similar to trapezoid abcdabcd, and thus compute all red cell values on tvtv.

(2) Computing all red cells on 𝐝𝐜\boldsymbol{dc} using the red cells on 𝐫𝐯\boldsymbol{rv}.

Let point uu be on the boundary between red and green cells on rvrv. Draw a line from uu parallel to qdqd intersecting dcdc at ww. Next, draw a line through ww perpendicular to dcdc intersecting rvrv at point ss. We compute all red cells on dcdc in exactly the same way we computed the red cells on rvrv above. We first use the FFT-based stencil algorithm of (Ahmad et al., 2021) to find all cell values on dwdw, and then recursively solve trapezoid svcwsvcw of height /2\left\lceil\ell/2\right\rceil to find all red cell values on wcwc.

Solving trapezoids of height 𝓞(𝟏)\boldsymbol{{\mathcal{O}}\left({1}\right)} (base case).

We solve trapezoids of height 𝒪(1){\mathcal{O}}\left({1}\right) in 𝒪(1){\mathcal{O}}\left({1}\right) work and span using the naïve looping code (e.g., see the pseudocode in Figure 1).

The following theorem gives the work and span of our algorithm.

Theorem 2.8.

Our algorithm solves the American call option pricing problem under BOPM in 𝒪(Tlog2T){\mathcal{O}}\left({T\log^{2}T}\right) work and 𝒪(T){\mathcal{O}}\left({T}\right) span, where TT is the number of time steps.

Proof.

Let ζ1()\zeta_{1}(\ell) and ζ()\zeta_{\infty}(\ell) be the work and span, respectively, for solving a trapezoid of height \ell recursively. We call the 𝒪(log){\mathcal{O}}\left({\ell\log\ell}\right)-work FFT-based periodic algorithm (Ahmad et al., 2021) twice and solve two smaller trapezoids of height /2\ell/2 each recursively. Hence, ζ1()=2ζ1(/2)+Θ(log)=𝒪(log2)\zeta_{1}(\ell)=2\zeta_{1}(\left\lceil\ell/2\right\rceil)+\Theta(\ell\log\ell)={\mathcal{O}}\left({\ell\log^{2}\ell}\right). Observe that although the two smaller trapezoids must be solved one after the other (e.g., svcwsvcw after pbvtpbvt in Figure 3(b)), each of them can be solved in parallel with the FFT-based algorithm (of span 𝒪(logloglog){\mathcal{O}}\left({\log\ell\log\log\ell}\right) (Ahmad et al., 2021)) called to find the red cells on its left (on line segments rtrt and dwdw, respectively). Hence, ζ()=2ζ(/2)+𝒪(logloglog)=𝒪()\zeta_{\infty}(\ell)=2\zeta_{\infty}(\left\lceil\ell/2\right\rceil)+{\mathcal{O}}\left({\log\ell\log\log\ell}\right)={\mathcal{O}}\left({\ell}\right).

Suppose that we solve kk trapezoids of heights 1,2,,k\ell_{1},\ell_{2},\ldots,\ell_{k}, respectively, using the above process as shown in Figure 3(a). Let Ψ1\Psi_{1} and Ψ\Psi_{\infty} be the total work and span, respectively, for solving those kk trapezoids followed by the time needed to solve the leftover triangle of height 𝒪(T){\mathcal{O}}\left({\sqrt{T}}\right). Then Ψ1=(1ik𝒪(ilog2i))+𝒪(T)=𝒪(Tlog2T)\Psi_{1}=\left(\sum_{1\leq i\leq k}{{\mathcal{O}}\left({\ell_{i}\log^{2}\ell_{i}}\right)}\right)+{\mathcal{O}}\left({T}\right)={\mathcal{O}}\left({T\log^{2}T}\right). Since those trapezoids and the triangle are solved in sequence, we have Ψ=1ik𝒪(i)+𝒪(T)=𝒪(T)\Psi_{\infty}=\sum_{1\leq i\leq k}{{\mathcal{O}}\left({\ell_{i}}\right)}+{\mathcal{O}}\left({\sqrt{T}}\right)={\mathcal{O}}\left({T}\right). ∎

3. American Call Option under the Trinomial Option Pricing Model

The trinomial options pricing model (TOPM) encodes the possible sequences of prices for a given asset within the structure of a trinomial tree (see Figure 2(c)(c)). It expands on BOPM by allowing the value of an asset to remain unchanged after a given time step. TOPM was introduced by Boyle (Boyle, 1986), and while it is less popular than the BOPM, it provides for the possibility of more accurate predictions than BOPM at the cost of only a constant factor blowup in runtime when using naïve 𝒪(T2){\mathcal{O}}\left({T^{2}}\right) methods. Langat, Mwaniki, and Kiprop showed that TOPM converges to the same solution as Black-Scholes with half as many time steps (Langat et al., 2019). TOPM is also equivalent to the explicit finite difference method (Hull, 2003).

TOPM carries over many properties from BOPM, e.g., Xnode=S×uNuNdX_{node}=S\times u^{N_{u}-N_{d}} since the number of “remain the same” moves does not factor into the price. Here u=eV2Δtu=e^{V\sqrt{2\Delta t}}, and d=1/ud=1/u. The exercise value in TOPM is max(XnodeK,0)\text{max}(X_{node}-K,0).

The transition probabilities can be expressed as
pu=((e(RY)Δt/2d)/(ud))2p_{u}=\left({(e^{(R-Y)\Delta t/2}-\sqrt{d})}/{(\sqrt{u}-\sqrt{d})}\right)^{2},
pd=((ue(RY)Δt/2)/(ud))2p_{d}=\left({(\sqrt{u}-e^{(R-Y)\Delta t/2})}/{(\sqrt{u}-\sqrt{d})}\right)^{2}, and po=1pupdp_{o}=1-p_{u}-p_{d}, which are alternate forms of those given in (Hull, 2003).

Let m=eRΔtm=e^{-R\Delta t}, s0=mpu,s1=mpo,s2=mpds_{0}=mp_{u},s_{1}=mp_{o},s_{2}=mp_{d}. Let Gi,jG_{i,j} denote the grid value in the row i[0,T]i\in[0,T] and column j[0,2i]j\in[0,2i] of the (T+1)×(2T+1)(T+1)\times(2T+1) grid GG. Let Gi,jgreen=SujiKG^{green}_{i,j}=S\cdot u^{j-i}-K, and let Gi,jred=k{0,1,2}skGi+1,j+kG^{red}_{i,j}=\sum_{k\in\{0,1,2\}}{s_{k}G_{i+1,j+k}} if i[0,T)i\in[0,T), and 0 otherwise. Then similar to BOPM:

Gi,j={Gi,jred,if Gi,jredGi,jgreenGi,jgreen,otherwise.\displaystyle G_{i,j}=\begin{cases}G^{red}_{i,j},&\text{if }G^{red}_{i,j}\geq G^{green}_{i,j}\\ G^{green}_{i,j},&\text{otherwise.}\end{cases}

In the supplementary material, we show that the TOPM grid shows properties similar to the BOPM grid, that is, in every row all red cells appear first in contiguous locations followed by all green cells, and with every time step the red-green boundary moves by at most one cell to the left. This allows us to use a similar algorithm to that given for BOPM (Section 2.3) with the identical work and span.

4. American Put Option Under the Black-Scholes-Merton Model

4.1. Black-Scholes-Merton Pricing Model (BSM)

BSM is a mathematical method to calculate the theoretical value of an option contract. The option pricing problem is transformed into a partial differential equation (PDE) with variable coefficients. An explicit formula for the price can be obtained assuming a log-normal distribution of the asset price. Note that the limit of the discrete-time BOPM approximates the continuous-time BSM under the same assumption. While BOPM utilizes simple statistical methods, BSM requires a solution of a stochastic differential equation.

Denote stock price at time tt by 𝒮(t)\mathcal{S}(t). BSM claims that

there is a deterministic relation between the option price and the stock price and time. This means that there is a deterministic function v(t,x)v(t,x) for option price xx and time tt such that: X(t)=v(t,𝒮(t))X(t)=v(t,\mathcal{S}(t)), where X(t)X(t) represents the value of the option at time tt. Now BSM derives that v(t,x)v(t,x) satisfies the following two-area classical form:

(6) v(t,x)={1r(vt(t,x)+rxvx(t,x)+12σ2x22vx2(t,x)),if x>L(Tt)Kx,if 0xL(Tt)\displaystyle v(t,x)=\left\{\begin{array}[]{lr@{}}\frac{1}{r}\left(\begin{array}[]{@{}c@{}}\frac{\partial v}{\partial t}(t,x)+rx\frac{\partial v}{\partial x}(t,x)\\ +\frac{1}{2}\sigma^{2}x^{2}\frac{\partial^{2}v}{\partial x^{2}}(t,x)\end{array}\right),&\mbox{if $x>L(T-t)$}\\ \vspace{-0.4cm}&\\ K-x,&\mbox{if $0\leq x\leq L(T-t)$}\end{array}\right.

where L(0)=KL(0)=K, vx(t,L(Tt)+)=vx(t,L(Tt))=1\frac{\partial v}{\partial x}(t,L(T-t)^{+})=\frac{\partial v}{\partial x}(t,L(T-t)^{-})=-1, and v(t,L(Tt)+)=v(t,L(Tt))v(t,L(T-t)^{+})=v(t,L(T-t)^{-}) for 0t<T0\leq t<T.

It is equivalent to satisfying the following:

(12) v(t,x){1r(vt(t,x)+rxvx(t,x)+12σ2x22vx2(t,x)),if 0xL(Tt)Kx,if x>L(Tt)\displaystyle v(t,x)\geq\left\{\begin{array}[]{lr@{}}\frac{1}{r}\left(\begin{array}[]{@{}c@{}}\frac{\partial v}{\partial t}(t,x)+rx\frac{\partial v}{\partial x}(t,x)\\ +\frac{1}{2}\sigma^{2}x^{2}\frac{\partial^{2}v}{\partial x^{2}}(t,x)\end{array}\right),&\mbox{if $0\leq x\leq L(T-t)$}\\ \vspace{-0.4cm}&\\ K-x,&\mbox{if $x>L(T-t)$}\end{array}\right.

where, t[0,T]t\in[0,T], Recall that at the maturity time TT, the option price (call option case) will be X(T)=max(𝒮(T)K,0)=(K𝒮(T))+X(T)=\max(\mathcal{S}(T)-K,0)=(K-\mathcal{S}(T))^{+}.

It also means that v(T,x)=(Kx)+v(T,x)=(K-x)^{+} on the boundary. Therefore, the goal becomes solving Equation (6) or Inequality (12). For more details on how the BSM model is formulated or why the complementary form is equivalent to the classical form, see (Chen and Chadam, 2007; Shreve, 2004; Van M., 1976). We show the two areas are contiguous and find properties of their boundary in Section 4.2, which can be exploited by our algorithm in Section 4.3.

4.2. Properties of the Two-Area Boundary

Note that Equation (6) includes dimensional variables. First, we find nondimensionalized forms of Equations (6) and (12).

Let s=lnxKs=\ln\frac{x}{K}, τ=12σ2(Tt)\tau=\frac{1}{2}\sigma^{2}(T-t), v~(τ,s)=1Kv(t,x)\tilde{v}(\tau,s)=\frac{1}{K}v(t,x), L~(τ)=L(Tt)\tilde{L}(\tau)=L(T-t) and ω=2rσ2\omega=\frac{2r}{\sigma^{2}}. Then: v~s=xKvx,2v~s2=x2K2vx2+xKvx,v~τ=2Kσ2vt\frac{\partial\tilde{v}}{\partial s}=\frac{x}{K}\frac{\partial v}{\partial x},\frac{\partial^{2}\tilde{v}}{\partial s^{2}}=\frac{x^{2}}{K}\frac{\partial^{2}v}{\partial x^{2}}+\frac{x}{K}\frac{\partial v}{\partial x},\frac{\partial\tilde{v}}{\partial\tau}=-\frac{2}{K\sigma^{2}}\frac{\partial v}{\partial t}.

Applying Equation (6), and v~(τ,s)\tilde{v}(\tau,s) satisfes the following:

(17) v~(τ,s)={1ω((ω1)v~s(τ,s)+2v~s2(τ,s)v~τ(τ,s)),if s>L~(τ)1es,if sL~(τ)\displaystyle\tilde{v}(\tau,s)=\left\{\begin{array}[]{lr@{}}\frac{1}{\omega}\left(\begin{array}[]{@{}c@{}}(\omega-1)\frac{\partial\tilde{v}}{\partial s}(\tau,s)\\ +\frac{\partial^{2}\tilde{v}}{\partial s^{2}}(\tau,s)-\frac{\partial\tilde{v}}{\partial\tau}(\tau,s)\end{array}\right),&\mbox{if $s>\tilde{L}(\tau)$}\\ 1-e^{s},&\mbox{if $s\leq\tilde{L}(\tau)$}\end{array}\right.

It also satisfies the dimensionless complementary form after plugging into Inequality (12):

(22) v~(τ,s){1ω((ω1)v~s(τ,s)+2v~s2(τ,s)v~τ(τ,s)),if sL~(τ)1es,if s>L~(τ)\displaystyle\tilde{v}(\tau,s)\geq\left\{\begin{array}[]{lr@{}}\frac{1}{\omega}\left(\begin{array}[]{@{}c@{}}(\omega-1)\frac{\partial\tilde{v}}{\partial s}(\tau,s)\\ +\frac{\partial^{2}\tilde{v}}{\partial s^{2}}(\tau,s)-\frac{\partial\tilde{v}}{\partial\tau}(\tau,s)\end{array}\right),&\mbox{if $s\leq\tilde{L}(\tau)$}\\ 1-e^{s},&\mbox{if $s>\tilde{L}(\tau)$}\end{array}\right.

where, L~(0)=1\tilde{L}(0)=1 and v~(s,0)=max(1es,0)\tilde{v}(s,0)=\max(1-e^{s},0). Consider an approximation vknv_{k}^{n} of v~(nΔτ,kΔs)\tilde{v}(n\Delta\tau,k\Delta s) and use the following finite-difference approximations (where, v~\tilde{v} denotes v~(nΔτ,kΔs)\tilde{v}(n\Delta\tau,k\Delta s)):

v~τvkn+1vknΔt,v~svk+1nvk1n2Δs,2v~s2vk+1n2vkn+vk1n(Δs)2\displaystyle\frac{\partial\tilde{v}}{\partial\tau}\approx\frac{v_{k}^{n+1}-v_{k}^{n}}{\Delta t},\frac{\partial\tilde{v}}{\partial s}\approx\frac{v_{k+1}^{n}-v_{k-1}^{n}}{2\Delta s},\frac{\partial^{2}\tilde{v}}{\partial s^{2}}\approx\frac{v_{k+1}^{n}-2v_{k}^{n}+v_{k-1}^{n}}{(\Delta s)^{2}}

Now plug it into Equation (17) to obtain these two regions:

(28) vkn+1={((1ωΔτ2Δτ(Δs)2)vkn+h{1,1}(Δτ(Δs)2+h(ω1)2ΔτΔs)vk+hn),if k>L^1ekΔs,if kL^\displaystyle v_{k}^{n+1}=\left\{\begin{array}[]{lr@{}}\left(\begin{array}[]{@{}c@{}}\left(1-\omega\Delta\tau-2\frac{\Delta\tau}{(\Delta s)^{2}}\right)v_{k}^{n}+\\ ~{}\sum\limits_{h\in\{-1,1\}}{\left(\frac{\Delta\tau}{(\Delta s)^{2}}+h\frac{(\omega-1)}{2}\frac{\Delta\tau}{\Delta s}\right)v_{k+h}^{n}}\end{array}\right),&\mbox{if }k>\widehat{L}\\ \vspace{-.4cm}&\\ 1-e^{k\Delta s},&\mbox{if }k\leq\widehat{L}\end{array}\right.

where, L^=L~((n+1)Δτ)Δs\widehat{L}=\frac{\tilde{L}((n+1)\Delta\tau)}{\Delta s} and vk0=max(1ekΔs,0)v_{k}^{0}=\max(1-e^{k\Delta s},0) for all integer kk. It also satisfies the following condition by discretizing Equation (22):

(33) vkn+1{((1ωΔτ2Δτ(Δs)2)vkn+h{1,1}(Δτ(Δs)2+h(ω1)2ΔτΔs)vk+hn),if kL^1ekΔs,if k>L^\displaystyle v_{k}^{n+1}\geq\left\{\begin{array}[]{lr@{}}\left(\begin{array}[]{@{}c@{}}\left(1-\omega\Delta\tau-\frac{2\Delta\tau}{(\Delta s)^{2}}\right)v_{k}^{n}+\\ \sum\limits_{h\in\{-1,1\}}{\left(\frac{\Delta\tau}{(\Delta s)^{2}}+\frac{h(\omega-1)}{2}\frac{\Delta\tau}{\Delta s}\right)v_{k+h}^{n}}\end{array}\right),&\mbox{if }k\leq\widehat{L}\\ 1-e^{k\Delta s},&\mbox{if }k>\widehat{L}\end{array}\right.

Similar to the BOPM for American option, we define the green zone and the red zone:

Definition 4.1.

We call that vknv_{k}^{n} is in the green zone when kΔsL~(nΔτ)k\Delta s\leq\tilde{L}(n\Delta\tau), otherwise it is in the red zone.

To apply Equations (28)–(33), we need to determine the form of L~((n+1)Δτ)\tilde{L}((n+1)\Delta\tau) which has no analytical form, although it has asymptotic results (Alobaidi and Mallier, 2001) or approximation results (Zhu and He, 2007; Zhu, 2006). Instead we can use the following theorem from (Chen and Chadam, 2007):

Theorem 4.2 ((Chen and Chadam, 2007)).

The early exercise boundary curve L~(τ)\tilde{L}(\tau) is monotonically decreasing.

Theorem 4.3.

Let a=Δτ(Δs)2+(ω1)ΔτΔsa=\frac{\Delta\tau}{(\Delta s)^{2}}+(\omega-1)\frac{\Delta\tau}{\Delta s}, b=Δτ(Δs)2(ω1)ΔτΔsb=\frac{\Delta\tau}{(\Delta s)^{2}}-(\omega-1)\frac{\Delta\tau}{\Delta s}, c=1abωΔτc=1-a-b-\omega\Delta\tau, and knk_{n} be the largest integer such that knL~(nΔτ)Δsk_{n}\leq\frac{\tilde{L}(n\Delta\tau)}{\Delta s}. Then 0knkn+110\leq k_{n}-k_{n+1}\leq 1 when a,b,c0a,b,c\geq 0.

Proof.

We first prove that knkn+10k_{n}-k_{n+1}\geq 0. Suppose that this is not true. Then we will have: L~((n+1)Δτ)kn+1kn+1>L~(nΔτ)\tilde{L}((n+1)\Delta\tau)\geq k_{n+1}\geq k_{n}+1>\tilde{L}(n\Delta\tau), which contradicts Theorem 4.2.

Now we prove that knkn+11k_{n}-k_{n+1}\leq 1. Because knk_{n} is in the green zone, we will have the following:

vknn=1eknΔs(1ωΔτab)vknn1+avkn+1n1+bvkn1n1\displaystyle v_{k_{n}}^{n}=1-e^{k_{n}\Delta s}\geq(1-\omega\Delta\tau-a-b)v_{k_{n}}^{n-1}+av_{k_{n}+1}^{n-1}+bv_{k_{n}-1}^{n-1}
(1ωΔτab)(1eknΔs)+a(1e(kn+1)Δs)\displaystyle\geq(1-\omega\Delta\tau-a-b)\left(1-e^{k_{n}\Delta s}\right)+a\left(1-e^{(k_{n}+1)\Delta s}\right)
+b(1e(kn1)Δs)\displaystyle+b\left(1-e^{(k_{n}-1)\Delta s}\right)
ωΔτ(1eknΔs)+eknΔs(a(eΔs1)+b(eΔs1))0\displaystyle\Rightarrow\omega\Delta\tau\left(1-e^{k_{n}\Delta s}\right)+e^{k_{n}\Delta s}\left(a\left(e^{\Delta s}-1\right)+b\left(e^{-\Delta s}-1\right)\right)\geq 0
1e(kn1)Δs>(1ωΔτab)(1e(kn1)Δs)\displaystyle\Rightarrow 1-e^{(k_{n}-1)\Delta s}>(1-\omega\Delta\tau-a-b)\left(1-e^{(k_{n}-1)\Delta s}\right)
+a(1e(kn)Δs)+b(1e(kn2)Δs)\displaystyle\quad+a\left(1-e^{(k_{n})\Delta s}\right)+b\left(1-e^{(k_{n}-2)\Delta s}\right)

Considering vkn1n+1v_{k_{n}-1}^{n+1}, we first observe that:

vkn2n=1e(kn2)Δs,vkn1n=1e(kn1)Δs,vknn=1e(kn)Δs\displaystyle v_{k_{n}-2}^{n}=1-e^{(k_{n}-2)\Delta s},v_{k_{n}-1}^{n}=1-e^{(k_{n}-1)\Delta s},v_{k_{n}}^{n}=1-e^{(k_{n})\Delta s}

Now we show that vkn1n+1v_{k_{n}-1}^{n+1} is in the green zone. Suppose that it is in the red zone. Then:

vkn1n+1\displaystyle v_{k_{n}-1}^{n+1} =(1ωΔτab)(1e(kn1)Δs)+a(1e(kn)Δs)\displaystyle=(1-\omega\Delta\tau-a-b)\left(1-e^{(k_{n}-1)\Delta s}\right)+a\left(1-e^{(k_{n})\Delta s}\right)
+b(1e(kn2)Δs)<1e(kn1)Δs\displaystyle\quad+b\left(1-e^{(k_{n}-2)\Delta s}\right)<1-e^{(k_{n}-1)\Delta s}

which leads to a contradiction because vkn1n+1v_{k_{n}-1}^{n+1} should be 1e(kn1)Δs\geq 1-e^{(k_{n}-1)\Delta s}. By the definition of kn+1k_{n+1}, we must have kn+1kn1k_{n+1}\geq k_{n}-1, completing the proof of the theorem. ∎

4.3. Algorithm for American put option pricing under BSM

Refer to caption
(a) Decomposing trapezoid abcdabcd into smaller trapezoids
Refer to caption
(b) Partitioning the solution space into trapezoids
Figure 4. American put option pricing under BSM

Our algorithm for the American put option under BSM is similar to our algorithm for the American call option under BOPM as described in Section 2.

Observe that we will have to perform a nonlinear stencil computation based on the update equation (28). For TT time steps we use a T×2TT\times 2T space-time grid with the time dimension being TT and spatial dimension 2T2T. According to Equation (28), we compute a cell vkn+1v_{k}^{n+1} of that grid, where n+1n+1 represents the time coordinate and kk the spatial coordinate, from cells vk1nv_{k-1}^{n}, vknv_{k}^{n}, and vk+1nv_{k+1}^{n} using a 3-point stencil provided k>L~((n+1)Δτ)Δsk>\frac{\tilde{L}((n+1)\Delta\tau)}{\Delta s}. Otherwise, we set it to 1ekΔs1-e^{k\Delta s}. In the first case, cell vkn+1v_{k}^{n+1} will be in the red zone, and in the second case it will be in the green zone. As explained in Section 4.2, the entire boundary between these two zones in not known ahead of time, but it moves by at most one cell toward the green region with every time step. The goal of the algorithm is to compute the value of the central cell of the spatial dimension at time step TT (e.g., apex xx of the isosceles triangle abxabx in Figure 4(b)).

We solve the problem by decomposing the isosceles triangle abxabx into a sequence of the isosceles trapezoids of geometrically decreasing heights (see Figure 4(b)) and solving (i.e., find the cell values of top base given those of the bottom base of the trapezoid) them one by one from bottom to top until we reach a leftover triangle of small constant size which we solve naïvely to find the value of xx. We solve an isosceles trapezoid recursively by decomposing it into two smaller trapezoids of smaller height and solving them recursively and also using the FFT-based algorithm from (Ahmad et al., 2021) solve two subtrapezoids that are entirely composed of red cells (see Figure 4(a)). Details of this algorithm are given below.

We first show how to solve an isosceles trapezoid abdcabdc (as shown in Figure 4(a)) of height ll, bottom/longer base (abab) length 44\ell, and cab=dba=45\angle cab=\angle dba=45^{\circ}. Thus, the top/shorter base dcdc is of length 22\ell. Solving trapezoid abdcabdc means computing the values of the cell at the top base cdcd given the values of the cells at the bottom base abab.

If 10\ell\leq 10, we naïvely solve abdcabdc and identify the location of the red-green boundary point pp in cdcd in 𝒪(1){\mathcal{O}}\left({1}\right) time. If >10\ell>10, we find the row hnhn at height 2\frac{\ell}{2} and calculate all the cells in it. To do so, we recursively solve the trapezoid egljeglj which is found as follows:

1. Let ff be the point on abab that lies in the green region, but f+1f+1 is in the red region. Identify the points ee and jj to the left and right of ff, respectively, such that |ef|=|fj|=|ef|=|fj|=\ell.

2. Construct an isosceles trapezoid with base egeg, height 2\frac{\ell}{2} and top jljl such that jeg=lge=45\angle jeg=\angle lge=45^{\circ}. Thus, |jl|=|jl|=\ell.

After solving the trapezoid egljeglj, we have the cell values in jljl and the location of the red-green boundary point kk in jljl. We can easily calculate the values of cells in hjhj since those values are independent of time and depend only on spatial coordinates. Finally, we use the FFTFFT-based algorithm of (Ahmad et al., 2021) to solve the trapezoid fbnlfbnl, where the point ll is found by forcing fbnlfbnl to be an isosceles trapezoid.

Therefore, we can get the values of the cells in lnln and thus the values of all the cells in hnhn. Then we can calculate the values of the cells in dcdc given the values of the cells in hnhn exactly the same way as we computed the cell values in hnhn from those in abab.

Now, let us go back to Figure 4(b) to see how to compute the value of the apex xx. After solving abdcabdc as above to calculate the cells in cdcd, if |cd|10|cd|\leq 10, we naïvely calculate the value of xx, which takes 𝒪(1){\mathcal{O}}\left({1}\right) time. But if |cd|>10|cd|>10, we recursively apply our trapezoid algorithm to solve a smaller trapezoid with the bottom base cdcd.

Theorem 4.4.

Our algorithm solves the American put option pricing problem under BSM in 𝒪(Tlog2T){\mathcal{O}}\left({T\log^{2}T}\right) work and 𝒪(T){\mathcal{O}}\left({T}\right) span, where TT is the number of time steps.

Proof.

The proof is very similar to that of Theorem 2.8. Let ζ1()\zeta_{1}(\ell) and ζ()\zeta_{\infty}(\ell) be the work and span, respectively, for solving a trapezoid of height \ell (see Figure 4(a)). We recursively solve two trapezoids of height /2\ell/2 each in sequence but use a parallel FFT-based algorithm (Ahmad et al., 2021) on each half, both size Θ(){\Theta}\left({\ell}\right). Since the FFT-based algorithm performs 𝒪(log){\mathcal{O}}\left({\ell\log{\ell}}\right) work in 𝒪(logloglog){\mathcal{O}}\left({\log{\ell}\log\log{\ell}}\right) span, we can write: ζ1()=2ζ1(2)+𝒪(log)\zeta_{1}(\ell)=2\zeta_{1}\left(\frac{\ell}{2}\right)+{\mathcal{O}}\left({\ell\log\ell}\right) if >10\ell>10 and 𝒪(1){\mathcal{O}}\left({1}\right) otherwise. Similarly, ζ()=2ζ(2)+𝒪(logloglog)\zeta_{\infty}(\ell)=2\zeta_{\infty}\left(\frac{\ell}{2}\right)+{\mathcal{O}}\left({\log{\ell}\log\log{\ell}}\right) if >10\ell>10 and 𝒪(1){\mathcal{O}}\left({1}\right) otherwise. Solving, ζ1()=𝒪(log2)\zeta_{1}(\ell)={\mathcal{O}}\left({\ell\log^{2}{\ell}}\right) and ζ()=𝒪()\zeta_{\infty}(\ell)={\mathcal{O}}\left({\ell}\right).

Now, let Ψ1(T)\Psi_{1}(T) and Ψ(T)\Psi_{\infty}(T) be the work and the span, respectively, of solving an isosceles triangle of base size TT (see Figure 4(b)). Then Ψ1(T)=Ψ1(T2)+𝒪(Tlog2T)\Psi_{1}(T)=\Psi_{1}\left(\frac{T}{2}\right)+{\mathcal{O}}\left({T\log^{2}T}\right) if T>10T>10 and 𝒪(1){\mathcal{O}}\left({1}\right) otherwise. Also, Ψ(T)=Ψ(T2)+𝒪(T)\Psi_{\infty}(T)=\Psi_{\infty}\left(\frac{T}{2}\right)+{\mathcal{O}}\left({T}\right) if T>10T>10 and 𝒪(1){\mathcal{O}}\left({1}\right) otherwise. Solving, Ψ1(T)=𝒪(Tlog2T)\Psi_{1}(T)={\mathcal{O}}\left({T\log^{2}{T}}\right) and Ψ=𝒪(T)\Psi_{\infty}={\mathcal{O}}\left({T}\right).∎

Refer to caption
(a) BOPM
Refer to caption
(b) TOPM
Refer to caption
(c) BSM
Figure 5. Running time comparisons of our algorithms with state-of-the-art parallel algorithms.
Refer to caption
(a) BOPM
Refer to caption
(b) TOPM
Refer to caption
(c) BSM
Figure 6. Comparison of energy consumption.
Refer to caption
(a) BOPM
Refer to caption
(b) TOPM
Refer to caption
(c) BSM
Refer to caption
(d) BOPM
Refer to caption
(e) TOPM
Refer to caption
(f) BSM
Figure 7. Comparison of L1 and L2 Cache misses.

5. Experimental Results

In this section, we present an experimental evaluation of our algorithms and compare them with the fastest existing solutions. Our experimental setup is shown in Table 3. The legends used in our plots are listed in Table 4, which are described in more detail in the next few paragraphs.

Table 3. Experimental setup on a Stampede2 (Stampede, ede2) SKX node.
Processor Intel Xeon Platinum 8160 (Skylake / SKX)
Cores 24 cores per socket, 2 sockets (total: 48 cores)
Cache sizes L1 32 KB / core, L2 1 MB / core, L3 33 MB / socket
Memory 144GB /tmp partition on a 200GB SSD
Compiler Intel C++ Compiler (ICC) v18.0.2
Compiler flags -O3 -xhost -ansi-alias -ipo -AVX512
Parallelization OpenMP 5.0
Thread affinity GOMP_CPU_AFFINITY
Table 4. Legends used in plots and tables.
Legend Meaning
fft-bopm, fft-topm, fft-bsm our FFT-based implementations for BOPM, TOPM, and BSM, respectively
ql-bopm, zb-bopm BOPM implementations from Par-bin-ops based on QuantLib and Zubaer et al.’s work, respectively
vanilla-topm, vanilla-bsm our parallel looping implementations for TOPM and BSM, respectively

Benchmarks. For benchmarks, we use American call option pricing under BOPM and TOPM, and American put option under the BSM. For the BOPM call option benchmarks, our baselines are the option call probability calculations from QuantLib (contributors, 2022) and Zubair et al.’s parallel cache optimized model (Zubair and Mukkamala, 2008). We use the optimized implementations of these two baselines available in Par-bin-ops (Brunelle, 2022). These implementations are the fastest existing implementations of BOPM call option pricing. For the TOPM call option and BSM put option, our FFT-based implementations are compared with our parallel looping-based vanilla implementations, as we could not find any publicly available faster implementations.

We use the perf (version: 3.10.0-1160.53.1.el7.x86_64.debug) tool (perf, [n. d.]) to analyze the system-wide energy consumption, and the PAPI (version: 5.6.0) library (Weaver et al., 2013) for cache miss counts for our implementations and benchmarks.

Par-bin-ops. In 2022, Brunelle et al. (Brunelle, 2022) released an open-source framework that can leverage parallel, cache-optimized algorithms to compute a variety of binomial option types and enables a simple interface for developers. We used the latest version from Github. Experimental evaluations by Brunelle et al. (Brunelle, 2022) has shown that Par-bin-ops achieves more than 139×139\times speedup over the QuantLib library when evaluating a European call option using 200,000 steps. Therefore, we chose the Par-bin-ops tool to benchmark our implementations of our FFT-based algorithms. To have a valid comparison of running times, we use Par-bin-ops for both QuantLib and Zubaer et al.’s (Zubair and Mukkamala, 2008) option probability calculation equations and report the running times comparing with our FFT-based implementations. We use the stencil-based cache-optimized version of Zubaer et al.’s algorithm from Par-bin-ops.

Parameter Values. As Table 2 shows, the only option pricing parameter that influences the performance bounds of the algorithms in our experiments is the number of time steps TT. Therefore, we keep all other option pricing parameters fixed in all of our experiments. We use the following parameter values: E=252E=252, K=130K=130, S=127.62S=127.62, R=0.00163R=0.00163, V=0.2V=0.2, Y=0.0163Y=0.0163 (notations are in Table 1).

5.1. Parallel Running Times

Figure 5(a)(a) shows the parallel running time comparison of American call option pricing calculations under BOPM. Our FFT-based algorithm uses a recursive divide-and-conquer approach. We have found empirically that a base case size of 8 steps yields the best running times. Experimental results show that our FFT-based algorithm can outperform Par-bin-ops for any number of step sizes for both serial and parallel implementations. We achieve more than 16×16\times speedup for T1000T\approx 1000 and more than 500×500\times speedup for T0.5millionT\approx 0.5~{}\textrm{million} w.r.t. Par-bin-ops implementations.

Our TOPM algorithm runs more than 2.5×2.5\times faster for T2000T\approx 2000 and more than 390×390\times faster for T2.1millionT\approx 2.1~{}\textrm{million} w.r.t. the parallel vanilla code. Figure 5(b)(b) shows the comparisons.

Figure 5(c)(c) shows the parallel running time comparisons of the American put option pricing computations under BSM. Our FFT-based implementation is compared with the looping-based vanilla implementation. Our algorithm achieves more than 8×8\times speedup for T1000T\approx 1000 and more than 14×14\times speedup for T0.13millionT\approx 0.13~{}\textrm{million} w.r.t. the vanilla implementation.

5.2. Energy Consumption

Figure 6 shows the comparison of system-wide energy consumption while running our FFT-based implementation and respective benchmarks. We collected the energy consumption estimate of the entire package (pkg) and the main memory (RAM) through the RAPL (Running Average Power Limit) interface of model-specific registers (MSR) using the perf (perf, [n. d.]) profiling tool. Our FFT-based BOPM, TOPM, and BSM implementations consume 99%, 99%, and 96% less energy, respectively, compared to their benchmarks for large TT values used in our experiments. For T4000T\approx 4000, the energy savings are 80%, 50%, and 40%, respectively. Energy plots for pkg and RAM separately are included as supplementary material.

5.3. Cache Misses

Figure 7 shows L1 cache-miss (= L2 cache access) and L2 cache-miss results, respectively, for all implementations. Our FFT-based implementation incurs far fewer L1 cache misses than both Par-bin-ops implementations for BOPM. However, while we incur far fewer L2 cache-misses than ql-bopm, the other one (zb-bopm) incurs fewer L2 misses than ours. In case of TOPM, while our FFT-based implementation incurs far fewer L2 misses than our parallel looping implementation (vanilla-topm), the trend is not clear for L1 misses. For BSM, neither L1 nor L2 misses seem to have a clear winner.

5.4. Scalability

As Table 2 shows, our algorithms reduce the span 𝒯\mathcal{T}_{\infty} of solving the option pricing problems that we consider by a factor of Ω(logT){\Omega}\left({\log{T}}\right), but they reduce the work 𝒯1\mathcal{T}_{1} by a substantially larger factor of Θ(T/log2T){\Theta}\left({T/\log^{2}{T}}\right). While the reduction in work leads to a significant reduction in energy consumption (see Section 5.2), it also leads to a low parallelism of 𝒯1/𝒯=Θ(log2T)\mathcal{T}_{1}/\mathcal{T}_{\infty}={\Theta}\left({\log^{2}{T}}\right). But it is still easy to see from the complexities given in Table 2 that the parallel running time 𝒯p\mathcal{T}_{p} of our algorithms will be asymptotically lower than that of the corresponding fastest existing parallel algorithms for every value of pp (stated in Proposition 1.1).

Table 5. Parallel run times (in ms) for T=215T=2^{15} as pp varies.
p=1p=1 p=2p=2 p=4p=4 p=8p=8 p=16p=16 p=32p=32 p=48p=48
fft-bopm 3232 2828 2424 2626 2929 3333 3838
ql-bopm 2655226552 1249812498 67856785 35303530 19501950 13241324 11911191

Table 5 above shows how the parallel running times of our FFT-based BOPM implementation (fft-bopm) and that of the QuantLib-based implementation from Par-bin-ops (ql-bopm) vary with pp as TT is kept fixed at 2152^{15}. We see that although the parallel running time of fft-bopm stops decreasing somewhere between p=4p=4 and p=8p=8, it remains significantly faster than ql-bopm even when p=48p=48. Our algorithm scales better as TT increases, e.g., for T=219T=2^{19}, it scales to a p[8,16)p\in[8,16), and compared to a subsecond running time of fft-bopm for p=1p=1, ql-bopm takes 2\approx 2 hours to run.

6. Conclusion and Future Work

We have designed fast American option pricing algorithms under the binomial, trinomial, and the Black-Scholes-Merton models. We solve a type of nonlinear stencil problem that is of independent interest with potential applications beyond quantitative finance.

Future work may explore other models for American option pricing, such as the time dependent volatility model, stochastic volatility model, and the jump diffusion model. European and Asian options, Lookback options, Knock-out Barrier options, and Bermudan options are also of interest.

Supplementary Material. Includes our FFT-based implementations and details on our TOPM results (Section 3).

Acknowledgements We want to thank Julian Shun, Yuanhao Wei, Shangdi Yu, Pranali Vani, and Sabiyyah Ali for valuable discussions and a careful review of the initial draft of the paper which helped enhance its quality and ensure the correctness.

References

  • (1)
  • Ahmad et al. (2021) Zafar Ahmad, Rezaul Chowdhury, Rathish Das, Pramod Ganapathi, Aaron Gregory, and Yimin Zhu. 2021. Fast stencil computations using fast Fourier transforms. In Proceedings of the 33rd ACM Symposium on Parallelism in Algorithms and Architectures. 8–21.
  • Alobaidi and Mallier (2001) Ghada Alobaidi and Roland Mallier. 2001. Asymptotic analysis of American call options. International Journal of Mathematics and Mathematical Sciences 27, 3 (2001), 177–188.
  • Ames (2014) William F. Ames. 2014. Numerical methods for partial differential equations. Academic press.
  • Andonov and Rajopadhye (1997) Rumen Andonov and Sanjay Rajopadhye. 1997. Optimal orthogonal tiling of 2-D iterations. Journal of Parallel and Distributed computing 45, 2 (1997), 159–165.
  • Atangana and Nieto (2015) Abdon Atangana and Juan J. Nieto. 2015. Numerical solution for the model of RLC circuit via the fractional derivative without singular kernel. Advances in Mechanical Engineering 7, 10 (2015), 1687814015613758.
  • Aubin et al. (2004) Joelle Aubin, David F. Fletcher, and Catherine Xuereb. 2004. Modeling turbulent flow in stirred tanks with CFD: the influence of the modeling approach, turbulence model and numerical scheme. Experimental thermal and fluid science 28, 5 (2004), 431–445.
  • Avissar and Pielke (1989) Roni Avissar and Roger A Pielke. 1989. A parameterization of heterogeneous land surfaces for atmospheric numerical models and its impact on regional meteorology. Monthly Weather Review 117, 10 (1989), 2113–2136.
  • Bachelier (1900) Louis Bachelier. 1900. Théorie de la spéculation. In Annales scientifiques de l’École normale supérieure, Vol. 17. 21–86.
  • Bakshi and Madan (2000) Gurdip Bakshi and Dilip Madan. 2000. Spanning and derivative-security valuation. Journal of financial economics 55, 2 (2000), 205–238.
  • Ball and Roma (1994) Clifford A. Ball and Antonio Roma. 1994. Stochastic volatility option pricing. Journal of Financial and Quantitative Analysis 29, 4 (1994), 589–607.
  • Bandishti et al. (2012) Vinayaka Bandishti, Irshad Pananilath, and Uday Bondhugula. 2012. Tiling stencil computations to maximize parallelism. In Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis. 1–11.
  • Barth and Deconinck (2013) Timothy J. Barth and Herman Deconinck. 2013. High-order Methods for Computational Physics. Vol. 9. Springer Science & Business Media.
  • Bensoussan (1984) Alain Bensoussan. 1984. On the theory of option pricing. Acta Applicandae Mathematica 2, 2 (1984), 139–158.
  • Bergomi (2015) Lorenzo Bergomi. 2015. Stochastic volatility modeling. CRC press.
  • Black and Scholes (1973) Fischer Black and Myron Scholes. 1973. The pricing of options and corporate liabilities. Journal of political economy 81, 3 (1973), 637–654.
  • Bondhugula et al. (2016) Uday Bondhugula, Aravind Acharya, and Albert Cohen. 2016. The Pluto+ algorithm: A practical approach for parallelization and locality optimization of affine loop nests. ACM Transactions on Programming Languages and Systems 38, 3 (2016), 1–32.
  • Bondhugula et al. (2017) Uday Bondhugula, Vinayaka Bandishti, and Irshad Pananilath. 2017. Diamond tiling: Tiling techniques to maximize parallelism for stencil computations. IEEE Transactions on Parallel and Distributed Systems 28, 5 (2017), 1285–1298.
  • Boyle (1986) Phelim Boyle. 1986. Option Valuation Using a Three-Jump Process. International Options Journal 3 (1986), 7–12.
  • Boyle (1977) Phelim P. Boyle. 1977. Options: A monte carlo approach. Journal of financial economics 4, 3 (1977), 323–338.
  • Brunelle (2022) Terryn Brunelle. 2022. Parallelizing Tree Traversals for Binomial Option Pricing. Ph. D. Dissertation. Massachusetts Institute of Technology.
  • Burden et al. (2015) Richard L. Burden, Douglas J. Faires, and Annette M. Burden. 2015. Numerical analysis. Cengage learning.
  • Chang et al. (2007) Chuang-Chang Chang, San-Lin Chung, and Richard C. Stapleton. 2007. Richardson extrapolation techniques for the pricing of American-style options. Journal of Futures Markets: Futures, Options, and Other Derivative Products 27, 8 (2007), 791–817.
  • Chen and Chadam (2007) Xinfu Chen and John Chadam. 2007. A mathematical analysis of the optimal exercise boundary for American put options. SIAM Journal on Mathematical Analysis 38, 5 (2007), 1613–1641.
  • Chen and Forsyth (2008) Zhuliang Chen and Peter A. Forsyth. 2008. A numerical scheme for the impulse control formulation for pricing variable annuities with a guaranteed minimum withdrawal benefit (GMWB). Numer. Math. 109, 4 (2008), 535–569.
  • Cont and Voltchkova (2005) Rama Cont and Ekaterina Voltchkova. 2005. A finite difference scheme for option pricing in jump diffusion and exponential Lévy models. SIAM J. Numer. Anal. 43, 4 (2005), 1596–1626.
  • contributors (2022) The QuantLib contributors. 2022. QuantLib: a free/open-source library for quantitative finance. https://doi.org/10.5281/zenodo.6461652 If you use this software, please cite it using these metadata..
  • Cormen et al. (2009) Thomas H Cormen, Charles E Leiserson, Ronald L Rivest, and Clifford Stein. 2009. Introduction to Algorithms. MIT Press.
  • Cox et al. (1979) John C. Cox, Stephen A. Ross, and Mark Rubinstein. 1979. Option pricing: A simplified approach. Journal of financial Economics 7, 3 (1979), 229–263.
  • Crank and Nicolson (1947) John Crank and Phyllis Nicolson. 1947. A practical method for numerical evaluation of solutions of partial differential equations of the heat-conduction type. In Mathematical proceedings of the Cambridge philosophical society, Vol. 43. Cambridge University Press, 50–67.
  • Cundall and Hart (1992) Peter A. Cundall and Roger D. Hart. 1992. Numerical modelling of discontinua. Engineering computations (1992).
  • Dempster and Hong (2002) Michael A. H. Dempster and George S. S. Hong. 2002. Spread option valuation and the fast Fourier transform. In Mathematical Finance—Bachelier Congress 2000. Springer, 203–220.
  • Derman and Kani (1994) Emanuel Derman and Iraj Kani. 1994. Riding on a smile. Risk 7, 2 (1994), 32–39.
  • Derman and Kani (1998) Emanuel Derman and Iraj Kani. 1998. Stochastic implied trees: Arbitrage pricing with stochastic term and strike structure of volatility. International journal of theoretical and applied finance 1, 01 (1998), 61–110.
  • Duan and Simonato (1998) Jin-Chuan Duan and Jean-Guy Simonato. 1998. Empirical martingale simulation for asset prices. Management Science 44, 9 (1998), 1218–1233.
  • Duffie et al. (2000) Darrell Duffie, Jun Pan, and Kenneth Singleton. 2000. Transform analysis and asset pricing for affine jump-diffusions. Econometrica 68, 6 (2000), 1343–1376.
  • Dupire (1994) Bruno Dupire. 1994. Pricing with a smile. Risk 7, 1 (1994), 18–20.
  • Frigo and Strumpen (2005) Matteo Frigo and Volker Strumpen. 2005. Cache oblivious stencil computations. In Proceedings of the 19th International Conference on Supercomputing. 361–366.
  • Frigo and Strumpen (2009) Matteo Frigo and Volker Strumpen. 2009. The cache complexity of multithreaded cache oblivious algorithms. Theory of Computing Systems 45, 2 (2009), 203–233.
  • Galai and Masulis (1976) Dan Galai and Ronald W. Masulis. 1976. The option pricing model and the risk factor of stock. Journal of Financial economics 3, 1-2 (1976), 53–81.
  • Gammie et al. (2003) Charles F. Gammie, Jonathan C. McKinney, and Gábor Tóth. 2003. HARM: a numerical scheme for general relativistic magnetohydrodynamics. The Astrophysical Journal 589, 1 (2003), 444.
  • Geman et al. (2001) Hélyette Geman, Dilip B. Madan, and Marc Yor. 2001. Time changes for Lévy processes. Mathematical Finance 11, 1 (2001), 79–96.
  • Han and Wang (2015) Daozhi Han and Xiaoming Wang. 2015. A second order in time, uniquely solvable, unconditionally stable numerical scheme for Cahn–Hilliard–Navier–Stokes equation. J. Comput. Phys. 290 (2015), 139–156.
  • Heston (1993) Steven L. Heston. 1993. A closed-form solution for options with stochastic volatility with applications to bond and currency options. The review of financial studies 6, 2 (1993), 327–343.
  • Hildebrand (1987) Francis B. Hildebrand. 1987. Introduction to numerical analysis. Courier Corporation.
  • Hirouchi et al. (2009) Tomoyuki Hirouchi, Tomohiro Takaki, and Yoshihiro Tomita. 2009. Development of numerical scheme for phase field crystal deformation simulation. Computational materials science 44, 4 (2009), 1192–1197.
  • Högstedt et al. (1999) Karin Högstedt, Larry Carter, and Jeanne Ferrante. 1999. Selecting tile shape for minimal execution time. In ACM Symposium on Parallel algorithms and architectures. 201–211.
  • Holmes (2007) Mark H. Holmes. 2007. Introduction to numerical methods in differential equations. Springer.
  • Hull (2003) John C. Hull. 2003. Options futures and other derivatives (5th ed.). Pearson Education India.
  • Ibanez and Zapatero (2004) Alfredo Ibanez and Fernando Zapatero. 2004. Monte Carlo valuation of American options through computation of the optimal exercise frontier. Journal of Financial and Quantitative Analysis 39, 2 (2004), 253–275.
  • Jackson et al. (2008) Kenneth R. Jackson, Sebastian Jaimungal, and Vladimir Surkov. 2008. Fourier space time-stepping for option pricing with Lévy models. Journal of Computational Finance 12, 2 (2008), 1–29.
  • James (1980) Frederick James. 1980. Monte Carlo theory and practice. Reports on progress in Physics 43, 9 (1980), 1145.
  • Johnson (2010) Martin T. Johnson. 2010. A numerical scheme to calculate temperature and salinity dependent air-water transfer velocities for any gas. Ocean Science 6, 4 (2010), 913–932.
  • Kalnay et al. (1990) Eugenia Kalnay, Masao Kanamitsu, and Wayman E. Baker. 1990. Global numerical weather prediction at the National Meteorological Center. Bulletin of the American Meteorological Society 71, 10 (1990), 1410–1428.
  • Karatzas and Shreve (1998) Ioannis Karatzas and Steven E. Shreve. 1998. Methods of mathematical finance. Vol. 39. Springer.
  • Komissarov (2002) Serguei S. Komissarov. 2002. Time-dependent, force-free, degenerate electrodynamics. Monthly Notices of the Royal Astronomical Society 336, 3 (2002), 759–766.
  • Kou (2002) Steven G. Kou. 2002. A jump-diffusion model for option pricing. Management science 48, 8 (2002), 1086–1101.
  • Krishnamoorthy et al. (2007) Sriram Krishnamoorthy, Muthu Baskaran, Uday Bondhugula, Jagannathan Ramanujam, Atanas Rountev, and Ponnuswamy Sadayappan. 2007. Effective automatic parallelization of stencil computations. ACM sigplan notices 42, 6 (2007), 235–244.
  • Kumar et al. (2012) Sunil Kumar, Ahmet Yildirim, Yasir Khan, Hossein Jafari, Khosro Sayevand, and Leilei Wei. 2012. Analytical solution of fractional Black-Scholes European option pricing equation by using Laplace transform. Journal of fractional calculus and Applications 2, 8 (2012), 1–9.
  • Langat et al. (2019) Kenneth Kiprotich Langat, Joseph Ivivi Mwaniki, and George Korir Kiprop. 2019. Pricing options using trinomial lattice method. Journal of Finance and Economics 7, 3 (2019), 81–87.
  • LeVeque (2007) Randall J. LeVeque. 2007. Finite difference methods for ordinary and partial differential equations: steady-state and time-dependent problems. SIAM.
  • Long et al. (2008) Wen Long, James T. Kirby, and Zhiyu Shao. 2008. A numerical scheme for morphological bed level calculations. Coastal Engineering 55, 2 (2008), 167–180.
  • Longstaff and Schwartz (2001) Francis A. Longstaff and Eduardo S. Schwartz. 2001. Valuing American options by simulation: a simple least-squares approach. The review of financial studies 14, 1 (2001), 113–147.
  • Lord and Rougemont (2004) Gabriel J. Lord and Jacques Rougemont. 2004. A numerical scheme for stochastic PDEs with Gevrey regularity. IMA journal of numerical analysis 24, 4 (2004), 587–604.
  • Lord et al. (2008) Roger Lord, Fang Fang, Frank Bervoets, and Cornelis W. Oosterlee. 2008. A fast and accurate FFT-based method for pricing early-exercise options under Lévy processes. SIAM Journal on Scientific Computing 30, 4 (2008), 1678–1705.
  • MacKean (1965) Henry P. MacKean. 1965. A free boundary problem for the heat equation arising from a problem in mathematical economics. Industrial Management Review 6 (1965), 32–39.
  • Madan et al. (1998) Dilip B. Madan, Peter P. Carr, and Eric C. Chang. 1998. The variance gamma process and option pricing. Review of Finance 2, 1 (1998), 79–105.
  • Malas et al. (2015) Tareq Malas, Georg Hager, Hatem Ltaief, Holger Stengel, Gerhard Wellein, and David Keyes. 2015. Multicore-optimized wavefront diamond blocking for optimizing stencil updates. SIAM Journal on Scientific Computing 37, 4 (2015), C439–C464.
  • Malas et al. (2014) Tareq M. Malas, Georg Hager, Hatem Ltaief, and David E. Keyes. 2014. Towards energy efficiency and maximum computational intensity for stencil algorithms using wavefront diamond temporal blocking. ArXiv abs/1410.5561 (2014).
  • Mangeney et al. (2002) André Mangeney, Francesco Califano, Carlo Cavazzoni, and Pavel M. Travnicek. 2002. A numerical scheme for the integration of the Vlasov–Maxwell system of equations. J. Comput. Phys. 179, 2 (2002), 495–538.
  • Matsuda (2004) Kazuhisa Matsuda. 2004. Introduction to Merton jump diffusion model. Department of Economics, The Graduate Center, The City University of New York, New York (2004).
  • Merton (1973) Robert C. Merton. 1973. Theory of rational option pricing. The Bell Journal of economics and management science (1973), 141–183.
  • Merton (1976) Robert C. Merton. 1976. Option pricing when underlying stock returns are discontinuous. Journal of financial economics 3, 1-2 (1976), 125–144.
  • Metropolis and Ulam (1949) Nicholas Metropolis and Stanislaw Ulam. 1949. The monte carlo method. Journal of the American statistical association 44, 247 (1949), 335–341.
  • Murray and Simmonds (1991) Ross J. Murray and Ian Simmonds. 1991. A numerical scheme for tracking cyclone centres from digital data. Australian meteorological magazine 39, 3 (1991), 155–166.
  • Najm et al. (1998) Habib N. Najm, Peter S. Wyckoff, and Omar M. Knio. 1998. A semi-implicit numerical scheme for reacting flow: I. stiff chemistry. J. Comput. Phys. 143, 2 (1998), 381–402.
  • Oliveira (2014) Pedro S. Oliveira. 2014. The convolution method for pricing American options under Lévy processes. Ph. D. Dissertation. University of Lisbon.
  • Palamadai Natarajan et al. (2017) Ekanathan Palamadai Natarajan, Maryam Mehri Dehnavi, and Charles Leiserson. 2017. Autotuning divide-and-conquer stencil computations. Concurrency and Computation: Practice and Experience 29, 17 (2017), e4127.
  • Pang (1999) Tao Pang. 1999. An Introduction to Computational Physics. American Association of Physics Teachers.
  • Paoli and Schatzman (2002) Laetitia Paoli and Michelle Schatzman. 2002. A numerical scheme for impact problems I: The one-dimensional case. SIAM J. Numer. Anal. 40, 2 (2002), 702–733.
  • perf ([n. d.]) perf [n. d.]. perf: Linux profiling with performance counters. https://perf.wiki.kernel.org/index.php/Main_Page.
  • Peter and Packer (2007) HÃ Peter and Frank Packer. 2007. Understanding asset prices: an overview. Bis Papers (2007).
  • Peyré (2011) Gabriel Peyré. 2011. The numerical tours of signal processing-advanced computational signal and image processing. IEEE Computing in Science and Engineering 13, 4 (2011), 94–97.
  • Prathumwan and Trachoo (2020) Din Prathumwan and Kamonchat Trachoo. 2020. On the solution of two-dimensional fractional Black–Scholes equation for European put option. Advances in Difference Equations 2020, 1 (2020), 1–9.
  • Rappaz et al. (2010) Michel Rappaz, Michel Bellet, and Michel Deville. 2010. Numerical Modeling in Materials Science and Engineering. Vol. 32. Springer Science & Business Media.
  • Rendleman (1979) Richard J. Rendleman. 1979. Two-state option pricing. The Journal of Finance 34, 5 (1979), 1093–1110.
  • Renson et al. (2016) Ludovic Renson, Gaëtan Kerschen, and Bruno Cochelin. 2016. Numerical computation of nonlinear normal modes in mechanical engineering. Journal of Sound and Vibration 364 (2016), 177–206.
  • Richardson (1911) Lewis F. Richardson. 1911. The approximate arithmetical solution by finite differences with an application to stresses in masonry dams. Philosophical Transactions of the Royal Society of America 210 (1911), 307–357.
  • Richardson and Gaunt (1927) Lewis F. Richardson and John A. Gaunt. 1927. VIII. The deferred approach to the limit. Philosophical Transactions of the Royal Society of London. Series A, containing papers of a mathematical or physical character 226, 636-646 (1927), 299–361.
  • Robert (1981) André Robert. 1981. A stable numerical integration scheme for the primitive meteorological equations. Atmosphere-Ocean 19, 1 (1981), 35–46.
  • Robert (1982) André Robert. 1982. A semi-Lagrangian and semi-implicit numerical integration scheme for the primitive meteorological equations. Journal of the Meteorological Society of Japan. Ser. II 60, 1 (1982), 319–325.
  • Runggaldier (2003) Wolfgang J. Runggaldier. 2003. Jump-diffusion models. In Handbook of heavy tailed distributions in finance. Elsevier, 169–209.
  • Samuelson et al. (2011) Paul A. Samuelson, Alison Etheridge, Mark Davis, and Louis Bachelier. 2011. Louis Bachelier’s Theory of Speculation: The Origins of Modern Finance. Princeton University Press.
  • Sato et al. (2010) Katsuto Sato, Hiroyuki Takizawa, Kazuhiko Komatsu, and Hiroaki Kobayashi. 2010. Automatic tuning of CUDA execution parameters for stencil processing. Software Automatic Tuning: From Concepts to State-of-the-Art Results (2010), 209–228.
  • Sewell (2005) Granville Sewell. 2005. The numerical solution of ordinary and partial differential equations. Vol. 75. John Wiley & Sons.
  • Sharpe (1978) William F. Sharpe. 1978. Investments. Prentice Hall.
  • Shreve (2005) Steven Shreve. 2005. Stochastic calculus for finance I: the binomial asset pricing model. Springer Science & Business Media.
  • Shreve (2004) Steven E. Shreve. 2004. Stochastic calculus for finance II: Continuous-time models. Vol. 11. Springer.
  • Smith (1976) Clifford W. Smith. 1976. Option pricing: A review. Journal of Financial Economics 3, 1-2 (1976), 3–51.
  • Snider and Banerjee (2010) Dale Snider and Sibashis Banerjee. 2010. Heterogeneous gas chemistry in the CPFD Eulerian–Lagrangian numerical scheme (ozone decomposition). Powder Technology 199, 1 (2010), 100–106.
  • Stampede (ede2) Stampede Stampede2. The Stampede2 supercomputing cluster. https://www.tacc.utexas.edu/systems/stampede2.
  • Stein and Stein (1991) Elias M. Stein and Jeremy C. Stein. 1991. Stock price distributions with stochastic volatility: an analytic approach. The review of financial studies 4, 4 (1991), 727–752.
  • Strikwerda (2004) John C. Strikwerda. 2004. Finite difference schemes and partial differential equations. SIAM.
  • Szilard (2004) Rudolph Szilard. 2004. Theories and applications of plate analysis: classical, numerical and engineering methods. Appl. Mech. Rev. 57, 6 (2004), B32–B33.
  • Taflove and Hagness (2005) Allen Taflove and Susan C. Hagness. 2005. Computational electrodynamics: the finite-difference time-domain method. Artech house.
  • Tang et al. (2011) Yuan Tang, Rezaul A. Chowdhury, Bradley C. Kuszmaul, Chi-Keung Luk, and Charles E. Leiserson. 2011. The pochoir stencil compiler. In Proceedings of the twenty-third annual ACM symposium on Parallelism in algorithms and architectures. 117–128.
  • Thijssen (2007) Jos Thijssen. 2007. Computational Physics. Cambridge University Press.
  • Thomas (2013) James W. Thomas. 2013. Numerical partial differential equations: finite difference methods. Vol. 22. Springer Science & Business Media.
  • Tsitsiklis and Van R. (2001) John N. Tsitsiklis and Benjamin Van R. 2001. Regression methods for pricing complex American-style options. IEEE Transactions on Neural Networks 12, 4 (2001), 694–703.
  • Van M. (1976) Pierre Van M. 1976. On optimal stopping and free boundary problems. Archive for Rational Mechanics and Analysis 60, 2 (1976), 101–148.
  • Van R. (2012) Ursula Van R. 2012. Numerical methods in computational electrodynamics: linear systems in practical applications. Vol. 12. Springer Science & Business Media.
  • Vese and Osher (2002) Luminita A. Vese and Stanley J. Osher. 2002. Numerical methods for p-harmonic flows and applications to image processing. SIAM J. Numer. Anal. 40, 6 (2002), 2085–2104.
  • Vesely (1994) Franz J. Vesely. 1994. Computational Physics. Springer.
  • Vijayaraghavan and Keith (1990) D. Vijayaraghavan and Theo Keith. 1990. An efficient, robust, and time accurate numerical scheme applied to a cavitation algorithm. (1990).
  • Weaver et al. (2013) Vincent M. Weaver, Dan Terpstra, Heike McCraw, Matt Johnson, Kiran Kasichayanula, James Ralph, John Nelson, Phil Mucci, Tushar Mohan, and Shirley Moore. 2013. PAPI 5: Measuring power, energy, and the cloud. In 2013 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS). IEEE. https://doi.org/10.1109/ispass.2013.6557155
  • Weickert (2001) Joachim Weickert. 2001. Applications of nonlinear diffusion in image processing and computer vision. Acta Math. Univ. Comenianae 70, 1 (2001), 33–50.
  • Wolf and Lam (1991) Michael E. Wolf and Monica S. Lam. 1991. A data locality optimizing algorithm. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation. 30–44.
  • Wolf et al. (1996) Michael E. Wolf, Dror E. Maydan, and Ding-Kai Chen. 1996. Combining loop transformations considering caches and scheduling. In Proceedings of the IEEE/ACM International Symposium on Microarchitecture. 274–286.
  • Wolfe (1987) Michael J. Wolfe. 1987. Iteration space tiling for memory hierarchies. Parallel Processing for Scientific Computing 357 (1987), 361.
  • Wonnacott (2002) David Wonnacott. 2002. Achieving scalable locality with time skewing. International Journal of Parallel Programming 30, 3 (2002), 181–221.
  • Yuan and Agrawal (2002) Lixia Yuan and Om P. Agrawal. 2002. A numerical scheme for dynamic systems containing fractional derivatives. Journal of Vibration and Acoustics 124, 2 (2002), 321–324.
  • Zhang et al. (2004) Jianfeng Zhang et al. 2004. A numerical scheme for BSDEs. Annals of Applied Probability 14, 1 (2004), 459–488.
  • Zhang et al. (2015) Jiyuan Zhang, Tze Meng Low, Qi Guo, and Franz Franchetti. 2015. A 3 d-stacked memory manycore stencil accelerator system. (2015).
  • Zhang et al. (2006) Peng Zhang, Sze Chun Wong, and Chi-Wang Shu. 2006. A weighted essentially non-oscillatory numerical scheme for a multi-class traffic flow model on an inhomogeneous highway. J. Comput. Phys. 212, 2 (2006), 739–756.
  • Zhu (2006) Song-Ping Zhu. 2006. A new analytical approximation formula for the optimal exercise boundary of American put options. International Journal of Theoretical and Applied Finance 9, 07 (2006), 1141–1177.
  • Zhu and He (2007) Song-Ping Zhu and Zhi-Wei He. 2007. Calculating the early exercise boundary of American put options with an approximation formula. International Journal of Theoretical and Applied Finance 10, 07 (2007), 1203–1227.
  • Zhylyevskyy (2010) Oleksandr Zhylyevskyy. 2010. A fast Fourier transform technique for pricing American options under stochastic volatility. Review of Derivatives Research 13, 1 (2010), 1–24.
  • Zubair and Mukkamala (2008) Mohammad Zubair and Ravi Mukkamala. 2008. High performance implementation of binomial option pricing. In Computational Science and Its Applications–ICCSA 2008: International Conference, Perugia, Italy, June 30–July 3, 2008, Proceedings, Part I 8. Springer, 852–866.

Appendix A American Call Option under Trinomial Option Pricing Model

A.1. Trinomial Option Pricing Model (TOPM)

Refer to caption
Figure 8. A trinomial tree

TOPM shares many properties with BOPM, which is natural as a trinomial tree (see Figure 8 has similar structure to that of a binomial tree. The node-value remains Xnode=S×uNuNdX_{node}=S\times u^{N_{u}-N_{d}} since the number of ”remain the same” moves does not factor into the price. However, the value of uu and dd is subtly different, as u=eV2Δtu=e^{V\sqrt{2\Delta t}}, and d=1/ud=1/u. We represent the exercise value very similarly, max(XnodeK,0)\text{max}(X_{node}-K,0).

For the trinomial value, we have now three dependencies, Xt,j,Xt,j+1,X_{t,j},X_{t,j+1}, and Xt,j+2X_{t,j+2}. Since we have three possible changes in value, we have three probabilities corresponding to moving up pup_{u}, moving down pdp_{d}, and remaining the same pop_{o}. Typically, the literature refers to the probability of remaining the same as pmp_{m}, but as since use mm to refer to the quantity eRte^{-Rt}, we use pop_{o} to avoid confusion. This lets us write the trinomial value as eRΔt(pdXt,j+poXt,j+1+puXt,j+2)e^{-R\Delta t}(p_{d}X_{t,j}+p_{o}X_{t,j+1}+p_{u}X_{t,j+2}), where pu=(e(RY)Δt/2dud)2p_{u}=\left(\frac{e^{(R-Y)\Delta t/2}-\sqrt{d}}{\sqrt{u}-\sqrt{d}}\right)^{2},
pd=(ue(RY)Δt/2ud)2p_{d}=\left(\frac{\sqrt{u}-e^{(R-Y)\Delta t/2}}{\sqrt{u}-\sqrt{d}}\right)^{2}, and po=1pupdp_{o}=1-p_{u}-p_{d}. The probabilities are alternate forms of those given in (Hull, 2003).

Let m=eRΔtm=e^{-R\Delta t}, s0=mpu,s1=mpo,s2=mpds_{0}=mp_{u},s_{1}=mp_{o},s_{2}=mp_{d}. We will reuse the notion of red and green cells to distinguish those with larger binomial and excercise values respectively.

Gi,jred={0,if i=Ts0Gi+1,j+s1Gi+1,j+1+s2Gi+1,j+2,0i<T\displaystyle G^{red}_{i,j}=\begin{cases}0,&\text{if }i=T\\ s_{0}G_{i+1,j}+s_{1}G_{i+1,j+1}+s_{2}G_{i+1,j+2},&0\leq i<T\end{cases}
Gi,jgreen=SujiK\displaystyle G^{green}_{i,j}=S\cdot u^{j-i}-K

A.2. Properties of the Red-Green Divider

Similarly to the binomial tree, we assume that the trinomial tree will be embedded in a (2T+2)×(T+1)(2T+2)\times(T+1) grid GG with the root at the bottom-left corner G[0,0]G[0,0] and the leaves in the top row G[T,0T]G[T,0...T]. The children of node G[i,j]G[i,j] are at G[i+1,j],G[i+1,j+1],G[i+1,j+2]G[i+1,j],G[i+1,j+1],G[i+1,j+2]. G[i+1,j]G[i+1,j] has a price change factor of dd, G[i+1,j+1]G[i+1,j+1] has the same price and G[i+1,j+2]G[i+1,j+2] has a price change factor of uu. The tree only takes up the upper left triangle, such that the diagonal has a slope of 1/2.

Lemma A.1 shows that within the upper-left triangular area of GG if a cell is green then the cell to its right must also be green.

Lemma A.1.

For i[0,T]i\in[0,T] and j[0,2i1]j\in[0,2i-1], (Gi,j=Gi,jgreen)(Gi,j+1=Gi,j+1green)\left(G_{i,j}=G^{green}_{i,j}\right)\implies\left(G_{i,j+1}=G^{green}_{i,j+1}\right).

Proof.

We will reuse the notions δi,j=Gi,jredGi,j,Δi,j=δi,j+1δi,j,δ~i,j=Gi,jGi,jgreen\delta_{i,j}=G_{i,j}^{red}-G_{i,j},\Delta_{i,j}=\delta_{i,j+1}-\delta_{i,j},\tilde{\delta}_{i,j}=G_{i,j}-G_{i,j}^{green} and Δ~i,j=δ~i,j+1δ~i,j\tilde{\Delta}_{i,j}=\tilde{\delta}_{i,j+1}-\tilde{\delta}_{i,j}. Therefore, it suffices again to show that Δi,j0\Delta_{i,j}\leq 0 for all 0iT0\leq i\leq T and j<2ij<2i. This can again be show through induction.

Consider the base case as i=Ti=T. δT,j=(SujTK)\delta_{T,j}=-\left(Su^{j-T}-K\right), which in turn means that ΔT,j=SujT(u1)0\Delta_{T,j}=-Su^{j-T}(u-1)\leq 0 since ugeq1ugeq1. So, the claim holds for i=Ti=T.

Assuming the claim holds for some i+1Ti+1\leq T, we will show that it holds for ii (Δi+1,j0\Delta_{i+1,j}\leq 0 for 0j<i+10\leq j<i+1). From this assumption, this implies the existence of a ji+1j_{i+1} such that Gi,j=Gi,jgreenG_{i,j}=G_{i,j}^{green} for j>ji+1j>j_{i+1} and Gi,j=Gi,jredG_{i,j}=G_{i,j}^{red} for jji+1j\leq j_{i+1}. This ji+1j_{i+1} is defined as the largest jj such that δi+1,j>0\delta_{i+1,j}>0, the last ”red” cell in row i+1i+1. Therefore:

  • -

    for j>ji+1j>j_{i+1}, Δ~i+1,j=δ~i+1,j+1δ~i+1,j=00=0\tilde{\Delta}_{i+1,j}=\tilde{\delta}_{i+1,j+1}-\tilde{\delta}_{i+1,j}=0-0=0;

  • -

    for j=ji+1j=j_{i+1}, Δ~i+1,j=δ~i+1,j+1δ~i+1,j=δ~i+1,j0\tilde{\Delta}_{i+1,j}=\tilde{\delta}_{i+1,j+1}-\tilde{\delta}_{i+1,j}=-\tilde{\delta}_{i+1,j}\leq 0; and

  • -

    for j<ji+1j<j_{i+1}, Δ~i+1,j=Δi+1,j0\tilde{\Delta}_{i+1,j}=\Delta_{i+1,j}\leq 0.

Now we examine:

Δi,j=\displaystyle\Delta_{i,j}= δi,j+1δi,j\displaystyle\delta_{i,j+1}-\delta_{i,j}
=\displaystyle= s0Gi+1,j+1+s1Gi+1,j+2+s2Gi+1,j+3Gi,j+1greens0Gi+1,j\displaystyle s_{0}G_{i+1,j+1}+s_{1}G_{i+1,j+2}+s_{2}G_{i+1,j+3}-G_{i,j+1}^{green}-s_{0}G_{i+1,j}
s1Gi+1,j+1s2Gi+1,j+2+Gi,jgreen\displaystyle-s_{1}G_{i+1,j+1}-s_{2}G_{i+1,j+2}+G_{i,j}^{green}
=\displaystyle= s0(Gi+1,j+1Gi+1,j+1greenGi+1,j+Gi+1,jgreen)+s1(Gi+1,j+2\displaystyle s_{0}(G_{i+1,j+1}-G_{i+1,j+1}^{green}-G_{i+1,j}+G_{i+1,j}^{green})+s_{1}(G_{i+1,j+2}
Gi+1,j+2greenGi+1,j+1+Gi+1,j+1green)+s2(Gi+1,j+3Gi+1,j+3green\displaystyle-G_{i+1,j+2}^{green}-G_{i+1,j+1}+G_{i+1,j+1}^{green})+s_{2}(G_{i+1,j+3}-G_{i+1,j+3}^{green}
Gi+1,j+2+Gi+1,j+2green)+s0(Gi+1,j+1greenGi+1,jgreen)+s1(Gi+1,j+2green\displaystyle-G_{i+1,j+2}+G_{i+1,j+2}^{green})+s_{0}(G_{i+1,j+1}^{green}-G_{i+1,j}^{green})+s_{1}(G_{i+1,j+2}^{green}
Gi+1,j+1green)+s2(Gi+1,j+3greenGi+1,j+2green)+Gi,jgreenGi,j+1green\displaystyle-G_{i+1,j+1}^{green})+s_{2}(G_{i+1,j+3}^{green}-G_{i+1,j+2}^{green})+G_{i,j}^{green}-G_{i,j+1}^{green}
=\displaystyle= s0δ~i+1,j+s1δ~i+1,j+1+s2δ~i+1,j+2+s0(Gi+1,j+1greenGi+1,jgreen)\displaystyle s_{0}\tilde{\delta}_{i+1,j}+s_{1}\tilde{\delta}_{i+1,j+1}+s_{2}\tilde{\delta}_{i+1,j+2}+s_{0}(G_{i+1,j+1}^{green}-G_{i+1,j}^{green})
+s1(Gi+1,j+2greenGi+1,j+1green)+s2(Gi+1,j+3greenGi+1,j+2green)+Gi,jgreen\displaystyle+s_{1}(G_{i+1,j+2}^{green}-G_{i+1,j+1}^{green})+s_{2}(G_{i+1,j+3}^{green}-G_{i+1,j+2}^{green})+G_{i,j}^{green}
Gi,j+1green\displaystyle-G_{i,j+1}^{green}

From the inductive assumption, we can assume that s0δ~i+1,j+s1δ~i+1,j+1+s2δ~i+1,j+20s_{0}\tilde{\delta}_{i+1,j}+s_{1}\tilde{\delta}_{i+1,j+1}+s_{2}\tilde{\delta}_{i+1,j+2}\leq 0. Therefore, to show that Δi,j0\Delta_{i,j}\leq 0, it suffices to show that the remaining terms are less than or equal to 0. A useful note is that since diagonals indicate no change with respect to the exercise value, Gi+1,j+2greenGi+1,j+1green=Gi,j+1greenGi,jgreenG_{i+1,j+2}^{green}-G_{i+1,j+1}^{green}=G_{i,j+1}^{green}-G_{i,j}^{green}, which lets us simplify the rest of this expression.

0\displaystyle 0\geq s0(Gi+1,j+1greenGi+1,jgreen)+(s11)(Gi+1,j+2greenGi+1,j+1green)\displaystyle s_{0}(G_{i+1,j+1}^{green}-G_{i+1,j}^{green})+(s_{1}-1)(G_{i+1,j+2}^{green}-G_{i+1,j+1}^{green})
+s2(Gi+1,j+3greenGi+1,j+2green)\displaystyle\quad+s_{2}(G_{i+1,j+3}^{green}-G_{i+1,j+2}^{green})
\displaystyle\geq s0(Suj+1i1KSuji1+K)+(s11)(Suj+2i1K\displaystyle s_{0}(Su^{j+1-i-1}-K-Su^{j-i-1}+K)+(s_{1}-1)(Su^{j+2-i-1}-K
Suj+1i1+K)\displaystyle\quad-Su^{j+1-i-1}+K)
+s2(Suj+3i1KSuj+2i1+K)\displaystyle\quad+s_{2}(Su^{j+3-i-1}-K-Su^{j+2-i-1}+K)
0\displaystyle 0\geq Suji1(s0(u1)+(s11)(u2u)+s2(u3u2))\displaystyle Su^{j-i-1}(s_{0}(u-1)+(s_{1}-1)(u^{2}-u)+s_{2}(u^{3}-u^{2}))

From here, we can start dividing out common terms so long as they are greater than 0.

0\displaystyle 0\geq s0(u1)+(s11)(u2u)+s2(u3u2)\displaystyle s_{0}(u-1)+(s_{1}-1)(u^{2}-u)+s_{2}(u^{3}-u^{2})
0\displaystyle 0\geq mpd(u1)+(mpo1)(u2u)+mpu(u3u2)\displaystyle mp_{d}(u-1)+(mp_{o}-1)(u^{2}-u)+mp_{u}(u^{3}-u^{2})
0\displaystyle 0\geq mpd(u1)+(mpo1)u(u1)+mpuu2(u1)\displaystyle mp_{d}(u-1)+(mp_{o}-1)u(u-1)+mp_{u}u^{2}(u-1)
0\displaystyle 0\geq mpd+(mmpdmpu1)u+mpuu2\displaystyle mp_{d}+(m-mp_{d}-mp_{u}-1)u+mp_{u}u^{2}
0\displaystyle 0\geq mpd(1u)+mpu(u2u)+(m1)u\displaystyle mp_{d}(1-u)+mp_{u}(u^{2}-u)+(m-1)u
0\displaystyle 0\geq m(u1)(upupd)+(m1)u\displaystyle m(u-1)(up_{u}-p_{d})+(m-1)u

We can transform the expression upupdup_{u}-p_{d}. We will let e(RY)Δt=ne^{(R-Y)\Delta t}=n to simplify.

upupd=\displaystyle up_{u}-p_{d}= u(ndud)2(unud)2\displaystyle u\left(\frac{\sqrt{n}-\sqrt{d}}{\sqrt{u}-\sqrt{d}}\right)^{2}-\left(\frac{\sqrt{u}-\sqrt{n}}{\sqrt{u}-\sqrt{d}}\right)^{2}
=\displaystyle= 1(ud)2((nudu)2(un)2)\displaystyle\frac{1}{(\sqrt{u}-\sqrt{d})^{2}}\left((\sqrt{nu}-\sqrt{du})^{2}-(\sqrt{u}-\sqrt{n})^{2}\right)
=\displaystyle= 1(ud)2(nu1u+n)\displaystyle\frac{1}{(\sqrt{u}-\sqrt{d})^{2}}\left(\sqrt{nu}-1-\sqrt{u}+\sqrt{n}\right)
(nu1+un)\displaystyle\quad\cdot\left(\sqrt{nu}-1+\sqrt{u}-\sqrt{n}\right)
=\displaystyle= 1(ud)2((n1)(u+1))((n+1)(u1))\displaystyle\frac{1}{(\sqrt{u}-\sqrt{d})^{2}}\left((\sqrt{n}-1)(\sqrt{u}+1)\right)\left((\sqrt{n}+1)(\sqrt{u}-1)\right)
=\displaystyle= (n1)(u1)(ud)2\displaystyle\frac{(n-1)(u-1)}{(\sqrt{u}-\sqrt{d})^{2}}

We can then substitute this back into the previous inequality: 0(n1)m(u1)2(ud)2+umu0\geq(n-1)m(u-1)^{2}(\sqrt{u}-\sqrt{d})^{-2}+um-u. Since (ud)2=(u+d2)1(u1)1(\sqrt{u}-\sqrt{d})^{-2}=(u+d-2)^{-1}\geq(u-1)^{-1}, we can make this substitution and if the resulting expression is less than or equal to 0, clearly the current expression as is will be less than or equal to 0 as well.

0\displaystyle 0\geq m(u1)(n1)+umu\displaystyle m(u-1)(n-1)+um-u
0\displaystyle 0\geq m(unnu+1)+umu\displaystyle m(un-n-u+1)+um-u
0\displaystyle 0\geq munmnmu+m+umu\displaystyle mun-mn-mu+m+um-u
u\displaystyle u\geq munmn+m\displaystyle mun-mn+m
u1\displaystyle u-1\geq (u1)eYΔt+eRΔt1\displaystyle(u-1)e^{-Y\Delta t}+e^{-R\Delta t}-1

Since eRΔt1e^{-R\Delta t}\leq 1, we get: u1(u1)eYΔt(u1)eYΔt+eRΔt1u-1\geq(u-1)e^{-Y\Delta t}\geq(u-1)e^{-Y\Delta t}+e^{-R\Delta t}-1 Which clearly holds since eYΔt1e^{-Y\Delta t}\leq 1.

By induction, Δi,j0\Delta_{i,j}\leq 0 for all 0iT0\leq i\leq T, 0j2i10\leq j\leq 2i-1, we prove Lemma A.1. ∎

Lemma A.2 shows that within the upper-left triangular area of GG if a cell is green then the cell below it must also be green.

Lemma A.2.

For i[0,T1]i\in[0,T-1] and j[0,2i]j\in[0,2i], (Gi+1,j=Gi+1,jgreen)\left(G_{i+1,j}=G^{green}_{i+1,j}\right) (Gi,j=Gi,jgreen)\implies\left(G_{i,j}=G^{green}_{i,j}\right).

Proof.

Assume that this is false. This would imply that for some i,j (Gi,j=Gi,jred)\left(G_{i,j}=G^{red}_{i,j}\right) when (Gi+1,j=Gi+1,jgreen)\left(G_{i+1,j}=G^{green}_{i+1,j}\right). So this would imply:

SujiKs0Gi+1,j+s1Gi+1,j+1+s2Gi+1,j+2Su^{j-i}-K\leq s_{0}G_{i+1,j}+s_{1}G_{i+1,j+1}+s_{2}G_{i+1,j+2}

From Lemma A.1, we know that all the Gi+1,k=Gi+1,kgreenG_{i+1,k}=G_{i+1,k}^{green} for all j<=k<=2i+1j<=k<=2i+1. Therefore, we can expand this to:

SujiK\displaystyle Su^{j-i}-K s0(Suji1K)+s1(SujiK)\displaystyle\leq s_{0}(Su^{j-i-1}-K)+s_{1}(Su^{j-i}-K)
+s2(Suji+1K)\displaystyle\quad+s_{2}(Su^{j-i+1}-K)
SujiK\displaystyle Su^{j-i}-K Suji1(s0+s1u+s2u2)\displaystyle\leq Su^{j-i-1}(s_{0}+s_{1}u+s_{2}u^{2})
K(s0+s1+s2)\displaystyle\quad-K(s_{0}+s_{1}+s_{2})
K(s0+s1+s21)\displaystyle K(s_{0}+s_{1}+s_{2}-1) Suji1(s0+s1uu+s2u2)\displaystyle\leq Su^{j-i-1}(s_{0}+s_{1}u-u+s_{2}u^{2})
Km(pd+po+pu1/m)\displaystyle Km(p_{d}+p_{o}+p_{u}-1/m) Suji1(s0+(s11)u+s2u2)\displaystyle\leq Su^{j-i-1}(s_{0}+(s_{1}-1)u+s_{2}u^{2})

Note that since pd+po+pu=1p_{d}+p_{o}+p_{u}=1, we know that the expression on the left side is strictly greater than 0, allowing us to make the following substitution:

0\displaystyle 0 <Suji1(s0+(s11)u+s2u2)\displaystyle<Su^{j-i-1}(s_{0}+(s_{1}-1)u+s_{2}u^{2})
0\displaystyle 0 <s0+(s11)u+s2u2\displaystyle<s_{0}+(s_{1}-1)u+s_{2}u^{2}

Since u>1u>1, so multiplying both sides by u1u-1 gives us this expression from Lemma A.1.

0<s0(u1)+(s11)(u2u)+s2(u3u2)0<s_{0}(u-1)+(s_{1}-1)(u^{2}-u)+s_{2}(u^{3}-u^{2})

In Lemma A.1, we showed this expression to be less than or equal to 0, which is a contradiction. Therefore, Lemma A.2 holds. ∎

Lemma A.3.

Gi,j1Gi,jG_{i,j-1}\leq G_{i,j} for any 0iT10\leq i\leq T-1 and 1j2i1\leq j\leq 2i

Proof.

This can be done exactly as is done in Lemma 2.5, by induction. We have the same base case, when i=Ti=T we have GT,j1GT,jG_{T,j-1}\leq G_{T,j} because Suj1TKSujTKSu^{j-1-T}-K\leq Su^{j-T}-K. We then can consider the general case of i=i=\ell, where all Ti>T\leq i>\ell are true.

(34) s0G,j1+s1G,j+s2G,j+1s0G,j+s1G,j+1+G,j+2(by induc. Hyp.)\displaystyle\begin{aligned} s_{0}G_{\ell,j-1}+s_{1}G_{\ell,j}+s_{2}G_{\ell,j+1}\leq s_{0}G_{\ell,j}+s_{1}G_{\ell,j+1}+G_{\ell,j+2}\\ \mbox{(by induc. Hyp.)}\end{aligned}
(35) Suj1KSujK(u1)\displaystyle Su^{j-1-\ell}-K\leq Su^{j-\ell}-K\quad\mbox{(}\because u\geq 1\mbox{)}

It implies that

G1,j1\displaystyle G_{\ell-1,j-1} =max(s0G,j1+s1G,j+s2G,j+1,Su2j1K)\displaystyle=\max(s_{0}G_{\ell,j-1}+s_{1}G_{\ell,j}+s_{2}G_{\ell,j+1},Su^{2j-1-\ell}-K)
max(s0G,j+s1G,j+1+s2G,j+2,Su2jK)\displaystyle\leq\max(s_{0}G_{\ell,j}+s_{1}G_{\ell,j+1}+s_{2}G_{\ell,j+2},Su^{2j-\ell}-K)
(by (34), and (35))
=G1,j\displaystyle=G_{\ell-1,j}

Lemma A.4.

Gi,jGi1,j1G_{i,j}\leq G_{i-1,j-1} for any 1iT1\leq i\leq T and 1j2i1\leq j\leq 2i

Proof.

This can also be proved by induction using the similar methods as Lemma 2.5. We have the same base case, when i=Ti=T we have GT1,j1GT,jG_{T-1,j-1}\leq G_{T,j}. If both are green, then since Suj1T+1K=SujTKSu^{j-1-T+1}-K=Su^{j-T}-K it holds. If GT1,j1G_{T-1,j-1} is green and GT,jG_{T,j} is red, then we know that Gi1,j10G_{i-1,j-1}\geq 0 since it is green and that GT,j=0G_{T,j}=0 because it is red. If both are red, then the same applies. We then can consider the general case of i=i=\ell, where if all <iT\ell<i\leq T hold.

(36) s0G+1,j+s1G+1,j+1+s2G+1,j+2s0G,j1+s1G,j+G,j+1(by induc. Hyp.)\displaystyle\begin{aligned} s_{0}G_{\ell+1,j}+s_{1}G_{\ell+1,j+1}+s_{2}G_{\ell+1,j+2}\leq s_{0}G_{\ell,j-1}+s_{1}G_{\ell,j}+G_{\ell,j+1}\\ \mbox{(by induc. Hyp.)}\end{aligned}
(37) SujKSuj1+1K(They are equal)\displaystyle Su^{j-\ell}-K\leq Su^{j-1-\ell+1}-K\quad\mbox{(}\because\mbox{They are equal)}

It implies that

G,j\displaystyle G_{\ell,j} =max(s0G+1,j+s1G+1,j+1+s2G+1,j+2,Su2jK)\displaystyle=\max(s_{0}G_{\ell+1,j}+s_{1}G_{\ell+1,j+1}+s_{2}G_{\ell+1,j+2},Su^{2j-\ell}-K)
max(s0G,j1+s1G,j+s2G,j+1,Su2j1+1K)\displaystyle\leq\max(s_{0}G_{\ell,j-1}+s_{1}G_{\ell,j}+s_{2}G_{\ell,j+1},Su^{2j-1-\ell+1}-K)
(by (36),(37))
=G1,j1\displaystyle=G_{\ell-1,j-1}

Lemma A.5.

For i[0,T1]i\in[0,T-1] and j[0,2i1]j\in[0,2i-1],
(Gi+1,j+1=Gi+1,j+1red)\left(G_{i+1,j+1}=\\ G_{i+1,j+1}^{red}\right) (Gi,j=Gi,jred)\Rightarrow\left(G_{i,j}=G_{i,j}^{red}\right).

Proof.

This follows directly from Lemma A.4. Recall Gi+1,j+1green=Gi,jgreen=SujiKG_{i+1,j+1}^{green}=G_{i,j}^{green}=Su^{j-i}-K. If Gi+1,j+1=Gi+1,j+1redGi+1,j+1green=Gi,jgreenG_{i+1,j+1}=G_{i+1,j+1}^{red}\geq G_{i+1,j+1}^{green}=G_{i,j}^{green} and Gi+1,j+1Gi,jG_{i+1,j+1}\leq G_{i,j}, then Gi,j=Gi,jredG_{i,j}=G_{i,j}^{red}. ∎

Corollary A.6.

For every i[0,T1]i\in[0,T-1], there exists an index ji[0,2i]j_{i}\in[0,2i] such that all cells Gi,jG_{i,j} with 0jji0\leq j\leq j_{i} are red and all (possibly zero) cells Gi,jG_{i,j} with ji<jij_{i}<j\leq i are green. Also, for i[0,T2]i\in[0,T-2], ji+11jiji+1j_{i+1}-1\leq j_{i}\leq j_{i+1}.

Proof.

Follows from Lemma A.1, Lemma A.2, and Lemma A.5. ∎

Refer to caption
Figure 9. Partitioning the solution space into trapezoids.
Refer to caption
Refer to caption
Refer to caption
Refer to caption
(a) BOPM
Refer to caption
(b) TOPM
Refer to caption
(c) BSM
Figure 10. Additional comparisons of energy consumption by domain, Package (pkg) and RAM.

A.3. Algorithm for American call option pricing under TOPM

The solution space, is a right-angle triangle with a vertical base of length TT and a horizontal base of length 2T2T. We prove the following theorem, which is an adaptation of the Binomial Model Algorithm.

Theorem A.7.

Our algorithm solves the American call option pricing problem under TOPM in 𝒪(Tlog2T){\mathcal{O}}\left({T\log^{2}T}\right) work and 𝒪(T){\mathcal{O}}\left({T}\right) span, where TT is the number of time steps.

Proof.

We use the exact same techniques as for BOPM. We again are using the properties of the boundary between the red and green cells to reduce the problem to a stencil-computation. The primary differences are:

  1. (1)

    In the usage of the FFT-based stencil algorithm of (Ahmad et al., 2021), our stencil uses a larger (but still constant sized) neighborhood of grid values in the computation. Specifically, we have a dependence on the values Gi+1,jGi+1,j+1,G_{i+1,j}G_{i+1,j+1}, and Gi+1,j+2G_{i+1,j+2} to compute Gi,jG_{i,j}, whereas in BOPM we depend only on Gi+1,jG_{i+1,j} and Gi+1,j+1G_{i+1,j+1}.

  2. (2)

    The width of the grid we are operating on is twice that of BOPM. This only contributes a constant factor to runtime in terms of the worst-case for the width of the trapezoids. This means that some claims regarding the fact that the solution space is a right isosceles triangle need to be slightly adjusted.

Partitioning the triangle into trapezoids.

Let abcabc be the right-angle triangle representing the solution space as in Figure 9. We can partition abcabc into a sequence of trapezoids just as with BOPM. We know the boundary at the row i=Ti=T, where SujTK>0Su^{j-T}-K>0, so we let 1\ell_{1} be the length from the aa to this boundary. Let dd be the point a distance 1/2\ell_{1}/2 downward from aa. We draw a horizontal through dd and get the point ee from the intersection of the horizontal with the sloped side of abcabc. This defines a trapezoid abdeabde. We then follow along the horizontal through dd to red-green divider to point q, to get a distance 2\ell_{2}, and repeat. We determine this boundary by solving the trapezoid corresponding to 1\ell_{1}. We continue repeating this process of partitioning until we are left with a triangle xycxyc of height T\sqrt{T}.

Once we have solved all of the trapezoids above xycxyc, we know that we can apply a naive O(T2)O(T^{2}) algorithm, which for xycxyc (of height T\sqrt{T}) will take a total of 𝒪(T){\mathcal{O}}\left({T}\right) to solve, which will give us the final value for i=0i=0. Solving the trapezoid of height i\ell_{i} will again take 𝒪(ilog2i){\mathcal{O}}\left({\ell_{i}\log^{2}\ell_{i}}\right) time, which gives a total runtime ψ\psi of ψ=(1ik𝒪(ilog2i))+𝒪(T)=𝒪(Tlog2T)\psi=\left(\sum_{1\leq i\leq k}{\mathcal{O}}\left({\ell_{i}\log^{2}\ell_{i}}\right)\right)+{\mathcal{O}}\left({T}\right)={\mathcal{O}}\left({T\log^{2}T}\right).

Solving a trapezoid

Solving a trapezoid is equivalent to solving the values of all red cells in the last row. Refer to figure 11. We are given the solutions to the last row in the trapezoid above, which makes this possible. Let us say this trapezoid has height \ell, meaning that the boundary at the top of the trapezoid is \ell away from the top left point aa, landing on a point qq. We again find the midpoint of aqaq, called pp and use this measure of /2\ell/2 to find a point below aa, rr. We again draw a horizontal line from rr, intersecting bcbc at vv. We compute all the red cells on dcdc (solving the trapezoid), in the same two steps:

  1. (1)

    Compute the cells on segment rvrv.

  2. (2)

    Using the cells of rvrv, compute the cells of dcdc.

Refer to caption
Figure 11. Decomposing a trapezoid into smaller trapezoids for trinomial.

Computing the values of all red cells on line segment rvrv

Let tt be the intersection of qdqd and rvrv. First, compute the cells of rtrt using the FFT-based stencil algorithm of (Ahmad et al., 2021), as all these cells are red and their values depend only on the red section of the top row, aqaq. We can then compute the second part, tvtv, by recursing on at the smaller trapezoid pbvtpbvt. Note that the height of pbvtpbvt is half that of abcdabcd.

Computing the values of all red cells on line segment dcdc

Once the values of rvrv have been calculated, we can use them to calculate the final row dcdc. From the calculation of rvrv, we know where the red-green boundary lies for that row. Let uu be that boundary point. We can take a line parallel to qdqd through uu to determine a point ww along dcdc. all of the points dwdw depend only on the values in ruru, which have already been solved. We can again use the FFT-based stencil algorithm from (Ahmad et al., 2021). The rest of dcdc, wcwc, can be recursed on. Take a vertical through ww, and note its intersection with rvrv as point ss. svcwsvcw forms a trapezoid that fits our specifications, which allows us to recurse on it to determine the rest of the red values along dcdc. Note: we do not need to calculate the green cells since they have a closed formula which can be evaluated at need. This gives us the recurrence ζ()=2ζ(/2)+𝒪(log)\zeta(\ell)=2\zeta(\lceil\ell/2\rceil)+{\mathcal{O}}\left({\ell\log\ell}\right), which solves to ζ()=log2\zeta(\ell)=\ell\log^{2}\ell, assuming the base case of =𝒪(1)\ell={\mathcal{O}}\left({1}\right) has time complexity 𝒪(1){\mathcal{O}}\left({1}\right), which we show next.

Base case.

The base case entails a trapezoid of 𝒪(1){\mathcal{O}}\left({1}\right) height. Such a trapezoid can have arbitrary width, but will have a constant number of red cells. Since we only wish to compute the red cells since the green can be computed at will in 𝒪(1){\mathcal{O}}\left({1}\right) time, we will use a 𝒪(1){\mathcal{O}}\left({1}\right) number of max operators to determine all the values of the red cells. Hence, the base case only takes 𝒪(1){\mathcal{O}}\left({1}\right) time.

What remains is to show the span. The span of computing one trapezoid of height \ell depends on the time to compute the subtrapezoids and the time to compute the two stencils rtrt and dwdw. The FFT-based stencil algorithm (Ahmad et al., 2021) runs in 𝒪(logloglog){\mathcal{O}}\left({\log\ell\log\log\ell}\right) span. Let ζ()\zeta_{\infty}(\ell) be the span for a single trapezoid of height \ell.

ζ()=2ζ(/2)+𝒪(logloglog)\zeta_{\infty}(\ell)=2\zeta_{\infty}(\lceil\ell/2\rceil)+{\mathcal{O}}\left({\log\ell\log\log\ell}\right)

This solves to 𝒪(){\mathcal{O}}\left({\ell}\right). Let Ψ\Psi_{\infty} be the total span of the algorithm. Ψ=𝒪(1)+𝒪(2)+.𝒪(k)+𝒪(T)=𝒪(T)\Psi={\mathcal{O}}\left({\ell_{1}}\right)+{\mathcal{O}}\left({\ell_{2}}\right)+....{\mathcal{O}}\left({\ell_{k}}\right)+{\mathcal{O}}\left({\sqrt{T}}\right)={\mathcal{O}}\left({T}\right). ∎

Appendix B Energy Comsumption

We give additional plots of energy consumption in Fig. 10. In addition to outperforming existing approaches in terms of total energy consumption, the reduced energy consumption is pronounced when restricted to specific domains. Here we give the values for the package domain (pkg) and the RAM domain as described in Section 5.2 of the main body. For both measures across all algorithms our algorithm outperforms vanilla and standard implementations after instances of size 2132^{1}3. Our algorithm for the Black-Scholes-Merton Model was even more competitive, outperforming on instances of all sizes.