1
Fast American Option Pricing using Nonlinear Stencils
Abstract.
We study the binomial, trinomial, and Black-Scholes-Merton models of option pricing. We present fast parallel discrete-time finite-difference algorithms for American call option pricing under the binomial and trinomial models and American put option pricing under the Black-Scholes-Merton model. For -step finite differences, each algorithm runs in time under a greedy scheduler on processing cores, which is a significant improvement over the time taken by the corresponding state-of-the-art parallel algorithm. Even when run on a single core, the time taken by our algorithms is asymptotically much smaller than the running time of the fastest known serial algorithms. Implementations of our algorithms significantly outperform the fastest implementations of existing algorithms in practice, e.g., when run for steps on a 48-core machine, our algorithm for the binomial model runs at least faster than the fastest existing parallel program for the same model with the speed-up factor gradually reaching beyond for . It saves more than 80% energy when , and more than 99% energy for .
Our option pricing algorithms can be viewed as solving a class of nonlinear 1D stencil (i.e., finite-difference) computation problems efficiently using the Fast Fourier Transform (FFT). To our knowledge, ours are the first algorithms to handle such stencils in time. These contributions are of independent interest as stencil computations have a wide range of applications beyond quantitative finance.
1. Introduction
Option pricing or computing the value of a contract giving one the right to buy/sell an asset under some given constraints is one of the most important computational problems in quantitative finance (Ames, 2014). Rapid changes in financial markets often lead to rapid changes in asset prices which makes the ability to quickly estimate option prices essential in avoiding potential financial losses (Peter and Packer, 2007).
An option is a two-party financial contract that gives one party (called the holder) the right (but not an obligation) to buy/sell (i.e., exercise) an asset from/to the other party (called the writer) at a fixed price (called the strike/exercise price) on or before an expiration date (called the exercise/maturity date). A call option gives the right to buy whereas a put option gives the right to sell. Also, based on the expiration date and the settlement rule, there are two major styles of options: European and American. A European option can only be exercised at the expiration date while an American option can be exercised at any time before that.
The option pricing problem asks for assigning a value or price to an options contract based on the calculated probability that the contract will be exercised at expiration. The theoretical value of an option (Merton, 1973; Smith, 1976; Bensoussan, 1984; Galai and Masulis, 1976; Black and Scholes, 1973; Kumar et al., 2012) is determined by its stock price (i.e., its current market price), strike price , risk-free rate of return (i.e., the theoretical rate of return assuming zero risk), dividend yield (i.e., a ratio that shows how much dividend/year is paid relative to ), volatility (i.e., how much the trading price varies over time), and time to expiration (e.g., in days).
Symbol | Meaning | Symbol | Meaning |
---|---|---|---|
stock price | strike price | ||
risk-free rate of return | volatility | ||
dividend yield | time to expire | ||
number of time steps | (in days) |
The earliest option pricing model (Bachelier, 1900) was based on the assumption of the geometric Brownian motion for asset pricing and the no-arbitrage idea. Many improved models were developed later (Samuelson et al., 2011; Black and Scholes, 1973; Merton, 1973; Derman and Kani, 1998; Dupire, 1994; Stein and Stein, 1991; Heston, 1993; Ball and Roma, 1994; Bergomi, 2015; Merton, 1976; Kou, 2002; Bakshi and Madan, 2000; Duffie et al., 2000; Dempster and Hong, 2002; Matsuda, 2004; Runggaldier, 2003; Madan et al., 1998; Geman et al., 2001; Jackson et al., 2008; Derman and Kani, 1994; Dupire, 1994).
Analytical solutions to the option pricing problem are sometimes available, particularly for European options (Shreve, 2004, 2005; Hull, 2003; Prathumwan and Trachoo, 2020). But they are not available for American options except for a limited number of cases with significant constraints (e.g., American call options with zero or one dividend (Shreve, 2004) and perpetual American put options (MacKean, 1965; Shreve, 2004)). This difficulty in finding closed-form analytical solutions for most option pricing problems makes computational approaches the only path forward. The main computational approaches to solving the option pricing problem include the binomial tree method (Karatzas and Shreve, 1998), the finite difference method (Thomas, 2013; Sewell, 2005; Holmes, 2007; LeVeque, 2007; Ames, 2014; Strikwerda, 2004), and the Monte Carlo method (Duan and Simonato, 1998; Longstaff and Schwartz, 2001; Tsitsiklis and Van R., 2001).
The binomial tree method works by tracing the option’s value at discrete time steps over the life of the option. For a given number of time steps between the valuation and expiration dates of the option, a binomial tree of height is created with the leaves storing the potential prices of the asset at the time of expiration. Then one works backward to compute for each the value of the nodes at depth of the tree (each giving a possible price at time step ) from the values of the nodes at depth using a simple formula. The value computed for the root node is the required option value. Straightforward iterative implementation of this method runs in time on a single processing core and time on cores (see Figure 1 and Table 2). It provides a discrete-time approximation of the continuous-time option pricing in the Black–Scholes model and is widely used by professional option traders.
BOPM-American-Call (1) , , , , , (2) for to do (3) for downto do parallel for to do (4) return
The trinomial tree method extends the binomial method by accounting for the possibility that an asset value remains the same after a time step (Boyle, 1986). With only a constant factor increase in run-time it provides more precise predictions than the binomial model.
The finite-difference method approximates the continuous-time differential equations describing the evolution of an option price over time by a set of discrete-time difference equations and then solves them iteratively under appropriate boundary conditions. The explicit finite difference method divides the lifetime of the option into discrete time steps and then uses the potential values of the asset at time step (the time of expiration) to compute the asset values at each time step from the asset values at time step based on the difference equations (i.e., update equations or stencils). The final option value is found at time . Similar to the binomial tree method, the iterative implementation of this method runs in time on a single processing core and time on cores. Other finite difference methods used for option pricing include implicit finite difference and the Crank–Nicolson method (Crank and Nicolson, 1947).
The Monte Carlo method works by generating random backward paths the asset price may follow starting from the time of expiration and ending at the time of valuation. Each of these paths leads to a payoff value for the option and the average of these payoff values can be viewed as an expected value of the option. This method is used for pricing options with complicated features and/or multiple sources of uncertainty that other methods (analytical, tree-based, finite difference) cannot handle (Duan and Simonato, 1998), but is usually not competitive when those methods apply as the convergence rate of Monte Carlo method is sublinear (James, 1980; Metropolis and Ulam, 1949). They are hard to develop for some options, such as Black-Scholes Model for the American put option, but still many results exist on Monte Carlo methods (Ibanez and Zapatero, 2004; Longstaff and Schwartz, 2001; Tsitsiklis and Van R., 2001; Boyle, 1977).
Our Contributions. We present three shared-memory parallel algorithms for American option pricing – call option under the binomial and trinomial option pricing models and put option under the Black-Scholes-Merton model. All three algorithms run in time on processing cores which is a significant improvement over the time taken by the state-of-the-art parallel algorithms, where is the number of time steps. When run on a single processing core they run in time compared to the time taken by the fastest existing serial algorithms. We use the Fast Fourier Transform (FFT) to speed up the computation. Table 2 summarizes the results.
We use the work-span model (Cormen et al., 2009) to analyze the performance of parallel programs. Let be the running time on a -processor machine under a greedy scheduler. Then and are called work and span, respectively. The parallel running time .
The following proposition, which follows easily from the complexities given in Table 2, notes that each of our parallel algorithms runs asymptotically faster than the corresponding fastest existing parallel algorithm for every value of .
Proposition 1.1.
Let be the running time of any existing algorithm from Table 2 on cores, and let denote the same for our algorithm. Then for every (positive) value of under a greedy scheduler: .
We have implemented our algorithms and compared their running times, energy consumption, and cache performance with those of the option pricing implementations available in the Par-bin-ops framework (Brunelle, 2022) developed recently in 2022. Implementations of our algorithms run orders of magnitude faster, consume significantly less energy, and usually incur far fewer L1 cache misses than those implementations.
How Our Algorithms Differ from Existing FFT-based Option Pricing Algorithms. FFTs have been used for European option pricing before. European option pricing is simpler than American option pricing, e.g., the European version of the American option pricing algorithm shown in Figure 1 can be obtained by replacing the assignment in Step 3 with the simpler assignment . The absence of the ‘max’ operator in this assignment makes an efficient evaluation of the doubly-nested loop in Step 3 easier.
Black, Scholes, and Merton (Black and Scholes, 1973; Merton, 1973) showed that the European option can be calculated using a Parabolic PDE with infinite domain constraint. By using the Fourier transform, one gets an integral form for European options. To obtain the numerical value from the integral form, one uses numerical integration (Hildebrand, 1987; Burden et al., 2015) which can be sped up using FFTs.
There are also approximation results (Chang et al., 2007; Oliveira, 2014; Zhylyevskyy, 2010; Lord et al., 2008) based on repeated Richardson extrapolation (Richardson, 1911; Richardson and Gaunt, 1927) and FFT for numerical integration in American options. However, even if the extrapolation is repeated only for a constant number of times for an option that expires in days, the approximation takes time when grid points are used to discretize the price of the underlying asset and exercise points are placed with every pair of consecutive exercise points being days apart. Observing that corresponds to the number of discrete time steps in the finite difference formulation of the problem, we can rewrite the complexity as . Usually, is used in practice (Oliveira, 2014), which reduces the complexity to .
A major weakness of the existing FFT-based numerical integration approach above is that a closed-form expression for the characteristic function of the log-price must be known for the technique to work. However, our approach does not need to know such a closed-form expression as we apply FFT to speed up stencil/finite-difference computations and not numerical integration. Thus, our approach will work on a larger set of option pricing problems. We are not restricted to infinite-domain problems either (Cont and Voltchkova, 2005).
Implications for Nonlinear Stencil Computations. Our option pricing algorithms can be viewed as solving a class of nonlinear 1D stencil computation problems asking to evolve a grid of size for time steps, in work (i.e., serial time) or parallel time on processing cores. As stencil algorithms, they are of independent interest.
A stencil is a pattern (equation) used to update the values of cells in a spatial grid and evolve the grid over a given number of time steps. The process of evolving cell values in the spatial grid according to a stencil is called a stencil computation (Frigo and Strumpen, 2005). The finite-difference method performs a stencil computation with an update equation derived from the differential equations used as a stencil. Stencils are widely used in various fields, including mechanical engineering (Paoli and Schatzman, 2002; Zhang et al., 2006; Renson et al., 2016; Szilard, 2004; Rappaz et al., 2010), meteorology (Murray and Simmonds, 1991; Johnson, 2010; Robert, 1981, 1982; Avissar and Pielke, 1989; Kalnay et al., 1990), stochastic and fractional differential equations (Zhang et al., 2004; Yuan and Agrawal, 2002; Lord and Rougemont, 2004), chemistry (Najm et al., 1998; Snider and Banerjee, 2010; Long et al., 2008; Aubin et al., 2004; Han and Wang, 2015), electromagnetics (Komissarov, 2002; Taflove and Hagness, 2005; Van R., 2012; Atangana and Nieto, 2015), finance (Chen and Forsyth, 2008), and physics (Gammie et al., 2003; Vijayaraghavan and Keith, 1990; Mangeney et al., 2002; Hirouchi et al., 2009; Cundall and Hart, 1992; Pang, 1999; Barth and Deconinck, 2013; Vesely, 1994; Thijssen, 2007), image processing (Weickert, 2001; Peyré, 2011; Vese and Osher, 2002).
Standard stencil algorithms perform work to evolve a grid of size for time steps, they include looping algorithms, tiled looping algorithms (Bandishti et al., 2012; Bondhugula et al., 2017; Wolfe, 1987; Wolf and Lam, 1991; Wolf et al., 1996; Wonnacott, 2002; Bondhugula et al., 2016; Andonov and Rajopadhye, 1997; Högstedt et al., 1999; Zhang et al., 2015; Malas et al., 2014, 2015), and recursive divide-and-conquer algorithms (Frigo and Strumpen, 2005, 2009; Tang et al., 2011; Palamadai Natarajan et al., 2017; Sato et al., 2010; Krishnamoorthy et al., 2007).
A stencil is called linear if it computes the value of a cell at time step as a fixed linear combination of cell values at time steps before , otherwise it is called nonlinear. For 1D linear stencils Ahmad et al. (Ahmad et al., 2021) provide FFT-based algorithms that take serial time for periodic grids and serial time for aperiodic grids, assuming that the input grid is of size .
The stencils we encounter in our current work are nonlinear because they do not use a linear combination of cell values from prior time steps for updating a target cell, provided that the resulting value is smaller than the value of a function computed solely based on the spatial coordinates of the target cell and other option pricing parameters (e.g., see Step 3 of Figure 1). Such a stencil divides the space-time grid into two disjoint regions – in one region only the linear combination applies, while in the other only the function value applies. However, the problem is that the boundary between these two regions is not known ahead of time and the location of the boundary may move as the time step progresses. As a result, Ahmad et al.’s (Ahmad et al., 2021) results for linear stencils do not apply. To the best of our knowledge, ours are the first algorithms for handling such stencils running in time subquadratic in .


2. American Call Option under the Binomial Option Pricing Model
2.1. Binomial Option Pricing Model (BOPM)
BOPM (Sharpe, 1978; Cox et al., 1979; Rendleman, 1979) is a simple discrete-time option pricing model without using advanced mathematical tools. It is a paradigm of practice.
BOPM encodes the various sequences of prices the asset might take as paths in a binomial tree. Each node in the tree represents a possible price at a certain time and the nodes at two successive layers in the tree represent prices at times that are apart by some fixed time step . The prices increase or decrease by some factor after every time. Figure 2 gives an example of a 3-time-step binomial price tree that is produced by moving from the valuation day to the expiration day. Denote the initial price by . The price in the next time step (i.e., after time) can go up to or go down to , where the up factor and the down factor are determined by and volatility .
Denote the node value as , where and are the numbers of ticks up and down, respectively. The final nodes of the tree represent the prices on the expiration date. Given the strike price of , the price one can call or put before the contract expires, i.e., the exercise value of each node will be for a call option and for a put option.
The risk-neutral valuation of the binomial value is performed iteratively backward. Under the assumption of risk neutrality, the value of the option today is its expected future payoff discounted at the risk-free rate of . Let us number the nodes in each layer of Figure 2 from top to bottom by successive integers starting from 1. Then the node values and of a layer representing some time can be used to compute the binomial value of the -th node in the layer representing time as follows: , where, (Hull, 2003). Denote , , and . Then the binomial value of that node is: . For options on stocks paying a continuous dividend yield , .
The value at each node will be equal to its binomial value for European options and the larger of its binomial value and exercise value for American options.
The binomial tree of time steps can be embedded in a grid. Figure 2 shows an example.
Definition 2.1.
Let be the grid value in row and column of the grid . Let , and let if , and otherwise. Then
We say that cell of is red provided , and green otherwise. We show in Section 2.2 that all red cells in form a single contiguous region and all green cells form another. A single boundary divides the two regions. We analyze the properties of this red-green divider in Section 2.1 which we will exploit to design an efficient algorithm for American call options in Section 2.3.
2.2. Properties of the Red-Green Divider
As shown in the example in Figure 2, we assume that a binomial tree for time steps is embedded in a grid with the root at the bottom-left corner and the leaves in the top row . For , the two children of the binomial tree node at are stored at and . The arrow from to represents a price change factor of while the one from to represents a price change factor of . So, the entire tree occupies only the upper-left triangular part of the grid.
Lemma 2.2 shows that within the upper-left triangle of , if a cell is green then the cell to its right is also green.
Lemma 2.2.
for and , .
Proof.
Observe that .
Let , , and .
We will use mathematical induction to show that for all and .
As and thus , the claim holds for .
Now suppose that the claim holds for some given , i.e., for . Since , there exists a such that when and when , where is the largest such that . Then,
-
for , ;
-
for , ; and
-
for , .
Thus, for all .
Hence,
Therefore, for all and .
Because , there exists a such that when , and when , , where is the largest such that . Now if , we have , and thus , which implies that . ∎
Lemma 2.3.
Lemma 2.4 shows that within the upper-left triangular area of if a cell is green then the cell below it must also be green, and Lemma 2.6 shows that if a cell is red then the cell diagonally left below it must also be red.
Lemma 2.4.
for and .
Lemma 2.5.
For any and : .
Proof.
We use mathematical induction. For , we have our base case: .
Suppose it holds true for all for some given and . Then . ∎
Lemma 2.6.
for and .
The following corollary says that at every time step all red cells appear to the left of all green cells, and the boundary between the green and the red regions either remains the same or moves by one cell towards the left every time step.
Corollary 2.7.
For every , there exists an index such that all cells with are red and all (possibly zero) cells with are green. Also, for , .
2.3. Algorithm for American call option pricing under BOPM


The solution space is a right-angle isosceles triangle with base length . We know the boundary between the red and green cells in the first row of the triangle (solution space); however, we do not know the locus of the boundary in the subsequent rows of the triangle. We compute the boundary in the following process.
We partition the triangle (solution space) into trapezoids (see Figure 3(a)). We compute the first trapezoid with its first row as the same first row of the triangle (the solution space) and solve this newly created trapezoid (we explain how we create a trapezoid and solve it later in this section). Then we compute the second trapezoid with its first row as the last row of the first trapezoid and solve the second trapezoid. This process continues until we are left with a right-angle isosceles triangle with base size at most . We solve this triangle iteratively by doing quadratic work in time . We describe the process in detail below.
Partitioning the Triangle into Trapezoids. Let be a right-angle isosceles triangle with base length (see Figure 3(a)). Let be the number of red cells on the line segment . They will appear consecutively from to some point . Let be the point in the line segment that is distance from . Draw a horizontal line from ; let the line intersect at point . Therefore, we get a trapezoid with height with red cells in its first row.
We solve trapezoid , which means that we compute the values of all red cells in its last row and thus find the boundary point between red and green cells in . Let . Let be the point on that . We draw a horizontal line , and get the second trapezoid of height . The last row of the trapezoid becomes the first row of .
We solve the trapezoid and all subsequent trapezoids created following the approach described above until we are left with a right-angle isosceles triangle with base length at most . We compute all cell values of iteratively in time.
Solving a Trapezoid. Solving a trapezoid means given all red cell values in its first row computing all red cell values in its last row. Let be a trapezoid of height with red cells from to in its first row (Figure 3(b)). Let be the point on such that . Draw a horizontal line from intersecting at . To compute the values of red cells on , we compute all red cells on , and using the cell values on , compute all red cells on .
(1) Computing all red cells on .
Let and intersect . We compute the cell values on line segment using the FFT-based stencil algorithm of (Ahmad et al., 2021) which are guaranteed to be red because of Corollary 2.7. Note that . The newly created trapezoid has height , and there are red cells in its first row. We solve trapezoid recursively similar to trapezoid , and thus compute all red cell values on .
(2) Computing all red cells on using the red cells on .
Let point be on the boundary between red and green cells on . Draw a line from parallel to intersecting at . Next, draw a line through perpendicular to intersecting at point . We compute all red cells on in exactly the same way we computed the red cells on above. We first use the FFT-based stencil algorithm of (Ahmad et al., 2021) to find all cell values on , and then recursively solve trapezoid of height to find all red cell values on .
Solving trapezoids of height (base case).
We solve trapezoids of height in work and span using the naïve looping code (e.g., see the pseudocode in Figure 1).
The following theorem gives the work and span of our algorithm.
Theorem 2.8.
Our algorithm solves the American call option pricing problem under BOPM in work and span, where is the number of time steps.
Proof.
Let and be the work and span, respectively, for solving a trapezoid of height recursively. We call the -work FFT-based periodic algorithm (Ahmad et al., 2021) twice and solve two smaller trapezoids of height each recursively. Hence, . Observe that although the two smaller trapezoids must be solved one after the other (e.g., after in Figure 3(b)), each of them can be solved in parallel with the FFT-based algorithm (of span (Ahmad et al., 2021)) called to find the red cells on its left (on line segments and , respectively). Hence, .
Suppose that we solve trapezoids of heights , respectively, using the above process as shown in Figure 3(a). Let and be the total work and span, respectively, for solving those trapezoids followed by the time needed to solve the leftover triangle of height . Then . Since those trapezoids and the triangle are solved in sequence, we have . ∎
3. American Call Option under the Trinomial Option Pricing Model
The trinomial options pricing model (TOPM) encodes the possible sequences of prices for a given asset within the structure of a trinomial tree (see Figure 2). It expands on BOPM by allowing the value of an asset to remain unchanged after a given time step. TOPM was introduced by Boyle (Boyle, 1986), and while it is less popular than the BOPM, it provides for the possibility of more accurate predictions than BOPM at the cost of only a constant factor blowup in runtime when using naïve methods. Langat, Mwaniki, and Kiprop showed that TOPM converges to the same solution as Black-Scholes with half as many time steps (Langat et al., 2019). TOPM is also equivalent to the explicit finite difference method (Hull, 2003).
TOPM carries over many properties from BOPM, e.g., since the number of “remain the same” moves does not factor into the price. Here , and . The exercise value in TOPM is .
The transition probabilities can be expressed as
,
, and , which are alternate forms of those given in (Hull, 2003).
Let , . Let denote the grid value in the row and column of the grid . Let , and let if , and otherwise. Then similar to BOPM:
In the supplementary material, we show that the TOPM grid shows properties similar to the BOPM grid, that is, in every row all red cells appear first in contiguous locations followed by all green cells, and with every time step the red-green boundary moves by at most one cell to the left. This allows us to use a similar algorithm to that given for BOPM (Section 2.3) with the identical work and span.
4. American Put Option Under the Black-Scholes-Merton Model
4.1. Black-Scholes-Merton Pricing Model (BSM)
BSM is a mathematical method to calculate the theoretical value of an option contract. The option pricing problem is transformed into a partial differential equation (PDE) with variable coefficients. An explicit formula for the price can be obtained assuming a log-normal distribution of the asset price. Note that the limit of the discrete-time BOPM approximates the continuous-time BSM under the same assumption. While BOPM utilizes simple statistical methods, BSM requires a solution of a stochastic differential equation.
Denote stock price at time by . BSM claims that
there is a deterministic relation between the option price and the stock price and time. This means that there is a deterministic function for option price and time such that: , where represents the value of the option at time . Now BSM derives that satisfies the following two-area classical form:
(6) |
where , , and for .
It is equivalent to satisfying the following:
(12) |
where, , Recall that at the maturity time , the option price (call option case) will be .
It also means that on the boundary. Therefore, the goal becomes solving Equation (6) or Inequality (12). For more details on how the BSM model is formulated or why the complementary form is equivalent to the classical form, see (Chen and Chadam, 2007; Shreve, 2004; Van M., 1976). We show the two areas are contiguous and find properties of their boundary in Section 4.2, which can be exploited by our algorithm in Section 4.3.
4.2. Properties of the Two-Area Boundary
Note that Equation (6) includes dimensional variables. First, we find nondimensionalized forms of Equations (6) and (12).
Let , , , and . Then: .
Applying Equation (6), and satisfes the following:
(17) |
It also satisfies the dimensionless complementary form after plugging into Inequality (12):
(22) |
where, and . Consider an approximation of and use the following finite-difference approximations (where, denotes ):
Now plug it into Equation (17) to obtain these two regions:
(28) |
where, and for all integer . It also satisfies the following condition by discretizing Equation (22):
(33) |
Similar to the BOPM for American option, we define the green zone and the red zone:
Definition 4.1.
We call that is in the green zone when , otherwise it is in the red zone.
To apply Equations (28)–(33), we need to determine the form of which has no analytical form, although it has asymptotic results (Alobaidi and Mallier, 2001) or approximation results (Zhu and He, 2007; Zhu, 2006). Instead we can use the following theorem from (Chen and Chadam, 2007):
Theorem 4.2 ((Chen and Chadam, 2007)).
The early exercise boundary curve is monotonically decreasing.
Theorem 4.3.
Let , , , and be the largest integer such that . Then when .
Proof.
We first prove that . Suppose that this is not true. Then we will have: , which contradicts Theorem 4.2.
Now we prove that . Because is in the green zone, we will have the following:
Considering , we first observe that:
Now we show that is in the green zone. Suppose that it is in the red zone. Then:
which leads to a contradiction because should be . By the definition of , we must have , completing the proof of the theorem. ∎
4.3. Algorithm for American put option pricing under BSM


Our algorithm for the American put option under BSM is similar to our algorithm for the American call option under BOPM as described in Section 2.
Observe that we will have to perform a nonlinear stencil computation based on the update equation (28). For time steps we use a space-time grid with the time dimension being and spatial dimension . According to Equation (28), we compute a cell of that grid, where represents the time coordinate and the spatial coordinate, from cells , , and using a 3-point stencil provided . Otherwise, we set it to . In the first case, cell will be in the red zone, and in the second case it will be in the green zone. As explained in Section 4.2, the entire boundary between these two zones in not known ahead of time, but it moves by at most one cell toward the green region with every time step. The goal of the algorithm is to compute the value of the central cell of the spatial dimension at time step (e.g., apex of the isosceles triangle in Figure 4(b)).
We solve the problem by decomposing the isosceles triangle into a sequence of the isosceles trapezoids of geometrically decreasing heights (see Figure 4(b)) and solving (i.e., find the cell values of top base given those of the bottom base of the trapezoid) them one by one from bottom to top until we reach a leftover triangle of small constant size which we solve naïvely to find the value of . We solve an isosceles trapezoid recursively by decomposing it into two smaller trapezoids of smaller height and solving them recursively and also using the FFT-based algorithm from (Ahmad et al., 2021) solve two subtrapezoids that are entirely composed of red cells (see Figure 4(a)). Details of this algorithm are given below.
We first show how to solve an isosceles trapezoid (as shown in Figure 4(a)) of height , bottom/longer base () length , and . Thus, the top/shorter base is of length . Solving trapezoid means computing the values of the cell at the top base given the values of the cells at the bottom base .
If , we naïvely solve and identify the location of the red-green boundary point in in time. If , we find the row at height and calculate all the cells in it. To do so, we recursively solve the trapezoid which is found as follows:
1. Let be the point on that lies in the green region, but is in the red region. Identify the points and to the left and right of , respectively, such that .
2. Construct an isosceles trapezoid with base , height and top such that . Thus, .
After solving the trapezoid , we have the cell values in and the location of the red-green boundary point in . We can easily calculate the values of cells in since those values are independent of time and depend only on spatial coordinates. Finally, we use the -based algorithm of (Ahmad et al., 2021) to solve the trapezoid , where the point is found by forcing to be an isosceles trapezoid.
Therefore, we can get the values of the cells in and thus the values of all the cells in . Then we can calculate the values of the cells in given the values of the cells in exactly the same way as we computed the cell values in from those in .
Now, let us go back to Figure 4(b) to see how to compute the value of the apex . After solving as above to calculate the cells in , if , we naïvely calculate the value of , which takes time. But if , we recursively apply our trapezoid algorithm to solve a smaller trapezoid with the bottom base .
Theorem 4.4.
Our algorithm solves the American put option pricing problem under BSM in work and span, where is the number of time steps.
Proof.
The proof is very similar to that of Theorem 2.8. Let and be the work and span, respectively, for solving a trapezoid of height (see Figure 4(a)). We recursively solve two trapezoids of height each in sequence but use a parallel FFT-based algorithm (Ahmad et al., 2021) on each half, both size . Since the FFT-based algorithm performs work in span, we can write: if and otherwise. Similarly, if and otherwise. Solving, and .
Now, let and be the work and the span, respectively, of solving an isosceles triangle of base size (see Figure 4(b)). Then if and otherwise. Also, if and otherwise. Solving, and .∎












5. Experimental Results
In this section, we present an experimental evaluation of our algorithms and compare them with the fastest existing solutions. Our experimental setup is shown in Table 3. The legends used in our plots are listed in Table 4, which are described in more detail in the next few paragraphs.
Processor | Intel Xeon Platinum 8160 (Skylake / SKX) |
Cores | 24 cores per socket, 2 sockets (total: 48 cores) |
Cache sizes | L1 32 KB / core, L2 1 MB / core, L3 33 MB / socket |
Memory | 144GB /tmp partition on a 200GB SSD |
Compiler | Intel C++ Compiler (ICC) v18.0.2 |
Compiler flags | -O3 -xhost -ansi-alias -ipo -AVX512 |
Parallelization | OpenMP 5.0 |
Thread affinity | GOMP_CPU_AFFINITY |
Legend | Meaning |
---|---|
fft-bopm, fft-topm, fft-bsm | our FFT-based implementations for BOPM, TOPM, and BSM, respectively |
ql-bopm, zb-bopm | BOPM implementations from Par-bin-ops based on QuantLib and Zubaer et al.’s work, respectively |
vanilla-topm, vanilla-bsm | our parallel looping implementations for TOPM and BSM, respectively |
Benchmarks. For benchmarks, we use American call option pricing under BOPM and TOPM, and American put option under the BSM. For the BOPM call option benchmarks, our baselines are the option call probability calculations from QuantLib (contributors, 2022) and Zubair et al.’s parallel cache optimized model (Zubair and Mukkamala, 2008). We use the optimized implementations of these two baselines available in Par-bin-ops (Brunelle, 2022). These implementations are the fastest existing implementations of BOPM call option pricing. For the TOPM call option and BSM put option, our FFT-based implementations are compared with our parallel looping-based vanilla implementations, as we could not find any publicly available faster implementations.
We use the perf (version: 3.10.0-1160.53.1.el7.x86_64.debug) tool (perf, [n. d.]) to analyze the system-wide energy consumption, and the PAPI (version: 5.6.0) library (Weaver et al., 2013) for cache miss counts for our implementations and benchmarks.
Par-bin-ops. In 2022, Brunelle et al. (Brunelle, 2022) released an open-source framework that can leverage parallel, cache-optimized algorithms to compute a variety of binomial option types and enables a simple interface for developers. We used the latest version from Github. Experimental evaluations by Brunelle et al. (Brunelle, 2022) has shown that Par-bin-ops achieves more than speedup over the QuantLib library when evaluating a European call option using 200,000 steps. Therefore, we chose the Par-bin-ops tool to benchmark our implementations of our FFT-based algorithms. To have a valid comparison of running times, we use Par-bin-ops for both QuantLib and Zubaer et al.’s (Zubair and Mukkamala, 2008) option probability calculation equations and report the running times comparing with our FFT-based implementations. We use the stencil-based cache-optimized version of Zubaer et al.’s algorithm from Par-bin-ops.
Parameter Values. As Table 2 shows, the only option pricing parameter that influences the performance bounds of the algorithms in our experiments is the number of time steps . Therefore, we keep all other option pricing parameters fixed in all of our experiments. We use the following parameter values: , , , , , (notations are in Table 1).
5.1. Parallel Running Times
Figure 5 shows the parallel running time comparison of American call option pricing calculations under BOPM. Our FFT-based algorithm uses a recursive divide-and-conquer approach. We have found empirically that a base case size of 8 steps yields the best running times. Experimental results show that our FFT-based algorithm can outperform Par-bin-ops for any number of step sizes for both serial and parallel implementations. We achieve more than speedup for and more than speedup for w.r.t. Par-bin-ops implementations.
Our TOPM algorithm runs more than faster for and more than faster for w.r.t. the parallel vanilla code. Figure 5 shows the comparisons.
Figure 5 shows the parallel running time comparisons of the American put option pricing computations under BSM. Our FFT-based implementation is compared with the looping-based vanilla implementation. Our algorithm achieves more than speedup for and more than speedup for w.r.t. the vanilla implementation.
5.2. Energy Consumption
Figure 6 shows the comparison of system-wide energy consumption while running our FFT-based implementation and respective benchmarks. We collected the energy consumption estimate of the entire package (pkg) and the main memory (RAM) through the RAPL (Running Average Power Limit) interface of model-specific registers (MSR) using the perf (perf, [n. d.]) profiling tool. Our FFT-based BOPM, TOPM, and BSM implementations consume 99%, 99%, and 96% less energy, respectively, compared to their benchmarks for large values used in our experiments. For , the energy savings are 80%, 50%, and 40%, respectively. Energy plots for pkg and RAM separately are included as supplementary material.
5.3. Cache Misses
Figure 7 shows L1 cache-miss (= L2 cache access) and L2 cache-miss results, respectively, for all implementations. Our FFT-based implementation incurs far fewer L1 cache misses than both Par-bin-ops implementations for BOPM. However, while we incur far fewer L2 cache-misses than ql-bopm, the other one (zb-bopm) incurs fewer L2 misses than ours. In case of TOPM, while our FFT-based implementation incurs far fewer L2 misses than our parallel looping implementation (vanilla-topm), the trend is not clear for L1 misses. For BSM, neither L1 nor L2 misses seem to have a clear winner.
5.4. Scalability
As Table 2 shows, our algorithms reduce the span of solving the option pricing problems that we consider by a factor of , but they reduce the work by a substantially larger factor of . While the reduction in work leads to a significant reduction in energy consumption (see Section 5.2), it also leads to a low parallelism of . But it is still easy to see from the complexities given in Table 2 that the parallel running time of our algorithms will be asymptotically lower than that of the corresponding fastest existing parallel algorithms for every value of (stated in Proposition 1.1).
fft-bopm | |||||||
---|---|---|---|---|---|---|---|
ql-bopm |
Table 5 above shows how the parallel running times of our FFT-based BOPM implementation (fft-bopm) and that of the QuantLib-based implementation from Par-bin-ops (ql-bopm) vary with as is kept fixed at . We see that although the parallel running time of fft-bopm stops decreasing somewhere between and , it remains significantly faster than ql-bopm even when . Our algorithm scales better as increases, e.g., for , it scales to a , and compared to a subsecond running time of fft-bopm for , ql-bopm takes hours to run.
6. Conclusion and Future Work
We have designed fast American option pricing algorithms under the binomial, trinomial, and the Black-Scholes-Merton models. We solve a type of nonlinear stencil problem that is of independent interest with potential applications beyond quantitative finance.
Future work may explore other models for American option pricing, such as the time dependent volatility model, stochastic volatility model, and the jump diffusion model. European and Asian options, Lookback options, Knock-out Barrier options, and Bermudan options are also of interest.
Supplementary Material. Includes our FFT-based implementations and details on our TOPM results (Section 3).
Acknowledgements We want to thank Julian Shun, Yuanhao Wei, Shangdi Yu, Pranali Vani, and Sabiyyah Ali for valuable discussions and a careful review of the initial draft of the paper which helped enhance its quality and ensure the correctness.
References
- (1)
- Ahmad et al. (2021) Zafar Ahmad, Rezaul Chowdhury, Rathish Das, Pramod Ganapathi, Aaron Gregory, and Yimin Zhu. 2021. Fast stencil computations using fast Fourier transforms. In Proceedings of the 33rd ACM Symposium on Parallelism in Algorithms and Architectures. 8–21.
- Alobaidi and Mallier (2001) Ghada Alobaidi and Roland Mallier. 2001. Asymptotic analysis of American call options. International Journal of Mathematics and Mathematical Sciences 27, 3 (2001), 177–188.
- Ames (2014) William F. Ames. 2014. Numerical methods for partial differential equations. Academic press.
- Andonov and Rajopadhye (1997) Rumen Andonov and Sanjay Rajopadhye. 1997. Optimal orthogonal tiling of 2-D iterations. Journal of Parallel and Distributed computing 45, 2 (1997), 159–165.
- Atangana and Nieto (2015) Abdon Atangana and Juan J. Nieto. 2015. Numerical solution for the model of RLC circuit via the fractional derivative without singular kernel. Advances in Mechanical Engineering 7, 10 (2015), 1687814015613758.
- Aubin et al. (2004) Joelle Aubin, David F. Fletcher, and Catherine Xuereb. 2004. Modeling turbulent flow in stirred tanks with CFD: the influence of the modeling approach, turbulence model and numerical scheme. Experimental thermal and fluid science 28, 5 (2004), 431–445.
- Avissar and Pielke (1989) Roni Avissar and Roger A Pielke. 1989. A parameterization of heterogeneous land surfaces for atmospheric numerical models and its impact on regional meteorology. Monthly Weather Review 117, 10 (1989), 2113–2136.
- Bachelier (1900) Louis Bachelier. 1900. Théorie de la spéculation. In Annales scientifiques de l’École normale supérieure, Vol. 17. 21–86.
- Bakshi and Madan (2000) Gurdip Bakshi and Dilip Madan. 2000. Spanning and derivative-security valuation. Journal of financial economics 55, 2 (2000), 205–238.
- Ball and Roma (1994) Clifford A. Ball and Antonio Roma. 1994. Stochastic volatility option pricing. Journal of Financial and Quantitative Analysis 29, 4 (1994), 589–607.
- Bandishti et al. (2012) Vinayaka Bandishti, Irshad Pananilath, and Uday Bondhugula. 2012. Tiling stencil computations to maximize parallelism. In Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis. 1–11.
- Barth and Deconinck (2013) Timothy J. Barth and Herman Deconinck. 2013. High-order Methods for Computational Physics. Vol. 9. Springer Science & Business Media.
- Bensoussan (1984) Alain Bensoussan. 1984. On the theory of option pricing. Acta Applicandae Mathematica 2, 2 (1984), 139–158.
- Bergomi (2015) Lorenzo Bergomi. 2015. Stochastic volatility modeling. CRC press.
- Black and Scholes (1973) Fischer Black and Myron Scholes. 1973. The pricing of options and corporate liabilities. Journal of political economy 81, 3 (1973), 637–654.
- Bondhugula et al. (2016) Uday Bondhugula, Aravind Acharya, and Albert Cohen. 2016. The Pluto+ algorithm: A practical approach for parallelization and locality optimization of affine loop nests. ACM Transactions on Programming Languages and Systems 38, 3 (2016), 1–32.
- Bondhugula et al. (2017) Uday Bondhugula, Vinayaka Bandishti, and Irshad Pananilath. 2017. Diamond tiling: Tiling techniques to maximize parallelism for stencil computations. IEEE Transactions on Parallel and Distributed Systems 28, 5 (2017), 1285–1298.
- Boyle (1986) Phelim Boyle. 1986. Option Valuation Using a Three-Jump Process. International Options Journal 3 (1986), 7–12.
- Boyle (1977) Phelim P. Boyle. 1977. Options: A monte carlo approach. Journal of financial economics 4, 3 (1977), 323–338.
- Brunelle (2022) Terryn Brunelle. 2022. Parallelizing Tree Traversals for Binomial Option Pricing. Ph. D. Dissertation. Massachusetts Institute of Technology.
- Burden et al. (2015) Richard L. Burden, Douglas J. Faires, and Annette M. Burden. 2015. Numerical analysis. Cengage learning.
- Chang et al. (2007) Chuang-Chang Chang, San-Lin Chung, and Richard C. Stapleton. 2007. Richardson extrapolation techniques for the pricing of American-style options. Journal of Futures Markets: Futures, Options, and Other Derivative Products 27, 8 (2007), 791–817.
- Chen and Chadam (2007) Xinfu Chen and John Chadam. 2007. A mathematical analysis of the optimal exercise boundary for American put options. SIAM Journal on Mathematical Analysis 38, 5 (2007), 1613–1641.
- Chen and Forsyth (2008) Zhuliang Chen and Peter A. Forsyth. 2008. A numerical scheme for the impulse control formulation for pricing variable annuities with a guaranteed minimum withdrawal benefit (GMWB). Numer. Math. 109, 4 (2008), 535–569.
- Cont and Voltchkova (2005) Rama Cont and Ekaterina Voltchkova. 2005. A finite difference scheme for option pricing in jump diffusion and exponential Lévy models. SIAM J. Numer. Anal. 43, 4 (2005), 1596–1626.
- contributors (2022) The QuantLib contributors. 2022. QuantLib: a free/open-source library for quantitative finance. https://doi.org/10.5281/zenodo.6461652 If you use this software, please cite it using these metadata..
- Cormen et al. (2009) Thomas H Cormen, Charles E Leiserson, Ronald L Rivest, and Clifford Stein. 2009. Introduction to Algorithms. MIT Press.
- Cox et al. (1979) John C. Cox, Stephen A. Ross, and Mark Rubinstein. 1979. Option pricing: A simplified approach. Journal of financial Economics 7, 3 (1979), 229–263.
- Crank and Nicolson (1947) John Crank and Phyllis Nicolson. 1947. A practical method for numerical evaluation of solutions of partial differential equations of the heat-conduction type. In Mathematical proceedings of the Cambridge philosophical society, Vol. 43. Cambridge University Press, 50–67.
- Cundall and Hart (1992) Peter A. Cundall and Roger D. Hart. 1992. Numerical modelling of discontinua. Engineering computations (1992).
- Dempster and Hong (2002) Michael A. H. Dempster and George S. S. Hong. 2002. Spread option valuation and the fast Fourier transform. In Mathematical Finance—Bachelier Congress 2000. Springer, 203–220.
- Derman and Kani (1994) Emanuel Derman and Iraj Kani. 1994. Riding on a smile. Risk 7, 2 (1994), 32–39.
- Derman and Kani (1998) Emanuel Derman and Iraj Kani. 1998. Stochastic implied trees: Arbitrage pricing with stochastic term and strike structure of volatility. International journal of theoretical and applied finance 1, 01 (1998), 61–110.
- Duan and Simonato (1998) Jin-Chuan Duan and Jean-Guy Simonato. 1998. Empirical martingale simulation for asset prices. Management Science 44, 9 (1998), 1218–1233.
- Duffie et al. (2000) Darrell Duffie, Jun Pan, and Kenneth Singleton. 2000. Transform analysis and asset pricing for affine jump-diffusions. Econometrica 68, 6 (2000), 1343–1376.
- Dupire (1994) Bruno Dupire. 1994. Pricing with a smile. Risk 7, 1 (1994), 18–20.
- Frigo and Strumpen (2005) Matteo Frigo and Volker Strumpen. 2005. Cache oblivious stencil computations. In Proceedings of the 19th International Conference on Supercomputing. 361–366.
- Frigo and Strumpen (2009) Matteo Frigo and Volker Strumpen. 2009. The cache complexity of multithreaded cache oblivious algorithms. Theory of Computing Systems 45, 2 (2009), 203–233.
- Galai and Masulis (1976) Dan Galai and Ronald W. Masulis. 1976. The option pricing model and the risk factor of stock. Journal of Financial economics 3, 1-2 (1976), 53–81.
- Gammie et al. (2003) Charles F. Gammie, Jonathan C. McKinney, and Gábor Tóth. 2003. HARM: a numerical scheme for general relativistic magnetohydrodynamics. The Astrophysical Journal 589, 1 (2003), 444.
- Geman et al. (2001) Hélyette Geman, Dilip B. Madan, and Marc Yor. 2001. Time changes for Lévy processes. Mathematical Finance 11, 1 (2001), 79–96.
- Han and Wang (2015) Daozhi Han and Xiaoming Wang. 2015. A second order in time, uniquely solvable, unconditionally stable numerical scheme for Cahn–Hilliard–Navier–Stokes equation. J. Comput. Phys. 290 (2015), 139–156.
- Heston (1993) Steven L. Heston. 1993. A closed-form solution for options with stochastic volatility with applications to bond and currency options. The review of financial studies 6, 2 (1993), 327–343.
- Hildebrand (1987) Francis B. Hildebrand. 1987. Introduction to numerical analysis. Courier Corporation.
- Hirouchi et al. (2009) Tomoyuki Hirouchi, Tomohiro Takaki, and Yoshihiro Tomita. 2009. Development of numerical scheme for phase field crystal deformation simulation. Computational materials science 44, 4 (2009), 1192–1197.
- Högstedt et al. (1999) Karin Högstedt, Larry Carter, and Jeanne Ferrante. 1999. Selecting tile shape for minimal execution time. In ACM Symposium on Parallel algorithms and architectures. 201–211.
- Holmes (2007) Mark H. Holmes. 2007. Introduction to numerical methods in differential equations. Springer.
- Hull (2003) John C. Hull. 2003. Options futures and other derivatives (5th ed.). Pearson Education India.
- Ibanez and Zapatero (2004) Alfredo Ibanez and Fernando Zapatero. 2004. Monte Carlo valuation of American options through computation of the optimal exercise frontier. Journal of Financial and Quantitative Analysis 39, 2 (2004), 253–275.
- Jackson et al. (2008) Kenneth R. Jackson, Sebastian Jaimungal, and Vladimir Surkov. 2008. Fourier space time-stepping for option pricing with Lévy models. Journal of Computational Finance 12, 2 (2008), 1–29.
- James (1980) Frederick James. 1980. Monte Carlo theory and practice. Reports on progress in Physics 43, 9 (1980), 1145.
- Johnson (2010) Martin T. Johnson. 2010. A numerical scheme to calculate temperature and salinity dependent air-water transfer velocities for any gas. Ocean Science 6, 4 (2010), 913–932.
- Kalnay et al. (1990) Eugenia Kalnay, Masao Kanamitsu, and Wayman E. Baker. 1990. Global numerical weather prediction at the National Meteorological Center. Bulletin of the American Meteorological Society 71, 10 (1990), 1410–1428.
- Karatzas and Shreve (1998) Ioannis Karatzas and Steven E. Shreve. 1998. Methods of mathematical finance. Vol. 39. Springer.
- Komissarov (2002) Serguei S. Komissarov. 2002. Time-dependent, force-free, degenerate electrodynamics. Monthly Notices of the Royal Astronomical Society 336, 3 (2002), 759–766.
- Kou (2002) Steven G. Kou. 2002. A jump-diffusion model for option pricing. Management science 48, 8 (2002), 1086–1101.
- Krishnamoorthy et al. (2007) Sriram Krishnamoorthy, Muthu Baskaran, Uday Bondhugula, Jagannathan Ramanujam, Atanas Rountev, and Ponnuswamy Sadayappan. 2007. Effective automatic parallelization of stencil computations. ACM sigplan notices 42, 6 (2007), 235–244.
- Kumar et al. (2012) Sunil Kumar, Ahmet Yildirim, Yasir Khan, Hossein Jafari, Khosro Sayevand, and Leilei Wei. 2012. Analytical solution of fractional Black-Scholes European option pricing equation by using Laplace transform. Journal of fractional calculus and Applications 2, 8 (2012), 1–9.
- Langat et al. (2019) Kenneth Kiprotich Langat, Joseph Ivivi Mwaniki, and George Korir Kiprop. 2019. Pricing options using trinomial lattice method. Journal of Finance and Economics 7, 3 (2019), 81–87.
- LeVeque (2007) Randall J. LeVeque. 2007. Finite difference methods for ordinary and partial differential equations: steady-state and time-dependent problems. SIAM.
- Long et al. (2008) Wen Long, James T. Kirby, and Zhiyu Shao. 2008. A numerical scheme for morphological bed level calculations. Coastal Engineering 55, 2 (2008), 167–180.
- Longstaff and Schwartz (2001) Francis A. Longstaff and Eduardo S. Schwartz. 2001. Valuing American options by simulation: a simple least-squares approach. The review of financial studies 14, 1 (2001), 113–147.
- Lord and Rougemont (2004) Gabriel J. Lord and Jacques Rougemont. 2004. A numerical scheme for stochastic PDEs with Gevrey regularity. IMA journal of numerical analysis 24, 4 (2004), 587–604.
- Lord et al. (2008) Roger Lord, Fang Fang, Frank Bervoets, and Cornelis W. Oosterlee. 2008. A fast and accurate FFT-based method for pricing early-exercise options under Lévy processes. SIAM Journal on Scientific Computing 30, 4 (2008), 1678–1705.
- MacKean (1965) Henry P. MacKean. 1965. A free boundary problem for the heat equation arising from a problem in mathematical economics. Industrial Management Review 6 (1965), 32–39.
- Madan et al. (1998) Dilip B. Madan, Peter P. Carr, and Eric C. Chang. 1998. The variance gamma process and option pricing. Review of Finance 2, 1 (1998), 79–105.
- Malas et al. (2015) Tareq Malas, Georg Hager, Hatem Ltaief, Holger Stengel, Gerhard Wellein, and David Keyes. 2015. Multicore-optimized wavefront diamond blocking for optimizing stencil updates. SIAM Journal on Scientific Computing 37, 4 (2015), C439–C464.
- Malas et al. (2014) Tareq M. Malas, Georg Hager, Hatem Ltaief, and David E. Keyes. 2014. Towards energy efficiency and maximum computational intensity for stencil algorithms using wavefront diamond temporal blocking. ArXiv abs/1410.5561 (2014).
- Mangeney et al. (2002) André Mangeney, Francesco Califano, Carlo Cavazzoni, and Pavel M. Travnicek. 2002. A numerical scheme for the integration of the Vlasov–Maxwell system of equations. J. Comput. Phys. 179, 2 (2002), 495–538.
- Matsuda (2004) Kazuhisa Matsuda. 2004. Introduction to Merton jump diffusion model. Department of Economics, The Graduate Center, The City University of New York, New York (2004).
- Merton (1973) Robert C. Merton. 1973. Theory of rational option pricing. The Bell Journal of economics and management science (1973), 141–183.
- Merton (1976) Robert C. Merton. 1976. Option pricing when underlying stock returns are discontinuous. Journal of financial economics 3, 1-2 (1976), 125–144.
- Metropolis and Ulam (1949) Nicholas Metropolis and Stanislaw Ulam. 1949. The monte carlo method. Journal of the American statistical association 44, 247 (1949), 335–341.
- Murray and Simmonds (1991) Ross J. Murray and Ian Simmonds. 1991. A numerical scheme for tracking cyclone centres from digital data. Australian meteorological magazine 39, 3 (1991), 155–166.
- Najm et al. (1998) Habib N. Najm, Peter S. Wyckoff, and Omar M. Knio. 1998. A semi-implicit numerical scheme for reacting flow: I. stiff chemistry. J. Comput. Phys. 143, 2 (1998), 381–402.
- Oliveira (2014) Pedro S. Oliveira. 2014. The convolution method for pricing American options under Lévy processes. Ph. D. Dissertation. University of Lisbon.
- Palamadai Natarajan et al. (2017) Ekanathan Palamadai Natarajan, Maryam Mehri Dehnavi, and Charles Leiserson. 2017. Autotuning divide-and-conquer stencil computations. Concurrency and Computation: Practice and Experience 29, 17 (2017), e4127.
- Pang (1999) Tao Pang. 1999. An Introduction to Computational Physics. American Association of Physics Teachers.
- Paoli and Schatzman (2002) Laetitia Paoli and Michelle Schatzman. 2002. A numerical scheme for impact problems I: The one-dimensional case. SIAM J. Numer. Anal. 40, 2 (2002), 702–733.
- perf ([n. d.]) perf [n. d.]. perf: Linux profiling with performance counters. https://perf.wiki.kernel.org/index.php/Main_Page.
- Peter and Packer (2007) HÃ Peter and Frank Packer. 2007. Understanding asset prices: an overview. Bis Papers (2007).
- Peyré (2011) Gabriel Peyré. 2011. The numerical tours of signal processing-advanced computational signal and image processing. IEEE Computing in Science and Engineering 13, 4 (2011), 94–97.
- Prathumwan and Trachoo (2020) Din Prathumwan and Kamonchat Trachoo. 2020. On the solution of two-dimensional fractional Black–Scholes equation for European put option. Advances in Difference Equations 2020, 1 (2020), 1–9.
- Rappaz et al. (2010) Michel Rappaz, Michel Bellet, and Michel Deville. 2010. Numerical Modeling in Materials Science and Engineering. Vol. 32. Springer Science & Business Media.
- Rendleman (1979) Richard J. Rendleman. 1979. Two-state option pricing. The Journal of Finance 34, 5 (1979), 1093–1110.
- Renson et al. (2016) Ludovic Renson, Gaëtan Kerschen, and Bruno Cochelin. 2016. Numerical computation of nonlinear normal modes in mechanical engineering. Journal of Sound and Vibration 364 (2016), 177–206.
- Richardson (1911) Lewis F. Richardson. 1911. The approximate arithmetical solution by finite differences with an application to stresses in masonry dams. Philosophical Transactions of the Royal Society of America 210 (1911), 307–357.
- Richardson and Gaunt (1927) Lewis F. Richardson and John A. Gaunt. 1927. VIII. The deferred approach to the limit. Philosophical Transactions of the Royal Society of London. Series A, containing papers of a mathematical or physical character 226, 636-646 (1927), 299–361.
- Robert (1981) André Robert. 1981. A stable numerical integration scheme for the primitive meteorological equations. Atmosphere-Ocean 19, 1 (1981), 35–46.
- Robert (1982) André Robert. 1982. A semi-Lagrangian and semi-implicit numerical integration scheme for the primitive meteorological equations. Journal of the Meteorological Society of Japan. Ser. II 60, 1 (1982), 319–325.
- Runggaldier (2003) Wolfgang J. Runggaldier. 2003. Jump-diffusion models. In Handbook of heavy tailed distributions in finance. Elsevier, 169–209.
- Samuelson et al. (2011) Paul A. Samuelson, Alison Etheridge, Mark Davis, and Louis Bachelier. 2011. Louis Bachelier’s Theory of Speculation: The Origins of Modern Finance. Princeton University Press.
- Sato et al. (2010) Katsuto Sato, Hiroyuki Takizawa, Kazuhiko Komatsu, and Hiroaki Kobayashi. 2010. Automatic tuning of CUDA execution parameters for stencil processing. Software Automatic Tuning: From Concepts to State-of-the-Art Results (2010), 209–228.
- Sewell (2005) Granville Sewell. 2005. The numerical solution of ordinary and partial differential equations. Vol. 75. John Wiley & Sons.
- Sharpe (1978) William F. Sharpe. 1978. Investments. Prentice Hall.
- Shreve (2005) Steven Shreve. 2005. Stochastic calculus for finance I: the binomial asset pricing model. Springer Science & Business Media.
- Shreve (2004) Steven E. Shreve. 2004. Stochastic calculus for finance II: Continuous-time models. Vol. 11. Springer.
- Smith (1976) Clifford W. Smith. 1976. Option pricing: A review. Journal of Financial Economics 3, 1-2 (1976), 3–51.
- Snider and Banerjee (2010) Dale Snider and Sibashis Banerjee. 2010. Heterogeneous gas chemistry in the CPFD Eulerian–Lagrangian numerical scheme (ozone decomposition). Powder Technology 199, 1 (2010), 100–106.
- Stampede (ede2) Stampede Stampede2. The Stampede2 supercomputing cluster. https://www.tacc.utexas.edu/systems/stampede2.
- Stein and Stein (1991) Elias M. Stein and Jeremy C. Stein. 1991. Stock price distributions with stochastic volatility: an analytic approach. The review of financial studies 4, 4 (1991), 727–752.
- Strikwerda (2004) John C. Strikwerda. 2004. Finite difference schemes and partial differential equations. SIAM.
- Szilard (2004) Rudolph Szilard. 2004. Theories and applications of plate analysis: classical, numerical and engineering methods. Appl. Mech. Rev. 57, 6 (2004), B32–B33.
- Taflove and Hagness (2005) Allen Taflove and Susan C. Hagness. 2005. Computational electrodynamics: the finite-difference time-domain method. Artech house.
- Tang et al. (2011) Yuan Tang, Rezaul A. Chowdhury, Bradley C. Kuszmaul, Chi-Keung Luk, and Charles E. Leiserson. 2011. The pochoir stencil compiler. In Proceedings of the twenty-third annual ACM symposium on Parallelism in algorithms and architectures. 117–128.
- Thijssen (2007) Jos Thijssen. 2007. Computational Physics. Cambridge University Press.
- Thomas (2013) James W. Thomas. 2013. Numerical partial differential equations: finite difference methods. Vol. 22. Springer Science & Business Media.
- Tsitsiklis and Van R. (2001) John N. Tsitsiklis and Benjamin Van R. 2001. Regression methods for pricing complex American-style options. IEEE Transactions on Neural Networks 12, 4 (2001), 694–703.
- Van M. (1976) Pierre Van M. 1976. On optimal stopping and free boundary problems. Archive for Rational Mechanics and Analysis 60, 2 (1976), 101–148.
- Van R. (2012) Ursula Van R. 2012. Numerical methods in computational electrodynamics: linear systems in practical applications. Vol. 12. Springer Science & Business Media.
- Vese and Osher (2002) Luminita A. Vese and Stanley J. Osher. 2002. Numerical methods for p-harmonic flows and applications to image processing. SIAM J. Numer. Anal. 40, 6 (2002), 2085–2104.
- Vesely (1994) Franz J. Vesely. 1994. Computational Physics. Springer.
- Vijayaraghavan and Keith (1990) D. Vijayaraghavan and Theo Keith. 1990. An efficient, robust, and time accurate numerical scheme applied to a cavitation algorithm. (1990).
- Weaver et al. (2013) Vincent M. Weaver, Dan Terpstra, Heike McCraw, Matt Johnson, Kiran Kasichayanula, James Ralph, John Nelson, Phil Mucci, Tushar Mohan, and Shirley Moore. 2013. PAPI 5: Measuring power, energy, and the cloud. In 2013 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS). IEEE. https://doi.org/10.1109/ispass.2013.6557155
- Weickert (2001) Joachim Weickert. 2001. Applications of nonlinear diffusion in image processing and computer vision. Acta Math. Univ. Comenianae 70, 1 (2001), 33–50.
- Wolf and Lam (1991) Michael E. Wolf and Monica S. Lam. 1991. A data locality optimizing algorithm. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation. 30–44.
- Wolf et al. (1996) Michael E. Wolf, Dror E. Maydan, and Ding-Kai Chen. 1996. Combining loop transformations considering caches and scheduling. In Proceedings of the IEEE/ACM International Symposium on Microarchitecture. 274–286.
- Wolfe (1987) Michael J. Wolfe. 1987. Iteration space tiling for memory hierarchies. Parallel Processing for Scientific Computing 357 (1987), 361.
- Wonnacott (2002) David Wonnacott. 2002. Achieving scalable locality with time skewing. International Journal of Parallel Programming 30, 3 (2002), 181–221.
- Yuan and Agrawal (2002) Lixia Yuan and Om P. Agrawal. 2002. A numerical scheme for dynamic systems containing fractional derivatives. Journal of Vibration and Acoustics 124, 2 (2002), 321–324.
- Zhang et al. (2004) Jianfeng Zhang et al. 2004. A numerical scheme for BSDEs. Annals of Applied Probability 14, 1 (2004), 459–488.
- Zhang et al. (2015) Jiyuan Zhang, Tze Meng Low, Qi Guo, and Franz Franchetti. 2015. A 3 d-stacked memory manycore stencil accelerator system. (2015).
- Zhang et al. (2006) Peng Zhang, Sze Chun Wong, and Chi-Wang Shu. 2006. A weighted essentially non-oscillatory numerical scheme for a multi-class traffic flow model on an inhomogeneous highway. J. Comput. Phys. 212, 2 (2006), 739–756.
- Zhu (2006) Song-Ping Zhu. 2006. A new analytical approximation formula for the optimal exercise boundary of American put options. International Journal of Theoretical and Applied Finance 9, 07 (2006), 1141–1177.
- Zhu and He (2007) Song-Ping Zhu and Zhi-Wei He. 2007. Calculating the early exercise boundary of American put options with an approximation formula. International Journal of Theoretical and Applied Finance 10, 07 (2007), 1203–1227.
- Zhylyevskyy (2010) Oleksandr Zhylyevskyy. 2010. A fast Fourier transform technique for pricing American options under stochastic volatility. Review of Derivatives Research 13, 1 (2010), 1–24.
- Zubair and Mukkamala (2008) Mohammad Zubair and Ravi Mukkamala. 2008. High performance implementation of binomial option pricing. In Computational Science and Its Applications–ICCSA 2008: International Conference, Perugia, Italy, June 30–July 3, 2008, Proceedings, Part I 8. Springer, 852–866.
Appendix A American Call Option under Trinomial Option Pricing Model
A.1. Trinomial Option Pricing Model (TOPM)

TOPM shares many properties with BOPM, which is natural as a trinomial tree (see Figure 8 has similar structure to that of a binomial tree. The node-value remains since the number of ”remain the same” moves does not factor into the price. However, the value of and is subtly different, as , and . We represent the exercise value very similarly, .
For the trinomial value, we have now three dependencies, and . Since we have three possible changes in value, we have three probabilities corresponding to moving up , moving down , and remaining the same . Typically, the literature refers to the probability of remaining the same as , but as since use to refer to the quantity , we use to avoid confusion. This lets us write the trinomial value as , where ,
, and . The probabilities are alternate forms of those given in (Hull, 2003).
Let , . We will reuse the notion of red and green cells to distinguish those with larger binomial and excercise values respectively.
A.2. Properties of the Red-Green Divider
Similarly to the binomial tree, we assume that the trinomial tree will be embedded in a grid with the root at the bottom-left corner and the leaves in the top row . The children of node are at . has a price change factor of , has the same price and has a price change factor of . The tree only takes up the upper left triangle, such that the diagonal has a slope of 1/2.
Lemma A.1 shows that within the upper-left triangular area of if a cell is green then the cell to its right must also be green.
Lemma A.1.
For and , .
Proof.
We will reuse the notions and . Therefore, it suffices again to show that for all and . This can again be show through induction.
Consider the base case as . , which in turn means that since . So, the claim holds for .
Assuming the claim holds for some , we will show that it holds for ( for ). From this assumption, this implies the existence of a such that for and for . This is defined as the largest such that , the last ”red” cell in row . Therefore:
-
for , ;
-
for , ; and
-
for , .
Now we examine:
From the inductive assumption, we can assume that . Therefore, to show that , it suffices to show that the remaining terms are less than or equal to 0. A useful note is that since diagonals indicate no change with respect to the exercise value, , which lets us simplify the rest of this expression.
From here, we can start dividing out common terms so long as they are greater than 0.
We can transform the expression . We will let to simplify.
We can then substitute this back into the previous inequality: . Since , we can make this substitution and if the resulting expression is less than or equal to 0, clearly the current expression as is will be less than or equal to 0 as well.
Since , we get: Which clearly holds since .
By induction, for all , , we prove Lemma A.1. ∎
Lemma A.2 shows that within the upper-left triangular area of if a cell is green then the cell below it must also be green.
Lemma A.2.
For and , .
Proof.
Assume that this is false. This would imply that for some i,j when . So this would imply:
From Lemma A.1, we know that all the for all . Therefore, we can expand this to:
Note that since , we know that the expression on the left side is strictly greater than 0, allowing us to make the following substitution:
Since , so multiplying both sides by gives us this expression from Lemma A.1.
In Lemma A.1, we showed this expression to be less than or equal to 0, which is a contradiction. Therefore, Lemma A.2 holds. ∎
Lemma A.3.
for any and
Proof.
Lemma A.4.
for any and
Proof.
This can also be proved by induction using the similar methods as Lemma 2.5. We have the same base case, when we have . If both are green, then since it holds. If is green and is red, then we know that since it is green and that because it is red. If both are red, then the same applies. We then can consider the general case of , where if all hold.
(36) | |||
(37) |
It implies that
(by (36),(37)) | |||
∎
Lemma A.5.
For and ,
.
Proof.
This follows directly from Lemma A.4. Recall . If and , then . ∎
Corollary A.6.
For every , there exists an index such that all cells with are red and all (possibly zero) cells with are green. Also, for , .







A.3. Algorithm for American call option pricing under TOPM
The solution space, is a right-angle triangle with a vertical base of length and a horizontal base of length . We prove the following theorem, which is an adaptation of the Binomial Model Algorithm.
Theorem A.7.
Our algorithm solves the American call option pricing problem under TOPM in work and span, where is the number of time steps.
Proof.
We use the exact same techniques as for BOPM. We again are using the properties of the boundary between the red and green cells to reduce the problem to a stencil-computation. The primary differences are:
-
(1)
In the usage of the FFT-based stencil algorithm of (Ahmad et al., 2021), our stencil uses a larger (but still constant sized) neighborhood of grid values in the computation. Specifically, we have a dependence on the values and to compute , whereas in BOPM we depend only on and .
-
(2)
The width of the grid we are operating on is twice that of BOPM. This only contributes a constant factor to runtime in terms of the worst-case for the width of the trapezoids. This means that some claims regarding the fact that the solution space is a right isosceles triangle need to be slightly adjusted.
Partitioning the triangle into trapezoids.
Let be the right-angle triangle representing the solution space as in Figure 9. We can partition into a sequence of trapezoids just as with BOPM. We know the boundary at the row , where , so we let be the length from the to this boundary. Let be the point a distance downward from . We draw a horizontal through and get the point from the intersection of the horizontal with the sloped side of . This defines a trapezoid . We then follow along the horizontal through to red-green divider to point q, to get a distance , and repeat. We determine this boundary by solving the trapezoid corresponding to . We continue repeating this process of partitioning until we are left with a triangle of height .
Once we have solved all of the trapezoids above , we know that we can apply a naive algorithm, which for (of height ) will take a total of to solve, which will give us the final value for . Solving the trapezoid of height will again take time, which gives a total runtime of .
Solving a trapezoid
Solving a trapezoid is equivalent to solving the values of all red cells in the last row. Refer to figure 11. We are given the solutions to the last row in the trapezoid above, which makes this possible. Let us say this trapezoid has height , meaning that the boundary at the top of the trapezoid is away from the top left point , landing on a point . We again find the midpoint of , called and use this measure of to find a point below , . We again draw a horizontal line from , intersecting at . We compute all the red cells on (solving the trapezoid), in the same two steps:
-
(1)
Compute the cells on segment .
-
(2)
Using the cells of , compute the cells of .

Computing the values of all red cells on line segment
Let be the intersection of and . First, compute the cells of using the FFT-based stencil algorithm of (Ahmad et al., 2021), as all these cells are red and their values depend only on the red section of the top row, . We can then compute the second part, , by recursing on at the smaller trapezoid . Note that the height of is half that of .
Computing the values of all red cells on line segment
Once the values of have been calculated, we can use them to calculate the final row . From the calculation of , we know where the red-green boundary lies for that row. Let be that boundary point. We can take a line parallel to through to determine a point along . all of the points depend only on the values in , which have already been solved. We can again use the FFT-based stencil algorithm from (Ahmad et al., 2021). The rest of , , can be recursed on. Take a vertical through , and note its intersection with as point . forms a trapezoid that fits our specifications, which allows us to recurse on it to determine the rest of the red values along . Note: we do not need to calculate the green cells since they have a closed formula which can be evaluated at need. This gives us the recurrence , which solves to , assuming the base case of has time complexity , which we show next.
Base case.
The base case entails a trapezoid of height. Such a trapezoid can have arbitrary width, but will have a constant number of red cells. Since we only wish to compute the red cells since the green can be computed at will in time, we will use a number of max operators to determine all the values of the red cells. Hence, the base case only takes time.
What remains is to show the span. The span of computing one trapezoid of height depends on the time to compute the subtrapezoids and the time to compute the two stencils and . The FFT-based stencil algorithm (Ahmad et al., 2021) runs in span. Let be the span for a single trapezoid of height .
This solves to . Let be the total span of the algorithm. . ∎
Appendix B Energy Comsumption
We give additional plots of energy consumption in Fig. 10. In addition to outperforming existing approaches in terms of total energy consumption, the reduced energy consumption is pronounced when restricted to specific domains. Here we give the values for the package domain (pkg) and the RAM domain as described in Section 5.2 of the main body. For both measures across all algorithms our algorithm outperforms vanilla and standard implementations after instances of size . Our algorithm for the Black-Scholes-Merton Model was even more competitive, outperforming on instances of all sizes.