Summation of certain locally bilinear forms and its applications to the Fast Multipole Method

Yasuhiro Kajima, Nagoya Zokei University, Komaki, Aichi 485-8563, Japan

Abstract

The Fast Multipole Method (FMM) reduces the cost of computing pairwise two-body interactions among $N$ particles from order $N^{2}$, the cost of brute-force summation, to order $N$. However, its implementation is somewhat complicated, and writing the code takes a considerable amount of time. The FMM algorithm is composed of several steps, the main ones being the Upward Pass and the Downward Pass. Both passes include shift processes, by which the centers of multipole expansions and local expansions are moved. In this paper, I show a method that eliminates these shift processes. As a result of this simplification, coding the FMM becomes much easier, and considerable computation time can be saved. I compare the accuracy and the time required to calculate potential fields with those of an existing FMM code.
Keywords: Fast Multipole Method, Parallel computation

1 Introduction

The fast multipole algorithm developed by Greengard and Rokhlin [1] enables us to calculate interactions among $N$ particles, such as gravitational or electrostatic forces, in $\mathcal{O}(N)$ operations with predictable error bounds, and it is considered one of the top 10 algorithms of the 20th century [2]. A large number of papers related to this theory have therefore been published (such as [3]-[7]). The interaction is divided into two parts: the near interaction and the rest (not near). The near part is the interaction between pairs of particles that lie within a given distance of each other (or within neighboring boxes); it is calculated directly in $\mathcal{O}(N)$ operations. The rest is calculated by the FMM, also in $\mathcal{O}(N)$ operations. Thanks to the FMM algorithm, we can perform large-scale simulations ([8]). However, its implementation is not necessarily easy, and the time needed to perform a simulation is not necessarily short. In [8], about half of the computation time was spent on computing forces, even though we employed the FMM. Therefore, in this paper I propose an FMM that is less time-consuming and easier to implement.

2 Theory

2.1 Greengard’s Fast Multipole Method

In the following, I describe the two-dimensional case, which is simpler than the three-dimensional case but sufficient to understand more general cases intuitively. Unlike Greengard’s FMM, our method for the two-dimensional case carries over to the three-dimensional case straightforwardly. We use the following notation:

I.

“The (computation) box” is a (two-dimensional) square with sides of length five. We say the box is at level zero. The size of the square can be chosen arbitrarily; I set it to five for convenience.
II.

The box contains $N$ charged particles.
III.

The box is composed of four equal squares, which we call level-one squares. The number is “four” because we are dealing with the two-dimensional case for simplicity; in the three-dimensional case, it is “eight”.
IV.

Each level-one square is composed of four equal squares, and we say these squares are at level two. We repeat this procedure up to a given level $l$ . If a square $m$ is at level $k$ , we write $m$ as $m^{k}$ to indicate its level $k$ when necessary. $m^{0}$ is the computation box (Fig.1).

Figure 1: Squares at level 0, 1, 2.
V.

For $m_{i}^{k}$ with $k\geq 2$ , there is a list of squares around $m_{i}^{k}$ called the “interaction list” (see Fig.2, [1]). If $m_{j}$ is on the interaction list of $m_{i}$ , we write $m_{j}\in IL(m_{i})$ . We call two same-sized squares well-separated if the distance between them is at least the length of their sides, that is, if they are neither identical nor adjacent.

Figure 2: Interaction list for square i. We say that x’s are in the interaction list of i.

The method I describe in this paper (like Greengard’s method) is restricted neither to computation boxes that are regular squares nor to two dimensions, and it can be applied to three-dimensional cases straightforwardly. For the sake of depicting figures, however, I restrict myself to the two-dimensional case.
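The interaction list of item V can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the text: squares at level $k$ are indexed by integer pairs `(ix, iy)` with $0\leq$ `ix`, `iy` $<2^{k}$, and the interaction list is taken, as in [1], to be the children of the parent's neighbors that are not adjacent to the square itself. The function names are hypothetical.

```python
def neighbors(ix, iy, level):
    """Squares at `level` that touch square (ix, iy), including the square itself."""
    n = 2 ** level  # number of squares per side at this level
    return {(jx, jy)
            for jx in range(max(ix - 1, 0), min(ix + 2, n))
            for jy in range(max(iy - 1, 0), min(iy + 2, n))}


def interaction_list(ix, iy, level):
    """Children of the parent's neighbours that are not adjacent to (ix, iy)."""
    if level < 2:
        return set()  # at levels 0 and 1 every square touches every other one
    near = neighbors(ix, iy, level)
    result = set()
    for px, py in neighbors(ix // 2, iy // 2, level - 1):
        for cx in (2 * px, 2 * px + 1):
            for cy in (2 * py, 2 * py + 1):
                if (cx, cy) not in near:
                    result.add((cx, cy))
    return result


# An interior square has 27 entries, a corner square at level 2 has 12.
print(len(interaction_list(2, 2, 3)), len(interaction_list(0, 0, 2)))
```

With this indexing, membership is symmetric: a square is in the interaction list of another exactly when the converse holds.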

We now briefly recall the basic steps of Greengard’s FMM and then compare them with the steps of our method. For details, please refer to [1]. The FMM is composed of the following steps:

Upward Pass:

U-1

Form multipole expansions $\Phi_{m^{l}_{i}}$ of the potential field due to charged particles in each finest level square $m_{i}^{l}$ about the square center using spherical harmonics (associated Legendre functions).
U-2

Form multipole expansions about the centers of all squares at all coarser levels. This step is done by shifting the multipole expansions $\Phi_{m^{k}_{i}}$ at level $k$ to the center of the square $m_{j}^{k-1}$ containing $m_{i}^{k}$ , and then adding the four resulting multipole expansions to obtain $\Phi_{m^{k-1}_{j}}$ . Repeating this from $k=l$ to $k=1$ , we obtain all $\Phi_{m^{k}_{i}}$ ’s. The “shift” is a translation of associated Legendre functions.

Downward Pass:

D-1

Form a local expansion $\tilde{\Psi}_{m}$ about the center of each square $m$ at each level $k\leq l$ .

This is done by first converting the multipole expansions $\Phi_{m_{i}}$ for squares $m_{i}\in IL(m)$ to local expansions about the center of $m$ . We denote these by $\tilde{\Psi}_{m,m_{i}}$ .

Then, adding these local expansions $\tilde{\Psi}_{m,m_{i}}$ , we obtain $\tilde{\Psi}_{m}$ :

$\tilde{\Psi}_{m}=\sum_{m_{i}\in IL(m)}\tilde{\Psi}_{m,m_{i}}.$
D-2

Form a local expansion $\Psi_{m}$ and compute interactions at the finest level $l$ .

Let $\Psi_{m}=0$ for the square $m$ at level 0. We shift the center of the local expansion $\Psi_{m^{k}}$ of level $k$ to the centers of the four squares $m_{1}$ , $m_{2}$ , $m_{3}$ , $m_{4}$ of level $k+1$ contained in $m$ , which yields four local expansions. Then, we define $\Psi_{m_{i}^{k+1}}$ by adding the shifted local expansion to $\tilde{\Psi}_{m_{i}^{k+1}}$ . Repeating this procedure inductively, we obtain the local expansions at the finest level $l$ .

Evaluate:

E-1

For each charge $c\in m_{i}^{l}$ , evaluate the potential (or force) due to the particles that are not near, i.e., those contained neither in $m_{i}^{l}$ nor in its eight adjacent squares $m_{j}^{l}$ .
E-2

Compute directly the potential due to the charges not covered in E-1 and add it to the results of E-1.

2.2 Our Method

As mentioned above, Greengard’s Fast Multipole Method first evaluates the total action of the charged particles in a square on a charged particle in another same-sized square in its interaction list, and then sums up these total actions in a specific way so that the overall cost is of order $N$ . Greengard’s FMM represents this total action using spherical harmonics. In our method, we represent the total action of the charges in a square $m_{i}^{k}$ by charges at fixed positions in the square $m_{i}^{k}$ .

We first describe our method in a general setting as a “summation of locally bilinear forms”, and then apply it to the computation of the potential of charges.

2.2.1 Locally bilinear forms

I describe here some results on the “summation of locally bilinear forms”.

We assume that there are $N$ points $c_{i}$ $(1\leq i\leq N)$ in the computation box $m^{0}$ . Each point $c_{i}$ is tagged with its own vector $v_{i}$ in a $\mathtt{d}$ -dimensional vector space $\bf{V}$ . We denote this relation by $c_{i}\rightarrow v_{i}$ . If $c_{i}\rightarrow v_{i}$ and $c_{i}\in m_{k}$ , we write $v_{i}\in m_{k}$ by abuse of notation. $v_{i}\in m^{0}$ always holds. We will define the map explicitly for charges in two-dimensional space later, where the image $v$ of $c$ is determined by combining the position and the electric charge of $c$ .

We also assume that there are bilinear forms $B_{m_{i},m_{j}}(v_{k},v_{l})$ for $m_{i}\in IL(m_{j})$ . They are bilinear with respect to the vectors $v_{k}$ , $v_{l}\in\bf{V}$ ; however, they have a specific meaning only for $v_{k}\in m_{i}$ and $v_{l}\in m_{j}$ and for the vectors spanned by them (spanned by the vectors in each square). In this sense, we call them locally bilinear forms.

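Then, we compute the summation of the locally bilinear forms for $v$ by

$F(v)=\sum_{v_{k}\in m_{j},\ v\in m,\ 1\leq k\leq N}B_{m,m_{j}}(v,v_{k}).$

Here the sum runs over all pairs of squares $(m,m_{j})$ with $v\in m$ for which $B_{m,m_{j}}$ is defined, and over all $v_{k}\in m_{j}$ .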

Note that $B_{m_{i},m_{j}}(v_{k},v_{l})$ is not defined when $v_{k}$ is near $v_{l}$ (i.e., when $v_{k}$ lies in $m_{j}$ itself or in a square adjacent to $m_{j}$ ); we set $B_{m_{i},m_{j}}(v_{k},v_{l})=0$ in this case. Using $F(v)$ , we can compute the interactions of well-separated particles (i.e., interactions between points in well-separated squares), as described later.

The main result of this paper is that we can compute the function $F(v)$ in $\mathcal{O}(N)$ operations without the “shift” procedure.

To show that the computation above is $\mathcal{O}(N)$ , we proceed as follows:

Upward Pass:

$\rm\bar{U}$ -1

Form vectors $v_{m_{i}^{l}}$ for all $m_{i}^{l}$ by $v_{m_{i}^{l}}=\sum_{v_{k}\in m_{i}^{l}}v_{k}$ , where $m_{i}^{l}$ are squares at level $l$ (the finest level).
$\rm\bar{U}$ -2

Form vectors $v_{m_{j}}$ from finest squares to coarser squares inductively by

$v_{m_{j}^{k}}=\sum_{{m_{i}^{k+1}}\subset m_{j}^{k}}v_{m_{i}^{k+1}},\ \ (k+1\leq l)$

Downward Pass:

$\rm\bar{D}$ -1

We form locally linear forms $\tilde{L}_{m}(v)$ for each square $m^{k}$ at each level $k\leq l$ . For a square $m$ , we add $B_{m_{i},m}(v_{m_{i}},v)$ over the squares $m_{i}\in IL(m)$ , with $v\in m$ , and denote the resulting linear form by $\tilde{L}_{m}(v)$ :

$\tilde{L}_{m}(v)=\sum_{m_{i}\in IL(m)}B_{m_{i},m}(v_{m_{i}},v).$
$\rm\bar{D}$ -2

Compute interactions at the finest level $l$ . Let $L_{m}(v)=0$ for the square $m$ at level 0. We add the linear forms $L_{m^{k}}(v)$ of level $k$ to $\tilde{L}_{m_{i}^{k+1}}(v)$ ’s ( $i\in\{1,2,3,4\}$ ), where $m_{1}$ , $m_{2}$ , $m_{3}$ , $m_{4}$ are the (level $k+1$ ) four squares in $m^{k}$ . We denote the resulting linear forms by $L_{m_{i}^{k+1}}(v)$ . Repeating this procedure inductively, we obtain the linear forms at the finest level $l$ .

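For $v\in m^{l}$ (a finest-level square), put $F(v)=L_{m^{l}}(v)$ . This is the same as the previously defined $F(v)$ . Each of the additions appearing in $\rm\bar{D}$ -1 and $\rm\bar{D}$ -2 is essentially an addition of $\mathtt{d}$ -dimensional dual vectors. The reason why this method is $\mathcal{O}(N)$ is almost the same as for Greengard’s FMM: each square requires only a bounded number of operations (in two dimensions its interaction list contains at most 27 squares), and the number of squares is proportional to $N$ when the finest level is chosen so that each finest square holds a bounded number of points.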
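The two passes above can be checked on a toy problem. The following Python sketch is an illustration only, not the Fortran program used in Section 3: the computation box is taken to be the unit square, $\mathtt{d}=3$ and $l=3$ are arbitrary, squares are indexed by integer pairs, and `kernel` is a made-up matrix standing in for $B_{m_{i},m}$ (any bilinear form would do). All function names are hypothetical. Each linear form is stored as a dual vector, so the Downward Pass consists only of matrix-vector products and vector additions, with no shift.

```python
import random

D, LMAX = 3, 3  # toy dimension of V and finest level (assumed values)

def neighbors(ix, iy, level):
    n = 2 ** level
    return {(jx, jy) for jx in range(max(ix - 1, 0), min(ix + 2, n))
                     for jy in range(max(iy - 1, 0), min(iy + 2, n))}

def interaction_list(ix, iy, level):
    if level < 2:
        return set()
    children = {(2 * px + a, 2 * py + b)
                for px, py in neighbors(ix // 2, iy // 2, level - 1)
                for a in (0, 1) for b in (0, 1)}
    return children - neighbors(ix, iy, level)

def kernel(src, dst, level):
    """Hypothetical D x D matrix representing B_{src,dst} at this level."""
    dx, dy = dst[0] - src[0], dst[1] - src[1]
    return [[1.0 / (1 + a + 2 * b + dx * dx + dy * dy + level)
             for b in range(D)] for a in range(D)]

def square_of(pos, level):
    n = 2 ** level
    return (min(int(pos[0] * n), n - 1), min(int(pos[1] * n), n - 1))

def fast_sum(points):
    # Upward Pass: sum the vectors in each finest square, then sum children.
    vsum = {LMAX: {}}
    for pos, vec in points:
        acc = vsum[LMAX].setdefault(square_of(pos, LMAX), [0.0] * D)
        for a in range(D):
            acc[a] += vec[a]
    for level in range(LMAX - 1, -1, -1):
        vsum[level] = {}
        for (ix, iy), vec in vsum[level + 1].items():
            acc = vsum[level].setdefault((ix // 2, iy // 2), [0.0] * D)
            for a in range(D):
                acc[a] += vec[a]
    # Downward Pass: the linear form L_m(v) = w . v is stored as the vector w.
    form = {0: {(0, 0): [0.0] * D}}
    for level in range(1, LMAX + 1):
        form[level] = {}
        for ix in range(2 ** level):
            for iy in range(2 ** level):
                w = list(form[level - 1][(ix // 2, iy // 2)])  # inherit parent, no shift
                for src in interaction_list(ix, iy, level):
                    u = vsum[level].get(src)
                    if u is None:
                        continue  # empty square contributes nothing
                    K = kernel(src, (ix, iy), level)
                    for b in range(D):
                        w[b] += sum(u[a] * K[a][b] for a in range(D))
                form[level][(ix, iy)] = w
    return [sum(wb * vb for wb, vb in zip(form[LMAX][square_of(pos, LMAX)], vec))
            for pos, vec in points]

def direct_sum(points):
    # Reference: add every bilinear form one pair of points at a time.
    out = []
    for pos, vec in points:
        total = 0.0
        for level in range(2, LMAX + 1):
            dst = square_of(pos, level)
            il = interaction_list(dst[0], dst[1], level)
            for pos2, vec2 in points:
                src = square_of(pos2, level)
                if src in il:
                    K = kernel(src, dst, level)
                    total += sum(vec2[a] * K[a][b] * vec[b]
                                 for a in range(D) for b in range(D))
        out.append(total)
    return out

random.seed(0)
points = [((random.random(), random.random()),
           [random.uniform(-1, 1) for _ in range(D)]) for _ in range(60)]
fast, slow = fast_sum(points), direct_sum(points)
print(max(abs(a - b) for a, b in zip(fast, slow)))
```

The two results agree up to rounding because the only property used is bilinearity; the accuracy of an actual potential calculation depends entirely on how well the chosen bilinear forms approximate the interaction.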

2.2.2 Applications to FMM

To apply the results obtained above, we proceed as follows. For details, see the appendix.

I.

First, we fix a one-to-one correspondence between a charged particle and $v\in\bf{V}$ .
II.

Second, we construct bilinear forms $B_{m_{i},m_{j}}(v_{k},v_{l})$ such that an interaction (such as the Coulomb interaction) between $c_{k}\in m_{i}$ and $c_{l}\in m_{j}$ is equal to $B_{m_{i},m_{j}}(v_{k},v_{l})$ within a predictable error bound, where $c_{k}\rightarrow v_{k}$ and $c_{l}\rightarrow v_{l}$ . In the following, “the error is predictable” means that the value $|$ error $\times d^{u}|$ is bounded, where $d$ is the distance between the squares and the exponent $u$ is known. $u$ becomes larger if we increase the dimension $\mathtt{d}$ . The error could be estimated more precisely, but we do not pursue this issue here.
III.

Lastly, we compute the interaction from nearby particles (which are not well-separated) directly.

Note: The map $\rightarrow$ does not depend on the choice of the square containing a charge $c$ ; i.e., if $c\in m_{1}$ and $c\in m_{2}$ (where the levels of $m_{1}$ and $m_{2}$ are necessarily different), the image of $c$ is the same. Because of this fact, we can get rid of the “shift” operations in the Upward Pass and the Downward Pass.

3 Results

I prepared a Fortran program by modifying an existing FMM program developed by Prof. Shuji Ogata (FMMP [9]) to test execution time and accuracy. In the computation, I fixed $4\times 4\times 4=64$ positions in each cube, arranged precisely as in Fig.2 in the appendix (the sides have length five; since we are now dealing with the three-dimensional case, I use a cube instead of a square). The computations were performed in double precision; the program was compiled with the Intel Fortran compiler version 19.1, and the CPU was a Core i7-9700.

In this section, we compare the computation timings and accuracies of our program with those of FMMP. In FMMP, the maximum order of multipoles is set to five, except for the test case shown in the last row of Table 2, which was added to see how the maximum order of multipoles influences the results. Like Prof. Ogata’s FMMP, our method can be applied to parallel computing. However, I have coded it for a single node this time.

In the following tests, I scattered charges randomly in a cubic box. The charges are $+1$ or $-1$ .

Table 1: Computation timings averaged over 10 measurements for the case where number of particles=80000, and maximum level $l$

program	`t_up`¹	`t_down`²	`t_pfs`³	total (sec)	accuracy⁴
FMMP	9.1 $\times 10^{-4}$	1.4 $\times 10^{-1}$	2.9 $\times 10^{-2}$	$1.7\times 10^{-1}$	$0.14\times 10^{-2}$
program based on our method	2.2 $\times 10^{-4}$	8.5 $\times 10^{-2}$	9.0 $\times 10^{-3}$	$9.4\times 10^{-2}$	$0.18\times 10^{-2}$

¹ time for upward pass
² time for downward pass
³ time for computing potential field and force field
⁴ averaged values of the relative errors (average of $\sum\frac{PF_{FMM}-PF_{direct}}{PF_{direct}}$ )

Table 2: number of particles=200000, and maximum level $l$

program	`t_up`	`t_down`	`t_pfs`	total (sec)	accuracy
FMMP	9.4 $\times 10^{-4}$	1.7 $\times 10^{-1}$	7.3 $\times 10^{-2}$	$2.4\times 10^{-1}$	$0.35\times 10^{-2}$
program based on our method	2.3 $\times 10^{-4}$	9.1 $\times 10^{-2}$	2.3 $\times 10^{-2}$	$1.1\times 10^{-1}$	$0.40\times 10^{-2}$
FMMP (maximum order of multipoles is four)	5.3 $\times 10^{-4}$	8.1 $\times 10^{-2}$	5.6 $\times 10^{-2}$	$1.4\times 10^{-1}$	$1.1\times 10^{-2}$

Table 3: number of particles=80000, and maximum level $l$

program	`t_up`	`t_down`	`t_pfs`	total (sec)	accuracy
FMMP	1.9 $\times 10^{-4}$	9.7 $\times 10^{-3}$	2.9 $\times 10^{-2}$	$3.9\times 10^{-2}$	$0.093\times 10^{-2}$
program based on our method	7.8 $\times 10^{-5}$	6.4 $\times 10^{-3}$	8.0 $\times 10^{-3}$	$1.4\times 10^{-2}$	$0.15\times 10^{-2}$

4 Discussions

(1) We replaced spherical harmonics by charges fixed in position, using Lagrange interpolation. However, other interpolation schemes may be more suitable for the FMM. Such a method had already been devised by William Fong and Eric Darve [3], who used Chebyshev polynomials for the interpolation. Their accuracy seems to be better than ours. However, since I had already written my program before I learned of their results, and since this paper aims to introduce a method that removes the “shift” from the FMM, I left the program as it was. It seems likely that the method introduced in this paper can also be applied to Chebyshev interpolation.

(2) It takes time to compute the bilinear forms $B_{m_{i},m_{j}}(v_{k},v_{l})$ . However, since they do not depend on $v_{k}$ and $v_{l}$ , we can compute them in advance and store them in memory. This is required only once, so I excluded the time required to compute the bilinear forms from the timings. In the case of FMMP, some computations might likewise be done in advance to shorten the computation time.

(3) The bilinear forms $B_{m_{i},m_{j}}(v_{k},v_{l})$ are essentially matrices. The computation of these matrices passes through intermediate values of large magnitude and ends up with entries of relatively small magnitude. Thus, the computation of $B_{m_{i},m_{j}}(v_{k},v_{l})$ should not be done in single precision.

(4) As shown in the tables in the previous section, our method saves a substantial part of the computation time (about half). However, its accuracy is slightly worse than that of the FMMP program. It may become more accurate if we increase the dimension $\mathtt{d}$ of $\bf{V}$ or employ the method of William Fong and Eric Darve [3]. We also find that if we set the maximum order of multipoles to four in FMMP, the accuracy becomes considerably worse ( $>1\%$ ), while the computation is still slower than with our method (Table 2).

(5) Since the array storing $B_{m_{i},m_{j}}(v_{k},v_{l})$ is very large, the time required to access the memory holding it is a crucial factor in the computation time. With a good memory-management scheme, the computation time would probably decrease further.
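The “about half” in item (4) can be read off from the total times in Tables 1 to 3. The short Python snippet below only redoes that arithmetic with the totals copied from the tables; the dictionary layout is an arbitrary choice made for this illustration.

```python
# Total times in seconds, copied from Tables 1-3: (FMMP, program based on our method).
totals = {
    "Table 1": (1.7e-1, 9.4e-2),
    "Table 2": (2.4e-1, 1.1e-1),
    "Table 3": (3.9e-2, 1.4e-2),
}

# Ratio of our total time to the FMMP total time for each table.
ratios = {name: ours / fmmp for name, (fmmp, ours) in totals.items()}
for name, ratio in ratios.items():
    print(f"{name}: ours / FMMP = {ratio:.2f}")
```

This prints ratios of 0.55, 0.46 and 0.36, i.e., the total time of our program is between roughly one third and one half of that of FMMP in these three tests.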

Appendix A

In this appendix, I give the construction of the bilinear forms $B_{m_{i},m_{j}}(v_{k},v_{l})$ introduced in $\S 2.2.1$ in more detail, in the following steps. I also provide descriptions for $\rm{\,I\,}$ and $\rm{II}$ in $\S 2.2.2$ . We assume that the dimension $\mathtt{d}$ of $\bf{V}$ is equal to 16 for the sake of description. We denote by $\phi(c,c^{\prime})$ a function determined by charges $c$ and $c^{\prime}$ , such as the Coulomb potential. We also write $a\thickapprox b$ if $a-b$ is within a predictable error.

STEP1

We define 16 (= $\mathtt{d}$ ) positions in each square at every level (Fig.3).
STEP2

Let $c\in m^{k}$ be a charge, with $k\geq 0$ . We define a map $h_{m}$ from $c$ to 16 point charges on the fixed 16 positions in $m$ . For $k=0$ , this map is the map $\rightarrow$ introduced in $\S 2.2.1$ . The map $h_{m}$ has the following property (for details, see A.2.2): if charges $c_{1}\in m_{1}^{k}$ and $c_{2}\in m_{2}^{k}$ with $k\geq 1$ lie in well-separated squares $m_{1}$ and $m_{2}$ , then

$\phi(c_{1},c_{2})\thickapprox\sum^{16}_{i=1}\phi(c_{1}^{i},c_{2})$

where $h_{m_{1}}(c_{1})=\{c_{1}^{i},\ 1\leq i\leq 16\}$ . By assigning the electric charges of the 16 point charges to the coordinates of a 16-dimensional vector space, the image of the map $h_{m_{1}}$ can be regarded as a column vector in a 16-dimensional space. We also denote this column vector by $h_{m_{1}}(c_{1})$ . We identify the vector space $\bf{V}$ with the space of values of the 16 point charges in $m^{0}$ .
STEP3

We define a bilinear form $b_{m_{1},m_{2}}(v_{1},v_{2})\thickapprox\phi(c_{1},c_{2})$ for $v_{1}\in m_{1}$ and $v_{2}\in m_{2}$ by

$b_{m_{1},m_{2}}(v_{1},v_{2})=\sum^{16}_{i=1}\sum^{16}_{j=1}\phi(c_{1}^{i},c_{2}^{j})$

where $c_{1}\rightarrow v_{1}$ and $c_{2}\rightarrow v_{2}$ , $h_{m_{1}}(c_{1})=\{c_{1}^{i},\ 1\leq i\leq 16\}$ , and $h_{m_{2}}(c_{2})=\{c_{2}^{j},\ 1\leq j\leq 16\}$ .
STEP4

By STEP2 above, we can map a charge $c\in m$ to 16 point charges $h_{m}(c)$ $\in m$ . Let $p_{i}\in m^{0}$ $(1\leq i\leq 16)$ be the 16 point charges of $m^{0}$ arranged as in Fig.3. We assume $Chg(p_{i})=1$ $(1\leq i\leq 16)$ , where $Chg(p)$ denotes the electric charge of $p$ . We can transform each of the 16 point charges $p_{j}$ in $m^{0}$ to 16 point charges in a square $m_{i}$ by $h_{m_{i}}$ . Then, we define a $16\times 16$ matrix $M_{m_{i}}$ by

$M_{m_{i}}=(h_{m_{i}}(p_{1}),h_{m_{i}}(p_{2}),\ldots,h_{m_{i}}(p_{16})).$

$M_{m_{i}}$ transforms the 16 charges of $m^{0}$ to the 16 charges in $m_{i}$ .
STEP5

We define bilinear forms $B_{m_{i},m_{j}}(v_{k},v_{l})$ by

$B_{m_{i},m_{j}}(v_{k},v_{l})=b_{m_{i},m_{j}}(M_{m_{i}}v_{k},M_{m_{j}}v_{l}).$

Writing $b_{m_{1},m_{2}}(v_{1},v_{2})=v_{1}^{T}M_{b}v_{2}$ with a suitable matrix $M_{b}$ , the matrix for $B_{m_{i},m_{j}}(v_{k},v_{l})$ is

$M_{m_{i}}^{T}M_{b}M_{m_{j}},$

where $T$ denotes the transpose.
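One way to see why the matrices $M_{m}$ of STEP4 make the shifts unnecessary is to check that $M_{m}$ applied to the level-zero image of a charge reproduces $h_{m}$ of that charge. The Python sketch below does this check in exact rational arithmetic under several assumptions that are not stated in the text: it works in one dimension with four nodes (the 16-point case being the tensor product of two such sets), it takes $h_{m}$ to be given by the Lagrange weights of A.2, and it places the nodes of every interval at the offsets $(2,1,-1,-2)$ scaled to the interval size. The function names are hypothetical.

```python
from fractions import Fraction


def lagrange_weights(nodes, x):
    """Values f_i(x) of the Lagrange basis polynomials for `nodes`."""
    weights = []
    for k, ak in enumerate(nodes):
        w = Fraction(1)
        for i, ai in enumerate(nodes):
            if i != k:
                w *= Fraction(x - ai) / (ak - ai)
        weights.append(w)
    return weights


def nodes_of(center, half_width):
    """Assumed layout: offsets (2, 1, -1, -2) for half-width 5/2, scaled to the interval."""
    return [center + Fraction(s) * half_width * Fraction(2, 5) for s in (2, 1, -1, -2)]


root = nodes_of(Fraction(0), Fraction(5, 2))        # level-0 interval [-5/2, 5/2]
child = nodes_of(Fraction(15, 8), Fraction(5, 8))   # a level-2 interval [5/4, 5/2]

# M[i] holds h_child(p_i), i.e. column i of the matrix M in STEP4,
# for a unit charge placed at the level-0 node p_i.
M = [lagrange_weights(child, p) for p in root]

x = Fraction(17, 10)                  # position of a unit charge inside the child
v = lagrange_weights(root, x)         # its level-0 image, the vector v
Mv = [sum(v[i] * M[i][j] for i in range(4)) for j in range(4)]

print(Mv == lagrange_weights(child, x))
```

Under these assumptions the equality is exact, because each child basis polynomial has degree three and is therefore reproduced exactly by interpolation on the four level-zero nodes.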

In the following, I describe the steps above in more detail.

A.1 STEP1

We fix 16 points as illustrated in Fig.3. In general, the number of points is $n^{2}$ in the two-dimensional case and $n^{3}$ in the three-dimensional case; here $n=4$ .

A.2 STEP2

A.2.1 Lagrange interpolation

For any polynomial $g(x)$ with $\deg(g)<n$ , we have

$\sum^{n}_{i=1}f_{i}(x)g(a_{i})-g(x)=0\qquad(1)$

for arbitrarily chosen $n$ distinct numbers $\{a_{i}\}$ , where $f_{i}(x)=\frac{F_{i}(x)}{F_{i}(a_{i})}$ and $F_{k}(x)=\prod^{n}_{i\neq k}(x-a_{i})=\frac{\prod^{n}_{i=1}(x-a_{i})}{(x-a_{k})}$ .

If $\deg(g)\geq n$ or $g(x)$ is a power series, then $\sum^{n}_{j=1}f_{j}(x)g(a_{j})-g(x)=\prod^{n}_{i=1}(x-a_{i})\times({\rm power\ series})$ , so the residual comes only from the terms of degree $n$ and higher. Thus the potential field is approximated with a predictable error.
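As a worked instance of (1), take $n=4$ and the nodes $\{a_{i}\}=\{2,1,-1,-2\}$ used in A.2.2. Labelling each basis polynomial by its node (a convention adopted only for this example), we get

$f_{2}(x)=\frac{(x^{2}-1)(x+2)}{12},\quad f_{1}(x)=-\frac{(x^{2}-4)(x+1)}{6},\quad f_{-1}(x)=\frac{(x^{2}-4)(x-1)}{6},\quad f_{-2}(x)=-\frac{(x^{2}-1)(x-2)}{12}.$

Taking $g(x)=1$ gives $f_{2}+f_{1}+f_{-1}+f_{-2}=\frac{x^{2}-1}{3}-\frac{x^{2}-4}{3}=1$ , so the replacement charges always sum to the original charge. Taking $g(x)=x^{4}$ , whose degree equals $n$ , gives

$\sum_{i}f_{i}(x)a_{i}^{4}-x^{4}=(5x^{2}-4)-x^{4}=-(x^{2}-1)(x^{2}-4)=-\prod_{i}(x-a_{i}),$

so the residual vanishes at the nodes and is a polynomial of degree $n$ .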

A.2.2 Approximation of a charge by charges fixed in position

Let us assume that a charged particle is located at a point $P$ . Applying the results above, we represent the field due to this particle at a point $A$ by four charges located at $P_{2},P_{1},P_{-1},P_{-2}$ . Let $\phi(P,A)$ be a function determined by the charges $P$ and $A$ . For instance, $\phi(P,A)=\frac{Chg(P)\times Chg(A)}{|P-A|}$ , where $Chg(X)$ denotes the electric charge of the point charge $X$ .

Let $A$ and $B$ be the points shown in Fig.4. Then $\phi(P,A)$ can be written as

$\phi(P,A)\thickapprox\sum_{i,j\in\{2,1,-1,-2\}}f_{i}(x)f_{j}(y)\phi(L_{i,j},A)\qquad(2)$

where $f_{i}$ is the function defined in the previous section for $n=4$ and $\{a_{i}\}=\{2,1,-1,-2\}$ , and $(x,y)$ are the coordinates of the point $P$ as depicted in the figure. Putting $P=c_{1}$ and $A=c_{2}$ , and taking the 16 values $\{f_{i}(x)f_{j}(y)Chg(P),\ i,j\in\{2,1,-1,-2\}\}$ as the charges of $h_{m_{1}}(c_{1})=\{c_{1}^{i},\ 1\leq i\leq 16\}$ described in STEP2, we get the relation $\phi(c_{1},c_{2})\thickapprox\sum^{16}_{i=1}\phi(c_{1}^{i},c_{2})$ .
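Relation (2) can be tried numerically. The Python sketch below is a minimal illustration under stated assumptions: the square containing $P$ is centered at the origin with sides of length five, the 16 fixed positions are the grid points with coordinates in $\{2,1,-1,-2\}$, both charges are unit charges, and $\phi$ is the Coulomb form $1/|P-A|$ given above. The sample points and the function names are arbitrary choices made for this example.

```python
import math

NODES = (2.0, 1.0, -1.0, -2.0)  # a_i from the text, relative to the square center


def lagrange_weights(x):
    """Values f_i(x) of the four Lagrange basis polynomials on NODES."""
    return [math.prod((x - a) / (ak - a) for a in NODES if a != ak) for ak in NODES]


def coulomb(p, a):
    """phi(P, A) for two unit charges: 1 / |P - A|."""
    return 1.0 / math.dist(p, a)


def fixed_charges(p, charge=1.0):
    """16 (position, charge) pairs that replace a charge located at p."""
    fx, fy = lagrange_weights(p[0]), lagrange_weights(p[1])
    return [((ax, ay), charge * wx * wy)
            for ax, wx in zip(NODES, fx) for ay, wy in zip(NODES, fy)]


P = (0.7, -1.3)   # source inside the square [-2.5, 2.5]^2
A = (10.4, 1.1)   # target in a well-separated square centered at (10, 0)

exact = coulomb(P, A)
approx = sum(q * coulomb(pos, A) for pos, q in fixed_charges(P))
rel_err = abs(approx - exact) / exact
print(exact, approx, rel_err)
```

For this pair of points the relative error is small compared with one percent; it grows as the target approaches the source square, which is why the replacement is used only for well-separated squares.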

References

[1] Leslie F. Greengard, The Rapid Evaluation of Potential Fields in Particle Systems, The MIT Press, Cambridge, Massachusetts, 1988
[2] The Best of the 20th Century: Editors Name Top 10 Algorithms, SIAM News, 33, 1 (2000)
[3] William Fong, Eric Darve, The black-box fast multipole method, J. Comp. Phys. 228, 8712-8725 (2009)
[4] J. Board and K. Schulten, The fast multipole algorithm, Comput. Sci. Eng. 2, 76-79 (2000)
[5] C.A. White and M. Head-Gordon, Derivation and efficient implementation of the fast multipole method, J. Chem. Phys., 101, 6593-6605 (1994)
[6] H. Fujiwara, The fast multipole method for solving integral equations of three-dimensional topography and basin problems, Geophys. J. Int., 140, 198-210 (2000)
[7] T. Hrycak and V. Rokhlin, An improved fast multipole algorithm for potential fields, SIAM J. Sci. Comput., 19, 1804-1826 (1998)
[8] Y. Kajima, S. Ogata, R. Kobayashi, M. Hiyama, and T. Tamura, Fluctuating Local Recrystallization of Quasi-Liquid Layer of Sub-Micrometer-Scale Ice: A Molecular Dynamics Study, J. Phys. Soc. Jpn. 83, 83601 (2014)
[9] Shuji Ogata, Timothy J. Campbell, Rajiv K. Kalia, Aiichiro Nakano, Priya Vashishta, and Satyavani Vemparala, Scalable and portable implementation of the fast multipole method on parallel computers, Comp. Phys. Comm., 153, 445-461 (2003)